Hi Micah,

I couldn't find arrow::util::Optional::nullopt, but I did find arrow::util::nullopt, which also did not seem to work. However, I then found arrow::util::optional<T>() right afterwards, which seems to output NaNs!
I do see that the resulting dataframe, when loaded in pandas, has the column dtype as float64. Do you know if there is a way to define the schema such that I can input a uint64_t (linux epoch time nanos) and have it output as datetime64[ns] in parquet cpp?

Thank You,
Arun

On Tue, Sep 13, 2022 at 10:49 PM Micah Kornfield <[email protected]> wrote:

> Hi Arun,
> The schema should be `parquet::Repetition:OPTIONAL`,
> parquet::Repetition:REPEATED
> should be for repeated groups. IIRC you can insert
> arrow::util::Optional::nullopt into the stream for a null value.
>
> Hope this helps.
>
> Micah
>
> On Tue, Sep 13, 2022 at 8:58 AM Arun Joseph <[email protected]> wrote:
>
>> Hi all,
>>
>> I've tried defining my field with the following:
>>
>> fields.push_back(
>>     parquet::schema::PrimitiveNode::Make(
>>         "field_name",
>>         parquet::Repetition::REQUIRED,
>>         parquet::Type::INT64,
>>         parquet::ConvertedType::INT_64)
>> );
>>
>> and I'm not sure if it's possible to specify a null value for an int64
>> column. I understand that C++ ints don't have a null value. I write to the
>> field with the following:
>>
>> os << std::numeric_limits<int64_t>::quiet_NaN();
>>
>> where os is:
>>
>> parquet::WriterProperties::Builder builder_;
>> parquet::StreamWriter os {parquet::ParquetFileWriter::Open(outfile_,
>> schema_, builder_.build())};
>>
>> This (as expected) writes a 0 for the value. But is there a way to
>> specify a null value? From my understanding parquet::Repetition:OPTIONAL is
>> meant for repeating groups.
>>
>> My actual usecase is trying to represent a null linux epoch timestamp in
>> nanos e.g. NaN or NaT in the resulting pandas dataframe after reading the
>> written parquet file. It seems like in Pandas, int columns with nulls
>> are implicitly casted to float but I think parquet is able to define a
>> null value like this. Is this the only way to achieve this to convert
>> the column to a float or is there a way to specify value is null in
>> parquet cpp?
>>
>> Thank You,
>> Arun Joseph
>>

--
Arun Joseph
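
For anyone reading the thread later, here is a minimal standard-library sketch of why the original attempt wrote a 0, and of the "value or nothing" idea behind the optional suggestion. It does not touch the parquet API at all. `MaybeEpochNanos`, `describe` and `sentinel_attempt` are hypothetical names made up for illustration, and std::optional is only standing in here for the arrow::util::optional mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Integer types have no NaN representation, so the standard defines
// quiet_NaN() to return 0 for them. This is why a 0 ended up in the file.
static_assert(!std::numeric_limits<std::int64_t>::has_quiet_NaN,
              "int64_t has no quiet NaN");

// Hypothetical alias: an epoch timestamp in nanoseconds that may be absent.
using MaybeEpochNanos = std::optional<std::int64_t>;

// The value the sentinel attempt from the original message produces.
std::int64_t sentinel_attempt() {
  return std::numeric_limits<std::int64_t>::quiet_NaN();
}

// Hypothetical helper: show a missing timestamp as "NaT" rather than as a
// sentinel number, mirroring what a null-aware reader would display.
std::string describe(const MaybeEpochNanos& ts) {
  return ts.has_value() ? std::to_string(*ts) : std::string("NaT");
}
```

An empty optional (`MaybeEpochNanos{}` or `std::nullopt`) carries "no value" explicitly, while any real timestamp, including 0, stays distinguishable from it. That distinction is what an OPTIONAL column needs and what a sentinel integer cannot provide.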
