Hi all,
I've tried defining my field with the following:
    fields.push_back(
        parquet::schema::PrimitiveNode::Make(
            "field_name",
            parquet::Repetition::REQUIRED,
            parquet::Type::INT64,
            parquet::ConvertedType::INT_64));
However, I'm not sure whether it's possible to specify a null value for an int64
column. I understand that C++ integers have no null value. I write to the
field with the following:
    os << std::numeric_limits<int64_t>::quiet_NaN();
where os is:
    parquet::WriterProperties::Builder builder_;
    parquet::StreamWriter os{parquet::ParquetFileWriter::Open(
        outfile_, schema_, builder_.build())};
This (as expected) writes a 0 for the value. But is there a way to specify
a null value instead? From my understanding, parquet::Repetition::OPTIONAL
is meant for repeating groups.
My actual use case is representing a null Linux epoch timestamp in
nanoseconds, which should show up as NaN or NaT in the resulting pandas
DataFrame after reading the written Parquet file. It seems that in pandas,
integer columns containing nulls are implicitly cast to float, but I think
Parquet itself can store a null value in an integer column. Is converting the
column to float the only way to achieve this, or is there a way to mark a
value as null in parquet-cpp?
Thank you,
Arun Joseph