Hi Micah,

I couldn't find arrow::util::Optional::nullopt, but I did find
arrow::util::nullopt, which also did not seem to work. However, right
afterwards I found arrow::util::optional<T>(), which seems to output NaNs!
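
For anyone reading along, here is a minimal sketch of why the empty optional behaves differently from my earlier quiet_NaN() attempt. It uses std::optional from the standard library as a stand-in for arrow::util::optional (an assumption on my part that the two share the same has_value semantics), and render() and int_nan() are hypothetical helpers for illustration, not part of Arrow or Parquet.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Integer types have no NaN representation, so quiet_NaN() on int64_t
// returns a value-initialized int64_t, i.e. a real 0 and not a null.
std::int64_t int_nan() {
  static_assert(!std::numeric_limits<std::int64_t>::has_quiet_NaN,
                "int64_t has no quiet NaN");
  return std::numeric_limits<std::int64_t>::quiet_NaN();
}

// Hypothetical helper: shows what a null-aware writer could emit.
// An empty optional carries no value at all, so it can be mapped to a
// null instead of to some sentinel integer.
std::string render(const std::optional<std::int64_t>& v) {
  return v.has_value() ? std::to_string(*v) : "null";
}
```

With these helpers, render(int_nan()) gives "0" while render(std::optional<std::int64_t>{}) gives "null", which matches what I saw: the NaN attempt wrote a 0 and the default-constructed optional wrote a null.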

I do see that the resulting dataframe, when loaded in pandas, has the column
dtype float64. Do you know if there is a way to define the schema such
that I can input a uint64_t (Linux epoch time in nanoseconds) and have it
output as datetime64[ns] in parquet cpp?

Thank You,
Arun

On Tue, Sep 13, 2022 at 10:49 PM Micah Kornfield <[email protected]>
wrote:

> Hi Arun,
> The schema should be `parquet::Repetition::OPTIONAL`;
> `parquet::Repetition::REPEATED` should be for repeated groups.  IIRC you
> can insert arrow::util::Optional::nullopt into the stream for a null value.
>
> Hope this helps.
>
> Micah
>
> On Tue, Sep 13, 2022 at 8:58 AM Arun Joseph <[email protected]> wrote:
>
>> Hi all,
>>
>> I've tried defining my field with the following:
>>
>> fields.push_back(
>>   parquet::schema::PrimitiveNode::Make(
>>     "field_name",
>>     parquet::Repetition::REQUIRED,
>>     parquet::Type::INT64,
>>     parquet::ConvertedType::INT_64)
>> );
>>
>> and I'm not sure if it's possible to specify a null value for an int64
>> column. I understand that C++ ints don't have a null value. I write to the
>> field with the following:
>>
>> os << std::numeric_limits<int64_t>::quiet_NaN();
>>
>> where os is:
>>
>> parquet::WriterProperties::Builder builder_;
>> parquet::StreamWriter os {parquet::ParquetFileWriter::Open(outfile_,
>> schema_, builder_.build())};
>>
>> This (as expected) writes a 0 for the value. But is there a way to
>> specify a null value? From my understanding, parquet::Repetition::OPTIONAL
>> is meant for repeating groups.
>>
>> My actual use case is trying to represent a null Linux epoch timestamp in
>> nanos, e.g. NaN or NaT, in the resulting pandas dataframe after reading
>> the written parquet file. It seems that in pandas, int columns with nulls
>> are implicitly cast to float, but I think parquet is able to define a
>> null value like this. Is converting the column to a float the only way to
>> achieve this, or is there a way to specify that a value is null in
>> parquet cpp?
>>
>> Thank You,
>> Arun Joseph
>>
>>

-- 
Arun Joseph
