On Thu, 14 Jul 2022 at 08:34, Micah Kornfield <[email protected]> wrote:

> Hi Louis,
> I would lean against doing this.  Parquet doesn't seem to be prescriptive,
> but I understand Time type to have a max value of at most 1 day (i.e. 86400
> seconds, this is how Arrow defines the type at least [1]).  Durations can
> be larger and that can lead to ambiguity in handling.  Second, the Arrow
> schema should be preserved by default when writing the Parquet file, so it
> should be recoverable.  I understand this doesn't help for non-Arrow-based
> systems, but it potentially gives a workaround in some contexts.
>
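
To illustrate the ambiguity described above, here is a minimal sketch using
only the Python standard library. It assumes the time-of-day range is
[0, 86400 s), and the helper name fits_time_of_day is hypothetical, used
only for illustration:

```python
from datetime import timedelta

# Assumption: a time-of-day value must be non-negative and below one day.
MICROS_PER_DAY = 86_400 * 1_000_000


def fits_time_of_day(d: timedelta) -> bool:
    """Return True if the duration fits in a time-of-day microsecond range."""
    micros = d // timedelta(microseconds=1)  # whole microseconds as an int
    return 0 <= micros < MICROS_PER_DAY


short = timedelta(hours=3)
long = timedelta(days=2, hours=3)

print(fits_time_of_day(short))  # True
print(fits_time_of_day(long))   # False

# If a writer wrapped out-of-range values modulo one day, two different
# durations would become indistinguishable after export.
wrapped = (long // timedelta(microseconds=1)) % MICROS_PER_DAY
print(wrapped == short // timedelta(microseconds=1))  # True
```

A duration longer than one day therefore has no unambiguous representation
as a time-of-day value.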

Small note: what Micah mentions here about preserving this information in
the Arrow schema (stored in the Parquet file metadata), so that roundtrips
from/to Arrow-based systems work for duration, was implemented earlier
this year and has been available since 8.0.0 (
https://issues.apache.org/jira/browse/ARROW-6780).


>
> I think the more appropriate solution is to see if there is interest in
> extending Parquet's type system for this type OR figuring out conventions
> that are more universal for logical types that aren't in Parquet's type
> system.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L222
>
> On Tue, Jul 12, 2022 at 8:02 AM Louis C <[email protected]> wrote:
>
>> Hello,
>>
>> I integrated the Arrow library into a larger project, and was testing
>> exports/imports of the same tables to see if it behaved well. While doing
>> this, I became aware that Arrow DURATION types were exported as INT64 (as
>> the corresponding number of µs, if I remember correctly) in the Parquet
>> export, and then imported as INT64 types. So the Parquet export loses the
>> type for the DURATION fields.
>> Would it not be better to export the DURATION type as the Parquet logical
>> type "TIME_MICROS" (meaning TIME with microsecond precision, as
>> TIME_MICROS seems to be somewhat deprecated (
>> https://apache.googlesource.com/parquet-format/+/refs/heads/bloom-filter/LogicalTypes.md))
>> as MATLAB does (see
>> https://fr.mathworks.com/help/matlab/import_export/datatype-mappings-matlab-parquet.html)
>> ?
>>
>> Best regards,
>> Louis C
>>
>
