On Thu, 14 Jul 2022 at 08:34, Micah Kornfield <[email protected]> wrote:
> Hi Louis,
> I would lean against doing this. Parquet doesn't seem to be prescriptive,
> but I understand Time type to have a max value of at most 1 day (i.e. 86400
> seconds, this is how Arrow defines the type at least [1]). Durations can
> be larger and that can lead to ambiguity in handling. Second, the Arrow
> schema should be preserved by default when writing the parquet file so it
> should be recoverable, I understand this doesn't help for non-arrow based
> systems but it potentially gives a work-around in some contexts.
>

Small note: what Micah mentions here about preserving this information in
the Arrow schema (stored in the Parquet file metadata), so that roundtrips
from/to Arrow-based systems work for duration, was implemented earlier this
year and has been available since 8.0.0
(https://issues.apache.org/jira/browse/ARROW-6780).

> I think the more appropriate solution is to see if there is interest in
> extending Parquet's type system for this type OR figuring out conventions
> that are more universal for logical types that aren't in Parquet's type
> system.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L222
>
> On Tue, Jul 12, 2022 at 8:02 AM Louis C <[email protected]> wrote:
>
>> Hello,
>>
>> I integrated the arrow library to a larger project, and was testing doing
>> exports/imports of the same tables to see if it behaved well. Doing this, I
>> became aware that arrow DURATION types were exported as INT64 (as the
>> corresponding number of µs if I remember correctly) in the parquet export,
>> and then imported as INT64 types. So the parquet export loses the type for
>> the DURATION fields.
>> Would not it be better to export the DURATION type as the parquet logical
>> type "TIME_MICROS" (meaning TIME with precision micro, as TIME_MICROS seems
>> to be somewhat deprecated (
>> https://apache.googlesource.com/parquet-format/+/refs/heads/bloom-filter/LogicalTypes.md))
>> as is doing matlab (see
>> https://fr.mathworks.com/help/matlab/import_export/datatype-mappings-matlab-parquet.html)
>> ?
>>
>> Best regards,
>> Louis C
>>
>
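
To make the range concern quoted above concrete, here is a minimal sketch
using only Python's standard library (it does not touch the Arrow or Parquet
APIs). The names MICROS_PER_DAY and fits_in_time_micros are hypothetical, and
treating valid time-of-day values as the half-open range [0, 1 day) is an
assumption based on the 86400-second limit Micah mentions:

```python
from datetime import timedelta

# Microseconds in one day. Assumed here to be the exclusive upper bound for a
# time-of-day value, following the 86400-second limit quoted in the thread.
MICROS_PER_DAY = 86_400 * 1_000_000


def fits_in_time_micros(duration: timedelta) -> bool:
    """Return True if the duration falls inside the assumed time-of-day range."""
    # Floor-dividing two timedeltas yields an integer count of microseconds.
    micros = duration // timedelta(microseconds=1)
    return 0 <= micros < MICROS_PER_DAY


# A short duration fits, but multi-day and negative durations do not.
for d in (timedelta(hours=5), timedelta(days=2), timedelta(seconds=-1)):
    print(d, fits_in_time_micros(d))
```

Durations of a day or more, and negative durations, fall outside that range,
which is the kind of ambiguity a DURATION to TIME mapping would have to
resolve somehow.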
