[jira] [Commented] (ARROW-9065) [Python] Support parsing date32 in dataset partition folders
[ https://issues.apache.org/jira/browse/ARROW-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133353#comment-17133353 ]

Francois Saint-Jacques commented on ARROW-9065:
---

There's a general void of time based types support in dataset, we need to clean this before 1.0.0.

> [Python] Support parsing date32 in dataset partition folders
>
> Key: ARROW-9065
> URL: https://issues.apache.org/jira/browse/ARROW-9065
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Dave Hirschfeld
> Assignee: Francois Saint-Jacques
> Priority: Minor
> Labels: dataset
>
> I have some data which is partitioned by year/month/date. It would be useful
> if the date could be automatically parsed:
> {code:python}
> In [17]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()),
> ("day", pa.date32())])
> In [18]: partition = DirectoryPartitioning(schema)
> In [19]: partition.parse("/2020/06/2020-06-08")
> ---
> ArrowNotImplementedError Traceback (most recent call last)
> in
> > 1 partition.parse("/2020/06/2020-06-08")
> ~\envs\dev\lib\site-packages\pyarrow\_dataset.pyx in
> pyarrow._dataset.Partitioning.parse()
> ~\envs\dev\lib\site-packages\pyarrow\error.pxi in
> pyarrow.lib.pyarrow_internal_check_status()
> ~\envs\dev\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: parsing scalars of type date32[day]
> {code}
> Not a big issue since you can just use string and convert, but nevertheless
> it would be nice if it Just Worked
> {code}
> In [22]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()),
> ("day", pa.string())])
> In [23]: partition = DirectoryPartitioning(schema)
> In [24]: partition.parse("/2020/06/2020-06-08")
> Out[24]: 6:int8)) and (day == 2020-06-08:string))>
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
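The "use string and convert" workaround mentioned in the report can be sketched with the Python standard library alone. This is only an illustration of the conversion step, not pyarrow's partitioning API: the helper name `parse_partition_path` and the returned dictionary layout are hypothetical, and the sketch assumes the day segment is always written as ISO `YYYY-MM-DD`. The one Arrow-specific fact it relies on is that a date32 value is a count of days since the UNIX epoch (1970-01-01).

```python
from datetime import date

# Arrow's date32 type stores a date as the number of days since this epoch.
EPOCH = date(1970, 1, 1)


def parse_partition_path(path):
    """Hypothetical helper: split '/year/month/day' and convert each segment.

    Assumes exactly three segments and an ISO 'YYYY-MM-DD' day segment.
    """
    year_str, month_str, day_str = path.strip("/").split("/")
    day = date.fromisoformat(day_str)  # raises ValueError on other layouts
    return {
        "year": int(year_str),
        "month": int(month_str),
        "day": day,
        # Integer that a date32 column would hold for this date.
        "day_as_date32": (day - EPOCH).days,
    }


parsed = parse_partition_path("/2020/06/2020-06-08")
print(parsed)
```

Doing the conversion after the path has been split keeps the partition field a plain string at parse time, which is what the second `{code}` block in the report does, and defers the date interpretation to the caller.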
[jira] [Commented] (ARROW-9065) [Python] Support parsing date32 in dataset partition folders
[ https://issues.apache.org/jira/browse/ARROW-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133207#comment-17133207 ]

Joris Van den Bossche commented on ARROW-9065:
--

[~dhirschfeld] thanks for the report

cc [~fsaintjacques] [~bkietz] I am not sure to what extent we want to start parsing non-primitive types? (certainly for things like dates where you can have different formats ..)

> [Python] Support parsing date32 in dataset partition folders
>
> Key: ARROW-9065
> URL: https://issues.apache.org/jira/browse/ARROW-9065
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Reporter: Dave Hirschfeld
> Priority: Minor
>
> I have some data which is partitioned by year/month/date. It would be useful
> if the date could be automatically parsed:
> {code:python}
> In [17]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()),
> ("day", pa.date32())])
> In [18]: partition = DirectoryPartitioning(schema)
> In [19]: partition.parse("/2020/06/2020-06-08")
> ---
> ArrowNotImplementedError Traceback (most recent call last)
> in
> > 1 partition.parse("/2020/06/2020-06-08")
> ~\envs\dev\lib\site-packages\pyarrow\_dataset.pyx in
> pyarrow._dataset.Partitioning.parse()
> ~\envs\dev\lib\site-packages\pyarrow\error.pxi in
> pyarrow.lib.pyarrow_internal_check_status()
> ~\envs\dev\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: parsing scalars of type date32[day]
> {code}
> Not a big issue since you can just use string and convert, but nevertheless
> it would be nice if it Just Worked
> {code}
> In [22]: schema = pa.schema([("year", pa.int16()), ("month", pa.int8()),
> ("day", pa.string())])
> In [23]: partition = DirectoryPartitioning(schema)
> In [24]: partition.parse("/2020/06/2020-06-08")
> Out[24]: 6:int8)) and (day == 2020-06-08:string))>
> {code}
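The concern about dates having different formats can be made concrete with a small standard-library sketch. It is not taken from the thread and does not use pyarrow; the helper `parse_with_format` and the sample segment `06-08-2020` are hypothetical. It shows that the same directory name yields two different dates depending on which layout a parser assumes, whereas an ISO `YYYY-MM-DD` segment such as the reporter's has a single reading.

```python
from datetime import date, datetime


def parse_with_format(text, fmt):
    """Hypothetical helper: parse a folder name using an explicit strptime format."""
    return datetime.strptime(text, fmt).date()


# A made-up folder name that is ambiguous without knowing the layout.
segment = "06-08-2020"
month_first = parse_with_format(segment, "%m-%d-%Y")  # read as June 8
day_first = parse_with_format(segment, "%d-%m-%Y")    # read as August 6

# The ISO layout used in the report's example path.
iso = parse_with_format("2020-06-08", "%Y-%m-%d")

print(month_first, day_first, iso)
```

Under this reading, one possible design is for automatic parsing to accept a single fixed layout only and leave other layouts as strings for the caller to convert; whether that is the right trade-off for the dataset API is the question the comment raises.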