[jira] [Updated] (ARROW-3325) [Python] Support reading Parquet binary/string columns as pandas Categorical
[ https://issues.apache.org/jira/browse/ARROW-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-3325:
    Fix Version/s: (was: 0.14.0) 1.0.0

> [Python] Support reading Parquet binary/string columns as pandas Categorical
>
> Key: ARROW-3325
> URL: https://issues.apache.org/jira/browse/ARROW-3325
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Labels: parquet
> Fix For: 1.0.0
>
> Requires PARQUET-1324 and probably quite a bit of extra work
> Properly implementing this will require dictionary normalization across row
> groups. When reading a new row group, a fast path that compares the current
> dictionary with the prior dictionary should be used. This also needs to
> handle the case where a column chunk "fell back" to PLAIN encoding mid-stream

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
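The normalization described in the issue can be illustrated with a small pure-Python sketch. It is not the Arrow or Parquet implementation; the function names (`unify_chunks`, `_intern`) and the input shape are hypothetical, chosen only for illustration. Each chunk is modeled as a `(dictionary, data)` pair: for a dictionary-encoded chunk, `data` holds indices into `dictionary`; for a chunk that fell back to PLAIN encoding, `dictionary` is `None` and `data` holds the raw values. Nulls are not handled.

```python
def _intern(value, unified, position):
    """Return the global code for value, appending it to the unified dictionary if new."""
    code = position.get(value)
    if code is None:
        code = len(unified)
        position[value] = code
        unified.append(value)
    return code


def unify_chunks(chunks):
    """Merge per-chunk dictionaries into one dictionary plus one code list.

    chunks: iterable of (dictionary, data) pairs (hypothetical input shape).
      - dictionary is a list of values and data is a list of indices into it, or
      - dictionary is None (PLAIN fallback) and data is a list of raw values.
    """
    unified = []        # global dictionary, in order of first appearance
    position = {}       # value -> global code
    codes = []          # global codes for every value read so far
    prev_dictionary = None
    prev_mapping = None

    for dictionary, data in chunks:
        if dictionary is None:
            # PLAIN fallback: no local dictionary, so encode each raw value directly.
            codes.extend(_intern(value, unified, position) for value in data)
            continue

        if dictionary == prev_dictionary:
            # Fast path: same dictionary as the prior chunk, reuse its mapping.
            mapping = prev_mapping
        else:
            # Slow path: map each local index to its global code.
            mapping = [_intern(value, unified, position) for value in dictionary]
            prev_dictionary, prev_mapping = dictionary, mapping

        codes.extend(mapping[index] for index in data)

    return unified, codes
```

The resulting pair has the shape expected by `pandas.Categorical.from_codes(codes, categories=unified)`. Comparing whole dictionaries with `==` is the simplest possible fast path; a real reader would more likely compare buffers or cached hashes.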
[jira] [Updated] (ARROW-3325) [Python] Support reading Parquet binary/string columns as pandas Categorical
[ https://issues.apache.org/jira/browse/ARROW-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-3325:
    Fix Version/s: (was: 0.13.0) 0.14.0

> [Python] Support reading Parquet binary/string columns as pandas Categorical
>
> Key: ARROW-3325
> URL: https://issues.apache.org/jira/browse/ARROW-3325
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Labels: parquet
> Fix For: 0.14.0
[jira] [Updated] (ARROW-3325) [Python] Support reading Parquet binary/string columns as pandas Categorical
[ https://issues.apache.org/jira/browse/ARROW-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-3325:
    Fix Version/s: (was: 0.12.0) 0.13.0

> [Python] Support reading Parquet binary/string columns as pandas Categorical
>
> Key: ARROW-3325
> URL: https://issues.apache.org/jira/browse/ARROW-3325
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: parquet
> Fix For: 0.13.0
[jira] [Updated] (ARROW-3325) [Python] Support reading Parquet binary/string columns as pandas Categorical
[ https://issues.apache.org/jira/browse/ARROW-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-3325:
    Labels: parquet (was: )

> [Python] Support reading Parquet binary/string columns as pandas Categorical
>
> Key: ARROW-3325
> URL: https://issues.apache.org/jira/browse/ARROW-3325
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
> Labels: parquet
> Fix For: 0.12.0