[jira] [Updated] (ARROW-5965) [Python] Regression: segfault when reading hive table with v0.14
[ https://issues.apache.org/jira/browse/ARROW-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-5965:
---
    Labels: parquet  (was: )

> [Python] Regression: segfault when reading hive table with v0.14
> ---
>
>                 Key: ARROW-5965
>                 URL: https://issues.apache.org/jira/browse/ARROW-5965
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>            Reporter: H. Vetinari
>            Priority: Critical
>              Labels: parquet
>
> I'm working with pyarrow on a cloudera cluster (CDH 6.1.1), with pyarrow
> installed in a conda env.
> The data I'm reading is a hive(-registered) table written as parquet, and
> with v0.13, reading this table (that is partitioned) does not cause any
> issues.
> The code that worked before and now crashes with v0.14 is simply:
> ```
> import pyarrow.parquet as pq
> pq.ParquetDataset('hdfs:///data/raw/source/table').read()
> ```
> Since it completely crashes my notebook (resp. my REPL ends with "Killed"), I
> cannot report much more, but this is a pretty severe usability restriction.
> So far the solution is to enforce `pyarrow<0.14`

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
[jira] [Updated] (ARROW-5965) [Python] Regression: segfault when reading hive table with v0.14
[ https://issues.apache.org/jira/browse/ARROW-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-5965:
---
    Component/s: Python
[jira] [Updated] (ARROW-5965) [Python] Regression: segfault when reading hive table with v0.14
[ https://issues.apache.org/jira/browse/ARROW-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-5965:
---
    Summary: [Python] Regression: segfault when reading hive table with v0.14  (was: Regression: segfault when reading hive table with v0.14)
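
The report says the crash takes down the notebook (or ends the REPL with "Killed"), which leaves little to attach to the issue. One possible way to gather more detail is to run the failing read in a child interpreter with the standard-library `faulthandler` enabled, so the parent session survives and the exit status can be inspected. The helper names `run_isolated` and `describe_exit` below are hypothetical and not part of pyarrow; this is a minimal sketch assuming a POSIX system and Python 3.7+. Note that a "Killed" message usually means the process received SIGKILL (for example from the kernel's out-of-memory killer), which `faulthandler` cannot intercept; in that case only the exit status is informative.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=None):
    """Run `snippet` in a separate Python process with faulthandler enabled.

    A fatal signal in the child (SIGSEGV, SIGABRT, ...) then cannot take
    down the calling notebook or REPL.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe_exit(returncode):
    """Turn a subprocess return code into a short description (POSIX)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code is the number of the fatal signal.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


# Usage with the snippet from the report (path copied from the issue):
#
#   snippet = (
#       "import pyarrow.parquet as pq\n"
#       "pq.ParquetDataset('hdfs:///data/raw/source/table').read()\n"
#   )
#   proc = run_isolated(snippet)
#   print(describe_exit(proc.returncode))
#   print(proc.stderr)  # faulthandler traceback, if the signal was catchable
```

A result of "killed by SIGSEGV" together with a `faulthandler` traceback in `stderr` would point to a genuine segfault, while "killed by SIGKILL" with empty `stderr` would be more consistent with the process being terminated externally.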