[jira] [Updated] (ARROW-6396) [C++] Add ResolveNullOptions to Logical kernels

2019-11-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6396: -- Labels: dataset pull-request-available (was: dataset) > [C++] Add ResolveNullOptions to

[jira] [Commented] (ARROW-7056) [Python] test errors without S3

2019-11-04 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966838#comment-16966838 ] Antoine Pitrou commented on ARROW-7056: --- After a cleanup, I'm not seeing any failures anymore.

[jira] [Resolved] (ARROW-7033) [C++] Error in ./configure step for jemalloc when building on OSX 10.14.6

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-7033. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5769

[jira] [Closed] (ARROW-7056) [Python] test errors without S3

2019-11-04 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou closed ARROW-7056. - Resolution: Cannot Reproduce > [Python] test errors without S3 > ---

[jira] [Updated] (ARROW-7057) [C++] Add API to parse URI query strings

2019-11-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7057: -- Labels: pull-request-available (was: ) > [C++] Add API to parse URI query strings >

[jira] [Resolved] (ARROW-6944) [Rust] Add StringType

2019-11-04 Thread Neville Dipale (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Dipale resolved ARROW-6944. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5722

[jira] [Commented] (ARROW-7043) [Python] pyarrow.csv.read_csv, memory consumed much larger than raw pandas.read_csv

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966785#comment-16966785 ] Wes McKinney commented on ARROW-7043: - The jemalloc changes in 0.15.1 could be a factor here >

[jira] [Updated] (ARROW-3789) [Python] Enable calling object in Table.to_pandas to "self-destruct" for improved memory use

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3789: Fix Version/s: 1.0.0 > [Python] Enable calling object in Table.to_pandas to "self-destruct" for >

[jira] [Assigned] (ARROW-3789) [Python] Enable calling object in Table.to_pandas to "self-destruct" for improved memory use

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3789: --- Assignee: Wes McKinney > [Python] Enable calling object in Table.to_pandas to

[jira] [Resolved] (ARROW-6825) [C++] Rework CSV reader IO around readahead iterator

2019-11-04 Thread Ben Kietzman (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman resolved ARROW-6825. - Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5727

[jira] [Commented] (ARROW-7043) [Python] pyarrow.csv.read_csv, memory consumed much larger than raw pandas.read_csv

2019-11-04 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966682#comment-16966682 ] Antoine Pitrou commented on ARROW-7043: --- I've tried to reproduce using pyarrow 0.15.1, compiled

[jira] [Created] (ARROW-7057) [C++] Add API to parse URI query strings

2019-11-04 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7057: - Summary: [C++] Add API to parse URI query strings Key: ARROW-7057 URL: https://issues.apache.org/jira/browse/ARROW-7057 Project: Apache Arrow Issue Type:

[jira] [Commented] (ARROW-7056) [Python] test errors without S3

2019-11-04 Thread Krisztian Szucs (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966759#comment-16966759 ] Krisztian Szucs commented on ARROW-7056: [~apitrou] the fixture is marked with

[jira] [Created] (ARROW-7056) [Python] test errors without S3

2019-11-04 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7056: - Summary: [Python] test errors without S3 Key: ARROW-7056 URL: https://issues.apache.org/jira/browse/ARROW-7056 Project: Apache Arrow Issue Type: Bug

[jira] [Commented] (ARROW-7028) [R] Date roundtrip results in different R storage mode

2019-11-04 Thread Neal Richardson (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966831#comment-16966831 ] Neal Richardson commented on ARROW-7028: The error message suggests a workaround that avoids

[jira] [Resolved] (ARROW-6942) [Developer] Add support for Parquet in pull request check by GitHub Actions

2019-11-04 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-6942. --- Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5768

[jira] [Reopened] (ARROW-7056) [Python] test errors without S3

2019-11-04 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reopened ARROW-7056: --- Woops. I was checking {{test_filesystem.py}}. {{test_fs.py}} still fails. > [Python] test

[jira] [Created] (ARROW-7058) [C++] FileSystemDataSourceDiscovery should apply partition schemes relative to the base_dir of its selector

2019-11-04 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7058: --- Summary: [C++] FileSystemDataSourceDiscovery should apply partition schemes relative to the base_dir of its selector Key: ARROW-7058 URL:

[jira] [Updated] (ARROW-7058) [C++] FileSystemDataSourceDiscovery should apply partition schemes relative to the base_dir of its selector

2019-11-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7058: -- Labels: pull-request-available (was: ) > [C++] FileSystemDataSourceDiscovery should apply

[jira] [Commented] (ARROW-4890) [Python] Spark+Arrow Grouped pandas UDAF - read length must be positive or -1

2019-11-04 Thread SURESH CHAGANTI (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966994#comment-16966994 ] SURESH CHAGANTI commented on ARROW-4890: [~emkornfi...@gmail.com] is there any size limit as to

[jira] [Created] (ARROW-7061) [C++][Dataset] FileSystemDiscovery with ParquetFileFormat should ignore files that aren't Parquet

2019-11-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7061: -- Summary: [C++][Dataset] FileSystemDiscovery with ParquetFileFormat should ignore files that aren't Parquet Key: ARROW-7061 URL:

[jira] [Created] (ARROW-7063) [C++] Schema print method prints too much metadata

2019-11-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7063: -- Summary: [C++] Schema print method prints too much metadata Key: ARROW-7063 URL: https://issues.apache.org/jira/browse/ARROW-7063 Project: Apache Arrow

[jira] [Assigned] (ARROW-7047) [C++][Dataset] Filter expressions should not require exact type match

2019-11-04 Thread Ben Kietzman (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman reassigned ARROW-7047: --- Assignee: Ben Kietzman > [C++][Dataset] Filter expressions should not require exact type

[jira] [Updated] (ARROW-7060) [R] Post-0.15.1 cleanup

2019-11-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7060: -- Labels: pull-request-available (was: ) > [R] Post-0.15.1 cleanup > --- >

[jira] [Created] (ARROW-7059) Reading parquet file with many columns is still slow for 0.15.1

2019-11-04 Thread Eric Kisslinger (Jira)
Eric Kisslinger created ARROW-7059: -- Summary: Reading parquet file with many columns is still slow for 0.15.1 Key: ARROW-7059 URL: https://issues.apache.org/jira/browse/ARROW-7059 Project: Apache

[jira] [Created] (ARROW-7064) [R] Implement null type

2019-11-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7064: -- Summary: [R] Implement null type Key: ARROW-7064 URL: https://issues.apache.org/jira/browse/ARROW-7064 Project: Apache Arrow Issue Type: New Feature

[jira] [Commented] (ARROW-7056) [Python] test errors without S3

2019-11-04 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966926#comment-16966926 ] Antoine Pitrou commented on ARROW-7056: --- The problem seems to be that a function like

[jira] [Commented] (ARROW-7056) [Python] test errors without S3

2019-11-04 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966927#comment-16966927 ] Antoine Pitrou commented on ARROW-7056: --- I think we should ditch the pytest magic and use regular

[jira] [Created] (ARROW-7060) [R] Post-0.15.1 cleanup

2019-11-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7060: -- Summary: [R] Post-0.15.1 cleanup Key: ARROW-7060 URL: https://issues.apache.org/jira/browse/ARROW-7060 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-7062) [C++] Parquet file parse error messages should include the file name

2019-11-04 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7062: -- Summary: [C++] Parquet file parse error messages should include the file name Key: ARROW-7062 URL: https://issues.apache.org/jira/browse/ARROW-7062 Project:

[jira] [Assigned] (ARROW-6625) [Python] Allow concat_tables to null or default fill missing columns

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-6625: --- Assignee: Zhuo Peng > [Python] Allow concat_tables to null or default fill missing columns

[jira] [Commented] (ARROW-7059) Reading parquet file with many columns is still slow for 0.15.1

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967226#comment-16967226 ] Wes McKinney commented on ARROW-7059: - I can confirm that the performance in 0.15.0/0.15.1 is slower.

[jira] [Updated] (ARROW-7059) [Python] Reading parquet file with many columns is much slower in 0.15.x versus 0.14.x

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-7059: Summary: [Python] Reading parquet file with many columns is much slower in 0.15.x versus 0.14.x

[jira] [Updated] (ARROW-7038) [Python] Reading from HDFS after ctrl+c(SIGTERM) causes python hangs

2019-11-04 Thread Kevin Jung (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated ARROW-7038: -- Attachment: arrow version.png > [Python] Reading from HDFS after ctrl+c(SIGTERM) causes python hangs >

[jira] [Commented] (ARROW-7038) [Python] Reading from HDFS after ctrl+c(SIGTERM) causes python hangs

2019-11-04 Thread Kevin Jung (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967246#comment-16967246 ] Kevin Jung commented on ARROW-7038: --- [~wesm] It is still present when I upgraded pyarrow to 0.15.1.

[jira] [Commented] (ARROW-6820) [C++] [Doc] [Format] Map specification and implementation inconsistent

2019-11-04 Thread Bryan Cutler (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967123#comment-16967123 ] Bryan Cutler commented on ARROW-6820: - I don't have a strong preference to specific naming, but we

[jira] [Commented] (ARROW-7038) [Python] Reading from HDFS after ctrl+c(SIGTERM) causes python hangs

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967210#comment-16967210 ] Wes McKinney commented on ARROW-7038: - I'm not sure the reason. You're using a pretty old version of

[jira] [Updated] (ARROW-7038) [Python] Reading from HDFS after ctrl+c(SIGTERM) causes python hangs

2019-11-04 Thread Kevin Jung (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated ARROW-7038: -- Attachment: (was: arrow version.png) > [Python] Reading from HDFS after ctrl+c(SIGTERM) causes

[jira] [Updated] (ARROW-6904) [Python] Implement MapArray and MapType

2019-11-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6904: -- Labels: pull-request-available (was: ) > [Python] Implement MapArray and MapType >

[jira] [Updated] (ARROW-7049) [C++] warnings building on mingw-w64

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-7049: Fix Version/s: 1.0.0 > [C++] warnings building on mingw-w64 >

[jira] [Updated] (ARROW-7050) [R] Compiler warning in R bindings

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-7050: Fix Version/s: 1.0.0 > [R] Compiler warning in R bindings > -- > >

[jira] [Assigned] (ARROW-7052) [C++] Datasets example fails to build with ARROW_SHARED=OFF

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-7052: --- Assignee: Wes McKinney > [C++] Datasets example fails to build with ARROW_SHARED=OFF >

[jira] [Resolved] (ARROW-7052) [C++] Datasets example fails to build with ARROW_SHARED=OFF

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-7052. - Resolution: Fixed Issue resolved by pull request 5776

[jira] [Assigned] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2428: --- Assignee: Joris Van den Bossche > [Python] Add API to map Arrow types (including extension

[jira] [Resolved] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-2428. - Resolution: Fixed Issue resolved by pull request 5512

[jira] [Updated] (ARROW-7038) [Python] Reading from HDFS after ctrl+c(SIGTERM) causes python hangs

2019-11-04 Thread Kevin Jung (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Jung updated ARROW-7038: -- Attachment: arrow version.png > [Python] Reading from HDFS after ctrl+c(SIGTERM) causes python hangs >

[jira] [Resolved] (ARROW-7057) [C++] Add API to parse URI query strings

2019-11-04 Thread Ben Kietzman (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman resolved ARROW-7057. - Resolution: Fixed Issue resolved by pull request 5770

[jira] [Updated] (ARROW-7052) [C++] Datasets example fails to build with ARROW_SHARED=OFF

2019-11-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7052: -- Labels: pull-request-available (was: ) > [C++] Datasets example fails to build with

[jira] [Updated] (ARROW-6282) [Format] Support lossy compression

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-6282: Summary: [Format] Support lossy compression (was: Support lossy compression) > [Format] Support

[jira] [Closed] (ARROW-7053) [Python] setuptools-scm produces incorrect version at apache-arrow-0.15.1 tag

2019-11-04 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-7053. --- Resolution: Cannot Reproduce This seems to have been caused by not running {{git submodule update}}

[jira] [Comment Edited] (ARROW-7038) [Python] Reading from HDFS after ctrl+c(SIGTERM) causes python hangs

2019-11-04 Thread Kevin Jung (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967246#comment-16967246 ] Kevin Jung edited comment on ARROW-7038 at 11/5/19 7:24 AM: [~wesm] It is

[jira] [Commented] (ARROW-7028) [R] Date roundtrip results in different R storage mode

2019-11-04 Thread Bai Ming (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966644#comment-16966644 ] Bai Ming commented on ARROW-7028: - Hi, I've just recently tried out the arrow R package, and I've run

[jira] [Closed] (ARROW-6935) [Java] Improve the performance of comparing two blocks of heap data

2019-11-04 Thread Liya Fan (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liya Fan closed ARROW-6935. --- Resolution: Won't Fix > [Java] Improve the performance of comparing two blocks of heap data >

[jira] [Commented] (ARROW-7026) [Java] Remove assertions in MessageSerializer/vector/writer/reader

2019-11-04 Thread Ji Liu (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966597#comment-16966597 ] Ji Liu commented on ARROW-7026: --- Besides, some assertions are in the hot path (i.e. {{VarCharVector}} set