[GitHub] [arrow] wesm commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox

wesm commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658465856 I think the rationale is that the memory and performance savings related to materializing the partition columns are mostly related to string data. So it's definitely beneficial to

2020-07-14 Thread GitBox

wesm commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658256892 I'm fine with whatever you decide, but it would be good to merge this today.

2020-07-11 Thread GitBox

wesm commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-657140613 Where does this PR stand? It needs a rebase.