
[GitHub] [arrow] martindurant commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-15 Thread GitBox

martindurant commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658919780 Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [arrow] martindurant commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-15 Thread GitBox

martindurant commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658917048 (please please do ensure that dict encoding does happen, at least for str)

[GitHub] [arrow] martindurant commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox

martindurant commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658254501 I would advocate for yes, keeping dict encoding for the partitioning column: I think it's the obvious mapping, and of course it saves a lot of memory/processing. Part of the reason