[GitHub] [arrow] wesm commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox


wesm commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658465856


   I think the rationale is that the memory and performance savings when 
materializing the partition columns mostly come from string data. So it's 
definitely beneficial to return string partition columns as dictionary types.

   IMHO, if a change between dictionary and dense is required post-1.0.0, it 
is not the end of the world, so I'm OK with either merging this as is or 
changing all partition types to be dictionary.
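
   To make the string-data point concrete, here is a minimal pure-Python sketch of dictionary encoding versus a dense column. It is not pyarrow code: the helper names (`dictionary_encode`, `dense_string_bytes`, `encoded_bytes`), the 4-byte index width, and the sample partition values are all assumptions for illustration, and the byte counts ignore offset and validity buffers.

```python
def dictionary_encode(values):
    """Split a column into (unique values, per-row integer indices)."""
    dictionary = []   # unique values in first-seen order
    positions = {}    # value -> index into `dictionary`
    indices = []      # one small integer per row
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dense_string_bytes(values):
    """Rough size of a dense string column: UTF-8 bytes of every row."""
    return sum(len(value.encode("utf-8")) for value in values)


def encoded_bytes(dictionary, indices, index_width=4):
    """Rough size of the encoded column: unique strings plus fixed-width indices."""
    return dense_string_bytes(dictionary) + index_width * len(indices)


# Hypothetical partition column: two directory values repeated over many rows.
partition_values = ["2020-07-11"] * 1000 + ["2020-07-14"] * 1000

dictionary, indices = dictionary_encode(partition_values)
dense_size = dense_string_bytes(partition_values)
encoded_size = encoded_bytes(dictionary, indices)

print(dictionary)
print(dense_size, encoded_size)
```

   Under these assumptions the gain comes from storing each repeated string once. For a partition column of 4-byte integers, the dense form already costs about as much per row as the indices would, which fits the view that the savings are mostly about strings.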



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-14 Thread GitBox


wesm commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658256892


   I'm fine with whatever you decide, but it would be good to merge this today.







[GitHub] [arrow] wesm commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

2020-07-11 Thread GitBox


wesm commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-657140613


   Where does this PR stand? It needs a rebase.


