[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-18 Thread GitBox


jorisvandenbossche commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-645976313


   @bkietz did you already open some follow-up JIRAs? (eg for 
https://github.com/apache/arrow/pull/7156#discussion_r439503475)
   
   I will handle my comment at 
https://github.com/apache/arrow/pull/7156#discussion_r442172970 in 
https://github.com/apache/arrow/pull/7468



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-18 Thread GitBox


jorisvandenbossche commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-645971102


   The travis failure is an unrelated Flight failure







[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


jorisvandenbossche commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644966741


   Note that it is not *only* for testing. We certainly use it for testing in 
pyarrow, but in pandas 1.0.4 we accidentally broke reading parquet files from 
file-like objects, and we immediately got some bug reports about it. So actual 
users also do that, to a certain extent.







[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-16 Thread GitBox


jorisvandenbossche commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-644966084


   Taking a step back: wouldn't it be possible to "just" allow creating a 
Fragment from a buffer instead of from a file?
   
   In practice, I think we only need to support dealing with buffers when there 
is a *single* buffer (so not like paths, where you can have multiple paths or a 
directory etc). And then do we need discovery at all? If we can construct a 
Fragment backed by a buffer instead of a file path, then you can create a 
Dataset from that, either with the physical schema of the fragment (no 
unification is needed if there is only one) or with a user-specified 
schema.







[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-06-02 Thread GitBox


jorisvandenbossche commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-637743541


   Regarding my comments on `dataset.py`: since that file is in need of a 
general clean-up regarding "input" handling (the handling of single file path / 
directory path / list of paths / ...), it's certainly fine for me to deal 
with those comments only at that point.







[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles

2020-05-14 Thread GitBox


jorisvandenbossche commented on pull request #7156:
URL: https://github.com/apache/arrow/pull/7156#issuecomment-628657581


   I am testing the other parquet tests that are also skipped, and that has 
already turned up one issue: https://issues.apache.org/jira/browse/ARROW-8799
   
   With a few small edits, all other tests pass now. Should I push that here? 
(I can also keep it for a follow-up PR.)


