[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles
jorisvandenbossche commented on pull request #7156: URL: https://github.com/apache/arrow/pull/7156#issuecomment-645976313 @bkietz did you already open some follow-up JIRAs? (eg for https://github.com/apache/arrow/pull/7156#discussion_r439503475) I will handle my comment at https://github.com/apache/arrow/pull/7156#discussion_r442172970 in https://github.com/apache/arrow/pull/7468 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles
jorisvandenbossche commented on pull request #7156: URL: https://github.com/apache/arrow/pull/7156#issuecomment-645971102 The Travis failure is an unrelated Flight failure.
[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles
jorisvandenbossche commented on pull request #7156: URL: https://github.com/apache/arrow/pull/7156#issuecomment-644966741 Note that it is not *only* for testing. We certainly use it for testing in pyarrow, but in pandas 1.0.4 we accidentally broke reading parquet files from file-like objects, and we immediately got some bug reports about it. So actual users also do that, to a certain extent.
[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles
jorisvandenbossche commented on pull request #7156: URL: https://github.com/apache/arrow/pull/7156#issuecomment-644966084 Taking a step back: wouldn't it be possible to, e.g., "just" allow creating a Fragment from a buffer instead of from a file? In practice, I think we only need to support buffers when there is a *single* buffer (so not like paths, where you can have multiple paths or a directory etc). And then do we need discovery at all? If we can construct a Fragment backed by a buffer instead of a file path, then you can create a Dataset from that, either with the physical schema of the fragment (no unification is needed if there is only one) or with a user-specified schema.
[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles
jorisvandenbossche commented on pull request #7156: URL: https://github.com/apache/arrow/pull/7156#issuecomment-637743541 Regarding my comments on `dataset.py`: since that file is in need of a general clean-up of its "input" handling (single file path / directory path / list of paths / ...), it's certainly fine with me to deal with those comments only then, as part of that clean-up.
[GitHub] [arrow] jorisvandenbossche commented on pull request #7156: ARROW-8074: [C++][Dataset][Python] FileFragments from buffers and NativeFiles
jorisvandenbossche commented on pull request #7156: URL: https://github.com/apache/arrow/pull/7156#issuecomment-628657581 I am testing the other parquet tests that are also skipped, and that has already turned up one issue: https://issues.apache.org/jira/browse/ARROW-8799 With a few small edits, all other tests pass now. Should I push that here? (I can also keep it for a follow-up PR.)