William Liu created ARROW-9063:
----------------------------------

             Summary: [Python][C++] Order of files are not respected using the new pyarrow.dataset
                 Key: ARROW-9063
                 URL: https://issues.apache.org/jira/browse/ARROW-9063
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.17.1
         Environment: ubuntu-18.04
            Reporter: William Liu

Say we have multiple Parquet files in the same folder (a.parquet, b.parquet, c.parquet). If I pass a list of their file paths (fps) to either of the two statements below:

{code:python}
ds = pq.ParquetDataset(fps, use_legacy_dataset=False)
ds = pyarrow.dataset.dataset(fps)
{code}

then the rows of the resulting table do not follow the order of the files in the list. Rows from different files are interleaved:

aaaa...bbbb...aaa...bbbb...aaa...ccc..bbb...cccc