[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox

jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622354887 Do we need FileSystemDataset? Maybe not. Is it still useful? IMO, yes. As mentioned above, I personally find it convenient to know that my dataset has a single

[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox

jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622265852 From a user perspective, I also find that an added convenience. In Python, the `FileSystemDataset.format` attribute lets you check the format of your dataset (instead

[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-04-30 Thread GitBox

jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-621940060 > Fragments are not required to use the same backing filesystem nor the same format. Shouldn't we require that? That seems to be the goal of UnionDataset: to combine