[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622354887 Do we need FileSystemDataset? Maybe not. Is it still useful? IMO yes. As mentioned above, I personally find it convenient to know that my dataset has a single format / filesystem, and to be able to easily check this format. Of course, this convenience might not be worth the added complexity in the code. Or the wrapper languages could provide it instead (e.g. in Python we could still have a Dataset subclass for single formats). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-622265852 From a user perspective, I also find that an added convenience. In Python, the `FileSystemDataset.format` attribute lets you check the format of your dataset directly, instead of needing to check `next(dataset.get_fragments()).format`. The latter is not impossible of course, but I was using the format attribute in my dask branch.
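To make the difference in convenience concrete, here is a minimal sketch. The classes `Fragment`, `Dataset` and `SingleFormatDataset` are hypothetical stand-ins written for illustration only; they are not the pyarrow implementation, and only the `format` attribute and `get_fragments()` names are taken from the comment above.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Fragment:
    """Hypothetical stand-in for a dataset fragment."""
    path: str
    format: str


class Dataset:
    """Hypothetical generic dataset that only exposes its fragments."""

    def __init__(self, fragments: List[Fragment]):
        self._fragments = list(fragments)

    def get_fragments(self) -> Iterator[Fragment]:
        return iter(self._fragments)


class SingleFormatDataset(Dataset):
    """Hypothetical subclass that records the one format it holds."""

    def __init__(self, fragments: List[Fragment], format: str):
        super().__init__(fragments)
        self.format = format


fragments = [Fragment("a.parquet", "parquet"), Fragment("b.parquet", "parquet")]
dataset = SingleFormatDataset(fragments, format="parquet")

# Without a dataset-level attribute, the format is read off the first fragment.
format_from_fragment = next(dataset.get_fragments()).format

# With the attribute, the format is available directly on the dataset.
format_from_attribute = dataset.format
```

Both lookups give the same answer here, but the first one only says something about a single fragment, whereas the attribute is a statement about the dataset as a whole.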
[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments
jorisvandenbossche commented on pull request #7073: URL: https://github.com/apache/arrow/pull/7073#issuecomment-621940060

> Fragments are not required to use the same backing filesystem nor the same format.

Shouldn't we require that? Combining datasets with different formats seems to be the goal of UnionDataset.
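As a rough sketch of what such a requirement could look like, the helper below rejects a mixed set of fragments. It is an assumption made for illustration: the function name `common_format_and_filesystem` is hypothetical, and each fragment is modelled as a plain `(format, filesystem)` tuple of strings rather than a real Arrow object.

```python
from typing import Iterable, Tuple


def common_format_and_filesystem(
    fragments: Iterable[Tuple[str, str]],
) -> Tuple[str, str]:
    """Return the single (format, filesystem) pair shared by all fragments.

    Raises ValueError if there are no fragments or if they disagree.
    """
    seen = set(fragments)
    if not seen:
        raise ValueError("no fragments given")
    if len(seen) > 1:
        raise ValueError(
            f"fragments mix formats/filesystems: {sorted(seen)}"
        )
    return next(iter(seen))


# All fragments agree, so the shared pair is returned.
shared = common_format_and_filesystem(
    [("parquet", "local"), ("parquet", "local")]
)

# Mixed formats are rejected; a union of datasets would be used instead.
try:
    common_format_and_filesystem([("parquet", "local"), ("ipc", "local")])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```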