[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox


jorisvandenbossche commented on pull request #7073:
URL: https://github.com/apache/arrow/pull/7073#issuecomment-622354887


   Do we need FileSystemDataset? Maybe not. Is it still useful? IMO, yes. 
   
   As mentioned above, I personally find it convenient to know that my dataset 
has a single format / filesystem, and to be able to easily check this format. 
   Now of course, it might be that this convenience is not worth the added 
complexity in the code, or that such convenience could also be provided by 
the wrapper languages (e.g. in Python we could still have a Dataset subclass 
for single formats).
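
A minimal sketch of what such a wrapper-level subclass could look like. The `Fragment`, `Dataset`, and `SingleFormatDataset` classes here are hypothetical stand-ins written for illustration; they are not the pyarrow classes, and the real bindings may be structured differently.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for a dataset fragment (not the pyarrow class)."""
    path: str
    format: str


class Dataset:
    """Hypothetical base class: fragments may mix formats."""

    def __init__(self, fragments: List[Fragment]):
        self._fragments = list(fragments)

    def get_fragments(self) -> Iterator[Fragment]:
        return iter(self._fragments)


class SingleFormatDataset(Dataset):
    """Wrapper-level subclass that guarantees one format and exposes it."""

    def __init__(self, fragments: List[Fragment]):
        super().__init__(fragments)
        formats = {fragment.format for fragment in self._fragments}
        # Reject both mixed-format and empty inputs: in either case there is
        # no single format to report.
        if len(formats) != 1:
            raise ValueError(f"expected exactly one format, got {sorted(formats)}")
        self.format = formats.pop()
```

With this shape the single-format guarantee lives entirely in the wrapper language, at the cost of validating the fragments once at construction time.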



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-05-01 Thread GitBox


jorisvandenbossche commented on pull request #7073:
URL: https://github.com/apache/arrow/pull/7073#issuecomment-622265852


   From a user perspective, I also find that an added convenience. In Python, 
the `FileSystemDataset.format` attribute lets you check the format of your 
dataset (instead of needing to check `next(dataset.get_fragments()).format`, 
which is not impossible of course, but I was using the format attribute in my 
dask branch). 
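
For illustration, the two ways of checking the format mentioned above can be folded into one small helper. `dataset_format` is a hypothetical name, not part of pyarrow; the sketch only assumes an object that either has a `format` attribute or has a `get_fragments()` method yielding objects with a `format` attribute.

```python
def dataset_format(dataset):
    """Return the dataset's format, preferring the direct attribute.

    Hypothetical helper: falls back to the first fragment's format when the
    dataset itself does not expose `format`. The fallback assumes at least one
    fragment exists (otherwise `next` raises StopIteration) and that all
    fragments share the same format.
    """
    fmt = getattr(dataset, "format", None)
    if fmt is not None:
        return fmt
    return next(dataset.get_fragments()).format
```

The fallback only inspects the first fragment, so it is only meaningful if fragments are guaranteed to share a format, which is exactly the guarantee under discussion.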







[GitHub] [arrow] jorisvandenbossche commented on pull request #7073: ARROW-8318: [C++][Dataset] Construct FileSystemDataset from fragments

2020-04-30 Thread GitBox


jorisvandenbossche commented on pull request #7073:
URL: https://github.com/apache/arrow/pull/7073#issuecomment-621940060


   > Fragments are not required to use the same
   > backing filesystem nor the same format.
   
   Shouldn't we require that? Combining datasets with different formats seems 
to be the goal of UnionDataset.


