[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.
alamb commented on PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1423041952 Thanks @saikrishna1-bidgely -- will check them out -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.
alamb commented on PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1406369732 Please mark it ready for review once if I made a mistake or when it is next ready -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.
alamb commented on PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1406369413 Marking as a draft so it is clear this is waiting on some updates rather than waiting on review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.
alamb commented on PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1399227866 Thank you @saikrishna1-bidgely -- sounds like great progress. I think if we go the trait route as long as we document how to use it (basically we can make a doc example showing the use of a ) I think we'll be good. Thanks again! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.
alamb commented on PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1398306975 Here is a proposal to at least add some more docs: https://github.com/apache/arrow-datafusion/pull/5001 -- it is not necessarily mutually exclusive with updating the signatures as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.
alamb commented on PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1398259203 > Quick question, wouldn't we want to support multiple paths anyways since we would want to use it in DataFusion-CLI? I think that is likely a separate question -- DataFusion CLI can potentially create a ListingTable directly rather than using the higher level SessionContext API as well > Also, if we are able to crack the implementation which can take both single and multiple paths in the same function, API itself should be unchanged, right? Yes, I agree > So, I think the following are our options: I agree with your assessments -- I do think documentation on how to create a ListingTable would go a long way. It seems we are lacking in such docs now https://docs.rs/datafusion/16.0.0/datafusion/datasource/listing/struct.ListingTable.html I think documentation will help regardless of what else we chose to do -- I'll go write some now. Thank you for this good discussion -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.
alamb commented on PR #4908: URL: https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1397083616 > What do you think about having a single method which only takes a list of paths? For a single path, the callee can create a slice/Vec. This would be a lot simpler to do. I was thinking about this PR and I have an alternate suggestion It seems to me that `read_parquet`, `read_avro`, etc are wrappers to simplify the process of creating a `ListingTable`. Support for multiple paths starts complicating the API more -- what do you think about instead of adding `read_parquet_from_path`s we make it easier to see how to read multiple files using the `ListingTable` API directly? For example, I bet if we added a doc example like the following ```rust /// Creates a [`DataFrame`] for reading a Parquet data source from a single file or directory. /// /// Note: if you want to read from multiple files, or control other behaviors /// you can use the [`ListingTable`] API directly. For example to read multiple files /// /// ``` /// Example here (basically copy/paste the implementation of read_parquet and support multiple files) /// ``` pub async fn read_parquet( , table_path: impl AsRef, options: ParquetReadOptions<'_>, ) -> Result { ... ``` We could give similar treatment to the docstrings for `read_avro` and `read_csv` (perhaps by pointing to the docs for `read_parquet` for an example of creating `ListingTable`s) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org