[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

2023-02-08 Thread via GitHub


alamb commented on PR #4908:
URL: 
https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1423041952

   Thanks @saikrishna1-bidgely  -- will check them out


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

2023-01-27 Thread via GitHub


alamb commented on PR #4908:
URL: 
https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1406369732

   Please mark it ready for review once if I made a mistake or when it is next 
ready


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

2023-01-27 Thread via GitHub


alamb commented on PR #4908:
URL: 
https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1406369413

   Marking as a draft so it is clear this is waiting on some updates rather 
than waiting on review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

2023-01-21 Thread via GitHub


alamb commented on PR #4908:
URL: 
https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1399227866

   Thank you @saikrishna1-bidgely  -- sounds like great progress. I think if we 
go the trait route as long as we document how to use it (basically we can make 
a doc example showing the use of a ) I think we'll be good. 
   
   Thanks again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

2023-01-20 Thread GitBox


alamb commented on PR #4908:
URL: 
https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1398306975

   Here is a proposal to at least add some more docs: 
https://github.com/apache/arrow-datafusion/pull/5001 -- it is not necessarily 
mutually exclusive with updating the signatures as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

2023-01-20 Thread GitBox


alamb commented on PR #4908:
URL: 
https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1398259203

   > Quick question, wouldn't we want to support multiple paths anyways since 
we would want to use it in DataFusion-CLI?
   
   I think that is likely a separate question -- DataFusion CLI can potentially 
create a ListingTable directly rather than using the higher level 
SessionContext API as well
   
   > Also, if we are able to crack the implementation which can take both 
single and multiple paths in the same function, API itself should be unchanged, 
right?
   
   Yes, I agree 
   
   > So, I think the following are our options:
   
   I agree with your assessments -- I do think documentation on how to create a 
ListingTable would go a long way. It seems we are lacking in such docs now 
https://docs.rs/datafusion/16.0.0/datafusion/datasource/listing/struct.ListingTable.html
   
   I think documentation will help regardless of what else we chose to do -- 
I'll go write some now. Thank you for this good discussion
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #4908: added a method to read multiple locations at the same time.

2023-01-19 Thread GitBox


alamb commented on PR #4908:
URL: 
https://github.com/apache/arrow-datafusion/pull/4908#issuecomment-1397083616

   > What do you think about having a single method which only takes a list of 
paths? For a single path, the callee can create a slice/Vec. This would be a 
lot simpler to do.
   
   I was thinking about this PR and I have an alternate suggestion
   
   It seems to me that  `read_parquet`, `read_avro`, etc are wrappers to 
simplify the process of creating a `ListingTable`. Support for multiple paths 
starts complicating the API more -- what do you think about instead of adding 
`read_parquet_from_path`s we make it easier to see how to read multiple files 
using the `ListingTable` API directly?
   
   For example, I bet if we added a doc example like the following
   
   ```rust
   /// Creates a [`DataFrame`] for reading a Parquet data source from a 
single file or directory. 
   ///
   /// Note: if you want to read from multiple files, or control other 
behaviors
   /// you can use the [`ListingTable`] API directly. For example to read 
multiple files
   /// 
   /// ```
   /// Example here (basically copy/paste the implementation of 
read_parquet and support multiple files)
   /// ```
   pub async fn read_parquet(
   ,
   table_path: impl AsRef,
   options: ParquetReadOptions<'_>,
   ) -> Result {
   ...
   ```
   
   We could give similar treatment to the docstrings for `read_avro` and 
`read_csv` (perhaps by pointing to the docs for `read_parquet` for an example 
of creating `ListingTable`s)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org