Re: Interest in Parquet V3

2024-05-20 Thread Parth Chandra
Hi Parquet team, It is very exciting to see this effort. Thanks Micah for starting this. For most use cases that our team sees, the broad areas for improvement appear to be: 1) Optimizing for cloud storage (latency is high, seeks are expensive) 2) Optimized metadata reading - we've seen

Next community call

2024-05-20 Thread Felix Cheung
Hi folks, how can I find information about how to join?

Re: [DISCUSS] rename parquet-mr to parquet-java?

2024-05-20 Thread Julien Le Dem
Thank you Andrew! On Mon, May 20, 2024 at 7:05 AM Andrew Lamb wrote: > Here is the infrastructure ticket with the request to rename the > repository: https://issues.apache.org/jira/browse/INFRA-25802 > > On Fri, May 17, 2024 at 1:28 PM Prem Sahoo wrote: > > > +1 as it will be apt name . > >

Re: [DISCUSS] rename parquet-mr to parquet-java?

2024-05-20 Thread Andrew Lamb
Here is the infrastructure ticket with the request to rename the repository: https://issues.apache.org/jira/browse/INFRA-25802 On Fri, May 17, 2024 at 1:28 PM Prem Sahoo wrote: > +1 as it will be apt name . > Sent from my iPhone > > > On May 17, 2024, at 12:32 PM, Daniel Weeks wrote: > > > >

Re: [DISCUSS] Propose changing the default branch of the parquet-site repo

2024-05-20 Thread Andrew Lamb
I have filed an issue[1] with this request [1] https://issues.apache.org/jira/browse/INFRA-25801 On Wed, May 15, 2024 at 6:54 PM Julien Le Dem wrote: > +1 > > On Wed, May 15, 2024 at 4:15 AM Andrew Lamb > wrote: > > > I plan to wait until next week to allow any one else who has an opinion >

Re: Is Parquet Meant As a Standalone Database or is a Catalog/Metastore Required?

2024-05-20 Thread Uwe L. Korn
Hello all, I work in environments where both usages exist. The single-file approach, at least in this setting, comes from the fact that a lot of input data for ML pipelines has historically been a single CSV file dump. As a lot of data analysis tools have also been single-threaded, people are