Re: DISCUSS code, config, design walk through sessions

2020-07-08 Thread Shiyan Xu
The time slot works for me, but I guess it may conflict with work hours in other time zones. Maybe alternating morning and evening sessions in PST would work better? On Wed, Jul 8, 2020 at 9:07 PM Vinoth Chandar wrote: > Apologies. Should have been more detailed. > > It’s Tuesday. Please see here for

Re: DISCUSS code, config, design walk through sessions

2020-07-08 Thread Vinoth Chandar
Apologies. Should have been more detailed. It’s Tuesday. Please see here for details https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Community+Weekly+Sync On Wed, Jul 8, 2020 at 8:55 PM Adam Feldman wrote: > Hi, what day will this be? > > On Tue, Jul 7, 2020, 17:25 Vinoth Chandar

Re: DISCUSS code, config, design walk through sessions

2020-07-08 Thread Adam Feldman
Hi, what day will this be? On Tue, Jul 7, 2020, 17:25 Vinoth Chandar wrote: > Thanks, everyone! There appears to be great interest. Let's do it. > > In terms of timing, I was wondering if we can extend one of our existing > community weekly sync meetings for this purpose. > So, timing would be

Re: Hudi - Concurrent Writes

2020-07-08 Thread Vinoth Chandar
We are looking into adding support for parallel writers in 0.6.0. So that should help. I am curious to understand though why you prefer to have 1000 different writer jobs, as opposed to having just one writer. Typical use cases for parallel writing I have seen are related to backfills and such.

Re: Hudi - Concurrent Writes

2020-07-08 Thread Mario de Sá Vera
Hey Shayan, that actually seems like a very good approach. Just curious about the Glue metastore you mentioned: would it be an external metastore for Spark to query over? External in the sense of not managed by Hudi? That would be my only concern: how to maintain the sync between all metadata

Re: Keeping Hive in Sync

2020-07-08 Thread vbal...@apache.org
I don't remember the root cause completely, Vinoth. I guess it was due to some protocol mismatch. Balaji.V On Tuesday, July 7, 2020, 10:25:48 PM PDT, Vinoth Chandar wrote: Hi, Yes. It can be an issue, probably good to get the table written using Hive-style partitioning. I will check on

Hudi - Concurrent Writes

2020-07-08 Thread Shayan Hati
Hi folks, We have a use case where we want to ingest data concurrently for different partitions. Currently, Hudi doesn't support concurrent writes on the same Hudi table. One of the approaches we were considering is to use one Hudi table per partition of data. So let us say we have 1000 partitions,