[ 
https://issues.apache.org/jira/browse/HUDI-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma reassigned HUDI-2318:
--------------------------------------

    Assignee: Pratyaksh Sharma

> Enhance and stabilize multi-table deltastreamer
> -----------------------------------------------
>
>                 Key: HUDI-2318
>                 URL: https://issues.apache.org/jira/browse/HUDI-2318
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Utilities
>            Reporter: sivabalan narayanan
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>
> Currently, the multi-table deltastreamer supports only COW tables, and only 
> in run-once mode. We need to enhance it considerably and make it usable 
> across more scenarios. 
>  
> The community has asked for this. A typical use case:
> I have 1000+ tables and I want to ingest all of them into Hudi efficiently. 
> I don't want to run 1000+ deltastreamer instances, because I would have to 
> allot resources for every instance. 
>  
> Requirements
>  * Add MOR support to the multi-table deltastreamer.
>  * Add continuous mode support to the multi-table deltastreamer.
>  * Add support for syncing different tables concurrently. As of now, tables 
> are synced serially, which may not work well for 1000+ tables. We may not 
> want to sync all 1000+ tables at once either, but a thread pool would give 
> us a bounded level of concurrency. 
>  ** Check out [https://github.com/apache/hudi/issues/2175] for ingesting 
> into multiple Hudi tables using Spark Structured Streaming. We can also see 
> whether it can be added as a utility. 
>  
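
The thread-pool idea in the requirements could be sketched roughly as follows. This is only an illustration of bounded concurrency, not Hudi code: `sync_table`, `sync_all`, and the `max_workers` parameter are hypothetical names, and `sync_table` is a stand-in for whatever one table's ingest round would actually do.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def sync_table(table_name):
    """Hypothetical stand-in for one table's ingest round (not a Hudi API)."""
    return f"{table_name}: synced"


def sync_all(table_names, max_workers=8, sync_fn=sync_table):
    """Sync every table with at most max_workers syncs running at once.

    Returns (results, failures) so that one failing table does not
    abort the syncs of the remaining tables.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all tables up front; the pool caps how many run at once.
        futures = {pool.submit(sync_fn, name): name for name in table_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # record the failure and keep going
                failures[name] = exc
    return results, failures


if __name__ == "__main__":
    tables = [f"table_{i}" for i in range(1000)]
    done, failed = sync_all(tables, max_workers=16)
    print(len(done), "synced,", len(failed), "failed")
```

The pool size is the knob that trades resource usage against throughput: 1000+ tables share a fixed number of workers instead of each needing its own instance, and a failure in one table is collected rather than stopping the others.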



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
