[ https://issues.apache.org/jira/browse/HUDI-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pratyaksh Sharma reassigned HUDI-2318:
--------------------------------------

    Assignee: Pratyaksh Sharma

> Enhance and stablize multi-table deltastreamer
> ----------------------------------------------
>
>                 Key: HUDI-2318
>                 URL: https://issues.apache.org/jira/browse/HUDI-2318
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Utilities
>            Reporter: sivabalan narayanan
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>
> Currently, the multi-table deltastreamer supports only COW tables, and only in run-once mode.
> We need to enhance it considerably and make it usable across all the different scenarios.
>
> There are asks from the community on this. A typical use case:
> I have 1000+ tables and I wish to ingest all of them into Hudi efficiently. I
> don't want to use 1000+ deltastreamer instances, as I would have to allot resources
> for every deltastreamer instance.
>
> Requirements
> * Add MOR support to the multi-table deltastreamer.
> * Add continuous mode support to the multi-table deltastreamer.
> * Add support for syncing concurrently across different tables. As of now, each
> table is synced serially, which may not work out well for 1000+ tables. We
> may not want to sync all 1000+ tables concurrently, but with a thread pool
> we can achieve some level of concurrency.
> ** Check out [https://github.com/apache/hudi/issues/2175] to ingest to
> multiple Hudi tables using Spark Structured Streaming. We can also see
> whether we can add it as a utility.
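
The thread-pool requirement above can be illustrated with a minimal, language-neutral sketch. It is not Hudi code and does not use any deltastreamer API: `sync_table`, `sync_all`, and `max_workers` are hypothetical names chosen for illustration, and `sync_table` is only a stand-in for whatever a single table's sync round would do. The point is only the shape of the idea: a fixed-size pool bounds how many tables sync at once, and a failure in one table is recorded without stopping the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def sync_table(table_name):
    """Hypothetical stand-in for one table's ingest/sync round."""
    # A real implementation would run that table's ingestion here.
    return f"synced {table_name}"


def sync_all(table_names, sync_fn=sync_table, max_workers=8):
    """Sync every table with at most `max_workers` running at once.

    Returns (results, failures), both keyed by table name, so that one
    failing table does not abort the others.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool queues the excess work.
        future_to_table = {pool.submit(sync_fn, name): name for name in table_names}
        for future in as_completed(future_to_table):
            name = future_to_table[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # record the failure and keep going
                failures[name] = exc
    return results, failures


if __name__ == "__main__":
    tables = [f"table_{i}" for i in range(1000)]
    ok, failed = sync_all(tables, max_workers=16)
    print(len(ok), "synced,", len(failed), "failed")
```

The pool size is the knob the issue alludes to: 1000+ tables are all submitted, but only `max_workers` of them run at any moment, so resource use stays bounded while still avoiding a fully serial pass.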