Re: Hudi Concurrent Ingestion with Spark Streaming

2020-09-16 Thread nishith agarwal
Tanu, I'm assuming you're talking about multiple Kafka partitions read by a single Spark Streaming job. In that case, your job can read from multiple partitions, but in the end this data should be written to a single table. The dataset/RDD resulting from reading multiple partitions is passed as a single input to the Hudi write.
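
To make that concrete, here is a minimal sketch of what Nishith describes, using Spark Structured Streaming with the Hudi datasource: one streaming query subscribes to the topic, Spark spreads the topic's partitions across executors, and each micro-batch is written as a single unit to one Hudi table. The broker address, topic name, table name, record-key/precombine columns, and paths below are hypothetical placeholders, not values from this thread.

    import org.apache.spark.sql.SparkSession

    object KafkaToHudi {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hudi")
          .getOrCreate()

        // One streaming source for the whole topic. Spark distributes the
        // Kafka partitions across executors, but the result is still a
        // single streaming DataFrame.
        val kafkaDf = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
          .option("subscribe", "events")                    // hypothetical topic
          .load()

        // Project the Kafka record into the columns Hudi will key on.
        // Using the Kafka message key as the record key is an assumption
        // for this sketch; a real pipeline would parse the payload.
        val records = kafkaDf.selectExpr(
          "CAST(key AS STRING) AS record_key",
          "CAST(value AS STRING) AS payload",
          "timestamp AS ts"
        )

        // Single writer per micro-batch: data read from all Kafka
        // partitions lands in the same Hudi table.
        val query = records.writeStream
          .format("org.apache.hudi")
          .option("hoodie.table.name", "events_table")
          .option("hoodie.datasource.write.recordkey.field", "record_key")
          .option("hoodie.datasource.write.precombine.field", "ts")
          .option("checkpointLocation", "/tmp/checkpoints/events") // hypothetical path
          .outputMode("append")
          .start("/tmp/hudi/events_table")                         // hypothetical base path

        query.awaitTermination()
      }
    }

The point of the sketch is that parallelism lives on the read side (executors consuming Kafka partitions), while the Hudi write happens once per micro-batch, so there is no multi-writer concurrency against the table from a single job.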

Hudi Concurrent Ingestion with Spark Streaming

2020-09-16 Thread tanu dua
Hi, I still need to experiment with this myself, but how does Hudi concurrent ingestion work with Spark Streaming? We have multiple Kafka partitions that Spark is listening on, so there is a possibility that at any given point in time multiple executors will be reading the Kafka partitions and ingesting concurrently.