> I'm aware of some ideas so far: one simple idea is killing the topology when
> the spout gets all acknowledge messages and the data source has no more data.
There would be more work needed to optimize the shuffle and scheduling (batch), but I agree that with minor modifications to the Spout API (like having a batch Spout that returns some end-of-data marker) and waiting until all messages are acked, Storm can address most of the simple bounded use cases (batch processing).

From: Jungtaek Lim <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, July 5, 2018 at 5:49 AM
To: "[email protected]" <[email protected]>
Subject: Re: Batch processing

Hi Gaurav,

While the streaming side of Apache Spark lacks some features (like branching) because it is bound to its MapReduce nature, I guess in most cases that doesn't matter when you are doing batch.

If you are looking for a batch solution that is streaming-oriented in nature (so, ingesting a finite source into a streaming engine), Apache Flink would be the thing to consider. If you don't mind defining a spout that works with a finite data source, you can still consider Apache Storm. I'm aware of some ideas so far: one simple idea is killing the topology when the spout gets all acknowledge messages and the data source has no more data.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, Jul 5, 2018 at 9:35 PM, Gaurav Sehgal <[email protected]> wrote:

Hello,

Is there another framework like Apache Storm which does batch processing of data? I have been looking at Apache Spark, but the use cases it addresses are more of a MapReduce nature. The use case we are looking at is to read data from a data source such as Mongo in batches and upload it to Elasticsearch.

Regards,
Gaurav
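For what it's worth, the termination condition being discussed (source exhausted AND all emitted tuples acked) can be sketched in plain Java. This is only a minimal model of the idea, not the real Storm spout API: the class and method names below (`BoundedSpoutSketch`, `isComplete`) are hypothetical, chosen to mirror `nextTuple`/`ack` semantics.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Sketch of the "bounded spout" idea from this thread: emit tuples from a
// finite source, track pending (un-acked) message ids, and report "done"
// only when the source is exhausted AND every emitted tuple has been acked.
// A supervising component could kill the topology once isComplete() is true.
class BoundedSpoutSketch {
    private final Iterator<String> source;          // finite data source
    private final Set<Integer> pending = new HashSet<>();
    private int nextId = 0;

    BoundedSpoutSketch(Iterable<String> data) {
        this.source = data.iterator();
    }

    // Analogue of nextTuple(): emit one tuple tagged with a message id.
    // Returns null when the source has no more data (the end-of-data marker).
    String nextTuple() {
        if (!source.hasNext()) {
            return null; // end-of-data
        }
        int id = nextId++;
        pending.add(id);
        return id + ":" + source.next();
    }

    // Analogue of ack(msgId): the tuple was fully processed downstream.
    void ack(int msgId) {
        pending.remove(msgId);
    }

    // Termination condition: no more input and nothing still in flight.
    boolean isComplete() {
        return !source.hasNext() && pending.isEmpty();
    }

    public static void main(String[] args) {
        BoundedSpoutSketch spout = new BoundedSpoutSketch(List.of("a", "b", "c"));
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            int id = Integer.parseInt(tuple.split(":", 2)[0]);
            spout.ack(id); // pretend the downstream bolt processed it
        }
        System.out.println(spout.isComplete()); // prints "true"
    }
}
```

In a real topology the ack would arrive asynchronously via the spout's `ack` callback rather than inline like this, which is exactly why the "wait until all messages are acked" part of the proposal matters.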
