> I'm aware of some ideas so far: one simple idea is killing the topology when
> the spout gets all acknowledge messages and the data source has no more data.
There would be more work needed to optimize the shuffle and scheduling (batch), but I agree that with minor modifications to the Spout API (like having a batch Spout that returns some end-of-data marker) and waiting until all messages are acked, Storm can address most of the simple bounded use cases (batch processing).

From: Jungtaek Lim <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, July 5, 2018 at 5:49 AM
To: "[email protected]" <[email protected]>
Subject: Re: Batch processing

Hi Gaurav,

While the streaming side of Apache Spark lacks some features (like branching) because it is bound to its MapReduce nature, I guess in most cases that doesn't matter when you are doing batch.

If you are looking for a batch solution that is streaming-oriented in nature (so, ingesting a finite source into a streaming engine), Apache Flink would be the thing to consider. If you don't mind defining a spout that works with a finite data source, you can still consider Apache Storm. I'm aware of some ideas so far: one simple idea is killing the topology when the spout gets all acknowledge messages and the data source has no more data.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, Jul 5, 2018 at 9:35 PM, Gaurav Sehgal <[email protected]> wrote:

Hello,

Is there another framework like Apache Storm which does batch processing of data? I have been looking at Apache Spark, but the use cases it addresses are more of a MapReduce nature. The use case we are looking at is to read data from a data source such as Mongo in batches and upload it to Elasticsearch.

Regards,
Gaurav
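For what it's worth, the termination condition being discussed (source exhausted AND all emitted tuples acked) can be sketched in plain Java. This is only a minimal model of the idea, not the real Storm spout API: the class and method names below (`BoundedSpoutSketch`, `isComplete`) are hypothetical, chosen to mirror `nextTuple`/`ack` semantics.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Sketch of the "bounded spout" idea from this thread: emit tuples from a
// finite source, track pending (un-acked) message ids, and report "done"
// only when the source is exhausted AND every emitted tuple has been acked.
// A supervising component could kill the topology once isComplete() is true.
class BoundedSpoutSketch {
    private final Iterator<String> source;          // finite data source
    private final Set<Integer> pending = new HashSet<>();
    private int nextId = 0;

    BoundedSpoutSketch(Iterable<String> data) {
        this.source = data.iterator();
    }

    // Analogue of nextTuple(): emit one tuple tagged with a message id.
    // Returns null when the source has no more data (the end-of-data marker).
    String nextTuple() {
        if (!source.hasNext()) {
            return null; // end-of-data
        }
        int id = nextId++;
        pending.add(id);
        return id + ":" + source.next();
    }

    // Analogue of ack(msgId): the tuple was fully processed downstream.
    void ack(int msgId) {
        pending.remove(msgId);
    }

    // Termination condition: no more input and nothing still in flight.
    boolean isComplete() {
        return !source.hasNext() && pending.isEmpty();
    }

    public static void main(String[] args) {
        BoundedSpoutSketch spout = new BoundedSpoutSketch(List.of("a", "b", "c"));
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            int id = Integer.parseInt(tuple.split(":", 2)[0]);
            spout.ack(id); // pretend the downstream bolt processed it
        }
        System.out.println(spout.isComplete()); // prints "true"
    }
}
```

In a real topology the ack would arrive asynchronously via the spout's `ack` callback rather than inline like this, which is exactly why the "wait until all messages are acked" part of the proposal matters.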
