Flavio - I'm new to Spark as well, but I've done stream processing using
other frameworks. My comments below are not specific to Spark Streaming. Maybe
someone who knows more can provide better insights.

I read your post on my phone and I believe my answer doesn't completely
address the issue you have raised.

Do you need to call the external service for every event, i.e., do you
need to process all events? Also, does the order in which events are
processed matter? Is there a time bound within which each event should be
processed?

Calling an external service means network I/O, so you have to buffer events
if the service is rate limited or slower than the rate at which you are
processing your events.
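
To make the buffering point concrete, here is a minimal sketch (plain
Python, not Spark-specific) of capping the number of simultaneous calls to
the service. The names call_service, process_events and MAX_IN_FLIGHT are
made up for illustration, and the limit of 4 is an assumption, not something
from your setup.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # assumed limit the external service can tolerate


def call_service(event):
    """Hypothetical stand-in for the real network call."""
    return f"processed:{event}"


def process_events(events, max_in_flight=MAX_IN_FLIGHT, service=call_service):
    """Call `service` once per event, with at most `max_in_flight` calls running.

    The executor's internal work queue acts as the buffer: events that cannot
    be sent yet wait there, so it grows whenever events arrive faster than
    the service answers.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # map() returns results in the same order as the input events.
        return list(pool.map(service, events))
```

Note that a cap like this applies per process. If the same code runs in
several worker processes, the service sees up to the cap multiplied by the
number of processes.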

Here are some ways of dealing with this situation:

1. Drop events based on a policy (such as buffer/queue size).
2. Tell the event producer to slow down, if that's in your control.
3. Use a proxy or a set of proxies to distribute the calls to the remote
service, if the rate limit is per user or per network node only.
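
Options 1 and 2 can be sketched with a bounded queue. This is only an
illustration in plain Python; the helper names (make_buffer, offer_or_drop,
offer_or_wait) are hypothetical, and the right buffer size depends on your
workload.

```python
import queue


def make_buffer(max_size):
    """Create a buffer that holds at most `max_size` pending events."""
    return queue.Queue(maxsize=max_size)


def offer_or_drop(buffer, event):
    """Option 1: drop the event when the buffer is full.

    Returns True if the event was buffered, False if it was dropped.
    """
    try:
        buffer.put_nowait(event)
        return True
    except queue.Full:
        return False


def offer_or_wait(buffer, event, timeout=None):
    """Option 2: make the producer wait until there is room (back pressure).

    With timeout=None this blocks until a consumer frees a slot. With a
    timeout in seconds it returns False if no slot opened up in time.
    """
    try:
        buffer.put(event, timeout=timeout)
        return True
    except queue.Full:
        return False
```

For example, with make_buffer(2), two calls to offer_or_drop succeed and a
third returns False until a consumer takes an event out with get().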

I'm not sure how many of these are implemented directly in Spark Streaming,
but you can have an external component that controls the rate of events and
only sends events to Spark Streaming when it's ready to process more messages.
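
One possible shape for such a component is a token bucket that releases
events at a fixed rate. To be clear, this is a simpler stand-in for real
readiness signalling: it assumes you know a safe rate in advance. The names
TokenBucket and forward_ready_events are hypothetical, and send is whatever
function hands an event to the source Spark reads from.

```python
import time


class TokenBucket:
    """Allow roughly `rate_per_sec` events per second, with bursts up to `capacity`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(capacity)
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        """Take one token if available. Returns True on success."""
        now = self.clock()
        # Refill in proportion to the time elapsed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def forward_ready_events(pending, bucket, send):
    """Send events from the front of `pending` (a deque) while tokens last.

    Returns the number of events sent. Unsent events stay in `pending`.
    """
    sent = 0
    while pending and bucket.try_acquire():
        send(pending.popleft())
        sent += 1
    return sent
```

The component would call forward_ready_events periodically. Whatever stays
in pending is the buffer you asked about, so you would still want to bound
it or drop from it.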

Hope this helps.

-Soumya




On Wed, Jun 18, 2014 at 6:50 PM, Flavio Pompermaier <pomperma...@okkam.it>
wrote:

> Thanks for the quick reply, Soumya. Unfortunately I'm a newbie with
> Spark... what do you mean? Is there any reference on how to do that?
>
>
> On Thu, Jun 19, 2014 at 12:24 AM, Soumya Simanta <soumya.sima...@gmail.com
> > wrote:
>
>>
>> You can add a back-pressure-enabled component in front that feeds data
>> into Spark. This component can control the input rate to Spark.
>>
>> > On Jun 18, 2014, at 6:13 PM, Flavio Pompermaier <pomperma...@okkam.it>
>> wrote:
>> >
>> > Hi to all,
>> > in my use case I'd like to receive events and call an external service
>> > as they pass through. Is it possible to limit the number of concurrent
>> > calls to that service (to avoid DoS) using Spark Streaming? If so, limiting
>> > the rate implies possible buffer growth... how can I control the buffer of
>> > incoming events waiting to be processed?
>> >
>> > Best,
>> > Flavio
>>
>
