Thanks for your support. This is my idea for the project. I'm a newbie, so
please forgive any misunderstandings:

Spark Streaming will collect requests, for example: create a table, append
records to a table, or drop a table (these are just examples).

With Spark Streaming I can filter the messages by key (the kind of request)
and send them (via foreachRDD) to a specific function that takes care of
each kind of request.
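
As a rough illustration of the per-kind dispatch described above, here is a
sketch in plain Python (not Spark code). The handler names and the
(kind, payload) record shape are made up for this example; in Spark
Streaming a function like dispatch() would presumably be called on the
records of each batch from inside foreachRDD.

```python
from collections import defaultdict

# Hypothetical handlers, one per kind of request.
def create_table(payloads):
    return f"created {len(payloads)} table(s)"

def append_records(payloads):
    return f"appended {len(payloads)} record(s)"

def drop_table(payloads):
    return f"dropped {len(payloads)} table(s)"

HANDLERS = {
    "create": create_table,
    "append": append_records,
    "drop": drop_table,
}

def dispatch(records):
    """Group (kind, payload) pairs by kind and call the matching handler."""
    by_kind = defaultdict(list)
    for kind, payload in records:
        by_kind[kind].append(payload)

    results = {}
    for kind, payloads in by_kind.items():
        handler = HANDLERS.get(kind)
        if handler is None:
            results[kind] = "unknown request kind, skipped"
        else:
            results[kind] = handler(payloads)
    return results
```

Grouping first and calling each handler once per batch keeps the handlers
simple; calling the handler once per record would work just as well.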


This is fine when the requests are "self-contained", in other words,
one-step requests such as create table or drop table.

But it's a bit more complicated if I need a connection that:

1) must outlive the scope of the function
2) must be shared across slaves.

For example, a connection to the database.
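
One common way to approximate this, sketched below in plain Python, is to
keep one lazily created connection per worker process and reuse it across
calls, since a live connection object generally cannot be serialized and
sent from one machine to another. This is only an illustration under
assumptions: sqlite3 stands in for the real database, and the names
get_connection() and save_partition() are hypothetical. In Spark, a
function like save_partition() would typically be passed to
foreachPartition so that each worker opens its own connection.

```python
import sqlite3

# One connection per Python process, created on first use and then reused.
_connection = None

def get_connection():
    """Return the process-wide connection, opening it if needed."""
    global _connection
    if _connection is None:
        # Assumption: an in-memory SQLite database stands in for the real one.
        _connection = sqlite3.connect(":memory:")
        _connection.execute(
            "CREATE TABLE IF NOT EXISTS records (tbl TEXT, value TEXT)"
        )
    return _connection

def save_partition(records):
    """Write an iterable of (table, value) pairs using the shared connection."""
    conn = get_connection()
    conn.executemany("INSERT INTO records VALUES (?, ?)", list(records))
    conn.commit()
    return conn
```

The connection survives between calls because it lives at module level, not
inside the function. It is shared within one process only; each worker
process ends up with its own connection to the same database.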

What do you think is the best approach for this scenario?
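
For comparison, the driver-side "shared bag" with a separate thread that is
suggested in the quoted reply below could look roughly like this. It is a
plain-Python sketch, not Spark code: the TaskBag class and all of its
method names are invented for this example, and it assumes the request
rate is low enough for a single worker thread.

```python
import queue
import threading

class TaskBag:
    """Hypothetical driver-side registry of tasks, keyed by request ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}                # request_id -> {"status", "result"}
        self._pending = queue.Queue()   # work waiting for the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, request_id, fn, *args):
        """Register a task and queue it for the background thread."""
        with self._lock:
            self._tasks[request_id] = {"status": "pending", "result": None}
        self._pending.put((request_id, fn, args))

    def get(self, request_id):
        """Return a copy of the task entry, or None if the ID is unknown."""
        with self._lock:
            entry = self._tasks.get(request_id)
            return dict(entry) if entry else None

    def wait_idle(self):
        """Block until every queued task has been processed."""
        self._pending.join()

    def _run(self):
        while True:
            request_id, fn, args = self._pending.get()
            try:
                update = {"status": "done", "result": fn(*args)}
            except Exception as exc:
                update = {"status": "failed", "result": repr(exc)}
            with self._lock:
                self._tasks[request_id] = update
            self._pending.task_done()
```

A batch handler running in the driver could call submit() for each incoming
request and get() whenever a later request asks about an earlier one by ID.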







2014-03-26 10:30 GMT+01:00 Tathagata Das <[email protected]>:

> When you say "launch long-running tasks" does it mean long running Spark
> jobs/tasks, or long-running tasks in another system?
>
> If the rate of requests from Kafka is not high (in terms of records per
> second), you could collect the records in the driver, and maintain the
> "shared bag" in the driver. A separate thread in the driver could pick
> stuff from the bag and launch "tasks". This is a slightly unorthodox use of
> Spark Streaming, but should work.
>
> If the rate of requests from Kafka is high, then I am not sure how you can
> sustain that many long running tasks (assuming 1 task corresponding to each
> request from Kafka).
>
> TD
>
>
> On Wed, Mar 26, 2014 at 1:19 AM, Bryan Bryan <[email protected]> wrote:
>
>> Hi there,
>>
>> I have read about the two fundamental shared features in Spark
>> (broadcast variables and accumulators), but this is what I need.
>>
>> I'm using Spark Streaming to get requests from Kafka. These requests
>> may launch long-running tasks, and I need to control them:
>>
>> 1) Keep them in a shared bag, like a HashMap, to retrieve them by ID, for
>> example.
>> 2) Retrieve an instance of this object or task on demand (on request,
>> in fact)
>>
>>
>> Any ideas about that? How can I share objects between slaves? Could I
>> use something outside of Spark (maybe Hazelcast)?
>>
>>
>> Regards.
>>
>
>
