Re: spark streaming rate limiting from kafka

Tathagata Das Thu, 17 Jul 2014 18:23:22 -0700

You can create multiple Kafka streams and partition your topics across them,
which will run multiple receivers across multiple executors. This is covered
in the Spark Streaming guide.
<http://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving>
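
For illustration, here is a rough sketch of splitting topics across several
receiver streams. It is plain Python and does not call Spark; the helper name
split_topics and the topic names are made up for the example. Each resulting
map has the topic -> thread-count shape that a receiver-based Kafka stream
takes as its topics argument.

```python
from typing import Dict, List


def split_topics(topics: List[str], num_streams: int,
                 threads_per_topic: int = 1) -> List[Dict[str, int]]:
    """Assign topics round-robin to at most num_streams groups.

    Each returned dict maps a topic name to a consumer thread count.
    split_topics is a hypothetical helper, not part of Spark.
    """
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    groups: List[Dict[str, int]] = [{} for _ in range(num_streams)]
    for i, topic in enumerate(topics):
        groups[i % num_streams][topic] = threads_per_topic
    # Drop empty groups so no receiver is started without a topic.
    return [g for g in groups if g]


if __name__ == "__main__":
    # Hypothetical topic names, split across two receiver streams.
    for group in split_topics(["clicks", "views", "orders"], 2):
        print(group)
```

Each map would then go to its own KafkaUtils.createStream call, and the
resulting streams can be combined with StreamingContext.union before further
processing, as the guide linked above describes.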

And for the purpose of this thread, to answer the original question, we now
have the ability
<https://issues.apache.org/jira/browse/SPARK-1854?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20Streaming%20ORDER%20BY%20priority%20DESC>
to limit the receiving rate. It's in the master branch and will be
available in Spark 1.1. It sets a limit at the receiver level (so it applies
to all sources) on the maximum number of records per second that each
receiver will receive.
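
As a back-of-the-envelope sketch of what a per-receiver limit means for batch
sizes: assuming the limit is set through the spark.streaming.receiver.maxRate
configuration property (the name used in released Spark versions, as far as I
know), the records in one batch are bounded by rate x receivers x batch
interval. The function name and the numbers below are made up for the example.

```python
def max_records_per_batch(max_rate_per_receiver: int, num_receivers: int,
                          batch_seconds: float) -> int:
    """Upper bound on records received in one batch interval.

    Assumes every receiver is capped at max_rate_per_receiver records/second.
    """
    return int(max_rate_per_receiver * num_receivers * batch_seconds)


# Hypothetical setting: cap each receiver at 1000 records per second.
RATE_LIMIT_CONF = {"spark.streaming.receiver.maxRate": "1000"}

if __name__ == "__main__":
    rate = int(RATE_LIMIT_CONF["spark.streaming.receiver.maxRate"])
    # Four receivers and a 2-second batch interval (both assumed values).
    print(max_records_per_batch(rate, num_receivers=4, batch_seconds=2))
```

Because the limit is per receiver, adding receivers raises the total ingest
ceiling proportionally.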

TD

On Thu, Jul 17, 2014 at 6:15 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:

> Bill,
>
> are you saying, after repartition(400), you have 400 partitions on one
> host and the other hosts receive nothing of the data?
>
> Tobias
>
>
> On Fri, Jul 18, 2014 at 8:11 AM, Bill Jay <bill.jaypeter...@gmail.com>
> wrote:
>
>> I also have an issue consuming from Kafka. When I consume from Kafka,
>> there is always a single executor working on this job. Even if I use
>> repartition, it seems that there is still a single executor. Does anyone
>> have an idea how to add parallelism to this job?
>>
>>
>>
>> On Thu, Jul 17, 2014 at 2:06 PM, Chen Song <chen.song...@gmail.com>
>> wrote:
>>
>>> Thanks Luis and Tobias.
>>>
>>>
>>> On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer <t...@preferred.jp>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Wed, Jul 2, 2014 at 1:57 AM, Chen Song <chen.song...@gmail.com>
>>>> wrote:
>>>>>
>>>>> * Is there a way to control how far the Kafka DStream can read on a
>>>>> topic-partition (via offset, for example)? Setting this to a small
>>>>> number would force the DStream to read less data initially.
>>>>>
>>>>
>>>> Please see the post at
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccaph-c_m2ppurjx-n_tehh0bvqe_6la-rvgtrf1k-lwrmme+...@mail.gmail.com%3E
>>>> Kafka's auto.offset.reset parameter may be what you are looking for.
>>>>
>>>> Tobias
>>>>
>>>>
>>>
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>
