So for Kafka + Spark Streaming, receiver-based streaming uses the high-level
consumer API and non-receiver (direct) streaming uses the low-level (simple)
consumer API.

1. In the high-level receiver-based approach, does it register consumers at
each job start (i.e. whenever a new job is launched by the streaming
application, say every second)?
2. Will the number of executors in high-level receiver-based jobs always equal
the number of partitions in the topic?
3. In the high-level receiver-based approach, will data from a single topic be
consumed by executors in parallel, or does only one receiver consume it in
multiple threads and assign it to executors?




On Tue, May 19, 2015 at 2:38 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> spark.streaming.concurrentJobs takes an integer value, not a boolean. If
> you set it to 2, then 2 jobs will run in parallel. The default value is 1,
> and the next job will start once the current one completes.
>
>
>> Actually, in the current implementation of Spark Streaming and under the
>> default configuration, only one job is active (i.e. under execution) at any
>> point in time. So if one batch's processing takes longer than 10 seconds,
>> then the next batch's jobs will stay queued.
>> This can be changed with an experimental Spark property,
>> "spark.streaming.concurrentJobs", which is by default set to 1. It's not
>> currently documented (maybe I should add it).
>> The reason it is set to 1 is that concurrent jobs can potentially lead to
>> weird sharing of resources, which can make it hard to debug whether
>> there are sufficient resources in the system to process the ingested data
>> fast enough. With only 1 job running at a time, it is easy to see that if
>> batch processing time < batch interval, then the system will be stable.
>> Granted, this may not be the most efficient use of resources under
>> certain conditions. We definitely hope to improve this in the future.
>
>
> Copied from TD's answer written in SO
> <http://stackoverflow.com/questions/23528006/how-jobs-are-assigned-to-executors-in-spark-streaming>
> .
>
> Non-receiver based streams are, for example, the fileStream and
> directStream ones. You can read a bit more about them here:
> https://spark.apache.org/docs/1.3.1/streaming-kafka-integration.html
>
> Thanks
> Best Regards
>
> On Tue, May 19, 2015 at 2:13 PM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>> Thanks Akhil.
>> When I don't set spark.streaming.concurrentJobs to true, will all the
>> pending jobs start one by one after a job completes, or does it not
>> create jobs which could not be started at their desired interval?
>>
>> And what's the difference and usage of receiver vs. non-receiver based
>> streaming? Is there any documentation for that?
>>
>> On Tue, May 19, 2015 at 1:35 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> It will be a single job running at a time by default (you can also
>>> configure spark.streaming.concurrentJobs to run jobs in parallel, which
>>> is not recommended in production).
>>>
>>> Now, with your batch duration being 1 sec and your processing time being
>>> 2 minutes, if you are using receiver-based streaming then those receivers
>>> will keep on receiving data while the job is running (the data will
>>> accumulate in memory if you set the StorageLevel to MEMORY_ONLY, and you
>>> can end up with block-not-found exceptions as Spark drops blocks that are
>>> yet to be processed in order to accommodate new blocks). If you are using
>>> a non-receiver based approach, you will not have this problem of dropped
>>> blocks.
>>>
>>> Ideally, if your data is small and you have enough memory to hold your
>>> data then it will run smoothly without any issues.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Tue, May 19, 2015 at 1:23 PM, Shushant Arora <
>>> shushantaror...@gmail.com> wrote:
>>>
>>>> What happens if, in a streaming application, one job is not yet finished
>>>> when the stream interval is reached? Does it start the next job, or wait
>>>> for the first to finish while the rest keep accumulating in a queue?
>>>>
>>>>
>>>> Say I have a streaming application with a stream interval of 1 sec, but
>>>> my job takes 2 min to process 1 sec of stream; what will happen? Will
>>>> there be only one job running at any time, or multiple?
>>>>
>>>>
>>>
>>
>
