I feel the need for pause and resume in a streaming app :)

Is there any limit on the max number of queued jobs? If yes, what happens once
that limit is reached? Does the job get killed?


On Tue, Sep 1, 2015 at 10:02 PM, Cody Koeninger <c...@koeninger.org> wrote:

> Sounds like you'd be better off just failing if the external server is
> down, and scripting monitoring / restarting of your job.
>
> On Tue, Sep 1, 2015 at 11:19 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>
>> In my app, after processing the events I post them to an external server. If
>> the external server is down, I want to back off consuming from Kafka. But I
>> can't stop and restart the consumer, since that needs manual effort.
>>
>> Backing off a few batches is also not possible, since the decision to back
>> off is based on the last batch's processing status, but the driver has
>> already computed offsets for the next batches. So if I skip the next few
>> batches until the external server is back to normal, it's data loss unless
>> I can reset the offsets.
>>
>> So the only option seems to be to delay the last batch by calling sleep()
>> in the foreachRDD method, after returning from the foreachPartition
>> transformations.
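>>
>> Roughly what I have in mind - a minimal sketch, where
>> isExternalServerDown() is a hypothetical health check of mine, not a
>> Spark API:
>>
>> import java.util.Iterator;
>> import org.apache.spark.api.java.JavaRDD;
>> import org.apache.spark.api.java.function.Function;
>> import org.apache.spark.api.java.function.VoidFunction;
>>
>> directKafkaStream.foreachRDD(new Function<JavaRDD<byte[][]>, Void>() {
>>   @Override
>>   public Void call(JavaRDD<byte[][]> rdd) throws Exception {
>>     rdd.foreachPartition(new VoidFunction<Iterator<byte[][]>>() {
>>       @Override
>>       public void call(Iterator<byte[][]> events) throws Exception {
>>         // post this partition's events to the external server
>>       }
>>     });
>>     // Hypothetical check: hold this batch on the driver until the
>>     // server recovers, so later batches wait in the scheduling queue
>>     // instead of being processed and lost.
>>     while (isExternalServerDown()) {
>>       Thread.sleep(30 * 1000L);
>>     }
>>     return null;
>>   }
>> });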
>>
>> The concern here is that further batches will keep enqueuing until the
>> current slept batch completes. So what's the max size of the scheduling
>> queue? Say the server doesn't come up for an hour and my batch interval is
>> 5 seconds: that's 720 enqueued batches. Will that be an issue?
>> And is there any setting in the direct Kafka stream to enforce not calling
>> further compute() methods once the scheduling queue reaches a threshold
>> size, say 50 batches? Once the queue size drops back below the threshold,
>> it would call compute() and enqueue the next job.
>>
>>
>>
>>
>>
>> On Tue, Sep 1, 2015 at 8:57 PM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>
>>> Honestly I'd concentrate more on getting your batches to finish in a
>>> timely fashion, so you won't even have the issue to begin with...
>>>
>>> On Tue, Sep 1, 2015 at 10:16 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>>
>>>> What if I use custom checkpointing, so that I can take care of offsets
>>>> being checkpointed at the end of each batch?
>>>>
>>>> Would it then be possible to reset the offsets?
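>>>>
>>>> Something like this is what I mean - a rough sketch, where saveOffsets()
>>>> is a placeholder of mine for persisting to ZooKeeper or a DB, assuming
>>>> the HasOffsetRanges cast shown in the 1.3 Kafka integration guide:
>>>>
>>>> import org.apache.spark.api.java.JavaRDD;
>>>> import org.apache.spark.api.java.function.Function;
>>>> import org.apache.spark.streaming.kafka.HasOffsetRanges;
>>>> import org.apache.spark.streaming.kafka.OffsetRange;
>>>>
>>>> directKafkaStream.foreachRDD(new Function<JavaRDD<byte[][]>, Void>() {
>>>>   @Override
>>>>   public Void call(JavaRDD<byte[][]> rdd) throws Exception {
>>>>     // Capture the exact offsets this batch covers, on the driver,
>>>>     // before doing any processing.
>>>>     OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
>>>>     // ... process and post the events ...
>>>>     saveOffsets(ranges); // placeholder: persist at end of batch
>>>>     return null;
>>>>   }
>>>> });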
>>>>
>>>> On Tue, Sep 1, 2015 at 8:42 PM, Cody Koeninger <c...@koeninger.org>
>>>> wrote:
>>>>
>>>>> No, if you start arbitrarily messing around with offset ranges after
>>>>> compute is called, things are going to get out of whack.
>>>>>
>>>>> e.g. checkpoints are no longer going to correspond to what you're
>>>>> actually processing
>>>>>
>>>>> On Tue, Sep 1, 2015 at 10:04 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>>>>
>>>>>> Can I reset the range based on some condition, before calling
>>>>>> transformations on the stream?
>>>>>>
>>>>>> Say, before calling:
>>>>>>
>>>>>> directKafkaStream.foreachRDD(new Function<JavaRDD<byte[][]>, Void>() {
>>>>>>   @Override
>>>>>>   public Void call(JavaRDD<byte[][]> v1) throws Exception {
>>>>>>     v1.foreachPartition(new VoidFunction<Iterator<byte[][]>>() {
>>>>>>       @Override
>>>>>>       public void call(Iterator<byte[][]> t) throws Exception {
>>>>>>         // process the partition's events
>>>>>>       }
>>>>>>     });
>>>>>>     return null;
>>>>>>   }
>>>>>> });
>>>>>>
>>>>>> could I change the directKafkaStream RDD's offset range (fromOffset)?
>>>>>>
>>>>>> I can't do this in the compute() method, since compute() will have
>>>>>> been called at the current batch's queue time, but the condition is
>>>>>> only set at the previous batch's run time.
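>>>>>>
>>>>>> If the stream's range can't be touched, I guess the fallback is
>>>>>> building a separate batch RDD from explicit offsets. A sketch,
>>>>>> assuming KafkaUtils.createRDD from the 1.3 API, with made-up topic
>>>>>> and offset values and jsc being the app's JavaSparkContext:
>>>>>>
>>>>>> import kafka.serializer.DefaultDecoder;
>>>>>> import org.apache.spark.api.java.JavaPairRDD;
>>>>>> import org.apache.spark.streaming.kafka.KafkaUtils;
>>>>>> import org.apache.spark.streaming.kafka.OffsetRange;
>>>>>>
>>>>>> // topic, partition, fromOffset, untilOffset - hypothetical values
>>>>>> OffsetRange[] ranges = {
>>>>>>   OffsetRange.create("events", 0, 1000L, 2000L)
>>>>>> };
>>>>>> JavaPairRDD<byte[], byte[]> replay = KafkaUtils.createRDD(
>>>>>>     jsc, byte[].class, byte[].class,
>>>>>>     DefaultDecoder.class, DefaultDecoder.class, kafkaParams, ranges);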
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 1, 2015 at 7:09 PM, Cody Koeninger <c...@koeninger.org>
>>>>>> wrote:
>>>>>>
>>>>>>> It's at the time compute() gets called, which should be near the
>>>>>>> time the batch should have been queued.
>>>>>>>
>>>>>>> On Tue, Sep 1, 2015 at 8:02 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> In Spark Streaming 1.3 with Kafka, when does the driver fetch the
>>>>>>>> latest offsets for a batch: at the start of each batch, or at the
>>>>>>>> time the batch gets queued?
>>>>>>>>
>>>>>>>> Say a few of my batches take longer to complete than the batch
>>>>>>>> interval, so some batches will go into the queue. Does the driver
>>>>>>>> wait for the queued batches to start before fetching offsets, or
>>>>>>>> does it fetch the latest offsets before they have even started, so
>>>>>>>> that when they do run they work on the old offsets fetched at the
>>>>>>>> time they were queued?
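>>>>>>>>
>>>>>>>> For reference, my stream is set up roughly like this - a minimal
>>>>>>>> sketch, where broker, topic, and starting offsets are made up and
>>>>>>>> I'm assuming the fromOffsets/messageHandler overload of
>>>>>>>> createDirectStream from the 1.3 API, with jssc being the
>>>>>>>> JavaStreamingContext:
>>>>>>>>
>>>>>>>> import java.util.HashMap;
>>>>>>>> import kafka.common.TopicAndPartition;
>>>>>>>> import kafka.message.MessageAndMetadata;
>>>>>>>> import kafka.serializer.DefaultDecoder;
>>>>>>>> import org.apache.spark.api.java.function.Function;
>>>>>>>> import org.apache.spark.streaming.api.java.JavaInputDStream;
>>>>>>>> import org.apache.spark.streaming.kafka.KafkaUtils;
>>>>>>>>
>>>>>>>> HashMap<String, String> kafkaParams = new HashMap<String, String>();
>>>>>>>> kafkaParams.put("metadata.broker.list", "broker1:9092");
>>>>>>>> HashMap<TopicAndPartition, Long> fromOffsets =
>>>>>>>>     new HashMap<TopicAndPartition, Long>();
>>>>>>>> fromOffsets.put(new TopicAndPartition("events", 0), 0L);
>>>>>>>>
>>>>>>>> JavaInputDStream<byte[][]> directKafkaStream =
>>>>>>>>     KafkaUtils.createDirectStream(jssc,
>>>>>>>>         byte[].class, byte[].class,
>>>>>>>>         DefaultDecoder.class, DefaultDecoder.class,
>>>>>>>>         byte[][].class, kafkaParams, fromOffsets,
>>>>>>>>         new Function<MessageAndMetadata<byte[], byte[]>, byte[][]>() {
>>>>>>>>           @Override
>>>>>>>>           public byte[][] call(MessageAndMetadata<byte[], byte[]> m) {
>>>>>>>>             return new byte[][] { m.key(), m.message() };
>>>>>>>>           }
>>>>>>>>         });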
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
