GitHub user vchekan opened a pull request:

    https://github.com/apache/spark/pull/961

    Key not found exception when slow receiver starts

    I got a "java.util.NoSuchElementException: key not found: 1401756085000 ms" exception when using a Kafka stream with a 1-second batch period.
    
    Investigation showed that the cause is that ReceiverLauncher.startReceivers is asynchronous (it runs in a separate thread):
    
    https://github.com/vchekan/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L206
    
    A slow-starting receiver such as Kafka can easily take more than 2 seconds to start. As a result, "compute" is never called on ReceiverInputDStream before the first batch job executes, so receivedBlockInfo is still empty. The batch job then calls ReceiverInputDStream.getReceivedBlockInfo, which throws the "key not found" exception.
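    The failure mode can be reproduced outside Spark with a plain mutable map. This is a minimal sketch, assuming the per-batch block info is kept in a `mutable.HashMap` keyed by batch time and read with `apply`; `Time`, `BlockInfo` and `StrictLookup` are hypothetical, simplified stand-ins, not the real Spark classes. Scala's `HashMap.apply` throws `NoSuchElementException("key not found: " + key)` when no entry exists, which, with a `toString` of the form "<millis> ms", yields the same message as in the report.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a batch time; toString mimics the "<millis> ms" format seen in the error.
final case class Time(milliseconds: Long) {
  override def toString: String = s"$milliseconds ms"
}

// Hypothetical, simplified stand-in for per-block metadata (not Spark's ReceivedBlockInfo).
final case class BlockInfo(streamId: Int, blockId: String)

object StrictLookup {
  // Populated by compute(); stays empty for a batch whose compute() has not run yet.
  val receivedBlockInfo = mutable.HashMap.empty[Time, Array[BlockInfo]]

  // Records the blocks seen for a batch time.
  def compute(time: Time, blocks: Array[BlockInfo]): Unit =
    receivedBlockInfo(time) = blocks

  // Strict lookup: HashMap.apply throws NoSuchElementException for a missing key,
  // which is what happens if the batch job runs before any compute() call.
  def getReceivedBlockInfo(time: Time): Array[BlockInfo] =
    receivedBlockInfo(time)
}
```

    Calling `StrictLookup.getReceivedBlockInfo(Time(1401756085000L))` before any `compute` call fails with "key not found: 1401756085000 ms"; once `compute` has recorded that batch time, the same call succeeds.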
    
    The patch makes getReceivedBlockInfo more robust by tolerating missing values.
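    One way to tolerate a missing entry is to fall back to an empty result instead of calling `apply`. The sketch below is an illustration under that assumption, not necessarily what the patch does; `Time`, `BlockInfo` and `TolerantLookup` are hypothetical, simplified stand-ins for the Spark types, and the block id string is made up.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a batch time; toString mimics the "<millis> ms" format.
final case class Time(milliseconds: Long) {
  override def toString: String = s"$milliseconds ms"
}

// Hypothetical, simplified stand-in for per-block metadata (not Spark's ReceivedBlockInfo).
final case class BlockInfo(streamId: Int, blockId: String)

object TolerantLookup {
  // Populated by compute(); may still be empty when the first batch job runs.
  val receivedBlockInfo = mutable.HashMap.empty[Time, Array[BlockInfo]]

  // Records the blocks seen for a batch time.
  def compute(time: Time, blocks: Array[BlockInfo]): Unit =
    receivedBlockInfo(time) = blocks

  // Tolerant lookup: a batch time with no recorded entry (for example, because the
  // receiver was still starting) is treated as having received no blocks.
  def getReceivedBlockInfo(time: Time): Array[BlockInfo] =
    receivedBlockInfo.getOrElse(time, Array.empty[BlockInfo])
}
```

    With this version, an early batch simply sees zero blocks instead of failing. The trade-off is that the lookup hides the startup race rather than removing it: batches that run before the receiver is up process no data, which is usually acceptable for a streaming job but worth keeping in mind when debugging "missing" early data.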

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vchekan/spark branch-1.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #961
    
----
commit 46095633db8e0b0694aebf8bfba2e723b34fd239
Author: Vadim Chekan <[email protected]>
Date:   2014-06-03T22:59:43Z

    Key not found exception: if receiver is slow to start, it is possible that getReceivedBlockInfo will be called before compute has been called

----

