GitHub user YuvalItzchakov opened a pull request:
https://github.com/apache/spark/pull/19059
[SS] - Avoid using `return` inside `CachedKafkaConsumer.get`
During profiling of a structured streaming application with Kafka as the
source, I came across this exception:

This came from a one-minute sample, during which 106K `NonLocalReturnControl`
exceptions were thrown.
This happens because `CachedKafkaConsumer.get` is run inside:
`private def runUninterruptiblyIfPossible[T](body: => T): T`
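Here `body: => T` is a by-name parameter that wraps the body of the `get`
method, so the compiler turns that body into a function. A `return` inside a
function cannot simply jump out of the enclosing method, so in order to escape
the `while` loop defined in `get` the runtime has to throw the above exception
and catch it again in the enclosing method.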
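To illustrate the mechanism, here is a minimal sketch, assuming Scala 2 semantics for non-local returns. `runBody`, `getWithReturn`, and the counter are hypothetical names made up for this example; `runBody` only stands in for `runUninterruptiblyIfPossible` and is not Spark code.

```scala
import scala.util.control.ControlThrowable

object NonLocalReturnDemo {
  // Counts control-flow throwables that pass through the by-name wrapper.
  var controlThrowables = 0

  // Hypothetical stand-in for runUninterruptiblyIfPossible: it evaluates a
  // by-name body and lets any control throwable propagate after counting it.
  def runBody[T](body: => T): T =
    try body
    catch {
      case c: ControlThrowable =>
        controlThrowables += 1
        throw c
    }

  // The `return` below sits inside the closure passed to runBody, so the
  // compiler implements it by throwing NonLocalReturnControl, which is
  // caught again at the boundary of getWithReturn.
  def getWithReturn(limit: Int): Int = runBody {
    var i = 0
    while (true) {
      if (i >= limit) return i
      i += 1
    }
    -1 // not reached; only here to give the block a value
  }
}
```

Each call to `getWithReturn` should increment `controlThrowables` by one, which matches the one-exception-per-call pattern seen in the profile.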
## What changes were proposed in this pull request?
Instead of using `return` (which is generally not recommended in Scala), we
place the result of the `fetchData` method inside a local variable and use a
boolean flag to indicate the status of fetching data, which we monitor as our
predicate to the `while` loop.
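The shape of the change can be sketched roughly as follows. This is not the actual patch: `fetchOnce` is a hypothetical stand-in for `fetchData` that succeeds on the third attempt, and the variable names are assumptions chosen for the example.

```scala
object FlagLoopSketch {
  // Hypothetical stand-in for fetchData: yields a value on the third attempt.
  def fetchOnce(attempt: Int): Option[String] =
    if (attempt >= 3) Some(s"record-after-${attempt}-attempts") else None

  def get(): String = {
    var result: Option[String] = None // holds the fetched value, if any
    var attempt = 0
    var isFetchComplete = false       // flag monitored by the while predicate
    while (!isFetchComplete) {
      attempt += 1
      result = fetchOnce(attempt)
      isFetchComplete = result.isDefined
    }
    // The loop only exits once result is defined, so .get is safe here.
    result.get
  }
}
```

Because the loop exits through its predicate instead of `return`, the body can be passed as a by-name argument without any control-flow exception being thrown.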
## How was this patch tested?
I ran the `KafkaSourceSuite` to make sure there are no regressions. Since the
exception isn't visible from user code, there is no way (at least that I could
think of) to add a test for this to the existing suite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/YuvalItzchakov/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19059.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19059
----
commit c20bd14a4bed34644efc11de420a1caeccea329e
Author: Yuval Itzchakov <[email protected]>
Date: 2017-08-26T15:21:17Z
Avoid using "return" inside `CachedKafkaConsumer.get` as it is passed to
`org.apache.spark.util.UninterruptibleThread.runUninterruptibly` as a function
type which causes a NonLocalReturnControl to be called for every call
----