GitHub user daroo opened a pull request:
https://github.com/apache/spark/pull/19789
[SPARK-22562][Streaming] CachedKafkaConsumer unsafe eviction from cache
## What changes were proposed in this pull request?
Fixes a problem where one thread tries to add a new consumer to a fully
packed cache while another thread is still using the cached consumer instance
that is marked for eviction. In such cases the underlying KafkaConsumer throws
a ConcurrentModificationException.
My solution is to always remove the eldest consumer from the cache, but to
sometimes delay calling its close() method (in a separate thread) until it is
no longer used (i.e. released) by KafkaRDDIterator.
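The idea above can be sketched roughly as follows. This is a hypothetical simplification in Java, not the actual patch: the class names (TrackedConsumer, ConsumerCache) and the reference-counting scheme are illustrative, and for brevity the deferred close happens on the releasing thread rather than a separate one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative wrapper (not the real Spark code): a consumer with a reference
// count, so eviction defers close() while an iterator still holds it.
class TrackedConsumer {
    private int refCount = 0;        // holders currently using this consumer
    private boolean evicted = false; // removed from cache, close once released
    private boolean closed = false;

    synchronized void acquire() { refCount++; }

    synchronized void release() {
        refCount--;
        if (evicted && refCount == 0) doClose();
    }

    // Called by the cache on eviction: close immediately only if nobody is
    // using the consumer; otherwise the last release() performs the close.
    synchronized void markEvicted() {
        evicted = true;
        if (refCount == 0) doClose();
    }

    private void doClose() {
        if (!closed) {
            closed = true;
            // the underlying KafkaConsumer.close() would be invoked here
        }
    }

    synchronized boolean isClosed() { return closed; }
}

// A bounded LRU-style cache that always removes the eldest entry but never
// closes a consumer out from under a thread that is still using it.
class ConsumerCache extends LinkedHashMap<String, TrackedConsumer> {
    private final int capacity;

    ConsumerCache(int capacity) {
        super(16, 0.75f, true); // access order, LRU-like eviction
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, TrackedConsumer> eldest) {
        if (size() > capacity) {
            eldest.getValue().markEvicted(); // deferred close if still in use
            return true;
        }
        return false;
    }
}
```

With this shape, an eviction while a consumer is acquired only marks it; the actual close runs after the holder releases it, which avoids the concurrent use of a closing KafkaConsumer.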
## How was this patch tested?
Any ideas on how to write a good unit test to cover this are more than
welcome. In the meantime I'll try to run the code on our DEV environment for a
longer period of time.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/daroo/spark SPARK-22562
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19789.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19789
----
commit 9b16ddd723bc3dc324c33bd39a4aa2b065e926b1
Author: Dariusz Szablinski <[email protected]>
Date: 2017-11-20T16:47:29Z
[SPARK-22562][Streaming] CachedKafkaConsumer unsafe eviction from cache
----