Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/19096#discussion_r137778385
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala ---
@@ -43,8 +43,10 @@ private[kafka010] class KafkaWriteTask(
    * Writes key value data out to topics.
    */
   def execute(iterator: Iterator[InternalRow]): Unit = {
-    producer = CachedKafkaProducer.getOrCreate(producerConfiguration)
+    val paramsSeq = CachedKafkaProducer.paramsToSeq(producerConfiguration)
     while (iterator.hasNext && failedWrite == null) {
+      // Prevent the producer from being expired/evicted from the guava cache (SPARK-21869).
+      producer = CachedKafkaProducer.getOrCreate(paramsSeq)
--- End diff ---
Hi @zsxwing, thanks for looking. I feel the same way, but it seemed to be the
easiest solution. Anyway, in the new approach I am tracking how many threads
are currently using the producer. Since the guava cache does not provide an
API to prevent an item from being removed, we insert an in-use producer back
into the cache instead of closing/cleaning it up.
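
Roughly, the idea is something like the sketch below: a Guava `RemovalListener` closes an evicted producer only when its in-use count is zero, and otherwise re-inserts it into the cache. Names like `ProducerCacheSketch`, `TrackedProducer`, and `acquire`/`release` are illustrative, not the actual `CachedKafkaProducer` API from this PR:

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache,
  RemovalListener, RemovalNotification}
import org.apache.kafka.clients.producer.KafkaProducer

object ProducerCacheSketch {

  private type Params = Seq[(String, Object)]

  // Hypothetical wrapper pairing a producer with a count of tasks using it.
  final case class TrackedProducer(
      producer: KafkaProducer[Array[Byte], Array[Byte]],
      inUse: AtomicInteger = new AtomicInteger(0))

  private def createProducer(params: Params): TrackedProducer = {
    val props = new java.util.Properties()
    params.foreach { case (k, v) => props.put(k, v) }
    TrackedProducer(new KafkaProducer[Array[Byte], Array[Byte]](props))
  }

  // On eviction, close the producer only when no task holds it; otherwise
  // re-insert it so in-flight writes keep a live handle.
  private val onRemoval = new RemovalListener[Params, TrackedProducer] {
    override def onRemoval(
        n: RemovalNotification[Params, TrackedProducer]): Unit = {
      val tracked = n.getValue
      if (tracked.inUse.get() > 0) {
        cache.put(n.getKey, tracked) // still referenced: put it back
      } else {
        tracked.producer.close()
      }
    }
  }

  private lazy val cache: LoadingCache[Params, TrackedProducer] =
    CacheBuilder.newBuilder()
      .expireAfterAccess(10, TimeUnit.MINUTES)
      .removalListener(onRemoval)
      .build(new CacheLoader[Params, TrackedProducer] {
        override def load(params: Params): TrackedProducer =
          createProducer(params)
      })

  // Callers bracket each write task with acquire/release.
  def acquire(params: Params): TrackedProducer = {
    val tracked = cache.get(params)
    tracked.inUse.incrementAndGet()
    tracked
  }

  def release(tracked: TrackedProducer): Unit =
    tracked.inUse.decrementAndGet()
}
```

There is still a check-then-act window between reading the in-use count and closing the producer, so a real implementation would need to synchronize eviction against `acquire`.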