Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/19096#discussion_r137778385
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala ---
@@ -43,8 +43,10 @@ private[kafka010] class KafkaWriteTask(
    * Writes key value data out to topics.
    */
   def execute(iterator: Iterator[InternalRow]): Unit = {
-    producer = CachedKafkaProducer.getOrCreate(producerConfiguration)
+    val paramsSeq = CachedKafkaProducer.paramsToSeq(producerConfiguration)
     while (iterator.hasNext && failedWrite == null) {
+      // Prevent the producer from being expired/evicted from the guava cache (SPARK-21869).
+      producer = CachedKafkaProducer.getOrCreate(paramsSeq)
--- End diff ---
Hi @zsxwing, thanks for looking. I feel the same way, but it seemed to be the
easiest solution. Anyway, in the new approach I am tracking how many threads
are currently using the producer. Since the guava cache does not provide an
API to prevent an item from being removed, we insert an in-use producer back
into the cache instead of closing/cleaning it up.
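
Roughly, the idea is something like the sketch below: a Guava `RemovalListener` closes an evicted producer only when its in-use count is zero, and otherwise re-inserts it into the cache. Names like `ProducerCacheSketch`, `TrackedProducer`, and `acquire`/`release` are illustrative, not the actual `CachedKafkaProducer` API from this PR:

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache,
  RemovalListener, RemovalNotification}
import org.apache.kafka.clients.producer.KafkaProducer

object ProducerCacheSketch {

  private type Params = Seq[(String, Object)]

  // Hypothetical wrapper pairing a producer with a count of tasks using it.
  final case class TrackedProducer(
      producer: KafkaProducer[Array[Byte], Array[Byte]],
      inUse: AtomicInteger = new AtomicInteger(0))

  private def createProducer(params: Params): TrackedProducer = {
    val props = new java.util.Properties()
    params.foreach { case (k, v) => props.put(k, v) }
    TrackedProducer(new KafkaProducer[Array[Byte], Array[Byte]](props))
  }

  // On eviction, close the producer only when no task holds it; otherwise
  // re-insert it so in-flight writes keep a live handle.
  private val onRemoval = new RemovalListener[Params, TrackedProducer] {
    override def onRemoval(
        n: RemovalNotification[Params, TrackedProducer]): Unit = {
      val tracked = n.getValue
      if (tracked.inUse.get() > 0) {
        cache.put(n.getKey, tracked) // still referenced: put it back
      } else {
        tracked.producer.close()
      }
    }
  }

  private lazy val cache: LoadingCache[Params, TrackedProducer] =
    CacheBuilder.newBuilder()
      .expireAfterAccess(10, TimeUnit.MINUTES)
      .removalListener(onRemoval)
      .build(new CacheLoader[Params, TrackedProducer] {
        override def load(params: Params): TrackedProducer =
          createProducer(params)
      })

  // Callers bracket each write task with acquire/release.
  def acquire(params: Params): TrackedProducer = {
    val tracked = cache.get(params)
    tracked.inUse.incrementAndGet()
    tracked
  }

  def release(tracked: TrackedProducer): Unit =
    tracked.inUse.decrementAndGet()
}
```

There is still a check-then-act window between reading the in-use count and closing the producer, so a real implementation would need to synchronize eviction against `acquire`.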