Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17308#discussion_r117717831
  
    --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala ---
    @@ -94,4 +94,10 @@ private[kafka010] object KafkaWriter extends Logging {
           }
         }
       }
    +
    +  def close(sc: SparkContext, kafkaParams: ju.Map[String, Object]): Unit = {
    +    sc.parallelize(1 to 10000).foreachPartition { iter =>
    +      CachedKafkaProducer.close(kafkaParams)
    +    }
    --- End diff --
    
    Using a Guava cache, we can close producers that have not been used for a certain time. Shall we skip closing them explicitly during a shutdown?
    In the particular case of the Kafka producer, I do not see a direct problem with that, since we do a producer.flush() on each batch. I was just wondering, for streaming sinks in general - what should our strategy be?
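    Roughly what I have in mind, as a minimal sketch (not the actual CachedKafkaProducer code; the byte-array key/value types, the 10 minute timeout and the object/method names here are assumptions for illustration):

        import java.{util => ju}
        import java.util.concurrent.TimeUnit

        import com.google.common.cache.{CacheBuilder, RemovalListener, RemovalNotification}
        import org.apache.kafka.clients.producer.KafkaProducer

        object GuavaBackedProducerCache {

          // Flush and close a producer when it is evicted from the cache.
          private val removalListener =
            new RemovalListener[ju.Map[String, Object], KafkaProducer[Array[Byte], Array[Byte]]]() {
              override def onRemoval(
                  n: RemovalNotification[ju.Map[String, Object],
                    KafkaProducer[Array[Byte], Array[Byte]]]): Unit = {
                n.getValue.flush()
                n.getValue.close()
              }
            }

          // Entries expire after a period of inactivity, so idle producers are closed
          // without needing an explicit shutdown pass.
          private val cache = CacheBuilder.newBuilder()
            .expireAfterAccess(10, TimeUnit.MINUTES)
            .removalListener(removalListener)
            .build[ju.Map[String, Object], KafkaProducer[Array[Byte], Array[Byte]]]()

          def getOrCreate(
              kafkaParams: ju.Map[String, Object]): KafkaProducer[Array[Byte], Array[Byte]] = {
            Option(cache.getIfPresent(kafkaParams)).getOrElse {
              // Assumes kafkaParams already carries the key/value serializer configs.
              val producer = new KafkaProducer[Array[Byte], Array[Byte]](kafkaParams)
              cache.put(kafkaParams, producer)
              producer
            }
          }
        }

    With expireAfterAccess plus a removal listener, a cluster-wide close pass at shutdown may not be needed, since unused producers are flushed and closed as they age out of the cache.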

