redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
URL: https://github.com/apache/spark/pull/26153#discussion_r336471883
 
 

 ##########
 File path: docs/structured-streaming-kafka-integration.md
 ##########
 @@ -622,6 +626,10 @@ a ```null``` valued key column will be automatically added (see Kafka semantics
  how ```null``` valued key values are handled). If a topic column exists then its value
  is used as the topic when writing the given row to Kafka, unless the "topic" configuration
  option is set i.e., the "topic" configuration option overrides the topic column.
 +If a partition column is not specified then the partition is calculated by the Kafka producer
 +(using ```org.apache.kafka.clients.producer.internals.DefaultPartitioner```).
 +This can be overridden in Spark by setting the ```kafka.partitioner.class``` option.
 
 Review comment:
   > What I would like to test here is that Spark parameterizes producer 
properly.
   > These testing points has nothing to do with producer internals:
   >     * `kafka.partitioner.class` config reaches producer instances and 
takes effect
   
   I manually tested this and it does, because these options are passed down by
   other classes (see
   `val specifiedKafkaParams = kafkaParamsForProducer(caseInsensitiveParameters)`,
   line 177 of `KafkaSourceProvider`). In fact, any config that starts with
   `kafka.` is passed down to the producer. The problem is that the doc does not
   state this (it only mentions two optional configs). Maybe I can update the
   doc in this respect?
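
To illustrate the pass-through I mean, here is a minimal sketch. It is not Spark's actual code: `producerParams` and `com.example.AlwaysZeroPartitioner` are made-up names, and I'm assuming the `kafka.` prefix is matched case-insensitively and stripped before the option reaches the producer.

```scala
import java.util.Locale

object KafkaOptionSketch {
  // Hypothetical helper (not part of Spark): keep only the options whose key
  // starts with "kafka." and strip that prefix, so that the remaining keys
  // look like plain Kafka producer configs.
  def producerParams(options: Map[String, String]): Map[String, String] =
    options.collect {
      case (key, value) if key.toLowerCase(Locale.ROOT).startsWith("kafka.") =>
        key.drop("kafka.".length) -> value
    }

  def main(args: Array[String]): Unit = {
    val options = Map(
      "topic" -> "events", // Spark-level option, not forwarded by this sketch
      "kafka.bootstrap.servers" -> "localhost:9092",
      // Placeholder class name, used for illustration only.
      "kafka.partitioner.class" -> "com.example.AlwaysZeroPartitioner")

    // Print the configs that would reach the producer under this assumption.
    producerParams(options).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Under that assumption, `kafka.partitioner.class` ends up as the producer's `partitioner.class`, while `topic` stays on the Spark side.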
   
   >     * Partition field is set under some circumstances
   
   Doesn't the other test I added already prove this point?
   
   In any case, I'm fine with adding two more tests:
   1. `kafka.partitioner.class` overrides default partitioner.
   2. `partition` column overrides `kafka.partitioner.class`. 
   
   I can write a simple Kafka `Partitioner` that always returns `0` and test it
   with the same pattern (`collect()` and then two topics, as you suggested).
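
The partitioner I have in mind could look roughly like this. It is only a sketch: `AlwaysZeroPartitioner` is a placeholder name, and it assumes `kafka-clients` is on the classpath.

```scala
import java.util.{Map => JMap}

import org.apache.kafka.clients.producer.Partitioner
import org.apache.kafka.common.Cluster

// Placeholder test partitioner: ignores the record and the cluster metadata
// and sends every record to partition 0.
class AlwaysZeroPartitioner extends Partitioner {
  // Nothing to configure for this sketch.
  override def configure(configs: JMap[String, _]): Unit = ()

  override def partition(
      topic: String,
      key: Any,
      keyBytes: Array[Byte],
      value: Any,
      valueBytes: Array[Byte],
      cluster: Cluster): Int = 0

  // No resources to release.
  override def close(): Unit = ()
}
```

The test would then set `.option("kafka.partitioner.class", classOf[AlwaysZeroPartitioner].getName)` on the Kafka writer and check that every record lands in partition `0`, unless a `partition` column says otherwise.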
   
