redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka URL: https://github.com/apache/spark/pull/26153#discussion_r336471883
########## File path: docs/structured-streaming-kafka-integration.md ########## @@ -622,6 +626,10 @@ a ```null``` valued key column will be automatically added (see Kafka semantics how ```null``` valued key values are handled). If a topic column exists then its value is used as the topic when writing the given row to Kafka, unless the "topic" configuration option is set i.e., the "topic" configuration option overrides the topic column. +If a partition column is not specified then the partition is calculated by the Kafka producer +(using ```org.apache.kafka.clients.producer.internals.DefaultPartitioner```). +This can be overridden in Spark by setting the ```kafka.partitioner.class``` option. Review comment: > What I would like to test here is that Spark parameterizes producer properly. > These testing points has nothing to do with producer internals: > * `kafka.partitioner.class` config reaches producer instances and takes effect I actually manually tested this and it does because they are passed down by other classes (see `val specifiedKafkaParams = kafkaParamsForProducer(caseInsensitiveParameters)`, line 177 of `KafkaSourceProvider`. Any config that starts with `kafka.` will be passed down to the producer, actually. The problem is that this is not stated in the doc (it only mentions two optional configs). Maybe I can update the doc in this respect? > * Partition field is set under some circumstances Don't the other test I added prove this point already? In any case, I'm fine with adding two more tests: 1. `kafka.partitioner.class` overrides default partitioner. 2. `partition` column overrides `kafka.partitioner.class`. I can write a simple Kafka Partitioner that always spits `0` and test it with the same pattern (`collect()` and then two topics as you suggested). ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
