Nimfadora commented on a change in pull request #23791: [SPARK-20597][SQL][SS][WIP] KafkaSourceProvider falls back on path as synonym for topic URL: https://github.com/apache/spark/pull/23791#discussion_r261634612
##########
File path: docs/structured-streaming-kafka-integration.md
##########
@@ -457,8 +463,17 @@ The following configurations are optional:
   <td>string</td>
   <td>none</td>
   <td>streaming and batch</td>
+  <td>Sets the topic that all rows will be written to in Kafka. This option overrides
+  ```path``` option and any topic column that may exist in the data.</td>
+</tr>
+<tr>
+  <td>path</td>
+  <td>string</td>
+  <td>none</td>
+  <td>streaming and batch</td>
   <td>Sets the topic that all rows will be written to in Kafka. This option overrides any
-  topic column that may exist in the data.</td>
+  topic column that may exist in the data and is overridden by ```topic``` option.

Review comment:
I agree with you that all three of them should be checked. However, the validation is currently split and duplicated between [KafkaWriter#validateQuery](https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala#L45), [KafkaWriteTask#createProjection](https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala#L101) and [KafkaSourceProvider#resolveTopic](https://github.com/apache/spark/pull/23791/files#diff-eeac5bdf3a1ecd7b9f8aaf10fff37f05R197).

We can either refactor this and move these validations into one place, or leave the code as is and add a validation that compares the topic column with the topic/path option to KafkaSourceProvider#validateQuery. The first option is more complicated and error-prone, but will result in more readable code. The second, on the other hand, will not require as much code to be rewritten. Which way do you think is right?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
