Nimfadora commented on a change in pull request #23791: 
[SPARK-20597][SQL][SS][WIP] KafkaSourceProvider falls back on path as synonym 
for topic
URL: https://github.com/apache/spark/pull/23791#discussion_r266027495
 
 

 ##########
 File path: docs/structured-streaming-kafka-integration.md
 ##########
 @@ -457,8 +463,17 @@ The following configurations are optional:
   <td>string</td>
   <td>none</td>
   <td>streaming and batch</td>
+  <td>Sets the topic that all rows will be written to in Kafka. This option 
overrides
+  ```path``` option and any topic column that may exist in the data.</td>
+</tr>
+<tr>
+  <td>path</td>
+  <td>string</td>
+  <td>none</td>
+  <td>streaming and batch</td>
   <td>Sets the topic that all rows will be written to in Kafka. This option 
overrides any
-  topic column that may exist in the data.</td>
+  topic column that may exist in the data and is overridden by ```topic``` 
option.
 
 Review comment:
   @gaborgsomogyi I agree with you, that first solution seems to be right way 
to go. I hope that I will not miss any place where it is used, while writing 
unit tests. My concern is about checking the topic column value. We cannot move 
the projection out of KafkaWriteTask, but if we move all the validation inside, 
than it would no longer be fail-fast: in current implementation we are checking 
all that we can before RDD creation. Although, I do not really understand why 
checks are duplicated between 
[KafkaWriteTask#createProjection](https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala#L101)
 and 
[KafkaWriter#validateQuery](https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala#L45)?
 As I can see validateQuery is always called before createProjection. 
@gaborgsomogyi @jaceklaskowski do you know any reason for this duplication?
   
![image](https://user-images.githubusercontent.com/10544767/54438390-a059ca80-4747-11e9-8416-13a16009e2bd.png)
   
   The solution I propose: 
   - remove duplicated checks from KafkaWriteTask#createProjection
   - move check for topic and path option match to KafkaWriter#validateQuery
   - add topic option and topic attribute match validation to KafkaWriteTask

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to