HeartSaVioR edited a comment on issue #24738: [WIP][SPARK-23098][SQL] Migrate Kafka Batch source to v2.
URL: https://github.com/apache/spark/pull/24738#issuecomment-497904088
 
 
   @rdblue 
   Maybe I haven't explained it properly. Logically, Kafka has a fixed schema (key/value), but the Kafka source and sink also provide/receive several kinds of metadata, which are useful for applying logic based on that information. For example, Kafka has use cases that reading from a specific table doesn't: when reading from multiple topics, end users want to know which topic each row came from and apply logic based on that, as in the sketch below.
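   A minimal sketch (Scala) of that multi-topic case, assuming a local broker and two hypothetical topics "orders" and "payments"; the `topic` metadata column is what makes the per-topic routing possible:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions._
   
   val spark = SparkSession.builder().appName("kafka-metadata-demo").getOrCreate()
   import spark.implicits._
   
   // Subscribe to more than one topic with a single source.
   val df = spark.readStream
     .format("kafka")
     .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
     .option("subscribe", "orders,payments")              // placeholder topics
     .load()
   
   // Use the `topic` metadata column to apply per-topic logic.
   val tagged = df
     .select($"topic", $"partition", $"offset", $"value".cast("string").as("value"))
     .withColumn("priority",
       when($"topic" === "orders", lit("high")).otherwise(lit("low")))
   ```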
   
   Please refer to the Structured Streaming + Kafka Integration Guide.
   
http://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
   
   > source (reader)
   
   Column | Type
   -- | --
   key | binary
   value | binary
   topic | string
   partition | int
   offset | long
   timestamp | timestamp
   timestampType | int
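   
   Since this PR migrates the batch source, here is the same schema surfacing on the batch path (a sketch with placeholder broker/topic; `printSchema()` output abridged):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder().appName("kafka-batch-schema").getOrCreate()
   
   val batchDf = spark.read
     .format("kafka")
     .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
     .option("subscribe", "orders")                       // placeholder topic
     .load()
   
   // The metadata columns come back alongside key/value.
   batchDf.printSchema()
   // root
   //  |-- key: binary (nullable = true)
   //  |-- value: binary (nullable = true)
   //  |-- topic: string (nullable = true)
   //  |-- partition: integer (nullable = true)
   //  |-- offset: long (nullable = true)
   //  |-- timestamp: timestamp (nullable = true)
   //  |-- timestampType: integer (nullable = true)
   ```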
   
   > sink (writer)
   
   Column | Type
   -- | --
   key (optional) | string or binary
   value (required) | string or binary
   topic (*optional) | string
   
   * The topic column is required if the “topic” configuration option is not specified.
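   
   A small sketch of the two ways the sink resolves the topic (placeholder broker/topic names; the batch write path is shown, and the streaming path is analogous):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder().appName("kafka-sink-demo").getOrCreate()
   import spark.implicits._
   
   val out = Seq(("orders", "k1", "v1"), ("payments", "k2", "v2"))
     .toDF("topic", "key", "value")
   
   // Option 1: per-row topic taken from the `topic` column.
   out.write
     .format("kafka")
     .option("kafka.bootstrap.servers", "localhost:9092")
     .save()
   
   // Option 2: fixed topic via the "topic" option; no `topic` column needed.
   out.select($"key", $"value").write
     .format("kafka")
     .option("kafka.bootstrap.servers", "localhost:9092")
     .option("topic", "orders")
     .save()
   ```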
   
   The Kafka writer's query validation is implemented in KafkaWriter:
   
   
https://github.com/apache/spark/blob/aec0869fb2ae1ace93056ee1f9ea50b1bdbae9ad/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala#L45-L78
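
   For reviewers who don't want to chase the permalink, the checks amount to roughly the following. This is a paraphrased sketch, not the verbatim code (Catalyst attributes are replaced with a plain name-to-type map); see the link above for the real implementation:
   
   ```scala
   // Paraphrased sketch of the checks in KafkaWriter's query validation.
   sealed trait ColType
   case object StringT extends ColType
   case object BinaryT extends ColType
   case object OtherT  extends ColType
   
   def validateQuery(schema: Map[String, ColType], topicOption: Option[String]): Unit = {
     // topic: required as a column only when the "topic" option is not set,
     // and must be a string when present.
     if (topicOption.isEmpty && !schema.contains("topic"))
       throw new IllegalArgumentException(
         "topic option required when no 'topic' attribute is present")
     schema.get("topic").foreach {
       case StringT => // ok
       case _ => throw new IllegalArgumentException("topic attribute must be a string")
     }
     // key: optional, but if present must be string or binary.
     schema.get("key").foreach {
       case StringT | BinaryT => // ok
       case _ => throw new IllegalArgumentException("key attribute must be string or binary")
     }
     // value: required, and must be string or binary.
     schema.get("value") match {
       case Some(StringT) | Some(BinaryT) => // ok
       case _ => throw new IllegalArgumentException("value attribute (string or binary) is required")
     }
   }
   ```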
