Re: [PR] [SPARK-45815][SQL][Streaming] Provide an interface for other Streaming sources to add `_metadata` columns [spark]

via GitHub Wed, 08 Nov 2023 03:14:19 -0800


cloud-fan commented on code in PR #43692:
URL: https://github.com/apache/spark/pull/43692#discussion_r1386458174



##########
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala:
##########
@@ -309,3 +309,22 @@ trait InsertableRelation {
 trait CatalystScan {
   def buildScan(requiredColumns: Seq[Attribute], filters: Seq[Expression]): RDD[Row]
 }
+
+/**
+ * Implemented by StreamSourceProvider objects that can generate file metadata columns.
+ */
+trait SupportsStreamSourceMetadataColumns extends StreamSourceProvider {
+
+  /**
+   * Returns the metadata columns that should be added to the schema of the Stream Data Source.
+   *
+   * @param spark The SparkSession used for the operation.
+   * @param options A map of options of the Stream Data Source.
+   * @param userSpecifiedSchema An optional user-provided schema of the Stream Data Source.
+   * @return A Seq of AttributeReference representing the metadata output attributes.

Review Comment:
   Can we return `StructType` instead? It's very weird to return
`AttributeReference`, since it is an internal API and it's unclear which attr ID
the implementations should use.
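
   To illustrate the suggestion, here is a rough sketch of what the trait
could look like if it returned `StructType`. The method name
`getMetadataOutput` and the example fields `file_path` and `file_size` are
assumptions made for illustration; they are not taken from the PR.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.sources.StreamSourceProvider
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Sketch only: a variant of the trait from the diff that exposes the
// metadata columns as a public StructType instead of AttributeReference.
trait SupportsStreamSourceMetadataColumns extends StreamSourceProvider {

  // `getMetadataOutput` is a hypothetical method name (assumption).
  // Implementations only describe names, types, and nullability; they
  // never have to pick an expression ID for the attributes.
  def getMetadataOutput(
      spark: SparkSession,
      options: Map[String, String],
      userSpecifiedSchema: Option[StructType]): StructType
}

object ExampleMetadataSchema {
  // Hypothetical metadata fields an implementation might return.
  val schema: StructType = StructType(Seq(
    StructField("file_path", StringType, nullable = false),
    StructField("file_size", LongType, nullable = false)))
}
```

   With this shape, turning the schema into attributes with fresh IDs would
stay inside Spark rather than in each source implementation.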




