[GitHub] [spark] amaliujia commented on a diff in pull request #40498: [SPARK-42878][CONNECT] The table API in DataFrameReader could also accept options
amaliujia commented on code in PR #40498:
URL: https://github.com/apache/spark/pull/40498#discussion_r1145563490

## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:

@@ -458,7 +458,9 @@ class DataFrameReader private[sql] (sparkSession: SparkSession) extends Logging
    */
   def table(tableName: String): DataFrame = {
     sparkSession.newDataFrame { builder =>
-      builder.getReadBuilder.getNamedTableBuilder.setUnparsedIdentifier(tableName)
+      builder.getReadBuilder.getNamedTableBuilder
+        .setUnparsedIdentifier(tableName)
+        .putAllOptions(extraOptions.toMap.asJava)

Review Comment: done

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
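The mechanics of this change can be sketched with plain Python stand-ins (FakeReader and the dict-based plan below are illustrative stubs, not the real Spark Connect DataFrameReader or proto messages): options accumulated on the reader are copied into the named-table relation when table() builds the plan, which is the effect of the putAllOptions call in the diff.

```python
# Illustrative stand-ins for the Connect client classes; not the real API.

class FakeReader:
    """Mimics a DataFrameReader: collects options, builds a named-table 'plan'."""

    def __init__(self):
        self._options = {}

    def option(self, key, value):
        self._options[key] = value
        return self  # chainable, like the real reader

    def table(self, table_name):
        # After the patch, the accumulated options are copied into the
        # named-table relation; before it, they were silently dropped.
        return {
            "read": {
                "named_table": {
                    "unparsed_identifier": table_name,
                    "options": dict(self._options),
                }
            }
        }

plan = FakeReader().option("mergeSchema", "true").table("db.events")
print(plan["read"]["named_table"]["options"])  # {'mergeSchema': 'true'}
```

The chainable option() call mirrors the builder pattern used by the real reader; the nested dict stands in for the Read/NamedTable proto messages.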
amaliujia commented on code in PR #40498:
URL: https://github.com/apache/spark/pull/40498#discussion_r1145530854

## python/pyspark/sql/connect/plan.py:

@@ -302,13 +302,16 @@ def plan(self, session: "SparkConnectClient") -> proto.Relation:

 class Read(LogicalPlan):
-    def __init__(self, table_name: str) -> None:
+    def __init__(self, table_name: str, options: Dict[str, str] = {}) -> None:

Review Comment: done!
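A side note on the signature in this diff: a mutable default such as options: Dict[str, str] = {} is evaluated once and shared across all calls in Python, a classic pitfall that reviewers commonly flag. A generic sketch of the problem and the conventional None-default fix (read_shared and read_safe are hypothetical helpers, not the actual plan.py code):

```python
from typing import Dict, Optional

def read_shared(table_name: str, options: Dict[str, str] = {}) -> Dict[str, str]:
    # Pitfall: this single default dict object is shared by every call.
    options[table_name] = "seen"
    return options

def read_safe(table_name: str, options: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # Fix: allocate a fresh dict per call when none is supplied.
    options = dict(options) if options is not None else {}
    options[table_name] = "seen"
    return options

print(read_shared("a"))  # {'a': 'seen'}
print(read_shared("b"))  # {'a': 'seen', 'b': 'seen'}  <- state leaked between calls
print(read_safe("a"))    # {'a': 'seen'}
print(read_safe("b"))    # {'b': 'seen'}
```

Copying the caller's dict in read_safe also prevents the function from mutating the caller's mapping in place.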
amaliujia commented on code in PR #40498:
URL: https://github.com/apache/spark/pull/40498#discussion_r1142929343

## connector/connect/common/src/main/protobuf/spark/connect/relations.proto:

@@ -148,6 +143,13 @@ message Read {
     // This is only supported by the JDBC data source.
     repeated string predicates = 5;
   }
+
+  // Options for data sources and named table.
+  //
+  // When using for data sources, the context of this map varies based on the
+  // data source format. This options could be empty for valid data source format.
+  // The map key is case insensitive.
+  map<string, string> options = 3;

Review Comment: Chose a non-breaking way to change the proto.
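The "map key is case insensitive" note in the doc comment describes how the server interprets the map: Spark normalizes option keys through a case-insensitive wrapper (CaseInsensitiveMap on the Scala side) rather than relying on the proto map itself, since proto map keys compare case-sensitively. A toy Python analogue of that lookup behavior, purely illustrative:

```python
class CaseInsensitiveOptions:
    """Toy analogue of a case-insensitive option map: keys are lower-cased
    on the way in, so 'Header' and 'header' refer to the same entry."""

    def __init__(self, options=None):
        self._data = {}
        for k, v in (options or {}).items():
            self._data[k.lower()] = v

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __contains__(self, key):
        return key.lower() in self._data

opts = CaseInsensitiveOptions({"Header": "true"})
opts["MERGESCHEMA"] = "false"
print(opts["header"], opts["mergeschema"])  # true false
```

A later write with a differently-cased key overwrites the earlier entry, which matches the intent that casing never creates two distinct options.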
amaliujia commented on code in PR #40498:
URL: https://github.com/apache/spark/pull/40498#discussion_r1142929171

## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:

@@ -183,7 +183,7 @@ class DataFrameReader private[sql] (sparkSession: SparkSession) extends Logging
     dataSourceBuilder.setFormat(source)
     userSpecifiedSchema.foreach(schema => dataSourceBuilder.setSchema(schema.toDDL))
     extraOptions.foreach { case (k, v) =>
-      dataSourceBuilder.putOptions(k, v)
+      builder.getReadBuilder.putOptions(k, v)

Review Comment: I found I can only add a meaningful test on the server side. On the client side there is no way to verify that an option has been passed through. In the existing codebase this is tested because we can do df.queryExecution.analyzed to get the complete plan and then fetch the options from it to verify them.