Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r198315709 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -109,6 +134,20 @@ class JDBCOptions( s"When reading JDBC data sources, users need to specify all or none for the following " + s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', '$JDBC_UPPER_BOUND', " + s"and '$JDBC_NUM_PARTITIONS'") + + require(!(query.isDefined && partitionColumn.isDefined), + s""" + |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not be specified together. + |Please define the query using `$JDBC_TABLE_NAME` option instead and make sure to qualify + |the partition columns using the supplied subquery alias to resolve any ambiguity. + |Example : + |spark.read.format("jdbc") + | .option("dbtable", "(select c1, c2 from t1) as subq") + | .option("partitionColumn", "subq.c1" + | .load() + """.stripMargin + ) --- End diff -- @maropu Currently we disallow it to be on the safe side. Lets take your example. When using the query option to pass on the query, we basically expect the users to supply ```SQL select c0 p0, c1 p1, c2 p2 from t where c0 > 1 ``` In spark , we will parentesize the query and add in an alias to confirm to the table subquery syntax. Given the user input the above query, he could decide to qualify the partition column names with the table name. So he could do the following : ``` SQL al df = spark.read .format("jdbc") .option("driver", "org.postgresql.Driver") .option("url", "jdbc:postgresql://localhost:5432/postgres?user=maropu") .option("query", "select c0 p0, c1 p1, c2 p2 from t where c0 > 1") .option("partitionColumn", "t.p2") ==> User qualifies the column names. .option("lowerBound", "1") .option("upperBound", "3") .option("numPartitions", "2") .load() ``` In this case we will end up generating the query of the following form - ``` SQL select * from (select c0 p0, c1 p1, c2 p2 from t where c0 > 1) __SPARK_GEN_ALIAS where t.p2 >= 1 and t.p2 <=3 ``` However this would be an invalid query. In the query option, its possible to specify a complex query involving joins. Thats the reason, we disallow it to be in safe side. In the dbtable option, users are responsible to explicitly specify the alias and would now how to qualify the partition columns. Lets see if we can improve this in future. If you have some ideas, please let us know.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org