[jira] [Commented] (SPARK-20259) Support push down join optimizations in DataFrameReader when loading from JDBC
[ https://issues.apache.org/jira/browse/SPARK-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962404#comment-15962404 ]

Hyukjin Kwon commented on SPARK-20259:
--------------------------------------

If so, I guess it is a duplicate of SPARK-12449. Assuming it refers to pushing down the join, I'd close this if it is not updated for a while, say a few days to a couple of weeks.

> Support push down join optimizations in DataFrameReader when loading from JDBC
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-20259
>                 URL: https://issues.apache.org/jira/browse/SPARK-20259
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.2, 2.1.0
>            Reporter: John Muller
>            Priority: Minor
>
> Given two dataframes loaded from the same JDBC connection:
> {code:title=UnoptimizedJDBCJoin.scala|borderStyle=solid}
> // load() already returns a DataFrame, so no further conversion is needed
> val ordersDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql:dbserver")
>   .option("dbtable", "northwind.orders")
>   .option("user", "username")
>   .option("password", "password")
>   .load()
>
> val productDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql:dbserver")
>   .option("dbtable", "northwind.product")
>   .option("user", "username")
>   .option("password", "password")
>   .load()
>
> ordersDF.createOrReplaceTempView("orders")
> productDF.createOrReplaceTempView("product")
> // Followed by a join between them:
> val ordersByProduct = spark.sql("""SELECT p.name, SUM(o.qty) AS qty FROM orders AS o
>   INNER JOIN product AS p ON o.product_id = p.product_id GROUP BY p.name""")
> {code}
> Catalyst should optimize the query to be:
> SELECT northwind.product.name, SUM(northwind.orders.qty)
> FROM northwind.orders
> INNER JOIN northwind.product ON
>   northwind.orders.product_id = northwind.product.product_id
> GROUP BY northwind.product.name

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20259) Support push down join optimizations in DataFrameReader when loading from JDBC
[ https://issues.apache.org/jira/browse/SPARK-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962400#comment-15962400 ]

Xiao Li commented on SPARK-20259:
---------------------------------

Pushing join into JDBC data sources?
[jira] [Commented] (SPARK-20259) Support push down join optimizations in DataFrameReader when loading from JDBC
[ https://issues.apache.org/jira/browse/SPARK-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961677#comment-15961677 ]

Hyukjin Kwon commented on SPARK-20259:
--------------------------------------

Could you describe the current status and why it should be like that?
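
Editorial note: until the join is pushed down automatically, a similar effect can usually be had by hand, because the JDBC source accepts anything valid in a FROM clause as `dbtable`, including a parenthesized subquery. The sketch below is not part of the thread. It reuses the placeholder URL, credentials, and table names from the issue description, while the object name `ManualJdbcJoinPushdown`, the method `ordersByProduct`, and the alias `orders_by_product` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ManualJdbcJoinPushdown {
  // The join and aggregation written as a derived table. Spark selects from
  // whatever is passed as `dbtable`, so the subquery is parenthesized and
  // given an alias (hypothetical name: orders_by_product).
  val pushedDownQuery: String =
    """(SELECT p.name AS name, SUM(o.qty) AS qty
      |   FROM northwind.orders AS o
      |   INNER JOIN northwind.product AS p ON o.product_id = p.product_id
      |   GROUP BY p.name) AS orders_by_product""".stripMargin

  // Loads the already-joined, already-aggregated rows from the database.
  // Connection settings are the placeholders used in the issue description.
  def ordersByProduct(spark: SparkSession): DataFrame =
    spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql:dbserver")
      .option("dbtable", pushedDownQuery)
      .option("user", "username")
      .option("password", "password")
      .load()
}
```

With this approach the database executes the join and Spark only receives the aggregated rows. The trade-off is that the SQL is written by hand in the source's dialect rather than derived by Catalyst, which is the gap the issue asks to close.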