[ 
https://issues.apache.org/jira/browse/SPARK-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-20259.
----------------------------------
    Resolution: Duplicate

Actually, the title refers to pushing down the join. I am resolving this as a duplicate.

> Support push down join optimizations in DataFrameReader when loading from JDBC
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-20259
>                 URL: https://issues.apache.org/jira/browse/SPARK-20259
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.2, 2.1.0
>            Reporter: John Muller
>            Priority: Minor
>
> Given two dataframes loaded from the same JDBC connection:
> {code:title=UnoptimizedJDBCJoin.scala|borderStyle=solid}
> val ordersDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql:dbserver")
>   .option("dbtable", "northwind.orders")
>   .option("user", "username")
>   .option("password", "password")
>   .load() // already a DataFrame; no toDS conversion is needed
>   
> val productDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql:dbserver")
>   .option("dbtable", "northwind.product")
>   .option("user", "username")
>   .option("password", "password")
>   .load() // already a DataFrame; no toDS conversion is needed
>   
> ordersDF.createOrReplaceTempView("orders")
> productDF.createOrReplaceTempView("product")
> // Followed by a join between them:
> val ordersByProduct = spark.sql(
>   """SELECT p.name, SUM(o.qty) AS qty FROM orders AS o
>     |INNER JOIN product AS p ON o.product_id = p.product_id GROUP BY p.name""".stripMargin)
> {code}
> Catalyst should push the join down to the database as a single query:
> SELECT northwind.product.name, SUM(northwind.orders.qty)
> FROM northwind.orders
> INNER JOIN northwind.product ON
>   northwind.orders.product_id = northwind.product.product_id
> GROUP BY northwind.product.name
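
Without such an optimization, the join can still be run in the database by hand, because the JDBC source's "dbtable" option accepts anything valid in a FROM clause, including a parenthesized subquery. A minimal sketch, assuming the report's placeholder connection settings; the object PushedDownJoin, its helpers asDbtable and jdbcOptions, and the alias orders_by_product are hypothetical names, not part of Spark:

```scala
// Sketch only: builds JDBC reader options whose "dbtable" is a join subquery,
// so the database runs the join and aggregation instead of Spark.
object PushedDownJoin {
  // The join from the report, written directly against the source tables.
  val joinQuery: String =
    """SELECT p.name, SUM(o.qty) AS qty
      |FROM northwind.orders AS o
      |INNER JOIN northwind.product AS p ON o.product_id = p.product_id
      |GROUP BY p.name""".stripMargin

  // Wrap a query in parentheses and give it an alias so it can stand in
  // for a table name (the alias is chosen by the caller).
  def asDbtable(query: String, alias: String): String = s"($query) AS $alias"

  // Connection settings copied from the report's placeholders.
  def jdbcOptions(dbtable: String): Map[String, String] = Map(
    "url" -> "jdbc:postgresql:dbserver",
    "dbtable" -> dbtable,
    "user" -> "username",
    "password" -> "password"
  )
}
```

With a SparkSession named spark, the options would be passed as spark.read.format("jdbc").options(PushedDownJoin.jdbcOptions(PushedDownJoin.asDbtable(PushedDownJoin.joinQuery, "orders_by_product"))).load(). The result is a single DataFrame holding the aggregated rows, so Spark performs no join of its own.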



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
