[jira] [Commented] (SPARK-20259) Support push down join optimizations in DataFrameReader when loading from JDBC

2017-04-09 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962404#comment-15962404 ]

Hyukjin Kwon commented on SPARK-20259:
--

If so, I guess it is a duplicate of SPARK-12449. I'd close this if it is not 
updated for a while, say a couple of weeks, assuming it refers to pushing 
down the join.
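
For reference, the join in the description below can already be pushed to the 
database manually by handing the JDBC source a subquery instead of a table 
name, which is the kind of rewrite SPARK-12449 would automate. A minimal 
sketch against the Northwind schema from the description (the alias "joined" 
is arbitrary):

{code:title=ManualJoinPushdown.scala|borderStyle=solid}
// Workaround: pass a parenthesized subquery as "dbtable". Spark embeds it
// in the FROM clause of the query it sends over JDBC, so the join and the
// aggregation run inside PostgreSQL instead of in Spark.
val ordersByProduct = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql:dbserver")
  .option("dbtable",
    """(SELECT p.name, SUM(o.qty) AS qty
       FROM northwind.orders AS o
       INNER JOIN northwind.product AS p ON o.product_id = p.product_id
       GROUP BY p.name) AS joined""")
  .option("user", "username")
  .option("password", "password")
  .load()
{code}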

> Support push down join optimizations in DataFrameReader when loading from JDBC
> --
>
> Key: SPARK-20259
> URL: https://issues.apache.org/jira/browse/SPARK-20259
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.2, 2.1.0
>Reporter: John Muller
>Priority: Minor
>
> Given two dataframes loaded from the same JDBC connection:
> {code:title=UnoptimizedJDBCJoin.scala|borderStyle=solid}
> val ordersDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql:dbserver")
>   .option("dbtable", "northwind.orders")
>   .option("user", "username")
>   .option("password", "password")
>   .load()
>   
> val productDF = spark.read
>   .format("jdbc")
>   .option("url", "jdbc:postgresql:dbserver")
>   .option("dbtable", "northwind.product")
>   .option("user", "username")
>   .option("password", "password")
>   .load()
>   
> ordersDF.createOrReplaceTempView("orders")
> productDF.createOrReplaceTempView("product")
> // Followed by a join between them:
> val ordersByProduct = spark.sql(
>   """SELECT p.name, SUM(o.qty) AS qty FROM orders AS o
>      INNER JOIN product AS p ON o.product_id = p.product_id
>      GROUP BY p.name""")
> {code}
> Catalyst should optimize the query to be:
> {code:title=OptimizedJDBCJoin.sql|borderStyle=solid}
> SELECT northwind.product.name, SUM(northwind.orders.qty) AS qty
> FROM northwind.orders
> INNER JOIN northwind.product ON
>   northwind.orders.product_id = northwind.product.product_id
> GROUP BY northwind.product.name
> {code}
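
In the affected versions no such rewrite happens: Spark issues one SELECT per 
"dbtable" and runs the join and aggregation itself. A quick way to confirm 
this is to inspect the plan, which should show two separate JDBC scans 
feeding a Spark-side join rather than a single pushed-down query (a sketch; 
the exact plan output varies by version):

{code:title=InspectPlan.scala|borderStyle=solid}
// Prints the parsed, analyzed, optimized, and physical plans. Expect two
// JDBCRelation scans (orders and product) under a Spark join operator.
ordersByProduct.explain(true)
{code}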





[jira] [Commented] (SPARK-20259) Support push down join optimizations in DataFrameReader when loading from JDBC

2017-04-09 Thread Xiao Li (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962400#comment-15962400 ]

Xiao Li commented on SPARK-20259:
-

Pushing join into JDBC data sources?



[jira] [Commented] (SPARK-20259) Support push down join optimizations in DataFrameReader when loading from JDBC

2017-04-07 Thread Hyukjin Kwon (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961677#comment-15961677 ]

Hyukjin Kwon commented on SPARK-20259:
--

Could you describe the current status and why it should be like that?




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org