[jira] [Commented] (SPARK-13333) DataFrame filter + randn + unionAll has bad interaction

2017-06-15 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050776#comment-16050776
 ] 

Xiao Li commented on SPARK-13333:
-

This function is still missing in the SQL interface.

We can achieve resolution by name by using the CORRESPONDING BY clause.
For example:

{noformat}
(select * from t1) union corresponding by (c1, c2) (select * from t2);
{noformat}
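
For reference, a minimal DataFrame-level sketch of the same by-name resolution, assuming a SparkSession named {{spark}} and that {{t1}}/{{t2}} are registered tables with columns {{c1}} and {{c2}} as in the SQL above; selecting the shared columns in the same order before {{union}} reproduces the CORRESPONDING BY behavior by hand:

{code}
// Hedged sketch: emulate UNION ... CORRESPONDING BY (c1, c2) on the DataFrame side.
// union() resolves columns by position, so projecting the shared columns in the
// same order first gives by-name semantics manually.
val t1 = spark.table("t1")
val t2 = spark.table("t2")
val common = Seq("c1", "c2")
val unioned = t1.selectExpr(common: _*).union(t2.selectExpr(common: _*))
unioned.show()
{code}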


> DataFrame filter + randn + unionAll has bad interaction
> ---
>
> Key: SPARK-13333
> URL: https://issues.apache.org/jira/browse/SPARK-13333
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.2, 1.6.1, 2.0.0
>Reporter: Joseph K. Bradley
>
> Buggy workflow
> * Create a DataFrame df0
> * Filter df0
> * Add a randn column
> * Create a copy of the DataFrame
> * unionAll the two DataFrames
> This fails: randn produces the same results on the original DataFrame and on 
> the copy before unionAll, but produces different results after unionAll.  Removing 
> the filter fixes the problem.
> The bug can be reproduced on master:
> {code}
> import org.apache.spark.sql.functions.randn
> val df0 = sqlContext.createDataFrame(Seq(0, 1).map(Tuple1(_))).toDF("id")
> // Removing the following filter() call makes this give the expected result.
> val df1 = df0.filter(col("id") === 0).withColumn("b", randn(12345))
> println("DF1")
> df1.show()
> val df2 = df1.select("id", "b")
> println("DF2")
> df2.show()  // same as df1.show(), as expected
> val df3 = df1.unionAll(df2)
> println("DF3")
> df3.show()  // NOT two copies of df1, which is unexpected
> {code}
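
A commonly used workaround (not taken from this ticket, just a hedged sketch): materialize {{df1}} before the union so the nondeterministic {{randn}} column is evaluated exactly once and both branches read the same values.

{code}
// Hedged workaround sketch: cache and force evaluation of df1 so randn is
// computed once; both union branches then read the cached rows.
val df1Cached = df1.cache()
df1Cached.count()  // force evaluation
val df3Fixed = df1Cached.unionAll(df1Cached.select("id", "b"))
df3Fixed.show()    // should now show two identical copies of df1
{code}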






[jira] [Created] (SPARK-21111) Fix test failure in 2.2

2017-06-15 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21111:
---

 Summary: Fix test failure in 2.2 
 Key: SPARK-21111
 URL: https://issues.apache.org/jira/browse/SPARK-21111
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.2.0
Reporter: Xiao Li
Assignee: Xiao Li
Priority: Blocker


Test failure:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/






[jira] [Created] (SPARK-21112) ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT

2017-06-15 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21112:
---

 Summary: ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
 Key: SPARK-21112
 URL: https://issues.apache.org/jira/browse/SPARK-21112
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Xiao Li
Assignee: Xiao Li


{{ALTER TABLE SET TBLPROPERTIES}} should not overwrite the table COMMENT, even when 
the supplied properties do not include a {{COMMENT}} entry.
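
A minimal repro sketch (not from the ticket; the table name and property are illustrative, and a Hive-enabled session is assumed):

{code}
// Hedged sketch: the table comment should survive a SET TBLPROPERTIES call
// that does not mention COMMENT at all.
spark.sql("CREATE TABLE t (a INT) COMMENT 'my comment'")
spark.sql("ALTER TABLE t SET TBLPROPERTIES ('k' = 'v')")
spark.sql("DESC EXTENDED t").show(numRows = 100, truncate = false)  // comment should still be 'my comment'
{code}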






[jira] [Updated] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21114:

Description: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
  (was: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/)

> Test failure fix in Spark 2.1 due to name mismatch
> --
>
> Key: SPARK-21114
> URL: https://issues.apache.org/jira/browse/SPARK-21114
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/






[jira] [Created] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch

2017-06-15 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21114:
---

 Summary: Test failure fix in Spark 2.1 due to name mismatch
 Key: SPARK-21114
 URL: https://issues.apache.org/jira/browse/SPARK-21114
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.1.1
Reporter: Xiao Li
Assignee: Xiao Li


https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/






[jira] [Updated] (SPARK-21114) Test failure in Spark 2.1 due to name mismatch

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21114:

Summary: Test failure in Spark 2.1 due to name mismatch  (was: Test failure 
fix in Spark 2.1 due to name mismatch)

> Test failure in Spark 2.1 due to name mismatch
> --
>
> Key: SPARK-21114
> URL: https://issues.apache.org/jira/browse/SPARK-21114
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/






[jira] [Updated] (SPARK-21114) Test failure in Spark 2.1 due to name mismatch

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21114:

Affects Version/s: 2.0.2

> Test failure in Spark 2.1 due to name mismatch
> --
>
> Key: SPARK-21114
> URL: https://issues.apache.org/jira/browse/SPARK-21114
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/






[jira] [Updated] (SPARK-21114) Test failure in Spark 2.1 due to name mismatch

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21114:

Description: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/

  
was:https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/


> Test failure in Spark 2.1 due to name mismatch
> --
>
> Key: SPARK-21114
> URL: https://issues.apache.org/jira/browse/SPARK-21114
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/






[jira] [Updated] (SPARK-21114) Test failure in Spark 2.1 due to name mismatch

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21114:

Description: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

  was:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/


> Test failure in Spark 2.1 due to name mismatch
> --
>
> Key: SPARK-21114
> URL: https://issues.apache.org/jira/browse/SPARK-21114
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/






[jira] [Assigned] (SPARK-20749) Built-in SQL Function Support - all variants of LEN[GTH]

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-20749:
---

Assignee: Kazuaki Ishizaki

> Built-in SQL Function Support - all variants of LEN[GTH]
> 
>
> Key: SPARK-20749
> URL: https://issues.apache.org/jira/browse/SPARK-20749
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Kazuaki Ishizaki
>  Labels: starter
> Fix For: 2.3.0
>
>
> {noformat}
> LEN[GTH]()
> {noformat}
> The SQL 99 standard includes BIT_LENGTH(), CHAR_LENGTH(), and OCTET_LENGTH() 
> functions.
> We need to support all of them.
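
An illustrative example of the three variants (assumes Spark 2.3.0+, where this ticket is marked fixed; the expected values are for the 5-character ASCII literal below):

{code}
// CHAR_LENGTH counts characters, OCTET_LENGTH counts bytes, BIT_LENGTH counts bits.
// For the ASCII literal 'Spark' the expected results are 5, 5 and 40.
spark.sql("SELECT CHAR_LENGTH('Spark'), OCTET_LENGTH('Spark'), BIT_LENGTH('Spark')").show()
{code}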






[jira] [Resolved] (SPARK-20749) Built-in SQL Function Support - all variants of LEN[GTH]

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20749.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Built-in SQL Function Support - all variants of LEN[GTH]
> 
>
> Key: SPARK-20749
> URL: https://issues.apache.org/jira/browse/SPARK-20749
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Kazuaki Ishizaki
>  Labels: starter
> Fix For: 2.3.0
>
>
> {noformat}
> LEN[GTH]()
> {noformat}
> The SQL 99 standard includes BIT_LENGTH(), CHAR_LENGTH(), and OCTET_LENGTH() 
> functions.
> We need to support all of them.






[jira] [Commented] (SPARK-20752) Build-in SQL Function Support - SQRT

2017-06-15 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051441#comment-16051441
 ] 

Xiao Li commented on SPARK-20752:
-

Yes!

> Build-in SQL Function Support - SQRT
> 
>
> Key: SPARK-20752
> URL: https://issues.apache.org/jira/browse/SPARK-20752
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>  Labels: starter
>
> {noformat}
> SQRT()
> {noformat}
> Returns Power(, 2)






[jira] [Resolved] (SPARK-20750) Built-in SQL Function Support - REPLACE

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20750.
-
   Resolution: Fixed
 Assignee: Kazuaki Ishizaki
Fix Version/s: 2.3.0

> Built-in SQL Function Support - REPLACE
> ---
>
> Key: SPARK-20750
> URL: https://issues.apache.org/jira/browse/SPARK-20750
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Kazuaki Ishizaki
>  Labels: starter
> Fix For: 2.3.0
>
>
> {noformat}
> REPLACE(,  [, ])
> {noformat}
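
A hedged usage example (assumes Spark 2.3.0+, where this ticket is marked fixed): the third argument is optional, and omitting it removes every occurrence of the search string.

{code}
spark.sql("SELECT replace('ABCabc', 'abc', 'DEF')").show()  // expected: ABCDEF
spark.sql("SELECT replace('ABCabc', 'abc')").show()         // expected: ABC
{code}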






[jira] [Closed] (SPARK-20752) Build-in SQL Function Support - SQRT

2017-06-15 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li closed SPARK-20752.
---
Resolution: Duplicate

> Build-in SQL Function Support - SQRT
> 
>
> Key: SPARK-20752
> URL: https://issues.apache.org/jira/browse/SPARK-20752
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>  Labels: starter
>
> {noformat}
> SQRT()
> {noformat}
> Returns Power(, 2)






[jira] [Resolved] (SPARK-21119) unset table properties should keep the table comment

2017-06-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21119.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> unset table properties should keep the table comment
> 
>
> Key: SPARK-21119
> URL: https://issues.apache.org/jira/browse/SPARK-21119
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.3.0
>
>







[jira] [Resolved] (SPARK-21089) Table properties are not shown in DESC EXTENDED/FORMATTED

2017-06-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21089.
-
Resolution: Fixed

> Table properties are not shown in DESC EXTENDED/FORMATTED
> -
>
> Key: SPARK-21089
> URL: https://issues.apache.org/jira/browse/SPARK-21089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
>
> Since both table properties and storage properties share the same key values, 
> table properties are not shown in the output of DESC EXTENDED/FORMATTED when 
> the storage properties are not empty. 
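
A minimal repro sketch (not from the ticket; the table name, storage option and property key are illustrative assumptions):

{code}
// Hedged sketch: a table that has both storage options and table properties.
// DESC EXTENDED should list the TBLPROPERTIES entry as well.
spark.sql("CREATE TABLE t (a INT) USING parquet OPTIONS (compression 'snappy')")
spark.sql("ALTER TABLE t SET TBLPROPERTIES ('owner' = 'etl')")
spark.sql("DESC EXTENDED t").show(numRows = 100, truncate = false)
{code}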






[jira] [Updated] (SPARK-21089) Table properties are not shown in DESC EXTENDED/FORMATTED

2017-06-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21089:

Fix Version/s: 2.2.0

> Table properties are not shown in DESC EXTENDED/FORMATTED
> -
>
> Key: SPARK-21089
> URL: https://issues.apache.org/jira/browse/SPARK-21089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
> Fix For: 2.2.0
>
>
> Since both table properties and storage properties share the same key values, 
> table properties are not shown in the output of DESC EXTENDED/FORMATTED when 
> the storage properties are not empty. 






[jira] [Created] (SPARK-21129) Arguments of SQL function call should not be named expressions

2017-06-17 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21129:
---

 Summary: Arguments of SQL function call should not be named 
expressions
 Key: SPARK-21129
 URL: https://issues.apache.org/jira/browse/SPARK-21129
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.1, 2.0.2, 2.2.0
Reporter: Xiao Li
Assignee: Xiao Li


Function arguments should not be named expressions. Treating them as named 
expressions can produce a misleading error message:

{noformat}
spark-sql> select count(distinct c1, distinct c2) from t1;
{noformat}
{noformat}
Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; line 
1 pos 26;
'Project [unresolvedalias('count(c1#30, 'distinct), None)]
+- SubqueryAlias t1
   +- CatalogRelation `default`.`t1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
{noformat}
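
For contrast, a hedged sketch of the accepted form: DISTINCT is written once and applies to the whole argument tuple, so no argument is mistaken for a column named {{distinct}}.

{code}
spark.sql("SELECT count(DISTINCT c1, c2) FROM t1").show()
{code}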






[jira] [Created] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored

2017-06-17 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21132:
---

 Summary: DISTINCT modifier of function arguments should not be 
silently ignored
 Key: SPARK-21132
 URL: https://issues.apache.org/jira/browse/SPARK-21132
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Xiao Li
Assignee: Xiao Li


The DISTINCT modifier of function arguments should not be silently ignored when 
it is not supported.






[jira] [Updated] (SPARK-21129) Arguments of SQL function call should not be named expressions

2017-06-17 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21129:

Affects Version/s: (was: 2.1.1)
   (was: 2.0.2)

> Arguments of SQL function call should not be named expressions
> --
>
> Key: SPARK-21129
> URL: https://issues.apache.org/jira/browse/SPARK-21129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> Function argument should not be named expressions. It could cause misleading 
> error message.
> {noformat}
> spark-sql> select count(distinct c1, distinct c2) from t1;
> {noformat}
> {noformat}
> Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; 
> line 1 pos 26;
> 'Project [unresolvedalias('count(c1#30, 'distinct), None)]
> +- SubqueryAlias t1
>+- CatalogRelation `default`.`t1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
> {noformat}






[jira] [Updated] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored

2017-06-17 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21132:

Affects Version/s: 2.0.2
   2.1.1

> DISTINCT modifier of function arguments should not be silently ignored
> --
>
> Key: SPARK-21132
> URL: https://issues.apache.org/jira/browse/SPARK-21132
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> DISTINCT modifier of function arguments should not be silently ignored when 
> it is not being supported. 






[jira] [Resolved] (SPARK-20948) Built-in SQL Function UnaryMinus/UnaryPositive support string type

2017-06-18 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20948.
-
   Resolution: Fixed
 Assignee: Yuming Wang
Fix Version/s: 2.3.0

> Built-in SQL Function UnaryMinus/UnaryPositive support string type
> --
>
> Key: SPARK-20948
> URL: https://issues.apache.org/jira/browse/SPARK-20948
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Fix For: 2.3.0
>
>
> {{UnaryMinus}}/{{UnaryPositive}} function should support string type, same as 
> hive:
> {code:sql}
> $ bin/hive
> Logging initialized using configuration in 
> jar:file:/home/wym/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
> hive> select positive('-1.11'), negative('-1.11');
> OK
> -1.11   1.11
> Time taken: 1.937 seconds, Fetched: 1 row(s)
> hive> 
> {code}
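
A hedged sketch of the corresponding calls in Spark SQL (assumes Spark 2.3.0+, where this ticket is marked fixed):

{code}
spark.sql("SELECT positive('-1.11'), negative('-1.11')").show()
// expected, matching the Hive output above: -1.11 and 1.11
{code}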






[jira] [Resolved] (SPARK-19824) Standalone master JSON not showing cores for running applications

2017-06-18 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-19824.
-
  Resolution: Fixed
Assignee: Jiang Xingbo
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> Standalone master JSON not showing cores for running applications
> -
>
> Key: SPARK-19824
> URL: https://issues.apache.org/jira/browse/SPARK-19824
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.0
>Reporter: Dan
>Assignee: Jiang Xingbo
>Priority: Minor
> Fix For: 2.3.0
>
>
> The JSON API of the standalone master ("/json") does not show the number of 
> cores for a running application, which is available on the UI.
>   "activeapps" : [ {
> "starttime" : 1488702337788,
> "id" : "app-20170305102537-19717",
> "name" : "POPAI_Aggregated",
> "user" : "ibiuser",
> "memoryperslave" : 16384,
> "submitdate" : "Sun Mar 05 10:25:37 IST 2017",
> "state" : "RUNNING",
> "duration" : 1141934
>   } ],
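
A hedged sketch of how the endpoint is read (host and port are assumptions; the point is that the "activeapps" entries above report memory but no core count):

{code}
// Fetch the standalone master's JSON view; before the fix the running
// applications carry "memoryperslave" but no cores field.
val json = scala.io.Source.fromURL("http://master-host:8080/json").mkString
println(json)
{code}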






[jira] [Resolved] (SPARK-19975) Add map_keys and map_values functions to Python

2017-06-19 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-19975.
-
   Resolution: Fixed
 Assignee: Yong Tang
Fix Version/s: 2.3.0

> Add map_keys and map_values functions  to Python 
> -
>
> Key: SPARK-19975
> URL: https://issues.apache.org/jira/browse/SPARK-19975
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.1.0
>Reporter: Maciej BryƄski
>Assignee: Yong Tang
> Fix For: 2.3.0
>
>
> We have `map_keys` and `map_values` functions in SQL.
> There are no Python equivalents for them.
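
A hedged illustration of the functions being exposed to Python; they already exist in SQL, for example:

{code}
spark.sql("SELECT map_keys(map('a', 1, 'b', 2)), map_values(map('a', 1, 'b', 2))").show(truncate = false)
// expected: [a, b] and [1, 2]
{code}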






[jira] [Created] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21144:
---

 Summary: Unexpected results when the data schema and partition 
schema have the duplicate columns
 Key: SPARK-21144
 URL: https://issues.apache.org/jira/browse/SPARK-21144
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Xiao Li


{noformat}
withTempPath { dir =>
  val basePath = dir.getCanonicalPath
  spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, 
"foo=1").toString)
  spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, 
"foo=a").toString)
  spark.read.parquet(basePath).show()
}
{noformat}

The result of the above case is
{noformat}
+---+
|foo|
+---+
|  1|
|  1|
|  a|
|  a|
|  1|
|  a|
+---+
{noformat}







[jira] [Commented] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054788#comment-16054788
 ] 

Xiao Li commented on SPARK-21144:
-

cc [~maropu]

> Unexpected results when the data schema and partition schema have the 
> duplicate columns
> ---
>
> Key: SPARK-21144
> URL: https://issues.apache.org/jira/browse/SPARK-21144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>
> {noformat}
> withTempPath { dir =>
>   val basePath = dir.getCanonicalPath
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, 
> "foo=1").toString)
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, 
> "foo=a").toString)
>   spark.read.parquet(basePath).show()
> }
> {noformat}
> The result of the above case is
> {noformat}
> +---+
> |foo|
> +---+
> |  1|
> |  1|
> |  a|
> |  a|
> |  1|
> |  a|
> +---+
> {noformat}






[jira] [Updated] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21144:

Target Version/s: 2.2.0

> Unexpected results when the data schema and partition schema have the 
> duplicate columns
> ---
>
> Key: SPARK-21144
> URL: https://issues.apache.org/jira/browse/SPARK-21144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>
> {noformat}
> withTempPath { dir =>
>   val basePath = dir.getCanonicalPath
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, 
> "foo=1").toString)
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, 
> "foo=a").toString)
>   spark.read.parquet(basePath).show()
> }
> {noformat}
> The result of the above case is
> {noformat}
> +---+
> |foo|
> +---+
> |  1|
> |  1|
> |  a|
> |  a|
> |  1|
> |  a|
> +---+
> {noformat}






[jira] [Created] (SPARK-21150) Persistent view stored in Hive metastore should be case preserving.

2017-06-19 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21150:
---

 Summary: Persistent view stored in Hive metastore should be case 
preserving.
 Key: SPARK-21150
 URL: https://issues.apache.org/jira/browse/SPARK-21150
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Xiao Li


{noformat}
withView("view1") {
  spark.sql("CREATE VIEW view1 AS SELECT 1 AS cAsEpReSeRvE, 2 AS aBcD")
  val metadata = new MetadataBuilder().putString(types.HIVE_TYPE_STRING, 
"int").build()

  val expectedSchema = StructType(List(
StructField("cAsEpReSeRvE", IntegerType, nullable = false, metadata),
StructField("aBcD", IntegerType, nullable = false, metadata)))
  assert(spark.table("view1").schema == expectedSchema, "Schema should 
match")
  checkAnswer(
sql("select aBcD, cAsEpReSeRvE from view1"),
Row(2, 1))
}
{noformat}

The column names of persistent view stored in Hive metastore should be case 
preserving.






[jira] [Assigned] (SPARK-21150) Persistent view stored in Hive metastore should be case preserving.

2017-06-19 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-21150:
---

Assignee: Jiang Xingbo
Target Version/s: 2.2.0
Priority: Blocker  (was: Major)

> Persistent view stored in Hive metastore should be case preserving.
> ---
>
> Key: SPARK-21150
> URL: https://issues.apache.org/jira/browse/SPARK-21150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Jiang Xingbo
>Priority: Blocker
>
> {noformat}
> withView("view1") {
>   spark.sql("CREATE VIEW view1 AS SELECT 1 AS cAsEpReSeRvE, 2 AS aBcD")
>   val metadata = new MetadataBuilder().putString(types.HIVE_TYPE_STRING, 
> "int").build()
>   val expectedSchema = StructType(List(
> StructField("cAsEpReSeRvE", IntegerType, nullable = false, metadata),
> StructField("aBcD", IntegerType, nullable = false, metadata)))
>   assert(spark.table("view1").schema == expectedSchema, "Schema should 
> match")
>   checkAnswer(
> sql("select aBcD, cAsEpReSeRvE from view1"),
> Row(2, 1))
> }
> {noformat}
> The column names of persistent view stored in Hive metastore should be case 
> preserving.






[jira] [Assigned] (SPARK-21150) Persistent view stored in Hive metastore should be case preserving.

2017-06-19 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-21150:
---

Assignee: (was: Jiang Xingbo)

> Persistent view stored in Hive metastore should be case preserving.
> ---
>
> Key: SPARK-21150
> URL: https://issues.apache.org/jira/browse/SPARK-21150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Priority: Blocker
>
> {noformat}
> withView("view1") {
>   spark.sql("CREATE VIEW view1 AS SELECT 1 AS cAsEpReSeRvE, 2 AS aBcD")
>   val metadata = new MetadataBuilder().putString(types.HIVE_TYPE_STRING, 
> "int").build()
>   val expectedSchema = StructType(List(
> StructField("cAsEpReSeRvE", IntegerType, nullable = false, metadata),
> StructField("aBcD", IntegerType, nullable = false, metadata)))
>   assert(spark.table("view1").schema == expectedSchema, "Schema should 
> match")
>   checkAnswer(
> sql("select aBcD, cAsEpReSeRvE from view1"),
> Row(2, 1))
> }
> {noformat}
> The column names of persistent view stored in Hive metastore should be case 
> preserving.






[jira] [Resolved] (SPARK-21150) Persistent view stored in Hive metastore should be case preserving.

2017-06-20 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21150.
-
   Resolution: Fixed
 Assignee: Wenchen Fan
Fix Version/s: 2.2.0

> Persistent view stored in Hive metastore should be case preserving.
> ---
>
> Key: SPARK-21150
> URL: https://issues.apache.org/jira/browse/SPARK-21150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 2.2.0
>
>
> {noformat}
> withView("view1") {
>   spark.sql("CREATE VIEW view1 AS SELECT 1 AS cAsEpReSeRvE, 2 AS aBcD")
>   val metadata = new MetadataBuilder().putString(types.HIVE_TYPE_STRING, 
> "int").build()
>   val expectedSchema = StructType(List(
> StructField("cAsEpReSeRvE", IntegerType, nullable = false, metadata),
> StructField("aBcD", IntegerType, nullable = false, metadata)))
>   assert(spark.table("view1").schema == expectedSchema, "Schema should 
> match")
>   checkAnswer(
> sql("select aBcD, cAsEpReSeRvE from view1"),
> Row(2, 1))
> }
> {noformat}
> The column names of persistent view stored in Hive metastore should be case 
> preserving.






[jira] [Assigned] (SPARK-10655) Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT

2017-06-20 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-10655:
---

Assignee: Suresh Thalamati

> Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT
> -
>
> Key: SPARK-10655
> URL: https://issues.apache.org/jira/browse/SPARK-10655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Suresh Thalamati
>Assignee: Suresh Thalamati
> Fix For: 2.3.0
>
>
> The default type mapping does not work when reading DB2 tables that contain 
> XML or DECFLOAT columns, or when writing DECIMAL values.
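
A hedged sketch of the kind of dialect customization involved (not the actual patch; the concrete type choices below are assumptions for illustration):

{code}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

object DB2CustomDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:db2")

  // Map DB2-specific types to Catalyst types on read; None falls back to defaults.
  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    typeName match {
      case "XML"      => Some(StringType)  // read XML columns as strings
      case "DECFLOAT" => Some(DoubleType)  // illustrative, possibly lossy choice
      case _          => None
    }

  // Pick an explicit JDBC type for writes where the default mapping is wrong.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("CLOB", Types.CLOB))
    case _          => None
  }
}

JdbcDialects.registerDialect(DB2CustomDialect)
{code}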






[jira] [Resolved] (SPARK-10655) Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT

2017-06-20 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-10655.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT
> -
>
> Key: SPARK-10655
> URL: https://issues.apache.org/jira/browse/SPARK-10655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Suresh Thalamati
>Assignee: Suresh Thalamati
> Fix For: 2.3.0
>
>
> Default type mapping does not work when reading from DB2 table that contains  
> XML,  DECFLOAT  for READ , and DECIMAL type for write. 






[jira] [Resolved] (SPARK-17851) Make sure all test sqls in catalyst pass checkAnalysis

2017-06-21 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-17851.
-
   Resolution: Fixed
 Assignee: Jiang Xingbo
Fix Version/s: 2.3.0

> Make sure all test sqls in catalyst pass checkAnalysis
> --
>
> Key: SPARK-17851
> URL: https://issues.apache.org/jira/browse/SPARK-17851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Jiang Xingbo
>Assignee: Jiang Xingbo
>Priority: Minor
> Fix For: 2.3.0
>
>
> Currently, several tens of the test SQL statements in catalyst fail at 
> `SimpleAnalyzer.checkAnalysis`; we should make sure they are valid.






[jira] [Created] (SPARK-21164) Remove isTableSample from Sample

2017-06-21 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21164:
---

 Summary: Remove isTableSample from Sample
 Key: SPARK-21164
 URL: https://issues.apache.org/jira/browse/SPARK-21164
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.2.0
Reporter: Xiao Li
Assignee: Xiao Li


{{isTableSample}} was introduced for SQL Generation. Since SQL Generation is 
removed, we do not need to keep {{isTableSample}}. 






[jira] [Commented] (SPARK-21165) Fail to write into partitioned hive table due to attribute reference not working with cast on partition column

2017-06-21 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058440#comment-16058440
 ] 

Xiao Li commented on SPARK-21165:
-

Unable to reproduce it in the current master branch. Will try to use 2.2 RC5 
later

> Fail to write into partitioned hive table due to attribute reference not 
> working with cast on partition column
> --
>
> Key: SPARK-21165
> URL: https://issues.apache.org/jira/browse/SPARK-21165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Imran Rashid
>Priority: Blocker
>
> A simple "insert into ... select" involving partitioned hive tables fails.  
> Here's a simpler repro which doesn't involve hive at all -- this succeeds on 
> 2.1.1, but fails on 2.2.0-rc5:
> {noformat}
> spark.sql("""SET hive.exec.dynamic.partition.mode=nonstrict""")
> spark.sql("""DROP TABLE IF EXISTS src""")
> spark.sql("""DROP TABLE IF EXISTS dest""")
> spark.sql("""
> CREATE TABLE src (first string, word string)
>   PARTITIONED BY (length int)
> """)
> spark.sql("""
> INSERT INTO src PARTITION(length) VALUES
>   ('a', 'abc', 3),
>   ('b', 'bcde', 4),
>   ('c', 'cdefg', 5)
> """)
> spark.sql("""
>   CREATE TABLE dest (word string, length int)
> PARTITIONED BY (first string)
> """)
> spark.sql("""
>   INSERT INTO TABLE dest PARTITION(first) SELECT word, length, cast(first as 
> string) as first FROM src
> """)
> {noformat}
> The exception is
> {noformat}
> 17/06/21 14:25:53 WARN TaskSetManager: Lost task 1.0 in stage 4.0 (TID 10, 
> localhost, executor driver): 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute
> , tree: first#74
> at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:49)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:43)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:884)
> at 
> org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:363)
> at 
> org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:63)
> at 
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
> at 
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec

[jira] [Commented] (SPARK-21165) Fail to write into partitioned hive table due to attribute reference not working with cast on partition column

2017-06-21 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058490#comment-16058490
 ] 

Xiao Li commented on SPARK-21165:
-

2.2 branch failed with the same error.

> Fail to write into partitioned hive table due to attribute reference not 
> working with cast on partition column
> --
>
> Key: SPARK-21165
> URL: https://issues.apache.org/jira/browse/SPARK-21165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Imran Rashid
>Priority: Blocker
>
> A simple "insert into ... select" involving partitioned hive tables fails.  
> Here's a simpler repro which doesn't involve hive at all -- this succeeds on 
> 2.1.1, but fails on 2.2.0-rc5:
> {noformat}
> spark.sql("""SET hive.exec.dynamic.partition.mode=nonstrict""")
> spark.sql("""DROP TABLE IF EXISTS src""")
> spark.sql("""DROP TABLE IF EXISTS dest""")
> spark.sql("""
> CREATE TABLE src (first string, word string)
>   PARTITIONED BY (length int)
> """)
> spark.sql("""
> INSERT INTO src PARTITION(length) VALUES
>   ('a', 'abc', 3),
>   ('b', 'bcde', 4),
>   ('c', 'cdefg', 5)
> """)
> spark.sql("""
>   CREATE TABLE dest (word string, length int)
> PARTITIONED BY (first string)
> """)
> spark.sql("""
>   INSERT INTO TABLE dest PARTITION(first) SELECT word, length, cast(first as 
> string) as first FROM src
> """)
> {noformat}
> The exception is
> {noformat}
> 17/06/21 14:25:53 WARN TaskSetManager: Lost task 1.0 in stage 4.0 (TID 10, 
> localhost, executor driver): 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute
> , tree: first#74
> at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:49)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:43)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:884)
> at 
> org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:363)
> at 
> org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:63)
> at 
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
> at 
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
> at 
> org.apache.spark.

[jira] [Assigned] (SPARK-21165) Fail to write into partitioned hive table due to attribute reference not working with cast on partition column

2017-06-21 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-21165:
---

Assignee: Xiao Li

> Fail to write into partitioned hive table due to attribute reference not 
> working with cast on partition column
> --
>
> Key: SPARK-21165
> URL: https://issues.apache.org/jira/browse/SPARK-21165
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Imran Rashid
>Assignee: Xiao Li
>Priority: Blocker
>
> A simple "insert into ... select" involving partitioned hive tables fails.  
> Here's a simpler repro which doesn't involve hive at all -- this succeeds on 
> 2.1.1, but fails on 2.2.0-rc5:
> {noformat}
> spark.sql("""SET hive.exec.dynamic.partition.mode=nonstrict""")
> spark.sql("""DROP TABLE IF EXISTS src""")
> spark.sql("""DROP TABLE IF EXISTS dest""")
> spark.sql("""
> CREATE TABLE src (first string, word string)
>   PARTITIONED BY (length int)
> """)
> spark.sql("""
> INSERT INTO src PARTITION(length) VALUES
>   ('a', 'abc', 3),
>   ('b', 'bcde', 4),
>   ('c', 'cdefg', 5)
> """)
> spark.sql("""
>   CREATE TABLE dest (word string, length int)
> PARTITIONED BY (first string)
> """)
> spark.sql("""
>   INSERT INTO TABLE dest PARTITION(first) SELECT word, length, cast(first as 
> string) as first FROM src
> """)
> {noformat}
> The exception is
> {noformat}
> 17/06/21 14:25:53 WARN TaskSetManager: Lost task 1.0 in stage 4.0 (TID 10, 
> localhost, executor driver): 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute
> , tree: first#74
> at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
> at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:49)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:43)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:884)
> at 
> org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:363)
> at 
> org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:63)
> at 
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
> at 
> org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInterna

[jira] [Updated] (SPARK-21174) Validate sampling fraction in logical operator level

2017-06-22 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21174:

Component/s: (was: Optimizer)
 SQL

> Validate sampling fraction in logical operator level
> 
>
> Key: SPARK-21174
> URL: https://issues.apache.org/jira/browse/SPARK-21174
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Gengliang Wang
>Priority: Minor
>
> Currently the validation of the sampling fraction in Dataset is incomplete.
> As an improvement, validate the sampling ratio at the logical operator level:
> 1) with replacement: the ratio should be nonnegative
> 2) without replacement: the ratio should be on the interval [0, 1]
> Also add test cases for the validation.
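
A hedged sketch of the proposed check (names are illustrative, not Spark's actual code):

{code}
// With replacement the fraction only needs to be nonnegative; without
// replacement it must lie on the interval [0, 1].
def validateSamplingFraction(fraction: Double, withReplacement: Boolean): Unit = {
  if (withReplacement) {
    require(fraction >= 0.0,
      s"Sampling fraction ($fraction) must be nonnegative with replacement")
  } else {
    require(fraction >= 0.0 && fraction <= 1.0,
      s"Sampling fraction ($fraction) must be on interval [0, 1] without replacement")
  }
}
{code}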






[jira] [Resolved] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21144.
-
   Resolution: Fixed
 Assignee: Takeshi Yamamuro
Fix Version/s: 2.2.0

> Unexpected results when the data schema and partition schema have the 
> duplicate columns
> ---
>
> Key: SPARK-21144
> URL: https://issues.apache.org/jira/browse/SPARK-21144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Takeshi Yamamuro
> Fix For: 2.2.0
>
>
> {noformat}
> withTempPath { dir =>
>   val basePath = dir.getCanonicalPath
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, 
> "foo=1").toString)
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, 
> "foo=a").toString)
>   spark.read.parquet(basePath).show()
> }
> {noformat}
> The result of the above case is
> {noformat}
> +---+
> |foo|
> +---+
> |  1|
> |  1|
> |  a|
> |  a|
> |  1|
> |  a|
> +---+
> {noformat}






[jira] [Updated] (SPARK-21164) Remove isTableSample from Sample and isGenerated from Alias and AttributeReference

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21164:

Summary: Remove isTableSample from Sample and isGenerated from Alias and 
AttributeReference  (was: Remove isTableSample from Sample)

> Remove isTableSample from Sample and isGenerated from Alias and 
> AttributeReference
> --
>
> Key: SPARK-21164
> URL: https://issues.apache.org/jira/browse/SPARK-21164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> {{isTableSample}} was introduced for SQL Generation. Since SQL Generation is 
> removed, we do not need to keep {{isTableSample}}. 






[jira] [Updated] (SPARK-21164) Remove isTableSample from Sample and isGenerated from Alias and AttributeReference

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21164:

Description: 
isTableSample and isGenerated were introduced for SQL Generation respectively 
by #11148 and #11050

Since SQL Generation is removed, we do not need to keep isTableSample.

  was:{{isTableSample}} was introduced for SQL Generation. Since SQL Generation 
is removed, we do not need to keep {{isTableSample}}. 


> Remove isTableSample from Sample and isGenerated from Alias and 
> AttributeReference
> --
>
> Key: SPARK-21164
> URL: https://issues.apache.org/jira/browse/SPARK-21164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> isTableSample and isGenerated were introduced for SQL Generation respectively 
> by #11148 and #11050
> Since SQL Generation is removed, we do not need to keep isTableSample.






[jira] [Updated] (SPARK-21164) Remove isTableSample from Sample and isGenerated from Alias and AttributeReference

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21164:

Description: 
isTableSample and isGenerated were introduced for SQL Generation respectively 
by PR 11148 and PR 11050

Since SQL Generation is removed, we do not need to keep isTableSample.

  was:
isTableSample and isGenerated were introduced for SQL Generation respectively 
by #11148 and #11050

Since SQL Generation is removed, we do not need to keep isTableSample.


> Remove isTableSample from Sample and isGenerated from Alias and 
> AttributeReference
> --
>
> Key: SPARK-21164
> URL: https://issues.apache.org/jira/browse/SPARK-21164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> isTableSample and isGenerated were introduced for SQL Generation respectively 
> by PR 11148 and PR 11050
> Since SQL Generation is removed, we do not need to keep isTableSample.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21180) Remove conf from stats functions since now we have conf in LogicalPlan

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21180.
-
   Resolution: Fixed
 Assignee: Zhenhua Wang
Fix Version/s: 2.3.0

> Remove conf from stats functions since now we have conf in LogicalPlan
> --
>
> Key: SPARK-21180
> URL: https://issues.apache.org/jira/browse/SPARK-21180
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20417) Move error reporting for subquery from Analyzer to CheckAnalysis

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20417.
-
   Resolution: Fixed
 Assignee: Dilip Biswal
Fix Version/s: 2.3.0

> Move error reporting for subquery from Analyzer to CheckAnalysis
> 
>
> Key: SPARK-20417
> URL: https://issues.apache.org/jira/browse/SPARK-20417
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
> Fix For: 2.3.0
>
>
> Currently we do a lot of validations for subquery in the Analyzer. We should 
> move them to CheckAnalysis which is the framework to catch and report 
> Analysis errors. This was mentioned as a review comment in SPARK-18874.
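As a rough, standalone illustration of the pattern being described (toy {{Plan}} types and error text, not Spark's actual Analyzer or CheckAnalysis classes): resolution and validation become separate passes, and the check pass walks the resolved plan and reports each violation:
{code}
// Toy sketch of a CheckAnalysis-style pass: all validation lives in one place
// that traverses the (already resolved) plan and fails with a clear message.
sealed trait Plan { def children: Seq[Plan] }
case class Leaf() extends Plan { val children = Seq.empty }
case class ScalarSubquery(child: Plan, outputColumns: Int) extends Plan { val children = Seq(child) }

def checkAnalysis(plan: Plan): Unit = {
  def failAnalysis(msg: String): Nothing = sys.error(s"Analysis error: $msg")
  def visit(p: Plan): Unit = {
    p match {
      case ScalarSubquery(_, cols) if cols != 1 =>
        failAnalysis(s"Scalar subquery must return only one column, but got $cols")
      case _ => ()
    }
    p.children.foreach(visit)
  }
  visit(plan)
}
{code}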



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21164) Remove isTableSample from Sample and isGenerated from Alias and AttributeReference

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21164.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Remove isTableSample from Sample and isGenerated from Alias and 
> AttributeReference
> --
>
> Key: SPARK-21164
> URL: https://issues.apache.org/jira/browse/SPARK-21164
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.3.0
>
>
> isTableSample and isGenerated were introduced for SQL Generation respectively 
> by PR 11148 and PR 11050
> Since SQL Generation is removed, we do not need to keep isTableSample.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21203) Wrong result are inserted by Array of Struct

2017-06-23 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21203:
---

 Summary: Wrong result are inserted by Array of Struct
 Key: SPARK-21203
 URL: https://issues.apache.org/jira/browse/SPARK-21203
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.1, 2.2.0
Reporter: Xiao Li
Assignee: Xiao Li
Priority: Critical


{noformat}
  spark.sql(
"""
  |CREATE TABLE `tab1`
  |(`custom_fields` ARRAY<STRUCT<`id`: INT, `value`: STRING>>)
  |USING parquet
""".stripMargin)
  spark.sql(
"""
  |INSERT INTO `tab1`
  |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 
2, 'value', 'b'))
""".stripMargin)

  spark.sql("SELECT custom_fields.id, custom_fields.value FROM tab1").show()
{noformat}

The returned result is wrong:
{noformat}
Row(Array(2, 2), Array("b", "b"))
{noformat}
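For reference, a hedged reading of the expected result (not stated explicitly above): each array element keeps its own struct fields, i.e. ids (1, 2) and values ("a", "b"). A small assertion sketch, assuming the same {{spark}} session as the repro:
{code}
import spark.implicits._

// Per this report, the assertion below fails on the affected versions because the
// last struct element is repeated instead of each element keeping its own fields.
val result = spark.sql("SELECT custom_fields.id, custom_fields.value FROM tab1")
  .as[(Seq[Int], Seq[String])]
  .collect()
  .toSeq
assert(result == Seq((Seq(1, 2), Seq("a", "b"))))
{code}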




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21203) Wrong results are inserted by Array of Struct

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21203:

Summary: Wrong results are inserted by Array of Struct  (was: Wrong result 
are inserted by Array of Struct)

> Wrong results are inserted by Array of Struct
> -
>
> Key: SPARK-21203
> URL: https://issues.apache.org/jira/browse/SPARK-21203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
>
> {noformat}
>   spark.sql(
> """
>   |CREATE TABLE `tab1`
>   |(`custom_fields` ARRAY<STRUCT<`id`: INT, `value`: STRING>>)
>   |USING parquet
> """.stripMargin)
>   spark.sql(
> """
>   |INSERT INTO `tab1`
>   |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), 
> named_struct('id', 2, 'value', 'b'))
> """.stripMargin)
>   spark.sql("SELECT custom_fields.id, custom_fields.value FROM 
> tab1").show()
> {noformat}
> The returned result is wrong:
> {noformat}
> Row(Array(2, 2), Array("b", "b"))
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21203) Wrong results of insertion of Array of Struct

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21203:

Summary: Wrong results of insertion of Array of Struct  (was: Wrong results 
are inserted by Array of Struct)

> Wrong results of insertion of Array of Struct
> -
>
> Key: SPARK-21203
> URL: https://issues.apache.org/jira/browse/SPARK-21203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
>
> {noformat}
>   spark.sql(
> """
>   |CREATE TABLE `tab1`
>   |(`custom_fields` ARRAY<STRUCT<`id`: INT, `value`: STRING>>)
>   |USING parquet
> """.stripMargin)
>   spark.sql(
> """
>   |INSERT INTO `tab1`
>   |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), 
> named_struct('id', 2, 'value', 'b'))
> """.stripMargin)
>   spark.sql("SELECT custom_fields.id, custom_fields.value FROM 
> tab1").show()
> {noformat}
> The returned result is wrong:
> {noformat}
> Row(Array(2, 2), Array("b", "b"))
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21203) Wrong results of insertion of Array of Struct

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21203:

Target Version/s: 2.2.0

> Wrong results of insertion of Array of Struct
> -
>
> Key: SPARK-21203
> URL: https://issues.apache.org/jira/browse/SPARK-21203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
>
> {noformat}
>   spark.sql(
> """
>   |CREATE TABLE `tab1`
>   |(`custom_fields` ARRAY<STRUCT<`id`: INT, `value`: STRING>>)
>   |USING parquet
> """.stripMargin)
>   spark.sql(
> """
>   |INSERT INTO `tab1`
>   |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), 
> named_struct('id', 2, 'value', 'b'))
> """.stripMargin)
>   spark.sql("SELECT custom_fields.id, custom_fields.value FROM 
> tab1").show()
> {noformat}
> The returned result is wrong:
> {noformat}
> Row(Array(2, 2), Array("b", "b"))
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20555) Incorrect handling of Oracle's decimal types via JDBC

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20555.
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   2.1.2

> Incorrect handling of Oracle's decimal types via JDBC
> -
>
> Key: SPARK-20555
> URL: https://issues.apache.org/jira/browse/SPARK-20555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Gabor Feher
> Fix For: 2.1.2, 2.2.0
>
>
> When querying an Oracle database, Spark maps some Oracle numeric data types 
> to incorrect Catalyst data types:
> 1. DECIMAL(1) becomes BooleanType
> In Oracle, a DECIMAL(1) can have values from -9 to 9.
> In Spark now, values larger than 1 become the boolean value true.
> 2. DECIMAL(3,2) becomes IntegerType
> In Oracle, a DECIMAL(3,2) can have values like 1.23
> In Spark now, digits after the decimal point are dropped.
> 3. DECIMAL(10) becomes IntegerType
> In Oracle, a DECIMAL(10) can have the value 9999999999 (ten nines), which is 
> more than 2^31
> Spark throws an exception: "java.sql.SQLException: Numeric Overflow"
> I think the best solution is to always keep Oracle's decimal types. (In 
> theory we could introduce a FloatType in some cases of #2, and fix #3 by only 
> introducing IntegerType for DECIMAL(9). But in my opinion, that would end up 
> complicated and error-prone.)
> Note: I think the above problems were introduced as part of  
> https://github.com/apache/spark/pull/14377
> The main purpose of that PR seems to be converting Spark types to correct 
> Oracle types, and that part seems good to me. But it also adds the inverse 
> conversions. As it turns out in the above examples, that is not possible.
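A minimal sketch of the direction proposed above, i.e. keeping Oracle decimals as Catalyst decimals instead of guessing narrower types; the helper name and the precision/scale handling are assumptions, not the merged fix:
{code}
import java.sql.Types
import org.apache.spark.sql.types.{DataType, DecimalType}

// Map an Oracle NUMBER/DECIMAL column to a Catalyst type without narrowing:
// DECIMAL(1) stays a decimal (not BooleanType), DECIMAL(3,2) keeps its scale,
// and DECIMAL(10) is not forced into IntegerType.
def oracleDecimalToCatalyst(sqlType: Int, precision: Int, scale: Int): Option[DataType] = {
  if (sqlType == Types.NUMERIC || sqlType == Types.DECIMAL) {
    val p = if (precision > 0) math.min(precision, DecimalType.MAX_PRECISION) else DecimalType.MAX_PRECISION
    Some(DecimalType(p, math.max(scale, 0)))
  } else {
    None
  }
}
{code}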



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20555) Incorrect handling of Oracle's decimal types via JDBC

2017-06-23 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-20555:
---

Assignee: Gabor Feher

> Incorrect handling of Oracle's decimal types via JDBC
> -
>
> Key: SPARK-20555
> URL: https://issues.apache.org/jira/browse/SPARK-20555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Gabor Feher
>Assignee: Gabor Feher
> Fix For: 2.1.2, 2.2.0
>
>
> When querying an Oracle database, Spark maps some Oracle numeric data types 
> to incorrect Catalyst data types:
> 1. DECIMAL(1) becomes BooleanType
> In Oracle, a DECIMAL(1) can have values from -9 to 9.
> In Spark now, values larger than 1 become the boolean value true.
> 2. DECIMAL(3,2) becomes IntegerType
> In Oracle, a DECIMAL(3,2) can have values like 1.23
> In Spark now, digits after the decimal point are dropped.
> 3. DECIMAL(10) becomes IntegerType
> In Oracle, a DECIMAL(10) can have the value 9999999999 (ten nines), which is 
> more than 2^31
> Spark throws an exception: "java.sql.SQLException: Numeric Overflow"
> I think the best solution is to always keep Oracle's decimal types. (In 
> theory we could introduce a FloatType in some cases of #2, and fix #3 by only 
> introducing IntegerType for DECIMAL(9). But in my opinion, that would end up 
> complicated and error-prone.)
> Note: I think the above problems were introduced as part of  
> https://github.com/apache/spark/pull/14377
> The main purpose of that PR seems to be converting Spark types to correct 
> Oracle types, and that part seems good to me. But it also adds the inverse 
> conversions. As it turns out in the above examples, that is not possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21079) ANALYZE TABLE fails to calculate totalSize for a partitioned table

2017-06-24 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21079.
-
   Resolution: Fixed
 Assignee: Maria
Fix Version/s: 2.2.0

> ANALYZE TABLE fails to calculate totalSize for a partitioned table
> --
>
> Key: SPARK-21079
> URL: https://issues.apache.org/jira/browse/SPARK-21079
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Maria
>Assignee: Maria
>  Labels: easyfix
> Fix For: 2.2.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> ANALYZE TABLE table COMPUTE STATISTICS invoked for a partitioned table produces 
> totalSize = 0.
> AnalyzeTableCommand fetches the table-level storage URI and calculates the total size 
> of files in the corresponding directory recursively. However, for partitioned 
> tables, each partition has its own storage URI which may not be a 
> subdirectory of the table-level storage URI.
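A hedged sketch of the direction described above: sum file sizes over each partition's own location rather than walking only the table-level directory. The helper name and the configuration plumbing are assumptions:
{code}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// totalSize of a partitioned table = sum over every partition location,
// which may live outside the table-level storage directory.
def totalSizeOfPartitions(partitionLocations: Seq[URI], hadoopConf: Configuration): Long = {
  partitionLocations.map { location =>
    val path = new Path(location)
    val fs = path.getFileSystem(hadoopConf)
    fs.getContentSummary(path).getLength  // recursive byte count under this partition
  }.sum
}
{code}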



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21256) Add WithSQLConf to Catalyst

2017-06-29 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21256:
---

 Summary: Add WithSQLConf to Catalyst
 Key: SPARK-21256
 URL: https://issues.apache.org/jira/browse/SPARK-21256
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 2.3.0
Reporter: Xiao Li
Assignee: Xiao Li


Add WithSQLConf to the Catalyst module.
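For context, a minimal sketch of what a withSQLConf-style helper does (an illustration, not the exact trait being moved): apply the given SQL conf pairs, run the test body, then restore the previous values. Typical usage would look like withSQLConf(spark)("spark.sql.shuffle.partitions" -> "1") { ... }.
{code}
import org.apache.spark.sql.SparkSession

// Illustrative helper: apply temporary SQL conf settings around a test body
// and restore whatever was set before, even if the body throws.
def withSQLConf(spark: SparkSession)(pairs: (String, String)*)(body: => Unit): Unit = {
  val keys = pairs.map(_._1)
  val previous = keys.map(key => key -> spark.conf.getAll.get(key))
  pairs.foreach { case (key, value) => spark.conf.set(key, value) }
  try {
    body
  } finally {
    previous.foreach {
      case (key, Some(old)) => spark.conf.set(key, old)
      case (key, None)      => spark.conf.unset(key)
    }
  }
}
{code}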



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21256) Add WithSQLConf to Catalyst Test

2017-06-29 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21256:

Summary: Add WithSQLConf to Catalyst Test  (was: Add WithSQLConf to 
Catalyst)

> Add WithSQLConf to Catalyst Test
> 
>
> Key: SPARK-21256
> URL: https://issues.apache.org/jira/browse/SPARK-21256
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> Add WithSQLConf to the Catalyst module.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table

2017-06-30 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20073:

Labels:   (was: correctness)

> Unexpected Cartesian product when using eqNullSafe in join with a derived 
> table
> ---
>
> Key: SPARK-20073
> URL: https://issues.apache.org/jira/browse/SPARK-20073
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Everett Anderson
>
> It appears that if you try to join tables A and B when B is derived from A 
> and you use the eqNullSafe / <=> operator for the join condition, Spark 
> performs a Cartesian product.
> However, if you perform the join on tables of the same data when they don't 
> have a relationship, the expected non-Cartesian product join occurs.
> {noformat}
> // Create some fake data.
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.Dataset
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions
> val peopleRowsRDD = sc.parallelize(Seq(
> Row("Fred", 8, 1),
> Row("Fred", 8, 2),
> Row(null, 10, 3),
> Row(null, 10, 4),
> Row("Amy", 12, 5),
> Row("Amy", 12, 6)))
> 
> val peopleSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("group", IntegerType, nullable = true),
> StructField("data", IntegerType, nullable = true)))
> 
> val people = spark.createDataFrame(peopleRowsRDD, peopleSchema)
> people.createOrReplaceTempView("people")
> scala> people.show
> ++-++
> |name|group|data|
> ++-++
> |Fred|8|   1|
> |Fred|8|   2|
> |null|   10|   3|
> |null|   10|   4|
> | Amy|   12|   5|
> | Amy|   12|   6|
> ++-++
> // Now create a derived table from that table. It doesn't matter much what.
> val variantCounts = spark.sql("select name, count(distinct(name, group, 
> data)) as variant_count from people group by name having variant_count > 1")
> variantCounts.show
> ++-+  
>   
> |name|variant_count|
> ++-+
> |Fred|2|
> |null|2|
> | Amy|2|
> ++-+
> // Now try an inner join using the regular equalTo that drops nulls. This 
> works fine.
> val innerJoinEqualTo = variantCounts.join(people, 
> variantCounts("name").equalTo(people("name")))
> innerJoinEqualTo.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay now lets switch to the <=> operator
> //
> // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error 
> like
> // "Cartesian joins could be prohibitively expensive and are disabled by 
> default. To explicitly enable them, please set spark.sql.crossJoin.enabled = 
> true;"
> //
> // if you have enabled them, you'll get the table below.
> //
> // However, we really don't want or expect a Cartesian product!
> val innerJoinSqlNullSafeEqOp = variantCounts.join(people, 
> variantCounts("name")<=>(people("name")))
> innerJoinSqlNullSafeEqOp.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> |Fred|2|null|   10|   3|
> |Fred|2|null|   10|   4|
> |Fred|2| Amy|   12|   5|
> |Fred|2| Amy|   12|   6|
> |null|2|Fred|8|   1|
> |null|2|Fred|8|   2|
> |null|2|null|   10|   3|
> |null|2|null|   10|   4|
> |null|2| Amy|   12|   5|
> |null|2| Amy|   12|   6|
> | Amy|2|Fred|8|   1|
> | Amy|2|Fred|8|   2|
> | Amy|2|null|   10|   3|
> | Amy|2|null|   10|   4|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay, let's try to construct the exact same variantCount table manually
> // so it has no relationship to the original.
> val variantCountRowsRDD = sc.parallelize(Seq(
> Row("Fred", 2),
> Row(null, 2),
> Row("Amy", 2)))
> 
> val variantCountSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("variant_count", IntegerType, nullable = true)))
> 
> val manualVariantCounts = spark.createDataFrame(variantCountRowsRDD, 
> variantCountSchema)
> // Now perform the 

[jira] [Updated] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table

2017-06-30 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20073:

Component/s: (was: Optimizer)
 SQL

> Unexpected Cartesian product when using eqNullSafe in join with a derived 
> table
> ---
>
> Key: SPARK-20073
> URL: https://issues.apache.org/jira/browse/SPARK-20073
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Everett Anderson
>
> It appears that if you try to join tables A and B when B is derived from A 
> and you use the eqNullSafe / <=> operator for the join condition, Spark 
> performs a Cartesian product.
> However, if you perform the join on tables of the same data when they don't 
> have a relationship, the expected non-Cartesian product join occurs.
> {noformat}
> // Create some fake data.
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.Dataset
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions
> val peopleRowsRDD = sc.parallelize(Seq(
> Row("Fred", 8, 1),
> Row("Fred", 8, 2),
> Row(null, 10, 3),
> Row(null, 10, 4),
> Row("Amy", 12, 5),
> Row("Amy", 12, 6)))
> 
> val peopleSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("group", IntegerType, nullable = true),
> StructField("data", IntegerType, nullable = true)))
> 
> val people = spark.createDataFrame(peopleRowsRDD, peopleSchema)
> people.createOrReplaceTempView("people")
> scala> people.show
> ++-++
> |name|group|data|
> ++-++
> |Fred|8|   1|
> |Fred|8|   2|
> |null|   10|   3|
> |null|   10|   4|
> | Amy|   12|   5|
> | Amy|   12|   6|
> ++-++
> // Now create a derived table from that table. It doesn't matter much what.
> val variantCounts = spark.sql("select name, count(distinct(name, group, 
> data)) as variant_count from people group by name having variant_count > 1")
> variantCounts.show
> ++-+  
>   
> |name|variant_count|
> ++-+
> |Fred|2|
> |null|2|
> | Amy|2|
> ++-+
> // Now try an inner join using the regular equalTo that drops nulls. This 
> works fine.
> val innerJoinEqualTo = variantCounts.join(people, 
> variantCounts("name").equalTo(people("name")))
> innerJoinEqualTo.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay now lets switch to the <=> operator
> //
> // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error 
> like
> // "Cartesian joins could be prohibitively expensive and are disabled by 
> default. To explicitly enable them, please set spark.sql.crossJoin.enabled = 
> true;"
> //
> // if you have enabled them, you'll get the table below.
> //
> // However, we really don't want or expect a Cartesian product!
> val innerJoinSqlNullSafeEqOp = variantCounts.join(people, 
> variantCounts("name")<=>(people("name")))
> innerJoinSqlNullSafeEqOp.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> |Fred|2|null|   10|   3|
> |Fred|2|null|   10|   4|
> |Fred|2| Amy|   12|   5|
> |Fred|2| Amy|   12|   6|
> |null|2|Fred|8|   1|
> |null|2|Fred|8|   2|
> |null|2|null|   10|   3|
> |null|2|null|   10|   4|
> |null|2| Amy|   12|   5|
> |null|2| Amy|   12|   6|
> | Amy|2|Fred|8|   1|
> | Amy|2|Fred|8|   2|
> | Amy|2|null|   10|   3|
> | Amy|2|null|   10|   4|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay, let's try to construct the exact same variantCount table manually
> // so it has no relationship to the original.
> val variantCountRowsRDD = sc.parallelize(Seq(
> Row("Fred", 2),
> Row(null, 2),
> Row("Amy", 2)))
> 
> val variantCountSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("variant_count", IntegerType, nullable = true)))
> 
> val manualVariantCounts = spark.createDataFrame(variantCountRowsRDD, 
> variantCountSchema)
>

[jira] [Updated] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table

2017-06-30 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20073:

Issue Type: Improvement  (was: Bug)

> Unexpected Cartesian product when using eqNullSafe in join with a derived 
> table
> ---
>
> Key: SPARK-20073
> URL: https://issues.apache.org/jira/browse/SPARK-20073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Everett Anderson
>
> It appears that if you try to join tables A and B when B is derived from A 
> and you use the eqNullSafe / <=> operator for the join condition, Spark 
> performs a Cartesian product.
> However, if you perform the join on tables of the same data when they don't 
> have a relationship, the expected non-Cartesian product join occurs.
> {noformat}
> // Create some fake data.
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.Dataset
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions
> val peopleRowsRDD = sc.parallelize(Seq(
> Row("Fred", 8, 1),
> Row("Fred", 8, 2),
> Row(null, 10, 3),
> Row(null, 10, 4),
> Row("Amy", 12, 5),
> Row("Amy", 12, 6)))
> 
> val peopleSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("group", IntegerType, nullable = true),
> StructField("data", IntegerType, nullable = true)))
> 
> val people = spark.createDataFrame(peopleRowsRDD, peopleSchema)
> people.createOrReplaceTempView("people")
> scala> people.show
> ++-++
> |name|group|data|
> ++-++
> |Fred|8|   1|
> |Fred|8|   2|
> |null|   10|   3|
> |null|   10|   4|
> | Amy|   12|   5|
> | Amy|   12|   6|
> ++-++
> // Now create a derived table from that table. It doesn't matter much what.
> val variantCounts = spark.sql("select name, count(distinct(name, group, 
> data)) as variant_count from people group by name having variant_count > 1")
> variantCounts.show
> ++-+  
>   
> |name|variant_count|
> ++-+
> |Fred|2|
> |null|2|
> | Amy|2|
> ++-+
> // Now try an inner join using the regular equalTo that drops nulls. This 
> works fine.
> val innerJoinEqualTo = variantCounts.join(people, 
> variantCounts("name").equalTo(people("name")))
> innerJoinEqualTo.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay now lets switch to the <=> operator
> //
> // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error 
> like
> // "Cartesian joins could be prohibitively expensive and are disabled by 
> default. To explicitly enable them, please set spark.sql.crossJoin.enabled = 
> true;"
> //
> // if you have enabled them, you'll get the table below.
> //
> // However, we really don't want or expect a Cartesian product!
> val innerJoinSqlNullSafeEqOp = variantCounts.join(people, 
> variantCounts("name")<=>(people("name")))
> innerJoinSqlNullSafeEqOp.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> |Fred|2|null|   10|   3|
> |Fred|2|null|   10|   4|
> |Fred|2| Amy|   12|   5|
> |Fred|2| Amy|   12|   6|
> |null|2|Fred|8|   1|
> |null|2|Fred|8|   2|
> |null|2|null|   10|   3|
> |null|2|null|   10|   4|
> |null|2| Amy|   12|   5|
> |null|2| Amy|   12|   6|
> | Amy|2|Fred|8|   1|
> | Amy|2|Fred|8|   2|
> | Amy|2|null|   10|   3|
> | Amy|2|null|   10|   4|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay, let's try to construct the exact same variantCount table manually
> // so it has no relationship to the original.
> val variantCountRowsRDD = sc.parallelize(Seq(
> Row("Fred", 2),
> Row(null, 2),
> Row("Amy", 2)))
> 
> val variantCountSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("variant_count", IntegerType, nullable = true)))
> 
> val manualVariantCounts = spark.createDataFrame(variantCountRowsRDD, 
> variantCountSchema)
> // Now per

[jira] [Resolved] (SPARK-21129) Arguments of SQL function call should not be named expressions

2017-06-30 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21129.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

> Arguments of SQL function call should not be named expressions
> --
>
> Key: SPARK-21129
> URL: https://issues.apache.org/jira/browse/SPARK-21129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.2.0
>
>
> Function arguments should not be named expressions, as this could cause misleading 
> error messages.
> {noformat}
> spark-sql> select count(distinct c1, distinct c2) from t1;
> {noformat}
> {noformat}
> Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; 
> line 1 pos 26;
> 'Project [unresolvedalias('count(c1#30, 'distinct), None)]
> +- SubqueryAlias t1
>+- CatalogRelation `default`.`t1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21273) Decouple stats propagation from logical plan

2017-06-30 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21273.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Decouple stats propagation from logical plan
> 
>
> Key: SPARK-21273
> URL: https://issues.apache.org/jira/browse/SPARK-21273
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.3.0
>
>
> We currently implement statistics propagation directly in logical plan. Given 
> we already have two different implementations, it'd make sense to actually 
> decouple the two and add stats propagation using mixin.
> This can also be a powerful pattern in the future to add additional 
> properties (e.g. constraints).
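A toy sketch of the mixin idea (illustrative names, not the real Catalyst classes): the plan hierarchy carries no estimation logic, and a stats-propagation trait is mixed in separately, so a size-only strategy and a CBO strategy can coexist:
{code}
case class Statistics(sizeInBytes: BigInt)

// The plan classes themselves know nothing about estimation...
sealed trait PlanNode { def children: Seq[PlanNode with StatsPropagation] }

// ...statistics propagation is a mixin that can be swapped out.
trait StatsPropagation { self: PlanNode =>
  def stats: Statistics =
    Statistics(children.map(_.stats.sizeInBytes).sum.max(BigInt(1)))
}

case class Scan(sizeInBytes: BigInt) extends PlanNode with StatsPropagation {
  val children = Seq.empty
  override def stats: Statistics = Statistics(sizeInBytes)
}

case class Join(left: PlanNode with StatsPropagation, right: PlanNode with StatsPropagation)
  extends PlanNode with StatsPropagation {
  val children = Seq(left, right)
  override def stats: Statistics = Statistics(left.stats.sizeInBytes * right.stats.sizeInBytes)
}
{code}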



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18004) DataFrame filter Predicate push-down fails for Oracle Timestamp type columns

2017-07-02 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-18004.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> DataFrame filter Predicate push-down fails for Oracle Timestamp type columns
> 
>
> Key: SPARK-18004
> URL: https://issues.apache.org/jira/browse/SPARK-18004
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Suhas Nalapure
>Assignee: Rui Zha
>Priority: Critical
> Fix For: 2.3.0
>
>
> DataFrame filter Predicate push-down fails for Oracle Timestamp type columns 
> with Exception java.sql.SQLDataException: ORA-01861: literal does not match 
> format string:
> Java source code (this code works fine for mysql & mssql databases) :
> {noformat}
> //DataFrame df = create a DataFrame over an Oracle table
> df = df.filter(df.col("TS").lt(new 
> java.sql.Timestamp(System.currentTimeMillis())));
>   df.explain();
>   df.show();
> {noformat}
> Log statements with the Exception:
> {noformat}
> Schema: root
>  |-- ID: string (nullable = false)
>  |-- TS: timestamp (nullable = true)
>  |-- DEVICE_ID: string (nullable = true)
>  |-- REPLACEMENT: string (nullable = true)
> {noformat}
> {noformat}
> == Physical Plan ==
> Filter (TS#1 < 1476861841934000)
> +- Scan 
> JDBCRelation(jdbc:oracle:thin:@10.0.0.111:1521:orcl,ORATABLE,[Lorg.apache.spark.Partition;@78c74647,{user=user,
>  password=pwd, url=jdbc:oracle:thin:@10.0.0.111:1521:orcl, dbtable=ORATABLE, 
> driver=oracle.jdbc.driver.OracleDriver})[ID#0,TS#1,DEVICE_ID#2,REPLACEMENT#3] 
> PushedFilters: [LessThan(TS,2016-10-19 12:54:01.934)]
> 2016-10-19 12:54:04,268 ERROR [Executor task launch worker-0] 
> org.apache.spark.executor.Executor
> Exception in task 0.0 in stage 0.0 (TID 0)
> java.sql.SQLDataException: ORA-01861: literal does not match format string
>   at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:461)
>   at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:402)
>   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1065)
>   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:681)
>   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:256)
>   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:577)
>   at 
> oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:239)
>   at 
> oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:75)
>   at 
> oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1043)
>   at 
> oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:)
>   at 
> oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1353)
>   at 
> oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4485)
>   at 
> oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4566)
>   at 
> oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:5251)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.(JDBCRDD.scala:383)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:359)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
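One plausible direction, sketched with heavy hedging (not necessarily the change that was merged): render pushed-down Timestamp/Date filter values as JDBC escape literals so Oracle parses them regardless of NLS_TIMESTAMP_FORMAT:
{code}
// Hedged sketch: turn typed filter values into literals Oracle can always parse.
// {ts '...'} and {d '...'} are standard JDBC escape syntax handled by the driver.
def compileFilterValue(value: Any): String = value match {
  case ts: java.sql.Timestamp => s"{ts '$ts'}"
  case d: java.sql.Date       => s"{d '$d'}"
  case s: String              => s"'${s.replace("'", "''")}'"
  case other                  => other.toString
}

// e.g. "TS < {ts '2016-10-19 12:54:01.934'}" instead of an unformatted literal
val predicate = s"TS < ${compileFilterValue(new java.sql.Timestamp(System.currentTimeMillis()))}"
{code}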



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18004) DataFrame filter Predicate push-down fails for Oracle Timestamp type columns

2017-07-02 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-18004:
---

Assignee: Rui Zha

> DataFrame filter Predicate push-down fails for Oracle Timestamp type columns
> 
>
> Key: SPARK-18004
> URL: https://issues.apache.org/jira/browse/SPARK-18004
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Suhas Nalapure
>Assignee: Rui Zha
>Priority: Critical
> Fix For: 2.3.0
>
>
> DataFrame filter Predicate push-down fails for Oracle Timestamp type columns 
> with Exception java.sql.SQLDataException: ORA-01861: literal does not match 
> format string:
> Java source code (this code works fine for mysql & mssql databases) :
> {noformat}
> //DataFrame df = create a DataFrame over an Oracle table
> df = df.filter(df.col("TS").lt(new 
> java.sql.Timestamp(System.currentTimeMillis())));
>   df.explain();
>   df.show();
> {noformat}
> Log statements with the Exception:
> {noformat}
> Schema: root
>  |-- ID: string (nullable = false)
>  |-- TS: timestamp (nullable = true)
>  |-- DEVICE_ID: string (nullable = true)
>  |-- REPLACEMENT: string (nullable = true)
> {noformat}
> {noformat}
> == Physical Plan ==
> Filter (TS#1 < 1476861841934000)
> +- Scan 
> JDBCRelation(jdbc:oracle:thin:@10.0.0.111:1521:orcl,ORATABLE,[Lorg.apache.spark.Partition;@78c74647,{user=user,
>  password=pwd, url=jdbc:oracle:thin:@10.0.0.111:1521:orcl, dbtable=ORATABLE, 
> driver=oracle.jdbc.driver.OracleDriver})[ID#0,TS#1,DEVICE_ID#2,REPLACEMENT#3] 
> PushedFilters: [LessThan(TS,2016-10-19 12:54:01.934)]
> 2016-10-19 12:54:04,268 ERROR [Executor task launch worker-0] 
> org.apache.spark.executor.Executor
> Exception in task 0.0 in stage 0.0 (TID 0)
> java.sql.SQLDataException: ORA-01861: literal does not match format string
>   at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:461)
>   at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:402)
>   at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1065)
>   at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:681)
>   at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:256)
>   at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:577)
>   at 
> oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:239)
>   at 
> oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:75)
>   at 
> oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1043)
>   at 
> oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:)
>   at 
> oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1353)
>   at 
> oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4485)
>   at 
> oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4566)
>   at 
> oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:5251)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.(JDBCRDD.scala:383)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:359)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21282) Fix test failure in 2.0

2017-07-02 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21282:
---

 Summary: Fix test failure in 2.0
 Key: SPARK-21282
 URL: https://issues.apache.org/jira/browse/SPARK-21282
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.2
Reporter: Xiao Li
Assignee: Xiao Li


There is a test failure after backporting a fix from 2.2 to 2.0, because the 
automatically generated column names are different. 

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/lastCompletedBuild/testReport/
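One hedged way to keep such a backported test stable across branches (illustrative, not the actual patch): avoid asserting on auto-generated column names and alias the expressions explicitly, assuming a {{spark}} session:
{code}
import org.apache.spark.sql.functions.countDistinct

// Aliasing the aggregate makes the schema independent of how each branch
// auto-generates column names for unaliased expressions.
val df = spark.range(10).toDF("a")
val counted = df.agg(countDistinct(df("a")).as("distinct_a"))
assert(counted.columns.toSeq == Seq("distinct_a"))
{code}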




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21287) Cannot use Int.MIN_VALUE as Spark SQL fetchsize

2017-07-03 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072700#comment-16072700
 ] 

Xiao Li commented on SPARK-21287:
-

This value is very specific to MySQL. Since we are supporting different 
dialects, we could introduce dialect-specific checking logic.
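A rough sketch of what such a check could look like (the helper and its placement are assumptions, not an implemented API): treat Int.MinValue as a legal streaming hint only for MySQL URLs:
{code}
// Hedged sketch: fetchsize validation that special-cases MySQL's streaming hint.
// MySQL Connector/J streams rows only when fetchSize == Integer.MIN_VALUE.
def validateFetchSize(url: String, fetchSize: Int): Unit = {
  val isMySql = url.startsWith("jdbc:mysql")
  val mysqlStreamingHint = isMySql && fetchSize == Int.MinValue
  require(fetchSize >= 0 || mysqlStreamingHint,
    s"Invalid value `$fetchSize` for parameter `fetchsize`. The minimum value is 0.")
}
{code}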

> Cannot use Int.MIN_VALUE as Spark SQL fetchsize
> ---
>
> Key: SPARK-21287
> URL: https://issues.apache.org/jira/browse/SPARK-21287
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Maciej BryƄski
>
> MySQL JDBC driver gives possibility to not store ResultSet in memory.
> We can do this by setting fetchSize to Int.MIN_VALUE.
> Unfortunately this configuration isn't correct in Spark.
> {code}
> java.lang.IllegalArgumentException: requirement failed: Invalid value 
> `-2147483648` for parameter `fetchsize`. The minimum value is 0. When the 
> value is 0, the JDBC driver ignores the value and does the estimates.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:105)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:34)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:206)
>   at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:214)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table

2017-07-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20073.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Unexpected Cartesian product when using eqNullSafe in join with a derived 
> table
> ---
>
> Key: SPARK-20073
> URL: https://issues.apache.org/jira/browse/SPARK-20073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Everett Anderson
>Assignee: Takeshi Yamamuro
> Fix For: 2.3.0
>
>
> It appears that if you try to join tables A and B when B is derived from A 
> and you use the eqNullSafe / <=> operator for the join condition, Spark 
> performs a Cartesian product.
> However, if you perform the join on tables of the same data when they don't 
> have a relationship, the expected non-Cartesian product join occurs.
> {noformat}
> // Create some fake data.
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.Dataset
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions
> val peopleRowsRDD = sc.parallelize(Seq(
> Row("Fred", 8, 1),
> Row("Fred", 8, 2),
> Row(null, 10, 3),
> Row(null, 10, 4),
> Row("Amy", 12, 5),
> Row("Amy", 12, 6)))
> 
> val peopleSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("group", IntegerType, nullable = true),
> StructField("data", IntegerType, nullable = true)))
> 
> val people = spark.createDataFrame(peopleRowsRDD, peopleSchema)
> people.createOrReplaceTempView("people")
> scala> people.show
> ++-++
> |name|group|data|
> ++-++
> |Fred|8|   1|
> |Fred|8|   2|
> |null|   10|   3|
> |null|   10|   4|
> | Amy|   12|   5|
> | Amy|   12|   6|
> ++-++
> // Now create a derived table from that table. It doesn't matter much what.
> val variantCounts = spark.sql("select name, count(distinct(name, group, 
> data)) as variant_count from people group by name having variant_count > 1")
> variantCounts.show
> ++-+  
>   
> |name|variant_count|
> ++-+
> |Fred|2|
> |null|2|
> | Amy|2|
> ++-+
> // Now try an inner join using the regular equalTo that drops nulls. This 
> works fine.
> val innerJoinEqualTo = variantCounts.join(people, 
> variantCounts("name").equalTo(people("name")))
> innerJoinEqualTo.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay now lets switch to the <=> operator
> //
> // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error 
> like
> // "Cartesian joins could be prohibitively expensive and are disabled by 
> default. To explicitly enable them, please set spark.sql.crossJoin.enabled = 
> true;"
> //
> // if you have enabled them, you'll get the table below.
> //
> // However, we really don't want or expect a Cartesian product!
> val innerJoinSqlNullSafeEqOp = variantCounts.join(people, 
> variantCounts("name")<=>(people("name")))
> innerJoinSqlNullSafeEqOp.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> |Fred|2|null|   10|   3|
> |Fred|2|null|   10|   4|
> |Fred|2| Amy|   12|   5|
> |Fred|2| Amy|   12|   6|
> |null|2|Fred|8|   1|
> |null|2|Fred|8|   2|
> |null|2|null|   10|   3|
> |null|2|null|   10|   4|
> |null|2| Amy|   12|   5|
> |null|2| Amy|   12|   6|
> | Amy|2|Fred|8|   1|
> | Amy|2|Fred|8|   2|
> | Amy|2|null|   10|   3|
> | Amy|2|null|   10|   4|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay, let's try to construct the exact same variantCount table manually
> // so it has no relationship to the original.
> val variantCountRowsRDD = sc.parallelize(Seq(
> Row("Fred", 2),
> Row(null, 2),
> Row("Amy", 2)))
> 
> val variantCountSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("variant_count", IntegerType, nullable = true)))
> 
> val manualVariantCoun

[jira] [Assigned] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table

2017-07-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-20073:
---

Assignee: Takeshi Yamamuro

> Unexpected Cartesian product when using eqNullSafe in join with a derived 
> table
> ---
>
> Key: SPARK-20073
> URL: https://issues.apache.org/jira/browse/SPARK-20073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Everett Anderson
>Assignee: Takeshi Yamamuro
> Fix For: 2.3.0
>
>
> It appears that if you try to join tables A and B when B is derived from A 
> and you use the eqNullSafe / <=> operator for the join condition, Spark 
> performs a Cartesian product.
> However, if you perform the join on tables of the same data when they don't 
> have a relationship, the expected non-Cartesian product join occurs.
> {noformat}
> // Create some fake data.
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.Dataset
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions
> val peopleRowsRDD = sc.parallelize(Seq(
> Row("Fred", 8, 1),
> Row("Fred", 8, 2),
> Row(null, 10, 3),
> Row(null, 10, 4),
> Row("Amy", 12, 5),
> Row("Amy", 12, 6)))
> 
> val peopleSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("group", IntegerType, nullable = true),
> StructField("data", IntegerType, nullable = true)))
> 
> val people = spark.createDataFrame(peopleRowsRDD, peopleSchema)
> people.createOrReplaceTempView("people")
> scala> people.show
> ++-++
> |name|group|data|
> ++-++
> |Fred|8|   1|
> |Fred|8|   2|
> |null|   10|   3|
> |null|   10|   4|
> | Amy|   12|   5|
> | Amy|   12|   6|
> ++-++
> // Now create a derived table from that table. It doesn't matter much what.
> val variantCounts = spark.sql("select name, count(distinct(name, group, 
> data)) as variant_count from people group by name having variant_count > 1")
> variantCounts.show
> ++-+  
>   
> |name|variant_count|
> ++-+
> |Fred|2|
> |null|2|
> | Amy|2|
> ++-+
> // Now try an inner join using the regular equalTo that drops nulls. This 
> works fine.
> val innerJoinEqualTo = variantCounts.join(people, 
> variantCounts("name").equalTo(people("name")))
> innerJoinEqualTo.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay now lets switch to the <=> operator
> //
> // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error 
> like
> // "Cartesian joins could be prohibitively expensive and are disabled by 
> default. To explicitly enable them, please set spark.sql.crossJoin.enabled = 
> true;"
> //
> // if you have enabled them, you'll get the table below.
> //
> // However, we really don't want or expect a Cartesian product!
> val innerJoinSqlNullSafeEqOp = variantCounts.join(people, 
> variantCounts("name")<=>(people("name")))
> innerJoinSqlNullSafeEqOp.show
> ++-++-++  
>   
> |name|variant_count|name|group|data|
> ++-++-++
> |Fred|2|Fred|8|   1|
> |Fred|2|Fred|8|   2|
> |Fred|2|null|   10|   3|
> |Fred|2|null|   10|   4|
> |Fred|2| Amy|   12|   5|
> |Fred|2| Amy|   12|   6|
> |null|2|Fred|8|   1|
> |null|2|Fred|8|   2|
> |null|2|null|   10|   3|
> |null|2|null|   10|   4|
> |null|2| Amy|   12|   5|
> |null|2| Amy|   12|   6|
> | Amy|2|Fred|8|   1|
> | Amy|2|Fred|8|   2|
> | Amy|2|null|   10|   3|
> | Amy|2|null|   10|   4|
> | Amy|2| Amy|   12|   5|
> | Amy|2| Amy|   12|   6|
> ++-++-++
> // Okay, let's try to construct the exact same variantCount table manually
> // so it has no relationship to the original.
> val variantCountRowsRDD = sc.parallelize(Seq(
> Row("Fred", 2),
> Row(null, 2),
> Row("Amy", 2)))
> 
> val variantCountSchema = StructType(Seq(
> StructField("name", StringType, nullable = true),
> StructField("variant_count", IntegerType, nullable = true)))
> 
> val manualVariantCounts = spark.cre

[jira] [Resolved] (SPARK-21284) rename SessionCatalog.registerFunction parameter name

2017-07-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21284.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> rename SessionCatalog.registerFunction parameter name
> -
>
> Key: SPARK-21284
> URL: https://issues.apache.org/jira/browse/SPARK-21284
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Minor
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21295) Use qualified name in the error message for missing references

2017-07-03 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21295:
---

 Summary: Use qualified name in the error message for missing 
references
 Key: SPARK-21295
 URL: https://issues.apache.org/jira/browse/SPARK-21295
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Xiao Li
Assignee: Xiao Li


It is strange to see the following error message. Actually, the columns are from 
different tables. 
{noformat}
`cannot resolve '`right.a`' given input columns: [a, c, d];`
{noformat}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21295) Confusing error message for missing references

2017-07-03 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21295:

Summary: Confusing error message for missing references  (was: Use 
qualified name in the error message for missing references)

> Confusing error message for missing references
> --
>
> Key: SPARK-21295
> URL: https://issues.apache.org/jira/browse/SPARK-21295
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> It is strange to see the following error message. Actually, the columns are 
> from different tables. 
> {noformat}
> `cannot resolve '`right.a`' given input columns: [a, c, d];`
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19726) Failed to insert null timestamp value to mysql using spark jdbc

2017-07-04 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-19726.
-
   Resolution: Fixed
 Assignee: wangshuangshuang
Fix Version/s: 2.3.0

> Failed to insert null timestamp value to mysql using spark jdbc
> --
>
> Key: SPARK-19726
> URL: https://issues.apache.org/jira/browse/SPARK-19726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>Reporter: AnfengYuan
>Assignee: wangshuangshuang
> Fix For: 2.3.0
>
>
> 1. create a table in mysql
> {code:borderStyle=solid}
> CREATE TABLE `timestamp_test` (
>   `id` bigint(23) DEFAULT NULL,
>   `time_stamp` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE 
> CURRENT_TIMESTAMP
> ) ENGINE=InnoDB DEFAULT CHARSET=utf8
> {code}
> 2. insert one row using spark
> {code:borderStyle=solid}
> CREATE OR REPLACE TEMPORARY VIEW jdbcTable
> USING org.apache.spark.sql.jdbc
> OPTIONS (
>   url 
> 'jdbc:mysql://xxx.xxx.xxx.xxx:3306/default?characterEncoding=utf8&useServerPrepStmts=false&rewriteBatchedStatements=true',
>   dbtable 'timestamp_test',
>   driver 'com.mysql.jdbc.Driver',
>   user 'root',
>   password 'root'
> );
> insert into jdbcTable values (1, null);
> {code}
> the insert statement failed with exceptions:
> {code:borderStyle=solid}
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 599 in stage 1.0 failed 4 times, most recent failure: Lost task 599.3 in 
> stage 1.0 (TID 1202, A03-R07-I12-135.JD.LOCAL): 
> java.sql.BatchUpdateException: Data truncation: Incorrect datetime value: 
> '1970-01-01 08:00:00' for column 'time_stamp' at row 1
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
>   at com.mysql.jdbc.Util.getInstance(Util.java:387)
>   at 
> com.mysql.jdbc.SQLError.createBatchUpdateException(SQLError.java:1154)
>   at 
> com.mysql.jdbc.PreparedStatement.executeBatchedInserts(PreparedStatement.java:1582)
>   at 
> com.mysql.jdbc.PreparedStatement.executeBatchInternal(PreparedStatement.java:1248)
>   at com.mysql.jdbc.StatementImpl.executeBatch(StatementImpl.java:959)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:227)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:300)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:299)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect 
> datetime value: '1970-01-01 08:00:00' for column 'time_stamp' at row 1
>   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3876)
>   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3814)
>   at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2478)
>   at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625)
>   at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2551)
>   at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
>   at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
>   at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2009)
>   at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5094)
>   at 
> com.mysql.jdbc.PreparedStatement.executeBatchedInserts(PreparedStatement.java:1543)
>   ... 15 more
> {code}
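
For reference, a hedged sketch of how the same null-timestamp write can also be reached through the DataFrame writer, which goes through the same JdbcUtils save path as the SQL insert above. The connection URL, table name, and credentials are placeholders.

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// One row whose time_stamp column is null.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("time_stamp", TimestampType, nullable = true)))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(1L, null))), schema)

df.write
  .format("jdbc")
  .option("url", "jdbc:mysql://xxx.xxx.xxx.xxx:3306/default")  // placeholder
  .option("dbtable", "timestamp_test")
  .option("user", "root")
  .option("password", "root")
  .mode("append")
  .save()
{code}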



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Resolved] (SPARK-20256) Fail to start SparkContext/SparkSession with Hive support enabled when user does not have read/write privilege to Hive metastore warehouse dir

2017-07-04 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20256.
-
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.3.0
   2.2.1

> Fail to start SparkContext/SparkSession with Hive support enabled when user 
> does not have read/write privilege to Hive metastore warehouse dir
> --
>
> Key: SPARK-20256
> URL: https://issues.apache.org/jira/browse/SPARK-20256
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.1.1, 2.2.0
>Reporter: Xin Wu
>Assignee: Dongjoon Hyun
>Priority: Critical
> Fix For: 2.2.1, 2.3.0
>
>
> In a cluster setup with production Hive running, when the user wants to run 
> spark-shell using the production Hive metastore, hive-site.xml is copied to 
> SPARK_HOME/conf. So when spark-shell starts, it checks whether the "default" 
> database exists in the Hive metastore. Yet, since this 
> user may not have READ/WRITE access to the Hive warehouse 
> directory configured by Hive itself, such a permission error will prevent spark-shell 
> or any Spark application with Hive support enabled from starting at all. 
> Example error:
> {code}To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionState':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
>   at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
>   at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
>   ... 47 elided
> Caused by: java.lang.reflect.InvocationTargetException: 
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.security.AccessControlException: Permission 
> denied: user=notebook, access=READ, 
> inode="/apps/hive/warehouse":hive:hadoop:drwxrwx---
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1712)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1686)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:8238)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:1933)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1455)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1697)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)
> );
>   at sun.reflect.NativeConstructorAccessorImpl.new

[jira] [Created] (SPARK-21307) Remove SQLConf parameters from the parser-related classes.

2017-07-04 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21307:
---

 Summary: Remove SQLConf parameters from the parser-related classes.
 Key: SPARK-21307
 URL: https://issues.apache.org/jira/browse/SPARK-21307
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Xiao Li
Assignee: Xiao Li


Remove SQLConf parameters from the parser-related classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21309) Remove SQLConf parameters from the analyzer

2017-07-04 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21309:
---

 Summary: Remove SQLConf parameters from the analyzer
 Key: SPARK-21309
 URL: https://issues.apache.org/jira/browse/SPARK-21309
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Xiao Li
Assignee: Xiao Li


Remove SQLConf parameters from the analyzer



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21308) Remove SQLConf parameters from the optimizer

2017-07-04 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21308:
---

 Summary: Remove SQLConf parameters from the optimizer
 Key: SPARK-21308
 URL: https://issues.apache.org/jira/browse/SPARK-21308
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Xiao Li
Assignee: Xiao Li


Remove SQLConf parameters from the optimizer



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19439) PySpark's registerJavaFunction Should Support UDAFs

2017-07-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-19439:
---

Assignee: Jeff Zhang

> PySpark's registerJavaFunction Should Support UDAFs
> ---
>
> Key: SPARK-19439
> URL: https://issues.apache.org/jira/browse/SPARK-19439
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Keith Bourgoin
>Assignee: Jeff Zhang
>
> When trying to import a Scala UDAF using registerJavaFunction, I get this 
> error:
> {code}
> In [1]: sqlContext.registerJavaFunction('geo_mean', 
> 'com.foo.bar.GeometricMean')
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 sqlContext.registerJavaFunction('geo_mean', 
> 'com.foo.bar.GeometricMean')
> /home/kfb/src/projects/spark/python/pyspark/sql/context.pyc in 
> registerJavaFunction(self, name, javaClassName, returnType)
> 227 if returnType is not None:
> 228 jdt = 
> self.sparkSession._jsparkSession.parseDataType(returnType.json())
> --> 229 self.sparkSession._jsparkSession.udf().registerJava(name, 
> javaClassName, jdt)
> 230
> 231 # TODO(andrew): delete this once we refactor things to take in 
> SparkSession
> /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
>1131 answer = self.gateway_client.send_command(command)
>1132 return_value = get_return_value(
> -> 1133 answer, self.gateway_client, self.target_id, self.name)
>1134
>1135 for temp_arg in temp_args:
> /home/kfb/src/projects/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>  61 def deco(*a, **kw):
>  62 try:
> ---> 63 return f(*a, **kw)
>  64 except py4j.protocol.Py4JJavaError as e:
>  65 s = e.java_exception.toString()
> /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py 
> in get_return_value(answer, gateway_client, target_id, name)
> 317 raise Py4JJavaError(
> 318 "An error occurred while calling {0}{1}{2}.\n".
> --> 319 format(target_id, ".", name), value)
> 320 else:
> 321 raise Py4JError(
> Py4JJavaError: An error occurred while calling o28.registerJava.
> : java.io.IOException: UDF class com.foo.bar.GeometricMean doesn't implement 
> any UDF interface
>   at 
> org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:438)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:214)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> According to SPARK-10915, UDAFs in Python aren't happening anytime soon. 
> Without this, there's no way to get Scala UDAFs into Python Spark SQL 
> whatsoever. Fixing that would be a huge help so that we can keep aggregations 
> in the JVM and using DataFrames. Otherwise, all our code has to drop down to 
> RDDs and live in Python.
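
For context, a hedged sketch of the kind of Scala UDAF one would want to expose through registerJavaFunction. This is our own illustration of a geometric mean, not the reporter's com.foo.bar.GeometricMean.

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Aggregates double values into their geometric mean: exp(mean(log(x))).
class GeometricMean extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(
    StructField("count", LongType) :: StructField("logSum", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 0.0
  }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + 1
      buffer(1) = buffer.getDouble(1) + math.log(input.getDouble(0))
    }
  }
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getDouble(1) + buffer2.getDouble(1)
  }
  def evaluate(buffer: Row): Double =
    if (buffer.getLong(0) == 0) Double.NaN
    else math.exp(buffer.getDouble(1) / buffer.getLong(0))
}
{code}

Today such a class can only be registered and called from Scala/Java; the request here is to let PySpark's registerJavaFunction accept it as well.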



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19439) PySpark's registerJavaFunction Should Support UDAFs

2017-07-05 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-19439.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> PySpark's registerJavaFunction Should Support UDAFs
> ---
>
> Key: SPARK-19439
> URL: https://issues.apache.org/jira/browse/SPARK-19439
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Keith Bourgoin
>Assignee: Jeff Zhang
> Fix For: 2.3.0
>
>
> When trying to import a Scala UDAF using registerJavaFunction, I get this 
> error:
> {code}
> In [1]: sqlContext.registerJavaFunction('geo_mean', 
> 'com.foo.bar.GeometricMean')
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 sqlContext.registerJavaFunction('geo_mean', 
> 'com.foo.bar.GeometricMean')
> /home/kfb/src/projects/spark/python/pyspark/sql/context.pyc in 
> registerJavaFunction(self, name, javaClassName, returnType)
> 227 if returnType is not None:
> 228 jdt = 
> self.sparkSession._jsparkSession.parseDataType(returnType.json())
> --> 229 self.sparkSession._jsparkSession.udf().registerJava(name, 
> javaClassName, jdt)
> 230
> 231 # TODO(andrew): delete this once we refactor things to take in 
> SparkSession
> /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
>1131 answer = self.gateway_client.send_command(command)
>1132 return_value = get_return_value(
> -> 1133 answer, self.gateway_client, self.target_id, self.name)
>1134
>1135 for temp_arg in temp_args:
> /home/kfb/src/projects/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>  61 def deco(*a, **kw):
>  62 try:
> ---> 63 return f(*a, **kw)
>  64 except py4j.protocol.Py4JJavaError as e:
>  65 s = e.java_exception.toString()
> /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py 
> in get_return_value(answer, gateway_client, target_id, name)
> 317 raise Py4JJavaError(
> 318 "An error occurred while calling {0}{1}{2}.\n".
> --> 319 format(target_id, ".", name), value)
> 320 else:
> 321 raise Py4JError(
> Py4JJavaError: An error occurred while calling o28.registerJava.
> : java.io.IOException: UDF class com.foo.bar.GeometricMean doesn't implement 
> any UDF interface
>   at 
> org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:438)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:214)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> According to SPARK-10915, UDAFs in Python aren't happening anytime soon. 
> Without this, there's no way to get Scala UDAFs into Python Spark SQL 
> whatsoever. Fixing that would be a huge help so that we can keep aggregations 
> in the JVM and using DataFrames. Otherwise, all our code has to drop down to 
> RDDs and live in Python.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21281) cannot create empty typed array column

2017-07-07 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21281.
-
   Resolution: Fixed
 Assignee: Takeshi Yamamuro
Fix Version/s: 2.3.0

> cannot create empty typed array column
> --
>
> Key: SPARK-21281
> URL: https://issues.apache.org/jira/browse/SPARK-21281
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Saif Addin
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 2.3.0
>
>
> Hi all
> I am running this piece of code
> {code:java}
> val data = spark.read.parquet("somedata.parquet")
> data.withColumn("my_new_column", array().cast("array")).show
> {code}
> and it works fine
> {code:java}
> +------+---------+------------+-------------+
> |itemid|sentiment|        text|my_new_column|
> +------+---------+------------+-------------+
> |     1|        0|         ...|           []|
> |     2|        0|         ...|           []|
> |     3|        1|      omg...|           []|
> |     4|        0|  .. Omga...|           []|
> +------+---------+------------+-------------+
> {code}
> but when I do
> {code:java}
> val data = spark.read.parquet("somedata.parquet")
> import org.apache.spark.sql.types._
> data.withColumn("my_new_column", array().cast("array").show
> {code}
> I get:
> {code:java}
> scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$)
>   at org.apache.spark.sql.catalyst.expressions.Cast.castToInt(Cast.scala:264)
>   at 
> org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$cast(Cast.scala:433)
>   at org.apache.spark.sql.catalyst.expressions.Cast.castArray(Cast.scala:380)
>   at 
> org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$cast(Cast.scala:437)
>   at 
> org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:447)
>   at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:447)
>   at 
> org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:449)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:325)
>   at 
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:50)
>   at 
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:43)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:248)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:258)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:262)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:262)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$6.apply(QueryPlan.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:267)
>   at 
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1.applyOrElse(expressions.scala:43)

[jira] [Closed] (SPARK-21307) Remove SQLConf parameters from the parser-related classes.

2017-07-08 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li closed SPARK-21307.
---
Resolution: Won't Fix

> Remove SQLConf parameters from the parser-related classes.
> --
>
> Key: SPARK-21307
> URL: https://issues.apache.org/jira/browse/SPARK-21307
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> Remove SQLConf parameters from the parser-related classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21350) Fix the error message when the number of arguments is wrong when invoking a UDF

2017-07-08 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21350:
---

 Summary: Fix the error message when the number of arguments is 
wrong when invoking a UDF
 Key: SPARK-21350
 URL: https://issues.apache.org/jira/browse/SPARK-21350
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.1, 2.0.2, 2.2.0
Reporter: Xiao Li
Assignee: Xiao Li


Got a confusing error message when a UDF is invoked with the wrong number of 
arguments. 

{noformat}
val df = spark.emptyDataFrame
spark.udf.register("foo", (_: String).length)
df.selectExpr("foo(2, 3, 4)")
{noformat}
{noformat}
org.apache.spark.sql.UDFSuite$$anonfun$9$$anonfun$apply$mcV$sp$12 cannot be 
cast to scala.Function3
java.lang.ClassCastException: 
org.apache.spark.sql.UDFSuite$$anonfun$9$$anonfun$apply$mcV$sp$12 cannot be 
cast to scala.Function3
at 
org.apache.spark.sql.catalyst.expressions.ScalaUDF.(ScalaUDF.scala:109)
{noformat}
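
By contrast, a hedged sketch of the same registration invoked with the matching arity, which resolves normally; the requested fix is for the mismatched case above to report the arity problem instead of surfacing a ClassCastException.

{code}
// Hedged sketch: same one-argument UDF, called with one argument.
spark.udf.register("foo", (s: String) => s.length)
spark.sql("SELECT foo('abc')").collect()  // Array([3])
{code}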




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21354) INPUT FILE related functions do not support more than one sources

2017-07-09 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21354:
---

 Summary: INPUT FILE related functions do not support more than one 
sources
 Key: SPARK-21354
 URL: https://issues.apache.org/jira/browse/SPARK-21354
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.1, 2.0.2, 2.2.0
Reporter: Xiao Li
Assignee: Xiao Li


{noformat}
hive> select *, INPUT__FILE__NAME FROM t1, t2;
FAILED: SemanticException Column INPUT__FILE__NAME Found in more than One 
Tables/Subqueries
{noformat}

The built-in functions {{input_file_name}}, {{input_file_block_start}}, and 
{{input_file_block_length}} do not support more than one source, just as 
Hive does not.
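
A hedged sketch of the behavior in question, reusing the hypothetical table names t1 and t2 from the Hive example above and assuming both are file-based sources:

{code}
// Over a single source the value is well defined.
spark.sql("SELECT *, input_file_name() FROM t1").show()

// Over two sources the value is ambiguous. Hive rejects such a query; per this
// issue, Spark currently does not block it.
spark.sql("SELECT *, input_file_name() FROM t1 CROSS JOIN t2").show()
{code}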



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21354) INPUT FILE related functions do not support more than one sources

2017-07-09 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21354:

Description: 
{noformat}
hive> select *, INPUT__FILE__NAME FROM t1, t2;
FAILED: SemanticException Column INPUT__FILE__NAME Found in more than One 
Tables/Subqueries
{noformat}

The built-in functions {{input_file_name}}, {{input_file_block_start}}, and 
{{input_file_block_length}} do not support more than one source, just as Hive does 
not. Currently, we do not block such queries, and the outputs are ambiguous.

  was:
{noformat}
hive> select *, INPUT__FILE__NAME FROM t1, t2;
FAILED: SemanticException Column INPUT__FILE__NAME Found in more than One 
Tables/Subqueries
{noformat}

The built-in functions {{input_file_name}}, {{input_file_block_start}}, and 
{{input_file_block_length}} do not support more than one source, just as 
Hive does not.


> INPUT FILE related functions do not support more than one sources
> -
>
> Key: SPARK-21354
> URL: https://issues.apache.org/jira/browse/SPARK-21354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> {noformat}
> hive> select *, INPUT__FILE__NAME FROM t1, t2;
> FAILED: SemanticException Column INPUT__FILE__NAME Found in more than One 
> Tables/Subqueries
> {noformat}
> The built-in functions {{input_file_name}}, {{input_file_block_start}}, and 
> {{input_file_block_length}} do not support more than one source, just as Hive does 
> not. Currently, we do not block such queries, and the outputs are ambiguous.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21272) SortMergeJoin LeftAnti does not update numOutputRows

2017-07-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21272.
-
   Resolution: Fixed
 Assignee: Juliusz Sompolski
Fix Version/s: 2.3.0
   2.2.1

> SortMergeJoin LeftAnti does not update numOutputRows
> 
>
> Key: SPARK-21272
> URL: https://issues.apache.org/jira/browse/SPARK-21272
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Trivial
> Fix For: 2.2.1, 2.3.0
>
>
> Output rows metric not updated in one of the branches.
> PR pending.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21079) ANALYZE TABLE fails to calculate totalSize for a partitioned table

2017-07-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21079:

Labels:   (was: easyfix)

> ANALYZE TABLE fails to calculate totalSize for a partitioned table
> --
>
> Key: SPARK-21079
> URL: https://issues.apache.org/jira/browse/SPARK-21079
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Maria
>Assignee: Maria
> Fix For: 2.2.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> ANALYZE TABLE table COMPUTE STATISTICS invoked for a partitioned table produces 
> totalSize = 0.
> AnalyzeTableCommand fetches the table-level storage URI and calculates the total 
> size of files in the corresponding directory recursively. However, for partitioned 
> tables, each partition has its own storage URI, which may not be a 
> subdirectory of the table-level storage URI.
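
A hedged sketch of the scenario described above, with hypothetical table and partition locations, assuming a Hive-enabled SparkSession:

{code}
// A partition whose location lives outside the table's root directory.
spark.sql("CREATE TABLE logs (msg STRING) PARTITIONED BY (dt STRING) STORED AS PARQUET")
spark.sql("ALTER TABLE logs ADD PARTITION (dt='2017-06-01') " +
  "LOCATION '/data/elsewhere/dt=2017-06-01'")

// Before the fix, only the table-level directory was scanned recursively, so
// partitions stored elsewhere were missed and totalSize could come back as 0.
spark.sql("ANALYZE TABLE logs COMPUTE STATISTICS")
{code}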



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21059) LikeSimplification can NPE on null pattern

2017-07-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21059:

Fix Version/s: (was: 2.3.0)

> LikeSimplification can NPE on null pattern
> --
>
> Key: SPARK-21059
> URL: https://issues.apache.org/jira/browse/SPARK-21059
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20920) ForkJoinPool pools are leaked when writing hive tables with many partitions

2017-07-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20920:

Fix Version/s: (was: 2.3.0)

> ForkJoinPool pools are leaked when writing hive tables with many partitions
> ---
>
> Key: SPARK-20920
> URL: https://issues.apache.org/jira/browse/SPARK-20920
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Rares Mirica
>Assignee: Sean Owen
> Fix For: 2.1.2, 2.2.0
>
>
> This bug is loosely related to SPARK-17396
> In this case it happens when writing to a Hive table with many, many 
> partitions (my table is partitioned by hour and stores data it gets from 
> Kafka in a Spark Streaming application):
> df.repartition()
>   .write
>   .format("orc")
>   .option("path", s"$tablesStoragePath/$tableName")
>   .mode(SaveMode.Append)
>   .partitionBy("dt", "hh")
>   .saveAsTable(tableName)
> As this table grows beyond a certain size, ForkJoinPool pools start leaking. 
> Upon examination (with a debugger) I found that the caller is 
> AlterTableRecoverPartitionsCommand and the problem happens when 
> `evalTaskSupport` is used (line 555). I have tried setting a very large 
> threshold via `spark.rdd.parallelListingThreshold` and the problem went away.
> My assumption is that the problem happens in this case and not in the one in 
> SPARK-17396 due to the fact that AlterTableRecoverPartitionsCommand is a case 
> class while UnionRDD is an object so multiple instances are not possible, 
> therefore no leak.
> Regards,
> Rares
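
A hedged sketch of the workaround the reporter mentions; the threshold value is arbitrary and only needs to exceed the number of partitions being listed, so that the sequential listing path is taken and no extra ForkJoinPool is created:

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.rdd.parallelListingThreshold", "100000")  // arbitrary large value
  .getOrCreate()
{code}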



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21043) Add unionByName API to Dataset

2017-07-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21043.
-
   Resolution: Fixed
 Assignee: Takeshi Yamamuro
Fix Version/s: 2.3.0

> Add unionByName API to Dataset
> --
>
> Key: SPARK-21043
> URL: https://issues.apache.org/jira/browse/SPARK-21043
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>Assignee: Takeshi Yamamuro
> Fix For: 2.3.0
>
>
> It would be useful to add unionByName which resolves columns by name, in 
> addition to the existing union (which resolves by position).
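
A hedged usage sketch of the new API, assuming a spark-shell session; per this ticket it becomes available in 2.3.0:

{code}
import spark.implicits._

val df1 = Seq((1, "a")).toDF("id", "name")
val df2 = Seq(("b", 2)).toDF("name", "id")

// union resolves by position, so it would pair df1.id with df2.name.
// unionByName pairs columns by name, regardless of their order:
df1.unionByName(df2).show()  // two rows, columns (id, name): (1, a) and (2, b)
{code}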



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19285) Java - Provide user-defined function of 0 arguments (UDF0)

2017-07-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-19285:
---

Assignee: Xiao Li

> Java - Provide user-defined function of 0 arguments (UDF0)
> --
>
> Key: SPARK-19285
> URL: https://issues.apache.org/jira/browse/SPARK-19285
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Amit Baghel
>Assignee: Xiao Li
>Priority: Minor
>
> I need to implement a zero-argument UDF, but the Spark Java API doesn't provide 
> UDF0. 
> https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java
> As a workaround I am creating a UDF1 with one argument and not using that 
> argument.
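
A hedged sketch of the workaround described above, written against the existing Java UDF interfaces (which can also be implemented from Scala): a UDF1 whose single argument is ignored. The function name and return value are hypothetical.

{code}
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.types.StringType

// The argument exists only to satisfy the UDF1 interface and is never used.
val zeroArgWorkaround = new UDF1[Object, String] {
  override def call(ignored: Object): String = "constant-value"
}
spark.udf.register("zero_arg", zeroArgWorkaround, StringType)

// Callers still have to pass a dummy argument:
spark.sql("SELECT zero_arg(NULL)").show()
{code}

A proper UDF0 interface would remove the need for the dummy argument.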



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19285) Java - Provide user-defined function of 0 arguments (UDF0)

2017-07-10 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-19285:

Component/s: (was: Java API)
 SQL

> Java - Provide user-defined function of 0 arguments (UDF0)
> --
>
> Key: SPARK-19285
> URL: https://issues.apache.org/jira/browse/SPARK-19285
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Amit Baghel
>Priority: Minor
>
> I need to implement a zero-argument UDF, but the Spark Java API doesn't provide 
> UDF0. 
> https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java
> As a workaround I am creating a UDF1 with one argument and not using that 
> argument.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError

2017-07-11 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-18598:
---

Assignee: Xiao Li

> Encoding a Java Bean with extra accessors, produces inconsistent Dataset, 
> resulting in AssertionError
> -
>
> Key: SPARK-18598
> URL: https://issues.apache.org/jira/browse/SPARK-18598
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hamish Morgan
>Assignee: Xiao Li
>Priority: Minor
> Fix For: 2.3.0
>
>
> Most operations of {{org.apache.spark.sql.Dataset}} throw 
> {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java 
> bean {{Encoder}}, where the bean has more accessors than properties.
> The following unit test demonstrates the steps to replicate:
> {code}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoder;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
> import org.junit.Test;
> import org.xml.sax.SAXException;
> import java.io.IOException;
> import static java.util.Collections.singletonList;
> public class SparkBeanEncoderTest {
> public static class TestBean2 {
> private String name;
> public void setName(String name) {
> this.name = name;
> }
> public String getName() {
> return name;
> }
> public String getName2() {
> return name.toLowerCase();
> }
> }
> @Test
> public void testCreateDatasetFromBeanFailure() throws IOException, 
> SAXException {
> SparkSession spark = SparkSession
> .builder()
> .master("local")
> .getOrCreate();
> TestBean2 bean = new TestBean2();
> bean.setName("testing123");
> Encoder encoder = Encoders.bean(TestBean2.class);
> Dataset dataset = spark.createDataset(singletonList(bean), 
> encoder);
> dataset.show();
> spark.stop();
> }
> }
> {code}
> Running the above produces the following output:
> {code}
> 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2
> 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/11/27 14:00:04 WARN Utils: Your hostname,  resolves to a loopback 
> address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0)
> 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(); groups 
> with view permissions: Set(); users  with modify permissions: Set(); 
> groups with modify permissions: Set()
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on 
> port 34688.
> 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker
> 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster
> 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29
> 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB
> 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
> http://192.168.1.68:4040
> 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost
> 16/11/27 14:00:05 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688.
> 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on 
> 192.168.1.68:42688
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, 
> 42688)
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> 16/11/27 14:00:05 INFO SharedState: Warehouse path is 
> 'file:/home/hamish/git/language-identifier/wikidump/spark-warehouse'.
> 16/11/27 

[jira] [Reopened] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError

2017-07-11 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-18598:
-

> Encoding a Java Bean with extra accessors, produces inconsistent Dataset, 
> resulting in AssertionError
> -
>
> Key: SPARK-18598
> URL: https://issues.apache.org/jira/browse/SPARK-18598
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hamish Morgan
>Assignee: Xiao Li
>Priority: Minor
>
> Most operations of {{org.apache.spark.sql.Dataset}} throw 
> {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java 
> bean {{Encoder}}, where the bean has more accessors than properties.
> The following unit test demonstrates the steps to replicate:
> {code}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoder;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
> import org.junit.Test;
> import org.xml.sax.SAXException;
> import java.io.IOException;
> import static java.util.Collections.singletonList;
> public class SparkBeanEncoderTest {
> public static class TestBean2 {
> private String name;
> public void setName(String name) {
> this.name = name;
> }
> public String getName() {
> return name;
> }
> public String getName2() {
> return name.toLowerCase();
> }
> }
> @Test
> public void testCreateDatasetFromBeanFailure() throws IOException, 
> SAXException {
> SparkSession spark = SparkSession
> .builder()
> .master("local")
> .getOrCreate();
> TestBean2 bean = new TestBean2();
> bean.setName("testing123");
> Encoder encoder = Encoders.bean(TestBean2.class);
> Dataset dataset = spark.createDataset(singletonList(bean), 
> encoder);
> dataset.show();
> spark.stop();
> }
> }
> {code}
> Running the above produces the following output:
> {code}
> 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2
> 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/11/27 14:00:04 WARN Utils: Your hostname,  resolves to a loopback 
> address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0)
> 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(); groups 
> with view permissions: Set(); users  with modify permissions: Set(); 
> groups with modify permissions: Set()
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on 
> port 34688.
> 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker
> 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster
> 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29
> 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB
> 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
> http://192.168.1.68:4040
> 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost
> 16/11/27 14:00:05 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688.
> 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on 
> 192.168.1.68:42688
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, 
> 42688)
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> 16/11/27 14:00:05 INFO SharedState: Warehouse path is 
> 'file:/home/hamish/git/language-identifier/wikidump/spark-warehouse'.
> 16/11/27 14:00:05 INFO CodeGenerator: Code generated in 166.762154 

[jira] [Closed] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError

2017-07-11 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li closed SPARK-18598.
---
   Resolution: Unresolved
Fix Version/s: (was: 2.3.0)

> Encoding a Java Bean with extra accessors, produces inconsistent Dataset, 
> resulting in AssertionError
> -
>
> Key: SPARK-18598
> URL: https://issues.apache.org/jira/browse/SPARK-18598
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hamish Morgan
>Priority: Minor
>
> Most operations of {{org.apache.spark.sql.Dataset}} throw 
> {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java 
> bean {{Encoder}}, where the bean has more accessors than properties.
> The following unit test demonstrates the steps to replicate:
> {code}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoder;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
> import org.junit.Test;
> import org.xml.sax.SAXException;
> import java.io.IOException;
> import static java.util.Collections.singletonList;
> public class SparkBeanEncoderTest {
> public static class TestBean2 {
> private String name;
> public void setName(String name) {
> this.name = name;
> }
> public String getName() {
> return name;
> }
> public String getName2() {
> return name.toLowerCase();
> }
> }
> @Test
> public void testCreateDatasetFromBeanFailure() throws IOException, 
> SAXException {
> SparkSession spark = SparkSession
> .builder()
> .master("local")
> .getOrCreate();
> TestBean2 bean = new TestBean2();
> bean.setName("testing123");
> Encoder encoder = Encoders.bean(TestBean2.class);
> Dataset dataset = spark.createDataset(singletonList(bean), 
> encoder);
> dataset.show();
> spark.stop();
> }
> }
> {code}
> Running the above produces the following output:
> {code}
> 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2
> 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/11/27 14:00:04 WARN Utils: Your hostname,  resolves to a loopback 
> address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0)
> 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(); groups 
> with view permissions: Set(); users  with modify permissions: Set(); 
> groups with modify permissions: Set()
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on 
> port 34688.
> 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker
> 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster
> 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29
> 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB
> 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
> http://192.168.1.68:4040
> 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost
> 16/11/27 14:00:05 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688.
> 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on 
> 192.168.1.68:42688
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, 
> 42688)
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> 16/11/27 14:00:05 INFO SharedState: Warehouse path is 
> 'file:/home/hamish/git/language-identifier/wikidump/spark-warehouse'.
> 16/11/27 14:00:05 INFO CodeGenerator

[jira] [Updated] (SPARK-19285) Java - Provide user-defined function of 0 arguments (UDF0)

2017-07-11 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-19285:

Priority: Major  (was: Minor)

> Java - Provide user-defined function of 0 arguments (UDF0)
> --
>
> Key: SPARK-19285
> URL: https://issues.apache.org/jira/browse/SPARK-19285
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Amit Baghel
>Assignee: Xiao Li
> Fix For: 2.3.0
>
>
> I need to implement a zero-argument UDF, but the Spark Java API doesn't provide 
> UDF0. 
> https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java
> As a workaround I am creating a UDF1 with one argument and not using that 
> argument.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError

2017-07-11 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-18598.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Encoding a Java Bean with extra accessors, produces inconsistent Dataset, 
> resulting in AssertionError
> -
>
> Key: SPARK-18598
> URL: https://issues.apache.org/jira/browse/SPARK-18598
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hamish Morgan
>Assignee: Xiao Li
>Priority: Minor
> Fix For: 2.3.0
>
>
> Most operations of {{org.apache.spark.sql.Dataset}} throw 
> {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java 
> bean {{Encoder}}, where the bean has more accessors than properties.
> The following unit test demonstrates the steps to replicate:
> {code}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoder;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
> import org.junit.Test;
> import org.xml.sax.SAXException;
> import java.io.IOException;
> import static java.util.Collections.singletonList;
> public class SparkBeanEncoderTest {
> public static class TestBean2 {
> private String name;
> public void setName(String name) {
> this.name = name;
> }
> public String getName() {
> return name;
> }
> public String getName2() {
> return name.toLowerCase();
> }
> }
> @Test
> public void testCreateDatasetFromBeanFailure() throws IOException, 
> SAXException {
> SparkSession spark = SparkSession
> .builder()
> .master("local")
> .getOrCreate();
> TestBean2 bean = new TestBean2();
> bean.setName("testing123");
> Encoder encoder = Encoders.bean(TestBean2.class);
> Dataset dataset = spark.createDataset(singletonList(bean), 
> encoder);
> dataset.show();
> spark.stop();
> }
> }
> {code}
> Running the above produces the following output:
> {code}
> 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2
> 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/11/27 14:00:04 WARN Utils: Your hostname,  resolves to a loopback 
> address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0)
> 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(); groups 
> with view permissions: Set(); users  with modify permissions: Set(); 
> groups with modify permissions: Set()
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on 
> port 34688.
> 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker
> 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster
> 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29
> 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB
> 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
> http://192.168.1.68:4040
> 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost
> 16/11/27 14:00:05 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688.
> 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on 
> 192.168.1.68:42688
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, 
> 42688)
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> 16/11/27 14:00:05 INFO SharedState: Warehouse path is 
> 'file:/home/hamish/git/language-identifier/wikidump/spark-

[jira] [Assigned] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError

2017-07-11 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-18598:
---

Assignee: (was: Xiao Li)

> Encoding a Java Bean with extra accessors, produces inconsistent Dataset, 
> resulting in AssertionError
> -
>
> Key: SPARK-18598
> URL: https://issues.apache.org/jira/browse/SPARK-18598
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hamish Morgan
>Priority: Minor
>
> Most operations of {{org.apache.spark.sql.Dataset}} throw 
> {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java 
> bean {{Encoder}}, where the bean has more accessors than properties.
> The following unit test demonstrates the steps to replicate:
> {code}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoder;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
> import org.junit.Test;
> import org.xml.sax.SAXException;
> import java.io.IOException;
> import static java.util.Collections.singletonList;
> public class SparkBeanEncoderTest {
> public static class TestBean2 {
> private String name;
> public void setName(String name) {
> this.name = name;
> }
> public String getName() {
> return name;
> }
> public String getName2() {
> return name.toLowerCase();
> }
> }
> @Test
> public void testCreateDatasetFromBeanFailure() throws IOException, 
> SAXException {
> SparkSession spark = SparkSession
> .builder()
> .master("local")
> .getOrCreate();
> TestBean2 bean = new TestBean2();
> bean.setName("testing123");
> Encoder encoder = Encoders.bean(TestBean2.class);
> Dataset dataset = spark.createDataset(singletonList(bean), 
> encoder);
> dataset.show();
> spark.stop();
> }
> }
> {code}
> Running the above produces the following output:
> {code}
> 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2
> 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/11/27 14:00:04 WARN Utils: Your hostname,  resolves to a loopback 
> address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0)
> 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: 
> 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(); groups 
> with view permissions: Set(); users  with modify permissions: Set(); 
> groups with modify permissions: Set()
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on 
> port 34688.
> 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker
> 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster
> 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29
> 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB
> 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
> http://192.168.1.68:4040
> 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost
> 16/11/27 14:00:05 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688.
> 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on 
> 192.168.1.68:42688
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, 
> 42688)
> 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager 
> BlockManagerId(driver, 192.168.1.68, 42688)
> 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> 16/11/27 14:00:05 INFO SharedState: Warehouse path is 
> 'file:/home/hamish/git/language-identifier/wikidump/spark-warehouse'.
> 16/11/27 14:00:05 INFO CodeGenerator: Code generated in 166.
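> A possible workaround, sketched under the assumption that the read-only 
> {{getName2()}} accessor is what produces the extra, unsettable field in the 
> inferred schema: expose the derived value through a method that bean 
> introspection does not treat as a property. The class name {{TestBean3}} and 
> method name {{lowerCaseName()}} below are hypothetical, not from the report.
> {code}
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
>
> import static java.util.Collections.singletonList;
>
> public class SparkBeanEncoderWorkaround {
>
>     public static class TestBean3 {
>         private String name;
>
>         public void setName(String name) {
>             this.name = name;
>         }
>
>         public String getName() {
>             return name;
>         }
>
>         // Not named like a getter, so Encoders.bean() does not infer a
>         // read-only "name2" property for it.
>         public String lowerCaseName() {
>             return name.toLowerCase();
>         }
>     }
>
>     public static void main(String[] args) {
>         SparkSession spark = SparkSession.builder().master("local").getOrCreate();
>
>         TestBean3 bean = new TestBean3();
>         bean.setName("testing123");
>
>         Dataset<TestBean3> dataset =
>                 spark.createDataset(singletonList(bean), Encoders.bean(TestBean3.class));
>         dataset.show();
>
>         spark.stop();
>     }
> }
> {code}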

[jira] [Resolved] (SPARK-19285) Java - Provide user-defined function of 0 arguments (UDF0)

2017-07-11 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-19285.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Java - Provide user-defined function of 0 arguments (UDF0)
> --
>
> Key: SPARK-19285
> URL: https://issues.apache.org/jira/browse/SPARK-19285
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Amit Baghel
>Assignee: Xiao Li
> Fix For: 2.3.0
>
>
> I need to implement a zero-argument UDF, but the Spark Java API doesn't 
> provide UDF0: 
> https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java
> As a workaround, I am creating a UDF1 with one argument and not using that 
> argument.
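> A minimal sketch of that workaround, assuming a local {{SparkSession}}; the 
> UDF name {{appVersion}} and its return value are placeholders rather than 
> the reporter's actual code:
> {code}
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.api.java.UDF1;
> import org.apache.spark.sql.types.DataTypes;
>
> public class ZeroArgUdfWorkaround {
>     public static void main(String[] args) {
>         SparkSession spark = SparkSession.builder().master("local").getOrCreate();
>
>         // Register a one-argument UDF and simply ignore the argument,
>         // because no UDF0 interface is available.
>         spark.udf().register("appVersion",
>                 (UDF1<Integer, String>) ignored -> "1.0.0",
>                 DataTypes.StringType);
>
>         // Callers still have to pass a dummy argument.
>         spark.sql("SELECT appVersion(0)").show();
>
>         spark.stop();
>     }
> }
> {code}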



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21426) Fix test failure due to unsupported hex literals.

2017-07-15 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21426:
---

 Summary: Fix test failure due to unsupported hex literals. 
 Key: SPARK-21426
 URL: https://issues.apache.org/jira/browse/SPARK-21426
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 2.0.2
Reporter: Xiao Li
Assignee: Xiao Li


Spark 2.0 does not support hex literals, so the test case failed after 
backporting https://github.com/apache/spark/pull/18571
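A hypothetical illustration of the kind of expression involved, assuming the 
{{X'...'}} binary-literal syntax (the actual test case is in the pull request 
above): branches that support hex literals parse the statement, while the 
Spark 2.0 parser rejects it.
{code}
import org.apache.spark.sql.SparkSession;

public class HexLiteralExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().master("local").getOrCreate();

        // Fails with a parse error on Spark 2.0; parses on branches that
        // accept hexadecimal (binary) literals.
        spark.sql("SELECT X'A1B2' AS bin").show();

        spark.stop();
    }
}
{code}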



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21426) Fix test failure due to unsupported hex literals.

2017-07-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21426.
-
   Resolution: Fixed
Fix Version/s: 2.0.3

> Fix test failure due to unsupported hex literals. 
> --
>
> Key: SPARK-21426
> URL: https://issues.apache.org/jira/browse/SPARK-21426
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.0.2
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.3
>
>
> Spark 2.0 does not support hex literals, so the test case failed after 
> backporting https://github.com/apache/spark/pull/18571



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21426) Fix test failure due to unsupported hex literals.

2017-07-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-21426:

Affects Version/s: (was: 2.0.2)
   2.0.3

> Fix test failure due to unsupported hex literals. 
> --
>
> Key: SPARK-21426
> URL: https://issues.apache.org/jira/browse/SPARK-21426
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.0.3
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.3
>
>
> Spark 2.0 does not support hex literals, so the test case failed after 
> backporting https://github.com/apache/spark/pull/18571



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21344) BinaryType comparison does signed byte array comparison

2017-07-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-21344:
---

Assignee: Kazuaki Ishizaki

> BinaryType comparison does signed byte array comparison
> ---
>
> Key: SPARK-21344
> URL: https://issues.apache.org/jira/browse/SPARK-21344
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.1
>Reporter: Shubham Chopra
>Assignee: Kazuaki Ishizaki
> Fix For: 2.0.3, 2.1.2, 2.2.1
>
>
> BinaryType used by Spark SQL defines its ordering using signed byte 
> comparisons, which can lead to unexpected behavior. The following code 
> snippet shows the error (a short sketch of the signed-vs-unsigned mismatch 
> follows it):
> {code}
> import org.apache.spark.sql.functions.col
>
> case class TestRecord(col0: Array[Byte])
>
> def convertToBytes(i: Long): Array[Byte] = {
>   // Big-endian 8-byte encoding of the timestamp.
>   val bb = java.nio.ByteBuffer.allocate(8)
>   bb.putLong(i)
>   bb.array
> }
>
> def test = {
>   val sql = spark.sqlContext
>   import sql.implicits._
>   val timestamp = 1498772083037L
>   val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
>   val testDF = sc.parallelize(data).toDF
>   val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) &&
>     col("col0") < convertToBytes(timestamp + 50L))
>   val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) &&
>     col("col0") < convertToBytes(timestamp + 100L))
>   val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) &&
>     col("col0") < convertToBytes(timestamp + 100L))
>   assert(filter1.count == 50)
>   assert(filter2.count == 50)
>   assert(filter3.count == 100)
> }
> {code}
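> A small, self-contained sketch (not taken from the report above) of the 
> signed-vs-unsigned mismatch behind this: once the high bit of a byte is set, 
> signed comparison orders it before bytes that are smaller when read as 
> unsigned values.
> {code}
> public class SignedByteOrdering {
>     public static void main(String[] args) {
>         byte a = (byte) 0x7F; // 127, both signed and unsigned
>         byte b = (byte) 0x80; // -128 signed, 128 unsigned
>
>         // Signed comparison puts 0x80 before 0x7F ...
>         System.out.println(Byte.compare(a, b) > 0);                       // true
>
>         // ... while the byte-wise unsigned order expected for binary data
>         // puts 0x7F before 0x80.
>         System.out.println(Byte.toUnsignedInt(a) < Byte.toUnsignedInt(b)); // true
>     }
> }
> {code}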



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21344) BinaryType comparison does signed byte array comparison

2017-07-16 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21344.
-
   Resolution: Fixed
Fix Version/s: 2.2.1
   2.1.2
   2.0.3

> BinaryType comparison does signed byte array comparison
> ---
>
> Key: SPARK-21344
> URL: https://issues.apache.org/jira/browse/SPARK-21344
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.1
>Reporter: Shubham Chopra
>Assignee: Kazuaki Ishizaki
> Fix For: 2.0.3, 2.1.2, 2.2.1
>
>
> BinaryType used by Spark SQL defines ordering using signed byte comparisons. 
> This can lead to unexpected behavior. Consider the following code snippet 
> that shows this error:
> {code}
> import org.apache.spark.sql.functions.col
>
> case class TestRecord(col0: Array[Byte])
>
> def convertToBytes(i: Long): Array[Byte] = {
>   // Big-endian 8-byte encoding of the timestamp.
>   val bb = java.nio.ByteBuffer.allocate(8)
>   bb.putLong(i)
>   bb.array
> }
>
> def test = {
>   val sql = spark.sqlContext
>   import sql.implicits._
>   val timestamp = 1498772083037L
>   val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
>   val testDF = sc.parallelize(data).toDF
>   val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) &&
>     col("col0") < convertToBytes(timestamp + 50L))
>   val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) &&
>     col("col0") < convertToBytes(timestamp + 100L))
>   val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) &&
>     col("col0") < convertToBytes(timestamp + 100L))
>   assert(filter1.count == 50)
>   assert(filter2.count == 50)
>   assert(filter3.count == 100)
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


