[jira] [Commented] (SPARK-13333) DataFrame filter + randn + unionAll has bad interaction
[ https://issues.apache.org/jira/browse/SPARK-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050776#comment-16050776 ]

Xiao Li commented on SPARK-13333:
---------------------------------

This function is still missing in the SQL interface. We can achieve resolution by name using the CORRESPONDING BY clause. For example:

{noformat}
(select * from t1) union corresponding by (c1, c2) (select * from t2);
{noformat}

> DataFrame filter + randn + unionAll has bad interaction
> --------------------------------------------------------
>
>                 Key: SPARK-13333
>                 URL: https://issues.apache.org/jira/browse/SPARK-13333
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.2, 1.6.1, 2.0.0
>            Reporter: Joseph K. Bradley
>
> Buggy workflow:
> * Create a DataFrame df0
> * Filter df0
> * Add a randn column
> * Create a copy of the DataFrame
> * unionAll the two DataFrames
>
> This fails: randn produces the same results on the original DataFrame and the copy before unionAll, but produces different results after unionAll. Removing the filter fixes the problem.
> The bug can be reproduced on master:
> {code}
> import org.apache.spark.sql.functions.randn
> val df0 = sqlContext.createDataFrame(Seq(0, 1).map(Tuple1(_))).toDF("id")
> // Removing the following filter() call makes this give the expected result.
> val df1 = df0.filter(col("id") === 0).withColumn("b", randn(12345))
> println("DF1")
> df1.show()
> val df2 = df1.select("id", "b")
> println("DF2")
> df2.show() // same as df1.show(), as expected
> val df3 = df1.unionAll(df2)
> println("DF3")
> df3.show() // NOT two copies of df1, which is unexpected
> {code}
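On the DataFrame side, Spark 2.3+ offers {{unionByName}}, which resolves columns by name rather than by position, a rough analogue of CORRESPONDING BY. A minimal sketch (the column names are illustrative):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

// Two DataFrames with the same columns in different order.
val t1 = Seq((1, "a"), (2, "b")).toDF("c1", "c2")
val t2 = Seq(("c", 3), ("d", 4)).toDF("c2", "c1")

// union() pairs columns by position; unionByName() pairs them by name,
// which is what CORRESPONDING BY provides in SQL.
t1.unionByName(t2).show()
{code}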
[jira] [Created] (SPARK-21111) Fix test failure in 2.2
Xiao Li created SPARK-21111:
----------------------------

             Summary: Fix test failure in 2.2
                 Key: SPARK-21111
                 URL: https://issues.apache.org/jira/browse/SPARK-21111
             Project: Spark
          Issue Type: Test
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Xiao Li
            Assignee: Xiao Li
            Priority: Blocker

Test failure:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/
[jira] [Created] (SPARK-21112) ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
Xiao Li created SPARK-21112:
----------------------------

             Summary: ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
                 Key: SPARK-21112
                 URL: https://issues.apache.org/jira/browse/SPARK-21112
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Xiao Li
            Assignee: Xiao Li

{{ALTER TABLE SET TBLPROPERTIES}} should not overwrite the COMMENT, even if the input properties do not include `COMMENT`.
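A minimal sketch of the intended behavior (the table name and property key are illustrative): setting an unrelated property must leave the comment intact.

{code}
// Sketch: SET TBLPROPERTIES on an unrelated key should not clobber the comment.
spark.sql("CREATE TABLE t1 (id INT) USING parquet COMMENT 'my comment'")
spark.sql("ALTER TABLE t1 SET TBLPROPERTIES ('k' = 'v')")
// The table comment should still be reported after the ALTER above.
spark.sql("DESC EXTENDED t1").show(100, false)
{code}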
[jira] [Updated] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch
[ https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21114:
----------------------------
    Description: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
  (was: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/)

> Test failure fix in Spark 2.1 due to name mismatch
> ---------------------------------------------------
>
>                 Key: SPARK-21114
>                 URL: https://issues.apache.org/jira/browse/SPARK-21114
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
[jira] [Created] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch
Xiao Li created SPARK-21114:
----------------------------

             Summary: Test failure fix in Spark 2.1 due to name mismatch
                 Key: SPARK-21114
                 URL: https://issues.apache.org/jira/browse/SPARK-21114
             Project: Spark
          Issue Type: Test
          Components: SQL
    Affects Versions: 2.1.1
            Reporter: Xiao Li
            Assignee: Xiao Li

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/
[jira] [Updated] (SPARK-21114) Test failure in Spark 2.1 due to name mismatch
[ https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21114:
----------------------------
    Summary: Test failure in Spark 2.1 due to name mismatch  (was: Test failure fix in Spark 2.1 due to name mismatch)

> Test failure in Spark 2.1 due to name mismatch
> -----------------------------------------------
>
>                 Key: SPARK-21114
>                 URL: https://issues.apache.org/jira/browse/SPARK-21114
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
[jira] [Updated] (SPARK-21114) Test failure in Spark 2.1 due to name mismatch
[ https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21114:
----------------------------
    Affects Version/s: 2.0.2

> Test failure in Spark 2.1 due to name mismatch
> -----------------------------------------------
>
>                 Key: SPARK-21114
>                 URL: https://issues.apache.org/jira/browse/SPARK-21114
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.1
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
[jira] [Updated] (SPARK-21114) Test failure in Spark 2.1 due to name mismatch
[ https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21114:
----------------------------
    Description: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/

  was:https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

> Test failure in Spark 2.1 due to name mismatch
> -----------------------------------------------
>
>                 Key: SPARK-21114
>                 URL: https://issues.apache.org/jira/browse/SPARK-21114
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.1
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/
[jira] [Updated] (SPARK-21114) Test failure in Spark 2.1 due to name mismatch
[ https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21114:
----------------------------
    Description: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

  was:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/

> Test failure in Spark 2.1 due to name mismatch
> -----------------------------------------------
>
>                 Key: SPARK-21114
>                 URL: https://issues.apache.org/jira/browse/SPARK-21114
>             Project: Spark
>          Issue Type: Test
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.1
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/
[jira] [Assigned] (SPARK-20749) Built-in SQL Function Support - all variants of LEN[GTH]
[ https://issues.apache.org/jira/browse/SPARK-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li reassigned SPARK-20749:
-------------------------------
    Assignee: Kazuaki Ishizaki

> Built-in SQL Function Support - all variants of LEN[GTH]
> ----------------------------------------------------------
>
>                 Key: SPARK-20749
>                 URL: https://issues.apache.org/jira/browse/SPARK-20749
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Kazuaki Ishizaki
>              Labels: starter
>             Fix For: 2.3.0
>
> {noformat}
> LEN[GTH]()
> {noformat}
> The SQL 99 standard includes BIT_LENGTH(), CHAR_LENGTH(), and OCTET_LENGTH() functions.
> We need to support all of them.
[jira] [Resolved] (SPARK-20749) Built-in SQL Function Support - all variants of LEN[GTH]
[ https://issues.apache.org/jira/browse/SPARK-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-20749.
-----------------------------
    Resolution: Fixed
 Fix Version/s: 2.3.0

> Built-in SQL Function Support - all variants of LEN[GTH]
> ----------------------------------------------------------
>
>                 Key: SPARK-20749
>                 URL: https://issues.apache.org/jira/browse/SPARK-20749
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Kazuaki Ishizaki
>              Labels: starter
>             Fix For: 2.3.0
>
> {noformat}
> LEN[GTH]()
> {noformat}
> The SQL 99 standard includes BIT_LENGTH(), CHAR_LENGTH(), and OCTET_LENGTH() functions.
> We need to support all of them.
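A usage sketch of those variants as they landed in Spark 2.3 (the literal is illustrative): CHAR_LENGTH counts characters, OCTET_LENGTH counts bytes, and BIT_LENGTH counts bits.

{code}
// For ASCII input, octet_length equals char_length and bit_length is 8x that.
spark.sql("SELECT char_length('Spark'), octet_length('Spark'), bit_length('Spark')").show()
// Expected: 5, 5, 40
{code}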
[jira] [Commented] (SPARK-20752) Built-in SQL Function Support - SQRT
[ https://issues.apache.org/jira/browse/SPARK-20752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051441#comment-16051441 ]

Xiao Li commented on SPARK-20752:
---------------------------------

Yes!

> Built-in SQL Function Support - SQRT
> -------------------------------------
>
>                 Key: SPARK-20752
>                 URL: https://issues.apache.org/jira/browse/SPARK-20752
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>              Labels: starter
>
> {noformat}
> SQRT(<num>)
> {noformat}
> Returns Power(<num>, 0.5)
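A quick usage sketch (the literal is illustrative); sqrt(x) is equivalent to power(x, 0.5):

{code}
// sqrt() already exists as a Spark SQL built-in, hence the duplicate resolution below.
spark.sql("SELECT sqrt(4.0), power(4.0, 0.5)").show()
// Expected: 2.0 in both columns
{code}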
[jira] [Resolved] (SPARK-20750) Built-in SQL Function Support - REPLACE
[ https://issues.apache.org/jira/browse/SPARK-20750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-20750.
-----------------------------
    Resolution: Fixed
      Assignee: Kazuaki Ishizaki
 Fix Version/s: 2.3.0

> Built-in SQL Function Support - REPLACE
> ----------------------------------------
>
>                 Key: SPARK-20750
>                 URL: https://issues.apache.org/jira/browse/SPARK-20750
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Kazuaki Ishizaki
>              Labels: starter
>             Fix For: 2.3.0
>
> {noformat}
> REPLACE(<str>, <search> [, <replace>])
> {noformat}
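A usage sketch of the function as it landed in Spark 2.3 (the literals are illustrative); when the third argument is omitted, matches of the search string are removed:

{code}
// Replaces every occurrence of the search string.
spark.sql("SELECT replace('ABCabc', 'abc', 'DEF')").show()  // ABCDEF
// With the replacement omitted, occurrences are deleted.
spark.sql("SELECT replace('ABCabc', 'abc')").show()         // ABC
{code}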
[jira] [Closed] (SPARK-20752) Built-in SQL Function Support - SQRT
[ https://issues.apache.org/jira/browse/SPARK-20752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li closed SPARK-20752.
---------------------------
    Resolution: Duplicate

> Built-in SQL Function Support - SQRT
> -------------------------------------
>
>                 Key: SPARK-20752
>                 URL: https://issues.apache.org/jira/browse/SPARK-20752
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>              Labels: starter
>
> {noformat}
> SQRT(<num>)
> {noformat}
> Returns Power(<num>, 0.5)
[jira] [Resolved] (SPARK-21119) unset table properties should keep the table comment
[ https://issues.apache.org/jira/browse/SPARK-21119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-21119.
-----------------------------
    Resolution: Fixed
 Fix Version/s: 2.3.0

> unset table properties should keep the table comment
> ------------------------------------------------------
>
>                 Key: SPARK-21119
>                 URL: https://issues.apache.org/jira/browse/SPARK-21119
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>             Fix For: 2.3.0
>
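This is the UNSET counterpart of SPARK-21112 above. A minimal sketch of the intended behavior (the table name and property key are illustrative):

{code}
// Unsetting an unrelated property should not drop the table comment.
spark.sql("CREATE TABLE t2 (id INT) USING parquet COMMENT 'still here' TBLPROPERTIES ('k' = 'v')")
spark.sql("ALTER TABLE t2 UNSET TBLPROPERTIES ('k')")
// The comment 'still here' should survive the UNSET above.
spark.sql("DESC EXTENDED t2").show(100, false)
{code}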
[jira] [Resolved] (SPARK-21089) Table properties are not shown in DESC EXTENDED/FORMATTED
[ https://issues.apache.org/jira/browse/SPARK-21089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-21089.
-----------------------------
    Resolution: Fixed

> Table properties are not shown in DESC EXTENDED/FORMATTED
> -----------------------------------------------------------
>
>                 Key: SPARK-21089
>                 URL: https://issues.apache.org/jira/browse/SPARK-21089
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>            Priority: Critical
>
> Since both table properties and storage properties share the same key values, table properties are not shown in the output of DESC EXTENDED/FORMATTED when the storage properties are not empty.
[jira] [Updated] (SPARK-21089) Table properties are not shown in DESC EXTENDED/FORMATTED
[ https://issues.apache.org/jira/browse/SPARK-21089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21089:
----------------------------
    Fix Version/s: 2.2.0

> Table properties are not shown in DESC EXTENDED/FORMATTED
> -----------------------------------------------------------
>
>                 Key: SPARK-21089
>                 URL: https://issues.apache.org/jira/browse/SPARK-21089
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>            Priority: Critical
>             Fix For: 2.2.0
>
> Since both table properties and storage properties share the same key values, table properties are not shown in the output of DESC EXTENDED/FORMATTED when the storage properties are not empty.
[jira] [Created] (SPARK-21129) Arguments of SQL function call should not be named expressions
Xiao Li created SPARK-21129:
----------------------------

             Summary: Arguments of SQL function call should not be named expressions
                 Key: SPARK-21129
                 URL: https://issues.apache.org/jira/browse/SPARK-21129
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.1, 2.0.2, 2.2.0
            Reporter: Xiao Li
            Assignee: Xiao Li

Function arguments should not be named expressions. They could cause misleading error messages.

{noformat}
spark-sql> select count(distinct c1, distinct c2) from t1;
{noformat}

{noformat}
Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; line 1 pos 26;
'Project [unresolvedalias('count(c1#30, 'distinct), None)]
+- SubqueryAlias t1
   +- CatalogRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
{noformat}
[jira] [Created] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored
Xiao Li created SPARK-21132:
----------------------------

             Summary: DISTINCT modifier of function arguments should not be silently ignored
                 Key: SPARK-21132
                 URL: https://issues.apache.org/jira/browse/SPARK-21132
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Xiao Li
            Assignee: Xiao Li

The DISTINCT modifier of function arguments should not be silently ignored when it is not supported.
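A hypothetical illustration (the table, column, and choice of function are made up): DISTINCT is meaningless for a scalar function, so a query like the one below should fail analysis rather than silently drop the modifier.

{code}
// upper() is a scalar function; DISTINCT has no effect here.
// The intended behavior is an analysis error, not silent acceptance.
spark.sql("SELECT upper(DISTINCT c1) FROM t1").show()
{code}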
[jira] [Updated] (SPARK-21129) Arguments of SQL function call should not be named expressions
[ https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21129:
----------------------------
    Affects Version/s:     (was: 2.1.1)
                           (was: 2.0.2)

> Arguments of SQL function call should not be named expressions
> ----------------------------------------------------------------
>
>                 Key: SPARK-21129
>                 URL: https://issues.apache.org/jira/browse/SPARK-21129
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> Function arguments should not be named expressions. They could cause misleading error messages.
> {noformat}
> spark-sql> select count(distinct c1, distinct c2) from t1;
> {noformat}
> {noformat}
> Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; line 1 pos 26;
> 'Project [unresolvedalias('count(c1#30, 'distinct), None)]
> +- SubqueryAlias t1
>    +- CatalogRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
> {noformat}
[jira] [Updated] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored
[ https://issues.apache.org/jira/browse/SPARK-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21132:
----------------------------
    Affects Version/s: 2.0.2
                       2.1.1

> DISTINCT modifier of function arguments should not be silently ignored
> ------------------------------------------------------------------------
>
>                 Key: SPARK-21132
>                 URL: https://issues.apache.org/jira/browse/SPARK-21132
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.1, 2.2.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> The DISTINCT modifier of function arguments should not be silently ignored when it is not supported.
[jira] [Resolved] (SPARK-20948) Built-in SQL Function UnaryMinus/UnaryPositive support string type
[ https://issues.apache.org/jira/browse/SPARK-20948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-20948.
-----------------------------
    Resolution: Fixed
      Assignee: Yuming Wang
 Fix Version/s: 2.3.0

> Built-in SQL Function UnaryMinus/UnaryPositive support string type
> --------------------------------------------------------------------
>
>                 Key: SPARK-20948
>                 URL: https://issues.apache.org/jira/browse/SPARK-20948
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>             Fix For: 2.3.0
>
> The {{UnaryMinus}}/{{UnaryPositive}} functions should support string type, same as Hive:
> {code:sql}
> $ bin/hive
> Logging initialized using configuration in jar:file:/home/wym/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
> hive> select positive('-1.11'), negative('-1.11');
> OK
> -1.11    1.11
> Time taken: 1.937 seconds, Fetched: 1 row(s)
> hive>
> {code}
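A sketch of the matching Spark behavior once this change landed in 2.3 (the literal mirrors the Hive session quoted above):

{code}
// positive()/negative() on a numeric string should cast and negate,
// mirroring the Hive output quoted in the issue.
spark.sql("SELECT positive('-1.11'), negative('-1.11')").show()
// Expected: -1.11 and 1.11
{code}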
[jira] [Resolved] (SPARK-19824) Standalone master JSON not showing cores for running applications
[ https://issues.apache.org/jira/browse/SPARK-19824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-19824.
-----------------------------
       Resolution: Fixed
         Assignee: Jiang Xingbo
    Fix Version/s: 2.3.0
 Target Version/s: 2.3.0

> Standalone master JSON not showing cores for running applications
> -------------------------------------------------------------------
>
>                 Key: SPARK-19824
>                 URL: https://issues.apache.org/jira/browse/SPARK-19824
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 2.1.0
>            Reporter: Dan
>            Assignee: Jiang Xingbo
>            Priority: Minor
>             Fix For: 2.3.0
>
> The JSON API of the standalone master ("/json") does not show the number of cores for a running application, which is available on the UI.
> "activeapps" : [ {
>   "starttime" : 1488702337788,
>   "id" : "app-20170305102537-19717",
>   "name" : "POPAI_Aggregated",
>   "user" : "ibiuser",
>   "memoryperslave" : 16384,
>   "submitdate" : "Sun Mar 05 10:25:37 IST 2017",
>   "state" : "RUNNING",
>   "duration" : 1141934
> } ],
[jira] [Resolved] (SPARK-19975) Add map_keys and map_values functions to Python
[ https://issues.apache.org/jira/browse/SPARK-19975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-19975.
-----------------------------
    Resolution: Fixed
      Assignee: Yong Tang
 Fix Version/s: 2.3.0

> Add map_keys and map_values functions to Python
> ------------------------------------------------
>
>                 Key: SPARK-19975
>                 URL: https://issues.apache.org/jira/browse/SPARK-19975
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 2.1.0
>            Reporter: Maciej Bryński
>            Assignee: Yong Tang
>             Fix For: 2.3.0
>
> We have `map_keys` and `map_values` functions in SQL.
> There are no Python equivalents for them.
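For reference, a sketch of the SQL functions being exposed (the map literal is illustrative):

{code}
// map_keys/map_values extract the keys and values of a map column.
spark.sql("SELECT map_keys(map(1, 'a', 2, 'b')), map_values(map(1, 'a', 2, 'b'))").show()
// Expected: [1, 2] and [a, b]
{code}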
[jira] [Created] (SPARK-21144) Unexpected results when the data schema and partition schema have duplicate columns
Xiao Li created SPARK-21144:
----------------------------

             Summary: Unexpected results when the data schema and partition schema have duplicate columns
                 Key: SPARK-21144
                 URL: https://issues.apache.org/jira/browse/SPARK-21144
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Xiao Li

{noformat}
withTempPath { dir =>
  val basePath = dir.getCanonicalPath
  spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, "foo=1").toString)
  spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, "foo=a").toString)
  spark.read.parquet(basePath).show()
}
{noformat}
The result of the above case is
{noformat}
+---+
|foo|
+---+
|  1|
|  1|
|  a|
|  a|
|  1|
|  a|
+---+
{noformat}
[jira] [Commented] (SPARK-21144) Unexpected results when the data schema and partition schema have duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054788#comment-16054788 ]

Xiao Li commented on SPARK-21144:
---------------------------------

cc [~maropu]

> Unexpected results when the data schema and partition schema have duplicate columns
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-21144
>                 URL: https://issues.apache.org/jira/browse/SPARK-21144
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>
> {noformat}
> withTempPath { dir =>
>   val basePath = dir.getCanonicalPath
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, "foo=1").toString)
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, "foo=a").toString)
>   spark.read.parquet(basePath).show()
> }
> {noformat}
> The result of the above case is
> {noformat}
> +---+
> |foo|
> +---+
> |  1|
> |  1|
> |  a|
> |  a|
> |  1|
> |  a|
> +---+
> {noformat}
[jira] [Updated] (SPARK-21144) Unexpected results when the data schema and partition schema have duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21144:
----------------------------
    Target Version/s: 2.2.0

> Unexpected results when the data schema and partition schema have duplicate columns
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-21144
>                 URL: https://issues.apache.org/jira/browse/SPARK-21144
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>
> {noformat}
> withTempPath { dir =>
>   val basePath = dir.getCanonicalPath
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, "foo=1").toString)
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, "foo=a").toString)
>   spark.read.parquet(basePath).show()
> }
> {noformat}
> The result of the above case is
> {noformat}
> +---+
> |foo|
> +---+
> |  1|
> |  1|
> |  a|
> |  a|
> |  1|
> |  a|
> +---+
> {noformat}
[jira] [Created] (SPARK-21150) Persistent view stored in Hive metastore should be case preserving.
Xiao Li created SPARK-21150:
----------------------------

             Summary: Persistent view stored in Hive metastore should be case preserving.
                 Key: SPARK-21150
                 URL: https://issues.apache.org/jira/browse/SPARK-21150
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Xiao Li

{noformat}
withView("view1") {
  spark.sql("CREATE VIEW view1 AS SELECT 1 AS cAsEpReSeRvE, 2 AS aBcD")
  val metadata = new MetadataBuilder().putString(types.HIVE_TYPE_STRING, "int").build()
  val expectedSchema = StructType(List(
    StructField("cAsEpReSeRvE", IntegerType, nullable = false, metadata),
    StructField("aBcD", IntegerType, nullable = false, metadata)))
  assert(spark.table("view1").schema == expectedSchema, "Schema should match")
  checkAnswer(
    sql("select aBcD, cAsEpReSeRvE from view1"),
    Row(2, 1))
}
{noformat}
The column names of a persistent view stored in the Hive metastore should be case preserving.
[jira] [Assigned] (SPARK-21150) Persistent view stored in Hive metastore should be case preserving.
[ https://issues.apache.org/jira/browse/SPARK-21150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li reassigned SPARK-21150:
-------------------------------
            Assignee: Jiang Xingbo
    Target Version/s: 2.2.0
            Priority: Blocker  (was: Major)

> Persistent view stored in Hive metastore should be case preserving.
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21150
>                 URL: https://issues.apache.org/jira/browse/SPARK-21150
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Jiang Xingbo
>            Priority: Blocker
>
> {noformat}
> withView("view1") {
>   spark.sql("CREATE VIEW view1 AS SELECT 1 AS cAsEpReSeRvE, 2 AS aBcD")
>   val metadata = new MetadataBuilder().putString(types.HIVE_TYPE_STRING, "int").build()
>   val expectedSchema = StructType(List(
>     StructField("cAsEpReSeRvE", IntegerType, nullable = false, metadata),
>     StructField("aBcD", IntegerType, nullable = false, metadata)))
>   assert(spark.table("view1").schema == expectedSchema, "Schema should match")
>   checkAnswer(
>     sql("select aBcD, cAsEpReSeRvE from view1"),
>     Row(2, 1))
> }
> {noformat}
> The column names of a persistent view stored in the Hive metastore should be case preserving.
[jira] [Assigned] (SPARK-21150) Persistent view stored in Hive metastore should be case preserving.
[ https://issues.apache.org/jira/browse/SPARK-21150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li reassigned SPARK-21150:
-------------------------------
    Assignee:     (was: Jiang Xingbo)

> Persistent view stored in Hive metastore should be case preserving.
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21150
>                 URL: https://issues.apache.org/jira/browse/SPARK-21150
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Priority: Blocker
>
> {noformat}
> withView("view1") {
>   spark.sql("CREATE VIEW view1 AS SELECT 1 AS cAsEpReSeRvE, 2 AS aBcD")
>   val metadata = new MetadataBuilder().putString(types.HIVE_TYPE_STRING, "int").build()
>   val expectedSchema = StructType(List(
>     StructField("cAsEpReSeRvE", IntegerType, nullable = false, metadata),
>     StructField("aBcD", IntegerType, nullable = false, metadata)))
>   assert(spark.table("view1").schema == expectedSchema, "Schema should match")
>   checkAnswer(
>     sql("select aBcD, cAsEpReSeRvE from view1"),
>     Row(2, 1))
> }
> {noformat}
> The column names of a persistent view stored in the Hive metastore should be case preserving.
[jira] [Resolved] (SPARK-21150) Persistent view stored in Hive metastore should be case preserving.
[ https://issues.apache.org/jira/browse/SPARK-21150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-21150.
-----------------------------
    Resolution: Fixed
      Assignee: Wenchen Fan
 Fix Version/s: 2.2.0

> Persistent view stored in Hive metastore should be case preserving.
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21150
>                 URL: https://issues.apache.org/jira/browse/SPARK-21150
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Wenchen Fan
>            Priority: Blocker
>             Fix For: 2.2.0
>
> {noformat}
> withView("view1") {
>   spark.sql("CREATE VIEW view1 AS SELECT 1 AS cAsEpReSeRvE, 2 AS aBcD")
>   val metadata = new MetadataBuilder().putString(types.HIVE_TYPE_STRING, "int").build()
>   val expectedSchema = StructType(List(
>     StructField("cAsEpReSeRvE", IntegerType, nullable = false, metadata),
>     StructField("aBcD", IntegerType, nullable = false, metadata)))
>   assert(spark.table("view1").schema == expectedSchema, "Schema should match")
>   checkAnswer(
>     sql("select aBcD, cAsEpReSeRvE from view1"),
>     Row(2, 1))
> }
> {noformat}
> The column names of a persistent view stored in the Hive metastore should be case preserving.
[jira] [Assigned] (SPARK-10655) Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
[ https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li reassigned SPARK-10655:
-------------------------------
    Assignee: Suresh Thalamati

> Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
> ----------------------------------------------------------
>
>                 Key: SPARK-10655
>                 URL: https://issues.apache.org/jira/browse/SPARK-10655
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Suresh Thalamati
>            Assignee: Suresh Thalamati
>             Fix For: 2.3.0
>
> Default type mapping does not work for DB2 tables: the XML and DECFLOAT types fail on read, and the DECIMAL type fails on write.
[jira] [Resolved] (SPARK-10655) Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
[ https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-10655.
-----------------------------
    Resolution: Fixed
 Fix Version/s: 2.3.0

> Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
> ----------------------------------------------------------
>
>                 Key: SPARK-10655
>                 URL: https://issues.apache.org/jira/browse/SPARK-10655
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Suresh Thalamati
>            Assignee: Suresh Thalamati
>             Fix For: 2.3.0
>
> Default type mapping does not work for DB2 tables: the XML and DECFLOAT types fail on read, and the DECIMAL type fails on write.
[jira] [Resolved] (SPARK-17851) Make sure all test SQLs in catalyst pass checkAnalysis
[ https://issues.apache.org/jira/browse/SPARK-17851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-17851.
-----------------------------
    Resolution: Fixed
      Assignee: Jiang Xingbo
 Fix Version/s: 2.3.0

> Make sure all test SQLs in catalyst pass checkAnalysis
> --------------------------------------------------------
>
>                 Key: SPARK-17851
>                 URL: https://issues.apache.org/jira/browse/SPARK-17851
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Jiang Xingbo
>            Assignee: Jiang Xingbo
>            Priority: Minor
>             Fix For: 2.3.0
>
> Currently, several tens of test SQLs in catalyst fail at `SimpleAnalyzer.checkAnalysis`; we should make sure they are valid.
[jira] [Created] (SPARK-21164) Remove isTableSample from Sample
Xiao Li created SPARK-21164:
----------------------------

             Summary: Remove isTableSample from Sample
                 Key: SPARK-21164
                 URL: https://issues.apache.org/jira/browse/SPARK-21164
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Xiao Li
            Assignee: Xiao Li

{{isTableSample}} was introduced for SQL Generation. Since SQL Generation is removed, we do not need to keep {{isTableSample}}.
[jira] [Commented] (SPARK-21165) Fail to write into partitioned hive table due to attribute reference not working with cast on partition column
[ https://issues.apache.org/jira/browse/SPARK-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058440#comment-16058440 ]

Xiao Li commented on SPARK-21165:
---------------------------------

Unable to reproduce it in the current master branch. Will try to use 2.2 RC5 later.

> Fail to write into partitioned hive table due to attribute reference not working with cast on partition column
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21165
>                 URL: https://issues.apache.org/jira/browse/SPARK-21165
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Imran Rashid
>            Priority: Blocker
>
> A simple "insert into ... select" involving partitioned hive tables fails. Here's a simpler repro which doesn't involve hive at all -- this succeeds on 2.1.1, but fails on 2.2.0-rc5:
> {noformat}
> spark.sql("""SET hive.exec.dynamic.partition.mode=nonstrict""")
> spark.sql("""DROP TABLE IF EXISTS src""")
> spark.sql("""DROP TABLE IF EXISTS dest""")
> spark.sql("""
> CREATE TABLE src (first string, word string)
>   PARTITIONED BY (length int)
> """)
> spark.sql("""
> INSERT INTO src PARTITION(length) VALUES
> ('a', 'abc', 3),
> ('b', 'bcde', 4),
> ('c', 'cdefg', 5)
> """)
> spark.sql("""
> CREATE TABLE dest (word string, length int)
>   PARTITIONED BY (first string)
> """)
> spark.sql("""
> INSERT INTO TABLE dest PARTITION(first) SELECT word, length, cast(first as string) as first FROM src
> """)
> {noformat}
> The exception is
> {noformat}
> 17/06/21 14:25:53 WARN TaskSetManager: Lost task 1.0 in stage 4.0 (TID 10, localhost, executor driver): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: first#74
>         at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>         at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:49)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:43)
>         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:884)
>         at org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:363)
>         at org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:63)
>         at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
>         at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec
[jira] [Commented] (SPARK-21165) Fail to write into partitioned hive table due to attribute reference not working with cast on partition column
[ https://issues.apache.org/jira/browse/SPARK-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058490#comment-16058490 ]

Xiao Li commented on SPARK-21165:
---------------------------------

2.2 branch failed with the same error.

> Fail to write into partitioned hive table due to attribute reference not working with cast on partition column
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21165
>                 URL: https://issues.apache.org/jira/browse/SPARK-21165
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Imran Rashid
>            Priority: Blocker
>
> A simple "insert into ... select" involving partitioned hive tables fails. Here's a simpler repro which doesn't involve hive at all -- this succeeds on 2.1.1, but fails on 2.2.0-rc5:
> {noformat}
> spark.sql("""SET hive.exec.dynamic.partition.mode=nonstrict""")
> spark.sql("""DROP TABLE IF EXISTS src""")
> spark.sql("""DROP TABLE IF EXISTS dest""")
> spark.sql("""
> CREATE TABLE src (first string, word string)
>   PARTITIONED BY (length int)
> """)
> spark.sql("""
> INSERT INTO src PARTITION(length) VALUES
> ('a', 'abc', 3),
> ('b', 'bcde', 4),
> ('c', 'cdefg', 5)
> """)
> spark.sql("""
> CREATE TABLE dest (word string, length int)
>   PARTITIONED BY (first string)
> """)
> spark.sql("""
> INSERT INTO TABLE dest PARTITION(first) SELECT word, length, cast(first as string) as first FROM src
> """)
> {noformat}
> The exception is
> {noformat}
> 17/06/21 14:25:53 WARN TaskSetManager: Lost task 1.0 in stage 4.0 (TID 10, localhost, executor driver): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: first#74
>         at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>         at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:49)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:43)
>         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:884)
>         at org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:363)
>         at org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:63)
>         at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
>         at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
>         at org.apache.spark.
[jira] [Assigned] (SPARK-21165) Fail to write into partitioned hive table due to attribute reference not working with cast on partition column
[ https://issues.apache.org/jira/browse/SPARK-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li reassigned SPARK-21165:
-------------------------------
    Assignee: Xiao Li

> Fail to write into partitioned hive table due to attribute reference not working with cast on partition column
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21165
>                 URL: https://issues.apache.org/jira/browse/SPARK-21165
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Imran Rashid
>            Assignee: Xiao Li
>            Priority: Blocker
>
> A simple "insert into ... select" involving partitioned hive tables fails. Here's a simpler repro which doesn't involve hive at all -- this succeeds on 2.1.1, but fails on 2.2.0-rc5:
> {noformat}
> spark.sql("""SET hive.exec.dynamic.partition.mode=nonstrict""")
> spark.sql("""DROP TABLE IF EXISTS src""")
> spark.sql("""DROP TABLE IF EXISTS dest""")
> spark.sql("""
> CREATE TABLE src (first string, word string)
>   PARTITIONED BY (length int)
> """)
> spark.sql("""
> INSERT INTO src PARTITION(length) VALUES
> ('a', 'abc', 3),
> ('b', 'bcde', 4),
> ('c', 'cdefg', 5)
> """)
> spark.sql("""
> CREATE TABLE dest (word string, length int)
>   PARTITIONED BY (first string)
> """)
> spark.sql("""
> INSERT INTO TABLE dest PARTITION(first) SELECT word, length, cast(first as string) as first FROM src
> """)
> {noformat}
> The exception is
> {noformat}
> 17/06/21 14:25:53 WARN TaskSetManager: Lost task 1.0 in stage 4.0 (TID 10, localhost, executor driver): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: first#74
>         at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>         at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
>         at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$bind$1.apply(GenerateOrdering.scala:49)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:49)
>         at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.bind(GenerateOrdering.scala:43)
>         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:884)
>         at org.apache.spark.sql.execution.SparkPlan.newOrdering(SparkPlan.scala:363)
>         at org.apache.spark.sql.execution.SortExec.createSorter(SortExec.scala:63)
>         at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:102)
>         at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:101)
>         at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInterna
[jira] [Updated] (SPARK-21174) Validate sampling fraction at the logical operator level
[ https://issues.apache.org/jira/browse/SPARK-21174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21174:
----------------------------
    Component/s:     (was: Optimizer)
                 SQL

> Validate sampling fraction at the logical operator level
> ----------------------------------------------------------
>
>                 Key: SPARK-21174
>                 URL: https://issues.apache.org/jira/browse/SPARK-21174
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Gengliang Wang
>            Priority: Minor
>
> Currently, the validation of the sampling fraction in Dataset is incomplete.
> As an improvement, validate the sampling ratio at the logical operator level:
> 1) if with replacement: the ratio should be nonnegative
> 2) else: the ratio should be on the interval [0, 1]
> Also add test cases for the validation.
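A sketch of the validation rule described above (the function name and error style are illustrative, not the actual Spark code):

{code}
// Illustrative check, not Spark's implementation:
// with replacement the fraction may exceed 1.0 (rows can repeat),
// without replacement it must be a probability in [0, 1].
def validateFraction(fraction: Double, withReplacement: Boolean): Unit = {
  if (withReplacement) {
    require(fraction >= 0.0,
      s"Sampling fraction ($fraction) must be nonnegative with replacement")
  } else {
    require(fraction >= 0.0 && fraction <= 1.0,
      s"Sampling fraction ($fraction) must be on interval [0, 1] without replacement")
  }
}
{code}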
[jira] [Resolved] (SPARK-21144) Unexpected results when the data schema and partition schema have duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-21144.
-----------------------------
    Resolution: Fixed
      Assignee: Takeshi Yamamuro
 Fix Version/s: 2.2.0

> Unexpected results when the data schema and partition schema have duplicate columns
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-21144
>                 URL: https://issues.apache.org/jira/browse/SPARK-21144
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Takeshi Yamamuro
>             Fix For: 2.2.0
>
> {noformat}
> withTempPath { dir =>
>   val basePath = dir.getCanonicalPath
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, "foo=1").toString)
>   spark.range(0, 3).toDF("foo").write.parquet(new Path(basePath, "foo=a").toString)
>   spark.read.parquet(basePath).show()
> }
> {noformat}
> The result of the above case is
> {noformat}
> +---+
> |foo|
> +---+
> |  1|
> |  1|
> |  a|
> |  a|
> |  1|
> |  a|
> +---+
> {noformat}
[jira] [Updated] (SPARK-21164) Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
[ https://issues.apache.org/jira/browse/SPARK-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21164:
----------------------------
    Summary: Remove isTableSample from Sample and isGenerated from Alias and AttributeReference  (was: Remove isTableSample from Sample)

> Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21164
>                 URL: https://issues.apache.org/jira/browse/SPARK-21164
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> {{isTableSample}} was introduced for SQL Generation. Since SQL Generation is removed, we do not need to keep {{isTableSample}}.
[jira] [Updated] (SPARK-21164) Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
[ https://issues.apache.org/jira/browse/SPARK-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21164:
----------------------------
    Description: 
isTableSample and isGenerated were introduced for SQL Generation respectively by #11148 and #11050.
Since SQL Generation is removed, we do not need to keep isTableSample.

  was:{{isTableSample}} was introduced for SQL Generation. Since SQL Generation is removed, we do not need to keep {{isTableSample}}.

> Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21164
>                 URL: https://issues.apache.org/jira/browse/SPARK-21164
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> isTableSample and isGenerated were introduced for SQL Generation respectively by #11148 and #11050.
> Since SQL Generation is removed, we do not need to keep isTableSample.
[jira] [Updated] (SPARK-21164) Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
[ https://issues.apache.org/jira/browse/SPARK-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-21164:
----------------------------
    Description: 
isTableSample and isGenerated were introduced for SQL Generation respectively by PR 11148 and PR 11050.
Since SQL Generation is removed, we do not need to keep isTableSample.

  was:
isTableSample and isGenerated were introduced for SQL Generation respectively by #11148 and #11050.
Since SQL Generation is removed, we do not need to keep isTableSample.

> Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21164
>                 URL: https://issues.apache.org/jira/browse/SPARK-21164
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>            Assignee: Xiao Li
>
> isTableSample and isGenerated were introduced for SQL Generation respectively by PR 11148 and PR 11050.
> Since SQL Generation is removed, we do not need to keep isTableSample.
[jira] [Resolved] (SPARK-21180) Remove conf from stats functions since now we have conf in LogicalPlan
[ https://issues.apache.org/jira/browse/SPARK-21180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-21180.
-----------------------------
    Resolution: Fixed
      Assignee: Zhenhua Wang
 Fix Version/s: 2.3.0

> Remove conf from stats functions since now we have conf in LogicalPlan
> ------------------------------------------------------------------------
>
>                 Key: SPARK-21180
>                 URL: https://issues.apache.org/jira/browse/SPARK-21180
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Zhenhua Wang
>            Assignee: Zhenhua Wang
>             Fix For: 2.3.0
>
[jira] [Resolved] (SPARK-20417) Move error reporting for subquery from Analyzer to CheckAnalysis
[ https://issues.apache.org/jira/browse/SPARK-20417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-20417.
-----------------------------
    Resolution: Fixed
      Assignee: Dilip Biswal
 Fix Version/s: 2.3.0

> Move error reporting for subquery from Analyzer to CheckAnalysis
> ------------------------------------------------------------------
>
>                 Key: SPARK-20417
>                 URL: https://issues.apache.org/jira/browse/SPARK-20417
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Dilip Biswal
>            Assignee: Dilip Biswal
>             Fix For: 2.3.0
>
> Currently we do a lot of validations for subquery in the Analyzer. We should move them to CheckAnalysis, which is the framework to catch and report Analysis errors. This was mentioned as a review comment in SPARK-18874.
[jira] [Resolved] (SPARK-21164) Remove isTableSample from Sample and isGenerated from Alias and AttributeReference
[ https://issues.apache.org/jira/browse/SPARK-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21164. - Resolution: Fixed Fix Version/s: 2.3.0 > Remove isTableSample from Sample and isGenerated from Alias and > AttributeReference > -- > > Key: SPARK-21164 > URL: https://issues.apache.org/jira/browse/SPARK-21164 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.3.0 > > > isTableSample and isGenerated were introduced for SQL Generation, respectively > by PR 11148 and PR 11050. > Since SQL Generation is removed, we do not need to keep isTableSample. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21203) Wrong result are inserted by Array of Struct
Xiao Li created SPARK-21203: --- Summary: Wrong result are inserted by Array of Struct Key: SPARK-21203 URL: https://issues.apache.org/jira/browse/SPARK-21203 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.1, 2.2.0 Reporter: Xiao Li Assignee: Xiao Li Priority: Critical {noformat} spark.sql( """ |CREATE TABLE `tab1` |(`custom_fields` ARRAY<STRUCT<`id`: INT, `value`: STRING>>) |USING parquet """.stripMargin) spark.sql( """ |INSERT INTO `tab1` |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b')) """.stripMargin) spark.sql("SELECT custom_fields.id, custom_fields.value FROM tab1").show() {noformat} The returned result is wrong: {noformat} Row(Array(2, 2), Array("b", "b")) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
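For contrast with the wrong output above, the expected result (assuming each array element keeps its own struct values) would be:
{noformat}
Row(Array(1, 2), Array("a", "b"))
{noformat}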
[jira] [Updated] (SPARK-21203) Wrong results are inserted by Array of Struct
[ https://issues.apache.org/jira/browse/SPARK-21203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21203: Summary: Wrong results are inserted by Array of Struct (was: Wrong result are inserted by Array of Struct) > Wrong results are inserted by Array of Struct > - > > Key: SPARK-21203 > URL: https://issues.apache.org/jira/browse/SPARK-21203 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Critical > > {noformat} > spark.sql( > """ > |CREATE TABLE `tab1` > |(`custom_fields` ARRAY<STRUCT<`id`: INT, `value`: STRING>>) > |USING parquet > """.stripMargin) > spark.sql( > """ > |INSERT INTO `tab1` > |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), > named_struct('id', 2, 'value', 'b')) > """.stripMargin) > spark.sql("SELECT custom_fields.id, custom_fields.value FROM > tab1").show() > {noformat} > The returned result is wrong: > {noformat} > Row(Array(2, 2), Array("b", "b")) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21203) Wrong results of insertion of Array of Struct
[ https://issues.apache.org/jira/browse/SPARK-21203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21203: Summary: Wrong results of insertion of Array of Struct (was: Wrong results are inserted by Array of Struct) > Wrong results of insertion of Array of Struct > - > > Key: SPARK-21203 > URL: https://issues.apache.org/jira/browse/SPARK-21203 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Critical > > {noformat} > spark.sql( > """ > |CREATE TABLE `tab1` > |(`custom_fields` ARRAY<STRUCT<`id`: INT, `value`: STRING>>) > |USING parquet > """.stripMargin) > spark.sql( > """ > |INSERT INTO `tab1` > |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), > named_struct('id', 2, 'value', 'b')) > """.stripMargin) > spark.sql("SELECT custom_fields.id, custom_fields.value FROM > tab1").show() > {noformat} > The returned result is wrong: > {noformat} > Row(Array(2, 2), Array("b", "b")) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21203) Wrong results of insertion of Array of Struct
[ https://issues.apache.org/jira/browse/SPARK-21203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21203: Target Version/s: 2.2.0 > Wrong results of insertion of Array of Struct > - > > Key: SPARK-21203 > URL: https://issues.apache.org/jira/browse/SPARK-21203 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Critical > > {noformat} > spark.sql( > """ > |CREATE TABLE `tab1` > |(`custom_fields` ARRAY<STRUCT<`id`: INT, `value`: STRING>>) > |USING parquet > """.stripMargin) > spark.sql( > """ > |INSERT INTO `tab1` > |SELECT ARRAY(named_struct('id', 1, 'value', 'a'), > named_struct('id', 2, 'value', 'b')) > """.stripMargin) > spark.sql("SELECT custom_fields.id, custom_fields.value FROM > tab1").show() > {noformat} > The returned result is wrong: > {noformat} > Row(Array(2, 2), Array("b", "b")) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20555) Incorrect handling of Oracle's decimal types via JDBC
[ https://issues.apache.org/jira/browse/SPARK-20555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20555. - Resolution: Fixed Fix Version/s: 2.2.0 2.1.2 > Incorrect handling of Oracle's decimal types via JDBC > - > > Key: SPARK-20555 > URL: https://issues.apache.org/jira/browse/SPARK-20555 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Gabor Feher > Fix For: 2.1.2, 2.2.0 > > > When querying an Oracle database, Spark maps some Oracle numeric data types > to incorrect Catalyst data types: > 1. DECIMAL(1) becomes BooleanType > In Oracle, a DECIMAL(1) can have values from -9 to 9. > In Spark now, values larger than 1 become the boolean value true. > 2. DECIMAL(3,2) becomes IntegerType > In Oracle, a DECIMAL(3,2) can have values like 1.23 > In Spark now, digits after the decimal point are dropped. > 3. DECIMAL(10) becomes IntegerType > In Oracle, a DECIMAL(10) can have the value 9999999999 (ten nines), which is > more than 2^31 > Spark throws an exception: "java.sql.SQLException: Numeric Overflow" > I think the best solution is to always keep Oracle's decimal types. (In > theory we could introduce a FloatType in some cases of #2, and fix #3 by only > introducing IntegerType for DECIMAL(9). But in my opinion, that would end up > complicated and error-prone.) > Note: I think the above problems were introduced as part of > https://github.com/apache/spark/pull/14377 > The main purpose of that PR seems to be converting Spark types to correct > Oracle types, and that part seems good to me. But it also adds the inverse > conversions. As it turns out in the above examples, that is not possible. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
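A user-side mitigation in the spirit of the reporter's suggestion (keep Oracle's decimals as decimals) can be sketched with Spark's pluggable JdbcDialect API. This is an illustrative sketch, not the merged fix; KeepOracleDecimals is a hypothetical name, and it assumes Oracle reports NUMBER columns as java.sql.Types.NUMERIC with the scale available in the column metadata:
{code}
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder}

object KeepOracleDecimals extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // `size` carries the precision; the scale travels in the column metadata.
    if (sqlType == Types.NUMERIC && size > 0 && size <= DecimalType.MAX_PRECISION) {
      val scale = md.build().getLong("scale").toInt
      Some(DecimalType(size, scale))  // never downgrade to Boolean/Integer
    } else {
      None  // defer to the default mapping
    }
  }
}

JdbcDialects.registerDialect(KeepOracleDecimals)
{code}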
[jira] [Assigned] (SPARK-20555) Incorrect handling of Oracle's decimal types via JDBC
[ https://issues.apache.org/jira/browse/SPARK-20555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-20555: --- Assignee: Gabor Feher > Incorrect handling of Oracle's decimal types via JDBC > - > > Key: SPARK-20555 > URL: https://issues.apache.org/jira/browse/SPARK-20555 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Gabor Feher >Assignee: Gabor Feher > Fix For: 2.1.2, 2.2.0 > > > When querying an Oracle database, Spark maps some Oracle numeric data types > to incorrect Catalyst data types: > 1. DECIMAL(1) becomes BooleanType > In Oracle, a DECIMAL(1) can have values from -9 to 9. > In Spark now, values larger than 1 become the boolean value true. > 2. DECIMAL(3,2) becomes IntegerType > In Oracle, a DECIMAL(3,2) can have values like 1.23 > In Spark now, digits after the decimal point are dropped. > 3. DECIMAL(10) becomes IntegerType > In Oracle, a DECIMAL(10) can have the value 9999999999 (ten nines), which is > more than 2^31 > Spark throws an exception: "java.sql.SQLException: Numeric Overflow" > I think the best solution is to always keep Oracle's decimal types. (In > theory we could introduce a FloatType in some cases of #2, and fix #3 by only > introducing IntegerType for DECIMAL(9). But in my opinion, that would end up > complicated and error-prone.) > Note: I think the above problems were introduced as part of > https://github.com/apache/spark/pull/14377 > The main purpose of that PR seems to be converting Spark types to correct > Oracle types, and that part seems good to me. But it also adds the inverse > conversions. As it turns out in the above examples, that is not possible. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21079) ANALYZE TABLE fails to calculate totalSize for a partitioned table
[ https://issues.apache.org/jira/browse/SPARK-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21079. - Resolution: Fixed Assignee: Maria Fix Version/s: 2.2.0 > ANALYZE TABLE fails to calculate totalSize for a partitioned table > -- > > Key: SPARK-21079 > URL: https://issues.apache.org/jira/browse/SPARK-21079 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.1 >Reporter: Maria >Assignee: Maria > Labels: easyfix > Fix For: 2.2.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > ANALYZE TABLE table COMPUTE STATISTICS invoked on a partitioned table produces > totalSize = 0. > AnalyzeTableCommand fetches the table-level storage URI and calculates the total size > of the files in the corresponding directory recursively. However, for partitioned > tables, each partition has its own storage URI, which may not be a > subdirectory of the table-level storage URI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
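A quick way to observe the behavior from SQL (an illustrative sketch; the table and partition are made up, and a Hive-backed catalog is assumed):
{noformat}
CREATE TABLE part_t (key INT) PARTITIONED BY (p INT);
INSERT INTO part_t PARTITION (p = 1) VALUES (1);
ANALYZE TABLE part_t COMPUTE STATISTICS;
-- Before the fix, the statistics reported for the table show totalSize = 0,
-- because only the table-level directory is scanned and partition locations
-- outside it are never summed.
DESCRIBE EXTENDED part_t;
{noformat}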
[jira] [Created] (SPARK-21256) Add WithSQLConf to Catalyst
Xiao Li created SPARK-21256: --- Summary: Add WithSQLConf to Catalyst Key: SPARK-21256 URL: https://issues.apache.org/jira/browse/SPARK-21256 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 2.3.0 Reporter: Xiao Li Assignee: Xiao Li Add WithSQLConf to the Catalyst module. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21256) Add WithSQLConf to Catalyst Test
[ https://issues.apache.org/jira/browse/SPARK-21256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21256: Summary: Add WithSQLConf to Catalyst Test (was: Add WithSQLConf to Catalyst) > Add WithSQLConf to Catalyst Test > > > Key: SPARK-21256 > URL: https://issues.apache.org/jira/browse/SPARK-21256 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Xiao Li > > Add WithSQLConf to the Catalyst module. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
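For context, a usage sketch of the helper this ticket moves, assuming the signature it already has on the SQL-core test side, withSQLConf(pairs: (String, String)*)(f: => Unit):
{code}
import org.apache.spark.sql.internal.SQLConf

// Runs the block with the given SQLConf entries set, then restores the
// previous values, so one test cannot leak configuration into another.
withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
  // assertions that depend on case-sensitive resolution go here
}
{code}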
[jira] [Updated] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table
[ https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20073: Labels: (was: correctness) > Unexpected Cartesian product when using eqNullSafe in join with a derived > table > --- > > Key: SPARK-20073 > URL: https://issues.apache.org/jira/browse/SPARK-20073 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.0.2, 2.1.0 >Reporter: Everett Anderson > > It appears that if you try to join tables A and B when B is derived from A > and you use the eqNullSafe / <=> operator for the join condition, Spark > performs a Cartesian product. > However, if you perform the join on tables of the same data when they don't > have a relationship, the expected non-Cartesian product join occurs. > {noformat} > // Create some fake data. > import org.apache.spark.sql.Row > import org.apache.spark.sql.Dataset > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions > val peopleRowsRDD = sc.parallelize(Seq( > Row("Fred", 8, 1), > Row("Fred", 8, 2), > Row(null, 10, 3), > Row(null, 10, 4), > Row("Amy", 12, 5), > Row("Amy", 12, 6))) > > val peopleSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("group", IntegerType, nullable = true), > StructField("data", IntegerType, nullable = true))) > > val people = spark.createDataFrame(peopleRowsRDD, peopleSchema) > people.createOrReplaceTempView("people") > scala> people.show > ++-++ > |name|group|data| > ++-++ > |Fred|8| 1| > |Fred|8| 2| > |null| 10| 3| > |null| 10| 4| > | Amy| 12| 5| > | Amy| 12| 6| > ++-++ > // Now create a derived table from that table. It doesn't matter much what. > val variantCounts = spark.sql("select name, count(distinct(name, group, > data)) as variant_count from people group by name having variant_count > 1") > variantCounts.show > ++-+ > > |name|variant_count| > ++-+ > |Fred|2| > |null|2| > | Amy|2| > ++-+ > // Now try an inner join using the regular equalTo that drops nulls. This > works fine. > val innerJoinEqualTo = variantCounts.join(people, > variantCounts("name").equalTo(people("name"))) > innerJoinEqualTo.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay now lets switch to the <=> operator > // > // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error > like > // "Cartesian joins could be prohibitively expensive and are disabled by > default. To explicitly enable them, please set spark.sql.crossJoin.enabled = > true;" > // > // if you have enabled them, you'll get the table below. > // > // However, we really don't want or expect a Cartesian product! > val innerJoinSqlNullSafeEqOp = variantCounts.join(people, > variantCounts("name")<=>(people("name"))) > innerJoinSqlNullSafeEqOp.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > |Fred|2|null| 10| 3| > |Fred|2|null| 10| 4| > |Fred|2| Amy| 12| 5| > |Fred|2| Amy| 12| 6| > |null|2|Fred|8| 1| > |null|2|Fred|8| 2| > |null|2|null| 10| 3| > |null|2|null| 10| 4| > |null|2| Amy| 12| 5| > |null|2| Amy| 12| 6| > | Amy|2|Fred|8| 1| > | Amy|2|Fred|8| 2| > | Amy|2|null| 10| 3| > | Amy|2|null| 10| 4| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay, let's try to construct the exact same variantCount table manually > // so it has no relationship to the original. 
> val variantCountRowsRDD = sc.parallelize(Seq( > Row("Fred", 2), > Row(null, 2), > Row("Amy", 2))) > > val variantCountSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("variant_count", IntegerType, nullable = true))) > > val manualVariantCounts = spark.createDataFrame(variantCountRowsRDD, > variantCountSchema) > // Now perform the
[jira] [Updated] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table
[ https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20073: Component/s: (was: Optimizer) SQL > Unexpected Cartesian product when using eqNullSafe in join with a derived > table > --- > > Key: SPARK-20073 > URL: https://issues.apache.org/jira/browse/SPARK-20073 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Everett Anderson > > It appears that if you try to join tables A and B when B is derived from A > and you use the eqNullSafe / <=> operator for the join condition, Spark > performs a Cartesian product. > However, if you perform the join on tables of the same data when they don't > have a relationship, the expected non-Cartesian product join occurs. > {noformat} > // Create some fake data. > import org.apache.spark.sql.Row > import org.apache.spark.sql.Dataset > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions > val peopleRowsRDD = sc.parallelize(Seq( > Row("Fred", 8, 1), > Row("Fred", 8, 2), > Row(null, 10, 3), > Row(null, 10, 4), > Row("Amy", 12, 5), > Row("Amy", 12, 6))) > > val peopleSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("group", IntegerType, nullable = true), > StructField("data", IntegerType, nullable = true))) > > val people = spark.createDataFrame(peopleRowsRDD, peopleSchema) > people.createOrReplaceTempView("people") > scala> people.show > ++-++ > |name|group|data| > ++-++ > |Fred|8| 1| > |Fred|8| 2| > |null| 10| 3| > |null| 10| 4| > | Amy| 12| 5| > | Amy| 12| 6| > ++-++ > // Now create a derived table from that table. It doesn't matter much what. > val variantCounts = spark.sql("select name, count(distinct(name, group, > data)) as variant_count from people group by name having variant_count > 1") > variantCounts.show > ++-+ > > |name|variant_count| > ++-+ > |Fred|2| > |null|2| > | Amy|2| > ++-+ > // Now try an inner join using the regular equalTo that drops nulls. This > works fine. > val innerJoinEqualTo = variantCounts.join(people, > variantCounts("name").equalTo(people("name"))) > innerJoinEqualTo.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay now lets switch to the <=> operator > // > // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error > like > // "Cartesian joins could be prohibitively expensive and are disabled by > default. To explicitly enable them, please set spark.sql.crossJoin.enabled = > true;" > // > // if you have enabled them, you'll get the table below. > // > // However, we really don't want or expect a Cartesian product! > val innerJoinSqlNullSafeEqOp = variantCounts.join(people, > variantCounts("name")<=>(people("name"))) > innerJoinSqlNullSafeEqOp.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > |Fred|2|null| 10| 3| > |Fred|2|null| 10| 4| > |Fred|2| Amy| 12| 5| > |Fred|2| Amy| 12| 6| > |null|2|Fred|8| 1| > |null|2|Fred|8| 2| > |null|2|null| 10| 3| > |null|2|null| 10| 4| > |null|2| Amy| 12| 5| > |null|2| Amy| 12| 6| > | Amy|2|Fred|8| 1| > | Amy|2|Fred|8| 2| > | Amy|2|null| 10| 3| > | Amy|2|null| 10| 4| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay, let's try to construct the exact same variantCount table manually > // so it has no relationship to the original. 
> val variantCountRowsRDD = sc.parallelize(Seq( > Row("Fred", 2), > Row(null, 2), > Row("Amy", 2))) > > val variantCountSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("variant_count", IntegerType, nullable = true))) > > val manualVariantCounts = spark.createDataFrame(variantCountRowsRDD, > variantCountSchema) >
[jira] [Updated] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table
[ https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20073: Issue Type: Improvement (was: Bug) > Unexpected Cartesian product when using eqNullSafe in join with a derived > table > --- > > Key: SPARK-20073 > URL: https://issues.apache.org/jira/browse/SPARK-20073 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Everett Anderson > > It appears that if you try to join tables A and B when B is derived from A > and you use the eqNullSafe / <=> operator for the join condition, Spark > performs a Cartesian product. > However, if you perform the join on tables of the same data when they don't > have a relationship, the expected non-Cartesian product join occurs. > {noformat} > // Create some fake data. > import org.apache.spark.sql.Row > import org.apache.spark.sql.Dataset > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions > val peopleRowsRDD = sc.parallelize(Seq( > Row("Fred", 8, 1), > Row("Fred", 8, 2), > Row(null, 10, 3), > Row(null, 10, 4), > Row("Amy", 12, 5), > Row("Amy", 12, 6))) > > val peopleSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("group", IntegerType, nullable = true), > StructField("data", IntegerType, nullable = true))) > > val people = spark.createDataFrame(peopleRowsRDD, peopleSchema) > people.createOrReplaceTempView("people") > scala> people.show > ++-++ > |name|group|data| > ++-++ > |Fred|8| 1| > |Fred|8| 2| > |null| 10| 3| > |null| 10| 4| > | Amy| 12| 5| > | Amy| 12| 6| > ++-++ > // Now create a derived table from that table. It doesn't matter much what. > val variantCounts = spark.sql("select name, count(distinct(name, group, > data)) as variant_count from people group by name having variant_count > 1") > variantCounts.show > ++-+ > > |name|variant_count| > ++-+ > |Fred|2| > |null|2| > | Amy|2| > ++-+ > // Now try an inner join using the regular equalTo that drops nulls. This > works fine. > val innerJoinEqualTo = variantCounts.join(people, > variantCounts("name").equalTo(people("name"))) > innerJoinEqualTo.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay now lets switch to the <=> operator > // > // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error > like > // "Cartesian joins could be prohibitively expensive and are disabled by > default. To explicitly enable them, please set spark.sql.crossJoin.enabled = > true;" > // > // if you have enabled them, you'll get the table below. > // > // However, we really don't want or expect a Cartesian product! > val innerJoinSqlNullSafeEqOp = variantCounts.join(people, > variantCounts("name")<=>(people("name"))) > innerJoinSqlNullSafeEqOp.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > |Fred|2|null| 10| 3| > |Fred|2|null| 10| 4| > |Fred|2| Amy| 12| 5| > |Fred|2| Amy| 12| 6| > |null|2|Fred|8| 1| > |null|2|Fred|8| 2| > |null|2|null| 10| 3| > |null|2|null| 10| 4| > |null|2| Amy| 12| 5| > |null|2| Amy| 12| 6| > | Amy|2|Fred|8| 1| > | Amy|2|Fred|8| 2| > | Amy|2|null| 10| 3| > | Amy|2|null| 10| 4| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay, let's try to construct the exact same variantCount table manually > // so it has no relationship to the original. 
> val variantCountRowsRDD = sc.parallelize(Seq( > Row("Fred", 2), > Row(null, 2), > Row("Amy", 2))) > > val variantCountSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("variant_count", IntegerType, nullable = true))) > > val manualVariantCounts = spark.createDataFrame(variantCountRowsRDD, > variantCountSchema) > // Now per
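One workaround for the affected versions (an illustrative sketch, not taken from the ticket) is to expand the null-safe equality into a plain equality on coalesced keys, which the planner can extract as an ordinary equi-join key; the sentinel value must not collide with real data:
{code}
import org.apache.spark.sql.functions.{coalesce, lit}

// Behaves like variantCounts("name") <=> people("name"): two NULLs compare
// equal because both sides collapse to the same sentinel string.
val cond = coalesce(variantCounts("name"), lit("\u0000<null>")) ===
  coalesce(people("name"), lit("\u0000<null>"))
val joined = variantCounts.join(people, cond)
{code}
The planner-side fix that eventually landed reportedly applies the same coalesce idea when extracting equi-join keys from a null-safe equality.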
[jira] [Resolved] (SPARK-21129) Arguments of SQL function call should not be named expressions
[ https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21129. - Resolution: Fixed Fix Version/s: 2.2.0 > Arguments of SQL function call should not be named expressions > -- > > Key: SPARK-21129 > URL: https://issues.apache.org/jira/browse/SPARK-21129 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.2.0 > > > Function arguments should not be named expressions. They could cause a misleading > error message. > {noformat} > spark-sql> select count(distinct c1, distinct c2) from t1; > {noformat} > {noformat} > Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; > line 1 pos 26; > 'Project [unresolvedalias('count(c1#30, 'distinct), None)] > +- SubqueryAlias t1 >+- CatalogRelation `default`.`t1`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
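For contrast, DISTINCT applies once to the whole argument list, so the query the user presumably intended parses and resolves fine:
{noformat}
spark-sql> select count(distinct c1, c2) from t1;
{noformat}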
[jira] [Resolved] (SPARK-21273) Decouple stats propagation from logical plan
[ https://issues.apache.org/jira/browse/SPARK-21273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21273. - Resolution: Fixed Fix Version/s: 2.3.0 > Decouple stats propagation from logical plan > > > Key: SPARK-21273 > URL: https://issues.apache.org/jira/browse/SPARK-21273 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.3.0 > > > We currently implement statistics propagation directly in logical plan. Given > we already have two different implementations, it'd make sense to actually > decouple the two and add stats propagation using a mixin. > This can also be a powerful pattern in the future to add additional > properties (e.g. constraints). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
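A shape sketch of the proposed decoupling (the trait name and caching scheme here are hypothetical, not the merged design): the logical plan node carries no estimation logic itself, and a mixin supplies a memoized stats accessor.
{code}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Statistics}

trait StatsPropagation { self: LogicalPlan =>
  private var cached: Option[Statistics] = None

  // Computed on first access and memoized for the lifetime of the plan node.
  def stats: Statistics = cached.getOrElse {
    val s = computeStats()
    cached = Some(s)
    s
  }

  // Each concrete node (or a plan visitor) supplies the actual estimation.
  protected def computeStats(): Statistics
}
{code}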
[jira] [Resolved] (SPARK-18004) DataFrame filter Predicate push-down fails for Oracle Timestamp type columns
[ https://issues.apache.org/jira/browse/SPARK-18004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-18004. - Resolution: Fixed Fix Version/s: 2.3.0 > DataFrame filter Predicate push-down fails for Oracle Timestamp type columns > > > Key: SPARK-18004 > URL: https://issues.apache.org/jira/browse/SPARK-18004 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Suhas Nalapure >Assignee: Rui Zha >Priority: Critical > Fix For: 2.3.0 > > > DataFrame filter Predicate push-down fails for Oracle Timestamp type columns > with Exception java.sql.SQLDataException: ORA-01861: literal does not match > format string: > Java source code (this code works fine for mysql & mssql databases) : > {noformat} > //DataFrame df = create a DataFrame over an Oracle table > df = df.filter(df.col("TS").lt(new > java.sql.Timestamp(System.currentTimeMillis()))); > df.explain(); > df.show(); > {noformat} > Log statements with the Exception: > {noformat} > Schema: root > |-- ID: string (nullable = false) > |-- TS: timestamp (nullable = true) > |-- DEVICE_ID: string (nullable = true) > |-- REPLACEMENT: string (nullable = true) > {noformat} > {noformat} > == Physical Plan == > Filter (TS#1 < 1476861841934000) > +- Scan > JDBCRelation(jdbc:oracle:thin:@10.0.0.111:1521:orcl,ORATABLE,[Lorg.apache.spark.Partition;@78c74647,{user=user, > password=pwd, url=jdbc:oracle:thin:@10.0.0.111:1521:orcl, dbtable=ORATABLE, > driver=oracle.jdbc.driver.OracleDriver})[ID#0,TS#1,DEVICE_ID#2,REPLACEMENT#3] > PushedFilters: [LessThan(TS,2016-10-19 12:54:01.934)] > 2016-10-19 12:54:04,268 ERROR [Executor task launch worker-0] > org.apache.spark.executor.Executor > Exception in task 0.0 in stage 0.0 (TID 0) > java.sql.SQLDataException: ORA-01861: literal does not match format string > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:461) > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:402) > at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1065) > at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:681) > at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:256) > at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:577) > at > oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:239) > at > oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:75) > at > oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1043) > at > oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:) > at > oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1353) > at > oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4485) > at > oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4566) > at > oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:5251) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:383) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:359) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > 
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18004) DataFrame filter Predicate push-down fails for Oracle Timestamp type columns
[ https://issues.apache.org/jira/browse/SPARK-18004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-18004: --- Assignee: Rui Zha > DataFrame filter Predicate push-down fails for Oracle Timestamp type columns > > > Key: SPARK-18004 > URL: https://issues.apache.org/jira/browse/SPARK-18004 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Suhas Nalapure >Assignee: Rui Zha >Priority: Critical > Fix For: 2.3.0 > > > DataFrame filter Predicate push-down fails for Oracle Timestamp type columns > with Exception java.sql.SQLDataException: ORA-01861: literal does not match > format string: > Java source code (this code works fine for mysql & mssql databases) : > {noformat} > //DataFrame df = create a DataFrame over an Oracle table > df = df.filter(df.col("TS").lt(new > java.sql.Timestamp(System.currentTimeMillis()))); > df.explain(); > df.show(); > {noformat} > Log statements with the Exception: > {noformat} > Schema: root > |-- ID: string (nullable = false) > |-- TS: timestamp (nullable = true) > |-- DEVICE_ID: string (nullable = true) > |-- REPLACEMENT: string (nullable = true) > {noformat} > {noformat} > == Physical Plan == > Filter (TS#1 < 1476861841934000) > +- Scan > JDBCRelation(jdbc:oracle:thin:@10.0.0.111:1521:orcl,ORATABLE,[Lorg.apache.spark.Partition;@78c74647,{user=user, > password=pwd, url=jdbc:oracle:thin:@10.0.0.111:1521:orcl, dbtable=ORATABLE, > driver=oracle.jdbc.driver.OracleDriver})[ID#0,TS#1,DEVICE_ID#2,REPLACEMENT#3] > PushedFilters: [LessThan(TS,2016-10-19 12:54:01.934)] > 2016-10-19 12:54:04,268 ERROR [Executor task launch worker-0] > org.apache.spark.executor.Executor > Exception in task 0.0 in stage 0.0 (TID 0) > java.sql.SQLDataException: ORA-01861: literal does not match format string > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:461) > at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:402) > at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1065) > at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:681) > at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:256) > at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:577) > at > oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:239) > at > oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:75) > at > oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1043) > at > oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:) > at > oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1353) > at > oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4485) > at > oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4566) > at > oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:5251) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:383) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:359) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21282) Fix test failure in 2.0
Xiao Li created SPARK-21282: --- Summary: Fix test failure in 2.0 Key: SPARK-21282 URL: https://issues.apache.org/jira/browse/SPARK-21282 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.2 Reporter: Xiao Li Assignee: Xiao Li There is a test failure after backporting a fix from 2.2 to 2.0, because the automatically generated column names are different. https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.0-test-maven-hadoop-2.2/lastCompletedBuild/testReport/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21287) Cannot use Int.MIN_VALUE as Spark SQL fetchsize
[ https://issues.apache.org/jira/browse/SPARK-21287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072700#comment-16072700 ] Xiao Li commented on SPARK-21287: - This value is very specific to MySQL. Since we are supporting different dialects, we could introduce dialect-specific checking logic. > Cannot use Int.MIN_VALUE as Spark SQL fetchsize > --- > > Key: SPARK-21287 > URL: https://issues.apache.org/jira/browse/SPARK-21287 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1 >Reporter: Maciej Bryński > > The MySQL JDBC driver gives the possibility to not store the ResultSet in memory. > We can do this by setting fetchSize to Int.MIN_VALUE. > Unfortunately this configuration isn't accepted in Spark. > {code} > java.lang.IllegalArgumentException: requirement failed: Invalid value > `-2147483648` for parameter `fetchsize`. The minimum value is 0. When the > value is 0, the JDBC driver ignores the value and does the estimates. > at scala.Predef$.require(Predef.scala:224) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:105) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:34) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166) > at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:206) > at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:214) > at java.lang.Thread.run(Thread.java:748) > {code} > https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
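A minimal repro sketch (all connection settings are placeholders):
{code}
// MySQL's Connector/J streams rows only when the statement fetch size is
// exactly Integer.MIN_VALUE, but JDBCOptions rejects negative values up front.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://example-host:3306/db")
  .option("dbtable", "big_table")
  .option("user", "user")
  .option("password", "password")
  .option("fetchsize", Int.MinValue.toString)
  .load()  // throws IllegalArgumentException: Invalid value `-2147483648`
{code}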
[jira] [Resolved] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table
[ https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20073. - Resolution: Fixed Fix Version/s: 2.3.0 > Unexpected Cartesian product when using eqNullSafe in join with a derived > table > --- > > Key: SPARK-20073 > URL: https://issues.apache.org/jira/browse/SPARK-20073 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Everett Anderson >Assignee: Takeshi Yamamuro > Fix For: 2.3.0 > > > It appears that if you try to join tables A and B when B is derived from A > and you use the eqNullSafe / <=> operator for the join condition, Spark > performs a Cartesian product. > However, if you perform the join on tables of the same data when they don't > have a relationship, the expected non-Cartesian product join occurs. > {noformat} > // Create some fake data. > import org.apache.spark.sql.Row > import org.apache.spark.sql.Dataset > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions > val peopleRowsRDD = sc.parallelize(Seq( > Row("Fred", 8, 1), > Row("Fred", 8, 2), > Row(null, 10, 3), > Row(null, 10, 4), > Row("Amy", 12, 5), > Row("Amy", 12, 6))) > > val peopleSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("group", IntegerType, nullable = true), > StructField("data", IntegerType, nullable = true))) > > val people = spark.createDataFrame(peopleRowsRDD, peopleSchema) > people.createOrReplaceTempView("people") > scala> people.show > ++-++ > |name|group|data| > ++-++ > |Fred|8| 1| > |Fred|8| 2| > |null| 10| 3| > |null| 10| 4| > | Amy| 12| 5| > | Amy| 12| 6| > ++-++ > // Now create a derived table from that table. It doesn't matter much what. > val variantCounts = spark.sql("select name, count(distinct(name, group, > data)) as variant_count from people group by name having variant_count > 1") > variantCounts.show > ++-+ > > |name|variant_count| > ++-+ > |Fred|2| > |null|2| > | Amy|2| > ++-+ > // Now try an inner join using the regular equalTo that drops nulls. This > works fine. > val innerJoinEqualTo = variantCounts.join(people, > variantCounts("name").equalTo(people("name"))) > innerJoinEqualTo.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay now lets switch to the <=> operator > // > // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error > like > // "Cartesian joins could be prohibitively expensive and are disabled by > default. To explicitly enable them, please set spark.sql.crossJoin.enabled = > true;" > // > // if you have enabled them, you'll get the table below. > // > // However, we really don't want or expect a Cartesian product! 
> val innerJoinSqlNullSafeEqOp = variantCounts.join(people, > variantCounts("name")<=>(people("name"))) > innerJoinSqlNullSafeEqOp.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > |Fred|2|null| 10| 3| > |Fred|2|null| 10| 4| > |Fred|2| Amy| 12| 5| > |Fred|2| Amy| 12| 6| > |null|2|Fred|8| 1| > |null|2|Fred|8| 2| > |null|2|null| 10| 3| > |null|2|null| 10| 4| > |null|2| Amy| 12| 5| > |null|2| Amy| 12| 6| > | Amy|2|Fred|8| 1| > | Amy|2|Fred|8| 2| > | Amy|2|null| 10| 3| > | Amy|2|null| 10| 4| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay, let's try to construct the exact same variantCount table manually > // so it has no relationship to the original. > val variantCountRowsRDD = sc.parallelize(Seq( > Row("Fred", 2), > Row(null, 2), > Row("Amy", 2))) > > val variantCountSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("variant_count", IntegerType, nullable = true))) > > val manualVariantCoun
[jira] [Assigned] (SPARK-20073) Unexpected Cartesian product when using eqNullSafe in join with a derived table
[ https://issues.apache.org/jira/browse/SPARK-20073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-20073: --- Assignee: Takeshi Yamamuro > Unexpected Cartesian product when using eqNullSafe in join with a derived > table > --- > > Key: SPARK-20073 > URL: https://issues.apache.org/jira/browse/SPARK-20073 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Everett Anderson >Assignee: Takeshi Yamamuro > Fix For: 2.3.0 > > > It appears that if you try to join tables A and B when B is derived from A > and you use the eqNullSafe / <=> operator for the join condition, Spark > performs a Cartesian product. > However, if you perform the join on tables of the same data when they don't > have a relationship, the expected non-Cartesian product join occurs. > {noformat} > // Create some fake data. > import org.apache.spark.sql.Row > import org.apache.spark.sql.Dataset > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions > val peopleRowsRDD = sc.parallelize(Seq( > Row("Fred", 8, 1), > Row("Fred", 8, 2), > Row(null, 10, 3), > Row(null, 10, 4), > Row("Amy", 12, 5), > Row("Amy", 12, 6))) > > val peopleSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("group", IntegerType, nullable = true), > StructField("data", IntegerType, nullable = true))) > > val people = spark.createDataFrame(peopleRowsRDD, peopleSchema) > people.createOrReplaceTempView("people") > scala> people.show > ++-++ > |name|group|data| > ++-++ > |Fred|8| 1| > |Fred|8| 2| > |null| 10| 3| > |null| 10| 4| > | Amy| 12| 5| > | Amy| 12| 6| > ++-++ > // Now create a derived table from that table. It doesn't matter much what. > val variantCounts = spark.sql("select name, count(distinct(name, group, > data)) as variant_count from people group by name having variant_count > 1") > variantCounts.show > ++-+ > > |name|variant_count| > ++-+ > |Fred|2| > |null|2| > | Amy|2| > ++-+ > // Now try an inner join using the regular equalTo that drops nulls. This > works fine. > val innerJoinEqualTo = variantCounts.join(people, > variantCounts("name").equalTo(people("name"))) > innerJoinEqualTo.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay now lets switch to the <=> operator > // > // If you haven't set spark.sql.crossJoin.enabled=true, you'll get an error > like > // "Cartesian joins could be prohibitively expensive and are disabled by > default. To explicitly enable them, please set spark.sql.crossJoin.enabled = > true;" > // > // if you have enabled them, you'll get the table below. > // > // However, we really don't want or expect a Cartesian product! 
> val innerJoinSqlNullSafeEqOp = variantCounts.join(people, > variantCounts("name")<=>(people("name"))) > innerJoinSqlNullSafeEqOp.show > ++-++-++ > > |name|variant_count|name|group|data| > ++-++-++ > |Fred|2|Fred|8| 1| > |Fred|2|Fred|8| 2| > |Fred|2|null| 10| 3| > |Fred|2|null| 10| 4| > |Fred|2| Amy| 12| 5| > |Fred|2| Amy| 12| 6| > |null|2|Fred|8| 1| > |null|2|Fred|8| 2| > |null|2|null| 10| 3| > |null|2|null| 10| 4| > |null|2| Amy| 12| 5| > |null|2| Amy| 12| 6| > | Amy|2|Fred|8| 1| > | Amy|2|Fred|8| 2| > | Amy|2|null| 10| 3| > | Amy|2|null| 10| 4| > | Amy|2| Amy| 12| 5| > | Amy|2| Amy| 12| 6| > ++-++-++ > // Okay, let's try to construct the exact same variantCount table manually > // so it has no relationship to the original. > val variantCountRowsRDD = sc.parallelize(Seq( > Row("Fred", 2), > Row(null, 2), > Row("Amy", 2))) > > val variantCountSchema = StructType(Seq( > StructField("name", StringType, nullable = true), > StructField("variant_count", IntegerType, nullable = true))) > > val manualVariantCounts = spark.cre
[jira] [Resolved] (SPARK-21284) rename SessionCatalog.registerFunction parameter name
[ https://issues.apache.org/jira/browse/SPARK-21284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21284. - Resolution: Fixed Fix Version/s: 2.3.0 > rename SessionCatalog.registerFunction parameter name > - > > Key: SPARK-21284 > URL: https://issues.apache.org/jira/browse/SPARK-21284 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Minor > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21295) Use qualified name in the error message for missing references
Xiao Li created SPARK-21295: --- Summary: Use qualified name in the error message for missing references Key: SPARK-21295 URL: https://issues.apache.org/jira/browse/SPARK-21295 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Xiao Li Assignee: Xiao Li It is strange to see the following error message. Actually, the input columns come from different tables. {noformat} `cannot resolve '`right.a`' given input columns: [a, c, d];` {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21295) Confusing error message for missing references
[ https://issues.apache.org/jira/browse/SPARK-21295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21295: Summary: Confusing error message for missing references (was: Use qualified name in the error message for missing references) > Confusing error message for missing references > -- > > Key: SPARK-21295 > URL: https://issues.apache.org/jira/browse/SPARK-21295 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Xiao Li > > It is strange to see the following error message. Actually, the input columns > come from different tables. > {noformat} > `cannot resolve '`right.a`' given input columns: [a, c, d];` > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
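An illustrative repro of the confusing message (column names are invented; run in spark-shell, roughly reproducing the shape of the reported error):
{code}
import spark.implicits._

val left = Seq((1, 2)).toDF("a", "b")
val right = Seq((1, 3)).toDF("a", "c")

// The unresolved reference is reported against a flat, unqualified column
// list even though the candidate columns come from two different tables:
// cannot resolve '`right.a`' given input columns: [a, b, a, c]
left.join(right, left("a") === right("a")).select("right.a")
{code}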
[jira] [Resolved] (SPARK-19726) Failed to insert null timestamp value to mysql using spark jdbc
[ https://issues.apache.org/jira/browse/SPARK-19726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-19726. - Resolution: Fixed Assignee: wangshuangshuang Fix Version/s: 2.3.0 > Failed to insert null timestamp value to mysql using spark jdbc > -- > > Key: SPARK-19726 > URL: https://issues.apache.org/jira/browse/SPARK-19726 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: AnfengYuan >Assignee: wangshuangshuang > Fix For: 2.3.0 > > > 1. create a table in mysql > {code:borderStyle=solid} > CREATE TABLE `timestamp_test` ( > `id` bigint(23) DEFAULT NULL, > `time_stamp` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE > CURRENT_TIMESTAMP > ) ENGINE=InnoDB DEFAULT CHARSET=utf8 > {code} > 2. insert one row using spark > {code:borderStyle=solid} > CREATE OR REPLACE TEMPORARY VIEW jdbcTable > USING org.apache.spark.sql.jdbc > OPTIONS ( > url > 'jdbc:mysql://xxx.xxx.xxx.xxx:3306/default?characterEncoding=utf8&useServerPrepStmts=false&rewriteBatchedStatements=true', > dbtable 'timestamp_test', > driver 'com.mysql.jdbc.Driver', > user 'root', > password 'root' > ); > insert into jdbcTable values (1, null); > {code} > the insert statement failed with exceptions: > {code:borderStyle=solid} > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 599 in stage 1.0 failed 4 times, most recent failure: Lost task 599.3 in > stage 1.0 (TID 1202, A03-R07-I12-135.JD.LOCAL): > java.sql.BatchUpdateException: Data truncation: Incorrect datetime value: > '1970-01-01 08:00:00' for column 'time_stamp' at row 1 > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:404) > at com.mysql.jdbc.Util.getInstance(Util.java:387) > at > com.mysql.jdbc.SQLError.createBatchUpdateException(SQLError.java:1154) > at > com.mysql.jdbc.PreparedStatement.executeBatchedInserts(PreparedStatement.java:1582) > at > com.mysql.jdbc.PreparedStatement.executeBatchInternal(PreparedStatement.java:1248) > at com.mysql.jdbc.StatementImpl.executeBatch(StatementImpl.java:959) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:227) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:300) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:299) > at > org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902) > at > org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 
Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect > datetime value: '1970-01-01 08:00:00' for column 'time_stamp' at row 1 > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3876) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3814) > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2478) > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2625) > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2551) > at > com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861) > at > com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073) > at > com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2009) > at > com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5094) > at > com.mysql.jdbc.PreparedStatement.executeBatchedInserts(PreparedStatement.java:1543) > ... 15 more > {code} -- This message was sent by Atlassian JIRA (v6.
[jira] [Resolved] (SPARK-20256) Fail to start SparkContext/SparkSession with Hive support enabled when user does not have read/write privilege to Hive metastore warehouse dir
[ https://issues.apache.org/jira/browse/SPARK-20256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20256. - Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 2.3.0 2.2.1 > Fail to start SparkContext/SparkSession with Hive support enabled when user > does not have read/write privilege to Hive metastore warehouse dir > -- > > Key: SPARK-20256 > URL: https://issues.apache.org/jira/browse/SPARK-20256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.1.1, 2.2.0 >Reporter: Xin Wu >Assignee: Dongjoon Hyun >Priority: Critical > Fix For: 2.2.1, 2.3.0 > > > In a cluster setup with production Hive running, when the user wants to run > spark-shell using the production Hive metastore, hive-site.xml is copied to > SPARK_HOME/conf. So when spark-shell is being started, it tries to check > the existence of the "default" database in the Hive metastore. Yet, since this > user may not have READ/WRITE access to the Hive warehouse > directory configured by Hive itself, such a permission error will prevent spark-shell > or any Spark application with Hive support enabled from starting at all. > Example error: > {code}To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > java.lang.IllegalArgumentException: Error while instantiating > 'org.apache.spark.sql.hive.HiveSessionState': > at > org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981) > at > org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110) > at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:99) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95) > ... 
47 elided > Caused by: java.lang.reflect.InvocationTargetException: > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.security.AccessControlException: Permission > denied: user=notebook, access=READ, > inode="/apps/hive/warehouse":hive:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:320) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1712) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1686) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAccess(FSNamesystem.java:8238) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkAccess(NameNodeRpcServer.java:1933) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.checkAccess(ClientNamenodeProtocolServerSideTranslatorPB.java:1455) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1697) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045) > ); > at sun.reflect.NativeConstructorAccessorImpl.new
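Until a release with the fix is available, a commonly suggested workaround (an illustrative sketch; the path is arbitrary) is to point the Spark-side warehouse at a directory the user can actually write, which may sidestep the startup check:
{noformat}
spark-shell --conf spark.sql.warehouse.dir=/tmp/spark-warehouse
{noformat}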
[jira] [Created] (SPARK-21307) Remove SQLConf parameters from the parser-related classes.
Xiao Li created SPARK-21307: --- Summary: Remove SQLConf parameters from the parser-related classes. Key: SPARK-21307 URL: https://issues.apache.org/jira/browse/SPARK-21307 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Xiao Li Assignee: Xiao Li Remove SQLConf parameters from the parser-related classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21309) Remove SQLConf parameters from the analyzer
Xiao Li created SPARK-21309: --- Summary: Remove SQLConf parameters from the analyzer Key: SPARK-21309 URL: https://issues.apache.org/jira/browse/SPARK-21309 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Xiao Li Assignee: Xiao Li Remove SQLConf parameters from the analyzer -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21308) Remove SQLConf parameters from the optimizer
Xiao Li created SPARK-21308: --- Summary: Remove SQLConf parameters from the optimizer Key: SPARK-21308 URL: https://issues.apache.org/jira/browse/SPARK-21308 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Xiao Li Assignee: Xiao Li Remove SQLConf parameters from the optimizer -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
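The three tickets above (SPARK-21307, SPARK-21309, SPARK-21308) share one refactoring shape. A hedged sketch of the idea, assuming the thread-local {{SQLConf.get}} accessor that recent Spark versions provide; the class names here are illustrative, not Spark's actual code:

{code}
import org.apache.spark.sql.internal.SQLConf

// Before: the conf is threaded through every constructor of the
// parser/analyzer/optimizer components.
class RuleWithConfParam(conf: SQLConf) {
  def caseSensitive: Boolean = conf.caseSensitiveAnalysis
}

// After: the constructor parameter disappears; the component reads the
// conf of the active session on the current thread via SQLConf.get.
class RuleWithoutConfParam {
  def caseSensitive: Boolean = SQLConf.get.caseSensitiveAnalysis
}
{code}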
[jira] [Assigned] (SPARK-19439) PySpark's registerJavaFunction Should Support UDAFs
[ https://issues.apache.org/jira/browse/SPARK-19439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-19439: --- Assignee: Jeff Zhang > PySpark's registerJavaFunction Should Support UDAFs > --- > > Key: SPARK-19439 > URL: https://issues.apache.org/jira/browse/SPARK-19439 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.1.0 >Reporter: Keith Bourgoin >Assignee: Jeff Zhang > > When trying to import a Scala UDAF using registerJavaFunction, I get this > error: > {code} > In [1]: sqlContext.registerJavaFunction('geo_mean', > 'com.foo.bar.GeometricMean') > --- > Py4JJavaError Traceback (most recent call last) > in () > > 1 sqlContext.registerJavaFunction('geo_mean', > 'com.foo.bar.GeometricMean') > /home/kfb/src/projects/spark/python/pyspark/sql/context.pyc in > registerJavaFunction(self, name, javaClassName, returnType) > 227 if returnType is not None: > 228 jdt = > self.sparkSession._jsparkSession.parseDataType(returnType.json()) > --> 229 self.sparkSession._jsparkSession.udf().registerJava(name, > javaClassName, jdt) > 230 > 231 # TODO(andrew): delete this once we refactor things to take in > SparkSession > /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py > in __call__(self, *args) >1131 answer = self.gateway_client.send_command(command) >1132 return_value = get_return_value( > -> 1133 answer, self.gateway_client, self.target_id, self.name) >1134 >1135 for temp_arg in temp_args: > /home/kfb/src/projects/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw) > 61 def deco(*a, **kw): > 62 try: > ---> 63 return f(*a, **kw) > 64 except py4j.protocol.Py4JJavaError as e: > 65 s = e.java_exception.toString() > /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py > in get_return_value(answer, gateway_client, target_id, name) > 317 raise Py4JJavaError( > 318 "An error occurred while calling {0}{1}{2}.\n". > --> 319 format(target_id, ".", name), value) > 320 else: > 321 raise Py4JError( > Py4JJavaError: An error occurred while calling o28.registerJava. > : java.io.IOException: UDF class com.foo.bar.GeometricMean doesn't implement > any UDF interface > at > org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:438) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:214) > at java.lang.Thread.run(Thread.java:745) > {code} > According to SPARK-10915, UDAFs in Python aren't happening anytime soon. > Without this, there's no way to get Scala UDAFs into Python Spark SQL > whatsoever. Fixing that would be a huge help so that we can keep aggregations > in the JVM and keep using DataFrames. Otherwise, all our code has to drop > down to RDDs and live in Python. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19439) PySpark's registerJavaFunction Should Support UDAFs
[ https://issues.apache.org/jira/browse/SPARK-19439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-19439. - Resolution: Fixed Fix Version/s: 2.3.0 > PySpark's registerJavaFunction Should Support UDAFs > --- > > Key: SPARK-19439 > URL: https://issues.apache.org/jira/browse/SPARK-19439 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.1.0 >Reporter: Keith Bourgoin >Assignee: Jeff Zhang > Fix For: 2.3.0 > > > When trying to import a Scala UDAF using registerJavaFunction, I get this > error: > {code} > In [1]: sqlContext.registerJavaFunction('geo_mean', > 'com.foo.bar.GeometricMean') > --- > Py4JJavaError Traceback (most recent call last) > in () > > 1 sqlContext.registerJavaFunction('geo_mean', > 'com.foo.bar.GeometricMean') > /home/kfb/src/projects/spark/python/pyspark/sql/context.pyc in > registerJavaFunction(self, name, javaClassName, returnType) > 227 if returnType is not None: > 228 jdt = > self.sparkSession._jsparkSession.parseDataType(returnType.json()) > --> 229 self.sparkSession._jsparkSession.udf().registerJava(name, > javaClassName, jdt) > 230 > 231 # TODO(andrew): delete this once we refactor things to take in > SparkSession > /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py > in __call__(self, *args) >1131 answer = self.gateway_client.send_command(command) >1132 return_value = get_return_value( > -> 1133 answer, self.gateway_client, self.target_id, self.name) >1134 >1135 for temp_arg in temp_args: > /home/kfb/src/projects/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw) > 61 def deco(*a, **kw): > 62 try: > ---> 63 return f(*a, **kw) > 64 except py4j.protocol.Py4JJavaError as e: > 65 s = e.java_exception.toString() > /home/kfb/src/projects/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py > in get_return_value(answer, gateway_client, target_id, name) > 317 raise Py4JJavaError( > 318 "An error occurred while calling {0}{1}{2}.\n". > --> 319 format(target_id, ".", name), value) > 320 else: > 321 raise Py4JError( > Py4JJavaError: An error occurred while calling o28.registerJava. > : java.io.IOException: UDF class com.foo.bar.GeometricMean doesn't implement > any UDF interface > at > org.apache.spark.sql.UDFRegistration.registerJava(UDFRegistration.scala:438) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:214) > at java.lang.Thread.run(Thread.java:745) > {code} > According to SPARK-10915, UDAFs in Python aren't happening anytime soon. > Without this, there's no way to get Scala UDAFs into Python Spark SQL > whatsoever. Fixing that would be a huge help so that we can keep aggregations > in the JVM and keep using DataFrames. Otherwise, all our code has to drop > down to RDDs and live in Python. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
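For context, a hedged sketch of what a Scala UDAF like the {{com.foo.bar.GeometricMean}} referenced above typically looks like; the implementation below is illustrative, not the reporter's actual class. With this ticket resolved, such a class becomes registrable from PySpark by name.

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Illustrative geometric-mean UDAF: tracks a running count and product,
// then takes the count-th root at the end.
class GeometricMean extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  override def bufferSchema: StructType =
    StructType(StructField("count", LongType) :: StructField("product", DoubleType) :: Nil)
  override def dataType: DataType = DoubleType
  override def deterministic: Boolean = true
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
    buffer(1) = 1.0
  }
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    buffer(0) = buffer.getLong(0) + 1
    buffer(1) = buffer.getDouble(1) * input.getDouble(0)
  }
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getDouble(1) * buffer2.getDouble(1)
  }
  override def evaluate(buffer: Row): Any =
    math.pow(buffer.getDouble(1), 1.0 / buffer.getLong(0))
}
{code}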
[jira] [Resolved] (SPARK-21281) cannot create empty typed array column
[ https://issues.apache.org/jira/browse/SPARK-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21281. - Resolution: Fixed Assignee: Takeshi Yamamuro Fix Version/s: 2.3.0 > cannot create empty typed array column > -- > > Key: SPARK-21281 > URL: https://issues.apache.org/jira/browse/SPARK-21281 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1 >Reporter: Saif Addin >Assignee: Takeshi Yamamuro >Priority: Minor > Fix For: 2.3.0 > > > Hi all > I am running this piece of code > {code:java} > val data = spark.read.parquet("somedata.parquet") > data.withColumn("my_new_column", array().cast("array<string>")).show > {code} > and it works fine > {code:java} > +------+---------+----+-------------+ > |itemid|sentiment|text|my_new_column| > +------+---------+----+-------------+ > | 1| 0| ...| []| > | 2| 0| ...| []| > | 3| 1| omg...| []| > | 4| 0| .. Omga...| []| > {code} > but when I do > {code:java} > val data = spark.read.parquet("somedata.parquet") > import org.apache.spark.sql.types._ > data.withColumn("my_new_column", array().cast("array<int>")).show > {code} > I get: > {code:java} > scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$) > at org.apache.spark.sql.catalyst.expressions.Cast.castToInt(Cast.scala:264) > at > org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$cast(Cast.scala:433) > at org.apache.spark.sql.catalyst.expressions.Cast.castArray(Cast.scala:380) > at > org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$cast(Cast.scala:437) > at > org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:447) > at org.apache.spark.sql.catalyst.expressions.Cast.cast(Cast.scala:447) > at > org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:449) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:325) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:50) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:43) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:288) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:287) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:329) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionDown$1(QueryPlan.scala:248) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:258) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:262) > at > 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:262) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$6.apply(QueryPlan.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:267) > at > org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1.applyOrElse(expressions.scala:43)
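A workaround sketch for versions without the fix, assuming Spark 2.2+ where {{typedLit}} is available: build the empty array with its element type attached up front, instead of casting an untyped empty array whose elements are NullType.

{code}
import org.apache.spark.sql.functions.typedLit

// Workaround sketch: typedLit carries the Seq's element type into the
// schema, so no NullType-to-int cast is needed at all.
val fixed = data.withColumn("my_new_column", typedLit(Seq.empty[Int]))
fixed.printSchema()  // my_new_column: array<int>
{code}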
[jira] [Closed] (SPARK-21307) Remove SQLConf parameters from the parser-related classes.
[ https://issues.apache.org/jira/browse/SPARK-21307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-21307. --- Resolution: Won't Fix > Remove SQLConf parameters from the parser-related classes. > -- > > Key: SPARK-21307 > URL: https://issues.apache.org/jira/browse/SPARK-21307 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Xiao Li > > Remove SQLConf parameters from the parser-related classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21350) Fix the error message when the number of arguments is wrong when invoking a UDF
Xiao Li created SPARK-21350: --- Summary: Fix the error message when the number of arguments is wrong when invoking a UDF Key: SPARK-21350 URL: https://issues.apache.org/jira/browse/SPARK-21350 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.1, 2.0.2, 2.2.0 Reporter: Xiao Li Assignee: Xiao Li The error message is confusing when a UDF is invoked with the wrong number of arguments. {noformat} val df = spark.emptyDataFrame spark.udf.register("foo", (_: String).length) df.selectExpr("foo(2, 3, 4)") {noformat} {noformat} org.apache.spark.sql.UDFSuite$$anonfun$9$$anonfun$apply$mcV$sp$12 cannot be cast to scala.Function3 java.lang.ClassCastException: org.apache.spark.sql.UDFSuite$$anonfun$9$$anonfun$apply$mcV$sp$12 cannot be cast to scala.Function3 at org.apache.spark.sql.catalyst.expressions.ScalaUDF.<init>(ScalaUDF.scala:109) {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21354) INPUT FILE related functions do not support more than one sources
Xiao Li created SPARK-21354: --- Summary: INPUT FILE related functions do not support more than one sources Key: SPARK-21354 URL: https://issues.apache.org/jira/browse/SPARK-21354 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.1, 2.0.2, 2.2.0 Reporter: Xiao Li Assignee: Xiao Li {noformat} hive> select *, INPUT__FILE__NAME FROM t1, t2; FAILED: SemanticException Column INPUT__FILE__NAME Found in more than One Tables/Subqueries {noformat} The built-in functions {{input_file_name}}, {{input_file_block_start}}, and {{input_file_block_length}} do not support queries that read from more than one source; Hive rejects such queries outright, as the error above shows -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21354) INPUT FILE related functions do not support more than one sources
[ https://issues.apache.org/jira/browse/SPARK-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21354: Description: {noformat} hive> select *, INPUT__FILE__NAME FROM t1, t2; FAILED: SemanticException Column INPUT__FILE__NAME Found in more than One Tables/Subqueries {noformat} The built-in functions {{input_file_name}}, {{input_file_block_start}}, and {{input_file_block_length}} do not support queries that read from more than one source; Hive rejects such queries outright, as the error above shows. Currently, Spark does not block such queries, and the outputs are ambiguous. was: {noformat} hive> select *, INPUT__FILE__NAME FROM t1, t2; FAILED: SemanticException Column INPUT__FILE__NAME Found in more than One Tables/Subqueries {noformat} The built-in functions {{input_file_name}}, {{input_file_block_start}}, and {{input_file_block_length}} do not support queries that read from more than one source; Hive rejects such queries outright, as the error above shows > INPUT FILE related functions do not support more than one sources > - > > Key: SPARK-21354 > URL: https://issues.apache.org/jira/browse/SPARK-21354 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > > {noformat} > hive> select *, INPUT__FILE__NAME FROM t1, t2; > FAILED: SemanticException Column INPUT__FILE__NAME Found in more than One > Tables/Subqueries > {noformat} > The built-in functions {{input_file_name}}, {{input_file_block_start}}, and > {{input_file_block_length}} do not support queries that read from more than > one source; Hive rejects such queries outright, as the error above shows. > Currently, Spark does not block such queries, and the outputs are ambiguous. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
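To make the ambiguity concrete, a hedged sketch (the paths are hypothetical): with two file-backed relations in one query, there is no single answer to which input file a joined row came from.

{code}
import org.apache.spark.sql.functions.input_file_name

// Two distinct file-backed sources in one query plan (hypothetical paths).
val t1 = spark.read.parquet("/data/t1")
val t2 = spark.read.parquet("/data/t2")

// Each output row is assembled from a file of t1 AND a file of t2, so a
// single input_file_name() per row is inherently ambiguous.
t1.crossJoin(t2).select(input_file_name()).show(false)
{code}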
[jira] [Resolved] (SPARK-21272) SortMergeJoin LeftAnti does not update numOutputRows
[ https://issues.apache.org/jira/browse/SPARK-21272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21272. - Resolution: Fixed Assignee: Juliusz Sompolski Fix Version/s: 2.3.0 2.2.1 > SortMergeJoin LeftAnti does not update numOutputRows > > > Key: SPARK-21272 > URL: https://issues.apache.org/jira/browse/SPARK-21272 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Trivial > Fix For: 2.2.1, 2.3.0 > > > The output rows metric is not updated in one of the code branches. > A PR is pending. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
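A hedged sketch of how the metric can be observed from a spark-shell; the DataFrames and values are illustrative, and broadcasting is disabled so the plan actually uses a sort-merge join.

{code}
import org.apache.spark.sql.execution.joins.SortMergeJoinExec
import spark.implicits._

// Force a SortMergeJoin rather than a broadcast join for this tiny input.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

val left  = Seq(1, 2, 3).toDF("id")
val right = Seq(2).toDF("id")
val antiJoined = left.join(right, Seq("id"), "left_anti")
antiJoined.collect()  // produces rows for id = 1 and id = 3

// Before the fix, "number of output rows" on the SortMergeJoinExec node
// could stay at 0 even though the LeftAnti branch emitted rows.
antiJoined.queryExecution.executedPlan.collect {
  case j: SortMergeJoinExec => println(j.metrics("numOutputRows").value)
}
{code}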
[jira] [Updated] (SPARK-21079) ANALYZE TABLE fails to calculate totalSize for a partitioned table
[ https://issues.apache.org/jira/browse/SPARK-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21079: Labels: (was: easyfix) > ANALYZE TABLE fails to calculate totalSize for a partitioned table > -- > > Key: SPARK-21079 > URL: https://issues.apache.org/jira/browse/SPARK-21079 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.1 >Reporter: Maria >Assignee: Maria > Fix For: 2.2.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > ANALYZE TABLE table COMPUTE STATISTICS invoked on a partitioned table produces > totalSize = 0. > AnalyzeTableCommand fetches the table-level storage URI and recursively > calculates the total size of the files in the corresponding directory. > However, for partitioned tables, each partition has its own storage URI, > which may not be a subdirectory of the table-level storage URI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
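A hedged reproduction sketch, assuming a Hive-enabled session; the table name and locations are hypothetical. The key ingredient is a partition whose location is not under the table's own directory.

{code}
// A partition stored OUTSIDE the table-level directory (hypothetical paths).
spark.sql("CREATE TABLE sales (amount INT) PARTITIONED BY (dt STRING)")
spark.sql("ALTER TABLE sales ADD PARTITION (dt = '2017-06-01') " +
  "LOCATION '/data/elsewhere/sales/dt=2017-06-01'")

// Before the fix, this summed only the files under the table-level URI,
// so externally located partitions contributed nothing to totalSize.
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS")
spark.sql("DESCRIBE FORMATTED sales").show(100, false)
{code}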
[jira] [Updated] (SPARK-21059) LikeSimplification can NPE on null pattern
[ https://issues.apache.org/jira/browse/SPARK-21059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21059: Fix Version/s: (was: 2.3.0) > LikeSimplification can NPE on null pattern > -- > > Key: SPARK-21059 > URL: https://issues.apache.org/jira/browse/SPARK-21059 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.2.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
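The ticket carries no description, so here is a hedged guess at a trigger, assuming a table {{t}} with a string column {{s}}: {{LikeSimplification}} rewrites trivial LIKE patterns during optimization by inspecting the pattern string, and a NULL pattern literal leaves no string to inspect.

{code}
// Hypothetical trigger: a NULL pattern reaching the LikeSimplification
// optimizer rule, which dereferenced the pattern without a null check.
spark.sql("SELECT s FROM t WHERE s LIKE NULL").collect()
{code}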
[jira] [Updated] (SPARK-20920) ForkJoinPool pools are leaked when writing hive tables with many partitions
[ https://issues.apache.org/jira/browse/SPARK-20920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20920: Fix Version/s: (was: 2.3.0) > ForkJoinPool pools are leaked when writing hive tables with many partitions > --- > > Key: SPARK-20920 > URL: https://issues.apache.org/jira/browse/SPARK-20920 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1 >Reporter: Rares Mirica >Assignee: Sean Owen > Fix For: 2.1.2, 2.2.0 > > > This bug is loosely related to SPARK-17396. > In this case it happens when writing to a hive table with many partitions > (my table is partitioned by hour and stores data it gets from kafka in a > spark streaming application): > df.repartition() > .write > .format("orc") > .option("path", s"$tablesStoragePath/$tableName") > .mode(SaveMode.Append) > .partitionBy("dt", "hh") > .saveAsTable(tableName) > As this table grows beyond a certain size, ForkJoinPool instances start > leaking. Upon examination (with a debugger) I found that the caller is > AlterTableRecoverPartitionsCommand and the problem happens when > `evalTaskSupport` is used (line 555). I have tried setting a very large > threshold via `spark.rdd.parallelListingThreshold` and the problem went away. > My assumption is that the problem happens here and not in SPARK-17396 because > AlterTableRecoverPartitionsCommand is a case class, while UnionRDD is an > object; only one instance of an object can exist, so there is no leak there. > Regards, > Rares -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
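The reporter's mitigation, sketched with an illustrative value: a large {{spark.rdd.parallelListingThreshold}} keeps the command on the serial listing path, so no ForkJoinPool is created per invocation. The property is read from the SparkConf, so it is set before the context starts (for example at submit time).

{code}
// Mitigation sketch from the report (value is illustrative), e.g.:
//   spark-submit --conf spark.rdd.parallelListingThreshold=10000000 ...
// or programmatically, before the SparkContext is created:
import org.apache.spark.SparkConf
val conf = new SparkConf().set("spark.rdd.parallelListingThreshold", "10000000")
{code}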
[jira] [Resolved] (SPARK-21043) Add unionByName API to Dataset
[ https://issues.apache.org/jira/browse/SPARK-21043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21043. - Resolution: Fixed Assignee: Takeshi Yamamuro Fix Version/s: 2.3.0 > Add unionByName API to Dataset > -- > > Key: SPARK-21043 > URL: https://issues.apache.org/jira/browse/SPARK-21043 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: Reynold Xin >Assignee: Takeshi Yamamuro > Fix For: 2.3.0 > > > It would be useful to add unionByName which resolves columns by name, in > addition to the existing union (which resolves by position). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
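A minimal sketch of the difference between the two resolution modes, using illustrative DataFrames:

{code}
import spark.implicits._

val df1 = Seq((0, 1, 2)).toDF("a", "b", "c")
val df2 = Seq((3, 4, 5)).toDF("c", "a", "b")

df1.union(df2).show()        // by position: the second row lands as a=3, b=4, c=5
df1.unionByName(df2).show()  // by name: the second row lands as a=4, b=5, c=3
{code}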
[jira] [Assigned] (SPARK-19285) Java - Provide user-defined function of 0 arguments (UDF0)
[ https://issues.apache.org/jira/browse/SPARK-19285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-19285: --- Assignee: Xiao Li > Java - Provide user-defined function of 0 arguments (UDF0) > -- > > Key: SPARK-19285 > URL: https://issues.apache.org/jira/browse/SPARK-19285 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Amit Baghel >Assignee: Xiao Li >Priority: Minor > > I need to implement a zero-argument UDF, but the Spark Java API doesn't > provide UDF0. > https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java > As a workaround, I am creating a UDF1 with one argument and not using that > argument. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19285) Java - Provide user-defined function of 0 arguments (UDF0)
[ https://issues.apache.org/jira/browse/SPARK-19285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-19285: Component/s: (was: Java API) SQL > Java - Provide user-defined function of 0 arguments (UDF0) > -- > > Key: SPARK-19285 > URL: https://issues.apache.org/jira/browse/SPARK-19285 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Amit Baghel >Priority: Minor > > I need to implement a zero-argument UDF, but the Spark Java API doesn't > provide UDF0. > https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java > As a workaround, I am creating a UDF1 with one argument and not using that > argument. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
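For context, Scala registration already accepts a zero-argument function, which is the parity UDF0 brings to the Java API; a minimal sketch, with an illustrative function name:

{code}
// Register a 0-arg UDF from Scala. SPARK-19285 adds the equivalent UDF0
// interface so Java callers no longer need a dummy UDF1 argument.
spark.udf.register("now_ms", () => System.currentTimeMillis())
spark.sql("SELECT now_ms()").show()
{code}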
[jira] [Assigned] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError
[ https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-18598: --- Assignee: Xiao Li > Encoding a Java Bean with extra accessors, produces inconsistent Dataset, > resulting in AssertionError > - > > Key: SPARK-18598 > URL: https://issues.apache.org/jira/browse/SPARK-18598 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Hamish Morgan >Assignee: Xiao Li >Priority: Minor > Fix For: 2.3.0 > > > Most operations of {{org.apache.spark.sql.Dataset}} throw > {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java > bean {{Encoder}}, where the bean has more accessors than properties. > The following unit test demonstrates the steps to replicate: > {code} > import org.apache.spark.sql.Dataset; > import org.apache.spark.sql.Encoder; > import org.apache.spark.sql.Encoders; > import org.apache.spark.sql.SparkSession; > import org.junit.Test; > import org.xml.sax.SAXException; > import java.io.IOException; > import static java.util.Collections.singletonList; > public class SparkBeanEncoderTest { > public static class TestBean2 { > private String name; > public void setName(String name) { > this.name = name; > } > public String getName() { > return name; > } > public String getName2() { > return name.toLowerCase(); > } > } > @Test > public void testCreateDatasetFromBeanFailure() throws IOException, > SAXException { > SparkSession spark = SparkSession > .builder() > .master("local") > .getOrCreate(); > TestBean2 bean = new TestBean2(); > bean.setName("testing123"); > Encoder<TestBean2> encoder = Encoders.bean(TestBean2.class); > Dataset<TestBean2> dataset = spark.createDataset(singletonList(bean), > encoder); > dataset.show(); > spark.stop(); > } > } > {code} > Running the above produces the following output: > {code} > 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2 > 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 16/11/27 14:00:04 WARN Utils: Your hostname, resolves to a loopback > address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0) > 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to > another address > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(); groups > with view permissions: Set(); users with modify permissions: Set(); > groups with modify permissions: Set() > 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on > port 34688. > 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker > 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster > 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at > /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29 > 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB > 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator > 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port > 4040. 
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at > http://192.168.1.68:4040 > 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost > 16/11/27 14:00:05 INFO Utils: Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688. > 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on > 192.168.1.68:42688 > 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager > 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, > 42688) > 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some > configuration may not take effect. > 16/11/27 14:00:05 INFO SharedState: Warehouse path is > 'file:/home/hamish/git/language-identifier/wikidump/spark-warehouse'. > 16/11/27
[jira] [Reopened] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError
[ https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reopened SPARK-18598: - > Encoding a Java Bean with extra accessors, produces inconsistent Dataset, > resulting in AssertionError > - > > Key: SPARK-18598 > URL: https://issues.apache.org/jira/browse/SPARK-18598 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Hamish Morgan >Assignee: Xiao Li >Priority: Minor > > Most operations of {{org.apache.spark.sql.Dataset}} throw > {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java > bean {{Encoder}}, where the bean has more accessors than properties. > The following unit test demonstrates the steps to replicate: > {code} > import org.apache.spark.sql.Dataset; > import org.apache.spark.sql.Encoder; > import org.apache.spark.sql.Encoders; > import org.apache.spark.sql.SparkSession; > import org.junit.Test; > import org.xml.sax.SAXException; > import java.io.IOException; > import static java.util.Collections.singletonList; > public class SparkBeanEncoderTest { > public static class TestBean2 { > private String name; > public void setName(String name) { > this.name = name; > } > public String getName() { > return name; > } > public String getName2() { > return name.toLowerCase(); > } > } > @Test > public void testCreateDatasetFromBeanFailure() throws IOException, > SAXException { > SparkSession spark = SparkSession > .builder() > .master("local") > .getOrCreate(); > TestBean2 bean = new TestBean2(); > bean.setName("testing123"); > Encoder<TestBean2> encoder = Encoders.bean(TestBean2.class); > Dataset<TestBean2> dataset = spark.createDataset(singletonList(bean), > encoder); > dataset.show(); > spark.stop(); > } > } > {code} > Running the above produces the following output: > {code} > 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2 > 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 16/11/27 14:00:04 WARN Utils: Your hostname, resolves to a loopback > address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0) > 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to > another address > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(); groups > with view permissions: Set(); users with modify permissions: Set(); > groups with modify permissions: Set() > 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on > port 34688. > 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker > 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster > 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at > /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29 > 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB > 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator > 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port > 4040. 
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at > http://192.168.1.68:4040 > 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost > 16/11/27 14:00:05 INFO Utils: Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688. > 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on > 192.168.1.68:42688 > 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager > 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, > 42688) > 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some > configuration may not take effect. > 16/11/27 14:00:05 INFO SharedState: Warehouse path is > 'file:/home/hamish/git/language-identifier/wikidump/spark-warehouse'. > 16/11/27 14:00:05 INFO CodeGenerator: Code generated in 166.762154
[jira] [Closed] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError
[ https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-18598. --- Resolution: Unresolved Fix Version/s: (was: 2.3.0) > Encoding a Java Bean with extra accessors, produces inconsistent Dataset, > resulting in AssertionError > - > > Key: SPARK-18598 > URL: https://issues.apache.org/jira/browse/SPARK-18598 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Hamish Morgan >Priority: Minor > > Most operations of {{org.apache.spark.sql.Dataset}} throw > {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java > bean {{Encoder}}, where the bean has more accessors than properties. > The following unit test demonstrates the steps to replicate: > {code} > import org.apache.spark.sql.Dataset; > import org.apache.spark.sql.Encoder; > import org.apache.spark.sql.Encoders; > import org.apache.spark.sql.SparkSession; > import org.junit.Test; > import org.xml.sax.SAXException; > import java.io.IOException; > import static java.util.Collections.singletonList; > public class SparkBeanEncoderTest { > public static class TestBean2 { > private String name; > public void setName(String name) { > this.name = name; > } > public String getName() { > return name; > } > public String getName2() { > return name.toLowerCase(); > } > } > @Test > public void testCreateDatasetFromBeanFailure() throws IOException, > SAXException { > SparkSession spark = SparkSession > .builder() > .master("local") > .getOrCreate(); > TestBean2 bean = new TestBean2(); > bean.setName("testing123"); > Encoder<TestBean2> encoder = Encoders.bean(TestBean2.class); > Dataset<TestBean2> dataset = spark.createDataset(singletonList(bean), > encoder); > dataset.show(); > spark.stop(); > } > } > {code} > Running the above produces the following output: > {code} > 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2 > 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 16/11/27 14:00:04 WARN Utils: Your hostname, resolves to a loopback > address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0) > 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to > another address > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(); groups > with view permissions: Set(); users with modify permissions: Set(); > groups with modify permissions: Set() > 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on > port 34688. > 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker > 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster > 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at > /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29 > 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB > 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator > 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port > 4040. 
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at > http://192.168.1.68:4040 > 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost > 16/11/27 14:00:05 INFO Utils: Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688. > 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on > 192.168.1.68:42688 > 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager > 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, > 42688) > 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some > configuration may not take effect. > 16/11/27 14:00:05 INFO SharedState: Warehouse path is > 'file:/home/hamish/git/language-identifier/wikidump/spark-warehouse'. > 16/11/27 14:00:05 INFO CodeGenerator
[jira] [Updated] (SPARK-19285) Java - Provide user-defined function of 0 arguments (UDF0)
[ https://issues.apache.org/jira/browse/SPARK-19285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-19285: Priority: Major (was: Minor) > Java - Provide user-defined function of 0 arguments (UDF0) > -- > > Key: SPARK-19285 > URL: https://issues.apache.org/jira/browse/SPARK-19285 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Amit Baghel >Assignee: Xiao Li > Fix For: 2.3.0 > > > I need to implement a zero-argument UDF, but the Spark Java API doesn't > provide UDF0. > https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java > As a workaround, I am creating a UDF1 with one argument and not using that > argument. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError
[ https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-18598. - Resolution: Fixed Fix Version/s: 2.3.0 > Encoding a Java Bean with extra accessors, produces inconsistent Dataset, > resulting in AssertionError > - > > Key: SPARK-18598 > URL: https://issues.apache.org/jira/browse/SPARK-18598 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Hamish Morgan >Assignee: Xiao Li >Priority: Minor > Fix For: 2.3.0 > > > Most operations of {{org.apache.spark.sql.Dataset}} throw > {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java > bean {{Encoder}}, where the bean has more accessors than properties. > The following unit test demonstrates the steps to replicate: > {code} > import org.apache.spark.sql.Dataset; > import org.apache.spark.sql.Encoder; > import org.apache.spark.sql.Encoders; > import org.apache.spark.sql.SparkSession; > import org.junit.Test; > import org.xml.sax.SAXException; > import java.io.IOException; > import static java.util.Collections.singletonList; > public class SparkBeanEncoderTest { > public static class TestBean2 { > private String name; > public void setName(String name) { > this.name = name; > } > public String getName() { > return name; > } > public String getName2() { > return name.toLowerCase(); > } > } > @Test > public void testCreateDatasetFromBeanFailure() throws IOException, > SAXException { > SparkSession spark = SparkSession > .builder() > .master("local") > .getOrCreate(); > TestBean2 bean = new TestBean2(); > bean.setName("testing123"); > Encoder<TestBean2> encoder = Encoders.bean(TestBean2.class); > Dataset<TestBean2> dataset = spark.createDataset(singletonList(bean), > encoder); > dataset.show(); > spark.stop(); > } > } > {code} > Running the above produces the following output: > {code} > 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2 > 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 16/11/27 14:00:04 WARN Utils: Your hostname, resolves to a loopback > address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0) > 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to > another address > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(); groups > with view permissions: Set(); users with modify permissions: Set(); > groups with modify permissions: Set() > 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on > port 34688. > 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker > 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster > 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at > /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29 > 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB > 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator > 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port > 4040. 
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at > http://192.168.1.68:4040 > 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost > 16/11/27 14:00:05 INFO Utils: Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688. > 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on > 192.168.1.68:42688 > 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager > 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, > 42688) > 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some > configuration may not take effect. > 16/11/27 14:00:05 INFO SharedState: Warehouse path is > 'file:/home/hamish/git/language-identifier/wikidump/spark-
[jira] [Assigned] (SPARK-18598) Encoding a Java Bean with extra accessors, produces inconsistent Dataset, resulting in AssertionError
[ https://issues.apache.org/jira/browse/SPARK-18598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-18598: --- Assignee: (was: Xiao Li) > Encoding a Java Bean with extra accessors, produces inconsistent Dataset, > resulting in AssertionError > - > > Key: SPARK-18598 > URL: https://issues.apache.org/jira/browse/SPARK-18598 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Hamish Morgan >Priority: Minor > > Most operations of {{org.apache.spark.sql.Dataset}} throw > {{java.lang.AssertionError}} when the {{Dataset}} was created with a Java > bean {{Encoder}}, where the bean has more accessors than properties. > The following unit test demonstrates the steps to replicate: > {code} > import org.apache.spark.sql.Dataset; > import org.apache.spark.sql.Encoder; > import org.apache.spark.sql.Encoders; > import org.apache.spark.sql.SparkSession; > import org.junit.Test; > import org.xml.sax.SAXException; > import java.io.IOException; > import static java.util.Collections.singletonList; > public class SparkBeanEncoderTest { > public static class TestBean2 { > private String name; > public void setName(String name) { > this.name = name; > } > public String getName() { > return name; > } > public String getName2() { > return name.toLowerCase(); > } > } > @Test > public void testCreateDatasetFromBeanFailure() throws IOException, > SAXException { > SparkSession spark = SparkSession > .builder() > .master("local") > .getOrCreate(); > TestBean2 bean = new TestBean2(); > bean.setName("testing123"); > Encoder<TestBean2> encoder = Encoders.bean(TestBean2.class); > Dataset<TestBean2> dataset = spark.createDataset(singletonList(bean), > encoder); > dataset.show(); > spark.stop(); > } > } > {code} > Running the above produces the following output: > {code} > 16/11/27 14:00:04 INFO SparkContext: Running Spark version 2.0.2 > 16/11/27 14:00:04 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 16/11/27 14:00:04 WARN Utils: Your hostname, resolves to a loopback > address: 127.0.1.1; using 192.168.1.68 instead (on interface eth0) > 16/11/27 14:00:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to > another address > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls to: > 16/11/27 14:00:04 INFO SecurityManager: Changing view acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: Changing modify acls groups to: > 16/11/27 14:00:04 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(); groups > with view permissions: Set(); users with modify permissions: Set(); > groups with modify permissions: Set() > 16/11/27 14:00:05 INFO Utils: Successfully started service 'sparkDriver' on > port 34688. > 16/11/27 14:00:05 INFO SparkEnv: Registering MapOutputTracker > 16/11/27 14:00:05 INFO SparkEnv: Registering BlockManagerMaster > 16/11/27 14:00:05 INFO DiskBlockManager: Created local directory at > /tmp/blockmgr-0ae3a00f-eb46-4be2-8ece-1873f3db1a29 > 16/11/27 14:00:05 INFO MemoryStore: MemoryStore started with capacity 3.0 GB > 16/11/27 14:00:05 INFO SparkEnv: Registering OutputCommitCoordinator > 16/11/27 14:00:05 INFO Utils: Successfully started service 'SparkUI' on port > 4040. 
> 16/11/27 14:00:05 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at > http://192.168.1.68:4040 > 16/11/27 14:00:05 INFO Executor: Starting executor ID driver on host localhost > 16/11/27 14:00:05 INFO Utils: Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42688. > 16/11/27 14:00:05 INFO NettyBlockTransferService: Server created on > 192.168.1.68:42688 > 16/11/27 14:00:05 INFO BlockManagerMaster: Registering BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 INFO BlockManagerMasterEndpoint: Registering block manager > 192.168.1.68:42688 with 3.0 GB RAM, BlockManagerId(driver, 192.168.1.68, > 42688) > 16/11/27 14:00:05 INFO BlockManagerMaster: Registered BlockManager > BlockManagerId(driver, 192.168.1.68, 42688) > 16/11/27 14:00:05 WARN SparkContext: Use an existing SparkContext, some > configuration may not take effect. > 16/11/27 14:00:05 INFO SharedState: Warehouse path is > 'file:/home/hamish/git/language-identifier/wikidump/spark-warehouse'. > 16/11/27 14:00:05 INFO CodeGenerator: Code generated in 166.
[jira] [Resolved] (SPARK-19285) Java - Provide user-defined function of 0 arguments (UDF0)
[ https://issues.apache.org/jira/browse/SPARK-19285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-19285. - Resolution: Fixed Fix Version/s: 2.3.0 > Java - Provide user-defined function of 0 arguments (UDF0) > -- > > Key: SPARK-19285 > URL: https://issues.apache.org/jira/browse/SPARK-19285 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Amit Baghel >Assignee: Xiao Li > Fix For: 2.3.0 > > > I need to implement a zero-argument UDF, but the Spark Java API doesn't > provide UDF0. > https://github.com/apache/spark/tree/master/sql/core/src/main/java/org/apache/spark/sql/api/java > As a workaround, I am creating a UDF1 with one argument and not using that > argument. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21426) Fix test failure due to unsupported hex literals.
Xiao Li created SPARK-21426: --- Summary: Fix test failure due to unsupported hex literals. Key: SPARK-21426 URL: https://issues.apache.org/jira/browse/SPARK-21426 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 2.0.2 Reporter: Xiao Li Assignee: Xiao Li Spark 2.0 does not support hex literals. Thus, the test case failed after backporting https://github.com/apache/spark/pull/18571 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21426) Fix test failure due to unsupported hex literals.
[ https://issues.apache.org/jira/browse/SPARK-21426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21426. - Resolution: Fixed Fix Version/s: 2.0.3 > Fix test failure due to unsupported hex literals. > -- > > Key: SPARK-21426 > URL: https://issues.apache.org/jira/browse/SPARK-21426 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 2.0.2 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.3 > > > Spark 2.0 does not support hex literals. Thus, the test case failed after > backporting https://github.com/apache/spark/pull/18571 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21426) Fix test failure due to unsupported hex literals.
[ https://issues.apache.org/jira/browse/SPARK-21426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21426: Affects Version/s: (was: 2.0.2) 2.0.3 > Fix test failure due to unsupported hex literals. > -- > > Key: SPARK-21426 > URL: https://issues.apache.org/jira/browse/SPARK-21426 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 2.0.3 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.3 > > > Spark 2.0 does not support hex literals. Thus, the test case failed after > backporting https://github.com/apache/spark/pull/18571 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
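For context, a hedged sketch of the literal form involved; an X'...' hexadecimal (binary) literal parses on branches that support it, while Spark 2.0's parser rejects it, which is why the backported test needed adjusting.

{code}
// Parses as a binary literal on 2.1+; fails to parse on Spark 2.0.
spark.sql("SELECT X'1F'").show()
{code}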
[jira] [Assigned] (SPARK-21344) BinaryType comparison does signed byte array comparison
[ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-21344: --- Assignee: Kazuaki Ishizaki > BinaryType comparison does signed byte array comparison > --- > > Key: SPARK-21344 > URL: https://issues.apache.org/jira/browse/SPARK-21344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.1 >Reporter: Shubham Chopra >Assignee: Kazuaki Ishizaki > Fix For: 2.0.3, 2.1.2, 2.2.1 > > > BinaryType used by Spark SQL defines ordering using signed byte comparisons. > This can lead to unexpected behavior. Consider the following code snippet > that shows this error: > {code} > case class TestRecord(col0: Array[Byte]) > def convertToBytes(i: Long): Array[Byte] = { > val bb = java.nio.ByteBuffer.allocate(8) > bb.putLong(i) > bb.array > } > def test = { > val sql = spark.sqlContext > import sql.implicits._ > val timestamp = 1498772083037L > val data = (timestamp to timestamp + 1000L).map(i => > TestRecord(convertToBytes(i))) > val testDF = sc.parallelize(data).toDF > val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && > col("col0") < convertToBytes(timestamp + 50L)) > val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + > 50L) && col("col0") < convertToBytes(timestamp + 100L)) > val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && > col("col0") < convertToBytes(timestamp + 100L)) > assert(filter1.count == 50) > assert(filter2.count == 50) > assert(filter3.count == 100) > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21344) BinaryType comparison does signed byte array comparison
[ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21344. - Resolution: Fixed Fix Version/s: 2.2.1 2.1.2 2.0.3 > BinaryType comparison does signed byte array comparison > --- > > Key: SPARK-21344 > URL: https://issues.apache.org/jira/browse/SPARK-21344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.1 >Reporter: Shubham Chopra >Assignee: Kazuaki Ishizaki > Fix For: 2.0.3, 2.1.2, 2.2.1 > > > BinaryType used by Spark SQL defines ordering using signed byte comparisons. > This can lead to unexpected behavior. Consider the following code snippet > that shows this error: > {code} > case class TestRecord(col0: Array[Byte]) > def convertToBytes(i: Long): Array[Byte] = { > val bb = java.nio.ByteBuffer.allocate(8) > bb.putLong(i) > bb.array > } > def test = { > val sql = spark.sqlContext > import sql.implicits._ > val timestamp = 1498772083037L > val data = (timestamp to timestamp + 1000L).map(i => > TestRecord(convertToBytes(i))) > val testDF = sc.parallelize(data).toDF > val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && > col("col0") < convertToBytes(timestamp + 50L)) > val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + > 50L) && col("col0") < convertToBytes(timestamp + 100L)) > val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && > col("col0") < convertToBytes(timestamp + 100L)) > assert(filter1.count == 50) > assert(filter2.count == 50) > assert(filter3.count == 100) > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
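The root cause in one hedged sketch: JVM bytes are signed, so a signed lexicographic comparison orders bytes with the high bit set (0x80 and above) before 0x7f, which breaks range filters over byte-encoded longs like the timestamps in the report. The code below is illustrative, not Spark's internals.

{code}
val a: Byte = 0x7f.toByte  // 127 signed
val b: Byte = 0x80.toByte  // -128 signed, but 128 as an unsigned magnitude

// Signed comparison (the buggy ordering): 0x80 sorts BEFORE 0x7f.
println(java.lang.Byte.compare(a, b) > 0)  // true: 127 > -128 signed

// Unsigned comparison (the ordering byte-encoded values need):
// 0x80 sorts AFTER 0x7f.
println(java.lang.Byte.toUnsignedInt(a) < java.lang.Byte.toUnsignedInt(b))  // true: 127 < 128
{code}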