[jira] [Created] (SPARK-31998) Change package references for ArrowBuf

2020-06-15 Thread Liya Fan (Jira)
Liya Fan created SPARK-31998:


 Summary: Change package references for ArrowBuf
 Key: SPARK-31998
 URL: https://issues.apache.org/jira/browse/SPARK-31998
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Liya Fan


Recently, we moved the ArrowBuf class from the io.netty.buffer package to 
org.apache.arrow.memory. So after upgrading the Arrow library, we need to update 
the references to ArrowBuf to use the new package name.
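
A minimal sketch of the kind of change this implies (the object and method below 
are illustrative, not part of the actual patch; only the import changes):
{code:scala}
// Before the Arrow upgrade, ArrowBuf was exposed under a Netty package:
// import io.netty.buffer.ArrowBuf

// After the upgrade, the class lives in Arrow's own memory package:
import org.apache.arrow.memory.ArrowBuf

object ArrowBufUsageExample {
  // Call sites like this only need the package reference updated;
  // the buffer is used exactly as before.
  def describe(buf: ArrowBuf): String = s"capacity=${buf.capacity()}"
}
{code}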



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31986) Test failure RebaseDateTimeSuite."optimization of micros rebasing - Julian to Gregorian"

2020-06-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31986:
---

Assignee: Maxim Gekk

> Test failure RebaseDateTimeSuite."optimization of micros rebasing - Julian to 
> Gregorian"
> 
>
> Key: SPARK-31986
> URL: https://issues.apache.org/jira/browse/SPARK-31986
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The test fails on 1945-09-14 23:30:00.0. The failure can be reproduced by 
> modifying the test:
> {code:scala}
>   test("optimization of micros rebasing - Julian to Gregorian") {
>     outstandingZoneIds.filter(_.getId.contains("Hong")).foreach { zid =>
>       withClue(s"zone id = $zid") {
>         withDefaultTimeZone(zid) {
>           val start = rebaseGregorianToJulianMicros(
>             instantToMicros(LocalDateTime.of(1, 1, 1, 0, 0, 0).atZone(zid).toInstant))
>           val end = rebaseGregorianToJulianMicros(
>             instantToMicros(LocalDateTime.of(2100, 1, 1, 0, 0, 0).atZone(zid).toInstant))
>           var micros = -7612110L
>           do {
>             val rebased = rebaseJulianToGregorianMicros(zid, micros)
>             val rebasedAndOptimized = rebaseJulianToGregorianMicros(micros)
>             assert(rebasedAndOptimized === rebased)
>             micros += (MICROS_PER_DAY * 30 * (0.5 + Math.random())).toLong
>           } while (micros <= end)
>         }
>       }
>     }
>   }
> {code}
> {code}
> zone id = Asia/Hong_Kong -7612110 did not equal -7612074
> ScalaTestFailureLocation: 
> org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite at 
> (RebaseDateTimeSuite.scala:236)
> Expected :-7612074
> Actual   :zone id = Asia/Hong_Kong -7612110
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31986) Test failure RebaseDateTimeSuite."optimization of micros rebasing - Julian to Gregorian"

2020-06-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31986.
-
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 28816
[https://github.com/apache/spark/pull/28816]

> Test failure RebaseDateTimeSuite."optimization of micros rebasing - Julian to 
> Gregorian"
> 
>
> Key: SPARK-31986
> URL: https://issues.apache.org/jira/browse/SPARK-31986
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> The test fails on 1945-09-14 23:30:00.0. The failure can be reproduced by 
> modifying the test:
> {code:scala}
>   test("optimization of micros rebasing - Julian to Gregorian") {
>     outstandingZoneIds.filter(_.getId.contains("Hong")).foreach { zid =>
>       withClue(s"zone id = $zid") {
>         withDefaultTimeZone(zid) {
>           val start = rebaseGregorianToJulianMicros(
>             instantToMicros(LocalDateTime.of(1, 1, 1, 0, 0, 0).atZone(zid).toInstant))
>           val end = rebaseGregorianToJulianMicros(
>             instantToMicros(LocalDateTime.of(2100, 1, 1, 0, 0, 0).atZone(zid).toInstant))
>           var micros = -7612110L
>           do {
>             val rebased = rebaseJulianToGregorianMicros(zid, micros)
>             val rebasedAndOptimized = rebaseJulianToGregorianMicros(micros)
>             assert(rebasedAndOptimized === rebased)
>             micros += (MICROS_PER_DAY * 30 * (0.5 + Math.random())).toLong
>           } while (micros <= end)
>         }
>       }
>     }
>   }
> {code}
> {code}
> zone id = Asia/Hong_Kong -7612110 did not equal -7612074
> ScalaTestFailureLocation: 
> org.apache.spark.sql.catalyst.util.RebaseDateTimeSuite at 
> (RebaseDateTimeSuite.scala:236)
> Expected :-7612074
> Actual   :zone id = Asia/Hong_Kong -7612110
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31997) Should drop test_udtf table when SingleSessionSuite completed

2020-06-15 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-31997:
-
Description: 
If we run mvn test on SingleSessionSuite and then HiveThriftBinaryServerSuite in 
that order, the test case "SPARK-11595 ADD JAR with input path having URL scheme" 
in HiveThriftBinaryServerSuite will fail as follows:

 
{code:java}
- SPARK-11595 ADD JAR with input path having URL scheme *** FAILED *** 
java.sql.SQLException: Error running query: 
org.apache.spark.sql.AnalysisException: Can not create the managed 
table('`default`.`test_udtf`'). The associated 
location('file:/home/yarn/spark_ut/spark_ut/baidu/inf-spark/spark-source/sql/hive-thriftserver/spark-warehouse/test_udtf')
 already exists.; at 
org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
 at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65(HiveThriftServer2Suites.scala:603)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65$adapted(HiveThriftServer2Suites.scala:603)
 at scala.collection.immutable.List.foreach(List.scala:392) at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63(HiveThriftServer2Suites.scala:603)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63$adapted(HiveThriftServer2Suites.scala:573)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3(HiveThriftServer2Suites.scala:1074)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3$adapted(HiveThriftServer2Suites.scala:1074)
 at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
{code}
 

This happens because SingleSessionSuite runs `create table test_udtf` and does not 
drop the table when the test completes, while HiveThriftBinaryServerSuite then 
tries to re-create it.

If we run mvn test on HiveThriftBinaryServerSuite and then SingleSessionSuite in 
that order, both test suites succeed, but we shouldn't rely on their execution 
order.

 

  was:
If we execute mvn test SingleSessionSuite and HiveThriftBinaryServerSuite in 
order,  the test case "SPARK-11595 ADD JAR with input path having URL scheme"  
in HiveThriftBinaryServerSuite will failed as following:

 
{code:java}
- SPARK-11595 ADD JAR with input path having URL scheme *** FAILED *** 
java.sql.SQLException: Error running query: 
org.apache.spark.sql.AnalysisException: Can not create the managed 
table('`default`.`test_udtf`'). The associated 
location('file:/home/yarn/spark_ut/spark_ut/baidu/inf-spark/spark-source/sql/hive-thriftserver/spark-warehouse/test_udtf')
 already exists.; at 
org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
 at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65(HiveThriftServer2Suites.scala:603)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65$adapted(HiveThriftServer2Suites.scala:603)
 at scala.collection.immutable.List.foreach(List.scala:392) at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63(HiveThriftServer2Suites.scala:603)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63$adapted(HiveThriftServer2Suites.scala:573)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3(HiveThriftServer2Suites.scala:1074)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3$adapted(HiveThriftServer2Suites.scala:1074)
 at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
{code}
 

because SingleSessionSuite do `create table test_udtf` and not drop it when 
test  complete and HiveThriftBinaryServerSuite want to re-create this table.

If we execute mvn test HiveThriftBinaryServerSuite and SingleSessionSuite in 
order,both test suites will succeed, but we shouldn't rely on their execution 
order

 


> Should drop test_udtf table when SingleSessionSuite completed
> -
>
> Key: SPARK-31997
> URL: https://issues.apache.org/jira/browse/SPARK-31997
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> If we run mvn test on SingleSessionSuite and then HiveThriftBinaryServerSuite in 
> that order, the test case "SPARK-11595 ADD JAR with input path having URL scheme" 
> in HiveThriftBinaryServerSuite will fail as follows:
>  
> {code:java}
> - SPARK-11595 ADD JAR with input path having U

[jira] [Updated] (SPARK-31997) Should drop test_udtf table when SingleSessionSuite completed

2020-06-15 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-31997:
-
Priority: Minor  (was: Major)

> Should drop test_udtf table when SingleSessionSuite completed
> -
>
> Key: SPARK-31997
> URL: https://issues.apache.org/jira/browse/SPARK-31997
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> If we run mvn test on SingleSessionSuite and then HiveThriftBinaryServerSuite in 
> that order, the test case "SPARK-11595 ADD JAR with input path having URL scheme" 
> in HiveThriftBinaryServerSuite will fail as follows:
>  
> {code:java}
> - SPARK-11595 ADD JAR with input path having URL scheme *** FAILED *** 
> java.sql.SQLException: Error running query: 
> org.apache.spark.sql.AnalysisException: Can not create the managed 
> table('`default`.`test_udtf`'). The associated 
> location('file:/home/yarn/spark_ut/spark_ut/baidu/inf-spark/spark-source/sql/hive-thriftserver/spark-warehouse/test_udtf')
>  already exists.; at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
>  at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65(HiveThriftServer2Suites.scala:603)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65$adapted(HiveThriftServer2Suites.scala:603)
>  at scala.collection.immutable.List.foreach(List.scala:392) at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63(HiveThriftServer2Suites.scala:603)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63$adapted(HiveThriftServer2Suites.scala:573)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3(HiveThriftServer2Suites.scala:1074)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3$adapted(HiveThriftServer2Suites.scala:1074)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> {code}
>  
> This happens because SingleSessionSuite runs `create table test_udtf` and does 
> not drop the table when the test completes, while HiveThriftBinaryServerSuite 
> then tries to re-create it.
> If we run mvn test on HiveThriftBinaryServerSuite and then SingleSessionSuite in 
> that order, both test suites succeed, but we shouldn't rely on their execution 
> order.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31959) Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian to Julian"

2020-06-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-31959:

Fix Version/s: 3.0.1

> Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian 
> to Julian"
> 
>
> Key: SPARK-31959
> URL: https://issues.apache.org/jira/browse/SPARK-31959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> See 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123688/testReport/org.apache.spark.sql.catalyst.util/RebaseDateTimeSuite/optimization_of_micros_rebasing___Gregorian_to_Julian/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31997) Should drop test_udtf table when SingleSessionSuite completed

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31997:


Assignee: (was: Apache Spark)

> Should drop test_udtf table when SingleSessionSuite completed
> -
>
> Key: SPARK-31997
> URL: https://issues.apache.org/jira/browse/SPARK-31997
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yang Jie
>Priority: Major
>
> If we run mvn test on SingleSessionSuite and then HiveThriftBinaryServerSuite in 
> that order, the test case "SPARK-11595 ADD JAR with input path having URL scheme" 
> in HiveThriftBinaryServerSuite will fail as follows:
>  
> {code:java}
> - SPARK-11595 ADD JAR with input path having URL scheme *** FAILED *** 
> java.sql.SQLException: Error running query: 
> org.apache.spark.sql.AnalysisException: Can not create the managed 
> table('`default`.`test_udtf`'). The associated 
> location('file:/home/yarn/spark_ut/spark_ut/baidu/inf-spark/spark-source/sql/hive-thriftserver/spark-warehouse/test_udtf')
>  already exists.; at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
>  at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65(HiveThriftServer2Suites.scala:603)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65$adapted(HiveThriftServer2Suites.scala:603)
>  at scala.collection.immutable.List.foreach(List.scala:392) at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63(HiveThriftServer2Suites.scala:603)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63$adapted(HiveThriftServer2Suites.scala:573)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3(HiveThriftServer2Suites.scala:1074)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3$adapted(HiveThriftServer2Suites.scala:1074)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> {code}
>  
> This happens because SingleSessionSuite runs `create table test_udtf` and does 
> not drop the table when the test completes, while HiveThriftBinaryServerSuite 
> then tries to re-create it.
> If we run mvn test on HiveThriftBinaryServerSuite and then SingleSessionSuite in 
> that order, both test suites succeed, but we shouldn't rely on their execution 
> order.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31997) Should drop test_udtf table when SingleSessionSuite completed

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31997:


Assignee: Apache Spark

> Should drop test_udtf table when SingleSessionSuite completed
> -
>
> Key: SPARK-31997
> URL: https://issues.apache.org/jira/browse/SPARK-31997
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> If we run mvn test on SingleSessionSuite and then HiveThriftBinaryServerSuite in 
> that order, the test case "SPARK-11595 ADD JAR with input path having URL scheme" 
> in HiveThriftBinaryServerSuite will fail as follows:
>  
> {code:java}
> - SPARK-11595 ADD JAR with input path having URL scheme *** FAILED *** 
> java.sql.SQLException: Error running query: 
> org.apache.spark.sql.AnalysisException: Can not create the managed 
> table('`default`.`test_udtf`'). The associated 
> location('file:/home/yarn/spark_ut/spark_ut/baidu/inf-spark/spark-source/sql/hive-thriftserver/spark-warehouse/test_udtf')
>  already exists.; at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
>  at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65(HiveThriftServer2Suites.scala:603)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65$adapted(HiveThriftServer2Suites.scala:603)
>  at scala.collection.immutable.List.foreach(List.scala:392) at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63(HiveThriftServer2Suites.scala:603)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63$adapted(HiveThriftServer2Suites.scala:573)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3(HiveThriftServer2Suites.scala:1074)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3$adapted(HiveThriftServer2Suites.scala:1074)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> {code}
>  
> This happens because SingleSessionSuite runs `create table test_udtf` and does 
> not drop the table when the test completes, while HiveThriftBinaryServerSuite 
> then tries to re-create it.
> If we run mvn test on HiveThriftBinaryServerSuite and then SingleSessionSuite in 
> that order, both test suites succeed, but we shouldn't rely on their execution 
> order.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31997) Should drop test_udtf table when SingleSessionSuite completed

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136309#comment-17136309
 ] 

Apache Spark commented on SPARK-31997:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/28838

> Should drop test_udtf table when SingleSessionSuite completed
> -
>
> Key: SPARK-31997
> URL: https://issues.apache.org/jira/browse/SPARK-31997
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yang Jie
>Priority: Major
>
> If we run mvn test on SingleSessionSuite and then HiveThriftBinaryServerSuite in 
> that order, the test case "SPARK-11595 ADD JAR with input path having URL scheme" 
> in HiveThriftBinaryServerSuite will fail as follows:
>  
> {code:java}
> - SPARK-11595 ADD JAR with input path having URL scheme *** FAILED *** 
> java.sql.SQLException: Error running query: 
> org.apache.spark.sql.AnalysisException: Can not create the managed 
> table('`default`.`test_udtf`'). The associated 
> location('file:/home/yarn/spark_ut/spark_ut/baidu/inf-spark/spark-source/sql/hive-thriftserver/spark-warehouse/test_udtf')
>  already exists.; at 
> org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
>  at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65(HiveThriftServer2Suites.scala:603)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65$adapted(HiveThriftServer2Suites.scala:603)
>  at scala.collection.immutable.List.foreach(List.scala:392) at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63(HiveThriftServer2Suites.scala:603)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63$adapted(HiveThriftServer2Suites.scala:573)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3(HiveThriftServer2Suites.scala:1074)
>  at 
> org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3$adapted(HiveThriftServer2Suites.scala:1074)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> {code}
>  
> This happens because SingleSessionSuite runs `create table test_udtf` and does 
> not drop the table when the test completes, while HiveThriftBinaryServerSuite 
> then tries to re-create it.
> If we run mvn test on HiveThriftBinaryServerSuite and then SingleSessionSuite in 
> that order, both test suites succeed, but we shouldn't rely on their execution 
> order.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31997) Should drop test_udtf table when SingleSessionSuite completed

2020-06-15 Thread Yang Jie (Jira)
Yang Jie created SPARK-31997:


 Summary: Should drop test_udtf table when SingleSessionSuite 
completed
 Key: SPARK-31997
 URL: https://issues.apache.org/jira/browse/SPARK-31997
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 3.0.0
Reporter: Yang Jie


If we run mvn test on SingleSessionSuite and then HiveThriftBinaryServerSuite in 
that order, the test case "SPARK-11595 ADD JAR with input path having URL scheme" 
in HiveThriftBinaryServerSuite will fail as follows:

 
{code:java}
- SPARK-11595 ADD JAR with input path having URL scheme *** FAILED *** 
java.sql.SQLException: Error running query: 
org.apache.spark.sql.AnalysisException: Can not create the managed 
table('`default`.`test_udtf`'). The associated 
location('file:/home/yarn/spark_ut/spark_ut/baidu/inf-spark/spark-source/sql/hive-thriftserver/spark-warehouse/test_udtf')
 already exists.; at 
org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
 at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65(HiveThriftServer2Suites.scala:603)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$65$adapted(HiveThriftServer2Suites.scala:603)
 at scala.collection.immutable.List.foreach(List.scala:392) at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63(HiveThriftServer2Suites.scala:603)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite.$anonfun$new$63$adapted(HiveThriftServer2Suites.scala:573)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3(HiveThriftServer2Suites.scala:1074)
 at 
org.apache.spark.sql.hive.thriftserver.HiveThriftJdbcTest.$anonfun$withMultipleConnectionJdbcStatement$3$adapted(HiveThriftServer2Suites.scala:1074)
 at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
{code}
 

This happens because SingleSessionSuite runs `create table test_udtf` and does not 
drop the table when the test completes, while HiveThriftBinaryServerSuite then 
tries to re-create it.

If we run mvn test on HiveThriftBinaryServerSuite and then SingleSessionSuite in 
that order, both test suites succeed, but we shouldn't rely on their execution 
order.
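
One possible shape of the cleanup (a sketch only, not the actual patch; 
`withJdbcStatement` is assumed to be the JDBC helper the suites already use):
{code:scala}
// Sketch: drop the table once SingleSessionSuite finishes so that later
// suites (e.g. HiveThriftBinaryServerSuite) can re-create it.
override protected def afterAll(): Unit = {
  try {
    withJdbcStatement() { statement =>
      statement.execute("DROP TABLE IF EXISTS test_udtf")
    }
  } finally {
    super.afterAll()
  }
}
{code}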

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31996) Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136303#comment-17136303
 ] 

Apache Spark commented on SPARK-31996:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/28837

> Specify the version of ChromeDriver and RemoteWebDriver which can work with 
> guava 14.0.1
> 
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver and RemoteWebDriver are implicitly 
> upgraded through the dependency chain, and the implicitly upgraded modules can't 
> work with guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older version which can work 
> with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31996) Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31996:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Specify the version of ChromeDriver and RemoteWebDriver which can work with 
> guava 14.0.1
> 
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver and RemoteWebDriver are implicitly 
> upgraded through the dependency chain, and the implicitly upgraded modules can't 
> work with guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older version which can work 
> with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31996) Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31996:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Specify the version of ChromeDriver and RemoteWebDriver which can work with 
> guava 14.0.1
> 
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver and RemoteWebDriver are implicitly 
> upgraded through the dependency chain, and the implicitly upgraded modules can't 
> work with guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older version which can work 
> with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31996) Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136302#comment-17136302
 ] 

Apache Spark commented on SPARK-31996:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/28837

> Specify the version of ChromeDriver and RemoteWebDriver which can work with 
> guava 14.0.1
> 
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver and RemoteWebDriver are implicitly 
> upgraded through the dependency chain, and the implicitly upgraded modules can't 
> work with guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older version which can work 
> with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31996) Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1

2020-06-15 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-31996:
---
Description: 
SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed to 
be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, ChromeDriver and RemoteWebDriver are implicitly 
upgraded through the dependency chain, and the implicitly upgraded modules can't 
work with guava 14.0.1 due to an API incompatibility, so we need to run 
ChromeUISeleniumSuite with a guava version specified, like 
-Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little inconvenient, so let's use an older version which can work with 
guava 14.0.1.

  was:
SPARK-31765 upgraded HtmlUnit due to a security reason and Selenium was also 
needed to be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, chrome-driver and remote-driver are implicitly 
upgraded because of dependency and the the implicitly upgraded modules can't 
work with guava 14.0.1 due to an API compatibility so we need to run 
ChromeUISeleniumSuite with a guava version specified like 
-Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little bit inconvenience so let's use older version which can work with 
guava 14.0.1.


> Specify the version of ChromeDriver and RemoteWebDriver which can work with 
> guava 14.0.1
> 
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver and RemoteWebDriver are implicitly 
> upgraded through the dependency chain, and the implicitly upgraded modules can't 
> work with guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older version which can work 
> with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31996) Specify the version of ChromeDriver and RemoteWebDriver which can work with guava 14.0.1

2020-06-15 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-31996:
---
Summary: Specify the version of ChromeDriver and RemoteWebDriver which can 
work with guava 14.0.1  (was: Specify the version of chrome-driver and 
remote-driver which can work with guava 14.0.1)

> Specify the version of ChromeDriver and RemoteWebDriver which can work with 
> guava 14.0.1
> 
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, chrome-driver and remote-driver are implicitly 
> upgraded through the dependency chain, and the implicitly upgraded modules can't 
> work with guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older version which can work 
> with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31996) Specify the version of chrome-driver and remote-driver which can work with guava 14.0.1

2020-06-15 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-31996:
---
Description: 
SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed to 
be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, chrome-driver and remote-driver are implicitly 
upgraded through the dependency chain, and the implicitly upgraded modules can't 
work with guava 14.0.1 due to an API incompatibility, so we need to run 
ChromeUISeleniumSuite with a guava version specified, like 
-Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little inconvenient, so let's use an older version which can work with 
guava 14.0.1.

  was:
SPARK-31765 upgraded HtmlUnit due to a security reason and Selenium was also 
needed to be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, ChromeDriver and RemoteDriver is implicitly upgraded 
because of dependency and the upgraded ChromeDriver can't work with guava 
14.0.1 due to an API compatibility so we need to run ChromeUISeleniumSuite with 
a guava version specified like -Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little bit inconvenience so let's use older ChromeDriver which can work 
with guava 14.0.1.


> Specify the version of chrome-driver and remote-driver which can work with 
> guava 14.0.1
> ---
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, chrome-driver and remote-driver are implicitly 
> upgraded through the dependency chain, and the implicitly upgraded modules can't 
> work with guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older version which can work 
> with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31996) Specify the version of chrome-driver and remote-driver which can work with guava 14.0.1

2020-06-15 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-31996:
---
Summary: Specify the version of chrome-driver and remote-driver which can 
work with guava 14.0.1  (was: Specify the version of ChromeDriver and 
RemoteDriver which can work with guava 14.0.1)

> Specify the version of chrome-driver and remote-driver which can work with 
> guava 14.0.1
> ---
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver and RemoteDriver are implicitly upgraded 
> through the dependency chain, and the upgraded ChromeDriver can't work with 
> guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older ChromeDriver which can 
> work with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31996) Specify the version of ChromeDriver and RemoteDriver which can work with guava 14.0.1

2020-06-15 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-31996:
---
Description: 
SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed to 
be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, ChromeDriver and RemoteDriver are implicitly upgraded 
through the dependency chain, and the upgraded ChromeDriver can't work with guava 
14.0.1 due to an API incompatibility, so we need to run ChromeUISeleniumSuite with 
a guava version specified, like -Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little inconvenient, so let's use an older ChromeDriver which can work 
with guava 14.0.1.

  was:
SPARK-31765 upgraded HtmlUnit due to a security reason and Selenium was also 
needed to be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, ChromeDriver is implicitly upgraded because of 
dependency and the upgraded ChromeDriver can't work with guava 14.0.1 due to an 
API compatibility so we need to run ChromeUISeleniumSuite with a guava version 
specified like -Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little bit inconvenience so let's use older ChromeDriver which can work 
with guava 14.0.1.


> Specify the version of ChromeDriver and RemoteDriver which can work with 
> guava 14.0.1
> -
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver and RemoteDriver are implicitly upgraded 
> through the dependency chain, and the upgraded ChromeDriver can't work with 
> guava 14.0.1 due to an API incompatibility, so we need to run 
> ChromeUISeleniumSuite with a guava version specified, like 
> -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older ChromeDriver which can 
> work with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31996) Specify the version of ChromeDriver and RemoteDriver which can work with guava 14.0.1

2020-06-15 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-31996:
---
Summary: Specify the version of ChromeDriver and RemoteDriver which can 
work with guava 14.0.1  (was: Specify the version of ChromeDriver which can 
work with guava 14.0.1)

> Specify the version of ChromeDriver and RemoteDriver which can work with 
> guava 14.0.1
> -
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver is implicitly upgraded through the 
> dependency chain, and the upgraded ChromeDriver can't work with guava 14.0.1 
> due to an API incompatibility, so we need to run ChromeUISeleniumSuite with a 
> guava version specified, like -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older ChromeDriver which can 
> work with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31996) Specify the version of ChromeDriver which can work with guava 14.0.1

2020-06-15 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-31996:
---
Description: 
SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed to 
be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, ChromeDriver is implicitly upgraded through the 
dependency chain, and the upgraded ChromeDriver can't work with guava 14.0.1 due 
to an API incompatibility, so we need to run ChromeUISeleniumSuite with a guava 
version specified, like -Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little inconvenient, so let's use an older ChromeDriver which can work 
with guava 14.0.1.

  was:
SPARK-31765 upgraded HtmlUnit due to a security reason and Selenium was also 
needed to be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, ChromeDriver is explicitly upgraded because of 
dependency and the upgraded ChromeDriver can't work with guava 14.0.1 due to an 
API compatibility so we need to run ChromeUISeleniumSuite with a guava version 
specified like -Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little bit inconvenience so let's use older ChromeDriver which can work 
with guava 14.0.1.


> Specify the version of ChromeDriver which can work with guava 14.0.1
> 
>
> Key: SPARK-31996
> URL: https://issues.apache.org/jira/browse/SPARK-31996
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed 
> to be upgraded to work with the upgraded HtmlUnit.
> After upgrading Selenium, ChromeDriver is implicitly upgraded through the 
> dependency chain, and the upgraded ChromeDriver can't work with guava 14.0.1 
> due to an API incompatibility, so we need to run ChromeUISeleniumSuite with a 
> guava version specified, like -Dguava.version=25.0-jre.
> {code:java}
> $ build/sbt -Dguava.version=25.0-jre 
> -Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
> -Dtest.default.exclude.tags= "testOnly 
> org.apache.spark.ui.UISeleniumSuite"{code}
> It's a little inconvenient, so let's use an older ChromeDriver which can 
> work with guava 14.0.1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31996) Specify the version of ChromeDriver which can work with guava 14.0.1

2020-06-15 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-31996:
--

 Summary: Specify the version of ChromeDriver which can work with 
guava 14.0.1
 Key: SPARK-31996
 URL: https://issues.apache.org/jira/browse/SPARK-31996
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


SPARK-31765 upgraded HtmlUnit for a security reason, and Selenium also needed to 
be upgraded to work with the upgraded HtmlUnit.

After upgrading Selenium, ChromeDriver is explicitly upgraded because of the 
dependency change, and the upgraded ChromeDriver can't work with guava 14.0.1 due 
to an API incompatibility, so we need to run ChromeUISeleniumSuite with a guava 
version specified, like -Dguava.version=25.0-jre.
{code:java}
$ build/sbt -Dguava.version=25.0-jre 
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver 
-Dtest.default.exclude.tags= "testOnly 
org.apache.spark.ui.UISeleniumSuite"{code}
It's a little inconvenient, so let's use an older ChromeDriver which can work 
with guava 14.0.1.
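
One way to express the idea, shown here as an illustrative sbt-style override 
(the version string x.y.z is a placeholder, not the version actually chosen):
{code:scala}
// Sketch: pin the Selenium driver artifacts used by the UI tests to a release
// that still works with guava 14.0.1, instead of inheriting the newest one.
dependencyOverrides ++= Seq(
  "org.seleniumhq.selenium" % "selenium-chrome-driver" % "x.y.z" % Test,
  "org.seleniumhq.selenium" % "selenium-remote-driver" % "x.y.z" % Test
)
{code}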



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31561) Add QUALIFY Clause

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31561:


Assignee: (was: Apache Spark)

> Add QUALIFY Clause
> --
>
> Key: SPARK-31561
> URL: https://issues.apache.org/jira/browse/SPARK-31561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> In a SELECT statement, the QUALIFY clause filters the results of window 
> functions.
> QUALIFY does with window functions what HAVING does with aggregate functions 
> and GROUP BY clauses.
> In the execution order of a query, QUALIFY is therefore evaluated after 
> window functions are computed.
> Examples:
> https://docs.snowflake.com/en/sql-reference/constructs/qualify.html#examples
> More details:
> https://docs.snowflake.com/en/sql-reference/constructs/qualify.html
> https://docs.teradata.com/reader/2_MC9vCtAJRlKle2Rpb0mA/19NnI91neorAi7LX6SJXBw
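> A minimal illustration of the proposed syntax (table and column names are 
> hypothetical; Spark does not support QUALIFY yet):
> {code:sql}
> -- Keep only the most recent order per customer. QUALIFY filters on the
> -- window function result, much like HAVING filters on aggregate results.
> SELECT customer_id, order_id, amount
> FROM orders
> QUALIFY row_number() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) = 1;
> {code}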



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31561) Add QUALIFY Clause

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31561:


Assignee: Apache Spark

> Add QUALIFY Clause
> --
>
> Key: SPARK-31561
> URL: https://issues.apache.org/jira/browse/SPARK-31561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> In a SELECT statement, the QUALIFY clause filters the results of window 
> functions.
> QUALIFY does with window functions what HAVING does with aggregate functions 
> and GROUP BY clauses.
> In the execution order of a query, QUALIFY is therefore evaluated after 
> window functions are computed.
> Examples:
> https://docs.snowflake.com/en/sql-reference/constructs/qualify.html#examples
> More details:
> https://docs.snowflake.com/en/sql-reference/constructs/qualify.html
> https://docs.teradata.com/reader/2_MC9vCtAJRlKle2Rpb0mA/19NnI91neorAi7LX6SJXBw



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31561) Add QUALIFY Clause

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136264#comment-17136264
 ] 

Apache Spark commented on SPARK-31561:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/28836

> Add QUALIFY Clause
> --
>
> Key: SPARK-31561
> URL: https://issues.apache.org/jira/browse/SPARK-31561
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> In a SELECT statement, the QUALIFY clause filters the results of window 
> functions.
> QUALIFY does with window functions what HAVING does with aggregate functions 
> and GROUP BY clauses.
> In the execution order of a query, QUALIFY is therefore evaluated after 
> window functions are computed.
> Examples:
> https://docs.snowflake.com/en/sql-reference/constructs/qualify.html#examples
> More details:
> https://docs.snowflake.com/en/sql-reference/constructs/qualify.html
> https://docs.teradata.com/reader/2_MC9vCtAJRlKle2Rpb0mA/19NnI91neorAi7LX6SJXBw



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31980) Spark sequence() fails if start and end of range are identical dates

2020-06-15 Thread JinxinTang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136248#comment-17136248
 ] 

JinxinTang edited comment on SPARK-31980 at 6/16/20, 3:01 AM:
--

[~DaveDeCaprio] Maybe one of these PRs could be rebased to resolve the conflict in 
this code section.


was (Author: jinxintang):
[~DaveDeCaprio] Maybe one of this PRs could rebase to solve this code section 
conflict.

> Spark sequence() fails if start and end of range are identical dates
> 
>
> Key: SPARK-31980
> URL: https://issues.apache.org/jira/browse/SPARK-31980
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Spark 2.4.4 standalone and on AWS EMR
>Reporter: Dave DeCaprio
>Priority: Minor
>
>  
> The following Spark SQL query throws an exception
> {code:java}
> select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date), 
> interval 1 month)
> {code}
> The error is:
>  
>  
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 
> 1java.lang.ArrayIndexOutOfBoundsException: 1 at 
> scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:92) at 
> org.apache.spark.sql.catalyst.expressions.Sequence$TemporalSequenceImpl.eval(collectionOperations.scala:2681)
>  at 
> org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:2514)
>  at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:389){noformat}
>  
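> For reference, the expected behaviour here would presumably be a single-element 
> array rather than an exception (illustrative expectation, not verified output):
> {code:sql}
> -- expected: an array containing just 2011-03-01
> select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date),
>                 interval 1 month);
> {code}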



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31980) Spark sequence() fails if start and end of range are identical dates

2020-06-15 Thread JinxinTang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136248#comment-17136248
 ] 

JinxinTang commented on SPARK-31980:


[~DaveDeCaprio] Maybe one of this PRs could rebase to solve this code section 
conflict.

> Spark sequence() fails if start and end of range are identical dates
> 
>
> Key: SPARK-31980
> URL: https://issues.apache.org/jira/browse/SPARK-31980
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
> Environment: Spark 2.4.4 standalone and on AWS EMR
>Reporter: Dave DeCaprio
>Priority: Minor
>
>  
> The following Spark SQL query throws an exception
> {code:java}
> select sequence(cast("2011-03-01" as date), cast("2011-03-01" as date), 
> interval 1 month)
> {code}
> The error is:
>  
>  
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 
> 1java.lang.ArrayIndexOutOfBoundsException: 1 at 
> scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:92) at 
> org.apache.spark.sql.catalyst.expressions.Sequence$TemporalSequenceImpl.eval(collectionOperations.scala:2681)
>  at 
> org.apache.spark.sql.catalyst.expressions.Sequence.eval(collectionOperations.scala:2514)
>  at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:389){noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14651) CREATE TEMPORARY TABLE is not supported yet

2020-06-15 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136200#comment-17136200
 ] 

Takeshi Yamamuro commented on SPARK-14651:
--

Yea, not supported yet. Please use `create temporary view` instead.
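
For example, the two statements from the description can be rewritten roughly as 
follows (a sketch, assuming a SparkSession named spark and an existing table or view 
t1; the parquet path is just the placeholder from the original example):

{code:scala}
// Temporary view backed by a data source, instead of CREATE TEMPORARY TABLE ... USING:
spark.sql("CREATE TEMPORARY VIEW t2 USING parquet OPTIONS (path 'hello')")

// Temporary view over a query result, instead of CREATE TEMPORARY TABLE ... AS SELECT:
spark.sql("CREATE TEMPORARY VIEW t3 AS SELECT * FROM t1")
{code}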

> CREATE TEMPORARY TABLE is not supported yet
> ---
>
> Key: SPARK-14651
> URL: https://issues.apache.org/jira/browse/SPARK-14651
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>  Labels: bulk-closed
>
> With today's master it seems that {{CREATE TEMPORARY TABLE}} may or may not 
> work depending on how complete the DDL is (?)
> {code}
> scala> sql("CREATE temporary table t2")
> 16/04/14 23:29:26 INFO HiveSqlParser: Parsing command: CREATE temporary table 
> t2
> org.apache.spark.sql.catalyst.parser.ParseException:
> CREATE TEMPORARY TABLE is not supported yet. Please use registerTempTable as 
> an alternative.(line 1, pos 0)
> == SQL ==
> CREATE temporary table t2
> ^^^
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder$$anonfun$visitCreateTable$1.apply(HiveSqlParser.scala:169)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder$$anonfun$visitCreateTable$1.apply(HiveSqlParser.scala:165)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:85)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder.visitCreateTable(HiveSqlParser.scala:165)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder.visitCreateTable(HiveSqlParser.scala:53)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:1049)
>   at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:63)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:63)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:85)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:62)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:86)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:198)
>   at 
> org.apache.spark.sql.hive.HiveContext.org$apache$spark$sql$hive$HiveContext$$super$parseSql(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$parseSql$1.apply(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$parseSql$1.apply(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:228)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:175)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:174)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:217)
>   at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:200)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:765)
>   ... 48 elided
> scala> sql("CREATE temporary table t2 USING PARQUET OPTIONS (PATH 'hello') AS 
> SELECT * FROM t1")
> 16/04/14 23:30:21 INFO HiveSqlParser: Parsing command: CREATE temporary table 
> t2 USING PARQUET OPTIONS (PATH 'hello') AS SELECT * FROM t1
> org.apache.spark.sql.AnalysisException: Table or View not found: t1; line 1 
> pos 80
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$getTable(Analyzer.scala:412)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:421)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:416)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:68)
>   at 

[jira] [Updated] (SPARK-31995) Spark Structure Streaming checkpiontFileManager ERROR when HDFS.DFSOutputStream.completeFile with IOException unable to close file because the last block does not have e

2020-06-15 Thread Jim Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Huang updated SPARK-31995:
--
Environment: 
Apache Spark 2.4.5 Scala 2.11 without Hadoop

Hadoop 2.7.3 - YARN cluster

delta-core_ 2.11:0.6.1

 

  was:
Apache Spark 2.4.5 without Hadoop

Hadoop 2.7.3 - YARN cluster

delta-core_ 2.11:0.6.1

 


> Spark Structure Streaming checkpiontFileManager ERROR when 
> HDFS.DFSOutputStream.completeFile with IOException unable to close file 
> because the last block does not have enough number of replicas
> -
>
> Key: SPARK-31995
> URL: https://issues.apache.org/jira/browse/SPARK-31995
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.5
> Environment: Apache Spark 2.4.5 Scala 2.11 without Hadoop
> Hadoop 2.7.3 - YARN cluster
> delta-core_ 2.11:0.6.1
>  
>Reporter: Jim Huang
>Priority: Major
>
> I am using Spark 2.4.5's Spark Structured Streaming with a Delta table (0.6.1) 
> as the sink, running in a YARN cluster on Hadoop 2.7.3. I have been using 
> Spark Structured Streaming in this runtime environment for several months, 
> until this new corner case left my Spark structured streaming job in a 
> partially working state.
>  
> I have included the ERROR message and stack trace. I did a quick search using 
> the string "MicroBatchExecution: Query terminated with error" but did not find 
> any existing Jira that looks like my stack trace.
>  
> Based on a naive look at this error message and stack trace, could Spark's 
> CheckpointFileManager handle this HDFS exception better by simply waiting a 
> little longer for HDFS's pipeline to complete the replicas?
>  
> Being new to this code, where can I find the configuration parameter that 
> sets the replica count for the `streaming.HDFSMetadataLog`? I am just trying 
> to understand whether the current code already provides some holistic 
> configuration tuning variable(s) to handle this IOException more gracefully. 
> Hopefully experts can provide some pointers or directions.
>  
> {code:java}
> 20/06/12 20:14:15 ERROR MicroBatchExecution: Query [id = 
> yarn-job-id-redacted, runId = run-id-redacted] terminated with error
>  java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2511)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2472)
>  at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2437)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>  at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:145)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatchToFile(HDFSMetadataLog.scala:126)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:112)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
>  at scala.Option.getOrElse(Option.scala:121)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply$mcV$sp(MicroBatchExecution.scala:547)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
>  at 

[jira] [Updated] (SPARK-31995) Spark Structure Streaming checkpiontFileManager ERROR when HDFS.DFSOutputStream.completeFile with IOException unable to close file because the last block does not have e

2020-06-15 Thread Jim Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Huang updated SPARK-31995:
--
Labels:   (was: delta)

> Spark Structure Streaming checkpiontFileManager ERROR when 
> HDFS.DFSOutputStream.completeFile with IOException unable to close file 
> because the last block does not have enough number of replicas
> -
>
> Key: SPARK-31995
> URL: https://issues.apache.org/jira/browse/SPARK-31995
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.5
> Environment: Apache Spark 2.4.5 without Hadoop
> Hadoop 2.7.3 - YARN cluster
> delta-core_ 2.11:0.6.1
>  
>Reporter: Jim Huang
>Priority: Major
>
> I am using Spark 2.4.5's Spark Structured Streaming with a Delta table (0.6.1) 
> as the sink, running in a YARN cluster on Hadoop 2.7.3. I have been using 
> Spark Structured Streaming in this runtime environment for several months, 
> until this new corner case left my Spark structured streaming job in a 
> partially working state.
>  
> I have included the ERROR message and stack trace. I did a quick search using 
> the string "MicroBatchExecution: Query terminated with error" but did not find 
> any existing Jira that looks like my stack trace.
>  
> Based on a naive look at this error message and stack trace, could Spark's 
> CheckpointFileManager handle this HDFS exception better by simply waiting a 
> little longer for HDFS's pipeline to complete the replicas?
>  
> Being new to this code, where can I find the configuration parameter that 
> sets the replica count for the `streaming.HDFSMetadataLog`? I am just trying 
> to understand whether the current code already provides some holistic 
> configuration tuning variable(s) to handle this IOException more gracefully. 
> Hopefully experts can provide some pointers or directions.
>  
> {code:java}
> 20/06/12 20:14:15 ERROR MicroBatchExecution: Query [id = 
> yarn-job-id-redacted, runId = run-id-redacted] terminated with error
>  java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2511)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2472)
>  at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2437)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>  at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:145)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatchToFile(HDFSMetadataLog.scala:126)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:112)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
>  at scala.Option.getOrElse(Option.scala:121)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply$mcV$sp(MicroBatchExecution.scala:547)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>  at 
> org.apache.spark.sql.execution.streamin

[jira] [Updated] (SPARK-31995) Spark Structure Streaming checkpiontFileManager ERROR when HDFS.DFSOutputStream.completeFile with IOException unable to close file because the last block does not have e

2020-06-15 Thread Jim Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Huang updated SPARK-31995:
--
Labels: delta  (was: )

> Spark Structure Streaming checkpiontFileManager ERROR when 
> HDFS.DFSOutputStream.completeFile with IOException unable to close file 
> because the last block does not have enough number of replicas
> -
>
> Key: SPARK-31995
> URL: https://issues.apache.org/jira/browse/SPARK-31995
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.5
> Environment: Apache Spark 2.4.5 without Hadoop
> Hadoop 2.7.3 - YARN cluster
> delta-core_ 2.11:0.6.1
>  
>Reporter: Jim Huang
>Priority: Major
>  Labels: delta
>
> I am using Spark 2.4.5's Spark Structured Streaming with a Delta table (0.6.1) 
> as the sink, running in a YARN cluster on Hadoop 2.7.3. I have been using 
> Spark Structured Streaming in this runtime environment for several months, 
> until this new corner case left my Spark structured streaming job in a 
> partially working state.
>  
> I have included the ERROR message and stack trace. I did a quick search using 
> the string "MicroBatchExecution: Query terminated with error" but did not find 
> any existing Jira that looks like my stack trace.
>  
> Based on a naive look at this error message and stack trace, could Spark's 
> CheckpointFileManager handle this HDFS exception better by simply waiting a 
> little longer for HDFS's pipeline to complete the replicas?
>  
> Being new to this code, where can I find the configuration parameter that 
> sets the replica count for the `streaming.HDFSMetadataLog`? I am just trying 
> to understand whether the current code already provides some holistic 
> configuration tuning variable(s) to handle this IOException more gracefully. 
> Hopefully experts can provide some pointers or directions.
>  
> {code:java}
> 20/06/12 20:14:15 ERROR MicroBatchExecution: Query [id = 
> yarn-job-id-redacted, runId = run-id-redacted] terminated with error
>  java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2511)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2472)
>  at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2437)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>  at 
> org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:145)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatchToFile(HDFSMetadataLog.scala:126)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:112)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
>  at scala.Option.getOrElse(Option.scala:121)
>  at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply$mcV$sp(MicroBatchExecution.scala:547)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
>  at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>  at 
> org.apache

[jira] [Updated] (SPARK-31995) Spark Structure Streaming checkpiontFileManager ERROR when HDFS.DFSOutputStream.completeFile with IOException unable to close file because the last block does not have e

2020-06-15 Thread Jim Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Huang updated SPARK-31995:
--
Description: 
I am using Spark 2.4.5's Spark Structured Streaming with a Delta table (0.6.1) 
as the sink, running in a YARN cluster on Hadoop 2.7.3. I have been using Spark 
Structured Streaming in this runtime environment for several months, until this 
new corner case left my Spark structured streaming job in a partially working 
state.

I have included the ERROR message and stack trace. I did a quick search using 
the string "MicroBatchExecution: Query terminated with error" but did not find 
any existing Jira that looks like my stack trace.

Based on a naive look at this error message and stack trace, could Spark's 
CheckpointFileManager handle this HDFS exception better by simply waiting a 
little longer for HDFS's pipeline to complete the replicas?

Being new to this code, where can I find the configuration parameter that sets 
the replica count for the `streaming.HDFSMetadataLog`? I am just trying to 
understand whether the current code already provides some holistic 
configuration tuning variable(s) to handle this IOException more gracefully. 
Hopefully experts can provide some pointers or directions.

 
{code:java}
20/06/12 20:14:15 ERROR MicroBatchExecution: Query [id = yarn-job-id-redacted, 
runId = run-id-redacted] terminated with error
 java.io.IOException: Unable to close file because the last block does not have 
enough number of replicas.
 at 
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2511)
 at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2472)
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2437)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
 at 
org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:145)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatchToFile(HDFSMetadataLog.scala:126)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:112)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
 at scala.Option.getOrElse(Option.scala:121)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply$mcV$sp(MicroBatchExecution.scala:547)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193){

[jira] [Updated] (SPARK-31995) Spark Structure Streaming checkpiontFileManager ERROR when HDFS.DFSOutputStream.completeFile with IOException unable to close file because the last block does not have e

2020-06-15 Thread Jim Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Huang updated SPARK-31995:
--
Description: 
I am using Spark 2.4.5's Spark Structured Streaming with a Delta table as the 
sink, running in a YARN cluster on Hadoop 2.7.3. I have been using Spark 
Structured Streaming in this runtime environment for several months, until this 
new corner case left my Spark structured streaming job in a partially working 
state.

I have included the ERROR message and stack trace. I did a quick search using 
the string "MicroBatchExecution: Query terminated with error" but did not find 
any existing Jira that looks like my stack trace.

Based on a naive look at this error message and stack trace, could Spark's 
CheckpointFileManager handle this HDFS exception better by simply waiting a 
little longer for HDFS's pipeline to complete the replicas?

Being new to this code, where can I find the configuration parameter that sets 
the replica count for the `streaming.HDFSMetadataLog`? I am just trying to 
understand whether the current code already provides some holistic 
configuration tuning variable(s) to handle this IOException more gracefully. 
Hopefully experts can provide some pointers or directions.

 
{code:java}
20/06/12 20:14:15 ERROR MicroBatchExecution: Query [id = yarn-job-id-redacted, 
runId = run-id-redacted] terminated with error
 java.io.IOException: Unable to close file because the last block does not have 
enough number of replicas.
 at 
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2511)
 at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2472)
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2437)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
 at 
org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:145)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatchToFile(HDFSMetadataLog.scala:126)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:112)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
 at scala.Option.getOrElse(Option.scala:121)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply$mcV$sp(MicroBatchExecution.scala:547)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193){code}
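
One commonly suggested mitigation for this particular IOException is to let the HDFS 
client retry DFSOutputStream.completeFile() longer before giving up. A sketch of 
passing that through Spark is below; note these are plain HDFS client settings rather 
than Spark ones, the metadata log does not appear to set a replication factor of its 
own (so the file system default, dfs.replication, applies), and whether raising the 
retry count actually helps depends on the Hadoop version and the health of the cluster:

{code:scala}
import org.apache.spark.sql.SparkSession

// spark.hadoop.* entries are forwarded into the Hadoop Configuration used by the
// HDFS client, so this raises the completeFile() retry count for the streaming
// checkpoint/metadata writes as well.
val spark = SparkSession.builder()
  .appName("checkpoint-close-retry-tuning")
  .config("spark.hadoop.dfs.client.block.write.locateFollowingBlock.retries", "10")
  .getOrCreate()
{code}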
 

[jira] [Updated] (SPARK-31995) Spark Structure Streaming checkpiontFileManager ERROR when HDFS.DFSOutputStream.completeFile with IOException unable to close file because the last block does not have e

2020-06-15 Thread Jim Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Huang updated SPARK-31995:
--
Description: 
I am using Spark 2.4.5's Spark Structured Streaming, running in a YARN cluster 
on Hadoop 2.7.3. I have been using Spark Structured Streaming in this runtime 
environment for several months, until this new corner case left my Spark 
structured streaming job in a partially working state.

I have included the ERROR message and stack trace. I did a quick search using 
the string "MicroBatchExecution: Query terminated with error" but did not find 
any existing Jira that looks like my stack trace.

Based on a naive look at this error message and stack trace, could Spark's 
CheckpointFileManager handle this HDFS exception better by simply waiting a 
little longer for HDFS's pipeline to complete the replicas?

Being new to this code, where can I find the configuration parameter that sets 
the replica count for the `streaming.HDFSMetadataLog`? I am just trying to 
understand whether the current code already provides some holistic 
configuration tuning variable(s) to handle this IOException more gracefully. 
Hopefully experts can provide some pointers or directions.

 
{code:java}
20/06/12 20:14:15 ERROR MicroBatchExecution: Query [id = yarn-job-id-redacted, 
runId = run-id-redacted] terminated with error
 java.io.IOException: Unable to close file because the last block does not have 
enough number of replicas.
 at 
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2511)
 at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2472)
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2437)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
 at 
org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:145)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatchToFile(HDFSMetadataLog.scala:126)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:112)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
 at scala.Option.getOrElse(Option.scala:121)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply$mcV$sp(MicroBatchExecution.scala:547)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193){code}
 

 

  was:
I am using Spark 2.

[jira] [Created] (SPARK-31995) Spark Structure Streaming checkpiontFileManager ERROR when HDFS.DFSOutputStream.completeFile with IOException unable to close file because the last block does not have e

2020-06-15 Thread Jim Huang (Jira)
Jim Huang created SPARK-31995:
-

 Summary: Spark Structure Streaming checkpiontFileManager ERROR 
when HDFS.DFSOutputStream.completeFile with IOException unable to close file 
because the last block does not have enough number of replicas
 Key: SPARK-31995
 URL: https://issues.apache.org/jira/browse/SPARK-31995
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.4.5
 Environment: Apache Spark 2.4.5 without Hadoop

Hadoop 2.7.3 - YARN cluster

delta-core_ 2.11:0.6.1

 
Reporter: Jim Huang


I am using Spark 2.4.5's Spark Structured Streaming, running in a YARN cluster 
on Hadoop 2.7.3. I have been using Spark Structured Streaming in this runtime 
environment for several months, until this new corner case left my Spark 
structured streaming job in a partially working state.

I have included the ERROR message and stack trace. I did a quick search using 
the string "MicroBatchExecution: Query terminated with error" but did not find 
any existing Jira that looks like my stack trace.

Based on a naive look at this error message and stack trace, could Spark's 
CheckpointFileManager handle this HDFS exception better by simply waiting a 
little longer for HDFS's pipeline to complete the replicas?

Being new to this code, where can I find the configuration parameter that sets 
the replica count for the `streaming.HDFSMetadataLog`? I am just trying to 
understand whether the current code already provides some holistic 
configuration tuning variable(s) to handle this IOException more gracefully. 
Hopefully experts can provide some pointers or directions.

 

{code:java}

20/06/12 20:14:15 ERROR MicroBatchExecution: Query [id = yarn-job-id-redacted, 
runId = run-id-redacted] terminated with error
java.io.IOException: Unable to close file because the last block does not have 
enough number of replicas.
 at 
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2511)
 at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:2472)
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2437)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
 at 
org.apache.spark.sql.execution.streaming.CheckpointFileManager$RenameBasedFSDataOutputStream.close(CheckpointFileManager.scala:145)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatchToFile(HDFSMetadataLog.scala:126)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply$mcZ$sp(HDFSMetadataLog.scala:112)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog$$anonfun$add$1.apply(HDFSMetadataLog.scala:110)
 at scala.Option.getOrElse(Option.scala:121)
 at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply$mcV$sp(MicroBatchExecution.scala:547)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:545)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
 at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
 at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
 at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.ex

[jira] [Comment Edited] (SPARK-14651) CREATE TEMPORARY TABLE is not supported yet

2020-06-15 Thread Jeffrey E Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136169#comment-17136169
 ] 

Jeffrey E  Rodriguez edited comment on SPARK-14651 at 6/15/20, 10:03 PM:
-

This is showing as solve/incomplete. Trying to work with Spark 2.4 Cloudera and 
it is still an issue. I get the message "

CREATE TEMPORARY TABLE is not supported yet. Please use CREATE TEMPORARY VIEW 
as an alternative."

Is that the fix?? Use CREATE TEMPORARY VIEW??


was (Author: jeffreyr97):
This is showing as solve/incomplete. Trying to work with Spark 2.4 Cloudera and 
it is still an issue. I get the message "

CREATE TEMPORARY TABLE is not supported yet. Please use CREATE TEMPORARY VIEW 
as an alternative."

Is that the fix?? Use TEMPORARY VIEW??

> CREATE TEMPORARY TABLE is not supported yet
> ---
>
> Key: SPARK-14651
> URL: https://issues.apache.org/jira/browse/SPARK-14651
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>  Labels: bulk-closed
>
> With today's master it seems that {{CREATE TEMPORARY TABLE}} may or may not 
> work depending on how complete the DDL is (?)
> {code}
> scala> sql("CREATE temporary table t2")
> 16/04/14 23:29:26 INFO HiveSqlParser: Parsing command: CREATE temporary table 
> t2
> org.apache.spark.sql.catalyst.parser.ParseException:
> CREATE TEMPORARY TABLE is not supported yet. Please use registerTempTable as 
> an alternative.(line 1, pos 0)
> == SQL ==
> CREATE temporary table t2
> ^^^
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder$$anonfun$visitCreateTable$1.apply(HiveSqlParser.scala:169)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder$$anonfun$visitCreateTable$1.apply(HiveSqlParser.scala:165)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:85)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder.visitCreateTable(HiveSqlParser.scala:165)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder.visitCreateTable(HiveSqlParser.scala:53)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:1049)
>   at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:63)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:63)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:85)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:62)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:86)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:198)
>   at 
> org.apache.spark.sql.hive.HiveContext.org$apache$spark$sql$hive$HiveContext$$super$parseSql(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$parseSql$1.apply(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$parseSql$1.apply(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:228)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:175)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:174)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:217)
>   at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:200)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:765)
>   ... 48 elided
> scala> sql("CREATE temporary table t2 USING PARQUET OPTIONS (PATH 'hello') AS 
> SELECT * FROM t1")
> 16/04/14 23:30:21 INFO HiveSqlParser: Parsing command: CREATE temporary table 
> t2 USING PARQUET OPTIONS (PATH 'hello') AS SELECT * FROM t1
> org.apache.spark.sql.AnalysisException: Table or View not found: t1; line 1 
> pos 80
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$getTable(Analyzer.scala:412)
>   at 
> org.apache.spark.sql.catal

[jira] [Comment Edited] (SPARK-14651) CREATE TEMPORARY TABLE is not supported yet

2020-06-15 Thread Jeffrey E Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136169#comment-17136169
 ] 

Jeffrey E  Rodriguez edited comment on SPARK-14651 at 6/15/20, 9:58 PM:


This is showing as solve/incomplete. Trying to work with Spark 2.4 Cloudera and 
it is still an issue. I get the message "

CREATE TEMPORARY TABLE is not supported yet. Please use CREATE TEMPORARY VIEW 
as an alternative."

Is that the fix?? Use TEMPORARY VIEW??


was (Author: jeffreyr97):
This is showing as solve/incomplete. Trying to work with Spark 2.4 Cloudera and 
it is still an issue. I get message "

CREATE TEMPORARY TABLE is not supported yet. Please use CREATE TEMPORARY VIEW 
as an alternative."

Is that the fix?? Use TEMPARY VIEW??

> CREATE TEMPORARY TABLE is not supported yet
> ---
>
> Key: SPARK-14651
> URL: https://issues.apache.org/jira/browse/SPARK-14651
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>  Labels: bulk-closed
>
> With today's master it seems that {{CREATE TEMPORARY TABLE}} may or may not 
> work depending on how complete the DDL is (?)
> {code}
> scala> sql("CREATE temporary table t2")
> 16/04/14 23:29:26 INFO HiveSqlParser: Parsing command: CREATE temporary table 
> t2
> org.apache.spark.sql.catalyst.parser.ParseException:
> CREATE TEMPORARY TABLE is not supported yet. Please use registerTempTable as 
> an alternative.(line 1, pos 0)
> == SQL ==
> CREATE temporary table t2
> ^^^
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder$$anonfun$visitCreateTable$1.apply(HiveSqlParser.scala:169)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder$$anonfun$visitCreateTable$1.apply(HiveSqlParser.scala:165)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:85)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder.visitCreateTable(HiveSqlParser.scala:165)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder.visitCreateTable(HiveSqlParser.scala:53)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:1049)
>   at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:63)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:63)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:85)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:62)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:86)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:198)
>   at 
> org.apache.spark.sql.hive.HiveContext.org$apache$spark$sql$hive$HiveContext$$super$parseSql(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$parseSql$1.apply(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$parseSql$1.apply(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:228)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:175)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:174)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:217)
>   at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:200)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:765)
>   ... 48 elided
> scala> sql("CREATE temporary table t2 USING PARQUET OPTIONS (PATH 'hello') AS 
> SELECT * FROM t1")
> 16/04/14 23:30:21 INFO HiveSqlParser: Parsing command: CREATE temporary table 
> t2 USING PARQUET OPTIONS (PATH 'hello') AS SELECT * FROM t1
> org.apache.spark.sql.AnalysisException: Table or View not found: t1; line 1 
> pos 80
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$getTable(Analyzer.scala:412)
>   at 
> org.apache.spark.sql.catalyst.analysis.An

[jira] [Commented] (SPARK-14651) CREATE TEMPORARY TABLE is not supported yet

2020-06-15 Thread Jeffrey E Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136169#comment-17136169
 ] 

Jeffrey E  Rodriguez commented on SPARK-14651:
--

This is showing as solve/incomplete. Trying to work with Spark 2.4 Cloudera and 
it is still an issue. I get message "

CREATE TEMPORARY TABLE is not supported yet. Please use CREATE TEMPORARY VIEW 
as an alternative."

Is that the fix?? Use TEMPARY VIEW??

> CREATE TEMPORARY TABLE is not supported yet
> ---
>
> Key: SPARK-14651
> URL: https://issues.apache.org/jira/browse/SPARK-14651
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>  Labels: bulk-closed
>
> With today's master it seems that {{CREATE TEMPORARY TABLE}} may or may not 
> work depending on how complete the DDL is (?)
> {code}
> scala> sql("CREATE temporary table t2")
> 16/04/14 23:29:26 INFO HiveSqlParser: Parsing command: CREATE temporary table 
> t2
> org.apache.spark.sql.catalyst.parser.ParseException:
> CREATE TEMPORARY TABLE is not supported yet. Please use registerTempTable as 
> an alternative.(line 1, pos 0)
> == SQL ==
> CREATE temporary table t2
> ^^^
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder$$anonfun$visitCreateTable$1.apply(HiveSqlParser.scala:169)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder$$anonfun$visitCreateTable$1.apply(HiveSqlParser.scala:165)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:85)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder.visitCreateTable(HiveSqlParser.scala:165)
>   at 
> org.apache.spark.sql.hive.execution.HiveSqlAstBuilder.visitCreateTable(HiveSqlParser.scala:53)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:1049)
>   at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:63)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:63)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:85)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:62)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:86)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
>   at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:198)
>   at 
> org.apache.spark.sql.hive.HiveContext.org$apache$spark$sql$hive$HiveContext$$super$parseSql(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$parseSql$1.apply(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.HiveContext$$anonfun$parseSql$1.apply(HiveContext.scala:201)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:228)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:175)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:174)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:217)
>   at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:200)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:765)
>   ... 48 elided
> scala> sql("CREATE temporary table t2 USING PARQUET OPTIONS (PATH 'hello') AS 
> SELECT * FROM t1")
> 16/04/14 23:30:21 INFO HiveSqlParser: Parsing command: CREATE temporary table 
> t2 USING PARQUET OPTIONS (PATH 'hello') AS SELECT * FROM t1
> org.apache.spark.sql.AnalysisException: Table or View not found: t1; line 1 
> pos 80
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$getTable(Analyzer.scala:412)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:421)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:416)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:58)
>   at 
> org.apache.sp

[jira] [Commented] (SPARK-14948) Exception when joining DataFrames derived form the same DataFrame

2020-06-15 Thread John Born (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136157#comment-17136157
 ] 

John Born commented on SPARK-14948:
---

Alternative workaround (prettier code-wise, though probably worse resource-wise): 
create a new instance of df from the rdd *after* deriving agg_df.
{code:scala}
val rdd = sc.parallelize(List("a" -> 1, "b" -> 1, "a" -> 2))
val df = rdd.toDF(Seq("letter", "number"): _*)
val agg_df = 
df.groupBy("letter").agg(max("number")).withColumnRenamed("max(number)", "max")

// Error occurs:
agg_df.join(df, agg_df("letter") === df("letter") and agg_df("max") === 
df("number"), "inner").show()

// Re-create df instance:
val df = rdd.toDF(Seq("letter", "number"): _*)

// No error, exact same code that caused above error:
agg_df.join(df, agg_df("letter") === df("letter") and agg_df("max") === 
df("number"), "inner").show(){code}
 

> Exception when joining DataFrames derived from the same DataFrame
> -
>
> Key: SPARK-14948
> URL: https://issues.apache.org/jira/browse/SPARK-14948
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Saurabh Santhosh
>Priority: Major
>
> h2. The Spark analyzer throws the following exception in a specific scenario:
> h2. Exception:
> org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing 
> from asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> h2. Code:
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> JavaRDD<Row> rdd =
> 
> sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a",
>  "b")));
> DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new 
> StructType(fields));
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as 
> asd, F2 from t1");
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, 
> "t2");
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");
> 
> DataFrame join = aliasedDf.join(df, 
> aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
> DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
> select.collect();
> {code}
> h2. Observations:
> * This issue is related to the data type of the fields of the initial 
> DataFrame (if the data type is not String, it works).
> * It works fine if the DataFrame is registered as a temporary table and a SQL 
> query (select a.asd, b.F1 from t2 a inner join t3 b on a.F2 = b.F2) is used 
> instead.
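A minimal sketch of the usual alias-based workaround for this kind of ambiguous self-join (my own illustration, written against the Scala API rather than the Java code above; not from the original report):

{code:scala}
import org.apache.spark.sql.functions.col

// Give each side of the join an explicit alias so column references resolve
// unambiguously, even though both sides derive from the same DataFrame.
val a = aliasedDf.alias("a")   // the "select F1 as asd, F2 from t1" side
val b = df.alias("b")          // the original DataFrame

val joined = a.join(b, col("a.F2") === col("b.F2"), "inner")
joined.select(col("a.asd"), col("b.F1")).show()
{code}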



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30520) Eliminate deprecation warnings for UserDefinedAggregateFunction

2020-06-15 Thread Erik Erlandson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136153#comment-17136153
 ] 

Erik Erlandson commented on SPARK-30520:


Starting in Spark 3.0, any custom aggregator that would have been implemented 
using UserDefinedAggregateFunction should now be implemented using Aggregator. 
To use a custom Aggregator with a dynamically typed DataFrame (i.e. 
Dataset[Row]), register it using org.apache.spark.sql.functions.udaf.
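For reference, a minimal sketch of that pattern (illustrative names, assuming the standard Spark 3.0 Scala API; simplified rather than taken from this ticket):

{code:scala}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// Typed replacement for a UserDefinedAggregateFunction-style "sum of doubles".
object DoubleSum extends Aggregator[Double, Double, Double] {
  def zero: Double = 0.0
  def reduce(buffer: Double, input: Double): Double = buffer + input
  def merge(b1: Double, b2: Double): Double = b1 + b2
  def finish(buffer: Double): Double = buffer
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

val spark = SparkSession.builder().getOrCreate()
// Register it for use on untyped DataFrames and in SQL expressions.
spark.udf.register("my_double_sum", udaf(DoubleSum))
spark.range(5).selectExpr("my_double_sum(CAST(id AS DOUBLE))").show()
{code}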

 

> Eliminate deprecation warnings for UserDefinedAggregateFunction
> ---
>
> Key: SPARK-30520
> URL: https://issues.apache.org/jira/browse/SPARK-30520
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> {code}
> /Users/maxim/proj/eliminate-expr-info-warnings/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala
> Warning:Warning:line (718)class UserDefinedAggregateFunction in package 
> expressions is deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now 
> be registered as a UDF via the functions.udaf(agg) method.
>   val udaf = 
> clazz.getConstructor().newInstance().asInstanceOf[UserDefinedAggregateFunction]
> Warning:Warning:line (719)method register in class UDFRegistration is 
> deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now be registered 
> as a UDF via the functions.udaf(agg) method.
>   register(name, udaf)
> /Users/maxim/proj/eliminate-expr-info-warnings/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala
> Warning:Warning:line (328)class UserDefinedAggregateFunction in package 
> expressions is deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now 
> be registered as a UDF via the functions.udaf(agg) method.
> udaf: UserDefinedAggregateFunction,
> Warning:Warning:line (326)class UserDefinedAggregateFunction in package 
> expressions is deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now 
> be registered as a UDF via the functions.udaf(agg) method.
> case class ScalaUDAF(
> /Users/maxim/proj/eliminate-expr-info-warnings/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala
> Warning:Warning:line (363)class UserDefinedAggregateFunction in package 
> expressions is deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now 
> be registered as a UDF via the functions.udaf(agg) method.
> val udaf = new UserDefinedAggregateFunction {
> /Users/maxim/proj/eliminate-expr-info-warnings/sql/core/src/test/java/test/org/apache/spark/sql/MyDoubleSum.java
> Warning:Warning:line (25)java: 
> org.apache.spark.sql.expressions.UserDefinedAggregateFunction in 
> org.apache.spark.sql.expressions has been deprecated
> Warning:Warning:line (35)java: 
> org.apache.spark.sql.expressions.UserDefinedAggregateFunction in 
> org.apache.spark.sql.expressions has been deprecated
> /Users/maxim/proj/eliminate-expr-info-warnings/sql/core/src/test/java/test/org/apache/spark/sql/MyDoubleAvg.java
> Warning:Warning:line (25)java: 
> org.apache.spark.sql.expressions.UserDefinedAggregateFunction in 
> org.apache.spark.sql.expressions has been deprecated
> Warning:Warning:line (36)java: 
> org.apache.spark.sql.expressions.UserDefinedAggregateFunction in 
> org.apache.spark.sql.expressions has been deprecated
> /Users/maxim/proj/eliminate-expr-info-warnings/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
> Warning:Warning:line (36)class UserDefinedAggregateFunction in package 
> expressions is deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now 
> be registered as a UDF via the functions.udaf(agg) method.
> class ScalaAggregateFunction(schema: StructType) extends 
> UserDefinedAggregateFunction {
> Warning:Warning:line (73)class UserDefinedAggregateFunction in package 
> expressions is deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now 
> be registered as a UDF via the functions.udaf(agg) method.
> class ScalaAggregateFunctionWithoutInputSchema extends 
> UserDefinedAggregateFunction {
> Warning:Warning:line (100)class UserDefinedAggregateFunction in package 
> expressions is deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now 
> be registered as a UDF via the functions.udaf(agg) method.
> class LongProductSum extends UserDefinedAggregateFunction {
> Warning:Warning:line (189)method register in class UDFRegistration is 
> deprecated (since 3.0.0): Aggregator[IN, BUF, OUT] should now be registered 
> as a UDF via the functions.udaf(agg) method.
> spark.udf.register("mydoublesum", new MyDoubleSum)
> Warning:Warning:line (190)method register in class UDFRegistration is 
> deprecated (since 3.0.0): Aggregator[IN, BUF, OUT

[jira] [Resolved] (SPARK-31824) DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully

2020-06-15 Thread Xingbo Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xingbo Jiang resolved SPARK-31824.
--
Fix Version/s: 3.1.0
 Assignee: jiaan.geng
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28641

> DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully
> 
>
> Key: SPARK-31824
> URL: https://issues.apache.org/jira/browse/SPARK-31824
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.1.0
>
>
> DAGSchedulerSuite provides completeShuffleMapStageSuccessfully to complete a 
> ShuffleMapStage successfully.
> But many test cases call complete directly, as follows:
> complete(taskSets(0), Seq((Success, makeMapStatus("hostA", 1
> We need to improve completeShuffleMapStageSuccessfully and reuse it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31976) use MemoryUsage to control the size of block

2020-06-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136103#comment-17136103
 ] 

Dongjoon Hyun commented on SPARK-31976:
---

Thank you for updating, [~smilegator]. :)

> use MemoryUsage to control the size of block
> 
>
> Key: SPARK-31976
> URL: https://issues.apache.org/jira/browse/SPARK-31976
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Major
>
> According to the performance test in 
> https://issues.apache.org/jira/browse/SPARK-31783, the performance gain is 
> mainly related to the nnz (number of non-zero values) of a block.
> So it may be reasonable to control the size of a block by memory usage, 
> instead of by number of rows.
>  
> note1: param blockSize is already used in ALS and MLP to stack vectors 
> (expected to be dense);
> note2: we may refer to the {{Strategy.maxMemoryInMB}} in tree models;
>  
> There may be two ways to implement this:
> 1, compute the sparsity of the input vectors ahead of training (this can be 
> computed together with other statistics, maybe with no extra pass), and infer 
> a reasonable number of vectors to stack;
> 2, stack the input vectors adaptively, by monitoring the memory usage in a 
> block;
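A rough sketch of the second option (my own illustration, not code from this ticket): stack incoming vectors until an estimated per-block memory budget is reached, estimating sparse vectors at roughly 12 bytes per stored non-zero and dense vectors at 8 bytes per slot.

{code:scala}
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.ml.linalg.Vector

// Group an iterator of vectors into blocks bounded by an approximate byte budget.
def stackByMemory(vectors: Iterator[Vector], maxBlockBytes: Long): Iterator[Array[Vector]] =
  new Iterator[Array[Vector]] {
    override def hasNext: Boolean = vectors.hasNext
    override def next(): Array[Vector] = {
      val block = ArrayBuffer.empty[Vector]
      var bytes = 0L
      while (vectors.hasNext && (block.isEmpty || bytes < maxBlockBytes)) {
        val v = vectors.next()
        // Crude size estimate driven by nnz, per the observation above.
        bytes += (if (v.numNonzeros * 2 < v.size) 12L * v.numNonzeros else 8L * v.size)
        block += v
      }
      block.toArray
    }
  }

// Usage sketch: dataset.rdd.mapPartitions(it => stackByMemory(it, 64L * 1024 * 1024))
{code}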



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31976) use MemoryUsage to control the size of block

2020-06-15 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136102#comment-17136102
 ] 

Xiao Li commented on SPARK-31976:
-

I think [~podongfeng] just wants to target this feature in Spark 3.1 instead of 
treating it as a blocker. 

Made the change. Feel free to discuss it here, if this is not what you want. 

> use MemoryUsage to control the size of block
> 
>
> Key: SPARK-31976
> URL: https://issues.apache.org/jira/browse/SPARK-31976
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Major
>
> According to the performance test in 
> https://issues.apache.org/jira/browse/SPARK-31783, the performance gain is 
> mainly related to the nnz (number of non-zero values) of a block.
> So it may be reasonable to control the size of a block by memory usage, 
> instead of by number of rows.
>  
> note1: param blockSize is already used in ALS and MLP to stack vectors 
> (expected to be dense);
> note2: we may refer to the {{Strategy.maxMemoryInMB}} in tree models;
>  
> There may be two ways to implement this:
> 1, compute the sparsity of the input vectors ahead of training (this can be 
> computed together with other statistics, maybe with no extra pass), and infer 
> a reasonable number of vectors to stack;
> 2, stack the input vectors adaptively, by monitoring the memory usage in a 
> block;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31976) use MemoryUsage to control the size of block

2020-06-15 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-31976:

Target Version/s: 3.1.0
Priority: Major  (was: Blocker)

> use MemoryUsage to control the size of block
> 
>
> Key: SPARK-31976
> URL: https://issues.apache.org/jira/browse/SPARK-31976
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Major
>
> According to the performance test in 
> https://issues.apache.org/jira/browse/SPARK-31783, the performance gain is 
> mainly related to the nnz (number of non-zero values) of a block.
> So it may be reasonable to control the size of a block by memory usage, 
> instead of by number of rows.
>  
> note1: param blockSize is already used in ALS and MLP to stack vectors 
> (expected to be dense);
> note2: we may refer to the {{Strategy.maxMemoryInMB}} in tree models;
>  
> There may be two ways to implement this:
> 1, compute the sparsity of the input vectors ahead of training (this can be 
> computed together with other statistics, maybe with no extra pass), and infer 
> a reasonable number of vectors to stack;
> 2, stack the input vectors adaptively, by monitoring the memory usage in a 
> block;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-31902) K8s integration tests are failing with mockito errors

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-31902:
---

> K8s integration tests are failing with mockito errors
> -
>
> Key: SPARK-31902
> URL: https://issues.apache.org/jira/browse/SPARK-31902
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Priority: Blocker
>
> The K8s integration tests are failing consistently with Mockito errors. 
> Looking at the error, it seems we may have inadvertently (or 
> intentionally) bumped our Mockito or Byte Buddy versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31902) K8s integration tests are failing with mockito errors

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31902.
---
Resolution: Duplicate

> K8s integration tests are failing with mockito errors
> -
>
> Key: SPARK-31902
> URL: https://issues.apache.org/jira/browse/SPARK-31902
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Priority: Blocker
>
> The K8s integration tests are failing consistently with Mockito errors. 
> Looking at the error, it seems we may have inadvertently (or 
> intentionally) bumped our Mockito or Byte Buddy versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31976) use MemoryUsage to control the size of block

2020-06-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136093#comment-17136093
 ] 

Dongjoon Hyun commented on SPARK-31976:
---

Hi, [~podongfeng]. Could you give us more context on why this is a blocker 
issue, please?

(cc [~cloud_fan] and [~smilegator])

> use MemoryUsage to control the size of block
> 
>
> Key: SPARK-31976
> URL: https://issues.apache.org/jira/browse/SPARK-31976
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Blocker
>
> According to the performance test in 
> https://issues.apache.org/jira/browse/SPARK-31783, the performance gain is 
> mainly related to the nnz (number of non-zero values) of a block.
> So it may be reasonable to control the size of a block by memory usage, 
> instead of by number of rows.
>  
> note1: param blockSize is already used in ALS and MLP to stack vectors 
> (expected to be dense);
> note2: we may refer to the {{Strategy.maxMemoryInMB}} in tree models;
>  
> There may be two ways to implement this:
> 1, compute the sparsity of the input vectors ahead of training (this can be 
> computed together with other statistics, maybe with no extra pass), and infer 
> a reasonable number of vectors to stack;
> 2, stack the input vectors adaptively, by monitoring the memory usage in a 
> block;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31994) Docker image should use `https` urls for only mirrors that support it(SSL)

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31994:
--
Priority: Minor  (was: Major)

> Docker image should use `https` urls for only mirrors that support it(SSL)
> --
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Minor
> Fix For: 3.0.1, 3.1.0
>
>
> It appears that security.debian.org does not support https.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the https support to only deb.debian.org, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31994) Docker image should use `https` urls for only mirrors that support it(SSL)

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31994:
-

Assignee: Prashant Sharma

> Docker image should use `https` urls for only mirrors that support it(SSL)
> --
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
>
> It appears that security.debian.org does not support https.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the https support to only deb.debian.org, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31994) Docker image should use `https` urls for only mirrors that support it(SSL)

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31994.
---
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 28834
[https://github.com/apache/spark/pull/28834]

> Docker image should use `https` urls for only mirrors that support it(SSL)
> --
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> It appears that security.debian.org does not support https.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the https support to only deb.debian.org, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31926) Fix concurrency issue for ThriftCLIService to getPortNumber

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135996#comment-17135996
 ] 

Apache Spark commented on SPARK-31926:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28835

> Fix concurrency issue for ThriftCLIService to getPortNumber
> ---
>
> Key: SPARK-31926
> URL: https://issues.apache.org/jira/browse/SPARK-31926
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> When 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2#startWithContext is 
> called, it starts ThriftCLIService in the background on a new Thread. If we 
> call ThriftCLIService.getPortNumber at the same time, we might not get the 
> bound port if it is configured with 0. 
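An illustrative sketch of one way to avoid such a race (my own example, not necessarily the approach taken in the linked PR): poll until the service reports a real bound port instead of reading it immediately after the background thread starts.

{code:scala}
// Wait for a background Thrift service to bind its port before reporting it.
// `getPort` is a placeholder for something like `() => thriftCLIService.getPortNumber`.
def awaitBoundPort(getPort: () => Int, timeoutMs: Long = 10000L): Int = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var port = getPort()
  while (port <= 0 && System.currentTimeMillis() < deadline) {
    Thread.sleep(50)  // back off briefly, then re-check
    port = getPort()
  }
  require(port > 0, s"Server did not bind a port within ${timeoutMs}ms")
  port
}
{code}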



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31860) Only push release tags on success

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31860:
--
Affects Version/s: (was: 3.0.1)

> Only push release tags on success
> -
>
> Key: SPARK-31860
> URL: https://issues.apache.org/jira/browse/SPARK-31860
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.0.0, 2.4.7
>
>
> Sometimes the build fails during a release; we shouldn't push the RC tag in 
> those situations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31860) Only push release tags on success

2020-06-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135983#comment-17135983
 ] 

Dongjoon Hyun edited comment on SPARK-31860 at 6/15/20, 4:10 PM:
-

This is reverted from master/3.0 due to the regression on `branch-3.0`.

- 
https://github.com/apache/spark/commit/d3a5e2963cca58cc464d6ed779bb9fb649a614a8
- 
https://github.com/apache/spark/commit/69ede588c9aa56e8d648c2837a537af138b479c7


was (Author: dongjoon):
This is reverted from master/3.0 due to the regression on `branch-3.0`.

- 
[https://github.com/apache/spark/commit/d3a5e2963cca58cc464d6ed779bb9fb649a614a8]

- 
https://github.com/apache/spark/commit/69ede588c9aa56e8d648c2837a537af138b479c7

> Only push release tags on success
> -
>
> Key: SPARK-31860
> URL: https://issues.apache.org/jira/browse/SPARK-31860
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.6, 3.0.0, 3.0.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.0.0, 2.4.7
>
>
> Sometimes the build fails during a release; we shouldn't push the RC tag in 
> those situations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31860) Only push release tags on success

2020-06-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135983#comment-17135983
 ] 

Dongjoon Hyun edited comment on SPARK-31860 at 6/15/20, 4:10 PM:
-

This is reverted from master/3.0 due to the regression on `branch-3.0`.

- 
https://github.com/apache/spark/commit/d3a5e2963cca58cc464d6ed779bb9fb649a614a8

- 
https://github.com/apache/spark/commit/69ede588c9aa56e8d648c2837a537af138b479c7


was (Author: dongjoon):
This is reverted from master/3.0 due to the regression on `branch-3.0`.

- 
https://github.com/apache/spark/commit/d3a5e2963cca58cc464d6ed779bb9fb649a614a8
- 
https://github.com/apache/spark/commit/69ede588c9aa56e8d648c2837a537af138b479c7

> Only push release tags on success
> -
>
> Key: SPARK-31860
> URL: https://issues.apache.org/jira/browse/SPARK-31860
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.6, 3.0.0, 3.0.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.0.0, 2.4.7
>
>
> Sometimes the build fails during a release; we shouldn't push the RC tag in 
> those situations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31860) Only push release tags on success

2020-06-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135983#comment-17135983
 ] 

Dongjoon Hyun commented on SPARK-31860:
---

This is reverted from master/3.0 due to the regression on `branch-3.0`.

- 
[https://github.com/apache/spark/commit/d3a5e2963cca58cc464d6ed779bb9fb649a614a8]

- 
https://github.com/apache/spark/commit/69ede588c9aa56e8d648c2837a537af138b479c7

> Only push release tags on success
> -
>
> Key: SPARK-31860
> URL: https://issues.apache.org/jira/browse/SPARK-31860
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.6, 3.0.0, 3.0.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.0.0, 2.4.7
>
>
> Sometimes the build fails during a release; we shouldn't push the RC tag in 
> those situations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31860) Only push release tags on success

2020-06-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135983#comment-17135983
 ] 

Dongjoon Hyun edited comment on SPARK-31860 at 6/15/20, 4:10 PM:
-

This is reverted from master/3.0 due to the regression on `branch-3.0`.
 - 
[https://github.com/apache/spark/commit/d3a5e2963cca58cc464d6ed779bb9fb649a614a8]
 
 - 
[https://github.com/apache/spark/commit/69ede588c9aa56e8d648c2837a537af138b479c7]


was (Author: dongjoon):
This is reverted from master/3.0 due to the regression on `branch-3.0`.

- 
https://github.com/apache/spark/commit/d3a5e2963cca58cc464d6ed779bb9fb649a614a8

- 
https://github.com/apache/spark/commit/69ede588c9aa56e8d648c2837a537af138b479c7

> Only push release tags on success
> -
>
> Key: SPARK-31860
> URL: https://issues.apache.org/jira/browse/SPARK-31860
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.6, 3.0.0, 3.0.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.0.0, 2.4.7
>
>
> Sometimes the build fails during a release; we shouldn't push the RC tag in 
> those situations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31860) Only push release tags on success

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31860:
--
Fix Version/s: (was: 3.0.1)
   (was: 3.1.0)

> Only push release tags on success
> -
>
> Key: SPARK-31860
> URL: https://issues.apache.org/jira/browse/SPARK-31860
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.6, 3.0.0, 3.0.1
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.0.0, 2.4.7
>
>
> Sometimes the build fails during a release; we shouldn't push the RC tag in 
> those situations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26905) Revisit reserved/non-reserved keywords based on the ANSI SQL standard

2020-06-15 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-26905.
--
Fix Version/s: 3.0.1
 Assignee: Takeshi Yamamuro
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28807

> Revisit reserved/non-reserved keywords based on the ANSI SQL standard
> -
>
> Key: SPARK-26905
> URL: https://issues.apache.org/jira/browse/SPARK-26905
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.0.1
>
> Attachments: spark-ansiNonReserved.txt, spark-keywords-list.txt, 
> spark-nonReserved.txt, spark-strictNonReserved.txt, 
> sql2016-02-nonreserved.txt, sql2016-02-reserved.txt, 
> sql2016-09-nonreserved.txt, sql2016-09-reserved.txt, 
> sql2016-14-nonreserved.txt, sql2016-14-reserved.txt
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31950) Extract SQL keywords from the generated parser class in TableIdentifierParserSuite

2020-06-15 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-31950:
-
Fix Version/s: (was: 3.1.0)
   3.0.1

> Extract SQL keywords from the generated parser class in 
> TableIdentifierParserSuite
> --
>
> Key: SPARK-31950
> URL: https://issues.apache.org/jira/browse/SPARK-31950
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31959) Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian to Julian"

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31959.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28832
[https://github.com/apache/spark/pull/28832]

> Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian 
> to Julian"
> 
>
> Key: SPARK-31959
> URL: https://issues.apache.org/jira/browse/SPARK-31959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> See 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123688/testReport/org.apache.spark.sql.catalyst.util/RebaseDateTimeSuite/optimization_of_micros_rebasing___Gregorian_to_Julian/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31959) Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian to Julian"

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31959:
-

Assignee: Maxim Gekk

> Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian 
> to Julian"
> 
>
> Key: SPARK-31959
> URL: https://issues.apache.org/jira/browse/SPARK-31959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> See 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123688/testReport/org.apache.spark.sql.catalyst.util/RebaseDateTimeSuite/optimization_of_micros_rebasing___Gregorian_to_Julian/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31990) Streaming's state store compatibility is broken

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31990.
---
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 28830
[https://github.com/apache/spark/pull/28830]

> Streaming's state store compatibility is broken
> ---
>
> Key: SPARK-31990
> URL: https://issues.apache.org/jira/browse/SPARK-31990
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Takeshi Yamamuro
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.0.1, 3.1.0
>
>
> [This 
> line|https://github.com/apache/spark/pull/28062/files#diff-7a46f10c3cedbf013cf255564d9483cdR2458]
>  of [https://github.com/apache/spark/pull/28062] changed the order of 
> groupCols in dropDuplicates(). Thus, the executor JVM could crash, throw a 
> random exception, or even return a wrong answer when using a checkpoint 
> written by the previous version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31990) Streaming's state store compatibility is broken

2020-06-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31990:
-

Assignee: Takeshi Yamamuro

> Streaming's state store compatibility is broken
> ---
>
> Key: SPARK-31990
> URL: https://issues.apache.org/jira/browse/SPARK-31990
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Takeshi Yamamuro
>Priority: Blocker
>  Labels: correctness
>
> [This 
> line|https://github.com/apache/spark/pull/28062/files#diff-7a46f10c3cedbf013cf255564d9483cdR2458]
>  of [https://github.com/apache/spark/pull/28062] changed the order of 
> groupCols in dropDuplicates(). Thus, the executor JVM could crash, throw a 
> random exception, or even return a wrong answer when using a checkpoint 
> written by the previous version. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31925) Summary.totalIterations greater than maxIters

2020-06-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-31925.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28786

> Summary.totalIterations greater than maxIters
> -
>
> Key: SPARK-31925
> URL: https://issues.apache.org/jira/browse/SPARK-31925
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.6, 3.0.0
>Reporter: zhengruifeng
>Assignee: Huaxin Gao
>Priority: Minor
>  Labels: release-notes
> Fix For: 3.1.0
>
>
> I am not sure whether it is a bug, but if we set *maxIter=n* in LiR/LoR/etc, 
> then model.summary.totalIterations will return *n+1* if the training 
> procedure does not stop early.
>  
> friendly ping [~huaxingao]
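A minimal reproduction sketch (my own example, assuming the Scala ML API with the l-bfgs solver; the tiny dataset is made up):

{code:scala}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

val df = Seq((1.0, Vectors.dense(0.0)), (2.0, Vectors.dense(1.0)), (3.0, Vectors.dense(2.0)))
  .toDF("label", "features")

val model = new LinearRegression().setSolver("l-bfgs").setMaxIter(5).fit(df)
// On affected versions this can print 6 (maxIter + 1) when training runs the
// full maxIter iterations; after the fix it should not exceed 5.
println(model.summary.totalIterations)
{code}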



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31925) Summary.totalIterations greater than maxIters

2020-06-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-31925:
-
Priority: Minor  (was: Trivial)

> Summary.totalIterations greater than maxIters
> -
>
> Key: SPARK-31925
> URL: https://issues.apache.org/jira/browse/SPARK-31925
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.6, 3.0.0
>Reporter: zhengruifeng
>Assignee: Huaxin Gao
>Priority: Minor
>  Labels: release-notes
>
> I am not sure whether it is a bug, but if we set *maxIter=n* in LiR/LoR/etc, 
> then model.summary.totalIterations will return *n+1* if the training 
> procedure does not stop early.
>  
> friendly ping [~huaxingao]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31925) Summary.totalIterations greater than maxIters

2020-06-15 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-31925:


Docs Text: Before Spark 3.1, if maxIter=n in LogisticRegression and 
LinearRegression, the result summary's totalIterations returns n+1. In 3.1 it 
now correctly returns n. The objective history is unchanged.
 Assignee: Huaxin Gao
   Labels: release-notes  (was: )

Just for completeness, I'll add release notes for this change, though it's 
essentially a bug fix.

> Summary.totalIterations greater than maxIters
> -
>
> Key: SPARK-31925
> URL: https://issues.apache.org/jira/browse/SPARK-31925
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.6, 3.0.0
>Reporter: zhengruifeng
>Assignee: Huaxin Gao
>Priority: Trivial
>  Labels: release-notes
>
> I am not sure whether it is a bug, but if we set *maxIter=n* in LiR/LoR/etc, 
> then model.summary.totalIterations will return *n+1* if the training 
> procedure does not stop early.
>  
> friendly ping [~huaxingao]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31994) Docker image should use `https` urls for only mirrors that support it(SSL)

2020-06-15 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31994:

Summary: Docker image should use `https` urls for only mirrors that support 
it(SSL)  (was: Docker image should use `https` urls for only deb.debian.org 
mirrors.)

> Docker image should use `https` urls for only mirrors that support it(SSL)
> --
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> It appears that security.debian.org does not support https.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the https support to only deb.debian.org, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31994) Docker image should use `https` urls for only deb.debian.org mirrors.

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135716#comment-17135716
 ] 

Apache Spark commented on SPARK-31994:
--

User 'ScrapCodes' has created a pull request for this issue:
https://github.com/apache/spark/pull/28834

> Docker image should use `https` urls for only deb.debian.org mirrors.
> -
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> It appears that security.debian.org does not support https.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the https support to only deb.debian.org, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31994) Docker image should use `https` urls for only deb.debian.org mirrors.

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135714#comment-17135714
 ] 

Apache Spark commented on SPARK-31994:
--

User 'ScrapCodes' has created a pull request for this issue:
https://github.com/apache/spark/pull/28834

> Docker image should use `https` urls for only deb.debian.org mirrors.
> -
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> It appears that security.debian.org does not support https.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the switch to HTTPS to only the deb.debian.org mirrors, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31994) Docker image should use `https` urls for only deb.debian.org mirrors.

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31994:


Assignee: Apache Spark

> Docker image should use `https` urls for only deb.debian.org mirrors.
> -
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Assignee: Apache Spark
>Priority: Major
>
> It appears that security.debian.org does not support HTTPS.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the switch to HTTPS to only the deb.debian.org mirrors, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31994) Docker image should use `https` urls for only deb.debian.org mirrors.

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31994:


Assignee: (was: Apache Spark)

> Docker image should use `https` urls for only deb.debian.org mirrors.
> -
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> It appears that security.debian.org does not support HTTPS.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the switch to HTTPS to only the deb.debian.org mirrors, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31994) Docker image should use `https` urls for only deb.debian.org mirrors.

2020-06-15 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-31994:
---

 Summary: Docker image should use `https` urls for only 
deb.debian.org mirrors.
 Key: SPARK-31994
 URL: https://issues.apache.org/jira/browse/SPARK-31994
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.0, 3.1.0
Reporter: Prashant Sharma


It appears that security.debian.org does not support HTTPS.
{code}
curl https://security.debian.org
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
security.debian.org:443 
{code}

While building the image, it fails in the following way.
{code}
MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
v3.1.0-1 build
Sending build context to Docker daemon  222.1MB
Step 1/18 : ARG java_image_tag=8-jre-slim
Step 2/18 : FROM openjdk:${java_image_tag}
 ---> 381b20190cf7
Step 3/18 : ARG spark_uid=185
 ---> Using cache
 ---> 65c06f86753c
Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' /etc/apt/sources.list 
&& apt-get update && ln -s /lib /lib64 && apt install -y bash tini 
libc6 libpam-modules krb5-user libnss3 procps && mkdir -p /opt/spark && 
mkdir -p /opt/spark/examples && mkdir -p /opt/spark/work-dir && touch 
/opt/spark/RELEASE && rm /bin/sh && ln -sv /bin/bash /bin/sh && 
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && chgrp root 
/etc/passwd && chmod ug+rw /etc/passwd && rm -rf /var/cache/apt/*
 ---> Running in a3461dadd6eb
+ sed -i s/http:/https:/g /etc/apt/sources.list
+ apt-get update
Ign:1 https://security.debian.org/debian-security buster/updates InRelease
Err:2 https://security.debian.org/debian-security buster/updates Release
  Could not handshake: The TLS connection was non-properly terminated. [IP: 
151.101.0.204 443]
Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 B]
Reading package lists...
E: The repository 'https://security.debian.org/debian-security buster/updates 
Release' does not have a Release file.
The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
/etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && apt 
install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && mkdir 
-p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
/opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln 
-sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
/etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
rm -rf /var/cache/apt/*' returned a non-zero code: 100
Failed to build Spark JVM Docker image, please refer to Docker build output for 
details.
{code}

So, if we limit the switch to HTTPS to only the deb.debian.org mirrors, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31905) Add compatibility tests for streaming state store format

2020-06-15 Thread Yuanjian Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanjian Li updated SPARK-31905:

Summary: Add compatibility tests for streaming state store format  (was: 
Add compatibility tests for streaming aggregation state store format)

> Add compatibility tests for streaming state store format
> 
>
> Key: SPARK-31905
> URL: https://issues.apache.org/jira/browse/SPARK-31905
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Yuanjian Li
>Priority: Major
>
> After SPARK-31894, we have a validation check for the streaming state store. 
> It's better to add integration tests to the PR builder as soon as breaking 
> changes are introduced.
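
A hedged illustration (not taken from the ticket itself) of the kind of stateful query 
whose on-disk state format such compatibility tests would pin down, assuming a running 
SparkSession `spark`, the built-in rate source, and an arbitrary checkpoint path:

{code:scala}
import org.apache.spark.sql.functions._

// A streaming aggregation keeps its intermediate state in the state store under the
// checkpoint location, so that binary format must stay readable across releases.
val counts = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "10")
  .load()
  .groupBy(window(col("timestamp"), "10 seconds"))
  .count()

val query = counts.writeStream
  .outputMode("update")
  .format("console")
  .option("checkpointLocation", "/tmp/state-format-compat-check") // illustrative path
  .start()

query.awaitTermination(10000) // run briefly, then the checkpoint can be reused by a newer build
{code}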



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29345) Add an API that allows a user to define and observe arbitrary metrics on batch and streaming queries

2020-06-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-29345:

Summary: Add an API that allows a user to define and observe arbitrary 
metrics on batch and streaming queries  (was: Add an API that allows a user to 
define and observe arbitrary metrics on streaming queries)

> Add an API that allows a user to define and observe arbitrary metrics on 
> batch and streaming queries
> 
>
> Key: SPARK-29345
> URL: https://issues.apache.org/jira/browse/SPARK-29345
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
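
For context, a minimal sketch of how an observable-metrics API of this shape is used from 
Scala, assuming a running SparkSession `spark`; the metric name and the listener below are 
illustrative only:

{code:scala}
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.functions._
import org.apache.spark.sql.util.QueryExecutionListener

// For batch queries the named metrics surface through a QueryExecutionListener;
// for streaming queries the same observe() call surfaces them in
// StreamingQueryProgress.observedMetrics instead.
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    qe.observedMetrics.get("my_metrics").foreach(row => println(s"observed: $row"))
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
})

// Attach arbitrary aggregate metrics to the query without changing its result.
val df = spark.range(0, 100)
  .observe("my_metrics", count(lit(1)).as("rowCount"), sum(col("id")).as("idSum"))
df.collect()
{code}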




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31887) Date casting to string is giving wrong value

2020-06-15 Thread Amit Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135638#comment-17135638
 ] 

Amit Gupta commented on SPARK-31887:


[~hyukjin.kwon] I will deploy master on the server and reconfirm whether this is fixed.

 

> Date casting to string is giving wrong value
> 
>
> Key: SPARK-31887
> URL: https://issues.apache.org/jira/browse/SPARK-31887
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.5
> Environment: The spark is running on cluster mode with Mesos.
>  
> Mesos agents are dockerised and run on Ubuntu 18.
>  
> Timezone setting of docker instance: UTC
> Timezone of server hosting docker: America/New_York
> Timezone of driver machine: America/New_York
>Reporter: Amit Gupta
>Priority: Major
>
> The code converts the string to a date and then writes it to CSV.
> {code:java}
> val x = Seq(("2020-02-19", "2020-02-19 05:11:00")).toDF("a", 
> "b").select('a.cast("date"), 'b.cast("timestamp"))
> x.show()
> +----------+-------------------+
> |         a|                  b|
> +----------+-------------------+
> |2020-02-19|2020-02-19 05:11:00|
> +----------+-------------------+
> x.write.mode("overwrite").option("header", true).csv("/tmp/test1.csv")
> {code}
>  
> The date written to the CSV file is different:
> {code:java}
> > snakebite cat "/tmp/test1.csv/*.csv"
> a,b
> 2020-02-18,2020-02-19T05:11:00.000Z{code}
>  
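
As a hedged sketch, one way to make the written values deterministic (the session time 
zone config below is standard; whether it fully explains the off-by-one day also depends 
on the JVM default time zone of the driver and executors):

{code:scala}
import org.apache.spark.sql.functions.col
import spark.implicits._

// Pin the session time zone so date/timestamp-to-string conversion during the CSV write
// does not depend on wherever the driver and executors happen to run.
spark.conf.set("spark.sql.session.timeZone", "UTC")

val x = Seq(("2020-02-19", "2020-02-19 05:11:00")).toDF("a", "b")
  .select(col("a").cast("date"), col("b").cast("timestamp"))

x.write.mode("overwrite").option("header", true).csv("/tmp/test1_utc.csv")
{code}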



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31985) Remove the incomplete code path on aggregation for continuous mode

2020-06-15 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135637#comment-17135637
 ] 

Gabor Somogyi commented on SPARK-31985:
---

+1 on this unless somebody wants to continue the work. Maintaining code paths that are 
unfinished and not working is an additional burden.

> Remove the incomplete code path on aggregation for continuous mode
> --
>
> Key: SPARK-31985
> URL: https://issues.apache.org/jira/browse/SPARK-31985
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> While working on this, I realized we left some incomplete code paths for 
> supporting aggregation in continuous mode (shuffle & coalesce).
>  
> The work took place around the first half of 2018 and then stopped; nothing has 
> been done for around two years, so I don't expect anyone is still working on this.
>  
> The functionality is undocumented (as the work was only partially done) and 
> continuous mode is experimental, so I see little risk in removing this part.
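
A hedged sketch of the query shape this code path was meant to serve, assuming the 
built-in rate source and a console sink; once the partial shuffle/coalesce support is 
removed, a query like this should be rejected up front rather than run on an unfinished 
path:

{code:scala}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

// Aggregation under the continuous trigger is what the unfinished shuffle/coalesce
// code was intended to support.
val agg = spark.readStream
  .format("rate")
  .load()
  .groupBy(col("value") % 10)
  .count()

agg.writeStream
  .outputMode("update")
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()
{code}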



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20680) Spark-sql do not support for void column datatype of view

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135614#comment-17135614
 ] 

Apache Spark commented on SPARK-20680:
--

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/28833

> Spark-sql do not support for void column datatype of view
> -
>
> Key: SPARK-20680
> URL: https://issues.apache.org/jira/browse/SPARK-20680
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Lantao Jin
>Priority: Major
>  Labels: bulk-closed
>
> Create a HIVE view:
> {quote}
> hive> create table bad as select 1 x, null z from dual;
> {quote}
> Because there's no type, Hive gives it the VOID type:
> {quote}
> hive> describe bad;
> OK
> x int 
> z void
> {quote}
> In Spark 2.0.x, reading this view works as expected:
> {quote}
> spark-sql> describe bad;
> x   int NULL
> z   voidNULL
> Time taken: 4.431 seconds, Fetched 2 row(s)
> {quote}
> But in Spark 2.1.x, it fails with SparkException: Cannot recognize hive type 
> string: void
> {quote}
> spark-sql> describe bad;
> 17/05/09 03:12:08 INFO execution.SparkSqlParser: Parsing command: describe bad
> 17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: int
> 17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: void
> 17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
> org.apache.spark.SparkException: Cannot recognize hive type string: void
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>   
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>   
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
> Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
> DataType void() is not supported.(line 1, pos 0)
> == SQL ==  
> void   
> ^^^
> ... 61 more
> org.apache.spark.SparkException: Cannot recognize hive type string: void
> {quote}
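
As a hedged side note (not part of the original report), a similar shape can be built from 
Spark alone, since a bare NULL literal gets Spark's NullType, which Hive surfaces as void; 
the exact type name printed varies across versions:

{code:scala}
// A temporary view with an untyped NULL column; DESCRIBE should report the column's
// type (null/void) rather than fail to parse it.
spark.sql("CREATE OR REPLACE TEMPORARY VIEW bad AS SELECT 1 AS x, null AS z")
spark.sql("DESCRIBE TABLE bad").show(truncate = false)
{code}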



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20680) Spark-sql do not support for void column datatype of view

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135612#comment-17135612
 ] 

Apache Spark commented on SPARK-20680:
--

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/28833

> Spark-sql do not support for void column datatype of view
> -
>
> Key: SPARK-20680
> URL: https://issues.apache.org/jira/browse/SPARK-20680
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Lantao Jin
>Priority: Major
>  Labels: bulk-closed
>
> Create a HIVE view:
> {quote}
> hive> create table bad as select 1 x, null z from dual;
> {quote}
> Because there's no type, Hive gives it the VOID type:
> {quote}
> hive> describe bad;
> OK
> x int 
> z void
> {quote}
> In Spark 2.0.x, reading this view works as expected:
> {quote}
> spark-sql> describe bad;
> x   int NULL
> z   voidNULL
> Time taken: 4.431 seconds, Fetched 2 row(s)
> {quote}
> But in Spark 2.1.x, it fails with SparkException: Cannot recognize hive type 
> string: void
> {quote}
> spark-sql> describe bad;
> 17/05/09 03:12:08 INFO execution.SparkSqlParser: Parsing command: describe bad
> 17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: int
> 17/05/09 03:12:08 INFO parser.CatalystSqlParser: Parsing command: void
> 17/05/09 03:12:08 ERROR thriftserver.SparkSQLDriver: Failed in [describe bad]
> org.apache.spark.SparkException: Cannot recognize hive type string: void
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>   
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>   
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
> Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
> DataType void() is not supported.(line 1, pos 0)
> == SQL ==  
> void   
> ^^^
> ... 61 more
> org.apache.spark.SparkException: Cannot recognize hive type string: void
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31959) Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian to Julian"

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135602#comment-17135602
 ] 

Apache Spark commented on SPARK-31959:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28832

> Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian 
> to Julian"
> 
>
> Key: SPARK-31959
> URL: https://issues.apache.org/jira/browse/SPARK-31959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> See 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123688/testReport/org.apache.spark.sql.catalyst.util/RebaseDateTimeSuite/optimization_of_micros_rebasing___Gregorian_to_Julian/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31959) Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian to Julian"

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135600#comment-17135600
 ] 

Apache Spark commented on SPARK-31959:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28832

> Test failure "RebaseDateTimeSuite.optimization of micros rebasing - Gregorian 
> to Julian"
> 
>
> Key: SPARK-31959
> URL: https://issues.apache.org/jira/browse/SPARK-31959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> See 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123688/testReport/org.apache.spark.sql.catalyst.util/RebaseDateTimeSuite/optimization_of_micros_rebasing___Gregorian_to_Julian/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31993) Generated code in 'concat_ws' fails to compile when splitting method is in effect

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135580#comment-17135580
 ] 

Apache Spark commented on SPARK-31993:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/28831

> Generated code in 'concat_ws' fails to compile when splitting method is in 
> effect
> -
>
> Key: SPARK-31993
> URL: https://issues.apache.org/jira/browse/SPARK-31993
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0, 3.1.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> https://github.com/apache/spark/blob/a0187cd6b59a6b6bb2cadc6711bb663d4d35a844/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L88-L195
> There are three parts of generated code in concat_ws (codes, varargCounts, 
> varargBuilds), and each part tries to split methods on its own, while 
> `varargCounts` and `varargBuilds` refer to the generated code in `codes`; 
> hence the overall generated code fails to compile if any of the parts actually 
> gets split.
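
A hypothetical reproduction sketch (the exact argument count needed to trigger method 
splitting depends on codegen thresholds, and the vararg path is only taken when 
array<string> arguments are mixed in):

{code:scala}
import org.apache.spark.sql.functions._

// Mix plain string columns with array<string> columns so concat_ws takes the
// varargCounts/varargBuilds path, and pass enough arguments that the generated code
// grows large enough for the code generator to attempt method splitting.
val cols = (0 until 300).flatMap { i =>
  Seq(lit(s"s$i"), split(lit(s"a$i,b$i"), ","))
}

spark.range(1).select(concat_ws(",", cols: _*).as("joined")).collect()
{code}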



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31993) Generated code in 'concat_ws' fails to compile when splitting method is in effect

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31993:


Assignee: (was: Apache Spark)

> Generated code in 'concat_ws' fails to compile when splitting method is in 
> effect
> -
>
> Key: SPARK-31993
> URL: https://issues.apache.org/jira/browse/SPARK-31993
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0, 3.1.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> https://github.com/apache/spark/blob/a0187cd6b59a6b6bb2cadc6711bb663d4d35a844/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L88-L195
> There are three parts of generated code in concat_ws (codes, varargCounts, 
> varargBuilds), and each part tries to split methods on its own, while 
> `varargCounts` and `varargBuilds` refer to the generated code in `codes`; 
> hence the overall generated code fails to compile if any of the parts actually 
> gets split.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31993) Generated code in 'concat_ws' fails to compile when splitting method is in effect

2020-06-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135579#comment-17135579
 ] 

Apache Spark commented on SPARK-31993:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/28831

> Generated code in 'concat_ws' fails to compile when splitting method is in 
> effect
> -
>
> Key: SPARK-31993
> URL: https://issues.apache.org/jira/browse/SPARK-31993
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0, 3.1.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> https://github.com/apache/spark/blob/a0187cd6b59a6b6bb2cadc6711bb663d4d35a844/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L88-L195
> There are three parts of generated code in concat_ws (codes, varargCounts, 
> varargBuilds), and each part tries to split methods on its own, while 
> `varargCounts` and `varargBuilds` refer to the generated code in `codes`; 
> hence the overall generated code fails to compile if any of the parts actually 
> gets split.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31993) Generated code in 'concat_ws' fails to compile when splitting method is in effect

2020-06-15 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31993:


Assignee: Apache Spark

> Generated code in 'concat_ws' fails to compile when splitting method is in 
> effect
> -
>
> Key: SPARK-31993
> URL: https://issues.apache.org/jira/browse/SPARK-31993
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.6, 3.0.0, 3.1.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/apache/spark/blob/a0187cd6b59a6b6bb2cadc6711bb663d4d35a844/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L88-L195
> There are three parts of generated code in concat_ws (codes, varargCounts, 
> varargBuilds), and each part tries to split methods on its own, while 
> `varargCounts` and `varargBuilds` refer to the generated code in `codes`; 
> hence the overall generated code fails to compile if any of the parts actually 
> gets split.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31992) Benchmark the EXCEPTION rebase mode

2020-06-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31992.
-
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 28829
[https://github.com/apache/spark/pull/28829]

> Benchmark the EXCEPTION rebase mode
> ---
>
> Key: SPARK-31992
> URL: https://issues.apache.org/jira/browse/SPARK-31992
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> Add benchmarks for the EXCEPTION rebase mode to DateTimeRebaseBenchmark. It 
> is the default value of spark.sql.legacy.parquet.datetimeRebaseModeInWrite 
> and spark.sql.legacy.parquet.datetimeRebaseModeInRead, and it would be nice 
> to benchmark it as well.
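
For reference, a hedged sketch of what the EXCEPTION mode does at the API level (the 
config names are the ones quoted above; the path and dates are illustrative):

{code:scala}
// In EXCEPTION mode Spark refuses to silently rebase ancient dates/timestamps between
// the hybrid Julian and Proleptic Gregorian calendars when writing or reading Parquet.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "EXCEPTION")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "EXCEPTION")

// Modern dates are unaffected; the benchmark measures the cost of the check itself.
spark.sql("SELECT DATE '2020-06-15' AS d")
  .write.mode("overwrite").parquet("/tmp/rebase-exception-check")

// A date before the Gregorian cutover is expected to raise an error here instead of
// being silently rebased:
// spark.sql("SELECT DATE '1000-01-01' AS d")
//   .write.mode("overwrite").parquet("/tmp/rebase-exception-check")
{code}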



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31992) Benchmark the EXCEPTION rebase mode

2020-06-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31992:
---

Assignee: Maxim Gekk

> Benchmark the EXCEPTION rebase mode
> ---
>
> Key: SPARK-31992
> URL: https://issues.apache.org/jira/browse/SPARK-31992
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Add benchmarks for the EXCEPTION rebase mode to DateTimeRebaseBenchmark. It 
> is the default value of spark.sql.legacy.parquet.datetimeRebaseModeInWrite 
> and spark.sql.legacy.parquet.datetimeRebaseModeInRead, and it would be nice 
> to benchmark it as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31705) Push more possible predicates through Join via CNF

2020-06-15 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-31705:

Summary: Push more possible predicates through Join via CNF  (was: Rewrite 
join condition to conjunctive normal form)

> Push more possible predicates through Join via CNF
> --
>
> Key: SPARK-31705
> URL: https://issues.apache.org/jira/browse/SPARK-31705
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> Rewrite the join condition to [conjunctive normal 
> form|https://en.wikipedia.org/wiki/Conjunctive_normal_form] so that more 
> conditions can be pushed down as filters.
> PostgreSQL:
> {code:sql}
> CREATE TABLE lineitem (l_orderkey BIGINT, l_partkey BIGINT, l_suppkey BIGINT, 
>   
> l_linenumber INT, l_quantity DECIMAL(10,0), l_extendedprice DECIMAL(10,0),
> 
> l_discount DECIMAL(10,0), l_tax DECIMAL(10,0), l_returnflag varchar(255), 
>   
> l_linestatus varchar(255), l_shipdate DATE, l_commitdate DATE, l_receiptdate 
> DATE,
> l_shipinstruct varchar(255), l_shipmode varchar(255), l_comment varchar(255));
>   
> CREATE TABLE orders (
> o_orderkey BIGINT, o_custkey BIGINT, o_orderstatus varchar(255),   
> o_totalprice DECIMAL(10,0), o_orderdate DATE, o_orderpriority varchar(255),
> o_clerk varchar(255), o_shippriority INT, o_comment varchar(255));  
> EXPLAIN
> SELECT Count(*)
> FROM   lineitem,
>orders
> WHERE  l_orderkey = o_orderkey
>AND ( ( l_suppkey > 3
>AND o_custkey > 13 )
>   OR ( l_suppkey > 1
>AND o_custkey > 11 ) )
>AND l_partkey > 19;
> EXPLAIN
> SELECT Count(*)
> FROM   lineitem
>JOIN orders
>  ON l_orderkey = o_orderkey
> AND ( ( l_suppkey > 3
> AND o_custkey > 13 )
>OR ( l_suppkey > 1
> AND o_custkey > 11 ) )
> AND l_partkey > 19;
> EXPLAIN
> SELECT Count(*) 
> FROM   lineitem, 
>orders 
> WHERE  l_orderkey = o_orderkey 
>AND NOT ( ( l_suppkey > 3 
>AND ( l_suppkey > 2 
>   OR o_custkey > 13 ) ) 
>   OR ( l_suppkey > 1 
>AND o_custkey > 11 ) ) 
>AND l_partkey > 19;
> {code}
> {noformat}
> postgres=# EXPLAIN
> postgres-# SELECT Count(*)
> postgres-# FROM   lineitem,
> postgres-#orders
> postgres-# WHERE  l_orderkey = o_orderkey
> postgres-#AND ( ( l_suppkey > 3
> postgres(#AND o_custkey > 13 )
> postgres(#   OR ( l_suppkey > 1
> postgres(#AND o_custkey > 11 ) )
> postgres-#AND l_partkey > 19;
>QUERY PLAN
> -
>  Aggregate  (cost=21.18..21.19 rows=1 width=8)
>->  Hash Join  (cost=10.60..21.17 rows=2 width=0)
>  Hash Cond: (orders.o_orderkey = lineitem.l_orderkey)
>  Join Filter: (((lineitem.l_suppkey > 3) AND (orders.o_custkey > 13)) 
> OR ((lineitem.l_suppkey > 1) AND (orders.o_custkey > 11)))
>  ->  Seq Scan on orders  (cost=0.00..10.45 rows=17 width=16)
>Filter: ((o_custkey > 13) OR (o_custkey > 11))
>  ->  Hash  (cost=10.53..10.53 rows=6 width=16)
>->  Seq Scan on lineitem  (cost=0.00..10.53 rows=6 width=16)
>  Filter: ((l_partkey > 19) AND ((l_suppkey > 3) OR 
> (l_suppkey > 1)))
> (9 rows)
> postgres=# EXPLAIN
> postgres-# SELECT Count(*)
> postgres-# FROM   lineitem
> postgres-#JOIN orders
> postgres-#  ON l_orderkey = o_orderkey
> postgres-# AND ( ( l_suppkey > 3
> postgres(# AND o_custkey > 13 )
> postgres(#OR ( l_suppkey > 1
> postgres(# AND o_custkey > 11 ) )
> postgres-# AND l_partkey > 19;
>QUERY PLAN
> -
>  Aggregate  (cost=21.18..21.19 rows=1 width=8)
>->  Hash Join  (cost=10.60..21.17 rows=2 width=0)
>  Hash Cond: (orders.o_orderkey = lineitem.l_orderkey)
>  Join Filter: (((lineitem.l_suppkey > 3) AND (orders.o_custkey > 13)) 
> OR ((lineitem.l_suppkey > 1) AND (orders.o_custkey > 11)))
>  ->  Seq Scan on orders  (cost=0.00..10.45 rows=17 width=16)
>Filter: ((o_custkey