[jira] [Updated] (SPARK-32268) Bloom Filter Join
[ https://issues.apache.org/jira/browse/SPARK-32268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-32268:
    Attachment: q16-default.png
                q16-bloom-filter.png

> Bloom Filter Join
> -----------------
>
>                 Key: SPARK-32268
>                 URL: https://issues.apache.org/jira/browse/SPARK-32268
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>         Attachments: q16-bloom-filter.png, q16-default.png
>
> We can improve the performance of some joins by pre-filtering one side of a
> join using a Bloom filter and an IN predicate generated from the values on the
> other side of the join.
> For example: [tpcds/q16.sql|https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql]
> Before this optimization: (see attachment q16-default.png)
> After this optimization: (see attachment q16-bloom-filter.png)
>
> *Query Performance Benchmarks: TPC-DS Performance Evaluation*
> Our setup for running the TPC-DS benchmark was as follows: TPC-DS 5 TB scale,
> partitioned Parquet tables.
>
> |Query|Default (seconds)|Bloom Filter Join enabled (seconds)|
> |tpcds q16|84|46|
> |tpcds q36|29|21|
> |tpcds q57|39|28|
> |tpcds q94|42|34|
> |tpcds q95|306|288|

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
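The pre-filtering idea described above can be sketched outside of Spark. The following minimal Python illustration is not Spark's implementation — the Bloom filter here is a deliberately simple hand-rolled one, and `bloom_filtered_join` is a hypothetical helper name — but it shows why the optimization helps: probe-side rows whose join key cannot possibly match are dropped before the expensive exact join.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array.
    False positives are possible; false negatives are not."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # an integer used as a bit array

    def _hashes(self, value):
        # Derive k positions by salting a single cryptographic hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, value):
        for pos in self._hashes(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._hashes(value))

def bloom_filtered_join(build_rows, probe_rows, key):
    # Build the filter from the join keys of the smaller (build) side.
    bf = BloomFilter()
    for k_ in {r[key] for r in build_rows}:
        bf.add(k_)
    # Pre-filter the probe side: rows whose key cannot be in the build
    # side are discarded before the join proper.
    candidates = [r for r in probe_rows if bf.might_contain(r[key])]
    # Exact join on the surviving candidates (false positives fall out here).
    return [(r, b) for r in candidates for b in build_rows if r[key] == b[key]]
```

The saving comes from `candidates` typically being far smaller than `probe_rows`; in Spark's case the filter would be pushed below the shuffle, so the dropped rows are never shuffled at all.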
[jira] [Created] (SPARK-32268) Bloom Filter Join
Yuming Wang created SPARK-32268:
-----------------------------------

             Summary: Bloom Filter Join
                 Key: SPARK-32268
                 URL: https://issues.apache.org/jira/browse/SPARK-32268
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Yuming Wang
            Assignee: Yuming Wang
         Attachments: q16-bloom-filter.png, q16-default.png
[jira] [Updated] (SPARK-32250) Reenable MasterSuite's "Master should avoid dead loop while launching executor failed in Worker"
[ https://issues.apache.org/jira/browse/SPARK-32250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32250: - Description: It was disabled because this test alone was flaky. We should enable it back in Github Actions. See https://github.com/HyukjinKwon/spark/pull/4/files#diff-65f17704c0bbb7aa2d85a9a9b2f0dbfaR688. it fails as below: {code} - SPARK-27510: Master should avoid dead loop while launching executor failed in Worker *** FAILED *** (10 seconds, 63 milliseconds) worker.launchExecutorReceived.await(10L, SECONDS) was false (MasterSuite.scala:709) org.scalatest.exceptions.TestFailedException: at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.deploy.master.MasterSuite.$anonfun$new$40(MasterSuite.scala:709) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:157) at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286) at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:59) at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at org.apache.spark.deploy.master.MasterSuite.org$scalatest$BeforeAndAfter$$super$runTest(MasterSuite.scala:138) at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203) at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192) at org.apache.spark.deploy.master.MasterSuite.runTest(MasterSuite.scala:138) at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393) at scala.collection.immutable.List.foreach(List.scala:392) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381) at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458) at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite.run(Suite.scala:1124) at org.scalatest.Suite.run$(Suite.scala:1106) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233) at org.scalatest.SuperEngine.runImpl(Engine.scala:518) at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:59) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.deploy.master.MasterSuite.org$scalatest$BeforeAndAfter$$super$run(MasterSuite.scala:138) at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258) at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256) at 
org.apache.spark.deploy.master.MasterSuite.run(MasterSuite.scala:138) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} was:It was disabled because this test alone was flaky. We should enable it back in Github Actions. See https://github.com/HyukjinKwon/spark/pull/4/files#diff-65f17704c0bbb7aa2d85a9a9b2f0dbfaR688 > Reenable MasterSuite's "Master should avoid dead loop while launching >
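The failing assertion above, `worker.launchExecutorReceived.await(10L, SECONDS) was false`, is a latch await that timed out before the Worker signalled the Master. The same await-with-timeout pattern can be sketched in Python with `threading.Event` (the names and delays here are illustrative, not Spark's):

```python
import threading
import time

# Plays the role of the test's CountDownLatch.
launch_executor_received = threading.Event()

def worker(delay_s):
    # Simulates the Worker tripping the latch once the launch request arrives.
    time.sleep(delay_s)
    launch_executor_received.set()

threading.Thread(target=worker, args=(0.05,), daemon=True).start()

# The test asserts the signal arrives within the budget. On an overloaded CI
# machine even a generous timeout (10 s in the real test) can be missed,
# which is exactly the flakiness being reported.
ok = launch_executor_received.wait(timeout=2.0)
```

`Event.wait` returns `False` on timeout rather than raising, which is why the ScalaTest failure surfaces as `... was false` instead of an exception.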
[jira] [Commented] (SPARK-32250) Reenable MasterSuite's "Master should avoid dead loop while launching executor failed in Worker"
[ https://issues.apache.org/jira/browse/SPARK-32250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155133#comment-17155133 ]

Hyukjin Kwon commented on SPARK-32250:
--------------------------------------

cc [~Ngone51] FYI

> Reenable MasterSuite's "Master should avoid dead loop while launching
> executor failed in Worker"
> ---------------------------------------------------------------------
>
>                 Key: SPARK-32250
>                 URL: https://issues.apache.org/jira/browse/SPARK-32250
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, Tests
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> It was disabled because this test alone was flaky. We should enable it back
> in GitHub Actions. See
> https://github.com/HyukjinKwon/spark/pull/4/files#diff-65f17704c0bbb7aa2d85a9a9b2f0dbfaR688
[jira] [Commented] (SPARK-32257) [SQL Parser] Report Error for invalid usage of SET command
[ https://issues.apache.org/jira/browse/SPARK-32257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155094#comment-17155094 ]

Takeshi Yamamuro commented on SPARK-32257:
------------------------------------------

Thanks, [~beliefer]. Please ping me for reviews.

> [SQL Parser] Report Error for invalid usage of SET command
> ----------------------------------------------------------
>
>                 Key: SPARK-32257
>                 URL: https://issues.apache.org/jira/browse/SPARK-32257
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>
> {code:java}
> SET spark.sql.ansi.enabled true{code}
> The above SQL command does not change the conf value; it just tries to
> display the value of the conf "spark.sql.ansi.enabled true".
> We can disallow spaces in the conf name and issue a user-friendly error
> instead. In the error message, we should tell users a workaround: quote the
> name if they still need to specify a conf with a space.
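The proposed check can be sketched as a toy parser. This is not Spark's actual grammar — `parse_set_command` and its error text are hypothetical — but it shows the behavior the ticket asks for: `SET key value` (space instead of `=`) is rejected with a hint to quote keys that genuinely contain spaces, instead of being silently treated as a lookup of a nonsensical key.

```python
import re

def parse_set_command(stmt):
    """Toy version of the proposed validation for SET statements.
    Returns (key, value); value is None for display-only lookups."""
    m = re.fullmatch(r"SET\s+(.*)", stmt.strip(), flags=re.IGNORECASE | re.DOTALL)
    if not m:
        raise ValueError("not a SET command")
    rest = m.group(1).strip()
    if "=" in rest:
        key, value = (s.strip() for s in rest.split("=", 1))
        return key, value                 # SET key=value: assignment
    if rest.startswith("`") and rest.endswith("`"):
        return rest[1:-1], None           # quoted key: display its value
    if " " in rest:
        # The invalid case from the ticket: a bare space-separated pair.
        raise ValueError(
            f"invalid SET: {rest!r}; use SET key=value, or quote the key "
            "with backticks if it genuinely contains spaces")
    return rest, None                     # bare key: display its value
```

With this check, `SET spark.sql.ansi.enabled true` fails fast with an actionable message, while `SET spark.sql.ansi.enabled=true` and plain lookups keep working.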
[jira] [Assigned] (SPARK-32237) Cannot resolve column when put hint in the views of common table expression
[ https://issues.apache.org/jira/browse/SPARK-32237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32237:
    Assignee: (was: Apache Spark)

> Cannot resolve column when put hint in the views of common table expression
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-32237
>                 URL: https://issues.apache.org/jira/browse/SPARK-32237
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Hadoop-2.7.7
> Hive-2.3.6
> Spark-3.0.0
>            Reporter: Kernel Force
>            Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Suppose we have a table:
> {code:sql}
> CREATE TABLE DEMO_DATA (
>   ID VARCHAR(10),
>   NAME VARCHAR(10),
>   BATCH VARCHAR(10),
>   TEAM VARCHAR(1)
> ) STORED AS PARQUET;
> {code}
> and some data in it:
> {code:sql}
> 0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_DATA T;
> +-------+---------+-------------+---------+
> | t.id  | t.name  |   t.batch   | t.team  |
> +-------+---------+-------------+---------+
> | 1     | mike    | 2020-07-08  | A       |
> | 2     | john    | 2020-07-07  | B       |
> | 3     | rose    | 2020-07-06  | B       |
> +-------+---------+-------------+---------+
> {code}
> If I put a query hint in VA or VB and run it in spark-shell:
> {code:sql}
> sql("""
> WITH VA AS
>   (SELECT T.ID, T.NAME, T.BATCH, T.TEAM
>      FROM DEMO_DATA T WHERE T.TEAM = 'A'),
> VB AS
>   (SELECT /*+ REPARTITION(3) */ T.ID, T.NAME, T.BATCH, T.TEAM
>      FROM VA T)
> SELECT T.ID, T.NAME, T.BATCH, T.TEAM
>   FROM VB T
> """).show
> {code}
> In Spark-2.4.4 it works fine.
> But in Spark-3.0.0, it throws AnalysisException with Unrecognized hint > warning: > {code:scala} > 20/07/09 13:51:14 WARN analysis.HintErrorLogger: Unrecognized hint: > REPARTITION(3) > org.apache.spark.sql.AnalysisException: cannot resolve '`T.ID`' given input > columns: [T.BATCH, T.ID, T.NAME, T.TEAM]; line 8 pos 7; > 'Project ['T.ID, 'T.NAME, 'T.BATCH, 'T.TEAM] > +- SubqueryAlias T >+- SubqueryAlias VB > +- Project [ID#0, NAME#1, BATCH#2, TEAM#3] > +- SubqueryAlias T > +- SubqueryAlias VA >+- Project [ID#0, NAME#1, BATCH#2, TEAM#3] > +- Filter (TEAM#3 = A) > +- SubqueryAlias T > +- SubqueryAlias spark_catalog.default.demo_data >+- Relation[ID#0,NAME#1,BATCH#2,TEAM#3] parquet > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:143) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:140) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:106) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:118) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:118) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:129) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:134) > at > 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:134) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:139) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:139) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:106) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:140) > at >
[jira] [Assigned] (SPARK-32237) Cannot resolve column when put hint in the views of common table expression
[ https://issues.apache.org/jira/browse/SPARK-32237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32237:
    Assignee: Apache Spark

> Cannot resolve column when put hint in the views of common table expression
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-32237
>                 URL: https://issues.apache.org/jira/browse/SPARK-32237
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kernel Force
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Commented] (SPARK-32237) Cannot resolve column when put hint in the views of common table expression
[ https://issues.apache.org/jira/browse/SPARK-32237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155093#comment-17155093 ]

Apache Spark commented on SPARK-32237:
--------------------------------------

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/29062

> Cannot resolve column when put hint in the views of common table expression
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-32237
>                 URL: https://issues.apache.org/jira/browse/SPARK-32237
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kernel Force
>            Priority: Major
[jira] [Comment Edited] (SPARK-32237) Cannot resolve column when put hint in the views of common table expression
[ https://issues.apache.org/jira/browse/SPARK-32237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155085#comment-17155085 ]

Lantao Jin edited comment on SPARK-32237 at 7/10/20, 4:21 AM:
--------------------------------------------------------------

Thanks to report this. I am going to fix it.

was (Author: cltlfcjin): Thanks to report this. I am going to fix that.

> Cannot resolve column when put hint in the views of common table expression
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-32237
>                 URL: https://issues.apache.org/jira/browse/SPARK-32237
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kernel Force
>            Priority: Major
[jira] [Assigned] (SPARK-32242) Fix flakiness of CliSuite
[ https://issues.apache.org/jira/browse/SPARK-32242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-32242:
    Assignee: Jungtaek Lim

> Fix flakiness of CliSuite
> -------------------------
>
>                 Key: SPARK-32242
>                 URL: https://issues.apache.org/jira/browse/SPARK-32242
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>
> A lot of build failures happened in recent builds, which showed that CliSuite
> is very flaky.
> While the flakiness depends on the power of the machine, it'd be better if we
> can make the suite less flaky even on a slow machine.
[jira] [Resolved] (SPARK-32242) Fix flakiness of CliSuite
[ https://issues.apache.org/jira/browse/SPARK-32242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-32242.
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 29036
[https://github.com/apache/spark/pull/29036]

> Fix flakiness of CliSuite
> -------------------------
>
>                 Key: SPARK-32242
>                 URL: https://issues.apache.org/jira/browse/SPARK-32242
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.1.0
[jira] [Commented] (SPARK-32237) Cannot resolve column when put hint in the views of common table expression
[ https://issues.apache.org/jira/browse/SPARK-32237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155085#comment-17155085 ]

Lantao Jin commented on SPARK-32237:
------------------------------------

Thanks to report this. I am going to fix that.

> Cannot resolve column when put hint in the views of common table expression
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-32237
>                 URL: https://issues.apache.org/jira/browse/SPARK-32237
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kernel Force
>            Priority: Major
[jira] [Comment Edited] (SPARK-32257) [SQL Parser] Report Error for invalid usage of SET command
[ https://issues.apache.org/jira/browse/SPARK-32257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155068#comment-17155068 ] jiaan.geng edited comment on SPARK-32257 at 7/10/20, 3:35 AM: -- [~smilegator] OK, let me do it, but please wait until I finish my other PR. was (Author: beliefer): [~smilegator]OK. Let me do it. > [SQL Parser] Report Error for invalid usage of SET command > -- > > Key: SPARK-32257 > URL: https://issues.apache.org/jira/browse/SPARK-32257 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > {code:java} > SET spark.sql.ansi.enabled true{code} > The above SQL command does not change the conf value; it just tries to > display the value of the conf "spark.sql.ansi.enabled true". > We can disallow spaces in the conf name and issue a user-friendly > error instead. In the error message, we should tell users they can quote the > conf name as a workaround if they still need to specify a conf with a space. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32257) [SQL Parser] Report Error for invalid usage of SET command
[ https://issues.apache.org/jira/browse/SPARK-32257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155068#comment-17155068 ] jiaan.geng commented on SPARK-32257: [~smilegator] OK, let me do it. > [SQL Parser] Report Error for invalid usage of SET command > -- > > Key: SPARK-32257 > URL: https://issues.apache.org/jira/browse/SPARK-32257 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > {code:java} > SET spark.sql.ansi.enabled true{code} > The above SQL command does not change the conf value; it just tries to > display the value of the conf "spark.sql.ansi.enabled true". > We can disallow spaces in the conf name and issue a user-friendly > error instead. In the error message, we should tell users they can quote the > conf name as a workaround if they still need to specify a conf with a space.
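The proposal above (reject a space inside an unquoted conf name, and mention the quoting workaround in the error message) can be sketched outside of Spark's actual ANTLR grammar. A minimal, illustrative Python model — `parse_set_command` is a hypothetical name for this sketch, not a Spark API:

```python
def parse_set_command(statement: str) -> tuple[str, str]:
    """Toy model of the proposed SET handling: a space-separated
    key/value pair is rejected with a hint about quoting the name."""
    body = statement.strip()
    if not body.upper().startswith("SET "):
        raise ValueError(f"not a SET command: {statement!r}")
    body = body[4:].strip()
    if "=" in body:
        key, _, value = body.partition("=")
        return key.strip(), value.strip()
    if " " in body and not (body.startswith("`") and body.endswith("`")):
        raise ValueError(
            f"invalid SET usage: {body!r}; expected key=value. "
            "If the conf name itself contains spaces, quote it with backquotes."
        )
    # a bare `SET key` means "display the current value of key"
    return body, ""
```

With a rule like this, `SET spark.sql.ansi.enabled true` fails fast with a hint instead of silently being treated as a lookup of a key containing a space.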
[jira] [Updated] (SPARK-32238) Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF
[ https://issues.apache.org/jira/browse/SPARK-32238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-32238: - Issue Type: Bug (was: Improvement) > Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF > - > > Key: SPARK-32238 > URL: https://issues.apache.org/jira/browse/SPARK-32238 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Priority: Minor > > *BEFORE* > {code:java} > object MalformedClassObject extends Serializable { > class MalformedFunction extends (String => Int) with Serializable { > override def apply(v1: String): Int = v1.toInt / 0 > } > } > Seq("20").toDF("col").select(udf(new > MalformedFunction).apply(Column("col"))).collect() > An exception or error caused a run to abort: Malformed class name > java.lang.InternalError: Malformed class name > at java.lang.Class.getSimpleName(Class.java:1330) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage$lzycompute(ScalaUDF.scala:1157) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage(ScalaUDF.scala:1155) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.doGenCode(ScalaUDF.scala:1077) > at > org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:142) > at > org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:160) > at > org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:69) > > {code} > *AFTER* > {code} > org.apache.spark.SparkException: Failed to execute user defined > function(UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction: > (string) => int) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:753) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:127) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.ArithmeticException: / by zero > at > org.apache.spark.sql.UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction.apply(UDFSuite.scala:677) > at > org.apache.spark.sql.UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction.apply(UDFSuite.scala:676) > ... 17 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32238) Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF
[ https://issues.apache.org/jira/browse/SPARK-32238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32238: -- Description: {code:java} object MalformedClassObject extends Serializable { class MalformedFunction extends (String => Int) with Serializable { override def apply(v1: String): Int = v1.toInt / 0 } } Seq("20").toDF("col").select(udf(new MalformedFunction).apply(Column("col"))).collect() An exception or error caused a run to abort: Malformed class name java.lang.InternalError: Malformed class name at java.lang.Class.getSimpleName(Class.java:1330) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage$lzycompute(ScalaUDF.scala:1157) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage(ScalaUDF.scala:1155) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.doGenCode(ScalaUDF.scala:1077) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:142) at org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:160) at org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:69) {code} *AFTER* {code} org.apache.spark.SparkException: Failed to execute user defined function(UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction: (string) => int) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:753) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898) at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:127) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ArithmeticException: / by zero at org.apache.spark.sql.UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction.apply(UDFSuite.scala:677) at org.apache.spark.sql.UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction.apply(UDFSuite.scala:676) ... 
17 more {code} was: {code:java} object MalformedClassObject extends Serializable { class MalformedFunction extends (String => Int) with Serializable { override def apply(v1: String): Int = v1.toInt / 0 } } Seq("20").toDF("col").select(udf(new MalformedFunction).apply(Column("col"))).collect() An exception or error caused a run to abort: Malformed class name java.lang.InternalError: Malformed class name at java.lang.Class.getSimpleName(Class.java:1330) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage$lzycompute(ScalaUDF.scala:1157) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage(ScalaUDF.scala:1155) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.doGenCode(ScalaUDF.scala:1077) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:142) at org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:160) at org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:69) {code} > Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF > - > > Key: SPARK-32238 > URL: https://issues.apache.org/jira/browse/SPARK-32238 > Project: Spark > Issue Type:
[jira] [Updated] (SPARK-32238) Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF
[ https://issues.apache.org/jira/browse/SPARK-32238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32238: -- Description: *BEFORE* {code:java} object MalformedClassObject extends Serializable { class MalformedFunction extends (String => Int) with Serializable { override def apply(v1: String): Int = v1.toInt / 0 } } Seq("20").toDF("col").select(udf(new MalformedFunction).apply(Column("col"))).collect() An exception or error caused a run to abort: Malformed class name java.lang.InternalError: Malformed class name at java.lang.Class.getSimpleName(Class.java:1330) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage$lzycompute(ScalaUDF.scala:1157) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage(ScalaUDF.scala:1155) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.doGenCode(ScalaUDF.scala:1077) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:142) at org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:160) at org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:69) {code} *AFTER* {code} org.apache.spark.SparkException: Failed to execute user defined function(UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction: (string) => int) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:753) at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898) at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:127) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ArithmeticException: / by zero at org.apache.spark.sql.UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction.apply(UDFSuite.scala:677) at org.apache.spark.sql.UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction.apply(UDFSuite.scala:676) ... 
17 more {code} was: {code:java} object MalformedClassObject extends Serializable { class MalformedFunction extends (String => Int) with Serializable { override def apply(v1: String): Int = v1.toInt / 0 } } Seq("20").toDF("col").select(udf(new MalformedFunction).apply(Column("col"))).collect() An exception or error caused a run to abort: Malformed class name java.lang.InternalError: Malformed class name at java.lang.Class.getSimpleName(Class.java:1330) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage$lzycompute(ScalaUDF.scala:1157) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage(ScalaUDF.scala:1155) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.doGenCode(ScalaUDF.scala:1077) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:142) at org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:160) at org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:69) {code} *AFTER* {code} org.apache.spark.SparkException: Failed to execute user defined function(UDFSuite$MalformedClassObject$MalformedNonPrimitiveFunction: (string) => int) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at
[jira] [Updated] (SPARK-32238) "Malformed class name" error from ScalaUDF
[ https://issues.apache.org/jira/browse/SPARK-32238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32238: -- Affects Version/s: (was: 2.4.0) (was: 3.0.0) 3.1.0 > "Malformed class name" error from ScalaUDF > -- > > Key: SPARK-32238 > URL: https://issues.apache.org/jira/browse/SPARK-32238 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Priority: Major > > {code:java} > object MalformedClassObject extends Serializable { > class MalformedFunction extends (String => Int) with Serializable { > override def apply(v1: String): Int = v1.toInt / 0 > } > } > Seq("20").toDF("col").select(udf(new > MalformedFunction).apply(Column("col"))).collect() > An exception or error caused a run to abort: Malformed class name > java.lang.InternalError: Malformed class name > at java.lang.Class.getSimpleName(Class.java:1330) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage$lzycompute(ScalaUDF.scala:1157) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage(ScalaUDF.scala:1155) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.doGenCode(ScalaUDF.scala:1077) > at > org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:142) > at > org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:160) > at > org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:69) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32238) "Malformed class name" error from ScalaUDF
[ https://issues.apache.org/jira/browse/SPARK-32238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32238: -- Priority: Minor (was: Major) > "Malformed class name" error from ScalaUDF > -- > > Key: SPARK-32238 > URL: https://issues.apache.org/jira/browse/SPARK-32238 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: wuyi >Priority: Minor > > {code:java} > object MalformedClassObject extends Serializable { > class MalformedFunction extends (String => Int) with Serializable { > override def apply(v1: String): Int = v1.toInt / 0 > } > } > Seq("20").toDF("col").select(udf(new > MalformedFunction).apply(Column("col"))).collect() > An exception or error caused a run to abort: Malformed class name > java.lang.InternalError: Malformed class name > at java.lang.Class.getSimpleName(Class.java:1330) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage$lzycompute(ScalaUDF.scala:1157) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.udfErrorMessage(ScalaUDF.scala:1155) > at > org.apache.spark.sql.catalyst.expressions.ScalaUDF.doGenCode(ScalaUDF.scala:1077) > at > org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:147) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:142) > at > org.apache.spark.sql.catalyst.expressions.Alias.genCode(namedExpressions.scala:160) > at > org.apache.spark.sql.execution.ProjectExec.$anonfun$doConsume$1(basicPhysicalOperators.scala:69) > > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
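For context on the fix above: on Java 8, `java.lang.Class.getSimpleName` can throw `java.lang.InternalError: Malformed class name` for certain nested Scala classes, and Spark's `Utils.getSimpleName` works around this by falling back to trimming the package and enclosing-class prefixes from the full binary class name. A rough Python analogue of that fallback (illustrative only; the real implementation is Scala code in `org.apache.spark.util.Utils`):

```python
def safe_simple_name(binary_name: str) -> str:
    """Fallback simple-name extraction: strip the package prefix,
    then drop '$'-separated enclosing-class segments."""
    # e.g. "org.apache.spark.sql.UDFSuite$MalformedClassObject$MalformedFunction"
    after_package = binary_name.rsplit(".", 1)[-1]
    segments = [s for s in after_package.split("$") if s]
    # keep the innermost non-empty segment; fall back to the raw name
    return segments[-1] if segments else after_package
```

Because this never throws, an error message like the AFTER trace above can always name the failing UDF class.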
[jira] [Resolved] (SPARK-32224) Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes
[ https://issues.apache.org/jira/browse/SPARK-32224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32224. --- Resolution: Won't Do Like `spark.yarn.tags`, we had better use a general conf, `--conf spark.yarn.priority`, instead of adding a dedicated spark-submit parameter for every setting, because command-line parameters are a limited resource. > Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes > > > Key: SPARK-32224 > URL: https://issues.apache.org/jira/browse/SPARK-32224 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > SPARK-29603 added support for configuring 'spark.yarn.priority' on YARN deploy modes; should we > also support passing this configuration through the SparkSubmit command line?
[jira] [Closed] (SPARK-32224) Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes
[ https://issues.apache.org/jira/browse/SPARK-32224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-32224. - > Support pass 'spark.yarn.priority' through the SparkSubmit on Yarn modes > > > Key: SPARK-32224 > URL: https://issues.apache.org/jira/browse/SPARK-32224 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Minor > > SPARK-29603 added support for configuring 'spark.yarn.priority' on YARN deploy modes; should we > also support passing this configuration through the SparkSubmit command line?
[jira] [Resolved] (SPARK-32207) Support 'F'-suffixed Float Literals
[ https://issues.apache.org/jira/browse/SPARK-32207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32207. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29022 [https://github.com/apache/spark/pull/29022] > Support 'F'-suffixed Float Literals > --- > > Key: SPARK-32207 > URL: https://issues.apache.org/jira/browse/SPARK-32207 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.4, 2.4.6, 3.0.1, 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.0 > > > {code:sql} > select 1.1f; > {code} > currently throws > {code:java} > org.apache.spark.sql.AnalysisException: Can't extract value from 1: need > struct type but got int; > {code}
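Before the fix, `1.1f` was apparently parsed as a field dereference off the numeric literal, hence the "need struct type but got int" message; the change makes the grammar accept a trailing 'F'/'f' as a float suffix. A toy Python sketch of such a literal rule (illustrative, not Spark's parser; `parse_float_literal` is a hypothetical name):

```python
import re

# A number optionally containing a decimal part, followed by an 'F'/'f'
# suffix, e.g. "1.1f" or "2F".
_FLOAT_LITERAL = re.compile(r"^(\d+(\.\d+)?)[fF]$")

def parse_float_literal(token: str) -> float:
    """Recognize an 'F'-suffixed token as a float literal."""
    m = _FLOAT_LITERAL.match(token)
    if m is None:
        raise ValueError(f"not an F-suffixed float literal: {token!r}")
    return float(m.group(1))
```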
[jira] [Commented] (SPARK-32267) Decrease parallelism of PySpark tests on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-32267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155052#comment-17155052 ] Hyukjin Kwon commented on SPARK-32267: -- The situation seems to have improved after the Jenkins reboot. I will revisit this ticket if it starts to happen again. > Decrease parallelism of PySpark tests on Jenkins > > > Key: SPARK-32267 > URL: https://issues.apache.org/jira/browse/SPARK-32267 > Project: Spark > Issue Type: Test > Components: Project Infra, PySpark >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > PySpark tests became flaky, apparently because the machines are slow. The > parallelism of the PySpark tests was increased in SPARK-26148 to speed them up. > Maybe we should decrease it to improve test stability. It would only add > around 10 minutes, which is a small portion of the 6-7 hours of total testing time.
[jira] [Resolved] (SPARK-32267) Decrease parallelism of PySpark tests on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-32267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32267. -- Resolution: Later > Decrease parallelism of PySpark tests on Jenkins > > > Key: SPARK-32267 > URL: https://issues.apache.org/jira/browse/SPARK-32267 > Project: Spark > Issue Type: Test > Components: Project Infra, PySpark >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > PySpark tests became flaky apparently due to the machines being slow. The > parallelism of PySpark increased at SPARK-26148 to speed up. > Maybe we should decrease it to make the test stability better. It will only > increase around 10 minutes which is rather a small portion compared to 6 ~ 7 > hours in total testing time. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32267) Decrease parallelism of PySpark tests on Jenkins
Hyukjin Kwon created SPARK-32267: Summary: Decrease parallelism of PySpark tests on Jenkins Key: SPARK-32267 URL: https://issues.apache.org/jira/browse/SPARK-32267 Project: Spark Issue Type: Improvement Components: Project Infra, PySpark Affects Versions: 3.0.0, 2.4.6, 3.1.0 Reporter: Hyukjin Kwon Assignee: Hyukjin Kwon PySpark tests became flaky, apparently because the machines are slow. The parallelism of the PySpark tests was increased in SPARK-26148 to speed them up. Maybe we should decrease it to improve test stability. It would only add around 10 minutes, which is a small portion of the 6-7 hours of total testing time.
[jira] [Updated] (SPARK-32267) Decrease parallelism of PySpark tests on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-32267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32267: - Issue Type: Test (was: Improvement) > Decrease parallelism of PySpark tests on Jenkins > > > Key: SPARK-32267 > URL: https://issues.apache.org/jira/browse/SPARK-32267 > Project: Spark > Issue Type: Test > Components: Project Infra, PySpark >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > PySpark tests became flaky apparently due to the machines being slow. The > parallelism of PySpark increased at SPARK-26148 to speed up. > Maybe we should decrease it to make the test stability better. It will only > increase around 10 minutes which is rather a small portion compared to 6 ~ 7 > hours in total testing time. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32266) Run smoke tests after a commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-32266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-32266: --- Affects Version/s: 2.4.6 3.0.0 > Run smoke tests after a commit is pushed > > > Key: SPARK-32266 > URL: https://issues.apache.org/jira/browse/SPARK-32266 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32265) PySpark regexp_replace does not work as expected for the following pattern
[ https://issues.apache.org/jira/browse/SPARK-32265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Vieira updated SPARK-32265: - Description: I'm using spark streaming to consume from a topic and make transformations on the data. Amidst these is a regex replacement. The `regexp_replace` function from `pyspark.sql.functions` is not replacing the following pattern (I tested it beforehand using regex101.com, `re` from python, etc): {code:python} df.withColumn('value', f.regexp_replace('value', '([A-Za-z]+=[^,]*?)(\[[A-Z,a-z,0-9]+\])',r'$1')) {code} this is a snippet of the record: {code:hocon} {someVersion=8.3.2-hmg-dev, someUnitName=IB, someMessage=Test. [BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500, {code} and This is the "target" of the regex pattern: {code:hocon} someMessage=Test. [BL056]` {code} It should match the entire target and split in two groups, and replace it by the first group matched alone (as by `r'$1'`). These are also patterns that didn't work: * ` df.withColumn('value', f.regexp_replace('value', '([A-Za-z]+=[^,]*?)',''))` * ` df.withColumn('value', f.regexp_replace('value', '(\[[A-Z,a-z,0-9]+\])',''))` This worked: * ` df.withColumn('value', f.regexp_replace('value', 'someMessage=Test. [BL056]',''))` Why is this happening? Are there specificities to the spark regex engine? What would be the right pattern for what I'm trying to do? Examples and the entire script is listed below: This is an example value of the "value" column: {code:hocon} {someVersion=8.3.2-hmg-dev, someUnitName=IB, someMessage=Test. 
[BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500, someNumber=9872329, someAppOrigin=APP_PADRAO, someId=c3ASAUSQTiWvl_YA9DYpDV:APA91bGfVcLNNGL20hfmaDDS0D8TuzJDuCjj4tgbRNcJcYASIBRVEE2FnA4exnE4ZWTuupRX7FQkdcJiMWkNEatk8lktkFcpR7P7mehb4r_SVnabIabGInjagGZ6pGyweDkxW2JUGK8g, someType=1, someOriginOpen=null, someOS=null, eventSubType=TESTLOGON, someToken=, ip=error, somePair=0.4220043,-1.084015, eventType=SUCESSO, someMag=aWg4V01qSxDMjAvWmlEWGJ6aExnc2nZJbWZVPQ==, macAddress=33d94a3f7d2f8aff, someJSON=\{"ip":"error","hostname":null,"type":null,"concode":null,"continent":null,"country":null,"country_name":null,"code":null,"name":null,"city":null,"zip":null,"latitude":null,"longitude":null,"anotherJSON":{"id":null,"capital":null,"languages":null,"flag":null,"flag_emoji":null,"flag_emoji_unicode":null,"calling_code":null,"is_eu":null},"time_zone":\{"id":null,"current_time":null,"gmt_offset":null,"code":null,"is_daylight_saving":null},"currency":\{"code":null,"name":null,"plural":null,"symbol":null,"symbol_native":null},"connection":\{"asn":null,"isp":null},"security":\{"is_proxy":null,"proxy_type":null,"is_crawler":null,"crawler_name":null,"crawler_type":null,"is_tor":null,"threat_level":null,"threat_types":null}}, organization=IBPF, codigoCliente=440149, device=Android SDK built for x86, eventDate=6/1/20 4:03 PM} {code} This is the whole code: {code:python} import re import json import pyhocon import fastavro import requests from io import BytesIO from pyspark.sql import SparkSession from pyspark.sql import functions as f spark = SparkSession.builder.getOrCreate() def decode(msg, schema): bytes_io = BytesIO(msg) bytes_io.seek(5) msg = fastavro.schemaless_reader(bytes_io, schema) return msg def parse(msg): conf = pyhocon.ConfigParser.parse(msg) msg_converter = pyhocon.tool.HOCONConverter.to_json(conf) msg = json.loads(msg_converter) return msg def get_schema(registry_url,topic): URL = f'\{registry_url}/subjects/\{topic}/versions/latest' response = 
requests.get(url=URL, verify=False) subject = response.json() schema_id = subject['id'] schema = json.loads(subject['schema']) return [schema_id, schema] schema_id, schema = get_schema(registry_url=SCHEMA_REGISTRY,topic=SUBSCRIBE_TOPIC) spark.udf.register('decode',lambda value: decode(value,schema)) spark.udf.register('parse',parse) spark.readStream \ .format('kafka') \ .option('subscribe', SUBSCRIBE_TOPIC) \ .option('startingOffsets', 'earliest') \ .option('kafka.bootstrap.servers', HOST) \ .option('kafka.security.protocol', 'SSL') \ .option('kafka.ssl.key.password', KEYSTORE_PASSWORD) \ .option('kafka.ssl.keystore.location', KEYSTORE_PATH) \ .option('kafka.ssl.truststore.location', KEYSTORE_PATH) \ .option('kafka.ssl.keystore.password', KEYSTORE_PASSWORD) \ .option('kafka.ssl.truststore.password', KEYSTORE_PASSWORD) \ .load() \ .selectExpr(f'decode(value) as value') \ .withColumn('value', f.regexp_replace('value', '([A-Za-z]+=[^,]*?)(\[[A-Z,a-z,1-9]+\])','$1'))\ .writeStream \ .format('console') \ .option('truncate', 'false') \ .start() {code} was: I'm using spark streaming to consume from a topic and make transformations on the data. Amidst these is a regex replacement. The `regexp_replace` function from `pyspark.sql.functions` is not
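One point worth noting on the report above: the pattern does behave as expected under Python's `re` engine (which is what regex101 and `re` exercise), while Spark's `regexp_replace` delegates to Java's `java.util.regex`, where `$1` is the group reference in the replacement string. Also, the character class `[A-Z,a-z,0-9]` contains literal commas, so it matches `,` as well; `[A-Za-z0-9]` was probably intended. A quick check of the comma-free pattern with Python's `re`, where the backreference is written `\1`:

```python
import re

# Shortened form of the record snippet from the report.
snippet = ("{someVersion=8.3.2-hmg-dev, someUnitName=IB, "
           "someMessage=Test. [BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500,")

# Group 1 is the "key=prefix" part, group 2 is the bracketed code.
pattern = r"([A-Za-z]+=[^,]*?)(\[[A-Za-z0-9]+\])"
cleaned = re.sub(pattern, r"\1", snippet)  # keep group 1, drop the bracketed code
```

Here `cleaned` still contains `someMessage=Test. ` but no longer contains `[BL056]`, which is the behavior the reporter expected from `regexp_replace`.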
[jira] [Created] (SPARK-32265) PySpark regexp_replace does not work as expected for the following pattern
Marco Vieira created SPARK-32265: Summary: PySpark regexp_replace does not work as expected for the following pattern Key: SPARK-32265 URL: https://issues.apache.org/jira/browse/SPARK-32265 Project: Spark Issue Type: Question Components: PySpark, Spark Core, SQL, Structured Streaming Affects Versions: 2.4.4 Reporter: Marco Vieira I'm using spark streaming to consume from a topic and make transformations on the data. Amidst these is a regex replacement. The `regexp_replace` function from `pyspark.sql.functions` is not replacing the following pattern (I tested it beforehand using regex101.com, `re` from python, etc): ` df.withColumn('value', f.regexp_replace('value', '([A-Za-z]+=[^,]*?)(\[[A-Z,a-z,0-9]+\])',r'$1'))` this is a snippet of the record: ``` {someVersion=8.3.2-hmg-dev, someUnitName=IB, someMessage=Test. [BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500, ``` and This is the "target" of the regex pattern: ` someMessage=Test. [BL056]` It should match the entire target and split in two groups, and replace it by the first group matched alone (as by `r'$1'`). These are also patterns that didn't work: # ` df.withColumn('value', f.regexp_replace('value', '([A-Za-z]+=[^,]*?)',''))` # ` df.withColumn('value', f.regexp_replace('value', '(\[[A-Z,a-z,0-9]+\])',''))` This worked: # ` df.withColumn('value', f.regexp_replace('value', 'someMessage=Test. [BL056]',''))` Why is this happening? Are there specificities to the spark regex engine? What would be the right pattern for what I'm trying to do? Examples and the entire script is listed below: This is an example value of the "value" column: ```hocon {someVersion=8.3.2-hmg-dev, someUnitName=IB, someMessage=Test. 
[BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500, someNumber=9872329, someAppOrigin=APP_PADRAO, someId=c3ASAUSQTiWvl_YA9DYpDV:APA91bGfVcLNNGL20hfmaDDS0D8TuzJDuCjj4tgbRNcJcYASIBRVEE2FnA4exnE4ZWTuupRX7FQkdcJiMWkNEatk8lktkFcpR7P7mehb4r_SVnabIabGInjagGZ6pGyweDkxW2JUGK8g, someType=1, someOriginOpen=null, someOS=null, eventSubType=TESTLOGON, someToken=, ip=error, somePair=0.4220043,-1.084015, eventType=SUCESSO, someMag=aWg4V01qSxDMjAvWmlEWGJ6aExnc2nZJbWZVPQ==, macAddress=33d94a3f7d2f8aff, someJSON=\{"ip":"error","hostname":null,"type":null,"concode":null,"continent":null,"country":null,"country_name":null,"code":null,"name":null,"city":null,"zip":null,"latitude":null,"longitude":null,"anotherJSON":{"id":null,"capital":null,"languages":null,"flag":null,"flag_emoji":null,"flag_emoji_unicode":null,"calling_code":null,"is_eu":null},"time_zone":\{"id":null,"current_time":null,"gmt_offset":null,"code":null,"is_daylight_saving":null},"currency":\{"code":null,"name":null,"plural":null,"symbol":null,"symbol_native":null},"connection":\{"asn":null,"isp":null},"security":\{"is_proxy":null,"proxy_type":null,"is_crawler":null,"crawler_name":null,"crawler_type":null,"is_tor":null,"threat_level":null,"threat_types":null}}, organization=IBPF, codigoCliente=440149, device=Android SDK built for x86, eventDate=6/1/20 4:03 PM}
```

This is the whole code:

```python
import re
import json
import pyhocon
import fastavro
import requests
from io import BytesIO
from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.getOrCreate()

def decode(msg, schema):
    bytes_io = BytesIO(msg)
    bytes_io.seek(5)
    msg = fastavro.schemaless_reader(bytes_io, schema)
    return msg

def parse(msg):
    conf = pyhocon.ConfigParser.parse(msg)
    msg_converter = pyhocon.tool.HOCONConverter.to_json(conf)
    msg = json.loads(msg_converter)
    return msg

def get_schema(registry_url, topic):
    URL = f'{registry_url}/subjects/{topic}/versions/latest'
    response = requests.get(url=URL, verify=False)
    subject = response.json()
    schema_id = subject['id']
    schema = json.loads(subject['schema'])
    return [schema_id, schema]

schema_id, schema = get_schema(registry_url=SCHEMA_REGISTRY, topic=SUBSCRIBE_TOPIC)
spark.udf.register('decode', lambda value: decode(value, schema))
spark.udf.register('parse', parse)

spark.readStream \
    .format('kafka') \
    .option('subscribe', SUBSCRIBE_TOPIC) \
    .option('startingOffsets', 'earliest') \
    .option('kafka.bootstrap.servers', HOST) \
    .option('kafka.security.protocol', 'SSL') \
    .option('kafka.ssl.key.password', KEYSTORE_PASSWORD) \
    .option('kafka.ssl.keystore.location', KEYSTORE_PATH) \
    .option('kafka.ssl.truststore.location', KEYSTORE_PATH) \
    .option('kafka.ssl.keystore.password', KEYSTORE_PASSWORD) \
    .option('kafka.ssl.truststore.password', KEYSTORE_PASSWORD) \
    .load() \
    .selectExpr('decode(value) as value') \
    .withColumn('value', f.regexp_replace('value', '([A-Za-z]+=[^,]*?)(\[[A-Z,a-z,1-9]+\])', '$1')) \
    .writeStream \
    .format('console') \
    .option('truncate', 'false') \
    .start()
```
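An editorial aside on the report above (a hypothesis, not a conclusion from the thread): the patterns the reporter tested on regex101.com use the class `0-9`, but the class in the deployed script is `1-9`. The target token `[BL056]` contains the digit `0`, which falls outside `1-9`, so that class can never match it. A minimal reproduction with Python's `re` module (Spark's `regexp_replace` uses Java regex, but character-class semantics agree for this case):

```python
import re

record = 'someMessage=Test. [BL056], someOrigin=MOBILE'

# Class as deployed in the script: 1-9 excludes the digit 0 in "BL056",
# so the bracketed token never matches and nothing is replaced.
narrow = re.sub(r'([A-Za-z]+=[^,]*?)(\[[A-Z,a-z,1-9]+\])', r'\1', record)

# Widened to 0-9, the token matches and is replaced by group 1 alone.
wide = re.sub(r'([A-Za-z]+=[^,]*?)(\[[A-Z,a-z,0-9]+\])', r'\1', record)

print(narrow)  # unchanged: 'someMessage=Test. [BL056], someOrigin=MOBILE'
print(wide)    # 'someMessage=Test. , someOrigin=MOBILE'
```

Note that Java replacement strings use `$1` where Python's `re` uses `\1`, so the `$1` in the original `regexp_replace` call is correct for Spark; only the character class differs between what was tested and what was deployed.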
[jira] [Assigned] (SPARK-31875) Provide a option to disable user supplied Hints.
[ https://issues.apache.org/jira/browse/SPARK-31875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31875: - Assignee: Dilip Biswal > Provide a option to disable user supplied Hints. > > > Key: SPARK-31875 > URL: https://issues.apache.org/jira/browse/SPARK-31875 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Dilip Biswal >Assignee: Dilip Biswal >Priority: Major > > Provide a config option similar to Oracle's OPTIMIZER_IGNORE_HINTS. This can > be helpful to study the impact of performance difference when hints are > applied vs when they are not. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155018#comment-17155018 ] Shixiong Zhu commented on SPARK-32256: -- Yep. This doesn't happen in Hadoop 3.1.0 and above > Hive may fail to detect Hadoop version when using isolated classloader > -- > > Key: SPARK-32256 > URL: https://issues.apache.org/jira/browse/SPARK-32256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > > Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars > to access Hive Metastore. These jars are loaded by the isolated classloader. > Because we also share Hadoop classes with the isolated classloader, the user > doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which > means when we are using the isolated classloader, hadoop-common jar is not > available in this case. If Hadoop VersionInfo is not initialized before we > switch to the isolated classloader, and we try to initialize it using the > isolated classloader (the current thread context classloader), it will fail > and report `Unknown` which causes Hive to throw the following exception: > {code} > java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* > format) > at > org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147) > at > org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122) > at > org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88) > at > org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58) 
> at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:78) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:219) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:67) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108) > at > org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3349) > at > 
org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217) > at > org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:204) > at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:331) > at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:292) > at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:262) > at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:247) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) > at >
[jira] [Commented] (SPARK-32206) Enable multi-line true could break the read csv in Azure Data Lake Storage gen2
[ https://issues.apache.org/jira/browse/SPARK-32206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155016#comment-17155016 ] Qionghui Zhang commented on SPARK-32206: [~JinxinTang] I'm using Windows 10 and we are not allowed to use colon in file name. > Enable multi-line true could break the read csv in Azure Data Lake Storage > gen2 > --- > > Key: SPARK-32206 > URL: https://issues.apache.org/jira/browse/SPARK-32206 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.5 >Reporter: Qionghui Zhang >Priority: Major > > I'm using azure data lake gen2, when I'm loading data frame with certain > options: > var df = spark.read.format("csv") > .option("ignoreLeadingWhiteSpace", "true") > .option("ignoreTrailingWhiteSpace", "true") > .option("parserLib", "UNIVOCITY") > .option("multiline", "true") > .option("inferSchema", "true") > .option("mode", "PERMISSIVE") > .option("quote", "\"") > .option("escape", "\"") > .option("timeStampFormat", "M/d/ H:m:s a") > > .load("abfss://\{containername}@\{storage}.dfs.core.windows.net/\{somedirectory}/\{somefilesnapshotname}-MM-dd'T'hh:mm:ss") > .limit(1) > It will load data correctly. > > But if I use \{DirectoryWithColon}, it will thrown error: > java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative > path in absolute URI: \{somefilesnapshotname}-MM-dd'T'hh:mm:ss > > Then if I remove .option("multiline", "true"), data can be loaded, but for > sure that the dataframe is not handled correctly because there are newline > character. > > So I believe it is a bug. > > And since our production is running correctly if we enable > spark.read.schema(\{SomeSchemaList}).format("csv"), and we want to use > inferschema feature on those file path with colon or other special > characters, could you help fix this issue? 
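An illustrative aside on the `URISyntaxException` above (hypothetical file name; Hadoop's `Path` treats path strings as URIs): once a relative path segment contains a colon, everything before the first colon forms a syntactically valid URI scheme, so the parser sees "a relative path in an absolute URI" and rejects it. A sketch of the RFC 3986 scheme rule:

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*:')

def looks_like_absolute_uri(segment):
    """True when the text before a ':' would be read as a URI scheme."""
    return bool(SCHEME.match(segment))

print(looks_like_absolute_uri("snapshot-2020-07-08T10:30:00"))  # True
print(looks_like_absolute_uri("snapshot-2020-07-08"))           # False
```

This is why avoiding colons in the file name (or percent-encoding them, where the storage layer allows it) sidesteps the error regardless of the `multiline` option.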
[jira] [Commented] (SPARK-32244) Build and run the Spark with test cases in Github Actions
[ https://issues.apache.org/jira/browse/SPARK-32244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155011#comment-17155011 ] Hyukjin Kwon commented on SPARK-32244: -- Oh sure, let me create a JIRA for that too. > Build and run the Spark with test cases in Github Actions > - > > Key: SPARK-32244 > URL: https://issues.apache.org/jira/browse/SPARK-32244 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Critical > > Last week and onwards, the Jenkins machines became very unstable for some > reasons. > - Apparently, the machines became extremely slow. Almost all tests can't > pass. > - One machine (worker 4) started to have the corrupt .m2 which fails the > build. > - Documentation build fails time to time for an unknown reason in Jenkins > machine specifically. > Almost all PRs are basically blocked by this instability currently. > This JIRA aims to run the tests in Github Actions. > - To avoid depending on few persons who can access to the cluster. > - To reduce the elapsed time in the build - we could split the tests (e.g., > SQL, ML, CORE), and run them in parallel so the total build time will > significantly reduce. > - To control the environment more flexibly. > - Other contributors can test and propose to fix Github Actions > configurations so we can distribute this build management cost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32262) PySpark regexp_replace not replacing JSON çlçlçl
Marco Vieira created SPARK-32262: Summary: PySpark regexp_replace not replacing JSON çlçlçl Key: SPARK-32262 URL: https://issues.apache.org/jira/browse/SPARK-32262 Project: Spark Issue Type: Question Components: PySpark, Spark Core, SQL Affects Versions: 2.4.4 Environment: asdasd Reporter: Marco Vieira This is an example of the column value: ```hocon '\{someVersion=8.3.2-hmg-dev, someUnitName=IB, someMessage=Test. [BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500, someNumber=9872329, someAppOrigin=APP_PADRAO, someId=c3ASAUSQTiWvl_YA9DYpDV:APA91bGfVcLNNGL20hfmaDDS0D8TuzJDuCjj4tgbRNcJcYASIBRVEE2FnA4exnE4ZWTuupRX7FQkdcJiMWkNEatk8lktkFcpR7P7mehb4r_SVnabIabGInjagGZ6pGyweDkxW2JUGK8g, someType=1, someOriginOpen=null, someOS=null, eventSubType=TESTLOGON, someToken=, ip=error, somePair=0.4220043,-1.084015, eventType=SUCESSO, someMag=aWg4V01qSxDMjAvWmlEWGJ6aExnc2nZJbWZVPQ==, macAddress=33d94a3f7d2f8aff, someJSON={"ip":"error","hostname":null,"type":null,"concode":null,"continent":null,"country":null,"country_name":null,"code":null,"name":null,"city":null,"zip":null,"latitude":null,"longitude":null,"anotherJSON":{"id":null,"capital":null,"languages":null,"flag":null,"flag_emoji":null,"flag_emoji_unicode":null,"calling_code":null,"is_eu":null},"time_zone":\{"id":null,"current_time":null,"gmt_offset":null,"code":null,"is_daylight_saving":null},"currency":\{"code":null,"name":null,"plural":null,"symbol":null,"symbol_native":null},"connection":\{"asn":null,"isp":null},"security":\{"is_proxy":null,"proxy_type":null,"is_crawler":null,"crawler_name":null,"crawler_type":null,"is_tor":null,"threat_level":null,"threat_types":null}}, organization=IBPF, codigoCliente=440149, device=Android SDK built for x86, eventDate=6/1/20 4:03 PM}' ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32263) PySpark regexp_replace not replacing JSON çlçlçlçl
Marco Vieira created SPARK-32263: Summary: PySpark regexp_replace not replacing JSON çlçlçlçl Key: SPARK-32263 URL: https://issues.apache.org/jira/browse/SPARK-32263 Project: Spark Issue Type: Question Components: PySpark, Spark Core, SQL Affects Versions: 2.4.4 Environment: asdasd Reporter: Marco Vieira This is an example of the column value: ```hocon '\{someVersion=8.3.2-hmg-dev, someUnitName=IB, someMessage=Test. [BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500, someNumber=9872329, someAppOrigin=APP_PADRAO, someId=c3ASAUSQTiWvl_YA9DYpDV:APA91bGfVcLNNGL20hfmaDDS0D8TuzJDuCjj4tgbRNcJcYASIBRVEE2FnA4exnE4ZWTuupRX7FQkdcJiMWkNEatk8lktkFcpR7P7mehb4r_SVnabIabGInjagGZ6pGyweDkxW2JUGK8g, someType=1, someOriginOpen=null, someOS=null, eventSubType=TESTLOGON, someToken=, ip=error, somePair=0.4220043,-1.084015, eventType=SUCESSO, someMag=aWg4V01qSxDMjAvWmlEWGJ6aExnc2nZJbWZVPQ==, macAddress=33d94a3f7d2f8aff, someJSON={"ip":"error","hostname":null,"type":null,"concode":null,"continent":null,"country":null,"country_name":null,"code":null,"name":null,"city":null,"zip":null,"latitude":null,"longitude":null,"anotherJSON":{"id":null,"capital":null,"languages":null,"flag":null,"flag_emoji":null,"flag_emoji_unicode":null,"calling_code":null,"is_eu":null},"time_zone":\{"id":null,"current_time":null,"gmt_offset":null,"code":null,"is_daylight_saving":null},"currency":\{"code":null,"name":null,"plural":null,"symbol":null,"symbol_native":null},"connection":\{"asn":null,"isp":null},"security":\{"is_proxy":null,"proxy_type":null,"is_crawler":null,"crawler_name":null,"crawler_type":null,"is_tor":null,"threat_level":null,"threat_types":null}}, organization=IBPF, codigoCliente=440149, device=Android SDK built for x86, eventDate=6/1/20 4:03 PM}' ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32260) PySpark regexp_replace not replacing JSON çlçl
Marco Vieira created SPARK-32260: Summary: PySpark regexp_replace not replacing JSON çlçl Key: SPARK-32260 URL: https://issues.apache.org/jira/browse/SPARK-32260 Project: Spark Issue Type: Question Components: PySpark, Spark Core, SQL Affects Versions: 2.4.4 Environment: asdasd Reporter: Marco Vieira This is an example of the column value: ```hocon '\{someVersion=8.3.2-hmg-dev, someUnitName=IB, someMessage=Test. [BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500, someNumber=9872329, someAppOrigin=APP_PADRAO, someId=c3ASAUSQTiWvl_YA9DYpDV:APA91bGfVcLNNGL20hfmaDDS0D8TuzJDuCjj4tgbRNcJcYASIBRVEE2FnA4exnE4ZWTuupRX7FQkdcJiMWkNEatk8lktkFcpR7P7mehb4r_SVnabIabGInjagGZ6pGyweDkxW2JUGK8g, someType=1, someOriginOpen=null, someOS=null, eventSubType=TESTLOGON, someToken=, ip=error, somePair=0.4220043,-1.084015, eventType=SUCESSO, someMag=aWg4V01qSxDMjAvWmlEWGJ6aExnc2nZJbWZVPQ==, macAddress=33d94a3f7d2f8aff, someJSON={"ip":"error","hostname":null,"type":null,"concode":null,"continent":null,"country":null,"country_name":null,"code":null,"name":null,"city":null,"zip":null,"latitude":null,"longitude":null,"anotherJSON":{"id":null,"capital":null,"languages":null,"flag":null,"flag_emoji":null,"flag_emoji_unicode":null,"calling_code":null,"is_eu":null},"time_zone":\{"id":null,"current_time":null,"gmt_offset":null,"code":null,"is_daylight_saving":null},"currency":\{"code":null,"name":null,"plural":null,"symbol":null,"symbol_native":null},"connection":\{"asn":null,"isp":null},"security":\{"is_proxy":null,"proxy_type":null,"is_crawler":null,"crawler_name":null,"crawler_type":null,"is_tor":null,"threat_level":null,"threat_types":null}}, organization=IBPF, codigoCliente=440149, device=Android SDK built for x86, eventDate=6/1/20 4:03 PM}' ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32261) PySpark regexp_replace not replacing JSON çlçlçl
Marco Vieira created SPARK-32261: Summary: PySpark regexp_replace not replacing JSON çlçlçl Key: SPARK-32261 URL: https://issues.apache.org/jira/browse/SPARK-32261 Project: Spark Issue Type: Question Components: PySpark, Spark Core, SQL Affects Versions: 2.4.4 Environment: asdasd Reporter: Marco Vieira This is an example of the column value: ```hocon '\{someVersion=8.3.2-hmg-dev, someUnitName=IB, someMessage=Test. [BL056], someOrigin=MOBILE, someStatus=TEST, duration=3500, someNumber=9872329, someAppOrigin=APP_PADRAO, someId=c3ASAUSQTiWvl_YA9DYpDV:APA91bGfVcLNNGL20hfmaDDS0D8TuzJDuCjj4tgbRNcJcYASIBRVEE2FnA4exnE4ZWTuupRX7FQkdcJiMWkNEatk8lktkFcpR7P7mehb4r_SVnabIabGInjagGZ6pGyweDkxW2JUGK8g, someType=1, someOriginOpen=null, someOS=null, eventSubType=TESTLOGON, someToken=, ip=error, somePair=0.4220043,-1.084015, eventType=SUCESSO, someMag=aWg4V01qSxDMjAvWmlEWGJ6aExnc2nZJbWZVPQ==, macAddress=33d94a3f7d2f8aff, someJSON={"ip":"error","hostname":null,"type":null,"concode":null,"continent":null,"country":null,"country_name":null,"code":null,"name":null,"city":null,"zip":null,"latitude":null,"longitude":null,"anotherJSON":{"id":null,"capital":null,"languages":null,"flag":null,"flag_emoji":null,"flag_emoji_unicode":null,"calling_code":null,"is_eu":null},"time_zone":\{"id":null,"current_time":null,"gmt_offset":null,"code":null,"is_daylight_saving":null},"currency":\{"code":null,"name":null,"plural":null,"symbol":null,"symbol_native":null},"connection":\{"asn":null,"isp":null},"security":\{"is_proxy":null,"proxy_type":null,"is_crawler":null,"crawler_name":null,"crawler_type":null,"is_tor":null,"threat_level":null,"threat_types":null}}, organization=IBPF, codigoCliente=440149, device=Android SDK built for x86, eventDate=6/1/20 4:03 PM}' ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32228) Partition column of hive table was capitalized and stored on HDFS
[ https://issues.apache.org/jira/browse/SPARK-32228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kernel Force updated SPARK-32228: - Description: Suppose we have a target hive table to be insert by spark with dynamic partition feature on. {code:sql} CREATE TABLE DEMO_PART ( ID VARCHAR(10), NAME VARCHAR(10) ) PARTITIONED BY (BATCH DATE, TEAM VARCHAR(10)) STORED AS ORC; {code} And have a source data table like: {code:sql} 0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_DATA T; +---+-+-+-+ | t.id | t.name | t.batch | t.team | +---+-+-+-+ | 1 | mike| 2020-07-08 | A | | 2 | john| 2020-07-07 | B | +---+-+-+-+ 2 rows selected (0.177 seconds) {code} Then doing join operation against an exploded view and insert the result into DEMO_PART table: {code:sql} sql(""" WITH VA AS ( SELECT ARRAY_REPEAT(1,10) A ), VB AS ( SELECT EXPLODE(T.A) IDX FROM VA T ), VC AS ( SELECT ROW_NUMBER() OVER(ORDER BY NULL) RN FROM VB T ), VD AS ( SELECT T.RN, DATE_ADD(TO_DATE('2020-07-01','-MM-dd'),T.RN) DT FROM VC T ), VE AS ( SELECT T.DT BATCH, T.RN ID, CASE WHEN T.RN > 5 THEN 'A' ELSE 'B' END TEAM FROM VD T ) SELECT T.BATCH BATCH, S.ID ID, S.NAME NAME, S.TEAM TEAM FROM VE T INNER JOIN DEMO_DATA S ON T.TEAM = S.TEAM """). selectExpr(spark.table("DEMO_PART").columns:_*). 
write.mode("overwrite").insertInto("DEMO_PART") {code} The result could NOT be read by hive beeline: {code:sql} 0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_PART T; +---+-+--+-+ | t.id | t.name | t.batch | t.team | +---+-+--+-+ +---+-+--+-+ No rows selected (0.268 seconds) {code} Because the underlying data was stored in HDFS uncorrectly: {code:bash} [user@HOSTNAME ~]$ dfs -ls /user/hive/warehouse/demo_part/ Found 21 items /user/hive/warehouse/demo_part/BATCH=2020-07-02 /user/hive/warehouse/demo_part/BATCH=2020-07-03 /user/hive/warehouse/demo_part/BATCH=2020-07-04 /user/hive/warehouse/demo_part/BATCH=2020-07-05 /user/hive/warehouse/demo_part/BATCH=2020-07-06 /user/hive/warehouse/demo_part/BATCH=2020-07-07 /user/hive/warehouse/demo_part/BATCH=2020-07-08 /user/hive/warehouse/demo_part/BATCH=2020-07-09 /user/hive/warehouse/demo_part/BATCH=2020-07-10 /user/hive/warehouse/demo_part/BATCH=2020-07-11 /user/hive/warehouse/demo_part/_SUCCESS /user/hive/warehouse/demo_part/batch=2020-07-02 /user/hive/warehouse/demo_part/batch=2020-07-03 /user/hive/warehouse/demo_part/batch=2020-07-04 /user/hive/warehouse/demo_part/batch=2020-07-05 /user/hive/warehouse/demo_part/batch=2020-07-06 /user/hive/warehouse/demo_part/batch=2020-07-07 /user/hive/warehouse/demo_part/batch=2020-07-08 /user/hive/warehouse/demo_part/batch=2020-07-09 /user/hive/warehouse/demo_part/batch=2020-07-10 /user/hive/warehouse/demo_part/batch=2020-07-11 {code} Both "BATCH=" and "batch=" directories appeared, and the data files was stored in "BATCH" directories but not "batch" The result will be correct if I change the SQL statement, simply change the column alias to lower case in the last select, like: {code:sql} SELECT T.BATCH batch, S.ID id, S.NAME name, S.TEAM team FROM VE T INNER JOIN DEMO_DATA S ON T.TEAM = S.TEAM {code} was: Suppose we have a target hive table to be insert by spark with dynamic partition feature on. 
{code:sql} CREATE TABLE DEMO_PART ( ID VARCHAR(10), NAME VARCHAR(10) ) PARTITIONED BY (BATCH DATE, TEAM VARCHAR(10)) STORED AS ORC; {code} And have a source data table like: {code:sql} 0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_DATA T; +---+-+-+-+ | t.id | t.name | t.batch | t.team | +---+-+-+-+ | 1 | mike| 2020-07-08 | A | | 2 | john| 2020-07-07 | B | +---+-+-+-+ 2 rows selected (0.177 seconds) {code} Then doing join operation against an exploded view and insert the result into DEMO_PART table: {code:sql} sql(""" WITH VA AS ( SELECT ARRAY_REPEAT(1,10) A ), VB AS ( SELECT EXPLODE(T.A) IDX FROM VA T ), VC AS ( SELECT ROW_NUMBER() OVER(ORDER BY NULL) RN FROM VB T ), VD AS ( SELECT T.RN, DATE_ADD(TO_DATE('2020-07-01','-MM-dd'),T.RN) DT FROM VC T ), VE AS ( SELECT T.DT BATCH, T.RN ID, CASE WHEN T.RN > 5 THEN 'A' ELSE 'B' END TEAM FROM VD T ) SELECT T.BATCH BATCH, S.ID ID, S.NAME NAME, S.TEAM TEAM FROM VE T INNER JOIN DEMO_DATA S ON T.TEAM = S.TEAM """). selectExpr(spark.table("DEMO_PART").columns:_*). write.mode("overwrite").insertInto("DEMO_PART") {code} The result could NOT be read by hive beeline: {code:sql} 0: jdbc:hive2://HOSTNAME:1> SELECT T.* FROM DEMO_PART T; +---+-+--+-+ | t.id | t.name | t.batch | t.team | +---+-+--+-+
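A workaround sketch for the report above (a hypothetical helper, equivalent in spirit to the reporter's own fix of lower-casing the aliases in the final SELECT): rename the DataFrame's columns to lower case before `insertInto`, so the dynamic-partition directories are written as `batch=`/`team=` rather than `BATCH=`/`TEAM=`:

```python
def lowercase_columns(columns):
    """Lower-case column names so they agree with Hive's
    lower-cased partition column names on write."""
    return [c.lower() for c in columns]

# Intended use (requires a live SparkSession, shown as comments only):
# df = df.toDF(*lowercase_columns(df.columns))
# df.write.mode("overwrite").insertInto("DEMO_PART")

print(lowercase_columns(["ID", "NAME", "BATCH", "TEAM"]))
# ['id', 'name', 'batch', 'team']
```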
[jira] [Commented] (SPARK-32220) Cartesian Product Hint cause data error
[ https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154963#comment-17154963 ] Dongjoon Hyun commented on SPARK-32220: --- According to the PR test case, I labeled this as a correctness issue and raised to `Blocker` for 3.0.1. Thanks, [~angerszhuuu]. > Cartesian Product Hint cause data error > --- > > Key: SPARK-32220 > URL: https://issues.apache.org/jira/browse/SPARK-32220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > Labels: correctness > > {code:java} > spark-sql> select * from test4 order by a asc; > 1 2 > Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO > SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s) > spark-sql>select * from test5 order by a asc > 1 2 > 2 2 > Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO > SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar > spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 > where test4.a = test5.a order by test4.a asc ; > 1 2 1 2 > 1 2 2 2 > Time taken: 0.351 seconds, Fetched 2 row(s) > 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched > 2 row(s){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32220) Cartesian Product Hint cause data error
[ https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32220: -- Target Version/s: 3.0.1 > Cartesian Product Hint cause data error > --- > > Key: SPARK-32220 > URL: https://issues.apache.org/jira/browse/SPARK-32220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Blocker > Labels: correctness > > {code:java} > spark-sql> select * from test4 order by a asc; > 1 2 > Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO > SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s) > spark-sql>select * from test5 order by a asc > 1 2 > 2 2 > Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO > SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar > spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 > where test4.a = test5.a order by test4.a asc ; > 1 2 1 2 > 1 2 2 2 > Time taken: 0.351 seconds, Fetched 2 row(s) > 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched > 2 row(s){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32220) Cartesian Product Hint cause data error
[ https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32220: -- Priority: Blocker (was: Major) > Cartesian Product Hint cause data error > --- > > Key: SPARK-32220 > URL: https://issues.apache.org/jira/browse/SPARK-32220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Blocker > Labels: correctness > > {code:java} > spark-sql> select * from test4 order by a asc; > 1 2 > Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO > SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s) > spark-sql>select * from test5 order by a asc > 1 2 > 2 2 > Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO > SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar > spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 > where test4.a = test5.a order by test4.a asc ; > 1 2 1 2 > 1 2 2 2 > Time taken: 0.351 seconds, Fetched 2 row(s) > 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched > 2 row(s){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32220) Cartesian Product Hint cause data error
[ https://issues.apache.org/jira/browse/SPARK-32220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32220: -- Labels: correctness (was: ) > Cartesian Product Hint cause data error > --- > > Key: SPARK-32220 > URL: https://issues.apache.org/jira/browse/SPARK-32220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > Labels: correctness > > {code:java} > spark-sql> select * from test4 order by a asc; > 1 2 > Time taken: 1.063 seconds, Fetched 4 row(s)20/07/08 14:11:25 INFO > SparkSQLCLIDriver: Time taken: 1.063 seconds, Fetched 4 row(s) > spark-sql>select * from test5 order by a asc > 1 2 > 2 2 > Time taken: 1.18 seconds, Fetched 24 row(s)20/07/08 14:13:59 INFO > SparkSQLCLIDriver: Time taken: 1.18 seconds, Fetched 24 row(s)spar > spark-sql>select /*+ shuffle_replicate_nl(test4) */ * from test4 join test5 > where test4.a = test5.a order by test4.a asc ; > 1 2 1 2 > 1 2 2 2 > Time taken: 0.351 seconds, Fetched 2 row(s) > 20/07/08 14:18:16 INFO SparkSQLCLIDriver: Time taken: 0.351 seconds, Fetched > 2 row(s){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
[ https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154953#comment-17154953 ] Prakash Rajendran commented on SPARK-32259: --- [~rvesse] Hi, can you help with this? If you need additional information, let me know. > tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s > --- > > Key: SPARK-32259 > URL: https://issues.apache.org/jira/browse/SPARK-32259 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Prakash Rajendran >Priority: Blocker > Attachments: Capture.PNG > > > In Spark-Submit, I have these config > "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark > is not pointing its spill data to SPARK_LOCAL_DIRS path. > K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local > storage usage exceeds the total limit of containers.*{color}" > > We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod > logs for stack trace is not available. we have only pod events given in > attachment > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
[ https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Rajendran updated SPARK-32259: -- Description: In Spark-Submit, I have these config "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark is not pointing its spill data to SPARK_LOCAL_DIRS path. K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local storage usage exceeds the total limit of containers.*{color}" We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod logs for stack trace is not available. we have only pod events given in attachment was: In Spark-Submit, I have these config "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark is not pointing its spill data to SPARK_LOCAL_DIRS path. K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local storage usage exceeds the total limit of containers.*{color}" We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod logs for stack trace is not available. we have only pod events as below. > tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s > --- > > Key: SPARK-32259 > URL: https://issues.apache.org/jira/browse/SPARK-32259 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Prakash Rajendran >Priority: Blocker > Attachments: Capture.PNG > > > In Spark-Submit, I have these config > "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark > is not pointing its spill data to SPARK_LOCAL_DIRS path. > K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local > storage usage exceeds the total limit of containers.*{color}" > > We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod > logs for stack trace is not available. 
We have only the pod events given in the > attachment. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
[ https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Rajendran updated SPARK-32259: -- Attachment: Capture.PNG > tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s > --- > > Key: SPARK-32259 > URL: https://issues.apache.org/jira/browse/SPARK-32259 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Prakash Rajendran >Priority: Blocker > Attachments: Capture.PNG > > > In Spark-Submit, I have these config > "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark > is not pointing its spill data to SPARK_LOCAL_DIRS path. > K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local > storage usage exceeds the total limit of containers.*{color}" > > We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod > logs for stack trace is not available. we have only pod events as below. > > !image-2020-07-09-16-43-16-473.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
[ https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakash Rajendran updated SPARK-32259: -- Description: In Spark-Submit, I have these config "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark is not pointing its spill data to SPARK_LOCAL_DIRS path. K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local storage usage exceeds the total limit of containers.*{color}" We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod logs for stack trace is not available. we have only pod events as below. was: In Spark-Submit, I have these config "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark is not pointing its spill data to SPARK_LOCAL_DIRS path. K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local storage usage exceeds the total limit of containers.*{color}" We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod logs for stack trace is not available. we have only pod events as below. !image-2020-07-09-16-43-16-473.png! > tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s > --- > > Key: SPARK-32259 > URL: https://issues.apache.org/jira/browse/SPARK-32259 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Prakash Rajendran >Priority: Blocker > Attachments: Capture.PNG > > > In Spark-Submit, I have these config > "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark > is not pointing its spill data to SPARK_LOCAL_DIRS path. > K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local > storage usage exceeds the total limit of containers.*{color}" > > We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod > logs for stack trace is not available. we have only pod events as below. 
[jira] [Created] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
Prakash Rajendran created SPARK-32259: - Summary: tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s Key: SPARK-32259 URL: https://issues.apache.org/jira/browse/SPARK-32259 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.0.0 Reporter: Prakash Rajendran In Spark-Submit, I have these config "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", still spark is not pointing its spill data to SPARK_LOCAL_DIRS path. K8s is evicting the pod due to error "{color:#de350b}*Pod ephemeral local storage usage exceeds the total limit of containers.*{color}" We use Spark launcher to do spark submit in k8s. Since it is evicted, the pod logs for stack trace is not available. we have only pod events as below. !image-2020-07-09-16-43-16-473.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31831) Flaky test: org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.(It is not a test it is a sbt.testing.SuiteSelector)
[ https://issues.apache.org/jira/browse/SPARK-31831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31831. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29039 [https://github.com/apache/spark/pull/29039] > Flaky test: org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.(It > is not a test it is a sbt.testing.SuiteSelector) > > > Key: SPARK-31831 > URL: https://issues.apache.org/jira/browse/SPARK-31831 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.1.0 > > > I've seen the failures two times (not in a row but closely) which seems to > require investigation. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123147/testReport > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123150/testReport > {noformat} > org.mockito.exceptions.base.MockitoException: ClassCastException occurred > while creating the mockito mock : class to mock : > 'org.apache.hive.service.cli.session.SessionManager', loaded by classloader : > 'sun.misc.Launcher$AppClassLoader@483bf400' created class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' proxy > instance class : 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', > loaded by classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' instance > creation by : ObjenesisInstantiator You might experience classloading > issues, please ask the mockito mailing-list. 
> Stack Trace > sbt.ForkMain$ForkError: org.mockito.exceptions.base.MockitoException: > ClassCastException occurred while creating the mockito mock : > class to mock : 'org.apache.hive.service.cli.session.SessionManager', > loaded by classloader : 'sun.misc.Launcher$AppClassLoader@483bf400' > created class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' > proxy instance class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' > instance creation by : ObjenesisInstantiator > You might experience classloading issues, please ask the mockito mailing-list. > at > org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.beforeAll(HiveSessionImplSuite.scala:44) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:59) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: java.lang.ClassCastException: > org.mockito.codegen.SessionManager$MockitoMock$1696557705 cannot be cast to > org.mockito.internal.creation.bytebuddy.MockAccess > at > 
org.mockito.internal.creation.bytebuddy.SubclassByteBuddyMockMaker.createMock(SubclassByteBuddyMockMaker.java:48) > at > org.mockito.internal.creation.bytebuddy.ByteBuddyMockMaker.createMock(ByteBuddyMockMaker.java:25) > at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:35) > at org.mockito.internal.MockitoCore.mock(MockitoCore.java:63) > at org.mockito.Mockito.mock(Mockito.java:1908) > at org.mockito.Mockito.mock(Mockito.java:1817) > ... 13 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31831) Flaky test: org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.(It is not a test it is a sbt.testing.SuiteSelector)
[ https://issues.apache.org/jira/browse/SPARK-31831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31831: - Assignee: Jungtaek Lim > Flaky test: org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.(It > is not a test it is a sbt.testing.SuiteSelector) > > > Key: SPARK-31831 > URL: https://issues.apache.org/jira/browse/SPARK-31831 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > I've seen the failures two times (not in a row but closely) which seems to > require investigation. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123147/testReport > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123150/testReport > {noformat} > org.mockito.exceptions.base.MockitoException: ClassCastException occurred > while creating the mockito mock : class to mock : > 'org.apache.hive.service.cli.session.SessionManager', loaded by classloader : > 'sun.misc.Launcher$AppClassLoader@483bf400' created class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' proxy > instance class : 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', > loaded by classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' instance > creation by : ObjenesisInstantiator You might experience classloading > issues, please ask the mockito mailing-list. 
> Stack Trace > sbt.ForkMain$ForkError: org.mockito.exceptions.base.MockitoException: > ClassCastException occurred while creating the mockito mock : > class to mock : 'org.apache.hive.service.cli.session.SessionManager', > loaded by classloader : 'sun.misc.Launcher$AppClassLoader@483bf400' > created class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' > proxy instance class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' > instance creation by : ObjenesisInstantiator > You might experience classloading issues, please ask the mockito mailing-list. > at > org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.beforeAll(HiveSessionImplSuite.scala:44) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:59) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: java.lang.ClassCastException: > org.mockito.codegen.SessionManager$MockitoMock$1696557705 cannot be cast to > org.mockito.internal.creation.bytebuddy.MockAccess > at > 
org.mockito.internal.creation.bytebuddy.SubclassByteBuddyMockMaker.createMock(SubclassByteBuddyMockMaker.java:48) > at > org.mockito.internal.creation.bytebuddy.ByteBuddyMockMaker.createMock(ByteBuddyMockMaker.java:25) > at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:35) > at org.mockito.internal.MockitoCore.mock(MockitoCore.java:63) > at org.mockito.Mockito.mock(Mockito.java:1908) > at org.mockito.Mockito.mock(Mockito.java:1817) > ... 13 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31831) Flaky test: org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.(It is not a test it is a sbt.testing.SuiteSelector)
[ https://issues.apache.org/jira/browse/SPARK-31831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31831: -- Component/s: Tests > Flaky test: org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.(It > is not a test it is a sbt.testing.SuiteSelector) > > > Key: SPARK-31831 > URL: https://issues.apache.org/jira/browse/SPARK-31831 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Jungtaek Lim >Priority: Major > > I've seen the failures two times (not in a row but closely) which seems to > require investigation. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123147/testReport > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123150/testReport > {noformat} > org.mockito.exceptions.base.MockitoException: ClassCastException occurred > while creating the mockito mock : class to mock : > 'org.apache.hive.service.cli.session.SessionManager', loaded by classloader : > 'sun.misc.Launcher$AppClassLoader@483bf400' created class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' proxy > instance class : 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', > loaded by classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' instance > creation by : ObjenesisInstantiator You might experience classloading > issues, please ask the mockito mailing-list. 
> Stack Trace > sbt.ForkMain$ForkError: org.mockito.exceptions.base.MockitoException: > ClassCastException occurred while creating the mockito mock : > class to mock : 'org.apache.hive.service.cli.session.SessionManager', > loaded by classloader : 'sun.misc.Launcher$AppClassLoader@483bf400' > created class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' > proxy instance class : > 'org.mockito.codegen.SessionManager$MockitoMock$1696557705', loaded by > classloader : > 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@47ecf2c6' > instance creation by : ObjenesisInstantiator > You might experience classloading issues, please ask the mockito mailing-list. > at > org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite.beforeAll(HiveSessionImplSuite.scala:44) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:59) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: java.lang.ClassCastException: > org.mockito.codegen.SessionManager$MockitoMock$1696557705 cannot be cast to > org.mockito.internal.creation.bytebuddy.MockAccess > at > 
org.mockito.internal.creation.bytebuddy.SubclassByteBuddyMockMaker.createMock(SubclassByteBuddyMockMaker.java:48) > at > org.mockito.internal.creation.bytebuddy.ByteBuddyMockMaker.createMock(ByteBuddyMockMaker.java:25) > at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:35) > at org.mockito.internal.MockitoCore.mock(MockitoCore.java:63) > at org.mockito.Mockito.mock(Mockito.java:1908) > at org.mockito.Mockito.mock(Mockito.java:1817) > ... 13 more > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32258) NormalizeFloatingNumbers can directly normalize on certain children expressions
[ https://issues.apache.org/jira/browse/SPARK-32258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154921#comment-17154921 ] Apache Spark commented on SPARK-32258: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/29061 > NormalizeFloatingNumbers can directly normalize on certain children > expressions > --- > > Key: SPARK-32258 > URL: https://issues.apache.org/jira/browse/SPARK-32258 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > Currently NormalizeFloatingNumbers rule treats some expressions as black box > but we can optimize it a bit by normalizing directly the inner children > expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32258) NormalizeFloatingNumbers can directly normalize on certain children expressions
[ https://issues.apache.org/jira/browse/SPARK-32258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154920#comment-17154920 ] Apache Spark commented on SPARK-32258: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/29061 > NormalizeFloatingNumbers can directly normalize on certain children > expressions > --- > > Key: SPARK-32258 > URL: https://issues.apache.org/jira/browse/SPARK-32258 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > Currently NormalizeFloatingNumbers rule treats some expressions as black box > but we can optimize it a bit by normalizing directly the inner children > expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32258) NormalizeFloatingNumbers can directly normalize on certain children expressions
[ https://issues.apache.org/jira/browse/SPARK-32258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32258: Assignee: Apache Spark (was: L. C. Hsieh) > NormalizeFloatingNumbers can directly normalize on certain children > expressions > --- > > Key: SPARK-32258 > URL: https://issues.apache.org/jira/browse/SPARK-32258 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Minor > > Currently NormalizeFloatingNumbers rule treats some expressions as black box > but we can optimize it a bit by normalizing directly the inner children > expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32258) NormalizeFloatingNumbers can directly normalize on certain children expressions
[ https://issues.apache.org/jira/browse/SPARK-32258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32258: Assignee: L. C. Hsieh (was: Apache Spark) > NormalizeFloatingNumbers can directly normalize on certain children > expressions > --- > > Key: SPARK-32258 > URL: https://issues.apache.org/jira/browse/SPARK-32258 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > > Currently NormalizeFloatingNumbers rule treats some expressions as black box > but we can optimize it a bit by normalizing directly the inner children > expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32258) NormalizeFloatingNumbers can directly normalize on certain children expressions
L. C. Hsieh created SPARK-32258: --- Summary: NormalizeFloatingNumbers can directly normalize on certain children expressions Key: SPARK-32258 URL: https://issues.apache.org/jira/browse/SPARK-32258 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh Currently NormalizeFloatingNumbers rule treats some expressions as black box but we can optimize it a bit by normalizing directly the inner children expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32232) IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver given invalid value auto
[ https://issues.apache.org/jira/browse/SPARK-32232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32232: Assignee: Apache Spark > IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver > given invalid value auto > -- > > Key: SPARK-32232 > URL: https://issues.apache.org/jira/browse/SPARK-32232 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.0.0 >Reporter: steven taylor >Assignee: Apache Spark >Priority: Major > > I believe I have discovered a bug when loading > MultilayerPerceptronClassificationModel in spark 3.0.0, scala 2.1.2 which I > have tested and can see is not there in at least Spark 2.4.3, Scala 2.11. > (I'm not sure if the Scala version is important). > > I am using pyspark on a databricks cluster and importing the library "from > pyspark.ml.classification import MultilayerPerceptronClassificationModel" > > When running model=MultilayerPerceptronClassificationModel.("load") and then > model. transform (df) I get the following error: IllegalArgumentException: > MultilayerPerceptronClassifier_8055d1368e78 parameter solver given invalid > value auto. 
> > > This issue can be easily replicated by running the example given on the spark > documents: > [http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier] > > Then adding a save model, load model and transform statement as such: > > *from* *pyspark.ml.classification* *import* MultilayerPerceptronClassifier > *from* *pyspark.ml.evaluation* *import* MulticlassClassificationEvaluator > > _# Load training data_ > data = spark.read.format("libsvm")\ > .load("data/mllib/sample_multiclass_classification_data.txt") > > _# Split the data into train and test_ > splits = data.randomSplit([0.6, 0.4], 1234) > train = splits[0] > test = splits[1] > > _# specify layers for the neural network:_ > _# input layer of size 4 (features), two intermediate of size 5 and 4_ > _# and output of size 3 (classes)_ > layers = [4, 5, 4, 3] > > _# create the trainer and set its parameters_ > trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, > blockSize=128, seed=1234) > > _# train the model_ > model = trainer.fit(train) > > _# compute accuracy on the test set_ > result = model.transform(test) > predictionAndLabels = result.select("prediction", "label") > evaluator = MulticlassClassificationEvaluator(metricName="accuracy") > *print*("Test set accuracy = " + str(evaluator.evaluate(predictionAndLabels))) > > *from* *pyspark.ml.classification* *import* MultilayerPerceptronClassifier, > MultilayerPerceptronClassificationModel > model.save(Save_location) > model2 = MultilayerPerceptronClassificationModel.load(Save_location) > > result_from_loaded = model2.transform(test) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32232) IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver given invalid value auto
[ https://issues.apache.org/jira/browse/SPARK-32232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32232: Assignee: (was: Apache Spark) > IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver > given invalid value auto > -- > > Key: SPARK-32232 > URL: https://issues.apache.org/jira/browse/SPARK-32232 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.0.0 >Reporter: steven taylor >Priority: Major > > I believe I have discovered a bug when loading > MultilayerPerceptronClassificationModel in spark 3.0.0, scala 2.1.2 which I > have tested and can see is not there in at least Spark 2.4.3, Scala 2.11. > (I'm not sure if the Scala version is important). > > I am using pyspark on a databricks cluster and importing the library "from > pyspark.ml.classification import MultilayerPerceptronClassificationModel" > > When running model=MultilayerPerceptronClassificationModel.("load") and then > model. transform (df) I get the following error: IllegalArgumentException: > MultilayerPerceptronClassifier_8055d1368e78 parameter solver given invalid > value auto. 
> > > This issue can be easily replicated by running the example given on the spark > documents: > [http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier] > > Then adding a save model, load model and transform statement as such: > > *from* *pyspark.ml.classification* *import* MultilayerPerceptronClassifier > *from* *pyspark.ml.evaluation* *import* MulticlassClassificationEvaluator > > _# Load training data_ > data = spark.read.format("libsvm")\ > .load("data/mllib/sample_multiclass_classification_data.txt") > > _# Split the data into train and test_ > splits = data.randomSplit([0.6, 0.4], 1234) > train = splits[0] > test = splits[1] > > _# specify layers for the neural network:_ > _# input layer of size 4 (features), two intermediate of size 5 and 4_ > _# and output of size 3 (classes)_ > layers = [4, 5, 4, 3] > > _# create the trainer and set its parameters_ > trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, > blockSize=128, seed=1234) > > _# train the model_ > model = trainer.fit(train) > > _# compute accuracy on the test set_ > result = model.transform(test) > predictionAndLabels = result.select("prediction", "label") > evaluator = MulticlassClassificationEvaluator(metricName="accuracy") > *print*("Test set accuracy = " + str(evaluator.evaluate(predictionAndLabels))) > > *from* *pyspark.ml.classification* *import* MultilayerPerceptronClassifier, > MultilayerPerceptronClassificationModel > model.save(Save_location) > model2. MultilayerPerceptronClassificationModel.load(Save_location) > > result_from_loaded = model2.transform(test) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32232) IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver given invalid value auto
[ https://issues.apache.org/jira/browse/SPARK-32232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154875#comment-17154875 ] Apache Spark commented on SPARK-32232: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/29060 > IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver > given invalid value auto > -- > > Key: SPARK-32232 > URL: https://issues.apache.org/jira/browse/SPARK-32232 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.0.0 >Reporter: steven taylor >Priority: Major > > I believe I have discovered a bug when loading > MultilayerPerceptronClassificationModel in Spark 3.0.0, Scala 2.12, which I > have tested and can see is not present in at least Spark 2.4.3, Scala 2.11. > (I'm not sure whether the Scala version is important.) > > I am using pyspark on a Databricks cluster and importing the class with "from > pyspark.ml.classification import MultilayerPerceptronClassificationModel" > > When running model = MultilayerPerceptronClassificationModel.load(...) and then > model.transform(df), I get the following error: IllegalArgumentException: > MultilayerPerceptronClassifier_8055d1368e78 parameter solver given invalid > value auto. 
> > > This issue can be easily replicated by running the example given in the Spark > documentation: > [http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier] > > and then adding save-model, load-model, and transform statements as follows: > > from pyspark.ml.classification import MultilayerPerceptronClassifier > from pyspark.ml.evaluation import MulticlassClassificationEvaluator > > # Load training data > data = spark.read.format("libsvm")\ > .load("data/mllib/sample_multiclass_classification_data.txt") > > # Split the data into train and test > splits = data.randomSplit([0.6, 0.4], 1234) > train = splits[0] > test = splits[1] > > # specify layers for the neural network: > # input layer of size 4 (features), two intermediate of size 5 and 4 > # and output of size 3 (classes) > layers = [4, 5, 4, 3] > > # create the trainer and set its parameters > trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, > blockSize=128, seed=1234) > > # train the model > model = trainer.fit(train) > > # compute accuracy on the test set > result = model.transform(test) > predictionAndLabels = result.select("prediction", "label") > evaluator = MulticlassClassificationEvaluator(metricName="accuracy") > print("Test set accuracy = " + str(evaluator.evaluate(predictionAndLabels))) > > from pyspark.ml.classification import MultilayerPerceptronClassifier, > MultilayerPerceptronClassificationModel > model.save(Save_location) > model2 = MultilayerPerceptronClassificationModel.load(Save_location) > > result_from_loaded = model2.transform(test) >
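The failure above stems from a mismatch between the Python wrapper and the Scala model: the Python side supplies solver="auto", a value the loaded model's solver param rejects. A minimal pure-Python sketch of this kind of param validation (the Param class here is illustrative, not pyspark's; the allowed values "l-bfgs" and "gd" are taken from the Spark ML documentation):

```python
# Illustrative Param-style validation mirroring the reported failure.
# The allowed MLP solver values ("l-bfgs", "gd") come from the Spark ML
# docs; this Param class is a stand-in, not pyspark's own implementation.
class Param:
    def __init__(self, name, allowed):
        self.name = name
        self.allowed = allowed

    def validate(self, value):
        if value not in self.allowed:
            raise ValueError(f"parameter {self.name} given invalid value {value}")
        return value

solver = Param("solver", {"l-bfgs", "gd"})
solver.validate("l-bfgs")      # accepted
try:
    solver.validate("auto")    # what the Python-side metadata supplies
except ValueError as err:
    message = str(err)         # "parameter solver given invalid value auto"
```

Aligning the Python-side default with the Scala one keeps the invalid "auto" value out of the saved metadata, which matches the fix discussed on the ticket.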
[jira] [Assigned] (SPARK-32035) Inconsistent AWS environment variables in documentation
[ https://issues.apache.org/jira/browse/SPARK-32035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32035: - Assignee: Richard Joerger > Inconsistent AWS environment variables in documentation > --- > > Key: SPARK-32035 > URL: https://issues.apache.org/jira/browse/SPARK-32035 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.4.6, 3.0.0 >Reporter: Ondrej Kokes >Assignee: Richard Joerger >Priority: Minor > Fix For: 2.4.7, 3.0.1, 3.1.0 > > > Looking at the actual Scala code, the environment variables used to log into > AWS are: > - AWS_ACCESS_KEY_ID > - AWS_SECRET_ACCESS_KEY > - AWS_SESSION_TOKEN > These are the same that AWS uses in their libraries. > However, looking through the Spark documentation and comments, I see that > these are not denoted correctly across the board: > docs/cloud-integration.md > 106:1. `spark-submit` reads the `AWS_ACCESS_KEY`, `AWS_SECRET_KEY` *<-- both > different* > 107:and `AWS_SESSION_TOKEN` environment variables and sets the associated > authentication options > docs/streaming-kinesis-integration.md > 232:- Set up the environment variables `AWS_ACCESS_KEY_ID` and > `AWS_SECRET_KEY` with your AWS credentials. 
*<-- secret key different* > external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py > 34: $ export AWS_ACCESS_KEY_ID= > 35: $ export AWS_SECRET_KEY= *<-- different* > 48: Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- secret > key different* > core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala > 438: val keyId = System.getenv("AWS_ACCESS_KEY_ID") > 439: val accessKey = System.getenv("AWS_SECRET_ACCESS_KEY") > 448: val sessionToken = System.getenv("AWS_SESSION_TOKEN") > external/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala > 53: * $ export AWS_ACCESS_KEY_ID= > 54: * $ export AWS_SECRET_KEY= *<-- different* > 65: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- > secret key different* > external/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java > 59: * $ export AWS_ACCESS_KEY_ID=[your-access-key] > 60: * $ export AWS_SECRET_KEY= *<-- different* > 71: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- > secret key different* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
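The canonical names read by SparkHadoopUtil.scala (quoted above) can be checked with a small helper; a sketch, with an invented function name and example values:

```python
import os

# Sketch of the environment lookups quoted from SparkHadoopUtil.scala.
# Note the canonical spellings: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY,
# not the AWS_ACCESS_KEY / AWS_SECRET_KEY variants found in the docs.
def aws_credentials_from_env():
    return {
        "keyId": os.getenv("AWS_ACCESS_KEY_ID"),
        "accessKey": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "sessionToken": os.getenv("AWS_SESSION_TOKEN"),
    }

# Example values, for illustration only.
os.environ["AWS_ACCESS_KEY_ID"] = "example-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret"
os.environ.pop("AWS_SESSION_TOKEN", None)  # make the lookup deterministic
creds = aws_credentials_from_env()
```

These are also the variable names the official AWS SDKs read, which is why the documentation variants silently fail to authenticate.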
[jira] [Resolved] (SPARK-32035) Inconsistent AWS environment variables in documentation
[ https://issues.apache.org/jira/browse/SPARK-32035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32035. --- Fix Version/s: 3.1.0 2.4.7 3.0.1 Resolution: Fixed Issue resolved by pull request 29058 [https://github.com/apache/spark/pull/29058] > Inconsistent AWS environment variables in documentation > --- > > Key: SPARK-32035 > URL: https://issues.apache.org/jira/browse/SPARK-32035 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.4.6, 3.0.0 >Reporter: Ondrej Kokes >Priority: Minor > Fix For: 3.0.1, 2.4.7, 3.1.0 > > > Looking at the actual Scala code, the environment variables used to log into > AWS are: > - AWS_ACCESS_KEY_ID > - AWS_SECRET_ACCESS_KEY > - AWS_SESSION_TOKEN > These are the same that AWS uses in their libraries. > However, looking through the Spark documentation and comments, I see that > these are not denoted correctly across the board: > docs/cloud-integration.md > 106:1. `spark-submit` reads the `AWS_ACCESS_KEY`, `AWS_SECRET_KEY` *<-- both > different* > 107:and `AWS_SESSION_TOKEN` environment variables and sets the associated > authentication options > docs/streaming-kinesis-integration.md > 232:- Set up the environment variables `AWS_ACCESS_KEY_ID` and > `AWS_SECRET_KEY` with your AWS credentials. 
*<-- secret key different* > external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py > 34: $ export AWS_ACCESS_KEY_ID= > 35: $ export AWS_SECRET_KEY= *<-- different* > 48: Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- secret > key different* > core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala > 438: val keyId = System.getenv("AWS_ACCESS_KEY_ID") > 439: val accessKey = System.getenv("AWS_SECRET_ACCESS_KEY") > 448: val sessionToken = System.getenv("AWS_SESSION_TOKEN") > external/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala > 53: * $ export AWS_ACCESS_KEY_ID= > 54: * $ export AWS_SECRET_KEY= *<-- different* > 65: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- > secret key different* > external/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java > 59: * $ export AWS_ACCESS_KEY_ID=[your-access-key] > 60: * $ export AWS_SECRET_KEY= *<-- different* > 71: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- > secret key different* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32232) IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver given invalid value auto
[ https://issues.apache.org/jira/browse/SPARK-32232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154755#comment-17154755 ] Huaxin Gao commented on SPARK-32232: [~st_60] No, I will not remove the getter. Thinking it over, it seems all I need to do is make sure that Python and Scala have the same solver default value for MultilayerPerceptronModel. I will submit a PR soon. > IllegalArgumentException: MultilayerPerceptronClassifier_... parameter solver > given invalid value auto > -- > > Key: SPARK-32232 > URL: https://issues.apache.org/jira/browse/SPARK-32232 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.0.0 >Reporter: steven taylor >Priority: Major > > I believe I have discovered a bug when loading > MultilayerPerceptronClassificationModel in Spark 3.0.0, Scala 2.12, which I > have tested and can see is not present in at least Spark 2.4.3, Scala 2.11. > (I'm not sure whether the Scala version is important.) > > I am using pyspark on a Databricks cluster and importing the class with "from > pyspark.ml.classification import MultilayerPerceptronClassificationModel" > > When running model = MultilayerPerceptronClassificationModel.load(...) and then > model.transform(df), I get the following error: IllegalArgumentException: > MultilayerPerceptronClassifier_8055d1368e78 parameter solver given invalid > value auto. 
> > > This issue can be easily replicated by running the example given in the Spark > documentation: > [http://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier] > > and then adding save-model, load-model, and transform statements as follows: > > from pyspark.ml.classification import MultilayerPerceptronClassifier > from pyspark.ml.evaluation import MulticlassClassificationEvaluator > > # Load training data > data = spark.read.format("libsvm")\ > .load("data/mllib/sample_multiclass_classification_data.txt") > > # Split the data into train and test > splits = data.randomSplit([0.6, 0.4], 1234) > train = splits[0] > test = splits[1] > > # specify layers for the neural network: > # input layer of size 4 (features), two intermediate of size 5 and 4 > # and output of size 3 (classes) > layers = [4, 5, 4, 3] > > # create the trainer and set its parameters > trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, > blockSize=128, seed=1234) > > # train the model > model = trainer.fit(train) > > # compute accuracy on the test set > result = model.transform(test) > predictionAndLabels = result.select("prediction", "label") > evaluator = MulticlassClassificationEvaluator(metricName="accuracy") > print("Test set accuracy = " + str(evaluator.evaluate(predictionAndLabels))) > > from pyspark.ml.classification import MultilayerPerceptronClassifier, > MultilayerPerceptronClassificationModel > model.save(Save_location) > model2 = MultilayerPerceptronClassificationModel.load(Save_location) > > result_from_loaded = model2.transform(test) >
[jira] [Resolved] (SPARK-32236) Local cluster should shutdown gracefully
[ https://issues.apache.org/jira/browse/SPARK-32236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-32236. Resolution: Duplicate > Local cluster should shutdown gracefully > > > Key: SPARK-32236 > URL: https://issues.apache.org/jira/browse/SPARK-32236 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Almost every time we call sc.stop in local cluster mode, exceptions like the > following are thrown. > {code:java} > 20/07/09 08:36:45 ERROR TransportRequestHandler: Error while invoking > RpcHandler#receive() for one-way message. > org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:167) > at > org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:150) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:691) > at > org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:253) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) > at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) > at > io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:748) > {code} > The reason is that the asynchronously sent RPC message KillExecutor from the Master > can be processed after the message loop
[jira] [Updated] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-32256: - Description: Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars to access Hive Metastore. These jars are loaded by the isolated classloader. Because we also share Hadoop classes with the isolated classloader, the user doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which means when we are using the isolated classloader, hadoop-common jar is not available in this case. If Hadoop VersionInfo is not initialized before we switch to the isolated classloader, and we try to initialize it using the isolated classloader (the current thread context classloader), it will fail and report `Unknown` which causes Hive to throw the following exception: {code} java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format) at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147) at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122) at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88) at org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) at org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58) at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544) at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:78) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84) at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:219) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:67) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108) at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3349) at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:217) at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:204) at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:331) at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:292) at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:262) at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:247) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511) at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175) at org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:128) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at
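Hive's ShimLoader.getMajorVersion fails because VersionInfo, initialized under the isolated classloader, reports "Unknown", which does not match the expected A.B.* shape. A rough sketch of that shape check (the regex is inferred from the error message, not copied from Hive's code):

```python
import re

# Sketch of the "A.B.*" version check behind the error
# "Illegal Hadoop Version: Unknown (expected A.B.* format)".
# The regex is an inference from the message, not Hive's exact code.
def get_major_version(version):
    match = re.match(r"(\d+)\.(\d+)(\..*)?$", version)
    if not match:
        raise RuntimeError(
            f"Illegal Hadoop Version: {version} (expected A.B.* format)")
    return int(match.group(1))

major = get_major_version("2.7.4")  # parses cleanly to major version 2
```

A version string like "2.7.4" passes this check, while the "Unknown" fallback is rejected; initializing VersionInfo before switching to the isolated classloader avoids the fallback.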
[jira] [Updated] (SPARK-32257) [SQL Parser] Report Error for invalid usage of SET command
[ https://issues.apache.org/jira/browse/SPARK-32257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-32257: Description: {code:java} SET spark.sql.ansi.enabled true{code} The above SQL command does not change the conf value; it just tries to display the value of the conf "spark.sql.ansi.enabled true". We can disallow spaces in the conf name and issue a user-friendly error instead. In the error message, we should tell users the workaround of quoting the name if they still need to specify a conf with a space. was: SET spark.sql.ansi.enabled true The above SQL command does not change the conf value and it just tries to display the value of conf "spark.sql.ansi.enabled true". We can disallow using the space in the conf name and issue a user friendly error instead. In the error message, we should tell users a workaround to use the quote if they still needs to specify a conf with a space. > [SQL Parser] Report Error for invalid usage of SET command > -- > > Key: SPARK-32257 > URL: https://issues.apache.org/jira/browse/SPARK-32257 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > {code:java} > SET spark.sql.ansi.enabled true{code} > The above SQL command does not change the conf value; it just tries to > display the value of the conf "spark.sql.ansi.enabled true". > We can disallow spaces in the conf name and issue a user-friendly > error instead. In the error message, we should tell users the workaround of > quoting the name if they still need to specify a conf with a space. >
[jira] [Commented] (SPARK-32257) [SQL Parser] Report Error for invalid usage of SET command
[ https://issues.apache.org/jira/browse/SPARK-32257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154739#comment-17154739 ] Xiao Li commented on SPARK-32257: - [~beliefer] [~maropu] Are you interested in this? > [SQL Parser] Report Error for invalid usage of SET command > -- > > Key: SPARK-32257 > URL: https://issues.apache.org/jira/browse/SPARK-32257 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > SET spark.sql.ansi.enabled true > The above SQL command does not change the conf value; it just tries to > display the value of the conf "spark.sql.ansi.enabled true". > We can disallow spaces in the conf name and issue a user-friendly > error instead. In the error message, we should tell users the workaround of > quoting the name if they still need to specify a conf with a space. >
[jira] [Created] (SPARK-32257) [SQL Parser] Report Error for invalid usage of SET command
Xiao Li created SPARK-32257: --- Summary: [SQL Parser] Report Error for invalid usage of SET command Key: SPARK-32257 URL: https://issues.apache.org/jira/browse/SPARK-32257 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Xiao Li SET spark.sql.ansi.enabled true The above SQL command does not change the conf value; it just tries to display the value of the conf "spark.sql.ansi.enabled true". We can disallow spaces in the conf name and issue a user-friendly error instead. In the error message, we should tell users the workaround of quoting the name if they still need to specify a conf with a space.
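The proposed check (reject a key containing an unquoted space rather than silently treating the whole string as a key) can be sketched as follows; the function and message wording are illustrative, not Spark's actual parser:

```python
# Illustrative sketch of the proposed SET-command validation; not Spark's
# real parser. A key with an unquoted space is rejected with a hint,
# instead of being silently looked up as "spark.sql.ansi.enabled true".
def parse_set_command(statement):
    body = statement.strip()
    if not body.upper().startswith("SET "):
        raise ValueError("not a SET command")
    rest = body[4:].strip()
    if "=" in rest:
        key, _, value = rest.partition("=")
        return key.strip(), value.strip()
    if " " in rest and not (rest.startswith("`") and rest.endswith("`")):
        raise ValueError(
            "invalid SET command: the key contains a space; "
            "quote the key with backticks if the space is intentional")
    return rest, None  # no value given: display the key's current setting

assignment = parse_set_command("SET spark.sql.ansi.enabled=true")
```

With this shape, `SET spark.sql.ansi.enabled=true` assigns the value, `SET spark.sql.ansi.enabled` displays it, and `SET spark.sql.ansi.enabled true` raises the suggested user-friendly error.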
[jira] [Commented] (SPARK-32244) Build and run the Spark with test cases in Github Actions
[ https://issues.apache.org/jira/browse/SPARK-32244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154738#comment-17154738 ] Dongjoon Hyun commented on SPARK-32244: --- BTW, where is the work-item JIRA for contacting the APACHE INFRA team to get the required resources? > Build and run the Spark with test cases in Github Actions > - > > Key: SPARK-32244 > URL: https://issues.apache.org/jira/browse/SPARK-32244 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Critical > > From last week onwards, the Jenkins machines have become very unstable for > several reasons. > - Apparently, the machines became extremely slow. Almost all tests fail. > - One machine (worker 4) started to have a corrupt .m2, which fails the > build. > - The documentation build fails from time to time, for an unknown reason, > specifically on the Jenkins machines. > Almost all PRs are currently blocked by this instability. > This JIRA aims to run the tests in Github Actions: > - To avoid depending on the few people who have access to the cluster. > - To reduce the elapsed build time - we could split the tests (e.g., > SQL, ML, CORE) and run them in parallel so the total build time is > significantly reduced. > - To control the environment more flexibly. > - So that other contributors can test and propose fixes to the Github Actions > configurations, distributing the build-management cost.
[jira] [Commented] (SPARK-32244) Build and run the Spark with test cases in Github Actions
[ https://issues.apache.org/jira/browse/SPARK-32244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154736#comment-17154736 ] Dongjoon Hyun commented on SPARK-32244: --- It's a great plan! Thanks! > Build and run the Spark with test cases in Github Actions > - > > Key: SPARK-32244 > URL: https://issues.apache.org/jira/browse/SPARK-32244 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Critical > > From last week onwards, the Jenkins machines have become very unstable for > several reasons. > - Apparently, the machines became extremely slow. Almost all tests fail. > - One machine (worker 4) started to have a corrupt .m2, which fails the > build. > - The documentation build fails from time to time, for an unknown reason, > specifically on the Jenkins machines. > Almost all PRs are currently blocked by this instability. > This JIRA aims to run the tests in Github Actions: > - To avoid depending on the few people who have access to the cluster. > - To reduce the elapsed build time - we could split the tests (e.g., > SQL, ML, CORE) and run them in parallel so the total build time is > significantly reduced. > - To control the environment more flexibly. > - So that other contributors can test and propose fixes to the Github Actions > configurations, distributing the build-management cost.
[jira] [Assigned] (SPARK-32231) Use Hadoop 3 profile in AppVeyor SparkR build
[ https://issues.apache.org/jira/browse/SPARK-32231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32231: - Assignee: Dongjoon Hyun (was: Apache Spark) > Use Hadoop 3 profile in AppVeyor SparkR build > - > > Key: SPARK-32231 > URL: https://issues.apache.org/jira/browse/SPARK-32231 > Project: Spark > Issue Type: Test > Components: Project Infra, R >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > > The AppVeyor build with Hadoop 3 is failing; see > https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845 > It will be disabled for now in SPARK-32230. > We should investigate and re-enable it.
[jira] [Closed] (SPARK-32230) Use Hadoop 2.7 profile in AppVeyor SparkR build
[ https://issues.apache.org/jira/browse/SPARK-32230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-32230. - > Use Hadoop 2.7 profile in AppVeyor SparkR build > > > Key: SPARK-32230 > URL: https://issues.apache.org/jira/browse/SPARK-32230 > Project: Spark > Issue Type: Test > Components: Project Infra, R >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Hadoop 3 is used by default as of SPARK-32058, but the AppVeyor build seems > to be failing: > https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/33977845 > We should use Hadoop 2 in the AppVeyor build for now.
[jira] [Updated] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-32256: -- Target Version/s: 3.0.1 > Hive may fail to detect Hadoop version when using isolated classloader > -- > > Key: SPARK-32256 > URL: https://issues.apache.org/jira/browse/SPARK-32256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > > Spark allows the user to set `spark.sql.hive.metastore.jars` to specify jars > to access Hive Metastore. These jars are loaded by the isolated classloader. > Because we also share Hadoop classes with the isolated classloader, the user > doesn't need to add Hadoop jars to `spark.sql.hive.metastore.jars`, which > means when we are using the isolated classloader, hadoop-common jar is not > available in this case. If Hadoop VersionInfo is not initialized before we > switch to the isolated classloader, and we try to initialize it using the > isolated classloader (the current thread context classloader), it will fail > and report `Unknown` which causes Hive to throw the following exception: > {code} > 08:49:33.242 ERROR org.apache.hadoop.hive.shims.ShimLoader: Error loading > shims > java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* > format) > at > org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147) > at > org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122) > at > org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88) > at > org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at > 
org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:219) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:67) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108) > at > 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175) > at > org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:128) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:301) > at > org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:431) > at > org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:324) > at > org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:72) > at > org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:71) > {code} > I marked this as a blocker because it's a regression in 3.0.0.
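The failure mode in the trace above can be sketched outside of Spark: a VersionInfo-style lookup degrades to `Unknown` when its properties resource is not visible to the classloader in use, and Hive's shim loader then rejects `Unknown` because it expects an `A.B.*` version string. The class name, resource name, and regex below are illustrative stand-ins, not Hadoop's or Hive's actual implementation:

```java
import java.io.InputStream;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionProbe {
    // Looks up a version string from a properties resource via the given classloader,
    // falling back to "Unknown" when the resource is not on that loader's classpath --
    // the situation the isolated classloader is in without hadoop-common.
    static String detectVersion(ClassLoader cl, String resource) {
        try (InputStream in = cl.getResourceAsStream(resource)) {
            if (in == null) {
                return "Unknown"; // resource invisible to this classloader
            }
            Properties props = new Properties();
            props.load(in);
            return props.getProperty("version", "Unknown");
        } catch (Exception e) {
            return "Unknown";
        }
    }

    // Mirrors the A.B.* shape check behind the getMajorVersion frame in the trace.
    static String getMajorVersion(String version) {
        Matcher m = Pattern.compile("^(\\d+)\\.(\\d+).*").matcher(version);
        if (!m.matches()) {
            throw new RuntimeException(
                "Illegal Hadoop Version: " + version + " (expected A.B.* format)");
        }
        return m.group(1) + "." + m.group(2);
    }

    public static void main(String[] args) {
        // With an illustrative resource name that is absent from the classpath,
        // detection yields "Unknown" and the version check then throws.
        String version = detectVersion(
            Thread.currentThread().getContextClassLoader(),
            "some-version-info.properties");
        System.out.println(version);
        try {
            getMajorVersion(version);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why initializing VersionInfo before switching the thread context classloader avoids the error: the detection then runs against a classloader that can still see the hadoop-common resources.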
[jira] [Commented] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154730#comment-17154730 ] Dongjoon Hyun commented on SPARK-32256: --- Thank you so much, [~zsxwing]. According to the PR description, this doesn't happen with Apache Spark 3.0.0 with Hadoop 3.2 distribution.
[jira] [Updated] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-32256: - Description:
[jira] [Commented] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154714#comment-17154714 ] Apache Spark commented on SPARK-32256: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/29059
[jira] [Assigned] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32256: Assignee: Apache Spark (was: Shixiong Zhu)
[jira] [Commented] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154713#comment-17154713 ] Apache Spark commented on SPARK-32256: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/29059
[jira] [Assigned] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32256: Assignee: Shixiong Zhu (was: Apache Spark)
[jira] [Assigned] (SPARK-32035) Inconsistent AWS environment variables in documentation
[ https://issues.apache.org/jira/browse/SPARK-32035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32035: Assignee: Apache Spark > Inconsistent AWS environment variables in documentation > --- > > Key: SPARK-32035 > URL: https://issues.apache.org/jira/browse/SPARK-32035 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.4.6, 3.0.0 >Reporter: Ondrej Kokes >Assignee: Apache Spark >Priority: Minor > > Looking at the actual Scala code, the environment variables used to > authenticate with AWS are: > - AWS_ACCESS_KEY_ID > - AWS_SECRET_ACCESS_KEY > - AWS_SESSION_TOKEN > These are the same variables that AWS uses in its own libraries. > However, looking through the Spark documentation and comments, I see that > they are not named consistently across the board: > docs/cloud-integration.md > 106:1. `spark-submit` reads the `AWS_ACCESS_KEY`, `AWS_SECRET_KEY` *<-- both > different* > 107:and `AWS_SESSION_TOKEN` environment variables and sets the associated > authentication options > docs/streaming-kinesis-integration.md > 232:- Set up the environment variables `AWS_ACCESS_KEY_ID` and > `AWS_SECRET_KEY` with your AWS credentials. 
*<-- secret key different* > external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py > 34: $ export AWS_ACCESS_KEY_ID= > 35: $ export AWS_SECRET_KEY= *<-- different* > 48: Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- secret > key different* > core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala > 438: val keyId = System.getenv("AWS_ACCESS_KEY_ID") > 439: val accessKey = System.getenv("AWS_SECRET_ACCESS_KEY") > 448: val sessionToken = System.getenv("AWS_SESSION_TOKEN") > external/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala > 53: * $ export AWS_ACCESS_KEY_ID= > 54: * $ export AWS_SECRET_KEY= *<-- different* > 65: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- > secret key different* > external/kinesis-asl/src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java > 59: * $ export AWS_ACCESS_KEY_ID=[your-access-key] > 60: * $ export AWS_SECRET_KEY= *<-- different* > 71: * Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY *<-- > secret key different* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
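For reference, the lookup performed by the SparkHadoopUtil.scala lines quoted in the issue can be sketched as below. Only the three variable names come from the source; the surrounding class is illustrative, written in Java rather than Scala for brevity:

```java
public class AwsEnvCredentials {
    // The variable names Spark actually reads (per the quoted SparkHadoopUtil.scala
    // lines). AWS_ACCESS_KEY and AWS_SECRET_KEY, which appear in several docs and
    // examples, are the inconsistent names this issue flags.
    public static String keyId() {
        return System.getenv("AWS_ACCESS_KEY_ID");
    }

    public static String accessKey() {
        return System.getenv("AWS_SECRET_ACCESS_KEY");
    }

    public static String sessionToken() {
        return System.getenv("AWS_SESSION_TOKEN"); // optional, temporary credentials
    }

    public static void main(String[] args) {
        // Report only whether each variable is set; never print credential values.
        System.out.println("AWS_ACCESS_KEY_ID set: " + (keyId() != null));
        System.out.println("AWS_SECRET_ACCESS_KEY set: " + (accessKey() != null));
        System.out.println("AWS_SESSION_TOKEN set: " + (sessionToken() != null));
    }
}
```

Matching the AWS SDK's own environment-variable names means credentials exported for the SDK work for Spark unchanged, which is why the documentation fix aligns on AWS_SECRET_ACCESS_KEY.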
[jira] [Commented] (SPARK-32035) Inconsistent AWS environment variables in documentation
[ https://issues.apache.org/jira/browse/SPARK-32035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154710#comment-17154710 ] Apache Spark commented on SPARK-32035: -- User 'Moovlin' has created a pull request for this issue: https://github.com/apache/spark/pull/29058
[jira] [Assigned] (SPARK-32035) Inconsistent AWS environment variables in documentation
[ https://issues.apache.org/jira/browse/SPARK-32035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32035: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
[ https://issues.apache.org/jira/browse/SPARK-32256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-32256: Assignee: Shixiong Zhu > Hive may fail to detect Hadoop version when using isolated classloader > -- > > Key: SPARK-32256 > URL: https://issues.apache.org/jira/browse/SPARK-32256 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Blocker > > TODO > {code} > 08:49:33.242 ERROR org.apache.hadoop.hive.shims.ShimLoader: Error loading > shims > java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* > format) > at > org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:147) > at > org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:122) > at > org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:88) > at > org.apache.hadoop.hive.metastore.ObjectStore.getDataSourceProps(ObjectStore.java:377) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:268) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:517) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:482) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:544) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:370) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:78) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84) > 
at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:219) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:67) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1548) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3080) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3108) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:543) > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:511) > at > org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:175) > at > org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:128) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > 
org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:301) > at > org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:431) > at > org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:324) > at > org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:72) > at > org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:71) > {code} > I marked this as a blocker because it's a regression in 3.0.0.
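For context, Hive's ShimLoader.getMajorVersion rejects any Hadoop version string that does not start with an "A.B." prefix; under the isolated classloader the version is apparently reported as "Unknown", which produces the quoted RuntimeException. A rough Python sketch of that check (the real code is Java; the function name and regex are illustrative):

```python
import re

def hadoop_major_version(version):
    """Accept version strings of the form A.B.* (e.g. "2.7.4") and return
    the major component; anything else is rejected, the way Hive's
    ShimLoader rejects a version reported as "Unknown"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise RuntimeError(
            "Illegal Hadoop Version: %s (expected A.B.* format)" % version)
    return match.group(1)
```

Here hadoop_major_version("3.2.0") returns "3", while hadoop_major_version("Unknown") raises, mirroring the failure in the stack trace above.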
[jira] [Created] (SPARK-32256) Hive may fail to detect Hadoop version when using isolated classloader
Shixiong Zhu created SPARK-32256: Summary: Hive may fail to detect Hadoop version when using isolated classloader Key: SPARK-32256 URL: https://issues.apache.org/jira/browse/SPARK-32256 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Shixiong Zhu
[jira] [Updated] (SPARK-32255) Set up jobs that run periodically for other profile combinations
[ https://issues.apache.org/jira/browse/SPARK-32255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32255: - Parent: SPARK-32244 Issue Type: Sub-task (was: Improvement) > Set up jobs that run periodically for other profile combinations > -- > > Key: SPARK-32255 > URL: https://issues.apache.org/jira/browse/SPARK-32255 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > GitHub Actions supports scheduled builds and jobs; see > https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#onschedule > It would be good to set up builds for the other profiles, and maybe JDK 11 combinations.
[jira] [Created] (SPARK-32255) Set up jobs that run periodically for other profile combinations
Hyukjin Kwon created SPARK-32255: Summary: Set up jobs that run periodically for other profile combinations Key: SPARK-32255 URL: https://issues.apache.org/jira/browse/SPARK-32255 Project: Spark Issue Type: Improvement Components: Project Infra, Tests Affects Versions: 3.0.0, 2.4.6, 3.1.0 Reporter: Hyukjin Kwon GitHub Actions supports scheduled builds and jobs; see https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#onschedule It would be good to set up builds for the other profiles, and maybe JDK 11 combinations.
[jira] [Updated] (SPARK-32253) Make readability better in the test result logs
[ https://issues.apache.org/jira/browse/SPARK-32253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32253: - Summary: Make readability better in the test result logs (was: Make logs readability better ) > Make readability better in the test result logs > --- > > Key: SPARK-32253 > URL: https://issues.apache.org/jira/browse/SPARK-32253 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Currently, the readability of the logs is not very good. For example, see > https://pipelines.actions.githubusercontent.com/gik0C3if0ep5i8iNpgFlcJRQk9UyifmoD6XvJANMVttkEP5xje/_apis/pipelines/1/runs/564/signedlogcontent/4?urlExpires=2020-07-09T14%3A05%3A52.5110439Z=HMACV1=gMGczJ8vtNPeQFE0GpjMxSS1BGq14RJLXUfjsLnaX7s%3D > We should have a way to easily see the failed test cases.
[jira] [Resolved] (SPARK-32254) Reenable SparkSQLEnvSuite's "external listeners should be initialized with Spark classloader"
[ https://issues.apache.org/jira/browse/SPARK-32254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32254. -- Resolution: Invalid For now I just decided to run the other tests in parallel, excluding Hive. https://github.com/apache/spark/pull/29057/files#diff-04eb107ee163a50b61281ca08f4e4c7bR156 With that change, this test isn't an issue anymore. > Reenable SparkSQLEnvSuite's "external listeners should be initialized with > Spark classloader" > - > > Key: SPARK-32254 > URL: https://issues.apache.org/jira/browse/SPARK-32254 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > When this test does not run in a dedicated JVM with SBT, it seems to fail. It also fails in GitHub Actions. We should enable it back.
[jira] [Resolved] (SPARK-32192) Print column name when a ClassCastException is thrown
[ https://issues.apache.org/jira/browse/SPARK-32192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-32192. -- Fix Version/s: (was: 2.4.3) 3.1.0 Resolution: Fixed Issue resolved by pull request 29010 [https://github.com/apache/spark/pull/29010] > Print column name when a ClassCastException is thrown > > > Key: SPARK-32192 > URL: https://issues.apache.org/jira/browse/SPARK-32192 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3, 3.0.0 >Reporter: xiepengjie >Assignee: xiepengjie >Priority: Minor > Fix For: 3.1.0 > > > We have a demo like this: > {code:java} > drop table if exists cast_exception_test; > create table cast_exception_test(c1 int, c2 string) partitioned by (dt string) stored as orc; > insert into table cast_exception_test partition(dt='2020-04-08') values('1', 'jeff_1'); > {code} > We can query this partition with no problem in both Spark and Hive, but if we change the type of the partitioned table's column, Spark will throw an exception while Hive will not: > {code:java} > alter table cast_exception_test change column c1 c1 string; > -- hive correct, but spark throws ClassCastException > select * from cast_exception_test where dt='2020-04-08'; > {code} > The exception looks like this: > {code:java} > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.Text > at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41) > at org.apache.spark.sql.hive.HiveInspectors$$anonfun$unwrapperFor$23.apply(HiveInspectors.scala:547) > at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$21$$anonfun$apply$15.apply(TableReader.scala:515) > at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$21$$anonfun$apply$15.apply(TableReader.scala:515) > at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:532) > at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:522) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) > at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:256) > at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:838) > at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:838) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:326) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:290) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code}
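The improvement the ticket requests, surfacing the column name alongside the ClassCastException, can be sketched as follows. This is an illustrative Python analogue (the actual fix lives in Spark's Scala unwrapper code; the function and names here are invented): wrap each per-column conversion and re-raise failures with the column identified.

```python
def unwrap_row(values, columns, converters):
    """Convert one raw row, column by column; if a converter fails with a
    type/cast error, re-raise with the offending column's name so the
    user can tell which field's schema drifted."""
    result = []
    for value, name, convert in zip(values, columns, converters):
        try:
            result.append(convert(value))
        except (TypeError, ValueError) as err:
            raise TypeError(
                "error while reading column '%s': %s" % (name, err)) from err
    return result
```

With the demo table above, a row like ('1', 'jeff_1') converts cleanly, while a value that no longer matches the old int schema fails with the column name attached instead of a bare cast error.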
[jira] [Assigned] (SPARK-32192) Print column name when a ClassCastException is thrown
[ https://issues.apache.org/jira/browse/SPARK-32192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-32192: Assignee: xiepengjie
[jira] [Created] (SPARK-32254) Reenable SparkSQLEnvSuite's "external listeners should be initialized with Spark classloader"
Hyukjin Kwon created SPARK-32254: Summary: Reenable SparkSQLEnvSuite's "external listeners should be initialized with Spark classloader" Key: SPARK-32254 URL: https://issues.apache.org/jira/browse/SPARK-32254 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0, 2.4.6, 3.1.0 Reporter: Hyukjin Kwon
[jira] [Assigned] (SPARK-32245) Implement the base to run Spark tests in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-32245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32245: Assignee: Hyukjin Kwon (was: Apache Spark) > Implement the base to run Spark tests in GitHub Actions > --- > > Key: SPARK-32245 > URL: https://issues.apache.org/jira/browse/SPARK-32245 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > I have an initial version to run the tests in GitHub Actions. We should have > a minimal working version first.
[jira] [Assigned] (SPARK-32245) Implement the base to run Spark tests in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-32245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32245: Assignee: Apache Spark (was: Hyukjin Kwon)
[jira] [Commented] (SPARK-32245) Implement the base to run Spark tests in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-32245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154601#comment-17154601 ] Apache Spark commented on SPARK-32245: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29057
[jira] [Created] (SPARK-32253) Make logs readability better
Hyukjin Kwon created SPARK-32253: Summary: Make logs readability better Key: SPARK-32253 URL: https://issues.apache.org/jira/browse/SPARK-32253 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 3.0.0, 2.4.6, 3.1.0 Reporter: Hyukjin Kwon
[jira] [Commented] (SPARK-31753) Add missing keywords in the SQL documents
[ https://issues.apache.org/jira/browse/SPARK-31753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154589#comment-17154589 ] Apache Spark commented on SPARK-31753: -- User 'GuoPhilipse' has created a pull request for this issue: https://github.com/apache/spark/pull/29056 > Add missing keywords in the SQL documents > - > > Key: SPARK-31753 > URL: https://issues.apache.org/jira/browse/SPARK-31753 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.1.0 >Reporter: Takeshi Yamamuro >Priority: Minor > > Some keywords are missing in the SQL documents and a list of them is as > follows. > [https://github.com/apache/spark/pull/28290#issuecomment-619321301] > {code:java} > AFTER > CASE/ELSE > WHEN/THEN > IGNORE NULLS > LATERAL VIEW (OUTER)? > MAP KEYS TERMINATED BY > NULL DEFINED AS > LINES TERMINATED BY > ESCAPED BY > COLLECTION ITEMS TERMINATED BY > EXPLAIN LOGICAL > PIVOT > {code} > They should be documented there, too.
[jira] [Commented] (SPARK-31753) Add missing keywords in the SQL documents
[ https://issues.apache.org/jira/browse/SPARK-31753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154588#comment-17154588 ] Apache Spark commented on SPARK-31753: -- User 'GuoPhilipse' has created a pull request for this issue: https://github.com/apache/spark/pull/29056
[jira] [Assigned] (SPARK-31753) Add missing keywords in the SQL documents
[ https://issues.apache.org/jira/browse/SPARK-31753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31753:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-31753) Add missing keywords in the SQL documents
[ https://issues.apache.org/jira/browse/SPARK-31753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-31753:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-32244) Build and run Spark with test cases in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-32244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154583#comment-17154583 ]

Hyukjin Kwon commented on SPARK-32244:
--------------------------------------

cc [~dongjoon] [~Gengliang.Wang] FYI

> Build and run Spark with test cases in GitHub Actions
> -----------------------------------------------------
>
>                 Key: SPARK-32244
>                 URL: https://issues.apache.org/jira/browse/SPARK-32244
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Project Infra
>    Affects Versions: 2.4.6, 3.0.0, 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Critical
>
> Since last week, the Jenkins machines have become very unstable:
> - The machines became extremely slow; almost no tests can pass.
> - One machine (worker 4) started to have a corrupt .m2 cache, which fails the build.
> - The documentation build fails from time to time, for an unknown reason, specifically on the Jenkins machines.
> Almost all PRs are currently blocked by this instability.
> This JIRA aims to run the tests in GitHub Actions:
> - To avoid depending on the few people who have access to the cluster.
> - To reduce the elapsed build time: we could split the tests (e.g., SQL, ML, CORE) and run them in parallel, significantly reducing the total build time.
> - To control the environment more flexibly.
> - So other contributors can test and propose fixes to the GitHub Actions configuration, distributing the build-management cost.
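The split-and-parallelize idea described in the issue could be sketched as a GitHub Actions matrix. This is a hypothetical fragment for illustration only, not the workflow that was actually merged into Spark; the module names and the `--modules` flag on the test script are assumptions:

```yaml
# Hypothetical sketch of the proposed split (illustrative, not Spark's
# actual .github/workflows configuration).
name: build-and-test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # Running module groups in parallel bounds total wall-clock time
        # by the slowest group instead of the sum of all groups.
        module: [core, sql, ml]
    steps:
      - uses: actions/checkout@v2
      - name: Run tests for ${{ matrix.module }}
        # The --modules flag is assumed here for illustration.
        run: ./dev/run-tests --modules ${{ matrix.module }}
```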
[jira] [Assigned] (SPARK-32251) fix SQL keyword document
[ https://issues.apache.org/jira/browse/SPARK-32251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32251:
------------------------------------

    Assignee: Apache Spark  (was: Wenchen Fan)

> fix SQL keyword document
> ------------------------
>
>                 Key: SPARK-32251
>                 URL: https://issues.apache.org/jira/browse/SPARK-32251
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Assigned] (SPARK-32251) fix SQL keyword document
[ https://issues.apache.org/jira/browse/SPARK-32251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32251:
------------------------------------

    Assignee: Wenchen Fan  (was: Apache Spark)