[jira] [Commented] (SPARK-20980) Rename the option `wholeFile` to `multiLine` for JSON and CSV
[ https://issues.apache.org/jira/browse/SPARK-20980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050023#comment-16050023 ] Apache Spark commented on SPARK-20980: -- User 'felixcheung' has created a pull request for this issue: https://github.com/apache/spark/pull/18312 > Rename the option `wholeFile` to `multiLine` for JSON and CSV > - > > Key: SPARK-20980 > URL: https://issues.apache.org/jira/browse/SPARK-20980 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.2.0 > > > The current option name `wholeFile` is misleading for CSV. Currently, it is > not representing a record per file. Actually, one file could have multiple > records. Thus, we should rename it. Now, the proposal is `multiLine`. > To make it consistent, we need to rename the same option for JSON and fix the > issue in another JIRA. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
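For context, a minimal sketch of how the renamed option is used from the DataFrameReader API, assuming a running SparkSession named `spark`; the option values and input paths below are illustrative and are not taken from the JIRA or the pull request.

{code:title=MultiLineExample.scala|borderStyle=solid}
import org.apache.spark.sql.SparkSession

object MultiLineExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multiline-example").getOrCreate()

    // After the rename, `multiLine` (not `wholeFile`) controls whether a single
    // record is allowed to span multiple lines in the input file.
    val json = spark.read
      .option("multiLine", "true")          // JSON records may be pretty-printed across lines
      .json("/tmp/example/people.json")     // illustrative path

    val csv = spark.read
      .option("multiLine", "true")          // CSV fields may contain quoted newlines
      .option("header", "true")
      .csv("/tmp/example/people.csv")       // illustrative path

    json.printSchema()
    csv.printSchema()
    spark.stop()
  }
}
{code}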
[jira] [Assigned] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala
[ https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21087: Assignee: (was: Apache Spark) > CrossValidator, TrainValidationSplit should preserve all models after > fitting: Scala > > > Key: SPARK-21087 > URL: https://issues.apache.org/jira/browse/SPARK-21087 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.2.0 >Reporter: Joseph K. Bradley > > See parent JIRA -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala
[ https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21087: Assignee: Apache Spark > CrossValidator, TrainValidationSplit should preserve all models after > fitting: Scala > > > Key: SPARK-21087 > URL: https://issues.apache.org/jira/browse/SPARK-21087 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.2.0 >Reporter: Joseph K. Bradley >Assignee: Apache Spark > > See parent JIRA -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala
[ https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050053#comment-16050053 ] Apache Spark commented on SPARK-21087: -- User 'hhbyyh' has created a pull request for this issue: https://github.com/apache/spark/pull/18313 > CrossValidator, TrainValidationSplit should preserve all models after > fitting: Scala > > > Key: SPARK-21087 > URL: https://issues.apache.org/jira/browse/SPARK-21087 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.2.0 >Reporter: Joseph K. Bradley > > See parent JIRA -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20200) Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite
[ https://issues.apache.org/jira/browse/SPARK-20200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050190#comment-16050190 ] Apache Spark commented on SPARK-20200: -- User 'jiangxb1987' has created a pull request for this issue: https://github.com/apache/spark/pull/18314 > Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite > - > > Key: SPARK-20200 > URL: https://issues.apache.org/jira/browse/SPARK-20200 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Takuya Ueshin >Priority: Minor > Labels: flaky-test > > This test failed recently here: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/2909/testReport/junit/org.apache.spark.rdd/LocalCheckpointSuite/missing_checkpoint_block_fails_with_informative_message/ > Dashboard > https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message > Error Message > {code} > Collect should have failed if local checkpoint block is removed... > {code} > {code} > org.scalatest.exceptions.TestFailedException: Collect should have failed if > local checkpoint block is removed... > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at org.scalatest.Assertions$class.fail(Assertions.scala:1328) > at org.scalatest.FunSuite.fail(FunSuite.scala:1555) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply$mcV$sp(LocalCheckpointSuite.scala:173) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.rdd.LocalCheckpointSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(LocalCheckpointSuite.scala:27) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.rdd.LocalCheckpointSuite.runTest(LocalCheckpointSuite.scala:27) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:381) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > 
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31) > at org.scalat
[jira] [Commented] (SPARK-16251) LocalCheckpointSuite's - missing checkpoint block fails with informative message is flaky.
[ https://issues.apache.org/jira/browse/SPARK-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050189#comment-16050189 ] Apache Spark commented on SPARK-16251: -- User 'jiangxb1987' has created a pull request for this issue: https://github.com/apache/spark/pull/18314 > LocalCheckpointSuite's - missing checkpoint block fails with informative > message is flaky. > -- > > Key: SPARK-16251 > URL: https://issues.apache.org/jira/browse/SPARK-16251 > Project: Spark > Issue Type: Bug >Reporter: Prashant Sharma >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20200) Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite
[ https://issues.apache.org/jira/browse/SPARK-20200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20200: Assignee: Apache Spark > Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite > - > > Key: SPARK-20200 > URL: https://issues.apache.org/jira/browse/SPARK-20200 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Minor > Labels: flaky-test > > This test failed recently here: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/2909/testReport/junit/org.apache.spark.rdd/LocalCheckpointSuite/missing_checkpoint_block_fails_with_informative_message/ > Dashboard > https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message > Error Message > {code} > Collect should have failed if local checkpoint block is removed... > {code} > {code} > org.scalatest.exceptions.TestFailedException: Collect should have failed if > local checkpoint block is removed... > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at org.scalatest.Assertions$class.fail(Assertions.scala:1328) > at org.scalatest.FunSuite.fail(FunSuite.scala:1555) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply$mcV$sp(LocalCheckpointSuite.scala:173) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.rdd.LocalCheckpointSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(LocalCheckpointSuite.scala:27) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.rdd.LocalCheckpointSuite.runTest(LocalCheckpointSuite.scala:27) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:381) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at 
org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31) > at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492) > at > org.scalatest.Suite$$ano
[jira] [Assigned] (SPARK-20200) Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite
[ https://issues.apache.org/jira/browse/SPARK-20200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20200: Assignee: (was: Apache Spark) > Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite > - > > Key: SPARK-20200 > URL: https://issues.apache.org/jira/browse/SPARK-20200 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Takuya Ueshin >Priority: Minor > Labels: flaky-test > > This test failed recently here: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/2909/testReport/junit/org.apache.spark.rdd/LocalCheckpointSuite/missing_checkpoint_block_fails_with_informative_message/ > Dashboard > https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message > Error Message > {code} > Collect should have failed if local checkpoint block is removed... > {code} > {code} > org.scalatest.exceptions.TestFailedException: Collect should have failed if > local checkpoint block is removed... > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at org.scalatest.Assertions$class.fail(Assertions.scala:1328) > at org.scalatest.FunSuite.fail(FunSuite.scala:1555) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply$mcV$sp(LocalCheckpointSuite.scala:173) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155) > at > org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.rdd.LocalCheckpointSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(LocalCheckpointSuite.scala:27) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.rdd.LocalCheckpointSuite.runTest(LocalCheckpointSuite.scala:27) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:381) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at 
org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31) > at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492) > at > org.scalatest.Suite$$anonfun$runNestedSuites$1.ap
[jira] [Commented] (SPARK-21108) convert LinearSVC to aggregator framework
[ https://issues.apache.org/jira/browse/SPARK-21108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050931#comment-16050931 ] Apache Spark commented on SPARK-21108: -- User 'hhbyyh' has created a pull request for this issue: https://github.com/apache/spark/pull/18315 > convert LinearSVC to aggregator framework > - > > Key: SPARK-21108 > URL: https://issues.apache.org/jira/browse/SPARK-21108 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.2.0 >Reporter: yuhao yang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21108) convert LinearSVC to aggregator framework
[ https://issues.apache.org/jira/browse/SPARK-21108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21108: Assignee: (was: Apache Spark) > convert LinearSVC to aggregator framework > - > > Key: SPARK-21108 > URL: https://issues.apache.org/jira/browse/SPARK-21108 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.2.0 >Reporter: yuhao yang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21108) convert LinearSVC to aggregator framework
[ https://issues.apache.org/jira/browse/SPARK-21108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21108: Assignee: Apache Spark > convert LinearSVC to aggregator framework > - > > Key: SPARK-21108 > URL: https://issues.apache.org/jira/browse/SPARK-21108 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.2.0 >Reporter: yuhao yang >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21111) Fix test failure in 2.2
[ https://issues.apache.org/jira/browse/SPARK-21111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21111: Assignee: Apache Spark (was: Xiao Li) > Fix test failure in 2.2 > > > Key: SPARK-21111 > URL: https://issues.apache.org/jira/browse/SPARK-21111 > Project: Spark > Issue Type: Test > Components: SQL > Affects Versions: 2.2.0 > Reporter: Xiao Li > Assignee: Apache Spark > Priority: Blocker > > Test failure: > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21111) Fix test failure in 2.2
[ https://issues.apache.org/jira/browse/SPARK-21111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21111: Assignee: Xiao Li (was: Apache Spark) > Fix test failure in 2.2 > > > Key: SPARK-21111 > URL: https://issues.apache.org/jira/browse/SPARK-21111 > Project: Spark > Issue Type: Test > Components: SQL > Affects Versions: 2.2.0 > Reporter: Xiao Li > Assignee: Xiao Li > Priority: Blocker > > Test failure: > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21111) Fix test failure in 2.2
[ https://issues.apache.org/jira/browse/SPARK-21111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051095#comment-16051095 ] Apache Spark commented on SPARK-21111: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/18316 > Fix test failure in 2.2 > > > Key: SPARK-21111 > URL: https://issues.apache.org/jira/browse/SPARK-21111 > Project: Spark > Issue Type: Test > Components: SQL > Affects Versions: 2.2.0 > Reporter: Xiao Li > Assignee: Xiao Li > Priority: Blocker > > Test failure: > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader
[ https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051134#comment-16051134 ] Apache Spark commented on SPARK-21113: -- User 'sitalkedia' has created a pull request for this issue: https://github.com/apache/spark/pull/18317 > Support for read ahead input stream to amortize disk IO cost in the Spill reader > > > Key: SPARK-21113 > URL: https://issues.apache.org/jira/browse/SPARK-21113 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.0.2 > Reporter: Sital Kedia > Priority: Minor > > Profiling some of our big jobs, we see that around 30% of the time is being spent in reading the spill files from disk. > In order to amortize the disk IO cost, the idea is to implement a read ahead input stream which asynchronously reads ahead from the underlying input stream once a specified amount of data has been read from the current buffer. > It does this by maintaining two buffers - an active buffer and a read ahead buffer. > The active buffer contains the data that should be returned when a read() call is issued. > The read ahead buffer is used to asynchronously read from the underlying input stream, and once the current active buffer is exhausted, we flip the two buffers so that we can start reading from the read ahead buffer without being blocked in disk I/O. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
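The double-buffering scheme described in the issue can be sketched roughly as follows. This is a simplified illustration of the idea (one background reader thread, single-byte read()), not the implementation in the pull request; the class name and buffer size are made up.

{code:title=SimpleReadAheadInputStream.scala|borderStyle=solid}
import java.io.InputStream
import java.util.concurrent.{Callable, Executors, Future}

// Simplified sketch of the double-buffer idea: serve reads from an active buffer
// while a background thread fills a read-ahead buffer, then flip the two.
class SimpleReadAheadInputStream(underlying: InputStream, bufferSize: Int = 64 * 1024)
    extends InputStream {

  private val executor = Executors.newSingleThreadExecutor()
  private var active = new Array[Byte](bufferSize)     // bytes handed out to callers
  private var readAhead = new Array[Byte](bufferSize)  // bytes being filled asynchronously
  private var activeLen = 0                            // valid bytes in the active buffer
  private var pos = 0                                  // next byte to return from the active buffer
  private var pending: Future[Integer] = null          // in-flight asynchronous fill, if any

  // Kick off an asynchronous fill of the read-ahead buffer.
  private def startReadAhead(): Unit = {
    pending = executor.submit(new Callable[Integer] {
      override def call(): Integer = underlying.read(readAhead)
    })
  }

  override def read(): Int = {
    if (pos >= activeLen) {                  // active buffer exhausted
      if (pending == null) startReadAhead()  // first read: nothing in flight yet
      val n: Int = pending.get()             // wait for the asynchronous fill
      if (n <= 0) return -1                  // underlying stream is exhausted
      // Flip the buffers and immediately start filling the other one.
      val tmp = active; active = readAhead; readAhead = tmp
      activeLen = n
      pos = 0
      startReadAhead()
    }
    val b = active(pos) & 0xff
    pos += 1
    b
  }

  override def close(): Unit = {
    executor.shutdownNow()
    underlying.close()
  }
}
{code}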
[jira] [Assigned] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader
[ https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21113: Assignee: Apache Spark > Support for read ahead input stream to amortize disk IO cost in the Spill reader > > > Key: SPARK-21113 > URL: https://issues.apache.org/jira/browse/SPARK-21113 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.0.2 > Reporter: Sital Kedia > Assignee: Apache Spark > Priority: Minor > > Profiling some of our big jobs, we see that around 30% of the time is being spent in reading the spill files from disk. > In order to amortize the disk IO cost, the idea is to implement a read ahead input stream which asynchronously reads ahead from the underlying input stream once a specified amount of data has been read from the current buffer. > It does this by maintaining two buffers - an active buffer and a read ahead buffer. > The active buffer contains the data that should be returned when a read() call is issued. > The read ahead buffer is used to asynchronously read from the underlying input stream, and once the current active buffer is exhausted, we flip the two buffers so that we can start reading from the read ahead buffer without being blocked in disk I/O. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader
[ https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21113: Assignee: (was: Apache Spark) > Support for read ahead input stream to amortize disk IO cost in the Spill reader > > > Key: SPARK-21113 > URL: https://issues.apache.org/jira/browse/SPARK-21113 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.0.2 > Reporter: Sital Kedia > Priority: Minor > > Profiling some of our big jobs, we see that around 30% of the time is being spent in reading the spill files from disk. > In order to amortize the disk IO cost, the idea is to implement a read ahead input stream which asynchronously reads ahead from the underlying input stream once a specified amount of data has been read from the current buffer. > It does this by maintaining two buffers - an active buffer and a read ahead buffer. > The active buffer contains the data that should be returned when a read() call is issued. > The read ahead buffer is used to asynchronously read from the underlying input stream, and once the current active buffer is exhausted, we flip the two buffers so that we can start reading from the read ahead buffer without being blocked in disk I/O. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21112) ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
[ https://issues.apache.org/jira/browse/SPARK-21112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21112: Assignee: Apache Spark (was: Xiao Li) > ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT > -- > > Key: SPARK-21112 > URL: https://issues.apache.org/jira/browse/SPARK-21112 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Apache Spark > > {{ALTER TABLE SET TBLPROPERTIES}} should not overwrite the COMMENT even if > the input does not have the property of `COMMENT` -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21112) ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
[ https://issues.apache.org/jira/browse/SPARK-21112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21112: Assignee: Xiao Li (was: Apache Spark) > ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT > -- > > Key: SPARK-21112 > URL: https://issues.apache.org/jira/browse/SPARK-21112 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Xiao Li > > {{ALTER TABLE SET TBLPROPERTIES}} should not overwrite the COMMENT even if > the input does not have the property of `COMMENT` -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21112) ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
[ https://issues.apache.org/jira/browse/SPARK-21112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051140#comment-16051140 ] Apache Spark commented on SPARK-21112: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/18318 > ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT > -- > > Key: SPARK-21112 > URL: https://issues.apache.org/jira/browse/SPARK-21112 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Xiao Li > > {{ALTER TABLE SET TBLPROPERTIES}} should not overwrite the COMMENT even if > the input does not have the property of `COMMENT` -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
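A hedged illustration of the expected behavior, assuming a SparkSession whose catalog supports table comments for data source tables; the table name, property key, and DDL details below are examples only and are not taken from the JIRA or the pull request.

{code:title=TblPropertiesCommentCheck.scala|borderStyle=solid}
import org.apache.spark.sql.SparkSession

object TblPropertiesCommentCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tblproperties-comment-check")
      .getOrCreate()

    // Create a table with a comment (illustrative name and DDL).
    spark.sql("CREATE TABLE t (id INT) USING parquet COMMENT 'original comment'")

    // Set an unrelated table property; the property list does not mention COMMENT.
    spark.sql("ALTER TABLE t SET TBLPROPERTIES ('owner' = 'somebody')")

    // Expected after the fix: the table comment is still 'original comment',
    // i.e. ALTER TABLE SET TBLPROPERTIES must not silently drop it.
    spark.sql("DESCRIBE TABLE EXTENDED t").show(100, truncate = false)

    spark.stop()
  }
}
{code}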
[jira] [Assigned] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch
[ https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21114: Assignee: Apache Spark (was: Xiao Li) > Test failure fix in Spark 2.1 due to name mismatch > -- > > Key: SPARK-21114 > URL: https://issues.apache.org/jira/browse/SPARK-21114 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.1.1 >Reporter: Xiao Li >Assignee: Apache Spark > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch
[ https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051290#comment-16051290 ] Apache Spark commented on SPARK-21114: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/18319 > Test failure fix in Spark 2.1 due to name mismatch > -- > > Key: SPARK-21114 > URL: https://issues.apache.org/jira/browse/SPARK-21114 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.1.1 >Reporter: Xiao Li >Assignee: Xiao Li > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch
[ https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21114: Assignee: Xiao Li (was: Apache Spark) > Test failure fix in Spark 2.1 due to name mismatch > -- > > Key: SPARK-21114 > URL: https://issues.apache.org/jira/browse/SPARK-21114 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.1.1 >Reporter: Xiao Li >Assignee: Xiao Li > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/ -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR
[ https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21093: Assignee: Apache Spark > Multiple gapply execution occasionally failed in SparkR > > > Key: SPARK-21093 > URL: https://issues.apache.org/jira/browse/SPARK-21093 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.1, 2.2.0 > Environment: CentOS 7.2.1511 / R 3.4.0, CentOS 7.2.1511 / R 3.3.3 >Reporter: Hyukjin Kwon >Assignee: Apache Spark > > On Centos 7.2.1511 with R 3.4.0/3.3.0, multiple execution of {{gapply}} looks > failed as below: > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.3.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d")) > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > 17/06/14 18:21:01 WARN Utils: Truncated the string representation of a plan > since it was too large. This behavior can be adjusted by setting > 'spark.debug.maxToStringFields' in SparkEnv.conf. > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > Error in handleErrors(returnStatus, conn) : > org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 > in stage 14.0 failed 1 times, most recent failure: Lost task 98.0 in stage > 14.0 (TID 1305, localhost, executor driver): org.apache.spark.SparkException: > R computation failed with > at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108) > at > org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:432) > at > org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:414) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.a > ... 
> *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated > === Backtrace: = > /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe699b3f597] > /lib64/libc.so.6(+0x10c750)[0x7fe699b3d750] > /lib64/libc.so.6(+0x10e507)[0x7fe699b3f507] > /usr/lib64/R/modules//internet.so(+0x6015)[0x7fe689bb7015] > /usr/lib64/R/modules//internet.so(+0xe81e)[0x7fe689bbf81e] > /usr/lib64/R/lib/libR.so(+0xbd1b6)[0x7fe69c54a1b6] > /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fe69c5ad2f4] > /usr/lib64/R/lib/libR.so(+0x123f8e)[0x7fe69c5b0f8e] > /usr/lib64/R/lib/libR.so(Rf_eval+0x589)[0x7fe69c5ad529] > /usr/lib64/R/lib/libR.so(+0x1254ce)[0x7fe69c5b24ce] > /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e] > /usr/lib64/R/lib/libR.so(Rf_eval+0x817)[0x7fe69c5ad7b7] > /usr/lib64/R/lib/libR.so(+0x1256d1)[0x7fe69c5b26d1] > /usr/lib64/R/lib/libR.so(+0x1552e9)[0x7fe69c5e22e9] > /usr/lib64/R/lib/libR.so(+0x11062a)[0x7fe69c59d62a] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e] > /usr/lib64/R/lib/libR.so(+0x11ee91)[0x7fe69c5abe91] > /usr/lib64/R/lib/libR.so(Rf_
[jira] [Assigned] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR
[ https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21093: Assignee: (was: Apache Spark) > Multiple gapply execution occasionally failed in SparkR > > > Key: SPARK-21093 > URL: https://issues.apache.org/jira/browse/SPARK-21093 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.1, 2.2.0 > Environment: CentOS 7.2.1511 / R 3.4.0, CentOS 7.2.1511 / R 3.3.3 >Reporter: Hyukjin Kwon > > On Centos 7.2.1511 with R 3.4.0/3.3.0, multiple execution of {{gapply}} looks > failed as below: > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.3.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d")) > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > 17/06/14 18:21:01 WARN Utils: Truncated the string representation of a plan > since it was too large. This behavior can be adjusted by setting > 'spark.debug.maxToStringFields' in SparkEnv.conf. > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > Error in handleErrors(returnStatus, conn) : > org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 > in stage 14.0 failed 1 times, most recent failure: Lost task 98.0 in stage > 14.0 (TID 1305, localhost, executor driver): org.apache.spark.SparkException: > R computation failed with > at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108) > at > org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:432) > at > org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:414) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.a > ... 
> *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated > === Backtrace: = > /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe699b3f597] > /lib64/libc.so.6(+0x10c750)[0x7fe699b3d750] > /lib64/libc.so.6(+0x10e507)[0x7fe699b3f507] > /usr/lib64/R/modules//internet.so(+0x6015)[0x7fe689bb7015] > /usr/lib64/R/modules//internet.so(+0xe81e)[0x7fe689bbf81e] > /usr/lib64/R/lib/libR.so(+0xbd1b6)[0x7fe69c54a1b6] > /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fe69c5ad2f4] > /usr/lib64/R/lib/libR.so(+0x123f8e)[0x7fe69c5b0f8e] > /usr/lib64/R/lib/libR.so(Rf_eval+0x589)[0x7fe69c5ad529] > /usr/lib64/R/lib/libR.so(+0x1254ce)[0x7fe69c5b24ce] > /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e] > /usr/lib64/R/lib/libR.so(Rf_eval+0x817)[0x7fe69c5ad7b7] > /usr/lib64/R/lib/libR.so(+0x1256d1)[0x7fe69c5b26d1] > /usr/lib64/R/lib/libR.so(+0x1552e9)[0x7fe69c5e22e9] > /usr/lib64/R/lib/libR.so(+0x11062a)[0x7fe69c59d62a] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e] > /usr/lib64/R/lib/libR.so(+0x11ee91)[0x7fe69c5abe91] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad13
[jira] [Commented] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR
[ https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051308#comment-16051308 ] Apache Spark commented on SPARK-21093: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/18320 > Multiple gapply execution occasionally failed in SparkR > > > Key: SPARK-21093 > URL: https://issues.apache.org/jira/browse/SPARK-21093 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.1.1, 2.2.0 > Environment: CentOS 7.2.1511 / R 3.4.0, CentOS 7.2.1511 / R 3.3.3 >Reporter: Hyukjin Kwon > > On Centos 7.2.1511 with R 3.4.0/3.3.0, multiple execution of {{gapply}} looks > failed as below: > {code} > Welcome to > __ >/ __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.3.0-SNAPSHOT > /_/ > SparkSession available as 'spark'. > > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d")) > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > 17/06/14 18:21:01 WARN Utils: Truncated the string representation of a plan > since it was too large. This behavior can be adjusted by setting > 'spark.debug.maxToStringFields' in SparkEnv.conf. > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > a b c d > 1 1 1 1 0.1 > > collect(gapply(df, "a", function(key, x) { x }, schema(df))) > Error in handleErrors(returnStatus, conn) : > org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 > in stage 14.0 failed 1 times, most recent failure: Lost task 98.0 in stage > 14.0 (TID 1305, localhost, executor driver): org.apache.spark.SparkException: > R computation failed with > at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108) > at > org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:432) > at > org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:414) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.a > ... 
> *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated > === Backtrace: = > /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe699b3f597] > /lib64/libc.so.6(+0x10c750)[0x7fe699b3d750] > /lib64/libc.so.6(+0x10e507)[0x7fe699b3f507] > /usr/lib64/R/modules//internet.so(+0x6015)[0x7fe689bb7015] > /usr/lib64/R/modules//internet.so(+0xe81e)[0x7fe689bbf81e] > /usr/lib64/R/lib/libR.so(+0xbd1b6)[0x7fe69c54a1b6] > /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fe69c5ad2f4] > /usr/lib64/R/lib/libR.so(+0x123f8e)[0x7fe69c5b0f8e] > /usr/lib64/R/lib/libR.so(Rf_eval+0x589)[0x7fe69c5ad529] > /usr/lib64/R/lib/libR.so(+0x1254ce)[0x7fe69c5b24ce] > /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e] > /usr/lib64/R/lib/libR.so(Rf_eval+0x817)[0x7fe69c5ad7b7] > /usr/lib64/R/lib/libR.so(+0x1256d1)[0x7fe69c5b26d1] > /usr/lib64/R/lib/libR.so(+0x1552e9)[0x7fe69c5e22e9] > /usr/lib64/R/lib/libR.so(+0x11062a)[0x7fe69c59d62a] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af] > /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101] > /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138] > /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69
[jira] [Commented] (SPARK-12552) Recovered driver's resource is not counted in the Master
[ https://issues.apache.org/jira/browse/SPARK-12552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051322#comment-16051322 ] Apache Spark commented on SPARK-12552: -- User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/18321 > Recovered driver's resource is not counted in the Master > > > Key: SPARK-12552 > URL: https://issues.apache.org/jira/browse/SPARK-12552 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core > Affects Versions: 1.6.0 > Reporter: Saisai Shao > Assignee: Saisai Shao > Fix For: 2.2.1, 2.3.0 > > > Currently, in the implementation of Standalone Master HA, if an application is submitted in cluster mode, the driver's resources (CPU cores and memory) are not counted again when the Master recovers from a failure, which leads to unexpected behaviors such as more executors than expected and negative core and memory usage in the web UI. > Also, the recovered application's state is always {{WAITING}}; we have to change the state to {{RUNNING}} once it is fully recovered. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21115) If the cores left is less than the coresPerExecutor,the cores left will not be allocated, so it should not to check in every schedule
[ https://issues.apache.org/jira/browse/SPARK-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21115: Assignee: (was: Apache Spark) > If the cores left is less than the coresPerExecutor,the cores left will not > be allocated, so it should not to check in every schedule > - > > Key: SPARK-21115 > URL: https://issues.apache.org/jira/browse/SPARK-21115 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.1.1 > Reporter: eaton > Priority: Minor > > If we start an app with the params --total-executor-cores=4 and spark.executor.cores=3, the cores left over are always 1, so the Master will try to allocate executors in the function org.apache.spark.deploy.master.startExecutorsOnWorkers on every schedule even though nothing can be allocated. > Another question is whether it would be better to allocate another executor with 1 core for the leftover cores. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21115) If the cores left is less than the coresPerExecutor,the cores left will not be allocated, so it should not to check in every schedule
[ https://issues.apache.org/jira/browse/SPARK-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051427#comment-16051427 ] Apache Spark commented on SPARK-21115: -- User 'eatoncys' has created a pull request for this issue: https://github.com/apache/spark/pull/18322 > If the cores left is less than the coresPerExecutor,the cores left will not > be allocated, so it should not to check in every schedule > - > > Key: SPARK-21115 > URL: https://issues.apache.org/jira/browse/SPARK-21115 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.1.1 > Reporter: eaton > Priority: Minor > > If we start an app with the params --total-executor-cores=4 and spark.executor.cores=3, the cores left over are always 1, so the Master will try to allocate executors in the function org.apache.spark.deploy.master.startExecutorsOnWorkers on every schedule even though nothing can be allocated. > Another question is whether it would be better to allocate another executor with 1 core for the leftover cores. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21115) If the cores left is less than the coresPerExecutor,the cores left will not be allocated, so it should not to check in every schedule
[ https://issues.apache.org/jira/browse/SPARK-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21115: Assignee: Apache Spark > If the cores left is less than the coresPerExecutor,the cores left will not > be allocated, so it should not to check in every schedule > - > > Key: SPARK-21115 > URL: https://issues.apache.org/jira/browse/SPARK-21115 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.1.1 > Reporter: eaton > Assignee: Apache Spark > Priority: Minor > > If we start an app with the params --total-executor-cores=4 and spark.executor.cores=3, the cores left over are always 1, so the Master will try to allocate executors in the function org.apache.spark.deploy.master.startExecutorsOnWorkers on every schedule even though nothing can be allocated. > Another question is whether it would be better to allocate another executor with 1 core for the leftover cores. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
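The arithmetic behind the report, as a small self-contained sketch; the function below only illustrates the described situation and is not Spark's scheduler code.

{code:title=LeftoverCoresExample.scala|borderStyle=solid}
object LeftoverCoresExample {
  // How many executors can still be launched from the remaining core budget.
  def executorsToLaunch(coresLeft: Int, coresPerExecutor: Int): Int =
    coresLeft / coresPerExecutor

  def main(args: Array[String]): Unit = {
    // --total-executor-cores=4 with spark.executor.cores=3:
    // the first schedule launches one executor (3 cores) and 1 core is left over.
    assert(executorsToLaunch(4, 3) == 1)

    // On every later schedule 1 < 3, so nothing can ever be allocated,
    // which is why re-checking the leftover core on each round is wasted work.
    assert(executorsToLaunch(1, 3) == 0)
  }
}
{code}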
[jira] [Assigned] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET
[ https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21117: Assignee: (was: Apache Spark) > Built-in SQL Function Support - WIDTH_BUCKET > > > Key: SPARK-21117 > URL: https://issues.apache.org/jira/browse/SPARK-21117 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yuming Wang > > For a given expression, the {{WIDTH_BUCKET}} function returns the bucket > number into which the value of this expression would fall after being > evaluated. > {code:sql} > WIDTH_BUCKET (expr , min_value , max_value , num_buckets) > {code} > Ref: > https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET
[ https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21117: Assignee: Apache Spark > Built-in SQL Function Support - WIDTH_BUCKET > > > Key: SPARK-21117 > URL: https://issues.apache.org/jira/browse/SPARK-21117 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark > > For a given expression, the {{WIDTH_BUCKET}} function returns the bucket > number into which the value of this expression would fall after being > evaluated. > {code:sql} > WIDTH_BUCKET (expr , min_value , max_value , num_buckets) > {code} > Ref: > https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET
[ https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051457#comment-16051457 ] Apache Spark commented on SPARK-21117: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/18323 > Built-in SQL Function Support - WIDTH_BUCKET > > > Key: SPARK-21117 > URL: https://issues.apache.org/jira/browse/SPARK-21117 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yuming Wang > > For a given expression, the {{WIDTH_BUCKET}} function returns the bucket > number into which the value of this expression would fall after being > evaluated. > {code:sql} > WIDTH_BUCKET (expr , min_value , max_value , num_buckets) > {code} > Ref: > https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
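To make the contract concrete, a small sketch of the WIDTH_BUCKET semantics for an ascending range (values below min_value fall into bucket 0, values at or above max_value into bucket num_buckets + 1). This mirrors the common SQL definition of the function and is not Spark's implementation, which did not exist yet at the time of this issue.

{code:title=WidthBucketSketch.scala|borderStyle=solid}
object WidthBucketSketch {
  // Equi-width bucketing over [minValue, maxValue) split into numBuckets buckets,
  // numbered 1..numBuckets, with 0 and numBuckets + 1 as underflow/overflow buckets.
  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBuckets: Int): Int = {
    require(numBuckets > 0 && minValue < maxValue)
    if (expr < minValue) 0
    else if (expr >= maxValue) numBuckets + 1
    else {
      val width = (maxValue - minValue) / numBuckets
      ((expr - minValue) / width).toInt + 1
    }
  }

  def main(args: Array[String]): Unit = {
    assert(widthBucket(5.35, 0.024, 10.06, 5) == 3)   // falls into the third of five buckets
    assert(widthBucket(-1.0, 0.0, 10.0, 5) == 0)      // below the range
    assert(widthBucket(10.0, 0.0, 10.0, 5) == 6)      // at or above the range
  }
}
{code}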
[jira] [Commented] (SPARK-21045) Spark executor blocked instead of throwing exception because exception occur when python worker send exception info to PythonRDD in Python 2+
[ https://issues.apache.org/jira/browse/SPARK-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051471#comment-16051471 ] Apache Spark commented on SPARK-21045: -- User 'dataknocker' has created a pull request for this issue: https://github.com/apache/spark/pull/18261 > Spark executor blocked instead of throwing exception because exception occur > when python worker send exception info to PythonRDD in Python 2+ > - > > Key: SPARK-21045 > URL: https://issues.apache.org/jira/browse/SPARK-21045 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.1, 2.0.2, 2.1.1 > Environment: It has problem only in Python 2+. > Python 3+ is ok. >Reporter: Joshuawangzj > > My pyspark program is always blocking in product yarn cluster. Then I jstack > and found : > {code} > "Executor task launch worker for task 0" #60 daemon prio=5 os_prio=31 > tid=0x7fb2f44e3000 nid=0xa003 runnable [0x000123b4a000] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:170) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > - locked <0x0007acab1c98> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:190) > at > org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:234) > at > org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > It is blocking in socket read. I view the log on blocking executor and found > error: > {code} > Traceback (most recent call last): > File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 178, in > main > write_with_length(traceback.format_exc().encode("utf-8"), outfile) > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 618: > ordinal not in range(128) > {code} > Finally I found the problem: > {code:title=worker.py|borderStyle=solid} > # 178 line in spark 2.1.1 > except Exception: > try: > write_int(SpecialLengths.PYTHON_EXCEPTION_THROWN, outfile) > write_with_length(traceback.format_exc().encode("utf-8"), outfile) > except IOError: > # JVM close the socket > pass > except Exception: > # Write the error to stderr if it happened while serializing > print("PySpark worker failed with exception:", file=sys.stderr) > print(traceback.format_exc(), file=sys.stderr) > {code} > when write_with_length(traceback.format_exc().encode("utf-8"), outfile) occur > exception like UnicodeDecodeError, the python worker can't send the trace > info, but when the PythonRDD get PYTHON_EXCEPTION_THROWN, It should read the > trace info length next. 
So it is blocking. > {code:title=PythonRDD.scala|borderStyle=solid} > # 190 line in spark 2.1.1 > case SpecialLengths.PYTHON_EXCEPTION_THROWN => > // Signals that an exception has been thrown in python > val exLength = stream.readInt() // It is possible to be blocked > {code} > {color:red} > We can trigger the bug with a simple program: > {color} > {code:title=test.py|borderStyle=solid} > spark = SparkSession.builder.master('local').getOrCreate() > rdd = spark.sparkContext.parallelize(['中']).map(lambda x: > x.encode("utf8")) > rdd.collect() > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
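For background on the failure mode above, here is a minimal Python sketch (an illustration only, not the fix from the linked pull requests): on Python 2, {{traceback.format_exc()}} returns a byte string, and calling {{.encode("utf-8")}} on it triggers an implicit ASCII decode that fails on non-ASCII bytes, which is exactly the UnicodeDecodeError in the worker log.
{code}
# Hypothetical standalone illustration, not the patch from the linked pull requests.
import traceback


def format_exc_utf8():
    """Return the current traceback as UTF-8 bytes on both Python 2 and 3."""
    msg = traceback.format_exc()
    if isinstance(msg, bytes):       # Python 2: already a raw byte string
        return msg
    return msg.encode("utf-8")       # Python 3: text, encode explicitly


try:
    raise ValueError(u'\u4e2d')      # exception message containing non-ASCII text
except ValueError:
    payload = format_exc_utf8()      # safe to hand to write_with_length afterwards
    print(len(payload))
{code}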
[jira] [Commented] (SPARK-21045) Spark executor blocks instead of throwing an exception because an exception occurs when the Python worker sends exception info to PythonRDD in Python 2+
[ https://issues.apache.org/jira/browse/SPARK-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051485#comment-16051485 ] Apache Spark commented on SPARK-21045: -- User 'dataknocker' has created a pull request for this issue: https://github.com/apache/spark/pull/18324 > Spark executor blocked instead of throwing exception because exception occur > when python worker send exception info to PythonRDD in Python 2+ > - > > Key: SPARK-21045 > URL: https://issues.apache.org/jira/browse/SPARK-21045 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.1, 2.0.2, 2.1.1 > Environment: It has problem only in Python 2+. > Python 3+ is ok. >Reporter: Joshuawangzj > > My pyspark program is always blocking in product yarn cluster. Then I jstack > and found : > {code} > "Executor task launch worker for task 0" #60 daemon prio=5 os_prio=31 > tid=0x7fb2f44e3000 nid=0xa003 runnable [0x000123b4a000] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:170) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read(BufferedInputStream.java:265) > - locked <0x0007acab1c98> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:190) > at > org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:234) > at > org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > It is blocking in socket read. I view the log on blocking executor and found > error: > {code} > Traceback (most recent call last): > File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 178, in > main > write_with_length(traceback.format_exc().encode("utf-8"), outfile) > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 618: > ordinal not in range(128) > {code} > Finally I found the problem: > {code:title=worker.py|borderStyle=solid} > # 178 line in spark 2.1.1 > except Exception: > try: > write_int(SpecialLengths.PYTHON_EXCEPTION_THROWN, outfile) > write_with_length(traceback.format_exc().encode("utf-8"), outfile) > except IOError: > # JVM close the socket > pass > except Exception: > # Write the error to stderr if it happened while serializing > print("PySpark worker failed with exception:", file=sys.stderr) > print(traceback.format_exc(), file=sys.stderr) > {code} > when write_with_length(traceback.format_exc().encode("utf-8"), outfile) occur > exception like UnicodeDecodeError, the python worker can't send the trace > info, but when the PythonRDD get PYTHON_EXCEPTION_THROWN, It should read the > trace info length next. 
So it is blocking. > {code:title=PythonRDD.scala|borderStyle=solid} > # 190 line in spark 2.1.1 > case SpecialLengths.PYTHON_EXCEPTION_THROWN => > // Signals that an exception has been thrown in python > val exLength = stream.readInt() // It is possible to be blocked > {code} > {color:red} > We can trigger the bug with a simple program: > {color} > {code:title=test.py|borderStyle=solid} > spark = SparkSession.builder.master('local').getOrCreate() > rdd = spark.sparkContext.parallelize(['中']).map(lambda x: > x.encode("utf8")) > rdd.collect() > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21119) unset table properties should keep the table comment
[ https://issues.apache.org/jira/browse/SPARK-21119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051490#comment-16051490 ] Apache Spark commented on SPARK-21119: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/18325 > unset table properties should keep the table comment > > > Key: SPARK-21119 > URL: https://issues.apache.org/jira/browse/SPARK-21119 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
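As context for the title above, a minimal PySpark sketch of the scenario (the table name and property key are placeholders, and this is not code from the linked pull request): the table comment should survive unsetting an unrelated table property.
{code}
# Hypothetical reproduction sketch; table name and property key are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

spark.sql("CREATE TABLE t_props (id INT) USING parquet "
          "COMMENT 'example table' TBLPROPERTIES ('owner'='etl')")
spark.sql("ALTER TABLE t_props UNSET TBLPROPERTIES ('owner')")

# The Comment field printed here should still read 'example table'.
spark.sql("DESC EXTENDED t_props").show(truncate=False)
{code}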
[jira] [Assigned] (SPARK-21119) unset table properties should keep the table comment
[ https://issues.apache.org/jira/browse/SPARK-21119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21119: Assignee: Apache Spark (was: Wenchen Fan) > unset table properties should keep the table comment > > > Key: SPARK-21119 > URL: https://issues.apache.org/jira/browse/SPARK-21119 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21119) unset table properties should keep the table comment
[ https://issues.apache.org/jira/browse/SPARK-21119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21119: Assignee: Wenchen Fan (was: Apache Spark) > unset table properties should keep the table comment > > > Key: SPARK-21119 > URL: https://issues.apache.org/jira/browse/SPARK-21119 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21120) Increasing the master's metric is conducive to the spark cluster management system monitoring.
[ https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21120: Assignee: (was: Apache Spark) > Increasing the master's metric is conducive to the spark cluster management > system monitoring. > -- > > Key: SPARK-21120 > URL: https://issues.apache.org/jira/browse/SPARK-21120 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: guoxiaolongzte >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21120) Increasing the master's metric is conducive to the spark cluster management system monitoring.
[ https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051660#comment-16051660 ] Apache Spark commented on SPARK-21120: -- User 'guoxiaolongzte' has created a pull request for this issue: https://github.com/apache/spark/pull/18326 > Increasing the master's metric is conducive to the spark cluster management > system monitoring. > -- > > Key: SPARK-21120 > URL: https://issues.apache.org/jira/browse/SPARK-21120 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: guoxiaolongzte >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21120) Increasing the master's metric is conducive to the spark cluster management system monitoring.
[ https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21120: Assignee: Apache Spark > Increasing the master's metric is conducive to the spark cluster management > system monitoring. > -- > > Key: SPARK-21120 > URL: https://issues.apache.org/jira/browse/SPARK-21120 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: guoxiaolongzte >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21047) Add test suites for complicated cases in ColumnarBatchSuite
[ https://issues.apache.org/jira/browse/SPARK-21047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21047: Assignee: Apache Spark > Add test suites for complicated cases in ColumnarBatchSuite > --- > > Key: SPARK-21047 > URL: https://issues.apache.org/jira/browse/SPARK-21047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki >Assignee: Apache Spark > > Current {{ColumnarBatchSuite}} has very simple test cases for array. This > JIRA will add test suites for complicated cases such as nested array in > {{ColumnVector}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21047) Add test suites for complicated cases in ColumnarBatchSuite
[ https://issues.apache.org/jira/browse/SPARK-21047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051768#comment-16051768 ] Apache Spark commented on SPARK-21047: -- User 'jinxing64' has created a pull request for this issue: https://github.com/apache/spark/pull/18327 > Add test suites for complicated cases in ColumnarBatchSuite > --- > > Key: SPARK-21047 > URL: https://issues.apache.org/jira/browse/SPARK-21047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > > Current {{ColumnarBatchSuite}} has very simple test cases for array. This > JIRA will add test suites for complicated cases such as nested array in > {{ColumnVector}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21047) Add test suites for complicated cases in ColumnarBatchSuite
[ https://issues.apache.org/jira/browse/SPARK-21047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21047: Assignee: (was: Apache Spark) > Add test suites for complicated cases in ColumnarBatchSuite > --- > > Key: SPARK-21047 > URL: https://issues.apache.org/jira/browse/SPARK-21047 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > > Current {{ColumnarBatchSuite}} has very simple test cases for array. This > JIRA will add test suites for complicated cases such as nested array in > {{ColumnVector}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21121) Set up StorageLevel for CACHE TABLE command
[ https://issues.apache.org/jira/browse/SPARK-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051910#comment-16051910 ] Apache Spark commented on SPARK-21121: -- User 'dosoft' has created a pull request for this issue: https://github.com/apache/spark/pull/18328 > Set up StorageLevel for CACHE TABLE command > --- > > Key: SPARK-21121 > URL: https://issues.apache.org/jira/browse/SPARK-21121 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.1 >Reporter: Oleg Danilov >Priority: Minor > > Currently, "CACHE TABLE" always uses the default MEMORY_AND_DISK storage > level. We can add a possibility to specify it using variable, let say, > spark.sql.inMemoryColumnarStorage.level. It will give user a chance to fit > data into the memory with using MEMORY_AND_DISK_SER storage level. > Going to submit PR for this change. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
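For comparison, a small PySpark sketch of the workaround available today (this is not the proposed {{spark.sql.inMemoryColumnarStorage.level}} setting, which does not exist yet; the view name is a placeholder): persist the table's DataFrame with an explicit storage level instead of relying on the CACHE TABLE default.
{code}
# Sketch of the existing DataFrame-level workaround; 't1' is a placeholder view.
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
spark.range(0, 1000000).createOrReplaceTempView("t1")

# useDisk=True, useMemory=True, useOffHeap=False, deserialized=False,
# i.e. roughly MEMORY_AND_DISK_SER on the JVM side.
level = StorageLevel(True, True, False, False)

spark.table("t1").persist(level)
spark.table("t1").count()   # materializes the cached data
{code}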
[jira] [Assigned] (SPARK-21121) Set up StorageLevel for CACHE TABLE command
[ https://issues.apache.org/jira/browse/SPARK-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21121: Assignee: Apache Spark > Set up StorageLevel for CACHE TABLE command > --- > > Key: SPARK-21121 > URL: https://issues.apache.org/jira/browse/SPARK-21121 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.1 >Reporter: Oleg Danilov >Assignee: Apache Spark >Priority: Minor > > Currently, "CACHE TABLE" always uses the default MEMORY_AND_DISK storage > level. We can add a possibility to specify it using variable, let say, > spark.sql.inMemoryColumnarStorage.level. It will give user a chance to fit > data into the memory with using MEMORY_AND_DISK_SER storage level. > Going to submit PR for this change. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21121) Set up StorageLevel for CACHE TABLE command
[ https://issues.apache.org/jira/browse/SPARK-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21121: Assignee: (was: Apache Spark) > Set up StorageLevel for CACHE TABLE command > --- > > Key: SPARK-21121 > URL: https://issues.apache.org/jira/browse/SPARK-21121 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.1.1 >Reporter: Oleg Danilov >Priority: Minor > > Currently, "CACHE TABLE" always uses the default MEMORY_AND_DISK storage > level. We can add a possibility to specify it using variable, let say, > spark.sql.inMemoryColumnarStorage.level. It will give user a chance to fit > data into the memory with using MEMORY_AND_DISK_SER storage level. > Going to submit PR for this change. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19909) Batches will fail in case that temporary checkpoint dir is on local file system while metadata dir is on HDFS
[ https://issues.apache.org/jira/browse/SPARK-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051981#comment-16051981 ] Apache Spark commented on SPARK-19909: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/18329 > Batches will fail in case that temporary checkpoint dir is on local file > system while metadata dir is on HDFS > - > > Key: SPARK-19909 > URL: https://issues.apache.org/jira/browse/SPARK-19909 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.0 >Reporter: Kousuke Saruta >Priority: Minor > > When we try to run Structured Streaming in local mode but use HDFS for the > storage, batches will be fail because of error like as follows. > {code} > val handle = stream.writeStream.format("console").start() > 17/03/09 16:54:45 ERROR StreamMetadata: Error writing stream metadata > StreamMetadata(fc07a0b1-5423-483e-a59d-b2206a49491e) to > /private/var/folders/4y/tmspvv353y59p3w4lknrf7ccgn/T/temporary-79d4fe05-4301-4b6d-a902-dff642d0ddca/metadata > org.apache.hadoop.security.AccessControlException: Permission denied: > user=kou, access=WRITE, > inode="/private/var/folders/4y/tmspvv353y59p3w4lknrf7ccgn/T/temporary-79d4fe05-4301-4b6d-a902-dff642d0ddca/metadata":hdfs:supergroup:drwxr-xr-x > {code} > It's because that a temporary checkpoint directory is created on local file > system but metadata whose path is based on the checkpoint directory will be > created on HDFS. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
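For reference, a short PySpark sketch of the usual way to sidestep the mismatch described above (the HDFS path is a placeholder and this is not the fix from the linked pull request): give the query an explicit checkpoint location on the same file system as the metadata, instead of relying on a temporary local directory.
{code}
# Hypothetical sketch; the checkpoint path must point at a reachable file system.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

stream = spark.readStream.format("rate").load()   # any streaming source works here

handle = (stream.writeStream
          .format("console")
          .option("checkpointLocation", "hdfs:///tmp/checkpoints/console-demo")
          .start())
{code}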
[jira] [Commented] (SPARK-20749) Built-in SQL Function Support - all variants of LEN[GTH]
[ https://issues.apache.org/jira/browse/SPARK-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052058#comment-16052058 ] Apache Spark commented on SPARK-20749: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/18330 > Built-in SQL Function Support - all variants of LEN[GTH] > > > Key: SPARK-20749 > URL: https://issues.apache.org/jira/browse/SPARK-20749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Xiao Li >Assignee: Kazuaki Ishizaki > Labels: starter > Fix For: 2.3.0 > > > {noformat} > LEN[GTH]() > {noformat} > The SQL 99 standard includes BIT_LENGTH(), CHAR_LENGTH(), and OCTET_LENGTH() > functions. > We need to support all of them. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
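For illustration, a PySpark sketch of how the requested variants relate (assuming a Spark build where char_length, octet_length, and bit_length are available, which is what this issue adds): for UTF-8 strings, bit_length is eight times octet_length, while char_length counts characters rather than bytes.
{code}
# Assumes a Spark version that already ships the functions added by this issue.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

spark.sql(
    "SELECT length('Spark') AS len, "
    "       char_length('Spark') AS chars, "
    "       octet_length('Spark') AS octets, "
    "       bit_length('Spark') AS bits"
).show()
# Expected: 5, 5, 5, 40; for a multi-byte string such as '中', expect 1, 1, 3, 24.
{code}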
[jira] [Assigned] (SPARK-21124) Wrong user shown in UI when using kerberos
[ https://issues.apache.org/jira/browse/SPARK-21124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21124: Assignee: Apache Spark > Wrong user shown in UI when using kerberos > -- > > Key: SPARK-21124 > URL: https://issues.apache.org/jira/browse/SPARK-21124 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.2.0 >Reporter: Marcelo Vanzin >Assignee: Apache Spark >Priority: Minor > > When submitting an app to a kerberos-secured cluster, the OS user and the > user running the application may differ. Although it may also happen in > cluster mode depending on the cluster manager's configuration, it's more > common in client mode. > The UI should show enough information about user running the application to > correctly identify the actual user. The "app user" can be easily retrieved > via {{Utils.getCurrentUserName()}}, so it's mostly a matter of how to record > this information (for showing in replayed applications) and how to present it > in the UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21124) Wrong user shown in UI when using kerberos
[ https://issues.apache.org/jira/browse/SPARK-21124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21124: Assignee: (was: Apache Spark) > Wrong user shown in UI when using kerberos > -- > > Key: SPARK-21124 > URL: https://issues.apache.org/jira/browse/SPARK-21124 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.2.0 >Reporter: Marcelo Vanzin >Priority: Minor > > When submitting an app to a kerberos-secured cluster, the OS user and the > user running the application may differ. Although it may also happen in > cluster mode depending on the cluster manager's configuration, it's more > common in client mode. > The UI should show enough information about user running the application to > correctly identify the actual user. The "app user" can be easily retrieved > via {{Utils.getCurrentUserName()}}, so it's mostly a matter of how to record > this information (for showing in replayed applications) and how to present it > in the UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21124) Wrong user shown in UI when using kerberos
[ https://issues.apache.org/jira/browse/SPARK-21124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052441#comment-16052441 ] Apache Spark commented on SPARK-21124: -- User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/18331 > Wrong user shown in UI when using kerberos > -- > > Key: SPARK-21124 > URL: https://issues.apache.org/jira/browse/SPARK-21124 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.2.0 >Reporter: Marcelo Vanzin >Priority: Minor > > When submitting an app to a kerberos-secured cluster, the OS user and the > user running the application may differ. Although it may also happen in > cluster mode depending on the cluster manager's configuration, it's more > common in client mode. > The UI should show enough information about user running the application to > correctly identify the actual user. The "app user" can be easily retrieved > via {{Utils.getCurrentUserName()}}, so it's mostly a matter of how to record > this information (for showing in replayed applications) and how to present it > in the UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
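As context, a small PySpark sketch (an illustration only, not the change under review) of why the two identities can differ: the OS-level user that launched the process is not necessarily the Hadoop/Kerberos user the application runs as.
{code}
# Illustration only; relies on the Hadoop classes already on Spark's classpath
# and on the private _jvm gateway, so treat it as a debugging aid, not an API.
import getpass

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
sc = spark.sparkContext

os_user = getpass.getuser()
hadoop_user = (sc._jvm.org.apache.hadoop.security.UserGroupInformation
               .getCurrentUser().getShortUserName())

# On a kerberized cluster these can differ; the UI should reflect the latter.
print(os_user, hadoop_user)
{code}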
[jira] [Commented] (SPARK-21125) PySpark context missing function to set Job Description.
[ https://issues.apache.org/jira/browse/SPARK-21125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052516#comment-16052516 ] Apache Spark commented on SPARK-21125: -- User 'sjarvie' has created a pull request for this issue: https://github.com/apache/spark/pull/18332 > PySpark context missing function to set Job Description. > > > Key: SPARK-21125 > URL: https://issues.apache.org/jira/browse/SPARK-21125 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1 >Reporter: Shane Jarvie >Priority: Trivial > Labels: beginner > Fix For: 2.2.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > The PySpark API is missing a convenient function currently found in the > Scala API, which sets the Job Description for display in the Spark UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
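Until a dedicated function exists, a commonly used workaround is sketched below (an illustration, not the patch in the linked pull request): the Scala-side setJobDescription is backed by the "spark.job.description" local property, which PySpark can already set.
{code}
# Workaround sketch; the description string is arbitrary.
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
sc.setLocalProperty("spark.job.description", "nightly aggregation (demo)")

# Jobs triggered after this point show the description in the Spark UI.
sc.parallelize(range(1000)).count()
{code}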
[jira] [Assigned] (SPARK-21125) PySpark context missing function to set Job Description.
[ https://issues.apache.org/jira/browse/SPARK-21125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21125: Assignee: Apache Spark > PySpark context missing function to set Job Description. > > > Key: SPARK-21125 > URL: https://issues.apache.org/jira/browse/SPARK-21125 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1 >Reporter: Shane Jarvie >Assignee: Apache Spark >Priority: Trivial > Labels: beginner > Fix For: 2.2.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > The PySpark API is missing a convenient function currently found in the > Scala API, which sets the Job Description for display in the Spark UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21125) PySpark context missing function to set Job Description.
[ https://issues.apache.org/jira/browse/SPARK-21125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21125: Assignee: (was: Apache Spark) > PySpark context missing function to set Job Description. > > > Key: SPARK-21125 > URL: https://issues.apache.org/jira/browse/SPARK-21125 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1 >Reporter: Shane Jarvie >Priority: Trivial > Labels: beginner > Fix For: 2.2.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > The PySpark API is missing a convenient function currently found in the > Scala API, which sets the Job Description for display in the Spark UI. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21126) The configuration named "spark.core.connection.auth.wait.timeout" is not used in Spark
[ https://issues.apache.org/jira/browse/SPARK-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052626#comment-16052626 ] Apache Spark commented on SPARK-21126: -- User 'liu-zhaokun' has created a pull request for this issue: https://github.com/apache/spark/pull/18333 > The configuration named "spark.core.connection.auth.wait.timeout" > is not used in Spark > - > > Key: SPARK-21126 > URL: https://issues.apache.org/jira/browse/SPARK-21126 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: liuzhaokun > > The configuration named "spark.core.connection.auth.wait.timeout" is not used > anywhere in Spark, so I think it should be removed from > configuration.md. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21126) The configuration named "spark.core.connection.auth.wait.timeout" is not used in Spark
[ https://issues.apache.org/jira/browse/SPARK-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21126: Assignee: Apache Spark > The configuration named "spark.core.connection.auth.wait.timeout" > is not used in Spark > - > > Key: SPARK-21126 > URL: https://issues.apache.org/jira/browse/SPARK-21126 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: liuzhaokun >Assignee: Apache Spark > > The configuration named "spark.core.connection.auth.wait.timeout" is not used > anywhere in Spark, so I think it should be removed from > configuration.md. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21126) The configuration named "spark.core.connection.auth.wait.timeout" is not used in Spark
[ https://issues.apache.org/jira/browse/SPARK-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21126: Assignee: (was: Apache Spark) > The configuration named "spark.core.connection.auth.wait.timeout" > is not used in Spark > - > > Key: SPARK-21126 > URL: https://issues.apache.org/jira/browse/SPARK-21126 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: liuzhaokun > > The configuration named "spark.core.connection.auth.wait.timeout" is not used > anywhere in Spark, so I think it should be removed from > configuration.md. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21127) Update statistics after data changing commands
[ https://issues.apache.org/jira/browse/SPARK-21127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21127: Assignee: Apache Spark > Update statistics after data changing commands > -- > > Key: SPARK-21127 > URL: https://issues.apache.org/jira/browse/SPARK-21127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21127) Update statistics after data changing commands
[ https://issues.apache.org/jira/browse/SPARK-21127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21127: Assignee: (was: Apache Spark) > Update statistics after data changing commands > -- > > Key: SPARK-21127 > URL: https://issues.apache.org/jira/browse/SPARK-21127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21127) Update statistics after data changing commands
[ https://issues.apache.org/jira/browse/SPARK-21127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052751#comment-16052751 ] Apache Spark commented on SPARK-21127: -- User 'wzhfy' has created a pull request for this issue: https://github.com/apache/spark/pull/18334 > Update statistics after data changing commands > -- > > Key: SPARK-21127 > URL: https://issues.apache.org/jira/browse/SPARK-21127 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Zhenhua Wang > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
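As background for what this sub-task would automate, a brief PySpark sketch of the manual step needed today (the table name is a placeholder): after a data-changing command, statistics are only refreshed by an explicit ANALYZE TABLE.
{code}
# Sketch of the manual refresh this issue proposes to make automatic.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS t_stats (id INT) USING parquet")
spark.sql("INSERT INTO t_stats VALUES (1), (2), (3)")

# Without this step, DESC EXTENDED would still report the previous statistics.
spark.sql("ANALYZE TABLE t_stats COMPUTE STATISTICS")
spark.sql("DESC EXTENDED t_stats").show(truncate=False)
{code}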
[jira] [Assigned] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"
[ https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21128: Assignee: Apache Spark > Running R tests multiple times failed due to pre-exiting "spark-warehouse" / > "metastore_db" > --- > > Key: SPARK-21128 > URL: https://issues.apache.org/jira/browse/SPARK-21128 > Project: Spark > Issue Type: Test > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > Currently, running R tests multiple times fails due to pre-exiting > "spark-warehouse" / "metastore_db" as below: > {code} > SparkSQL functions: Spark package found in SPARK_HOME: .../spark > ...1234... > {code} > {code} > Failed > - > 1. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 2. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > 3. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 4. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > DONE > === > {code} > It looks we should remove both "spark-warehouse" and "metastore_db" _before_ > listing files into {{sparkRFilesBefore}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"
[ https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052792#comment-16052792 ] Apache Spark commented on SPARK-21128: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/18335 > Running R tests multiple times failed due to pre-exiting "spark-warehouse" / > "metastore_db" > --- > > Key: SPARK-21128 > URL: https://issues.apache.org/jira/browse/SPARK-21128 > Project: Spark > Issue Type: Test > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Currently, running R tests multiple times fails due to pre-exiting > "spark-warehouse" / "metastore_db" as below: > {code} > SparkSQL functions: Spark package found in SPARK_HOME: .../spark > ...1234... > {code} > {code} > Failed > - > 1. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 2. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > 3. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 4. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > DONE > === > {code} > It looks we should remove both "spark-warehouse" and "metastore_db" _before_ > listing files into {{sparkRFilesBefore}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"
[ https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21128: Assignee: (was: Apache Spark) > Running R tests multiple times failed due to pre-exiting "spark-warehouse" / > "metastore_db" > --- > > Key: SPARK-21128 > URL: https://issues.apache.org/jira/browse/SPARK-21128 > Project: Spark > Issue Type: Test > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hyukjin Kwon >Priority: Minor > > Currently, running R tests multiple times fails due to pre-exiting > "spark-warehouse" / "metastore_db" as below: > {code} > SparkSQL functions: Spark package found in SPARK_HOME: .../spark > ...1234... > {code} > {code} > Failed > - > 1. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 2. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3384) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > 3. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > length(list1) not equal to length(list2). > 1/1 mismatches > [1] 25 - 23 == 2 > 4. Failure: No extra files are created in SPARK_HOME by starting session and > making calls (@test_sparkSQL.R#3388) > sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE). > 10/25 mismatches > x[16]: "metastore_db" > y[16]: "pkg" > x[17]: "pkg" > y[17]: "R" > x[18]: "R" > y[18]: "README.md" > x[19]: "README.md" > y[19]: "run-tests.sh" > x[20]: "run-tests.sh" > y[20]: "SparkR_2.2.0.tar.gz" > x[21]: "metastore_db" > y[21]: "pkg" > x[22]: "pkg" > y[22]: "R" > x[23]: "R" > y[23]: "README.md" > x[24]: "README.md" > y[24]: "run-tests.sh" > x[25]: "run-tests.sh" > y[25]: "SparkR_2.2.0.tar.gz" > DONE > === > {code} > It looks we should remove both "spark-warehouse" and "metastore_db" _before_ > listing files into {{sparkRFilesBefore}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21129) Arguments of SQL function call should not be named expressions
[ https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21129: Assignee: Xiao Li (was: Apache Spark) > Arguments of SQL function call should not be named expressions > -- > > Key: SPARK-21129 > URL: https://issues.apache.org/jira/browse/SPARK-21129 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > > Function argument should not be named expressions. It could cause misleading > error message. > {noformat} > spark-sql> select count(distinct c1, distinct c2) from t1; > {noformat} > {noformat} > Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; > line 1 pos 26; > 'Project [unresolvedalias('count(c1#30, 'distinct), None)] > +- SubqueryAlias t1 >+- CatalogRelation `default`.`t1`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21129) Arguments of SQL function call should not be named expressions
[ https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052990#comment-16052990 ] Apache Spark commented on SPARK-21129: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/18338 > Arguments of SQL function call should not be named expressions > -- > > Key: SPARK-21129 > URL: https://issues.apache.org/jira/browse/SPARK-21129 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > > Function argument should not be named expressions. It could cause misleading > error message. > {noformat} > spark-sql> select count(distinct c1, distinct c2) from t1; > {noformat} > {noformat} > Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; > line 1 pos 26; > 'Project [unresolvedalias('count(c1#30, 'distinct), None)] > +- SubqueryAlias t1 >+- CatalogRelation `default`.`t1`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
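For comparison with the misleading query above, a small sketch of the form that is supported (the data here is made up; it reuses the column names from the report): DISTINCT is written once and applies to the whole argument list.
{code}
# Sketch with placeholder data; assumes nothing beyond standard PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "b")], ["c1", "c2"]
).createOrReplaceTempView("t1")

# Supported form: one DISTINCT covering the whole argument list,
# which counts distinct (c1, c2) pairs.
spark.sql("SELECT count(DISTINCT c1, c2) FROM t1").show()
{code}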
[jira] [Assigned] (SPARK-21129) Arguments of SQL function call should not be named expressions
[ https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21129: Assignee: Apache Spark (was: Xiao Li) > Arguments of SQL function call should not be named expressions > -- > > Key: SPARK-21129 > URL: https://issues.apache.org/jira/browse/SPARK-21129 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Apache Spark > > Function argument should not be named expressions. It could cause misleading > error message. > {noformat} > spark-sql> select count(distinct c1, distinct c2) from t1; > {noformat} > {noformat} > Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; > line 1 pos 26; > 'Project [unresolvedalias('count(c1#30, 'distinct), None)] > +- SubqueryAlias t1 >+- CatalogRelation `default`.`t1`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21094) Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway
[ https://issues.apache.org/jira/browse/SPARK-21094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21094: Assignee: Apache Spark > Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway > > > Key: SPARK-21094 > URL: https://issues.apache.org/jira/browse/SPARK-21094 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.1.1 >Reporter: Peter Parente >Assignee: Apache Spark > > The Popen call to launch the py4j gateway specifies no stdout and stderr > options, meaning logging from the JVM always goes to the parent process > terminal. > https://github.com/apache/spark/blob/v2.1.1/python/pyspark/java_gateway.py#L77 > It would be super handy if the launch_gateway function took an additional > dict parameter called popen_kwargs and passed it along to the Popen calls. > This API enhancement, for example, would allow Python applications to capture > all stdout and stderr coming from Spark and process it programmatically, > without resorting to reading from log files or other hijinks. > Example use: > {code} > import pyspark > import subprocess > from pyspark.java_gateway import launch_gateway > # Make the py4j JVM stdout and stderr available without buffering > popen_kwargs = { > 'stdout': subprocess.PIPE, > 'stderr': subprocess.PIPE, > 'bufsiz': 0 > } > # Launch the gateway with our custom settings > gateway = launch_gateway(popen_kwargs=popen_kwargs) > # Use the gateway we launched > sc = pyspark.SparkContext(gateway=gateway) > # This could be done in a thread or event loop or ... > # Written briefly / poorly here only as a demo > while True: > buf = gateway.proc.stdout.read() > print(buf.decode('utf-8')) > {code} > To get access to the stdout and stderr pipes, the "proc" instance created in > launch_gateway also needs to be exposed to the application. I'm thinking that > stashing it on the JavaGateway instance that the function already returns is > the cleanest from the client perspective, but means hanging an extra > attribute off the py4j.JavaGateway object. > I can submit a PR with this addition for further discussion. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21094) Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway
[ https://issues.apache.org/jira/browse/SPARK-21094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21094: Assignee: (was: Apache Spark) > Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway > > > Key: SPARK-21094 > URL: https://issues.apache.org/jira/browse/SPARK-21094 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.1.1 >Reporter: Peter Parente > > The Popen call to launch the py4j gateway specifies no stdout and stderr > options, meaning logging from the JVM always goes to the parent process > terminal. > https://github.com/apache/spark/blob/v2.1.1/python/pyspark/java_gateway.py#L77 > It would be super handy if the launch_gateway function took an additional > dict parameter called popen_kwargs and passed it along to the Popen calls. > This API enhancement, for example, would allow Python applications to capture > all stdout and stderr coming from Spark and process it programmatically, > without resorting to reading from log files or other hijinks. > Example use: > {code} > import pyspark > import subprocess > from pyspark.java_gateway import launch_gateway > # Make the py4j JVM stdout and stderr available without buffering > popen_kwargs = { > 'stdout': subprocess.PIPE, > 'stderr': subprocess.PIPE, > 'bufsiz': 0 > } > # Launch the gateway with our custom settings > gateway = launch_gateway(popen_kwargs=popen_kwargs) > # Use the gateway we launched > sc = pyspark.SparkContext(gateway=gateway) > # This could be done in a thread or event loop or ... > # Written briefly / poorly here only as a demo > while True: > buf = gateway.proc.stdout.read() > print(buf.decode('utf-8')) > {code} > To get access to the stdout and stderr pipes, the "proc" instance created in > launch_gateway also needs to be exposed to the application. I'm thinking that > stashing it on the JavaGateway instance that the function already returns is > the cleanest from the client perspective, but means hanging an extra > attribute off the py4j.JavaGateway object. > I can submit a PR with this addition for further discussion. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21094) Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway
[ https://issues.apache.org/jira/browse/SPARK-21094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053028#comment-16053028 ] Apache Spark commented on SPARK-21094: -- User 'parente' has created a pull request for this issue: https://github.com/apache/spark/pull/18339 > Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway > > > Key: SPARK-21094 > URL: https://issues.apache.org/jira/browse/SPARK-21094 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.1.1 >Reporter: Peter Parente > > The Popen call to launch the py4j gateway specifies no stdout and stderr > options, meaning logging from the JVM always goes to the parent process > terminal. > https://github.com/apache/spark/blob/v2.1.1/python/pyspark/java_gateway.py#L77 > It would be super handy if the launch_gateway function took an additional > dict parameter called popen_kwargs and passed it along to the Popen calls. > This API enhancement, for example, would allow Python applications to capture > all stdout and stderr coming from Spark and process it programmatically, > without resorting to reading from log files or other hijinks. > Example use: > {code} > import pyspark > import subprocess > from pyspark.java_gateway import launch_gateway > # Make the py4j JVM stdout and stderr available without buffering > popen_kwargs = { > 'stdout': subprocess.PIPE, > 'stderr': subprocess.PIPE, > 'bufsiz': 0 > } > # Launch the gateway with our custom settings > gateway = launch_gateway(popen_kwargs=popen_kwargs) > # Use the gateway we launched > sc = pyspark.SparkContext(gateway=gateway) > # This could be done in a thread or event loop or ... > # Written briefly / poorly here only as a demo > while True: > buf = gateway.proc.stdout.read() > print(buf.decode('utf-8')) > {code} > To get access to the stdout and stderr pipes, the "proc" instance created in > launch_gateway also needs to be exposed to the application. I'm thinking that > stashing it on the JavaGateway instance that the function already returns is > the cleanest from the client perspective, but means hanging an extra > attribute off the py4j.JavaGateway object. > I can submit a PR with this addition for further discussion. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21131) Fix batch gradient bug in SVDPlusPlus
[ https://issues.apache.org/jira/browse/SPARK-21131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21131: Assignee: (was: Apache Spark) > Fix batch gradient bug in SVDPlusPlus > - > > Key: SPARK-21131 > URL: https://issues.apache.org/jira/browse/SPARK-21131 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 2.1.1 >Reporter: ruiqiangfan > > I think the batch gradient computation in GraphX SVDPlusPlus is incorrect. The > changes of this pull request are as below: > * Use the mean of the batch gradient instead of the sum of the batch gradient. > * When calculating the bias of each rating during iteration, the bias > shouldn't be limited to (minVal, maxVal). > * The method "calculateIterationRMSE" is added to show the iteration effect > clearly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
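To illustrate the first bullet in the report, a toy NumPy sketch (unrelated to the actual GraphX code): with the sum of per-example gradients the effective step size grows with the batch size, whereas the mean keeps it independent of the batch size.
{code}
# Toy illustration of mean vs. sum of a batch gradient; not GraphX code.
import numpy as np

rng = np.random.RandomState(0)
grads = rng.randn(64, 3)     # 64 per-example gradients of a 3-dimensional parameter
step = 0.1

update_sum = step * grads.sum(axis=0)    # magnitude scales with the batch size
update_mean = step * grads.mean(axis=0)  # magnitude does not

print(np.linalg.norm(update_sum), np.linalg.norm(update_mean))
{code}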
[jira] [Commented] (SPARK-21131) Fix batch gradient bug in SVDPlusPlus
[ https://issues.apache.org/jira/browse/SPARK-21131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053032#comment-16053032 ] Apache Spark commented on SPARK-21131: -- User 'lxmly' has created a pull request for this issue: https://github.com/apache/spark/pull/18337 > Fix batch gradient bug in SVDPlusPlus > - > > Key: SPARK-21131 > URL: https://issues.apache.org/jira/browse/SPARK-21131 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 2.1.1 >Reporter: ruiqiangfan > > I think the batch gradient computation in GraphX SVDPlusPlus is incorrect. The > changes of this pull request are as below: > * Use the mean of the batch gradient instead of the sum of the batch gradient. > * When calculating the bias of each rating during iteration, the bias > shouldn't be limited to (minVal, maxVal). > * The method "calculateIterationRMSE" is added to show the iteration effect > clearly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21131) Fix batch gradient bug in SVDPlusPlus
[ https://issues.apache.org/jira/browse/SPARK-21131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21131: Assignee: Apache Spark > Fix batch gradient bug in SVDPlusPlus > - > > Key: SPARK-21131 > URL: https://issues.apache.org/jira/browse/SPARK-21131 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 2.1.1 >Reporter: ruiqiangfan >Assignee: Apache Spark > > I think the batch gradient computation in GraphX SVDPlusPlus is incorrect. The > changes of this pull request are as below: > * Use the mean of the batch gradient instead of the sum of the batch gradient. > * When calculating the bias of each rating during iteration, the bias > shouldn't be limited to (minVal, maxVal). > * The method "calculateIterationRMSE" is added to show the iteration effect > clearly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored
[ https://issues.apache.org/jira/browse/SPARK-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053054#comment-16053054 ] Apache Spark commented on SPARK-21132: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/18340 > DISTINCT modifier of function arguments should not be silently ignored > -- > > Key: SPARK-21132 > URL: https://issues.apache.org/jira/browse/SPARK-21132 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > > DISTINCT modifier of function arguments should not be silently ignored when > it is not being supported. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored
[ https://issues.apache.org/jira/browse/SPARK-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21132: Assignee: Xiao Li (was: Apache Spark) > DISTINCT modifier of function arguments should not be silently ignored > -- > > Key: SPARK-21132 > URL: https://issues.apache.org/jira/browse/SPARK-21132 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > > DISTINCT modifier of function arguments should not be silently ignored when > it is not being supported. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored
[ https://issues.apache.org/jira/browse/SPARK-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21132: Assignee: Apache Spark (was: Xiao Li) > DISTINCT modifier of function arguments should not be silently ignored > -- > > Key: SPARK-21132 > URL: https://issues.apache.org/jira/browse/SPARK-21132 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Xiao Li >Assignee: Apache Spark > > DISTINCT modifier of function arguments should not be silently ignored when > it is not being supported. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053093#comment-16053093 ] Apache Spark commented on SPARK-21133: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/18343 > HighlyCompressedMapStatus#writeExternal throws NPE > -- > > Key: SPARK-21133 > URL: https://issues.apache.org/jira/browse/SPARK-21133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Yuming Wang > > Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for > simple: > {code:sql} > spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " > set spark.sql.shuffle.partitions=2001; > drop table if exists spark_hcms_npe; > create table spark_hcms_npe as select id, count(*) from big_table group by > id; > " > {code} > Error logs: > {noformat} > 17/06/18 15:00:27 ERROR Utils: Exception encountered > java.lang.NullPointerException > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecu
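The reproduction hinges on the partition count: above 2000 shuffle partitions Spark switches map output statuses to {{HighlyCompressedMapStatus}}, which is the class throwing the NPE above. A minimal sketch of a job with the same shape (table name and data sizes are stand-ins for the reporter's {{big_table}}, not taken from the ticket):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hcms-npe-repro")
  .master("local[2]")
  .getOrCreate()

// 2001 > 2000, so HighlyCompressedMapStatus is used for the shuffle map output.
spark.conf.set("spark.sql.shuffle.partitions", "2001")

spark.range(0L, 10000000L)      // stand-in for a large input table
  .groupBy("id")
  .count()
  .write.mode("overwrite")
  .saveAsTable("spark_hcms_npe")
{code}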
[jira] [Assigned] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21133: Assignee: Apache Spark > HighlyCompressedMapStatus#writeExternal throws NPE > -- > > Key: SPARK-21133 > URL: https://issues.apache.org/jira/browse/SPARK-21133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark > > Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for > simple: > {code:sql} > spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " > set spark.sql.shuffle.partitions=2001; > drop table if exists spark_hcms_npe; > create table spark_hcms_npe as select id, count(*) from big_table group by > id; > " > {code} > Error logs: > {noformat} > 17/06/18 15:00:27 ERROR Utils: Exception encountered > java.lang.NullPointerException > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at 
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java
[jira] [Assigned] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21133: Assignee: (was: Apache Spark) > HighlyCompressedMapStatus#writeExternal throws NPE > -- > > Key: SPARK-21133 > URL: https://issues.apache.org/jira/browse/SPARK-21133 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Yuming Wang > > Reproduce, set {{set spark.sql.shuffle.partitions>2000}} with shuffle, for > simple: > {code:sql} > spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7 -e " > set spark.sql.shuffle.partitions=2001; > drop table if exists spark_hcms_npe; > create table spark_hcms_npe as select id, count(*) from big_table group by > id; > " > {code} > Error logs: > {noformat} > 17/06/18 15:00:27 ERROR Utils: Exception encountered > java.lang.NullPointerException > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException > java.io.IOException: java.lang.NullPointerException > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310) > at > org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at > java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > 
org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at > org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337) > at > org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619) > at > org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562) > at > org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.l
[jira] [Assigned] (SPARK-21123) Options for file stream source are in a wrong table
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21123: Assignee: Apache Spark > Options for file stream source are in a wrong table > --- > > Key: SPARK-21123 > URL: https://issues.apache.org/jira/browse/SPARK-21123 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 2.1.1, 2.2.0 >Reporter: Shixiong Zhu >Assignee: Apache Spark >Priority: Minor > Labels: starter > Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png > > > Right now options for file stream source are documented with file sink. We > should create a table for source options and fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21123) Options for file stream source are in a wrong table
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053125#comment-16053125 ] Apache Spark commented on SPARK-21123: -- User 'assafmendelson' has created a pull request for this issue: https://github.com/apache/spark/pull/18342 > Options for file stream source are in a wrong table > --- > > Key: SPARK-21123 > URL: https://issues.apache.org/jira/browse/SPARK-21123 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 2.1.1, 2.2.0 >Reporter: Shixiong Zhu >Priority: Minor > Labels: starter > Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png > > > Right now options for file stream source are documented with file sink. We > should create a table for source options and fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21123) Options for file stream source are in a wrong table
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21123: Assignee: (was: Apache Spark) > Options for file stream source are in a wrong table > --- > > Key: SPARK-21123 > URL: https://issues.apache.org/jira/browse/SPARK-21123 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming >Affects Versions: 2.1.1, 2.2.0 >Reporter: Shixiong Zhu >Priority: Minor > Labels: starter > Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png > > > Right now options for file stream source are documented with file sink. We > should create a table for source options and fix it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
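For reference, the options in question belong on the streaming source side ({{readStream}}), not on the file sink; a hedged sketch with option names as listed in the Structured Streaming guide, a placeholder input path, and a SparkSession named {{spark}}:
{code:scala}
val lines = spark.readStream
  .format("text")
  .option("maxFilesPerTrigger", "1")  // cap on new files consumed per micro-batch
  .option("latestFirst", "true")      // process the newest files first
  .load("/path/to/input")             // placeholder path
{code}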
[jira] [Assigned] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression
[ https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21134: Assignee: Apache Spark > Codegen-only expressions should not be collapsed with upper CodegenFallback > expression > -- > > Key: SPARK-21134 > URL: https://issues.apache.org/jira/browse/SPARK-21134 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark > > The rule {{CollapseProject}} in the optimizer collapses lower and upper > expressions if they have common expressions. > We have seen a use case where a codegen-only expression was collapsed > with an upper {{CodegenFallback}} expression. Because all child > expressions under {{CodegenFallback}} are evaluated through the non-codegen path > {{eval}}, this causes an exception at runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression
[ https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053404#comment-16053404 ] Apache Spark commented on SPARK-21134: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/18346 > Codegen-only expressions should not be collapsed with upper CodegenFallback > expression > -- > > Key: SPARK-21134 > URL: https://issues.apache.org/jira/browse/SPARK-21134 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Liang-Chi Hsieh > > The rule {{CollapseProject}} in the optimizer collapses lower and upper > expressions if they have common expressions. > We have seen a use case where a codegen-only expression was collapsed > with an upper {{CodegenFallback}} expression. Because all child > expressions under {{CodegenFallback}} are evaluated through the non-codegen path > {{eval}}, this causes an exception at runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression
[ https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21134: Assignee: (was: Apache Spark) > Codegen-only expressions should not be collapsed with upper CodegenFallback > expression > -- > > Key: SPARK-21134 > URL: https://issues.apache.org/jira/browse/SPARK-21134 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 >Reporter: Liang-Chi Hsieh > > The rule {{CollapseProject}} in the optimizer collapses lower and upper > expressions if they have common expressions. > We have seen a use case where a codegen-only expression was collapsed > with an upper {{CodegenFallback}} expression. Because all child > expressions under {{CodegenFallback}} are evaluated through the non-codegen path > {{eval}}, this causes an exception at runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
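A hedged sketch of what {{CollapseProject}} normally does on two stacked projections (this is not a reproduction of the bug, just the rule's usual behaviour), assuming a SparkSession named {{spark}}:
{code:scala}
import spark.implicits._

val stacked = spark.range(5).toDF("id")
  .select(($"id" + 1).as("x"))
  .select(($"x" * 2).as("y"))

// The optimized plan should contain a single Project computing (id + 1) * 2,
// i.e. the two projections above collapsed into one.
stacked.explain(true)
{code}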
[jira] [Assigned] (SPARK-20599) ConsoleSink should work with write (batch)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20599: Assignee: (was: Apache Spark) > ConsoleSink should work with write (batch) > -- > > Key: SPARK-20599 > URL: https://issues.apache.org/jira/browse/SPARK-20599 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Minor > Labels: starter > > I think the following should just work. > {code} > spark. > read. // <-- it's a batch query not streaming query if that matters > format("kafka"). > option("subscribe", "topic1"). > option("kafka.bootstrap.servers", "localhost:9092"). > load. > write. > format("console"). // <-- that's not supported currently > save > {code} > The above combination of {{kafka}} source and {{console}} sink leads to the > following exception: > {code} > java.lang.RuntimeException: > org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow > create table as select. > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) > ... 48 elided > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20599) ConsoleSink should work with write (batch)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20599: Assignee: Apache Spark > ConsoleSink should work with write (batch) > -- > > Key: SPARK-20599 > URL: https://issues.apache.org/jira/browse/SPARK-20599 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Assignee: Apache Spark >Priority: Minor > Labels: starter > > I think the following should just work. > {code} > spark. > read. // <-- it's a batch query not streaming query if that matters > format("kafka"). > option("subscribe", "topic1"). > option("kafka.bootstrap.servers", "localhost:9092"). > load. > write. > format("console"). // <-- that's not supported currently > save > {code} > The above combination of {{kafka}} source and {{console}} sink leads to the > following exception: > {code} > java.lang.RuntimeException: > org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow > create table as select. > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) > ... 48 elided > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20599) ConsoleSink should work with write (batch)
[ https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053413#comment-16053413 ] Apache Spark commented on SPARK-20599: -- User 'lubozhan' has created a pull request for this issue: https://github.com/apache/spark/pull/18347 > ConsoleSink should work with write (batch) > -- > > Key: SPARK-20599 > URL: https://issues.apache.org/jira/browse/SPARK-20599 > Project: Spark > Issue Type: Improvement > Components: SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Minor > Labels: starter > > I think the following should just work. > {code} > spark. > read. // <-- it's a batch query not streaming query if that matters > format("kafka"). > option("subscribe", "topic1"). > option("kafka.bootstrap.servers", "localhost:9092"). > load. > write. > format("console"). // <-- that's not supported currently > save > {code} > The above combination of {{kafka}} source and {{console}} sink leads to the > following exception: > {code} > java.lang.RuntimeException: > org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow > create table as select. > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) > ... 48 elided > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
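Until the {{console}} format supports batch writes, a hedged workaround for the batch case shown above is simply {{show()}}; this sketch assumes the Kafka source package is on the classpath and a broker is reachable at the address below:
{code:scala}
spark.read
  .format("kafka")
  .option("subscribe", "topic1")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .show(truncate = false)  // prints to the console much like the console sink would
{code}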
[jira] [Commented] (SPARK-21120) Increasing the master's metrics is conducive to Spark cluster management system monitoring.
[ https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053424#comment-16053424 ] Apache Spark commented on SPARK-21120: -- User 'guoxiaolongzte' has created a pull request for this issue: https://github.com/apache/spark/pull/18348 > Increasing the master's metrics is conducive to Spark cluster management > system monitoring. > -- > > Key: SPARK-21120 > URL: https://issues.apache.org/jira/browse/SPARK-21120 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: guoxiaolongzte >Priority: Minor > Attachments: 1.png > > > The master currently exposes very few metrics, which cannot meet the needs > of large-scale Spark cluster management systems. So I am adding the relevant > metrics as completely as possible. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20927: Assignee: Apache Spark > Add cache operator to Unsupported Operations in Structured Streaming > - > > Key: SPARK-20927 > URL: https://issues.apache.org/jira/browse/SPARK-20927 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Assignee: Apache Spark >Priority: Trivial > Labels: starter > > Just [found > out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries] > that {{cache}} is not allowed on streaming datasets. > {{cache}} on streaming datasets leads to the following exception: > {code} > scala> spark.readStream.text("files").cache > org.apache.spark.sql.AnalysisException: Queries with streaming sources must > be executed with writeStream.start();; > FileSource[files] > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104) > at > org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68) > at > org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92) > at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603) > at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613) > ... 48 elided > {code} > It should be included in Structured Streaming's [Unsupported > Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20927: Assignee: (was: Apache Spark) > Add cache operator to Unsupported Operations in Structured Streaming > - > > Key: SPARK-20927 > URL: https://issues.apache.org/jira/browse/SPARK-20927 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Trivial > Labels: starter > > Just [found > out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries] > that {{cache}} is not allowed on streaming datasets. > {{cache}} on streaming datasets leads to the following exception: > {code} > scala> spark.readStream.text("files").cache > org.apache.spark.sql.AnalysisException: Queries with streaming sources must > be executed with writeStream.start();; > FileSource[files] > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104) > at > org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68) > at > org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92) > at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603) > at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613) > ... 48 elided > {code} > It should be included in Structured Streaming's [Unsupported > Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053468#comment-16053468 ] Apache Spark commented on SPARK-20927: -- User 'ZiyueHuang' has created a pull request for this issue: https://github.com/apache/spark/pull/18349 > Add cache operator to Unsupported Operations in Structured Streaming > - > > Key: SPARK-20927 > URL: https://issues.apache.org/jira/browse/SPARK-20927 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Trivial > Labels: starter > > Just [found > out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries] > that {{cache}} is not allowed on streaming datasets. > {{cache}} on streaming datasets leads to the following exception: > {code} > scala> spark.readStream.text("files").cache > org.apache.spark.sql.AnalysisException: Queries with streaming sources must > be executed with writeStream.start();; > FileSource[files] > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) > at > org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34) > at > org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104) > at > org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68) > at > org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92) > at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603) > at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613) > ... 48 elided > {code} > It should be included in Structured Streaming's [Unsupported > Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations]. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
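A small hedged sketch of how calling code can detect the streaming case before attempting {{cache}} (assumes a SparkSession named {{spark}} and an existing {{files}} directory):
{code:scala}
val ds = spark.readStream.text("files")

// Dataset.isStreaming distinguishes streaming Datasets, where cache() and the
// other unsupported operations throw AnalysisException, from batch ones.
if (ds.isStreaming) println("cache() is unsupported on streaming Datasets")
else ds.cache()
{code}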
[jira] [Commented] (SPARK-21135) On the history server page, the duration of incomplete applications should be hidden instead of showing up as 0
[ https://issues.apache.org/jira/browse/SPARK-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053665#comment-16053665 ] Apache Spark commented on SPARK-21135: -- User 'fjh100456' has created a pull request for this issue: https://github.com/apache/spark/pull/18351 > On the history server page, the duration of incomplete applications should be > hidden instead of showing up as 0 > --- > > Key: SPARK-21135 > URL: https://issues.apache.org/jira/browse/SPARK-21135 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.2.1 >Reporter: Jinhua Fu >Priority: Minor > > On the history server page, the duration of incomplete applications should be > hidden instead of showing up as 0. > In addition, an application that aborts abnormally (such as one killed in the > background or whose driver goes down) will always be treated as an > incomplete application, and I'm not sure if this is a problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21135) On the history server page, the duration of incomplete applications should be hidden instead of showing up as 0
[ https://issues.apache.org/jira/browse/SPARK-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21135: Assignee: Apache Spark > On the history server page, the duration of incomplete applications should be > hidden instead of showing up as 0 > --- > > Key: SPARK-21135 > URL: https://issues.apache.org/jira/browse/SPARK-21135 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.2.1 >Reporter: Jinhua Fu >Assignee: Apache Spark >Priority: Minor > > On the history server page, the duration of incomplete applications should be > hidden instead of showing up as 0. > In addition, an application that aborts abnormally (such as one killed in the > background or whose driver goes down) will always be treated as an > incomplete application, and I'm not sure if this is a problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21135) On the history server page, the duration of incomplete applications should be hidden instead of showing up as 0
[ https://issues.apache.org/jira/browse/SPARK-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21135: Assignee: (was: Apache Spark) > On the history server page, the duration of incomplete applications should be > hidden instead of showing up as 0 > --- > > Key: SPARK-21135 > URL: https://issues.apache.org/jira/browse/SPARK-21135 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.2.1 >Reporter: Jinhua Fu >Priority: Minor > > On the history server page, the duration of incomplete applications should be > hidden instead of showing up as 0. > In addition, an application that aborts abnormally (such as one killed in the > background or whose driver goes down) will always be treated as an > incomplete application, and I'm not sure if this is a problem. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different
[ https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053817#comment-16053817 ] Apache Spark commented on SPARK-21138: -- User 'sharkdtu' has created a pull request for this issue: https://github.com/apache/spark/pull/18352 > Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and > "spark.hadoop.fs.defaultFS" are different > - > > Key: SPARK-21138 > URL: https://issues.apache.org/jira/browse/SPARK-21138 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.1.1 >Reporter: sharkd tu > > When I set different clusters for "spark.hadoop.fs.defaultFS" and > "spark.yarn.stagingDir" as follows: > {code:java} > spark.hadoop.fs.defaultFS hdfs://tl-nn-tdw.tencent-distribute.com:54310 > spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark > {code} > I got following logs: > {code:java} > 17/06/19 17:55:48 INFO SparkContext: Successfully stopped SparkContext > 17/06/19 17:55:48 INFO ApplicationMaster: Unregistering ApplicationMaster > with SUCCEEDED > 17/06/19 17:55:48 INFO AMRMClientImpl: Waiting for application to be > successfully unregistered. > 17/06/19 17:55:48 INFO ApplicationMaster: Deleting staging directory > hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618 > 17/06/19 17:55:48 ERROR Utils: Uncaught exception in thread Thread-2 > java.lang.IllegalArgumentException: Wrong FS: > hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, > expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310 > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getPathName(Cdh3DistributedFileSystem.java:197) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.access$000(Cdh3DistributedFileSystem.java:81) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:644) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:640) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.delete(Cdh3DistributedFileSystem.java:640) > at > org.apache.hadoop.fs.FilterFileSystem.delete(FilterFileSystem.java:216) > at > org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:545) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:233) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) > at scala.util.Try$.apply(Try.scala:192) > at > 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
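A minimal sketch of the distinction involved, using the standard Hadoop API rather than the actual Spark patch: resolving the {{FileSystem}} from the staging path itself, instead of from {{fs.defaultFS}}, targets the cluster the path actually lives on.
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new Configuration()
val stagingDir = new Path("hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_0001")  // placeholder app id

val defaultFs = FileSystem.get(hadoopConf)           // bound to fs.defaultFS, the "Wrong FS" in the log above
val stagingFs = stagingDir.getFileSystem(hadoopConf) // bound to the cluster of the path itself

stagingFs.delete(stagingDir, true)                   // recursive delete against the correct cluster
{code}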
[jira] [Assigned] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different
[ https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21138: Assignee: Apache Spark > Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and > "spark.hadoop.fs.defaultFS" are different > - > > Key: SPARK-21138 > URL: https://issues.apache.org/jira/browse/SPARK-21138 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.1.1 >Reporter: sharkd tu >Assignee: Apache Spark > > When I set different clusters for "spark.hadoop.fs.defaultFS" and > "spark.yarn.stagingDir" as follows: > {code:java} > spark.hadoop.fs.defaultFS hdfs://tl-nn-tdw.tencent-distribute.com:54310 > spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark > {code} > I got following logs: > {code:java} > 17/06/19 17:55:48 INFO SparkContext: Successfully stopped SparkContext > 17/06/19 17:55:48 INFO ApplicationMaster: Unregistering ApplicationMaster > with SUCCEEDED > 17/06/19 17:55:48 INFO AMRMClientImpl: Waiting for application to be > successfully unregistered. > 17/06/19 17:55:48 INFO ApplicationMaster: Deleting staging directory > hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618 > 17/06/19 17:55:48 ERROR Utils: Uncaught exception in thread Thread-2 > java.lang.IllegalArgumentException: Wrong FS: > hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, > expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310 > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getPathName(Cdh3DistributedFileSystem.java:197) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.access$000(Cdh3DistributedFileSystem.java:81) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:644) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:640) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.delete(Cdh3DistributedFileSystem.java:640) > at > org.apache.hadoop.fs.FilterFileSystem.delete(FilterFileSystem.java:216) > at > org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:545) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:233) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) > at scala.util.Try$.apply(Try.scala:192) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different
[ https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21138: Assignee: (was: Apache Spark) > Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and > "spark.hadoop.fs.defaultFS" are different > - > > Key: SPARK-21138 > URL: https://issues.apache.org/jira/browse/SPARK-21138 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.1.1 >Reporter: sharkd tu > > When I set different clusters for "spark.hadoop.fs.defaultFS" and > "spark.yarn.stagingDir" as follows: > {code:java} > spark.hadoop.fs.defaultFS hdfs://tl-nn-tdw.tencent-distribute.com:54310 > spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark > {code} > I got following logs: > {code:java} > 17/06/19 17:55:48 INFO SparkContext: Successfully stopped SparkContext > 17/06/19 17:55:48 INFO ApplicationMaster: Unregistering ApplicationMaster > with SUCCEEDED > 17/06/19 17:55:48 INFO AMRMClientImpl: Waiting for application to be > successfully unregistered. > 17/06/19 17:55:48 INFO ApplicationMaster: Deleting staging directory > hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618 > 17/06/19 17:55:48 ERROR Utils: Uncaught exception in thread Thread-2 > java.lang.IllegalArgumentException: Wrong FS: > hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, > expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310 > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getPathName(Cdh3DistributedFileSystem.java:197) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.access$000(Cdh3DistributedFileSystem.java:81) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:644) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:640) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.delete(Cdh3DistributedFileSystem.java:640) > at > org.apache.hadoop.fs.FilterFileSystem.delete(FilterFileSystem.java:216) > at > org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:545) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:233) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188) > at scala.util.Try$.apply(Try.scala:192) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka
[ https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21142: Assignee: (was: Apache Spark) > spark-streaming-kafka-0-10 has too fat dependency on kafka > -- > > Key: SPARK-21142 > URL: https://issues.apache.org/jira/browse/SPARK-21142 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.1.1 >Reporter: Tim Van Wassenhove >Priority: Minor > > Currently spark-streaming-kafka-0-10_2.11 has a dependency on kafka, where it > should only need a dependency on kafka-clients. > The only reason there is a dependency on kafka (full server) is due to > KafkaTestUtils class to run in memory tests against a kafka broker. This > class should be moved to src/test. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka
[ https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21142: Assignee: Apache Spark > spark-streaming-kafka-0-10 has too fat dependency on kafka > -- > > Key: SPARK-21142 > URL: https://issues.apache.org/jira/browse/SPARK-21142 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.1.1 >Reporter: Tim Van Wassenhove >Assignee: Apache Spark >Priority: Minor > > Currently spark-streaming-kafka-0-10_2.11 has a dependency on kafka, where it > should only need a dependency on kafka-clients. > The only reason there is a dependency on kafka (full server) is due to > KafkaTestUtils class to run in memory tests against a kafka broker. This > class should be moved to src/test. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
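A hedged {{build.sbt}} sketch of a downstream workaround until the module itself slims down (coordinates assume Scala 2.11 and Spark 2.1.1; whether anything on your classpath still needs the excluded broker jar is application-specific):
{code:scala}
libraryDependencies += ("org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1")
  .exclude("org.apache.kafka", "kafka_2.11")  // keep kafka-clients, drop the full broker jar
{code}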