[jira] [Commented] (SPARK-20980) Rename the option `wholeFile` to `multiLine` for JSON and CSV

2017-06-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050023#comment-16050023
 ] 

Apache Spark commented on SPARK-20980:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/18312

> Rename the option `wholeFile` to `multiLine` for JSON and CSV
> -
>
> Key: SPARK-20980
> URL: https://issues.apache.org/jira/browse/SPARK-20980
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.2.0
>
>
> The current option name `wholeFile` is misleading for CSV: it does not mean 
> one record per file, since a single file can contain multiple records. Thus, 
> we should rename it; the proposal is `multiLine`.
> To keep this consistent, we need to rename the same option for JSON and fix 
> that issue in another JIRA.
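> A minimal usage sketch of the renamed option (paths here are hypothetical; 
> the DataFrameReader API itself is unchanged):
> {code:scala}
> // After the rename, `multiLine` replaces `wholeFile` for both sources.
> val json = spark.read
>   .option("multiLine", "true")
>   .json("/path/to/records.json")
> val csv = spark.read
>   .option("multiLine", "true")
>   .option("header", "true")
>   .csv("/path/to/records.csv")
> {code}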






[jira] [Assigned] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala

2017-06-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21087:


Assignee: (was: Apache Spark)

> CrossValidator, TrainValidationSplit should preserve all models after 
> fitting: Scala
> 
>
> Key: SPARK-21087
> URL: https://issues.apache.org/jira/browse/SPARK-21087
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>
> See parent JIRA






[jira] [Assigned] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala

2017-06-14 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21087:


Assignee: Apache Spark

> CrossValidator, TrainValidationSplit should preserve all models after 
> fitting: Scala
> 
>
> Key: SPARK-21087
> URL: https://issues.apache.org/jira/browse/SPARK-21087
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> See parent JIRA






[jira] [Commented] (SPARK-21087) CrossValidator, TrainValidationSplit should preserve all models after fitting: Scala

2017-06-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050053#comment-16050053
 ] 

Apache Spark commented on SPARK-21087:
--

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/18313

> CrossValidator, TrainValidationSplit should preserve all models after 
> fitting: Scala
> 
>
> Key: SPARK-21087
> URL: https://issues.apache.org/jira/browse/SPARK-21087
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>
> See parent JIRA






[jira] [Commented] (SPARK-20200) Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050190#comment-16050190
 ] 

Apache Spark commented on SPARK-20200:
--

User 'jiangxb1987' has created a pull request for this issue:
https://github.com/apache/spark/pull/18314

> Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite
> -
>
> Key: SPARK-20200
> URL: https://issues.apache.org/jira/browse/SPARK-20200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Takuya Ueshin
>Priority: Minor
>  Labels: flaky-test
>
> This test failed recently here:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/2909/testReport/junit/org.apache.spark.rdd/LocalCheckpointSuite/missing_checkpoint_block_fails_with_informative_message/
> Dashboard
> https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message
> Error Message
> {code}
> Collect should have failed if local checkpoint block is removed...
> {code}
> {code}
> org.scalatest.exceptions.TestFailedException: Collect should have failed if 
> local checkpoint block is removed...
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1328)
>   at org.scalatest.FunSuite.fail(FunSuite.scala:1555)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply$mcV$sp(LocalCheckpointSuite.scala:173)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(LocalCheckpointSuite.scala:27)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite.runTest(LocalCheckpointSuite.scala:27)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
>   at org.scalat

[jira] [Commented] (SPARK-16251) LocalCheckpointSuite's "missing checkpoint block fails with informative message" is flaky.

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050189#comment-16050189
 ] 

Apache Spark commented on SPARK-16251:
--

User 'jiangxb1987' has created a pull request for this issue:
https://github.com/apache/spark/pull/18314

> LocalCheckpointSuite's "missing checkpoint block fails with informative 
> message" is flaky.
> --
>
> Key: SPARK-16251
> URL: https://issues.apache.org/jira/browse/SPARK-16251
> Project: Spark
>  Issue Type: Bug
>Reporter: Prashant Sharma
>Priority: Minor
>







[jira] [Assigned] (SPARK-20200) Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20200:


Assignee: Apache Spark

> Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite
> -
>
> Key: SPARK-20200
> URL: https://issues.apache.org/jira/browse/SPARK-20200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Minor
>  Labels: flaky-test
>
> This test failed recently here:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/2909/testReport/junit/org.apache.spark.rdd/LocalCheckpointSuite/missing_checkpoint_block_fails_with_informative_message/
> Dashboard
> https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message
> Error Message
> {code}
> Collect should have failed if local checkpoint block is removed...
> {code}
> {code}
> org.scalatest.exceptions.TestFailedException: Collect should have failed if 
> local checkpoint block is removed...
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1328)
>   at org.scalatest.FunSuite.fail(FunSuite.scala:1555)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply$mcV$sp(LocalCheckpointSuite.scala:173)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(LocalCheckpointSuite.scala:27)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite.runTest(LocalCheckpointSuite.scala:27)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
>   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>   at 
> org.scalatest.Suite$$ano

[jira] [Assigned] (SPARK-20200) Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20200:


Assignee: (was: Apache Spark)

> Flaky Test: org.apache.spark.rdd.LocalCheckpointSuite
> -
>
> Key: SPARK-20200
> URL: https://issues.apache.org/jira/browse/SPARK-20200
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Takuya Ueshin
>Priority: Minor
>  Labels: flaky-test
>
> This test failed recently here:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/2909/testReport/junit/org.apache.spark.rdd/LocalCheckpointSuite/missing_checkpoint_block_fails_with_informative_message/
> Dashboard
> https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.rdd.LocalCheckpointSuite&test_name=missing+checkpoint+block+fails+with+informative+message
> Error Message
> {code}
> Collect should have failed if local checkpoint block is removed...
> {code}
> {code}
> org.scalatest.exceptions.TestFailedException: Collect should have failed if 
> local checkpoint block is removed...
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:495)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at org.scalatest.Assertions$class.fail(Assertions.scala:1328)
>   at org.scalatest.FunSuite.fail(FunSuite.scala:1555)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply$mcV$sp(LocalCheckpointSuite.scala:173)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite$$anonfun$16.apply(LocalCheckpointSuite.scala:155)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(LocalCheckpointSuite.scala:27)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.rdd.LocalCheckpointSuite.runTest(LocalCheckpointSuite.scala:27)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
>   at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>   at 
> org.scalatest.Suite$$anonfun$runNestedSuites$1.ap

[jira] [Commented] (SPARK-21108) convert LinearSVC to aggregator framework

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050931#comment-16050931
 ] 

Apache Spark commented on SPARK-21108:
--

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/18315

> convert LinearSVC to aggregator framework
> -
>
> Key: SPARK-21108
> URL: https://issues.apache.org/jira/browse/SPARK-21108
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Priority: Minor
>







[jira] [Assigned] (SPARK-21108) convert LinearSVC to aggregator framework

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21108:


Assignee: (was: Apache Spark)

> convert LinearSVC to aggregator framework
> -
>
> Key: SPARK-21108
> URL: https://issues.apache.org/jira/browse/SPARK-21108
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Priority: Minor
>







[jira] [Assigned] (SPARK-21108) convert LinearSVC to aggregator framework

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21108:


Assignee: Apache Spark

> convert LinearSVC to aggregator framework
> -
>
> Key: SPARK-21108
> URL: https://issues.apache.org/jira/browse/SPARK-21108
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: yuhao yang
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-21111) Fix test failure in 2.2

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21111:


Assignee: Apache Spark  (was: Xiao Li)

> Fix test failure in 2.2 
> 
>
> Key: SPARK-21111
> URL: https://issues.apache.org/jira/browse/SPARK-21111
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Blocker
>
> Test failure:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/






[jira] [Assigned] (SPARK-21111) Fix test failure in 2.2

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21111:


Assignee: Xiao Li  (was: Apache Spark)

> Fix test failure in 2.2 
> 
>
> Key: SPARK-21111
> URL: https://issues.apache.org/jira/browse/SPARK-21111
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Blocker
>
> Test failure:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/






[jira] [Commented] (SPARK-21111) Fix test failure in 2.2

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051095#comment-16051095
 ] 

Apache Spark commented on SPARK-21111:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/18316

> Fix test failure in 2.2 
> 
>
> Key: SPARK-21111
> URL: https://issues.apache.org/jira/browse/SPARK-21111
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Blocker
>
> Test failure:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.2-test-sbt-hadoop-2.7/203/






[jira] [Commented] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051134#comment-16051134
 ] 

Apache Spark commented on SPARK-21113:
--

User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/18317

> Support for read ahead input stream to amortize disk IO cost in the Spill 
> reader
> 
>
> Key: SPARK-21113
> URL: https://issues.apache.org/jira/browse/SPARK-21113
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Sital Kedia
>Priority: Minor
>
> Profiling some of our big jobs, we see that around 30% of the time is spent 
> reading the spill files from disk. To amortize the disk IO cost, the idea is 
> to implement a read-ahead input stream that asynchronously reads ahead from 
> the underlying input stream once a specified amount of data has been read 
> from the current buffer. It does this by maintaining two buffers: an active 
> buffer and a read-ahead buffer. The active buffer holds the data returned 
> when a read() call is issued. The read-ahead buffer is filled asynchronously 
> from the underlying input stream, and once the active buffer is exhausted, 
> the two buffers are flipped so that reading can continue from the read-ahead 
> buffer without blocking on disk I/O.
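> A hedged sketch of the double-buffering idea (illustrative only, not the 
> actual implementation; the class and field names are made up):
> {code:scala}
> import java.io.InputStream
> import java.util.concurrent.{Callable, Executors, Future}
>
> // Drains the active buffer while a single background thread refills the
> // read-ahead buffer; the two buffers are flipped on exhaustion.
> class ReadAheadSketch(in: InputStream, bufferSize: Int) extends InputStream {
>   private val executor = Executors.newSingleThreadExecutor()
>   private var active = new Array[Byte](bufferSize)
>   private var readAhead = new Array[Byte](bufferSize)
>   private var activeLen = 0
>   private var pos = 0
>   private var pending: Future[Int] = startReadAhead()
>
>   private def startReadAhead(): Future[Int] = {
>     val buf = readAhead  // capture the buffer the background thread fills
>     executor.submit(new Callable[Int] {
>       override def call(): Int = in.read(buf, 0, buf.length)
>     })
>   }
>
>   override def read(): Int = {
>     if (pos >= activeLen) {
>       val n = pending.get()  // block only if the async fill is still running
>       if (n <= 0) return -1  // underlying stream exhausted
>       val tmp = active; active = readAhead; readAhead = tmp  // flip buffers
>       activeLen = n
>       pos = 0
>       pending = startReadAhead()  // immediately start refilling the spare
>     }
>     val b = active(pos) & 0xff
>     pos += 1
>     b
>   }
>
>   override def close(): Unit = { executor.shutdown(); in.close() }
> }
> {code}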






[jira] [Assigned] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21113:


Assignee: Apache Spark

> Support for read ahead input stream to amortize disk IO cost in the Spill 
> reader
> 
>
> Key: SPARK-21113
> URL: https://issues.apache.org/jira/browse/SPARK-21113
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Sital Kedia
>Assignee: Apache Spark
>Priority: Minor
>
> Profiling some of our big jobs, we see that around 30% of the time is spent 
> reading the spill files from disk. To amortize the disk IO cost, the idea is 
> to implement a read-ahead input stream that asynchronously reads ahead from 
> the underlying input stream once a specified amount of data has been read 
> from the current buffer. It does this by maintaining two buffers: an active 
> buffer and a read-ahead buffer. The active buffer holds the data returned 
> when a read() call is issued. The read-ahead buffer is filled asynchronously 
> from the underlying input stream, and once the active buffer is exhausted, 
> the two buffers are flipped so that reading can continue from the read-ahead 
> buffer without blocking on disk I/O.






[jira] [Assigned] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21113:


Assignee: (was: Apache Spark)

> Support for read ahead input stream to amortize disk IO cost in the Spill 
> reader
> 
>
> Key: SPARK-21113
> URL: https://issues.apache.org/jira/browse/SPARK-21113
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Sital Kedia
>Priority: Minor
>
> Profiling some of our big jobs, we see that around 30% of the time is spent 
> reading the spill files from disk. To amortize the disk IO cost, the idea is 
> to implement a read-ahead input stream that asynchronously reads ahead from 
> the underlying input stream once a specified amount of data has been read 
> from the current buffer. It does this by maintaining two buffers: an active 
> buffer and a read-ahead buffer. The active buffer holds the data returned 
> when a read() call is issued. The read-ahead buffer is filled asynchronously 
> from the underlying input stream, and once the active buffer is exhausted, 
> the two buffers are flipped so that reading can continue from the read-ahead 
> buffer without blocking on disk I/O.






[jira] [Assigned] (SPARK-21112) ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21112:


Assignee: Apache Spark  (was: Xiao Li)

> ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
> --
>
> Key: SPARK-21112
> URL: https://issues.apache.org/jira/browse/SPARK-21112
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> {{ALTER TABLE SET TBLPROPERTIES}} should not overwrite the COMMENT even if 
> the input properties do not include `COMMENT`.
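> A hypothetical reproduction (assuming the comment is tracked alongside the 
> table properties in the catalog, which is what makes it possible to clobber):
> {code:scala}
> spark.sql("CREATE TABLE t (a INT) COMMENT 'original comment'")
> spark.sql("ALTER TABLE t SET TBLPROPERTIES ('k' = 'v')")
> // Expected: the table comment is still 'original comment'.
> // Buggy behavior: the comment is wiped because the input properties
> // did not include COMMENT.
> spark.sql("DESCRIBE TABLE EXTENDED t").show(truncate = false)
> {code}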






[jira] [Assigned] (SPARK-21112) ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21112:


Assignee: Xiao Li  (was: Apache Spark)

> ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
> --
>
> Key: SPARK-21112
> URL: https://issues.apache.org/jira/browse/SPARK-21112
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> {{ALTER TABLE SET TBLPROPERTIES}} should not overwrite the COMMENT even if 
> the input properties do not include `COMMENT`.






[jira] [Commented] (SPARK-21112) ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051140#comment-16051140
 ] 

Apache Spark commented on SPARK-21112:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/18318

> ALTER TABLE SET TBLPROPERTIES should not overwrite COMMENT
> --
>
> Key: SPARK-21112
> URL: https://issues.apache.org/jira/browse/SPARK-21112
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> {{ALTER TABLE SET TBLPROPERTIES}} should not overwrite the COMMENT even if 
> the input properties do not include `COMMENT`.






[jira] [Assigned] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21114:


Assignee: Apache Spark  (was: Xiao Li)

> Test failure fix in Spark 2.1 due to name mismatch
> --
>
> Key: SPARK-21114
> URL: https://issues.apache.org/jira/browse/SPARK-21114
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/






[jira] [Commented] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051290#comment-16051290
 ] 

Apache Spark commented on SPARK-21114:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/18319

> Test failure fix in Spark 2.1 due to name mismatch
> --
>
> Key: SPARK-21114
> URL: https://issues.apache.org/jira/browse/SPARK-21114
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/






[jira] [Assigned] (SPARK-21114) Test failure fix in Spark 2.1 due to name mismatch

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21114:


Assignee: Xiao Li  (was: Apache Spark)

> Test failure fix in Spark 2.1 due to name mismatch
> --
>
> Key: SPARK-21114
> URL: https://issues.apache.org/jira/browse/SPARK-21114
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-2.1-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql/SQLQueryTestSuite/arithmetic_sql/






[jira] [Assigned] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21093:


Assignee: Apache Spark

> Multiple gapply execution occasionally failed in SparkR 
> 
>
> Key: SPARK-21093
> URL: https://issues.apache.org/jira/browse/SPARK-21093
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.1, 2.2.0
> Environment: CentOS 7.2.1511 / R 3.4.0, CentOS 7.2.1511 / R 3.3.3
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>
> On CentOS 7.2.1511 with R 3.4.0/3.3.0, multiple executions of {{gapply}} 
> occasionally fail as below:
> {code}
>  Welcome to
>   __
>/ __/__  ___ _/ /__
>   _\ \/ _ \/ _ `/ __/  '_/
>  /___/ .__/\_,_/_/ /_/\_\   version  2.3.0-SNAPSHOT
> /_/
>  SparkSession available as 'spark'.
> > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
> 17/06/14 18:21:01 WARN Utils: Truncated the string representation of a plan 
> since it was too large. This behavior can be adjusted by setting 
> 'spark.debug.maxToStringFields' in SparkEnv.conf.
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
> Error in handleErrors(returnStatus, conn) :
>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 
> in stage 14.0 failed 1 times, most recent failure: Lost task 98.0 in stage 
> 14.0 (TID 1305, localhost, executor driver): org.apache.spark.SparkException: 
> R computation failed with
> at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
> at 
> org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:432)
> at 
> org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:414)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.a
> ...
> *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated
> === Backtrace: =
> /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe699b3f597]
> /lib64/libc.so.6(+0x10c750)[0x7fe699b3d750]
> /lib64/libc.so.6(+0x10e507)[0x7fe699b3f507]
> /usr/lib64/R/modules//internet.so(+0x6015)[0x7fe689bb7015]
> /usr/lib64/R/modules//internet.so(+0xe81e)[0x7fe689bbf81e]
> /usr/lib64/R/lib/libR.so(+0xbd1b6)[0x7fe69c54a1b6]
> /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fe69c5ad2f4]
> /usr/lib64/R/lib/libR.so(+0x123f8e)[0x7fe69c5b0f8e]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x589)[0x7fe69c5ad529]
> /usr/lib64/R/lib/libR.so(+0x1254ce)[0x7fe69c5b24ce]
> /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x817)[0x7fe69c5ad7b7]
> /usr/lib64/R/lib/libR.so(+0x1256d1)[0x7fe69c5b26d1]
> /usr/lib64/R/lib/libR.so(+0x1552e9)[0x7fe69c5e22e9]
> /usr/lib64/R/lib/libR.so(+0x11062a)[0x7fe69c59d62a]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e]
> /usr/lib64/R/lib/libR.so(+0x11ee91)[0x7fe69c5abe91]
> /usr/lib64/R/lib/libR.so(Rf_

[jira] [Assigned] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21093:


Assignee: (was: Apache Spark)

> Multiple gapply execution occasionally failed in SparkR 
> 
>
> Key: SPARK-21093
> URL: https://issues.apache.org/jira/browse/SPARK-21093
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.1, 2.2.0
> Environment: CentOS 7.2.1511 / R 3.4.0, CentOS 7.2.1511 / R 3.3.3
>Reporter: Hyukjin Kwon
>
> On CentOS 7.2.1511 with R 3.4.0/3.3.0, multiple executions of {{gapply}} 
> occasionally fail as below:
> {code}
>  Welcome to
>   __
>/ __/__  ___ _/ /__
>   _\ \/ _ \/ _ `/ __/  '_/
>  /___/ .__/\_,_/_/ /_/\_\   version  2.3.0-SNAPSHOT
> /_/
>  SparkSession available as 'spark'.
> > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
> 17/06/14 18:21:01 WARN Utils: Truncated the string representation of a plan 
> since it was too large. This behavior can be adjusted by setting 
> 'spark.debug.maxToStringFields' in SparkEnv.conf.
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
> Error in handleErrors(returnStatus, conn) :
>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 
> in stage 14.0 failed 1 times, most recent failure: Lost task 98.0 in stage 
> 14.0 (TID 1305, localhost, executor driver): org.apache.spark.SparkException: 
> R computation failed with
> at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
> at 
> org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:432)
> at 
> org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:414)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.a
> ...
> *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated
> === Backtrace: =
> /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe699b3f597]
> /lib64/libc.so.6(+0x10c750)[0x7fe699b3d750]
> /lib64/libc.so.6(+0x10e507)[0x7fe699b3f507]
> /usr/lib64/R/modules//internet.so(+0x6015)[0x7fe689bb7015]
> /usr/lib64/R/modules//internet.so(+0xe81e)[0x7fe689bbf81e]
> /usr/lib64/R/lib/libR.so(+0xbd1b6)[0x7fe69c54a1b6]
> /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fe69c5ad2f4]
> /usr/lib64/R/lib/libR.so(+0x123f8e)[0x7fe69c5b0f8e]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x589)[0x7fe69c5ad529]
> /usr/lib64/R/lib/libR.so(+0x1254ce)[0x7fe69c5b24ce]
> /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x817)[0x7fe69c5ad7b7]
> /usr/lib64/R/lib/libR.so(+0x1256d1)[0x7fe69c5b26d1]
> /usr/lib64/R/lib/libR.so(+0x1552e9)[0x7fe69c5e22e9]
> /usr/lib64/R/lib/libR.so(+0x11062a)[0x7fe69c59d62a]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e]
> /usr/lib64/R/lib/libR.so(+0x11ee91)[0x7fe69c5abe91]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad13

[jira] [Commented] (SPARK-21093) Multiple gapply execution occasionally failed in SparkR

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051308#comment-16051308
 ] 

Apache Spark commented on SPARK-21093:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/18320

> Multiple gapply execution occasionally failed in SparkR 
> 
>
> Key: SPARK-21093
> URL: https://issues.apache.org/jira/browse/SPARK-21093
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.1, 2.2.0
> Environment: CentOS 7.2.1511 / R 3.4.0, CentOS 7.2.1511 / R 3.3.3
>Reporter: Hyukjin Kwon
>
> On CentOS 7.2.1511 with R 3.4.0/3.3.0, multiple executions of {{gapply}} 
> occasionally fail as below:
> {code}
>  Welcome to
>   __
>/ __/__  ___ _/ /__
>   _\ \/ _ \/ _ `/ __/  '_/
>  /___/ .__/\_,_/_/ /_/\_\   version  2.3.0-SNAPSHOT
> /_/
>  SparkSession available as 'spark'.
> > df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
> 17/06/14 18:21:01 WARN Utils: Truncated the string representation of a plan 
> since it was too large. This behavior can be adjusted by setting 
> 'spark.debug.maxToStringFields' in SparkEnv.conf.
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>   a b c   d
> 1 1 1 1 0.1
> > collect(gapply(df, "a", function(key, x) { x }, schema(df)))
> Error in handleErrors(returnStatus, conn) :
>   org.apache.spark.SparkException: Job aborted due to stage failure: Task 98 
> in stage 14.0 failed 1 times, most recent failure: Lost task 98.0 in stage 
> 14.0 (TID 1305, localhost, executor driver): org.apache.spark.SparkException: 
> R computation failed with
> at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
> at 
> org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:432)
> at 
> org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$13.apply(objects.scala:414)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.a
> ...
> *** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated
> === Backtrace: =
> /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe699b3f597]
> /lib64/libc.so.6(+0x10c750)[0x7fe699b3d750]
> /lib64/libc.so.6(+0x10e507)[0x7fe699b3f507]
> /usr/lib64/R/modules//internet.so(+0x6015)[0x7fe689bb7015]
> /usr/lib64/R/modules//internet.so(+0xe81e)[0x7fe689bbf81e]
> /usr/lib64/R/lib/libR.so(+0xbd1b6)[0x7fe69c54a1b6]
> /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x354)[0x7fe69c5ad2f4]
> /usr/lib64/R/lib/libR.so(+0x123f8e)[0x7fe69c5b0f8e]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x589)[0x7fe69c5ad529]
> /usr/lib64/R/lib/libR.so(+0x1254ce)[0x7fe69c5b24ce]
> /usr/lib64/R/lib/libR.so(+0x1104d0)[0x7fe69c59d4d0]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69c5ada7e]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x817)[0x7fe69c5ad7b7]
> /usr/lib64/R/lib/libR.so(+0x1256d1)[0x7fe69c5b26d1]
> /usr/lib64/R/lib/libR.so(+0x1552e9)[0x7fe69c5e22e9]
> /usr/lib64/R/lib/libR.so(+0x11062a)[0x7fe69c59d62a]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x1221af)[0x7fe69c5af1af]
> /usr/lib64/R/lib/libR.so(+0x119101)[0x7fe69c5a6101]
> /usr/lib64/R/lib/libR.so(Rf_eval+0x198)[0x7fe69c5ad138]
> /usr/lib64/R/lib/libR.so(+0x120a7e)[0x7fe69

[jira] [Commented] (SPARK-12552) Recovered driver's resource is not counted in the Master

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051322#comment-16051322
 ] 

Apache Spark commented on SPARK-12552:
--

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/18321

> Recovered driver's resource is not counted in the Master
> 
>
> Key: SPARK-12552
> URL: https://issues.apache.org/jira/browse/SPARK-12552
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 1.6.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
> Fix For: 2.2.1, 2.3.0
>
>
> Currently, in the Standalone Master HA implementation, if an application is 
> submitted in cluster mode, the driver's resources (CPU cores and memory) are 
> not counted again when recovering from failure, which leads to unexpected 
> behaviors such as more executors than expected and negative core and memory 
> usage in the web UI. Also, the recovered application's state is always 
> {{WAITING}}; we have to change it to {{RUNNING}} once fully recovered.
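> A hedged sketch of the bookkeeping fix (the names below are hypothetical, 
> not the actual Master code):
> {code:scala}
> // On recovery, a cluster-mode driver's resources must be charged back to
> // the worker hosting it; otherwise usage counters drift negative in the UI.
> case class WorkerUsageSketch(var coresUsed: Int = 0, var memoryUsedMb: Int = 0)
>
> def recoverDriver(worker: WorkerUsageSketch, driverCores: Int, driverMemMb: Int): Unit = {
>   worker.coresUsed += driverCores
>   worker.memoryUsedMb += driverMemMb
>   // ...and once recovery completes, flip the recovered application's
>   // state from WAITING to RUNNING.
> }
> {code}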






[jira] [Assigned] (SPARK-21115) If the cores left are less than coresPerExecutor, the cores left will not be allocated, so it should not check in every schedule

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21115:


Assignee: (was: Apache Spark)

> If the cores left are less than coresPerExecutor, the cores left will not 
> be allocated, so it should not check in every schedule
> -
>
> Key: SPARK-21115
> URL: https://issues.apache.org/jira/browse/SPARK-21115
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: eaton
>Priority: Minor
>
> If we start an app with the params --total-executor-cores=4 and 
> spark.executor.cores=3, the cores left is always 1, so the master will try 
> to allocate executors in the function 
> org.apache.spark.deploy.master.startExecutorsOnWorkers on every schedule 
> pass.
> Another question: would it be better to allocate another executor with 1 
> core for the cores left?
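> A worked illustration of the scenario (hypothetical app; in standalone mode 
> `spark.cores.max` is the config equivalent of --total-executor-cores):
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> // With 4 total cores and 3 cores per executor, one executor takes 3 cores.
> // The leftover single core can never satisfy another 3-core executor, yet
> // startExecutorsOnWorkers re-examines it on every scheduling pass.
> val spark = SparkSession.builder()
>   .master("spark://master:7077")
>   .config("spark.cores.max", "4")
>   .config("spark.executor.cores", "3")
>   .appName("cores-left-example")
>   .getOrCreate()
> {code}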






[jira] [Commented] (SPARK-21115) If the cores left are less than coresPerExecutor, the cores left will not be allocated, so it should not check in every schedule

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051427#comment-16051427
 ] 

Apache Spark commented on SPARK-21115:
--

User 'eatoncys' has created a pull request for this issue:
https://github.com/apache/spark/pull/18322

> If the cores left are less than coresPerExecutor, the cores left will not 
> be allocated, so it should not check in every schedule
> -
>
> Key: SPARK-21115
> URL: https://issues.apache.org/jira/browse/SPARK-21115
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: eaton
>Priority: Minor
>
> If we start an app with the params --total-executor-cores=4 and 
> spark.executor.cores=3, the cores left is always 1, so the master will try 
> to allocate executors in the function 
> org.apache.spark.deploy.master.startExecutorsOnWorkers on every schedule 
> pass.
> Another question: would it be better to allocate another executor with 1 
> core for the cores left?






[jira] [Assigned] (SPARK-21115) If the cores left are less than coresPerExecutor, the cores left will not be allocated, so it should not check in every schedule

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21115:


Assignee: Apache Spark

> If the cores left are less than coresPerExecutor, the cores left will not 
> be allocated, so it should not check in every schedule
> -
>
> Key: SPARK-21115
> URL: https://issues.apache.org/jira/browse/SPARK-21115
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: eaton
>Assignee: Apache Spark
>Priority: Minor
>
> If we start an app with the params --total-executor-cores=4 and 
> spark.executor.cores=3, the cores left is always 1, so the master will try 
> to allocate executors in the function 
> org.apache.spark.deploy.master.startExecutorsOnWorkers on every schedule 
> pass.
> Another question: would it be better to allocate another executor with 1 
> core for the cores left?






[jira] [Assigned] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21117:


Assignee: (was: Apache Spark)

> Built-in SQL Function Support - WIDTH_BUCKET
> 
>
> Key: SPARK-21117
> URL: https://issues.apache.org/jira/browse/SPARK-21117
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>
> For a given expression, the {{WIDTH_BUCKET}} function returns the bucket 
> number into which the value of this expression would fall after being 
> evaluated.
> {code:sql}
> WIDTH_BUCKET (expr , min_value , max_value , num_buckets)
> {code}
> Ref: 
> https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717
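> A hypothetical usage example once the function is supported (values chosen 
> purely for illustration):
> {code:scala}
> // Bucket width = (100.0 - 0.0) / 4 = 25.0, so 5.35 falls into bucket 1;
> // values below min map to bucket 0, values above max to num_buckets + 1.
> spark.sql("SELECT WIDTH_BUCKET(5.35, 0.0, 100.0, 4)").show()
> {code}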






[jira] [Assigned] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET

2017-06-15 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21117:


Assignee: Apache Spark

> Built-in SQL Function Support - WIDTH_BUCKET
> 
>
> Key: SPARK-21117
> URL: https://issues.apache.org/jira/browse/SPARK-21117
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>
> For a given expression, the {{WIDTH_BUCKET}} function returns the bucket 
> number into which the value of this expression would fall after being 
> evaluated.
> {code:sql}
> WIDTH_BUCKET (expr , min_value , max_value , num_buckets)
> {code}
> Ref: 
> https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21117) Built-in SQL Function Support - WIDTH_BUCKET

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051457#comment-16051457
 ] 

Apache Spark commented on SPARK-21117:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/18323

> Built-in SQL Function Support - WIDTH_BUCKET
> 
>
> Key: SPARK-21117
> URL: https://issues.apache.org/jira/browse/SPARK-21117
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>
> For a given expression, the {{WIDTH_BUCKET}} function returns the bucket 
> number into which the value of this expression would fall after being 
> evaluated.
> {code:sql}
> WIDTH_BUCKET (expr , min_value , max_value , num_buckets)
> {code}
> Ref: 
> https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21045) Spark executor blocked instead of throwing an exception because an exception occurred while the Python worker sent exception info to PythonRDD in Python 2.x

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051471#comment-16051471
 ] 

Apache Spark commented on SPARK-21045:
--

User 'dataknocker' has created a pull request for this issue:
https://github.com/apache/spark/pull/18261

> Spark executor blocked instead of throwing an exception because an exception
> occurred while the Python worker sent exception info to PythonRDD in Python 2.x
> -
>
> Key: SPARK-21045
> URL: https://issues.apache.org/jira/browse/SPARK-21045
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.1, 2.0.2, 2.1.1
> Environment: The problem occurs only in Python 2.x.
> Python 3.x is fine.
>Reporter: Joshuawangzj
>
> My PySpark program always blocks in our production YARN cluster. I took a
> jstack dump and found:
> {code}
> "Executor task launch worker for task 0" #60 daemon prio=5 os_prio=31 
> tid=0x7fb2f44e3000 nid=0xa003 runnable [0x000123b4a000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> - locked <0x0007acab1c98> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:190)
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
> at 
> org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> It blocks on a socket read. I checked the log on the blocked executor and
> found this error:
> {code}
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 178, in 
> main
> write_with_length(traceback.format_exc().encode("utf-8"), outfile)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 618: 
> ordinal not in range(128)
> {code}
> Finally I found the problem:
> {code:title=worker.py|borderStyle=solid}
> # line 178 in Spark 2.1.1
> except Exception:
> try:
> write_int(SpecialLengths.PYTHON_EXCEPTION_THROWN, outfile)
> write_with_length(traceback.format_exc().encode("utf-8"), outfile)
> except IOError:
> # JVM close the socket
> pass
> except Exception:
> # Write the error to stderr if it happened while serializing
> print("PySpark worker failed with exception:", file=sys.stderr)
> print(traceback.format_exc(), file=sys.stderr)
> {code}
> When write_with_length(traceback.format_exc().encode("utf-8"), outfile) raises
> an exception such as UnicodeDecodeError, the Python worker can't send the
> traceback, but once PythonRDD reads PYTHON_EXCEPTION_THROWN it expects to read
> the traceback length next. So it blocks.
> {code:title=PythonRDD.scala|borderStyle=solid}
> // line 190 in Spark 2.1.1
> case SpecialLengths.PYTHON_EXCEPTION_THROWN =>
>  // Signals that an exception has been thrown in python
>  val exLength = stream.readInt()  // It is possible to be blocked
> {code}
> {color:red}
> We can trigger the bug with a simple program:
> {color}
> {code:title=test.py|borderStyle=solid}
> spark = SparkSession.builder.master('local').getOrCreate()
> rdd = spark.sparkContext.parallelize(['中']).map(lambda x: 
> x.encode("utf8"))
> rdd.collect()
> {code}
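
For illustration, one possible defensive fix on the worker side (a hypothetical
sketch, not necessarily the approach taken in the PRs above): decode the
Python 2 byte string permissively before re-encoding, so the traceback bytes
can always be produced and the JVM never waits for a length that is never
written.

{code:python}
import traceback

def safe_format_exc_bytes():
    # In Python 2, traceback.format_exc() returns a byte str; calling
    # .encode("utf-8") on it implicitly decodes it as ASCII first, which is
    # exactly what raises UnicodeDecodeError for non-ASCII tracebacks.
    msg = traceback.format_exc()
    if isinstance(msg, bytes):
        msg = msg.decode("utf-8", "replace")  # never raises
    return msg.encode("utf-8")
{code}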



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21045) Spark executor blocked instead of throwing an exception because an exception occurred while the Python worker sent exception info to PythonRDD in Python 2.x

2017-06-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051485#comment-16051485
 ] 

Apache Spark commented on SPARK-21045:
--

User 'dataknocker' has created a pull request for this issue:
https://github.com/apache/spark/pull/18324

> Spark executor blocked instead of throwing an exception because an exception
> occurred while the Python worker sent exception info to PythonRDD in Python 2.x
> -
>
> Key: SPARK-21045
> URL: https://issues.apache.org/jira/browse/SPARK-21045
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.1, 2.0.2, 2.1.1
> Environment: The problem occurs only in Python 2.x.
> Python 3.x is fine.
>Reporter: Joshuawangzj
>
> My PySpark program always blocks in our production YARN cluster. I took a
> jstack dump and found:
> {code}
> "Executor task launch worker for task 0" #60 daemon prio=5 os_prio=31 
> tid=0x7fb2f44e3000 nid=0xa003 runnable [0x000123b4a000]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> - locked <0x0007acab1c98> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:190)
> at 
> org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
> at 
> org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> It blocks on a socket read. I checked the log on the blocked executor and
> found this error:
> {code}
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 178, in 
> main
> write_with_length(traceback.format_exc().encode("utf-8"), outfile)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 618: 
> ordinal not in range(128)
> {code}
> Finally I found the problem:
> {code:title=worker.py|borderStyle=solid}
> # line 178 in Spark 2.1.1
> except Exception:
> try:
> write_int(SpecialLengths.PYTHON_EXCEPTION_THROWN, outfile)
> write_with_length(traceback.format_exc().encode("utf-8"), outfile)
> except IOError:
> # JVM close the socket
> pass
> except Exception:
> # Write the error to stderr if it happened while serializing
> print("PySpark worker failed with exception:", file=sys.stderr)
> print(traceback.format_exc(), file=sys.stderr)
> {code}
> When write_with_length(traceback.format_exc().encode("utf-8"), outfile) raises
> an exception such as UnicodeDecodeError, the Python worker can't send the
> traceback, but once PythonRDD reads PYTHON_EXCEPTION_THROWN it expects to read
> the traceback length next. So it blocks.
> {code:title=PythonRDD.scala|borderStyle=solid}
> // line 190 in Spark 2.1.1
> case SpecialLengths.PYTHON_EXCEPTION_THROWN =>
>  // Signals that an exception has been thrown in python
>  val exLength = stream.readInt()  // It is possible to be blocked
> {code}
> {color:red}
> We can trigger the bug with a simple program:
> {color}
> {code:title=test.py|borderStyle=solid}
> spark = SparkSession.builder.master('local').getOrCreate()
> rdd = spark.sparkContext.parallelize(['中']).map(lambda x: 
> x.encode("utf8"))
> rdd.collect()
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21119) unset table properties should keep the table comment

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051490#comment-16051490
 ] 

Apache Spark commented on SPARK-21119:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/18325

> unset table properties should keep the table comment
> 
>
> Key: SPARK-21119
> URL: https://issues.apache.org/jira/browse/SPARK-21119
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21119) unset table properties should keep the table comment

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21119:


Assignee: Apache Spark  (was: Wenchen Fan)

> unset table properties should keep the table comment
> 
>
> Key: SPARK-21119
> URL: https://issues.apache.org/jira/browse/SPARK-21119
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21119) unset table properties should keep the table comment

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21119:


Assignee: Wenchen Fan  (was: Apache Spark)

> unset table properties should keep the table comment
> 
>
> Key: SPARK-21119
> URL: https://issues.apache.org/jira/browse/SPARK-21119
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21120) Increasing the master's metrics is conducive to monitoring by the Spark cluster management system.

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21120:


Assignee: (was: Apache Spark)

> Increasing the master's metrics is conducive to monitoring by the Spark
> cluster management system.
> --
>
> Key: SPARK-21120
> URL: https://issues.apache.org/jira/browse/SPARK-21120
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: guoxiaolongzte
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21120) Increasing the master's metrics is conducive to monitoring by the Spark cluster management system.

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051660#comment-16051660
 ] 

Apache Spark commented on SPARK-21120:
--

User 'guoxiaolongzte' has created a pull request for this issue:
https://github.com/apache/spark/pull/18326

> Increasing the master's metrics is conducive to monitoring by the Spark
> cluster management system.
> --
>
> Key: SPARK-21120
> URL: https://issues.apache.org/jira/browse/SPARK-21120
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: guoxiaolongzte
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21120) Increasing the master's metrics is conducive to monitoring by the Spark cluster management system.

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21120:


Assignee: Apache Spark

> Increasing the master's metrics is conducive to monitoring by the Spark
> cluster management system.
> --
>
> Key: SPARK-21120
> URL: https://issues.apache.org/jira/browse/SPARK-21120
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: guoxiaolongzte
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21047) Add test suites for complicated cases in ColumnarBatchSuite

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21047:


Assignee: Apache Spark

> Add test suites for complicated cases in ColumnarBatchSuite
> ---
>
> Key: SPARK-21047
> URL: https://issues.apache.org/jira/browse/SPARK-21047
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>
> The current {{ColumnarBatchSuite}} has only very simple test cases for arrays.
> This JIRA will add test suites for complicated cases such as nested arrays in
> {{ColumnVector}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21047) Add test suites for complicated cases in ColumnarBatchSuite

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051768#comment-16051768
 ] 

Apache Spark commented on SPARK-21047:
--

User 'jinxing64' has created a pull request for this issue:
https://github.com/apache/spark/pull/18327

> Add test suites for complicated cases in ColumnarBatchSuite
> ---
>
> Key: SPARK-21047
> URL: https://issues.apache.org/jira/browse/SPARK-21047
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>
> The current {{ColumnarBatchSuite}} has only very simple test cases for arrays.
> This JIRA will add test suites for complicated cases such as nested arrays in
> {{ColumnVector}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21047) Add test suites for complicated cases in ColumnarBatchSuite

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21047:


Assignee: (was: Apache Spark)

> Add test suites for complicated cases in ColumnarBatchSuite
> ---
>
> Key: SPARK-21047
> URL: https://issues.apache.org/jira/browse/SPARK-21047
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>
> The current {{ColumnarBatchSuite}} has only very simple test cases for arrays.
> This JIRA will add test suites for complicated cases such as nested arrays in
> {{ColumnVector}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21121) Set up StorageLevel for CACHE TABLE command

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051910#comment-16051910
 ] 

Apache Spark commented on SPARK-21121:
--

User 'dosoft' has created a pull request for this issue:
https://github.com/apache/spark/pull/18328

> Set up StorageLevel for CACHE TABLE command
> ---
>
> Key: SPARK-21121
> URL: https://issues.apache.org/jira/browse/SPARK-21121
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Oleg Danilov
>Priority: Minor
>
> Currently, "CACHE TABLE" always uses the default MEMORY_AND_DISK storage 
> level. We can add a possibility to specify it using variable, let say, 
> spark.sql.inMemoryColumnarStorage.level. It will give user a chance to fit 
> data into the memory with using MEMORY_AND_DISK_SER storage level.
> Going to submit PR for this change.
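
A sketch of how the proposed setting might be used ("spark" is a SparkSession,
"my_table" is illustrative, and spark.sql.inMemoryColumnarStorage.level is the
name proposed here, not an existing Spark option):

{code:python}
# Hypothetical usage of the proposed configuration variable.
spark.conf.set("spark.sql.inMemoryColumnarStorage.level", "MEMORY_AND_DISK_SER")
spark.sql("CACHE TABLE my_table")  # would now cache with MEMORY_AND_DISK_SER
{code}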



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21121) Set up StorageLevel for CACHE TABLE command

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21121:


Assignee: Apache Spark

> Set up StorageLevel for CACHE TABLE command
> ---
>
> Key: SPARK-21121
> URL: https://issues.apache.org/jira/browse/SPARK-21121
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Oleg Danilov
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, "CACHE TABLE" always uses the default MEMORY_AND_DISK storage 
> level. We can add a possibility to specify it using variable, let say, 
> spark.sql.inMemoryColumnarStorage.level. It will give user a chance to fit 
> data into the memory with using MEMORY_AND_DISK_SER storage level.
> Going to submit PR for this change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21121) Set up StorageLevel for CACHE TABLE command

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21121:


Assignee: (was: Apache Spark)

> Set up StorageLevel for CACHE TABLE command
> ---
>
> Key: SPARK-21121
> URL: https://issues.apache.org/jira/browse/SPARK-21121
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Oleg Danilov
>Priority: Minor
>
> Currently, "CACHE TABLE" always uses the default MEMORY_AND_DISK storage 
> level. We can add a possibility to specify it using variable, let say, 
> spark.sql.inMemoryColumnarStorage.level. It will give user a chance to fit 
> data into the memory with using MEMORY_AND_DISK_SER storage level.
> Going to submit PR for this change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19909) Batches will fail when the temporary checkpoint dir is on the local file system while the metadata dir is on HDFS

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051981#comment-16051981
 ] 

Apache Spark commented on SPARK-19909:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/18329

> Batches will fail when the temporary checkpoint dir is on the local file
> system while the metadata dir is on HDFS
> -
>
> Key: SPARK-19909
> URL: https://issues.apache.org/jira/browse/SPARK-19909
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Kousuke Saruta
>Priority: Minor
>
> When we try to run Structured Streaming in local mode but use HDFS for the
> storage, batches will fail with an error like the following.
> {code}
> val handle = stream.writeStream.format("console").start()
> 17/03/09 16:54:45 ERROR StreamMetadata: Error writing stream metadata 
> StreamMetadata(fc07a0b1-5423-483e-a59d-b2206a49491e) to 
> /private/var/folders/4y/tmspvv353y59p3w4lknrf7ccgn/T/temporary-79d4fe05-4301-4b6d-a902-dff642d0ddca/metadata
> org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=kou, access=WRITE, 
> inode="/private/var/folders/4y/tmspvv353y59p3w4lknrf7ccgn/T/temporary-79d4fe05-4301-4b6d-a902-dff642d0ddca/metadata":hdfs:supergroup:drwxr-xr-x
> {code}
> This is because the temporary checkpoint directory is created on the local
> file system, while the metadata, whose path is derived from the checkpoint
> directory, is created on HDFS.
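
Until this is fixed, one workaround is to pin the checkpoint location
explicitly so the checkpoint and its metadata land on the same filesystem
(sketch below; "stream" is an existing streaming DataFrame and the HDFS path
is illustrative):

{code:python}
query = (stream.writeStream
         .format("console")
         .option("checkpointLocation", "hdfs:///tmp/checkpoints/my-query")
         .start())
{code}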



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20749) Built-in SQL Function Support - all variants of LEN[GTH]

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052058#comment-16052058
 ] 

Apache Spark commented on SPARK-20749:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/18330

> Built-in SQL Function Support - all variants of LEN[GTH]
> 
>
> Key: SPARK-20749
> URL: https://issues.apache.org/jira/browse/SPARK-20749
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Kazuaki Ishizaki
>  Labels: starter
> Fix For: 2.3.0
>
>
> {noformat}
> LEN[GTH]()
> {noformat}
> The SQL 99 standard includes BIT_LENGTH(), CHAR_LENGTH(), and OCTET_LENGTH() 
> functions.
> We need to support all of them.
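
A sketch of the expected behavior once these functions are available (byte and
bit counts assume UTF-8 encoding; "spark" is a SparkSession):

{code:python}
spark.sql(
    "SELECT CHAR_LENGTH('España'), OCTET_LENGTH('España'), BIT_LENGTH('España')"
).show()
# CHAR_LENGTH  -> 6  characters
# OCTET_LENGTH -> 7  bytes ('ñ' takes two bytes in UTF-8)
# BIT_LENGTH   -> 56 bits (7 bytes * 8)
{code}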



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21124) Wrong user shown in UI when using kerberos

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21124:


Assignee: Apache Spark

> Wrong user shown in UI when using kerberos
> --
>
> Key: SPARK-21124
> URL: https://issues.apache.org/jira/browse/SPARK-21124
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Minor
>
> When submitting an app to a kerberos-secured cluster, the OS user and the 
> user running the application may differ. Although it may also happen in 
> cluster mode depending on the cluster manager's configuration, it's more 
> common in client mode.
> The UI should show enough information about the user running the application to
> correctly identify the actual user. The "app user" can be easily retrieved 
> via {{Utils.getCurrentUserName()}}, so it's mostly a matter of how to record 
> this information (for showing in replayed applications) and how to present it 
> in the UI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21124) Wrong user shown in UI when using kerberos

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21124:


Assignee: (was: Apache Spark)

> Wrong user shown in UI when using kerberos
> --
>
> Key: SPARK-21124
> URL: https://issues.apache.org/jira/browse/SPARK-21124
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> When submitting an app to a kerberos-secured cluster, the OS user and the 
> user running the application may differ. Although it may also happen in 
> cluster mode depending on the cluster manager's configuration, it's more 
> common in client mode.
> The UI should show enough information about the user running the application to
> correctly identify the actual user. The "app user" can be easily retrieved 
> via {{Utils.getCurrentUserName()}}, so it's mostly a matter of how to record 
> this information (for showing in replayed applications) and how to present it 
> in the UI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21124) Wrong user shown in UI when using kerberos

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052441#comment-16052441
 ] 

Apache Spark commented on SPARK-21124:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/18331

> Wrong user shown in UI when using kerberos
> --
>
> Key: SPARK-21124
> URL: https://issues.apache.org/jira/browse/SPARK-21124
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> When submitting an app to a kerberos-secured cluster, the OS user and the 
> user running the application may differ. Although it may also happen in 
> cluster mode depending on the cluster manager's configuration, it's more 
> common in client mode.
> The UI should show enough information about the user running the application to
> correctly identify the actual user. The "app user" can be easily retrieved 
> via {{Utils.getCurrentUserName()}}, so it's mostly a matter of how to record 
> this information (for showing in replayed applications) and how to present it 
> in the UI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21125) PySpark context missing function to set Job Description.

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052516#comment-16052516
 ] 

Apache Spark commented on SPARK-21125:
--

User 'sjarvie' has created a pull request for this issue:
https://github.com/apache/spark/pull/18332

> PySpark context missing function to set Job Description.
> 
>
> Key: SPARK-21125
> URL: https://issues.apache.org/jira/browse/SPARK-21125
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Shane Jarvie
>Priority: Trivial
>  Labels: beginner
> Fix For: 2.2.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The PySpark API is missing a convenient function, currently found in the
> Scala API, which sets the Job Description for display in the Spark UI.
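
For illustration, a minimal sketch of what the PySpark counterpart could look
like (hypothetical; the actual PR may differ), delegating to the JVM context
the same way other SparkContext setters do:

{code:python}
def setJobDescription(self, value):
    """Set a human-readable description of the current job, shown in the UI."""
    self._jsc.setJobDescription(value)
{code}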



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21125) PySpark context missing function to set Job Description.

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21125:


Assignee: Apache Spark

> PySpark context missing function to set Job Description.
> 
>
> Key: SPARK-21125
> URL: https://issues.apache.org/jira/browse/SPARK-21125
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Shane Jarvie
>Assignee: Apache Spark
>Priority: Trivial
>  Labels: beginner
> Fix For: 2.2.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The PySpark API is missing a convenient function, currently found in the
> Scala API, which sets the Job Description for display in the Spark UI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21125) PySpark context missing function to set Job Description.

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21125:


Assignee: (was: Apache Spark)

> PySpark context missing function to set Job Description.
> 
>
> Key: SPARK-21125
> URL: https://issues.apache.org/jira/browse/SPARK-21125
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Shane Jarvie
>Priority: Trivial
>  Labels: beginner
> Fix For: 2.2.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The PySpark API is missing a convenient function, currently found in the
> Scala API, which sets the Job Description for display in the Spark UI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21126) The configuration named "spark.core.connection.auth.wait.timeout" isn't used in Spark

2017-06-16 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052626#comment-16052626
 ] 

Apache Spark commented on SPARK-21126:
--

User 'liu-zhaokun' has created a pull request for this issue:
https://github.com/apache/spark/pull/18333

> The configuration named "spark.core.connection.auth.wait.timeout" isn't used
> in Spark
> -
>
> Key: SPARK-21126
> URL: https://issues.apache.org/jira/browse/SPARK-21126
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: liuzhaokun
>
> The configuration named "spark.core.connection.auth.wait.timeout" isn't used
> anywhere in Spark, so I think it should be removed from configuration.md.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21126) The configuration named "spark.core.connection.auth.wait.timeout" isn't used in Spark

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21126:


Assignee: Apache Spark

> The configuration named "spark.core.connection.auth.wait.timeout" isn't used
> in Spark
> -
>
> Key: SPARK-21126
> URL: https://issues.apache.org/jira/browse/SPARK-21126
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: liuzhaokun
>Assignee: Apache Spark
>
> The configuration named "spark.core.connection.auth.wait.timeout" isn't used
> anywhere in Spark, so I think it should be removed from configuration.md.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21126) The configuration named "spark.core.connection.auth.wait.timeout" isn't used in Spark

2017-06-16 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21126:


Assignee: (was: Apache Spark)

> The configuration named "spark.core.connection.auth.wait.timeout" isn't used
> in Spark
> -
>
> Key: SPARK-21126
> URL: https://issues.apache.org/jira/browse/SPARK-21126
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: liuzhaokun
>
> The configuration named "spark.core.connection.auth.wait.timeout" isn't used
> anywhere in Spark, so I think it should be removed from configuration.md.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21127) Update statistics after data changing commands

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21127:


Assignee: Apache Spark

> Update statistics after data changing commands
> --
>
> Key: SPARK-21127
> URL: https://issues.apache.org/jira/browse/SPARK-21127
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21127) Update statistics after data changing commands

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21127:


Assignee: (was: Apache Spark)

> Update statistics after data changing commands
> --
>
> Key: SPARK-21127
> URL: https://issues.apache.org/jira/browse/SPARK-21127
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21127) Update statistics after data changing commands

2017-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052751#comment-16052751
 ] 

Apache Spark commented on SPARK-21127:
--

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/18334

> Update statistics after data changing commands
> --
>
> Key: SPARK-21127
> URL: https://issues.apache.org/jira/browse/SPARK-21127
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21128:


Assignee: Apache Spark

> Running R tests multiple times failed due to pre-existing "spark-warehouse" /
> "metastore_db"
> ---
>
> Key: SPARK-21128
> URL: https://issues.apache.org/jira/browse/SPARK-21128
> Project: Spark
>  Issue Type: Test
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, running R tests multiple times fails due to pre-existing
> "spark-warehouse" / "metastore_db" directories, as below:
> {code}
> SparkSQL functions: Spark package found in SPARK_HOME: .../spark
> ...1234...
> {code}
> {code}
> Failed 
> -
> 1. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3384)
> length(list1) not equal to length(list2).
> 1/1 mismatches
> [1] 25 - 23 == 2
> 2. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3384)
> sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
> 10/25 mismatches
> x[16]: "metastore_db"
> y[16]: "pkg"
> x[17]: "pkg"
> y[17]: "R"
> x[18]: "R"
> y[18]: "README.md"
> x[19]: "README.md"
> y[19]: "run-tests.sh"
> x[20]: "run-tests.sh"
> y[20]: "SparkR_2.2.0.tar.gz"
> x[21]: "metastore_db"
> y[21]: "pkg"
> x[22]: "pkg"
> y[22]: "R"
> x[23]: "R"
> y[23]: "README.md"
> x[24]: "README.md"
> y[24]: "run-tests.sh"
> x[25]: "run-tests.sh"
> y[25]: "SparkR_2.2.0.tar.gz"
> 3. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3388)
> length(list1) not equal to length(list2).
> 1/1 mismatches
> [1] 25 - 23 == 2
> 4. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3388)
> sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
> 10/25 mismatches
> x[16]: "metastore_db"
> y[16]: "pkg"
> x[17]: "pkg"
> y[17]: "R"
> x[18]: "R"
> y[18]: "README.md"
> x[19]: "README.md"
> y[19]: "run-tests.sh"
> x[20]: "run-tests.sh"
> y[20]: "SparkR_2.2.0.tar.gz"
> x[21]: "metastore_db"
> y[21]: "pkg"
> x[22]: "pkg"
> y[22]: "R"
> x[23]: "R"
> y[23]: "README.md"
> x[24]: "README.md"
> y[24]: "run-tests.sh"
> x[25]: "run-tests.sh"
> y[25]: "SparkR_2.2.0.tar.gz"
> DONE 
> ===
> {code}
> It looks like we should remove both "spark-warehouse" and "metastore_db"
> _before_ listing files into {{sparkRFilesBefore}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"

2017-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052792#comment-16052792
 ] 

Apache Spark commented on SPARK-21128:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/18335

> Running R tests multiple times failed due to pre-existing "spark-warehouse" /
> "metastore_db"
> ---
>
> Key: SPARK-21128
> URL: https://issues.apache.org/jira/browse/SPARK-21128
> Project: Spark
>  Issue Type: Test
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Currently, running R tests multiple times fails due to pre-existing
> "spark-warehouse" / "metastore_db" directories, as below:
> {code}
> SparkSQL functions: Spark package found in SPARK_HOME: .../spark
> ...1234...
> {code}
> {code}
> Failed 
> -
> 1. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3384)
> length(list1) not equal to length(list2).
> 1/1 mismatches
> [1] 25 - 23 == 2
> 2. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3384)
> sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
> 10/25 mismatches
> x[16]: "metastore_db"
> y[16]: "pkg"
> x[17]: "pkg"
> y[17]: "R"
> x[18]: "R"
> y[18]: "README.md"
> x[19]: "README.md"
> y[19]: "run-tests.sh"
> x[20]: "run-tests.sh"
> y[20]: "SparkR_2.2.0.tar.gz"
> x[21]: "metastore_db"
> y[21]: "pkg"
> x[22]: "pkg"
> y[22]: "R"
> x[23]: "R"
> y[23]: "README.md"
> x[24]: "README.md"
> y[24]: "run-tests.sh"
> x[25]: "run-tests.sh"
> y[25]: "SparkR_2.2.0.tar.gz"
> 3. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3388)
> length(list1) not equal to length(list2).
> 1/1 mismatches
> [1] 25 - 23 == 2
> 4. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3388)
> sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
> 10/25 mismatches
> x[16]: "metastore_db"
> y[16]: "pkg"
> x[17]: "pkg"
> y[17]: "R"
> x[18]: "R"
> y[18]: "README.md"
> x[19]: "README.md"
> y[19]: "run-tests.sh"
> x[20]: "run-tests.sh"
> y[20]: "SparkR_2.2.0.tar.gz"
> x[21]: "metastore_db"
> y[21]: "pkg"
> x[22]: "pkg"
> y[22]: "R"
> x[23]: "R"
> y[23]: "README.md"
> x[24]: "README.md"
> y[24]: "run-tests.sh"
> x[25]: "run-tests.sh"
> y[25]: "SparkR_2.2.0.tar.gz"
> DONE 
> ===
> {code}
> It looks like we should remove both "spark-warehouse" and "metastore_db"
> _before_ listing files into {{sparkRFilesBefore}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21128) Running R tests multiple times failed due to pre-existing "spark-warehouse" / "metastore_db"

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21128:


Assignee: (was: Apache Spark)

> Running R tests multiple times failed due to pre-existing "spark-warehouse" /
> "metastore_db"
> ---
>
> Key: SPARK-21128
> URL: https://issues.apache.org/jira/browse/SPARK-21128
> Project: Spark
>  Issue Type: Test
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> Currently, running R tests multiple times fails due to pre-existing
> "spark-warehouse" / "metastore_db" directories, as below:
> {code}
> SparkSQL functions: Spark package found in SPARK_HOME: .../spark
> ...1234...
> {code}
> {code}
> Failed 
> -
> 1. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3384)
> length(list1) not equal to length(list2).
> 1/1 mismatches
> [1] 25 - 23 == 2
> 2. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3384)
> sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
> 10/25 mismatches
> x[16]: "metastore_db"
> y[16]: "pkg"
> x[17]: "pkg"
> y[17]: "R"
> x[18]: "R"
> y[18]: "README.md"
> x[19]: "README.md"
> y[19]: "run-tests.sh"
> x[20]: "run-tests.sh"
> y[20]: "SparkR_2.2.0.tar.gz"
> x[21]: "metastore_db"
> y[21]: "pkg"
> x[22]: "pkg"
> y[22]: "R"
> x[23]: "R"
> y[23]: "README.md"
> x[24]: "README.md"
> y[24]: "run-tests.sh"
> x[25]: "run-tests.sh"
> y[25]: "SparkR_2.2.0.tar.gz"
> 3. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3388)
> length(list1) not equal to length(list2).
> 1/1 mismatches
> [1] 25 - 23 == 2
> 4. Failure: No extra files are created in SPARK_HOME by starting session and 
> making calls (@test_sparkSQL.R#3388)
> sort(list1, na.last = TRUE) not equal to sort(list2, na.last = TRUE).
> 10/25 mismatches
> x[16]: "metastore_db"
> y[16]: "pkg"
> x[17]: "pkg"
> y[17]: "R"
> x[18]: "R"
> y[18]: "README.md"
> x[19]: "README.md"
> y[19]: "run-tests.sh"
> x[20]: "run-tests.sh"
> y[20]: "SparkR_2.2.0.tar.gz"
> x[21]: "metastore_db"
> y[21]: "pkg"
> x[22]: "pkg"
> y[22]: "R"
> x[23]: "R"
> y[23]: "README.md"
> x[24]: "README.md"
> y[24]: "run-tests.sh"
> x[25]: "run-tests.sh"
> y[25]: "SparkR_2.2.0.tar.gz"
> DONE 
> ===
> {code}
> It looks like we should remove both "spark-warehouse" and "metastore_db"
> _before_ listing files into {{sparkRFilesBefore}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21129) Arguments of SQL function call should not be named expressions

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21129:


Assignee: Xiao Li  (was: Apache Spark)

> Arguments of SQL function call should not be named expressions
> --
>
> Key: SPARK-21129
> URL: https://issues.apache.org/jira/browse/SPARK-21129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> Function arguments should not be named expressions: they can produce
> misleading error messages.
> {noformat}
> spark-sql> select count(distinct c1, distinct c2) from t1;
> {noformat}
> {noformat}
> Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; 
> line 1 pos 26;
> 'Project [unresolvedalias('count(c1#30, 'distinct), None)]
> +- SubqueryAlias t1
>+- CatalogRelation `default`.`t1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21129) Arguments of SQL function call should not be named expressions

2017-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052990#comment-16052990
 ] 

Apache Spark commented on SPARK-21129:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/18338

> Arguments of SQL function call should not be named expressions
> --
>
> Key: SPARK-21129
> URL: https://issues.apache.org/jira/browse/SPARK-21129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> Function arguments should not be named expressions: they can produce
> misleading error messages.
> {noformat}
> spark-sql> select count(distinct c1, distinct c2) from t1;
> {noformat}
> {noformat}
> Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; 
> line 1 pos 26;
> 'Project [unresolvedalias('count(c1#30, 'distinct), None)]
> +- SubqueryAlias t1
>+- CatalogRelation `default`.`t1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21129) Arguments of SQL function call should not be named expressions

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21129:


Assignee: Apache Spark  (was: Xiao Li)

> Arguments of SQL function call should not be named expressions
> --
>
> Key: SPARK-21129
> URL: https://issues.apache.org/jira/browse/SPARK-21129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.1, 2.2.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> Function arguments should not be named expressions: they can produce
> misleading error messages.
> {noformat}
> spark-sql> select count(distinct c1, distinct c2) from t1;
> {noformat}
> {noformat}
> Error in query: cannot resolve '`distinct`' given input columns: [c1, c2]; 
> line 1 pos 26;
> 'Project [unresolvedalias('count(c1#30, 'distinct), None)]
> +- SubqueryAlias t1
>+- CatalogRelation `default`.`t1`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#30, c2#31]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21094) Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21094:


Assignee: Apache Spark

> Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway
> 
>
> Key: SPARK-21094
> URL: https://issues.apache.org/jira/browse/SPARK-21094
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Peter Parente
>Assignee: Apache Spark
>
> The Popen call to launch the py4j gateway specifies no stdout and stderr 
> options, meaning logging from the JVM always goes to the parent process 
> terminal. 
> https://github.com/apache/spark/blob/v2.1.1/python/pyspark/java_gateway.py#L77
> It would be super handy if the launch_gateway function took an additional 
> dict parameter called popen_kwargs and passed it along to the Popen calls. 
> This API enhancement, for example, would allow Python applications to capture 
> all stdout and stderr coming from Spark and process it programmatically, 
> without resorting to reading from log files or other hijinks.
> Example use:
> {code}
> import pyspark
> import subprocess
> from pyspark.java_gateway import launch_gateway
> # Make the py4j JVM stdout and stderr available without buffering
> popen_kwargs = {
>   'stdout': subprocess.PIPE,
>   'stderr': subprocess.PIPE,
>   'bufsize': 0
> }
> # Launch the gateway with our custom settings
> gateway = launch_gateway(popen_kwargs=popen_kwargs)
> # Use the gateway we launched
> sc = pyspark.SparkContext(gateway=gateway)
> # This could be done in a thread or event loop or ...
> # Written briefly / poorly here only as a demo
> while True:
>   buf = gateway.proc.stdout.read()
>   print(buf.decode('utf-8'))
> {code}
> To get access to the stdout and stderr pipes, the "proc" instance created in 
> launch_gateway also needs to be exposed to the application. I'm thinking that 
> stashing it on the JavaGateway instance that the function already returns is 
> the cleanest from the client perspective, but means hanging an extra 
> attribute off the py4j.JavaGateway object. 
> I can submit a PR with this addition for further discussion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21094) Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21094:


Assignee: (was: Apache Spark)

> Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway
> 
>
> Key: SPARK-21094
> URL: https://issues.apache.org/jira/browse/SPARK-21094
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Peter Parente
>
> The Popen call to launch the py4j gateway specifies no stdout and stderr 
> options, meaning logging from the JVM always goes to the parent process 
> terminal. 
> https://github.com/apache/spark/blob/v2.1.1/python/pyspark/java_gateway.py#L77
> It would be super handy if the launch_gateway function took an additional 
> dict parameter called popen_kwargs and passed it along to the Popen calls. 
> This API enhancement, for example, would allow Python applications to capture 
> all stdout and stderr coming from Spark and process it programmatically, 
> without resorting to reading from log files or other hijinks.
> Example use:
> {code}
> import pyspark
> import subprocess
> from pyspark.java_gateway import launch_gateway
> # Make the py4j JVM stdout and stderr available without buffering
> popen_kwargs = {
>   'stdout': subprocess.PIPE,
>   'stderr': subprocess.PIPE,
>   'bufsize': 0
> }
> # Launch the gateway with our custom settings
> gateway = launch_gateway(popen_kwargs=popen_kwargs)
> # Use the gateway we launched
> sc = pyspark.SparkContext(gateway=gateway)
> # This could be done in a thread or event loop or ...
> # Written briefly / poorly here only as a demo
> while True:
>   buf = gateway.proc.stdout.read()
>   print(buf.decode('utf-8'))
> {code}
> To get access to the stdout and stderr pipes, the "proc" instance created in 
> launch_gateway also needs to be exposed to the application. I'm thinking that 
> stashing it on the JavaGateway instance that the function already returns is 
> the cleanest from the client perspective, but means hanging an extra 
> attribute off the py4j.JavaGateway object. 
> I can submit a PR with this addition for further discussion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21094) Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway

2017-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053028#comment-16053028
 ] 

Apache Spark commented on SPARK-21094:
--

User 'parente' has created a pull request for this issue:
https://github.com/apache/spark/pull/18339

> Allow stdout/stderr pipes in pyspark.java_gateway.launch_gateway
> 
>
> Key: SPARK-21094
> URL: https://issues.apache.org/jira/browse/SPARK-21094
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Peter Parente
>
> The Popen call to launch the py4j gateway specifies no stdout and stderr 
> options, meaning logging from the JVM always goes to the parent process 
> terminal. 
> https://github.com/apache/spark/blob/v2.1.1/python/pyspark/java_gateway.py#L77
> It would be super handy if the launch_gateway function took an additional 
> dict parameter called popen_kwargs and passed it along to the Popen calls. 
> This API enhancement, for example, would allow Python applications to capture 
> all stdout and stderr coming from Spark and process it programmatically, 
> without resorting to reading from log files or other hijinks.
> Example use:
> {code}
> import pyspark
> import subprocess
> from pyspark.java_gateway import launch_gateway
> # Make the py4j JVM stdout and stderr available without buffering
> popen_kwargs = {
>   'stdout': subprocess.PIPE,
>   'stderr': subprocess.PIPE,
>   'bufsize': 0
> }
> # Launch the gateway with our custom settings
> gateway = launch_gateway(popen_kwargs=popen_kwargs)
> # Use the gateway we launched
> sc = pyspark.SparkContext(gateway=gateway)
> # This could be done in a thread or event loop or ...
> # Written briefly / poorly here only as a demo
> while True:
>   buf = gateway.proc.stdout.readline()
>   print(buf.decode('utf-8'))
> {code}
> To get access to the stdout and stderr pipes, the "proc" instance created in 
> launch_gateway also needs to be exposed to the application. I'm thinking that 
> stashing it on the JavaGateway instance that the function already returns is 
> the cleanest from the client perspective, but means hanging an extra 
> attribute off the py4j.JavaGateway object. 
> I can submit a PR with this addition for further discussion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21131) Fix batch gradient bug in SVDPlusPlus

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21131:


Assignee: (was: Apache Spark)

> Fix batch gradient bug in SVDPlusPlus
> -
>
> Key: SPARK-21131
> URL: https://issues.apache.org/jira/browse/SPARK-21131
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.1.1
>Reporter: ruiqiangfan
>
> I think the batch gradient computation in GraphX's SVDPlusPlus is incorrect. 
> The changes in this pull request are as below:
> * Use the mean of the batch gradient instead of its sum (see the sketch after 
> this report).
> * When calculating the bias of each rating during an iteration, the bias 
> shouldn't be clamped to (minVal, maxVal).
> * A method "calculateIterationRMSE" is added to show the effect of each 
> iteration clearly.
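
For context, here is a minimal sketch of the first point, using made-up names 
(GradAcc, merge, applyUpdate) rather than the actual GraphX code: with a sum, 
the effective step size grows with the number of ratings a vertex sees, while 
dividing by the count applies the mean batch gradient.

{code}
// Hypothetical sketch, not the real SVDPlusPlus patch: accumulate a
// (gradient sum, count) pair per vertex, then scale by 1/count so the
// update uses the mean batch gradient rather than the sum.
case class GradAcc(grad: Array[Double], count: Long)

def merge(a: GradAcc, b: GradAcc): GradAcc =
  GradAcc(a.grad.zip(b.grad).map { case (x, y) => x + y }, a.count + b.count)

def applyUpdate(factors: Array[Double], acc: GradAcc, stepSize: Double): Array[Double] = {
  // Mean instead of sum: without the division, vertices with many
  // ratings would take disproportionately large gradient steps.
  val scale = stepSize / math.max(acc.count, 1L).toDouble
  factors.zip(acc.grad).map { case (f, g) => f + scale * g }
}
{code}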



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21131) Fix batch gradient bug in SVDPlusPlus

2017-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053032#comment-16053032
 ] 

Apache Spark commented on SPARK-21131:
--

User 'lxmly' has created a pull request for this issue:
https://github.com/apache/spark/pull/18337

> Fix batch gradient bug in SVDPlusPlus
> -
>
> Key: SPARK-21131
> URL: https://issues.apache.org/jira/browse/SPARK-21131
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.1.1
>Reporter: ruiqiangfan
>
> I think the batch gradient computation in GraphX's SVDPlusPlus is incorrect. 
> The changes in this pull request are as below:
> * Use the mean of the batch gradient instead of its sum.
> * When calculating the bias of each rating during an iteration, the bias 
> shouldn't be clamped to (minVal, maxVal).
> * A method "calculateIterationRMSE" is added to show the effect of each 
> iteration clearly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21131) Fix batch gradient bug in SVDPlusPlus

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21131:


Assignee: Apache Spark

> Fix batch gradient bug in SVDPlusPlus
> -
>
> Key: SPARK-21131
> URL: https://issues.apache.org/jira/browse/SPARK-21131
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.1.1
>Reporter: ruiqiangfan
>Assignee: Apache Spark
>
> I think the batch gradient computation in GraphX's SVDPlusPlus is incorrect. 
> The changes in this pull request are as below:
> * Use the mean of the batch gradient instead of its sum.
> * When calculating the bias of each rating during an iteration, the bias 
> shouldn't be clamped to (minVal, maxVal).
> * A method "calculateIterationRMSE" is added to show the effect of each 
> iteration clearly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored

2017-06-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053054#comment-16053054
 ] 

Apache Spark commented on SPARK-21132:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/18340

> DISTINCT modifier of function arguments should not be silently ignored
> --
>
> Key: SPARK-21132
> URL: https://issues.apache.org/jira/browse/SPARK-21132
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> The DISTINCT modifier of function arguments should not be silently ignored 
> when it is not supported.
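
To make the reported behavior concrete, a minimal sketch (treating {{hex}} as 
an affected non-aggregate function is an assumption; substitute any function 
that does not support a DISTINCT argument):

{code}
import org.apache.spark.sql.SparkSession

// Sketch of the reported behavior; using hex as the stand-in for a
// function that does not support DISTINCT is an assumption.
object DistinctIgnoredDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
    // Before the fix, the DISTINCT below was parsed and then dropped, so
    // this behaved exactly like SELECT hex(id) FROM range(10); after the
    // fix it should fail analysis instead of being silently ignored.
    spark.sql("SELECT hex(DISTINCT id) FROM range(10)").show()
    spark.stop()
  }
}
{code}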



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21132:


Assignee: Xiao Li  (was: Apache Spark)

> DISTINCT modifier of function arguments should not be silently ignored
> --
>
> Key: SPARK-21132
> URL: https://issues.apache.org/jira/browse/SPARK-21132
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>
> The DISTINCT modifier of function arguments should not be silently ignored 
> when it is not supported.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21132) DISTINCT modifier of function arguments should not be silently ignored

2017-06-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21132:


Assignee: Apache Spark  (was: Xiao Li)

> DISTINCT modifier of function arguments should not be silently ignored
> --
>
> Key: SPARK-21132
> URL: https://issues.apache.org/jira/browse/SPARK-21132
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> The DISTINCT modifier of function arguments should not be silently ignored 
> when it is not supported.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053093#comment-16053093
 ] 

Apache Spark commented on SPARK-21133:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/18343

> HighlyCompressedMapStatus#writeExternal throws NPE
> --
>
> Key: SPARK-21133
> URL: https://issues.apache.org/jira/browse/SPARK-21133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>
> To reproduce, set {{spark.sql.shuffle.partitions}} to more than 2000 and run 
> a query with a shuffle, for example:
> {code:sql}
> spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7   -e "
>   set spark.sql.shuffle.partitions=2001;
>   drop table if exists spark_hcms_npe;
>   create table spark_hcms_npe as select id, count(*) from big_table group by 
> id;
> "
> {code}
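> Why 2001 partitions: {{MapStatus}} switches to the highly compressed 
> representation once a shuffle has more than 2000 partitions, which is the 
> code path that hits this NPE. Schematically (the real dispatch lives in 
> {{org.apache.spark.scheduler.MapStatus.apply}}):
> {code}
> // Schematic only, not the actual MapStatus code.
> def chooseMapStatus(numPartitions: Int): String =
>   if (numPartitions > 2000) "HighlyCompressedMapStatus" // the path that NPEs here
>   else "CompressedMapStatus"
> {code}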
> Error logs:
> {noformat}
> 17/06/18 15:00:27 ERROR Utils: Exception encountered
> java.lang.NullPointerException
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecu

[jira] [Assigned] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21133:


Assignee: Apache Spark

> HighlyCompressedMapStatus#writeExternal throws NPE
> --
>
> Key: SPARK-21133
> URL: https://issues.apache.org/jira/browse/SPARK-21133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>
> To reproduce, set {{spark.sql.shuffle.partitions}} to more than 2000 and run 
> a query with a shuffle, for example:
> {code:sql}
> spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7   -e "
>   set spark.sql.shuffle.partitions=2001;
>   drop table if exists spark_hcms_npe;
>   create table spark_hcms_npe as select id, count(*) from big_table group by 
> id;
> "
> {code}
> Error logs:
> {noformat}
> 17/06/18 15:00:27 ERROR Utils: Exception encountered
> java.lang.NullPointerException
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java

[jira] [Assigned] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21133:


Assignee: (was: Apache Spark)

> HighlyCompressedMapStatus#writeExternal throws NPE
> --
>
> Key: SPARK-21133
> URL: https://issues.apache.org/jira/browse/SPARK-21133
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Yuming Wang
>
> To reproduce, set {{spark.sql.shuffle.partitions}} to more than 2000 and run 
> a query with a shuffle, for example:
> {code:sql}
> spark-sql --executor-memory 12g --driver-memory 8g --executor-cores 7   -e "
>   set spark.sql.shuffle.partitions=2001;
>   drop table if exists spark_hcms_npe;
>   create table spark_hcms_npe as select id, count(*) from big_table group by 
> id;
> "
> {code}
> Error logs:
> {noformat}
> 17/06/18 15:00:27 ERROR Utils: Exception encountered
> java.lang.NullPointerException
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply$mcV$sp(MapStatus.scala:171)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$$anonfun$writeExternal$2.apply(MapStatus.scala:167)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 17/06/18 15:00:27 ERROR MapOutputTrackerMaster: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
> at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.writeExternal(MapStatus.scala:167)
> at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply$mcV$sp(MapOutputTracker.scala:617)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at 
> org.apache.spark.MapOutputTracker$$anonfun$serializeMapStatuses$1.apply(MapOutputTracker.scala:616)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1337)
> at 
> org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:619)
> at 
> org.apache.spark.MapOutputTrackerMaster.getSerializedMapOutputStatuses(MapOutputTracker.scala:562)
> at 
> org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.l

[jira] [Assigned] (SPARK-21123) Options for file stream source are in the wrong table

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21123:


Assignee: Apache Spark

> Options for file stream source are in the wrong table
> ---
>
> Key: SPARK-21123
> URL: https://issues.apache.org/jira/browse/SPARK-21123
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
> Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png
>
>
> Right now the options for the file stream source are documented in the file 
> sink table. We should create a table for the source options and fix this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21123) Options for file stream source are in the wrong table

2017-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053125#comment-16053125
 ] 

Apache Spark commented on SPARK-21123:
--

User 'assafmendelson' has created a pull request for this issue:
https://github.com/apache/spark/pull/18342

> Options for file stream source are in the wrong table
> ---
>
> Key: SPARK-21123
> URL: https://issues.apache.org/jira/browse/SPARK-21123
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Shixiong Zhu
>Priority: Minor
>  Labels: starter
> Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png
>
>
> Right now the options for the file stream source are documented in the file 
> sink table. We should create a table for the source options and fix this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21123) Options for file stream source are in the wrong table

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21123:


Assignee: (was: Apache Spark)

> Options for file stream source are in the wrong table
> ---
>
> Key: SPARK-21123
> URL: https://issues.apache.org/jira/browse/SPARK-21123
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Structured Streaming
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Shixiong Zhu
>Priority: Minor
>  Labels: starter
> Attachments: Screen Shot 2017-06-15 at 11.25.49 AM.png
>
>
> Right now the options for the file stream source are documented in the file 
> sink table. We should create a table for the source options and fix this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21134:


Assignee: Apache Spark

> Codegen-only expressions should not be collapsed with upper CodegenFallback 
> expression
> --
>
> Key: SPARK-21134
> URL: https://issues.apache.org/jira/browse/SPARK-21134
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> The {{CollapseProject}} rule in the optimizer collapses lower and upper 
> expressions if they have common expressions.
> We have seen a use case where a codegen-only expression has been collapsed 
> into an upper {{CodegenFallback}} expression. Because all child expressions 
> under {{CodegenFallback}} are evaluated via the non-codegen {{eval}} path, 
> this causes an exception at runtime.
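
As a schematic illustration with made-up stand-ins (not Catalyst's real 
classes), the shape of the failure is:

{code}
// Made-up types, only to show the shape of the problem.
trait Expr { def eval(): Any }

// Plays the role of a CodegenFallback expression: it always evaluates
// its children through the interpreted eval() path.
case class FallbackExpr(child: Expr) extends Expr {
  def eval(): Any = child.eval()
}

// Plays the role of a codegen-only expression: eval() is unsupported.
case object CodegenOnlyExpr extends Expr {
  def eval(): Any =
    throw new UnsupportedOperationException("only supports code generation")
}

// After the projections are collapsed, a tree of this shape can appear,
// and evaluating it throws at runtime:
//   FallbackExpr(CodegenOnlyExpr).eval()
{code}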



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression

2017-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053404#comment-16053404
 ] 

Apache Spark commented on SPARK-21134:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/18346

> Codegen-only expressions should not be collapsed with upper CodegenFallback 
> expression
> --
>
> Key: SPARK-21134
> URL: https://issues.apache.org/jira/browse/SPARK-21134
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Liang-Chi Hsieh
>
> The {{CollapseProject}} rule in the optimizer collapses lower and upper 
> expressions if they have common expressions.
> We have seen a use case where a codegen-only expression has been collapsed 
> into an upper {{CodegenFallback}} expression. Because all child expressions 
> under {{CodegenFallback}} are evaluated via the non-codegen {{eval}} path, 
> this causes an exception at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21134) Codegen-only expressions should not be collapsed with upper CodegenFallback expression

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21134:


Assignee: (was: Apache Spark)

> Codegen-only expressions should not be collapsed with upper CodegenFallback 
> expression
> --
>
> Key: SPARK-21134
> URL: https://issues.apache.org/jira/browse/SPARK-21134
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
>Reporter: Liang-Chi Hsieh
>
> The {{CollapseProject}} rule in the optimizer collapses lower and upper 
> expressions if they have common expressions.
> We have seen a use case where a codegen-only expression has been collapsed 
> into an upper {{CodegenFallback}} expression. Because all child expressions 
> under {{CodegenFallback}} are evaluated via the non-codegen {{eval}} path, 
> this causes an exception at runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20599) ConsoleSink should work with write (batch)

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20599:


Assignee: (was: Apache Spark)

> ConsoleSink should work with write (batch)
> --
>
> Key: SPARK-20599
> URL: https://issues.apache.org/jira/browse/SPARK-20599
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>  Labels: starter
>
> I think the following should just work.
> {code}
> spark.
>   read.  // <-- it's a batch query not streaming query if that matters
>   format("kafka").
>   option("subscribe", "topic1").
>   option("kafka.bootstrap.servers", "localhost:9092").
>   load.
>   write.
>   format("console").  // <-- that's not supported currently
>   save
> {code}
> The above combination of {{kafka}} source and {{console}} sink leads to the 
> following exception:
> {code}
> java.lang.RuntimeException: 
> org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow 
> create table as select.
>   at scala.sys.package$.error(package.scala:27)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
>   ... 48 elided
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20599) ConsoleSink should work with write (batch)

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20599:


Assignee: Apache Spark

> ConsoleSink should work with write (batch)
> --
>
> Key: SPARK-20599
> URL: https://issues.apache.org/jira/browse/SPARK-20599
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> I think the following should just work.
> {code}
> spark.
>   read.  // <-- it's a batch query not streaming query if that matters
>   format("kafka").
>   option("subscribe", "topic1").
>   option("kafka.bootstrap.servers", "localhost:9092").
>   load.
>   write.
>   format("console").  // <-- that's not supported currently
>   save
> {code}
> The above combination of {{kafka}} source and {{console}} sink leads to the 
> following exception:
> {code}
> java.lang.RuntimeException: 
> org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow 
> create table as select.
>   at scala.sys.package$.error(package.scala:27)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
>   ... 48 elided
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20599) ConsoleSink should work with write (batch)

2017-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053413#comment-16053413
 ] 

Apache Spark commented on SPARK-20599:
--

User 'lubozhan' has created a pull request for this issue:
https://github.com/apache/spark/pull/18347

> ConsoleSink should work with write (batch)
> --
>
> Key: SPARK-20599
> URL: https://issues.apache.org/jira/browse/SPARK-20599
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>  Labels: starter
>
> I think the following should just work.
> {code}
> spark.
>   read.  // <-- it's a batch query not streaming query if that matters
>   format("kafka").
>   option("subscribe", "topic1").
>   option("kafka.bootstrap.servers", "localhost:9092").
>   load.
>   write.
>   format("console").  // <-- that's not supported currently
>   save
> {code}
> The above combination of {{kafka}} source and {{console}} sink leads to the 
> following exception:
> {code}
> java.lang.RuntimeException: 
> org.apache.spark.sql.execution.streaming.ConsoleSinkProvider does not allow 
> create table as select.
>   at scala.sys.package$.error(package.scala:27)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
>   ... 48 elided
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21120) Increasing the master's metrics is conducive to Spark cluster management system monitoring.

2017-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053424#comment-16053424
 ] 

Apache Spark commented on SPARK-21120:
--

User 'guoxiaolongzte' has created a pull request for this issue:
https://github.com/apache/spark/pull/18348

> Increasing the master's metrics is conducive to Spark cluster management 
> system monitoring.
> --
>
> Key: SPARK-21120
> URL: https://issues.apache.org/jira/browse/SPARK-21120
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: guoxiaolongzte
>Priority: Minor
> Attachments: 1.png
>
>
> The master currently exposes very few metrics, which cannot meet the needs of 
> a large-scale Spark cluster management system, so I am adding the relevant 
> metrics as completely as possible.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20927:


Assignee: Apache Spark

> Add cache operator to Unsupported Operations in Structured Streaming 
> -
>
> Key: SPARK-20927
> URL: https://issues.apache.org/jira/browse/SPARK-20927
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Assignee: Apache Spark
>Priority: Trivial
>  Labels: starter
>
> Just [found 
> out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries]
>  that {{cache}} is not allowed on streaming datasets.
> {{cache}} on streaming datasets leads to the following exception:
> {code}
> scala> spark.readStream.text("files").cache
> org.apache.spark.sql.AnalysisException: Queries with streaming sources must 
> be executed with writeStream.start();;
> FileSource[files]
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63)
>   at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
>   at 
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104)
>   at 
> org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68)
>   at 
> org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92)
>   at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603)
>   at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613)
>   ... 48 elided
> {code}
> It should be included in Structured Streaming's [Unsupported 
> Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming

2017-06-18 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20927:


Assignee: (was: Apache Spark)

> Add cache operator to Unsupported Operations in Structured Streaming 
> -
>
> Key: SPARK-20927
> URL: https://issues.apache.org/jira/browse/SPARK-20927
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>  Labels: starter
>
> Just [found 
> out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries]
>  that {{cache}} is not allowed on streaming datasets.
> {{cache}} on streaming datasets leads to the following exception:
> {code}
> scala> spark.readStream.text("files").cache
> org.apache.spark.sql.AnalysisException: Queries with streaming sources must 
> be executed with writeStream.start();;
> FileSource[files]
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63)
>   at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
>   at 
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104)
>   at 
> org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68)
>   at 
> org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92)
>   at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603)
>   at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613)
>   ... 48 elided
> {code}
> It should be included in Structured Streaming's [Unsupported 
> Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20927) Add cache operator to Unsupported Operations in Structured Streaming

2017-06-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053468#comment-16053468
 ] 

Apache Spark commented on SPARK-20927:
--

User 'ZiyueHuang' has created a pull request for this issue:
https://github.com/apache/spark/pull/18349

> Add cache operator to Unsupported Operations in Structured Streaming 
> -
>
> Key: SPARK-20927
> URL: https://issues.apache.org/jira/browse/SPARK-20927
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>  Labels: starter
>
> Just [found 
> out|https://stackoverflow.com/questions/42062092/why-does-using-cache-on-streaming-datasets-fail-with-analysisexception-queries]
>  that {{cache}} is not allowed on streaming datasets.
> {{cache}} on streaming datasets leads to the following exception:
> {code}
> scala> spark.readStream.text("files").cache
> org.apache.spark.sql.AnalysisException: Queries with streaming sources must 
> be executed with writeStream.start();;
> FileSource[files]
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>   at 
> org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63)
>   at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
>   at 
> org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
>   at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
>   at 
> org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:104)
>   at 
> org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:68)
>   at 
> org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:92)
>   at org.apache.spark.sql.Dataset.persist(Dataset.scala:2603)
>   at org.apache.spark.sql.Dataset.cache(Dataset.scala:2613)
>   ... 48 elided
> {code}
> It should be included in Structured Streaming's [Unsupported 
> Operations|http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#unsupported-operations].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21135) On history server page, duration of incomplete applications should be hidden instead of showing up as 0

2017-06-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053665#comment-16053665
 ] 

Apache Spark commented on SPARK-21135:
--

User 'fjh100456' has created a pull request for this issue:
https://github.com/apache/spark/pull/18351

> On history server page, duration of incomplete applications should be hidden 
> instead of showing up as 0
> ---
>
> Key: SPARK-21135
> URL: https://issues.apache.org/jira/browse/SPARK-21135
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.2.1
>Reporter: Jinhua Fu
>Priority: Minor
>
> On the history server page, the duration of incomplete applications should be 
> hidden instead of showing up as 0.
> In addition, an application that aborts abnormally (for example, one killed 
> in the background or whose driver goes down) will always be treated as an 
> incomplete application, and I'm not sure whether this is a problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21135) On history server page, duration of incomplete applications should be hidden instead of showing up as 0

2017-06-19 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21135:


Assignee: Apache Spark

> On history server page, duration of incomplete applications should be hidden 
> instead of showing up as 0
> ---
>
> Key: SPARK-21135
> URL: https://issues.apache.org/jira/browse/SPARK-21135
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.2.1
>Reporter: Jinhua Fu
>Assignee: Apache Spark
>Priority: Minor
>
> On the history server page, the duration of incomplete applications should be 
> hidden instead of showing up as 0.
> In addition, an application that aborts abnormally (for example, one killed 
> in the background or whose driver goes down) will always be treated as an 
> incomplete application, and I'm not sure whether this is a problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21135) On history server page, duration of incomplete applications should be hidden instead of showing up as 0

2017-06-19 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21135:


Assignee: (was: Apache Spark)

> On history server page, duration of incomplete applications should be hidden 
> instead of showing up as 0
> ---
>
> Key: SPARK-21135
> URL: https://issues.apache.org/jira/browse/SPARK-21135
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.2.1
>Reporter: Jinhua Fu
>Priority: Minor
>
> On the history server page, the duration of incomplete applications should be 
> hidden instead of showing up as 0.
> In addition, an application that aborts abnormally (for example, one killed 
> in the background or whose driver goes down) will always be treated as an 
> incomplete application, and I'm not sure whether this is a problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053817#comment-16053817
 ] 

Apache Spark commented on SPARK-21138:
--

User 'sharkdtu' has created a pull request for this issue:
https://github.com/apache/spark/pull/18352

> Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and 
> "spark.hadoop.fs.defaultFS" are different 
> -
>
> Key: SPARK-21138
> URL: https://issues.apache.org/jira/browse/SPARK-21138
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1
>Reporter: sharkd tu
>
> When I set different clusters for "spark.hadoop.fs.defaultFS" and 
> "spark.yarn.stagingDir" as follows:
> {code:java}
> spark.hadoop.fs.defaultFS  hdfs://tl-nn-tdw.tencent-distribute.com:54310
> spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark
> {code}
> I got the following logs:
> {code:java}
> 17/06/19 17:55:48 INFO SparkContext: Successfully stopped SparkContext
> 17/06/19 17:55:48 INFO ApplicationMaster: Unregistering ApplicationMaster 
> with SUCCEEDED
> 17/06/19 17:55:48 INFO AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 17/06/19 17:55:48 INFO ApplicationMaster: Deleting staging directory 
> hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618
> 17/06/19 17:55:48 ERROR Utils: Uncaught exception in thread Thread-2
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, 
> expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getPathName(Cdh3DistributedFileSystem.java:197)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.access$000(Cdh3DistributedFileSystem.java:81)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:644)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:640)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.delete(Cdh3DistributedFileSystem.java:640)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.delete(FilterFileSystem.java:216)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:545)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:233)
>   at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
>   at scala.util.Try$.apply(Try.scala:192)
>   at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> {code}
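
A minimal sketch of one possible fix (not necessarily what the linked pull 
request does): the "Wrong FS" error arises because the shutdown hook obtains 
a FileSystem client for fs.defaultFS and then hands it a path on a different 
cluster, so resolving the FileSystem from the staging path itself avoids the 
mismatch. Only the cleanupStagingDir name comes from the stack trace above; 
the signature below is illustrative.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: resolve the FileSystem from the staging path's own scheme and
// authority (hdfs://ss-teg-2-v2) instead of from fs.defaultFS
// (hdfs://tl-nn-tdw.tencent-distribute.com:54310), which is what makes
// FileSystem.checkPath throw the IllegalArgumentException above.
def cleanupStagingDir(stagingDirPath: Path, conf: Configuration): Unit = {
  val fs: FileSystem = stagingDirPath.getFileSystem(conf)
  fs.delete(stagingDirPath, true) // recursively remove .sparkStaging/app_...
}
{code}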



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21138:


Assignee: Apache Spark

> Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and 
> "spark.hadoop.fs.defaultFS" are different 
> -
>
> Key: SPARK-21138
> URL: https://issues.apache.org/jira/browse/SPARK-21138
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1
>Reporter: sharkd tu
>Assignee: Apache Spark
>
> When I set "spark.hadoop.fs.defaultFS" and "spark.yarn.stagingDir" to 
> different clusters, as follows:
> {code:java}
> spark.hadoop.fs.defaultFS  hdfs://tl-nn-tdw.tencent-distribute.com:54310
> spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark
> {code}
> I got the following logs:
> {code:java}
> 17/06/19 17:55:48 INFO SparkContext: Successfully stopped SparkContext
> 17/06/19 17:55:48 INFO ApplicationMaster: Unregistering ApplicationMaster 
> with SUCCEEDED
> 17/06/19 17:55:48 INFO AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 17/06/19 17:55:48 INFO ApplicationMaster: Deleting staging directory 
> hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618
> 17/06/19 17:55:48 ERROR Utils: Uncaught exception in thread Thread-2
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, 
> expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getPathName(Cdh3DistributedFileSystem.java:197)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.access$000(Cdh3DistributedFileSystem.java:81)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:644)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:640)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.delete(Cdh3DistributedFileSystem.java:640)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.delete(FilterFileSystem.java:216)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:545)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:233)
>   at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
>   at scala.util.Try$.apply(Try.scala:192)
>   at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21138:


Assignee: (was: Apache Spark)

> Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and 
> "spark.hadoop.fs.defaultFS" are different 
> -
>
> Key: SPARK-21138
> URL: https://issues.apache.org/jira/browse/SPARK-21138
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.1.1
>Reporter: sharkd tu
>
> When I set "spark.hadoop.fs.defaultFS" and "spark.yarn.stagingDir" to 
> different clusters, as follows:
> {code:java}
> spark.hadoop.fs.defaultFS  hdfs://tl-nn-tdw.tencent-distribute.com:54310
> spark.yarn.stagingDir hdfs://ss-teg-2-v2/tmp/spark
> {code}
> I got the following logs:
> {code:java}
> 17/06/19 17:55:48 INFO SparkContext: Successfully stopped SparkContext
> 17/06/19 17:55:48 INFO ApplicationMaster: Unregistering ApplicationMaster 
> with SUCCEEDED
> 17/06/19 17:55:48 INFO AMRMClientImpl: Waiting for application to be 
> successfully unregistered.
> 17/06/19 17:55:48 INFO ApplicationMaster: Deleting staging directory 
> hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618
> 17/06/19 17:55:48 ERROR Utils: Uncaught exception in thread Thread-2
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://ss-teg-2-v2/tmp/spark/.sparkStaging/application_1496819138021_77618, 
> expected: hdfs://tl-nn-tdw.tencent-distribute.com:54310
>   at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.getPathName(Cdh3DistributedFileSystem.java:197)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.access$000(Cdh3DistributedFileSystem.java:81)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:644)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem$10.doCall(Cdh3DistributedFileSystem.java:640)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoopcdh3.hdfs.Cdh3DistributedFileSystem.delete(Cdh3DistributedFileSystem.java:640)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.delete(FilterFileSystem.java:216)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:545)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:233)
>   at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
>   at scala.util.Try$.apply(Try.scala:192)
>   at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>   at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21142:


Assignee: (was: Apache Spark)

> spark-streaming-kafka-0-10 has too fat dependency on kafka
> --
>
> Key: SPARK-21142
> URL: https://issues.apache.org/jira/browse/SPARK-21142
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
>Reporter: Tim Van Wassenhove
>Priority: Minor
>
> Currently spark-streaming-kafka-0-10_2.11 has a dependency on kafka, whereas 
> it should only need a dependency on kafka-clients.
> The only reason it depends on kafka (the full server) is the KafkaTestUtils 
> class, which runs in-memory tests against a Kafka broker. This class should 
> be moved to src/test.
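
Until the module is slimmed down, one possible consumer-side workaround, 
sketched here as an sbt setting and assuming the application never touches 
KafkaTestUtils, is to exclude the full broker artifact and rely on the 
transitively included kafka-clients:

{code:scala}
// build.sbt sketch (assumes the app does not reference KafkaTestUtils).
// "%%" appends the Scala suffix, yielding spark-streaming-kafka-0-10_2.11;
// the exclusion drops the full Kafka server jar while kafka-clients stays.
libraryDependencies += ("org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1")
  .exclude("org.apache.kafka", "kafka_2.11")
{code}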



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21142:


Assignee: Apache Spark

> spark-streaming-kafka-0-10 has too fat dependency on kafka
> --
>
> Key: SPARK-21142
> URL: https://issues.apache.org/jira/browse/SPARK-21142
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.1.1
>Reporter: Tim Van Wassenhove
>Assignee: Apache Spark
>Priority: Minor
>
> Currently spark-streaming-kafka-0-10_2.11 has a dependency on kafka, whereas 
> it should only need a dependency on kafka-clients.
> The only reason it depends on kafka (the full server) is the KafkaTestUtils 
> class, which runs in-memory tests against a Kafka broker. This class should 
> be moved to src/test.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


