[jira] [Assigned] (SPARK-17406) Event Timeline will be very slow when there are too many executor events
[ https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17406:
------------------------------------

    Assignee: Apache Spark

> Event Timeline will be very slow when there are too many executor events
> ------------------------------------------------------------------------
>
>                 Key: SPARK-17406
>                 URL: https://issues.apache.org/jira/browse/SPARK-17406
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: cen yuhai
>            Assignee: Apache Spark
>            Priority: Minor
>         Attachments: timeline1.png, timeline2.png
>
> The job page is too slow to open when there are thousands of executor
> events (added or removed). In ExecutorsTab, executorIdToData never removes
> elements, so it grows for the lifetime of the application. Before this PR
> the page looks like timeline1.png; after this PR it looks like
> timeline2.png (the number of events displayed is configurable).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17406) Event Timeline will be very slow when there are too many executor events
[ https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466459#comment-15466459 ]

Apache Spark commented on SPARK-17406:
--------------------------------------

User 'cenyuhai' has created a pull request for this issue:
https://github.com/apache/spark/pull/14969
[jira] [Assigned] (SPARK-17406) Event Timeline will be very slow when there are too many executor events
[ https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17406:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are too many executor events
[ https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cen yuhai updated SPARK-17406:
------------------------------

    Summary: Event Timeline will be very slow when there are too many executor events  (was: Event Timeline will be very slow when there are very executor events)
[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are very executor events
[ https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cen yuhai updated SPARK-17406:
------------------------------

    Description:
The job page will be too slow to open when there are thousands of executor events (added or removed). I found that in ExecutorsTab, executorIdToData will not remove elements; it will increase all the time. Before this PR, it looks like timeline1.png. After this PR, it looks like timeline2.png (we can set how many events will be displayed).

  (was: The job page will be too slow to open when there are very executor events(added or removed). I found that in ExecutorsTab file, executorIdToData will not remove elements, it will increase all the time. Before this pr, it looks like timeline1.png. After this pr, it looks like timeline2.png(we can set how many events will be displayed))
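The fix described above caps how many executor events the timeline retains instead of letting executorIdToData grow without bound. A minimal sketch of that capping idea, in plain Scala (the class and parameter names here are invented for illustration; this is not the actual Spark listener code):

```scala
import scala.collection.mutable

// Hypothetical sketch: keep at most `maxEvents` executor added/removed
// events, evicting the oldest first, so the timeline's size and rendering
// cost stay bounded no matter how many executors come and go.
class BoundedEventLog(maxEvents: Int) {
  private val events = mutable.Queue.empty[String]

  def add(event: String): Unit = {
    events.enqueue(event)
    if (events.size > maxEvents) events.dequeue() // drop the oldest event
  }

  def snapshot: List[String] = events.toList
}

val log = new BoundedEventLog(maxEvents = 3)
(1 to 5).foreach(i => log.add(s"executor-$i added"))
println(log.snapshot.mkString(", ")) // executor-3 added, executor-4 added, executor-5 added
```

Because eviction happens on every insert, memory and page-render time are bounded by `maxEvents` rather than by the application's total executor churn.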
[jira] [Assigned] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17369:
------------------------------------

    Assignee: Sean Zhong  (was: Apache Spark)

> MetastoreRelation toJSON throws exception
> -----------------------------------------
>
>                 Key: SPARK-17369
>                 URL: https://issues.apache.org/jira/browse/SPARK-17369
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Sean Zhong
>            Assignee: Sean Zhong
>            Priority: Minor
>             Fix For: 2.1.0
>
> MetastoreRelation.toJSON currently throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
>
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
>
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
>     val table = CatalogTable(
>       identifier = TableIdentifier("test", Some("db")),
>       tableType = CatalogTableType.VIEW,
>       storage = CatalogStorageFormat.empty,
>       schema = StructType(StructField("a", IntegerType, true) :: Nil))
>     val relation = MetastoreRelation("db", "test")(table, null, null)
>
>     // No exception should be thrown
>     relation.makeCopy(Array("db", "test"))
>     // No exception should be thrown
>     relation.toJSON
>   }
> }
> {code}
[jira] [Commented] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466429#comment-15466429 ]

Apache Spark commented on SPARK-17369:
--------------------------------------

User 'clockfly' has created a pull request for this issue:
https://github.com/apache/spark/pull/14968
[jira] [Assigned] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17369:
------------------------------------

    Assignee: Apache Spark  (was: Sean Zhong)
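The test case above exercises reflection-based makeCopy/toJSON over a relation whose constructor carries extra, non-serializable arguments (the nulls passed after the catalog table). As a rough illustration of the failure mode, here is a toy Scala sketch (all names invented; this is not Spark's TreeNode code): a reflective toJSON that walks constructor arguments throws on any argument type it has no encoding for, and a tolerant variant renders such arguments as null instead.

```scala
// Toy relation: two plain fields plus an opaque session-state argument.
case class Relation(db: String, table: String, sessionState: AnyRef)

def jsonValue(v: Any): String = v match {
  case s: String => "\"" + s + "\""
  case null      => "null"
  case other     => // argument type the serializer does not understand
    throw new UnsupportedOperationException(other.getClass.getName)
}

// Tolerant variant: fall back to null for unknown constructor arguments,
// so serialization never propagates the exception.
def safeJsonValue(v: Any): String =
  try jsonValue(v) catch { case _: UnsupportedOperationException => "null" }

def toJson(r: Relation): String =
  r.productIterator.map(safeJsonValue).mkString("[", ",", "]")

println(toJson(Relation("db", "test", new Object))) // ["db","test",null]
```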
[jira] [Commented] (SPARK-17409) Query in CTAS is Optimized Twice
[ https://issues.apache.org/jira/browse/SPARK-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466397#comment-15466397 ]

Xiao Li commented on SPARK-17409:
---------------------------------

Working on it. Thanks!

> Query in CTAS is Optimized Twice
> --------------------------------
>
>                 Key: SPARK-17409
>                 URL: https://issues.apache.org/jira/browse/SPARK-17409
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Xiao Li
>
> The query in a CTAS is optimized twice, as reported in the PR:
> https://github.com/apache/spark/pull/14797
> {quote}
> Some analyzer rules make assumptions about logical plans that the optimizer
> may break, so we should not pass an optimized query plan into QueryExecution
> (where it will be analyzed again); otherwise we may hit some weird bugs.
> {quote}
[jira] [Created] (SPARK-17409) Query in CTAS is Optimized Twice
Xiao Li created SPARK-17409:
----------------------------

             Summary: Query in CTAS is Optimized Twice
                 Key: SPARK-17409
                 URL: https://issues.apache.org/jira/browse/SPARK-17409
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Xiao Li

The query in a CTAS is optimized twice, as reported in the PR:
https://github.com/apache/spark/pull/14797

{quote}
Some analyzer rules make assumptions about logical plans that the optimizer
may break, so we should not pass an optimized query plan into QueryExecution
(where it will be analyzed again); otherwise we may hit some weird bugs.
{quote}
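The hazard the quote describes is that optimizer rewrites need not be idempotent, so re-running the pipeline on an already-optimized plan can produce a plan shape no rule anticipated. A toy Scala sketch of a non-idempotent rewrite (the plan classes are invented for illustration; they are not Spark's Catalyst classes):

```scala
// Minimal plan tree with one "optimizer rule": put an Exchange on top of
// every Scan. Applying the rule to its own output wraps the plan again,
// so optimize(optimize(p)) != optimize(p).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Exchange(child: Plan) extends Plan

def optimize(p: Plan): Plan = p match {
  case s: Scan     => Exchange(s)
  case Exchange(c) => Exchange(optimize(c))
}

val once  = optimize(Scan("t"))  // Exchange(Scan(t))
val twice = optimize(once)       // Exchange(Exchange(Scan(t))) -- a shape the first pass never produces
```

This is why the CTAS path should hand QueryExecution the unoptimized query: analysis and optimization each assume they are seeing a plan in the state the previous phase left it.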
[jira] [Updated] (SPARK-17072) generate table level stats: stats generation/storing/loading
[ https://issues.apache.org/jira/browse/SPARK-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-17072:
-----------------------------

    Assignee: Zhenhua Wang

> generate table level stats: stats generation/storing/loading
> -------------------------------------------------------------
>
>                 Key: SPARK-17072
>                 URL: https://issues.apache.org/jira/browse/SPARK-17072
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Optimizer
>    Affects Versions: 2.0.0
>            Reporter: Ron Hu
>            Assignee: Zhenhua Wang
>             Fix For: 2.1.0
>
> We need to generate, store, and load statistics information into/from the
> metastore.
[jira] [Commented] (SPARK-17408) Flaky test: org.apache.spark.sql.hive.StatisticsSuite
[ https://issues.apache.org/jira/browse/SPARK-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466358#comment-15466358 ]

Yin Huai commented on SPARK-17408:
----------------------------------

cc [~cloud_fan] [~ZenWzh]
[jira] [Created] (SPARK-17408) Flaky test: org.apache.spark.sql.hive.StatisticsSuite
Yin Huai created SPARK-17408:
-----------------------------

             Summary: Flaky test: org.apache.spark.sql.hive.StatisticsSuite
                 Key: SPARK-17408
                 URL: https://issues.apache.org/jira/browse/SPARK-17408
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Yin Huai

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64956/testReport/junit/org.apache.spark.sql.hive/StatisticsSuite/test_statistics_of_LogicalRelation_converted_from_MetastoreRelation/

{code}
org.apache.spark.sql.hive.StatisticsSuite.test statistics of LogicalRelation converted from MetastoreRelation

Failing for the past 1 build (Since Failed#64956 )
Took 1.4 sec.

Error Message

org.scalatest.exceptions.TestFailedException: 6871 did not equal 4236

Stacktrace

sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 6871 did not equal 4236
	at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
	at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
	at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
	at org.apache.spark.sql.hive.StatisticsSuite$$anonfun$14.applyOrElse(StatisticsSuite.scala:247)
	at org.apache.spark.sql.hive.StatisticsSuite$$anonfun$14.applyOrElse(StatisticsSuite.scala:241)
	at scala.PartialFunction$Lifted.apply(PartialFunction.scala:223)
	at scala.PartialFunction$Lifted.apply(PartialFunction.scala:219)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:158)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:158)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:117)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:118)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:118)
	at org.apache.spark.sql.catalyst.trees.TreeNode.collect(TreeNode.scala:158)
	at org.apache.spark.sql.hive.StatisticsSuite.org$apache$spark$sql$hive$StatisticsSuite$$checkLogicalRelationStats(StatisticsSuite.scala:241)
	at org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6$$anonfun$apply$mcV$sp$3$$anonfun$apply$mcV$sp$4.apply$mcV$sp(StatisticsSuite.scala:271)
	at org.apache.spark.sql.test.SQLTestUtils$class.withSQLConf(SQLTestUtils.scala:99)
	at org.apache.spark.sql.hive.StatisticsSuite.withSQLConf(StatisticsSuite.scala:35)
	at org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6$$anonfun$apply$mcV$sp$3.apply$mcV$sp(StatisticsSuite.scala:268)
	at org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
	at org.apache.spark.sql.hive.StatisticsSuite.withTable(StatisticsSuite.scala:35)
	at org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply$mcV$sp(StatisticsSuite.scala:260)
	at org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply(StatisticsSuite.scala:257)
	at org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply(StatisticsSuite.scala:257)
	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
	at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
{code}
[jira] [Commented] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466350#comment-15466350 ]

Yin Huai commented on SPARK-17369:
----------------------------------

Let's create a patch for branch 2.0. I have reverted the current one because it breaks the 2.0 build.
[jira] [Updated] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-17369:
-----------------------------

    Fix Version/s:     (was: 2.0.1)
[jira] [Reopened] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai reopened SPARK-17369:
------------------------------
[jira] [Resolved] (SPARK-17358) Cached table (parquet/orc) should be shared between beelines
[ https://issues.apache.org/jira/browse/SPARK-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-17358.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 2.1.0
                   2.0.1

Issue resolved by pull request 14913
[https://github.com/apache/spark/pull/14913]

> Cached table (parquet/orc) should be shared between beelines
> ------------------------------------------------------------
>
>                 Key: SPARK-17358
>                 URL: https://issues.apache.org/jira/browse/SPARK-17358
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Yadong Qi
>             Fix For: 2.0.1, 2.1.0
>
> I cached a table (parquet) in Beeline1 but couldn't use the cached data in
> Beeline2. A table in text format is OK (a cached text table can be shared
> between beelines).
> Beeline1
> {code:sql}
> CACHE TABLE src_pqt;
> EXPLAIN SELECT * FROM src_pqt;
> == Physical Plan ==
> InMemoryTableScan
>    +- InMemoryRelation
>          +- *FileScan parquet default.src_pqt
> {code}
> Beeline2
> {code:sql}
> EXPLAIN SELECT * FROM src_pqt;
> == Physical Plan ==
> *FileScan parquet default.src_pqt
> {code}
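The symptom above is a cache scoped per session: Beeline1's InMemoryRelation is invisible to Beeline2's EXPLAIN. A toy Scala sketch of the fix's idea (all names invented; this is not Spark's actual CacheManager): key the cached-plan lookup at a single shared, application-level scope so every session resolves against the same entries.

```scala
import scala.collection.concurrent.TrieMap

// Shared, application-level cache of table -> cached plan. With one map for
// the whole server, a CACHE TABLE issued in any session is seen by all.
object SharedCache {
  private val cached = TrieMap.empty[String, String]

  def cache(table: String): Unit  = cached.put(table, s"InMemoryRelation($table)")
  def lookup(table: String): Option[String] = cached.get(table)
}

class Session {
  // Every session consults the shared cache rather than a private one.
  def explain(table: String): String =
    SharedCache.lookup(table).getOrElse(s"*FileScan parquet $table")
}

val beeline1 = new Session
val beeline2 = new Session
beeline1.explain("src_pqt")          // not cached yet: falls back to a file scan
SharedCache.cache("src_pqt")         // CACHE TABLE src_pqt, issued in beeline1
println(beeline2.explain("src_pqt")) // InMemoryRelation(src_pqt)
```

A `TrieMap` is used so concurrent sessions can read and write the shared map without external locking.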
[jira] [Created] (SPARK-17407) Unable to update structured stream from CSV
Chris Parmer created SPARK-17407:
--------------------------------

             Summary: Unable to update structured stream from CSV
                 Key: SPARK-17407
                 URL: https://issues.apache.org/jira/browse/SPARK-17407
             Project: Spark
          Issue Type: Question
          Components: PySpark
    Affects Versions: 2.0.0
         Environment: Mac OSX
Spark 2.0.0
            Reporter: Chris Parmer
            Priority: Trivial

I am creating a simple example of a Structured Stream from a CSV file with an in-memory output stream. When I add rows to the CSV file, my output stream does not update with the new data.

From this example: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4012078893478893/3202384642551446/5985939988045659/latest.html, I expected that subsequent queries on the same output stream would contain updated results.

Here is a reproducible code example: https://plot.ly/~chris/17703

Thanks for the help here!
[jira] [Updated] (SPARK-17358) Cached table (parquet/orc) should be shared between beelines
[ https://issues.apache.org/jira/browse/SPARK-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan updated SPARK-17358:
--------------------------------

    Assignee: Yadong Qi
[jira] [Updated] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-17369: Fix Version/s: 2.0.1 > MetastoreRelation toJSON throws exception > - > > Key: SPARK-17369 > URL: https://issues.apache.org/jira/browse/SPARK-17369 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Sean Zhong >Assignee: Sean Zhong >Priority: Minor > Fix For: 2.0.1, 2.1.0 > > > MetastoreRelationSuite.toJSON now throws exception. > test case: > {code} > package org.apache.spark.sql.hive > import org.apache.spark.SparkFunSuite > import org.apache.spark.sql.catalyst.TableIdentifier > import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, > CatalogTable, CatalogTableType} > import org.apache.spark.sql.types.{IntegerType, StructField, StructType} > class MetastoreRelationSuite extends SparkFunSuite { > test("makeCopy and toJSON should work") { > val table = CatalogTable( > identifier = TableIdentifier("test", Some("db")), > tableType = CatalogTableType.VIEW, > storage = CatalogStorageFormat.empty, > schema = StructType(StructField("a", IntegerType, true) :: Nil)) > val relation = MetastoreRelation("db", "test")(table, null, null) > // No exception should be thrown > relation.makeCopy(Array("db", "test")) > // No exception should be thrown > relation.toJSON > } > } > {code}
[jira] [Updated] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-17369: Assignee: Sean Zhong > MetastoreRelation toJSON throws exception > - > > Key: SPARK-17369 > URL: https://issues.apache.org/jira/browse/SPARK-17369 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Sean Zhong >Assignee: Sean Zhong >Priority: Minor > Fix For: 2.1.0 > > > MetastoreRelationSuite.toJSON now throws exception. > test case: > {code} > package org.apache.spark.sql.hive > import org.apache.spark.SparkFunSuite > import org.apache.spark.sql.catalyst.TableIdentifier > import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, > CatalogTable, CatalogTableType} > import org.apache.spark.sql.types.{IntegerType, StructField, StructType} > class MetastoreRelationSuite extends SparkFunSuite { > test("makeCopy and toJSON should work") { > val table = CatalogTable( > identifier = TableIdentifier("test", Some("db")), > tableType = CatalogTableType.VIEW, > storage = CatalogStorageFormat.empty, > schema = StructType(StructField("a", IntegerType, true) :: Nil)) > val relation = MetastoreRelation("db", "test")(table, null, null) > // No exception should be thrown > relation.makeCopy(Array("db", "test")) > // No exception should be thrown > relation.toJSON > } > } > {code}
[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are too many executor events
[ https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated SPARK-17406: -- Description: The job page will be too slow to open when there are too many executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData will not remove elements, it will increase all the time. Before this PR, it looks like timeline1.png. After this PR, it looks like timeline2.png (we can set how many events will be displayed) (was: The job page will be too slow to open when there are too many executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData will not remove elements, it will increase all the time.) > Event Timeline will be very slow when there are too many executor events > > > Key: SPARK-17406 > URL: https://issues.apache.org/jira/browse/SPARK-17406 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.6.2, 2.0.0 >Reporter: cen yuhai >Priority: Minor > Attachments: timeline1.png, timeline2.png > > > The job page will be too slow to open when there are too many executor > events (added or removed). I found that in the ExecutorsTab file, executorIdToData > will not remove elements, it will increase all the time. Before this PR, it > looks like timeline1.png. After this PR, it looks like timeline2.png (we can > set how many events will be displayed)
[jira] [Resolved] (SPARK-17369) MetastoreRelation toJSON throws exception
[ https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-17369. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14928 [https://github.com/apache/spark/pull/14928] > MetastoreRelation toJSON throws exception > - > > Key: SPARK-17369 > URL: https://issues.apache.org/jira/browse/SPARK-17369 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Sean Zhong >Priority: Minor > Fix For: 2.1.0 > > > MetastoreRelationSuite.toJSON now throws exception. > test case: > {code} > package org.apache.spark.sql.hive > import org.apache.spark.SparkFunSuite > import org.apache.spark.sql.catalyst.TableIdentifier > import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, > CatalogTable, CatalogTableType} > import org.apache.spark.sql.types.{IntegerType, StructField, StructType} > class MetastoreRelationSuite extends SparkFunSuite { > test("makeCopy and toJSON should work") { > val table = CatalogTable( > identifier = TableIdentifier("test", Some("db")), > tableType = CatalogTableType.VIEW, > storage = CatalogStorageFormat.empty, > schema = StructType(StructField("a", IntegerType, true) :: Nil)) > val relation = MetastoreRelation("db", "test")(table, null, null) > // No exception should be thrown > relation.makeCopy(Array("db", "test")) > // No exception should be thrown > relation.toJSON > } > } > {code}
[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are too many executor events
[ https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated SPARK-17406: -- Attachment: timeline2.png > Event Timeline will be very slow when there are too many executor events > > > Key: SPARK-17406 > URL: https://issues.apache.org/jira/browse/SPARK-17406 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.6.2, 2.0.0 >Reporter: cen yuhai >Priority: Minor > Attachments: timeline1.png, timeline2.png > > > The job page will be too slow to open when there are too many executor > events (added or removed). I found that in the ExecutorsTab file, executorIdToData > will not remove elements, it will increase all the time.
[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are too many executor events
[ https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated SPARK-17406: -- Attachment: timeline1.png > Event Timeline will be very slow when there are too many executor events > > > Key: SPARK-17406 > URL: https://issues.apache.org/jira/browse/SPARK-17406 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.6.2, 2.0.0 >Reporter: cen yuhai >Priority: Minor > Attachments: timeline1.png, timeline2.png > > > The job page will be too slow to open when there are too many executor > events (added or removed). I found that in the ExecutorsTab file, executorIdToData > will not remove elements, it will increase all the time.
[jira] [Updated] (SPARK-16943) CREATE TABLE LIKE generates a non-empty table when source is a data source table
[ https://issues.apache.org/jira/browse/SPARK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-16943: Fix Version/s: 2.0.1 > CREATE TABLE LIKE generates a non-empty table when source is a data source > table > > > Key: SPARK-16943 > URL: https://issues.apache.org/jira/browse/SPARK-16943 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.1, 2.1.0 > > > When the source table is a data source table, the table generated by CREATE > TABLE LIKE is non-empty. The expected table should be empty.
[jira] [Updated] (SPARK-16942) CREATE TABLE LIKE generates External table when source table is an External Hive Serde table
[ https://issues.apache.org/jira/browse/SPARK-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-16942: Fix Version/s: 2.0.1 > CREATE TABLE LIKE generates External table when source table is an External > Hive Serde table > > > Key: SPARK-16942 > URL: https://issues.apache.org/jira/browse/SPARK-16942 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.1, 2.1.0 > > > When the source table is an EXTERNAL Hive serde table, {{CREATE > TABLE LIKE}} will generate an EXTERNAL table. The expected table type should > be MANAGED. > The table type of the generated table is `EXTERNAL` when the source table is > an external Hive Serde table. Currently, we explicitly set it to `MANAGED`, > but Hive is checking the table property `EXTERNAL` to decide whether the > table is `EXTERNAL` or not. (See > https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1407-L1408) > Thus, the created table is still `EXTERNAL`.
[jira] [Updated] (SPARK-17353) CREATE TABLE LIKE statements when Source is a VIEW
[ https://issues.apache.org/jira/browse/SPARK-17353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-17353: Fix Version/s: 2.0.1 > CREATE TABLE LIKE statements when Source is a VIEW > -- > > Key: SPARK-17353 > URL: https://issues.apache.org/jira/browse/SPARK-17353 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.1, 2.1.0 > > > - Add support for temporary views > - When the source table is a `VIEW`, the metadata of the generated table > contains the original view text and view original text. So far, this does not > break anything, but it could cause problems in Hive. (For example, > https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1405-L1406) > - When the type of the source table is a view, the target table uses the > default format of data source tables: `spark.sql.sources.default`.
[jira] [Created] (SPARK-17406) Event Timeline will be very slow when there are too many executor events
cen yuhai created SPARK-17406: - Summary: Event Timeline will be very slow when there are too many executor events Key: SPARK-17406 URL: https://issues.apache.org/jira/browse/SPARK-17406 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 2.0.0, 1.6.2 Reporter: cen yuhai Priority: Minor The job page will be too slow to open when there are too many executor events (added or removed). I found that in the ExecutorsTab file, executorIdToData will not remove elements, it will increase all the time.
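The eventual fix described in this ticket caps how many executor events the timeline retains instead of letting the map grow forever. A minimal Scala sketch of such a size-bounded buffer (`BoundedEventBuffer` is an illustrative name, not Spark's actual ExecutorsTab code; the cap corresponds to the proposed "how many events will be displayed" setting):

```scala
import scala.collection.mutable

// Size-capped event buffer: once maxEvents is reached, the oldest event is
// evicted, so memory stays bounded no matter how many executors come and go.
class BoundedEventBuffer[T](maxEvents: Int) {
  private val events = mutable.Queue.empty[T]

  def add(event: T): Unit = {
    events.enqueue(event)
    while (events.size > maxEvents) events.dequeue() // drop the oldest event
  }

  // The events currently retained, oldest first.
  def snapshot: Seq[T] = events.toList
}
```

With a cap of, say, 250 events, adding thousands of executor-added/removed events leaves only the most recent 250 for the timeline to render, which keeps the job page fast.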
[jira] [Resolved] (SPARK-17279) better error message for exceptions during ScalaUDF execution
[ https://issues.apache.org/jira/browse/SPARK-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-17279. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 14850 [https://github.com/apache/spark/pull/14850] > better error message for exceptions during ScalaUDF execution > - > > Key: SPARK-17279 > URL: https://issues.apache.org/jira/browse/SPARK-17279 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.1.0 > >
[jira] [Comment Edited] (SPARK-17397) what to do when awaitTermination() throws?
[ https://issues.apache.org/jira/browse/SPARK-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466183#comment-15466183 ] Spiro Michaylov edited comment on SPARK-17397 at 9/6/16 2:24 AM: - Apologies: I was originally going to file it as a doc bug, and perhaps I should have done that, although I actually suspect there's a feature enhancement to file as well. I guess maybe the user list was a better place to hash that out. I mostly had more mundane exceptions in mind, say throwing a custom exception when processing a stream like this fragment from my code above. {code} stream .map(x => { throw new SomeException("something"); x} ) .foreachRDD(r => println("*** count = " + r.count())) {code} I got the impression from the unit tests (StreamingContextSuite) that such processing exceptions were intended to propagate through awaitTermination() and indeed they do. In so far as the intended usage is orderly shutdown of the StreamingContext when the first such exception propagates, I agree it's just a doc issue, and perhaps the intended usage pattern is something like: {code} try { ssc.awaitTermination() println("*** streaming terminated") } catch { case e: Exception => { ssc.stop() // ADDED THIS AS A WAY TO GET AN ORDERLY SHUTDOWN } } {code} But it seems like to get resilient streaming applications you'd like to have the option of seeing which exceptions have propagated, how severe, and how many, so you'd like (at least _I_ would like) something like: {code} var terminated = false var nastyExceptionCount = 0 while (!terminated && nastyExceptionCount < 100) { try { ssc.awaitTermination() terminated = true println("*** streaming terminated") } catch { case ne: NastyException => { // log it and count it but don't panic nastyExceptionCount = nastyExceptionCount + 1 } case e: Exception => { // yawn! 
} } } if (!terminated) ssc.stop() // because we got too many nasty exceptions {code} However, that doesn't seem to be supported (looks like the first exception gets repeated every time awaitTermination() is called), so I'm inclined to file it as either a bug or a feature request according to your taste, or just drop it if you think what I'm proposing is inappropriate. In any case, I don't want to abuse the process more than I already have: I'm happy to either drop this, take it to the user list, or open other tickets if desired and close this one. In fact, I notice I can still turn this into a bug report. Should I? was (Author: spirom): Apologies: I was originally going to file it as a doc bug, and perhaps I should have done that, although I actually suspect there's a feature enhancement to file as well. I guess maybe the user list was a better place to hash that out. I mostly had more mundane exceptions in mind, say throwing a custom exception when processing a stream like this fragment from my code above. {code} stream .map(x => { throw new SomeException("something"); x} ) .foreachRDD(r => println("*** count = " + r.count())) {code} I got the impression from the unit tests (StreamingContextSuite) that such processing exceptions were intended to propagate through awaitTermination() and indeed they do. 
In so far as the intended usage is orderly shutdown of the StreamingContext when the first such exception propagates, I agree it's just a doc issue, and perhaps the intended usage pattern is something like: {code} try { ssc.awaitTermination() println("*** streaming terminated") } catch { case e: Exception => { ssc.stop() // ADDED THIS AS A WAY TO GET AN ORDERLY SHUTDOWN } } {code} But it seems like to get resilient streaming applications you'd like to have the option of seeing which exceptions have propagated, how severe, and how many, so you'd like (at least _I_ would like) something like: {code} var terminated = false var nastyExceptionCount = 0 while (!terminated && nastyExceptionCount < 100) { try { ssc.awaitTermination() terminated = true println("*** streaming terminated") } catch { case ne: NastyException => { // log it and count it but don't panic nastyExceptionCount = nastyExceptionCount + 1 } case e: Exception => { // yawn! } } } if (!terminated) ssc.stop() // because we got too many nasty exceptions {code} However, that doesn't seem to be supported (looks like the first exception gets repeated every time awaitTermination() is called), so I'm inclined to file it as either a bug or a feature request according to your taste, or just drop it if you think what I'm proposing is inappropriate. In any case, I don't want to abuse the process more than I already have: I'm happy to either drop this, take it to the user list, or open other tickets if desired and close this one. > what to do
[jira] [Commented] (SPARK-17397) what to do when awaitTermination() throws?
[ https://issues.apache.org/jira/browse/SPARK-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466183#comment-15466183 ] Spiro Michaylov commented on SPARK-17397: - Apologies: I was originally going to file it as a doc bug, and perhaps I should have done that, although I actually suspect there's a feature enhancement to file as well. I guess maybe the user list was a better place to hash that out. I mostly had more mundane exceptions in mind, say throwing a custom exception when processing a stream like this fragment from my code above. {code} stream .map(x => { throw new SomeException("something"); x} ) .foreachRDD(r => println("*** count = " + r.count())) {code} I got the impression from the unit tests (StreamingContextSuite) that such processing exceptions were intended to propagate through awaitTermination() and indeed they do. In so far as the intended usage is orderly shutdown of the StreamingContext when the first such exception propagates, I agree it's just a doc issue, and perhaps the intended usage pattern is something like: {code} try { ssc.awaitTermination() println("*** streaming terminated") } catch { case e: Exception => { ssc.stop() // ADDED THIS AS A WAY TO GET AN ORDERLY SHUTDOWN } } {code} But it seems like to get resilient streaming applications you'd like to have the option of seeing which exceptions have propagated, how severe, and how many, so you'd like (at least _I_ would like) something like: {code} var terminated = false var nastyExceptionCount = 0 while (!terminated && nastyExceptionCount < 100) { try { ssc.awaitTermination() terminated = true println("*** streaming terminated") } catch { case ne: NastyException => { // log it and count it but don't panic nastyExceptionCount = nastyExceptionCount + 1 } case e: Exception => { // yawn! 
} } } if (!terminated) ssc.stop() // because we got too many nasty exceptions {code} However, that doesn't seem to be supported (looks like the first exception gets repeated every time awaitTermination() is called), so I'm inclined to file it as either a bug or a feature request according to your taste, or just drop it if you think what I'm proposing is inappropriate. In any case, I don't want to abuse the process more than I already have: I'm happy to either drop this, take it to the user list, or open other tickets if desired and close this one. > what to do when awaitTermination() throws? > --- > > Key: SPARK-17397 > URL: https://issues.apache.org/jira/browse/SPARK-17397 > Project: Spark > Issue Type: Question > Components: Streaming >Affects Versions: 2.0.0 > Environment: Linux, Scala but probably general >Reporter: Spiro Michaylov >Priority: Minor > > When awaitTermination propagates an exception that was thrown in processing a > batch, the StreamingContext keeps running. Perhaps this is by design, but I > don't see any mention of it in the API docs or the streaming programming > guide. It's not clear what idiom should be used to block the thread until the > context HAS been stopped in a situation where stream processing is throwing > lots of exceptions. > For example, in the following, streaming takes the full 30 seconds to > terminate. My hope in asking this is to improve my own understanding and > perhaps inspire documentation improvements. I'm not filing a bug because it's > not clear to me whether this is working as intended. 
> {code} > val conf = new > SparkConf().setAppName("ExceptionPropagation").setMaster("local[4]") > val sc = new SparkContext(conf) > // streams will produce data every second > val ssc = new StreamingContext(sc, Seconds(1)) > val qm = new QueueMaker(sc, ssc) > // create the stream > val stream = // create some stream > // register for data > stream > .map(x => { throw new SomeException("something"); x} ) > .foreachRDD(r => println("*** count = " + r.count())) > // start streaming > ssc.start() > new Thread("Delayed Termination") { > override def run() { > Thread.sleep(3) > ssc.stop() > } > }.start() > println("*** producing data") > // start producing data > qm.populateQueue() > try { > ssc.awaitTermination() > println("*** streaming terminated") > } catch { > case e: Exception => { > println("*** streaming exception caught in monitor thread") > } > } > // if the above goes down the exception path, there seems no > // good way to block here until the streaming context is stopped > println("*** done") > {code}
[jira] [Commented] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table
[ https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466180#comment-15466180 ] pin_zhang commented on SPARK-17396: --- "Thread-1902" daemon prio=6 tid=0x14078800 nid=0x3a6c runnable [0x38d5e000] "Thread-1901" daemon prio=6 tid=0x0c64f800 nid=0x32fc runnable [0x191ef000] "Thread-1900" daemon prio=6 tid=0x14249800 nid=0x263c runnable [0x4c73e000] "Thread-1899" daemon prio=6 tid=0x14244000 nid=0x189c runnable [0x17c7e000] "Thread-1898" daemon prio=6 tid=0x0d96a800 nid=0x3e54 runnable [0x4c5ef000] "ForkJoinPool-120-worker-1" daemon prio=6 tid=0x1407d000 nid=0x2234 waiting for monitor entry [0x4c31e000] "ForkJoinPool-120-worker-3" daemon prio=6 tid=0x13a64000 nid=0x1f0c waiting for monitor entry [0x4c0de000] "ForkJoinPool-120-worker-5" daemon prio=6 tid=0x13a75800 nid=0x1660 waiting for monitor entry [0x4241e000] "ForkJoinPool-120-worker-7" daemon prio=6 tid=0x13d6c000 nid=0x117c waiting for monitor entry [0x4bece000] "ForkJoinPool-120-worker-9" daemon prio=6 tid=0x14233800 nid=0x2a20 waiting for monitor entry [0x4bd3e000] "ForkJoinPool-120-worker-11" daemon prio=6 tid=0x1423f800 nid=0x3568 waiting for monitor entry [0x4afae000] "ForkJoinPool-120-worker-13" daemon prio=6 tid=0x1424e000 nid=0x378c waiting for monitor entry [0x4bc0e000] "ForkJoinPool-120-worker-15" daemon prio=6 tid=0x14238000 nid=0x1b8c waiting for monitor entry [0x18dfd000] "ForkJoinPool-119-worker-1" daemon prio=6 tid=0x13d74800 nid=0x29a0 waiting for monitor entry [0x4bade000] "ForkJoinPool-119-worker-3" daemon prio=6 tid=0x12cd4000 nid=0x18a0 in Object.wait() [0x4b9ae000] "ForkJoinPool-119-worker-7" daemon prio=6 tid=0x12cd3000 nid=0x15ec waiting for monitor entry [0x4b87d000] "ForkJoinPool-119-worker-5" daemon prio=6 tid=0x13bbd800 nid=0x2c24 waiting for monitor entry [0x4b76d000] "ForkJoinPool-119-worker-9" daemon prio=6 tid=0x13bc9800 nid=0x3d78 waiting for monitor entry [0x2acae000] "ForkJoinPool-119-worker-11" daemon prio=6 
tid=0x0d9eb000 nid=0x3f40 waiting for monitor entry [0x4b57e000] "ForkJoinPool-119-worker-13" daemon prio=6 tid=0x0d9e4800 nid=0x286c waiting for monitor entry [0x4b40e000] "ForkJoinPool-119-worker-15" daemon prio=6 tid=0x0d9e9000 nid=0x2304 in Object.wait() [0x194de000] "ForkJoinPool-118-worker-1" daemon prio=6 tid=0x14077000 nid=0x3a50 runnable [0x393dd000] "ForkJoinPool-118-worker-3" daemon prio=6 tid=0x1407a000 nid=0x1dc0 runnable [0x2331d000] "ForkJoinPool-118-worker-5" daemon prio=6 tid=0x0d2f9000 nid=0x2990 runnable [0x1b6fd000] "ForkJoinPool-118-worker-7" daemon prio=6 tid=0x0d2df800 nid=0x3bb4 runnable [0x4a9dd000] "ForkJoinPool-118-worker-9" daemon prio=6 tid=0x0d2f7800 nid=0x37e4 waiting for monitor entry [0x2bf5e000] "ForkJoinPool-118-worker-11" daemon prio=6 tid=0x12648000 nid=0x2878 runnable [0x2b26d000] "ForkJoinPool-118-worker-13" daemon prio=6 tid=0x12646000 nid=0x4cc waiting for monitor entry [0x183de000] "ForkJoinPool-118-worker-15" daemon prio=6 tid=0x12647800 nid=0x30c8 waiting for monitor entry [0x2bd3d000] "ForkJoinPool-117-worker-5" daemon prio=6 tid=0x12b5c800 nid=0x3510 waiting for monitor entry [0x4b2be000] "ForkJoinPool-117-worker-1" daemon prio=6 tid=0x12b5d000 nid=0x36b8 waiting for monitor entry [0x4b11e000] "ForkJoinPool-117-worker-3" daemon prio=6 tid=0x12eac800 nid=0x32d4 in Object.wait() [0x4acae000] "ForkJoinPool-117-worker-7" daemon prio=6 tid=0x12ea9800 nid=0x16c4 waiting for monitor entry [0x4ab1e000] "ForkJoinPool-117-worker-9" daemon prio=6 tid=0x12e9b000 nid=0x1e44 waiting for monitor entry [0x2162e000] "ForkJoinPool-117-worker-11" daemon prio=6 tid=0x13bcc000 nid=0x37f4 waiting for monitor entry [0x40dee000] "ForkJoinPool-117-worker-13" daemon prio=6 tid=0x13bcb000 nid=0x361c in Object.wait() [0x35dbe000] "ForkJoinPool-117-worker-15" daemon prio=6 tid=0x13bca800 nid=0x3344 in Object.wait() [0x2c0ce000] "ForkJoinPool-116-worker-1" daemon prio=6 tid=0x13bc9000 nid=0x3a34 runnable [0x4867d000] "ForkJoinPool-116-worker-3" 
daemon prio=6 tid=0x13bc8000 nid=0x1c10 in Object.wait() [0x4a8be000] "ForkJoinPool-116-worker-7" daemon prio=6 tid=0x13bc7800 nid=0x2910 waiting on condition [0x45e7f000] "ForkJoinPool-116-worker-5" daemon prio=6 tid=0x13bc6800 nid=0x3b1c waiting for monitor entry
[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
[ https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466154#comment-15466154 ] Yin Huai commented on SPARK-17404: -- Thanks! > [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R > --- > > Key: SPARK-17404 > URL: https://issues.apache.org/jira/browse/SPARK-17404 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.3 >Reporter: Yin Huai >Assignee: Sun Rui > Fix For: 1.6.3 > > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console > Seems showDF in test_sparkSQL.R is broken. > {code} > Failed > - > 1. Failure: showDF() (@test_sparkSQL.R#1400) > --- > `s` produced no output > {code}
[jira] [Closed] (SPARK-4957) TaskScheduler: when no resources are available: backoff after # of tries and crash.
[ https://issues.apache.org/jira/browse/SPARK-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout closed SPARK-4957. - Resolution: Won't Fix Closing this as won't fix since it's been open (with no activity) for a while > TaskScheduler: when no resources are available: backoff after # of tries and > crash. > --- > > Key: SPARK-4957 > URL: https://issues.apache.org/jira/browse/SPARK-4957 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 1.1.0 >Reporter: Nathan Bijnens > Labels: scheduler > > Currently the TaskSchedulerImpl retries scheduling if there are no resources > available. Unfortunately it keeps retrying with a small delay. It would make > sense to throw an exception after a number of tries, instead of hanging > indefinitely. > https://github.com/apache/spark/blob/cb0eae3b78d7f6f56c0b9521ee48564a4967d3de/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L164-175
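The behavior SPARK-4957 proposed, retrying the resource check with a small delay but failing fast after a bounded number of attempts rather than hanging indefinitely, can be sketched in a few lines of Scala. This is illustrative only (`waitForResources`, `maxTries`, and `delayMs` are made-up names, not Spark's actual scheduler API):

```scala
object BoundedRetry {
  // Poll hasResources up to maxTries times, sleeping delayMs between polls.
  // Returns the attempt number on which resources appeared, or throws once
  // the retry budget is exhausted instead of looping forever.
  def waitForResources(hasResources: () => Boolean,
                       maxTries: Int,
                       delayMs: Long = 10L): Int = {
    var attempt = 0
    while (attempt < maxTries) {
      attempt += 1
      if (hasResources()) return attempt
      Thread.sleep(delayMs)
    }
    throw new IllegalStateException(s"No resources available after $maxTries attempts")
  }
}
```

The design trade-off the ticket wrestles with is visible here: a crash surfaces misconfiguration quickly, but a cluster that is merely slow to provision executors would also be killed, which is presumably part of why the issue was closed as Won't Fix.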
[jira] [Resolved] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
[ https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-17404. --- Resolution: Fixed Assignee: Sun Rui Fix Version/s: 1.6.3 Fixed by backporting https://github.com/apache/spark/pull/12867 > [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R > --- > > Key: SPARK-17404 > URL: https://issues.apache.org/jira/browse/SPARK-17404 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.3 >Reporter: Yin Huai >Assignee: Sun Rui > Fix For: 1.6.3 > > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console > Seems showDF in test_sparkSQL.R is broken. > {code} > Failed > - > 1. Failure: showDF() (@test_sparkSQL.R#1400) > --- > `s` produced no output > {code}
[jira] [Commented] (SPARK-17072) generate table level stats:stats generation/storing/loading
[ https://issues.apache.org/jira/browse/SPARK-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466082#comment-15466082 ] Zhenhua Wang commented on SPARK-17072: -- My JIRA account name is ZenWzh, thanks! > generate table level stats:stats generation/storing/loading > --- > > Key: SPARK-17072 > URL: https://issues.apache.org/jira/browse/SPARK-17072 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > Fix For: 2.1.0 > > > need to generate, store, and load statistics information into/from the > meta store.
[jira] [Created] (SPARK-17405) Simple aggregation query OOMing after SPARK-16525
Josh Rosen created SPARK-17405: -- Summary: Simple aggregation query OOMing after SPARK-16525 Key: SPARK-17405 URL: https://issues.apache.org/jira/browse/SPARK-17405 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: Josh Rosen Priority: Blocker Prior to SPARK-16525 / https://github.com/apache/spark/pull/14176, the following query ran fine via Beeline / Thrift Server and the Spark shell, but after that patch it is consistently OOMING: {code} CREATE TEMPORARY VIEW table_1(double_col_1, boolean_col_2, timestamp_col_3, smallint_col_4, boolean_col_5, int_col_6, timestamp_col_7, varchar0008_col_8, int_col_9, string_col_10) AS ( SELECT * FROM (VALUES (CAST(-147.818640624 AS DOUBLE), CAST(NULL AS BOOLEAN), TIMESTAMP('2012-10-19 00:00:00.0'), CAST(9 AS SMALLINT), false, 77, TIMESTAMP('2014-07-01 00:00:00.0'), '-945', -646, '722'), (CAST(594.195125271 AS DOUBLE), false, TIMESTAMP('2016-12-04 00:00:00.0'), CAST(NULL AS SMALLINT), CAST(NULL AS BOOLEAN), CAST(NULL AS INT), TIMESTAMP('1999-12-26 00:00:00.0'), '250', -861, '55'), (CAST(-454.171126363 AS DOUBLE), false, TIMESTAMP('2008-12-13 00:00:00.0'), CAST(NULL AS SMALLINT), false, -783, TIMESTAMP('2010-05-28 00:00:00.0'), '211', -959, CAST(NULL AS STRING)), (CAST(437.670945524 AS DOUBLE), true, TIMESTAMP('2011-10-16 00:00:00.0'), CAST(952 AS SMALLINT), true, 297, TIMESTAMP('2013-01-13 00:00:00.0'), '262', CAST(NULL AS INT), '936'), (CAST(-387.226759334 AS DOUBLE), false, TIMESTAMP('2019-10-03 00:00:00.0'), CAST(-496 AS SMALLINT), CAST(NULL AS BOOLEAN), -925, TIMESTAMP('2028-06-27 00:00:00.0'), '-657', 948, '18'), (CAST(-306.138230875 AS DOUBLE), true, TIMESTAMP('1997-10-07 00:00:00.0'), CAST(332 AS SMALLINT), false, 744, TIMESTAMP('1990-09-22 00:00:00.0'), '-345', 566, '-574'), (CAST(675.402140308 AS DOUBLE), false, TIMESTAMP('2017-06-26 00:00:00.0'), CAST(972 AS SMALLINT), true, CAST(NULL AS INT), TIMESTAMP('2026-06-10 00:00:00.0'), '518', 683, '-320'), (CAST(734.839647174 AS DOUBLE), true, 
TIMESTAMP('1995-06-01 00:00:00.0'), CAST(-792 AS SMALLINT), CAST(NULL AS BOOLEAN), CAST(NULL AS INT), TIMESTAMP('2021-07-11 00:00:00.0'), '-318', 564, '142') ) as t); CREATE TEMPORARY VIEW table_3(string_col_1, float_col_2, timestamp_col_3, boolean_col_4, timestamp_col_5, decimal3317_col_6) AS ( SELECT * FROM (VALUES ('88', CAST(191.92508 AS FLOAT), TIMESTAMP('1990-10-25 00:00:00.0'), false, TIMESTAMP('1992-11-02 00:00:00.0'), CAST(NULL AS DECIMAL(33,17))), ('-419', CAST(-13.477915 AS FLOAT), TIMESTAMP('1996-03-02 00:00:00.0'), true, CAST(NULL AS TIMESTAMP), -653.51000BD), ('970', CAST(-360.432 AS FLOAT), TIMESTAMP('2010-07-29 00:00:00.0'), false, TIMESTAMP('1995-09-01 00:00:00.0'), -936.48000BD), ('807', CAST(814.30756 AS FLOAT), TIMESTAMP('2019-11-06 00:00:00.0'), false, TIMESTAMP('1996-04-25 00:00:00.0'), 335.56000BD), ('-872', CAST(616.50525 AS FLOAT), TIMESTAMP('2011-08-28 00:00:00.0'), false, TIMESTAMP('2003-07-19 00:00:00.0'), -951.18000BD), ('-167', CAST(-875.35675 AS FLOAT), TIMESTAMP('1995-07-14 00:00:00.0'), false, TIMESTAMP('2005-11-29 00:00:00.0'), 224.89000BD) ) as t); SELECT CAST(MIN(t2.smallint_col_4) AS STRING) AS char_col, LEAD(MAX((-387) + (727.64)), 90) OVER (PARTITION BY COALESCE(t2.int_col_9, t2.smallint_col_4, t2.int_col_9) ORDER BY COALESCE(t2.int_col_9, t2.smallint_col_4, t2.int_col_9) DESC, CAST(MIN(t2.smallint_col_4) AS STRING)) AS decimal_col, COALESCE(t2.int_col_9, t2.smallint_col_4, t2.int_col_9) AS int_col FROM table_3 t1 INNER JOIN table_1 t2 ON (((t2.timestamp_col_3) = (t1.timestamp_col_5)) AND ((t2.string_col_10) = (t1.string_col_1))) AND ((t2.string_col_10) = (t1.string_col_1)) WHERE (t2.smallint_col_4) IN (t2.int_col_9, t2.int_col_9) GROUP BY COALESCE(t2.int_col_9, t2.smallint_col_4, t2.int_col_9); {code} Here's the OOM: {code} org.apache.hive.service.cli.HiveSQLException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 9, 
localhost): java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0 at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:100) at org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:783) at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:204) at org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:219) at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:104) at org.apache.spark.sql.execution.aggregate.HashAggregateExec.createHashMap(HashAggregateExec.scala:305) at
[jira] [Issue Comment Deleted] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
[ https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-17404: -- Comment: was deleted (was: I've kicked off https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/288/ after the cherry-pick. Lets watch this and see if it fixes things) > [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R > --- > > Key: SPARK-17404 > URL: https://issues.apache.org/jira/browse/SPARK-17404 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.3 >Reporter: Yin Huai > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console > Seems showDF in test_sparkSQL.R is broken. > {code} > Failed > - > 1. Failure: showDF() (@test_sparkSQL.R#1400) > --- > `s` produced no output > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
[ https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465927#comment-15465927 ] Shivaram Venkataraman commented on SPARK-17404: --- I've kicked off https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/288/ after the cherry-pick. Lets watch this and see if it fixes things > [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R > --- > > Key: SPARK-17404 > URL: https://issues.apache.org/jira/browse/SPARK-17404 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.3 >Reporter: Yin Huai > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console > Seems showDF in test_sparkSQL.R is broken. > {code} > Failed > - > 1. Failure: showDF() (@test_sparkSQL.R#1400) > --- > `s` produced no output > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
[ https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465915#comment-15465915 ] Shivaram Venkataraman commented on SPARK-17404: --- Hmm - we upgraded the version of testthat on the cluster last week. This could be related to that. I'm going to try and backport https://github.com/apache/spark/pull/12867 to branch-1.6 and see if it fixes things > [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R > --- > > Key: SPARK-17404 > URL: https://issues.apache.org/jira/browse/SPARK-17404 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.3 >Reporter: Yin Huai > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console > Seems showDF in test_sparkSQL.R is broken. > {code} > Failed > - > 1. Failure: showDF() (@test_sparkSQL.R#1400) > --- > `s` produced no output > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17400) MinMaxScaler.transform() outputs DenseVector by default, which causes poor performance
[ https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465845#comment-15465845 ] Frank Dai commented on SPARK-17400: --- Currently I'm haunted by this issue in production: I have 6 million users and 564 thousand items, and MinMaxScaler.transform() takes more than 22 hours for prediction, while ALS.fit() is actually fast. My cluster has 70 nodes and each node has 16 cores and 32GB memory. > MinMaxScaler.transform() outputs DenseVector by default, which causes poor > performance > -- > > Key: SPARK-17400 > URL: https://issues.apache.org/jira/browse/SPARK-17400 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 1.6.1, 1.6.2, 2.0.0 >Reporter: Frank Dai > > MinMaxScaler.transform() outputs DenseVector by default, which will cause > poor performance and consume a lot of memory. > The most important line of code is the following: > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala#L195 > I suggest that the code should calculate the number of non-zero elements in > advance, if the number of non-zero elements is less than half of the total > elements in the matrix, use SparseVector, otherwise use DenseVector > Or we can make it configurable by adding a parameter to > MinMaxScaler.transform(), for example MinMaxScaler.transform(isDense: > Boolean), so that users can decide whether their output result is dense or > sparse. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
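The suggestion in SPARK-17400 is to count the non-zero elements first and emit a sparse representation when fewer than half the entries are non-zero. A minimal plain-Java sketch of that decision rule follows; the class and method names are illustrative, not Spark's actual MinMaxScaler code:

```java
import java.util.Arrays;

// Illustrative stand-in for the dense-vs-sparse decision suggested in SPARK-17400.
public class VectorCompressor {
    // Returns true when a sparse layout (indices + values) would be chosen:
    // fewer than half of the entries are non-zero.
    public static boolean preferSparse(double[] values) {
        long nnz = Arrays.stream(values).filter(v -> v != 0.0).count();
        return nnz * 2 < values.length;
    }

    // Builds the index half of a sparse (indices, values) representation.
    public static int[] nonZeroIndices(double[] values) {
        return java.util.stream.IntStream.range(0, values.length)
                .filter(i -> values[i] != 0.0)
                .toArray();
    }

    public static void main(String[] args) {
        double[] mostlyZero = {0.0, 0.0, 3.5, 0.0, 0.0, 1.2};
        double[] mostlyDense = {1.0, 2.0, 3.0, 0.0};
        System.out.println(preferSparse(mostlyZero));   // true: 2 of 6 non-zero
        System.out.println(preferSparse(mostlyDense));  // false: 3 of 4 non-zero
        System.out.println(Arrays.toString(nonZeroIndices(mostlyZero))); // [2, 5]
    }
}
```

The same half-full threshold is what the comment proposes for choosing SparseVector over DenseVector in transform().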
[jira] [Comment Edited] (SPARK-17400) MinMaxScaler.transform() outputs DenseVector by default, which causes poor performance
[ https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465845#comment-15465845 ] Frank Dai edited comment on SPARK-17400 at 9/5/16 9:52 PM: --- Currently I'm blocked by this issue in production, I have 6 millions users and 564 thousands items and the MinMaxScaler.transform() takes more than 22 hours for prediction, the ALS.fit() actually is fast. My cluster has 70 nodes and each node has 16 cores and 32GB memory. was (Author: soulmachine): Currently I'm haunted by this issue in production, I have 6 millions users and 564 thousands items and the MinMaxScaler.transform() takes more than 22 hours for prediction, the ALS.fit() actually is fast. My cluster has 70 nodes and each node has 16 cores and 32GB memory. > MinMaxScaler.transform() outputs DenseVector by default, which causes poor > performance > -- > > Key: SPARK-17400 > URL: https://issues.apache.org/jira/browse/SPARK-17400 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 1.6.1, 1.6.2, 2.0.0 >Reporter: Frank Dai > > MinMaxScaler.transform() outputs DenseVector by default, which will cause > poor performance and consume a lot of memory. > The most important line of code is the following: > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala#L195 > I suggest that the code should calculate the number of non-zero elements in > advance, if the number of non-zero elements is less than half of the total > elements in the matrix, use SparseVector, otherwise use DenseVector > Or we can make it configurable by adding a parameter to > MinMaxScaler.transform(), for example MinMaxScaler.transform(isDense: > Boolean), so that users can decide whether their output result is dense or > sparse. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14649) DagScheduler re-starts all running tasks on fetch failure
[ https://issues.apache.org/jira/browse/SPARK-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-14649: --- Description: When a fetch failure occurs, the DAGScheduler re-launches the previous stage (to re-generate output that was missing), and then re-launches all tasks in the stage with the fetch failure that hadn't *completed* when the fetch failure occurred (the DAGScheduler re-launches all of the tasks whose output data is not available -- which is equivalent to the set of tasks that hadn't yet completed). The assumption when this code was originally written was that when a fetch failure occurred, the output from at least one of the tasks in the previous stage was no longer available, so all of the tasks in the current stage would eventually fail due to not being able to access that output. This assumption does not hold for some large-scale, long-running workloads. E.g., there's one use case where a job has ~100k tasks that each run for about 1 hour, and only the first 5-10 minutes are spent fetching data. Because of the large number of tasks, it's very common to see a few tasks fail in the fetch phase, and it's wasteful to re-run other tasks that had finished fetching data so aren't affected by the fetch failure (and may be most of the way through their hour-long execution). The DAGScheduler should not re-start these tasks. was:When running a job we found out that there are many duplicate tasks running after fetch failure in a stage. The issue is that when submitting tasks for a stage, the dag scheduler submits all the pending tasks (tasks whose output is not available). But out of those pending tasks, some tasks might already be running on the cluster. The dag scheduler needs to submit only non-running tasks for a stage.
Summary: DagScheduler re-starts all running tasks on fetch failure (was: DagScheduler runs duplicate tasks on fetch failure) > DagScheduler re-starts all running tasks on fetch failure > - > > Key: SPARK-14649 > URL: https://issues.apache.org/jira/browse/SPARK-14649 > Project: Spark > Issue Type: Bug > Components: Scheduler >Reporter: Sital Kedia > > When a fetch failure occurs, the DAGScheduler re-launches the previous stage > (to re-generate output that was missing), and then re-launches all tasks in > the stage with the fetch failure that hadn't *completed* when the fetch > failure occurred (the DAGScheduler re-lanches all of the tasks whose output > data is not available -- which is equivalent to the set of tasks that hadn't > yet completed). > The assumption when this code was originally written was that when a fetch > failure occurred, the output from at least one of the tasks in the previous > stage was no longer available, so all of the tasks in the current stage would > eventually fail due to not being able to access that output. This assumption > does not hold for some large-scale, long-running workloads. E.g., there's > one use case where a job has ~100k tasks that each run for about 1 hour, and > only the first 5-10 minutes are spent fetching data. Because of the large > number of tasks, it's very common to see a few tasks fail in the fetch phase, > and it's wasteful to re-run other tasks that had finished fetching data so > aren't affected by the fetch failure (and may be most of the way through > their hour-long execution). The DAGScheduler should not re-start these tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
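The fix described in SPARK-14649 amounts to resubmitting only the pending tasks (those whose output is missing) that are not currently running, rather than all pending tasks. A toy sketch of that set arithmetic, with hypothetical names (the real DAGScheduler tracks partitions and stage attempts, not bare integer IDs):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of "submit only non-running pending tasks" from SPARK-14649.
public class Resubmission {
    public static Set<Integer> tasksToResubmit(Set<Integer> pending, Set<Integer> running) {
        Set<Integer> toSubmit = new HashSet<>(pending); // tasks whose output is missing
        toSubmit.removeAll(running);                    // ...minus those still executing
        return toSubmit;
    }

    public static void main(String[] args) {
        Set<Integer> pending = Set.of(1, 2, 3, 4); // output not yet available
        Set<Integer> running = Set.of(2, 4);       // still running on the cluster
        System.out.println(tasksToResubmit(pending, running)); // only tasks 1 and 3
    }
}
```

Under the old behavior the whole `pending` set would be resubmitted, duplicating tasks 2 and 4, which is exactly the waste the description calls out.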
[jira] [Commented] (SPARK-17302) Cannot set non-Spark SQL session variables in hive-site.xml, spark-defaults.conf, or using --conf
[ https://issues.apache.org/jira/browse/SPARK-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465818#comment-15465818 ] Sean Zhong commented on SPARK-17302: [~rdblue] Can you write a reproducer code sample to describe your problem? That is, what behavior do you expect, and what do you currently observe? I am not sure whether {{spark.conf.set("key", "value")}} is what you expect or not. > Cannot set non-Spark SQL session variables in hive-site.xml, > spark-defaults.conf, or using --conf > - > > Key: SPARK-17302 > URL: https://issues.apache.org/jira/browse/SPARK-17302 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Ryan Blue > > When configuration changed for 2.0 to the new SparkSession structure, Spark > stopped using Hive's internal HiveConf for session state and now uses > HiveSessionState and an associated SQLConf. Now, session options like > hive.exec.compress.output and hive.exec.dynamic.partition.mode are pulled > from this SQLConf. This doesn't include session properties from hive-site.xml > (including hive.exec.compress.output), and no longer contains Spark-specific > overrides from spark-defaults.conf that used the spark.hadoop.hive... pattern. > Also, setting these variables on the command-line no longer works because > settings must start with "spark.". > Is there a recommended way to set Hive session properties? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
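The report above refers to the spark.hadoop.hive... pattern, where properties carrying a spark.hadoop. prefix are forwarded into the Hadoop/Hive configuration with the prefix stripped. A standalone sketch of that prefix-stripping convention (illustrative only, not Spark's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrates the spark.hadoop.* forwarding convention referenced in SPARK-17302:
// keys prefixed with "spark.hadoop." are copied into the Hadoop/Hive configuration
// with the prefix removed; other Spark settings are left alone.
public class HadoopPropForwarder {
    static final String PREFIX = "spark.hadoop.";

    public static Map<String, String> extractHadoopProps(Map<String, String> sparkConf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                hadoopConf.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.hadoop.hive.exec.compress.output", "true");
        conf.put("spark.master", "local[*]");
        // Only the prefixed key survives, renamed to hive.exec.compress.output.
        System.out.println(extractHadoopProps(conf));
    }
}
```

This also shows why plain --conf settings that don't start with "spark." are rejected: they never match any recognized prefix.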
[jira] [Comment Edited] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins
[ https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465808#comment-15465808 ] Herman van Hovell edited comment on SPARK-17403 at 9/5/16 9:24 PM: --- Well then we sorta know where to look. These things are notoriously hard to debug. was (Author: hvanhovell): Well then we sorta know where to look. > Fatal Error: SIGSEGV on Jdbc joins > -- > > Key: SPARK-17403 > URL: https://issues.apache.org/jira/browse/SPARK-17403 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: Spark standalone cluster (3 Workers, 47 cores) > Ubuntu 14 > Java 8 >Reporter: Ruben Hernando > > The process creates views from JDBC (SQL server) source and combines them to > create other views. > Finally it dumps results via JDBC > Error: > {quote} > # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build > 1.8.0_101-b13) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode > linux-amd64 ) > # Problematic frame: > # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 > bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc] > # > {quote} > SQL Query plan (fields truncated): > {noformat} > == Parsed Logical Plan == > 'Project [*] > +- 'UnresolvedRelation `COEQ_63` > == Analyzed Logical Plan == > InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int > Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, > priceId#20244] > +- SubqueryAlias coeq_63 >+- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- SubqueryAlias 6__input > +- > Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON# > .. 
36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables]. ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables > TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where > _TP = 24) input) > == Optimized Logical Plan == > Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, > _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, > StorageLevel(disk, memory, deserialized, 1 replicas) >: +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], > [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], > [TPSLTables].[SL_ACT_GI_DTE], ... [...] FROM [sch].[SLTables] [SLTables] > JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = > [SLTables].[_TableSL_SID] where _TP = 24) input) > [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,... > 36 more fields] > == Physical Plan == > *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189] >: +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, > _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, > StorageLevel(disk, memory, deserialized, 1 replicas) >: : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], > [SLTables].[Name], [SLTables].[TableP_DCID], ... [...] 
FROM [sch].[SLTables] > [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = > [SLTables].[_TableSL_SID] where _TP = 24) input) > [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... > 36 more fields] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins
[ https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465808#comment-15465808 ] Herman van Hovell commented on SPARK-17403: --- Well then we sorta know where to look. > Fatal Error: SIGSEGV on Jdbc joins > -- > > Key: SPARK-17403 > URL: https://issues.apache.org/jira/browse/SPARK-17403 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: Spark standalone cluster (3 Workers, 47 cores) > Ubuntu 14 > Java 8 >Reporter: Ruben Hernando > > The process creates views from JDBC (SQL server) source and combines them to > create other views. > Finally it dumps results via JDBC > Error: > {quote} > # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build > 1.8.0_101-b13) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode > linux-amd64 ) > # Problematic frame: > # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 > bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc] > # > {quote} > SQL Query plan (fields truncated): > {noformat} > == Parsed Logical Plan == > 'Project [*] > +- 'UnresolvedRelation `COEQ_63` > == Analyzed Logical Plan == > InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int > Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, > priceId#20244] > +- SubqueryAlias coeq_63 >+- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- SubqueryAlias 6__input > +- > Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON# > .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables]. ... [...] 
FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables > TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where > _TP = 24) input) > == Optimized Logical Plan == > Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, > _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, > StorageLevel(disk, memory, deserialized, 1 replicas) >: +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], > [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], > [TPSLTables].[SL_ACT_GI_DTE], ... [...] FROM [sch].[SLTables] [SLTables] > JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = > [SLTables].[_TableSL_SID] where _TP = 24) input) > [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,... > 36 more fields] > == Physical Plan == > *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189] >: +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, > _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, > StorageLevel(disk, memory, deserialized, 1 replicas) >: : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], > [SLTables].[Name], [SLTables].[TableP_DCID], ... [...] FROM [sch].[SLTables] > [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = > [SLTables].[_TableSL_SID] where _TP = 24) input) > [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
> 36 more fields] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17364) Can not query hive table starting with number
[ https://issues.apache.org/jira/browse/SPARK-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465799#comment-15465799 ] Herman van Hovell commented on SPARK-17364: --- This should work. For instance this works: {{SELECT * from 20160826_ip_list}} > Can not query hive table starting with number > - > > Key: SPARK-17364 > URL: https://issues.apache.org/jira/browse/SPARK-17364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Egor Pahomov > > I can do it with spark-1.6.2 > {code} > SELECT * from temp.20160826_ip_list limit 100 > {code} > {code} > Error: org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input '.20160826' expecting {, ',', 'SELECT', 'FROM', 'ADD', > 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', > 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', > 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', > 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 
'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 19) > == SQL == > SELECT * from temp.20160826_ip_list limit 100 > ---^^^ > SQLState: null > ErrorCode: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
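With the ANTLR-based parser in Spark 2.0, an identifier that starts with a digit generally needs backticks (e.g. temp.`20160826_ip_list`), which is why the unquoted query above fails to parse. A small illustrative helper that adds backticks when a name would not lex as a plain identifier (hypothetical, not Spark's quoting code):

```java
// Illustrative helper for SPARK-17364: backtick-quote names that would not
// lex as plain identifiers, such as ones starting with a digit.
public class IdentifierQuoter {
    public static String quoteIfNeeded(String name) {
        // A plain identifier: letter or underscore first, then letters/digits/underscores.
        boolean plain = name.matches("[a-zA-Z_][a-zA-Z0-9_]*");
        return plain ? name : "`" + name + "`";
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("20160826_ip_list")); // `20160826_ip_list`
        System.out.println(quoteIfNeeded("ip_list"));          // ip_list
    }
}
```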
[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
[ https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465781#comment-15465781 ] Yin Huai commented on SPARK-17404: -- cc [~felixcheung] and [~shivaram] > [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R > --- > > Key: SPARK-17404 > URL: https://issues.apache.org/jira/browse/SPARK-17404 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.6.3 >Reporter: Yin Huai > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console > Seems showDF in test_sparkSQL.R is broken. > {code} > Failed > - > 1. Failure: showDF() (@test_sparkSQL.R#1400) > --- > `s` produced no output > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
Yin Huai created SPARK-17404: Summary: [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R Key: SPARK-17404 URL: https://issues.apache.org/jira/browse/SPARK-17404 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.6.3 Reporter: Yin Huai https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console Seems showDF in test_sparkSQL.R is broken. {code} Failed - 1. Failure: showDF() (@test_sparkSQL.R#1400) --- `s` produced no output {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17364) Can not query hive table starting with number
[ https://issues.apache.org/jira/browse/SPARK-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465754#comment-15465754 ] Sean Zhong edited comment on SPARK-17364 at 9/5/16 8:47 PM: [~epahomov] Spark 2.0 rewrote the Sql parser with antlr. You can use the backticked version like Yuming's example. was (Author: clockfly): [~epahomov] Spark 2.0 rewrites the Sql parser with antlr. You can use the backticked version like Yuming's example. > Can not query hive table starting with number > - > > Key: SPARK-17364 > URL: https://issues.apache.org/jira/browse/SPARK-17364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Egor Pahomov > > I can do it with spark-1.6.2 > {code} > SELECT * from temp.20160826_ip_list limit 100 > {code} > {code} > Error: org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input '.20160826' expecting {, ',', 'SELECT', 'FROM', 'ADD', > 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', > 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', > 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', > 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 
'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 19) > == SQL == > SELECT * from temp.20160826_ip_list limit 100 > ---^^^ > SQLState: null > ErrorCode: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17364) Can not query hive table starting with number
[ https://issues.apache.org/jira/browse/SPARK-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465754#comment-15465754 ] Sean Zhong commented on SPARK-17364: [~epahomov] Spark 2.0 rewrites the Sql parser with antlr. You can use the backticked version like Yuming's example. > Can not query hive table starting with number > - > > Key: SPARK-17364 > URL: https://issues.apache.org/jira/browse/SPARK-17364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Egor Pahomov > > I can do it with spark-1.6.2 > {code} > SELECT * from temp.20160826_ip_list limit 100 > {code} > {code} > Error: org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input '.20160826' expecting {, ',', 'SELECT', 'FROM', 'ADD', > 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', > 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', > 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', > 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 
'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 19) > == SQL == > SELECT * from temp.20160826_ip_list limit 100 > ---^^^ > SQLState: null > ErrorCode: 0 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
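The backtick workaround suggested in the comments above can be sketched as follows. Only the table name comes from the report; the `spark` session referenced in the commented-out line is a hypothetical live SparkSession, so the sketch just builds and prints the query string:

```scala
// Backtick-quoting makes a digit-leading identifier legal under the
// ANTLR-based parser introduced in Spark 2.0. The table name is from the
// report; `spark` (commented out below) is an assumed live session.
val table = "temp.`20160826_ip_list`"          // backticks around the digit-leading name
val query = s"SELECT * FROM $table LIMIT 100"
// spark.sql(query).show(100)                  // run this against a real session
println(query)                                 // SELECT * FROM temp.`20160826_ip_list` LIMIT 100
```

Without the backticks, the parser tokenizes `temp.20160826_ip_list` as `temp` followed by the numeric literal `.20160826`, which is exactly the "extraneous input '.20160826'" error quoted above.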
[jira] [Commented] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins
[ https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465539#comment-15465539 ] Ruben Hernando commented on SPARK-17403: You're right. After removing cache(), the errors disappeared. > Fatal Error: SIGSEGV on Jdbc joins > -- > > Key: SPARK-17403 > URL: https://issues.apache.org/jira/browse/SPARK-17403 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: Spark standalone cluster (3 Workers, 47 cores) > Ubuntu 14 > Java 8 >Reporter: Ruben Hernando > > The process creates views from JDBC (SQL server) source and combines them to > create other views. > Finally it dumps results via JDBC > Error: > {quote} > # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build > 1.8.0_101-b13) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode > linux-amd64 ) > # Problematic frame: > # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 > bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc] > # > {quote} > SQL Query plan (fields truncated): > {noformat} > == Parsed Logical Plan == > 'Project [*] > +- 'UnresolvedRelation `COEQ_63` > == Analyzed Logical Plan == > InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int > Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, > priceId#20244] > +- SubqueryAlias coeq_63 >+- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- SubqueryAlias 6__input > +- > Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON# > .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables]. ... [...] 
FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables > TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where > _TP = 24) input) > == Optimized Logical Plan == > Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, > _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, > StorageLevel(disk, memory, deserialized, 1 replicas) >: +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], > [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], > [TPSLTables].[SL_ACT_GI_DTE], ... [...] FROM [sch].[SLTables] [SLTables] > JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = > [SLTables].[_TableSL_SID] where _TP = 24) input) > [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,... > 36 more fields] > == Physical Plan == > *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189] >: +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, > _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, > StorageLevel(disk, memory, deserialized, 1 replicas) >: : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], > [SLTables].[Name], [SLTables].[TableP_DCID], ... [...] FROM [sch].[SLTables] > [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = > [SLTables].[_TableSL_SID] where _TP = 24) input) > [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
> 36 more fields] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
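A sketch of the workaround confirmed above: build the JDBC-backed view without cache(), so the join runs directly against the JDBC scan instead of the InMemoryRelation. Everything here is illustrative, not taken from the report: the app name, JDBC URL, subquery, and output table are placeholders, and the inner SELECT is truncated in the original plan, so it stays elided.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: URL, subquery, and table names are placeholders.
val spark = SparkSession.builder().appName("jdbc-views").getOrCreate()

val tables = spark.read.format("jdbc")
  .option("url", "jdbc:sqlserver://host:1433;databaseName=db")   // hypothetical URL
  .option("dbtable", "(select ... ) input")                      // truncated in the report
  .load()

// tables.cache()  // <- caching preceded the SIGSEGV in this report; leave it out
tables.createOrReplaceTempView("COEQ_63")

spark.sql("SELECT * FROM COEQ_63").write.format("jdbc")
  .option("url", "jdbc:sqlserver://host:1433;databaseName=db")   // hypothetical URL
  .option("dbtable", "results")                                  // hypothetical output table
  .save()
```

Dropping cache() trades repeated JDBC reads for avoiding the unsafe in-memory columnar path where the crash occurs, which matches the confirmation above that the errors disappeared.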
[jira] [Commented] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins
[ https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465383#comment-15465383 ] Herman van Hovell commented on SPARK-17403: --- [~rhernando] Could you try this without caching, and see if you run into trouble? > Fatal Error: SIGSEGV on Jdbc joins > -- > > Key: SPARK-17403 > URL: https://issues.apache.org/jira/browse/SPARK-17403 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: Spark standalone cluster (3 Workers, 47 cores) > Ubuntu 14 > Java 8 >Reporter: Ruben Hernando > > The process creates views from JDBC (SQL server) source and combines them to > create other views. > Finally it dumps results via JDBC > Error: > {quote} > # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build > 1.8.0_101-b13) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode > linux-amd64 ) > # Problematic frame: > # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 > bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc] > # > {quote} > SQL Query plan (fields truncated): > {noformat} > == Parsed Logical Plan == > 'Project [*] > +- 'UnresolvedRelation `COEQ_63` > == Analyzed Logical Plan == > InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int > Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, > priceId#20244] > +- SubqueryAlias coeq_63 >+- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- SubqueryAlias 6__input > +- > Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON# > .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables]. ... [...] 
FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables > TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where > _TP = 24) input) > == Optimized Logical Plan == > Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, > _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, > StorageLevel(disk, memory, deserialized, 1 replicas) >: +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], > [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], > [TPSLTables].[SL_ACT_GI_DTE], ... [...] FROM [sch].[SLTables] [SLTables] > JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = > [SLTables].[_TableSL_SID] where _TP = 24) input) > [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,... > 36 more fields] > == Physical Plan == > *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS > price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] > +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189] >: +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, > _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, > StorageLevel(disk, memory, deserialized, 1 replicas) >: : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], > [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], > [SLTables].[Name], [SLTables].[TableP_DCID], ... [...] FROM [sch].[SLTables] > [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = > [SLTables].[_TableSL_SID] where _TP = 24) input) > [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
> 36 more fields] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table
[ https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465375#comment-15465375 ] Ryan Blue commented on SPARK-17396: --- I'm not sure that the ForkJoinPool is to blame. Each partition in a Hive table becomes a partition RDD in the union. Each UnionRDD creates a new ForkJoinPool (parallelism=8) to list the files in a partition in parallel. I think it is more likely that the problem is that we don't use a shared ForkJoinPool, but instead create a new one for each RDD that doesn't get cleaned up until the UnionRDD is cleaned. Even if the ForkJoinPool can't reuse threads and we're getting more than 8, it wouldn't grow to thousands if we weren't creating so many pools. But, I agree that we should consider a different executor service. I've been reading up on ForkJoinPool and it looks like it's not a great implementation; just trying to determine the maximum number of threads it will use was painfully undocumented. [~pin_zhang], can you post a list of the thread names? We can use the names to determine how many threads there are in each pool. That should tell us whether we should use a different executor service in addition to using a shared pool. > Threads number keep increasing when query on external CSV partitioned table > --- > > Key: SPARK-17396 > URL: https://issues.apache.org/jira/browse/SPARK-17396 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: pin_zhang > > 1. Create a external partitioned table row format CSV > 2. Add 16 partitions to the table > 3. Run SQL "select count(*) from test_csv" > 4. ForkJoinThread number keep increasing > This happend when table partitions number greater than 10. > 5. 
Test Code > {code:lang=java} > import org.apache.spark.SparkConf > import org.apache.spark.SparkContext > import org.apache.spark.sql.hive.HiveContext > object Bugs { > def main(args: Array[String]): Unit = { > val location = "file:///g:/home/test/csv" > val create = s"""CREATE EXTERNAL TABLE test_csv > (ID string, SEQ string ) > PARTITIONED BY(index int) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' > LOCATION "${location}" > """ > val add_part = s""" > ALTER TABLE test_csv ADD > PARTITION (index=1)LOCATION '${location}/index=1' > PARTITION (index=2)LOCATION '${location}/index=2' > PARTITION (index=3)LOCATION '${location}/index=3' > PARTITION (index=4)LOCATION '${location}/index=4' > PARTITION (index=5)LOCATION '${location}/index=5' > PARTITION (index=6)LOCATION '${location}/index=6' > PARTITION (index=7)LOCATION '${location}/index=7' > PARTITION (index=8)LOCATION '${location}/index=8' > PARTITION (index=9)LOCATION '${location}/index=9' > PARTITION (index=10)LOCATION '${location}/index=10' > PARTITION (index=11)LOCATION '${location}/index=11' > PARTITION (index=12)LOCATION '${location}/index=12' > PARTITION (index=13)LOCATION '${location}/index=13' > PARTITION (index=14)LOCATION '${location}/index=14' > PARTITION (index=15)LOCATION '${location}/index=15' > PARTITION (index=16)LOCATION '${location}/index=16' > """ > val conf = new SparkConf().setAppName("scala").setMaster("local[2]") > conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse") > val ctx = new SparkContext(conf) > val hctx = new HiveContext(ctx) > hctx.sql(create) > hctx.sql(add_part) > for (i <- 1 to 6) { > new Query(hctx).start() > } > } > class Query(htcx: HiveContext) extends Thread { > setName("Query-Thread") > override def run = { > while (true) { > htcx.sql("select count(*) from test_csv").show() > Thread.sleep(100) > } > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For 
additional commands, e-mail: issues-h...@spark.apache.org
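The shared-pool alternative described in the comment above can be sketched in plain Scala. This is not Spark's code: `ListingPool` and the doubling task are illustrative stand-ins for the per-partition file-listing work. The point is that all sixteen "partitions" reuse one ForkJoinPool of parallelism 8 instead of each constructing its own:

```scala
import java.util.concurrent.{Callable, ForkJoinPool}

// One process-wide pool reused by every listing task (illustrative name).
object ListingPool {
  // parallelism = 8, matching the per-UnionRDD pools described in the comment
  val shared = new ForkJoinPool(8)
}

// Sixteen "partitions" submit work to the shared pool; because no new pool is
// created per RDD, the total thread count stays bounded by one pool's parallelism.
val results = (1 to 16).map { part =>
  ListingPool.shared.submit(new Callable[Int] {
    override def call(): Int = part * 2  // stand-in for listing one partition's files
  })
}.map(_.get())

println(results.sum)  // 272
ListingPool.shared.shutdown()
```

With one pool per UnionRDD, each table scan over N partitions can spawn up to N pools of 8 threads that live until the RDD is cleaned, which is consistent with the thread growth reported when the partition count exceeds 10.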
[jira] [Updated] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins
[ https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-17403: -- Description: The process creates views from JDBC (SQL server) source and combines them to create other views. Finally it dumps results via JDBC Error: {quote} # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 ) # Problematic frame: # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc] # {quote} SQL Query plan (fields truncated): {noformat} == Parsed Logical Plan == 'Project [*] +- 'UnresolvedRelation `COEQ_63` == Analyzed Logical Plan == InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, priceId#20244] +- SubqueryAlias coeq_63 +- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- SubqueryAlias 6__input +- Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON# .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables]. ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) == Optimized Logical Plan == Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, ID#146, Name#147, ... 
36 more fields], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas) : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], [TPSLTables].[SL_ACT_GI_DTE], ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,... 36 more fields] == Physical Plan == *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189] : +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas) : : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], [SLTables].[Name], [SLTables].[TableP_DCID], ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 36 more fields] {noformat} was: The process creates views from JDBC (SQL server) source and combines them to create other views. 
Finally it dumps results via JDBC Error: {quote} # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 ) # Problematic frame: # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc] # {quote} SQL Query plan (fields truncated): {quote} == Parsed Logical Plan == 'Project [*] +- 'UnresolvedRelation `COEQ_63` == Analyzed Logical Plan == InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, priceId#20244] +- SubqueryAlias coeq_63 +- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- SubqueryAlias 6__input +- Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON# .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables]. ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) == Optimized Logical Plan ==
[jira] [Updated] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table
[ https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-17396: -- Description: 1. Create a external partitioned table row format CSV 2. Add 16 partitions to the table 3. Run SQL "select count(*) from test_csv" 4. ForkJoinThread number keep increasing This happend when table partitions number greater than 10. 5. Test Code {code:lang=java} import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.sql.hive.HiveContext object Bugs { def main(args: Array[String]): Unit = { val location = "file:///g:/home/test/csv" val create = s"""CREATE EXTERNAL TABLE test_csv (ID string, SEQ string ) PARTITIONED BY(index int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' LOCATION "${location}" """ val add_part = s""" ALTER TABLE test_csv ADD PARTITION (index=1)LOCATION '${location}/index=1' PARTITION (index=2)LOCATION '${location}/index=2' PARTITION (index=3)LOCATION '${location}/index=3' PARTITION (index=4)LOCATION '${location}/index=4' PARTITION (index=5)LOCATION '${location}/index=5' PARTITION (index=6)LOCATION '${location}/index=6' PARTITION (index=7)LOCATION '${location}/index=7' PARTITION (index=8)LOCATION '${location}/index=8' PARTITION (index=9)LOCATION '${location}/index=9' PARTITION (index=10)LOCATION '${location}/index=10' PARTITION (index=11)LOCATION '${location}/index=11' PARTITION (index=12)LOCATION '${location}/index=12' PARTITION (index=13)LOCATION '${location}/index=13' PARTITION (index=14)LOCATION '${location}/index=14' PARTITION (index=15)LOCATION '${location}/index=15' PARTITION (index=16)LOCATION '${location}/index=16' """ val conf = new SparkConf().setAppName("scala").setMaster("local[2]") conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse") val ctx = new SparkContext(conf) val hctx = new HiveContext(ctx) hctx.sql(create) hctx.sql(add_part) for (i <- 1 to 6) { new Query(hctx).start() } } class Query(htcx: 
HiveContext) extends Thread { setName("Query-Thread") override def run = { while (true) { htcx.sql("select count(*) from test_csv").show() Thread.sleep(100) } } } } {code} was: 1. Create a external partitioned table row format CSV 2. Add 16 partitions to the table 3. Run SQL "select count(*) from test_csv" 4. ForkJoinThread number keep increasing This happend when table partitions number greater than 10. 5. Test Code import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.sql.hive.HiveContext object Bugs { def main(args: Array[String]): Unit = { val location = "file:///g:/home/test/csv" val create = s"""CREATE EXTERNAL TABLE test_csv (ID string, SEQ string ) PARTITIONED BY(index int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' LOCATION "${location}" """ val add_part = s""" ALTER TABLE test_csv ADD PARTITION (index=1)LOCATION '${location}/index=1' PARTITION (index=2)LOCATION '${location}/index=2' PARTITION (index=3)LOCATION '${location}/index=3' PARTITION (index=4)LOCATION '${location}/index=4' PARTITION (index=5)LOCATION '${location}/index=5' PARTITION (index=6)LOCATION '${location}/index=6' PARTITION (index=7)LOCATION '${location}/index=7' PARTITION (index=8)LOCATION '${location}/index=8' PARTITION (index=9)LOCATION '${location}/index=9' PARTITION (index=10)LOCATION '${location}/index=10' PARTITION (index=11)LOCATION '${location}/index=11' PARTITION (index=12)LOCATION '${location}/index=12' PARTITION (index=13)LOCATION '${location}/index=13' PARTITION (index=14)LOCATION '${location}/index=14' PARTITION (index=15)LOCATION '${location}/index=15' PARTITION (index=16)LOCATION '${location}/index=16' """ val conf = new SparkConf().setAppName("scala").setMaster("local[2]") conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse") val ctx = new SparkContext(conf) val hctx = new HiveContext(ctx) hctx.sql(create) hctx.sql(add_part) for (i <- 1 to 6) { new Query(hctx).start() } } class Query(htcx: HiveContext) 
extends Thread { setName("Query-Thread") override def run = { while (true) { htcx.sql("select count(*) from test_csv").show() Thread.sleep(100) } } } } > Threads number keep increasing when query on external CSV partitioned table > --- > > Key: SPARK-17396 > URL: https://issues.apache.org/jira/browse/SPARK-17396 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465353#comment-15465353 ] Ruben Hernando commented on SPARK-15822: Created https://issues.apache.org/jira/browse/SPARK-17403 > segmentation violation in o.a.s.unsafe.types.UTF8String > > > Key: SPARK-15822 > URL: https://issues.apache.org/jira/browse/SPARK-15822 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: linux amd64 > openjdk version "1.8.0_91" > OpenJDK Runtime Environment (build 1.8.0_91-b14) > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode) >Reporter: Pete Robbins >Assignee: Herman van Hovell >Priority: Blocker > Fix For: 2.0.0 > > > Executors fail with segmentation violation while running application with > spark.memory.offHeap.enabled true > spark.memory.offHeap.size 512m > Also now reproduced with > spark.memory.offHeap.enabled false > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400 > # > # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14) > # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # J 4816 C2 > org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I > (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d] > {noformat} > We initially saw this on IBM java on PowerPC box but is recreatable on linux > with OpenJDK. 
On linux with IBM Java 8 we see a null pointer exception at the > same code point: > {noformat} > 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48) > java.lang.NullPointerException > at > org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831) > at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664) > at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.lang.Thread.run(Thread.java:785) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins
Ruben Hernando created SPARK-17403: -- Summary: Fatal Error: SIGSEGV on Jdbc joins Key: SPARK-17403 URL: https://issues.apache.org/jira/browse/SPARK-17403 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Environment: Spark standalone cluster (3 Workers, 47 cores) Ubuntu 14 Java 8 Reporter: Ruben Hernando The process creates views from JDBC (SQL server) source and combines them to create other views. Finally it dumps results via JDBC Error: {quote} # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 ) # Problematic frame: # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc] # {quote} SQL Query plan (fields truncated): {quote} == Parsed Logical Plan == 'Project [*] +- 'UnresolvedRelation `COEQ_63` == Analyzed Logical Plan == InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, priceId#20244] +- SubqueryAlias coeq_63 +- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- SubqueryAlias 6__input +- Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON# .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables]. ... [...] 
FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) == Optimized Logical Plan == Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas) : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], [TPSLTables].[SL_ACT_GI_DTE], ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,... 36 more fields] == Physical Plan == *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189] : +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas) : : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], [SLTables].[Name], [SLTables].[TableP_DCID], ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
36 more fields] {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465342#comment-15465342 ] Ruben Hernando commented on SPARK-15822: Thank you! I'll do it > segmentation violation in o.a.s.unsafe.types.UTF8String > > > Key: SPARK-15822 > URL: https://issues.apache.org/jira/browse/SPARK-15822 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: linux amd64 > openjdk version "1.8.0_91" > OpenJDK Runtime Environment (build 1.8.0_91-b14) > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode) >Reporter: Pete Robbins >Assignee: Herman van Hovell >Priority: Blocker > Fix For: 2.0.0 > > > Executors fail with segmentation violation while running application with > spark.memory.offHeap.enabled true > spark.memory.offHeap.size 512m > Also now reproduced with > spark.memory.offHeap.enabled false > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400 > # > # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14) > # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # J 4816 C2 > org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I > (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d] > {noformat} > We initially saw this on IBM java on PowerPC box but is recreatable on linux > with OpenJDK. 
On linux with IBM Java 8 we see a null pointer exception at the > same code point: > {noformat} > 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48) > java.lang.NullPointerException > at > org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831) > at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664) > at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.lang.Thread.run(Thread.java:785) > {noformat}
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465338#comment-15465338 ] Herman van Hovell commented on SPARK-15822: --- Could you open a new ticket for this one? It is definitely not connected to this or the broadcast join issue.
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465335#comment-15465335 ] Herman van Hovell commented on SPARK-15822: --- Thanks! Do you also have this problem when you don't cache the table?
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465316#comment-15465316 ] Ruben Hernando commented on SPARK-15822: {quote} == Parsed Logical Plan == 'Project [*] +- 'UnresolvedRelation `COEQ_63` == Analyzed Logical Plan == InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, priceId#20244] +- SubqueryAlias coeq_63 +- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- SubqueryAlias 6__input +- Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON# .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables]. ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) == Optimized Logical Plan == Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas) : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], [TPSLTables].[SL_ACT_GI_DTE], ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,... 
36 more fields] == Physical Plan == *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244] +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189] : +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, StorageLevel(disk, memory, deserialized, 1 replicas) : : +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], [SLTables].[Name], [SLTables].[TableP_DCID], ... [...] FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP = 24) input) [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,... 36 more fields] {quote}
[jira] [Commented] (SPARK-16992) Pep8 code style
[ https://issues.apache.org/jira/browse/SPARK-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465314#comment-15465314 ] Apache Spark commented on SPARK-16992: -- User 'Stibbons' has created a pull request for this issue: https://github.com/apache/spark/pull/14963 > Pep8 code style > --- > > Key: SPARK-16992 > URL: https://issues.apache.org/jira/browse/SPARK-16992 > Project: Spark > Issue Type: Improvement >Reporter: Semet > > Add code style checks and auto-formatting to the Python code. > Features: > - add a {{.editorconfig}} file (Spark's Scala files use 2-space indentation, > while Python files use 4) for compatible editors (almost every editor has a > plugin to support {{.editorconfig}} files) > - use autopep8 to fix basic pep8 mistakes > - use isort to automatically sort {{import}} statements and organise them > into logically linked order (see doc here). The most important thing is that > it splits import statements that load more than one object into several > lines, and it keeps the imports sorted. Said otherwise, for a given module > import, the line where it should be added is fixed. This increases the number > of lines in the file, but it greatly facilitates file maintenance and file > merges if needed. > - add a 'validate.sh' script in order to automate the correction (needs isort > and autopep8 installed) > You can see a similar script in production in the > [Buildbot|https://github.com/buildbot/buildbot/blob/master/common/validate.sh] > project.
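For illustration, here is the import-splitting behaviour the ticket describes, sketched in plain Python. The single-line-per-import layout is an assumption about the isort configuration (its force-single-line mode); the exact imports are just stdlib examples, not from the ticket:

```python
# Before isort (one statement importing several names):
#   from collections import OrderedDict, defaultdict, namedtuple
#
# After isort in force-single-line mode, each name sits on its own sorted
# line, so adding or removing an import touches exactly one predictable
# line, which keeps merge conflicts small:
from collections import OrderedDict
from collections import defaultdict
from collections import namedtuple

# The imports behave exactly as before; only the layout changed.
d = defaultdict(int)
d["spark"] += 1
print(d["spark"])
```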
[jira] [Comment Edited] (SPARK-17072) generate table level stats:stats generation/storing/loading
[ https://issues.apache.org/jira/browse/SPARK-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465257#comment-15465257 ] Herman van Hovell edited comment on SPARK-17072 at 9/5/16 3:36 PM: --- This is resolved by PR https://github.com/apache/spark/pull/14712 by Zhenhua Wang. If someone knows his JIRA account name, I would be happy to assign it to him. was (Author: hvanhovell): This is resolve by PR https://github.com/apache/spark/pull/14712 by Zhenhua Wang. If someone knows his JIRA account name, I would be happy to assign it to him. > generate table level stats: stats generation/storing/loading > --- > > Key: SPARK-17072 > URL: https://issues.apache.org/jira/browse/SPARK-17072 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ron Hu > Fix For: 2.1.0 > > > We need to generate, store, and load statistics information into/from the > metastore.
[jira] [Resolved] (SPARK-17072) generate table level stats:stats generation/storing/loading
[ https://issues.apache.org/jira/browse/SPARK-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-17072. --- Resolution: Fixed Fix Version/s: 2.1.0 This is resolved by PR https://github.com/apache/spark/pull/14712 by Zhenhua Wang. If someone knows his JIRA account name, I would be happy to assign it to him.
[jira] [Commented] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations
[ https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465226#comment-15465226 ] Wenchen Fan commented on SPARK-17154: - Can you check if your design satisfies the following requirements? 1. the self-join issue should be resolved (or at least give a better error message) 2. can reference a same-name column from different dataframes, e.g. {code} val df1 = ... val df2 = ... val joined = df1.join(df2, (df1("a") + 1) === df2("a")) joined.drop(df2("a")) {code} 3. can reference a column from a previous dataframe, e.g. {code} val df = ... df.filter(df("a") > 0).filter(df("b") > 0).filter(df("c") === 1) {code} > Wrong result can be returned or AnalysisException can be thrown after > self-join or similar operations > - > > Key: SPARK-17154 > URL: https://issues.apache.org/jira/browse/SPARK-17154 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.2, 2.0.0 >Reporter: Kousuke Saruta > Attachments: Solution_Proposal_SPARK-17154.pdf > > > When we join two DataFrames that originate from the same DataFrame, > operations on the joined DataFrame can fail. > One reproducible example is as follows. > {code} > val df = Seq( > (1, "a", "A"), > (2, "b", "B"), > (3, "c", "C"), > (4, "d", "D"), > (5, "e", "E")).toDF("col1", "col2", "col3") > val filtered = df.filter("col1 != 3").select("col1", "col2") > val joined = filtered.join(df, filtered("col1") === df("col1"), "inner") > val selected1 = joined.select(df("col3")) > {code} > In this case, AnalysisException is thrown. > Another example is as follows.
> {code} > val df = Seq( > (1, "a", "A"), > (2, "b", "B"), > (3, "c", "C"), > (4, "d", "D"), > (5, "e", "E")).toDF("col1", "col2", "col3") > val filtered = df.filter("col1 != 3").select("col1", "col2") > val rightOuterJoined = filtered.join(df, filtered("col1") === df("col1"), > "right") > val selected2 = rightOuterJoined.select(df("col1")) > selected2.show > {code} > In this case, we expect to get an answer like the following. > {code} > 1 > 2 > 3 > 4 > 5 > {code} > But the actual result is as follows. > {code} > 1 > 2 > null > 4 > 5 > {code} > The cause of the problems in the examples is that the logical plan related to > the right side DataFrame and the expressions of its output are re-created in > the analyzer (at the ResolveReferences rule) when a DataFrame has expressions > which share the same exprId. > The re-created expressions are equal to the original ones except for the > exprId. > This happens when we do a self-join or a similar pattern of operations. > In the first example, df("col3") returns a Column which includes an > expression, and the expression has an exprId (say id1 here). > After the join, the expression held by the right side DataFrame (df) is > re-created; the old and new expressions are equal but the exprId is renewed > (say id2 for the new exprId here). > Because of the mismatch of those exprIds, AnalysisException is thrown. > In the second example, df("col1") returns a column and the expression > contained in the column is assigned an exprId (say id3). > On the other hand, a column returned by filtered("col1") has an expression > which has the same exprId (id3). > After the join, the expressions in the right side DataFrame are re-created > and the expression assigned id3 is no longer present in the right side but is > present in the left side. > So, referring to df("col1") on the joined DataFrame, we get col1 of the left > side, which includes null.
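The exprId mismatch described above can be illustrated with a toy model in plain Python. This is not Spark's actual Catalyst code; it only mimics the one property that matters here: columns are resolved by exprId rather than by name, so re-creating one side's expressions with fresh exprIds makes an old Column reference unresolvable:

```python
import itertools

_ids = itertools.count()

def attr(name):
    # An attribute reference; resolution keys off exprId, not name.
    return {"name": name, "exprId": next(_ids)}

def resolve(output, column):
    # Find attributes in a plan's output matching the column's exprId.
    return [a for a in output if a["exprId"] == column["exprId"]]

col3 = attr("col3")                       # like df("col3"), holding id1
df_output = [attr("col1"), attr("col2"), col3]

# The analyzer re-creates the right side of a self-join: expressions equal
# to the originals, but with renewed exprIds (id1 -> id2 above).
recreated = [attr(a["name"]) for a in df_output]

print(len(resolve(df_output, col3)))   # resolves against the original output
print(len(resolve(recreated, col3)))   # fails: the AnalysisException case
```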
[jira] [Resolved] (SPARK-17399) Update specific data in mySql Table from Spark
[ https://issues.apache.org/jira/browse/SPARK-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17399. --- Resolution: Invalid Target Version/s: (was: 2.0.0) Please direct questions to the u...@spark.apache.org list, but you'd also need to be clearer about how this relates to Spark, as you're asking mostly about updating MySQL. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark > Update specific data in mySql Table from Spark > -- > > Key: SPARK-17399 > URL: https://issues.apache.org/jira/browse/SPARK-17399 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.0.0 > Environment: Linux >Reporter: Farman Ali > Labels: features > Original Estimate: 12h > Remaining Estimate: 12h > > I want to update specific data in a MySQL table from Spark DataFrames or > RDDs. > For example, I have columns > Name ID > Farman 1 > Ali 2 > Now I want to update the name where id=1. How can I update this specific > record? How is it possible in Spark?
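For readers with the same question: Spark's JDBC data source only appends or overwrites; it has no UPDATE mode. A common approach is to collect the changed rows (or use foreachPartition) and issue parameterized UPDATE statements through a plain database connection. A minimal sketch, using sqlite3 as a stand-in for MySQL; the table and column names are illustrative, not from the ticket:

```python
import sqlite3

def apply_updates(conn, updates):
    # updates: iterable of (new_name, id) pairs, e.g. rows collected from a
    # Spark DataFrame, pushed to the database as parameterized UPDATEs.
    with conn:  # the connection context manager commits on success
        conn.executemany("UPDATE people SET name = ? WHERE id = ?", updates)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Farman"), (2, "Ali")])

apply_updates(conn, [("Farman Ali", 1)])
print(conn.execute("SELECT name FROM people WHERE id = 1").fetchone()[0])
```

With MySQL the same pattern applies, swapping sqlite3 for a MySQL DB-API driver and running it inside foreachPartition so each executor opens its own connection.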
[jira] [Updated] (SPARK-17401) Scala IDE not able to connect to Spark Master
[ https://issues.apache.org/jira/browse/SPARK-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17401: -- Target Version/s: (was: 2.0.0) Priority: Major (was: Blocker) See https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark -- don't set blocker/target > Scala IDE not able to connect to Spark Master > - > > Key: SPARK-17401 > URL: https://issues.apache.org/jira/browse/SPARK-17401 > Project: Spark > Issue Type: Bug > Components: Build, Deploy, Examples >Affects Versions: 2.0.0 > Environment: Windows (x64 Bit) >Reporter: Sachin Gupta > > Able to compile Spark project in Scala IDE. We've started master using > "spark-shell" which works fine from console with master reachable at > http://localhost:4040. > When we execute project from IDE it says can't connect to port 4040 and tries > on 4041. On master console (DOS) following error is shown: > 16/09/05 17:40:35 WARN HttpParser: Illegal character 0x0 in state=START for > buff > er > HeapByteBuffer@773c8742[p=1,l=1302,c=16384,r=1301]={\x00<<<\x00\x00\x00\x00\x > 00\x05\x16\x03N\x8d\xFcA\xE1\xDb.\x16\x00...\x00\r192.168.13.61>>>\x00\x00\x00\x > 00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x0 > 0\x00\x00\x00\x00\x00\x00\x00\x00\x00} > 16/09/05 17:40:35 WARN HttpParser: badMessage: 400 Illegal character 0x0 for > Htt > pChannelOverHttp@660d0260{r=0,c=false,a=IDLE,uri=} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17402) separate the management of temp views and metastore tables/views in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17402: Assignee: Wenchen Fan (was: Apache Spark) > separate the management of temp views and metastore tables/views in > SessionCatalog > -- > > Key: SPARK-17402 > URL: https://issues.apache.org/jira/browse/SPARK-17402 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17402) separate the management of temp views and metastore tables/views in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17402: Assignee: Apache Spark (was: Wenchen Fan)
[jira] [Commented] (SPARK-17402) separate the management of temp views and metastore tables/views in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465085#comment-15465085 ] Apache Spark commented on SPARK-17402: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/14962
[jira] [Created] (SPARK-17402) separate the management of temp views and metastore tables/views in SessionCatalog
Wenchen Fan created SPARK-17402: --- Summary: separate the management of temp views and metastore tables/views in SessionCatalog Key: SPARK-17402 URL: https://issues.apache.org/jira/browse/SPARK-17402 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
[ https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17379: -- Assignee: Adam Roberts > Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible) > -- > > Key: SPARK-17379 > URL: https://issues.apache.org/jira/browse/SPARK-17379 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.6.2, 2.0.0 >Reporter: Adam Roberts >Assignee: Adam Roberts >Priority: Trivial > > We should use the newest version of netty based on the info here: > http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html; we are > especially interested in the static initialiser deadlock fix: > https://github.com/netty/netty/pull/5730 > Lots more fixes are mentioned, so I will create the pull request - again a > case of updating the pom and then the dependency files
[jira] [Assigned] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
[ https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17379: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
[ https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464976#comment-15464976 ] Apache Spark commented on SPARK-17379: -- User 'a-roberts' has created a pull request for this issue: https://github.com/apache/spark/pull/14961 > Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible) > -- > > Key: SPARK-17379 > URL: https://issues.apache.org/jira/browse/SPARK-17379 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.6.2, 2.0.0 >Reporter: Adam Roberts >Priority: Trivial > > We should use the newest version of netty based on info here: > http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html, especially > interested in the static initialiser deadlock fix: > https://github.com/netty/netty/pull/5730 > Lots more fixes mentioned so will create the pull request - again a case of > updating the pom and then the dependency files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
[ https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17379: Assignee: Apache Spark > Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible) > -- > > Key: SPARK-17379 > URL: https://issues.apache.org/jira/browse/SPARK-17379 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.6.2, 2.0.0 >Reporter: Adam Roberts >Assignee: Apache Spark >Priority: Trivial > > We should use the newest version of netty based on info here: > http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html, especially > interested in the static initialiser deadlock fix: > https://github.com/netty/netty/pull/5730 > Lots more fixes mentioned so will create the pull request - again a case of > updating the pom and then the dependency files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464962#comment-15464962 ] Herman van Hovell commented on SPARK-15822: --- Could you share the query plan? > segmentation violation in o.a.s.unsafe.types.UTF8String > > > Key: SPARK-15822 > URL: https://issues.apache.org/jira/browse/SPARK-15822 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: linux amd64 > openjdk version "1.8.0_91" > OpenJDK Runtime Environment (build 1.8.0_91-b14) > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode) >Reporter: Pete Robbins >Assignee: Herman van Hovell >Priority: Blocker > Fix For: 2.0.0 > > > Executors fail with segmentation violation while running application with > spark.memory.offHeap.enabled true > spark.memory.offHeap.size 512m > Also now reproduced with > spark.memory.offHeap.enabled false > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400 > # > # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14) > # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # J 4816 C2 > org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I > (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d] > {noformat} > We initially saw this on IBM java on PowerPC box but is recreatable on linux > with OpenJDK. 
On linux with IBM Java 8 we see a null pointer exception at the > same code point: > {noformat} > 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48) > java.lang.NullPointerException > at > org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831) > at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664) > at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.lang.Thread.run(Thread.java:785) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
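The failing frame above is a lexicographic byte comparison over a backing buffer. The following is a hypothetical, simplified sketch (not Spark's actual `UTF8String.compareTo` code) of that shape of comparison; if either backing array reference is null, the length access throws the same kind of NullPointerException shown in the IBM Java stack trace:

```java
// Hypothetical sketch of an unsigned, byte-wise lexicographic comparison,
// similar in shape to (but not copied from) UTF8String.compareTo.
public class ByteCompare {
    static int compare(byte[] a, byte[] b) {
        // NPE is raised here if a or b is null, i.e. the backing buffer
        // reference was lost -- analogous to the failure reported above.
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            // Mask to 0xFF so bytes compare as unsigned values.
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        // Equal prefixes: the shorter input sorts first.
        return a.length - b.length;
    }
}
```
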
[jira] [Commented] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
[ https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464925#comment-15464925 ] Adam Roberts commented on SPARK-17379: -- Thanks Artur, useful to know, no new tests fail with branch-1.6 and the above change either. > Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible) > -- > > Key: SPARK-17379 > URL: https://issues.apache.org/jira/browse/SPARK-17379 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.6.2, 2.0.0 >Reporter: Adam Roberts >Priority: Trivial > > We should use the newest version of netty based on info here: > http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html, especially > interested in the static initialiser deadlock fix: > https://github.com/netty/netty/pull/5730 > Lots more fixes mentioned so will create the pull request - again a case of > updating the pom and then the dependency files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
[ https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Roberts updated SPARK-17379: - Affects Version/s: 1.6.2 Component/s: Build > Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible) > -- > > Key: SPARK-17379 > URL: https://issues.apache.org/jira/browse/SPARK-17379 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 1.6.2, 2.0.0 >Reporter: Adam Roberts >Priority: Trivial > > We should use the newest version of netty based on info here: > http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html, especially > interested in the static initialiser deadlock fix: > https://github.com/netty/netty/pull/5730 > Lots more fixes mentioned so will create the pull request - again a case of > updating the pom and then the dependency files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
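The "updating the pom and then the dependency files" step the description mentions would look roughly like the following. This is an illustrative sketch only: the property name `netty.all.version` is an assumption, not a verified excerpt from Spark's parent pom.

```xml
<!-- Hypothetical excerpt of the parent pom version bump; the property
     name is assumed for illustration, not verified against Spark's pom. -->
<properties>
  <netty.all.version>4.0.41.Final</netty.all.version>
</properties>
```

The checked-in dependency manifest files would then be regenerated so they list the new netty-all artifact version.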
[jira] [Updated] (SPARK-16213) Reduce runtime overhead of a program that creates a primitive array in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-16213: - Affects Version/s: 2.0.0 > Reduce runtime overhead of a program that creates a primitive array in > DataFrame > - > > Key: SPARK-16213 > URL: https://issues.apache.org/jira/browse/SPARK-16213 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Kazuaki Ishizaki > > Reduce runtime overhead of a program that creates a primitive array in > DataFrame > When a program creates an array in DataFrame, the code generator creates > boxing operations. If an array is of a primitive type, there are some > opportunities for optimization in the generated code to reduce runtime overhead. > Here is a simple example whose generated code includes boxing operations: > {code} > val df = sparkContext.parallelize(Seq(0.0d, 1.0d), 1).toDF > df.selectExpr("Array(value + 1.1d, value + 2.2d)").show > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
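The boxing cost the issue describes can be illustrated with a minimal, hypothetical Java example (this is not Spark's generated code): a primitive `double[]` stores raw values in place, while building the same elements through an `Object`-typed path forces one heap-allocated `java.lang.Double` per element.

```java
// Hypothetical illustration of the boxing overhead described above.
public class BoxingDemo {
    // Primitive path: backed by a raw double[], no per-element allocation.
    static double[] primitiveArray(double value) {
        return new double[] { value + 1.1d, value + 2.2d };
    }

    // Boxed path: each element is auto-boxed to a java.lang.Double object.
    static Object[] boxedArray(double value) {
        return new Object[] { value + 1.1d, value + 2.2d };
    }
}
```

Generated code that can keep the primitive path avoids both the allocations and the later unboxing when the array elements are consumed.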
[jira] [Commented] (SPARK-17401) Scala IDE not able to connect to Spark Master
[ https://issues.apache.org/jira/browse/SPARK-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464899#comment-15464899 ] Sean Owen commented on SPARK-17401: --- 4040 is normally where the Spark UI binds, not the master. I think you have something mixed up somewhere: something is talking to the UI when it expects to talk to the master, or vice versa. Run your master on some other port to rule out this mix-up. > Scala IDE not able to connect to Spark Master > - > > Key: SPARK-17401 > URL: https://issues.apache.org/jira/browse/SPARK-17401 > Project: Spark > Issue Type: Bug > Components: Build, Deploy, Examples >Affects Versions: 2.0.0 > Environment: Windows (x64 Bit) >Reporter: Sachin Gupta >Priority: Blocker > > Able to compile Spark project in Scala IDE. We've started master using > "spark-shell" which works fine from console with master reachable at > http://localhost:4040. > When we execute project from IDE it says can't connect to port 4040 and tries > on 4041. On master console (DOS) following error is shown: > 16/09/05 17:40:35 WARN HttpParser: Illegal character 0x0 in state=START for > buff > er > HeapByteBuffer@773c8742[p=1,l=1302,c=16384,r=1301]={\x00<<<\x00\x00\x00\x00\x > 00\x05\x16\x03N\x8d\xFcA\xE1\xDb.\x16\x00...\x00\r192.168.13.61>>>\x00\x00\x00\x > 00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x0 > 0\x00\x00\x00\x00\x00\x00\x00\x00\x00} > 16/09/05 17:40:35 WARN HttpParser: badMessage: 400 Illegal character 0x0 for > Htt > pChannelOverHttp@660d0260{r=0,c=false,a=IDLE,uri=} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
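A hedged sketch of the diagnostic the comment suggests (port numbers and paths are illustrative assumptions, not values from the report): start the standalone master on explicit ports so the application's master URL cannot collide with the per-application UI that spark-shell binds on 4040.

```shell
# Illustrative only: run a standalone master on explicit RPC and UI ports.
./sbin/start-master.sh --port 7077 --webui-port 8081

# The application then connects to the master RPC endpoint, not the UI:
#   --master spark://<master-host>:7077
# Port 4040 is the per-application web UI created by spark-shell; it speaks
# HTTP, which is why an RPC handshake against it logs "Illegal character 0x0".
```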
[jira] [Created] (SPARK-17401) Scala IDE not able to connect to Spark Master
Sachin Gupta created SPARK-17401: Summary: Scala IDE not able to connect to Spark Master Key: SPARK-17401 URL: https://issues.apache.org/jira/browse/SPARK-17401 Project: Spark Issue Type: Bug Components: Build, Deploy, Examples Affects Versions: 2.0.0 Environment: Windows (x64 Bit) Reporter: Sachin Gupta Priority: Blocker Able to compile Spark project in Scala IDE. We've started master using "spark-shell" which works fine from console with master reachable at http://localhost:4040. When we execute project from IDE it says can't connect to port 4040 and tries on 4041. On master console (DOS) following error is shown: 16/09/05 17:40:35 WARN HttpParser: Illegal character 0x0 in state=START for buff er HeapByteBuffer@773c8742[p=1,l=1302,c=16384,r=1301]={\x00<<<\x00\x00\x00\x00\x 00\x05\x16\x03N\x8d\xFcA\xE1\xDb.\x16\x00...\x00\r192.168.13.61>>>\x00\x00\x00\x 00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x0 0\x00\x00\x00\x00\x00\x00\x00\x00\x00} 16/09/05 17:40:35 WARN HttpParser: badMessage: 400 Illegal character 0x0 for Htt pChannelOverHttp@660d0260{r=0,c=false,a=IDLE,uri=} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruben Hernando updated SPARK-15822: --- Comment: was deleted (was: I'm doing joins between SQL views. Data source is JDBC to SQL server) > segmentation violation in o.a.s.unsafe.types.UTF8String > > > Key: SPARK-15822 > URL: https://issues.apache.org/jira/browse/SPARK-15822 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: linux amd64 > openjdk version "1.8.0_91" > OpenJDK Runtime Environment (build 1.8.0_91-b14) > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode) >Reporter: Pete Robbins >Assignee: Herman van Hovell >Priority: Blocker > Fix For: 2.0.0 > > > Executors fail with segmentation violation while running application with > spark.memory.offHeap.enabled true > spark.memory.offHeap.size 512m > Also now reproduced with > spark.memory.offHeap.enabled false > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400 > # > # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14) > # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # J 4816 C2 > org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I > (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d] > {noformat} > We initially saw this on IBM java on PowerPC box but is recreatable on linux > with OpenJDK. 
On linux with IBM Java 8 we see a null pointer exception at the > same code point: > {noformat} > 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48) > java.lang.NullPointerException > at > org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831) > at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664) > at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.lang.Thread.run(Thread.java:785) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464860#comment-15464860 ] Ruben Hernando commented on SPARK-15822: I'm doing joins between SQL views. Data source is JDBC to SQL server > segmentation violation in o.a.s.unsafe.types.UTF8String > > > Key: SPARK-15822 > URL: https://issues.apache.org/jira/browse/SPARK-15822 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: linux amd64 > openjdk version "1.8.0_91" > OpenJDK Runtime Environment (build 1.8.0_91-b14) > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode) >Reporter: Pete Robbins >Assignee: Herman van Hovell >Priority: Blocker > Fix For: 2.0.0 > > > Executors fail with segmentation violation while running application with > spark.memory.offHeap.enabled true > spark.memory.offHeap.size 512m > Also now reproduced with > spark.memory.offHeap.enabled false > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400 > # > # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14) > # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # J 4816 C2 > org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I > (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d] > {noformat} > We initially saw this on IBM java on PowerPC box but is recreatable on linux > with OpenJDK. 
On linux with IBM Java 8 we see a null pointer exception at the > same code point: > {noformat} > 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48) > java.lang.NullPointerException > at > org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831) > at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664) > at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.lang.Thread.run(Thread.java:785) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17339) Fix SparkR tests on Windows
[ https://issues.apache.org/jira/browse/SPARK-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464815#comment-15464815 ] Apache Spark commented on SPARK-17339: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/14960 > Fix SparkR tests on Windows > --- > > Key: SPARK-17339 > URL: https://issues.apache.org/jira/browse/SPARK-17339 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Reporter: Shivaram Venkataraman > > A number of SparkR tests are currently failing when run on Windows as discussed > in https://github.com/apache/spark/pull/14743 > The list of tests that fail right now is at > https://gist.github.com/shivaram/7693df7bd54dc81e2e7d1ce296c41134 > A full log from a build and test on AppVeyor is at > https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17339) Fix SparkR tests on Windows
[ https://issues.apache.org/jira/browse/SPARK-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17339: Assignee: Apache Spark > Fix SparkR tests on Windows > --- > > Key: SPARK-17339 > URL: https://issues.apache.org/jira/browse/SPARK-17339 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Reporter: Shivaram Venkataraman >Assignee: Apache Spark > > A number of SparkR tests are currently failing when run on Windows as discussed > in https://github.com/apache/spark/pull/14743 > The list of tests that fail right now is at > https://gist.github.com/shivaram/7693df7bd54dc81e2e7d1ce296c41134 > A full log from a build and test on AppVeyor is at > https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-17339) Fix SparkR tests on Windows
[ https://issues.apache.org/jira/browse/SPARK-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17339: Assignee: (was: Apache Spark) > Fix SparkR tests on Windows > --- > > Key: SPARK-17339 > URL: https://issues.apache.org/jira/browse/SPARK-17339 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Reporter: Shivaram Venkataraman > > A number of SparkR tests are currently failing when run on Windows as discussed > in https://github.com/apache/spark/pull/14743 > The list of tests that fail right now is at > https://gist.github.com/shivaram/7693df7bd54dc81e2e7d1ce296c41134 > A full log from a build and test on AppVeyor is at > https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464808#comment-15464808 ] Herman van Hovell commented on SPARK-15822: --- [~rhernando] It should not be related to this one. Are you doing a broadcast hash join by any chance? > segmentation violation in o.a.s.unsafe.types.UTF8String > > > Key: SPARK-15822 > URL: https://issues.apache.org/jira/browse/SPARK-15822 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: linux amd64 > openjdk version "1.8.0_91" > OpenJDK Runtime Environment (build 1.8.0_91-b14) > OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode) >Reporter: Pete Robbins >Assignee: Herman van Hovell >Priority: Blocker > Fix For: 2.0.0 > > > Executors fail with segmentation violation while running application with > spark.memory.offHeap.enabled true > spark.memory.offHeap.size 512m > Also now reproduced with > spark.memory.offHeap.enabled false > {noformat} > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400 > # > # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14) > # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # J 4816 C2 > org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I > (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d] > {noformat} > We initially saw this on IBM java on PowerPC box but is recreatable on linux > with OpenJDK. 
On linux with IBM Java 8 we see a null pointer exception at the > same code point: > {noformat} > 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48) > java.lang.NullPointerException > at > org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831) > at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664) > at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365) > at > org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:282) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.lang.Thread.run(Thread.java:785) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464809#comment-15464809 ]

Herman van Hovell commented on SPARK-15822:
-------------------------------------------

[~rhernando] It should not be related to this one. Are you doing a broadcast hash join by any chance?

> segmentation violation in o.a.s.unsafe.types.UTF8String
> --------------------------------------------------------
>
>                 Key: SPARK-15822
>                 URL: https://issues.apache.org/jira/browse/SPARK-15822
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: linux amd64
>                      openjdk version "1.8.0_91"
>                      OpenJDK Runtime Environment (build 1.8.0_91-b14)
>                      OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>            Reporter: Pete Robbins
>            Assignee: Herman van Hovell
>            Priority: Blocker
>             Fix For: 2.0.0
>
> Executors fail with a segmentation violation while running an application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # J 4816 C2 org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this with IBM Java on a PowerPC box, but it is recreatable on linux with OpenJDK.
> On linux with IBM Java 8 we see a NullPointerException at the same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
> 	at org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
> 	at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown Source)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
> 	at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
> 	at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
> 	at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
> 	at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:85)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.lang.Thread.run(Thread.java:785)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hovell updated SPARK-15822:
--------------------------------------
    Comment: was deleted

(was: [~rhernando] It should not be related to this one. Are you doing a broadcast hash join by any chance?)
[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String
[ https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464793#comment-15464793 ]

Ruben Hernando commented on SPARK-15822:
----------------------------------------

Encountered a similar error (using 2.0.0):
{quote}
# SIGSEGV (0xb) at pc=0x7fbb355dfd6c, pid=5592, tid=0x7faa9f2f4700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64)
# Problematic frame:
# J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
#
{quote}
I'm not sure if it's related.
[jira] [Commented] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics
[ https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464744#comment-15464744 ]

Sean Owen commented on SPARK-17381:
-----------------------------------

That's it then. It's 'normal' operation, but it does mean you need a much bigger driver heap if you want to store state for all of these stages and tasks (or else keep fewer of them). No, it's not data that's somehow only supposed to be on the executor. I commented on this above.

> Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics
> -------------------------------------------------------------
>
>                 Key: SPARK-17381
>                 URL: https://issues.apache.org/jira/browse/SPARK-17381
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: EMR 5.0.0 (submitted as yarn-client)
>                      Java Version 1.8.0_101 (Oracle Corporation)
>                      Scala Version 2.11.8
>                      The problem also happens when I run locally with similar versions of java/scala.
>                      OS: Ubuntu 16.04
>            Reporter: Joao Duarte
>
> I am running a Spark Streaming application from a Kinesis stream. After some hours running it runs out of memory. From a driver heap dump I found two problems:
> 1) a huge number of org.apache.spark.sql.execution.ui.SQLTaskMetrics (it seems this was a problem before: https://issues.apache.org/jira/browse/SPARK-11192);
> To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak I just needed to run the code below:
> {code}
> val dstream = ssc.union(kinesisStreams)
> dstream.foreachRDD((streamInfo: RDD[Array[Byte]]) => {
>   val toyDF = streamInfo.map(_ =>
>       (1, "data", "more data "))
>     .toDF("Num", "Data", "MoreData")
>   toyDF.agg(sum("Num")).first().get(0)
> })
> {code}
> 2) a huge amount of Array[Byte] (9 GB+)
> After some analysis, I noticed that most of the Array[Byte] were being referenced by objects that were being referenced by SQLTaskMetrics. The strangest thing is that those Array[Byte] were basically text that was loaded in the executors, so they should never be in the driver at all!
> I still could not replicate the 2nd problem with simple code (the original was complex, with data coming from S3, DynamoDB and other databases). However, when I debug the application I can see that in Executor.scala, during reportHeartBeat(), the data that should not be sent to the driver is being added to "accumUpdates" which, as I understand, will be sent to the driver for reporting.
> To be more precise, one of the taskRunners in the loop "for (taskRunner <- runningTasks.values().asScala)" contains a GenericInternalRow with a lot of data that should not go to the driver. The path in my case would be: taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if not the same) to the data I see when I do a driver heap dump.
> I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is fixed I would have less of this undesirable data in the driver and I could run my streaming app for a long period of time, but I think there will always be some performance lost.
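Sean's "keep fewer of them" advice maps to the UI retention settings. Below is a minimal, hypothetical sketch of applying that workaround when building the SparkConf; spark.sql.ui.retainedExecutions is the property discussed in this thread, the spark.ui.retained* properties bound job/stage state in the same spirit, and all the values here are purely illustrative, not tuned recommendations.

```scala
// Hypothetical sketch: bound how much UI/metrics state the driver retains.
// spark.sql.ui.retainedExecutions is the knob discussed in this thread;
// the spark.ui.retained* properties limit job/stage UI state similarly.
// App name and values are illustrative.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kinesis-streaming-app")
  .set("spark.sql.ui.retainedExecutions", "10")
  .set("spark.ui.retainedJobs", "200")
  .set("spark.ui.retainedStages", "200")
```

This trades UI history for a smaller, stable driver heap; it does not fix the underlying accumulation, it only caps it.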
[jira] [Comment Edited] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics
[ https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464740#comment-15464740 ]

Joao Duarte edited comment on SPARK-17381 at 9/5/16 11:00 AM:
--------------------------------------------------------------

Thank you for your suggestion [~srowen]! Setting spark.sql.ui.retainedExecutions to a low number seems to be a good workaround. I've been running my application for about an hour with spark.sql.ui.retainedExecutions=10, and both the number of SQLTaskMetrics objects and the driver heap size seem to stabilise. I'll give an update at the end of the day or tomorrow to confirm whether it remains stable.

However, I find it really strange that the driver is sent data that is supposed to live only on the executors. In my case I am parsing HTML pages, and some of those pages are being sent to the driver as part of ColumnStats (as you noted in your previous comment). Are they being sent as a summary by mistake? Does Spark really need this kind of information? The workaround lets my application run, but sending unneeded data to the driver certainly costs some performance (some of the HTML pages I parse can be really big).

Cheers

was (Author: joaomaiaduarte):
Hi Sean,
Thank you for your suggestion! Setting spark.sql.ui.retainedExecutions to a low number seems to be a good workaround. I'm running my application for about one hour with spark.sql.ui.retainedExecutions=10 and the number of SQLTaskMetrics objects and the heap memory size of the driver seem to stabilise. I'll give an update at the end of the day or tomorrow to tell you if it remains stable. However, I think it is really strange that the driver is sent data that are supposed to be only in the executors. In my case, I am parsing HTML pages and some of those are being sent to the driver as part of ColumnStats (as you referred in your previous comment). Are they being sent as a summary by mistake? Does Spark really need this kind of information? The work around enables my application to run but sending unneeded data to the driver certainly reduces performance (some HTML pages I parse can be really big). Cheers
[jira] [Updated] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics
[ https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joao Duarte updated SPARK-17381:
--------------------------------
    Description:
I am running a Spark Streaming application from a Kinesis stream. After some hours running it runs out of memory. From a driver heap dump I found two problems:
1) a huge number of org.apache.spark.sql.execution.ui.SQLTaskMetrics (it seems this was a problem before: https://issues.apache.org/jira/browse/SPARK-11192);
To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak I just needed to run the code below:
{code}
val dstream = ssc.union(kinesisStreams)
dstream.foreachRDD((streamInfo: RDD[Array[Byte]]) => {
  val toyDF = streamInfo.map(_ =>
      (1, "data", "more data "))
    .toDF("Num", "Data", "MoreData")
  toyDF.agg(sum("Num")).first().get(0)
})
{code}
2) a huge amount of Array[Byte] (9 GB+)
After some analysis, I noticed that most of the Array[Byte] were being referenced by objects that were being referenced by SQLTaskMetrics. The strangest thing is that those Array[Byte] were basically text that was loaded in the executors, so they should never be in the driver at all!
I still could not replicate the 2nd problem with simple code (the original was complex, with data coming from S3, DynamoDB and other databases). However, when I debug the application I can see that in Executor.scala, during reportHeartBeat(), the data that should not be sent to the driver is being added to "accumUpdates" which, as I understand, will be sent to the driver for reporting.
To be more precise, one of the taskRunners in the loop "for (taskRunner <- runningTasks.values().asScala)" contains a GenericInternalRow with a lot of data that should not go to the driver. The path in my case would be: taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if not the same) to the data I see when I do a driver heap dump.
I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is fixed I would have less of this undesirable data in the driver and I could run my streaming app for a long period of time, but I think there will always be some performance lost.
[jira] [Commented] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table
[ https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464627#comment-15464627 ]

Sean Owen commented on SPARK-17396:
-----------------------------------

[~rdblue] what do you think of this -- I think there's a good point here that ForkJoinPool won't limit the number of threads it uses? If that's true then this would probably need to switch to a normal Java executor service that can cap threads.

> Threads number keep increasing when query on external CSV partitioned table
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-17396
>                 URL: https://issues.apache.org/jira/browse/SPARK-17396
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: pin_zhang
>
> 1. Create an external partitioned table, row format CSV
> 2. Add 16 partitions to the table
> 3. Run SQL "select count(*) from test_csv"
> 4. The ForkJoinThread count keeps increasing
>    This happened when the table partition count was greater than 10.
> 5. Test code:
> {code}
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.hive.HiveContext
>
> object Bugs {
>   def main(args: Array[String]): Unit = {
>     val location = "file:///g:/home/test/csv"
>     val create = s"""CREATE EXTERNAL TABLE test_csv
>         (ID string, SEQ string)
>         PARTITIONED BY (index int)
>         ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
>         LOCATION "${location}"
>         """
>     val add_part = s"""
>         ALTER TABLE test_csv ADD
>         PARTITION (index=1) LOCATION '${location}/index=1'
>         PARTITION (index=2) LOCATION '${location}/index=2'
>         PARTITION (index=3) LOCATION '${location}/index=3'
>         PARTITION (index=4) LOCATION '${location}/index=4'
>         PARTITION (index=5) LOCATION '${location}/index=5'
>         PARTITION (index=6) LOCATION '${location}/index=6'
>         PARTITION (index=7) LOCATION '${location}/index=7'
>         PARTITION (index=8) LOCATION '${location}/index=8'
>         PARTITION (index=9) LOCATION '${location}/index=9'
>         PARTITION (index=10) LOCATION '${location}/index=10'
>         PARTITION (index=11) LOCATION '${location}/index=11'
>         PARTITION (index=12) LOCATION '${location}/index=12'
>         PARTITION (index=13) LOCATION '${location}/index=13'
>         PARTITION (index=14) LOCATION '${location}/index=14'
>         PARTITION (index=15) LOCATION '${location}/index=15'
>         PARTITION (index=16) LOCATION '${location}/index=16'
>         """
>     val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
>     conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
>     val ctx = new SparkContext(conf)
>     val hctx = new HiveContext(ctx)
>     hctx.sql(create)
>     hctx.sql(add_part)
>     for (i <- 1 to 6) {
>       new Query(hctx).start()
>     }
>   }
>
>   class Query(hctx: HiveContext) extends Thread {
>     setName("Query-Thread")
>     override def run = {
>       while (true) {
>         hctx.sql("select count(*) from test_csv").show()
>         Thread.sleep(100)
>       }
>     }
>   }
> }
> {code}
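If the root cause is indeed an unbounded ForkJoinPool, the conventional fix Sean suggests is to run the work on an executor service with a hard thread cap. This is a self-contained sketch using only the JDK and the Scala standard library, not Spark's actual internals: a fixed pool can never grow past its cap, whereas a ForkJoinPool's managed-blocking compensation may spawn extra threads.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedPoolSketch {
  def main(args: Array[String]): Unit = {
    // Hard cap: at most 8 worker threads, no matter how many tasks queue up.
    val pool = Executors.newFixedThreadPool(8)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    // Submit far more tasks than threads; excess tasks queue instead of
    // spawning new threads.
    val work = Future.traverse((1 to 100).toList)(i => Future(i * i))
    val results = Await.result(work, 30.seconds)
    println(results.sum) // prints 338350

    pool.shutdown()
  }
}
```

The same queuing behavior is what a ForkJoinPool does not guarantee when tasks block on I/O, which matches the unbounded ForkJoinThread growth reported here.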
[jira] [Commented] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
[ https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464604#comment-15464604 ]

Artur Sukhenko commented on SPARK-17379:
----------------------------------------

I tried using Spark with netty 4.1.5; after fixing the deprecated calls and implementing the new methods, some tests still failed. So I agree with you that we should just upgrade to 4.0.41.

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --------------------------------------------------------------
>
>                 Key: SPARK-17379
>                 URL: https://issues.apache.org/jira/browse/SPARK-17379
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Adam Roberts
>            Priority: Trivial
>
> We should use the newest version of netty based on the info here: http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html; we are especially interested in the static initialiser deadlock fix: https://github.com/netty/netty/pull/5730
> Lots more fixes are mentioned, so I will create the pull request -- again a case of updating the pom and then the dependency files.
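The "updating the pom" part of the change would look roughly like the fragment below. This is a hypothetical sketch, not the actual patch; the io.netty:netty-all coordinates are real, but whether the version is set inline or via a shared property depends on how the project's root pom is organised.

```xml
<!-- Hypothetical sketch of the dependency bump described above.
     In practice the version may be centralised in a <properties> entry
     of the root pom rather than written inline. -->
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-all</artifactId>
  <version>4.0.41.Final</version>
</dependency>
```

The dependency manifest files mentioned in the issue would then be regenerated so the recorded jar name matches the new version.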