[jira] [Assigned] (SPARK-17406) Event Timeline will be very slow when there are too many executor events

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17406:


Assignee: Apache Spark

> Event Timeline will be very slow when there are too many executor events
> 
>
> Key: SPARK-17406
> URL: https://issues.apache.org/jira/browse/SPARK-17406
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.2, 2.0.0
>Reporter: cen yuhai
>Assignee: Apache Spark
>Priority: Minor
> Attachments: timeline1.png, timeline2.png
>
>
> The job page becomes too slow to open when there are thousands of executor 
> events (added or removed). I found that in the ExecutorsTab file, executorIdToData 
> never removes elements, so it keeps growing over time. Before this PR, the 
> timeline looks like timeline1.png; after this PR, it looks like timeline2.png 
> (we can set how many events will be displayed).
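As context for readers of this thread: the linked change caps how many executor events the timeline keeps for rendering. Below is a minimal, self-contained sketch of that capping idea; it is not the actual ExecutorsTab code, and the class, field, and parameter names are our own.

{code}
import scala.collection.mutable.ListBuffer

// Hypothetical event record; the real UI listener tracks richer executor data.
case class ExecutorEvent(executorId: String, time: Long, kind: String)

// Keep at most `maxRetained` events; older entries are dropped so the
// timeline never has to render an unbounded list.
class BoundedExecutorEvents(maxRetained: Int) {
  private val events = new ListBuffer[ExecutorEvent]()

  def add(event: ExecutorEvent): Unit = {
    events += event
    if (events.size > maxRetained) {
      events.remove(0, events.size - maxRetained)
    }
  }

  def forTimeline: Seq[ExecutorEvent] = events.toList
}
{code}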






[jira] [Commented] (SPARK-17406) Event Timeline will be very slow when there are too many executor events

2016-09-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466459#comment-15466459
 ] 

Apache Spark commented on SPARK-17406:
--

User 'cenyuhai' has created a pull request for this issue:
https://github.com/apache/spark/pull/14969

> Event Timeline will be very slow when there are too many executor events
> 
>
> Key: SPARK-17406
> URL: https://issues.apache.org/jira/browse/SPARK-17406
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.2, 2.0.0
>Reporter: cen yuhai
>Priority: Minor
> Attachments: timeline1.png, timeline2.png
>
>
> The job page becomes too slow to open when there are thousands of executor 
> events (added or removed). I found that in the ExecutorsTab file, executorIdToData 
> never removes elements, so it keeps growing over time. Before this PR, the 
> timeline looks like timeline1.png; after this PR, it looks like timeline2.png 
> (we can set how many events will be displayed).






[jira] [Assigned] (SPARK-17406) Event Timeline will be very slow when there are too many executor events

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17406:


Assignee: (was: Apache Spark)

> Event Timeline will be very slow when there are too many executor events
> 
>
> Key: SPARK-17406
> URL: https://issues.apache.org/jira/browse/SPARK-17406
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.2, 2.0.0
>Reporter: cen yuhai
>Priority: Minor
> Attachments: timeline1.png, timeline2.png
>
>
> The job page becomes too slow to open when there are thousands of executor 
> events (added or removed). I found that in the ExecutorsTab file, executorIdToData 
> never removes elements, so it keeps growing over time. Before this PR, the 
> timeline looks like timeline1.png; after this PR, it looks like timeline2.png 
> (we can set how many events will be displayed).






[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are too many executor events

2016-09-05 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated SPARK-17406:
--
Summary: Event Timeline will be very slow when there are too many executor 
events  (was: Event Timeline will be very slow when there are very executor 
events)

> Event Timeline will be very slow when there are too many executor events
> 
>
> Key: SPARK-17406
> URL: https://issues.apache.org/jira/browse/SPARK-17406
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.2, 2.0.0
>Reporter: cen yuhai
>Priority: Minor
> Attachments: timeline1.png, timeline2.png
>
>
> The job page becomes too slow to open when there are thousands of executor 
> events (added or removed). I found that in the ExecutorsTab file, executorIdToData 
> never removes elements, so it keeps growing over time. Before this PR, the 
> timeline looks like timeline1.png; after this PR, it looks like timeline2.png 
> (we can set how many events will be displayed).






[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are very executor events

2016-09-05 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated SPARK-17406:
--
Description: The job page becomes too slow to open when there are thousands of 
executor events (added or removed). I found that in the ExecutorsTab file, 
executorIdToData never removes elements, so it keeps growing over time. Before 
this PR, the timeline looks like timeline1.png; after this PR, it looks like 
timeline2.png (we can set how many events will be displayed).  (was: The job page 
will be too slow to open when there are very executor events(added or removed). 
I found that in ExecutorsTab file, executorIdToData will not remove elements, 
it will increase all the time. Before this pr, it looks like timeline1.png. 
After this pr, it looks like timeline2.png(we can set how many events will be 
displayed))

> Event Timeline will be very slow when there are very executor events
> 
>
> Key: SPARK-17406
> URL: https://issues.apache.org/jira/browse/SPARK-17406
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.2, 2.0.0
>Reporter: cen yuhai
>Priority: Minor
> Attachments: timeline1.png, timeline2.png
>
>
> The job page becomes too slow to open when there are thousands of executor 
> events (added or removed). I found that in the ExecutorsTab file, executorIdToData 
> never removes elements, so it keeps growing over time. Before this PR, the 
> timeline looks like timeline1.png; after this PR, it looks like timeline2.png 
> (we can set how many events will be displayed).






[jira] [Assigned] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17369:


Assignee: Sean Zhong  (was: Apache Spark)

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Assignee: Sean Zhong
>Priority: Minor
> Fix For: 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Commented] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466429#comment-15466429
 ] 

Apache Spark commented on SPARK-17369:
--

User 'clockfly' has created a pull request for this issue:
https://github.com/apache/spark/pull/14968

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Assignee: Sean Zhong
>Priority: Minor
> Fix For: 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Assigned] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17369:


Assignee: Apache Spark  (was: Sean Zhong)

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Commented] (SPARK-17409) Query in CTAS is Optimized Twice

2016-09-05 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466397#comment-15466397
 ] 

Xiao Li commented on SPARK-17409:
-

Working on it. Thanks!

> Query in CTAS is Optimized Twice
> 
>
> Key: SPARK-17409
> URL: https://issues.apache.org/jira/browse/SPARK-17409
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>
> The query in CTAS is optimized twice, as reported in the PR: 
> https://github.com/apache/spark/pull/14797
> {quote}
> Some analyzer rules have assumptions about logical plans, and the optimizer may 
> break these assumptions. We should not pass an optimized query plan into 
> QueryExecution (it will be analyzed again); otherwise we may hit some weird bugs.
> {quote}






[jira] [Created] (SPARK-17409) Query in CTAS is Optimized Twice

2016-09-05 Thread Xiao Li (JIRA)
Xiao Li created SPARK-17409:
---

 Summary: Query in CTAS is Optimized Twice
 Key: SPARK-17409
 URL: https://issues.apache.org/jira/browse/SPARK-17409
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Xiao Li


The query in CTAS is optimized twice, as reported in the PR: 
https://github.com/apache/spark/pull/14797

{quote}
Some analyzer rules have assumptions about logical plans, and the optimizer may 
break these assumptions. We should not pass an optimized query plan into 
QueryExecution (it will be analyzed again); otherwise we may hit some weird bugs.
{quote}







[jira] [Updated] (SPARK-17072) generate table level stats:stats generation/storing/loading

2016-09-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-17072:
-
Assignee: Zhenhua Wang

> generate table level stats:stats generation/storing/loading
> ---
>
> Key: SPARK-17072
> URL: https://issues.apache.org/jira/browse/SPARK-17072
> Project: Spark
>  Issue Type: Sub-task
>  Components: Optimizer
>Affects Versions: 2.0.0
>Reporter: Ron Hu
>Assignee: Zhenhua Wang
> Fix For: 2.1.0
>
>
> We need to generate, store, and load statistics information into/from the 
> metastore.
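As an aside for readers, the flow this subtask targets can be exercised from SQL. The following is a rough sketch assuming a Hive-enabled SparkSession (the `spark` value from spark-shell) and a throwaway table name of our choosing; it is illustrative only and not code from the patch.

{code}
// Sketch only: table name and data are illustrative.
spark.sql("CREATE TABLE IF NOT EXISTS stats_demo (id INT) USING parquet")
spark.range(1000).selectExpr("CAST(id AS INT) AS id")
  .write.insertInto("stats_demo")

// Generate table-level statistics and store them in the metastore.
spark.sql("ANALYZE TABLE stats_demo COMPUTE STATISTICS")

// Loading the table again should pick the stored statistics back up;
// DESCRIBE EXTENDED surfaces them in the table details.
spark.sql("DESCRIBE EXTENDED stats_demo").show(100, false)
{code}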






[jira] [Commented] (SPARK-17408) Flaky test: org.apache.spark.sql.hive.StatisticsSuite

2016-09-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466358#comment-15466358
 ] 

Yin Huai commented on SPARK-17408:
--

cc [~cloud_fan] [~ZenWzh]

> Flaky test: org.apache.spark.sql.hive.StatisticsSuite
> -
>
> Key: SPARK-17408
> URL: https://issues.apache.org/jira/browse/SPARK-17408
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64956/testReport/junit/org.apache.spark.sql.hive/StatisticsSuite/test_statistics_of_LogicalRelation_converted_from_MetastoreRelation/
> {code}
> org.apache.spark.sql.hive.StatisticsSuite.test statistics of LogicalRelation 
> converted from MetastoreRelation
> Failing for the past 1 build (Since Failed#64956 )
> Took 1.4 sec.
> Error Message
> org.scalatest.exceptions.TestFailedException: 6871 did not equal 4236
> Stacktrace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 6871 
> did not equal 4236
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite$$anonfun$14.applyOrElse(StatisticsSuite.scala:247)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite$$anonfun$14.applyOrElse(StatisticsSuite.scala:241)
>   at scala.PartialFunction$Lifted.apply(PartialFunction.scala:223)
>   at scala.PartialFunction$Lifted.apply(PartialFunction.scala:219)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:158)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:158)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:117)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:118)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:118)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.collect(TreeNode.scala:158)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite.org$apache$spark$sql$hive$StatisticsSuite$$checkLogicalRelationStats(StatisticsSuite.scala:241)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6$$anonfun$apply$mcV$sp$3$$anonfun$apply$mcV$sp$4.apply$mcV$sp(StatisticsSuite.scala:271)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withSQLConf(SQLTestUtils.scala:99)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite.withSQLConf(StatisticsSuite.scala:35)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6$$anonfun$apply$mcV$sp$3.apply$mcV$sp(StatisticsSuite.scala:268)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite.withTable(StatisticsSuite.scala:35)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply$mcV$sp(StatisticsSuite.scala:260)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply(StatisticsSuite.scala:257)
>   at 
> org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply(StatisticsSuite.scala:257)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 

[jira] [Created] (SPARK-17408) Flaky test: org.apache.spark.sql.hive.StatisticsSuite

2016-09-05 Thread Yin Huai (JIRA)
Yin Huai created SPARK-17408:


 Summary: Flaky test: org.apache.spark.sql.hive.StatisticsSuite
 Key: SPARK-17408
 URL: https://issues.apache.org/jira/browse/SPARK-17408
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Yin Huai


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64956/testReport/junit/org.apache.spark.sql.hive/StatisticsSuite/test_statistics_of_LogicalRelation_converted_from_MetastoreRelation/

{code}
org.apache.spark.sql.hive.StatisticsSuite.test statistics of LogicalRelation 
converted from MetastoreRelation

Failing for the past 1 build (Since Failed#64956 )
Took 1.4 sec.
Error Message

org.scalatest.exceptions.TestFailedException: 6871 did not equal 4236
Stacktrace

sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 6871 did 
not equal 4236
at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
at 
org.apache.spark.sql.hive.StatisticsSuite$$anonfun$14.applyOrElse(StatisticsSuite.scala:247)
at 
org.apache.spark.sql.hive.StatisticsSuite$$anonfun$14.applyOrElse(StatisticsSuite.scala:241)
at scala.PartialFunction$Lifted.apply(PartialFunction.scala:223)
at scala.PartialFunction$Lifted.apply(PartialFunction.scala:219)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:158)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$collect$1.apply(TreeNode.scala:158)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:117)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:118)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreach$1.apply(TreeNode.scala:118)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:118)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.collect(TreeNode.scala:158)
at 
org.apache.spark.sql.hive.StatisticsSuite.org$apache$spark$sql$hive$StatisticsSuite$$checkLogicalRelationStats(StatisticsSuite.scala:241)
at 
org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6$$anonfun$apply$mcV$sp$3$$anonfun$apply$mcV$sp$4.apply$mcV$sp(StatisticsSuite.scala:271)
at 
org.apache.spark.sql.test.SQLTestUtils$class.withSQLConf(SQLTestUtils.scala:99)
at 
org.apache.spark.sql.hive.StatisticsSuite.withSQLConf(StatisticsSuite.scala:35)
at 
org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6$$anonfun$apply$mcV$sp$3.apply$mcV$sp(StatisticsSuite.scala:268)
at 
org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
at 
org.apache.spark.sql.hive.StatisticsSuite.withTable(StatisticsSuite.scala:35)
at 
org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply$mcV$sp(StatisticsSuite.scala:260)
at 
org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply(StatisticsSuite.scala:257)
at 
org.apache.spark.sql.hive.StatisticsSuite$$anonfun$6.apply(StatisticsSuite.scala:257)
at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:57)
at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)

[jira] [Commented] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466350#comment-15466350
 ] 

Yin Huai commented on SPARK-17369:
--

Let's create a patch for branch 2.0. I have reverted the current one because it 
breaks the 2.0 build.

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Assignee: Sean Zhong
>Priority: Minor
> Fix For: 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Updated] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-17369:
-
Fix Version/s: (was: 2.0.1)

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Assignee: Sean Zhong
>Priority: Minor
> Fix For: 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Reopened] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reopened SPARK-17369:
--

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Assignee: Sean Zhong
>Priority: Minor
> Fix For: 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Resolved] (SPARK-17358) Cached table(parquet/orc) should be shard between beelines

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-17358.
-
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

Issue resolved by pull request 14913
[https://github.com/apache/spark/pull/14913]

> Cached table(parquet/orc) should be shard between beelines
> --
>
> Key: SPARK-17358
> URL: https://issues.apache.org/jira/browse/SPARK-17358
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Yadong Qi
> Fix For: 2.0.1, 2.1.0
>
>
> I cached a table (parquet) in Beeline1 and couldn't use the cached data in 
> Beeline2, but a table in text format is OK (a cached table (text) can be shared 
> between beelines).
> Beeline1
> {code:sql}
> CACHE TABLE src_pqt;
> EXPLAIN SELECT * FROM src_pqt;
> | == Physical Plan ==
> InMemoryTableScan
>+- InMemoryRelation
>  +- *FileScan parquet default.src_pqt
> {code}
> Beeline2
> {code:sql}
> EXPLAIN SELECT * FROM src_pqt;
> | == Physical Plan ==
> *FileScan parquet default.src_pqt
> {code}






[jira] [Created] (SPARK-17407) Unable to update structured stream from CSV

2016-09-05 Thread Chris Parmer (JIRA)
Chris Parmer created SPARK-17407:


 Summary: Unable to update structured stream from CSV
 Key: SPARK-17407
 URL: https://issues.apache.org/jira/browse/SPARK-17407
 Project: Spark
  Issue Type: Question
  Components: PySpark
Affects Versions: 2.0.0
 Environment: Mac OSX
Spark 2.0.0
Reporter: Chris Parmer
Priority: Trivial


I am creating a simple example of a Structured Stream from a CSV file with an 
in-memory output stream.
When I add rows to the CSV file, my output stream does not update with the new 
data. From this example: 
https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4012078893478893/3202384642551446/5985939988045659/latest.html,
 I expected that subsequent queries on the same output stream would contain 
updated results.

Here is a reproducible code example: https://plot.ly/~chris/17703

Thanks for the help here!
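One note that may explain the behaviour: the file-based sources in Structured Streaming pick up new files placed in the watched directory, and rows appended to an already-processed file are generally not re-read. Below is a minimal, self-contained sketch of the setup being described; the paths, schema, and names are illustrative assumptions, not taken from the linked notebook.

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .appName("csv-stream-sketch")
  .master("local[2]")
  .getOrCreate()

// File streams require an explicit schema.
val schema = new StructType()
  .add("name", StringType)
  .add("value", IntegerType)

val csvStream = spark.readStream
  .schema(schema)
  .option("header", "true")
  .csv("/tmp/csv-input")   // directory to watch for new CSV files

// Write to an in-memory table that can be queried repeatedly.
val query = csvStream.writeStream
  .format("memory")
  .queryName("csv_table")
  .outputMode("append")
  .start()

// Re-run this after new files land in /tmp/csv-input to see the new rows.
spark.sql("SELECT count(*) FROM csv_table").show()
{code}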






[jira] [Updated] (SPARK-17358) Cached table(parquet/orc) should be shard between beelines

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-17358:

Assignee: Yadong Qi

> Cached table(parquet/orc) should be shard between beelines
> --
>
> Key: SPARK-17358
> URL: https://issues.apache.org/jira/browse/SPARK-17358
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Yadong Qi
>Assignee: Yadong Qi
> Fix For: 2.0.1, 2.1.0
>
>
> I cached a table (parquet) in Beeline1 and couldn't use the cached data in 
> Beeline2, but a table in text format is OK (a cached table (text) can be shared 
> between beelines).
> Beeline1
> {code:sql}
> CACHE TABLE src_pqt;
> EXPLAIN SELECT * FROM src_pqt;
> | == Physical Plan ==
> InMemoryTableScan
>+- InMemoryRelation
>  +- *FileScan parquet default.src_pqt
> {code}
> Beeline2
> {code:sql}
> EXPLAIN SELECT * FROM src_pqt;
> | == Physical Plan ==
> *FileScan parquet default.src_pqt
> {code}






[jira] [Updated] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-17369:

Fix Version/s: 2.0.1

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Assignee: Sean Zhong
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Updated] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-17369:

Assignee: Sean Zhong

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Assignee: Sean Zhong
>Priority: Minor
> Fix For: 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are very executor events

2016-09-05 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated SPARK-17406:
--
Description: The job page will be too slow to open when there are very 
executor events(added or removed). I found that in ExecutorsTab file, 
executorIdToData will not remove elements, it will increase all the time. 
Before this pr, it looks like timeline1.png. After this pr, it looks like 
timeline2.png(we can set how many events will be displayed)  (was: The job page 
will be too slow to open when there are very executor events(added or removed). 
I found that in ExecutorsTab file, executorIdToData will not remove elements, 
it will increase all the time.)

> Event Timeline will be very slow when there are very executor events
> 
>
> Key: SPARK-17406
> URL: https://issues.apache.org/jira/browse/SPARK-17406
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.2, 2.0.0
>Reporter: cen yuhai
>Priority: Minor
> Attachments: timeline1.png, timeline2.png
>
>
> The job page will be too slow to open when there are very executor 
> events(added or removed). I found that in ExecutorsTab file, executorIdToData 
> will not remove elements, it will increase all the time. Before this pr, it 
> looks like timeline1.png. After this pr, it looks like timeline2.png(we can 
> set how many events will be displayed)






[jira] [Resolved] (SPARK-17369) MetastoreRelation toJSON throws exception

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-17369.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 14928
[https://github.com/apache/spark/pull/14928]

> MetastoreRelation toJSON throws exception
> -
>
> Key: SPARK-17369
> URL: https://issues.apache.org/jira/browse/SPARK-17369
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Sean Zhong
>Priority: Minor
> Fix For: 2.1.0
>
>
> MetastoreRelation.toJSON now throws an exception.
> Test case:
> {code}
> package org.apache.spark.sql.hive
> import org.apache.spark.SparkFunSuite
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
> CatalogTable, CatalogTableType}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> class MetastoreRelationSuite extends SparkFunSuite {
>   test("makeCopy and toJSON should work") {
> val table = CatalogTable(
>   identifier = TableIdentifier("test", Some("db")),
>   tableType = CatalogTableType.VIEW,
>   storage = CatalogStorageFormat.empty,
>   schema = StructType(StructField("a", IntegerType, true) :: Nil))
> val relation = MetastoreRelation("db", "test")(table, null, null)
> // No exception should be thrown
> relation.makeCopy(Array("db", "test"))
> // No exception should be thrown
> relation.toJSON
>   }
> }
> {code}






[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are very executor events

2016-09-05 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated SPARK-17406:
--
Attachment: timeline2.png

> Event Timeline will be very slow when there are very executor events
> 
>
> Key: SPARK-17406
> URL: https://issues.apache.org/jira/browse/SPARK-17406
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.2, 2.0.0
>Reporter: cen yuhai
>Priority: Minor
> Attachments: timeline1.png, timeline2.png
>
>
> The job page will be too slow to open when there are very executor 
> events(added or removed). I found that in ExecutorsTab file, executorIdToData 
> will not remove elements, it will increase all the time.






[jira] [Updated] (SPARK-17406) Event Timeline will be very slow when there are very executor events

2016-09-05 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated SPARK-17406:
--
Attachment: timeline1.png

> Event Timeline will be very slow when there are very executor events
> 
>
> Key: SPARK-17406
> URL: https://issues.apache.org/jira/browse/SPARK-17406
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.6.2, 2.0.0
>Reporter: cen yuhai
>Priority: Minor
> Attachments: timeline1.png, timeline2.png
>
>
> The job page will be too slow to open when there are very executor 
> events(added or removed). I found that in ExecutorsTab file, executorIdToData 
> will not remove elements, it will increase all the time.






[jira] [Updated] (SPARK-16943) CREATE TABLE LIKE generates a non-empty table when source is a data source table

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-16943:

Fix Version/s: 2.0.1

> CREATE TABLE LIKE generates a non-empty table when source is a data source 
> table
> 
>
> Key: SPARK-16943
> URL: https://issues.apache.org/jira/browse/SPARK-16943
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.1, 2.1.0
>
>
> When the source table is a data source table, the table generated by CREATE 
> TABLE LIKE is non-empty. The expected table should be empty.
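A minimal sketch of the expected behaviour, assuming a Hive-enabled `spark` session (as in spark-shell) and table names of our choosing: the table produced by CREATE TABLE LIKE should share the source's definition but contain no rows.

{code}
// Sketch only: names are illustrative.
spark.range(10).write.saveAsTable("src_ds_table")            // data source table with rows
spark.sql("CREATE TABLE dst_like_table LIKE src_ds_table")   // copy the definition only
assert(spark.table("dst_like_table").count() == 0)           // expected: an empty table
{code}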






[jira] [Updated] (SPARK-16942) CREATE TABLE LIKE generates External table when source table is an External Hive Serde table

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-16942:

Fix Version/s: 2.0.1

> CREATE TABLE LIKE generates External table when source table is an External 
> Hive Serde table
> 
>
> Key: SPARK-16942
> URL: https://issues.apache.org/jira/browse/SPARK-16942
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.1, 2.1.0
>
>
> When the source table is an EXTERNAL Hive serde table, {{CREATE TABLE LIKE}} 
> generates an EXTERNAL table, while the expected table type is MANAGED.
> Currently, we explicitly set the type to `MANAGED`, but Hive checks the table 
> property `EXTERNAL` to decide whether the table is `EXTERNAL` or not (see 
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1407-L1408).
> Thus, the created table is still `EXTERNAL`.






[jira] [Updated] (SPARK-17353) CREATE TABLE LIKE statements when Source is a VIEW

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-17353:

Fix Version/s: 2.0.1

> CREATE TABLE LIKE statements when Source is a VIEW
> --
>
> Key: SPARK-17353
> URL: https://issues.apache.org/jira/browse/SPARK-17353
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.0.1, 2.1.0
>
>
> - Add support for temporary views.
> - When the source table is a `VIEW`, the metadata of the generated table 
> contains the original view text and view original text. So far this does not 
> break anything, but it could cause problems in Hive (for example, 
> https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1405-L1406).
> - When the source table is a view, the target table uses the default 
> format for data source tables: `spark.sql.sources.default`.






[jira] [Created] (SPARK-17406) Event Timeline will be very slow when there are very executor events

2016-09-05 Thread cen yuhai (JIRA)
cen yuhai created SPARK-17406:
-

 Summary: Event Timeline will be very slow when there are very 
executor events
 Key: SPARK-17406
 URL: https://issues.apache.org/jira/browse/SPARK-17406
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.0.0, 1.6.2
Reporter: cen yuhai
Priority: Minor


The job page will be too slow to open when there are very executor events(added 
or removed). I found that in ExecutorsTab file, executorIdToData will not 
remove elements, it will increase all the time.






[jira] [Resolved] (SPARK-17279) better error message for exceptions during ScalaUDF execution

2016-09-05 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-17279.
-
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 14850
[https://github.com/apache/spark/pull/14850]

> better error message for exceptions during ScalaUDF execution
> -
>
> Key: SPARK-17279
> URL: https://issues.apache.org/jira/browse/SPARK-17279
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.1.0
>
>







[jira] [Comment Edited] (SPARK-17397) what to do when awaitTermination() throws?

2016-09-05 Thread Spiro Michaylov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466183#comment-15466183
 ] 

Spiro Michaylov edited comment on SPARK-17397 at 9/6/16 2:24 AM:
-

Apologies: I was originally going to file it as a doc bug, and perhaps I should 
have done that, although I actually suspect there's a feature enhancement to 
file as well. I guess maybe the user list was a better place to hash that out. 

I mostly had more mundane exceptions in mind, say throwing a custom exception 
when processing a stream like this fragment from my code above. 
{code}
stream
  .map(x => { throw new SomeException("something"); x} )
  .foreachRDD(r => println("*** count = " + r.count()))
{code}
I got the impression from the unit tests (StreamingContextSuite) that such 
processing exceptions were intended to propagate through awaitTermination() and 
indeed they do.  

In so far as the intended usage is orderly shutdown of the StreamingContext 
when the first such exception propagates, I agree it's just a doc issue, and 
perhaps the intended usage pattern is something like:

{code}
try {
  ssc.awaitTermination()
  println("*** streaming terminated")
} catch {
  case e: Exception =>
    ssc.stop() // ADDED THIS AS A WAY TO GET AN ORDERLY SHUTDOWN
}
{code}

But it seems like to get resilient streaming applications you'd like to have 
the option of seeing which exceptions have propagated, how severe, and how 
many, so you'd like (at least _I_ would like) something like:

{code}
var terminated = false
var nastyExceptionCount = 0
while (!terminated && nastyExceptionCount < 100) {
  try {
    ssc.awaitTermination()
    terminated = true
    println("*** streaming terminated")
  } catch {
    case ne: NastyException =>
      // log it and count it but don't panic
      nastyExceptionCount = nastyExceptionCount + 1
    case e: Exception =>
      // yawn!
  }
}
if (!terminated) ssc.stop() // because we got too many nasty exceptions
{code}

However, that doesn't seem to be supported (looks like the first exception gets 
repeated every time awaitTermination() is called), so I'm inclined to file it 
as either a bug or a feature request according to your taste, or just drop it 
if you think what I'm proposing is inappropriate.

In any case, I don't want to abuse the process more than I already have: I'm 
happy to either drop this, take it to the user list, or open other tickets if 
desired and close this one.  In fact, I notice I can still turn this into a bug 
report. Should I?



was (Author: spirom):
Apologies: I was originally going to file it as a doc bug, and perhaps I should 
have done that, although I actually suspect there's a feature enhancement to 
file as well. I guess maybe the user list was a better place to hash that out. 

I mostly had more mundane exceptions in mind, say throwing a custom exception 
when processing a stream like this fragment from my code above. 
{code}
stream
  .map(x => { throw new SomeException("something"); x} )
  .foreachRDD(r => println("*** count = " + r.count()))
{code}
I got the impression from the unit tests (StreamingContextSuite) that such 
processing exceptions were intended to propagate through awaitTermination() and 
indeed they do.  

In so far as the intended usage is orderly shutdown of the StreamingContext 
when the first such exception propagates, I agree it's just a doc issue, and 
perhaps the intended usage pattern is something like:

{code}
try {
ssc.awaitTermination()
println("*** streaming terminated")
} catch {
   case e: Exception => {
  ssc.stop() // ADDED THIS AS A WAY TO GET AN ORDERLY SHUTDOWN
  }
}
{code}

But it seems like to get resilient streaming applications you'd like to have 
the option of seeing which exceptions have propagated, how severe, and how 
many, so you'd like (at least _I_ would like) something like:

{code}
var terminated = false
var nastyExceptionCount = 0
while (!terminated && nastyExceptionCount < 100) {
  try {
  ssc.awaitTermination()
  terminated = true
  println("*** streaming terminated")
  } catch {
case ne: NastyException => {
   // log it and count it but don't panic
   nastyExceptionCount = nastyExceptionCount + 1
}
case e: Exception => {
  // yawn!
}
  }
}
if (!terminated) ssc.stop() // because we got too many nasty exceptions
{code}

However, that doesn't seem to be supported (looks like the first exception gets 
repeated every time awaitTermination() is called), so I'm inclined to file it 
as either a bug or a feature request according to your taste, or just drop it 
if you think what I'm proposing is inappropriate.

In any case, I don't want to abuse the process more than I already have: I'm 
happy to either drop this, take it to the user list, or open other tickets if 
desired and close this one.  


> what to do 

[jira] [Commented] (SPARK-17397) what to do when awaitTermination() throws?

2016-09-05 Thread Spiro Michaylov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466183#comment-15466183
 ] 

Spiro Michaylov commented on SPARK-17397:
-

Apologies: I was originally going to file it as a doc bug, and perhaps I should 
have done that, although I actually suspect there's a feature enhancement to 
file as well. I guess maybe the user list was a better place to hash that out. 

I mostly had more mundane exceptions in mind, say throwing a custom exception 
when processing a stream like this fragment from my code above. 
{code}
stream
  .map(x => { throw new SomeException("something"); x} )
  .foreachRDD(r => println("*** count = " + r.count()))
{code}
I got the impression from the unit tests (StreamingContextSuite) that such 
processing exceptions were intended to propagate through awaitTermination() and 
indeed they do.  

In so far as the intended usage is orderly shutdown of the StreamingContext 
when the first such exception propagates, I agree it's just a doc issue, and 
perhaps the intended usage pattern is something like:

{code}
try {
  ssc.awaitTermination()
  println("*** streaming terminated")
} catch {
  case e: Exception =>
    ssc.stop() // ADDED THIS AS A WAY TO GET AN ORDERLY SHUTDOWN
}
{code}

But it seems like to get resilient streaming applications you'd like to have 
the option of seeing which exceptions have propagated, how severe, and how 
many, so you'd like (at least _I_ would like) something like:

{code}
var terminated = false
var nastyExceptionCount = 0
while (!terminated && nastyExceptionCount < 100) {
  try {
    ssc.awaitTermination()
    terminated = true
    println("*** streaming terminated")
  } catch {
    case ne: NastyException =>
      // log it and count it but don't panic
      nastyExceptionCount = nastyExceptionCount + 1
    case e: Exception =>
      // yawn!
  }
}
if (!terminated) ssc.stop() // because we got too many nasty exceptions
{code}

However, that doesn't seem to be supported (looks like the first exception gets 
repeated every time awaitTermination() is called), so I'm inclined to file it 
as either a bug or a feature request according to your taste, or just drop it 
if you think what I'm proposing is inappropriate.

In any case, I don't want to abuse the process more than I already have: I'm 
happy to either drop this, take it to the user list, or open other tickets if 
desired and close this one.  


> what to do when awaitTermination() throws? 
> ---
>
> Key: SPARK-17397
> URL: https://issues.apache.org/jira/browse/SPARK-17397
> Project: Spark
>  Issue Type: Question
>  Components: Streaming
>Affects Versions: 2.0.0
> Environment: Linux, Scala but probably general
>Reporter: Spiro Michaylov
>Priority: Minor
>
> When awaitTermination propagates an exception that was thrown in processing a 
> batch, the StreamingContext keeps running. Perhaps this is by design, but I 
> don't see any mention of it in the API docs or the streaming programming 
> guide. It's not clear what idiom should be used to block the thread until the 
> context HAS been stopped in a situation where stream processing is throwing 
> lots of exceptions. 
> For example, in the following, streaming takes the full 30 seconds to 
> terminate. My hope in asking this is to improve my own understanding and 
> perhaps inspire documentation improvements. I'm not filing a bug because it's 
> not clear to me whether this is working as intended. 
> {code}
> val conf = new 
> SparkConf().setAppName("ExceptionPropagation").setMaster("local[4]")
> val sc = new SparkContext(conf)
> // streams will produce data every second
> val ssc = new StreamingContext(sc, Seconds(1))
> val qm = new QueueMaker(sc, ssc)
> // create the stream
> val stream = // create some stream
> // register for data
> stream
>   .map(x => { throw new SomeException("something"); x} )
>   .foreachRDD(r => println("*** count = " + r.count()))
> // start streaming
> ssc.start()
> new Thread("Delayed Termination") {
>   override def run() {
> Thread.sleep(30000)
> ssc.stop()
>   }
> }.start()
> println("*** producing data")
> // start producing data
> qm.populateQueue()
> try {
>   ssc.awaitTermination()
>   println("*** streaming terminated")
> } catch {
>   case e: Exception => {
> println("*** streaming exception caught in monitor thread")
>   }
> }
> // if the above goes down the exception path, there seems no 
> // good way to block here until the streaming context is stopped 
> println("*** done")
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table

2016-09-05 Thread pin_zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466180#comment-15466180
 ] 

pin_zhang commented on SPARK-17396:
---

"Thread-1902" daemon prio=6 tid=0x14078800 nid=0x3a6c runnable 
[0x38d5e000]
"Thread-1901" daemon prio=6 tid=0x0c64f800 nid=0x32fc runnable 
[0x191ef000]
"Thread-1900" daemon prio=6 tid=0x14249800 nid=0x263c runnable 
[0x4c73e000]
"Thread-1899" daemon prio=6 tid=0x14244000 nid=0x189c runnable 
[0x17c7e000]
"Thread-1898" daemon prio=6 tid=0x0d96a800 nid=0x3e54 runnable 
[0x4c5ef000]
"ForkJoinPool-120-worker-1" daemon prio=6 tid=0x1407d000 nid=0x2234 
waiting for monitor entry [0x4c31e000]
"ForkJoinPool-120-worker-3" daemon prio=6 tid=0x13a64000 nid=0x1f0c 
waiting for monitor entry [0x4c0de000]
"ForkJoinPool-120-worker-5" daemon prio=6 tid=0x13a75800 nid=0x1660 
waiting for monitor entry [0x4241e000]
"ForkJoinPool-120-worker-7" daemon prio=6 tid=0x13d6c000 nid=0x117c 
waiting for monitor entry [0x4bece000]
"ForkJoinPool-120-worker-9" daemon prio=6 tid=0x14233800 nid=0x2a20 
waiting for monitor entry [0x4bd3e000]
"ForkJoinPool-120-worker-11" daemon prio=6 tid=0x1423f800 nid=0x3568 
waiting for monitor entry [0x4afae000]
"ForkJoinPool-120-worker-13" daemon prio=6 tid=0x1424e000 nid=0x378c 
waiting for monitor entry [0x4bc0e000]
"ForkJoinPool-120-worker-15" daemon prio=6 tid=0x14238000 nid=0x1b8c 
waiting for monitor entry [0x18dfd000]
"ForkJoinPool-119-worker-1" daemon prio=6 tid=0x13d74800 nid=0x29a0 
waiting for monitor entry [0x4bade000]
"ForkJoinPool-119-worker-3" daemon prio=6 tid=0x12cd4000 nid=0x18a0 in 
Object.wait() [0x4b9ae000]
"ForkJoinPool-119-worker-7" daemon prio=6 tid=0x12cd3000 nid=0x15ec 
waiting for monitor entry [0x4b87d000]
"ForkJoinPool-119-worker-5" daemon prio=6 tid=0x13bbd800 nid=0x2c24 
waiting for monitor entry [0x4b76d000]
"ForkJoinPool-119-worker-9" daemon prio=6 tid=0x13bc9800 nid=0x3d78 
waiting for monitor entry [0x2acae000]
"ForkJoinPool-119-worker-11" daemon prio=6 tid=0x0d9eb000 nid=0x3f40 
waiting for monitor entry [0x4b57e000]
"ForkJoinPool-119-worker-13" daemon prio=6 tid=0x0d9e4800 nid=0x286c 
waiting for monitor entry [0x4b40e000]
"ForkJoinPool-119-worker-15" daemon prio=6 tid=0x0d9e9000 nid=0x2304 in 
Object.wait() [0x194de000]
"ForkJoinPool-118-worker-1" daemon prio=6 tid=0x14077000 nid=0x3a50 
runnable [0x393dd000]
"ForkJoinPool-118-worker-3" daemon prio=6 tid=0x1407a000 nid=0x1dc0 
runnable [0x2331d000]
"ForkJoinPool-118-worker-5" daemon prio=6 tid=0x0d2f9000 nid=0x2990 
runnable [0x1b6fd000]
"ForkJoinPool-118-worker-7" daemon prio=6 tid=0x0d2df800 nid=0x3bb4 
runnable [0x4a9dd000]
"ForkJoinPool-118-worker-9" daemon prio=6 tid=0x0d2f7800 nid=0x37e4 
waiting for monitor entry [0x2bf5e000]
"ForkJoinPool-118-worker-11" daemon prio=6 tid=0x12648000 nid=0x2878 
runnable [0x2b26d000]
"ForkJoinPool-118-worker-13" daemon prio=6 tid=0x12646000 nid=0x4cc 
waiting for monitor entry [0x183de000]
"ForkJoinPool-118-worker-15" daemon prio=6 tid=0x12647800 nid=0x30c8 
waiting for monitor entry [0x2bd3d000]
"ForkJoinPool-117-worker-5" daemon prio=6 tid=0x12b5c800 nid=0x3510 
waiting for monitor entry [0x4b2be000]
"ForkJoinPool-117-worker-1" daemon prio=6 tid=0x12b5d000 nid=0x36b8 
waiting for monitor entry [0x4b11e000]
"ForkJoinPool-117-worker-3" daemon prio=6 tid=0x12eac800 nid=0x32d4 in 
Object.wait() [0x4acae000]
"ForkJoinPool-117-worker-7" daemon prio=6 tid=0x12ea9800 nid=0x16c4 
waiting for monitor entry [0x4ab1e000]
"ForkJoinPool-117-worker-9" daemon prio=6 tid=0x12e9b000 nid=0x1e44 
waiting for monitor entry [0x2162e000]
"ForkJoinPool-117-worker-11" daemon prio=6 tid=0x13bcc000 nid=0x37f4 
waiting for monitor entry [0x40dee000]
"ForkJoinPool-117-worker-13" daemon prio=6 tid=0x13bcb000 nid=0x361c in 
Object.wait() [0x35dbe000]
"ForkJoinPool-117-worker-15" daemon prio=6 tid=0x13bca800 nid=0x3344 in 
Object.wait() [0x2c0ce000]
"ForkJoinPool-116-worker-1" daemon prio=6 tid=0x13bc9000 nid=0x3a34 
runnable [0x4867d000]
"ForkJoinPool-116-worker-3" daemon prio=6 tid=0x13bc8000 nid=0x1c10 in 
Object.wait() [0x4a8be000]
"ForkJoinPool-116-worker-7" daemon prio=6 tid=0x13bc7800 nid=0x2910 
waiting on condition [0x45e7f000]
"ForkJoinPool-116-worker-5" daemon prio=6 tid=0x13bc6800 nid=0x3b1c 
waiting for monitor entry 

[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R

2016-09-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466154#comment-15466154
 ] 

Yin Huai commented on SPARK-17404:
--

Thanks!

> [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
> ---
>
> Key: SPARK-17404
> URL: https://issues.apache.org/jira/browse/SPARK-17404
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.3
>Reporter: Yin Huai
>Assignee: Sun Rui
> Fix For: 1.6.3
>
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console
> Seems showDF in test_sparkSQL.R is broken. 
> {code}
> Failed 
> -
> 1. Failure: showDF() (@test_sparkSQL.R#1400) 
> ---
> `s` produced no output
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4957) TaskScheduler: when no resources are available: backoff after # of tries and crash.

2016-09-05 Thread Kay Ousterhout (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout closed SPARK-4957.
-
Resolution: Won't Fix

Closing this as Won't Fix since it's been open (with no activity) for a while.

> TaskScheduler: when no resources are available: backoff after # of tries and 
> crash.
> ---
>
> Key: SPARK-4957
> URL: https://issues.apache.org/jira/browse/SPARK-4957
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 1.1.0
>Reporter: Nathan Bijnens
>  Labels: scheduler
>
> Currently the TaskSchedulerImpl retries scheduling if there are no resources 
> available. Unfortunately it keeps retrying with a small delay. It would make 
> sense to throw an exception after a number of tries, instead of hanging 
> indefinitely. 
> https://github.com/apache/spark/blob/cb0eae3b78d7f6f56c0b9521ee48564a4967d3de/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L164-175
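
(Editorial illustration, not actual TaskSchedulerImpl code: a generic sketch of 
the behavior requested above, bounding the number of consecutive "no resources" 
rounds and failing loudly instead of warning forever. The threshold and method 
names are made up.)

{code}
// Hypothetical bookkeeping around the existing "no resources accepted" warning loop.
val maxStarvedRounds = 100 // arbitrary threshold
var starvedRounds = 0

def onNoResourcesOffered(): Unit = {
  starvedRounds += 1
  if (starvedRounds >= maxStarvedRounds) {
    throw new IllegalStateException(
      s"No executors accepted tasks after $starvedRounds scheduling rounds; giving up")
  }
}

def onResourcesOffered(): Unit = {
  starvedRounds = 0 // reset the counter once offers start being accepted again
}
{code}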



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R

2016-09-05 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-17404.
---
   Resolution: Fixed
 Assignee: Sun Rui
Fix Version/s: 1.6.3

Fixed by backporting https://github.com/apache/spark/pull/12867

> [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
> ---
>
> Key: SPARK-17404
> URL: https://issues.apache.org/jira/browse/SPARK-17404
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.3
>Reporter: Yin Huai
>Assignee: Sun Rui
> Fix For: 1.6.3
>
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console
> Seems showDF in test_sparkSQL.R is broken. 
> {code}
> Failed 
> -
> 1. Failure: showDF() (@test_sparkSQL.R#1400) 
> ---
> `s` produced no output
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17072) generate table level stats:stats generation/storing/loading

2016-09-05 Thread Zhenhua Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466082#comment-15466082
 ] 

Zhenhua Wang commented on SPARK-17072:
--

My JIRA account name is ZenWzh, thanks!

> generate table level stats:stats generation/storing/loading
> ---
>
> Key: SPARK-17072
> URL: https://issues.apache.org/jira/browse/SPARK-17072
> Project: Spark
>  Issue Type: Sub-task
>  Components: Optimizer
>Affects Versions: 2.0.0
>Reporter: Ron Hu
> Fix For: 2.1.0
>
>
> Need to generate, store, and load statistics information into/from the 
> metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17405) Simple aggregation query OOMing after SPARK-16525

2016-09-05 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-17405:
--

 Summary: Simple aggregation query OOMing after SPARK-16525
 Key: SPARK-17405
 URL: https://issues.apache.org/jira/browse/SPARK-17405
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Josh Rosen
Priority: Blocker


Prior to SPARK-16525 / https://github.com/apache/spark/pull/14176, the 
following query ran fine via Beeline / Thrift Server and the Spark shell, but 
after that patch it consistently OOMs:

{code}
CREATE TEMPORARY VIEW table_1(double_col_1, boolean_col_2, timestamp_col_3, 
smallint_col_4, boolean_col_5, int_col_6, timestamp_col_7, varchar0008_col_8, 
int_col_9, string_col_10) AS (
  SELECT * FROM (VALUES
(CAST(-147.818640624 AS DOUBLE), CAST(NULL AS BOOLEAN), 
TIMESTAMP('2012-10-19 00:00:00.0'), CAST(9 AS SMALLINT), false, 77, 
TIMESTAMP('2014-07-01 00:00:00.0'), '-945', -646, '722'),
(CAST(594.195125271 AS DOUBLE), false, TIMESTAMP('2016-12-04 00:00:00.0'), 
CAST(NULL AS SMALLINT), CAST(NULL AS BOOLEAN), CAST(NULL AS INT), 
TIMESTAMP('1999-12-26 00:00:00.0'), '250', -861, '55'),
(CAST(-454.171126363 AS DOUBLE), false, TIMESTAMP('2008-12-13 00:00:00.0'), 
CAST(NULL AS SMALLINT), false, -783, TIMESTAMP('2010-05-28 00:00:00.0'), '211', 
-959, CAST(NULL AS STRING)),
(CAST(437.670945524 AS DOUBLE), true, TIMESTAMP('2011-10-16 00:00:00.0'), 
CAST(952 AS SMALLINT), true, 297, TIMESTAMP('2013-01-13 00:00:00.0'), '262', 
CAST(NULL AS INT), '936'),
(CAST(-387.226759334 AS DOUBLE), false, TIMESTAMP('2019-10-03 00:00:00.0'), 
CAST(-496 AS SMALLINT), CAST(NULL AS BOOLEAN), -925, TIMESTAMP('2028-06-27 
00:00:00.0'), '-657', 948, '18'),
(CAST(-306.138230875 AS DOUBLE), true, TIMESTAMP('1997-10-07 00:00:00.0'), 
CAST(332 AS SMALLINT), false, 744, TIMESTAMP('1990-09-22 00:00:00.0'), '-345', 
566, '-574'),
(CAST(675.402140308 AS DOUBLE), false, TIMESTAMP('2017-06-26 00:00:00.0'), 
CAST(972 AS SMALLINT), true, CAST(NULL AS INT), TIMESTAMP('2026-06-10 
00:00:00.0'), '518', 683, '-320'),
(CAST(734.839647174 AS DOUBLE), true, TIMESTAMP('1995-06-01 00:00:00.0'), 
CAST(-792 AS SMALLINT), CAST(NULL AS BOOLEAN), CAST(NULL AS INT), 
TIMESTAMP('2021-07-11 00:00:00.0'), '-318', 564, '142')
  ) as t);

CREATE TEMPORARY VIEW table_3(string_col_1, float_col_2, timestamp_col_3, 
boolean_col_4, timestamp_col_5, decimal3317_col_6) AS (
  SELECT * FROM (VALUES
('88', CAST(191.92508 AS FLOAT), TIMESTAMP('1990-10-25 00:00:00.0'), false, 
TIMESTAMP('1992-11-02 00:00:00.0'), CAST(NULL AS DECIMAL(33,17))),
('-419', CAST(-13.477915 AS FLOAT), TIMESTAMP('1996-03-02 00:00:00.0'), 
true, CAST(NULL AS TIMESTAMP), -653.51000BD),
('970', CAST(-360.432 AS FLOAT), TIMESTAMP('2010-07-29 00:00:00.0'), false, 
TIMESTAMP('1995-09-01 00:00:00.0'), -936.48000BD),
('807', CAST(814.30756 AS FLOAT), TIMESTAMP('2019-11-06 00:00:00.0'), 
false, TIMESTAMP('1996-04-25 00:00:00.0'), 335.56000BD),
('-872', CAST(616.50525 AS FLOAT), TIMESTAMP('2011-08-28 00:00:00.0'), 
false, TIMESTAMP('2003-07-19 00:00:00.0'), -951.18000BD),
('-167', CAST(-875.35675 AS FLOAT), TIMESTAMP('1995-07-14 00:00:00.0'), 
false, TIMESTAMP('2005-11-29 00:00:00.0'), 224.89000BD)
  ) as t);

SELECT
CAST(MIN(t2.smallint_col_4) AS STRING) AS char_col,
LEAD(MAX((-387) + (727.64)), 90) OVER (PARTITION BY COALESCE(t2.int_col_9, 
t2.smallint_col_4, t2.int_col_9) ORDER BY COALESCE(t2.int_col_9, 
t2.smallint_col_4, t2.int_col_9) DESC, CAST(MIN(t2.smallint_col_4) AS STRING)) 
AS decimal_col,
COALESCE(t2.int_col_9, t2.smallint_col_4, t2.int_col_9) AS int_col
FROM table_3 t1
INNER JOIN table_1 t2 ON (((t2.timestamp_col_3) = (t1.timestamp_col_5)) AND 
((t2.string_col_10) = (t1.string_col_1))) AND ((t2.string_col_10) = 
(t1.string_col_1))
WHERE
(t2.smallint_col_4) IN (t2.int_col_9, t2.int_col_9)
GROUP BY
COALESCE(t2.int_col_9, t2.smallint_col_4, t2.int_col_9);
{code}

Here's the OOM:

{code}
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.SparkException: 
Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most 
recent failure: Lost task 1.0 in stage 1.0 (TID 9, localhost): 
java.lang.OutOfMemoryError: Unable to acquire 262144 bytes of memory, got 0
at 
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:100)
at 
org.apache.spark.unsafe.map.BytesToBytesMap.allocate(BytesToBytesMap.java:783)
at 
org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:204)
at 
org.apache.spark.unsafe.map.BytesToBytesMap.<init>(BytesToBytesMap.java:219)
at 
org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.<init>(UnsafeFixedWidthAggregationMap.java:104)
at 
org.apache.spark.sql.execution.aggregate.HashAggregateExec.createHashMap(HashAggregateExec.scala:305)
at 

[jira] [Issue Comment Deleted] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R

2016-09-05 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-17404:
--
Comment: was deleted

(was: I've kicked off 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/288/
 after the cherry-pick. Lets watch this and see if it fixes things)

> [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
> ---
>
> Key: SPARK-17404
> URL: https://issues.apache.org/jira/browse/SPARK-17404
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.3
>Reporter: Yin Huai
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console
> Seems showDF in test_sparkSQL.R is broken. 
> {code}
> Failed 
> -
> 1. Failure: showDF() (@test_sparkSQL.R#1400) 
> ---
> `s` produced no output
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R

2016-09-05 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465927#comment-15465927
 ] 

Shivaram Venkataraman commented on SPARK-17404:
---

I've kicked off 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/288/
 after the cherry-pick. Let's watch this and see if it fixes things.

> [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
> ---
>
> Key: SPARK-17404
> URL: https://issues.apache.org/jira/browse/SPARK-17404
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.3
>Reporter: Yin Huai
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console
> Seems showDF in test_sparkSQL.R is broken. 
> {code}
> Failed 
> -
> 1. Failure: showDF() (@test_sparkSQL.R#1400) 
> ---
> `s` produced no output
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R

2016-09-05 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465929#comment-15465929
 ] 

Shivaram Venkataraman commented on SPARK-17404:
---

I've kicked off 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/288/
 after the cherry-pick. Let's watch this and see if it fixes things.

> [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
> ---
>
> Key: SPARK-17404
> URL: https://issues.apache.org/jira/browse/SPARK-17404
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.3
>Reporter: Yin Huai
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console
> Seems showDF in test_sparkSQL.R is broken. 
> {code}
> Failed 
> -
> 1. Failure: showDF() (@test_sparkSQL.R#1400) 
> ---
> `s` produced no output
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R

2016-09-05 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465915#comment-15465915
 ] 

Shivaram Venkataraman commented on SPARK-17404:
---

Hmm - we upgraded the version of testthat on the cluster last week. This could 
be related to that. I'm going to try and backport
https://github.com/apache/spark/pull/12867 to branch-1.6 and see if it fixes 
things

> [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
> ---
>
> Key: SPARK-17404
> URL: https://issues.apache.org/jira/browse/SPARK-17404
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.3
>Reporter: Yin Huai
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console
> Seems showDF in test_sparkSQL.R is broken. 
> {code}
> Failed 
> -
> 1. Failure: showDF() (@test_sparkSQL.R#1400) 
> ---
> `s` produced no output
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17400) MinMaxScaler.transform() outputs DenseVector by default, which causes poor performance

2016-09-05 Thread Frank Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465845#comment-15465845
 ] 

Frank Dai commented on SPARK-17400:
---

Currently I'm haunted by this issue in production: I have 6 million users and 
564 thousand items, and MinMaxScaler.transform() takes more than 22 hours for 
prediction, while ALS.fit() is actually fast.

My cluster has 70 nodes, and each node has 16 cores and 32GB of memory.

> MinMaxScaler.transform() outputs DenseVector by default, which causes poor 
> performance
> --
>
> Key: SPARK-17400
> URL: https://issues.apache.org/jira/browse/SPARK-17400
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 1.6.1, 1.6.2, 2.0.0
>Reporter: Frank Dai
>
> MinMaxScaler.transform() outputs DenseVector by default, which will cause 
> poor performance and consume a lot of memory.
> The most important line of code is the following:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala#L195
> I suggest that the code calculate the number of non-zero elements in advance: 
> if fewer than half of the elements in the matrix are non-zero, use 
> SparseVector; otherwise use DenseVector.
> Or we can make it configurable by adding a parameter to 
> MinMaxScaler.transform(), for example MinMaxScaler.transform(isDense: 
> Boolean), so that users can decide whether their output is dense or 
> sparse.
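
(Editorial sketch of the heuristic suggested above, assuming the 2.0 ml.linalg 
vector types; this is not the actual MinMaxScaler code.)

{code}
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Choose the output representation from the scaled values' sparsity:
// fewer than half non-zero -> SparseVector, otherwise DenseVector.
def pickRepresentation(values: Array[Double]): Vector = {
  val nonZero = values.indices.filter(values(_) != 0.0)
  if (nonZero.length < values.length / 2) {
    Vectors.sparse(values.length, nonZero.map(i => (i, values(i))))
  } else {
    Vectors.dense(values)
  }
}
{code}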



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17400) MinMaxScaler.transform() outputs DenseVector by default, which causes poor performance

2016-09-05 Thread Frank Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465845#comment-15465845
 ] 

Frank Dai edited comment on SPARK-17400 at 9/5/16 9:52 PM:
---

Currently I'm blocked by this issue in production: I have 6 million users and 
564 thousand items, and MinMaxScaler.transform() takes more than 22 hours for 
prediction, while ALS.fit() is actually fast.

My cluster has 70 nodes, and each node has 16 cores and 32GB of memory.


was (Author: soulmachine):
Currently I'm haunted by this issue in production, I have 6 millions users and 
564 thousands  items and the MinMaxScaler.transform() takes more than 22 hours 
for prediction, the ALS.fit() actually is fast.

My cluster has 70 nodes and each node has 16 cores and 32GB memory.

> MinMaxScaler.transform() outputs DenseVector by default, which causes poor 
> performance
> --
>
> Key: SPARK-17400
> URL: https://issues.apache.org/jira/browse/SPARK-17400
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 1.6.1, 1.6.2, 2.0.0
>Reporter: Frank Dai
>
> MinMaxScaler.transform() outputs DenseVector by default, which will cause 
> poor performance and consume a lot of memory.
> The most important line of code is the following:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala#L195
> I suggest that the code calculate the number of non-zero elements in advance: 
> if fewer than half of the elements in the matrix are non-zero, use 
> SparseVector; otherwise use DenseVector.
> Or we can make it configurable by adding a parameter to 
> MinMaxScaler.transform(), for example MinMaxScaler.transform(isDense: 
> Boolean), so that users can decide whether their output is dense or 
> sparse.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14649) DagScheduler re-starts all running tasks on fetch failure

2016-09-05 Thread Kay Ousterhout (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout updated SPARK-14649:
---
Description: 
When a fetch failure occurs, the DAGScheduler re-launches the previous stage 
(to re-generate output that was missing), and then re-launches all tasks in the 
stage with the fetch failure that hadn't *completed* when the fetch failure 
occurred (the DAGScheduler re-launches all of the tasks whose output data is not 
available -- which is equivalent to the set of tasks that hadn't yet completed).

The assumption when this code was originally written was that when a fetch 
failure occurred, the output from at least one of the tasks in the previous 
stage was no longer available, so all of the tasks in the current stage would 
eventually fail due to not being able to access that output.  This assumption 
does not hold for some large-scale, long-running workloads.  E.g., there's one 
use case where a job has ~100k tasks that each run for about 1 hour, and only 
the first 5-10 minutes are spent fetching data.  Because of the large number of 
tasks, it's very common to see a few tasks fail in the fetch phase, and it's 
wasteful to re-run other tasks that had finished fetching data so aren't 
affected by the fetch failure (and may be most of the way through their 
hour-long execution).  The DAGScheduler should not re-start these tasks.

  was:When running a job we found out that there are many duplicate tasks 
running after fetch failure in a stage. The issue is that when submitting tasks 
for a stage, the dag scheduler submits all the pending tasks (tasks whose 
output is not available). But out of those pending tasks, some tasks might 
already be running on the cluster. The dag scheduler need to submit only 
non-running tasks for a stage. 

Summary: DagScheduler re-starts all running tasks on fetch failure  
(was: DagScheduler runs duplicate tasks on fetch failure)

> DagScheduler re-starts all running tasks on fetch failure
> -
>
> Key: SPARK-14649
> URL: https://issues.apache.org/jira/browse/SPARK-14649
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Reporter: Sital Kedia
>
> When a fetch failure occurs, the DAGScheduler re-launches the previous stage 
> (to re-generate output that was missing), and then re-launches all tasks in 
> the stage with the fetch failure that hadn't *completed* when the fetch 
> failure occurred (the DAGScheduler re-launches all of the tasks whose output 
> data is not available -- which is equivalent to the set of tasks that hadn't 
> yet completed).
> The assumption when this code was originally written was that when a fetch 
> failure occurred, the output from at least one of the tasks in the previous 
> stage was no longer available, so all of the tasks in the current stage would 
> eventually fail due to not being able to access that output.  This assumption 
> does not hold for some large-scale, long-running workloads.  E.g., there's 
> one use case where a job has ~100k tasks that each run for about 1 hour, and 
> only the first 5-10 minutes are spent fetching data.  Because of the large 
> number of tasks, it's very common to see a few tasks fail in the fetch phase, 
> and it's wasteful to re-run other tasks that had finished fetching data so 
> aren't affected by the fetch failure (and may be most of the way through 
> their hour-long execution).  The DAGScheduler should not re-start these tasks.
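
(A conceptual sketch of the proposed change, using hypothetical types rather 
than real DAGScheduler fields: on resubmission after a fetch failure, also 
exclude partitions whose task attempts are still running.)

{code}
// Hypothetical stage bookkeeping, for illustration only.
case class StageState(missingPartitions: Set[Int], runningPartitions: Set[Int])

def partitionsToResubmit(stage: StageState): Set[Int] = {
  // Today: everything whose output is missing (i.e. not yet completed).
  // Proposed: additionally skip partitions with a still-running attempt, since
  // their fetches already succeeded and they may be hours into their work.
  stage.missingPartitions -- stage.runningPartitions
}
{code}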



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17302) Cannot set non-Spark SQL session variables in hive-site.xml, spark-defaults.conf, or using --conf

2016-09-05 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465818#comment-15465818
 ] 

Sean Zhong commented on SPARK-17302:


[~rdblue] Can you post a code sample that reproduces your problem, and 
elaborate on what behavior you expect versus what you currently observe?

I am not sure whether {{spark.conf.set("key", "value")}} is what you expect or 
not.
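
(For concreteness, a minimal sketch of the runtime route mentioned above; 
whether it covers the hive-site.xml / spark-defaults.conf / --conf cases in the 
report is exactly the open question.)

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Set a Hive session property on the session's SQLConf at runtime...
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

// ...or equivalently through SQL.
spark.sql("SET hive.exec.compress.output=true")
{code}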

> Cannot set non-Spark SQL session variables in hive-site.xml, 
> spark-defaults.conf, or using --conf
> -
>
> Key: SPARK-17302
> URL: https://issues.apache.org/jira/browse/SPARK-17302
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Ryan Blue
>
> When configuration changed for 2.0 to the new SparkSession structure, Spark 
> stopped using Hive's internal HiveConf for session state and now uses 
> HiveSessionState and an associated SQLConf. Now, session options like 
> hive.exec.compress.output and hive.exec.dynamic.partition.mode are pulled 
> from this SQLConf. This doesn't include session properties from hive-site.xml 
> (including hive.exec.compress.output), and no longer contains Spark-specific 
> overrides from spark-defaults.conf that used the spark.hadoop.hive... pattern.
> Also, setting these variables on the command-line no longer works because 
> settings must start with "spark.".
> Is there a recommended way to set Hive session properties?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465808#comment-15465808
 ] 

Herman van Hovell edited comment on SPARK-17403 at 9/5/16 9:24 PM:
---

Well then we sorta know where to look. These things are notoriously hard to 
debug.


was (Author: hvanhovell):
Well then we sorta know where to look.

> Fatal Error: SIGSEGV on Jdbc joins
> --
>
> Key: SPARK-17403
> URL: https://issues.apache.org/jira/browse/SPARK-17403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Spark standalone cluster (3 Workers, 47 cores)
> Ubuntu 14
> Java 8
>Reporter: Ruben Hernando
>
> The process creates views from JDBC (SQL server) source and combines them to 
> create other views.
> Finally it dumps results via JDBC
> Error:
> {quote}
> # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 
> 1.8.0_101-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode 
> linux-amd64 )
> # Problematic frame:
> # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 
> bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
> #
> {quote}
> SQL Query plan (fields truncated):
> {noformat}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'UnresolvedRelation `COEQ_63`
> == Analyzed Logical Plan ==
> InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int
> Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, 
> priceId#20244]
> +- SubqueryAlias coeq_63
>+- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
>   +- SubqueryAlias 6__input
>  +- 
> Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON#
>  .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables]. ... [...]  FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables 
> TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where 
> _TP = 24) input)
> == Optimized Logical Plan ==
> Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
> +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
> _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>:  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
> [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], 
> [TPSLTables].[SL_ACT_GI_DTE],  ... [...] FROM [sch].[SLTables] [SLTables] 
> JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
> [SLTables].[_TableSL_SID] where _TP = 24) input) 
> [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,...
>  36 more fields] 
> == Physical Plan ==
> *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
> +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189]
>:  +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
> _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>: :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
> [SLTables].[Name], [SLTables].[TableP_DCID],  ... [...] FROM [sch].[SLTables] 
> [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
> [SLTables].[_TableSL_SID] where _TP = 24) input) 
> [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
> 36 more fields]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465808#comment-15465808
 ] 

Herman van Hovell commented on SPARK-17403:
---

Well then we sorta know where to look.

> Fatal Error: SIGSEGV on Jdbc joins
> --
>
> Key: SPARK-17403
> URL: https://issues.apache.org/jira/browse/SPARK-17403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Spark standalone cluster (3 Workers, 47 cores)
> Ubuntu 14
> Java 8
>Reporter: Ruben Hernando
>
> The process creates views from JDBC (SQL server) source and combines them to 
> create other views.
> Finally it dumps results via JDBC
> Error:
> {quote}
> # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 
> 1.8.0_101-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode 
> linux-amd64 )
> # Problematic frame:
> # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 
> bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
> #
> {quote}
> SQL Query plan (fields truncated):
> {noformat}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'UnresolvedRelation `COEQ_63`
> == Analyzed Logical Plan ==
> InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int
> Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, 
> priceId#20244]
> +- SubqueryAlias coeq_63
>+- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
>   +- SubqueryAlias 6__input
>  +- 
> Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON#
>  .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables]. ... [...]  FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables 
> TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where 
> _TP = 24) input)
> == Optimized Logical Plan ==
> Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
> +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
> _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>:  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
> [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], 
> [TPSLTables].[SL_ACT_GI_DTE],  ... [...] FROM [sch].[SLTables] [SLTables] 
> JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
> [SLTables].[_TableSL_SID] where _TP = 24) input) 
> [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,...
>  36 more fields] 
> == Physical Plan ==
> *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
> +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189]
>:  +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
> _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>: :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
> [SLTables].[Name], [SLTables].[TableP_DCID],  ... [...] FROM [sch].[SLTables] 
> [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
> [SLTables].[_TableSL_SID] where _TP = 24) input) 
> [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
> 36 more fields]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17364) Can not query hive table starting with number

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465799#comment-15465799
 ] 

Herman van Hovell commented on SPARK-17364:
---

This should work. For instance this works: {{SELECT * from 20160826_ip_list}}


> Can not query hive table starting with number
> -
>
> Key: SPARK-17364
> URL: https://issues.apache.org/jira/browse/SPARK-17364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Egor Pahomov
>
> I can do it with spark-1.6.2
> {code}
> SELECT * from  temp.20160826_ip_list limit 100
> {code}
> {code}
> Error: org.apache.spark.sql.catalyst.parser.ParseException: 
> extraneous input '.20160826' expecting {<EOF>, ',', 'SELECT', 'FROM', 'ADD', 
> 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 
> 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 
> 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 
> 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 
> 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 
> 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 
> 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 
> 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 
> 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 
> 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 
> 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 
> 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 
> 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 
> 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 
> 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 
> 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 
> 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 
> 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 
> 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, 
> DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 
> 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 
> 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 
> 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 
> 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 19)
> == SQL ==
> SELECT * from  temp.20160826_ip_list limit 100
> ---^^^
> SQLState:  null
> ErrorCode: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R

2016-09-05 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465781#comment-15465781
 ] 

Yin Huai commented on SPARK-17404:
--

cc [~felixcheung] and [~shivaram]

> [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
> ---
>
> Key: SPARK-17404
> URL: https://issues.apache.org/jira/browse/SPARK-17404
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.3
>Reporter: Yin Huai
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console
> Seems showDF in test_sparkSQL.R is broken. 
> {code}
> Failed 
> -
> 1. Failure: showDF() (@test_sparkSQL.R#1400) 
> ---
> `s` produced no output
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17404) [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R

2016-09-05 Thread Yin Huai (JIRA)
Yin Huai created SPARK-17404:


 Summary: [BRANCH-1.6] Broken test: showDF in test_sparkSQL.R
 Key: SPARK-17404
 URL: https://issues.apache.org/jira/browse/SPARK-17404
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 1.6.3
Reporter: Yin Huai


https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-branch-1.6-test-sbt-hadoop-2.2/286/console

Seems showDF in test_sparkSQL.R is broken. 

{code}
Failed -
1. Failure: showDF() (@test_sparkSQL.R#1400) ---
`s` produced no output
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17364) Can not query hive table starting with number

2016-09-05 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465754#comment-15465754
 ] 

Sean Zhong edited comment on SPARK-17364 at 9/5/16 8:47 PM:


[~epahomov] 

Spark 2.0 rewrote the SQL parser with ANTLR. 
You can use the backticked version, as in Yuming's example.


was (Author: clockfly):
[~epahomov] 

Spark 2.0 rewrites the Sql parser with antlr. 
You can use the backticked version like Yuming's example.

> Can not query hive table starting with number
> -
>
> Key: SPARK-17364
> URL: https://issues.apache.org/jira/browse/SPARK-17364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Egor Pahomov
>
> I can do it with spark-1.6.2
> {code}
> SELECT * from  temp.20160826_ip_list limit 100
> {code}
> {code}
> Error: org.apache.spark.sql.catalyst.parser.ParseException: 
> extraneous input '.20160826' expecting {<EOF>, ',', 'SELECT', 'FROM', 'ADD', 
> 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 
> 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 
> 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 
> 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 
> 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 
> 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 
> 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 
> 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 
> 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 
> 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 
> 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 
> 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 
> 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 
> 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 
> 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 
> 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 
> 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 
> 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 
> 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, 
> DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 
> 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 
> 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 
> 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 
> 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 19)
> == SQL ==
> SELECT * from  temp.20160826_ip_list limit 100
> ---^^^
> SQLState:  null
> ErrorCode: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17364) Can not query hive table starting with number

2016-09-05 Thread Sean Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465754#comment-15465754
 ] 

Sean Zhong commented on SPARK-17364:


[~epahomov] 

Spark 2.0 rewrites the SQL parser with ANTLR. 
You can use the backticked version, as in Yuming's example.
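
(For concreteness, the backticked form referred to above, via the 2.0 shell's 
{{spark}} session:)

{code}
spark.sql("SELECT * FROM temp.`20160826_ip_list` LIMIT 100")
{code}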

> Can not query hive table starting with number
> -
>
> Key: SPARK-17364
> URL: https://issues.apache.org/jira/browse/SPARK-17364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Egor Pahomov
>
> I can do it with spark-1.6.2
> {code}
> SELECT * from  temp.20160826_ip_list limit 100
> {code}
> {code}
> Error: org.apache.spark.sql.catalyst.parser.ParseException: 
> extraneous input '.20160826' expecting {<EOF>, ',', 'SELECT', 'FROM', 'ADD', 
> 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 
> 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 
> 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 
> 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 
> 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 
> 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 
> 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 
> 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 
> 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 
> 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 
> 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 
> 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 
> 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 
> 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 
> 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 
> 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 
> 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 
> 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 
> 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, 
> DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 
> 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 
> 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 
> 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 
> 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 19)
> == SQL ==
> SELECT * from  temp.20160826_ip_list limit 100
> ---^^^
> SQLState:  null
> ErrorCode: 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins

2016-09-05 Thread Ruben Hernando (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465539#comment-15465539
 ] 

Ruben Hernando commented on SPARK-17403:


You're right. After removing cache(), the errors disappeared.

> Fatal Error: SIGSEGV on Jdbc joins
> --
>
> Key: SPARK-17403
> URL: https://issues.apache.org/jira/browse/SPARK-17403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Spark standalone cluster (3 Workers, 47 cores)
> Ubuntu 14
> Java 8
>Reporter: Ruben Hernando
>
> The process creates views from JDBC (SQL server) source and combines them to 
> create other views.
> Finally it dumps results via JDBC
> Error:
> {quote}
> # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 
> 1.8.0_101-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode 
> linux-amd64 )
> # Problematic frame:
> # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 
> bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
> #
> {quote}
> SQL Query plan (fields truncated):
> {noformat}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'UnresolvedRelation `COEQ_63`
> == Analyzed Logical Plan ==
> InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int
> Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, 
> priceId#20244]
> +- SubqueryAlias coeq_63
>+- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
>   +- SubqueryAlias 6__input
>  +- 
> Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON#
>  .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables]. ... [...]  FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables 
> TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where 
> _TP = 24) input)
> == Optimized Logical Plan ==
> Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
> +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
> _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>:  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
> [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], 
> [TPSLTables].[SL_ACT_GI_DTE],  ... [...] FROM [sch].[SLTables] [SLTables] 
> JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
> [SLTables].[_TableSL_SID] where _TP = 24) input) 
> [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,...
>  36 more fields] 
> == Physical Plan ==
> *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
> +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189]
>:  +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
> _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>: :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
> [SLTables].[Name], [SLTables].[TableP_DCID],  ... [...] FROM [sch].[SLTables] 
> [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
> [SLTables].[_TableSL_SID] where _TP = 24) input) 
> [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
> 36 more fields]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465383#comment-15465383
 ] 

Herman van Hovell commented on SPARK-17403:
---

[~rhernando] Could you try this without caching, and see if you run into 
trouble?
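
(A rough sketch of the suggested experiment, with illustrative connection 
details and the query truncated as in the plan above; the only change under 
test is whether the JDBC-backed relation is cached before being reused.)

{code}
import java.util.Properties

val jdbcUrl = "jdbc:sqlserver://..." // illustrative
val props = new Properties()         // user/password etc.
val query = "(select [SLTables].[_TableSL_SID], ... from [sch].[SLTables] ...) input"

val df = spark.read.jdbc(jdbcUrl, query, props)
// df.cache() // leave this out for the experiment; caching is the suspected trigger
df.createOrReplaceTempView("COEQ_63")

spark.sql("SELECT * FROM COEQ_63").write.jdbc(jdbcUrl, "results", props)
{code}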

> Fatal Error: SIGSEGV on Jdbc joins
> --
>
> Key: SPARK-17403
> URL: https://issues.apache.org/jira/browse/SPARK-17403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Spark standalone cluster (3 Workers, 47 cores)
> Ubuntu 14
> Java 8
>Reporter: Ruben Hernando
>
> The process creates views from JDBC (SQL server) source and combines them to 
> create other views.
> Finally it dumps results via JDBC
> Error:
> {quote}
> # JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 
> 1.8.0_101-b13)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode 
> linux-amd64 )
> # Problematic frame:
> # J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 
> bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
> #
> {quote}
> SQL Query plan (fields truncated):
> {noformat}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'UnresolvedRelation `COEQ_63`
> == Analyzed Logical Plan ==
> InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int
> Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, 
> priceId#20244]
> +- SubqueryAlias coeq_63
>+- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
>   +- SubqueryAlias 6__input
>  +- 
> Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON#
>  .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables]. ... [...]  FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables 
> TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where 
> _TP = 24) input)
> == Optimized Logical Plan ==
> Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
> +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
> _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>:  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
> [SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], 
> [TPSLTables].[SL_ACT_GI_DTE],  ... [...] FROM [sch].[SLTables] [SLTables] 
> JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
> [SLTables].[_TableSL_SID] where _TP = 24) input) 
> [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,...
>  36 more fields] 
> == Physical Plan ==
> *Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
> price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
> +- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189]
>:  +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
> _TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
> StorageLevel(disk, memory, deserialized, 1 replicas)
>: :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
> [SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
> [SLTables].[Name], [SLTables].[TableP_DCID],  ... [...] FROM [sch].[SLTables] 
> [SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
> [SLTables].[_TableSL_SID] where _TP = 24) input) 
> [_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
> 36 more fields]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table

2016-09-05 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465375#comment-15465375
 ] 

Ryan Blue commented on SPARK-17396:
---

I'm not sure that the ForkJoinPool itself is to blame. Each partition in a Hive 
table becomes a per-partition RDD in the union, and each UnionRDD creates a new 
ForkJoinPool (parallelism=8) to list the files in a partition in parallel. I 
think the more likely problem is that we don't use a shared ForkJoinPool but 
instead create a new one for each RDD, and that pool isn't cleaned up until the 
UnionRDD itself is cleaned. Even if a ForkJoinPool can't reuse threads and we 
get more than 8 per pool, the thread count wouldn't grow to thousands if we 
weren't creating so many pools. That said, I agree that we should consider a 
different executor service. I've been reading up on ForkJoinPool and it doesn't 
look like a great implementation; even determining the maximum number of 
threads it will use is painfully under-documented.

[~pin_zhang], can you post a list of the thread names? We can use the names to 
determine how many threads there are in each pool, which should tell us whether 
we need a different executor service in addition to a shared pool.
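
(Not actual Spark code - just a rough sketch of the two ideas above, i.e. one 
shared pool instead of a pool per UnionRDD, and grouping live thread names to 
count the threads in each pool; the object and variable names are made up.)

{code}
import scala.collection.JavaConverters._
import scala.concurrent.forkjoin.ForkJoinPool

object ParallelListing {
  // A single shared pool for all UnionRDDs rather than a fresh ForkJoinPool
  // per RDD; parallelism=8 mirrors the value mentioned above.
  lazy val sharedPool = new ForkJoinPool(8)
}

// Rough way to see how many threads each pool owns in a running JVM:
// group live thread names by prefix, dropping the trailing worker index.
val threadsPerPool = Thread.getAllStackTraces.keySet.asScala
  .map(_.getName)
  .groupBy(_.replaceAll("-\\d+$", ""))
  .mapValues(_.size)

threadsPerPool.foreach { case (pool, n) => println(s"$pool: $n") }
{code}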

> Threads number keep increasing when query on external CSV partitioned table
> ---
>
> Key: SPARK-17396
> URL: https://issues.apache.org/jira/browse/SPARK-17396
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: pin_zhang
>
> 1. Create an external partitioned table with CSV row format
> 2. Add 16 partitions to the table
> 3. Run SQL "select count(*) from test_csv"
> 4. The number of ForkJoinThreads keeps increasing 
> This happens when the number of table partitions is greater than 10.
> 5. Test Code
> {code:lang=java}
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.hive.HiveContext
> object Bugs {
>   def main(args: Array[String]): Unit = {
> val location = "file:///g:/home/test/csv"
> val create = s"""CREATE   EXTERNAL  TABLE  test_csv
>  (ID string,  SEQ string )
>   PARTITIONED BY(index int)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
>   LOCATION "${location}" 
>   """
> val add_part = s"""
>   ALTER TABLE test_csv ADD 
>   PARTITION (index=1)LOCATION '${location}/index=1'
>   PARTITION (index=2)LOCATION '${location}/index=2'
>   PARTITION (index=3)LOCATION '${location}/index=3'
>   PARTITION (index=4)LOCATION '${location}/index=4'
>   PARTITION (index=5)LOCATION '${location}/index=5'
>   PARTITION (index=6)LOCATION '${location}/index=6'
>   PARTITION (index=7)LOCATION '${location}/index=7'
>   PARTITION (index=8)LOCATION '${location}/index=8'
>   PARTITION (index=9)LOCATION '${location}/index=9'
>   PARTITION (index=10)LOCATION '${location}/index=10'
>   PARTITION (index=11)LOCATION '${location}/index=11'
>   PARTITION (index=12)LOCATION '${location}/index=12'
>   PARTITION (index=13)LOCATION '${location}/index=13'
>   PARTITION (index=14)LOCATION '${location}/index=14'
>   PARTITION (index=15)LOCATION '${location}/index=15'
>   PARTITION (index=16)LOCATION '${location}/index=16'
> """
> val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
> conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
> val ctx = new SparkContext(conf)
> val hctx = new HiveContext(ctx)
> hctx.sql(create)
> hctx.sql(add_part)
>  for (i <- 1 to 6) {
>   new Query(hctx).start()
> }
>   }
>   class Query(htcx: HiveContext) extends Thread {
> setName("Query-Thread")
> override def run = {
>   while (true) {
> htcx.sql("select count(*) from test_csv").show()
> Thread.sleep(100)
>   }
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins

2016-09-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-17403:
--
Description: 
The process creates views from JDBC (SQL server) source and combines them to 
create other views.
Finally it dumps results via JDBC

Error:
{quote}

# JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 
1.8.0_101-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 
)
# Problematic frame:
# J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 
bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
#

{quote}

SQL Query plan (fields truncated):

{noformat}
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `COEQ_63`

== Analyzed Logical Plan ==
InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int
Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, 
priceId#20244]
+- SubqueryAlias coeq_63
   +- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
  +- SubqueryAlias 6__input
 +- 
Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON#
 .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables]. ... [...]  FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables 
TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP 
= 24) input)

== Optimized Logical Plan ==
Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
+- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, 
ID#146, Name#147, ... 36 more fields], true, 1, StorageLevel(disk, memory, 
deserialized, 1 replicas)
   :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
[SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], 
[TPSLTables].[SL_ACT_GI_DTE],  ... [...] FROM [sch].[SLTables] [SLTables] JOIN 
sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
[SLTables].[_TableSL_SID] where _TP = 24) input) 
[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,...
 36 more fields] 

== Physical Plan ==
*Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
+- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189]
   :  +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
_TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
StorageLevel(disk, memory, deserialized, 1 replicas)
   : :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
[SLTables].[Name], [SLTables].[TableP_DCID],  ... [...] FROM [sch].[SLTables] 
[SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
[SLTables].[_TableSL_SID] where _TP = 24) input) 
[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
36 more fields]
{noformat}

  was:
The process creates views from JDBC (SQL server) source and combines them to 
create other views.
Finally it dumps results via JDBC

Error:
{quote}

# JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 
1.8.0_101-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 
)
# Problematic frame:
# J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 
bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
#

{quote}

SQL Query plan (fields truncated):

{quote}
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `COEQ_63`

== Analyzed Logical Plan ==
InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int
Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, 
priceId#20244]
+- SubqueryAlias coeq_63
   +- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
  +- SubqueryAlias 6__input
 +- 
Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON#
 .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables]. ... [...]  FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables 
TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP 
= 24) input)

== Optimized Logical Plan ==

[jira] [Updated] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table

2016-09-05 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated SPARK-17396:
--
Description: 
1. Create an external partitioned table with CSV row format
2. Add 16 partitions to the table
3. Run SQL "select count(*) from test_csv"
4. The number of ForkJoinThreads keeps increasing 
This happens when the number of table partitions is greater than 10.
5. Test Code

{code:lang=java}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object Bugs {

  def main(args: Array[String]): Unit = {

val location = "file:///g:/home/test/csv"
val create = s"""CREATE   EXTERNAL  TABLE  test_csv
 (ID string,  SEQ string )
  PARTITIONED BY(index int)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  LOCATION "${location}" 
  """
val add_part = s"""
  ALTER TABLE test_csv ADD 
  PARTITION (index=1)LOCATION '${location}/index=1'
  PARTITION (index=2)LOCATION '${location}/index=2'
  PARTITION (index=3)LOCATION '${location}/index=3'
  PARTITION (index=4)LOCATION '${location}/index=4'
  PARTITION (index=5)LOCATION '${location}/index=5'
  PARTITION (index=6)LOCATION '${location}/index=6'
  PARTITION (index=7)LOCATION '${location}/index=7'
  PARTITION (index=8)LOCATION '${location}/index=8'
  PARTITION (index=9)LOCATION '${location}/index=9'
  PARTITION (index=10)LOCATION '${location}/index=10'
  PARTITION (index=11)LOCATION '${location}/index=11'
  PARTITION (index=12)LOCATION '${location}/index=12'
  PARTITION (index=13)LOCATION '${location}/index=13'
  PARTITION (index=14)LOCATION '${location}/index=14'
  PARTITION (index=15)LOCATION '${location}/index=15'
  PARTITION (index=16)LOCATION '${location}/index=16'
"""

val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
val ctx = new SparkContext(conf)
val hctx = new HiveContext(ctx)
hctx.sql(create)
hctx.sql(add_part)
 for (i <- 1 to 6) {
  new Query(hctx).start()
}
  }

  class Query(htcx: HiveContext) extends Thread {

setName("Query-Thread")

override def run = {
  while (true) {
htcx.sql("select count(*) from test_csv").show()
Thread.sleep(100)
  }

}
  }
}
{code}

  was:
1. Create a external partitioned table row format CSV
2. Add 16 partitions to the table
3. Run SQL "select count(*) from test_csv"
4. ForkJoinThread number keep increasing 
This happend when table partitions number greater than 10.
5. Test Code
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object Bugs {

  def main(args: Array[String]): Unit = {

val location = "file:///g:/home/test/csv"
val create = s"""CREATE   EXTERNAL  TABLE  test_csv
 (ID string,  SEQ string )
  PARTITIONED BY(index int)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  LOCATION "${location}" 
  """
val add_part = s"""
  ALTER TABLE test_csv ADD 
  PARTITION (index=1)LOCATION '${location}/index=1'
  PARTITION (index=2)LOCATION '${location}/index=2'
  PARTITION (index=3)LOCATION '${location}/index=3'
  PARTITION (index=4)LOCATION '${location}/index=4'
  PARTITION (index=5)LOCATION '${location}/index=5'
  PARTITION (index=6)LOCATION '${location}/index=6'
  PARTITION (index=7)LOCATION '${location}/index=7'
  PARTITION (index=8)LOCATION '${location}/index=8'
  PARTITION (index=9)LOCATION '${location}/index=9'
  PARTITION (index=10)LOCATION '${location}/index=10'
  PARTITION (index=11)LOCATION '${location}/index=11'
  PARTITION (index=12)LOCATION '${location}/index=12'
  PARTITION (index=13)LOCATION '${location}/index=13'
  PARTITION (index=14)LOCATION '${location}/index=14'
  PARTITION (index=15)LOCATION '${location}/index=15'
  PARTITION (index=16)LOCATION '${location}/index=16'
"""

val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
val ctx = new SparkContext(conf)
val hctx = new HiveContext(ctx)
hctx.sql(create)
hctx.sql(add_part)
 for (i <- 1 to 6) {
  new Query(hctx).start()
}
  }

  class Query(htcx: HiveContext) extends Thread {

setName("Query-Thread")

override def run = {
  while (true) {
htcx.sql("select count(*) from test_csv").show()
Thread.sleep(100)
  }

}
  }
}


> Threads number keep increasing when query on external CSV partitioned table
> ---
>
> Key: SPARK-17396
> URL: https://issues.apache.org/jira/browse/SPARK-17396
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects 

[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Ruben Hernando (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465353#comment-15465353
 ] 

Ruben Hernando commented on SPARK-15822:


Created https://issues.apache.org/jira/browse/SPARK-17403

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17403) Fatal Error: SIGSEGV on Jdbc joins

2016-09-05 Thread Ruben Hernando (JIRA)
Ruben Hernando created SPARK-17403:
--

 Summary: Fatal Error: SIGSEGV on Jdbc joins
 Key: SPARK-17403
 URL: https://issues.apache.org/jira/browse/SPARK-17403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
 Environment: Spark standalone cluster (3 Workers, 47 cores)
Ubuntu 14
Java 8

Reporter: Ruben Hernando


The process creates views from JDBC (SQL server) source and combines them to 
create other views.
Finally it dumps results via JDBC

Error:
{quote}

# JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 
1.8.0_101-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 
)
# Problematic frame:
# J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 
bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
#

{quote}

SQL Query plan (fields truncated):

{quote}
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `COEQ_63`

== Analyzed Logical Plan ==
InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int
Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, 
priceId#20244]
+- SubqueryAlias coeq_63
   +- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
  +- SubqueryAlias 6__input
 +- 
Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON#
 .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables]. ... [...]  FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables 
TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP 
= 24) input)

== Optimized Logical Plan ==
Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
+- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, 
ID#146, Name#147, ... 36 more fields], true, 1, StorageLevel(disk, memory, 
deserialized, 1 replicas)
   :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
[SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], 
[TPSLTables].[SL_ACT_GI_DTE],  ... [...] FROM [sch].[SLTables] [SLTables] JOIN 
sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
[SLTables].[_TableSL_SID] where _TP = 24) input) 
[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,...
 36 more fields] 

== Physical Plan ==
*Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
+- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189]
   :  +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
_TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
StorageLevel(disk, memory, deserialized, 1 replicas)
   : :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
[SLTables].[Name], [SLTables].[TableP_DCID],  ... [...] FROM [sch].[SLTables] 
[SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
[SLTables].[_TableSL_SID] where _TP = 24) input) 
[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
36 more fields]
{quote}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Ruben Hernando (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465342#comment-15465342
 ] 

Ruben Hernando commented on SPARK-15822:


Thank you!

I'll do it

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465338#comment-15465338
 ] 

Herman van Hovell commented on SPARK-15822:
---

Could you open a new ticket for this one? It is definitely not connected to 
this or the broadcast join issue.

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465335#comment-15465335
 ] 

Herman van Hovell commented on SPARK-15822:
---

Thanks! Do you also have this problem when you don't cache the table?

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Ruben Hernando (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465316#comment-15465316
 ] 

Ruben Hernando commented on SPARK-15822:


{quote}
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `COEQ_63`

== Analyzed Logical Plan ==
InstanceId: bigint, price: double, ZoneId: int, priceItemId: int, priceId: int
Project [InstanceId#20236L, price#20237, ZoneId#20239, priceItemId#20242, 
priceId#20244]
+- SubqueryAlias coeq_63
   +- Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
  +- SubqueryAlias 6__input
 +- 
Relation[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,SL_Xcl_C#152,SL_Css_Cojs#153L,SL_Config#154,SL_CREATEDON#
 .. 36 more fields] JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables]. ... [...]  FROM [sch].[SLTables] [SLTables] JOIN sch.TPSLTables 
TPSLTables ON [TPSLTables].[_TableSL_SID] = [SLTables].[_TableSL_SID] where _TP 
= 24) input)

== Optimized Logical Plan ==
Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
+- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, _TableSH_SID#145L, 
ID#146, Name#147, ... 36 more fields], true, 1, StorageLevel(disk, memory, 
deserialized, 1 replicas)
   :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
[SLTables].[Name], [SLTables].[TableP_DCID], [SLTables].[TableSHID], 
[TPSLTables].[SL_ACT_GI_DTE],  ... [...] FROM [sch].[SLTables] [SLTables] JOIN 
sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
[SLTables].[_TableSL_SID] where _TP = 24) input) 
[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,TableP_DCID#148,TableSHID#149,SL_ACT_GI_DTE#150,SL_Xcl_C#151,...
 36 more fields] 

== Physical Plan ==
*Project [_TableSL_SID#143L AS InstanceId#20236L, SL_RD_ColR_N#189 AS 
price#20237, 24 AS ZoneId#20239, 6 AS priceItemId#20242, 63 AS priceId#20244]
+- InMemoryTableScan [_TableSL_SID#143L, SL_RD_ColR_N#189]
   :  +- InMemoryRelation [_TableSL_SID#143L, _TableP_DC_SID#144L, 
_TableSH_SID#145L, ID#146, Name#147, ... 36 more fields], true, 1, 
StorageLevel(disk, memory, deserialized, 1 replicas)
   : :  +- *Scan JDBCRelation((select [SLTables].[_TableSL_SID], 
[SLTables].[_TableP_DC_SID], [SLTables].[_TableSH_SID], [SLTables].[ID], 
[SLTables].[Name], [SLTables].[TableP_DCID],  ... [...] FROM [sch].[SLTables] 
[SLTables] JOIN sch.TPSLTables TPSLTables ON [TPSLTables].[_TableSL_SID] = 
[SLTables].[_TableSL_SID] where _TP = 24) input) 
[_TableSL_SID#143L,_TableP_DC_SID#144L,_TableSH_SID#145L,ID#146,Name#147,,... 
36 more fields]
{quote}

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  

[jira] [Commented] (SPARK-16992) Pep8 code style

2016-09-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465314#comment-15465314
 ] 

Apache Spark commented on SPARK-16992:
--

User 'Stibbons' has created a pull request for this issue:
https://github.com/apache/spark/pull/14963

> Pep8 code style
> ---
>
> Key: SPARK-16992
> URL: https://issues.apache.org/jira/browse/SPARK-16992
> Project: Spark
>  Issue Type: Improvement
>Reporter: Semet
>
> Add code style checks and auto-formatting to the Python code.
> Features:
> - add an {{.editorconfig}} file (Spark's Scala files use 2-space indentation, 
> while Python files use 4) for compatible editors (almost every editor has a 
> plugin that supports {{.editorconfig}} files)
> - use autopep8 to fix basic pep8 mistakes
> - use isort to automatically sort {{import}} statements and organise them 
> into a logically linked order (see the isort docs). The most important thing 
> is that it splits import statements that load more than one object onto 
> several lines, and it keeps the imports sorted. In other words, for a given 
> module import, the line where it should be added is fixed. This increases the 
> number of lines in the file, but it greatly simplifies file maintenance and 
> file merges if needed.
> - add a 'validate.sh' script to automate the correction (requires isort and 
> autopep8 to be installed)
> You can see a similar script in production in the 
> [Buildbot|https://github.com/buildbot/buildbot/blob/master/common/validate.sh]
>  project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17072) generate table level stats:stats generation/storing/loading

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465257#comment-15465257
 ] 

Herman van Hovell edited comment on SPARK-17072 at 9/5/16 3:36 PM:
---

This is resolved by PR https://github.com/apache/spark/pull/14712 by Zhenhua 
Wang.

If someone knows his JIRA account name, I would be happy to assign it to him.


was (Author: hvanhovell):
This is resolve by PR https://github.com/apache/spark/pull/14712 by Zhenhua 
Wang.

If someone knows his JIRA account name, I would be happy to assign it to him.

> generate table level stats:stats generation/storing/loading
> ---
>
> Key: SPARK-17072
> URL: https://issues.apache.org/jira/browse/SPARK-17072
> Project: Spark
>  Issue Type: Sub-task
>  Components: Optimizer
>Affects Versions: 2.0.0
>Reporter: Ron Hu
> Fix For: 2.1.0
>
>
> We need to generate, store, and load statistics information into/from the 
> metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17072) generate table level stats:stats generation/storing/loading

2016-09-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-17072.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

This is resolved by PR https://github.com/apache/spark/pull/14712 by Zhenhua 
Wang.

If someone knows his JIRA account name, I would be happy to assign it to him.
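
As a quick reference, a hedged sketch of how the new table-level statistics are 
exercised end to end (the SQL below reflects the ANALYZE TABLE syntax added by 
that PR; the table name is illustrative):

{code}
// Generate and store table-level stats in the metastore.
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS")

// Loading: the stored stats are read back at planning time and also show up
// in the table's extended description.
spark.sql("DESCRIBE EXTENDED sales").show(truncate = false)
{code}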

> generate table level stats:stats generation/storing/loading
> ---
>
> Key: SPARK-17072
> URL: https://issues.apache.org/jira/browse/SPARK-17072
> Project: Spark
>  Issue Type: Sub-task
>  Components: Optimizer
>Affects Versions: 2.0.0
>Reporter: Ron Hu
> Fix For: 2.1.0
>
>
> We need to generate, store, and load statistics information into/from the 
> metastore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17154) Wrong result can be returned or AnalysisException can be thrown after self-join or similar operations

2016-09-05 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465226#comment-15465226
 ] 

Wenchen Fan commented on SPARK-17154:
-

Can you check whether your design satisfies the following requirements?

1. The self-join issue should be resolved (or at least produce a better error message)
2. It should be possible to reference a same-name column from different DataFrames, e.g.
{code}
val df1 = ...
val df2 = ...
val joined = df1.join(df2, (df1("a") + 1) === df2("a"))
joined.drop(df2("a"))
{code}
3. It should be possible to reference a column from a previous DataFrame, e.g.
{code}
val df = ...
df.filter(df("a") > 0).filter(df("b") > 0).filter(df("c") === 1)
{code}

> Wrong result can be returned or AnalysisException can be thrown after 
> self-join or similar operations
> -
>
> Key: SPARK-17154
> URL: https://issues.apache.org/jira/browse/SPARK-17154
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Kousuke Saruta
> Attachments: Solution_Proposal_SPARK-17154.pdf
>
>
> When we join two DataFrames that originate from the same DataFrame, 
> operations on the joined DataFrame can fail.
> One reproducible example is as follows.
> {code}
> val df = Seq(
>   (1, "a", "A"),
>   (2, "b", "B"),
>   (3, "c", "C"),
>   (4, "d", "D"),
>   (5, "e", "E")).toDF("col1", "col2", "col3")
>   val filtered = df.filter("col1 != 3").select("col1", "col2")
>   val joined = filtered.join(df, filtered("col1") === df("col1"), "inner")
>   val selected1 = joined.select(df("col3"))
> {code}
> In this case, AnalysisException is thrown.
> Another example is as follows.
> {code}
> val df = Seq(
>   (1, "a", "A"),
>   (2, "b", "B"),
>   (3, "c", "C"),
>   (4, "d", "D"),
>   (5, "e", "E")).toDF("col1", "col2", "col3")
>   val filtered = df.filter("col1 != 3").select("col1", "col2")
>   val rightOuterJoined = filtered.join(df, filtered("col1") === df("col1"), 
> "right")
>   val selected2 = rightOuterJoined.select(df("col1"))
>   selected2.show
> {code}
> In this case, we would expect to get the following answer.
> {code}
> 1
> 2
> 3
> 4
> 5
> {code}
> But the actual result is as follows.
> {code}
> 1
> 2
> null
> 4
> 5
> {code}
> The cause of the problems in these examples is that the logical plan of the 
> right-side DataFrame and the expressions of its output are re-created in the 
> analyzer (in the ResolveReferences rule) when a DataFrame has expressions 
> that share the same exprId.
> The re-created expressions are equal to the original ones except for the 
> exprId.
> This happens when we do a self-join or a similar pattern of operations.
> In the first example, df("col3") returns a Column that wraps an expression, 
> and that expression has an exprId (say id1).
> After the join, the expression held by the right-side DataFrame (df) is 
> re-created; the old and new expressions are equal except that the exprId is 
> renewed (say id2 for the new exprId).
> Because of the mismatch between those exprIds, an AnalysisException is thrown.
> In the second example, df("col1") returns a Column whose expression is 
> assigned an exprId (say id3).
> On the other hand, the Column returned by filtered("col1") has an expression 
> with the same exprId (id3).
> After the join, the expressions in the right-side DataFrame are re-created, 
> so the expression assigned id3 is no longer present on the right side but is 
> still present on the left side.
> So, when df("col1") is applied to the joined DataFrame, we actually get col1 
> of the left (filtered) side, which includes null.
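
(An aside, not part of the report: a hedged sketch of the usual workaround 
until this is fixed - giving one side of the self-join an explicit alias so 
column references stay unambiguous; the names follow the example above.)

{code}
import org.apache.spark.sql.functions.col

// Alias the right-hand side and refer to its columns via the alias
// instead of via df(...), which avoids the ambiguous exprId lookup.
val right = df.alias("r")
val joined = filtered.join(right, filtered("col1") === col("r.col1"), "inner")
val selected = joined.select(col("r.col3"))
{code}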



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17399) Update specific data in mySql Table from Spark

2016-09-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-17399.
---
  Resolution: Invalid
Target Version/s:   (was: 2.0.0)

Please direct questions to the u...@spark.apache.org list, but you'd also 
need to be clearer about how this relates to Spark, as you're asking mostly 
about updating MySQL.

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
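
(For context on the last point: Spark's DataFrame JDBC writer only appends to 
or overwrites tables, so a row-level UPDATE against MySQL is usually issued 
directly over JDBC from the driver - a hedged sketch, with placeholder 
connection details and table name:)

{code}
import java.sql.DriverManager

val conn = DriverManager.getConnection(
  "jdbc:mysql://host:3306/db", "user", "password")  // placeholders
try {
  // Plain JDBC update of the single row; not expressible as a DataFrame write.
  val stmt = conn.prepareStatement("UPDATE people SET Name = ? WHERE ID = ?")
  stmt.setString(1, "NewName")
  stmt.setInt(2, 1)
  stmt.executeUpdate()
} finally {
  conn.close()
}
{code}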

> Update specific data in mySql Table from Spark
> --
>
> Key: SPARK-17399
> URL: https://issues.apache.org/jira/browse/SPARK-17399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.0.0
> Environment: Linux 
>Reporter: Farman Ali
>  Labels: features
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> I want to update specific data in a MySQL table from Spark DataFrames or RDDs. 
> For example, I have the columns
> Name   ID
> Farman 1
> Ali    2
> Now I want to update Name where ID = 1. How can I update this specific 
> record? How is it possible in Spark?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17401) Scala IDE not able to connect to Spark Master

2016-09-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-17401:
--
Target Version/s:   (was: 2.0.0)
Priority: Major  (was: Blocker)

See https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark -- 
don't set blocker/target

> Scala IDE not able to connect to Spark Master
> -
>
> Key: SPARK-17401
> URL: https://issues.apache.org/jira/browse/SPARK-17401
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Deploy, Examples
>Affects Versions: 2.0.0
> Environment: Windows (x64 Bit)
>Reporter: Sachin Gupta
>
> We are able to compile the Spark project in Scala IDE. We've started the 
> master using "spark-shell", which works fine from the console, with the 
> master reachable at http://localhost:4040. 
> When we execute the project from the IDE, it says it can't connect to port 
> 4040 and tries 4041. On the master console (DOS), the following error is 
> shown:
> 16/09/05 17:40:35 WARN HttpParser: Illegal character 0x0 in state=START for 
> buff
> er 
> HeapByteBuffer@773c8742[p=1,l=1302,c=16384,r=1301]={\x00<<<\x00\x00\x00\x00\x
> 00\x05\x16\x03N\x8d\xFcA\xE1\xDb.\x16\x00...\x00\r192.168.13.61>>>\x00\x00\x00\x
> 00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x0
> 0\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 16/09/05 17:40:35 WARN HttpParser: badMessage: 400 Illegal character 0x0 for 
> Htt
> pChannelOverHttp@660d0260{r=0,c=false,a=IDLE,uri=}
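
(A note on the symptom, offered as a hedged guess rather than a diagnosis: port 
4040 is the application web UI and only speaks HTTP, so application code 
normally targets the standalone master's spark:// URL instead - roughly like 
the sketch below, with a placeholder host.)

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("scala-ide-test")
  // Use the spark://host:7077 URL shown at the top of the master web UI,
  // not the 4040/4041 ports, which serve the monitoring UI over HTTP.
  .master("spark://master-host:7077")   // placeholder host
  .getOrCreate()
{code}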



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17402) separate the management of temp views and metastore tables/views in SessionCatalog

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17402:


Assignee: Wenchen Fan  (was: Apache Spark)

> separate the management of temp views and metastore tables/views in 
> SessionCatalog
> --
>
> Key: SPARK-17402
> URL: https://issues.apache.org/jira/browse/SPARK-17402
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17402) separate the management of temp views and metastore tables/views in SessionCatalog

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17402:


Assignee: Apache Spark  (was: Wenchen Fan)

> separate the management of temp views and metastore tables/views in 
> SessionCatalog
> --
>
> Key: SPARK-17402
> URL: https://issues.apache.org/jira/browse/SPARK-17402
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17402) separate the management of temp views and metastore tables/views in SessionCatalog

2016-09-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15465085#comment-15465085
 ] 

Apache Spark commented on SPARK-17402:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/14962

> separate the management of temp views and metastore tables/views in 
> SessionCatalog
> --
>
> Key: SPARK-17402
> URL: https://issues.apache.org/jira/browse/SPARK-17402
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17402) separate the management of temp views and metastore tables/views in SessionCatalog

2016-09-05 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-17402:
---

 Summary: separate the management of temp views and metastore 
tables/views in SessionCatalog
 Key: SPARK-17402
 URL: https://issues.apache.org/jira/browse/SPARK-17402
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)

2016-09-05 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-17379:
--
Assignee: Adam Roberts

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --
>
> Key: SPARK-17379
> URL: https://issues.apache.org/jira/browse/SPARK-17379
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Adam Roberts
>Assignee: Adam Roberts
>Priority: Trivial
>
> We should use the newest version of netty based on info here: 
> http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html, especially 
> interested in the static initialiser deadlock fix: 
> https://github.com/netty/netty/pull/5730
> Lots more fixes mentioned so will create the pull request - again a case of 
> updating the pom and then the dependency files



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17379:


Assignee: (was: Apache Spark)

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --
>
> Key: SPARK-17379
> URL: https://issues.apache.org/jira/browse/SPARK-17379
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Adam Roberts
>Priority: Trivial
>
> We should use the newest version of netty based on info here: 
> http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html, especially 
> interested in the static initialiser deadlock fix: 
> https://github.com/netty/netty/pull/5730
> Lots more fixes mentioned so will create the pull request - again a case of 
> updating the pom and then the dependency files



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)

2016-09-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464976#comment-15464976
 ] 

Apache Spark commented on SPARK-17379:
--

User 'a-roberts' has created a pull request for this issue:
https://github.com/apache/spark/pull/14961

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --
>
> Key: SPARK-17379
> URL: https://issues.apache.org/jira/browse/SPARK-17379
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Adam Roberts
>Priority: Trivial
>
> We should use the newest version of Netty based on the info here: 
> http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html; we are especially 
> interested in the static initialiser deadlock fix: 
> https://github.com/netty/netty/pull/5730
> Many more fixes are mentioned, so I will create the pull request - again a case 
> of updating the pom and then the dependency files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17379:


Assignee: Apache Spark

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --
>
> Key: SPARK-17379
> URL: https://issues.apache.org/jira/browse/SPARK-17379
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Adam Roberts
>Assignee: Apache Spark
>Priority: Trivial
>
> We should use the newest version of Netty based on the info here: 
> http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html; we are especially 
> interested in the static initialiser deadlock fix: 
> https://github.com/netty/netty/pull/5730
> Many more fixes are mentioned, so I will create the pull request - again a case 
> of updating the pom and then the dependency files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464962#comment-15464962
 ] 

Herman van Hovell commented on SPARK-15822:
---

Could you share the query plan? 
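As a rough sketch of how to grab it (the view names and the df variable below are 
placeholders, not from this report):

{code}
// `spark` is the existing SparkSession; the view names are placeholders.
val df = spark.table("some_view_a").join(spark.table("some_view_b"), "id")

// Prints the parsed, analyzed, optimized and physical plans to stdout.
df.explain(true)

// Or capture the physical plan as a string to attach to the issue.
val planText = df.queryExecution.executedPlan.toString
println(planText)
{code}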

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)

2016-09-05 Thread Adam Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464925#comment-15464925
 ] 

Adam Roberts commented on SPARK-17379:
--

Thanks Artur, useful to know; no new tests fail with branch-1.6 and the above 
change either.

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --
>
> Key: SPARK-17379
> URL: https://issues.apache.org/jira/browse/SPARK-17379
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Adam Roberts
>Priority: Trivial
>
> We should use the newest version of Netty based on the info here: 
> http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html; we are especially 
> interested in the static initialiser deadlock fix: 
> https://github.com/netty/netty/pull/5730
> Many more fixes are mentioned, so I will create the pull request - again a case 
> of updating the pom and then the dependency files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)

2016-09-05 Thread Adam Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Roberts updated SPARK-17379:
-
Affects Version/s: 1.6.2
  Component/s: Build

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --
>
> Key: SPARK-17379
> URL: https://issues.apache.org/jira/browse/SPARK-17379
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Adam Roberts
>Priority: Trivial
>
> We should use the newest version of Netty based on the info here: 
> http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html; we are especially 
> interested in the static initialiser deadlock fix: 
> https://github.com/netty/netty/pull/5730
> Many more fixes are mentioned, so I will create the pull request - again a case 
> of updating the pom and then the dependency files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16213) Reduce runtime overhead of a program that creates a primitive array in DataFrame

2016-09-05 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-16213:
-
Affects Version/s: 2.0.0

> Reduce runtime overhead of a program that creates a primitive array in 
> DataFrame
> -
>
> Key: SPARK-16213
> URL: https://issues.apache.org/jira/browse/SPARK-16213
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Kazuaki Ishizaki
>
> Reduce runtime overhead of a program that creates a primitive array in 
> DataFrame.
> When a program creates an array in a DataFrame, the code generator emits 
> boxing operations. If the array is of a primitive type, there are 
> opportunities to optimize the generated code and reduce runtime overhead.
> Here is a simple example whose generated code contains boxing operations:
> {code}
> val df = sparkContext.parallelize(Seq(0.0d, 1.0d), 1).toDF
> df.selectExpr("Array(value + 1.1d, value + 2.2d)").show
> {code}
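As a rough illustration of the boxing cost described above (plain Scala, not the 
actual generated code; the variable names are made up):

{code}
// Plain-Scala sketch of the difference, not Spark's generated code.
val values = Array(0.0d, 1.0d)

// Boxed: storing into Array[Any] wraps every Double in a java.lang.Double.
val boxed: Array[Any] = values.map(v => (v + 1.1d).asInstanceOf[Any])

// Primitive: the result stays an Array[Double], no per-element allocation.
val primitive: Array[Double] = values.map(_ + 1.1d)
{code}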



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17401) Scala IDE not able to connect to Spark Master

2016-09-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464899#comment-15464899
 ] 

Sean Owen commented on SPARK-17401:
---

4040 is normally where the Spark UI binds, not the master. I think you have 
something mixed up, where something is talking to the UI when it expects to 
talk to the master, or vice versa. Run your master on some other port to rule 
out this mix-up.
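As a sketch (the host and app name below are placeholders): in standalone mode the 
application should point at the master's RPC URL, which defaults to port 7077, 
while 4040 is only the web UI.

{code}
import org.apache.spark.sql.SparkSession

// Placeholder host/app name; spark://host:7077 is the standalone master's RPC
// endpoint, whereas http://host:4040 is just the application web UI.
val spark = SparkSession.builder()
  .appName("ide-test")
  .master("spark://localhost:7077")
  .getOrCreate()
{code}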

> Scala IDE not able to connect to Spark Master
> -
>
> Key: SPARK-17401
> URL: https://issues.apache.org/jira/browse/SPARK-17401
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Deploy, Examples
>Affects Versions: 2.0.0
> Environment: Windows (x64 Bit)
>Reporter: Sachin Gupta
>Priority: Blocker
>
> We are able to compile the Spark project in Scala IDE. We've started the 
> master using "spark-shell", which works fine from the console, with the master 
> reachable at http://localhost:4040. 
> When we execute the project from the IDE it says it can't connect to port 4040 
> and tries 4041. On the master console (DOS) the following error is shown:
> 16/09/05 17:40:35 WARN HttpParser: Illegal character 0x0 in state=START for buffer 
> HeapByteBuffer@773c8742[p=1,l=1302,c=16384,r=1301]={\x00<<<\x00\x00\x00\x00\x
> 00\x05\x16\x03N\x8d\xFcA\xE1\xDb.\x16\x00...\x00\r192.168.13.61>>>\x00\x00\x00\x
> 00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x0
> 0\x00\x00\x00\x00\x00\x00\x00\x00\x00}
> 16/09/05 17:40:35 WARN HttpParser: badMessage: 400 Illegal character 0x0 for HttpChannelOverHttp@660d0260{r=0,c=false,a=IDLE,uri=}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17401) Scala IDE not able to connect to Spark Master

2016-09-05 Thread Sachin Gupta (JIRA)
Sachin Gupta created SPARK-17401:


 Summary: Scala IDE not able to connect to Spark Master
 Key: SPARK-17401
 URL: https://issues.apache.org/jira/browse/SPARK-17401
 Project: Spark
  Issue Type: Bug
  Components: Build, Deploy, Examples
Affects Versions: 2.0.0
 Environment: Windows (x64 Bit)
Reporter: Sachin Gupta
Priority: Blocker


We are able to compile the Spark project in Scala IDE. We've started the master 
using "spark-shell", which works fine from the console, with the master 
reachable at http://localhost:4040. 

When we execute the project from the IDE it says it can't connect to port 4040 
and tries 4041. On the master console (DOS) the following error is shown:

16/09/05 17:40:35 WARN HttpParser: Illegal character 0x0 in state=START for buffer 
HeapByteBuffer@773c8742[p=1,l=1302,c=16384,r=1301]={\x00<<<\x00\x00\x00\x00\x
00\x05\x16\x03N\x8d\xFcA\xE1\xDb.\x16\x00...\x00\r192.168.13.61>>>\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00...\x00\x00\x00\x00\x00\x0
0\x00\x00\x00\x00\x00\x00\x00\x00\x00}
16/09/05 17:40:35 WARN HttpParser: badMessage: 400 Illegal character 0x0 for HttpChannelOverHttp@660d0260{r=0,c=false,a=IDLE,uri=}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Ruben Hernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruben Hernando updated SPARK-15822:
---
Comment: was deleted

(was: I'm doing joins between SQL views. Data source is JDBC to SQL server)

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Ruben Hernando (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464860#comment-15464860
 ] 

Ruben Hernando commented on SPARK-15822:


I'm doing joins between SQL views. Data source is JDBC to SQL server

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Ruben Hernando (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464861#comment-15464861
 ] 

Ruben Hernando commented on SPARK-15822:


I'm doing joins between SQL views. Data source is JDBC to SQL server

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17339) Fix SparkR tests on Windows

2016-09-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464815#comment-15464815
 ] 

Apache Spark commented on SPARK-17339:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/14960

> Fix SparkR tests on Windows
> ---
>
> Key: SPARK-17339
> URL: https://issues.apache.org/jira/browse/SPARK-17339
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Reporter: Shivaram Venkataraman
>
> A number of SparkR tests are currently failing when run on Windows, as discussed 
> in https://github.com/apache/spark/pull/14743
> The list of tests that fail right now is at 
> https://gist.github.com/shivaram/7693df7bd54dc81e2e7d1ce296c41134
> A full log from a build and test on AppVeyor is at 
> https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17339) Fix SparkR tests on Windows

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17339:


Assignee: Apache Spark

> Fix SparkR tests on Windows
> ---
>
> Key: SPARK-17339
> URL: https://issues.apache.org/jira/browse/SPARK-17339
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Reporter: Shivaram Venkataraman
>Assignee: Apache Spark
>
> A number of SparkR tests are currently failing when run on Windows, as discussed 
> in https://github.com/apache/spark/pull/14743
> The list of tests that fail right now is at 
> https://gist.github.com/shivaram/7693df7bd54dc81e2e7d1ce296c41134
> A full log from a build and test on AppVeyor is at 
> https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17339) Fix SparkR tests on Windows

2016-09-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17339:


Assignee: (was: Apache Spark)

> Fix SparkR tests on Windows
> ---
>
> Key: SPARK-17339
> URL: https://issues.apache.org/jira/browse/SPARK-17339
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Reporter: Shivaram Venkataraman
>
> A number of SparkR tests are currently failing when run on Windows, as discussed 
> in https://github.com/apache/spark/pull/14743
> The list of tests that fail right now is at 
> https://gist.github.com/shivaram/7693df7bd54dc81e2e7d1ce296c41134
> A full log from a build and test on AppVeyor is at 
> https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464808#comment-15464808
 ] 

Herman van Hovell commented on SPARK-15822:
---

[~rhernando] It should not be related to this one. Are you doing a broadcast 
hash join by any chance? 
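One way to check, as a rough sketch (the SQL below is a placeholder for the real 
join, and `spark` stands in for the existing SparkSession): disable broadcast 
joins and rerun.

{code}
// Diagnostic sketch only; setting the threshold to -1 disables broadcast hash joins.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

// Placeholder for the failing join between the two JDBC-backed views.
spark.sql("SELECT * FROM view_a a JOIN view_b b ON a.id = b.id").count()
{code}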

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464809#comment-15464809
 ] 

Herman van Hovell commented on SPARK-15822:
---

[~rhernando] It should not be related to this one. Are you doing a broadcast 
hash join by any chance? 

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-15822:
--
Comment: was deleted

(was: [~rhernando] It should not be related to this one. Are you doing a 
broadcast hash join by any chance? )

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15822) segmentation violation in o.a.s.unsafe.types.UTF8String

2016-09-05 Thread Ruben Hernando (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464793#comment-15464793
 ] 

Ruben Hernando commented on SPARK-15822:


Encountered a similar error (using 2.0.0):

{quote}#  SIGSEGV (0xb) at pc=0x7fbb355dfd6c, pid=5592, 
tid=0x7faa9f2f4700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 
1.8.0_101-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode linux-amd64 
)
# Problematic frame:
# J 4895 C1 org.apache.spark.unsafe.Platform.getLong(Ljava/lang/Object;J)J (9 
bytes) @ 0x7fbb355dfd6c [0x7fbb355dfd60+0xc]
#
{quote}

I'm not sure if it's related

> segmentation violation in o.a.s.unsafe.types.UTF8String 
> 
>
> Key: SPARK-15822
> URL: https://issues.apache.org/jira/browse/SPARK-15822
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: linux amd64
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>Reporter: Pete Robbins
>Assignee: Herman van Hovell
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Executors fail with segmentation violation while running application with
> spark.memory.offHeap.enabled true
> spark.memory.offHeap.size 512m
> Also now reproduced with 
> spark.memory.offHeap.enabled false
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7f4559b4d4bd, pid=14182, tid=139935319750400
> #
> # JRE version: OpenJDK Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # J 4816 C2 
> org.apache.spark.unsafe.types.UTF8String.compareTo(Lorg/apache/spark/unsafe/types/UTF8String;)I
>  (64 bytes) @ 0x7f4559b4d4bd [0x7f4559b4d460+0x5d]
> {noformat}
> We initially saw this on IBM java on PowerPC box but is recreatable on linux 
> with OpenJDK. On linux with IBM Java 8 we see a null pointer exception at the 
> same code point:
> {noformat}
> 16/06/08 11:14:58 ERROR Executor: Exception in task 1.0 in stage 5.0 (TID 48)
> java.lang.NullPointerException
>   at 
> org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:831)
>   at org.apache.spark.unsafe.types.UTF8String.compare(UTF8String.java:844)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.findNextInnerJoinRows$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$doExecute$2$$anon$2.hasNext(WholeStageCodegenExec.scala:377)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
>   at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:664)
>   at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1365)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1362)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:757)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.lang.Thread.run(Thread.java:785)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics

2016-09-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464744#comment-15464744
 ] 

Sean Owen commented on SPARK-17381:
---

That's it then. It's 'normal' operation, but it does mean you need much bigger 
driver memory if you want to store state for all of these stages and tasks (or 
else keep fewer of them).

No, it's not data that's somehow only supposed to be on the executor. I 
commented on this above. 
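A rough sketch of the kind of settings involved (the values are illustrative 
only, not recommendations):

{code}
import org.apache.spark.SparkConf

// Illustrative values; keeping less UI state bounds the driver-side bookkeeping.
val conf = new SparkConf()
  .set("spark.ui.retainedJobs", "100")           // default 1000
  .set("spark.ui.retainedStages", "100")         // default 1000
  .set("spark.sql.ui.retainedExecutions", "10")  // default 1000
{code}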

> Memory leak  org.apache.spark.sql.execution.ui.SQLTaskMetrics
> -
>
> Key: SPARK-17381
> URL: https://issues.apache.org/jira/browse/SPARK-17381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: EMR 5.0.0 (submitted as yarn-client)
> Java Version  1.8.0_101 (Oracle Corporation)
> Scala Version version 2.11.8
> Problem also happens when I run locally with similar versions of java/scala. 
> OS: Ubuntu 16.04
>Reporter: Joao Duarte
>
> I am running a Spark Streaming application from a Kinesis stream. After some 
> hours running it gets out of memory. After a driver heap dump I found two 
> problems:
> 1) huge amount of org.apache.spark.sql.execution.ui.SQLTaskMetrics (It seems 
> this was a problem before: 
> https://issues.apache.org/jira/browse/SPARK-11192);
> To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak just 
> needed to run the code below:
> {code}
> val dstream = ssc.union(kinesisStreams)
> dstream.foreachRDD((streamInfo: RDD[Array[Byte]]) => {
>   val toyDF = streamInfo.map(_ =>
> (1, "data","more data "
> ))
> .toDF("Num", "Data", "MoreData" )
>   toyDF.agg(sum("Num")).first().get(0)
> }
> )
> {code}
> 2) huge amount of Array[Byte] (9Gb+)
> After some analysis, I noticed that most of the Array[Byte] were being 
> referenced by objects that were being referenced by SQLTaskMetrics. The 
> strangest thing is that those Array[Byte] were basically text that was 
> loaded in the executors, so they should never be in the driver at all!
> Still could not replicate the 2nd problem with simple code (the original 
> was complex with data coming from S3, DynamoDB and other databases). However, 
> when I debug the application I can see that in Executor.scala, during 
> reportHeartBeat(),  the data that should not be sent to the driver is being 
> added to "accumUpdates" which, as I understand, will be sent to the driver 
> for reporting.
> To be more precise, one of the taskRunners in the loop "for (taskRunner <- 
> runningTasks.values().asScala)"  contains a GenericInternalRow with a lot of 
> data that should not go to the driver. The path would be in my case: 
> taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if 
> not the same) to the data I see when I do a driver heap dump. 
> I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is 
> fixed I would have less of this undesirable data in the driver and I could 
> run my streaming app for a long period of time, but I think there will always 
> be some performance lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics

2016-09-05 Thread Joao Duarte (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464740#comment-15464740
 ] 

Joao Duarte edited comment on SPARK-17381 at 9/5/16 11:00 AM:
--

Thank you for your suggestion [~srowen]! Setting 
spark.sql.ui.retainedExecutions to a low number seems to be a good workaround. 
I'm running my application for about one hour with 
spark.sql.ui.retainedExecutions=10 and the number of SQLTaskMetrics objects and 
the heap memory size of the driver seem to stabilise. I'll give an update at 
the end of the day or tomorrow to tell you if it remains stable.
However, I think it is really strange that the driver is sent data that is 
supposed to be only in the executors. In my case, I am parsing HTML pages and 
some of those are being sent to the driver as part of ColumnStats (as you 
referred to in your previous comment). Are they being sent as a summary by 
mistake? Does Spark really need this kind of information? The workaround enables 
my application to run, but sending unneeded data to the driver certainly reduces 
performance (some HTML pages I parse can be really big). 

Cheers


was (Author: joaomaiaduarte):
Hi Sean,

Thank you for your suggestion! Setting spark.sql.ui.retainedExecutions to a low 
number seems to be a good workaround. I'm running my application for about one 
hour with spark.sql.ui.retainedExecutions=10 and the number of SQLTaskMetrics 
objects and the heap memory size of the driver seem to stabilise. I'll give an 
update at the end of the day or tomorrow to tell you if it remains stable.
However, I think it is really strange that the driver is sent data that is 
supposed to be only in the executors. In my case, I am parsing HTML pages and 
some of those are being sent to the driver as part of ColumnStats (as you 
referred to in your previous comment). Are they being sent as a summary by 
mistake? Does Spark really need this kind of information? The workaround enables 
my application to run, but sending unneeded data to the driver certainly reduces 
performance (some HTML pages I parse can be really big). 

Cheers

> Memory leak  org.apache.spark.sql.execution.ui.SQLTaskMetrics
> -
>
> Key: SPARK-17381
> URL: https://issues.apache.org/jira/browse/SPARK-17381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: EMR 5.0.0 (submitted as yarn-client)
> Java Version  1.8.0_101 (Oracle Corporation)
> Scala Version version 2.11.8
> Problem also happens when I run locally with similar versions of java/scala. 
> OS: Ubuntu 16.04
>Reporter: Joao Duarte
>
> I am running a Spark Streaming application from a Kinesis stream. After some 
> hours running it gets out of memory. After a driver heap dump I found two 
> problems:
> 1) huge amount of org.apache.spark.sql.execution.ui.SQLTaskMetrics (It seems 
> this was a problem before: 
> https://issues.apache.org/jira/browse/SPARK-11192);
> To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak just 
> needed to run the code below:
> {code}
> val dstream = ssc.union(kinesisStreams)
> dstream.foreachRDD((streamInfo: RDD[Array[Byte]]) => {
>   val toyDF = streamInfo.map(_ =>
> (1, "data","more data "
> ))
> .toDF("Num", "Data", "MoreData" )
>   toyDF.agg(sum("Num")).first().get(0)
> }
> )
> {code}
> 2) huge amount of Array[Byte] (9Gb+)
> After some analysis, I noticed that most of the Array[Byte] were being 
> referenced by objects that were being referenced by SQLTaskMetrics. The 
> strangest thing is that those Array[Byte] were basically text that was 
> loaded in the executors, so they should never be in the driver at all!
> Still could not replicate the 2nd problem with simple code (the original 
> was complex with data coming from S3, DynamoDB and other databases). However, 
> when I debug the application I can see that in Executor.scala, during 
> reportHeartBeat(),  the data that should not be sent to the driver is being 
> added to "accumUpdates" which, as I understand, will be sent to the driver 
> for reporting.
> To be more precise, one of the taskRunners in the loop "for (taskRunner <- 
> runningTasks.values().asScala)"  contains a GenericInternalRow with a lot of 
> data that should not go to the driver. The path would be in my case: 
> taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if 
> not the same) to the data I see when I do a driver heap dump. 
> I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is 
> fixed I would have less of this undesirable data in the driver and I could 
> run my streaming app for a long period of time, but I think there will always 
> be some performance lost.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics

2016-09-05 Thread Joao Duarte (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464740#comment-15464740
 ] 

Joao Duarte commented on SPARK-17381:
-

Hi Sean,

Thank you for your suggestion! Setting spark.sql.ui.retainedExecutions to a low 
number seems to be a good workaround. I'm running my application for about one 
hour with spark.sql.ui.retainedExecutions=10 and the number of SQLTaskMetrics 
objects and the heap memory size of the driver seem to stabilise. I'll give an 
update at the end of the day or tomorrow to tell you if it remains stable.
However, I think it is really strange that the driver is sent data that is 
supposed to be only in the executors. In my case, I am parsing HTML pages and 
some of those are being sent to the driver as part of ColumnStats (as you 
referred to in your previous comment). Are they being sent as a summary by 
mistake? Does Spark really need this kind of information? The workaround enables 
my application to run, but sending unneeded data to the driver certainly reduces 
performance (some HTML pages I parse can be really big). 

Cheers

> Memory leak  org.apache.spark.sql.execution.ui.SQLTaskMetrics
> -
>
> Key: SPARK-17381
> URL: https://issues.apache.org/jira/browse/SPARK-17381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: EMR 5.0.0 (submitted as yarn-client)
> Java Version  1.8.0_101 (Oracle Corporation)
> Scala Version version 2.11.8
> Problem also happens when I run locally with similar versions of java/scala. 
> OS: Ubuntu 16.04
>Reporter: Joao Duarte
>
> I am running a Spark Streaming application from a Kinesis stream. After some 
> hours running it gets out of memory. After a driver heap dump I found two 
> problems:
> 1) huge amount of org.apache.spark.sql.execution.ui.SQLTaskMetrics (It seems 
> this was a problem before: 
> https://issues.apache.org/jira/browse/SPARK-11192);
> To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak just 
> needed to run the code below:
> {code}
> val dstream = ssc.union(kinesisStreams)
> dstream.foreachRDD((streamInfo: RDD[Array[Byte]]) => {
>   val toyDF = streamInfo.map(_ =>
> (1, "data","more data "
> ))
> .toDF("Num", "Data", "MoreData" )
>   toyDF.agg(sum("Num")).first().get(0)
> }
> )
> {code}
> 2) huge amount of Array[Byte] (9Gb+)
> After some analysis, I noticed that most of the Array[Byte] were being 
> referenced by objects that were being referenced by SQLTaskMetrics. The 
> strangest thing is that those Array[Byte] were basically text that was 
> loaded in the executors, so they should never be in the driver at all!
> Still could not replicate the 2nd problem with simple code (the original 
> was complex with data coming from S3, DynamoDB and other databases). However, 
> when I debug the application I can see that in Executor.scala, during 
> reportHeartBeat(),  the data that should not be sent to the driver is being 
> added to "accumUpdates" which, as I understand, will be sent to the driver 
> for reporting.
> To be more precise, one of the taskRunner in the loop "for (taskRunner <- 
> runningTasks.values().asScala)"  contains a GenericInternalRow with a lot of 
> data that should not go to the driver. The path would be in my case: 
> taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if 
> not the same) to the data I see when I do a driver heap dump. 
> I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is 
> fixed I would have less of this undesirable data in the driver and I could 
> run my streaming app for a long period of time, but I think there will always 
> be some performance lost.
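
For anyone wanting to run the reproduction quoted above outside the original Kinesis setup, the sketch below is a self-contained variant under assumed substitutions: a local master and a queueStream of dummy byte arrays stand in for the Kinesis streams, and the object name and batch interval are arbitrary.

{code}
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable

object SqlTaskMetricsLeakRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("leak-repro").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    val spark = SparkSession.builder.config(conf).getOrCreate()
    import spark.implicits._

    // Stand-in for the Kinesis streams: a queue of small dummy RDDs.
    val queue = mutable.Queue(ssc.sparkContext.makeRDD(Seq(Array[Byte](1, 2, 3))))
    val dstream = ssc.queueStream(queue, oneAtATime = false)

    dstream.foreachRDD { (streamInfo: RDD[Array[Byte]]) =>
      // Every micro-batch creates a DataFrame and runs an aggregation, which
      // registers a SQL execution (and its SQLTaskMetrics) in the driver UI state.
      val toyDF = streamInfo
        .map(_ => (1, "data", "more data"))
        .toDF("Num", "Data", "MoreData")
      toyDF.agg(sum("Num")).first().get(0)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}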



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics

2016-09-05 Thread Joao Duarte (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joao Duarte updated SPARK-17381:

Description: 
I am running a Spark Streaming application from a Kinesis stream. After a few 
hours of running it runs out of memory. After taking a driver heap dump I found 
two problems:
1) a huge amount of org.apache.spark.sql.execution.ui.SQLTaskMetrics (it seems 
this was a problem before: 
https://issues.apache.org/jira/browse/SPARK-11192);

To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak I just 
needed to run the code below:

{code}
val dstream = ssc.union(kinesisStreams)
dstream.foreachRDD { (streamInfo: RDD[Array[Byte]]) =>
  val toyDF = streamInfo
    .map(_ => (1, "data", "more data "))
    .toDF("Num", "Data", "MoreData")
  toyDF.agg(sum("Num")).first().get(0)
}
{code}


2) a huge amount of Array[Byte] (9 GB+)

After some analysis, I noticed that most of the Array[Byte] instances were 
being referenced by objects that were in turn referenced by SQLTaskMetrics. The 
strangest thing is that those Array[Byte] were basically text that was loaded 
in the executors, so they should never be in the driver at all!

I still could not replicate the second problem with simple code (the original 
was complex, with data coming from S3, DynamoDB and other databases). However, 
when I debug the application I can see that in Executor.scala, during 
reportHeartBeat(), the data that should not be sent to the driver is being 
added to "accumUpdates", which, as I understand, will be sent to the driver for 
reporting.

To be more precise, one of the taskRunners in the loop "for (taskRunner <- 
runningTasks.values().asScala)" contains a GenericInternalRow with a lot of 
data that should not go to the driver. In my case the path is: 
taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if 
not the same) to the data I see when I do a driver heap dump.

I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is 
fixed I would have less of this undesirable data in the driver and could run my 
streaming app for a long period of time, but I think there will always be some 
performance loss.





  was:
I am running a Spark Streaming application from a Kinesis stream. After a few 
hours of running it runs out of memory. After taking a driver heap dump I found 
two problems:
1) a huge amount of org.apache.spark.sql.execution.ui.SQLTaskMetrics (it seems 
this was a problem before: 
https://issues.apache.org/jira/browse/SPARK-11192);

To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak I just 
needed to run the code below:

{code}
val dstream = ssc.union(kinesisStreams)
dstream.foreachRDD { (streamInfo: RDD[Array[Byte]]) =>
  //load data
  val toyDF = streamInfo
    .map(_ => (1, "data", "more data "))
    .toDF("Num", "Data", "MoreData")
  toyDF.agg(sum("Num")).first().get(0)
}
{code}


2) a huge amount of Array[Byte] (9 GB+)

After some analysis, I noticed that most of the Array[Byte] instances were 
being referenced by objects that were in turn referenced by SQLTaskMetrics. The 
strangest thing is that those Array[Byte] were basically text that was loaded 
in the executors, so they should never be in the driver at all!

I still could not replicate the second problem with simple code (the original 
was complex, with data coming from S3, DynamoDB and other databases). However, 
when I debug the application I can see that in Executor.scala, during 
reportHeartBeat(), the data that should not be sent to the driver is being 
added to "accumUpdates", which, as I understand, will be sent to the driver for 
reporting.

To be more precise, one of the taskRunners in the loop "for (taskRunner <- 
runningTasks.values().asScala)" contains a GenericInternalRow with a lot of 
data that should not go to the driver. In my case the path is: 
taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if 
not the same) to the data I see when I do a driver heap dump.

I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is 
fixed I would have less of this undesirable data in the driver and could run my 
streaming app for a long period of time, but I think there will always be some 
performance loss.






> Memory leak  org.apache.spark.sql.execution.ui.SQLTaskMetrics
> -
>
> Key: SPARK-17381
> URL: https://issues.apache.org/jira/browse/SPARK-17381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: EMR 5.0.0 (submitted as yarn-client)
> Java Version 1.8.0_101 (Oracle Corporation)
> Scala Version 2.11.8
> Problem also happens when I run locally with similar versions of java/scala. 
> OS: Ubuntu 16.04
>

[jira] [Commented] (SPARK-17396) Threads number keep increasing when query on external CSV partitioned table

2016-09-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464627#comment-15464627
 ] 

Sean Owen commented on SPARK-17396:
---

[~rdblue] what do you think of this -- I think there's a good point here that 
ForkJoinPool won't limit the number of threads it uses. If that's true, then 
this would probably need to switch to a normal Java executor service that can 
cap the number of threads.
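
To make that suggestion concrete, here is a minimal, self-contained sketch of the kind of capped pool being proposed: a fixed-size Java executor service exposed as a Scala ExecutionContext, so the thread count stays bounded no matter how many tasks are submitted. The pool size, thread-name prefix and object name are illustrative; this is not the actual Spark change.

{code}
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object BoundedListingPoolSketch {
  private val threadCount = new AtomicInteger(0)

  private val threadFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"bounded-listing-pool-${threadCount.incrementAndGet()}")
      t.setDaemon(true)
      t
    }
  }

  // A plain fixed-size pool with a hard cap of 8 threads, in contrast to a
  // ForkJoinPool, which may add compensation threads when tasks block.
  private val pool = Executors.newFixedThreadPool(8, threadFactory)
  private implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  def main(args: Array[String]): Unit = {
    // Submit far more tasks than threads; at most 8 ever run concurrently.
    val work = Future.sequence((1 to 100).map(i => Future(i * i)))
    println(Await.result(work, 30.seconds).sum)
    pool.shutdown()
  }
}
{code}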

> Threads number keep increasing when query on external CSV partitioned table
> ---
>
> Key: SPARK-17396
> URL: https://issues.apache.org/jira/browse/SPARK-17396
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: pin_zhang
>
> 1. Create an external partitioned table with CSV row format
> 2. Add 16 partitions to the table
> 3. Run SQL "select count(*) from test_csv"
> 4. The number of ForkJoin threads keeps increasing
> This happens when the number of table partitions is greater than 10.
> 5. Test Code
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.hive.HiveContext
> object Bugs {
>   def main(args: Array[String]): Unit = {
> val location = "file:///g:/home/test/csv"
> val create = s"""CREATE   EXTERNAL  TABLE  test_csv
>  (ID string,  SEQ string )
>   PARTITIONED BY(index int)
>   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
>   LOCATION "${location}" 
>   """
> val add_part = s"""
>   ALTER TABLE test_csv ADD 
>   PARTITION (index=1)LOCATION '${location}/index=1'
>   PARTITION (index=2)LOCATION '${location}/index=2'
>   PARTITION (index=3)LOCATION '${location}/index=3'
>   PARTITION (index=4)LOCATION '${location}/index=4'
>   PARTITION (index=5)LOCATION '${location}/index=5'
>   PARTITION (index=6)LOCATION '${location}/index=6'
>   PARTITION (index=7)LOCATION '${location}/index=7'
>   PARTITION (index=8)LOCATION '${location}/index=8'
>   PARTITION (index=9)LOCATION '${location}/index=9'
>   PARTITION (index=10)LOCATION '${location}/index=10'
>   PARTITION (index=11)LOCATION '${location}/index=11'
>   PARTITION (index=12)LOCATION '${location}/index=12'
>   PARTITION (index=13)LOCATION '${location}/index=13'
>   PARTITION (index=14)LOCATION '${location}/index=14'
>   PARTITION (index=15)LOCATION '${location}/index=15'
>   PARTITION (index=16)LOCATION '${location}/index=16'
> """
> val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
> conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
> val ctx = new SparkContext(conf)
> val hctx = new HiveContext(ctx)
> hctx.sql(create)
> hctx.sql(add_part)
>  for (i <- 1 to 6) {
>   new Query(hctx).start()
> }
>   }
>   class Query(htcx: HiveContext) extends Thread {
> setName("Query-Thread")
> override def run = {
>   while (true) {
> htcx.sql("select count(*) from test_csv").show()
> Thread.sleep(100)
>   }
> }
>   }
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)

2016-09-05 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464604#comment-15464604
 ] 

Artur Sukhenko commented on SPARK-17379:


I tried to use Spark with Netty 4.1.5; after fixing the deprecated methods and 
implementing the new ones, some tests failed. So I agree with you that we 
should just upgrade to 4.0.41.

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --
>
> Key: SPARK-17379
> URL: https://issues.apache.org/jira/browse/SPARK-17379
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Adam Roberts
>Priority: Trivial
>
> We should use the newest version of netty based on the info here: 
> http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html. I am especially 
> interested in the static initialiser deadlock fix: 
> https://github.com/netty/netty/pull/5730
> Lots more fixes are mentioned, so I will create the pull request - again a 
> case of updating the pom and then the dependency files.
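
Spark's own build changes the version in the Maven pom, so the snippet below is only a rough sbt-style sketch of the coordinate being bumped, for anyone who pins Netty in their own Scala build; it is not the actual patch.

{code}
// build.sbt (sketch): pin netty-all to the 4.0.41.Final release.
libraryDependencies += "io.netty" % "netty-all" % "4.0.41.Final"
{code}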



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


