[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-12-03 Thread Sumesh Kumar (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15719170#comment-15719170 ]

Sumesh Kumar commented on SPARK-18200:
--------------------------------------

Thanks much [~dongjoon]

> GraphX Invalid initial capacity when running triangleCount
> -----------------------------------------------------------
>
> Key: SPARK-18200
> URL: https://issues.apache.org/jira/browse/SPARK-18200
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Affects Versions: 2.0.0, 2.0.1, 2.0.2
> Environment: Databricks, Ubuntu 16.04, macOS Sierra
> Reporter: Denny Lee
> Assignee: Dongjoon Hyun
> Labels: graph, graphx
> Fix For: 2.0.3, 2.1.0
>
>
> Running GraphX triangle count on a large-ish file results in an "Invalid
> initial capacity" error when running on Spark 2.0 (tested on Spark 2.0.0,
> 2.0.1, and 2.0.2). You can see the results at: http://bit.ly/2eQKWDN
> Running the same code on Spark 1.6, the query completes without any
> problems: http://bit.ly/2fATO1M
> The GraphFrames version of this code also runs without error (Spark 2.0,
> GraphFrames 0.2): http://bit.ly/2fAS8W8
> Reference Stack Overflow question:
> Spark GraphX: requirement failed: Invalid initial capacity
> (http://stackoverflow.com/questions/40337366/spark-graphx-requirement-failed-invalid-initial-capacity)
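
For reference, the message "requirement failed: Invalid initial capacity" comes from a require guard in org.apache.spark.util.collection.OpenHashSet, which triangleCount uses to build per-vertex neighbour sets. The self-contained sketch below only illustrates that guard; it is not the Spark source. CapacityGuardDemo and newSetCapacity are made-up names, and the power-of-two rounding is an assumption about what the real hash set does on 2.0.x.

{code:scala}
// Illustrative sketch of the capacity guard (hypothetical names; the
// real check lives in org.apache.spark.util.collection.OpenHashSet).
object CapacityGuardDemo {
  // Assumed 2.0.x behaviour: capacities below 1 are rejected, and valid
  // capacities are rounded up to the next power of two.
  def newSetCapacity(initialCapacity: Int): Int = {
    require(initialCapacity >= 1, "Invalid initial capacity")
    var cap = 1
    while (cap < initialCapacity) cap <<= 1
    cap
  }

  def main(args: Array[String]): Unit = {
    println(newSetCapacity(5)) // prints 8
    // A vertex with zero neighbours leads to a call like this one,
    // which throws java.lang.IllegalArgumentException.
    println(newSetCapacity(0))
  }
}
{code}

On this reading, the fix shipped in the Fix For versions above presumably relaxes the guard to accept zero as a valid initial capacity.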






[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-12-03 Thread Sumesh Kumar (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15719081#comment-15719081 ]

Sumesh Kumar commented on SPARK-18200:
--------------------------------------

Does this issue still exist in version 2.0.1? I just ran a test and it
throws the following exception:

User class threw exception: org.apache.spark.SparkException: Job aborted due to
stage failure: Task 3 in stage 10.0 failed 4 times, most recent failure: Lost
task 3.3 in stage 10.0 (TID 196, BD-S2F13): java.lang.IllegalArgumentException:
requirement failed: Invalid initial capacity
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.util.collection.OpenHashSet$mcJ$sp.<init>(OpenHashSet.scala:51)
    at org.apache.spark.util.collection.OpenHashSet$mcJ$sp.<init>(OpenHashSet.scala:57)
    at org.apache.spark.graphx.lib.TriangleCount$$anonfun$5.apply(TriangleCount.scala:70)
    at org.apache.spark.graphx.lib.TriangleCount$$anonfun$5.apply(TriangleCount.scala:69)
    at org.apache.spark.graphx.impl.VertexPartitionBaseOps.map(VertexPartitionBaseOps.scala:61)
    at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$mapValues$2.apply(VertexRDDImpl.scala:102)
    at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$mapValues$2.apply(VertexRDDImpl.scala:102)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:156)
    at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:154)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
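
The trace points at TriangleCount.scala:70 constructing an OpenHashSet from a vertex's neighbour-ID array, so any vertex that ends up with zero neighbours after canonicalisation (for example, one whose only edge is a self-loop, which triangleCount strips) should trip the guard. The following is a minimal repro sketch under that assumption, not the test actually run in this thread; TriangleCountRepro and the tiny edge list are made up for illustration.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical minimal repro for SPARK-18200 (not the reporter's code).
object TriangleCountRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-18200-repro")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Vertex 1's only edge is a self-loop; triangleCount removes
    // self-loops during canonicalisation, leaving it with an empty
    // neighbour array.
    val edges = sc.parallelize(Seq(
      Edge(1L, 1L, 0), // self-loop, stripped before counting
      Edge(2L, 3L, 0)  // ordinary edge so the graph is non-trivial
    ))
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    // On 2.0.0-2.0.2 this should fail with
    // "requirement failed: Invalid initial capacity";
    // on 2.0.3 / 2.1.0 (per Fix For above) it should complete.
    graph.triangleCount().vertices.collect().foreach(println)

    spark.stop()
  }
}
{code}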
