[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15719170#comment-15719170 ]

Sumesh Kumar commented on SPARK-18200:
--------------------------------------

Thanks much, [~dongjoon].

> GraphX Invalid initial capacity when running triangleCount
> ----------------------------------------------------------
>
>                 Key: SPARK-18200
>                 URL: https://issues.apache.org/jira/browse/SPARK-18200
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2
>         Environment: Databricks, Ubuntu 16.04, macOS Sierra
>            Reporter: Denny Lee
>            Assignee: Dongjoon Hyun
>              Labels: graph, graphx
>             Fix For: 2.0.3, 2.1.0
>
> Running GraphX triangle count on a large-ish file results in the "Invalid
> initial capacity" error on Spark 2.0 (tested on 2.0.0, 2.0.1, and 2.0.2).
> You can see the results at: http://bit.ly/2eQKWDN
> Running the same code on Spark 1.6, the query completes without any
> problems: http://bit.ly/2fATO1M
> The GraphFrames version of this code also runs without error (Spark 2.0,
> GraphFrames 0.2): http://bit.ly/2fAS8W8
> Reference Stack Overflow question:
> "Spark GraphX: requirement failed: Invalid initial capacity"
> (http://stackoverflow.com/questions/40337366/spark-graphx-requirement-failed-invalid-initial-capacity)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

--
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15719124#comment-15719124 ]

Dongjoon Hyun commented on SPARK-18200:
---------------------------------------

Hi. The fix will be in the upcoming Apache Spark 2.0.3 and 2.1.0. We cannot backport it into 2.0.1 because that version has already been released.
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15719081#comment-15719081 ]

Sumesh Kumar commented on SPARK-18200:
--------------------------------------

Does this issue still exist in version 2.0.1? I just ran a test and it throws the following exception:

{code}
User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 10.0 failed 4 times, most recent failure: Lost task 3.3 in stage 10.0 (TID 196, BD-S2F13): java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.util.collection.OpenHashSet$mcJ$sp.<init>(OpenHashSet.scala:51)
	at org.apache.spark.util.collection.OpenHashSet$mcJ$sp.<init>(OpenHashSet.scala:57)
	at org.apache.spark.graphx.lib.TriangleCount$$anonfun$5.apply(TriangleCount.scala:70)
	at org.apache.spark.graphx.lib.TriangleCount$$anonfun$5.apply(TriangleCount.scala:69)
	at org.apache.spark.graphx.impl.VertexPartitionBaseOps.map(VertexPartitionBaseOps.scala:61)
	at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$mapValues$2.apply(VertexRDDImpl.scala:102)
	at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$mapValues$2.apply(VertexRDDImpl.scala:102)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:156)
	at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:154)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
	at org.apache.spark.scheduler.Task.run(Task.scala:86)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{code}
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633534#comment-15633534 ]

Apache Spark commented on SPARK-18200:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/15754
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631251#comment-15631251 ]

Dongjoon Hyun commented on SPARK-18200:
---------------------------------------

Good for you. :)
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631204#comment-15631204 ]

SathyaNarayanan Srinivasan commented on SPARK-18200:
----------------------------------------------------

Thank you, Dongjoon Hyun and Denny Lee, for giving my Stack Overflow question such serious consideration. I am in the process of implementing and testing the proposed solution, and the reported answers work fine. Thanks.
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630864#comment-15630864 ]

Dongjoon Hyun commented on SPARK-18200:
---------------------------------------

The described scenario is also tested.

{code}
scala> import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
scala> val filepath = "/tmp/ca-HepTh.txt"
scala> val graph = GraphLoader.edgeListFile(sc, filepath, true).partitionBy(PartitionStrategy.RandomVertexCut)
scala> val triCounts = graph.triangleCount().vertices
scala> triCounts.toDF().show()
+-----+---+
|   _1| _2|
+-----+---+
|50130|  2|
|20484| 11|
|10598|190|
|31760| 29|
{code}
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630821#comment-15630821 ]

Dongjoon Hyun commented on SPARK-18200:
---------------------------------------

Actually, there is a node that has no neighbors, so the code requested creation of a `VertexSet` with zero initial capacity.
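The failure mode described in that comment can be illustrated with a small sketch. This is not Spark's actual `OpenHashSet` code; the object and method names below are hypothetical, and the body is deliberately simplified to the one precondition that matters: a neighbor set is sized by vertex degree, so a zero-degree vertex asks for initial capacity 0 and trips a `require` check.

{code}
// Illustrative sketch only -- not the real OpenHashSet implementation.
object VertexSetSketch {
  def makeVertexSet(initialCapacity: Int): Array[Long] = {
    // A precondition like this is what surfaces as
    // "requirement failed: Invalid initial capacity".
    require(initialCapacity > 0, "Invalid initial capacity")
    new Array[Long](initialCapacity)
  }

  def main(args: Array[String]): Unit = {
    makeVertexSet(2)        // fine: a vertex with two neighbors
    try {
      makeVertexSet(0)      // a vertex with no neighbors
    } catch {
      case e: IllegalArgumentException =>
        println(e.getMessage)  // prints: requirement failed: Invalid initial capacity
    }
  }
}
{code}

Under this reading, the fix direction is simply to accept an initial capacity of zero (or clamp it up), so that neighborless vertices get an empty set instead of an exception.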
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630814#comment-15630814 ]

Apache Spark commented on SPARK-18200:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/15741
[ https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630798#comment-15630798 ]

Dongjoon Hyun commented on SPARK-18200:
---------------------------------------

Hi, [~dennyglee]. It's due to `OpenHashSet`. I'll make a PR for this.
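Until a release containing the fix is available, the issue description notes that the GraphFrames version of the same computation completes on Spark 2.0 with GraphFrames 0.2. A hedged sketch of that alternative (assuming a Spark shell with the graphframes package on the classpath, and `graph` being the GraphX graph built from `GraphLoader.edgeListFile` as in the earlier comment):

{code}
// Sketch of the GraphFrames alternative mentioned in the issue description;
// exact API details should be checked against the GraphFrames 0.2 docs.
import org.graphframes.GraphFrame

val gf = GraphFrame.fromGraphX(graph)   // convert the GraphX graph to a GraphFrame
val triangles = gf.triangleCount.run()  // DataFrame of per-vertex triangle counts
triangles.select("id", "count").show()
{code}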