[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-12-03 Thread Sumesh Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15719170#comment-15719170
 ] 

Sumesh Kumar commented on SPARK-18200:
--

Thanks much [~dongjoon]

> GraphX Invalid initial capacity when running triangleCount
> --
>
> Key: SPARK-18200
> URL: https://issues.apache.org/jira/browse/SPARK-18200
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.0.0, 2.0.1, 2.0.2
> Environment: Databricks, Ubuntu 16.04, macOS Sierra
>Reporter: Denny Lee
>Assignee: Dongjoon Hyun
>  Labels: graph, graphx
> Fix For: 2.0.3, 2.1.0
>
>
> Running GraphX triangle count on a large-ish file results in an "Invalid 
> initial capacity" error when running on Spark 2.0 (tested on Spark 2.0.0, 
> 2.0.1, and 2.0.2).  You can see the results at: http://bit.ly/2eQKWDN
> Running the same code on Spark 1.6, the query completes without any 
> problems: http://bit.ly/2fATO1M
> The GraphFrames version of this code also runs successfully (Spark 
> 2.0, GraphFrames 0.2): http://bit.ly/2fAS8W8
> Reference Stack Overflow question:
> Spark GraphX: requirement failed: Invalid initial capacity 
> (http://stackoverflow.com/questions/40337366/spark-graphx-requirement-failed-invalid-initial-capacity)
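For context, triangleCount builds a per-vertex set of neighbors and intersects the sets of each edge's endpoints; a vertex with no neighbors yields a set of size zero, which is where the "Invalid initial capacity" check fires in Spark 2.0.x. A minimal, Spark-free sketch of that counting step in Python (the toy edge lists below are invented for illustration, not from the reported dataset):

```python
from collections import defaultdict

def triangle_counts(edges):
    """Per-vertex triangle counts via neighbor-set intersection,
    mirroring the step GraphX's triangleCount performs: for each
    undirected edge (u, v), every common neighbor w of u and v
    closes one triangle through w."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:          # GraphX likewise ignores self-loops here
            neighbors[u].add(v)
            neighbors[v].add(u)
    counts = defaultdict(int)
    # Deduplicate to one record per undirected edge.
    for e in {frozenset((u, v)) for u, v in edges if u != v}:
        u, v = tuple(e)
        # A degree-0 vertex has an empty neighbor set -- the case that
        # made Spark 2.0's OpenHashSet reject a zero initial capacity.
        for w in neighbors[u] & neighbors[v]:
            counts[w] += 1
    return dict(counts)
```

Vertices that close no triangle simply do not appear in this sketch's result, whereas GraphX reports 0 for them.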



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-12-03 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15719124#comment-15719124
 ] 

Dongjoon Hyun commented on SPARK-18200:
---

Hi,

The fix will be in the upcoming Apache Spark 2.0.3 and 2.1.0 releases.
We cannot backport it into 2.0.1 because that version has already been released.







[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-12-03 Thread Sumesh Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15719081#comment-15719081
 ] 

Sumesh Kumar commented on SPARK-18200:
--

Does this issue still exist in version 2.0.1? I just ran a test and it throws 
the following exception.

User class threw exception: org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 3 in stage 10.0 failed 4 times, most recent failure: Lost 
task 3.3 in stage 10.0 (TID 196, BD-S2F13): java.lang.IllegalArgumentException: 
requirement failed: Invalid initial capacity
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.util.collection.OpenHashSet$mcJ$sp.<init>(OpenHashSet.scala:51)
at org.apache.spark.util.collection.OpenHashSet$mcJ$sp.<init>(OpenHashSet.scala:57)
at org.apache.spark.graphx.lib.TriangleCount$$anonfun$5.apply(TriangleCount.scala:70)
at org.apache.spark.graphx.lib.TriangleCount$$anonfun$5.apply(TriangleCount.scala:69)
at org.apache.spark.graphx.impl.VertexPartitionBaseOps.map(VertexPartitionBaseOps.scala:61)
at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$mapValues$2.apply(VertexRDDImpl.scala:102)
at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$mapValues$2.apply(VertexRDDImpl.scala:102)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:156)
at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:154)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)







[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-11-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633534#comment-15633534
 ] 

Apache Spark commented on SPARK-18200:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/15754







[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-11-02 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15631251#comment-15631251
 ] 

Dongjoon Hyun commented on SPARK-18200:
---

Good for you. :)







[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-11-02 Thread SathyaNarayanan Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15631204#comment-15631204
 ] 

SathyaNarayanan Srinivasan commented on SPARK-18200:


Thank you, Dongjoon Hyun and Denny Lee, for giving such serious consideration 
to my question on Stack Overflow. I am in the process of implementing and 
testing the proposed solution, and the reported answers work fine. Thanks.







[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-11-02 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630864#comment-15630864
 ] 

Dongjoon Hyun commented on SPARK-18200:
---

I tested the described scenario as well:
{code}
scala> import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

scala> val filepath = "/tmp/ca-HepTh.txt"

scala> val graph = GraphLoader.edgeListFile(sc, filepath, 
true).partitionBy(PartitionStrategy.RandomVertexCut)

scala> val triCounts = graph.triangleCount().vertices

scala> triCounts.toDF().show()
+-----+---+
|   _1| _2|
+-----+---+
|50130|  2|
|20484| 11|
|10598|190|
|31760| 29|
{code}







[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-11-02 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630821#comment-15630821
 ] 

Dongjoon Hyun commented on SPARK-18200:
---

Actually, there is a node that has no neighbors, so GraphX tried to create a 
`VertexSet` with an initial capacity of zero.
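The failing check can be modeled without Spark. `TriangleCount` sizes each vertex's `VertexSet` (an `OpenHashSet`) by the vertex's degree, and in Spark 2.0.x the constructor rejected a capacity of zero; the fix relaxes the requirement so an empty set is allowed. A hypothetical Python model of the before/after behavior (the function name, bounds, and rounding are illustrative, not Spark's actual API):

```python
def open_hash_set_capacity(initial_capacity, allow_zero=False):
    """Model of a hash set constructor's capacity validation.

    With allow_zero=False (the Spark 2.0.0-2.0.2 behavior reported in
    this thread), a degree-0 vertex makes triangleCount request
    capacity 0, and the requirement fails with the exact message seen
    in the stack trace. With allow_zero=True (modeling the fix shipped
    in 2.0.3/2.1.0), capacity 0 is accepted.
    """
    lower = 0 if allow_zero else 1
    if not (lower <= initial_capacity <= 1 << 30):  # illustrative upper bound
        raise ValueError("requirement failed: Invalid initial capacity")
    # Round up to the next power of two, with a floor of 1 slot.
    cap = 1
    while cap < max(initial_capacity, 1):
        cap *= 2
    return cap
```

With the old check, `open_hash_set_capacity(0)` raises the `requirement failed: Invalid initial capacity` error; with `allow_zero=True` it returns a minimal table size instead.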







[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-11-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630814#comment-15630814
 ] 

Apache Spark commented on SPARK-18200:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/15741







[jira] [Commented] (SPARK-18200) GraphX Invalid initial capacity when running triangleCount

2016-11-02 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630798#comment-15630798
 ] 

Dongjoon Hyun commented on SPARK-18200:
---

Hi, [~dennyglee].
It's due to `OpenHashSet`. I'll make a PR for this.



