[jira] [Commented] (SPARK-2944) sc.makeRDD doesn't distribute partitions evenly

2014-08-15 Thread Xiangrui Meng (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099317#comment-14099317 ]

Xiangrui Meng commented on SPARK-2944:
--

I changed the priority to Major because I couldn't reproduce the bug in a 
deterministic way, nor could I verify whether this is an issue introduced after 
v1.0. It seems that it only happens when each task is very small.
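
If the small-task theory holds, the skew should shrink once each task does
noticeably more work. A rough way to check (purely illustrative; the per-element
loop and the reduced element count are arbitrary choices, not from the report):

{code}
// Hypothetical variant of the repro with heavier tasks. If the partition
// distribution evens out here, the imbalance is about scheduling of very
// short tasks rather than about makeRDD itself.
val heavy = sc.makeRDD(0 until 1e7.toInt, 1000).map { i =>
  var x = i.toDouble
  var k = 0
  while (k < 10000) { x = math.sqrt(x + k); k += 1 }  // arbitrary CPU burner
  x
}.cache()
heavy.count()
{code}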

 sc.makeRDD doesn't distribute partitions evenly
 ---

 Key: SPARK-2944
 URL: https://issues.apache.org/jira/browse/SPARK-2944
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng

 16 nodes EC2 cluster:
 {code}
 val rdd = sc.makeRDD(0 until 1e9.toInt, 1000).cache()
 rdd.count()
 {code}
 Saw 156 partitions on one node while only 8 partitions on another.
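 For reference, one rough way to count cached partitions per host from the shell
 (assumes tasks for a cached RDD run on the executors holding the blocks, so the
 counts are approximate; this snippet is not part of the original report):
 {code}
 // Tag each partition with the hostname of the executor that runs it,
 // then count partitions per host.
 val perHost = rdd.mapPartitions { _ =>
   Iterator((java.net.InetAddress.getLocalHost.getHostName, 1))
 }.reduceByKey(_ + _).collect()
 perHost.sortBy(-_._2).foreach { case (host, n) => println(s"$host: $n partitions") }
 {code}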






[jira] [Commented] (SPARK-2944) sc.makeRDD doesn't distribute partitions evenly

2014-08-10 Thread Xiangrui Meng (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092135#comment-14092135 ]

Xiangrui Meng commented on SPARK-2944:
--

Found that this behavior is not deterministic, so it is hard to tell which 
commit introduced it. It seems to happen when tasks are very small: some workers 
may get a lot more assignments than others because they finish their tasks very 
quickly and TaskSetManager always picks the first available one. 
(There is no randomization in `TaskSetManager`.)
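
A toy model of that effect (purely illustrative, not Spark's scheduler code; the
timing numbers are made up): hand out 1000 tiny tasks one at a time, always to
the first idle worker in a fixed order. Because the early workers are already
idle again by the next hand-out, the rest of the cluster barely gets anything,
which matches the 156-vs-8 kind of skew.

{code}
// Toy simulation only -- not Spark code. 16 workers, fixed pick order,
// no randomization, tasks barely longer than the scheduling overhead.
val numWorkers = 16
val schedulingOverheadMs = 5.0   // assumed time between consecutive hand-outs
val taskDurationMs = 12.0        // "very small" tasks
val busyUntil = Array.fill(numWorkers)(0.0)
val assigned = Array.fill(numWorkers)(0)
var now = 0.0
for (_ <- 0 until 1000) {
  val w = busyUntil.indexWhere(_ <= now)   // first available worker always wins
  if (w >= 0) { assigned(w) += 1; busyUntil(w) = now + taskDurationMs }
  now += schedulingOverheadMs
}
println(assigned.mkString(", "))           // only the first few workers get tasks
{code}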







[jira] [Commented] (SPARK-2944) sc.makeRDD doesn't distribute partitions evenly

2014-08-09 Thread Patrick Wendell (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091699#comment-14091699 ]

Patrick Wendell commented on SPARK-2944:
--

Hey [~mengxr], do you know how the behavior differs from Spark 1.0? Also, if 
there is a clear difference, could you see if the behavior is modified by this 
patch?

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=63bdb1f41b4895e3a9444f7938094438a94d3007







[jira] [Commented] (SPARK-2944) sc.makeRDD doesn't distribute partitions evenly

2014-08-09 Thread Xiangrui Meng (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091784#comment-14091784 ]

Xiangrui Meng commented on SPARK-2944:
--

I checked that one first. It was okay after that commit, and it was bad before 
this one:

https://github.com/apache/spark/commit/28dbae85aaf6842e22cd7465cb11cb34d58fc56d

I didn't see anything suspicious in between; I'm doing a binary search now.



