[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...

2016-09-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15077



[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...

2016-09-13 Thread codlife
Github user codlife commented on a diff in the pull request:

https://github.com/apache/spark/pull/15077#discussion_r78552813
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -795,7 +795,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
     assertNotStopped()
     val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-    new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
--- End diff --

OK, thanks for the explanation.



[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...

2016-09-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15077#discussion_r78551772
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -795,7 +795,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
     assertNotStopped()
     val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-    new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
--- End diff --

The problem is that the default is OK elsewhere because it's changeable, but here the caller has no way to change it. I think it might be better to stay conservative.

Really, this is such a corner case that it doesn't matter much. It only showed up for you because you specified no type on your Seq; if you had, the compiler would have chosen the other overload, which works fine.
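A minimal sketch of the overload resolution being described (a hypothetical REPL snippet, assuming a live `sc` on a Spark 2.x build; the comments paraphrase the point above):

```scala
// Untyped empty Seq: Scala infers Seq[Nothing], which also satisfies
// Seq[(T, Seq[String])], so this resolves to the location-preferences
// overload and asks for seq.size == 0 partitions, hence the
// "Positive number of slices required" error.
val broken = sc.makeRDD(Seq())

// Typed empty Seq: resolves to makeRDD(seq: Seq[T], numSlices: Int =
// defaultParallelism), which handles the empty case fine.
val ok = sc.makeRDD(Seq[Int]())
println(ok.partitions.length)  // == sc.defaultParallelism
```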



[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...

2016-09-13 Thread codlife
Github user codlife commented on a diff in the pull request:

https://github.com/apache/spark/pull/15077#discussion_r78551032
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -795,7 +795,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
     assertNotStopped()
     val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-    new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
--- End diff --

To stay consistent with sc.parallelize, I think defaultParallelism is reasonable, but you can decide which one to use.
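A quick sketch of the consistency argument (assuming a live `sc`): `parallelize` already falls back to `defaultParallelism`, so an empty input yields that many empty partitions rather than an error.

```scala
// parallelize's numSlices parameter defaults to defaultParallelism,
// so an empty, explicitly typed Seq still produces a valid RDD.
val viaParallelize = sc.parallelize(Seq[Int]())
println(viaParallelize.partitions.length)  // == sc.defaultParallelism
```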



[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...

2016-09-13 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15077#discussion_r78549808
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -795,7 +795,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
     assertNotStopped()
     val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-    new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
--- End diff --

I would say `math.max(seq.size, 1)`. Normally this method would just use the provided partition count (called "numSlices" in this old API), but this overload doesn't have that parameter, which is all the more reason it's an odd man out. Still, I think the most reasonable behavior is to use at least 1 partition.
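A sketch of that conservative variant, reusing the method body shown in the diff and changing only the partition count (the suggestion spelled out, not necessarily the patch as merged):

```scala
def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
  assertNotStopped()
  val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
  // Use at least one partition so makeRDD(Seq()) no longer trips the
  // "Positive number of slices required" assertion.
  new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, 1), indexToPrefs)
}
```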



[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...

2016-09-13 Thread codlife
GitHub user codlife reopened a pull request:

https://github.com/apache/spark/pull/15077

[SPARK-17521]Error when I use sparkContext.makeRDD(Seq())

## What changes were proposed in this pull request?

When I use sc.makeRDD as below:
```
val data3 = sc.makeRDD(Seq())
println(data3.partitions.length)
```
I got an error:
Exception in thread "main" java.lang.IllegalArgumentException: Positive number of slices required

We can fix this bug by just modifying the last line to check seq.size:
```
  def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
    assertNotStopped()
    val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
  }
```


## How was this patch tested?

Manual tests.
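A minimal manual check along those lines (hypothetical, assuming a live `sc` with the fix applied):

```scala
// Before the fix this throws IllegalArgumentException; after it,
// the RDD is created with a positive number of (empty) partitions.
val data3 = sc.makeRDD(Seq())
assert(data3.partitions.length > 0)
```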




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/codlife/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15077.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15077


commit 673c29b2166e002d97b914ef8f8316df71fc8be7
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T02:02:21Z

solve spark-17447

commit a4609059350af3ebeb68e5acdfc99daf424a817a
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T02:26:46Z

Update Partitioner.scala

commit 7829bd0a3c66c474ec67f64d1ef043d0e251cdf6
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T12:26:54Z

solve spark-17447

commit 8ddc442fc40f71d85fcaef8e4a721f6b31a5ea5c
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T12:33:19Z

fix  code style

commit 81c0eb9bb45b15dc746d935afa7a3259bb0efcd9
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T15:20:09Z

solve spark-17447

commit f5d1e24d38f4a24f2ebc29214eb1a331846a0b1b
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T15:21:44Z

Update Partitioner.scala

commit e717f65ff419e152a02e359f1241343d48e56977
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T10:45:34Z

Merge branch 'master' of https://github.com/codlife/spark

commit e426ccfabeb4e9baa38bceac893db7d985cfa860
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T10:51:57Z

solve SPARK-17521

commit af1a102192794bce88afab172f3b074e901d8383
Author: codlife 
Date:   2016-09-13T11:32:34Z

Merge pull request #2 from apache/master

NEW





[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...

2016-09-13 Thread codlife
Github user codlife closed the pull request at:

https://github.com/apache/spark/pull/15077

