[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15077 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...
Github user codlife commented on a diff in the pull request: https://github.com/apache/spark/pull/15077#discussion_r78552813

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```
@@ -795,7 +795,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
     assertNotStopped()
     val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-    new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
```

--- End diff ---

OK, thanks for your explanation.
[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15077#discussion_r78551772

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```
@@ -795,7 +795,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
     assertNotStopped()
     val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-    new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
```

--- End diff ---

The problem is that the default is fine elsewhere because it's changeable, but here the caller has no way to change it. I think it might be better to stay conservative. Really, this is such a corner case that it doesn't matter much: it only showed up for you because you specified no element type on your Seq. If you had, the compiler would have chosen the other overload, which works fine.
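As a hedged illustration of the overload-resolution point above (a sketch assuming a live `SparkContext` named `sc`; this is not code from the PR):

```scala
// SparkContext.makeRDD has two overloads (paraphrased):
//   1) def makeRDD[T: ClassTag](seq: Seq[T], numSlices: Int = defaultParallelism): RDD[T]
//   2) def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T]  // uses seq.size slices

// An untyped empty Seq() is inferred as Seq[Nothing], which also conforms to
// Seq[(T, Seq[String])], so overload 2 is selected; seq.size == 0 then triggers
// "Positive number of slices required".
val broken = sc.makeRDD(Seq())

// Annotating the element type selects overload 1 instead, which falls back to
// defaultParallelism and handles an empty collection without error.
val ok = sc.makeRDD(Seq.empty[Int])
println(ok.partitions.length)
```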
[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...
Github user codlife commented on a diff in the pull request: https://github.com/apache/spark/pull/15077#discussion_r78551032

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```
@@ -795,7 +795,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
     assertNotStopped()
     val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-    new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
```

--- End diff ---

To stay consistent with sc.parallelize, I think defaultParallelism is reasonable, but I'll leave the decision of which one to use to you.
[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15077#discussion_r78549808

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```
@@ -795,7 +795,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
     assertNotStopped()
     val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
-    new ParallelCollectionRDD[T](this, seq.map(_._1), seq.size, indexToPrefs)
+    new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
```

--- End diff ---

I would say `math.max(seq.size, 1)`. Normally this method would just use the provided partition count (called "numSlices" in this old API), but this overload doesn't take that parameter, which is all the more reason it's the odd one out. Still, I think the most reasonable behavior is to use at least 1 partition.
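The conservative alternative suggested above can be sketched as follows (illustrative only; this is not necessarily the change that was ultimately merged):

```scala
def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
  assertNotStopped()
  val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
  // Guard with at least 1 slice so an empty seq no longer throws
  // "Positive number of slices required", while otherwise keeping
  // one partition per element as before.
  new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, 1), indexToPrefs)
}
```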
[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...
GitHub user codlife reopened a pull request: https://github.com/apache/spark/pull/15077

[SPARK-17521] Error when I use sparkContext.makeRDD(Seq())

## What changes were proposed in this pull request?

When I use sc.makeRDD as below:

```
val data3 = sc.makeRDD(Seq())
println(data3.partitions.length)
```

I get an error:

```
Exception in thread "main" java.lang.IllegalArgumentException: Positive number of slices required
```

We can fix this bug by modifying the last line to check seq.size:

```
def makeRDD[T: ClassTag](seq: Seq[(T, Seq[String])]): RDD[T] = withScope {
  assertNotStopped()
  val indexToPrefs = seq.zipWithIndex.map(t => (t._2, t._1._2)).toMap
  new ParallelCollectionRDD[T](this, seq.map(_._1), math.max(seq.size, defaultParallelism), indexToPrefs)
}
```

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/codlife/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15077.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15077

commit 673c29b2166e002d97b914ef8f8316df71fc8be7
Author: codlife <1004910...@qq.com>
Date: 2016-09-10T02:02:21Z
solve spark-17447

commit a4609059350af3ebeb68e5acdfc99daf424a817a
Author: codlife <1004910...@qq.com>
Date: 2016-09-10T02:26:46Z
Update Partitioner.scala

commit 7829bd0a3c66c474ec67f64d1ef043d0e251cdf6
Author: codlife <1004910...@qq.com>
Date: 2016-09-10T12:26:54Z
solve spark-17447

commit 8ddc442fc40f71d85fcaef8e4a721f6b31a5ea5c
Author: codlife <1004910...@qq.com>
Date: 2016-09-10T12:33:19Z
fix code style

commit 81c0eb9bb45b15dc746d935afa7a3259bb0efcd9
Author: codlife <1004910...@qq.com>
Date: 2016-09-10T15:20:09Z
solve spark-17447

commit f5d1e24d38f4a24f2ebc29214eb1a331846a0b1b
Author: codlife <1004910...@qq.com>
Date: 2016-09-10T15:21:44Z
Update Partitioner.scala

commit e717f65ff419e152a02e359f1241343d48e56977
Author: codlife <1004910...@qq.com>
Date: 2016-09-13T10:45:34Z
Merge branch 'master' of https://github.com/codlife/spark

commit e426ccfabeb4e9baa38bceac893db7d985cfa860
Author: codlife <1004910...@qq.com>
Date: 2016-09-13T10:51:57Z
solve SPARK-17521

commit af1a102192794bce88afab172f3b074e901d8383
Author: codlife
Date: 2016-09-13T11:32:34Z
Merge pull request #2 from apache/master
[GitHub] spark pull request #15077: [SPARK-17521]Error when I use sparkContext.makeRD...
Github user codlife closed the pull request at: https://github.com/apache/spark/pull/15077