Hi
I have created a JIRA for this feature:
https://issues.apache.org/jira/browse/SPARK-12524
Please vote for this feature if you find it necessary. I would like to
implement this feature.
Thanks
Shushant
On Wed, Dec 2, 2015 at 1:14 PM, Rajat Kumar wrote:
What if I don't have to use an aggregate function, only a groupByKeyLocally()
and then a map transformation? Will reduceByKeyLocally help here? And is there
a workaround if groupByKey is not local but global across all partitions?
Thanks
On Tue, Dec 1, 2015 at 5:20 PM, ayan guha wrote:
I believe reduceByKeyLocally was introduced for this purpose.
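To make the semantics concrete, here is a small sketch. The local function below is a hypothetical model (my own naming) of what reduceByKeyLocally computes: values are reduced per key and the result comes back as a plain driver-side Map rather than an RDD. The commented Spark call assumes a spark-shell session with a SparkContext named sc.

```scala
// Hypothetical local model of reduceByKeyLocally's result: reduce the
// values for each key with f, producing a driver-side Map, not an RDD.
def reduceByKeyLocallyModel[K, V](pairs: Seq[(K, V)], f: (V, V) => V): Map[K, V] =
  pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }

val summed = reduceByKeyLocallyModel(Seq(("a", 1), ("a", 2), ("b", 3)), (a: Int, b: Int) => a + b)
// summed == Map("a" -> 3, "b" -> 3)

// The equivalent on an RDD (assuming a live SparkContext sc) would be:
// sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)), 2).reduceByKeyLocally(_ + _)
```

One caveat: because the merged result lands on the driver, the full set of keys and reduced values has to fit in driver memory.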
On Tue, Dec 1, 2015 at 10:21 PM, Jacek Laskowski wrote:
Hi Rajat,

My quick test has shown that groupBy will preserve the partitions:

scala> sc.parallelize(Seq(0,0,0,0,1,1,1,1),2).map((_,1)).mapPartitionsWithIndex {
         case (idx, iter) => val s = iter.toSeq; println(idx + " with " +
           s.size + " elements: " + s); s.toIterator
       }.groupBy(_._1).mapPartitionsWithIndex {
         case (idx, iter) => val s = iter.toSeq; println(idx + " with " +
           s.size + " elements: " + s); s.toIterator
       }.count()
Hi,

I have a JavaPairRDD rdd1. I want to group rdd1 by keys but preserve the
partitions of the original RDD, to avoid a shuffle, since I know all records
with the same key are already in the same partition.

The PairRDD is basically constructed using the Kafka streaming low-level
consumer, which puts all records with the same key in the same partition.
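Since every occurrence of a key is already confined to one partition, one possible sketch is to do the grouping inside each partition with mapPartitions, which avoids the shuffle that groupByKey would trigger. The helper name below is my own; the mapPartitions call and its preservesPartitioning flag are standard RDD API, shown as a comment because it needs a live SparkContext.

```scala
// Group the records of a single partition by key, locally, with no shuffle.
// (Hypothetical helper name; only correct if each key lives in one partition.)
def groupWithinPartition[K, V](iter: Iterator[(K, V)]): Iterator[(K, Seq[V])] =
  iter.toVector.groupBy(_._1).iterator.map { case (k, kvs) => (k, kvs.map(_._2)) }

// Usage against an RDD (assumes a live SparkContext sc / an existing pairRdd):
// val grouped = pairRdd.mapPartitions(groupWithinPartition[String, Int] _,
//                                     preservesPartitioning = true)
```

Setting preservesPartitioning = true tells Spark the keys were not changed, so any existing partitioner on the RDD is kept for downstream stages.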
Thanks a lot. Yes, this mapPartitions approach seems a better way of dealing
with this problem, since with groupBy() I need to collect() the data before
applying parallelize(), which is expensive.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Grouping-tp12407p12424
> something like that, but the return type
> is PipelinedRDD, which is not iterable.
> Does anybody have an idea?
> Thanks in advance,
> Tassilo
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Grouping-tp12407.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.