From the gist of it, it seems like you need to override the default
partitioner to control how your data is distributed among partitions. Take
a look at the different Partitioners available (Default, Range, Hash); if
none of these gets you the desired result, you might want to provide your own.
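
For example, here is a minimal sketch of a custom Partitioner (the Int key
type, the partition count, and the modulo routing are illustrative
assumptions, not a prescription for your data):

    import org.apache.spark.Partitioner

    // Toy partitioner: route integer keys with a simple modulo scheme.
    class MyPartitioner(numParts: Int) extends Partitioner {
      override def numPartitions: Int = numParts
      override def getPartition(key: Any): Int = {
        val k = key.asInstanceOf[Int]
        // Keep the result non-negative even for negative keys.
        ((k % numParts) + numParts) % numParts
      }
    }

    // Usage on a pair RDD: rdd.partitionBy(new MyPartitioner(4))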


On Fri, Mar 28, 2014 at 2:08 PM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

> I'd say you need to remap so that you have a key for each tuple that you
> can sort on. Then call rdd.sortByKey(true), like this:
>
>     mystream.transform(rdd => rdd.sortByKey(true))
>
> For this function to be available you need to import
> org.apache.spark.rdd.OrderedRDDFunctions.
>
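> For instance, a minimal sketch (mystream, the tuple shape, and keying by
> the first field are assumptions made for illustration):
>
>     import org.apache.spark.rdd.OrderedRDDFunctions
>
>     val sorted = mystream.transform { rdd =>
>       rdd.map(t => (t._1, t))  // key each tuple by its first field
>          .sortByKey(true)      // ascending sort on that key
>     }
>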
> -----Original Message-----
> From: yh18190 [mailto:yh18...@gmail.com]
> Sent: March-28-14 5:02 PM
> To: u...@spark.incubator.apache.org
> Subject: RE: Splitting RDD and Grouping together to perform computation
>
>
> Hi,
> Here is my code for the given scenario. Could you please let me know where
> to sort? I mean, on what basis do we have to sort so that the elements
> maintain the same order within each partition as in the original sequence?
>
> import scala.collection.mutable.ListBuffer
>
> val res2 = reduced_hccg.map(_._2)  // gives an RDD of numbers
> res2.foreach(println)
>
> val result = res2.mapPartitions(p => {
>   val l = p.toList
>   val approx = new ListBuffer[Int]
>   val detail = new ListBuffer[Double]  // declared but never filled here
>   // Walk the partition's elements two at a time.
>   for (i <- 0 until l.length - 1 by 2) {
>     println(l(i), l(i + 1))
>     approx ++= Seq(l(i), l(i + 1))  // append the pair
>   }
>   // Only the closure's last expression is returned; the original code
>   // discarded approx.toList.iterator and returned the empty detail list.
>   approx.toList.iterator
> })
> result.foreach(println)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Splitting-RDD-and-Grouping-together-to-perform-computation-tp3153p3450.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
