I am using the latest from GitHub, compiled locally.

On Sat, Feb 22, 2014 at 3:22 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> Which version of Spark are you using?
>
> TD
>
> On Sat, Feb 22, 2014 at 3:15 PM, Fabrizio Milo aka misto <mistob...@gmail.com> wrote:
>>
>> Well, it turns out you can use the takeOrdered function and create your
>> own Ordering object:
>>
>> object AceScoreOrdering extends Ordering[Record] {
>>   def compare(a: Record, b: Record) = a.score.ace_score compare b.score.ace_score
>> }
>>
>> val collected = dataset.takeOrdered(topN)(AceScoreOrdering)
>>
>> That is what I really wanted, but now for some reason I am getting this error:
>>
>> 14/02/22 09:11:53 ERROR actor.OneForOneStrategy: scala.collection.immutable.Nil$ cannot be cast to org.apache.spark.util.BoundedPriorityQueue
>> java.lang.ClassCastException: scala.collection.immutable.Nil$ cannot be cast to org.apache.spark.util.BoundedPriorityQueue
>>   at org.apache.spark.rdd.RDD$$anonfun$top$2.apply(RDD.scala:941)
>>   at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:727)
>>   at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:724)
>>   at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
>>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:843)
>>   at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:598)
>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
>>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>
>> On Sat, Feb 22, 2014 at 2:56 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>> > You can use RDD.sortByKey to sort as well: rdd.map(x => (x, x)).sortByKey(...).map(_._1)
>> >
>> > Not sure if it will work, but rdd.map(x => (x, null)).sortByKey(...) may be more efficient.
>> >
>> > TD
>> >
>> > On Sat, Feb 22, 2014 at 2:41 PM, Fabrizio Milo aka misto <mistob...@gmail.com> wrote:
>> >>
>> >> Hello everyone,
>> >>
>> >> Is it possible to do a parallel sort using Spark?
>> >> I would expect some kind of method rdd.sort((a, b) => a < b),
>> >> but I can only find sortByKey.
>> >>
>> >> Am I missing something?
>> >>
>> >> Thanks
>> >>
>> >> Fabrizio
--
LinkedIn: http://linkedin.com/in/fmilo
Twitter: @fabmilo
Github: http://github.com/Mistobaan/
-----------------------
Simplicity, consistency, and repetition - that's how you get through. (Jack Welch)
Perfection must be reached by degrees; she requires the slow hand of time. (Voltaire)
The best way to predict the future is to invent it. (Alan Kay)
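
For anyone finding this thread later, here is a minimal, self-contained sketch of the two approaches discussed above: takeOrdered with a custom Ordering, and a full parallel sort via sortByKey. The Record and Score case classes, the ace_score field, and the local[*] master are assumptions standing in for the poster's actual types; this is not the exact code from the thread.

// Sketch only: Record/Score are hypothetical stand-ins for the poster's classes.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on the 0.9/1.x-era Spark
                                       // used in this thread, automatic in newer versions

case class Score(ace_score: Double)
case class Record(id: Long, score: Score)

object TopNExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("topn-example").setMaster("local[*]"))
    val dataset = sc.parallelize(Seq(
      Record(1L, Score(0.9)), Record(2L, Score(0.1)), Record(3L, Score(0.5))))
    val topN = 2

    // Approach 1: takeOrdered with a custom Ordering.
    // Ordering.by is a shorter equivalent of the hand-written AceScoreOrdering above.
    val aceScoreOrdering: Ordering[Record] = Ordering.by((r: Record) => r.score.ace_score)
    val smallestN = dataset.takeOrdered(topN)(aceScoreOrdering)         // topN smallest scores
    val largestN  = dataset.takeOrdered(topN)(aceScoreOrdering.reverse) // topN largest scores

    // Approach 2: full parallel sort by promoting the sort field to a key,
    // sorting by key, then dropping the key again.
    val sortedByScore = dataset
      .map(r => (r.score.ace_score, r))
      .sortByKey(ascending = true)
      .map(_._2)

    println(smallestN.mkString(", "))
    println(largestN.mkString(", "))
    println(sortedByScore.take(topN).mkString(", "))
    sc.stop()
  }
}

takeOrdered only ships topN elements per partition back to the driver, so it is usually the cheaper choice when you only need the top results; the sortByKey route materializes a fully sorted RDD and is the one to use when you need the entire dataset in order.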