Thanks Sean,
the important part of your answer for me is that orderBy + limit does
only a "partial sort" because of the optimizer. That's what I was missing.
I will give it a try...
J.D.
On Mon, Sep 5, 2016 at 2:26 PM, Sean Owen wrote:
No,
I'm not advising you to use .rdd, just saying it is possible.
Although I'd only use RDDs if you had a good reason to, given Datasets
now, they are not gone or even deprecated.
You do not need to order the whole data set to get the top element.
That isn't what top does though. You might
Thanks Sean,
I was under the impression that the Spark creators are trying to persuade
the user community not to use the RDD API directly. The Spark Summit I
attended was full of this. So I am a bit surprised to hear "use the RDD
API" as advice from you. But if this is the way, then I have a second
question. For conver
You can always call .rdd.top(n) of course. Although it's slightly
clunky, you can also .orderBy($"value".desc).take(n). Maybe there's an
easier way.
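
For reference, here is a minimal sketch of the two options side by side.
It assumes Spark 2.x with a local SparkSession and a small Dataset[Int],
whose single column is named "value" by default. The object and method
names (TopNSketch, topViaRdd, topViaOrderBy) are made up for illustration
and are not part of Spark.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical helper object, for illustration only.
object TopNSketch {
  // Option 1: drop to the underlying RDD and use its top(n),
  // which relies on the implicit Ordering[Int].
  def topViaRdd(ds: Dataset[Int], n: Int): Array[Int] =
    ds.rdd.top(n)

  // Option 2: stay in the Dataset API: sort descending, then take n rows.
  def topViaOrderBy(ds: Dataset[Int], n: Int): Array[Int] = {
    import ds.sparkSession.implicits._ // brings the $"..." column syntax into scope
    ds.orderBy($"value".desc).take(n)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")        // assumption: local mode, just for the demo
      .appName("top-n-sketch")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(5, 3, 9, 1, 7).toDS()

    println(topViaRdd(ds, 2).mkString(", "))     // 9, 7
    println(topViaOrderBy(ds, 2).mkString(", ")) // 9, 7

    // Prints the physical plan, so you can check how sort + limit is planned.
    ds.orderBy($"value".desc).limit(2).explain()

    spark.stop()
  }
}
```

Both calls return the n largest values to the driver as an array, so
they only make sense for small n.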
I don't think there's a strong reason other than it wasn't worth it
to write this and many other utility wrappers that a) already exist on
the und