Correct me if I'm wrong, but if you are using an Avro record or a Tuple
data structure, couldn't you get a secondary sort by just putting the
fields in the order you want the sort applied and then using the regular
sort API?  For example, if I had, say, itemid, itemprice, and nosold, and I
wanted to do something like...

select itemid, itemprice, sum(nosold) from table group by itemid,
itemprice order by itemid, itemprice asc;

I could implement that as...

PTable<Pair<Integer, Double>, Long> items = {...some code to load the data
into this structure...}.groupByKey().combineValues(Aggregators.SUM_LONGS()).sort()

and get something similar, right?
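
For concreteness, here is a minimal sketch of that idea. The MRPipeline,
the tab-delimited input format, the file paths, and the call to Sort.sort
from org.apache.crunch.lib are my own assumptions for illustration (and I'd
want to double-check that the composite Pair key really sorts
field-by-field), not something confirmed in this thread:

import org.apache.crunch.MapFn;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.Pipeline;
import org.apache.crunch.fn.Aggregators;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.lib.Sort;
import org.apache.crunch.types.avro.Avros;

public class ItemSalesSort {
  public static void main(String[] args) {
    // MRPipeline for simplicity; a SparkPipeline should be able to run the
    // same logical plan.
    Pipeline pipeline = new MRPipeline(ItemSalesSort.class);

    // Parse tab-delimited "itemid<TAB>itemprice<TAB>nosold" lines into
    // ((itemid, itemprice), nosold); the composite key carries both sort fields.
    PTable<Pair<Integer, Double>, Long> items = pipeline
        .readTextFile("/path/to/items")               // illustrative input path
        .parallelDo(new MapFn<String, Pair<Pair<Integer, Double>, Long>>() {
          @Override
          public Pair<Pair<Integer, Double>, Long> map(String line) {
            String[] f = line.split("\t");
            return Pair.of(
                Pair.of(Integer.valueOf(f[0]), Double.valueOf(f[1])),
                Long.valueOf(f[2]));
          }
        }, Avros.tableOf(Avros.pairs(Avros.ints(), Avros.doubles()), Avros.longs()));

    // GROUP BY (itemid, itemprice), SUM(nosold), then sort on the composite
    // key, which is what the ORDER BY itemid, itemprice above asks for.
    PTable<Pair<Integer, Double>, Long> result =
        Sort.sort(items.groupByKey().combineValues(Aggregators.SUM_LONGS()));

    pipeline.writeTextFile(result, "/path/to/output"); // illustrative output path
    pipeline.done();
  }
}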


On Tue, Jun 23, 2015 at 8:52 AM Kidong Lee <[email protected]> wrote:

> Hi,
>
> I have been using Spark to implement our recommendation algorithm, but it
> was hard to get a secondary sort by value, so I implemented the algorithm
> with the help of Hive.
> I don't think Spark supports secondary sort yet.
>
> I have recently implemented the same recommendation algorithm in Crunch
> running on Spark, using the Crunch secondary sort API.
>
> I am wondering how secondary sort is implemented in Crunch running on Spark.
>
> Can anybody give me some explanation of the implementation of
> secondary sort in Crunch on Spark?
>
> thanks,
>
> - Kidong.
>
>
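
For anyone reading along later, here is a rough sketch of the
SecondarySort.sortAndApply usage the quoted mail refers to. The
itemid/itemprice/nosold layout just reuses the example above, and the
method shape is from memory, so treat it as an assumption to verify rather
than a reference:

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.lib.SecondarySort;
import org.apache.crunch.types.avro.Avros;

public class SecondarySortSketch {
  // sales: itemid -> (itemprice, nosold); the values reach the DoFn grouped
  // by itemid and sorted within each group by itemprice (the first value field).
  public static PCollection<String> pricesSortedPerItem(
      PTable<Integer, Pair<Double, Long>> sales) {
    return SecondarySort.sortAndApply(
        sales,
        new DoFn<Pair<Integer, Iterable<Pair<Double, Long>>>, String>() {
          @Override
          public void process(Pair<Integer, Iterable<Pair<Double, Long>>> in,
                              Emitter<String> emitter) {
            for (Pair<Double, Long> priceAndCount : in.second()) {
              emitter.emit(in.first() + "\t" + priceAndCount.first()
                  + "\t" + priceAndCount.second());
            }
          }
        },
        Avros.strings());
  }
}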
