There has been some investigation into Crunch on Tez [1]. I don't believe anyone is actively working on it at the moment, but we'd love patches if someone has the time to work on it.
[1] - https://issues.apache.org/jira/browse/CRUNCH-441

On Wed, Jun 24, 2015 at 8:20 AM, Kidong Lee <[email protected]> wrote:

> Thanks for your reply, your answer is very helpful for understanding it.
>
> I have another question: is there any plan to support Tez as an engine
> that Crunch can run on?
>
> - Kidong.
>
>
> 2015-06-23 23:06 GMT+09:00 Josh Wills <[email protected]>:
>
>> Hey Kidong,
>>
>> The short answer is that we cheat. The class to look at for the
>> implementation details is:
>>
>> https://github.com/apache/crunch/blob/master/crunch-spark/src/main/java/org/apache/crunch/impl/spark/collect/PGroupedTableImpl.java
>>
>> ...and you sort of have to walk through three different tricks we do to
>> make the MapReduce partitioners, sorting classes, and grouping classes
>> (all of which we use in the secondary sort implementation) work on Spark.
>>
>> J
>>
>> On Tue, Jun 23, 2015 at 6:57 AM, David Ortiz <[email protected]> wrote:
>>
>>> Correct me if I'm wrong, but if you are using an Avro record or a Tuple
>>> data structure, couldn't you get a secondary sort just by putting the
>>> fields in the order you want the sort applied and then using the regular
>>> sort API? For example, if I had, say, itemid, itemprice, and nosold, and
>>> I wanted to do something like
>>>
>>> select itemid, itemprice, sum(nosold) from table group by itemid,
>>> itemprice order by itemid, itemprice asc;
>>>
>>> I could implement that as
>>>
>>> PTable<Pair<Integer, Double>, Long> items = {...some code to load the
>>> data into this
>>> structure...}.groupByKey().combineValues(Aggregators.SUM_LONGS).sort()
>>>
>>> and get something similar, right?
>>>
>>> On Tue, Jun 23, 2015 at 8:52 AM Kidong Lee <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have been using Spark to implement our recommendation algorithm, for
>>>> which it was hard to get a secondary sort by value, so I implemented
>>>> that part with the help of Hive. I think Spark does not support
>>>> secondary sort yet.
>>>>
>>>> I have recently implemented the same recommendation algorithm in
>>>> Crunch running on Spark, using Crunch's secondary sort API.
>>>>
>>>> I am wondering how secondary sort is implemented in Crunch running on
>>>> Spark. Can anybody give me an explanation of the implementation of
>>>> secondary sort in Crunch on Spark?
>>>>
>>>> Thanks,
>>>>
>>>> - Kidong.
>>>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
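
For anyone landing on this thread later: the secondary sort API Kidong mentions is org.apache.crunch.lib.SecondarySort. Below is a minimal sketch of how it is typically used, reusing the itemid/itemprice/nosold fields from David's example; the DoFn body, field names, and PTypes are illustrative, and the sortAndApply signature is written from memory rather than taken from the thread.

  import org.apache.crunch.DoFn;
  import org.apache.crunch.Emitter;
  import org.apache.crunch.PCollection;
  import org.apache.crunch.PTable;
  import org.apache.crunch.Pair;
  import org.apache.crunch.lib.SecondarySort;
  import org.apache.crunch.types.writable.Writables;

  public class SecondarySortSketch {
    // Group by itemid; within each group, the (price, count) value pairs
    // arrive ordered by the first element of the pair (the price), which is
    // the "secondary" part of the sort.
    public static PCollection<String> pricesPerItem(
        PTable<Integer, Pair<Double, Long>> sales) {
      return SecondarySort.sortAndApply(
          sales,
          new DoFn<Pair<Integer, Iterable<Pair<Double, Long>>>, String>() {
            @Override
            public void process(Pair<Integer, Iterable<Pair<Double, Long>>> group,
                                Emitter<String> emitter) {
              for (Pair<Double, Long> priceAndCount : group.second()) {
                emitter.emit(group.first() + "\t" + priceAndCount.first()
                    + "\t" + priceAndCount.second());
              }
            }
          },
          Writables.strings());
    }
  }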
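
A slightly more concrete version of the composite-key route David describes: his .sort() call would go through the Sort library rather than a method on PTable, so, assuming org.apache.crunch.lib.Sort.sort orders a PTable by its key (here the Pair of itemid and itemprice), the stage looks roughly like the sketch below. Field names and types are again illustrative.

  import org.apache.crunch.PTable;
  import org.apache.crunch.Pair;
  import org.apache.crunch.fn.Aggregators;
  import org.apache.crunch.lib.Sort;

  public class CompositeKeySketch {
    // Roughly: select itemid, itemprice, sum(nosold)
    //          group by itemid, itemprice order by itemid, itemprice asc
    public static PTable<Pair<Integer, Double>, Long> sumAndSort(
        PTable<Pair<Integer, Double>, Long> items) {
      PTable<Pair<Integer, Double>, Long> summed =
          items.groupByKey().combineValues(Aggregators.SUM_LONGS());
      // Sorting by the composite key gives the (itemid, itemprice) ordering.
      return Sort.sort(summed);
    }
  }

Note that this orders the whole table by the composite key, which matches the SQL above, but it is a total sort rather than a secondary sort of the values within each itemid group, which is what SecondarySort gives you.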

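As for the "tricks" Josh points at in PGroupedTableImpl: the general idea for getting MapReduce-style secondary sort semantics onto Spark is to partition on the natural key only while sorting on the full composite key. The sketch below is not Crunch's actual code, just a hand-rolled illustration of that pattern using Spark's repartitionAndSortWithinPartitions; the class names and key layout are made up for the example.

  import java.io.Serializable;
  import java.util.Comparator;

  import org.apache.spark.Partitioner;
  import org.apache.spark.api.java.JavaPairRDD;

  import scala.Tuple2;

  public class SparkSecondarySortSketch {

    // Partition only on the natural key (itemid) so that every record for an
    // item lands in the same partition, regardless of the secondary field.
    static class NaturalKeyPartitioner extends Partitioner {
      private final int numPartitions;

      NaturalKeyPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
      }

      @Override
      public int numPartitions() {
        return numPartitions;
      }

      @Override
      public int getPartition(Object key) {
        @SuppressWarnings("unchecked")
        Tuple2<Integer, Double> composite = (Tuple2<Integer, Double>) key;
        return (composite._1().hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

    // Compare on (itemid, itemprice) so that, within a partition, records come
    // out grouped by item and ordered by price.
    static class CompositeKeyComparator
        implements Comparator<Tuple2<Integer, Double>>, Serializable {
      @Override
      public int compare(Tuple2<Integer, Double> a, Tuple2<Integer, Double> b) {
        int byItem = a._1().compareTo(b._1());
        return byItem != 0 ? byItem : a._2().compareTo(b._2());
      }
    }

    public static JavaPairRDD<Tuple2<Integer, Double>, Long> secondarySort(
        JavaPairRDD<Tuple2<Integer, Double>, Long> sales, int partitions) {
      return sales.repartitionAndSortWithinPartitions(
          new NaturalKeyPartitioner(partitions), new CompositeKeyComparator());
    }
  }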