Hi, Shao & Pendey
After applying repartition-and-sort-within-partitions, the application
running on Spark is now faster than on MR. I will try to run it on a much
larger dataset as a benchmark.
Thanks again for the guidance.
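For reference, a minimal sketch of the repartition-and-sort-within-partitions
pattern mentioned above; the key/value types, partition count, and sample data
are illustrative assumptions:

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object RepartitionSortSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("repartition-sort-sketch"))
        val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
        // A single shuffle that both repartitions the data and sorts each
        // partition by key, mirroring MR's map -> shuffle -> sorted reducer input.
        val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(4))
        sorted.mapPartitionsWithIndex { (idx, iter) =>
          iter.map { case (k, v) => s"partition $idx: $k -> $v" }
        }.collect().foreach(println)
        sc.stop()
      }
    }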
周千昊 wrote on Fri, Sep 11, 2015 at 1:35 PM:
Hi, Shao & Pendey
Thanks for the tips. I will try to work around this.
Saisai Shao wrote on Fri, Sep 11, 2015 at 1:23 PM:
Hi Qianhao,
I think you could sort the data yourself if you want to achieve the same
result as MR, like rdd.reduceByKey(...).mapPartitions(// sort within each
partition). Do not call sortByKey again, since it will introduce another
shuffle (that is the reason why it is slower than MR).
The problem: in MR jobs the output is sorted only within each reducer. That
can be better emulated by sorting each partition of the RDD rather than doing
a total sort of the whole RDD. In Rdd.mapPartitions you can sort the data
within one partition and try...
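For reference, a minimal sketch of what both replies describe: aggregate with
reduceByKey, then sort locally inside mapPartitions instead of calling
sortByKey. The element types here are illustrative assumptions:

    import org.apache.spark.rdd.RDD

    // One shuffle for the reduce, then a local sort per partition. Calling
    // sortByKey instead would add a second shuffle to impose a total order
    // that MR-style reducer output does not guarantee anyway.
    def reduceAndSortWithinPartitions(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
      rdd.reduceByKey(_ + _)
        .mapPartitions { iter =>
          // Materializes one partition in memory to sort it, matching what a
          // single MR reducer sees: keys ordered within its own partition only.
          iter.toArray.sortBy(_._1).iterator
        }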
On Sep 11, 2015 7:36 AM, "周千昊" wrote:
Hi, all
Can anyone give some tips about this issue?
周千昊 wrote on Tue, Sep 8, 2015 at 4:46 PM:
Hi, community
I have an application which I am trying to migrate from MR to Spark.
It does some calculations on data from Hive and outputs HFiles, which are
then bulk loaded into an HBase table. Details as follows:
Rdd<...> input = getSourceInputFromHive()
Rdd<...> mapSideResult =
    input.glom().mapPartitions(...)
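The quoted snippet is cut off above. As a rough sketch of the pipeline the
message describes (compute from Hive, shuffle with an in-partition sort, write
HFiles for bulk load): the column family "cf", qualifier "col", key/value
types, and partition count are all hypothetical, and a real job would use
HFileOutputFormat2.configureIncrementalLoad so that partitions line up with
the table's region boundaries:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.hbase.KeyValue
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    // Hypothetical sketch: rows keyed by HBase row key, values already computed.
    def writeHFiles(rows: RDD[(String, Long)], path: String, conf: Configuration): Unit = {
      rows
        // HFileOutputFormat2 expects its input ordered by row key, so sort
        // within partitions in the same shuffle that partitions the data.
        .repartitionAndSortWithinPartitions(new HashPartitioner(16))
        .map { case (k, v) =>
          val rowKey = Bytes.toBytes(k)
          (new ImmutableBytesWritable(rowKey),
           new KeyValue(rowKey, Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(v)))
        }
        .saveAsNewAPIHadoopFile(path,
          classOf[ImmutableBytesWritable], classOf[KeyValue],
          classOf[HFileOutputFormat2], conf)
    }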