Query regarding Hive Parallel Orderby

Vaibhav Jain Thu, 20 Feb 2014 20:33:07 -0800

Hi,

Hive 0.12 added support for parallel ORDER BY, and I have a few questions
about how it works.
From the source code, I have figured out that to perform a parallel
ORDER BY, a partition table needs to be created, which is provided as input
to TotalOrderPartitioner. To create the partition table, a sample of the
Hive table is stored as an ArrayList of byte arrays and then sorted.
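To make the step described above concrete, here is a minimal Java sketch of it: sort an in-memory sample of serialized keys (byte arrays) and pick evenly spaced boundary keys, which a total-order partitioner would then use as split points. This is a simplified illustration, not Hive's actual code. The class and method names (PartitionTableSketch, computeSplits) are hypothetical, and the unsigned byte-wise ordering and the rounding rule for choosing split indices are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Hive's implementation: build a partition table
// (the boundary keys handed to a total-order partitioner) from a sample.
public class PartitionTableSketch {

    // Assumed ordering: compare serialized keys lexicographically as unsigned bytes.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Sorts the whole sample in memory, then picks up to numReducers - 1
    // evenly spaced keys as split points.
    static List<byte[]> computeSplits(List<byte[]> sample, int numReducers) {
        sample.sort(PartitionTableSketch::compareUnsigned);
        List<byte[]> splits = new ArrayList<>();
        if (sample.isEmpty() || numReducers <= 1) {
            return splits;
        }
        double step = (double) sample.size() / numReducers;
        for (int i = 1; i < numReducers; i++) {
            int idx = (int) Math.min(Math.round(step * i), sample.size() - 1);
            byte[] candidate = sample.get(idx);
            // Skip duplicate keys so the split points stay strictly increasing.
            if (splits.isEmpty()
                    || compareUnsigned(splits.get(splits.size() - 1), candidate) < 0) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<byte[]> sample = new ArrayList<>();
        for (String s : new String[] {"pear", "apple", "fig", "kiwi", "banana", "cherry"}) {
            sample.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (byte[] split : computeSplits(sample, 3)) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
    }
}
```

With three reducers, the six sample keys above yield two split points ("cherry" and "kiwi"). Note that in this sketch the entire sample list is held in memory while it is sorted, so memory use grows with the sample size.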


So I have the following questions:

1) Is my understanding correct?

2) Couldn't storing the entire sample in memory become a bottleneck when
the sample size is large?


-- 
Thanks
Vaibhav Jain
