Hi,

Hive 0.12 has added support for parallel ORDER BY, and I have a few questions about how it works.

From the source code, I have figured out that to do a parallel order by, a partition table needs to be created, which is then provided as input to TotalOrderPartitioner. To create the partition table, a sample of the Hive table is stored as an ArrayList of byte arrays and then sorted.

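To make the description above concrete, here is a minimal Java sketch of the idea as I understand it. It is not the actual Hive code: the class and method names are made up for illustration, and I am assuming that the split points are taken at evenly spaced positions in the sorted sample and that keys are compared as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not Hive's real sampler class.
class PartitionTableSketch {

    // Lexicographic comparison of byte arrays, treating bytes as unsigned.
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    // Sorts a copy of the in-memory sample and picks numReducers - 1 split keys.
    // The whole sample is held in memory, which is the concern raised in this mail.
    static List<byte[]> buildPartitionKeys(List<byte[]> sample, int numReducers) {
        List<byte[]> splits = new ArrayList<>();
        if (numReducers <= 1 || sample.isEmpty()) {
            return splits; // a single reducer needs no split points
        }
        List<byte[]> sorted = new ArrayList<>(sample);
        sorted.sort(UNSIGNED_BYTES);
        for (int i = 1; i < numReducers; i++) {
            // Evenly spaced positions (an assumption); duplicates are not removed here.
            int index = (int) ((long) i * sorted.size() / numReducers);
            splits.add(sorted.get(index));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<byte[]> sample = new ArrayList<>();
        for (String s : new String[] {"d", "a", "c", "b", "f", "e", "h", "g"}) {
            sample.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (byte[] key : buildPartitionKeys(sample, 4)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

The resulting split keys would be what gets written to the partition file that TotalOrderPartitioner reads, so that each reducer receives one contiguous key range.
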
So I have the following questions:

1) Is my understanding correct?
2) Couldn't storing the entire sample in memory become a bottleneck when the sample size is large?

--
Thanks,
Vaibhav Jain
