bq. Is my understanding correct?

Yes.
bq. Isn't it a possibility that storing the entire sample in memory would become a bottleneck when the sample size is large?

Yes.

Thanks,

2014-02-21 13:32 GMT+09:00 Vaibhav Jain <[email protected]>:
> Hi,
>
> Hive 12 has added the functionality of parallel order by. I have a few
> queries regarding the working of it.
> From the source code I have figured out that to do a parallel order by, a
> partition table needs to be created,
> which is provided as an input to TotalOrderPartitioner. To create the
> partition table, a sample of
> the hive table is stored as an ArrayList of byte arrays and then sorted.
>
> So I have the following queries:
>
> 1) Is my understanding correct?
>
> 2) Isn't it a possibility that storing the entire sample in memory would
> become a bottleneck when the sample size is large?
>
> --
> Thanks
> Vaibhav Jain
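For anyone following the thread, here is a rough sketch of the partition-table step described in the quoted message: hold a sample of keys as byte arrays, sort it, and pick evenly spaced boundary keys. This is not Hive's actual code. The class `PartitionTableSketch` and the methods `splitPoints` and `partitionFor` are hypothetical names made up for illustration, and the unsigned byte-wise ordering is an assumption about how keys compare.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: hypothetical names, not taken from the Hive source.
final class PartitionTableSketch {

    // Lexicographic order on byte arrays, treating each byte as unsigned (assumed ordering).
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Sorts a copy of the sample and returns up to numPartitions - 1 boundary keys.
    static List<byte[]> splitPoints(List<byte[]> sample, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        // The whole sample lives on the heap here, so memory grows with sample size.
        List<byte[]> sorted = new ArrayList<>(sample);
        sorted.sort(BYTES);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions && !sorted.isEmpty(); i++) {
            int idx = (int) ((long) i * sorted.size() / numPartitions);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    // Partition index for a key: the number of boundary keys that are <= key.
    static int partitionFor(byte[] key, List<byte[]> splits) {
        int p = 0;
        while (p < splits.size() && BYTES.compare(key, splits.get(p)) >= 0) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        List<byte[]> sample = new ArrayList<>();
        for (String s : new String[] {"m", "c", "x", "a", "t", "h"}) {
            sample.add(s.getBytes(StandardCharsets.UTF_8));
        }
        List<byte[]> splits = splitPoints(sample, 3);
        for (byte[] split : splits) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
    }
}
```

In this sketch the `sorted` list keeps every sampled key in memory at once, which is the cost the second question is asking about: heap use scales with the number of sampled rows and the size of each key.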
