bq. Is my understanding correct?

Yes.

bq. Isn't it a possibility that storing the entire sample in memory would
become a bottleneck when the sample size is large?

Yes.
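
To make the concern concrete, here is a minimal sketch of the sampling step as you describe it. This is not Hive's actual code: the class name PartitionTableSketch, the method splitPoints, and the evenly spaced choice of split keys are assumptions made for illustration only. It keeps the whole sample in an ArrayList of byte arrays, sorts it, and picks numPartitions - 1 boundary keys of the kind a total-order partitioner would consume.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not taken from the Hive source.
final class PartitionTableSketch {

    // Lexicographic comparison of byte arrays, treating each byte as unsigned.
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Returns numPartitions - 1 split keys taken at evenly spaced
    // positions of the sorted sample (empty if the sample is empty).
    static List<byte[]> splitPoints(List<byte[]> sample, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        // The entire sample is copied and held in memory here.
        List<byte[]> sorted = new ArrayList<>(sample);
        sorted.sort(BYTES);

        List<byte[]> cuts = new ArrayList<>();
        if (sorted.isEmpty()) {
            return cuts;
        }
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) i * sorted.size() / numPartitions);
            cuts.add(sorted.get(idx));
        }
        return cuts;
    }
}
```

In a sketch like this, memory use grows linearly with the number and size of the sampled keys, which is why a large sample can become a bottleneck. If the sample has fewer distinct keys than partitions, the split keys may repeat.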

Thanks,


2014-02-21 13:32 GMT+09:00 Vaibhav Jain <[email protected]>:

> Hi,
>
> Hive 12 has added the functionality of parallel order by. I have a few
> queries regarding how it works.
> From the source code I have figured out that to do a parallel order by, a
> partition table needs to be created,
> which is provided as an input to TotalOrderPartitioner. To create the
> partition table, a sample of
> the Hive table is stored as an ArrayList of byte arrays and then sorted.
>
> So I have the following queries :
>
> 1)  Is my understanding correct?
>
> 2) Isn't it a possibility that storing the entire sample in memory would
> become a bottleneck when the sample size is large?
>
>
> --
> Thanks
> Vaibhav Jain
>
