There is no sampling for order by in Hive. Hive uses a single reducer for order by (if you're talking about the MR execution engine).
Hive on Spark is different for this, though.

Thanks,
Xuefu

On Mon, Mar 2, 2015 at 2:17 AM, Jeff Zhang <[email protected]> wrote:

> Order by usually invoke 2 steps (sampling job and repartition job) but
> hive only run one mr job for order by, so wondering when and where does
> hive do sampling ? client side ?
>
> --
> Best Regards
>
> Jeff Zhang
