Re: Block Sampling

Carl Steinbach Fri, 15 Jun 2012 13:00:54 -0700

Done!

On Fri, Jun 15, 2012 at 12:26 PM, Ladda, Anand <[email protected]> wrote:


> Thanks, Carl. Could you give me edit rights to the wiki (
> [email protected]) to update the sampling page with this info?
>
> *From:* Carl Steinbach [mailto:[email protected]]
> *Sent:* Friday, June 15, 2012 3:20 PM
> *To:* [email protected]
> *Subject:* Re: Block Sampling
>
> Hi Anand,
>
> This feature was implemented in HIVE-2121 and appeared in Hive 0.8.0.
>
> Ref: https://issues.apache.org/jira/browse/HIVE-2121
>
> Thanks.
>
> Carl
>
> On Fri, Jun 15, 2012 at 11:59 AM, Ladda, Anand <[email protected]>
> wrote:
>
> Has the block sampling feature been added to one of the latest (Hive 0.8
> or Hive 0.9) releases? The wiki has the blurb below on block sampling:
>
> *Block Sampling*
>
> It is a feature that is still on trunk and is not yet in any release
> version.
>
> block_sample: TABLESAMPLE (n PERCENT)
>
> This will allow Hive to pick up at least n% of the data size (note that
> this does not necessarily mean n% of the rows) as input. Only
> CombineHiveInputFormat is supported, and some special compression formats
> are not handled. If sampling fails, the input of the MapReduce job will be
> the whole table/partition. Sampling is done at the HDFS block level, so the
> sampling granularity is the block size. For example, if the block size is
> 256MB, even if n% of the input size is only 100MB, you get 256MB of data.
>
> In the following example, 0.1% or more of the input size will be used for
> the query.
>
> SELECT *
> FROM source TABLESAMPLE(0.1 PERCENT) s;
>
> Sometimes you want to sample the same data with different blocks. To do
> so, change this seed number:
>
> set hive.sample.seednumber=<INTEGER>;
>
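The block-granularity example in the quoted wiki text (256MB blocks, a 100MB target) can be sketched as simple arithmetic. This is only a simplified model, not Hive's actual split-selection code: it assumes uniform block sizes, and the function name `sampled_bytes` and its parameters are hypothetical.

```python
import math

MB = 1024 * 1024


def sampled_bytes(total_bytes: int, percent: float, block_bytes: int) -> int:
    """Estimate how much input TABLESAMPLE(percent PERCENT) reads.

    Simplified model (an assumption, not Hive's implementation): whole
    blocks are taken until at least `percent` of the input is covered.
    """
    if total_bytes <= 0:
        return 0
    # Requested sample size in bytes.
    target = total_bytes * percent / 100.0
    # Sampling granularity is one block, so round up to whole blocks
    # and always read at least one block.
    blocks = max(1, math.ceil(target / block_bytes))
    # The sample can never be larger than the table/partition itself.
    return min(blocks * block_bytes, total_bytes)
```

Under these assumptions, `sampled_bytes(100000 * MB, 0.1, 256 * MB)` gives `256 * MB`: the 0.1% target is only 100MB, but a full 256MB block is read.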
