Have you tried bucketing by the column, plus setting orc.create.index and
orc.bloom.filter.columns?
CREATE TABLE dummy (
ID INT
, CLUSTERED INT
, SCATTERED INT
, RANDOMISED INT
, RANDOM_STRING VARCHAR(50)
, SMALL_VC VARCHAR(10)
, PADDING VARCHAR(10)
)
CLUSTERED BY (ID) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES (
"orc.create.index"="true",
"orc.bloom.filter.columns"="ID",
"orc.bloom.filter.fpp"="0.05",
"orc.compress"="SNAPPY",
"orc.stripe.size"="16777216",
"orc.row.index.stride"="10000" )
;
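Note that CLUSTERED BY on its own only hashes rows into buckets; it does not order the rows inside each bucket file. A minimal sketch of a variant that also asks Hive to keep each bucket sorted on ID is below. The table name dummy_sorted is hypothetical, the column list is trimmed for brevity, and the two SET lines are an assumption that you are on an older Hive release (before 2.0) where bucketing and sorting are not enforced by default on insert.

```sql
-- Assumption: older Hive releases need these for INSERT to honour
-- the bucketing and sort specification declared on the table.
SET hive.enforce.bucketing=true;
SET hive.enforce.sorting=true;

-- Hypothetical variant of the table above, with a sort order per bucket.
CREATE TABLE dummy_sorted (
ID INT
, CLUSTERED INT
, SCATTERED INT
)
CLUSTERED BY (ID) SORTED BY (ID ASC) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES (
"orc.create.index"="true",
"orc.row.index.stride"="10000" )
;

-- Populate through Hive so the sort specification is applied on write.
INSERT OVERWRITE TABLE dummy_sorted
SELECT ID, CLUSTERED, SCATTERED FROM dummy;
```

With a sorted bucket, the min/max values printed per row group by the orcfiledump row index should be non-overlapping and increasing within a file, which gives a reasonable check at the granularity of the index stride rather than a row-by-row proof.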
HTH
Dr Mich Talebzadeh
LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 9 April 2016 at 01:53, Gautam <[email protected]> wrote:
> Hey,
>
> This might be too obvious a question but I haven't found a way
> to validate ordering in an ORC file. I need each file to be ordered by a
> column. Is there a sure-shot way of ensuring the sort order in an ORC file
> is as I expect it?
>
> The closest I've come is using hive --orcfiledump --rowindex
> <col_id>, which prints that column's min/max values in the index. But that is
> still not saying if the data within the stripes is sorted.
>
> Cheers,
> -Gautam.
>