Hi Team,

I am creating this table:
CREATE TABLE IF NOT EXISTS orctest2 (
  id string,
  id2 string,
  id3 string,
  id4 string
)
STORED AS ORC tblproperties
("orc.stripe.size"="1048576","orc.row.index.stride"="3333");

The stripe size is set to 1MB.
After loading data, the table file is about 60MB:
-rwxr-xr-x  1 root root 61335650 Dec  2 18:08 000000_0

However, it generated 1492 stripes, and each stripe is only about 40KB.

Stripes:
  Stripe: offset: 3 data: 39124 rows: 5000 tail: 68 index: 292
    Stream: column 0 section ROW_INDEX start: 3 length 16
    Stream: column 1 section ROW_INDEX start: 19 length 69
    Stream: column 2 section ROW_INDEX start: 88 length 69
    Stream: column 3 section ROW_INDEX start: 157 length 69
    Stream: column 4 section ROW_INDEX start: 226 length 69
    Stream: column 1 section DATA start: 295 length 9762
    Stream: column 1 section LENGTH start: 10057 length 19
    Stream: column 2 section DATA start: 10076 length 9762
    Stream: column 2 section LENGTH start: 19838 length 19
    Stream: column 3 section DATA start: 19857 length 9762
    Stream: column 3 section LENGTH start: 29619 length 19
    Stream: column 4 section DATA start: 29638 length 9762
    Stream: column 4 section LENGTH start: 39400 length 19
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2
    Encoding column 2: DIRECT_V2
    Encoding column 3: DIRECT_V2
    Encoding column 4: DIRECT_V2
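
To make the mismatch concrete, here is a small Python sketch of the arithmetic behind "about 40KB". It only uses the figures quoted above (file size, stripe count, and the index/data/tail lengths of the first stripe); the variable and function names are my own labels, and it does not read the ORC file or call any Hive API. The average is a slight overestimate because the file size also includes the file footer.

```python
# Figures copied from the post; nothing here touches Hive or the ORC file.
FILE_BYTES = 61_335_650               # size of 000000_0
STRIPE_COUNT = 1492                   # number of stripes reported
CONFIGURED_STRIPE_BYTES = 1_048_576   # value given for orc.stripe.size

# First stripe from the dump: index, data, and tail lengths in bytes.
INDEX_BYTES, DATA_BYTES, TAIL_BYTES = 292, 39_124, 68


def average_stripe_bytes(file_bytes: int, stripe_count: int) -> float:
    """Average bytes per stripe, ignoring the (small) file footer."""
    return file_bytes / stripe_count


def stripe_bytes(index_bytes: int, data_bytes: int, tail_bytes: int) -> int:
    """Total on-disk size of one stripe: index + data + tail."""
    return index_bytes + data_bytes + tail_bytes


avg = average_stripe_bytes(FILE_BYTES, STRIPE_COUNT)
first = stripe_bytes(INDEX_BYTES, DATA_BYTES, TAIL_BYTES)
ratio = CONFIGURED_STRIPE_BYTES / avg

print(f"average stripe size: {avg:.0f} bytes ({avg / 1024:.1f} KiB)")
print(f"first stripe size:   {first} bytes")
print(f"configured size is about {ratio:.1f}x the average stripe")
```

So the stripes on disk are roughly 25 times smaller than the configured 1MB, and every stripe I looked at holds 5000 rows.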



Does anybody know how the ORC stripe size is determined?
Thanks.

-- 
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
