Eugene Koifman created HIVE-17923:
-------------------------------------

             Summary: 'cluster by' should not be needed for a bucketed table
                 Key: HIVE-17923
                 URL: https://issues.apache.org/jira/browse/HIVE-17923
             Project: Hive
          Issue Type: Sub-task
    Affects Versions: 3.0.0
            Reporter: Eugene Koifman
            Priority: Blocker


Given the table
{noformat}
CREATE TABLE over10k_orc_bucketed(t tinyint,
           si smallint,
           i int,
           b bigint,
           f float,
           d double,
           bo boolean,
           s string,
           ts timestamp,
           `dec` decimal(4,2),
           bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
{noformat}
{noformat}
insert into over10k_orc_bucketed select * from over10k
{noformat}
produces only 1 data file (bucket 0). Based on the input data, it should produce 4. In contrast,
{noformat}
insert into over10k_orc_bucketed select * from over10k cluster by si
{noformat}

does the right thing.
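
One possible way to check how many data files each insert wrote (a sketch, not part of the original report; it assumes the table is freshly created before each insert) is to group rows by Hive's INPUT__FILE__NAME virtual column:
{noformat}
-- Lists each underlying data file and the number of rows read from it.
-- With the bug, a single file (bucket 0) is expected to appear;
-- with 'cluster by si', several bucket files should be listed.
select INPUT__FILE__NAME, count(*)
from over10k_orc_bucketed
group by INPUT__FILE__NAME;
{noformat}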

The full script is in acid_vectorization_original.q.


