You can create an hourly partitioned table say H with the partition columns(dt, 
country, hour), then the INSERVER OVERWRITE command will become: ...

INSERT OVERWRITE TABLE H... PARTITION (dt='...', country, hr) ... select ...  
country, hour ..

You can also keep the old table page_view_stg's schema unchanged but make it an 
external table pointing to H. In this way your old queries on page_view_stg 
don't need to be changed.

On Apr 15, 2011, at 12:49 AM, Erix Yao wrote:

Oh, I see.

just as the example we have:

FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
           SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, 
null, null, pvs.ip, pvs.country

The dynamic partition we have is on country, and the other partition is dt.

In this implementation, what if I want to import the data into page_view more 
than 1 time ? Let us say, we import the data hourly, and with current dynamic 
partition implementation , the existing country partition will be overwritten!

Is there any other way to avoid this without telling me to import the data once 
per day?




2011/4/15 Ning Zhang <nzh...@fb.com<mailto:nzh...@fb.com>>
The INSERT OVERWRITE command will not overwrite the whole table. If you specify 
a partition in that table, it will only overwrite that partition. If you 
specify dynamic partitions, it will only create/overwrite partitions that will 
be seen from the input query (pvs.country in the example).


On Apr 15, 2011, at 12:31 AM, Erix Yao wrote:

Does this mean if I want the type field as the partition key , I will have to 
split the raw data by myself and load the files into the target table?

I see there's an example in tutorial:

FROM page_view_stg pvs
    INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
           SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, 
null, null, pvs.ip, pvs.country

but insert must overwrite the table whole table partition.
Can I insert without the overwrite key word?


2011/4/15 Ning Zhang <nzh...@fb.com<mailto:nzh...@fb.com>>
The LOAD DATA command only copy the files to the destination directory. It 
doesn't read the records of the input file, so it cannot do partitioning based 
on record values.

On Apr 14, 2011, at 10:52 PM, Erix Yao wrote:

hi,all
    The dynamic partition function is amazing ,but only works in insert clause. 
Can I use it while loading data into table?

    For example: load data  LOAD DATA LOCAL INPATH `/tmp/pv_2008-06-08_us.txt` 
INTO TABLE page_view PARTITION(date='2008-06-08', country='US', type);
type is the dynamic partition key in the raw data?

This will be very cool! If is supported, I will not have to category the raw 
data according to the type column.



--
haitao.yao@Beijing








--
haitao.yao@Beijing








--
haitao.yao@Beijing





Reply via email to