The earlier query was only meant to illustrate dynamic partitions. For your case, you will need a query along the lines of the one below. I have not tried it, but in theory it should work:

FROM raw
insert overwrite table polished partition (partition1, partition2)
select TRANSFORM(raw.data, raw.partition1, raw.partition2)
USING 'python parser.py'
AS (foo STRING, date STRING, bar MAP<STRING, STRING>, partition1 STRING, partition2 STRING)
CLUSTER BY date;

The dynamic partition columns have to come last in the select, in the same order as in the partition clause, so parser.py needs to pass partition1 and partition2 through unchanged.

Sumanth

On Sun, Sep 18, 2011 at 12:28 PM, Adriaan Tijsseling <[email protected]> wrote:

> I looked at your solution, but the problem is that the "data" column
> still needs to be processed. What I want is to process "data" and put the
> results into a table with partitions defined by the other columns. With
> your solution, I get partitions but still with the same unprocessed data.
>
> Adriaan
>
> On 2011/09/18, at 04:56, Sumanth V wrote:
>
> > Hi Adriaan,
> >
> > To use dynamic partitions, follow these steps inside the hive shell -
> >
> > #Set the following values -
> >
> > set hive.exec.dynamic.partition.mode=nonstrict;
> >
> > set hive.exec.dynamic.partition=true;
> >
> > #Create another table -
> >
> > create table raw_2
> > (
> > data string
> > )
> > partitioned by (partition1 string, partition2 string);
> >
> > #Now insert the values stored in table raw into table raw_2 using the
> > following query -
> >
> > from raw
> > insert overwrite table raw_2 partition (partition1, partition2)
> > select data, partition1, partition2;
> >
> > This will dynamically create the 2 partitions based on the values of
> > partition1 and partition2 and insert the values of 'data' in the
> > appropriate partition.
> >
> > Regards,
> > Sumanth
> >
> > On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling
> > <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I have a table created with
> >>
> >> CREATE TABLE raw(partition1 string, partition2 string, data string) ROW
> >> FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;
> >>
> >> I want to further process "data" and put it in a partition (partition1,
> >> partition2) defined by the values in the relevant row.
> >>
> >> I'm however stuck at trying to use dynamic partitions in a query. With
> >> predefined partition values it's straightforward:
> >>
> >> FROM (
> >> FROM raw
> >> SELECT TRANSFORM(raw.data)
> >> USING 'python parser.py' AS (foo STRING, date STRING, bar
> >> MAP<STRING,STRING>)
> >> CLUSTER BY date
> >> ) tmap
> >> INSERT OVERWRITE TABLE polished PARTITION (partition1='p1',
> >> partition2='p2') SELECT foo, date, bar;
> >>
> >> What would be the best way to define the partition using raw.partition1
> >> and raw.partition2 as values?
> >>
> >> Thanks much,
> >>
> >> Adriaan
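
Only the columns a TRANSFORM script writes out reach the insert, so a script used with dynamic partitions has to emit the partition values itself. Below is a minimal sketch of how parser.py could do that. It assumes the query passes data, partition1 and partition2 into TRANSFORM, and that Hive streams each row to the script as one tab-separated line on stdin, which is the default. The thread does not show the real parsing logic, so parse_data and its 'foo|date|bar' input format are hypothetical stand-ins, and bar is left as a plain string rather than encoded as a map.

```python
import sys


def parse_data(data):
    """Hypothetical stand-in for the real parsing logic in parser.py.

    Assumes data looks like 'foo|date|bar'; the actual format is not
    given in the thread.
    """
    foo, date, bar = data.split("|", 2)
    return foo, date, bar


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    data, partition1, partition2 = line.rstrip("\n").split("\t")
    foo, date, bar = parse_data(data)
    # The partition columns go last and are passed through unchanged.
    return "\t".join([foo, date, bar, partition1, partition2])


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write transformed rows to stdout."""
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(transform_line(line) + "\n")
```

To run this as the actual TRANSFORM script, add a call to main() at the bottom of the file. The output order (foo, date, bar, partition1, partition2) has to match the AS clause of the query.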
