The earlier query was just to show the dynamic partitions concept.
For your case, you will have to use a query something like the one below.
Although I have not given it a try, in theory this should work -

FROM raw
insert overwrite table polished partition (partition1, partition2)
select TRANSFORM(raw.data)
USING 'python parser.py' AS (foo STRING, date STRING, bar MAP<STRING,
STRING>)
CLUSTER BY date


Sumanth



On Sun, Sep 18, 2011 at 12:28 PM, Adriaan Tijsseling <[email protected]
> wrote:

> I looked at your solution, but the problem is still that the "data" column
> needs to be processed still. What I want is to process "data" and put the
> results into a table with partitioned defined by the other columns. With
> your solution, I get partitions but still with the same unprocessed data.
>
> Adriaan
>
> On 2011/09/18, at 04:56, Sumanth V wrote:
>
> > Hi Adriaan,
> >
> > To use dynamic partition, follow the following steps inside hive shell -
> >
> > #Set the following values -
> >
> > set hive.exec.dynamic.partition.mode=nonstrict;
> >
> > set hive.exec.dynamic.partition=true
> >
> > #Create another table -
> >
> > create table raw_2
> > (
> > data string
> > )
> > partitioned by (partition1 string, partition2 string);
> >
> > #Now insert the values stored in table raw into table raw_2 using the
> > following query -
> >
> > from raw
> > insert overwrite table raw_2 partition (partition1, partition2)
> > select data, partition1, partition2;
> >
> > This will dynamically create the 2 partitions based on the values of
> > partition1 and partition2 and insert the values of 'data' in the
> appropriate
> > partition.
> >
> > Regards,
> > Sumanth
> >
> >
> >
> > On Sat, Sep 17, 2011 at 2:18 PM, Adriaan Tijsseling
> > <[email protected]>wrote:
> >
> >> Hi,
> >>
> >> I have a table created with
> >>
> >> CREATE TABLE raw(partition1 string, partition2 string, data string) ROW
> >> FORMAT DELIMITED FIELDS TERMINATED BY '\001' STORED AS TEXTFILE;
> >>
> >> I want to further process "data" and put it in a partition (partition1,
> >> partition2) defined by the values in the relevant row.
> >>
> >> I'm however stuck at trying to use dynamic partitions in a query. With
> >> predefined partition values it's straightforward:
> >>
> >> FROM (
> >> FROM raw
> >> SELECT TRANSFORM(raw.data)
> >>        USING 'python parser.py' AS (foo STRING, date STRING, bar
> >> MAP<STRING,STRING>)
> >> CLUSTER BY date
> >> ) tmap
> >> INSERT OVERWRITE TABLE polished PARTITION (partition1='p1',
> >> partition2='p2') SELECT foo, date, bar;
> >>
> >> What would be the best way to define the partition using raw.partition1
> and
> >> raw.partition2 as values?
> >>
> >> Thanks much,
> >>
> >> Adriaan
> >>
> >>
>
>

Reply via email to