RE: "Exploding" a Hive array<string> in Pig from an RCFile

Malcolm Tye Wed, 11 Apr 2012 15:59:54 -0700

Hi Norbert,

I don't seem to be getting what I'm after. If my data looks like this:


1133957209,61:0:1
4524524233,21:0

I want to produce

1133957209,61
1133957209,0
1133957209,1
4524524233,21
4524524233,0

I changed the LOAD statement to

mt = LOAD '/hrly_sub_smry/year_month_day=20120329/hour=04/*' USING
org.apache.pig.piggybank.storage.HiveColumnarLoader('C_SUB_ID string,seg_ids
array<string>');
opt = foreach mt generate C_SUB_ID, FLATTEN(STRSPLIT(seg_ids,':')) as
s_seg_id;

I don't seem to be getting the cross product, just something like the
following

1133957209,61,0,1
4524524233,21,0

Any ideas?


Thanks

Malc
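
A possible explanation, offered as a sketch rather than a confirmed fix: in Pig, FLATTEN applied to a tuple widens the row into extra fields, while FLATTEN applied to a bag emits one row per element. STRSPLIT returns a tuple, which matches the wide output above. TOKENIZE returns a bag, and Pig releases from 0.10 onward (as far as I know) accept a delimiter as its second argument. The example below assumes seg_ids arrives as a chararray, and it uses a hypothetical sample.csv in place of the RCFile data.

```pig
-- Assumption: sample.csv is a hypothetical local file holding the two rows
-- from this thread:
--   1133957209,61:0:1
--   4524524233,21:0
mt = LOAD 'sample.csv' USING PigStorage(',')
     AS (C_SUB_ID:chararray, seg_ids:chararray);

-- STRSPLIT returns a tuple, so FLATTEN widens each row into extra columns.
wide = FOREACH mt GENERATE C_SUB_ID, FLATTEN(STRSPLIT(seg_ids, ':'));

-- TOKENIZE returns a bag of single-field tuples, so FLATTEN emits one row
-- per token, paired with the C_SUB_ID of the source row.
opt = FOREACH mt GENERATE C_SUB_ID, FLATTEN(TOKENIZE(seg_ids, ':')) AS s_seg_id;

DUMP opt;
```

If the loader already delivers seg_ids as a bag instead of a chararray, FLATTEN(seg_ids) on its own should be enough; DESCRIBE mt shows which case applies.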


-----Original Message-----
From: Norbert Burger [mailto:[email protected]] 
Sent: 06 April 2012 16:01
To: [email protected]
Subject: Re: "Exploding" a Hive array<string> in Pig from an RCFile

Malcolm -- typically, you'd use a STRSPLIT and optional FLATTEN to tokenize
a chararray on some delimiter. So the following should work:

opt = foreach mt generate C_SUB_ID, flatten(STRSPLIT(seg_ids,':')) as
s_seg_id;

Norbert

On Thu, Apr 5, 2012 at 8:58 AM, Malcolm Tye
<[email protected]> wrote:

> Hi,
>    I'm storing data into a partitioned table using Hive in RCFile 
> format, but I want to use Pig to do the aggregation of that data.
>
> In my array<string> in Hive, I have colon-delimited data, e.g.
>
> :0:12:21:99:
>
> With the lateral view and explode functions in Hive, I can output each 
> value as a separate row.
>
> In Pig, I think I need to use flatten, but it just outputs the array
> as a single field, and I can't see where to specify that the colon
> is the delimiter/value separator.
>
> register /opt/pig/trunk/bin/piggybank.jar
> mt = LOAD '/hrly_sub_smry/year_month_day=20120329/hour=04/*' USING
> org.apache.pig.piggybank.storage.HiveColumnarLoader('C_SUB_ID
> string,seg_ids array<string>');
> opt = foreach mt generate C_SUB_ID, flatten(seg_ids) as s_seg_id;
> dump opt;
>
>
>
> Thanks
>
> Malc
>
>
>
