Malcolm -- typically, you'd use STRSPLIT and an optional FLATTEN to tokenize a chararray on some delimiter. So the following should work:

opt = foreach mt generate C_SUB_ID, flatten(STRSPLIT(seg_ids,':')) as s_seg_id;

Norbert

On Thu, Apr 5, 2012 at 8:58 AM, Malcolm Tye <[email protected]> wrote:

> Hi,
>     I'm storing data into a partitioned table using Hive in RCFile format,
> but I want to use Pig to do the aggregation of that data.
>
> In my array <string> in Hive, I have colon delimited data, E.g.
>
> :0:12:21:99:
>
> With the lateral view and explode functions in Hive, I can output each value
> as a separate row.
>
> In Pig, I think I need to use flatten, but it just outputs the array as a
> single field, and I can't see where to specify that the delimiter is the
> delimiter/value separator
>
> register /opt/pig/trunk/bin/piggybank.jar
> mt = LOAD '/hrly_sub_smry/year_month_day=20120329/hour=04/*' USING
>     org.apache.pig.piggybank.storage.HiveColumnarLoader('C_SUB_ID string,seg_ids array<string>');
> opt = foreach mt generate C_SUB_ID, flatten(seg_ids) as s_seg_id;
> dump opt;
>
> Thanks
>
> Malc
