Thanks, guys. I tried the code and found the right pattern for a bag that can be loaded.

Regards!

Yong

On Mon, Jun 11, 2012 at 10:32 PM, Russell Jurney wrote:

> my_data = LOAD 'location' AS (name:chararray, val1:int, val2:int);
> by_name = foreach (group my_data by name) generate group as name,
>     my_data.(val1, val2) as my_data;
> store by_name into 'new_location';
>
> grouped_data = LOAD 'new_location' AS (name:chararray,
>     my_bag:bag{T2:tuple(val1:int, val2:int)});
> -- Wallah!
>
> On Mon, Jun 11, 2012 at 1:15 PM, Jonathan Coveney wrote:
>
> > Yong,
> >
> > If your data is not in the form of a bag, then there is no reason to load
> > it in as a bag. You should load it in as chararray, int, int, and then you
> > can transform it into the form you want via the script itself.
> >
> > 2012/6/11 yonghu
> >
> > > Dear Russell,
> > >
> > > My Pig version is 0.9.1. I have tried a little bit, but I ran into a
> > > problem. My data looks like this:
> > >
> > > henrietta 1 25
> > > sally 1 82
> > > fred 2 120
> > > elsie 3 45
> > > tom 1 82
> > > tom 4 98
> > > sally 2 87
> > >
> > > The delimiter is '\t'.
> > >
> > > I use this command to load the data:
> > >
> > > A = LOAD '/home/yonghu/test/student.txt' AS
> > > >> (name:chararray,B:{T1:(id:int,result:int)});
> > >
> > > Then I got the following error:
> > >
> > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 2, column 42>
> > > mismatched input ';' expecting RIGHT_PAREN
> > > Details at logfile: /home/yonghu/pig-0.9.1/bin/pig_1339440832010.log
> > >
> > > What does RIGHT_PAREN mean here? Is there any requirement on the input
> > > data?
> > >
> > > Thanks.
> > >
> > > Yong
> > >
> > > On Mon, Jun 11, 2012 at 8:56 PM, Russell Jurney wrote:
> > >
> > > > High five! o/\o
> > > >
> > > > On Mon, Jun 11, 2012 at 11:51 AM, yonghu wrote:
> > > >
> > > > > Dear Russell,
> > > > >
> > > > > Thanks for your response.
> > > > >
> > > > > Yong
> > > > >
> > > > > On Mon, Jun 11, 2012 at 7:33 PM, Russell Jurney wrote:
> > > > >
> > > > > > Doesn't need a UDF (if it's PigStorage or something else
> > > > > > supported), just a cast.
> > > > > >
> > > > > > foo = LOAD 'location' as B:bag{T2:tuple(t1:float,t2:float)};
> > > > > >
> > > > > > Pulled from the docs:
> > > > > > http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
> > > > > >
> > > > > > A = LOAD 'mydata' AS (T1:tuple(f1:int, f2:int),
> > > > > >     B:bag{T2:tuple(t1:float,t2:float)}, M:map[] );
> > > > > >
> > > > > > A = LOAD 'mydata' AS (T1:(f1:int, f2:int),
> > > > > >     B:{T2:(t1:float,t2:float)}, M:[] );
> > > > > >
> > > > > > Russell Jurney
> > > > > > twitter.com/rjurney
> > > > > > datasyndrome.com
> > > > > >
> > > > > > On Jun 11, 2012, at 9:07 AM, yonghu wrote:
> > > > > >
> > > > > > Dear All,
> > > > > >
> > > > > > How can I define a UDF load function to load a bag field? Such as
> > > > > > A = LOAD 'location' as (field_name : bag {}). Can anyone show me
> > > > > > some example code?
> > > > > >
> > > > > > Regards!
> > > > > >
> > > > > > Yong
> > > >
> > > > --
> > > > Russell Jurney twitter.com/rjurney datasyndrome.com
>
> --
> Russell Jurney twitter.com/rjurney datasyndrome.com
