Try adding a FLATTEN before applying TOBAG:

foo = foreach searches GENERATE FLATTEN(STRSPLIT(hostinglist, ',')) as hostings, user;
bar = foreach foo GENERATE TOBAG(*);
dump bar;

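If the goal is simply a bag of hostings per row, calling TOKENIZE directly may be the shorter route. As far as I know, its default delimiters include the comma (along with space, double quote, parentheses and star). An untested sketch against the same searches relation follows; the aliases baz and qux are only illustrative, and it assumes the hosting names contain none of the other delimiter characters:

```pig
-- TOKENIZE returns a bag holding one single-field tuple per token
baz = foreach searches GENERATE user, TOKENIZE(hostinglist) as hostings;
-- optional: flatten the bag to get one (user, hosting) row per hosting
qux = foreach baz GENERATE user, FLATTEN(hostings) as hosting;
dump qux;
```

Stopping at baz keeps one row per input line with the hostings in a bag; the qux step is only needed if one row per hosting is wanted.
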
Norbert

On Thu, Feb 23, 2012 at 11:32 AM, Flo Leibert <[email protected]> wrote:

> I was expecting similar behavior as TOKENIZE from STRSPLIT. I.e. all items
> ending up in a bag.
> Is there a way to further split these out such that they're elements of a
> bag? The TOBAG function just places the entire tuple in a bag...
>
> Thanks!
>
> On Wed, Feb 22, 2012 at 7:59 PM, Norbert Burger <[email protected]> wrote:
>
> > Hi Flo - in your example data, it seems like the STRSPLIT() is working as
> > expected -- the function returns back a tuple which is being serialized in
> > the shell as "(t1,t2,t3,t4)".
> >
> > When you mention "hostinglist isn't split properly", which part are you
> > referring to?
> >
> > Norbert
> >
> > On Wed, Feb 22, 2012 at 9:13 PM, Flo Leibert <[email protected]> wrote:
> >
> > > Running pig 0.9.1 in local mode, STRSPLIT doesn't seem to split on ','. I
> > > have the following data
> > >
> > > user2 hosting9
> > > user1 hosting1,hosting2,hosting3,hosting4
> > > user1 hosting2,hosting4,hosting5
> > >
> > > searches = load '/data/sample/searches' using PigStorage('\t') as (user: chararray, hostinglist: chararray);
> > > grunt> describe searches
> > > searches: {user: chararray,hostinglist: chararray}
> > > foo = foreach searches GENERATE STRSPLIT(hostinglist, ',') as hostings, user;
> > > dump foo
> > > ((hosting9),user2)
> > > ((hosting1,hosting2,hosting3,hosting4),user1)
> > > ((hosting2,hosting4,hosting5),user1)
> > >
> > > hostinglist isn't split properly - i tried to use the unicode character as
> > > well but still no luck. Is this a known bug?
> > >
> > > Thanks,
> > > Flo
