Hi All,
I am pretty new to Pig and am having some trouble with dereferencing. In
simplified form, my data looks like this:
data = load 'visitevent' using PigStorage() AS
    (visit:tuple(visitorid, visitid, browser), events:bag{event:tuple(pagename, pagevar)});
cat visitevent (note: there is a tab between the visit and the events)
(vr1,vi1,ff) {((pagea,eb1)),((pageb,eb3))}
(vr1,vi2,ff) {((pageb,eb2))}
(vr2,vi3,ff) {((pageb,eb4))}
(vr3,vi4,ie) {((pagec,eb3)),((pagea,eb5))}
My tasks are the following:
1) Generate count(visitid) and count(distinct visitorid) by browser
2) Generate count(events), count(visitid) and count(distinct visitorid) by pagename
I ran into problems with the first task. The script below, which flattens visit
first, works:
data = load 'c:/shared/visitevent' using PigStorage() AS
    (visit:tuple(visitorid, visitid, browser), events:bag{event:tuple(pagename, pagevar)});
data2 = foreach data generate FLATTEN(visit);
data3 = group data2 by browser;
dc = foreach data3 {
    d1 = data2.visitorid;
    d2 = distinct d1;
    generate group, COUNT(d2), COUNT(d1);
};
describe dc;
dump dc;
I don't understand why I need to flatten visit. I tried the version below
without flattening, and nothing I try works. I'm not sure why.
data = load 'c:/shared/visitevent' using PigStorage() AS
    (visit:tuple(visitorid, visitid, browser), events:bag{event:tuple(pagename, pagevar)});
data2 = foreach data generate visit;
data3 = group data2 by browser;
-- describe data3 produces:
-- data3: {group: bytearray,data2: {visit: (visitorid: bytearray,visitid: bytearray,browser: bytearray)}}
-- Neither of the following works; in both cases Pig cannot find the alias. Why?
dc = foreach data3 {
    d1 = data2.visitorid;
    d2 = distinct d1;
    generate group, COUNT(d2), COUNT(d1);
};
dc = foreach data3 {
    d1 = visit.visitorid;
    d2 = distinct d1;
    generate group, COUNT(d2), COUNT(d1);
};
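The describe output suggests an explanation. data3 has only two fields, group and the bag data2. Each tuple in data2 holds a single field, visit. That means visitorid sits one level deeper than data2.visitorid reaches, and visit is not a field of data3 itself, which may be why neither alias resolves. The sketch below avoids FLATTEN by projecting the tuple's fields by name with the dot operator in the outer foreach. The aliases visits and by_browser and the AS names are hypothetical.

```pig
-- Sketch: project the visit tuple's fields by name instead of using FLATTEN.
-- Aliases visits, by_browser and the AS column names are illustrative only.
data = load 'c:/shared/visitevent' using PigStorage() AS
    (visit:tuple(visitorid, visitid, browser), events:bag{event:tuple(pagename, pagevar)});

-- visit.browser dereferences one field of the visit tuple, so every output
-- row gets top-level visitorid, visitid and browser columns.
visits = foreach data generate visit.visitorid AS visitorid,
                               visit.visitid AS visitid,
                               visit.browser AS browser;

by_browser = group visits by browser;

dc = foreach by_browser {
    vi = visits.visitid;     -- one visitid per visit in this browser group
    vr = visits.visitorid;   -- visitorid of every visit in the group
    dvr = distinct vr;       -- each visitor counted once
    generate group AS browser, COUNT(vi) AS visit_count, COUNT(dvr) AS visitor_count;
};
describe dc;
dump dc;
```

Projecting the fields this way does the same job as FLATTEN(visit). The difference is that the column names are explicit, so the names used in GROUP and in the nested block match the schema that describe reports.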
What am I doing wrong? Task #2 groups by pagename, which sits inside a tuple
inside a bag. Do I have to flatten twice to get that working? Is there any
documentation on dereferencing complex, nested structures? Any help is
appreciated.
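For task 2, a single FLATTEN on the events bag may be enough. This assumes each bag element really is a (pagename, pagevar) tuple, as the load schema declares. If the elements are nested one level deeper, as the double parentheses in the sample file might suggest, a second FLATTEN could be needed, and describe/dump on the intermediate relation should show which case applies. The sketch also assumes that "count(visitid)" per page means distinct visits. The aliases page_events, by_page and pc are hypothetical.

```pig
-- Sketch for task 2: one FLATTEN on the events bag, assuming each element is
-- a (pagename, pagevar) tuple as the load schema declares.
data = load 'c:/shared/visitevent' using PigStorage() AS
    (visit:tuple(visitorid, visitid, browser), events:bag{event:tuple(pagename, pagevar)});

-- FLATTEN(events) emits one row per event; the visit fields projected next to
-- it are repeated on every row produced from the same visit.
page_events = foreach data generate visit.visitorid AS visitorid,
                                    visit.visitid AS visitid,
                                    FLATTEN(events) AS (pagename, pagevar);
describe page_events;

by_page = group page_events by pagename;

pc = foreach by_page {
    vi = page_events.visitid;
    dvi = distinct vi;       -- visits that touched this page, counted once each
    vr = page_events.visitorid;
    dvr = distinct vr;       -- visitors that touched this page, counted once each
    generate group AS pagename,
             COUNT(page_events) AS event_count,
             COUNT(dvi) AS visit_count,
             COUNT(dvr) AS visitor_count;
};
describe pc;
dump pc;
```

COUNT skips tuples whose first field is null. That is fine here because visitorid is always present, but it is worth keeping in mind if the data has gaps.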
Thanks
Priyo