Hi Carlo,

>> It looks like if I nest a foreach loop inside another foreach I'm not able
>> to project any more the first level fields.

PIG-3581 tried to fix this, but it introduced a regression. In trunk, targetDate is now resolved, but date in your filter expression is not. I am not entirely sure whether defining a local scalar variable inside a nested foreach is supposed to be supported or not. Please see my comment in the JIRA:
https://issues.apache.org/jira/browse/PIG-3581?focusedCommentId=13855935&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13855935

Thanks,
Cheolsoo

On Thu, Dec 19, 2013 at 5:36 AM, Carlo Di Fulco <[email protected]> wrote:

> Hi,
>
> I'm writing a script to perform some analytics on a set of events occurring
> in a set of apps.
> I'm using Pig 0.11 and Hadoop 1.3.
>
> Every event contains:
>
> - d: date of the event
> - aid: app id
> - uid: user id
>
> The aim of my script is to calculate, for each application and for each day
> in my log, the number of unique users during the previous x days (in the
> example code, x is 2).
>
> After trying various approaches with no result, my current script looks
> like this:
>
> ________________________________________________________________
>
> /**
>  * describe events output:
>  *
>  * events: {d: chararray,aid: chararray,uid: chararray}
>  */
>
> eventDates = FOREACH events GENERATE d as targetDate;
> dates = DISTINCT eventDates;
> crossed = CROSS (GROUP events BY (aid)), dates;
>
> /**
>  * describe crossed output:
>  *
>  * crossed: {1-7::group: chararray,1-7::events: {(d: chararray,aid: chararray,uid: chararray)},dates::targetDate: chararray}
>  */
>
> result = FOREACH crossed {
>     date = ToDate(targetDate, 'yyyy-MM-dd');
>     filtered = FILTER events BY DaysBetween(ToDate(d, 'yyyy-MM-dd'), date) < 2
>                             AND SecondsBetween(ToDate(d, 'yyyy-MM-dd'), date) > 0;
>     uniqueUsers = DISTINCT filtered.uid;
>     GENERATE group as aid, targetDate as date, COUNT(uniqueUsers) as result;
> }
>
> describe result;
> dump result;
> ________________________________________________________________
>
> At this point I get the following error:
>
> 2013-12-19 05:20:17,283 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1025:
> <file script.pig, line 46, column 25> Invalid field projection. Projected
> field [targetDate] does not exist in schema:
> d:bytearray,aid:chararray,uid:chararray.
>
> Line 46 is equivalent to:
>
>     date = ToDate(targetDate, 'yyyy-MM-dd');
>
> But if I hardcode the date instead of reading it from the "crossed" bag:
>
>     date = ToDate('2013-12-01', 'yyyy-MM-dd');
>
> it actually works.
>
> It looks like if I nest a foreach loop inside another foreach I'm not able
> to project any more the first level fields.
>
> Any idea about the reason for this? Or perhaps a better way to achieve the
> same result?
>
> Forgive any stupidity I may have written; this is my first approach to Pig
> scripting! Any suggestion is highly appreciated.
>
> Thanks and Regards,
> Carlo
>
> --
> Carlo Di Fulco
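
For sanity-checking the Pig output on a small sample, the computation described above (for each app and each distinct event date, count the distinct users whose events fall in a trailing window) can be sketched outside Pig. This is a minimal Python sketch, not a fix for the nested-foreach issue. The function name `unique_users_per_window` and the list-of-tuples input are hypothetical, and the window definition is an assumption: an event dated e counts for target date t when 0 <= (t - e) < x days, i.e. the target day plus the x-1 days before it. It does not reproduce the exact DaysBetween/SecondsBetween condition in the script, so adjust the comparison to match the intended semantics.

```python
from collections import defaultdict
from datetime import datetime


def unique_users_per_window(events, x=2):
    """Count distinct users per (aid, target date) over a trailing window.

    events: iterable of (d, aid, uid) tuples, with d formatted as 'yyyy-MM-dd'.
    Assumed window (hypothetical): event date e counts for target t
    when 0 <= (t - e).days < x.
    """
    parsed = [(datetime.strptime(d, "%Y-%m-%d").date(), aid, uid)
              for d, aid, uid in events]

    # Candidate target dates, playing the role of the DISTINCT `dates` relation.
    target_dates = sorted({d for d, _, _ in parsed})

    # Group events by app id, like GROUP events BY (aid).
    by_app = defaultdict(list)
    for d, aid, uid in parsed:
        by_app[aid].append((d, uid))

    result = {}
    # Pair every app with every target date, like the CROSS in the script,
    # then keep only in-window events and count distinct uids.
    for aid, app_events in by_app.items():
        for target in target_dates:
            users = {uid for d, uid in app_events if 0 <= (target - d).days < x}
            result[(aid, target.isoformat())] = len(users)
    return result


if __name__ == "__main__":
    sample = [
        ("2013-12-01", "a1", "u1"),
        ("2013-12-01", "a1", "u2"),
        ("2013-12-02", "a1", "u1"),
        ("2013-12-03", "a1", "u3"),
        ("2013-12-02", "a2", "u9"),
    ]
    counts = unique_users_per_window(sample, x=2)
    print(counts[("a1", "2013-12-02")])  # → 2
```

Like the CROSS in the script, every (app, date) pair appears in the output, so apps with no activity in a window get a count of 0 rather than being dropped.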
