Hi Cheolsoo, thanks for the help. I'll check the fix out once version 0.13 is out.
In the meantime, are you aware of any way to solve my problem without that fix?

Thanks,
Carlo


On Mon, Dec 23, 2013 at 10:23 PM, Cheolsoo Park <[email protected]> wrote:

> Hi Carlo,
>
> >> It looks like if I nest a foreach loop inside another foreach I'm not
> >> able to project any more the first level fields.
>
> PIG-3581 tried to fix this, but it has introduced a regression. In trunk,
> targetDate is actually resolved. But date in your filter expression
> doesn't. I am not entirely sure whether defining a local scalar variable
> inside a nested foreach is supposed to be supported or not.
>
> Please see my comment in the jira-
> https://issues.apache.org/jira/browse/PIG-3581?focusedCommentId=13855935&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13855935
>
> Thanks,
> Cheolsoo
>
>
> On Thu, Dec 19, 2013 at 5:36 AM, Carlo Di Fulco <[email protected]> wrote:
>
> > Hi,
> >
> > I'm writing a script to perform some analytics on a set of events
> > occurring in a set of apps.
> > I'm using Pig 0.11 and Hadoop 1.3.
> >
> > Every event contains:
> >
> > - d: date of the event
> > - aid: app id
> > - uid: user id
> >
> > The aim of my script is to calculate for each application and for each
> > day in my log the number of unique users during the previous x days (in
> > the example code that is 2).
> >
> > After trying various approaches with no result my current scripts looks
> > like:
> >
> > ________________________________________________________________
> >
> > /**
> >  * describe events output:
> >  *
> >  * events: {d: chararray,aid: chararray,uid: chararray}
> >  */
> >
> > eventDates = FOREACH events GENERATE d as targetDate;
> > dates = DISTINCT eventDates;
> > crossed = CROSS (GROUP events BY (aid)), dates;
> >
> > /**
> >  * describe crossed output:
> >  *
> >  * crossed: {1-7::group: chararray,1-7::events: {(d: chararray,aid: chararray,uid: chararray)},dates::targetDate: chararray}
> >  */
> >
> > result = FOREACH crossed {
> >     date = ToDate(targetDate, 'yyyy-MM-dd');
> >     filtered = FILTER events BY DaysBetween(ToDate(d, 'yyyy-MM-dd'), date) < 2
> >                AND SecondsBetween(ToDate(d, 'yyyy-MM-dd'), date) > 0;
> >     uniqueUsers = DISTINCT filtered.uid;
> >     GENERATE group as aid, targetDate as date, COUNT(uniqueUsers) as result;
> > }
> >
> > describe result;
> > dump result;
> > ________________________________________________________________
> >
> > At this point I get the following error:
> >
> > 2013-12-19 05:20:17,283 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1025:
> > <file script.pig, line 46, column 25> Invalid field projection. Projected
> > field [targetDate] does not exist in schema:
> > d:bytearray,aid:chararray,uid:chararray.
> >
> > Line 46 is equivalent to:
> >
> > date = ToDate(targetDate, 'yyyy-MM-dd');
> >
> > But if I hardcode the date instead of reading it from the "crossed" bag:
> >
> > date = ToDate('2013-12-01', 'yyyy-MM-dd');
> >
> > It actually works.
> >
> > It looks like if I nest a foreach loop inside another foreach I'm not
> > able to project any more the first level fields.
> >
> > Any idea about the reason of this? Or perhaps any better way to achieve
> > the same result?
> >
> > Forgive any stupidity I may have written, this is my first approach to
> > Pig scripting!
> > Any suggestion is highly appreciated.
> >
> > Thanks and Regards,
> > Carlo
> >
> > --
> > Carlo Di Fulco

--
Carlo Di Fulco
