Dexin,

After re-reading your original post, I better understand what you were asking, and I see that I didn't really answer your question.

Python UDFs do make writing UDFs much simpler, so you might be more likely to actually use them. If you know Java, Python shouldn't be difficult to pick up.

I haven't done it myself, but you should be able to pass the tuple to the UDF. I see in the source for the ScriptingEngine that a Pig tuple is converted into a Python tuple, so you should be able to access any element of the Pig tuple in the Python UDF. I did notice this comment in the Python UDF manual, though, at http://pig.apache.org/docs/r0.8.1/udf.html#Python+UDFs

    # tuple in python are immutable, appending to a tuple is not possible.

The immutability matters: you won't be able to enrich the tuple in place, but you can copy the values into a new tuple and return that.

All that being said, here is one possible approach to the original problem. For your input data it produces

    (a,0,(quarter,month,week,,today))
    (b,1,(quarter,month,week,yesterday,))
    (c,9,(quarter,month,,,))
    (d,40,(quarter,,,,))

The Pig side (the function is called by the name it has in the Python file):

    -- assumes the Python file is udfs.py, as in my earlier mail
    register 'udfs.py' using jython as udfs;
    C = FOREACH B GENERATE names, days_ago, udfs.timePeriods(names, days_ago);

and the Python UDF (in another file):

    @outputSchema("timeperiods:tuple(quarter:chararray,month:chararray,week:chararray,yesterday:chararray,today:chararray)")
    def timePeriods(names, days_ago):
        # One slot per period; None shows up as an empty field in Pig.
        periods = []
        if days_ago <= 90:
            periods.append('quarter')
        else:
            periods.append(None)
        if days_ago <= 30:
            periods.append('month')
        else:
            periods.append(None)
        if days_ago <= 7:
            periods.append('week')
        else:
            periods.append(None)
        if days_ago == 1:
            periods.append('yesterday')
        else:
            periods.append(None)
        if days_ago == 0:
            periods.append('today')
        else:
            periods.append(None)
        # Tuples are immutable, so build a list and convert it at the end.
        return tuple(periods)

It's not exactly what you wanted, but maybe it will suggest a proper solution.

scott.

On Fri, Jul 22, 2011 at 6:10 PM, Dexin Wang <[email protected]> wrote:
> Thanks. I'm not familiar with python, but I write bunch of UDFs in java.
>
> One question though, how do I pass the the entire tuple to the UDF, I mean I
> can't do something like this:
>
>    B = FOREACH A GENERATE myudf(A)
>
> Essentially what I want is given a tuple, I want to enrich the tuple to add
> one more field to it, and the value of the new field depends on the value in
> some existing fields in the tuple.
>
>    (a,1) -> (a,1,yesterday)
>
> how would I do that?
>
> I imagine I can do
>    B = GROUP A BY random;
>    C = FOREACH B GENERATE myudf(A);
>
> But I really don't like adding another GROUP BY here.
>
> On Fri, Jul 22, 2011 at 5:23 PM, Scott Foster <[email protected]> wrote:
>
>> Hi Dexin,
>> This is the sort of thing I've started using Python UDFs for. See:
>> http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples of
>> how to write the python code.
>>
>> If your udf was implemented in Python you could then do this...
>>
>> register 'udfs.py' using jython as udf;
>> ...
>> B = FOREACH A generate name, udf.daysAgoString(days_ago);
>>
>> scott.
>>
>> On Fri, Jul 22, 2011 at 4:42 PM, Dexin Wang <[email protected]> wrote:
>> > Possible to do conditional and more than one generate inside a foreach?
>> >
>> > for example, I have tuples like this (names, days_ago)
>> >
>> > (a,0)
>> > (b,1)
>> > (c,9)
>> > (d,40)
>> >
>> > b shows up 1 day ago, so it belongs to all of the following: yesterday,
>> > last week, last month, and last quarter.
>> > So I'd like to turn the above to:
>> >
>> > (a,0,today)
>> > (b,1,yesterday)
>> > (b,1,week)
>> > (b,1,month)
>> > (b,1,quarter)
>> > (c,9,month)
>> > (c,9,quarter)
>> > (d,40,quarter)
>> >
>> > I imagine/dream I could do something like this
>> >
>> > B = FOREACH A
>> > {
>> >     if (days_ago <= 90) generate name,days_ago,'quarter';
>> >     if (days_ago <= 30) generate name,days_ago,'month';
>> >     if (days_ago <= 7) generate name,days_ago,'week';
>> >     if (days_ago == 1) generate name,days_ago,'yesterday';
>> >     if (days_ago == 0) generate name,days_ago,'today';
>> > }
>> >
>> > of course that's not valid syntax. I could write my own UDF but would be
>> > nice there's some way to get what I want without UDF.
>> >
>> > Thanks!
>> > Dexin
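
A possible variant for getting one output row per period, as the original question asks: have the UDF return a bag instead of a fixed-width tuple and let Pig's FLATTEN expand it. This is only a sketch and has not been run against the thread's data. The function name periodBag, the schema string, and the assumption that days_ago reaches the UDF as an int are all mine, not from the thread. The try/except is there only so the file also loads under plain Python for testing; under Pig's Jython engine the outputSchema decorator is already available.

```python
# Hypothetical bag-returning variant of the period UDF (names are illustrative).
try:
    outputSchema  # provided by Pig when the script is registered with jython
except NameError:
    # Fallback for running outside Pig: a decorator that changes nothing.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("periods:bag{t:tuple(period:chararray)}")
def periodBag(days_ago):
    # Collect only the labels that apply; nothing is padded with None.
    labels = []
    if days_ago <= 90:
        labels.append('quarter')
    if days_ago <= 30:
        labels.append('month')
    if days_ago <= 7:
        labels.append('week')
    if days_ago == 1:
        labels.append('yesterday')
    if days_ago == 0:
        labels.append('today')
    # A Python list of 1-tuples is handed back to Pig as a bag of tuples.
    return [(label,) for label in labels]

# Assumed Pig usage, with the file registered as udfs:
#   C = FOREACH A GENERATE names, days_ago, FLATTEN(udfs.periodBag(days_ago));
```

With the thresholds used here, a row with days_ago = 0 expands to four rows (quarter, month, week, today). A row older than 90 days returns an empty bag, and FLATTEN of an empty bag drops that row from the output, so handle that case separately if those rows need to survive.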
