On Tue, Jan 04, 2011 at 02:10:52PM -0500, Jonathan Coveney wrote:
> I wasn't quite sure what title this, but hopefully it'll make sense. I have
> a couple of questions relating to a query that ultimately seeks to do this
>
> You have
>
> 1 10
> 1 12
> 1 15
> 1 16
> 2 1
> 2 2
> 2 3
> 2 6
>
> You want your output to be the difference between the successive numbers in
> the second column, ie
>
> 1 (10,0)
> 1 (12,2)
> 1 (15,3)
> 1 (15,1)
> 2 (1,0)
> 2 (2,1)
> 2 (3,1)
> 2 (6,3)
>
> Obviously, I need to write a udf to do this, but I have a couple questions..

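To pin down the target, here is a minimal Python sketch (not Pig, and not the
UDF itself) of the computation as I read it. The function name
successive_differences is made up for illustration. I'm assuming the rows
arrive grouped by the first column and already in the desired order, that the
first row of each group gets a difference of 0, and that the fourth output row
quoted above was meant to be (16,1).

```python
from itertools import groupby
from operator import itemgetter


def successive_differences(rows):
    """Pair each value with its difference from the previous value in the same group.

    Assumes `rows` is a list of (key, value) tuples that is already grouped
    by key and ordered within each group.
    """
    out = []
    for key, group in groupby(rows, key=itemgetter(0)):
        previous = None  # no predecessor at the start of each group
        for _, value in group:
            diff = 0 if previous is None else value - previous
            out.append((key, (value, diff)))
            previous = value
    return out


rows = [(1, 10), (1, 12), (1, 15), (1, 16), (2, 1), (2, 2), (2, 3), (2, 6)]
for key, pair in successive_differences(rows):
    print(key, pair)
```

The per-group reset of `previous` is the part a plain row-by-row pass would
miss, and it is roughly what a UDF applied to each grouped bag would have to do.
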
If you were to have some sort of row counter, then I suspect that you could
do something along the lines of

    relCopy = relName;
    newRel = JOIN relName BY counter, relCopy BY counter-1;
    diff = FOREACH newRel GENERATE relName::stuff AS [...], relCopy::thing-relName::thing AS difference;

if you really want to avoid writing an extra UDF. But in the absence of such
a counter, yeah, I think a UDF would be necessary.

Cheers,
Kris

-- 
Kris Coward http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3
