This time with the link to the paper: http://www.vldb.org/conf/1991/P443.PDF
:)
Alan.
On Jan 27, 2011, at 8:48 AM, Alan Gates wrote:
The script you propose will work, but if your data is of even
reasonable size it will be very slow, since CROSS pairs every row
of A with every row of B before the filter runs. A quick search of
the web turned up one paper with an algorithm for parallel
non-equijoins that at first glance might work in your case.
Alan.
On Jan 26, 2011, at 5:15 PM, Jonathan Coveney wrote:
Also, it'd be worth thinking about this both for the case where the
mins and maxes are arbitrary and for the case where the ranges don't
overlap. In the latter case, there is at most one thing for a given
value.
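For the non-overlapping case described above, a sorted lookup avoids
comparing every val with every range. What follows is only a minimal
single-machine sketch in Python, not Pig: the function name
match_non_overlapping is hypothetical, and it assumes the rows of B
arrive as (thing, min, max) tuples with inclusive bounds and that no
two ranges overlap.

```python
from bisect import bisect_right


def match_non_overlapping(vals, ranges):
    """Return (val, thing) pairs, assuming the ranges never overlap.

    `ranges` holds (thing, min, max) tuples with inclusive bounds.
    This is an illustrative helper, not part of Pig.
    """
    # Sort by lower bound so a binary search finds the only candidate.
    ordered = sorted(ranges, key=lambda r: r[1])
    mins = [r[1] for r in ordered]

    out = []
    for val in vals:
        # Index of the last range whose min is <= val, or -1 if none.
        i = bisect_right(mins, val) - 1
        if i >= 0 and val <= ordered[i][2]:
            out.append((val, ordered[i][0]))
    return out
```

Each lookup costs O(log |B|) after one sort of B, instead of a scan
over all of B for every val.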
2011/1/26 Jonathan Coveney wrote:
A is (val:int)
B is (thing:chararray, min:int, max:int)
Basically what I want is C = (val, thing) where val is between min
and max for that thing. In SQL the syntax for this would not be
hard; in Pig the naive solution I have is:
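-- Pair every row of A with every row of B.
cro = CROSS A, B;
-- Keep only the pairs where val falls inside [min, max].
fil = FILTER cro BY val >= min AND val <= max;
C = FOREACH fil GENERATE val, thing;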
I am wondering what the most efficient way of doing this sort of
operation is. I imagine some sort of indexing could speed things up,
but I'm not sure. This is important enough that I'd be willing to do
some legwork.
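One possible form of the indexing idea above, for arbitrary and
possibly overlapping ranges, is to bucket both sides so that a val is
only compared with the ranges that touch its bucket. This is a rough
single-machine sketch in Python rather than Pig, and it is not the
algorithm from any particular paper: bucket_join and BUCKET_WIDTH are
hypothetical names, the width of 100 is an arbitrary assumption, and
B is again assumed to be (thing, min, max) tuples with inclusive
bounds.

```python
from collections import defaultdict

BUCKET_WIDTH = 100  # hypothetical tuning knob, not a Pig setting


def bucket_join(vals, ranges, width=BUCKET_WIDTH):
    """Return (val, thing) pairs where min <= val <= max.

    Ranges may overlap. Each range is indexed under every bucket it
    touches, so a val is checked only against ranges in its own bucket.
    """
    index = defaultdict(list)
    for thing, lo, hi in ranges:
        # Floor division keeps bucket ids consistent for negative ints.
        for bucket in range(lo // width, hi // width + 1):
            index[bucket].append((thing, lo, hi))

    out = []
    for val in vals:
        # A val belongs to exactly one bucket, so no pair is duplicated.
        for thing, lo, hi in index.get(val // width, []):
            if lo <= val <= hi:
                out.append((val, thing))
    return out
```

The bucket id plays the role of an equality key, so the same idea
could in principle be expressed as an ordinary join on that key
followed by the original filter. Very wide ranges get copied into
many buckets, so the width would need tuning against the data.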
As always, thanks for your help.