Assuming col1 is numeric, as you've indicated, couldn't you simply generate a new column in file 1 by rounding col1 down to the nearest 1000? Then file 1 would look like:

File 1:
col1  col2  join_key
1234  2     1000
2222  3     2000
3333  5     3000
4444  6     4000

Then you could just join on the new key from file 1 and col1 from file 2. This works even if your ranges are smaller; just round to whatever granularity makes sense, e.g. the nearest 10. What this does not work for is variable ranges. Are your ranges variable? :)

--jacob
@thedatachef

On Fri, 2011-07-15 at 01:23 -0700, Lakshminarayana Motamarri wrote:
> Hi all
>
> I have 2 CSV files as shown below:
>
> File 1:        File 2:
> col1  col2     col1  col2  col3  col4
> 1234  2        1000  1999
> 2222  3        2000  2999
> 3333  5        3000  3999
> 4444  6        4000  4999
>
> Now I need to JOIN these 2 files in such a way that:
>
> File1-col1 should lie in between File2-col1 and File2-col2
>
> Can I use JOIN / COGROUP or any other existing operators?
>
> or should I build a new UDF?
>
> thanks
> Narayan.
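
The rounding idea in the reply can be sketched as follows. This is plain Python rather than Pig Latin, meant only to illustrate the logic on the sample rows from the thread. The names join_key and bucket_join are hypothetical, and the fixed bucket width of 1000 is an assumption read off File 2's ranges.

```python
# Sample rows from the thread; BUCKET = 1000 is an assumption based on File 2's ranges.
BUCKET = 1000

file1 = [(1234, 2), (2222, 3), (3333, 5), (4444, 6)]
file2 = [(1000, 1999), (2000, 2999), (3000, 3999), (4000, 4999)]


def join_key(value, width=BUCKET):
    """Round value down to the start of its fixed-width bucket."""
    return (value // width) * width


def bucket_join(left, right, width=BUCKET):
    """Equi-join left rows to right rows on the derived bucket key."""
    # Index the range rows by their lower bound (File 2's col1).
    ranges_by_start = {lo: (lo, hi) for lo, hi in right}
    joined = []
    for col1, col2 in left:
        key = join_key(col1, width)
        match = ranges_by_start.get(key)
        # Re-check the bounds in case a range is narrower than the bucket.
        if match is not None and match[0] <= col1 <= match[1]:
            joined.append((col1, col2, key) + match)
    return joined


for row in bucket_join(file1, file2):
    print(row)
```

Note that the key is computed by rounding down, not to the nearest multiple: a value such as 1999 must map to 1000 so that it lands in the 1000 to 1999 range. As the reply says, this only works when every range has the same width.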
