Re: JOIN or COGROUP? - need to join 1 column from first file which should lie in between 2 columns from second file.

Jacob Perkins Fri, 15 Jul 2011 05:23:38 -0700

Assuming col1 is numeric, as you've indicated, couldn't you simply
generate a new column in file 1 by rounding col1 down to the nearest
multiple of 1000? Then file 1 would look like:


File 1:
col1  col2 join_key
1234  2    1000
2222  3    2000
3333  5    3000
4444  6    4000

Then you could just join on the new key from file 1 and col1 (the lower
bound of each range) from file 2.
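Here is a minimal sketch of this idea in plain Python. It assumes both files fit in memory as lists of tuples, and that every range in file 2 is exactly 1000 wide and starts at a multiple of 1000. The names `join_key` and `bucket_join` are hypothetical. In Pig you would instead add the key column to file 1 and then JOIN it against file 2's col1.

```python
# Hypothetical in-memory stand-ins for the two CSV files in this thread.
# File 2's col3/col4 had no values in the original post, so only the
# range bounds (col1 = lower, col2 = upper) are modeled here.
file1 = [(1234, 2), (2222, 3), (3333, 5), (4444, 6)]
file2 = [(1000, 1999), (2000, 2999), (3000, 3999), (4000, 4999)]

BUCKET = 1000  # assumed fixed range width


def join_key(value, width=BUCKET):
    """Round value DOWN to a multiple of width (e.g. 1234 -> 1000)."""
    return (value // width) * width


def bucket_join(left, right, width=BUCKET):
    """Equi-join file 1's computed key against file 2's lower bound.

    Assumes lower bounds in `right` are unique, which holds for
    fixed-width, non-overlapping ranges.
    """
    by_lower = {lo: (lo, hi) for lo, hi in right}
    result = []
    for col1, col2 in left:
        match = by_lower.get(join_key(col1, width))
        if match is not None:
            result.append((col1, col2) + match)
    return result


for row in bucket_join(file1, file2):
    print(row)
```

Rounding down (flooring) rather than rounding to the nearest 1000 matters here: a value like 1999 would otherwise round to 2000 and be joined to the wrong range.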

This also works if your ranges are smaller; just round down to whatever
width makes sense, e.g. the nearest 10. What it does not work for is
ranges of varying width. Are your ranges variable? :)
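To see why variable ranges break this, here is a small hypothetical check in plain Python using made-up values. It compares the rounding-based join against a brute-force test of lo <= value <= hi for every pair, assuming both bounds are inclusive. With fixed 10-wide ranges the two agree. With ranges of varying width, some values are matched to the wrong range and others are missed entirely.

```python
def bucket_join(values, ranges, width):
    """Match each value to the range whose lower bound equals the value
    rounded down to a multiple of `width` (assumes unique lower bounds)."""
    by_lower = {lo: (lo, hi) for lo, hi in ranges}
    out = []
    for v in values:
        match = by_lower.get((v // width) * width)
        if match is not None:
            out.append((v, match))
    return out


def range_join(values, ranges):
    """Brute-force reference: every pair with lo <= v <= hi (inclusive bounds assumed)."""
    return [(v, (lo, hi)) for v in values for lo, hi in ranges if lo <= v <= hi]


values = [3, 7, 12, 25, 41]  # made-up sample values

# Fixed-width ranges: [0, 9], [10, 19], ..., [40, 49]
fixed = [(lo, lo + 9) for lo in range(0, 50, 10)]
print(bucket_join(values, fixed, 10) == range_join(values, fixed))  # True

# Variable-width ranges: no single width maps every value to its lower bound
variable = [(0, 4), (5, 19), (20, 49)]
print(bucket_join(values, variable, 10) == range_join(values, variable))  # False
# 7 lands in the wrong range (0, 4); 12 and 41 are dropped entirely
print(bucket_join(values, variable, 10))  # [(3, (0, 4)), (7, (0, 4)), (25, (20, 49))]
```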

--jacob
@thedatachef

On Fri, 2011-07-15 at 01:23 -0700, Lakshminarayana Motamarri wrote:
> Hi all
> 
> I have 2 CSV files as shown below:
> 
> File 1:
> col1  col2
> 1234  2
> 2222  3
> 3333  5
> 4444  6
> 
> File 2:
> col1  col2  col3  col4
> 1000  1999
> 2000  2999
> 3000  3999
> 4000  4999
> 
> Now I need to JOIN these 2 files in such a way that:
> 
> File1-col1 should lie between File2-col1 and File2-col2.
> 
> Can I use JOIN / COGROUP or any other existing operators?
> 
> Or should I build a new UDF?
> 
> thanks
> Narayan.
