Hi Jacob,

thanks for the response and suggestion. But as u mentioned the range is NOT
small and NON linear...

              file 2:
              col1               col2            50331648    67108863
67108864 67109119  67109120 67109631  67109632 67110399  67110400
67110655
....

so as mentioned, the range is not small and there could be more than one
match from file 1, which like in between these...

suggestions? ideas? please...

thanks
Narayan.

On Fri, Jul 15, 2011 at 5:23 AM, Jacob Perkins <[email protected]>wrote:

> Assuming col1 is numeric, as you've indicated, couldn't you simply
> generate a new column in file 1 by rounding to the nearest 1000? Then
> file 1 would look like:
>
> *File 1:
> col1  col2 join_key
> 1234  2    1000
> 2222  3    2000
> 3333  5    3000
> 4444  6    4000
>
> Then you could just join by the new key from file 1 and col2 from file
> 2.
>
> This works even if your ranges are smaller, just round to whatever makes
> sense. Eg, nearest 10. What this does not work for is if your ranges are
> variable. Are your ranges variable? :)
>
> --jacob
> @thedatachef
>
> On Fri, 2011-07-15 at 01:23 -0700, Lakshminarayana Motamarri wrote:
> > Hi all
> >
> > I have 2 CSV files a shown below:
> >
> > *File 1:                     File2:
> > col1  col2             col1    col2   col3   col4
> > 1234    2                1000   1999
> > 2222    3                2000   2999
> > 3333    5                3000   3999
> > 4444    6                4000   4999*
> >
> > Now I need to JOIN these 2 files in such a way that:
> >
> > File1-col1 should lie in between File2-col1 and File2-col2
> >
> > Can I use JOIN / COGROUP or any other existing operators?
> >
> > or shud I build a new UDF?
> >
> > thanks
> > Narayan.
>
>
>


-- 
Thanking you,
---
Thanks & Regards,
Mr. Lakshminarayana. Motamarri, {Call-out Name: Narayan}
Graduate Student (M.S in Computer Science),
Arizona State University,
School of Computing and Informatics,
Mobile: (480) 280 0179.
http://www.public.asu.edu/~lmotamar/

Reply via email to