Re: Merge join

Ankur Jain Wed, 20 Jul 2011 12:14:01 -0700

Thanks Ashutosh! Right, I realized that too yesterday. So, is there any
other loader that implements the CollectableLoadFunc interface required by
the merge join?



Thanks,
Ankur


On Wed, Jul 20, 2011 at 10:22 AM, Ashutosh Chauhan <[email protected]> wrote:

> Hey Ankur,
>
> Zebra's TableLoader works with the data written out using Zebra's
> TableStorer. So, you need to write the data first using Zebra and then
> subsequently load using TableLoader and do merge-join.
>
> Ashutosh
> On Tue, Jul 19, 2011 at 14:28, Ankur Jain <[email protected]> wrote:
> > Hi all,
> >
> > I'm trying to do a map-side only merge join [1] in Pig using Zebra's
> > TableLoader. (My data allows merge join.) But I'm unable to use the
> > TableLoader. Even a simple script that loads a table and just stores it
> > back doesn't work -
> >
> >  ----
> >  A = load 'my_input' using org.apache.hadoop.zebra.pig.TableLoader('',
> > 'sorted');
> >  store A into 'my_output';
> >  ----
> >
> >
> >  'my_input' is an input directory containing a single file with just 1
> >  column -
> >  ---
> >  1
> >  2
> >  3
> >  ---
> >
> >  The error I get is -
> >
> >  "ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
> > error. Failed to find deleted column groupsjava.io.IOException: BT Schema
> > file doesn't exist: *file:/......./my_input/.btschema*"
> >
> >
> >  I have tried specifying the schema using the 'AS' clause and the
> > DESCRIBE statement as well, but it gives me the same error. Is the
> > .btschema file required? Is there any documentation available on its
> > format? (I tried comma-separated column names with/without type info)
> >
> >
> > I am also willing to work with any other loader that satisfies the merge
> > join constraints. Thanks in anticipation.
> >
> >
> >  Regards,
> >  Ankur
> >
> >
> >  [1] *http://pig.apache.org/docs/r0.8.0/piglatin_ref1.html#Merge+Joins*
> >
>
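
For reference, the write-then-load sequence Ashutosh describes might be sketched in Pig Latin roughly as follows. This is only an illustration under stated assumptions, not a tested recipe: the jar path, the relation names, the column name `id`, the second table `other_zebra_table`, and the `'[id]'` storage hint are all hypothetical and not taken from the thread.

```pig
-- Script 1: convert the plain-text input into a sorted Zebra table.
-- Hypothetical jar location; adjust to your installation.
register /path/to/zebra.jar;

-- Read the raw single-column file with Pig's default loader.
raw = load 'my_input' as (id:int);

-- Merge join expects sorted input, so sort before writing.
sorted_raw = order raw by id;

-- Write through Zebra's TableStorer so that TableLoader can read the
-- result later. '[id]' is an assumed storage hint (one column group).
store sorted_raw into 'my_zebra_table'
    using org.apache.hadoop.zebra.pig.TableStorer('[id]');

-- Script 2 (run after script 1 has finished): load both sorted Zebra
-- tables and merge-join them. 'other_zebra_table' is assumed to have
-- been written and sorted the same way.
A = load 'my_zebra_table'
    using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
B = load 'other_zebra_table'
    using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');

-- The Pig 0.8 docs write "merge" in double quotes; later releases
-- use single quotes ('merge').
J = join A by id, B by id using "merge";
store J into 'my_output';
```

The point of the sketch is that TableLoader is never pointed at the raw text directory; it only reads the directory that TableStorer wrote.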
