Thanks, Ashutosh! Right, I realized that yesterday too. So, is there any other loader that implements the CollectableLoadFunc interface required by merge join?
Thanks,
Ankur

On Wed, Jul 20, 2011 at 10:22 AM, Ashutosh Chauhan <[email protected]> wrote:
> Hey Ankur,
>
> Zebra's TableLoader works with the data written out using Zebra's
> TableStorer. So, you need to write the data first using Zebra and then
> subsequently load using TableLoader and do merge-join.
>
> Ashutosh
>
> On Tue, Jul 19, 2011 at 14:28, Ankur Jain <[email protected]> wrote:
> > Hi all,
> >
> > I'm trying to do a map-side only merge join [1] in Pig using Zebra's
> > TableLoader. (My data allows merge join.) But I'm unable to use the
> > TableLoader. Even a simple script that loads a table and just stores it
> > back doesn't work:
> >
> > ----
> > A = load 'my_input' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
> > store A into 'my_output';
> > ----
> >
> > 'my_input' is an input directory containing a single file with just 1 column:
> > ---
> > 1
> > 2
> > 3
> > ---
> >
> > The error I get is:
> >
> > "ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
> > error. Failed to find deleted column groupsjava.io.IOException: BT Schema
> > file doesn't exist: *file:/......./my_input/.btschema*"
> >
> > I have tried specifying the schema using the 'AS' clause and the DESCRIBE
> > statement as well, but I get the same error. Is the .btschema file
> > required? Is there any documentation available on its format? (I tried
> > comma-separated column names with and without type info.)
> >
> > I am also willing to work with any other loader that satisfies the merge
> > join constraints. Thanks in anticipation.
> >
> > Regards,
> > Ankur
> >
> > [1] *http://pig.apache.org/docs/r0.8.0/piglatin_ref1.html#Merge+Joins*
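A minimal, untested sketch of the workflow Ashutosh describes (write the data with Zebra's TableStorer first, then read it back with TableLoader and merge-join) might look roughly like this in Pig Latin. The table paths 'my_zebra_table' and 'other_zebra_table', the column name c1, and the '[c1]' storage hint passed to TableStorer are illustrative assumptions, not taken from the thread; check them against the Zebra documentation for your Pig version.

```pig
-- Hypothetical sketch: paths, column name c1, and the '[c1]' storage hint are assumptions.

-- Run 1: convert the plain-text input into a Zebra table.
raw = load 'my_input' using PigStorage() as (c1:int);
-- Merge join needs inputs sorted ascending on the join key, so sort before storing.
sorted_raw = order raw by c1;
-- TableStorer writes a Zebra table, which should include the metadata
-- (such as .btschema) that TableLoader failed to find in the plain directory.
store sorted_raw into 'my_zebra_table' using org.apache.hadoop.zebra.pig.TableStorer('[c1]');

-- Run 2 (separate script, after both tables exist; 'other_zebra_table'
-- is assumed to have been prepared the same way):
A = load 'my_zebra_table' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
B = load 'other_zebra_table' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
-- Join on the first column of each table using Pig's merge join.
C = join A by $0, B by $0 using 'merge';
store C into 'my_output';
```

Storing and loading are kept in separate runs so that each Zebra table, including its metadata, is fully written before TableLoader opens it.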
