Not sure if this will be helpful, but the docs say that the default PigStorage does implement that interface. I guess your data needs to be already sorted on the join key if you do not want to go through the reduce phase during the join.
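Something along these lines might be worth trying. It is an untested sketch: the paths 'left_sorted' and 'right_sorted' and the field name id are made up, both inputs are assumed to be already sorted ascending on id, and depending on your Pig version the join type may need to be written with double quotes ("merge") instead.

```pig
-- Hypothetical inputs, each assumed to be pre-sorted ascending on the join key.
A = LOAD 'left_sorted' USING PigStorage() AS (id:int);
B = LOAD 'right_sorted' USING PigStorage() AS (id:int);

-- Ask Pig for a merge join instead of the default reduce-side join.
C = JOIN A BY id, B BY id USING 'merge';

STORE C INTO 'my_output';
```

If the inputs are not actually sorted, I would not expect this to give correct results, so it is worth checking the output against a plain JOIN on a small sample first.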
T

On Wed, Jul 20, 2011 at 12:13 PM, Ankur Jain <[email protected]> wrote:
> Thanks Ashutosh! Right, I too realized that yesterday. So, is there any
> other loader that implements
> CollectableLoadFunc interface required by the merge join?
>
> Thanks,
> Ankur
>
> On Wed, Jul 20, 2011 at 10:22 AM, Ashutosh Chauhan
> <[email protected]> wrote:
>
>> Hey Ankur,
>>
>> Zebra's TableLoader works with the data written out using Zebra's
>> TableStorer. So, you need to write the data first using Zebra and then
>> subsequently load using TableLoader and do merge-join.
>>
>> Ashutosh
>>
>> On Tue, Jul 19, 2011 at 14:28, Ankur Jain <[email protected]> wrote:
>> > Hi all,
>> >
>> > I'm trying to do a map-side only merge join [1] in pig using Zebra's
>> > TableLoader. (My data allows merge join.) But I'm being unable to use the
>> > TableLoader. Even a simple script that loads a table and just stores it
>> > back doesn't work -
>> >
>> > ----
>> > A = load 'my_input' using org.apache.hadoop.zebra.pig.TableLoader('',
>> > 'sorted');
>> > store A into 'my_output';
>> > ----
>> >
>> > 'my_input' is input directory containing a single file with just 1
>> > column -
>> > ---
>> > 1
>> > 2
>> > 3
>> > ---
>> >
>> > The error I get is -
>> >
>> > "ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
>> > error. Failed to find deleted column groupsjava.io.IOException: BT Schema
>> > file doesn't exist: *file:/......./my_input/.btschema*"
>> >
>> > I have tried specifying the schema using the 'AS' clause and the DESCRIBE
>> > statement as well, but its fetches me the same error. Is the .btschema
>> > file required? Is there any documentation available on its format? (I
>> > tried comma-separated column names with/without type info)
>> >
>> > I am also willing to work with any other loader that satisfies the merge
>> > join constraints. Thanks in anticipation.
>> >
>> > Regards,
>> > Ankur
>> >
>> > [1] *http://pig.apache.org/docs/r0.8.0/piglatin_ref1.html#Merge+Joins*
