Not sure if this would be helpful, but the docs say that the default
PigStorage does implement that interface. I guess your data needs to
be already sorted if you want to avoid the reduce phase during the
join.
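
Something along these lines might be a starting point. It is only a
rough sketch: the paths, delimiter, and field names are made up for
illustration, and it assumes both inputs are already sorted in
ascending order on the join key. Older releases' documentation writes
the join type in double quotes, so the quoting may need adjusting for
your version.

```pig
-- Hypothetical paths and field names, for illustration only.
-- Both inputs are assumed to be pre-sorted (ascending) on id.
A = LOAD 'left_sorted' USING PigStorage('\t') AS (id:int, val:chararray);
B = LOAD 'right_sorted' USING PigStorage('\t') AS (id:int, other:chararray);

-- Ask Pig for a merge join instead of the default reduce-side join.
C = JOIN A BY id, B BY id USING 'merge';

STORE C INTO 'joined_output';
```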

T

On Wed, Jul 20, 2011 at 12:13 PM, Ankur Jain <[email protected]> wrote:
> Thanks Ashutosh! Right, I too realized that yesterday. So, is there any
> other loader that implements the CollectableLoadFunc interface required
> by the merge join?
>
>
> Thanks,
> Ankur
>
>
> On Wed, Jul 20, 2011 at 10:22 AM, Ashutosh Chauhan
> <[email protected]> wrote:
>
>> Hey Ankur,
>>
>> Zebra's TableLoader works with the data written out using Zebra's
>> TableStorer. So, you need to write the data first using Zebra and then
>> subsequently load using TableLoader and do merge-join.
>>
>> Ashutosh
>> On Tue, Jul 19, 2011 at 14:28, Ankur Jain <[email protected]> wrote:
>> > Hi all,
>> >
>> > I'm trying to do a map-side only merge join [1] in Pig using Zebra's
>> > TableLoader. (My data allows merge join.) But I'm unable to use
>> > TableLoader. Even a simple script that loads a table and just stores it
>> > back doesn't work -
>> >
>> >  ----
>> >  A = load 'my_input' using org.apache.hadoop.zebra.pig.TableLoader('', 'sorted');
>> >  store A into 'my_output';
>> >  ----
>> >
>> >
>> >  'my_input' is an input directory containing a single file with just 1
>> > column -
>> >  ---
>> >  1
>> >  2
>> >  3
>> >  ---
>> >
>> >  The error I get is -
>> >
>> >  "ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
>> > error. Failed to find deleted column groupsjava.io.IOException: BT Schema
>> > file doesn't exist: *file:/......./my_input/.btschema*"
>> >
>> >
>> >  I have tried specifying the schema using the 'AS' clause and the
>> > DESCRIBE statement as well, but it gives me the same error. Is the
>> > .btschema file required? Is there any documentation available on its
>> > format? (I tried comma-separated column names with/without type info)
>> >
>> >
>> > I am also willing to work with any other loader that satisfies the merge
>> > join constraints. Thanks in anticipation.
>> >
>> >
>> >  Regards,
>> >  Ankur
>> >
>> >
>> >  [1] *http://pig.apache.org/docs/r0.8.0/piglatin_ref1.html#Merge+Joins*
>> >
>>
>
