OK... I still can't get this to work.

I've read the documentation and I still get the same error on 0.9.0.

Here's my code. I think it's implying that I need to have the predecessor as
a LOAD and meet the following conditions:


> Inner merge join (between two tables) will only work under these conditions:
>
>    - Between the load of the sorted input and the merge join statement
>    there can only be filter statements and foreach statement where the foreach
>    statement should meet the following conditions:
>
>       - There should be no UDFs in the foreach statement.
>       - The foreach statement should not change the position of the join
>       keys.
>       - There should be no transformation on the join keys which will change
>       the sort order.
>
>    - Data must be sorted on join keys in ascending (ASC) order on both
>    sides.
>    - Right-side loader must implement either the {OrderedLoadFunc}
>    interface or {IndexableLoadFunc} interface.
>    - Type information must be provided for the join key in the schema.
>
> The Zebra and PigStorage loaders satisfy all of these conditions.


... which I believe I AM doing, but it's still not working.

Here's the data:


1,1
1,2
1,3
1,4
1,1000000000
0,1
0,2
0,3
0,4
0,1000000000


… and the script.

data = LOAD 'test2.csv' USING PigStorage(',') AS (source:int, target:int);

by_source = ORDER data BY source;
by_target = FOREACH (ORDER data BY target) GENERATE target, source;

STORE by_source INTO 'tmp/by_source' USING PigStorage();
STORE by_target INTO 'tmp/by_target' USING PigStorage();

by_source = LOAD 'tmp/by_source' USING PigStorage() AS (source:int, target:int);
by_target = LOAD 'tmp/by_target' USING PigStorage() AS (source:int, target:int);

joined = JOIN by_source BY source, by_target BY target USING 'merge';

STORE joined INTO 'tmp/joined';
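
For comparison, here is a minimal sketch of how the documented conditions might map onto the join half of this script, with nothing but the two LOADs in front of the merge join. It assumes tmp/by_source and tmp/by_target were already written by the ORDER/STORE half in a separate run. The aliases left_sorted and right_sorted and the output path tmp/joined_sketch are placeholder names, not anything from the documentation. One deliberate difference: the second LOAD declares its columns in the order they were written by GENERATE target, source. I have not confirmed whether that has anything to do with the error.

```pig
-- Sketch only: assumes tmp/by_source and tmp/by_target already exist,
-- written by the ORDER/STORE statements in an earlier run.

-- Stored as (source, target), sorted ascending on source.
left_sorted = LOAD 'tmp/by_source' USING PigStorage()
              AS (source:int, target:int);

-- Stored by "GENERATE target, source", so the first column on disk is target.
-- The schema follows that on-disk order; the data is sorted ascending on target.
right_sorted = LOAD 'tmp/by_target' USING PigStorage()
               AS (target:int, source:int);

-- Only the LOADs precede the merge join: no FOREACH, no UDFs,
-- and both join keys carry an explicit int type.
joined = JOIN left_sorted BY source, right_sorted BY target USING 'merge';

STORE joined INTO 'tmp/joined_sketch';
```

Running DESCRIBE on each loaded relation before the JOIN is a cheap way to check that the column named in the BY clause really is the one the file was sorted on.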


-- 

Founder/CEO Spinn3r.com

Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*
