OK... I still can't get this to work.
I've read the documentation and I still get the same error on 0.9.0 ...
Here's my code. I think it's implying that I need to have the predecessor as
a LOAD and meet the following conditions:
> Inner merge join (between two tables) will only work under these conditions:
>
> - Between the load of the sorted input and the merge join statement
>   there can only be filter statements and foreach statement where the
>   foreach statement should meet the following conditions:
>     - There should be no UDFs in the foreach statement.
>     - The foreach statement should not change the position of the join
>       keys.
>     - There should be no transformation on the join keys which will
>       change the sort order.
> - Data must be sorted on join keys in ascending (ASC) order on both
>   sides.
> - Right-side loader must implement either the {OrderedLoadFunc}
>   interface or {IndexableLoadFunc} interface.
> - Type information must be provided for the join key in the schema.
>
> The Zebra and PigStorage loaders satisfy all of these conditions.
... which I believe I AM ... but it's still not working.
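For what it's worth, this is how I read those conditions, as a minimal sketch. The paths and alias names here are made up for illustration (they are not from my real job), and both inputs are assumed to be already sorted ascending on their first column:

```pig
-- Hypothetical inputs, assumed pre-sorted ascending on 'key'.
-- PigStorage is used on both sides and the key type is declared in the schema.
left_in  = LOAD 'sorted/left'  USING PigStorage() AS (key:int, val:int);
right_in = LOAD 'sorted/right' USING PigStorage() AS (key:int, val:int);

-- A plain filter between the load and the join should be allowed.
left_ok = FILTER left_in BY val > 0;

-- No UDFs, no ORDER, and no change to the key's position before the join.
joined = JOIN left_ok BY key, right_in BY key USING 'merge';
STORE joined INTO 'out/joined';
```

If I've misread the conditions and even this shape is not expected to work, that would already tell me something.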
Here's the data:
1,1
1,2
1,3
1,4
1,1000000000
0,1
0,2
0,3
0,4
0,1000000000
... and the script:
data = LOAD 'test2.csv' USING PigStorage(',') AS (source:int, target:int);
by_source = ORDER data BY source;
by_target = FOREACH (ORDER data BY target) GENERATE target, source;
STORE by_source INTO 'tmp/by_source' USING PigStorage();
STORE by_target INTO 'tmp/by_target' USING PigStorage();
by_source = LOAD 'tmp/by_source' USING PigStorage() AS (source:int, target:int);
by_target = LOAD 'tmp/by_target' USING PigStorage() AS (source:int, target:int);
joined = JOIN by_source BY source, by_target BY target USING 'merge';
STORE joined INTO 'tmp/joined' ;
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
Skype-in: *(415) 871-0687*