Hi all,

The requirements for the Merge Join say that only FILTER and FOREACH are 
allowed between the LOAD and the Merge Join.
Why is it not possible to sort the loaded input on the join key with an 
ORDER statement before applying the Merge Join?
After the ORDER, the input would be sorted on the join key, so a 
Merge Join should be possible.

A script could look like this, for example:

indata1 = LOAD 'inputFile1' AS (a, b, c);
indata2 = LOAD 'inputFile2' AS (a, b, c);
sorted_indata1 = ORDER indata1 BY a ASC;
sorted_indata2 = ORDER indata2 BY a ASC;
result = JOIN sorted_indata1 BY a, sorted_indata2 BY a USING "merge";


Second question: Is the output of a Merge Join also sorted on the join key? 
If so, that would make the Merge Join much more useful, because multiple 
Merge Joins could be chained like this:

result1 = JOIN sorted_indata1 BY a, sorted_indata2 BY a USING "merge";
result2 = JOIN result1 BY a, sorted_indata3 BY a USING "merge";
...
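
To make clear what I mean by the second question, here is a minimal Python sketch of a generic sort-merge join. It is not Pig's implementation, and it says nothing about what Pig actually guarantees; merge_join and the sample rows are made up for illustration. The sketch assumes both inputs are already sorted on the first field.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of tuples that are already sorted on the key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key is too small, advance left
        elif lk > rk:
            j += 1          # right key is too small, advance right
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against the whole right run.
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r)
                i += 1
            j = j_end
    return out


# Hypothetical sample data, sorted on the first field (the join key).
indata1 = [(1, "x"), (2, "y"), (2, "z"), (4, "w")]
indata2 = [(2, "p"), (3, "q"), (4, "r")]
indata3 = [(2, "m"), (4, "n")]

result1 = merge_join(indata1, indata2)
result2 = merge_join(result1, indata3)  # result1 is fed in without re-sorting
```

In this sketch the rows of result1 come out in key order, which is why the second join works without another sort. Whether Pig preserves that order between two Merge Joins is exactly what I am asking.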

Thx in advance,
Alex

