I found an edge case: the ORDER statement appears to assume that every
partition it operates on is non-empty. A simplified example follows.

in = LOAD 'a.gz' AS (label:int);
sel = DISTINCT in PARALLEL <X>;
ord = ORDER sel BY label;
STORE ord INTO 'ord';

Contents of a.gz (*two* numbers, one per line):
"""
1
2
"""

If X == 1, the script works.
If X == 20 (i.e. larger than the number of results from the previous
step), the script fails:
ERROR 2999: Unexpected internal error. java.lang.RuntimeException:
Empty samples file
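
To illustrate why a large X leaves most partitions empty, here is a toy
Python model of hash partitioning. It is only a sketch under stated
assumptions: partition() and count_empty() are hypothetical helpers, the
records are assumed to be spread across X reducers by hash, and Pig's
actual partitioner and ORDER sampler are not modeled.

```python
# Toy model of hash partitioning; this is NOT Pig's actual partitioner.
def partition(records, num_partitions):
    """Assign each record to a bucket by hash, as a shuffle to N reducers might."""
    buckets = [[] for _ in range(num_partitions)]
    for rec in records:
        buckets[hash(rec) % num_partitions].append(rec)
    return buckets


def count_empty(records, num_partitions):
    """Count the buckets that received no records."""
    return sum(1 for bucket in partition(records, num_partitions) if not bucket)


records = [1, 2]  # the two distinct labels from a.gz
for x in (1, 20):
    print(f"PARALLEL {x}: {count_empty(records, x)} empty of {x} partitions")
```

With only two distinct records, at least 18 of the 20 partitions must be
empty however the hash distributes them, which matches the case where
the script fails.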

The script above is simplified to distill the problem into something
easy to test. Instead of DISTINCT, my script actually has a complex
JOIN statement that runs on multiple instances and produces a handful
of statistics (stored in the 'sel' variable).

Is this something other people have run into?

Skepty
