I ran into an edge case: the ORDER statement assumes that the partitions it operates on are non-empty. A simplified example follows.
    in = LOAD 'a.gz' AS (label:int);
    sel = DISTINCT in PARALLEL <X>;
    ord = ORDER sel BY label;
    STORE ord INTO 'ord';

Contents of a.gz (*two* numbers, one per line):

    1
    2

If X == 1, the script works.

If X == 20 (i.e., larger than the number of records produced by the previous step), the script fails with:

    ERROR 2999: Unexpected internal error. java.lang.RuntimeException: Empty samples file

The script above is simplified to distill the problem into something easy to test. Instead of DISTINCT, my real script has a complex JOIN that runs on multiple instances and produces only a handful of statistics (stored in the 'sel' alias).

Has anyone else run into this?

Skepty
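For anyone who wants to try the example locally, the two-line input file can be generated with a short Python 3 script like the one below. This is just a convenience sketch: the file name a.gz matches the LOAD statement in the script, and any gzip tool would produce the same file.

```python
import gzip

# Write the test input expected by the Pig script: one integer per line.
with gzip.open("a.gz", "wt") as f:
    f.write("1\n2\n")

# Read the file back to confirm it holds exactly two records.
with gzip.open("a.gz", "rt") as f:
    print(f.read().split())  # ['1', '2']
```

With only two records, running DISTINCT with PARALLEL 20 leaves most of its 20 output partitions empty, which is the situation that triggers the error.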
