We have some processes that dump reporting data as a set of Parquet files
and then run queries that join those tables (i.e. we have a main table,
which is always non-empty, plus a number of link tables joined against it,
which can be empty).

The Parquet files contain schema metadata, but some contain no row data.

Trying to join against them in Drill with a query such as:

SELECT *
FROM dfs.`a.parquet` AS A
JOIN dfs.`b.parquet` AS B ON (A.id=B.id)
JOIN dfs.`c.parquet` AS C ON (A.id=C.id);

fails with "SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 0 has
no read entries assigned" if either b.parquet or c.parquet contains no rows.

It looks like this may have been reported as
https://issues.apache.org/jira/browse/DRILL-4517, but as it still hasn't been
fixed since 2016, I'm wondering whether there are any suggested workarounds
for the above, rather than waiting for a fix.

In MySQL, Postgres, etc., joining against empty tables works fine (an inner
join against an empty table simply returns no rows), so this behaviour was
unexpected, and it is a major blocker for a project I'm using Drill for.
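For comparison, here is a minimal sketch of the behaviour I'd expect, using Python's built-in sqlite3 as a stand-in for MySQL/Postgres. The tables a, b, c and their columns and rows are hypothetical placeholders mirroring the query above, not our real schema.

```python
import sqlite3


def run_join(rows_b, rows_c):
    """Run the same three-way inner join against in-memory SQLite tables.

    Table a plays the always-non-empty main table; b and c are the
    link tables, which may be empty (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE a (id INTEGER, name TEXT)")
        conn.execute("CREATE TABLE b (id INTEGER, val TEXT)")
        conn.execute("CREATE TABLE c (id INTEGER, val TEXT)")
        conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y")])
        conn.executemany("INSERT INTO b VALUES (?, ?)", rows_b)
        conn.executemany("INSERT INTO c VALUES (?, ?)", rows_c)
        return conn.execute(
            "SELECT * FROM a AS A "
            "JOIN b AS B ON (A.id = B.id) "
            "JOIN c AS C ON (A.id = C.id)"
        ).fetchall()
    finally:
        conn.close()


# An empty link table just produces an empty result; no error is raised.
print(run_join([(1, "b1")], []))            # → []
print(run_join([(1, "b1")], [(1, "c1")]))   # → [(1, 'x', 1, 'b1', 1, 'c1')]
```

This empty-result behaviour is what I was expecting from Drill when b.parquet or c.parquet has a schema but no rows.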

Thanks,
Dave
