Re: How to deal with Parquet files containing no rows without Drill errors?

2018-05-24 Thread Vitalii Diravka
Hi Dave,

The issue is not in the join itself: Drill can join an empty schemaless table
(for example, an empty JSON file or an empty directory).
DRILL-4517 describes exactly this issue. You can add your test case, with
data, to that JIRA ticket.

Regarding workarounds, I am not aware of any.

Kind regards
Vitalii


On Thu, May 24, 2018 at 5:19 AM Dave Challis wrote:


How to deal with Parquet files containing no rows without Drill errors?

2018-05-24 Thread Dave Challis
We've got some processes that dump reporting data as a set of Parquet
files and then run queries involving joins across those tables (i.e., we
have a main table which is always non-empty, plus a number of link tables
joined against it, any of which can be empty).

The Parquet files contain schema metadata, but some contain no row data.

Trying to join against them in Drill using e.g.

SELECT *
FROM dfs.`a.parquet` AS A
JOIN dfs.`b.parquet` AS B ON (A.id=B.id)
JOIN dfs.`c.parquet` AS C ON (A.id=C.id);

fails with: "SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 0 has
no read entries assigned" if either b.parquet or c.parquet contains no rows.

It looks like this might have been reported as an issue here:
https://issues.apache.org/jira/browse/DRILL-4517. As it has remained
unfixed since 2016, I'm wondering whether there are any suggested
workarounds for the above, rather than waiting for a fix.

In MySQL/Postgres etc., joining against empty tables is fine, so this
behaviour was a bit unexpected and is a major blocker for a project I'm
using Drill for.

Thanks,
Dave