Re: Question about Drill aggregate queries and schema change

Jinfeng Ni Mon, 24 Jul 2017 11:17:44 -0700

If you see such errors only when you enable predicate pushdown, they might be
related to a known issue: a schema change failure caused by an empty batch [1].
This happens when the predicate prunes everything and the Kudu reader does not
return a RowResult with a schema. In that case, Drill interprets the
requested column (such as a) as a nullable INT, which conflicts with the
schema reported by other minor-fragments that do have data.
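The mechanism can be shown with a simplified Java model. This is not Drill's actual code, and the class and method names are hypothetical. It assumes each scan fragment reports either the reader's real schema or, when the reader produced no RowResult, Drill's fallback type for an unknown column, which is nullable INT. Types are plain strings written the same way as in your error message.

```java
import java.util.Map;

/**
 * Hypothetical, simplified model (NOT Drill's implementation) of how a scan
 * fragment picks a type for a requested column.
 */
public class EmptyBatchTypeDemo {

    // Fallback type Drill uses for a column it cannot find in the reader's schema.
    static final String DEFAULT_MISSING_TYPE = "INT:OPTIONAL";

    /** Returns the type a fragment would report for the requested column. */
    static String resolveType(Map<String, String> readerSchema, String column) {
        if (readerSchema == null || !readerSchema.containsKey(column)) {
            // No schema from the reader (e.g. predicate pruned every row).
            return DEFAULT_MISSING_TYPE;
        }
        return readerSchema.get(column);
    }

    public static void main(String[] args) {
        // Fragment A: its Kudu scan returned rows, so the real schema is known.
        Map<String, String> withData = Map.of("campaign_id", "BIGINT:REQUIRED");
        // Fragment B: the pushed-down predicate pruned everything, no schema returned.
        Map<String, String> empty = null;

        String typeA = resolveType(withData, "campaign_id");
        String typeB = resolveType(empty, "campaign_id");

        System.out.println("fragment A: " + typeA);              // fragment A: BIGINT:REQUIRED
        System.out.println("fragment B: " + typeB);              // fragment B: INT:OPTIONAL
        System.out.println("conflict: " + !typeA.equals(typeB)); // conflict: true
    }
}
```

The two fragments scan the same table, yet they disagree on the column type. That is why a single-table, single-schema query can still hit a "schema change".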


The reason you hit this failure randomly is a race condition. If the
minor-fragment with the empty batch is executed after the one with data, the
empty batch is ignored. In the reverse order, the two schemas conflict and
the query fails.
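The order dependence can be sketched with another simplified Java model (again hypothetical, not Drill's implementation; requires Java 16+ for records). It assumes the downstream operator adopts the schema of the first batch it receives, skips later batches that carry zero rows, and fails on a later non-empty batch whose schema differs. Feeding it the same two fragments in both orders reproduces the "sometimes succeeds, sometimes fails" behavior.

```java
import java.util.List;

/** Hypothetical model (NOT Drill's code) of the arrival-order race. */
public class SchemaRaceDemo {

    /** A batch as seen by the downstream operator. */
    record Batch(String source, String schema, int rowCount) {}

    /** Consumes batches in arrival order; returns "OK" or a failure description. */
    static String consume(List<Batch> arrivals) {
        String established = null;
        for (Batch b : arrivals) {
            if (established == null) {
                // The first batch fixes the schema, even if it has no rows.
                established = b.schema();
            } else if (b.rowCount() == 0) {
                // Later empty batches are skipped, so their schema never matters.
                continue;
            } else if (!b.schema().equals(established)) {
                return "FAIL: expected " + established + " but got " + b.schema()
                        + " from " + b.source();
            }
        }
        return "OK";
    }

    public static void main(String[] args) {
        Batch withData = new Batch("fragment-A", "BIGINT:REQUIRED", 1000);
        Batch empty = new Batch("fragment-B", "INT:OPTIONAL", 0);

        // Same two fragments, two possible completion orders.
        System.out.println(consume(List.of(withData, empty))); // OK
        System.out.println(consume(List.of(empty, withData)));
        // FAIL: expected INT:OPTIONAL but got BIGINT:REQUIRED from fragment-A
    }
}
```

Under these assumptions, the robust fix is on the reader side: report the table's schema even when the scan returns no rows, so every fragment agrees regardless of ordering.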

1. https://issues.apache.org/jira/browse/DRILL-5546



On Mon, Jul 24, 2017 at 10:56 AM, Cliff Resnick <[email protected]> wrote:

> I spent some time over the weekend altering Drill's storage-kudu to use
> Kudu's predicate pushdown API. Everything worked great as long as I
> performed flat filtered selects (e.g. "SELECT .. FROM .. WHERE ..") but
> whenever I tested aggregate queries, they would succeed sometimes, then
> fail other times -- using the exact same queries.
>
> The failures always looked like the ones below. After searching around, I came
> across a number of JIRAs, like https://issues.apache.org/jira/browse/DRILL-2602,
> that imply Drill can't handle sort/aggregate queries on "changing
> schemas". This was confusing to me because I was testing with a single
> table/single schema, which leaves me wondering whether "changing schema" means
> the unknown type of the aggregate itself. That is, for SELECT SUM(a), b FROM t
> GROUP BY b; where field a is an INT64, can Drill not figure out how to deal
> with SUM(a) because it may exceed the range of INT64?
>
> If someone could clarify this for me I'd really appreciate it. I'm really
> hoping my understanding above is not correct and it's just a problem with
> the vector handling in storage-kudu, because otherwise Drill's
> aggregation capabilities seem rather limited.
>
> Errors:
>
> java.lang.IllegalStateException: Failure while reading vector.  Expected
> vector class of org.apache.drill.exec.vector.NullableIntVector but was
> holding vector class org.apache.drill.exec.vector.BigIntVector,
> field=campaign_id(BIGINT:REQUIRED)
> at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:321)
> at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:179)
>
> OR
>
> Error: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts
> with changing schemas.
>
