Re: Question about Drill aggregate queries and schema change

Cliff Resnick Mon, 24 Jul 2017 17:37:57 -0700

That makes sense, so I guess the solution is to return a null row instead?
If so, is there a way to flag it to be ignored downstream (to avoid any
unintended consequences)?


Thanks for the help!

On Mon, Jul 24, 2017 at 7:06 PM, Jinfeng Ni <[email protected]> wrote:

> Based on my limited understanding of Drill's KuduRecordReader, the problem
> seems to be in the next() method [1]. When the RowResult iterator returns
> false for hasNext(), as it does when the filter prunes everything, the code
> skips the call to addRowResult(). That means no columns or data are added
> to the scan's batch, and a nullable int is injected in a downstream
> operator.
>
> 1. https://github.com/apache/drill/blob/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu/KuduRecordReader.java#L149-L163
>
>
> On Mon, Jul 24, 2017 at 1:35 PM, Cliff Resnick <[email protected]> wrote:
>
> > Jinfeng,
> >
> > I'm wondering if there's a way to push schema info to Drill even if there
> > is no result. KuduScanner always has schema, and RecordReader always has
> > scanner. But I can't seem to find the disconnect. Any idea if this is
> > possible, even if it's a Kudu-specific hack?
> >
> > -Cliff
> >
> > On Mon, Jul 24, 2017 at 2:46 PM, Cliff Resnick <[email protected]> wrote:
> >
> >> Jinfeng,
> >>
> >> Thanks, that confirms my thoughts as well. If I query using full range
> >> bounds and all hash keys, then Kudu prunes to the exact tablets and there
> >> is no error. I'll watch that jira expectantly because Kudu + Drill would
> >> be an awesome combo. But without the pruning it's useless to us.
> >>
> >> -Cliff
> >>
> >> On Mon, Jul 24, 2017 at 2:17 PM, Jinfeng Ni <[email protected]> wrote:
> >>
> >>> If you see such errors only when you enable predicate pushdown, it might
> >>> be related to a known issue: schema change failure caused by an empty
> >>> batch [1]. This happens when the predicate prunes everything and the Kudu
> >>> reader does not return a RowResult with a schema. In that case, Drill
> >>> interprets the requested column (such as a) as nullable int, which
> >>> conflicts with other minor-fragments that may have the data/schema.
> >>>
> >>> The reason you hit the failure randomly: there is a race condition for
> >>> such a conflict to happen. If the minor-fragment with the empty batch is
> >>> executed after the one with data, the empty batch is ignored. In the
> >>> reverse order, it causes a conflict, hence the query failure.
> >>>
> >>> 1. https://issues.apache.org/jira/browse/DRILL-5546
> >>>
> >>>
> >>>
> >>> On Mon, Jul 24, 2017 at 10:56 AM, Cliff Resnick <[email protected]> wrote:
> >>>
> >>> > I spent some time over the weekend altering Drill's storage-kudu to use
> >>> > Kudu's predicate pushdown API. Everything worked great as long as I
> >>> > performed flat filtered selects (e.g. "SELECT .. FROM .. WHERE ..") but
> >>> > whenever I tested aggregate queries, they would succeed sometimes, then
> >>> > fail other times -- using the exact same queries.
> >>> >
> >>> > The failures were always like those below. After searching around, I came
> >>> > across a number of jiras, like
> >>> > https://issues.apache.org/jira/browse/DRILL-2602, that imply Drill can't
> >>> > handle sorts/aggregate queries on "changing schemas". This was confusing
> >>> > to me because I was testing with a single table/single schema, which
> >>> > leaves me wondering if "changing schema" means the unknown type of the
> >>> > aggregate itself? Meaning, for SELECT SUM(a), b FROM t GROUP BY b; where
> >>> > field a is an INT64, Drill can't figure out how to deal with SUM(a)
> >>> > because it may exceed the range of INT64?
> >>> >
> >>> > If someone could clarify this for me I'd really appreciate it. I'm really
> >>> > hoping my above understanding is not correct and it's just a problem with
> >>> > the Vector handling in storage-kudu, because otherwise it seems that
> >>> > Drill's aggregation capabilities are rather limited.
> >>> >
> >>> > Errors:
> >>> >
> >>> > java.lang.IllegalStateException: Failure while reading vector. Expected
> >>> > vector class of org.apache.drill.exec.vector.NullableIntVector but was
> >>> > holding vector class org.apache.drill.exec.vector.BigIntVector, field=
> >>> > campaign_id(BIGINT:REQUIRED)
> >>> > at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:321)
> >>> > at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:179)
> >>> >
> >>> > OR
> >>> >
> >>> > Error: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts
> >>> > with changing schemas.
> >>> >
> >>>
> >>
> >>
> >
>
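
The order-dependent failure described in the thread can be illustrated with a small toy model. This is only a sketch under stated assumptions: every name in it (SchemaConflictSketch, Batch, scanDroppingSchemaWhenEmpty, scanKeepingSchema, consume) is hypothetical, and none of it is Drill or Kudu code. It assumes a consumer that takes its column type from the first batch it sees, falls back to a nullable int for a column the batch does not declare, and ignores later empty batches. The scanKeepingSchema variant shows what could change if an empty scan still declared its columns, which is the idea raised in the thread rather than a confirmed fix.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these are NOT Drill or Kudu classes.
class SchemaConflictSketch {

    // Stand-in for the nullable int that is injected for an unknown column.
    static final String DEFAULT_TYPE = "INT:OPTIONAL";

    // A batch carries a column -> type map and a row count.
    static final class Batch {
        final Map<String, String> schema;
        final int rows;

        Batch(Map<String, String> schema, int rows) {
            this.schema = schema;
            this.rows = rows;
        }
    }

    // Reader that declares columns only if it read at least one row.
    static Batch scanDroppingSchemaWhenEmpty(Map<String, String> table, int rows) {
        Map<String, String> schema =
            rows > 0 ? table : Collections.<String, String>emptyMap();
        return new Batch(schema, rows);
    }

    // Hypothetical alternative: always declare the columns, even with zero rows.
    static Batch scanKeepingSchema(Map<String, String> table, int rows) {
        return new Batch(table, rows);
    }

    // Downstream consumer: the first batch fixes the column type, later empty
    // batches are ignored, and a later non-empty batch with another type fails.
    static String consume(List<Batch> arrivalOrder, String column) {
        String expected = null;
        for (Batch batch : arrivalOrder) {
            String actual = batch.schema.getOrDefault(column, DEFAULT_TYPE);
            if (expected == null) {
                expected = actual;
            } else if (batch.rows > 0 && !expected.equals(actual)) {
                throw new IllegalStateException(
                    "expected " + expected + " but batch holds " + actual);
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("campaign_id", "BIGINT:REQUIRED");

        Batch data = scanDroppingSchemaWhenEmpty(table, 3);
        Batch empty = scanDroppingSchemaWhenEmpty(table, 0);

        // Data first: the empty batch arrives later and is ignored.
        System.out.println(consume(Arrays.asList(data, empty), "campaign_id"));

        // Empty first: the default type wins, then the data batch conflicts.
        try {
            consume(Arrays.asList(empty, data), "campaign_id");
        } catch (IllegalStateException e) {
            System.out.println("conflict: " + e.getMessage());
        }

        // With the schema kept on the empty batch, this order no longer conflicts.
        Batch emptyWithSchema = scanKeepingSchema(table, 0);
        System.out.println(
            consume(Arrays.asList(emptyWithSchema, data), "campaign_id"));
    }
}
```

In this model the same two batches succeed or fail depending only on arrival order, which matches the intermittent behaviour reported for the aggregate queries. Whether Drill's real operators behave exactly this way is not something the sketch establishes.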
