That makes sense, so I guess the solution is to return a null row instead? If so, is there a way to flag it to be ignored downstream (to avoid any unintended consequences)?
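For concreteness, here is a toy sketch of the behavior described in the quoted thread and of the schema-first alternative asked about there. It is plain Java with no Drill or Kudu classes: every name in it (Batch, readSkippingSchema, readDeclaringSchema, resolveType) is hypothetical, and the "INT:OPTIONAL" default is only a stand-in for the nullable int that Drill reportedly injects downstream. It is not how KuduRecordReader is actually written.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these types are Drill or Kudu classes. */
public class EmptyBatchSketch {

    /** A batch here is just column name -> type string, plus a row count. */
    static final class Batch {
        final Map<String, String> columns = new LinkedHashMap<>();
        int rowCount = 0;
    }

    /** Mimics the reported behavior: columns are only registered while rows are read. */
    static Batch readSkippingSchema(Map<String, String> scannerSchema, Iterator<Object[]> rows) {
        Batch batch = new Batch();
        while (rows.hasNext()) {
            rows.next();
            batch.columns.putAll(scannerSchema); // never reached when the filter prunes everything
            batch.rowCount++;
        }
        return batch;
    }

    /** Assumed alternative: register columns from the scanner's schema before reading any row. */
    static Batch readDeclaringSchema(Map<String, String> scannerSchema, Iterator<Object[]> rows) {
        Batch batch = new Batch();
        batch.columns.putAll(scannerSchema); // schema is known even if no row comes back
        while (rows.hasNext()) {
            rows.next();
            batch.rowCount++;
        }
        return batch;
    }

    /** Downstream stand-in: a requested column missing from the batch defaults to nullable INT. */
    static String resolveType(Batch batch, String column) {
        return batch.columns.getOrDefault(column, "INT:OPTIONAL");
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("campaign_id", "BIGINT:REQUIRED");
        List<Object[]> noRows = new ArrayList<>(); // the predicate pruned every row

        System.out.println(resolveType(readSkippingSchema(schema, noRows.iterator()), "campaign_id"));
        System.out.println(resolveType(readDeclaringSchema(schema, noRows.iterator()), "campaign_id"));
    }
}
```

With an empty result, the first reader leaves campaign_id to the downstream default while the second still reports the scanner's type, so both fragments would agree on the schema. Whether Drill's scan batch accepts a zero-row batch that carries vectors is an open question in this sketch.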
Thanks for the help!

On Mon, Jul 24, 2017 at 7:06 PM, Jinfeng Ni <[email protected]> wrote:

> Based on my limited understanding of Drill's KuduRecordReader, the problem
> seems to be in the next() method [1]. When RowResult's iterator returns
> false for hasNext(), in the case where the filter prunes everything, the
> code will skip the call to addRowResult(). That means no columns/data will
> be added to the scan's batch. Nullable int will be injected in the
> downstream operator.
>
> 1. https://github.com/apache/drill/blob/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu/KuduRecordReader.java#L149-L163
>
> On Mon, Jul 24, 2017 at 1:35 PM, Cliff Resnick <[email protected]> wrote:
>
> > Jinfeng,
> >
> > I'm wondering if there's a way to push schema info to Drill even if there
> > is no result. KuduScanner always has schema, and RecordReader always has
> > scanner. But I can't seem to find the disconnect. Any idea if this is
> > possible, even if it's a Kudu-specific hack?
> >
> > -Cliff
> >
> > On Mon, Jul 24, 2017 at 2:46 PM, Cliff Resnick <[email protected]> wrote:
> >
> >> Jinfeng,
> >>
> >> Thanks, that confirms my thoughts as well. If I query using full range
> >> bounds and all hash keys, then Kudu prunes to the exact tablets and there
> >> is no error. I'll watch that jira expectantly because Kudu + Drill would
> >> be an awesome combo. But without the pruning it's useless to us.
> >>
> >> -Cliff
> >>
> >> On Mon, Jul 24, 2017 at 2:17 PM, Jinfeng Ni <[email protected]> wrote:
> >>
> >>> If you see such errors only when you enable predicate pushdown, it might
> >>> be related to a known issue: schema change failure caused by an empty
> >>> batch [1]. This happens when the predicate prunes everything and the
> >>> Kudu reader does not return a RowResult with a schema. In such a case,
> >>> Drill would interpret the requested column (such as a) as nullable int,
> >>> which would conflict with other minor-fragments that may have the
> >>> data/schema.
> >>>
> >>> The reason you hit such failures randomly: there is a race condition
> >>> for such a conflict to happen. If the minor-fragment with the empty
> >>> batch is executed after the one with data, the empty batch would be
> >>> ignored. In the reverse order, it would cause a conflict, hence the
> >>> query failure.
> >>>
> >>> 1. https://issues.apache.org/jira/browse/DRILL-5546
> >>>
> >>> On Mon, Jul 24, 2017 at 10:56 AM, Cliff Resnick <[email protected]> wrote:
> >>>
> >>> > I spent some time over the weekend altering Drill's storage-kudu to use
> >>> > Kudu's predicate pushdown api. Everything worked great as long as I
> >>> > performed flat filtered selects (e.g. SELECT .. FROM .. WHERE ..) but
> >>> > whenever I tested aggregate queries, they would succeed sometimes, then
> >>> > fail other times -- using the exact same queries.
> >>> >
> >>> > The failures were always like below. After searching around, I came
> >>> > across a number of jiras, like
> >>> > https://issues.apache.org/jira/browse/DRILL-2602
> >>> > that imply Drill can't handle sorts/aggregate queries on "changing
> >>> > schemas". This was confusing to me because I was testing with a single
> >>> > table/single schema, which leaves me wondering if "changing schema"
> >>> > means the unknown type of the aggregate itself? Meaning, SELECT
> >>> > SUM(a),b FROM t GROUP BY a; where field a is an INT64, Drill can't
> >>> > figure out how to deal with SUM(a) because it may exceed the scale of
> >>> > INT64?
> >>> >
> >>> > If someone could clarify this for me I'd really appreciate it. I'm
> >>> > really hoping my above understanding is not correct and it's just a
> >>> > problem with the Vector handling in storage-kudu, because otherwise it
> >>> > seems that Drill's aggregation capabilities are rather limited.
> >>> >
> >>> > Errors:
> >>> >
> >>> > java.lang.IllegalStateException: Failure while reading vector. Expected
> >>> > vector class of org.apache.drill.exec.vector.NullableIntVector but was
> >>> > holding vector class org.apache.drill.exec.vector.BigIntVector, field=
> >>> > campaign_id(BIGINT:REQUIRED)
> >>> > at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:321)
> >>> > at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:179)
> >>> >
> >>> > OR
> >>> >
> >>> > Error: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support
> >>> > sorts with changing schemas.
