Awesome! I'll be watching that issue for the PR.

On Tue, Jul 25, 2017 at 2:50 PM, Jinfeng Ni <[email protected]> wrote:

I'm currently working on a patch using the idea described in DRILL-5546.
The idea is similar to your idea of a null row: instead of returning an
empty batch, or a scan batch with injected nullable-int columns, we will
return NONE to the downstream operators directly, which will avoid the
unintended consequence.

I will probably wrap up that work in a few days, and submit a PR for review.

On Mon, Jul 24, 2017 at 5:37 PM, Cliff Resnick <[email protected]> wrote:

That makes sense, so I guess the solution is to return a null row instead?
If so, is there a way to flag it to be ignored downstream (to avoid any
unintended consequences)?

Thanks for the help!

On Mon, Jul 24, 2017 at 7:06 PM, Jinfeng Ni <[email protected]> wrote:

Based on my limited understanding of Drill's KuduRecordReader, the problem
seems to be in the next() method [1]. When RowResult's iterator returns
false for hasNext(), in the case where the filter prunes everything, the
code will skip the call to addRowResult(). That means no columns/data will
be added to the scan's batch. Nullable int will be injected in the
downstream operator.

1. https://github.com/apache/drill/blob/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu/KuduRecordReader.java#L149-L163

On Mon, Jul 24, 2017 at 1:35 PM, Cliff Resnick <[email protected]> wrote:

Jinfeng,

I'm wondering if there's a way to push schema info to Drill even if there
is no result. KuduScanner always has a schema, and RecordReader always has
a scanner. But I can't seem to find the disconnect. Any idea if this is
possible, even if it's a Kudu-specific hack?

-Cliff

On Mon, Jul 24, 2017 at 2:46 PM, Cliff Resnick <[email protected]> wrote:

Jinfeng,

Thanks, that confirms my thoughts as well.
If I query using full range bounds and all hash keys, then Kudu prunes to
the exact tablets and there is no error. I'll watch that jira expectantly,
because Kudu + Drill would be an awesome combo. But without the pruning
it's useless to us.

-Cliff

On Mon, Jul 24, 2017 at 2:17 PM, Jinfeng Ni <[email protected]> wrote:

If you see such errors only when you enable predicate pushdown, it might be
related to a known issue: schema change failure caused by an empty batch
[1]. This happens when the predicate prunes everything and the Kudu reader
does not return a RowResult with a schema. In such a case, Drill would
interpret the requested column (such as a) as nullable int, which would
conflict with other minor fragments that may have the data/schema.

The reason why you hit such failures randomly: there is a race condition
for this conflict to happen. If the minor fragment with the empty batch is
executed after the one with data, the empty batch will be ignored. In the
reverse order, it causes a conflict, hence query failure.

1. https://issues.apache.org/jira/browse/DRILL-5546

On Mon, Jul 24, 2017 at 10:56 AM, Cliff Resnick <[email protected]> wrote:

I spent some time over the weekend altering Drill's storage-kudu to use
Kudu's predicate pushdown API. Everything worked great as long as I
performed flat filtered selects (e.g. SELECT .. FROM .. WHERE ..), but
whenever I tested aggregate queries, they would succeed sometimes, then
fail other times -- using the exact same queries.

The failures were always like below.
After searching around, I came across a number of jiras, like
https://issues.apache.org/jira/browse/DRILL-2602, that imply Drill can't
handle sorts/aggregate queries on "changing schemas". This was confusing to
me because I was testing with a single table/single schema, which leaves me
wondering if "changing schema" means the unknown type of the aggregate
itself? Meaning, for SELECT SUM(a),b FROM t GROUP BY a, where field a is an
INT64, Drill can't figure out how to deal with SUM(a) because it may exceed
the scale of INT64?

If someone could clarify this for me I'd really appreciate it. I'm really
hoping my above understanding is not correct and it's just a problem with
the Vector handling in storage-kudu, because otherwise it seems that
Drill's aggregation capabilities are rather limited.

Errors:

java.lang.IllegalStateException: Failure while reading vector. Expected
vector class of org.apache.drill.exec.vector.NullableIntVector but was
holding vector class org.apache.drill.exec.vector.BigIntVector,
field=campaign_id(BIGINT:REQUIRED)
  at org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:321)
  at org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:179)

OR

Error: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts
with changing schemas.
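The race condition described in the thread can be illustrated with a small standalone sketch. Everything here is hypothetical (the class, the Batch record, and the merge rule are invented for illustration, not Drill's actual code), assuming a downstream operator that fixes its schema from the first batch it receives and tolerates a mismatched schema only on an empty batch:

```java
// Hypothetical, minimal model of the empty-batch schema race (not Drill code).
public class SchemaRace {
    // Each "batch" carries the column type its scan fragment inferred.
    record Batch(String columnType, int rowCount) {}

    // Downstream operator: schema is fixed by the first arriving batch.
    // A later batch with a different schema is ignored only if it is empty.
    static String receive(Batch first, Batch second) {
        String schema = first.columnType();
        if (!second.columnType().equals(schema)) {
            if (second.rowCount() == 0) {
                return "OK: empty batch ignored, schema=" + schema;
            }
            return "FAIL: schema changed from " + schema + " to " + second.columnType();
        }
        return "OK: schema=" + schema;
    }

    public static void main(String[] args) {
        Batch data  = new Batch("BIGINT:REQUIRED", 100); // fragment that read rows
        Batch empty = new Batch("INT:OPTIONAL", 0);      // pruned fragment with injected nullable int
        System.out.println(receive(data, empty)); // data arrives first: empty batch ignored
        System.out.println(receive(empty, data)); // empty arrives first: schema conflict
    }
}
```

With the data-bearing fragment arriving first, the empty batch is harmless; in the reverse order, the nullable-int guess becomes the fixed schema and the real BIGINT batch looks like a schema change, which matches the intermittent failures reported above.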
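The DRILL-5546 approach Jinfeng describes (returning NONE rather than an empty batch with injected nullable-int columns) can likewise be sketched abstractly. The enum and method below are hypothetical simplifications loosely modeled on an iterator-outcome protocol, not Drill's actual RecordBatch API:

```java
// Hypothetical sketch of the DRILL-5546 idea: a scan that matched no rows
// reports NONE instead of emitting an empty batch with invented columns.
public class PrunedScan {
    enum IterOutcome { OK_NEW_SCHEMA, OK, NONE }

    // Stand-in for a scan whose pushed-down predicate may prune every row.
    static IterOutcome next(int rowsRead, boolean firstBatch) {
        if (rowsRead == 0) {
            // Nothing matched: terminate immediately. Downstream never sees a
            // guessed nullable-int schema, so fragments with real data cannot conflict.
            return IterOutcome.NONE;
        }
        return firstBatch ? IterOutcome.OK_NEW_SCHEMA : IterOutcome.OK;
    }

    public static void main(String[] args) {
        System.out.println(next(0, true));  // fully pruned scan -> NONE
        System.out.println(next(42, true)); // first batch with data -> OK_NEW_SCHEMA
    }
}
```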
