Regarding the heap out of memory error, it could be that the JDBC driver is
prefetching the entire record set into memory. I just had a look at
JdbcRecordReader; it looks like by calling Connection#setAutoCommit(false) and
setting a sufficient fetch size we could force the driver to stream data as
required. This is how the Postgres driver works (see the sketch below):
https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
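
A minimal sketch of that cursor-based approach (the connection URL,
credentials, table name, and fetch size below are placeholders for
illustration, not the actual JdbcRecordReader code):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PgCursorFetch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details, for illustration only.
            String url = "jdbc:postgresql://localhost:5432/testdb";
            try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
                // The Postgres driver only uses a server-side cursor when
                // autocommit is off and a non-zero fetch size is set.
                conn.setAutoCommit(false);
                try (Statement stmt = conn.createStatement()) {
                    stmt.setFetchSize(1000); // rows buffered per round trip
                    try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                        while (rs.next()) {
                            // process one row at a time; the heap holds at most
                            // ~1000 rows instead of the whole result set
                        }
                    }
                }
            }
        }
    }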

We will have to see the behaviour of other drivers too.
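
For example, if I remember right, MySQL's Connector/J ignores an ordinary
fetch size and only streams when the statement is forward-only, read-only,
and the fetch size is Integer.MIN_VALUE. A rough sketch, again with
placeholder connection details:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class MySqlStreamingFetch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details, for illustration only.
            String url = "jdbc:mysql://localhost:3306/testdb";
            try (Connection conn = DriverManager.getConnection(url, "user", "pass");
                 Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                       ResultSet.CONCUR_READ_ONLY)) {
                // Connector/J treats this "magic" value as a request to
                // stream rows one at a time instead of buffering them all.
                stmt.setFetchSize(Integer.MIN_VALUE);
                try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                    while (rs.next()) {
                        // rows are read off the wire as they are consumed
                    }
                }
            }
        }
    }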

Let me know your thoughts here.

Regards,
Rahul


On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <[email protected]> wrote:

> Hi Vitalii,
>
> There are two concrete implementations of the class:
> DrillPushLimitToScanRule.LIMIT_ON_SCAN and
> DrillPushLimitToScanRule.LIMIT_ON_PROJECT.
> LIMIT_ON_SCAN has a comment mentioning "For now only applies to Parquet.
> And pushdown only apply limit but not offset".
>
> Also, I enabled debug mode and found that the LIMIT is not getting pushed
> into the JDBC query:
>     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0 rows,
> 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
>       UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0 rows,
> 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
>         LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {61.0
> rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
>           JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration" WHERE
> "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0 rows, 50.0 cpu
>
> Regarding the second point, it's the Java heap getting filled with JDBC
> results. How do we address this?
>
> Regards,
> Rahul
>
> On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <[email protected]>
> wrote:
>
>> Hi Rahul,
>>
>> Drill has the *DrillPushLimitToScanRule* [1] rule, which should do this
>> optimization, provided the GroupScan supports limit pushdown.
>> You can also verify in debug mode whether this rule fires.
>> Possibly, for some external DBs (like MapR-DB), Drill needs a separate
>> class for this optimization [2].
>>
>> [1]
>>
>> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
>> [2]
>>
>> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
>>
>> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > Drill does not push LIMIT queries down to external databases, and I
>> > assume this is more related to Calcite. This leads to out-of-memory
>> > situations when querying a large table just to view a few records. Is
>> > there something that could be improved here? One solution would be to
>> > push the filters down to the DB, and/or combine that with some JDBC
>> > batch size limit to flush partial results as Parquet.
>> >
>> > Regards,
>> > Rahul
>> >
>>
>
