[ 
https://issues.apache.org/jira/browse/DRILL-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720711#comment-14720711
 ] 

Daniel Barclay (Drill) commented on DRILL-3722:
-----------------------------------------------

Flink handles cases like this (LIMIT queries over many files) by starting to 
process the results from the first files read before reads have started on all 
files. That means that once some fragment determines that it has enough rows 
to satisfy the LIMIT, many planned file reads can be abandoned before they 
even start.

Would that approach work for Drill?
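
To make the idea concrete, here is a minimal single-threaded sketch in Python. It is not Flink's or Drill's actual code; `limited_scan`, `fake_read`, and the file names are hypothetical and exist only for illustration. Files are opened lazily, one at a time, and the scan stops as soon as the LIMIT is satisfied, so the remaining planned reads never start.

```python
from typing import Callable, Iterable, Iterator, List


def limited_scan(
    paths: Iterable[str],
    read_file: Callable[[str], Iterable[dict]],
    limit: int,
) -> Iterator[dict]:
    """Yield at most `limit` rows, opening files lazily, one at a time."""
    if limit <= 0:
        return
    remaining = limit
    for path in paths:
        # read_file is only called (and the file only opened) when the
        # scan actually reaches this path.
        for row in read_file(path):
            yield row
            remaining -= 1
            if remaining == 0:
                return  # files not yet opened are never read


# In-memory stand-in for a directory of 50,000 files (hypothetical names).
opened: List[str] = []


def fake_read(path: str) -> Iterator[dict]:
    opened.append(path)  # record which files were actually opened
    for i in range(3):
        yield {"file": path, "row": i}


paths = [f"lineitem/part-{n:05d}" for n in range(50_000)]
rows = list(limited_scan(paths, fake_read, limit=1))
print(len(rows), len(opened))
```

In a parallel engine the early `return` would presumably become a cancellation signal sent to fragments whose reads are still queued; that part is an assumption and is not modeled here.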

> LIMIT 1 query on top of a dir with 50K files takes ~150 seconds
> ---------------------------------------------------------------
>
>                 Key: DRILL-3722
>                 URL: https://issues.apache.org/jira/browse/DRILL-3722
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Assignee: Jinfeng Ni
>
> git.commit.id.abbrev=445790f
> I ran the query below on the TPCH SF100 lineitem table, stored as 50K files. 
> Given the nature of the query, Drill seems very slow in handling it.
> {code}
> select * from lineitem limit 1;
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+--------------+---------------+----------------+-----------------+--------------+--------------+
> | L_ORDERKEY  | L_PARTKEY  | L_SUPPKEY  | L_LINENUMBER  | L_QUANTITY  | L_EXTENDEDPRICE  | L_DISCOUNT  | L_TAX  | L_RETURNFLAG  | L_LINESTATUS  |  L_SHIPDATE  | L_COMMITDATE  | L_RECEIPTDATE  | L_SHIPINSTRUCT  |  L_SHIPMODE  |  L_COMMENT   |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+--------------+---------------+----------------+-----------------+--------------+--------------+
> | 456884480   | 19781678   | 781679     | 1             | 21.0        | 36932.49         | 0.1         | 0.03   | [B@44f54509   | [B@4287753d   | [B@4b2219ea  | [B@2bd3782f   | [B@48776c23    | [B@185c9300     | [B@65b6f17e  | [B@4da8bb5d  |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+--------------+---------------+----------------+-----------------+--------------+--------------+
> 1 row selected (158.976 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
