I ran the EXPLAIN you suggested against the entire 100-file table and it takes about 3 seconds. I will try to get a defect written up in the next few days.
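[Editor's note: since the whole thread turns on comparing planning times across tests, here is a minimal sketch of a timing harness for such measurements. Plain Python, stdlib only; `run_explain` is a hypothetical stand-in for whatever client call actually submits the EXPLAIN statement to Drill (e.g. over JDBC/ODBC) — it is not part of any Drill API.]

```python
import time

def time_planning(run_explain):
    """Time a single EXPLAIN round trip.

    run_explain is any zero-argument callable that submits the
    EXPLAIN PLAN FOR ... statement and blocks until the plan text
    comes back. Returns elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    run_explain()
    return time.perf_counter() - start

# Dummy callable standing in for a real Drill client call:
elapsed = time_planning(lambda: time.sleep(0.1))
print(f"planning took {elapsed:.2f}s")
```

Using `time.perf_counter` (a monotonic clock) rather than wall-clock timestamps keeps the 3-second vs. 12-minute comparisons in this thread consistent across runs.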
- Dave

On Tue, Feb 14, 2017 at 9:06 PM, Jinfeng Ni <[email protected]> wrote:

From the two tests you did, I'm inclined to think there might be something special about your parquet files. How do you generate these parquet files? Do they contain normal data types (int/float/varchar) or complex types (array/map)?

In our environment, we also have hundreds of parquet files, each hundreds of MB in size. A typical query (several tables joined) takes a couple of seconds in planning.

One more test, if you can help run it:

EXPLAIN PLAN FOR
SELECT someCol1, someCol2
FROM dfs.`parquet/transaction/OneSingleFile.parquet`;

The query above is simple enough that the planner should not spend a long time enumerating different choices. If it still takes a long time in query planning, the more likely cause might be in the parquet files you used.

On Tue, Feb 14, 2017 at 1:06 PM, David Kincaid <[email protected]> wrote:

I will write up a defect. The first test you suggested below - running the query on just one of our Parquet files - produces the same result (10-12 minute planning time). However, the second test - using cp.`tpch/nation.parquet` - results in a planning time of only about a minute. So I'm not sure how to interpret that. What does that mean to you all?

- Dave

On Tue, Feb 14, 2017 at 12:37 PM, Jinfeng Ni <[email protected]> wrote:

Normally, slow query planning can be caused by:

1. A planner rule hitting a bug when processing certain operators in the query, for instance a join operator or a distinct aggregate. The query I tried on a small file seems to rule out this possibility.
2. Parquet metadata access time. According to the log, this does not seem to be the issue.
3. Something we are not aware of.

To help get some clues, can you try the following:

1. Run the query over one single parquet file, instead of 100 parquet files. You can change the query to use dfs.`parquet/transaction/OneSingleFile.parquet`. I'm wondering if the planning time is proportional to the number of parquet files.

2. Try your query with dfs.`parquet/transaction/OneSingleFile.parquet` replaced by cp.`tpch/nation.parquet`, which is a small TPC-H parquet file (you need to re-enable the storage plugin 'cp'). Running EXPLAIN should be fine. This will tell us whether the problem is caused by the parquet source or by the query itself.

Yes, please create a defect in the Drill JIRA.

On Tue, Feb 14, 2017 at 5:02 AM, David Kincaid <[email protected]> wrote:

Thank you for the feedback. It seems there is nothing more I can do on my end. What are my next steps? Shall I create a defect in the Drill JIRA?

- Dave

On Mon, Feb 13, 2017 at 5:13 PM, Jinfeng Ni <[email protected]> wrote:

The size of the parquet files matters in terms of metadata access time, which was just 212 ms according to your log file. My understanding is that it does not matter much to the overall planning time. That's why it probably makes sense to try such a small toy example.

Normally the planning time for such a simple query should be much shorter than 12 minutes. That indicates it could be caused by a code bug, or by something else we are currently unaware of.

On Mon, Feb 13, 2017 at 2:47 PM, David Kincaid <[email protected]> wrote:

The example in DRILL-5183 is just a very small toy example to demonstrate the bug in how Drill reads Parquet array fields. It doesn't have anything to do with this planning issue (at least I don't think it does).
Sorry if I confused things with that reference.

I just tried running our query directly against the table at dfs.`parquet/transaction` and get the same result (12 minutes of planning time). I disabled the cp and s3 storage plugins that were enabled, so that only the dfs storage plugin is enabled, and the result is the same.

Is it expected for Drill to take this long in the planning phase for a query? Is there anything else I can try, or information I could provide, to help identify the bug (it seems like a bug to me)? I really appreciate you guys helping out so quickly this afternoon.

- Dave

On Mon, Feb 13, 2017 at 4:13 PM, Jinfeng Ni <[email protected]> wrote:

I downloaded books.parquet from DRILL-5183 and created a view on top of this single parquet file. Then I ran EXPLAIN for the query, and it completed within 1.2 seconds on the Drill 1.8.0 release. (The number of parquet files impacts the time to fetch metadata; since that is not the bottleneck in this case, it should not cause a big difference.)

Do you see the long planning time for this query only, or does it happen for other queries as well? Besides the possibility of planner rule bugs, we once saw another possible cause of long planning: if your storage plugin configuration enables some other storage plugin (for instance hbase, hive, etc.) that is slow to access, those irrelevant storage plugins can impact your query as well. You may temporarily disable those storage plugins and see if that is the cause of the problem.

0: jdbc:drill:zk=local> explain plan for
select fltb1.sapId, yearmo,
       COUNT(*) as totalcnt,
       count(distinct(CASE
                        WHEN (REPEATED_CONTAINS(fltb1.classLabels,
                              'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
                        THEN fltb1.invoiceId
                      END)) as ultracount,
       count(distinct(CASE
                        WHEN (REPEATED_CONTAINS(fltb1.classLabels,
                              'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
                        THEN fltb1.invoiceId
                      END)) as labcount
from (
  select sapid, invoiceId,
         TO_CHAR(TO_TIMESTAMP(transactionDate, 'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo,
         classLabels
  from dfs.tmp.transactionView) fltb1
group by fltb1.sapId, yearmo;
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01   Project(sapId=[$0], yearmo=[$1], totalcnt=[$2], ultracount=[$3], labcount=[$4])
....................................
00-09     SelectionVectorRemover
00-12       Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-15         HashAgg(group=[{0, 1}], totalcnt=[COUNT()])
................................
> >> >> >> 00-22 Scan(groupscan=[ParquetGroupScan > >> >> >> [entries=[ReadEntryWithPath [path=file:/tmp/parquet/ > transaction]], > >> >> >> selectionRoot=file:/tmp/parquet/transaction, numFiles=1, > >> >> >> usedMetadataFile=false, columns=[`sapId`, `invoiceId`, > >> >> >> `transactionDate`, `classLabels`.`array`]]]) > >> >> >> > >> >> >> 1 row selected (1.195 seconds) > >> >> >> > >> >> >> > >> >> >> On Mon, Feb 13, 2017 at 1:51 PM, David Kincaid < > >> [email protected]> > >> >> >> wrote: > >> >> >> > Here is the entire transactionView.view.drill file. As you can > see > >> the > >> >> >> view > >> >> >> > itself is very simple and is just wrapping a syntactic problem > with > >> >> the > >> >> >> > array field. That's an issue I reported in Jira under > DRILL-5183 ( > >> >> >> > https://issues.apache.org/jira/browse/DRILL-5183) > >> >> >> > > >> >> >> > { > >> >> >> > "name" : "transactionView", > >> >> >> > "sql" : "SELECT `transactionRowKey`, `sapId`, `practiceName`, > >> >> >> > `practiceCity`, `practiceState`, `practicePostalCode`, > `animalId`, > >> >> >> > `dateOfBirth`, `species`, `breed`, `gender`, `status`, > `ownerId`, > >> >> >> > `itemType`, `classification`, `subclass`, `practiceDescription`, > >> >> >> > `clientDescription`, `invoiceId`, `unitOfMeasure`, `vendorName`, > >> >> >> `vaccine`, > >> >> >> > `rabies`, `vaccineType`, `price`, `quantity`, `transactionDate`, > >> >> >> > `visitReason`, `speciesCode`, `genderCode`, > >> >> `t`.`classLabels`['array'] AS > >> >> >> > `classLabels`\nFROM `dfs`.`/parquet/transaction` AS `t`", > >> >> >> > "fields" : [ { > >> >> >> > "name" : "transactionRowKey", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "sapId", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "practiceName", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "practiceCity", > >> >> 
>> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "practiceState", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "practicePostalCode", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "animalId", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "dateOfBirth", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "species", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "breed", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "gender", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "status", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "ownerId", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "itemType", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "classification", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "subclass", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "practiceDescription", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "clientDescription", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "invoiceId", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "unitOfMeasure", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "vendorName", > >> >> >> > "type" : 
"ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "vaccine", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "rabies", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "vaccineType", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "price", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "quantity", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "transactionDate", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "visitReason", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "speciesCode", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "genderCode", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > }, { > >> >> >> > "name" : "classLabels", > >> >> >> > "type" : "ANY", > >> >> >> > "isNullable" : true > >> >> >> > } ], > >> >> >> > "workspaceSchemaPath" : [ ] > >> >> >> > } > >> >> >> > > >> >> >> > On Mon, Feb 13, 2017 at 3:47 PM, Jinfeng Ni <[email protected]> > >> wrote: > >> >> >> > > >> >> >> >> Yes, the log confirmed that the planning, especially physical > >> >> >> >> planning, is the one that took most of the time. > >> >> >> >> > >> >> >> >> If the definition of view s3.cisexport.transactionView is not > very > >> >> >> >> complicated (involves large # of tables), then it's possible > that > >> >> some > >> >> >> >> planner rules have a bug. (In the past, we once saw couple of > >> planner > >> >> >> >> rules would be fired in a loop). > >> >> >> >> > >> >> >> >> Is it possible that you can share the DDL of the view? 
That may help us reproduce the problem and take a look at the trace of Calcite, which Drill uses as its query planner.
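[Editor's note: the view file shared above is plain JSON, so it can be sanity-checked programmatically. A small sketch, Python stdlib only, that parses a trimmed-down copy of the transactionView definition from this thread and confirms the columns all came through as type ANY. Only 4 of the 31 fields are reproduced here to keep the example short.]

```python
import json

# Trimmed copy of transactionView.view.drill from this thread
# (4 of the 31 fields shown; all fields in the real file have the
# same {"type": "ANY", "isNullable": true} shape).
view_json = """
{
  "name" : "transactionView",
  "fields" : [
    { "name" : "transactionRowKey", "type" : "ANY", "isNullable" : true },
    { "name" : "sapId", "type" : "ANY", "isNullable" : true },
    { "name" : "transactionDate", "type" : "ANY", "isNullable" : true },
    { "name" : "classLabels", "type" : "ANY", "isNullable" : true }
  ],
  "workspaceSchemaPath" : [ ]
}
"""

view = json.loads(view_json)
names = [f["name"] for f in view["fields"]]
types = {f["type"] for f in view["fields"]}
print(view["name"], names, types)
```

A check like this makes it easy to confirm, before filing the JIRA, that the view really is a flat single-table projection with no joins, which is what rules out the "large number of tables" planner-rule scenario mentioned above.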
