Yes, the log confirms that planning, and physical planning in particular, is
where most of the time is going.

If the definition of the view s3.cisexport.transactionView is not very
complicated (i.e., it does not involve a large number of tables), then it's
possible that some planner rules have a bug. (In the past we have seen a
couple of planner rules fire in a loop.)

Would it be possible for you to share the DDL of the view?  That would help
us reproduce the problem and look at the trace from Calcite, which Drill
uses as its query planner.
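
In case it helps, the view definition can usually be pulled directly from
Drill's INFORMATION_SCHEMA (a sketch; adjust the name if the view is
registered differently):

  -- show the stored SQL definition of the view
  SELECT TABLE_SCHEMA, TABLE_NAME, VIEW_DEFINITION
  FROM INFORMATION_SCHEMA.VIEWS
  WHERE TABLE_NAME = 'transactionView';

The VIEW_DEFINITION column holds the SQL text the view was created with.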





On Mon, Feb 13, 2017 at 1:35 PM, David Kincaid <[email protected]> wrote:
> Yes, we did confirm that it's a planning time issue by both looking at the
> query profile and running the EXPLAIN.
>
> We ran with DEBUG on (thanks for the easy-to-follow instructions). That
> produces a lot of output and I'm not sure how to interpret most of it.
> However, we do see these lines that seem to be the important ones around
> planning:
>
> 2017-02-13 21:04:07,285 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Logical Planning (no
> pruning or join). (2655ms)
>
> 2017-02-13 21:04:08,342 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Partition Prune
> Planning (1057ms):
>
> 2017-02-13 21:04:12,449 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:LOPT Join Planning
> (4106ms):
>
> 2017-02-13 21:04:13,270 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Convert SUM to $SUM0
> (821ms):
>
> *2017-02-13 21:14:36,554 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning
> (623281ms):*
>
> 2017-02-13 21:14:38,281 [275de0c2-d74a-2d51-5810-70fded445be4:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Physical Partition
> Prune Planning (1211ms):
>
> So that bolded one - VOLCANO:Physical Planning - is the one that seems to
> be dominating the planning time. Does that mean anything to those who
> understand the inner workings of Drill?
>
> - Dave
>
> On Mon, Feb 13, 2017 at 2:52 PM, Aman Sinha <[email protected]> wrote:
>
>> Ok, so at least the reading of the metadata cache file for planning
>> purposes is relatively quick – 210ms.
>> I assume you have confirmed that it is a planning time issue by either
>> looking at the query profile’s ‘start time’ or by running an EXPLAIN command
>> (which only does planning, no execution).
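>> For example, something along these lines (just a sketch; substitute the
>> actual query):
>>
>>   -- plans the query but does not execute it
>>   EXPLAIN PLAN FOR
>>   SELECT sapid, COUNT(*) AS totalcnt
>>   FROM s3.cisexport.transactionView
>>   GROUP BY sapid;
>>
>>   -- (EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR ... shows only the logical plan)
>>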
>> Given that, I would suggest enabling the DEBUG logging in the logback.xml
>> to see where that 12 min gap is coming from.
>>
>> Here’s a snippet of the logback.xml to enable debug logging at the top
>> level package.  (additional information at https://drill.apache.org/docs/
>> modify-logback-xml/)
>>
>>   <logger name="org.apache.drill" additivity="false">
>>     <level value="debug" />
>>     <appender-ref ref="SOCKET" />
>>   </logger>
>>
>>
>> On 2/13/17, 12:34 PM, "David Kincaid" <[email protected]> wrote:
>>
>>     Sorry about the S3 confusion in the query.
>> s3.cisexport.transactionView is
>>     just a view. The actual data resides on HDFS, at
>> dfs.`parquet/transaction`.
>>     Using that path directly in the query gives the same result.
>>
>>     The only long gap we see in the log is between these two lines:
>>
>>     2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> took 0
>>     ms, numFiles: 1
>>     2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>>
>>     Here is most of the raw log output (I bolded the two lines with the 12
>>     minute gap):
>>
>>     2017-02-13 19:30:23,666 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.drill.exec.work.foreman.Foreman - Query text for query id
>>     275df6b0-09e6-812b-e75a-b8722138eeb8: select fltb1.SAPID, yearmo,
>>     COUNT(*) as totalcnt,
>>     count(distinct(CASE
>>                    WHEN
>>                    (REPEATED_CONTAINS(fltb1.classLabels,
>>     'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
>>                    THEN fltb1.invoiceId
>>                    END)) as ultracount,
>>     count(distinct (CASE
>>                     WHEN
>>                     (REPEATED_CONTAINS(fltb1.classLabels,
>>     'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
>>                     THEN fltb1.invoiceId
>>                     END)) as labcount
>>     from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate,
>>     'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
>>           from s3.cisexport.transactionView) fltb1
>>     group by fltb1.sapid, yearmo
>>     LIMIT 20
>>     2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>>     2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:30:24,261 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:30:28,021 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> took 0
>>     ms, numFiles: 1
>>     2017-02-13 19:30:28,022 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> took 0
>>     ms, numFiles: 1
>>     2017-02-13 19:30:31,027 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.exec.store.parquet.Metadata - Took 210 ms to read metadata from
>>     cache file
>>     2017-02-13 19:30:31,028 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.exec.store.parquet.Metadata - No directories were modified.
>> Took 1
>>     ms to check modification time of 1 directories
>>     2017-02-13 19:30:31,043 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman]
>> INFO
>>      o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses()
>> took 0
>>     ms, numFiles: 1
>>     *2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:foreman]
>>     INFO  o.a.d.exec.store.dfs.FileSelection -
>> FileSelection.getStatuses() took
>>     0 ms, numFiles: 1*
>>     *2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms*
>>     2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:109]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,883 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:11:123]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>>     2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:11:123]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:11:123]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:11:123]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,948 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:127]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>>     2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:127]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:127]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:127]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>     2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:112]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 6 classes for
>>     org.apache.drill.exec.store.dfs.FormatPlugin took 0ms
>>     2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a-
>> b8722138eeb8:frag:5:112]
>>     INFO  o.a.d.c.s.persistence.ScanResult - loading 7 classes for
>>     org.apache.drill.common.logical.FormatPluginConfig took 0ms
>>
>>
>>
>>     On Mon, Feb 13, 2017 at 2:22 PM, Aman Sinha <[email protected]> wrote:
>>
>>     > In your drillbit.log file, can you look for the entries for the
>> foreman
>>     > node to see where the time is being spent?
>>     > e.g. entries of the following type:
>>     >   [275dec51-fcc1-f1bf-cb2f-57a838805a82:foreman] INFO
>>     > o.a.d.exec.store.parquet.Metadata - Took 64 ms to read metadata from
>>     > cache file
>>     >
>>     > Each entry is timestamped, so if you see a long gap between two of
>> them,
>>     > that will give an idea about what’s going on.
>>     > I am not familiar with how the S3 is set up.  My guess is there are
>> some
>>     > latency issues there that could be causing it, but let’s first get
>> the log
>>     > output.
>>     >
>>     > -Aman
>>     >
>>     > On 2/13/17, 12:05 PM, "David Kincaid" <[email protected]>
>> wrote:
>>     >
>>     >     We've just recently started working with Drill and I'm seeing
>> something
>>     >     that doesn't seem right and I'm not sure how to troubleshoot. We
>> have
>>     > 100
>>     >     Parquet files, each about 400MB, using Snappy
>> compression.
>>     >     While trying to query this data I am seeing extraordinary
>> planning
>>     > time for
>>     >     certain queries. The planning time is about 12 minutes and the
>> actual
>>     >     execution of the query is less than 2 minutes.
>>     >
>>     >     A few details of our setup. We are running Drill on an AWS EMR
>> cluster
>>     > on
>>     >     m4.16xlarge nodes (64 cores, 256GB each). We've given Drill an
>> 8GB Java
>>     >     heap and 100GB Java direct memory. We have verified that the
>> metadata
>>     > cache
>>     >     file is being created and used. We have tried a cluster of 10
>> nodes
>>     > and a
>>     >     cluster of 2 nodes with no difference in planning time or
>> execution
>>     > time.
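>>     >
>>     >     (The metadata cache here is the one Drill builds with REFRESH TABLE
>>     >     METADATA; as a sketch, using the path from our setup:
>>     >
>>     >     -- (re)build the Parquet metadata cache for this table
>>     >     REFRESH TABLE METADATA dfs.`parquet/transaction`;
>>     >
>>     >     which writes the cache file into the table's directory.)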
>>     >
>>     >     Does anyone have some pointers on troubleshooting excessive
>> planning
>>     > time?
>>     >     It seems like we must have something misconfigured or are missing
>>     >     something. We're very new to Drill and I think I've exhausted
>> all my
>>     >     troubleshooting ideas so far. Any tips anyone can provide?
>>     >
>>     >     Here is the main query I've been experimenting with, so you can
>> get a
>>     > feel
>>     >     for the query complexity:
>>     >
>>     >     select fltb1.sapId, yearmo,
>>     >     COUNT(*) as totalcnt,
>>     >     count(distinct(CASE
>>     >                    WHEN
>>     >                    (REPEATED_CONTAINS(fltb1.classLabels,
>>     >     'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*'))
>>     >                    THEN fltb1.invoiceId
>>     >                    END)) as ultracount,
>>     >     count(distinct (CASE
>>     >                     WHEN
>>     >                     (REPEATED_CONTAINS(fltb1.classLabels,
>>     >     'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*'))
>>     >                     THEN fltb1.invoiceId
>>     >                     END)) as labcount
>>     >     from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate,
>>     >     'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels
>>     >           from s3.cisexport.transactionView) fltb1
>>     >     group by fltb1.sapId, yearmo;
>>     >
>>     >     Thanks,
>>     >
>>     >     Dave
>>     >
>>     >
>>     >
>>
>>
>>
