Yes, the log confirmed that the planning, especially physical planning, is the one that took most of the time.
If the definition of view s3.cisexport.transactionView is not very complicated (involves large # of tables), then it's possible that some planner rules have a bug. (In the past, we once saw couple of planner rules would be fired in a loop). Is it possible that you can share the DDL of the view? That may help us re-produce the problem and take a look at the trace of Calcite, which Drill uses as the query planner. On Mon, Feb 13, 2017 at 1:35 PM, David Kincaid <[email protected]> wrote: > Yes, we did confirm that it's planning time issue by both looking at the > query profile and running the EXPLAIN. > > We ran with DEBUG on (thanks for the easy to follow instructions). That > produces a lot of output and I'm not sure how to interpret most of it. > However we do see these lines that seem to be the important ones around > planning: > > 2017-02-13 21:04:07,285 [275de0c2-d74a-2d51-5810-70fded445be4:foreman] > DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Logical Planning (no > pruning or join). (2655ms) > > 2017-02-13 21:04:08,342 [275de0c2-d74a-2d51-5810-70fded445be4:foreman] > DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Partition Prune > Planning (1057ms): > > 2017-02-13 21:04:12,449 [275de0c2-d74a-2d51-5810-70fded445be4:foreman] > DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:LOPT Join Planning > (4106ms): > > 2017-02-13 21:04:13,270 [275de0c2-d74a-2d51-5810-70fded445be4:foreman] > DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Convert SUM to $SUM0 > (821ms): > > *2017-02-13 21:14:36,554 [275de0c2-d74a-2d51-5810-70fded445be4:foreman] > DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning > (623281ms):* > > 2017-02-13 21:14:38,281 [275de0c2-d74a-2d51-5810-70fded445be4:foreman] > DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Physical Partition > Prune Planning (1211ms): > > So that bolded one - VOLCANO:Physical Planning - is the one that seems to > be dominating the planning time. Does that mean anything to those who > understand the inner workings of Drill? > > - Dave > > On Mon, Feb 13, 2017 at 2:52 PM, Aman Sinha <[email protected]> wrote: > >> Ok, so at least the reading of the metadata cache file for planning >> purposes is relatively quick – 210ms. >> I assume you have confirmed that it is a planning time issue by either >> looking at the query profiles ‘start time’ or by running an EXPLAIN command >> (which only does planning, no execution). >> Given that, I would suggest enabling the DEBUG logging in the logback.xml >> to see where that 12 min gap is coming from. >> >> Here’s a snippet of the logback.xml to enable debug logging at the top >> level package. (additional information at https://drill.apache.org/docs/ >> modify-logback-xml/) >> >> <logger name="org.apache.drill" additivity="false"> >> <level value="debug" /> >> <appender-ref ref="SOCKET" /> >> </logger> >> >> >> On 2/13/17, 12:34 PM, "David Kincaid" <[email protected]> wrote: >> >> Sorry about the S3 confusion in the query. >> s3.cisexport.transactionView is >> just a view. The actual data resides on HDFS. At >> dfs.`parquet/transaction`. >> Using that in the query itself gives the same result. >> >> The only long gap we see in the log is between these two lines: >> >> 2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() >> took 0 >> ms, numFiles: 1 >> 2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:109] >> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for >> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms >> >> Here is most of the raw log output (I bolded the two lines with the 12 >> minute gap): >> >> 017-02-13 19:30:23,666 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.drill.exec.work.foreman.Foreman - Query text for query id >> 275df6b0-09e6-812b-e75a-b8722138eeb8: select fltb1.SAPID, yearmo, >> COUNT(*) as totalcnt, >> count(distinct(CASE >> WHEN >> (REPEATED_CONTAINS(fltb1.classLabels, >> 'Thing:Service:MedicalService:Diagnostic:Radiology:Ultrasound.*')) >> THEN fltb1.invoiceId >> END)) as ultracount, >> count(distinct (CASE >> WHEN >> (REPEATED_CONTAINS(fltb1.classLabels, >> 'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*')) >> THEN fltb1.invoiceId >> END)) as labcount >> from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP(transactionDate, >> 'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels >> from s3.cisexport.transactionView) fltb1 >> group by fltb1.sapid, yearmo >> LIMIT 20 >> 2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.c.s.persistence.ScanResult - loading 6 classes for >> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms >> 2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:30:24,260 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:30:24,261 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:30:28,021 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() >> took 0 >> ms, numFiles: 1 >> 2017-02-13 19:30:28,022 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() >> took 0 >> ms, numFiles: 1 >> 2017-02-13 19:30:31,027 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.exec.store.parquet.Metadata - Took 210 ms to read metadata from >> cache file >> 2017-02-13 19:30:31,028 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.exec.store.parquet.Metadata - No directories were modified. >> Took 1 >> ms to check modification time of 1 directories >> 2017-02-13 19:30:31,043 [275df6b0-09e6-812b-e75a-b8722138eeb8:foreman] >> INFO >> o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() >> took 0 >> ms, numFiles: 1 >> *2017-02-13 19:30:31,044 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:foreman] >> INFO o.a.d.exec.store.dfs.FileSelection - >> FileSelection.getStatuses() took >> 0 ms, numFiles: 1* >> *2017-02-13 19:42:05,877 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:109] >> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for >> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms* >> 2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:109] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:109] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,878 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:109] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,883 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:11:123] >> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for >> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms >> 2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:11:123] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:11:123] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,884 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:11:123] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,948 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:127] >> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for >> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms >> 2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:127] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:127] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,949 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:127] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> 2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:112] >> INFO o.a.d.c.s.persistence.ScanResult - loading 6 classes for >> org.apache.drill.exec.store.dfs.FormatPlugin took 0ms >> 2017-02-13 19:42:05,968 [275df6b0-09e6-812b-e75a- >> b8722138eeb8:frag:5:112] >> INFO o.a.d.c.s.persistence.ScanResult - loading 7 classes for >> org.apache.drill.common.logical.FormatPluginConfig took 0ms >> >> >> >> On Mon, Feb 13, 2017 at 2:22 PM, Aman Sinha <[email protected]> wrote: >> >> > In your drillbit.log file, can you look for the entries for the >> foreman >> > node to see where the time is being spent ? >> > e.g entries of the following type: >> > [275dec51-fcc1-f1bf-cb2f-57a838805a82:foreman] INFO >> > o.a.d.exec.store.parquet.Metadata - Took 64 ms to read metadata from >> > cache file >> > >> > Each entry is timestamped, so if you see a long gap between two of >> them, >> > that will give an idea about what’s going on. >> > I am not familiar with how the S3 is set up. My guess is there is >> some >> > latency issues there that could be causing it but let’s first get >> the log >> > output. >> > >> > -Aman >> > >> > On 2/13/17, 12:05 PM, "David Kincaid" <[email protected]> >> wrote: >> > >> > We've just recently started working with Drill and I'm seeing >> something >> > that doesn't seem right and I'm not sure how to troubleshoot. We >> have >> > 100 >> > Parquet files which are each about 400MB each using Snappy >> compression. >> > While trying to query this data I am seeing extraordinary >> planning >> > time for >> > certain queries. The planning time is about 12 minutes and the >> actual >> > execution of the query is less than 2 minutes. >> > >> > A few details of our setup. We are running Drill on an AWS EMR >> cluster >> > on >> > m4.16xlarge nodes (64 cores, 256GB each). We've given Drill an >> 8GB Java >> > heap and 100GB Java direct memory. We have verified that the >> metadata >> > cache >> > file is being created and used. We have tried a cluster of 10 >> nodes >> > and a >> > cluster of 2 nodes with no difference in planning time or >> execution >> > time. >> > >> > Does anyone have some pointers on troubleshooting excessive >> planning >> > time? >> > It seems like we must have something misconfigured or are missing >> > something. We're very new to Drill and I think I've exhausted >> all my >> > troubleshooting ideas so far. Any tips anyone can provide? >> > >> > Here is the main query I've been experimenting with, so you can >> get a >> > feel >> > for the query complexity: >> > >> > select fltb1.sapId, yearmo, >> > COUNT(*) as totalcnt, >> > count(distinct(CASE >> > WHEN >> > (REPEATED_CONTAINS(fltb1.classLabels, >> > 'Thing:Service:MedicalService:Diagnostic:Radiology: >> Ultrasound.*')) >> > THEN fltb1.invoiceId >> > END)) as ultracount, >> > count(distinct (CASE >> > WHEN >> > (REPEATED_CONTAINS(fltb1.classLabels, >> > 'Thing:Service:MedicalService:Diagnostic:LaboratoryTest.*')) >> > THEN fltb1.invoiceId >> > END)) as labcount >> > from (select sapid, invoiceId, TO_CHAR(TO_TIMESTAMP( >> transactionDate, >> > 'YYYY-MM-dd HH:mm:ss.SSSSSS'), 'yyyy-MM') yearmo, classLabels >> > from s3.cisexport.transactionView) fltb1 >> > group by fltb1.sapId, yearmo; >> > >> > Thanks, >> > >> > Dave >> > >> > >> > >> >> >>
