Hi,

Another observation: my query has WHERE conditions based on the partition values.

Total number of Parquet files in the directory: 102290
Before metadata refresh: the query reads only 4 files.
After metadata refresh: the query reads all 102290 files.

Is this how refresh metadata works? I mean, does the query then scan each and every file to get the results?
I don't have access to the logs right now.

Thanks,
Divya

On 17 August 2017 at 13:48, Divya Gehlot <[email protected]> wrote:
> Hi,
> Another observation:
> My query has WHERE conditions based on the partition values.
> Before metadata refresh: it reads only 4 files.
> After metadata refresh: it reads 102290 files.
>
> Thanks,
> Divya
>
> On 17 August 2017 at 13:03, Padma Penumarthy <[email protected]> wrote:
>
>> Does your query have a partition filter?
>> Execution time has most likely increased because partition pruning is not
>> happening.
>> Did you get a chance to look at the logs? They might give some clues.
>>
>> Thanks,
>> Padma
>>
>>
>>> On Aug 16, 2017, at 9:32 PM, Divya Gehlot <[email protected]> wrote:
>>>
>>> Hi,
>>> I am surprised too.
>>> I am running Drill 1.10 on MapR Enterprise.
>>> Query: selecting all the columns from a partitioned Parquet table.
>>>
>>> I observed a few things from the query statistics:
>>>
>>> Metric       Before Refresh Metadata   After Refresh Metadata
>>> Fragments    1                         13
>>> DURATION     01 min 0.233 sec          18 min 0.744 sec
>>> PLANNING     59.818 sec                33.087 sec
>>> QUEUED       Not Available             Not Available
>>> EXECUTION    0.415 sec                 17 min 27.657 sec
>>>
>>> Planning time is reduced by roughly 45% (59.818 sec to 33.087 sec), but
>>> the execution time increased drastically (0.415 sec to 17 min 27.657 sec).
>>> I would like to understand why the execution time increases after the
>>> metadata refresh.
>>>
>>>
>>> Appreciate the help.
>>>
>>> Thanks,
>>> Divya
>>>
>>> On 17 August 2017 at 11:54, Padma Penumarthy <[email protected]> wrote:
>>>
>>>> Refresh table metadata should help reduce query planning time.
>>>> It is odd that it went up after you refreshed the table metadata.
>>>> Did you check the logs to see what is happening? You might have to
>>>> turn on some debug logging if needed.
>>>> BTW, what version of Drill are you running?
>>>>
>>>> Thanks,
>>>> Padma
>>>>
>>>>
>>>>> On Aug 16, 2017, at 8:15 PM, Divya Gehlot <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>> I have data in Parquet file format.
>>>>> When I run the query on the data and look at the execution plan, I see
>>>>> the following statistics:
>>>>>
>>>>> TOTAL FRAGMENTS: 1
>>>>> DURATION: 01 min 0.233 sec
>>>>> PLANNING: 59.818 sec
>>>>> QUEUED: Not Available
>>>>> EXECUTION: 0.415 sec
>>>>>
>>>>> As the data is in Parquet format, I tried refreshing the metadata by
>>>>> running the command below:
>>>>> REFRESH TABLE METADATA <path to table> ;
>>>>> Then I ran the same query again on the same table with the same data
>>>>> (no changes in the data) and got the statistics shown below:
>>>>>
>>>>> TOTAL FRAGMENTS: 13
>>>>> DURATION: 14 min 14.604 sec
>>>>> PLANNING: 33.087 sec
>>>>> QUEUED: Not Available
>>>>> EXECUTION: Not Available
>>>>>
>>>>> The query is still running.
>>>>>
>>>>> Can somebody help me understand why the query takes so long after I
>>>>> issue the refresh metadata command?
>>>>>
>>>>> Appreciate the help!
>>>>>
>>>>> Thanks,
>>>>> Divya
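The core question in this thread is why the same filtered query reads 4 files before the refresh and all 102290 afterwards. As Padma suggests, this points at partition pruning: when the WHERE clause filters only on partition values, the planner can drop every file whose partition values cannot match before execution starts. The toy Python sketch below illustrates that idea under assumed conditions: a hypothetical directory-per-partition layout (/data/events/<year>/<month>/...) whose directory levels are exposed as Drill-style dir0, dir1 columns. ParquetFile, partition_values and prune are hypothetical names for illustration only; they are not Drill APIs, and this is not how Drill implements pruning internally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ParquetFile:
    """One data file plus the partition values implied by its directory path."""
    path: str
    partitions: Dict[str, str]


def partition_values(path: str, table_root: str) -> Dict[str, str]:
    """Map each directory level below the table root to dir0, dir1, ... (hypothetical layout)."""
    relative = path[len(table_root):].strip("/")
    directories = relative.split("/")[:-1]  # drop the file name itself
    return {f"dir{i}": d for i, d in enumerate(directories)}


def prune(files: List[ParquetFile],
          predicate: Callable[[Dict[str, str]], bool]) -> List[ParquetFile]:
    """Planning-time pruning: keep only files whose partition values satisfy the filter."""
    return [f for f in files if predicate(f.partitions)]


# Hypothetical table: 2 years x 12 months x 2 files per month = 48 files.
root = "/data/events"
paths = [f"{root}/{year}/{month:02d}/part-{n}.parquet"
         for year in ("2016", "2017") for month in range(1, 13) for n in range(2)]
files = [ParquetFile(p, partition_values(p, root)) for p in paths]

# Equivalent in spirit to: WHERE dir0 = '2017' AND dir1 = '08'
selected = prune(files, lambda pv: pv["dir0"] == "2017" and pv["dir1"] == "08")
print(f"scan {len(selected)} of {len(files)} files with pruning")  # scan 2 of 48 files with pruning
print(f"scan {len(files)} of {len(files)} files without pruning")  # scan 48 of 48 files without pruning
```

With pruning, the filter touches 2 of the 48 files; if the pruning step is skipped, every file is handed to execution, which is consistent with the jump from 4 to 102290 files reported here.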

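The query profiles quoted in this thread are internally consistent: in both runs DURATION equals PLANNING + EXECUTION, since QUEUED is not available. A short Python check of that arithmetic, using only the figures from the 9:32 PM table, also quantifies the trade-off: planning time fell by about 45%, while execution time grew by a factor of roughly 2500. The helper name to_seconds is a hypothetical convenience, not part of Drill.

```python
import math


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a (minutes, seconds) pair from a Drill query profile to seconds."""
    return minutes * 60 + seconds


# Figures copied from the query profiles in this thread.
before = {"planning": to_seconds(0, 59.818), "execution": to_seconds(0, 0.415),
          "duration": to_seconds(1, 0.233)}
after = {"planning": to_seconds(0, 33.087), "execution": to_seconds(17, 27.657),
         "duration": to_seconds(18, 0.744)}

for run in (before, after):
    # QUEUED was "Not Available", so DURATION should be PLANNING + EXECUTION.
    assert math.isclose(run["planning"] + run["execution"], run["duration"], abs_tol=1e-6)

planning_cut = 1 - after["planning"] / before["planning"]
execution_growth = after["execution"] / before["execution"]
print(f"planning time reduced by {planning_cut:.1%}")  # planning time reduced by 44.7%
print(f"execution time grew {execution_growth:.0f}x")  # execution time grew 2524x
```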