Could you be running into https://issues.apache.org/jira/browse/DRILL-3846 ?
- Rahul On Thu, Aug 17, 2017 at 9:13 PM, Padma Penumarthy <ppenumar...@mapr.com> wrote: > It is supposed to work like you expected. May be you are running into a > bug. > Why is it reading all files after metadata refresh ? That is difficult to > answer without > looking at the logs and query profile. If you look at the query profile, > you can may > be check what usedMetadataFile flag says for scan. > Also, I am thinking if you created so many files, your metadata > cache file could be big. May be you can manually sanity > check if it looks ok (look for .drill.parquet.metadata file in the root > directory) and not > corrupted ? > > Thanks, > Padma > > > On Aug 17, 2017, at 8:10 PM, Khurram Faraaz <kfar...@mapr.com<mailto:kfara > a...@mapr.com>> wrote: > > Please share your SQL query and the query plan. > > To get the query plan, execute EXPLAIN PLAN FOR <your-SQL-query>; > > > Thanks, > > Khurram > > ________________________________ > From: Divya Gehlot <divya.htco...@gmail.com<mailto:divya.htco...@gmail.com > >> > Sent: Friday, August 18, 2017 7:15:18 AM > To: user@drill.apache.org<mailto:user@drill.apache.org> > Subject: Re: Query Optimization > > Hi , > Yes its the same query its just the ran the metadata refresh command . > My understanding is metadata refresh command saves reading the metadata. > How about column values ... Why is it reading all the files after metedata > refresh ? > Partition helps to retrieve data faster . > Like in hive how it happens when you mention the partition column in where > condition > it just goes and read and improves the query performace . > In my query also I where conidtion has partioning column it should go and > read those partitioned files right ? > Why is it taking more time ? > Does the Drill works in different way compare to hive ? > > > Thanks, > Divya > > On 18 August 2017 at 07:37, Padma Penumarthy <ppenumar...@mapr.com<mailto: > ppenumar...@mapr.com>> wrote: > > It might read all those files if some new data gets added after running > refresh metadata cache. > If everything is same before and after metadata refresh i.e. no > new data added and query is exactly the same, then it should not do that. > Also, check if you can partition in a way that will not create so many > files in the > first place. > > Thanks, > Padma > > > On Aug 16, 2017, at 10:54 PM, Divya Gehlot <divya.htco...@gmail.com< > mailto:divya.htco...@gmail.com>> > wrote: > > Hi, > Another observation is > My query had where conditions based on the partition values > > Total number of parquet files in directory - 102290 > Before Metadata refresh - Its reading only 4 files > After metadata refresh - its reading 102290 files > > > This is how the refresh metadata works I mean it scans each and every > files > and get the results ? > > I dont have access to logs now . > > Thanks, > Divya > > On 17 August 2017 at 13:48, Divya Gehlot <divya.htco...@gmail.com<mailto: > divya.htco...@gmail.com>> > wrote: > > Hi, > Another observation is > My query had where conditions based on the partition values > Before Metadata refresh - Its reading only 4 files > After metadata refresh - its reading 102290 files > > Thanks, > Divya > > On 17 August 2017 at 13:03, Padma Penumarthy <ppenumar...@mapr.com<mailto: > ppenumar...@mapr.com>> > wrote: > > Does your query have partition filter ? > Execution time is increased most likely because partition pruning is > not > happening. > Did you get a chance to look at the logs ? That might give some clues. > > Thanks, > Padma > > > On Aug 16, 2017, at 9:32 PM, Divya Gehlot <divya.htco...@gmail.com<mailto: > divya.htco...@gmail.com>> > wrote: > > Hi, > Even I am surprised . > I am running Drill version 1.10 on MapR enterprise version. > *Query *- Selecting all the columns on partitioned parquet table > > I observed few things from Query statistics : > > Value > > Before Refresh Metadata > > After Refresh Metadata > > Fragments > > 1 > > 13 > > DURATION > > 01 min 0.233 sec > > 18 min 0.744 sec > > PLANNING > > 59.818 sec > > 33.087 sec > > QUEUED > > Not Available > > Not Available > > EXECUTION > > 0.415 sec > > 17 min 27.657 sec > > The planning time is being reduced by approx 60% but the execution > time > increased drastically. > I would like to understand why the exceution time increases after the > metadata refresh . > > > Appreciate the help. > > Thanks, > divya > > > On 17 August 2017 at 11:54, Padma Penumarthy <ppenumar...@mapr.com<mailto: > ppenumar...@mapr.com>> > wrote: > > Refresh table metadata should help reduce query planning time. > It is odd that it went up after you did refresh table metadata. > Did you check the logs to see what is happening ? You might have to > turn on some debugs if needed. > BTW, what version of Drill are you running ? > > Thanks, > Padma > > > On Aug 16, 2017, at 8:15 PM, Divya Gehlot <divya.htco...@gmail.com<mailto: > divya.htco...@gmail.com>> > wrote: > > Hi, > I have data in parquet file format . > when I run the query the data and see the execution plan I could see > following > statistics > > TOTAL FRAGMENTS: 1 > DURATION: 01 min 0.233 sec > PLANNING: 59.818 sec > QUEUED: Not Available > EXECUTION: 0.415 sec > > > > As its a paquet file format I tried enabling refresh meta data > and run below command > REFRESH TABLE METADATA <path to table> ; > then run the same query again on the same table same data (no > changes > in > data) and could find the statistics as show below : > > TOTAL FRAGMENTS: 13 > DURATION: 14 min 14.604 sec > PLANNING: 33.087 sec > QUEUED: Not Available > EXECUTION: Not Available > > > The query is still running . > > Can somebody help me understand why the query taking so long once I > issue > the refresh metadata command. > > Aprreciate the help ! > > Thanks, > Divya > > > > > > > > >