You can try the following. For each item, check the planning time individually:
1. Run EXPLAIN PLAN for a simple "select * from `/scratch/localdisk/drill/testdata/Cust_1G_tsv`".
2. Replace the '*' in your query with explicit column names.
3. Remove the extract header setting from your storage plugin configuration and the header rows from your data files, then rewrite your query to use columns[0_based_index] instead of explicit column names.

Also, how many columns do you have in your text files, and what is the size of each file?

As Gautam suggested, it would be good to take a look at the drillbit.log file (from the foreman node where planning occurred) and the query profile as well.

- Rahul

On Mon, Mar 6, 2017 at 9:30 AM, Gautam Parai <[email protected]> wrote:

> Can you please provide the drillbit.log file?
>
> Gautam
>
> ________________________________
> From: PROJJWAL SAHA <[email protected]>
> Sent: Monday, March 6, 2017 1:45:38 AM
> To: [email protected]
> Subject: Fwd: Minimise query plan time for dfs plugin for local file
> system on tsv file
>
> all, please help me with suggestions on what areas i can look into to find
> out why the query planning time is taking so long for files which are local
> to the drill machines. I have the same directory structure copied on all the
> 5 nodes of the cluster. I am accessing the source files using the out of the
> box dfs storage plugin.
>
> Query planning time is approx 30 secs
> Query execution time is approx 1.5 secs
>
> Regards,
> Projjwal
>
> ---------- Forwarded message ----------
> From: PROJJWAL SAHA <[email protected]>
> Date: Fri, Mar 3, 2017 at 5:06 PM
> Subject: Minimise query plan time for dfs plugin for local file system on
> tsv file
> To: [email protected]
>
> Hello all,
>
> I am querying select * from dfs.xxx where yyy (filter condition)
>
> I am using the dfs storage plugin that comes out of the box with drill on a
> 1GB file, local to the drill cluster.
> The 1GB file is split into 10 files of 100 MB each.
> As expected I see 11 minor and 2 major fragments.
> The drill cluster is a 5 node cluster with 4 cores, 32 GB each.
>
> One observation is that the query plan time is more than 30 seconds. I ran
> the explain plan query to validate this.
> The query execution time is 2 secs.
> Total time taken is 32 secs.
>
> I wanted to understand how i can minimise the query plan time. Suggestions?
> Is the time taken described above expected?
> Attached is the result from the explain plan query.
>
> Regards,
> Projjwal
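
For reference, the three planning-time checks listed at the top of this reply could be written roughly as below. This is only a sketch: it assumes the directory is reachable through the dfs schema, and cust_id and cust_name are made-up placeholder column names, so substitute the names from your own header row.

```sql
-- 1. Baseline: plan a plain SELECT * over the directory of TSV files.
--    (Assumes the path is reachable through the dfs schema.)
EXPLAIN PLAN FOR
SELECT * FROM dfs.`/scratch/localdisk/drill/testdata/Cust_1G_tsv`;

-- 2. Same query with explicit column names instead of '*'.
--    cust_id and cust_name are hypothetical; use your real header names.
EXPLAIN PLAN FOR
SELECT cust_id, cust_name
FROM dfs.`/scratch/localdisk/drill/testdata/Cust_1G_tsv`;

-- 3. After disabling header extraction and removing the header rows,
--    address fields by zero-based position in the columns array.
EXPLAIN PLAN FOR
SELECT columns[0], columns[1]
FROM dfs.`/scratch/localdisk/drill/testdata/Cust_1G_tsv`;
```

Comparing how long each statement takes to return should show whether the '*' expansion or the header extraction is what contributes to the planning time.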
