I did not get a chance to review the log file. However, the next thing I would try is to make your cluster a single-node cluster first and then run the same explain plan query separately on each individual file.
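
If it helps with the per-file runs, here is a rough sketch of a script that prints one explain plan statement per data file, so each can be pasted into sqlline and timed on its own. The directory and the filter are copied from your query; the file names (part_01.tsv, part_02.tsv) and the helper name explain_statements are hypothetical placeholders, so substitute the real names of your split files.

```python
from pathlib import PurePosixPath

# Directory taken from the query in this thread.
DATA_DIR = "/scratch/localdisk/drill/testdata/Cust_1G_20_tsv"


def explain_statements(file_names, data_dir=DATA_DIR):
    """Build one 'explain plan for' statement per data file."""
    statements = []
    for name in file_names:
        # Point the query at a single file instead of the whole directory.
        path = PurePosixPath(data_dir) / name
        statements.append(
            "explain plan for select columns[0] from dfs.root.`{}` "
            "where columns[0] = '41' and columns[3] = '568';".format(path)
        )
    return statements


# Hypothetical file names: replace with the actual split files.
for stmt in explain_statements(["part_01.tsv", "part_02.tsv"]):
    print(stmt)
```

Comparing the planning time of each statement on the single-node setup should show whether one particular file, or the number of files, accounts for the 30 seconds.
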
On Mar 7, 2017 5:09 AM, "PROJJWAL SAHA" <[email protected]> wrote:

> Hi Rahul,
>
> thanks for your suggestions. However, I am still not able to see any
> reduction in query planning time
> by explicit column names, removing extract headers and using columns[index]
>
> As I said, I ran explain plan and its taking 30+ secs for me.
> My data is 1 GB tsv split into 20 files of 5 MB each.
> Each 5MB file has close to 50k records
> Its a 5 node cluster, and width per node is 4
> Therefore, total number of minor fragments for one major fragment is 20
> I have copied the source directory in all the drillbit nodes
>
> can you tell me a reasonable time estimate which I can expect drill to
> return result for query for the above described scenario.
> Query is - select columns[0] from
> dfs.root.`/scratch/localdisk/drill/testdata/Cust_1G_20_tsv`
> where columns[0] ='41' and columns[3] ='568'
>
> attached is the json profile
> and the drillbit.log
>
> I also have the tracing enabled.
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
> org.apache.drill.exec.work.foreman.Foreman
> however i see the duration of various steps in the order of ms in the logs.
> i am not sure where planning time of the order of 30 secs is consumed.
>
> Please help
>
> Regards,
> Projjwal
>
> On Mon, Mar 6, 2017 at 11:23 PM, rahul challapalli <[email protected]> wrote:
>
>> You can try the below things. For each of the below check the planning
>> time individually
>>
>> 1. Run explain plan for a simple "select * from
>>    `/scratch/localdisk/drill/testdata/Cust_1G_tsv`"
>> 2. Replace the '*' in your query with explicit column names
>> 3. Remove the extract header from your storage plugin configuration and
>>    from your data files? Rewrite your query to use, columns[0_based_index]
>>    instead of explicit column names
>>
>> Also how many columns do you have in your text files and what is the size
>> of each file? Like gautam suggested, it would be good to take a look at
>> drillbit.log file (from the foreman node where planning occurred) and the
>> query profile as well.
>>
>> - Rahul
>>
>> On Mon, Mar 6, 2017 at 9:30 AM, Gautam Parai <[email protected]> wrote:
>>
>> > Can you please provide the drillbit.log file?
>> >
>> > Gautam
>> >
>> > ________________________________
>> > From: PROJJWAL SAHA <[email protected]>
>> > Sent: Monday, March 6, 2017 1:45:38 AM
>> > To: [email protected]
>> > Subject: Fwd: Minimise query plan time for dfs plugin for local file
>> > system on tsv file
>> >
>> > all, please help me in giving suggestions on what areas i can look into
>> > why the query planning time is taking so long for files which are local
>> > to the drill machines. I have the same directory structure copied on all
>> > the 5 nodes of the cluster. I am accessing the source files using out of
>> > the box dfs storage plugin.
>> >
>> > Query planning time is approx 30 secs
>> > Query execution time is apprx 1.5 secs
>> >
>> > Regards,
>> > Projjwal
>> >
>> > ---------- Forwarded message ----------
>> > From: PROJJWAL SAHA <[email protected]>
>> > Date: Fri, Mar 3, 2017 at 5:06 PM
>> > Subject: Minimise query plan time for dfs plugin for local file system
>> > on tsv file
>> > To: [email protected]
>> >
>> > Hello all,
>> >
>> > I am quering select * from dfs.xxx where yyy (filter condition)
>> >
>> > I am using dfs storage plugin that comes out of the box from drill on a
>> > 1GB file, local to the drill cluster.
>> > The 1GB file is split into 10 files of 100 MB each.
>> > As expected I see 11 minor and 2 major fagments.
>> > The drill cluster is 5 nodes cluster with 4 cores, 32 GB each.
>> >
>> > One observation is that the query plan time is more than 30 seconds. I
>> > ran the explain plan query to validate this.
>> > The query execution time is 2 secs.
>> > total time taken is 32secs
>> >
>> > I wanted to understand how can i minimise the query plan time.
>> > Suggestions ?
>> > Is the time taken described above expected ?
>> > Attached is result from explain plan query
>> >
>> > Regards,
>> > Projjwal
