The first two lines of the log show that the bulk of the time was lost 
before the query even reached the real planning stage:


2017-03-07 06:27:28,074 [274166de-f543-3fa7-ef9e-8e9e87d5d6a0:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 274166de-f543-3fa7-ef9e-8e9e87d5d6a0: select columns[0] from dfs.root.`/scratch/localdisk/drill/testdata/Cust_1G_20_tsv` where columns[0] ='41' and columns[3] ='568'
2017-03-07 06:28:00,775 [274166de-f543-3fa7-ef9e-8e9e87d5d6a0:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1


More than 30 seconds between these two entries is unaccounted for. Can you set 
the root logger to the debug level and rerun the explain plan?
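
For reference, the gap can be checked directly from the two timestamps. This is only a small sketch in Python; the variable names are made up for illustration, and the only inputs are the two timestamps copied from the log lines quoted above.

```python
from datetime import datetime

# Timestamp layout used by the quoted drillbit.log lines:
# date, time, then a comma followed by milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Values copied verbatim from the two log entries above.
query_text_logged = datetime.strptime("2017-03-07 06:27:28,074", LOG_TIME_FORMAT)
file_selection_logged = datetime.strptime("2017-03-07 06:28:00,775", LOG_TIME_FORMAT)

# Time that passed between the Foreman entry and the FileSelection entry.
gap_seconds = (file_selection_logged - query_text_logged).total_seconds()
print(f"Gap between the two log entries: {gap_seconds:.3f} s")
```

Nothing is logged at the INFO level during that interval, which is why debug-level output is needed to see where the time goes.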


Kunal Khatua


________________________________
From: rahul challapalli <challapallira...@gmail.com>
Sent: Tuesday, March 7, 2017 5:24:43 AM
To: user
Subject: Re: Minimise query plan time for dfs plugin for local file system on 
tsv file

I did not get a chance to review the log file.

However the next thing I would try is to make your cluster a single node
cluster first and then run the same explain plan query separately on each
individual file.



On Mar 7, 2017 5:09 AM, "PROJJWAL SAHA" <proj.s...@gmail.com> wrote:

> Hi Rahul,
>
> Thanks for your suggestions. However, I still do not see any reduction in
> query planning time after using explicit column names, removing extract
> headers, and using columns[index].
>
> As I said, I ran explain plan and it's taking 30+ secs for me.
> My data is a 1 GB tsv split into 20 files of 5 MB each.
> Each 5 MB file has close to 50k records.
> It's a 5 node cluster, and width per node is 4.
> Therefore, the total number of minor fragments for one major fragment is 20.
> I have copied the source directory to all the drillbit nodes.
>
> Can you tell me a reasonable time estimate in which I can expect Drill to
> return the result of the query for the scenario described above?
> The query is: select columns[0] from 
> dfs.root.`/scratch/localdisk/drill/testdata/Cust_1G_20_tsv`
> where columns[0] ='41' and columns[3] ='568'
>
> Attached are the JSON profile
> and the drillbit.log.
>
> I also have tracing enabled for
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler
> org.apache.drill.exec.work.foreman.Foreman
> However, I see the durations of the various steps on the order of ms in the
> logs. I am not sure where the planning time of around 30 secs is consumed.
>
> Please help
>
> Regards,
> Projjwal
>
> On Mon, Mar 6, 2017 at 11:23 PM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
>> You can try the things below. For each of them, check the planning time
>> individually.
>>
>> 1. Run explain plan for a simple "select * from
>> `/scratch/localdisk/drill/testdata/Cust_1G_tsv`"
>> 2. Replace the '*' in your query with explicit column names
>> 3. Remove the extract header from your storage plugin configuration and
>> from your data files. Rewrite your query to use columns[0_based_index]
>> instead of explicit column names
>>
>> Also, how many columns do you have in your text files, and what is the size
>> of each file? As Gautam suggested, it would be good to take a look at the
>> drillbit.log file (from the foreman node where planning occurred) and the
>> query profile as well.
>>
>> - Rahul
>>
>> On Mon, Mar 6, 2017 at 9:30 AM, Gautam Parai <gpa...@mapr.com> wrote:
>>
>> > Can you please provide the drillbit.log file?
>> >
>> >
>> > Gautam
>> >
>> > ________________________________
>> > From: PROJJWAL SAHA <proj.s...@gmail.com>
>> > Sent: Monday, March 6, 2017 1:45:38 AM
>> > To: user@drill.apache.org
>> > Subject: Fwd: Minimise query plan time for dfs plugin for local file
>> > system on tsv file
>> >
>> > All, please suggest which areas I can look into to find out why the
>> > query planning time is so long for files which are local to the Drill
>> > machines. I have the same directory structure copied on all 5 nodes of
>> > the cluster. I am accessing the source files using the out-of-the-box
>> > dfs storage plugin.
>> >
>> > Query planning time is approx 30 secs
>> > Query execution time is approx 1.5 secs
>> >
>> > Regards,
>> > Projjwal
>> >
>> > ---------- Forwarded message ----------
>> > From: PROJJWAL SAHA <proj.s...@gmail.com>
>> > Date: Fri, Mar 3, 2017 at 5:06 PM
>> > Subject: Minimise query plan time for dfs plugin for local file system
>> > on tsv file
>> > To: user@drill.apache.org
>> >
>> >
>> > Hello all,
>> >
>> > I am querying select * from dfs.xxx where yyy (filter condition)
>> >
>> > I am using the dfs storage plugin that comes out of the box with Drill on
>> > a 1GB file, local to the Drill cluster.
>> > The 1GB file is split into 10 files of 100 MB each.
>> > As expected I see 11 minor and 2 major fragments.
>> > The Drill cluster is a 5-node cluster with 4 cores and 32 GB each.
>> >
>> > One observation is that the query plan time is more than 30 seconds. I
>> > ran the explain plan query to validate this.
>> > The query execution time is 2 secs.
>> > The total time taken is 32 secs.
>> >
>> > I wanted to understand how I can minimise the query plan time. Any
>> > suggestions?
>> > Is the time taken described above expected?
>> > Attached is the result from the explain plan query.
>> >
>> > Regards,
>> > Projjwal
>> >
>> >
>> >
>>
>
>
