Thanks James, will check on it.
Regards
Prabhakar
On Thu, May 4, 2023 at 4:36 PM James Turton wrote:
The cost you've found is just an estimate produced by the planner and
not a good number to performance tune with. I recommend reviewing the
fragment and operator metrics written to the query profile after
execution. It also helps to isolate the query so that it is the only thing
running.
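For reference, the planner's estimate (including the cumulative cost) can be inspected without executing the query via Drill's EXPLAIN syntax; the table name below is a made-up example, not one from this thread:

```sql
-- Print the plan with the planner's cost attributes attached to each
-- operator; nothing is executed. The table name is hypothetical.
EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR
SELECT count(*) FROM dfs.tmp.`t`;
```

The runtime fragment and operator metrics mentioned above live in the query profile, which the Drill web UI exposes after execution.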
Hi Team,
I am trying to understand the IO and network cost of a query. I
could find the cumulative cost but was not able to interpret it.
Please advise; any documentation pointer will help. Thanks.
Regards
Prabhakar
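As background for interpreting that number: the cumulative cost the planner prints is a five-part vector of estimates, `{rows, cpu, io, network, memory}` (one such line appears in an older thread below). A tiny illustrative parser, not any Drill API, just to name the components:

```python
import re

# A planner cost line as it appears in EXPLAIN output (copied from a
# message later in this archive).
cost_line = ("cumulative cost = {5.281182067E9 rows, 3.1687092402E10 cpu, "
             "0.0 io, 0.0 network, 0.0 memory}, id = 7674")

def parse_cost(line):
    """Extract the {value unit, ...} pairs into a dict of floats."""
    body = re.search(r"\{([^}]*)\}", line).group(1)
    pairs = (part.split() for part in body.split(","))
    return {unit: float(value) for value, unit in pairs}

cost = parse_cost(cost_line)
print(sorted(cost))   # ['cpu', 'io', 'memory', 'network', 'rows']
```

In that example io and network are 0.0 because the planner estimated no disk or exchange cost for those operators; either way these are planning estimates, not measurements.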
store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1

More than 30 secs is unaccounted for. Can you turn on the root logger to
be at the debug level and retry the explain plan?
Kunal Khatua

____
From: rahul challapalli
Sent: Tuesday, March 7, 2017 5:24:43 AM
To: user
Subject: Re: Minimise query plan time for dfs plugin for local file system
on tsv file
I did not get a chance to review the log file.

However, the next thing I would try is to make your cluster a single-node
cluster first and then run the same explain plan query separately on each
individual file.
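That per-file check can be scripted; a rough sketch where the data directory, file names, and the `dfs` workspace path are assumptions, not details from this thread. The generated statements would be fed to sqlline on the single-node cluster:

```shell
# Emit one EXPLAIN PLAN statement per part file. Directory and file
# names are hypothetical stand-ins.
DATA_DIR=/tmp/tsv_parts
mkdir -p "$DATA_DIR"
touch "$DATA_DIR/part_1.tsv" "$DATA_DIR/part_2.tsv"   # stand-in files

for f in "$DATA_DIR"/part_*.tsv; do
  printf 'EXPLAIN PLAN FOR SELECT * FROM dfs.`%s`;\n' "$f"
done
```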
Like gautam suggested, it would be good to take a look at the
drillbit.log file (from the foreman node where planning occurred) and the
query profile as well.

- Rahul

On Mon, Mar 6, 2017 at 9:30 AM, Gautam Parai wrote:
Can you please provide the drillbit.log file?

Gautam

From: PROJJWAL SAHA
Sent: Monday, March 6, 2017 1:45:38 AM
To: user@drill.apache.org
Subject: Fwd: Minimise query plan time for dfs plugin for local file
system on tsv file

All, please help me in giving suggestions on what areas I can look into for
why the query planning time is taking so long. I am querying a tsv file on
the local file system through the dfs storage plugin.
Query planning time is approx 30 secs.
Query execution time is approx 1.5 secs.

Regards,
Projjwal

-- Forwarded message --
From: PROJJWAL SAHA
Date: Fri, Mar 3, 2017 at 5:06 PM
Subject: Minimise query plan time for dfs plugin for local file system on
tsv file
To: user@drill.apache.org
I have a 1GB file, local to the drill cluster.
The 1GB file is split into 10 files of 100 MB each.
As expected I see 11 minor and 2 major fragments.
The drill cluster is a 5-node cluster with 4 cores and 32 GB each.

One observation is that the query plan time is more than 30 seconds. I ran
the explain plan query to validate this.
The query execution time is 2 secs.
Total time taken is 32 secs.
I wanted to understand how I can minimise the query plan time.
Query plan caching is typically very useful if you have reporting-type
queries where the query patterns are fixed and vary mostly on filter
conditions. For ad-hoc queries or data exploration use cases where there may
not be fixed query patterns, this technique is not as useful.
I have not seen any specific effort to cache query plans. However, since
Drill is used for analytical purposes, the query volumes are substantially
lower than in transactional RDBMSs, where query plan caching can be critical.
Are you potentially seeing an issue with query volume or query planning time?
Currently not to my knowledge. Are there queries you are seeing that are
taking an abnormally long time to plan?
On Thu, Jan 14, 2016 at 6:25 AM, Vince Gonzalez wrote:
Does Drill do any caching of query plans?
Team, I created a feature request
https://issues.apache.org/jira/browse/DRILL-3710 to make the "20" magic
number configurable so that we do not need to add junk/duplicate in-list
elements.
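The workaround the ticket makes configurable — padding an IN list with duplicate elements until it reaches the threshold (20, per a note elsewhere in this thread) at which Drill switches from a chain of OR conditions to the more efficient Values operator — can be sketched as follows. The helper name is mine, not a Drill API; duplicates do not change IN semantics:

```python
def pad_in_list(values, threshold=20):
    """Pad an IN list with duplicates of its first element until it has at
    least `threshold` entries, nudging the planner onto the Values-operator
    path instead of an expanded OR chain."""
    values = list(values)
    if not values:
        raise ValueError("IN list must be non-empty")
    while len(values) < threshold:
        values.append(values[0])
    return values

codes = ['39891', '4280', '4281', '42820', '42821']
padded = pad_in_list(codes)
print(len(padded), len(set(padded)))   # 20 5
```

The padded list would then be spliced back into the SQL `IN (...)` clause.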
On Tue, Aug 25, 2015 at 4:49 PM, Sungwook Yoon wrote:
> This trick definitely helps...
This trick definitely helps...
Not as fast as I want it to be, but it helps...
Sungwook
On Tue, Aug 25, 2015 at 4:25 PM, Jinfeng Ni wrote:
Hi Sungwook,

I looked at the query profile. It did show that the bottleneck is the Filter
operator, which, running on a 4-node EC2 cluster (?), took about 82 minutes
to process approximately 2.5 billion records in total. I'm not sure how
powerful each node's CPU is.
Hi Sungwook,

Can you upload the profile file to a public place? The user mailing list
does not allow attaching files.

On Tue, Aug 25, 2015 at 10:58 AM, Sungwook Yoon wrote:
Jacques,
Here is the physical profile for this query,
I attached the last 1000 lines of the drillbit.log from the foreman.
Also attached is a picture of the filter minor fragment information.
We have plenty of memory and Drill is not using them all.
Let me know your thoughts.
Thanks,
I still think a profile analysis would be good. Can you post the following
on gist:
- profile for this query
- source code for the slow filter operation (turn on debug logging to get
this)
thanks,
Jacques
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Tue, Aug 25, 2015 at 8:16 AM, Sungwook Yoon wrote:
Understood...

I can see Jacques and Jinfeng's point of view.

I would not bother thinking about this query, but some claim that this
query ran orders of magnitude faster on some other SQL engine...
I don't know how they got that performance from this query...

Sungwook
CONVERT_FROMUTF8 is a trivial operation. I would be surprised if that was
the bottleneck. See here [1] for the implementation. I'm guessing that
the problem is simply that the best alternative on this structure would be
a non-equality join.
I looked at the two physical plans you posted. They seem to be different:
the first one has multiple CONVERT_FROMUTF8() calls, while the second does
not.
1.
Filter(condition=[AND(>=(CAST($0):INTEGER, 2009), <=(CAST($0):INTEGER,
2013), OR(=(CONVERT_FROMUTF8($1), '39891'), =(CONVERT_FROMUTF8($1
Aman,
Sorry for slow reply.
Here is the SQL query.
SELECT count(1) FROM `o` a WHERE (a.DX1 IN ('39891', '4280', '4281',
'42820', '42821', '42822', '42823', '42830', '42831', '42832', '42833',
'42840', '42841', '42842', '42843', '4289') OR a.DX2 IN ('39891', '4280',
'4281', '42820', '42821', '428
On Mon, Aug 24, 2015 at 9:32 PM, Jacques Nadeau wrote:
> I think the issue in many cases is not generating the right data. It more
> often is generating the right query. The easiest way to solve these issues
> is to continue to get real-world example queries like the one Sungwook
> provided above.
On Mon, Aug 24, 2015 at 9:32 PM, Jacques Nadeau wrote:
> Good plug for log-synth. I'm still hoping for a storage plugin... (I
> suppose I have to write some docs on the api first :)
>
I'll write it as soon as I can figure out how. :-)
Just a plug here for a tool that started as a part of Drill and which might
help people build test cases.

Log-synth makes it very easy to build realistic data, flat or nested. It
creates realistic data such as street addresses, zip codes and SSNs. It
also includes lots of tables of auxiliary data.
I was about to say that for IN lists of size 20 or more, Drill uses a more
efficient Values operator instead of OR conditions, but then realized the OR
filter is referencing 4 different columns ($1..$4) and each of those
individual lists is less than 20. Sungwook, can you please provide the
SQL query?
On Mon, Aug 24, 2015 at 4:50 PM, Sungwook Yoon wrote:
> Still, the performance drop due to OR filtering is just astounding...
That is what query optimizers are for and why getting them to work well is
important.
The difference in performance that you are observing is not surprising,
given the above.
cumulative cost = {5.281182067E9 rows, 3.1687092402E10 cpu, 0.0 io,
0.0 network, 0.0 memory}, id = 7674

Sungwook
> On Mon, Aug 24, 2015 at 4:18 PM, Jinfeng Ni wrote:
>
> > Can you post the plan including the HBaseScan part?
In my understanding, currently, if part of the filter is pushed
into the Scan operator, it will remain in the Filter operator, although the
filter in the Scan should have pruned out the rows which do not qualify for
the filter condition.
Aman

On Mon, Aug 24, 2015 at 1:34 PM, Sungwook Yoon wrote:
Hi,
I have a query, doing something like
a in (v1, v2, v3, v15)
The physical query plan looks like the following.
Filter(condition=[AND(>=(CAST($0):INTEGER, 2009), <=(CAST($0):INTEGER,
2013), OR(=(CONVERT_FROMUTF8($1), '39891'), =(CONVERT_FROMUTF8($1),
'4280
Hi guys,

I raised:
https://issues.apache.org/jira/browse/DRILL-2732

I'd be keen to get stuck in and implement a patch for this, but I was
hoping someone might be able to point me in the right direction.

The behaviour seems extremely strange that a field name could affect a
query plan. Ultimately, the Mongo filter pushdown optimization rule (which
is the real reason this is causing me pain) should handle cases like this.