Sorry for disrupting the thread.
I am wondering: would it be possible to use BRIN indexes to better estimate
the distribution?
I mean: create both a btree index and a BRIN index, probe the BRIN index
during planning, and estimate whether an abort-early plan using the btree
index would be better.
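A toy sketch of the probing idea above (this is not the PostgreSQL planner; the block size and data are made up for illustration). It simulates BRIN-style block-range summaries (min/max per block) and uses them to see whether rows matching a value cluster at one end of the table, which is exactly the situation where an abort-early scan from the other end reads almost everything before its first match:

```python
def build_ranges(values, block_size):
    """Summarize a column as (min, max) per block, like a BRIN index."""
    return [(min(values[i:i + block_size]), max(values[i:i + block_size]))
            for i in range(0, len(values), block_size)]

def first_possible_block(ranges, target):
    """Index of the first block whose range could contain target."""
    for i, (lo, hi) in enumerate(ranges):
        if lo <= target <= hi:
            return i
    return None

# 1000 rows where value 99 appears only near the end of the table.
column = [1] * 990 + [99] * 10
ranges = build_ranges(column, block_size=100)

# The probe shows the first candidate block is the last one, so an
# abort-early plan scanning from the start of the table would be a poor
# choice for this value.
frac_skippable = first_possible_block(ranges, 99) / len(ranges)
print(frac_skippable)  # 0.9: 90% of the table cannot contain a match
```

A real planner probe would read the BRIN summary pages rather than the heap, so the cost of the check stays proportional to the number of block ranges, not the number of rows.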
--
Sent via pgsql-performance mailing list
On Wed, Dec 17, 2014 at 4:55 PM, Simon Riggs si...@2ndquadrant.com wrote:
On 12 December 2014 at 03:31, Simon Riggs si...@2ndquadrant.com wrote:
Also attached is a new parameter called enable_sortedpath which can be
used to turn on/off the sorted path generated by the planner.
Now with attachment. (Thanks Jeff!)
--
Simon Riggs
On 30 September 2014 at 10:25, Simon Riggs si...@2ndquadrant.com wrote:
On 30 September 2014 00:00, Tom Lane t...@sss.pgh.pa.us wrote:
The existing cost estimation
code effectively assumes that they're perfectly uniformly distributed;
which is a good average-case assumption but can be horribly wrong in the
worst case.
On 12 December 2014 at 03:22, Simon Riggs si...@2ndquadrant.com wrote:
It's a simple patch, but it solves the test cases I know about and
does almost nothing to planning time.
Test cases attached. The files marked pettus_* are written up from
Christophe Pettus' blog.
The other test case is one
On 12/05/2014 08:04 AM, Simon Riggs wrote:
On 6 December 2014 at 00:45, Merlin Moncure mmonc...@gmail.com wrote:
Neat -- got any test cases (would this have prevented OP's problem)?
No test case was posted, so I am unable to confirm.
A test case I produced that appears to be the same issue is fixed.
On 10 December 2014 at 10:46, Josh Berkus j...@agliodbs.com wrote:
On 12/05/2014 08:04 AM, Simon Riggs wrote:
On 6 December 2014 at 00:45, Merlin Moncure mmonc...@gmail.com wrote:
Neat -- got any test cases (would this have prevented OP's problem)?
No test case was posted, so I am unable to confirm.
On Fri, Dec 5, 2014 at 12:46 AM, Simon Riggs si...@2ndquadrant.com wrote:
On 30 September 2014 at 05:53, Simon Riggs si...@2ndquadrant.com wrote:
On 29 September 2014 16:00, Merlin Moncure mmonc...@gmail.com wrote:
On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
The problem, as I see it, is different.
On 6 December 2014 at 00:45, Merlin Moncure mmonc...@gmail.com wrote:
Neat -- got any test cases (would this have prevented OP's problem)?
No test case was posted, so I am unable to confirm.
A test case I produced that appears to be the same issue is fixed.
I await confirmation from the OP.
On 30 September 2014 at 05:53, Simon Riggs si...@2ndquadrant.com wrote:
On 29 September 2014 16:00, Merlin Moncure mmonc...@gmail.com wrote:
On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
The problem, as I see it, is different. We assume that if there are
100 distinct values and you use LIMIT 1 that you would only need to
scan 1% of rows.
On 21.11.2014 19:38, Jeff Janes wrote:
When I run this patch on the regression database, I get a case where
the current method is exact but the adaptive one is off:
WARNING: ndistinct estimate current=676.00 adaptive=906.00
select count(distinct stringu1) from onek;
676
It should be
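For readers unfamiliar with "adaptive" ndistinct estimation as discussed here: the patch's actual estimator is not shown in this thread, but a standard sample-based approach (the Guaranteed-Error Estimator of Charikar et al., used here purely as an illustrative stand-in) scales up the values seen exactly once in the sample, since those are the ones most likely to have unseen siblings:

```python
from collections import Counter

def gee_ndistinct(sample, table_rows):
    """GEE sketch: D ~= sqrt(N/n) * f1 + sum of f_j for j >= 2, where
    f_j is the number of values seen exactly j times in the sample."""
    freq = Counter(Counter(sample).values())   # f_j histogram
    n = len(sample)
    f1 = freq.get(1, 0)
    rest = sum(count for j, count in freq.items() if j >= 2)
    return (table_rows / n) ** 0.5 * f1 + rest

# When every sampled value repeats, nothing is scaled up, so the estimate
# equals the distinct count actually observed.
print(gee_ndistinct([1, 1, 2, 2, 3, 3], table_rows=600))  # 3.0

# When every sampled value is unique, the estimate extrapolates sharply.
print(gee_ndistinct(list(range(10)), table_rows=1000))    # 100.0
```

The failure mode Jeff reports above (adaptive=906 when the exact answer is 676) is typical of such estimators: the scale-up term can overshoot when singletons in the sample do not actually imply unseen values.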
On Fri, Oct 10, 2014 at 10:53 AM, Tomas Vondra t...@fuzzy.cz wrote:
On 10.10.2014 14:10, Tomas Vondra wrote:
On 10 October 2014, 13:16, Greg Stark wrote:
On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a
On Sat, Oct 18, 2014 at 6:01 PM, Tomas Vondra t...@fuzzy.cz wrote:
Hmmm. I have 0 experience with handling patents and related issues. Any
idea how to address that?
Well there's no real way to address it. But to summarize: 1) We should
not go searching for patents, knowing that something is
On 17.10.2014 19:25, Greg Stark wrote:
On Wed, Oct 15, 2014 at 7:02 PM, Tomas Vondra t...@fuzzy.cz wrote:
If you know the title of the article, it's usually available
elsewhere on the web - either at the university site, or elsewhere.
I found these two articles about block-based sampling:
On Wed, Oct 15, 2014 at 7:02 PM, Tomas Vondra t...@fuzzy.cz wrote:
If you know the title of the article, it's usually available elsewhere
on the web - either at the university site, or elsewhere. I found these
two articles about block-based sampling:
On 10/10/2014 04:16 AM, Greg Stark wrote:
On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the table, we
can get a MUCH more accurate n_distinct
On 15.10.2014 19:20, Josh Berkus wrote:
On 10/10/2014 04:16 AM, Greg Stark wrote:
On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the table, we
On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the table, we
can get a MUCH more accurate n_distinct estimate using multiple
algorithms, of which
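The point about sample size can be made concrete with a quick simulation (the row counts and seed here are arbitrary, chosen only to illustrate the effect): on a column with many distinct values, a tiny fixed-size sample simply observes too few of them, forcing any estimator to extrapolate aggressively, while a 1% sample observes far more.

```python
import random

# 100,000 rows holding 10,000 distinct values; count how many distinct
# values a random sample of each size actually gets to see.
random.seed(7)
rows = [random.randrange(10_000) for _ in range(100_000)]

def distinct_seen(sample_size):
    """Distinct values observed in a random sample of the column."""
    return len(set(random.sample(rows, sample_size)))

tiny = distinct_seen(300)       # a "tiny, fixed-size" sample
one_pct = distinct_seen(1_000)  # a 1% sample
print(tiny, one_pct)  # the 1% sample observes many more distinct values
```

Both samples still see only a fraction of the 10,000 true distinct values; the difference is how much of the estimate must come from extrapolation rather than observation.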
On 10 October 2014, 13:16, Greg Stark wrote:
On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the table, we
can get a MUCH more accurate
On Fri, Oct 10, 2014 at 5:10 AM, Tomas Vondra t...@fuzzy.cz wrote:
I've gone looking for papers on this topic but from what I read this
isn't so. To get any noticeable improvement you need to read 10-50% of
the table and that's effectively the same as reading the entire table
-- and it
On 10.10.2014 16:21, Craig James wrote:
On Fri, Oct 10, 2014 at 5:10 AM, Tomas Vondra t...@fuzzy.cz wrote:
I've gone looking for papers on this topic but from what I read this
isn't so. To get any noticeable improvement you need to read 10-50% of
the
On 10.10.2014 14:10, Tomas Vondra wrote:
On 10 October 2014, 13:16, Greg Stark wrote:
On Thu, Oct 2, 2014 at 8:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the table,
On Fri, Oct 10, 2014 at 9:53 AM, Tomas Vondra t...@fuzzy.cz wrote:
On 10.10.2014 16:21, Craig James wrote:
Our index is for chemical structures. Chemicals are indexed on
chemical fragments
http://emolecules.com/info/molecular-informatics. A search
typically starts with 50-200 indexed
On 10.10.2014 19:59, Craig James wrote:
On Fri, Oct 10, 2014 at 9:53 AM, Tomas Vondra t...@fuzzy.cz wrote:
On 10.10.2014 16:21, Craig James wrote:
Our index is for chemical structures. Chemicals are indexed on
chemical fragments
On Thu, Oct 2, 2014 at 12:56 PM, Josh Berkus j...@agliodbs.com wrote:
On 10/02/2014 02:30 AM, Peter Geoghegan wrote:
On Thu, Oct 2, 2014 at 1:19 AM, Simon Riggs si...@2ndquadrant.com
wrote:
Having read papers on it, I believe the problem is intractable. Coding
is not the issue. To anyone: please prove me wrong, in detail, with
references so it can be coded.
On 3.10.2014 21:58, Jeff Janes wrote:
On Thu, Oct 2, 2014 at 12:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the table, we
can
On 3.10.2014 02:54, Peter Geoghegan wrote:
On Thu, Oct 2, 2014 at 12:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the
table, we can get a MUCH more accurate n_distinct
On 1 October 2014 19:56, Josh Berkus j...@agliodbs.com wrote:
On 09/30/2014 04:01 PM, Simon Riggs wrote:
On 30 September 2014 18:28, Jeff Janes jeff.ja...@gmail.com wrote:
Anyway, in the particular case I posted fixing n_distinct to realistic
numbers (%) fixed the query plan.
But wouldn't fixing the absolute number also have fixed the plan?
On Thu, Oct 2, 2014 at 1:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
I disagree that (1) is not worth fixing just because we've provided
users with an API to override the stats. It would unquestionably be
better for us to have a better n_distinct estimate in the first place.
Further, this
On 29/09/2014 9:00 AM, Merlin Moncure wrote:
On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
The problem, as I see it, is different. We assume that if there are
100 distinct values and you use LIMIT 1 that you would only need to
scan 1% of rows. We assume that the data
On 10/02/2014 02:30 AM, Peter Geoghegan wrote:
On Thu, Oct 2, 2014 at 1:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
Having read papers on it, I believe the problem is intractable. Coding
is not the issue. To anyone: please prove me wrong, in detail, with
references so it can be coded.
I
On Thu, Oct 2, 2014 at 12:56 PM, Josh Berkus j...@agliodbs.com wrote:
Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the table, we
can get a MUCH more accurate n_distinct estimate using multiple
algorithms, of which
On 09/30/2014 04:01 PM, Simon Riggs wrote:
On 30 September 2014 18:28, Jeff Janes jeff.ja...@gmail.com wrote:
Anyway, in the particular case I posted fixing n_distinct to realistic
numbers (%) fixed the query plan.
But wouldn't fixing the absolute number also have fixed the plan?
There are two causes of this issue.
On 29 September 2014 22:54, Josh Berkus j...@agliodbs.com wrote:
On 09/26/2014 01:06 AM, Simon Riggs wrote:
On 23 September 2014 00:56, Josh Berkus j...@agliodbs.com wrote:
We've hashed that out a bit, but frankly I think it's much more
profitable to pursue fixing the actual problem than providing a
workaround like risk.
On 30 September 2014 00:00, Tom Lane t...@sss.pgh.pa.us wrote:
Simon Riggs si...@2ndquadrant.com writes:
The way I'm seeing it, you can't assume the LIMIT will apply to any
IndexScan that doesn't have an index condition. If it has just a
filter, or nothing at all, just an ordering then it could easily scan
the whole index if the stats are wrong.
The existing cost estimation
code effectively assumes that they're perfectly uniformly distributed;
which is a good average-case assumption but can be horribly wrong in
the worst case.
Sorry, just an outsider jumping in with a quick comment.
Every year or two the core count goes up.
On Tue, Sep 30, 2014 at 8:34 AM, Graeme B. Bell g...@skogoglandskap.no wrote:
The existing cost estimation
code effectively assumes that they're perfectly uniformly distributed;
which is a good average-case assumption but can be horribly wrong in
the worst case.
Sorry, just an outsider jumping in with a quick comment.
Graeme B. Bell g...@skogoglandskap.no writes:
Every year or two the core count goes up. Can/should/does postgres ever
attempt two strategies in parallel, in cases where strategy A is generally
good but strategy B prevents bad worst case behaviour? Kind of like a
Schrödinger's Cat approach
On Mon, Sep 29, 2014 at 7:12 PM, Gavin Flower gavinflo...@archidevsys.co.nz
wrote:
Would it be feasible to get a competent statistician to advise what data to
collect, and to analyze it? Maybe it is possible to get a better estimate on
how much of a table needs to be scanned, based on
On Mon, Sep 29, 2014 at 2:54 PM, Josh Berkus j...@agliodbs.com wrote:
On 09/26/2014 01:06 AM, Simon Riggs wrote:
On 23 September 2014 00:56, Josh Berkus j...@agliodbs.com wrote:
We've hashed that out a bit, but frankly I think it's much more
profitable to pursue fixing the actual problem than providing a
workaround like risk.
Thanks for your replies everyone.
You can't run two plans and have them both returning rows to the client,
That wasn't what I had in mind.
I can envisage cases where the worst case behaviour of one plan results in
zero rows by the time the alternative plan has generated the complete
result.
On 01/10/14 05:54, Jeff Janes wrote:
On Mon, Sep 29, 2014 at 7:12 PM, Gavin Flower
gavinflo...@archidevsys.co.nz
wrote:
Would it be feasible to get a competent statistician to advise what data to
collect, and to analyze it? Maybe it is possible to
On Tue, Sep 30, 2014 at 11:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:
On Mon, Sep 29, 2014 at 7:12 PM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:
Would it be feasible to get a competent statistician to advise what data
to collect, and to analyze it? Maybe it is possible to get a
On 30 September 2014 18:28, Jeff Janes jeff.ja...@gmail.com wrote:
Anyway, in the particular case I posted fixing n_distinct to realistic
numbers (%) fixed the query plan.
But wouldn't fixing the absolute number also have fixed the plan?
There are two causes of this issue.
1. Poor
On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
The problem, as I see it, is different. We assume that if there are
100 distinct values and you use LIMIT 1 that you would only need to
scan 1% of rows. We assume that the data is arranged in the table in a
very
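Simon's point can be put in arithmetic with a toy model (not the planner's real cost function; the row counts are made up). Under the uniform-interleaving assumption, matches for one of D values appear roughly once every D rows, so an abort-early scan finds LIMIT k matches after about k * D rows. If every match instead sits at the end of the scan order, the scan wades through nearly the whole table first:

```python
def rows_scanned(n_rows, n_distinct, limit, clustered_at_end=False):
    """Rows an abort-early scan reads before finding `limit` matches of
    one value, under two opposite layouts of the matching rows."""
    per_value = n_rows // n_distinct
    if clustered_at_end:
        # Worst case: every non-matching row comes first.
        return (n_rows - per_value) + limit
    # Uniform case: one match roughly every n_distinct rows.
    return n_distinct * limit

n = 1_000_000
print(rows_scanned(n, 100, 1))                         # 100
print(rows_scanned(n, 100, 1, clustered_at_end=True))  # 990001
```

The gap between 100 rows and 990,001 rows for the same query is the "horribly wrong in the worst case" that Tom describes elsewhere in the thread.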
On 29 September 2014 16:00, Merlin Moncure mmonc...@gmail.com wrote:
On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
The problem, as I see it, is different. We assume that if there are
100 distinct values and you use LIMIT 1 that you would only need to
scan 1% of rows.
On 09/26/2014 01:06 AM, Simon Riggs wrote:
On 23 September 2014 00:56, Josh Berkus j...@agliodbs.com wrote:
We've hashed that out a bit, but frankly I think it's much more
profitable to pursue fixing the actual problem than providing a
workaround like risk, such as:
a) fixing n_distinct estimation
Simon Riggs si...@2ndquadrant.com writes:
The way I'm seeing it, you can't assume the LIMIT will apply to any
IndexScan that doesn't have an index condition. If it has just a
filter, or nothing at all, just an ordering then it could easily scan
the whole index if the stats are wrong.
That
On 30/09/14 12:00, Tom Lane wrote:
Simon Riggs si...@2ndquadrant.com writes:
The way I'm seeing it, you can't assume the LIMIT will apply to any
IndexScan that doesn't have an index condition. If it has just a
filter, or nothing at all, just an ordering then it could easily scan
the whole index if the stats are wrong.
On Fri, Sep 26, 2014 at 9:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
If we can at least agree it is a problem, we can try to move forwards.
Well that's a good question. I don't think we do and I think the
reason why is because we haven't actually pinned down exactly what is
the problem.
On 23 September 2014 00:56, Josh Berkus j...@agliodbs.com wrote:
We've hashed that out a bit, but frankly I think it's much more
profitable to pursue fixing the actual problem than providing a
workaround like risk, such as:
a) fixing n_distinct estimation
b) estimating stacked quals using
On Sat, Sep 20, 2014 at 1:33 PM, Josh Berkus j...@agliodbs.com wrote:
For example, we could increase the estimated cost
for an abort-early index scan by 10X, to reflect our weak confidence in
its correctness.
Has any progress been made on the performance farm? The problem with
suggestions like this (which seem pretty reasonable to me) is that we've
got no way of quantifying the downside.
On 09/22/2014 06:55 AM, Merlin Moncure wrote:
Has any progress been made on the performance farm? The problem with
suggestions like this (which seem pretty reasonable to me) is that
we've got no way of quantifying the downside.
Yeah, that's certainly an issue. The problem is that we'd need a
On 19 Sep 2014 19:40, Josh Berkus j...@agliodbs.com wrote:
On 09/19/2014 10:15 AM, Merlin Moncure wrote:
On Wed, Sep 17, 2014 at 7:11 PM, Josh Berkus j...@agliodbs.com wrote:
This is the core issue with abort-early plans; they depend on our
statistics being extremely accurate, which we know they are not.
On Sat, Sep 20, 2014 at 3:38 AM, Greg Stark st...@mit.edu wrote:
Is there a canonical case of where 'abort early' plans help? (I'm new
to that term -- is it a recent planner innovation...got any handy
links?)
Yeah, here's an example of the canonical case:
Table t1 ( a, b, c )
- b is low-cardinality
Greg Stark st...@mit.edu writes:
On 19 Sep 2014 19:40, Josh Berkus j...@agliodbs.com wrote:
Yeah, here's an example of the canonical case:
Table t1 ( a, b, c )
- b is low-cardinality
- c is high-cardinality
- There are separate indexes on both b and c.
SELECT a, b, c FROM t1
WHERE b
On 09/19/2014 11:38 PM, Greg Stark wrote:
On 19 Sep 2014 19:40, Josh Berkus j...@agliodbs.com wrote:
On 09/19/2014 10:15 AM, Merlin Moncure wrote:
On Wed, Sep 17, 2014 at 7:11 PM, Josh Berkus j...@agliodbs.com wrote:
This is the core issue with abort-early plans; they depend on our
statistics being extremely accurate, which we know they are not.
On 09/19/2014 10:15 AM, Merlin Moncure wrote:
On Wed, Sep 17, 2014 at 7:11 PM, Josh Berkus j...@agliodbs.com wrote:
This is the core issue with abort-early plans; they depend on our
statistics being extremely accurate, which we know they are not. And if
they're wrong, the execution time climbs