Re: [PERFORM] Bulk persistence strategy

2017-05-22 Thread Simon Riggs
On 22 May 2017 at 09:06, Riaan Stander <rstan...@exa.co.za> wrote:

>> It sounds like you don't know about anonymous code blocks with DO
>> https://www.postgresql.org/docs/devel/static/sql-do.html
>>
>
> Yes I do know about that feature. My first implementation generated an
> anonymous code block, but to my utter dismay once I tried actually doing
> parameter binding from the application it did not work. This seems to be a
> Postgres limitation actually stated in the documentation. The anonymous code
> block is treated as a function body with no parameters.
>
> Thanks for the suggestion though.

Perhaps we should look into parameterisable DO statements.
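
For readers of the archive, a minimal sketch of the limitation being discussed
(table and column names are made up): a DO block can chain statements and pass
values between them server-side, but because the body is treated as a
parameterless function, client-side parameter binding ($1, $2, ...) is not
available inside it, so values have to be interpolated into the SQL text.

DO $$
DECLARE
    new_id bigint;
BEGIN
    -- values must be written into the text; they cannot be bound as $1, $2
    INSERT INTO orders (customer_id) VALUES (42) RETURNING id INTO new_id;
    INSERT INTO order_audit (order_id, note) VALUES (new_id, 'created');
END
$$;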

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Bulk persistence strategy

2017-05-21 Thread Simon Riggs
On 22 May 2017 at 03:14, Riaan Stander <rstan...@exa.co.za> wrote:
>
>> Riaan Stander <rstan...@exa.co.za> writes:
>>>
>>> The intended use is use-once. The reason is that the statements might
>>> differ per call, especially when we start doing updates. The ideal would
>>> be to just issue the sql statements, but I was trying to cut down on
>>> network calls. To batch them together and get output from one query as
>>> input for the others (declare variables), I have to wrap them in a
>>> function in Postgres. Or am I missing something? In SQL Server TSQL I
>>> could declare variables in any statement as required.
>>
>> Hm, well, feeding data forward to the next query without a network
>> round trip is a valid concern.
>>
>> How stylized are these commands?  Have you considered pushing the
>> generation logic into the function, so that you just have one (or
>> a few) persistent functions, and the variability slack is taken
>> up through EXECUTE'd strings?  That'd likely be significantly
>> more efficient than one-use functions.  Even disregarding the
>> pg_proc update traffic, plpgsql isn't going to shine in that usage
>> because it's optimized for repeated execution of functions.
>>
>> regards, tom lane
>
> The commands are generated from a complex object/type in the application.
> Some of them can be quite large. With modifications they do state tracking
> too, so that we only update fields that actually changed and can do
> optimistic concurrency checking.
>
> It'll probably make more sense to try to create a function per type of object
> that deals with the query generation. That way I can create a Postgres type
> that maps from the application object.
>
> Thanks for the advice. I'll give that a shot.
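
A rough sketch of the persistent-function approach Tom describes above
(function name, signature and tables are purely illustrative, not from the
thread): one stable plpgsql function receives the variable parts and builds
the statement with EXECUTE, so the per-call SQL can differ without creating a
throwaway function each time.

CREATE OR REPLACE FUNCTION save_object(tbl text, payload jsonb)
RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    new_id bigint;
BEGIN
    -- the variable part is assembled here and executed dynamically
    EXECUTE format('INSERT INTO %I (data) VALUES ($1) RETURNING id', tbl)
        INTO new_id
        USING payload;
    RETURN new_id;
END
$$;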

It sounds like you don't know about anonymous code blocks with DO
https://www.postgresql.org/docs/devel/static/sql-do.html

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Backup taking long time !!!

2017-01-23 Thread Simon Riggs
On 23 January 2017 at 17:12, Jeff Janes <jeff.ja...@gmail.com> wrote:

>> Just to make sure anyone reading the mailing list archives isn't
>> confused, running pg_start_backup does *not* make PG stop writing to
>> BASEDIR (or DATADIR, or anything, really).  PG *will* continue to write
>> data into BASEDIR after pg_start_backup has been called.
>
>
>
> Correct.  Unfortunately it is a very common myth that it does cause
> PostgreSQL to stop writing to the base dir.

Never heard that one before. Wow. Who's been saying that?

It's taken me years to hunt down all invalid backup memes and terminate them.

Never fails to surprise me how many people don't read the docs.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Understanding BRIN index performance

2016-10-03 Thread Simon Riggs
On 3 October 2016 at 10:58, Ivan Voras <ivo...@gmail.com> wrote:

> I get that, my question was more about why the index scan returned 25 mil
> rows, when the pages are sequentially filled by timestamps? In my
> understanding of BRIN, it should have returned a small number of pages which
> would have been filtered (and sorted) for the exact data, right?

That could be most simply explained if the distribution of your data
is not what you think it is.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Understanding BRIN index performance

2016-10-03 Thread Simon Riggs
On 3 October 2016 at 10:00, Ivan Voras <ivo...@gmail.com> wrote:
> Hi,
>
> I have a table of around 20 G, more than 220 million records, and I'm
> running this query on it:
>
> explain analyze SELECT MAX(id) - (SELECT id FROM expl_transactions WHERE
> dateAdded < (now() - INTERVAL '10 MINUTES') ORDER BY dateAdded DESC LIMIT 1)
> FROM expl_transactions;
>
> "id" is SERIAL, "dateAdded" is timestamp without time zone
>
> The "dateAdded" field also has a "default now()" applied to it some time
> after its creation, and a fair amount of null values in the records (which I
> don't think matters for this query, but maybe I'm wrong).
>
> My first idea is to create a default BRIN index on dateAdded since the above
> query is not run frequently. To my surprise, the planner refused to use the
> index and used sequential scan instead. When I forced sequential scanning
> off, I got this:
>
> https://explain.depesz.com/s/W8oo
>
> The query was executing for 40+ seconds. It seems like the "index scan" on
> it returns nearly 9% of the table, 25 mil rows. Since the data in dateAdded
> actually is sequential and fairly selective (having now() as the default
> over a long period of time), this surprises me.
>
> With a normal btree index, of course, it runs fine:
>
> https://explain.depesz.com/s/TB5

Btree retains ordering, BRIN does not.

We've discussed optimizing the sort based upon BRIN metadata, but
that's not implemented yet.
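
For illustration, using the table from the question (index names invented):
both indexes can be built, but only the btree can feed the
ORDER BY dateAdded DESC LIMIT 1 directly; the BRIN index can only narrow the
scan to candidate block ranges, after which the matching rows still have to be
read and sorted.

CREATE INDEX expl_transactions_dateadded_brin  ON expl_transactions USING brin (dateAdded);
CREATE INDEX expl_transactions_dateadded_btree ON expl_transactions (dateAdded);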

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Millions of tables

2016-09-29 Thread Simon Riggs
On 26 September 2016 at 05:19, Greg Spiegelberg <gspiegelb...@gmail.com> wrote:
> I did look at PostgresXL and CitusDB.  Both are admirable however neither
> could support the need to read a random record consistently under 30ms.
> It's a similar problem Cassandra and others have: network latency.  At this
> scale, to provide the ability to access any given record amongst trillions
> it is imperative to know precisely where it is stored (system & database)
> and read a relatively small index.  I have other requirements that prohibit
> use of any technology that is eventually consistent.

Then XL is exactly what you need, since it does allow you to calculate
exactly where the record is via hash and then access it, which makes
the request just a single datanode task.

XL is not the same as CitusDB.

> I liken the problem to fishing.  To find a particular fish of length, size,
> color  in a data lake you must accept the possibility of scanning the
> entire lake.  However, if all fish were in barrels where each barrel had a
> particular kind of fish of specific length, size, color  then the problem
> is far simpler.

The task of putting the fish in the appropriate barrel is quite hard.
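
A hedged sketch of the Postgres-XL feature being referred to (the table itself
is illustrative): with hash distribution the coordinator can compute which
datanode holds a given key, so a single-key read is routed to exactly one node
rather than broadcast.

CREATE TABLE readings (
    id  bigint PRIMARY KEY,
    val double precision
) DISTRIBUTE BY HASH (id);

-- a lookup by id touches only the datanode that owns hash(id)
SELECT val FROM readings WHERE id = 123456789;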

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-16 Thread Simon Riggs
On 12 December 2014 at 03:31, Simon Riggs si...@2ndquadrant.com wrote:

 Also attached is a new parameter called enable_sortedpath which can be
 used to turn on/off the sorted path generated by the planner.

Now with attachment. (Thanks Jeff!)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


enable_sorted_path.v1a.patch
Description: Binary data



Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-11 Thread Simon Riggs
On 30 September 2014 at 10:25, Simon Riggs si...@2ndquadrant.com wrote:
 On 30 September 2014 00:00, Tom Lane t...@sss.pgh.pa.us wrote:

 The existing cost estimation
 code effectively assumes that they're perfectly uniformly distributed;
 which is a good average-case assumption but can be horribly wrong in
 the worst case.

 Agreed. This is the main observation from which we can work.

 If we could settle on some other model for the probable distribution
 of the matching tuples, we could adjust the cost estimates for LIMIT
 accordingly.  I have not enough statistics background to know what a
 realistic alternative would be.

 I'm not sure that the correlation alone is sufficient to be able to do
 that. We'd need to estimate where the values looked for are likely to
 be wrt other values, then increase estimate accordingly. That sounds
 like a lot of pushups grovelling through quals and comparing against
 stats. So my thinking is actually to rule that out, unless you've some
 ideas for how to do that?

 Another possibility is to still assume a uniform distribution but estimate
 for, say, a 90% probability instead of 50% probability that we'll find
 enough tuples after scanning X amount of the table.  Again, I'm not too
 sure what that translates to in terms of the actual math, but it sounds
 like something a statistics person could do in their sleep.


The problem is one of risk. Whatever distribution we use, it will be
wrong in some cases and good in others.

For example, if we look at "10 Most Recent Calls" for a user, then
frequent users would have one distribution, infrequent users another.
So we have multiple distributions in the same data. We just can't hold
enough information to make sense of this.

Think about how much data needs to be scanned if the user has only done 9 calls.

What I've done in the past is to rewrite the query in different ways
to force different plans, then call each plan depending upon the user
characteristics. This can also be done with hints, in a more
ignorant way.
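
One rewrite of that kind, as an illustration only (not taken from the thread;
names are made up): wrapping the ORDER BY expression stops it matching the
index, so the planner has to use a filter-then-sort plan, which is the safer
choice for infrequent users.

SELECT *
FROM   calls
WHERE  user_id = $1
ORDER  BY call_time + interval '0 seconds' DESC
LIMIT  10;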


 I do not think we should estimate for the worst case though.  If we do,
 we'll hear cries of anguish from a lot of people, including many of the
 same ones complaining now, because the planner stopped picking fast-start
 plans even for cases where they are orders of magnitude faster than the
 alternatives.

 Fast start plans still make sense when performing an IndexScan with no
 filter conditions. Those types of plan should not be changed from
 current costing - they are accurate, good and very important because
 of their frequency in real workloads.

 What I think we are seeing is Ordered plans being selected too often
 in preference to Sorted plans when we make selectivity or stats
 errors. As well as data distributions that aren't correctly described
 by the statistics causing much longer execution times.

 Here are some plan selection strategies

 * Cost based - attempt to exactly calculate the cost based upon
 existing stats - increase the complexity of cost calc to cover other
 aspects. Even if we do that, these may not be that helpful in covering
 the cases where the stats turn out to be wrong.

 * Risk based - A risk adjusted viewpoint would be that we should treat
 the cost as mid-way between the best and the worst. The worst is
 clearly scanning (100% - N) of the tuples, the best is just N tuples.
 So we should be costing scans with excess filter conditions as a (100%
 Scan)/2, no matter the conditions, based purely upon risk.

 * Simplified heuristic - deselect ordered plans when they are driven
 from scans without quals or indexscans with filters, since the risk
 adjusted cost is likely to be higher than the sorted cost. Inspecting
 the plan tree for this could be quite costly, so would only be done
 when the total cost is $high, prior to it being adjusted by LIMIT.


 In terms of practical steps... I suggest the following:

 * Implement enable_orderedscan = on (default) | off. A switch to allow
 plans to de-select ordered plans, so we can more easily see the
 effects of such plans in the wild.

 * Code heuristic approach - I can see where to add my heuristic in the
 grouping planner. So we just need to do a left? deep search of the
 plan tree looking for scans of the appropriate type and bail out if we
 find one.

After looking at this for some time I now have a patch that solves this.

It relies on the observation that index scans with no bounded quals
don't play nicely with LIMIT. The solution relies upon the point that
LIMIT does not reduce the startup cost of plans, only the total cost.
So we can solve the problem by keeping the total cost estimate, just
move some of that into startup cost so LIMIT does not reduce costs as
much as before.

It's a simple patch, but it solves the test cases I know about and
does almost nothing to planning time.

I tried much less subtle approaches involving direct prevention of
LIMIT pushdown but the code was much too complex for my liking.

- - -

The only

Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-11 Thread Simon Riggs
On 12 December 2014 at 03:22, Simon Riggs si...@2ndquadrant.com wrote:

 It's a simple patch, but it solves the test cases I know about and
 does almost nothing to planning time.

Test cases attached. The files marked pettus_* are written up from
Christophe Pettus' blog.
The other test case is one of my own devising, based upon recent
customer problems.

The 10 most recent calls is a restatement of actual problems seen in the past.


Also attached is a new parameter called enable_sortedpath which can be
used to turn on/off the sorted path generated by the planner.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


sortedpath.sql
Description: Binary data


pettus_limit.sql
Description: Binary data


pettus_sel.sql
Description: Binary data



Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-09 Thread Simon Riggs
On 10 December 2014 at 10:46, Josh Berkus j...@agliodbs.com wrote:
 On 12/05/2014 08:04 AM, Simon Riggs wrote:
 On 6 December 2014 at 00:45, Merlin Moncure mmonc...@gmail.com wrote:

 Neat -- got any test cases (would this have prevented OP's problem)?

 No test case was posted, so I am unable to confirm.

 A test case I produced that appears to be the same issue is fixed.

 I await confirmation from the OP.


 So that's proprietary/confidential data.  However, the company involved
 has a large testbed and I could test their data using a patched version
 of Postgres.   In 3 months their data distribution has drifted, so I'll
 need to do some work to recreate the original bad plan circumstances.
 I'll keep you posted on how the patch works for that setup.

 It would be great to come up with a generic/public test for a bad
 abort-early situation.  Ideas?

If you could contribute that, it would be welcome.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-05 Thread Simon Riggs
On 6 December 2014 at 00:45, Merlin Moncure mmonc...@gmail.com wrote:

 Neat -- got any test cases (would this have prevented OP's problem)?

No test case was posted, so I am unable to confirm.

A test case I produced that appears to be the same issue is fixed.

I await confirmation from the OP.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-12-04 Thread Simon Riggs
On 30 September 2014 at 05:53, Simon Riggs si...@2ndquadrant.com wrote:
 On 29 September 2014 16:00, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The problem, as I see it, is different. We assume that if there are
 100 distinct values and you use LIMIT 1 that you would only need to
 scan 1% of rows. We assume that the data is arranged in the table in a
 very homogenous layout. When data is not, and it seldom is, we get
 problems.

 Hm, good point -- 'data proximity'.  At least in theory, can't this be
 measured and quantified?  For example, given a number of distinct
 values, you could estimate the % of pages read (or maybe non
 sequential seeks relative to the number of pages) you'd need to read
 all instances of a particular value in the average (or perhaps the
 worst) case.   One way of trying to calculate that would be to look at
 proximity of values in sampled pages (and maybe a penalty assigned for
 high update activity relative to table size).  Data proximity would
 then become a cost coefficient to the benefits of LIMIT.

 The necessary first step to this is to realise that we can't simply
 apply the LIMIT as a reduction in query cost, in all cases.

 The way I'm seeing it, you can't assume the LIMIT will apply to any
 IndexScan that doesn't have an index condition. If it has just a
 filter, or nothing at all, just an ordering then it could easily scan
 the whole index if the stats are wrong.

 So plans like this could be wrong, by assuming the scan will end
 earlier because of the LIMIT than it actually will.

 Limit
   IndexScan (no index cond)

 Limit
   NestJoin
     IndexScan (no index cond)
     SomeScan

 Limit
   NestJoin
     NestJoin
       IndexScan (no index cond)
       SomeScan
     SomeScan

 and deeper...

 I'm looking for a way to identify and exclude such plans, assuming
 that this captures at least some of the problem plans.

After looking at this for some time I now have a patch that solves this.

It relies on the observation that index scans with no bounded quals
don't play nicely with LIMIT. The solution relies upon the point that
LIMIT does not reduce the startup cost of plans, only the total cost.
So we can solve the problem by keeping the total cost estimate, just
move some of that into startup cost so LIMIT does not reduce costs as
much as before.

It's a simple patch, but it solves the test cases I know about and
does almost nothing to planning time.

I tried much less subtle approaches involving direct prevention of
LIMIT pushdown but the code was much too complex for my liking.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


avoid_limit_pushdown.v3.patch
Description: Binary data



Re: [PERFORM] unnecessary sort in the execution plan when doing group by

2014-11-04 Thread Simon Riggs
On 28 October 2014 06:26, Huang, Suya suya.hu...@au.experian.com wrote:

Memory wanted: 3565580K bytes

This means you need to increase work_mem to at least this value.
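
For example, for the session running that query (value rounded up from the
figure above; it can also be set per-user or in postgresql.conf):

SET work_mem = '3500MB';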

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-10-02 Thread Simon Riggs
On 1 October 2014 19:56, Josh Berkus j...@agliodbs.com wrote:
 On 09/30/2014 04:01 PM, Simon Riggs wrote:
 On 30 September 2014 18:28, Jeff Janes jeff.ja...@gmail.com wrote:

 Anyway, in the particular case I posted fixing n_distinct to realistic
 numbers (%) fixed the query plan.


 But wouldn't fixing the absolute number also have fixed the plan?

 There are two causes of this issue.

 1. Poor estimates of n_distinct. Fixable by user.

 2. Poor assumption of homogeneous distribution. No way for user to
 fix. Insufficient stats detail to be able to solve in current planner.

 I see (2) as the main source of issues, since as we observe, (1) is fixable.

 I disagree that (1) is not worth fixing just because we've provided
 users with an API to override the stats.  It would unquestionably be
 better for us to have a better n_distinct estimate in the first place.
 Further, this is an easier problem to solve, and fixing n_distinct
 estimates would fix a large minority of currently pathological queries.
  It's like saying "hey, we don't need to fix the leak in your radiator,
  we've given you a funnel in the dashboard you can pour water into."

Having read papers on it, I believe the problem is intractable. Coding
is not the issue. To anyone: please prove me wrong, in detail, with
references so it can be coded.

 I do agree that (2) is worth fixing *as well*.  In a first
 approximation, one possibility (as Tom suggests) would be to come up
 with a mathematical model for a selectivity estimate which was somewhere
 *between* homogenous distribution and the worst case.  While that
 wouldn't solve a lot of cases, it would be a start towards having a
 better model.

This may have a reasonable solution, but I don't know it. A more
accurate mathematical model will still avoid the main problem: it is a
guess, not certain knowledge and the risk will still remain.

 I don't think correlation is up to the task as a complete solution, although
 it might help a little.  There is no way a simple correlation can encode
 that John retired 15 years ago and hasn't logged on since, while Johannes
 was hired yesterday and never logged on before then.

 Ah, OK, essentially the same example.

 Which is why I ruled out correlation stats based approaches and
 suggested a risk-weighted cost approach.

 By risk-weighted you mean just adjusting cost estimates based on what
 the worst case cost looks like, correct?  That seemed to be your
 proposal from an earlier post.  If so, we're in violent agreement here.

I proposed a clear path for this earlier in the thread and received no
comments as yet. Please look at that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-30 Thread Simon Riggs
On 29 September 2014 22:54, Josh Berkus j...@agliodbs.com wrote:
 On 09/26/2014 01:06 AM, Simon Riggs wrote:
 On 23 September 2014 00:56, Josh Berkus j...@agliodbs.com wrote:

 We've hashed that out a bit, but frankly I think it's much more
 profitable to pursue fixing the actual problem than providing a
 workaround like risk, such as:

 a) fixing n_distinct estimation
 b) estimating stacked quals using better math (i.e. not assuming total
 randomness)
 c) developing some kind of correlation stats

 Otherwise we would be just providing users with another knob there's no
 rational way to set.

 I believe this is a serious issue for PostgreSQL users and one that
 needs to be addressed.

 n_distinct can be fixed manually, so that is less of an issue.

 It's an issue for the 99.8% of our users who don't know what n_distinct
 is, let alone how to calculate it.  Also, changing it requires an
 exclusive lock on the table. Of course, you and I have been over this
 issue before.

In 9.4 you'll be able to set n_distinct using only a Share Update
Exclusive lock.

So that's no longer a problem.

The quality of the n_distinct estimate itself is an issue with no
current solution, which is why you can set it manually.
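
For reference, the manual override looks like this (table and column are
placeholders; a negative value means a fraction of the row count rather than
an absolute number):

ALTER TABLE calls ALTER COLUMN user_id SET (n_distinct = -0.05);
ANALYZE calls;   -- the override takes effect at the next ANALYZE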

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-30 Thread Simon Riggs
On 30 September 2014 00:00, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The way I'm seeing it, you can't assume the LIMIT will apply to any
 IndexScan that doesn't have an index condition. If it has just a
 filter, or nothing at all, just an ordering then it could easily scan
 the whole index if the stats are wrong.

 That statement applies with equal force to *any* plan with a LIMIT;
 it's not just index scans.

Agreed

 The real question is to what extent are the tuples satisfying the extra
 filter condition randomly distributed with respect to the index order
 (or physical order, if it's a seqscan).

Agreed

 The existing cost estimation
 code effectively assumes that they're perfectly uniformly distributed;
 which is a good average-case assumption but can be horribly wrong in
 the worst case.

Agreed. This is the main observation from which we can work.

 If we could settle on some other model for the probable distribution
 of the matching tuples, we could adjust the cost estimates for LIMIT
 accordingly.  I have not enough statistics background to know what a
 realistic alternative would be.

I'm not sure that the correlation alone is sufficient to be able to do
that. We'd need to estimate where the values looked for are likely to
be wrt other values, then increase estimate accordingly. That sounds
like a lot of pushups grovelling through quals and comparing against
stats. So my thinking is actually to rule that out, unless you've some
ideas for how to do that?

 Another possibility is to still assume a uniform distribution but estimate
 for, say, a 90% probability instead of 50% probability that we'll find
 enough tuples after scanning X amount of the table.  Again, I'm not too
 sure what that translates to in terms of the actual math, but it sounds
 like something a statistics person could do in their sleep.

 I do not think we should estimate for the worst case though.  If we do,
 we'll hear cries of anguish from a lot of people, including many of the
 same ones complaining now, because the planner stopped picking fast-start
 plans even for cases where they are orders of magnitude faster than the
 alternatives.

Fast start plans still make sense when performing an IndexScan with no
filter conditions. Those types of plan should not be changed from
current costing - they are accurate, good and very important because
of their frequency in real workloads.

What I think we are seeing is Ordered plans being selected too often
in preference to Sorted plans when we make selectivity or stats
errors. As well as data distributions that aren't correctly described
by the statistics causing much longer execution times.

Here are some plan selection strategies

* Cost based - attempt to exactly calculate the cost based upon
existing stats - increase the complexity of cost calc to cover other
aspects. Even if we do that, these may not be that helpful in covering
the cases where the stats turn out to be wrong.

* Risk based - A risk adjusted viewpoint would be that we should treat
the cost as mid-way between the best and the worst. The worst is
clearly scanning (100% - N) of the tuples, the best is just N tuples.
So we should be costing scans with excess filter conditions as a (100%
Scan)/2, no matter the conditions, based purely upon risk.

* Simplified heuristic - deselect ordered plans when they are driven
from scans without quals or indexscans with filters, since the risk
adjusted cost is likely to be higher than the sorted cost. Inspecting
the plan tree for this could be quite costly, so would only be done
when the total cost is $high, prior to it being adjusted by LIMIT.


In terms of practical steps... I suggest the following:

* Implement enable_orderedscan = on (default) | off. A switch to allow
plans to de-select ordered plans, so we can more easily see the
effects of such plans in the wild.

* Code heuristic approach - I can see where to add my heuristic in the
grouping planner. So we just need to do a left? deep search of the
plan tree looking for scans of the appropriate type and bail out if we
find one.

Thoughts?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-30 Thread Simon Riggs
On 30 September 2014 18:28, Jeff Janes jeff.ja...@gmail.com wrote:

 Anyway, in the particular case I posted fixing n_distinct to realistic
 numbers (%) fixed the query plan.


 But wouldn't fixing the absolute number also have fixed the plan?

There are two causes of this issue.

1. Poor estimates of n_distinct. Fixable by user.

2. Poor assumption of homogeneous distribution. No way for user to
fix. Insufficient stats detail to be able to solve in current planner.

I see (2) as the main source of issues, since as we observe, (1) is fixable.

An example is a social media application where the business query is
"Display the last 10 posts". If the user is a frequent, recent user
then the query could come back very quickly, so a reverse scan on
post_id would work great. If the user hasn't logged on for ages, then
that plan needs to scan lots and lots of data to find 10 posts.
That gives the problem that only certain users experience poor
performance - the data isn't even consistent in its distribution, so
stats wouldn't help much, even if we could capture the profile of the
typical user.
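
Restated as a query for clarity (schema is hypothetical): the plan that walks
the post_id index backwards stops almost immediately for an active user, but
may read a large fraction of the table before it finds 10 rows for a user who
has not posted in years.

SELECT *
FROM   posts
WHERE  user_id = $1
ORDER  BY post_id DESC
LIMIT  10;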

  The problem, as I see it, is different. We assume that if there are
  100 distinct values and you use LIMIT 1 that you would only need to
  scan 1% of rows. We assume that the data is arranged in the table in a
  very homogenous layout. When data is not, and it seldom is, we get
  problems.
 
  Simply put, assuming that LIMIT will reduce the size of all scans is
  just way wrong. I've seen many plans where increasing the LIMIT
  dramatically improves the plan.
 
  If we can at least agree it is a problem, we can try to move forwards.


 I don't think anyone doubts there is a problem (many more than one of them),
 there is just disagreement about the priority and what can be done about it.


 That is certainly another problem.  Does correlation stat figure in the
 LIMIT calculation at all, currently?  That's what correlation stat is
 for, no?


 I don't think correlation is up to the task as a complete solution, although
 it might help a little.  There is no way a simple correlation can encode
 that John retired 15 years ago and hasn't logged on since, while Johannes
 was hired yesterday and never logged on before then.

Ah, OK, essentially the same example.

Which is why I ruled out correlation stats based approaches and
suggested a risk-weighted cost approach.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-29 Thread Simon Riggs
On 29 September 2014 16:00, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The problem, as I see it, is different. We assume that if there are
 100 distinct values and you use LIMIT 1 that you would only need to
 scan 1% of rows. We assume that the data is arranged in the table in a
 very homogenous layout. When data is not, and it seldom is, we get
 problems.

 Hm, good point -- 'data proximity'.  At least in theory, can't this be
 measured and quantified?  For example, given a number of distinct
 values, you could estimate the % of pages read (or maybe non
 sequential seeks relative to the number of pages) you'd need to read
 all instances of a particular value in the average (or perhaps the
 worst) case.   One way of trying to calculate that would be to look at
 proximity of values in sampled pages (and maybe a penalty assigned for
 high update activity relative to table size).  Data proximity would
 then become a cost coefficient to the benefits of LIMIT.

The necessary first step to this is to realise that we can't simply
apply the LIMIT as a reduction in query cost, in all cases.

The way I'm seeing it, you can't assume the LIMIT will apply to any
IndexScan that doesn't have an index condition. If it has just a
filter, or nothing at all, just an ordering then it could easily scan
the whole index if the stats are wrong.

So plans like this could be wrong, by assuming the scan will end
earlier because of the LIMIT than it actually will.

Limit
  IndexScan (no index cond)

Limit
  NestJoin
    IndexScan (no index cond)
    SomeScan

Limit
  NestJoin
    NestJoin
      IndexScan (no index cond)
      SomeScan
    SomeScan

and deeper...

I'm looking for a way to identify and exclude such plans, assuming
that this captures at least some of the problem plans.

Comments? Test Cases?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Yet another abort-early plan disaster on 9.3

2014-09-26 Thread Simon Riggs
On 23 September 2014 00:56, Josh Berkus j...@agliodbs.com wrote:

 We've hashed that out a bit, but frankly I think it's much more
 profitable to pursue fixing the actual problem than providing a
 workaround like risk, such as:

 a) fixing n_distinct estimation
 b) estimating stacked quals using better math (i.e. not assuming total
 randomness)
 c) developing some kind of correlation stats

 Otherwise we would be just providing users with another knob there's no
 rational way to set.

I believe this is a serious issue for PostgreSQL users and one that
needs to be addressed.

n_distinct can be fixed manually, so that is less of an issue.

The problem, as I see it, is different. We assume that if there are
100 distinct values and you use LIMIT 1 that you would only need to
scan 1% of rows. We assume that the data is arranged in the table in a
very homogenous layout. When data is not, and it seldom is, we get
problems.

Simply put, assuming that LIMIT will reduce the size of all scans is
just way wrong. I've seen many plans where increasing the LIMIT
dramatically improves the plan.

If we can at least agree it is a problem, we can try to move forwards.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Postgres Replaying WAL slowly

2014-09-17 Thread Simon Riggs
On 1 July 2014 20:20, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeff Frost j...@pgexperts.com writes:
 On Jun 30, 2014, at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Did you check whether the locks were all on temp tables of the
 ON COMMIT DROP persuasion?

 And indeed it did catch up overnight and the lag increased shortly after a 
 correlating spike in AccessExclusiveLocks that were generated by temp table 
 creation with on commit drop.

 OK, so we have a pretty clear idea of where the problem is now.

 It seems like there are three, not mutually exclusive, ways we might
 address this:

 1. Local revisions inside StandbyReleaseLocks to make it perform better in
 the presence of many locks.  This would only be likely to improve matters
 much if there's a fixable O(N^2) algorithmic issue; but there might well
 be one.

 2. Avoid WAL-logging AccessExclusiveLocks associated with temp tables, on
 the grounds that no standby should be touching them.  I'm not entirely
 sure that that argument is bulletproof though; in particular, even though
 a standby couldn't access the table's data, it's possible that it would be
 interested in seeing consistent catalog entries.

 3. Avoid WAL-logging AccessExclusiveLocks associated with
 new-in-transaction tables, temp or not, on the grounds that no standby
 could even see such tables until they're committed.  We could go a bit
 further and not take out any locks on a new-in-transaction table in the
 first place, on the grounds that other transactions on the master can't
 see 'em either.

 It sounded like Andres had taken a preliminary look at #1 and found a
 possible avenue for improvement, which I'd encourage him to pursue.

 For both #2 and the conservative version of #3, the main implementation
 problem would be whether the lock WAL-logging code has cheap access to
 the necessary information.  I suspect it doesn't.

 The radical version of #3 might be pretty easy to do, at least to the
 extent of removing locks taken out during CREATE TABLE.  I suspect there
 are some assertions or other consistency checks that would get unhappy if
 we manipulate relations without locks, though, so those would have to be
 taught about the exception.  Also, we sometimes forget new-in-transaction
 status during relcache flush events; it's not clear if that would be a
 problem for this.

 I don't plan to work on this myself, but perhaps someone with more
 motivation will want to run with these ideas.

Patch implements option 2 in the above.

Skipping the locks entirely seems like it opens a can of worms.

Skipping the lock for temp tables is valid since locks don't need to
exist on the standby. Any catalog entries for them will exist, but the
rows will show them as temp and nobody would expect them to be valid
outside of the original session.

Patch implements a special case that takes the lock normally, but
skips WAL logging the lock info.
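
The workload pattern in question looks roughly like this (illustrative): each
transaction creates a short-lived temp table, and the AccessExclusiveLock
taken by CREATE was, before this patch, also WAL-logged and then replayed by
StandbyReleaseLocks on the standby.

BEGIN;
CREATE TEMP TABLE scratch (id int) ON COMMIT DROP;
-- ... use scratch ...
COMMIT;   -- the table and its lock vanish here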

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


temp_tables_skip_logging_locks.v1.patch
Description: Binary data



Re: [PERFORM] Postgres Replaying WAL slowly

2014-07-03 Thread Simon Riggs
On 1 July 2014 20:20, Tom Lane t...@sss.pgh.pa.us wrote:

 I don't plan to work on this myself, but perhaps someone with more
 motivation will want to run with these ideas.

I was planning to work on improving performance of replication apply
over the summer, mid July - Aug, so I'll add this to the list.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-06-16 Thread Simon Riggs
On 16 June 2013 16:04, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 03.05.2013 15:41, Simon Riggs wrote:

 What appears to be happening is we're spending a lot of time in
 TransactionIdIsInProgress() so we can set hints and then when we find
 it is still in progress we then spend more time in XidIsInSnapshot()
 while we check that it is still invisible to us. Even if the
 transaction we see repeatedly ends, we will still pay the cost in
 XidIsInSnapshot repeatedly as we execute.

 Given that code path, I would expect it to suck worse on a live system
 with many sessions, and even worse with many subtransactions.

 (1) A proposed fix is attached, but it's only a partial one and barely
 tested.

 Deeper fixes might be

 (2)  to sort the xid array if we call XidIsInSnapshot too many times
 in a transaction. I don't think that is worth it, because a long
 running snapshot may be examined many times, but is unlikely to see
 multiple in-progress xids repeatedly. Whereas your case seems
 reasonably common.


 Yeah, sorting would be a waste of time most of the time.

 Instead of adding a new cache field, how about just swapping the matched XID
 to the beginning of the array?

Do you think that is significantly different from what I've done?

 Did you have some simple performance test script for this?

Files attached to set up and tear down the test. Needs
max_prepared_transactions = 100

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


inprogr1.sql
Description: Binary data


inprogr2.sql
Description: Binary data



Re: [PERFORM] In progress INSERT wrecks plans on table

2013-06-16 Thread Simon Riggs
On 16 June 2013 16:23, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 06.05.2013 04:51, Mark Kirkwood wrote:

 On 05/05/13 00:49, Simon Riggs wrote:

 On 3 May 2013 13:41, Simon Riggs si...@2ndquadrant.com wrote:

 (3) to make the check on TransactionIdIsInProgress() into a heuristic,
 since we don't *need* to check that, so if we keep checking the same
 xid repeatedly we can reduce the number of checks or avoid xids that
 seem to be long running. That's slightly more coding than my quick
 hack here but seems worth it.

 I think we need both (1) and (3) but the attached patch does just (1).

 This is a similar optimisation to the one I introduced for
 TransactionIdIsKnownCompleted(), except this applies to repeated
 checking of as yet-incomplete xids, and to bulk concurrent
 transactions.


 ISTM we can improve performance of TransactionIdIsInProgress() by
 caching the procno of our last xid.

 Mark, could you retest with both these patches? Thanks.


 Thanks Simon, will do and report back.


 Did anyone ever try (3) ?

No, because my other patch meant I didn't need to. In other words, my
other patch speeded up repeated access enough I didn't care about (3)
anymore.


 I'm not sure if this the same idea as (3) above, but ISTM that
 HeapTupleSatisfiesMVCC doesn't actually need to call
 TransactionIdIsInProgress(), because it checks XidInMVCCSnapshot(). The
 comment at the top of tqual.c says:

  * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT
 array)
  * before TransactionIdDidCommit/TransactionIdDidAbort (which look in
  * pg_clog).  Otherwise we have a race condition: we might decide that a
  * just-committed transaction crashed, because none of the tests succeed.
  * xact.c is careful to record commit/abort in pg_clog before it unsets
  * MyPgXact->xid in PGXACT array.  That fixes that problem, but it also
  * means there is a window where TransactionIdIsInProgress and
  * TransactionIdDidCommit will both return true.  If we check only
  * TransactionIdDidCommit, we could consider a tuple committed when a
  * later GetSnapshotData call will still think the originating transaction
  * is in progress, which leads to application-level inconsistency.
 The
  * upshot is that we gotta check TransactionIdIsInProgress first in all
  * code paths, except for a few cases where we are looking at
  * subtransactions of our own main transaction and so there can't be any
  * race condition.


 If TransactionIdIsInProgress() returns true for a given XID, then surely it
 was also running when the snapshot was taken (or had not even begun yet). In
 which case the XidInMVCCSnapshot() call will also return true. Am I missing
 something?

 There's one little problem: we currently only set the hint bits when
 TransactionIdIsInProgress() returns false. If we do that earlier, then even
 though HeapTupleSatisfiesMVCC works correctly thanks to the
 XidInMVCCSnapshot call, other HeapTupleSatisfies* functions that don't call
 XIdInMVCCSnapshot might see the tuple as committed or aborted too early, if
 they see the hint bit as set while the transaction is still in-progress
 according to the proc array. Would have to check all the callers of those
 other HeapTupleSatisfies* functions to verify if that's OK.

Well, I looked at that and its too complex and fiddly to be worth it, IMHO.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Simon Riggs
On 7 May 2013 01:23,  mark.kirkw...@catalyst.net.nz wrote:

 I'm thinking that a variant of (2) might be simpler to implement:

 (I think Matt C essentially beat me to this suggestion - he originally
 discovered this issue). It is probably good enough for only *new* plans to
 react to the increased/increasing number of in progress rows. So this
 would require backends doing significant numbers of row changes to either
 directly update pg_statistic or report their in progress numbers to the
 stats collector. The key change here is the partial execution numbers
 would need to be sent. Clearly one would need to avoid doing this too
 often (!) - possibly only when number of changed rows >
 autovacuum_analyze_scale_factor proportion of the relation concerned or
 similar.

Are you loading using COPY? Why not break down the load into chunks?
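
As a sketch of what chunking could look like here (row ranges are
illustrative; the table is the one from the test case elsewhere in this
thread): committing each batch means its rows stop being in-progress, so other
sessions pay the visibility-check cost for a much shorter window.

BEGIN;
INSERT INTO plan SELECT 100000 + s, 3, NULL, 'b' FROM generate_series(1, 10000) AS s;
COMMIT;

BEGIN;
INSERT INTO plan SELECT 110000 + s, 3, NULL, 'b' FROM generate_series(1, 10000) AS s;
COMMIT;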

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-07 Thread Simon Riggs
On 7 May 2013 07:32, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
 On 07/05/13 18:10, Simon Riggs wrote:

 On 7 May 2013 01:23,  mark.kirkw...@catalyst.net.nz wrote:

 I'm thinking that a variant of (2) might be simpler to implement:

 (I think Matt C essentially beat me to this suggestion - he originally
 discovered this issue). It is probably good enough for only *new* plans
 to
 react to the increased/increasing number of in progress rows. So this
 would require backends doing significant numbers of row changes to either
 directly update pg_statistic or report their in progress numbers to the
 stats collector. The key change here is the partial execution numbers
 would need to be sent. Clearly one would need to avoid doing this too
 often (!) - possibly only when number of changed rows >
 autovacuum_analyze_scale_factor proportion of the relation concerned or
 similar.


 Are you loading using COPY? Why not break down the load into chunks?


 INSERT - but we could maybe workaround by chunking the INSERT. However that
 *really* breaks the idea that in SQL you just say what you want, not how the
 database engine should do it! And more practically means that the most
 obvious and clear way to add your new data has nasty side effects, and you
 have to tip toe around muttering secret incantations to make things work
 well :-)

Yes, we'd need to break up SQL statements into pieces and use external
transaction snapshots to do that.

 I'm still thinking that making postgres smarter about having current stats
 for getting the actual optimal plan is the best solution.

I agree.

The challenge now is to come up with something that actually works;
most of the ideas have been very vague and ignore the many downsides.
The hard bit is the analysis and balanced thinking, not the
developing.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-06 Thread Simon Riggs
On 6 May 2013 02:51, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
 On 05/05/13 00:49, Simon Riggs wrote:

 On 3 May 2013 13:41, Simon Riggs si...@2ndquadrant.com wrote:

 (3) to make the check on TransactionIdIsInProgress() into a heuristic,
 since we don't *need* to check that, so if we keep checking the same
 xid repeatedly we can reduce the number of checks or avoid xids that
 seem to be long running. That's slightly more coding than my quick
 hack here but seems worth it.

 I think we need both (1) and (3) but the attached patch does just (1).

 This is a similar optimisation to the one I introduced for
 TransactionIdIsKnownCompleted(), except this applies to repeated
 checking of as yet-incomplete xids, and to bulk concurrent
 transactions.


 ISTM we can improve performance of TransactionIdIsInProgress() by
 caching the procno of our last xid.

 Mark, could you retest with both these patches? Thanks.


 Thanks Simon, will do and report back.

OK, here's a easily reproducible test...

Prep:
DROP TABLE IF EXISTS plan;
CREATE TABLE plan
(
  id INTEGER NOT NULL,
  typ INTEGER NOT NULL,
  dat TIMESTAMP,
  val TEXT NOT NULL
);
insert into plan select generate_series(1,10), 0,
current_timestamp, 'some texts';
CREATE UNIQUE INDEX plan_id ON plan(id);
CREATE INDEX plan_dat ON plan(dat);

testcase.pgb
select count(*) from plan where dat is null and typ = 3;

Session 1:
pgbench -n -f testcase.pgb -t 100

Session 2:
BEGIN; insert into plan select 100 + generate_series(1, 10),
3, NULL, 'b';

Transaction rate in Session 1: (in tps)
(a) before we run Session 2:
Current: 5600tps
Patched: 5600tps

(b) after Session 2 has run, yet before transaction end
Current: 56tps
Patched: 65tps

(c ) after Session 2 has aborted
Current/Patched: 836, 1028, 5400tps
VACUUM improves timing again

New version of patch attached which fixes a few bugs.

Patch works and improves things, but we're still swamped by the block
accesses via the index.

Which brings me back to Mark's original point, which is that we are
x100 times slower in this case and it *is* because the choice of
IndexScan is a bad one for this situation.

After some thought on this, I do think we need to do something about
it directly, rather than by tuning infrastructire (as I just
attempted). The root cause here is that IndexScan plans are sensitive
to mistakes in data distribution, much more so than other plan types.

The two options, broadly, are to either

1. avoid IndexScans in the planner unless they have a *significantly*
better cost. At the moment we use IndexScans if cost is lowest, even
if that is only by a whisker.

2. make IndexScans adaptive so that they switch to other plan types
mid-way through execution.

(2) seems fairly hard generically, since we'd have to keep track of
the tids returned from the IndexScan to allow us to switch to a
different plan and avoid re-issuing rows that we've already returned.
But maybe if we adapted the IndexScan plan type so that it adopted a
more page oriented approach internally, it could act like a
bitmapscan. Anyway, that would need some proof that it would work and
sounds like a fair task.

(1) sounds more easily possible and plausible. At the moment we have
enable_indexscan = off. If we had something like
plan_cost_weight_indexscan = N, we could selectively increase the cost
of index scans so that they would be less likely to be selected. i.e.
plan_cost_weight_indexscan = 2 would mean an indexscan would need to
be half the cost of any other plan before it was selected. (parameter
name selected so it could apply to all parameter types). The reason to
apply this weighting would be to calculate risk adjusted cost not
just estimated cost.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


cache_TransactionIdInProgress.v2.patch
Description: Binary data



Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-04 Thread Simon Riggs
On 3 May 2013 13:41, Simon Riggs si...@2ndquadrant.com wrote:

 (3) to make the check on TransactionIdIsInProgress() into a heuristic,
 since we don't *need* to check that, so if we keep checking the same
 xid repeatedly we can reduce the number of checks or avoid xids that
 seem to be long running. That's slightly more coding than my quick
 hack here but seems worth it.

 I think we need both (1) and (3) but the attached patch does just (1).

 This is a similar optimisation to the one I introduced for
 TransactionIdIsKnownCompleted(), except this applies to repeated
 checking of as yet-incomplete xids, and to bulk concurrent
 transactions.

ISTM we can improve performance of TransactionIdIsInProgress() by
caching the procno of our last xid.

Mark, could you retest with both these patches? Thanks.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


cache_TransactionIdInProgress.v1.patch
Description: Binary data



Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-03 Thread Simon Riggs
On 2 May 2013 23:19,  mark.kirkw...@catalyst.net.nz wrote:
 On 2 May 2013 01:49, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:

 I think we need a problem statement before we attempt a solution,
 which is what Tom is alluding to.


 Actually no - I think Tom (quite correctly) was saying that the patch was
 not a viable solution. With which I agree.

 I believe the title of this thread is the problem statement.

 ISTM that you've got a case where the plan is very sensitive to a
 table load. Which is a pretty common situation and one that can be
 solved in various ways. I don't see much that Postgres can do because
 it can't know ahead of time you're about to load rows. We could
 imagine an optimizer that set thresholds on plans that caused the
 whole plan to be recalculated half way thru a run, but that would be a
 lot of work to design and implement and even harder to test. Having
 static plans at least allows us to discuss what it does after the fact
 with some ease.

 The plan is set using stats that are set when there are very few
 non-NULL rows, and those increase massively on load. The way to cope
 is to run the ANALYZE immediately after the load and then don't allow
 auto-ANALYZE to reset them later.

 No. We do run analyze immediately after the load. The surprise was that
 this was not sufficient - the (small) amount of time where non optimal
 plans were being used due to the in progress row activity was enough to
 cripple the system - that is the problem. The analysis of why not led to
 the test case included in the original email. And sure it is deliberately
 crafted to display the issue, and is therefore open to criticism for being
 artificial. However it was purely meant to make it easy to see what I was
 talking about.

I had another look at this and see that I read the second explain incorrectly.

The amount of data examined and returned is identical in both plans.
The only difference is the number of in-progress rows seen by the
second query. Looking at the numbers some more, it looks like 6000
in-progress rows are examined in addition to the data. It might be
worth an EXPLAIN patch to put instrumentation in to show that, but it's
not that interesting.

It would be useful to force the indexscan into a bitmapscan to check
that the cost isn't attributable to the plan but to other overheads.
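
For anyone re-running the test case, a minimal way to make that comparison
(the table, columns and query here are stand-ins, not the ones from the
original test): disabling plain index scans still leaves bitmap scans
available.

  SET enable_indexscan = off;     -- bitmap index scans remain enabled
  EXPLAIN ANALYZE
  SELECT id, val FROM test_table WHERE val IS NOT NULL ORDER BY val LIMIT 20;
  RESET enable_indexscan;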

What appears to be happening is we're spending a lot of time in
TransactionIdIsInProgress() so we can set hints and then when we find
it is still in progress we then spend more time in XidIsInSnapshot()
while we check that it is still invisible to us. Even if the
transaction we see repeatedly ends, we will still pay the cost in
XidIsInSnapshot repeatedly as we execute.

Given that code path, I would expect it to suck worse on a live system
with many sessions, and even worse with many subtransactions.

(1) A proposed fix is attached, but it's only a partial one and barely tested.

Deeper fixes might be

(2)  to sort the xid array if we call XidIsInSnapshot too many times
in a transaction. I don't think that is worth it, because a long
running snapshot may be examined many times, but is unlikely to see
multiple in-progress xids repeatedly. Whereas your case seems
reasonably common.

(3) to make the check on TransactionIdIsInProgress() into a heuristic,
since we don't *need* to check that, so if we keep checking the same
xid repeatedly we can reduce the number of checks or avoid xids that
seem to be long running. That's slightly more coding than my quick
hack here but seems worth it.

I think we need both (1) and (3) but the attached patch does just (1).

This is a similar optimisation to the one I introduced for
TransactionIdIsKnownCompleted(), except this applies to repeated
checking of as yet-incomplete xids, and to bulk concurrent
transactions.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


xid_in_snapshot_cache.v1.patch
Description: Binary data



Re: [PERFORM] In progress INSERT wrecks plans on table

2013-05-02 Thread Simon Riggs
On 2 May 2013 01:49, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
 On 02/05/13 02:06, Tom Lane wrote:

 Mark Kirkwood mark.kirkw...@catalyst.net.nz writes:

 I am concerned that the deafening lack of any replies to my original
 message is a result of folk glancing at your original quick reply and
 thinking... incomplete problem spec...ignore... when that is not that
 case - yes I should have muttered 9.2 in the original email, but we
 have covered that now.

 No, I think it's more that we're trying to get to beta, and so anything
 that looks like new development is getting shuffled to folks' "to
 look at later" queues.  The proposed patch is IMO a complete nonstarter
 anyway; but I'm not sure what a less bogus solution would look like.


 Yeah, I did think that beta might be consuming everyone's attention (of
 course immediately *after* sending the email)!

 And yes, the patch was merely to illustrate the problem rather than any
 serious attempt at a solution.

I think we need a problem statement before we attempt a solution,
which is what Tom is alluding to.

ISTM that you've got a case where the plan is very sensitive to a
table load. Which is a pretty common situation and one that can be
solved in various ways. I don't see much that Postgres can do because
it can't know ahead of time you're about to load rows. We could
imagine an optimizer that set thresholds on plans that caused the
whole plan to be recalculated half way thru a run, but that would be a
lot of work to design and implement and even harder to test. Having
static plans at least allows us to discuss what it does after the fact
with some ease.

The plan is set using stats that are set when there are very few
non-NULL rows, and those increase massively on load. The way to cope
is to run the ANALYZE immediately after the load and then don't allow
auto-ANALYZE to reset them later.
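
As a rough sketch of that recipe (table name, file and threshold are
illustrative only):

  COPY foo FROM '/tmp/foo.csv' WITH (FORMAT csv);   -- bulk load
  ANALYZE foo;                                      -- fix the stats immediately

  -- keep auto-analyze from recomputing the stats shortly afterwards,
  -- e.g. by pushing this table's analyze threshold out of reach
  ALTER TABLE foo SET (autovacuum_analyze_threshold = 2000000000);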

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Query planner ignoring constraints on partitioned tables when joining

2013-05-02 Thread Simon Riggs
On 18 April 2013 22:42, Tom Lane t...@sss.pgh.pa.us wrote:

 One could imagine adding planner logic that would make inferences of a
 similar sort for equalities combined with inequalities, but it would be
 vastly more complicated, and would provide useful results in vastly
 fewer queries, than the equality-propagation logic.  So don't hold your
 breath waiting for something like that to happen.

I'll take note that we need to make partitioning work for merge joins also.

On a more general note, it would be good to be able to look at the
starting value from the driving table of the join and use that as a
constraint in the scan on the second table. We rely on that mechanism
for nested loop joins, so we could do with that here also.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] WHERE 1 = 2 OR ... makes planner choose a very inefficient plan

2013-05-02 Thread Simon Riggs
On 18 April 2013 15:46, Tom Lane t...@sss.pgh.pa.us wrote:
 dmitry potapov potapov.dmi...@gmail.com writes:
 I recently stumbled upon on what could be a planner bug or a corner case.
 If false condition OR ... is added to WHERE clause of SELECT query,
 then the planner chooses a very inefficient plan. Consider a query:

 SELECT count(k0.id)
 FROM k0
 WHERE 1 = 2
 OR k0.id IN (
 SELECT k1.k0_id
 FROM k1
 WHERE k1.k1k2_id IN (
 SELECT k2.k1k2_id
 FROM k2
 WHERE k2.t = 2
 AND (coalesce(k2.z, '')) LIKE '%12%'
 )
 );

 Perhaps you should fix your application to not generate such incredibly
 silly SQL.  Figuring out that 1=2 is constant false and throwing it away
 costs the server easily a thousand times as many instructions as it
 would take for the client to not emit that in the first place.

 The reason you don't get a nice semijoin plan when you do that is that
 conversion of IN clauses to semijoins happens before
 constant-subexpression simplification.  So the planner hasn't yet
 figured out that the OR is useless when it would need to know that to
 produce a good plan.  (And no, we can't just flip the order of those two
 steps.  Doing two rounds of const-simplification wouldn't be a good
 answer either, because it would penalize well-written queries to benefit
 badly-written ones.)

The situation shown could be the result of a SQL injection attack.

It would be nice to have a switch to do additional checks on SQL
queries to ensure such injections don't cause long runtimes to return
useless answers.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Performance die when COPYing to table with bigint PK

2011-08-01 Thread Simon Riggs
On Sun, Jul 31, 2011 at 2:51 PM, Robert Ayrapetyan
robert.ayrapet...@comodo.com wrote:

 I've found strange behavior of my pg installation (tested both 8.4 and
 9.0 - they behave same) on FreeBSD platform.
 In short - when some table have PK on bigint field - COPY to that
 table from file becomes slower and slower as table grows. When table
 reaches ~5GB - COPY of 100k records may take up to 20 mins. I've
 experimented with all params in configs, moved indexes to separate hdd
 etc - nothing made any improvement. However, once I'm dropping 64 bit
 PK - COPY of 100k records passes in seconds. Interesting thing - same
 table has other indexes, including composite ones, but none of them
 include bigint fields, that's why I reached decision that bug
 connected with indexes on bigint fields only.

 In terms of IO picture is following: after copy started gstat shows
 100% load on index partition (as I mentioned above - I've tried
 separate hdd to keep index tablespace), large queue (over 2k
 elements), and constant slow write on speed of ~2MB\s. Hdd becomes
 completely unresponsive, even ls on empty folder hangs for minute or
 so.

 To avoid thoughts like your hdd is slow, you haven't tuned
 postgresql.conf etc - all slowness dissapears with drop of bigint PK,
 same time other indexes on same table remain alive. And yes - I've
 tried drop PK \ recreate PK, vacuum full analyze and all other things
 - nothing helped, only drop helps.

 Is this known and expected behavior?

This is a duplicate post with one on BUGS, being discussed there.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Postgres 9.0.4 + Hot Standby + FusionIO Drive + Performance = Query failed ERROR: catalog is missing 1 attribute(s) for relid 172226

2011-05-08 Thread Simon Riggs
On Tue, May 3, 2011 at 10:02 AM, Sethu Prasad sethuprasad...@gmail.com wrote:

 I tried with the PostgreSQL 9.0.4 + Hot Standby and running the database
 from Fusion IO Drive to understand the PG Performance.

 While doing so I got the Query failed ERROR: catalog is missing 1
 attribute(s) for relid 172226. Any idea on this error? Is that combination
 PG + HotSB + Fusion IO Drive is not advisable?!

Why I wonder do you think this might have anything to do with Hot
Standby and/or FusionIO drives?

This indicates either catalog or catalog index corruption of some kind.

Did you only get this error once?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] Group commit and commit delay/siblings

2010-12-08 Thread Simon Riggs
On Mon, 2010-12-06 at 23:52 -0500, Greg Smith wrote:
 Jignesh Shah wrote:
  On Tue, Dec 7, 2010 at 1:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:

  I could have sworn we'd refactored that to something like
 bool ThereAreAtLeastNActiveBackends(int n)
  which could drop out of the loop as soon as it'd established what we
  really need to know...I'd suggest that we just improve the
  coding so that we don't scan ProcArray at all when commit_siblings is 0.
 
  (I do agree with improving the docs to warn people away from assuming
  this is a knob to frob mindlessly.)
  
  In that case I propose that we support commit_siblings=0 which is not
  currently supported. Minimal value for commit_siblings  is currently
  1. If we support commit_siblings=0 then it should short-circuit that
  function call which is often what I do in my tests with commit_delay.

 
 Everybody should be happy now:  attached patch refactors the code to 
 exit as soon as the siblings count is exceeded, short-circuits with no 
 scanning of ProcArray if the minimum is 0, and allows setting the 
 siblings to 0 to enable that shortcut:

Minor patch, no downsides. Docs checked. Committed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [PERFORM] partition pruning

2010-03-08 Thread Simon Riggs
On Thu, 2010-03-04 at 17:40 -0500, Robert Haas wrote:
 On Mon, Mar 1, 2010 at 2:29 PM, Anj Adu fotogra...@gmail.com wrote:
  When I use intervals in my query e.g  col1 between current_timestamp -
  interval '10 days' and current_timestamp...the optimizer checks ALL
  partitions  whereas if I use   col1 between 2 hardcoded dates..only
  the applicable partitions are scanned.
 
 Yep.  This is one example of a more general principle:
 constant-folding happens before planning, but anything more complex
 has to wait until execution time.  So the plan can't take into account
 the value of current_timestamp in forming the plan.

It could, but it doesn't yet. Partition removal can take place in the
executor and this is currently targeted for 9.1.
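
To make the difference concrete (partitioned table and column names are
illustrative):

  -- literal bounds: with constraint_exclusion the planner prunes at plan time
  EXPLAIN SELECT * FROM log
  WHERE col1 BETWEEN DATE '2010-02-26' AND DATE '2010-03-08';

  -- current_timestamp is only evaluated at run time, so in the releases
  -- discussed here every partition still gets scanned
  EXPLAIN SELECT * FROM log
  WHERE col1 BETWEEN current_timestamp - interval '10 days' AND current_timestamp;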

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Air-traffic benchmark

2010-02-04 Thread Simon Riggs
On Thu, 2010-01-07 at 13:38 +0100, Lefteris wrote:
 Reported query times are (in sec):
 MonetDB 7.9s
 InfoBright 12.13s
 LucidDB 54.8s

It needs to be pointed out that those databases are specifically
optimised for Data Warehousing, whereas Postgres core is optimised for
concurrent write workloads in production systems.

If you want a best-vs-best type of comparison, you should be looking at
a version of Postgres optimised for Data Warehousing. These results show
that Postgres-related options exist that clearly beat the above numbers.
http://community.greenplum.com/showthread.php?t=111
I note also that Greenplum's Single Node Edition is now free to use, so
is a reasonable product for comparison on this list.

Also, I'm unimpressed by a Data Warehouse database that requires
everything to reside in memory, e.g. MonetDB. That severely limits
real-world usability, in my experience because it implies the queries
you're running aren't ad-hoc.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Sat, 2009-10-17 at 21:41 -0400, Tom Lane wrote:

 one thing we'd have to consider
 is whether it is okay to suppress calculation of columns containing
 volatile functions.

I think we should have a 4th class of functions,
volatile-without-side-effects (better name needed, obviously).

That would allow us to optimize such calls away, if appropriate.
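
For reference, the three classes that exist today; a read-only but
non-repeatable function like the last one must currently be declared plain
VOLATILE (the function bodies and the orders table are just illustrations):

  CREATE FUNCTION add_one(int) RETURNS int
    AS 'SELECT $1 + 1' LANGUAGE sql IMMUTABLE;             -- depends only on inputs

  CREATE FUNCTION open_orders() RETURNS bigint
    AS 'SELECT count(*) FROM orders' LANGUAGE sql STABLE;  -- constant within a statement

  CREATE FUNCTION roll_die() RETURNS int
    AS 'SELECT 1 + (random() * 5)::int' LANGUAGE sql VOLATILE;  -- no side effects, yet VOLATILE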

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Mon, 2009-10-19 at 13:43 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sat, 2009-10-17 at 21:41 -0400, Tom Lane wrote:
  one thing we'd have to consider
  is whether it is okay to suppress calculation of columns containing
  volatile functions.
 
  I think we should have a 4th class of functions,
  volatile-without-side-effects (better name needed, obviously).
 
 What for?  There wouldn't be that many, I think.  random() and
 clock_timestamp(), yeah, but most volatile user-defined functions
 are either volatile-with-side-effects or misdeclared.

Read only vs. read write?

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Calculation of unused columns

2009-10-19 Thread Simon Riggs
On Mon, 2009-10-19 at 13:58 -0400, Tom Lane wrote:
 
 Most read-only functions are stable or even immutable.

Huh? I mean a function that only contains SELECTs. (How would those ever
be Stable or Immutable??)

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [PERFORM] Speeding up a query.

2009-07-07 Thread Simon Riggs

On Tue, 2009-06-16 at 14:35 -0400, Hartman, Matthew wrote:

 The algorithm for packing appointments in respects each constraint and
 typically schedules a day of treatments (30-60) within 9-10 seconds on
 my workstation, down from 27 seconds initially. I would like to get it
 below 5 seconds if possible.
 
 I think what's slowing is down is simply the number of rows and joins.
 The algorithm creates a scheduling matrix with one row per 5 minute
 timeslot, per unit, per nurse assigned to the unit. That translates to
 3,280 rows for the days I have designed in development (each day can
 change). 

ISTM the efficiency of your algorithm is geometrically related to the
number of time slots into which appointments might fit. So reduce the number
of possible time slots...

Assign the slot (randomly/hash/round robin) to either the morning or the
afternoon and then run exactly the same queries, just with half the number of
time slots. That should reduce your execution time by one quarter
without using multiple CPUs for each morning/afternoon. Then run twice,
once for morning, once for afternoon.

You could parallelise this and run both at same time on different CPUs,
if the extra work is worthwhile, but it seems not, judging from your
requirements.

Another way would be to arrange all appointments that need an odd number of
timeslots into pairs so that you have at most one appointment that needs
an odd number of timeslots. Then schedule appointments on 10 minute
boundaries, rounding up their timeslot requirement. (The single odd
timeslot appointment will always waste 1 timeslot).

Hope that helps.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Pointers needed on optimizing slow SQL statements

2009-06-06 Thread Simon Riggs

On Wed, 2009-06-03 at 21:21 -0400, Robert Haas wrote:

 But, we're not always real clever about selectivity.  Sometimes you
 have to fake the planner out, as discussed here.
 
 http://archives.postgresql.org/pgsql-performance/2009-06/msg00023.php
 
 Actually, I had to do this today on a production application.  In my
 case, the planner thought that a big OR clause was not very selective,
 so it figured it wouldn't have to scan very far through the outer side
 before it found enough rows to satisfy the LIMIT clause.  Therefore it
 materialized the inner side instead of hashing it, and when the
 selectivity estimate turned out to be wrong, it took 220 seconds to
 execute.  I added a fake join condition of the form a || b = a || b,
 where a and b were on different sides of the join, and now it hashes
 the inner side and takes  100 ms.
 
 Fortunately, these kinds of problems are fairly rare, but they can be
 extremely frustrating to debug.  With any kind of query debugging, the
 first question to ask yourself is Are any of my selectivity estimates
 way off?.  If the answer to that question is no, you should then ask
 Where is all the time going in this plan?.  If the answer to the
 first question is yes, though, your time is usually better spent
 fixing that problem, because once you do, the plan will most likely
 change to something a lot better.

The Function Index solution works, but it would be much better if we
could get the planner to remember certain selectivities.

I'm thinking a command like

ANALYZE foo [WHERE  ]

which would specifically analyze the selectivity of the given WHERE
clause for use in queries.
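
The syntax above is only a proposal; the workaround that exists today is the
expression index already mentioned, which makes ANALYZE collect statistics on
the whole predicate (column names are made up):

  -- not implemented, just the idea:
  --   ANALYZE foo WHERE status = 'open' AND region = 'EU';

  -- existing workaround:
  CREATE INDEX foo_pred_idx ON foo ((status = 'open' AND region = 'EU'));
  ANALYZE foo;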

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-20 Thread Simon Riggs

On Tue, 2009-05-19 at 23:54 -0400, Robert Haas wrote:

 I don't think it's a good idea to write off the idea of implementing
 this optimization at some point.  I see a lot of queries that join one
 fairly large table against a whole bunch of little tables, and then
 sorting the results by a column that is indexed in the big table.  

Agreed it's a common use case.

 The
 optimizer handles this by sequentially scanning the big table, hash
 joining against all of the little tables, and then sorting the output,
 which is pretty silly (given that all of the tables fit in RAM and are
 in fact actually cached there).  If there is a LIMIT clause, then it
 might instead index-scan the big table, do the hash joins, and then
 sort the already-ordered results.  This is better because at least
 we're not sorting the entire table unnecessarily but it's still poor.

The Hash node is fully executed before we start pulling rows through the
Hash Join node. So the Hash Join node will know at execution time
whether or not it will continue to maintain sorted order. So we put the
Sort node into the plan, then the Sort node can just ask the Hash Join
at execution time whether it should perform a sort or just pass rows
through (act as a no-op).

The cost of the Sort node can either be zero, or pro-rated down from the
normal cost based upon what we think the probability is of going
multi-batch, which would vary by work_mem available.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-20 Thread Simon Riggs

On Wed, 2009-05-20 at 07:17 -0400, Robert Haas wrote:
 On Wed, May 20, 2009 at 4:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
  The Hash node is fully executed before we start pulling rows through the
  Hash Join node. So the Hash Join node will know at execution time
  whether or not it will continue to maintain sorted order. So we put the
  Sort node into the plan, then the Sort node can just ask the Hash Join
  at execution time whether it should perform a sort or just pass rows
  through (act as a no-op).
 
 It's not actually a full sort.  For example if the join has two
 batches, you don't need to dump all of the tuples from both batches
 into a sort.  Each of the two tapes produced by the hash join is
 sorted, but if you read tape one and then tape two, of course then it
 won't be.  What you want to do is read the first tuple from each tape
 and return whichever one is smaller, and put the other one back; then
 lather, rinse, and repeat.  Because it's such a special-case
 computation, I think you're going to want to implement it within the
 HashJoin node rather than inserting a Sort node (or any other kind).

That has wider applicability and seems sound. It will also be easier to
assess a cost for that aspect in the optimizer. I like that approach.

Code wise, you'll need to refactor things quite a lot to make the
tuplesort code accessible to the HJ node. The sorting code is going to
get pretty hectic if we add in all the ideas for this, partial sort,
improved sorting (at least 3 other ideas). Perhaps it will be easier to
write a specific final merge routine just for HJs. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 00:33 +0200, Dimitri wrote:
 
  In particular, running the tests repeatedly using
  H.REF_OBJECT = '01'
  rather than varying the value seems likely to benefit MySQL. The
 
 let me repeat again - the reference is *random*,
 the '01' value I've used just to show a query execution
 plan.
 
 also, what is important - the random ID is chosen in way that no one
 user use the same to avoid deadlocks previously seen with PostgreSQL
 (see the Deadlock mystery note 2 years ago
 http://dimitrik.free.fr/db_STRESS_BMK_Part1.html#note_4355 )

OK, didn't pick up on that.

(Like Tom, I was thinking query cache)

Can you comment on the distribution of values for that column? If you
are picking randomly, this implies distribution is uniform and so I am
surprised we are mis-estimating the selectivity.

 I think yes (but of course I did not try to replay it several times)

If you could that would be appreciated. We don't want to go chasing
after something that is not repeatable.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Mon, 2009-05-18 at 19:00 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  In particular, running the tests repeatedly using 
  H.REF_OBJECT = '01'
  rather than varying the value seems likely to benefit MySQL.

One thing to note in terms of optimisation of this query is that we
perform a top-level sort at the end of the query.

Both plans for this query show an IndexScan on a two-column index, with
an Index Condition of equality on the leading column. The ORDER BY
specifies a sort by the second index column, so the top-level Sort is
superfluous in this case.

My understanding is that we don't currently eliminate superfluous
additional sorts of this kind. Now I know that is a hard subject, but it
seems straightforward to consider interesting sort order equivalence
when we have constant equality constraints.

My guess would be that MySQL does do the sort removal, in the latest
version.

Dimitri's EXPLAIN ANALYZEs show differing costs for that additional
step, but around 10% of the query time looks shaveable.
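
The shape being discussed, sketched with made-up names: with equality on the
leading column of a two-column index, the index scan already returns rows in
the order of the second column, so the final Sort adds nothing.

  CREATE INDEX history_ref_horder_idx ON history (ref_object, horder);

  SELECT *
  FROM history h
  WHERE h.ref_object = '0000000001'   -- equality on the leading index column
  ORDER BY h.horder;                  -- already satisfied by the index ordering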

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 12:36 +0100, Simon Riggs wrote:

 Partially sorted data takes much less effort to sort (OK, not zero, I
 grant) so this seems like a high complexity, lower value feature. I
 agree it should be on the TODO, just IMHO at a lower priority than some
 other features.

Perhaps its worth looking at a hybrid merge-join/hash-join that can cope
with data only mostly-sorted rather than fully sorted. That way we can
probably skip the partial sort altogether.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 14:00 +0200, Dimitri wrote:

 I may confirm the issue with hash join - it's repeating both with
 prepared and not prepared statements - it's curious because initially
 the response time is lowering near ~1ms (the lowest seen until now)
 and then once workload growing to 16 sessions it's jumping to 2.5ms,
 then with 32 sessions it's 18ms, etc..

Is it just bad all the time, or does it get worse over time?

Do you get the same behaviour as 32 sessions if you run 16 sessions for
twice as long?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 08:58 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  Both plans for this query show an IndexScan on a two column-index, with
  an Index Condition of equality on the leading column. The ORDER BY
  specifies a sort by the second index column, so the top-level Sort is
  superfluous in this case.
 
  My understanding is that we don't currently eliminate superfluous
  additional sorts of this kind.
 
 Nonsense.  The planner might think some other plan is cheaper, but
 it definitely knows how to do this, and has since at least 8.1.

Please look at Dimitri's plan. If it can remove the pointless sort, why
does it not do so?

I agree that it will remove a Sort when the data is already has the
exact same interesting sort order. In this case the sort order is not
exactly the same, but looks fully removable to me.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-19 Thread Simon Riggs

On Tue, 2009-05-19 at 13:01 +0100, Matthew Wakeling wrote:

 That leads me on to another topic. Consider the query:
 
 SELECT * FROM table ORDER BY a, b
 
 where the column a is declared UNIQUE and has an index. Does Postgres 
 eliminate b from the ORDER BY, and therefore allow fetching without 
 sorting from the index?

No, because we don't use unique constraints much at all to infer things.

 Or how about this query:
 
 SELECT * FROM table1, table2 WHERE table1.fk = table2.id ORDER BY
  table1.id, table2.id
 
 where both id columns are UNIQUE with an index. Do we eliminate 
 table2.id from the ORDER BY in this case?

Yes, that is eliminated via equivalence classes.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-18 Thread Simon Riggs

On Thu, 2009-05-14 at 20:25 +0200, Dimitri wrote:

 # lwlock_wait_8.4.d `pgrep -n postgres`

Lock Id                   Mode        Combined Time (ns)
FirstLockMgrLock          Exclusive               803700
BufFreelistLock           Exclusive              3001600
FirstLockMgrLock          Shared                 4586600
FirstBufMappingLock       Exclusive              6283900
FirstBufMappingLock       Shared                21792900

I've published two patches to -Hackers to see if we can improve the read
only numbers on 32+ cores.

Try shared_buffer_partitions = 256

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-18 Thread Simon Riggs

On Wed, 2009-05-13 at 23:23 +0200, Dimitri Fontaine wrote:

 As I think I need this solution too, I've coded a PG module to
 scratch  
 that itch this morning, and just published it (BSD licenced) on  
 pgfoundry:
http://preprepare.projects.postgresql.org/README.html
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/preprepare/preprepare/

Looks very cool Dimitri

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-18 Thread Simon Riggs

On Mon, 2009-05-18 at 20:00 +0200, Dimitri wrote:

 From my point of view it needs first to understand where the time is
 wasted on a single query (even when the statement is prepared it runs
 still slower comparing to MySQL).

There is still a significant number of things to say about these numbers
and much tuning still to do, so I'm still confident of improving those
numbers if we needed to.

In particular, running the tests repeatedly using 
H.REF_OBJECT = '01'
rather than varying the value seems likely to benefit MySQL. The
distribution of values is clearly non-linear; while Postgres picks a
strange plan for that particular value, I would guess there are also
values for which the MySQL plan is sub-optimal. Depending upon the
distribution of selected data we might see the results go either way.

What I find worrying is your result of a scalability wall for hash
joins. Is that a repeatable issue?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-14 Thread Simon Riggs

On Tue, 2009-05-12 at 14:28 +0200, Dimitri wrote:

 As problem I'm considering a scalability issue on Read-Only workload -
 only selects, no disk access, and if on move from 8 to 16 cores we
 gain near 100%, on move from 16 to 32 cores it's only 10%...

Dimitri,

Will you be re-running the Read-Only tests?

Can you run the Dtrace script to assess LWlock contention during the
run?

Would you re-run the tests with a patch?

Thanks,

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote:

 For my big surprise, MySQL was faster!

Ours too.

** I bet you $1000 that I can improve the performance of your benchmark
results with PostgreSQL. You give me $1000 up-front and if I can't
improve your high end numbers I'll give you $2000 back. Either way, you
name me and link to me from your blog. Assuming you re-run the tests as
requested and give me reasonable access to info and measurements. **

I note your blog identifies you as a Sun employee. Is that correct? If
you do not give us the opportunity to improve upon the results then
reasonable observers might be persuaded you did not wish to show
PostgreSQL in its best light. You up for it?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 13:16 +0200, Dimitri wrote:

 Wow, Simon! :-))
 
 yes, I'm working in Sun Benchmark Center :-))
 (I'm not using my Sun email on public lists only to avid a spam)
 
 and as came here and asking questions it's probably proving my
 intentions to show PostgreSQL in its best light, no?.. - I never liked
 not honest comparisons :-))
 
 Regarding your bet: from a very young age I learned a one thing - you
 take any 2 person who betting for any reason - you'll find in them one
 idiot and one bastard :-))   idiot - because betting while missing
 knowledge, and bastard - because knowing the truth is not honset to
 get a profit from idiots :-))  That's why I never betting in my life,
 but every time telling the same story in such situation... Did you
 like it? ;-))

No, but I asked for it, so we're even. ;-)

Let's work on the benchmark.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote:

 What I'm trying to do now is to understand what exactly is the
 problem.

You're running with 1600 users, which is above the scalability limit
uncovered (by Sun...) during earlier benchmarking. The scalability
issues are understood but currently considered above the
reasonable-setting limit and so nobody has been inclined to improve
matters.

You should use a connection concentrator to reduce the number of
sessions down to say 400.

Your wal_buffers setting is also too low and you will be experiencing
contention on the WALWriteLock. Increase wal_buffers to about 8x its
current value.

You can move pg_xlog to its own set of drives.

Set checkpoint_completion_target to 0.95.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 16:00 +0100, Matthew Wakeling wrote:
 won't connect operations be all handled by a 
 single thread - the parent postmaster?

No, we spawn then authenticate. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 11:30 -0700, Scott Carey wrote:
 the fact is there is no evidence that a connection pooler will fix the
 scalability from 16 to 32 cores.

There has been much analysis over a number of years of the effects of
the ProcArrayLock, specifically the O(N^2) effect of increasing numbers
of connections on GetSnapshotData(). Most discussion has been on
-hackers, not -perform.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-12 Thread Simon Riggs

On Tue, 2009-05-12 at 15:52 -0400, Robert Haas wrote:
 On Tue, May 12, 2009 at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  1. There is no (portable) way to pass the connection from the postmaster
  to another pre-existing process.
 
 [Googles.]  It's not obvious to me that SCM_RIGHTS is non-portable,
 and Windows has an API call WSADuplicateSocket() specifically for this
 purpose.

Robert, Greg,

Tom's main point is it isn't worth doing. We have connection pooling
software that works well, very well. Why do we want to bring it into
core? (Think of the bugs we'd hit...) If we did, who would care?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-11 Thread Simon Riggs

On Mon, 2009-05-11 at 17:18 +0200, Dimitri wrote:

 Yes, forget, MySQL is reaching 17.500 TPS here.

Please share your measurements of MySQL scalability also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-11 Thread Simon Riggs

On Mon, 2009-05-11 at 11:23 -0400, Tom Lane wrote:
 Dimitri dimitrik...@gmail.com writes:
  Anyone may explain me why analyze target may have so huge negative
  secondary effect?..
 
 If these are simple queries, maybe what you're looking at is the
 increase in planning time caused by having to process 10x as much
 statistical data.  Cranking statistics_target to the max just because
 you can is not necessarily a good strategy.

statistics_target affects tables, so we have problems if you have a mix
of simple and complex queries. IMHO we need an explicit planner_effort
control, rather than the more arcane *_limit knobs which are effectively
the same thing, just harder to use in practice.
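
A finer-grained knob that already exists is the per-column target, so only the
complex queries pay for the extra statistics (names illustrative):

  ALTER TABLE history ALTER COLUMN ref_object SET STATISTICS 1000;
  ANALYZE history;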

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote:
 Tom Lane wrote:
  Alvaro Herrera alvhe...@commandprompt.com writes:
  I think there should be a way to refer to individual partitions as
  objects.
  
  Yeah, the individual partitions should be nameable tables, otherwise we
  will be reinventing a *whole* lot of management stuff to little gain.
  I don't actually think there is anything wrong with using table
  inheritance as the basic infrastructure --- I just want more smarts
  about one particular use pattern of inheritance.
 
 Maybe it's worth examining and documenting existing partition setups,
 the reasoning behind them, and how they're implemented, in order to
 guide any future plans for native partitioning support?
 
 Maybe that's already been/being done. On the off chance that it's not:
 
 Ones I can think of:
 
 - Partitioning an equally active dataset by ranges over a key to improve
  scan performance, INSERT/UPDATE costs on indexes, locking issues, etc.
 
 - The classic active/archive partition scheme where there's only one
 partition growing at any one time, and the others are historical data
 that's nowhere near as hot.
 
 - A variant on the basic active/archive structure, where query activity
 decreases slowly over time and there are many partitions of recent data.
 Partitions are merged into larger ones as they age, somewhat like a RRD
 database.
 
 I also expect that in the future there will be demand for striping data
 across multiple partitions in different tablespaces to exploit
 in-parallel scanning (when/if supported) for better I/O utilization in
 multiple-disk-array situations. For example, partitioning on
 MOD(id,10) across 10 separate volumes, and firing off 10 concurrent
 scans, one per partition, to satisfy a query.

That's a good summary. It has already been documented and discussed, but
saying it again and again is the best way to get this across.

You've highlighted that partitioning is a feature with many underlying
requirements: infrequent access to data (frequently historical),
striping for parallelism and getting around RDBMS flaws (if any). We
must be careful to implement each requirement in full, yet separately,
so we don't end up with 60% functionality in each case by delivering an
average or least common denominator solution.
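
For readers following along, this is roughly what the existing
inheritance-based partitioning under discussion looks like (names are
illustrative):

  CREATE TABLE measurement (logdate date NOT NULL, payload text);

  CREATE TABLE measurement_2009_05 (
      CHECK (logdate >= DATE '2009-05-01' AND logdate < DATE '2009-06-01')
  ) INHERITS (measurement);

  -- with constraint_exclusion enabled the planner skips children whose CHECK
  -- constraint contradicts the WHERE clause
  SET constraint_exclusion = on;
  EXPLAIN SELECT * FROM measurement WHERE logdate = DATE '2009-05-07';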

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 12:58 +0100, Gregory Stark wrote:

 It would be nice if we were in the same ballpark as MySQL but we would only be
 interesting in such optimizations if they don't come at the expense of
 scalability under more complex workloads.

It doesn't appear there is a scalability issue here at all.

Postgres can clearly do the same query in about the same time.

We just have a case where MySQL happens to optimise it well and Postgres
doesn't. Since we can trivially design cases that show the opposite I'm
not worried too much. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Simon Riggs

On Thu, 2009-05-07 at 20:36 +0200, Dimitri wrote:

 I've simply restarted a full test with hashjoin OFF. Until 32
 concurrent users things are going well. Then since 32 users response
 time is jumping to 20ms, with 64 users it's higher again, and with 256
 users reaching 700ms, so TPS is dropping from 5.000 to ~200..
 
 With hashjoin ON it's not happening, and I'm reaching at least 11.000
 TPS on fully busy 32 cores.

Much better to stick to the defaults. 

Sounds like a problem worth investigating further, but not pro bono.

 About scalability issue - there is one on 8.3.7, because on 32 cores
 with such kind of load it's using only 50% CPU and not outpassing
 6.000 TPS, while 8.4 uses 90% CPU and reaching 11.000 TPS..

Yeh, small changes make a big difference. Thanks for the info.

How does MySQL perform?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Simon Riggs

On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:

 The problem has been finding someone who has both the time and the
 ability to do the work.

Unfortunately there has been significant debate over which parts of
partitioning need to be improved. My own view is that considerable
attention needs to be applied to both the executor and planner to
improve matters and that syntax improvements are largely irrelevant,
though seductive.

Deep improvements will require significant analysis, agreement, effort
and skill. What we have now took approximately 20 days to implement,
with later patches adding about another 10-20 days' work. I'd estimate
the required work at 60-100 days from the primary author, plus planning
and discussion time. YMMV.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Any better plan for this query?..

2009-05-06 Thread Simon Riggs

On Wed, 2009-05-06 at 10:31 +0200, Dimitri wrote:

 I've already tried a target 1000 and the only thing it changes
 comparing to the current 100 (default) is instead of 2404 rows it says
 240 rows, but the plan remaining the same..

Try both of these things
* REINDEX on the index being used in the query, then re-EXPLAIN
* enable_hashjoin = off, then re-EXPLAIN

You should first attempt to get the same plan, then confirm it really is
faster, before we worry about why the optimizer hadn't picked that plan.

We already know that MySQL favors nested loop joins, so turning up a
plan that on this occasion is actually better that way is in no way
representative of general performance. Does MySQL support hash joins?
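
A sketch of those two checks (the index, tables and query are placeholders for
the ones in the benchmark):

  REINDEX INDEX history_ref_idx;
  SET enable_hashjoin = off;          -- session-local, for the experiment only
  EXPLAIN ANALYZE
  SELECT s.name, h.horder, h.note
  FROM history h JOIN stat s ON s.ref = h.ref_stat
  WHERE h.ref_object = '0000000001'
  ORDER BY h.horder;
  RESET enable_hashjoin;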

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Simon Riggs

On Wed, 2009-05-06 at 17:55 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:
  The problem has been finding someone who has both the time and the
  ability to do the work.
 
  Unfortunately there has been significant debate over which parts of
  partitioning need to be improved. My own view is that considerable
  attention needs to be applied to both the executor and planner to
  improve matters and that syntax improvements are largely irrelevant,
  though seductive.
 
 My thought about it is that what we really need is an explicit notion
 of partitioned tables built into the system, instead of trying to make
 the planner re-deduce the partitioning behavior from first principles
 every time it builds a plan for such a table.  Such a notion would
 presumably involve some new syntax to allow the partitioning rule to be
 specified at table creation time.  I agree that the syntax details are a
 minor issue, but the set of possible partitioning rules is certainly a
 topic of great interest.

Agreed. Perhaps I should say then that the syntax needs to express the
requirements of the planner/executor behaviour, rather than being the
main aspect of the feature, as some have suggested.

Hopefully, notions of partitioning won't be directly tied to chunking of
data for parallel query access. Most queries access recent data and
hence only a single partition (or stripe), so partitioning and
parallelism are frequently exactly orthogonal.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] performance for high-volume log insertion

2009-04-22 Thread Simon Riggs

On Mon, 2009-04-20 at 14:53 -0700, da...@lang.hm wrote:

 the big win is going to be in changing the core of rsyslog so that it can 
 process multiple messages at a time (bundling them into a single 
 transaction)

That isn't necessarily true as a single big win.

The reason there is an overhead per transaction is because of commit
delays, which can be removed by executing

  SET synchronous_commit = off; 

after connecting to PostgreSQL 8.3+

You won't need to do much else. This can also be enabled for a
PostgreSQL user without even changing the rsyslog source code, so it
should be easy enough to test.
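
For example, assuming the log daemon connects as a role called rsyslog, either
of these turns it on without touching the client at all:

  ALTER ROLE rsyslog SET synchronous_commit = off;
  -- or scope it to the logging database instead:
  ALTER DATABASE syslog SET synchronous_commit = off;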

And this type of application is *exactly* what it was designed for.

Some other speedups should also be possible, but this is easiest. 

I would guess that batching inserts will be a bigger win than simply
using prepared statements because it will reduce network roundtrips to a
centralised log server. Preparing statements might show up well on tests
because people will do tests against a local database, most likely.
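
A sketch of the batching idea with a made-up table: many rows per transaction
means one commit wait (or none, with synchronous_commit off) instead of one per
row, and far fewer round trips.

  BEGIN;
  INSERT INTO syslog_lines (received_at, host, message) VALUES
      (now(), 'web1', 'message 1'),
      (now(), 'web1', 'message 2'),
      (now(), 'web2', 'message 3');
  COMMIT;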

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] error updating a very large table

2009-04-15 Thread Simon Riggs

On Wed, 2009-04-15 at 09:51 -0400, Tom Lane wrote:
 Brian Cox brian@ca.com writes:
  I changed the logic to update the table in 1M row batches. However, 
  after 159M rows, I get:
 
  ERROR:  could not extend relation 1663/16385/19505: wrote only 4096 of 
  8192 bytes at block 7621407
 
 You're out of disk space.
 
  A df run on this machine shows plenty of space:
 
 Per-user quota restriction, perhaps?
 
 I'm also wondering about temporary files, although I suppose 100G worth
 of temp files is a bit much for this query.  But you need to watch df
 while the query is happening, rather than suppose that an after-the-fact
 reading means anything.

Anytime we get an out of space error we will be in the same situation.

When we get this error, we should report:
* a summary of current temp file usage
* df output (if possible on the OS)

Otherwise we'll always be wondering what caused the error.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Question on pgbench output

2009-04-05 Thread Simon Riggs

On Fri, 2009-04-03 at 16:34 -0700, David Kerr wrote:
 400 concurrent users doesn't mean that they're pulling 1.5 megs /
 second every second. Just that they could potentially pull 1.5 megs at
 any one second. most likely there is a 6 (minimum) to 45 second
 (average) gap  between each individual user's pull.

There's a world of difference between 400 connected and 400 concurrent
users. You've been testing 400 concurrent users, yet without measuring
data transfer. The think time will bring the number of users right down
again, but you really need to include the much higher than normal data
transfer into your measurements and pgbench won't help there.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] plpgsql arrays

2009-04-03 Thread Simon Riggs

On Fri, 2009-04-03 at 10:04 -0400, Tom Lane wrote:
 Matthew Wakeling matt...@flymine.org writes:
  On Fri, 3 Apr 2009, Robert Haas wrote:
  Why not just use SQL to do the join?
 
  Because the merge condition is:
 
  WHERE l1.start = l2.end AND l2.start = l1.end
 
  and merge joins in postgres only currently cope with the case where the 
  merge condition is an equals relationship.

(snip)

 I don't actually believe that a standard merge join algorithm will work
 with an intransitive join condition ...

I think it's a common enough problem that having a non-standard join
algorithm written for that case would be interesting indeed.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-21 Thread Simon Riggs

On Fri, 2009-03-20 at 15:28 +, Matthew Wakeling wrote:
 On Thu, 19 Mar 2009, Scott Carey wrote:
  In type B, the ratio of requests that must context switch is always == 
  1.  Every request must queue and wait!
 
 A remarkably good point, although not completely correct. Every request 
 that arrives when the lock is held in any way already will queue and wait. 
 Requests that arrive when the lock is free will run immediately. I admit 
 it, this is a killer for this particular locking strategy.

I think the right mix of theory and test here is for people to come up
with new strategies that seem to make sense and then we'll test them
all. Trying too hard to arrive at the best strategy purely through
discussion will mean we miss a few tricks. Feels like we're on the right
track here.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Sat, 2009-03-14 at 12:09 -0400, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  WALInsertLock is also quite high on Jignesh's list. That I've seen 
  become the bottleneck on other tests too.
 
 Yeah, that's been seen to be an issue before.  I had the germ of an idea
 about how to fix that:
 
   ... with no lock, determine size of WAL record ...
   obtain WALInsertLock
   identify WAL start address of my record, advance insert pointer
   past record end
   *release* WALInsertLock
   without lock, copy record into the space just reserved
 
 The idea here is to allow parallelization of the copying of data into
 the buffers.  The hold time on WALInsertLock would be very short.  Maybe
 it could even become a spinlock, though I'm not sure, because the
 advance insert pointer bit is more complicated than it looks (you have
 to allow for the extra overhead when crossing a WAL page boundary).
 
 Now the fly in the ointment is that there would need to be some way to
 ensure that we didn't write data out to disk until it was valid; in
 particular how do we implement a request to flush WAL up to a particular
 LSN value, when maybe some of the records before that haven't been fully
 transferred into the buffers yet?  The best idea I've thought of so far
 is shared/exclusive locks on the individual WAL buffer pages, with the
 rather unusual behavior that writers of the page would take shared lock
 and only the reader (he who has to dump to disk) would take exclusive
 lock.  But maybe there's a better way.  Currently I don't believe that
 dumping a WAL buffer (WALWriteLock) blocks insertion of new WAL data,
 and it would be nice to preserve that property.

Yeh, that's just what we'd discussed previously:
http://markmail.org/message/gectqy3yzvjs2hru#query:Reworking%20WAL%20locking+page:1+mid:gectqy3yzvjs2hru+state:results

Are you thinking of doing this for 8.4? :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
 One possibility would be for the locks to alternate between exclusive
 and 
 shared - that is:
 
 1. Take a snapshot of all shared waits, and grant them all -
 thundering
  herd style.
 2. Wait until ALL of them have finished, granting no more.
 3. Take a snapshot of all exclusive waits, and grant them all, one by
 one.
 4. Wait until all of them have been finished, granting no more.
 5. Back to (1)

I agree with that, apart from the granting no more bit.

Currently we queue up exclusive locks, but there is no need to since for
ProcArrayLock commits are all changing different data.

The most useful behaviour is just to have two modes:
* exclusive-lock held - all other x locks welcome, s locks queue
* shared-lock held - all other s locks welcome, x locks queue

This *only* works for ProcArrayLock.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 11:45 +, Matthew Wakeling wrote:
 On Wed, 18 Mar 2009, Simon Riggs wrote:
  I agree with that, apart from the granting no more bit.
 
  The most useful behaviour is just to have two modes:
  * exclusive-lock held - all other x locks welcome, s locks queue
  * shared-lock held - all other s locks welcome, x locks queue
 
 The problem with making all other locks welcome is that there is a 
 possibility of starvation. Imagine a case where there is a constant stream 
 of shared locks - the exclusive locks may never actually get hold of the 
 lock under the all other shared locks welcome strategy. 

That's exactly what happens now. 

 Likewise with the reverse.

I think it depends upon how frequently requests arrive. Commits cause X
locks and we don't commit that often, so it's very unlikely that we'd see
a constant stream of X locks and prevent shared lockers.


Some comments from an earlier post on this topic (about 20 months ago):

Since shared locks are currently queued behind exclusive requests
when they cannot be immediately satisfied, it might be worth
reconsidering the way LWLockRelease works also. When we wake up the
queue we only wake the Shared requests that are adjacent to the head of
the queue. Instead we could wake *all* waiting Shared requestors.

e.g. with a lock queue like this:
(HEAD)  S-S-X-S-X-S-X-S
Currently we would wake the 1st and 2nd waiters only. 

If we were to wake the 3rd, 5th and 7th waiters also, then the queue
would reduce in length very quickly, if we assume generally uniform
service times. (If the head of the queue is X, then we wake only that
one process and I'm not proposing we change that). That would mean queue
jumping, right? Well, that's what already happens in other circumstances,
so there cannot be anything intrinsically wrong with allowing it, the
only question is: would it help? 

We need not wake the whole queue, there may be some generally more
beneficial heuristic. The reason for considering this is not to speed up
Shared requests but to reduce the queue length and thus the waiting time
for the Xclusive requestors. Each time a Shared request is dequeued, we
effectively re-enable queue jumping, so a Shared request arriving during
that point will actually jump ahead of Shared requests that were unlucky
enough to arrive while an Exclusive lock was held. Worse than that, the
new incoming Shared requests exacerbate the starvation, so the more
non-adjacent groups of Shared lock requests there are in the queue, the
worse the starvation of the exclusive requestors becomes. We are
effectively randomly starving some shared locks as well as exclusive
locks in the current scheme, based upon the state of the lock when they
make their request. The situation is worst when the lock is heavily
contended and the workload has a 50/50 mix of shared/exclusive requests,
e.g. serializable transactions or transactions with lots of
subtransactions.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 16:26 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Mon, 2009-03-16 at 16:26 +, Matthew Wakeling wrote:
  One possibility would be for the locks to alternate between exclusive
  and 
  shared - that is:
  
  1. Take a snapshot of all shared waits, and grant them all -
  thundering
  herd style.
  2. Wait until ALL of them have finished, granting no more.
  3. Take a snapshot of all exclusive waits, and grant them all, one by
  one.
  4. Wait until all of them have been finished, granting no more.
  5. Back to (1)
 
  I agree with that, apart from the granting no more bit.
 
  Currently we queue up exclusive locks, but there is no need to since for
  ProcArrayLock commits are all changing different data.
 
  The most useful behaviour is just to have two modes:
  * exclusive-lock held - all other x locks welcome, s locks queue
  * shared-lock held - all other s locks welcome, x locks queue
 
 My goodness, it seems people have forgotten about the lightweight
 part of the LWLock design.

Lightweight is only useful if it fits purpose. If the LWlock design
doesn't fit all cases, especially with critical lock types, then we can
have special cases. We have both spinlocks and LWlocks, plus we split
hash tables into multiple lock partitions. If we have 3 types of
lightweight locking, why not consider having 4?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Simon Riggs

On Wed, 2009-03-18 at 13:49 +, Matthew Wakeling wrote:
 On Wed, 18 Mar 2009, Jignesh K. Shah wrote:
  I thought about that.. Except without putting a restriction a huge queue 
  will cause lot of time spent in manipulating the lock
  list every time. One more thing will be to maintain two list shared and 
  exclusive and round robin through them for every time you
  access the list so manipulation is low.. But the best thing is to allow 
  flexibility to change the algorithm since some workloads
  may work fine with one and others will NOT. The flexibility then allows to 
  tinker for those already reaching the limits.
 
 Yeah, having two separate queues is the obvious way of doing this. It 
 would make most operations really trivial. Just wake everything in the 
 shared queue at once, and you can throw it away wholesale and allocate a 
 new queue. It avoids a whole lot of queue manipulation.

Yes, that sounds good.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Simon Riggs

On Wed, 2009-03-11 at 22:20 -0400, Jignesh K. Shah wrote:

 A tunable does not impact existing behavior

Why not put the tunable parameter into the patch and then show the test
results with it in? If there is no overhead, we should then be able to
see that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Simon Riggs

On Tue, 2009-03-17 at 17:41 -0400, Jignesh K. Shah wrote:

 I did a quick test with patch. Unfortunately it improves my number
 even with default setting 0 (not sure whether I should be pleased or
 sad - Definitely no overhead infact seems to help performance a bit.
 NOTE: Logic is same, implementation is slightly different for default
 set)

OK, I bite. 25% gain from doing nothing??? You're stretching my... err,
credulity.

I like the train of thought for setting 1 and it is worth investigating,
but something feels wrong somewhere.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-17 Thread Simon Riggs

On Tue, 2009-03-17 at 19:54 -0400, Jignesh K. Shah wrote:
 
 Simon Riggs wrote:
  On Tue, 2009-03-17 at 17:41 -0400, Jignesh K. Shah wrote:
 

  I did a quick test with patch. Unfortunately it improves my number
  even with default setting 0 (not sure whether I should be pleased or
  sad - Definitely no overhead infact seems to help performance a bit.
  NOTE: Logic is same, implementation is slightly different for default
  set)
  
 
  OK, I bite. 25% gain from doing nothing??? You're stretching my... err,
  credulity.
 
  I like the train of thought for setting 1 and it is worth investigating,
  but something feels wrong somewhere.
 

 Actually I think I am hurting my credibility here  since I cannot 
 explain the improvement with the patch but still using default logic 
 (thought different way I compare sequential using fields from the 
 previous proc structure  instead of comparing with constant boolean)  
 But the change was necessary to allow it to handle multiple algorithms 
 and yet be sleek and not bloated.
 
  In next couple of weeks I plan to test the patch on a different x64 
 based system to do a sanity testing on lower number of cores and also 
 try out other workloads ...

Good plan. I'm behind your ideas and will be happy to wait.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-14 Thread Simon Riggs

On Wed, 2009-03-11 at 16:53 -0400, Jignesh K. Shah wrote:

 1200: 2000: Medium Throughput: -1781969.000 Avg Medium Resp: 0.019

I think you need to iron out bugs in your test script before we put too
much stock into the results generated. Your throughput should not be
negative.

I'd be interested in knowing the number of S and X locks requested, so
we can think about this from first principles. My understanding is that
ratio of S:X is about 10:1. Do you have more exact numbers?
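
There was no built-in way to count those acquisitions at the time short of
custom instrumentation. On much later releases (10 and up) repeatedly sampling
pg_stat_activity gives at least a rough picture of which LWLocks sessions are
blocked on; a sketch only:

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
GROUP BY 1, 2
ORDER BY 3 DESC;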

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] caching written values?

2009-01-22 Thread Simon Riggs

On Thu, 2009-01-22 at 13:11 +0100, Thomas Finneid wrote:

 Is there any possibilites of telling pg to save to disk that memory 
 cached data and state when the server is shutdown, so that when the 
 server starts up again, itreads it back into the memory?

It's possible, but not by any directly supported mechanism.

You have to consider whether the data you saved would still be required
when the server restarts.
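
On releases newer than this thread the contrib modules cover most of this use
case: pg_buffercache shows what is currently cached, and pg_prewarm (9.4 and
later) reloads chosen relations after a restart; the autoprewarm worker in 11
and later automates the save/restore. A rough sketch, with a hypothetical
table name:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Which relations dominate shared_buffers right now:
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON c.relfilenode = b.relfilenode
WHERE b.reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 20;

-- After a restart, warm the ones you care about:
SELECT pg_prewarm('my_hot_table');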

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] understanding postgres issues/bottlenecks

2009-01-07 Thread Simon Waters
On Wednesday 07 January 2009 04:17:10 M. Edward (Ed) Borasky wrote:
 
 1. The package it lives in is called sysstat. Most Linux distros do
 *not* install sysstat by default. Somebody should beat up on them
 about that. :)

Hehe, although sysstat and friends did have issues on Linux for a long time. 
Nothing worse than misleading stats, so I suspect it lost a lot of friends 
back then. It is a lot better these days when most of the Unix software 
targets Linux first, and other kernels second.

Aside from all the advice here about system tuning, as a system admin I'd also 
ask is the box doing the job you need? And are you looking at the Postgres 
log (with logging of slow queries) to see that queries perform in a sensible 
time? I'd assume with the current performance figure there is an issue 
somewhere, but I've been to places where it was as simple as adding one 
index, or even modifying an index so it does what the application developer 
intended instead of what they ask for ;)
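
If slow-query logging is not already on, the usual knob is
log_min_duration_statement. A sketch only (threshold in milliseconds, database
name hypothetical); the same value can go straight into postgresql.conf
instead:

ALTER DATABASE appdb SET log_min_duration_statement = 250;
-- postgresql.conf equivalent:  log_min_duration_statement = 250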

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5

2008-12-24 Thread Simon Riggs
Hi Mark,

Good to see you producing results again.


On Sat, 2008-12-20 at 16:54 -0800, Mark Wong wrote:
 Here are links to how the throughput changes when increasing shared_buffers:
 
 http://pugs.postgresql.org/node/505

Only strange thing here is the result at 22528MB. It's the only normal
one there. Seems to be a freeze occurring on most tests around the 30
minute mark, which delays many backends and reduces writes. 

Reduction in performance as shared_buffers increases looks normal.

Increase wal_buffers, but look for something else as well. Try to get a
backtrace from when the lock up happens. It may not be Postgres?

 And another series of tests to show how throughput changes when
 checkpoint_segments are increased:
 
 http://pugs.postgresql.org/node/503
 
 The links go to a graphical summary and raw data.  Note that the
 maximum theoretical throughput at this scale factor is approximately
 12000 notpm.
 
 My first glance tells me that the system performance is quite
 erratic when increasing the shared_buffers.  I'm also not sure what to
 gather from increasing the checkpoint_segments.  Is it simply that the
 more checkpoint segments you have, the more time the database spends
 fsyncing when at a checkpoint?

I would ignore the checkpoint_segment tests because you aren't using a
realistic value of shared_buffers. I doubt any such effect is noticeable
when you use a realistic value determined from set of tests 505.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Simon Waters
On Tuesday 09 December 2008 13:08:14 Jean-David Beyer wrote:
 
 and even if they can, I do not know if postgres uses that ability. I doubt
 it, since I believe (at least in Linux) a process can do that only if run
 as root, which I imagine few (if any) users do.

Disclaimer: I'm not a system programmer... 

I believe that at Linux kernel revision 2.6.8 and before processes need Posix 
capability CAP_IPC_LOCK, and 2.6.9 and after they need CAP_IPC_LOCK to lock 
more than RLIMIT_MEMLOCK.

It is a capability, so a process can run as any user assuming it is started 
with or gained the capability.

No idea if Postgres uses any of this, other than to protect security of 
certain password operations there is probably not much point. If key parts of 
your database are being paged out, get more RAM, if idle parts of your 
database are paged out, you probably could more usefully apply that RAM for 
something else.

The Varnish cache design is the place to look for enlightenment on relying on 
the kernel paging (using memory mapped files) rather than trying to do it 
yourself, but then a proxy server is a lot simpler than a RDBMS. That said, 
Varnish is fast at what it does (reverse HTTP proxy) !

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Perc 3 DC

2008-11-24 Thread Simon Waters
On Monday 24 November 2008 14:49:17 Glyn Astill wrote:
 --- On Mon, 24/11/08, Steve Clark [EMAIL PROTECTED] wrote:
   Yeah the battery's on it, that and the 128Mb is
 
  really the only reason I thought I'd give it a whirl.
 
 
  Is the battery  functioning? We found that the unit had to
  be on and charged before write back caching
  would work.

 Yeah the battery is on there, and in the BIOS it says it's PRESENT and
 the status is GOOD.

Sorry I deleted the beginning of this on getting back from a week off.

Writeback is configurable. You can enabled write back caching when the unit is 
not charged if you like. It is offered when you create the array (and can be 
changed later). It is arguably a silly thing to do, but it is an option.

I have some reasonable performance stats for this card, assuming you have a 
suitably recent version of the driver software. DELL used to ship with a Linux 
kernel that had a broken driver for this card, resulting in very poor 
performance (i.e. substantially slower than software RAID). I have a note 
never to use it with Linux before 2.6.22, as the bundled LSI driver had issues; 
DELL themselves shipped (if you asked why performance was so bad) a Redhat 
kernel with a later driver for the card than the official Linux kernel.

That said, a couple of weeks back ours corrupted a volume on replacing a dead 
hard disk, so I'm never touching these cheap and tacky LSI RAID cards ever 
again. It is supposed to just start rebuilding the array when you insert the 
replacement drive; if it doesn't just work, schedule some down time and 
figure out exactly why, and don't (for example) blindly follow the instructions 
in the manual on what to do if it doesn't just work.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Backup strategies

2008-10-17 Thread Simon Riggs

On Wed, 2008-10-15 at 16:05 +0200, Ivan Voras wrote:

 So, pg_start_backup() freezes the data at the time it's called but
 still
 data and xlog are changed, in a different way that's safe to backup?

No, that's not how it works. The pg_start_backup() records the point
that we must rollforward from. There is no freezing.

  Why
 not run with pg_start_backup() always enabled?

It's not a mode that can be enabled/disabled. Its a starting point.

You should run pg_start_backup() each time you run a backup, just like
the fine manual describes.

Check your backups...
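
For reference, the sequence the manual describes boils down to wrapping the
filesystem-level copy between the two calls. This is the exclusive form that
existed at the time; much later releases replace it with pg_backup_start() /
pg_backup_stop() and pg_basebackup:

SELECT pg_start_backup('nightly');
-- ... copy the data directory with tar/rsync at the filesystem level ...
SELECT pg_stop_backup();
-- and keep archiving WAL: the copy is only usable together with the WAL
-- generated between the two calls.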

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [GENERAL] [PERFORM] Drupal and PostgreSQL - performance issues?

2008-10-13 Thread Simon Waters
On Monday 13 October 2008 15:19:07 Scott Marlowe wrote:
 
  shared_buffers = 24MB
  max_fsm_pages = 153600

 Well, 24MB is pretty small.  See if you can increase your system's
 shared memory and postgresql's shared_buffers to somewhere around 256M
 to 512M.  It likely won't make a big difference in this scenario, but
 overall it will definitely help.

I noted, after reading earlier messages in the thread, that my distro documents 
that the value it defaults to for shared_buffers is rather small.

One of our servers is fairly pressed for memory (some of the time). Is there 
any way to measure the amount of churn in the shared_buffers, as a way of 
demonstrating that more is needed (or at this moment more would help)?

A few very small databases on this server, and one which is 768M (still pretty 
small but a lot bigger than the rest, most of which is logging information). 
The only hot information is the session table, ~9000 lines, one index on 
the session id. Can I ask Postgres to tell me, or estimate, how much memory 
this table would occupy if fully cached in memory?
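
The on-disk size is a reasonable proxy for how much memory the table would
occupy when fully cached, and it is queryable directly (these functions have
been available since 8.1, if I recall correctly):

SELECT pg_size_pretty(pg_relation_size('sessions'))       AS heap_only,
       pg_size_pretty(pg_total_relation_size('sessions')) AS with_indexes_and_toast;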

Half the problem in modern computing is knowing what is slow. In this case, 
counting the rows of the session table takes about 100ms. Deleting expired 
session rows takes about 120ms, more if it hasn't been done for a while, which 
I guess is evidence that the table isn't being cached in memory as efficiently 
as it could be.

In this case the server thinks the system I/O is zero for half the tools in 
use, because of the RAID hardware, so most of the Linux based tools are 
useless in this context.

At the risk of thread hijacking, for the session table I wonder if we are 
handling it in the most efficient way. It is just a regular table, indexed on 
session_id. Each request of note to the server requires retrieval of the 
session record, and often updating the expiry information. Every N requests 
the application also issues a:

DELETE FROM sessions WHERE expires < NOW() OR expires IS NULL;

Since there is no index on the expires column, it sequentially scans and deletes 
the stale records. I'm thinking that since the table is indexed for regular 
queries, making N larger has almost no obvious penalty, except that we 
accumulate a small number of stale records for longer. I'm not sure if an index 
on expires is worth it; it is probably too small to make much difference either way.
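
If the cleanup ever does become noticeable, the obvious experiment is cheap
(a sketch; note the IS NULL arm can only use a btree index from 8.3 onwards):

CREATE INDEX sessions_expires_idx ON sessions (expires);
EXPLAIN DELETE FROM sessions WHERE expires < NOW() OR expires IS NULL;
-- check whether the plan switches from Seq Scan to a bitmap/index scan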

As for Drupal on Postgres, it might be worth the effort for big 
implementations; I did it for a while, but doing it again I'd go with MySQL. 
Nothing to do with the database, everything to do with support for 3rd party 
add-ins, at least until Drupal gets to automated testing of these things 
routinely against different backends and configs. Perhaps that is all that is 
needed: a service for Drupal authors that tries their plugins against 
Postgres automatically and complains if it doesn't work?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] 7.4 - basic tuning question

2008-10-03 Thread Simon Waters
Hi,

we have a log table on one server with 1.9 million records.

One column event (type text) in that table is a string that (currently) 
takes a small number of distinct values (~43) (hmm that could have been 
normalised better).

We noted on querying for events of a specific type, that the queries were 
slower than expected. It simply wasn't using the index (btree, default 
settings) on this column on this server (the test server, with less records, 
was fine).

Using ALTER TABLE SET STATISTICS to increase the number of buckets to 50 
resolved the issue; we went pretty much straight there on discovering there 
are no HINTS.

However we aren't quite sure why this case was pathological, and my brain 
doesn't quite grok the documentation.

I assume that the histogram_bounds for strings are alphabetical in order, so 
that DEMOSTART falls between DELETE and IDEMAIL. Even on a worst case 
of including both these common values, the planner ought to have assumed that 
less than 10% of records were likely covered by the value selected, so it 
seems unlikely to me that not using the index would be a good idea.

What am I missing? (and yes there is a plan to upgrade!).


= SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
(...lots of time passes...)
 count
---
  1432
(1 row)


= SELECT COUNT(*), event FROM log GROUP BY event ORDER BY count;

  count |   event
+---
  6 | DNRFAIL
 14 | ADMDNR
 14 | UPGRADE
 18 | FOCRENEW
 21 | AUTOCN
 25 | ADMCC
 27 | TEMPIN
 31 | DNRCANCEL
 43 | EXPIRED
128 | DIRECTBUY
130 | CANCEL
130 | CANCELQ
154 | FOCBUY
173 | EXPCCWARN
179 | OFFER
209 | DNROK
214 | TEMPRE
356 | CCWARN
429 | ADMLOGIN
719 | SUBSCRIBE
787 | CCSUCCESS
988 | CCFAILURE
   1217 | TEMPNEW
   1298 | PAYPAL
   1431 | DEMOSTART
   1776 | CCREQUEST
   2474 | ACCTUPD
  15169 | SYSMAINT
  42251 | IDEMAIL
  46964 | DELETE
  50764 | RELOGIN
  57022 | NEWUSR
  64907 | PUBREC0
  65449 | UNPUBLISH
  92843 | LOGOUT
  99018 | KILLSESS
 128900 | UPLOAD
 134994 | LOGIN
 137608 | NEWPAGE
 447556 | PUBREC1
 489572 | PUBLISH


= EXPLAIN SELECT * FROM log WHERE event='DEMOSTART';
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on log  (cost=0.00..54317.14 rows=20436 width=93)
   Filter: (event = 'DEMOSTART'::text)
(2 rows)


= ALTER TABLE log ALTER COLUMN event SET STATISTICS 50; ANALYSE log(event);
ALTER TABLE
ANALYZE


= EXPLAIN SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=5101.43..5101.43 rows=1 width=0)
   ->  Index Scan using log_event on log  (cost=0.00..5098.15 rows=1310 width=0)
         Index Cond: (event = 'DEMOSTART'::text)
(3 rows)


= SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
(...almost no time passes...)
 count
---
  1432
(1 row)


BEFORE
pajax=> select * from pg_stats where tablename = 'log' and attname='event';
 schemaname        | public
 tablename         | log
 attname           | event
 null_frac         | 0
 avg_width         | 10
 n_distinct        | 25
 most_common_vals  | {PUBLISH,PUBREC1,NEWPAGE,UPLOAD,LOGIN,KILLSESS,LOGOUT}
 most_common_freqs | {0.257333,0.248333,0.072,0.0696667,0.061,0.054,0.0506667}
 histogram_bounds  | {ACCTUPD,DELETE,IDEMAIL,NEWUSR,NEWUSR,PUBREC0,PUBREC0,RELOGIN,SYSMAINT,UNPUBLISH,UNPUBLISH}
 correlation       | 0.120881
(1 row)

AFTER
pajax=> select * from pg_stats where tablename='log' and attname='event';
 schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation

Re: [PERFORM] Intel's X25-M SSD

2008-09-22 Thread Simon Riggs

On Mon, 2008-09-08 at 19:12 -0400, Greg Smith wrote:
 If like me you've been reading all the flash SSD drive reviews...

Great post, thanks for the information.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [HACKERS] [PERFORM] autovacuum: use case for indenpedent TOAST table autovac settings

2008-08-14 Thread Simon Riggs

On Wed, 2008-08-13 at 21:30 -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  It seems like we'll want to do it somehow.  Perhaps the cleanest way is
  to incorporate toast-table settings in the reloptions of the parent
  table.  Otherwise dump/reload is gonna be a mess.
 
  My question is whether there is interest in actually having support for
  this, or should we just inherit the settings from the main table.  My
  gut feeling is that this may be needed in some cases, but perhaps I'm
  overengineering the thing.
 
 It seems reasonable to inherit the parent's settings by default, in any
 case.  So you could do that now and then extend the feature later if
 there's real demand.

Yeh, I can't really see a reason why you'd want to treat toast tables
differently with regard to autovacuuming. It's one more setting to get
wrong, so no thanks.
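
The form that eventually shipped does inherit the parent's settings by
default, with toast.* reloptions available for the rare case where the TOAST
table really needs its own values. A sketch, table name hypothetical:

ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor       = 0.05,
    toast.autovacuum_vacuum_scale_factor = 0.05
);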

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 2GB or not 2GB

2008-06-01 Thread Simon Riggs

On Sat, 2008-05-31 at 11:53 -0700, Josh Berkus wrote:
 Simon,
 
  There is an optimum for each specific sort.
 
 Well, if the optimum is something other than as much as we can get, then we 
 still have a pretty serious issue with work_mem, no?

Depends upon your view of serious I suppose. I would say it is an
acceptable situation, but needs further optimization. I threw some ideas
around on Hackers around Dec/New Year, but I don't have time to work on
this further myself in this dev cycle. Further contributions welcome.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 2GB or not 2GB

2008-05-31 Thread Simon Riggs

On Wed, 2008-05-28 at 16:59 -0700, Josh Berkus wrote:

 sort_mem: My tests with 8.2 and DBT3 seemed to show that, due to 
 limitations of our tape sort algorithm, allocating over 2GB for a single 
 sort had no benefit.  However, Magnus and others have claimed otherwise.  
 Has this improved in 8.3?

There is an optimum for each specific sort. 

Your results cannot be used to make a global recommendation about the
setting of work_mem. So not finding any benefit in your tests *and*
Magnus seeing an improvement are not inconsistent events.
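
A sketch of how to find the optimum for one particular sort rather than
globally: vary work_mem per session and watch the Sort Method line that
EXPLAIN ANALYZE prints from 8.3 onwards (the table and column here are just
the DBT3/TPC-H ones, for illustration):

SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate;
-- note the "Sort Method" line: external merge (spills to disk) vs quicksort (in memory)
SET work_mem = '1GB';
EXPLAIN ANALYZE SELECT * FROM lineitem ORDER BY l_shipdate;
-- compare the sort node's actual time between the two runs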

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Outer joins and equivalence

2008-05-28 Thread Simon Riggs

On Wed, 2008-05-28 at 11:45 +0100, Matthew Wakeling wrote:
 On Tue, 27 May 2008, Simon Riggs wrote:
  I do recognise that we would *not* be able to deduce this form of SQL
 
  A JOIN B ON (a.id = c.id) LEFT JOIN C ON (b.id = c.id)
 
 Surely that would not be valid SQL?

You are right, but my point was about inferences during SQL planning,
not about initial analysis of the statement.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] GEQO Benchmark

2008-05-28 Thread Simon Riggs

On Wed, 2008-05-28 at 13:13 -0300, Tarcizio Bini wrote:

 Of course, the geqo_threshold can be changed so that the geqo be
 performed in queries that have less than 12 tables. However, we aim to
 test the GEQO algorithm in conditions where the standard algorithm
 (dynamic programming) has a high cost to calculate the query plan.

My understanding is the GEQO cannot arrive at a better plan than the
standard optimizer, so unless you wish to measure planning time there
isn't much to test. What is the quality of the plans it generates? Well
that varies according to the input; sometimes it gets the right plan,
other times it doesn't get close.
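
One way to test exactly that is to plan the same many-table join twice, once
with each planner, and compare both the estimated cost of the resulting plans
and how long planning itself takes. A self-contained sketch (the pg_class
self-join is only there to give the planner several relations to order):

\timing on
SET geqo = off;
EXPLAIN SELECT count(*)
  FROM pg_class a, pg_class b, pg_class c, pg_class d, pg_class e
 WHERE a.oid = b.oid AND b.oid = c.oid AND c.oid = d.oid AND d.oid = e.oid;

SET geqo = on;
SET geqo_threshold = 4;   -- default is 12; lowered so GEQO handles this 5-way join
EXPLAIN SELECT count(*)
  FROM pg_class a, pg_class b, pg_class c, pg_class d, pg_class e
 WHERE a.oid = b.oid AND b.oid = c.oid AND c.oid = d.oid AND d.oid = e.oid;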

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Outer joins and equivalence

2008-05-27 Thread Simon Riggs

I have a complex query where making a small change to the SQL increases
run-time by > 1000 times.

The first SQL statement is of the form

A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) 

and the second is like this

A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id)

the only difference is the substitution of a -> b

This has been verified by examining EXPLAIN of SQL1, SQL2 and SQL1. The
first and third EXPLAINs are equivalent. All ANALYZE etc has been run.
All relevant join columns are INTEGERs. So we have a repeatable
difference in plans attributable to a single change.

The difference in run time occurs because the second form of the query
uses a SeqScan of a large table, whereas the first form is able to use a
nested loops join to access the large table, which then allows it to
access just 3 rows rather than 85 million rows.

There is a clear equivalence between the two forms of SQL, since the
equivalence a = b is derived from a natural rather than an outer join.
This can be applied from the left side to the right side of the join. 

So this looks to me like either a bug or just an unimplemented optimizer
feature. The code I've just looked at for equivalence-class relationships
appears to use this only to propagate constant info, so I'm thinking it is
not a bug, and hence why it is reported here and not to pgsql-bugs.

I do recognise that we would *not* be able to deduce this form of SQL

A JOIN B ON (a.id = c.id) LEFT JOIN C ON (b.id = c.id)

though that restriction on outer join equivalence is not relevant here.

(SQL, EXPLAINs etc available off-list only, by request).

I'm looking into this more now.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread Simon Riggs

On Fri, 2008-05-16 at 00:31 -0600, kevin kempter wrote:

I'm running the join shown below and it takes > 10 hours and  
 eventually runs out of disk space on a 1.4TB file system

Well, running in 10 hours doesn't mean there's a software problem, nor
does running out of disk space.

Please crunch some numbers before you ask, such as how much disk space
was used by the query and how big you'd expect it to be, plus provide
information such as what the primary key of the large table is and what
your release level is, etc.

Are you sure you want to retrieve an estimated 3 billion rows? Can you
cope if that estimate is wrong and the true figure is much higher? Do
you think the estimate is realistic?
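
A couple of cheap numbers to crunch before re-running a join like that (table
names below are placeholders): the planner's row estimate from a plain EXPLAIN
of the join, plus the size of the inputs:

SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS on_disk,
       reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname IN ('fact_detail', 'dim_urls');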

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Simon Riggs

On Fri, 2008-05-16 at 14:05 -0400, Greg Smith wrote:
 After reading the code and that documentation a bit, the part I'm
 still not sure about is whether the CLOG entry is created when the XID
 is assigned and then kept current as the state changes, or whether
 that isn't even in CLOG until the transaction is committed.  It seems
 like the latter, but there's some ambiguity in the wording and too
 many code paths for me to map right now.

Alvaro already said this, I thought? The clog is updated only at sub or
main transaction end, thank goodness. When the transactionid is assigned,
the page of the clog that contains that transactionid is checked to see
if it already exists and, if not, it is initialised.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Backup causing poor performance - suggestions

2008-05-05 Thread Simon Riggs
On Mon, 2008-05-05 at 09:10 -0700, Craig James wrote:
 Campbell, Lance wrote:
  We currently backup all of our database tables per schema using pg_dump 
  every half hour.  We have been noticing that the database performance 
  has been very poor during the backup process.  How can I improve the 
  performance?
 
 It sounds like the goal is to have frequent, near-real-time backups of
 your databases for recovery purposes.  Maybe instead of looking at
 pg_dump's performance, a better solution would be a replication system
 such as Slony, or a warm backup using Skype Tools.
 
 Backing up the database every half hour puts a large load on the
 system during the dump, and means you are re-dumping the same data, 48
 times per day.  If you use a replication solution, the backup process
 is continuous (spread out through the day), and you're not re-dumping
 static data; the only data that moves around is the new data.
 
 I've used Slony with mixed success; depending on the complexity and
 size of your database, it can be quite effective.  I've heard very
 good reports about Skype Tools, which has both a Slony-like replicator
 (not as configurable as Slony, but easier to set up and use), plus an
 entirely separate set of scripts that simplifies warm standby using
 WAL logging.

I think we should mention Warm Standby via pg_standby, which is part of
core software and documentation. Seems strange not to mention it at all.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SQL Function Slowness, 8.3.0

2008-04-17 Thread Simon Riggs
On Wed, 2008-04-16 at 11:09 -0400, Tom Lane wrote:
 Gavin M. Roy [EMAIL PROTECTED] writes:
  In 8.3.0, I'm seeing some oddities with SQL functions which I thought were
  immune to the planner data restrictions of plpgsql functions and the sort.
 
 Without a specific example this discussion is pretty content-free, but
 in general SQL functions face the same hazards of bad parameterized
 plans as plpgsql functions do.

I think it would help if there was some way to prepare functions to
allow them to be posted and understood more easily. These would help:

* a name obfuscator, so people can post functions without revealing the
inner workings of their company or potentially losing intellectual
property rights over code posted in that way

* a pretty printer, so we can better understand them when we see 'em

Without these, I think we need to realise that many people will never
post their SQL at all.
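
In the meantime, the parameterised-plan hazard Tom mentions can be
demonstrated without posting any private code at all, using PREPARE against
any table with a skewed column (names here are hypothetical):

PREPARE by_event(text) AS
    SELECT count(*) FROM log WHERE event = $1;

EXPLAIN EXECUTE by_event('RARE_VALUE');                      -- parameterised plan (generic on releases of that era)
EXPLAIN SELECT count(*) FROM log WHERE event = 'RARE_VALUE'; -- plan built for this specific value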

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


  1   2   3   4   >