[PERFORM] Connection pooling - Number of connections

2014-03-21 Thread Guillaume Smet
Hi all,

Brett Wooldridge, the creator of HikariCP [1] - a high performance
Java connection pool - is contemplating changing the way pooling is
done in HikariCP so that a fixed-size pool of connections is always
kept open.

No maxPoolSize, no minIdle, no minPoolSize, just a poolSize parameter
which sets the size of the pool. At application startup, all the
connections are opened and maintained by the pool throughout the life
of the application.

The basic idea is that if you decide that your application might need
100 connections at a time, you set poolSize to 100 and HikariCP
keeps 100 connections open.

I recall very old posts on this list where people were talking about
code paths sensitive to the number of connections open (or even
max_connections) and that it wasn't such a good idea to keep
connections open if they were not really needed.

As a lot of scalability work has been done since those (very old)
days, I was wondering if this is still the rule of thumb or if Brett's
idea of completely simplifying connection management is the way to
go.

It seems that at least one other pool implementation is going this way
so I thought it might be a good idea to get the opinion of the
database side of things. That way, it will be easier to make a well
informed decision.

Thanks in advance for your comments/advice.

-- 
Guillaume

[1] https://github.com/brettwooldridge/HikariCP




Re: [PERFORM] Connection pooling - Number of connections

2014-03-21 Thread Guillaume Smet
On Fri, Mar 21, 2014 at 4:49 PM, David Johnston pol...@yahoo.com wrote:
 Consider this train-of-thought: no matter how large the pool size, if you
 are constantly keeping, say, 90% of the connections actively working, then
 having, on average, 10% of the connections sitting idle is probably not
 going to be noticeable on the server, and the reduction in overhead of
 managing a pool is typically a net positive. Now, I have no clue what
 percentage is actually true, or under what conditions and pool sizes it may
 vary, but that is a calculation that someone deciding between managed and
 un-managed pools would need to make.

Sure.

The big question is whether it is suited for general-purpose use, or
whether having 100 connections open when only 10 are necessary at a
given time causes unnecessary contention, spinlock issues, performance
overhead or whatever...

-- 
Guillaume




Re: [PERFORM] Connection pooling - Number of connections

2014-03-21 Thread Guillaume Smet
Hi Tom,

On Fri, Mar 21, 2014 at 5:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It will cost you, in ProcArray scans for example.  But lots-of-idle-
 connections is exactly what a pooler is supposed to prevent.  If you have
 a server that can handle say 10 active queries, you should have a pool
 size of 10, not 100.  (If you have a server that can actually handle
 100 active queries, I'd like to have your IT budget.)

 The proposed design sounds fairly reasonable to me, as long as users are
 clear on how to set the pool size --- and in particular that bigger is
 not better.  Clueless users could definitely shoot themselves in the
 foot, though.

Yeah, well.

My understanding of what happens in the field is that people usually
set the pool size limit quite high because they don't want to
experience connection starvation if there is a temporary slowdown
of their application/database.

Is the overhead of having 100 connections open noticeable, or is it
something you'd rather avoid but which is not really harmful?

Thanks.

-- 
Guillaume




Re: [PERFORM] Connection pooling - Number of connections

2014-03-21 Thread Guillaume Smet
Hi Sethu,

On Fri, Mar 21, 2014 at 6:51 PM, Sethu Prasad sethuprasad...@gmail.com wrote:
 So here the connection creation is the costliest of all the
 management tasks, so keeping the connections ready is the best option.

That's why you often have a minIdle parameter which allows you to
create idle connections in advance.

 The poolSize parameter is very good when the application owner knows
 what the optimal number is, after having analysed the application
 performance with the history of previous settings and the improvements
 made on it. Server sizing always shows up in this sort of analysis.

That supposes you actually do this work. From my experience, most of
the not so demanding apps are put into production without this sort of
detailed analysis.

You do it for your critical high throughput applications, not for the others.

That said, interesting discussion. Not exactly what I expected.

-- 
Guillaume




Re: [PERFORM] Any idea on how to improve the statistics estimates for this plan?

2012-12-08 Thread Guillaume Smet
(cough cough, missed the Reply to all button)

Hi Jeff,

On Sat, Dec 8, 2012 at 3:32 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 If those estimates are better, it probably means that your filter
 condition is picking a part of the el JOIN l that has much different
 selectivity to r than the full set does, and PostgreSQL has no way of
 knowing that.

It's certainly that. The fact is that this query is OK on most of the
French territory but it doesn't behave well on the Paris area in
particular. As the query is supposed to return the shows you can book,
the selectivity is quite different: Paris has a lot of places AND
places organize a lot more shows in Paris than in the rest of France.
I was hoping that the high number of places would be enough to
circumvent the second fact, which is much harder for PostgreSQL to
figure out, but it looks like it's not.

Is there any way I could mitigate this issue by playing with planner
knobs? I don't remember having seen something I could use for
selectivity (such as the n_distinct stuff). It's not that big a deal
if it's a little worse elsewhere as there are far fewer places so the
effects of a bad plan are more contained.

-- 
Guillaume




[PERFORM] Any idea on how to improve the statistics estimates for this plan?

2012-12-05 Thread Guillaume Smet
Hi,

I've been struggling with a query for some time and the major problem
of the query is that the statistics are way off for a particular operation:
 ->  Nested Loop  (cost=3177.72..19172.84 rows=*2* width=112) (actual
time=139.221..603.929 rows=*355331* loops=1)
 Join Filter: (l.location_id = r.location_id)
 ->  Hash Join  (cost=3177.71..7847.52 rows=*33914* width=108)
(actual time=138.343..221.852 rows=*36664* loops=1)
 Hash Cond: (el.location_id = l.location_id)
 ...
 ->  Index Scan using idx_test1 on representations r
(cost=0.01..0.32 rows=*1* width=12) (actual time=0.002..0.008
rows=*10* loops=36664)
 ...
(extracted from the original plan which is quite massive)

I tried to improve the statistics of l.location_id, el.location_id,
r.location_id and idx_test1.location_id (up to 5000) but it doesn't
get better.

Any idea on how I could get better statistics in this particular
example, and why the estimate of the nested loop is so wrong while the
ones for the individual operations are quite good?

This is with PostgreSQL 9.2.1.

Thanks.

-- 
Guillaume




Re: [PERFORM] Two different execution plan for the same request

2010-07-07 Thread Guillaume Smet
Hi Nicolas,

On Wed, Jul 7, 2010 at 10:47 AM, JOUANIN Nicolas (44)
nicolas.joua...@dgfip.finances.gouv.fr wrote:
 There were no modifications made on the database except a restart yesterday
 evening and a vacuumdb --analyse run at night.

It's not really surprising considering you probably kept
default_statistics_target at 10 (it's the default in 8.3).

Consider raising it to 100 in your postgresql.conf (100 is the default
for newer versions), then reload, and run a new ANALYZE.

You might need to set it higher on specific columns if you have a lot
of data and your data distribution is weird.
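
The steps, as a sketch (the table and column names in the second part
are illustrative, not from the original report):

-- after raising default_statistics_target in postgresql.conf:
SELECT pg_reload_conf();
ANALYZE;
-- and for a specific column with a skewed distribution:
ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 500;
ANALYZE some_table;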

And, btw, please upgrade to the latest 8.3.x.

HTH

-- 
Guillaume



Re: [PERFORM] Performance regression between 8.3 and 8.4 on heavy text indexing

2009-08-27 Thread Guillaume Smet
2009/8/27 Gaël Le Mignot g...@pilotsystems.net:
 The weird thing was that with the default of 100 for statistics
 target, it was worse than when we moved back to 10. So I didn't try
 with 1000, but I should have.

When you have so much data and a statistics target so low, you can't
expect the sample taken to be representative: between different runs
of ANALYZE, you can have totally different estimates and thus totally
different plans. You were just lucky at 10 and unlucky at 100.

Thanks for your feedback and it's nice to see your problem solved.

-- 
Guillaume



Re: [PERFORM] Performance regression between 8.3 and 8.4 on heavy text indexing

2009-08-26 Thread Guillaume Smet
On Wed, Aug 26, 2009 at 6:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 g...@pilotsystems.net (Gaël Le Mignot) writes:
 So it seems it was quite wrong about estimated matching rows (192 predicted,
 10222 real).

 Yup.  What's even more interesting is that it seems the real win would
 have been to use just the 'claude & duviau' condition (which apparently
 matched only 14 rows).  8.3 had no hope whatever of understanding that,
 it just got lucky.  8.4 should have figured it out, I'm thinking.
 Does it help if you increase the statistics target for fulltext_tsv?
 (Don't forget to re-ANALYZE after doing so.)

It could be interesting to run the query without the condition
(keywords_tsv @@ '''assassinat'''::tsquery) to see the estimate of
(fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) in 8.4.

Btw, what Tom means by increasing the statistics is executing the
following queries:
ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 500;
ANALYZE;
-- run your query with EXPLAIN ANALYZE
ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 1000;
ANALYZE;
-- run your query with EXPLAIN ANALYZE
ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 5000;
ANALYZE;
-- run your query with EXPLAIN ANALYZE

to see if it improves the estimates.

-- 
Guillaume



Re: [PERFORM] Performance regression between 8.3 and 8.4 on heavy text indexing

2009-08-23 Thread Guillaume Smet
Hi Gaël,

On Fri, Aug 21, 2009 at 3:37 PM, Gaël Le Mignot g...@pilotsystems.net wrote:
 With 8.3 ::

  Limit  (cost=752.67..752.67 rows=1 width=24)
  (11 rows)

 With 8.4 ::
  (8 rows)

Could you provide us the EXPLAIN *ANALYZE* output of both plans?

From what I can see, one of the differences is that the row estimates
are divided by 3 for this part of the query:
8.3 ->  Bitmap Index Scan on keywords_index  (cost=0.00..48.97 rows=574 width=0)
8.4 ->  Bitmap Index Scan on keywords_index  (cost=0.00..14.03 rows=192 width=0)

It might be interesting to see if 8.4 is right or not.

Before 8.4, the selectivity for full text search was a constant (as
you can see in your 8.3 plan: the row estimates are equal in both
bitmap index scans). 8.4 is smarter, which might lead to other plans.

-- 
Guillaume



Re: [PERFORM] Full statement logging problematic on larger machines?

2009-03-12 Thread Guillaume Smet
On Thu, Mar 12, 2009 at 2:05 AM, Andrew Dunstan and...@dunslane.net wrote:
 It is buffered at the individual log message level, so that we make sure we
 don't multiplex messages. No more than that.

OK. So if the OP can afford multiplexed queries by using a log
analyzer supporting them, it might be a good idea to try syslog with
full buffering.

-- 
Guillaume



Re: [PERFORM] Full statement logging problematic on larger machines?

2009-03-11 Thread Guillaume Smet
On Wed, Mar 11, 2009 at 8:27 PM, Frank Joerdens fr...@joerdens.de wrote:
 This works much better but once we are at about 80% of peak load -
 which is around 8000 transactions per second currently - the server goes
 into a tailspin in the manner described above and we have to switch off full
 logging.

First, don't use log_duration = on + log_statement = 'all' to log all
the queries; use log_min_duration_statement = 0, it's less verbose.

I don't know if the logging integrated into PostgreSQL can buffer
its output. Andrew? If not, you should try syslog instead and see if
asynchronous logging with syslog helps (you need to prefix the
path with a dash to enable asynchronous logging). You can also try to
send the logs over the network via UDP (or TCP if you have an
enhanced syslog-like daemon).
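
A sketch of such a setup (the log file name and the syslog facility
are examples, not taken from the OP's configuration):

# postgresql.conf
log_destination = 'syslog'
syslog_facility = 'LOCAL0'

# /etc/syslog.conf - the leading dash disables the fsync after each entry
local0.*    -/var/log/pgsql-full.log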

Another option is to log the duration of every query but not the text.
We used to have this sort of configuration to gather comprehensive
statistics and the slowest queries on highly loaded servers (it's not
perfect but it can be an acceptable compromise):
log_duration = on
log_min_duration_statement = 100

-- 
Guillaume



Re: [JDBC] [PERFORM] Query much slower when run from postgres function

2009-03-09 Thread Guillaume Smet
On Mon, Mar 9, 2009 at 5:51 PM, Guillaume Cottenceau g...@mnc.ch wrote:
 Until it's possible to specifically tell the JDBC driver (and/or
 PG?) to not plan once for all runs (or is there something better
 to think of?), or the whole thing would be more clever (off the
 top of my head, PG could try to replan with the first actual
 values - or first xx actual values - and if the plan is
 different, then flag that prepared statement for replanning each
 time if the overall time estimate is different enough), I've
 opted to tell the JDBC driver to use the protocol version 2, as
 prepared statements were not so much prepared back then (IIRC
 parameter interpolation is performed in driver and the whole SQL
 query is passed each time, parsed, and planned) using
 protocolVersion=2 in the JDBC URL. So far it worked very well for
 us.

Unnamed prepared statements are planned after binding the values,
starting with 8.3 - or more precisely starting with 8.3.2, as early
8.3 versions were partially broken in this regard.

It's not always possible to use protocol version 2 as it's quite
limited (especially considering the exceptions returned).

-- 
Guillaume



Re: [PERFORM] Abnormal performance difference between Postgres and MySQL

2009-02-23 Thread Guillaume Smet
On Tue, Feb 24, 2009 at 12:27 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 If it's not C then string compares are going to probably need special
 indexes to work the way you expect them. (varchar pattern ops).  Look
 here for more information:

 http://www.postgresql.org/docs/8.3/static/indexes-opclass.html

It's only relevant for pattern matching (e.g. LIKE or regexp). AFAICS,
the OP only uses plain equality in his query.

An EXPLAIN ANALYZE output would be nice to diagnose the problem.

-- 
Guillaume



Re: [PERFORM] Poor plan choice in prepared statement

2009-01-01 Thread Guillaume Smet
On Wed, Dec 31, 2008 at 5:01 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 I think it has been shown enough times that the performance drop caused
 by a worse plan can be orders of magnitudes worse than what's gained by
 producing the plan only once.  It does not seem a bad idea to provide a
 way to carry out only the parse phase, and postpone planning until the
 parameters have been received.

It's already done in 8.3 for unnamed plans, isn't it?

-- 
Guillaume



Re: [PERFORM] Poor plan choice in prepared statement

2009-01-01 Thread Guillaume Smet
On Thu, Jan 1, 2009 at 9:24 PM,  da...@lang.hm wrote:
 forgive my ignorance here, but if it's unnamed how can you reference it
 later to take advantage of the parsing?

You can't. That's what unnamed prepared statements are for.

It's not obvious to me that the parsing phase is worth any caching.
From my experience, the planning phase takes far more time on complex
queries.

-- 
Guillaume



Re: [PERFORM] Poor plan choice in prepared statement

2008-12-31 Thread Guillaume Smet
On Tue, Dec 30, 2008 at 7:59 PM, bricklen brick...@gmail.com wrote:
 I would like to continue to use bind variables to prevent sql
 injection, but I'd like to force a plan re-parse for every single
 query (if necessary?)

As far as I understand your problem, you don't care about using
prepared statements.

If so, you can:
- either use pg_query_params():
http://www.php.net/manual/en/function.pg-query-params.php
- or, if for some reason you really need to go through the prepared
statement API, use an unnamed prepared statement when you don't want a
cached plan: you can specify an empty string as the plan name. The
condition for this one is that you need to upgrade to a recent version
of 8.3, as postponed planning of unnamed prepared statements is a new
feature of 8.3 and was broken in 8.3.0 and 8.3.1.
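
For reference, this is the behaviour you want to avoid: at the SQL
level, a named prepared statement is planned at PREPARE time, before
any parameter value is known (a sketch with illustrative table/column
names):

PREPARE find_items(text) AS
    SELECT * FROM items WHERE path LIKE $1;  -- generic plan is built here
EXECUTE find_items('1062/%');  -- reuses the generic plan, whatever the value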

-- 
Guillaume



Re: [PERFORM] Less rows - better performance?

2008-07-21 Thread Guillaume Smet
On Mon, Jul 21, 2008 at 1:25 PM, Andreas Hartmann [EMAIL PROTECTED] wrote:
 SELECT pg_database.datname,
 pg_size_pretty(pg_database_size(pg_database.datname)) AS size
 FROM pg_database where pg_database.datname = 'vvz_live_1';

    datname   |  size
 -------------+---------
  vvz_live_1  | 2565 MB

 I wonder why the actual size is so much bigger than the data-only dump - is
 this because of index data etc.?

More probably because the database is totally bloated. Do you run
VACUUM regularly or did you set up autovacuum?
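
If autovacuum is not set up yet, enabling it on a pre-8.3 server is a
matter of (a sketch, assuming an 8.1/8.2-era postgresql.conf - on 8.3
autovacuum is already on by default):

autovacuum = on
stats_start_collector = on   # the stats collector is required by autovacuum
stats_row_level = on         # row-level stats are required too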

-- 
Guillaume



Re: [PERFORM] Typecast bug?

2008-06-26 Thread Guillaume Smet
On Thu, Jun 26, 2008 at 9:02 AM, Frank Joerdens [EMAIL PROTECTED] wrote:
 Which version are you using? 8.3 removes a lot of implicit casts (all?
 not sure), so this may already be your fix.

8.3 only removed implicit casts from non-text types to text (date ->
text, int -> text, interval -> text...) to avoid unexpected
behaviours.

-- 
Guillaume



[PERFORM] Index creation time and distribution

2008-05-22 Thread Guillaume Smet
Hi -performance,

I experienced a performance problem this morning when we imported a
dump into an 8.1 database.

The table is 5 million rows large and when the dump creates an index
on a specific text column called clazz, it takes 27 minutes while on
the other columns it only takes a few tens of seconds:
LOG:  duration: 1636301.317 ms  statement: CREATE INDEX
index_journal_clazz ON journal USING btree (clazz);
LOG:  duration: 20613.009 ms  statement: CREATE INDEX
index_journal_date ON journal USING btree (date);
LOG:  duration: 10653.290 ms  statement: CREATE INDEX
index_journal_modifieur ON journal USING btree (modifieur);
LOG:  duration: 15031.579 ms  statement: CREATE INDEX
index_journal_objectid ON journal USING btree (objectid);

The only weird thing about this column is that 4.7 million rows have
the exact same value. A partial index excluding this value is really
fast to create but, as the database is used via JDBC and prepared
statements, this index is totally useless (the plan is created before
the BIND so it can't use the partial index). FWIW we can't use
?protocolVersion=2 with this application so it's not an option.
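
Such a partial index would look something like this (a sketch - the
dominant value is written as a placeholder here):

CREATE INDEX index_journal_clazz_partial ON journal USING btree (clazz)
WHERE clazz <> 'the_dominant_value';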

As part of the deployment process of this application, we often need
to drop/create/restore the database and 25 minutes is really longer
than we can afford.

So my questions are:
- is the index creation time really that correlated with the
distribution? I was quite surprised by this behaviour. The time is
essentially CPU time.
- if not, what can I check to diagnose this problem?
- IIRC, 8.3 could allow me to use the partial index as the query
should be planned after the BIND (plans are unnamed). Am I right?

Thanks for any input.

-- 
Guillaume



Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Guillaume Smet
On Thu, May 22, 2008 at 3:14 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Do you have maintenance_work_mem set large enough that the index
 creation sort is done in-memory?  8.1 depends on the platform's qsort
 and a lot of them are kinda pessimal for input like this.

FWIW, it's a 32-bit CentOS 4.6 box.

maintenance_work_mem is set to 256 MB and the size of the index is 400 MB.

Should I try to raise it up to 512 MB? The server only has 2GB of RAM
so it seems a bit high.

 8.2 (which uses our own qsort) seems to perform better in a quick
 test.

Mmmmh OK. I was considering an upgrade to 8.3 in the next months anyway.

Do we agree that in the case of unnamed prepared statements, 8.3 plans
the query after the BIND? The partial index seems to be a better
solution anyway, considering that it's 12 MB vs 400 MB.

Thanks.

-- 
Guillaume



Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Guillaume Smet
On Thu, May 22, 2008 at 9:18 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Ermm .. this is in fact mostly broken in 8.3.0 and 8.3.1.  If you don't
 want to wait for 8.3.2, you need this patch:
 http://archives.postgresql.org/pgsql-committers/2008-03/msg00566.php

That's what I had in mind. We have to test a lot of things before even
considering an upgrade so that's not really a problem for us to wait
for 8.3.2.

Thanks.

-- 
Guillaume



Re: [PERFORM] multiple joins + Order by + LIMIT query performance issue

2008-05-06 Thread Guillaume Smet
Antoine,

On Tue, May 6, 2008 at 5:03 PM, Antoine Baudoux [EMAIL PROTECTED] wrote:
  Limit  (cost=23981.18..23981.18 rows=1 width=977)
->  Sort  (cost=23981.18..23981.18 rows=1 width=977)
  Sort Key: this_.c_date

Can you please provide the EXPLAIN ANALYZE output instead of EXPLAIN?

Thanks.

-- 
Guillaume



Re: [PERFORM] pgfouine - commit details?

2008-05-06 Thread Guillaume Smet
Josh,

On Tue, May 6, 2008 at 11:10 PM, Josh Cole [EMAIL PROTECTED] wrote:
 We are using pgfouine to try and optimize our database at this time.  Is
 there a way to have pgfouine show examples or breakout commits?

I hesitated before deciding not to implement this idea. The problem is
that you often don't log everything (you use log_min_duration_statement)
and thus you don't have all the queries of the transaction in your log
file (and you usually don't have the BEGIN; command in the logs).

-- 
Guillaume



Re: [PERFORM] Recomendations on raid controllers raid 1+0

2008-03-13 Thread Guillaume Smet
Glyn,

On Thu, Mar 13, 2008 at 1:33 PM, Glyn Astill [EMAIL PROTECTED] wrote:
  I'm looking at switching out the perc5i (lsi megaraid) cards from our
  Dell 2950s for something else as they're crap at raid 10.

Do you have numbers? Perc 5/i cards perform quite well for us (we have
an 8-disk RAID 10 in a 2900 server with the traditional Perc 5/i).

--
Guillaume



Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Guillaume Smet
On Feb 5, 2008 12:47 PM, Viviane Lestic [EMAIL PROTECTED] wrote:
 Could someone help me analyze this problem?
 I don't manage to see if the problem comes from bad tsearch tuning,
 postgresql configuration, or something else...

Can you try to replace zoneindex_test @@ q with zoneindex_test @@
to_tsquery('partir')? Increasing the statistics for zoneindex_test may
be a good idea too (see the ALTER TABLE ... ALTER COLUMN documentation).
I'm surprised you have the word partir in so many documents. Are you
using real data?

--
Guillaume



Re: [PERFORM] 8.3rc1 Out of memory when performing update

2008-01-28 Thread Guillaume Smet
On Jan 25, 2008 5:50 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Hmm.  I think what that really means is you haven't got to the part of
 the query where the leak is :-(.  In my attempt to reproduce this
 I found that 8.3 has introduced a memory leak into the RI trigger
 support, such that even if an UPDATE doesn't change the FK columns
 it's still likely to leak a few dozen bytes per updated row.

 Please see if the attached patch makes it better for you.

Just FYI, somebody on #postgresql had the exact same problem of
memory leaks during UPDATE yesterday and your patch fixed it for him too.

--
Guillaume



Re: [PERFORM] Workaround for cross column stats dependency

2008-01-23 Thread Guillaume Smet
On Jan 23, 2008 3:02 AM, Guillaume Smet [EMAIL PROTECTED] wrote:
 I'll post my results tomorrow morning.

It works perfectly well:
cityvox_prod=# CREATE OR REPLACE FUNCTION
getTypesLieuFromTheme(codeTheme text) returns text[] AS
$f$
SELECT ARRAY(SELECT codetylieu::text FROM rubtylieu WHERE codeth = $1);
$f$ LANGUAGE SQL IMMUTABLE;
CREATE FUNCTION

cityvox_prod=# EXPLAIN ANALYZE SELECT vq.codequar, vq.liblong, vq.libcourt
FROM lieu l, vilquartier vq, genrelieu gl, lieugelieu lgl
WHERE l.codequar = vq.codequar AND l.dfinvalidlieu is null AND
vq.codevil = 'MUC' AND lgl.numlieu = l.numlieu AND lgl.codegelieu =
gl.codegelieu
AND gl.codetylieu = ANY(getTypesLieuFromTheme('RES'))
GROUP BY vq.codequar, vq.liblong, vq.libcourt, vq.flagintramuros
ORDER BY vq.flagintramuros, vq.liblong;

 QUERY PLAN
---
 Sort  (cost=5960.02..5960.08 rows=26 width=43) (actual
time=7.467..7.475 rows=13 loops=1)
   Sort Key: vq.flagintramuros, vq.liblong
   Sort Method:  quicksort  Memory: 26kB
   ->  HashAggregate  (cost=5959.15..5959.41 rows=26 width=43) (actual
time=7.421..7.428 rows=13 loops=1)
 ->  Hash Join  (cost=7.32..5944.52 rows=1463 width=43)
(actual time=0.241..7.212 rows=167 loops=1)
   Hash Cond: ((lgl.codegelieu)::text = (gl.codegelieu)::text)
   ->  Nested Loop  (cost=0.00..5898.00 rows=6552
width=47) (actual time=0.038..6.354 rows=973 loops=1)
 ->  Nested Loop  (cost=0.00..4585.64 rows=3845
width=47) (actual time=0.031..1.959 rows=630 loops=1)
   ->  Index Scan using
idx_vilquartier_codevil on vilquartier vq  (cost=0.00..34.06 rows=47
width=43) (actual time=0.015..0.047 rows=47 loops=1)
 Index Cond: ((codevil)::text = 'MUC'::text)
   ->  Index Scan using idx_test on lieu l
(cost=0.00..95.53 rows=105 width=9) (actual time=0.008..0.024 rows=13
loops=47)
 Index Cond: ((l.codequar)::text =
(vq.codequar)::text)
 ->  Index Scan using
idx_lieugelieu_numlieu_principal on lieugelieu lgl  (cost=0.00..0.32
rows=2 width=8) (actual time=0.003..0.004 rows=2 loops=630)
   Index Cond: (lgl.numlieu = l.numlieu)
   ->  Hash  (cost=6.22..6.22 rows=88 width=4) (actual
time=0.146..0.146 rows=88 loops=1)
 ->  Bitmap Heap Scan on genrelieu gl
(cost=2.23..6.22 rows=88 width=4) (actual time=0.022..0.075 rows=88
loops=1)
   Recheck Cond: ((codetylieu)::text = ANY
('{RES}'::text[]))
   ->  Bitmap Index Scan on ind_genrelieu2
(cost=0.00..2.21 rows=88 width=0) (actual time=0.016..0.016 rows=88
loops=1)
 Index Cond: ((codetylieu)::text = ANY
('{RES}'::text[]))
 Total runtime: 7.558 ms

It seems like a good tip to keep in mind.

Thanks for your help.

--
Guillaume



[PERFORM] *_cost recommendation with 8.3 and a fully cached db

2008-01-23 Thread Guillaume Smet
Hi Tom,

On May 9, 2007 6:40 PM, Tom Lane [EMAIL PROTECTED] wrote:
 To return to your original comment: if you're trying to model a
 situation with a fully cached database, I think it's sensible
 to set random_page_cost = seq_page_cost = 0.1 or so.

Is it still valid for 8.3 or is there any reason to change this
recommendation, considering the work you did on the planner during the
8.3 cycle?
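
For reference, the settings in question, which can also be tried per
session before touching postgresql.conf (a sketch):

SET seq_page_cost = 0.1;     -- 8.2 and later
SET random_page_cost = 0.1;  -- same cost: assume everything is cached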

Thanks.

--
Guillaume



Re: [PERFORM] 8.3 synchronous_commit

2008-01-22 Thread Guillaume Smet
On Jan 22, 2008 9:32 AM, Florian Weimer [EMAIL PROTECTED] wrote:
  Maybe it's just my test box.. single SATA-II drive, XFS on top of LVM.

 Ours was ext3, no LVM or RAID.

Also with SATA? If your SATA disk is lying about effectively SYNCing
the data, I'm not that surprised you don't see any improvement. Being
slower is a bit surprising though.

--
Guillaume



[PERFORM] Workaround for cross column stats dependency

2008-01-22 Thread Guillaume Smet
Hi -performance,

While testing 8.3, I found this query which is equally slow on 8.1 and
8.3 and seems really slow for a not-so-complex query. The stats
are as good as possible and the behaviour of PostgreSQL seems
logical considering the stats, but I'm looking for a workaround to
speed up this query.

So here is the original query:
cityvox_prod=# EXPLAIN ANALYZE SELECT vq.codequar, vq.liblong, vq.libcourt
FROM lieu l, vilquartier vq, rubtylieu rtl, genrelieu gl, lieugelieu lgl
WHERE l.codequar = vq.codequar AND l.dfinvalidlieu is null AND
vq.codevil = 'MUC' AND lgl.numlieu = l.numlieu AND lgl.codegelieu =
gl.codegelieu
AND gl.codetylieu = rtl.codetylieu AND rtl.codeth = 'RES' -- the
interesting part is here
GROUP BY vq.codequar, vq.liblong, vq.libcourt, vq.flagintramuros
ORDER BY vq.flagintramuros, vq.liblong;

  QUERY PLAN

 Sort  (cost=2773.18..2773.25 rows=26 width=43) (actual
time=602.822..602.829 rows=13 loops=1)
   Sort Key: vq.flagintramuros, vq.liblong
   Sort Method:  quicksort  Memory: 26kB
   ->  HashAggregate  (cost=2772.31..2772.57 rows=26 width=43) (actual
time=602.769..602.778 rows=13 loops=1)
 ->  Hash Join  (cost=2737.48..2769.83 rows=248 width=43)
(actual time=601.999..602.580 rows=167 loops=1)
   Hash Cond: ((vq.codequar)::text = (l.codequar)::text)
   ->  Bitmap Heap Scan on vilquartier vq
(cost=1.91..27.50 rows=47 width=43) (actual time=0.032..0.067 rows=47
loops=1)
 Recheck Cond: ((codevil)::text = 'MUC'::text)
 ->  Bitmap Index Scan on idx_vilquartier_codevil
(cost=0.00..1.90 rows=47 width=0) (actual time=0.023..0.023 rows=47
loops=1)
   Index Cond: ((codevil)::text = 'MUC'::text)
   ->  Hash  (cost=2646.25..2646.25 rows=7145 width=5)
(actual time=601.955..601.955 rows=62526 loops=1)
 ->  Nested Loop  (cost=0.00..2646.25 rows=*7145*
width=5) (actual time=0.058..548.618 rows=*62526* loops=1)
   ->  Nested Loop  (cost=0.00..349.71
rows=7232 width=4) (actual time=0.049..147.221 rows=66292 loops=1)
 ->  Nested Loop  (cost=0.00..8.59
rows=13 width=4) (actual time=0.027..0.254 rows=88 loops=1)
   ->  Seq Scan on rubtylieu rtl
(cost=0.00..2.74 rows=1 width=4) (actual time=0.013..0.026 rows=1
loops=1)
 Filter: ((codeth)::text =
'RES'::text)
   ->  Index Scan using
ind_genrelieu2 on genrelieu gl  (cost=0.00..5.68 rows=*14* width=8)
(actual time=0.013..0.119 rows=*88* loops=1)
 Index Cond:
((gl.codetylieu)::text = (rtl.codetylieu)::text)
 ->  Index Scan using
idx_lieugelieu_codegelieu on lieugelieu lgl  (cost=0.00..18.33
rows=633 width=8) (actual time=0.014..0.802 rows=753 loops=88)
   Index Cond:
((lgl.codegelieu)::text = (gl.codegelieu)::text)
   ->  Index Scan using pk_lieu on lieu l
(cost=0.00..0.31 rows=1 width=9) (actual time=0.003..0.004 rows=1
loops=66292)
 Index Cond: (l.numlieu = lgl.numlieu)
 Filter: (l.dfinvalidlieu IS NULL)
 Total runtime: 602.930 ms

The query is looking for parts of the city where we can find
restaurants. The problem of this query is that we have several
categories (here genrelieu) for a given type (here rubtylieu) and we
have several types for a given theme (here the theme is codeth =
'RES'). When the value of the theme is RES (restaurants), we have only
1 type (it's also RES). The fact is that there are a lot of rows for
the value RES in genrelieu (all the types of food and so on) compared
to other values. Considering that PostgreSQL doesn't have the value of
RES when building the stats for genrelieu, 14 rows is an expected
value considering the distribution of the values in genrelieu so
there's nothing really wrong here. But it's really really slow.

If I remove the join on rubtylieu and inject the obtained value
directly into the query, the stats are exact and I get a far better
plan:

cityvox_prod=# EXPLAIN ANALYZE SELECT vq.codequar, vq.liblong, vq.libcourt
FROM lieu l, vilquartier vq, genrelieu gl, lieugelieu lgl
WHERE l.codequar = vq.codequar AND l.dfinvalidlieu is null AND
vq.codevil = 'MUC' AND lgl.numlieu = l.numlieu AND lgl.codegelieu =
gl.codegelieu
AND gl.codetylieu = 'RES'
GROUP BY vq.codequar, vq.liblong, vq.libcourt, vq.flagintramuros
ORDER BY vq.flagintramuros, vq.liblong;

 QUERY PLAN

Re: [PERFORM] Workaround for cross column stats dependency

2008-01-22 Thread Guillaume Smet
On Jan 23, 2008 2:43 AM, Tom Lane [EMAIL PROTECTED] wrote:
 regression=# create or replace function getu2(int) returns int[] as $$
 select array(select unique2 from tenk1 where thousand = $1);
 $$ language sql immutable;
 CREATE FUNCTION
 regression=# explain select * from tenk1 where unique1 = any(getu2(42));
   QUERY PLAN
 --
  Bitmap Heap Scan on tenk1  (cost=38.59..73.80 rows=10 width=244)
Recheck Cond: (unique1 = ANY 
 ('{381,932,2369,4476,5530,6342,6842,6961,7817,7973}'::integer[]))
 ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..38.59 rows=10 width=0)
  Index Cond: (unique1 = ANY 
 ('{381,932,2369,4476,5530,6342,6842,6961,7817,7973}'::integer[]))
 (4 rows)

I'll give it a try tomorrow.

 Now this method *only* works for interactive queries, or EXECUTE'd
 queries in plpgsql, because you don't want the plan containing the
 folded constants to get cached.  At least not if you're worried about
 responding promptly to changes in the table you're fetching from.
 But if that table is essentially constant anyway in your application,
 there's little downside to this trick.

Yeah, that sounds like a good idea in our case. We don't use prepared
statements for these queries.

I'll post my results tomorrow morning.

Thanks.

--
Guillaume



Re: [PERFORM] big database performance

2008-01-09 Thread Guillaume Smet
Hi Joshua,

On Jan 9, 2008 9:27 AM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 wal_sync_method = open_sync

Do you recommend it in every situation or just because the data is on a
SAN? Do you have any numbers/real cases explaining this choice?

Thanks.

--
Guillaume



Re: [PERFORM] More shared buffers causes lower performances

2007-12-29 Thread Guillaume Smet
On Dec 27, 2007 7:54 PM, Tom Lane [EMAIL PROTECTED] wrote:
 I concur with Greg Stark's earlier comment that this is all
 overreaction.  Let's just fix the misleading comment in the
 documentation and leave it at that.

IMHO, we should also have a special tag for all the binaries
distributed with these options on the official website (RPM or not).
If the RPM packages' version had been tagged .debug or something like
that, it would have been the first thing I checked.

I like Gregory's idea to add a warning in pgbench. I usually run a few
pgbench tests to check there is no obvious problem even if I use
another more complicated benchmark afterwards. I don't know if that's
what other people do, though.

--
Guillaume



Re: [PERFORM] More shared buffers causes lower performances

2007-12-27 Thread Guillaume Smet
On Dec 27, 2007 7:10 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Enables <firstterm>assertion</> checks in the server, which test for
 many <quote>cannot happen</> conditions.  This is invaluable for
 code development purposes, but the tests slow things down a little.

 Maybe we ought to put that more strongly --- s/a little/significantly/,
 perhaps?

+1. It seems closer to the reality.

--
Guillaume



Re: [PERFORM] More shared buffers causes lower performances

2007-12-26 Thread Guillaume Smet
On Dec 26, 2007 12:21 PM, Simon Riggs [EMAIL PROTECTED] wrote:
 Can you try with

 bgwriter_lru_maxpages = 0

 So we can see if the bgwriter has any hand in this?

I will. I'm currently running tests with fewer concurrent clients (16)
and get exactly the same results:
64M 4213.314902
256M 4012.782820
512M 3676.840722
768M 3377.791211
1024M 2863.133965
64M again 4274.531310

I'm rerunning the tests using Greg Smith's pgbench-tools [1] to obtain
a graph of each run.

[1] http://www.westnet.com/~gsmith/content/postgresql/pgbench-tools.htm

--
Guillaume



Re: [PERFORM] More shared buffers causes lower performances

2007-12-26 Thread Guillaume Smet
On Dec 26, 2007 12:06 PM, Cédric Villemain [EMAIL PROTECTED] wrote:
 Which kernel do you have ?

The kernel of the distro, so a RH-flavoured 2.6.18.

--
Guillaume



Re: [PERFORM] More shared buffers causes lower performances

2007-12-26 Thread Guillaume Smet
On Dec 26, 2007 12:21 PM, Simon Riggs [EMAIL PROTECTED] wrote:
 bgwriter_lru_maxpages = 0

 So we can see if the bgwriter has any hand in this?

It doesn't change the behaviour I have.

It's not checkpointing either: using pgbench-tools, I can see that
tps and latency are quite stable during the entire run. Btw, thanks
Greg for these nice tools.

I thought it might be some sort of lock contention so I ran a few tests
with -N but I have the same behaviour.

Then I decided to perform read-only tests using the -S option (pgbench -S
-s 100 -c 16 -t 3 -U postgres bench). And still the same
behaviour:
shared_buffers=64MB : 20k tps
shared_buffers=1024MB : 8k tps

Any other idea?

--
Guillaume



Re: [PERFORM] More shared buffers causes lower performances

2007-12-26 Thread Guillaume Smet
On Dec 26, 2007 4:41 PM, Guillaume Smet [EMAIL PROTECTED] wrote:
 Then I decided to perform read-only tests using -S option (pgbench -S
 -s 100 -c 16 -t 3 -U postgres bench). And still the same
 behaviour:
 shared_buffers=64MB : 20k tps
 shared_buffers=1024MB : 8k tps

Some more information. If I strace the backends during the test, the
test is faster with shared_buffers=1024MB and I see fewer system calls
(fewer read and fewer lseek calls).

A quick cut | uniq | sort gives me:
With 64MB:
  12548 semop
 160039 sendto
 160056 recvfrom
 294289 read
 613338 lseek

With 1024MB:
  11396 semop
 129947 read
 160039 sendto
 160056 recvfrom
 449584 lseek

--
Guillaume



Re: [PERFORM] More shared buffers causes lower performances

2007-12-26 Thread Guillaume Smet
On Dec 26, 2007 7:23 PM, Greg Smith [EMAIL PROTECTED] wrote:
 Ah, now this is really interesting, as it rules out all the write
 components and should be easy to replicate even on a smaller server.  As
 you've already dumped a bunch of time into this the only other thing I
 would suggest checking is whether the same behavior also happens on 8.2 on
 your server.

Let's go with 8.2.5 on the same server (-s 100 / 16 clients / 50k
transactions per client / read-only using the -S option):
64MB: 33814 tps
512MB: 35833 tps
1024MB: 36986 tps
It's more consistent with what I expected.

I used the PGDG RPMs compiled by Devrim for 8.2.5 and the ones I
compiled myself for 8.3b4 (based on Devrim's src.rpm). I just asked
Devrim to build a set of x86_64 RPMs for 8.3b4 to check that it's not
a compilation problem (mine were compiled on a brand new, freshly
installed box so it would be a bit surprising, but I want to be sure).
He's kindly uploading them right now so I'll work on new tests using
his RPMs.

I'll keep you informed of the results.

--
Guillaume



[PERFORM] More shared buffers causes lower performances

2007-12-25 Thread Guillaume Smet
Hi all,

I'm currently benchmarking the new PostgreSQL server of one of our
customers with PostgreSQL 8.3 beta4. I have more or less the same
configuration Stefan tested in his blog [1]:
- Dell 2900 with two brand new X5365 processors (quad core 3.0 GHz),
16 GB of memory
- a RAID1 array for pg_xlog and a 6 disks RAID10 array for data (I
moved pg_xlog to the RAID10 array for a few runs - same behaviour) -
all 73 GB 15k drives
- CentOS 5.1 - 64 bits

I started working on pgbench tests. I made a not-so-stupid
configuration to begin with and I was quite disappointed by my results
compared to Stefan's. I decided to test with a more default
shared_buffers configuration to be able to compare my results with
Stefan's graph [2]. And the fact is that with a very low
shared_buffers configuration, my results are quite similar to Stefan's
but, as soon as I use higher values of shared_buffers,
performance begins to degrade [3].

I performed my tests with: pgbench -i -s 100 -U postgres bench and
pgbench -s 100 -c 100 -t 3 -U postgres bench. Of course, I
initialize the database before each run. I made my tests in one
direction then in the other with similar results so it's not a
degradation due to consecutive runs.

I lowered the number of concurrent clients to 50 because 100 is quite
high and I obtain the same sort of results:
shared_buffers=32MB: 1869 tps
shared_buffers=64MB: 1844 tps
shared_buffers=512MB: 1676 tps
shared_buffers=1024MB: 1559 tps

Non default parameters are:
max_connections = 200
work_mem = 32MB
wal_buffers = 1024kB
checkpoint_segments = 192
effective_cache_size = 5GB
(I use more or less the configuration used by Stefan - I had the same
behaviour with default wal_buffers and checkpoint_segments)

While monitoring the server with vmstat, I can't see any real reason
why it's slower. With higher shared_buffers values, I/O is
lower, context switches too and, in the end, so is performance. The
CPU usage is quite similar (~50-60%). I/O doesn't limit the
performance AFAICS.

I must admit I'm a bit puzzled. Does anyone have any pointer which
could explain this behaviour or any way to track the issue? I'll be
glad to perform any test needed to understand the problem.

Thanks.

[1] 
http://www.kaltenbrunner.cc/blog/index.php?/archives/21-8.3-vs.-8.2-a-simple-benchmark.html
[2] http://www.kaltenbrunner.cc/blog/uploads/83b4shm.gif
[3] http://people.openwide.fr/~gsmet/postgresql/tps_shared_buffers.png
(X=shared_buffers in MB/Y=results with pgbench)

--
Guillaume



Re: [PERFORM] Dealing with big tables

2007-12-02 Thread Guillaume Smet
On Dec 2, 2007 11:26 AM, Mindaugas [EMAIL PROTECTED] wrote:
   I execute simple query select * from bigtable where From='something'. 
 Query returns like 1000 rows and takes 5++ seconds to complete. As far as I 
 understand the query is slow because:

Can you post an EXPLAIN ANALYZE? Which version of PostgreSQL do you use?

--
Guillaume



Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Guillaume Smet
On Nov 9, 2007 5:33 PM, Tom Lane [EMAIL PROTECTED] wrote:
 he's got no MCVs, presumably because the field
 is unique.

It is. The ancestors field contains the current folder itself so the
id of the folder (which is the primary key) is in it.

--
Guillaume



Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Guillaume Smet
Tom,

Just to confirm that your last commit fixed the problem:

lbo=# explain analyze select * from cms_items where ancestors LIKE '1062/%';
  QUERY PLAN
---
 Seq Scan on cms_items  (cost=0.00..688.26 rows=*9097* width=103)
(actual time=0.011..22.605 rows=11326 loops=1)
   Filter: ((ancestors)::text ~~ '1062/%'::text)
 Total runtime: 30.022 ms
(3 rows)

Thanks for your time.

--
Guillaume



Re: [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-08 Thread Guillaume Smet
Tom,

On Nov 8, 2007 12:14 AM, Tom Lane [EMAIL PROTECTED] wrote:
 I've applied a patch that might help you:
 http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php

AFAICS, it doesn't seem to fix the problem. I just compiled the
REL8_1_STABLE branch and I still have the following behaviour:
lbo=# ANALYZE cms_items;
ANALYZE
lbo=# explain analyze select * from cms_items where ancestors LIKE '1062/%';
 QUERY PLAN

 Seq Scan on cms_items  (cost=0.00..688.26 rows=1 width=103) (actual
time=0.009..22.258 rows=11326 loops=1)
   Filter: ((ancestors)::text ~~ '1062/%'::text)
 Total runtime: 29.835 ms
(3 rows)

lbo=# show lc_collate;
 lc_collate
-
 fr_FR.UTF-8
(1 row)

Do you see any reason why your patch doesn't change anything in this case?

Thanks.

--
Guillaume



Re: [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-08 Thread Guillaume Smet
On Nov 8, 2007 4:01 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Hmm, can we see the pg_stats row for the ancestors column?

Sure:
 schemaname        | public
 tablename         | cms_items
 attname           | ancestors
 null_frac         | 0
 avg_width         | 32
 n_distinct        | -1
 most_common_vals  |
 most_common_freqs |
 histogram_bounds  | {10011/10010/10009/10018/2554055/,10011/10010/84022/23372040/,10011/2233043/2233042/2233041/,10011/3985097/5020039/,10011/872018/13335056/1051/,1062/22304709/22304714/,1062/2489/2492/27861901/,1062/2527/2530/29658392/,1062/2698/2705/6014040/,1062/52354/52355/255038/255037/,9846852/}
 correlation       | -0.151713

I can provide the data if needed, there's nothing confidential in it.

--
Guillaume



Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-08 Thread Guillaume Smet
On Nov 9, 2007 3:08 AM, Tom Lane [EMAIL PROTECTED] wrote:
 This rule works for all the locales I have installed ... but I don't
 have any Far Eastern locales installed.  Also, my test cases are only
 covering ASCII characters, and I believe many locales have some non-ASCII
 letters that sort after 'Z'.  I'm not sure how hard we need to try to
 cover those corner cases, though.  It is ultimately only an estimate...

My opinion is that it's acceptable to fix the problem for most cases
in most locales because, as you said, it's only an estimate. We didn't
have any report of this problem for years so it seems that it's not a
common case or at least it's not common that the bad estimate leads to
noticeably bad plans.

As far as I understand what you plan to do, it doesn't seem to be
something that prevents us from fixing the problem afterwards if
someone comes up with an example which doesn't fit the scheme you're
proposing and has a real performance problem with it.

--
Guillaume



Re: [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-07 Thread Guillaume Smet
On 11/7/07, Tom Lane [EMAIL PROTECTED] wrote:
 Hmmm ... what locale are you working in?  I notice that the range
 estimator for this pattern would be ancestors >= '1062/' AND
 ancestors < '10620', which will do the right thing in C locale
 but maybe not so much elsewhere.

Sorry for not having mentioned it before. Locale is UTF-8.

  Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu,

 You'd probably get better results with 8.2, which has a noticeably
 smarter LIKE-estimator, at least for histogram sizes of 100 or more.

It's not really possible to upgrade this application to 8.2 for now.
It's a very old app based on the thing formerly called Red Hat WAF,
now known as APLAWS, and validating WAF and this application with
8.2 will take quite some time. Moreover the db is big and we can't
afford the downtime of a migration.

I suppose my best bet is to remove the pg_statistic line and to set
the statistics to 0 for this column so that the stats are never
generated again for this column?
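
Something like this, I suppose (a sketch - deleting from pg_statistic
by hand requires superuser rights and some care):

-- prevent ANALYZE from gathering stats for this column
ALTER TABLE cms_items ALTER COLUMN ancestors SET STATISTICS 0;
-- and remove the existing stats row
DELETE FROM pg_statistic
 WHERE starelid = 'cms_items'::regclass
   AND staattnum = (SELECT attnum FROM pg_attribute
                    WHERE attrelid = 'cms_items'::regclass
                      AND attname = 'ancestors');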

Thanks,

--
Guillaume



[PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-07 Thread Guillaume Smet
Hi all,

While studying a query taking forever after an ANALYZE on a never
analyzed database (a bad estimate causes a nested loop on a lot of
tuples), I found the following problem:
- without any stats (I removed the line from pg_statistic):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062/%';
  QUERY PLAN
--
 Seq Scan on cms_items  (cost=0.00..689.26 rows=114 width=587) (actual
time=0.008..21.692 rows=11326 loops=1)
   Filter: ((ancestors)::text ~~ '1062/%'::text)
 Total runtime: 31.097 ms
- the estimate is bad (that's expected) but it's sufficiently high to
prevent the nested loop, so this is my current workaround

- after analyzing the cms_items table (statistics is set to 10 but
it's exactly the same for 100):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062/%';
 QUERY PLAN

 Seq Scan on cms_items  (cost=0.00..689.26 rows=*1* width=103) (actual
time=0.010..22.024 rows=11326 loops=1)
   Filter: ((ancestors)::text ~~ '1062/%'::text)
 Total runtime: 31.341 ms
- this estimate leads PostgreSQL to choose a nested loop which is
executed more than 11k times and causes the query to take forever.

- if I remove the / from the LIKE clause (which I can't do, as ancestors
is more or less a path):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062%';
  QUERY PLAN
---
 Seq Scan on cms_items  (cost=0.00..689.26 rows=*9097* width=103)
(actual time=0.043..25.251 rows=11326 loops=1)
   Filter: ((ancestors)::text ~~ '1062%'::text)
 Total runtime: 34.778 ms

Which is a really good estimate.

Is this expected behaviour?

The histogram does contain values beginning with '1062/' (5 out of 10)
and the cms_items table has ~ 22k rows.
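
For the record, the histogram can be inspected through the pg_stats view:

SELECT histogram_bounds FROM pg_stats
WHERE tablename = 'cms_items' AND attname = 'ancestors';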

Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu, compiled by GCC
gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3). I checked the release
notes between 8.1.8 and 8.1.10 and I didn't find anything relevant to
fix this problem.

Thanks for any help.

Regards,

--
Guillaume



Re: [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-07 Thread Guillaume Smet
Alexander,

On 11/7/07, Alexander Staubo [EMAIL PROTECTED] wrote:
 That's a difference of less than *three milliseconds* -- a difference
 probably way within the expected overhead of running explain
 analyze. Furthermore, all three queries use the same basic plan: a
 sequential scan with a filter. At any rate you're microbenchmarking in
 a way that is not useful to real-world queries. In what way are these
 timings a problem?

If you read my previous email carefully, you'll see they aren't a
problem: the problem is the estimate, not the timing. This is a
self-contained test case for a far more complex query which uses a bad
plan containing a nested loop due to the bad estimate.

 Now all like 'prefix%' queries should use the index.

Not when you retrieve 50% of this table of 22k rows but that's not my
problem anyway. A seqscan is perfectly fine in this case.

Thanks anyway.

--
Guillaume



Re: [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-07 Thread Guillaume Smet
On 11/7/07, Tom Lane [EMAIL PROTECTED] wrote:
 I wanted the locale (lc_collate), not the encoding.

fr_FR.UTF-8

 That would optimize this particular query and probably pessimize
 a lot of others.

Sure, but there aren't a lot of queries based on the ancestors field
and if they are a bit slower, it's not a problem. However, having a
query taking forever is not acceptable, as it makes the content
management app inaccessible.
So it can be an acceptable solution in this case, even if not perfect.

--
Guillaume



Re: [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-07 Thread Guillaume Smet
On 11/8/07, Tom Lane [EMAIL PROTECTED] wrote:
 I've applied a patch that might help you:
 http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php

Thanks. I'll build an RPM package tomorrow with this patch and let you
know if it fixes the problem.

--
Guillaume



Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-18 Thread Guillaume Smet

On 6/18/07, Sabin Coanda [EMAIL PROTECTED] wrote:

Hi Guillaume,

I tried the pgFouine.php app on a sample log file but it reports some errors.
Could you give me some startup support, please?
I attach the log here so you can find what's wrong.


Sorry for the delay. I replied to your private email this evening.

--
Guillaume



Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-18 Thread Guillaume Smet

On 6/18/07, Y Sidhu [EMAIL PROTECTED] wrote:

 I am following this discussion with great interest. I have PG running on
FreeBSD and am forced to run pgFouine on a separate Linux box. I am hoping I
can create a log file, then copy it over and have pgFouine analyze it
on the Linux box.
 a.  I created a log file out of vacuum verbose, is that right? It is not
complete because I don't know how to dump it into a file in some sort of
automated fashion, so I have to take what is on the screen and copy it off.


If you want to analyze a VACUUM log, just run vacuumdb with the options
you need (for example -a -z -v -f for a vacuum full analyze verbose):

# vacuumdb -a -z -v -f > your_log_file.log 2>&1

(the verbose output goes to stderr, hence the redirection). Then analyze
this log file as explained on the pgFouine website.


 b.  I can also set a variable log_min_duration_statement in pgsql.conf

 I guess I am like Sabin,, and need some hand holding to get started.


That is a completely different setting, useful for query log analysis,
so you don't need it if you just want to analyze your vacuum behaviour.

--
Guillaume



Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Guillaume Smet

Sabin,

On 6/14/07, Sabin Coanda [EMAIL PROTECTED] wrote:

I'd like to completely understand the report generated by VACUUM VERBOSE.
Please tell me where it is documented?


You can take a look at what I did for pgFouine:
http://pgfouine.projects.postgresql.org/vacuum.html

--
Guillaume



Re: [PERFORM] Parsing VACUUM VERBOSE

2007-06-14 Thread Guillaume Smet

On 6/14/07, Y Sidhu [EMAIL PROTECTED] wrote:

Can anyone share what value they have set log_min_duration_statement to?


It's OT but we use different values for different databases and needs.

On a very loaded database with a lot of complex queries (lots of joins
on big tables, proximity queries, full text queries), we use 100 ms.
It logs ~300,000 queries. It allows us to detect big regressions or
new queries which are very slow.

On another database where I want to track transaction leaks, I'm
forced to set it to 0 ms.

Basically, the answer is: set it to the lowest value you can afford
without impacting your performance too much (and if you use syslog,
use async I/O or send your log over the network).
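
For example, something like this in postgresql.conf (8.x-style
settings; the values are only a starting point):

log_min_duration_statement = 100    # in ms; 0 logs every statement
log_destination = 'syslog'          # lets you push the log I/O off-box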

--
Guillaume



Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance

2007-05-23 Thread Guillaume Smet

On 5/22/07, cedric [EMAIL PROTECTED] wrote:

I made something very similar, and using PL/pgsql is very slow, when using
perl is very quick.


Another solution is to use tsearch2 for that:
CREATE TABLE word_counts AS SELECT * FROM stat('SELECT
to_tsvector(''simple'', lower(coalesce(<field containing words>, '''')))
FROM <your table>');

I don't know if the fact that you have an array of words is a must-have
or just a design choice. If you have to keep it, you can easily
transform the array into a string with array_to_string and use the same
sort of query.

I don't know what your speed requirements are exactly, but it's quite
fast here. If you drop your table and recreate it inside a transaction,
it should work like a charm (or you can use TRUNCATE and INSERT INTO).
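
For instance, a rough sketch of the array variant (untested, assuming a
text[] column named words in your_table):

CREATE TABLE word_counts AS SELECT * FROM stat('SELECT
to_tsvector(''simple'', lower(coalesce(array_to_string(words, '' ''),
'''')))
FROM your_table');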

--
Guillaume



Re: [PERFORM] Do I need to rebuild php-pgsql for 8.2.3

2007-04-10 Thread Guillaume Smet

On 4/10/07, Michael Dengler [EMAIL PROTECTED] wrote:

I'm using RHEL4 and wondering if I need to upgrade the php and php-pgsql
packages when upgrading from Postgres 7.4.1 to 8.2.3.


No, you don't. Devrim Gunduz has provided compat RPMs for a long time now.

See http://developer.postgresql.org/~devrim/rpms/compat/ and choose
the correct package for your architecture.

--
Guillaume



Re: [PERFORM] pg_trgm performance

2007-02-26 Thread Guillaume Smet

On 2/24/07, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

Thanks for your time.


GiN version, short:
   ->  Bitmap Heap Scan on tags  (cost=8.64..151.79 rows=41 width=0)
(actual time=5.555..30.157 rows=7 loops=1)
     Filter: (title % 'foo'::text)
     ->  Bitmap Index Scan on trgm_idx  (cost=0.00..8.63 rows=41 width=0)
(actual time=2.857..2.857 rows= loops=1)
       Index Cond: (title % 'foo'::text)


This is currently the worst case in the GiST vs. GIN comparison because
in the index scan, the GIN version doesn't have the length of the
indexed string. So it returns a lot of rows which contain every trigram
of your search string but in fact have a low similarity due to the
length of the indexed string (lots of rows from the index scan vs. 7
rows in the end).
It cannot be fixed at the moment due to the way GIN indexes work.


So, the GiN version seems to be a bit faster for long queries, but it's still
too slow -- in fact, _unindexed_ versions give 141ms, 342ms, 725ms for these
three queries, so for the longer queries, the gain is only about a
factor of two. (By the way, I would like to stress that this is not my
personal music collection! :-P)


The fact is that pg_trgm is designed to index words, not long
sentences. I'm not that surprised it's slow in your case.

It's also my case but following the instructions in README.pg_trgm I
created a dictionary of words using tsearch2 (stat function) and I use
pg_trgm on this dictionary to find similar words in my dictionary.

For example, I rewrite the search:
auberge cevenes
as:
(auberge | auberges | aubberge | auberg) & (ceven | cene | cevenol | cevennes)
using pg_trgm, and my query can find Auberge des Cévennes (currently
it's limited to the 4 most similar words but I can change that
easily).
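
A rough sketch of the dictionary approach (table and column names made
up, to adapt to your schema):

-- build the dictionary of words with tsearch2
CREATE TABLE words AS
    SELECT word FROM stat('SELECT to_tsvector(''simple'', title) FROM tags');
CREATE INDEX idx_words_trgm ON words USING gist (word gist_trgm_ops);

-- find the most similar words for one search term
SELECT word FROM words WHERE word % 'cevenes'
ORDER BY similarity(word, 'cevenes') DESC LIMIT 4;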

--
Guillaume



Re: [PERFORM] pg_trgm performance

2007-02-26 Thread Guillaume Smet

On 2/26/07, Oleg Bartunov oleg@sai.msu.su wrote:

Did you rewrite query manually or use rewrite feature of tsearch2 ?


Currently, it's manual. I perform a pg_trgm query for each word of the
search (a few stop words excluded) and I generate the ts_query with
the similar words instead of the original search words.
Is there any benefit to using rewrite() in this case?

--
Guillaume



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Guillaume Smet

On 2/23/07, Joshua D. Drake [EMAIL PROTECTED] wrote:

Also isn't it pretty much *not* a problem with current versions of
PostgreSQL?


We had a really *big* scalability problem with a quad Xeon MP 2.2 and
PostgreSQL 7.4. The problem is mostly gone since we upgraded to 8.1 a
year ago.

Woodcrest seems to perform really well with PostgreSQL according to
what I can read on the Internet, so we will probably replace the server
with a dual Woodcrest in a few months.

--
Guillaume



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Guillaume Smet

On 2/23/07, Geoffrey [EMAIL PROTECTED] wrote:

As I've heard.  We're headed for 8 as soon as possible, but until we get
our code ready, we're on 7.4.16.


You should move to at least 8.1 and possibly 8.2. It's not a good idea
to upgrade only to 8.0 IMHO.

--
Guillaume



Re: [PERFORM] pg_trgm performance

2007-02-23 Thread Guillaume Smet

Florian, Steinar,

Could you try to see if the GIN implementation of pg_trgm is faster in
your cases?

Florian, instead of using WHERE similarity(...) > 0.4, you should use
set_limit (SELECT set_limit(0.4);).

I posted it on -patches and it is available here:
http://people.openwide.fr/~gsmet/postgresql/pg_trgm_gin3.diff .

The patch is against HEAD but it applies on 8.2 without any problem.

After applying this patch and installing pg_trgm.sql (you should
uninstall pg_trgm before compiling using the old uninstall script),
then you can create:
CREATE INDEX idx_table_word ON table USING gin(word gin_trgm_ops);
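and query it like this (a quick sketch, keeping the placeholder names
from above):

SELECT set_limit(0.4);
SELECT word, similarity(word, 'foo') AS sml
FROM table
WHERE word % 'foo'
ORDER BY sml DESC;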

17 characters is quite long so I'm not sure it will help you, because
it usually has to recheck a high number of rows due to the GIN
implementation, but I'd like to know if it's faster or slower in this
case.

If your data are not private and you don't have the time to test it, I
can test it here without any problem.

Thanks.

--
Guillaume



Re: [PERFORM] pg_trgm performance

2007-02-23 Thread Guillaume Smet

Hi Steinar,

On 2/24/07, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

I'm sorry, I can no longer remember where I needed pg_trgm. Simple testing of
your patch seems to indicate that the GiN version is about 65% _slower_ (18ms
vs. 30ms) for a test data set I found lying around, but I remember that on
the data set I needed it, the GIST version was a lot slower than that (think
3-400ms). The 18 vs. 30ms test is a random Amarok database, on 8.2.3
(Debian).


Could you post EXPLAIN ANALYZE output for both queries (after 2 or 3
runs)? And if you can provide EXPLAIN ANALYZE for a couple of searches
(short, medium and long) in both cases, that would be nice too.

The GiN version is currently not selective enough compared to GiST. It
generally finds the matching rows faster but it has a slower recheck
cond, so it's sometimes interesting (in my case) and sometimes not that
interesting (which seems to be your case).

Thanks.

--
Guillaume



Re: [PERFORM] Proximity query with GIST and row estimation

2007-02-16 Thread Guillaume Smet

On 2/15/07, Guillaume Smet [EMAIL PROTECTED] wrote:

The use of PostGIS is slower than the previous cube/earthdistance
approach (on a similar query and plan).


For the record, here are new information about my proximity query work.

Thanks to Tom Lane, I found the reason for the performance drop. The
problem is that the gist index for the && operator is lossy (declared as
RECHECK in the op class).
AFAICS, for the && operator it's done to prevent problems when SRIDs
are not compatible: it forces the execution of the filter, so even with
a should-be non-lossy bitmap index scan, it throws an error as if we
used a seqscan (Paul, correct me if I'm wrong).

As I'm sure I won't have this problem (I will write a wrapper stored
procedure so that the end users won't see the SRID used), I created a
different opclass without the RECHECK clause:
CREATE OPERATOR CLASS gist_geometry_ops_norecheck FOR TYPE geometry
USING gist AS
    OPERATOR 3 &&,
    FUNCTION 1 LWGEOM_gist_consistent (internal, geometry, int4),
    FUNCTION 2 LWGEOM_gist_union (bytea, internal),
    FUNCTION 3 LWGEOM_gist_compress (internal),
    FUNCTION 4 LWGEOM_gist_decompress (internal),
    FUNCTION 5 LWGEOM_gist_penalty (internal, internal, internal),
    FUNCTION 6 LWGEOM_gist_picksplit (internal, internal),
    FUNCTION 7 LWGEOM_gist_same (box2d, box2d, internal);

UPDATE pg_opclass
SET opckeytype = (SELECT oid FROM pg_type
 WHERE typname = 'box2d'
 AND typnamespace = (SELECT oid FROM pg_namespace
 WHERE nspname=current_schema()))
WHERE opcname = 'gist_geometry_ops_norecheck'
   AND opcnamespace = (SELECT oid from pg_namespace
   WHERE nspname=current_schema());

As I use only the && operator, I put only this one.

And I recreated my index using:
CREATE INDEX idx_lieu_earthpoint ON lieu USING gist(earthpoint
gist_geometry_ops_norecheck);

In the case presented before, the bitmap index scan is then non-lossy
and I get performance similar to the earthdistance method.
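
The wrapper stored procedure I mentioned could look like this (a sketch
only, function name made up):

CREATE OR REPLACE FUNCTION near_box(lon float8, lat float8, meters float8)
RETURNS geometry AS $$
    SELECT Expand(Transform(SetSRID(MakePoint($1, $2), 4326), 32631), $3);
$$ LANGUAGE sql STABLE;

-- end users never see the SRIDs:
-- SELECT ... FROM lieu WHERE earthpoint && near_box(2.34860800, 48.85957600, 3000);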

--
Guillaume



Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-15 Thread Guillaume Smet

Tom,

On 2/13/07, Tom Lane [EMAIL PROTECTED] wrote:

It gets the right answer, yes.  I'm not sure if we could safely put the
condition into the recheck instead of the filter.  The particular code
I showed you has to go the direction it does, because a condition in the
filter has to be checked even if the bitmap is not lossy.  I seem to
recall concluding that we had to recheck partial-index conditions even
if the bitmap is not lossy, but I can't reconstruct my reasoning at the
moment.


I'm still working on my proximity query, testing PostGIS now. I
noticed an issue with a gist index on a point which seems related to
my previous question.

I have the following in my plan:
->  Bitmap Heap Scan on lieu l  (cost=13.37..1555.69 rows=844
width=118) (actual time=3.672..39.497 rows=1509 loops=1)
      Filter: (((dfinvalidlieu IS NULL) OR (dfinvalidlieu >= now()))
AND (wgslat IS NOT NULL) AND (wgslon IS NOT NULL) AND (wgslat <>
41.89103400) AND (wgslon <> 12.49244400) AND (earthpoint &&
'010320777F01000540019B33410020D1D851410040019B33410040ADDE5141006071B233410040ADDE5141006071B233410020D1D851410040019B33410020D1D85141'::geometry)
AND (numlieu <> 49187))
      ->  Bitmap Index Scan on idx_lieu_earthpoint  (cost=0.00..13.37
rows=1249 width=0) (actual time=2.844..2.844 rows=1510 loops=1)
            Index Cond: (earthpoint &&
'010320777F01000540019B33410020D1D851410040019B33410040ADDE5141006071B233410040ADDE5141006071B233410020D1D851410040019B33410020D1D85141'::geometry)

Is it normal that I have no recheck cond and that the index cond of the
Bitmap Index Scan appears in the filter? Is it also a consequence of the
code you pointed to?

The index was created with:
create index idx_lieu_earthpoint on lieu using gist(earthpoint
gist_geometry_ops);

--
Guillaume



Re: [PERFORM] Proximity query with GIST and row estimation

2007-02-15 Thread Guillaume Smet

On 2/14/07, Paul Ramsey [EMAIL PROTECTED] wrote:

You'll find that PostGIS does a pretty good job of selectivity
estimation.


So I finally have a working PostGIS and I fixed the query to use PostGIS.

The use of PostGIS is slower than the previous cube/earthdistance
approach (on a similar query and plan). But you're right, it does a
pretty good job of calculating the selectivity and the estimates are
really good.
It helps to select a good plan (or a bad one if the previous false
numbers led to a better plan, which is my case for certain queries).
I suppose it's normal for it to be slower as it's more precise. I don't
know which approach is better in my case as I don't need the precision
of PostGIS.

For the record, here is what I did:
select AddGeometryColumn('lieu','earthpoint',32631,'POINT',2);
update lieu set earthpoint=Transform(SetSRID(MakePoint(wgslon,
wgslat), 4326), 32631);
create index idx_lieu_earthpoint on lieu using gist(earthpoint
gist_geometry_ops);

analyze lieu;

select numlieu, nomlieu, wgslon, wgslat, astext(earthpoint) from lieu
where earthpoint && Expand(Transform(SetSRID(MakePoint(12.49244400,
41.89103400), 4326), 32631), 3000);

(3000 is the distance in meters)

--
Guillaume



Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-15 Thread Guillaume Smet

On 2/15/07, Tom Lane [EMAIL PROTECTED] wrote:

I think that the
answer is probably because the index is lossy for this operator,
so it has to be checked even if the bitmap didn't become lossy.
You'd have to check the GIST opclass definition to be sure.


Any idea what I should look for (if it's of any interest to anyone;
otherwise, I can live with your answer)?

Thanks.

--
Guillaume



Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-15 Thread Guillaume Smet

On 2/15/07, Guillaume Smet [EMAIL PROTECTED] wrote:

On 2/15/07, Tom Lane [EMAIL PROTECTED] wrote:
 I think that the
 answer is probably because the index is lossy for this operator,
 so it has to be checked even if the bitmap didn't become lossy.
 You'd have to check the GIST opclass definition to be sure.


FYI I've taken a look at the PostGIS source code and the index is lossy
for the && operator:
OPERATOR 3 && RECHECK,

(for every operator in the opclass to be exact)

--
Guillaume



Re: [PERFORM] Proximity query with GIST and row estimation

2007-02-14 Thread Guillaume Smet

Paul,

On 2/14/07, Paul Ramsey [EMAIL PROTECTED] wrote:

You'll find that PostGIS does a pretty good job of selectivity
estimation.


PostGIS is probably what I'm going to experiment with in the future.
The only problem is that it's really big for a very basic need.
With my current method, I don't even have to create a new column: I
directly create a functional index, so it's really easy to use.
Using PostGIS requires creating a new column and triggers to maintain
it, and installing PostGIS of course. That's why it was not my first
choice.

Thanks for your answer.

--
Guillaume



[PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-13 Thread Guillaume Smet

Hi all,

I'm currently working on optimizing a couple of queries. While
studying the EXPLAIN ANALYZE output of a query, I found this Bitmap
Heap Scan node:

->  Bitmap Heap Scan on lieu l  (cost=12.46..63.98 rows=53 width=94)
(actual time=35.569..97.166 rows=78 loops=1)
      Recheck Cond: ('(4190964.86112204, 170209.656489245,
4801644.52951672),(4194464.8606, 173709.656478266,
4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision,
(wgslon)::double precision))::cube)
      Filter: (parking AND (numlieu > 0))
      ->  BitmapAnd  (cost=12.46..12.46 rows=26 width=0) (actual
time=32.902..32.902 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_lieu_earth  (cost=0.00..3.38
rows=106 width=0) (actual time=30.221..30.221 rows=5864 loops=1)
                  Index Cond: ('(4190964.86112204, 170209.656489245,
4801644.52951672),(4194464.8606, 173709.656478266,
4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision,
(wgslon)::double precision))::cube)
            ->  Bitmap Index Scan on idx_lieu_parking  (cost=0.00..8.83
rows=26404 width=0) (actual time=0.839..0.839 rows=1095 loops=1)
                  Index Cond: (parking = true)

What surprises me is that parking is in the Filter and not in the
Recheck Cond, whereas it's part of the second Bitmap Index Scan of the
BitmapAnd node.
AFAIK, BitmapAnd builds a bitmap of the pages returned by the two
Bitmap Index Scans, so I supposed it would append both Index Conds to
the Recheck Cond.

Is there a reason why the second Index Cond is in the filter? Does it
make a difference in terms of performance (I suppose not, but I'd like
a confirmation)?

Thanks.

--
Guillaume



[PERFORM] Proximity query with GIST and row estimation

2007-02-13 Thread Guillaume Smet

Hi all,

Following the work of Mark Stosberg on this list (thanks Mark!), I
optimized our slow proximity queries by using cube, earthdistance
(shipped with contrib) and a gist index. The result is globally very
interesting apart from one specific query, and we'd like to fix that
one too to be more consistent (it's currently faster with a basic
distance calculation based on acos, cos and so on, but it's slow
anyway).

The problem is that we have sometimes very few places near a given
location (small city) and sometimes a lot of them (in Paris, Bruxelles
and so on - it's the case we have here). The gist index I created
doesn't estimate the number of rows in the area very well.

Table: lieu (100k rows) with wgslat and wgslon as numeric
Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG')
Index: idx_lieu_earth gist (ll_to_earth(wgslat::double precision,
wgslon::double precision))

The simplified query is:
SELECT DISTINCT l.numlieu, l.nomlieu, ROUND
(earth_distance(ll_to_earth(48.85957600, 2.34860800),
ll_to_earth(l.wgslat, l.wgslon))) as dist
FROM lieu l, lieugelieu lgl
WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600,
2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu =
l.numlieu ORDER BY dist ASC LIMIT 2;
It's used to find the nearest car parks from a given location.

The plan is attached as plan_earthdistance_nestedloop.txt. It uses a
nested loop because the row estimate is pretty bad: (cost=0.00..3.38
rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1).

If I disable nested loops, the plan is different and faster (see the
attached plan_earthdistance_hash.txt).

Is there any way to improve this estimate? I tried to set the
statistics of wgslat and wgslon higher but it doesn't change anything
(I don't know if the operator is designed to use the statistics).
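
For reference, what I tried was along these lines (the exact target
value aside):

ALTER TABLE lieu ALTER COLUMN wgslat SET STATISTICS 1000;
ALTER TABLE lieu ALTER COLUMN wgslon SET STATISTICS 1000;
ANALYZE lieu;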

Any other idea to optimize this query is very welcome too.

--
Guillaume
db=# explain analyze SELECT DISTINCT l.numlieu, l.nomlieu, ROUND 
(earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, 
l.wgslon))) as dist
db-# FROM lieu l, lieugelieu lgl
db-# WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 
2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = 
l.numlieu ORDER BY dist ASC LIMIT 2;

QUERY PLAN  

   
---
 Limit  (cost=626.84..626.85 rows=1 width=51) (actual time=449.287..449.298 
rows=2 loops=1)
   ->  Unique  (cost=626.84..626.85 rows=1 width=51) (actual
time=449.283..449.290 rows=2 loops=1)
     ->  Sort  (cost=626.84..626.84 rows=1 width=51) (actual
time=449.278..449.279 rows=2 loops=1)
           Sort Key: round(sec_to_gc(cube_distance('(4192714.86111655,
171959.656483755, 4803394.52951123)'::cube, (ll_to_earth((l.wgslat)::double
precision, (l.wgslon)::double precision))::cube))), l.numlieu, l.nomlieu
           ->  Nested Loop  (cost=3.38..626.83 rows=1 width=51) (actual
time=258.877..448.651 rows=78 loops=1)
                 ->  Bitmap Heap Scan on lieu l  (cost=3.38..201.34
rows=106 width=51) (actual time=32.988..60.197 rows=5786 loops=1)
                       Recheck Cond: ('(4190964.86112204, 170209.656489245,
4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube
@ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
                       ->  Bitmap Index Scan on idx_lieu_earth
(cost=0.00..3.38 rows=106 width=0) (actual time=30.229..30.229 rows=5864
loops=1)
                             Index Cond: ('(4190964.86112204,
170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266,
4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision,
(wgslon)::double precision))::cube)
                 ->  Index Scan using
idx_lieugelieu_codegelieu_numlieu_principal on lieugelieu lgl  (cost=0.00..4.00
rows=1 width=4) (actual time=0.052..0.052 rows=0 loops=5786)
                       Index Cond: (((lgl.codegelieu)::text = 'PKG'::text)
AND (lgl.numlieu = outer.numlieu))
 Total runtime: 449.607 ms
(12 rows)
db=# explain analyze SELECT DISTINCT l.numlieu, l.nomlieu, ROUND 
(earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, 
l.wgslon))) as dist
db-# FROM lieu l, lieugelieu lgl
db-# WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 
2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = 
l.numlieu ORDER BY dist ASC LIMIT 2;
 

Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-13 Thread Guillaume Smet

On 2/13/07, Tom Lane [EMAIL PROTECTED] wrote:

bitmapqualorig = list_difference_ptr(bitmapqualorig, qpqual);

What's not immediately clear is why the condition was in both lists to
start with.  Perhaps idx_lieu_parking is a partial index with this as
its WHERE condition?


Yes, it is: idx_lieu_parking btree (parking) WHERE parking = true.
Sorry for not pointing that out immediately.
Without the partial clause, the index is not used at all (there are very
few rows in lieu with parking = true).

So the basic explanation is that the condition is in both lists due to
the partial index, and only qpqual keeps it? I would have expected the
opposite, but I suppose it doesn't change anything?

Thanks for your answer.

--
Guillaume



Re: [PERFORM] quad or dual core Intel CPUs

2007-02-13 Thread Guillaume Smet

Dan,

On 2/13/07, Dan Harris [EMAIL PROTECTED] wrote:

Having upgraded to 2.6.18 fairly recently, I am *very* interested in
what caused the throughput to drop in 2.6.18?  I haven't done any
benchmarking on my system to know if it affected my usage pattern
negatively, but I am curious if anyone knows why this happened?


I think you misread the graph. PostgreSQL 8.2 seems to be
approximately 20% faster with kernel 2.6.18 on the platforms tested
(using the tweakers.net benchmark).

--
Guillaume



Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)

2007-01-03 Thread Guillaume Smet

Erik,

Could you set log_min_duration_statement = 0 on your server and enable
logging? (There is a tutorial here if you don't know how to do that:
http://pgfouine.projects.postgresql.org/tutorial.html.)

You should see which queries are executed in both cases and find the
slow one easily.

Regards,

--
Guillaume



Re: [PERFORM] High CPU Load

2006-09-18 Thread Guillaume Smet

On 9/18/06, Jérôme BENOIS [EMAIL PROTECTED] wrote:

Tomorrow morning I plan to add 2 GB of RAM in order to test the
difference with my current config.


I don't think more RAM will change anything if you don't swap at all.
You can try to set shared_buffers lower (try 32768 and 16384) but I
don't think it will change anything in 8.1.

The only thing left IMHO is that 8.1 is choosing a bad plan which
consumes a lot of CPU for at least one query.

When you analyzed your logs, did you see a particularly slow query? Can
you compare the query log analysis of your old server and your new one?

--
Guillaume



Re: [PERFORM] High CPU Load

2006-09-14 Thread Guillaume Smet

On 9/14/06, Jérôme BENOIS [EMAIL PROTECTED] wrote:

   I migrated the Postgres server from 7.4.6 to 8.1.4, but my server is
completely overloaded, at times with a load average > 40.
All queries analyzed by EXPLAIN, all indexes are used... IO is good...


What is the bottleneck? Are you CPU bound? Do you have iowait? Do you
swap? Any weird things in vmstat output?


Is my configuration correct?
work_mem = 65536


If you have a lot of concurrent queries, it's probably far too much.
That said, if you don't swap, it's probably not the problem.

--
Guillaume



Re: [PERFORM] High CPU Load

2006-09-14 Thread Guillaume Smet

On 9/14/06, Jérôme BENOIS [EMAIL PROTECTED] wrote:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
15667 postgres  25   0  536m 222m 532m R 98.8 11.0   1:39.29 postmaster
19533 postgres  25   0  535m 169m 532m R 92.9  8.3   0:38.68 postmaster
16278 postgres  25   0  537m 285m 532m R 86.3 14.1   1:37.56 postmaster


Enable stats_command_string and see which queries are running on these
backends by selecting from pg_stat_activity.
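
Something like this (8.1-era, from memory):

-- postgresql.conf: stats_command_string = on, then reload
SELECT procpid, current_query FROM pg_stat_activity;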

Do the queries finish? Do you have them in your query log?

--
Guillaume



Re: [PERFORM] High CPU Load

2006-09-14 Thread Guillaume Smet

Jérôme,

Perhaps it's a stupid question, but are your queries slower than
before? You didn't say.

IMHO, it's not a problem to have a high load if you have a lot of
users and your queries are fast (and with 8.1, they should be far
faster than before).

To take a real example: we had a problem with a quad Xeon running
postgres 7.4. Even when there were a lot of queries, the load was
always lower than 4, and suddenly the queries were really slow and the
database completely unusable.
When we upgraded to 8.1, on very high load we had a far higher CPU
load, but the queries were far faster.

Considering your top output, I suspect you use HT, and you should
really disable it if that's the case.

--
Guillaume



Re: [PERFORM] High CPU Load

2006-09-14 Thread Guillaume Smet

On 9/14/06, Jérôme BENOIS [EMAIL PROTECTED] wrote:

Yes i have a lot of users ;-)


So your work_mem is probably far too high (that's what I told you in
my first message) and you probably swap when you have too many users.
Remember that work_mem can be used several times per query (and it's
especially the case when you have a lot of sorts).
When your load is high, check your swap activity and your I/O wait; top
gives you this information. If you swap, lower your work_mem to 32 MB
for example, then see if it's enough for your queries to run fast (you
can check whether files are created in $PGDATA/base/<your database
oid>/pgsql_tmp) and if it stops swapping. Retry with a lower/higher
value to find the one that fits your queries and load best.


I agree, but at times the DB server is really slow.


Yep, that's the information that was missing :).


what does HT mean, please?


Hyper-threading. It's usually not recommended to enable it on
PostgreSQL servers. On most servers, you can disable it directly in
the BIOS.

--
Guillaume



Re: [PERFORM] High concurrency OLTP database performance tuning

2006-08-31 Thread Guillaume Smet

Cosimo,

On 8/31/06, Cosimo Streppone [EMAIL PROTECTED] wrote:

The problem is that under peak load, when n. of concurrent transactions
raises, there is a sensible performance degradation.


Could you give us more information about the performance degradation?
In particular, CPU load/iostat/vmstat data from when the problem occurs
would be interesting.

--
Guillaume



Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Guillaume Smet

Guillaume,

On 28 Aug 2006 11:43:16 +0200, Guillaume Cottenceau [EMAIL PROTECTED] wrote:

 max_fsm_pages is 2
 max_fsm_relations is 1000
Do they look low?


Yes, they are probably too low if you don't run VACUUM on a regular
basis and you have a lot of UPDATE/DELETE activity. The FSM doesn't take
a lot of memory so it's usually recommended to have a comfortable value
for it.

I usually recommend to read:
http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10116&cNode=5K1C3W
http://www.pervasive-postgres.com/instantkb13/article.aspx?id=10087&cNode=5K1C3W
to understand better what VACUUM and FSM mean.


Can you elaborate? I have created a couple of indexes (according
to multiple models of use in our application) and they do take up
quite some disk space (the table dump is 600M but after restore it
takes up 1.5G on disk), but I thought they could only do good or
never be used, not impair performance...


Indexes slow down write activity (you have to maintain them). It's not
always a good idea to create them.


 Maybe a VACUUM FULL followed by a REINDEX will have solved your problem.

So these would have reordered the data for faster sequential
access, which is not the case with VACUUM ANALYZE?


VACUUM ANALYZE won't help you if your database is completely bloated.
And AFAICS you're not running it on a regular basis, so your database
was probably completely bloated, which means:
- bloated indexes,
- bloated tables (i.e. a lot of fragmentation in the pages, which means
that you need far more pages to store the same data).

The only ways to solve this situation are either to dump/restore or to
run a VACUUM FULL ANALYZE (VERBOSE is better, to keep a log), and
possibly reindex any bloated index (depending on your situation).


 When the free_space_map is too low, VACUUM ANALYZE should have told you
 via a warning (at least, if your logging is set appropriately).

Unfortunately, we didn't keep the logs of VACUUM ANALYZE, so I
can't be sure :/


You should really run VACUUM ANALYZE VERBOSE on a regular basis and
analyze the logs to make sure your VACUUM strategy and FSM settings are
OK.

I developed http://pgfouine.projects.postgresql.org/vacuum.html to
help us doing it on our production databases.

Regards,

--
Guillaume



Re: [PERFORM] Bad Planner Statistics for Uneven distribution.

2006-07-21 Thread Guillaume Smet

Tom,

On 7/21/06, Tom Lane [EMAIL PROTECTED] wrote:

It's really not possible for a full-table indexscan to be faster than a
seqscan, and not very credible for it even to be approximately as fast.
I suspect your second query here is the beneficiary of the first query
having fetched all the pages into cache.  In general, if you want to
optimize for a mostly-cached database, you need to reduce
random_page_cost below its default value ...


We discussed this case on IRC and the problem was not the first set of
queries but the second one:
select brand_id from brands where exists (select 1 from models_brands
where brand = brands.brand_id);).

Isn't there any way to make PostgreSQL produce a better estimate here:
->  Index Scan using models_brands_brand on models_brands
(cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008
rows=0 loops=303)
      Index Cond: (brand = $0)

I suppose it's because the planner estimates that there will be 92372
result rows that it chooses the seqscan instead of the index scan.
ALTER TABLE ... SET STATISTICS didn't change anything.
IIRC, there were already a few threads about the same sort of
estimation problem and there wasn't any solution for it. Do you have
any hints/ideas?

--
Guillaume



Re: [PERFORM] Performance Problem between Ora 10g and Psql

2006-07-12 Thread Guillaume Smet

Thomas,

On 7/12/06, Thomas Radnetter [EMAIL PROTECTED] wrote:

Is this the correct place to issue this problem?


It is if your issue is due to a PostgreSQL performance problem.


How can I trace down the cause for this performance problem?


The first thing to do is to determine whether it is a problem with the
Oracle -> ODBC -> PostgreSQL chain or a problem with the query. My
advice is to set log_min_duration_statement to 0 in your
postgresql.conf (and set up the logger so that you can see the log
output somewhere). Then you'll see if your query is slow.

If your query is slow, post the output of EXPLAIN ANALYZE to the
list with all the relevant information (structure of the tables
concerned, indexes, sizes...).

If not, it's probably more an ODBC problem.

Regards,

--
Guillaume Smet
Open Wide



Re: [PERFORM] Selects query stats?

2006-05-23 Thread Guillaume Smet

On 5/23/06, Dan Gorman [EMAIL PROTECTED] wrote:

What I am looking for is to know that our DB is doing X selects a minute.


If you're using 7.4, you can use log_duration to log only durations. It
won't log all the query text, only one short line per query. Then you
can use pgFouine to analyze this and produce a graph like
http://pgfouine.projects.postgresql.org/reports/sample_hourly.html .
If you only log durations, you won't be able to separate
insert/delete/update from select though, so it's interesting only if
those are negligible.

Note that this is not possible in 8.x: you'll have to log the
statement to get the duration logged. I proposed a patch but it was
refused because it made the log configuration more complex.


Turning on logging isn't an option as it will create too much IO in
our environment.


What we do here is log to another machine over the network using
syslog. From our experience, it's not logging itself that really
slows down the db but the generated I/O load. So if you do that, you
should be able to log the statements without slowing down your
database too much.
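
Roughly (7.4-style settings, from memory; adapt to your setup):

# postgresql.conf
syslog = 2                    # send the log to syslog only
syslog_facility = 'LOCAL0'
log_duration = true

# /etc/syslog.conf on the database box: forward to the log host
local0.*        @loghost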

On our production databases, we keep the log running all the time and
we generate reports daily.

Regards,

--
Guillaume



Re: [PERFORM] how unsafe (or worst scenarios) when setting fsync OFF for postgresql

2006-04-27 Thread Guillaume Smet
Guoping,

On 4/27/06, Guoping Zhang [EMAIL PROTECTED] wrote:
 We are looking at setting the fsync OFF option for performance reasons,

Did you try the other WAL sync methods (fdatasync in particular)? I
saw a few posts lately explaining how changing the sync method can
affect performance in specific cases.
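
For example (postgresql.conf; the best value depends on your platform,
so benchmark it):

wal_sync_method = fdatasync    # instead of turning fsync off entirely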

--
Guillaume



Re: [PERFORM] Query on postgresql 7.4.2 not using index

2006-04-25 Thread Guillaume Smet
On 4/25/06, Arnau [EMAIL PROTECTED] wrote:
 espsm_moviltelevision=# EXPLAIN ANALYZE SELECT agenda_user_group_id FROM
 agenda_users_groups
 espsm_moviltelevision-# WHERE group_id = '9';
 QUERY PLAN
 
   Seq Scan on agenda_users_groups  (cost=0.00..53108.45 rows=339675
 width=8) (actual time=916.903..5763.830 rows=367026 loops=1)
 Filter: (group_id = 9::numeric)
   Total runtime: 7259.861 ms
 (3 filas)

Arnau,

Why do you use a numeric instead of an integer/bigint?

IIRC, there were a few problems with indexes on numeric columns on
older versions of PostgreSQL.

You can't change the type of a column in 7.4, so create a new integer
column, copy the values into it, drop the old one and rename the new
one. Then run VACUUM ANALYZE and recreate your index.

It should work far better with an int.

Note that you will have to update all the tables referencing this key...
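
A rough sketch of the whole dance (index name made up, to adapt to
your schema):

ALTER TABLE agenda_users_groups ADD COLUMN group_id_int integer;
UPDATE agenda_users_groups SET group_id_int = group_id::integer;
ALTER TABLE agenda_users_groups DROP COLUMN group_id;
ALTER TABLE agenda_users_groups RENAME COLUMN group_id_int TO group_id;
VACUUM ANALYZE agenda_users_groups;
CREATE INDEX idx_agenda_users_groups_group_id
    ON agenda_users_groups (group_id);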

--
Guillaume



Re: [PERFORM] PostgreSQL and Xeon MP

2006-03-21 Thread Guillaume Smet
On 3/16/06, Tom Lane [EMAIL PROTECTED] wrote:
 Can you try strace'ing some of the backend processes while the system is
 behaving like this?  I suspect what you'll find is a whole lot of
 delaying select() calls due to high contention for spinlocks ...

As announced, we migrated our production server from 7.4.8 to
8.1.3 this morning. We did some strace'ing before the migration and
you were right about the select calls: we had a lot of them even when
the database was not highly loaded (one every 3-4 lines).

After the upgrade, we have the expected behaviour, with more linear
scalability and a growing CPU load when the database is highly loaded
(and no idle CPU anymore in this case). We have fewer context switches
too.

8.1.3 is definitely far better for quad Xeon MP boxes and I recommend
the upgrade to everyone having this sort of problem.

Tom, thanks for your great work on this problem.

--
Guillaume



Re: [PERFORM] planner with index scan cost way off actual cost, advices to tweak cost constants?

2006-03-18 Thread Guillaume Smet
Guillaume,

On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau
wrote:
 Reading the documentation and postgresql list archives, I have
 run ANALYZE right before my tests, I have increased the
 statistics target to 50 for the considered table; my problem is
 that the index scan cost reported by EXPLAIN seems to be around
 12.7 times higher that it should, a figure I suppose incompatible
 (too large) for just random_page_cost and effective_cache_size
 tweaks.

It's not surprising you have a high cost for an index scan which is
planned to return, and does return, so many rows. I really don't think
the planner is doing something wrong on this one.
AFAIK, increasing the statistics target won't do anything to reduce
the cost, as the planner's estimate of the number of returned rows is
already really accurate and probably can't get any better.

 Of course real queries use smaller date ranges.

What about providing us with the respective plans for your real
queries, in a real case? It's a bad idea to compare an index scan and a
seqscan when your data have to be loaded into RAM.
Before doing so, create an index on the date column to have the most
effective index possible.

 - I then tried to tweak random_page_cost and effective_cache_size
   following advices from documentation:

 SET random_page_cost = 2;

random_page_cost is the way to go for this sort of thing, but I don't
think it's a good idea to set it too low globally, and I'm still
thinking the problem is that your test case is not accurate.

--
Guillaume



[PERFORM] PostgreSQL and Xeon MP

2006-03-16 Thread Guillaume Smet
Hello,

We have been experiencing performance problems with a quad Xeon MP and
PostgreSQL 7.4 for a year now. Our context switch rate is not that high
but the load of the server is stuck at 4 even under very high load, and
we have 60% CPU idle even in this case. Our database fits in RAM and
we don't have any I/O problem. I saw this post from Tom Lane
http://archives.postgresql.org/pgsql-performance/2004-04/msg00249.php
and several other references to problems with Xeon MP, and I suspect
our problems are related to this.
We tried putting our production load on a dual standard Xeon on Monday
and it performed far better with the same configuration parameters.

I know that Tom has done work on multiprocessor support for PostgreSQL
8.1 but I didn't find any information on whether it solves the problem
with Xeon MP or not.

My question is: should we expect a resolution of our problem by
switching to 8.1, or will we still have problems and should we consider
a hardware change? We will try to upgrade next Tuesday so we will have
the real answer soon, but if anyone has any experience or information
on this, it will be very welcome.

Thanks for your help.

--
Guillaume



Re: [PERFORM] PostgreSQL and Xeon MP

2006-03-16 Thread Guillaume Smet
Richard,

 You should be seeing context-switching jump dramatically if it's the
 classic multi-Xeon problem. There's a point at which it seems to just
 escalate without a corresponding jump in activity.

No, we don't have this problem of very high context switching in our
case, even when the database is very slow. By very slow, I mean pages
which load in a few seconds in the normal case (load between 3 and 4)
taking several minutes (up to 5-10 minutes) to be generated in the
worst case (load at 4 but really bad performance).
If I take a look at our CPU load graph over one year, the CPU load was
never higher than 5, even in the worst cases...

 I checked with Tom last week. Thread starts below:
http://archives.postgresql.org/pgsql-hackers/2006-02/msg01118.php

 He's of the opinion that 8.1.3 will be an improvement.

Thanks for pointing me to this thread; I searched in -performance, not
in -hackers, as the original thread was in -performance. We have
planned a migration to 8.1.3 so we'll see what happens with this
version.

Do you plan to test it before next Tuesday? If so, I'm interested in
your results. I'll post our results here as soon as we complete the
upgrade.

--
Guillaume



Re: [PERFORM] PostgreSQL and Xeon MP

2006-03-16 Thread Guillaume Smet
Sven,

On 3/16/06, Sven Geisler [EMAIL PROTECTED] wrote:
 What version of XEON MP does your server have?

The server is a dell 6650 from end of 2004 with 4 xeon mp 2.2 and 2MB
cache per proc.

Here are the information from Dell:
4x PROCESSOR, 80532, 2.2GHZ, 2MB cache, 400Mhz, SOCKET F
8x DUAL IN-LINE MEMORY MODULE, 512MB, 266MHz

 Do you use Hyperthreading?

No, we don't use it.

 You should provide details from the XEON DP?

The only problem is that the Xeon DP is installed with a 2.6 kernel
and PostgreSQL 8.1.3 (it is used to test the migration from 7.4 to
8.1.3), so it's very difficult to really compare the two behaviours.

It's a Dell 2850 with:
2 x PROCESSOR, 80546K, 2.8G, 1MB cache, XEON NOCONA, 800MHz
4 x DUAL IN-LINE MEMORY MODULE, 1GB, 400MHz

This server is obviously newer than the other one.

--
Guillaume



Re: [PERFORM] PostgreSQL and Xeon MP

2006-03-16 Thread Guillaume Smet
On 3/16/06, Sven Geisler [EMAIL PROTECTED] wrote:
 Hi Guillaume,

 I had a similar issue last summer. Could you please provide details
 about your XEON MP server and some statistics (context-switches/load/CPU
 usage)?

I forgot the statistics:
CPU load usually from 1 to 4.
CPU usage usually < 40% for each processor; sometimes, when the
server completely hangs, it grows to 60%...

Here is a top output of the server at this time:
 15:21:17  up 138 days, 13:25,  1 user,  load average: 1.29, 1.25, 1.38
82 processes: 81 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
       total   25.7%    0.0%    3.9%   0.0%     0.3%    0.1%   69.7%
       cpu00   29.3%    0.0%    4.7%   0.1%     0.5%    0.0%   65.0%
       cpu01   20.7%    0.0%    1.9%   0.0%     0.3%    0.0%   76.8%
       cpu02   25.5%    0.0%    5.5%   0.0%     0.1%    0.3%   68.2%
       cpu03   27.3%    0.0%    3.3%   0.0%     0.1%    0.1%   68.8%
Mem:  3857224k av, 3298580k used,  558644k free,   0k shrd,  105172k buff
   2160124k actv,  701304k in_d,   56400k in_c
Swap: 4281272k av,6488k used, 4274784k free 2839348k cached

We currently have between 3,000 and 13,000 context switches/s, with an
average of 5,000 I'd say, visually.

Here is a top output from November 17, when the server completely hung
(several minutes for each page of the website); it is typical of this
server's behaviour:
17:08:41  up 19 days, 15:16,  1 user,  load average: 4.03, 4.26, 4.36
288 processes: 285 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
       total   59.0%    0.0%    8.8%   0.2%     0.0%    0.0%   31.9%
       cpu00   52.3%    0.0%   13.3%   0.9%     0.0%    0.0%   33.3%
       cpu01   65.7%    0.0%    7.6%   0.0%     0.0%    0.0%   26.6%
       cpu02   58.0%    0.0%    7.6%   0.0%     0.0%    0.0%   34.2%
       cpu03   60.0%    0.0%    6.6%   0.0%     0.0%    0.0%   33.3%
Mem:  3857224k av, 3495880k used,  361344k free,   0k shrd,   92160k buff
   2374048k actv,  463576k in_d,   37708k in_c
Swap: 4281272k av,   25412k used, 4255860k free 2173392k cached

As you can see, the load is stuck at 4, with no iowait and 30% CPU idle.

Vmstat showed 5000 context switches/s on average so we had no context
switch storm.



Re: [PERFORM] PostgreSQL and Xeon MP

2006-03-16 Thread Guillaume Smet
On 3/16/06, Sven Geisler [EMAIL PROTECTED] wrote:
 Did you compare 7.4 on a 4-way with 8.1 on a 2-way?

I know there are too many parameters changing between the two servers,
but I can't really change anything before Tuesday. On Tuesday, we will
be able to compare both servers with the same software.

 How many queries and clients did you use to test the performance?

Googlebot is indexing this site, generating 2-3 Mbit/s of traffic, so
we use Googlebot to stress this server. There were a lot of clients
and a lot of queries.

 How much faster is the XEON DP?

Well, on high load, PostgreSQL scales well on the DP (load at 40,
queries slower but still performing well) and is awfully slow on the
MP box.



Re: [PERFORM] PostgreSQL and Xeon MP

2006-03-16 Thread Guillaume Smet
On 3/16/06, Tom Lane [EMAIL PROTECTED] wrote:
 Can you try strace'ing some of the backend processes while the system is
 behaving like this?  I suspect what you'll find is a whole lot of
 delaying select() calls due to high contention for spinlocks ...

Tom,

I think we can try to do it.

You mean strace -p pid, with pid being one of the postgres backend
processes, not the postmaster itself, don't you? Do we need other
options? Which pattern should we expect? I'm not really familiar with
strace and its output.
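
In case it helps, something like this should do (pid being one of the
busy backends):

# attach and print the syscalls as they happen
strace -p <pid>
# or get a per-syscall summary after a while (stop with Ctrl-C)
strace -c -p <pid>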

Thanks for your help.



Re: [PERFORM] PostgreSQL and Xeon MP

2006-03-16 Thread Guillaume Smet
On 3/16/06, Tom Lane [EMAIL PROTECTED] wrote:
 What we want to find out is if there's a lot of select()s and/or
 semop()s shown in the result.  Ideally there wouldn't be any, but
 I fear that's not what you'll find.

OK, I'll try to do it on Monday before our upgrade, then see what
happens with PostgreSQL 8.1.3.

Thanks for your help.



Re: [PERFORM] Very slow queries - please help.

2005-11-24 Thread Guillaume Smet

Hi,


I'm also sending the EXPLAIN outputs.


Please provide EXPLAIN ANALYZE output instead of EXPLAIN; it carries
more information.


Indexes on your tables are obviously missing. You should try to add:

CREATE INDEX idx_node_filter ON node(name, type, usage);
CREATE INDEX idx_job_log_filter ON job_log(job_name, job_start, job_stop);

I'm not so sure it's a good idea to add job_stop to this index, as you
have an IS NULL in your query, so I'm not sure it can be used. You
should try it anyway and remove it if not needed.
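
If the IS NULL test is selective, a partial index might be worth a try
too (a sketch, untested):

CREATE INDEX idx_job_log_open ON job_log (job_name, job_start)
    WHERE job_stop IS NULL;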


I added all your search fields to the indexes, but it depends a lot on
the selectivity of your conditions. I don't know your data, but I think
you get the idea.


HTH

--
Guillaume



Re: [PERFORM] weird performances problem

2005-11-22 Thread Guillaume Smet

Andrew,


I would be very suspicious of that much memory for sort.  Please see
the docs for what that does.  That is the amount that _each sort_ can
allocate before spilling to disk.
If some set of your users are
causing complicated queries with, say, four sorts apiece, then each
user is potentially allocating 4x that much memory.  That's going to
wreak havoc on your disk buffers (which are tricky to monitor on most
systems, and impossible on some).


Yes, we do have complicated queries. That's why we set sort_mem so
high. I'll see if we can set it lower for the next few days to see if
it improves our performance.


Thanks for your help.

--
Guillaume



Re: [PERFORM] weird performances problem

2005-11-22 Thread Guillaume Smet

Qingqing Zhou wrote:
Someone is doing a massive *write* at this time, which makes your query 
*read* quite slow. Can you find out which process is doing write?


Indexes should be in memory, so I don't expect a massive write to slow
down the select queries. sdb is the RAID 10 array dedicated to our data,
so the postgresql process is the only one writing to it. I'll check
which write queries are running, because there should really be only a
few updates/inserts on our db during the day.


Over a four-day log analysis, I have the following:
SELECT  403,964
INSERT  574
UPDATE  393
DELETE  26
So it's not really normal to have a massive write load during the day.

Thanks for your help

--
Guillaume



Re: [PERFORM] weird performances problem

2005-11-22 Thread Guillaume Smet

Ron wrote:
If I understand your HW config correctly, all of the pg stuff is on the 
same RAID 10 set?


No, the system and the WAL are on a RAID 1 array and the data on its
own RAID 10 array.
As I said earlier, there are only a few writes to the database, so I'm
not really sure the WAL can be the limitation: IIRC, it's only used for
writes, isn't it?
Don't you think we would see some I/O wait if the database was waiting
on the WAL? We _never_ have any I/O wait on this server, but our CPUs
are still 30-40% idle.


A typical top we have on this server is:
 15:22:39  up 24 days, 13:30,  2 users,  load average: 3.86, 3.96, 3.99
156 processes: 153 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
       total   50.6%    0.0%    4.7%   0.0%     0.6%    0.0%   43.8%
       cpu00   47.4%    0.0%    3.1%   0.3%     1.5%    0.0%   47.4%
       cpu01   43.7%    0.0%    3.7%   0.0%     0.5%    0.0%   51.8%
       cpu02   58.9%    0.0%    7.7%   0.0%     0.1%    0.0%   33.0%
       cpu03   52.5%    0.0%    4.1%   0.0%     0.1%    0.0%   43.0%
Mem:  3857224k av, 3307416k used,  549808k free,   0k shrd,   80640k buff
       2224424k actv,  482552k in_d,   49416k in_c
Swap: 4281272k av,   10032k used, 4271240k free                 2602424k cached


As you can see, we don't swap, we have free memory, and we have all our
data cached (our database size is 1.5 GB).


Context switches are between 10,000 and 20,000 per second.

This concept works for other tables as well. If you have tables that
both want service at the same time, disk arm contention will drag
performance into the floor when they are on the same HW set.
Profile your HD access and put tables that want to be accessed at the
same time on different HD sets, even if you have to buy more HW to do it.


I use iostat and I can only see a little write activity and no read
activity on either RAID array.


--
Guillaume


