Re: [PERFORM] using like in a prepare doesnt' use the right index

2008-03-28 Thread Gavin Sherry
On Thu, Mar 27, 2008 at 03:14:49PM -0400, Dave Cramer wrote:
> I have a query which is
> 
> prepare s_18 as select uid from user_profile where name like  
> $1::varchar and isactive=$2 order by name asc limit 250;
> 
> explain analyze execute s_18 ('atxchery%','t');
>QUERY 
> PLAN
> -
>  Limit  (cost=0.00..7965.22 rows=250 width=14) (actual  
> time=301.714..3732.269 rows=1 loops=1)
>->  Index Scan using user_profile_name_key on user_profile   
> (cost=0.00..404856.37 rows=12707 width=14) (actual  
> time=301.708..3732.259 rows=1 loops=1)
>  Filter: (((name)::text ~~ $1) AND (isactive = $2))
>  Total runtime: 3732.326 ms
> 
> without prepared statements we get
> 
> explain analyze select uid from user_profile where name like 'foo%'  
> and isactive='t' order by name asc limit 250;
>   
> QUERY PLAN
> -
>  Limit  (cost=293.89..294.08 rows=73 width=14) (actual  
> time=5.947..6.902 rows=250 loops=1)
>->  Sort  (cost=293.89..294.08 rows=73 width=14) (actual  
> time=5.943..6.258 rows=250 loops=1)
>  Sort Key: name
>  Sort Method:  top-N heapsort  Memory: 38kB
>  ->  Bitmap Heap Scan on user_profile  (cost=5.36..291.64  
> rows=73 width=14) (actual time=0.394..2.481 rows=627 loops=1)
>Filter: (isactive AND ((name)::text ~~ 'foo%'::text))
>->  Bitmap Index Scan on user_profile_name_idx   
> (cost=0.00..5.34 rows=73 width=0) (actual time=0.307..0.307 rows=628  
> loops=1)
>  Index Cond: (((name)::text ~>=~ 'foo'::text) AND  
> ((name)::text ~<~ 'fop'::text))
> 
> 
> There are two indexes on it
> 
> "user_profile_name_idx" UNIQUE, btree (name varchar_pattern_ops)
> "user_profile_name_key" UNIQUE, btree (name)
> 
> one for equality, one for like

This is behaving as designed because the planner transforms the
predicate in the second query: Index Cond: (((name)::text ~>=~
'foo'::text) AND ((name)::text ~<~ 'fop'::text)).

It cannot make this transformation for a prepared statement where the 
LIKE argument is a PREPARE parameter (the first query), since the
transformation depends on inspecting the actual string.

You could probably continue using prepared statements and make this
transformation yourself, but you'll have to be careful about creating the
'greater' string (see make_greater_string()).
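
A hedged sketch of the manual transformation (not from the original thread):
compute the 'greater' bound in the application ('fop' for 'foo') and pass both
bounds alongside the pattern, keeping LIKE as the exact filter. The plain range
comparison lets the planner use user_profile_name_key, but it only narrows the
scan correctly under C-locale ordering.

prepare s_18b as select uid from user_profile
    where name >= $1::varchar and name < $2::varchar
      and name like ($1::varchar || '%')
      and isactive = $3
    order by name asc limit 250;

execute s_18b ('atxchery', 'atxcherz', 't');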

Come to think of it, it'd be easier to just make a set returning function
which executes this query, if you need to stick with prepare/execute.
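
A minimal sketch of that idea, assuming PL/pgSQL (the function and parameter
names are made up): building the query with EXECUTE puts the pattern into the
statement as a literal, so the planner can derive the index range from it.

create or replace function user_prefix_search(p_prefix varchar, p_active boolean)
returns setof integer as $$
declare
    r record;
begin
    for r in execute
        'select uid from user_profile where name like '
        || quote_literal(p_prefix || '%')
        || ' and isactive = ' || quote_literal(p_active::text)
        || ' order by name asc limit 250'
    loop
        return next r.uid;
    end loop;
    return;
end;
$$ language plpgsql;

-- called as, e.g.: select * from user_prefix_search('atxchery', true);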

Thanks,

Gavin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] help

2007-03-08 Thread Gavin Sherry
On Tue, 6 Mar 2007, lissette wrote:

> 2007-02-02 16:41:03 [Thread-6]message(s) in mail box>
> 2007-02-02 16:41:13 [Thread-39   ]an ebxml message from mail box>
> 2007-02-02 16:41:13 [Thread-39   ]  
> 
> 2007-02-02 16:41:14 [Thread-39   ]outgoing message: [EMAIL PROTECTED]>
> 2007-02-02 16:41:25 [Thread-6]collecting message from mail box>
> hk.hku.cecid.piazza.commons.net.ConnectionException: Unable to
> connect
> to incoming mail server
> by javax.mail.AuthenticationFailedException: [LOGIN-DELAY]
> minimum
> time between mail checks violation
> at
> hk.hku.cecid.piazza.commons.net.MailReceiver.connect(MailReceiver.java:
> 66)

Did you actually read the error message? This seems unrelated to
PostgreSQL and definately not related to PostgreSQL performance.

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Postgres and really huge tables

2007-01-18 Thread Gavin Sherry
Hi Brian,

On Thu, 18 Jan 2007, Brian Hurt wrote:

> Is there any experience with Postgresql and really huge tables?  I'm
> talking about terabytes (plural) here in a single table.  Obviously the
> table will be partitioned, and probably spread among several different
> file systems.  Any other tricks I should know about?

Here is a blog post from a user who is in the multi-tb range:

http://www.lethargy.org/~jesus/archives/49-PostreSQL-swelling.html

I think Theo sums up some of the pros and cons well.

Your best bet is a test at scale. Be sure to ask for our feedback if you
encounter issues.

Gavin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Temporary table retains old contents on update eventually

2006-07-18 Thread Gavin Sherry
On Tue, 18 Jul 2006, Rusty Conover wrote:

> Hi,
>
> It would seem that doing any changes on a temp table forces a copy of
> the entire contents of the table to be retained in memory/disk. Is
> this  happening due to MVCC?  Is there a way to change this behavior?
> It could be very useful when you have really huge temp tables that
> need to be updated a few times before they can be dropped.

This is caused by our MVCC implementation. It cannot be easily changed. We
rely on MVCC for two things: concurrency and rolling back of aborted
commands. Without the latter, we couldn't support the following trivially:

template1=# create temp table bar (i int);
CREATE TABLE
template1=# begin;
BEGIN
template1=# insert into bar values(1);
INSERT 0 1
template1=# abort;
ROLLBACK
template1=# select * from bar;
 i
---
(0 rows)

It would be nice if we could special-case temp tables, since concurrency
does not come into the equation, but I cannot see it happening without a
generalised overwriting MVCC system.

The only alternative in the meantime is to vacuum your temporary table(s)
as part of your interaction with them.
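
For example, a minimal sketch of that pattern (table and sizes are made up):

create temp table big_scratch (i int);
insert into big_scratch select * from generate_series(1, 1000000);
update big_scratch set i = i + 1;   -- leaves a million dead row versions behind
vacuum big_scratch;                 -- reclaim them before the next update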

Thanks,

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Recovery will take 10 hours

2006-04-20 Thread Gavin Sherry
On Thu, 20 Apr 2006, Brendan Duddridge wrote:

> Hi Tomas,
>
> Hmm... ktrace -p PID -c returns immediately without doing anything
> unless I've previously done a ktrace -p PID.
>
> According to the man page for ktrace's -c flag:
>-c  Clear the trace points associated with the specified file
> or processes.

On other systems, strace/truss with -c produces a list of sys calls with
the number of times they've been called in the elapsed period.

To answer your other question, temporarily disabling fsync during the
recovery should speed it up.
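
A hedged sketch of what that looks like in postgresql.conf, for the duration
of the recovery only:

fsync = off     # unsafe for normal operation; set it back to on and restart
                # (or reload) the postmaster once the recovery has completed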

For future reference, processing thousands of WAL files for recovery is
not ideal. You should be doing a base backup much more often.

Gavin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] import performance

2006-03-13 Thread Gavin Sherry
On Mon, 13 Mar 2006, Dave Dutcher wrote:

> [Snip]
> > >
> > > shared_buffers = 256
> >
> > Make this higher too. If this is a dedicated machine with 512 MB of
> ram,
> > set it to something like 125000.
> >
> > You may need to adjust shared memory settings for your operating
> system.
> > See the manual for details.
> >
>
> Whoa.  Maybe I'm wrong, but isn't each buffer 8192 bytes?  So you are
> suggesting that he set his shared buffers to a gigabyte on a machine
> with 512 MB of ram?  Or was that just a miscalculation?

One too many zeros. Oops.

Gavin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] import performance

2006-03-13 Thread Gavin Sherry
On Tue, 14 Mar 2006, Chris wrote:

> Hi all,
>
> I'm trying to work out why my 8.1 system is slower than my 7.4 system
> for importing data.
>
> The import is a lot of "insert into" commands - it's a converted
> database from another system so I can't change it to copy commands.
>
>
> My uncommented config options:
>
>
> autovacuum = off
>
> bgwriter_all_maxpages = 15
> bgwriter_all_percent = 10.0

The above is a bit high.

> bgwriter_delay = 2000

This too.

> bgwriter_lru_maxpages = 10
> bgwriter_lru_percent = 5.0
>
> checkpoint_segments = 10
>
> commit_delay = 10
> commit_siblings = 500

Way too high.

>
> temp_buffers = 500
>
> wal_buffers = 16

Make this at least 64.

>
> max_connections = 16
>
> shared_buffers = 256

Make this higher too. If this is a dedicated machine with 512 MB of ram,
set it to something like 125000.

You may need to adjust shared memory settings for your operating system.
See the manual for details.
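
A hedged sketch of the changes in postgresql.conf, assuming a dedicated 512 MB
machine and using the corrected shared_buffers figure from the follow-up
message ("one too many zeros"):

shared_buffers = 12500      # 8KB pages, roughly 100 MB
wal_buffers = 64
bgwriter_delay = 200        # back towards the default
commit_siblings = 5         # back towards the default; 500 is far too high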

Thanks,

Gavin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] PostgreSQL performance question.

2005-12-14 Thread Gavin Sherry
On Thu, 15 Dec 2005, Harry Jackson wrote:

> Hi all,

> I have been using PostgreSQL (currently 7.4.7) for several years now and
> am very happy with it but I currently run a website that has had a
> little bit of a boost and I am starting to see some performance problems
> (Not necessarily PostgreSQL).

Definitely plan an 8.1 upgrade.

[snip]

> The database has been allocated 2Gb worth of shared buffers and I have
> tweaked most of the settings in the config recently to see if I could
> increase the performance any more and have seen very little performance
> gain for the various types of queries that I am running.

2 GB is too much for 7.4. I'm not sure about 8.1 because there hasn't been
any conclusive testing, I think. OSDL is using 200000, which is ~1.5GB.

Why not turn on log_min_duration_statement or process the log with PQA
(http://pgfoundry.org/projects/pqa/) to look for expensive queries?
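
A minimal sketch of the logging side in postgresql.conf (the threshold is just
an example):

log_min_duration_statement = 250   # in ms; logs any statement that runs longer,
                                   # -1 disables, 0 logs everything with timings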

Also, what kind of IO load are you seeing? (iostat will tell you.)

> It would appear that the only alternative may be a new machine that has
> a better disk subsystem or a large disk array then bung more RAM in the
> Opteron machine (max 16Gb 4Gb fitted) or purchase another machine with
> built in U320 SCSI ie an HP Proliant DL380 or Dell 2850.

Have a look at what your IO load is like, first.


> Some indication of current performance is as follows. I know these
> statements are hardly indicative of a full running application and
> everything that goes with it but I would be very interested in hearing
> if anyone has a similar setup and is able to squeeze a lot more out of
> PostgreSQL. From what I can see here the numbers look OK for the
> hardware I am running on and that its not PostgreSQL that is the
> problem.

> Inserting 1 million rows into the following table.These are raw insert
> statements.

[snip]

Yes, the performance looks a bit poor. I'd say that 8.1 will help address
that.

Also, don't underestimate the effects of CLUSTER on performance,
particularly before 8.1.

Thanks,

Gavin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] pgbench results interpretation?

2005-11-02 Thread Gavin Sherry
On Tue, 1 Nov 2005, Joost Kraaijeveld wrote:

> Hi Gavin,
>
> Thanks for answering.
>
> On Tue, 2005-11-01 at 20:16 +1100, Gavin Sherry wrote:
> > On Tue, 1 Nov 2005, Joost Kraaijeveld wrote:
> > > 1. Is there a repository somewhere that shows results, using and
> > > documenting different kinds of hard- and software setups so that I can
> > > compare my results with someone elses?
> >
> > Other than the archives of this mailing list, no.
> OK.
>
> > >
> > > 2. Is there a reason for the difference in values from run-to-run of
> > > pgbench:
> > Well, firstly: pgbench is not a good benchmarking tool.
> Is there a reason why that is the case? I would like to understand why?
> Is it because the transaction is to small/large? Or that the queries are
> to small/large? Or just experience?
>
> > It is mostly used
> > to generate load. Secondly, the numbers are suspicious: do you have fsync
> > turned off?
> In the first trials I posted yes, in the second no.
>
> > Do you have write caching enabled? If so, you'd want to make
> > sure that cache is battery backed.
> I am aware of that, but for now, I am mostly interested in the effects
> of the configuration parameters. I won't do this at home ;-)

Well, pgbench (tpc-b) suffers from inherent concurrency issues because all
connections are updating the branches table heavily. As an aside, did you
initialise with a scaling factor of 10 to match your level of concurrency?
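
For reference, a minimal sketch of doing that with the pgbench shipped in
contrib:

pgbench -i -s 10 test       # initialise with scaling factor 10 (10 branches)
pgbench -c 10 -t 1000 test  # then 10 clients no longer all hammer one branch row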

>
>
> > Thirdly, the effects of caching will be
> > seen on subsequent runs.
> In that case I would expect mostly rising values. I only copied and
> pasted 4 trials that were available in my xterm at the time of writing
> my email, but I could expand the list ad infinitum: the variance between
> the runs is very large. I also expect that if there is no shortage of
> memory wrt caching that the effect would be negligible, but I may be
> wrong. Part of using pgbench is learning about performance, not
> achieving it.

Right. It is well known that performance with pgbench can vary wildly. I
usually get a lot less variation than you are getting. My point, though, is
that it's not a great indication of performance. I generally simulate the
real application running in production and test configuration changes with
that. The hackers list archive also contains links to the testing Mark
Wong has been doing at OSDL with TPC-C and TPC-H. Taking a look at the
configuration file he is using, along with the annotated postgresql.conf,
would be useful, depending on the load you're anticipating and your
hardware.

>
> > > 3. It appears that running more transactions with the same amount of
> > > clients leads to a drop in the transactions per second. I do not
> > > understand why this is (a drop from more clients I do understand). Is
> > > this because of the way pgbench works, the way PostgrSQL works or even
> > > Linux?
> > This degradation seems to suggest effects caused by the disk cache filling
> > up (assuming write caching is enabled) and checkpointing.
> Which diskcache are your referring to? The onboard harddisk or RAID5
> controller caches or the OS cache? The first two I can unstand but if
> you refer to the OS cache I do not understand what I am seeing. I have
> enough memory giving the size of the database: during these duration (~)
> tests fsync was on, and the files could be loaded into memory easily
> (effective_cache_size = 32768 which is ~ 265 MB, the complete database
> directory 228 MB)

Well, two things may be at play. Firstly, if you are using write caching on
your controller/disks, then at the point at which that cache fills up,
performance will degrade to roughly what you could expect if write-through
caching were being used. Secondly, we checkpoint the system periodically to
ensure that recovery won't be too long a job. If you run pgbench for only a
few seconds, you will not see the effect of checkpointing, which usually
happens once every 5 minutes.

Hope this helps.

Thanks,

Gavin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] pgbench results interpretation?

2005-11-01 Thread Gavin Sherry
On Tue, 1 Nov 2005, Joost Kraaijeveld wrote:

> Hi,
>
> I am trying to optimize my Debian Sarge AMD64 PostgreSQL 8.0
> installation, based on the recommendations from "the Annotated
> POSTGRESQL.CONF Guide for
> PostgreSQL" 
> (http://www.powerpostgresql.com/Downloads/annotated_conf_80.html). To see the 
> result of the recommendations I use pgbench from  postgresql-contrib.
>
> I have 3 questions about pgbench:
>
> 1. Is there a repository somewhere that shows results, using and
> documenting different kinds of hard- and software setups so that I can
> compare my results with someone elses?

Other than the archives of this mailing list, no.

>
> 2. Is there a reason for the difference in values from run-to-run of
> pgbench:
>
> The command I used (nothing else is done on the machine, not even mouse
> movement):
> [EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 1000 test
>
> Results for 4 consecutive runs:
>
> tps = 272.932982 (including connections establishing)
> tps = 273.262622 (excluding connections establishing)
>
> tps = 199.501426 (including connections establishing)
> tps = 199.674937 (excluding connections establishing)
>
> tps = 400.462117 (including connections establishing)
> tps = 401.218291 (excluding connections establishing)
>
> tps = 223.695331 (including connections establishing)
> tps = 223.919031 (excluding connections establishing)

Well, firstly: pgbench is not a good benchmarking tool. It is mostly used
to generate load. Secondly, the numbers are suspicious: do you have fsync
turned off? Do you have write caching enabled? If so, you'd want to make
sure that cache is battery backed. Thirdly, the effects of caching will be
seen on subsequent runs.
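
A hedged sketch of what to check before trusting the numbers (the device name
is just an example):

grep fsync postgresql.conf   # a meaningful test needs fsync = on
hdparm -W /dev/hda           # reports whether the drive's write cache is enabled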

>
> 3. It appears that running more transactions with the same amount of
> clients leads to a drop in the transactions per second. I do not
> understand why this is (a drop from more clients I do understand). Is
> this because of the way pgbench works, the way PostgrSQL works or even
> Linux?
>
> [EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 10 test
> tps = 379.218809 (including connections establishing)
> tps = 461.968448 (excluding connections establishing)
>
> [EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 100 test
> tps = 533.878031 (including connections establishing)
> tps = 546.571141 (excluding connections establishing)

Well, at this rate pgbench is only running for 2 seconds!

>
> [EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 1000 test
> tps = 204.30 (including connections establishing)
> tps = 204.533627 (excluding connections establishing)
>
> [EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 1 test
> tps = 121.486803 (including connections establishing)
> tps = 121.493681 (excluding connections establishing)
>


This degradation seems to suggest effects caused by the disk cache filling
up (assuming write caching is enabled) and checkpointing.

Hope this helps.

Gavin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Gavin Sherry
On Thu, 29 Sep 2005, Joe wrote:

> Magnus Hagander wrote:
> > That actually depends a lot on *how* you use it. I've seen pg-on-windows
> > deployments that come within a few percent of the linux performance.
> > I've also seen those that are absolutely horrible compared.
> >
> > One sure way to kill the performance is to do a lot of small
> > connections. Using persistent connection is even more important on
> > Windows than it is on Unix. It could easily explain a difference like
> > this.
>
> I just tried using pg_pconnect() and I didn't notice any significant
> improvement.  What bothers me most is that with Postgres I tend to see jerky
> behavior on almost every page:  the upper 1/2 or 2/3 of the page is displayed
> first and you can see a blank bottom (or you can see a half-filled completion
> bar).  With MySQL each page is generally displayed in one swoop.

Please post the table definitions, queries and explain analyze results so
we can tell you why the performance is poor.

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-09-28 Thread Gavin Sherry
On Wed, 28 Sep 2005, Joe wrote:

> I'm converting a relatively small database (2 MB) from MySQL to PostgreSQL.  
> It
> is used to generate web pages using PHP.  Although the actual website runs 
> under
> Linux, the development is done under XP.  I've completed most of the data
> conversion and rewrite of the PHP scripts, so now I'm comparing relative
> performance.
>
> It appears that PostgreSQL is two to three times slower than MySQL.  For
> example, some pages that have some 30,000 characters (when saved as HTML) 
> take 1
> to 1 1/2 seconds with MySQL but 3 to 4 seconds with PostgreSQL.  I had read 
> that
> the former was generally faster than the latter, particularly for simple web
> applications but I was hoping that Postgres' performance would not be that
> noticeably slower.

Are you comparing PostgreSQL on XP to MySQL on XP or PostgreSQL on Linux
to MySQL on Linux? Our performance on XP is not great. Also, which version
of PostgreSQL are you using?

>
> I'm trying to determine if the difference can be attributed to anything that
> I've done or missed.  I've run VACUUM ANALYZE on the two main tables and I'm
> looking at the results of EXPLAIN on the query that drives the retrieval of
> probably 80% of the data for the pages in question.

Good.

>
> Before I post the EXPLAIN and the table schema I'd appreciate confirmation 
> that
> this list is the appropriate forum.  I'm a relative newcomer to PostgreSQL 
> (but
> not to relational databases), so I'm not sure if this belongs in the novice or
> general lists.

You can post the results of EXPLAIN ANALYZE here. Please include schema
definitions and the query string(s) themselves.

Thanks,

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Slow concurrent update of same row in a given table

2005-09-28 Thread Gavin Sherry
On Thu, 29 Sep 2005, Rajesh Kumar Mallah wrote:

> On 9/29/05, Gavin Sherry <[EMAIL PROTECTED]> wrote:
> > On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:
> >
> > > > > Number of Copies | Update perl Sec
> > > > >
> > > > > 1  --> 119
> > > > > 2  ---> 59
> > > > > 3  --->  38
> > > > > 4  ---> 28
> > > > > 5 --> 22
> > > > > 6 --> 19
> > > > > 7 --> 16
> > > > > 8 --> 14
> > > > > 9 --> 11
> > > > > 10 --> 11
> > > > > 11 --> 10
> > > >
> > > > So, 11 instances result in 10 updated rows per second, database wide or
> > > > per instance? If it is per instance, then 11 * 10 is close to the
> > > > performance for one connection.
> > >
> > >
> > > Sorry do not understand the difference between "database wide"
> > > and "per instance"
> >
> > Per instance.
> >
> > >
> > > >
> > > > That being said, when you've got 10 connections fighting over one row, I
> > > > wouldn't be surprised if you had bad performance.
> > > >
> > > > Also, at 119 updates a second, you're more than doubling the table's
> > > > initial size (dead tuples) each second. How often are you vacuuming and
> > > > are you using vacuum or vacuum full?
> > >
> > >
> > > Yes I realize the obvious phenomenon now, (and the uselessness of the 
> > > script)
> > >  , we should not consider it a performance degradation.
> > >
> > > I am having performance issue in my live database thats why i tried to
> > > simulate the situation(may the the script was overstresser).
> > >
> > > My original problem is  that i send 100 000s of emails carrying a
> > > beacon for tracking readership every tuesday and on wednesday i see
> > > lot of the said query in pg_stat_activity each of these query update
> > > the SAME row that corresponds to the dispatch of last day and it is
> > > then i face the performance problem.
> > >
> > > I think i can only post further details next wednesday , please lemme
> > > know how should i be dealing with the situation if each the updates takes
> > > 100times more time that normal update duration.
> >
> > I see. These problems regularly come up in database design. The best thing
> > you can do is modify your database design/application such that instead of
> > incrementing a count in a single row, you insert a row into a table,
> > recording the 'dispatch_id'. Counting the number of rows for a given
> > dispatch id will give you your count.
> >
>
> sorry i will be accumulating huge amount of rows in seperate table
> with no extra info when i really want just the count. Do you have
> a better database design in mind?
>
> Also i encounter same problem in implementing read count of
> articles in sites and in counting banner impressions where same
> row get updated by multiple processes frequently.

As I said in private email, accumulating large numbers of rows is not a
problem. In your current application, you are write bound, not read bound.
I've designed many similar systems which have hundreds of millions of rows.
It takes a while to generate the count, but you just do it periodically in
non-busy periods.

With 8.1, constraint exclusion will give you significantly better
performance with this system, as well.
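
A minimal sketch of the suggested design (table and column names are
assumptions):

create table dispatch_hit (
    dispatch_id integer not null,
    hit_time    timestamp with time zone not null default now()
);

-- each beacon request becomes an insert, so concurrent writers never
-- contend on a single row:
insert into dispatch_hit (dispatch_id) values (114);

-- the read count is derived periodically, during a quiet period:
select dispatch_id, count(*) from dispatch_hit group by dispatch_id;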

Thanks,

Gavin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Slow concurrent update of same row in a given table

2005-09-28 Thread Gavin Sherry
On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:

> > > Number of Copies | Update perl Sec
> > >
> > > 1  --> 119
> > > 2  ---> 59
> > > 3  --->  38
> > > 4  ---> 28
> > > 5 --> 22
> > > 6 --> 19
> > > 7 --> 16
> > > 8 --> 14
> > > 9 --> 11
> > > 10 --> 11
> > > 11 --> 10
> >
> > So, 11 instances result in 10 updated rows per second, database wide or
> > per instance? If it is per instance, then 11 * 10 is close to the
> > performance for one connection.
>
>
> Sorry do not understand the difference between "database wide"
> and "per instance"

Per instance.

>
> >
> > That being said, when you've got 10 connections fighting over one row, I
> > wouldn't be surprised if you had bad performance.
> >
> > Also, at 119 updates a second, you're more than doubling the table's
> > initial size (dead tuples) each second. How often are you vacuuming and
> > are you using vacuum or vacuum full?
>
>
> Yes I realize the obvious phenomenon now, (and the uselessness of the script)
>  , we should not consider it a performance degradation.
>
> I am having performance issue in my live database thats why i tried to
> simulate the situation(may the the script was overstresser).
>
> My original problem is  that i send 100 000s of emails carrying a
> beacon for tracking readership every tuesday and on wednesday i see
> lot of the said query in pg_stat_activity each of these query update
> the SAME row that corresponds to the dispatch of last day and it is
> then i face the performance problem.
>
> I think i can only post further details next wednesday , please lemme
> know how should i be dealing with the situation if each the updates takes
> 100times more time that normal update duration.

I see. These problems regularly come up in database design. The best thing
you can do is modify your database design/application such that instead of
incrementing a count in a single row, you insert a row into a table,
recording the 'dispatch_id'. Counting the number of rows for a given
dispatch id will give you your count.

Thanks,

Gavin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Slow concurrent update of same row in a given table

2005-09-28 Thread Gavin Sherry
On Wed, 28 Sep 2005, Rajesh Kumar Mallah wrote:

> Hi
>
> While doing some stress testing for updates in a small sized table
> we found the following results. We are not too happy about the speed
> of the updates particularly at high concurrency (10 clients).
>
> Initially we get 119 updates / sec but it drops to 10 updates/sec
> as concurrency is increased.
>
> PostgreSQL: 8.0.3
> ---
> TABLE STRUCTURE: general.stress
> ---
> | dispatch_id  | integer  | not null  |
> | query_id | integer  |   |
> | generated| timestamp with time zone |   |
> | unsubscribes | integer  |   |
> | read_count   | integer  |   |
> | status   | character varying(10)|   |
> | bounce_tracking  | boolean  |   |
> | dispatch_hour| integer  |   |
> | dispatch_date_id | integer  |   |
> +--+--+---+
> Indexes:
> "stress_pkey" PRIMARY KEY, btree (dispatch_id)
>
> UPDATE STATEMENT:
> update general.stress set read_count=read_count+1 where dispatch_id=114

This means you are updating only one row, correct?

> Number of Copies | Update perl Sec
>
> 1  --> 119
> 2  ---> 59
> 3  --->  38
> 4  ---> 28
> 5 --> 22
> 6 --> 19
> 7 --> 16
> 8 --> 14
> 9 --> 11
> 10 --> 11
> 11 --> 10

So, 11 instances result in 10 updated rows per second, database wide or
per instance? If it is per instance, then 11 * 10 is close to the
performance for one connection.

That being said, when you've got 10 connections fighting over one row, I
wouldn't be surprised if you had bad performance.

Also, at 119 updates a second, you're more than doubling the table's
initial size (dead tuples) each second. How often are you vacuuming and
are you using vacuum or vacuum full?

Gavin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] performance discrepancy indexing one column versus

2005-09-11 Thread Gavin Sherry
On Sun, 11 Sep 2005, Mark Dilger wrote:

> All,
>
> In the psql output below, I would expect the second query to run faster,
> because the b-tree index on two columns knows the values of 'b' for any
> given value of 'a', and hence does not need to fetch a row from the
> actual table.  I am not seeing a speed-up, however, so I expect my
> understanding of the index mechanism is wrong.  Could anyone enlighten
> me?

A common but incorrect assumption. We must consult the underlying table
when we do an index scan so that we can check visibility information. The
reason this information is stored in the table is so that we have only one
place to check for tuple visibility, and therefore avoid race conditions.

A brief explanation of this system is described here:
http://www.postgresql.org/docs/8.0/static/mvcc.html.

and this page shows what information we store in order to do the visibility
checks:

http://www.postgresql.org/docs/8.0/static/storage-page-layout.html

Thanks,

Gavin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Gavin Sherry
On Wed, 24 Aug 2005, Tom Lane wrote:

> Gavin Sherry <[EMAIL PROTECTED]> writes:
> > A filesystem could, in theory, help us by providing an API which allows us
> > to tell the file system either: the way we'd like it to read ahead, the
> > fact that we don't want it to read ahead or the way we'd like it to cache
> > (or not cache) data. The thing is, most OSes provide interfaces to do this
> > already and we make only little use of them (I'm think of
> > madv_sequential(), madv_random(), POSIX fadvise(), the various flags to
> > open() which AIX, HPUX, Solaris provide).
>
> Yeah ... the main reason we've not spent too much time on that sort of
> stuff is that *it's not portable*.  And with all due respect to Hans,
> special tweaks for one filesystem are even less interesting than special
> tweaks for one OS.

Right.

As an aside, it seems to me that if there is merit in all this low level
interaction with the file system (not to mention the other platform
specific microoptimisations which come up regularly on the lists) then the
companies currently producing niche commercial releases of PostgreSQL
should be taking advantage of them: if it increases performance, then
there's a reason to buy as opposed to just downloading the OSS version.

Gavin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Gavin Sherry
On Wed, 24 Aug 2005, PFC wrote:

>
> > Josh Berkus has already mentioned this as conventional wisdom as written
> > by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been
> > around for a long time; it was probably a clear performance win way back
> > when. Nowadays with how far open-source OS's have advanced, I'd take it
> > with a grain of salt and do my own performance analysis. I suspect the
> > big vendors wouldn't change their stance even if they knew it was no
> > longer true due to the support hassles.
>
>   Reinvent a filesystem... that would be suicidal.
>
>   Now, Hans Reiser has expressed interest on the ReiserFS list in tweaking
> his Reiser4 especially for Postgres. In his own words, he wants a "Killer
> app for reiser4". Reiser4 will offser transactional semantics via a
> special reiser4 syscall, so it might be possible, with a minimum of
> changes to postgres (ie maybe just another sync mode besides fsync,
> fdatasync et al) to use this. Other interesting details were exposed on
> the reiser list, too (ie. a transactional filesystems can give ACID
> guarantees to postgres without the need for fsync()).
>
>   Very interesting.

Ummm... I don't see anything here which will be a win for Postgres. The
transactional semantics we're interested in are fairly complex:

1) Modifications to multiple objects can become visible to the system
atomically
2) On error, a series of modifications which had been grouped together
within a transaction can be rolled back
3) Using object version information, determine which version of which
object is visible to a given session
4) Using version information and locking, detect and resolve read/write
and write/write conflicts

Now, I can see a file system offering (1) and (2). But a file system that
can allow people to do (3) and (4) would require that we make *major*
modifications to how postgresql is implemented. Moreover, it would be for
no gain, since we've already written a system which can do it.

A filesystem could, in theory, help us by providing an API which allows us
to tell the file system either: the way we'd like it to read ahead, the
fact that we don't want it to read ahead or the way we'd like it to cache
(or not cache) data. The thing is, most OSes provide interfaces to do this
already and we make only limited use of them (I'm thinking of
madv_sequential(), madv_random(), POSIX fadvise(), and the various flags to
open() which AIX, HPUX and Solaris provide).

Gavin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] index as large as table

2005-08-20 Thread Gavin Sherry
On Sat, 20 Aug 2005 ohp@pyrenet.fr wrote:

> Hi,
>
> While testing 8.1dev I came to this:
>
> CREATE TABLE t (
> a int,
> b int
> PRIMARY KEY (a,b));
>
> In  that case, the index is as big as the table.

Right. Think about it: the index must store a, b, a reference to the data
in the table itself and index meta data. If an index is defined across all
columns of the table, it must be bigger than the table itself. (In
PostgreSQL, when the table is small, the index will be smaller still. This
is because each entry in the table itself has metadata. But the amount
of data per row of a table remains constant, whereas the amount of
metadata in an index grows.)
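
You can see this for yourself; a hedged example using the size functions that
8.1 provides (the primary key index is named t_pkey by default):

select pg_relation_size('t')      as heap_bytes,
       pg_relation_size('t_pkey') as index_bytes;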

> My question is is it worthwhile to have such index peformance wise.
> I understand I'd loose uniqness buthas such an index any chance to be used
> against seq scan.

Of course. The idea is that, generally speaking, you're only interested in
a small portion of the data stored in the table. Indexes store extra data
so that they can locate the portion you're interested in faster.

Gavin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query plan looks OK, but slow I/O - settings advice?

2005-08-19 Thread Gavin Sherry
The query plan does *not* look okay.

> electric=# EXPLAIN ANALYZE
> electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent
> electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR 
> logfielddatatype = 69)
> electric-# AND graphtargetlog = 1327
> electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800
> electric-# ORDER BY timestamp;
>   
>   
> QUERY PLAN
> --
>  Sort  (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232 
> rows=2625 loops=1)
>Sort Key: public.logdata_recent."timestamp"
>->  Result  (cost=0.00..82.41 rows=6 width=14) (actual 
> time=52.483..60200.868 rows=2625 loops=1)
>  ->  Append  (cost=0.00..82.41 rows=6 width=14) (actual 
> time=52.476..60189.929 rows=2625 loops=1)
>->  Seq Scan on logdata_recent  (cost=0.00..46.25 rows=1 
> width=14) (actual time=0.003..0.003 rows=0 loops=1)
>  Filter: (((logfielddatatype = 70) OR (logfielddatatype = 
> 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" 
> >= 1123052400) AND ("timestamp" <= 1123138800))
>->  Index Scan using logdata_recent_1123085306_ix_t_fld_gtl, 
> logdata_recent_1123085306_ix_t_fld_gtl, 
> logdata_recent_1123085306_ix_t_fld_gtl on logdata_recent_stale logdata_recent 
>  (cost=0.00..18.08 rows=3 width=14) (actual time=52.465..60181.624 rows=2625 
> loops=1)

Notice here that the expected row count is 3, but the actual row count is a
hell of a lot higher. Try increasing the statistics targets for the columns
on which logdata_recent_1123085306_ix_t_fld_gtl is declared.
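
A hedged sketch (the column choices are assumptions based on the index name,
which suggests timestamp, logfielddatatype and graphtargetlog):

alter table logdata_recent_stale alter column "timestamp" set statistics 200;
alter table logdata_recent_stale alter column logfielddatatype set statistics 200;
alter table logdata_recent_stale alter column graphtargetlog set statistics 200;
analyze logdata_recent_stale;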

Also, the actual index scan is taking a long time. How recently have you
vacuum full'd?

Thanks,

Gavin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] limit number of concurrent callers to a stored proc?

2005-08-17 Thread Gavin Sherry
Hi Alan,

On Wed, 17 Aug 2005, Alan Stange wrote:

> Hello all,
>
> is there a simple way to limit the number of concurrent callers to a
> stored proc?
>
> The problem we have is about 50 clients come and perform the same
> operation at nearly the same time.  Typically, this query takes a few
> seconds to run, but in the case of this thundering herd the query time
> drops to 70 seconds or much more.  The query can return up to 15MB of data.
>
> The machine is a dual opteron, 8 GB memory, lots of fiber channel disk,
> Linux 2.6, etc.
>
> So, I'm thinking that a semaphore than will block more than N clients
> from being in the core of the function at one time would be a good thing.

There is no PostgreSQL feature which will do this for you. It should be
possible to implement this yourself, without too much pain. If you're
using PL/PgSQL, write another function in C or one of the other more
sophisticated PLs to implement the logic for you. At the beginning of the
function, execute a function to increment the count; at the end, execute
a function to decrement it.

If you're writing the function in C or one of those more sophisticated
PLs, it's even easier.

As an aside, using semaphores might be a little painful. I'd just grab
some shared memory and keep a counter in it. If the counter is greater
than your desired number of concurrent executions, you sleep and try again
soon.

That being said, did you want to give us a look at your function and data
and see if we can improve the performance at all?

Thanks,

Gavin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Speedier count(*)

2005-08-11 Thread Gavin Sherry
On Thu, 11 Aug 2005, Tino Wildenhain wrote:

> Am Donnerstag, den 11.08.2005, 00:40 -0400 schrieb Mark Cotner:
> > Here's a trigger I wrote to perform essentially the same purpose.  The nice
> > thing about this is it keeps the number up to date for you, but you do incur
> > slight overhead.
> ...
> >
> > CREATE TRIGGER del_rowcount_tr BEFORE DELETE ON test FOR EACH ROW EXECUTE
> >PROCEDURE del_rowcount();
> > CREATE TRIGGER add_rowcount_tr BEFORE INSERT ON test FOR EACH ROW EXECUTE
> >PROCEDURE add_rowcount();
> >
> > INSERT INTO rowcount (tablename) VALUES ('test');
> ...
>
> beware of problems with concurrency and even what happens
> if transactions roll back. Maybe you can "fix" it a bit
> by regulary correcting the count via cronjob or so.

What problems? MVCC takes care of this.

Gavin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Speedier count(*)

2005-08-10 Thread Gavin Sherry
Hi Dan,

On Wed, 10 Aug 2005, Dan Harris wrote:

> I have a web page for my customers that shows them count of records
> and some min/max date ranges in each table of a database, as this is
> how we bill them for service.  They can log in and check the counts
> at any time.  I'd like for the counts to be as fresh as possible by
> keeping this dynamic, but I will use a periodic 'snapshot'/cron job
> if that is the only option to speed this up.   I have thought about
> using the table statistics, but the estimate error is probably
> unacceptable because of the billing purposes.
>
> For some reason, the SQL Server we migrated the app from can return
> count(*) in a split second on multi-million row tables, even though
> it is a MUCH slower box hardware-wise, but it's now taking many
> seconds to run. I have read in the archives the problems MVCC brings
> into the count(*) dilemma forcing Pg to run a seq scan to get
> counts.  Does SQLServer not use MVCC or have they found another

SQL Server probably jumps through a lot of hoops to do fast count(*)s. I'm
sure we could do something similar -- it's just a question of complexity,
resources, desirability, etc. There are other solutions, which makes the
idea of doing it less attractive still.

> approach for arriving at this number?  Compounding all the min/max
> and counts from other tables and all those queries take about a
> minute to run. The tables will contain anywhere from 1 million to 40
> million rows.
>
> Also, I am using "select ... group by ... order by .. limit 1" to get
> the min/max since I have already been bit by the issue of min() max()
> being slower.

I generally pre-generate the results. There are two ways to do this: the
'snapshot'/cron job you mentioned, or using rules and triggers to maintain
'count' tables. The idea is that whenever data is added, modified or removed
from your table, you modify counters in these other tables.
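
A minimal sketch of the trigger-maintained approach, a simplified cousin of
the trigger quoted in the other reply (names are assumptions):

create table rowcount (
    tablename name   primary key,
    cnt       bigint not null default 0
);
insert into rowcount (tablename) values ('test');   -- seed the counter

create or replace function add_rowcount() returns trigger as '
begin
    update rowcount set cnt = cnt + 1 where tablename = TG_RELNAME;
    return NEW;
end;
' language plpgsql;

create trigger add_rowcount_tr before insert on test
    for each row execute procedure add_rowcount();

-- a matching BEFORE DELETE trigger decrements cnt; the count is then simply:
select cnt from rowcount where tablename = 'test';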

Alternatively, feel free to post your schema and sample queries with
explain analyze results to this list. Alternatively, jump on irc at
irc.freenode.net #postgresql and someone will be more than happy to look
through the problem in more detail.

Thanks,

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-28 Thread Gavin Sherry
zOn Thu, 28 Jul 2005, Matthew Schumacher wrote:

> Gavin Sherry wrote:
>
> >
> > I had a look at your data -- thanks.
> >
> > I have a question though: put_token() is invoked 120596 times in your
> > benchmark... for 616 messages. That's nearly 200 queries (not even
> > counting the 1-8 (??) inside the function itself) per message. Something
> > doesn't seem right there
> >
> > Gavin
>
> I am pretty sure that's right because it is doing word statistics on
> email messages.
>
> I need to spend some time studying the code, I just haven't found time yet.
>
> Would it be safe to say that there isn't any glaring performance
> penalties other than the sheer volume of queries?

Well, everything relating to one message should be issued in a transaction
block. Secondly, the initial select may be unnecessary -- I haven't looked
at the logic that closely.

There is, potentially, some parser overhead. In C, you could get around
this with PQprepare() et al.
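
A hedged SQL-level sketch of those two suggestions; the table and column names
here are assumptions, not the real SpamAssassin schema:

prepare put_token_q (text, integer) as
    update bayes_token set spam_count = spam_count + $2 where token = $1;

begin;
execute put_token_q ('viagra', 1);
-- ... one execute per token seen in this message ...
commit;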

It would also be interesting to look at the cost of a C function.

Gavin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-28 Thread Gavin Sherry
On Thu, 28 Jul 2005, Matthew Schumacher wrote:

> Karim Nassar wrote:
> > On Wed, 2005-07-27 at 14:35 -0800, Matthew Schumacher wrote:
> >
> >
> >>I put the rest of the schema up at
> >>http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone
> >>needs to see it too.
> >
> >
> > Do you have sample data too?
> >
>
> Ok, I finally got some test data together so that others can test
> without installing SA.
>
> The schema and test dataset is over at
> http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz
>
> I have a pretty fast machine with a tuned postgres and it takes it about
> 2 minutes 30 seconds to load the test data.  Since the test data is the
> bayes information on 616 spam messages than comes out to be about 250ms
> per message.  While that is doable, it does add quite a bit of overhead
> to the email system.

I had a look at your data -- thanks.

I have a question though: put_token() is invoked 120596 times in your
benchmark... for 616 messages. That's nearly 200 queries (not even
counting the 1-8 (??) inside the function itself) per message. Something
doesn't seem right there

Gavin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Finding bottleneck

2005-07-28 Thread Gavin Sherry
Hi,

On Thu, 28 Jul 2005, Kari Lavikka wrote:

> ->8 Relevant rows from postgresql.conf 8<-
>
> shared_buffers = 15000  # min 16, at least max_connections*2, 8KB each
> work_mem = 1536 # min 64, size in KB

As an aside, I'd increase work_mem -- but it doesn't sound like that is
your problem.

> maintenance_work_mem = 32768# min 1024, size in KB
>
> max_fsm_pages = 100 # min max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 5000# min 100, ~50 bytes each
>
> vacuum_cost_delay = 15  # 0-1000 milliseconds
> vacuum_cost_limit = 120 # 0-1 credits
>
> bgwriter_percent = 2# 0-100% of dirty buffers in each round
>
> fsync = true# turns forced synchronization on or off
>  # fsync, fdatasync, open_sync, or 
> open_datasync
> wal_buffers = 128   # min 4, 8KB each

Some benchmarking results out today suggest that wal_buffers = 1024 or
even 2048 could greatly assist you.

> commit_delay = 8# range 0-10, in microseconds
> commit_siblings = 10# range 1-1000

This may explain the fact that you've got backed up queries and idle CPU
-- I'm not certain though. What does disabling commit_delay do to your
situation?
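
A hedged sketch of the experiment in postgresql.conf:

wal_buffers = 1024    # 8KB each; up from 128
commit_delay = 0      # disable group commit to see whether it causes the stalls
work_mem = 8192       # KB; a modest increase from 1536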

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] COPY insert performance

2005-07-26 Thread Gavin Sherry
Hi Chris,

Have you considered breaking the data into multiple chunks and COPYing
each concurrently?

Also, have you ensured that your table isn't storing OIDs?
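
A minimal sketch of both points; the table, columns and file paths are made up:

create table tick (
    symbol   varchar(12)   not null,
    trade_ts timestamp     not null,
    price    numeric(12,4)
) without oids;

-- run one COPY per chunk, each from its own connection, concurrently:
copy tick from '/data/ticks.part1';
copy tick from '/data/ticks.part2';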

On Mon, 25 Jul 2005, Chris Isaacson wrote:

> #---
> 
> # RESOURCE USAGE (except WAL)
> #---
> 
> shared_buffers = 65536  # min 16, at least max_connections*2, 8KB each

shared_buffers that high has been shown to hurt performance. Try 12000.

> wal_buffers = 64  # min 4, 8KB each

Increasing wal_buffers can also have an effect on performance.

Thanks,

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Gavin Sherry
On Fri, 22 Apr 2005, Shoaib Burq (VPAC) wrote:

> Please see attached the output from explain analyse. This is with the
>
>   shared_buffers = 10600
>   work_mem = 102400
>   enable_seqscan = true
>
> BTW I guess should mention that I am doing the select count(*) on a View.
>
> Ran the Explain analyse with the nestedloop disabled but it was taking
> forever... and killed it after 30mins.

Try increasing stats collection on ClimateChangeModel40.ClimateId:

alter table ClimateChangeModel40 alter column ClimateId set statistics 1000;
analyze ClimateChangeModel40;

Gavin

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] two queries and dual cpu (perplexed)

2005-04-21 Thread Gavin Sherry
On Thu, 21 Apr 2005, Jeff wrote:

>
> On Apr 21, 2005, at 7:49 AM, Shoaib Burq (VPAC) wrote:
>
> > Now I have not touch the $PGDATA/postgresql.conf (As I know very little
> > about memory tuning) Have run VACCUM & ANALYZE.
> >
> You should really, really bump up shared_buffers and given you have 8GB
> of ram this query would likely benefit from more work_mem.

I'd recommend shared_buffers = 10600. It's possible that work_mem in the
hundreds of megabytes might have a good impact, but it's hard to say
without seeing the EXPLAIN ANALYZE output.

Gavin


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Bulk COPY end of copy delimiter

2005-04-04 Thread Gavin Sherry
Hi,

On Mon, 4 Apr 2005, Steven Rosenstein wrote:

>
>
>
>
> Today while trying to do a bulk COPY of data into a table, the process
> aborted with the following error message:
>
> ERROR: end-of-copy marker corrupt
> CONTEXT: COPY tbl_logged_event, line 178519: "606447014,1492,2005-02-24
> 03:16:14,2005-02-23 20:27:48,win_applog,,error,adsmclientservice,nt
> author..."
>
> Googling the error, we found reference to the '\.' (backslash-period) being
> an "end-of-copy marker".  Unfortunately, our data contains the
> backslash-period character sequence.  Is there any know fix or workaround
> for this condition?

Any sequence \. in COPY input data should be escaped as \\. If this data
was generated by pg_dump then it's a problem, but I haven't seen any other
reports of this. Can I assume that you've generated the data for bulk load
yourself? If so, there is discussion of escaping characters here:
http://www.postgresql.org/docs/8.0/static/sql-copy.html.
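
A hedged sketch of the workaround: in COPY text format every literal backslash
must be doubled, which also takes care of accidental \. sequences (file names
are examples):

sed 's/\\/\\\\/g' raw_events.txt > escaped_events.txt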

Gavin

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql

2005-01-13 Thread Gavin Sherry
On Wed, 12 Jan 2005 [EMAIL PROTECTED] wrote:

> I wonder  if I would like to increase more RAM from 4 Gb. to 6 Gb. [which I 
> hope
> to increase more performance ] and I now I used RH 9 and Pgsql 7.3.2 ON DUAL
> Xeon 3.0 server thay has the limtation of 4 Gb. ram, I should use which OS
> between FC 2-3 or redhat EL 3 [which was claimed to support 64 Gb.ram] .May I
> use FC 2 [which is freely downloaded] with 6 Gb. and PGsql 7.4 ?
> Amrit
> Thailand

Try 7.4 before the memory upgrade. If you still have performance issues,
try optimising your queries. As I mentioned before, you can join the
#postgresql channel on irc.freenode.net and we can assist.

Gavin


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql

2005-01-12 Thread Gavin Sherry
On Wed, 12 Jan 2005 [EMAIL PROTECTED] wrote:

> I wonder  if I would like to increase more RAM from 4 Gb. to 6 Gb. [which I 
> hope
> to increase more performance ] and I now I used RH 9 and Pgsql 7.3.2 ON DUAL
> Xeon 3.0 server thay has the limtation of 4 Gb. ram, I should use which OS
> between FC 2-3 or redhat EL 3 [which was claimed to support 64 Gb.ram] .May I
> use FC 2 [which is freely downloaded] with 6 Gb. and PGsql 7.4 ?

There is no problem with free Linux distros handling > 4 GB of memory. The
problem is that 32 bit hardware must make use of some less than efficient
mechanisms to be able to address the memory.

So, try 7.4 before the memory upgrade. If you still have performance issues,
try optimising your queries. As I mentioned before, you can join the
#postgresql channel on irc.freenode.net and we can assist.

Gavin



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Low Performance for big hospital server ..

2005-01-04 Thread Gavin Sherry
On Tue, 4 Jan 2005 [EMAIL PROTECTED] wrote:

> Today is the first official day of this weeks and the system run better in
> serveral points but there are still some points that need to be corrected. 
> Some
> queries or some tables are very slow. I think the queries inside the programe
> need to be rewrite.
> Now I put the sort mem to a little bit bigger:
> sort mem = 16384   increase  the sort mem makes no effect on the slow 
> point
> eventhough there is little connnection.
> shared_buffers = 27853
> effective cache = 12

Even though others have said otherwise, I've had good results from setting
sort_mem higher -- even if that is per query.

>
> I will put more ram but someone said RH 9.0 had poor recognition on the Ram
> above 4 Gb?

I think they were referring to 32 bit architectures, not distributions as
such.

> Should I close the hyperthreading ? Would it make any differnce between open 
> and
> close the hyperthreading?
> Thanks for any comment

In my experience, the largest performance increases come from intensive
analysis and optimisation of queries. Look at the output of EXPLAIN
ANALYZE for the queries your application is generating and see if they can
be tuned in any way. More often than not, they can.

Feel free to ask for assistence on irc at irc.freenode.net #postgresql.
People there help optimise queries all day ;-).

> Amrit
> Thailand

Gavin

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] psql large RSS (1.6GB)

2004-11-01 Thread Gavin Sherry
On Sat, 30 Oct 2004, Dustin Sallings wrote:

> >   If the solution is to just write a little client that uses perl
> > DBI to fetch rows one at a time and write them out, that's doable,
> > but it would be nice if psql could be made to "just work" without
> > the monster RSS.
>
>   It wouldn't make a difference unless that driver implements the
> underlying protocol on its own.

Even though we can tell people to make use of cursors, it seems that
memory usage for large result sets should be addressed. A quick search of
the archives does not reveal any discussion about having libpq spill to
disk if a result set reaches some threshold. Has this been canvassed in
the past?
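
For reference, a minimal sketch of the cursor workaround (the table name is an
example); fetching in batches keeps only one batch in the client's memory at a
time:

begin;
declare big_cur cursor for select * from huge_table;
fetch forward 10000 from big_cur;   -- repeat until it returns no rows
close big_cur;
commit;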

Thanks,

Gavin

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Large Database Performance suggestions

2004-10-21 Thread Gavin Sherry
On Thu, 21 Oct 2004, Joshua Marsh wrote:

> Recently, we have found customers who are wanting to use our service
> with data files between 100 million and 300 million records.  At that
> size, each of the three major tables will hold between 150 million and
> 700 million records.  At this size, I can't expect it to run queries
> in 10-15 seconds (what we can do with 10 million records), but would
> prefer to keep them all under a minute.

To provide any useful information, we'd need to look at your table schemas
and sample queries.

The values for sort_mem and shared_buffers will also be useful.

Are you VACUUMing and ANALYZEing? (Or is the data read only?)

gavin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Select with qualified join condition / Batch inserts

2004-10-19 Thread Gavin Sherry
On Fri, 15 Oct 2004, Bernd wrote:

> Hi,
>
> we are working on a product which was originally developed against an Oracle
> database and which should be changed to also work with postgres.
>
> Overall the changes we had to make are very small and we are very pleased with
> the good performance of postgres - but we also found queries which execute
> much faster on Oracle. Since I am not yet familiar with tuning queries for
> postgres, it would be great if someone could give me a hint on the following
> two issues. (We are using PG 8.0.0beta3 on Linux kernel 2.4.27):
>
> 1/ The following query takes about 5 sec. with postrgres whereas on Oracle it
> executes in about 30 ms (although both tables only contain 200 k records in
> the postgres version).
>
> SQL:
>
> SELECT cmp.WELL_INDEX, cmp.COMPOUND, con.CONCENTRATION
>   FROM SCR_WELL_COMPOUND cmp, SCR_WELL_CONCENTRATION con
>   WHERE cmp.BARCODE=con.BARCODE
>   AND cmp.WELL_INDEX=con.WELL_INDEX
>   AND cmp.MAT_ID=con.MAT_ID
>   AND cmp.MAT_ID = 3
>   AND cmp.BARCODE='910125864'
>   AND cmp.ID_LEVEL = 1;
>
> Table-def:
> Table "public.scr_well_compound"
>Column   |  Type  | Modifiers
> ++---
>  mat_id | numeric(10,0)  | not null
>  barcode| character varying(240) | not null
>  well_index | numeric(5,0)   | not null
>  id_level   | numeric(3,0)   | not null
>  compound   | character varying(240) | not null
> Indexes:
> "scr_wcm_pk" PRIMARY KEY, btree (id_level, mat_id, barcode, well_index)

I presume you've VACUUM FULL'd and ANALYZE'd? Can we also see a plan, i.e.
the output of EXPLAIN ANALYZE for the query?
http://www.postgresql.org/docs/7.4/static/sql-explain.html.

You may need to create indexes with a different leading column, i.e. on
mat_id or barcode.
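
For example, a hedged sketch of an index led by the columns the query
constrains most selectively:

create index scr_wcm_barcode_idx
    on scr_well_compound (barcode, mat_id, well_index);
analyze scr_well_compound;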


> 2/ Batch-inserts using jdbc (maybe this should go to the jdbc-mailing list -
> but it is also performance related ...):
> Performing many inserts using a PreparedStatement and batch execution makes a
> significant performance improvement in Oracle. In postgres, I did not observe
> any performance improvement using batch execution. Are there any special
> caveats when using batch execution with postgres?

The JDBC people should be able to help with that.

Gavin

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [pgsql-advocacy] [PERFORM] [OFF-TOPIC] - Known maximum size of

2004-05-06 Thread Gavin Sherry
On Thu, 6 May 2004, Shridhar Daithankar wrote:

> Richard Huxton wrote:
>
> > Christopher Kings-Lynne wrote:
> >
>  What's the case of bigger database PostgreSQL (so greate and amount of
>  registers) that they know???
> >> Didn't someone say that RedSheriff had a 10TB postgres database or
> >> something?
> >  From http://www.redsheriff.com/us/news/news_4_201.html
> >
> > "According to the company, RedSheriff processes 10 billion records a
> > month and the total amount of data managed is more than 32TB. Griffin
> > said PostgreSQL has been in production for 12 months with not a single
> > database fault in that time “The stability of the database can not be
> > questioned. Needless to say, we are extremely happy."
> >
> > I think it's safe to assume this is not on a spare Dell 600SC though.
> >
>
> I think we should have a case study for that. And publish it on our regular
> news/press contacts(Can't imagine the flame war on /...Umm Yummy..:-)). It would
> make a lot of noise and gain visibility for us.
>
> Of course Red Sherrif need to co-operate and spell the details and/or moderate
> what we write, but all in all, 32TB database is uber-cool..:-)

I've tried contacting them. They will not return my phone calls or emails.

Gavin

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Filesystems WAS: Perfomance Tuning

2003-08-14 Thread Gavin Sherry
On Tue, 12 Aug 2003, Bruce Momjian wrote:

> 
> I think Gavin Sherry is working on this.  I am CC'ing him.
> 
> ---

Yes I am working on this. I am about 50% of the way through the patch but
have been held up with other work. For those who are interested, it
basically allows:

1) creation of different 'storage' locations. Tables and indexes can be
created in different storage locations. Storage locations can also be
assigned to schemas and databases. Tables and indexes will default to the
schema storage location if STORAGE 'store name' is not provided to the
CREATE statement. This will cascade to the default database storage location if
the schema was not created with STORAGE 'store name'.

2) the patch will allow different storage locations to have different
rand_cost parameters passed to the planner.

3) the patch *will not* address issues concerning quotas, resource
management, WAL/clog, temp or sort spaces.

Will keep everyone posted if/when I finish.

Thanks,

Gavin


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])