Re: [PERFORM] Using LIKE expression problem..

2004-05-12 Thread Michael Ryan S. Puncia
Yes , I already do that but the same result .. LIKE uses seq scan

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christopher
Kings-Lynne
Sent: Wednesday, May 12, 2004 2:48 PM
To: Michael Ryan S. Puncia
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Using LIKE expression problem..

> In the query plan ..it uses seq scan rather than index scan .. why ? I 
> have index on lastname, firtname.

Have you run VACUUM ANALYZE; on the table recently?

Chris


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Using LIKE expression problem..

2004-05-12 Thread Christopher Kings-Lynne
Are you in a non-C locale?

Chris

Michael Ryan S. Puncia wrote:

Yes , I already do that but the same result .. LIKE uses seq scan

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christopher
Kings-Lynne
Sent: Wednesday, May 12, 2004 2:48 PM
To: Michael Ryan S. Puncia
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Using LIKE expression problem..

In the query plan ..it uses seq scan rather than index scan .. why ? I 
have index on lastname, firtname.


Have you run VACUUM ANALYZE; on the table recently?

Chris

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Using LIKE expression problem..

2004-05-12 Thread Michael Ryan S. Puncia
Sorry .. I am a newbie and I don't know :( 
How can I know that I am in C locale ?
How can I change my database to use C locale?



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Christopher
Kings-Lynne
Sent: Wednesday, May 12, 2004 3:59 PM
To: Michael Ryan S. Puncia
Cc: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Using LIKE expression problem..

Are you in a non-C locale?

Chris

Michael Ryan S. Puncia wrote:

> Yes , I already do that but the same result .. LIKE uses seq scan
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Christopher
> Kings-Lynne
> Sent: Wednesday, May 12, 2004 2:48 PM
> To: Michael Ryan S. Puncia
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PERFORM] Using LIKE expression problem..
> 
> 
>>In the query plan ..it uses seq scan rather than index scan .. why ? I 
>>have index on lastname, firtname.
> 
> 
> Have you run VACUUM ANALYZE; on the table recently?
> 
> Chris
> 
> 
> ---(end of broadcast)---
> TIP 7: don't forget to increase your free space map settings
> 

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Clarification on some settings

2004-05-12 Thread Shridhar Daithankar
Doug Y wrote:

Hello,
  I've been having some performance issues with a DB I use. I'm trying 
to come up with some performance recommendations to send to the 
"adminstrator".

Hardware:
CPU0: Pentium III (Coppermine) 1000MHz (256k cache)
CPU1: Pentium III (Coppermine) 1000MHz (256k cache)
Memory: 3863468 kB (4 GB)
OS: Red Hat Linux release 7.2 (Enigma)
Kernel: 2.4.9-31smp
I/O I believe is a 3-disk raid 5.
/proc/sys/kernel/shmmax and /proc/sys/kernel/shmall were set to 2G

Postgres version: 7.3.4
 > The DB schema is, well to put it nicely... not exactly normalized. No
constraints to speak of except for the requisite not-nulls on the 
primary keys (many of which are compound). Keys are mostly varchar(256) 
fields.

Ok for what I'm uncertain of...
shared_buffers:
According to http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
Its more of a staging area and more isn't necessarily better. That psql 
relies on the OS to cache data for later use.
But according to 
http://www.ca.postgresql.org/docs/momjian/hw_performance/node3.html its 
where psql caches previous data for queries because the OS cache is 
slower, and should be as big as possible without causing swap.
Those seem to be conflicting statements. In our case, the 
"administrator" kept increasing this until performance seemed to 
increase, which means its now 25 (x 8k is 2G).
Is this just a staging area for data waiting to move to the OS cache, or 
is this really the area that psql caches its data?
It is the area where postgresql works. It updates data in this area and pushes 
it to OS cache for disk writes later.

By experience, larger does not mean better for this parameter. For multi-Gig RAM 
machines, the best(on an average for wide variety of load) value found to be 
around 1-15000. May be even lower.

It is a well known fact that raising this parameter unnecessarily decreases the 
performance. You indicate that best performance occurred at 25. This is very 
very large compared to other people's experience.
effective_cache_size:
Again, according to the Varlena guide this tells psql how much system 
memory is available for it to do its work in.
until recently, this was set at the default value of 1000. It was just 
recently increased to 18 (1.5G)
according to 
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html 
it should be about 25% of memory?
No rule of thumb. It is amount of memory OS will dedicate to psotgresql data 
buffers. Depending uponn what else you run on machine, it could be 
straight-forward or noodly value to calculate. For a 4GB machine, 1.5GB is quite 
good but coupled with 2G of shared buffers it could push the machines to swap 
storm. And swapping shared buffers is a big performance hit.

Finally sort_mem:
Was until recently left at the default of 1000. Is now 16000.
Sort memory is per sort not per query or per connection. So depending upon how 
many concurrent connections you entertain, it could take quite a chuck of RAM.
Increasing the effective cache and sort mem didn't seem to make much of 
a difference. I'm guessing the eff cache was probably raised a bit too 
much, and shared_buffers is way to high.
I agree. For shared buffers start with 5000 and increase in batches on 1000. Or 
set it to a high value and check with ipcs for maximum shared memory usage. If 
share memory usage peaks at 100MB, you don't need more than say 120MB of buffers.

What can I do to help determine what the proper settings should be 
and/or look at other possible choke points. What should I look for in 
iostat, mpstat, or vmstat as red flags that cpu, memory, or i/o bound?
Yes. vmstat is usually a lot of help to locate the bottelneck.

DB maintenance wise, I don't believe they were running vacuum full until 
I told them a few months ago that regular vacuum analyze no longer 
cleans out dead tuples. Now normal vac is run daily, vac full weekly 
(supposedly). How can I tell from the output of vacuum if the vac fulls 
aren't being done, or not done often enough? Or from the system tables, 
what can I read?
In 7.4 you can do vacuum full verbose and it will tell you the stats at the end. 
For 7.3.x, its not there.

I suggest you vacuum full database once.(For large database, dumping restoring 
might work faster. Dump/restore and vacuum full both lock the database 
exclusively i.e. downtime. So I guess faster the better for you. But there is no 
tool/guideline to determine which way to go.)

Is there anywhere else I can look for possible clues? I have access to 
the DB super-user, but not the system root/user.
Other than hardware tuning, find out slow/frequent queries. Use explain analyze 
to determine why they are so slow. Forgetting to typecast a where clause and 
using sequential scan could cost you lot more than mistuned postgresql 
configuration.

Thank you for your time. Please let me know any help or suggestions you 
may have. Unfortunately upgrading postgres, OS, kernel, or re-writing 
schema is most likely

Re: [PERFORM] Configuring PostgreSQL to minimize impact of checkpoints

2004-05-12 Thread Vivek Khera
> "TL" == Tom Lane <[EMAIL PROTECTED]> writes:

TL> Jack Orenstein <[EMAIL PROTECTED]> writes:
>> I'm looking at one case in which two successive transactions, each
>> updating a handful of records, take 26 and 18 *seconds* (not msec) to
>> complete. These transactions normally complete in under 30 msec.

TL> I've seen installations in which it seemed that the "normal" query load
TL> was close to saturating the available disk bandwidth, and the extra load
TL> imposed by a background VACUUM just pushed the entire system's response
TL> time over a cliff.  In an installation that has I/O capacity to spare,

me stand up waving hand... ;-)  This is my only killer problem left.
I always peg my disk usage at 100% when vacuum runs, and other queries
are slow too.  When not running vacuum, my queries are incredibly
zippy fast, including joins and counts and group by's on upwards of
100k rows at a time.

TL> I suspect that the same observations hold true for checkpoints, though
TL> I haven't specifically seen an installation suffering from that effect.

I don't see that.  But I also set checkpoint segments to about 50 on
my big server.



-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Configuring PostgreSQL to minimize impact of

2004-05-12 Thread Vivek Khera
> "JAR" == J Andrew Rogers <[EMAIL PROTECTED]> writes:


JAR> The LSI MegaRAID reading/writing/caching behavior is user configurable.
JAR> It will support both write-back and write-through, and IIRC, three
JAR> different algorithms for reading (none, read-ahead, adaptive).  Plenty
JAR> of configuration options.

For PG max performance, you want to set it for write-back and
read-ahead (adaptive has been claimed to be bad, but I got similar
performace from read-ahead and adaptive, so YMMV).


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Configuring PostgreSQL to minimize impact of checkpoints

2004-05-12 Thread jao
Quoting Vivek Khera <[EMAIL PROTECTED]>:

> > "TL" == Tom Lane <[EMAIL PROTECTED]> writes:
> 
> TL> Jack Orenstein <[EMAIL PROTECTED]> writes:
> >> I'm looking at one case in which two successive transactions, each
> >> updating a handful of records, take 26 and 18 *seconds* (not msec) to
> >> complete. These transactions normally complete in under 30 msec.
> 
> TL> I've seen installations in which it seemed that the "normal" query load
> TL> was close to saturating the available disk bandwidth, and the extra load
> TL> imposed by a background VACUUM just pushed the entire system's response
> TL> time over a cliff.  In an installation that has I/O capacity to spare,
> ...
> TL> I suspect that the same observations hold true for checkpoints, though
> TL> I haven't specifically seen an installation suffering from that effect.
> 
> I don't see that.  But I also set checkpoint segments to about 50 on
> my big server.

But wouldn't that affect checkpoint frequency, not checkpoint cost?

Jack Orenstein



This message was sent using IMP, the Internet Messaging Program.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Configuring PostgreSQL to minimize impact of checkpoints

2004-05-12 Thread Vivek Khera
On May 12, 2004, at 3:22 PM, [EMAIL PROTECTED] wrote:

I don't see that.  But I also set checkpoint segments to about 50 on
my big server.
But wouldn't that affect checkpoint frequency, not checkpoint cost
Seems reasonable.  I suppose checkpointing doesn't cost as much disk 
I/O as vacuum does.  My checkpoints are also on a separate RAID volume 
on a separate RAID channel, so perhaps that gives me extra bandwidth to 
perform the checkpoints.



---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Quad processor options

2004-05-12 Thread Manfred Koizar
On Tue, 11 May 2004 15:46:25 -0700, Paul Tuckfield <[EMAIL PROTECTED]>
wrote:
>- the "cache" column shows that linux is using 2.3G for cache. (way too 
>much)

There is no such thing as "way too much cache".

>   you generally want to give memory to postgres to keep it "close" to 
>the user,

Yes, but only a moderate amount of memory.

>   not leave it unused to be claimed by linux cache

Cache is not unused memory.

>- I'll bet you have a low value for shared buffers, like 1.  On 
>your 3G system
>   you should ramp up the value to at least 1G (125000 8k buffers) 

In most cases this is almost the worst thing you can do.  The only thing
even worse would be setting it to 1.5 G.

Postgres is just happy with a moderate shared_buffers setting.  We
usually recommend something like 1.  You could try 2, but don't
increase it beyond that without strong evidence that it helps in your
particular case.

This has been discussed several times here, on -hackers and on -general.
Search the archives for more information.

Servus
 Manfred

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [ADMIN] [PERFORM] Quad processor options

2004-05-12 Thread scott.marlowe
On Wed, 12 May 2004, Grega Bremec wrote:

> ...and on Tue, May 11, 2004 at 03:02:24PM -0600, scott.marlowe used the keyboard:
> > 
> > If you get the LSI megaraid, make sure you're running the latest megaraid 
> > 2 driver, not the older, slower 1.18 series.  If you are running linux, 
> > look for the dkms packaged version.  dkms, (Dynamic Kernel Module System) 
> > automagically compiles and installs source rpms for drivers when you 
> > install them, and configures the machine to use them to boot up.  Most 
> > drivers seem to be slowly headed that way in the linux universe, and I 
> > really like the simplicity and power of dkms.
> > 
> 
> Hi,
> 
> Given the fact LSI MegaRAID seems to be a popular solution around here, and
> many of you folx use Linux as well, I thought sharing this piece of info
> might be of use.
> 
> Running v2 megaraid driver on a 2.4 kernel is actually not a good idea _at_
> _all_, as it will silently corrupt your data in the event of a disk failure.
> 
> Sorry to have to say so, but we tested it (on kernels up to 2.4.25, not sure
> about 2.4.26 yet) and it comes out it doesn't do hotswap the way it should.
> 
> Somehow the replaced disk drives are not _really_ added to the array, which
> continues to work in degraded mode for a while and (even worse than that)
> then starts to think the replaced disk is in order without actually having
> resynced it, thus beginning to issue writes to non-existant areas of it.
> 
> The 2.6 megaraid driver indeed seems to be a merged version of the above
> driver and the old one, giving both improved performance and correct
> functionality in the event of a hotswap taking place.

This doesn't make any sense to me, since the hot swapping is handled by 
the card autonomously.  I also tested it with a hot spare and pulled one 
drive and it worked fine during our acceptance testing.

However, I've got a hot spare machine I can test on, so I'll try it again 
and see if I can make it fail.

when testing it, was the problem present in certain RAID configurations or 
only one type or what?  I'm curious to try and reproduce this problem, 
since I've never heard of it before.

Also, what firmware version were those megaraid cards, ours is fairly 
new, as we got it at the beginning of this year, and I'm wondering if it 
is a firmware issue.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Quad processor options - summary

2004-05-12 Thread Bjoern Metzdorf
Hi,
at first, many thanks for your valuable replies. On my quest for the 
ultimate hardware platform I'll try to summarize the things I learned.

-
This is our current setup:
Hardware:
Dual Xeon DP 2.4 on a TYAN S2722-533 with HT enabled
3 GB Ram (2 x 1 GB + 2 x 512 MB)
Mylex Extremeraid Controller U160 running RAID 10 with 4 x 18 GB SCSI 
10K RPM, no other drives involved (system, pgdata and wal are all on the 
same volume).

Software:
Debian 3.0 Woody
Postgresql 7.4.1 (selfcompiled, no special optimizations)
Kernel 2.4.22 + fixes
Database specs:
Size of a gzipped -9 full dump is roughly 1 gb
70-80% selects, 20-30% updates (roughly estimated)
up to 700-800 connections during peak times
kernel.shmall = 805306368
kernel.shmmax = 805306368
max_connections = 900
shared_buffers = 2
sort_mem = 16384
checkpoint_segments = 6
statistics collector is enabled (for pg_autovacuum)
Loads:
We are experiencing average CPU loads of up to 70% during peak hours. As 
Paul Tuckfield correctly pointed out, my vmstat output didn't support 
this. This output was not taken during peak times, it was freshly 
grabbed when I wrote my initial mail. It resembles perhaps 50-60% peak 
time load (30% cpu usage). iostat does not give results about disk 
usage, I don't know exactly why, the blk_read/wrtn columns are just 
empty. (Perhaps due to the Mylex rd driver, I don't know).

-
Suggestions and solutions given:
Anjan Dave reported, that he is pretty confident with his Quad Xeon 
setups, which will cost less than $20K at Dell with a reasonable 
hardware setup. ( Dell 6650 with 2.0GHz/1MB cache/8GB Memory, 5 internal 
drives (4 in RAID 10, 1 spare) on U320, 128MB cache on the PERC controller)

Scott Marlowe pointed out, that one should consider more than 4 drives 
(6 to 8, 10K rpm is enough, 15K is rip-off) for a Raid 10 setup, because 
that can boost performance quite a lot. One should also be using a 
battery backed raid controller. Scott has good experiences with the LSI 
Megaraid single channel controller, which is reasonably priced at ~ 
$500. He also stated, that 20-30% writes on a database is quite a lot.

Next Rob Sell told us about his research on more-than-2-way Intel based 
systems. The memory bandwidth on the xeon platform is always shared 
between the cpus. While a 2way xeon may perform quite well, a 4way 
system will be suffering due to the reduced memory bandwith available 
for each processor.

J. Andrew Roberts supports this. He said that 4way opteron systems scale 
much better than a 4way xeon system. Scaling limits begin at 6-8 cpus on 
the opteron platform. He also says that a fully equipped dual channel 
LSI Megaraid 320 with 256MB cache ram will be less that $1K. A complete 
4way opteron system will be at $10K-$12K.

Paul Tuckfield then gave the suggestion to bump up my shared_buffers. 
With a 3GB memory system, I could happily be using 1GB for shared 
buffers (125000). This was questioned by Andrew McMillian, Manfred 
Kolzar and Halford Dace, who say that common tuning advices limit 
reasonable settings to 1-2 shared buffers, because the OS is 
better at caching than the database.

-
Conclusion:
After having read some comparisons between n-way xeon and opteron systems:
http://www.anandtech.com/IT/showdoc.html?i=1982
http://www.aceshardware.com/read.jsp?id=6275
I was given the impression, that an opteron system is the way to go.
This is what I am considering the ultimate platform for postgresql:
Hardware:
Tyan Thunder K8QS board
2-4 x Opteron 848 in NUMA mode
4-8 GB RAM (DDR400 ECC Registered 1 GB modules, 2 for each processor)
LSI Megaraid 320-2 with 256 MB cache ram and battery backup
6 x 36GB SCSI 10K drives + 1 spare running in RAID 10, split over both 
channels (3 + 4) for pgdata including indexes and wal.
2 x 80 GB S-ATA IDE for system, running linux software raid 1 or 
available onboard hardware raid (perhaps also 2 x 36 GB SCSI)

Software:
Debian Woody in amd64 biarch mode, or perhaps Redhat/SuSE Enterprise 
64bit distributions.
Kernel 2.6
Postgres 7.4.2 in 64bit mode
shared_buffers = 2
a bumbed up effective_cache_size

Now the only problem left (besides my budget) is the availability of 
such a system.

I have found some vendors which ship similar systems, so I will have to 
talk to them about my dream configuration. I will not self build this 
system, there are too many obstacles.

I expect this system to come out on about 12-15K Euro. Very optimistic, 
I know :)

These are the vendors I found up to now:
http://www.appro.com/product/server_4144h.asp
http://www.appro.com/product/server_4145h.asp
http://www.pyramid.de/d/builttosuit/server/4opteron.shtml
http://www.rainbow-it.co.uk/productslist.aspx?CategoryID=4&selection=2
http://www.quadopteron.com/
They all seem to sell more or less the same system. I found als