Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tambet Matiisen
Josh Berkus josh@agliodbs.com writes: 1) When is it necessary to run REINDEX or drop/create an index? All I could really find in the docs is: If you need to VACUUM FULL, you need to REINDEX as well. For example, if you drop millions of rows from a table. That's probably a

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Dave Held
-Original Message- From: Alex Turner [mailto:[EMAIL PROTECTED] Sent: Monday, April 18, 2005 5:50 PM To: Bruce Momjian Cc: Kevin Brown; pgsql-performance@postgresql.org Subject: Re: [PERFORM] How to improve db performance with $7K? Does it really matter at which end of the cable

Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tom Lane
Tambet Matiisen [EMAIL PROTECTED] writes: Is my current understanding correct: 1) VACUUM defragments each page locally - moves free space to the end of page. 2) VACUUM FULL defragments table globally - tries to fill up all partially free pages and deletes all resulting empty pages. Both

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Mohan, Ross
Good question. If the SCSI system was moving the head from track 1 to 10, and a request then came in for track 5, could the system make the head stop at track 5 on its way to track 10? That is something that only the controller could do. However, I have no idea if SCSI does that. || SCSI,

Re: [PERFORM] Postgresql works too slow

2005-04-19 Thread Simon Riggs
On Mon, 2005-04-18 at 08:50 +0400, Nurlan Mukhanov (AL/EKZ) wrote: I'm trying to restore my database from dump in several parrallel processes, but restore process works too slow. Number of rows about 100 000 000, RAM: 8192M CPU: Ultra Sparc 3 Number of CPU: 4 OS: SunOS sun 5.8 RDBMS:

Re: [PERFORM] Postgresql works too slow

2005-04-19 Thread Brad Nicholson
Simon Riggs wrote: On Mon, 2005-04-18 at 08:50 +0400, Nurlan Mukhanov (AL/EKZ) wrote: I'm trying to restore my database from dump in several parrallel processes, but restore process works too slow. Number of rows about 100 000 000, RAM: 8192M CPU: Ultra Sparc 3 Number of CPU: 4 OS: SunOS sun

Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Josh Berkus
Tambet, Hmm, thanks for a tip. BTW, is output of select count(1), sum(relpages) from pg_class where relkind in ('r','i','t') Well, if you do that for all databases in the cluster, it's the number you start with. However, setting FSM_pages to that would be assuming that you excpected 100%

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Bruce Momjian
Mohan, Ross wrote: The only part I am pretty sure about is that real-world experience shows SCSI is better for a mixed I/O environment. Not sure why, exactly, but the command queueing obviously helps, and I am not sure what else does. || TCQ is the secret sauce, no doubt. I think NCQ

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Mohan, Ross
Clustered file systems is the first/best example that comes to mind. Host A and Host B can both request from diskfarm, eg. -Original Message- From: Bruce Momjian [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 19, 2005 12:10 PM To: Mohan, Ross Cc: pgsql-performance@postgresql.org

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Bruce Momjian
Mohan, Ross wrote: Clustered file systems is the first/best example that comes to mind. Host A and Host B can both request from diskfarm, eg. So one host writes to part of the disk and another host writes to a different part?

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Richard_D_Levine
[EMAIL PROTECTED] wrote on 04/19/2005 11:10:22 AM: What is 'multiple initiators' used for in the real world? I asked this same question and got an answer off list: Somebody said their SAN hardware used multiple initiators. I would try to check the archives for you, but this thread is

Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: select count(1), sum(relpages) from pg_class where relkind in ('r','i','t') Well, if you do that for all databases in the cluster, it's the number you start with. However, setting FSM_pages to that would be assuming that you excpected 100% of the rows

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Mohan, Ross
Well, more like they both are allowed to issue disk requests and the magical clustered file system manages locking, etc. In reality, any disk is only reading/writing to one part of the disk at any given time, of course, but that in the multiple initiator deal, multiple streams of requests from

Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Josh Berkus
Tom, Not at all. What it says is that you expect 100% of the pages to have useful amounts of free space, which is a *much* weaker criterion. Hmmm. Good point. This seems to be another instance where my rule-of-thumb was based on false logic but nevertheless arrived at correct numbers.

Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: Not at all. What it says is that you expect 100% of the pages to have useful amounts of free space, which is a *much* weaker criterion. H actually, it seems like, if you are vacuuming regularly, you only *do* need to track pages that have been

Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Alvaro Herrera
On Tue, Apr 19, 2005 at 10:06:40AM -0400, Tom Lane wrote: BTW, VACUUM FULL does the data movement back-to-front, and stops as soon as it finds a tuple it cannot move down; which is a reasonable strategy since the goal is merely to make the file shorter. But it's entirely likely that there

Re: [PERFORM] Question on REINDEX

2005-04-19 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: Am I right in thinking that vacuum does at least two passes: one front-to-back to find removable tuples, and other back-to-front for movement? VACUUM FULL, yes. VACUUM only does the first one. I know maintenance_work_mem is used for storing TIDs of

[PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-19 Thread Josh Berkus
Folks, Params: PostgreSQL 8.0.1 on Solaris 10 Statistics = 500 (tablenames have been changed to protect NDA) e1=# select tablename, null_frac, correlation, n_distinct from pg_stats where tablename = 'clickstream1' andattname = 'session_id'; tablename | null_frac | correlation |

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-19 Thread Dave Held
-Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 19, 2005 2:09 PM To: pgsql-perform Subject: [PERFORM] Bad n_distinct estimation; hacks suggested? [...] (BTW, increasing the stats to 1000 only doubles n_distinct, and doesn't solve the problem)

Re: [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-19 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: As you can see, n_distinct estimation is off by a factor of 10x and it's causing query planning problems. Any suggested hacks to improve the histogram on this? What's the histogram itself look like? (I'd like to see the whole pg_stats row not just

Re: [PERFORM] [HACKERS] PLM pulling from CVS nightly for testing in STP

2005-04-19 Thread Mark Wong
I have dbt-2 tests automatically running against each pull from CVS and have started to automatically compile results here: http://developer.osdl.org/markw/postgrescvs/ I did start with a bit of a minimalistic approach, so I'm open for any comments, feedback, etc. Mark

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Jim C. Nasby
On Thu, Apr 14, 2005 at 10:51:46AM -0500, Matthew Nuzum wrote: So if you all were going to choose between two hard drives where: drive A has capacity C and spins at 15K rpms, and drive B has capacity 2 x C and spins at 10K rpms and all other features are the same, the price is the same and C

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Jim C. Nasby
On Mon, Apr 18, 2005 at 07:41:49PM +0200, Jacques Caron wrote: It would be interesting to actually compare this to real-world (or nearly-real-world) benchmarks to measure the effectiveness of features like TCQ/NCQ etc. I was just thinking that it would be very interesting to benchmark

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Jim C. Nasby
On Mon, Apr 18, 2005 at 10:20:36AM -0500, Dave Held wrote: Hmm...so you're saying that at some point, quantity beats quality? That's an interesting point. However, it presumes that you can actually distribute your data over a larger number of drives. If you have a db with a bottleneck of one

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Jim C. Nasby
On Mon, Apr 18, 2005 at 06:41:37PM -, Mohan, Ross wrote: Don't you think optimal stripe width would be a good question to research the binaries for? I'd think that drives the answer, largely. (uh oh, pun alert) EG, oracle issues IO requests (this may have changed _just_ recently) in

Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Jim C. Nasby
On Tue, Apr 19, 2005 at 11:22:17AM -0500, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 04/19/2005 11:10:22 AM: What is 'multiple initiators' used for in the real world? I asked this same question and got an answer off list: Somebody said their SAN hardware used multiple

Re: [PERFORM] immutable functions vs. join for lookups ?

2005-04-19 Thread Jim C. Nasby
You should re-run the function test using SQL as the function language instead of plpgsql. There might be some performance to be had there. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do

Re: [PERFORM] How to improve postgres performace

2005-04-19 Thread Jim C. Nasby
On Mon, Apr 18, 2005 at 11:36:01AM -0300, Rodrigo Moreno wrote: I really worried about that, because it's no enough anymore, and users claim about performace. But running the vacuumdb full, everthing starts to run better again, so i think the problem is not related to a specific query. Vacuum

Re: [PERFORM] Sort and index

2005-04-19 Thread Jim C. Nasby
On Mon, Apr 18, 2005 at 10:44:43AM -0500, Dave Held wrote: Since you are fetching the entire table, you are touching all the rows. If the query were to fetch the rows in index order, it would be seeking all over the table's tracks. By fetching in sequence order, it has a much better chance of

[PERFORM] What to do with 6 disks?

2005-04-19 Thread Jeff Frost
Now that we've hashed out which drives are quicker and more money equals faster... Let's say you had a server with 6 separate 15k RPM SCSI disks, what raid option would you use for a standalone postgres server? a) 3xRAID1 - 1 for data, 1 for xlog, 1 for os? b) 1xRAID1 for OS/xlog, 1xRAID5 for

Re: [PERFORM] What to do with 6 disks?

2005-04-19 Thread Jim C. Nasby
http://stats.distributed.net is setup with the OS, WAL, and temp on a RAID1 and the database on a RAID10. The drives are 200G SATA with a 3ware raid card. I don't think the controller has battery-backed cache, but I'm not sure. In any case, it's almost never disk-bound on the mirror; when it's

[PERFORM] Slow copy with little CPU/disk usage

2005-04-19 Thread Jim C. Nasby
A friend of mine has an application where he's copying in 4000 rows at a time into a table that has about 4M rows. Each row is 40-50 bytes. This is taking 25 seconds on a dual PIII-1GHz with 1G of RAM and a 2 disk SATA mirror, running FBSD 4.10-stable. There's one index on the table. What's

Re: [PERFORM] Sort and index

2005-04-19 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes: Actually, the planner (at least in 7.4) isn't smart enough to consider if the sort would fit in memory or not. Really? Have you read cost_sort()? It's certainly possible that the calculation is all wet, but to claim that the issue is not considered is

Re: [PERFORM] What to do with 6 disks?

2005-04-19 Thread Josh Berkus
Jeff, Let's say you had a server with 6 separate 15k RPM SCSI disks, what raid option would you use for a standalone postgres server? a) 3xRAID1 - 1 for data, 1 for xlog, 1 for os? b) 1xRAID1 for OS/xlog, 1xRAID5 for data c) 1xRAID10 for OS/xlong/data d) 1xRAID1 for OS, 1xRAID10 for data

Re: [PERFORM] What to do with 6 disks?

2005-04-19 Thread William Yu
My experience: 1xRAID10 for postgres 1xRAID1 for OS + WAL Jeff Frost wrote: Now that we've hashed out which drives are quicker and more money equals faster... Let's say you had a server with 6 separate 15k RPM SCSI disks, what raid option would you use for a standalone postgres server? a)

Re: [PERFORM] Slow copy with little CPU/disk usage

2005-04-19 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes: A friend of mine has an application where he's copying in 4000 rows at a time into a table that has about 4M rows. Each row is 40-50 bytes. This is taking 25 seconds on a dual PIII-1GHz with 1G of RAM and a 2 disk SATA mirror, running FBSD 4.10-stable.

Re: [PERFORM] Sort and index

2005-04-19 Thread Jim C. Nasby
On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Actually, the planner (at least in 7.4) isn't smart enough to consider if the sort would fit in memory or not. Really? Have you read cost_sort()? It's certainly possible that the

Re: [PERFORM] Slow copy with little CPU/disk usage

2005-04-19 Thread Greg Stark
Jim C. Nasby [EMAIL PROTECTED] writes: What's really odd is that neither the CPU or the disk are being hammered. The box appears to be pretty idle; the postgresql proces is using 4-5% CPU. Is he committing every row? In that case you would see fairly low i/o bandwidth usage because most of

Re: [PERFORM] Slow copy with little CPU/disk usage

2005-04-19 Thread Mischa Sandberg
Quoting Tom Lane [EMAIL PROTECTED]: Jim C. Nasby [EMAIL PROTECTED] writes: A friend of mine has an application where he's copying in 4000 rows at a time into a table that has about 4M rows. Each row is 40-50 bytes. This is taking 25 seconds on a dual PIII-1GHz with 1G of RAM and a 2

Re: [PERFORM] How to tell what your postgresql server is doing

2005-04-19 Thread Christopher Kings-Lynne
Is there a way to look at the stats tables and tell what is jamming up your postgres server the most? Other than seeing long running queries and watch top, atop, iostat, vmstat in separate xterms...I'm wondering if postgres keeps some stats on what it spends the most time doing or if there's