[PERFORM] pg_dump slow

2005-12-01 Thread Franklin Haut
I made a fresh install on the machine last night, with the same results, on the
local console (localhost).

Windows 2000 Server 
Version 5.00.2195

PG Version 8.1


Franklin



Franklin: are you making pg_dump from a local or remote box, and is this a 
clean install?  Try a fresh, patched win2k install and see what happens.
He claimed this was local, not network.  It is certainly an 
intriguing possibility that W2K and WinXP handle bytea 
differently.  I'm not competent to comment on that, however.




Re: [PERFORM] About the relation between fragmentation of file and VACUUM

2005-12-01 Thread Michael Stone

On Thu, Dec 01, 2005 at 02:50:56PM +0900, Tatsumi Abe wrote:

Could anyone advise some solutions for this fragmentation problem
without stopping PostgreSQL?


Stop doing VACUUM FULL so often. If your table size is constant anyway
you're just wasting time by compacting the table and shrinking it, and
encouraging fragmentation as each table file grows then shrinks a little
bit each day.
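
To make the suggestion concrete, here is a minimal sketch (the table name is a
placeholder) of what routine maintenance could look like without VACUUM FULL:

-- Plain vacuum marks dead rows as reusable but keeps the file size stable,
-- so the files stop growing and shrinking every day.
VACUUM ANALYZE my_table;        -- 'my_table' is a hypothetical name

-- The heavyweight variant that rewrites and shrinks the files, which the
-- advice above suggests dropping as a routine job:
-- VACUUM FULL my_table;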

Mike Stone



Re: [PERFORM] About the relation between fragmentation of file and VACUUM

2005-12-01 Thread Bill McGonigle

On Dec 1, 2005, at 00:50, Tatsumi Abe wrote:


However, In my current environment I can't stop PostgreSQL and cancel
fragmentation.

Could anyone advise some solutions for this fragmentation problem
without stopping PostgreSQL?


This is somewhat of an aside and intended just as a helpful suggestion 
since I've been in this spot before: if you have this kind of uptime 
requirement the first project to work on is getting the environment to 
the point where you can take out at least one database server at a time 
for maintenance.  You're going to be forced to do this sooner or later - 
whether by disk failure, software error (Pg or OS), user error (restore 
from backup), or security issues (patches you must apply).


So disk fragmentation is a great thing to worry about at some point, 
but IMHO you've got your neck under the guillotine and you're worrying 
about your cuticles.


I've heard the arguments before, usually around budget, and if the 
company can't spend any money but needs blood-from-stone performance 
tweaks, somebody isn't doing the math right (I'm assuming this isn't 
running on a satellite).  Plus, your blood pressure will go down when 
things are more resilient.  I've tried the superhero thing before and 
it's just not worth it.


-Bill
-
Bill McGonigle, Owner   Work: 603.448.4440
BFC Computing, LLC  Home: 603.448.1668
[EMAIL PROTECTED]   Mobile: 603.252.2606
http://www.bfccomputing.com/   Pager: 603.442.1833
Jabber: [EMAIL PROTECTED]  Text: [EMAIL PROTECTED]
Blog: http://blog.bfccomputing.com/




[PERFORM] slow insert into very large table

2005-12-01 Thread Wolfgang Gehner

Hi there,

I need a simple but large table with several million records. I do batch 
inserts with JDBC. After the first million or so records, the inserts 
degrade and become VERY slow (like 8 minutes vs. initially 20 
seconds).


The table has no indices except PK while I do the inserts.

This is with PostgreSQL 8.0 final for WindowsXP on a Pentium 1.86 GHz, 
1GB Memory. HD is fast IDE.


I already have shared_buffers set to 25000.

I wonder what else I can do. Any ideas?

Kindest regards,

Wolfgang Gehner

--
Infonoia SA
7 rue de Berne
1211 Geneva 1
Tel: +41 22 9000 009
Fax: +41 22 9000 018
http://www.infonoia.com





[PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi,

we are currently running a postgres server (upgraded to 8.1) which has 
one large database with approx. 15,000 tables. Unfortunately performance 
suffers from that, because the internal tables (especially the one which 
holds the attribute info) get too large.


(We NEED that many tables, please don't recommend reducing them.)

Logically these tables could be grouped into 500 databases. My question is:

Would performance be better if I had 500 databases (on one postgres 
server instance) which each contain 30 tables, or is it better to have 
one large database with 15,000 tables? In the old days of postgres 6.5 
we tried that, but performance was horrible with many databases ...


BTW: I searched the mailing list but found nothing on the subject - and 
there also isn't any information in the documentation about the effects 
of the number of databases, tables or attributes on performance.


Now, what do you say? Thanks in advance for any comment!

Mike



Re: [PERFORM] slow insert into very large table

2005-12-01 Thread Andreas Pflug

Wolfgang Gehner wrote:

Hi there,

I need a simple but large table with several million records. I do batch 
inserts with JDBC. After the first million or so records,
the inserts degrade to become VERY slow (like 8 minutes vs initially 20 
secondes).


The table has no indices except PK while I do the inserts.

This is with PostgreSQL 8.0 final for WindowsXP on a Pentium 1.86 GHz, 
1GB Memory. HD is fast IDE.


I already have shared buffers already set to 25000.

I wonder what else I can do. Any ideas?


Run VACUUM ANALYZE to have the statistics reflect the growth of the table. 
The planner probably still assumes your table is small, and thus 
picks the wrong plans for checking the PK index and the like.
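
A minimal sketch of that suggestion, with a hypothetical table name; run it
once the table has grown substantially so the planner sees the new row count:

-- Refresh planner statistics (and reclaim any dead rows) after bulk growth.
VACUUM ANALYZE big_table;       -- 'big_table' stands in for the real table

-- If there are no dead rows and only the statistics are stale, ANALYZE
-- alone is cheaper:
ANALYZE big_table;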


Regards,
Andreas



Re: [PERFORM] 15,000 tables

2005-12-01 Thread David Lang

On Thu, 1 Dec 2005, Michael Riess wrote:


Hi,

we are currently running a postgres server (upgraded to 8.1) which has one 
large database with approx. 15,000 tables. Unfortunately performance suffers 
from that, because the internal tables (especially that which holds the 
attribute info) get too large.


is it because the internal tables get large, or is it a problem with disk 
I/O?


With 15,000 tables you are talking about a LOT of files to hold these 
(30,000 files with one index each and each table being small enough to 
not need more than one file to hold it). On linux, ext2/3 with this many files 
in one directory will slow you down horribly. Try different filesystems 
(from my testing and from other posts it looks like XFS is a leading 
contender), and also play around with the tablespaces feature in 8.1 to 
move things out of the main data directory into multiple directories. If 
you do an ls -l on the parent directory you will see that the size of the 
directory stays large if it's ever had lots of files in it; the only way to 
shrink it is to mv the old directory to a new name, create a new directory, 
and move the files from the old directory to the new one.
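
As a rough sketch of the tablespace idea (the path and names here are
invented, and the directory must already exist and be owned by the postgres
user):

-- Create a tablespace on a separate directory or disk (8.1 syntax).
CREATE TABLESPACE ts_spread LOCATION '/mnt/disk2/pg_ts_spread';

-- Move an existing table and its index out of the main data directory.
ALTER TABLE some_table SET TABLESPACE ts_spread;
ALTER INDEX some_table_pkey SET TABLESPACE ts_spread;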


David Lang




[PERFORM] filesystem performance with lots of files

2005-12-01 Thread David Lang
This subject has come up a couple of times just today (and it looks like one 
that keeps popping up).

Under linux, ext2/3 has two known weaknesses (or rather one weakness with 
two manifestations): searching through large objects on disk is slow. This 
applies both to directories (creating, opening, and deleting files if there 
are, or have been, lots of files in a directory) and to files (seeking to 
the right place in a file).

The rule of thumb that I have used for years is that if files get over a 
few tens of megs or directories get over a couple of thousand entries you 
will start slowing down.


Common places you can see this (outside of postgres):

1. Directories: mail or news storage.
  If you let your /var/spool/mqueue directory get large (for example on a 
server that can't send mail for a while, or where mail gets misconfigured), 
there may only be a few files in there after it gets fixed, but if the 
directory was once large, just doing an ls on the directory will be slow.

  News servers that store each message as a separate file suffer from this 
as well; they work around it by using multiple layers of nested 
directories so that no directory has too many files in it (navigating the 
layers of directories costs as well, it's all about the tradeoffs). Mail 
servers that use maildir (and Cyrus, which uses a similar scheme) have the 
same problem.

  To fix this you have to create a new directory and move the files to 
that directory (and then rename the new to the old).

  ext3 has an option to make searching directories faster (htree), but 
enabling it kills performance when you create files. And this doesn't help 
with large files.

2. Files: mbox-formatted mail files and log files.
  As these files get large, the process of appending to them takes more 
time. syslog makes this very easy to test. On a box that does synchronous 
syslog writes (the default for most systems using standard syslog; on linux 
make sure there is not a - in front of the logfile name), time how long it 
takes to write a bunch of syslog messages, then make the log file large 
and time it again.


A few weeks ago I did a series of tests to compare different filesystems. 
The test was for a different purpose, so the particulars are not what I 
would do for testing aimed at postgres, but I think the data is relevant, 
and I saw major differences between the filesystems. I'll see about 
re-running the tests to get a complete set of benchmarks in the next few 
days. My tests had their times vary from 4 min to 80 min depending on the 
filesystem in use (ext3 with hash_dir posted the worst case). What testing 
have other people done with different filesystems?


David Lang



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi David,

incidentally: the directory which holds our database currently contains 
73883 files ... do I get a prize or something? ;-)


Regards,

Mike



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi David,



with 15,000 tables you are talking about a LOT of files to hold these 
(30,000 files with one index each and each database being small enough 
to not need more then one file to hold it), on linux ext2/3 this many 
files in one directory will slow you down horribly. 


We use ReiserFS, and I don't think that this is causing the problem ... 
although it would probably help to split the directory up using tablespaces.


But thanks for the suggestion!



Re: [PERFORM] About the relation between fragmentation of file and VACUUM

2005-12-01 Thread David Lang

On Thu, 1 Dec 2005, Richard Huxton wrote:


Tatsumi Abe wrote:

Question is about the relation between fragmentation of file and VACUUM
performance.

Environment
OS:RedHat Enterprise Linux AS Release 3(Taroon Update 6)
Kernel 2.4.21-37.ELsmp on an i686
Filesystem Type ext3
Filesystem features:  has_journal filetype needs_recovery sparse_super 
large_file


Try different filesystems; ext2/3 do a very poor job when you have lots of 
files in a directory (and 7000+ files is a lot). You can also try mounting 
the filesystem with noatime, nodiratime to reduce the seeks when reading, 
and try mounting it with oldalloc (which changes how the files are 
arranged on disk when writing and extending them); I've seen drastic 
speed differences between ext2 and ext3 based on this option (ext2 
defaults to oldalloc, ext3 defaults to orlov, which is faster in many 
cases).



CPU:Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Memory:2.0GB
HDD:80GB(S-ATA)
 SATA max UDMA/133
PostgreSQL:7.3.8

DB Environment
1. Approx. there are 3500 tables in the DB



When insert performance was measured in the above-mentioned environment,
it took six minutes to write 1 lines four or five days after the
measurement had begun. While searching for the bottleneck with iostat, it
became clear that disk I/O was the problem, as %iowait was almost 100%
at that time.

On the very first day the processing time of VACUUM is not a problem, but
as the days progress its processing time increases. Then I examined the
fragmentation of the database area (pgsql/data/base) using the following tools.

Disk Allocation Viewer
http://sourceforge.net/projects/davtools/

Fragmentation rate is 28% before defrag.


I'd guess the root of your problem is the number of tables (3500), which
if each has one index represents at least 7000 files. That means a lot
of your I/O time will probably be spent moving the disk heads between
the different files.


Depending on the size of the tables it can actually be a lot worse than 
this (remember Postgres splits the tables into fixed-size chunks).


When postgres adds data it will eventually spill over into additional 
files. When you do a vacuum, does it rewrite the tables into a smaller 
number of files, or just rewrite the individual files (making each of them 
smaller, but keeping the same number of files)?


Speaking of this, the selection of the size of these chunks is a 
compromise between the time needed to seek in an individual file and the 
number of files that are created. Is there an easy way to tinker with this? 
(I am sure the default is not correct for all filesystems; the way 
filesystems handle large and/or many files differs drastically.)



You say you can't stop the server, so there's no point in thinking about
a quick hardware upgrade to help you. Also a version-upgrade is not
do-able for you.


there's a difference between stopping the server once for an upgrade 
(hardware or software) and having to stop it every few days to defrag 
things forever after.


David Lang


I can only think of two other options:
1. Change the database schema to reduce the number of tables involved.
I'm assuming that of the 3500 tables most hold the same data but for
different clients (or something similar). This might not be practical
either.

2. Re-order how you access the database. ANALYSE the updated tables
regularly, but only VACUUM them after deletions. Group your inserts so
that all the inserts for table1 go together, then all the inserts for
table2 go together and so on. This should help with the fragmentation by
making sure the files get extended in larger chunks.

Are you sure it's not possible to spend 15 mins offline to solve this?
--
 Richard Huxton
 Archonet Ltd



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Jaime Casanova
On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:
 Hi,

 we are currently running a postgres server (upgraded to 8.1) which has
 one large database with approx. 15,000 tables. Unfortunately performance
 suffers from that, because the internal tables (especially that which
 holds the attribute info) get too large.

 (We NEED that many tables, please don't recommend to reduce them)


Have you ANALYZEd your database? VACUUMing?

BTW, are you using some kind of weird ERP? I have one that treats
Informix as a fool and doesn't let me get all of Informix's potential...
maybe the same is true in your case...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)



[PERFORM] Queries taking ages in PG 8.1, have been much faster in PG <= 8.0

2005-12-01 Thread Markus Wollny
Hi!

I've got an urgent problem with an application which is evaluating a
monthly survey; it's running quite a lot of queries like this:

select  SOURCE.NAME as TYPE,
   count(PARTICIPANT.SESSION_ID) as TOTAL
from  (
   select  PARTICIPANT.SESSION_ID
   from   survey.PARTICIPANT,
  survey.ANSWER
   where  PARTICIPANT.STATUS = 1
   and   date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month',
now()-'1 month'::interval)
   and   PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
   and   ANSWER.QUESTION_ID = 6
   and   ANSWER.VALUE = 1
   )
   as PARTICIPANT,
   survey.ANSWER,
   survey.HANDY_JAVA SOURCE
where  PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
and   ANSWER.QUESTION_ID = 16
and   ANSWER.VALUE = SOURCE.ID
group by SOURCE.NAME,
   SOURCE.POSITION
order by  SOURCE.POSITION asc;

My current PostgreSQL version is PostgreSQL 8.1.0 on i686-pc-linux-gnu,
compiled by GCC gcc (GCC) 3.2. Up to 8.0, a query like this took a
couple of seconds, maybe even up to a minute. In 8.1 a query like this
will run from 30 minutes up to two hours to complete, depending on its
complexity. I've got autovacuum enabled and run a nightly vacuum analyze
over all of my databases. Here's some information about the relevant
tables: table answer has ~8.9M rows (estimated 8,872,130, counted
8,876,648), participant has ~178K rows (estimated 178,165, counted
178,248), and HANDY_JAVA has three rows. This is the
explain analyze output for the above:

Sort  (cost=11383.09..11383.10 rows=3 width=16) (actual time=1952676.858..1952676.863 rows=3 loops=1)
  Sort Key: source.position
  ->  HashAggregate  (cost=11383.03..11383.07 rows=3 width=16) (actual time=1952676.626..1952676.635 rows=3 loops=1)
        ->  Nested Loop  (cost=189.32..11383.00 rows=5 width=16) (actual time=6975.812..1952371.782 rows=9806 loops=1)
              ->  Nested Loop  (cost=3.48..3517.47 rows=42 width=20) (actual time=6819.716..15419.930 rows=9806 loops=1)
                    ->  Nested Loop  (cost=3.48..1042.38 rows=738 width=16) (actual time=258.434..6233.039 rows=162723 loops=1)
                          ->  Seq Scan on handy_java source  (cost=0.00..1.03 rows=3 width=14) (actual time=0.093..0.118 rows=3 loops=1)
                          ->  Bitmap Heap Scan on answer  (cost=3.48..344.04 rows=246 width=8) (actual time=172.381..1820.499 rows=54241 loops=3)
                                Recheck Cond: ((answer.question_id = 16) AND (answer.value = outer.id))
                                ->  Bitmap Index Scan on idx02_performance  (cost=0.00..3.48 rows=246 width=0) (actual time=98.321..98.321 rows=54245 loops=3)
                                      Index Cond: ((answer.question_id = 16) AND (answer.value = outer.id))
                    ->  Index Scan using idx01_perf_0006 on participant  (cost=0.00..3.34 rows=1 width=4) (actual time=0.049..0.050 rows=0 loops=162723)
                          Index Cond: (participant.session_id = outer.session_id)
                          Filter: ((status = 1) AND (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '1 mon'::interval))))
              ->  Bitmap Heap Scan on answer  (cost=185.85..187.26 rows=1 width=4) (actual time=197.490..197.494 rows=1 loops=9806)
                    Recheck Cond: ((outer.session_id = answer.session_id) AND (answer.question_id = 6) AND (answer.value = 1))
                    ->  BitmapAnd  (cost=185.85..185.85 rows=1 width=0) (actual time=197.421..197.421 rows=0 loops=9806)
                          ->  Bitmap Index Scan on idx_answer_session_id  (cost=0.00..2.83 rows=236 width=0) (actual time=0.109..0.109 rows=49 loops=9806)
                                Index Cond: (outer.session_id = answer.session_id)
                          ->  Bitmap Index Scan on idx02_performance  (cost=0.00..182.77 rows=20629 width=0) (actual time=195.742..195.742 rows=165697 loops=9806)
                                Index Cond: ((question_id = 6) AND (value = 1))
Total runtime: 1952678.393 ms

I am really sorry, but currently I haven't got any 8.0-installation
left, so I cannot provide the explain (analyze) output for 8.0. 

I fiddled a little with the statement and managed to speed things up
quite a lot:
 
select  SOURCE.NAME as TYPE,
   count(ANSWER.SESSION_ID) as TOTAL
from survey.ANSWER,
   survey.HANDY_JAVA SOURCE
where  ANSWER.QUESTION_ID = 16
and   ANSWER.VALUE = SOURCE.ID
and ANSWER.SESSION_ID in (
   select  PARTICIPANT.SESSION_ID
   from   survey.PARTICIPANT,
  survey.ANSWER
   where  PARTICIPANT.STATUS = 1
   and   date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month',
now()-'1 month'::interval)
   and   PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
   and   ANSWER.QUESTION_ID = 6
   and   ANSWER.VALUE = 1
   )
group by SOURCE.NAME,
   SOURCE.POSITION
order by  SOURCE.POSITION asc;

Here's the explain analyze output:
Sort  (cost=27835.39..27835.39 rows=3 width=16) (actual time=9609.207..9609.212 rows=3 loops=1)
  Sort Key: source.position
  ->  HashAggregate  

Re: [PERFORM] slow insert into very large table

2005-12-01 Thread Tom Lane
Wolfgang Gehner [EMAIL PROTECTED] writes:
 This is with PostgreSQL 8.0 final for WindowsXP on a Pentium 1.86 GHz, 
 1GB Memory. HD is fast IDE.

Try something more recent, like 8.0.3 or 8.0.4.  IIRC we had some
performance issues in 8.0.0 with tables that grew from zero to large
size during a single session.

regards, tom lane



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Tom Lane
Michael Riess [EMAIL PROTECTED] writes:
 (We NEED that many tables, please don't recommend to reduce them)

No, you don't.  Add an additional key column to fold together different
tables of the same structure.  This will be much more efficient than
managing that key at the filesystem level, which is what you're
effectively doing now.

(If you really have 15000 distinct rowtypes, I'd like to know what
your database design is...)
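
A hedged sketch of the folding Tom describes, with invented names; instead of
one table per application, a single table carries an application key:

-- One shared table instead of hundreds of per-application copies
-- ('app_id' and the other identifiers are illustrative only).
CREATE TABLE products (
    app_id      integer NOT NULL,
    product_id  integer NOT NULL,
    name        text,
    price       numeric,
    PRIMARY KEY (app_id, product_id)
);

-- Each application only ever touches its own slice:
SELECT name, price FROM products WHERE app_id = 42;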

regards, tom lane



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Jaime Casanova
On 12/1/05, Tom Lane [EMAIL PROTECTED] wrote:
 Michael Riess [EMAIL PROTECTED] writes:
  (We NEED that many tables, please don't recommend to reduce them)

 No, you don't.  Add an additional key column to fold together different
 tables of the same structure.  This will be much more efficient than
 managing that key at the filesystem level, which is what you're
 effectively doing now.

 (If you really have 15000 distinct rowtypes, I'd like to know what
 your database design is...)

regards, tom lane


Maybe he is using some kind of weird ERP... take the case of BaaN
(sadly I use it in my work): BaaN creates about 1200 tables per
company and I have no control over it... we have about 12000 tables
right now...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)



Re: [PERFORM] pg_dump slow

2005-12-01 Thread Merlin Moncure
 Franlin: are you making pg_dump from local or remote box and is this
a
 clean install?  Try fresh patched win2k install and see what happens.
 He claimed this was local, not network.  It is certainly an
 intriguing possibility that W2K and WinXP handle bytea
 differently.  I'm not competent to comment on that however.

Can you make a small extraction of this file (~100 rows), zip it, and
send it to me off-list?  I'll test it against a 2000 and an XP server and
try to reproduce your results.

Merlin



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi,



On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:

Hi,

we are currently running a postgres server (upgraded to 8.1) which has
one large database with approx. 15,000 tables. Unfortunately performance
suffers from that, because the internal tables (especially that which
holds the attribute info) get too large.

(We NEED that many tables, please don't recommend to reduce them)



Have you ANALYZEd your database? VACUUMing?


Of course ... before 8.1 we routinely did a vacuum full analyze each 
night. As of 8.1 we use autovacuum.




BTW, are you using some kind of weird ERP? I have one that treat
informix as a fool and don't let me get all of informix potential...
maybe the same is in your case...


No. Our database contains tables for web content management systems. The 
server hosts approx. 500 CMS applications, and each of them has approx. 
30 tables.


That's why I'm asking whether it would be better to have 500 databases with 
30 tables each. In previous Postgres versions this led to even worse 
performance ...


Mike



Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG <= 8.0

2005-12-01 Thread Tom Lane
Markus Wollny [EMAIL PROTECTED] writes:
 My current problem is that rewriting hundreds of queries, some of them
 quite a bit more complex than this one, but all of them using the same
 general scheme, would take quite a lot of time - and I'm expected to
 hand over the survey results ASAP. So I will obviously have to do a
 rewrite if there's just no other way, but I wondered if there might be
 some other option that would allow me to point the planner in the right
 direction so it would behave the same as in the previous versions,
 namely 8.0?

It looks like set enable_nestloop = 0 might be a workable hack for
the immediate need.  Once you're not under deadline, I'd like to
investigate more closely to find out why 8.1 does worse than 8.0 here.
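
For reference, one way to apply that hack only where it is needed, rather than
server-wide, would be per session:

-- Disable nested-loop joins for this session, run the reports,
-- then go back to the default.
SET enable_nestloop = off;
-- ... run the survey queries here ...
RESET enable_nestloop;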

regards, tom lane



Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG <= 8.0

2005-12-01 Thread Markus Wollny

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, December 1, 2005 17:26
 To: Markus Wollny
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Queries taking ages in PG 8.1, have 
 been much faster in PG <= 8.0 
 
 It looks like set enable_nestloop = 0 might be a workable 
 hack for the immediate need.  

Wow - that works miracles :)

Sort  (cost=81813.13..81813.14 rows=3 width=16) (actual time=7526.745..7526.751 rows=3 loops=1)
  Sort Key: source.position
  ->  HashAggregate  (cost=81813.07..81813.11 rows=3 width=16) (actual time=7526.590..7526.601 rows=3 loops=1)
        ->  Merge Join  (cost=81811.40..81813.03 rows=5 width=16) (actual time=7423.289..7479.175 rows=9806 loops=1)
              Merge Cond: (outer.id = inner.value)
              ->  Sort  (cost=1.05..1.06 rows=3 width=14) (actual time=0.085..0.091 rows=3 loops=1)
                    Sort Key: source.id
                    ->  Seq Scan on handy_java source  (cost=0.00..1.03 rows=3 width=14) (actual time=0.039..0.049 rows=3 loops=1)
              ->  Sort  (cost=81810.35..81811.81 rows=583 width=8) (actual time=7423.179..7440.062 rows=9806 loops=1)
                    Sort Key: mafo.answer.value
                    ->  Hash Join  (cost=27164.31..81783.57 rows=583 width=8) (actual time=6757.521..7360.822 rows=9806 loops=1)
                          Hash Cond: (outer.session_id = inner.session_id)
                          ->  Bitmap Heap Scan on answer  (cost=506.17..54677.92 rows=88334 width=8) (actual time=379.245..2660.344 rows=162809 loops=1)
                                Recheck Cond: (question_id = 16)
                                ->  Bitmap Index Scan on idx_answer_question_id  (cost=0.00..506.17 rows=88334 width=0) (actual time=274.632..274.632 rows=162814 loops=1)
                                      Index Cond: (question_id = 16)
                          ->  Hash  (cost=26655.21..26655.21 rows=1175 width=8) (actual time=3831.362..3831.362 rows=9806 loops=1)
                                ->  Hash Join  (cost=4829.33..26655.21 rows=1175 width=8) (actual time=542.227..3800.985 rows=9806 loops=1)
                                      Hash Cond: (outer.session_id = inner.session_id)
                                      ->  Bitmap Heap Scan on answer  (cost=182.84..21429.34 rows=20641 width=4) (actual time=292.067..2750.376 rows=165762 loops=1)
                                            Recheck Cond: ((question_id = 6) AND (value = 1))
                                            ->  Bitmap Index Scan on idx02_performance  (cost=0.00..182.84 rows=20641 width=0) (actual time=167.306..167.306 rows=165769 loops=1)
                                                  Index Cond: ((question_id = 6) AND (value = 1))
                                      ->  Hash  (cost=4621.13..4621.13 rows=10141 width=4) (actual time=182.842..182.842 rows=11134 loops=1)
                                            ->  Index Scan using idx01_perf_0005 on participant  (cost=0.01..4621.13 rows=10141 width=4) (actual time=0.632..136.126 rows=11134 loops=1)
                                                  Index Cond: (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '1 mon'::interval)))
                                                  Filter: (status = 1)
Total runtime: 7535.398 ms

 Once you're not under deadline, 
 I'd like to investigate more closely to find out why 8.1 does 
 worse than 8.0 here.

Please tell me what I can do to help in clearing up this issue, I'd be very 
happy to help! Heck, I am happy anyway that there's such a quick fix, even if 
it's not a beautiful one :)

Kind regards

   Markus



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Guido Neitzer

On 01.12.2005, at 17:04 Uhr, Michael Riess wrote:

No. Our database contains tables for we content management systems.  
The server hosts approx. 500 cms applications, and each of them has  
approx. 30 tables.


Just out of curiosity: are those roughly 30 tables similar in schema,  
or do they differ much?


We have a small CMS system running here, where I have all information  
for all clients in tables with relationships to a client table.


But I assume you are running a pre-built CMS which is not designed  
for multi-client use, right?


cug


--
PharmaLine, Essen, GERMANY
Software and Database Development






Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG <= 8.0

2005-12-01 Thread Tom Lane
Markus Wollny [EMAIL PROTECTED] writes:
 Once you're not under deadline, 
 I'd like to investigate more closely to find out why 8.1 does 
 worse than 8.0 here.

 Please tell me what I can do to help in clearing up this issue, I'd be
 very happy to help!

The first thing to do is get 8.0's EXPLAIN ANALYZE for the same query.
After we see how that differs from 8.1, we'll know what the next
question should be ...

regards, tom lane



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Hi Tom,


Michael Riess [EMAIL PROTECTED] writes:

(We NEED that many tables, please don't recommend to reduce them)


No, you don't.  Add an additional key column to fold together different
tables of the same structure.  This will be much more efficient than
managing that key at the filesystem level, which is what you're
effectively doing now.


Been there, done that. (see below)



(If you really have 15000 distinct rowtypes, I'd like to know what
your database design is...)


Sorry, I should have included that info in the initial post. You're 
right in that most of these tables have a similar structure. But they 
are independent and can be customized by the users.


Think of it this way: on the server there are 500 applications, and each 
has 30 tables. One of these might be a table which contains the products 
of a webshop, another contains news items which are displayed on the 
website, etc.


The problem is that the customers can freely change the tables ... add 
columns, remove columns, change column types etc. So I cannot use 
system-wide tables with a key column.



Mike



Re: [PERFORM] 15,000 tables

2005-12-01 Thread me

hi michael


Have you ANALYZEd your database? VACUUMing?


Of course ... before 8.1 we routinely did a vacuum full analyze each 
night. As of 8.1 we use autovacuum.



What I noticed is that autovacuum was not working as it should. I had 8.1 
running with autovacuum for just 2 days or so and got warnings in pgadmin 
that my tables needed a vacuum. I posted this behaviour some weeks 
ago to the novice list requesting more info on how to tweak autovacuum 
properly - unfortunately without any responses. That's when I switched the 
nightly analyze job back on - everything has run smoothly since then.


maybe it helps in your case as well?

cheers,
thomas







[PERFORM] pg_stat* values ...

2005-12-01 Thread Marc G. Fournier


Not having found anything so far, does anyone know of, and can point me 
to, tools or articles that talk about tuning based on the information in 
the pg_stat* views?



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664



Re: [PERFORM] filesystem performance with lots of files

2005-12-01 Thread Qingqing Zhou

David Lang [EMAIL PROTECTED] wrote

 a few weeks ago I did a series of tests to compare different filesystems. 
 the test was for a different purpose so the particulars are not what I 
 woud do for testing aimed at postgres, but I think the data is relavent) 
 and I saw major differences between different filesystems, I'll see aobut 
 re-running the tests to get a complete set of benchmarks in the next few 
 days. My tests had their times vary from 4 min to 80 min depending on the 
 filesystem in use (ext3 with hash_dir posted the worst case). what testing 
 have other people done with different filesystems?


That's good ... what benchmarks did you use?

Regards,
Qingqing 





Re: [PERFORM] 15,000 tables

2005-12-01 Thread Chris Browne
Michael Riess [EMAIL PROTECTED] writes:
 On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:
 we are currently running a postgres server (upgraded to 8.1) which
 has one large database with approx. 15,000 tables. Unfortunately
 performance suffers from that, because the internal tables
 (especially that which holds the attribute info) get too large.

 (We NEED that many tables, please don't recommend to reduce them)

 Have you ANALYZEd your database? VACUUMing?

 Of course ... before 8.1 we routinely did a vacuum full analyze each
 night. As of 8.1 we use autovacuum.

VACUUM FULL was probably always overkill, unless "always" includes
versions prior to 7.3...

 BTW, are you using some kind of weird ERP? I have one that treat
 informix as a fool and don't let me get all of informix potential...
 maybe the same is in your case...

 No. Our database contains tables for we content management
 systems. The server hosts approx. 500 cms applications, and each of
 them has approx. 30 tables.

 That's why I'm asking if it was better to have 500 databases with 30
 tables each. In previous Postgres versions this led to even worse
 performance ...

This has the feeling of fitting with Alan Perlis' dictum below...

Supposing you have 500 databases, each with 30 tables, each with 4
indices, then you'll find you have, on disk...

# of files = 500 x 30 x 5 = 75000 files

If each is regularly being accessed, that's bits of 75000 files
getting shoved through OS and shared memory caches.  Oh, yes, and
you'll also have regular participation of some of the pg_catalog
files, with ~500 instances of THOSE, multiplied some number of ways...

An application with 15000 frequently accessed tables doesn't strike me
as being something that can possibly turn out well.  You have, in
effect, more tables than (arguably) bloated ERP systems like SAP R/3;
it only has a few thousand tables, and since many are module-specific,
and nobody ever implements *all* the modules, it is likely only a few
hundred that are hot spots.  No 15000 there...
-- 
(format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com)
http://www3.sympatico.ca/cbbrowne/languages.html
It is better to have 100  functions operate on one data structure than
10 functions on 10 data structures.  -- Alan J. Perlis



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Gavin M. Roy

Hi Michael,

I'm a fan of ReiserFS, and I could be wrong, but I believe using a  
journaling filesystem for the PgSQL database could be slowing things  
down.


Gavin

On Dec 1, 2005, at 6:51 AM, Michael Riess wrote:


Hi David,

with 15,000 tables you are talking about a LOT of files to hold  
these (30,000 files with one index each and each database being  
small enough to not need more then one file to hold it), on linux  
ext2/3 this many files in one directory will slow you down horribly.


We use ReiserFS, and I don't think that this is causing the  
problem ... although it would probably help to split the directory  
up using tablespaces.


But thanks for the suggestion!



Gavin M. Roy
800 Pound Gorilla
[EMAIL PROTECTED]





[PERFORM] COPY into table too slow with index

2005-12-01 Thread Rick Schumeyer








I'm running postgresql 8.1.0 with postgis 1.0.4 on a FC3 system, 3GHz, 1 GB memory.

I am using COPY to fill a table that contains one postgis geometry column.

With no geometry index, it takes about 45 seconds to COPY one file.

If I add a geometry index, this time degrades.  It keeps getting worse as more
records are added to the table.  It was up to over three minutes per file on my
most recent test.

The problem is that each file contains about 5-10 minutes of data.  Eventually,
I want to add the data to the table in real time, so the COPY needs to take
less time than actually generating the data.

Here is the relevant section of my postgresql.conf:

# - Memory -

shared_buffers = 5000                   # min 16 or max_connections*2, 8KB each
#temp_buffers = 1000                    # min 100, 8KB each
#max_prepared_transactions = 5          # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 2                            # min 64, size in KB
maintenance_work_mem = 2                # min 1024, size in KB
#max_stack_depth = 2048                 # min 100, size in KB

Any suggestions for improvement?
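
A minimal sketch of the kind of setup being described (the names are invented
and the real table, files and column differ; with postgis the column is
usually added via AddGeometryColumn):

-- Table with one postgis geometry column, as described above.
CREATE TABLE readings (id integer, the_geom geometry);

-- Bulk load; fast while there is no index on the_geom.
COPY readings FROM '/data/batch_0001.dat';

-- Adding a GiST index on the geometry column is what makes each
-- subsequent COPY slower and slower:
CREATE INDEX idx_readings_geom ON readings USING gist (the_geom);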








Re: [PERFORM] 15,000 tables

2005-12-01 Thread Tino Wildenhain
On Thursday, 2005-12-01 at 10:07 -0800, Gavin M. Roy wrote:
 Hi Michael,
 
 I'm a fan of ReiserFS, and I can be wrong, but I believe using a  
 journaling filesystem for the PgSQL database could be slowing things  
 down.

Have a 200G+ database, someone pulling the power plug
or a regular reboot after a year or so.

Wait for the fsck to finish.

Now think again :-)

++Tino




Re: [PERFORM] 15,000 tables

2005-12-01 Thread Ron
Agreed.  Also the odds of fs corruption or data loss are higher in a 
non-journaling fs.  Best practice seems to be to use a journaling fs 
but to put the fs log on dedicated spindles separate from the actual 
fs or pg_xlog.


Ron

At 01:40 PM 12/1/2005, Tino Wildenhain wrote:

On Thursday, 2005-12-01 at 10:07 -0800, Gavin M. Roy wrote:
 Hi Michael,

 I'm a fan of ReiserFS, and I can be wrong, but I believe using a
 journaling filesystem for the PgSQL database could be slowing things
 down.

Have a 200G+ database, someone pulling the power plug
or a regular reboot after a year or so.

Wait for the fsck to finish.

Now think again :-)

++Tino









Re: [PERFORM] 15,000 tables

2005-12-01 Thread Gavin M. Roy

Here's a fairly recent post on reiserfs (and performance):

http://archives.postgresql.org/pgsql-novice/2005-09/msg7.php

I'm still digging on the performance of ext2 vs. journaled filesystems,  
as I know I've seen it before.


Gavin


My point was not in doing an fsck, but rather in
On Dec 1, 2005, at 10:40 AM, Tino Wildenhain wrote:


On Thursday, 2005-12-01 at 10:07 -0800, Gavin M. Roy wrote:

Hi Michael,

I'm a fan of ReiserFS, and I can be wrong, but I believe using a
journaling filesystem for the PgSQL database could be slowing things
down.


Have a 200G+ database, someone pulling the power plug
or a regular reboot after a year or so.

Wait for the fsck to finish.

Now think again :-)

++Tino



Gavin M. Roy
800 Pound Gorilla
[EMAIL PROTECTED]





Re: [PERFORM] 15,000 tables

2005-12-01 Thread Tom Lane
Ron [EMAIL PROTECTED] writes:
 Agreed.  Also the odds of fs corruption or data loss are higher in a 
 non journaling fs.  Best practice seems to be to use a journaling fs 
 but to put the fs log on dedicated spindles separate from the actual 
 fs or pg_xlog.

I think we've determined that best practice is to journal metadata only
(not file contents) on PG data filesystems.  PG does expect the filesystem
to remember where the files are, so you need metadata protection, but
journalling file content updates is redundant with PG's own WAL logging.

On a filesystem dedicated to WAL, you probably do not need any
filesystem journalling at all --- we manage the WAL files in a way
that avoids changing metadata for a WAL file that's in active use.
A conservative approach would be to journal metadata here too, though.

regards, tom lane



[PERFORM] Insert performance slows down in large batch

2005-12-01 Thread Jeremy Haile
I am importing roughly 15 million rows in one batch transaction.  I am
currently doing this through batch inserts of around 500 at a time,
although I am looking at ways to do this via multiple (one-per-table)
copy commands for performance reasons.  

I am currently running: PostgreSQL 8.0.4, Redhat Enterprise Linux 4,
ext3, all-on-one partition.  I am aware of methods of improving
performance by changing ext3 mounting options, splitting WAL, data, and
indexes to separate physical disks, etc.  I have also adjusted my
shared_buffers, work_mem, maintenance_work_mem, and checkpoint_segments
and can post their values if anyone thinks it is relevant to my question
(See questions at the bottom)  

What confuses me is that at the beginning of the import, I am inserting
roughly 25,000 rows every 7 seconds... and by the time I get towards the
end of the import, it is taking 145 seconds for the same number of rows.
The inserts are spread across 4 tables and I have dropped all indexes
and constraints on these tables, including foreign keys, unique keys,
and even primary keys (even though I think a primary key doesn't improve
insert performance).  The entire bulk import is done in a single transaction.

The result is a table with 4.8 million rows, two tables with 4.8*2
million rows, and another table with several thousand rows.

So, my questions are:
1) Why does the performance degrade as the table sizes grow?  Shouldn't
the insert performance remain fairly constant if there are no indexes or
constraints?  

2) Is there anything I can do to figure out where the time is being
spent?  Will postgres log any statistics or information to help me
diagnose the problem?  I have pasted a fairly representative sample of
vmstat below my e-mail in case it helps, although I'm not quite sure how
to interpret it in this case.  

3) Any other advice, other than the things I listed above (I am aware of
using copy, ext3 tuning, multiple disks, tuning postgresql.conf
settings)? 
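
Since COPY is already on your list, a minimal sketch of the per-table form
(file and table names are placeholders); one COPY per table avoids the
per-statement overhead of batched INSERTs:

-- Server-side bulk load, one COPY per target table.
COPY table_one FROM '/tmp/table_one.tsv';
COPY table_two FROM '/tmp/table_two.tsv';

-- From a client, psql's \copy does the same thing without requiring
-- server-side file access:
-- \copy table_one FROM 'table_one.tsv'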

Thanks in advance,
Jeremy Haile


#vmstat 2 20
procs ---memory-- ---swap-- -io --system--
cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs us sy
 id wa
 1  0   9368   4416   2536 177878400   124513 2  2 
 0 96  2
 1  0   9368   4416   2536 177878400 0 0 100553 25 
 0 75  0
 1  1   9368   3904   2544 177932000 12164 6 1103   262 24 
 1 59 16
 1  0   9368   3704   2552 177938000 1625624 1140   344 23 
 1 53 23
 1  1   9368   2936   2560 178012000 16832 6 1143   359 23 
 1 52 24
 1  1   9368   3328   2560 177971200 13120 0    285 24 
 1 58 18
 1  0   9368   4544   2560 177855600  5184 0 1046   141 25 
 0 67  8
 1  1   9368   3776   2568 177929600  7296 6 1064   195 24 
 0 67  9
 1  0   9368   4480   2568 177854800  4096 0 1036   133 24 
 0 69  6
 1  0   9368   4480   2576 177860800  7504 0 1070   213 23 
 0 67 10
 1  0   9368   3136   2576 177990000  9536 0 1084   235 23 
 0 66 10
 1  1   9368   3072   2584 177996000 13632 6 1118   313 24 
 1 60 16
 1  0   9368   4480   2592 177859200  857624 1075   204 24 
 0 63 12
 1  0   9368   4480   2592 177859200 0 6 100452 25 
 0 75  0
 1  0   9368   4544   2600 177865200 0 6 100555 25 
 0 75  0
 1  1   9368   3840   2600 177933200 11264 4 1098   260 24 
 0 63 13
 1  1   9368   3072   2592 178015600 1708814 1145   346 24 
 1 51 24
 1  1   9368   4096   2600 177912800 16768 6 1140   360 23 
 1 54 21
 1  1   9368   3840   2600 177933200 16960 0 1142   343 24 
 1 54 22
 1  0   9368   3436   2596 177967600 16960 0 1142   352 24 
 1 53 23



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Gavin M. Roy

Heh, looks like I left a trailing thought...

My post wasn't saying "don't use journaled filesystems", but rather  
that they can be slower than non-journaled filesystems, and I don't  
consider recovery time from a crash to be a factor in determining the  
speed of reads and writes on the data.  That being said, I think  
Tom's reply on what to journal and not to journal should really put  
an end to this side of the conversation.


Gavin

On Dec 1, 2005, at 10:49 AM, Gavin M. Roy wrote:


Here's a fairly recent post on reiserfs (and performance):

http://archives.postgresql.org/pgsql-novice/2005-09/msg7.php

I'm still digging on performance of ext2 vrs journaled filesystems,  
as I know I've seen it before.


Gavin


My point was not in doing an fsck, but rather in
On Dec 1, 2005, at 10:40 AM, Tino Wildenhain wrote:


On Thursday, 2005-12-01 at 10:07 -0800, Gavin M. Roy wrote:

Hi Michael,

I'm a fan of ReiserFS, and I can be wrong, but I believe using a
journaling filesystem for the PgSQL database could be slowing things
down.


Have a 200G+ database, someone pulling the power plug
or a regular reboot after a year or so.

Wait for the fsck to finish.

Now think again :-)

++Tino



Gavin M. Roy
800 Pound Gorilla
[EMAIL PROTECTED]





Gavin M. Roy
800 Pound Gorilla
[EMAIL PROTECTED]





Re: [PERFORM] Insert performance slows down in large batch

2005-12-01 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 1) Why does the performance degrade as the table sizes grow?  Shouldn't
 the insert performance remain fairly constant if there are no indexes or
 constraints?  

Yeah, insert really should be a constant-time operation if there's no
add-on operations like index updates or FK checks.  Can you get more
information about where the time is going with gprof or oprofile?
(I'm not sure if oprofile is available for RHEL4, but it is in Fedora 4
so maybe RHEL4 has it too.)

If you're not comfortable with performance measurement tools, perhaps
you could crank up a test case program that just generates dummy data
and inserts it in the same way as your real application does.  If you
can confirm a slowdown in a test case that other people can look at,
we'd be happy to look into the reason for it.
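
If it helps anyone reproduce this, a self-contained skeleton of such a test
case could look like the following (purely illustrative; it is not the
original application's schema):

-- Dummy table and data purely for timing the insert path.
CREATE TABLE insert_test (id integer, payload text);

-- Repeat this batch (with \timing enabled in psql) and watch whether the
-- elapsed time per batch grows as the table fills up.
INSERT INTO insert_test
SELECT g, 'some filler text'
FROM generate_series(1, 100000) AS g;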

regards, tom lane



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Michael Riess

Michael Riess [EMAIL PROTECTED] writes:

On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:

we are currently running a postgres server (upgraded to 8.1) which
has one large database with approx. 15,000 tables. Unfortunately
performance suffers from that, because the internal tables
(especially that which holds the attribute info) get too large.

(We NEED that many tables, please don't recommend to reduce them)


Have you ANALYZEd your database? VACUUMing?

Of course ... before 8.1 we routinely did a vacuum full analyze each
night. As of 8.1 we use autovacuum.


VACUUM FULL was probably always overkill, unless always includes
versions prior to 7.3...


Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, 
but the database got considerably slower near the end of the week.





BTW, are you using some kind of weird ERP? I have one that treat
informix as a fool and don't let me get all of informix potential...
maybe the same is in your case...

No. Our database contains tables for we content management
systems. The server hosts approx. 500 cms applications, and each of
them has approx. 30 tables.

That's why I'm asking if it was better to have 500 databases with 30
tables each. In previous Postgres versions this led to even worse
performance ...


This has the feeling of fitting with Alan Perlis' dictum below...

Supposing you have 500 databases, each with 30 tables, each with 4
indices, then you'll find you have, on disk...

# of files = 500 x 30 x 5 = 75000 files

If each is regularly being accessed, that's bits of 75000 files
getting shoved through OS and shared memory caches.  Oh, yes, and
you'll also have regular participation of some of the pg_catalog
files, with ~500 instances of THOSE, multiplied some number of ways...



Not all of the tables are frequently accessed. In fact I would estimate 
that only 20% are actually used ... but there is no way to determine if 
or when a table will be used. I thought about a way to swap out tables 
which have not been used for a couple of days ... maybe I'll do just 
that. But it would be cumbersome ... I had hoped that an unused table 
does not hurt performance. But of course the internal tables which 
contain the meta info get too large.



An application with 15000 frequently accessed tables doesn't strike me
as being something that can possibly turn out well.  You have, in
effect, more tables than (arguably) bloated ERP systems like SAP R/3;
it only has a few thousand tables, and since many are module-specific,
and nobody ever implements *all* the modules, it is likely only a few
hundred that are hot spots.  No 15000 there..


I think that my system conforms to the 80/20 rule ...



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Jaime Casanova
On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:
  Michael Riess [EMAIL PROTECTED] writes:
  On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:
  we are currently running a postgres server (upgraded to 8.1) which
  has one large database with approx. 15,000 tables. Unfortunately
  performance suffers from that, because the internal tables
  (especially that which holds the attribute info) get too large.
 
  (We NEED that many tables, please don't recommend to reduce them)
 
  Have you ANALYZEd your database? VACUUMing?
  Of course ... before 8.1 we routinely did a vacuum full analyze each
  night. As of 8.1 we use autovacuum.
 
  VACUUM FULL was probably always overkill, unless always includes
  versions prior to 7.3...

 Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL,
 but the database got considerably slower near the end of the week.

 
  BTW, are you using some kind of weird ERP? I have one that treat
  informix as a fool and don't let me get all of informix potential...
  maybe the same is in your case...
  No. Our database contains tables for we content management
  systems. The server hosts approx. 500 cms applications, and each of
  them has approx. 30 tables.
 
  That's why I'm asking if it was better to have 500 databases with 30
  tables each. In previous Postgres versions this led to even worse
  performance ...
 
  This has the feeling of fitting with Alan Perlis' dictum below...
 
  Supposing you have 500 databases, each with 30 tables, each with 4
  indices, then you'll find you have, on disk...
 
  # of files = 500 x 30 x 5 = 75000 files
 
  If each is regularly being accessed, that's bits of 75000 files
  getting shoved through OS and shared memory caches.  Oh, yes, and
  you'll also have regular participation of some of the pg_catalog
  files, with ~500 instances of THOSE, multiplied some number of ways...
 

 Not all of the tables are frequently accessed. In fact I would estimate
 that only 20% are actually used ... but there is no way to determine if
 or when a table will be used. I thought about a way to swap out tables
 which have not been used for a couple of days ... maybe I'll do just
 that. But it would be cumbersome ... I had hoped that an unused table
 does not hurt performance. But of course the internal tables which
 contain the meta info get too large.

  An application with 15000 frequently accessed tables doesn't strike me
  as being something that can possibly turn out well.  You have, in
  effect, more tables than (arguably) bloated ERP systems like SAP R/3;
  it only has a few thousand tables, and since many are module-specific,
  and nobody ever implements *all* the modules, it is likely only a few
  hundred that are hot spots.  No 15000 there..

 I think that my systems confirms with the 80/20 rule ...
 .


How many disks do you have?  I imagine you can put the tables forming one
logical database in a tablespace and have the tables spread across various
disks...


--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Merlin Moncure
 we are currently running a postgres server (upgraded to 8.1) which has
 one large database with approx. 15,000 tables. Unfortunately
performance
 suffers from that, because the internal tables (especially that which
 holds the attribute info) get too large.
 
 (We NEED that many tables, please don't recommend to reduce them)
 
 Logically these tables could be grouped into 500 databases. My
question
 is:
 
 Would performance be better if I had 500 databases (on one postgres
 server instance) which each contain 30 tables, or is it better to have
 one large database with 15,000 tables? In the old days of postgres 6.5
 we tried that, but performance was horrible with many databases ...
 
 BTW: I searched the mailing list, but found nothing on the subject -
and
 there also isn't any information in the documentation about the
effects
 of the number of databases, tables or attributes on the performance.
 
 Now, what do you say? Thanks in advance for any comment!

I've never run near that many databases on one box so I can't comment on
the performance.  But let's assume for the moment pg runs fine with 500
databases.  The most important advantage of the multi-schema approach is
cross-schema querying.  Given the way you are defining your problem, I think
this is the better way to do things.
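
A brief sketch of the schema-per-application layout being alluded to (names
invented); everything stays in one database, so cross-schema queries remain
possible:

-- One schema per CMS application instead of one database each.
CREATE SCHEMA app_001;
CREATE SCHEMA app_002;

CREATE TABLE app_001.news (id integer PRIMARY KEY, title text);
CREATE TABLE app_002.news (id integer PRIMARY KEY, title text);

-- A cross-schema query, which would be impossible across separate databases:
SELECT 'app_001' AS app, count(*) FROM app_001.news
UNION ALL
SELECT 'app_002', count(*) FROM app_002.news;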

Merlin



Re: [PERFORM] 15,000 tables

2005-12-01 Thread Scott Marlowe
On Thu, 2005-12-01 at 13:34, Michael Riess wrote:
  Michael Riess [EMAIL PROTECTED] writes:
  On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote:
  we are currently running a postgres server (upgraded to 8.1) which
  has one large database with approx. 15,000 tables. Unfortunately
  performance suffers from that, because the internal tables
  (especially that which holds the attribute info) get too large.
 
  (We NEED that many tables, please don't recommend to reduce them)
 
  Have you ANALYZEd your database? VACUUMing?
  Of course ... before 8.1 we routinely did a vacuum full analyze each
  night. As of 8.1 we use autovacuum.
  
  VACUUM FULL was probably always overkill, unless "always" includes
  versions prior to 7.3...
 
 Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, 
 but the database got considerably slower near the end of the week.

Generally, this means either your vacuums are too infrequent, or your
fsm settings are too small.

Note that vacuum and analyze aren't married any more, like in the old
days.  You can issue either separately, depending on your usage
conditions.

Note that with the newest versions of PostgreSQL you can change the
settings for vacuum priority so that while it takes longer to vacuum, it
doesn't stomp on the other processes' toes so much anymore, so more
frequent plain vacuums may be the answer.
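
As a rough illustration (the table name is invented; the cost-based delay
settings exist in 8.0 and later, so check your version's docs for the
defaults):

    -- plain VACUUM and ANALYZE can now be run independently
    VACUUM VERBOSE my_table;
    ANALYZE my_table;

    -- make vacuum gentler on concurrent queries, for this session only
    SET vacuum_cost_delay = 10;   -- sleep 10ms each time the cost limit is hit
    VACUUM my_table;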

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] 15,000 tables

2005-12-01 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote:

 What I noticed is that autovacuum is not working as it should. I had 8.1
 running with autovacuum for just 2 days or so and got warnings in pgadmin
 that my tables would need a vacuum.

Hum, so how is autovacuum's documentation lacking?  Please read it
critically and let us know so we can improve it.

http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM

Maybe what you need is to lower the vacuum base threshold for tables
that are small.
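
In 8.1 that can be done per table through the pg_autovacuum system catalog.
A rough sketch only -- the table name is invented, the column list should
be double-checked against the 8.1 docs, and a negative value means "use the
system default":

    -- vacuum small_table after ~100 dead rows instead of the default threshold
    INSERT INTO pg_autovacuum
        (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
         anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit)
    VALUES
        ('small_table'::regclass, true, 100, 0.2, 50, 0.1, -1, -1);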

-- 
Alvaro Herrera                 http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] COPY into table too slow with index: now an I/O question

2005-12-01 Thread Rick Schumeyer
As a follow up to my own question:

I reran the COPY both ways (with the index and without) while running
iostat.  The following values are averages:

            %user  %nice  %sys  %iowait  %idle
  no index   39      0     2.8     11     47
  index      16     1.5    2.1     34     46

I'm no performance guru, so please indulge a couple of silly questions:

1)  Why is there so much idle time?  I would think the CPU would either
be busy or waiting for IO.

2)  It seems that I need to improve my disk situation.  Would it help to
add another drive to my PC and keep the input data on a separate drive
from my pg tables?  If so, some pointers on the best way to set that up
would be appreciated.

Please let me know if anyone has additional ideas.


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Rick Schumeyer
Sent: Thursday, December 01, 2005 12:58 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] COPY into table too slow with index

I'm running postgresql 8.1.0 with postgis 1.0.4 on a FC3 system, 3Ghz,
1 GB memory.

I am using COPY to fill a table that contains one postgis geometry
column.

With no geometry index, it takes about 45 seconds to COPY one file.

If I add a geometry index, this time degrades.  It keeps getting worse
as more records are added to the table.  It was up to over three
minutes per file on my most recent test.

The problem is that each file contains about 5 - 10 minutes of data.
Eventually, I want to add the data to the table in real time.  So the
COPY needs to take less time than actually generating the data.

Here is the relevant section of my postgresql.conf.

# - Memory -

shared_buffers = 5000           # min 16 or max_connections*2, 8KB each
#temp_buffers = 1000            # min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 2                    # min 64, size in KB
maintenance_work_mem = 2        # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

Any suggestions for improvement?


Re: [PERFORM] COPY into table too slow with index: now an I/O

2005-12-01 Thread Luke Lonergan
Rick,

On 12/1/05 2:18 PM, Rick Schumeyer [EMAIL PROTECTED] wrote:

 As a follow up to my own question:
  
 I reran the COPY both ways (with the index and without) while running iostat.
 The following values
 are averages:
             %user  %nice  %sys  %iowait  %idle
 no index     39      0     2.8     11     47
 index        16     1.5    2.1     34     46
 
 I'm no performance guru, so please indulge a couple of silly questions:
  
 1)  Why is there so much idle time?  I would think the CPU would either be
 busy or waiting for IO.

The 100% represents 2 CPUs.  When one CPU is fully busy you should see 50%
idle time.

 2)  It seems that I need to improve my disk situation.  Would it help to
 add another drive to my PC and
 keep the input data on a separate drive from my pg tables?  If so, some
 pointers on the best way to set that up
 would be appreciated.

Putting the index and the table on separate disks will fix this IMO.  I
think you can do that using the TABLESPACE concept for each.
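
Something along these lines might work (the tablespace name, path and index
name here are made up; ALTER INDEX ... SET TABLESPACE is available from 8.0
on):

    CREATE TABLESPACE idxspace LOCATION '/mnt/disk2/pg_idx';

    -- move the existing geometry index onto the second disk,
    -- leaving the table (and WAL) where they are
    ALTER INDEX my_geom_idx SET TABLESPACE idxspace;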

The problem I see is nicely shown by the increase in IOWAIT between the two
patterns (with and without index).  It seems likely that the pattern is:
A - insert a tuple into the table
B - insert an entry into the index
C - fsync the WAL
- repeat

This can be as bad as having a disk seek to access the table data every time
the 8KB page boundary is crossed, then again for the index, then again for
the WAL, and random disk seeks happen only as fast as about 10ms, so you can
only do those at a rate of 100/s.

 Please let me know if anyone has additional ideas.

This is a fairly common problem, some people drop the index, load the data,
then recreate the index to get around it.
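
In its simplest form that looks something like this (names and file path
are invented; spell out the PostGIS operator class in the CREATE INDEX if
your version needs it):

    DROP INDEX my_geom_idx;
    COPY my_table FROM '/data/batch_0001.dat';
    CREATE INDEX my_geom_idx ON my_table USING gist (geom);

That only pays off when the batch is large relative to what is already in
the table; on a big table the index rebuild itself can take a long time.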

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] COPY into table too slow with index: now an I/O

2005-12-01 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes:
 The problem I see is nicely shown by the increase in IOWAIT between the two
 patterns (with and without index).  It seems likely that the pattern is:
 A - insert a tuple into the table
 B - insert an entry into the index
 C - fsync the WAL
 - repeat

 This can be as bad as having a disk seek to access the table data every time
 the 8KB page boundary is crossed, then again for the index, then again for
 the WAL, and random disk seeks happen only as fast as about 10ms, so you can
 only do those at a rate of 100/s.

That analysis is far too simplistic, because only the WAL write has to
happen before the transaction can commit.  The table and index writes
will normally happen at some later point in the bgwriter, and with any
luck there will only need to be one write per page, not per tuple.

It is true that having WAL and data on the same spindle is bad news,
because the disk head has to divide its time between synchronous WAL
writes and asynchronous writes of the rest of the files.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] COPY into table too slow with index: now an I/O question

2005-12-01 Thread Rick Schumeyer
I only have one CPU.  Is my copy of iostat confused, or does this have
something to do with hyperthreading or dual core?  (AFAIK, I don't have a
dual core!)

The problem (for me) with dropping the index during a copy is that it takes
tens of minutes (or more) to recreate the geometry index once the table has,
say, 50 million rows.

 -Original Message-
 From: Luke Lonergan [mailto:[EMAIL PROTECTED]
 Sent: Thursday, December 01, 2005 9:27 PM
 To: Rick Schumeyer; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] COPY into table too slow with index: now an I/O
 question
 
 Rick,
 
 On 12/1/05 2:18 PM, Rick Schumeyer [EMAIL PROTECTED] wrote:
 
  As a follow up to my own question:
 
  I reran the COPY both ways (with the index and without) while running
 iostat.
  The following values
  are averages:
              %user  %nice  %sys  %iowait  %idle
  no index     39      0     2.8     11     47
  index        16     1.5    2.1     34     46
 
  I'm no performance guru, so please indulge a couple of silly questions:
 
  1)  Why is there so much idle time?  I would think the CPU would
 either be
  busy or waiting for IO.
 
 The 100% represents 2 CPUs.  When one CPU is fully busy you should see 50%
 idle time.
 
  2)  It seems that I need to improve my disk situation.  Would it
 help to
  add another drive to my PC and
  keep the input data on a separate drive from my pg tables?  If so, some
  pointers on the best way to set that up
  would be appreciated.
 
 Putting the index and the table on separate disks will fix this IMO.  I
 think you can do that using the TABLESPACE concept for each.
 
 The problem I see is nicely shown by the increase in IOWAIT between the
 two
 patterns (with and without index).  It seems likely that the pattern is:
 A - insert a tuple into the table
 B - insert an entry into the index
 C - fsync the WAL
 - repeat
 
 This can be as bad as having a disk seek to access the table data every
 time
 the 8KB page boundary is crossed, then again for the index, then again for
 the WAL, and random disk seeks happen only as fast as about 10ms, so you
 can
 only do those at a rate of 100/s.
 
  Please let me know if anyone has additional ideas.
 
 This is a fairly common problem, some people drop the index, load the
 data,
 then recreate the index to get around it.
 
 - Luke



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] 15,000 tables

2005-12-01 Thread Craig A. James

So say I need 10,000 tables, but I can create tablespaces.  Wouldn't that solve 
the performance problem caused by Linux's (or ext2/3's) problems with large 
directories?

For example, if each user creates (say) 10 tables, and I have 1000 users, I 
could create 100 tablespaces, and assign groups of 10 users to each tablespace. 
 This would limit each tablespace to 100 tables, and keep the ext2/3 
file-system directories manageable.

Would this work?  Would there be other problems?

Thanks,
Craig

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] COPY into table too slow with index: now an I/O

2005-12-01 Thread Luke Lonergan
Tom, 

 That analysis is far too simplistic, because only the WAL 
 write has to happen before the transaction can commit.  The 
 table and index writes will normally happen at some later 
 point in the bgwriter, and with any luck there will only need 
 to be one write per page, not per tuple.

That's good to know - makes sense.  I suppose we might still thrash over
a 1GB range in seeks if the BG writer starts running at full rate in the
background, right?  Or is there some write combining in the BG writer?

 It is true that having WAL and data on the same spindle is 
 bad news, because the disk head has to divide its time 
 between synchronous WAL writes and asynchronous writes of the 
 rest of the files.

That sounds right - could be tested by him turning fsync off, or by
moving the WAL to a different spindle (note I'm not advocating running
in production with fsync off).

- Luke


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Database restore speed

2005-12-01 Thread Luke Lonergan
Steve, 

 When we restore the postmaster process tries to use 100% of the CPU. 
 
 The questions we have are: 
 
 1) What is postmaster doing that it needs so much CPU? 

Parsing mostly, and attribute conversion from text to DBMS native
formats.
 
 2) How can we get our system to go faster? 

Use Postgres 8.1 or Bizgres.  Get a faster CPU. 

These two points are based on our work to improve COPY speed, which led
to a near doubling in Bizgres, and in the 8.1 version it's about 60-70%
faster than in Postgres 8.0.

There are currently two main bottlenecks in COPY, one is parsing +
attribute conversion (if the postgres CPU is nailed at 100% that's what
your limit is) and the other is the write speed through the WAL.  You
can roughly divide the write speed of your disk by 3 to get that limit,
e.g. if your disk can write 8k blocks at 100MB/s, then your COPY speed
might be limited to 33MB/s.  You can tell which of these limits you've
hit using vmstat 1 on Linux or iostat on Solaris and watch the blocks
input/output on your disk while you watch your CPU.
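
A quick way to put numbers on it from psql (the file path is made up;
server-side COPY needs superuser, and \timing reports elapsed time):

    \timing
    COPY my_table FROM '/data/batch_0001.dat';

Divide the file size by the reported time while watching vmstat/iostat: if
the CPU is pegged you are parse-bound, if the disk is saturated you are
WAL/write-bound.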

  Note: We've tried adjusting the checkpoint_segments
  parameter to no effect.

No surprise.

- Luke


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] filesystem performance with lots of files

2005-12-01 Thread David Lang

On Thu, 1 Dec 2005, Qingqing Zhou wrote:


David Lang [EMAIL PROTECTED] wrote


A few weeks ago I did a series of tests to compare different filesystems
(the test was for a different purpose so the particulars are not what I
would do for testing aimed at postgres, but I think the data is relevant),
and I saw major differences between different filesystems. I'll see about
re-running the tests to get a complete set of benchmarks in the next few
days. My tests had their times vary from 4 min to 80 min depending on the
filesystem in use (ext3 with hash_dir posted the worst case). What testing
have other people done with different filesystems?



That's good ... what benchmarks did you use?


I was doing testing in the context of a requirement to sync over a million 
small files from one machine to another (rsync would take 10 hours to do 
this over a 100Mb network, so I started with the question 'how long would 
it take to do a tar-ftp-untar cycle with no smarts'). So I created 1M x 1K 
files in a three-deep directory tree (10d/10d/10d/1000files) and measured 
simple 'time to copy tree', 'time to create tar', 'time to extract from 
tar', and 'time to copy tarfile' (a 1.6G file). I flushed the memory between 
each test with cat largefile > /dev/null (I know now that I should have 
unmounted and remounted between each test); source and destination were on 
different IDE controllers.


I don't have all the numbers readily available (and I didn't do all the 
tests on every filesystem), but I found that even with only 1000 
files/directory ext3 had some problems, and if you enabled dir_hash some 
functions would speed up, but writing lots of files would just collapse 
(that was the 80 min run)


I'll have to script it and re-do the tests (and when I do this I'll also 
set it to do a test with far fewer, far larger files as well)


David Lang

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Open request for benchmarking input (fwd)

2005-12-01 Thread David Lang
Here are the suggestions from the MySQL folks; what additional tests 
should I do?


I'd like to see some tests submitted that map out when not to use a 
particular database engine, so if you have a test that you know a 
particular database chokes on, let me know (bonus credibility if you 
include tests that your own database has trouble with :)


David Lang

-- Forwarded message -- Date: Thu, 01 Dec 2005 16:14:25

David,

The choice of benchmark depends on what kind of application you would
like to see performance for.

When someone speaks about one database being faster than another in
general, it makes me smile.  That would be the same as saying one car
would be able to win all competitions, from Formula-1 to off-road racing.

There are certain well known cases where MySQL will be faster - for
example, the in-memory storage engine is hard to beat on point selects,
and bulk inserts into MyISAM (no transactional overhead) are very fast.

There are certain known cases where MySQL would not perform well - it is
easy to build a query using subqueries which would be horribly slow on
MySQL but decent on postgresql... but then, writing an application for
MySQL, you would not write such a query.


I think the most database-agnostic way would be to select the workload
from the user's point of view and have it implemented in the most
efficient way for the database in question - for example, you may find
that TPC-C implementations by different vendors differ a lot.




For my own interests, I would like to at least cover the following bases:
32 bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type
tests (data > memory); and web perf tests (active data < RAM)


You may grab the Dell DVD store:

http://linux.dell.com/dvdstore/

for a Web benchmark.  It does not have PostgreSQL built in, but there are
some implementations available on the Internet.

DBT2 by OSDL is another good candidate - it supports PostgreSQL and
MySQL natively.


If you want some raw performance numbers, such as selects/sec, you
may use SysBench - http://sysbench.sourceforge.net


For data warehouse workloads you could grab TPC-H or the DBT3
implementation by OSDL - we run this successfully with MySQL.

You could also take a look at http://benchw.sourceforge.net/



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] filesystem performance with lots of files

2005-12-01 Thread Qingqing Zhou


On Fri, 2 Dec 2005, David Lang wrote:

 I don't have all the numbers readily available (and I didn't do all the
 tests on every filesystem), but I found that even with only 1000
 files/directory ext3 had some problems, and if you enabled dir_hash some
 functions would speed up, but writing lots of files would just collapse
 (that was the 80 min run)


Interesting. I would suggest that testing a smaller number of bigger files
would be better if the target is a database performance comparison. By
small number, I mean 10^2 - 10^3; by bigger, I mean file sizes from 8k to
1G (a PostgreSQL data file is at most this size under a normal
installation).

Let's take TPCC as an example: if we get a TPCC database of 500 files,
each one at most 1G (PostgreSQL has this feature/limit in an ordinary
installation), then this gives us a 500G database, which is big enough
for your current configuration.

Regards,
Qingqing

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] 15,000 tables

2005-12-01 Thread David Lang

On Thu, 1 Dec 2005, Craig A. James wrote:

So say I need 10,000 tables, but I can create tablespaces.  Wouldn't that 
solve the performance problem caused by Linux's (or ext2/3's) problems with 
large directories?


For example, if each user creates (say) 10 tables, and I have 1000 users, I 
could create 100 tablespaces, and assign groups of 10 users to each 
tablespace.  This would limit each tablespace to 100 tables, and keep the 
ext2/3 file-system directories manageable.


Would this work?  Would there be other problems?


This would definitely help; however, there's still the question of how 
large the tables get, and how many total files are needed to hold the 100 
tables.


You still have the problem of having to seek around to deal with all these 
different files (and tablespaces just spread them further apart). You 
can't solve this, but a large write-back journal (as opposed to 
metadata-only journaling) would mask the problem.


It would be a trade-off: you would end up writing all your data twice, so 
throughput would be lower, but since the data is safe as soon as it hits 
the journal, the latency for any one request would be lower, which would 
allow the system to use the CPU more and overlap it with your seeking.


David Lang

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Database restore speed

2005-12-01 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:


Steve,


When we restore the postmaster process tries to use 100% of the CPU.

The questions we have are:

1) What is postmaster doing that it needs so much CPU?


Parsing mostly, and attribute conversion from text to DBMS native
formats.


2) How can we get our system to go faster?


Use Postgres 8.1 or Bizgres.  Get a faster CPU.

These two points are based on our work to improve COPY speed, which led
to a near doubling in Bizgres, and in the 8.1 version it's about 60-70%
faster than in Postgres 8.0.

There are currently two main bottlenecks in COPY, one is parsing +
attribute conversion (if the postgres CPU is nailed at 100% that's what
your limit is) and the other is the write speed through the WAL.  You
can roughly divide the write speed of your disk by 3 to get that limit,
e.g. if your disk can write 8k blocks at 100MB/s, then your COPY speed
might be limited to 33MB/s.  You can tell which of these limits you've
hit using vmstat 1 on Linux or iostat on Solaris and watch the blocks
input/output on your disk while you watch your CPU.


Luke, would it help to have one machine read the file and have it connect 
to postgres on a different machine when doing the copy? (I'm thinking that 
the first machine may be able to do a lot of the parsing and conversion, 
leaving the second machine to just worry about doing the writes.)
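
(For what it's worth, you can already have a separate box read the file and
stream it over the network with psql's \copy -- a hypothetical example,
names and path invented:

    psql -h dbserver mydb -c "\copy my_table from '/local/data/dump.dat'"

but the parsing and attribute conversion still happen in the backend on the
database server, so this only offloads the file I/O, not the CPU work.)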


David Lang

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings