Re: [PERFORM] Help with extracting large volumes of records across related tables

2004-09-13 Thread Paul Thomas
On 13/09/2004 12:38 Damien Dougan wrote:
[snip]
Are there any tools/tricks/tips with regards to extracting large volumes
of data across related tables from Postgres? It doesn't have to export
into XML, we can do post-processing on the extracted data as needed -
the important thing is to keep the relationship between PvA and PvB on a
row-by-row basis.
Have you considered using cursors?
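A minimal sketch of the cursor approach (the table, column and cursor
names here are invented - substitute your real schema; 1000 is just a
starting batch size):

BEGIN;
DECLARE extract_cur CURSOR FOR
SELECT a.id, a.payload AS pva_payload, b.payload AS pvb_payload
FROM pva a JOIN pvb b ON b.pva_id = a.id
ORDER BY a.id;
FETCH FORWARD 1000 FROM extract_cur;
-- repeat the FETCH, post-processing each batch, until no rows come back
CLOSE extract_cur;
COMMIT;

That way the client never has to hold the whole result set in memory.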
--
Paul Thomas


Re: [PERFORM] insert

2004-08-13 Thread Paul Thomas
On 13/08/2004 13:10 Ulrich Wisser wrote:
Hi,
my inserts are done in one transaction, but due to some foreign key 
constraints and five indexes sometimes the 100 inserts will take more 
than 5 minutes.
Two possibilities come to mind:
a) you need an index on the referenced FK field
b) you have an index but a type mis-match (e.g., an int4 field referencing
an int8 field)

Either of these will cause a sequential table scan and poor performance.
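For example (hypothetical table/column names):

-- the referenced column should be indexed - a PRIMARY KEY or UNIQUE
-- constraint gives you this for free:
CREATE UNIQUE INDEX parent_id_idx ON parent (id);
-- and declare the referencing column with the *same* type as the
-- referenced one (int4 -> int4, not int4 -> int8), otherwise each
-- FK check can degenerate into a sequential scan.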
--
Paul Thomas


Re: [PERFORM] The black art of postgresql.conf tweaking

2004-08-04 Thread Paul Thomas
On 04/08/2004 13:45 Paul Serby wrote:
Can anyone give a good reference site/book for getting the most out of 
your postgres server.

All I can find is contradicting theories on how to work out your 
settings.

This is what I followed to setup our db server that serves our web 
applications.

http://www.phpbuilder.com/columns/smith20010821.php3?page=2
We have a Dell Poweredge with the following spec.
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
CPU: Intel(R) Xeon(TM) CPU 3.06GHz (512 KB Cache)
Physical Memory: 2077264 kB
Swap Memory: 2048244 kB
Apache on the Web server can take up to 300 connections and PHP is
using pg_pconnect

Postgres is set with the following.
max_connections = 300
shared_buffers = 38400
Might be higher than necessary. Some people reckon that there's no
measurable performance gain going above ~10,000 buffers


sort_mem = 12000
Do you really need 12MB of sort memory? Remember that this is per 
connection so you could end up with 300x that being allocated in a worst 
case scenario.

But Apache is still maxing out the non-super user connection limit.
The machine is under no load and I would like to up the max_connections 
but I would like to know more about what you need to consider before 
doing so.
I can't think why you should be maxing out when under no load. Maybe you 
need to investigate this further.

The only other source I've found is this:
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
But following its method my postgres server locks up straight away as it 
recommends setting max_connections to 16 for Web sites?
I think you've misinterpreted that. She's talking about using persistent
connections - i.e., connection pooling.

Is there a scientific method for optimizing postgres or is it all
'finger in the air' and trial and error?
Posting more details of the queries which are giving the performance 
problems will enable people to help you. You're vacuum/analyzing regularly 
of course ;) People will want to know:

- PostgreSQL version
- hardware configuration (SCSI or IDE? RAID level?)
- table schemas
- queries together with EXPLAIN ANALYZE output
Also, output from utils like vmstat, top etc. may be of use.
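Purely as an illustration for a 2 GB, 300-connection box like yours, a
more conservative starting point might look something like this (the
figures are guesses to be benchmarked, not recommendations;
shared_buffers and effective_cache_size are in 8K pages, sort_mem in KB):

max_connections = 300
shared_buffers = 10000          # ~80 MB; little reported gain above this
sort_mem = 4096                 # 4 MB per sort - remember it's per connection
effective_cache_size = 150000   # ~1.2 GB, assuming the kernel caches most of the RAM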
HTH
--
Paul Thomas


Re: [PERFORM] Visual Explain

2004-06-17 Thread Paul Thomas
On 17/06/2004 12:10 Adam Witney wrote:
Will this run on other platforms? OSX maybe?
It's a Java app so it runs on any platform with a reasonably modern
Java VM.
--
Paul Thomas


Re: [PERFORM] ORDER BY user defined function performance issues

2004-06-11 Thread Paul Thomas
On 11/06/2004 12:14 Nick Trainor wrote:
[snip]
However, when I seek to ORDER the results, then it takes 'forever':
EXPLAIN ANALYSE SELECT t1.value1,t1.value2,
getday_total('1','23',t1.id::integer,'31','59','2','2004','182','153','6','2004','0')
FROM tblitem t1 WHERE t1.type_id=23::int2 and (t1.id >= 1::int8 and
t1.id <= 9223372036854775807::int8)
ORDER BY 
getday_total('1','23',t1.id::integer,'31','59','2','2004','182','153','6','2004','0')
DESC
OFFSET 0 LIMIT 20;
I expect that pg is having to evaluate your function every time it does a 
compare within its sort. Something like 
SELECT t1.value1,t1.value2,
getday_total(..) AS foo
FROM tblitem t1 WHERE t1.type_id=23::int2 and (t1.id >= 1::int8 and
t1.id <= 9223372036854775807::int8)
ORDER BY foo

might work. Otherwise try selecting into a temp table then doing the order 
by on that table.
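A sketch of that temp-table variant, reusing your query (untested):

SELECT t1.value1, t1.value2,
getday_total('1','23',t1.id::integer,'31','59','2','2004','182','153','6','2004','0') AS foo
INTO TEMP tmp_totals
FROM tblitem t1
WHERE t1.type_id=23::int2 and (t1.id >= 1::int8 and t1.id <= 9223372036854775807::int8);

SELECT value1, value2, foo FROM tmp_totals ORDER BY foo DESC LIMIT 20;

This way the function runs once per row rather than once per comparison.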

HTH
--
Paul Thomas


Re: [PERFORM] MySQL vs PG TPC-H benchmarks

2004-04-21 Thread Paul Thomas
On 21/04/2004 09:31 Nick Barr wrote:
Hi,

Has anyone had a look at:

http://people.ac.upc.es/zgomez/

I realize that MySQL & PG cannot really be compared (especially when you
consider the issues that MySQL has with things like data integrity) but 
still surely PG would perform better than the stats show (i.e. #7 31.28 
seconds versus 42 minutes!!!).
Looks like he's using the default postgresql.conf settings in which case
I'm not surprised at pg looking so slow. His stated use of foreign keys
invalidates the tests anyway as MyISAM tables don't support FKs, so we're
probably seeing FK check overheads in pg that are simply ignored by MySQL.
In an honest test, MySQL should be reported as failing those tests.

Perhaps one of the advocacy team will pick up the baton?
On a side note it certainly looks like linux kernel 2.6 is quite a bit
faster in comparison to 2.4.
Yes, I've seen other benchmarks which also show that.

--
Paul Thomas


Re: [PERFORM] MySQL vs PG TPC-H benchmarks

2004-04-21 Thread Paul Thomas
On 21/04/2004 14:31 Cestmir Hybl wrote:
Looks like he's using the default postgresql.conf settings in which case
I'm not surprised at pg looking so slow.
The question also is, IMHO, why the hell postgreSQL still comes out of the
box with such stupid configuration defaults, totally underestimated for
today's average hardware configuration (1+GHz, 0.5+GB RAM, fast FSB, fast
HDD). It seems to me a better strategy to force that 1% of users to
downgrade their config than vice-versa.
regards
ch
This has been discussed many times before. Check the archives.

--
Paul Thomas


Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-04 Thread Paul Thomas
On 04/04/2004 09:56 Gary Doades wrote:
Unfortunately I don't understand the question!

My background is primarily Win32. The last time I used a *nix OS
was about 20 years ago, apart from occasional dips into the Linux OS
over the past few years. If you can tell me how to find out what you want
I will gladly give you the information.


Googling threw up

http://spider.tm/apr2004/cstory2.html

Interesting and possibly relevant quote:

"Benchmarks have shown that in certain conditions the anticipatory
algorithm is almost 10 times faster than what the 2.4 kernel supports."

HTH

--
Paul Thomas


Re: [PERFORM] select slow?

2004-03-31 Thread Paul Thomas
On 31/03/2004 16:40 Tom Lane wrote:
Jaime Casanova [EMAIL PROTECTED] writes:
 There are no indexes yet, and the table is just 6 rows long so even if
 indexes existed the planner will do a seq scan. That's my whole point:
 63ms for a seq scan on a 6-row table is too much.
That was 63 milliseconds, according to your original post, which seems
perfectly reasonable to me seeing that it's not a super-duper server.
The problem sounds to be either on the client side or somewhere in your
network.  I don't know anything about VB, but you might want to look
through the client-side operations to see what could be eating up the 13
seconds.


Given that the client and server are on different machines, I'm wondering
if the bulk of the 13 seconds is due to a network mis-configuration or a
very slow DNS server...
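One quick way to separate server time from client/network time is to run
the query via EXPLAIN ANALYZE on the server itself (substitute the real
table name):

EXPLAIN ANALYZE SELECT * FROM sometable;
-- the Total runtime reported is measured inside the backend; any extra
-- seconds the VB client sees are network or client-side overhead.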

--
Paul Thomas


Re: [PERFORM] Scaling further up

2004-03-03 Thread Paul Thomas
On 02/03/2004 23:25 johnn wrote:
[snip]
random_page_cost should be set with the following things taken into
account:
  - seek speed
Which is not exactly the same thing as spindle speed as it's a combination 
of spindle speed and track-to-track speed. I think you'll find that a 15K 
rpm disk, whilst it will probably have a lower seek time than a 10K rpm 
disk, won't have a proportionately (i.e., 2/3rds) lower seek time.

  - likelihood of page to be cached in memory by the kernel
That's effective_cache_size.

--
Paul Thomas


Re: [PERFORM] Scaling further up

2004-03-03 Thread Paul Thomas
On 03/03/2004 18:23 scott.marlowe wrote:
[snip]
There are three factors that affect how fast you can get to the next
sector:
seek time
settle time
rotational latency
Most drives only list the first, and don't bother to mention the other
two.
Ah yes, one of my (very) few still functioning brain cells was nagging 
about another bit of time in the equation :)

On many modern drives, the seek times are around 5 to 10 milliseconds.
[snip]
Going back to the OPs posting about random_page_cost, imagine I have 2 
servers identical in every way except the disk drive. Server A has a 10K 
rpm drive and server B has a 15K rpm drive. Seek/settle times aren't 
spectacularly different between the 2 drives. I'm wondering if drive B 
might actually merit a _higher_ random_page_cost than drive A as, once it 
gets settled on a disk track, it can suck the data off a lot faster. 
Opinions/experiences, anyone?
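Back-of-envelope numbers (taking average rotational latency as half a
revolution): a 10K rpm drive takes 60000/10000 = 6ms per revolution, so
~3ms average latency; a 15K rpm drive takes 60000/15000 = 4ms, so ~2ms.
The 15K drive only shaves about 1ms off each random access, yet its
sequential transfer rate is much higher - hence the speculation above.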

--
Paul Thomas


Re: [PERFORM] Compile Vs RPMs

2004-02-03 Thread Paul Thomas
On 03/02/2004 20:58 Anjan Dave wrote:
Hello,

I would like to know whether there are any significant performance
advantages of compiling (say, 7.4) on your platform (being RH7.3, 8, and
9.0, and Fedora especially) versus getting the relevant binaries (rpm)
from the postgresql site? Hardware is Intel XEON (various speeds, up to
2.8GHz, single/dual/quad configuration).
Very unlikely, I would have thought. Databases tend to be speed-limited by I/O
performance and the amount of RAM available for caching etc. Having said 
that, I've only got one machine (the laptop on which I'm writing this 
email) which has still got its rpm binaries. My other machines have all 
been upgraded from source.

--
Paul Thomas


Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-09 Thread Paul Thomas
On 09/10/2003 09:29 Oliver Scheit wrote:
Hi guys,

I followed the discussion and here are my 0.2$:

I think instead of thinking about where to put the
information about tuning, someone should provide a
pgsql-autotune. Maybe even a shell script would do the
trick.
It's not so hard to find out how much memory is installed,
and IMHO SHARED_BUFFERS, SORT_MEM and EFFECTIVE_CACHE_SIZE
depend heavily on this. A cat /proc/sys/kernel/shmmax
would give some valuable information on Linux boxes;
there is probably other stuff for different OSes.
random_page_cost could be set after probing the hard disks,
maybe even do a hdparm -tT if they seem to be ATA, not SCSI.
Now, let's pretend the script finds out there is 1 GB RAM,
it could ask something like "Do you want to optimize the
settings for postgres (other applications may suffer from
having not enough RAM) or do you want to use moderate
settings?"
Something like this, you get the idea.


ISTR reading that 7.4 will use a default of shared_buffers = 1000 if the
machine can support it (most can). This alone should make a big difference
in out-of-the-box performance.

--
Paul Thomas


Re: [PERFORM] Attempt at work around of int4 query won't touch int8 index ...

2003-09-16 Thread Paul Thomas
On 15/09/2003 08:42 Shridhar Daithankar wrote:
Is it possible to follow a data type upgrade model in the planner? Something
like in C/C++ where data types are promoted upwards to find a better plan?
int2 -> int4 -> int8 -> float4 -> float8 types.

 That could be a clean solution..

just a thought..

Interestingly, float8 indexes do work OK (float8col = 99). I spent a large
part of yesterday grepping through the sources to try and find out why
this should be so. No luck so far but I'm going to keep on trying!
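For anyone hitting the int8 case, the usual workarounds look like this
(hypothetical table t with an index on int8col):

-- won't use the index: the bare literal is typed int4
SELECT * FROM t WHERE int8col = 99;
-- these will:
SELECT * FROM t WHERE int8col = 99::int8;
SELECT * FROM t WHERE int8col = '99';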

--
Paul Thomas


Re: [PERFORM] Tests

2003-08-22 Thread Paul Thomas
On 22/08/2003 22:00 Bill Moran wrote:

[snip] Well, as much as I _am_ a die-hard BSD geek, I'm far more 
interested in
knowing what platform is really best when I need a top-notch PostgreSQL
server.

I'm going to try to force some time this weekend to do some tests ...
we'll see if I succeed ...
I, for one, would be very interested to see your results (can't you find
something better to do at the weekend than mess about with computers? :)

selfish-mode
What I'd really be interested in is a comparison of Linux vs BSD using 
each OS's variations of file system on the same single-processor Intel/AMD 
based hardware.
/selfish-mode

Selfishness and silliness aside, I'm sure your tests will be of interest to
us all. Thanks in advance.

--
Paul Thomas


Re: [PERFORM] How to force PostgreeSQL to work faster?

2003-08-14 Thread Paul Thomas
On 11/08/2003 09:59 Serge Dorofeev wrote:
Hi PostgreSQL team,

My PostgreSQL is installed as part of Cygwin (Windows XP).
I have compared the performance of PostgreSQL to MS SQL (I used a little Java
program doing a number of inserts into a table).
MS SQL is faster by a factor of 12 :-(
These are very strange results.
To the guys who developed this server: what can you tell me about this - what
customization is needed to increase productivity?
How to force PostgreSQL to work faster?
                           Speed (inserts/sec)   Elapsed time (ms)
MS SQL (Average):                  295                39 869
  testInsert 5000                  263                18 977
                                   255                19 619
                                   306                16 334
  testInsert 10000                 315                31 716
                                   324                30 905
                                   319                31 325
  testInsert 20000                 241                82 919
                                   313                63 922
                                   317                63 101
PostgreSQL (Average):               24               520 160
  testInsert 5000                   26               191 434
                                    26               191 264
                                    26               192 295
  testInsert 10000                  22               463 669
                                    25               393 510
                                    24               409 528
  testInsert 20000                  24               834 911
                                    17             1 184 613
                                    24               820 218
MS SQL is faster (times):           12                    13
You don't give any details about your test code or how the databases are
configured so I'm guessing that your inserts use an autocommitting
connection. For PostgreSQL, this causes each insert to be run inside its own
transaction and each transaction is then immediately written to disk. My
guess is that MS SQL behaves differently and doesn't immediately write to
disk (faster maybe, but could cause data corruption). Try modifying your
program to call connection.setAutoCommit(false) and do a
connection.commit() after say every 100 inserts.
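At the SQL level, the difference amounts to something like this
(test_table is hypothetical):

-- autocommit: one transaction, and one forced flush to disk, per row
INSERT INTO test_table (n) VALUES (1);
INSERT INTO test_table (n) VALUES (2);

-- batched: one flush per ~100 rows
BEGIN;
INSERT INTO test_table (n) VALUES (1);
INSERT INTO test_table (n) VALUES (2);
-- ... and so on, up to ~100 inserts ...
COMMIT;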

HTH

--
Paul Thomas


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Paul Thomas
On 08/08/2003 11:28 mixo wrote:
I have just installed redhat linux 9 which ships with Pg
7.3.2. Pg has to be setup so that data inserts (blobs) should
be able to handle at least 8M at a time. The machine has
two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
What would be the recommended setup for good performance
considering that the db will have about 15 users for
9 hours in a day, and about 10 or so users throughout the day
who won't be consistently using the db.


It doesn't sound like a particularly heavy loading to me. I'd start off
with something like:

shared_buffers = 2000
sort_mem = 1024
max_connections = 100
and see how it performs under normal business loading.

--
Paul Thomas


Re: [PERFORM] index / sequential scan problem

2003-07-17 Thread Paul Thomas
On 17/07/2003 10:01 Fabian Kreitner wrote:

Hi Fabian,

When you are doing these kinds of tests, you need to be aware that the 
kernel may have most of your data cached after the first query and this 
may be why the second query appears to run faster.

Also don't be worried if the planner chooses a seq scan for small tables
as the whole table can often be brought into memory with one I/O whereas
reading the index then the table would be 2 I/Os.
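If you want to see what the planner thinks of the index plan anyway, you
can temporarily discourage seq scans (hypothetical table/column names):

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM small_table WHERE id = 42;
SET enable_seqscan = on;
-- compare the costs/times against the default seq scan plan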
HTH

--
Paul Thomas


Re: [PERFORM] Pgsql - Red Hat Linux - VS MySQL VS MSSQL

2003-07-14 Thread Paul Thomas
On 13/07/2003 20:51 Balazs Wellisch wrote:

[snip]
  So, does anyone here have any experience using RH AS and DB 2.1?

 Are RH still selling DB 2.1? I can't find it listed on their web site.
 --
Yes, it's available for free download. The documentation is here:
http://www.redhat.com/docs/manuals/database/. I'd welcome your opinions on
this product.
Thank you for your comments.
It looks like they just wrote a number of GUI versions of the command line
utilities. From what I can tell, it's still a standard PostgreSQL database
behind the scenes.
--
Paul Thomas