date:20061211


On 12/11/06, Alexander Staubo [EMAIL PROTECTED] wrote:

On Dec 11, 2006, at 02:47 , Daniel van Ham Colchete wrote:

 I never understood what's the matter between the ASCII/ISO-8859-1/UTF8
 charsets to a database. They're all simple C strings that doesn't have
 the zero-byte in the midlle (like UTF16 would) and that doesn't
 require any different processing unless you are doing case insensitive
 search (them you would have a problem).

That's not the whole story. UTF-8 and other variable-width encodings
don't provide a 1:1 mapping of logical characters to single bytes; in
particular, combination characters opens the possibility of multiple
different byte sequences mapping to the same code point; therefore,
string comparison in such encodings generally cannot be done at the
byte level (unless, of course, you first acertain that the strings
involved are all normalized to an unambiguous subset of your encoding).

PostgreSQL's use of strings is not limited to string comparison.
Substring extraction, concatenation, regular expression matching, up/
downcasing, tokenization and so on are all part of PostgreSQL's small
library of text manipulation functions, and all deal with logical
characters, meaning they must be Unicode-aware.

Alexander.



You're right. I was thinking only about my cases that takes the
Unicode normatization for granted and doesn't use
regexp/tokenization/...
Thanks

Best
Daniel

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] New to PostgreSQL, performance considerations


Hi Dave,

On 12/11/06, Dave Cramer [EMAIL PROTECTED] wrote:

Hi Daniel
On 10-Dec-06, at 8:02 PM, Daniel van Ham Colchete wrote:

 Hi Gene,

 at my postgresql.conf, the only non-comented lines are:
 fsync = off
This can, and will result in lost data.


I know... If there is a power failure things can happen. I'm know, but
the performance dif is really really big I just have to decide if I'm
willing to take that chance or not.


 lc_messages = 'C'
 lc_monetary = 'C'
 lc_numeric = 'C'
 lc_time = 'C'

How much memory does this machine have and what version of postgresql
are you using?

It's only a test server with 512MB RAM, I only used it to see how well
would  the PostgreSQL do in a ugly case.

Daniel

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Dave Cramer



On 11-Dec-06, at 5:36 AM, Daniel van Ham Colchete wrote:


Hi Dave,

On 12/11/06, Dave Cramer [EMAIL PROTECTED] wrote:

Hi Daniel
On 10-Dec-06, at 8:02 PM, Daniel van Ham Colchete wrote:

 Hi Gene,

 at my postgresql.conf, the only non-comented lines are:
 fsync = off
This can, and will result in lost data.


I know... If there is a power failure things can happen. I'm know, but
the performance dif is really really big I just have to decide if I'm
willing to take that chance or not.


 lc_messages = 'C'
 lc_monetary = 'C'
 lc_numeric = 'C'
 lc_time = 'C'

How much memory does this machine have and what version of postgresql
are you using?

It's only a test server with 512MB RAM, I only used it to see how well
would  the PostgreSQL do in a ugly case.


Given that optimal performance for postgresql can require up to 50%  
of available memory, you are going to leave the OS with 256MB of  
memory ?


Dave


Daniel

---(end of  
broadcast)---

TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] New to PostgreSQL, performance considerations


Mike,

unfortunally I don't have any benchmarks right now. Doing something
like this with the same machine would be a 2-day work (at least).
Installing a Gentoo and putting it to run well is not a quick task
(although is easy).

But, trust me on this one. It's worth it. Think of this: PostgreSQL
and GNU LibC use a lot of complex algorithms: btree, hashes,
checksums, strings functions, etc... And you have a lot of ways to
compile it into binary code. Now you have Pentium4's vectorization
that allow you to run plenty of instructions in paralell, but AMD
doesn't have this. Intel also have SSE2 that makes double-precision
floatpoint operations a lot faster, AMD also doesn't have this (at
least on 32bits). Now imagine that you're RedHat and that you have to
deliver one CD to AMD and Intel servers. That means you can't use any
AMD-specific or Intel-specific tecnology at the binary level.

Things can get worse. If you take a look at GCC's code (at
gcc/config/i386/i386.c), you'll see that GCC knows what runs faster on
each processor. Let-me give an example with the FDIV and FSQRT
instructions:

ARCH - FDIV - FSQRT (costs relative to an ADD)
i386: 88 - 122
i486: 73 - 83
pentium: 39 - 70
k6: 56 - 56
Athlon: 24 - 35
K8: 19 - 35
Pentium4: 43 - 43
Nocona: 40 - 44

Imagine that you are GCC and that you have two options in front of
you: you can use FSQRT or FDIV plus 20 ADD/SUB. If you are on an
Pentium situation you should use the second option. But on a Pentium4
or on a Athlon you should choose for the first one. This was only an
example, nowadays you would have to choose between: 387, 3dNow,
3dNow+, SSE, SSE2, ...

With this info, GCC knows how to choose the best ways to doing things
to each processor. When you are compiling to an generic i586 (like
Fedora and RedHat), them you are using pentium's timings.

An example that I know of: it's impossible to run my software at a
high demanding customer without compiling it to the it's processor (I
make 5 compilations on every release). Using Intel's Compiler for
Intel's processors makes things even faster, but it is not free and
the how fast part really depends on your application is coded.

With 64bits processors, AMD and Intel restarted the process and
everyone has SSE2 (but not SSE3). Even so, the timings are also very
diferent.

Best regards
Daniel

On 12/11/06, Michael Stone [EMAIL PROTECTED] wrote:

On Sun, Dec 10, 2006 at 11:02:44PM -0200, Daniel van Ham Colchete wrote:
I'm using Gentoo Linux, so all my libraries (including glibc that is
very important to PostgreSQL), and all my softwares are compiled with
good CFLAG options to my processor (-O2 march=athlon-xp (...)). My
Linux is not an Intel-AMD binary compatible turtle like
Fedora/RedHat/SUSE/... It's really important to have your GLIBC
compiled for your processor.  It is essencial for performance.

Please, point to the benchmarks that demonstrate this for a postgres
application.

Mike Stone

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] New to PostgreSQL, performance considerations



 How much memory does this machine have and what version of postgresql
 are you using?
 It's only a test server with 512MB RAM, I only used it to see how well
 would  the PostgreSQL do in a ugly case.

Given that optimal performance for postgresql can require up to 50%
of available memory, you are going to leave the OS with 256MB of
memory ?


If it were the case I wouldn't have any problems letting the OS use
only 256MB, but this is not my production server. My production is
built yet. It'll have at least 2GB of memory.

But it's good to know anyway.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] New to PostgreSQL, performance considerations

On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:
 But, trust me on this one. It's worth it.

You know what? I don't.

 Think of this: PostgreSQL and GNU LibC use a lot of complex algorithms:
 btree, hashes, checksums, strings functions, etc... And you have a lot of
 ways to compile it into binary code. Now you have Pentium4's vectorization
 that allow you to run plenty of instructions in paralell, but AMD doesn't
 have this. Intel also have SSE2 that makes double-precision floatpoint
 operations a lot faster, AMD also doesn't have this (at least on 32bits). 

Athlon 64 has SSE2, also in 32-bit-mode.

Of course, it doesn't really matter, since at the instant you hit the disk
even once, it's going to take a million cycles and any advantage you got from
saving single cycles is irrelevant.

 Imagine that you are GCC and that you have two options in front of
 you: you can use FSQRT or FDIV plus 20 ADD/SUB.

Could you please describe a reasonable case where GCC would have such an
option? I cannot imagine any.

 An example that I know of: it's impossible to run my software at a
 high demanding customer without compiling it to the it's processor (I
 make 5 compilations on every release).

What's your software? How can you make such assertions without backing them
up? How can you know that the same holds for PostgreSQL?

As Mike said, point to the benchmarks showing this essential difference
between -O2 and -O2 -mcpu=pentium4 (or whatever). The only single worthwhile
difference I can think of, is that glibc can use the SYSENTER function if it
knows you have a 686 or higher (which includes AMD), and with recent kernels,
I'm not even sure if that is needed anymore.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] New to PostgreSQL, performance considerations


On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:

unfortunally I don't have any benchmarks right now.


That's fairly normal for gentoo users pushing their compile options.

Mike Stone

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [PERFORM] New to PostgreSQL, performance considerations


On 12/11/06, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:
 But, trust me on this one. It's worth it.

You know what? I don't.

So test it yourself.


 Think of this: PostgreSQL and GNU LibC use a lot of complex algorithms:
 btree, hashes, checksums, strings functions, etc... And you have a lot of
 ways to compile it into binary code. Now you have Pentium4's vectorization
 that allow you to run plenty of instructions in paralell, but AMD doesn't
 have this. Intel also have SSE2 that makes double-precision floatpoint
 operations a lot faster, AMD also doesn't have this (at least on 32bits).

Athlon 64 has SSE2, also in 32-bit-mode.

It's true. But, I'm not saying that Postfix is faster on AMD or Intel
systems. I'm saying that it's a lot faster on you compile Postfix and
your glibc to your processor. AMD also has features that Intel systems
doesn't: 3dNow for example. The fact is that if your distro is
compatible with a plain Athlon, you can't use neighter SSE nor SSE2.


Of course, it doesn't really matter, since at the instant you hit the disk
even once, it's going to take a million cycles and any advantage you got from
saving single cycles is irrelevant.


Really??? We're talking about high performance systems and every case
is diferent. I once saw a ddr2 ram based storage once (like 1TB).
Before you say it, I don't understand how it works, but you won't lose
your data on a reboot or powerfailure. It was very expensive but
really solve this thing with the IO bottleneck. Even when your
bottleneck is the IO, still makes no sense to waste CPU resources
unnecessarily.


 Imagine that you are GCC and that you have two options in front of
 you: you can use FSQRT or FDIV plus 20 ADD/SUB.

Could you please describe a reasonable case where GCC would have such an
option? I cannot imagine any.

As I said, it is an example. Take floatpoint divisions. You have
plenty of ways of doing it: 387, MMX, SSE, 3dNow, etc... Here GCC have
to make a choice. And this is only one case. Usually, compiler
optimizations are really complex and the processor's timings counts a
lot.

At every optimization the compile needs to mesure the quickest path,
so it uses information on how the processor will run the code. If you
take a look the AMD's docs you will see that theirs SSE2
implementation is diferent from Intel's internally. So, sometimes the
quickest path uses SSE2 and sometimes it doesn't. You also have to
count the costs of converting SSE registers to commom ones.

If you still can't imagine any case, you can read Intel's assembler
reference. You'll see that there are a lot of ways of doing a lot of
things.


 An example that I know of: it's impossible to run my software at a
 high demanding customer without compiling it to the it's processor (I
 make 5 compilations on every release).

What's your software? How can you make such assertions without backing them
up? How can you know that the same holds for PostgreSQL?

As Mike said, point to the benchmarks showing this essential difference
between -O2 and -O2 -mcpu=pentium4 (or whatever). The only single worthwhile
difference I can think of, is that glibc can use the SYSENTER function if it
knows you have a 686 or higher (which includes AMD), and with recent kernels,
I'm not even sure if that is needed anymore.


Steinar, you should really test it. I won't read the PostgreSQL source
to point you were it could use SSE or SSE2 or whatever. And I won't
read glibc's code.

You don't need to belive in what I'm saying. You can read GCC docs,
Intel's assembler reference, AMD's docs about their processor and
about how diferent that arch is.

Best regards,
Daniel Colchete

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] New to PostgreSQL, performance considerations


This is very very very true :-)!

I just remebered one case with MySQL. When I changed the distro from
Conectiva 10 (rpm-based ended brazilian distro) to Gentoo, a MySQL
operation that usually took 2 minutes to run, ended in 47 seconds.
This is absolutely vage. I don't have how to prove it to you. The old
situation doesn't even exists anymore because I used the same hardware
on the upgrade. And I can't mesure how each factor helped: compiling
glibc and Mysql with good cflags, rebuilding my database in a ordered
way, never kernel, etc.. All I know is that this process still runs
with less than 1 minute (my database is larger now).

I used the very same hardware: P4 3.0Ghz SATA disk without RAID. And I
only upgraded because Conectiva's support to their version 10 ended
and I need to keep my system up with the security patches.

Best,
Daniel

On 12/11/06, Michael Stone [EMAIL PROTECTED] wrote:

On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:
unfortunally I don't have any benchmarks right now.

That's fairly normal for gentoo users pushing their compile options.

Mike Stone


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Christopher Browne

Oops! [EMAIL PROTECTED] (Daniel van Ham Colchete) was seen spray-painting on 
a wall:
 But, trust me on this one. It's worth it. 

No, the point of performance analysis is that you *can't* trust the
people that say trust me on this one.

If you haven't got a benchmark where you can demonstrate a material
and repeatable difference, then you're just some Gentoo speed racer
making things up.
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://linuxdatabases.info/info/wp.html
One  last  point  about metaphor,  poetry,   etc.   As  an example  to
illustrate these  capabilities  in   Sastric Sanskrit,consider the
bahuvrihi construct (literally  man with a  lot of rice) which  is
used currently  in   linguistics to  describe  references  outside  of
compounds.

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] New to PostgreSQL, performance considerations


You are right Christopher.

Okay. Let's solve this matter.

What PostgreSQL benchmark software should I use???

I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
the same version FC6 uses and install it at my Gentoo. I'll use the
same hardware (diferent partitions to each).

I'm not saying that Gentoo is faster than FC6. I just want to prove
that if you compile your software to make better use of your
processor, it will run faster.

It might take a few days because I'm pretty busy right now at my job.

Best regards,
Daniel

On 12/11/06, Christopher Browne [EMAIL PROTECTED] wrote:

Oops! [EMAIL PROTECTED] (Daniel van Ham Colchete) was seen spray-painting on 
a wall:
 But, trust me on this one. It's worth it.

No, the point of performance analysis is that you *can't* trust the
people that say trust me on this one.

If you haven't got a benchmark where you can demonstrate a material
and repeatable difference, then you're just some Gentoo speed racer
making things up.
--
select 'cbbrowne' || '@' || 'acm.org';
http://linuxdatabases.info/info/wp.html
One  last  point  about metaphor,  poetry,   etc.   As  an example  to
illustrate these  capabilities  in   Sastric Sanskrit,consider the
bahuvrihi construct (literally  man with a  lot of rice) which  is
used currently  in   linguistics to  describe  references  outside  of
compounds.

---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] New to PostgreSQL, performance considerations

On Mon, Dec 11, 2006 at 11:09:13AM -0200, Daniel van Ham Colchete wrote:
 You know what? I don't.
 So test it yourself.

You're making the claims, you're supposed to be proving them...

 As I said, it is an example. Take floatpoint divisions. You have
 plenty of ways of doing it: 387, MMX, SSE, 3dNow, etc... Here GCC have
 to make a choice. 

No, you don't. MMX, SSE and 3Dnow! will all give you the wrong result
(reduced precision). SSE2, on the other hand, has double precision floats, so
you might have a choice there -- except that PostgreSQL doesn't really do a
lot of floating-point anyhow.

 And this is only one case. Usually, compiler optimizations are really
 complex and the processor's timings counts a lot.

You keep asserting this, with no good backing.

 If you still can't imagine any case, you can read Intel's assembler
 reference. You'll see that there are a lot of ways of doing a lot of
 things.

I've been programming x86 assembler for ten years or so...

 Steinar, you should really test it. I won't read the PostgreSQL source
 to point you were it could use SSE or SSE2 or whatever. And I won't
 read glibc's code.

Then you should stop making these sort of wild claims.

 You don't need to belive in what I'm saying. You can read GCC docs,
 Intel's assembler reference, AMD's docs about their processor and
 about how diferent that arch is.

I have.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] New to PostgreSQL, performance considerations


On 12/11/06, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

On Mon, Dec 11, 2006 at 11:09:13AM -0200, Daniel van Ham Colchete wrote:
 You know what? I don't.
 So test it yourself.

You're making the claims, you're supposed to be proving them...

 As I said, it is an example. Take floatpoint divisions. You have
 plenty of ways of doing it: 387, MMX, SSE, 3dNow, etc... Here GCC have
 to make a choice.

No, you don't. MMX, SSE and 3Dnow! will all give you the wrong result
(reduced precision). SSE2, on the other hand, has double precision floats, so
you might have a choice there -- except that PostgreSQL doesn't really do a
lot of floating-point anyhow.

 And this is only one case. Usually, compiler optimizations are really
 complex and the processor's timings counts a lot.

You keep asserting this, with no good backing.

 If you still can't imagine any case, you can read Intel's assembler
 reference. You'll see that there are a lot of ways of doing a lot of
 things.

I've been programming x86 assembler for ten years or so...

So, I'm a newbie to you. I learned x86 assembler last year.


 Steinar, you should really test it. I won't read the PostgreSQL source
 to point you were it could use SSE or SSE2 or whatever. And I won't
 read glibc's code.

Then you should stop making these sort of wild claims.

 You don't need to belive in what I'm saying. You can read GCC docs,
 Intel's assembler reference, AMD's docs about their processor and
 about how diferent that arch is.

I have.

/* Steinar */


Okay, I'll do the benchmarks. Just sent an e-mail about this to the
list. If you have any sugestions of how to make the benchmark please
let-me know.
I like when I prove myself wrong. Although it's much better when I'm
right :-)...

Best regards,
Daniel Colchete

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] New to PostgreSQL, performance considerations

On Mon, Dec 11, 2006 at 11:17:06AM -0200, Daniel van Ham Colchete wrote:
 I just remebered one case with MySQL. When I changed the distro from
 Conectiva 10 (rpm-based ended brazilian distro) to Gentoo, a MySQL
 operation that usually took 2 minutes to run, ended in 47 seconds.

How do you know that this improvement had _anything_ to do with the use of
different optimization flags? Were even the MySQL versions or configuration
the same?

 This is absolutely vage.

Indeed it is.

 I don't have how to prove it to you.

No, but you should stop making this sort of absolutely essential claims if
you can't.

 And I can't mesure how each factor helped: compiling glibc and Mysql with
 good cflags, rebuilding my database in a ordered way, never kernel, etc..

Exactly. So why are you attributing it to the first factor only? And why do
you think this would carry over to PostgreSQL?

Remember, anecdotal evidence isn't.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [PERFORM] New to PostgreSQL, performance considerations

On Mon, Dec 11, 2006 at 11:31:48AM -0200, Daniel van Ham Colchete wrote:
 What PostgreSQL benchmark software should I use???

Look up the list archives; search for TPC.

 I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
 the same version FC6 uses and install it at my Gentoo. I'll use the
 same hardware (diferent partitions to each).

Why do you want to compare FC6 and Gentoo? Wasn't your point that the -march=
was supposed to be the relevant factor here? In that case, you want to keep
all other things equal; so use the same distribution, only with -O2
-march=i686 vs. -march=athlon-xp (or whatever).

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] Postgresql - Threshold value.

2006-12-11 Thread Ravindran G - TLS, Chennai.

Thank you very much for your reply. 

This is not working in Postgresql 8.1.4. Its throwing some error. 

BTW, how to fetch the total disk space that is alloted to DB in postgresql
?. Any commands available ?. 

Regards, Ravi




-Original Message-
From: Kaloyan Iliev [mailto:[EMAIL PROTECTED]
Sent: Monday, December 11, 2006 3:04 PM
To: Ravindran G - TLS, Chennai.; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgresql - Threshold value.


Hi,

try using:

tmp=# show all;

and

tmp=# show geqo_threshold;


Regards,

  Kaloyan Iliev

Ravindran G - TLS, Chennai. wrote:

Hello,

How to get Postgresql Threshold value ?.  Any commands available ?. 

Regards, Ravi
DISCLAIMER 
The contents of this e-mail and any attachment(s) are confidential and
intended for the 

named recipient(s) only. It shall not attach any liability on the
originator or HCL or its 

affiliates. Any views or opinions presented in this email are solely those
of the author and 

may not necessarily reflect the opinions of HCL or its affiliates. Any form
of reproduction, 

dissemination, copying, disclosure, modification, distribution and / or
publication of this 

message without the prior written consent of the author of this e-mail is
strictly 

prohibited. If you have received this email in error please delete it and
notify the sender 

immediately. Before opening any mail and attachments please check them for
viruses and 

defect.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


  


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [PERFORM] Postgresql - Threshold value.

2006-12-11 Thread A. Kretschmer

am  Mon, dem 11.12.2006, um 19:41:29 +0530 mailte Ravindran G - TLS, Chennai. 
folgendes:
 Thank you very much for your reply. 
 
 This is not working in Postgresql 8.1.4. Its throwing some error. 

Which errors?

test=# show geqo_threshold;
 geqo_threshold

 12
(1 row)

test=*# select version();
  version
---
 PostgreSQL 8.1.4 on i386-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.5 (Debian 
1:3.3.5-13)
(1 row)



 
 BTW, how to fetch the total disk space that is alloted to DB in postgresql
 ?. Any commands available ?. 

Yes, RTFM.
http://www.postgresql.org/docs/8.1/interactive/functions-admin.html
- pg_database_size(name)


 
 Regards, Ravi
 
 
 
 
 -Original Message-

Please, no silly fullquote below your text.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47215,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] Postgresql - Threshold value.

2006-12-11 Thread Ravindran G - TLS, Chennai.

Thanks. 

I am using Postgres 8.1.4 in windows 2000 and i don't get the proper
response for threshold.

-

pg_database_size(name) - is giving the current space used by DB. But I want
like Total space utilized by DB and Free space. 

Regards, Ravi


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of A. Kretschmer
Sent: Monday, December 11, 2006 7:55 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgresql - Threshold value.


am  Mon, dem 11.12.2006, um 19:41:29 +0530 mailte Ravindran G - TLS,
Chennai. folgendes:
 Thank you very much for your reply. 
 
 This is not working in Postgresql 8.1.4. Its throwing some error. 

Which errors?

test=# show geqo_threshold;
 geqo_threshold

 12
(1 row)

test=*# select version();
  version

---
 PostgreSQL 8.1.4 on i386-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.5
(Debian 1:3.3.5-13)
(1 row)



 
 BTW, how to fetch the total disk space that is alloted to DB in postgresql
 ?. Any commands available ?. 

Yes, RTFM.
http://www.postgresql.org/docs/8.1/interactive/functions-admin.html
- pg_database_size(name)


 
 Regards, Ravi
 
 
 
 
 -Original Message-

Please, no silly fullquote below your text.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47215,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
DISCLAIMER 
The contents of this e-mail and any attachment(s) are confidential and intended 
for the 

named recipient(s) only. It shall not attach any liability on the originator or 
HCL or its 

affiliates. Any views or opinions presented in this email are solely those of 
the author and 

may not necessarily reflect the opinions of HCL or its affiliates. Any form of 
reproduction, 

dissemination, copying, disclosure, modification, distribution and / or 
publication of this 

message without the prior written consent of the author of this e-mail is 
strictly 

prohibited. If you have received this email in error please delete it and 
notify the sender 

immediately. Before opening any mail and attachments please check them for 
viruses and 

defect.

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Merlin Moncure


On 12/11/06, Daniel van Ham Colchete [EMAIL PROTECTED] wrote:

But, trust me on this one. It's worth it. Think of this: PostgreSQL
and GNU LibC use a lot of complex algorithms: btree, hashes,
checksums, strings functions, etc... And you have a lot of ways to
compile it into binary code. Now you have Pentium4's vectorization
that allow you to run plenty of instructions in paralell, but AMD
doesn't have this. Intel also have SSE2 that makes double-precision
floatpoint operations a lot faster, AMD also doesn't have this (at
least on 32bits). Now imagine that you're RedHat and that you have to
deliver one CD to AMD and Intel servers. That means you can't use any
AMD-specific or Intel-specific tecnology at the binary level.


AMD processors since the K6-2 and I think Intel ones since P-Pro are
essentially RISC processors with a hardware microcode compiler that
translates and reorganizes instructions on the fly.  Instruction
choice and ordering was extremely important in older 32 bit
architectures (like the 486) but is much less important these days.  I
think you will find that an optimized glibc might be faster in
specific contrived cases, the whole is unfortunately less than the sum
of its parts.

While SSE2 might be able to optimize things like video decoding and
the like, for most programs it's of little benifit and IMO a waste of
time.  Also as others pointed out things like cache hits/misses and
i/o considerations are actually much more important than instruction
execution speed.  We ran Gentoo here for months and did not to be
faster enough to merit the bleeding edge quirks it has for production
environments.

If you dig assembly, there was an interesting tackle of the spinlocks
code on the hackers list last year IIRC.

merlin

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [PERFORM] Low throughput of binary inserts from windows to linux

2006-12-11 Thread Axel Waggershauser


On 12/9/06, Tom Lane [EMAIL PROTECTED] wrote:

Axel Waggershauser [EMAIL PROTECTED] writes:
 ... This works quite well for the
 following setups:

 client  - server
 -
 linux   - linux
 linux   - windows
 windows - windows

 but pretty bad (meaning about 10 times slower) for this setup

 windows - linux

This has to be a network-level problem.  IIRC, there are some threads in
our archives discussing possibly-related performance issues seen with
Windows' TCP stack.  Don't recall details, but I think in some cases
the problem was traced to third-party add-ons to the Windows stack.
You might want to check just what you're running there.


I searched the archives but found nothing really enlightening
regarding my problem. One large thread regarding win32 was related to
a psql problem related to multiple open handles, other mails referred
to a QoS patch but I could not find more specific information.

I thought about firewall or virus scanning software myself, but I
can't really see why such software should distinguish between a
windows remote host and a linux remote host. Furthermore,
downloading is fast on all setups, it's just uploading from windows
to linux, which is slow.

I repeated my test with a vanilla windows 2000 machine (inc. tons of
microsoft hot-fixes) and it exposes the same problem.

I'm out of ideas here, maybe someone could try to reproduce this
behavior or could point me to the thread containing relevant
information (sorry, maybe I'm just too dumb :-/)

Thank,
 Axel

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] Low throughput of binary inserts from windows to linux

2006-12-11 Thread Thomas H.


I'm out of ideas here, maybe someone could try to reproduce this
behavior or could point me to the thread containing relevant
information (sorry, maybe I'm just too dumb :-/)


please specify how you're transfering the data from windows - linux. are 
you using odbc? if yes, what driver? are you using FDQN server names or a 
plain ip adress? etc etc.


- thomas 




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] Low throughput of binary inserts from windows to linux

2006-12-11 Thread Axel Waggershauser


On 12/11/06, Thomas H. [EMAIL PROTECTED] wrote:

 I'm out of ideas here, maybe someone could try to reproduce this
 behavior or could point me to the thread containing relevant
 information (sorry, maybe I'm just too dumb :-/)

please specify how you're transfering the data from windows - linux. are
you using odbc? if yes, what driver? are you using FDQN server names or a
plain ip adress? etc etc.


You may take a look at my first mail (starting this thread), there you
find a my_lo.c attached, containing a the complete code. I use libpq.
The connection is established like this:

conn = PQsetdbLogin(argv[1], 5432, NULL, NULL, argv[2],
argv[3], argv[4]);

I called the test program with the plain ip-address of the server machine.

Axel

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] Postgresql - Threshold value.

2006-12-11 Thread Rajesh Kumar Mallah


On 12/11/06, Ravindran G - TLS, Chennai. [EMAIL PROTECTED] wrote:

Thanks.

I am using Postgres 8.1.4 in windows 2000 and i don't get the proper
response for threshold.


what is the response you get ? please be specific about the issues.

also the footer that comes with your emails are
not appreciated by many people. if possible pls avoid it.

Regds
mallah.



-



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] Low throughput of binary inserts from windows to linux

Axel Waggershauser [EMAIL PROTECTED] writes:
 On 12/9/06, Tom Lane [EMAIL PROTECTED] wrote:
 This has to be a network-level problem.  IIRC, there are some threads in
 our archives discussing possibly-related performance issues seen with
 Windows' TCP stack.

 I searched the archives but found nothing really enlightening
 regarding my problem. One large thread regarding win32 was related to
 a psql problem related to multiple open handles, other mails referred
 to a QoS patch but I could not find more specific information.

Yeah, that's what I couldn't think of the other day.  The principal
report was here:
http://archives.postgresql.org/pgsql-general/2005-01/msg01231.php

By default, Windows XP installs the QoS Packet Scheduler service.
It is not installed by default on Windows 2000.  After I installed
QoS Packet Scheduler on the Windows 2000 machine, the latency
problem vanished. 

Now he was talking about a local connection not remote, but it's still
something worth trying.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Chris Browne

[EMAIL PROTECTED] (Daniel van Ham Colchete) writes:
 You are right Christopher.

 Okay. Let's solve this matter.

 What PostgreSQL benchmark software should I use???

pgbench is one option.  

There's a TPC-W at pgFoundry
(http://pgfoundry.org/projects/tpc-w-php/).  

There's the Open Source Database Benchmark.
(http://osdb.sourceforge.net/)

Those are three reasonable options.

 I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
 the same version FC6 uses and install it at my Gentoo. I'll use the
 same hardware (diferent partitions to each).

Wrong approach.  You'll be comparing apples to oranges, because Gentoo
and Fedora pluck sources from different points in the source code
stream.

In order to prove what you want to prove, you need to run the
benchmarks all on Gentoo, where you run with 4 categorizations:

 1.  Where you run PostgreSQL and GLIBC without any processor-specific
 optimizations

 2.  Where you run PostgreSQL and GLIBC with all relevant
 processor-specific optimizations

 3.  Where you run PostgreSQL with, and GLIBC without
 processor-specific optimizations

 4.  Where you run PostgreSQL without, and GLIBC with processor-specific
 optimizations

That would allow one to clearly distinguish which optimizations are
particularly relevant.

 I'm not saying that Gentoo is faster than FC6. I just want to prove
 that if you compile your software to make better use of your
 processor, it will run faster.

 It might take a few days because I'm pretty busy right now at my
 job.

I expect that you'll discover, if you actually do these tests, that
this belief is fairly much nonsense.

- Modern CPUs do a huge amount of on-CPU self-tuning.

- CPU features that could have a material effect tend to be unusable
  when compiling general purpose libraries and applications.  GCC
  doesn't generate MMX-like instructions.

- Database application performance tends to be I/O driven.

- When database application performance *isn't* I/O driven, it is
  likely to be driven by cache management, which compiler options
  won't affect.
-- 
output = reverse(ofni.secnanifxunil @ enworbbc)
http://linuxdatabases.info/info/sgml.html
very few people approach me in real life and insist on proving they
are drooling idiots.  -- Erik Naggum, comp.lang.lisp

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] Postgresql - Threshold value.

Rajesh Kumar Mallah [EMAIL PROTECTED] writes:
 On 12/11/06, Ravindran G - TLS, Chennai. [EMAIL PROTECTED] wrote:
 I am using Postgres 8.1.4 in windows 2000 and i don't get the proper
 response for threshold.

 what is the response you get ? please be specific about the issues.

Even more to the point, what sort of response do you think you should
get?  What kind of threshold are you talking about, and why do you think
it even exists in Postgres?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] Low throughput of binary inserts from windows to

2006-12-11 Thread Kevin Grittner

 On Mon, Dec 11, 2006 at  8:58 AM, in message
[EMAIL PROTECTED], Axel
Waggershauser [EMAIL PROTECTED] wrote: 
 
 I'm out of ideas here, maybe someone could try to reproduce this
 behavior or could point me to the thread containing relevant
 information
 
No guarantees that this is the problem, but I have seen similar issues
in other software because of delays introduced in the TCP stack by the
Nagle algorithm.  Turning on TCP_NODELAY has solved such problems.  I
don't know if PostgreSQL is vulnerable to this, or how it would be fixed
in a PostgreSQL environment, but it might give you another avenue to
search.
 
-Kevin
 


---(end of broadcast)---
TIP 6: explain analyze is your friend

[PERFORM] Looking for hw suggestions for high concurrency OLTP app

2006-12-11 Thread Cosimo Streppone

Hi all,

I'd like to get suggestions from all you out there for
a new Postgresql server that will replace an existing one.

My performance analysis shows very *low* iowaits,
and very high loads at times of peak system activity.
The average concurrent processes number is 3/4, with peaks of 10/15.
*Sustained* system load varies from 1.5 to 4, while peak load
reaches 20 and above, always with low iowait%.
I see this as a clear sign of more processors power need.

I'm aware of the context-switching storm problem, but here
the cs stays well under 50,000, so I think it's not the problem
here.

Current machine is an Acer Altos R700 (2 Xeon 2.8 Ghz, 4 Gb RAM),
similar to this one (R710):
http://www.acer.co.uk/acereuro/page9.do?sp=page4dau34.oid=7036UserCtxParam=0GroupCtxParam=0dctx1=17CountryISOCtxParam=UKLanguageISOCtxParam=enctx3=-1ctx4=United+Kingdomcrc=334044639

So, I'm looking for advice on a mid-range OLTP server with
the following (flexible) requirements:

- 4 physical processors expandable to 8 (dual core preferred),
either Intel or AMD
- 8 Gb RAM exp. to at least 32 Gb
- Compatibility with Acer S300 External storage enclosure.
The old server uses that, and it is boxed with 15krpm hdds
in RAID-10, which do perfectly well their job.
The enclosure is connected via 2 x LSI Logic PCI U320 controllers
- On-site, same-day hardware support contract
- Rack mount

Machines we're evaluating currently include:
- Acer Altos R910
- Sun Fire V40Z,
- HP Integrity RX4640
- IBM eServer 460

Experiences on these machines?
Other suggestions?

Thanks.

--
Cosimo

---(end of broadcast)---
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

[PERFORM] really quick multiple inserts can use COPY?

2006-12-11 Thread Jens Schipkowski


Hello!

In our JAVA application we do multiple inserts to a table by data from a  
Hash Map. Due to poor database access implemention - done by another  
company (we got the job to enhance the software) - we cannot use prepared  
statements. (We are not allowed to change code at database access!)
First, we tried to fire one INSERT statement per record to insert. This  
costs 3 ms per row which is to slow because normally we insert 10.000  
records which results in 30.000 ms just for inserts.


for(){
sql = INSERT INTO tblfoo(foo,bar) VALUES(+it.next()+,+CONST.BAR+);;
}

I was searching for an quicker way - MSSQL offers Array Inserts - at  
PostgreSQL. The only solution seem to be INSERT INTO foo SELECT and this  
is really dirty.

I improved the inserts using the subselect with union.

sql = INSERT INTO tblfoo(foo,bar) ;
for(){
sql += SELECT +it.next()+,+CONST.BAR+ UNION  ...
}

This results in a really long INSERT INTO SELECT UNION statement and works  
correct and quick but looks dirty.


When I heard about COPY I thought this will be the right way. But it does  
not work using JDBC. Testing via psql does it perfect but sending the same  
SQL statements via JDBC throws an error.

- BEGIN
sql = COPY tblfoo(foo,bar) FROM STDIN;\n1  'foobar'\n2  'foobar'\n\\.;
- COMMIT
ERROR:  syntax error at or near 1 at character 34

So, my questions:
Is it possible to use COPY FROM STDIN with JDBC?
Will it bring performance improvement compared to SELECT UNION solution?

many thanks in advance,
Jens Schipkowski

--
**
APUS Software GmbH

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] really quick multiple inserts can use COPY?

2006-12-11 Thread Andreas Kretschmer

Jens Schipkowski [EMAIL PROTECTED] schrieb:

 Hello!
 
 In our JAVA application we do multiple inserts to a table by data from a  
 Hash Map. Due to poor database access implemention - done by another  
 company (we got the job to enhance the software) - we cannot use prepared  
 statements. (We are not allowed to change code at database access!)
 First, we tried to fire one INSERT statement per record to insert. This  
 costs 3 ms per row which is to slow because normally we insert 10.000  
 records which results in 30.000 ms just for inserts.

Can you change this from INSERT-Statements to COPY? Copy is *much*
faster than INSERT.

If no, do you using all INSERTs in one transaction? I believe, a 'BEGIN'
and a 'COMMIT' around all INSERTs may increase the speed.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
If I was god, I would recompile penguin with --enable-fly.(unknow)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] really quick multiple inserts can use COPY?

2006-12-11 Thread Rajesh Kumar Mallah


So, my questions:
Is it possible to use COPY FROM STDIN with JDBC?


Should be. Its at least possible using DBI and DBD::Pg (perl)


my $copy_sth = $dbh - prepare( COPY
general.datamining_mailing_lists (query_id,email_key) FROM STDIN;) ;
 $copy_sth - execute();
 while (my ($email_key ) = $fetch_sth - fetchrow_array ()) {
 $dbh - func($query_id\t$email_key\n, 'putline');
 }
 $fetch_sth - finish();
 $dbh - func(\\.\n, 'putline');
 $dbh - func('endcopy');
 $copy_sth-finish();

Some JDBC expert would tell better how its done with JDBC.



Will it bring performance improvement compared to SELECT UNION solution?


COPY is quite faast.

Regds
mallah.



many thanks in advance,
Jens Schipkowski

--
**
APUS Software GmbH

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] really quick multiple inserts can use COPY?

Jens Schipkowski [EMAIL PROTECTED] writes:
 Is it possible to use COPY FROM STDIN with JDBC?

You should be asking the pgsql-jdbc list, not here.  (I know I've seen
mention of a JDBC patch to support COPY, but I dunno if it's made it into
any official version.)

 Will it bring performance improvement compared to SELECT UNION solution?

Please, at least be smart enough to use UNION ALL not UNION.

If you're using 8.2 you could also consider using INSERT with multiple
VALUES-lists.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] really quick multiple inserts can use COPY?

2006-12-11 Thread Guillaume Cottenceau

Jens Schipkowski jens.schipkowski 'at' apus.co.at writes:

 Hello!
 
 In our JAVA application we do multiple inserts to a table by data from
 a  Hash Map. Due to poor database access implemention - done by
 another  company (we got the job to enhance the software) - we cannot
 use prepared  statements. (We are not allowed to change code at
 database access!)
 First, we tried to fire one INSERT statement per record to insert.
 This  costs 3 ms per row which is to slow because normally we insert
 10.000  records which results in 30.000 ms just for inserts.
 
 for(){
 sql = INSERT INTO tblfoo(foo,bar) VALUES(+it.next()+,+CONST.BAR+);;
 }

You should try to wrap that into a single transaction. PostgreSQL
waits for I/O write completion for each INSERT as it's
implicitely in its own transaction. Maybe the added performance
would be satisfactory for you.

-- 
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] New to PostgreSQL, performance considerations


This definitely is the correct approach.

Actually, Daniel van Ham Colchete may not be as all wet as some 
around here think.  We've had previous data that shows that pg can 
become CPU bound (see previous posts by Josh Berkus and others 
regarding CPU overhead in what should be IO bound tasks).


In addition, we know that x86 compatible 64b implementations differ 
enough between AMD and Intel products that it sometimes shows on benches.


Evidence outside the DBMS arena supports the hypothesis that recent 
CPUs are needing more hand-holding and product specific compiling, 
not less, compared to their previous versions.


Side Note: I wonder what if anything pg could gain from using SWAR 
instructions (SSE*, MMX, etc)?


I'd say the fairest attitude is to do everything we can to support 
having the proper experiments done w/o presuming the results.


Ron Peacetree


At 10:47 AM 12/11/2006, Chris Browne wrote:


In order to prove what you want to prove, you need to run the
benchmarks all on Gentoo, where you run with 4 categorizations:

 1.  Where you run PostgreSQL and GLIBC without any processor-specific
 optimizations

 2.  Where you run PostgreSQL and GLIBC with all relevant
 processor-specific optimizations

 3.  Where you run PostgreSQL with, and GLIBC without
 processor-specific optimizations

 4.  Where you run PostgreSQL without, and GLIBC with processor-specific
 optimizations

That would allow one to clearly distinguish which optimizations are
particularly relevant.

 I'm not saying that Gentoo is faster than FC6. I just want to prove
 that if you compile your software to make better use of your
 processor, it will run faster.

 It might take a few days because I'm pretty busy right now at my
 job.

I expect that you'll discover, if you actually do these tests, that
this belief is fairly much nonsense.

- Modern CPUs do a huge amount of on-CPU self-tuning.

- CPU features that could have a material effect tend to be unusable
  when compiling general purpose libraries and applications.  GCC
  doesn't generate MMX-like instructions.

- Database application performance tends to be I/O driven.

- When database application performance *isn't* I/O driven, it is
  likely to be driven by cache management, which compiler options
  won't affect.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [PERFORM] New to PostgreSQL, performance considerations


On Mon, Dec 11, 2006 at 12:15:51PM -0500, Ron wrote:
I'd say the fairest attitude is to do everything we can to support 
having the proper experiments done w/o presuming the results.


Who's presuming results?[1] It is fair to say that making extraordinary 
claims without any evidence should be discouraged. It's also fair to say 
that if there are specific things that need cpu-specific tuning they'll 
be fairly limited critical areas (e.g., locks) which would probably be 
better implemented with a hand-tuned code and runtime cpu detection than 
by magical mystical compiler invocations.


Mike Stone

[1] I will say that I have never seen a realistic benchmark of general 
code where the compiler flags made a statistically significant 
difference in the runtime. There are some particularly cpu-intensive 
codes, like some science simulations or encoding routines where they 
matter, but that's not the norm--and many of those algorithms already 
have hand-tuned versions which will outperform autogenerated code. You'd 
think that with all the talk that the users of certain OS's generate 
about CFLAG settings, there'd be some well-published numbers backing up 
the hype. At any rate if there were numbers to back the claim then I 
think they could certainly be considered without prejudice.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] File Systems Compared

2006-12-11 Thread Bruno Wolff III

On Wed, Dec 06, 2006 at 08:55:14 -0800,
  Mark Lewis [EMAIL PROTECTED] wrote:
  Anyone run their RAIDs with disk caches enabled, or is this akin to
  having fsync off?
 
 Disk write caches are basically always akin to having fsync off.  The
 only time a write-cache is (more or less) safe to enable is when it is
 backed by a battery or in some other way made non-volatile.
 
 So a RAID controller with a battery-backed write cache can enable its
 own write cache, but can't safely enable the write-caches on the disk
 drives it manages.

This appears to be changing under Linux. Recent kernels have write barriers
implemented using cache flush commands (which some drives ignore, so you
need to be careful). In very recent kernels, software raid using raid 1
will also handle write barriers. To get this feature, you are supposed to
mount ext3 file systems with the barrier=1 option. For other file systems,
the parameter may need to be different.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] New to PostgreSQL, performance considerations

Statements like these can not be reasonably interpreted in any manner 
_except_ that of presuming the results:


I expect that you'll discover, if you actually do these tests, that 
this belief  (that using arch specific compiler options lead to 
better performing SW) is fairly much nonsense.


...IMO a waste of time...

etc

The correct objective response to claims w/o evidence is to request 
evidence, and to do everything we can to support it being properly 
gathered.  Not to try to discourage the claimant from even trying by 
ganging up on them with multiple instances of Argument From Authority 
or variations of Ad Hominem attacks.
(The validity of the claim has nothing to do with the skills or 
experience of the claimant or anyone else in the discussion.  Only on 
the evidence.)


 It is a tad unfair and prejudicial to call claims that CPU 
optimizations matter to the performance of DB product extraordinary.
Evidence outside the DBMS field exists; and previous posts here show 
that pg can indeed become CPU-bound during what should be IO bound tasks.
At the moment, Daniel's claims are not well supported.  That is far 
different from being extraordinary given the current circumstantial 
evidence.


Let's also bear in mind that as a community project, we can use all 
the help we can get.  Driving potential resources away is in 
opposition to that goal.


[1] The evidence that arch specific flags matter to performance can 
be found as easily as recompiling your kernel or your 
compiler.   While it certainly could be argued how general purpose 
such SW is, the same could be said for just about any SW at some 
level of abstraction.


Ron Peacetree


At 12:31 PM 12/11/2006, Michael Stone wrote:

On Mon, Dec 11, 2006 at 12:15:51PM -0500, Ron wrote:
I'd say the fairest attitude is to do everything we can to support 
having the proper experiments done w/o presuming the results.


Who's presuming results?[1] It is fair to say that making 
extraordinary claims without any evidence should be discouraged. 
It's also fair to say that if there are specific things that need 
cpu-specific tuning they'll be fairly limited critical areas (e.g., 
locks) which would probably be better implemented with a hand-tuned 
code and runtime cpu detection than by magical mystical compiler invocations.


Mike Stone

[1] I will say that I have never seen a realistic benchmark of 
general code where the compiler flags made a statistically 
significant difference in the runtime. There are some particularly 
cpu-intensive codes, like some science simulations or encoding 
routines where they matter, but that's not the norm--and many of 
those algorithms already have hand-tuned versions which will 
outperform autogenerated code. You'd think that with all the talk 
that the users of certain OS's generate about CFLAG settings, 
there'd be some well-published numbers backing up the hype. At any 
rate if there were numbers to back the claim then I think they could 
certainly be considered without prejudice.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] New to PostgreSQL, performance considerations

Michael,

On 12/11/06 9:31 AM, Michael Stone [EMAIL PROTECTED] wrote:

 [1] I will say that I have never seen a realistic benchmark of general
 code where the compiler flags made a statistically significant
 difference in the runtime.

Here's one - I wrote a general purpose Computational Fluid Dynamics analysis
method used by hundreds of people to perform aircraft and propulsion systems
analysis.  Compiler flag tuning would speed it up by factors of 2-3 or even
more on some architectures.  The reason it was so effective is that the
structure of the code was designed to be general, but also to expose the
critical performance sections in a way that the compilers could use - deep
pipelining/vectorization, unrolling, etc, were carefully made easy for the
compilers to exploit in critical sections.  Yes, this made the code in those
sections harder to read, but it was a common practice because it might take
weeks of runtime to get an answer and performance mattered.

The problem I see with general purpose DBMS code the way it's structured in
pgsql (and others) is that many of the critical performance sections are
embedded in abstract interfaces that obscure them from optimization.  An
example is doing a simple is equal to operation has many layers
surrounding it to ensure that UDFs can be declared and that special
comparison semantics can be accomodated.  But if you're simply performing a
large number of INT vs. INT comparisons, it will be thousands of times
slower than a CPU native operation because of the function call overhead,
etc.  I've seen presentations that show IPC of Postgres at about 0.5, versus
the 2-4 possible from the CPU.

Column databases like C-Store remove these abstractions at planner time to
expose native operations in large chunks to the compiler and the IPC
reflects that - typically 1+ and as high as 2.5.  If we were to redesign the
executor and planner to emulate that same structure we could achieve similar
speedups and the compiler would matter more.

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] Looking for hw suggestions for high concurrency

The Sun X4600 is very good for this, the V40z is actually EOL so I'd stay
away from it.

You can currently do 8 dual core CPUs with the X4600 and 128GB of RAM and
soon you should be able to do 8 quad core CPUs and 256GB of RAM.

- Luke 


On 12/11/06 8:26 AM, Cosimo Streppone [EMAIL PROTECTED] wrote:

 Hi all,
 
 I'd like to get suggestions from all you out there for
 a new Postgresql server that will replace an existing one.
 
 My performance analysis shows very *low* iowaits,
 and very high loads at times of peak system activity.
 The average concurrent processes number is 3/4, with peaks of 10/15.
 *Sustained* system load varies from 1.5 to 4, while peak load
 reaches 20 and above, always with low iowait%.
 I see this as a clear sign of more processors power need.
 
 I'm aware of the context-switching storm problem, but here
 the cs stays well under 50,000, so I think it's not the problem
 here.
 
 Current machine is an Acer Altos R700 (2 Xeon 2.8 Ghz, 4 Gb RAM),
 similar to this one (R710):
 http://www.acer.co.uk/acereuro/page9.do?sp=page4dau34.oid=7036UserCtxParam=0
 GroupCtxParam=0dctx1=17CountryISOCtxParam=UKLanguageISOCtxParam=enctx3=-1
 ctx4=United+Kingdomcrc=334044639
 
 So, I'm looking for advice on a mid-range OLTP server with
 the following (flexible) requirements:
 
 - 4 physical processors expandable to 8 (dual core preferred),
either Intel or AMD
 - 8 Gb RAM exp. to at least 32 Gb
 - Compatibility with Acer S300 External storage enclosure.
The old server uses that, and it is boxed with 15krpm hdds
in RAID-10, which do perfectly well their job.
The enclosure is connected via 2 x LSI Logic PCI U320 controllers
 - On-site, same-day hardware support contract
 - Rack mount
 
 Machines we're evaluating currently include:
 - Acer Altos R910
 - Sun Fire V40Z,
 - HP Integrity RX4640
 - IBM eServer 460
 
 Experiences on these machines?
 Other suggestions?
 
 Thanks.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] New to PostgreSQL, performance considerations


On Mon, Dec 11, 2006 at 01:20:50PM -0500, Ron wrote:
(The validity of the claim has nothing to do with the skills or 
experience of the claimant or anyone else in the discussion.  Only on 
the evidence.)


Please go back and reread the original post. I don't think the response 
was unwarranted.


 It is a tad unfair and prejudicial to call claims that CPU 
optimizations matter to the performance of DB product extraordinary.
Evidence outside the DBMS field exists; and previous posts here show 
that pg can indeed become CPU-bound during what should be IO bound tasks.
At the moment, Daniel's claims are not well supported.  That is far 
different from being extraordinary given the current circumstantial 
evidence.


No, they're extraordinary regardless of whether postgres is CPU bound.  
The question is whether cpu-specific compiler flags will have a 
significant impact--which is, historically, fairly unlikely. Far more 
likely is that performance can be improved with either a 
non-cpu-specific optimization (e.g., loop unrolling vs not) or with an 
algorithmic enhancement. 

More importantly, you're arguing *your own* point, not the original 
claim. I'll refresh your memory: My Linux is not an Intel-AMD binary 
compatible turtle like Fedora/RedHat/SUSE/... It's really important to 
have your GLIBC compiled for your processor.  It is essencial for 
performance. You wanna draw the line between that (IMO, extraordinary) 
claim and the rational argument that you're trying to substitute in its 
place?


[1] The evidence that arch specific flags matter to performance can 
be found as easily as recompiling your kernel or your 
compiler.   


Then, please, point to the body of evidence. IME, the results of such 
efforts aren't statistically all that signficant on most workloads. I'm 
sure there are edge cases, but it's certainly not going to be on my top 
ten things to look at when tuning a database system. (If your kernel's 
cpu utilization is the bottleneck in your database, you've probably got 
bigger problems than compiler flags can solve.) Where you get the real 
big benefits in a (linux) kernel recompile is when you select code 
that's specifically tuned for a particular processor--not from 
arch-specific gcc flags--and those sorts of things are increasingly 
moving toward boot-time autotuning rather than compile-time manual 
tuning for obvious reasons.


Mike Stone

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] New to PostgreSQL, performance considerations


On Mon, Dec 11, 2006 at 10:30:55AM -0800, Luke Lonergan wrote:

Here's one - I wrote a general purpose Computational Fluid Dynamics analysis
method used by hundreds of people to perform aircraft and propulsion systems
analysis.  


That's kinda the opposite of what I meant by general code. I was trying 
(perhaps poorly) to distinguish between scientific codes and other 
stuff (especially I/O or human interface code).



Compiler flag tuning would speed it up by factors of 2-3 or even
more on some architectures.  The reason it was so effective is that the
structure of the code was designed to be general, but also to expose the
critical performance sections in a way that the compilers could use - deep
pipelining/vectorization, unrolling, etc, were carefully made easy for the
compilers to exploit in critical sections.  


It also sounds like code specifically written to take advantage of 
compiler techniques, rather than random code thrown at a pile of cflags. 
I don't disagree that it is possible to get performance improvements if 
code is written to be performant code; I do (and did) disagree with the 
idea that you'll get huge performance improvements by taking regular old 
C application code and playing with compiler flags. 


Yes, this made the code in those
sections harder to read, but it was a common practice because it might 
take weeks of runtime to get an answer and performance mattered.


IMO that's appropriate for some science codes (although I think even 
that sector is beginning to find that they've gone too far in a lot of 
ways), but for a database I'd rather have people debugging clean, readable 
code than risking my data to something incomprehensible that runs in 
optimal time.



Column databases like C-Store remove these abstractions at planner time to
expose native operations in large chunks to the compiler and the IPC
reflects that - typically 1+ and as high as 2.5.  If we were to redesign the
executor and planner to emulate that same structure we could achieve similar
speedups and the compiler would matter more.


gcc --make-it-really-fast-by-rewriting-it-from-the-ground-up?

Mike Stone

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] really quick multiple inserts can use COPY?

2006-12-11 Thread imad


On 12/11/06, Andreas Kretschmer [EMAIL PROTECTED] wrote:

Jens Schipkowski [EMAIL PROTECTED] schrieb:

 Hello!

 In our JAVA application we do multiple inserts to a table by data from a
 Hash Map. Due to poor database access implemention - done by another
 company (we got the job to enhance the software) - we cannot use prepared
 statements. (We are not allowed to change code at database access!)
 First, we tried to fire one INSERT statement per record to insert. This
 costs 3 ms per row which is to slow because normally we insert 10.000
 records which results in 30.000 ms just for inserts.

Can you change this from INSERT-Statements to COPY? Copy is *much*
faster than INSERT.

If no, do you using all INSERTs in one transaction? I believe, a 'BEGIN'
and a 'COMMIT' around all INSERTs may increase the speed.



Performance increment can also be gained by disabling constraints in the
transaction. These disabled constraints are invoked at the end of the
transaction according to the SQL standard, so no worries about data
consistency.

Hmmm... PG currently supports disabling foreign constraints only. But that
can still be significant.

--Imad
www.EnterpriseDB.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Merlin Moncure


On 12/11/06, Ron [EMAIL PROTECTED] wrote:

Statements like these can not be reasonably interpreted in any manner
_except_ that of presuming the results:

I expect that you'll discover, if you actually do these tests, that
this belief  (that using arch specific compiler options lead to
better performing SW) is fairly much nonsense.

...IMO a waste of time...

etc

The correct objective response to claims w/o evidence is to request
evidence, and to do everything we can to support it being properly
gathered.  Not to try to discourage the claimant from even trying by
ganging up on them with multiple instances of Argument From Authority
or variations of Ad Hominem attacks.
(The validity of the claim has nothing to do with the skills or
experience of the claimant or anyone else in the discussion.  Only on
the evidence.)


/shrugs, this is not debate class, I just happened to have barked up
this particular tree before, and decided to share my insights from it.
A lot of the misunderstanding here stems from legacy perceptions
about how cpus work, not to mention the entire architecture.  If
somebody produces hard facts to the contrary, great, and I encourage
them to do so.

also, some people posting here, not necessarily me, are authority figures. :-)

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] New to PostgreSQL, performance considerations


At 01:47 PM 12/11/2006, Michael Stone wrote:

On Mon, Dec 11, 2006 at 01:20:50PM -0500, Ron wrote:
(The validity of the claim has nothing to do with the skills or 
experience of the claimant or anyone else in the discussion.  Only 
on the evidence.)


Please go back and reread the original post. I don't think the 
response was unwarranted.


So he's evidently young and perhaps a trifle over-enthusiast.  We 
were once too. ;-)


We are not going to get valuable contributions nor help people become 
more valuable to the community by flaming them into submission.


...and who knows, =properly= done experiment may provide both 
surprises and unexpected insights/benefits.


I agree completely with telling him he needs to get better evidence 
and even with helping him understand how he should go about getting it.


It should be noted that his opposition has not yet done these 
experiments either.  (Else they could just simply point to the 
results that refute Daniel's hypothesis.)


The reality is that a new CPU architecture and multiple new memory 
technologies are part of this discussion.  I certainly do not expect 
them to change the fundamental thinking regarding how to get best 
performance for a DBMS.  OTOH, there are multiple valid reasons to 
give such new stuff a thorough and rigorous experimental shake-down.


ATM, =both= sides of this debate are lacking evidence for their POV.

Let's support getting definitive evidence.  No matter who brings it 
to the table ;-)
Ron Peacetree 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] New to PostgreSQL, performance considerations


On 12/11/06, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

On Mon, Dec 11, 2006 at 11:17:06AM -0200, Daniel van Ham Colchete wrote:
 I just remebered one case with MySQL. When I changed the distro from
 Conectiva 10 (rpm-based ended brazilian distro) to Gentoo, a MySQL
 operation that usually took 2 minutes to run, ended in 47 seconds.

How do you know that this improvement had _anything_ to do with the use of
different optimization flags? Were even the MySQL versions or configuration
the same?

 This is absolutely vage.

Indeed it is.

Finally we agreed on something.



 I don't have how to prove it to you.

No, but you should stop making this sort of absolutely essential claims if
you can't.

 And I can't mesure how each factor helped: compiling glibc and Mysql with
 good cflags, rebuilding my database in a ordered way, never kernel, etc..

Exactly. So why are you attributing it to the first factor only? And why do
you think this would carry over to PostgreSQL?

Remember, anecdotal evidence isn't.


But that's exactly what I said. I'm not attributing this case to the
optimization factor. As I said there are a lot of factors involved.
The MySQL version change of a minor upgrade (from 4.1.15 to 4.1.21).

Steinar, I say I'll do the benchmark and it will be the end of the story.



/* Steinar */


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] New to PostgreSQL, performance considerations


At 02:28 PM 12/11/2006, Merlin Moncure wrote:


also, some people posting here, not necessarily me, are authority figures. :-)

merlin


Noam Chomsky was one of the most influential thinkers in Linguistics 
to yet have lived.  He was proven wrong a number of times.  Even 
within Linguistics.

There are plenty of other historical examples.

As others have said, opinion without evidence and logic is just that- opinion.
And even Expert Opinion has been known to be wrong.  Sometimes very much so.

Part of what makes an expert an expert is that they can back up their 
statements with evidence and logic that are compelling even to the 
non expert when asked to do so.


All I'm saying is let's all remember how assume is spelled and 
support the getting of some hard data.

Ron Peacetree



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] New to PostgreSQL, performance considerations

Michael,

On 12/11/06 10:57 AM, Michael Stone [EMAIL PROTECTED] wrote:

 That's kinda the opposite of what I meant by general code. I was trying
 (perhaps poorly) to distinguish between scientific codes and other
 stuff (especially I/O or human interface code).

Yes - choice of language has often been a differentiator in these markets -
LISP versus FORTRAN, C++ versus SQL/DBMS.  This isn't just about science,
it's also in Business Intelligence - e.g. Special purpose datamining code
versus algorithms expressed inside a data management engine.
  
 It also sounds like code specifically written to take advantage of
 compiler techniques, rather than random code thrown at a pile of cflags.
 I don't disagree that it is possible to get performance improvements if
 code is written to be performant code; I do (and did) disagree with the
 idea that you'll get huge performance improvements by taking regular old
 C application code and playing with compiler flags.

Agreed - that's my point exactly.
 
 IMO that's appropriate for some science codes (although I think even
 that sector is beginning to find that they've gone too far in a lot of
 ways), but for a database I'd rather have people debugging clean, readable
 code than risking my data to something incomprehensible that runs in
 optimal time.

Certainly something of a compromise is needed.
 
 Column databases like C-Store remove these abstractions at planner time to

 gcc --make-it-really-fast-by-rewriting-it-from-the-ground-up?

Maybe not from ground-up, but rather from about 10,000 ft - 25,000 ft?

There are some who have done a lot of work studying the impact of more
efficient DBMS, see here:
  http://homepages.cwi.nl/~boncz/x100.html

- Luke



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [PERFORM] New to PostgreSQL, performance considerations


On 12/11/06, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

On Mon, Dec 11, 2006 at 11:31:48AM -0200, Daniel van Ham Colchete wrote:
 What PostgreSQL benchmark software should I use???

Look up the list archives; search for TPC.

 I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
 the same version FC6 uses and install it at my Gentoo. I'll use the
 same hardware (diferent partitions to each).

Why do you want to compare FC6 and Gentoo? Wasn't your point that the -march=
was supposed to be the relevant factor here? In that case, you want to keep
all other things equal; so use the same distribution, only with -O2
-march=i686 vs. -march=athlon-xp (or whatever).

/* Steinar */

Using Gentoo is just a easy way to make cflag optimizations to all the
other libs as well: glibc, ...

I can also mesure performance on Gentoo with cflag optimized
PostgreSQL and plain PostgreSQL as well.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] New to PostgreSQL, performance considerations


On Mon, Dec 11, 2006 at 02:51:09PM -0500, Ron wrote:

Let's support getting definitive evidence.


Since nobody opposed the concept of contrary evidence, I don't suppose 
you're fighting an uphill battle on that particular point.


It's fine to get preachy about supporting intellectual curiosity, but do 
remember that it's a waste of everyone's (limited) time to give equal 
time to all theories. If someone comes to you with an idea for a 
perpetual motion machine your effort is probably better spent on 
something other than helping him build it, regardless of whether that 
somehow seems unfair. (Now if he brings a working model, that's a 
different story...) Heck, even building a bunch of non-working perpetual 
motion machines as a demonstration is a waste of time, because it's 
always easy to say well, if you had just  That's precisely why the 
person bringing the extraordinary claim is also expected to bring the 
proof, rather than expecting that everyone else prove the status quo.


Mike Stone

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] New to PostgreSQL, performance considerations


On 12/11/06, Chris Browne [EMAIL PROTECTED] wrote:

[EMAIL PROTECTED] (Daniel van Ham Colchete) writes:
 You are right Christopher.

 Okay. Let's solve this matter.

 What PostgreSQL benchmark software should I use???

pgbench is one option.

There's a TPC-W at pgFoundry
(http://pgfoundry.org/projects/tpc-w-php/).

There's the Open Source Database Benchmark.
(http://osdb.sourceforge.net/)

Those are three reasonable options.

Thanks Chris, I'm going to take a look at those options.



 I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
 the same version FC6 uses and install it at my Gentoo. I'll use the
 same hardware (diferent partitions to each).

Wrong approach.  You'll be comparing apples to oranges, because Gentoo
and Fedora pluck sources from different points in the source code
stream.

In order to prove what you want to prove, you need to run the
benchmarks all on Gentoo, where you run with 4 categorizations:

 1.  Where you run PostgreSQL and GLIBC without any processor-specific
 optimizations

 2.  Where you run PostgreSQL and GLIBC with all relevant
 processor-specific optimizations

 3.  Where you run PostgreSQL with, and GLIBC without
 processor-specific optimizations

 4.  Where you run PostgreSQL without, and GLIBC with processor-specific
 optimizations

That would allow one to clearly distinguish which optimizations are
particularly relevant.


Good ideia also. And it is much easier to do as well.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Looking for hw suggestions for high concurrency

2006-12-11 Thread Merlin Moncure


On 12/11/06, Luke Lonergan [EMAIL PROTECTED] wrote:

The Sun X4600 is very good for this, the V40z is actually EOL so I'd stay
away from it.


also, it has no pci express slots. make sure to get pci-e slots :)


You can currently do 8 dual core CPUs with the X4600 and 128GB of RAM and
soon you should be able to do 8 quad core CPUs and 256GB of RAM.


...and this 6 of them (wow!). the v40z was top of its class.  Will K8L
run on this server?

merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] Looking for hw suggestions for high concurrency

Merlin,

On 12/11/06 12:19 PM, Merlin Moncure [EMAIL PROTECTED] wrote:

 ...and this 6 of them (wow!). the v40z was top of its class.  Will K8L
 run on this server?

No official word yet.

The X4600 slipped in there quietly under the X4500 (Thumper) announcement,
but it's a pretty awesome server.  It's been in production for
supercomputing in Japan for a long while now, so I'd trust it.

- Luke



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Craig A. James


Ron wrote:
We are not going to get valuable contributions nor help people become 
more valuable to the community by flaming them into submission.


Let's support getting definitive evidence.  No matter who brings it to 
the table ;-)


Thanks, Ron, for a voice of respect and reason.  Since I first started using Usenet back 
in 1984, inexplicable rudeness has been a plague on otherwise civilized people.  We're a 
community, we're all in this to help one another.  Sometimes we give good advice, and 
sometimes even those wearing the mantle of authority can make boneheaded 
comments.  I know I do, and when it happens, I always appreciate it when I'm taken to 
task with good humor and tolerance.

When someone comes to this forum with an idea you disagree with, no matter how 
brash or absurd their claims, it's so easy to challenge them with grace and 
good humor, rather than chastizing with harsh words and driving someone from 
our community.  If you're right, you will have taught a valuable lesson to 
someone.  And if on occasion a newcomer shows us something new, then we've all 
learned.  Either way, we have a new friend and contributor to the community.

Craig

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Mark Kirkwood


Daniel van Ham Colchete wrote:

On 12/11/06, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

On Mon, Dec 11, 2006 at 11:31:48AM -0200, Daniel van Ham Colchete wrote:
 What PostgreSQL benchmark software should I use???

Look up the list archives; search for TPC.

 I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
 the same version FC6 uses and install it at my Gentoo. I'll use the
 same hardware (diferent partitions to each).

Why do you want to compare FC6 and Gentoo? Wasn't your point that the 
-march=
was supposed to be the relevant factor here? In that case, you want to 
keep

all other things equal; so use the same distribution, only with -O2
-march=i686 vs. -march=athlon-xp (or whatever).

/* Steinar */

Using Gentoo is just a easy way to make cflag optimizations to all the
other libs as well: glibc, ...

I can also mesure performance on Gentoo with cflag optimized
PostgreSQL and plain PostgreSQL as well.



I can certainly recall that when I switched from Fedora Core (2 or 3 
can't recall now) to Gentoo that the machine was faster for many 
activities. Of course I can't recall precisely what now :-(.


To actually track down *why* and *what* make it faster is another story, 
and custom CFLAGS is only 1 of the possible factors: others could be:


- different kernel versions (Gentoo would have possibly been later)
- different kernel patches (both RedHat and Gentoo patch 'em)
- different versions of glibc (Gentoo possibly later again).
- different config options for glibc (not sure if they in fact are, but 
it's possible...)
- kernel and glibc built with different versions of gcc (again I suspect 
Gentoo may have used a later version)


So there are a lot of variables to consider if you want to settle this 
debate once and for all :-)!


Best wishes

Mark

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [PERFORM] New to PostgreSQL, performance considerations


Hi yall,

I made some preliminary tests.

Before the results, I would like to make some acknowledgments:
1 - I didn't show any prove to any of the things I said until now.
2 - It really is a waste of everyone's time to say one thing when I
can't prove it.

But all I said, is the knowledge I have been acumulating over the past
few years working on a project where optimization is important. After
algorithmic optimizations, compiler options is the second on my list
and with my software they show measurable improvement. With the other
software I use, they seen to run faster, but I didn't measure it.

TEST PROCEDURE

I ran this test at a Gentoo test machine I have here. It's a Pentium 4
3.0GHz (I don't know witch P4) with 1 GB of RAM memory. It only uses
SATA drives. I didn't changed my glibc (or any other lib) during the
test. I used GCC 3.4.6.

I ran each test three times. So we can get an idea about average
values and standard deviation.

Each time I ran the test with the following commands:
dropdb mydb
createdb mydb
pgbench -i -s 10 mydb 2 /dev/null
psql -c 'vacuum analyze' mydb
psql -c 'checkpoint' mydb
sync
pgbench -v -n -t 600 -c 5 mydb

My postgresql.conf was the default one, except for:
fsync = depends on the test
shared_buffers = 1
work_mem = 10240

Every test results should begin the above, but I removed it because
it's always the same:
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 600
number of transactions actually processed: 3000/3000

TESTS RESULTS
==
TEST 01: CFLAGS=-O2 -march=i686 fsync=false

tps = 734.948620 (including connections establishing)
tps = 736.866642 (excluding connections establishing)

tps = 713.225047 (including connections establishing)
tps = 715.039059 (excluding connections establishing)

tps = 721.769259 (including connections establishing)
tps = 723.631065 (excluding connections establishing)


TEST 02: CFLAGS=-O2 -march=i686 fsync=true

tps = 75.466058 (including connections establishing)
tps = 75.485675 (excluding connections establishing)

tps = 75.115797 (including connections establishing)
tps = 75.135311 (excluding connections establishing)

tps = 73.883113 (including connections establishing)
tps = 73.901997 (excluding connections establishing)


TEST 03: CFLAGS=-O2 -march=pentium4 fsync=false

tps = 846.337784 (including connections establishing)
tps = 849.067017 (excluding connections establishing)

tps = 829.476269 (including connections establishing)
tps = 832.008129 (excluding connections establishing)

tps = 831.416457 (including connections establishing)
tps = 835.31 (excluding connections establishing)


TEST 04 CFLAGS=-O2 -march=pentium4 fsync=true

tps = 83.224016 (including connections establishing)
tps = 83.248157 (excluding connections establishing)

tps = 80.811892 (including connections establishing)
tps = 80.834525 (excluding connections establishing)

tps = 80.671406 (including connections establishing)
tps = 80.693975 (excluding connections establishing)


CONCLUSIONS
Everyone can get their own conclusion. Mine is:

1 - You have improvement when you compile your postgresql using
processor specific tecnologies. With the fsync the you have an
improvement of 9% at the tps rate. Without the fsync, the improvement
is of 15,6%.

2 - You can still improve your indexes, sqls and everythingelse, this
only adds another possible improvment.

3 - I can't prove this but I *think* that this is related to the fact
that GCC knows how to do the same thing better on each processor.

4 - I'm still using source-based distros.

WHAT NOW
There are other things I wish to test:
1 - What efect an optimized glibc has on PostgreSQL?
2 - How much improvement can I get playing with my postgresql.conf.
3 - What efect optimizations have with concurrency?
4 - What if I used Intel C++ Compiler instead of GCC?
5 - What if I use GCC 4.1.1 instead of GCC 3.4.6?

I'm thinking about writing a script to make all the tests (more than 3
times each), get the data and plot some graphs.

I don't have the time right now to do it, maybe next week I'll have.

I invite everyone to comment/sugest on the procedure or the results.

Best regards,
Daniel Colchete

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] New to PostgreSQL, performance considerations

Daniel,

Good stuff.

Can you try this with just -O3 versus -O2?

- Luke


On 12/11/06 2:22 PM, Daniel van Ham Colchete [EMAIL PROTECTED]
wrote:

 Hi yall,
 
 I made some preliminary tests.
 
 Before the results, I would like to make some acknowledgments:
 1 - I didn't show any prove to any of the things I said until now.
 2 - It really is a waste of everyone's time to say one thing when I
 can't prove it.
 
 But all I said, is the knowledge I have been acumulating over the past
 few years working on a project where optimization is important. After
 algorithmic optimizations, compiler options is the second on my list
 and with my software they show measurable improvement. With the other
 software I use, they seen to run faster, but I didn't measure it.
 
 TEST PROCEDURE
 
 I ran this test at a Gentoo test machine I have here. It's a Pentium 4
 3.0GHz (I don't know witch P4) with 1 GB of RAM memory. It only uses
 SATA drives. I didn't changed my glibc (or any other lib) during the
 test. I used GCC 3.4.6.
 
 I ran each test three times. So we can get an idea about average
 values and standard deviation.
 
 Each time I ran the test with the following commands:
 dropdb mydb
 createdb mydb
 pgbench -i -s 10 mydb 2 /dev/null
 psql -c 'vacuum analyze' mydb
 psql -c 'checkpoint' mydb
 sync
 pgbench -v -n -t 600 -c 5 mydb
 
 My postgresql.conf was the default one, except for:
 fsync = depends on the test
 shared_buffers = 1
 work_mem = 10240
 
 Every test results should begin the above, but I removed it because
 it's always the same:
 transaction type: TPC-B (sort of)
 scaling factor: 10
 number of clients: 5
 number of transactions per client: 600
 number of transactions actually processed: 3000/3000
 
 TESTS RESULTS
 ==
 TEST 01: CFLAGS=-O2 -march=i686 fsync=false
 
 tps = 734.948620 (including connections establishing)
 tps = 736.866642 (excluding connections establishing)
 
 tps = 713.225047 (including connections establishing)
 tps = 715.039059 (excluding connections establishing)
 
 tps = 721.769259 (including connections establishing)
 tps = 723.631065 (excluding connections establishing)
 
 
 TEST 02: CFLAGS=-O2 -march=i686 fsync=true
 
 tps = 75.466058 (including connections establishing)
 tps = 75.485675 (excluding connections establishing)
 
 tps = 75.115797 (including connections establishing)
 tps = 75.135311 (excluding connections establishing)
 
 tps = 73.883113 (including connections establishing)
 tps = 73.901997 (excluding connections establishing)
 
 
 TEST 03: CFLAGS=-O2 -march=pentium4 fsync=false
 
 tps = 846.337784 (including connections establishing)
 tps = 849.067017 (excluding connections establishing)
 
 tps = 829.476269 (including connections establishing)
 tps = 832.008129 (excluding connections establishing)
 
 tps = 831.416457 (including connections establishing)
 tps = 835.31 (excluding connections establishing)
 
 
 TEST 04 CFLAGS=-O2 -march=pentium4 fsync=true
 
 tps = 83.224016 (including connections establishing)
 tps = 83.248157 (excluding connections establishing)
 
 tps = 80.811892 (including connections establishing)
 tps = 80.834525 (excluding connections establishing)
 
 tps = 80.671406 (including connections establishing)
 tps = 80.693975 (excluding connections establishing)
 
 
 CONCLUSIONS
 Everyone can get their own conclusion. Mine is:
 
 1 - You have improvement when you compile your postgresql using
 processor specific tecnologies. With the fsync the you have an
 improvement of 9% at the tps rate. Without the fsync, the improvement
 is of 15,6%.
 
 2 - You can still improve your indexes, sqls and everythingelse, this
 only adds another possible improvment.
 
 3 - I can't prove this but I *think* that this is related to the fact
 that GCC knows how to do the same thing better on each processor.
 
 4 - I'm still using source-based distros.
 
 WHAT NOW
 There are other things I wish to test:
  1 - What efect an optimized glibc has on PostgreSQL?
  2 - How much improvement can I get playing with my postgresql.conf.
  3 - What efect optimizations have with concurrency?
  4 - What if I used Intel C++ Compiler instead of GCC?
  5 - What if I use GCC 4.1.1 instead of GCC 3.4.6?
 
 I'm thinking about writing a script to make all the tests (more than 3
 times each), get the data and plot some graphs.
 
 I don't have the time right now to do it, maybe next week I'll have.
 
 I invite everyone to comment/sugest on the procedure or the results.
 
 Best regards,
 Daniel Colchete
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [PERFORM] Low throughput of binary inserts from windows to linux

2006-12-11 Thread Axel Waggershauser


On 12/11/06, Tom Lane [EMAIL PROTECTED] wrote:

Yeah, that's what I couldn't think of the other day.  The principal
report was here:
http://archives.postgresql.org/pgsql-general/2005-01/msg01231.php

By default, Windows XP installs the QoS Packet Scheduler service.
It is not installed by default on Windows 2000.  After I installed
QoS Packet Scheduler on the Windows 2000 machine, the latency
problem vanished.


I found a QoS-RVPS service (not sure about the last four characters
and I'm sitting at my mac at home now...) on one of the WinXP test
boxes, started it and immediately lost network connection :-(. Since I
have pretty much the same skepticism regarding the usefulness of a QoS
packet scheduler to help with a raw-throughput-problem like Lincoln
Yeoh in a follow up mail to the above
(http://archives.postgresql.org/pgsql-general/2005-01/msg01243.php), I
didn't investigate this further.

And regarding the TCP_NODELAY hint from Kevin Grittner: if I am not
wrong with interpreting fe_connect.c, the libpq already deals with it
(fe_connect.c:connectNoDelay). But this made me think about the
'page'-size I use in my blob table...

I tested different sizes on linux some time ago and found that 64KB
was optimal. But playing with different sizes again revealed that my
windows-linux problem seems to be solved if I use _any_ other
(reasonable - meaning something between 4K and 512K) power of two ?!?

Does this make sense to anyone?

Thanks,
 axel

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [PERFORM] Low throughput of binary inserts from windows to linux

Axel Waggershauser [EMAIL PROTECTED] writes:
 I tested different sizes on linux some time ago and found that 64KB
 was optimal. But playing with different sizes again revealed that my
 windows-linux problem seems to be solved if I use _any_ other
 (reasonable - meaning something between 4K and 512K) power of two ?!?

I think this almost certainly indicates a Nagle/delayed-ACK
interaction.  I googled and found a nice description of the issue:
http://www.stuartcheshire.org/papers/NagleDelayedAck/

Note that there are no TCP connections in which message payloads are
exact powers of two (and no, I don't know why they didn't try to make
it so).  You are probably looking at a situation where this particular
transfer size results in an odd number of messages where the other sizes
do not, with the different overheads between Windows and everybody else
accounting for the fact that it's only seen with a Windows sender.

If you don't like that theory, another line of reasoning has to do with
the fact that the maximum advertiseable window size in TCP is 65535 ---
there could be some edge-case behaviors in the Windows and Linux stacks
that don't play nicely together for 64K transfer sizes.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] New to PostgreSQL, performance considerations


On Mon, Dec 11, 2006 at 08:22:42PM -0200, Daniel van Ham Colchete wrote:

TEST 01: CFLAGS=-O2 -march=i686 fsync=false

tps = 734.948620 (including connections establishing)
tps = 736.866642 (excluding connections establishing)

[snip]

TEST 03: CFLAGS=-O2 -march=pentium4 fsync=false

tps = 846.337784 (including connections establishing)
tps = 849.067017 (excluding connections establishing)


Can anyone else reproduce these results? I'm on similar hardware (2.5GHz 
P4, 1.5G RAM) and my test results are more like this:


(postgresql 8.2.0)

CFLAGS=(default)

tps = 527.300454 (including connections establishing)
tps = 528.898671 (excluding connections establishing)

tps = 517.874347 (including connections establishing)
tps = 519.404970 (excluding connections establishing)

tps = 534.934905 (including connections establishing)
tps = 536.562150 (excluding connections establishing)

CFLAGS=686

tps = 525.179375 (including connections establishing)
tps = 526.801278 (excluding connections establishing)

tps = 557.821136 (including connections establishing)
tps = 559.602414 (excluding connections establishing)

tps = 532.142941 (including connections establishing)
tps = 533.740209 (excluding connections establishing)


CFLAGS=pentium4

tps = 518.869825 (including connections establishing)
tps = 520.394341 (excluding connections establishing)

tps = 537.759982 (including connections establishing)
tps = 539.402547 (excluding connections establishing)

tps = 538.522198 (including connections establishing)
tps = 540.200458 (excluding connections establishing)

Mike Stone

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Christopher Browne

After a long battle with technology, [EMAIL PROTECTED] (Michael Stone), an 
earthling, wrote:
 [1] I will say that I have never seen a realistic benchmark of
 general code where the compiler flags made a statistically
 significant difference in the runtime.

When we were initially trying out PostgreSQL on AIX, I did some
(limited, admittedly) comparisons between behaviour when compiled
using GCC 3.something, VisualAge C, and VisualAge C++.  I did some
modifications of -O values; I didn't find differences amounting to
more than a percent or two between any of the combinations.

If there's to be a difference, anywhere, it ought to have figured
pretty prominently between a pretty elderly GCC version and IBM's top
of the line PowerPC compiler.
-- 
output = reverse(gro.mca @ enworbbc)
http://cbbrowne.com/info/linuxdistributions.html
I  doubt this language  difference would  confuse anybody  unless you
were providing instructions on the insertion of a caffeine enema.
-- On alt.coffee

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Greg Smith


On Mon, 11 Dec 2006, Michael Stone wrote:

Can anyone else reproduce these results? I'm on similar hardware (2.5GHz P4, 
1.5G RAM)...


There are two likely candidates for why Daniel's P4 3.0GHz significantly 
outperforms your 2.5GHz system.


1) Most 2.5GHZ P4 processors use a 533MHz front-side bus (FSB); most 
3.0GHZ ones use an 800MHz bus.


2) A typical motherboard paired with a 2.5GHz era processor will have a 
single-channel memory interface; a typical 3.0GHZ era board supports 
dual-channel DDR.


These changes could easily explain the magnitude of difference in results 
you're seeing, expecially when combined with a 20% greater raw CPU clock.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] New to PostgreSQL, performance considerations