Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-04-24 Thread Bruce Momjian

Added to TODO:

   o Impove COPY performance

 http://archives.postgresql.org/pgsql-hackers/2008-02/msg00954.php


---

Heikki Linnakangas wrote:
 Bruce Momjian wrote:
  Heikki, are there any TODO items here?
 
 Hmm. You could put an entry there for profile and optimize COPY, with 
 the below list of ideas as a starting point. It's more about profiling 
 and performance testing than coding.
 
  ---
  
  Heikki Linnakangas wrote:
  Joshua D. Drake wrote:
  On Sun, 24 Feb 2008 00:43:18 +
  Heikki Linnakangas [EMAIL PROTECTED] wrote:
   
  Incidentally, I've been working on a patch to speed up CopyReadLine.
  I was going to run some more tests first, but since we're talking
  about it, I guess I should just post the patch. I'll post to
  pgsql-patches shortly.
  On your post to patches you mentioned only about a 5% improvement.
  Don't get me wrong, 5% is 5% and I respect it greatly but as far as I
  can tell we are about 300% behind the curve.
  Yeah. Looking at the profile, the time is spent really all over the 
  place. There's no one clear bottleneck to focus on. I think we could do 
  a few more ~5% improvements, but
 
  At some point, I think we have to bite the bullet and find a way to use 
  multiple CPUs for a single load. I don't have any good ideas or plans 
  for that, but hopefully someone does.
 
  My tests were maxing out at ~22G an hour. On hardware that can do 
  in 360G an hour and that is assuming  50% overhead between OS, libs,
  etc... I have no choice but to conclude we have a much, much deeper and
  fundamental issue going on with COPY. I am inspired by Itagaki Takahiro
  and his batch update of indexes which should help greatly overall but
  doesn't help my specific issue.
  Yep, the index build idea is an I/O improvement, not a CPU one.
 
  Forgive me for not being a C programmer and Alvaro is not online so I
  would vet these questions with him first.
 
  I know that copy is in theory a bulk loader but, when performing the
  readline how many lines are we reading?  Do we read up to 8192? Or do we
  shove in say 8megs of data before we invoke DoCopy?
  We read 64 KB at a time, and then CopyReadLineText returns one line at a 
  time from that buffer.
 
  Looking at your profile more, and after the memchr patch, the raw input 
  side of copy, consisting of reading the data from disk in 64KB blocks, 
  splitting that into lines, and splitting lines into columns, still takes 
  ~20% of the CPU time. I suspect CopyReadAttributesText is the biggest 
  culprit there.
 
  You could avoid the ~8% spent in XLogInsert in PostgreSQL 8.3, by 
  creating the table (or truncating it) in the same transaction with the 
  COPY.
 
  After that, heap_formtuple is high on the list. I wonder if we could do 
  something about that.
 
  I am just curious if there is some simple low hanging fruit that is
  possibly missing.
  I don't see any piece of code that's causing problems. We can shave off 
  a few percents here and there I think, but don't expect a 300% 
  improvement anytime soon. A few ideas I've thought about are:
 
  - use a specialized version of strtol, for base 10. That won't help on 
  your table, but I've seen strtol consume a significant amount of time on 
  tables with numeric/integer columns.
 
  - Instead of pallocing and memcpying the text fields, leave a little bit 
  of room between fields in the attribute_buf, and write the varlen header 
  there directly. This might help you since your table has a lot of text 
  fields.
 
  - Instead of the normal PG function calling conventions, provide 
  specialized fastpath input functions for the most common data types. 
  InputFunctionCall consumed 4.5% of the CPU time in your profile.
 
  - Use a simpler memory context implementation, that's like a stack with 
  no pfree support, for the per-tuple context.
 
  -- 
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com
 
  ---(end of broadcast)---
  TIP 1: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly
  
 
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-03-06 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Mon, 3 Mar 2008 20:40:02 -0500 (EST)
Bruce Momjian [EMAIL PROTECTED] wrote:

As a closing note on this from my side. When I ran the restore with 90
connections (as a stress test) it only took 20 minutes longer than the
24 connections.

Joshua D. Drake



- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFH0E3IATb/zqfZUUQRAjn3AJ9d0IpBn9pd16AYFWT3qs2frS51lQCfcOIN
uQvwc263TjLMZOYzx7Ktbpo=
=6xOV
-END PGP SIGNATURE-

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-hackers


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-03-04 Thread Heikki Linnakangas

Bruce Momjian wrote:

Heikki, are there any TODO items here?


Hmm. You could put an entry there for profile and optimize COPY, with 
the below list of ideas as a starting point. It's more about profiling 
and performance testing than coding.



---

Heikki Linnakangas wrote:

Joshua D. Drake wrote:

On Sun, 24 Feb 2008 00:43:18 +
Heikki Linnakangas [EMAIL PROTECTED] wrote:
 

Incidentally, I've been working on a patch to speed up CopyReadLine.
I was going to run some more tests first, but since we're talking
about it, I guess I should just post the patch. I'll post to
pgsql-patches shortly.

On your post to patches you mentioned only about a 5% improvement.
Don't get me wrong, 5% is 5% and I respect it greatly but as far as I
can tell we are about 300% behind the curve.
Yeah. Looking at the profile, the time is spent really all over the 
place. There's no one clear bottleneck to focus on. I think we could do 
a few more ~5% improvements, but


At some point, I think we have to bite the bullet and find a way to use 
multiple CPUs for a single load. I don't have any good ideas or plans 
for that, but hopefully someone does.


My tests were maxing out at ~22G an hour. On hardware that can do 
in 360G an hour and that is assuming  50% overhead between OS, libs,

etc... I have no choice but to conclude we have a much, much deeper and
fundamental issue going on with COPY. I am inspired by Itagaki Takahiro
and his batch update of indexes which should help greatly overall but
doesn't help my specific issue.

Yep, the index build idea is an I/O improvement, not a CPU one.


Forgive me for not being a C programmer and Alvaro is not online so I
would vet these questions with him first.

I know that copy is in theory a bulk loader but, when performing the
readline how many lines are we reading?  Do we read up to 8192? Or do we
shove in say 8megs of data before we invoke DoCopy?
We read 64 KB at a time, and then CopyReadLineText returns one line at a 
time from that buffer.


Looking at your profile more, and after the memchr patch, the raw input 
side of copy, consisting of reading the data from disk in 64KB blocks, 
splitting that into lines, and splitting lines into columns, still takes 
~20% of the CPU time. I suspect CopyReadAttributesText is the biggest 
culprit there.


You could avoid the ~8% spent in XLogInsert in PostgreSQL 8.3, by 
creating the table (or truncating it) in the same transaction with the COPY.


After that, heap_formtuple is high on the list. I wonder if we could do 
something about that.



I am just curious if there is some simple low hanging fruit that is
possibly missing.
I don't see any piece of code that's causing problems. We can shave off 
a few percents here and there I think, but don't expect a 300% 
improvement anytime soon. A few ideas I've thought about are:


- use a specialized version of strtol, for base 10. That won't help on 
your table, but I've seen strtol consume a significant amount of time on 
tables with numeric/integer columns.


- Instead of pallocing and memcpying the text fields, leave a little bit 
of room between fields in the attribute_buf, and write the varlen header 
there directly. This might help you since your table has a lot of text 
fields.


- Instead of the normal PG function calling conventions, provide 
specialized fastpath input functions for the most common data types. 
InputFunctionCall consumed 4.5% of the CPU time in your profile.


- Use a simpler memory context implementation, that's like a stack with 
no pfree support, for the per-tuple context.


--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly





--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your Subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-hackers


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-03-03 Thread Bruce Momjian

Heikki, are there any TODO items here?

---

Heikki Linnakangas wrote:
 Joshua D. Drake wrote:
  On Sun, 24 Feb 2008 00:43:18 +
  Heikki Linnakangas [EMAIL PROTECTED] wrote:
   
  Incidentally, I've been working on a patch to speed up CopyReadLine.
  I was going to run some more tests first, but since we're talking
  about it, I guess I should just post the patch. I'll post to
  pgsql-patches shortly.
  
  On your post to patches you mentioned only about a 5% improvement.
  Don't get me wrong, 5% is 5% and I respect it greatly but as far as I
  can tell we are about 300% behind the curve.
 
 Yeah. Looking at the profile, the time is spent really all over the 
 place. There's no one clear bottleneck to focus on. I think we could do 
 a few more ~5% improvements, but
 
 At some point, I think we have to bite the bullet and find a way to use 
 multiple CPUs for a single load. I don't have any good ideas or plans 
 for that, but hopefully someone does.
 
  My tests were maxing out at ~22G an hour. On hardware that can do 
  in 360G an hour and that is assuming  50% overhead between OS, libs,
  etc... I have no choice but to conclude we have a much, much deeper and
  fundamental issue going on with COPY. I am inspired by Itagaki Takahiro
  and his batch update of indexes which should help greatly overall but
  doesn't help my specific issue.
 
 Yep, the index build idea is an I/O improvement, not a CPU one.
 
  Forgive me for not being a C programmer and Alvaro is not online so I
  would vet these questions with him first.
  
  I know that copy is in theory a bulk loader but, when performing the
  readline how many lines are we reading?  Do we read up to 8192? Or do we
  shove in say 8megs of data before we invoke DoCopy?
 
 We read 64 KB at a time, and then CopyReadLineText returns one line at a 
 time from that buffer.
 
 Looking at your profile more, and after the memchr patch, the raw input 
 side of copy, consisting of reading the data from disk in 64KB blocks, 
 splitting that into lines, and splitting lines into columns, still takes 
 ~20% of the CPU time. I suspect CopyReadAttributesText is the biggest 
 culprit there.
 
 You could avoid the ~8% spent in XLogInsert in PostgreSQL 8.3, by 
 creating the table (or truncating it) in the same transaction with the COPY.
 
 After that, heap_formtuple is high on the list. I wonder if we could do 
 something about that.
 
  I am just curious if there is some simple low hanging fruit that is
  possibly missing.
 
 I don't see any piece of code that's causing problems. We can shave off 
 a few percents here and there I think, but don't expect a 300% 
 improvement anytime soon. A few ideas I've thought about are:
 
 - use a specialized version of strtol, for base 10. That won't help on 
 your table, but I've seen strtol consume a significant amount of time on 
 tables with numeric/integer columns.
 
 - Instead of pallocing and memcpying the text fields, leave a little bit 
 of room between fields in the attribute_buf, and write the varlen header 
 there directly. This might help you since your table has a lot of text 
 fields.
 
 - Instead of the normal PG function calling conventions, provide 
 specialized fastpath input functions for the most common data types. 
 InputFunctionCall consumed 4.5% of the CPU time in your profile.
 
 - Use a simpler memory context implementation, that's like a stack with 
 no pfree support, for the per-tuple context.
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your Subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-hackers


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-26 Thread Joshua D. Drake
On Mon, 25 Feb 2008 22:25:09 -0800
Joshua D. Drake [EMAIL PROTECTED] wrote:

24 connections: 3.5 hours
12 connections: 4.5 hours



-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Jeff Davis
On Sun, 2008-02-24 at 09:47 -0800, Joshua D. Drake wrote:
 A less hacker and more DBA bottleneck will be to limit the number of 
 backends being created for restore. We don't really want to have more 
 than one backend per CPU, otherwise we just start switching.

Are you sure that it would always be a loss?

If there is any significant I/O latency for a single backend, it seems
like a context switch could be a win for processor utilization. It might
not be a win overall, but at least potentially a win.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Jeff Davis
On Mon, 2008-02-25 at 12:05 -0800, Joshua D. Drake wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On Mon, 25 Feb 2008 11:36:56 -0800
 Jeff Davis [EMAIL PROTECTED] wrote:
 
  
  If there is any significant I/O latency for a single backend, it seems
  like a context switch could be a win for processor utilization. It
  might not be a win overall, but at least potentially a win.
 
 Do we want a 20% potential win or an 80% potential win?
 
 I would personally rather keep it simple, hard core, and data shoving
 as possible without any issue with scheduling etc..
 

Just a thought. After it's actually implemented it won't be hard to see
if it's a win.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Mon, 25 Feb 2008 11:36:56 -0800
Jeff Davis [EMAIL PROTECTED] wrote:

 
 If there is any significant I/O latency for a single backend, it seems
 like a context switch could be a win for processor utilization. It
 might not be a win overall, but at least potentially a win.

Do we want a 20% potential win or an 80% potential win?

I would personally rather keep it simple, hard core, and data shoving
as possible without any issue with scheduling etc..

Sincerely,

Joshua D. Drake


- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHwx+EATb/zqfZUUQRAvOgAJ4vWCO74XzXy9Pbzqz3otWoqKI3HgCfRwUI
ZLd0SOgf5jnInZvOxCS+iNU=
=Syk1
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Mon, 25 Feb 2008 12:17:10 -0800
Jeff Davis [EMAIL PROTECTED] wrote:


  I would personally rather keep it simple, hard core, and data
  shoving as possible without any issue with scheduling etc..
  
 
 Just a thought. After it's actually implemented it won't be hard to
 see if it's a win.

Yep :) but as a note:

I am currently testing on the data set that is giving us all these
issues. Previously we were pushing ~ 22G an hour over a single thread.
I am currently pushing ~ 28G every 16 minutes over 6 threads.

With 30-40% IO wait.

Sincerely,

Joshua D. Drake


- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHwyToATb/zqfZUUQRAsDzAJ0cZEujQIW1SQ9Wd1nd1jWRVWy09ACgpryh
SJENqCnmwKoSMF5fSHBRtsg=
=hVeo
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Jeff Davis
On Mon, 2008-02-25 at 12:28 -0800, Joshua D. Drake wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On Mon, 25 Feb 2008 12:17:10 -0800
 Jeff Davis [EMAIL PROTECTED] wrote:
 
 
   I would personally rather keep it simple, hard core, and data
   shoving as possible without any issue with scheduling etc..
   
  
  Just a thought. After it's actually implemented it won't be hard to
  see if it's a win.
 
 Yep :) but as a note:
 
 I am currently testing on the data set that is giving us all these
 issues. Previously we were pushing ~ 22G an hour over a single thread.
 I am currently pushing ~ 28G every 16 minutes over 6 threads.
 
 With 30-40% IO wait.

That begs the question: what about 12 threads?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Mon, 25 Feb 2008 14:05:58 -0800
Jeff Davis [EMAIL PROTECTED] wrote:

  Yep :) but as a note:
  
  I am currently testing on the data set that is giving us all these
  issues. Previously we were pushing ~ 22G an hour over a single
  thread. I am currently pushing ~ 28G every 16 minutes over 6
  threads.
  
  With 30-40% IO wait.
 
 That begs the question: what about 12 threads?

That is exactly what Alvaro said :P, let me finish this one first.

Joshua D. Drake



- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHwz0EATb/zqfZUUQRAo1HAJkB58g/gkCWPTqSqjNzrcxGZ4eiNACgi7Va
gZGMboxrPwV4euv67anSyfo=
=9S0a
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Joshua D. Drake
On Mon, 25 Feb 2008 14:11:16 -0800
Joshua D. Drake [EMAIL PROTECTED] wrote:

O.k. with 24 connections 3.5 hours. Testing with 12 now. 6 never
finished due to a bug.

Observations:

As simple as this solution is, it is not eloquent nor is it smart.
Using this method, if you have a 100GB table (which is very common)
you are still bound in a bad way by a single connection and you are
holding up everyone else.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Jeff Davis

On Mon, 2008-02-25 at 21:18 -0800, Joshua D. Drake wrote:
 As simple as this solution is, it is not eloquent nor is it smart.
 Using this method, if you have a 100GB table (which is very common)
 you are still bound in a bad way by a single connection and you are
 holding up everyone else.

In your case I can see your point. 

For me it would still be very helpful. If that 100GB table has several
indexes, particularly on localized text, that can take a lot of
processor time to rebuild (even for a substantially smaller dataset,
like in the 7 hour restore thread). It seems like a no-brainer to be
able to utilize all available cores.

I think one big improvement is to break it into steps as Simon suggests
here:
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00205.php

and my idea to further break it down:
http://archives.postgresql.org/pgsql-hackers/2008-02/msg00699.php

I think we should consider all of these pg_restore improvements, because
they're merely simplifying the DBA's job. Currently, to get these
benefits, I have to organize and parallelize the restore manually.

Actually, the tests you're running are helping me as much as any
pg_restore changes might anyway. I don't mind a small amount of extra
work to dump/restore, but other users might get a bad impression of
PostgreSQL if they don't know how to make it perform to their
expectations.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-25 Thread Joshua D. Drake
On Mon, 25 Feb 2008 22:29:32 -0800
Jeff Davis [EMAIL PROTECTED] wrote:

 For me it would still be very helpful. If that 100GB table has several
 indexes, particularly on localized text, that can take a lot of
 processor time to rebuild (even for a substantially smaller dataset,
 like in the 7 hour restore thread). It seems like a no-brainer to be
 able to utilize all available cores.

Oh, I agree that we should be using all cores. I would argue that we
should have been doing that for years now but more importantly to me is
that pg_restore even single threaded is slow. 

 
 I think we should consider all of these pg_restore improvements,
 because they're merely simplifying the DBA's job. Currently, to get
 these benefits, I have to organize and parallelize the restore
 manually.

Definitely.

 
 Actually, the tests you're running are helping me as much as any
 pg_restore changes might anyway. I don't mind a small amount of extra
 work to dump/restore, but other users might get a bad impression of
 PostgreSQL if they don't know how to make it perform to their
 expectations.

Certainly but having to hand roll this is bad. It presents us in a
decidedly hackish light. 

Sincerely,

Joshua D. Drake

 
 Regards,
   Jeff Davis
 


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Heikki Linnakangas

Joshua D. Drake wrote:

On Sun, 24 Feb 2008 00:43:18 +
Heikki Linnakangas [EMAIL PROTECTED] wrote:
 

Incidentally, I've been working on a patch to speed up CopyReadLine.
I was going to run some more tests first, but since we're talking
about it, I guess I should just post the patch. I'll post to
pgsql-patches shortly.


On your post to patches you mentioned only about a 5% improvement.
Don't get me wrong, 5% is 5% and I respect it greatly but as far as I
can tell we are about 300% behind the curve.


Yeah. Looking at the profile, the time is spent really all over the 
place. There's no one clear bottleneck to focus on. I think we could do 
a few more ~5% improvements, but


At some point, I think we have to bite the bullet and find a way to use 
multiple CPUs for a single load. I don't have any good ideas or plans 
for that, but hopefully someone does.


My tests were maxing out at ~22G an hour. On hardware that can do 
in 360G an hour and that is assuming  50% overhead between OS, libs,

etc... I have no choice but to conclude we have a much, much deeper and
fundamental issue going on with COPY. I am inspired by Itagaki Takahiro
and his batch update of indexes which should help greatly overall but
doesn't help my specific issue.


Yep, the index build idea is an I/O improvement, not a CPU one.


Forgive me for not being a C programmer and Alvaro is not online so I
would vet these questions with him first.

I know that copy is in theory a bulk loader but, when performing the
readline how many lines are we reading?  Do we read up to 8192? Or do we
shove in say 8megs of data before we invoke DoCopy?


We read 64 KB at a time, and then CopyReadLineText returns one line at a 
time from that buffer.


Looking at your profile more, and after the memchr patch, the raw input 
side of copy, consisting of reading the data from disk in 64KB blocks, 
splitting that into lines, and splitting lines into columns, still takes 
~20% of the CPU time. I suspect CopyReadAttributesText is the biggest 
culprit there.


You could avoid the ~8% spent in XLogInsert in PostgreSQL 8.3, by 
creating the table (or truncating it) in the same transaction with the COPY.


After that, heap_formtuple is high on the list. I wonder if we could do 
something about that.



I am just curious if there is some simple low hanging fruit that is
possibly missing.


I don't see any piece of code that's causing problems. We can shave off 
a few percents here and there I think, but don't expect a 300% 
improvement anytime soon. A few ideas I've thought about are:


- use a specialized version of strtol, for base 10. That won't help on 
your table, but I've seen strtol consume a significant amount of time on 
tables with numeric/integer columns.


- Instead of pallocing and memcpying the text fields, leave a little bit 
of room between fields in the attribute_buf, and write the varlen header 
there directly. This might help you since your table has a lot of text 
fields.


- Instead of the normal PG function calling conventions, provide 
specialized fastpath input functions for the most common data types. 
InputFunctionCall consumed 4.5% of the CPU time in your profile.


- Use a simpler memory context implementation, that's like a stack with 
no pfree support, for the per-tuple context.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
I don't see any piece of code that's causing problems. 


Meant to say: I don't see any *single* piece of code that's causing the 
problems...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread andy

Heikki Linnakangas wrote:

Joshua D. Drake wrote:

On Sun, 24 Feb 2008 00:43:18 +
Heikki Linnakangas [EMAIL PROTECTED] wrote:
 

Incidentally, I've been working on a patch to speed up CopyReadLine.
I was going to run some more tests first, but since we're talking
about it, I guess I should just post the patch. I'll post to
pgsql-patches shortly.


On your post to patches you mentioned only about a 5% improvement.
Don't get me wrong, 5% is 5% and I respect it greatly but as far as I
can tell we are about 300% behind the curve.


Yeah. Looking at the profile, the time is spent really all over the 
place. There's no one clear bottleneck to focus on. I think we could do 
a few more ~5% improvements, but


At some point, I think we have to bite the bullet and find a way to use 
multiple CPUs for a single load. I don't have any good ideas or plans 
for that, but hopefully someone does.




There was talk elsewhere about making pg_dump/restore smarter.  It could 
create tables, then COPY, and create the indexes last.  Add to that 
pg_restore using multiple connections and you'd have it.  One connection 
could do a COPY, then a second connection could be created to start the 
CREATE INDEX's for that table, while the first connection went on to 
COPY the next table.


Or something like that...

-Andy

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 At some point, I think we have to bite the bullet and find a way to use 
 multiple CPUs for a single load. I don't have any good ideas or plans 
 for that, but hopefully someone does.

As already mentioned upthread, we could do that today, with zero backend
changes, by making pg_restore drive multiple sessions.  Now there are
scenarios where this wouldn't help too much --- eg, a database with only
one enormous table.  We couldn't parallelize the loading of that table,
although we could parallelize creation of its indexes.  But for an
example such as JD is complaining about, we ought to have no trouble
thrashing his disks into the ground ;-)

What you would need for this is an -Fc or -Ft dump, because a plain
script output from pg_dump doesn't carry any dependency information,
much less any index of where in the file different bits of data are.
Just armwaving, I envision a multiprocess pg_restore like this:

* one controller process that tracks the not-yet-loaded
  TOC items in the dump
* N child processes that just transmit a selected TOC item
  to a connected backend, and then ask the controller
  what to do next

The controller would need to use the dependency information to avoid,
eg, handing out a CREATE INDEX command before the parent table was
created and loaded.

One issue is that this couldn't use -1 single-transaction restoring,
since obviously each child would need its own transaction, and
furthermore would have to commit before going back to the controller
for more work (since dependent TOC items might well get loaded by
a different child later).  That defeats a couple of optimizations that
Simon put in recently.  The one for no XLOG during COPY is not too
hard to see how to re-enable, but I'm not sure what else there was.

Most likely, the bottleneck with this sort of thing would be multiple
parallel reads from the pg_dump archive file.  Possibly the controller
process could be taught to schedule COPY and CREATE INDEX operations
so that not too many processes are trying to read lots of archive
data at the same time.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Joshua D. Drake

Heikki Linnakangas wrote:

Joshua D. Drake wrote:

On Sun, 24 Feb 2008 00:43:18 +
Heikki Linnakangas [EMAIL PROTECTED] wrote:



I know that copy is in theory a bulk loader but, when performing the
readline how many lines are we reading?  Do we read up to 8192? Or do we
shove in say 8megs of data before we invoke DoCopy?


We read 64 KB at a time, and then CopyReadLineText returns one line at a 
time from that buffer.


O.k. I am sure I am oversimplifying things but why are we returning one 
line at a time? That reads expensive to me. Just following the general, 
don't do inserts one at a time, do them in batch idea for example.


I would also question the 64KB at a time. Why not a 1024KB (arbitrary) 
at a time? Is it a resource issue? In the old days when we actually had 
people trying to run postgresql on 128 and 256 megs of ram, o.k. but now?


Looking at your profile more, and after the memchr patch, the raw input 
side of copy, consisting of reading the data from disk in 64KB blocks, 
splitting that into lines, and splitting lines into columns, still takes 
~20% of the CPU time. I suspect CopyReadAttributesText is the biggest 
culprit there.


In reading:

http://doxygen.postgresql.org/backend_2commands_2copy_8c-source.html

It looks like CopyReadAttributesText is used as part of the column 
breakup. It also appears that this happens before insert right? So if 
that is the case we are going to pay an additional penalty on the data 
checking.




You could avoid the ~8% spent in XLogInsert in PostgreSQL 8.3, by 
creating the table (or truncating it) in the same transaction with the 
COPY.


In the same transaction? Oh that's interesting. So that might be a TODO 
right there. Change pg_dump so it does:


create,copy,create,copy,index

Instead of:

create,create,copy,copy,index



After that, heap_formtuple is high on the list. I wonder if we could do 
something about that.


Just from the name I am going to guess this is where we turn it into 
something that is actually injected into PostgreSQL.


I don't see any piece of code that's causing problems. We can shave off 
a few percents here and there I think, but don't expect a 300% 
improvement anytime soon. A few ideas I've thought about are:


Well don't get me wrong, I am not expecting miracles here. I am just 
confounded at the complete lack of performance in this arena. I don't 
think a lot of people recognize what a significant issue this is since 
we don't have in place backups.




- use a specialized version of strtol, for base 10. That won't help on 
your table, but I've seen strtol consume a significant amount of time on 
tables with numeric/integer columns.


- Instead of pallocing and memcpying the text fields, leave a little bit 
of room between fields in the attribute_buf, and write the varlen header 
there directly. This might help you since your table has a lot of text 
fields.


- Instead of the normal PG function calling conventions, provide 
specialized fastpath input functions for the most common data types. 
InputFunctionCall consumed 4.5% of the CPU time in your profile.


- Use a simpler memory context implementation, that's like a stack with 
no pfree support, for the per-tuple context.




By my calculations you are presenting a possibility of at least ~ 30% 
improvement. That is significant in my book. Hopefully as we explore 
these options we will find others.


Sincerely,

Joshua D. Drake






---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Hannes Dorbath

Tom Lane wrote:

Greg Smith [EMAIL PROTECTED] writes:

On Fri, 22 Feb 2008, Tom Lane wrote:
Counts are useless here, we need to see the sequence of write locations 
to find out if there's a lot of nonconsecutive writes happening.



How were you planning to analyze the strace output to quantify that?


I didn't really have any preconceived ideas about that.  I just want to
see some raw data to see if something shows up.


Isn't blktrace the tool to get that kind of information? Anyway, as the 
following threads point out the problems seems to be somewhere else..



--
Best regards,
Hannes Dorbath

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Joshua D. Drake

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
At some point, I think we have to bite the bullet and find a way to use 
multiple CPUs for a single load. I don't have any good ideas or plans 
for that, but hopefully someone does.


As already mentioned upthread, we could do that today, with zero backend
changes, by making pg_restore drive multiple sessions.  Now there are
scenarios where this wouldn't help too much --- eg, a database with only
one enormous table.  We couldn't parallelize the loading of that table,
although we could parallelize creation of its indexes.  But for an
example such as JD is complaining about, we ought to have no trouble
thrashing his disks into the ground ;-)


Bring it on ! but I would note that with the current pg_restore I was 
not able to trash my disks. I only used four backends (I could have used 
8) but even with that, I was only doing ~ 45M a second. So if I double, 
I still have bandwidth. It would still be an huge improvement though.




What you would need for this is an -Fc or -Ft dump, because a plain
script output from pg_dump doesn't carry any dependency information,


I would focus on -Fc. With the limitations of -Ft this would be a good 
way to start phasing -Ft out.



much less any index of where in the file different bits of data are.
Just armwaving, I envision a multiprocess pg_restore like this:

* one controller process that tracks the not-yet-loaded
  TOC items in the dump
* N child processes that just transmit a selected TOC item
  to a connected backend, and then ask the controller
  what to do next



Most likely, the bottleneck with this sort of thing would be multiple
parallel reads from the pg_dump archive file.  Possibly the controller
process could be taught to schedule COPY and CREATE INDEX operations
so that not too many processes are trying to read lots of archive
data at the same time.


A less hacker and more DBA bottleneck will be to limit the number of 
backends being created for restore. We don't really want to have more 
than one backend per CPU, otherwise we just start switching.


Sincerely,

Joshua D. Drake

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Andrew Dunstan



Joshua D. Drake wrote:

Heikki Linnakangas wrote:


We read 64 KB at a time, and then CopyReadLineText returns one line 
at a time from that buffer.


O.k. I am sure I am oversimplifying things but why are we returning 
one line at a time? That reads expensive to me. Just following the 
general, don't do inserts one at a time, do them in batch idea for 
example.


Quite simply because one line corresponds to one record. And yes, I 
believe you are oversimplifying, or under several misapprehensions about 
what can be done at this level.




I would also question the 64KB at a time. Why not a 1024KB (arbitrary) 
at a time? Is it a resource issue? In the old days when we actually 
had people trying to run postgresql on 128 and 256 megs of ram, o.k. 
but now?


It would be simple enough to change. Try it and see if it actually makes 
a difference. All you have to change is the define of RAW_BUF_SIZE.




In reading:

http://doxygen.postgresql.org/backend_2commands_2copy_8c-source.html

It looks like CopyReadAttributesText is used as part of the column 
breakup. It also appears that this happens before insert right? So 
if that is the case we are going to pay an additional penalty on the 
data checking.




What? I don't understand what you are talking about.

cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Joshua D. Drake wrote:
 I would also question the 64KB at a time. Why not a 1024KB (arbitrary) 
 at a time? Is it a resource issue? In the old days when we actually 
 had people trying to run postgresql on 128 and 256 megs of ram, o.k. 
 but now?

 It would be simple enough to change. Try it and see if it actually makes 
 a difference. All you have to change is the define of RAW_BUF_SIZE.

Seems unlikely that making it bigger than (a fraction of) L2 cache
would be a smart move.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Joshua D. Drake

Andrew Dunstan wrote:



Joshua D. Drake wrote:

Heikki Linnakangas wrote:


We read 64 KB at a time, and then CopyReadLineText returns one line 
at a time from that buffer.


O.k. I am sure I am oversimplifying things but why are we returning 
one line at a time? That reads expensive to me. Just following the 
general, don't do inserts one at a time, do them in batch idea for 
example.


Quite simply because one line corresponds to one record. And yes, I 
believe you are oversimplifying, or under several misapprehensions about 
what can be done at this level.


Well I find without questioning, I won't get the answers so :).

  It would be simple enough to change. Try it and see if it actually 
makes

a difference. All you have to change is the define of RAW_BUF_SIZE.



Fair enough. May I assume this is the only place I need to change it?

http://doxygen.postgresql.org/backend_2commands_2copy_8c-source.html#l00158


It looks like CopyReadAttributesText is used as part of the column 
breakup. It also appears that this happens before insert right? So 
if that is the case we are going to pay an additional penalty on the 
data checking.




What? I don't understand what you are talking about.


Data checking on insert to the DB itself. I have no doubt that I may be 
wrong on this.


Joshua D. Drake


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-24 Thread Joshua D. Drake

Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:

Joshua D. Drake wrote:
I would also question the 64KB at a time. Why not a 1024KB (arbitrary) 
at a time? Is it a resource issue? In the old days when we actually 
had people trying to run postgresql on 128 and 256 megs of ram, o.k. 
but now?


It would be simple enough to change. Try it and see if it actually makes 
a difference. All you have to change is the define of RAW_BUF_SIZE.


Seems unlikely that making it bigger than (a fraction of) L2 cache
would be a smart move.


O.k. these CPUs have 1meg of L2 so I will try with 512k.

Joshua D. Drake



regards, tom lane




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Joshua D. Drake

Stefan Kaltenbrunner wrote:

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

Yeah, that... is beyond my abilities. Well reading it is anyway. I can
provide any information people want though. Tom? Greg? Andrew? Somebody?
What information do you want from me to help you track this down?


The vmstat output you showed before said user CPU time was only around
12%, which seems to destroy Luke's thesis that data conversion time
is the problem.  IIRC we were speculating that data was being written
in a pattern that required a lot of seeking thus ruining throughput,
but we didn't have any hard evidence of that.  Did you do the strace'ing
I suggested?



Joshua said this is an 8 core box - so 12% would about equal to one core 
being maxed out completely CPU wise which seems to correlate fairly well 
with my experience (COPY being CPU bottlenecked on fast hardware without 
any real culprit to find in profiles).


Right. I mentioned previously that I was able to gain what could be 
considered barely acceptable performance break breaking up a custom dump 
into multiple pieces and restoring across multiple connections.


I have Alvaro reviewing the strace. I can't post it without some cleansing.

Sincerely,

Joshua D. Drake


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Stefan Kaltenbrunner

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

Yeah, that... is beyond my abilities. Well reading it is anyway. I can
provide any information people want though. Tom? Greg? Andrew? Somebody?
What information do you want from me to help you track this down?


The vmstat output you showed before said user CPU time was only around
12%, which seems to destroy Luke's thesis that data conversion time
is the problem.  IIRC we were speculating that data was being written
in a pattern that required a lot of seeking thus ruining throughput,
but we didn't have any hard evidence of that.  Did you do the strace'ing
I suggested?



Joshua said this is an 8 core box - so 12% would about equal to one core 
being maxed out completely CPU wise which seems to correlate fairly well 
with my experience (COPY being CPU bottlenecked on fast hardware without 
any real culprit to find in profiles).




Stefan

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Tom Lane
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The vmstat output you showed before said user CPU time was only around
 12%, which seems to destroy Luke's thesis that data conversion time
 is the problem.

 Joshua said this is an 8 core box - so 12% would about equal to one core 
 being maxed out completely CPU wise

Oh, good point.  That should be fairly easy to confirm with top I
would think.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Joshua D. Drake

Tom Lane wrote:

Stefan Kaltenbrunner [EMAIL PROTECTED] writes:

Tom Lane wrote:

The vmstat output you showed before said user CPU time was only around
12%, which seems to destroy Luke's thesis that data conversion time
is the problem.


Joshua said this is an 8 core box - so 12% would about equal to one core 
being maxed out completely CPU wise


Oh, good point.  That should be fairly easy to confirm with top I
would think.


mpstat -P ALL 5

01:43:33 PM  CPU   %user   %nice%sys %iowait%irq   %soft  %steal 
  %idleintr/s
01:43:38 PM  all   12.700.000.150.050.000.000.00 
  87.10   1322.60
01:43:38 PM0   99.600.000.400.000.000.000.00 
   0.00   1000.40
01:43:38 PM10.200.000.000.400.000.000.00 
  99.40322.00
01:43:38 PM20.000.000.200.000.000.000.00 
  99.80  0.00
01:43:38 PM32.000.000.200.000.000.000.00 
  97.60  0.00
01:43:38 PM40.000.000.200.000.000.000.00 
 100.00  0.00
01:43:38 PM50.000.000.000.000.000.000.00 
 100.00  0.00
01:43:38 PM60.000.000.000.000.000.000.00 
 100.00  0.00
01:43:38 PM70.000.000.200.000.000.000.00 
 100.00  0.00


01:43:38 PM  CPU   %user   %nice%sys %iowait%irq   %soft  %steal 
  %idleintr/s
01:43:43 PM  all   13.100.000.680.000.000.000.00 
  86.22   1328.20
01:43:43 PM0   95.600.004.400.000.000.000.00 
   0.00   1000.40
01:43:43 PM10.000.000.000.000.000.200.00 
 100.00327.60
01:43:43 PM20.000.000.000.000.000.000.00 
 100.00  0.00
01:43:43 PM39.000.001.200.000.000.000.00 
  89.80  0.00
01:43:43 PM40.000.000.000.000.000.000.00 
 100.00  0.00
01:43:43 PM50.000.000.000.000.000.000.00 
 100.00  0.00
01:43:43 PM60.000.000.000.000.000.000.00 
 100.00  0.00
01:43:43 PM70.000.000.000.000.000.000.00 
 100.00  0.00


01:43:43 PM  CPU   %user   %nice%sys %iowait%irq   %soft  %steal 
  %idleintr/s
01:43:48 PM  all   13.480.001.680.000.000.000.00 
  84.85   1345.29
01:43:48 PM0   87.580.00   11.020.000.000.000.00 
   1.40   1002.40
01:43:48 PM10.000.000.000.000.000.000.00 
 100.00343.09
01:43:48 PM20.000.000.000.000.000.000.00 
 100.20  0.00
01:43:48 PM3   20.440.002.200.000.000.000.00 
  77.56  0.00
01:43:48 PM40.000.000.000.000.000.000.00 
 100.20  0.00
01:43:48 PM50.000.000.000.000.000.000.00 
 100.20  0.00
01:43:48 PM60.000.000.000.000.000.000.00 
 100.20  0.00
01:43:48 PM70.000.000.000.000.000.000.00 
 100.20  0.00


iostat -k 5:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  13.480.002.230.070.00   84.23

Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda  36.87 0.00  1443.69  0   7204
sdb   0.60 0.0021.64  0108
sdc   0.00 0.00 0.00  0  0
sdd   0.00 0.00 0.00  0  0
hda   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  13.280.002.151.330.00   83.25

Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda 168.66 0.00  9947.31  0  49836
sdb   0.00 0.00 0.00  0  0
sdc   0.00 0.00 0.00  0  0
sdd   0.00 0.00 0.00  0  0
hda   0.00 0.00 0.00  0  0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  13.570.002.320.050.00   84.06

Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
sda  27.05 0.00  1425.25  0   7112
sdb   0.00 0.00 0.00  0  0
sdc   0.00 0.00 0.00  0  0
sdd   0.00 0.00 0.00  0  0
hda   0.00 0.00 0.00  0  0

 vmstat -adps -S M 5
procs ---memory-- ---swap-- -io --system-- 
-cpu--
 r  b   swpd   free  inact active   si   so  

Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Oh, good point.  That should be fairly easy to confirm with top I
 would think.

 mpstat -P ALL 5

 01:43:38 PM  CPU   %user   %nice%sys %iowait%irq   %soft  %steal 
%idleintr/s
 01:43:43 PM  all   13.100.000.680.000.000.000.00 
86.22   1328.20
 01:43:43 PM0   95.600.004.400.000.000.000.00 
 0.00   1000.40

OK, so it is CPU-bound after all.  Do you have oprofile or anything
equivalent?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Joshua D. Drake

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

Tom Lane wrote:

Oh, good point.  That should be fairly easy to confirm with top I
would think.



mpstat -P ALL 5


01:43:38 PM  CPU   %user   %nice%sys %iowait%irq   %soft  %steal 
   %idleintr/s
01:43:43 PM  all   13.100.000.680.000.000.000.00 
   86.22   1328.20
01:43:43 PM0   95.600.004.400.000.000.000.00 
0.00   1000.40


OK, so it is CPU-bound after all.  Do you have oprofile or anything
equivalent?


I am at your disposal. If you need oprofile tell me the command and I 
will get you the output.


Joshua D. Drake



regards, tom lane




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Joshua D. Drake

Joshua D. Drake wrote:


OK, so it is CPU-bound after all.  Do you have oprofile or anything
equivalent?


I am at your disposal. If you need oprofile tell me the command and I 
will get you the output.


(thanks to stefan for helping me get this output)

opreport -l /usr/bin/postgres
CPU: AMD64 processors, speed 2411.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a 
unit mask of 0x00 (No unit mask) count 10

samples  %symbol name
4765415.5165  DoCopy
3623111.7971  CopyReadLine
25046 8.1551  heap_formtuple
24137 7.8592  XLogInsert
13829 4.5028  InputFunctionCall
9618  3.1317  ParseDateTime
9521  3.1001  DecodeDateTime
9446  3.0757  pg_next_dst_boundary
7651  2.4912  DecodeDate
6948  2.2623  AllocSetAlloc
6290  2.0481  ExecConstraints
5462  1.7785  DecodeNumber
3287  1.0703  pg_atoi
3035  0.9882  hash_search_with_hash_value
2866  0.9332  heap_insert
2786  0.9071  MemoryContextCreate
2771  0.9023  hash_any
2727  0.8879  slot_deform_tuple
2639  0.8593  DecodeTime
2517  0.8196  SearchCatCache
2512  0.8179  LWLockAcquire
2362  0.7691  varchar_input
2273  0.7401  timestamptz_in
2267  0.7382  AllocSetDelete
2203  0.7173  .plt
2031  0.6613  int4in
1936  0.6304  CopyGetData
1865  0.6073  tm2timestamp
1846  0.6011  MemoryContextAllocZero
1753  0.5708  ReadBuffer_common
1732  0.5640  ExecMakeFunctionResultNoSets
1658  0.5399  RelationGetBufferForTuple
1655  0.5389  varcharin
1640  0.5340  MemoryContextAlloc
1496  0.4871  AllocSetFree
1476  0.4806  slot_getattr
1466  0.4773  DetermineTimeZoneOffset
1460  0.4754  timestamp_in
1418  0.4617  AllocSetReset
1403  0.4568  LWLockRelease
1398  0.4552  PageAddItem
1384  0.4506  date2j
1370  0.4461  DecodeTimezone
1112  0.3621  boolin
1074  0.3497  slot_attisnull
1049  0.3416  SPI_connect
990   0.3224  AllocSetContextCreate
986   0.3210  PinBuffer
897   0.2921  pq_getmessage
830   0.2703  ExecQual
830   0.2703  SearchSysCache
830   0.2703  pg_strncasecmp
828   0.2696  SPI_finish
777   0.2530  FunctionCall2
758   0.2468  AdjustTimestampForTypmod
746   0.2429  DirectFunctionCall1
725   0.2361  CopyLoadRawBuf
723   0.2354  FunctionCall1
719   0.2341  MarkBufferDirty
694   0.2260  heap_attisnull
686   0.2234  MemoryContextDelete
677   0.2204  pq_getbytes
670   0.2182  float8in
663   0.2159  resetStringInfo
657   0.2139  heap_getsysattr
619   0.2016  CatalogCacheComputeHashValue
608   0.1980  ReleaseBuffer
608   0.1980  ResourceOwnerEnlargeBuffers
605   0.1970  ExecEvalRelabelType
590   0.1921  ReleaseCatCache
551   0.1794  RelationPutHeapTuple
522   0.1700  string_hash
514   0.1674  pfree
506   0.1648  pq_copymsgbytes
505   0.1644  ResourceOwnerForgetCatCacheRef
486   0.1582  pq_getbyte
483   0.1573  Int64GetDatum
476   0.1550  IsToastNamespace
471   0.1534  CacheInvalidateHeapTuple
469   0.1527  hash_search
468   0.1524  hash_uint32
453   0.1475  MemoryContextDeleteChildren
443   0.1442  PageGetHeapFreeSpace
432   0.1407  _SPI_begin_call
428   0.1394  textout
422   0.1374  ResourceOwnerForgetBuffer
421   0.1371  ExecEvalScalarVar
420   0.1368  LockBuffer
393   0.1280  tag_hash
388   0.1263  hashoid
374   0.1218  PageGetFreeSpace
373   0.1215  Float8GetDatum
362   0.1179  UnpinBuffer
355   0.1156  DLMoveToFront
354   0.1153  appendBinaryStringInfo
353   0.1149  string_compare
349   0.1136  isTempToastNamespace
337   0.1097  get_hash_value
335   0.1091  check_stack_depth
331   0.1078  IsSystemRelation
327   0.1065  ExecStoreTuple
322   0.1048  enlargeStringInfo
319   0.1039  BufTableLookup
313   0.1019  ItemPointerEquals
311   0.1013  oideq
308   0.1003  BufferGetBlockNumber
297   0.0967  UnlockReleaseBuffer
291   0.0948  MemoryContextReset
284   0.0925  dt2local
280   0.0912  GetCurrentTransactionIdIfAny
280   0.0912  ReleaseSysCache
271   0.0882  BufTableHashCode
264   0.0860  ResourceOwnerEnlargeCatCacheRefs
259   0.0843  GetCurrentTransactionId
259   0.0843  pg_detoast_datum_packed
252   0.0821  pgstat_count_heap_insert
239   0.0778  ReadBuffer
222   0.0723  IsSystemNamespace
220   0.0716  LockAcquire
219   0.0713  ResourceOwnerRememberCatCacheRef
201   0.0654  GetDatabaseEncoding
182   0.0593  ExecARInsertTriggers
171   0.0557  ResourceOwnerRememberBuffer
157   0.0511  StartBufferIO
147   0.0479  XLogWrite
127   0.0414  AdvanceXLInsertBuffer
101   0.0329  SyncOneBuffer
830.0270  mdextend
790.0257  LockRelease
650.0212  PinBuffer_Locked
640.0208  FileWrite
640.0208  ForwardFsyncRequest
63 

Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 opreport -l /usr/bin/postgres
 CPU: AMD64 processors, speed 2411.14 MHz (estimated)
 Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a 
 unit mask of 0x00 (No unit mask) count 10
 samples  %symbol name
 4765415.5165  DoCopy
 3623111.7971  CopyReadLine
 25046 8.1551  heap_formtuple
 24137 7.8592  XLogInsert
 13829 4.5028  InputFunctionCall
 9618  3.1317  ParseDateTime
 9521  3.1001  DecodeDateTime
 9446  3.0757  pg_next_dst_boundary
 7651  2.4912  DecodeDate
 6948  2.2623  AllocSetAlloc
 6290  2.0481  ExecConstraints
 5462  1.7785  DecodeNumber
 3287  1.0703  pg_atoi

Hmm, so what's the declaration of the table you're copying into?
Obviously there's some datetime column(s) in it ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Joshua D. Drake

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

opreport -l /usr/bin/postgres
CPU: AMD64 processors, speed 2411.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a 
unit mask of 0x00 (No unit mask) count 10

samples  %symbol name
4765415.5165  DoCopy
3623111.7971  CopyReadLine
25046 8.1551  heap_formtuple
24137 7.8592  XLogInsert
13829 4.5028  InputFunctionCall
9618  3.1317  ParseDateTime
9521  3.1001  DecodeDateTime
9446  3.0757  pg_next_dst_boundary
7651  2.4912  DecodeDate
6948  2.2623  AllocSetAlloc
6290  2.0481  ExecConstraints
5462  1.7785  DecodeNumber
3287  1.0703  pg_atoi


Hmm, so what's the declaration of the table you're copying into?
Obviously there's some datetime column(s) in it ...


Let me restart the load. I don't have any statement logging turned on. 
However there is a likelyhood of multiple timestamps.


Joshua D. Drake

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Joshua D. Drake

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

opreport -l /usr/bin/postgres
CPU: AMD64 processors, speed 2411.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a 
unit mask of 0x00 (No unit mask) count 10

samples  %symbol name
4765415.5165  DoCopy
3623111.7971  CopyReadLine
25046 8.1551  heap_formtuple
24137 7.8592  XLogInsert
13829 4.5028  InputFunctionCall
9618  3.1317  ParseDateTime
9521  3.1001  DecodeDateTime
9446  3.0757  pg_next_dst_boundary
7651  2.4912  DecodeDate
6948  2.2623  AllocSetAlloc
6290  2.0481  ExecConstraints
5462  1.7785  DecodeNumber
3287  1.0703  pg_atoi


Hmm, so what's the declaration of the table you're copying into?
Obviously there's some datetime column(s) in it ...


Ahh nevermind.. stopping the load gave me the table:
Column |Type | Modifiers
---+-+
 unit_id   | integer | not null
 lead_id   | integer | not null
 timehit   | timestamp with time zone| not null
 followuptime  | timestamp with time zone| not null
 message   | integer | not null default 0
 email | character varying(50)   | not null
 stop_status   | integer | not null
 stop_time | timestamp with time zone|
 note  | character varying(20)   |
 name  | character varying(60)   |
 name2 | character varying(60)   |
 add_method| integer |
 verified  | integer |
 verification_notes| character varying(100)  |
 add_notes | character varying(100)  |
 datum1| character varying(100)  |
 datum2| character varying(100)  |
 datum3| character varying(100)  |
 datum4| character varying(100)  |
 datum5| character varying(100)  |
 datum6| character varying(100)  |
 datum7| character varying(100)  |
 datum8| character varying(100)  |
 datum9| character varying(100)  |
 datum10   | character varying(100)  |
 datum11   | character varying(100)  |
 datum12   | character varying(100)  |
 datum13   | character varying(100)  |
 datum14   | character varying(100)  |
 datum15   | character varying(100)  |
 datum16   | character varying(100)  |
 datum17   | character varying(100)  |
 datum18   | character varying(100)  |
 datum19   | character varying(100)  |
 datum20   | character varying(100)  |
 datum21   | character varying(100)  |
 datum22   | character varying(100)  |
 datum23   | character varying(100)  |
 datum24   | character varying(100)  |
 datum25   | character varying(100)  |
 stop_method   | integer |
 add_url   | character varying(100)  |
 verification_time | timestamp without time zone |
 geog_data_last_update | timestamp without time zone |
 geog_country  | character varying(100)  |
 geog_region   | character varying(100)  |
 geog_city | character varying(100)  |
 geog_postal   | character varying(100)  |
 geog_lat  | double precision|
 geog_lon  | double precision|
 geog_dma_code | integer |
 geog_area_code| integer |
 geog_isp  | character varying(100)  |
 geog_org  | character varying(100)  |
 file_upload_id| integer |

I have 100 of these tables (identical).

Joshua D. Drake




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-23 Thread Joshua D. Drake
On Sun, 24 Feb 2008 00:43:18 +
Heikki Linnakangas [EMAIL PROTECTED] wrote:
 
 Incidentally, I've been working on a patch to speed up CopyReadLine.
 I was going to run some more tests first, but since we're talking
 about it, I guess I should just post the patch. I'll post to
 pgsql-patches shortly.
 

On your post to patches you mentioned only about a 5% improvement.
Don't get me wrong, 5% is 5% and I respect it greatly but as far as I
can tell we are about 300% behind the curve.

My tests were maxing out at ~22G an hour. On hardware that can do 
in 360G an hour and that is assuming  50% overhead between OS, libs,
etc... I have no choice but to conclude we have a much, much deeper and
fundamental issue going on with COPY. I am inspired by Itagaki Takahiro
and his batch update of indexes which should help greatly overall but
doesn't help my specific issue.

Forgive me for not being a C programmer and Alvaro is not online so I
would vet these questions with him first.

I know that copy is in theory a bulk loader but, when performing the
readline how many lines are we reading? Do we read up to 8192? Or do we
shove in say 8megs of data before we invoke DoCopy?

I am just curious if there is some simple low hanging fruit that is
possibly missing.

Sincerely,

Joshua D. Drake




-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-22 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Tue, 19 Feb 2008 18:54:38 -0800
Luke Lonergan [EMAIL PROTECTED] wrote:

 Bah.
 
 It's the stuff in the format translation path and conversion to/from
 datums that is the bottleneck.

So how do we fix that? This performance is pretty anemic.

 
 We sped up COPY TO recently by a factor of 10 using similar
 approaches to what we did for COPY FROM in the past.  There's a
 format conversion that is the culprit.

Any chance that code (in whatever shape) could be released for review
of applicability to core?

 
 We routinely get about 12 MB/s of heap insertion rate per CPU core
 and it's CPU bound.

That's embarrassing on today's hardware don't you think? Or am I missing
something entirely?

 
 You can peek in on what's happening using gstack on Linux, or the gdb
 attach and print stacktrace approach for a crude profile.
 

Yeah, that... is beyond my abilities. Well reading it is anyway. I can
provide any information people want though. Tom? Greg? Andrew? Somebody?
What information do you want from me to help you track this down?

Sincerely,

Joshua D. Drake

- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHvwWuATb/zqfZUUQRAucvAKCAXMlS25oLrTD5bbdIuy6/qq61OgCgn4NG
4I2V5WgSMgr24TAM3CsIO50=
=AaB/
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-22 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Yeah, that... is beyond my abilities. Well reading it is anyway. I can
 provide any information people want though. Tom? Greg? Andrew? Somebody?
 What information do you want from me to help you track this down?

The vmstat output you showed before said user CPU time was only around
12%, which seems to destroy Luke's thesis that data conversion time
is the problem.  IIRC we were speculating that data was being written
in a pattern that required a lot of seeking thus ruining throughput,
but we didn't have any hard evidence of that.  Did you do the strace'ing
I suggested?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-22 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, 22 Feb 2008 12:39:57 -0500
Tom Lane [EMAIL PROTECTED] wrote:

 Joshua D. Drake [EMAIL PROTECTED] writes:
  Yeah, that... is beyond my abilities. Well reading it is anyway. I
  can provide any information people want though. Tom? Greg? Andrew?
  Somebody? What information do you want from me to help you track
  this down?
 
 The vmstat output you showed before said user CPU time was only around
 12%, which seems to destroy Luke's thesis that data conversion time
 is the problem.  IIRC we were speculating that data was being written
 in a pattern that required a lot of seeking thus ruining throughput,
 but we didn't have any hard evidence of that.  Did you do the
 strace'ing I suggested?

Yes, I asked if you wanted counts or the whole output. You didn't
answer :). I provided the counts.

http://archives.postgresql.org/pgsql-hackers/2008-02/msg00359.php

I won't be able to run another test until this weekend :( but anything
I have is yours.

Joshua D. Drake


- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHvwqLATb/zqfZUUQRAl8tAJ0VmS6ffaFBq1a/UUYZxX9F84vcFgCgkhgl
+UF5Zp59H/JIJa1/ZSKvuC4=
=xw/k
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-22 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote:
 IIRC we were speculating that data was being written
 in a pattern that required a lot of seeking thus ruining throughput,
 but we didn't have any hard evidence of that.  Did you do the
 strace'ing I suggested?

 Yes, I asked if you wanted counts or the whole output. You didn't
 answer :). I provided the counts.

Uh, sorry.  Counts are useless here, we need to see the sequence of
write locations to find out if there's a lot of nonconsecutive
writes happening.  BTW, the strace had better run across the whole
PG process tree --- it's quite possible that there's some interaction
between the bgwriter and the backend doing COPY, for instance.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-22 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, 22 Feb 2008 12:54:00 -0500
Tom Lane [EMAIL PROTECTED] wrote:

 Joshua D. Drake [EMAIL PROTECTED] writes:
  Tom Lane [EMAIL PROTECTED] wrote:
  IIRC we were speculating that data was being written
  in a pattern that required a lot of seeking thus ruining
  throughput, but we didn't have any hard evidence of that.  Did you
  do the strace'ing I suggested?
 
  Yes, I asked if you wanted counts or the whole output. You didn't
  answer :). I provided the counts.
 
 Uh, sorry.  Counts are useless here, we need to see the sequence of
 write locations to find out if there's a lot of nonconsecutive
 writes happening.  BTW, the strace had better run across the whole
 PG process tree --- it's quite possible that there's some interaction
 between the bgwriter and the backend doing COPY, for instance.

So you want an strace on postmaster not the connection?

Sincerely,

Joshua D. Drake



- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHvw7DATb/zqfZUUQRAoB0AJ9motp+YpW8A4jlu03VsiaLHRIxAACfQ+tO
ZDXUqBcR5e7xsgPnKju9AZc=
=q3FH
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-22 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote:
 ...  BTW, the strace had better run across the whole
 PG process tree --- it's quite possible that there's some interaction
 between the bgwriter and the backend doing COPY, for instance.

 So you want an strace on postmaster not the connection?

You need to launch the postmaster about like this:

strace -f postmaster ...postmaster args... 2strace.out

regards, tom lane

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-22 Thread Greg Smith

On Fri, 22 Feb 2008, Tom Lane wrote:

Counts are useless here, we need to see the sequence of write locations 
to find out if there's a lot of nonconsecutive writes happening.


How were you planning to analyze the strace output to quantify that?  I 
would think you'd need to parse the file to grab those locations then 
generate statistics about seeking from them.  Those strace files will be 
huge and I'd expect Joshua would need to do something like that on his 
side before the results will be short enough to share.


it's quite possible that there's some interaction between the bgwriter 
and the backend doing COPY


It might be worthwhile to turn off the background writer for these runs, 
both to see if it changes anything and to simplify the output.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-22 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 On Fri, 22 Feb 2008, Tom Lane wrote:
 Counts are useless here, we need to see the sequence of write locations 
 to find out if there's a lot of nonconsecutive writes happening.

 How were you planning to analyze the strace output to quantify that?

I didn't really have any preconceived ideas about that.  I just want to
see some raw data to see if something shows up.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-19 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Thu, 7 Feb 2008 18:40:37 -0800
Joshua D. Drake [EMAIL PROTECTED] wrote:

 On Thu, 7 Feb 2008 18:32:41 -0800
 Joshua D. Drake [EMAIL PROTECTED] wrote:
 
  
  Do you want counts or actual output?
 
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  58.221.597638  33 47795   write
  31.860.874104  23 38024   recvfrom
   4.430.121559  92  1319   munmap
   4.170.114445   3 37906   lseek
   0.700.019341  13  1440   brk
   0.560.015402  12  1316   mmap
   0.040.000991  2638   open
   0.010.000298   838   close
   0.010.000156   438   time
   0.000.08   8 1   semop
 -- --- --- - - 
 100.002.743942127915   total
 

Were their any thoughts on this? I am also finding that backing up this
database is rudely slow with the same type of odd metrics (almost zero
(or zero) iowait). We can saturate a CPU but the CPU is certainly not
pushing the data to disk as fast as it could.

Joshua D. Drake


- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHu0hqATb/zqfZUUQRAm77AJ41yWT4f6UNiMwyHtcq9GASK51uDgCgpKzf
lu14eCbSDl2v2Rsq40zww8E=
=DshT
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-19 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Tue, 19 Feb 2008 13:21:46 -0800
Joshua D. Drake [EMAIL PROTECTED] wrote:

 Were their any thoughts on this? I am also finding that backing up
 this database is rudely slow with the same type of odd metrics
 (almost zero (or zero) iowait). We can saturate a CPU but the CPU is
 certainly not pushing the data to disk as fast as it could.

Further on this. We have tested on RHEL4 and RHEL5. Their are two
machines, each with 32GB of ram. I have four of these in the RHEL 5
machine:

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 65
model name  : Dual-Core AMD Opteron(tm) Processor 8216
stepping: 2
cpu MHz : 2411.132
cache size  : 1024 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm
extapic cr8_legacy bogomips: 4823.59 TLB size: 1024 4K
pages clflush size: 64 cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

Here is the mini specs:

# 5U Q524 - Quad Opteron 24 SCSI
# Tyan S4885G3NR 800 Series CPU Opteron 800
# AMD Opteron 880 - 2.4GHz 2Core x 4
# 32GB - DDR ECC REG 400MHz (16x2GB) x 1  (16 x 2GB 3200 ECC REG Smart
Modular  (32GB) # HD 1: 73GB SCSI 15K RPM x 24
# PCI 1: MegaRaid SCSI 320-2E - 2ch, U320, PCI Express, 128MB
# MegaRaid LSIiTBBU01 Battery - Order #: LSI9
# PCI 2: MegaRaid SCSI 320-2E - 2ch, U320, PCI Express, 128MB
# MegaRaid LSIiTBBU01 Battery - Order #: LSI9
# DVD-ROM/Sony 3.5 Inch Floppy Drive

The configuration is:

/ RAID 1
/ xlogs RAID 1
/data1 10 drives RAID 10
/data2 10 drives RAID 10

The thing that is frustrating here, is it appears that PostgreSQL just
can't utilize the hardware. I *know* it can because we have larger
machines in production that use PostgreSQL happily. However when I have
220G backups taking 8 hours and restores taking 11 hours, I begin to
wonder where the bottleneck is.

Assuming 25 megs a second per drive (random write) on data1 and data2
we should be pushing 250M a second. Just to be insanely conservative
let's cut that in half to 125M per second. That means every 10 seconds
we should do ~ 1G. That means every minute we should to ~ 6G, which
means 360G an hour.

Granted we have index creation and everything else going on but 11
hours and no IO chewing?

As a note these are reproducible on both machines regardless of RHEL5
or RHEL4.

I know there are much stats here but I have provided them in previous
posts on this thread. Perhaps someone sees a red flag in the hardware?


Sincerely,

Joshua D. Drake
- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHu0vyATb/zqfZUUQRAiuYAJ9ut6i/cPv2MYc8RO2+wNw09M5/WwCfUaGY
sAkFt+S14i0kFMn6mz9juBw=
=TNys
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-19 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Tue, 19 Feb 2008 13:36:48 -0800
Joshua D. Drake [EMAIL PROTECTED] wrote:

Hello,

Some more testing on this. This time (using 8.3) I modified my restore
process to use multiple processes by manipulating TOC files. I used
three processes for the data copies, two processes for the pk creation,
two process for normal indexes and two processes for constraint
creation.

The machine averaged 40-60MB/s write versus the pathetic ~ 2/3 MB/s on
a single thread. It had an average I/O wait of  10%. Lastly it
restored 57G of data 1.25 hours. Under my single thread testing 57G
would have taken ~ 3 hours.

I am pretty sure I can make it faster too as I wasn't balancing with
tablespaces nor did I move the xlogs off.

IMO this pretty much proves that we need to seriously look at a multi
connection restores. I can't imagine a situation where we look at
people that have dual and quad cores on their desktops and say... sorry
we can't use that to help you get your data quicker.

Sincerely,

Joshua D. Drake

- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHu4QvATb/zqfZUUQRAhyZAJ9U468fVDm8ww/2TrjDt6gM2wtlhwCffYYq
KJEsKpvRm6efiMQ+uAn/cs4=
=ZEGc
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-19 Thread Luke Lonergan
Bah.

It's the stuff in the format translation path and conversion to/from datums
that is the bottleneck.

We sped up COPY TO recently by a factor of 10 using similar approaches to
what we did for COPY FROM in the past.  There's a format conversion that is
the culprit.

We routinely get about 12 MB/s of heap insertion rate per CPU core and it's
CPU bound.

You can peek in on what's happening using gstack on Linux, or the gdb
attach and print stacktrace approach for a crude profile.

- Luke

On 2/19/08 1:36 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On Tue, 19 Feb 2008 13:21:46 -0800
 Joshua D. Drake [EMAIL PROTECTED] wrote:
 
 Were their any thoughts on this? I am also finding that backing up
 this database is rudely slow with the same type of odd metrics
 (almost zero (or zero) iowait). We can saturate a CPU but the CPU is
 certainly not pushing the data to disk as fast as it could.
 
 Further on this. We have tested on RHEL4 and RHEL5. Their are two
 machines, each with 32GB of ram. I have four of these in the RHEL 5
 machine:
 
 processor   : 0
 vendor_id   : AuthenticAMD
 cpu family  : 15
 model   : 65
 model name  : Dual-Core AMD Opteron(tm) Processor 8216
 stepping: 2
 cpu MHz : 2411.132
 cache size  : 1024 KB
 physical id : 0
 siblings: 2
 core id : 0
 cpu cores   : 2
 fpu : yes
 fpu_exception   : yes
 cpuid level : 1
 wp  : yes
 flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
 mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
 fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm
 extapic cr8_legacy bogomips: 4823.59 TLB size: 1024 4K
 pages clflush size: 64 cache_alignment : 64
 address sizes   : 40 bits physical, 48 bits virtual
 power management: ts fid vid ttp tm stc
 
 Here is the mini specs:
 
 # 5U Q524 - Quad Opteron 24 SCSI
 # Tyan S4885G3NR 800 Series CPU Opteron 800
 # AMD Opteron 880 - 2.4GHz 2Core x 4
 # 32GB - DDR ECC REG 400MHz (16x2GB) x 1  (16 x 2GB 3200 ECC REG Smart
 Modular  (32GB) # HD 1: 73GB SCSI 15K RPM x 24
 # PCI 1: MegaRaid SCSI 320-2E - 2ch, U320, PCI Express, 128MB
 # MegaRaid LSIiTBBU01 Battery - Order #: LSI9
 # PCI 2: MegaRaid SCSI 320-2E - 2ch, U320, PCI Express, 128MB
 # MegaRaid LSIiTBBU01 Battery - Order #: LSI9
 # DVD-ROM/Sony 3.5 Inch Floppy Drive
 
 The configuration is:
 
 / RAID 1
 / xlogs RAID 1
 /data1 10 drives RAID 10
 /data2 10 drives RAID 10
 
 The thing that is frustrating here, is it appears that PostgreSQL just
 can't utilize the hardware. I *know* it can because we have larger
 machines in production that use PostgreSQL happily. However when I have
 220G backups taking 8 hours and restores taking 11 hours, I begin to
 wonder where the bottleneck is.
 
 Assuming 25 megs a second per drive (random write) on data1 and data2
 we should be pushing 250M a second. Just to be insanely conservative
 let's cut that in half to 125M per second. That means every 10 seconds
 we should do ~ 1G. That means every minute we should to ~ 6G, which
 means 360G an hour.
 
 Granted we have index creation and everything else going on but 11
 hours and no IO chewing?
 
 As a note these are reproducible on both machines regardless of RHEL5
 or RHEL4.
 
 I know there are much stats here but I have provided them in previous
 posts on this thread. Perhaps someone sees a red flag in the hardware?
 
 
 Sincerely,
 
 Joshua D. Drake
 - -- 
 The PostgreSQL Company since 1997: http://www.commandprompt.com/
 PostgreSQL Community Conference: http://www.postgresqlconference.org/
 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
 PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit
 
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.6 (GNU/Linux)
 
 iD8DBQFHu0vyATb/zqfZUUQRAiuYAJ9ut6i/cPv2MYc8RO2+wNw09M5/WwCfUaGY
 sAkFt+S14i0kFMn6mz9juBw=
 =TNys
 -END PGP SIGNATURE-
 
---(end of broadcast)---
TIP
 9: In versions below 8.0, the planner will ignore your desire to
   choose
 an index scan if your joining column's datatypes do not
   match



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Andrew Dunstan



Joshua D. Drake wrote:

Hello,

I have been testing a migration for a week now trying to get it into a
reasonable state. This is what we have:

Restore file 220G

8.2.6 and 8.3.0 are configured identically:

shared_buffers = 8000MB
work_mem = 32MB
maintenance_work_mem = 512MB
fsync = off
full_page_writes = off
checkpoint_segments = 300
synchronous_commit = off (8.3)
wal_writer_delay = off (8.3)
autovacuum = off

8.2.6 after 2 hours has restored 41GB.
8.3.0 after 2.5 hours had restored 38GB.

Originally I was thinking that 8.2.6 was stomping 8.3. However I am
thinking that the reduction in the tuple header sizes for 8.3 means
that yes I restored 38GB, it is actually *more* data than 8.2.6. Does
that seem accurate to everyone else? If so what can we do to speed this
up? We are certainly *not* saturating the disk (16 spindles SCSI). 


I am thinking the way we are going to need to do this is to have an
extended outage and write a custom script to do a concurrent dump and
load. (no in this case slony is not an option).

  


I just tested a ~110GB load. On our modest backup server, 8.2 yesterday 
did the data load (i.e. the COPY steps) in 1h57m. Today, 8.3 on 
identical data and settings took 1h42m. Relation size is down by about 
10% too, which is very nice, and probably accounts for the load time 
improvement.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 7 Feb 2008 11:11:32 -0500
Stephen Frost [EMAIL PROTECTED] wrote:

 * Joshua D. Drake ([EMAIL PROTECTED]) wrote:
  Ergghh o.k. I am definitely missing something in the environment. By
  your numbers I should be well over 100GB restored at 2.5 hours. I am
  not. I am only 38GB in.
 
 I'm guessing you've checked this, so don't shoot me if you have,
 but 

Uhh yeah :)

How was the restore file built?  Does it create the
 indexes, primary keys, FKs, whatever, before loading the data?
 That'd slow things down tremendously..  Or if it's creating them
 while loading the data, there would be large pauses while it's
 building the indexes...


Absolutely correct. it would. This dump was created using pg_dumpall.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 07 Feb 2008 11:20:49 -0500
Andrew Dunstan [EMAIL PROTECTED] wrote:

 
 
 Joshua D. Drake wrote:
  On Thu, 07 Feb 2008 09:47:08 -0500
  Andrew Dunstan [EMAIL PROTECTED] wrote:
 

  Restore file 220G
 
  8.2.6 and 8.3.0 are configured identically:
 
  shared_buffers = 8000MB
  work_mem = 32MB
  maintenance_work_mem = 512MB
  fsync = off
  full_page_writes = off
  checkpoint_segments = 300
  synchronous_commit = off (8.3)
  wal_writer_delay = off (8.3)
  autovacuum = off
 
  8.2.6 after 2 hours has restored 41GB.
  8.3.0 after 2.5 hours had restored 38GB.

  I just tested a ~110GB load. On our modest backup server, 8.2
  yesterday did the data load (i.e. the COPY steps) in 1h57m. Today,
  8.3 on identical data and settings took 1h42m. Relation size is
  down by about 10% too, which is very nice, and probably accounts
  for the load time improvement.
  
 
  Ergghh o.k. I am definitely missing something in the environment. By
  your numbers I should be well over 100GB restored at 2.5 hours. I am
  not. I am only 38GB in.
 
  What type of IO do you have on that machine? What type of CPU and
  RAM? 
 
 2Ghz Xeon dual core
 16 Gb RAM
 HW RAID0 data store - not sure how many spindles
 
 I didn't touch any of the 8.3-only config stuff. I have work_mem set
 a lot higher than you do, though - not sure if that makes any
 difference to a bulk load.
 
 This is not a very heavy duty box.

Depends on how you look at it. Restores are traditionally CPU bound if
you have any kind of IO. If you have RAID 0 you have some IO available
to you. It bothers me that the environment I am testing which has 16
spindles (8 per volume) is garnering such miserable performance in
comparison.

In my environment we aren't able to push the I/O at all (wait times 
3%) and yet these are Opterons that we have which have great memory
bandwidth. 8.2 did use more I/O than 8.3 but I think that is just
architectural with the change to 8.3 in general.


Joshua D. Drake
-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Andrew Dunstan



Joshua D. Drake wrote:

On Thu, 07 Feb 2008 09:47:08 -0500
Andrew Dunstan [EMAIL PROTECTED] wrote:

  

Restore file 220G

8.2.6 and 8.3.0 are configured identically:

shared_buffers = 8000MB
work_mem = 32MB
maintenance_work_mem = 512MB
fsync = off
full_page_writes = off
checkpoint_segments = 300
synchronous_commit = off (8.3)
wal_writer_delay = off (8.3)
autovacuum = off

8.2.6 after 2 hours has restored 41GB.
8.3.0 after 2.5 hours had restored 38GB.
  

I just tested a ~110GB load. On our modest backup server, 8.2
yesterday did the data load (i.e. the COPY steps) in 1h57m. Today,
8.3 on identical data and settings took 1h42m. Relation size is down
by about 10% too, which is very nice, and probably accounts for the
load time improvement.



Ergghh o.k. I am definitely missing something in the environment. By
your numbers I should be well over 100GB restored at 2.5 hours. I am
not. I am only 38GB in.

What type of IO do you have on that machine? What type of CPU and RAM?
  


2Ghz Xeon dual core
16 Gb RAM
HW RAID0 data store - not sure how many spindles

I didn't touch any of the 8.3-only config stuff. I have work_mem set a 
lot higher than you do, though - not sure if that makes any difference 
to a bulk load.


This is not a very heavy duty box.

Note: a full restore takes much longer than this - it is almost all 
taken up building indexes. I will be testing that over the weekend, 
probably.


cheers

andrew



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 7 Feb 2008 11:54:07 -0500 (EST)
Greg Smith [EMAIL PROTECTED] wrote:

 On Wed, 6 Feb 2008, Joshua D. Drake wrote:
 
  8.2.6 after 2 hours has restored 41GB.
 
 I've been doing a long bulk import job recently (COPY) on a box with
 more spindles than yours (but with a dumb controller) and I too am
 stuck at that speed; I calculate a consistant 19.6GB/hour. 

Bingo! O.k. that is ~ I am pushing. So I am not alone in this world.

 The
 actual disk I/O is very low as that works out to only 5.7MB/s of
 progress.  Mine was bottlenecked by capacity of a single CPU (4X
 Opteron system).  I think this is one of those barriers it's hard to
 crack without a caching controller, for reasons I haven't figured out
 completely yet.

O.k. we have 8X but it is also Opteron.

 
  I am thinking the way we are going to need to do this is to have an
  extended outage and write a custom script to do a concurrent dump
  and load.
 
 If you look at the -performance list this week I've been yelping
 about this issue and trying to figure out how to setup a useful
 multi-CPU loader for cases like these.

I would think that for the custom format this would be reasonably
simple. It is certainly something we need to figure out. The idea that
PostgreSQL can only restore 20G an hour on a box that can actually
write 120+ Megs a second (random) is pathetic.

I know Luke has mentioned some issues in the past as well around CPU
boundness with an upper limit of 300M/s (IIRC) but even that doesn't
equate to what is going on here as we are not getting anywhere near
that.

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 I know Luke has mentioned some issues in the past as well around CPU
 boundness with an upper limit of 300M/s (IIRC) but even that doesn't
 equate to what is going on here as we are not getting anywhere near
 that.

Some vmstat and oprofile investigation seems called for.  Right now
we're just guessing about what the bottleneck is.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Greg Smith

On Wed, 6 Feb 2008, Joshua D. Drake wrote:


8.2.6 after 2 hours has restored 41GB.


I've been doing a long bulk import job recently (COPY) on a box with more 
spindles than yours (but with a dumb controller) and I too am stuck at 
that speed; I calculate a consistant 19.6GB/hour.  The actual disk I/O is 
very low as that works out to only 5.7MB/s of progress.  Mine was 
bottlenecked by capacity of a single CPU (4X Opteron system).  I think 
this is one of those barriers it's hard to crack without a caching 
controller, for reasons I haven't figured out completely yet.



I am thinking the way we are going to need to do this is to have an
extended outage and write a custom script to do a concurrent dump and
load.


If you look at the -performance list this week I've been yelping about 
this issue and trying to figure out how to setup a useful multi-CPU loader 
for cases like these.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Stephen Frost
* Joshua D. Drake ([EMAIL PROTECTED]) wrote:
 Ergghh o.k. I am definitely missing something in the environment. By
 your numbers I should be well over 100GB restored at 2.5 hours. I am
 not. I am only 38GB in.

I'm guessing you've checked this, so don't shoot me if you have, but
How was the restore file built?  Does it create the indexes, primary
keys, FKs, whatever, before loading the data?  That'd slow things down
tremendously..  Or if it's creating them while loading the data, there
would be large pauses while it's building the indexes...

Enjoy,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 07 Feb 2008 09:47:08 -0500
Andrew Dunstan [EMAIL PROTECTED] wrote:

 
  Restore file 220G
 
  8.2.6 and 8.3.0 are configured identically:
 
  shared_buffers = 8000MB
  work_mem = 32MB
  maintenance_work_mem = 512MB
  fsync = off
  full_page_writes = off
  checkpoint_segments = 300
  synchronous_commit = off (8.3)
  wal_writer_delay = off (8.3)
  autovacuum = off
 
  8.2.6 after 2 hours has restored 41GB.
  8.3.0 after 2.5 hours had restored 38GB.
 
 I just tested a ~110GB load. On our modest backup server, 8.2
 yesterday did the data load (i.e. the COPY steps) in 1h57m. Today,
 8.3 on identical data and settings took 1h42m. Relation size is down
 by about 10% too, which is very nice, and probably accounts for the
 load time improvement.

Ergghh o.k. I am definitely missing something in the environment. By
your numbers I should be well over 100GB restored at 2.5 hours. I am
not. I am only 38GB in.

What type of IO do you have on that machine? What type of CPU and RAM?

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 07 Feb 2008 13:47:22 -0500
Tom Lane [EMAIL PROTECTED] wrote:

 Joshua D. Drake [EMAIL PROTECTED] writes:
  I know Luke has mentioned some issues in the past as well around CPU
  boundness with an upper limit of 300M/s (IIRC) but even that doesn't
  equate to what is going on here as we are not getting anywhere near
  that.
 
 Some vmstat and oprofile investigation seems called for.  Right now
 we're just guessing about what the bottleneck is.

vmstat -adSK 5
procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free  inact active   si   sobibo   incs us sy id wa
 0  2800 15853104 7150484 858170800  1571   3870 0 10  7 71 
12
 2  0800 15668016 7335564 858145600 12982 24382 2699  1421 11  2 82 
 5
 1  0800 15488688 7513572 858301600 13006 24351 2601  1322 10  2 82 
 6
 1  0800 15296024 7704488 858416000 13494 23324 2430  1135 11  2 84 
 3
 1  0800 15017240 7984204 858230800 17219 29910 2632  1427 10  3 86 
 1
 1  2800 14754224 8239160 858334400 13710 41932 2666  1371  8  2 80 
 9
 1  3800 14590216 8390680 859422800  8112 33920 2361  1351  5  2 72 
20
 1  0800 14435000 8562800 858554000  9138 23773 3114  1461  6  2 75 
17
 2  0800 14147160 8848424 858707200 14967 35206 3156  2675 10  4 76 
10
 1  0800 13850560 9141480 858863200 15454 42181 3047  2633 10  4 72 
14
 1  1800 13765824 9212296 860374400  4805 33795 2903  6312  4  9 68 
20
 1  0800 13552088 9436156 859428400 11916 20255 2505  1695  7  3 79 
11
 1  0800 13375152 9612124 859533600 11829 26774 2644  1359 11  2 83 
 4
 1  0800 13301232 9684104 859635600  4747 21765 2173   829 12  1 82 
 5
 1  0800 13294680 9693060 859736000   386  1786 1732   587 12  0 87 
 0
 1  0800 13284448 9702212 859845200   410  1871 1939   878 13  0 86 
 1
 1  0800 13273056 9711180 859945200   386  1870 1996   907 12  0 87 
 0
 1  0800 13235488 9748400 860049200  1824  1862 2128  1107 12  1 86 
 1
 1  0800 13019720 9962004 860226400 17773 19770 2632  1596 11  2 81 
 6
 1  0800 12832288 10143020 860391200 18359 19553 2627  1586 11  2 
78  8
 1  0800 12665952 10313812 860508800 17102 17550 2587  1582 11  2 
83  5
 1  0800 12491488 10486568 860629200 16795 17964 2514  1488 11  2 
85  2
 1  0800 12317592 10659160 860748400 17103 18222 2543  1531 11  2 
82  5
 2  0800 12137840 10836760 860863600 17562 17909 2526  1496 11  2 
82  6

This is 8.2.6, 64bit.

Sincerely,

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote:
 Some vmstat and oprofile investigation seems called for.  Right now
 we're just guessing about what the bottleneck is.

 vmstat -adSK 5
 ...

Looks to me like you're disk-bound (and your kernel is pretty bad at
distinguishing idle from disk wait states).  Plain COPY into an
index-less table ought to be writing fairly linearly, so I'm surprised
you aren't able to get more out of the disk.  Have you got WAL on
a separate spindle from the database?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 07 Feb 2008 19:20:26 -0500
Tom Lane [EMAIL PROTECTED] wrote:
 Looks to me like you're disk-bound (and your kernel is pretty bad at
 distinguishing idle from disk wait states).  Plain COPY into an
 index-less table ought to be writing fairly linearly, so I'm surprised
 you aren't able to get more out of the disk.  Have you got WAL on
 a separate spindle from the database?

I didn't, but do now and am testing. The way this is currently designed
is:

/data1 (8 disks RAID 10)
/data2 (8 disks RAID 10)
/pg_xlogs (2 disks RAID 1)

/data1 is what is primarily written against for the first couple of
hours and then it will switch to data2 because of table spaces. However
either way, we should expect (I would think) to see at least 100 megs a
second on an 8 Disk RAID 10. It is SCSI.


 vmstat -adSK 5
procs ---memory-- ---swap-- -io --system-- cpu
 r  b   swpd   free  inact active   si   sobibo   incs us sy id wa
 1  1160 6534048 23711184 143453200  1571   3870 0 10  7 71 
12
 1  1160 6206560 23898644 157368000 18168 23851 2684  1647 11  3 84 
 2
 1  2160 5887072 24080840 171106000 17655 58395 3322  1762 11  3 74 
11
 2  2160 5546160 24275268 185610800 18681 61834 3302  1838 11  3 74 
11
 1  2160 5363432 24383836 193302800 13043 52353 3247  1578 12  2 74 
11
 1  1160 5295336 24423764 196380800  4997 39778 2865  1310 12  1 75 
12
 1  0160 5284184 24426444 197160800   385  6967 2087   955 13  1 84 
 3
 1  0160 5274608 24429180 197823600   384  1009 2030   958 13  0 87 
 0
 1  0160 5265408 24431976 198468800   410   946 1780   656 12  0 87 
 0
 1  1160 5224912 24453296 200370000  1998 23168 2419  1163 12  1 79 
 8
 1  2160 5005048 24605300 206923600 18270 41342 2942  1721 12  2 76 
10
 1  2160 4816016 24748120 211720800 19014 37165 3014  1702 12  2 72 
15
 1  1160 4618064 24894476 216741200 19091 15226 2517  1626 12  2 83 
 3
 1  1160 4423376 25039072 221832400 18603 10398 2585  1644 12  2 84 
 3
 1  1160 4223568 25187592 226881600 19475 10326 2511  1666 12  2 84 
 2
 1  1160 4023056 25337080 231984400 19450 10601 2462  1543 12  2 84 
 2
 1  1160 3821328 25487664 237032800 19834 10657 2454  1540 12  2 84 
 2
 1  1160 3617616 25640832 242082000 20398 11202 2536  1678 12  2 84 
 2
 1  2160 3410040 25795488 247380000 20090 29063 2756  1680 12  2 76 
10
 1  1160 3189064 25957688 253199600 20321 13745 2523  1595 12  2 83 
 4

Sincerely,

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 7 Feb 2008 16:37:39 -0800
Joshua D. Drake [EMAIL PROTECTED] wrote:

 I didn't, but do now and am testing. The way this is currently
 designed is:
 
 /data1 (8 disks RAID 10)
 /data2 (8 disks RAID 10)
 /pg_xlogs (2 disks RAID 1)
 
 /data1 is what is primarily written against for the first couple of
 hours and then it will switch to data2 because of table spaces.
 However either way, we should expect (I would think) to see at least
 100 megs a second on an 8 Disk RAID 10. It is SCSI.
 

Following up with this... 22G in one hour, with xlogs on a different
partition. Just looking at we are averaging 3-5% IOWait, further we are
only writing ~ 2Megs a second.

This is frustrating.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Following up with this... 22G in one hour, with xlogs on a different
 partition. Just looking at we are averaging 3-5% IOWait, further we are
 only writing ~ 2Megs a second.

It might be interesting to capture some strace output and get a sense of
what's actually being read and written.  It seems to me that the bad
I/O rate has to indicate that we're seeking all over the place.  We
should try to determine exactly what's being touched in what order.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 07 Feb 2008 21:04:44 -0500
Tom Lane [EMAIL PROTECTED] wrote:

 Joshua D. Drake [EMAIL PROTECTED] writes:
  Following up with this... 22G in one hour, with xlogs on a different
  partition. Just looking at we are averaging 3-5% IOWait, further we
  are only writing ~ 2Megs a second.
 
 It might be interesting to capture some strace output and get a sense
 of what's actually being read and written.  It seems to me that the
 bad I/O rate has to indicate that we're seeking all over the place.
 We should try to determine exactly what's being touched in what order.

Do you want counts or actual output?

Joshua D. Drake

 
   regards, tom lane
 
 ---(end of
 broadcast)--- TIP 4: Have you searched our
 list archives?
 
http://archives.postgresql.org
 


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Mark Wong
On Thu, 07 Feb 2008 13:47:22 -0500
Tom Lane [EMAIL PROTECTED] wrote:

 Joshua D. Drake [EMAIL PROTECTED] writes:
  I know Luke has mentioned some issues in the past as well around CPU
  boundness with an upper limit of 300M/s (IIRC) but even that doesn't
  equate to what is going on here as we are not getting anywhere near
  that.
 
 Some vmstat and oprofile investigation seems called for.  Right now
 we're just guessing about what the bottleneck is.

I also recommend looking into partition alignment on the disks.  At
least to quantify how much the storage subsystem suffers from
mis-aligned partitions.  It could be that it doesn't.

Regards,
Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] 8.3 / 8.2.6 restore comparison

2008-02-07 Thread Joshua D. Drake
On Thu, 7 Feb 2008 18:32:41 -0800
Joshua D. Drake [EMAIL PROTECTED] wrote:

 
 Do you want counts or actual output?

% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 58.221.597638  33 47795   write
 31.860.874104  23 38024   recvfrom
  4.430.121559  92  1319   munmap
  4.170.114445   3 37906   lseek
  0.700.019341  13  1440   brk
  0.560.015402  12  1316   mmap
  0.040.000991  2638   open
  0.010.000298   838   close
  0.010.000156   438   time
  0.000.08   8 1   semop
-- --- --- - - 
100.002.743942127915   total




-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



signature.asc
Description: PGP signature