Re: [GENERAL] Space for pg_dump

2009-04-18 Thread Greg Smith

On Tue, 31 Mar 2009, Scott Marlowe wrote:


Sadly, there is no exact maths for such things.  If your database has
tons of indexes and such, it might be 20 or 100 times bigger on disk
than it will be during backup.  If it's all compressible text with few
indexes, it might be a 1:1 or so size.


Since running an entire pg_dump can take forever on a big database, what I 
usually do here is start by running the disk usage query at 
http://wiki.postgresql.org/wiki/Disk_Usage


That lets you better see index vs. table usage.  Then, for the bigger 
tables, I do something like this:


psql -c "COPY (select * from bigtable limit 10) to stdout" | gzip > bigtable.gz
gzip -l bigtable.gz

That lets you sample a decent-sized chunk of the table to figure out 
what compression ratio you're likely to get on the data in there.  Given 
all the table sizes and a compression-ratio estimate, from there you can 
make a fairly accurate guess of what the whole dump is going to take up, 
presuming your data is evenly distributed enough that the first 
records that come back are typical of the whole thing.
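To make the arithmetic concrete, here is a small sketch with made-up numbers: take the on-disk table size from the wiki query, and the compressed/uncompressed byte counts that gzip -l prints for the sample (all values below are hypothetical placeholders, not from any real table):

```shell
# Rough dump-size estimate from a sampled compression ratio.
# gzip -l reports compressed and uncompressed byte counts for the sample;
# scale the on-disk table size by compressed/uncompressed.
table_bytes=10737418240        # on-disk size from the wiki query (example: 10 GiB)
compressed=26214400            # "compressed" column from gzip -l (example)
uncompressed=104857600         # "uncompressed" column from gzip -l (example)
est=$(( table_bytes * compressed / uncompressed ))
echo "estimated dump size: $est bytes"   # prints: estimated dump size: 2684354560 bytes
```

Summing that estimate over the big tables gives a ballpark for the whole dump; note it scales the on-disk size, so heavily bloated or heavily indexed tables will overestimate.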


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Space for pg_dump

2009-04-18 Thread Rainer Bauer
Greg Smith wrote:

On Tue, 31 Mar 2009, Scott Marlowe wrote:

 Sadly, there is no exact maths for such things.  If your database has
 tons of indexes and such, it might be 20 or 100 times bigger on disk
 than it will be during backup.  If it's all compressible text with few
 indexes, it might be a 1:1 or so size.

Since running an entire pg_dump can take forever on a big database, what I 
usually do here is start by running the disk usage query at 
http://wiki.postgresql.org/wiki/Disk_Usage

Interesting. However, the query gives an error if the table name contains
upper-case characters, like in my case tblConnections:

ERROR:  relation "public.tblconnections" does not exist

Replacing all occurrences of relname by '"' || relname || '"' fixes the
error.

Rainer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Community account

2009-04-18 Thread Rainer Bauer
Hello,

I signed up a couple of months ago but never received the confirmation email.

If I try to create a new account with the same email address, but a different
handle, it is denied with this error message: "A user with that email already
exists."

When I click on the 'lost password' link and enter my email address nothing
happens, i.e. no email arrives. In fact I never received *any* email from the
community registration.

How should I proceed? Is there someone I can contact directly regarding this
matter?

Rainer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] number of relations reported by vacuumdb -av

2009-04-18 Thread Irene Barg
Never mind... I found the answer at archives.postgresql.org. The 
answer is 'yes': use the sum of relations from all of the databases. So 
I have reset 'max_fsm_relations' from 1000 to 2000.
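For reference, the resulting postgresql.conf changes would look something like this (values taken from this thread; note the max_fsm_* settings were removed in PostgreSQL 8.4, which tracks free space automatically):

```
# postgresql.conf (PostgreSQL 8.3 and earlier)
max_fsm_pages = 200000        # was 80000; vacuum reported 187792 slots needed
max_fsm_relations = 2000      # was 1000; ~1725 relations exist across all DBs
```

A server restart is required for these settings to take effect.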


Irene Barg wrote:

Hi,

I have a PostgreSQL installation with 8 databases (counting postgres, 
template0, and template1). I run 'vacuumdb -z' daily on 3 of the largest 
user databases. The vacuumdb logs show the 'max_fsm_pages' need to be 
increased with almost each vacuum. So I did a 'vacuumdb -av' on all the 
db's:

INFO:  free space map contains 81016 pages in 100 relations
DETAIL:  A total of 80000 page slots are in use (including overhead).
187792 page slots are required to track all free space.
Current limits are:  80000 page slots, 1000 relations, using 534 KB.
NOTICE:  number of page slots needed (187792) exceeds max_fsm_pages (80000)
HINT:  Consider increasing the configuration parameter "max_fsm_pages" 
to a value over 187792.

VACUUM


I have a couple questions.

1) I can increase 'max_fsm_pages' from 80K to 200K, but why does it keep 
growing?


The main database sees on average 2500-5000 rows inserted per day, and 
deletes are relatively small (although I don't have stats on deletes).


2) How is '100 relations' getting calculated?

If I connect to each one of my 8 db's and do:

select count(*) from pg_class;

The total number of relations is 1725. So shouldn't I increase 
'max_fsm_relations' from 1000 to 1725?
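(One refinement to that count: pg_class also includes views, sequences, and other objects, while the free space map only tracks tables, TOAST tables, and indexes. A closer per-database count, as a sketch, would be:

```sql
-- Count only relations the FSM tracks: ordinary tables, TOAST tables, indexes.
-- Run in each database and sum the results across databases.
SELECT count(*) FROM pg_class WHERE relkind IN ('r', 't', 'i');
```

The sum of those counts is the number to compare against max_fsm_relations.)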


Thank you in advance.

-- irene
-
Irene BargEmail:  ib...@noao.edu
NOAO/AURA Inc. http://www.noao.edu/noao/staff/ibarg
950 N. Cherry Ave.Voice:  520-318-8273
Tucson, AZ  85726 USA   FAX:  520-318-8360
-



--
-
Irene BargEmail:  ib...@noao.edu
NOAO/AURA Inc. http://www.noao.edu/noao/staff/ibarg
950 N. Cherry Ave.Voice:  520-318-8273
Tucson, AZ  85726 USA   FAX:  520-318-8360
-

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Full text search strategy for names

2009-04-18 Thread John DeSoi


On Apr 17, 2009, at 7:02 AM, Rick Schumeyer wrote:

You can get extra (undesirable) results, depending on the name.  For 
example, if you are searching for the last name of Ricks, you will 
also find all authors whose first name is Rick.


If you can process the names separately from the rest of the text, try

select to_tsvector('simple', 'claude Jones');

      to_tsvector
-----------------------
 'jones':2 'claude':1
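With names indexed under the 'simple' configuration, a last-name search stays exact, because 'simple' lowercases but does not stem, so 'ricks' and 'rick' remain distinct lexemes (table and column names below are hypothetical):

```sql
-- Hypothetical authors table; matches only last names containing 'Ricks',
-- not first names of 'Rick', since 'simple' does no stemming.
SELECT *
  FROM authors
 WHERE to_tsvector('simple', last_name) @@ to_tsquery('simple', 'ricks');
```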





John DeSoi, Ph.D.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Community account

2009-04-18 Thread Scott Marlowe
On Sat, Apr 18, 2009 at 3:49 AM, Rainer Bauer use...@munnin.com wrote:
 Hello,

 I signed up a couple of months ago but never received the confirmation email.

 If I try to create a new account with the same email address, but a different
 handle, it is denied with this error message: "A user with that email already
 exists."

The mailing lists don't support user names, just email addresses.
Where exactly are you signing up?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] number of relations reported by vacuumdb -av

2009-04-18 Thread Martin Gainty

Good Morning Irene

could you verify the requirement to set
max_fsm_pages (integer) to 16 times the new value of 'max_fsm_relations'?

Thanks!
Martin Gainty 
__ 
Disclaimer and Confidentiality Note
This message is confidential. If you are not the intended recipient, please 
notify the sender. Any unauthorized forwarding or copying is prohibited. This 
message serves only for the exchange of information and has no legal binding 
effect. Due to the ease with which emails can be manipulated, we cannot take 
responsibility for the contents.






 Date: Sat, 18 Apr 2009 03:23:49 -0700
 From: ib...@noao.edu
 To: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] number of relations reported by vacuumdb -av
 
 Never mind... I found the answer at archives.postgresql.org. The 
 answer is 'yes': use the sum of relations from all of the databases. So 
 I have reset 'max_fsm_relations' from 1000 to 2000.
 
 Irene Barg wrote:
  Hi,
  
  I have a PostgreSQL installation with 8 databases (counting postgres, 
  template0, and template1). I run 'vacuumdb -z' daily on 3 of the largest 
  user databases. The vacuumdb logs show the 'max_fsm_pages' need to be 
  increased with almost each vacuum. So I did a 'vacuumdb -av' on all the 
  db's:
  INFO:  free space map contains 81016 pages in 100 relations
  DETAIL:  A total of 80000 page slots are in use (including overhead).
  187792 page slots are required to track all free space.
  Current limits are:  80000 page slots, 1000 relations, using 534 KB.
  NOTICE:  number of page slots needed (187792) exceeds max_fsm_pages (80000)
  HINT:  Consider increasing the configuration parameter "max_fsm_pages" 
  to a value over 187792.
  VACUUM
  
  I have a couple questions.
  
  1) I can increase 'max_fsm_pages' from 80K to 200K, but why does it keep 
  growing?
  
  The main database sees on average 2500-5000 rows inserted per day, and 
  deletes are relatively small (although I don't have stats on deletes).
  
  2) How is '100 relations' getting calculated?
  
  If I connect to each one of my 8 db's and do:
  
  select count(*) from pg_class;
  
  The total number of relations is 1725. So shouldn't I increase 
  'max_fsm_relations' from 1000 to 1725?
  
  Thank you in advance.
  
  -- irene
  -
  Irene BargEmail:  ib...@noao.edu
  NOAO/AURA Inc. http://www.noao.edu/noao/staff/ibarg
  950 N. Cherry Ave.Voice:  520-318-8273
  Tucson, AZ  85726 USA   FAX:  520-318-8360
  -
  
 
 -- 
 -
 Irene BargEmail:  ib...@noao.edu
 NOAO/AURA Inc. http://www.noao.edu/noao/staff/ibarg
 950 N. Cherry Ave.Voice:  520-318-8273
 Tucson, AZ  85726 USA   FAX:  520-318-8360
 -
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

_
Windows Live™: Keep your life in sync.
http://windowslive.com/explore?ocid=TXT_TAGLM_WL_allup_1a_explore_042009

Re: [GENERAL] Community account

2009-04-18 Thread Rainer Bauer
Hello Scott,

Saturday, April 18, 2009, 4:21:29 PM, you wrote:

SM On Sat, Apr 18, 2009 at 3:49 AM, Rainer Bauer use...@munnin.com wrote:
 Hello,

 I signed up a couple of months ago but never received the confirmation email.

 If I try to create a new account with the same email address, but a different
 handle, it is denied with this error message: "A user with that email already
 exists."

SM The mailing lists don't support user names, just email addresses.

Yes, I know. I was talking about a community account.

SM Where exactly are you signing up?

Here: http://www.postgresql.org/community/signup

Rainer


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Community account

2009-04-18 Thread Scott Marlowe
On Sat, Apr 18, 2009 at 8:27 AM, Rainer Bauer use...@munnin.com wrote:
 Hello Scott,

 Saturday, April 18, 2009, 4:21:29 PM, you wrote:

 SM On Sat, Apr 18, 2009 at 3:49 AM, Rainer Bauer use...@munnin.com wrote:
 Hello,

 I signed up a couple of months ago but never received the confirmation 
 email.

 If I try to create a new account with the same email address, but a 
 different
 handle, it is denied with this error message: "A user with that email already
 exists."

 SM The mailing lists don't support user names, just email addresses.

 Yes, I know. I was talking about a community account.

 SM Where exactly are you signing up?

 Here: http://www.postgresql.org/community/signup

Ahh, ok.  I'd guess that you got an email and anti-spam software ate
it?  Just guessing.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Space for pg_dump

2009-04-18 Thread Tom Lane
Rainer Bauer use...@munnin.com writes:
 Greg Smith wrote:
 Since running an entire pg_dump can take forever on a big database, what I 
 usually do here is start by running the disk usage query at 
 http://wiki.postgresql.org/wiki/Disk_Usage

 Interesting. However, the query gives an error if the table name contains
 upper-case characters, like in my case tblConnections:

 ERROR:  relation "public.tblconnections" does not exist

 Replacing all occurrences of relname by '"' || relname || '"' fixes the
 error.

That still fails if the table name contains double quotes.  A proper
solution is to use the table OID --- I've corrected the example.
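The OID-based form of the size query looks roughly like this (a simplified sketch, not the exact wiki query):

```sql
-- Per-table sizes via OID; passing the OID to pg_total_relation_size()
-- sidesteps all identifier-quoting pitfalls with mixed-case or quoted names.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'r'
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY pg_total_relation_size(c.oid) DESC;
```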

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Nooby Q: Should this take five hours? And counting?

2009-04-18 Thread Kenneth Tilton
ie, 5hrs and counting, no clue how long it intends to run, but methinks 
this is insane even if it is 10^7 records, mebbe half a dozen dups per 
value (a product-id usually around 8-chars long):


CREATE INDEX web_source_items_by_item_id_strip
   ON web_source_items
   USING btree (item_id_strip);

Am I unreasonably impatient?

I see pg getting just a few % CPU on a heavily ram/core-endowed Sun box 
with nothing else going on. Mebbe they installed pg on a compact flash? 
DVD-RW? /usr/local/something, prolly not.


Jes pestering you all while I wait, insights welcome.

kenneth

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nooby Q: Should this take five hours? And counting?

2009-04-18 Thread Scott Marlowe
On Sat, Apr 18, 2009 at 4:11 PM, Kenneth Tilton kentil...@gmail.com wrote:
 ie, 5hrs and counting, no clue how long it intends to run, but methinks this
 is insane even if it is 10^7 records, mebbe half a dozen dups per value (a
 product-id usually around 8-chars long):

 CREATE INDEX web_source_items_by_item_id_strip
               ON web_source_items
               USING btree (item_id_strip);

 Am I unreasonably impatient?

 I see pg getting just a few % CPU on a heavily ram/core-endowed Sun box with
 nothing else going on. Mebbe they installed pg on a compact flash? DVD-RW?
 /usr/local/something, prolly not.

What does vmstat 1 60 say during the index build?  Specifically the
cpu columns for user, system, wa?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nooby Q: Should this take five hours? And counting?

2009-04-18 Thread Kenneth Tilton



Scott Marlowe wrote:
 On Sat, Apr 18, 2009 at 4:11 PM, Kenneth Tilton kentil...@gmail.com 
wrote:
 ie, 5hrs and counting, no clue how long it intends to run, but 
methinks this
 is insane even if it is 10^7 records, mebbe half a dozen dups per 
value (a

 product-id usually around 8-chars long):

 CREATE INDEX web_source_items_by_item_id_strip
   ON web_source_items
   USING btree (item_id_strip);

 Am I unreasonably impatient?

 I see pg getting just a few % CPU on a heavily ram/core-endowed Sun 
box with
 nothing else going on. Mebbe they installed pg on a compact flash? 
DVD-RW?

 /usr/local/something, prolly not.

 What does vmstat 1 60 say during the index build?  Specifically the
 cpu columns for user, system, wa?

uh-oh, Unix noob too, and unfortunately someone has jumped on with a 
CPU-intensive task pegging one of the cores at 100%, so these numbers 
prolly do not help, but here goes:


procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1 2076312 1503204 182152 30669308   49   69   260   299    3    3 28  2 63  7  0
 1  1 2076312 1502900 182152 30669656    0    0   192  2260 1198  332 25  1 50 24  0
 1  1 2076312 1503024 182152 30669656    0    0     0   704 1181  282 25  1 50 25  0
 1  3 2076312 1502904 182156 30669740    0    0   104  2780 1224  422 25  0 48 26  0
 1  3 2076312 1502896 182156 30669740    0    0     0  1552 1173  309 25  0 50 25  0
 1  1 2076312 1502140 182172 30669976    0    0   120   312 1222  396 26  1 48 25  0
 1  1 2076312 1501724 182180 30670408    0    0   496  3996 1161  450 26  1 50 24  0
 1  1 2076312 1501304 182200 30670960    0    0   428  2892 1203  438 26  1 50 24  0
 1  2 2076312 1501064 182200 30671400    0    0   236  3456 1168  434 26  1 49 24  0
 1  2 2076312 1501064 182200 30671360    0    0     0  1620 1225  357 25  0 50 25  0
 1  1 2076312 1501064 182200 30671360    0    0     0   292 1205  339 25  1 49 25  0
 1  1 2076312 1500652 182220 30671776    0    0   416  1740 1186  410 25  1 50 24  0
 1  2 2076312 1500472 182224 30671992    0    0   208  3560 1177  399 25  1 49 24  0



I'll sample again if I get a window, but these jobs tend to run for hours.

thx,

kt

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nooby Q: Should this take five hours? And counting?

2009-04-18 Thread Scott Marlowe
On Sat, Apr 18, 2009 at 6:32 PM, Kenneth Tilton kentil...@gmail.com wrote:


 Scott Marlowe wrote:
 On Sat, Apr 18, 2009 at 4:11 PM, Kenneth Tilton kentil...@gmail.com
 wrote:
 ie, 5hrs and counting, no clue how long it intends to run, but methinks
 this
 is insane even if it is 10^7 records, mebbe half a dozen dups per value
 (a
 product-id usually around 8-chars long):

 CREATE INDEX web_source_items_by_item_id_strip
               ON web_source_items
               USING btree (item_id_strip);

 Am I unreasonably impatient?

 I see pg getting just a few % CPU on a heavily ram/core-endowed Sun box
 with
 nothing else going on. Mebbe they installed pg on a compact flash?
 DVD-RW?
 /usr/local/something, prolly not.

 What does vmstat 1 60 say during the index build?  Specifically the
 cpu columns for user, system, wa?

 uh-oh, Unix noob too, and unfortunately someone has jumped on with a
 CPU-intensive task pegging one of the cores at 100%, so these numbers prolly
 do not help, but here goes:

 procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
  1  1 2076312 1503204 182152 30669308   49   69   260   299    3    3 28  2 63  7  0
  1  1 2076312 1502900 182152 30669656    0    0   192  2260 1198  332 25  1 50 24  0
  1  1 2076312 1503024 182152 30669656    0    0     0   704 1181  282 25  1 50 25  0
  1  3 2076312 1502904 182156 30669740    0    0   104  2780 1224  422 25  0 48 26  0
  1  3 2076312 1502896 182156 30669740    0    0     0  1552 1173  309 25  0

 I'll sample again if I get a window, but these jobs tend to run for hours.

I'm gonna take a guess about a few things:
1: You've got a lot of memory in that machine, try cranking up
work_mem for this query to see if that helps
2: You've got a slow disk subsystem, if you're already seeing 25%
IOWait with only ~2 to 3 megs a second being written.

While having enough memory for everything to fit in it makes for fast
reads, it doesn't do a lot to help with writes.
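As a sketch, the session-level change would look like this (note: on PostgreSQL 8.0 and later, CREATE INDEX sorts are actually bounded by maintenance_work_mem, so raising it alongside work_mem is the safer bet):

```sql
-- Raise sort memory for this session only, then rebuild the index.
-- maintenance_work_mem governs CREATE INDEX sorts on 8.0+; work_mem is
-- included here as well since some setups and older releases use it.
SET maintenance_work_mem = '500MB';
SET work_mem = '500MB';
CREATE INDEX web_source_items_by_item_id_strip
    ON web_source_items
    USING btree (item_id_strip);
```

Because SET only affects the current session, other connections keep the server defaults.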

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nooby Q: Should this take five hours? And counting?

2009-04-18 Thread Martin Gainty

MG> take a look at the man page for vmstat

Martin Gainty 






 Date: Sat, 18 Apr 2009 18:55:59 -0600
 Subject: Re: [GENERAL] Nooby Q: Should this take five hours? And counting?
 From: scott.marl...@gmail.com
 To: kentil...@gmail.com
 CC: pgsql-general@postgresql.org
 
 On Sat, Apr 18, 2009 at 6:32 PM, Kenneth Tilton kentil...@gmail.com wrote:
 
 
  Scott Marlowe wrote:
  On Sat, Apr 18, 2009 at 4:11 PM, Kenneth Tilton kentil...@gmail.com
  wrote:
  ie, 5hrs and counting, no clue how long it intends to run, but methinks
  this
  is insane even if it is 10^7 records, mebbe half a dozen dups per value
  (a
  product-id usually around 8-chars long):
 
  CREATE INDEX web_source_items_by_item_id_strip
ON web_source_items
USING btree (item_id_strip);
 
  Am I unreasonably impatient?
 
  I see pg getting just a few % CPU on a heavily ram/core-endowed Sun box
  with
  nothing else going on. Mebbe they installed pg on a compact flash?
  DVD-RW?
  /usr/local/something, prolly not.
 
  What does vmstat 1 60 say during the index build?  Specifically the
  cpu columns for user, system, wa?
 
  uh-oh, Unix noob too, and unfortunately someone has jumped on with a
  CPU-intensive task pegging one of the cores at 100%, so these numbers prolly
  do not help, but here goes:
 
  procs ---memory-- ---swap-- -io --system--
  -cpu--
FIELD DESCRIPTION FOR VM MODE
   Procs
   r: The number of processes waiting for run time.
   b: The number of processes in uninterruptible sleep.


   Memory
   swpd: the amount of virtual memory used.
   free: the amount of idle memory.
   buff: the amount of memory used as buffers.
   cache: the amount of memory used as cache.
   inact: the amount of inactive memory. (-a option)
   active: the amount of active memory. (-a option)


   Swap
   si: Amount of memory swapped in from disk (/s).
   so: Amount of memory swapped to disk (/s).


   IO
   bi: Blocks received from a block device (blocks/s).
   bo: Blocks sent to a block device (blocks/s).


   System
   in: The number of interrupts per second, including the clock.
   cs: The number of context switches per second.

   CPU
   These are percentages of total CPU time.
   us: Time spent running non-kernel code. (user time, including nice time)
   sy: Time spent running kernel code. (system time)
   id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
   wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.

   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
   1  1 2076312 1503204 182152 30669308   49   69   260   299    3    3 28  2 63  7  0
   1  1 2076312 1502900 182152 30669656    0    0   192  2260 1198  332 25  1 50 24  0
   1  1 2076312 1503024 182152 30669656    0    0     0   704 1181  282 25  1 50 25  0
   1  3 2076312 1502904 182156 30669740    0    0   104  2780 1224  422 25  0 48 26  0
   1  3 2076312 1502896 182156 30669740    0    0     0  1552 1173  309 25  0
 
  I'll sample again if I get a window, but these jobs tend to run for hours.
 
 I'm gonna take a guess about a few things:
 1: You've got a lot of memory in that machine, try cranking up
 work_mem for this query to see if that helps
 2: You've got a slow disk subsystem, if you're already seeing 25%
 IOWait with only ~2 to 3 megs a second being written.
 
 While having enough memory for everything to fit in it makes for fast
 reads, it doesn't do a lot to help with writes.
 
 -- 
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general

_
Rediscover Hotmail®: Now available on your iPhone or BlackBerry
http://windowslive.com/RediscoverHotmail?ocid=TXT_TAGLM_WL_HM_Rediscover_Mobile2_042009

Re: [GENERAL] Nooby Q: Should this take five hours? And counting?

2009-04-18 Thread Kenneth Tilton



Scott Marlowe wrote:

On Sat, Apr 18, 2009 at 6:32 PM, Kenneth Tilton kentil...@gmail.com wrote:


Scott Marlowe wrote:

On Sat, Apr 18, 2009 at 4:11 PM, Kenneth Tilton kentil...@gmail.com
wrote:

ie, 5hrs and counting, no clue how long it intends to run, but methinks
this
is insane even if it is 10^7 records, mebbe half a dozen dups per value
(a
product-id usually around 8-chars long):

CREATE INDEX web_source_items_by_item_id_strip
  ON web_source_items
  USING btree (item_id_strip);

Am I unreasonably impatient?

I see pg getting just a few % CPU on a heavily ram/core-endowed Sun box
with
nothing else going on. Mebbe they installed pg on a compact flash?
DVD-RW?
/usr/local/something, prolly not.

What does vmstat 1 60 say during the index build?  Specifically the
cpu columns for user, system, wa?

uh-oh, Unix noob too, and unfortunately someone has jumped on with a
CPU-intensive task pegging one of the cores at 100%, so these numbers prolly
do not help, but here goes:

procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1 2076312 1503204 182152 30669308   49   69   260   299    3    3 28  2 63  7  0
 1  1 2076312 1502900 182152 30669656    0    0   192  2260 1198  332 25  1 50 24  0
 1  1 2076312 1503024 182152 30669656    0    0     0   704 1181  282 25  1 50 25  0
 1  3 2076312 1502904 182156 30669740    0    0   104  2780 1224  422 25  0 48 26  0
 1  3 2076312 1502896 182156 30669740    0    0     0  1552 1173  309 25  0

I'll sample again if I get a window, but these jobs tend to run for hours.


I'm gonna take a guess about a few things:
1: You've got a lot of memory in that machine, try cranking up
work_mem for this query to see if that helps


A bit. I killed the indexing and jacked work_mem up to 500mb, indexing 
then finished in 7 min 25s.


Yer a genius!


2: You've got a slow disk subsystem, if you're already seeing 25%
IOWait with only ~2 to 3 megs a second being written.


This has been passed along to management for consideration.

Thx a ton,

ken

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nooby Q: Should this take five hours? And counting?

2009-04-18 Thread Scott Marlowe
On Sat, Apr 18, 2009 at 6:32 PM, Kenneth Tilton kentil...@gmail.com wrote:


 Scott Marlowe wrote:
 On Sat, Apr 18, 2009 at 4:11 PM, Kenneth Tilton kentil...@gmail.com
 wrote:
 ie, 5hrs and counting, no clue how long it intends to run, but methinks
 this
 is insane even if it is 10^7 records, mebbe half a dozen dups per value
 (a
 product-id usually around 8-chars long):

 CREATE INDEX web_source_items_by_item_id_strip
               ON web_source_items
               USING btree (item_id_strip);

 Am I unreasonably impatient?

 I see pg getting just a few % CPU on a heavily ram/core-endowed Sun box
 with
 nothing else going on. Mebbe they installed pg on a compact flash?
 DVD-RW?
 /usr/local/something, prolly not.

 What does vmstat 1 60 say during the index build?  Specifically the
 cpu columns for user, system, wa?

 uh-oh, Unix noob too, and unfortunately someone has jumped on with a
 CPU-intensive task pegging one of the cores at 100%, so these numbers prolly
 do not help, but here goes:

 procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
  1  1 2076312 1503204 182152 30669308   49   69   260   299    3    3 28  2 63  7  0
  1  1 2076312 1502900 182152 30669656    0    0   192  2260 1198  332 25  1 50 24  0
  1  1 2076312 1503024 182152 30669656    0    0     0   704 1181  282 25  1 50 25  0
  1  3 2076312 1502904 182156 30669740    0    0   104  2780 1224  422 25  0 48 26  0

Looking at these numbers, it seems you've got 4 cores, 1 is running
100% user, 2 are sitting idle, and 1 is 100% io wait.  I'm guessing
the 100% iowait is the postgres backend running your query.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general