Re: [PERFORM] two disks - best way to use them?

2005-12-06 Thread Ron

At 12:52 AM 12/6/2005, Thomas Harold wrote:

David Lang wrote:

in that case you logicly have two disks, so see the post from Ron 
earlier in this thread.


And it's a very nice performance gain.  Percent spent waiting 
according to top is down around 10-20% instead of 80-90%.  While 
I'm not prepared to benchmark, database performance is way up.  The 
client machines that are writing the data are running closer to 100% 
CPU (before they were well below 50% CPU utilization).

For accuracy's sake, which exact config did you finally use?

How did you choose the config you finally used?  Did you test the 
three options or just pick one?


Ron



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [GENERAL] need help

2005-12-06 Thread Tino Wildenhain

Jenny schrieb:

I'm running PostgreSQL 8.0.3 on i686-pc-linux-gnu (Fedora Core 2). I've been
dealing with psql for over 2 years now, but I've never had this case
before.

I have a table that has about 20 rows in it.

            Table "public.s_apotik"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 obat_id      | character varying(10)       | not null
 stock        | numeric                     | not null
 s_min        | numeric                     | not null
 s_jual       | numeric                     |
 s_r_jual     | numeric                     |
 s_order      | numeric                     |
 s_r_order    | numeric                     |
 s_bs         | numeric                     |
 last_receive | timestamp without time zone |
Indexes:
    "s_apotik_pkey" PRIMARY KEY, btree (obat_id)
   
When I try to UPDATE one of the rows, nothing happens for a very long time.

First, I ran it in pgAdmin III; I could see the milliseconds growing as I
waited. Then I stopped the query, because the time needed was unbelievably
long.

Then I tried to run the query from the psql shell. For example, the table has
obat_id: A, B, C, D.
db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='A';
(Nothing happens... I press Ctrl-C to stop it. This is what comes out:)
Cancel request sent
ERROR: canceling query due to user request

(If I try another obat_id)
db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='B';
(Less than a second, this is what comes out :)
UPDATE 1

I can't do anything to that row. I can't DELETE it. Can't DROP the table. 
I want this data out of my database.

What should I do? It's like there's a falsely pointed index here.
Any help would be very much appreciated.



1) let's hope you do regular backups - and have actually tested restore.
1a) if not, do it right now
2) reindex the table
3) try again to modify

Q: are there any foreign keys involved? If so, reindex those
tables too, just in case.

Did you vacuum regularly?
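
A minimal sketch of steps 2 and 3, using the table and index names from the
original post (after taking the backup from step 1):

REINDEX TABLE s_apotik;
VACUUM ANALYZE s_apotik;
-- retry the statement that used to hang
UPDATE s_apotik SET stock = 100 WHERE obat_id = 'A';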

HTH
Tino

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance degradation after successive UPDATE's

2005-12-06 Thread Assaf Yaari
Thanks Bruno,

Issuing VACUUM FULL seems to have no influence on the time.
I've added VACUUM ANALYZE to my script every 100 UPDATEs, run the
test again (on a different record), and the time still increases.

Any other ideas?

Thanks,
Assaf. 

 -Original Message-
 From: Bruno Wolff III [mailto:[EMAIL PROTECTED] 
 Sent: Monday, December 05, 2005 10:36 PM
 To: Assaf Yaari
 Cc: pgsql-performance@postgresql.org
 Subject: Re: Performance degradation after successive UPDATE's
 
 On Mon, Dec 05, 2005 at 19:05:01 +0200,
   Assaf Yaari [EMAIL PROTECTED] wrote:
  Hi,
   
  I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.
   
  My application updates counters in the DB. I left a test running overnight
  that increased the counter of a specific record. After the night's run
  (several hundred thousand updates), I found that the time spent on each
  UPDATE had increased to more than 1.5 seconds (at the beginning it was
  less than 10ms)! Issuing VACUUM ANALYZE and even a reboot didn't seem
  to solve the problem.
 
 You need to be running vacuum more often to get rid of the 
 deleted rows (update is essentially insert + delete). Once 
 you get too many, plain vacuum won't be able to clean them up 
 without raising the value you use for FSM. By now the table 
 is really bloated and you probably want to use vacuum full on it.
 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Memory Leakage Problem

2005-12-06 Thread Kathy Lo
Hi,

I setup a database server using the following configuration.

Redhat 9.0
Postgresql 8.0.3

Then, I setup a client workstation to access this database server with
the following configuration.

Redhat 9.0
unixODBC 2.2.11
psqlodbc-08.01.0101

and write a C++ program to run database query.

In this program, it will access this database server using simple and
complex (joining tables) SQL Select statement and retrieve the matched
rows. For each access, it will connect the database and disconnect it.

I found that the memory of the database server was nearly used up (total 2GB RAM).

After I stopped the program, the used memory was not freed.

Is there any configuration in postgresql.conf I should set? Currently,
I just set the following in postgresql.conf

listen_addresses = '*'
max_stack_depth = 8100 (when I run ulimit -s the max. value that
kernel supports = 8192)
stats_row_level = true

And, I run pg_autovacuum as background job.

--
Kathy Lo

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] LVM and Postgres

2005-12-06 Thread Rory Campbell-Lange
I need to slice up a web server's disk space to provide space for
postgres and storing binaries such as images and sound files. I'm
thinking of using logical volume management (LVM) to help manage the
amount of space I use between postgres and the data volumes.

The server has a 250GB RAID10 (LSI 320-I + BBU) volume which I am
thinking of slicing up in the following way (Linux 2.6 kernel):

/ : ext3  :  47GB (root, home etc)
/boot : ext3  :  1GB
/tmp  : ext2  :  2GB
/usr  : ext3  :  4GB
/var  : ext3  :  6GB
---
60GB

VG:190GB approx
---
Initially divided so:  
/data : ext3  : 90GB
/postgres : xfs   : 40GB

This gives me left over space of roughly 60GB to extend into on the
volume group, which I can balance between the /data and /postgres
logical volumes as needed.
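
For what it's worth, the balancing step itself is small; a sketch, assuming a
volume group named vg0 (the name is not in the original mail) and the XFS
volume above:

# grow the postgres logical volume by 10GB out of the spare space,
# then grow the XFS filesystem online
lvextend -L +10G /dev/vg0/postgres
xfs_growfs /postgres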

Are there any major pitfalls to this approach?

Thanks,
Rory

-- 
Rory Campbell-Lange 
[EMAIL PROTECTED]
www.campbell-lange.net

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Can this query go faster???

2005-12-06 Thread Joost Kraaijeveld
Hi,

Is it possible to get this query run faster than it does now, by adding
indexes, changing the query?

SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE
customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745

Explain:

Limit  (cost=90956.71..90956.71 rows=1 width=55)
  ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
        Sort Key: addresses.zipcode, addresses.housenumber
        ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
              Hash Cond: ("outer".contactaddress = "inner".objectid)
              ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
              ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                    ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)

The customers table has an index on contactaddress and objectid.
The addresses table has an index on zipcode+housenumber and objectid.

TIA

--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance degradation after successive UPDATE's

2005-12-06 Thread Pandurangan R S
Hi,

You might try these steps

1. Do a vacuum full analyze
2. Reindex the index on id column
3. Cluster the table based on this index
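
A sketch of those three steps against the test1 table quoted below; the index
name test1_pkey is an assumption based on the primary key constraint in that
definition:

-- 1. reclaim dead tuples and refresh statistics
VACUUM FULL ANALYZE test1;
-- 2. rebuild the primary key index on id
REINDEX INDEX test1_pkey;
-- 3. physically reorder the table along that index
CLUSTER test1_pkey ON test1;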

On 12/5/05, Assaf Yaari [EMAIL PROTECTED] wrote:

 Hi,

 I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.

 My application updates counters in the DB. I left a test running overnight that
 increased the counter of a specific record. After the night's run (several
 hundred thousand updates), I found that the time spent on each UPDATE had
 increased to more than 1.5 seconds (at the beginning it was less than 10ms)!
 Issuing VACUUM ANALYZE and even a reboot didn't seem to solve the problem.

 I succeeded in reproducing this with a simple test:

 I created a very simple table that looks like that:
 CREATE TABLE test1
 (
   id int8 NOT NULL,
   counter int8 NOT NULL DEFAULT 0,
   CONSTRAINT Test1_pkey PRIMARY KEY (id)
 ) ;

 I inserted 15 entries and wrote a script that increases the counter of a
 specific record over and over. The SQL command looks like this:
 UPDATE test1 SET counter=number WHERE id=10;

 At the beginning the UPDATE time was around 15ms. After ~9 updates, the
 execution time increased to be more than 120ms.

 1. What is the reason for this phenomena?
 2. Is there anything that can be done in order to improve this?

 Thanks,
 Assaf


--
Regards
Pandu

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Michael Riess

Hi,

Is it possible to get this query run faster than it does now, by adding
indexes, changing the query?

SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE
customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745

Explain:

Limit  (cost=90956.71..90956.71 rows=1 width=55)
  ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
        Sort Key: addresses.zipcode, addresses.housenumber
        ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
              Hash Cond: ("outer".contactaddress = "inner".objectid)
              ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
              ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                    ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)

The customers table has an index on contactaddress and objectid.
The addresses table has an index on zipcode+housenumber and objectid.


When the resulting relation contains all the info from both tables, 
indexes won't help, seq scan is inevitable.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Csaba Nagy
Joost,

Why do you use an offset here? I guess you're traversing the table
somehow; in this case it would be better to remember the last zipcode +
housenumber and add a condition to get the next one bigger than
the last one you've got... that would use the index on
zipcode+housenumber and be very fast. The big offset forces postgres to
traverse that many entries until it's able to pick the one row for the
result...
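
A sketch of that approach with the column names from Joost's query;
:last_zipcode and :last_housenumber stand for the values remembered from the
previously fetched row:

SELECT customers.objectid
  FROM prototype.customers, prototype.addresses
 WHERE customers.contactaddress = addresses.objectid
   AND (addresses.zipcode > :last_zipcode
        OR (addresses.zipcode = :last_zipcode
            AND addresses.housenumber > :last_housenumber))
 ORDER BY addresses.zipcode, addresses.housenumber
 LIMIT 1;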

On Tue, 2005-12-06 at 10:43, Joost Kraaijeveld wrote:
 Hi,
 
 Is it possible to get this query run faster than it does now, by adding
 indexes, changing the query?
 
 SELECT customers.objectid FROM prototype.customers, prototype.addresses
 WHERE
 customers.contactaddress = addresses.objectid
 ORDER BY zipCode asc, housenumber asc
 LIMIT 1 OFFSET 283745
 
 Explain:
 
 Limit  (cost=90956.71..90956.71 rows=1 width=55)
   ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
         Sort Key: addresses.zipcode, addresses.housenumber
         ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
               Hash Cond: ("outer".contactaddress = "inner".objectid)
               ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
               ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                     ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)
 
 The customers table has an index on contactaddress and objectid.
 The addresses table has an index on zipcode+housenumber and objectid.
 
 TIA
 
 --
 Groeten,
 
 Joost Kraaijeveld
 Askesis B.V.
 Molukkenstraat 14
 6524NB Nijmegen
 tel: 024-3888063 / 06-51855277
 fax: 024-3608416
 e-mail: [EMAIL PROTECTED]
 web: www.askesis.nl
 
 
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Markus Wollny
Hi,

 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf 
 Of Joost Kraaijeveld
 Sent: Tuesday, December 6, 2005 10:44
 To: Pgsql-Performance
 Subject: [PERFORM] Can this query go faster???
 
 SELECT customers.objectid FROM prototype.customers, 
 prototype.addresses WHERE customers.contactaddress = 
 addresses.objectid ORDER BY zipCode asc, housenumber asc 
 LIMIT 1 OFFSET 283745
 
 Explain:
 
 Limit  (cost=90956.71..90956.71 rows=1 width=55)
   ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
         Sort Key: addresses.zipcode, addresses.housenumber
         ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
               Hash Cond: ("outer".contactaddress = "inner".objectid)
               ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
               ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                     ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)
 
 The customers table has an index on contactaddress and objectid.
 The addresses table has an index on zipcode+housenumber and objectid.

The planner chooses sequential scans on customers.contactaddress and 
addresses.objectid instead of using the indices. In order to determine whether 
this is a sane decision, you should run EXPLAIN ANALYZE on this query, once 
with SET ENABLE_SEQSCAN = on; and once with SET ENABLE_SEQSCAN = off;. If the 
query is significantly faster with SEQSCAN off, then something is amiss - 
either you haven't run analyze often enough so the stats are out of date or you 
have random_page_cost set too high (look for the setting in postgresql.conf) - 
these two are the usual suspects.
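
In psql that experiment could look roughly like this (the query is Joost's;
both settings only affect the current session):

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT customers.objectid FROM prototype.customers, prototype.addresses
 WHERE customers.contactaddress = addresses.objectid
 ORDER BY zipCode asc, housenumber asc
 LIMIT 1 OFFSET 283745;

SET enable_seqscan = on;
-- repeat the EXPLAIN ANALYZE and compare the actual times
SHOW random_page_cost;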

Kind regards

   Markus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Joost Kraaijeveld
On Tue, 2005-12-06 at 10:52 +0100, Csaba Nagy wrote:
 Joost,
 
 Why do you use an offset here ? I guess you're traversing the table
 somehow, in this case it would be better to remember the last zipcode +
 housenumber and put an additional condition to get the next bigger than
 the last one you've got... that would go for the index on
 zipcode+housenumber and be very fast. The big offset forces postgres to
 traverse that many entries until it's able to pick the one row for the
I am forced to translate a sorting-dependent record number into a record
in the database. The GUI (a Java JTable) works with record/row numbers,
which is handy if one has an ISAM database, but not if one uses
PostgreSQL.

I wonder if using a forward scrolling cursor would be faster.

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Tino Wildenhain

Joost Kraaijeveld schrieb:

On Tue, 2005-12-06 at 10:52 +0100, Csaba Nagy wrote:


Joost,

Why do you use an offset here ? I guess you're traversing the table
somehow, in this case it would be better to remember the last zipcode +
housenumber and put an additional condition to get the next bigger than
the last one you've got... that would go for the index on
zipcode+housenumber and be very fast. The big offset forces postgres to
traverse that many entries until it's able to pick the one row for the


I am forced to translate a sorting dependent record number to a record
in the database. The GUI (a Java JTable) works with record /row numbers,
which is handy if one has an ISAM database, but not if one uses
PostgreSQL.


You can have a row number in postgres easily too. For example if you
just include a serial for the row number.

Cursor would work too but you would need to have a persistent connection.

Regards
Tino

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] BLCKSZ

2005-12-06 Thread Steinar H. Gunderson
On Tue, Dec 06, 2005 at 01:40:47PM +0300, Olleg wrote:
 I can't understand why bigger is better. For instance in a search by 
 index: the index points to a page, and I need to load that page to get one 
 row. Thus I load 8kb from disk for every row, and then keep it in cache. 
 You recommend 64kb. With your recommendation I'll get 8 times more IO 
 throughput, 8 times more head seeks on disk, and 8 times more memory cache 
 (OS cache and postgresql) becomes busy.

Hopefully, you won't have eight times the seeking; a single block ought to be
in one chunk on disk. You're of course at your filesystem's mercy, though.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Joost Kraaijeveld
Hi Tino,

On Tue, 2005-12-06 at 11:32 +0100, Tino Wildenhain wrote:
 You can have a row number in postgres easily too. For example if you
 just include a serial for the row number.
Not if the order of things is determined at runtime and not at insert time...

 Cursor would work too but you would need to have a persistent connection.
I just tried it: a cursor is not faster (which does not surprise me at
all, as the amount of work looks the same to me).

I guess there is no solution.

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Tino Wildenhain

Joost Kraaijeveld schrieb:

Hi Tino,


..



Cursor would work too but you would need to have a persistent connection.


I just tried it: a cursor is not faster (what does not surprise me at
all, as the amount of work looks the same to me)


Actually no: if you scroll forward, you just ask the database for the
next rows to materialize. So if you are already ahead in your result set
and ask for the next rows, it should be faster than working with an offset
from the start each time.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] BLCKSZ

2005-12-06 Thread David Lang

On Tue, 6 Dec 2005, Steinar H. Gunderson wrote:


On Tue, Dec 06, 2005 at 01:40:47PM +0300, Olleg wrote:

I can't understand why bigger is better. For instance in a search by
index: the index points to a page, and I need to load that page to get one
row. Thus I load 8kb from disk for every row, and then keep it in cache.
You recommend 64kb. With your recommendation I'll get 8 times more IO
throughput, 8 times more head seeks on disk, and 8 times more memory cache
(OS cache and postgresql) becomes busy.


Hopefully, you won't have eight times the seeking; a single block ought to be
in one chunk on disk. You're of course at your filesystem's mercy, though.


In fact it would usually mean 1/8 as many seeks: since the 64k chunk 
would be created all at once, it's probably going to be one chunk on disk, 
as Steinar points out, and that means that you do one seek per 64k instead 
of one seek per 8k.


With current disks it's getting to the point where it costs the same to 
read 8k as it does to read 64k (i.e. almost free; you could read 
substantially more than 64k and not notice it in I/O speed); it's the 
seeks that are expensive.


Yes, it will eat up more RAM, but assuming that you are likely to need other 
things nearby, it's likely to be a win.


As processor speed keeps climbing compared to memory and disk speed, true 
random access is really not the correct way to think about I/O anymore. 
It's frequently more appropriate to think of your memory and disks as if 
they were tape drives (seek, then read, repeat).


Even for memory access, what you really do is seek to the beginning of a 
block (expensive), then read that block into cache (cheap: you get the 
entire cache line of 64-128 bytes whether you need it or not), and then you 
can access that block fairly quickly. With memory on SMP machines it's a 
constant cost to seek anywhere in memory; with NUMA machines (including 
multi-socket Opterons) the cost of the seek and fetch depends on where in 
memory you are seeking to and which CPU you are running on. It also becomes 
very expensive for multiple CPUs to write to memory addresses that are in 
the same block (cache line) of memory.


For disks it's even more dramatic: the seek is incredibly expensive 
compared to the read/write, and the cost of the seek varies based on how 
far you need to seek, but once you are on a track you can read the entire 
track in for about the same cost as a single block (in fact the drive 
usually does read the entire track before sending the one block on to 
you). RAID complicates this because you have a block size per drive, and 
reading larger than that block size involves multiple drives.


Most of the work in dealing with these issues and optimizing for them is 
the job of the OS. Some other databases work very hard to take over this 
work from the OS; Postgres instead tries to let the OS do this work, but 
we still need to keep it in mind when configuring things, because it's 
possible to make it much easier or much harder for the OS to optimize things.


David Lang

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Tino Wildenhain

Joost Kraaijeveld schrieb:

On Tue, 2005-12-06 at 12:36 +0100, Tino Wildenhain wrote:


I just tried it: a cursor is not faster (what does not surprise me at
all, as the amount of work looks the same to me)


Actually no, if you scroll forward, you just ask the database for the
next rows to materialize. So if you are ahead in your database and
ask for next rows, it should be faster then working w/ an offset
from start each time.


Ah, a misunderstanding: I only need to calculate an index if the user
wants a record that is not in or adjacent to the cache (in which case I
can do a select of values > the last value in the cache). So I must always
materialize all rows below the wanted index.


Yes, but still, advancing a few blocks from where the cursor is
should be faster than re-issuing the query and scrolling through
the whole result set to where you want to go.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Performance degradation after successive UPDATE's

2005-12-06 Thread Jan Wieck

On 12/6/2005 4:08 AM, Assaf Yaari wrote:

Thanks Bruno,

Issuing VACUUM FULL seems not to have influence on the time.
I've added to my script VACUUM ANALYZE every 100 UPDATE's and run the
test again (on different record) and the time still increase.


I think he meant

- run VACUUM FULL once,
- adjust the FSM settings to the database size and turnover ratio,
- run VACUUM ANALYZE more frequently from there on.
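
A sketch of that sequence for the test table in this thread; the FSM numbers
are illustrative placeholders (they would have to be sized from VACUUM
VERBOSE output), not recommendations:

-- one-time cleanup of the already-bloated table
VACUUM FULL ANALYZE test1;

-- postgresql.conf (illustrative values; changing them needs a restart)
-- max_fsm_pages = 200000
-- max_fsm_relations = 1000

-- from here on, vacuum frequently relative to the update rate
VACUUM ANALYZE test1;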


Jan



Any other ideas?

Thanks,
Assaf. 


-Original Message-
From: Bruno Wolff III [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 05, 2005 10:36 PM

To: Assaf Yaari
Cc: pgsql-performance@postgresql.org
Subject: Re: Performance degradation after successive UPDATE's

On Mon, Dec 05, 2005 at 19:05:01 +0200,
  Assaf Yaari [EMAIL PROTECTED] wrote:
 Hi,
  
 I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.
  
 My application updates counters in the DB. I left a test running overnight 
 that increased the counter of a specific record. After the night's run 
 (several hundred thousand updates), I found that the time spent on each 
 UPDATE had increased to more than 1.5 seconds (at the beginning it was 
 less than 10ms)! Issuing VACUUM ANALYZE and even a reboot didn't seem 
 to solve the problem.


You need to be running vacuum more often to get rid of the 
deleted rows (update is essentially insert + delete). Once 
you get too many, plain vacuum won't be able to clean them up 
without raising the value you use for FSM. By now the table 
is really bloated and you probably want to use vacuum full on it.




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match



--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Csaba Nagy
On Tue, 2005-12-06 at 13:20, Joost Kraaijeveld wrote:
[snip]
 Ah, a misunderstanding: I only need to calculate an index if the user
 wants a record that is not in or adjacent to the cache (in which case I
 can do a select of values > the last value in the cache). So I must always
 materialize all rows below the wanted index.

In this case the query will very likely not work faster. It must always
visit all the records till the required offset. If the plan should be
faster using the index, then you probably need to analyze (I don't
recall from your former posts if you did it recently or not), in any
case you could check an explain analyze to see if the planner is
mistaken or not - you might already know this.

Cheers,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Ron

At 04:43 AM 12/6/2005, Joost Kraaijeveld wrote:

Hi,

Is it possible to get this query run faster than it does now, by adding
indexes, changing the query?

SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE
customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745

Explain:

Limit  (cost=90956.71..90956.71 rows=1 width=55)
  ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
        Sort Key: addresses.zipcode, addresses.housenumber
        ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
              Hash Cond: ("outer".contactaddress = "inner".objectid)
              ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
              ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                    ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)

The customers table has an index on contactaddress and objectid.
The addresses table has an index on zipcode+housenumber and objectid.

TIA
customer names, customers.objectid, addresses, and addresses.objectid 
should all be static (addresses do not change, just the customers 
associated with them; and once a customer has been assigned an id 
that better never change...).


To me, this sounds like the addresses and customers tables should be 
duplicated and then physically laid out in sorted order by 
tablename.objectid in one set and by the human friendly 
associated string in the other set.
Then finding a specific tablename.objectid or its associated 
string can be done in at worst O(lg n) time assuming binary search, 
instead of O(n) time for a sequential scan.  If pg is clever enough, 
it might be able to do better than that.


IOW, I'd try duplicating the addresses and customers tables and using 
the appropriate CLUSTERed Index on each.
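
Whether or not the tables are duplicated, the clustering step itself would
look something like this; the index names are assumptions, since they are not
given in the thread:

CLUSTER addresses_zip_house_idx ON prototype.addresses;
CLUSTER customers_objectid_idx ON prototype.customers;
ANALYZE prototype.addresses;
ANALYZE prototype.customers;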


I know this breaks Normal Form.  OTOH, this kind of thing is common 
practice for data mining problems on static or almost static data.


Hope this is helpful,
Ron



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Merlin Moncure
 On Tue, 2005-12-06 at 11:32 +0100, Tino Wildenhain wrote:
  You can have a row number in postgres easily too. For example if you
  just include a serial for the row number.
 Not if the order of things is determined runtime and not at insert
time...
 
  Cursor would work too but you would need to have a persistent
 connection.
 I just tried it: a cursor is not faster (what does not surprise me at
 all, as the amount of work looks the same to me)
 
 I guess there is no solution.
 

Sure there is.  This begs the question: why do you want to read exactly
283745 rows ahead of row 'x'? :)

If you are scrolling forwards in a set, just pull in, say, 100-1000 rows
at a time, ordered, and grab the next 1000 based on the highest value
read previously.

You can do this on server side (cursor) or client side (parameterized
query).   There are advantages and disadvantages to each.  If you are
looping over this set and doing processing, a cursor would be ideal (try
out pl/pgsql).
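
A server-side cursor version of that pattern might look like the sketch
below (query and names taken from earlier in the thread); MOVE skips ahead
without shipping rows to the client, though it still has to walk the sorted
set:

BEGIN;
DECLARE cust CURSOR FOR
    SELECT customers.objectid
      FROM prototype.customers, prototype.addresses
     WHERE customers.contactaddress = addresses.objectid
     ORDER BY zipCode asc, housenumber asc;
MOVE FORWARD 283745 IN cust;
FETCH FORWARD 1000 FROM cust;
CLOSE cust;
COMMIT;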

Welcome to PostgreSQL! :) 

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Memory Leakage Problem

2005-12-06 Thread Tom Lane
Kathy Lo [EMAIL PROTECTED] writes:
 I found that the memory of the databaser server nearly used up (total 2G RAM).
 After I stop the program, the used memory did not free.

I see no particular reason to believe that you are describing an actual
memory leak.  More likely, you are just seeing the kernel's normal
behavior of eating up unused memory for disk cache space.

Repeat after me: zero free memory is the normal and desirable condition
on Unix-like systems.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] two disks - best way to use them?

2005-12-06 Thread Thomas Harold

Ron wrote:


For accuracy's sake, which exact config did you finally use?

How did you choose the config you finally used?  Did you test the three 
options or just pick one?


(Note: I'm not the original poster.)

I just picked the option of putting the data/pg_xlog directory (WAL) on 
a 2nd set of spindles.  That was the easiest thing for me to change on 
this test box.
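
For reference, the usual way to put pg_xlog on another spindle is a symlink;
the paths below are illustrative, since the post doesn't show the exact
commands:

# stop the cluster before touching pg_xlog
/etc/init.d/postgresql stop
mv /var/lib/postgresql/data/pg_xlog /mnt/spindle2/pg_xlog
ln -s /mnt/spindle2/pg_xlog /var/lib/postgresql/data/pg_xlog
/etc/init.d/postgresql start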


The test server is simply a Gentoo box running software RAID and LVM2. 
The primary disk set is 2x7200RPM 300GB drives and the secondary disk 
set is 2x5400RPM 300GB drives.  Brand new install of PGSQL 8.1, with 
mostly default settings (I changed FSM pages to be a higher value, 
max_fsm_pages = 15).  PGSQL was given its own ext3 32GB LVM volume 
on the primary disk set (2x7200RPM).  Originally, all files were on the 
primary disk.


The task at hand was inserting a large quantity of ~45-byte rows 
(according to vacuum verbose), on the order of millions of records per 
table.  There was a unique key and a unique index.  Test clients were 
accessing the database via ODBC / ADO and doing the inserts in a fairly 
brute-force mode (attempting the insert and calling .CancelUpdate if it fails).


When the tables were under 2 million rows, performance was okay. At one 
point, I had a 1.8GHz P4, dual Opteron 246, and Opteron 148 CPUs running 
at nearly 100% CPU, processing and doing inserts into the database.  So I 
had 4 clients running, performing inserts to 4 separate tables in the 
same database.  The P4 ran at about half the throughput of the Opterons 
(client-bound due to the code that generated row data prior to the 
insert), so I'd score my throughput as roughly 3.3-3.4, where 1.0 would 
be full utilization of the Opteron 148 box.


However, once the tables started getting above ~2 million rows, 
performance took a nose dive.  CPU utilizations on the 4 client CPUs 
dropped into the basement (5-20% CPU) and I had to back off on the 
number of clients.  So throughput had dropped down to around 0.25 or so. 
 The linux box was spending nearly all of its time waiting on the 
primary disks.


Moving the data/pg_xlog (WAL) to the 2nd set of disks (2x5400RPM) in the 
test server made a dramatic difference for this mass insert.  I'm 
running the P4 (100% CPU) and the Opteron 148 (~80% CPU) at the moment. 
 While it's not up to full speed, a throughput of ~1.3 is a lot better 
then the ~0.25 that I was getting prior.  (The two tables currently 
being written have over 5 million rows each.  One table has ~16 million 
rows.)  Wait percentage in top is only running 20-30% (dipping as low 
as 10%).  I haven't pushed this new setup hard enough to determine where 
the upper limit for throughput is.


It's very much a niche test (millions of inserts of narrow rows into 
multiple tables using fairly brain-dead code).  But it gives me data 
points on which to base purchasing of the production box.  The original 
plan was a simple RAID1 setup (2 spindles), but this tells me it's 
better to order 4 spindles and set it up as a pair of RAID1 sets.


Whether 4 spindles is better as two separate RAID1 arrays, or configured 
as a single RAID1+0 array... dunno.  Our application is typically more 
limited by insert speed than read speed (so I'm leaning towards separate 
RAID arrays).


I'm sure there's also more tuning that could be done to the PGSQL 
database (in the configuration file).  Also, the code is throwaway code 
that isn't the most elegant.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-12-06 Thread Pailloncy Jean-Gerard

Hi,

After a few tests, the difference is explained by the
effective_cache_size parameter.


with effective_cache_size=1000 (default)
the planner chooses the following plan:

postgres=# explain select count(*) from (select distinct on (val) * from test) as foo;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=421893.64..421893.65 rows=1 width=0)
   ->  Unique  (cost=385193.48..395679.24 rows=2097152 width=8)
         ->  Sort  (cost=385193.48..390436.36 rows=2097152 width=8)
               Sort Key: test.val
               ->  Seq Scan on test  (cost=0.00..31252.52 rows=2097152 width=8)
(5 rows)


with effective_cache_size=15000
the planner chooses the following plan:

postgres=# explain select count(*) from (select distinct on (val) * from test) as foo;
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Aggregate  (cost=101720.39..101720.40 rows=1 width=0)
   ->  Unique  (cost=0.00..75505.99 rows=2097152 width=8)
         ->  Index Scan using testval on test  (cost=0.00..70263.11 rows=2097152 width=8)
(3 rows)

I tested some other values for effective_cache_size.
The switch from seq to index scan happens between 9900 and 10000 for
effective_cache_size.


I have my SQL server on an OpenBSD 3.8 box with 1 GB of RAM and
nothing else running on it.
I set cachepercent to 25, so I expect to have 25% of 1 GB of RAM
(256 MB) as file cache.
effective_cache_size=15000 means 15000 x 8K of OS cache = 120,000 KB,
which is lower than my 256 MB of disk cache.
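
effective_cache_size can be changed per session, which makes this kind of
comparison easy to repeat (8K is the default BLCKSZ assumed here):

SET effective_cache_size = 15000;  -- 15000 x 8K = ~117 MB assumed cached
EXPLAIN SELECT count(*) FROM (SELECT DISTINCT ON (val) * FROM test) AS foo;
SET effective_cache_size = 1000;   -- back to the default for comparison
EXPLAIN SELECT count(*) FROM (SELECT DISTINCT ON (val) * FROM test) AS foo;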


I recall the results of my previous test:
#rows      2097152
IndexScan  1363396,581s
SeqScan      98758,445s
Ratio           13,805
So when effective_cache_size=15000, the planner chooses a plan that is
13 times slower than the seqscan one.


I did not understand where the problem comes from.
Any help welcome.

Cordialement,
Jean-Gérard Pailloncy



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Joshua Kramer


Greetings all,

I'm going to do a performance comparison with DocMgr and PG81/TSearch2 on 
one end, and Apache Lucene on the other end.


In order to do this, I'm going to create a derivative of the 
docmgr-autoimport script so that I can specify one file to import at a 
time.  I'll then create a Perl script which logs all details (such as 
timing, etc.) as the test progresses.


As test data, I have approximately 9,000 text files from Project Gutenberg 
ranging in size from a few hundred bytes to 4.5M.


I plan to test the speed of import of each file.  Then, I plan to write a 
web-robot in Perl that will test the speed and number of results returned.


Can anyone think of a validation of this test, or how I should configure 
PG to maximise import and search speed?  Can I maximise search speed and 
import speed, or are those things mutually exclusive?  (Note that this 
will be run on limited hardware - 900MHz Athlon with 512M of ram)


Has anyone ever compared TSearch2 to Lucene, as far as performance is 
concerned?


Thanks,
-Josh

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Missed index opportunity for outer join?

2005-12-06 Thread Tom Lane
[EMAIL PROTECTED] writes:
 On Mon, 5 Dec 2005, Tom Lane wrote:
 (Note to self: it is a bit odd that fac_id=261 is pushed down to become
 an indexqual in one case but not the other ...)

 I speculate that the seq_scan wasn't really the slow part
 compared to not using using both parts of the index in the 
 second part of the plan.  The table point_features is tens of
 thousands of rows, while the table facets is tens of millions.

Agreed, but it's still odd that it would use a seqscan in one case and
not the other.

I found the reason why the fac_id=261 clause isn't getting used as an
index qual; it's a bit of excessive paranoia that goes back to 2002.
I've fixed that for 8.1.1, but am still wondering about the seqscan
on the other side of the join.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess


Has anyone ever compared TSearch2 to Lucene, as far as performance is 
concerned?


I'll stay away from TSearch2 until it is fully integrated in the 
postgres core (like create index foo_text on foo (texta, textb) USING 
TSearch2). Because a full integration is unlikely to happen in the near 
future (as far as I know), I'll stick to Lucene.


Mike

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Oleg Bartunov

Folks,

tsearch2 and Lucene are very different search engines, so it'd be an unfair
comparison. If you need full access to metadata and instant indexing,
you'll probably find tsearch2 more suitable than Lucene. But if
you could live without those features and need to search read-only
archives, you need Lucene.

Tsearch2 integration into pgsql would be cool, but I see no problem with
using tsearch2 as an official extension module. After completing our
todo, which we hope will likely happen for the 8.2 release, you could
forget about Lucene and other engines :) We'll be available for development
in spring and we estimate about three months for our todo, so it's
really doable.

Oleg

On Tue, 6 Dec 2005, Michael Riess wrote:



Has anyone ever compared TSearch2 to Lucene, as far as performance is 
concerned?


I'll stay away from TSearch2 until it is fully integrated in the postgres 
core (like create index foo_text on foo (texta, textb) USING TSearch2). 
Because a full integration is unlikely to happen in the near future (as far 
as I know), I'll stick to Lucene.


Mike

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

 http://www.postgresql.org/docs/faq



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Bruce Momjian
Oleg Bartunov wrote:
 Folks,
 
 tsearch2 and Lucene are very different search engines, so it'd be unfair
 comparison. If you need full access to metadata and instant indexing
 you, probably, find tsearch2 is more suitable then Lucene. But, if 
 you could live without that features and need to search read only
 archives you need Lucene.
 
 Tsearch2 integration into pgsql would be cool, but, I see no problem to 
 use tsearch2 as an official extension module. After completing our
 todo, which we hope will likely  happens for 8.2 release, you could
 forget about Lucene and other engines :) We'll be available for developing
 in spring and we estimate about three months for our todo, so, it's
 really doable.

Agreed.  There isn't anything magical about a plug-in vs something
integrated, at least in PostgreSQL.  In other databases, plug-ins can't
fully function as integrated features, but in PostgreSQL, everything is
really a plug-in because it is all abstracted.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] postgresql performance tuning

2005-12-06 Thread Ameet Kini


This didn't get through the first time around, so resending it again.
Sorry for any duplicate entries.

Hello,

I have a question on postgres's performance tuning, in particular, the
vacuum and reindex commands. Currently I do a vacuum (without full) on all
of my tables.  However, it's noted in the docs (e.g.
http://developer.postgresql.org/docs/postgres/routine-reindex.html)
and on the lists here that indexes may still bloat after a while and hence
reindex is necessary.  How often do people reindex their tables out
there? I guess I'd have to update my cron scripts to do reindexing too
along with vacuuming but most probably at a much lower frequency than
vacuum.

But these scripts do these maintenance tasks at a fixed time (every few
hours, days, weeks, etc.) What I would like is to do these tasks on a need
basis.  So for vacuuming, by need I mean every few updates or some such
metric that characterizes my workload. Similarly, need for the reindex
command might mean every few updates or degree of bloat, etc.

I came across the pg_autovacuum daemon, which seems to do exactly what I
need for vacuums. However, it'd be great if there was a similar automatic
reindex utility, like say, a pg_autoreindex daemon. Are there any plans
for this feature?  If not, then would cron scripts be the next best
choice?

Thanks,
Ameet

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Memory Leakage Problem

2005-12-06 Thread Scott Marlowe
On Tue, 2005-12-06 at 03:22, Kathy Lo wrote:
 Hi,

 
 In this program, it will access this database server using simple and
 complex (joining tables) SQL Select statement and retrieve the matched
 rows. For each access, it will connect the database and disconnect it.
 
 I found that the memory of the databaser server nearly used up (total 2G RAM).
 
 After I stop the program, the used memory did not free.

Um.  What exactly do you mean?  Can we see the output of top and / or
free?  I'm guessing that what Tom said is right: you're just seeing the
normal state of how unix does things.

If your output of free looks like this:

-bash-2.05b$ free
             total       used       free     shared    buffers     cached
Mem:       6096912    6069588      27324          0     260728    5547264
-/+ buffers/cache:      261596    5835316
Swap:      4192880      16320    4176560

Then that's normal.

That's the output of free on a machine with 6 gigs that runs a reporting
database.  Note that while it shows almost ALL the memory as used, it is
being used by the kernel, which is a good thing.  Note that 5547264 or
about 90% of memory is being used as kernel cache.  That's a good thing.

Note you can also get yourself in trouble with top.  It's not uncommon
for someone to see a bunch of postgres processes each eating up 50 or
more megs of ram, and panic and think that they're running out of
memory, when, in fact, 44 meg for each of those processes is shared, and
the real usage per backend is 6 megs or less.
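
A quick way to see this on the server is to put free next to the per-backend
sizes reported by ps (standard procps options; remember that each backend's
RSS counts the shared buffers again):

$ free -m
$ ps -u postgres -o pid,rss,vsz,cmd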

Definitely grab yourself a good unix / linux sysadmin guide.  The "In a
Nutshell" books from O'Reilly are a good starting point.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Oleg Bartunov wrote:
 Tsearch2 integration into pgsql would be cool, but, I see no problem to 
 use tsearch2 as an official extension module.

 Agreed.  There isn't anything magical about a plug-in vs something
 integrated, at least in PostgreSQL.

The quality gap between contrib and the main system is a lot smaller
than it used to be, at least for those contrib modules that have
regression tests.  Main and contrib get equal levels of testing from
the buildfarm, so they're about on par as far as portability goes.
We could never say that before 8.1 ...

(Having said that, I think that tsearch2 will eventually become part
of core, but probably not for awhile yet.)

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Tom Lane
Ameet Kini [EMAIL PROTECTED] writes:
 I have a question on postgres's performance tuning, in particular, the
 vacuum and reindex commands. Currently I do a vacuum (without full) on all
 of my tables.  However, its noted in the docs (e.g.
 http://developer.postgresql.org/docs/postgres/routine-reindex.html)
 and on the lists here that indexes may still bloat after a while and hence
 reindex is necessary.  How often do people reindex their tables out
 there?

Never, unless you have actual evidence that your indexes are bloating.
It's only very specific use-patterns that have problems.
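
One low-tech way to gather that evidence is to watch relpages for the index
over time; the table and index names below are placeholders:

-- pages (8 kB each) and tuple estimates for a table and one of its indexes
SELECT relname, relkind, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('mytable', 'mytable_pkey');
-- VACUUM VERBOSE mytable;  -- also reports index pages and removable tuples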

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess

Bruce Momjian schrieb:

Oleg Bartunov wrote:

Folks,

tsearch2 and Lucene are very different search engines, so it'd be unfair
comparison. If you need full access to metadata and instant indexing
you, probably, find tsearch2 is more suitable then Lucene. But, if 
you could live without that features and need to search read only

archives you need Lucene.

Tsearch2 integration into pgsql would be cool, but, I see no problem to 
use tsearch2 as an official extension module. After completing our

todo, which we hope will likely  happens for 8.2 release, you could
forget about Lucene and other engines :) We'll be available for developing
in spring and we estimate about three months for our todo, so, it's
really doable.


Agreed.  There isn't anything magical about a plug-in vs something
integrated, as least in PostgreSQL.  In other database, plug-ins can't
fully function as integrated, but in PostgreSQL, everything is really a
plug-in because it is all abstracted.



I only remember evaluating TSearch2 about a year ago, and when I read 
statements like "Vacuum and/or database dump/restore work differently 
when using TSearch2, SQL scripts need to be executed", etc., I knew that I 
would not want to go there.


But I don't doubt that it works, and that it is a sane concept.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Bruce Momjian
Michael Riess wrote:
 Bruce Momjian schrieb:
  Oleg Bartunov wrote:
  Folks,
 
  tsearch2 and Lucene are very different search engines, so it'd be unfair
  comparison. If you need full access to metadata and instant indexing
  you, probably, find tsearch2 is more suitable then Lucene. But, if 
  you could live without that features and need to search read only
  archives you need Lucene.
 
  Tsearch2 integration into pgsql would be cool, but, I see no problem to 
  use tsearch2 as an official extension module. After completing our
  todo, which we hope will likely  happens for 8.2 release, you could
  forget about Lucene and other engines :) We'll be available for developing
  in spring and we estimate about three months for our todo, so, it's
  really doable.
  
  Agreed.  There isn't anything magical about a plug-in vs something
  integrated, as least in PostgreSQL.  In other database, plug-ins can't
  fully function as integrated, but in PostgreSQL, everything is really a
  plug-in because it is all abstracted.
 
 
 I only remember evaluating TSearch2 about a year ago, and when I read 
 statements like Vacuum and/or database dump/restore work differently 
 when using TSearch2, sql scripts need to be executed etc. I knew that I 
 would not want to go there.
 
 But I don't doubt that it works, and that it is a sane concept.

Good point.  I think we had some problems at that point because the API
was improved between versions.  Even if it had been integrated, we might
have had the same problem.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Vivek Khera


On Dec 6, 2005, at 12:44 PM, Ameet Kini wrote:


I have a question on postgres's performance tuning, in particular, the
vacuum and reindex commands. Currently I do a vacuum (without full)  
on all

of my tables.  However, its noted in the docs (e.g.
http://developer.postgresql.org/docs/postgres/routine-reindex.html)
and on the lists here that indexes may still bloat after a while  
and hence

reindex is necessary.  How often do people reindex their tables out


Why would you be running a version older than 7.4?  Index bloat is  
mostly a non-issue in recent releases of pg.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] High context switches occurring

2005-12-06 Thread Anjan Dave
I ran a fairly exhaustive pgbench on 2 test machines I have (quad dual-core
Intel and Opteron). Of course the Opteron was much faster, but
interestingly, it was experiencing 3x more context switches than the
Intel box (up to 100k, versus ~30k avg on the Dell). Both are RH4.0
64-bit/PG8.1 64-bit.

Sun (v40z):
-bash-3.00$ time pgbench -c 1000 -t 30 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 1000
number of transactions per client: 30
number of transactions actually processed: 30000/30000
tps = 45.871234 (including connections establishing)
tps = 46.092629 (excluding connections establishing)

real10m54.240s
user0m34.894s
sys 3m9.470s


Dell (6850):
-bash-3.00$ time pgbench -c 1000 -t 30 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 1000
number of transactions per client: 30
number of transactions actually processed: 30000/30000
tps = 22.088214 (including connections establishing)
tps = 22.162454 (excluding connections establishing)

real22m38.301s
user0m43.520s
sys 5m42.108s

Thanks,
Anjan

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 22, 2005 2:42 PM
To: Anjan Dave
Cc: Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring 

Anjan Dave [EMAIL PROTECTED] writes:
 Would this problem change it's nature in any way on the recent
Dual-Core
 Intel XEON MP machines?

Probably not much.

There's some evidence that Opterons have less of a problem than Xeons
in multi-chip configurations, but we've seen CS thrashing on Opterons
too.  I think the issue is probably there to some extent in any modern
SMP architecture.

regards, tom lane


---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] postgresql performance tuning

2005-12-06 Thread Ameet Kini


Hello,

I have a question on postgres's performance tuning, in particular, the
vacuum and reindex commands. Currently I do a vacuum (without full) on all
of my tables.  However, it's noted in the docs (e.g.
http://developer.postgresql.org/docs/postgres/routine-reindex.html)
and on the lists here that indexes may still bloat after a while and hence
reindex is necessary.  How often do people reindex their tables out
there? I guess I'd have to update my cron scripts to do reindexing too
along with vacuuming but most probably at a much lower frequency than
vacuum.

But these scripts do these maintenance tasks at a fixed time (every few
hours, days, weeks, etc.) What I would like is to do these tasks on a need
basis.  So for vacuuming, by need I mean every few updates or some such
metric that characterizes my workload. Similarly, need for the reindex
command might mean every few updates or degree of bloat, etc.

I came across the pg_autovacuum daemon, which seems to do exactly what I
need for vacuums. However, it'd be great if there was a similar automatic
reindex utility, like say, a pg_autoreindex daemon. Are there any plans
for this feature?  If not, then would cron scripts be the next best
choice?

Thanks,
Ameet

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] High context switches occurring

2005-12-06 Thread Vivek Khera


On Dec 6, 2005, at 2:04 PM, Anjan Dave wrote:


interestingly, it was experiencing 3x more context switches than the
Intel box (upto 100k, versus ~30k avg on Dell). Both are RH4.0


I'll assume that's context switches per second... so for the Opteron  
that's roughly 65,400,000 context switches and for the Dell roughly  
40,740,000 over the duration of the test.  Not so much of a difference...


You see, the opteron was context switching more because it was doing  
more work :-)




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Vivek Khera


On Dec 6, 2005, at 11:14 AM, Ameet Kini wrote:

need for vacuums. However, it'd be great if there was a similar  
automatic
reindex utility, like say, a pg_autoreindex daemon. Are there any  
plans

for this feature?  If not, then would cron scripts be the next best


what evidence do you have that you are suffering index bloat?  or are  
you just looking for solutions to problems that don't exist as an  
academic exercise? :-)



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Missed index opportunity for outer join?

2005-12-06 Thread Ron Mayer

Tom Lane wrote:

[EMAIL PROTECTED] writes:

On Mon, 5 Dec 2005, Tom Lane wrote:



I speculate that the seq_scan wasn't really the slow part
compared to not using using both parts of the index in the 
second part of the plan.  The table point_features is tens of

thousands of rows, while the table facets is tens of millions.


Agreed, but it's still odd that it would use a seqscan in one case and
not the other.


Hmm.  Unfortunately that was happening on a production system
and the amount of data in the tables has changed - and now I'm
no longer getting a seq_scan when I try to reproduce it.   That
system is still using 8.1.0.

The point_features table is pretty dynamic and it's possible
that the data changed between my 'explain analyze' statement in
the first post in this thread.   However since both of them
show an estimate of rows=948 and returned an actual of 917 I
don't think that happened.


I found the reason why the fac_id=261 clause isn't getting used as an
index qual; it's a bit of excessive paranoia that goes back to 2002.
I've fixed that for 8.1.1, but am still wondering about the seqscan
on the other side of the join.


I now have a development system with cvs-tip; but have not yet
reproduced the seq scan on it either.  I'm using the same data
that was in point_features with featureid=120 - but don't have
any good way of knowing what other data may have been in the table
at the time.   If desired, I could set up a cron job to periodically
explain analyze that query and see if it recurs.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] LVM and Postgres

2005-12-06 Thread August Zajonc

Rory Campbell-Lange wrote:

The server has a 250GB RAID10 (LSI 320-I + BBU) volume which I am
thinking of slicing up in the following way (Linux 2.6 kernel):

/ : ext3  :  47GB (root, home etc)
/boot : ext3  :  1GB
/tmp  : ext2  :  2GB
/usr  : ext3  :  4GB
/var  : ext3  :  6GB
---
60GB

VG:190GB approx
---
Initially divided so:  
/data : ext3  : 90GB

/postgres : xfs   : 40GB

This gives me left over space of roughly 60GB to extend into on the

volume group, which I can balance between the /data and /postgres
logical volumes as needed.

Are there any major pitfalls to this approach?

Thanks,
Rory



It looks like you are using fast disks and xfs for filesystem on the 
/postgresql partition. That's nice.


How many disks in the array?

One thing you miss is sticking a bunch of sequential log writes on a 
separate spindle as far as I can see with this? WAL / XFS (i think) both 
have this pattern. If you've got a fast disk and can do BBU write 
caching your WAL writes will hustle.


Others can probably speak a bit better on any potential speedups.

- August


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Missed index opportunity for outer join?

2005-12-06 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
 The point_features table is pretty dynamic and it's possible
 that the data changed between my 'explain analyze' statement in
 the first post in this thread.   However since both of them
 show an estimate of rows=948 and returned an actual of 917 I
 don't think that happened.

Yeah, I had considered the same explanation and rejected it for the same
reason.  Also, the difference in estimated cost is significant (265.85
for the seqscan vs 172.17 for the bitmap scan) so it's hard to think
that a small change in stats --- so small as to not reflect in estimated
row count --- would change the estimate by that much.

[ thinks some more... ]  Of course, what we have to remember is that the
planner is actually going to choose based on the ultimate join cost, not
on the subplan costs.  The reason the seqscan survived initial
comparisons at all is that it has a cheaper startup cost (less time to
return the first tuple) than the bitmap scan, and this will be reflected
into a cheaper startup cost for the overall nestloop.  The extra hundred
units of total cost would only reflect into the nestloop total cost ---
and there, they would be considered down in the noise compared to a
90k total estimate.  So probably what happened is that the planner
preferred this plan on the basis that the total costs are the same to
within estimation error while the startup cost is definitely less.

In this explanation, the reason for the change in plans over time could
be a change in the statistics for the other table.  Is facets more
dynamic than point_features?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Russell Garrett

On 6 Dec 2005, at 16:47, Joshua Kramer wrote:
Has anyone ever compared TSearch2 to Lucene, as far as performance  
is concerned?


In our experience (small often-updated documents) Lucene leaves  
tsearch2 in the dust. This probably has a lot to do with our usage  
pattern though. For our usage it's very beneficial to have the index  
on a separate machine to the data, however in many cases this won't  
make sense. Lucene is also a lot easier to cluster than Postgres  
(it's simply a matter of NFS-mounting the index).


Russ Garrett
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Bruno Wolff III
On Tue, Dec 06, 2005 at 10:52:57 +0100,
  Csaba Nagy [EMAIL PROTECTED] wrote:
 Joost,
 
 Why do you use an offset here ? I guess you're traversing the table
 somehow, in this case it would be better to remember the last zipcode +
 housenumber and put an additional condition to get the next bigger than
 the last one you've got... that would go for the index on
 zipcode+housenumber and be very fast. The big offset forces postgres to
 traverse that many entries until it's able to pick the one row for the
 result...

The other problem with saving an offset is that unless the data isn't changing,
or you are doing all of the searches in one serialized transaction, a
fixed offset might not put you back where you left off.
Using the last key, instead of counting records, is normally a better way
to do this.
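
For example, a sketch of the last-key approach (table and column names are
only placeholders), assuming an index on (zipcode, housenumber):

  -- first page
  SELECT * FROM addresses
   ORDER BY zipcode, housenumber
   LIMIT 100;

  -- next page: continue strictly after the last (zipcode, housenumber) seen
  SELECT * FROM addresses
   WHERE zipcode > '1234AB'
      OR (zipcode = '1234AB' AND housenumber > 12)
   ORDER BY zipcode, housenumber
   LIMIT 100;

The index can then be used to start near the saved position instead of
counting and discarding OFFSET rows, and concurrent inserts or deletes
don't shift the page boundary under you.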

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Alan Stange

Vivek Khera wrote:


On Dec 6, 2005, at 11:14 AM, Ameet Kini wrote:

need for vacuums. However, it'd be great if there was a similar 
automatic

reindex utility, like say, a pg_autoreindex daemon. Are there any plans
for this feature?  If not, then would cron scripts be the next best


what evidence do you have that you are suffering index bloat?  or are 
you just looking for solutions to problems that don't exist as an 
academic exercise? :-) 


The files for the two indices on a single table used 7.8GB of space 
before a reindex, and 4.4GB after.   The table had been reindexed over 
the weekend and a vacuum was completed on the table about 2 hours ago.


The two indices are now 3.4GB smaller.   I don't think this counts as 
bloat, because of our use case.  Even so, we reindex our whole database 
every weekend.


-- Alan


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Michael Riess

Ameet Kini schrieb:


This didn't get through the first time around, so resending it again.
Sorry for any duplicate entries.

Hello,

I have a question on postgres's performance tuning, in particular, the
vacuum and reindex commands. Currently I do a vacuum (without full) on all
of my tables.  


I'm curious ... why no full vacuum? I bet that the full vacuum will 
compact your (index) tables as much as a reindex would.


I guess the best advice is to increase FSM and to use autovacuum.
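
For example (the numbers are purely illustrative; size them from the free
space map summary that a database-wide VACUUM VERBOSE prints at the end):

  # postgresql.conf
  max_fsm_pages     = 200000    # pages with free space to remember
  max_fsm_relations = 1000      # relations to track in the FSM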

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Missed index opportunity for outer join?

2005-12-06 Thread Ron Mayer

Tom Lane wrote:
...planner is actually going to choose based on the ultimate join cost, 
not on the subplan costs...


In this explanation, the reason for the change in plans over time could
be a change in the statistics for the other table.  Is facets more
dynamic than point_features?


In total rows changing it's more dynamic, but percentage-wise it's
less dynamic (point_features probably turns over 50% of its rows
in a day -- while facets turns over about 3% per day -- but facets
is 1000X larger).

Facets is a big table with rather odd distributions of values.
Many of the values in the indexed columns show up only
once, others show up hundreds-of-thousands of times.  Perhaps
an analyze ran and just randomly sampled differently creating
different stats on that table?

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Tom Lane
Alan Stange [EMAIL PROTECTED] writes:
 Vivek Khera wrote:
 what evidence do you have that you are suffering index bloat?

 The files for the two indices on a single table used 7.8GB of space 
 before a reindex, and 4.4GB after.

That's not bloat ... that's pretty nearly in line with the normal
expectation for a btree index, which is about 2/3rds fill factor.
If the compacted index were 10X smaller then I'd agree that you have
a bloat problem.

Periodic reindexing on this scale is not doing a lot for you except
thrashing your disks --- you're just giving space back to the OS that
will shortly be sucked up again by the same index.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Context switching and Xeon processors

2005-12-06 Thread Brandon Metcalf
We're running a dual Xeon machine with hyperthreading enabled and
PostgreSQL 8.0.3.  Below is the type of CPUs:

  processor   : 3
  vendor_id   : GenuineIntel
  cpu family  : 15
  model   : 4
  model name  : Intel(R) Xeon(TM) CPU 3.20GHz
  stepping: 1
  cpu MHz : 3200.274
  cache size  : 1024 KB
  ...

We've been tuning the kernel (2.4 SMP flavor) and have improved
performance quite a bit.  I'm now wondering if turning off HT will
improve performance even more.  Based on the vmstat output below, is
the context switching typical or too high?  And what is the latest on
the state of PostgreSQL running on Xeon processors with HT turned on?
I searched the archives, but couldn't discern anything definitive.

   r  b   swpd   free   buff   cache  si  so     bi     bo    in    cs us sy wa id
   1  0 135944  64612  17136 3756816   0   0      0    210   154   178  2  0  4 94
   1  0 135940  46600  17204 3754496   0   0      1   1231   442  3658  7  3 10 80
   1  3 135940  51228  17240 3754680   0   0      0   1268   255  2659  4  1 14 81
   1  0 135940  58512  17300 3754684   0   0      0   1818   335  1526  2  1 32 65
   1  1 135940  18104  17328 3806516   0   0  17670    476  1314  1962  2  2 41 56
   0  1 135940  17776  17232 3811620   0   0  23193    394  1600  2097  2  2 53 44
   0  1 135940  17944  17188 3809636   0   0  25459    349  1547  2013  2  2 50 46
   0  3 135940  18816  15184 3798312   0   0  24284   1328  1529  4730  6  5 53 36
   0  6 135940  23536   6060 3817088   0   0  27376   1332  1350  2628  2  3 56 39
   0  5 135940  18008   6036 3827132   0   0  18806   1539  1410  1416  1  2 61 36
   0  5 135940  18492   5708 3826660   0   0   3540  10354   736   955  2  2 76 20
   0  3 135940  18940   5788 3829864   0   0   2308   7506   707   519  2  1 81 15
   1  4 135940  18980   5820 3828836   0   0    138   3503   556   261  1  0 74 24
   0 10 135940  39332   5896     324   0   0    579   2805   621  4104  7  4 54 35
   0  4 135936  37816   5952 3791404   0   0    260   1887   384  1574  2  1 40 57
   0  5 135936  29552   5996 3802260   0   0    290   1642   434  1944  3  1 38 58


-- 
Brandon

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Missed index opportunity for outer join?

2005-12-06 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 In this explanation, the reason for the change in plans over time could
 be a change in the statistics for the other table.  Is facets more
 dynamic than point_features?

 Facets is a big table with rather odd distributions of values.
 Many of the values in the indexed columns show up only
 once, others show up hundreds-of-thousands of times.  Perhaps
 an analyze ran and just randomly sampled differently creating
 different stats on that table?

If you have background tasks doing ANALYZEs then this explanation seems
plausible enough.  I'm willing to accept it anyway ...

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Missed index opportunity for outer join?

2005-12-06 Thread Ron Mayer

Tom Lane wrote:

If you have background tasks doing ANALYZEs then this explanation seems
plausible enough.  I'm willing to accept it anyway ...


Yup, there are such tasks.   I could dig through logs to try to confirm
or reject it; but I think it's reasonably likely that this happened.
Basically, data gets added to that table as it becomes ready from other
systems, and after each batch a vacuum analyze is run.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] LVM and Postgres

2005-12-06 Thread Rory Campbell-Lange
Hi August. Thanks very much for your mail.

On 06/12/05, August Zajonc ([EMAIL PROTECTED]) wrote:
 Rory Campbell-Lange wrote:
 The server has a 250GB RAID10 (LSI 320-I + BBU) volume which I am
 thinking of slicing up in the following way (Linux 2.6 kernel):
 
 / : ext3  :  47GB (root, home etc)
 /boot : ext3  :  1GB
 /tmp  : ext2  :  2GB
 /usr  : ext3  :  4GB
 /var  : ext3  :  6GB
 ---
 60GB
 
 VG:190GB approx
 ---
 Initially divided so:  
 /data : ext3  : 90GB
 /postgres : xfs   : 40GB
 
 This gives me left over space of roughly 60GB to extend into on the
 volume group, which I can balance between the /data and /postgres
 logical volumes as needed.
 
 It looks like you are using fast disks and xfs for filesystem on the 
 /postgresql partition. That's nice.
 
 How many disks in the array?

Four.

 One thing you miss is sticking a bunch of sequential log writes on a 
 separate spindle as far as I can see with this? WAL / XFS (i think) both 
 have this pattern. If you've got a fast disk and can do BBU write 
 caching your WAL writes will hustle.

Yes, we don't have any spare disks unfortunately. We have enabled the
BBU write, so we are hoping for good performance. I'd be grateful for
some advice on dd/bonnie++ tests for checking this.

 Others can probably speak a bit better on any potential speedups.

I'd better test extending the Logical Volumes too!

Many thanks
Rory



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Alan Stange

Tom Lane wrote:

Alan Stange [EMAIL PROTECTED] writes:
  

Vivek Khera wrote:


what evidence do you have that you are suffering index bloat?
  


  
The files for the two indices on a single table used 7.8GB of space 
before a reindex, and 4.4GB after.



That's not bloat ... that's pretty nearly in line with the normal
expectation for a btree index, which is about 2/3rds fill factor.
If the compacted index were 10X smaller then I'd agree that you have
a bloat problem.
  

I wrote "I don't think this counts as bloat", and I still don't.

-- Alan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Context switching and Xeon processors

2005-12-06 Thread Brandon Metcalf
t == [EMAIL PROTECTED] writes:

 t Brandon Metcalf [EMAIL PROTECTED] writes:
 t  We've been tuning the kernel (2.4 SMP flavor) and have improved
 t  performance quite a bit.  I'm now wondering if turning off HT will
 t  improve performance even more.  Based on the vmstat output below, is
 t  the context switching typical or too high?

 t Given that your CPU usage is hovering around 2%, it's highly unlikely
 t that you'll be able to measure any change at all by fiddling with HT.
 t What you need to be working on is disk I/O --- the 80% wait number
 t is what should be getting your attention, not the CS number.

 t (FWIW, on the sort of hardware you're talking about, I wouldn't worry
 t about CS rates lower than maybe 1/sec --- the hardware can sustain
 t well over 10x that.)


Yes, I agree the disk I/O is an issue and that's what we've been
addressing with the tuning we've been doing and have been able to
improve.  I think that we really need to go to a RAID 10 array to
address the I/O issue, but thought I would investigate the context
switching issue.

Thanks for the information.

-- 
Brandon

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Context switching and Xeon processors

2005-12-06 Thread Steinar H. Gunderson
On Tue, Dec 06, 2005 at 03:01:02PM -0600, Brandon Metcalf wrote:
 We're running a dual Xeon machine with hyperthreading enabled and
 PostgreSQL 8.0.3.

The two single most important things that will help you with high rates of
context switching:

  - Turn off hyperthreading.
  - Upgrade to 8.1.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] two disks - best way to use them?

2005-12-06 Thread David Lang

On Tue, 6 Dec 2005, Thomas Harold wrote:


Ron wrote:


For accuracy's sake, which exact config did you finally use?

How did you choose the config you finally used?  Did you test the three 
options or just pick one?


(Note: I'm not the original poster.)

I just picked the option of putting the data/pg_xlog directory (WAL) on a 2nd 
set of spindles.  That was the easiest thing for me to change on this test 
box.


The test server is simply a Gentoo box running software RAID and LVM2. The 
primary disk set is 2x7200RPM 300GB drives and the secondary disk set is 
2x5400RPM 300GB drives.  Brand new install of PGSQL 8.1, with mostly default 
settings (I changed FSM pages to be a higher value, max_fsm_pages = 15). 
PGSQL was given its own ext3 32GB LVM volume on the primary disk set 
(2x7200RPM).  Originally, all files were on the primary disk.


the WAL is more sensitive to drive speeds than the data is, so you may 
pick up a little more performance by switching the WAL to the 7200 rpm 
drives instead of the 5400 rpm drives.


if you see a noticeable difference with this, consider buying a pair of 
smaller, but faster drives (10k or 15k rpm drives, or a solid-state 
drive). you can test this (with significant data risk) by putting the WAL 
on a ramdisk and see what your performance looks like.


David Lang

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] High context switches occurring

2005-12-06 Thread Tom Lane
Anjan Dave [EMAIL PROTECTED] writes:
 -bash-3.00$ time pgbench -c 1000 -t 30 pgbench
 starting vacuum...end.
 transaction type: TPC-B (sort of)
 scaling factor: 1
 number of clients: 1000
 number of transactions per client: 30
 number of transactions actually processed: 30000/30000
 tps = 45.871234 (including connections establishing)
 tps = 46.092629 (excluding connections establishing)

I can hardly think of a worse way to run pgbench :-(.  These numbers are
about meaningless, for two reasons:

1. You don't want number of clients (-c) much higher than scaling factor
(-s in the initialization step).  The number of rows in the branches
table will equal -s, and since every transaction updates one
randomly-chosen branches row, you will be measuring mostly row-update
contention overhead if there's more concurrent transactions than there
are rows.  In the case -s 1, which is what you've got here, there is no
actual concurrency at all --- all the transactions stack up on the
single branches row.

2. Running a small number of transactions per client means that
startup/shutdown transients overwhelm the steady-state data.  You should
probably run at least a thousand transactions per client if you want
repeatable numbers.

Try something like -s 10 -c 10 -t 3000 to get numbers reflecting test
conditions more like what the TPC council had in mind when they designed
this benchmark.  I tend to repeat such a test 3 times to see if the
numbers are repeatable, and quote the middle TPS number as long as
they're not too far apart.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] postgresql performance tuning

2005-12-06 Thread Michael Stone

On Tue, Dec 06, 2005 at 04:03:22PM -0600, Ameet Kini wrote:

I'm running postgresql v8.0 and my problem is that running vacuum on my
indices is blazing fast (up to 10x faster) AFTER running reindex. For a
table with only 1 index, the time to do a vacuum (without full) went down
from 45 minutes to under 3 minutes.


I've also noticed a fairly large increase in vacuum speed after a
reindex. (To the point where the reindex + vacuum was faster than just a
vacuum.)

Mike Stone

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] LVM and Postgres

2005-12-06 Thread Michael Stone

On Tue, Dec 06, 2005 at 09:36:23PM +, Rory Campbell-Lange wrote:

Yes, we don't have any spare disks unfortunately. We have enabled the
BBU write, so we are hoping for good performance.


Even if you don't use separate disks you'll probably get better
performance by putting the WAL on a separate ext2 partition. xfs gives
good performance for the table data, but is not particularly good for
the WAL. 


Mike Stone

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] LVM and Postgres

2005-12-06 Thread Alex Turner
I would argue that you almost certainly won't gain by doing that, as you
will create a new place even further away for the disk head to seek to,
instead of just another file on the same FS that is probably closer to
the current head position.

Alex

On 12/6/05, Michael Stone [EMAIL PROTECTED] wrote:
 On Tue, Dec 06, 2005 at 09:36:23PM +, Rory Campbell-Lange wrote:
 Yes, we don't have any spare disks unfortunately. We have enabled the
 BBU write, so we are hoping for good performance.

 Even if you don't use separate disks you'll probably get better
 performance by putting the WAL on a separate ext2 partition. xfs gives
 good performance for the table data, but is not particularly good for
 the WAL.

 Mike Stone

 ---(end of broadcast)---
 TIP 6: explain analyze is your friend


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] LVM and Postgres

2005-12-06 Thread Michael Stone

On Tue, Dec 06, 2005 at 07:52:25PM -0500, Alex Turner wrote:

I would argue that almost certainly won't by doing that as you will
create a new place even further away for the disk head to seek to
instead of just another file on the same FS that is probably closer to
the current head position.


I would argue that you should benchmark it instead of speculating. You
are perhaps underestimating the effect of the xfs log. (Ordinarily xfs
has great performance, but it seems to be fairly lousy at
fsync/osync/etc operations in my benchmarks; my wild speculation is that
the sync forces a log flush.) At any rate you're going to have a lot of
head movement on any reasonably sized filesystem anyway, and I'm not
convinced that hoping that your data will happen to land close to your log is
a valid, repeatable optimization technique. Note that the WAL will
wander around the disk as files are created and deleted, whereas tables
are basically updated in place.

Mike Stone

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Christopher Kings-Lynne

...

So you'll avoid a non-core product and instead only use another non-core 
product...?


Chris

Michael Riess wrote:


Has anyone ever compared TSearch2 to Lucene, as far as performance is 
concerned?



I'll stay away from TSearch2 until it is fully integrated in the 
postgres core (like create index foo_text on foo (texta, textb) USING 
TSearch2). Because a full integration is unlikely to happen in the near 
future (as far as I know), I'll stick to Lucene.


Mike

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Faster db architecture for a twisted table.

2005-12-06 Thread Mitchell Skinner
On Sat, 2005-12-03 at 23:00 +, Rodrigo Madera wrote:
 CREATE TABLE person(
id bigint PRIMARY KEY,
first_name TEXT,
age INT,
mother bigint REFERENCES person,
   father bigint REFERENCES person,
   siblings bigint[]  -- i.e., an array of bigints
 );
 
 Well, this is ok, but imagine a search for  brothers of person id
 34. We would have to search inside the record's 'siblings' array. Is
 this a bad design? is this going to be slow?

Well, I don't know how close this example is to your actual problem, but
the siblings array is redundant, AFAICS.  If you got rid of it, you
could query for full sibling brothers with something like (not tested):

select bro.* from
person p inner join person bro
on (p.mother = bro.mother)
   AND (p.father = bro.father)
where
bro.sex='M' and p.id=34 and bro.id <> p.id

...assuming you added a sex field, which you would need in any case to
query for brothers.

You could query for half-siblings by changing the AND into an OR, I
think.
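
Something like this (equally untested), again excluding the person himself:

select bro.* from
person p inner join person bro
on (p.mother = bro.mother)
   OR (p.father = bro.father)
where
bro.sex='M' and p.id=34 and bro.id <> p.id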

Mitch


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] need help

2005-12-06 Thread Jenny
I'm running PostgreSQL 8.0.3 on i686-pc-linux-gnu (Fedora Core 2). I've been
dealing with Psql for more than 2 years now, but I've never had this case
before.

I have a table that has about 20 rows in it.

           Table "public.s_apotik"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 obat_id      | character varying(10)       | not null
 stock        | numeric                     | not null
 s_min        | numeric                     | not null
 s_jual       | numeric                     |
 s_r_jual     | numeric                     |
 s_order      | numeric                     |
 s_r_order    | numeric                     |
 s_bs         | numeric                     |
 last_receive | timestamp without time zone |
Indexes:
    "s_apotik_pkey" PRIMARY KEY, btree (obat_id)
   
When I try to UPDATE one of the rows, nothing happens for a very long time.
First I ran it in PgAdminIII, where I could see the milliseconds growing as I
waited. Then I stopped the query, because the time it was taking is unbelievably
wrong.

Then I try to run the query from the psql shell. For example, the table has
obat_id : A, B, C, D.
db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='A';
( nothing happens.. I press the Ctrl-C to stop it. This is what comes out
:)
Cancel request sent
ERROR: canceling query due to user request

(If I try another obat_id)
db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='B';
(Less than a second, this is what comes out :)
UPDATE 1

I can't do anything to that row. I can't DELETE it. Can't DROP the table. 
I want this data out of my database.
What should I do? It's like there's a falsely pointed index here.
Any help would be very much appreciated.


Regards,
Jenny Tania



__ 
Yahoo! DSL – Something to write home about. 
Just $16.99/mo. or less. 
dsl.yahoo.com 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] need help (not anymore)

2005-12-06 Thread Jenny
I ran the VACUUM as you suggested, but still got no response from the server. So I
decided to DROP the database. I got a message that the database was being used.
I closed every application that was accessing it, but the message remained.

I checked the server processes (ps -ax). There were lots of  'UPDATE is waiting
...' on the list. I killed them all. I backed up the current database, dropped
it, and restored from the backup file I had just made. 
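
In hindsight, a gentler first step would have been to look at pg_locks to
see which backend was holding the lock all those UPDATEs were waiting on;
a sketch:

  SELECT l.pid, l.mode, l.granted, c.relname
    FROM pg_locks l
    LEFT JOIN pg_class c ON c.oid = l.relation
   ORDER BY l.granted, l.pid;

Rows with granted = f are the waiters; the pid holding the conflicting
granted lock is the one to investigate (or kill) first.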

Don't really know why this happened, but thankfully now, everything's normal.
Thank you, guys.

Regards,
Jenny Tania



__ 
Yahoo! DSL – Something to write home about. 
Just $16.99/mo. or less. 
dsl.yahoo.com 


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Join the same row

2005-12-06 Thread Edison Azzi

Hi,

I'm trying to optimize some selects between 2 tables, and the best way I
found was to alter the first table and add the fields of the 2nd table. I
adjusted the contents and now I have only one table with all the info that
I need. Now my problem arises: because of legacy queries, I decided to make
a Rule that replaces the 2nd table.

Until now all worked well, but I found that when I make a join between the
result table and the Rule, even though it is the same row in the same table,
the optimizer generates two accesses for the same row:

cta_pag is the table and ctapag_adm is the rule.

CREATE OR REPLACE RULE "_RETURN" AS
    ON SELECT TO ctapag_adm DO INSTEAD
    SELECT cta_pag.nrlancto, cta_pag.codconta, cta_pag.frequencia,
           cta_pag.nrlanctopai
      FROM cta_pag
     WHERE cta_pag.origem = 'A'::bpchar;

This is one of the legacy queries:

select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto and 
p.nrlancto = 21861;


EXPLAIN:
Nested Loop  (cost=0.00..11.49 rows=1 width=443) (actual time=0.081..0.088 rows=1 loops=1)
  ->  Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1 width=408) (actual time=0.044..0.046 rows=1 loops=1)
        Index Cond: (nrlancto = 21861::numeric)
  ->  Index Scan using cta_pag_pk on cta_pag  (cost=0.00..5.74 rows=1 width=35) (actual time=0.023..0.025 rows=1 loops=1)
        Index Cond: (21861::numeric = nrlancto)
        Filter: (origem = 'A'::bpchar)
Total runtime: 0.341 ms


   This results in twice the access time.

Accessing the same row just once:

select * from cta_pag p where  p.nrlancto = 21861

EXPLAIN:
Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1 width=408) (actual time=0.044..0.047 rows=1 loops=1)
  Index Cond: (nrlancto = 21861::numeric)
Total runtime: 0.161 ms


   Is there a way to force the optimizer to understand that it is the same
row?
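
For new code the extra access can of course be avoided by hitting the table
once and applying the rule's filter directly, for example:

select p.* from cta_pag p
 where p.nrlancto = 21861 and p.origem = 'A';

but that doesn't help the legacy queries that go through ctapag_adm.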


   Thanks,
   Edison


--
Edison Azzi
edisonazzi (at ) terra ( dot ) com ( dot ) br


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] Query Fails with error calloc - Cannot alocate memory

2005-12-06 Thread Howard Oblowitz

Hi 

I am trying to run a query that selects 26 million rows from a

table with 68 byte rows.

When run on the Server via psql the following error occurs:

calloc : Cannot allocate memory

When run via ODBC from Cognos Framework Manager it only works

if we limit the retrieval to 3 million rows.

I notice that the memory used by the query when run on the Server increases

to about 2.4 GB before the query fails.

Postgres version is 7.3.4

Running on Linux Redhat 7.2

4 GB memory

7 Processor 2.5 Ghz

Shmmax set to 2 GB

Configuration Parameters

Shared Buffers         12 288
Max Connections        16
Wal buffers            24
Sort Mem               40960
Vacuum Mem             80192
Checkpoint Timeout     600
Enable Seqscan         false
Effective Cache Size   20


Results of explain analyze and explain analyze verbose:

explain analyze select * from flash_by_branches;

 QUERY PLAN 

--

Seq Scan on flash_by_branches (cost=1.00..100567542.06 rows=26854106 width=68) (actual time=12.14..103936.35 rows=26854106 loops=1)

Total runtime: 122510.02 msec

(2 rows)

explain analyze verbose:

{ SEQSCAN

 :startup_cost 1.00

 :total_cost 100567542.06

 :rows 26854106

 :width 68

 :qptargetlist (

 { TARGETENTRY

 :resdom

 { RESDOM

 :resno 1

 :restype 1043

 :restypmod 8

 :resname br_code

 :reskey 0

 :reskeyop 0

 :ressortgroupref 0

 :resjunk false

 }

 :expr

 { VAR

 :varno 1

 :varattno 1

 :vartype 1043

 :vartypmod 8

 :varlevelsup 0

 :varnoold 1

 :varoattno 1

 }

 }

 { TARGETENTRY

 :resdom

 { RESDOM

 :resno 2

 :restype 23

 :restypmod -1

 :resname fty_code

 :reskey 0

 :reskeyop 0

 :ressortgroupref 0

 :resjunk false

 }

 :expr

 { VAR

 :varno 1

 :varattno 2

 :vartype 23

 :vartypmod -1

 :varlevelsup 0

 :varnoold 1

 :varoattno 2

 }

 }

 { TARGETENTRY

 :resdom

 { RESDOM

 :resno 3

 :restype 1082

 :restypmod -1

 :resname period

 :reskey 0

 :reskeyop 0

 :ressortgroupref 0

 :resjunk false

 }

 :expr

 { VAR

 :varno 1

 :varattno 3

 :vartype 1082

 :vartypmod -1

 :varlevelsup 0

 :varnoold 1

 :varoattno 3

 }

 }

 { TARGETENTRY

 :resdom

 { RESDOM

 :resno 4

 :restype 1700

 :restypmod 786436

 :resname value

 :reskey 0

 :reskeyop 0

 :ressortgroupref 0

 :resjunk false

 }

 :expr

 { VAR

 :varno 1

 :varattno 4

 :vartype 1700

 :vartypmod 786436

 :varlevelsup 0

 :varnoold 1

 :varoattno 4

 }

 }

 { TARGETENTRY

 :resdom

 { RESDOM

 :resno 7

 :restype 1700

 :restypmod 786438

 :resname value1

 :reskey 0

 :reskeyop 0

 :ressortgroupref 0

 :resjunk false

 }

 :expr

 { VAR

 :varno 1

 :varattno 7

 :vartype 1700

 :vartypmod 786438

 :varlevelsup 0

 :varnoold 1

 :varoattno 7

 }

 }

 )

 :qpqual 

 :lefttree 

 :righttree 

 :extprm ()

 :locprm ()

 :initplan 

 :nprm 0

 :scanrelid 1

 }

Seq Scan on flash_by_branches (cost=1.00..100567542.06 rows=26854106 width=68) (actual time=6.59..82501.15 rows=2685

4106 loops=1)

Total runtime: 102089.00 msec

(196 rows)



Please assist.

Thanks,

Howard Oblowitz



---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.859 / Virus Database: 585 - Release Date: 14/02/2005





Re: [PERFORM] LVM and Postgres

2005-12-06 Thread Kevin Brown
Michael Stone wrote:
 Note that the WAL will
 wander around the disk as files are created and deleted, whereas tables
 are basically updated in place.

Huh?  I was rather under the impression that the WAL files (in
pg_xlog, right?) were reused once they'd been created, so their
locations on the disk should remain the same, as should their data
blocks (roughly, depending on the implementation of the filesystem, of
course).


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Query Fails with error calloc - Cannot alocate memory

2005-12-06 Thread Christopher Kings-Lynne
If you're trying to retrieve 26 million rows into RAM in one go of 
course it'll be trouble.


Just use a cursor.  (DECLARE/FETCH/MOVE)

Chris


Howard Oblowitz wrote:

Hi …

I am trying to run a query that selects 26 million rows from a

table with 68 byte rows.

When run on the Server via psql the following error occurs:

calloc : Cannot allocate memory

When run via ODBC from Cognos Framework Manager only works

if we limit the retrieval to 3 million rows.

I notice that the memory used by the query when run on the Server increases

to about 2.4 GB before the query fails.

Postgres version is 7.3.4

Running on Linux Redhat 7.2

4 GB memory

7 Processor 2.5 Ghz

Shmmax set to 2 GB

Configuration Parameters

Shared Buffers  12 288

Max Connections 16

Wal buffers 24

Sort Mem40960

Vacuum Mem  80192

Checkpoint Timeout  600

Enable Seqscan  false

Effective Cache Size20


Results of explain analyze and expain analyze verbose:

explain analyze select * from flash_by_branches;

  QUERY 
PLAN 


--

 Seq Scan on flash_by_branches  (cost=1.00..100567542.06 
rows=26854106 width=68) (actual time=12.14..103936.35 rows=26854106 loops=1)


 Total runtime: 122510.02 msec

(2 rows)

explain analyze verbose:

{ SEQSCAN

:startup_cost 1.00

:total_cost 100567542.06

:rows 26854106

:width 68

:qptargetlist (

   { TARGETENTRY

   :resdom

  { RESDOM

  :resno 1

  :restype 1043

  :restypmod 8

  :resname br_code

  :reskey 0

  :reskeyop 0

  :ressortgroupref 0

  :resjunk false

  }

   :expr

  { VAR

  :varno 1

  :varattno 1

  :vartype 1043

  :vartypmod 8

  :varlevelsup 0

  :varnoold 1

  :varoattno 1

  }

   }

   { TARGETENTRY

   :resdom

  { RESDOM

  :resno 2

  :restype 23

  :restypmod -1

  :resname fty_code

  :reskey 0

  :reskeyop 0

  :ressortgroupref 0

  :resjunk false

  }

   :expr

  { VAR

  :varno 1

  :varattno 2

  :vartype 23

  :vartypmod -1

  :varlevelsup 0

  :varnoold 1

  :varoattno 2

}

   }

   { TARGETENTRY

   :resdom

  { RESDOM

  :resno 3

  :restype 1082

  :restypmod -1

  :resname period

  :reskey 0

  :reskeyop 0

  :ressortgroupref 0

  :resjunk false

  }

   :expr

  { VAR

  :varno 1

  :varattno 3

  :vartype 1082

  :vartypmod -1

  :varlevelsup 0

  :varnoold 1

  :varoattno 3

  }

   }

   { TARGETENTRY

   :resdom

  { RESDOM

  :resno 4

  :restype 1700

  :restypmod 786436

  :resname value

  :reskey 0

  :reskeyop 0

  :ressortgroupref 0

  :resjunk false

  }

   :expr

  { VAR

  :varno 1

  :varattno 4

  :vartype 1700

  :vartypmod 786436

  :varlevelsup 0

  :varnoold 1

  :varoattno 4

  }

   }

   { TARGETENTRY

   :resdom

  { RESDOM

  :resno 7

  :restype 1700

  :restypmod 786438

  :resname value1

  :reskey 0

  :reskeyop 0

  :ressortgroupref 0

  :resjunk false

  }

   :expr

  { VAR

  :varno 1

  :varattno 7

  :vartype 1700

  :vartypmod 786438

  :varlevelsup 0

  :varnoold 1

  :varoattno 7

  }

   }

)

:qpqual 

:lefttree 

:righttree 

:extprm ()

:locprm ()

:initplan 

:nprm 0

:scanrelid 1

}

 Seq Scan on flash_by_branches  (cost=1.00..100567542.06 
rows=26854106 width=68) (actual time=6.59..82501.15 rows=2685


4106 loops=1)

 Total runtime: 102089.00 msec

(196 rows)



Please assist.

Thanks,

Howard Oblowitz



---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.859 / Virus Database: 585 - Release Date: 14/02/2005
 




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] High context switches occurring

2005-12-06 Thread Bruce Momjian
Tom Lane wrote:
 Anjan Dave [EMAIL PROTECTED] writes:
  -bash-3.00$ time pgbench -c 1000 -t 30 pgbench
  starting vacuum...end.
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1000
  number of transactions per client: 30
  number of transactions actually processed: 30000/30000
  tps = 45.871234 (including connections establishing)
  tps = 46.092629 (excluding connections establishing)
 
 I can hardly think of a worse way to run pgbench :-(.  These numbers are
 about meaningless, for two reasons:
 
 1. You don't want number of clients (-c) much higher than scaling factor
 (-s in the initialization step).  The number of rows in the branches
 table will equal -s, and since every transaction updates one

Should we throw a warning when someone runs the test this way?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] High context switches occurring

2005-12-06 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Tom Lane wrote:
 1. You don't want number of clients (-c) much higher than scaling factor
 (-s in the initialization step).

 Should we throw a warning when someone runs the test this way?

Not a bad idea (though of course only for the standard scripts).
Tatsuo, what do you think?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Query Fails with error calloc - Cannot alocate memory

2005-12-06 Thread Neil Conway
On Mon, 2005-12-05 at 09:42 +0200, Howard Oblowitz wrote:
 I am trying to run a query that selects 26 million rows from a
 table with 68 byte rows.
 
 When run on the Server via psql the following error occurs:
 
 calloc : Cannot allocate memory

That's precisely what I'd expect: the backend will process the query and
begin sending back the entire result set to the client. The client will
attempt to allocate a local buffer to hold the entire result set, which
obviously fails in this case.

You probably want to explicitly create and manipulate a cursor via
DECLARE, FETCH, and the like -- Postgres will not attempt to do this
automatically (for good reason).
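
A minimal sketch of the cursor approach (the fetch size is arbitrary):

  BEGIN;
  DECLARE flash_cur CURSOR FOR
      SELECT * FROM flash_by_branches;
  FETCH FORWARD 10000 FROM flash_cur;
  -- repeat the FETCH until it returns no rows
  CLOSE flash_cur;
  COMMIT;

This keeps only one batch of rows in the client's memory at a time.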

 Postgres version is 7.3.4

You should consider upgrading, 7.3 is quite old. At the very least, you
should probably be using the most recent 7.3.x release, 7.3.11.

-Neil



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] TSearch2 vs. Apache Lucene

2005-12-06 Thread Michael Riess
No, my problem is that using TSearch2 interferes with other core 
components of postgres like (auto)vacuum or dump/restore.




...

So you'll avoid a non-core product and instead only use another non-core 
product...?


Chris

Michael Riess wrote:


Has anyone ever compared TSearch2 to Lucene, as far as performance is 
concerned?



I'll stay away from TSearch2 until it is fully integrated in the 
postgres core (like create index foo_text on foo (texta, textb) USING 
TSearch2). Because a full integration is unlikely to happen in the 
near future (as far as I know), I'll stick to Lucene.


Mike

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org