Re: [PERFORM] two disks - best way to use them?
At 12:52 AM 12/6/2005, Thomas Harold wrote:

>David Lang wrote:
>
>>in that case you logically have two disks, so see the post from Ron
>>earlier in this thread.
>
>And it's a very nice performance gain. Percent spent waiting according
>to top is down around 10-20% instead of 80-90%. While I'm not prepared
>to benchmark, database performance is way up. The client machines that
>are writing the data are running closer to 100% CPU (before, they were
>well below 50% CPU utilization).

For accuracy's sake, which exact config did you finally use?

How did you choose the config you finally used? Did you test the three
options or just pick one?

Ron

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] [GENERAL] need help
Jenny schrieb:
> I'm running PostgreSQL 8.0.3 on i686-pc-linux-gnu (Fedora Core 2).
> I've been dealing with psql for more than 2 years now, but I've never
> had this case before.
>
> I have a table that has about 20 rows in it.
>
>             Table "public.s_apotik"
>     Column    |            Type             | Modifiers
> --------------+-----------------------------+-----------
>  obat_id      | character varying(10)       | not null
>  stock        | numeric                     | not null
>  s_min        | numeric                     | not null
>  s_jual       | numeric                     |
>  s_r_jual     | numeric                     |
>  s_order      | numeric                     |
>  s_r_order    | numeric                     |
>  s_bs         | numeric                     |
>  last_receive | timestamp without time zone |
> Indexes:
>     "s_apotik_pkey" PRIMARY KEY, btree (obat_id)
>
> When I try to UPDATE one of the rows, nothing happens for a very long
> time. First, I ran it in pgAdmin III; I could see the milliseconds
> growing as I waited. Then I stopped the query, because the time it
> needed was unbelievably long. Then I tried to run the query from the
> psql shell. For example, the table has obat_id: A, B, C, D.
>
> db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='A';
> (nothing happens... I press Ctrl-C to stop it. This is what comes out:)
> Cancel request sent
> ERROR: canceling query due to user request
>
> (If I try another obat_id:)
> db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='B';
> (Less than a second later, this is what comes out:)
> UPDATE 1
>
> I can't do anything to that row. I can't DELETE it. I can't DROP the
> table. I want this data out of my database. What should I do? It's
> like there's a falsely pointed index here. Any help would be very
> much appreciated.

1) Let's hope you do regular backups - and have actually tested restore.
1a) If not, do it right now.
2) Reindex the table.
3) Try again to modify it.

Q: Are there any foreign keys involved? If so, reindex those tables
too, just in case. Did you vacuum regularly?

HTH
Tino
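A minimal sketch of Tino's steps 2 and 3, using the table from the post (take the backup first; REINDEX locks the table while it runs):

```sql
-- 2) Rebuild all indexes on the table, in case the primary-key index
--    is corrupted.
REINDEX TABLE s_apotik;

-- 3) Retry the modification that was hanging.
UPDATE s_apotik SET stock = 100 WHERE obat_id = 'A';

-- Routine maintenance afterwards: reclaim dead tuples, refresh stats.
VACUUM ANALYZE s_apotik;
```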
Re: [PERFORM] Performance degradation after successive UPDATE's
Thanks Bruno,

Issuing VACUUM FULL seems not to have any influence on the time. I've
added VACUUM ANALYZE to my script every 100 UPDATEs and run the test
again (on a different record), and the time still increases.

Any other ideas?

Thanks,
Assaf.

> -----Original Message-----
> From: Bruno Wolff III [mailto:[EMAIL PROTECTED]
> Sent: Monday, December 05, 2005 10:36 PM
> To: Assaf Yaari
> Cc: pgsql-performance@postgresql.org
> Subject: Re: Performance degradation after successive UPDATE's
>
> On Mon, Dec 05, 2005 at 19:05:01 +0200, Assaf Yaari
> [EMAIL PROTECTED] wrote:
>> Hi,
>>
>> I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.
>>
>> My application updates counters in the DB. I left a test running
>> overnight that increased the counter of a specific record. After
>> running all night (several hundred thousand updates), I found that
>> the time spent on each UPDATE had increased to more than 1.5 seconds
>> (at the beginning it was less than 10ms)! Issuing VACUUM ANALYZE and
>> even a reboot didn't seem to solve the problem.
>
> You need to be running vacuum more often to get rid of the deleted
> rows (an update is essentially an insert + delete). Once you get too
> many, plain vacuum won't be able to clean them up without raising the
> value you use for FSM. By now the table is really bloated and you
> probably want to use vacuum full on it.
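For reference, the one-time recovery Bruno describes would look like this for the test1 table used elsewhere in this thread (VACUUM FULL takes an exclusive lock and can be slow on a badly bloated table):

```sql
-- One-time: compact the bloated table and rebuild its statistics.
VACUUM FULL ANALYZE test1;

-- From then on: plain vacuum often enough that the dead rows between
-- vacuums fit in the free space map (max_fsm_pages in postgresql.conf).
VACUUM ANALYZE test1;
```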
[PERFORM] Memory Leakage Problem
Hi,

I set up a database server using the following configuration:

Redhat 9.0
PostgreSQL 8.0.3

Then, I set up a client workstation to access this database server with
the following configuration:

Redhat 9.0
unixODBC 2.2.11
psqlodbc-08.01.0101

and wrote a C++ program to run database queries. This program accesses
the database server using simple and complex (table-joining) SQL SELECT
statements and retrieves the matched rows. For each access, it connects
to the database and then disconnects.

I found that the memory of the database server was nearly used up
(total 2G RAM). After I stopped the program, the used memory was not
freed.

Is there any configuration in postgresql.conf I should set? Currently,
I just set the following in postgresql.conf:

listen_addresses = '*'
max_stack_depth = 8100 (when I run ulimit -s, the max value the kernel
supports is 8192)
stats_row_level = true

And I run pg_autovacuum as a background job.

--
Kathy Lo
[PERFORM] LVM and Postgres
I need to slice up a web server's disk space to provide space for
postgres and for storing binaries such as images and sound files. I'm
thinking of using logical volume management (LVM) to help manage the
amount of space I use between postgres and the data volumes.

The server has a 250GB RAID10 (LSI 320-I + BBU) volume which I am
thinking of slicing up in the following way (Linux 2.6 kernel):

/     : ext3 : 47GB (root, home etc)
/boot : ext3 : 1GB
/tmp  : ext2 : 2GB
/usr  : ext3 : 4GB
/var  : ext3 : 6GB
--- 60GB

VG: 190GB approx
Initially divided so:
/data     : ext3 : 90GB
/postgres : xfs  : 40GB

This gives me leftover space of roughly 60GB to extend into on the
volume group, which I can balance between the /data and /postgres
logical volumes as needed.

Are there any major pitfalls to this approach?

Thanks,
Rory

--
Rory Campbell-Lange
[EMAIL PROTECTED]
www.campbell-lange.net
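One attraction of this layout is that the spare VG space can be handed out later. A sketch, assuming the volume group is named vg0 and the logical volume postgres (names are illustrative, not from the post):

```shell
# Give the postgres LV another 20GB from the spare space in the VG,
# then grow the XFS filesystem online to match.
lvextend -L +20G /dev/vg0/postgres
xfs_growfs /postgres
```

xfs_growfs works on a mounted filesystem; growing the ext3 /data volume online depends on kernel and e2fsprogs support in 2.6.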
[PERFORM] Can this query go faster???
Hi,

Is it possible to get this query to run faster than it does now, by
adding indexes or changing the query?

SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745

Explain:

Limit  (cost=90956.71..90956.71 rows=1 width=55)
  ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
        Sort Key: addresses.zipcode, addresses.housenumber
        ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
              Hash Cond: ("outer".contactaddress = "inner".objectid)
              ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
              ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                    ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)

The customers table has an index on contactaddress and objectid. The
addresses table has an index on zipcode+housenumber and objectid.

TIA

--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl
Re: [PERFORM] Performance degradation after successive UPDATE's
Hi,

You might try these steps:

1. Do a VACUUM FULL ANALYZE.
2. Reindex the index on the id column.
3. Cluster the table based on this index.

On 12/5/05, Assaf Yaari [EMAIL PROTECTED] wrote:
> Hi,
>
> I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.
>
> My application updates counters in the DB. I left a test running
> overnight that increased the counter of a specific record. After
> running all night (several hundred thousand updates), I found that the
> time spent on each UPDATE had increased to more than 1.5 seconds (at
> the beginning it was less than 10ms)! Issuing VACUUM ANALYZE and even
> a reboot didn't seem to solve the problem.
>
> I succeeded in reproducing this with a simple test: I created a very
> simple table that looks like this:
>
> CREATE TABLE test1
> (
>   id int8 NOT NULL,
>   counter int8 NOT NULL DEFAULT 0,
>   CONSTRAINT "Test1_pkey" PRIMARY KEY (id)
> );
>
> I inserted 15 entries and wrote a script that increases the counter of
> a specific record over and over. The SQL command looks like this:
>
> UPDATE test1 SET counter=number WHERE id=10;
>
> At the beginning, the UPDATE time was around 15ms. After ~9 updates,
> the execution time increased to more than 120ms.
>
> 1. What is the reason for this phenomenon?
> 2. Is there anything that can be done in order to improve this?
>
> Thanks,
> Assaf

--
Regards
Pandu
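The three steps above, spelled out against the quoted test table (CLUSTER rewrites the table and takes an exclusive lock; the 8.0-era syntax is `CLUSTER indexname ON tablename`):

```sql
-- 1. Compact away the dead row versions left behind by the updates.
VACUUM FULL ANALYZE test1;

-- 2. Rebuild the primary-key index on id.
REINDEX INDEX "Test1_pkey";

-- 3. Physically reorder the table along that index.
CLUSTER "Test1_pkey" ON test1;
```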
Re: [PERFORM] Can this query go faster???
> Hi,
>
> Is it possible to get this query to run faster than it does now, by
> adding indexes or changing the query?
>
> SELECT customers.objectid FROM prototype.customers, prototype.addresses
> WHERE customers.contactaddress = addresses.objectid
> ORDER BY zipCode asc, housenumber asc
> LIMIT 1 OFFSET 283745
>
> Explain:
>
> Limit  (cost=90956.71..90956.71 rows=1 width=55)
>   ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
>         Sort Key: addresses.zipcode, addresses.housenumber
>         ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
>               Hash Cond: ("outer".contactaddress = "inner".objectid)
>               ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
>               ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
>                     ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)
>
> The customers table has an index on contactaddress and objectid. The
> addresses table has an index on zipcode+housenumber and objectid.

When the resulting relation contains all the info from both tables,
indexes won't help; a sequential scan is inevitable.
Re: [PERFORM] Can this query go faster???
Joost,

Why do you use an offset here? I guess you're traversing the table
somehow; in this case it would be better to remember the last zipcode +
housenumber and put an additional condition to get the next one bigger
than the last one you've got... That would go for the index on
zipcode+housenumber and be very fast. The big offset forces postgres to
traverse that many entries until it's able to pick the one row for the
result...

On Tue, 2005-12-06 at 10:43, Joost Kraaijeveld wrote:
> Hi,
>
> Is it possible to get this query to run faster than it does now, by
> adding indexes or changing the query?
>
> SELECT customers.objectid FROM prototype.customers, prototype.addresses
> WHERE customers.contactaddress = addresses.objectid
> ORDER BY zipCode asc, housenumber asc
> LIMIT 1 OFFSET 283745
>
> Explain:
>
> Limit  (cost=90956.71..90956.71 rows=1 width=55)
>   ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
>         Sort Key: addresses.zipcode, addresses.housenumber
>         ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
>               Hash Cond: ("outer".contactaddress = "inner".objectid)
>               ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
>               ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
>                     ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)
>
> The customers table has an index on contactaddress and objectid. The
> addresses table has an index on zipcode+housenumber and objectid.
>
> TIA
>
> --
> Groeten,
>
> Joost Kraaijeveld
> Askesis B.V.
> Molukkenstraat 14
> 6524NB Nijmegen
> tel: 024-3888063 / 06-51855277
> fax: 024-3608416
> e-mail: [EMAIL PROTECTED]
> web: www.askesis.nl
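The "remember the last key" approach can be sketched as follows (the literal zipcode/housenumber values and the page size are placeholders, not values from Joost's data; the comparison is spelled out with OR/AND rather than a row-value comparison, since 8.0-era row comparisons don't follow index-friendly semantics):

```sql
-- First page: no offset needed.
SELECT c.objectid, a.zipcode, a.housenumber
  FROM prototype.customers c
  JOIN prototype.addresses a ON c.contactaddress = a.objectid
 ORDER BY a.zipcode, a.housenumber
 LIMIT 1000;

-- Subsequent pages: continue after the last (zipcode, housenumber)
-- seen. This can walk the index on zipcode+housenumber instead of
-- sorting everything and skipping 283745 rows.
SELECT c.objectid, a.zipcode, a.housenumber
  FROM prototype.customers c
  JOIN prototype.addresses a ON c.contactaddress = a.objectid
 WHERE a.zipcode > '6500AA'
    OR (a.zipcode = '6500AA' AND a.housenumber > 42)
 ORDER BY a.zipcode, a.housenumber
 LIMIT 1000;
```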
Re: [PERFORM] Can this query go faster???
Hi,

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On behalf of Joost Kraaijeveld
> Sent: Tuesday, December 6, 2005 10:44
> To: Pgsql-Performance
> Subject: [PERFORM] Can this query go faster???
>
> SELECT customers.objectid FROM prototype.customers, prototype.addresses
> WHERE customers.contactaddress = addresses.objectid
> ORDER BY zipCode asc, housenumber asc
> LIMIT 1 OFFSET 283745
>
> Explain:
>
> Limit  (cost=90956.71..90956.71 rows=1 width=55)
>   ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
>         Sort Key: addresses.zipcode, addresses.housenumber
>         ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
>               Hash Cond: ("outer".contactaddress = "inner".objectid)
>               ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
>               ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
>                     ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)
>
> The customers table has an index on contactaddress and objectid. The
> addresses table has an index on zipcode+housenumber and objectid.

The planner chooses sequential scans on customers.contactaddress and
addresses.objectid instead of using the indices. In order to determine
whether this is a sane decision, you should run EXPLAIN ANALYZE on this
query, once with SET ENABLE_SEQSCAN = on; and once with SET
ENABLE_SEQSCAN = off;. If the query is significantly faster with
seqscan off, then something is amiss - either you haven't run analyze
often enough, so the stats are out of date, or you have
random_page_cost set too high (look for the setting in
postgresql.conf) - these two are the usual suspects.

Kind regards

Markus
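Markus's test, written out against the query from the thread (session-local settings; the random_page_cost value is just an illustrative starting point):

```sql
-- Baseline plan and actual runtime:
SET enable_seqscan = on;
EXPLAIN ANALYZE
SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745;

-- Force the planner away from sequential scans and compare:
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745;

-- If the index plan wins, try lowering random_page_cost (default 4):
SET random_page_cost = 2;
```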
Re: [PERFORM] Can this query go faster???
On Tue, 2005-12-06 at 10:52 +0100, Csaba Nagy wrote:
> Joost,
>
> Why do you use an offset here? I guess you're traversing the table
> somehow; in this case it would be better to remember the last zipcode
> + housenumber and put an additional condition to get the next one
> bigger than the last one you've got... That would go for the index on
> zipcode+housenumber and be very fast. The big offset forces postgres
> to traverse that many entries until it's able to pick the one row for
> the result...

I am forced to translate a sorting-dependent record number to a record
in the database. The GUI (a Java JTable) works with record/row numbers,
which is handy if one has an ISAM database, but not if one uses
PostgreSQL.

I wonder if using a forward-scrolling cursor would be faster.

--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl
Re: [PERFORM] Can this query go faster???
Joost Kraaijeveld schrieb:
> On Tue, 2005-12-06 at 10:52 +0100, Csaba Nagy wrote:
>> Joost,
>>
>> Why do you use an offset here? I guess you're traversing the table
>> somehow; in this case it would be better to remember the last zipcode
>> + housenumber and put an additional condition to get the next one
>> bigger than the last one you've got... That would go for the index on
>> zipcode+housenumber and be very fast. The big offset forces postgres
>> to traverse that many entries until it's able to pick the one row for
>> the result...
>
> I am forced to translate a sorting-dependent record number to a record
> in the database. The GUI (a Java JTable) works with record/row
> numbers, which is handy if one has an ISAM database, but not if one
> uses PostgreSQL.

You can have a row number in postgres easily too. For example, if you
just include a serial for the row number.

A cursor would work too, but you would need to have a persistent
connection.

Regards
Tino
Re: [PERFORM] BLCKSZ
On Tue, Dec 06, 2005 at 01:40:47PM +0300, Olleg wrote:
> I can't understand why bigger is better. For instance, in a search by
> index: the index points to a page, and I need to load the page to get
> one row. Thus I load 8kb from disk for every row, and then keep it in
> cache. You recommend 64kb. With your recommendation I'll get 8 times
> more IO throughput, 8 times more head seeks on disk, and 8 times more
> memory cache (OS cache and postgresql) kept busy.

Hopefully, you won't have eight times the seeking; a single block ought
to be in one chunk on disk. You're of course at your filesystem's
mercy, though.

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Can this query go faster???
Hi Tino,

On Tue, 2005-12-06 at 11:32 +0100, Tino Wildenhain wrote:
> You can have a row number in postgres easily too. For example, if you
> just include a serial for the row number.
Not if the order of things is determined at runtime and not at insert
time...

> A cursor would work too, but you would need to have a persistent
> connection.
I just tried it: a cursor is not faster (which does not surprise me at
all, as the amount of work looks the same to me).

I guess there is no solution.

--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl
Re: [PERFORM] Can this query go faster???
Joost Kraaijeveld schrieb:
> Hi Tino,
..
>> A cursor would work too, but you would need to have a persistent
>> connection.
> I just tried it: a cursor is not faster (which does not surprise me at
> all, as the amount of work looks the same to me).

Actually no. If you scroll forward, you just ask the database for the
next rows to materialize. So if you are already ahead in your result
set and ask for the next rows, it should be faster than working with an
offset from the start each time.
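For illustration, a forward-scrolling cursor over the query from this thread might look like this (cursor name and batch sizes are made up; the cursor lives only inside the transaction, which is why the connection must stay open, as Tino notes):

```sql
BEGIN;
DECLARE cust CURSOR FOR
  SELECT c.objectid
    FROM prototype.customers c
    JOIN prototype.addresses a ON c.contactaddress = a.objectid
   ORDER BY a.zipcode, a.housenumber;

FETCH 100 FROM cust;          -- first batch
MOVE FORWARD 283645 IN cust;  -- skip ahead without shipping rows to the client
FETCH 100 FROM cust;          -- rows around the target position
COMMIT;                       -- closing the transaction closes the cursor
```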
Re: [PERFORM] BLCKSZ
On Tue, 6 Dec 2005, Steinar H. Gunderson wrote:

> On Tue, Dec 06, 2005 at 01:40:47PM +0300, Olleg wrote:
>> I can't understand why bigger is better. For instance, in a search by
>> index: the index points to a page, and I need to load the page to get
>> one row. Thus I load 8kb from disk for every row, and then keep it in
>> cache. You recommend 64kb. With your recommendation I'll get 8 times
>> more IO throughput, 8 times more head seeks on disk, and 8 times more
>> memory cache (OS cache and postgresql) kept busy.
>
> Hopefully, you won't have eight times the seeking; a single block
> ought to be in one chunk on disk. You're of course at your
> filesystem's mercy, though.

In fact it would usually mean 1/8 as many seeks: since the 64k chunk is
created all at once, it's probably going to be one chunk on disk as
Steinar points out, and that means you do one seek per 64k instead of
one seek per 8k.

With current disks it's getting to the point where it costs the same to
read 8k as 64k (i.e. almost free; you could read substantially more
than 64k and not notice it in I/O speed). It's the seeks that are
expensive. Yes, it will eat up more RAM, but assuming that you are
likely to need other things nearby, it's likely to be a win.

As processor speed keeps climbing compared to memory and disk speed,
true random access is really not the correct way to think about I/O
anymore. It's frequently more appropriate to think of your memory and
disks as if they were tape drives (seek, then read, repeat).

Even for memory access, what you really do is seek to the beginning of
a block (expensive), then read that block into cache (cheap; you get
the entire cache line of 64-128 bytes whether you need it or not), and
then you can access that block fairly quickly. With memory on SMP
machines it's a constant cost to seek anywhere in memory; with NUMA
machines (including multi-socket Opterons) the cost of the seek and
fetch depends on where in memory you are seeking to and which CPU you
are running on. It also becomes very expensive for multiple CPUs to
write to memory addresses that are in the same block (cache line) of
memory.

For disks it's even more dramatic: the seek is incredibly expensive
compared to the read/write, and the cost of the seek varies based on
how far you need to seek; but once you are on a track you can read the
entire track in for about the same cost as a single block (in fact the
drive usually does read the entire track before sending the one block
on to you). RAID complicates this because you have a block size per
drive, and reading larger than that block size involves multiple
drives.

Most of the work of dealing with these issues and optimizing for them
is the job of the OS. Some other databases work very hard to take over
this work from the OS; Postgres instead tries to let the OS do it, but
we still need to keep it in mind when configuring things, because it's
possible to make it much easier or much harder for the OS to optimize.

David Lang
Re: [PERFORM] Can this query go faster???
Joost Kraaijeveld schrieb:
> On Tue, 2005-12-06 at 12:36 +0100, Tino Wildenhain wrote:
>>> I just tried it: a cursor is not faster (which does not surprise me
>>> at all, as the amount of work looks the same to me).
>> Actually no. If you scroll forward, you just ask the database for the
>> next rows to materialize. So if you are already ahead in your result
>> set and ask for the next rows, it should be faster than working with
>> an offset from the start each time.
> Ah, a misunderstanding: I only need to calculate an index if the user
> wants a record that is not in or adjacent to the cache (in which case
> I can do a select for values > the last value in the cache). So I must
> always materialize all rows below the wanted index.

Yes, but still, advancing a few blocks from where the cursor is should
be faster than re-issuing the query and scrolling through the whole
result set to where you want to go.
Re: [PERFORM] Performance degradation after successive UPDATE's
On 12/6/2005 4:08 AM, Assaf Yaari wrote:
> Thanks Bruno,
>
> Issuing VACUUM FULL seems not to have any influence on the time. I've
> added VACUUM ANALYZE to my script every 100 UPDATEs and run the test
> again (on a different record), and the time still increases.

I think he meant:

- run VACUUM FULL once,
- adjust FSM settings to database size and turnover ratio,
- run VACUUM ANALYZE more frequently from there on.

Jan

> Any other ideas?
>
> Thanks,
> Assaf.
>
>> -----Original Message-----
>> From: Bruno Wolff III [mailto:[EMAIL PROTECTED]
>> Sent: Monday, December 05, 2005 10:36 PM
>> To: Assaf Yaari
>> Cc: pgsql-performance@postgresql.org
>> Subject: Re: Performance degradation after successive UPDATE's
>>
>> On Mon, Dec 05, 2005 at 19:05:01 +0200, Assaf Yaari
>> [EMAIL PROTECTED] wrote:
>>> Hi,
>>>
>>> I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.
>>>
>>> My application updates counters in the DB. I left a test running
>>> overnight that increased the counter of a specific record. After
>>> running all night (several hundred thousand updates), I found that
>>> the time spent on each UPDATE had increased to more than 1.5 seconds
>>> (at the beginning it was less than 10ms)! Issuing VACUUM ANALYZE and
>>> even a reboot didn't seem to solve the problem.
>>
>> You need to be running vacuum more often to get rid of the deleted
>> rows (an update is essentially an insert + delete). Once you get too
>> many, plain vacuum won't be able to clean them up without raising the
>> value you use for FSM. By now the table is really bloated and you
>> probably want to use vacuum full on it.

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
Re: [PERFORM] Can this query go faster???
On Tue, 2005-12-06 at 13:20, Joost Kraaijeveld wrote:
[snip]
> Ah, a misunderstanding: I only need to calculate an index if the user
> wants a record that is not in or adjacent to the cache (in which case
> I can do a select for values > the last value in the cache). So I must
> always materialize all rows below the wanted index.

In this case the query will very likely not get faster. It must always
visit all the records up to the required offset. If the plan should be
faster using the index, then you probably need to analyze (I don't
recall from your former posts whether you did it recently or not); in
any case you could check an EXPLAIN ANALYZE to see if the planner is
mistaken or not - you might already know this.

Cheers,
Csaba.
Re: [PERFORM] Can this query go faster???
At 04:43 AM 12/6/2005, Joost Kraaijeveld wrote:
> Hi,
>
> Is it possible to get this query to run faster than it does now, by
> adding indexes or changing the query?
>
> SELECT customers.objectid FROM prototype.customers, prototype.addresses
> WHERE customers.contactaddress = addresses.objectid
> ORDER BY zipCode asc, housenumber asc
> LIMIT 1 OFFSET 283745
>
> Explain:
>
> Limit  (cost=90956.71..90956.71 rows=1 width=55)
>   ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
>         Sort Key: addresses.zipcode, addresses.housenumber
>         ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
>               Hash Cond: ("outer".contactaddress = "inner".objectid)
>               ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
>               ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
>                     ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)
>
> The customers table has an index on contactaddress and objectid. The
> addresses table has an index on zipcode+housenumber and objectid.
>
> TIA

Customer names, customers.objectid, addresses, and addresses.objectid
should all be static (addresses do not change, just the customers
associated with them; and once a customer has been assigned an id, that
had better never change...).

To me, this sounds like the addresses and customers tables should be
duplicated and then physically laid out in sorted order: by
tablename.objectid in one set, and by the human-friendly associated
string in the other set. Then finding a specific tablename.objectid or
its associated string can be done in at worst O(lg n) time, assuming
binary search, instead of O(n) time for a sequential scan. If pg is
clever enough, it might be able to do better than that.

IOW, I'd try duplicating the addresses and customers tables and using
the appropriate CLUSTERed index on each.

I know this breaks Normal Form. OTOH, this kind of thing is common
practice for data mining problems on static or almost-static data.

Hope this is helpful,

Ron
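A sketch of the CLUSTERed-copy idea for one of the two tables (the copy's table and index names are made up for illustration; CLUSTER locks and rewrites the table, so this suits static or periodically refreshed data):

```sql
-- Duplicate addresses and lay the copy out physically in
-- zipcode/housenumber order for the human-friendly lookups.
CREATE TABLE addresses_by_zip AS SELECT * FROM prototype.addresses;
CREATE INDEX addresses_by_zip_idx ON addresses_by_zip (zipcode, housenumber);
CLUSTER addresses_by_zip_idx ON addresses_by_zip;
ANALYZE addresses_by_zip;
```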
Re: [PERFORM] Can this query go faster???
> On Tue, 2005-12-06 at 11:32 +0100, Tino Wildenhain wrote:
>> You can have a row number in postgres easily too. For example, if you
>> just include a serial for the row number.
> Not if the order of things is determined at runtime and not at insert
> time...
>> A cursor would work too, but you would need to have a persistent
>> connection.
> I just tried it: a cursor is not faster (which does not surprise me at
> all, as the amount of work looks the same to me).
>
> I guess there is no solution.

Sure there is. This begs the question: why do you want to read exactly
283745 rows ahead of row 'x'? :)

If you are scrolling forwards in a set, just pull in, say, 100-1000
rows at a time, ordered, and grab the next batch based on the highest
value read previously. You can do this on the server side (cursor) or
the client side (parameterized query). There are advantages and
disadvantages to each. If you are looping over this set and doing
processing, a cursor would be ideal (try out pl/pgsql).

Welcome to PostgreSQL! :)

Merlin
Re: [PERFORM] Memory Leakage Problem
Kathy Lo [EMAIL PROTECTED] writes:
> I found that the memory of the database server was nearly used up
> (total 2G RAM). After I stopped the program, the used memory was not
> freed.

I see no particular reason to believe that you are describing an actual
memory leak. More likely, you are just seeing the kernel's normal
behavior of eating up unused memory for disk cache space. Repeat after
me: zero free memory is the normal and desirable condition on Unix-like
systems.

regards, tom lane
Re: [PERFORM] two disks - best way to use them?
Ron wrote:
> For accuracy's sake, which exact config did you finally use?
>
> How did you choose the config you finally used? Did you test the
> three options or just pick one?

(Note: I'm not the original poster.)

I just picked the option of putting the data/pg_xlog directory (WAL) on
a 2nd set of spindles. That was the easiest thing for me to change on
this test box.

The test server is simply a Gentoo box running software RAID and LVM2.
The primary disk set is 2x7200RPM 300GB drives and the secondary disk
set is 2x5400RPM 300GB drives. Brand new install of PGSQL 8.1, with
mostly default settings (I changed FSM pages to be a higher value,
max_fsm_pages = 15). PGSQL was given its own ext3 32GB LVM volume on
the primary disk set (2x7200RPM). Originally, all files were on the
primary disk.

The task at hand was inserting a large quantity of ~45-byte rows
(according to vacuum verbose), on the order of millions of records per
table. There was a unique key and a unique index. Test clients were
accessing the database via ODBC / ADO and doing the inserts in a fairly
brute-force mode (attempt the insert, calling .CancelUpdate if it
fails).

When the tables were under 2 million rows, performance was okay. At one
point, I had a 1.8GHz P4, dual Opteron 246, and Opteron 148 CPUs
running at nearly 100% CPU, processing and doing inserts into the
database. So I had 4 clients running, performing inserts to 4 separate
tables in the same database. The P4 ran at about half the throughput of
the Opterons (client-bound due to the code that generated row data
prior to the insert), so I'd score my throughput as roughly 3.3-3.4,
where 1.0 would be full utilization of the Opteron 148 box.

However, once the tables started getting above ~2 million rows,
performance took a nosedive. CPU utilization on the 4 client CPUs
dropped into the basement (5-20% CPU) and I had to back off on the
number of clients, so throughput had dropped to around 0.25 or so. The
Linux box was spending nearly all of its time waiting on the primary
disks.

Moving the data/pg_xlog (WAL) directory to the 2nd set of disks
(2x5400RPM) in the test server made a dramatic difference for this mass
insert. I'm running the P4 (100% CPU) and the Opteron 148 (~80% CPU) at
the moment. While it's not up to full speed, a throughput of ~1.3 is a
lot better than the ~0.25 I was getting before. (The two tables
currently being written have over 5 million rows each. One table has
~16 million rows.) Wait percentage in top is only running 20-30%
(dipping as low as 10%). I haven't pushed this new setup hard enough to
determine where the upper limit for throughput is.

It's very much a niche test (millions of inserts of narrow rows into
multiple tables using fairly brain-dead code). But it gives me data
points on which to base purchasing of the production box. The original
plan was a simple RAID1 setup (2 spindles), but this tells me it's
better to order 4 spindles and set it up as a pair of RAID1 sets.
Whether 4 spindles is better as two separate RAID1 arrays, or
configured as a single RAID1+0 array... dunno. Our application is
typically more limited by insert speed than read speed (so I'm leaning
towards separate RAID arrays).

I'm sure there's also more tuning that could be done to the PGSQL
database (in the configuration file). Also, the code is throwaway code
that isn't the most elegant.
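For what it's worth, relocating just data/pg_xlog to another disk set is commonly done with a symlink; a sketch under assumed paths (the data directory and mount point are illustrative, and the postmaster must be stopped first):

```shell
# Move the WAL directory to the second spindle set and link it back.
pg_ctl -D /var/lib/postgresql/data stop
mv /var/lib/postgresql/data/pg_xlog /mnt/disk2/pg_xlog
ln -s /mnt/disk2/pg_xlog /var/lib/postgresql/data/pg_xlog
pg_ctl -D /var/lib/postgresql/data start
```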
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Hi, After a few tests, the difference is explained by the effective_cache_size parameter. With effective_cache_size=1000 (the default) the planner chooses the following plan: postgres=# explain select count(*) from (select distinct on (val) * from test) as foo; QUERY PLAN Aggregate (cost=421893.64..421893.65 rows=1 width=0) -> Unique (cost=385193.48..395679.24 rows=2097152 width=8) -> Sort (cost=385193.48..390436.36 rows=2097152 width=8) Sort Key: test.val -> Seq Scan on test (cost=0.00..31252.52 rows=2097152 width=8) (5 rows) With effective_cache_size=15000 the planner chooses the following plan: postgres=# explain select count(*) from (select distinct on (val) * from test) as foo; QUERY PLAN -- Aggregate (cost=101720.39..101720.40 rows=1 width=0) -> Unique (cost=0.00..75505.99 rows=2097152 width=8) -> Index Scan using testval on test (cost=0.00..70263.11 rows=2097152 width=8) (3 rows) I tested some other values for effective_cache_size. The switch from seq to index scan happens between 9900 and 10000 for effective_cache_size. I have my SQL server on an OpenBSD 3.8 box with 1 GB of RAM with nothing else running on it. I set the cachepercent to 25, so I expect to have 25% of 1 GB of RAM (256 MB) as file cache. effective_cache_size=15000 means 15000 x 8K of OS cache = 120,000 KB, which is lower than my 256 MB of disk cache. I recall the result of my previous test. #rows 2097152 IndexScan 1363396.581s SeqScan 98758.445s Ratio 13.805 So when effective_cache_size=15000 the planner chooses a plan that is 13 times slower than the seqscan one. I do not understand where the problem comes from. Any help welcome. Regards, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
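The page arithmetic in the post can be checked directly: effective_cache_size is expressed in pages of 8 KB, so 15000 pages is 120,000 KB (about 117 MB), which is indeed below the ~256 MB file cache expected from cachepercent=25 on a 1 GB box. A quick sketch using only the figures quoted above:

```python
def pages_to_kb(pages, page_kb=8):
    """effective_cache_size is measured in pages (8 KB by default)."""
    return pages * page_kb

ram_kb = 1 * 1024 * 1024          # the 1 GB OpenBSD box from the post
os_cache_kb = ram_kb * 25 // 100  # cachepercent = 25 -> ~256 MB file cache

assert pages_to_kb(15000) == 120000      # the setting that flips to index scan
assert pages_to_kb(15000) < os_cache_kb  # still below the available cache
print(pages_to_kb(15000) // 1024, "MB")  # ~117 MB
```

So the setting itself is honest about the cache size; the surprising part, as the post says, is that a truthful effective_cache_size leads the planner to the slower plan here.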
[PERFORM] TSearch2 vs. Apache Lucene
Greetings all, I'm going to do a performance comparison with DocMgr and PG81/TSearch2 on one end, and Apache Lucene on the other end. In order to do this, I'm going to create a derivative of the docmgr-autoimport script so that I can specify one file to import at a time. I'll then create a Perl script which logs all details (such as timing, etc.) as the test progresses. As test data, I have approximately 9,000 text files from Project Gutenberg ranging in size from a few hundred bytes to 4.5MB. I plan to test the speed of import of each file. Then, I plan to write a web-robot in Perl that will test the speed and number of results returned. Can anyone think of a validation of this test, or how I should configure PG to maximise import and search speed? Can I maximise search speed and import speed, or are those things mutually exclusive? (Note that this will be run on limited hardware - a 900MHz Athlon with 512MB of RAM.) Has anyone ever compared TSearch2 to Lucene, as far as performance is concerned? Thanks, -Josh ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Missed index opportunity for outer join?
[EMAIL PROTECTED] writes: On Mon, 5 Dec 2005, Tom Lane wrote: (Note to self: it is a bit odd that fac_id=261 is pushed down to become an indexqual in one case but not the other ...) I speculate that the seq_scan wasn't really the slow part compared to not using both parts of the index in the second part of the plan. The table point_features is tens of thousands of rows, while the table facets is tens of millions. Agreed, but it's still odd that it would use a seqscan in one case and not the other. I found the reason why the fac_id=261 clause isn't getting used as an index qual; it's a bit of excessive paranoia that goes back to 2002. I've fixed that for 8.1.1, but am still wondering about the seqscan on the other side of the join. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] TSearch2 vs. Apache Lucene
Has anyone ever compared TSearch2 to Lucene, as far as performance is concerned? I'll stay away from TSearch2 until it is fully integrated in the postgres core (like create index foo_text on foo (texta, textb) USING TSearch2). Because a full integration is unlikely to happen in the near future (as far as I know), I'll stick to Lucene. Mike ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] TSearch2 vs. Apache Lucene
Folks, tsearch2 and Lucene are very different search engines, so it'd be an unfair comparison. If you need full access to metadata and instant indexing, you will probably find tsearch2 more suitable than Lucene. But, if you can live without those features and need to search read-only archives, you need Lucene. Tsearch2 integration into pgsql would be cool, but I see no problem with using tsearch2 as an official extension module. After completing our todo, which we hope will likely happen for the 8.2 release, you could forget about Lucene and other engines :) We'll be available for development in spring and we estimate about three months for our todo, so it's really doable. Oleg On Tue, 6 Dec 2005, Michael Riess wrote: Has anyone ever compared TSearch2 to Lucene, as far as performance is concerned? I'll stay away from TSearch2 until it is fully integrated in the postgres core (like create index foo_text on foo (texta, textb) USING TSearch2). Because a full integration is unlikely to happen in the near future (as far as I know), I'll stick to Lucene. Mike ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq Regards, Oleg _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] TSearch2 vs. Apache Lucene
Oleg Bartunov wrote: Folks, tsearch2 and Lucene are very different search engines, so it'd be an unfair comparison. If you need full access to metadata and instant indexing, you will probably find tsearch2 more suitable than Lucene. But, if you can live without those features and need to search read-only archives, you need Lucene. Tsearch2 integration into pgsql would be cool, but I see no problem with using tsearch2 as an official extension module. After completing our todo, which we hope will likely happen for the 8.2 release, you could forget about Lucene and other engines :) We'll be available for development in spring and we estimate about three months for our todo, so it's really doable. Agreed. There isn't anything magical about a plug-in vs something integrated, at least in PostgreSQL. In other databases, plug-ins can't fully function as integrated features, but in PostgreSQL, everything is really a plug-in because it is all abstracted. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] postgresql performance tuning
This didn't get through the first time around, so I'm resending it. Sorry for any duplicate entries. Hello, I have a question on postgres's performance tuning, in particular, the vacuum and reindex commands. Currently I do a vacuum (without full) on all of my tables. However, it's noted in the docs (e.g. http://developer.postgresql.org/docs/postgres/routine-reindex.html) and on the lists here that indexes may still bloat after a while and hence reindex is necessary. How often do people reindex their tables out there? I guess I'd have to update my cron scripts to do reindexing too along with vacuuming, but most probably at a much lower frequency than vacuum. But these scripts do these maintenance tasks at a fixed time (every few hours, days, weeks, etc.). What I would like is to do these tasks on an as-needed basis. So for vacuuming, by need I mean every few updates or some such metric that characterizes my workload. Similarly, need for the reindex command might mean every few updates or degree of bloat, etc. I came across the pg_autovacuum daemon, which seems to do exactly what I need for vacuums. However, it'd be great if there was a similar automatic reindex utility, like say, a pg_autoreindex daemon. Are there any plans for this feature? If not, then would cron scripts be the next best choice? Thanks, Ameet ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Memory Leakage Problem
On Tue, 2005-12-06 at 03:22, Kathy Lo wrote: Hi, In this program, it will access this database server using simple and complex (joining tables) SQL Select statements and retrieve the matched rows. For each access, it will connect to the database and disconnect from it. I found that the memory of the database server was nearly used up (total 2G RAM). After I stopped the program, the used memory was not freed. Um. What exactly do you mean? Can we see the output of top and / or free? I'm guessing that what Tom said is right, you're just seeing the normal state of how unix does things. If your output of free looks like this: -bash-2.05b$ free total used free shared buffers cached Mem: 6096912 6069588 27324 0 260728 5547264 -/+ buffers/cache: 261596 5835316 Swap: 4192880 16320 4176560 Then that's normal. That's the output of free on a machine with 6 gigs that runs a reporting database. Note that while it shows almost ALL the memory as used, it is being used by the kernel, which is a good thing. Note that 5547264, or about 90% of memory, is being used as kernel cache. That's a good thing. Note you can also get yourself in trouble with top. It's not uncommon for someone to see a bunch of postgres processes each eating up 50 or more megs of ram, and panic and think that they're running out of memory, when, in fact, 44 meg for each of those processes is shared, and the real usage per backend is 6 megs or less. Definitely grab yourself a good unix / linux sysadmin guide. The "in a nutshell" books from O'Reilly are a good starting point. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
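The kernel-cache point can be verified from the free output quoted above: the memory actually available to applications is free + buffers + cached, which is exactly the "free" figure on the -/+ buffers/cache line.

```python
# Figures from the `free` output quoted above (all in KB)
total, used, free = 6096912, 6069588, 27324
buffers, cached = 260728, 5547264

# What's really available: the kernel hands cache pages back on demand
available = free + buffers + cached
assert available == 5835316                # matches the "-/+ buffers/cache" free column
assert round(100 * cached / total) == 91   # ~90% of RAM is kernel file cache
print(available, "KB actually available")
```

So a box showing 27 MB "free" out of 6 GB is healthy; the alarming-looking number is just the cache doing its job.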
Re: [PERFORM] TSearch2 vs. Apache Lucene
Bruce Momjian pgman@candle.pha.pa.us writes: Oleg Bartunov wrote: Tsearch2 integration into pgsql would be cool, but, I see no problem to use tsearch2 as an official extension module. Agreed. There isn't anything magical about a plug-in vs something integrated, as least in PostgreSQL. The quality gap between contrib and the main system is a lot smaller than it used to be, at least for those contrib modules that have regression tests. Main and contrib get equal levels of testing from the buildfarm, so they're about on par as far as portability goes. We could never say that before 8.1 ... (Having said that, I think that tsearch2 will eventually become part of core, but probably not for awhile yet.) regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] postgresql performance tuning
Ameet Kini [EMAIL PROTECTED] writes: I have a question on postgres's performance tuning, in particular, the vacuum and reindex commands. Currently I do a vacuum (without full) on all of my tables. However, its noted in the docs (e.g. http://developer.postgresql.org/docs/postgres/routine-reindex.html) and on the lists here that indexes may still bloat after a while and hence reindex is necessary. How often do people reindex their tables out there? Never, unless you have actual evidence that your indexes are bloating. It's only very specific use-patterns that have problems. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] TSearch2 vs. Apache Lucene
Bruce Momjian schrieb: Oleg Bartunov wrote: Folks, tsearch2 and Lucene are very different search engines, so it'd be an unfair comparison. If you need full access to metadata and instant indexing, you will probably find tsearch2 more suitable than Lucene. But, if you can live without those features and need to search read-only archives, you need Lucene. Tsearch2 integration into pgsql would be cool, but I see no problem with using tsearch2 as an official extension module. After completing our todo, which we hope will likely happen for the 8.2 release, you could forget about Lucene and other engines :) We'll be available for development in spring and we estimate about three months for our todo, so it's really doable. Agreed. There isn't anything magical about a plug-in vs something integrated, at least in PostgreSQL. In other databases, plug-ins can't fully function as integrated features, but in PostgreSQL, everything is really a plug-in because it is all abstracted. I only remember evaluating TSearch2 about a year ago, and when I read statements like "Vacuum and/or database dump/restore work differently when using TSearch2, SQL scripts need to be executed" etc., I knew that I would not want to go there. But I don't doubt that it works, and that it is a sane concept. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] TSearch2 vs. Apache Lucene
Michael Riess wrote: Bruce Momjian schrieb: Oleg Bartunov wrote: Folks, tsearch2 and Lucene are very different search engines, so it'd be an unfair comparison. If you need full access to metadata and instant indexing, you will probably find tsearch2 more suitable than Lucene. But, if you can live without those features and need to search read-only archives, you need Lucene. Tsearch2 integration into pgsql would be cool, but I see no problem with using tsearch2 as an official extension module. After completing our todo, which we hope will likely happen for the 8.2 release, you could forget about Lucene and other engines :) We'll be available for development in spring and we estimate about three months for our todo, so it's really doable. Agreed. There isn't anything magical about a plug-in vs something integrated, at least in PostgreSQL. In other databases, plug-ins can't fully function as integrated features, but in PostgreSQL, everything is really a plug-in because it is all abstracted. I only remember evaluating TSearch2 about a year ago, and when I read statements like "Vacuum and/or database dump/restore work differently when using TSearch2, SQL scripts need to be executed" etc., I knew that I would not want to go there. But I don't doubt that it works, and that it is a sane concept. Good point. I think we had some problems at that point because the API was improved between versions. Even if it had been integrated, we might have had the same problem. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] postgresql performance tuning
On Dec 6, 2005, at 12:44 PM, Ameet Kini wrote: I have a question on postgres's performance tuning, in particular, the vacuum and reindex commands. Currently I do a vacuum (without full) on all of my tables. However, it's noted in the docs (e.g. http://developer.postgresql.org/docs/postgres/routine-reindex.html) and on the lists here that indexes may still bloat after a while and hence reindex is necessary. How often do people reindex their tables out there? Why would you be running a version older than 7.4? Index bloat is mostly a non-issue in recent releases of pg. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] High context switches occurring
I ran a fairly exhaustive pgbench on 2 test machines I have (quad dual-core Intel and Opteron). Of course the Opteron was much faster, but interestingly, it was experiencing 3x more context switches than the Intel box (up to 100k, versus ~30k avg on the Dell). Both are RH4.0 64bit/PG8.1 64bit. Sun (v40z): -bash-3.00$ time pgbench -c 1000 -t 30 pgbench starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 1000 number of transactions per client: 30 number of transactions actually processed: 30000/30000 tps = 45.871234 (including connections establishing) tps = 46.092629 (excluding connections establishing) real 10m54.240s user 0m34.894s sys 3m9.470s Dell (6850): -bash-3.00$ time pgbench -c 1000 -t 30 pgbench starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 1000 number of transactions per client: 30 number of transactions actually processed: 30000/30000 tps = 22.088214 (including connections establishing) tps = 22.162454 (excluding connections establishing) real 22m38.301s user 0m43.520s sys 5m42.108s Thanks, Anjan -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 22, 2005 2:42 PM To: Anjan Dave Cc: Vivek Khera; Postgresql Performance Subject: Re: [PERFORM] High context switches occurring Anjan Dave [EMAIL PROTECTED] writes: Would this problem change its nature in any way on the recent Dual-Core Intel XEON MP machines? Probably not much. There's some evidence that Opterons have less of a problem than Xeons in multi-chip configurations, but we've seen CS thrashing on Opterons too. I think the issue is probably there to some extent in any modern SMP architecture. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] High context switches occurring
On Dec 6, 2005, at 2:04 PM, Anjan Dave wrote: interestingly, it was experiencing 3x more context switches than the Intel box (upto 100k, versus ~30k avg on Dell). Both are RH4.0 I'll assume that's context switches per second... so for the opteron that's 6540 cs's and for the Dell that's 4074 switches during the duration of the test. Not so much a difference... You see, the opteron was context switching more because it was doing more work :-) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
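One way to make Vivek's normalization concrete (hedged arithmetic of my own, using the approximate rates and wall-clock times from the pgbench runs quoted earlier in the thread): multiplying each box's context-switch rate by its run time shows the totals per run are much closer than the raw per-second rates suggest, and both runs processed the same 30,000 transactions.

```python
# Approximate figures from the pgbench posts above
opteron_cs_per_sec, opteron_runtime = 100_000, 10 * 60 + 54   # Sun v40z, ~10m54s
xeon_cs_per_sec,    xeon_runtime    = 30_000,  22 * 60 + 38   # Dell 6850, ~22m38s

opteron_total = opteron_cs_per_sec * opteron_runtime
xeon_total = xeon_cs_per_sec * xeon_runtime

# Raw rates differ ~3.3x, but totals over the run differ only ~1.6x ...
assert round(opteron_cs_per_sec / xeon_cs_per_sec, 1) == 3.3
assert round(opteron_total / xeon_total, 1) == 1.6
# ... so per unit of work the Opteron isn't switching wildly more;
# it's simply getting through the same workload in half the time.
print(opteron_total, xeon_total)
```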
Re: [PERFORM] postgresql performance tuning
On Dec 6, 2005, at 11:14 AM, Ameet Kini wrote: need for vacuums. However, it'd be great if there was a similar automatic reindex utility, like say, a pg_autoreindex daemon. Are there any plans for this feature? If not, then would cron scripts be the next best what evidence do you have that you are suffering index bloat? or are you just looking for solutions to problems that don't exist as an academic exercise? :-) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Missed index opportunity for outer join?
Tom Lane wrote: [EMAIL PROTECTED] writes: On Mon, 5 Dec 2005, Tom Lane wrote: I speculate that the seq_scan wasn't really the slow part compared to not using both parts of the index in the second part of the plan. The table point_features is tens of thousands of rows, while the table facets is tens of millions. Agreed, but it's still odd that it would use a seqscan in one case and not the other. Hmm. Unfortunately that was happening on a production system and the amount of data in the tables has changed - and now I'm no longer getting a seq_scan when I try to reproduce it. That system is still using 8.1.0. The point_features table is pretty dynamic and it's possible that the data changed between my 'explain analyze' statements in the first post in this thread. However since both of them show an estimate of rows=948 and returned an actual of 917 I don't think that happened. I found the reason why the fac_id=261 clause isn't getting used as an index qual; it's a bit of excessive paranoia that goes back to 2002. I've fixed that for 8.1.1, but am still wondering about the seqscan on the other side of the join. I now have a development system with cvs-tip; but have not yet reproduced the seq scan on it either. I'm using the same data that was in point_features with featureid=120 - but don't have any good way of knowing what other data may have been in the table at the time. If desired, I could set up a cron job to periodically explain analyze that query and see if it recurs. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] LVM and Postgres
Rory Campbell-Lange wrote: The server has a 250GB RAID10 (LSI 320-I + BBU) volume which I am thinking of slicing up in the following way (Linux 2.6 kernel): / : ext3 : 47GB (root, home etc) /boot : ext3 : 1GB /tmp : ext2 : 2GB /usr : ext3 : 4GB /var : ext3 : 6GB --- 60GB VG:190GB approx --- Initially divided so: /data : ext3 : 90GB /postgres : xfs : 40GB This gives me left over space of roughly 60GB to extend into on the volume group, which I can balance between the /data and /postgres logical volumes as needed. Are there any major pitfalls to this approach? Thanks, Rory It looks like you are using fast disks and xfs for the filesystem on the /postgres partition. That's nice. How many disks in the array? One thing you're missing, as far as I can see, is putting the sequential log writes on a separate spindle. WAL and XFS (I think) both have this pattern. If you've got a fast disk and can do BBU write caching, your WAL writes will hustle. Others can probably speak a bit better on any potential speedups. - August ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Missed index opportunity for outer join?
Ron Mayer [EMAIL PROTECTED] writes: The point_features table is pretty dynamic and it's possible that the data changed between my 'explain analyze' statement in the first post in this thread. However since both of them show an estimate of rows=948 and returned an actual of 917 I don't think that happened. Yeah, I had considered the same explanation and rejected it for the same reason. Also, the difference in estimated cost is significant (265.85 for the seqscan vs 172.17 for the bitmap scan) so it's hard to think that a small change in stats --- so small as to not reflect in estimated row count --- would change the estimate by that much. [ thinks some more... ] Of course, what we have to remember is that the planner is actually going to choose based on the ultimate join cost, not on the subplan costs. The reason the seqscan survived initial comparisons at all is that it has a cheaper startup cost (less time to return the first tuple) than the bitmap scan, and this will be reflected into a cheaper startup cost for the overall nestloop. The extra hundred units of total cost would only reflect into the nestloop total cost --- and there, they would be considered down in the noise compared to a 90k total estimate. So probably what happened is that the planner preferred this plan on the basis that the total costs are the same to within estimation error while the startup cost is definitely less. In this explanation, the reason for the change in plans over time could be a change in the statistics for the other table. Is facets more dynamic than point_features? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] TSearch2 vs. Apache Lucene
On 6 Dec 2005, at 16:47, Joshua Kramer wrote: Has anyone ever compared TSearch2 to Lucene, as far as performance is concerned? In our experience (small often-updated documents) Lucene leaves tsearch2 in the dust. This probably has a lot to do with our usage pattern though. For our usage it's very beneficial to have the index on a separate machine to the data, however in many cases this won't make sense. Lucene is also a lot easier to cluster than Postgres (it's simply a matter of NFS-mounting the index). Russ Garrett [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Can this query go faster???
On Tue, Dec 06, 2005 at 10:52:57 +0100, Csaba Nagy [EMAIL PROTECTED] wrote: Joost, Why do you use an offset here? I guess you're traversing the table somehow; in this case it would be better to remember the last zipcode + housenumber and put an additional condition to get the next one bigger than the last you've got... that would use the index on zipcode+housenumber and be very fast. The big offset forces postgres to traverse that many entries until it's able to pick the one row for the result... The other problem with saving an offset is that unless the data is static or you are doing all of the searches in one serialized transaction, the fixed offset might not put you back where you left off. Using the last key, instead of counting records, is normally a better way to do this. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
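The "remember the last key" approach Csaba and Bruno describe can be sketched abstractly. This is a minimal illustration in plain Python, not the poster's query; against a real table the equivalent would be a WHERE (zipcode, housenumber) > (last_zip, last_no) ... ORDER BY zipcode, housenumber LIMIT n query driven by the composite index.

```python
import bisect

def next_page(sorted_rows, last_key, page_size):
    """Keyset pagination: resume strictly after last_key instead of
    counting an OFFSET from the start of the result set."""
    start = 0 if last_key is None else bisect.bisect_right(sorted_rows, last_key)
    return sorted_rows[start:start + page_size]

rows = [(1000, 1), (1000, 2), (1001, 5), (1002, 1), (1002, 9)]
page1 = next_page(rows, None, 2)
page2 = next_page(rows, page1[-1], 2)
assert page1 == [(1000, 1), (1000, 2)]
assert page2 == [(1001, 5), (1002, 1)]

# Unlike OFFSET 2, the result is unchanged if a row before the cursor vanishes:
del rows[0]
assert next_page(rows, page1[-1], 2) == [(1001, 5), (1002, 1)]
```

This captures both points from the thread: the database seeks straight to the key instead of walking past `offset` rows, and concurrent inserts or deletes ahead of the cursor can't shift your position.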
Re: [PERFORM] postgresql performance tuning
Vivek Khera wrote: On Dec 6, 2005, at 11:14 AM, Ameet Kini wrote: need for vacuums. However, it'd be great if there was a similar automatic reindex utility, like say, a pg_autoreindex daemon. Are there any plans for this feature? If not, then would cron scripts be the next best what evidence do you have that you are suffering index bloat? or are you just looking for solutions to problems that don't exist as an academic exercise? :-) The files for the two indices on a single table used 7.8GB of space before a reindex, and 4.4GB after. The table had been reindexed over the weekend and a vacuum was completed on the table about 2 hours ago. The two indices are now 3.4GB smaller. I don't think this counts as bloat, because of our use case. Even so, we reindex our whole database every weekend. -- Alan ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] postgresql performance tuning
Ameet Kini schrieb: This didn't get through the first time around, so resending it again. Sorry for any duplicate entries. Hello, I have a question on postgres's performance tuning, in particular, the vacuum and reindex commands. Currently I do a vacuum (without full) on all of my tables. I'm curious ... why no full vacuum? I bet that the full vacuum will compact your (index) tables as much as a reindex would. I guess the best advice is to increase FSM and to use autovacuum. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Missed index opportunity for outer join?
Tom Lane wrote: ...planner is actually going to choose based on the ultimate join cost, not on the subplan costs... In this explanation, the reason for the change in plans over time could be a change in the statistics for the other table. Is facets more dynamic than point_features? In total rows changing it's more dynamic, but percentage-wise, it's less dynamic (point_features probably turns over around 50% of its rows in a day -- while facets turns over about 3% per day -- but facets is 1000X larger). Facets is a big table with rather odd distributions of values. Many of the values in the indexed columns show up only once, others show up hundreds of thousands of times. Perhaps an analyze ran and just randomly sampled differently, creating different stats on that table? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] postgresql performance tuning
Alan Stange [EMAIL PROTECTED] writes: Vivek Khera wrote: what evidence do you have that you are suffering index bloat? The files for the two indices on a single table used 7.8GB of space before a reindex, and 4.4GB after. That's not bloat ... that's pretty nearly in line with the normal expectation for a btree index, which is about 2/3rds fill factor. If the compacted index were 10X smaller then I'd agree that you have a bloat problem. Periodic reindexing on this scale is not doing a lot for you except thrashing your disks --- you're just giving space back to the OS that will shortly be sucked up again by the same index. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
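Tom's two-thirds rule of thumb can be checked against Alan's numbers with rough arithmetic: the expected steady-state size of a btree is roughly the freshly-reindexed size divided by the typical fill factor.

```python
compact_gb = 4.4      # index size right after REINDEX (from Alan's post)
steady_gb = 7.8       # size before the REINDEX
fill_factor = 2 / 3   # typical btree steady-state fill, per Tom's reply

expected_gb = compact_gb / fill_factor
assert round(expected_gb, 1) == 6.6
# 7.8 GB observed vs ~6.6 GB expected: the same ballpark, nowhere near
# the 10x blowup that would indicate genuine index bloat.
print(round(expected_gb, 1), "GB expected at steady state")
```

In other words, the weekly REINDEX is reclaiming space the index would promptly grow back into anyway.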
[PERFORM] Context switching and Xeon processors
We're running a dual Xeon machine with hyperthreading enabled and PostgreSQL 8.0.3. Below is the CPU type: processor : 3 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.20GHz stepping: 1 cpu MHz : 3200.274 cache size : 1024 KB ... We've been tuning the kernel (2.4 SMP flavor) and have improved performance quite a bit. I'm now wondering if turning off HT will improve performance even more. Based on the vmstat output below, is the context switching typical or too high? And what is the latest on the state of PostgreSQL running on Xeon processors with HT turned on? I searched the archives, but couldn't discern anything definitive. r b swpd free buff cache si so bi bo in cs us sy wa id 1 0 135944 64612 17136 375681600 0 210 154 178 2 0 4 94 1 0 135940 46600 17204 375449600 1 1231 442 3658 7 3 10 80 1 3 135940 51228 17240 375468000 0 1268 255 2659 4 1 14 81 1 0 135940 58512 17300 375468400 0 1818 335 1526 2 1 32 65 1 1 135940 18104 17328 380651600 17670 476 1314 1962 2 2 41 56 0 1 135940 17776 17232 381162000 23193 394 1600 2097 2 2 53 44 0 1 135940 17944 17188 380963600 25459 349 1547 2013 2 2 50 46 0 3 135940 18816 15184 379831200 24284 1328 1529 4730 6 5 53 36 0 6 135940 23536 6060 381708800 27376 1332 1350 2628 2 3 56 39 0 5 135940 18008 6036 382713200 18806 1539 1410 1416 1 2 61 36 0 5 135940 18492 5708 382666000 3540 10354 736 955 2 2 76 20 0 3 135940 18940 5788 382986400 2308 7506 707 519 2 1 81 15 1 4 135940 18980 5820 382883600 138 3503 556 261 1 0 74 24 0 10 135940 39332 5896 32400 579 2805 621 4104 7 4 54 35 0 4 135936 37816 5952 379140400 260 1887 384 1574 2 1 40 57 0 5 135936 29552 5996 380226000 290 1642 434 1944 3 1 38 58 -- Brandon ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Missed index opportunity for outer join?
Ron Mayer [EMAIL PROTECTED] writes: Tom Lane wrote: In this explanation, the reason for the change in plans over time could be a change in the statistics for the other table. Is facets more dynamic than point_features? Facets is a big table with rather odd distributions of values. Many of the values in the indexed columns show up only once, others show up hundreds-of-thousands of times. Perhaps an analyze ran and just randomly sampled differently creating different stats on that table? If you have background tasks doing ANALYZEs then this explanation seems plausible enough. I'm willing to accept it anyway ... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Missed index opportunity for outer join?
Tom Lane wrote: If you have background tasks doing ANALYZEs then this explanation seems plausible enough. I'm willing to accept it anyway ... Yup, there are such tasks. I could dig through logs to try to confirm or reject it; but I think it's reasonably likely that this happened. Basically, data gets added to that table as it becomes ready from other systems, and after each batch a vacuum analyze is run. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] LVM and Postgres
Hi August. Thanks very much for your mail. On 06/12/05, August Zajonc ([EMAIL PROTECTED]) wrote: Rory Campbell-Lange wrote: The server has a 250GB RAID10 (LSI 320-I + BBU) volume which I am thinking of slicing up in the following way (Linux 2.6 kernel):

/     : ext3 : 47GB (root, home etc)
/boot : ext3 : 1GB
/tmp  : ext2 : 2GB
/usr  : ext3 : 4GB
/var  : ext3 : 6GB
--- 60GB
VG: 190GB approx, initially divided so:
/data     : ext3 : 90GB
/postgres : xfs  : 40GB

This gives me left over space of roughly 60GB to extend into on the volume group, which I can balance between the /data and /postgres logical volumes as needed. It looks like you are using fast disks and xfs for the filesystem on the /postgres partition. That's nice. How many disks in the array? Four. One thing you miss is sticking a bunch of sequential log writes on a separate spindle as far as I can see with this? WAL / XFS (I think) both have this pattern. If you've got a fast disk and can do BBU write caching your WAL writes will hustle. Yes, we don't have any spare disks unfortunately. We have enabled the BBU write cache, so we are hoping for good performance. I'd be grateful for some advice on dd/bonnie++ tests for checking this. Others can probably speak a bit better on any potential speedups. I'd better test extending the Logical Volumes too! Many thanks, Rory ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
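Rory asks for advice on dd/bonnie++ tests for checking WAL write performance behind the BBU cache. A rough sketch of such a check (a hypothetical Python script, not from the thread; file name and counts are arbitrary): it times small sequential writes with an fsync after each, which is roughly the WAL's access pattern. With the controller's write-back cache enabled, the achieved rate should be far above the few hundred syncs/sec a single 7200rpm spindle can do natively.

```python
import os
import time

def fsync_write_rate(path="walfsync_test.bin", block=8192, count=512):
    """Time sequential block-sized writes with an fsync after each one,
    roughly the WAL write pattern. Returns fsync'd writes per second.
    With a battery-backed write cache this should greatly exceed the
    drive's rotational sync limit."""
    buf = os.urandom(block)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.time()
    for _ in range(count):
        os.write(fd, buf)
        os.fsync(fd)
    elapsed = time.time() - start
    os.close(fd)
    os.remove(path)
    return count / elapsed

if __name__ == "__main__":
    print("fsync'd 8kB writes/sec: %.0f" % fsync_write_rate())
```

Run it once on the xfs volume and once on any candidate WAL partition; the comparison matters more than the absolute number.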
Re: [PERFORM] postgresql performance tuning
Tom Lane wrote: Alan Stange [EMAIL PROTECTED] writes: Vivek Khera wrote: what evidence do you have that you are suffering index bloat? The files for the two indices on a single table used 7.8GB of space before a reindex, and 4.4GB after. That's not bloat ... that's pretty nearly in line with the normal expectation for a btree index, which is about 2/3rds fill factor. If the compacted index were 10X smaller then I'd agree that you have a bloat problem. I wrote "I don't think this counts as bloat". I still don't. -- Alan ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
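Tom's 2/3rds fill-factor expectation can be sanity-checked against Alan's numbers with a little arithmetic (a sketch; the ~1.5x figure is just the inverse of the 2/3 steady-state fill, so the observed ratio is only modestly above normal, nowhere near the 10x that would indicate bloat):

```python
# Index sizes reported in the thread, in GB.
before_reindex = 7.8   # steady-state size of the two indices
after_reindex = 4.4    # size after REINDEX compacted them

# A freshly rebuilt btree is packed close to full; in steady state
# pages drift toward roughly 2/3 occupancy, so an index about 1.5x
# its compacted size is the normal expectation, not bloat.
observed_ratio = before_reindex / after_reindex
expected_ratio = 1 / (2 / 3)  # ~1.5

print("observed %.2fx vs ~%.2fx expected" % (observed_ratio, expected_ratio))
```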
Re: [PERFORM] Context switching and Xeon processors
t == [EMAIL PROTECTED] writes: t Brandon Metcalf [EMAIL PROTECTED] writes: t We've been tuning the kernel (2.4 SMP flavor) and have improved t performance quite a bit. I'm now wondering if turning off HT will t improve performance even more. Based on the vmstat output below, is t the context switching typical or too high? t Given that your CPU usage is hovering around 2%, it's highly unlikely t that you'll be able to measure any change at all by fiddling with HT. t What you need to be working on is disk I/O --- the 80% wait number t is what should be getting your attention, not the CS number. t (FWIW, on the sort of hardware you're talking about, I wouldn't worry t about CS rates lower than maybe 1/sec --- the hardware can sustain t well over 10x that.) Yes, I agree the disk I/O is an issue and that's what we've been addressing with the tuning we've been doing and have been able to improve. I think that we really need to go to a RAID 10 array to address the I/O issue, but thought I would investigate the context switching issue. Thanks for the information. -- Brandon ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
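For watching the CS rate without staring at vmstat, here is a small sketch that samples the kernel's cumulative context-switch counter (Linux-specific: it assumes a procfs with a `ctxt` line in /proc/stat, as on the 2.4/2.6 kernels discussed in this thread):

```python
import time

def context_switches():
    """Read the cumulative context-switch count from /proc/stat (Linux)."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt "):
                return int(line.split()[1])
    raise RuntimeError("no ctxt line in /proc/stat")

def cs_per_second(interval=1.0):
    """Sample the counter twice and return the switch rate over the interval."""
    start = context_switches()
    time.sleep(interval)
    return (context_switches() - start) / interval

if __name__ == "__main__":
    print("context switches/sec: %.0f" % cs_per_second())
```

This matches the `cs` column of vmstat, but lets you log the rate alongside whatever load you are generating.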
Re: [PERFORM] Context switching and Xeon processors
On Tue, Dec 06, 2005 at 03:01:02PM -0600, Brandon Metcalf wrote: We're running a dual Xeon machine with hyperthreading enabled and PostgreSQL 8.0.3. The two single most important things that will help you with high rates of context switching: - Turn off hyperthreading. - Upgrade to 8.1. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] two disks - best way to use them?
On Tue, 6 Dec 2005, Thomas Harold wrote: Ron wrote: For accuracy's sake, which exact config did you finally use? How did you choose the config you finally used? Did you test the three options or just pick one? (Note: I'm not the original poster.) I just picked the option of putting the data/pg_xlog directory (WAL) on a 2nd set of spindles. That was the easiest thing for me to change on this test box. The test server is simply a Gentoo box running software RAID and LVM2. The primary disk set is 2x7200RPM 300GB drives and the secondary disk set is 2x5400RPM 300GB drives. Brand new install of PGSQL 8.1, with mostly default settings (I changed FSM pages to be a higher value, max_fsm_pages = 15). PGSQL was given its own ext3 32GB LVM volume on the primary disk set (2x7200RPM). Originally, all files were on the primary disk. the WAL is more sensitive to drive speeds than the data is, so you may pick up a little more performance by switching the WAL to the 7200 rpm drives instead of the 5400 rpm drives. if you see a noticeable difference with this, consider buying a pair of smaller, but faster drives (10k or 15k rpm drives, or a solid-state drive). you can test this (with significant data risk) by putting the WAL on a ramdisk and seeing what your performance looks like. David Lang ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] High context switches occurring
Anjan Dave [EMAIL PROTECTED] writes: -bash-3.00$ time pgbench -c 1000 -t 30 pgbench starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 1000 number of transactions per client: 30 number of transactions actually processed: 3/3 tps = 45.871234 (including connections establishing) tps = 46.092629 (excluding connections establishing) I can hardly think of a worse way to run pgbench :-(. These numbers are about meaningless, for two reasons: 1. You don't want number of clients (-c) much higher than scaling factor (-s in the initialization step). The number of rows in the branches table will equal -s, and since every transaction updates one randomly-chosen branches row, you will be measuring mostly row-update contention overhead if there's more concurrent transactions than there are rows. In the case -s 1, which is what you've got here, there is no actual concurrency at all --- all the transactions stack up on the single branches row. 2. Running a small number of transactions per client means that startup/shutdown transients overwhelm the steady-state data. You should probably run at least a thousand transactions per client if you want repeatable numbers. Try something like -s 10 -c 10 -t 3000 to get numbers reflecting test conditions more like what the TPC council had in mind when they designed this benchmark. I tend to repeat such a test 3 times to see if the numbers are repeatable, and quote the middle TPS number as long as they're not too far apart. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
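Tom's first point can be made concrete with a birthday-problem estimate (a sketch, not pgbench code): with -s rows in branches and -c concurrent transactions each updating one randomly chosen row, the probability that at least two clients contend for the same row is 1 minus the probability that all choices are distinct.

```python
def collision_probability(clients, scale):
    """Probability that at least two of `clients` concurrent transactions
    pick the same branches row, given `scale` rows in branches (each
    pgbench transaction updates one randomly chosen branches row)."""
    if clients > scale:
        return 1.0  # pigeonhole: some row is certainly shared
    p_all_distinct = 1.0
    for i in range(clients):
        p_all_distinct *= (scale - i) / scale
    return 1 - p_all_distinct

# -s 1 -c 1000: all 1000 transactions queue on the single branches row.
print(collision_probability(1000, 1))            # prints 1.0
# Even -c 10 against -s 100 contends fairly often at any instant.
print(round(collision_probability(10, 100), 2))  # prints 0.37
```

So with -c much above -s, the benchmark is measuring row-update contention rather than throughput, exactly as Tom describes.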
Re: [PERFORM] postgresql performance tuning
On Tue, Dec 06, 2005 at 04:03:22PM -0600, Ameet Kini wrote: I'm running postgresql v8.0 and my problem is that running vacuum on my indices is blazing fast (up to 10x faster) AFTER running reindex. For a table with only 1 index, the time to do a vacuum (without full) went down from 45 minutes to under 3 minutes. I've also noticed a fairly large increase in vacuum speed after a reindex. (To the point where the reindex + vacuum was faster than just a vacuum.) Mike Stone ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] LVM and Postgres
On Tue, Dec 06, 2005 at 09:36:23PM +, Rory Campbell-Lange wrote: Yes, we don't have any spare disks unfortunately. We have enabled the BBU write, so we are hoping for good performance. Even if you don't use separate disks you'll probably get better performance by putting the WAL on a separate ext2 partition. xfs gives good performance for the table data, but is not particularly good for the WAL. Mike Stone ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] LVM and Postgres
I would argue that you almost certainly won't gain by doing that, as you will create a new place even further away for the disk head to seek to, instead of just another file on the same FS that is probably closer to the current head position. Alex On 12/6/05, Michael Stone [EMAIL PROTECTED] wrote: On Tue, Dec 06, 2005 at 09:36:23PM +, Rory Campbell-Lange wrote: Yes, we don't have any spare disks unfortunately. We have enabled the BBU write, so we are hoping for good performance. Even if you don't use separate disks you'll probably get better performance by putting the WAL on a separate ext2 partition. xfs gives good performance for the table data, but is not particularly good for the WAL. Mike Stone ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] LVM and Postgres
On Tue, Dec 06, 2005 at 07:52:25PM -0500, Alex Turner wrote: I would argue that almost certainly won't by doing that as you will create a new place even further away for the disk head to seek to instead of just another file on the same FS that is probably closer to the current head position. I would argue that you should benchmark it instead of speculating. You are perhaps underestimating the effect of the xfs log. (Ordinarily xfs has great performance, but it seems to be fairly lousy at fsync/osync/etc operations in my benchmarks; my wild speculation is that the sync forces a log flush.) At any rate you're going to have a lot of head movement on any reasonably sized filesystem anyway, and I'm not convinced that hoping that your data will happen to land close to your log is a valid, repeatable optimization technique. Note that the WAL will wander around the disk as files are created and deleted, whereas tables are basically updated in place. Mike Stone ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] TSearch2 vs. Apache Lucene
... So you'll avoid a non-core product and instead only use another non-core product...? Chris Michael Riess wrote: Has anyone ever compared TSearch2 to Lucene, as far as performance is concerned? I'll stay away from TSearch2 until it is fully integrated in the postgres core (like create index foo_text on foo (texta, textb) USING TSearch2). Because a full integration is unlikely to happen in the near future (as far as I know), I'll stick to Lucene. Mike ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Faster db architecture for a twisted table.
On Sat, 2005-12-03 at 23:00 +, Rodrigo Madera wrote:

CREATE TABLE person(
    id bigint PRIMARY KEY,
    first_name TEXT,
    age INT,
    mother bigint REFERENCES person,
    father bigint REFERENCES person,
    siblings array of bigints (don't remember the syntax, but you get the point)
);

Well, this is ok, but imagine a search for brothers of person id 34. We would have to search inside the record's 'siblings' array. Is this a bad design? Is this going to be slow? Well, I don't know how close this example is to your actual problem, but the siblings array is redundant, AFAICS. If you got rid of it, you could query for full sibling brothers with something like (not tested): select bro.* from person p inner join person bro on (p.mother = bro.mother) AND (p.father = bro.father) where bro.sex='M' and p.id=34 ...assuming you added a sex field, which you would need in any case to query for brothers. You could query for half-siblings by changing the AND into an OR, I think. Mitch ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] need help
I'm running PostgreSQL 8.0.3 on i686-pc-linux-gnu (Fedora Core 2). I've been dealing with Psql for more than 2 years now, but I've never had this case before. I have a table that has about 20 rows in it.

Table "public.s_apotik"
    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 obat_id      | character varying(10)       | not null
 stock        | numeric                     | not null
 s_min        | numeric                     | not null
 s_jual       | numeric                     |
 s_r_jual     | numeric                     |
 s_order      | numeric                     |
 s_r_order    | numeric                     |
 s_bs         | numeric                     |
 last_receive | timestamp without time zone |
Indexes: s_apotik_pkey PRIMARY KEY, btree(obat_id)

When I try to UPDATE one of the rows, nothing happens for a very long time. First, I ran it in PgAdminIII; I could see the milliseconds growing as I waited. Then I stopped the query, because the time needed for it is unbelievably wrong. Then I tried to run the query from the psql shell. For example, the table has obat_id : A, B, C, D.

db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='A';
(nothing happens.. I press Ctrl-C to stop it. This is what comes out:)
Cancel request sent
ERROR: canceling query due to user request
(If I try another obat_id)
db=# UPDATE s_apotik SET stock = 100 WHERE obat_id='B';
(Less than a second, this is what comes out:)
UPDATE 1

I can't do anything to that row. I can't DELETE it. Can't DROP the table. I want this data out of my database. What should I do? It's like there's a falsely pointed index here. Any help would be very much appreciated. Regards, Jenny Tania ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] need help (not anymore)
I ran the VACUUM as you suggested, but still no response from the server. So, I decided to DROP the database. I got a message that the database was being used. I closed every application accessing it, but the message remained. I checked the server processes (ps -ax): there were lots of 'UPDATE is waiting ...' entries on the list. I killed them all. I backed up the current database, dropped it, and restored from the backup file I had just made. I don't really know why this happened, but thankfully now everything's normal. Thank you, guys. Regards, Jenny Tania ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Join the same row
Hi, I'm trying to optimize some selects between 2 tables, and the best way I found was to alter the first table and add the fields of the 2nd table. I adjusted the contents and now I have only one table with all the info that I need. Now comes my problem: because of legacy queries, I decided to make a Rule that replaces the 2nd table. Until now all worked well, but I found that when I make a join between the result table and the Rule, even though it is the same row in the same table, the optimizer generates two accesses to the same row: cta_pag is the table and ctapag_adm is the rule.

CREATE OR REPLACE RULE _RETURN AS ON SELECT TO ctapag_adm DO INSTEAD
SELECT cta_pag.nrlancto, cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
FROM cta_pag WHERE cta_pag.origem = 'A'::bpchar;

This is one of the legacy queries:

select * from cta_pag p, ctapag_adm a where a.nrlancto = p.nrlancto and p.nrlancto = 21861;

EXPLAIN:
Nested Loop (cost=0.00..11.49 rows=1 width=443) (actual time=0.081..0.088 rows=1 loops=1)
  -> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1 width=408) (actual time=0.044..0.046 rows=1 loops=1)
       Index Cond: (nrlancto = 21861::numeric)
  -> Index Scan using cta_pag_pk on cta_pag (cost=0.00..5.74 rows=1 width=35) (actual time=0.023..0.025 rows=1 loops=1)
       Index Cond: (21861::numeric = nrlancto)
       Filter: (origem = 'A'::bpchar)
Total runtime: 0.341 ms

Resulting in twice the time for accessing. Accessing the same row just once:

select * from cta_pag p where p.nrlancto = 21861

EXPLAIN:
Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1 width=408) (actual time=0.044..0.047 rows=1 loops=1)
  Index Cond: (nrlancto = 21861::numeric)
Total runtime: 0.161 ms

Is there a way to force the optimizer to understand that it is the same row?
Thanks, Edison -- Edison Azzi edisonazzi (at ) terra ( dot ) com ( dot ) br ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Query Fails with error calloc - Cannot alocate memory
Hi. I am trying to run a query that selects 26 million rows from a table with 68 byte rows. When run on the Server via psql the following error occurs: calloc : Cannot allocate memory. When run via ODBC from Cognos Framework Manager it only works if we limit the retrieval to 3 million rows. I notice that the memory used by the query when run on the Server increases to about 2.4 GB before the query fails. Postgres version is 7.3.4, running on Linux Redhat 7.2, 4 GB memory, 7 Processor 2.5 Ghz, Shmmax set to 2 GB.

Configuration Parameters:
Shared Buffers 12 288
Max Connections 16
Wal buffers 24
Sort Mem 40960
Vacuum Mem 80192
Checkpoint Timeout 600
Enable Seqscan false
Effective Cache Size 20

Results of explain analyze and explain analyze verbose:

explain analyze select * from flash_by_branches;
QUERY PLAN
--
Seq Scan on flash_by_branches (cost=1.00..100567542.06 rows=26854106 width=68) (actual time=12.14..103936.35 rows=26854106 loops=1)
Total runtime: 122510.02 msec
(2 rows)

explain analyze verbose: { SEQSCAN :startup_cost 1.00 :total_cost 100567542.06 :rows 26854106 :width 68 :qptargetlist ( { TARGETENTRY :resdom { RESDOM :resno 1 :restype 1043 :restypmod 8 :resname br_code :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 1043 :vartypmod 8 :varlevelsup 0 :varnoold 1 :varoattno 1 } } { TARGETENTRY :resdom { RESDOM :resno 2 :restype 23 :restypmod -1 :resname fty_code :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2 } } { TARGETENTRY :resdom { RESDOM :resno 3 :restype 1082 :restypmod -1 :resname period :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 3 :vartype 1082 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3 } } { TARGETENTRY :resdom { RESDOM :resno 4 :restype 1700 :restypmod 786436 :resname value :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 4 :vartype 1700 :vartypmod 786436 :varlevelsup 0 :varnoold 1 :varoattno 4 } } { TARGETENTRY :resdom { RESDOM :resno 7 :restype 1700 :restypmod 786438 :resname value1 :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 7 :vartype 1700 :vartypmod 786438 :varlevelsup 0 :varnoold 1 :varoattno 7 } } ) :qpqual :lefttree :righttree :extprm () :locprm () :initplan :nprm 0 :scanrelid 1 } Seq Scan on flash_by_branches (cost=1.00..100567542.06 rows=26854106 width=68) (actual time=6.59..82501.15 rows=26854106 loops=1) Total runtime: 102089.00 msec (196 rows)

Please assist. Thanks, Howard Oblowitz --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.859 / Virus Database: 585 - Release Date: 14/02/2005
Re: [PERFORM] LVM and Postgres
Michael Stone wrote: Note that the WAL will wander around the disk as files are created and deleted, whereas tables are basically updated in place. Huh? I was rather under the impression that the WAL files (in pg_xlog, right?) were reused once they'd been created, so their locations on the disk should remain the same, as should their data blocks (roughly, depending on the implementation of the filesystem, of course). -- Kevin Brown [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Query Fails with error calloc - Cannot alocate memory
If you're trying to retrieve 26 million rows into RAM in one go, of course it'll be trouble. Just use a cursor. (DECLARE/FETCH/MOVE) Chris Howard Oblowitz wrote: Hi ... I am trying to run a query that selects 26 million rows from a table with 68 byte rows. When run on the Server via psql the following error occurs: calloc : Cannot allocate memory [...] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] High context switches occurring
Tom Lane wrote: Anjan Dave [EMAIL PROTECTED] writes: -bash-3.00$ time pgbench -c 1000 -t 30 pgbench starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 1000 number of transactions per client: 30 number of transactions actually processed: 3/3 tps = 45.871234 (including connections establishing) tps = 46.092629 (excluding connections establishing) I can hardly think of a worse way to run pgbench :-(. These numbers are about meaningless, for two reasons: 1. You don't want number of clients (-c) much higher than scaling factor (-s in the initialization step). The number of rows in the branches table will equal -s, and since every transaction updates one Should we throw a warning when someone runs the test this way? -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] High context switches occurring
Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: 1. You don't want number of clients (-c) much higher than scaling factor (-s in the initialization step). Should we throw a warning when someone runs the test this way? Not a bad idea (though of course only for the standard scripts). Tatsuo, what do you think? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query Fails with error calloc - Cannot alocate memory
On Mon, 2005-12-05 at 09:42 +0200, Howard Oblowitz wrote: I am trying to run a query that selects 26 million rows from a table with 68 byte rows. When run on the Server via psql the following error occurs: calloc : Cannot allocate memory That's precisely what I'd expect: the backend will process the query and begin sending back the entire result set to the client. The client will attempt to allocate a local buffer to hold the entire result set, which obviously fails in this case. You probably want to explicitly create and manipulate a cursor via DECLARE, FETCH, and the like -- Postgres will not attempt to do this automatically (for good reason). Postgres version is 7.3.4 You should consider upgrading, 7.3 is quite old. At the very least, you should probably be using the most recent 7.3.x release, 7.3.11. -Neil ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
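On the 7.3 server, the cursor form Neil and Chris describe would be along the lines of BEGIN; DECLARE cur CURSOR FOR SELECT * FROM flash_by_branches; followed by FETCH 10000 FROM cur in a loop, then COMMIT. The client-side pattern — process the result in bounded batches instead of materializing 26 million rows — looks like this sketch (shown with Python's built-in sqlite3 purely so it is self-contained; table and column names are taken from the thread, the data is synthetic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flash_by_branches(br_code TEXT, value REAL)")
conn.executemany("INSERT INTO flash_by_branches VALUES (?, ?)",
                 (("br%05d" % i, float(i)) for i in range(100000)))

# Stream the result in fixed-size batches rather than loading the whole
# result set into client memory -- the equivalent of looping over
# FETCH 10000 FROM cur on the Postgres side. Client memory stays bounded
# by the batch size, not the total row count.
cur = conn.execute("SELECT br_code, value FROM flash_by_branches")
total = 0
while True:
    batch = cur.fetchmany(10000)
    if not batch:
        break
    total += len(batch)   # process the batch here

print(total)  # 100000
```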
Re: [PERFORM] TSearch2 vs. Apache Lucene
No, my problem is that using TSearch2 interferes with other core components of postgres like (auto)vacuum or dump/restore. ... So you'll avoid a non-core product and instead only use another non-core product...? Chris Michael Riess wrote: Has anyone ever compared TSearch2 to Lucene, as far as performance is concerned? I'll stay away from TSearch2 until it is fully integrated in the postgres core (like create index foo_text on foo (texta, textb) USING TSearch2). Because a full integration is unlikely to happen in the near future (as far as I know), I'll stick to Lucene. Mike ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org