Re: [PERFORM] [ADMIN] Messed up time zones
> All the above are the exact same point in time, merely stated as relevant to each location. Note that given a timestamp with time zone and a zone, PostgreSQL returns a timestamp without time zone (you know the zone, since you specified it).

Yes, I know the zone. But I don't know the offset from UTC. Example:

template1=> set timezone to 'UTC';
SET
template1=> select ('2011-10-30 01:00:00'::timestamptz) at time zone 'Europe/Budapest';
      timezone
---------------------
 2011-10-30 02:00:00    -- Is it winter or summer time?
(1 row)

template1=> select ('2011-10-30 00:00:00'::timestamptz) at time zone 'Europe/Budapest';
      timezone
---------------------
 2011-10-30 02:00:00    -- Is it winter or summer time? What is the offset from UTC here? Can you tell me when it was in UTC?
(1 row)

What is more:

template1=> select (('2011-10-30 00:00:00'::timestamptz) at time zone 'Europe/Budapest') is distinct from (('2011-10-30 01:00:00'::timestamptz) at time zone 'Europe/Budapest');
 ?column?
----------
 f
(1 row)

Yeah, we know what time zone it is in, but we don't know when it was, thanks a lot. :-( It would be unambiguous to store the UTC offset along with the value. But that is not how it was implemented.
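One way to keep the instant recoverable is to hold on to the timestamptz itself (PostgreSQL stores it internally as a UTC instant) rather than the AT TIME ZONE result. A minimal sketch; the to_char OF pattern assumed here only exists on newer releases (9.4+):

set timezone to 'Europe/Budapest';
select ts, to_char(ts, 'YYYY-MM-DD HH24:MI:SS OF') as local_with_offset
from (values ('2011-10-30 00:00:00+00'::timestamptz),
             ('2011-10-30 01:00:00+00'::timestamptz)) as t(ts);
-- 2011-10-30 02:00:00 +02    (summer time)
-- 2011-10-30 02:00:00 +01    (winter time)  -- the two instants stay distinct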
Re: [PERFORM] [ADMIN] Messed up time zones
> So you took two distinct points in time, threw away some critical information, and are surprised why they are now equal?

Well, I did not want to throw away any information. The actual representation could be something like "2012-11-04 01:30:00-08 in Europe/Budapest, winter time" and "2012-11-04 01:30:00-08 in Europe/Budapest, summer time". It would be unambiguous: everybody would know the time zone, the UTC offset and the time value, and conversion back to UTC would be unambiguous too. I presumed that the representation was like that. But I was wrong. I have checked other programming languages. As it turns out, nobody wants to change the representation just because there can be an ambiguous hour every year. Now I think that most systems treat ambiguous time stamps as if they were in standard time. And who am I to go against the main flow? I'm sorry, I admit that the problem was in my head.
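If it holds that ambiguous local times resolve to standard time, a quick sketch to verify it on Europe/Budapest (the fall-back day there in 2012 was October 28; the expected output is an assumption, not taken from the thread):

set timezone to 'Europe/Budapest';
select '2012-10-28 02:30:00'::timestamptz;
-- expected: 2012-10-28 02:30:00+01, i.e. the ambiguous wall-clock time
-- interpreted as winter/standard time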
[PERFORM] ZFS vs. UFS
Hello,

Under FreeBSD 9, what filesystem should I use for PostgreSQL? (Dell PowerEdge 2900, 24G mem, 10x2T SATA2 disks, Intel RAID controller.)

* ZFS is journaled, and it is more independent of the hardware. So if the computer goes wrong, I can move the ZFS array to a different server.
* UFS is not journaled. Also, I have to rely on the RAID card to build the RAID array. If there is a hw problem with it, then I won't be able to recover the data easily.

I wonder if UFS has better performance or not. Or can you suggest another fs? Just for the PGDATA directory.

Thanks,

Laszlo
Re: [PERFORM] ZFS vs. UFS
>> I wonder if UFS has better performance or not. Or can you suggest another fs? Just for the PGDATA directory.
>
> Relying on physically moving a disk isn't a good backup/recovery strategy. Disks are the least reliable single component in a modern computer. You should figure out the best file system for your application, and separately figure out a recovery strategy, one that can survive the failure of *any* component in your system, including the disk itself.

This is why I use a RAID array of 10 disks. So there is no single point of failure. What else could I do? (Yes, I can make regular backups, but that is not the same. I can still lose data...)
Re: [PERFORM] ZFS vs. UFS
On 24.07.2012 14:51, Laszlo Nagy wrote:
>> * UFS is not journaled.
>
> There is journal support for UFS as far as I know. Please have a look at the gjournal manpage.

Yes, but gjournal works on disk devices. I would have to rely on the hw card for RAID. When the card goes wrong, I won't be able to access my data. I could also buy an identical RAID card. In fact I could buy a complete backup server. But right now I don't have the money for that. So I would like to use a solution that allows me to recover from a failure even if the RAID card goes wrong. It might also be possible to combine gmirror + gjournal, but that is not good enough. Performance and stability of a simple gmirror with two disks is much worse than a raidz array with 10 disks (and a hot spare), or even a RAID 1+0 (and hot spare) that is supported by the hw RAID card. So I would like to stick with UFS + hw card support (and then I need to buy an identical RAID card if I can), or ZFS.
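For the portability argument, a sketch of the ZFS route (device names hypothetical): a raidz pool with a hot spare can be detached from one FreeBSD box and attached to another without the original controller.

zpool create pgdata raidz da0 da1 da2 da3 da4 da5 da6 da7 da8 spare da9
zpool export pgdata     # on the failing server
zpool import pgdata     # on the replacement server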
Re: [PERFORM] queries are fast after dump-restore but slow again after some days despite vacuum
> Are you running a lot of full table updates?

If you mean updates which are applied on every or almost every row of the table - yes, it happens with two rather small tables of max. 10 000 rows. But they are both not touched by the query with this big performance difference. I'm not an expert, but would it help to change fillfactor to about 45%? I'm just guessing that full table updates with fillfactor=45% could store the new row versions on the same page. Maybe I'm wrong.
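For reference, a sketch of the fillfactor change being asked about (table name hypothetical); the setting only applies to newly written pages, so the table has to be rewritten once for it to take full effect:

ALTER TABLE small_config SET (fillfactor = 45);
VACUUM FULL small_config;   -- or CLUSTER, to rewrite the existing pages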
Re: [PERFORM] Slow query on CLUSTER-ed tables
> I suspect that, since the matched hid's probably aren't sequential, many of those ~500 product_price_offer_history rows will be far apart on disk.

OMG, I was a fool! I'll CLUSTER on a different index and it will be fast, I'm sure. Thanks!

L
[PERFORM] Slow query on CLUSTER-ed tables
Given two tables:

CREATE TABLE product_price_history (
  hid bigint NOT NULL,
  hdate timestamp without time zone NOT NULL,
  id bigint NOT NULL,
  product_id bigint NOT NULL,
  -- more columns here
  CONSTRAINT pk_product_price_history PRIMARY KEY (hid)
);

CREATE INDEX idx_product_price_history_id_hdate
  ON product_price_history USING btree (id, hdate);

CREATE TABLE product_price_offer_history (
  hid bigint NOT NULL,
  product_price_id bigint NOT NULL,
  isfeatured smallint NOT NULL,
  price double precision NOT NULL,
  shipping double precision NOT NULL,
  -- some more columns here
  CONSTRAINT pk_product_price_offer_history PRIMARY KEY (hid, offerno)
);

Stats:

product_price_history: table size = 23GB, indexes size = 4GB, row count = 87 million
product_price_offer_history: table size = 24GB, indexes size = 7GB, row count = 235 million

These tables store historical data of some million products from the last year. The following commands are executed on them daily:

CLUSTER idx_product_price_history_id_hdate on product_price_history;
CLUSTER pk_product_price_offer_history on product_price_offer_history;

Here is a query:

select date_part('epoch', min(pph.hdate)) as hdate_ticks,
       min(ppoh.price + ppoh.shipping) as price_plus_shipping
from product_price_history pph
inner join product_price_offer_history ppoh on ppoh.hid = pph.hid
where pph.id = 37632081 and ppoh.isfeatured = 1
group by ppoh.merchantid, pph.hid, pph.hdate
order by pph.hid asc

I think that the query plan is correct:

GroupAggregate  (cost=5553554.25..5644888.17 rows=2283348 width=50)
  ->  Sort  (cost=5553554.25..5559262.62 rows=2283348 width=50)
        Sort Key: pph.hid, ppoh.merchantid, pph.hdate
        ->  Nested Loop  (cost=0.00..5312401.66 rows=2283348 width=50)
              ->  Index Scan using idx_product_price_history_id_hdate on product_price_history pph  (cost=0.00..8279.80 rows=4588 width=16)
                    Index Cond: (id = 37632081)
              ->  Index Scan using pk_product_price_offer_history on product_price_offer_history ppoh  (cost=0.00..1149.86 rows=498 width=42)
                    Index Cond: (ppoh.hid = pph.hid)
                    Filter: (ppoh.isfeatured = 1)

So it uses two index scans on the indexes we CLUSTER the tables on. The number of rows returned is usually between 100 and 20 000.

Here is the problem. When I first run this query for a given identifier, it runs for 100 seconds. When I run it again for the same identifier, it returns the same rows within one second!

The indexes are very well conditioned: of the 235 million rows, any given id occurs at most 20 000 times. It is a btree index, so it is already stored sorted, and the 20 000 rows to be returned should fit into a few database pages. Even if they are not in the cache, PostgreSQL should be able to read the required pages within a second. I understand that for an index scan, PostgreSQL also needs to read the rows from the table. But since these tables are CLUSTER-ed on those specific indexes, all the data needed should fit on a few database pages, and PostgreSQL should be able to read them within a second.

Then why is it taking 100 seconds to run the query the first time, and why is it just one second the second time? Probably my thinking is wrong, but I suppose it means that the data is spread over thousands of pages on the disk. How is that possible? What am I doing wrong?

Thanks,

Laszlo
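A quick way to test the "spread over thousands of pages" theory is the planner's physical-ordering statistic; a sketch using the stock pg_stats view. Correlation near 1.0 for the clustered columns means the daily CLUSTER is still holding; a low value means the rows really are scattered.

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename IN ('product_price_history', 'product_price_offer_history')
  AND attname IN ('id', 'hid');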
[PERFORM] Bad query plan when the wrong data type is used
This query:

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4

runs in 43 msec. The visa.id column has int4 datatype. The query plan uses an index condition:

Nested Loop  (cost=0.00..26.19 rows=1 width=28)
  ->  Nested Loop  (cost=0.00..17.75 rows=1 width=24)
        ->  Index Scan using variation_item_sellingsite_asin_pkey on variation_item_sellingsite_asin visa  (cost=0.00..8.58 rows=1 width=16)
              Index Cond: (id = 4)
        ->  Index Scan using pk_product_id on product p  (cost=0.00..9.16 rows=1 width=16)
              Index Cond: (p.id = visa.product_id)
  ->  Index Scan using pk_variation_item_id on variation_item vi  (cost=0.00..8.43 rows=1 width=12)
        Index Cond: (vi.id = visa.variation_item_id)

This query:

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4.0

runs for 1144 msec! The query plan uses a seq scan + filter:

Nested Loop  (cost=33957.27..226162.68 rows=14374 width=28)
  ->  Hash Join  (cost=33957.27..106190.76 rows=14374 width=20)
        Hash Cond: (visa.variation_item_id = vi.id)
        ->  Seq Scan on variation_item_sellingsite_asin visa  (cost=0.00..71928.04 rows=14374 width=16)
              Filter: ((id)::numeric = 4.0)
        ->  Hash  (cost=22026.01..22026.01 rows=954501 width=12)
              ->  Seq Scan on variation_item vi  (cost=0.00..22026.01 rows=954501 width=12)
  ->  Index Scan using pk_product_id on product p  (cost=0.00..8.33 rows=1 width=16)
        Index Cond: (p.id = visa.product_id)

Which is silly. PostgreSQL converts the int side to numeric (as the filter shows), and then compares. It would be better to do this for each item in the loop:

* evaluate the right side (which is numeric)
* tell if it is an integer or not
* if not an integer, then discard the row immediately
* otherwise use its integer value for the index scan

The result is identical, but it makes it possible to use the index scan. Of course, I know that the query itself is wrong, because I should not use a numeric literal where an int is expected. But this CAN be optimized, so I think it should be! My idea for the query optimizer is not to convert to the wider data type, but to the data type that has an index on it instead. (I spent an hour figuring out what was wrong with my program. In some cases it was slow, in other cases it was really fast, and I never got an error message.)

What do you think?

Laszlo
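Until the planner learns that, the client-side workaround is an explicit cast back to the column's type, which restores the index scan. A hedged sketch (note that ::int4 silently rounds, e.g. 4.6::int4 gives 5, so validate integrality first if the value can really be fractional):

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4.0::int4;   -- int4 = int4 again, so the pkey index is usable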
[PERFORM] Get master-detail relationship metadata
Hi All,

I'm working on a client program that iterates over master-detail relationships in a loop chain. Pseudo code:

for row_1 in table_1:
    table_2 = get_details(row_1, table_2)
    for row_2 in table_2:
        row_3 = get_details(row_2, table_3)
        # etc.
        process_data(row_1, row_2, row_3, ...)

My task is to write the get_details iterator efficiently. The obvious way to do it is to query the details in every get_details() call, but that is not efficient. We have relationships where one master only has a few details. For 1 million master rows, that would result in the execution of millions of SQL SELECT commands, degrading performance by magnitudes. My idea was that the iterator should pre-fetch and cache data for many master records at once. The get_details() would use the cached rows, thus reducing the number of SQL SELECT statements needed. Actually, I wrote the iterator, and it works fine in some cases. For example:

producers = get_rows(producer)
for producer in producers:
    products = get_details(producer, product)
    for product in products:
        prices = get_details(product, prices)
        for price in prices:
            process_product_price(producer, product, price)

This works fine if one producer has no more than 1000 products and one product has no more than 10 prices. I can easily keep 10 000 records in memory. The actual code executes about 15 SQL queries while iterating over 1 million rows. Compared to the obvious method, that is roughly a 15-fold performance improvement. But sometimes it just doesn't work. If a producer has 1 million products, and one product has 100 prices, then it won't work, because I cannot keep 100 million prices in memory. My program should somehow figure out how many rows it will get for one master, and select between the cached and non-cached methods. So here is the question: is there a way to get this information from PostgreSQL itself? I know that the query plan contains information about this, but I'm not sure how to extract it. Should I run an EXPLAIN command of some kind, and parse the result as a string? For example:

EXPLAIN select * from product where producer_id=1008;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on product  (cost=0.00..1018914.74 rows=4727498 width=1400)
   Filter: (producer_id = 1008)
(2 rows)

Then I could extract rows=4727498 to get an idea about how many detail rows I'll get for the master. Is there any better way to do it? And how reliable is this?

Thanks,

Laszlo
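A parsing-friendlier option, assuming PostgreSQL 9.0 or later: EXPLAIN can emit machine-readable output, so the estimate comes out as a proper field instead of a substring to scrape.

EXPLAIN (FORMAT JSON) SELECT * FROM product WHERE producer_id = 1008;
-- the top plan node carries "Plan Rows": 4727498, which a client can
-- read with any JSON parser instead of scraping the text plan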
[PERFORM] Slow query + why bitmap index scan??
This will be a simple question to answer. :-) There is a single table:

select count(*) from product_price_history -- 12982555 rows

This table has exactly one index and one primary key constraint:

CREATE INDEX idx_product_price_history_id_hdate
  ON product_price_history USING btree (id, hdate);

ALTER TABLE product_price_history
  ADD CONSTRAINT pk_product_price_history PRIMARY KEY (hid);

No more constraints or indexes are defined on this table. Rows are never updated or deleted in this table; they are only inserted. It was vacuum-ed and reindex-ed today. Stats on the table: seq scans = 13, index scans = 108, table size = 3770MB, toast table size = 8192 bytes, indexes size = 666MB.

This query:

select hid from product_price_history where id=35547581

returns 759 rows in 8837 msec! How can this be that slow??? The query plan is:

Bitmap Heap Scan on product_price_history  (cost=13.90..1863.51 rows=472 width=8)
  Recheck Cond: (id = 35547581)
  ->  Bitmap Index Scan on idx_product_price_history_id_hdate  (cost=0.00..13.78 rows=472 width=0)
        Index Cond: (id = 35547581)

I don't understand why PostgreSQL uses a bitmap heap scan + bitmap index scan. Why not just use a regular index scan? Data in a btree index is already sorted. A normal index scan should take no more than a few page reads. This should never take 8 seconds.

Thanks,

Laszlo
Re: [PERFORM] Slow query + why bitmap index scan??
On 2011-01-12 14:42, Florian Weimer wrote:
> * Laszlo Nagy:
>> This query:
>>
>> select hid from product_price_history where id=35547581
>>
>> Returns 759 rows in 8837 msec! How can this be that slow???
>
> If most records are on different heap pages, processing this query requires many seeks. 11ms per seek is not too bad if most of them are cache misses.

How about this:

select id,hdate from product_price_history where id=35547581 -- 759 rows, 8837 ms

Query time average: 3 sec. Query plan:

Bitmap Heap Scan on product_price_history  (cost=13.91..1871.34 rows=474 width=16)
  Recheck Cond: (id = 35547582)
  ->  Bitmap Index Scan on idx_product_price_history_id_hdate  (cost=0.00..13.79 rows=474 width=0)
        Index Cond: (id = 35547582)

Why still the heap scan here? All fields in the query are in the index... Wouldn't a simple index scan be faster? (This is only a theoretical question, I'm just curious.)

My first idea to speed things up is to cluster this table regularly. That would pack (most of) the matching rows into a few pages. Fewer page reads - faster query. Is it a good idea?

Another question. Do you think that increasing shared_mem would make it faster? Currently we have:

shared_mem = 6GB
work_mem = 512MB
total system memory = 24GB

Total database size is about 30GB, but there are other programs running on the system, and many other tables.

Thanks,

Laszlo
Re: [PERFORM] Slow query + why bitmap index scan??
On 2011-01-12 15:36, Kevin Grittner wrote:
> Laszlo Nagy gand...@shopzeus.com wrote:
>> shared_mem = 6GB
>> work_mem = 512MB
>> total system memory = 24GB
>
> In addition to the good advice from Ken, I suggest that you set effective_cache_size (if you haven't already). Add whatever the OS shows as RAM used for cache to the shared_mem setting.

It was 1GB. Now I have changed it to 2GB. Although the OS shows 9GB inactive memory, we have many concurrent connections to the database server. I hope it is okay to use 2GB.

> But yeah, for your immediate problem, if you can cluster the table on the index involved, it will be much faster. Of course, if the table is already in a useful order for some other query, that might get slower, and unlike some other products, CLUSTER in PostgreSQL doesn't *maintain* that order for the data as new rows are added -- so this should probably become a weekly (or monthly or some such) maintenance operation.

Thank you! After clustering, queries are really fast. I don't worry about other queries. This is the only way we use this table - get details for a given id value. I put the CLUSTER command into a cron script that runs daily. The second time it ran, it took 2 minutes, so I guess it will be fine.

Thank you for your help.

Laszlo
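For reference, a sketch of what such a nightly maintenance script might run (syntax as used elsewhere in the thread; the trailing ANALYZE is an assumption, added because CLUSTER rewrites the whole table):

CLUSTER idx_product_price_history_id_hdate ON product_price_history;
ANALYZE product_price_history;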
Re: [PERFORM] SSD + RAID
> A change has been written to the WAL and fsync()'d, so Pg knows it's hit disk. It can now safely apply the change to the tables themselves, and does so, calling fsync() to tell the drive containing the tables to commit those changes to disk. The drive lies, returning success for the fsync when it's just cached the data in volatile memory. Pg carries on, shortly deleting the WAL archive the changes were recorded in or recycling it and overwriting it with new change data. The SSD is still merrily buffering data to write cache, and hasn't got around to writing your particular change yet.

All right. I believe you. In the current Pg implementation, I need to turn off the disk cache. But I would like to ask some theoretical questions. It is just an idea from me, and probably I'm wrong. Here is a scenario:

#1. the user wants to change something, resulting in a write_to_disk(data) call
#2. data is written into the WAL and fsync()-ed
#3. at this point the write_to_disk(data) call CAN RETURN, and the user can continue his work (the WAL is already written, changes cannot be lost)
#4. Pg can continue writing data onto the disk, and fsync() it
#5. then the WAL archive data can be deleted

Now maybe I'm wrong, but between #3 and #5, the data to be written is kept in memory. This is basically a write cache, implemented in OS memory. We could really handle it like a write cache. E.g. everything would remain the same, except that we add some latency. We can wait some time after the last modification of a given block, and then write it out. Is it possible to do? If so, then we could turn off the write cache for all drives except the one holding the WAL, and write speed would remain the same. I don't think that any SSD drive has more than some megabytes of write cache. The same amount of write cache could easily be implemented in OS memory, and then Pg would always know what hit the disk.

Thanks,

Laci
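For context, turning the volatile cache off is done at the drive level; a hedged example with hypothetical device names, using the Linux hdparm tool for ATA/SATA drives (other platforms have their own equivalents):

hdparm -W 0 /dev/sda    # disable the drive's write cache
hdparm -W /dev/sda      # query the current write-cache setting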
Re: [PERFORM] SSD + RAID
> - Pg doesn't know the erase block sizes or positions. It can't group writes up by erase block except by hoping that, within a given file, writing in page order will get the blocks to the disk in roughly erase-block order. So your write caching isn't going to do anywhere near as good a job as the SSD's can.

Okay, I see. We cannot query the erase block size from an SSD drive. :-(

>> I don't think that any SSD drive has more than some megabytes of write cache.
>
> The big, lots-of-$$ ones have HUGE battery backed caches for exactly this reason.

Heh, this is why they are so expensive. :-)

>> The same amount of write cache could easily be implemented in OS memory, and then Pg would always know what hit the disk.
>
> Really? How does Pg know what order the SSD writes things out from its cache?

I got the point. We cannot implement an efficient write cache without much more knowledge about how that particular drive works. So... the only solution that works well is to have much more RAM for the read cache, and much more RAM for the write cache inside the RAID controller (with BBU).

Thank you,

Laszlo
Re: [PERFORM] SSD + RAID
Heikki Linnakangas wrote:
> Laszlo Nagy wrote:
>> * I need at least 32GB disk space. So a DRAM based SSD is not a real option. I would have to buy 8x4GB memory, which costs a fortune. And then it would still not have redundancy.
>
> At 32GB database size, I'd seriously consider just buying a server with a regular hard drive or a small RAID array for redundancy, and stuffing 16 or 32 GB of RAM into it to ensure everything is cached. That's tried and tested technology.

32GB is for one table only. This server runs other applications, and you need to leave space for sort memory, shared buffers etc. Buying 128GB memory would solve the problem, maybe... but it is too expensive. And it is not safe. Power out - data loss.

> I don't know how you came to the 32 GB figure, but keep in mind that administration is a lot easier if you have plenty of extra disk space for things like backups, dumps+restore, temporary files, upgrades etc.

This disk space would be dedicated to a smaller tablespace, holding one or two bigger tables with index scans. Of course I would never use an SSD disk for storing database backups. It would be a waste of money.

L
Re: [PERFORM] SSD + RAID
Robert Haas wrote:
> 2009/11/14 Laszlo Nagy gand...@shopzeus.com:
>> 32GB is for one table only. This server runs other applications, and you need to leave space for sort memory, shared buffers etc. Buying 128GB memory would solve the problem, maybe... but it is too expensive. And it is not safe. Power out - data loss.
>
> I'm sorry, I thought he was talking about keeping the database in memory with fsync=off. Now I see he was only talking about the OS disk cache.

My server has 24GB RAM, and I cannot easily expand it unless I throw out some 2GB modules and buy more 4GB or 8GB modules. But... buying 4x8GB ECC RAM (+throwing out 4x2GB RAM) is a lot more expensive than buying some 64GB SSD drives. 95% of the table in question is not modified, only read (mostly with index scans). Only 5% is actively updated. This is why I think that using SSD in my case would be effective. Sorry for the confusion.

L
Re: [PERFORM] SSD + RAID
>> * I could buy two X25-E drives and have 32GB disk space, and some redundancy. This would cost about $1600, not counting the RAID controller. It is on the edge.
>
> This was the solution I went with (4 drives in a raid 10 actually). Not a cheap solution, but the performance is amazing.

I came across this article:

http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/

It's from a Linux MySQL user, so it's a bit confusing, but it looks like he has some reservations about performance vs reliability of the Intel drives - apparently they have their own write cache, and when it's disabled performance drops sharply.

Ok, I'm getting confused here. There is the WAL, which is written sequentially. If the WAL is not corrupted, then it can be replayed on the next database startup. Please somebody enlighten me! In my mind, fsync is only needed for the WAL. If I could configure postgresql to put the WAL on a real hard drive that has a BBU and write cache, then I cannot lose data. Meanwhile, product table data could be placed on the SSD drive, and I should be able to turn on the write cache safely. Am I wrong?

L
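The usual way to make that split is to move the WAL directory onto the BBU-backed array and symlink it back into the data directory. A sketch (target path hypothetical; stop the server first):

mv /usr/local/pgsql/data/pg_xlog /bbu_array/pg_xlog
ln -s /bbu_array/pg_xlog /usr/local/pgsql/data/pg_xlog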
[PERFORM] SSD + RAID
Hello,

I'm about to buy SSD drive(s) for a database. For decision making, I used this tech report:

http://techreport.com/articles.x/16255/9
http://techreport.com/articles.x/16255/10

Here are my concerns:

* I need at least 32GB disk space. So a DRAM based SSD is not a real option. I would have to buy 8x4GB memory, which costs a fortune. And then it would still not have redundancy.
* I could buy two X25-E drives and have 32GB disk space, and some redundancy. This would cost about $1600, not counting the RAID controller. It is on the edge.
* I could also buy many cheaper MLC SSD drives. They cost about $140 each. So even with 10 drives, I'm at $1400. I could put them in RAID6, have much more disk space (256GB), high redundancy and POSSIBLY good read/write speed. Of course then I need to buy a good RAID controller.

My question is about the last option. Are there any good RAID cards that are optimized (or can be optimized) for SSD drives? Do any of you have experience in using many cheaper SSD drives? Is it a bad idea?

Thank you,

Laszlo
Re: [PERFORM] SSD + RAID
> Note that some RAID controllers (3Ware in particular) refuse to recognize the MLC drives, in particular, they act as if the OCZ Vertex series do not exist when connected. I don't know what they're looking for (perhaps some indication that actual rotation is happening?) but this is a potential problem: make sure your adapter can talk to these things! BTW I have done some benchmarking with Postgresql against these drives and they are SMOKING fast.

I was thinking about an ARECA 1320 with 2GB memory + BBU. Unfortunately, I cannot find information about using ARECA cards with SSD drives. I'm also not sure how they would work together. I guess the RAID cards are optimized for conventional disks. They read/write data in bigger blocks and they optimize the order of reading/writing for physical cylinders. I know for sure that this particular Areca card has an Intel dual core I/O processor and its own embedded operating system. I guess it could be tuned for SSD drives, but I don't know how.

I was hoping that with a RAID 6 setup, write speed (which is slower for the cheaper flash based SSD drives) would dramatically increase, because information is written simultaneously to 10 drives. With a very small block size, that would probably be true. But... what if the RAID card uses bigger block sizes, and - say - I want to update much smaller blocks in the database?

My other option is to buy two SLC SSD drives and use RAID 1. It would cost about the same, but has less redundancy and less capacity. Which is faster: 8-10 MLC disks in RAID 6 with a good caching controller, or two SLC disks in RAID 1?

Thanks,

Laszlo
[PERFORM] random_page_cost for tablespace
Hi All,

We have a bigger table with some million rows. The number of index scans is high, the number of seq reads is low. This table is often joined with others... so we want to buy a new SSD drive, create a tablespace on it and put this big table on it. Random read speed on SSD is identical to seq read. However, I need to tell the optimizer that random_page_cost is less for the new tablespace. Is there a way to do it?

Thanks,

Laszlo
Re: [PERFORM] random_page_cost for tablespace
Robert Haas wrote:
> 2009/11/9 Laszlo Nagy gand...@shopzeus.com:
>> We have a bigger table with some million rows. The number of index scans is high, the number of seq reads is low. This table is often joined with others... so we want to buy a new SSD drive, create a tablespace on it and put this big table on it. Random read speed on SSD is identical to seq read. However, I need to tell the optimizer that random_page_cost is less for the new tablespace. Is there a way to do it?
>
> I happen to be working on a patch for this exact feature. However, even assuming it gets in, that means waiting for 8.5.

That will be a very nice feature. Thank you! :-)
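That patch did eventually land: on 9.0 and later the per-tablespace override looks like this (tablespace name hypothetical):

ALTER TABLESPACE ssd_space SET (random_page_cost = 1.1);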
Re: [PERFORM] Why is my stats collector so busy?
Tom Lane wrote:
> Laszlo Nagy gand...@shopzeus.com writes:
>> On a 8 processor system, my stats collector is always at 100% CPU.
>
> What platform? What Postgres version?
>
> regards, tom lane

8.3.5 on FreeBSD 7.0 amd64
Re: [PERFORM] Why is my stats collector so busy?
> What version of Postgres are you using?

8.3.5 on FreeBSD amd64
[PERFORM] Why is my stats collector so busy?
On an 8 processor system, my stats collector is always at 100% CPU. Meanwhile disk I/O is very low. We have many databases, and they are accessed frequently. Sometimes there are big table updates, but most of the time only simple queries are run against the databases, returning a few records only. Out of the maximum possible 8.0 system load, the average load is always above 1.1, and of this, 1.0 is the stats collector and 0.1 is the rest of the system. If I restart the postgresql server, then the stats collector uses 0% CPU for about 10 minutes, then goes up to 100% again. Is there a way to tell why it is working so much? I asked about this problem some months ago on a different mailing list. I was asked to provide tracebacks of the stats collector, but due to a bug in the FreeBSD getppid() function, I'm not able to trace the stats collector.

Thank you,

Laszlo
[PERFORM] Partial index usage
Hi All,

I have these indexes on a table:

CREATE INDEX uidx_product_partno_producer_id ON product USING btree (partno, producer_id);
CREATE INDEX idx_product_partno ON product USING btree (partno);

Can I safely delete the second one? Will postgresql use (partno, producer_id) when it only needs to order by partno? (partno is a text field, producer_id is int4.) The index sizes are 172MB and 137MB. I guess if I only had one index, it would save memory and increase performance.

Another pair of indexes, 144MB and 81MB respectively:

CREATE INDEX idx_product_producer_uploads ON product USING btree (producer_id, am_upload_status_id);
CREATE INDEX idx_product_producer_id ON product USING btree (producer_id);

am_upload_status_id is also an int4. Can I delete the second index without a performance drawback?

Thanks,

Laszlo
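Before dropping either, one low-risk check (a sketch using the stock statistics views) is whether the single-column indexes are being chosen at all:

SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'product'
ORDER BY idx_scan;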
[PERFORM] Big index sizes
We have several tables where the index size is much bigger than the table size. Example:

select count(*) from product_price -- 2234244

Table size: 400 MB
Index size: 600 MB

After executing "reindex table product_price", the index size was reduced to 269MB. I believe this affects performance. Vacuuming a table does not rebuild the indexes, am I right? I'm not sure if I need to do this manually, or whether this is the result of another problem (for example, too many open transactions, or frequent updates?).

Thanks
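A sketch of how to watch the bloat without guessing, using the built-in size functions (table name from the example above):

SELECT pg_size_pretty(pg_relation_size('product_price'))  AS table_size,
       pg_size_pretty(pg_total_relation_size('product_price')
                      - pg_relation_size('product_price')) AS indexes_and_toast;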
Re: [PERFORM] Slow table update
>> My other idea was that there are so many indexes on this table, maybe the update is slow because of the indexes?
>
> Updating indexes is certainly very far from being free. How many is many?

The number of indexes is 15. Three indexes are on text type columns, 500MB in size each. The others are on int8 and timestamp columns, approx. 200MB each.
Re: [PERFORM] Slow table update - SOLVED!
> In 8.3 the HOT feature may help if the columns being updated are indexed ... what version of PostgreSQL is this again? (Forgive my lack of memory -- the last few days I've forgotten a lot, heh heh.)

8.3.5. The column that was being updated is part of one small index only.

> Any chances to reduce those to a bare minimum, perhaps using conditional index strategies or even some form of replication, so the primary uses indexes related to the updates and the mirror uses indexes related to the read-only / reporting needs? Perhaps some form of staging table with no indexes to load, check data, etc. and then insert. Any way to reduce those? Check the usage via the system stats on table/index use and try removing some and testing to see what makes a difference.

We tried removing all indexes on a test system, and the update was speedy. We are also going to try to reduce the row size: move static description/name/textual data into a separate table, and leave the frequently updated data in the original one. We tested this theoretical version:

Query returned successfully: 182752 rows affected, 56885 ms execution time.

This is much faster. However, this table is used by hundreds of programs. Anyway, I got the answer to my question. Thank you!

Laszlo
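A sketch of the kind of split described (column names hypothetical - whatever is static in the real schema):

CREATE TABLE product_static AS
    SELECT id, name, description FROM product;
ALTER TABLE product DROP COLUMN name;
ALTER TABLE product DROP COLUMN description;
-- narrower rows in "product" mean less data rewritten per UPDATE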
[PERFORM] Slow table update
SQL:

update product set sz_category_id=null where am_style_kw1 is not null and sz_category_id is not null

Query plan:

Seq Scan on product  (cost=0.00..647053.30 rows=580224 width=1609)
  Filter: ((am_style_kw1 IS NOT NULL) AND (sz_category_id IS NOT NULL))

Information on the table:

row count ~ 2 million
table size: 4841 MB
toast table size: 277 MB
indexes size: 4434 MB

Computer: FreeBSD 7.0 stable, dual Xeon quad core 5420 2.5GHz, 8GB memory, 6 ES SATA disks in hw RAID 6 (+2GB write back cache) for the database. Autovacuum is enabled. We also perform "vacuum analyze" on the database each day. Here are some non-default values from postgresql.conf:

shared_buffers = 400MB
maintenance_work_mem = 256MB
max_fsm_pages = 1000000

There was almost no load on the machine (CPU: mostly idle, I/O: approx. 5% total) when we started this update. Maybe I'm wrong with this, but here is a quick calculation: the RAID array should do at least 100MB/sec. Reading the whole table should not take more than 1 min. About 20% of the rows should have been updated. Writing out all changes should not take too much time. I believe that this update should have been completed within 2-3 minutes. In reality, after 2600 seconds I cancelled the query. We monitored disk I/O, and it was near 100% all the time. What is wrong?

Thank you,

Laszlo
Re: [PERFORM] Slow table update
Laszlo Nagy wrote:
> SQL: update product set sz_category_id=null where am_style_kw1 is not null and sz_category_id is not null

Hmm, this query:

select count(*) from product where am_style_kw1 is not null and sz_category_id is not null and sz_category_id <> 4809

runs in 10 seconds. The update would not finish in 2600 seconds. I don't understand.

L
Re: [PERFORM] Slow table update
> If the table has some sort of FK relations it might be being slowed by the need to check a row meant to be deleted has any children.

If you look at my SQL, there is only one column to be updated. That column has no foreign key constraint. (It should have, but we did not want to add that constraint, in order to speed up updates.)

> Perhaps triggers?

Table product has no triggers.

> If the table is very bloated with lots of dead rows (but you did say you vacuum frequently and check the results to make sure they are effective?) that would slow it down.

I'm not sure how to check if the vacuum was effective. But we have max_fsm_pages=1000000 in postgresql.conf, and I do not get any errors from the daily vacuum script, so I presume that the table hasn't got too many dead rows. Anyway, the table size is only 4GB. Even if half of the rows were dead, the update should run quite quickly. Another argument is that when I run select count(*) instead of the UPDATE, I get the result in 10 seconds. I don't think that dead rows can make such a big difference between reading and writing.

My other idea was that there are so many indexes on this table, maybe the update is slow because of the indexes. The column being updated has only one index on it, and that is 200MB. But I have heard somewhere that because of PostgreSQL's multi-version system, sometimes the system needs to update indexes on columns that are not themselves being updated. I'm not sure. Might this be the problem?

> A long running transaction elsewhere that is blocking the delete? Did you check the locks?

Sorry, this was an update. A blocking transaction would never explain why the disk I/O went up to 100% for 2600 seconds.

L
Re: [PERFORM] Slow table update
I just tested the same on a test machine. It only has one processor, 1GB memory, and one SATA disk. The same select count(*) took 58 seconds. I started the same UPDATE with EXPLAIN ANALYZE. It has been running for 1000 seconds now. I'm now 100% sure that the problem is with the database, because this machine has nothing but a postgresql server running on it. I'll post the output of explain analyze later.
Re: [PERFORM] [ADMIN] rebellious pg stats collector (reopened case)
> and see if its output changes when you start to trace it.

%cat test.c
#include <stdio.h>
#include <unistd.h>

int main() {
    while (1) {
        sleep(5);
        printf("ppid = %d\n", getppid());
    }
}
%gcc -o test test.c
%./test
ppid = 47653
ppid = 47653
ppid = 47653
# Started truss -p 48864 here!
ppid = 49073
ppid = 49073
ppid = 49073

> Agreed, but we need to understand what the tools being used to investigate the problem are doing ...

Unfortunately, I'm not able to install strace:

# pwd
/usr/ports/devel/strace
# make
===>  strace-4.5.7 is only for i386, while you are running amd64.
*** Error code 1

Stop in /usr/ports/devel/strace.

I'll happily install any trace tool, but have no clue which one would help.
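A couple of FreeBSD-native alternatives that may be worth trying instead of strace (a hedged suggestion; pid is a placeholder):

truss -p <pid>                  # as used above, with the getppid() caveat
ktrace -p <pid>; kdump | less   # kernel tracing, then decode the dump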
Re: [PERFORM] [ADMIN] rebellious pg stats collector (reopened case)
Posted to the wrong list by mistake. Sorry.
[PERFORM] UFS 2: soft updates vs. gjournal (AKA: Choosing a filesystem 2.)
Hi again,

Should I use gjournal on FreeBSD 7, or just soft updates? Here is my opinion: I suspect that gjournal would be much slower than soft updates. Also, gjournal is relatively new code, not very well tested. But gjournal is better when the system crashes - although I have heard that sometimes gjournal will crash the system itself. There are more pros for soft updates, so I would prefer that. But please let me know if I'm wrong.

Thanks,

Laszlo
[PERFORM] Choosing a filesystem
I'm about to buy a new server. It will be a Xeon system with two processors (4 cores per processor) and 16GB RAM. Two RAID extenders will be attached to an Intel s5000 series motherboard, providing 12 SAS/Serial ATA connectors. The server will run FreeBSD 7.0, PostgreSQL 8, apache, PHP, a mail server, the dovecot IMAP server and background programs for database maintenance. On our current system, I/O performance for PostgreSQL is the biggest problem, but sometimes all CPUs are at 100%. Number of users using this system:

PostgreSQL: 30 connections
Apache: 30 connections
IMAP server: 15 connections

The databases are mostly OLTP, but the background programs are creating historical data and statistics continuously, and sometimes web site visitors/search engine robots run searches in bigger tables (with 3 million+ records). There is an expert at the company that sells the server, and he recommended that I use SAS disks for the base system at least. I would like to use many SAS disks, but they are just too expensive. So the basic system will reside on a RAID 1 array, created from two SAS disks spinning at 15 000 rpm. I will buy 10 Seagate Barracuda 320GB SATA (ES 7200) disks for the rest. The expert told me to use RAID 5, but I'm hesitating. I think that RAID 1+0 would be much faster, and I/O performance is what I really need. I would like to put the WAL file on the SAS disks to improve performance, and create one big RAID 1+0 disk for the data directory. But maybe I'm completely wrong. Can you please advise how to create the logical partitions? The hardware is capable of handling different types of RAID volumes on the same set of disks. For example, a smaller RAID 0 for indexes and a bigger RAID 5 etc. If you need more information about the database, please ask. :-)

Thank you very much,

Laszlo
Re: [PERFORM] Choosing a filesystem
> going to the same drives. This turns your fast sequential I/O into random I/O with the accompanying 10x or more performance decrease. Unless you have a good RAID controller with battery-backed-up cache.

All right. :-) This is what I'll have:

Boxed Intel Server Board S5000PSLROMB with 8-port SAS ROMB card (supports 45nm processors, Harpertown and Wolfdale-DP)
Intel® RAID Activation key AXXRAK18E, enables full intelligent SAS RAID on S5000PAL, S5000PSL, SR4850HW4/M, SR6850HW4/M. RoHS Compliant.
512 MB 400MHz DDR2 ECC Registered CL3 DIMM Single Rank, x8 (for S5000PSLROMB)
6-drive SAS/SATA backplane with expander (requires 2 SAS ports) for SC5400 and SC5299 (two pieces)
5410 Xeon 2.33 GHz/1333 FSB/12MB boxed, passive cooling / 80W (two pieces)
2048 MB 667MHz DDR2 ECC Fully Buffered CL5 DIMM Dual Rank, x8 (8 pieces)

SAS disks will be: 146.8 GB, SAS 3G, 15000RPM, 16 MB cache (two pieces)
SATA disks will be: HDD Server SEAGATE Barracuda ES 7200.1 (320GB, 16MB, SATA II-300) (10 pieces)

I cannot spend more money on this computer, but since you are all talking about battery backup, I'll try to get money from the management and buy this:

Intel® RAID Smart Battery AXXRSBBU3, optional battery back up for use with AXXRAK18E and SRCSAS144E. RoHS Compliant.

This server will also be an IMAP server, web server etc., so I'm 100% sure that the SAS disks will be used for logging. I have two spare 200GB SATA disks here in the office, but they are cheap ones designed for desktop computers. Is it okay to dedicate these disks to the WAL file in RAID 1? Will it improve performance? How much trouble would it cause if the WAL file goes wrong? Should I just put the WAL file on the RAID 1+0 array?

Thanks,

Laszlo
[PERFORM] Planning a new server - help needed
Hello,

I need to install a new server for postgresql 8.3. It will run two databases, a web server and some background programs. We already have a server, but it is becoming slow and we would like to have something faster. It is a cost sensitive application, and I would like to get your opinion on some questions.

The database itself is an OLTP system. There are many smaller tables, and some bigger ones (the biggest table has 1.2 million records, table size 966MB, indexes size 790MB). In the bigger tables only a few records are updated frequently; most of the other records are not changed. The smaller tables are updated continuously.

Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs say that it is better to use FreeBSD because it can alter the I/O priority of processes dynamically. The latest legacy release is 6.3, which is probably more stable. However, folks say that 7.0 has superior performance on the same hardware. Can I use 7.0 on a production server?

Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller with 8 SATA 2 disks. The operating system would be on another disk pair, connected to the motherboard's controller. I wonder if I can get more performance with SCSI, for the same amount of money? (I can spend about $1500 on the controller and the disks, which would cover 10 SATA 2 disks and the controller.)

Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS 2 + soft updates will be better, but I'm not sure. Which is better?

Question 4. How to make the partitions? This is the hardest question. Here is my plan:

- the OS resides on 2 disks, RAID 1
- the databases go on 8 disks, RAID 0+1

However, the transaction log file should be on a separate disk, and maybe I could gain more performance by putting indexes on a separate drive, but I do not want to reduce the number of disks in the RAID 0+1 array. Should I put the indexes and the transaction log on the RAID 1 array? Or should I invest a bit more money, add a SATA RAID controller with 16 channels and add more disks? Would it pay the bill? Another alternative is to put the biggest tables on a separate array so that it will be faster when we join these tables with other tables.

I know that it is hard to answer without knowing the structure of the databases. :-( I can make tests with different configurations later, but I would like to know your opinion first - what should I try?

Thanks,

Laszlo
Re: [PERFORM] Planning a new server - help needed
> I guess you mean postgresql 8.3.1? :-)

Yep. Sorry.

>> Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS 2 + soft updates will be better, but I'm not sure. Which is better?
>
> I'd stick with ufs2 atm. There are some issues with zfs which probably have been ironed out by now, but ufs2 has been deployed for a longer time. Performance-wise they are about the same.

Thank you. I suspected the same, but it was good to get positive confirmation.

>> Question 4. How to make the partitions? This is the hardest question. Here is my plan:
>> - the OS resides on 2 disks, RAID 1
>> - the databases go on 8 disks, RAID 0+1
>
> If you have enough disks, raid-6 should perform almost as well as raid 1+0.

Hmm, I have heard that RAID 1 or RAID 1+0 should be used for databases, never RAID 5. I know nothing about RAID 6. I guess I must accept your suggestion, since you have more experience than I have. :-) Obviously, it would be easier to manage a single RAID 6 array.

> I've setup 11 disks in raid-6 plus one hotspare so I can get more space out of it. Enough disks are approx. eight and up.

The RAID controller that I have selected can only handle 8 disks. I guess I need to find a different one with 16 channels and use more disks. So are you saying that with all disks in a bigger RAID 6 array, I will get the most out of the hardware? In that case, I'll try to get a bit more money from the management and build RAID 6 with 12 disks. I also feel that I need to use a separate RAID 1 array (I prefer gmirror) for the base system.

Thanks,

Laszlo
Re: [PERFORM] Poor performance on seq scan
> I have had extremely bad performance historically with onboard SATA chipsets on Linux. The one exception has been with the Intel based chipsets (not the CPU, the I/O chipset).

This board has an Intel chipset. I cannot remember the exact type, but it was not in the low end category. dmesg says:

Intel ICH7 SATA300 controller
kernel: ad4: 152626MB SAMSUNG HD160JJ ZM100-33 at ata2-master SATA150
kernel: ad6: 152627MB SAMSUNG HD160JJ ZM100-33 at ata3-master SATA150

> It is very likely that you are having problems with the driver for the chipset. Are you running RAID1 in hardware? If so, turn it off and see what the performance is. The onboard hardware RAID is worse than useless, it actually slows the I/O down.

I'm using software raid, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

# gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
   Mediasize: 160040803328 (149G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad4
   Mediasize: 160040803840 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 1153981856
2. Name: ad6
   Mediasize: 160041885696 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3520427571

I tried to do:

# sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not reach better disk read performance. Thank you for your suggestions. Looks like I need to buy SCSI disks.

Regards,

Laszlo
[PERFORM] Poor performance on seq scan
Hello,

I have a big table called products. Table size: 1123MB. Toast table size: 32MB. Indexes size: 380MB. I try to do a query like this:

select id,name from products where name like '%Mug%';

Yes, I know that tsearch2 is better for this, but please read on. The above query gives this plan:

Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40)
  Filter: (name ~~ '%Mug%'::text)

When I use this with explain analyze:

Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40) (actual time=878.873..38300.588 rows=72567 loops=1)
  Filter: (name ~~ '%Mug%'::text)
Total runtime: 38339.026 ms

Meanwhile, iostat 5 gives something like this:

 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
   0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0
   0   12 125.66 128 15.75 125.26 128 15.68  10  0 85  6  0
   0   12 124.66 129 15.67 124.39 129 15.64  12  0 85  3  0
   0   12 117.13 121 13.87 117.95 121 13.96  12  0 84  5  0
   0   12 104.84 118 12.05 105.84 118 12.19  10  0 87  2  0

That is, 130 transfers per second with 12-15MB/sec transfer speed. (FreeBSD 6.1 with two SATA150 drives in gmirror RAID1.)

I made another test. I created a file with the identifiers and names of the products:

psql# \o products.txt
psql# select id,name from product;

Then I can search using grep:

grep Mug products.txt | cut -f1 -d\|

There is a huge difference. This command runs within 0.5 seconds. That is, at least 76 times faster than the seq scan. It is the same if I vacuum, backup and restore the database. I thought that the table is stored in one file, and the seq scan would actually be faster than grepping the file. Can you please tell me what I am doing wrong? I'm not sure if I can increase the performance of a seq scan by adjusting the values in postgresql.conf. I do not like the idea of exporting the product table periodically into a txt file and searching with grep. :-)

Another question: I have a btree index on product(name). It contains all product names and the identifiers of the products. Wouldn't it be easier to seq scan the index instead of seq scanning the table? The index is only 66MB, the table is 1123MB.

I'm new to this list, and I also just recently started to tune postgresql, so please forgive me if this is a dumb question.

Regards,

Laszlo
Re: [PERFORM] Poor performance on seq scan
Luke Lonergan wrote:
> Laszlo,
>
>> Meanwhile, iostat 5 gives something like this:
>>
>>  tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
>>    1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
>>    0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0
>
> This is your problem. Do the following and report the results here:
>
> Take the number of GB of memory you have (say 2 for 2GB), multiply it by 250,000. This is the number of 8KB pages you can fit in twice your RAM. Let's say you have 2GB - the result is 500,000. Use that number to do the following test on your database directory:
>
> time bash -c "dd if=/dev/zero of=/dbdir/bigfile bs=8k count=<number_from_above> && sync"

I have 1GB RAM. The data directory is in /usr/local/pgsql/data. The root of this fs is /usr.

time sh -c "dd if=/dev/zero of=/usr/test/bigfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes transferred in 48.030627 secs (42639460 bytes/sec)
0.178u 8.912s 0:48.31 18.7% 9+96k 37+15701io 0pf+0w

> Then do this:
>
> time bash -c "dd if=/dbdir/bigfile of=/dev/null bs=8k"

time sh -c "dd if=/usr/test/bigfile of=/dev/null bs=8k"
250000+0 records in
250000+0 records out
2048000000 bytes transferred in 145.293473 secs (14095609 bytes/sec)
0.110u 5.857s 2:25.31 4.1% 10+99k 32923+0io 0pf+0w

At this point I thought there was another process doing I/O, so I retried:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 116.395211 secs (17595226 bytes/sec)
0.137u 5.658s 1:56.51 4.9% 10+103k 29082+0io 0pf+1w

and again:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 120.198224 secs (17038521 bytes/sec)
0.063u 5.780s 2:00.21 4.8% 10+98k 29776+0io 0pf+0w

This is a mirrored disk with two SATA disks. In theory, writing should be slower than reading. Is this a hardware problem? Or is it that the sync did not really sync?

Laszlo
Re: [PERFORM] Poor performance on seq scan
Heikki Linnakangas wrote: Are there any other columns besides id and name in the table? How big is products.txt compared to the heap file?

Yes, there are many other columns. products.txt is only 59MB, similar to the size of the index (66MB).

Another question: I have a btree index on product(name). It contains all product names and the identifiers of the products. Wouldn't it be easier to seq scan the index instead of seq scanning the table? The index is only 66MB, the table is 1123MB.

Probably, but PostgreSQL doesn't know how to do that. Even if it did, it would depend on how many matches there are. If you scan the index and then fetch the matching rows from the heap, you're doing random I/O to the heap. That becomes slower than scanning the heap sequentially if you're going to get more than a few hits.

I have 700 000 rows in the table, and usually there are fewer than 500 hits. So a seq scan over the index would probably be faster. :-) Now I also tried this:

create table test(id int8 not null primary key, name text);
insert into test select id,name from product;

And then:

zeusd1=> explain analyze select id,name from test where name like '%Tiffany%';

                                    QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on test (cost=0.00..26559.62 rows=79 width=40) (actual time=36.595..890.903 rows=117 loops=1)
   Filter: (name ~~ '%Tiffany%'::text)
 Total runtime: 891.063 ms
(3 rows)

But this might be coming from the disk cache. Thank you for your comments. We are making progress.

Laszlo
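A rough sanity check on the cache suspicion (my arithmetic, not from the thread): the narrow test table should be about the size of products.txt, roughly 59MB, and at the ~15MB/s the disks delivered above, an uncached scan of it would need around 4 seconds. A 0.9-second runtime therefore does point at the disk cache. The table size itself can be verified with a sketch like:

select pg_size_pretty(pg_relation_size('test')) as test_heap_size;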
[PERFORM] tsearch2 question (was: Poor performance on seq scan)
Tom Lane wrote: Only if the index is capable of disgorging the original value of the indexed column, a fact not in evidence in general (counterexample: polygons indexed by their bounding boxes in an r-tree). But yeah, it's interesting to think about applying filters at the index fetch step for index types that can hand back full values. This has been discussed before --- I think we had gotten as far as speculating about doing joins with just index values. See eg here: http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php A lot of the low-level concerns have already been dealt with in order to support bitmap indexscans, but applying non-indexable conditions before fetching from the heap is still not done.

To overcome this problem, I created a smaller shadow table:

CREATE TABLE product_search (
    id int8 NOT NULL,
    name_desc text,
    CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
    select id, name || ' ' || coalesce(description,'') from product;

Obviously this is almost like an index, but I need to maintain it manually. I'm able to search with:

zeusd1=> explain analyze select id from product_search where name_desc like '%Mug%';

                                    QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on product_search (cost=0.00..54693.34 rows=36487 width=8) (actual time=20.036..2541.971 rows=91399 loops=1)
   Filter: (name_desc ~~ '%Mug%'::text)
 Total runtime: 2581.272 ms
(3 rows)

The total runtime remains below 3 sec in all cases. Of course I still need to join the main table to the result:

explain analyze select s.id,p.name from product_search s
inner join product p on (p.id = s.id)
where s.name_desc like '%Tiffany%';

                                    QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop (cost=0.00..55042.84 rows=58 width=40) (actual time=164.437..3982.610 rows=117 loops=1)
   ->  Seq Scan on product_search s (cost=0.00..54693.34 rows=58 width=8) (actual time=103.651..2717.914 rows=117 loops=1)
         Filter: (name_desc ~~ '%Tiffany%'::text)
   ->  Index Scan using pk_product_id on product p (cost=0.00..6.01 rows=1 width=40) (actual time=10.793..10.796 rows=1 loops=117)
         Index Cond: (p.id = "outer".id)
 Total runtime: 4007.283 ms
(6 rows)

Took 4 seconds. Awesome! With the original table it used to take one or two minutes!
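The manual maintenance could be delegated to a trigger. A minimal sketch, assuming PL/pgSQL is installed and a server new enough for dollar quoting (8.0+); the function and trigger names are mine, and it only covers INSERT and UPDATE (a DELETE trigger would be analogous):

CREATE OR REPLACE FUNCTION product_search_sync() RETURNS trigger AS $$
BEGIN
    -- delete-then-insert keeps the shadow row in step with the product row
    DELETE FROM product_search WHERE id = NEW.id;
    INSERT INTO product_search (id, name_desc)
        VALUES (NEW.id, NEW.name || ' ' || coalesce(NEW.description, ''));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_product_search_sync
    AFTER INSERT OR UPDATE ON product
    FOR EACH ROW EXECUTE PROCEDURE product_search_sync();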
Now you can ask why I am not using tsearch2 for this. Here is the answer:

CREATE TABLE product_search (
    id int8 NOT NULL,
    ts_name_desc tsvector,
    CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
    select id, to_tsvector(name || ' ' || coalesce(description,'')) from product;

CREATE INDEX idx_product_search_ts_name_desc
    ON product_search USING gist (ts_name_desc);

VACUUM product_search;

zeusd1=> explain analyze select id from product_search where ts_name_desc @@ to_tsquery('mug');

                                    QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on product_search (cost=25.19..3378.20 rows=912 width=8) (actual time=954.669..13112.009 rows=91434 loops=1)
   Filter: (ts_name_desc @@ '''mug'''::tsquery)
   ->  Bitmap Index Scan on idx_product_search_ts_name_desc (cost=0.00..25.19 rows=912 width=0) (actual time=932.455..932.455 rows=91436 loops=1)
         Index Cond: (ts_name_desc @@ '''mug'''::tsquery)
 Total runtime: 13155.724 ms
(5 rows)

zeusd1=> explain analyze select id from product_search where ts_name_desc @@ to_tsquery('tiffany');

                                    QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on product_search (cost=25.19..3378.20 rows=912 width=8) (actual time=13151.725..13639.112 rows=76 loops=1)
   Filter: (ts_name_desc @@ '''tiffani'''::tsquery)
   ->  Bitmap Index Scan on idx_product_search_ts_name_desc (cost=0.00..25.19 rows=912 width=0) (actual time=13123.705..13123.705 rows=81 loops=1)
         Index Cond: (ts_name_desc @@ '''tiffani'''::tsquery)
 Total runtime: 13639.478 ms
(5 rows)

At least 13 seconds, and the main table is not joined yet. Can anybody explain to me why the seq scan is faster than the bitmap index scan? In the last example only 81 rows were returned, yet it took more than 13 seconds. :( Even if the whole table could be cached in memory (which isn't the case), the bitmap index scan should be much faster. Probably there is a
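One detail worth noting in those plans (my observation, not from the thread): the planner estimates 912 rows for both queries, while the actual counts are 91,434 and 81, and plain VACUUM does not refresh planner statistics. It may be worth re-running the queries after updating the stats:

VACUUM ANALYZE product_search;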
Re: [PERFORM] Poor performance on seq scan
Tom Lane wrote: Why is that showing 85+ percent *system* CPU time?? I could believe a lot of idle CPU if the query is I/O bound, or a lot of user time if PG was being a hog about doing the ~~ comparisons (not too unlikely BTW).

I'm sorry, this was really confusing. I don't know what it was - probably a background system process, started from cron (?). I retried the same query and got this:

zeusd1=> explain analyze select id,name from product where name like '%Mug%';

                                    QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on product (cost=0.00..206891.34 rows=36487 width=40) (actual time=17.188..44585.176 rows=91399 loops=1)
   Filter: (name ~~ '%Mug%'::text)
 Total runtime: 44631.150 ms
(3 rows)

            tty             ad4              ad6             cpu
 tin tout   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   62 115.25 143 16.06 116.03 143 16.17   3  0  9  3 85
   0   62 122.11 144 17.12 121.78 144 17.07   6  0  3  2 89
   0   62 126.18 158 19.45 125.86 157 19.28   5  0 11  6 79
   0   62 126.41 131 16.13 127.52 132 16.39   5  0  9  6 80
   0   62 127.80 159 19.81 126.89 158 19.55   5  0  9  0 86
   0   62 125.29 165 20.15 126.26 165 20.30   5  0 14  2 80
   0   62 127.22 164 20.32 126.74 165 20.37   5  0  9  0 86
   0   62 121.34 150 17.75 120.76 149 17.54   1  0 13  3 82
   0   62 121.40 143 16.92 120.33 144 16.89   5  0 11  3 82
   0   62 127.38 154 19.12 127.17 154 19.09   8  0  8  5 80
   0   62 126.88 129 15.95 127.00 128 15.84   5  0  9  5 82
   0   62 118.48 121 13.97 119.28 121 14.06   6  0 17  3 74
   0   62 127.23 146 18.10 126.79 146 18.04   9  0 20  2 70
   0   62 127.27 153 18.98 128.00 154 19.21   5  0 17  0 79
   0   62 127.02 130 16.09 126.28 130 16.00  10  0 16  3 70
   0   62 123.17 125 15.00 122.40 125 14.91   5  0 14  2 80
   0   62 112.37 130 14.24 112.62 130 14.27   0  0 14  3 83
   0   62 115.83 138 15.58 113.97 138 15.33   3  0 18  0 79

A bit better transfer rate, but nothing serious.

Regards, Laszlo
Re: [PERFORM] Poor performance on seq scan
Craig A. James wrote: There IS a bug for SATA disk drives in some versions of the Linux kernel. On a lark I ran some of the I/O tests in this thread, and much to my surprise discovered my write speed was 6 MB/sec ... ouch! On an identical machine, different kernel, the write speed was 54 MB/sec.

My disks are running in SATA150 mode, whatever that means. I'm using FreeBSD, and not just because it dynamically alters the priority of long-running processes. :-)

Laszlo