Re: [PERFORM] [ADMIN] Messed up time zones
So you took two distinct points in time, threw away some critical information, and are surprised why they are now equal? Well, I did not want to throw away any information. The actual representation could be something like: "2012-10-28 02:30:00+02 in Europe/Budapest, summer time" and "2012-10-28 02:30:00+01 in Europe/Budapest, winter time". It would be unambiguous: everybody would know the time zone, the UTC offset and the time value, and conversion back to UTC would be unambiguous too. I presumed that the representation was like that. But I was wrong. I have checked other programming languages. As it turns out, nobody wants to change the representation just because there is an ambiguous hour every year. Now I think that most systems treat ambiguous time stamps as if they were in standard time. And who am I to go against the main flow? I'm sorry, I admit that the problem was in my head. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
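A quick way to check how a particular server resolves the ambiguous hour is to feed it the wall-clock time directly; this is only a sketch, and the offset chosen may depend on the PostgreSQL and tzdata version:

set timezone to 'Europe/Budapest';
select '2012-10-28 02:30:00'::timestamptz;
-- the result is printed with a single offset (+01 or +02), i.e. the server has
-- silently picked one of the two possible instants for the ambiguous input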
Re: [PERFORM] [ADMIN] Messed up time zones
All the above are the exact same point in time merely stated as relevant to each location. Note that given a timestamp with time zone and a zone, PostgreSQL returns a timestamp without time zone (you know the zone since you specified it). Yes, I know the zone. But I don't know the offset from UTC. Example: template1=> set timezone to 'UTC'; SET template1=> select ('2011-10-30 01:00:00'::timestamptz) at time zone 'Europe/Budapest'; timezone = 2011-10-30 02:00:00 -- Is it winter or summer time? (1 row) template1=> select ('2011-10-30 00:00:00'::timestamptz) at time zone 'Europe/Budapest'; timezone = 2011-10-30 02:00:00 -- Is it winter or summer time? What is the offset from UTC here? Can you tell me when it was in UTC? (1 row) template1=> What is more: template1=> select (('2011-10-30 00:00:00'::timestamptz) at time zone 'Europe/Budapest') is distinct from (('2011-10-30 01:00:00'::timestamptz) at time zone 'Europe/Budapest'); ?column? = f (1 row) template1=> Yeah, we know what time zone it is in, but we don't know when it was, thanks a lot. :-( It would be unambiguous to store the UTC offset along with the value. But that is not how it was implemented. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
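One way to keep the offset visible is to skip AT TIME ZONE for display and instead set the session time zone, so the timestamptz is printed with its offset; extract(timezone from ...) returns the same offset numerically, in seconds. A sketch, not a drop-in fix:

set timezone to 'Europe/Budapest';
select '2011-10-30 00:00:00+00'::timestamptz;   -- 2011-10-30 02:00:00+02 (summer time)
select '2011-10-30 01:00:00+00'::timestamptz;   -- 2011-10-30 02:00:00+01 (winter time)
select extract(timezone from '2011-10-30 00:00:00+00'::timestamptz) / 3600;   -- 2, the UTC offset in hours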
Re: [PERFORM] ZFS vs. UFS
Which Intel RAID controller is that? All of the ones on the motherboard are pretty much useless if that's what you have. Those are slower than software RAID and it's going to add driver issues you could otherwise avoid. Better to connect the drives to the non-RAID ports or configure the controller in JBOD mode first. Using one of the better RAID controllers, one of Dell's good PERC models for example, is one of the biggest hardware upgrades you could make to this server. If your database is mostly read traffic, it won't matter very much. Write-heavy loads really benefit from a good RAID controller's write cache. Actually, it is a PERC with write cache and BBU. ZFS will heavily use server RAM for caching by default, much more so than UFS. Make sure you check into that, and leave enough RAM for the database to run too. (Doing *some* caching that way is good for Postgres; you just don't want *all* the memory to be used for that) Right now, the size of the database is below 5GB. So I guess it will fit into memory. I'm concerned about data safety and availability. I have been in a situation where the RAID card went wrong and I was not able to recover the data because I could not get an identical RAID card in time. I have also been in a situation where the system was crashing twice a day, and we didn't know why. (As it turned out, it was a bug in the "stable" kernel and we could not identify this for two weeks.) However, we had to do fsck after every crash. With a 10TB disk array, it was extremely painful. ZFS is much better: short recovery time and it is RAID card independent. So I think I have answered my own question - I'm going to use ZFS to have better availability, even if it costs some performance. (That was the original question: how bad is it to use ZFS for PostgreSQL, instead of the native UFS.) Moving disks to another server is a very low probability fix for a broken system. The disks are a likely place for the actual failure to happen in the first place. Yes, but we don't have to worry about that. raidz2 + hot spare is safe enough. The RAID card is the only single point of failure. I like to think more in terms of "how can I create a real-time replica of this data?" to protect databases, and the standby server for that doesn't need to be an expensive system. That said, there is no reason to set things up so that they only work with that Intel RAID controller, given that it's not a very good piece of hardware anyway. I'm not sure how to create a real-time replica. This database is updated frequently. There is always a process that reads/writes into the database. I was thinking about using Slony to create slave databases. I have no experience with that. We have a 100Mbit connection. I'm not sure how much bandwidth we need to maintain a real-time slave database. It might be a good idea. I'm sorry, I feel I'm being off-topic. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
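For the real-time replica question: if the servers run PostgreSQL 9.0 or later, built-in streaming replication is usually simpler than Slony, and for a database below 5GB a 100Mbit link is normally plenty, since only WAL traffic is shipped. A minimal sketch (parameter names from 9.0/9.1; host, user and addresses are placeholders):

# primary, postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 128
# primary, pg_hba.conf
host  replication  replicator  192.168.0.10/32  md5
# standby, recovery.conf (created after taking a base backup of the primary)
standby_mode = 'on'
primary_conninfo = 'host=primary-host user=replicator'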
Re: [PERFORM] ZFS vs. UFS
On 24.07.2012 14:51, Laszlo Nagy wrote: * UFS is not journaled. There is journal support for UFS as far as I know. Please have a look at the gjournal manpage. Yes, but gjournal works for disk devices. I would have to rely on the hw card for RAID. When the card goes wrong I won't be able to access my data. I could also buy an identical RAID card. In fact I could buy a complete backup server. But right now I don't have the money for that. So I would like to use a solution that allows me to recover from a failure even if the RAID card goes wrong. It might also be possible to combine gmirror + gjournal, but that is not good enough. Performance and stability of a simple gmirror with two disks is much worse than a raidz array with 10 disks (and hot spare), or even a RAID 1+0 (and hot spare) that is supported by the hw RAID card. So I would like to stick with UFS + hw card support (and then I need to buy an identical RAID card if I can), or ZFS. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] ZFS vs. UFS
> I wonder if UFS has better performance or not. Or can you suggest > another fs? Just for the PGDATA directory. Relying on physically moving a disk isn't a good backup/recovery strategy. Disks are the least reliable single component in a modern computer. You should figure out the best file system for your application, and separately figure out a recovery strategy, one that can survive the failure of *any* component in your system, including the disk itself. This is why I use a RAID array of 10 disks. So there is no single point of failure. What else could I do? (Yes, I can make regular backups, but that is not the same. I can still lose data...)
[PERFORM] ZFS vs. UFS
Hello, Under FreeBSD 9, what filesystem should I use for PostgreSQL? (Dell PowerEdge 2900, 24G mem, 10x2T SATA2 disk, Intel RAID controller.) * ZFS is journaled, and it is more independent of the hardware. So if the computer goes wrong, I can move the ZFS array to a different server. * UFS is not journaled. Also I have to rely on the RAID card to build the RAID array. If there is a hw problem with it, then I won't be able to recover the data easily. I wonder if UFS has better performance or not. Or can you suggest another fs? Just for the PGDATA directory. Thanks, Laszlo
Re: [PERFORM] queries are fast after dump->restore but slow again after some days dispite vacuum
Are you running a lot of full table updates? If you mean updates which are applied on every or almost every row of the table - yes, it happens with two rather small tables of max. 10 000 rows. But they are both not touched by the query with this big performance difference. I'm not an expert, but would it help to change fillfactor to about 45%? I'm just guessing that full table updates with fillfactor=45% could store the rows on the same page. Maybe I'm wrong. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
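For reference, the fillfactor idea would look roughly like this (the table name is a placeholder; the setting only affects pages written after the change, so existing data has to be rewritten, e.g. by CLUSTER on an index or a dump/reload, before it helps):

ALTER TABLE small_table SET (fillfactor = 45);
-- leaves ~55% of each page free, so updated row versions can often stay on the
-- same page (HOT updates in 8.3+) instead of migrating and bloating the table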
[PERFORM] Why it is using/not using index scan?
For this query: select pp.id,pp.product_id,pp.selling_site_id,pp.asin from product_price pp where (pp.asin is not null and pp.asin<>'') and (pp.upload_status_id<>1) and pp.selling_site_id in (8,7,35,6,9) and (pp.last_od < 'now'::timestamp - '1 week'::interval ) limit 5000 Query plan is: "Limit (cost=9182.41..77384.80 rows=3290 width=35)" " -> Bitmap Heap Scan on product_price pp (cost=9182.41..77384.80 rows=3290 width=35)" "Recheck Cond: ((last_od < '2011-03-24 13:05:09.540025'::timestamp without time zone) AND (selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))" "Filter: ((asin IS NOT NULL) AND (asin <> ''::text) AND (upload_status_id <> 1))" "-> Bitmap Index Scan on idx_product_price_last_od_ss (cost=0.00..9181.59 rows=24666 width=0)" " Index Cond: ((last_od < '2011-03-24 13:05:09.540025'::timestamp without time zone) AND (selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))" For this query: select pp.id,pp.product_id,pp.selling_site_id,pp.asin from product_price pp where (pp.asin is not null and pp.asin<>'') and (pp.upload_status_id<>1) and pp.selling_site_id in (8,7,35,6,9) and (pp.last_od + '1 week'::interval < 'now'::timestamp ) limit 5000 Query plan is: "Limit (cost=0.00..13890.67 rows=5000 width=35)" " -> Seq Scan on product_price pp (cost=0.00..485889.97 rows=174898 width=35)" "Filter: ((asin IS NOT NULL) AND (asin <> ''::text) AND (upload_status_id <> 1) AND ((last_od + '7 days'::interval) < '2011-03-31 13:06:17.460013'::timestamp without time zone) AND (selling_site_id = ANY ('{8,7,35,6,9}'::bigint[])))" The only difference is this: instead of (pp.last_od < 'now'::timestamp - '1 week'::interval ) I have used (pp.last_od + '1 week'::interval < 'now'::timestamp ) The first query, with the index scan, runs in 440 msec. The second query, with the seq scan, runs for about 22 seconds. So the first one is about 50x faster. My concern is that we are working on a huge set of applications that use thousands of different queries on a database. There are programs that we wrote years ago. The database structure is changing continuously. We are adding new indexes and columns, and of course we are upgrading PostgreSQL when a new stable version comes out. There are cases when a change in a table affects 500+ queries in 50+ programs. I really did not think that I have to be THAT CAREFUL with writing conditions in SQL. Do I really have to manually analyze all those queries and "correct" conditions like this? If so, then at least I would like to know if there is a documentation or wiki page where I can learn about "how not to write conditions". I just figured out that I need to put constant expressions on one side of any comparison, if possible. But probably there are other rules I wouldn't think of. Might it be possible to change the optimizer so that it tries to move constant expressions to one side of the comparison in the first place? That cannot be bad, right? Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
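The rule of thumb that falls out of this thread: keep the indexed column alone on one side of the comparison and move all constant arithmetic to the other side; a plain EXPLAIN (without ANALYZE) is enough to check a rewritten condition without running it. A sketch:

-- not indexable: the planner will not rewrite  last_od + interval  for you
--   where pp.last_od + '1 week'::interval < 'now'::timestamp
-- indexable: the indexed column stands alone on the left
explain
select pp.id from product_price pp
where pp.last_od < 'now'::timestamp - '1 week'::interval
  and pp.selling_site_id in (8,7,35,6,9);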
Re: [PERFORM] Slow query on CLUTER -ed tables
I suspect that, since the matched hid's probably aren't sequential, many of those ~500 product_price_offer_history rows will be far apart on disk. OMG I was a fool! I'll CLUSTER on a different index and it will be fast, I'm sure. Thanks! L -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Slow query on CLUTER -ed tables
Given two tables: CREATE TABLE product_price_history ( hid bigint NOT NULL, hdate timestamp without time zone NOT NULL, id bigint NOT NULL, product_id bigint NOT NULL, more columns here CONSTRAINT pk_product_price_history PRIMARY KEY (hid) ); CREATE INDEX idx_product_price_history_id_hdate ON product_price_history USING btree (id, hdate); CREATE TABLE product_price_offer_history ( hid bigint NOT NULL, product_price_id bigint NOT NULL, isfeatured smallint NOT NULL, price double precision NOT NULL, shipping double precision NOT NULL, some more columns here CONSTRAINT pk_product_price_offer_history PRIMARY KEY (hid, offerno) ); Stats: product_price_history - tablesize=23GB, indexes size=4GB, row count = 87 million product_price_offer_history - tablesize=24GB, indexes size=7GB, row count = 235 million These tables store historical data of some million products from the last year. The following commands are executed on them daily: CLUSTER idx_product_price_history_id_hdate on product_price_history; CLUSTER pk_product_price_offer_history on product_price_offer_history; Here is a query: select date_part('epoch', min(pph.hdate) ) as hdate_ticks, min(ppoh.price+ppoh.shipping) as price_plus_shipping from product_price_history pph inner join product_price_offer_history ppoh on ppoh.hid = pph.hid where pph.id = 37632081 and ppoh.isfeatured=1 group by ppoh.merchantid,pph.hid,pph.hdate order by pph.hid asc I think that the query plan is correct: "GroupAggregate (cost=5553554.25..5644888.17 rows=2283348 width=50)" " -> Sort (cost=5553554.25..5559262.62 rows=2283348 width=50)" "Sort Key: pph.hid, ppoh.merchantid, pph.hdate" "-> Nested Loop (cost=0.00..5312401.66 rows=2283348 width=50)" " -> Index Scan using idx_product_price_history_id_hdate on product_price_history pph (cost=0.00..8279.80 rows=4588 width=16)" "Index Cond: (id = 37632081)" " -> Index Scan using pk_product_price_offer_history on product_price_offer_history ppoh (cost=0.00..1149.86 rows=498 width=42)" "Index Cond: (ppoh.hid = pph.hid)" "Filter: (ppoh.isfeatured = 1)" So it uses two index scans on the indexes we CLUSTER the tables on. Number of rows returned is usually between 100 and 20 000. Here is the problem. When I first open this query for a given identifier, it runs for 100 seconds. When I try to run it again for the same identifier it returns the same rows within one second! The indexes are very well conditioned: from the 235 million rows, any given id occurs at most 20 000 times. It is a btree index, so it should already be stored sorted, and the 20 000 rows to be returned should fit into a few database pages. Even if they are not in the cache, PostgreSQL should be able to read the required pages within a second. I understand that for an index scan, PostgreSQL also needs to read the rows from the table. But since these tables are CLUSTER-ed on those specific indexes, all the data needed should fit on a few database pages and PostgreSQL should be able to read them within a second. Then why is it taking 100 seconds to run the query the first time, and why is it just one second the second time? Probably my thinking is wrong, but I suppose it means that the data is spread on thousands of pages on the disk. How is that possible? What am I doing wrong? Thanks, Laszlo
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
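If the server is 9.0 or newer, EXPLAIN (ANALYZE, BUFFERS) on the query above shows directly how many blocks had to come from disk on the cold run, which is the quickest way to confirm that the 100-second case is random I/O rather than planning. A sketch:

explain (analyze, buffers)
select date_part('epoch', min(pph.hdate)) as hdate_ticks,
       min(ppoh.price + ppoh.shipping) as price_plus_shipping
from product_price_history pph
inner join product_price_offer_history ppoh on ppoh.hid = pph.hid
where pph.id = 37632081 and ppoh.isfeatured = 1
group by ppoh.merchantid, pph.hid, pph.hdate
order by pph.hid asc;
-- "shared read" = blocks fetched from disk, "shared hit" = blocks already in
-- shared_buffers; a large read count on the first run explains 100 s vs 1 s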
[PERFORM] Bad query plan when the wrong data type is used
This query: select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail from variation_item_sellingsite_asin visa inner join product p on p.id = visa.product_id inner join variation_item vi on vi.id = visa.variation_item_id where visa.id =4 runs in 43 msec. The "visa.id" column has int4 datatype. The query plan uses an index condition: "Nested Loop (cost=0.00..26.19 rows=1 width=28)" " -> Nested Loop (cost=0.00..17.75 rows=1 width=24)" "-> Index Scan using variation_item_sellingsite_asin_pkey on variation_item_sellingsite_asin visa (cost=0.00..8.58 rows=1 width=16)" " Index Cond: (id = 4)" "-> Index Scan using pk_product_id on product p (cost=0.00..9.16 rows=1 width=16)" " Index Cond: (p.id = visa.product_id)" " -> Index Scan using pk_variation_item_id on variation_item vi (cost=0.00..8.43 rows=1 width=12)" "Index Cond: (vi.id = visa.variation_item_id)" This query: select p.id,p.producer_id,visa.variation_item_id, vi.qtyavail from variation_item_sellingsite_asin visa inner join product p on p.id = visa.product_id inner join variation_item vi on vi.id = visa.variation_item_id where visa.id =4.0 Runs for 1144 msec! Query plan uses seq scan + filter: "Nested Loop (cost=33957.27..226162.68 rows=14374 width=28)" " -> Hash Join (cost=33957.27..106190.76 rows=14374 width=20)" "Hash Cond: (visa.variation_item_id = vi.id)" "-> Seq Scan on variation_item_sellingsite_asin visa (cost=0.00..71928.04 rows=14374 width=16)" " Filter: ((id)::numeric = 4.0)" "-> Hash (cost=22026.01..22026.01 rows=954501 width=12)" " -> Seq Scan on variation_item vi (cost=0.00..22026.01 rows=954501 width=12)" " -> Index Scan using pk_product_id on product p (cost=0.00..8.33 rows=1 width=16)" "Index Cond: (p.id = visa.product_id)" Which is silly. I think that PostgreSQL converts the int side to a float, and then compares them. It would be better to do this, for each item in the loop: * evaluate the right side (which is float) * tell if it is an integer or not * if not an integer, then discard the row immediately * otherwise use its integer value for the index scan The result is identical, but it makes possible to use the index scan. Of course, I know that the query itself is wrong, because I sould not use a float where an int is expected. But this CAN be optimized, so I think it should be! My idea for the query optimizer is not to use the "wider" data type, but use the data type that has an index on it instead. (I spent an hour figuring out what is wrong with my program. In some cases it was slow, in other cases it was really fast, and I never got an error message.) What do you think? Laszlo
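Until the planner learns a trick like that, the usual workaround is on the client side: make sure the literal (or bound parameter) is an integer again, or cast it explicitly, so the comparison stays int4 = int4 and the primary-key index remains usable. A sketch:

-- instead of:  where visa.id = 4.0                (id is cast to numeric, seq scan)
-- use:         where visa.id = 4                  (int literal, index scan)
-- or:          where visa.id = cast(4.0 as int4)  (explicit cast, index scan)
select visa.id from variation_item_sellingsite_asin visa where visa.id = cast(4.0 as int4);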
[PERFORM] Get master-detail relationship metadata
Hi All, I'm working on a client program that iterates over master-detail relationships in a loop chain. Pseudo code: for row_1 in table_1: table_2 = get_details(row_1,"table2") for row_2 in table_2: row_3 = get_details(row_2,"table3") etc. process_data(row_1, row_2, row_3, ...) My task is to write the "get_details" iterator efficiently. The obvious way to do it is to query details in every get_details() call, but that is not efficient. We have relationships where one master only has a few details. For 1 million master rows, that would result in execution of millions of SQL SELECT commands, degrading the performance by orders of magnitude. My idea was that the iterator should pre-fetch and cache data for many master records at once. The get_details() would use the cached rows, thus reducing the number of SQL SELECT statements needed. Actually I wrote the iterator, and it works fine in some cases. For example: producers = get_rows("producer") for producer in producers: products = get_details(producer,"product") for product in products: prices = get_details(product,"prices") for price in prices: process_product_price(producer,product,price) This works fine if one producer has not more than 1000 products and one product has not more than 10 prices. I can easily keep 10 000 records in memory. The actual code executes about 15 SQL queries while iterating over 1 million rows. Compared to the original "obvious" method, performance increased about 15-fold. But sometimes it just doesn't work. If a producer has 1 million products, and one product has 100 prices, then it won't work, because I cannot keep 100 million prices in memory. My program should somehow figure out how many rows it will get for one master, and choose between the cached and non-cached methods. So here is the question: is there a way to get this information from PostgreSQL itself? I know that the query plan contains information about this, but I'm not sure how to extract it. Should I run an EXPLAIN command of some kind, and parse the result as a string? For example: EXPLAIN select * from product where producer_id=1008; QUERY PLAN -- Seq Scan on product (cost=0.00..1018914.74 rows=4727498 width=1400) Filter: (producer_id = 1008) (2 rows) Then I could extract "rows=4727498" to get an idea about how many detail rows I'll get for the master. Is there any better way to do it? And how reliable is this? Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
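One way to get a per-master estimate without parsing EXPLAIN output is to read the planner's own statistics: reltuples from pg_class and n_distinct from pg_stats. This is only a sketch; it assumes a single schema and that the tables have been ANALYZEd recently, and it is exactly as approximate as the planner's estimates are:

select c.reltuples::bigint as estimated_rows, s.n_distinct
from pg_class c
join pg_stats s on s.tablename = c.relname
where c.relname = 'product' and s.attname = 'producer_id';
-- average details per master ~ reltuples / n_distinct
-- (a negative n_distinct means "this fraction of reltuples distinct values")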
Re: [PERFORM] Slow query + why bitmap index scan??
On 2011-01-12 15:36, Kevin Grittner wrote: Laszlo Nagy wrote: shared_mem = 6GB work_mem = 512MB total system memory=24GB In addition to the good advice from Ken, I suggest that you set effective_cache_size (if you haven't already). Add whatever the OS shows as RAM used for cache to the shared_mem setting. It was 1GB. Now I changed to 2GB. Although the OS shows 9GB inactive memory, we have many concurrent connections to the database server. I hope it is okay to use 2GB. But yeah, for your immediate problem, if you can cluster the table on the index involved, it will be much faster. Of course, if the table is already in a useful order for some other query, that might get slower, and unlike some other products, CLUSTER in PostgreSQL doesn't *maintain* that order for the data as new rows are added -- so this should probably become a weekly (or monthly or some such) maintenance operation. Thank you! After clustering, queries are really fast. I don't worry about other queries. This is the only way we use this table - get details for a given id value. I put the CLUSTER command into a cron script that runs daily. For the second time, it took 2 minutes to run so I guess it will be fine. Thank you for your help. Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query + why bitmap index scan??
On 2011-01-12 14:42, Florian Weimer wrote: * Laszlo Nagy: This query: select hid from product_price_history where id=35547581 Returns 759 rows in 8837 msec! How can this be that slow??? If most records are on different heap pages, processing this query requires many seeks. 11ms per seek is not too bad if most of them are cache misses. How about this: select id,hdate from product_price_history where id=35547581 -- 759 rows, 8837 ms Query time average: 3 sec. Query plan: "Bitmap Heap Scan on product_price_history (cost=13.91..1871.34 rows=474 width=16)" " Recheck Cond: (id = 35547582)" " -> Bitmap Index Scan on idx_product_price_history_id_hdate (cost=0.00..13.79 rows=474 width=0)" "Index Cond: (id = 35547582)" Why still the heap scan here? All fields in the query are in the index... Wouldn't a simple index scan be faster? (This is only a theoretical question, just I'm curious.) My first idea to speed things up is to cluster this table regularly. That would convert (most of the) rows into a few pages. Few page reads -> faster query. Is it a good idea? Another question. Do you think that increasing shared_mem would make it faster? Currently we have: shared_mem = 6GB work_mem = 512MB total system memory=24GB Total database size about 30GB, but there are other programs running on the system, and many other tables. Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Slow query + why bitmap index scan??
This will be a simple question to answer. :-) There is a single table: select count(*) from product_price_history -- 12982555 rows This table has exactly one index and one primary key constraint: CREATE INDEX idx_product_price_history_id_hdate ON product_price_history USING btree (id, hdate); ALTER TABLE product_price_history ADD CONSTRAINT pk_product_price_history PRIMARY KEY(hid); No more constraints or indexes defined on this table. Rows are never updated or deleted in this table, they are only inserted. It was vacuum-ed and reindex-ed today. Stats on the table: seq scans=13, index scans=108, table size=3770MB, toast table size=8192 bytes, indexes size=666MB This query: select hid from product_price_history where id=35547581 Returns 759 rows in 8837 msec! How can this be that slow??? The query plan is: "Bitmap Heap Scan on product_price_history (cost=13.90..1863.51 rows=472 width=8)" " Recheck Cond: (id = 35547581)" " -> Bitmap Index Scan on idx_product_price_history_id_hdate (cost=0.00..13.78 rows=472 width=0)" "Index Cond: (id = 35547581)" I don't understand why PostgreSQL uses a bitmap heap scan + bitmap index scan. Why not just use a regular index scan? Data in a btree index is already sorted. A normal index scan should take no more than a few page reads. This should never take 8 seconds. Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
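For what it's worth, the two plan shapes can be compared directly by disabling bitmap scans for a single session; this is a diagnostic sketch only, not something to leave enabled in production:

set enable_bitmapscan = off;
explain analyze select hid from product_price_history where id = 35547581;
reset enable_bitmapscan;
-- if this is not much faster, the time is going into random heap reads for the
-- ~759 matching rows, which is what clustering the table later fixed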
Re: [PERFORM] SSD + RAID
- Pg doesn't know the erase block sizes or positions. It can't group writes up by erase block except by hoping that, within a given file, writing in page order will get the blocks to the disk in roughly erase-block order. So your write caching isn't going to do anywhere near as good a job as the SSD's can. Okay, I see. We cannot query erase block size from an SSD drive. :-( I don't think that any SSD drive has more than some megabytes of write cache. The big, lots-of-$$ ones have HUGE battery backed caches for exactly this reason. Heh, this is why they are so expensive. :-) The same amount of write cache could easily be implemented in OS memory, and then Pg would always know what hit the disk. Really? How does Pg know what order the SSD writes things out from its cache? I got the point. We cannot implement an efficient write cache without much more knowledge about how that particular drive works. So... the only solution that works well is to have much more RAM for read cache, and much more RAM for write cache inside the RAID controller (with BBU). Thank you, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
A change has been written to the WAL and fsync()'d, so Pg knows it's hit disk. It can now safely apply the change to the tables themselves, and does so, calling fsync() to tell the drive containing the tables to commit those changes to disk. The drive lies, returning success for the fsync when it's just cached the data in volatile memory. Pg carries on, shortly deleting the WAL archive the changes were recorded in or recycling it and overwriting it with new change data. The SSD is still merrily buffering data to write cache, and hasn't got around to writing your particular change yet. All right. I believe you. In the current Pg implementation, I need to turn off the disk cache. But I would like to ask some theoretical questions. It is just an idea from me, and probably I'm wrong. Here is a scenario: #1. user wants to change something, resulting in a write_to_disk(data) call #2. data is written into the WAL and fsync()-ed #3. at this point the write_to_disk(data) call CAN RETURN, the user can continue his work (the WAL is already written, changes cannot be lost) #4. Pg can continue writing data onto the disk, and fsync() it. #5. Then WAL archive data can be deleted. Now maybe I'm wrong, but between #3 and #5, the data to be written is kept in memory. This is basically a write cache, implemented in OS memory. We could really handle it like a write cache. E.g. everything would remain the same, except that we add some latency. We can wait some time after the last modification of a given block, and then write it out. Is it possible to do? If so, then we can turn off the write cache for all drives, except the one holding the WAL. And write speed would still remain the same. I don't think that any SSD drive has more than some megabytes of write cache. The same amount of write cache could easily be implemented in OS memory, and then Pg would always know what hit the disk. Thanks, Laci -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
* I could buy two X25-E drives and have 32GB disk space, and some redundancy. This would cost about $1600, not counting the RAID controller. It is on the edge. This was the solution I went with (4 drives in a raid 10 actually). Not a cheap solution, but the performance is amazing. I came across this article: http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/ It's from a Linux MySQL user so it's a bit confusing but it looks like he has some reservations about performance vs reliability of the Intel drives - apparently they have their own write cache and when it's disabled performance drops sharply. Ok, I'm getting confused here. There is the WAL, which is written sequentially. If the WAL is not corrupted, then it can be replayed on next database startup. Please somebody enlighten me! In my mind, fsync is only needed for the WAL. If I could configure postgresql to put the WAL on a real hard drive that has BBU and write cache, then I cannot lose data. Meanwhile, product table data could be placed on the SSD drive, and I should be able to turn on the write cache safely. Am I wrong? L -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
Robert Haas wrote: 2009/11/14 Laszlo Nagy : 32GB is for one table only. This server runs other applications, and you need to leave space for sort memory, shared buffers etc. Buying 128GB memory would solve the problem, maybe... but it is too expensive. And it is not safe. Power out -> data loss. I'm sorry I though he was talking about keeping the database in memory with fsync=off. Now I see he was only talking about the OS disk cache. My server has 24GB RAM, and I cannot easily expand it unless I throw out some 2GB modules, and buy more 4GB or 8GB modules. But... buying 4x8GB ECC RAM (+throwing out 4x2GB RAM) is a lot more expensive than buying some 64GB SSD drives. 95% of the table in question is not modified. Only read (mostly with index scan). Only 5% is actively updated. This is why I think, using SSD in my case would be effective. Sorry for the confusion. L -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
Heikki Linnakangas wrote: Laszlo Nagy wrote: * I need at least 32GB disk space. So DRAM based SSD is not a real option. I would have to buy 8x4GB memory, costs a fortune. And then it would still not have redundancy. At 32GB database size, I'd seriously consider just buying a server with a regular hard drive or a small RAID array for redundancy, and stuffing 16 or 32 GB of RAM into it to ensure everything is cached. That's tried and tested technology. 32GB is for one table only. This server runs other applications, and you need to leave space for sort memory, shared buffers etc. Buying 128GB memory would solve the problem, maybe... but it is too expensive. And it is not safe. Power out -> data loss. I don't know how you came to the 32 GB figure, but keep in mind that administration is a lot easier if you have plenty of extra disk space for things like backups, dumps+restore, temporary files, upgrades etc. This disk space would be dedicated for a smaller tablespace, holding one or two bigger tables with index scans. Of course I would never use an SSD disk for storing database backups. It would be waste of money. L -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
Note that some RAID controllers (3Ware in particular) refuse to recognize the MLC drives, in particular, they act as if the OCZ Vertex series do not exist when connected. I don't know what they're looking for (perhaps some indication that actual rotation is happening?) but this is a potential problem make sure your adapter can talk to these things! BTW I have done some benchmarking with Postgresql against these drives and they are SMOKING fast. I was thinking about ARECA 1320 with 2GB memory + BBU. Unfortunately, I cannot find information about using ARECA cards with SSD drives. I'm also not sure how they would work together. I guess the RAID cards are optimized for conventional disks. They read/write data in bigger blocks and they optimize the order of reading/writing for physical cylinders. I know for sure that this particular areca card has an Intel dual core IO processor and its own embedded operating system. I guess it could be tuned for SSD drives, but I don't know how. I was hoping that with a RAID 6 setup, write speed (which is slower for cheaper flash based SSD drives) would dramatically increase, because information written simultaneously to 10 drives. With very small block size, it would probably be true. But... what if the RAID card uses bigger block sizes, and - say - I want to update much smaller blocks in the database? My other option is to buy two SLC SSD drives and use RAID1. It would cost about the same, but has less redundancy and less capacity. Which is the faster? 8-10 MLC disks in RAID 6 with a good caching controller, or two SLC disks in RAID1? Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] SSD + RAID
Hello, I'm about to buy SSD drive(s) for a database. For decision making, I used this tech report: http://techreport.com/articles.x/16255/9 http://techreport.com/articles.x/16255/10 Here are my concerns: * I need at least 32GB disk space. So DRAM based SSD is not a real option. I would have to buy 8x4GB memory, costs a fortune. And then it would still not have redundancy. * I could buy two X25-E drives and have 32GB disk space, and some redundancy. This would cost about $1600, not counting the RAID controller. It is on the edge. * I could also buy many cheaper MLC SSD drives. They cost about $140. So even with 10 drives, I'm at $1400. I could put them in RAID6, have much more disk space (256GB), high redundancy and POSSIBLY good read/write speed. Of course then I need to buy a good RAID controller. My question is about the last option. Are there any good RAID cards that are optimized (or can be optimized) for SSD drives? Do any of you have experience in using many cheaper SSD drives? Is it a bad idea? Thank you, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random_page_cost for tablespace
Robert Haas wrote: 2009/11/9 Laszlo Nagy : We have a bigger table with some million rows. Number of index scans is high, number of seq reads is low. This table is often joined with others... so we want to buy a new SSD drive, create a tablespace on it and put this big table on it. Random read speed on SSD is identical to seq read. However, I need to tell the optimizer that random_page_cost is less for the new tablespace. Is there a way to do it? I happen to be working on a patch for this exact feature. However, even assuming it gets in, that means waiting for 8.5. That will be a very nice feature. Thank you! :-) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
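For readers finding this thread later: the feature did ship in PostgreSQL 9.0 (which is what "8.5" eventually became), and the syntax looks roughly like this; the tablespace name and values are examples only:

ALTER TABLESPACE ssd_space SET (random_page_cost = 1.5, seq_page_cost = 1.0);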
[PERFORM] random_page_cost for tablespace
Hi All, We have a bigger table with some million rows. Number of index scans is high, number of seq reads is low. This table is often joined with others... so we want to buy a new SSD drive, create a tablespace on it and put this big table on it. Random read speed on SSD is identical to seq read. However, I need to tell the optimizer that random_page_cost is less for the new tablespace. Is there a way to do it? Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why is my stats collector so busy?
Tom Lane wrote: Laszlo Nagy writes: On a 8 processor system, my stats collector is always at 100% CPU. What platform? What Postgres version? regards, tom lane 8.3.5 on FreeBSD 7.0 amd64 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why is my stats collector so busy?
What version of Postgres are you using? 8.3.5 on FreeBSD amd64 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Why is my stats collector so busy?
On a 8 processor system, my stats collector is always at 100% CPU. Meanwhile disk I/O is very low. We have many databases, they are accessed frequently. Sometimes there are big table updates, but in most of the time only simple queries are ran against the databases, returning a few records only. From the maximum possible 8.0 system load, the average load is always above 1.1 and from this, 1.0 is the stats collector and 0.1 is the remaining of the system. If I restart the postgresql server, then the stats collector uses 0% CPU for about 10 minutes, then goes up to 100% again. Is there a way to tell why it is working so much? I asked this problem some months ago on a different mailing list. I was asked to provide tracebacks of the stats collector, but due to a bug in the FreeBSD ppid() function, I'm not able to trace the stats collector. Thank you, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Partial index usage
Hi All, I have these indexes on a table: CREATE INDEX uidx_product_partno_producer_id ON product USING btree (partno, producer_id); CREATE INDEX idx_product_partno ON product USING btree (partno); Can I safely delete the second one? Will postgresql use (partno,producer_id) when it only needs to order by partno? (partno is a text field, producer_id is int4). Index sizes: 172MB and 137MB. I guess if I only had one index, it would save memory and increase performance. Another pair of indexes, 144MB and 81MB respectively: CREATE INDEX idx_product_producer_uploads ON product USING btree (producer_id, am_upload_status_id); CREATE INDEX idx_product_producer_id ON product USING btree (producer_id); am_upload_status_id is also an int4. Can I delete the second index without a performance drawback? Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
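Generally a btree on (partno, producer_id) can also serve lookups and ORDER BY on partno alone, so the single-column indexes are usually redundant; before dropping one, it is worth checking how often each index is actually chosen. A sketch:

select indexrelname, idx_scan, idx_tup_fetch
from pg_stat_user_indexes
where relname = 'product'
order by idx_scan;
-- indexes with idx_scan near zero since the last stats reset are candidates to drop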
[PERFORM] Big index sizes
We have several tables where the index size is much bigger than the table size. Example: select count(*) from product_price -- 2234244 Table size: 400 MB Index size: 600 MB After executing "reindex table product_price", index size reduced to 269MB. I believe this affects performance. Vacuuming a table does not rebuild the indexes, am I right? I'm not sure if I need to do this manually, or is this the result of another problem? (For example, too many open transactions, frequent updates?) Thanks -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
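Right: plain VACUUM only reclaims free space inside an index, it never shrinks it; REINDEX (or CLUSTER) is what actually rebuilds it. A quick way to keep an eye on table vs. index size over time (these size functions exist since 8.1; the table name is from the example above):

select pg_size_pretty(pg_relation_size('product_price')) as table_size,
       pg_size_pretty(pg_total_relation_size('product_price')
                      - pg_relation_size('product_price')) as indexes_plus_toast;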
Re: [PERFORM] Slow table update - SOLVED!
In 8.3 the HOT feature may help if the columns being updated are not indexed ... what version of PostgreSQL is this again ? (Forgive my lack of memory -- the last few days I've forgotten a lot, heh heh.) 8.3.5. The column that was being updated is part of one small index only. Any chances to reduce those to a bare minimum, perhaps using conditional index strategies or even some form of replication, so the primary uses indexes related to the updates and the mirror uses indexes related to the read-only / reporting needs ? Perhaps some form of staging table with no indexes to load, check data, etc. and then insert. Any way to reduce those ? Check the usage via the system stats on table/index use and try removing some and testing to see what makes a difference. We tried to remove all indexes on a test system and the update was speedy. We are going to try to reduce the row size: move static description/name/textual data into a separate table, and leave frequently updated data in the original one. We tested this theoretical version: Query returned successfully: 182752 rows affected, 56885 ms execution time. This is much faster. However, this table is used by hundreds of programs. Anyway, I got the answer to my question. Thank you! Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow table update
My other idea was that there are so many indexes on this table, maybe the update is slow because of the indexes? Updating indexes is certainly very far from being free. How many is "many"? Number of indexes = 15. Three indexes are on "text" type columns, 500MB in size each. The others are on int8 and timestamp columns, approx. 200MB each. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [ADMIN] rebellious pg stats collector (reopened case)
Posted to the wrong list by mistake. Sorry. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [ADMIN] rebellious pg stats collector (reopened case)
and see if its output changes when you start to trace it. %cat test.c #include <stdio.h> #include <unistd.h> int main() { while(1) { sleep(5); printf("ppid = %d\n", getppid()); } } %gcc -o test test.c %./test ppid = 47653 ppid = 47653 ppid = 47653 # Started "truss -p 48864" here! ppid = 49073 ppid = 49073 ppid = 49073 Agreed, but we need to understand what the tools being used to investigate the problem are doing ... Unfortunately, I'm not able to install strace: # pwd /usr/ports/devel/strace # make ===> strace-4.5.7 is only for i386, while you are running amd64. *** Error code 1 Stop in /usr/ports/devel/strace. I'll happily install any trace tool, but have no clue which one would help. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow table update
I just tested the same on a test machine. It only has one processor 1GB memory, and one SATA disk. The same "select count(*)" was 58 seconds. I started the same UPDATE with EXPLAIN ANALYZE. It is running since 1000 seconds. I'm now 100% sure that the problem is with the database, because this machine has nothing but a postgresql server running on it. I'll post the output of explain analyze later. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow table update
If the table has some sort of FK relations it might be being slowed by the need to check a row meant to be deleted has any children. If you look at my SQL, there is only one column to be updated. That column has no foreign key constraint. (It should have, but we did not want to add that constraint in order to speed up updates.) Perhaps triggers ? Table "product" has no triggers. If the table is very bloated with lots of dead rows (but you did say you vacuum frequently and check the results to make sure they are effective?) that would slow it down. I'm not sure how to check if the vacuum was effective. But we have max_fsm_pages=100 in postgresql.conf, and I do not get any errors from the daily vacuum script, so I presume that the table hasn't got too many dead rows. Anyway, the table size is only 4GB. Even if half of the rows are dead, the update should run quite quickly. Another argument is that when I "select count(*)" instead of "UPDATE", then I get the result in 10 seconds. I don't think that dead rows can make such a big difference between reading and writing. My other idea was that there are so many indexes on this table, maybe the update is slow because of the indexes? The column being updated has only one index on it, and that is 200MB. But I have heard somewhere that because of PostgreSQL's multi version system, sometimes the system needs to update indexes with columns that are not being updated. I'm not sure. Might this be the problem? A long running transaction elsewhere that is blocking the delete ? Did you check the locks ? Sorry, this was an update. A blocking transaction would never explain why the disk I/O went up to 100% for 2600 seconds. L -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
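One concrete way to check whether vacuum is keeping up on 8.3 is the per-table tuple counters in the statistics views; a sketch:

select relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
from pg_stat_user_tables
where relname = 'product';
-- an n_dead_tup that is a large fraction of n_live_tup (or that keeps growing
-- between vacuum runs) means vacuum / max_fsm_pages is not keeping up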
Re: [PERFORM] Slow table update
Laszlo Nagy wrote: SQL: update product set sz_category_id=null where am_style_kw1 is not null and sz_category_id is not null Hmm, this query: select count(*) from product where am_style_kw1 is not null and sz_category_id is not null and sz_category_id<>4809 opens in 10 seconds. The update would not finish in 2600 seconds. I don't understand. L -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Slow table update
SQL: update product set sz_category_id=null where am_style_kw1 is not null and sz_category_id is not null query plan: "Seq Scan on product (cost=0.00..647053.30 rows=580224 width=1609)" " Filter: ((am_style_kw1 IS NOT NULL) AND (sz_category_id IS NOT NULL))" Information on the table: row count ~ 2 million table size: 4841 MB toast table size: 277mb indexes size: 4434 MB Computer: FreeBSD 7.0 stable, Dual Xeon Quad code 5420 2.5GHZ, 8GB memory, 6 ES SATA disks in hw RAID 6 (+2GB write back cache) for the database. Autovacuum is enabled. We also perform "vacuum analyze" on the database, each day. Here are some non-default values from postgresql.conf: shared_buffers=400MB maintenance_work_mem = 256MB max_fsm_pages = 100 There was almost no load on the machine (CPU: mostly idle, IO: approx. 5% total) when we started this update. Maybe I'm wrong with this, but here is a quick calculation: the RAID array should do at least 100MB/sec. Reading the whole table should not take more than 1 min. I think about 20% of the rows should have been updated. Writting out all changes should not take too much time. I believe that this update should have been completed within 2-3 minutes. In reality, after 2600 seconds I have cancelled the query. We monitored disk I/O and it was near 100% all the time. What is wrong? Thank you, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] UFS 2: soft updates vs. gjournal (AKA: Choosing a filesystem 2.)
Hi again, Should I use gjournal on FreeBSD 7? Or just soft updates? Here is my opinion: I suspect that gjournal would be much slower than soft updates. Also gjournal is relatively new code, not very well tested. But gjournal is better when the system crashes. Although I have heard that sometimes gjournal will crash the system itself. There are more pros for soft updates, so I would prefer that. But please let me know if I'm wrong. Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Choosing a filesystem
going to the same drives. This turns your fast sequential I/O into random I/O with the accompanying 10x or more performance decrease. Unless you have a good RAID controller with battery-backed-up cache. All right. :-) This is what I'll have: Boxed Intel Server Board S5000PSLROMB with 8-port SAS ROMB card (supports 45nm processors (Harpertown and Wolfdale-DP)) Intel® RAID Activation key AXXRAK18E enables full intelligent SAS RAID on S5000PAL, S5000PSL, SR4850HW4/M, SR6850HW4/M. RoHS Compliant. 512 MB 400MHz DDR2 ECC Registered CL3 DIMM Single Rank, x8 (for S5000PSLROMB) 6-drive SAS/SATA backplane with expander (requires 2 SAS ports) for SC5400 and SC5299 (two pieces) 5410 Xeon 2.33 GHz/1333 FSB/12MB boxed, passive cooling / 80W (2 pieces) 2048 MB 667MHz DDR2 ECC Fully Buffered CL5 DIMM Dual Rank, x8 (8 pieces) SAS disks will be: 146.8 GB, SAS 3G, 15000RPM, 16 MB cache (two pieces) SATA disks will be: HDD Server SEAGATE Barracuda ES 7200.1 (320GB, 16MB, SATA II-300) (10 pieces) I cannot spend more money on this computer, but since you are all talking about battery back up, I'll try to get money from the management and buy this: Intel® RAID Smart Battery AXXRSBBU3, optional battery back up for use with AXXRAK18E and SRCSAS144E. RoHS Compliant. This server will also be an IMAP server, web server etc. so I'm 100% sure that the SAS disks will be used for logging. I have two spare 200GB SATA disks here in the office but they are cheap ones designed for desktop computers. Is it okay to dedicate these disks to the WAL file in RAID1? Will it improve performance? How much trouble would it cause if the WAL file goes wrong? Should I just put the WAL file on the RAID 1+0 array? Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Choosing a filesystem
I'm about to buy a new server. It will be a Xeon system with two processors (4 cores per processor) and 16GB RAM. Two RAID extenders will be attached to an Intel s5000 series motherboard, providing 12 SAS/Serial ATA connectors. The server will run FreeBSD 7.0, PostgreSQL 8, apache, PHP, mail server, dovecot IMAP server and background programs for database maintenance. On our current system, I/O performance for PostgreSQL is the biggest problem, but sometimes all CPUs are at 100%. Number of users using this system: PostgreSQL: 30 connections Apache: 30 connections IMAP server: 15 connections The databases are mostly OLTP, but the background programs are creating historical data and statistic data continuously, and sometimes web site visitors/serach engine robots run searches in bigger tables (with 3million+ records). There is an expert at the company who sells the server, and he recommended that I use SAS disks for the base system at least. I would like to use many SAS disks, but they are just too expensive. So the basic system will reside on a RAID 1 array, created from two SAS disks spinning at 15 000 rpm. I will buy 10 pieces of Seagate Barracuda 320GB SATA (ES 7200) disks for the rest. The expert told me to use RAID 5 but I'm hesitating. I think that RAID 1+0 would be much faster, and I/O performance is what I really need. I would like to put the WAL file on the SAS disks to improve performance, and create one big RAID 1+0 disk for the data directory. But maybe I'm completely wrong. Can you please advise how to create logical partitions? The hardware is capable of handling different types of RAID volumes on the same set of disks. For example, a smaller RAID 0 for indexes and a bigger RAID 5 etc. If you need more information about the database, please ask. :-) Thank you very much, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planning a new server - help needed
I guess you mean postgresql 8.3.1? :-) Yep. Sorry. Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS 2 + soft updates will be better, but I'm not sure. Which is better? I'd stick with ufs2 atm. There are some issues with zfs which probably have been ironed out by now but ufs2 has been deployed for a longer time. Performance-wise they are about the same. Thank you. I suspected the same but it was good to get positive confirmation. Question 4. How to make the partitions? This is the hardest question. Here is my plan: - the OS resides on 2 disks, RAID 1 - the databases should go on 8 disks, RAID 0 + 1 If you have enough disks raid-6 should perform almost as good as raid 1+0. Hmm, I have heard that RAID 1 or RAID 1 + 0 should be used for databases, never RAID 5. I know nothing about RAID 6. I guess I must accept your suggestion since you have more experience than I have. :-) Obviously, it would be easier to manage a single RAID 6 array. I've setup 11 disks in raid-6 plus one hotspare so I can get more space out of it. "Enough disks" are approx. eight and up. The RAID controller that I have selected can only handle 8 disks. I guess I need to find a different one with 16 channels and use more disks. So are you saying that with all disks in a bigger RAID 6 array, I will get the most out of the hardware? In that case, I'll try to get a bit more money from the management and build RAID 6 with 12 disks. I also feel that I need to use a separate RAID 1 array (I prefer gmirror) for the base system. Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Planning a new server - help needed
Hello, I need to install a new server for postgresql 8.3. It will run two databases, web server and some background programs. We already have a server but it is becoming slow and we would like to have something that is faster. It is a cost sensitive application, and I would like to get your opinion in some questions. The database itself is an OLTP system. There are many smaller tables, and some bigger ones (biggest table with 1.2 million records, table size 966MB, indexes size 790MB). In the bigger tables there are only a few records updated frequently, most of the other records are not changed. The smaller tables are updated continuously. Question 1. We are going to use PostgreSQL 3.1 with FreeBSD. The pg docs say that it is better to use FreeBSD because it can alter the I/O priority of processes dynamically. The latest legacy release is 6.3 which is probably more stable. However, folks say that 7.0 has superior performance on the same hardware. Can I use 7.0 on a production server? Question 2. SCSI or SATA? I plan to buy a RocketRAID 3520 controller with 8 SATA 2 disks. The operating system would be on another disk pair, connected to the motherboard's controller. I wonder if I can get more performance with SCSI, for the same amount of money? (I can spend about $1500 on the controller and the disks, that would cover 10 SATA 2 disks and the controller.) Question 3. FreeBSD 7.0 can use the ZFS file system. I suspect that UFS 2 + soft updates will be better, but I'm not sure. Which is better? Question 4. How to make the partitions? This is the hardest question. Here is my plan: - the OS resides on 2 disks, RAID 1 - the databases should go on 8 disks, RAID 0 + 1 However, the transaction log file should be on a separate disk and maybe I could gain more performance by putting indexes on a separate drive, but I do not want to reduce the number of disks in the RAID 0+1 array. Should I put indexes and transaction log on the RAID 1 array? Or should I invest a bit more money, add an SATA RAID controller with 16 channels and add more disks? Would it pay the bill? Another alternative is to put the biggest tables on a separate array so that it will be faster when we join these tables with other tables. I know that it is hard to answer without knowing the structure of the databases. :-( I can make tests with different configurations later, but I would like to know your opinion first - what should I try? Thanks, Laszlo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor performance on seq scan
I have had extremely bad performance historically with onboard SATA chipsets on Linux. The one exception has been with the Intel based chipsets (not the CPU, the I/O chipset). This board has an Intel chipset. I cannot remember the exact type, but it was not in the low-end category. dmesg says:

kernel: ad4: 152626MB at ata2-master SATA150
kernel: ad4: 152627MB at ata3-master SATA150

It is very likely that you are having problems with the driver for the chipset. Are you running RAID1 in hardware? If so, turn it off and see what the performance is. The onboard hardware RAID is worse than useless; it actually slows the I/O down. I'm using software raid, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
   Mediasize: 160040803328 (149G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad4
   Mediasize: 160040803840 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 1153981856
2. Name: ad6
   Mediasize: 160041885696 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3520427571

I tried to do:

#sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not get better disk read performance. Thank you for your suggestions. Looks like I need to buy SCSI disks.

Regards, Laszlo ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Poor performance on seq scan
Craig A. James wrote: There IS a bug for SATA disk drives in some versions of the Linux kernel. On a lark I ran some of the I/O tests in this thread, and much to my surprise discovered my write speed was 6 MB/sec ... ouch! On an identical machine, different kernel, the write speed was 54 MB/sec. My disks are running in SATA150 mode. Whatever that means. I'm using FreeBSD, and not just because it dynamically alters the priority of long-running processes. :-) Laszlo ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Poor performance on seq scan
Tom Lane wrote: Why is that showing 85+ percent *system* CPU time?? I could believe a lot of idle CPU if the query is I/O bound, or a lot of user time if PG was being a hog about doing the ~~ comparisons (not too unlikely BTW). I'm sorry, this was really confusing. I don't know what it was - probably a background system process, started from cron (?). I retried the same query and I got this:

zeusd1=> explain analyze select id,name from product where name like '%Mug%';
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on product  (cost=0.00..206891.34 rows=36487 width=40) (actual time=17.188..44585.176 rows=91399 loops=1)
   Filter: (name ~~ '%Mug%'::text)
 Total runtime: 44631.150 ms
(3 rows)

       tty             ad4              ad6             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   62 115.25 143 16.06 116.03 143 16.17   3  0  9  3 85
   0   62 122.11 144 17.12 121.78 144 17.07   6  0  3  2 89
   0   62 126.18 158 19.45 125.86 157 19.28   5  0 11  6 79
   0   62 126.41 131 16.13 127.52 132 16.39   5  0  9  6 80
   0   62 127.80 159 19.81 126.89 158 19.55   5  0  9  0 86
   0   62 125.29 165 20.15 126.26 165 20.30   5  0 14  2 80
   0   62 127.22 164 20.32 126.74 165 20.37   5  0  9  0 86
   0   62 121.34 150 17.75 120.76 149 17.54   1  0 13  3 82
   0   62 121.40 143 16.92 120.33 144 16.89   5  0 11  3 82
   0   62 127.38 154 19.12 127.17 154 19.09   8  0  8  5 80
   0   62 126.88 129 15.95 127.00 128 15.84   5  0  9  5 82
   0   62 118.48 121 13.97 119.28 121 14.06   6  0 17  3 74
   0   62 127.23 146 18.10 126.79 146 18.04   9  0 20  2 70
   0   62 127.27 153 18.98 128.00 154 19.21   5  0 17  0 79
   0   62 127.02 130 16.09 126.28 130 16.00  10  0 16  3 70
   0   62 123.17 125 15.00 122.40 125 14.91   5  0 14  2 80
   0   62 112.37 130 14.24 112.62 130 14.27   0  0 14  3 83
   0   62 115.83 138 15.58 113.97 138 15.33   3  0 18  0 79

A bit better transfer rate, but nothing serious. Regards, Laszlo ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
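As a rough cross-check of those iostat figures (a sketch, not something from the original thread), the effective scan rate can be estimated by timing a query that has to read the whole heap and dividing the table size already quoted in this thread (about 1123 MB) by the elapsed time:

\timing
-- On this release count(*) cannot be answered from an index, so it reads
-- the entire heap sequentially; table size / elapsed time gives the
-- effective sequential rate the backend is seeing.
select count(*) from product;
-- e.g. ~1123 MB in roughly 45 seconds is about 25 MB/s at the SQL level,
-- which can be set against the 16-20 MB/s per disk that iostat reports.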
[PERFORM] tsearch2 question (was: Poor performance on seq scan)
Tom Lane wrote: Only if the index is capable of disgorging the original value of the indexed column, a fact not in evidence in general (counterexample: polygons indexed by their bounding boxes in an r-tree). But yeah, it's interesting to think about applying filters at the index fetch step for index types that can hand back full values. This has been discussed before --- I think we had gotten as far as speculating about doing joins with just index values. See eg here: http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php A lot of the low-level concerns have already been dealt with in order to support bitmap indexscans, but applying non-indexable conditions before fetching from the heap is still not done.

To overcome this problem, I created a smaller "shadow" table:

CREATE TABLE product_search
(
  id int8 NOT NULL,
  name_desc text,
  CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
  select id, name || ' ' || coalesce(description,'') from product;

Obviously, this is almost like an index, but I need to maintain it manually. I'm able to search with:

zeusd1=> explain analyze select id from product_search where name_desc like '%Mug%';
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on product_search  (cost=0.00..54693.34 rows=36487 width=8) (actual time=20.036..2541.971 rows=91399 loops=1)
   Filter: (name_desc ~~ '%Mug%'::text)
 Total runtime: 2581.272 ms
(3 rows)

The total runtime remains below 3 sec in all cases. Of course I still need to join the main table to the result:

explain analyze select s.id,p.name from product_search s
  inner join product p on (p.id = s.id)
  where s.name_desc like '%Tiffany%';
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..55042.84 rows=58 width=40) (actual time=164.437..3982.610 rows=117 loops=1)
   ->  Seq Scan on product_search s  (cost=0.00..54693.34 rows=58 width=8) (actual time=103.651..2717.914 rows=117 loops=1)
         Filter: (name_desc ~~ '%Tiffany%'::text)
   ->  Index Scan using pk_product_id on product p  (cost=0.00..6.01 rows=1 width=40) (actual time=10.793..10.796 rows=1 loops=117)
         Index Cond: (p.id = "outer".id)
 Total runtime: 4007.283 ms
(6 rows)

Took 4 seconds. Awesome! With the original table, it used to take one or two minutes! Now you may ask: why am I not using tsearch2 for this?
Here is the answer:

CREATE TABLE product_search
(
  id int8 NOT NULL,
  ts_name_desc tsvector,
  CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
  select id, to_tsvector(name || ' ' || coalesce(description,'')) from product;

CREATE INDEX idx_product_search_ts_name_desc
  ON product_search USING gist (ts_name_desc);

VACUUM product_search;

zeusd1=> explain analyze select id from product_search where ts_name_desc @@ to_tsquery('mug');
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on product_search  (cost=25.19..3378.20 rows=912 width=8) (actual time=954.669..13112.009 rows=91434 loops=1)
   Filter: (ts_name_desc @@ '''mug'''::tsquery)
   ->  Bitmap Index Scan on idx_product_search_ts_name_desc  (cost=0.00..25.19 rows=912 width=0) (actual time=932.455..932.455 rows=91436 loops=1)
         Index Cond: (ts_name_desc @@ '''mug'''::tsquery)
 Total runtime: 13155.724 ms
(5 rows)

zeusd1=> explain analyze select id from product_search where ts_name_desc @@ to_tsquery('tiffany');
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on product_search  (cost=25.19..3378.20 rows=912 width=8) (actual time=13151.725..13639.112 rows=76 loops=1)
   Filter: (ts_name_desc @@ '''tiffani'''::tsquery)
   ->  Bitmap Index Scan on idx_product_search_ts_name_desc  (cost=0.00..25.19 rows=912 width=0) (actual time=13123.705..13123.705 rows=81 loops=1)
         Index Cond: (ts_name_desc @@ '''tiffani'''::tsquery)
 Total runtime: 13639.478 ms
(5 rows)

At least 13 seconds, and the main table is not joined yet. Can anybody explain to me why the seq scan is faster than the bitmap index scan? In the last example only 81 rows were returned from the index, but it still took more than 13 seconds. :( Even if the whole table could be cached in memory (which is not the case), the bitmap index scan should be much faster. Probably t
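Since the shadow table otherwise has to be refreshed by hand, one possible way to keep it in sync is a trigger on the main table. This is only a sketch using the column names from the examples above; the function and trigger names are made up and this was not part of the original setup:

-- Requires the plpgsql language to be installed in the database.
CREATE OR REPLACE FUNCTION product_search_sync() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    DELETE FROM product_search WHERE id = OLD.id;
    RETURN OLD;
  ELSIF TG_OP = 'INSERT' THEN
    INSERT INTO product_search (id, name_desc)
      VALUES (NEW.id, NEW.name || ' ' || coalesce(NEW.description, ''));
  ELSE  -- UPDATE
    UPDATE product_search
       SET name_desc = NEW.name || ' ' || coalesce(NEW.description, '')
     WHERE id = NEW.id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_product_search_sync
  AFTER INSERT OR UPDATE OR DELETE ON product
  FOR EACH ROW EXECUTE PROCEDURE product_search_sync();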
Re: [PERFORM] Poor performance on seq scan
Heikki Linnakangas wrote: Are there any other columns besides id and name in the table? How big is products.txt compared to the heap file? Yes, many other columns. The products.txt is only 59MB. It is similar to the size of the index (66MB).

Another question: I have a btree index on product(name). It contains all product names and the identifiers of the products. Wouldn't it be easier to seq scan the index instead of seq scanning the table? The index is only 66MB, the table is 1123MB. Probably, but PostgreSQL doesn't know how to do that. Even if it did, it depends on how many matches there are. If you scan the index and then fetch the matching rows from the heap, you're doing random I/O to the heap. That becomes slower than scanning the heap sequentially if you're going to get more than a few hits. I have 700,000 rows in the table, and usually there are fewer than 500 hits. So probably using a "seq index scan" would be faster. :-)

Now I also tried this:

create table test(id int8 not null primary key, name text);
insert into test select id,name from product;

And then:

zeusd1=> explain analyze select id,name from test where name like '%Tiffany%';
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..26559.62 rows=79 width=40) (actual time=36.595..890.903 rows=117 loops=1)
   Filter: (name ~~ '%Tiffany%'::text)
 Total runtime: 891.063 ms
(3 rows)

But this might be coming from the disk cache. Thank you for your comments. We are making progress.

Laszlo ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
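The size difference alone accounts for most of that speedup, and it is easy to confirm. A small sketch, assuming pg_size_pretty and pg_relation_size are available in the running release and using the table names from the examples above:

-- Compare the heap size of the full table with that of the narrow copy;
-- a sequential scan's runtime shrinks roughly in proportion to the number
-- of pages it has to read.
select pg_size_pretty(pg_relation_size('product')) as product_heap,
       pg_size_pretty(pg_relation_size('test'))    as test_heap;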
Re: [PERFORM] Poor performance on seq scan
Luke Lonergan wrote: Lazlo, Meanwhile, "iostat 5" gives something like this:

 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
   0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0

This is your problem. Do the following and report the results here: Take the number of GB of memory you have (say 2 for 2GB), multiply it by 250,000. This is the number of 8KB pages you can fit in twice your RAM. Let's say you have 2GB - the result is 500,000. Use that number to do the following test on your database directory:

time bash -c "dd if=/dev/zero of=<dbdir>/bigfile bs=8k count=<number> && sync"

I have 1GB RAM. The data directory is in /usr/local/pgsql/data. The root of this fs is /usr.

time sh -c "dd if=/dev/zero of=/usr/test/bigfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes transferred in 48.030627 secs (42639460 bytes/sec)
0.178u 8.912s 0:48.31 18.7% 9+96k 37+15701io 0pf+0w

Then do this:

time bash -c "dd if=<dbdir>/bigfile of=/dev/null bs=8k"

time sh -c "dd if=/usr/test/bigfile of=/dev/null bs=8k"
250000+0 records in
250000+0 records out
2048000000 bytes transferred in 145.293473 secs (14095609 bytes/sec)
0.110u 5.857s 2:25.31 4.1% 10+99k 32923+0io 0pf+0w

At this point I thought there was another process doing I/O, so I retried:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 116.395211 secs (17595226 bytes/sec)
0.137u 5.658s 1:56.51 4.9% 10+103k 29082+0io 0pf+1w

and again:

250000+0 records in
250000+0 records out
2048000000 bytes transferred in 120.198224 secs (17038521 bytes/sec)
0.063u 5.780s 2:00.21 4.8% 10+98k 29776+0io 0pf+0w

This is a mirrored disk with two SATA disks. In theory, writing should be slower than reading. Is this a hardware problem? Or is it that "sync" did not really sync?

Laszlo ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Poor performance on seq scan
Hello, I have a big table called products. Table size: 1123MB. Toast table size: 32MB. Indexes size: 380MB. I try to do a query like this:

select id,name from products where name like '%Mug%';

Yes, I know that tsearch2 is better for this, but please read on. The above query gives this plan:

 Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40)
   Filter: (name ~~ '%Mug%'::text)

When I use this with explain analyze:

 Seq Scan on product  (cost=0.00..153489.52 rows=31390 width=40) (actual time=878.873..38300.588 rows=72567 loops=1)
   Filter: (name ~~ '%Mug%'::text)
 Total runtime: 38339.026 ms

Meanwhile, "iostat 5" gives something like this:

 tin tout  KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   1   14 128.00   1  0.10 128.00   1  0.10   5  0 94  1  0
   0   12 123.98 104 12.56 123.74 104 12.56   8  0 90  2  0
   0   12 125.66 128 15.75 125.26 128 15.68  10  0 85  6  0
   0   12 124.66 129 15.67 124.39 129 15.64  12  0 85  3  0
   0   12 117.13 121 13.87 117.95 121 13.96  12  0 84  5  0
   0   12 104.84 118 12.05 105.84 118 12.19  10  0 87  2  0

130 transfers per second with 12-15MB/sec transfer speed. (FreeBSD 6.1 with two SATA150 drives in gmirror RAID1.)

I made another test. I created a file with the identifiers and names of the products:

psql# \o products.txt
psql# select id,name from product;

Then I can search using grep:

grep "Mug" products.txt | cut -f1 -d\|

There is a huge difference. This command runs within 0.5 seconds. That is at least 76 times faster than the seq scan. It is the same if I vacuum, back up and restore the database. I thought that the table is stored in one file, and that the seq scan would actually be faster than grepping the file. Can you please tell me what I am doing wrong? I'm not sure if I can increase the performance of a seq scan by adjusting the values in postgresql.conf. I do not like the idea of exporting the product table periodically into a txt file and searching it with grep. :-)

Another question: I have a btree index on product(name). It contains all product names and the identifiers of the products. Wouldn't it be easier to seq scan the index instead of seq scanning the table? The index is only 66MB, the table is 1123MB.

I'm new to this list and I also just recently started to tune postgresql, so please forgive me if this is a dumb question.

Regards, Laszlo ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
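Since it is not obvious which postgresql.conf settings matter for a scan like this, a quick first step is simply to look at the current values of the memory-related ones. This is only a sketch of where to start; appropriate values depend on the machine and workload, and nothing here comes from the original post:

-- Settings most often discussed when large scans feel slow; shown here only
-- as a starting point for the discussion, not as recommended values.
show shared_buffers;
show work_mem;
show effective_cache_size;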