Re: [PERFORM] Raid 10 chunksize
Ron Mayer wrote: Greg Smith wrote: On Wed, 1 Apr 2009, Scott Carey wrote: Write caching on SATA is totally fine. There were some old ATA drives that, when paired with some file systems or OSes, would not be safe. There are some combinations that have unsafe write barriers. But there is a standard, well-supported ATA command to sync and only return after the data is on disk. If you are running an OS that is anything recent at all, and disks that are not really old, you're fine. While I would like to believe this, I don't trust any claims in this area that don't have matching tests demonstrating things working as expected. And I've never seen this work. My laptop has a 7200 RPM drive, which means that if fsync is being passed through to the disk correctly I can only fsync 120 times/second. Here's what I get when I run sysbench on it, starting with the default ext3 configuration: I believe it's ext3 that's cheating in this scenario. I assume so too. Here is the same test using XFS, first with barriers (the XFS default) and then without:

Linux 2.6.28-gentoo-r2 #1 SMP Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz GenuineIntel GNU/Linux
/dev/sdb /data2 xfs rw,noatime,attr2,logbufs=8,logbsize=256k,noquota 0 0

# sysbench --test=fileio --file-fsync-freq=1 --file-num=1 --file-total-size=16384 --file-test-mode=rndwr run

sysbench 0.4.10: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Extra file open flags: 0
1 files, 16Kb each
16Kb total file size
Block size 16Kb
Number of random requests for random IO: 1
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!
Done.
Operations performed: 0 Read, 1 Write, 1 Other = 2 Total
Read 0b Written 156.25Mb Total transferred 156.25Mb (463.9Kb/sec)
28.99 Requests/sec executed

Test execution summary:
    total time: 344.9013s
    total number of events: 1
    total time taken by event execution: 0.1453
    per-request statistics:
        min: 0.01ms
        avg: 0.01ms
        max: 0.07ms
        approx. 95 percentile: 0.01ms

Threads fairness:
    events (avg/stddev): 1./0.00
    execution time (avg/stddev): 0.1453/0.00

And now without barriers:

/dev/sdb /data2 xfs rw,noatime,attr2,nobarrier,logbufs=8,logbsize=256k,noquota 0 0

# sysbench --test=fileio --file-fsync-freq=1 --file-num=1 --file-total-size=16384 --file-test-mode=rndwr run

sysbench 0.4.10: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Extra file open flags: 0
1 files, 16Kb each
16Kb total file size
Block size 16Kb
Number of random requests for random IO: 1
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!
Done.

Operations performed: 0 Read, 1 Write, 1 Other = 2 Total
Read 0b Written 156.25Mb Total transferred 156.25Mb (62.872Mb/sec)
4023.81 Requests/sec executed

Test execution summary:
    total time: 2.4852s
    total number of events: 1
    total time taken by event execution: 0.1325
    per-request statistics:
        min: 0.01ms
        avg: 0.01ms
        max: 0.06ms
        approx. 95 percentile: 0.01ms

Threads fairness:
    events (avg/stddev): 1./0.00
    execution time (avg/stddev): 0.1325/0.00

-- Best regards, Hannes Dorbath -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
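Greg's 120 fsyncs/second ceiling is easy to sanity-check without sysbench. Below is a minimal sketch of such a micro-benchmark (the function name and defaults are mine, not from the thread): it times write()+fsync() pairs on a scratch file. On a lone 7200 RPM drive with working write barriers the rate should land near the platter's rotation rate (~120/sec); rates in the thousands suggest a write cache that is not being flushed.

```python
import os
import tempfile
import time


def fsync_rate(iterations: int = 50, block: bytes = b"x" * 16384) -> float:
    """Time write()+fsync() pairs and return the achieved fsyncs/second."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data to stable storage
        elapsed = time.perf_counter() - start
        return iterations / elapsed
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Near ~120/sec on a single 7200 RPM SATA disk with honest barriers;
    # thousands/sec mean fsync is being absorbed by a volatile cache.
    print(f"{fsync_rate():.1f} fsyncs/sec")
```

Note this measures whatever filesystem backs the temp directory, so run it on the mount you actually care about.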
Re: [PERFORM] pgPool query cache
Rauan Maemirov wrote: I want to ask if anyone has used the query_cache of pgPool. The problem is that there are no detailed installation steps on how to configure it correctly. I tried to follow the docs, but I guess it doesn't cache my queries. So maybe someone can advise me or give a link. Nobody uses it?? I'd say caching is better done on a higher level -- like HTML fragments or whatever you generate from those queries. Just my two cents. -- Best regards, Hannes Dorbath
[PERFORM] Fastest way / best practice to calculate next birthdays
The subject basically says it all: I'm looking for the fastest (indexable) way to calculate the next birthdays relative to NOW() from a dataset of about 1 million users. I'm currently using a function-based index, but leap year handling / mapping February 29 to February 28 gives me some headaches. Is there any best practice for this in PostgreSQL? -- Best regards, Hannes Dorbath
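The Feb 29 mapping can be pinned down precisely before worrying about the index. Here is a small sketch of the intended semantics (function and parameter names are mine, purely illustrative; the real thing would live in an IMMUTABLE SQL/PLpgSQL function backing the functional index):

```python
import calendar
from datetime import date


def next_birthday(born: date, today: date) -> date:
    """Next occurrence of `born`'s month/day on or after `today`.

    February 29 is mapped to February 28 in non-leap years -- the
    rule the functional index is supposed to encode.
    """
    for year in (today.year, today.year + 1):
        month, day = born.month, born.day
        if month == 2 and day == 29 and not calendar.isleap(year):
            day = 28  # leap-day birthdays celebrate on Feb 28
        candidate = date(year, month, day)
        if candidate >= today:
            return candidate
    raise AssertionError("unreachable: next year always yields a candidate")


# A leap-day birthday seen from a non-leap year maps to Feb 28 ...
assert next_birthday(date(1980, 2, 29), date(2007, 1, 1)) == date(2007, 2, 28)
# ... but falls on the real Feb 29 when the next occurrence is a leap year.
assert next_birthday(date(1980, 2, 29), date(2007, 3, 1)) == date(2008, 2, 29)
```

Because the result depends on NOW(), the expression itself isn't immutable; the usual trick is to index an immutable "birthday normalized to some fixed reference year" and do the wrap-around at query time.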
Re: [PERFORM] 8.3 synchronous_commit
Guillaume Smet wrote: Also with SATA? If your SATA disk is lying about effectively SYNCing the data, I'm not that surprised you don't see any improvement. Being slower is a bit surprising though. The disc is not lying, but LVM does not support write barriers, so the result is the same: nothing is flushing the disc's write cache on fsync. However, I could disable the disc's write cache entirely. This is one reason why I'm a ZFS fanboy lately -- it just gets all of this right by *default*. Anyway, with some further testing my benchmark results vary so much that further discussion seems pointless. I'll repost when I really have reproducible values and have followed Greg Smith's advice. -- Best regards, Hannes Dorbath ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] 8.3 synchronous_commit
Greg Smith wrote: Try something more in the range of 4 clients/CPU and set the scale to closer to twice that (so with a dual-core system you might do 8 clients and a scale of 16). If you really want to simulate a large number of clients, do that on another system and connect to the server remotely. With 4 clients and scale 10 I get 246 TPS with synchronous_commit disabled and 634 TPS with synchronous_commit enabled, so the effect just got even stronger. That was with CFQ. With deadline they are now pretty close, but synchronous_commit disabled is still slower: 690 to 727. Values are averages of 3 runs each, with DROP/CREATE DATABASE and CHECKPOINT; before each run. -- Best regards, Hannes Dorbath
[PERFORM] 8.3 synchronous_commit
I might completely misunderstand this feature. Shouldn't synchronous_commit = off improve performance? Whatever I do, I find synchronous_commit = off to degrade performance. It especially doesn't like the CFQ I/O scheduler; it's not so bad with deadline. Synthetic load like

pgbench -i -s 10 -U pgsql -d bench
pgbench -t 1000 -c 100 -U pgsql -d bench

or the same with scale 100. Maybe it's just my test box: a single SATA-II drive, XFS on top of LVM. I'll retry without LVM once I have another drive; I've seen LVM mess with other things in the past. -- Best regards, Hannes Dorbath
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
On 05.09.2007 01:15, Scott Marlowe wrote: On 9/4/07, Alvaro Herrera [EMAIL PROTECTED] wrote: Carlo Stonebanks wrote: A client is moving their PostgreSQL DB to a brand new Windows 2003 x64 server with 2 quad cores and 32GB of RAM. It is a dedicated server to run 8.2.4. Large shared_buffers and Windows do not mix. Perhaps you should leave the shmem config low, so that the kernel can cache the file pages. But yeah, the I/O, that's the big one. If it's just a single or a couple of IDE drives, it's not gonna be able to handle much load. Right; additionally, NTFS is really nothing to use on any serious disc array. -- Regards, Hannes Dorbath
Re: [PERFORM] FORGOT TO CONFIGURE RAID! DELL POWEREDGE 2950
Joseph wrote: We just got a DELL POWEREDGE 2950. So I was tasked with putting Linux Red Hat on it and dumping our software/packages onto it. Contrary to common sense, I didn't bother reading the manuals that came with the 2950. I went right ahead and installed Red Hat server on it, then went and loaded the backup software/data etc. onto it and started having the team use it. And this has to do with pgsql-performance exactly what? Anyway, as someone who seems to administer a PostgreSQL production box, you surely have a good backup plan. So just call DELL's support, fix your RAID and restore from backup. From the DELL site it seems this `PERC 5/i' onboard controller (assuming that's what you have) doesn't even have a BBU. If you don't plan to post here again in a few weeks about data corruption, go out and shop for a serious controller. And before you move that box into production, check: Is my hardware and software setup fsync/FUA clean? Is my backup plan working? -- Best regards, Hannes Dorbath
Re: [PERFORM] FORGOT TO CONFIGURE RAID! DELL POWEREDGE 2950
Gregory Stark wrote: From the DELL site it seems this `PERC 5/i' onboard controller (assuming that's what you have) doesn't even have a BBU. If you don't plan to post here again in a few weeks about data corruption, go out and shop for a serious controller. This is a bit of a strange comment. A BBU will improve performance, but Postgres doesn't require one to guarantee data integrity. If your drives have write caching disabled (i.e. write-through), your controller does write-through caching, and you leave fsync=on and full_page_writes=on (the defaults), then you shouldn't have any data integrity issues. That was my point: controllers without a BBU usually leave drive caches turned on, as with drive caches off performance would be unbearably bad. -- Best regards, Hannes Dorbath
Re: [PERFORM] Equivalents in PostgreSQL of MySQL's ENGINE=MEMORY MAX_ROWS=1000
Arnau wrote: Hi Thor, Thor-Michael Støre wrote: On 2007-04-04 Arnau wrote: Josh Berkus wrote: Arnau, Is there anything similar in PostgreSQL? The idea behind this is: how can I have tables in PostgreSQL that I can query very often, something like every few seconds, and get results very fast without overloading the postmaster? If you're only querying the tables every few seconds, then you don't really need to worry about performance. Well, the idea behind this is to have events tables, and a monitoring system polls that table every few seconds. I'd like to have a kind of FIFO stack. From the events producer's point of view, he'll be pushing rows into that table; when it's filled, the oldest one will be removed to leave room for the newest one. From the consumer's point of view, he'll read all the contents of that table. So I'll not only be querying the tables, I'll also need to modify those tables. Please try to refrain from doing this. This is the Database as an IPC antipattern (antipatterns are commonly-reinvented bad solutions to problems, i.e. you can be sure someone has tried this very thing before and found it to be a bad solution): http://en.wikipedia.org/wiki/Database_as_an_IPC The best solution is (like Ansgar hinted at) to use a real IPC system. Of course, I've done it myself (not on PostgreSQL though) when working at a large corporation where corporate politics prevented me from introducing any new interdependency between systems (like having two start talking with each other when they previously didn't), the only common ground for systems that needed to communicate was a database, and one of the systems was only able to run simple SQL statements and not stored procedures. First of all, thanks for your interest, but let me explain what I need to do.
We have a web application where customers want to monitor how it's performing -- not performing in terms of speed, but how many customers are currently browsing the application, how many have paid browsing sessions, how many payments have been made, ... More or less, it is to be a control panel. The difference is that the information displayed in the web browser must be real-time, that is, a query every 1-10 seconds. Though this has been suggested earlier: why not use pgmemcache and push each event as a new key? As memcached is FIFO by design, that is exactly what you ask for. Besides that, memcached is so fast that your OS is more busy handling all those TCP connections than running memcached. And in case you'd like to display statistical data rather than tailing events, let PG push that to memcached keys as well; see memcached as a materialized view in that case. As middleware I'd recommend lighttpd with mod_magnet. You should be able to deliver that admin page way more than 5000 times/sec with some outdated desktop hardware. If that's not enough, read up on things like http://blog.lighttpd.net/articles/2006/11/27/comet-meets-mod_mailbox -- Best regards, Hannes Dorbath
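The FIFO behaviour Arnau wants from the events table is essentially a bounded queue: once full, each new event evicts the oldest. A few lines make the semantics concrete (a plain in-process Python sketch standing in for the memcached/pgmemcache setup, since the exact client API depends on the deployment; all names are mine):

```python
from collections import deque


class EventStack:
    """Bounded FIFO of monitoring events: once full, pushing a new
    event silently drops the oldest one, like the proposed table."""

    def __init__(self, capacity: int):
        self._events = deque(maxlen=capacity)

    def push(self, event: str) -> None:
        self._events.append(event)  # deque(maxlen=...) evicts from the left

    def snapshot(self) -> list:
        """What the monitoring page would read every 1-10 seconds."""
        return list(self._events)


stack = EventStack(capacity=3)
for e in ["login", "payment", "logout", "signup"]:
    stack.push(e)
print(stack.snapshot())  # oldest event "login" has been evicted
```

The point of the sketch: the consumer only ever takes snapshots, the producer only ever pushes, and eviction is implicit in the data structure rather than a DELETE the database has to plan and vacuum.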
[PERFORM] Running PG on cluster files systems
GFS2, OCFS2, Lustre, CXFS, GPFS, Veritas and whatever else there is: does someone have experience with any of those? Is it bearable to run PG on them from a performance point of view? I guess not, but any positive reports? Thanks -- Regards, Hannes Dorbath
[PERFORM] Yet another question on LIMIT performance :/
Though I've read recent threads, I'm unsure if any match my case. We have 2 tables: revisions and revisions_active. revisions contains 117707 rows, revisions_active 17827 rows. DDL: http://hannes.imos.net/ddl.sql.txt Joining the 2 tables without an additional condition seems OK to me (given our outdated hardware): http://hannes.imos.net/query_1.sql.txt What worries me is the performance when limiting the recordset: http://hannes.imos.net/query_2.sql.txt Though it should only have to join a few rows, it seems to scan all rows. From experience I thought that adding an ORDER BY on the index columns should speed it up, but it has no effect: http://hannes.imos.net/query_3.sql.txt I'm on 8.1.5, statistics (ANALYZE) are up to date, the tables have each been CLUSTERed by PK, and the statistics target for the join columns has been set to 100 (without any effect). Thanks in advance! -- Regards, Hannes Dorbath
Re: [PERFORM] Yet another question on LIMIT performance :/
On 06.11.2006 15:13, Heikki Linnakangas wrote: Hannes Dorbath wrote: Though it should only have to join a few rows, it seems to scan all rows. What makes you think that's the case? Sorry, not all rows, but 80753. It's not clear to me why this number is so high with LIMIT 10. -- Regards, Hannes Dorbath
[PERFORM] Opteron vs. Xeon benchmark
A colleague pointed me to this site today: http://tweakers.net/reviews/642/13 I can't read the language, so I can't get a grip on what exactly the benchmark was about. Their diagrams show `Requests per second'. What should that mean? How many connections PG accepted per second? So they measured the OS's fork performance? Should that value be of any interest? Anyone with a heavy OLTP workload will use persistent connections or a connection pool in front. Do they mean TPS? That wouldn't make much sense in a CPU benchmark, as OLTP workload is typically limited by the disc subsystem. Can someone enlighten me as to what this site is about? -- Regards, Hannes Dorbath
[PERFORM] Arguments Pro/Contra Software Raid
Hi, I've just had some discussion with colleagues regarding the usage of hardware or software RAID 1/10 for our Linux-based database servers. I myself can't see much reason to spend $500 on high-end controller cards for a simple RAID 1. Any arguments pro or contra would be desirable. From my experience and what I've read here:

+ Hardware RAIDs might be a bit easier to manage, if you have never spent a few hours learning the software RAID tools.

+ There are situations in which software RAIDs are faster, as CPU power has advanced dramatically in recent years and even high-end controller cards cannot keep up with that.

+ Using SATA drives is always a bit of a risk, as some drives lie about whether they are caching or not.

+ Using hardware controllers, the array becomes locked to a particular vendor. You can't switch controller vendors, as the array meta information is stored in a proprietary format. In case the RAID is broken to a level the controller can't recover automatically, this might complicate manual recovery by specialists.

+ Even battery-backed controllers can't guarantee that data written to the drives is consistent after a power outage, nor that the drive does not corrupt something during the involuntary shutdown / power irregularities. (This is theoretical, as any server will be UPS-backed.)

-- Regards, Hannes Dorbath
Re: [PERFORM] Query Optimizer Failure / Possible Bug
Mhh. I have no clue about the internals of PostgreSQL and query planning, but to me as a user this should really be a thing the optimizer has to work out. On 03.04.2005 10:01, PFC wrote: Noticed this problem, too. You can always make the calculation you want done once inside a set-returning function so it'll behave like a table, but that's ugly. On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath [EMAIL PROTECTED] wrote: hm, a few days and not a single reply :| Any more information needed? Test data? Simplified test case? Anything? Thanks. Hannes Dorbath wrote: The query and the corresponding EXPLAIN is at http://hannes.imos.net/query.txt I'd like to use the column q.replaced_serials for multiple calculations in the SELECT clause, but every time it is referenced there in some way, the whole query in the FROM clause returning q is executed again. This doesn't make sense to me at all and eats performance. If this wasn't clear enough: for every `q.replaced_serials <insert_random_calculation> AS some_column' in the SELECT clause there is a new block of

---
-> Aggregate  (cost=884.23..884.23 rows=1 width=0)
   -> Nested Loop  (cost=0.00..884.23 rows=1 width=0)
      -> Index Scan using ix_rma_ticket_serials_replace on rma_ticket_serials rts  (cost=0.00..122.35 rows=190 width=4)
         Index Cond: (replace = false)
      -> Index Scan using pk_serials on serials s  (cost=0.00..3.51 rows=1 width=4)
         Index Cond: (s.serial_id = "outer".serial_id)
         Filter: ((article_no = $0) AND (delivery_id = $1))
---

in the EXPLAIN result. For those who wonder why I do this FROM (SELECT...): I was searching for a way to use the result of a subselect for multiple calculations in the SELECT clause and return the calculation results as individual columns. I tested a bit further and found out that PG behaves the same in case q is a view. This makes me wonder how efficiently the optimizer can work with views - or even worse - nested views. Tested and reproduced on PG 7.4.1 Linux and 8.0.0 Win32.
Thanks in advance, Hannes Dorbath
Re: [PERFORM] Query Optimizer Failure / Possible Bug
hm, a few days and not a single reply :| Any more information needed? Test data? Simplified test case? Anything? Thanks. Hannes Dorbath wrote: The query and the corresponding EXPLAIN is at http://hannes.imos.net/query.txt I'd like to use the column q.replaced_serials for multiple calculations in the SELECT clause, but every time it is referenced there in some way, the whole query in the FROM clause returning q is executed again. This doesn't make sense to me at all and eats performance. If this wasn't clear enough: for every `q.replaced_serials <insert_random_calculation> AS some_column' in the SELECT clause there is a new block of

---
-> Aggregate  (cost=884.23..884.23 rows=1 width=0)
   -> Nested Loop  (cost=0.00..884.23 rows=1 width=0)
      -> Index Scan using ix_rma_ticket_serials_replace on rma_ticket_serials rts  (cost=0.00..122.35 rows=190 width=4)
         Index Cond: (replace = false)
      -> Index Scan using pk_serials on serials s  (cost=0.00..3.51 rows=1 width=4)
         Index Cond: (s.serial_id = "outer".serial_id)
         Filter: ((article_no = $0) AND (delivery_id = $1))
---

in the EXPLAIN result. For those who wonder why I do this FROM (SELECT...): I was searching for a way to use the result of a subselect for multiple calculations in the SELECT clause and return the calculation results as individual columns. I tested a bit further and found out that PG behaves the same in case q is a view. This makes me wonder how efficiently the optimizer can work with views - or even worse - nested views. Tested and reproduced on PG 7.4.1 Linux and 8.0.0 Win32. Thanks in advance, Hannes Dorbath -- imos Gesellschaft fuer Internet-Marketing und Online-Services mbH Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net
[PERFORM] Query Optimizer Failure / Possible Bug
The query and the corresponding EXPLAIN is at http://hannes.imos.net/query.txt I'd like to use the column q.replaced_serials for multiple calculations in the SELECT clause, but every time it is referenced there in some way, the whole query in the FROM clause returning q is executed again. This doesn't make sense to me at all and eats performance. If this wasn't clear enough: for every `q.replaced_serials <insert_random_calculation> AS some_column' in the SELECT clause there is a new block of

---
-> Aggregate  (cost=884.23..884.23 rows=1 width=0)
   -> Nested Loop  (cost=0.00..884.23 rows=1 width=0)
      -> Index Scan using ix_rma_ticket_serials_replace on rma_ticket_serials rts  (cost=0.00..122.35 rows=190 width=4)
         Index Cond: (replace = false)
      -> Index Scan using pk_serials on serials s  (cost=0.00..3.51 rows=1 width=4)
         Index Cond: (s.serial_id = "outer".serial_id)
         Filter: ((article_no = $0) AND (delivery_id = $1))
---

in the EXPLAIN result. For those who wonder why I do this FROM (SELECT...): I was searching for a way to use the result of a subselect for multiple calculations in the SELECT clause and return the calculation results as individual columns. I tested a bit further and found out that PG behaves the same in case q is a view. This makes me wonder how efficiently the optimizer can work with views - or even worse - nested views. Tested and reproduced on PG 7.4.1 Linux and 8.0.0 Win32. Thanks in advance, Hannes Dorbath
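What the post expects from the planner is, in effect, common subexpression elimination: evaluate the expensive FROM-clause subquery once and reuse the value for every output column that references it. A toy sketch of the difference, as a Python analogy only (all names are mine, and the sleep merely stands in for the repeated aggregate plan node shown above):

```python
import time


def replaced_serials() -> int:
    """Stand-in for the expensive aggregate subquery returning q."""
    time.sleep(0.01)  # pretend this is the cost=884.23 aggregate plan node
    return 190


# What PG 7.4/8.0 effectively did: one evaluation per referencing column.
naive = {
    "col_a": replaced_serials() + 1,
    "col_b": replaced_serials() * 2,
    "col_c": replaced_serials() - 3,
}  # the subquery ran three times

# What the poster expects: materialize q once, derive all columns from it.
q = replaced_serials()  # the subquery runs once
smart = {"col_a": q + 1, "col_b": q * 2, "col_c": q - 3}

assert naive == smart  # identical result, a third of the work
```

The set-returning-function workaround PFC suggests achieves the second shape inside SQL: the function body runs once and its result behaves like a one-row table the outer SELECT can reference repeatedly.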