Re: [PERFORM] Best COPY Performance
On 23 Oct 2006, at 22:59, Jim C. Nasby wrote: http://stats.distributed.net used to use a perl script to do some transformations before loading data into the database. IIRC, when we switched to using C we saw a 100x improvement in speed, so I suspect that if you want performance perl isn't the way to go. I think you can compile perl into C, so maybe that would help some. http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=perl&lang2=gcc 100x doesn't seem totally impossible if that is even vaguely accurate and you happen to be using bits of Perl which are a lot slower than the C implementation would be... The slowest cases all appear to involve lots of function calls. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL
On 3 Oct 2006, at 16:04, Merlin Moncure wrote: On 10/3/06, Carlo Stonebanks [EMAIL PROTECTED] wrote: Some very helpful people had asked that I post the troublesome code that was generated by my import program. I installed a SQL log feature in my import program. I have posted samples of the SQL statements that cause the biggest delays. explain analyze is more helpful because it prints the times. You can always use the \timing flag in psql ;) l1_historical=# \timing Timing is on. l1_historical=# select 1; ?column? -- 1 (1 row) Time: 4.717 ms
Re: [PERFORM] scaling up postgres
On 12 Jun 2006, at 00:21, Joshua D. Drake wrote: Mario Splivalo wrote: On Sat, 2006-06-03 at 11:43 +0200, Steinar H. Gunderson wrote: On Sat, Jun 03, 2006 at 10:31:03AM +0100, [EMAIL PROTECTED] wrote: I do have 2 identical beasts (4G - biproc Xeon 3.2 - 2 Gig NIC) One beast will be apache, and the other will be postgres. I'm using httperf/autobench for measurements and the best result I can get is that my system can handle traffic of almost 1600 new con/sec. What version of PostgreSQL? (8.1 is better than 8.0 is much better than 7.4.) Have you remembered to turn HT off? Have you considered Opterons instead of Xeons? (The Xeons generally scale badly with PostgreSQL.) What kind of queries? Could you point me to some more detailed reading on why Xeons are a poorer choice than Opterons when used with PostgreSQL? It isn't just PostgreSQL. It is any database. Opterons can move memory a whole lot faster than Xeons. A whole lot faster indeed. http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_8796_8799,00.html http://www.theinquirer.net/?article=10797 Although apparently the dual core ones are a little better than the old ones http://www.anandtech.com/IT/showdoc.aspx?i=2644 (Just to provide some evidence ;)
Re: [PERFORM] 1 TB of memory
We got a quote for one of these (entirely for comedy value, of course) and it was in the region of £1,500,000, give or take a few thousand. On 16 Mar 2006, at 18:33, Jim Nasby wrote: PostgreSQL tuned to the max and still too slow? Database too big to fit into memory? Here's the solution! http://www.superssd.com/products/tera-ramsan/ Anyone purchasing one will be expected to post benchmarks! :) -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Re: [PERFORM] SAN/NAS options
I hope this isn't too far off topic for this list. Postgres is the main application that I'm looking to accommodate. Anything else I can do with whatever solution we find is just gravy... You've given me a lot to go on... Now I'm going to have to do some research as to real-world RAID controller performance. It's vexing (to say the least) that most vendors don't supply any raw throughput or TPS stats on this stuff... One word of advice: stay away from Dell kit. The PERC 4 controllers they use don't implement RAID 10 properly. It's a RAID 1 + JBOD array. It also has generally dismal IOPS performance. You might get away with running software RAID, either in conjunction with, or entirely avoiding, the card.
Re: [PERFORM] 15,000 tables
On 1 Dec 2005, at 16:03, Tom Lane wrote: Michael Riess [EMAIL PROTECTED] writes: (We NEED that many tables, please don't recommend to reduce them) No, you don't. Add an additional key column to fold together different tables of the same structure. This will be much more efficient than managing that key at the filesystem level, which is what you're effectively doing now. (If you really have 15000 distinct rowtypes, I'd like to know what your database design is...) Won't you end up with awful seek times if you just want data which had previously been stored in a single table? E.g. whilst before you wanted 1000 contiguous rows from the table, now you want 1000 rows with 1000 rows you don't care about in between each one you do want.
Re: [PERFORM] 15,000 tables
On 2 Dec 2005, at 14:16, Alex Stapleton wrote: On 1 Dec 2005, at 16:03, Tom Lane wrote: Michael Riess [EMAIL PROTECTED] writes: (We NEED that many tables, please don't recommend to reduce them) No, you don't. Add an additional key column to fold together different tables of the same structure. This will be much more efficient than managing that key at the filesystem level, which is what you're effectively doing now. (If you really have 15000 distinct rowtypes, I'd like to know what your database design is...) Won't you end up with awful seek times if you just want data which had previously been stored in a single table? E.g. whilst before you wanted 1000 contiguous rows from the table, now you want 1000 rows with 1000 rows you don't care about in between each one you do want. I must have had a total and utter failure of intellect for a moment there. Please ignore that :P
Re: [PERFORM] Hardware/OS recommendations for large databases (
On 16 Nov 2005, at 12:51, William Yu wrote: Alex Turner wrote: Not at random access in RAID 10 they aren't, and anyone with their head screwed on right is using RAID 10. The 9500S will still beat the Areca cards at RAID 10 database access patterns. The max 256MB onboard for 3ware cards is disappointing though. While good enough for 95% of cases, there's that 5% that could use a gig or two of onboard RAM for ultrafast updates. For example, I'm spec'ing out an upgrade to our current data processing server. Instead of the traditional 6xFast-Server-HDs, we're gonna go for broke and do 32xConsumer-HDs. This will give us mega I/O bandwidth but we're vulnerable to random access since consumer-grade HDs don't have the RPMs or the queueing-smarts. This means we're very dependent on the controller using onboard RAM to do I/O scheduling. 256MB divided over 4/6/8 drives -- OK. 256MB divided over 32 drives -- ugh, the HDs' buffers are bigger than the RAM allotted to them. At least this is how it seems it would work from thinking through all the factors. Unfortunately, I haven't found anybody else who has gone this route and reported their results, so I guess we're the guinea pig. You're going to have to factor the increased failure rate into your cost measurements, including any downtime or performance degradation whilst rebuilding parts of your RAID array. It depends on how long you're planning for this system to be operational as well, of course. Pick two: fast, cheap, reliable.
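The controller-RAM arithmetic above is easy to sanity-check. (The 8-16 MB consumer drive buffer size is my assumption for drives of that era, not a figure from the thread.)

```python
# Controller cache divided across the array, as discussed above.
controller_ram_mb = 256

for drives in (4, 6, 8, 32):
    per_drive_mb = controller_ram_mb / drives
    print(f"{drives:2d} drives -> {per_drive_mb:.0f} MB of controller RAM each")

# At 32 drives each one gets just 8 MB -- no bigger than the 8-16 MB
# onboard buffer of a typical consumer drive, which is the post's point.
```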
Re: [PERFORM] Monitoring Postgresql performance
On 28 Sep 2005, at 15:32, Arnau wrote: Hi all, I have been googling a bit searching for info about a way to monitor postgresql (CPU, memory, number of processes, ...) and I haven't found anything relevant. I'm using munin to monitor other parameters of my servers and I'd like to include postgresql or have a similar tool. Are any of you using anything like that? All kinds of hints are welcome :-) Cheers! Have you looked at SNMP? It's a bit complex, but there are lots of tools for monitoring system data / sending alerts based on SNMP already.
Re: [PERFORM] Avoid using swap in a cluster
On 2 Sep 2005, at 10:42, Richard Huxton wrote: Ricardo Humphreys wrote: Hi all. In a cluster, is there any way to use the main memory of the other nodes instead of the swap? If I have a query with many sub-queries and a lot of data, I can easily fill all the memory in a node. The point is: is there any way to continue using the main memory from other nodes in the same query instead of the swap? I don't know of any clustered version of PG that can spread queries over multiple machines. Can I ask what you are using? IIRC, Greenplum's DeepGreen MPP (Version 2) can do it. It does cost money, but it is a very nice product. -- Richard Huxton Archonet Ltd
Re: [PERFORM] Mount database on RAM disk?
On 8 Jul 2005, at 20:21, Merlin Moncure wrote: Stuart, I'm putting together a road map on how our systems can scale as our load increases. As part of this, I need to look into setting up some fast read only mirrors of our database. We should have more than enough RAM to fit everything into memory. I would like to find out if I could expect better performance by mounting the database from a RAM disk, or if I would be better off keeping that RAM free and increasing the effective_cache_size appropriately. If you're accessing a dedicated, read-only system with a database small enough to fit in RAM, it'll all be cached there anyway, at least on Linux and BSD. You won't be gaining anything by creating a ramdisk. Ditto Windows. Files cached in memory are slower than reading straight from memory, but not nearly enough to justify reserving memory for your own use. In other words, your OS has years and years of engineering behind deciding how best to dole memory out to caching and various processes. Why second-guess it? Because sometimes it gets it wrong. The most brutal method is occasionally the most desirable, even if it's not the right way to do it. Merlin
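A minimal postgresql.conf sketch of the "leave the RAM to the OS cache and tell the planner about it" approach argued for above. The sizes are hypothetical (not from the thread), and note that in PostgreSQL of this era both settings are counted in 8 kB pages, with no unit suffixes:

```ini
# Hypothetical values for a machine with ~4 GB of RAM: keep the data on
# disk, let the OS page cache hold it, and let the planner know how much
# cache to expect instead of carving out a ramdisk.
shared_buffers = 16384          # 16384 * 8 kB = 128 MB
effective_cache_size = 393216   # 393216 * 8 kB = 3 GB expected in OS cache
```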
Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?
On 20 Jun 2005, at 18:46, Josh Berkus wrote: Alex, Hi, I'm trying to optimise our autovacuum configuration so that it vacuums / analyzes some of our larger tables better. It has been set to the default settings for quite some time. We never delete anything (well, not often, and not much) from the tables, so I am not so worried about the VACUUM status, but I am wary of XID wraparound nuking us at some point if we don't sort vacuuming out, so we VACUUM at least once every year ;) I personally don't use autovacuum on very large databases. For DW, vacuuming is far better tied to ETL operations or a clock schedule of downtime. Downtime is something I'd rather avoid if possible. Do you think we will need to run VACUUM FULL occasionally? I'd rather not lock tables up unless I can't avoid it. We can probably squeeze an automated vacuum tied to our data inserters in every now and then though. XID wraparound may be further away than you think. Try checking pg_controldata, which will give you the current XID, and you can calculate how far you are from wraparound. I just tested a 200G data warehouse and figured out that we are 800 months away from wraparound, despite hourly ETL. Is this an 8.0 thing? I don't have pg_controldata from what I can see. That's nice to hear though. However, not running ANALYZE for such huge periods of time is probably impacting the statistics accuracy somewhat, and I have seen some unusually slow queries at times. Anyway, does anyone think we might benefit from a more aggressive autovacuum configuration? Hmmm, good point, you could use autovacuum for ANALYZE only. Just set the VACUUM settings preposterously high (like 10x) so it never runs. Then it'll run ANALYZE only. I generally use a threshold of 200 and a multiplier of 0.1 for analyze; that is, re-analyze after 200 + 10% of rows have changed. I will try those settings out; that sounds good to me.
-- Josh Berkus Aglio Database Solutions San Francisco
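Josh's "200 + 10%" analyze rule is just arithmetic; the exact pg_autovacuum parameter names vary by version, so this sketch only shows the threshold logic, with a hypothetical helper name:

```python
def analyze_due(rows_changed, table_rows, base=200, scale=0.1):
    """Re-ANALYZE once changes exceed base + scale * table size
    (the 200 + 10% rule suggested above)."""
    return rows_changed > base + scale * table_rows

# On a 500,000,000-row table the trigger point is 200 + 50M changed rows:
print(analyze_due(40_000_000, 500_000_000))   # still below the threshold
print(analyze_due(60_000_000, 500_000_000))   # past it; time to ANALYZE
```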
Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?
On 21 Jun 2005, at 18:13, Josh Berkus wrote: Alex, Downtime is something I'd rather avoid if possible. Do you think we will need to run VACUUM FULL occasionally? I'd rather not lock tables up unless I can't avoid it. We can probably squeeze an automated vacuum tied to our data inserters in every now and then though. As long as your updates/deletes are less than 10% of the table for all time, you should never have to vacuum, pending XID wraparound. Hmm, I guess as we have hundreds of millions of rows, and when we do delete things it's only a few thousand, and rarely, VACUUMing shouldn't need to happen too often. That's good. Thanks a lot for the advice. Is this an 8.0 thing? I don't have pg_controldata from what I can see. That's nice to hear though. 'Fraid so, yes. Bloody Debian stable. I might have to experiment with building from source or using alien on Debian to convert the RPMs. Fun. Oh well. --Josh Berkus Aglio Database Solutions San Francisco
[PERFORM] autovacuum suggestions for 500,000,000+ row tables?
Hi, I'm trying to optimise our autovacuum configuration so that it vacuums / analyzes some of our larger tables better. It has been set to the default settings for quite some time. We never delete anything (well, not often, and not much) from the tables, so I am not so worried about the VACUUM status, but I am wary of XID wraparound nuking us at some point if we don't sort vacuuming out, so we VACUUM at least once every year ;) However, not running ANALYZE for such huge periods of time is probably impacting the statistics accuracy somewhat, and I have seen some unusually slow queries at times. Anyway, does anyone think we might benefit from a more aggressive autovacuum configuration?
Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?
On 20 Jun 2005, at 15:59, Jacques Caron wrote: Hi, At 16:44 20/06/2005, Alex Stapleton wrote: We never delete anything (well, not often, and not much) from the tables, so I am not so worried about the VACUUM status. DELETEs are not the only reason you might need to VACUUM. UPDATEs are important as well, if not more so. Tables that are constantly updated (statistics, session data, queues...) really need to be VACUUMed a lot. We UPDATE it even less often. but I am wary of XID wraparound nuking us at some point if we don't sort vacuuming out, so we VACUUM at least once every year ;) That would give you a maximum average of 31 transactions/sec... Don't know if that's high or low for you. It's high as far as inserts go for us. It does them all at the end of each minute. However, not running ANALYZE for such huge periods of time is probably impacting the statistics accuracy somewhat, and I have seen some unusually slow queries at times. Anyway, does anyone think we might benefit from a more aggressive autovacuum configuration? ANALYZE is not a very expensive operation; however, VACUUM can definitely be a big strain and take a long time on big tables, depending on your setup. I've found that partitioning tables (at the application level) can be quite helpful if you manage to keep each partition to a reasonable size (under or close to available memory), especially if the partitioning scheme is somehow time-related. YMMV. Jacques. That's not currently an option as it would require a pretty large amount of work to implement. I think we will have to keep that in mind though.
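The transactions-per-second figure above falls out of quick arithmetic. The one-billion-transaction yearly budget below is my guess at the margin behind the ~31/sec number; the hard modulo-2^32 comparison horizon is roughly twice that:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000

# Hard limit: XIDs compare modulo 2**32, with 2**31 of headroom before
# old tuples wrap around and appear to be "in the future".
hard_rate = 2**31 / SECONDS_PER_YEAR
print(f"hard horizon: {hard_rate:.0f} tx/sec sustained for a year")

# The ~31 tx/sec quoted above matches a more conservative budget of
# roughly one billion transactions per year (an assumption on my part):
safe_rate = 1_000_000_000 / SECONDS_PER_YEAR
print(f"1e9 budget:   {safe_rate:.1f} tx/sec")
```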
[PERFORM] PostgreSQL using the wrong Index
We have two indexes, like so: l1_historical=# \d N_intra_time_idx Index N_intra_time_idx Column | Type +- time | timestamp without time zone btree l1_historical=# \d N_intra_pkey Index N_intra_pkey Column | Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this: select * from N_intra where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) But on things where the symbol does exist in the table, it's more or less fine, and nice and fast. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance?
Re: [PERFORM] PostgreSQL using the wrong Index
Oh, we are running 7.4.2, btw. And our random_page_cost = 1. On 13 Jun 2005, at 14:02, Alex Stapleton wrote: We have two indexes, like so: l1_historical=# \d N_intra_time_idx Index N_intra_time_idx Column | Type +- time | timestamp without time zone btree l1_historical=# \d N_intra_pkey Index N_intra_pkey Column | Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this: select * from N_intra where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) But on things where the symbol does exist in the table, it's more or less fine, and nice and fast. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance?
Re: [PERFORM] PostgreSQL using the wrong Index
On 13 Jun 2005, at 15:47, John A Meinel wrote: Alex Stapleton wrote: Oh, we are running 7.4.2 btw. And our random_page_cost = 1 Which is only correct if your entire db fits into memory. Also, try updating to a later 7.4 version if at all possible. I am aware of this; I didn't configure this machine though, unfortunately. On 13 Jun 2005, at 14:02, Alex Stapleton wrote: We have two indexes, like so: l1_historical=# \d N_intra_time_idx Index N_intra_time_idx Column | Type +- time | timestamp without time zone btree Just so you are aware, writing this as: We have an index on N_intra(time) and one on N_intra(symbol, time) is a lot more succinct. Sorry, I happened to have them there in my clipboard at the time, so I just blindly pasted them in. l1_historical=# \d N_intra_pkey Index N_intra_pkey Column | Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this: select * from N_intra where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) But on things where the symbol does exist in the table, it's more or less fine, and nice and fast. What happens if you do: SELECT * FROM N_intra WHERE symbol='doesnt exist' ORDER BY symbol, time DESC LIMIT 1; Hurrah! I should have thought of this, considering I've done it in the past :) Thanks a lot, that's great. Yes, symbol is constant, but it frequently helps the planner realize it can use an index scan if you include all terms in the index in the ORDER BY clause. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant.
Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance? Try the above first. You could also create a new index on symbol: CREATE INDEX N_intra_symbol_idx ON N_intra(symbol); Then the WHERE clause should use the symbol index, which means it can know quickly that an entry doesn't exist. I'm not sure how many entries you have per symbol, though, so this might cause problems in the ORDER BY time portion. I'm guessing what you really want is to just do the ORDER BY symbol, time. John =:-
Re: [PERFORM] Optimizing for writes. Data integrity not critical
Is using a ramdisk in situations like this entirely ill-advised, then? When data integrity isn't a huge issue and you really need good write performance, it seems like it wouldn't hurt too much. Unless I am missing something? On 20 May 2005, at 02:45, Christopher Kings-Lynne wrote: I'm doing the writes individually. Is there a better way? Combining them all into a transaction or something? Use COPY of course :) Or at worst bundle 1000 inserts at a time in a transaction... And if you seriously do not care about your data at all, set fsync = off in your postgresql.conf for a mega speedup. Chris
Re: [PERFORM] Optimizing for writes. Data integrity not critical
I am interested in optimising write performance as well; the machine I am testing on is maxing out around 450 UPDATEs a second, which is quite quick, I suppose. I haven't tried turning fsync off yet. The table has...a lot of indices as well. They are mostly pretty simple partial indexes though. I would usually just shove stuff into memcached, but I need to store and sort (in realtime) tens of thousands of rows. (I am experimenting with replacing some in-house toplist-generating stuff with a PG database.) The partial indexes are basically the only thing which makes the table usable, btw. The read performance is pretty damn good, but for some reason I chose to write the benchmark script in PHP, which can totally destroy the accuracy of your results if you decide to call pg_fetch_*(); even pg_affected_rows() can skew things significantly. So, any ideas how to improve the number of writes I can do a second? The existing system sorts everything by the desired column when a request is made, and the data it sorts is updated in realtime (whilst it isn't being sorted), and it can sustain the read/write load (to memory) just fine. If PG had in-memory (HEAP) tables this would probably not be a problem at all, but it doesn't, so it is. Running it in a ramdisk would be acceptable; it's just annoying to create the db every time the machine goes down. And having to run the entire PG instance off of the ramdisk isn't great either. On 19 May 2005, at 23:21, Steve Bergman wrote: Hi, I am using postgresql in a small (almost trivial) application in which I pull some data out of a Cobol C/ISAM file and write it into a pgsql table. My users can then use the data however they want by interfacing to the data from OpenOffice.org. The amount of data written is about 60MB and takes a few minutes on a 1200Mhz Athlon with a single 60MB IDE drive running Fedora Core 3 with pgsql 7.4.7. I'd like to speed up the DB writes a bit if possible.
Data integrity is not at all critical, as the database gets dropped, created, and populated immediately before each use. The filesystem is ext3, data=ordered, and I need to keep it that way as there is other data in the filesystem that I do care about. I have not done any tuning in the config file yet, and was wondering what things would likely speed up writes in this situation. I'm doing the writes individually. Is there a better way? Combining them all into a transaction or something? Thanks, Steve Bergman
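Chris's COPY suggestion usually means streaming tab-separated rows into COPY ... FROM STDIN through whatever driver you use (psycopg's copy_from, for instance). A driver-free sketch of preparing that input; the table and column names are made up for illustration:

```python
import io

# Build one tab-separated buffer for COPY instead of row-at-a-time INSERTs.
# (Hypothetical table "prices(id, symbol, value)"; real data would also
# need tabs, newlines, and backslashes in text fields escaped.)
rows = [(1, "alpha", 3.5), (2, "beta", 7.25)]

buf = io.StringIO()
for row in rows:
    buf.write("\t".join(str(col) for col in row) + "\n")

copy_input = buf.getvalue()
print(copy_input, end="")
# Suitable input for:  COPY prices (id, symbol, value) FROM STDIN
```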
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 23:35, PFC wrote: However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though. There's no reason it couldn't be done with PHP, to be fair, as long as you could ensure that the client was always routed back to the same machines. Which has its own set of issues entirely. I am not entirely sure that memcached actually does serialize data when it's committed into memcached either, although I could be wrong; I have not looked at the source. Certainly if you can ensure that a client always goes back to the same machine you can simplify the whole thing hugely. It's generally not that easy though; you need a proxy server of some description capable of understanding the HTTP traffic and maintaining a central session lookup table to redirect with. Which isn't really solving the problem so much as moving it somewhere else. Instead of needing huge memcached pools, you need hardcore load balancers. Load balancers tend to cost $ in comparison. Distributed sticky sessions are a rather nice idea; I would like to hear a way of implementing them cheaply (and on PHP) as well. I may have to give that some thought, in fact. Oh yeah, and load balancer software often sucks in annoying (if not always important) ways. On Google, their distributed system spans a huge number of PCs and it has redundancy, i.e.
individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The Google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage. Since when did massive data stores have nothing to do with DBs? Isn't Oracle Cluster entirely based on forming an enormous scalable disk array to store your DB on?
Re: [PERFORM] Partitioning / Clustering
On 12 May 2005, at 15:08, Alex Turner wrote: Having local sessions is unnecessary, and here is my logic: Generally most people have less than 100Mb of bandwidth to the internet. If you make the assertion that you are transferring equal or less session data between your session server (let's say an RDBMS) and the app server than you are between the app server and the client, an out-of-band 100Mb network for session information is plenty of bandwidth. This also represents OLTP-style traffic, which postgresql is pretty good at. You should easily be able to get over 100 TPS. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see. Why solve the complicated clustered sessions problem when you don't really need to? 100 hits a second = 8,640,000 hits a day. I work on a site which does 100 million dynamic pages a day. In comparison, Yahoo probably does 100,000,000,000 (100 billion) views a day, if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. Now, considering the site I work on is not even in the top 1000 on Alexa, there are a lot of sites out there which need to solve this problem, I would assume. There are also only so many hash table lookups a single machine can do, even if it's a quad-Opteron behemoth. Alex Turner netEconomist On 5/11/05, PFC [EMAIL PROTECTED] wrote: However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost.
But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though. On Google, their distributed system spans a huge number of PCs and it has redundancy, i.e. individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The Google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage.
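The traffic numbers being thrown around are worth checking; both the 8,640,000 hits/day figure and the ~1.15 million/sec Yahoo estimate fall straight out of seconds-per-day arithmetic:

```python
SECONDS_PER_DAY = 24 * 3600   # 86,400

# 100 hits/sec, as in the quoted claim:
hits_per_day = 100 * SECONDS_PER_DAY
print(f"{hits_per_day:,} hits/day")           # 8,640,000

# 100 billion views/day (the rough Yahoo reading of Alexa above),
# averaged back down to a per-second rate:
yahoo_per_sec = 100_000_000_000 // SECONDS_PER_DAY
print(f"{yahoo_per_sec:,} views/sec")         # ~1.15 million
```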
Re: [PERFORM] Partitioning / Clustering
On 12 May 2005, at 18:33, Josh Berkus wrote: People, In general I think your point is valid. Just remember that it probably also matters how you count page views. Because technically images are a separate page (and this thread did discuss serving up images). So if there are 20 graphics on a specific page, that is 20 server hits just for that one page. Also, there are bots and screen-scrapers and RSS, web e-mails, and web services and many other things which create hits but are not people. I'm currently working on clickstream for a site which is nowhere in the top 100, and is getting 3 million real hits a day ... and we know for a fact that at least 1/4 of that is bots. I doubt bots are generally Alexa-toolbar enabled. Regardless, the strategy you should be employing for a high traffic site is that if your users hit the database for anything other than direct interaction (like filling out a webform) then you're lost. Use memcached, squid, lighttpd caching, ASP.NET caching, pools, etc. Keep the load off the database except for the stuff that only the database can do. This is the approach I would take as well. There is no point storing stuff in a DB if you're only doing direct lookups on it and it isn't the sort of data whose integrity you care that much about. -- Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 08:57, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how livejournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf I have implemented similar systems in the past, it's a pretty good technique, unfortunately it's not very Plug-and-Play as you have to base most of your API on memcached (I imagine MySQL's NDB tables might work as well actually) for it to work well. They have 2.6 Million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use mysql on dell hardware! And found single master replication did not scale. There's a section on multimaster replication, not sure if they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews I don't think they are storing pre-rendered pages (or bits of) in memcached, but are principally storing the data for the pages in it. Gluing pages together is not a hugely intensive process usually :) The only problem with memcached is that the client's clustering/partitioning system will probably break if a node dies, and probably get confused if you add new nodes onto it as well. Easily extensible clustering (no complete redistribution of data required when you add/remove nodes) with the data distributed across nodes seems to be nothing but a pipe dream right now. It's interesting that the solution livejournal have arrived at is quite similar in ways to the way google is set up. Don't Google use indexing servers which keep track of where data is? 
So that you only need to update them when you add or move data, deletes don't even have to be propagated among indexes immediately really because you'll find out if data isn't there when you visit where it should be. Or am I talking crap? David ---(end of broadcast)--- TIP 8: explain analyze is your friend
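The node-death/node-addition problem described above is what consistent hashing was invented for: each node gets many points on a hash ring, a key maps to the nearest node point, and adding or removing a node only moves the keys that hashed near it, instead of re-homing almost everything as naive hash(key) % n does. A sketch, not any particular memcached client's implementation:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: each node owns many pseudo-random points on a
    ring; a key belongs to the first node point at or after its hash."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node):
        # only keys whose hashes land just before these new points move
        for i in range(self.replicas):
            self._ring.append((self._hash("%s:%d" % (node, i)), node))
        self._ring.sort()

    def remove_node(self, node):
        # keys owned by other nodes are completely unaffected
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key):
        h = self._hash(key)
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, h) % len(self._ring)
        return self._ring[i][1]
```

Removing a node never changes the assignment of keys that lived on the other nodes, which is precisely the "no complete redistribution of data" property the post calls a pipe dream.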
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 09:50, Alex Stapleton wrote: snip It's interesting that the solution livejournal have arrived at is quite similar in ways to the way google is set up. Don't Google use indexing servers which keep track of where data is? So that you only need to update them when you add or move data, deletes don't even have to be propagated among indexes immediately really because you'll find out if data isn't there when you visit where it should be. Or am I talking crap? That will teach me to RTFA first ;) Ok so LJ maintain an index of which cluster each user is on, kind of like google do :) David ---(end of broadcast)--- TIP 8: explain analyze is your friend
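The LJ scheme described here (a small global index mapping each user to the cluster holding their rows) is directory-based partitioning, the main alternative to hash-based schemes: moving one user between clusters means copying their rows and updating a single directory entry. A sketch with made-up names, assuming cluster handles are just database connections:

```python
class ClusterDirectory:
    """Directory-based partitioning: a global map of user -> cluster,
    consulted before every query so only one cluster's db is hit."""

    def __init__(self, clusters):
        self.clusters = clusters   # cluster_id -> connection/handle
        self.user_cluster = {}     # user_id -> cluster_id (the "index")
        self._next = 0

    def assign(self, user_id):
        """Place a new user on a cluster. Round-robin here; a real
        system would pick the least-loaded cluster."""
        ids = sorted(self.clusters)
        cluster_id = ids[self._next % len(ids)]
        self._next += 1
        self.user_cluster[user_id] = cluster_id
        return cluster_id

    def handle_for(self, user_id):
        """The only lookup the application does before querying."""
        return self.clusters[self.user_cluster[user_id]]

    def move_user(self, user_id, new_cluster_id):
        # after copying the user's rows, one directory update rebalances
        # them; no other user is affected
        self.user_cluster[user_id] = new_cluster_id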
Re: [PERFORM] Partitioning / Clustering
On 10 May 2005, at 15:41, John A Meinel wrote: Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. Better hardware = More Efficient != More Scalable But actually is it your SELECT load that is too high, or your INSERT load, or something in between. Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. snip So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi- machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix. 
There is also PGCluster http://pgfoundry.org/projects/pgcluster/ which is trying to be more of a Synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt, which has been overtaken by pgcluster. Just realize that clusters don't necessarily scale like you would want them to. Because at some point you have to insert into the same table, which means you need to hold a lock which prevents the other machine from doing anything. And with synchronous replication, you have to wait for all of the machines to get a copy of the data before you can say it has been committed, which does *not* scale well with the number of machines. This is why I mention partitioning. It solves this issue by storing different data sets on different machines under the same schema. These separate chunks of the table can then be replicated as well for data redundancy and so on. MySQL are working on these things, but PG just has a bunch of third party extensions, I wonder why these are not being integrated into the main trunk :/ Thanks for pointing me to PGCluster though. It looks like it should be better than Slony at least. ---(end of broadcast)--- TIP 8: explain analyze is your friend
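The Slony-style setup above needs application-level support, since writes must go to the master while reads can fan out across the slaves. A minimal routing sketch (not Slony itself, and the fresh-read flag is an assumption): because replication is asynchronous, a read that must see a just-committed write has to go to the master too.

```python
import random

class ReadWriteRouter:
    """Route writes to the master and plain reads to a random slave.
    Asynchronous replication means a read issued right after a write may
    not see it on a slave yet, hence the needs_fresh_read escape hatch."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def route(self, sql, needs_fresh_read=False):
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb != "SELECT" or needs_fresh_read or not self.slaves:
            return self.master        # INSERT/UPDATE/DELETE, or stale
                                      # reads are unacceptable
        return random.choice(self.slaves)
```

Note the caveat raised later in the thread: a SELECT that calls a function with side effects would be misrouted by this kind of verb sniffing, so such functions either stay off-limits or need an explicit hint.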
Re: [PERFORM] Partitioning / Clustering
On 10 May 2005, at 16:02, Adam Haberlach wrote: I think that perhaps he was trying to avoid having to buy Big Iron at all. You would be right. Although we are not against paying a bit more than $300 for a server ;) With all the Opteron v. Xeon around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5Ghz Pentium 4 w/ 256mb of RAM and a 40Gb hard drive and gigabit ethernet. The aggregate CPU and bandwidth is pretty stupendous, but not as easy to harness as a single machine. snip Yes, clustering solutions can distribute the data, and can even do it on a per-table basis in some cases. This still leaves it up to the application's logic to handle reunification of the data. If you're going to be programming that sort of logic into your API in the beginning, it's not too much more work to add basic replication, load balancing and partitioning into it either. But the DB should be able to do it for you, adding that stuff in later is often more difficult and less likely to get done. Ideas: 1. Create a table/storage type that consists of a select statement on another machine. While I don't think the current executor is capable of working on multiple nodes of an execution tree at the same time, it would be great if it could offload a select of tuples from a remote table to an entirely different server and merge the resulting data into the current execution. I believe MySQL has this, and Oracle may implement it in another way. MySQL sort of has this, it's not as good as Oracle's though. Apparently there is a much better version of it in 5.1 though, that should make it to stable sometime next year I imagine. 2. There is no #2 at this time, but I'm sure one can be hypothesized. I would have thought a particularly smart version of pg_pool could do it. It could partition data to different servers if it knew which columns to key by on each table. 
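The "smart pg_pool" idea above can be sketched in a few lines: given a map of which column each table is partitioned on, the pool hashes the key value from the query and sends the whole query to the one server holding that slice. Everything here (the table map, the server names) is hypothetical; this is not a real pg_pool feature.

```python
import hashlib

def _bucket(value, n):
    """Stable hash of a key value into one of n buckets (md5 rather
    than Python's hash(), so the mapping survives restarts)."""
    digest = hashlib.md5(str(value).encode()).hexdigest()
    return int(digest, 16) % n

def server_for(table, key_value, partition_map, servers):
    """Pick the server holding this key's slice of the table.
    partition_map records which column each table is keyed by."""
    if table not in partition_map:
        raise KeyError("no partitioning rule for table %r" % table)
    return servers[_bucket(key_value, len(servers))]
```

A query like SELECT * FROM users WHERE user_id = 42 would then be dispatched only to server_for("users", 42, ...), so each machine holds and scans a fraction of the data. Queries that don't filter on the partition column would still have to be fanned out to every server and merged, which is the reunification problem Adam mentions.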
...Google and other companies have definitely proved that one can harness huge clusters of cheap hardware. It can't be _that_ hard, can it. :) I shudder to think how much the Big Iron equivalent of a google data-center would cost. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John A Meinel Sent: Tuesday, May 10, 2005 7:41 AM To: Alex Stapleton Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning / Clustering snip This would require some application level support, since an INSERT goes to a different place than a SELECT. 
But there has been some discussion about pg_pool being able to spread the query load, and having it be aware of the difference between a SELECT and an INSERT and have it route the query to the correct host. The biggest problem being that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about. There are 2 possible solutions, a) don't do that when you are using this system, b) add some sort of comment hint so that pg_pool can understand that the select is actually an INSERT, and needs to be done on the master. So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi- machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add