Re: [PERFORM] Simple join in PostgreSQL takes too long
Vitaly Belman wrote:

> Hello pgsql-performance,
>
> I discussed the whole subject for some time in DevShed and didn't achieve
> much (as for results). I wonder if any of you guys can help out:
> http://forums.devshed.com/t136202/s.html
>
> So cutting and pasting:
>
> - SCHEMA -
>
> CREATE TABLE bv_bookgenres (
>     book_id  INT NOT NULL,
>     genre_id INT NOT NULL
> );
>
> CREATE TABLE bv_genre_children (
>     genre_id       INT,
>     genre_child_id INT
> );
>
> - QUERY -
>
> SELECT DISTINCT book_id
> FROM bv_bookgenres, bv_genre_children
> WHERE bv_bookgenres.genre_id = bv_genre_children.genre_child_id
>   AND bv_genre_children.genre_id = 1
> LIMIT 10;
>
> - EXPLAIN ANALYZE -
>
> Limit (cost=6503.51..6503.70 rows=10 width=4) (actual time=703.000..703.000 rows=10 loops=1)
>   -> Unique (cost=6503.51..6738.20 rows=12210 width=4) (actual time=703.000..703.000 rows=10 loops=1)
>     -> Sort (cost=6503.51..6620.85 rows=46937 width=4) (actual time=703.000..703.000 rows=24 loops=1)
>          Sort Key: bv_bookgenres.book_id
>          -> Merge Join (cost=582.45..2861.57 rows=46937 width=4) (actual time=46.000..501.000 rows=45082 loops=1)
>               Merge Cond: ("outer".genre_id = "inner".genre_child_id)
>               -> Index Scan using genre_id on bv_bookgenres (cost=0.00..1462.84 rows=45082 width=8) (actual time=0.000..158.000 rows=45082 loops=1)
>               -> Sort (cost=582.45..598.09 rows=6256 width=2) (actual time=46.000..77.000 rows=49815 loops=1)
>                    Sort Key: bv_genre_children.genre_child_id
>                    -> Index Scan using genre_id2 on bv_genre_children (cost=0.00..187.98 rows=6256 width=2) (actual time=0.000..31.000 rows=6379 loops=1)
>                         Index Cond: (genre_id = 1)
> Total runtime: 703.000 ms
>
> - CONF SETTINGS -
>
> shared_buffers = 1000          # min 16, at least max_connections*2, 8KB each
> sort_mem = 1
> #work_mem = 1024               # min 64, size in KB
> #maintenance_work_mem = 16384  # min 1024, size in KB
> #max_stack_depth = 2048        # min 100, size in KB

Have you VACUUM ANALYZEd recently? If not, do that, then rerun the EXPLAIN
ANALYZE.

You might want to bump shared_buffers. You have 512MB RAM, right? Bump
shared_buffers up, restart PG, then run a VACUUM ANALYZE and rerun the
EXPLAIN ANALYZE.

If that doesn't help, try:

    ALTER TABLE bv_genre_children ALTER COLUMN genre_child_id SET STATISTICS 100;

followed by:

    VACUUM ANALYZE bv_genre_children;

You might also want to tweak the effective_cache_size parameter in
postgresql.conf, but I am unsure how this works on Windows. Does Windows
have a kernel disk cache, anyone?

HTH

Nick
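P.S. One other thing sometimes worth trying with a DISTINCT-over-a-join
query like this is to push the child-genre lookup into a subselect, so the
planner doesn't have to produce and sort all 45,000 join rows before
deduplicating. A minimal sketch, untested against this schema and not
guaranteed to be faster, so compare both forms with EXPLAIN ANALYZE:

    SELECT DISTINCT book_id
    FROM bv_bookgenres
    WHERE genre_id IN (SELECT genre_child_id
                       FROM bv_genre_children
                       WHERE genre_id = 1)
    LIMIT 10;

Since 7.4, IN (...) subqueries can be executed as hashed subplans, so this
form is no longer the performance trap it was in older releases; on 7.3 and
earlier an EXISTS formulation is usually the safer bet.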
Re: [PERFORM] slow seqscan
Edoardo Ceccarelli wrote:

> My first post to this list :)
>
> Scenario: I have a database used only for search queries, with a single
> table that holds about 450,000-500,000 records. The table is well indexed,
> so most queries are executed with an index scan, but since there is a big
> text field in the table (360 chars), some search operations (with certain
> filters) end up with seq scans.
>
> This table is not written to during normal operation: twice per week a
> batch program inserts about 35,000 records and updates another 40,000.
>
> Last Friday morning, after that batch had been executed, the database
> started responding really slowly to queries (especially seq scans); after
> a VACUUM FULL ANALYZE things got somewhat better. Yesterday the same:
> before the batch everything was perfect, afterwards every query was really
> slow. I vacuumed again and now it is ok.
>
> Until now the db had been working fine. It's 4 months old, with two
> updates per week, and I vacuum about once per month.
>
> I am using version 7.3. Do I need to upgrade to 7.4? Also, I was thinking
> about setting this table to a kind of read-only mode to improve
> performance. Is this possible?
>
> Thank you for your help
> Edoardo Ceccarelli

In general we are going to need more information, like what kind of search
filters you are using on the text field, and an EXPLAIN ANALYZE. But can
you try running the following, bearing in mind it will take a while to
complete:

    REINDEX TABLE table_name

From what I remember there were issues with index space not being reclaimed
by a vacuum. I believe this was fixed in 7.4. By not reclaiming the space,
the indexes grow larger and larger over time, causing PG to prefer a
sequential scan over an index scan (I think).

Hope that helps

Nick
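P.S. A quick way to see whether index bloat really is the problem is to
compare sizes in pg_class before and after the REINDEX. A minimal sketch
with hypothetical table and index names (run ANALYZE first so the relpages
figures are current):

    -- Size in 8K pages of the table and its indexes; 'mytable' and the
    -- index names are placeholders for your own.
    SELECT relname, relpages
    FROM pg_class
    WHERE relname IN ('mytable', 'mytable_pkey', 'mytable_text_idx');

    REINDEX TABLE mytable;

    -- Rerun the SELECT above: a large drop in relpages for the indexes
    -- means they had bloated and plain VACUUM was not reclaiming them.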
[PERFORM] MySQL vs PG TPC-H benchmarks
Hi,

Has anyone had a look at:

http://people.ac.upc.es/zgomez/

I realize that MySQL and PG cannot really be compared (especially when you
consider the issues that MySQL has with things like data integrity), but
still, surely PG would perform better than those stats show (e.g. query #7:
31.28 seconds versus 42 minutes!).

On a side note, it certainly looks like Linux kernel 2.6 is quite a bit
faster in comparison to 2.4.

Nick
Re: [PERFORM] Slow join using network address function
Tom Lane wrote:

> Eric Jain <[EMAIL PROTECTED]> writes:
>>> http://word-to-the-wise.com/ipr.tgz is a datatype that contains a range
>>> of IPv4 addresses, and which has the various operators to make it GIST
>>> indexable.
>> Great, this looks very promising. No cast operators between ipr and inet
>> types. Any way to work around this, short of dumping and reloading
>> tables?
>
> Wouldn't it be better to implement the GIST indexing operators of that
> package on the standard datatypes? It wasn't apparent to me what a range
> of IP addresses does for you that isn't covered by a CIDR subnet in
> real-world cases.
>
> regards, tom lane

We currently only allow access to one of our apps based on IP address.
These IPs are stored one per row in a single table, and often represent a
contiguous piece of IP space that does not correspond to a full subnet. The
standard CIDR type has the limitation that it only allows full subnets,
i.e. every IP address in 192.168.1.0/24.

For example:

    192.168.1.15 - 192.168.1.31

This range cannot be represented by a single CIDR subnet (or if it can, I
really don't want to have to figure it out each time). The new type,
however, allows us to store this range as one row: an arbitrary range of IP
addresses, not just those in a specific subnet.

I would see this as a useful inclusion, whether in the main src tree or in
contrib, and we will probably be using it when we get to mess with the
database schema for this app in the next few months. In fact I have already
inserted it into our PG source tree ;-).

Nick

P.S. We are not responsible for the IP address ranges, we just get told
what they are.
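P.P.S. For anyone stuck on the standard types in the meantime, the same
check can be expressed with a pair of plain inet columns, since inet values
have a well-defined sort order. This is only a workaround sketch with
hypothetical table and column names, and unlike ipr it won't give you
GiST-indexed range lookups:

    -- Store each permitted range as two standard inet values.
    CREATE TABLE allowed_ranges (
        range_start inet NOT NULL,
        range_end   inet NOT NULL
    );

    INSERT INTO allowed_ranges VALUES ('192.168.1.15', '192.168.1.31');

    -- Does a client address fall inside any permitted range?
    SELECT count(*) > 0 AS allowed
    FROM allowed_ranges
    WHERE '192.168.1.20'::inet BETWEEN range_start AND range_end;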
Re: [PERFORM] Postgresql on SAN
Josh Berkus wrote:

> Anjan,
>
>> Has anyone designed/implemented postgresql server on storage networks?
>
> Yes, Zapatec.com runs their stuff this way. Probably others as well.
>
>> Are there any design considerations?
>
> I don't know. Probably.
>
>> Are there any benchmarks for storage products (HBAs, Switches, Storage
>> Arrays)?
>
> Not specific to PostgreSQL. I'm sure there are generic benchmarks. Keep in
> mind that PostgreSQL needs lots of 2-way I/O, batch writes, and random
> reads.
>
>> Any recommendation on the design, resources, references, keeping PG in
>> mind?
>
> See above. Also keep in mind that PostgreSQL's use of I/O should improve
> 100% in version 7.5.

We run PG on a SAN array. We currently have it set up so that a single PG
instance runs off a single LUN, and this includes the WAL logs. Apart from
that we have made no other special considerations; we just treat it as a
fast RAID array.

We haven't got to the stage where the speed of the SAN is a problem, as
load hasn't increased as expected. This will change, and when it does I am
sure the performance list will be hearing from us ;-). Our current
limitations, as I see it, are the amount of memory and then processing
power.

The only problem we have had was a dodgy set of kernel modules (drivers)
for the fibre cards; this was because they were beta drivers and obviously
still had a few bugs. It was solved by reverting to an older version.
Everything has run smoothly since then (uptime is 153 days :-)).

Nick
Re: [PERFORM] Linux / Clariion
Cott Lang wrote:

> Anybody used Linux with EMC Clariions for PG databases? Any good war
> stories, pros, cons, performance results? I'm wearing thin on my 6 disk
> 0+1 configuration and looking for something beefy, possibly for
> clustering, and I'm wondering what the net wisdom is. :)
>
> thanks!

Heya,

We are currently using a Dell-badged Clariion FC4500 running 1Gb fibre
channel. It has 20 HDDs in there at the moment, and another 5 disks are
coming in the next couple of weeks. These disks are split into several
RAID 5 arrays, each about 200Gb. We are using QLogic HBAs under RH Linux.

We have had a couple of problems, one driver-based, which was resolved by a
kernel upgrade. They also don't seem to like changing IP addresses of the
servers: Navisphere wanted the servers to be re-registered before it
started working properly.

In terms of performance, it's 1Gb fibre to the disk, U320 SCSI hot-swap,
10k disks. They run very fast and, apart from the configuration issues
above, have never given us any grief. The LUNs have been running flawlessly
for over a year (touch wood). We just need some beefier boxes to take
advantage of their speed. I am thinking of proposing one or more quad
Opterons with 32Gb RAM ;-). That should do the trick, I reckon.

We can try and run some benchmarks on one of the spare machines if you
want; if so, send through some examples of pgbench parameters that are
appropriate. Anyone got any useful sets? We are currently running PG 7.3
(7.3.1 or 7.3.2, I think) on that box.

Trouble is, with this model of Clariion, EMC is apparently setting EOL for
sometime in 2005, which we aren't too pleased about. Never mind, hopefully
we will have some more money by then.

Any other info, just say and I will see what I can dig up.

Nick Barr
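P.S. For anyone wanting to suggest parameter sets, a typical
contrib/pgbench session looks something like the sketch below. The database
name, scale factor and client/transaction counts are purely illustrative,
not a recommendation:

    # Initialise a test database at scale factor 10
    # (each scale unit is 100,000 rows in the accounts table)
    pgbench -i -s 10 benchdb

    # Run 10 concurrent clients, 1000 transactions each
    pgbench -c 10 -t 1000 benchdb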
Re: [PERFORM] Persistent Connections
Hi,

[EMAIL PROTECTED] wrote:

> Hi
>
> I have a php script and I make a pg_pconnect. If I want to make 4-10
> pg_query calls in that script, do I have to close the connection at the
> end of the script? (I would say yes, is that right?)

If you want to make multiple pg_query calls in a page you can, and you can
use the same connection; you don't have to use persistent connections for
this. Just open the connection and fire off the different queries. The
persistent connection remains open between different page loads, which is
supposedly faster because you don't have the overhead of opening the
connection each time.

If you want to use a persistent connection then definitely don't close it
at the bottom of the page. If you want to use the other kind of connection
(pg_connect, non-persistent) then you don't have to close it at the bottom
of the page, because PHP does it for you, although you can if you are
feeling nice ;-).

> Sorry, I'm a little bit confused about the persistent thing!! Is it smart
> to use persistent connections at all if I expect 100K users to hit the
> script in an hour, and the script calls up to 10-15 pg functions? I have
> at the moment one function, but the server needs 500 ms, which is a little
> bit too much I think, and it crashed when I had 20K users.
>
> Thanks
> Bye

Use the persistent connection, but make sure the parameters in
postgresql.conf match up with the Apache config. The specific settings are
MaxClients in httpd.conf and max_connections in postgresql.conf. Make sure
that max_connections is at least as big as MaxClients for every database
that your PHP scripts connect to; see the sketch below.
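A minimal sketch of the two settings that need to agree (the numbers are
illustrative only, not a recommendation):

    # httpd.conf
    MaxClients 150

    # postgresql.conf -- at least as large as MaxClients, since each
    # Apache process can hold one persistent connection open per
    # distinct database it connects to
    max_connections = 200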
Re: [PERFORM] 100 simultaneous connections, critical limit?
Jón Ragnarsson wrote (14 January 2004):

> Ok, connection pooling was the thing that I thought of first, but I
> haven't found any docs regarding pooling with PHP+Postgres. OTOH, I
> designed the application to be as independent of the DB as possible (no
> stored procedures or other Postgres-specific stuff).
>
> Thanks, J.

As far as I know PHP supports persistent connections to a PG database; see
pg_pconnect instead of pg_connect. Each of the db connections is tied to a
particular Apache process and will stay open for the life of that process.
So basically make sure your Apache config file (httpd.conf) and PG config
file (postgresql.conf) agree on the maximum number of connections,
otherwise some pages will not be able to connect to your database.

This may not be a problem for small sites, but on large sites it is, with
heavy loads and large numbers of concurrent users. For example, consider a
site that must support 500 concurrent connections. If persistent
connections are used, at least 500 concurrent connections to PG would be
required, which I guess is probably not recommended.

The way I would like Apache/PHP to work is to have a global pool of
connections to a postgres server which can be shared around all Apache
processes. This pool could be limited to, say, 50 or 100 connections.
Problems occur under peak load where all 500 concurrent connections are in
use, but all that should happen is there is a bit of a delay.

Hope that (almost) makes sense,

Kind Regards,

Nick Barr
WebBased Ltd.

Christopher Browne wrote:

> Clinging to sanity, [EMAIL PROTECTED] (Jón Ragnarsson) mumbled into her
> beard:
>
>> I am writing a website that will probably have some traffic. Right now I
>> wrap every .php page in pg_connect() and pg_close(). Then I read
>> somewhere that Postgres only supports 100 simultaneous connections
>> (default). Is that a limitation? Should I use some other method when
>> writing code for a high-traffic website?
>
> I thought the out-of-the-box default was 32.
>
> If you honestly need a LOT of connections, you can configure the database
> to support more. I upped the limit on one system to 512 the other week;
> certainly supportable, if you have the RAM for it.
>
> It is, however, quite likely that the connect()/close() cuts down on the
> efficiency of your application. If PHP supports some form of connection
> pooling, you should consider using that, as it will cut down
> _dramatically_ on the amount of work done establishing/closing
> connections, and should let your apps use somewhat fewer connections more
> effectively.
[PERFORM] IDE Hardware RAID Controller
Heya,

FYI, just spotted this and thought I would pass it on, for all those who
are looking at new boxes:

http://www.theinquirer.net/?article=12665
http://www.promise.com/product/product_detail_eng.asp?productId=112&familyId=2

Looks like a four-channel hot-swap IDE (SATA) hardware RAID controller with
up to 256Mb of onboard RAM.

Nick
[PERFORM] go for a script! / ex: PostgreSQL vs. MySQL
Heya Guys n Gals,

I have been following the thread on "go for a script! / ex: PostgreSQL vs.
MySQL" and thought I would throw something together in Perl. My current
issue is that I only have access to an RH Linux box, so I cannot make it
cross-platform on my own :-(.

Anyhow, please find it attached. It runs fine on my box. It doesn't
actually write to postgresql.conf, because I didn't want to mess that up;
for the moment it writes to postgresql.conf.new instead. The diffs seem to
be written correctly. There is a set of parameters at the top which may
need to be tweaked for your platform.

I can carry on posting new versions to this list if people want. Clearly
this lot is open source, so please feel free to play with it and post
patches/new features back, either to the list or to my email directly. In
case you can't see my email address, it is nicky at the domain below.

I will also post it on my website, and as I develop it further, new
versions will appear there:

http://www.chuckie.co.uk/postgresql/pg_autoconfig.pl

Is this a useful start?

Nick

[Attachment: pg_autoconfig.pl]
Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL
----- Original Message -----
From: Nick Barr <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, October 10, 2003 1:35 PM
Subject: go for a script! / ex: PostgreSQL vs. MySQL

> I will also post it on my website, and as I develop it further, new
> versions will appear there:
>
> http://www.chuckie.co.uk/postgresql/pg_autoconfig.pl

Make that:

http://www.chuckie.co.uk/postgresql/pg_autoconfig.txt

Nick
[PERFORM] Effective Cache Size
Hi,

I have been following a thread on this list ("Inconsistent performance")
and had a few questions, especially about the bits concerning
effective_cache_size. I have read some of the docs, and some other threads
on this setting, and it seems to be used by the planner to choose between a
sequential and an index scan. So it will not necessarily increase
performance as such, but will instead let the planner choose the more
optimal plan. Is this correct?

Not that we are suffering massive performance issues at the moment, but it
is expected that our database will grow considerably in the next couple of
years, both in terms of load and size.

Also, what would an appropriate setting be? From what I read of Scott
Marlowe's email, and from the information below, I reckon it should be
somewhere in the region of 240,000.

Danger, maths ahead:

    141816K buff + 1781764K cached = 1923580K total
    effective_cache_size = 1923580 / 8 = 240447.5

Here is some information on the server in question. If any more information
is required then please say. It is a dedicated PG machine with no other
services being hosted off it. As you can see from the uptime, its load
average is 0.00, and it is currently so chilled it's almost frozen! That
will change though :-(

Hardware
========
Dual PIII 1.4GHz
2Gb RAM
1Tb SAN with hardware RAID 5, using 1Gbps fibre channel

OS
==
Linux webbasedth5 2.4.18-18.7.xsmp #1 SMP Wed Nov 13 19:01:42 EST 2002 i686
Red Hat Linux release 7.3 (Valhalla)

PG
==
PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96

Database
========
This includes all indexes and tables. I can provide more information on how
this is chopped up if needed.
Size   : 1,141.305 Mb
Tuples : 13,416,397

Uptime
======
11:15am up 197 days, 16:50, 1 user, load average: 0.00, 0.00, 0.00

Top
===
Mem:  2064836K av, 2018648K used,   46188K free, 0K shrd, 141816K buff
Swap: 2096472K av,    4656K used, 2091816K free,          1781764K cached

postgresql.conf (all defaults except)
=====================================
max_connections = 1000
shared_buffers = 16000        # (128 Mb)
max_fsm_relations = 5000
max_fsm_pages = 50
vacuum_mem = 65535

Kind Regards,

Nick Barr
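For concreteness, assuming the reasoning above holds, the change I have in
mind is just the one-liner below (the value is the figure worked out above,
rounded down; it is specific to this box, not a general recommendation):

    # postgresql.conf
    # (141816K buff + 1781764K cached) / 8K page size ~= 240447
    effective_cache_size = 240000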