Re: [PERFORM] Request for feedback on hardware for a new database server
On 2011-03-18 01:51, Oliver Charles wrote:
> Hello, At MusicBrainz we're looking to get a new database server, and are hoping to buy this in the next couple of days. I'm mostly a software guy, but I'm posting this on behalf of Rob, who's actually going to be buying the hardware. Here's a quote of what we're looking to get:

I think most of it has been said already:

* Battery-backed write cache.
* See if you can get enough memory to make all of your active dataset fit in memory (typically not that hard in 2011).
* Depending on your workload, of course: you're typically not bottlenecked by the number of CPU cores, so strive for fewer, faster cores.
* As few sockets as you can squeeze your memory and CPU requirements onto.
* If you can live with (or design around) the tradeoffs of SSDs, they will buy you far more performance than any significant number of rotating drives (with a good backup plan, e.g. full WAL shipping to a second system).

-- Jesper

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Disabling nested loops - worst case performance
Hello list,

I am working on an Entity-Attribute-Value (EAV) database using PostgreSQL 8.4.7. The basic problem is that when joining different entities multiple times, the planner thinks there are vastly fewer rows to join than there are in reality, and decides to use multiple nested loops for the join chain. This results in queries where, with nested loops enabled, query time is somewhere around 35 seconds, but with nested loops disabled, it is somewhere around 100 ms.

I don't think there is much hope of getting better statistics, as EAV is just not statistics-friendly. The values of an attribute depend on the type of the attribute, and different entities have different attributes defined. The planner has no idea of these correlations.

Now, my question is: if I disable nested loops completely for the users of the EAV database, what kind of worst-case performance loss can I expect? I don't mind if a query that normally runs in 100 ms now takes 200 ms, but I am worried about cases where a query would take much more time to complete than with nested loops enabled. As far as I understand, these cases should be pretty rare, if not non-existent?

- Anssi
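For reference, the knob under discussion is an ordinary planner GUC that can be toggled per session (a sketch; the query placeholder is not from the thread):

```sql
-- Disable nested-loop joins for the current session only; the planner
-- will fall back to hash or merge joins where possible.
SET enable_nestloop = off;

-- ... run the problematic EAV join here ...

-- Restore the server default afterwards.
RESET enable_nestloop;
```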
Re: [PERFORM] Disabling nested loops - worst case performance
Hello,

for example, queries with a LIMIT clause can be significantly faster with a nested loop. But you don't need to disable nested loops globally: you can wrap your queries in SQL functions and disable nested loops just for those functions.

Regards
Pavel Stehule

2011/3/18 Anssi Kääriäinen anssi.kaariai...@thl.fi:
> Now, my question is: if I disable nested loops completely for the users of the EAV database what kind of worst case performance loss can I expect? [...] As far as I understand these cases should be pretty rare if non-existent?
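Pavel's per-function approach can use the function-level SET clause (available since PostgreSQL 8.3); the function name and query body below are illustrative, only the attr_value table is from the thread:

```sql
-- The setting applies only while the function executes and is
-- restored automatically when it returns.
CREATE OR REPLACE FUNCTION eav_entity_attrs(entity integer)
RETURNS SETOF attr_value AS $$
    SELECT * FROM attr_value WHERE olio_id = $1;
$$ LANGUAGE sql STABLE
SET enable_nestloop = off;
```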
Re: [PERFORM] Request for feedback on hardware for a new database server
On 18-3-2011 4:02, Scott Marlowe wrote:
> On Thu, Mar 17, 2011 at 6:51 PM, Oliver Charles postgresql-p...@ocharles.org.uk wrote:
> Another point. My experience with 1U chassis and cooling is that they don't move enough air across their cards to make sure they stay cool. You'd be better off ordering a 2U chassis with 8 3.5" drive bays so you can add drives later if you need to, and it'll provide more cooling air across the card. Our current big 48 core servers are running plain LSI SAS adapters without HW RAID because the LSIs we were using overheated and cooked themselves to death after about 3 months. Those are 1U chassis machines, and our newer machines are all 2U boxes now.

We have several 1U boxes (mostly Dell and Sun) running, and have had several in the past, and we've never had any heating problems with them. That includes machines with more power-hungry processors than are currently available, all power-slurping FB-DIMM slots occupied, and two RAID cards installed. But then again, a 2U box will likely have more cooling capacity, no matter how you look at it.

Another tip that may be useful: look at 2.5" drives. AFAIK there is no really good reason to use 3.5" drives for new servers. The 2.5" drives save power and room - and thus may allow more air to flow through the enclosure - and offer the same performance and reliability (the first I know for sure; the second I'm pretty sure of, but haven't seen much proof of lately). You could even have an 8- or 10-disk 1U enclosure that way, or up to 24 disks in 2U. But those configurations will require some attention to cooling again.

Best regards,
Arjen
Re: [PERFORM] Request for feedback on hardware for a new database server
On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden acmmail...@tweakers.net wrote:
> We have several 1U boxes (mostly Dell and Sun) running and had several in the past. And we've never had any heating problems with them. That includes machines with more power hungry processors than are currently available, all power slurping FB-dimm slots occupied and two raid cards installed.

Note I am talking specifically about the ability to cool the RAID card, not the CPUs etc. Many 1U boxes have poor air flow across the expansion slots for PCI etc. cards, while doing a great job cooling the CPUs and memory. If you don't use high-performance RAID cards (LSI 9xxx, Areca 16xx/18xx) then it's not an issue. Open up your 1U and look at the air flow for the expansion slots; it's often just not very much.
Re: [PERFORM] Disabling nested loops - worst case performance
On 03/18/2011 09:02 AM, Pavel Stehule wrote:
> for example queries with LIMIT clause can be significantly faster with nested loop. But you don't need to disable nested loop globally. You can wrap your query to sql functions and disable nested loop just for these functions.

Thank you for your help; the LIMIT example was something I was not aware of. The problem is that we are replacing an old database, and we need to replicate certain views for external users. Minimal impact for these users is required. Maybe it would be best to create special user accounts for these external users and disable nested loops only for those accounts. Otherwise we will disable nested loops only when absolutely necessary.

- Anssi
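The per-account idea maps directly to a role-level setting (the role name here is invented for illustration):

```sql
-- Applies to every new session the role opens; all other users
-- keep the server default.
ALTER ROLE external_reader SET enable_nestloop = off;
```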
Re: [PERFORM] Disabling nested loops - worst case performance
On 18.03.2011 09:15, Anssi Kääriäinen wrote:
> I don't think there is much hope for getting better statistics, as EAV is just not statistics friendly. The values of an attribute depend on the type of the attribute, and different entities have different attributes defined. The planner has no idea of these correlations.

Hello.

If your queries work on a single attribute, you can try adding partial indexes for the different attributes. Note that in this case parameterized statements may prevent index usage, so also check with the attribute id inlined.

Best regards,
Vitalii Tymchyshyn
Re: [PERFORM] Disabling nested loops - worst case performance
Anssi Kääriäinen, 18.03.2011 08:15:
> Hello list, I am working on a Entity-Attribute-Value (EAV) database using PostgreSQL 8.4.7. The basic problem is that when joining multiple times different entities the planner thinks that there is vastly less rows to join than there is in reality and decides to use multiple nested loops for the join chain.

Did you consider using hstore instead? I think in the PostgreSQL world, this is a better alternative than EAV and most probably faster as well.

Regards
Thomas
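A minimal sketch of the hstore alternative (hstore is a contrib module on 8.4; the table, column, and key names here are invented, with the attribute value borrowed from later in the thread):

```sql
-- One row per entity; attributes live in a single indexed hstore column.
CREATE TABLE entity (
    id    serial PRIMARY KEY,
    attrs hstore
);

-- GiST index makes containment (@>) queries cheap.
CREATE INDEX entity_attrs_idx ON entity USING gist (attrs);

-- All entities whose 'suhde_hyvaksytty' attribute is 't':
SELECT id FROM entity WHERE attrs @> 'suhde_hyvaksytty=>t'::hstore;
```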
Re: [PERFORM] Disabling nested loops - worst case performance
On 03/18/2011 12:52 PM, Vitalii Tymchyshyn wrote:
> If your queries work on single attribute, you can try adding partial indexes for different attributes. Note that in this case parameterized statements may prevent index usage, so check also with attribute id inlined.

Unfortunately this does not help with the statistics, and (I guess) nested loops will still be used when joining:

hot2=# explain analyze select * from attr_value where attr_tunniste = 'suhde_hyvaksytty' and arvo_text = 't';
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using attr_value_arvo_text_idx1 on attr_value  (cost=0.00..343.59 rows=152 width=118) (actual time=0.076..7.768 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
   Filter: ((attr_tunniste)::text = 'suhde_hyvaksytty'::text)
 Total runtime: 10.855 ms
(4 rows)

hot2=# create index suhde_hyvaksytty_idx on attr_value(arvo_text) where attr_tunniste = 'suhde_hyvaksytty';
CREATE INDEX
hot2=# analyze attr_value;
hot2=# explain analyze select * from attr_value where attr_tunniste = 'suhde_hyvaksytty' and arvo_text = 't';
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using suhde_hyvaksytty_idx on attr_value  (cost=0.00..43.72 rows=152 width=118) (actual time=0.093..4.776 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
 Total runtime: 7.817 ms
(3 rows)

- Anssi
Re: [PERFORM] Disabling nested loops - worst case performance
On 03/18/2011 01:14 PM, Thomas Kellerer wrote:
> Did you consider using hstore instead? I think in the PostgreSQL world, this is a better alternative than EAV and most probably faster as well.

No, we did not. The reason is that we want to track each attribute with bi-temporal timestamps. The actual database schema for the attribute value table is:

CREATE TABLE attr_value (
    id SERIAL PRIMARY KEY,
    olio_id INTEGER NOT NULL REFERENCES base_olio,            -- entity identifier
    attr_tunniste VARCHAR(20) NOT NULL REFERENCES base_attr,  -- attr identifier
    kieli_tunniste VARCHAR(20) REFERENCES kieli,              -- lang identifier
    arvo_number DECIMAL(18, 9),                               -- value number
    arvo_ts timestamptz,                                      -- value timestamp
    arvo_text TEXT,                                           -- value text
    arvo_valinta_tunniste VARCHAR(20),                        -- for choice lists: value_choice_identifier
    real_valid_from TIMESTAMPTZ NOT NULL,   -- real_valid_from - real_valid_until define when things
    real_valid_until TIMESTAMPTZ NOT NULL,  -- have been in the real world
    db_valid_from TIMESTAMPTZ NOT NULL,     -- db_valid_* defines when things have been in the database
    db_valid_until TIMESTAMPTZ NOT NULL,
    tx_id_insert INTEGER default txid_current(),
    tx_id_delete INTEGER
    -- foreign key checks skipped
);

Naturally, we have other tables defining the objects, joins between objects, and metadata for the EAV. All data modifications are done through procedures, which ensure uniqueness etc. for the attributes and joins.

The data set is small, and performance in general is not that important, as long as the UI is responsive and data can be transferred to other systems in reasonable time. Insert performance is at least 10x worse than with a traditional schema, but it doesn't matter (we have somewhere around 1000 inserts/updates a day, max). The only real problem so far is the chained nested loop problem, which really kills performance for some queries.
Surprisingly (at least to me) this schema has worked really well, although sometimes there is a feeling that we are implementing a database using a database...

- Anssi
Re: [PERFORM] Xeon twice the performance of opteron
On Mar 17, 2011, at 9:39 PM, Scott Marlowe wrote:
> My experience puts the 23xx series Opterons in the same general neighborhood as the E5300, and a little behind the E5400 series Xeons. OTOH, the newer Magny Cours Opterons stomp both of those into the ground. Do any of those machines have zone_reclaim_mode = 1? i.e.:
> sysctl -a | grep zone_reclaim
> vm.zone_reclaim_mode = 0
> I had a machine that had just high enough inter-zone communication cost to get it turned on by default, and it slowed right to a crawl under pgsql.

It is set to zero on this machine. I've tried PG compiled on the box itself; same result. As for power savings, according to cpuinfo all the cores are running at 2.1 GHz.

We had another machine, which typically runs as a web server, on an AMD Opteron(tm) Processor 6128; after diddling the speed governor to performance (thus bumping the CPU speed to 2 GHz from 800 MHz), query speed increased to 100 ms - still not as fast as the Xeon, but close enough. I think I'm just hitting some wall of the architecture.

I tried getting some oprofile love from it, but oprofile seems to not work on that box. However, it worked on the Xeon box:

33995  9.6859  postgres  j2date
21925  6.2469  postgres  ExecMakeFunctionResultNoSets
20500  5.8409  postgres  slot_deform_tuple
17623  5.0212  postgres  BitmapHeapNext
13059  3.7208  postgres  dt2time
12271  3.4963  postgres  slot_getattr
11509

Aside from j2date (probably coming up due to that join filter, I'd wager), nothing unexpected.

-- Jeff Trout j...@jefftrout.com http://www.stuarthamm.net/ http://www.dellsmartexitin.com/
Re: [PERFORM] Request for feedback on hardware for a new database server
On 18-3-2011 10:11, Scott Marlowe wrote:
> Note I am talking specifically about the ability to cool the RAID card, not the CPUS etc. Many 1U boxes have poor air flow across the expansion slots for PCI / etc cards, while doing a great job cooling the CPUs and memory. If you don't use high performance RAID cards (LSI 9xxx Areca 16xx 18xx) then it's not an issue. Open up your 1U and look at the air flow for the expansion slots, it's often just not very much.

I was referring to, amongst others, two machines that have both a Dell Perc 5/i for internal disks and a Perc 5/e for an external disk enclosure. Those also had processors that produce quite some heat (2x X5160 and 2x X5355), combined with all FB-DIMM (8x 2GB) slots filled, which also produce a lot of heat. Those Dell Percs are similar to the LSIs from the same period in time. So the heat produced by the other components was already pretty high. Still, I've seen no problems with heat for any component, including all four RAID controllers.

But I agree, there are some 1U servers that skimp on fans and thus air flow in the system. We've not had that problem with any of our systems, but both Sun and Dell seem to add quite a few fans in the middle of the system, where others may do it a bit less heavy-duty and less over-dimensioned.

Best regards,
Arjen
Re: [PERFORM] Help with Query Tuning
On 03/18/2011 12:17 AM, Adarsh Sharma wrote:
> Thanks, it works now :-) Here is the output:
>
> pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like '%2010%' and
> pdc_uima-# content_language='en' and content is not null and isprocessable = 1 and
> pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
> pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
> pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | '
> pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || ' | '
> pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );
>  count
> --------
>  137193
> (1 row)
>
> Time: 195441.894 ms

What is the type/content of the column publishing_date? Based on what you show above, I assume it's text? If so, what's the format of the date string?
[PERFORM] Help: massive parallel update to the same table
Hi all,

Our system has a postgres database that has a table for statistics which is updated every hour by about 10K clients. Each client only makes updates to its own row in the table. So far I am only seeing one core out of eight on my server being active, which tells me that the updates are being done serially instead of in parallel. Do you know if there is a way for me to make these independent updates happen in parallel?

Thank you, your help is very much appreciated!
Re: [PERFORM] Help: massive parallel update to the same table
Red Maple redmaplel...@gmail.com wrote:
> Our system has a postgres database that has a table for statistic which is updated every hour by about 10K clients. Each client only make update to its own row in the table. So far I am only seeing one core out of eight cores on my server being active which tells me that the update is being done serial instead of being parallel. Do you know if there is a way for me to make these independent updates happen in parallel?

It should be parallel by default. Are you taking out any explicit locks?

Also, it seems like you're only doing about three updates per second. I would expect a single-row update to run in a couple ms or less, so it would be rare that two requests would be active at the same time, and you wouldn't often see multiple cores active at once. (Of course, the background writer, autovacuum, etc., should occasionally show up concurrently with update queries.)

Is there some particular problem you're trying to solve? (For example, is something too slow?)

-Kevin
Re: [PERFORM] Help: massive parallel update to the same table
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Red Maple
Sent: Friday, March 18, 2011 9:05 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Help: massive parallel update to the same table

> Hi all, Our system has a postgres database that has a table for statistic which is updated every hour by about 10K clients. Each client only make update to its own row in the table. So far I am only seeing one core out of eight cores on my server being active which tells me that the update is being done serial instead of being parallel. Do you know if there is a way for me to make these independent updates happen in parallel?

If they are all happening on one core, you are probably using one DB connection to do the updates. To split them across multiple cores, you need to use multiple DB connections. Be careful if/when you restructure things to funnel these requests into a reasonable number of backend DB connections - turning a huge number of clients loose against a DB is not going to end well.

Brad.
Re: [PERFORM] Fastest pq_restore?
Andy Colson a...@squeakycode.net wrote:
> On 03/17/2011 09:25 AM, Michael Andreasen wrote:
>> I've been looking around for information on doing a pg_restore as fast as possible. I am using a twin-processor box with 2GB of memory.

>> shared_buffers = 496MB

Probably about right.

>> maintenance_work_mem = 160MB

You might get a benefit from a bit more there; hard to say what's best with so little RAM.

>> checkpoint_segments = 30

This one is hard to call without testing. Oddly, some machines do better with the default of 3. Nobody knows why.

>> autovacuum = false
>> full_page_writes = false

Good.

>> fsync = off
>> synchronous_commit = off

Absolutely.

>> bgwriter_lru_maxpages = 0

I hadn't thought much about that last one -- do you have benchmarks to confirm that it helped with a bulk load?

You might want to set max_connections to something lower to free up more RAM for caching, especially considering that you have so little RAM.

-Kevin
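Collected in one place, the bulk-load settings discussed above would look like this in postgresql.conf (values taken from the thread; only acceptable on a server whose contents can be recreated from the dump, since fsync = off risks corruption on a crash):

```
shared_buffers = 496MB
maintenance_work_mem = 160MB
checkpoint_segments = 30
autovacuum = off
full_page_writes = off
fsync = off                 # unsafe except during a throwaway restore
synchronous_commit = off
bgwriter_lru_maxpages = 0
```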
Re: [PERFORM] Disabling nested loops - worst case performance
On Fri, Mar 18, 2011 at 7:52 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
> If your queries work on single attribute, you can try adding partial indexes for different attributes. Note that in this case parameterized statements may prevent index usage, so check also with attribute id inlined.

And if your queries work on a single entity instead, you can partition the table per entity, thus teaching the database engine about the correlation.
Re: [PERFORM] Help with Query Tuning
> Thanks, it works now :-) Here is the output: [...]
> Time: 195441.894 ms
> But my original query is to use AND also i.e.

Hi,

just replace AND and OR (used with the LIKE operator) with & and | (used with to_tsquery). So this:

(content like '%Militant%' OR content like '%jihad%') AND (content like '%kill%' OR content like '%injure%')

becomes:

to_tsvector('english', content) @@ to_tsquery('english', '(Militant | jihad) & (kill | injure)')

BTW, it seems you somehow believe you'll get exactly the same results from those two queries (LIKE vs. tsearch) - that's a false expectation. I believe the full-text query is much better and more appropriate in this case; just don't expect the same results.

regards
Tomas
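A runnable form of the translated predicate against the table from the thread (in tsquery syntax, & is AND and | is OR; the extra WHERE conditions are carried over from the earlier query):

```sql
SELECT count(*)
FROM page_content
WHERE content_language = 'en'
  AND content IS NOT NULL
  AND to_tsvector('english', content)
      @@ to_tsquery('english', '(Militant | jihad) & (kill | injure)');
```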
Re: [PERFORM] Request for feedback on hardware for a new database server
On Fri, Mar 18, 2011 at 3:19 AM, Jesper Krogh jes...@krogh.cc wrote:
> * Dependent on your workload of-course, you're typically not bottlenecked by the amount of cpu-cores, so strive for fewer faster cores.

Depending on your workload again, but faster memory is even more important than faster math. So go for the architecture with the fastest memory bus.
Re: [PERFORM] Help: massive parallel update to the same table
[rearranged - please don't top-post]
[also, bringing this back to the list - please keep the list copied]

Red Maple redmaplel...@gmail.com wrote:
> Kevin Grittner kevin.gritt...@wicourts.gov wrote:
>> It should be parallel by default. Are you taking out any explicit locks?
> my clients use psql to remotely run an update function on the postgres server. Each client runs its own psql to connect to the server. What I have noticed is that if I comment out the update in the function, so that only the query is being done, then all the cores kick in and run at 100%. However, if I allow the update in the function, then only one core runs. Currently it takes 40 min to update all the client statistics.

Please show us the part you commented out to get the faster run time, and the source code for the function you mentioned.

> Do you know if I have configured something incorrectly? I am running postgres 9.0.2 on fedora core 14. Here is my postgres.conf file
> [over 500 lines of configuration, mostly comments, wrapped]

If you're going to post that, please strip the comments or post the results of this query: http://wiki.postgresql.org/wiki/Server_Configuration

I don't think anything in your configuration will affect this particular problem, but it seems likely that you could do some overall tuning. If you want to do that, you should probably start a new thread after this issue is sorted out.

-Kevin
Re: [PERFORM] Request for feedback on hardware for a new database server
On Fri, Mar 18, 2011 at 6:44 AM, Arjen van der Meijden acmmail...@tweakers.net wrote: On 18-3-2011 10:11, Scott Marlowe wrote: On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden acmmail...@tweakers.net wrote: On 18-3-2011 4:02 Scott Marlowe wrote: We have several 1U boxes (mostly Dell and Sun) running and had several in the past. And we've never had any heating problems with them. That includes machines with more power hungry processors than are currently available, all power slurping FB-dimm slots occupied and two raid cards installed. Note I am talking specifically about the ability to cool the RAID card, not the CPUS etc. Many 1U boxes have poor air flow across the expansion slots for PCI / etc cards, while doing a great job cooling the CPUs and memory. If you don't use high performance RAID cards (LSI 9xxx Areca 16xx 18xx) then it's not an issue. Open up your 1U and look at the air flow for the expansion slots, it's often just not very much. I was referring to amongst others two machines that have both a Dell Perc 5/i for internal disks and a Perc 5/e for an external disk enclosure. Those also had processors that produce quite some heat (2x X5160 and 2x X5355) combined with all fb-dimm (8x 2GB) slots filled, which also produce a lot of heat. Those Dell Perc's are similar to the LSI's from the same period in time. So the produced heat form the other components was already pretty high. Still, I've seen no problems with heat for any component, including all four raid controllers. But I agree, there are some 1U servers that skimp on fans and thus air flow in the system. We've not had that problem with any of our systems. But both Sun and Dell seem to add quite a bit of fans in the middle of the system, where others may do it a bit less heavy duty and less over-dimensioned. Most machines have different pathways for cooling airflow over their RAID cards, and they don't share that air flow with the CPUs. 
Also, the PERC RAID controllers do not produce a lot of heat. The CPUs on the high-performance LSI or Areca controllers are often dual-core high-performance CPUs in their own right, and those cards have heat sinks with fans on them to cool them. The cards themselves are what make so much heat, and they don't get enough cooling in many 1U servers. It has nothing to do with what else is in the server, again because the airflow for the cards is usually separate.
Re: [PERFORM] Help: massive parallel update to the same table
Hi,

Here is my function. If I comment out the update then it would run on all the cores; if not, then only one core will run.

CREATE OR REPLACE FUNCTION my_update_device(this_mac text, number_of_devices integer, this_sysuptime integer)
  RETURNS integer AS
$BODY$
DECLARE
  fake_mac macaddr;
  this_id integer;
  new_avgld integer;
BEGIN
  new_avgld = (this_sysuptime / 120) % 100;
  for i in 1..Number_of_devices loop
    fake_mac = substring(this_mac from 1 for 11) || ':' || upper(to_hex((i-1)/256)) || ':' || upper(to_hex((i-1)%256));
    select into this_id id from ap where lan_mac = upper(fake_mac::text);
    if not found then
      return -1;
    end if;
    select into this_sysuptime sysuptime from ap_sysuptime where ap_id = this_id for update;
    -- ==========================================================
    -- if I comment out the next update then all cores will be
    -- running, else only one core will be running
    -- ==========================================================
    update ap_sysuptime set sysuptime = this_sysuptime, last_contacted = now() where ap_id = this_id;
    select into new_avgld avg_ld_1min from colubris_device where node_id = this_id for update;
    new_avgld = (this_avgld / 120) % 100;
  end loop;
  return this_id;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

On Fri, Mar 18, 2011 at 12:06 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: [rearranged - please don't top-post] [also, bringing this back to the list - please keep the list copied] Red Maple redmaplel...@gmail.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: It should be parallel by default. Are you taking out any explicit locks? my clients use psql to remotely run an update function on the postgres server. Each client run its own psql to connect to the server. What I have noticed is that if I commented out the update in the function so that only query is being done then all the core would kick in and run at 100%. However if I allow the update on the function then only one core would run.
> Currently it takes 40min to update all the client statistics.

Please show us the part you commented out to get the faster run time, and the source code for the function you mentioned.

> Do you know if I have configured something incorrectly? I am running postgres 9.0.2 on fedora core 14. Here is my postgres.conf file
> [over 500 lines of configuration, mostly comments, wrapped]

If you're going to post that, please strip the comments or post the results of this query: http://wiki.postgresql.org/wiki/Server_Configuration

I don't think anything in your configuration will affect this particular problem, but it seems likely that you could do some overall tuning. If you want to do that, you should probably start a new thread after this issue is sorted out.

-Kevin
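For reference, the MAC-enumeration arithmetic in the function above can be sketched in Python (an illustrative model, not part of the original post; note that to_hex() doesn't zero-pad, so the generated trailing octets are bare hex digits rather than canonical two-digit MAC octets):

```python
def fake_mac(this_mac: str, i: int) -> str:
    """Mirrors: substring(this_mac from 1 for 11) || ':' ||
    upper(to_hex((i-1)/256)) || ':' || upper(to_hex((i-1)%256))"""
    prefix = this_mac[:11]            # first 11 chars, e.g. "00:11:22:33"
    hi = format((i - 1) // 256, "X")  # high byte of the device index
    lo = format((i - 1) % 256, "X")   # low byte of the device index
    return f"{prefix}:{hi}:{lo}"

# device 1 -> ...:0:0, device 256 -> ...:0:FF, device 257 -> ...:1:0
```

This just shows how each loop iteration derives a per-device MAC from the base MAC; the locking behavior under discussion is unrelated to this arithmetic.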
Re: [PERFORM] Fastest pq_restore?
On 3/18/2011 9:38 AM, Kevin Grittner wrote:
> Andy Colson a...@squeakycode.net wrote:
>> On 03/17/2011 09:25 AM, Michael Andreasen wrote:
>>> I've been looking around for information on doing a pg_restore as fast as possible.
>> bgwriter_lru_maxpages = 0
> I hadn't thought much about that last one -- do you have benchmarks to confirm that it helped with a bulk load?

Nope, I got it from the "running with scissors" thread (I think), maybe from Greg Smith, or here: http://rhaas.blogspot.com/2010/06/postgresql-as-in-memory-only-database_24.html -- I don't recall exactly. I saw it and added a comment to my .conf just in case I ever needed it.

-Andy
Re: [PERFORM] Help: massive parallel update to the same table
Red Maple redmaplel...@gmail.com wrote:
> Here is my function. If I comment out the update then it would run
> all the cores, if not then only one core will run
>
> CREATE OR REPLACE FUNCTION [...]
>     select sysuptime into this_sysuptime
>         from ap_sysuptime
>         where ap_id = this_id
>         for update;
>     -- ==
>     -- if I comment out the next update
>     -- then all cores will be running,
>     -- else only one core will be running
>     -- ==
>     update ap_sysuptime
>         set sysuptime = this_sysuptime, last_contacted = now()
>         where ap_id = this_id;

This proves that you're not showing us the important part. The update locks the same row previously locked by the SELECT FOR UPDATE, so any effect at the row level would be a serialization failure based on a write conflict, which doesn't sound like your problem. They get different locks at the table level, though:

http://www.postgresql.org/docs/9.0/interactive/explicit-locking.html#LOCKING-TABLES

Somewhere in code you're not showing us, you're acquiring a lock on the ap_sysuptime table which conflicts with a ROW EXCLUSIVE lock but not with a ROW SHARE lock. The lock types which could do that are SHARE and SHARE ROW EXCLUSIVE. CREATE INDEX (without CONCURRENTLY) could do that; otherwise it seems that you would need to be explicitly issuing a LOCK statement at one of these levels somewhere in your transaction. That is what is causing the transactions to run one at a time.

-Kevin
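To make the deduction above concrete, here is a small Python model of the table-level lock conflict matrix from the linked documentation (a sketch added for illustration, not part of the original message), checking which modes conflict with ROW EXCLUSIVE (taken by UPDATE) but not with ROW SHARE (taken by SELECT ... FOR UPDATE):

```python
# Table-level lock conflict matrix, per the PostgreSQL docs.
# The matrix is symmetric: A conflicts with B iff B conflicts with A.
ALL = ["ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
       "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"]

CONFLICTS = {
    "ACCESS SHARE":           {"ACCESS EXCLUSIVE"},
    "ROW SHARE":              {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE":          {"SHARE", "SHARE ROW EXCLUSIVE",
                               "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE":                  {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
                               "SHARE ROW EXCLUSIVE", "EXCLUSIVE",
                               "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE":    {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
                               "SHARE", "SHARE ROW EXCLUSIVE",
                               "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "EXCLUSIVE":              set(ALL) - {"ACCESS SHARE"},
    "ACCESS EXCLUSIVE":       set(ALL),
}

# Modes that would block the UPDATE (ROW EXCLUSIVE) while letting the
# SELECT ... FOR UPDATE (ROW SHARE) through:
culprits = sorted(m for m in ALL
                  if "ROW EXCLUSIVE" in CONFLICTS[m]
                  and "ROW SHARE" not in CONFLICTS[m])
```

Running this yields exactly SHARE and SHARE ROW EXCLUSIVE, matching the conclusion in the message above.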
Re: [PERFORM] Request for feedback on hardware for a new database server
On Fri, Mar 18, 2011 at 10:32 AM, Scott Marlowe scott.marl...@gmail.com wrote:
> On Fri, Mar 18, 2011 at 6:44 AM, Arjen van der Meijden acmmail...@tweakers.net wrote:
>> On 18-3-2011 10:11, Scott Marlowe wrote:
>>> On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden acmmail...@tweakers.net wrote:
>>>> On 18-3-2011 4:02, Scott Marlowe wrote:
>>>> We have several 1U boxes (mostly Dell and Sun) running and had several in the past. And we've never had any heating problems with them. That includes machines with more power-hungry processors than are currently available, all power-slurping FB-DIMM slots occupied, and two RAID cards installed.
>>> Note I am talking specifically about the ability to cool the RAID card, not the CPUs etc. Many 1U boxes have poor air flow across the expansion slots for PCI etc. cards, while doing a great job cooling the CPUs and memory. If you don't use high performance RAID cards (LSI 9xxx, Areca 16xx/18xx) then it's not an issue. Open up your 1U and look at the air flow for the expansion slots; it's often just not very much.
>> I was referring to, amongst others, two machines that have both a Dell Perc 5/i for internal disks and a Perc 5/e for an external disk enclosure. Those also had processors that produce quite some heat (2x X5160 and 2x X5355) combined with all FB-DIMM (8x 2GB) slots filled, which also produce a lot of heat. Those Dell Percs are similar to the LSIs from the same period in time. So the heat produced from the other components was already pretty high. Still, I've seen no problems with heat for any component, including all four RAID controllers.
>> But I agree, there are some 1U servers that skimp on fans and thus air flow in the system. We've not had that problem with any of our systems. But both Sun and Dell seem to add quite a few fans in the middle of the system, where others may do it a bit less heavy-duty and less over-dimensioned.
Most machines have different pathways for cooling airflow over their RAID cards, and they don't share that air flow with the CPUs. [...]

As a followup to this subject, the problem wasn't bad until the server load increased, thus increasing the load on the LSI MegaRAID card, at which point it started producing more heat than it had before. When the machine wasn't working too hard the LSI was fine; it was once we started hitting higher and higher load that the card had issues.
[PERFORM] REINDEX takes half a day (and still not complete!)
I have a large table, but not as large as the kind of numbers that get discussed on this list. It has 125 million rows. REINDEXing the table takes half a day, and it's still not finished.

To write this post I did SELECT COUNT(*), and here's the output -- so long!

select count(*) from links;
   count
-----------
 125418191
(1 row)
Time: 1270405.373 ms

That's 1270 seconds! I suppose the vacuum analyze is not doing its job? As you can see from the settings below, I have autovacuum set to ON, and there's also a cronjob every 10 hours to do a manual vacuum analyze on this table, which is the largest. PG is version 8.2.9. Any thoughts on what I can do to improve performance!? Below are my settings.

max_connections = 300
shared_buffers = 500MB
effective_cache_size = 1GB
max_fsm_relations = 1500
max_fsm_pages = 95
work_mem = 100MB
temp_buffers = 4096
authentication_timeout = 10s
ssl = off
checkpoint_warning = 3600
random_page_cost = 1
autovacuum = on
autovacuum_vacuum_cost_delay = 20
vacuum_cost_delay = 20
vacuum_cost_limit = 600
autovacuum_naptime = 10
stats_start_collector = on
stats_row_level = on
autovacuum_vacuum_threshold = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_scale_factor = 0.01
wal_buffers = 64
checkpoint_segments = 128
checkpoint_timeout = 900
fsync = on
maintenance_work_mem = 512MB
Re: [PERFORM] REINDEX takes half a day (and still not complete!)
On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
> I have a large table but not as large as the kind of numbers that get discussed on this list. It has 125 million rows. REINDEXing the table takes half a day, and it's still not finished. [...]

What is the output of running vacuum verbose as a superuser (you can run it on the postgres database so it returns fast)? We're looking for the output that looks like this:

INFO:  free space map contains 1930193 pages in 749 relations
DETAIL:  A total of 1787744 page slots are in use (including overhead).
1787744 page slots are required to track all free space.
Current limits are: 1000 page slots, 3000 relations, using 58911 kB.

If the space needed exceeds the page slots, then you need to crank up your free space map. If the relations tracked exceed the available relations, then you'll need to crank up the max relations.
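The sizing check described above boils down to two comparisons. Here is a hedged Python sketch of it (illustrative only; the left-hand numbers come from the sample VACUUM VERBOSE output, while the configured-limit numbers are hypothetical, since the "Current limits" line in the quoted output looks garbled):

```python
def fsm_ok(slots_needed, slots_configured, rels_tracked, rels_configured):
    """True if the free space map can track everything VACUUM found."""
    return slots_needed <= slots_configured and rels_tracked <= rels_configured

# 1787744 page slots required and 749 relations tracked, from the
# sample output above; the configured limits below are made up.
enough = fsm_ok(1787744, 2000000, 749, 1000)     # limits large enough
too_small = fsm_ok(1787744, 1500000, 749, 1000)  # need to raise max_fsm_pages
```

If the check fails on the first comparison you raise max_fsm_pages; if on the second, max_fsm_relations.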
Re: [PERFORM] REINDEX takes half a day (and still not complete!)
On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
> autovacuum = on
> autovacuum_vacuum_cost_delay = 20
> vacuum_cost_delay = 20
> vacuum_cost_limit = 600
> autovacuum_naptime = 10

Also, if vacuum can't keep up you can increase the vacuum cost limit and lower the cost delay. Anything above 1ms is still quite a wait compared to 0, and most systems don't have the timer granularity to go that low anyway, so 5ms is about as low as you can go and still see a change short of 0. Also, if you've got a lot of large relations you might need to increase the max number of autovacuum workers as well.
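As a rough back-of-the-envelope (my arithmetic, not from the thread): cost-based vacuum sleeps for cost_delay milliseconds each time it accumulates cost_limit points (by default a buffer hit costs 1, a miss 10, and dirtying a page 20), so these two settings cap vacuum's throughput at roughly:

```python
def max_cost_per_sec(cost_limit, cost_delay_ms):
    # Upper bound: ignores the time vacuum spends doing actual work
    # between sleeps, so real throughput is somewhat lower.
    return cost_limit * 1000 // cost_delay_ms

original = max_cost_per_sec(600, 20)  # poster's settings: 30000 points/sec
tuned = max_cost_per_sec(600, 5)      # 5ms delay: 120000 points/sec
```

This is why lowering the delay from 20ms toward 5ms (and/or raising the limit) lets vacuum chew through a 125-million-row table much faster.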