[PERFORM] 8.2.13 commit is taking too much time
I am using PostgreSQL 8.2.13 and I found that most commits and insert or update statements are taking more than 4 s in the db, and the app performance is slow because of that. My db settings are as follows:

    bgwriter_all_maxpages  | 300
    bgwriter_all_percent   | 15
    bgwriter_delay         | 300ms
    bgwriter_lru_maxpages  | 50
    bgwriter_lru_percent   | 10
    checkpoint_segments    | 300
    checkpoint_timeout     | 5min
    checkpoint_warning     | 30s
    work_mem               | 16MB
    shared_buffers         | 4GB

I have 32 GB RAM and its a 4*2 = 8 core processor box. Any idea how to improve the performance?
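A first diagnostic step is to confirm exactly which statements exceed the 4 s mark. A minimal postgresql.conf sketch (the parameter does exist in 8.2; the 1-second threshold is just an illustrative choice):

    # Log every statement that takes longer than 1 second, so the
    # slow commits/inserts/updates show up in the server log.
    log_min_duration_statement = 1000   # milliseconds

Reload the server (pg_ctl reload) for the change to take effect, then look at which statements actually dominate the log.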
Re: [PERFORM] Postgres 9.0.4 + Hot Standby + FusionIO Drive + Performance => Query failed ERROR: catalog is missing 1 attribute(s) for relid 172226
Yes, it has something to do with Hot Standby: if you omit some parts of the archive, the standby instance will not have the necessary files and will complain like this. I kept the FusionIO drive on my checklist while attending to this issue, since we tried it looking for performance combined with a read-only hot standby, and I doubted whether recovery completes safely on that drive; that is why I pointed to the Fio drive here. To put it plainly, I missed the pg_clog directory on the archive.

    seq_page_cost = 1.0
    random_page_cost = 1.0

Are the above settings fine when we deal with Fio and performance? I had advice earlier stating that reads and writes are treated the same on Fio drives. Any suggestions on configuration changes to make the read-only hot standby faster on READs?

- Sethu

On Sun, May 8, 2011 at 11:08 AM, Simon Riggs wrote:
> On Tue, May 3, 2011 at 10:02 AM, Sethu Prasad wrote:
>
> > I tried with the PostgreSQL 9.0.4 + Hot Standby and running the database
> > from Fusion IO Drive to understand the PG Performance.
> >
> > While doing so I got the "Query failed ERROR: catalog is missing 1
> > attribute(s) for relid 172226". Any idea on this error? Is that
> > combination PG + HotSB + Fusion IO Drive is not advisable?!
>
> Why I wonder do you think this might have anything to do with Hot
> Standby and/or FusionIO drives?
>
> This indicates either catalog or catalog index corruption of some kind.
>
> Did you only get this error once?
>
> --
> Simon Riggs    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [PERFORM] Benchmarking a large server
On Mon, May 9, 2011 at 10:32 PM, Chris Hoover wrote:
> So, does anyone have any suggestions/experiences in benchmarking storage
> when the storage is smaller than 2x memory?

Try writing a small python script (or C program) to mmap a large chunk of memory with MAP_LOCKED; this will keep it in RAM and prevent that RAM from being used for caching. The script should touch the memory at least once to keep overcommit from getting smart on you.

I think only root can lock memory, so that small program would have to run as root.
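A minimal C sketch of that idea, assuming Linux and root privileges (the program name and the GiB argument are illustrative):

    /* pin_ram.c -- lock N GiB in RAM so the kernel can't use it for
       page cache, shrinking the effective cache below the data size.
       Build: gcc -o pin_ram pin_ram.c    Run as root: ./pin_ram 24  */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        size_t gigs = (argc > 1) ? strtoul(argv[1], NULL, 10) : 1;
        size_t len = gigs << 30;

        /* MAP_LOCKED pins the pages; MAP_ANONYMOUS means no backing file. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Touch every page once so overcommit can't cheat us. */
        memset(p, 1, len);

        printf("locked %zu GiB; sleeping -- run the benchmark now\n", gigs);
        pause();                /* hold the memory until killed */
        return 0;
    }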
Re: [PERFORM] indexes ignored when querying the master table
hi florian

sorry for the late reply - it took almost a day to dump & reload the data into 9.1b1.

> > how can i get postgres to use the indexes when querying the master table?
>
> I believe that this is a new feature in PostgreSQL 9.1 ("Allow
> inheritance table queries to return meaningfully-sorted results").

you are right, pgsql 9.1 indeed makes use of the indexes now:

EXPLAIN ANALYZE SELECT * FROM data.logs ORDER BY re_timestamp DESC LIMIT 100;

 Limit  (cost=11.63..36.45 rows=100 width=1390) (actual time=0.169..0.639 rows=100 loops=1)
   ->  Result  (cost=11.63..6421619.07 rows=25870141 width=1390) (actual time=0.154..0.610 rows=100 loops=1)
         ->  Merge Append  (cost=11.63..6421619.07 rows=25870141 width=1390) (actual time=0.150..0.429 rows=100 loops=1)
               Sort Key: data.logs.re_timestamp
               ->  Sort  (cost=11.46..11.56 rows=40 width=1776) (actual time=0.014..0.014 rows=0 loops=1)
                     Sort Key: data.logs.re_timestamp
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on logs  (cost=0.00..10.40 rows=40 width=1776) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Index Scan Backward using logs_2003_timestamp_idx on logs_2003 logs  (cost=0.00..373508.47 rows=1825026 width=1327) (actual time=0.026..0.026 rows=1 loops=1)
               ->  Index Scan Backward using logs_2004_timestamp_idx on logs_2004 logs  (cost=0.00..417220.55 rows=2034041 width=1327) (actual time=0.012..0.012 rows=1 loops=1)
               ->  Index Scan Backward using logs_2005_timestamp_idx on logs_2005 logs  (cost=0.00..502664.57 rows=2438968 width=1345) (actual time=0.015..0.015 rows=1 loops=1)
               ->  Index Scan Backward using logs_2006_timestamp_idx on logs_2006 logs  (cost=0.00..640419.01 rows=3091214 width=1354) (actual time=0.015..0.015 rows=1 loops=1)
               ->  Index Scan Backward using logs_2007_timestamp_idx on logs_2007 logs  (cost=0.00..752875.00 rows=3603739 width=1369) (actual time=0.009..0.009 rows=1 loops=1)
               ->  Index Scan Backward using logs_2008_timestamp_idx on logs_2008 logs  (cost=0.00..969357.51 rows=4406653 width=1440) (actual time=0.007..0.007 rows=1 loops=1)
               ->  Index Scan Backward using logs_2009_timestamp_idx on logs_2009 logs  (cost=0.00..862716.39 rows=3986473 width=1422) (actual time=0.016..0.016 rows=1 loops=1)
               ->  Index Scan Backward using logs_2010_timestamp_idx on logs_2010 logs  (cost=0.00..778529.29 rows=3579586 width=1426) (actual time=0.009..0.009 rows=1 loops=1)
               ->  Index Scan Backward using logs_2011_timestamp_idx on logs_2011 logs  (cost=0.00..200253.71 rows=904401 width=1453) (actual time=0.006..0.089 rows=100 loops=1)
 Total runtime: 1.765 ms

thanks for your help,
thomas
Re: [PERFORM] 8.2.13 commit is taking too much time
> Any idea how to improve the performance?

Hmmm, I guess we'll need more info about resource usage (CPU, I/O, locks) when the commit happens. Run these two commands

    $ iostat -x 1
    $ vmstat 1

and then execute the commit. See what's causing problems. Is the drive utilization close to 100%? Then you've got problems with the disks (I'd bet this is the cause). Etc. There's a very nice chapter about this in Greg's book.

BTW, what filesystem are you using? Ext3, ext4, reiserfs, xfs? I do remember there were some problems with sync: some filesystems are unable to sync individual files and always sync everything, which is going to hurt if you want to sync just the WAL.

regards
Tomas
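If locks are a suspect, one quick check, a sketch using the pg_locks system view (these columns exist in 8.2), is to look for ungranted lock requests while a commit is stalled:

    -- Show lock requests that are still waiting (not yet granted).
    SELECT locktype, relation::regclass AS relation, pid, mode
    FROM pg_locks
    WHERE NOT granted;

If this returns rows during the slow commits, the problem is contention rather than I/O.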
Re: [PERFORM] Postgres refusing to use >1 core
On Mon, May 9, 2011 at 9:40 PM, Aren Cambre wrote:
>> how are you reading through the table?  if you are using OFFSET, you
>> owe me a steak dinner.
>
> Nope. :-)
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.
>
>     NpgsqlConnection arrestsConnection = new
>         NpgsqlConnection(Properties.Settings.Default.dbConnectionString);
>     arrestsConnection.Open();
>
>     /// First clear out the geocoding table
>     NpgsqlCommand geocodingTableClear = new NpgsqlCommand(
>         "TRUNCATE raw.\"TxDPS geocoding\"", arrestsConnection);
>     geocodingTableClear.ExecuteNonQuery();
>
>     NpgsqlDataReader arrests = new NpgsqlCommand(
>         "SELECT * FROM \"raw\".\"TxDPS all arrests\"",
>         arrestsConnection).ExecuteReader();
>
>     /// Based on the pattern defined at
>     /// http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/2f5ce226-c500-4899-a923-99285ace42ae.
>     foreach (IDataRecord arrest in
>         from row in arrests.AsParallel().Cast<IDataRecord>()
>         select row)
>     {
>         Geocoder geocodeThis = new Geocoder(arrest);
>         geocodeThis.Geocode();
>     }
>
>     arrestsConnection.Close();

hm. I'm not exactly sure. how about turning on statement level logging on the server for a bit and seeing if any unexpected queries are being generated and sent to the server.

merlin
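A sketch of the temporary statement-level logging merlin suggests (both parameters exist in 9.0; revert afterwards, since logging every statement is itself expensive):

    # postgresql.conf -- temporary diagnostic settings
    log_statement = 'all'            # log the text of every statement
    log_min_duration_statement = 0   # log the duration of every statement

Then reload the configuration without a restart:

    SELECT pg_reload_conf();

and watch the server log while the app runs to see exactly what Npgsql is sending.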
Re: [PERFORM] Benchmarking a large server
On May 9, 2011, at 4:50 PM, Merlin Moncure wrote:
> hm, if it was me, I'd write a small C program that just jumped around
> directly on the device and did random writes, assuming it wasn't
> formatted. For sequential read, just flush caches and dd the device to
> /dev/null. Probably someone will suggest better tools though.
>
> merlin

http://pgfoundry.org/projects/pgiosim

It is a small program we use to beat the [bad word] out of I/O systems. It randomly seeks, does an 8kB read, optionally writes it out (and optionally fsyncs), and reports how fast it is going (you need to watch iostat output as well so you can see actual physical tps without the OS cache interfering). It goes through regular read & write calls like PG (I didn't want to bother with junk like O_DIRECT & friends).

It is also now multithreaded, so you can fire up a bunch of random read threads (rather than firing up a bunch of pgiosims in parallel) and see how things scale up.

--
Jeff Trout
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
Re: [PERFORM] Benchmarking a large server
2011/5/10 Greg Smith:
> On 05/09/2011 11:13 PM, Shaun Thomas wrote:
>>
>> Take a look at /proc/sys/vm/dirty_ratio and
>> /proc/sys/vm/dirty_background_ratio if you have an older Linux system,
>> or /proc/sys/vm/dirty_bytes and /proc/sys/vm/dirty_background_bytes
>> with a newer one.
>> On older systems for instance, those are set to 40 and 20 respectively
>> (recent kernels cut these in half).
>
> 1/4 actually; 10% and 5% starting in kernel 2.6.22. The main sources of
> this on otherwise new servers I see are RedHat Linux RHEL5 systems
> running 2.6.18. But as you say, even the lower defaults of the newer
> kernels can be way too much on a system with lots of RAM.

One can experiment with writeback storms using this program from Chris Mason, under GPLv2:

http://oss.oracle.com/~mason/fsync-tester.c

You need to tweak it a bit, AFAIR; this

    #define SIZE (32768*32)

must be reduced to 8kB blocks if you want a write pattern similar to PG's. The program writes a big file and does many small fsyncs, writing to both. Please see

http://www.spinics.net/lists/linux-ext4/msg24308.html

It is used as a torture program by some Linux filesystem hackers and may be useful for the OP on his large server to validate hardware and kernel.

> The main downside I've seen of addressing this by using a kernel with
> dirty_bytes and dirty_background_bytes is that VACUUM can slow down
> considerably. It really relies on the filesystem having a lot of write
> cache to perform well. In many cases people are happy with VACUUM
> throttling if it means nasty I/O spikes go away, but the trade-offs
> here are still painful at times.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
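A hedged sketch of the byte-based writeback tuning being discussed (vm.dirty_bytes and vm.dirty_background_bytes exist in kernels 2.6.29 and later; the values are illustrative starting points for a large-RAM box, not recommendations):

    # Cap the dirty page cache in absolute bytes instead of a % of RAM.
    # Setting the *_bytes knobs overrides the *_ratio ones.
    sysctl -w vm.dirty_background_bytes=268435456   # 256 MB: start background writeback
    sysctl -w vm.dirty_bytes=1073741824             # 1 GB: throttle writers

    # To persist across reboots, add to /etc/sysctl.conf:
    #   vm.dirty_background_bytes = 268435456
    #   vm.dirty_bytes = 1073741824

As Greg notes above, values this low can slow VACUUM noticeably, so measure before and after.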
Re: [PERFORM] 8.2.13 commit is taking too much time
On 05/10/2011 03:01 AM, AI Rumman wrote:
> I am using Postgresql 8.2.13 and I found that most of the commits and
> insert or update statements are taking more than 4s in the db and the
> app performance is slow for that. My db settings are as follows;
>
>     bgwriter_all_maxpages  | 300
>     bgwriter_all_percent   | 15
>     bgwriter_delay         | 300ms
>     bgwriter_lru_maxpages  | 50
>     bgwriter_lru_percent   | 10

Reduce bgwriter_all_maxpages to 0, definitely, and you might drop bgwriter_lru_maxpages to 0 too. Making the background writer in PostgreSQL 8.2 do more work, as you've tried here, increases the amount of repeated I/O done by a lot without actually getting rid of any pauses. It wastes a lot of I/O capacity instead, making the problems you're seeing worse.

>     shared_buffers | 4GB

On 8.2, shared_buffers should be no more than 128MB if you want to avoid long checkpoint pauses. You might even find best performance at the default of 32MB.

> I have 32 gb RAM and its a 4*2=8 core processors. Any idea how to
> improve the performance?

There's nothing you can do here that will work better than upgrading to 8.3. See http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm for more information. PostgreSQL 8.2 had serious problems with the sort of pauses you're seeing back when systems had only 4GB of memory; you'll never get rid of them on a server with 32GB of RAM on that version.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
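Collected into a postgresql.conf sketch, the 8.2-specific changes Greg suggests above (values taken straight from his reply; test on a non-production copy first):

    # postgresql.conf (8.2) -- per the advice above
    bgwriter_all_maxpages = 0     # stop the "all" background writer scan
    bgwriter_lru_maxpages = 0     # optionally stop the LRU scan as well
    shared_buffers = 128MB        # or even the 32MB default; 4GB is far
                                  # too large for 8.2's checkpoint behavior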
Re: [PERFORM] partition query on multiple cores
> I have 8-core server, I wanted to ask whether a query can be divided for
> multiple processors or cores, if it could be what to do in postgresql

No, at this time (and for the foreseeable future), a single query will run on a single core.

---
Maciek Sakrejda | System Architect | Truviso
1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com
Re: [PERFORM] partition query on multiple cores
On 05/10/2011 10:06 AM, Maciek Sakrejda wrote:
>> I have 8-core server, I wanted to ask whether a query can be divided for
>> multiple processors or cores, if it could be what to do in postgresql
>
> No, at this time (and for the foreseeable future), a single query will
> run on a single core.

It can *kinda* be done. Take a look at GridSQL. It's really good for splitting up reporting-like queries that benefit from parallel access of large tables. It's not exactly Hadoop, but I ran a test on a single system with two separate instances of PostgreSQL, and a single query over those two nodes cut execution time in half.

It's meant for server parallelism, so I wouldn't necessarily recommend splitting your data up across nodes on the same server. But it seems to deliver as promised when used in the right circumstances.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com
[PERFORM] Question processor speed differences.
AMD Opteron(tm) Processor 4174 HE vs Intel(R) Xeon(R) CPU E5345 @ 2.33GHz

I'm wondering if there is a performance difference running postgres on Fedora on AMD vs Intel (the two listed above). I have an 8-way Intel Xeon box and a 12-way AMD box and was thinking about migrating to the new AMD box from the 4-year-old Intel box. But I wasn't sure if there are performance stats on AMD multi-core procs vs the Intels for DB applications?

Thanks
Tory
Re: [PERFORM] help speeding up a query in postgres 8.4.5
On Tue, Apr 5, 2011 at 3:25 PM, Maria L. Wilson wrote:
> Would really appreciate someone taking a look at the query below. Thanks
> in advance!
>
> this is on a linux box...
> Linux dsrvr201.larc.nasa.gov 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37
> EST 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> explain analyze
> select MIN(IV.STRTDATE), MAX(IV.ENDDATE)
> from GRAN_VER GV
> left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR INVS
> where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and
> INVS.sensor_id='13'
>
> "Aggregate  (cost=736364.52..736364.53 rows=1 width=8) (actual time=17532.930..17532.930 rows=1 loops=1)"
> "  ->  Hash Join  (cost=690287.33..734679.77 rows=336949 width=8) (actual time=13791.593..17323.080 rows=924675 loops=1)"
> "        Hash Cond: (invs.granule_id = gv.granule_id)"
> "        ->  Seq Scan on invsensor invs  (cost=0.00..36189.41 rows=1288943 width=4) (actual time=0.297..735.375 rows=1277121 loops=1)"
> "              Filter: (sensor_id = 13)"
> "        ->  Hash  (cost=674401.52..674401.52 rows=1270865 width=16) (actual time=13787.698..13787.698 rows=1270750 loops=1)"
> "              ->  Hash Join  (cost=513545.62..674401.52 rows=1270865 width=16) (actual time=1998.702..13105.578 rows=1270750 loops=1)"
> "                    Hash Cond: (gv.granule_id = iv.granule_id)"
> "                    ->  Seq Scan on gran_ver gv  (cost=0.00..75224.90 rows=4861490 width=4) (actual time=0.008..1034.885 rows=4867542 loops=1)"
> "                    ->  Hash  (cost=497659.81..497659.81 rows=1270865 width=12) (actual time=1968.918..1968.918 rows=1270750 loops=1)"
> "                          ->  Bitmap Heap Scan on inventory iv  (cost=24050.00..497659.81 rows=1270865 width=12) (actual time=253.542..1387.957 rows=1270750 loops=1)"
> "                                Recheck Cond: (inv_id = 65)"
> "                                ->  Bitmap Index Scan on inven_idx1  (cost=0.00..23732.28 rows=1270865 width=0) (actual time=214.364..214.364 rows=1270977 loops=1)"
> "                                      Index Cond: (inv_id = 65)"
> "Total runtime: 17533.100 ms"
>
> some additional info.
> the table inventory is about 4481 MB and also has postgis types.
> the table gran_ver is about 523 MB
> the table INVSENSOR is about 217 MB
>
> the server itself has 32G RAM with the following set in the postgres conf
> shared_buffers = 3GB
> work_mem = 64MB
> maintenance_work_mem = 512MB
> wal_buffers = 6MB
>
> let me know if I've forgotten anything! thanks a bunch!!

Late response here, but...

Is there an index on invsensor (sensor_id, granule_id)? If not, that might be something to try. If so, you might want to try to figure out why it's not being used. Likewise, is there an index on gran_ver (granule_id)?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Benchmarking a large server
Greg Smith wrote:
> On 05/09/2011 11:13 PM, Shaun Thomas wrote:
>> Take a look at /proc/sys/vm/dirty_ratio and
>> /proc/sys/vm/dirty_background_ratio if you have an older Linux system,
>> or /proc/sys/vm/dirty_bytes and /proc/sys/vm/dirty_background_bytes
>> with a newer one.
>> On older systems for instance, those are set to 40 and 20 respectively
>> (recent kernels cut these in half).
>
> 1/4 actually; 10% and 5% starting in kernel 2.6.22. The main sources of
> this on otherwise new servers I see are RedHat Linux RHEL5 systems
> running 2.6.18. But as you say, even the lower defaults of the newer
> kernels can be way too much on a system with lots of RAM.

Ugh... we're both right, sort of. 2.6.22 dropped them to 5/10:

http://kernelnewbies.org/Linux_2_6_22

as I said. But on the new Scientific Linux 6 box I installed yesterday, they're at 10/20, as you suggested. Can't believe I'm going to need a table by kernel version and possibly distribution to keep this all straight now; what a mess.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [PERFORM] help speeding up a query in postgres 8.4.5
[ woops, accidentally replied off-list, trying again ]

On Tue, May 10, 2011 at 1:47 PM, Maria L. Wilson wrote:
> thanks for taking a look at this and it's never too late!!
>
> I've tried bumping up work_mem and did not see any improvements -
> All the indexes do exist that you asked see below
> Any other ideas?
>
> CREATE INDEX invsnsr_idx1
>   ON invsensor
>   USING btree
>   (granule_id);
>
> CREATE INDEX invsnsr_idx2
>   ON invsensor
>   USING btree
>   (sensor_id);

What about a composite index on both columns?

> CREATE UNIQUE INDEX granver_idx1
>   ON gran_ver
>   USING btree
>   (granule_id);

It's a bit surprising to me that this isn't getting used. How big are these tables, and how much memory do you have, and what values are you using for seq_page_cost/random_page_cost/effective_cache_size?

...Robert
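A sketch of the composite index Robert suggests (the index name is illustrative; build it and re-run the EXPLAIN ANALYZE to compare):

    -- One index covering both the filter column (sensor_id) and the
    -- join key (granule_id), so the planner can use it instead of the
    -- seq scan on invsensor seen in the plan above.
    CREATE INDEX invsnsr_sensor_granule_idx
        ON invsensor USING btree (sensor_id, granule_id);
    ANALYZE invsensor;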
Re: [PERFORM] partition query on multiple cores
On 10.5.2011 18:22, Shaun Thomas wrote:
> On 05/10/2011 10:06 AM, Maciek Sakrejda wrote:
>
>>> I have 8-core server, I wanted to ask whether a query can be divided for
>>> multiple processors or cores, if it could be what to do in postgresql
>>
>> No, at this time (and for the foreseeable future), a single query will
>> run on a single core.
>
> It can *kinda* be done. Take a look at GridSQL.

Or pgpool-II, which can give you something similar.

http://pgpool.projects.postgresql.org/

regards
Tomas
[PERFORM] Postgres NoSQL emulation
While reading about NoSQL:

> MongoDB lets you store and search JSON objects. In that case, you don't
> need to have the same "columns" in each "row".

The following ensued. Isn't it cute?

CREATE TABLE mongo ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );

INSERT INTO mongo (obj)
SELECT ('a=>'||n||',key'||(n%10)||'=>'||n)::hstore
FROM generate_series(1,100000) n;

SELECT * FROM mongo LIMIT 10;
 id |           obj
----+-------------------------
  1 | "a"=>"1", "key1"=>"1"
  2 | "a"=>"2", "key2"=>"2"
  3 | "a"=>"3", "key3"=>"3"
  4 | "a"=>"4", "key4"=>"4"
  5 | "a"=>"5", "key5"=>"5"
  6 | "a"=>"6", "key6"=>"6"
  7 | "a"=>"7", "key7"=>"7"
  8 | "a"=>"8", "key8"=>"8"
  9 | "a"=>"9", "key9"=>"9"
 10 | "a"=>"10", "key0"=>"10"

CREATE INDEX mongo_a  ON mongo((obj->'a'))    WHERE (obj->'a')    IS NOT NULL;
CREATE INDEX mongo_k1 ON mongo((obj->'key1')) WHERE (obj->'key1') IS NOT NULL;
CREATE INDEX mongo_k2 ON mongo((obj->'key2')) WHERE (obj->'key2') IS NOT NULL;
VACUUM ANALYZE mongo;

SELECT * FROM mongo WHERE (obj->'key1')='271';
 id  |            obj
-----+---------------------------
 271 | "a"=>"271", "key1"=>"271"
(1 ligne)

EXPLAIN ANALYZE SELECT * FROM mongo WHERE (obj->'key1')='271';
 Index Scan using mongo_k1 on mongo  (cost=0.00..567.05 rows=513 width=36) (actual time=0.024..0.025 rows=1 loops=1)
   Index Cond: ((obj -> 'key1'::text) = '271'::text)
 Total runtime: 0.048 ms
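A sketch of an alternative to the per-key expression indexes above: as of PostgreSQL 9.0, the hstore module also ships GiST and GIN operator classes, so a single index can serve containment lookups on arbitrary keys:

    -- One GIN index over the whole hstore column...
    CREATE INDEX mongo_obj_gin ON mongo USING gin (obj);

    -- ...supports @> (contains) queries on any key, not just the
    -- ones you created expression indexes for:
    SELECT * FROM mongo WHERE obj @> 'key1=>271';

The trade-off is a larger index and slower writes in exchange for not having to anticipate the query keys.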
Re: [PERFORM] Question processor speed differences.
On 05/10/2011 01:28 PM, Tory M Blue wrote:
> AMD Opteron(tm) Processor 4174 HE vs Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
>
> I'm wondering if there is a performance difference running postgres on
> fedora on AMD vs Intel (the 2 listed above).

The real limiting factor on CPU performance on your E5345 is how fast the server can shuffle things back and forth to memory. The FB-DIMM DDR2-667MHz memory on that server will be hard pressed to clear 5GB/s of memory access, probably less. That matters a lot when running in-memory database tasks, where the server is constantly shuffling 8K pages of data around.

The new AMD box will have DDR3-1333MHz memory and a much better memory architecture to go with it. I'd expect 6 to 7GB/s out of a single core, and across multiple cores you might hit as much as 20GB/s if you have 4 channels of memory in there. Rough guess, the new server is at least twice as fast, and might even hit four times as fast.

If you have access to both boxes and can find a quiet period, you could try running stream-scaling:

https://github.com/gregs1104/stream-scaling

to quantify for yourself just how big the speed difference in this area is. That's correlated extremely well for me with PostgreSQL performance on SELECT statements.

If you're going to disk instead of being limited by the CPU, none of this matters though. Make sure you really are waiting for the CPUs most of the time.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [PERFORM] Postgres NoSQL emulation
On Tue, May 10, 2011 at 12:56 PM, Pierre C wrote:
> While reading about NoSQL:
>
>> MongoDB lets you store and search JSON objects. In that case, you don't
>> need to have the same "columns" in each "row".
>
> The following ensued. Isn't it cute?
>
> CREATE TABLE mongo ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
> [...]
> EXPLAIN ANALYZE SELECT * FROM mongo WHERE (obj->'key1')='271';
>  Index Scan using mongo_k1 on mongo  (cost=0.00..567.05 rows=513 width=36)
>        (actual time=0.024..0.025 rows=1 loops=1)
>    Index Cond: ((obj -> 'key1'::text) = '271'::text)
>  Total runtime: 0.048 ms

why even have multiple rows? just jam it all in there! :-D

merlin
Re: [PERFORM] Postgres refusing to use >1 core
On 05/11/2011 05:34 AM, Aren Cambre wrote:
> Using one thread, the app can do about 111 rows per second, and it's
> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
> 111 rows per second ~= 30 hours.
>
> I hoped to speed things up with some parallel processing.
>
> When the app is multithreaded, the app itself consumes about 3% CPU time
> waiting for Postgres, which is only hammering 1 core and barely
> exercising disk I/O (per two programs and HDD light).

OK, so before looking at parallelism, you might want to look at why you're not getting much out of Pg and your app with even one thread. You should be able to put a high load on the disk - or one cpu core - without needing to split work out into multiple threads and parallel workers.

I suspect your app is doing lots of tiny single-row queries instead of efficiently batching things. It'll be wasting huge amounts of time waiting for results. Even if every query is individually incredibly fast, with the number of them you seem to be doing you'll lose a LOT of time if you loop over lots of little SELECTs.

The usual cause of the kind of slow performance you describe is an app that "chats" with the database continuously, so its pattern is:

    loop:
        ask for row from database using SELECT
        retrieve result
        do a tiny bit of processing
        continue loop

This is incredibly inefficient, because Pg is always waiting for the app to ask for something or the app is waiting for Pg to return something. During each switch there are delays and inefficiencies. It's actually:

    loop:
        ask for a single row from database using SELECT
        [twiddle thumbs while database plans and executes the query]
        retrieve result
        do a tiny bit of processing
        [Pg twiddles its thumbs]
        continue loop

What you want is your app and Pg working at the same time. Assuming that CPU is the limitation rather than database speed and disk I/O, I'd use something like this:

    Thread 1:
        get cursor for selecting all rows from database
        loop:
            get 100 rows from cursor
            add rows to processing queue
            if queue contains over 1000 rows:
                wait until queue contains less than 1000 rows

    Thread 2:
        until there are no more rows:
            ask Thread 1 for 100 rows
            for each row:
                do a tiny bit of processing

By using a producer/consumer model like that you can ensure that thread 1 is always talking to the database, keeping Pg busy, and thread 2 is always working the CPUs.

The two threads should share NOTHING except the queue, to keep the whole thing simple and clean. You must make sure that the "get 100 rows" operation of the producer can happen even while the producer is in the middle of getting some more rows from Pg (though not necessarily in the middle of actually appending them to the queue data structure) so you don't accidentally serialize on access to the producer thread.

If the single producer thread can't keep up, try reading in bigger batches or adding more producer threads with a shared queue. If the single consumer thread can't keep up with the producer, add more consumers to use more CPU cores:

    [producer 1]  [producer 2]  [...]  [producer n]
         |             |                    |
         ------------------------------------
                     | queue |
         ------------------------------------
         |             |                    |
    [worker 1]    [worker 2]   [...]   [worker n]

... or you can have each worker fetch its own chunks of rows (getting rid of the producer/consumer split) using its own connection and just have lots more workers to handle all the wasted idle time. A producer/consumer approach will probably be faster, though.

If the consumer threads produce a result that must be sent back to the database, you can either have each thread write it to the database using its own connection when it's done, or you can have them delegate that work to another thread that's dedicated to INSERTing the results. If the INSERTer can't keep up, guess what, you spawn more of them working off a shared queue.

If the consumer threads require additional information from the database to do their work, make sure they avoid the:

    loop:
        fetch one row
        do work on row

pattern, instead fetching sets of rows from the database in batches. Use joins if necessary, or the IN() criterion.

--
Craig Ringer
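A minimal SQL sketch of the batched cursor fetching the producer thread would do (the cursor name is illustrative; the table name is taken from earlier in the thread):

    BEGIN;

    -- One server-side cursor over the whole input set...
    DECLARE arrests_cur CURSOR FOR
        SELECT * FROM raw."TxDPS all arrests";

    -- ...drained 100 rows per round trip instead of one SELECT per row.
    FETCH 100 FROM arrests_cur;
    FETCH 100 FROM arrests_cur;   -- repeat until it returns no rows

    CLOSE arrests_cur;
    COMMIT;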
Re: [PERFORM] Postgres refusing to use >1 core
On 11/05/11 05:34, Aren Cambre wrote:
> Using one thread, the app can do about 111 rows per second, and it's
> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
> 111 rows per second ~= 30 hours.

I don't know how I missed that. You ARE maxing out one cpu core, so you're quite right that you need more threads unless you can make your single worker more efficient.

Why not just spawn more copies of your program and have them work on ranges of the data, though? Might that not be simpler than juggling threading schemes?

--
Craig Ringer
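One hedged way to express the "work on ranges" idea in SQL is to give each copy of the program a disjoint slice keyed off the primary key (the id column is an assumption here; the table name is borrowed from earlier in the thread):

    -- Copy 1 of 4:
    SELECT * FROM raw."TxDPS all arrests" WHERE id % 4 = 0;
    -- Copy 2 of 4:
    SELECT * FROM raw."TxDPS all arrests" WHERE id % 4 = 1;
    -- ...and so on. Each process works only its own slice, so four
    -- connections keep four CPU cores busy with no shared state.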
Re: [PERFORM] Postgres NoSQL emulation
On Tue, May 10, 2011 at 3:32 PM, Merlin Moncure wrote:
> why even have multiple rows? just jam it all in there! :-D

Exactly, serialize the object and stuff it into a simple key->value table. Way more efficient than EAV.
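A minimal sketch of that key->value layout, assuming the application serializes each object (e.g. to JSON text) before storing it; the table and key format are illustrative:

    -- One row per object; the database sees only an opaque blob.
    CREATE TABLE kv_store (
        key   text PRIMARY KEY,
        value text NOT NULL        -- serialized object, e.g. JSON
    );

    INSERT INTO kv_store (key, value)
    VALUES ('user:271', '{"a": 271, "key1": 271}');

    SELECT value FROM kv_store WHERE key = 'user:271';

The primary key gives the single lookup path; unlike EAV, there are no per-attribute rows to join back together.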
Re: [PERFORM] Postgres refusing to use >1 core
On Tue, May 10, 2011 at 7:35 PM, Craig Ringer wrote:
> On 11/05/11 05:34, Aren Cambre wrote:
>
>> Using one thread, the app can do about 111 rows per second, and it's
>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>> 111 rows per second ~= 30 hours.
>
> I don't know how I missed that. You ARE maxing out one cpu core, so
> you're quite right that you need more threads unless you can make your
> single worker more efficient.
>
> Why not just spawn more copies of your program and have them work on
> ranges of the data, though? Might that not be simpler than juggling
> threading schemes?

I suggested that earlier. But now I'm wondering if there are efficiencies to be gained by moving all the heavy lifting to the db, as well as splitting things into multiple partitions to work on. I.e. don't grab 1,000 rows, work on them on the client side, and then insert data; do the data mangling in the query, in the database. My experience has been that moving things like this into the database can result in performance gains of several factors, taking hour-long processes and making them run in minutes.
Re: [PERFORM] help speeding up a query in postgres 8.4.5
thanks for taking a look at this and it's never too late!!

I've tried bumping up work_mem and did not see any improvements.
All the indexes you asked about do exist -- see below.
Any other ideas?

CREATE INDEX invsnsr_idx1
  ON invsensor
  USING btree
  (granule_id);

CREATE INDEX invsnsr_idx2
  ON invsensor
  USING btree
  (sensor_id);

CREATE UNIQUE INDEX granver_idx1
  ON gran_ver
  USING btree
  (granule_id);

thanks for the look -

Maria Wilson
NASA/Langley Research Center
Hampton, Virginia 23681
m.l.wil...@nasa.gov

On 5/10/11 1:38 PM, Robert Haas wrote:
> On Tue, Apr 5, 2011 at 3:25 PM, Maria L. Wilson wrote:
>> Would really appreciate someone taking a look at the query below...
>> [explain analyze output quoted earlier in this thread]
>
> Late response here, but...
>
> Is there an index on invsensor (sensor_id, granule_id)? If not, that
> might be something to try. If so, you might want to try to figure out
> why it's not being used. Likewise, is there an index on gran_ver
> (granule_id)?
Re: [PERFORM] help speeding up a query in postgres 8.4.5
haven't tested a composite index

invsensor is 2,003,980 rows and 219MB
granver is 5,138,730 rows and 556MB
the machine has 32G memory

seq_page_cost, random_page_cost & effective_cache_size are set to the defaults (1, 4, and 128MB) - looks like they could be bumped up. Got any recommendations?

Maria

On 5/10/11 1:59 PM, Robert Haas wrote:
> [ woops, accidentally replied off-list, trying again ]
>
> On Tue, May 10, 2011 at 1:47 PM, Maria L. Wilson wrote:
>> thanks for taking a look at this and it's never too late!!
>>
>> I've tried bumping up work_mem and did not see any improvements -
>> All the indexes do exist that you asked see below
>> Any other ideas?
>>
>> CREATE INDEX invsnsr_idx1
>>   ON invsensor
>>   USING btree
>>   (granule_id);
>>
>> CREATE INDEX invsnsr_idx2
>>   ON invsensor
>>   USING btree
>>   (sensor_id);
>
> What about a composite index on both columns?
>
>> CREATE UNIQUE INDEX granver_idx1
>>   ON gran_ver
>>   USING btree
>>   (granule_id);
>
> It's a bit surprising to me that this isn't getting used. How big are
> these tables, and how much memory do you have, and what values are you
> using for seq_page_cost/random_page_cost/effective_cache_size?
>
> ...Robert
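For a 32 GB machine, a commonly cited starting point would be along these lines in postgresql.conf (a hedged sketch, not a tested recommendation for this workload):

    # effective_cache_size is only a planner hint: roughly the RAM that
    # the OS cache plus shared_buffers can devote to caching data.
    # About 3/4 of 32 GB here.
    effective_cache_size = 24GB

    # Leave seq_page_cost at 1.0; if the working set is mostly cached,
    # random_page_cost can come down from the default 4.0 toward 2.0,
    # which makes index scans look cheaper to the planner.
    random_page_cost = 2.0

Changing these only alters plan choice, so the useful test is to re-run the EXPLAIN ANALYZE from earlier in the thread after each adjustment.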