Re: [PERFORM] An unwanted seqscan
Hi Tom,

Sorry, I didn't ask the right question. I meant to ask "Why does it estimate a smaller cost for the seqscan?" With some further staring I was able to find the bad estimate and fix it by increasing the relevant statistics target.

Thanks,
Brian

----- Original Message -----
From: Tom Lane <[EMAIL PROTECTED]>
To: Brian Herlihy <[EMAIL PROTECTED]>
Cc: Postgresql Performance
Sent: Wednesday, 14 February, 2007 4:53:54 PM
Subject: Re: [PERFORM] An unwanted seqscan

Brian Herlihy <[EMAIL PROTECTED]> writes:
> I am having trouble understanding why a seqscan is chosen for this query.

As far as anyone can see from this output, the planner's decisions are correct: it prefers the plans with the smaller estimated cost. If you want us to take an interest, provide some more context --- EXPLAIN ANALYZE output for starters.

regards, tom lane
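For reference, a minimal sketch of the fix Brian describes. The table and column names here are assumptions drawn from the plans quoted later in this thread; the idea is simply to raise the per-column statistics target and re-analyze:

-- Hedged sketch: raise the statistics target on the column whose estimate
-- was off (names are assumptions based on the posted plans), then refresh
-- the statistics so the planner sees the finer-grained histogram.
ALTER TABLE meta_keywords ALTER COLUMN url_id SET STATISTICS 1000;
ANALYZE meta_keywords;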
Re: [PERFORM] Benchmarking PGSQL?
Krishna,

> I'm a performance engineer, quite interested in getting deep into the
> PGSQL performance enhancement effort. In that regard, I have the
> following questions:
> 1. Is there a benchmarking setup, that I can access online?
> 2. What benchmarks are we running, for performance numbers?
> 3. What are the current issues, related to performance?
> 4. Where can I start, with the PGSQL performance effort?

Hey, I work for Sun and we've been working on PostgreSQL & benchmarks. Hopefully we will soon have a server which runs Spec benchmarks which the community can legally use for testing (it's waiting on some setup issues).

Help we could use right now includes work on an open source TPCE-like workload, being run by Rilson and Mark Wong. Another issue which could use help is reducing our WAL transaction log volume; it's higher than any other database. Or you could work on multi-processor scalability; we are still trying to identify the bottlenecks which are holding us back there.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [PERFORM] Proximity query with GIST and row estimation
You'll find that PostGIS does a pretty good job of selectivity estimation.

P

On 13-Feb-07, at 9:09 AM, Guillaume Smet wrote:

Hi all,

Following the work of Mark Stosberg on this list (thanks Mark!), I optimized our slow proximity queries by using cube, earthdistance (shipped with contrib) and a GiST index. The result is globally very interesting apart from one specific query, and we'd like to be able to fix that one too to be more consistent (it's currently faster with a basic distance calculation based on acos, cos and so on, but it's slow anyway).

The problem is that we sometimes have very few places near a given location (small city) and sometimes a lot of them (in Paris, Bruxelles and so on - it's the case we have here). The GiST index I created doesn't estimate the number of rows in the area very well.

Table: lieu (100k rows) with wgslat and wgslon as numeric
Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG')
Index: "idx_lieu_earth" gist (ll_to_earth(wgslat::double precision, wgslon::double precision))

The simplified query is:

SELECT DISTINCT l.numlieu, l.nomlieu,
       ROUND(earth_distance(ll_to_earth(48.85957600, 2.34860800),
                            ll_to_earth(l.wgslat, l.wgslon))) as dist
  FROM lieu l, lieugelieu lgl
 WHERE lgl.codegelieu = 'PKG'
   AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon)
   AND lgl.numlieu = l.numlieu
 ORDER BY dist ASC LIMIT 2;

It's used to find the nearest car parks from a given location.

The plan is attached as plan_earthdistance_nestedloop.txt. It uses a nested loop because the row estimate is pretty bad: (cost=0.00..3.38 rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1).

If I disable the nested loop, the plan is different and faster (see plan_earthdistance_hash.txt attached). Is there any way to improve this estimation? I tried to set the statistics of wgslat and wgslon higher but it doesn't change anything (I don't know if the operator is designed to use the statistics).

Any other idea to optimize this query is very welcome too.

--
Guillaume
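For reference, the session-local experiment Guillaume mentions (disabling the nested loop to compare the two plans) looks like this. It is a diagnostic only, not a production fix:

SET enable_nestloop = off;           -- session-local, for plan comparison only
EXPLAIN ANALYZE
SELECT DISTINCT l.numlieu, l.nomlieu,
       ROUND(earth_distance(ll_to_earth(48.85957600, 2.34860800),
                            ll_to_earth(l.wgslat, l.wgslon))) as dist
  FROM lieu l, lieugelieu lgl
 WHERE lgl.codegelieu = 'PKG'
   AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon)
   AND lgl.numlieu = l.numlieu
 ORDER BY dist ASC LIMIT 2;
RESET enable_nestloop;               -- restore the default afterwards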
Re: [PERFORM] How long should it take to insert 200,000 records?
Unless you specify otherwise, every insert carries its own transaction begin/commit. That's a lot of overhead for a single insert, no? Why not use a single transaction for, say, each 1000 inserts? That would strike a nice balance of safety with efficiency. Pseudo code for the insert:

Begin Transaction;
FOR i in 1..200000 LOOP
   INSERT INTO viewfoo (x) VALUES (x);
   IF i % 1000 = 0 THEN
      Commit Transaction;
      Begin Transaction;
   END IF;
END LOOP;
Commit Transaction;
End

This approach should speed things up dramatically.

>>> "Karen Hill" <[EMAIL PROTECTED]> 2/6/2007 2:39 PM >>>
On Feb 5, 9:33 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> "Karen Hill" <[EMAIL PROTECTED]> writes:
> > I have a pl/pgsql function that is inserting 200,000 records for
> > testing purposes. What is the expected time frame for this operation
> > on a pc with 1/2 a gig of ram and a 7200 RPM disk?
>
> I think you have omitted a bunch of relevant facts. Bare INSERT is
> reasonably quick:
>
> regression=# create table foo (f1 int);
> CREATE TABLE
> regression=# \timing
> Timing is on.
> regression=# insert into foo select x from generate_series(1,200000) x;
> INSERT 0 200000
> Time: 5158.564 ms
> regression=#
>
> (this on a not-very-fast machine) but if you weigh it down with a ton
> of index updates, foreign key checks, etc, it could get slow ...
> also you haven't mentioned what else that plpgsql function is doing.

The postgres version is 8.2.1 on Windows. The pl/pgsql function is inserting to an updatable view (basically two tables).

CREATE TABLE foo1 ( );
CREATE TABLE foo2 ( );
CREATE VIEW viewfoo AS ( );
CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD ( );

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
   FOR i in 1..200000 LOOP
      INSERT INTO viewfoo (x) VALUES (x);
   END LOOP;
END;
$$ LANGUAGE plpgsql;
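One caveat on the batching pseudo code above: a plpgsql function body runs entirely within a single transaction, and a function in 8.x cannot COMMIT partway through, so that kind of batching has to be driven from the client. A set-based alternative, sketched here with the names from Karen's example, avoids the loop entirely:

-- Sketch: one statement, one transaction, instead of 200,000 row-at-a-time
-- INSERTs; the DO INSTEAD rule on viewfoo rewrites it just like the looped
-- version would be rewritten.
INSERT INTO viewfoo (x)
SELECT x FROM generate_series(1, 200000) AS x;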
Re: [PERFORM] JOIN to a VIEW makes a real slow query
On Tuesday 13 February 2007 14:51, Tom Lane wrote:
> "Chuck D." <[EMAIL PROTECTED]> writes:
> > It is still using that sequence scan on the view after the APPEND for the
> > us_city and world_city table. Any reason why the view won't use the
> > indexes when it is JOINed to another table but it will when the view is
> > queried without a JOIN? I should have mentioned this is v8.1.4.
>
> 8.1 isn't bright enough for that. Should work in 8.2 though.
>
> regards, tom lane

Upgraded to 8.2.3 in my spare time here - went from the packaged binary that came with Ubuntu to compiling from source. Haven't tuned it yet, but what do you think about this join on the view?

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..17.76 rows=10614 width=486) (actual time=0.109..0.113 rows=1 loops=1)
   Join Filter: (mu.city_id = ci.city_id)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=72) (actual time=0.015..0.017 rows=1 loops=1)
   ->  Append  (cost=0.00..16.72 rows=2 width=422) (actual time=0.073..0.075 rows=1 loops=1)
         ->  Index Scan using pk_us_city on us_city  (cost=0.00..8.28 rows=1 width=222) (actual time=0.032..0.032 rows=0 loops=1)
               Index Cond: (mu.city_id = us_city.city_id)
         ->  Index Scan using world_city_pk on world_city  (cost=0.00..8.44 rows=1 width=422) (actual time=0.040..0.042 rows=1 loops=1)
               Index Cond: (mu.city_id = world_city.city_id)
 Total runtime: 0.359 ms
(9 rows)

From 65 seconds down to less than 1 ms. Pretty good huh? Nice call, Tom. Now I'll have to find some time to do the production server before this app goes up.
Re: [PERFORM] reindex vs 'analyze'
On Wed, Feb 14, 2007 at 01:07:23PM -0500, Tom Lane wrote:
> Mark Stosberg <[EMAIL PROTECTED]> writes:
> > Your suggestion about the pet_state index was right on. I tried
> > "Analyze" on it, but still got the same bad estimate. However, I then
> > used "reindex" on that index, and that fixed the estimate accuracy,
> > which made the query run faster!
>
> No, the estimate is about the same, and so is the plan. The data seems
> to have changed though --- on Monday you had
>
>    ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..562.50 rows=39571 width=0) (actual time=213.620..213.620 rows=195599 loops=82)
>          Index Cond: ((pet_state)::text = 'available'::text)
>
> and now it's
>
>    ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..285.02 rows=41149 width=0) (actual time=22.043..22.043 rows=40397 loops=82)
>          Index Cond: ((pet_state)::text = 'available'::text)
>
> Don't tell me you got 155000 pets adopted out yesterday ... what
> happened here?

That seemed to be the difference that the "reindex" made. The number of rows in the table and the number marked "available" is roughly unchanged.

select count(*) from pets;
 304951
(1 row)

select count(*) from pets where pet_state = 'available';
 39857
(1 row)

It appears just about 400 were marked as "adopted" yesterday.

> [ thinks... ] One possibility is that those were dead but
> not-yet-vacuumed rows. What's your vacuuming policy on this table?

It gets vacuum analyzed every two hours throughout most of the day. Once nightly we vacuum analyze everything, but most of the time we just do this table.

> (A bitmap-index-scan plan node will count dead rows as returned,
> unlike all other plan node types, since we haven't actually visited
> the heap yet...)

Thanks again for your help, Tom.

   Mark

--
 . . . . . . . . . . . . . . . . . . . . . . . . . . .
   Mark Stosberg            Principal Developer
   [EMAIL PROTECTED]        Summersault, LLC
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .
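Along the lines of Tom's dead-row theory, a quick check is possible with VACUUM VERBOSE, which reports how many removable and nonremovable row versions it finds. A sketch, using the table from this thread:

-- Dead-but-unvacuumed rows would still be counted by a bitmap index scan;
-- the VERBOSE output shows how many dead row versions the table carried.
VACUUM VERBOSE pets;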
Re: [PERFORM] quad or dual core Intel CPUs
> > Approx. 200 request a sec. should be a problem unless the queries are heavy.
>
> Thanks Claus, that's good news! I'm having a reputable vendor build the
> box and test it for me before delivering. The bottom line of your message,
> did you mean 'should be not a problem'? I wonder what the main reason for
> your improvement is; your ram was increased by a factor of 2, but 4 way
> opteron vs 4 way woodcrest performance must not be that significant.

Sorry, the line should read 'should *not* be a problem', pardon the confusion. So 200 queries/s should be fine, probably won't make the server sweat.

I'm not sure what attributed most to the decrease when the load went from approx. 100 during peak to less than 1! Since the db-server is such a vital part of our infrastructure, I was reluctant to upgrade it while load was below 10. But in November and December - when we have our most busy time - our website slowed to a crawl, thus phasing a new server in was an easy decision.

The woodcrest is a better performer compared to the current opteron, the ciss-disk-controller also has 256 MB cache compared to the 64 MB LSI-logic controller in the former db-server, and FreeBSD 6.2 is also a better performer than 6.0, but I haven't done any benchmarking on the same hardware.

regards
Claus
Re: [PERFORM] Benchmarking PGSQL?
Hi Merlin,

On 2/14/07 8:20 AM, "Merlin Moncure" <[EMAIL PROTECTED]> wrote:
> I am curious what is your take on the maximum insert performance, in
> mb/sec of large bytea columns (toasted), and how much if any greenplum
> was able to advance this over the baseline. I am asking on behalf of
> another interested party. Interested in numbers broken down per core
> on an 8 core quad system and also aggregate.

Our approach is to attach a segment to each core, so we scale INSERT linearly on the number of cores. So the per core limit we live with is the 10-20MB/s observed here. We'd like to improve that so that we get better performance with smaller machines.

We have demonstrated insert performance of 670 MB/s, 2.4TB/hour for non-toasted columns using 3 load machines against 120 cores. This rate was load machine limited.

WRT toasted bytea columns, we haven't done any real benchmarking of those. Do you have a canned benchmark we can run?

- Luke
Re: [PERFORM] Proximity query with GIST and row estimation
Paul,

On 2/14/07, Paul Ramsey <[EMAIL PROTECTED]> wrote:
> You'll find that PostGIS does a pretty good job of selectivity
> estimation.

PostGIS is probably what I'm going to experiment with in the future. The only problem is that it's really big for a very basic need. With my current method, I don't even have to create a new column: I create a functional index directly, so it's really easy to use. Using PostGIS requires creating a new column and triggers to maintain it, and installing PostGIS of course. That's why it was not my first choice.

Thanks for your answer.

--
Guillaume
Re: [PERFORM] reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)
Mark Stosberg <[EMAIL PROTECTED]> writes:
> Your suggestion about the pet_state index was right on. I tried
> "Analyze" on it, but still got the same bad estimate. However, I then
> used "reindex" on that index, and that fixed the estimate accuracy,
> which made the query run faster!

No, the estimate is about the same, and so is the plan. The data seems to have changed though --- on Monday you had

   ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..562.50 rows=39571 width=0) (actual time=213.620..213.620 rows=195599 loops=82)
         Index Cond: ((pet_state)::text = 'available'::text)

and now it's

   ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..285.02 rows=41149 width=0) (actual time=22.043..22.043 rows=40397 loops=82)
         Index Cond: ((pet_state)::text = 'available'::text)

Don't tell me you got 155000 pets adopted out yesterday ... what happened here?

[ thinks... ] One possibility is that those were dead but not-yet-vacuumed rows. What's your vacuuming policy on this table? (A bitmap-index-scan plan node will count dead rows as returned, unlike all other plan node types, since we haven't actually visited the heap yet...)

regards, tom lane
Re: [PERFORM] quad or dual core Intel CPUs
Thanks Claus, that's good news! I'm having a reputable vendor build the box and test it for me before delivering. The bottom line of your message, did you mean 'should be not a problem'?

I wonder what the main reason for your improvement is; your ram was increased by a factor of 2, but 4 way opteron vs 4 way woodcrest performance must not be that significant.

-Kenji

> We recently migrated from a four way opteron @ 2 GHz with 8 GB to a
> four way woodcrest @ 3 GHz (HP DL380 G5) with 16 GB ram. I also
> upgraded FreeBSD from 6.0 to 6.2 and did a minor upgrade of postgresql
> from 7.4.9 to 7.4.12. The change was tremendous; the first few hours
> after it went into production I had to doublecheck that our website
> worked, since the load was way below 1 whereas the load had been
> almost 100 during peak.
>
> I don't have any financial ties to HP, but building a server from
> scratch may not be worth it; rather than spending time assembling all
> the different parts yourself I would suggest you get a server from one
> vendor who builds the server according to your specs.
>
> The DL380 (also) has a 256 MB bbc controller, the nic works flawlessly
> with FreeBSD 6.2, all parts are well integrated, and the front bay can
> accommodate 8 x 146 GB SAS drives. This server is well suited as a
> postgresql-server.
>
> Approx. 200 request a sec. should be a problem unless the queries are heavy.
>
> regards
> Claus
[PERFORM] reindex vs 'analyze' (was: Re: cube operations slower than geo_distance() on production server)
Merlin Moncure wrote:
> On 2/14/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> > There are two things wrong here: first, that the estimated row count
> > is only 20% of actual; it should certainly not be that far off for
> > such a simple condition. I wonder if your vacuum/analyze procedures
> > are actually working. Second, you mentioned somewhere along the line
> > that 'available' pets are about 10% of all the entries, which means
> > that this indexscan is more than likely entirely counterproductive:
> > it would be cheaper to ignore this index altogether.

Tom,

Thanks for the generosity of your time. We are using 8.1.3 currently. I have read there are some performance improvements in 8.2, but we have not started evaluating that yet.

Your suggestion about the pet_state index was right on. I tried "Analyze" on it, but still got the same bad estimate. However, I then used "reindex" on that index, and that fixed the estimate accuracy, which made the query run faster! The cube search now benchmarks faster than the old search in production, taking about 2/3s of the time of the old one.

Any ideas why the manual REINDEX did something that "analyze" didn't? It makes me wonder if there is other tuning like this to do.

Attached is the EXPLAIN ANALYZE output from the most recent run, after the "reindex".

> I think switching the index on pet_state to a composite on (pet_state,
> species_id) might help too. or even better:
>
> create function is_pet_available(text) returns bool as
> $$
>   select $1='available';
> $$ language sql immutable;
>
> create index pets_available_species_idx on
> pets(is_pet_available(pet_state), species_id);

Merlin,

Thanks for this suggestion. It is not an approach I had used before, and I was interested to try it. However, the new index didn't get chosen. (Perhaps I would need to drop the old one?) However, Tom's suggestions did help. I'll follow up on that in just a moment.
> refactor your query something similar to:
>
> SELECT * FROM
> (
>   SELECT earth_distance(q.earth_coords, s.earth_coords)/1609.344 as radius
>   FROM pets
>   JOIN shelters_active as shelters USING (shelter_id)
>   JOIN zipcodes s ON shelters.postal_code_for_joining = s.zipcode
>   JOIN zipcodes q ON q.zipcode = '90210'
>   WHERE is_pet_available(pet_state)
>     AND species_id = 1
>     AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
> ) p order by radius
>
> merlin

[Attachment: EXPLAIN ANALYZE output after the REINDEX]

Sort  (cost=5276.93..5277.00 rows=28 width=64) (actual time=1981.830..1984.415 rows=1344 loops=1)
  Sort Key: (cube_distance(public.zipcodes.earth_coords, public.zipcodes.earth_coords) / 1609.344::double precision)
  ->  Nested Loop  (cost=291.32..5276.26 rows=28 width=64) (actual time=24.080..1976.479 rows=1344 loops=1)
        ->  Nested Loop  (cost=2.15..575.79 rows=11 width=68) (actual time=2.637..34.067 rows=131 loops=1)
              ->  Nested Loop  (cost=2.15..153.48 rows=42 width=73) (actual time=1.939..3.972 rows=240 loops=1)
                    ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.283..0.287 rows=1 loops=1)
                          Index Cond: ((zipcode)::text = '90210'::text)
                    ->  Bitmap Heap Scan on zipcodes  (cost=2.15..149.84 rows=42 width=41) (actual time=1.403..2.323 rows=240 loops=1)
                          Recheck Cond: (cube_enlarge(("outer".earth_coords)::cube, 16093.4357308298::double precision, 3) @ zipcodes.earth_coords)
                          ->  Bitmap Index Scan on zip_earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=1.377..1.377 rows=240 loops=1)
                                Index Cond: (cube_enlarge(("outer".earth_coords)::cube, 16093.4357308298::double precision, 3) @ zipcodes.earth_coords)
              ->  Index Scan using shelters_postal_code_for_joining_idx on shelters  (cost=0.00..10.03 rows=2 width=12) (actual time=0.064..0.118 rows=1 loops=240)
                    Index Cond: ((shelters.postal_code_for_joining)::text = ("outer".zipcode)::text)
                    Filter: ((shelter_state)::text = 'active'::text)
        ->  Bitmap Heap Scan on pets  (cost=289.17..426.86 rows=35 width=4) (actual time=14.362..14.746 rows=10 loops=131)
              Recheck Cond: ((pets.shelter_id = "outer".shelter_id) AND ((pets.pet_state)::text = 'available'::text))
              Filter: (species_id = 1)
              ->  BitmapAnd  (cost=289.17..289.17 rows=35 width=0) (actual time=14.219..14.219 rows=0 loops=131)
                    ->  Bitmap Index Scan on pets_shelter_id_idx  (cost=0.00..3.89 rows=256 width=0) (actual time=0.188..0.188 rows=168 loops=131)
                          Index Cond: (pets.shelter_id = "outer".shelter_id)
                    ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..285.02 rows=41149 width=0) (actual time=22.043..22.043 rows=40397 loops=82)
                          Index Cond: ((pet_state)::text = 'available'::text)
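On the expression index above not being chosen: a sketch of what to check, assuming the planner will only consider the index when the query uses the indexed expression verbatim and has fresh statistics for it (dropping the competing index is Mark's own speculation, not a requirement):

-- Hedged sketch: re-analyze so the expression has statistics, then query
-- with the exact indexed expression and check the plan.
ANALYZE pets;
EXPLAIN ANALYZE
SELECT count(*) FROM pets
 WHERE is_pet_available(pet_state) AND species_id = 1;
-- DROP INDEX pets_pet_state_idx;   -- optional experiment, per the question above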
Re: [PERFORM] Benchmarking PGSQL?
On 2/14/07, Luke Lonergan <[EMAIL PROTECTED]> wrote:
> Here's one: Insert performance is limited to about 10-12 MB/s no matter
> how fast the underlying I/O hardware. Bypassing the WAL (write ahead
> log) only boosts this to perhaps 20 MB/s. We've found that the biggest
> time consumer in the profile is the collection of routines that
> "convert to datum".
>
> You can perform the test using any dataset, you might consider using
> the TPC-H benchmark kit with a data generator available at www.tpc.org.
> Just generate some data, load the schema, then perform some COPY
> statements, INSERT INTO SELECT FROM and CREATE TABLE AS SELECT.

I am curious what is your take on the maximum insert performance, in mb/sec of large bytea columns (toasted), and how much if any greenplum was able to advance this over the baseline. I am asking on behalf of another interested party. Interested in numbers broken down per core on an 8 core quad system and also aggregate.

merlin
Re: [PERFORM] Benchmarking PGSQL?
Here's one: Insert performance is limited to about 10-12 MB/s no matter how fast the underlying I/O hardware. Bypassing the WAL (write ahead log) only boosts this to perhaps 20 MB/s. We've found that the biggest time consumer in the profile is the collection of routines that "convert to datum".

You can perform the test using any dataset, you might consider using the TPC-H benchmark kit with a data generator available at www.tpc.org. Just generate some data, load the schema, then perform some COPY statements, INSERT INTO SELECT FROM and CREATE TABLE AS SELECT.

- Luke

On 2/14/07 2:00 AM, "Krishna Kumar" <[EMAIL PROTECTED]> wrote:
> Hello All,
> I'm a performance engineer, quite interested in getting deep into the
> PGSQL performance enhancement effort. In that regard, I have the
> following questions:
> 1. Is there a benchmarking setup, that I can access online?
> 2. What benchmarks are we running, for performance numbers?
> 3. What are the current issues, related to performance?
> 4. Where can I start, with the PGSQL performance effort?
>
> Thanks a lot,
> Krishna
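As a concrete sketch of that recipe: lineitem is the standard TPC-H table, but the file path and the *_copy/*_ctas table names below are illustrative assumptions, and dbgen's trailing '|' on each row may need stripping before the COPY.

-- Load dbgen output, then exercise the three write paths mentioned above.
COPY lineitem FROM '/tmp/lineitem.tbl' DELIMITER '|';          -- bulk load
CREATE TABLE lineitem_copy AS SELECT * FROM lineitem LIMIT 0;  -- empty clone
INSERT INTO lineitem_copy SELECT * FROM lineitem;              -- INSERT ... SELECT
CREATE TABLE lineitem_ctas AS SELECT * FROM lineitem;          -- CREATE TABLE AS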
Re: [PERFORM] cube operations slower than geo_distance() on production server
On 2/14/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> There are two things wrong here: first, that the estimated row count is
> only 20% of actual; it should certainly not be that far off for such a
> simple condition. I wonder if your vacuum/analyze procedures are
> actually working. Second, you mentioned somewhere along the line that
> 'available' pets are about 10% of all the entries, which means that
> this indexscan is more than likely entirely counterproductive: it would
> be cheaper to ignore this index altogether.

I think switching the index on pet_state to a composite on (pet_state, species_id) might help too. or even better:

create function is_pet_available(text) returns bool as
$$
  select $1='available';
$$ language sql immutable;

create index pets_available_species_idx on pets(is_pet_available(pet_state), species_id);

refactor your query something similar to:

SELECT * FROM
(
  SELECT earth_distance(q.earth_coords, s.earth_coords)/1609.344 as radius
  FROM pets
  JOIN shelters_active as shelters USING (shelter_id)
  JOIN zipcodes s ON shelters.postal_code_for_joining = s.zipcode
  JOIN zipcodes q ON q.zipcode = '90210'
  WHERE is_pet_available(pet_state)
    AND species_id = 1
    AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
) p order by radius

merlin
Re: [PERFORM] Benchmarking PGSQL?
Have you tried pgbench yet?

--
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 2/14/07, Krishna Kumar <[EMAIL PROTECTED]> wrote:
> Hello All,
> I'm a performance engineer, quite interested in getting deep into the
> PGSQL performance enhancement effort. In that regard, I have the
> following questions:
> 1. Is there a benchmarking setup, that I can access online?
> 2. What benchmarks are we running, for performance numbers?
> 3. What are the current issues, related to performance?
> 4. Where can I start, with the PGSQL performance effort?
>
> Thanks a lot,
> Krishna
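For completeness, a minimal pgbench session looks something like this, run from a shell; the scale factor, client count, and database name are arbitrary examples:

pgbench -i -s 10 testdb        # initialize the pgbench tables at scale factor 10
pgbench -c 8 -t 1000 testdb    # 8 concurrent clients, 1000 transactions each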
[PERFORM] Benchmarking PGSQL?
Hello All,

I'm a performance engineer, quite interested in getting deep into the PGSQL performance enhancement effort. In that regard, I have the following questions:

1. Is there a benchmarking setup, that I can access online?
2. What benchmarks are we running, for performance numbers?
3. What are the current issues, related to performance?
4. Where can I start, with the PGSQL performance effort?

Thanks a lot,
Krishna
Re: [PERFORM] quad or dual core Intel CPUs
> I am about to pull the trigger on a new machine after analyzing some
> discussions I posted here last year. I've been trying to spec out a
> reliable and powerful enough machine where I won't have to replace it
> for some time. Currently I've been using a dual Xeon 3.06ghz with 4GB
> of ram and utilizing a RAID 1+0 configuration over a total of 6 SCSI
> disks aside from the OS partition. We have about 10GB of data and will
> probably scale at about 1GB per month. We currently average about 200
> queries per second and the 15 minute load average is about .30. I am
> running FreeBSD 6.1.
>
> At the end of last year, I specced out a new machine to replace this
> box. At that time, the quad core 2.66ghz were not available from my
> vendor and I was not planning to go that route. Now that they are
> available, I am considering the option. The main question here is
> whether FreeBSD 6.X and PostgreSQL 8.1 would be able to take advantage
> of the quad core and perform better than the 3.0Ghz dual core. The
> reason I ask is due to some conflicting benchmarking results I see
> posted on the spec.org website.
>
> Here is the full specification of the new box I hope to build and run
> FreeBSD 6.X and PostgreSQL on:
>
> - SuperMicro Dual Xeon X7DBE+ motherboard
>   + 2 x Quad Core X5355 2.66Ghz OR
>   + 2 x Dual Core 5160 3.0Ghz
> - 8 x 1GB PC2-4200 fully buffered DIMM
> - LSI MegaRAID SAS 8408E w/BBU 256MB
> - 16 x 73GB SAS disk
>
> So, question #1, to go dual core or quad core? Quad core in theory
> seems to scale the machine's processing potential by almost a factor
> of two.

We recently migrated from a four way opteron @ 2 GHz with 8 GB to a four way woodcrest @ 3 GHz (HP DL380 G5) with 16 GB ram. I also upgraded FreeBSD from 6.0 to 6.2 and did a minor upgrade of postgresql from 7.4.9 to 7.4.12. The change was tremendous; the first few hours after it went into production I had to doublecheck that our website worked, since the load was way below 1 whereas the load had been almost 100 during peak.

I don't have any financial ties to HP, but building a server from scratch may not be worth it; rather than spending time assembling all the different parts yourself I would suggest you get a server from one vendor who builds the server according to your specs.

The DL380 (also) has a 256 MB bbc controller, the nic works flawlessly with FreeBSD 6.2, all parts are well integrated, and the front bay can accommodate 8 x 146 GB SAS drives. This server is well suited as a postgresql-server.

Approx. 200 request a sec. should be a problem unless the queries are heavy.

regards
Claus
Re: [PERFORM] An unwanted seqscan
Brian Herlihy <[EMAIL PROTECTED]> writes:
> I am having trouble understanding why a seqscan is chosen for this query.

As far as anyone can see from this output, the planner's decisions are correct: it prefers the plans with the smaller estimated cost. If you want us to take an interest, provide some more context --- EXPLAIN ANALYZE output for starters.

regards, tom lane
[PERFORM] An unwanted seqscan
Hi,

I am having trouble understanding why a seqscan is chosen for this query. In practice the seqscan is very expensive, whereas the nested loop is usually quite fast, even with several hundred rows returned from meta_keywords_url. The server is running version 8.1.3, and both tables were analyzed recently. meta_keywords contains around 25% dead rows, meta_keywords_url contains no dead rows. I have included the query written both as a subquery and as a join.

Thanks for any assistance!

Brian

live=> explain select * from meta_keywords where url_id in (select url_id from meta_keywords_url where host = 'postgresql.org');
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Hash IN Join  (cost=1755.79..545380.52 rows=9442 width=29)
   Hash Cond: ("outer".url_id = "inner".url_id)
   ->  Seq Scan on meta_keywords  (cost=0.00..507976.54 rows=7110754 width=29)
   ->  Hash  (cost=1754.35..1754.35 rows=576 width=4)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1754.35 rows=576 width=4)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=576 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
(8 rows)

live=> set enable_seqscan=off;
SET
live=> explain select * from meta_keywords where url_id in (select url_id from meta_keywords_url where host = 'postgresql.org');
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Nested Loop  (cost=1755.79..3161748.83 rows=9442 width=29)
   ->  HashAggregate  (cost=1755.79..1761.55 rows=576 width=4)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1754.35 rows=576 width=4)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=576 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
   ->  Index Scan using meta_keywords_url_id on meta_keywords  (cost=0.00..5453.28 rows=2625 width=29)
         Index Cond: (meta_keywords.url_id = "outer".url_id)
(8 rows)

live=> explain select * from meta_keywords join meta_keywords_url using (url_id) where host = 'postgresql.org';
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Hash Join  (cost=1758.52..543685.43 rows=9297 width=107)
   Hash Cond: ("outer".url_id = "inner".url_id)
   ->  Seq Scan on meta_keywords  (cost=0.00..506859.29 rows=6994929 width=28)
   ->  Hash  (cost=1757.08..1757.08 rows=577 width=83)
         ->  Bitmap Heap Scan on meta_keywords_url  (cost=11.02..1757.08 rows=577 width=83)
               Recheck Cond: ((host)::text = 'postgresql.org'::text)
               ->  Bitmap Index Scan on meta_keywords_url_host_path  (cost=0.00..11.02 rows=577 width=0)
                     Index Cond: ((host)::text = 'postgresql.org'::text)
(8 rows)

live=> set enable_seqscan=off;
SET
live=> explain select * from meta_keywords join meta_keywords_url using (url_id) where host = 'postgresql.org';
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..3348211.21 rows=9297 width=107)
   ->  Index Scan using meta_keywords_url_host_path on meta_keywords_url  (cost=0.00..2230.24 rows=577 width=83)
         Index Cond: ((host)::text = 'postgresql.org'::text)
   ->  Index Scan using meta_keywords_url_id on meta_keywords  (cost=0.00..5765.81 rows=2649 width=28)
         Index Cond: (meta_keywords.url_id = "outer".url_id)
(5 rows)
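One place to look for the source of the inflated inner-scan estimate (rows=2625 per url_id in the nested loop plans above) is the planner's column statistics. A sketch, using the standard pg_stats view:

-- If n_distinct for url_id is estimated far too low, the planner expects
-- thousands of matches per url_id, which makes the nested loop look
-- expensive next to the seqscan; raising the statistics target (the fix
-- reported in the follow-up to this thread) refines that estimate.
SELECT n_distinct, null_frac, correlation
  FROM pg_stats
 WHERE tablename = 'meta_keywords' AND attname = 'url_id';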