[PERFORM] many instances or many databases or many users?
Hi, I have used PostgreSQL for some years now, but only with small databases, only one database per instance, and one user per database. Now we have a server reserved solely for PostgreSQL, and I'm wondering which is better to set up: - only one instance and many databases, or - many instances and only one database per instance, or - one instance, one database, and many users.

The server will have 8G memory and 2 processors. Earlier we have had problems with difficult queries: some query can take 100% CPU and all other processes slow down. I have used Oracle for many years, and in Oracle it's better to have one instance and many users, but Oracle can handle many difficult queries at the same time; no process (=query) can slow other queries as much as in PostgreSQL.

There is no need to think about safety, maintenance, ... only pure performance! Is one instance capable of using that 8G of memory, and of sharing it between different databases/users as needed? Or will one big and difficult query take all the memory and slow down the whole server? If there are 2 instances, one query can't take all the memory, but the downside is that if instance1 is inactive and instance2 is active, there will be much unused memory (reserved for instance1), and that can produce disk I/O when instance2 reads lots of data and sorts it.

How have you set up your PostgreSQL server?

Ismo

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] many instances or many databases or many users?
[EMAIL PROTECTED] wrote: Now we have a server reserved only for postgresql, and I'm wondering if it is better to set up: - only one instance and many databases or - many instances and only one database/instance or - one instance, one database and many users

It depends. One instance should give you the best overall throughput, because the OS can maximize the use of resources across all users. There shouldn't be any difference between having one instance with many databases and one database with many users.

server will have 8G memory and 2 processors. Earlier we have had problems with difficult queries, some query can take 100% cpu and all other processes have slowed down.

How much data do you have? If it all fits in memory, it's not going to make much difference whether you have one instance or more. If not, you might be better off with many instances dividing the memory between them, giving some level of fairness in the memory allocation. Unfortunately there's no way to stop one query from using 100% CPU (though on a 2-CPU server, it's only going to saturate 1 CPU). If you have difficult queries like that, I'd suggest that you take a look at the access plans to check whether they could benefit from additional indexes or from being rewritten in a more efficient way.

is one instance capable to use that 8G of memory? and share it with different databases/users as needed? or will one big and difficult query take all memory and slow down whole server?

One instance can use all of the 8G of memory. You should set your shared_buffers to maybe 1-2G. People have different opinions on what exactly is the best value; I'd suggest that you try different values to see what gives you the best performance in your application.

if there is 2 instances one query can't take all memory, but downside is that if instance1 is inactive and instance2 is active, there will be much unused memory (reserved for instance1) and that can produce disk io when instance2 reads lots of data and sorts it.

Yep.
how you have set up your postgresql server?

I'd say it's more a question of isolation and administration than performance. For example, do you want to be able to do filesystem-level backups and restores one database at a time? Do you need to shut down one database while keeping the rest of them running?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
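Heikki's sizing advice might translate into a postgresql.conf fragment along these lines. The values are illustrative starting points only, not recommendations from the thread, and should be tuned against the actual workload (the unit suffixes assume 8.2 or later; older releases take shared_buffers in 8 kB pages):

```
# postgresql.conf -- illustrative starting point for a dedicated 8 GB box
shared_buffers = 1GB          # Heikki suggests trying values in the 1-2 GB range
work_mem = 16MB               # per-sort/per-hash budget; big values multiply per query
maintenance_work_mem = 256MB  # for VACUUM, CREATE INDEX, etc.
effective_cache_size = 6GB    # a hint about the OS cache, not an allocation
```

Note that effective_cache_size is only a planner hint; the actual OS cache is managed by the kernel, which is part of why a single instance can make use of all 8 GB.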
Re: [PERFORM] cube operations slower than geo_distance() on production server
On 2/12/07, Mark Stosberg [EMAIL PROTECTED] wrote: Merlin Moncure wrote: Here's the basic query I'm using:

SELECT
  -- 1609.344 is a constant for meters per mile
  cube_distance( (SELECT earth_coords from zipcodes WHERE zipcode = '90210'), earth_coords)/1609.344 AS RADIUS
FROM pets
  -- shelters_active is a view where shelter_state = 'active'
  JOIN shelters_active as shelters USING (shelter_id)
  -- The zipcode fields here are varchars
  JOIN zipcodes ON ( shelters.postal_code_for_joining = zipcodes.zipcode )
-- search for just 'dogs'
WHERE species_id = 1
  AND pet_state='available'
  AND earth_box( (SELECT earth_coords from zipcodes WHERE zipcode = '90210'), 10*1609.344 ) @ earth_coords
ORDER BY RADIUS;

It may not have been clear from the query, but only the 'zipcodes' table has an 'earth_coords' column. Also, I think your refactoring means something different. My query expresses the number of miles this pet is from 90210, while I think the refactor expresses a distance between a pet and another calculated value.

my mistake, i misunderstood what you were trying to do... can you try removing the 'order by radius' and see if it helps? if not, we can try working on this query some more. There is a better, faster way to do this, I'm sure of it.

merlin
Re: [PERFORM] cube operations slower than geo_distance() on production server
On 2/13/07, Merlin Moncure [EMAIL PROTECTED] wrote: my mistake, i misunderstood what you were trying to do... can you try removing the 'order by radius' and see if it helps? if not, we can try working on this query some more. There is a better, faster way to do this, I'm sure of it.

try this:

SELECT * FROM (
  SELECT cube_distance(q.earth_coords, s.earth_coords)/1609.344 AS radius
  FROM pets
  JOIN shelters_active AS shelters USING (shelter_id)
  JOIN zipcodes s ON shelters.postal_code_for_joining = s.zipcode
  JOIN zipcodes q ON q.zipcode = '90210'
  WHERE species_id = 1
    AND pet_state = 'available'
    AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
) p ORDER BY radius;

merlin
[PERFORM] Question about Bitmap Heap Scan/BitmapAnd
Hi all, I'm currently working on optimizing a couple of queries. While studying the EXPLAIN ANALYZE output of a query, I found this Bitmap Heap Scan node:

->  Bitmap Heap Scan on lieu l  (cost=12.46..63.98 rows=53 width=94) (actual time=35.569..97.166 rows=78 loops=1)
      Recheck Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
      Filter: (parking AND (numlieu <> 0))
      ->  BitmapAnd  (cost=12.46..12.46 rows=26 width=0) (actual time=32.902..32.902 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_lieu_earth  (cost=0.00..3.38 rows=106 width=0) (actual time=30.221..30.221 rows=5864 loops=1)
                  Index Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
            ->  Bitmap Index Scan on idx_lieu_parking  (cost=0.00..8.83 rows=26404 width=0) (actual time=0.839..0.839 rows=1095 loops=1)
                  Index Cond: (parking = true)

What surprises me is that parking is in the Filter and not in the Recheck Cond, whereas it's part of the second Bitmap Index Scan of the BitmapAnd node. AFAIK, BitmapAnd builds a bitmap of the pages returned by the two Bitmap Index Scans, so I supposed it would append both Index Conds to the Recheck Cond. Is there a reason why the second Index Cond is in the Filter instead? Does it make a difference in terms of performance (I suppose not, but I'd like to have confirmation)? Thanks.

-- Guillaume
Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd
Guillaume Smet [EMAIL PROTECTED] writes: What surprises me is that parking is in the filter and not in the Recheck Cond whereas it's part of the second Bitmap Index Scan of the Bitmap And node.

That's probably because of this:

    /*
     * When dealing with special or lossy operators, we will at this point
     * have duplicate clauses in qpqual and bitmapqualorig. We may as well
     * drop 'em from bitmapqualorig, since there's no point in making the
     * tests twice.
     */
    bitmapqualorig = list_difference_ptr(bitmapqualorig, qpqual);

What's not immediately clear is why the condition was in both lists to start with. Perhaps idx_lieu_parking is a partial index with this as its WHERE condition?

regards, tom lane
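The snippet Tom quotes relies on list_difference_ptr(), which removes from the first list every cell whose pointer also appears in the second list, i.e. it compares by identity, not by equality. A toy Python analogue of that behavior (this is an illustration, not the backend code; the clause values are made up):

```python
def list_difference_ptr(a, b):
    """Return the members of a that are not in b, comparing by object
    identity rather than structural equality -- a rough analogue of the
    backend's list_difference_ptr()."""
    b_ids = {id(x) for x in b}
    return [x for x in a if id(x) not in b_ids]

# The filter list (qpqual) and the recheck list (bitmapqualorig) share the
# *same* clause object for "parking = true", so the duplicate is dropped
# from the recheck list and only the lossy cube clause remains there:
parking_clause = ["parking", "=", "true"]
lossy_clause = ["earth_box", "@", "earth_coords"]
qpqual = [parking_clause]                        # filter conditions
bitmapqualorig = [lossy_clause, parking_clause]  # recheck conditions

bitmapqualorig = list_difference_ptr(bitmapqualorig, qpqual)
print(bitmapqualorig)  # only the lossy clause is left in the recheck list
```

An equal-but-distinct clause object would survive the difference, which is the point of the `_ptr` variant: only the very same clause node is treated as a duplicate.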
[PERFORM] Proximity query with GIST and row estimation
Hi all, Following the work of Mark Stosberg on this list (thanks Mark!), I optimized our slow proximity queries by using cube, earthdistance (shipped with contrib) and a GiST index. The result is globally very interesting apart from one specific query, and we'd like to be able to fix that one too, to be more consistent (it's currently faster with a basic distance calculation based on acos, cos and so on, but it's slow anyway). The problem is that we sometimes have very few places near a given location (a small city) and sometimes a lot of them (in Paris, Bruxelles and so on - the case we have here). The row estimate for the area covered by the gist index I created is not very good.

Table: lieu (100k rows) with wgslat and wgslon as numeric
Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG')
Index: idx_lieu_earth gist (ll_to_earth(wgslat::double precision, wgslon::double precision))

The simplified query is:

SELECT DISTINCT l.numlieu, l.nomlieu,
       ROUND(earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) AS dist
FROM lieu l, lieugelieu lgl
WHERE lgl.codegelieu = 'PKG'
  AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon)
  AND lgl.numlieu = l.numlieu
ORDER BY dist ASC LIMIT 2;

It's used to find the nearest car parks to a given location. The plan is attached as plan_earthdistance_nestedloop.txt. It uses a nested loop because the row estimate is pretty bad: (cost=0.00..3.38 rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1). If I disable nested loops, the plan is different and faster (see plan_earthdistance_hash.txt attached).

Is there any way to improve this estimation? I tried setting the statistics of wgslat and wgslon higher, but it doesn't change anything (I don't know if the operator is designed to use the statistics). Any other idea to optimize this query is very welcome too.
-- Guillaume

db=# explain analyze SELECT DISTINCT l.numlieu, l.nomlieu, ROUND (earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) as dist
db-# FROM lieu l, lieugelieu lgl
db-# WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = l.numlieu ORDER BY dist ASC LIMIT 2;

QUERY PLAN
---
 Limit  (cost=626.84..626.85 rows=1 width=51) (actual time=449.287..449.298 rows=2 loops=1)
   ->  Unique  (cost=626.84..626.85 rows=1 width=51) (actual time=449.283..449.290 rows=2 loops=1)
         ->  Sort  (cost=626.84..626.84 rows=1 width=51) (actual time=449.278..449.279 rows=2 loops=1)
               Sort Key: round(sec_to_gc(cube_distance('(4192714.86111655, 171959.656483755, 4803394.52951123)'::cube, (ll_to_earth((l.wgslat)::double precision, (l.wgslon)::double precision))::cube))), l.numlieu, l.nomlieu
               ->  Nested Loop  (cost=3.38..626.83 rows=1 width=51) (actual time=258.877..448.651 rows=78 loops=1)
                     ->  Bitmap Heap Scan on lieu l  (cost=3.38..201.34 rows=106 width=51) (actual time=32.988..60.197 rows=5786 loops=1)
                           Recheck Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
                           ->  Bitmap Index Scan on idx_lieu_earth  (cost=0.00..3.38 rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1)
                                 Index Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
                     ->  Index Scan using idx_lieugelieu_codegelieu_numlieu_principal on lieugelieu lgl  (cost=0.00..4.00 rows=1 width=4) (actual time=0.052..0.052 rows=0 loops=5786)
                           Index Cond: (((lgl.codegelieu)::text = 'PKG'::text) AND (lgl.numlieu = outer.numlieu))
 Total runtime: 449.607 ms
(12 rows)

db=# explain analyze SELECT DISTINCT l.numlieu, l.nomlieu, ROUND
(earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) as dist db-# FROM lieu l, lieugelieu lgl db-# WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = l.numlieu ORDER BY dist ASC LIMIT 2;
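The cube/earthdistance machinery in Guillaume's query boils down to a common pattern: prefilter candidates with a cheap bounding-box test (what the earth_box index probe does), then compute the exact great-circle distance only for the survivors and sort by it. A standalone Python sketch of that idea, using the haversine formula rather than the contrib code; the place names and degree-to-meter constants are illustrative assumptions:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean earth radius; earthdistance uses a similar constant

def great_circle_m(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in meters."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))

def nearest(origin, places, radius_m, limit=2):
    """Cheap bounding-box prefilter (like the earth_box @ index probe),
    then exact distance sort (like earth_distance) on the survivors."""
    lat0, lon0 = origin
    dlat = radius_m / 111320.0  # rough meters per degree of latitude
    dlon = radius_m / (111320.0 * max(cos(radians(lat0)), 1e-9))
    box = [p for p in places
           if abs(p[1] - lat0) <= dlat and abs(p[2] - lon0) <= dlon]
    box.sort(key=lambda p: great_circle_m(lat0, lon0, p[1], p[2]))
    return box[:limit]

# Hypothetical car parks around the Paris coordinates used in the thread:
parks = [("A", 48.8600, 2.3500), ("B", 48.8700, 2.3600), ("C", 48.0000, 2.3486)]
print(nearest((48.859576, 2.348608), parks, 1750))
```

The box test is deliberately lossy (it admits corners farther away than the radius), which mirrors why the index condition is rechecked or refined with the exact distance in the SQL version.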
Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd
On 2/13/07, Tom Lane [EMAIL PROTECTED] wrote: bitmapqualorig = list_difference_ptr(bitmapqualorig, qpqual); What's not immediately clear is why the condition was in both lists to start with. Perhaps idx_lieu_parking is a partial index with this as its WHERE condition?

Yes, it is: idx_lieu_parking btree (parking) WHERE parking = true. Sorry for not pointing that out immediately. If the index is not partial, it is not used at all (there are very few rows in lieu with parking=true). So the basic explanation is that the condition is in both lists due to the partial index, and only qpqual keeps it? I would have expected the opposite, but it doesn't change anything, I suppose? Thanks for your answer.

-- Guillaume
Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd
Guillaume Smet [EMAIL PROTECTED] writes: So the basic explanation is that it's in both lists due to the partial index and only qpqual keeps the condition? I would have expected the opposite but it doesn't change anything I suppose? It gets the right answer, yes. I'm not sure if we could safely put the condition into the recheck instead of the filter. The particular code I showed you has to go the direction it does, because a condition in the filter has to be checked even if the bitmap is not lossy. I seem to recall concluding that we had to recheck partial-index conditions even if the bitmap is not lossy, but I can't reconstruct my reasoning at the moment. regards, tom lane
Re: [PERFORM] CPU Usage
On Tuesday 13 February 2007 10:36, Campbell, Lance [EMAIL PROTECTED] wrote: We have 12+ schemas in 1 database. When I do a unix top command I notice one postmaster process has 100% CPU usage. This process just stays at 100% to 99% CPU usage. There are other postmaster processes that pop up. They use hardly any CPU or memory. They also disappear very fast. I am wondering, is postgres only using one processor for database queries? Is there something I need to do to tell postgres to use more than one processor? Or does postgres only use up to one processor for any particular database?

Each connection to the cluster gets a dedicated backend process, which can only be scheduled on one processor at a time. So, in effect, a single query can only use one processor at a time, but any number of other backends can be simultaneously using the other CPUs. It does not matter which database they are operating on.

-- It is a besetting vice of democracies to substitute public opinion for law. - James Fenimore Cooper
[PERFORM] JOIN to a VIEW makes a real slow query
Hi folks, I don't know if this is an SQL or PERFORMANCE list problem but I wanted to check here first. I've seen this discussed on the list before but I'm still not sure of the solution. Maybe my query is just structured wrong.

I recently visited an old project of mine that has 'city', 'state', and 'country' tables. The city data comes from multiple sources and totals about 3 million rows. I decided to split the city table up based on the source (world_city, us_city). This makes updating easier because the assigned feature ids from the two sources overlap in some cases, making it impossible to update them as a single merged table. However, I decided to create a view to behave like the old 'city' table. The view is just a simple:

SELECT [columns] FROM world_city
UNION
SELECT [columns] FROM us_city;

Selecting from the view is very quick, but JOINing to the view is slow: about 65 seconds to select a city. It doesn't matter whether it is joined to one table or 6 like it is in my user_detail query - it is still slow. There are indexes on the city_id, state_id, and country_id of each table in the view too. Everything has been 'VACUUM ANALYZE'ed. When using explain analyze on the view I get this:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# left JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;

QUERY PLAN
--
 Nested Loop Left Join  (cost=650146.58..751018.45 rows=10618 width=55) (actual time=53078.261..61269.190 rows=1 loops=1)
   Join Filter: (outer.city_id = inner.?column1?)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.010..0.022 rows=1 loops=1)
   ->  Unique  (cost=650146.58..703236.51 rows=2123597 width=62) (actual time=49458.007..59635.140 rows=2122712 loops=1)
         ->  Sort  (cost=650146.58..655455.58 rows=2123597 width=62) (actual time=49458.003..55405.965 rows=2122712 loops=1)
               Sort Key: city_id, state_id, country_id, cc1, rc, adm1, lat, lon, city_name
               ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual time=18.835..13706.395 rows=2122712 loops=1)
                     ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 width=62) (actual time=18.832..620.553 rows=169398 loops=1)
                     ->  Seq Scan on world_city  (cost=0.00..47632.88 rows=1954188 width=61) (actual time=23.513..11193.341 rows=1953314 loops=1)
 Total runtime: 61455.471 ms
(10 rows)

Time: 61512.377 ms

So: a sequential scan on the tables in the view, and it won't use the index. Then I do the same query, replacing the view with the real table:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# left JOIN geo.world_city AS ci ON (mu.city_id = ci.city_id)
cmi-# ;

QUERY PLAN
--
 Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=36) (actual time=53.854..53.871 rows=1 loops=1)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.010..0.016 rows=1 loops=1)
   ->  Index Scan using world_city_pk on world_city ci  (cost=0.00..3.01 rows=1 width=17) (actual time=53.825..53.833 rows=1 loops=1)
         Index Cond: (outer.city_id = ci.city_id)
 Total runtime: 53.989 ms
(5 rows)

Time: 56.234 ms

I'm not sure that a view on a UNION is the best idea, but I don't know how else to keep the tables from the two data sources separate behind the view (other than modifying them with a source_id column). Any ideas on what is causing the performance lag?
Re: [PERFORM] JOIN to a VIEW makes a real slow query
On 2/13/07, Chuck D. [EMAIL PROTECTED] wrote: Selecting from the view is very quick, but JOINing to the view is slow. About 65 seconds to select a city.

use 'union all' instead of union. union without all has an implied sort and duplicate-removal step that has to be resolved, materializing the view, before you can join to it.

merlin
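Merlin's point — UNION implies a sort-and-deduplicate pass that UNION ALL skips — can be illustrated outside SQL. A rough Python analogue (the city names are made up for illustration):

```python
# UNION ALL: plain concatenation, no extra work -- the executor can stream
# rows from each branch straight into the join.
def union_all(a, b):
    return a + b

# UNION: the whole result must be materialized, sorted, and stripped of
# duplicates before anything can join against it -- the Sort and Unique
# nodes Chuck saw in his EXPLAIN output.
def union_distinct(a, b):
    return sorted(set(a + b))

us_city = ["Boston", "Chicago", "Paris"]   # 'Paris' exists in both sources
world_city = ["Paris", "Bruxelles"]

print(union_all(us_city, world_city))       # 5 rows, duplicates kept
print(union_distinct(us_city, world_city))  # 4 rows, sorted and deduplicated
```

When the two source tables are already disjoint, as in Chuck's case, the deduplication work buys nothing, which is why UNION ALL is the right definition for the view.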
[PERFORM] quad or dual core Intel CPUs
I am about to pull the trigger on a new machine after analyzing some discussions I posted here last year. I've been trying to spec out a reliable and powerful enough machine that I won't have to replace for some time. Currently I've been using a dual Xeon 3.06ghz with 4GB of ram, utilizing a RAID 1+0 configuration over a total of 6 SCSI disks aside from the OS partition. We have about 10GB of data and will probably grow at about 1GB per month. We currently average about 200 queries per second and the 15-minute load average is about .30. I am running FreeBSD 6.1.

At the end of last year, I specced out a new machine to replace this box. At that time, the quad core 2.66ghz chips were not available from my vendor and I was not planning to go that route. Now that they are available, I am considering the option. The main question here is whether FreeBSD 6.X and PostgreSQL 8.1 would be able to take advantage of the quad core and perform better than the 3.0Ghz dual core. The reason I ask is due to some conflicting benchmarking results I see posted on the spec.org website. Here is the full specification of the new box I hope to build and run FreeBSD 6.X and PostgreSQL on:

- SuperMicro Dual Xeon X7DBE+ motherboard
  + 2 x Quad Core X5355 2.66Ghz OR
  + 2 x Dual Core 5160 3.0Ghz
- 8 x 1GB PC2-4200 fully buffered DIMM
- LSI MegaRAID SAS 8408E w/BBU 256MB
- 16 x 73GB SAS disk

So, question #1: go dual core or quad core? Quad core in theory seems to scale the machine's processing potential by almost a factor of two. And lastly, up till now I've only had experience configuring SCSI RAID controllers. I believe this LSI MegaRAID unit has a dual channel setup, but when it comes to SAS drives, I don't know what kind of flexibility this provides. How should the disks be partitioned for maximum PostgreSQL performance? I'm thinking about keeping it simple, assuming that the hot spare can only be utilized one per channel, leaving me only 14 disks to utilize:
1 RAID1 partition using 2 disks total for the OS
1 RAID1+0 using 12 disks total, striping over 6.

If I am not able to utilize both channels to create a 12-disk RAID1+0 array, then it might be better to create 2 separate data partitions, one for WAL/pg_xlog and the rest for the data store. Please comment on any issues you may see with this box and my assumptions. Also any FreeBSD kernel issues or tweaks you could recommend.

Sincerely, Kenji
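Kenji's disk arithmetic (16 disks, one hot spare per channel, a RAID1 pair for the OS, RAID1+0 for data) can be sanity-checked with a small helper. This is illustrative arithmetic only, not a recommendation about his controller:

```python
def usable_disks(total, os_mirror=2, hot_spares=2):
    """Disks left for the data array after an OS RAID1 pair and hot spares."""
    return total - os_mirror - hot_spares

def raid10_capacity_gb(disks, disk_gb):
    """RAID1+0 keeps half the raw capacity: each mirror pair stores one copy,
    and the stripe runs across the pairs (6 stripes for 12 disks here)."""
    return (disks // 2) * disk_gb

data_disks = usable_disks(16)  # 16 - 2 (OS mirror) - 2 (spares) = 12
print(data_disks, raid10_capacity_gb(data_disks, 73))
```

With 10GB of data growing ~1GB/month, even half of 12 x 73GB leaves years of headroom, so carving off a pair of disks for a dedicated WAL/pg_xlog partition, as Kenji suggests, costs little capacity.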
Re: [PERFORM] JOIN to a VIEW makes a real slow query
On Tuesday 13 February 2007 13:16, Merlin Moncure wrote: use 'union all' instead of union. union without all has an implied sort and duplicate removal step that has to be resolved, materializing the view, before you can join to it.

Thanks for that Merlin, I forgot about using ALL. That does eliminate the UNIQUE, SORT and SORT lines from the EXPLAIN output. It also brings the query time down from a whopping 65 seconds to 11 seconds. The two tables contain unique rows already, so ALL would be required.

It is still using that sequence scan on the view after the APPEND for the us_city and world_city tables. Any reason why the view won't use the indexes when it is JOINed to another table, but will when the view is queried without a JOIN? I should have mentioned this is v8.1.4. Also, does anyone know why this line:

Join Filter: (outer.city_id = inner.?column1?)

... contains ?column1? instead of the actual column name? This is the result after UNION ALL on the view:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;

QUERY PLAN
-
 Nested Loop Left Join  (cost=0.00..121523.88 rows=10618 width=55) (actual time=2392.376..11061.117 rows=1 loops=1)
   Join Filter: (outer.city_id = inner.?column1?)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.025..0.028 rows=1 loops=1)
   ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual time=16.120..9644.315 rows=2122712 loops=1)
         ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 width=62) (actual time=16.119..899.802 rows=169398 loops=1)
         ->  Seq Scan on world_city  (cost=0.00..47632.88 rows=1954188 width=61) (actual time=10.585..6949.946 rows=1953314 loops=1)
 Total runtime: 11061.441 ms
(7 rows)
Re: [PERFORM] JOIN to a VIEW makes a real slow query
Chuck D. [EMAIL PROTECTED] writes: It is still using that sequence scan on the view after the APPEND for the us_city and world_city table. Any reason why the view won't use the indexes when it is JOINed to another table but it will when the view is queried without a JOIN? I should have mentioned this is v8.1.4. 8.1 isn't bright enough for that. Should work in 8.2 though. Also, does anyone know why this line: Join Filter: (outer.city_id = inner.?column1?) ... contains ?column1? instead of the actual column name? EXPLAIN can't conveniently get access to the column name. That could probably be improved if someone wanted to put enough effort into it. regards, tom lane
Re: [PERFORM] quad or dual core Intel CPUs
Hi Kenji,

On 13-2-2007 20:46 Kenji Morishige wrote: Here is the full specification of the new box I hope to build and run FreeBSD 6.X and PostgreSQL on: - SuperMicro Dual Xeon X7DBE+ motherboard + 2 x Quad Core X5355 2.66Ghz OR + 2 x Dual Core 5160 3.0Ghz - 8 x 1GB PC2-4200 fully buffered DIMM - LSI MegaRAID SAS 8408E w/BBU 256MB - 16 x 73GB SAS disk

If this is in one of those 4U cases, make very, very sure it can properly exhaust all the heat generated and has a more than adequate power supply. When going for a similar machine, we got a negative advice on such a set-up from a server vendor who builds such machines themselves. Don't forget that the FB-dimms run pretty hot and need sufficient cooling. As you can see in these pictures, Fujitsu thought it necessary to add fan ducts for the memory: http://tweakers.net/reviews/646/7 Our own Dell systems have similar ducts. But a third-party server builder we tested did not include those, and the machine ran very hot (I couldn't touch the bottom for more than a short time) in a not-too-well-ventilated, but mostly empty, server rack. That was a 2U machine, though, and it didn't include any disks.

Currently we have had good experience with our new Dell 1950 (2x 5160, PC5300 FBD) combined with a Dell MD1000 SAS disk unit (15x 15k 36G disks) described in the second review linked below. HP offers similar options, and there are probably several other suppliers who can build something like that too. Separate SAS-JBOD disk units are available from other suppliers as well.

So, question #1, to go dual core or quad core? Quad core in theory seems to scale the machine's processing potential by almost a factor of two.

I can partially answer that question, though only for Linux + PostgreSQL 8.2. In that case, postgresql can take advantage of the extra cores.
See our review here: http://tweakers.net/reviews/661 This includes comparisons between the X5355 and 5160 with postgresql on the seventh page, here: http://tweakers.net/reviews/661/7

But be aware that there can be substantial and unexpected differences on this relatively new platform from simply changing the OS, like we saw when going from linux 2.6.15 to 2.6.18, as you can see here: http://tweakers.net/reviews/657/2

Our benchmark has relatively little writing and a smallish dataset (fits in 4GB of memory), so I don't know how much use these benchmarks are for you. But the conclusion was that the extra processor power isn't fully available, possibly because there is less memory bandwidth per processor core and more communication overhead. Then again, in our test the dual quad core was faster than the dual dual core.

And lastly, up till now, I've only had experience configuring SCSI RAID controllers. I believe this LSI MegaRAID unit has a dual channel setup, but when it comes to SAS drives, I don't know what kind of flexibility this provides. How should the disks be partitioned for maximum PostgreSQL performance? I'm thinking about keeping it simple, assuming that the hot spare can only be utilized one per channel, leaving me only 14 disks to utilize.

In that Dell 1950 we use the Dell PERC5/E SAS controller for the database, which is based on that same LSI controller, although it has 2 external SAS connectors (4 channels each). Afaik it supports global hot spares. But we use the full set of 15 disks as a 14+1 disk raid5, so I haven't looked at that too closely. For the OS we have a separate PERC5/i internal raid controller with two internal disks. My colleague also tested several raid set-ups with that equipment, and we chose a raid5 for its slightly better read performance.
If you can make something of these Dutch pages, you can have a look at those results here: http://tweakers.net/benchdb/test/122 Play around with the form at the bottom of the page to see some comparisons between several raid set-ups. The SAS configurations are of course the ones with the Fujitsu MAX3036RC 36GB disks and Dell PERC 5/E controller.

If I am not able to utilize both channels to create a 12 disk RAID1+0 array, then it might be better to create 2 separate data partitions, one for WAL/pg_xlog and the rest for the data store.

I'm not too sure how you can connect your disks and controller to a SAS expander (which you need to connect more than 8 disks to a controller). I believe it is possible to use a 24-port expander, allowing communication between the 16 disks and 8 ports of your controller. A SAS expander normally comes with the enclosure/disk unit, but I have no idea about the details. Our own testing was done using just a single 4-port connector, which can handle 1.2GB/sec (afaik this B is for bytes), and we believe that's sufficient for our 15 disks.

Please comment on any issues you may see with this box and my assumptions. Also any FreeBSD kernel issues or tweaks you could recommend.

Have a very good look at your heat production and exhaust
Re: [PERFORM] quad or dual core Intel CPUs
Kenji Morishige wrote: Please comment on any issues you may see with this box and my assumptions. Also any FreeBSD kernel issues or tweaks you could recommend. I would recommend posting to freebsd-hardware or freebsd-stable and asking if there are any gotchas with the X7DBE+ and 6.2 (for instance X7DBR-8+ suffers from an intermittent hang at boot... so can't hurt to ask!) Cheers Mark
Re: [PERFORM] quad or dual core Intel CPUs
Arjen van der Meijden wrote: But be aware that there can be substantial and unexpected differences on this relatively new platform due to simply changing the OS, like we saw when going from linux 2.6.15 to 2.6.18, as you can see here: http://tweakers.net/reviews/657/2

Having upgraded to 2.6.18 fairly recently, I am *very* interested in what caused the throughput to drop in 2.6.18. I haven't done any benchmarking on my system to know if it affected my usage pattern negatively, but I am curious if anyone knows why this happened?

-Dan
Re: [PERFORM] quad or dual core Intel CPUs
Dan, On 2/13/07, Dan Harris [EMAIL PROTECTED] wrote: Having upgraded to 2.6.18 fairly recently, I am *very* interested in what caused the throughput to drop in 2.6.18? I haven't done any benchmarking on my system to know if it affected my usage pattern negatively, but I am curious if anyone knows why this happened? I think you misread the graph. PostgreSQL 8.2 seems to be approximately 20% faster with kernel 2.6.18 on the platforms tested (and using tweakers.net benchmark). -- Guillaume