[PERFORM] many instances or many databases or many users?

2007-02-13 Thread ismo . tuononen
Hi,

I have used PostgreSQL for some years now, but only with small databases, and 
only one database per instance and one user per database.

Now we have a server reserved only for PostgreSQL, and I'm wondering whether 
it is better to set up:
- only one instance and many databases, or
- many instances with only one database per instance, or
- one instance, one database and many users

The server will have 8 GB of memory and 2 processors.

Earlier we had problems with difficult queries: a single query could take 
100% CPU, and all other processes slowed down.

I have used Oracle for many years, and in Oracle it's better to have one 
instance and many users; Oracle can handle many difficult queries at the same 
time, and no single process (= query) slows down other queries as much as in 
PostgreSQL.

There is no need to think about safety, maintenance, ... only pure performance!

Is one instance capable of using that 8 GB of memory, sharing it between 
different databases/users as needed? Or will one big, difficult query take 
all the memory and slow down the whole server?

If there are 2 instances, one query can't take all the memory, but the 
downside is that if instance1 is inactive and instance2 is active, a lot of 
memory sits unused (reserved for instance1), and that can produce disk I/O 
when instance2 reads lots of data and sorts it.

How have you set up your PostgreSQL server?

Ismo



Re: [PERFORM] many instances or many databases or many users?

2007-02-13 Thread Heikki Linnakangas

[EMAIL PROTECTED] wrote:
Now we have a server reserved only for PostgreSQL, and I'm wondering whether 
it is better to set up:

- only one instance and many databases, or
- many instances with only one database per instance, or
- one instance, one database and many users


It depends. One instance should give you the best overall throughput, 
because the OS can maximize the use of resources across all users.

There shouldn't be any difference between having one instance with many 
databases and one instance with one database and many users.
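
As a minimal sketch, the two equivalent setups would look like this (all 
names here are hypothetical):

  -- one instance, many databases
  CREATE DATABASE app1;
  CREATE DATABASE app2;

  -- one instance, one database, many users (e.g. one schema per user)
  CREATE USER app1_user WITH PASSWORD 'secret';
  CREATE USER app2_user WITH PASSWORD 'secret';
  CREATE SCHEMA app1 AUTHORIZATION app1_user;
  CREATE SCHEMA app2 AUTHORIZATION app2_user;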



The server will have 8 GB of memory and 2 processors.

Earlier we had problems with difficult queries: a single query could take 
100% CPU, and all other processes slowed down.


How much data do you have? If it all fits in memory, it's not going to 
make much difference whether you have one instance or more. If not, you 
might be better off with many instances dividing the memory between 
them, which gives some level of fairness in the memory allocation.


Unfortunately there's no way to stop one query from using 100% CPU 
(though on a 2-CPU server, it's only going to saturate one CPU). If you 
have difficult queries like that, I'd suggest taking a look at their 
access plans to check whether they could benefit from added indexes or 
from being rewritten in a more efficient way.
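
For example, something along these lines (a sketch only; the table, column 
and index names are made up):

  EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

  -- if the plan shows a sequential scan over a large table,
  -- an index on the filtered column may help:
  CREATE INDEX orders_customer_id_idx ON orders (customer_id);
  ANALYZE orders;  -- refresh the planner statistics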


Is one instance capable of using that 8 GB of memory, sharing it between 
different databases/users as needed? Or will one big, difficult query take 
all the memory and slow down the whole server?


One instance can use all of the 8 GB of memory. You should set your 
shared_buffers to maybe 1-2 GB. People have different opinions on what 
exactly the best value is; I'd suggest trying different values to see 
what gives you the best performance in your application.
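
For instance, in postgresql.conf (a sketch; treat the value as a starting 
point only - note that memory units like '1GB' require 8.2, while on 8.1 and 
earlier you give a number of 8 kB buffers instead, e.g. 131072 for 1 GB):

  shared_buffers = 1GB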


If there are 2 instances, one query can't take all the memory, but the 
downside is that if instance1 is inactive and instance2 is active, a lot of 
memory sits unused (reserved for instance1), and that can produce disk I/O 
when instance2 reads lots of data and sorts it.


Yep.


How have you set up your PostgreSQL server?


I'd say it's more a question of isolation and administration than 
performance. For example, do you want to be able to do filesystem-level 
backups and restores of one database at a time? Do you need to shut down 
one database while keeping the rest of them running?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] cube operations slower than geo_distance() on production server

2007-02-13 Thread Merlin Moncure

On 2/12/07, Mark Stosberg [EMAIL PROTECTED] wrote:

Merlin Moncure wrote:

 Here's the basic query I'm using:

 SELECT
   -- 1609.344 is a constant for meters per mile
   cube_distance( (SELECT earth_coords FROM zipcodes WHERE zipcode = '90210'), earth_coords )/1609.344 AS RADIUS
 FROM pets
 -- shelters_active is a view where shelter_state = 'active'
 JOIN shelters_active AS shelters USING (shelter_id)
 -- The zipcode fields here are varchars
 JOIN zipcodes ON (shelters.postal_code_for_joining = zipcodes.zipcode)
 -- search for just 'dogs'
 WHERE species_id = 1
   AND pet_state = 'available'
   AND earth_box( (SELECT earth_coords FROM zipcodes WHERE zipcode = '90210'), 10*1609.344 ) @ earth_coords
 ORDER BY RADIUS;

It may not have been clear from the query, but only the 'zipcodes' table
has an 'earth_coords' column. Also, I think your refactoring means
something different: my query expresses the number of miles this pet is
from 90210, while I think the refactor expresses a distance between a
pet and another calculated value.


My mistake, I misunderstood what you were trying to do... can you try
removing the 'order by radius' and see if it helps? If not, we can try
working on this query some more. There is a better, faster way to do
this, I'm sure of it.

merlin



Re: [PERFORM] cube operations slower than geo_distance() on production server

2007-02-13 Thread Merlin Moncure

On 2/13/07, Merlin Moncure [EMAIL PROTECTED] wrote:

On 2/12/07, Mark Stosberg [EMAIL PROTECTED] wrote:
 Merlin Moncure wrote:
 
  Here's the basic query I'm using:

  SELECT
    -- 1609.344 is a constant for meters per mile
    cube_distance( (SELECT earth_coords FROM zipcodes WHERE zipcode = '90210'), earth_coords )/1609.344 AS RADIUS
  FROM pets
  -- shelters_active is a view where shelter_state = 'active'
  JOIN shelters_active AS shelters USING (shelter_id)
  -- The zipcode fields here are varchars
  JOIN zipcodes ON (shelters.postal_code_for_joining = zipcodes.zipcode)
  -- search for just 'dogs'
  WHERE species_id = 1
    AND pet_state = 'available'
    AND earth_box( (SELECT earth_coords FROM zipcodes WHERE zipcode = '90210'), 10*1609.344 ) @ earth_coords
  ORDER BY RADIUS;
 
 It may not have been clear from the query, but only the 'zipcodes' table
 has an 'earth_coords' column. Also, I think your refactoring means
 something different: my query expresses the number of miles this pet is
 from 90210, while I think the refactor expresses a distance between a
 pet and another calculated value.

My mistake, I misunderstood what you were trying to do... can you try
removing the 'order by radius' and see if it helps? If not, we can try
working on this query some more. There is a better, faster way to do
this, I'm sure of it.


Try this (the distance call should be cube_distance, and the second join 
needs to reference the alias s):

SELECT * FROM
(
  SELECT
    cube_distance(q.earth_coords, s.earth_coords)/1609.344 AS radius
  FROM pets
  JOIN shelters_active AS shelters USING (shelter_id)
  JOIN zipcodes s ON shelters.postal_code_for_joining = s.zipcode
  JOIN zipcodes q ON q.zipcode = '90210'
  WHERE species_id = 1
    AND pet_state = 'available'
    AND earth_box(q.earth_coords, 10*1609.344) @ s.earth_coords
) p ORDER BY radius;

merlin



[PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-13 Thread Guillaume Smet

Hi all,

I'm currently working on optimizing a couple of queries. While
studying the EXPLAIN ANALYZE output of one of them, I found this Bitmap
Heap Scan node:

->  Bitmap Heap Scan on lieu l  (cost=12.46..63.98 rows=53 width=94) (actual time=35.569..97.166 rows=78 loops=1)
      Recheck Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
      Filter: (parking AND (numlieu <> 0))
      ->  BitmapAnd  (cost=12.46..12.46 rows=26 width=0) (actual time=32.902..32.902 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_lieu_earth  (cost=0.00..3.38 rows=106 width=0) (actual time=30.221..30.221 rows=5864 loops=1)
                  Index Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
            ->  Bitmap Index Scan on idx_lieu_parking  (cost=0.00..8.83 rows=26404 width=0) (actual time=0.839..0.839 rows=1095 loops=1)
                  Index Cond: (parking = true)

What surprises me is that parking is in the Filter and not in the
Recheck Cond, whereas it's part of the second Bitmap Index Scan of the
BitmapAnd node.
AFAIK, BitmapAnd builds a bitmap of the pages returned by the two
Bitmap Index Scans, so I supposed it would append both Index Conds to
the Recheck Cond.

Is there a reason why the second Index Cond is in the Filter? Does it
make a difference in terms of performance (I suppose not, but I'd like
to have confirmation)?

Thanks.

--
Guillaume



Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-13 Thread Tom Lane
Guillaume Smet [EMAIL PROTECTED] writes:
 What surprises me is that parking is in the Filter and not in the
 Recheck Cond, whereas it's part of the second Bitmap Index Scan of the
 BitmapAnd node.

That's probably because of this:

/*
 * When dealing with special or lossy operators, we will at this point
 * have duplicate clauses in qpqual and bitmapqualorig.  We may as well
 * drop 'em from bitmapqualorig, since there's no point in making the
 * tests twice.
 */
bitmapqualorig = list_difference_ptr(bitmapqualorig, qpqual);

What's not immediately clear is why the condition was in both lists to
start with.  Perhaps idx_lieu_parking is a partial index with this as
its WHERE condition?

regards, tom lane



[PERFORM] Proximity query with GIST and row estimation

2007-02-13 Thread Guillaume Smet

Hi all,

Following the work of Mark Stosberg on this list (thanks Mark!), I
optimized our slow proximity queries by using cube and earthdistance
(shipped with contrib) and a GiST index. The result is globally very
interesting, apart from one specific query, which we'd like to fix as
well for consistency (it's currently faster with a basic distance
calculation based on acos, cos and so on, but it's slow either way).

The problem is that we sometimes have very few places near a given
location (a small city) and sometimes a lot of them (in Paris, Bruxelles
and so on - the case we have here). The GiST index I created
doesn't estimate the number of rows in the area very well.

Table: lieu (100k rows) with wgslat and wgslon as numeric
Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG')
Index: idx_lieu_earth gist (ll_to_earth(wgslat::double precision,
wgslon::double precision))
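
As DDL, that index would be created with something like this (a sketch):

  CREATE INDEX idx_lieu_earth ON lieu
  USING gist (ll_to_earth(wgslat::double precision, wgslon::double precision));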

The simplified query, used to find the nearest car parks from a given
location, is:

SELECT DISTINCT l.numlieu, l.nomlieu,
       ROUND(earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) AS dist
FROM lieu l, lieugelieu lgl
WHERE lgl.codegelieu = 'PKG'
  AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon)
  AND lgl.numlieu = l.numlieu
ORDER BY dist ASC LIMIT 2;

The plan is attached as plan_earthdistance_nestedloop.txt. It uses a
nested loop because the row estimate is pretty bad: (cost=0.00..3.38
rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1).

If I disable the nested loop, the plan is different and faster (see
plan_earthdistance_hash.txt attached).

Is there any way to improve this estimate? I tried setting the
statistics target of wgslat and wgslon higher, but it doesn't change anything
(I don't know if the operator is designed to use the statistics).
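
That attempt was along these lines (a sketch; 1000 is the maximum
statistics target on 8.x):

  ALTER TABLE lieu ALTER COLUMN wgslat SET STATISTICS 1000;
  ALTER TABLE lieu ALTER COLUMN wgslon SET STATISTICS 1000;
  ANALYZE lieu;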

Any other idea to optimize this query is very welcome too.

--
Guillaume
db=# explain analyze SELECT DISTINCT l.numlieu, l.nomlieu, ROUND(earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) as dist
db-# FROM lieu l, lieugelieu lgl
db-# WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = l.numlieu ORDER BY dist ASC LIMIT 2;

                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=626.84..626.85 rows=1 width=51) (actual time=449.287..449.298 rows=2 loops=1)
   ->  Unique  (cost=626.84..626.85 rows=1 width=51) (actual time=449.283..449.290 rows=2 loops=1)
         ->  Sort  (cost=626.84..626.84 rows=1 width=51) (actual time=449.278..449.279 rows=2 loops=1)
               Sort Key: round(sec_to_gc(cube_distance('(4192714.86111655, 171959.656483755, 4803394.52951123)'::cube, (ll_to_earth((l.wgslat)::double precision, (l.wgslon)::double precision))::cube))), l.numlieu, l.nomlieu
               ->  Nested Loop  (cost=3.38..626.83 rows=1 width=51) (actual time=258.877..448.651 rows=78 loops=1)
                     ->  Bitmap Heap Scan on lieu l  (cost=3.38..201.34 rows=106 width=51) (actual time=32.988..60.197 rows=5786 loops=1)
                           Recheck Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
                           ->  Bitmap Index Scan on idx_lieu_earth  (cost=0.00..3.38 rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1)
                                 Index Cond: ('(4190964.86112204, 170209.656489245, 4801644.52951672),(4194464.8606, 173709.656478266, 4805144.52950574)'::cube @ (ll_to_earth((wgslat)::double precision, (wgslon)::double precision))::cube)
                     ->  Index Scan using idx_lieugelieu_codegelieu_numlieu_principal on lieugelieu lgl  (cost=0.00..4.00 rows=1 width=4) (actual time=0.052..0.052 rows=0 loops=5786)
                           Index Cond: (((lgl.codegelieu)::text = 'PKG'::text) AND (lgl.numlieu = outer.numlieu))
 Total runtime: 449.607 ms
(12 rows)
db=# explain analyze SELECT DISTINCT l.numlieu, l.nomlieu, ROUND(earth_distance(ll_to_earth(48.85957600, 2.34860800), ll_to_earth(l.wgslat, l.wgslon))) as dist
db-# FROM lieu l, lieugelieu lgl
db-# WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600, 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu = l.numlieu ORDER BY dist ASC LIMIT 2;
 

Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-13 Thread Guillaume Smet

On 2/13/07, Tom Lane [EMAIL PROTECTED] wrote:

bitmapqualorig = list_difference_ptr(bitmapqualorig, qpqual);

What's not immediately clear is why the condition was in both lists to
start with.  Perhaps idx_lieu_parking is a partial index with this as
its WHERE condition?


Yes, it is: idx_lieu_parking btree (parking) WHERE parking = true.
Sorry for not pointing that out immediately.
Without the partial WHERE clause, the index would not be used at all
(there are very few rows in lieu with parking = true).
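
As DDL, that partial index is (per the definition quoted above):

  CREATE INDEX idx_lieu_parking ON lieu (parking) WHERE parking = true;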

So the basic explanation is that the condition is in both lists because of
the partial index, and only qpqual keeps it? I would have expected the
opposite, but I suppose it doesn't change anything?

Thanks for your answer.

--
Guillaume



Re: [PERFORM] Question about Bitmap Heap Scan/BitmapAnd

2007-02-13 Thread Tom Lane
Guillaume Smet [EMAIL PROTECTED] writes:
 So the basic explanation is that the condition is in both lists because of
 the partial index, and only qpqual keeps it? I would have expected the
 opposite, but I suppose it doesn't change anything?

It gets the right answer, yes.  I'm not sure if we could safely put the
condition into the recheck instead of the filter.  The particular code
I showed you has to go the direction it does, because a condition in the
filter has to be checked even if the bitmap is not lossy.  I seem to
recall concluding that we had to recheck partial-index conditions even
if the bitmap is not lossy, but I can't reconstruct my reasoning at the
moment.

regards, tom lane



Re: [PERFORM] CPU Usage

2007-02-13 Thread Alan Hodgson
On Tuesday 13 February 2007 10:36, Campbell, Lance [EMAIL PROTECTED] 
wrote:
 We have 12+ schemas in 1 database. When I do a Unix top command I
 notice one postmaster process has 100% CPU usage, and it just stays
 at 99-100%. Other postmaster processes pop up; they use hardly any
 CPU or memory, and they also disappear very fast. I am wondering:
 is Postgres only using one processor for database queries? Is there
 something I need to do to tell Postgres to use more than one
 processor? Or does Postgres only use up to one processor for any
 particular database?

Each connection to the cluster gets a dedicated backend process, which 
can only be scheduled on one processor at a time. So, in effect, a 
single query can only use one processor at a time, but any number of 
other backends can simultaneously be using the other CPUs. It does not 
matter which database they are operating on.
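
You can watch those per-connection backends from SQL as well (a sketch; on 
8.x the columns are named procpid and current_query - later releases renamed 
them pid and query):

  SELECT procpid, datname, usename, current_query FROM pg_stat_activity;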

-- 
It is a besetting vice of democracies to substitute public opinion for
law. - James Fenimore Cooper 



[PERFORM] JOIN to a VIEW makes a real slow query

2007-02-13 Thread Chuck D.
Hi folks,

I don't know if this is an SQL or PERFORMANCE list problem, but I wanted to 
check here first. I've seen this discussed on the list before, but I'm still 
not sure of the solution. Maybe my query is just structured wrong.

I recently revisited an old project of mine that has 'city', 'state' 
and 'country' tables. The city data comes from multiple sources and totals 
about 3 million rows. I decided to split the city table up based on the 
source (world_city, us_city). This makes updating easier, because the 
assigned feature ids from the two sources overlap in some cases, making it 
impossible to update them as a single merged table.

However, I decided to create a view to behave like the old 'city' table.  The 
view is just a simple:

SELECT [columns]
FROM world_city
UNION
SELECT [columns]
FROM us_city
;

Selecting from the view is very quick, but JOINing to the view is slow: about 
65 seconds to select a city. It doesn't matter whether it is joined to one 
table or to six like it is in my user_detail query - it is still slow. Each 
table in the view has indexes on city_id, state_id and country_id too. 
Everything has been VACUUM ANALYZEd.

When using EXPLAIN ANALYZE on a query against the view I get this:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=650146.58..751018.45 rows=10618 width=55) (actual time=53078.261..61269.190 rows=1 loops=1)
   Join Filter: (outer.city_id = inner.?column1?)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.010..0.022 rows=1 loops=1)
   ->  Unique  (cost=650146.58..703236.51 rows=2123597 width=62) (actual time=49458.007..59635.140 rows=2122712 loops=1)
         ->  Sort  (cost=650146.58..655455.58 rows=2123597 width=62) (actual time=49458.003..55405.965 rows=2122712 loops=1)
               Sort Key: city_id, state_id, country_id, cc1, rc, adm1, lat, lon, city_name
               ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual time=18.835..13706.395 rows=2122712 loops=1)
                     ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 width=62) (actual time=18.832..620.553 rows=169398 loops=1)
                     ->  Seq Scan on world_city  (cost=0.00..47632.88 rows=1954188 width=61) (actual time=23.513..11193.341 rows=1953314 loops=1)
 Total runtime: 61455.471 ms
(10 rows)

Time: 61512.377 ms

So it does a sequential scan on the tables in the view and won't use the indexes.

Then the same query, replacing the view with the real table:

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# left JOIN geo.world_city AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=36) (actual time=53.854..53.871 rows=1 loops=1)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.010..0.016 rows=1 loops=1)
   ->  Index Scan using world_city_pk on world_city ci  (cost=0.00..3.01 rows=1 width=17) (actual time=53.825..53.833 rows=1 loops=1)
         Index Cond: (outer.city_id = ci.city_id)
 Total runtime: 53.989 ms
(5 rows)

Time: 56.234 ms


I'm not sure that a view over a UNION is the best idea, but I don't know how 
else to keep the tables from the two data sources behind one view (other than 
merging them and adding a source_id column). Any ideas on what is causing the 
performance lag?
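
The merge-with-source_id alternative I mention would look roughly like this 
(a sketch, with an abbreviated, assumed column list):

  CREATE TABLE city AS
  SELECT 'us'::text AS source_id, city_id, state_id, country_id, city_name FROM us_city
  UNION ALL
  SELECT 'world'::text, city_id, state_id, country_id, city_name FROM world_city;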





Re: [PERFORM] JOIN to a VIEW makes a real slow query

2007-02-13 Thread Merlin Moncure

On 2/13/07, Chuck D. [EMAIL PROTECTED] wrote:

Hi folks,

I don't know if this is an SQL or PERFORMANCE list problem, but I wanted to
check here first. I've seen this discussed on the list before, but I'm still
not sure of the solution. Maybe my query is just structured wrong.

I recently revisited an old project of mine that has 'city', 'state'
and 'country' tables. The city data comes from multiple sources and totals
about 3 million rows. I decided to split the city table up based on the
source (world_city, us_city). This makes updating easier, because the
assigned feature ids from the two sources overlap in some cases, making it
impossible to update them as a single merged table.

However, I decided to create a view to behave like the old 'city' table.  The
view is just a simple:

SELECT [columns]
FROM world_city
UNION
SELECT [columns]
FROM us_city
;

Selecting from the view is very quick, but JOINing to the view is slow: about
65 seconds to select a city. It doesn't matter whether it is joined to one
table or to six like it is in my user_detail query - it is still slow. Each
table in the view has indexes on city_id, state_id and country_id too.
Everything has been VACUUM ANALYZEd.


Use UNION ALL instead of UNION. UNION without ALL has an implied sort and
duplicate-removal step that has to be resolved, materializing the view,
before you can join to it.
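
A sketch of the rewritten view (using the view name from your EXPLAIN output 
and an abbreviated column list):

  CREATE OR REPLACE VIEW geo.city_vw AS
  SELECT city_id, state_id, country_id, city_name FROM us_city
  UNION ALL  -- no sort/duplicate-removal step
  SELECT city_id, state_id, country_id, city_name FROM world_city;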

merlin



[PERFORM] quad or dual core Intel CPUs

2007-02-13 Thread Kenji Morishige
I am about to pull the trigger on a new machine after analyzing some
discussions I posted here last year. I've been trying to spec out a reliable
and powerful enough machine that I won't have to replace for some time.
Currently I've been using a dual Xeon 3.06GHz with 4GB of RAM, utilizing a
RAID 1+0 configuration over a total of 6 SCSI disks aside from the OS
partition. We have about 10GB of data and will probably grow at about 1GB
per month. We currently average about 200 queries per second, and the
15-minute load average is about 0.30. I am running FreeBSD 6.1.

At the end of last year, I specced out a new machine to replace this box. At
that time, the quad core 2.66GHz parts were not available from my vendor and
I was not planning to go that route. Now that they are available, I am
considering the option. The main question here is whether FreeBSD 6.x and
PostgreSQL 8.1 would be able to take advantage of the quad cores and perform
better than the 3.0GHz dual core. I ask because of some conflicting
benchmark results I've seen posted on the spec.org website.

Here is the full specification of the new box I hope to build and run FreeBSD
6.X and PostgreSQL on:

- SuperMicro Dual Xeon X7DBE+ motherboard
  + 2 x Quad Core X5355 2.66GHz
  OR
  + 2 x Dual Core 5160 3.0GHz

- 8 x 1GB PC2-4200 fully buffered DIMM

- LSI MegaRAID SAS 8408E w/BBU 256MB

- 16 x 73GB SAS disk

So, question #1: dual core or quad core? Quad core in theory almost doubles
the machine's processing potential.

And lastly: up till now I've only had experience configuring SCSI RAID
controllers. I believe this LSI MegaRAID unit has a dual-channel setup, but
when it comes to SAS drives I don't know what kind of flexibility this
provides. How should the disks be partitioned for maximum PostgreSQL
performance?

I'm thinking about keeping it simple, assuming that the hot spare can only be
utilized one per channel, leaving me only 14 disks to utilize:

1 RAID1 partition using 2 disks total for the OS
1 RAID1+0 using 12 disks total striping over 6.

If I am not able to utilize both channels to create a 12-disk RAID1+0 array,
then it might be better to create 2 separate data partitions: one for
WAL/pg_xlog and the rest for the data store.
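
If the data store does get its own array, a tablespace can point PostgreSQL 
at it (a sketch; the paths and names are hypothetical, and pg_xlog itself is 
normally relocated with a filesystem symlink rather than SQL):

  CREATE TABLESPACE raid10_data LOCATION '/raid10/pgdata';
  CREATE DATABASE mydb TABLESPACE raid10_data;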

Please comment on any issues you may see with this box and my assumptions,
and on any FreeBSD kernel issues or tweaks you could recommend.

Sincerely,
Kenji 



Re: [PERFORM] JOIN to a VIEW makes a real slow query

2007-02-13 Thread Chuck D.
On Tuesday 13 February 2007 13:16, Merlin Moncure wrote:

 Use UNION ALL instead of UNION. UNION without ALL has an implied sort and
 duplicate-removal step that has to be resolved, materializing the view,
 before you can join to it.


Thanks for that, Merlin - I forgot about using ALL. That eliminates the 
UNIQUE and the two SORT lines from the EXPLAIN output, and it brings the 
query time down from a whopping 65 seconds to 11 seconds. The two tables 
already contain unique rows, so ALL is safe to use here.

It is still using that sequential scan on the view after the APPEND for the 
us_city and world_city tables. Any reason why the view won't use the indexes 
when it is JOINed to another table, but will when the view is queried 
without a JOIN? I should have mentioned this is v8.1.4.

Also, does anyone know why this line:
Join Filter: (outer.city_id = inner.?column1?)
... contains ?column1? instead of the actual column name?

This is the result after UNION ALL on the view

cmi=# explain analyze
cmi-# select user_id, username, city_name
cmi-# FROM   m_user AS mu
cmi-# LEFT JOIN geo.city_vw AS ci ON (mu.city_id = ci.city_id)
cmi-# ;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..121523.88 rows=10618 width=55) (actual time=2392.376..11061.117 rows=1 loops=1)
   Join Filter: (outer.city_id = inner.?column1?)
   ->  Seq Scan on m_user mu  (cost=0.00..1.01 rows=1 width=27) (actual time=0.025..0.028 rows=1 loops=1)
   ->  Append  (cost=0.00..73741.94 rows=2123597 width=62) (actual time=16.120..9644.315 rows=2122712 loops=1)
         ->  Seq Scan on us_city  (cost=0.00..4873.09 rows=169409 width=62) (actual time=16.119..899.802 rows=169398 loops=1)
         ->  Seq Scan on world_city  (cost=0.00..47632.88 rows=1954188 width=61) (actual time=10.585..6949.946 rows=1953314 loops=1)
 Total runtime: 11061.441 ms
(7 rows)




Re: [PERFORM] JOIN to a VIEW makes a real slow query

2007-02-13 Thread Tom Lane
Chuck D. [EMAIL PROTECTED] writes:
 It is still using that sequential scan on the view after the APPEND for the
 us_city and world_city tables. Any reason why the view won't use the indexes
 when it is JOINed to another table, but will when the view is queried
 without a JOIN? I should have mentioned this is v8.1.4.

8.1 isn't bright enough for that.  Should work in 8.2 though.

 Also, does anyone know why this line:
 Join Filter: (outer.city_id = inner.?column1?)
 ... contains ?column1? instead of the actual column name?

EXPLAIN can't conveniently get access to the column name.  That could
probably be improved if someone wanted to put enough effort into it.

regards, tom lane



Re: [PERFORM] quad or dual core Intel CPUs

2007-02-13 Thread Arjen van der Meijden

Hi Kenji,

On 13-2-2007 20:46 Kenji Morishige wrote:

Here is the full specification of the new box I hope to build and run FreeBSD
6.X and PostgreSQL on:

- SuperMicro Dual Xeon X7DBE+ motherboard
  + 2 x Quad Core X5355 2.66GHz
  OR
  + 2 x Dual Core 5160 3.0GHz


- 8 x 1GB PC2-4200 fully buffered DIMM

- LSI MegaRAID SAS 8408E w/BBU 256MB

- 16 x 73GB SAS disk


If this is in one of those 4U cases, make very, very sure it can 
properly exhaust all the heat generated and has a more than adequate power 
supply. When we were going for a similar machine, we got negative advice on 
such a set-up from a server vendor who builds such machines themselves. 
Don't forget that the FB-DIMMs run pretty hot and need sufficient 
cooling. As you can see in these pictures, Fujitsu thought it necessary 
to add fan ducts for the memory:

http://tweakers.net/reviews/646/7

Our own Dell systems have similar ducts. But a third-party server 
builder we tested did not include them, and the machine ran very hot (I 
couldn't touch the bottom for more than a short time) in a 
not-too-well-ventilated, but mostly empty, server rack - and that was a 
2U machine without any disks. We have had good experience with our new 
Dell 1950 (2x 5160, PC5300 FBD) combined with a Dell MD1000 SAS disk unit 
(15x 15k 36GB disks), described in the second review linked below. HP 
offers similar options, and there are probably several other suppliers 
who can build something like that too. Separate SAS JBOD disk units are 
available from other suppliers as well.



So, question #1: dual core or quad core? Quad core in theory almost doubles
the machine's processing potential.


I can partially answer that question, but then only for Linux + PostgreSQL 
8.2. In that case, PostgreSQL can take advantage of the extra cores. See 
our review here:

http://tweakers.net/reviews/661

It includes comparisons between the X5355 and the 5160 with PostgreSQL on 
the seventh page, here: http://tweakers.net/reviews/661/7


But be aware that there can be substantial and unexpected differences on 
this relatively new platform from simply changing the OS, as we saw when 
going from Linux 2.6.15 to 2.6.18:

http://tweakers.net/reviews/657/2

Our benchmark does relatively little writing and uses a smallish dataset 
(it fits in 4GB of memory), so I don't know how useful these benchmarks 
are for you. But the conclusion was that the extra processor power isn't 
fully available, possibly because there is less memory bandwidth per 
processor core and more communication overhead. Then again, in our test 
the dual quad-core was faster than the dual dual-core.



And lastly: up till now I've only had experience configuring SCSI RAID
controllers. I believe this LSI MegaRAID unit has a dual-channel setup, but
when it comes to SAS drives I don't know what kind of flexibility this
provides. How should the disks be partitioned for maximum PostgreSQL
performance?

I'm thinking about keeping it simple, assuming that the hot spare can only be
utilized one per channel, leaving me only 14 disks to utilize.


In that Dell 1950 we use the Dell PERC 5/E SAS controller for the 
database; it is based on that same LSI controller, although it has 2 
external SAS connectors (4 channels each). Afaik it supports global 
hot spares, but we use the full set of 15 disks as a 14+1-disk RAID 5, 
so I haven't looked at that too closely. For the OS we have a separate 
PERC 5/i internal RAID controller with two internal disks. My 
colleague also tested several RAID set-ups with this equipment, and we 
chose RAID 5 for its slightly better read performance. If you can make 
something of these Dutch pages, you can have a look at those results here:

http://tweakers.net/benchdb/test/122

Play around with the form at the bottom of the page to see some 
comparisons between several RAID set-ups. The SAS configurations are of 
course the ones with the Fujitsu MAX3036RC 36GB disks and the Dell PERC 
5/E controller.



If I am not able to utilize both channels to create a 12-disk RAID1+0 array,
then it might be better to create 2 separate data partitions: one for
WAL/pg_xlog and the rest for the data store.


I'm not too sure how you would connect your disks and controller to a 
SAS expander (which you need in order to connect more than 8 disks to one 
controller). I believe it is possible to use a 24-port expander, 
allowing communication between the 16 disks and the 8 ports of your 
controller. A SAS expander normally comes with the enclosure/disk unit, 
but I have no idea about the details. Our own testing was done using 
just a single 4-lane connector, which can handle 1.2GB/sec (afaik that B 
is for bytes), and we believe that's sufficient for our 15 disks.



Please comment on any issues you may see with this box and my assumptions,
and on any FreeBSD kernel issues or tweaks you could recommend.


Have a very good look at your heat production and exhaust 

Re: [PERFORM] quad or dual core Intel CPUs

2007-02-13 Thread Mark Kirkwood

Kenji Morishige wrote:


Please comment on any issues you may see with this box and my assumptions,
and on any FreeBSD kernel issues or tweaks you could recommend.



I would recommend posting to freebsd-hardware or freebsd-stable and 
asking if there are any gotchas with the X7DBE+ and 6.2 (for instance, the 
X7DBR-8+ suffers from an intermittent hang at boot... so it can't hurt to ask!)


Cheers

Mark



Re: [PERFORM] quad or dual core Intel CPUs

2007-02-13 Thread Dan Harris

Arjen van der Meijden wrote:



But be aware that there can be substantial and unexpected differences on 
this relatively new platform from simply changing the OS, as we saw when 
going from Linux 2.6.15 to 2.6.18:

http://tweakers.net/reviews/657/2



Having upgraded to 2.6.18 fairly recently, I am *very* interested in 
what caused the throughput to drop in 2.6.18. I haven't done any 
benchmarking on my system to know whether it affected my usage pattern 
negatively, but I am curious if anyone knows why this happened.


-Dan



Re: [PERFORM] quad or dual core Intel CPUs

2007-02-13 Thread Guillaume Smet

Dan,

On 2/13/07, Dan Harris [EMAIL PROTECTED] wrote:

Having upgraded to 2.6.18 fairly recently, I am *very* interested in
what caused the throughput to drop in 2.6.18. I haven't done any
benchmarking on my system to know whether it affected my usage pattern
negatively, but I am curious if anyone knows why this happened.


I think you misread the graph. PostgreSQL 8.2 seems to be
approximately 20% faster with kernel 2.6.18 on the platforms tested
(using the tweakers.net benchmark).

--
Guillaume
