Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Ragnar Hafstað
On Wed, 2005-01-19 at 21:00 -0800, [EMAIL PROTECTED] wrote:
 Let's see if I have been paying enough attention to the SQL gurus. 
 The planner is making a different estimate of how many 'deprecated' versus 
 how many 'broken'. 
 I would try SET STATISTICS to a larger number on the ports table, and 
 re-analyze.

that should not help, as the estimate is accurate, according to the
explain analyze.

gnari
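
As a concrete sketch of the SET STATISTICS suggestion above (the column
names are assumptions based on the thread):

  -- raise the per-column statistics target, then gather fresh statistics;
  -- 'deprecated' and 'broken' are assumed column names on the ports table
  ALTER TABLE ports ALTER COLUMN deprecated SET STATISTICS 100;
  ALTER TABLE ports ALTER COLUMN broken SET STATISTICS 100;
  ANALYZE ports;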



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Ragnar Hafstað
On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote:
 Hi folks,
 
 Running on 7.4.2, recently vacuum analysed the three tables in 
 question.
 
 The query plan in question changes dramatically when a WHERE clause 
 changes from ports.broken to ports.deprecated.  I don't see why.  
 Well, I do see why: a sequential scan of 130,000 rows.  The query 
 goes from 13ms to 1100ms because of this.  The full plans are at 
 http://rafb.net/paste/results/v8ccvQ54.html
 
 I have tried some tuning by:
 
   set effective_cache_size to 4000, was 1000
   set random_page_cost to 1, was 4
 
 The resulting plan changes, but there is no speed improvement; the plans are at 
 http://rafb.net/paste/results/rV8khJ18.html
 

this just confirms that an indexscan is not always better than a
tablescan. by setting random_page_cost to 1, you are deceiving the
planner into thinking that the indexscan is almost as effective
as a tablescan.

 Any suggestions please?  

did you try to increase sort_mem ?

gnari



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM]

2005-01-20 Thread Matt Casters

Hi,

I have the go-ahead from a customer to do some testing on Postgresql in a couple 
of weeks as a
replacement for Oracle.
The reason for the test is that the number of users of the warehouse is going 
to increase and this
will have a serious impact on licensing costs. (I bet that sounds familiar)

We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on 
Oracle.
Basically we have 2 large fact tables to deal with: one going for 400M rows, 
the other will be
hitting 1B rows soon.
(around 250Gb of data)

My questions to the list are: has this sort of thing been attempted before? If 
so, what were the 
performance results compared to Oracle?
I've been reading up on partitioned tables on pgsql; will the performance 
benefit be 
comparable to Oracle partitioned tables?
What are the gotchas?
Should I be testing on version 8 or version 7?
While I didn't find any documents immediately, are there any fine manuals to 
read on data
warehouse performance tuning on PostgreSQL?

Thanks in advance for any help you may have, I'll do my best to keep 
pgsql-performance up to date
on the results.

Best regards,

Matt
--
Matt Casters [EMAIL PROTECTED]
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] OFFSET impact on Performance???

2005-01-20 Thread Andrei Bintintan




Hi to all, I have the following 2 examples. Now, regarding the offset:
if it is small (10) or big (5) what is the impact on the performance
of the query?? I noticed that if I return more data (columns) or if I
make more joins, then the query runs even slower if the OFFSET is
bigger. How can I somehow improve the performance on this?
Best regards, Andy.
explain analyze
SELECT o.id
FROM report r INNER JOIN orders o
  ON o.id = r.id_order AND o.id_status = 6
ORDER BY 1 LIMIT 10 OFFSET 10

Limit  (cost=44.37..88.75 rows=10 width=4) (actual time=0.160..0.275 rows=10 loops=1)
  ->  Merge Join  (cost=0.00..182150.17 rows=41049 width=4) (actual time=0.041..0.260 rows=20 loops=1)
        Merge Cond: ("outer".id_order = "inner".id)
        ->  Index Scan using report_id_order_idx on report r  (cost=0.00..157550.90 rows=42862 width=4) (actual time=0.018..0.075 rows=20 loops=1)
        ->  Index Scan using orders_pkey on orders o  (cost=0.00..24127.04 rows=42501 width=4) (actual time=0.013..0.078 rows=20 loops=1)
              Filter: (id_status = 6)
Total runtime: 0.373 ms

explain analyze
SELECT o.id
FROM report r INNER JOIN orders o
  ON o.id = r.id_order AND o.id_status = 6
ORDER BY 1 LIMIT 10 OFFSET 100

Limit  (cost=31216.85..31216.85 rows=1 width=4) (actual time=1168.152..1168.152 rows=0 loops=1)
  ->  Sort  (cost=31114.23..31216.85 rows=41049 width=4) (actual time=1121.769..1152.246 rows=42693 loops=1)
        Sort Key: o.id
        ->  Hash Join  (cost=2329.99..27684.03 rows=41049 width=4) (actual time=441.879..925.498 rows=42693 loops=1)
              Hash Cond: ("outer".id_order = "inner".id)
              ->  Seq Scan on report r  (cost=0.00..23860.62 rows=42862 width=4) (actual time=38.634..366.035 rows=42864 loops=1)
              ->  Hash  (cost=2077.74..2077.74 rows=42501 width=4) (actual time=140.200..140.200 rows=0 loops=1)
                    ->  Seq Scan on orders o  (cost=0.00..2077.74 rows=42501 width=4) (actual time=0.059..96.890 rows=42693 loops=1)
                          Filter: (id_status = 6)
Total runtime: 1170.586 ms


Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Dan Langille
On 20 Jan 2005 at 9:34, Ragnar Hafstað wrote:

 On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote:
  Hi folks,
 
  Running on 7.4.2, recently vacuum analysed the three tables in
  question.
 
  The query plan in question changes dramatically when a WHERE clause
  changes from ports.broken to ports.deprecated.  I don't see why.
  Well, I do see why: a sequential scan of 130,000 rows.  The query
  goes from 13ms to 1100ms because of this.  The full plans are at
  http://rafb.net/paste/results/v8ccvQ54.html
 
  I have tried some tuning by:
 
set effective_cache_size to 4000, was 1000
set random_page_cost to 1, was 4
 
  The resulting plan changes, but there is no speed improvement; the plans are at
  http://rafb.net/paste/results/rV8khJ18.html
 

 this just confirms that an indexscan is not always better than a
 tablescan. by setting random_page_cost to 1, you are deceiving the
 planner into thinking that the indexscan is almost as effective
 as a tablescan.

  Any suggestions please?

 did you try to increase sort_mem ?

I tried sort_mem = 4096 and then 16384. This did not make a
difference.  See http://rafb.net/paste/results/AVDqEm55.html

Thank you.
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Andrei Bintintan wrote:
Hi to all,
I have the following 2 examples. Now, regarding the offset: if it
is small (10) or big (5) what is the impact on the performance of
the query?? I noticed that if I return more data (columns) or if I
make more joins, then the query runs even slower if the OFFSET is
bigger. How can I somehow improve the performance on this?
There's really only one way to do an offset of 1000 and that's to fetch 
1000 rows and then some and discard the first 1000.

If you're using this to provide pages of results, could you use a cursor?
--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] OFFSET impact on Performance???

2005-01-20 Thread Merlin Moncure
Andrei:
Hi to all, 

I have the following 2 examples. Now, regarding the offset: if it is 
small (10) or big (5) what is the impact on the performance of the query?? I 
noticed that if I return more data (columns) or if I make more joins, then the 
query runs even slower if the OFFSET is bigger. How can I 
somehow improve the performance on this? 

Merlin:
Offset is not suitable for traversal of large data sets.  Better not use it at 
all!

There are many ways to deal with this problem, the two most direct being the 
view approach and the cursor approach.

cursor approach:
declare report_order cursor with hold for select * from report r, orders o [...]
Remember to close the cursor when you're done.  Now fetch time is proportional 
to the number of rows fetched, and should be very fast.  The major drawback to 
this approach is that cursors in postgres (currently) are always insensitive, 
so that record changes after you declare the cursor from other users are not 
visible to you.  If this is a big deal, try the view approach.
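
As a minimal sketch of a complete cursor session (using the report/orders
schema from the EXPLAIN examples earlier in this digest; the exact join is
an assumption):

  BEGIN;
  DECLARE report_order CURSOR WITH HOLD FOR
      SELECT o.id
      FROM report r JOIN orders o ON o.id = r.id_order
      WHERE o.id_status = 6
      ORDER BY o.id;
  COMMIT;                      -- WITH HOLD keeps the cursor alive after commit

  FETCH 10 FROM report_order;  -- first page
  FETCH 10 FROM report_order;  -- next page; cost grows with rows fetched, not offset
  CLOSE report_order;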

view approach:
create view report_order as select * from report r, orders o [...]

and this:
prepare fetch_from_report_order(numeric, numeric, int4) as
select * from report_order where order_id >= $1 and
(order_id > $1 or report_id > $2)
order by order_id, report_id limit $3;

fetch next 1000 records from report_order:
execute fetch_from_report_order(o, f, 1000);  o and f being the last key values 
you fetched (pass in zeroes to start it off).

This is not quite as fast as the cursor approach (but it will be when we get a 
proper row constructor, heh), but it is more flexible in that it is sensitive to 
changes from other users.  This is more of a 'permanent' binding, whereas a cursor 
is a binding around a particular task.

Good luck!
Merlin



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Andrei Bintintan
If you're using this to provide pages of results, could you use a 
cursor?
What do you mean by that? Cursor?
Yes I'm using this to provide pages, but if I jump to the last pages it 
goes very slow.

Andy.
- Original Message - 
From: Richard Huxton dev@archonet.com
To: Andrei Bintintan [EMAIL PROTECTED]
Cc: pgsql-sql@postgresql.org; pgsql-performance@postgresql.org
Sent: Thursday, January 20, 2005 2:10 PM
Subject: Re: [SQL] OFFSET impact on Performance???


Andrei Bintintan wrote:
Hi to all,
I have the following 2 examples. Now, regarding the offset: if it
is small (10) or big (5) what is the impact on the performance of
the query?? I noticed that if I return more data (columns) or if I
make more joins, then the query runs even slower if the OFFSET is
bigger. How can I somehow improve the performance on this?
There's really only one way to do an offset of 1000 and that's to fetch 
1000 rows and then some and discard the first 1000.

If you're using this to provide pages of results, could you use a 
cursor?

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
Dear community,

My company, which I actually represent, is a fervent user of PostgreSQL.
We have been building all our applications on PostgreSQL for more than 5 years.
We usually do classical client/server applications under Linux, and Web 
interfaces (php, perl, C/C++). We also manage public web services with 
10/15 million records and up to 8 million page views per month.

Now we are facing a new need, but we do not find any good solution with 
PostgreSQL.
We need to build a sort of directory of millions of records, growing by about 
4/8 million per month, able to be used by many users from the web. In 
order to do this, our solution needs to run perfectly with many 
inserts and many select accesses (done before each insert, and done by web site 
visitors). We will also need a search engine for the millions of records 
(140/150 million records at the immediate beginning) ... No, it's not Google, 
but the kind of volume of data stored in the main table is similar.

Then ... we have made some tests with the actual servers we have here, like a 
dual Xeon 2.8 GHz with 4 Gb of RAM, and the result of the cumulative 
inserts and select accesses is slowing down the service really quickly ... 
(load average goes up to 10 really quickly on the database).

We were at this moment thinking about a Cluster solution ... We saw on the 
Internet many solutions talking about clustering with MySQL ... but 
nothing about PostgreSQL ... the idea is to use several servers to make a 
sort of big virtual server, using the disk space of each server as one, and 
having the ability to use the CPU and RAM of each server in order to 
maintain good service performance ... one can imagine it is like a GFS but 
dedicated to PostgreSQL ...

Is there any solution with PostgreSQL matching these needs ... ?
Do we have to port our development to MySQL for this kind of problem ?
Is there any other solution than a Cluster for our problem ?

Looking for your reply,

Regards,
-- 
Hervé

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Stephan Szabo
On Wed, 19 Jan 2005, Dan Langille wrote:

 Hi folks,

 Running on 7.4.2, recently vacuum analysed the three tables in
 question.

 The query plan in question changes dramatically when a WHERE clause
 changes from ports.broken to ports.deprecated.  I don't see why.
 Well, I do see why: a sequential scan of 130,000 rows.  The query
 goes from 13ms to 1100ms because of this.  The full plans are at
 http://rafb.net/paste/results/v8ccvQ54.html

 I have tried some tuning by:

   set effective_cache_size to 4000, was 1000
   set random_page_cost to 1, was 4

 The resulting plan changes, but there is no speed improvement; the plans are at
 http://rafb.net/paste/results/rV8khJ18.html

 Any suggestions please?

As a question, what does it do if enable_hashjoin is false? I'm wondering
if it'll pick a nested loop for that step for the element/ports join and
what it estimates the cost to be.
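
For reference, that experiment is just a session-local toggle plus a re-run
of the plan (the query itself is elided here):

  SET enable_hashjoin = off;
  EXPLAIN ANALYZE SELECT ... ;  -- re-run the problem query under the new setting
  RESET enable_hashjoin;        -- restore the planner default afterwards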


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Jean-Max Reymond
On Thu, 20 Jan 2005 15:03:31 +0100, Hervé Piedvache [EMAIL PROTECTED] wrote:

 We were at this moment thinking about a Cluster solution ... We saw on the
 Internet many solution talking about Cluster solution using MySQL ... but
 nothing about PostgreSQL ... the idea is to use several servers to make a
 sort of big virtual server using the disk space of each server as one, and
 having the ability to use the CPU and RAM of each servers in order to
 maintain good service performance ...one can imagin it is like a GFS but
 dedicated to postgreSQL...
 

forget mysql cluster for now.
We have a small database whose size is 500 Mb.
It is not possible to load this database on a computer with 2 Gb of RAM,
and loading the database into RAM is required.
So, we shrank the database, and at 350 Mb it fits in the 2 Gb of RAM.
First performance tests on a basic request: 500x slower, yes 500x.
This issue was reported to the mysql team, but no answer (and no correction).

Currently, the solution is running with a replicated database: 1 node
for write requests and all the other nodes for read requests, with the
load balancing done round-robin.


-- 
Jean-Max Reymond
CKR Solutions
Nice France
http://www.ckr-solutions.com

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Christopher Kings-Lynne
Is there any solution with PostgreSQL matching these needs ... ?
You want: http://www.slony.info/
Do we have to port our development to MySQL for this kind of problem ?
Is there any other solution than a Cluster for our problem ?
Well, Slony does replication which is basically what you want :)
Only master-slave though, so you will need to have all inserts go via 
the master server, but selects can come off any server.

Chris
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM]

2005-01-20 Thread Stephen Frost
* Matt Casters ([EMAIL PROTECTED]) wrote:
 I have the go-ahead from a customer to do some testing on Postgresql in a 
 couple of weeks as a
 replacement for Oracle.
 The reason for the test is that the number of users of the warehouse is going 
 to increase and this
 will have a serious impact on licensing costs. (I bet that sounds familiar)

Rather familiar, yes... :)

 We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) 
 on Oracle.
 Basically we have 2 large fact tables to deal with: one going for 400M rows, 
 the other will be
 hitting 1B rows soon.
 (around 250Gb of data)

Quite a bit of data.  There's one big thing to note here I think-
Postgres will not take advantage of multiple CPUs for a given query,
Oracle will.  So, it depends on your workload as to how that may impact
you.  Situations where this will be unlikely to affect you:

Your main bottle-neck is IO/disk and not CPU.
You run multiple queries in parallel frequently.
There are other processes on the system which chew up CPU time anyway.

Situations where you're likely to be affected would be:

You periodically run one big query.
You run a set of queries in sequential order.

 My questions to the list are: has this sort of thing been attempted before? 
 If so, what were the
 performance results compared to Oracle?

I'm pretty sure it's been attempted before but unfortunately I don't
have any numbers on it myself.  My data sets aren't that large (couple
million rows) but I've found PostgreSQL at least as fast as Oracle for
what we do, and much easier to work with.

 I've been reading up on partitioned tables on pgsql; will the performance 
 benefit be
 comparable to Oracle partitioned tables?

In this case I would think so, except that PostgreSQL still won't use
multiple CPUs for a given query, even against partitioned tables, aiui.

 What are the gotchas?

See above? :)  Other issues are things having to do w/ your specific
SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
something like select x,y from a,b where x=%y; to do a right-join,
iirc).

 Should I be testing on version 8 or version 7?

Now that 8.0 is out I'd say probably test with that and just watch for
8.0.x releases before you go production, if you have time before you
have to go into production with the new solution (sounds like you do-
changing databases takes time anyway).

 Thanks in advance for any help you may have, I'll do my best to keep 
 pgsql-performance up to date
 on the results.

Hope that helps.  Others on here will correct me if I misspoke. :)

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Stephen Frost
* Hervé Piedvache ([EMAIL PROTECTED]) wrote:
 Is there any solution with PostgreSQL matching these needs ... ?

You might look into pg_pool.  Another possibility would be slony, though
I'm not sure it's to the point you need it at yet, depends on if you can
handle some delay before an insert makes it to the slave select systems.

 Do we have to port our development to MySQL for this kind of problem ?

Well, hopefully not. :)

 Is there any other solution than a Cluster for our problem ?

Bigger server, more CPUs/disks in one box.  Try to partition up your
data some way such that it can be spread across multiple machines, then
if you need to combine the data have it be replicated using slony to a
big box that has a view which joins all the tables and do your big
queries against that.

Just some thoughts.

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Dan Langille
On 20 Jan 2005 at 6:14, Stephan Szabo wrote:

 On Wed, 19 Jan 2005, Dan Langille wrote:
 
  Hi folks,
 
  Running on 7.4.2, recently vacuum analysed the three tables in
  question.
 
  The query plan in question changes dramatically when a WHERE clause
  changes from ports.broken to ports.deprecated.  I don't see why.
   Well, I do see why: a sequential scan of 130,000 rows.  The query
   goes from 13ms to 1100ms because of this.  The full plans are at
  http://rafb.net/paste/results/v8ccvQ54.html
 
  I have tried some tuning by:
 
set effective_cache_size to 4000, was 1000
set random_page_cost to 1, was 4
 
   The resulting plan changes, but there is no speed improvement; the plans are at
  http://rafb.net/paste/results/rV8khJ18.html
 
  Any suggestions please?
 
 As a question, what does it do if enable_hashjoin is false? I'm wondering
 if it'll pick a nested loop for that step for the element/ports join and
 what it estimates the cost to be.

With enable_hashjoin = false, no speed improvement.  Execution plan 
at http://rafb.net/paste/results/qtSFVM72.html

thanks
-- 
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 15:24, Christopher Kings-Lynne wrote:
  Is there any solution with PostgreSQL matching these needs ... ?

 You want: http://www.slony.info/

  Do we have to port our development to MySQL for this kind of problem
  ? Is there any other solution than a Cluster for our problem ?

 Well, Slony does replication which is basically what you want :)

 Only master-slave though, so you will need to have all inserts go via
 the master server, but selects can come off any server.

Sorry but I don't agree with this ... Slony is a replication solution ... I 
don't need replication ... what will I do when my database grows to 50 
Gb ... I'll need more than 50 Gb of RAM on each server ???
This solution is not very realistic for me ...

I need a Cluster solution, not a replication one; or explain to me in detail how I 
will manage the scalability of my database ...

regards,
-- 
Hervé

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 15:38, Christopher Kings-Lynne wrote:
  Sorry but I don't agree with this ... Slony is a replication solution ...
  I don't need replication ... what will I do when my database grows up
  to 50 Gb ... I'll need more than 50 Gb of RAM on each server ???
  This solution is not very realistic for me ...
 
  I need a Cluster solution, not a replication one; or explain to me in detail
  how I will manage the scalability of my database ...

 Buy Oracle

I think this is not my solution ... sorry I'm talking about finding a 
PostgreSQL solution ... 
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 15:30, Stephen Frost wrote:
 * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
  Is there any solution with PostgreSQL matching these needs ... ?

 You might look into pg_pool.  Another possibility would be slony, though
 I'm not sure it's to the point you need it at yet, depends on if you can
 handle some delay before an insert makes it to the slave select systems.

I think not ... pgpool and slony are replication solutions ... but as I have 
said to Christopher Kings-Lynne, how will I manage the scalability of the 
database ? I'll need several servers able to load a database growing and 
growing to get good speed performance ...

  Do we have to port our development to MySQL for this kind of problem
  ?

 Well, hopefully not. :)

I hope so ;o)

  Is there any other solution than a Cluster for our problem ?

 Bigger server, more CPUs/disks in one box.  Try to partition up your
 data some way such that it can be spread across multiple machines, then
 if you need to combine the data have it be replicated using slony to a
 big box that has a view which joins all the tables and do your big
 queries against that.

But I'll reach the size limit of a box quickly, I think: 4 processors 
with 64 Gb of RAM ... and after that ?

regards,
-- 
Hervé

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Christopher Kings-Lynne
Sorry but I don't agree with this ... Slony is a replication solution ... I 
don't need replication ... what will I do when my database grows to 50 
Gb ... I'll need more than 50 Gb of RAM on each server ???
This solution is not very realistic for me ...

I need a Cluster solution, not a replication one; or explain to me in detail how I 
will manage the scalability of my database ...
Buy Oracle
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Stephen Frost
* Hervé Piedvache ([EMAIL PROTECTED]) wrote:
 On Thursday 20 January 2005 15:30, Stephen Frost wrote:
  * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
   Is there any solution with PostgreSQL matching these needs ... ?
 
  You might look into pg_pool.  Another possibility would be slony, though
  I'm not sure it's to the point you need it at yet, depends on if you can
  handle some delay before an insert makes it to the slave select systems.
 
 I think not ... pgpool and slony are replication solutions ... but as I have 
 said to Christopher Kings-Lynne, how will I manage the scalability of the 
 database ? I'll need several servers able to load a database growing and 
 growing to get good speed performance ...

They're both replication solutions, but they also help distribute the
load.  For example:

pg_pool will distribute the select queries among the servers.  They'll
all get the inserts, so that hurts, but at least the select queries are
distributed.

slony is similar, but your application level does the load distribution
of select statements instead of pg_pool.  Your application needs to know
to send insert statements to the 'main' server, and select from the
others.

   Is there any other solution than a Cluster for our problem ?
 
  Bigger server, more CPUs/disks in one box.  Try to partition up your
  data some way such that it can be spread across multiple machines, then
  if you need to combine the data have it be replicated using slony to a
  big box that has a view which joins all the tables and do your big
  queries against that.
 
 But I'll reach the size limit of a box quickly, I think: 4 processors 
 with 64 Gb of RAM ... and after that ?

Go to non-x86 hardware after if you're going to continue to increase the
size of the server.  Personally I think your better bet might be to
figure out a way to partition up your data (isn't that what google
does anyway?).

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Joshua D. Drake
Hervé Piedvache wrote:
Dear community,
My company, which I actually represent, is a fervent user of PostgreSQL.
We have been building all our applications on PostgreSQL for more than 5 years.
We usually do classical client/server applications under Linux, and Web 
interfaces (php, perl, C/C++). We also manage public web services with 
10/15 million records and up to 8 million page views per month.
 

Depending on your needs either:
Slony: www.slony.info
or
Replicator: www.commandprompt.com
Either will do what you want. Replicator is easier to set up but
Slony is free.
Sincerely,
Joshua D. Drake

Now we are facing a new need, but we do not find any good solution with 
PostgreSQL.
We need to build a sort of directory of millions of records, growing by about 
4/8 million per month, able to be used by many users from the web. In 
order to do this, our solution needs to run perfectly with many 
inserts and many select accesses (done before each insert, and done by web site 
visitors). We will also need a search engine for the millions of records 
(140/150 million records at the immediate beginning) ... No, it's not Google, 
but the kind of volume of data stored in the main table is similar.

Then ... we have made some tests with the actual servers we have here, like a 
dual Xeon 2.8 GHz with 4 Gb of RAM, and the result of the cumulative 
inserts and select accesses is slowing down the service really quickly ... 
(load average goes up to 10 really quickly on the database).

We were at this moment thinking about a Cluster solution ... We saw on the 
Internet many solutions talking about clustering with MySQL ... but 
nothing about PostgreSQL ... the idea is to use several servers to make a 
sort of big virtual server, using the disk space of each server as one, and 
having the ability to use the CPU and RAM of each server in order to 
maintain good service performance ... one can imagine it is like a GFS but 
dedicated to PostgreSQL ...

Is there any solution with PostgreSQL matching these needs ... ?
Do we have to port our development to MySQL for this kind of problem ?
Is there any other solution than a Cluster for our problem ?
Looking for your reply,
Regards,
 


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Jeff
On Jan 20, 2005, at 9:36 AM, Hervé Piedvache wrote:
Sorry but I don't agree with this ... Slony is a replication solution ... I
don't need replication ... what will I do when my database grows to 50
Gb ... I'll need more than 50 Gb of RAM on each server ???
Slony doesn't use much ram. The mysql clustering product, ndb I believe 
it is called, requires all data fit in RAM. (At least, it used to).  
What you'll need is disk space.

As for a cluster I think you are thinking of multi-master replication.
You should look into what others have said about trying to partition 
data among several boxes and then join the results together.

Or you could fork over  hundreds of thousands of dollars for Oracle's 
RAC.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Joshua D. Drake
Stephen Frost wrote:
 * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
  On Thursday 20 January 2005 15:30, Stephen Frost wrote:
   * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
    Is there any solution with PostgreSQL matching these needs ... ?

   You might look into pg_pool.  Another possibility would be slony, though
   I'm not sure it's to the point you need it at yet, depends on if you can
   handle some delay before an insert makes it to the slave select systems.

  I think not ... pgpool and slony are replication solutions ... but as I have
  said to Christopher Kings-Lynne, how will I manage the scalability of the
  database ? I'll need several servers able to load a database growing and
  growing to get good speed performance ...

 They're both replication solutions, but they also help distribute the
 load.  For example:
 pg_pool will distribute the select queries among the servers.  They'll
 all get the inserts, so that hurts, but at least the select queries are
 distributed.
 slony is similar, but your application level does the load distribution
 of select statements instead of pg_pool.  Your application needs to know
 to send insert statements to the 'main' server, and select from the
 others.

You can put pgpool in front of replicator or slony to get load
balancing for reads.

    Is there any other solution than a Cluster for our problem ?

   Bigger server, more CPUs/disks in one box.  Try to partition up your
   data some way such that it can be spread across multiple machines, then
   if you need to combine the data have it be replicated using slony to a
   big box that has a view which joins all the tables and do your big
   queries against that.

  But I'll reach the size limit of a box quickly, I think: 4 processors
  with 64 Gb of RAM ... and after that ?

Opteron.

 Go to non-x86 hardware after if you're going to continue to increase the
 size of the server.  Personally I think your better bet might be to
 figure out a way to partition up your data (isn't that what google
 does anyway?).
 	Stephen


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Christopher Kings-Lynne
Sorry but I don't agree with this ... Slony is a replication solution ...
I don't need replication ... what will I do when my database grows
to 50 Gb ... I'll need more than 50 Gb of RAM on each server ???
This solution is not very realistic for me ...
I need a Cluster solution, not a replication one; or explain to me in detail
how I will manage the scalability of my database ...
Buy Oracle

I think this is not my solution ... sorry I'm talking about finding a 
PostgreSQL solution ... 
My point is that there is no free solution.  There simply isn't. 
I don't know why you insist on keeping all your data in RAM, but the 
mysql cluster requires that ALL data MUST fit in RAM all the time.

PostgreSQL has replication, but not partitioning (which is what you want).
So, your only option is Oracle or another very expensive commercial 
database.

Chris
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 15:48, Jeff wrote:
 On Jan 20, 2005, at 9:36 AM, Hervé Piedvache wrote:
  Sorry but I don't agree with this ... Slony is a replication solution ...
  I don't need replication ... what will I do when my database grows to 50
  Gb ... I'll need more than 50 Gb of RAM on each server ???

 Slony doesn't use much ram. The mysql clustering product, ndb I believe
 it is called, requires all data fit in RAM. (At least, it used to).
 What you'll need is disk space.

Slony does not use RAM ... but PostgreSQL will need RAM for accessing a database 
of 50 Gb ... so having two servers with the same configuration replicated by 
slony does not solve the problem of the scalability of the database ...

 As for a cluster I think you are thinking of multi-master replication.

No, I'm really thinking about a Cluster solution ... having several servers 
making one big virtual server to have several processors, and much RAM, in 
many boxes ...

 You should look into what others have said about trying to partition
 data among several boxes and then join the results together.

??? Who talked about this ?

 Or you could fork over  hundreds of thousands of dollars for Oracle's
 RAC.

No, please do not talk about this again ... I'm looking for a PostgreSQL 
solution ... I know RAC ... and I'm not able to pay for a RAC-certified 
hardware configuration plus a RAC licence.

Regards,
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
Joshua,

On Thursday 20 January 2005 15:44, Joshua D. Drake wrote:
 Hervé Piedvache wrote:
 
 My company, which I actually represent, is a fervent user of PostgreSQL.
 We have been building all our applications on PostgreSQL for more than 5
  years. We usually do classical client/server applications under Linux,
  and Web interfaces (php, perl, C/C++). We also manage public web
  services with 10/15 million records and up to 8 million page views per
  month.

 Depending on your needs either:

 Slony: www.slony.info

 or

 Replicator: www.commandprompt.com

 Either will do what you want. Replicator is easier to set up but
 Slony is free.

No ... as I have said ... how will I manage a database with a table of maybe 
250 000 000 records ? I'll need incredible servers ... to get quick access or 
index reading ... no ?

So what we would like to get is a pool of small servers able to make one 
virtual server ... that is what is called a Cluster ... no ?

I know they are not using PostgreSQL ... but how does a company like Google 
get such an incredible database in size and such quick access ?

regards,
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 15:51, Christopher Kings-Lynne wrote:
 Sorry but I don't agree with this ... Slony is a replication solution
  ... I don't need replication ... what will I do when my database grows
  to 50 Gb ... I'll need more than 50 Gb of RAM on each server
  ??? This solution is not very realistic for me ...
 
 I need a Cluster solution, not a replication one; or explain to me in detail
 how I will manage the scalability of my database ...
 
 Buy Oracle
 
  I think this is not my solution ... sorry I'm talking about finding a
  PostgreSQL solution ...

 My point is that there is no free solution.  There simply isn't.
 I don't know why you insist on keeping all your data in RAM, but the
 mysql cluster requires that ALL data MUST fit in RAM all the time.

I don't insist on having the data in RAM ... but when you use PostgreSQL with a 
big database, you know that for quick access, just for reading the index file 
for example, it's better to have as much RAM as possible ... I just want to be 
able to get quick access with a growing and growing database ...

 PostgreSQL has replication, but not partitioning (which is what you want).

:o(

 So, your only option is Oracle or another very expensive commercial
 database.

That's not good news ...
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Joshua D. Drake

No, please do not talk about this again ... I'm looking for a PostgreSQL 
solution ... I know RAC ... and I'm not able to pay for a RAC-certified 
hardware configuration plus a RAC licence.
 

What you want does not exist for PostgreSQL. You will either
have to build it yourself or pay somebody to build it for you.
Sincerely,
Joshua D. Drake

Regards,
 


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Stephen Frost
* Christopher Kings-Lynne ([EMAIL PROTECTED]) wrote:
 PostgreSQL has replication, but not partitioning (which is what you want).

It doesn't have multi-server partitioning..  It's got partitioning
within a single server (doesn't it?  I thought it did, I know it was
discussed w/ the guy from Cox Communications and I thought he was using
it :).

 So, your only option is Oracle or another very expensive commercial 
 database.

Or partition the data at the application layer.

Stephen
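
For reference, single-server partitioning in this era means table
inheritance; a rough sketch (all names are illustrative, and note the planner
does not prune partitions automatically here):

  CREATE TABLE facts (id bigint, created date, payload text);
  CREATE TABLE facts_2005_01
      (CHECK (created >= '2005-01-01' AND created < '2005-02-01'))
      INHERITS (facts);
  CREATE TABLE facts_2005_02
      (CHECK (created >= '2005-02-01' AND created < '2005-03-01'))
      INHERITS (facts);
  -- a query on the parent sees all children; inserts must target the
  -- right child from the application side
  SELECT count(*) FROM facts WHERE created >= '2005-02-01';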


signature.asc
Description: Digital signature


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Joshua D. Drake

So what we would like to get is a pool of small servers able to make one 
virtual server ... that is what is called a Cluster ... no ?

I know they are not using PostgreSQL ... but how does a company like Google 
get such an incredible database in size and such quick access ?
 

You could use dblink with multiple servers across data partitions
within PostgreSQL but I don't know how fast that would be.
J
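
A rough sketch of what that could look like, assuming the dblink contrib
module is installed and the big table is range-partitioned by id across two
servers (all names and the split point are illustrative):

  SELECT * FROM dblink('host=server1 dbname=warehouse',
                       'SELECT id, payload FROM big_table WHERE id < 125000000')
       AS t1(id integer, payload text)
  UNION ALL
  SELECT * FROM dblink('host=server2 dbname=warehouse',
                       'SELECT id, payload FROM big_table WHERE id >= 125000000')
       AS t2(id integer, payload text);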

regards,
 


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Joshua D. Drake
Christopher Kings-Lynne wrote:
Or you could fork over  hundreds of thousands of dollars for Oracle's
RAC.

No, please do not talk about this again ... I'm looking for a 
PostgreSQL solution ... I know RAC ... and I'm not able to pay for a 
RAC-certified hardware configuration plus a RAC licence.

There is absolutely zero PostgreSQL solution...

I just replied the same thing but then I was thinking. Couldn't he use 
multiple databases
over multiple servers with dblink?

It is not exactly how I would want to do it, but it would provide what 
he needs I think???

Sincerely,
Joshua D. Drake

You may have to split the data yourself onto two independent db 
servers and combine the results somehow in your application.

Chris
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 16:05, Joshua D. Drake wrote:
 Christopher Kings-Lynne wrote:
  Or you could fork over  hundreds of thousands of dollars for Oracle's
  RAC.
 
  No, please do not talk about this again ... I'm looking for a
  PostgreSQL solution ... I know RAC ... and I'm not able to pay for a
  RAC-certified hardware configuration plus a RAC licence.
 
  There is absolutely zero PostgreSQL solution...

 I just replied the same thing but then I was thinking. Couldn't he use
 multiple databases
 over multiple servers with dblink?

 It is not exactly how I would want to do it, but it would provide what
 he needs I think???

Yes, it seems to be the only solution ... but I'm a little disappointed about 
this ... could you explain to me why there is no such 
functionality ... it seems to be a real need for big applications, no ?

Thanks all for your answers ...
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Stephen Frost
* Hervé Piedvache ([EMAIL PROTECTED]) wrote:
 I know they are not using PostgreSQL ... but how does a company like Google 
 get such an incredible database in size and such quick access ?

They segment their data across multiple machines and have an algorithm
which tells the application layer which machine to contact for what
data.

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Steve Wampler
Hervé Piedvache wrote:
No ... as I have said ... how will I manage a database with a table of maybe 
250 000 000 records ? I'll need incredible servers ... to get quick access or 
index reading ... no ?

So what we would like to get is a pool of small servers able to make one 
virtual server ... that is what is called a Cluster ... no ?

I know they are not using PostgreSQL ... but how does a company like Google 
get such an incredible database in size and such quick access ?
Probably by carefully partitioning their data.  I can't imagine anything
being fast on a single table in the 250,000,000 tuple range.  Nor can I
really imagine any database that efficiently splits a single table
across multiple machines (or even inefficiently unless some internal
partitioning is being done).
So, you'll have to do some work at your end and not just hope that
a magic bullet is available.
Once you've got the data partitioned, the question becomes one of
how to enhance performance/scalability.  Have you considered RAIDb?
--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Joshua D. Drake

then I was thinking. Couldn't he use
multiple databases
over multiple servers with dblink?
It is not exactly how I would want to do it, but it would provide what
he needs I think???

Yes, it seems to be the only solution ... but I'm a little disappointed about 
this ... could you explain to me why there is no such 
functionality ... it seems to be a real need for big applications, no ?
 

Because it is really, really hard to do correctly and hard
equals expensive.
Sincerely,
Joshua D. Drake

Thanks all for your answers ...
 


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Merlin Moncure
 No, please do not talk about this again ... I'm looking for a PostgreSQL
 solution ... I know RAC ... and I'm not able to pay for a RAC-certified
 hardware configuration plus a RAC licence.

Are you totally certain you can't solve your problem with a single server 
solution?

How about:
Price out a 4 way Opteron 4u rackmount server with 64 bit linux, stuffed with 
hard drives (like 40) set up in a complex raid configuration (multiple raid 
controllers) allowing you (with tablespaces) to divide up your database.

You can drop in dual core opterons at some later point for an easy upgrade.  
Let's say this server costs 20k$...are you sure this will not be enough to 
handle your load?

Merlin
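
For illustration, the tablespaces part of that suggestion looks roughly like
this in 8.0 (paths and object names are assumptions):

  CREATE TABLESPACE raid_a LOCATION '/raid_a/pgdata';
  CREATE TABLESPACE raid_b LOCATION '/raid_b/pgdata';
  -- put a big table on one array and its index on another
  CREATE TABLE fact_2004 (id bigint, amount numeric) TABLESPACE raid_a;
  CREATE INDEX fact_2004_id_idx ON fact_2004 (id) TABLESPACE raid_b;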

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Stephan Szabo
On Thu, 20 Jan 2005, Dan Langille wrote:

 On 20 Jan 2005 at 6:14, Stephan Szabo wrote:

  On Wed, 19 Jan 2005, Dan Langille wrote:
 
   Hi folks,
  
   Running on 7.4.2, recently vacuum analysed the three tables in
   question.
  
   The query plan in question changes dramatically when a WHERE clause
   changes from ports.broken to ports.deprecated.  I don't see why.
    Well, I do see why: a sequential scan of 130,000 rows.  The query
    goes from 13ms to 1100ms because of this.  The full plans are at
   http://rafb.net/paste/results/v8ccvQ54.html
  
   I have tried some tuning by:
  
 set effective_cache_size to 4000, was 1000
 set random_page_cost to 1, was 4
  
    The resulting plan changes, but there is no speed improvement; the plans are at
   http://rafb.net/paste/results/rV8khJ18.html
  
   Any suggestions please?
 
  As a question, what does it do if enable_hashjoin is false? I'm wondering
  if it'll pick a nested loop for that step for the element/ports join and
  what it estimates the cost to be.

 With enable_hashjoin = false, no speed improvement.  Execution plan
 at http://rafb.net/paste/results/qtSFVM72.html

Honestly I expected it to be slower (which it was), but I figured it's
worth seeing what alternate plans it'll generate (specifically to see how
it cost a nested loop on that join to compare to the fast plan).
Unfortunately, it generated a merge join, so I think it might require both
enable_hashjoin=false and enable_mergejoin=false to get it which is likely
to be even slower in practice but still may be useful to see.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Andrei Bintintan wrote:
If you're using this to provide pages of results, could you use a 
cursor?
What do you mean by that? Cursor?
Yes I'm using this to provide pages, but if I jump to the last pages 
it goes very slow.
DECLARE mycursor CURSOR FOR SELECT * FROM ...
FETCH FORWARD 10 IN mycursor;
CLOSE mycursor;
Repeated FETCHes would let you step through your results. That won't 
work if you have a web-app making repeated connections.

If you've got a web-application then you'll probably want to insert the 
results into a cache table for later use.

--
  Richard Huxton
  Archonet Ltd
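
A minimal sketch of that cache-table idea (the table and column names are
illustrative; the query is the one from earlier in the thread):

  -- populate the cache once, numbering the rows as they are inserted
  CREATE TABLE result_cache (pos serial PRIMARY KEY, id integer);
  INSERT INTO result_cache (id)
      SELECT o.id
      FROM report r JOIN orders o ON o.id = r.id_order
      WHERE o.id_status = 6
      ORDER BY o.id;
  -- each page then becomes a small index range scan instead of a big OFFSET
  SELECT id FROM result_cache WHERE pos > 100 ORDER BY pos LIMIT 10;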
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Dave Cramer




Google uses something called the Google File System, look it up in
Google. It is a distributed file system.

Dave

Hervé Piedvache wrote:
 Joshua,

 On Thursday 20 January 2005 15:44, Joshua D. Drake wrote:
  Hervé Piedvache wrote:
   My company, which I actually represent, is a fervent user of PostgreSQL.
   We have been building all our applications on PostgreSQL for more than 5
   years. We usually do classical client/server applications under Linux,
   and Web interfaces (php, perl, C/C++). We also manage public web
   services with 10/15 million records and up to 8 million page views per
   month.

  Depending on your needs either:

  Slony: www.slony.info

  or

  Replicator: www.commandprompt.com

  Either will do what you want. Replicator is easier to set up but
  Slony is free.

 No ... as I have said ... how will I manage a database with a table of maybe 
 250 000 000 records ? I'll need incredible servers ... to get quick access or 
 index reading ... no ?

 So what we would like to get is a pool of small servers able to make one 
 virtual server ... that is what is called a Cluster ... no ?

 I know they are not using PostgreSQL ... but how does a company like Google 
 get such an incredible database in size and such quick access ?

 regards,

-- 
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 16:14, Steve Wampler wrote:
 Once you've got the data partitioned, the question becomes one of
 how to inhance performance/scalability.  Have you considered RAIDb?

No, but it seems to be very interesting ... close to the explanation of 
Joshua ... but done automatically ...

Thanks !
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 16:23, Dave Cramer wrote:
 Google uses something called the google filesystem, look it up in
 google. It is a distributed file system.

Yes, that's another point I'm working on ... making a cluster of servers using 
GFS ... and making PostgreSQL run on it ...

But I have not finished my tests ... and maybe people here could have experience 
with this ...

Regards,
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 16:16, Merlin Moncure wrote:
  No please do not talk about this again ... I'm looking about a PostgreSQL
  solution ... I know RAC ... and I'm not able to pay for a RAC certify
  hardware configuration plus a RAC Licence.

 Are you totally certain you can't solve your problem with a single server
 solution?

 How about:
 Price out a 4 way Opteron 4u rackmount server with 64 bit linux, stuffed
 with hard drives (like 40) set up in a complex raid configuration (multiple
 raid controllers) allowing you (with tablespaces) to divide up your
 database.

 You can drop in dual core opterons at some later point for an easy upgrade.
  Let's say this server costs 20k$...are you sure this will not be enough to
 handle your load?

I'm not; as I said in my mail, I want to do a Cluster of servers ... :o)
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Dan Langille
On 20 Jan 2005 at 7:26, Stephan Szabo wrote:

 On Thu, 20 Jan 2005, Dan Langille wrote:
 
  On 20 Jan 2005 at 6:14, Stephan Szabo wrote:
 
   On Wed, 19 Jan 2005, Dan Langille wrote:
  
Hi folks,
   
Running on 7.4.2, recently vacuum analysed the three tables in
question.
   
The query plan in question changes dramatically when a WHERE clause
changes from ports.broken to ports.deprecated.  I don't see why.
Well, I do see why: a sequential scan of a 130,000 rows.  The query
goes from 13ms to 1100ms because the of this.  The full plans are at
http://rafb.net/paste/results/v8ccvQ54.html
   
I have tried some tuning by:
   
  set effective_cache_size to 4000, was 1000
  set random_page_cost to 1, was 4
   
The resulting plan changes, but no speed improvment, are at
http://rafb.net/paste/results/rV8khJ18.html
   
Any suggestions please?
  
   As a question, what does it do if enable_hashjoin is false? I'm wondering
   if it'll pick a nested loop for that step for the element/ports join and
   what it estimates the cost to be.
 
  With enable_hashjoin = false, no speed improvement.  Execution plan
  at http://rafb.net/paste/results/qtSFVM72.html
 
 Honestly I expected it to be slower (which it was), but I figured it's
 worth seeing what alternate plans it'll generate (specifically to see how
 it cost a nested loop on that join to compare to the fast plan).
 Unfortunately, it generated a merge join, so I think it might require both
 enable_hashjoin=false and enable_mergejoin=false to get it which is likely
 to be even slower in practice but still may be useful to see.

Setting both to false gives a dramatic performance boost.  See 
http://rafb.net/paste/results/b70KAi42.html
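
Those settings are session-local, so the test sequence is roughly:

  SET enable_hashjoin = false;
  SET enable_mergejoin = false;
  EXPLAIN ANALYZE SELECT ...;   -- the ports/element query from the paste above
  RESET enable_hashjoin;        -- restore the defaults afterwards
  RESET enable_mergejoin;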

This gives suitable speed, but why does the plan vary so much with 
such a minor change in the WHERE clause?
-- 
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Which PARAMETER is most important for load query??

2005-01-20 Thread amrit
I'm dealing with a big database [3.8 GB] with about 3 million records. Some of the
queries seem to be slow even though there are just a few users at night. I would like
to know which parameter in the list below is most effective in raising the speed of
these queries?

shmmax = 32384 * 8192 = 265289728
shared_buffers = 32384
sort_mem = 34025  (I guess increasing this one is most effective, but too
high causes reading from swap, is that right?)
effective_cache_size = 153204

My server has 4 GB of RAM and ~140 clients in rush hours.

Amrit
Thailand

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Steve Wampler
Hervé Piedvache wrote:
On Thursday 20 January 2005 16:23, Dave Cramer wrote:
Google uses something called the google filesystem, look it up in
google. It is a distributed file system.

Yes that's another point I'm working on ... make a cluster of server using 
GFS ... and making PostgreSQL running with it ...
A few years ago I played around with GFS, but not for postgresql.
I don't think it's going to help - logically there's no difference
between putting PG on GFS and putting PG on NFS - in both cases
the filesystem doesn't provide any support for distributing the
task at hand - and a PG database server isn't written to be
distributed across hosts regardless of the distribution of the
data across filesystems.

--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Christopher Kings-Lynne
Probably by carefully partitioning their data.  I can't imagine anything
being fast on a single table in 250,000,000 tuple range.  Nor can I
really imagine any database that efficiently splits a single table
across multiple machines (or even inefficiently unless some internal
partitioning is being done).
Ah, what about partial indexes - those might help.  As a kind of 
'semi-partition'.
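
For instance (a sketch only, with a hypothetical log table; the predicate has
to appear in queries for the planner to pick the partial index):

  -- One partial index per 'slice' of the data:
  CREATE INDEX log_2004_idx ON log (stamp)
      WHERE stamp >= '2004-01-01' AND stamp < '2005-01-01';
  CREATE INDEX log_2005_idx ON log (stamp)
      WHERE stamp >= '2005-01-01' AND stamp < '2006-01-01';

  -- A query restricted to one slice only touches the matching, smaller index:
  SELECT count(*) FROM log
  WHERE stamp >= '2005-01-10' AND stamp < '2005-01-20';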

Chris
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Joshua D. Drake
Christopher Kings-Lynne wrote:
Probably by carefully partitioning their data.  I can't imagine anything
being fast on a single table in 250,000,000 tuple range.  Nor can I
really imagine any database that efficiently splits a single table
across multiple machines (or even inefficiently unless some internal
partitioning is being done).

Ah, what about partial indexes - those might help.  As a kind of 
'semi-partition'.
He could also use schemas to partition out the information within the 
same database.
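
Something along these lines (hypothetical names; the application picks the
partition via the search path):

  CREATE SCHEMA client_a;
  CREATE TABLE client_a.events (id serial PRIMARY KEY, payload text);

  CREATE SCHEMA client_b;
  CREATE TABLE client_b.events (id serial PRIMARY KEY, payload text);

  SET search_path TO client_a;
  SELECT count(*) FROM events;  -- resolves to client_a.events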

J
Chris

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Richard_D_Levine

I think maybe a SAN in conjunction with tablespaces might be the answer.
Still need one honking server.

Rick



  
Stephen Frost [EMAIL PROTECTED], 01/20/2005 10:08 AM
To: Christopher Kings-Lynne [EMAIL PROTECTED]
cc: Hervé Piedvache [EMAIL PROTECTED], pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

* Christopher Kings-Lynne ([EMAIL PROTECTED]) wrote:
 PostgreSQL has replication, but not partitioning (which is what you
want).

It doesn't have multi-server partitioning..  It's got partitioning
within a single server (doesn't it?  I thought it did, I know it was
discussed w/ the guy from Cox Communications and I thought he was using
it :).

 So, your only option is Oracle or another very expensive commercial
 database.

Or partition the data at the application layer.

 Stephen

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Which PARAMETER is most important for load query??

2005-01-20 Thread Christopher Kings-Lynne

[EMAIL PROTECTED] wrote:
I'm dealing with big database [3.8 Gb] and records of 3 millions . Some of the
query seems to be slow eventhough just a few users in the night. I would like
to know which parameter list below is most effective in rising the speed of
these queries?
Shmmax = 32384*8192 =265289728
Share buffer = 32384
That's the one you want to increase...
sort_mem = 34025= I guess increase this one is most effective but too
You should reduce this.  This is memory PER SORT.  You could have 10 
sorts in one query and that query being run 10 times at once, using 100x 
that sort_mem in total - causing lots of swapping.  So something like 
8192 would probably be better, even lower at 4096 perhaps.

effective cache = 153204
That's probably about right.
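
Pulling that together, a rough postgresql.conf sketch for this 4 GB box
(starting points to benchmark, not definitive values; 7.4-era parameter
names as used above):

  shared_buffers = 32384         # 32384 x 8 kB = ~253 MB; kernel SHMMAX must allow it
  sort_mem = 4096                # 4 MB per sort, per backend - keep this modest
  effective_cache_size = 153204  # in 8 kB pages, roughly 1.2 GB of expected OS cache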
Chris
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Rod Taylor
On Thu, 2005-01-20 at 15:36 +0100, Hervé Piedvache wrote:
 On Thursday 20 January 2005 15:24, Christopher Kings-Lynne wrote:
   Is there any solution with PostgreSQL matching these needs ... ?
 
  You want: http://www.slony.info/
 
   Do we have to backport our development to MySQL for this kind of problem
   ? Is there any other solution than a Cluster for our problem ?
 
  Well, Slony does replication which is basically what you want :)
 
  Only master-slave though, so you will need to have all inserts go via
  the master server, but selects can come off any server.
 
 Sorry but I don't agree with this ... Slony is a replication solution ... I 
 don't need replication ... what will I do when my database will grow up to 50 
 Gb ... I'll need more than 50 Gb of RAM on each server ???
 This solution is not very realistic for me ...

Slony has some other issues with databases > 200GB in size as well
(well, it hates long running transactions -- and pg_dump is a regular
long running transaction)

However, you don't need RAM one each server for this, you simply need
enough disk space.

Have a Master which takes writes, a replicator which you can consider
to be a hot-backup of the master, have N slaves replicate off of the
otherwise untouched replicator machine.

For your next trick, have the application send read requests for Clients
A-C to slave 1, D-F to slave 2, ...

You need enough memory to hold the index sections for clients A-C on
slave 1. The rest of the index can remain on disk. It's available should
it be required (D-F box crashed, so your application is now feeding
those read requests to the A-C machine)...

Go to more slaves and smaller segments as you require. Use the absolute
cheapest hardware you can find for the slaves that gives reasonable
performance. They don't need to be reliable, so RAID 0 on IDE drives is
perfectly acceptable.

PostgreSQL can do the replication portion quite nicely. You need to
implement the cluster part in the application side.
-- 


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Disk configuration

2005-01-20 Thread Alex Turner
I have never seen benchmarks for RAID 0+1.  Very few people use it
because it's not very fault tolerant, so I couldn't answer for sure.
I would imagine that RAID 0+1 could achieve better read throughput
because you could, in theory, read from each half of the mirror
independently.  Writes would be the same, I would imagine, because you
still have to write all data to all drives.  That's my best guess.

Alex Turner
NetEconomist


On Thu, 20 Jan 2005 11:55:37 +1100, Benjamin Wragg [EMAIL PROTECTED] wrote:
 
 Thanks. That sorts out all my questions regarding disk configuration. One
 more regarding RAID. Is RAID 1+0 and 0+1 essentially the same at a
 performance level?
 
 Thanks,
 
 Benjamin
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Alex Turner
 Sent: Thursday, 20 January 2005 2:53 AM
 To: Benjamin Wragg
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Disk configuration
 
 The primary goal is to reduce the number of seeks a disk or array has to
 perform.  Serial write throughput is much higher than random write
 throughput.  If you are performing very high volume throughput on a server
 that is doing multiple things, then it may be advisable to have one partition
 for OS, one for postgresql binaries, one for xlog and one for table data (or
 multiple if you are PG8.0).  This is the ultimate configuration, but most
 people don't require this level of separation.  If you do need this level of
 separation, also bear in mind that table data writes are more likely to be
 random writes so you want an array that can sustain high levels of IO/sec,
 so RAID 10 with 6 or more drives is ideal.  If you want fault tolerance,
 then RAID 1 for OS and postgresql binaries is a minimum, and I believe that
 xlog can also go on a RAID 1 unless you need more MB/sec.  Ultimately you
 will need to benchmark any configuration you build in order to determine if
 it's successful and meets your needs.  This of course sucks, because you
 don't want to buy too much because it's a waste of $$s.
 
 What I can tell you is my own experience which is a database running with
 xlog, software and OS on a RAID 1, with Data partition running on
 3 disk RAID 5 with a database of about 3 million rows total gets an insert
 speed of about 200 rows/sec on an average size table using a compaq proliant
 ML370 Dual Pentium 933 w/2G RAM.  Most of the DB is in RAM, so read times
 are very good with most queries returning sub second.
 
 Hope this helps at least a little
 
 Alex Turner
 NetEconomist
 
 On Wed, 19 Jan 2005 09:03:44 +1100, Benjamin Wragg [EMAIL PROTECTED]
 wrote:
 
  I just wanted to bounce off the list the best way to configure disks
  for a postgresql server. My gut feeling is as follows:
 
  Keep the OS and postgresql install on separate disks to the postgresql
  /data directory?
  Is a single hard disk drive acceptable for the OS and postgresql
  program, or will this create a bottle neck? Would a multi disk array
  be more appropriate?
 
  Cheers,
 
  Benjamin Wragg
 
 
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 
 


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread amrit
What you want is some kind of huge parallel computing, isn't it? I have heard
that many groups of Japanese PostgreSQL developers did it, but they are talking
on a Japanese website and of course in Japanese.
I can name one of them, Atsushi Mitani, and his website
http://www.csra.co.jp/~mitani/jpug/pgcluster/en/index.html
and you may contact him directly.

Amrit
Thailand

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Stephen Frost
* [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
 I think maybe a SAN in conjunction with tablespaces might be the answer.
 Still need one honking server.

That's interesting - can a PostgreSQL partition be spread across multiple
tablespaces?

Stephen




Re: [PERFORM]

2005-01-20 Thread Alex Turner
I am curious - I wasn't aware that postgresql supported partitioned tables.
Could someone point me to the docs on this?

Thanks,

Alex Turner
NetEconomist


On Thu, 20 Jan 2005 09:26:03 -0500, Stephen Frost [EMAIL PROTECTED] wrote:
 * Matt Casters ([EMAIL PROTECTED]) wrote:
  I have the go ahead of a customer to do some testing on Postgresql in a 
  couple of weeks as a
  replacement for Oracle.
  The reason for the test is that the number of users of the warehouse is 
  going to increase and this
  will have a serious impact on licencing costs. (I bet that sounds familiar)
 
 Rather familiar, yes... :)
 
  We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb 
  RAM) on Oracle.
  Basically we have 2 large fact tables to deal with: one going for 400M 
  rows, the other will be
  hitting 1B rows soon.
  (around 250Gb of data)
 
 Quite a bit of data.  There's one big thing to note here I think-
 Postgres will not take advantage of multiple CPUs for a given query,
 Oracle will.  So, it depends on your workload as to how that may impact
 you.  Situations where this will be unlikely to affect you:
 
 Your main bottle-neck is IO/disk and not CPU.
 You run multiple queries in parallel frequently.
 There are other processes on the system which chew up CPU time anyway.
 
 Situations where you're likely to be affected would be:
 
 You periodically run one big query.
 You run a set of queries in sequential order.
 
  My questions to the list are: has this sort of thing been attempted before? 
  If so, what were the
  performance results compared to Oracle?
 
 I'm pretty sure it's been attempted before but unfortunately I don't
 have any numbers on it myself.  My data sets aren't that large (couple
 million rows) but I've found PostgreSQL at least as fast as Oracle for
 what we do, and much easier to work with.
 
  I've been reading up on partitioned tables on pgsql; will the performance 
  benefit be
  comparable to Oracle partitioned tables?
 
 In this case I would think so, except that PostgreSQL still won't use
 multiple CPUs for a given query, even against partitioned tables, aiui.
 
  What are the gotchas?
 
 See above? :)  Other issues are things having to do w/ your specific
 SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
 something like select x,y from a,b where x=%y; to do a right-join,
 iirc).
 
  Should I be testing on 8 or the 7 version?
 
 Now that 8.0 is out I'd say probably test with that and just watch for
 8.0.x releases before you go production, if you have time before you
 have to go into production with the new solution (sounds like you do-
 changing databases takes time anyway).
 
  Thanks in advance for any help you may have, I'll do my best to keep 
  pgsql-performance up to date
  on the results.
 
 Hope that helps.  Others on here will correct me if I misspoke. :)
 
 Stephen
 
 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM]

2005-01-20 Thread Greg Stark

Matt Casters [EMAIL PROTECTED] writes:

 I've been reading up on partitioned tables on pgsql; will the performance
 benefit be comparable to Oracle partitioned tables?

Postgres doesn't have any built-in support for partitioned tables. You can do
it the same way people did it on Oracle up until 8.0 which is by creating
views of UNIONs or using inherited tables.
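
A sketch of the inheritance flavour (table and column names invented for
illustration; note there is no automatic partition pruning, which is the
point below):

  CREATE TABLE sales (sale_date date, amount numeric);
  CREATE TABLE sales_2004 (
      CHECK (sale_date >= '2004-01-01' AND sale_date < '2005-01-01')
  ) INHERITS (sales);
  CREATE TABLE sales_2005 (
      CHECK (sale_date >= '2005-01-01' AND sale_date < '2006-01-01')
  ) INHERITS (sales);

  -- Dropping a year of data is instantaneous, no DELETE and no vacuum:
  DROP TABLE sales_2004;

  -- Queries against the parent scan every remaining child:
  SELECT sum(amount) FROM sales WHERE sale_date >= '2005-01-01';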

The main advantage of partitioned tables is being able to load and drop data
in large chunks instantaneously. This avoids having to perform large deletes
and then having to vacuum huge tables to recover the space.

However in Postgres you aren't going to get most of the performance advantage
of partitions in your query plans. The Oracle planner can prune partitions it
knows aren't relevant to the query to avoid having to search through them.

This can let it get the speed of a full table scan without the disadvantage of
having to read irrelevant tuples. Postgres is sometimes going to be forced to
either do a much slower index scan or read tables that aren't relevant.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Alex Turner
I am also very interested in this question. Is there any way to
declare a persistent cursor that remains open between pg sessions? 
This would be better than a temp table because you would not have to
do the initial select and insert into a fresh table and incur those IO
costs, which are often very heavy, and the reason why one would want
to use a cursor.

Alex Turner
NetEconomist


On Thu, 20 Jan 2005 15:20:59 +, Richard Huxton dev@archonet.com wrote:
 Andrei Bintintan wrote:
  If you're using this to provide pages of results, could you use a
  cursor?
 
  What do you mean by that? Cursor?
 
  Yes I'm using this to provide pages, but If I jump to the last pages
  it goes very slow.
 
 DECLARE mycursor CURSOR FOR SELECT * FROM ...
 FETCH FORWARD 10 IN mycursor;
 CLOSE mycursor;
 
 Repeated FETCHes would let you step through your results. That won't
 work if you have a web-app making repeated connections.
 
 If you've got a web-application then you'll probably want to insert the
 results into a cache table for later use.
 
 --
Richard Huxton
Archonet Ltd
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Greg Stark
Steve Wampler [EMAIL PROTECTED] writes:

 Hervé Piedvache wrote:
 
  No ... as I have said ... how I'll manage a database getting a table of may
  be 250 000 000 records ? I'll need incredible servers ... to get quick 
  access
  or index reading ... no ?
 
 Probably by carefully partitioning their data.  I can't imagine anything
 being fast on a single table in 250,000,000 tuple range.  

Why are you all so psyched out by the size of the table? That's what indexes
are for.

The size of the table really isn't relevant here. The important thing is the
size of the working set. Ie, How many of those records are required to respond
to queries.

As long as you tune your application so every query can be satisfied by
reading a (very) limited number of those records and have indexes to speed
access to those records you can have quick response time even if you have
terabytes of raw data. 

I would start by looking at the plans for the queries you're running and
seeing if you have any queries that are reading more than a hundred records or
so. If so then you have to optimize them or rethink your application design.
You might need to restructure your data so you don't have to scan too many
records for any query.

No clustering system is going to help you if your application requires reading
through too much data. If every query is designed to not have to read more
than a hundred or so records then there's no reason you can't have sub-100ms
response time even if you had terabytes of raw data.

If the problem is just that each individual query is fast but there's too many
coming for a single server then something like slony is all you need. It'll
spread the load over multiple machines. If you spread the load in an
intelligent way you can even concentrate each server on certain subsets of the
data. But that shouldn't even really be necessary, just a nice improvement.

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Ron Mayer
Richard Huxton wrote:
If you've got a web-application then you'll probably want to insert the 
results into a cache table for later use.

If I have quite a bit of activity like this (people selecting 1 out
of a few million rows and paging through them in a web browser), would
it be good to have a single table with a userid column shared by all
users, or a separate table for each user that can be truncated/dropped?
I started out with one table; but with people doing tens of thousands
of inserts and deletes per session, I had a pretty hard time figuring
out a reasonable vacuum strategy.
Eventually I started doing a whole bunch of create table tmp_<userid>
tables, where <userid> is a userid, and a script to drop these tables - but
that's quite ugly in a different way.
With 8.0 I guess I'll try the single table again - perhaps what I
want may be to always have a I/O throttled vacuum running...  hmm.
Any suggestions?
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Alex Turner wrote:
I am also very interested in this question. Is there any way
to declare a persistent cursor that remains open between pg sessions?
Not sure how this would work. What do you do with multiple connections? 
Only one can access the cursor, so which should it be?

 This would be better than a temp table because you would not have to
 do the initial select and insert into a fresh table and incur those
IO costs, which are often very heavy, and the reason why one would
want to use a cursor.
I'm pretty sure two things mean there's less difference than you might 
expect:
1. Temp tables don't fsync
2. A cursor will spill to disk beyond a certain size
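
For the temp-table route, a sketch (hypothetical names; the row number gives
cheap page addressing, at the cost of the one-off insert, and it relies on the
subselect being evaluated in sorted order, which holds in practice here):

  CREATE SEQUENCE page_seq;
  CREATE TEMP TABLE search_cache AS
      SELECT nextval('page_seq') AS row_num, i.*
      FROM  (SELECT * FROM items WHERE name LIKE 'a%' ORDER BY name) i;

  -- Each page is then a small indexed range scan:
  CREATE INDEX search_cache_idx ON search_cache (row_num);
  SELECT * FROM search_cache WHERE row_num BETWEEN 51 AND 100;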

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Greg Stark
Andrei Bintintan [EMAIL PROTECTED] writes:

  If you're using this to provide pages of results, could you use a cursor?
 What do you mean by that? Cursor?
 
 Yes I'm using this to provide pages, but If I jump to the last pages it goes
 very slow.

The best way to do pages for is not to use offset or cursors but to use an
index. This only works if you can enumerate all the sort orders the
application might be using and can have an index on each of them.

To do this the query would look something like:

SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50

Then you take note of the last value used on a given page and if the user
selects next you pass that as the starting point for the next page.

This query takes the same amount of time no matter how many records are in the
table and no matter what page of the result set the user is on. It should
actually be instantaneous even if the user is on the hundredth page of
millions of records because it uses an index both for the finding the right
point to start and for the ordering.
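
Concretely, with a hypothetical items table sorted by name, that looks like:

  CREATE INDEX items_name_idx ON items (name);

  -- First page:
  SELECT * FROM items ORDER BY name LIMIT 50;

  -- 'Next' page: restart just past the last name shown on the previous page.
  SELECT * FROM items WHERE name > 'mousepad' ORDER BY name LIMIT 50;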

It also has the advantage that it works even if the list of items changes as
the user navigates. If you use OFFSET and someone inserts a record in the
table then the next page will overlap the current page. Worse, if someone
deletes a record then next will skip a record.

The disadvantages of this are a) it's hard (but not impossible) to go
backwards. And b) it's impossible to give the user a list of pages and let
them skip around willy nilly.


(If this is for a web page then specifically don't recommend cursors. It will
mean you'll have to have some complex session management system that
guarantees the user will always come to the same postgres session and has some
garbage collection if the user disappears. And it means the URL is only good
for a limited amount of time. If they bookmark it it'll break if they come
back the next day.)

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Merlin Moncure
 I am also very interested in this question. Is there any way to
 declare a persistent cursor that remains open between pg sessions?
 This would be better than a temp table because you would not have to
 do the initial select and insert into a fresh table and incur those IO
 costs, which are often very heavy, and the reason why one would want
 to use a cursor.

Yes, it's called a 'view' :-)

Everything you can do with cursors you can do with a view, including
selecting records in blocks in a reasonably efficient way.  As long as
your # of records fetched is not really small (< 10) and your query is not
super complex, you can slide your view just like a cursor with zero real
impact on performance.

If the query in question does not scale in time complexity with the
amount of data returned (there is a fixed processing step which can't be
avoided), then it's materialized view time, insofar as they can be done
in PostgreSQL.
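
Since there are no built-in materialized views (as of 8.0), that means rolling
your own; a minimal sketch, with invented table names:

  -- A plain table holds the precomputed result...
  CREATE TABLE category_counts AS
      SELECT category, count(*) AS n FROM items GROUP BY category;

  -- ...and a function rebuilds it on demand (from cron, a trigger, etc.):
  CREATE FUNCTION refresh_category_counts() RETURNS void AS '
      DELETE FROM category_counts;
      INSERT INTO category_counts
          SELECT category, count(*) FROM items GROUP BY category;
  ' LANGUAGE sql;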

Now, cursors can be passed around in pl/pgsql functions which makes them
very useful in that context.  However, for normal data processing via
queries, they have some limitations that makes them hard to use in a
general sense.

Merlin

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Ron Mayer wrote:
Richard Huxton wrote:
If you've got a web-application then you'll probably want to insert 
the results into a cache table for later use.

If I have quite a bit of activity like this (people selecting 1 out
of a few million rows and paging through them in a web browser), would
it be good to have a single table with a userid column shared by all
users, or a separate table for each user that can be truncated/dropped?
I started out with one table; but with people doing tens of thousands
of inserts and deletes per session, I had a pretty hard time figuring
out a reasonable vacuum strategy.
As often as you can, and make sure your config allocates enough 
free-space-map for them. Unless, of course, you end up I/O saturated.

Eventually I started doing a whole bunch of create table tmp_<userid>
tables, where <userid> is a userid, and a script to drop these tables - but
that's quite ugly in a different way.
With 8.0 I guess I'll try the single table again - perhaps what I
want may be to always have a I/O throttled vacuum running...  hmm.
Well, there have been some tweaks, but I don't know if they'll help in 
this case.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread William Yu
Hervé Piedvache wrote:
Sorry but I don't agree with this ... Slony is a replication solution ... I 
don't need replication ... what will I do when my database will grow up to 50 
Gb ... I'll need more than 50 Gb of RAM on each server ???
This solution is not very realistic for me ...
Have you confirmed you need a 1:1 RAM:data ratio? Of course more memory 
gets more speed but often at a diminishing rate of return. Unless every 
record of your 50GB is used in every query, only the most commonly used 
elements of your DB needs to be in RAM. This is the very idea of caching.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Holger Hoffstaette
On Thu, 20 Jan 2005 16:32:27 +0100, Hervé Piedvache wrote:

 On Thursday 20 January 2005 16:23, Dave Cramer wrote:
 Google uses something called the google filesystem, look it up in
 google. It is a distributed file system.
 
 Yes that's another point I'm working on ... make a cluster of server using
 GFS ... and making PostgreSQL running with it ...

Did you read the GFS whitepaper? It really works differently from other
filesystems with regard to latency and consistency. You'll probably have
better success with Lustre (http://www.clusterfs.com/) or RedHat's Global
File System (http://www.redhat.com/software/rha/gfs/).
If you're looking for a 'cheap, free and easy' solution you can just as
well stop right now. :-)

-h



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Alex Turner
The problem is very large amounts of data that need to be both read
and updated.  If you replicate a system, you will need to
intelligently route the reads to the server that has the data in RAM,
or you will always be hitting disk, which is slow. This kind of routing,
AFAIK, is not possible with current database technology, and you are
still stuck for writes.

Writes are always going to be the bane of any cluster.  Clustering can
give better parallel read performance i.e. large no. of clients
accessing data simultaneously, but your write performance is always
going to be bound by the underlying disk infrastructure, not even
Oracle RAC can get around this (It uses multiple read nodes accessing
the same set of database files underneath)

Google solved the problem by building this intelligence into the
middle tier, and using a distributed file system. Java Entity Beans
are supposed to solve this problem somewhat by distributing the data
across multiple servers in a cluster and allowing you to defer write
syncing, but it really doesn't work all that well.

The only way I know to solve this at the RDBMS layer is to configure a
very powerful disk layer, which is basically going to be a SAN mesh with
multiple cards on a single system with multiple IO boards, or an OS
that clusters at the base level, thinking HP Superdome or z900.  Even
an Opteron w/PCI-X cards has a limit of about 400MB/sec throughput on a
single IO channel, and there are only two independent channels on any
boards I know about.

The other solution is to do what google did.  Implement your own
middle tier that knows how to route queries to the appropriate place.
Each node can then have its own independent database with its own
independent disk subsystem, and your throughput is only limited by
your network interconnects and your internet pipe.  This kind of
middle tier is really not that hard to do if your data can easily be
segmented.  Each node runs its own query, sort and filter
independently, and supplies the result to the central data broker,
which then collates the results and supplies them back to the user.
Updates work in a similar fashion.  The update comes into the central
broker, which decides which nodes it will affect, and then issues
updates to those nodes.

I've built this kind of architecture; if you want to do it, don't use
Java unless you want to pay top dollar for your programmers, because
it's hard to make it work well in Java (most JMS implementations suck;
look at MQueue or a custom queue impl, and forget XML, it's too slow to
serialize and deserialize requests).

Alex Turner
NetEconomist


On Thu, 20 Jan 2005 11:13:25 -0500, Stephen Frost [EMAIL PROTECTED] wrote:
 * [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
  I think maybe a SAN in conjunction with tablespaces might be the answer.
  Still need one honking server.
 
 That's interesting - can a PostgreSQL partition be spread across multiple
 tablespaces?
 
 Stephen
 
 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Richard Huxton
Greg Stark wrote:
Andrei Bintintan [EMAIL PROTECTED] writes:

If you're using this to provide pages of results, could you use a cursor?
What do you mean by that? Cursor?
Yes I'm using this to provide pages, but If I jump to the last pages it goes
very slow.

The best way to do pages for is not to use offset or cursors but to use an
index. This only works if you can enumerate all the sort orders the
application might be using and can have an index on each of them.
To do this the query would look something like:
SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
Then you take note of the last value used on a given page and if the user
selects next you pass that as the starting point for the next page.
Greg's is the most efficient, but you need to make sure you have a 
suitable key available in the output of your select.

Also, since you are repeating the query you could get different results 
as people insert/delete rows. This might or might not be what you want.

A similar solution is to partition by date/alphabet or similar, then 
page those results. That can reduce your resultset to a manageable size.
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Darcy Buskermolen
On January 20, 2005 06:49 am, Joshua D. Drake wrote:
 Stephen Frost wrote:
  * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
  On Thursday 20 January 2005 15:30, Stephen Frost wrote:
  * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
 Is there any solution with PostgreSQL matching these needs ... ?
 
 You might look into pg_pool.  Another possibility would be slony, though
 I'm not sure it's to the point you need it at yet, depends on if you can
 handle some delay before an insert makes it to the slave select systems.
 
 I think not ... pgpool or slony are replication solutions ... but as I
  have said to Christopher Kings-Lynne how I'll manage the scalabilty of
  the database ? I'll need several servers able to load a database growing
  and growing to get good speed performance ...
 
 They're both replication solutions, but they also help distribute the
 load.  For example:
 
 pg_pool will distribute the select queries amoung the servers.  They'll
 all get the inserts, so that hurts, but at least the select queries are
 distributed.
 
 slony is similar, but your application level does the load distribution
 of select statements instead of pg_pool.  Your application needs to know
 to send insert statements to the 'main' server, and select from the
 others.

 You can put pgpool in front of replicator or slony to get load
 balancing for reads.

Last time I checked, load-balanced reads were only available in pgpool if you 
were using pgpool's internal replication.  Has something changed recently?


 Is there any other solution than a Cluster for our problem ?
 
 Bigger server, more CPUs/disks in one box.  Try to partition up your
 data some way such that it can be spread across multiple machines, then
 if you need to combine the data have it be replicated using slony to a
 big box that has a view which joins all the tables and do your big
 queries against that.
 
  But I'll arrive at the limitation of a box size quickly, I think: 4
   processors with 64 Gb of RAM ... and after ?

 Opteron.

IBM Z-series, or other big iron.


 Go to non-x86 hardware after if you're going to continue to increase the
 size of the server.  Personally I think your better bet might be to
 figure out a way to partition up your data (isn't that what google
 does anyway?).
 
  Stephen

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Darcy Buskermolen
On January 20, 2005 06:51 am, Christopher Kings-Lynne wrote:
 Sorry but I don't agree with this ... Slony is a replication solution
  ... I don't need replication ... what will I do when my database will
  grow up to 50 Gb ... I'll need more than 50 Gb of RAM on each server
  ??? This solution is not very realistic for me ...
 
 I need a Cluster solution not a replication one or explain me in details
 how I will do for managing the scalabilty of my database ...
 
 Buy Oracle
 
  I think this is not my solution ... sorry I'm talking about finding a
  PostgreSQL solution ...

 My point being is that there is no free solution.  There simply isn't.
 I don't know why you insist on keeping all your data in RAM, but the
 mysql cluster requires that ALL data MUST fit in RAM all the time.

 PostgreSQL has replication, but not partitioning (which is what you want).

 So, your only option is Oracle or another very expensive commercial
 database.

Another Option to consider would be pgmemcache.  that way you just build the 
farm out of lots of large memory, diskless boxes for keeping the whole 
database in memory in the whole cluster.  More information on it can be found 
at: http://people.freebsd.org/~seanc/pgmemcache/


 Chris

 ---(end of broadcast)---
 TIP 8: explain analyze is your friend

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Spiegelberg, Greg
Isn't this a prime example of when to use a servlet or something similar
in function?  It will create the cursor, maintain it, and fetch against
it for a particular page.

Greg


-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 10:21 AM
To: Andrei Bintintan
Cc: pgsql-sql@postgresql.org; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???


Andrei Bintintan wrote:
 If you're using this to provide pages of results, could you use a 
 cursor?
 
 What do you mean by that? Cursor?
 
 Yes I'm using this to provide pages, but If I jump to the last pages 
 it goes very slow.

DECLARE mycursor CURSOR FOR SELECT * FROM ...
FETCH FORWARD 10 IN mycursor;
CLOSE mycursor;

Repeated FETCHes would let you step through your results. That won't 
work if you have a web-app making repeated connections.

If you've got a web-application then you'll probably want to insert the 
results into a cache table for later use.

--
   Richard Huxton
   Archonet Ltd

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-20 Thread Ron Mayer
Randolf Richardson wrote:
 While this doesn't exactly answer your question, I use this little
 tidbit of information when selling people on PostgreSQL.  PostgreSQL
 was chosen over Oracle as the database to handle all of the .org TLDs
 information.  ...
	Do you have a link for that information?  I've told a few people about 
this and one PostgreSQL advocate (thanks to me -- they were going to be a 
Microsoft shop before that) is asking.
Of course you could read their application when they were competing
with a bunch of other companies using databases from different vendors.
I believe this is the link to their response to the database
 questions...
http://www.icann.org/tlds/org/questions-to-applicants-13.htm#Response13TheInternetSocietyISOC
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Mitch Pirtle
On Thu, 20 Jan 2005 09:33:42 -0800, Darcy Buskermolen
[EMAIL PROTECTED] wrote:
 
 Another Option to consider would be pgmemcache.  that way you just build the
 farm out of lots of large memory, diskless boxes for keeping the whole
 database in memory in the whole cluster.  More information on it can be found
 at: http://people.freebsd.org/~seanc/pgmemcache/

Which brings up another question: why not just cluster at the hardware
layer? Get an external fiberchannel array, and cluster a bunch of dual
Opterons, all sharing that storage. In that sense you would be getting
one big PostgreSQL 'image' running across all of the servers.

Or is that idea too 90's?  ;-)

-- Mitch

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Ragnar Hafstað
On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote:

 The best way to do pages for is not to use offset or cursors but to use an
 index. This only works if you can enumerate all the sort orders the
 application might be using and can have an index on each of them.
 
 To do this the query would look something like:
 
  SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
 
 Then you take note of the last value used on a given page and if the user
 selects next you pass that as the starting point for the next page.

this will only work unchanged if the index is unique. Imagine, for
example, if you have more than 50 rows with the same value of col.

one way to fix this is to use ORDER BY col,oid

gnari



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Ragnar Hafstað
On Thu, 2005-01-20 at 19:12 +, Ragnar Hafstað wrote:
 On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote:
 
  The best way to do pages for is not to use offset or cursors but to use an
  index. This only works if you can enumerate all the sort orders the
  application might be using and can have an index on each of them.
  
  To do this the query would look something like:
  
   SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
  
  Then you take note of the last value used on a given page and if the user
  selects next you pass that as the starting point for the next page.
 
  this will only work unchanged if the index is unique. Imagine, for
  example, if you have more than 50 rows with the same value of col.
 
 one way to fix this is to use ORDER BY col,oid

and a slightly more complex WHERE clause as well, of course
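
i.e. something like this sketch, with id standing in for whatever unique
column is at hand:

  SELECT * FROM tab
  WHERE col > ? OR (col = ? AND id > ?)   -- strictly after the last (col, id) seen
  ORDER BY col, id
  LIMIT 50;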

gnari



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Darcy Buskermolen
On January 20, 2005 10:42 am, Mitch Pirtle wrote:
 On Thu, 20 Jan 2005 09:33:42 -0800, Darcy Buskermolen

 [EMAIL PROTECTED] wrote:
  Another Option to consider would be pgmemcache.  that way you just build
  the farm out of lots of large memory, diskless boxes for keeping the
  whole database in memory in the whole cluster.  More information on it
  can be found at: http://people.freebsd.org/~seanc/pgmemcache/

 Which brings up another question: why not just cluster at the hardware
 layer? Get an external fiberchannel array, and cluster a bunch of dual
 Opterons, all sharing that storage. In that sense you would be getting
 one big PostgreSQL 'image' running across all of the servers.

It doesn't quite work that way, thanks to shared memory and kernel disk cache
(among other things).

 Or is that idea too 90's?  ;-)

 -- Mitch

 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Merlin Moncure
 this will only work unchanged if the index is unique. Imagine, for
 example, if you have more than 50 rows with the same value of col.
 
 one way to fix this is to use ORDER BY col,oid

nope!  oid is
1. deprecated
2. not guaranteed to be unique even inside a (large) table.

Use a sequence instead.  

create view a_b as
select nextval('some_sequence') as k, a.*, b.* from a, b [...]


select * from a_b where k > k1 order by k limit 1000
*or*
execute fetch_a_b(k1, 1000) -- pass limit into prepared statement for extra 
flexibility.
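
The prepared-statement variant might look like this (names as in the sketch
above, so treat it as illustrative):

  PREPARE fetch_a_b (bigint, integer) AS
      SELECT * FROM a_b WHERE k > $1 ORDER BY k LIMIT $2;

  EXECUTE fetch_a_b(1000, 1000);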


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Hervé Piedvache
On Thursday 20 January 2005 19:09, Bruno Almeida do Lago wrote:
 Could you explain us what do you have in mind for that solution? I mean,
 forget the PostgreSQL (or any other database) restrictions and explain us
 how this hardware would be. Where the data would be stored?

 I've something in mind for you, but first I need to understand your needs!

I just want to make a big database as explained in my first mail ... At the 
beginning we will have approx. 150 000 000 records ... each month we will add 
about 4-8 million new rows in constant flow during the day ... and at the same 
time web users will access the database in order to read those data.
Stored data are quite close to data stored by google ... (we are not making a 
google clone ... just a lot of data, many small values and some big ones ... 
that's why I'm comparing with google for data storage).
Then we will have a search engine searching into those data ...

Dealing about the hardware, for the moment we have only a bi-pentium Xeon 
2.8Ghz with 4 Gb of RAM ... and we saw we had bad performance results ... so 
we are thinking about a new solution with maybe several servers (server 
design may vary from one to other) ... to get a kind of cluster to get better 
performance ...

Am I clear ?

Regards,
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Jean-Max Reymond
On Thu, 20 Jan 2005 12:13:17 -0700, Steve Wampler [EMAIL PROTECTED] wrote:
 Mitch Pirtle wrote:

 But that's not enough, because you're going to be running separate
 postgresql backends on the different hosts, and there are
 definitely consistency issues with trying to do that.  So far as
 I know (right, experts?) postgresql isn't designed with providing
 distributed consistency in mind (isn't shared memory used for
 consistency, which restricts all the backends to a single host?).

yes, you're right: you'll need a Distributed Lock Manager, and an
application to manage it ... Postgres?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Greg Stark
Hervé Piedvache [EMAIL PROTECTED] writes:

 On Thursday 20 January 2005 19:09, Bruno Almeida do Lago wrote:
  Could you explain us what do you have in mind for that solution? I mean,
  forget the PostgreSQL (or any other database) restrictions and explain us
  how this hardware would be. Where the data would be stored?
 
  I've something in mind for you, but first I need to understand your needs!
 
 I just want to make a big database as explained in my first mail ... At the 
 beginning we will have approx. 150 000 000 records ... each month we will add 
 about 4-8 million new rows in constant flow during the day ... and at the same 
 time web users will access the database in order to read those data.
 Stored data are quite close to data stored by google ... (we are not making a 
 google clone ... just a lot of data, many small values and some big ones ... 
 that's why I'm comparing with google for data storage).
 Then we will have a search engine searching into those data ...

You're concentrating on the data within the database. That's only half the
picture. What are you going to *do* with the data in the database? You need to
analyze what "we will have a search engine searching into those data" means in
more detail.

Postgres is more than capable of storing 150Gb of data. There are people with
terabyte databases on this list. You need to define what types of queries you
need to perform, how many data they need to manipulate, and what your
performance requirements are for those queries.

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM]

2005-01-20 Thread Matt Casters
 
Thanks Stephen,

My main concern is to get as much read performance on the disks as possible
on this given system.  CPU is rarely a problem on a typical data warehouse
system, this one's not any different.

We basically have 2 RAID5 disk sets (300Gb and 150Gb) with a third one
coming along (around 350Gb).
I was kind of hoping that the new PGSQL tablespaces would allow me to create
a storage container spanning multiple file-systems, but unfortunately, that
seems to be not the case.  Is this correct?

That tells me that I probably need to do a full reconfiguration of the disks
on the Solaris level to get maximum performance out of the system.
Mmmm. This is going to be a tough one to crack.  Perhaps it will be
possible to get some extra juice out of placing the indexes on the smaller
disks (150G) and the data on the bigger ones?

Thanks!

Matt

-Original Message-
From: Stephen Frost [mailto:[EMAIL PROTECTED] 
Sent: Thursday 20 January 2005 15:26
To: Matt Casters
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM]

* Matt Casters ([EMAIL PROTECTED]) wrote:
 I have the go ahead of a customer to do some testing on Postgresql in 
 a couple of weeks as a replacement for Oracle.
 The reason for the test is that the number of users of the warehouse 
 is going to increase and this will have a serious impact on licencing 
 costs. (I bet that sounds familiar)

Rather familiar, yes... :)

 We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb
RAM) on Oracle.
 Basically we have 2 large fact tables to deal with: one going for 400M 
 rows, the other will be hitting 1B rows soon.
 (around 250Gb of data)

Quite a bit of data.  There's one big thing to note here I think- Postgres
will not take advantage of multiple CPUs for a given query, Oracle will.
So, it depends on your workload as to how that may impact you.  Situations
where this will be unlikely to affect you:

Your main bottle-neck is IO/disk and not CPU.
You run multiple queries in parallel frequently.
There are other processes on the system which chew up CPU time anyway.

Situations where you're likely to be affected would be:

You periodically run one big query.
You run a set of queries in sequential order.

 My questions to the list are: has this sort of thing been attempted 
 before? If so, what were the performance results compared to Oracle?

I'm pretty sure it's been attempted before but unfortunately I don't have
any numbers on it myself.  My data sets aren't that large (couple million
rows) but I've found PostgreSQL at least as fast as Oracle for what we do,
and much easier to work with.

 I've been reading up on partitioned tables on pgsql; will the 
 performance benefit be comparable to Oracle partitioned tables?

In this case I would think so, except that PostgreSQL still won't use
multiple CPUs for a given query, even against partitioned tables, aiui.

 What are the gotchas?

See above? :)  Other issues are things having to do w/ your specific
SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
something like select x,y from a,b where x=%y; to do a right-join, iirc).

 Should I be testing on 8 or the 7 version?

Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x
releases before you go production, if you have time before you have to go
into production with the new solution (sounds like you do- changing
databases takes time anyway).

 Thanks in advance for any help you may have, I'll do my best to keep 
 pgsql-performance up to date on the results.

Hope that helps.  Others on here will correct me if I misspoke. :)

Stephen



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Dave Cramer




Two-way Xeons are as fast as a single Opteron; 150M rows isn't a big
deal.
Clustering isn't really the solution; I fail to see how clustering
actually helps, since it has to slow down file access.

Dave

Hervé Piedvache wrote:

  On Thursday 20 January 2005 19:09, Bruno Almeida do Lago wrote:
  
  
Could you explain us what do you have in mind for that solution? I mean,
forget the PostgreSQL (or any other database) restrictions and explain us
how this hardware would be. Where the data would be stored?

I've something in mind for you, but first I need to understand your needs!

  
  
I just want to make a big database as explained in my first mail ... At the 
beginning we will have approx. 150 000 000 records ... each month we will add 
about 4-8 million new rows in constant flow during the day ... and at the same 
time web users will access the database in order to read those data.
Stored data are quite close to data stored by google ... (we are not making a 
google clone ... just a lot of data, many small values and some big ones ... 
that's why I'm comparing with google for data storage).
Then we will have a search engine searching into those data ...

Dealing about the hardware, for the moment we have only a bi-pentium Xeon 
2.8Ghz with 4 Gb of RAM ... and we saw we had bad performance results ... so 
we are thinking about a new solution with maybe several servers (server 
design may vary from one to other) ... to get a kind of cluster to get better 
performance ...

Am I clear ?

Regards,
  


-- 
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561





Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Merlin Moncure
 As for the hardware, for the moment we have only a dual Pentium Xeon
 2.8GHz with 4 GB of RAM ... and we saw we had bad performance results ...
 so we are thinking about a new solution with maybe several servers (server
 design may vary from one to the other) ... to get a kind of cluster to get
 better performance ...
 
 Am I clear ?

yes.  Clustering is not the answer to your problem.  You need to build a 
bigger, faster box with lots of storage.

Clustering:
A: is a headache
B: will cost you more, not less
C: is not designed for what you are trying to do.

Going the x86 route, for about 20k$ you can get quad Opteron with 1-2 terabytes 
of storage (SATA), depending on how you configure your raid.   This is the best 
bang for the buck you are going to get, period.  Replicate for redundancy, not 
performance.

If you are doing a fair amount of writes, you will not be able to make a faster 
system than this for similar amount of cash.  You can drop the price a bit by 
pushing optional upgrades out to the future...

If this is not good enough for you, it's time to start thinking about a 
mid-range server.

Merlin

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM]

2005-01-20 Thread Joshua D. Drake
Matt Casters wrote:
 
Thanks Stephen,

My main concern is to get as much read performance on the disks as possible
on this given system.  CPU is rarely a problem on a typical data warehouse
system, this one's not any different.
We basically have 2 RAID5 disk sets (300GB and 150GB) with a third one
coming along (around 350GB).
Why not run two RAID systems: a RAID 1 for your OS and a RAID 10 for 
your database? Push all of your extra drives into the RAID 10.

Sincerely,
Joshua D. Drake


I was kind of hoping that the new PGSQL tablespaces would allow me to create
a storage container spanning multiple file-systems, but unfortunately, that
seems not to be the case.  Is this correct?
That tells me that I probably need to do a full reconfiguration of the disks
on the Solaris level to get maximum performance out of the system.
Mmmm. This is going to be a tough one to crack.  Perhaps it will be
possible to get some extra juice out of placing the indexes on the smaller
disks (150G) and the data on the bigger ones?
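Something along these lines, perhaps, if we end up testing on 8.0 (paths and
names below are hypothetical - a tablespace maps to a single directory, so
it would be one tablespace per file-system):

  CREATE TABLESPACE big_set   LOCATION '/raid300/pgdata';
  CREATE TABLESPACE small_set LOCATION '/raid150/pgidx';

  -- data on the big set, the index on the small one:
  CREATE TABLE fact (id bigint, dim_id int, amount numeric)
    TABLESPACE big_set;
  CREATE INDEX fact_dim_idx ON fact (dim_id) TABLESPACE small_set;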
Thanks!
Matt
-Original Message-
From: Stephen Frost [mailto:[EMAIL PROTECTED] 
Sent: Thursday 20 January 2005 15:26
To: Matt Casters
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM]

* Matt Casters ([EMAIL PROTECTED]) wrote:
I have the go ahead of a customer to do some testing on Postgresql in 
a couple of weeks as a replacement for Oracle.
The reason for the test is that the number of users of the warehouse 
is going to increase and this will have a serious impact on licensing 
costs. (I bet that sounds familiar)

Rather familiar, yes... :)

We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb
RAM) on Oracle.
Basically we have 2 large fact tables to deal with: one going for 400M 
rows, the other will be hitting 1B rows soon.
(around 250Gb of data)

Quite a bit of data.  There's one big thing to note here I think- Postgres
will not take advantage of multiple CPUs for a given query, Oracle will.
So, it depends on your workload as to how that may impact you.  Situations
where this will be unlikely to affect you:
Your main bottle-neck is IO/disk and not CPU.
You run multiple queries in parallel frequently.
There are other processes on the system which chew up CPU time anyway.
Situations where you're likely to be affected would be:
You periodically run one big query.
You run a set of queries in sequential order.

My questions to the list are: has this sort of thing been attempted 
before? If so, what were the performance results compared to Oracle?

I'm pretty sure it's been attempted before but unfortunately I don't have
any numbers on it myself.  My data sets aren't that large (couple million
rows) but I've found PostgreSQL at least as fast as Oracle for what we do,
and much easier to work with.

I've been reading up on partitioned tables on pgsql; will the 
performance benefit be comparable to Oracle partitioned tables?

In this case I would think so, except that PostgreSQL still won't use
multiple CPUs for a given query, even against partitioned tables, aiui.

What are the gotchas?

See above? :)  Other issues are things having to do w/ your specific
SQL- Oracle's old outer-join syntax isn't supported by PostgreSQL (the (+)
operator on the optional side of the join condition, rather than an explicit
LEFT/RIGHT JOIN).

Should I be testing on 8 or the 7 version?

Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x
releases before you go production, if you have time before you have to go
into production with the new solution (sounds like you do- changing
databases takes time anyway).

Thanks in advance for any help you may have, I'll do my best to keep 
pgsql-performance up to date on the results.

Hope that helps.  Others on here will correct me if I misspoke. :)
Stephen

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

--
Command Prompt, Inc., your source for PostgreSQL replication,
professional support, programming, managed services, shared
and dedicated hosting. Home of the Open Source Projects plPHP,
plPerlNG, pgManage,  and pgPHPtoolkit.
Contact us now at: +1-503-667-4564 - http://www.commandprompt.com


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Mark Kirkwood
Hervé Piedvache wrote:

As for the hardware, for the moment we have only a dual Pentium Xeon 
2.8GHz with 4 GB of RAM ... and we saw we had bad performance results ... so 
we are thinking about a new solution with maybe several servers (server 
design may vary from one to the other) ... to get a kind of cluster to get 
better performance ...

The poor performance may not necessarily be:
i) attributable to the hardware, or
ii) solved by clustering.
I would recommend determining *why* you got the slowdown. A few possible
reasons are:
i) not vacuuming often enough, free space map settings too small.
ii) postgresql.conf settings very non-optimal.
iii) index and/or data design not optimal for PG.
My suspicions would start at iii).
Other posters have pointed out that 150,000,000 records in itself is not
necessarily a problem, so this sort of data size is manageable.
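A quick first pass at i) and iii) might look like this (the query is only a
placeholder, and the table name is made up):

  -- database-wide, so the last lines report free space map usage
  -- against the max_fsm_pages / max_fsm_relations settings:
  VACUUM VERBOSE ANALYZE;

  -- then compare estimated row counts with actual ones on a slow query:
  EXPLAIN ANALYZE SELECT count(*) FROM some_big_table WHERE some_col = 42;

Large, consistent gaps between estimated and actual rows point at iii).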
regards
Mark

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM]

2005-01-20 Thread Matt Casters

Joshua,

Actually that's a great idea!
I'll have to check if Solaris wants to play ball though.
We'll have to see as we don't have the new disks yet, ETA is next week.

Cheers,

Matt

-Original Message-
From: Joshua D. Drake [mailto:[EMAIL PROTECTED] 
Sent: Thursday 20 January 2005 21:26
To: [EMAIL PROTECTED]
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM]

Matt Casters wrote:
  
 Thanks Stephen,
 
 My main concern is to get as much read performance on the disks as 
 possible on this given system.  CPU is rarely a problem on a typical 
 data warehouse system, this one's not any different.
 
 We basically have 2 RAID5 disk sets (300GB and 150GB) with a third 
 one coming along (around 350GB).

Why not run two RAID systems: a RAID 1 for your OS and a RAID 10 for your
database? Push all of your extra drives into the RAID 10.

Sincerely,

Joshua D. Drake




 I was kind of hoping that the new PGSQL tablespaces would allow me to 
 create a storage container spanning multiple file-systems, but 
 unfortunately, that seems not to be the case.  Is this correct?
 
 That tells me that I probably need to do a full reconfiguration of the 
 disks on the Solaris level to get maximum performance out of the system.
 Mmmm. This is going to be a tough one to crack.  Perhaps it will be 
 possible to get some extra juice out of placing the indexes on the 
 smaller disks (150G) and the data on the bigger ones?
 
 Thanks!
 
 Matt
 
 -Original Message-
 From: Stephen Frost [mailto:[EMAIL PROTECTED]
 Sent: Thursday 20 January 2005 15:26
 To: Matt Casters
 CC: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM]
 
 * Matt Casters ([EMAIL PROTECTED]) wrote:
 
I have the go ahead of a customer to do some testing on Postgresql in 
a couple of weeks as a replacement for Oracle.
The reason for the test is that the number of users of the warehouse 
is going to increase and this will have a serious impact on licensing 
costs. (I bet that sounds familiar)
 
 
 Rather familiar, yes... :)
 
 
We're running a medium sized data warehouse on a Solaris box (4CPU, 
8Gb
 
 RAM) on Oracle.
 
Basically we have 2 large fact tables to deal with: one going for 400M 
rows, the other will be hitting 1B rows soon.
(around 250Gb of data)
 
 
 Quite a bit of data.  There's one big thing to note here I think- 
 Postgres will not take advantage of multiple CPUs for a given query,
Oracle will.
 So, it depends on your workload as to how that may impact you.  
 Situations where this will be unlikely to affect you:
 
 Your main bottle-neck is IO/disk and not CPU.
 You run multiple queries in parallel frequently.
 There are other processes on the system which chew up CPU time anyway.
 
 Situations where you're likely to be affected would be:
 
 You periodically run one big query.
 You run a set of queries in sequential order.
 
 
My questions to the list are: has this sort of thing been attempted 
before? If so, what were the performance results compared to Oracle?
 
 
 I'm pretty sure it's been attempted before but unfortunately I don't 
 have any numbers on it myself.  My data sets aren't that large (couple 
 million
 rows) but I've found PostgreSQL at least as fast as Oracle for what we 
 do, and much easier to work with.
 
 
I've been reading up on partitioned tables on pgsql; will the 
performance benefit be comparable to Oracle partitioned tables?
 
 
 In this case I would think so, except that PostgreSQL still won't use 
 multiple CPUs for a given query, even against partitioned tables, aiui.
 
 
What are the gotchas?
 
 
 See above? :)  Other issues are things having to do w/ your specific
 SQL- Oracle's old outer-join syntax isn't supported by PostgreSQL (the (+)
 operator on the optional side of the join condition, rather than an
 explicit LEFT/RIGHT JOIN).
 
 
Should I be testing on 8 or the 7 version?
 
 
 Now that 8.0 is out I'd say probably test with that and just watch for 
 8.0.x releases before you go production, if you have time before you 
 have to go into production with the new solution (sounds like you do- 
 changing databases takes time anyway).
 
 
Thanks in advance for any help you may have, I'll do my best to keep 
pgsql-performance up to date on the results.
 
 
 Hope that helps.  Others on here will correct me if I misspoke. :)
 
   Stephen
 
 
 
 ---(end of 
 broadcast)---
 TIP 4: Don't 'kill -9' the postmaster


--
Command Prompt, Inc., your source for PostgreSQL replication, professional
support, programming, managed services, shared and dedicated hosting. Home
of the Open Source Projects plPHP, plPerlNG, pgManage,  and pgPHPtoolkit.
Contact us now at: +1-503-667-4564 - http://www.commandprompt.com




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] [SQL] OFFSET impact on Performance???

2005-01-20 Thread Alex Turner
How do you create a temporary view that has only a small subset of the
data from the DB in it?  (Links to docs are fine - I can read ;).  My
query isn't all that complex, and my number of records might be from
10 to 2k depending on how I implement it.

Alex Turner
NetEconomist


On Thu, 20 Jan 2005 12:00:06 -0500, Merlin Moncure
[EMAIL PROTECTED] wrote:
  I am also very interested in this very question.  Is there any way to
  declare a persistent cursor that remains open between pg sessions?
  This would be better than a temp table because you would not have to
  do the initial select and insert into a fresh table and incur those IO
  costs, which are often very heavy, and the reason why one would want
  to use a cursor.
 
 Yes, it's called a 'view' :-)
 
 Everything you can do with cursors you can do with a view, including
 selecting records in blocks in a reasonably efficient way.  As long as
 your # records fetched is not real small (< 10) and your query is not
 super complex, you can slide your view just like a cursor with zero real
 impact on performance.
 
 If the query in question does not scale in time complexity with the
 amount of data returned (there is a fixed processing step which can't be
 avoided), then it's materialized view time, such that they can be done
 in PostgreSQL.
 
 Now, cursors can be passed around in pl/pgsql functions which makes them
 very useful in that context.  However, for normal data processing via
 queries, they have some limitations that makes them hard to use in a
 general sense.
 
 Merlin
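To make the sliding-view idea concrete, a minimal sketch (all names are
invented; it assumes a unique, indexed id column):

  CREATE TABLE orders (id serial PRIMARY KEY, customer text,
                       total numeric, status text);
  CREATE VIEW open_orders AS
    SELECT id, customer, total FROM orders WHERE status = 'open';

  -- first block of 100:
  SELECT * FROM open_orders ORDER BY id LIMIT 100;

  -- next block: seek past the last id seen (here 100) instead of
  -- paying for an ever-growing OFFSET:
  SELECT * FROM open_orders WHERE id > 100 ORDER BY id LIMIT 100;

The id > ... form lets an index start right at the next block, which is
where the near-zero impact on performance comes from.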
 


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Russell Smith
On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
 On 20 Jan 2005 at 7:26, Stephan Szabo wrote:

[snip]
  Honestly I expected it to be slower (which it was), but I figured it's
  worth seeing what alternate plans it'll generate (specifically to see how
  it cost a nested loop on that join to compare to the fast plan).
  Unfortunately, it generated a merge join, so I think it might require both
  enable_hashjoin=false and enable_mergejoin=false to get it which is likely
  to be even slower in practice but still may be useful to see.
 
 Setting both to false gives a dramatic performance boost.  See 
 http://rafb.net/paste/results/b70KAi42.html
 
 ->  Materialize  (cost=15288.70..15316.36 rows=2766 width=35) (actual 
time=0.004..0.596 rows=135 loops=92)
   ->  Nested Loop  (cost=0.00..15288.70 rows=2766 width=35) 
(actual time=0.060..9.130 rows=135 loops=1)

The planner here has quite an inaccurate guess at the number of rows that will 
match in the join.  An alternative to turning off join types is to raise the 
statistics on the element columns, because that's where the join is happening.  
Hopefully the planner will then get a better idea, though it may not be able 
to.  An estimate of 2766 rows versus an actual 135 is quite likely to lead to 
a different plan choice.  As you can see, you have had to turn off two join 
types to get something you wanted/expected.
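For reference, the mechanics look like this (1000 is only an example target,
and the enable_* switches are session-local, for experimentation rather than
production use):

  ALTER TABLE ports   ALTER COLUMN element_id SET STATISTICS 1000;
  ALTER TABLE element ALTER COLUMN id         SET STATISTICS 1000;
  ANALYZE ports;
  ANALYZE element;

  SET enable_hashjoin = false;
  SET enable_mergejoin = false;
  -- then re-run EXPLAIN ANALYZE on the query and compare plans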

 This gives suitable speed, but why does the plan vary so much with 
 such a minor change in the WHERE clause?
Plan 1 - broken
   ->  Nested Loop  (cost=0.00..3825.30 rows=495 width=35) (actual 
time=0.056..16.161 rows=218 loops=1)

Plan 2 - deprecated
->  Hash Join  (cost=3676.78..10144.06 rows=2767 width=35) (actual 
time=7.638..1158.128 rows=135 loops=1)

The performance difference arises because when the WHERE clause is changed, 
you have a totally different set of selection options.  Plan 1 and Plan 2 from 
your earlier paste show that the estimate is out by a factor of 2 for plan 1, 
but for plan 2 it's a factor of 20.  The planner is likely to make the wrong 
choice when the stats are out by that much.

Beware: just because it is a small typing change does not mean the queries are 
anything alike.

Regards

Russell Smith.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM]

2005-01-20 Thread Mark Kirkwood
Matt Casters wrote:
Hi,
My questions to the list are: has this sort of thing been attempted before? If 
so, what were the performance results compared to Oracle?
I've been reading up on partitioned tables on pgsql; will the performance 
benefit be comparable to Oracle partitioned tables?
What are the gotchas?
Should I be testing on 8 or the 7 version?
While I didn't find any documents immediately, are there any fine manuals to 
read on data
warehouse performance tuning on PostgreSQL?
Some of the previous postings on this list discuss various methods for
doing partitioning (UNION and INHERIT), as well as the use of partial
indexes - see the thread titled : 'Data Warehouse Reevaluation - MySQL
vs Postgres -- merge tables'.
Unfortunately none of these work well for a standard 'star' because:
i) all conditions are on the dimension tables, and
ii) the optimizer can eliminate 'partition' tables only on the basis of
 *constant* conditions, and the resulting implied restrictions caused
by the join to the dimension table(s) are not usable for this.
So I think to get it to work well some violence to your 'star' may be
required (e.g. adding constant columns to 'fact' tables to aid the
optimizer, plus rewriting queries to include conditions on the added
columns).
One other gotcha is that Pg cannot do index-only access, which can hurt.
However it may be possibly to get good performance using CLUSTER on the
fact tables (or just loading them in a desirable order) plus using
partial indexes.
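A sketch of what that 'violence' can look like (all names invented):

  CREATE TABLE fact_2004 (sale_date date, cust_id int, amount numeric);
  CREATE TABLE fact_2005 (sale_date date, cust_id int, amount numeric);

  -- expose a constant 'partition key' column through a union view:
  CREATE VIEW fact AS
    SELECT 2004 AS yr, f.* FROM fact_2004 f
    UNION ALL
    SELECT 2005 AS yr, f.* FROM fact_2005 f;

  -- the *constant* condition yr = 2005 folds to false for the 2004
  -- branch, so only fact_2005 is actually scanned:
  SELECT sum(amount) FROM fact WHERE yr = 2005;

  -- physical ordering plus a partial index on one 'partition':
  CREATE INDEX fact_2005_date_idx ON fact_2005 (sale_date);
  CLUSTER fact_2005_date_idx ON fact_2005;
  CREATE INDEX fact_2005_big_idx ON fact_2005 (cust_id)
    WHERE amount > 1000;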
regards
Mark
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Ron Mayer
Ron Mayer wrote:
http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2002-53 
Wrong link...
http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&id=812
This is the one that discusses scalability, price, performance, 
failover, power consumption, hardware components, etc.

Bottom line was that the large server with SAN had $1877K hardware costs 
while the application-partitioned cluster had $110K hardware costs -- 
but it's apples-to-oranges since they were deployed in different years.

Still a big advantage for the small systems.
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Ron Mayer
Merlin Moncure wrote:
...You need to build a bigger, faster box with lots of storage...
Clustering ... 
B: will cost you more, not less

Is this still true when you get to 5-way or 17-way systems?
My (somewhat outdated) impression is that up to about 4-way systems
they're price competitive; but beyond that, I thought multiple cheap
servers scale much more affordably than large servers.   Certainly
at the point of a 129-CPU system I bet you're better off with a
network of cheap servers.
 A: a headache
Agreed if you mean clustering as-in making it look like one single 
database to the end user.  However in my experience a few years ago, if 
 you can partition the data in a way managed by the application, it'll 
not only be less of a headache, but probably provide a more flexable 
solution.  Currently I'm working on a pretty big GIS database, that 
we're looking to partition our data in a manner similar to the microsoft 
whitepaper on scaling terraserver that can be found here:
http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2002-53

I think this paper is a very nice analysis of many aspects of 
larger-serverSAN vs. application-partitioned-clusters, including 
looking at cost, reliability, managability, etc.  After reading that 
paper, we started very seriously looking into application-level 
partitioning.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-20 Thread Ron Mayer
I sometimes also think it's fun to point out that PostgreSQL has
bigger companies supporting its software - like this one:
http://www.fastware.com.au/docs/FujitsuSupportedPostgreSQLWhitePaper.pdf
with $43 billion in revenue -- instead of those little companies
like MySQL AB or Oracle.
 :)
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM]

2005-01-20 Thread Bruno Wolff III
On Thu, Jan 20, 2005 at 11:31:29 -0500,
  Alex Turner [EMAIL PROTECTED] wrote:
 I am curious - I wasn't aware that postgresql supported partitioned tables.
 Could someone point me to the docs on this?

Some people have been doing it using a union view. There isn't actually
a partition feature.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] column without pg_stats entry?!

2005-01-20 Thread Bruno Wolff III
On Thu, Jan 20, 2005 at 11:14:28 +0100,
  Bernd Heller [EMAIL PROTECTED] wrote:
 
 I wondered why the planner was making such bad assumptions about the 
 number of rows to find and had a look at pg_stats. and there was the 
 surprise:
 there is no entry in pg_stats for that column at all!! I can only 
 suspect that this has to do with the column being all null. I tried to 
 change a few records to a not-null value, but re-ANALYZE didn't catch 
 them apparently.

Someone else reported this recently and I think it is going to be fixed.

 Is this desired behaviour for analyze? Can I change it somehow? If not, 
 is there a better way to accomplish what I'm trying? I'm not to keen on 
 disabling seqscan for that query explicitly. It's a simple enough query 
 and the planner should be able to find the right plan without help - 
 and I'm sure it would if it had stats about it.

In the short run you could add an IS NOT NULL clause to your query.
The optimizer doesn't know that the column being TRUE implies IS NOT NULL, and
so the partial index won't be used unless you add that clause explicitly.
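Spelled out with made-up names (assuming the partial index only covers the
non-NULL rows):

  CREATE TABLE t (flag boolean);
  CREATE INDEX t_flag_idx ON t (flag) WHERE flag IS NOT NULL;

  -- flag being TRUE logically implies flag IS NOT NULL, but the planner
  -- will not derive that, so state the clause explicitly:
  SELECT count(*) FROM t WHERE flag AND flag IS NOT NULL;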

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Bruno Almeida do Lago
 
I was thinking the same! I'd like to know how other databases such as Oracle
do it.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mitch Pirtle
Sent: Thursday, January 20, 2005 4:42 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

On Thu, 20 Jan 2005 09:33:42 -0800, Darcy Buskermolen
[EMAIL PROTECTED] wrote:
 
 Another option to consider would be pgmemcache.  That way you just build
 the farm out of lots of large-memory, diskless boxes for keeping the whole
 database in memory in the whole cluster.  More information on it can be
 found at: http://people.freebsd.org/~seanc/pgmemcache/

Which brings up another question: why not just cluster at the hardware
layer? Get an external fiberchannel array, and cluster a bunch of dual
Opterons, all sharing that storage. In that sense you would be getting
one big PostgreSQL 'image' running across all of the servers.

Or is that idea too 90's?  ;-)

-- Mitch

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] index scan of whole table, can't see why

2005-01-20 Thread Dan Langille
On 21 Jan 2005 at 8:38, Russell Smith wrote:

 On Fri, 21 Jan 2005 02:36 am, Dan Langille wrote:
  On 20 Jan 2005 at 7:26, Stephan Szabo wrote:
 
 [snip]
   Honestly I expected it to be slower (which it was), but I figured
   it's worth seeing what alternate plans it'll generate
   (specifically to see how it cost a nested loop on that join to
   compare to the fast plan). Unfortunately, it generated a merge
   join, so I think it might require both enable_hashjoin=false and
   enable_mergejoin=false to get it which is likely to be even slower
   in practice but still may be useful to see.
  
  Setting both to false gives a dramatic performance boost.  See
  http://rafb.net/paste/results/b70KAi42.html
  
  ->  Materialize  (cost=15288.70..15316.36 rows=2766 width=35)
  (actual time=0.004..0.596 rows=135 loops=92)
->  Nested Loop  (cost=0.00..15288.70 rows=2766
width=35) (actual time=0.060..9.130 rows=135 loops=1)
 
 The planner here has quite an inaccurate guess at the number of rows
 that will match in the join.  An alternative to turning off join types
 is to raise the statistics on the element columns, because that's where
 the join is happening.  Hopefully the planner will then get a better
 idea, though it may not be able to.  An estimate of 2766 rows versus an
 actual 135 is quite likely to lead to a different plan choice.  As you
 can see, you have had to turn off two join types to get something you
 wanted/expected.

Fair comment.  However, the statistics on ports.element_id, 
ports.deprecated, ports.broken, and element.id are all set to 1000.

  This gives suitable speed, but why does the plan vary so much with
  such a minor change in the WHERE clause?
 Plan 1 - broken
->  Nested Loop  (cost=0.00..3825.30 rows=495 width=35) (actual
time=0.056..16.161 rows=218 loops=1)
 
 Plan 2 - deprecated
 ->  Hash Join  (cost=3676.78..10144.06 rows=2767 width=35)
 (actual time=7.638..1158.128 rows=135 loops=1)
 
 The performance difference arises because when the WHERE clause is
 changed, you have a totally different set of selection options. Plan 1
 and Plan 2 from your earlier paste show that the estimate is out by a
 factor of 2 for plan 1, but for plan 2 it's a factor of 20.  The planner
 is likely to make the wrong choice when the stats are out by that much.
 
 Beware: just because it is a small typing change does not mean the
 queries are anything alike.

Agreed.  I just did not expect such a dramatic change with a result 
set that is similar.  Actually, they aren't that similar at all.

Thank you.
-- 
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Josh Berkus
Bruno,

 Which brings up another question: why not just cluster at the hardware
 layer? Get an external fiberchannel array, and cluster a bunch of dual
 Opterons, all sharing that storage. In that sense you would be getting
 one big PostgreSQL 'image' running across all of the servers.

 Or is that idea too 90's?  ;-)

No, it just doesn't work.   Multiple postmasters can't share one database.

LinuxLabs (as I've gathered) tried to go one better by using a tool that 
allows shared memory to bridge multiple networked servers -- in other words, 
one postmaster controlling 4 or 5 servers.   The problem is that IPC via this 
method is about 1,000 times slower than IPC on a single machine, wiping out 
all of the scalability gains from having the cluster in the first place.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Jim C. Nasby
On Thu, Jan 20, 2005 at 10:08:47AM -0500, Stephen Frost wrote:
 * Christopher Kings-Lynne ([EMAIL PROTECTED]) wrote:
  PostgreSQL has replication, but not partitioning (which is what you want).
 
 It doesn't have multi-server partitioning..  It's got partitioning
 within a single server (doesn't it?  I thought it did, I know it was
 discussed w/ the guy from Cox Communications and I thought he was using
 it :).

No, PostgreSQL doesn't support any kind of partitioning, unless you
write it yourself. I think there's some work being done in this area,
though.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Jim C. Nasby
On Thu, Jan 20, 2005 at 10:40:02PM -0200, Bruno Almeida do Lago wrote:
  
 I was thinking the same! I'd like to know how other databases such as Oracle
 do it.
 
In a nutshell, in a clustered environment (which iirc in oracle means
shared disks), they use a set of files for locking and consistency
across machines. So you better have fast access to the drive array, and
the array better have caching of some kind.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Jim C. Nasby
On Thu, Jan 20, 2005 at 07:12:42AM -0800, Joshua D. Drake wrote:
 
 then I was thinking: couldn't he use multiple databases
 over multiple servers with dblink?
 
 It is not exactly how I would want to do it, but it would provide what
 he needs I think???

 
 
 Yes, it seems to be the only solution ... but I'm a little disappointed about 
 this ... could you explain to me why there is not this kind of 
 functionality ... it seems to be a real need for big applications, no?
  
 
 Because it is really, really hard to do correctly and hard
 equals expensive.

To expand on what Josh said, the expense in this case is development
resources. If you look on the developer site you'll see a huge TODO list
and a relatively small list of PostgreSQL developers. To develop a
cluster solution similar to RAC would probably take the efforts of the
entire development team for a year or more, during which time very
little else would be done.

I'm glad to see your persistence in wanting to use PostgreSQL, and there
might be some kind of limited clustering scheme that could be
implemented without a great amount of effort by the core developers. In
that case I think there's a good chance you could find people willing to
work on it.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Tatsuo Ishii
 On January 20, 2005 06:49 am, Joshua D. Drake wrote:
  Stephen Frost wrote:
  * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
  On Thursday 20 January 2005 15:30, Stephen Frost wrote:
  * Hervé Piedvache ([EMAIL PROTECTED]) wrote:
  Is there any solution with PostgreSQL matching these needs ... ?
  
  You might look into pg_pool.  Another possibility would be slony, though
  I'm not sure it's to the point you need it at yet, depends on if you can
  handle some delay before an insert makes it to the slave select systems.
  
  I think not ... pgpool or slony are replication solutions ... but as I
   have said to Christopher Kings-Lynne, how will I manage the scalability of
   the database? I'll need several servers able to load a database growing
   and growing to get good speed performance ...
  
  They're both replication solutions, but they also help distribute the
  load.  For example:
  
  pg_pool will distribute the select queries among the servers.  They'll
  all get the inserts, so that hurts, but at least the select queries are
  distributed.
  
  slony is similar, but your application level does the load distribution
  of select statements instead of pg_pool.  Your application needs to know
  to send insert statements to the 'main' server, and select from the
  others.
 
  You can put pgpool in front of replicator or slony to get load
  balancing for reads.
 
 Last time I checked, load balanced reads were only available in pgpool if you 
 were using pgpool's internal replication.  Has something changed recently?

Yes. However it would be pretty easy to modify pgpool so that it could
cope with Slony-I. I.e.

1) pgpool does the load balance and sends query to Slony-I's slave and
   master if the query is SELECT.

2) pgpool sends query only to the master if the query is other than
   SELECT.

The remaining problem is that Slony-I is not a synchronous replication
solution, so you need to be prepared for load balanced query
results that might differ among servers.

If there's enough demand, I would make such enhancements to pgpool.
--
Tatsuo Ishii

  Is there any other solution than a Cluster for our problem ?
  
  Bigger server, more CPUs/disks in one box.  Try to partition up your
  data some way such that it can be spread across multiple machines, then
  if you need to combine the data have it be replicated using slony to a
  big box that has a view which joins all the tables and do your big
  queries against that.
  
  But I'll reach the limits of a box size quickly, I think ... a box with 4
   processors and 64 GB of RAM ... and after that?
 
  Opteron.
 
 IBM Z-series, or other big iron.
 
 
  Go to non-x86 hardware after that if you're going to continue to increase the
  size of the server.  Personally I think your better bet might be to
  figure out a way to partition up your data (isn't that what google
  does anyway?).
  
 Stephen
 
 -- 
 Darcy Buskermolen
 Wavefire Technologies Corp.
 ph: 250.717.0200
 fx:  250.763.1759
 http://www.wavefire.com
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Iain
Oracle's RAC is good, but I think it's best to view it as a step in the high 
availability direction rather than a performance enhancer. While it can help 
your application scale up, that depends on the usage pattern. Also, it's not 
100% transparent to the application: for example, you can't depend on 
sequence numbers being allocated uniquely, as there can be delays propagating 
them to all nodes. So in clusters where insert rates are high, this means you 
should explicitly check for unique key violations and try again. Dealing 
with propagation delays comes with the clustering technology, I guess. 
Nonetheless, I would love to see this kind of functionality in postgres.

Regards
Iain
- Original Message - 
From: Jim C. Nasby [EMAIL PROTECTED]
To: Bruno Almeida do Lago [EMAIL PROTECTED]
Cc: 'Mitch Pirtle' [EMAIL PROTECTED]; 
pgsql-performance@postgresql.org
Sent: Friday, January 21, 2005 10:30 AM
Subject: Re: [PERFORM] PostgreSQL clustering VS MySQL clustering


On Thu, Jan 20, 2005 at 10:40:02PM -0200, Bruno Almeida do Lago wrote:
I was thinking the same! I'd like to know how other databases such as 
Oracle
do it.

In a nutshell, in a clustered environment (which iirc in oracle means
shared disks), they use a set of files for locking and consistency
across machines. So you better have fast access to the drive array, and
the array better have caching of some kind.
--
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly 

---(end of broadcast)---
TIP 8: explain analyze is your friend


  1   2   >