Re: [PERFORM]

2005-01-21 Thread Matt Casters

 On Thu, Jan 20, 2005 at 11:31:29 -0500,
   Alex Turner [EMAIL PROTECTED] wrote:
 I am curious - I wasn't aware that postgresql supported partitioned tables,
 Could someone point me to the docs on this.

 Some people have been doing it using a union view. There isn't actually
 a partition feature.



Actually, there is.  If found this example on pgsql-performance:

   CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
   ANALYZE super_foo ;

   CREATE TABLE sub_foo1 () INHERITS ( super_foo );
   INSERT INTO sub_foo1 VALUES ( 1, 1 );
   -- repeat insert until sub_foo1 has 1,000,000 rows
   CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
   ANALYZE sub_foo1 ;

   CREATE TABLE sub_foo2 () INHERITS ( super_foo );
   INSERT INTO sub_foo2 VALUES ( 2, 1 );
   -- repeat insert until sub_foo2 has 1,000,000 rows
   CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
   ANALYZE sub_foo2 ;


I think that in certain cases this system even beats Oracle as it stores less 
information in the
table partitions. (and in doing so is causing less disk IO)
BTW, internally, Oracle sees partitions as tables too.  Even the Union all 
system that MS SQL
Server uses works fine as long as the optimiser supports it to prune correctly.

Cheers,

Matt
--
Matt Casters [EMAIL PROTECTED]
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM]

2005-01-21 Thread Matt Casters

  Some people have been doing it using a union view. There isn't actually
  a partition feature.

 Actually, there is.  If found this example on pgsql-performance:

CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
ANALYZE super_foo ;
 
CREATE TABLE sub_foo1 () INHERITS ( super_foo );
 [...]
 
CREATE TABLE sub_foo2 () INHERITS ( super_foo );
 [...]
 

 Yes, this could be used instead of a view. But there is one thing
 missing. You can't just insert into super_foo and aquire the correct
 partition. You will still have to insert into the correct underlying
 table. Real partitioning will take care of correct partition
 selection.

This IS bad news.  It would mean a serious change in the ETL.
I think I can solve the other problems, but I don't know about this one...

Regards,

Matt





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM]

2005-01-20 Thread Matt Casters

Hi,

I have the go ahead of a customer to do some testing on Postgresql in a couple 
of weeks as a
replacement for Oracle.
The reason for the test is that the number of users of the warehouse is going 
to increase and this
will have a serious impact on licencing costs. (I bet that sounds familiar)

We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on 
Oracle.
Basically we have 2 large fact tables to deal with: one going for 400M rows, 
the other will be
hitting 1B rows soon.
(around 250Gb of data)

My questions to the list are: has this sort of thing been attempted before? If 
so, what where the
performance results compared to Oracle?
I've been reading up on partitioned tabes on pgsql, will the performance 
benefit will be
comparable to Oracle partitioned tables?
What are the gotchas?
Should I be testing on 8 or the 7 version?
While I didn't find any documents immediately, are there any fine manuals to 
read on data
warehouse performance tuning on PostgreSQL?

Thanks in advance for any help you may have, I'll do my best to keep 
pgsql-performance up to date
on the results.

Best regards,

Matt
--
Matt Casters [EMAIL PROTECTED]
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM]

2005-01-20 Thread Matt Casters
 
Thanks Stephen,

My main concern is to get as much read performance on the disks as possible
on this given system.  CPU is rarely a problem on a typical data warehouse
system, this one's not any different.

We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third one
coming along.(around 350Gb)
I was kind of hoping that the new PGSQL tablespaces would allow me to create
a storage container spanning multiple file-systems, but unfortunately, that
seems to be not the case.  Is this correct?

That tells me that I probably need to do a full reconfiguration of the disks
on the Solaris level to get maximum performance out of the system.
Mmmm. This is going to be a though one to crack.  Perhaps it will be
possible to get some extra juice out of placing the indexes on the smaller
disks (150G) and the data on the bigger ones?

Thanks!

Matt

-Oorspronkelijk bericht-
Van: Stephen Frost [mailto:[EMAIL PROTECTED] 
Verzonden: donderdag 20 januari 2005 15:26
Aan: Matt Casters
CC: pgsql-performance@postgresql.org
Onderwerp: Re: [PERFORM]

* Matt Casters ([EMAIL PROTECTED]) wrote:
 I have the go ahead of a customer to do some testing on Postgresql in 
 a couple of weeks as a replacement for Oracle.
 The reason for the test is that the number of users of the warehouse 
 is going to increase and this will have a serious impact on licencing 
 costs. (I bet that sounds familiar)

Rather familiar, yes... :)

 We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb
RAM) on Oracle.
 Basically we have 2 large fact tables to deal with: one going for 400M 
 rows, the other will be hitting 1B rows soon.
 (around 250Gb of data)

Quite a bit of data.  There's one big thing to note here I think- Postgres
will not take advantage of multiple CPUs for a given query, Oracle will.
So, it depends on your workload as to how that may impact you.  Situations
where this will be unlikely to affect you:

Your main bottle-neck is IO/disk and not CPU.
You run multiple queries in parallel frequently.
There are other processes on the system which chew up CPU time anyway.

Situations where you're likely to be affected would be:

You periodically run one big query.
You run a set of queries in sequential order.

 My questions to the list are: has this sort of thing been attempted 
 before? If so, what where the performance results compared to Oracle?

I'm pretty sure it's been attempted before but unfortunately I don't have
any numbers on it myself.  My data sets aren't that large (couple million
rows) but I've found PostgreSQL at least as fast as Oracle for what we do,
and much easier to work with.

 I've been reading up on partitioned tabes on pgsql, will the 
 performance benefit will be comparable to Oracle partitioned tables?

In this case I would think so, except that PostgreSQL still won't use
multiple CPUs for a given query, even against partitioned tables, aiui.

 What are the gotchas?

See above? :)  Other issues are things having to do w/ your specific
SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it,
something like select x,y from a,b where x=%y; to do a right-join, iirc).

 Should I be testing on 8 or the 7 version?

Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x
releases before you go production, if you have time before you have to go
into production with the new solution (sounds like you do- changing
databases takes time anyway).

 Thanks in advance for any help you may have, I'll do my best to keep 
 pgsql-performance up to date on the results.

Hope that helps.  Others on here will correct me if I misspoke. :)

Stephen



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM]

2005-01-20 Thread Matt Casters

Joshua,

Actually that's a great idea!
I'll have to check if Solaris wants to play ball though.
We'll have to see as we don't have the new disks yet, ETA is next week.

Cheers,

Matt

-Oorspronkelijk bericht-
Van: Joshua D. Drake [mailto:[EMAIL PROTECTED] 
Verzonden: donderdag 20 januari 2005 21:26
Aan: [EMAIL PROTECTED]
CC: pgsql-performance@postgresql.org
Onderwerp: Re: [PERFORM]

Matt Casters wrote:
  
 Thanks Stephen,
 
 My main concern is to get as much read performance on the disks as 
 possible on this given system.  CPU is rarely a problem on a typical 
 data warehouse system, this one's not any different.
 
 We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third 
 one coming along.(around 350Gb)

Why not run two raid systems. A RAID 1 for your OS and a RAID 10 for your
database? Push all of your extra drives into the RAID 10.

Sincerely,

Joshua D. Drake




 I was kind of hoping that the new PGSQL tablespaces would allow me to 
 create a storage container spanning multiple file-systems, but 
 unfortunately, that seems to be not the case.  Is this correct?
 
 That tells me that I probably need to do a full reconfiguration of the 
 disks on the Solaris level to get maximum performance out of the system.
 Mmmm. This is going to be a though one to crack.  Perhaps it will be 
 possible to get some extra juice out of placing the indexes on the 
 smaller disks (150G) and the data on the bigger ones?
 
 Thanks!
 
 Matt
 
 -Oorspronkelijk bericht-
 Van: Stephen Frost [mailto:[EMAIL PROTECTED]
 Verzonden: donderdag 20 januari 2005 15:26
 Aan: Matt Casters
 CC: pgsql-performance@postgresql.org
 Onderwerp: Re: [PERFORM]
 
 * Matt Casters ([EMAIL PROTECTED]) wrote:
 
I have the go ahead of a customer to do some testing on Postgresql in 
a couple of weeks as a replacement for Oracle.
The reason for the test is that the number of users of the warehouse 
is going to increase and this will have a serious impact on licencing 
costs. (I bet that sounds familiar)
 
 
 Rather familiar, yes... :)
 
 
We're running a medium sized data warehouse on a Solaris box (4CPU, 
8Gb
 
 RAM) on Oracle.
 
Basically we have 2 large fact tables to deal with: one going for 400M 
rows, the other will be hitting 1B rows soon.
(around 250Gb of data)
 
 
 Quite a bit of data.  There's one big thing to note here I think- 
 Postgres will not take advantage of multiple CPUs for a given query,
Oracle will.
 So, it depends on your workload as to how that may impact you.  
 Situations where this will be unlikely to affect you:
 
 Your main bottle-neck is IO/disk and not CPU.
 You run multiple queries in parallel frequently.
 There are other processes on the system which chew up CPU time anyway.
 
 Situations where you're likely to be affected would be:
 
 You periodically run one big query.
 You run a set of queries in sequential order.
 
 
My questions to the list are: has this sort of thing been attempted 
before? If so, what where the performance results compared to Oracle?
 
 
 I'm pretty sure it's been attempted before but unfortunately I don't 
 have any numbers on it myself.  My data sets aren't that large (couple 
 million
 rows) but I've found PostgreSQL at least as fast as Oracle for what we 
 do, and much easier to work with.
 
 
I've been reading up on partitioned tabes on pgsql, will the 
performance benefit will be comparable to Oracle partitioned tables?
 
 
 In this case I would think so, except that PostgreSQL still won't use 
 multiple CPUs for a given query, even against partitioned tables, aiui.
 
 
What are the gotchas?
 
 
 See above? :)  Other issues are things having to do w/ your specific
 SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is 
 it, something like select x,y from a,b where x=%y; to do a right-join,
iirc).
 
 
Should I be testing on 8 or the 7 version?
 
 
 Now that 8.0 is out I'd say probably test with that and just watch for 
 8.0.x releases before you go production, if you have time before you 
 have to go into production with the new solution (sounds like you do- 
 changing databases takes time anyway).
 
 
Thanks in advance for any help you may have, I'll do my best to keep 
pgsql-performance up to date on the results.
 
 
 Hope that helps.  Others on here will correct me if I misspoke. :)
 
   Stephen
 
 
 
 ---(end of 
 broadcast)---
 TIP 4: Don't 'kill -9' the postmaster


--
Command Prompt, Inc., your source for PostgreSQL replication, professional
support, programming, managed services, shared and dedicated hosting. Home
of the Open Source Projects plPHP, plPerlNG, pgManage,  and pgPHPtoolkit.
Contact us now at: +1-503-667-4564 - http://www.commandprompt.com




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] DWH on Postgresql

2005-01-20 Thread Matt Casters



Hi,

I have the go ahead 
of a customer to do some testing on Postgresql in a couple of weeks as a 
replacement for Oracle.

The reason for the 
test is that the number of users of the warehouse is going to increase and this 
will have a serious impact on licencing costs. (I bet that sounds 
familiar)

We're running a medium sized data 
warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle.
Basically we have 2large fact tables 
to deal with: onegoing for400M rows, the other will be 
hitting 1B rows soon.
(around 250Gb of 
data)

My questions to the 
list are: has this sort of thing been attempted before?If so, what where 
the results?
I've been reading up 
on partitioned tabes on pgsql, will the performance benefit will be comparable 
to Oracle partitioned tables?
What are the 
gotchas? Should I be testing on 8 or the 7 version?

Thanks in 
advancefor any help you may have, I'll do my best to keep 
pgsql-performance up to date on the results.

Best 
regards,

Matt
___
Matt Casters
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 OKEGEM, Belgium
Tel. 054/25.01.37
GSM 0486/97.29.37




Re: [PERFORM] Swapping on Solaris

2005-01-19 Thread Matt Casters

 Kevin Schroeder wrote:
 It looks to me like you are using no (device or file) swap at all, and
 have 1.3G of real memory free, so could in fact give Postgres more of it :-)


Indeed.
If you DO run into trouble after giving Postgres more RAM, use the vmstat 
command.
You can use this command like vmstat 10. (ignore the first line)
Keep an eye on the pi and po parameters. (kilobytes paged in and out)

HTH,

Matt
--
Matt Casters [EMAIL PROTECTED]
i-Bridge bvba, http://www.kettle.be
Fonteinstraat 70, 9400 Okegem, Belgium
Phone +32 (0) 486/97.29.37


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])