Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-14 Thread Stephen Crowley
Problem solved. I set the fetchSize to a reasonable value instead of
the default of unlimited in the PreparedStatement, and now the query
returns promptly. After some searching it seems this is a common problem;
would it make sense to change the default value to something other than 0
in the JDBC driver?

If I get some extra time I'll look into libpq and see what is required
to fix the API. Most third-party programs and existing JDBC apps won't
work with the current paradigm when returning large result sets.
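A non-zero fetchSize makes the driver pull rows in batches instead of
buffering the whole result set. The same effect is available in plain SQL
through a server-side cursor; a minimal sketch, with hypothetical table and
cursor names:

    BEGIN;                          -- cursors only live inside a transaction
    DECLARE big_cur CURSOR FOR
        SELECT * FROM big_table;
    FETCH 1000 FROM big_cur;        -- repeat until it returns no rows
    CLOSE big_cur;
    COMMIT;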

Thanks,
Stephen



On Mon, 13 Sep 2004 21:49:14 -0400, Tom Lane [EMAIL PROTECTED] wrote:
 Stephen Crowley [EMAIL PROTECTED] writes:
  On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
  Stephen Crowley [EMAIL PROTECTED] writes:
  Does postgres cache the entire result set before it begins returning
  data to the client?
 
  The backend doesn't, but libpq does, and I think JDBC does too.
 
  That is incredible. Why would libpq do such a thing?
 
 Because the API it presents doesn't allow for the possibility of query
 failure after having given you back a PGresult: either you have the
 whole result available with no further worries, or you don't.
 If you think it's incredible, let's see you design an equally
 easy-to-use API that doesn't make this assumption.
 
 (Now having said that, I would have no objection to someone extending
 libpq to offer an alternative streaming API for query results.  It
 hasn't got to the top of anyone's to-do list though ... and I'm
 unconvinced that psql could use it if it did exist.)



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Pierre-Frédéric Caillaud

Performance hint:
	For static data, do not normalize too much.
	For instance, if you have a row which can be linked to several other rows,
you can do this:

create table parents (
    id serial primary key
    -- ... other columns ...
);

create table children (
    id serial primary key,
    parent_id integer references parents(id),
    slave_value integer
);

	Or you can do this, using an array:

create table everything (
    id serial primary key,
    children_values integer[]
    -- ... other columns ...
);

	Pros:
	No joins. Getting the list of children_values from table everything is
just a select.
	On an application with several million rows, a query lasting 150 ms with
a join takes 30 ms with an array.
	You can build the arrays from normalized tables by using an aggregate
function or an ARRAY(subquery) constructor (see the sketch below).
	You can index the array elements with a GiST index...

	Cons:
	No joins, so your queries are a little bit limited; problems if the
array is too long.
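	As a sketch of the array-building step above (names hypothetical; the
ARRAY(subquery) constructor exists from PostgreSQL 7.4 on, and GiST indexing
of integer arrays needs contrib/intarray):

insert into everything (id, children_values)
select p.id,
       array(select c.slave_value
             from children c
             where c.parent_id = p.id)
  from parents p;

-- with contrib/intarray installed:
create index everything_children_idx
    on everything using gist (children_values gist__int_ops);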





Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Markus Schaber
Hi, Mischa,

On Sun, 12 Sep 2004 20:47:17 GMT
Mischa Sandberg [EMAIL PROTECTED] wrote:

 On the other hand, if you do warehouse-style loading (Insert, or PG 
 COPY, into a temp table; and then 'upsert' into the perm table), I can 
 guarantee 2500 inserts/sec is no problem.

As we can foresee that we'll have to cope with similar insert rates in
the not-so-far future: what do you mean by 'upsert'? Do you mean a
stored procedure that iterates over the temp table?

Generally, what is the fastest way for doing bulk processing of 
update-if-primary-key-matches-and-insert-otherwise operations?

Thanks,
Markus Schaber

-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Leeuw van der, Tim
Hi,

I found bulk inserts to perform very slowly when compared to MySQL / Oracle.
All inserts were done in one transaction. However, mitigating factors here were:
- The application was a .Net application using ODBC drivers
- PostgreSQL 7.3 running on CYGWIN with the cygipc daemon
- Probably very bad tuning in the config file, if any tuning was done at all
- The application was issuing 'generic' SQL, since it was generally used with
Oracle and MySQL databases; so no tricks like COPY or multiple rows per INSERT
statement, and no stored procedures either
- When doing queries, most of the time the results were comparable to or better
than MySQL (the only other database that I tested with myself)


So what I can say is that if you want fast INSERT performance from PostgreSQL,
you'll probably have to do some trickery that you wouldn't have to do with a
default MySQL installation.

regards,

--Tim




Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Aaron Werman
 Mark Cotner wrote:

  The time has come to reevaluate/rearchitect an
  application which I built about 3 years ago.  There
  are no performance concerns with MySQL, but it would
  benefit greatly from stored procedures, views, etc.


From: Mischa Sandberg [EMAIL PROTECTED]

 If your company is currently happy with MySQL, there probably are other
 (nontechnical) reasons to stick with it. I'm impressed that you'd
 consider reconsidering PG.

I'd like to second Mischa on that issue. In general, if you migrate an
*existing* application from one RDBMS to another, you should expect
performance to decrease significantly. This is always true in a well
performing system even if the replacement technology is more sophisticated.
This is because of several factors.

Even if you try to develop in totally agnostic generic SQL, you are always
customizing to a feature set, namely the ones in the current system. Any
existing application has had substantial tuning and tweaking, and the new
one is at a disadvantage. Moreover, an existing system is a Skinnerian
reward/punishment system to the developers and DBAs, rewarding or punishing
them for very environment specific choices - resulting in an application,
dbms, OS, and platform that are both explicitly and unconsciously customized
to work together in a particular manner.

The net effect is a rule of thumb that I use:

NEVER reimplement an existing system unless the project includes substantial
functional improvement.

Every time I've broken that rule, I've found that users' expectations, based
on the application they are used to, are locked in. Any place where the new
system is slower, the users are dissatisfied; where it exceeds expectations,
it isn't appreciated: the users are used to the old system's quirks, and the
improvements only leave them uncomfortable since the system acts
differently. (I've broken the rule on occasion for standardization
conversions.)

My expectation is that pg will not get a fair shake here. If you do it - I'd
like to see the results anyway.

/Aaron



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Steinar H. Gunderson
On Tue, Sep 14, 2004 at 02:42:20PM +0200, Leeuw van der, Tim wrote:
 - PostgreSQL 7.3 running on CYGWIN with cygipc daemon

Isn't this doomed to kill your performance anyhow?

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Leeuw van der, Tim

Steinar H. Gunderson wrote:
 On Tue, Sep 14, 2004 at 02:42:20PM +0200, Leeuw van der, Tim wrote:
  - PostgreSQL 7.3 running on CYGWIN with cygipc daemon
 
 Isn't this doomed to kill your performance anyhow?

Yes and no; that's why I mentioned it explicitly as one of the caveats. When
doing selects I could get performance very comparable to MySQL, so I don't
want to blame the poor insert performance on cygwin/cygipc per se.
I'm not working on this app anymore and don't have a working test
environment for it anymore, so I cannot retest now with more recent versions.

regards,

--Tim

 


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Leeuw van der, Tim [EMAIL PROTECTED] writes:

 So what I can say is, that if you want fast INSERT performance from
 PostgreSQL then you'll probably have to do some trickery that you
 wouldn't have to do with a default MySQL installation.

I think the word INSERT is superfluous in the above sentence ;-)

Contrary to MySQL, you can't expect decent PostgreSQL performance on
decent hardware without some tuning.
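To give a rough idea of what such tuning looks like, a postgresql.conf
excerpt for 7.x/8.0-era versions; the values are purely illustrative and
must be sized to the machine:

    shared_buffers = 10000         # shared cache, in 8 kB pages
    sort_mem = 16384               # per-sort memory, in kB ('work_mem' from 8.0 on)
    effective_cache_size = 100000  # planner hint: OS cache size, in 8 kB pages
    checkpoint_segments = 16       # WAL segments between checkpoints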




Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Michael Kleiser
What MySQL table type did you use? Was it MyISAM, which doesn't support
transactions? I have read that bulk inserts into that table type are very
fast, and in data warehousing one often doesn't need transactions.


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Stephen Frost
* Markus Schaber ([EMAIL PROTECTED]) wrote:
 Generally, what is the fastest way for doing bulk processing of 
 update-if-primary-key-matches-and-insert-otherwise operations?

This is a very good question, and I haven't seen much of an answer to it
yet.  I'm curious about the answer myself, actually.  In the more recent
SQL specs, from what I understand, this is essentially what the 'MERGE'
command is for.  It was added recently and unfortunately is not yet
supported in Postgres.  Hopefully it will be added soon.
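For illustration only, the SQL:2003 form would look roughly like this (not
something Postgres runs today; names follow the statements below):

    MERGE INTO X
    USING Y a
       ON (X.keyA = a.keyA AND X.keyB = a.keyB)
    WHEN MATCHED THEN
         UPDATE SET colA = a.colA, colB = a.colB
    WHEN NOT MATCHED THEN
         INSERT (keyA, keyB, colA, colB)
         VALUES (a.keyA, a.keyB, a.colA, a.colB);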

Otherwise, what I've done is basically an update followed by an insert
using outer joins.  If there's something better, I'd love to hear about
it.  The statements look something like:

update X
   set colA = a.colA,
       colB = a.colB
  from Y a
 where keyA = a.keyA and
       keyB = a.keyB;

insert into X
select a.keyA,
       a.keyB,
       a.colA,
       a.colB
  from Y a left join X b using (keyA, keyB)
 where b.keyA is NULL and
       b.keyB is NULL;

With the appropriate indexes, this is pretty fast but I think a merge
would be much faster.
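One caveat: to keep a concurrent writer from inserting the same key between
the two statements, they should run in a single transaction, and the simplest
safe recipe on the Postgres of this era is to take a table lock first. A sketch:

    BEGIN;
    LOCK TABLE X IN SHARE ROW EXCLUSIVE MODE;  -- blocks competing upserts, not plain readers
    -- ... the UPDATE and the INSERT from above ...
    COMMIT;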

Thanks,

Stephen




Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Harald Lau (Sector-X)

 From: Mischa Sandberg [EMAIL PROTECTED]
 
  If your company is currently happy with MySQL, there probably are
  other (nontechnical) reasons to stick with it. I'm impressed that
  you'd consider reconsidering PG.
 
 I'd like to second Mischa on that issue.

Though both of you are right from my point of view, I don't think
it's very useful to discuss this item here.

Having once migrated a MySQL DB to PG, I can confirm that chances
are in fact good you will be unhappy if you adopt the MySQL
data model and the SQL 1:1. Likewise, PG has to be configured and
optimized much more than MySQL, and the client application will
usually have to be modified to a certain extent, particularly if you
want to move some (or some more) business logic from the client into
the database.

But from what Mark has stated so far, I'm sure he is not going to
migrate his app just for fun, i.e. without having considered this.

 NEVER reimplement an existing system unless the project includes
 substantial functional improvement.

... or monetary issues. I know of one big database that was migrated
from Oracle to PG and another from SQLServer to PG because of
licence costs; there are definitely some more. That applies to MySQL
too: its licence policy is somewhat obscure to me, but under certain
circumstances you have to pay.

regards Harald





Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-14 Thread Markus Schaber
Hi, Stephen,

On Mon, 13 Sep 2004 19:51:22 -0500
Stephen Crowley [EMAIL PROTECTED] wrote:

 Does postgres cache the entire result set before it begins returning
 data to the client?
 
 I have a table with ~8 million rows and I am executing a query which
 should return about ~800,000 rows. The problem is that as soon as I
 execute the query it absolutely kills my machine and begins swapping
 for 5 or 6 minutes before it begins returning results. Is postgres
 trying to load the whole query into memory before returning anything?
 Also, why would it choose not to use the index? It is properly
 estimating the # of rows returned. If I set enable_seqscan to off it
 is just as slow.

As you fetch about 10% of all rows in the table, the query will hit nearly
every page of the table.

Maybe it helps to CLUSTER the table using the index on your query
parameters, and then set enable_seqscan to off.

But beware: you have to re-CLUSTER after modifications.
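A sketch of that, with hypothetical table and index names (CLUSTER rewrites
the table in index order and takes an exclusive lock while it runs):

    CLUSTER date_idx ON mytable;   -- re-run after bulk modifications
    SET enable_seqscan = off;      -- per session, to test the index plan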

HTH,
Markus



-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com



Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-14 Thread Vivek Khera
 TL == Tom Lane [EMAIL PROTECTED] writes:

TL Bruce Momjian [EMAIL PROTECTED] writes:
 If we had a majority of queries filling more than one block we would
 be checkpointing like crazy and we don't normally get reports about
 that.

TL [ raised eyebrow... ]  And of course the 30-second-checkpoint-warning
TL stuff is a useless feature that no one ever exercises.

Well, last year about this time I discovered in my testing that I was
checkpointing excessively; I found that the error message was
confusing, and Bruce cleaned it up. So at least one person exercised
that feature, namely me. :-)

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Vivek Khera
 MC == Mark Cotner [EMAIL PROTECTED] writes:

MC I've finished porting the schema and am importing the
MC data now.  My estimates for just two-thirds(60 of the
MC 90 days) of one of our 30 cable systems(MySQL dbs) is
MC estimated to take about 16 hours.  This may seem like
MC a lot, but I'm satisfied with the performance.  I've

Be sure to load your data without indexes defined for your initial
import.

Check your logs to see if increasing checkpoint_segments is
recommended.  I found that bumping it up to 50 helped speed up my
data loads (restore from dump) significantly.
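A sketch of that load pattern (names hypothetical; checkpoint_segments lives
in postgresql.conf and takes effect on reload):

    -- postgresql.conf:  checkpoint_segments = 50
    COPY stats_200409 FROM '/path/to/dump.copy';
    CREATE INDEX stats_200409_ts_idx ON stats_200409 (ts);  -- build indexes after the load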

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-14 Thread Pierre-Frédéric Caillaud

I have a table with ~8 million rows and I am executing a query which
should return about ~800,000 rows. The problem is that as soon as I
execute the query it absolutely kills my machine and begins swapping
for 5 or 6 minutes before it begins returning results. Is postgres
trying to load the whole query into memory before returning anything?
Also, why would it choose not to use the index? It is properly
estimating the # of rows returned. If I set enable_seqscan to off it
is just as slow.
1. Run EXPLAIN ANALYZE on the query.
	Note the time it takes. It should not swap, just read data from the disk
(and not kill the machine).

2. Run the query from your software.
	Note the time it takes, and watch RAM usage. If it takes vastly longer and
you're swimming in virtual memory, postgres is not the culprit; rather, use
a cursor to fetch the huge result set bit by bit.

Tell us what you find.
Regards.


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread aaron werman
From: Harald Lau (Sector-X) [EMAIL PROTECTED]
...
 From: Mischa Sandberg [EMAIL PROTECTED]

  If your company is currently happy with MySQL, there probably are
  other (nontechnical) reasons to stick with it. I'm impressed that
  you'd consider reconsidering PG.

 I'd like to second Mischa on that issue.
Though both of you are right from my point of view, I don't think
it's very useful to discuss this item here.
It is kinda windy for the list, but the point is that a big part of 
performance is developer expectation and user expectation. I'd hope to lower 
expectations before we see an article in eWeek. Perhaps this thread should 
move to the advocacy list until the migration needs specific advice.

_
Get ready for school! Find articles, homework help and more in the Back to 
School Guide! http://special.msn.com/network/04backtoschool.armx



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Jim C. Nasby
On Mon, Sep 13, 2004 at 11:07:35PM +0100, Simon Riggs wrote:
 PostgreSQL's functionality is in many ways similar to Oracle Partitioning.
 
 Loading up your data in many similar tables, then creating a view like:
 
 CREATE VIEW BIGTABLE (idate, col1, col2, col3...) AS
 SELECT 200409130800, col1, col2, col3... FROM table200409130800
 UNION ALL
 SELECT 200409131000, col1, col2, col3... FROM table200409131000
 UNION ALL
 SELECT 200409131200, col1, col2, col3... FROM table200409131200
 ...etc...
 
 will allow the PostgreSQL optimizer to eliminate partitions from the query
 when you run queries which include a predicate on the partitioning_col, e.g.
 
 select count(*) from bigtable where idate >= 200409131000
 
 will scan the last two partitions only...
 
 There are a few other ways of creating the view that return the same answer,
 but only using constants in that way will allow the partitions to be
 eliminated from the query; the other forms will run for much longer.

Is there by any chance a set of functions to manage adding and removing
partitions? Certainly this can be done by hand, but having a set of
tools would make life much easier. I just looked but didn't see anything
on GBorg.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Simon Riggs
 Jim C. Nasby
 On Mon, Sep 13, 2004 at 11:07:35PM +0100, Simon Riggs wrote:
  PostgreSQL's functionality is in many ways similar to Oracle
 Partitioning.
 
  Loading up your data in many similar tables, then creating a view like:
 
  CREATE VIEW BIGTABLE (idate, col1, col2, col3...) AS
  SELECT 200409130800, col1, col2, col3... FROM table200409130800
  UNION ALL
  SELECT 200409131000, col1, col2, col3... FROM table200409131000
  UNION ALL
  SELECT 200409131200, col1, col2, col3... FROM table200409131200
  ...etc...
 
  will allow the PostgreSQL optimizer to eliminate partitions from the query
  when you run queries which include a predicate on the partitioning_col, e.g.
 
  select count(*) from bigtable where idate >= 200409131000
 
  will scan the last two partitions only...
 
  There are a few other ways of creating the view that return the same answer,
  but only using constants in that way will allow the partitions to be
  eliminated from the query; the other forms will run for much longer.

 Is there by any chance a set of functions to manage adding and removing
 partitions? Certainly this can be done by hand, but having a set of
 tools would make life much easier. I just looked but didn't see anything
 on GBorg.

Well, it's fairly straightforward to auto-generate the UNION ALL view, and
important as well, since it needs to be re-specified each time a new
partition is loaded or an old one is cleared down. The main point is that
the constant placed in front of each table must in some way relate to the
data, to make it useful in querying. If it is just a unique constant chosen
at random, it won't do much for partition elimination. So that tends to
make the creation of the UNION ALL view an application/data specific thing.
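As a sketch of that re-specification step (names follow the earlier example;
CREATE OR REPLACE VIEW works as long as the column list is unchanged):

    -- after loading the new partition table200409131400:
    CREATE OR REPLACE VIEW bigtable (idate, col1, col2, col3) AS
    SELECT 200409130800, col1, col2, col3 FROM table200409130800
    UNION ALL
    SELECT 200409131000, col1, col2, col3 FROM table200409131000
    UNION ALL
    SELECT 200409131200, col1, col2, col3 FROM table200409131200
    UNION ALL
    SELECT 200409131400, col1, col2, col3 FROM table200409131400;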

The partitions are just tables, so no need for other management tools.
Oracle treats the partitions as sub-tables, so you need a range of commands
to add, swap etc the partitions of the main table.

I guess a set of tools that emulates that functionality would be generically
a good thing, if you can see a way to do that.

Oracle partitions were restricted to allowing only a single load statement
into a single partition at any time, whereas multiple COPY statements can
access a single partition table on PostgreSQL.

BTW, multi-dimensional partitioning is also possible using the same general
scheme.

Best Regards, Simon Riggs




Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Michael Glaesemann
On Sep 15, 2004, at 8:32 AM, Simon Riggs wrote:

 The partitions are just tables, so no need for other management tools.
 Oracle treats the partitions as sub-tables, so you need a range of commands
 to add, swap etc the partitions of the main table.

 I guess a set of tools that emulates that functionality would be generically
 a good thing, if you can see a way to do that.

 Oracle partitions were restricted to allowing only a single load statement
 into a single partition at any time, whereas multiple COPY statements can
 access a single partition table on PostgreSQL.

How does this compare to DB2 partitioning?
Michael Glaesemann
grzm myrealbox com


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Christopher Browne
[EMAIL PROTECTED] (Simon Riggs) wrote:
 The main point is that the constant placed in front of each table
 must in some way relate to the data, to make it useful in
 querying. If it is just a unique constant, chosen at random, it
 won't do much for partition elimination.

It just struck me - this is much the same notion as that of cutting
planes used in Integer Programming.

The approach there is that you take a linear program, which can give
fractional results, and throw on as many additional constraints as you
need in order to force particular variables to fall on integer values.
The constraints may appear redundant, but declaring them allows the
answers to be pushed in the right directions.

In this particular case, the (arguably redundant) constraints let the
query optimizer have criteria for throwing out unnecessary tables.
Thanks for pointing this out; it may turn a fowl into a feature, when
I can get some round tuits :-).  That should allow me to turn an
81-way evil join into something that's 4-way at the worst.

Cheers!
-- 
cbbrowne,@,linuxfinances.info
http://linuxfinances.info/info/nonrdbms.html
Implementing systems is 95% boredom and 5% sheer terror.



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Chris Browne
[EMAIL PROTECTED] (Simon Riggs) writes:
 Well, it's fairly straightforward to auto-generate the UNION ALL view, and
 important as well, since it needs to be re-specified each time a new
 partition is loaded or an old one is cleared down. The main point is that
 the constant placed in front of each table must in some way relate to the
 data, to make it useful in querying. If it is just a unique constant chosen
 at random, it won't do much for partition elimination. So that tends to
 make the creation of the UNION ALL view an application/data specific thing.

Ah, that's probably a good thought.

When we used big UNION ALL views, it was with logging tables, where
there wasn't really any meaningful distinction between partitions.

So you say that if the VIEW contains, within it, meaningful constraint
information, that can get applied to chop out irrelevant bits?  

That suggests a way of resurrecting the idea...

Might we set up the view as:

create view combination_of_logs as
  select * from table_1 where txn_date between 'this' and 'that'
   union all
  select * from table_2 where txn_date between 'this2' and 'that2'
   union all
  select * from table_3 where txn_date between 'this3' and 'that3'
   union all
  select * from table_4 where txn_date between 'this4' and 'that4'
   union all
   ... ad infinitum
   union all
  select * from table_n where txn_date > 'start_of_partition_n';

and expect that to help, as long as the query that hooks up to this
has date constraints?

We'd have to regenerate the view with new fixed constants each time we
set up the tables, but that sounds like it could work...
-- 
cbbrowne,@,acm.org
http://www3.sympatico.ca/cbbrowne/x.html
But  what can  you  do with  it?   -- ubiquitous  cry from  Linux-user
partner.  -- Andy Pearce, [EMAIL PROTECTED]



Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Joe Conway
Chris Browne wrote:
Might we set up the view as:
create view combination_of_logs as
  select * from table_1 where txn_date between 'this' and 'that' 
   union all
  select * from table_2 where txn_date between 'this2' and 'that2' 
   union all
  select * from table_3 where txn_date between 'this3' and 'that3' 
   union all
  select * from table_4 where txn_date between 'this4' and 'that4' 
   union all
   ... ad infinitum
   union all
  select * from table_n where txn_date > 'start_of_partition_n';

and expect that to help, as long as the query that hooks up to this
has date constraints?
We'd have to regenerate the view with new fixed constants each time we
set up the tables, but that sounds like it could work...
That's exactly what we're doing, but using inherited tables instead of a
union view. With inheritance, there is no need to rebuild the view each
time a table is added or removed. Basically, in our application, tables
are partitioned by either month or week, depending on the type of data
involved, and queries are normally date qualified.

We're not completely done with our data conversion (from a commercial
RDBMSi), but so far the results have been excellent. Similar to what
others have said in this thread, the conversion involved restructuring
the data to better suit Postgres, and the application (data
analysis/mining vs. the source system, which is operational). As a result
we've compressed a >1TB database down to ~0.4TB, and seen at least one
typical query reduced from ~9 minutes down to ~40 seconds.

Joe


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Iain
Hi Joe,

 That's exactly what we're doing, but using inherited tables instead of a
 union view. With inheritance, there is no need to rebuild the view each
 time a table is added or removed. Basically, in our application, tables
 are partitioned by either month or week, depending on the type of data
 involved, and queries are normally date qualified.

That sounds interesting. I have to admit that I haven't touched inheritance
in pg at all yet, so I find it hard to imagine how this would work. If you
have a chance, would you mind elaborating on it just a little?

Regards
Iain




Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Steve Atkins
On Tue, Sep 14, 2004 at 05:33:33PM -0500, Jim C. Nasby wrote:
 On Mon, Sep 13, 2004 at 11:07:35PM +0100, Simon Riggs wrote:
  PostgreSQL's functionality is in many ways similar to Oracle Partitioning.
  
  Loading up your data in many similar tables, then creating a view like:
  
  CREATE VIEW BIGTABLE (idate, col1, col2, col3...) AS
  SELECT 200409130800, col1, col2, col3... FROM table200409130800
  UNION ALL
  SELECT 200409131000, col1, col2, col3... FROM table200409131000
  UNION ALL
  SELECT 200409131200, col1, col2, col3... FROM table200409131200
  ...etc...
[...]
 
 Is there by any chance a set of functions to manage adding and removing
 partitions? Certainly this can be done by hand, but having a set of
 tools would make life much easier. I just looked but didn't see anything
 on GBorg.

I've done a similar thing with time-segregated data by inheriting
all the partition tables from an (empty) parent table.

Adding a new partition is just a create table tablefoo () inherits(bigtable),
and removing a partition is just a drop table tablefoo.
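A minimal sketch of that scheme, with hypothetical names:

    create table bigtable (            -- empty parent, never loaded directly
        txn_date date,
        col1     integer
    );

    create table table200409 () inherits (bigtable);  -- add a partition
    copy table200409 from '/path/to/200409.copy';     -- load straight into the child

    -- queries against the parent automatically see all children:
    select count(*) from bigtable where txn_date >= '2004-09-01';

    drop table table200409;                           -- remove the partition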

Cheers,
  Steve

