Mark Cotner wrote:
Hi all,
I had a difficult time deciding which list to post
this to, so please forgive me if this list doesn't
perfectly match my questions.  My decision will not
solely be based on performance, but it is the primary
concern.  I would be very appreciative if you all
could comment on my test plan.  Real world examples of
a Postgres implementation of >=600G with a web
front-end would be great, or any data warehouse with
some size to it.

My databases are only in the 30 GB range, in case that's a consideration for the comments that follow.

At this time, I'm working out the best ROLAP options for our PG transaction store. The transaction store is highly volatile -- the longest a txn stays in it is 15 days -- so you can imagine the need for historic summaries :-)

I've also implemented multiple data servers, including
a federated server that had to make the best of existing engines
(like MySQL, PG and everything from MSJet to RedBrick in the commercial world).

The time has come to reevaluate/rearchitect an
application which I built about 3 years ago. There
are no performance concerns with MySQL, but it would
benefit greatly from stored procedures, views, etc.

If your company is currently happy with MySQL, there are probably other (nontechnical) reasons to stick with it. I'm impressed that you'd even consider moving to PG.

Some of the mining that we do could benefit from
stored procedures as well.  MySQL may have these in
the works, but we won't be able to move to a version
of MySQL that supports stored procs for another year
or two.

And PG lets you back-end with some powerful pattern- and aggregate-handling languages, like Perl. This was definitely a plus for data mining of web traffic, for example. The power of server-side extensibility for bailing you out of a design dead-end is not inconsequential.
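As a sketch of what that server-side extensibility looks like: a small PL/Perl function (names and schema here are hypothetical, not from Mark's system) that does regex work the database can then aggregate over, e.g. pulling the host out of a request URL in a web-traffic table.

```sql
-- Hypothetical example: a PL/Perl function extracting the host part of a
-- URL, so pattern work happens server-side instead of in the client.
CREATE OR REPLACE FUNCTION url_host(text) RETURNS text AS $$
    my ($url) = @_;
    return ($url =~ m{^\w+://([^/:?]+)}) ? $1 : undef;
$$ LANGUAGE plperl IMMUTABLE;

-- Then mining queries can lean on it directly, e.g.:
-- SELECT url_host(request_url), count(*) FROM hits GROUP BY 1;
```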

PG doesn't have PIVOT operators (cf. Oracle and MSSQL), but otherwise it makes the translation from rows to columns fairly painless.
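The usual substitute is conditional aggregation. A minimal sketch, assuming a hypothetical (ym, metric, value) stats table:

```sql
-- Pivot rows into columns with CASE inside an aggregate:
-- one output column per metric value.
SELECT ym,
       sum(CASE WHEN metric = 'hits'  THEN value ELSE 0 END) AS hits,
       sum(CASE WHEN metric = 'bytes' THEN value ELSE 0 END) AS bytes
FROM   monthly_stats
GROUP  BY ym
ORDER  BY ym;
```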

Merge table definition equivalent.  We use these

I looked all over, and as far as I can see a MERGE table is indeed just a view over a UNION ALL of the underlying tables. Is that right?

PG supports views, of course, and now tablespaces as well, allowing you to split tables or table sets across multiple disk systems.
PG is also pretty efficient in query plans on such views, where (say) each input table contributes one constant column value (an identifier, of sorts).
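A minimal sketch of that MERGE-table equivalent, using hypothetical per-month detail tables:

```sql
-- Combine per-month tables MERGE-table style; the constant 'src' column
-- identifies which underlying table each row came from.
CREATE VIEW txn_all AS
    SELECT '2004-07'::text AS src, * FROM txn_2004_07
    UNION ALL
    SELECT '2004-08'::text AS src, * FROM txn_2004_08;

-- Queries against txn_all then behave much like queries against a
-- MySQL MERGE table:
-- SELECT count(*) FROM txn_all WHERE src = '2004-08';
```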

A merge-table equivalent, with all tables containing over 100M rows (and about 40 columns, some quite wide), will need to complete index scans within 5 seconds (MySQL currently does it in 2, but we can live with 5) and return ~200 rows.

PG has TOAST for handling REALLY BIG columns, and the generic TEXT type is as efficient as any size-specific VARCHAR() type ... should make things easier for you.
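In practice that means you can declare wide columns without picking length caps; a hypothetical table sketch:

```sql
-- TEXT costs the same as VARCHAR(n) in PG, and wide values are TOASTed
-- (compressed and moved out of line) automatically.
CREATE TABLE page_fetch (
    fetched_at  timestamptz NOT NULL,
    url         text        NOT NULL,  -- no arbitrary length cap needed
    body        text                   -- multi-megabyte values are fine
);
```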

Um, gonna sound silly, but the web interface has to
remain "snappy" under load.  I don't see this as a
major concern since you don't require table locking.

Agreed. It's more in your warehouse design, and intelligent bounding of queries. I'd say PG's query analyzer is a few years ahead of MySQL for large and complex queries.

If business logic is moved to the database (likely with Postgres), performance for inserting with light logic on each insert has to keep up with the 4.5M inserts per 2 hours (which MySQL completes in ~35 min currently). Acceptable numbers for this aggregation would be 45-55 min using stored procedures.

Again, it's a matter of pipeline design. The tools for creating an efficient pipeline are at least as good in PG as MySQL.

If you try to insert and postprocess information one row at a time,
procedures or no, there's no offhand way to guarantee your performance without a test/prototype.

On the other hand, if you do warehouse-style loading (Insert, or PG COPY, into a temp table; and then 'upsert' into the perm table), I can guarantee 2500 inserts/sec is no problem.
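A minimal sketch of that load pattern (table and column names are illustrative, not from Mark's schema):

```sql
-- Warehouse-style load: bulk COPY into a temp table, then merge into the
-- permanent table with two set-based statements instead of row-at-a-time.
CREATE TEMP TABLE txn_load (LIKE txn_perm);
COPY txn_load FROM '/tmp/batch.csv' WITH CSV;  -- path is illustrative

-- Update rows that already exist in the permanent table...
UPDATE txn_perm p
SET    amount = l.amount
FROM   txn_load l
WHERE  p.txn_id = l.txn_id;

-- ...then insert only the genuinely new ones.
INSERT INTO txn_perm
SELECT l.*
FROM   txn_load l
WHERE  NOT EXISTS (SELECT 1 FROM txn_perm p WHERE p.txn_id = l.txn_id);
```

The point of the temp-table stage is that both the update and the insert are single set-oriented statements the planner can optimize, rather than millions of individual round trips.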

Here's our case study if you're interested . . .

