On 09/11/2011 07:35 AM, Igor Chudov wrote:
I have been a MySQL user for years, including owning a few
multi-gigabyte databases for my websites, and using it to host
algebra.com (about a 12 GB database).

I have had my ups and downs with MySQL. The ups were ease of use and
decent performance for small databases such as algebra.com. The downs
were things like twenty-hour REPAIR TABLE operations on a 35 GB
table, etc.

Right now I have a personal (one-user) project to create a 5-10
terabyte data warehouse. The largest table will consume the most space
and will hold, perhaps, 200,000,000 rows.

I want to use it to obtain valuable business intelligence and to make
money.

I expect it to grow, never shrink, and to be accessed via batch
queries. I do not need batch queries to be super fast; an hour per
query, for example, would be just fine.

However, while an hour is fine, two weeks per query is NOT fine.

I have a server with about 18 TB of storage and 48 GB of RAM, and 12
CPU cores.

My initial plan was to use MySQL, InnoDB, and deal with problems as
they arise. Perhaps, say, I would implement my own joining
procedures.

After reading some disparaging stuff about InnoDB performance on large
datasets, however, I am getting cold feet. I have a general feeling
that, perhaps, I will not be able to succeed with MySQL, or perhaps not
with either MySQL or Postgres.

I do not know much about Postgres, but I am very eager to learn and
see if I can use it for my purposes more effectively than MySQL.

I cannot shell out $47,000 per CPU for Oracle for this project.

To be more specific, the batch queries that I would run would, I hope,
either use small JOINs of a small dataset to a large dataset, or
just SELECTs from one big table.

So... Can Postgres support a 5-10 TB database with the use pattern
stated above?

Thanks!

i


That is an order of magnitude or two larger than I have experience with.  I converted my 
website database from MySQL to PG, and it has several DBs between 1 and 10 
GB.  There are parts of the website that were faster with MySQL, and there are 
parts faster with PG.  In one spot, because PG has superior join support on SELECT 
statements, I was able to change the code to generate a single, more complicated 
SQL statement, versus MySQL, which had to fire off several simpler statements.  It's a 
search screen where you can type in 15-ish different options.  I was able to 
generate a single SQL statement which joins eight-odd tables with plenty of 
WHERE clauses, and PG runs it in the blink of an eye.  It's astonishing compared 
to the pain of MySQL.  If you ever have to write your own join, or your own 
lookup function, that's a failure of your database.
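
As a rough sketch of the kind of thing I mean (the table and column names are made up, and I'm assuming psycopg2 as the client library), each optional field on the search screen just adds one more clause to a single joined statement:

import psycopg2

# Sketch only: build one joined query from the optional search fields
# instead of firing several simpler statements. Table/column names are
# hypothetical.
def search(conn, name=None, city=None, min_total=None):
    sql = """
        SELECT c.id, c.name, o.total
        FROM customers c
        JOIN addresses a ON a.customer_id = c.id
        JOIN orders    o ON o.customer_id = c.id
        WHERE true
    """
    params = []
    if name is not None:                 # each filled-in screen field adds one clause
        sql += " AND c.name ILIKE %s"
        params.append('%' + name + '%')
    if city is not None:
        sql += " AND a.city = %s"
        params.append(city)
    if min_total is not None:
        sql += " AND o.total >= %s"
        params.append(min_total)
    cur = conn.cursor()
    cur.execute(sql, params)             # one round trip; the planner picks the plan
    return cur.fetchall()

# Example use: search(psycopg2.connect("dbname=mysite"), name="smith", city="Boston")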

One spot that was slower was a batch insert of data.  It's not so much slower 
that it was a problem.  I use COPY on PG versus prepared INSERTs on MySQL.  It was 
pretty close, but MySQL still won.
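
For reference, a minimal sketch of the COPY side of that comparison (connection string, table, and file names are hypothetical; psycopg2 assumed); COPY streams the whole file in one command instead of one INSERT per row:

import psycopg2

# Sketch only: bulk load with COPY via psycopg2.
conn = psycopg2.connect("dbname=warehouse")
cur = conn.cursor()
with open("facts.csv") as f:
    cur.copy_expert("COPY facts (id, ts, amount) FROM STDIN WITH CSV", f)
conn.commit()
conn.close()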

Seeing as you can set up and test both databases, have you considered a trial 
run?

Things to watch for:


I think the same amount of data will use more disk space in PG than in MySQL.

Importing data into PG should use COPY and multiple connections at the same 
time.

PG will only use multiple cores if you use multiple connections (each connection 
uses one core).
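
A rough sketch of what I mean by multiple connections (psycopg2 and multiprocessing assumed; connection string, file, and table names made up): each worker process opens its own connection, so each COPY can keep its own core busy.

from multiprocessing import Pool
import psycopg2

FILES = ["facts_01.csv", "facts_02.csv", "facts_03.csv", "facts_04.csv"]

def load_one(path):
    # One connection per worker process; each connection uses one core.
    conn = psycopg2.connect("dbname=warehouse")
    cur = conn.cursor()
    with open(path) as f:
        cur.copy_expert("COPY facts (id, ts, amount) FROM STDIN WITH CSV", f)
    conn.commit()
    conn.close()
    return path

if __name__ == "__main__":
    pool = Pool(processes=4)             # 4 workers -> up to 4 cores busy
    pool.map(load_one, FILES)
    pool.close()
    pool.join()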

Huge result sets (like a SELECT statement that returns 1,000,000 rows) will be 
slow.
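
If you do need to walk through a huge result set, a server-side (named) cursor keeps the client from buffering the whole thing at once; a minimal sketch, again assuming psycopg2, with a made-up query and table:

import psycopg2

# Sketch only: stream a big result with a named (server-side) cursor instead
# of pulling every row into client memory at once.
conn = psycopg2.connect("dbname=warehouse")
cur = conn.cursor(name="big_scan")       # named cursor => server-side cursor
cur.itersize = 10000                     # rows fetched per network round trip
cur.execute("SELECT id, ts, amount FROM facts WHERE ts >= %s", ("2011-01-01",))
total = 0
for row in cur:                          # iterates in itersize-sized batches
    total += row[2] or 0                 # example: sum the amount column
print(total)
conn.close()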

PG is a much fuller database than MySQL, and as such you can influence its join 
types and function calls (table scan vs. index scan, immutable function vs. stable, 
PL/Perl function vs. SQL function).  So if at first it appears slow, you have a million 
options.  I think the only option you have in MySQL is to pull the data back 
and code it yourself.
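
By "influence" I mean things like the planner's enable_* settings and EXPLAIN; a rough sketch (the query, table, and connection string are made up; enable_seqscan is an ordinary PG setting):

import psycopg2

# Sketch only: nudge the planner for this session and look at the plan it chose.
conn = psycopg2.connect("dbname=warehouse")
cur = conn.cursor()
cur.execute("SET enable_seqscan = off")   # discourage full table scans for this session
cur.execute("EXPLAIN ANALYZE SELECT count(*) FROM facts WHERE amount > 100")
for line in cur.fetchall():
    print(line[0])                        # shows which scan/join types were used
conn.close()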

Upgrading to major versions of PG may or may not be painful.  (MySQL sometimes 
works seamlessly between versions, which appears brilliant, but I have had 
problems with an upgrade, and when it goes bad you don't have a lot of options.) 
In the past, PG's only method of upgrade was a full backup of the old cluster and a 
restore into the new one.  Things have gotten better: there is new pg_upgrade 
support (still kinda new, though), and there is some third-party replication 
support where you replicate your 9.0 database to a new 9.1 database and at some 
point promote the new 9.1 database as the new master.  Or something like that; 
I've only read posts about it, never done it.  But with that much data, you'll need 
an upgrade plan.

All in all, if I can summarize my personal view: MySQL is fast at the expense 
of safety and usability (MySQL still cannot do UPDATEs with subselects).  
PG is safe and usable at the expense of speed, and you won't be disappointed by 
the speed.
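
(By an update with a subselect I mean something like the statement below; the table and column names are made up, and psycopg2 is assumed again.)

import psycopg2

# Sketch only: an UPDATE driven by a subselect, which PG runs as-is.
conn = psycopg2.connect("dbname=warehouse")
cur = conn.cursor()
cur.execute("""
    UPDATE facts
       SET flagged = true
     WHERE customer_id IN (SELECT id FROM customers WHERE status = 'closed')
""")
conn.commit()
conn.close()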

-Andy
