Hi Rama,

I'm actually looking at going in the other direction ...
I have an app using PG with a single table to which we just added a lot of data. It now holds many millions of rows, and I'm finding that the single-table schema simply doesn't scale. In PG, table partitioning is handled by the database only for reads; for inserts and updates you need to do quite a lot of DIY (setting up triggers, etc.). So I am planning to just use named tables and generate the necessary DDL / DML in vanilla SQL, the same way that your older code does.

My experience is mostly with Oracle, whose multi-versioning works differently from PG's (old row versions live in undo segments rather than in the table itself), so I've had to relearn some stuff:

- Oracle often answers simple queries (e.g. counts and max / min) using only the index, which is of course pre-sorted. PG has to go out and fetch the rows to see whether they are still visible to the transaction, and if they are scattered all over the disk that means an 8K random page fetch for each row. So adding an index in PG is not nearly the silver bullet that it can be in databases that can answer from the index alone.

- PG's indexes seem to be quite a bit larger than Oracle's, but that's gut feel; I haven't done a true comparison. For my app I have limited myself to only two indexes on that table, and each index is larger (in disk space) than the table itself ... I have 60GB of data and 140GB of indexes :-)

- There is a lot of row turnover in my big table (I age out data). A big delete (millions of rows) seems a bit more expensive to process in PG than in Oracle. However, PG is not nearly as sensitive to transaction size as Oracle is, so you can cheerfully throw out one big "DELETE from FOO where ..." and let the database chew on it.

I am interested to hear about your progress.
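A minimal sketch of the "named tables plus generated vanilla SQL" idea, written in Python purely for illustration. Everything specific here is an assumption and not taken from the real app: the base name foo, the one-table-per-month split, the columns (ts, amt, uid), the helper names, and the %s placeholder style (which depends on the client driver).

```python
from datetime import date

BASE = "foo"  # hypothetical base name; the real table is not named in this thread


def table_name(day, base=BASE):
    """Name of the monthly table holding rows dated `day`, e.g. foo_201002."""
    return f"{base}_{day.year:04d}{day.month:02d}"


def month_bounds(day):
    """Return (first day of day's month, first day of the following month)."""
    start = day.replace(day=1)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    return start, end


def create_ddl(day, base=BASE):
    """DDL for one monthly table: the table itself plus a single index on ts."""
    start, end = month_bounds(day)
    name = table_name(day, base)
    return [
        f"CREATE TABLE {name} ("
        f"ts timestamp NOT NULL CHECK (ts >= '{start}' AND ts < '{end}'), "
        f"amt numeric NOT NULL, "
        f"uid integer NOT NULL);",
        f"CREATE INDEX {name}_ts_idx ON {name} (ts);",
    ]


def insert_dml(day, base=BASE):
    """Parameterised INSERT routed by the application to the right monthly table."""
    return f"INSERT INTO {table_name(day, base)} (ts, amt, uid) VALUES (%s, %s, %s);"


def drop_ddl(day, base=BASE):
    """Statement that ages out a whole month by dropping its table."""
    return f"DROP TABLE {table_name(day, base)};"


if __name__ == "__main__":
    for stmt in create_ddl(date(2010, 2, 10)):
        print(stmt)
    print(insert_dml(date(2010, 2, 10)))
    print(drop_ddl(date(2009, 2, 1)))
```

With a layout like this the application, not a trigger, picks the target table for each insert. Ageing out a month could then be a DROP TABLE on that month's table instead of a multi-million-row DELETE, assuming the retention period lines up with the table boundaries.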
Cheers
Dave

On Wed, Feb 10, 2010 at 4:13 PM, rama <rama.r...@tiscali.it> wrote:
>
> Hi all,
>
> i am trying to move my app from M$sql to PGsql, but i need a bit of help :)
>
> on M$sql, i had certain tables that was made as follow (sorry pseudo code)
>
> contab_y
>    date
>    amt
>    uid
>
> contab_yd
>    date
>    amt
>    uid
>
> contab_ymd
>    date
>    amt
>    uid
>
> and so on..
>
> this was used to "solidify" (aggregate..btw sorry for my terrible english)
> the data on it..
>
> so basically, i get
>
> contab_y
>    date = 2010
>    amt = 100
>    uid = 1
>
> contab_ym
>    date = 2010-01
>    amt = 10
>    uid = 1
>    ----
>    date = 2010-02
>    amt = 90
>    uid = 1
>
> contab_ymd
>    date=2010-01-01
>    amt = 1
>    uid = 1
>    ----
>    blabla
>
> in that way, when i need to do a query for a long ranges (ie: 1 year) i
> just take the rows that are contained to contab_y
> if i need to got a query for a couple of days, i can go on ymd, if i need
> to get some data for the other timeframe, i can do some cool intersection
> between
> the different table using some huge (but fast) queries.
>
> Now, the matter is that this design is hard to mantain, and the tables are
> difficult to check
>
> what i have try is to go for a "normal" approach, using just a table that
> contains all the data, and some proper indexing.
> The issue is that this table can contains easilly 100M rows :)
> that's why the other guys do all this work to speed-up queryes splitting
> data on different table and precalculating the sums.
>
> I am here to ask for an advice to PGsql experts:
> what do you think i can do to better manage this situation?
> there are some other cases where i can take a look at? maybe some
> documentation, or some technique that i don't know?
> any advice is really appreciated!
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>