Re: [PERFORM] perf problem with huge table

Dave Crooke Wed, 10 Feb 2010 15:16:40 -0800

Hi Rama

I'm actually looking at going in the other direction ....


I have an app on PG with a single table to which we just added a lot of
data. It now holds many millions of rows, and I'm finding that the
single-table schema simply doesn't scale.

In PG, table partitioning is only handled by the database for reads; for
insert/update you need to do quite a lot of DIY (setting up triggers, etc.),
so I am planning to just use named tables and generate the necessary DDL /
DML in vanilla SQL, the same way that your older code does.
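
As a rough illustration of the "named tables plus generated DDL / DML"
approach, here is a minimal Python sketch. The base name `contab`, the
columns (`day`, `amt`, `uid`), the one-table-per-month granularity and the
`%s` placeholder style are all assumptions made for the example; none of
them come from the actual application.

```python
from datetime import date

BASE = "contab"  # hypothetical base table name


def table_name(day):
    """Name of the monthly table holding rows for `day`, e.g. contab_2010_02."""
    return "%s_%04d_%02d" % (BASE, day.year, day.month)


def create_ddl(day):
    """CREATE TABLE statement for the month containing `day` (assumed columns)."""
    return (
        "CREATE TABLE %s ("
        "day date NOT NULL, amt numeric NOT NULL, uid integer NOT NULL)"
        % table_name(day)
    )


def insert_sql(day):
    """Parameterised INSERT aimed at the right monthly table.

    The application, not the database, picks the target table; the %s
    placeholders are meant to be filled in by the database driver.
    """
    return "INSERT INTO %s (day, amt, uid) VALUES (%%s, %%s, %%s)" % table_name(day)


def drop_ddl(day):
    """Age out a whole month by dropping its table instead of deleting rows."""
    return "DROP TABLE IF EXISTS %s" % table_name(day)


if __name__ == "__main__":
    d = date(2010, 2, 10)
    print(create_ddl(d))
    print(insert_sql(d))
    print(drop_ddl(date(2009, 12, 31)))
```

With a layout like this, aging out old data would become a DROP TABLE on
the oldest month rather than a large DELETE, at the cost of the application
having to know which tables a given query spans.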

My experience is mostly with Oracle, which does MVCC differently (old row
versions live in undo segments rather than in the table itself), so I've
had to relearn some stuff:

- Oracle often answers simple queries (e.g. counts and max / min) using only
the index, which is of course pre-sorted. PG has to go out and fetch the
rows to see if they are still visible, and if they are stored all over the
place on disk it means an 8K random page fetch for each row. This means that
adding an index to PG is not nearly the silver bullet that it can be with
databases that can answer such queries from the index alone.

- PG's indexes seem to be quite a bit larger than Oracle's, but that's gut
feel; I haven't done true comparisons. However, for my app I have
limited myself to only two indexes on that table, and each index is larger
(in disk space) than the table itself: I have 60GB of data and 140GB of
indexes :-)

- There is a lot of row turnover in my big table (I age out data). A big
delete (millions of rows) in PG seems a bit more expensive to process than
in Oracle; however, PG is not nearly as sensitive to transaction sizes as
Oracle is, so you can cheerfully throw out one big "DELETE from FOO where
..." and let the database chew on it.

I am interested to hear about your progress.

Cheers
Dave

On Wed, Feb 10, 2010 at 4:13 PM, rama <[email protected]> wrote:

>
>
> Hi all,
>
> i am trying to move my app from M$sql to PGsql, but i need a bit of help :)
>
>
> on M$sql, i had certain tables that were laid out as follows (sorry, pseudo code)
>
> contab_y
>   date
>   amt
>   uid
>
>
> contab_ym
>  date
>  amt
>  uid
>
> contab_ymd
>  date
>  amt
>  uid
>
>
> and so on..
>
> this was used to "solidify" (aggregate..btw sorry for my terrible english)
> the data on it..
>
> so basically, i get
>
> contab_y
> date = 2010
> amt = 100
> uid = 1
>
> contab_ym
>  date = 2010-01
>  amt = 10
>  uid = 1
> ----
>  date = 2010-02
>  amt = 90
>  uid = 1
>
>
> contab_ymd
>   date=2010-01-01
>   amt = 1
>   uid = 1
> ----
> blabla
>
>
> in that way, when i need to query a long range (ie: 1 year) i just take
> the rows that are contained in contab_y
> if i need to query a couple of days, i can go to ymd, and if i need
> data for some other timeframe, i can do some cool intersection
> between
> the different tables using some huge (but fast) queries.
>
>
> Now, the matter is that this design is hard to maintain, and the tables are
> difficult to check
>
> what i have tried is to go for a "normal" approach, using just one table that
> contains all the data, and some proper indexing.
> The issue is that this table can easily contain 100M rows :)
> that's why the other guys did all this work to speed up queries, splitting
> data across different tables and precalculating the sums.
>
>
> I am here to ask the PGsql experts for advice:
> what do you think i can do to better manage this situation?
> are there other cases i can take a look at? maybe some
> documentation, or some technique that i don't know?
> any advice is really appreciated!
