Re: [PERFORM] perf problem with huge table

2010-02-11 Thread jesper
 Hi all,

 i am trying to move my app from M$sql to PGsql, but i need a bit of help
 :)

Apart from all the other good advice about partitioning the dataset and
such, there is another aspect to keep in mind. When you have a large
dataset and your queries become I/O-bound, the tuple density is going to
hit you in two ways. Your dataset seems to have a natural clustering on
time, which is also what you would use for the partitioning. That also
means that if you roughly have that clustering of data on disk, the
tuples you need to satisfy a query will sit on the same page or on pages
close to it.

The cost of checking visibility for a tuple is to some degree a function
of the tuple size, so if you can do anything to increase the tuple
density it will most likely benefit speed in two ways:

* You increase the likelihood that the next tuple is on the same page
  and therefore does not cost a random I/O seek.
* You increase the total number of tuples sitting in your system cache
  for the same amount of pages (memory), so they do not cost a random
  I/O seek either.

So .. if you are carrying around columns you don't really need, then
throw them away (that could be columns that can trivially be computed
from other columns), but you need to do your own testing here. To
stress the first point, here is a sample run on a fairly old desktop with
one SATA drive.

testtable has an integer id and a data column which is 486 bytes of text.
testtable2 has an integer id and an integer data column.

Both were filled with 10M tuples, and PG was restarted and the OS caches
dropped beforehand to simulate a totally disk-bound system.
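
For reference, a setup along these lines could look like the sketch below
(the original mail does not show the DDL, so the exact layout, filler text
and row generation here are assumptions):

CREATE TABLE testtable  (id integer, data text);
CREATE TABLE testtable2 (id integer, data integer);

-- 10M rows each; repeat('x', 486) stands in for the 486 bytes of text.
INSERT INTO testtable
  SELECT i, repeat('x', 486) FROM generate_series(1, 10000000) AS i;
INSERT INTO testtable2
  SELECT i, i FROM generate_series(1, 10000000) AS i;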

testdb=# select count(id) from testtable where id > 800 and id < 850;
 count
-------
    49
(1 row)

Time: 7909.464 ms
testdb=# select count(id) from testtable2 where id > 800 and id < 850;
 count
-------
    49
(1 row)

Time: 2149.509 ms

In this sample it is about 4 times faster, and the query does not even
touch the data column. (On better hardware you'll most likely see better
results.)

If the columns are needed, you can push the less frequently used columns
out to a 1:1 relation. That gives you some administrative overhead, but
then you can decide at query time whether you want to pay the extra
random seeks to access that data.
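
A minimal sketch of such a 1:1 split (all table and column names here are
made up for illustration):

-- Hot, frequently used columns stay in the narrow main table ...
CREATE TABLE orders (
    order_id   integer PRIMARY KEY,
    order_date date,
    amount     numeric
);

-- ... and the wide, rarely used columns move to a 1:1 side table.
CREATE TABLE orders_details (
    order_id integer PRIMARY KEY REFERENCES orders(order_id),
    notes    text
);

-- Queries that only need the hot columns never touch the wide data;
-- the ones that do pay one extra (random-I/O) lookup via the join.
SELECT o.order_id, o.amount, d.notes
  FROM orders o
  LEFT JOIN orders_details d ON d.order_id = o.order_id
 WHERE o.order_date >= date '2010-01-01';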

You have the same picture the other way around: if your queries are
accessing data sitting in TOAST, you'll be paying double the random I/O
cost for getting the tuple. So it is definitely a tradeoff that should be
made with care.

I've monkeypatched my own PG using this patch, to toy around with the
criteria for sending the less frequently used data to a TOAST table:
http://article.gmane.org/gmane.comp.db.postgresql.devel.general/135158/match=

Google "vertical partitioning" for more; that is basically what this is.

(I believe this could benefit my own application, so I'm also trying to
generate some interest in the area.)

-- 
Jesper










Re: [PERFORM] perf problem with huge table

2010-02-11 Thread Leo Mannhart
Dave Crooke wrote:
 On Wed, Feb 10, 2010 at 5:30 PM, Jon Lewison jlewis...@gmail.com wrote:
 
 
 
 Just a nit, but Oracle implements MVCC.  90% of the databases out
 there do.
 
 
 Sorry, I spoke imprecisely. What I meant was the difference in how the
 rows are stored internally: in Oracle, the main tablespace contains
 only the newest version of a row, which is (where possible) updated in
 place - queries in a transaction that can still see an older version
 have to pull it from the UNDO tablespace (rollback segments in Oracle 8
 and older).
  
 In Postgres, all versions of all rows are in the main table, and have
 validity ranges associated with them (this version of this row existed
 between transaction ids x and y). Once a version goes out of scope, it
 has to be garbage collected by the vacuuming process so the space can be
 re-used.
 
 In general, this means Oracle is faster *if* you're only doing lots of
 small transactions (consider how these different models handle an update
 to a single field in a single row), but it is more sensitive to the scale
 of transactions: doing a really big transaction against a database
 with an OLTP workload can upset Oracle's digestion as it causes a lot of
 UNDO lookups. PG's performance is a lot more predictable in this regard.
 
 Both models have benefits and drawbacks ... when designing a schema for
 performance it's important to understand these differences.
 
 
 I find partitioning pretty useful in this scenario if the data
 allows it.  Aging out data just means dropping a partition rather
 than issuing a delete statement.
 
 
 Forgot to say this - yes, absolutely agree: dropping a table is a
 lot cheaper than a transactional delete.
 
 In general, I think partitioning is more important / beneficial with
 PG's style of MVCC than with Oracle or SQL-Server (which I think is
 closer to Oracle than PG).

I would like to disagree here a little bit.

Where Oracle's table partitioning comes in very handy is, for example,
when you have to replace the data of a big (read-only) table on a
regular basis (typically data replicated from another system).
In this case, you just create a partitioned table with exactly the same
columns/indexes etc. as the data table.

To load, you then load the data into the partitioned table, i.e.
- truncate the partitioned table, disable constraints, drop indexes
- load the data into the partitioned table
- rebuild all indexes etc. on the partitioned table

During all this time (even if it takes hours) the application can still
access the data in the data table without the bulk load interfering.

Once you have prepared the data in the partitioned table, you
- exchange the partition with the data table
which is a dictionary operation; that means the application is (if at
all) only blocked during this operation, which is in the sub-second range.
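
A minimal Oracle sketch of that exchange step (the table and partition
names here are invented for illustration, not taken from any real system):

-- staging_part was truncated, loaded and re-indexed offline while the
-- application kept reading data_table.  The swap itself is a dictionary
-- operation:
ALTER TABLE staging_part
  EXCHANGE PARTITION p_all WITH TABLE data_table
  INCLUDING INDEXES WITHOUT VALIDATION;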

If you have to do this with conventional updates or deletes/inserts
instead, it might not even be possible in the given timeframe.

just as an example
Leo

p.s. Just to make it a little bit clearer about the famous ORA-01555:
Oracle is not forgetting the data, as the Oracle RDBMS is of course
also ACID-compliant. The ORA-01555 can happen

- when the rollback tablespace is really too small to hold all the data
changed in the transaction (which I consider a configuration error)

- when a long running (read) transaction needs a record which has
already been updated AND COMMITTED by another transaction. The key
here is that a second transaction has changed a record which is also
needed by the first transaction, and the second transaction committed the
work. Committing the change means the data in the rollback segment is
no longer needed, as it can be read directly from the data block (after
all, it is committed, which means valid and visible to other
transactions). If the first transaction now tries to read the data from
the rollback segment to see the unchanged state, it will still succeed
(it is still there; nothing has happened to the rollback segment so far).
The problem of the ORA-01555 shows up only if a third transaction now
needs space in the rollback segment. As the entry from the first/second
transaction is marked committed (and therefore no longer needed), it is
perfectly valid for transaction #3 to grab this rollback segment and to
store its old value there. If THEN (and only then) transaction #1 comes
back, asking for the old, unchanged value as of when the transaction
started, THEN the famous ORA-01555 is raised, as this value has now been
overwritten by transaction #3.
That's why in newer versions you have to set the retention time of the
rollback blocks/segments to a value bigger than your expected longest
transaction. This will decrease the likelihood of the ORA-01555
drastically (but it is still not zero, as you could easily construct an
example where it still fails with ORA-01555, since a transaction can
still run longer than you 

[PERFORM] perf problem with huge table

2010-02-10 Thread rama


Hi all,

I am trying to move my app from M$sql to PGsql, but I need a bit of help :)


On M$sql, I had certain tables that were made as follows (sorry, pseudo code):

contab_y
   date
   amt
   uid


contab_ym
  date
  amt
  uid

contab_ymd
  date
  amt 
  uid


and so on..

This was used to consolidate (aggregate.. btw, sorry for my terrible English)
the data.

So basically, I get:

contab_y
date = 2010
amt = 100
uid = 1

contab_ym
  date = 2010-01
  amt = 10
  uid = 1

  date = 2010-02
  amt = 90
  uid = 1


contab_ymd
   date=2010-01-01
   amt = 1
   uid = 1

blabla


In that way, when I need to run a query over a long range (i.e. one year) I just
take the rows contained in contab_y;
if I need to run a query for a couple of days, I can go to ymd, and if I need to
get some data for another timeframe, I can do some cool intersections between
the different tables using some huge (but fast) queries.


Now, the matter is that this design is hard to maintain, and the tables are
difficult to check.

What I have tried is to go for a normal approach, using just one table that
contains all the data, plus some proper indexing.
The issue is that this table can easily contain 100M rows :)
That's why the other guys did all this work to speed up queries, splitting
the data over different tables and precalculating the sums.
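
As a concrete picture of that normal approach, here is a minimal sketch
(table and column names are assumed, not taken from your actual schema):

CREATE TABLE contab (
    uid  integer,
    date date,
    amt  numeric
);

-- One composite index covers the typical per-user date-range lookups.
CREATE INDEX contab_uid_date_idx ON contab (uid, date);

-- A yearly total then becomes a plain aggregate instead of a lookup in a
-- precalculated table:
SELECT uid, sum(amt)
  FROM contab
 WHERE date >= date '2010-01-01' AND date < date '2011-01-01'
 GROUP BY uid;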


I am here to ask the PGsql experts for advice:
what do you think I can do to better manage this situation?
Are there some other cases I can take a look at? Maybe some
documentation, or some technique that I don't know about?
Any advice is really appreciated!












Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Justin Graf
On 2/10/2010 5:13 PM, rama wrote:
 In that way, when I need to run a query over a long range (i.e. one year) I just
 take the rows contained in contab_y;
 if I need to run a query for a couple of days, I can go to ymd, and if I need to
 get some data for another timeframe, I can do some cool intersections between
 the different tables using some huge (but fast) queries.


 Now, the matter is that this design is hard to maintain, and the tables are
 difficult to check.

 What I have tried is to go for a normal approach, using just one table that
 contains all the data, plus some proper indexing.
 The issue is that this table can easily contain 100M rows :)
 That's why the other guys did all this work to speed up queries, splitting
 the data over different tables and precalculating the sums.


 I am here to ask the PGsql experts for advice:
 what do you think I can do to better manage this situation?
 Are there some other cases I can take a look at? Maybe some
 documentation, or some technique that I don't know about?
 Any advice is really appreciated!

Look into table partitioning:
http://www.postgresql.org/docs/current/static/ddl-partitioning.html

It's similar to what you are doing, but it simplifies the queries and the
logic needed to access large data sets.
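
As a rough idea of the inheritance-based scheme those docs describe, here
is a minimal sketch (all table and column names are assumptions, not
anything from this thread):

-- The parent stays empty; each child carries a CHECK constraint so the
-- planner can skip irrelevant partitions when constraint_exclusion is on.
CREATE TABLE contab (
    uid  integer,
    date date,
    amt  numeric
);

CREATE TABLE contab_2010_01 (
    CHECK (date >= date '2010-01-01' AND date < date '2010-02-01')
) INHERITS (contab);

CREATE TABLE contab_2010_02 (
    CHECK (date >= date '2010-02-01' AND date < date '2010-03-01')
) INHERITS (contab);

CREATE INDEX contab_2010_01_date_idx ON contab_2010_01 (date);
CREATE INDEX contab_2010_02_date_idx ON contab_2010_02 (date);

-- Aging out a month is then a cheap DROP TABLE instead of a huge DELETE:
-- DROP TABLE contab_2010_01;

Note that inserts still need either a trigger/rule on the parent or
application code that targets the right child table, as the docs explain.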



Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Dave Crooke
Hi Rama

I'm actually looking at going in the other direction ...

I have an app using PG where we have a single table to which we just added a
lot of data; I'm ending up with many millions of rows, and I'm finding
that the single-table schema simply doesn't scale.

In PG, table partitioning is only handled by the database for reads; for
insert/update you need to do quite a lot of DIY (setting up triggers, etc.),
so I am planning to just use named tables and generate the necessary DDL /
DML in vanilla SQL the same way that your older code does.

My experience is mostly with Oracle, which is not MVCC, so I've had to
relearn some stuff:

- Oracle often answers simple queries (e.g. counts and max / min) using only
the index, which is of course pre-sorted. PG has to go out and fetch the
rows to see if they are still in scope, and if they are stored all over the
place on disk it means an 8K random page fetch for each row. This means that
adding an index to PG is not nearly the silver bullet that it can be with
some non-MVCC databases.

- PG's indexes seem to be quite a bit larger than Oracle's, but that's gut
feel, I haven't been doing true comparisons ...  however, for my app I have
limited myself to only two indexes on that table, and each index is larger
(in disk space) than the table itself ... I have 60GB of data and 140GB of
indexes :-)

- There is a lot of row turnover in my big table (I age out data), and a big
delete (millions of rows) in PG seems a bit more expensive to process than
in Oracle; however, PG is not nearly as sensitive to transaction sizes as
Oracle is, so you can cheerfully throw out one big "DELETE FROM foo WHERE
..." and let the database chew on it.

I am interested to hear about your progress.

Cheers
Dave

On Wed, Feb 10, 2010 at 4:13 PM, rama rama.r...@tiscali.it wrote:



 Hi all,

 I am trying to move my app from M$sql to PGsql, but I need a bit of help :)


 On M$sql, I had certain tables that were made as follows (sorry, pseudo code):

 contab_y
   date
   amt
   uid


 contab_ym
  date
  amt
  uid

 contab_ymd
  date
  amt
  uid


 and so on..

 This was used to consolidate (aggregate.. btw, sorry for my terrible English)
 the data.

 So basically, I get:

 contab_y
 date = 2010
 amt = 100
 uid = 1

 contab_ym
  date = 2010-01
  amt = 10
  uid = 1
 
  date = 2010-02
  amt = 90
  uid = 1


 contab_ymd
   date=2010-01-01
   amt = 1
   uid = 1
 
 blabla


 In that way, when I need to run a query over a long range (i.e. one year) I
 just take the rows contained in contab_y;
 if I need to run a query for a couple of days, I can go to ymd, and if I need
 to get some data for another timeframe, I can do some cool intersections
 between
 the different tables using some huge (but fast) queries.


 Now, the matter is that this design is hard to maintain, and the tables are
 difficult to check.

 What I have tried is to go for a normal approach, using just one table that
 contains all the data, plus some proper indexing.
 The issue is that this table can easily contain 100M rows :)
 That's why the other guys did all this work to speed up queries, splitting
 the data over different tables and precalculating the sums.


 I am here to ask the PGsql experts for advice:
 what do you think I can do to better manage this situation?
 Are there some other cases I can take a look at? Maybe some
 documentation, or some technique that I don't know about?
 Any advice is really appreciated!













Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Jon Lewison
On Wed, Feb 10, 2010 at 4:16 PM, Dave Crooke dcro...@gmail.com wrote:

 Hi Rama

 I'm actually looking at going in the other direction ...

 I have an app using PG where we have a single table to which we just added a
 lot of data; I'm ending up with many millions of rows, and I'm finding
 that the single-table schema simply doesn't scale.

 In PG, table partitioning is only handled by the database for reads;
 for insert/update you need to do quite a lot of DIY (setting up triggers,
 etc.), so I am planning to just use named tables and generate the necessary
 DDL / DML in vanilla SQL the same way that your older code does.

 My experience is mostly with Oracle, which is not MVCC, so I've had to
 relearn some stuff:


Just a nit, but Oracle implements MVCC.  90% of the databases out there do.


 - Oracle often answers simple queries (e.g. counts and max / min) using
 only the index, which is of course pre-sorted. PG has to go out and fetch
 the rows to see if they are still in scope, and if they are stored all over
 the place on disk it means an 8K random page fetch for each row. This means
 that adding an index to PG is not nearly the silver bullet that it can be
 with some non-MVCC databases.

 - PG's indexes seem to be quite a bit larger than Oracle's, but that's gut
 feel, I haven't been doing true comparisons ...  however, for my app I have
 limited myself to only two indexes on that table, and each index is larger
 (in disk space) than the table itself ... I have 60GB of data and 140GB of
 indexes :-)

 - There is a lot of row turnover in my big table (I age out data), and a
 big delete (millions of rows) in PG seems a bit more expensive to process
 than in Oracle; however, PG is not nearly as sensitive to transaction sizes
 as Oracle is, so you can cheerfully throw out one big "DELETE FROM foo WHERE
 ..." and let the database chew on it.


I find partitioning pretty useful in this scenario if the data allows it.
Aging out data just means dropping a partition rather than issuing a delete
statement.


Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Dave Crooke
On Wed, Feb 10, 2010 at 5:30 PM, Jon Lewison jlewis...@gmail.com wrote:



 Just a nit, but Oracle implements MVCC.  90% of the databases out there do.


Sorry, I spoke imprecisely. What I meant was the difference in how the rows
are stored internally: in Oracle, the main tablespace contains only the
newest version of a row, which is (where possible) updated in place -
queries in a transaction that can still see an older version have to pull
it from the UNDO tablespace (rollback segments in Oracle 8 and older).

In Postgres, all versions of all rows are in the main table, and have
validity ranges associated with them (this version of this row existed
between transaction ids x and y). Once a version goes out of scope, it has
to be garbage collected by the vacuuming process so the space can be
re-used.

In general, this means Oracle is faster *if* you're only doing lots of small
transactions (consider how these different models handle an update to a
single field in a single row), but it is more sensitive to the scale of
transactions: doing a really big transaction against a database with an
OLTP workload can upset Oracle's digestion as it causes a lot of UNDO
lookups. PG's performance is a lot more predictable in this regard.

Both models have benefits and drawbacks ... when designing a schema for
performance it's important to understand these differences.


 I find partitioning pretty useful in this scenario if the data allows it.
 Aging out data just means dropping a partition rather than issuing a delete
 statement.


Forgot to say this - yes, absolutely agree: dropping a table is a lot
cheaper than a transactional delete.

In general, I think partitioning is more important / beneficial with PG's
style of MVCC than with Oracle or SQL-Server (which I think is closer to
Oracle than PG).


Cheers
Dave


Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Jon Lewison
On Wed, Feb 10, 2010 at 4:48 PM, Dave Crooke dcro...@gmail.com wrote:

 On Wed, Feb 10, 2010 at 5:30 PM, Jon Lewison jlewis...@gmail.com wrote:



 Just a nit, but Oracle implements MVCC.  90% of the databases out there
 do.


 Sorry, I spoke imprecisely. What I meant was the difference in how the rows
 are stored internally: in Oracle, the main tablespace contains only the
 newest version of a row, which is (where possible) updated in place -
 queries in a transaction that can still see an older version have to pull
 it from the UNDO tablespace (rollback segments in Oracle 8 and older).

 In Postgres, all versions of all rows are in the main table, and have
 validity ranges associated with them (this version of this row existed
 between transaction ids x and y). Once a version goes out of scope, it has
 to be garbage collected by the vacuuming process so the space can be
 re-used.

 In general, this means Oracle is faster *if* you're only doing lots of
 small transactions (consider how these different models handle an update to
 a single field in a single row), but it is more sensitive to the scale of
 transactions: doing a really big transaction against a database with an
 OLTP workload can upset Oracle's digestion as it causes a lot of UNDO
 lookups. PG's performance is a lot more predictable in this regard.

 Both models have benefits and drawbacks ... when designing a schema for
 performance it's important to understand these differences.


Yes, absolutely.  It's not unusual to see the UNDO tablespace increase in
size by several gigs for a large bulk load.

Speaking of rollback segments, I'm assuming that since all storage for
non-visible row versions is in the main table, PostgreSQL has no
equivalent of an ORA-01555.

- Jon


Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Dave Crooke
Actually, in a way it does ... "No space left on device" or similar ;-)

Cheers
Dave

P.S. For those not familiar with Oracle, ORA-01555 translates to "your query
/ transaction is kinda old and I've forgotten the data, so I'm just going to
throw an error at you now". If you're reading, your SELECT randomly fails;
if you're writing, it forces a rollback of your transaction.

On Wed, Feb 10, 2010 at 6:09 PM, Jon Lewison jlewis...@gmail.com wrote:


 Speaking of rollback segments, I'm assuming that since all storage for
 non-visible row versions is in the main table, PostgreSQL has no
 equivalent of an ORA-01555.

 - Jon




Re: [PERFORM] perf problem with huge table

2010-02-10 Thread Greg Smith

rama wrote:

In that way, when I need to run a query over a long range (i.e. one year) I just
take the rows contained in contab_y;
if I need to run a query for a couple of days, I can go to ymd, and if I need to
get some data for another timeframe, I can do some cool intersections between
the different tables using some huge (but fast) queries.


Now, the matter is that this design is hard to maintain, and the tables are
difficult to check.


You sound like you're trying to implement something like materialized
views one at a time; have you considered adopting the more general
techniques used to maintain those, so that you're not doing custom
development each time for the design?


http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
http://www.pgcon.org/2008/schedule/events/69.en.html

I think that sort of approach is more practical than it would have been 
for you in MySQL, so maybe this wasn't on your list of possibilities before.
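
For concreteness, here is a tiny sketch in the spirit of those
trigger-maintained summary tables (all names here are invented, this is not
the code from either link, and it ignores concurrency corner cases):

CREATE TABLE contab_ym_summary (
    uid   integer,
    month date,
    amt   numeric,
    PRIMARY KEY (uid, month)
);

CREATE OR REPLACE FUNCTION contab_summary_insert() RETURNS trigger AS $$
BEGIN
    -- Fold each new detail row into the per-month summary row.
    UPDATE contab_ym_summary
       SET amt = amt + NEW.amt
     WHERE uid = NEW.uid AND month = date_trunc('month', NEW.date)::date;
    IF NOT FOUND THEN
        INSERT INTO contab_ym_summary (uid, month, amt)
        VALUES (NEW.uid, date_trunc('month', NEW.date)::date, NEW.amt);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER contab_summary_trg
AFTER INSERT ON contab
FOR EACH ROW EXECUTE PROCEDURE contab_summary_insert();

The links above cover the remaining details (deletes/updates, batching, and
periodic full refreshes), which is why the general technique beats hand-rolling
a new aggregate table each time.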


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com

