Re: [PERFORM] MVCC and Implications for (Near) Real-Time Application

2010-10-29 Thread Pierre C


My questions are: (1) Does the MVCC architecture introduce significant  
delays between insert by a thread and visibility by other threads


As said by others, once commited it is immediately visible to all


(2) Are there any available benchmarks that can measure this delay?


Since you will not be batching INSERTs, you will use 1 INSERT per  
transaction.

If you use Autocommit mode, that's it.
If you don't, you will get a few extra network roundtrips after the  
INSERT, to send the COMMIT.


One INSERT is usually extremely fast unless you're short on RAM and the  
indexes that need updating need some disk seeking.


Anyway, doing lots of INSERTs each in its own transaction is usually very  
low-throughput, because at each COMMIT, postgres must always be sure that  
all the data is actually written to the harddisks. So, depending on the  
speed of your harddisks, each COMMIT can take up to 10-20 milliseconds.


On a 7200rpm harddisk, it is absolutely impossible to do more than 7200  
commits/minute if you want to be sure each time that the data really is  
written on the harddisk, unless :


- you use several threads (one disk write can group several commits from  
different connections, see the config file docs)
- you turn of synchronous_commit ; in this case commit is instantaneous,  
but if your server loses power or crashes, the last few seconds of data  
may be lost (database integrity is still guaranteed though)
- you use a battery backup cache on your RAID controller, in this case  
"written to the harddisks" is replaced by "written to batteyr backed RAM"  
which is a lot faster


If you dont use battery backed cache, place the xlog on a different RAID1  
array than the tables/indexes, this allows committing of xlog records  
(which is the time critical part) to proceed smoothly and not be disturbed  
by other IO on the indexes/tables. Also consider tuning your bgwriter and  
checkpoints, after experimentation under realistic load conditions.


So, when you benchmark your application, if you get disappointing results,  
think about this...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MVCC and Implications for (Near) Real-Time Application

2010-10-29 Thread Kevin Grittner
Steve Wong  wrote:
 
> (1) Does the MVCC architecture introduce significant delays
> between insert by a thread and visibility by other threads (I am
> unclear about how multiple versions are "collapsed" or reconciled,
> as well as how different query threads are seeing which version)?
 
As soon as the inserting transaction commits the inserted row is
visible to new snapshots.  If you are in an explicit transaction the
commit will have occurred before the return from the COMMIT request;
otherwise it will have completed before the return from the INSERT
request.
 
You will get a new snapshot for every statement in READ COMMITTED
(or lower) transaction isolation.  You will get a new snapshot for
each database transaction in higher isolation levels.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MVCC and Implications for (Near) Real-Time Application

2010-10-29 Thread A.M.

On Oct 25, 2010, at 2:46 PM, Steve Wong wrote:

> Hi experts,
> 
> I have a (near) real-time application in which inserts into the database 
> needs 
> to be visible to queries from other threads with minimal delay. The inserts 
> are 
> triggered by real-time events and are therefore asynchronous (i.e. many 
> performance tips I read related to batch inserts or copy do not apply here, 
> since these events cannot be predicted or batched), and the inserted data 
> need 
> to be available within a couple of seconds to other threads (for example, an 
> inserted row that only appears to other query threads 5 seconds or more after 
> the insert is not acceptable). The delay should be under 2 seconds maximum, 
> sub-1 second would be great.
> 
> My questions are: (1) Does the MVCC architecture introduce significant delays 
> between insert by a thread and visibility by other threads (I am unclear 
> about 
> how multiple versions are "collapsed" or reconciled, as well as how different 
> query threads are seeing which version)? (2) Are there any available 
> benchmarks 
> that can measure this delay? (3) What are relevant config parameters that 
> will 
> reduce this delay?

There is no way to know without testing whether your hardware, OS, database 
schema, and database load can meet your demands. However, there is no technical 
reason why PostgreSQL could not meet your timing goals- MVCC does not 
inherently introduce delays, however the PostgreSQL implementation requires a 
cleanup process which can introduce latency.

If you find that your current architecture is not up to the task, consider 
using LISTEN/NOTIFY with a payload (new in 9.0), which we are using for a 
similar "live-update" system.

Cheers,
M



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] MVCC and Implications for (Near) Real-Time Application

2010-10-29 Thread Steve Wong
Hi experts,

I have a (near) real-time application in which inserts into the database needs 
to be visible to queries from other threads with minimal delay. The inserts are 
triggered by real-time events and are therefore asynchronous (i.e. many 
performance tips I read related to batch inserts or copy do not apply here, 
since these events cannot be predicted or batched), and the inserted data need 
to be available within a couple of seconds to other threads (for example, an 
inserted row that only appears to other query threads 5 seconds or more after 
the insert is not acceptable). The delay should be under 2 seconds maximum, 
sub-1 second would be great.

My questions are: (1) Does the MVCC architecture introduce significant delays 
between insert by a thread and visibility by other threads (I am unclear about 
how multiple versions are "collapsed" or reconciled, as well as how different 
query threads are seeing which version)? (2) Are there any available benchmarks 
that can measure this delay? (3) What are relevant config parameters that will 
reduce this delay?

Thanks for your patience with my ignorance of MVCC (still learning more about 
it),
Steve