On Sun, Jan 06, 2013 at 01:33:55AM +0700, Henrik Sarvell wrote:
> In which actual situations do you need to do dbSync -> commit upd as
> opposed to just commit upd? It seems to me the RMDBS equivalent is begin ->
> commit?

The sequence

   (dbSync) -> (modify objects) -> (commit 'upd)

is all and only about two things: Avoiding race conditions, and keeping
the cached objects in all involved processes consistent.

If only one single process writes to the database (this is usually the
case when a new database is created and populated with initial data, but
before the main event loop is started), you don't need (dbSync). You
just call (commit) after creating or modifying objects.

As soon as several processes operate on the DB, you should call (dbSync)
before modifying anything, and (commit 'upd) when you are done.

It is theoretically possible to just go ahead, modify some objects, and
call (commit) to write the changes to the DB. But then you must be aware
that any other process having one of the involved objects already in
memory (this happens when the program accessed the object's value or
property list) will continue keeping the old state of that object. It
will use an outdated version, possibly giving wrong results, and -- even
worse -- will write this old state to the DB when it modifies that
object at a later time.

So if, for example, a process modifies an object's address, then another
process modifies that object's telephone number, the second process will
overwrite the changed address with the old version.

A change to an object's property will in many cases cause changes in
index trees (B-tree nodes are implemented as DB objects too), and in
other objects (e.g. in bi-directional '+Joint' relations). For that
reason, unsynchronized changes will almost surely result in a completely
messed-up database.

You can avoid the synchronization only if you are absolutely sure that
no other process has read (and cached) the objects you are about to
modify. Then you can simply go ahead, create and modify the objects, and
call (commit). It is important here not to forget here that while a
newly created object itself is safe (no other process can have it
already), the creation of objects usually causes the modification of
other objects (tree nodes etc.).

> If you do db, put> and commit upd calls so that they happen roughly at the
> same time you should be as safe as you are when you do a "update table set
> bla bla where userid = 5" in an RMDBS?

"roughly at the same time" is probably not enough.

> Let's take what I actually do at work as an example, pretend that I would
> like to try to move the whole casino database from MySQL/InnoDB to PL, the
> main thing then would be the following SQL query for example: "update users
> set balance = balance + 10 where user_id = 50".
> There can be 100 such calls per second and upwards 5 of them per second for
> the same user. Doing a dbSync here doesn't makes sense, there are no
> relations, just a number that needs increasing or decreasing, updating user
> 5's balance should not have to wait for an update of 10's balance to finish
> and so on.
> It seems to me that this situation is solved in PL by simply doing (commit
> upd)s without any dbSyncs, if there are several updates coming in at
> virtually the same time they will still be synced through the commit upds.

Yes, but the caching issue is not resolved. If two processes increment
the object's counter, the second one will still hold the old
un-incremented value, because it didn't wait to receive the changes
broadcasted by (commit 'upd). This waiting is handled by the 'sync' call
in (dbSync).

You could call (rollback) before the increment, thus forcing the object
(_all_ objects, to be precise) to be reloaded, but you may still have a
race condition where another process increments the object before you do
your 'commit'.

♪♫ Alex
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe

Reply via email to