Re: ACID tests

2013-01-20 Thread Henrik Sarvell
I've done some testing now; the dbSync calls incur very little overhead
with regard to the telling/updating mechanism for objects in other
processes.

What I still don't understand, then, is the ability to lock only certain
files in the dbSync call, when the dbSync call does a global block by
itself. Doesn't that defeat the purpose of per-file locking?


On Sun, Jan 6, 2013 at 8:53 PM, Alexander Burger a...@software-lab.de wrote:

 On Sun, Jan 06, 2013 at 08:00:51PM +0700, Henrik Sarvell wrote:
  OK so how would you deal with a situation where you have the need to
  quickly increment people's balances like I mentioned previously but at
 the
  same time you have another process that has to update a lot of objects by
  fetching information from many others?
 
  This second process will take roughly one minute to complete from start
 to
  finish and will not update +User in any way.

 I would do it the normal, safe way, i.e. with 'inc!'. Note that the
 way you proposed doesn't have so much less overhead, I think, because it
 still uses the 'upd' argument to 'commit', which triggers communication
 with other processes, and 'commit' itself, which does a low-level
 locking of the DB.


  If I have understood things correctly simply doing dbSync - work -
 commit
  in the second process won't work here because it will block the balance
  updates.

 Yes, but you can control this, depending on how many updates are done in
 the 'work' between dbSync and commit.

 We've discussed this in IRC, so for other readers here is what I do
 usually in such cases:

(dbSync)
(while (..)
   (... do one update step ...)
   (at (0 . 1000) (commit 'upd) (dbSync)) )
(commit 'upd)

 The value of 1000 is an example; I would try something between 100 and
 10000.

 With that, after every 1000th update step other processes get a chance
 to grab the lock in the (dbSync) after the 'commit'.


  Another option would be to do it in a loop and use put! which will only
  initiate the sync at the time of each update which should not block the
  balance updates for too long.

 Right. This would be optimal in terms of giving freedom to other
 processes, but it does only one single change in the 'put!', and thus
 the whole update might take too long.

 The above sync at every 1000th step allows for a good compromise.


  The question then is how much overhead does this cause when it comes to
 the
  balance updates in your experience? If significant is it possible to

 I would not call this "overhead". It is just that the quick
 operation of incrementing the balance may have to wait too long if the
 second process does too many changes in a single transaction.

 So the problem is not the 'inc!'. It just sits and waits until it can
 do its job, and is then done quite quickly. It is the large update
 'work' which may grab the DB for too long periods.


  somehow solve the issue of these two processes creating collateral damage
  to each other so to speak?

 If you can isolate the balance (not like in your last example, where two
 processes incremented and decremented the balance at the same time), and
 make absolutely sure that only one process caches the object at a given
 time, you could take the risk and do the incrementing/decrementing
 without synchronization, with just (commit).

 One way might be to have a single process take care of that,
 communicating values to/from other processes with 'tell', so that no one
 else needs to access these objects. But that's more complicated than
 the straight-forward way.

 ♪♫ Alex
 --
 UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe



Re: ACID tests

2013-01-06 Thread Alexander Burger
On Sun, Jan 06, 2013 at 01:33:55AM +0700, Henrik Sarvell wrote:
 In which actual situations do you need to do dbSync - commit upd as
 opposed to just commit upd? It seems to me the RDBMS equivalent is begin -
 commit?

The sequence

   (dbSync) - (modify objects) - (commit 'upd)

is all and only about two things: Avoiding race conditions, and keeping
the cached objects in all involved processes consistent.


If only one single process writes to the database (this is usually the
case when a new database is created and populated with initial data, but
before the main event loop is started), you don't need (dbSync). You
just call (commit) after creating or modifying objects.


As soon as several processes operate on the DB, you should call (dbSync)
before modifying anything, and (commit 'upd) when you are done.

It is theoretically possible to just go ahead, modify some objects, and
call (commit) to write the changes to the DB. But then you must be aware
that any other process having one of the involved objects already in
memory (this happens when the program accessed the object's value or
property list) will continue keeping the old state of that object. It
will use an outdated version, possibly giving wrong results, and -- even
worse -- will write this old state to the DB when it modifies that
object at a later time.

So if, for example, a process modifies an object's address, then another
process modifies that object's telephone number, the second process will
overwrite the changed address with the old version.
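As a code sketch, that lost update looks like this (hypothetical '+Person'
class and property names, deliberately without synchronization to show the
failure):

   # Process A reads the object, caching it in memory
   (setq P (db 'nm '+Person "Smith"))
   (put P 'adr "New Street 5")
   (commit)              # no 'upd: other processes are not notified

   # Process B still holds the OLD copy of P (old address) in its cache
   (put P 'tel "555-1234")
   (commit)              # writes the whole object: the stale address wins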

A change to an object's property will in many cases cause changes in
index trees (B-tree nodes are implemented as DB objects too), and in
other objects (e.g. in bi-directional '+Joint' relations). For that
reason, unsynchronized changes will almost surely result in a completely
messed-up database.


You can avoid the synchronization only if you are absolutely sure that
no other process has read (and cached) the objects you are about to
modify. Then you can simply go ahead, create and modify the objects, and
call (commit). It is important not to forget that while a
newly created object itself is safe (no other process can have it
already), the creation of objects usually causes the modification of
other objects (tree nodes etc.).
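A minimal sketch of this single-writer case (hypothetical '+Item' class;
the point is that plain (commit) suffices before any other process runs):

   # One process, before the main event loop forks any children:
   # no other process can have cached anything yet.
   (for N 1000
      (new T '(+Item) 'nr N 'nm (pack "Item " N)) )
   (commit)    # plain commit; no dbSync and no 'upd needed here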



 If you do db, put and commit upd calls so that they happen roughly at the
 same time you should be as safe as you are when you do an update table set
 bla bla where userid = 5 in an RDBMS?

"Roughly at the same time" is probably not enough.


 Let's take what I actually do at work as an example, pretend that I would
 like to try to move the whole casino database from MySQL/InnoDB to PL, the
 main thing then would be the following SQL query for example: update users
 set balance = balance + 10 where user_id = 50.
 
 There can be 100 such calls per second and upwards 5 of them per second for
 the same user. Doing a dbSync here doesn't make sense, there are no
 relations, just a number that needs increasing or decreasing, updating user
 5's balance should not have to wait for an update of 10's balance to finish
 and so on.
 
 It seems to me that this situation is solved in PL by simply doing (commit
 upd)s without any dbSyncs, if there are several updates coming in at
 virtually the same time they will still be synced through the commit upds.

Yes, but the caching issue is not resolved. If two processes increment
the object's counter, the second one will still hold the old
un-incremented value, because it didn't wait to receive the changes
broadcasted by (commit 'upd). This waiting is handled by the 'sync' call
in (dbSync).

You could call (rollback) before the increment, thus forcing the object
(_all_ objects, to be precise) to be reloaded, but you may still have a
race condition where another process increments the object before you do
your 'commit'.
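That unsafe pattern would look like this (for illustration only; the
window between 'rollback' and 'commit' is exactly where the race sits):

   (rollback)                 # drop ALL cached objects, forcing fresh reads
   (let U (db 'uname '+User "henrik")
      (inc U 'balance 10)     # another process may commit between here ...
      (commit 'upd) )         # ... and here, and its increment is lost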

♪♫ Alex


Re: ACID tests

2013-01-06 Thread Henrik Sarvell
 Yes, but the caching issue is not resolved. If two processes increment
 the object's counter, the second one will still hold the old
 un-incremented value, because it didn't wait to receive the changes
 broadcasted by (commit 'upd). This waiting is handled by the 'sync' call
 in (dbSync).


I'm not sure I follow you there. When I do my testing with the
following code:

(load "lib/http.l")

(de decBalance ()
   (let U (db 'uname '+User "henrik")
      (dec U 'balance 10) )
   (commit 'upd)
   (println "Decreased balance by 10.") )

(de incRepeat ()
   (let U (db 'uname '+User "henrik")
      (put! U 'balance 100)
      (println
         (make
            (do 10
               (inc U 'balance 10)
               (commit 'upd)
               (link (; U balance))
               (wait 2000) ) ) ) ) )

(class +User +Entity)
(rel uname (+Key +String))
(rel balance (+Number))

(dbs
   (4 +User)
   (4 (+User uname balance)) )

(pool "/opt/picolisp/projects/acid-test/" *Dbs)

#(setq U (request '(+User) 'uname "henrik" 'balance 100))
#(commit)
#(mapc show (collect 'uname '+User))

(de go ()
   (server 8080) )

I get (110 120 120 130 140 150 160 170 180 190) as the result of
http://localhost:8080/!incRepeat when I also open
http://localhost:8080/!decBalance in another tab while incRepeat runs.

Seems to me that incRepeat gets the decrease alright?


On Sun, Jan 6, 2013 at 5:16 PM, Alexander Burger a...@software-lab.de wrote:

 On Sun, Jan 06, 2013 at 01:33:55AM +0700, Henrik Sarvell wrote:
  In which actual situations do you need to do dbSync - commit upd as
  opposed to just commit upd? It seems to me the RDBMS equivalent is begin
 -
  commit?

 The sequence

(dbSync) - (modify objects) - (commit 'upd)

 is all and only about two things: Avoiding race conditions, and keeping
 the cached objects in all involved processes consistent.


 If only one single process writes to the database (this is usually the
 case when a new database is created and populated with initial data, but
 before the main event loop is started), you don't need (dbSync). You
 just call (commit) after creating or modifying objects.


 As soon as several processes operate on the DB, you should call (dbSync)
 before modifying anything, and (commit 'upd) when you are done.

 It is theoretically possible to just go ahead, modify some objects, and
 call (commit) to write the changes to the DB. But then you must be aware
 that any other process having one of the involved objects already in
 memory (this happens when the program accessed the object's value or
 property list) will continue keeping the old state of that object. It
 will use an outdated version, possibly giving wrong results, and -- even
 worse -- will write this old state to the DB when it modifies that
 object at a later time.

 So if, for example, a process modifies an object's address, then another
 process modifies that object's telephone number, the second process will
 overwrite the changed address with the old version.

 A change to an object's property will in many cases cause changes in
 index trees (B-tree nodes are implemented as DB objects too), and in
 other objects (e.g. in bi-directional '+Joint' relations). For that
 reason, unsynchronized changes will almost surely result in a completely
 messed-up database.


 You can avoid the synchronization only if you are absolutely sure that
 no other process has read (and cached) the objects you are about to
 modify. Then you can simply go ahead, create and modify the objects, and
 call (commit). It is important not to forget that while a
 newly created object itself is safe (no other process can have it
 already), the creation of objects usually causes the modification of
 other objects (tree nodes etc.).



  If you do db, put and commit upd calls so that they happen roughly at
 the
  same time you should be as safe as you are when you do a update table
 set
  bla bla where userid = 5 in an RDBMS?

 "Roughly at the same time" is probably not enough.


  Let's take what I actually do at work as an example, pretend that I would
  like to try to move the whole casino database from MySQL/InnoDB to PL,
 the
  main thing then would be the following SQL query for example: update
 users
  set balance = balance + 10 where user_id = 50.
 
  There can be 100 such calls per second and upwards 5 of them per second
 for
  the same user. Doing a dbSync here doesn't make sense, there are no
  relations, just a number that needs increasing or decreasing, updating
 user
  5's balance should not have to wait for an update of 10's balance to
 finish
  and so on.
 
  It seems to me that this situation is solved in PL by simply doing
 (commit
  upd)s without any dbSyncs, if there are several updates coming in at
  virtually the same time they will still be synced through the commit
 upds.

 Yes, but the caching issue is not resolved. If two processes increment
 the object's counter, the second one will still hold the old
 un-incremented value, 

Re: ACID tests

2013-01-06 Thread Henrik Sarvell
OK so how would you deal with a situation where you have the need to
quickly increment people's balances like I mentioned previously but at the
same time you have another process that has to update a lot of objects by
fetching information from many others?

This second process will take roughly one minute to complete from start to
finish and will not update +User in any way.

If I have understood things correctly simply doing dbSync - work - commit
in the second process won't work here because it will block the balance
updates.

Another option would be to do it in a loop and use put! which will only
initiate the sync at the time of each update which should not block the
balance updates for too long.

The question then is how much overhead does this cause when it comes to the
balance updates in your experience? If significant is it possible to
somehow solve the issue of these two processes creating collateral damage
to each other so to speak?


On Sun, Jan 6, 2013 at 7:19 PM, Alexander Burger a...@software-lab.de wrote:

 On Sun, Jan 06, 2013 at 05:43:06PM +0700, Henrik Sarvell wrote:
   Yes, but the caching issue is not resolved. If two processes increment
  ...
  I'm not sure I follow you with that, when I do my testing with the
  following code:
  ...
  (do 10
     (inc U 'balance 10)
     (commit 'upd)
     (link (; U balance))
     (wait 2000) ) ) ) ) )
  ...
  I get (110 120 120 130 140 150 160 170 180 190) as the result of
  ...
  Seems to me that incRepeat gets the decrease alright?

 This works because 'wait' is an idle loop, which also synchronizes in
 the background.

 Try a busy loop like (do 1000) instead. Then the above code will
 happily increment the local balance, no matter how much other processes
 decrement the value in the meantime.
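A busy-loop variant of 'incRepeat' might look like this (a sketch; the
count in the empty 'do' is arbitrary, anything that burns time without
calling 'wait' will do):

   (de incRepeatBusy ()
      (let U (db 'uname '+User "henrik")
         (put! U 'balance 100)
         (println
            (make
               (do 10
                  (inc U 'balance 10)
                  (commit 'upd)
                  (link (; U balance))
                  (do 1000000) ) ) ) ) )   # busy loop: no background sync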


 'sync' uses the internal mechanisms of 'wait', but in addition to that
 also addresses the issue of race conditions, which are difficult to
 reproduce with your setup. 'sync' guarantees that notifications about
 changes done by one process are sent _atomically_ to all other
 processes.

 ♪♫ Alex



Re: ACID tests

2013-01-06 Thread Alexander Burger
On Sun, Jan 06, 2013 at 08:00:51PM +0700, Henrik Sarvell wrote:
 OK so how would you deal with a situation where you have the need to
 quickly increment people's balances like I mentioned previously but at the
 same time you have another process that has to update a lot of objects by
 fetching information from many others?
 
 This second process will take roughly one minute to complete from start to
 finish and will not update +User in any way.

I would do it the normal, safe way, i.e. with 'inc!'. Note that the
way you proposed doesn't have so much less overhead, I think, because it
still uses the 'upd' argument to 'commit', which triggers communication
with other processes, and 'commit' itself, which does a low-level
locking of the DB.
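In code, the safe way is just the transaction wrapper (shown here with the
same three-argument increment form used elsewhere in this thread):

   (inc! U 'balance 10)    # wraps dbSync + increment + (commit 'upd)

   # roughly equivalent to the explicit form:
   (dbSync)
   (inc U 'balance 10)
   (commit 'upd)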


 If I have understood things correctly simply doing dbSync - work - commit
 in the second process won't work here because it will block the balance
 updates.

Yes, but you can control this, depending on how many updates are done in
the 'work' between dbSync and commit.

We've discussed this in IRC, so for other readers here is what I do
usually in such cases:

   (dbSync)
   (while (..)
      (... do one update step ...)
      (at (0 . 1000) (commit 'upd) (dbSync)) )
   (commit 'upd)

The value of 1000 is an example; I would try something between 100 and
10000.

With that, after every 1000th update step other processes get a chance
to grab the lock in the (dbSync) after the 'commit'.
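Fleshed out, the batching pattern might look like this (hypothetical
'+Article' class and 'flag' property; the batch size of 1000 is tunable):

   (dbSync)
   (for A (collect 'nr '+Article)
      (put A 'flag T)                           # one update step
      (at (0 . 1000) (commit 'upd) (dbSync)) )  # release/re-grab every 1000th
   (commit 'upd)                                # commit the final batch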


 Another option would be to do it in a loop and use put! which will only
 initiate the sync at the time of each update which should not block the
 balance updates for too long.

Right. This would be optimal in terms of giving freedom to other
processes, but it does only a single change per 'put!', and thus
the whole update might take too long.

The above sync at every 1000th step allows for a good compromise.


 The question then is how much overhead does this cause when it comes to the
 balance updates in your experience? If significant is it possible to

I would not call this "overhead". It is just that the quick
operation of incrementing the balance may have to wait too long if the
second process does too many changes in a single transaction.

So the problem is not the 'inc!'. It just sits and waits until it can
do its job, and is then done quite quickly. It is the large update
'work' which may grab the DB for too long periods.


 somehow solve the issue of these two processes creating collateral damage
 to each other so to speak?

If you can isolate the balance (not like in your last example, where two
processes incremented and decremented the balance at the same time), and
make absolutely sure that only one process caches the object at a given
time, you could take the risk and do the incrementing/decrementing
without synchronization, with just (commit).

One way might be to have a single process take care of that,
communicating values to/from other processes with 'tell', so that no one
else needs to access these objects. But that's more complicated than
the straight-forward way.

♪♫ Alex


Re: ACID tests

2013-01-05 Thread Henrik Sarvell
How does the GUI framework work?

What if user A pulls up object X in an update form and then goes to the
toilet and in the meantime user B pulls the same object and changes it,
will user A see any changes when he comes back?


On Sat, Jan 5, 2013 at 10:48 PM, Alexander Burger a...@software-lab.de wrote:

 Hi Henrik,

  Hi, I'm playing around with the syncing and getting behaviour that is not
  what I expected.
  ...
 (dbSync (get *DB '+UserAttr))
  ...
 (dbSync (get *DB '+User))

 I think that 'dbSync' is not really what you want to use here. Perhaps I
 explained this partially wrong last time in IRC, when we talked about
 locking separate parts of the DB.

 'dbSync' does too much here. The source is

 (de dbSync (Obj)
    (let *Run NIL
       (while (lock (or Obj *DB))
          (wait 40) )
       (sync) ) )

 So in addition to the locking, it also does a 'sync'. And 'sync' waits
 until a 'commit' or a 'rollback' is executed. This is necessary to keep
 the caches of competing processes synchronized.


 If you want (as I understand our discussion in IRC) just lock separate
 parts of the DB, you use 'lock' directly.

 BTW, you can do such testing much easier. Just connect from two terminals
 to a running application with 'psh':

   Terminal 1:
   $ bin/psh 8080
   : (lock (get *DB '+UserAttr))
   -> NIL

   Terminal 2:
   $ bin/psh 8080
   : (lock (get *DB '+User))
   -> NIL

 The 'NIL' indicates success in both cases ('lock' returns the PID of the
 process holding the lock). You see this if you go on in Terminal 2 with

   : (lock (get *DB '+UserAttr))
   -> 30461

 Such plain locks are used by the GUI (in lib/form.l), to allow just
 one single user to edit a given object.


 In any case, it is quite dangerous (or at least tricky) to run a
 database without the 'sync' mechanisms. You must be absolutely sure that
 no objects are modified which at the same time might be cached by other
 processes. And _if_ this is the case, then you probably also don't need
 to 'lock' anything (and just rely on the low-level locks in 'commit').

 ♪♫ Alex



Re: ACID tests

2013-01-05 Thread Alexander Burger
On Sat, Jan 05, 2013 at 11:25:30PM +0700, Henrik Sarvell wrote:
 How does the GUI framework work?
 
 What if user A pulls up object X in an update form and then goes to the
 toilet and in the meantime user B pulls the same object and changes it,
 will user A see any changes when he comes back?

Yes. Try it ;-)


Re: ACID tests

2013-01-05 Thread Henrik Sarvell
In which actual situations do you need to do dbSync - commit upd as
opposed to just commit upd? It seems to me the RDBMS equivalent is begin -
commit?

If you do db, put and commit upd calls so that they happen roughly at the
same time, you should be as safe as you are when you do an update table set
bla bla where userid = 5 in an RDBMS?

Let's take what I actually do at work as an example, pretend that I would
like to try to move the whole casino database from MySQL/InnoDB to PL, the
main thing then would be the following SQL query for example: update users
set balance = balance + 10 where user_id = 50.

There can be 100 such calls per second, and upwards of 5 of them per second
for the same user. Doing a dbSync here doesn't make sense: there are no
relations, just a number that needs increasing or decreasing, and updating
user 5's balance should not have to wait for an update of user 10's balance
to finish, and so on.

It seems to me that this situation is solved in PL by simply doing (commit
upd)s without any dbSyncs; if several updates come in at virtually the
same time, they will still be synced through the commit upds. Also, I can
NEVER put the balance but have to use dec and inc, just like in the RDBMS
scenario; doing a set balance = 100 somewhere would be suicide. Apart from
that I should be OK, I think?

But the most common scenario would simply be a (commit), i.e. we don't
care who gets there first.


On Sat, Jan 5, 2013 at 11:44 PM, Alexander Burger a...@software-lab.de wrote:

 On Sat, Jan 05, 2013 at 11:25:30PM +0700, Henrik Sarvell wrote:
  How does the GUI framework work?
 
  What if user A pulls up object X in an update form and then goes to the
  toilet and in the meantime user B pulls the same object and changes it,
  will user A see any changes when he comes back?

 Yes. Try it ;-)