Re: ACID tests
I've done some testing now; the dbSync calls incur very little overhead with regard to the telling/updating mechanism of objects in other processes. What I still don't understand, then, is the ability to lock only certain files in the dbSync call, when the dbSync call does a global block in itself. It seems to defeat the purpose of per-file locking?

On Sun, Jan 6, 2013 at 8:53 PM, Alexander Burger <a...@software-lab.de> wrote:
> On Sun, Jan 06, 2013 at 08:00:51PM +0700, Henrik Sarvell wrote:
> > OK so how would you deal with a situation where you have the need to quickly increment people's balances like I mentioned previously, but at the same time you have another process that has to update a lot of objects by fetching information from many others? This second process will take roughly one minute to complete from start to finish and will not update +User in any way.
>
> I would do it the normal, safe way, i.e. with 'inc!'. Note that the way you proposed doesn't have so much less overhead, I think, because it still uses the 'upd' argument to 'commit', which triggers communication with other processes, and 'commit' itself, which does a low-level locking of the DB.
>
> > If I have understood things correctly, simply doing dbSync - work - commit in the second process won't work here, because it will block the balance updates.
>
> Yes, but you can control this, depending on how many updates are done in the 'work' between dbSync and commit. We've discussed this in IRC, so for other readers here is what I usually do in such cases:
>
>    (dbSync)
>    (while (..)
>       (... do one update step ...)
>       (at (0 . 1000) (commit 'upd) (dbSync)) )
>    (commit 'upd)
>
> The value of 1000 is an example; I would try something between 100 and 1. With that, after every 1000th update step other processes get a chance to grab the lock in the (dbSync) after the 'commit'.
>
> > Another option would be to do it in a loop and use put!, which will only initiate the sync at the time of each update, which should not block the balance updates for too long.
>
> Right. This would be optimal in terms of giving freedom to other processes, but it does only one single change in the 'put!', and thus the whole update might take too long. The above sync at every 1000th step allows for a good compromise.
>
> > The question then is how much overhead does this cause when it comes to the balance updates, in your experience? If significant, is it possible to
>
> I would not call this overhead. It is just so that the quick operation of incrementing the balance may have to wait too long if the second process does too many changes in a single transaction. So the problem is not the 'inc!'. It just sits and waits until it can do its job, and is then done quite quickly. It is the large update 'work' which may grab the DB for too long periods.
>
> > somehow solve the issue of these two processes creating collateral damage to each other, so to speak?
>
> If you can isolate the balance (not like in your last example, where two processes incremented and decremented the balance at the same time), and make absolutely sure that only one process caches the object at a given time, you could take the risk and do the incrementing/decrementing without synchronization, with just (commit). One way might be to have a single process take care of that, communicating values to/from other processes with 'tell', so that no one else needs to access these objects. But that's more complicated than the straight-forward way.
>
> ♪♫ Alex

--
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
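[Editorial note] The commit-every-N-steps pattern quoted above can be illustrated outside PicoLisp. Here is a rough Python sketch (a threading.Lock stands in for the global DB lock, and all names are invented for this sketch; it is an analogy, not PicoLisp's actual mechanism):

```python
import threading

db_lock = threading.Lock()   # stands in for the DB lock taken by (dbSync)

def bulk_work(steps, batch):
    """Long-running update that yields the lock every `batch` steps,
    mirroring (at (0 . batch) (commit 'upd) (dbSync))."""
    windows = 0
    db_lock.acquire()                    # (dbSync)
    for i in range(1, steps + 1):
        pass                             # ... one update step ...
        if i % batch == 0:
            db_lock.release()            # (commit 'upd): lock is free
            windows += 1                 # a quick 'inc!' could run here
            db_lock.acquire()            # (dbSync) again
    db_lock.release()                    # final (commit 'upd)
    return windows

print(bulk_work(10000, 1000))            # 10 windows for other writers
```

With a batch of 1000 over 10000 steps, competing writers get ten chances to grab the lock instead of waiting out the whole transaction.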
Re: ACID tests
On Sun, Jan 06, 2013 at 01:33:55AM +0700, Henrik Sarvell wrote:
> In which actual situations do you need to do dbSync - commit upd, as opposed to just commit upd? It seems to me the RDBMS equivalent is begin - commit?

The sequence (dbSync) - (modify objects) - (commit 'upd) is all and only about two things: avoiding race conditions, and keeping the cached objects in all involved processes consistent.

If only one single process writes to the database (this is usually the case when a new database is created and populated with initial data, but before the main event loop is started), you don't need (dbSync). You just call (commit) after creating or modifying objects.

As soon as several processes operate on the DB, you should call (dbSync) before modifying anything, and (commit 'upd) when you are done.

It is theoretically possible to just go ahead, modify some objects, and call (commit) to write the changes to the DB. But then you must be aware that any other process having one of the involved objects already in memory (this happens when the program accessed the object's value or property list) will continue keeping the old state of that object. It will use an outdated version, possibly giving wrong results, and -- even worse -- will write this old state to the DB when it modifies that object at a later time.

So if, for example, a process modifies an object's address, then another process modifies that object's telephone number, the second process will overwrite the changed address with the old version.

A change to an object's property will in many cases cause changes in index trees (B-tree nodes are implemented as DB objects too), and in other objects (e.g. in bi-directional '+Joint' relations). For that reason, unsynchronized changes will almost surely result in a completely messed-up database.

You can avoid the synchronization only if you are absolutely sure that no other process has read (and cached) the objects you are about to modify. Then you can simply go ahead, create and modify the objects, and call (commit). It is important here not to forget that while a newly created object itself is safe (no other process can have it already), the creation of objects usually causes the modification of other objects (tree nodes etc.).

> If you do db, put and commit upd calls so that they happen roughly at the same time, you should be as safe as you are when you do an "update table set bla bla where userid = 5" in an RDBMS?

"Roughly at the same time" is probably not enough.

> Let's take what I actually do at work as an example. Pretend that I would like to try to move the whole casino database from MySQL/InnoDB to PL; the main thing then would be the following SQL query, for example: "update users set balance = balance + 10 where user_id = 50". There can be 100 such calls per second, and upwards of 5 of them per second for the same user. Doing a dbSync here doesn't make sense; there are no relations, just a number that needs increasing or decreasing. Updating user 5's balance should not have to wait for an update of user 10's balance to finish, and so on. It seems to me that this situation is solved in PL by simply doing (commit upd)s without any dbSyncs; if there are several updates coming in at virtually the same time, they will still be synced through the commit upds.

Yes, but the caching issue is not resolved. If two processes increment the object's counter, the second one will still hold the old un-incremented value, because it didn't wait to receive the changes broadcasted by (commit 'upd). This waiting is handled by the 'sync' call in (dbSync).

You could call (rollback) before the increment, thus forcing the object (_all_ objects, to be precise) to be reloaded, but you may still have a race condition where another process increments the object before you do your 'commit'.

♪♫ Alex
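[Editorial note] The lost-update hazard Alex describes (a process writing back its stale cached state) can be shown deterministically. A minimal Python sketch, where a dict stands in for the DB and the copies for each process's object cache (names invented for the illustration):

```python
# Two "processes" each cache an object, then write their cached copy
# back without re-reading or synchronizing.
db = {"balance": 100}

cache_a = dict(db)           # process A reads (and caches) the object
cache_b = dict(db)           # process B does the same

cache_a["balance"] += 10     # A increments its cached copy
db.update(cache_a)           # A commits: balance is now 110

cache_b["balance"] += 10     # B still holds the old value 100
db.update(cache_b)           # B commits: overwrites A's change

print(db["balance"])         # 110, not 120: A's increment is lost
```

The (dbSync) / (commit 'upd) handshake exists precisely so that B would have refreshed its cache before modifying the object.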
Re: ACID tests
> Yes, but the caching issue is not resolved. If two processes increment the object's counter, the second one will still hold the old un-incremented value, because it didn't wait to receive the changes broadcasted by (commit 'upd). This waiting is handled by the 'sync' call in (dbSync).

I'm not sure I follow you with that; when I do my testing with the following code:

   (load "lib/http.l")

   (de decBalance ()
      (let U (db 'uname '+User "henrik")
         (dec U 'balance 10) )
      (commit 'upd)
      (println "Decreased balance by 10.") )

   (de incRepeat ()
      (let U (db 'uname '+User "henrik")
         (put! U 'balance 100)
         (println
            (make
               (do 10
                  (inc U 'balance 10)
                  (commit 'upd)
                  (link (; U balance))
                  (wait 2000) ) ) ) ) )

   (class +User +Entity)
   (rel uname (+Key +String))
   (rel balance (+Number))

   (dbs
      (4 +User)
      (4 (+User uname balance)) )

   (pool "/opt/picolisp/projects/acid-test/" *Dbs)

   #(setq U (request '(+User) 'uname "henrik" 'balance 100))
   #(commit)
   #(mapc show (collect 'uname '+User))

   (de go ()
      (server 8080) )

I get (110 120 120 130 140 150 160 170 180 190) as the result of http://localhost:8080/!incRepeat when I also open http://localhost:8080/!decBalance in another tab while incRepeat runs. Seems to me that incRepeat gets the decrease alright?

On Sun, Jan 6, 2013 at 5:16 PM, Alexander Burger <a...@software-lab.de> wrote:
> On Sun, Jan 06, 2013 at 01:33:55AM +0700, Henrik Sarvell wrote:
> > In which actual situations do you need to do dbSync - commit upd, as opposed to just commit upd? It seems to me the RDBMS equivalent is begin - commit?
>
> The sequence (dbSync) - (modify objects) - (commit 'upd) is all and only about two things: avoiding race conditions, and keeping the cached objects in all involved processes consistent.
>
> If only one single process writes to the database (this is usually the case when a new database is created and populated with initial data, but before the main event loop is started), you don't need (dbSync). You just call (commit) after creating or modifying objects.
>
> As soon as several processes operate on the DB, you should call (dbSync) before modifying anything, and (commit 'upd) when you are done.
>
> It is theoretically possible to just go ahead, modify some objects, and call (commit) to write the changes to the DB. But then you must be aware that any other process having one of the involved objects already in memory (this happens when the program accessed the object's value or property list) will continue keeping the old state of that object. It will use an outdated version, possibly giving wrong results, and -- even worse -- will write this old state to the DB when it modifies that object at a later time.
>
> So if, for example, a process modifies an object's address, then another process modifies that object's telephone number, the second process will overwrite the changed address with the old version.
>
> A change to an object's property will in many cases cause changes in index trees (B-tree nodes are implemented as DB objects too), and in other objects (e.g. in bi-directional '+Joint' relations). For that reason, unsynchronized changes will almost surely result in a completely messed-up database.
>
> You can avoid the synchronization only if you are absolutely sure that no other process has read (and cached) the objects you are about to modify. Then you can simply go ahead, create and modify the objects, and call (commit). It is important here not to forget that while a newly created object itself is safe (no other process can have it already), the creation of objects usually causes the modification of other objects (tree nodes etc.).
>
> > If you do db, put and commit upd calls so that they happen roughly at the same time, you should be as safe as you are when you do an "update table set bla bla where userid = 5" in an RDBMS?
>
> "Roughly at the same time" is probably not enough.
>
> > Let's take what I actually do at work as an example. Pretend that I would like to try to move the whole casino database from MySQL/InnoDB to PL; the main thing then would be the following SQL query, for example: "update users set balance = balance + 10 where user_id = 50". There can be 100 such calls per second, and upwards of 5 of them per second for the same user. Doing a dbSync here doesn't make sense; there are no relations, just a number that needs increasing or decreasing. Updating user 5's balance should not have to wait for an update of user 10's balance to finish, and so on. It seems to me that this situation is solved in PL by simply doing (commit upd)s without any dbSyncs; if there are several updates coming in at virtually the same time, they will still be synced through the commit upds.
>
> Yes, but the caching issue is not resolved. If two processes increment the object's counter, the second one will still hold the old un-incremented value,
Re: ACID tests
OK, so how would you deal with a situation where you have the need to quickly increment people's balances, like I mentioned previously, but at the same time you have another process that has to update a lot of objects by fetching information from many others? This second process will take roughly one minute to complete from start to finish and will not update +User in any way.

If I have understood things correctly, simply doing dbSync - work - commit in the second process won't work here, because it will block the balance updates. Another option would be to do it in a loop and use put!, which will only initiate the sync at the time of each update, which should not block the balance updates for too long.

The question then is how much overhead does this cause when it comes to the balance updates, in your experience? If significant, is it possible to somehow solve the issue of these two processes creating collateral damage to each other, so to speak?

On Sun, Jan 6, 2013 at 7:19 PM, Alexander Burger <a...@software-lab.de> wrote:
> On Sun, Jan 06, 2013 at 05:43:06PM +0700, Henrik Sarvell wrote:
> > > Yes, but the caching issue is not resolved. If two processes increment
> > ...
> >
> > I'm not sure I follow you with that; when I do my testing with the following code:
> > ...
> >    (do 10
> >       (inc U 'balance 10)
> >       (commit 'upd)
> >       (link (; U balance))
> >       (wait 2000) ) ) ) ) )
> > ...
> >
> > I get (110 120 120 130 140 150 160 170 180 190) as the result of ...
> >
> > Seems to me that incRepeat gets the decrease alright?
>
> This works because 'wait' is an idle loop, which also synchronizes in the background. Try a busy loop like (do 1000) instead. Then the above code will happily increment the local balance, no matter how much other processes decrement the value in the meantime.
>
> 'sync' uses the internal mechanisms of 'wait', but in addition to that also addresses the issue of race conditions, which are difficult to reproduce with your setup.
>
> 'sync' guarantees that notifications about changes done by one process are sent _atomically_ to all other processes.
>
> ♪♫ Alex
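[Editorial note] A rough analogy, in Python rather than PicoLisp internals, for why an idle 'wait' sees remote changes while a busy loop does not: broadcast notifications sit in a mailbox that is serviced only when the process idles. The mailbox/queue here is purely illustrative, not how 'tell'/'upd' is actually implemented:

```python
from queue import Queue

class CachedObj:
    """One process's cached view of a DB object (illustrative only)."""
    def __init__(self, value):
        self.value = value
        self.mailbox = Queue()   # 'upd' notifications from other processes

    def busy_loop(self, n=1000):
        """Like (do 1000): burns cycles, never services the mailbox."""
        for _ in range(n):
            pass

    def idle_wait(self):
        """Like 'wait': applies pending notifications while idling."""
        while not self.mailbox.empty():
            self.value = self.mailbox.get()

obj = CachedObj(100)
obj.mailbox.put(90)      # another process decremented and broadcast
obj.busy_loop()
print(obj.value)         # still 100: notification unseen during busy loop
obj.idle_wait()
print(obj.value)         # 90: applied once the process idled
```

This is why Henrik's test with (wait 2000) happened to pick up the decrease, while a busy loop would not.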
Re: ACID tests
On Sun, Jan 06, 2013 at 08:00:51PM +0700, Henrik Sarvell wrote:
> OK so how would you deal with a situation where you have the need to quickly increment people's balances like I mentioned previously, but at the same time you have another process that has to update a lot of objects by fetching information from many others? This second process will take roughly one minute to complete from start to finish and will not update +User in any way.

I would do it the normal, safe way, i.e. with 'inc!'. Note that the way you proposed doesn't have so much less overhead, I think, because it still uses the 'upd' argument to 'commit', which triggers communication with other processes, and 'commit' itself, which does a low-level locking of the DB.

> If I have understood things correctly, simply doing dbSync - work - commit in the second process won't work here, because it will block the balance updates.

Yes, but you can control this, depending on how many updates are done in the 'work' between dbSync and commit. We've discussed this in IRC, so for other readers here is what I usually do in such cases:

   (dbSync)
   (while (..)
      (... do one update step ...)
      (at (0 . 1000) (commit 'upd) (dbSync)) )
   (commit 'upd)

The value of 1000 is an example; I would try something between 100 and 1. With that, after every 1000th update step other processes get a chance to grab the lock in the (dbSync) after the 'commit'.

> Another option would be to do it in a loop and use put!, which will only initiate the sync at the time of each update, which should not block the balance updates for too long.

Right. This would be optimal in terms of giving freedom to other processes, but it does only one single change in the 'put!', and thus the whole update might take too long. The above sync at every 1000th step allows for a good compromise.

> The question then is how much overhead does this cause when it comes to the balance updates, in your experience? If significant, is it possible to

I would not call this overhead. It is just so that the quick operation of incrementing the balance may have to wait too long if the second process does too many changes in a single transaction. So the problem is not the 'inc!'. It just sits and waits until it can do its job, and is then done quite quickly. It is the large update 'work' which may grab the DB for too long periods.

> somehow solve the issue of these two processes creating collateral damage to each other, so to speak?

If you can isolate the balance (not like in your last example, where two processes incremented and decremented the balance at the same time), and make absolutely sure that only one process caches the object at a given time, you could take the risk and do the incrementing/decrementing without synchronization, with just (commit). One way might be to have a single process take care of that, communicating values to/from other processes with 'tell', so that no one else needs to access these objects. But that's more complicated than the straight-forward way.

♪♫ Alex
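[Editorial note] The single-owner idea Alex mentions (one process owns the hot object and others message it deltas) can be sketched generically. Here Python threads and a queue stand in for PicoLisp processes and 'tell'; every name is invented for the sketch:

```python
import queue
import threading

def run_owner(initial, requests, result):
    """Only this worker ever touches the balance; others send deltas."""
    balance = initial
    while True:
        delta = requests.get()
        if delta is None:          # shutdown sentinel
            break
        balance += delta           # no other thread reads or writes this
    result.append(balance)

def demo():
    requests = queue.Queue()
    result = []
    t = threading.Thread(target=run_owner, args=(100, requests, result))
    t.start()
    for delta in (10, 10, -5):     # other "processes" sending increments
        requests.put(delta)
    requests.put(None)
    t.join()
    return result[0]

print(demo())                      # 115: no one ever raced on the value
```

Because only the owner holds the value, no cache can go stale and no increments can be lost, at the cost of routing every update through one process.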
Re: ACID tests
How does the GUI framework work? What if user A pulls up object X in an update form and then goes to the toilet, and in the meantime user B pulls up the same object and changes it, will user A see any changes when he comes back?

On Sat, Jan 5, 2013 at 10:48 PM, Alexander Burger <a...@software-lab.de> wrote:
> Hi Henrik,
>
> > Hi, I'm playing around with the syncing and getting behaviour that is not what I expected. ... (dbSync (get *DB '+UserAttr)) ... (dbSync (get *DB '+User))
>
> I think that 'dbSync' is not really what you want to use here. Perhaps I explained this partially wrong last time in IRC, when we talked about locking separate parts of the DB. 'dbSync' does too much here. The source is
>
>    (de dbSync (Obj)
>       (let *Run NIL
>          (while (lock (or Obj *DB))
>             (wait 40) )
>          (sync) ) )
>
> So in addition to the locking, it also does a 'sync'. And 'sync' waits until a 'commit' or a 'rollback' is executed. This is necessary to keep the caches of competing processes synchronized.
>
> If you want (as I understand our discussion in IRC) to just lock separate parts of the DB, you use 'lock' directly.
>
> BTW, you can do such testing much easier. Just connect from two terminals to a running application with 'psh':
>
>    Terminal 1:
>    $ bin/psh 8080
>    : (lock (get *DB '+UserAttr))
>    -> NIL
>
>    Terminal 2:
>    $ bin/psh 8080
>    : (lock (get *DB '+User))
>    -> NIL
>
> The 'NIL' indicates success in both cases ('lock' returns the PID of the process holding the lock). You see this if you go on in Terminal 2 with
>
>    : (lock (get *DB '+UserAttr))
>    -> 30461
>
> Such plain locks are used by the GUI (in lib/form.l), to allow just one single user to edit a given object.
>
> In any case, it is quite dangerous (or at least tricky) to run a database without the 'sync' mechanisms. You must be absolutely sure that no objects are modified which at the same time might be cached by other processes. And _if_ this is the case, then you probably also don't need to 'lock' anything (and just rely on the low-level locks in 'commit').
>
> ♪♫ Alex
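[Editorial note] For readers curious what such per-file advisory locks look like at the OS level, here is a hedged sketch using BSD flock() from Python. Two independently opened handles conflict even within one process; the lock-file path is made up, and this is only an analogy to 'lock', not PicoLisp's actual implementation:

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "acid-test.lock")  # made-up path
f1 = open(path, "w")
f2 = open(path, "w")

fcntl.flock(f1, fcntl.LOCK_EX | fcntl.LOCK_NB)   # first handle takes the lock

try:
    fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    second_ok = True
except OSError:                                  # would block: already held
    second_ok = False

print(second_ok)                                 # False

fcntl.flock(f1, fcntl.LOCK_UN)                   # release, as after a commit
fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)   # now the second one succeeds
f1.close()
f2.close()
```

The point matches the psh demo above: a second attempt on the same locked region fails while the first holder keeps it, and succeeds once the holder releases.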
Re: ACID tests
On Sat, Jan 05, 2013 at 11:25:30PM +0700, Henrik Sarvell wrote:
> How does the GUI framework work? What if user A pulls up object X in an update form and then goes to the toilet, and in the meantime user B pulls up the same object and changes it, will user A see any changes when he comes back?

Yes. Try it ;-)
Re: ACID tests
In which actual situations do you need to do dbSync - commit upd, as opposed to just commit upd? It seems to me the RDBMS equivalent is begin - commit?

If you do db, put and commit upd calls so that they happen roughly at the same time, you should be as safe as you are when you do an "update table set bla bla where userid = 5" in an RDBMS?

Let's take what I actually do at work as an example. Pretend that I would like to try to move the whole casino database from MySQL/InnoDB to PL; the main thing then would be the following SQL query, for example: "update users set balance = balance + 10 where user_id = 50". There can be 100 such calls per second, and upwards of 5 of them per second for the same user. Doing a dbSync here doesn't make sense; there are no relations, just a number that needs increasing or decreasing. Updating user 5's balance should not have to wait for an update of user 10's balance to finish, and so on. It seems to me that this situation is solved in PL by simply doing (commit upd)s without any dbSyncs; if there are several updates coming in at virtually the same time, they will still be synced through the commit upds.

Also, I can NEVER put the balance, but have to use dec and inc, just like in the RDBMS scenario; doing a "set balance = 100" somewhere would be suicide. Apart from that I should be OK, I think? But the most common scenario would simply be a (commit), i.e. we don't care who gets there first or not.

On Sat, Jan 5, 2013 at 11:44 PM, Alexander Burger <a...@software-lab.de> wrote:
> On Sat, Jan 05, 2013 at 11:25:30PM +0700, Henrik Sarvell wrote:
> > How does the GUI framework work? What if user A pulls up object X in an update form and then goes to the toilet, and in the meantime user B pulls up the same object and changes it, will user A see any changes when he comes back?
>
> Yes. Try it ;-)