Hi list \o/ Hope you all had a good weekend.
On Thu, Jul 15, 2010 at 07:13:47PM +0200, Carsten Klein wrote:
> To be frank, you do not get 1k+/s new issues nor do you modify 1k+/s
> issues at the same time, neither do you get 1k+ page edits or page adds in

I agree. However, this is not a full solution; it will require much more.
But *from* those early numbers, we can deduce that such a system will
probably perform OK when all the other triggers are in place (we have
quite a good margin before hitting any realistic limit, as you seem to
agree).

Remember that the tested numbers are not simultaneous edits by users. They
are "atomic changes" at storage level. One edit by a user is NOT one
change; it may be 0 or 5 or 20 or 100 changes, depending entirely on what
was done, how many custom fields are present, etc. What happens if you
rename a milestone with 1000 associated tickets, for example? First, one
update to the milestone table is done, and then all associated tickets are
updated: 1001 revisioned changes in a nanosecond. (See the first sketch
below.)

> What I do not understand, however, is why you reinvent the row lock
> mechanisms already available in the system. [...]
> [..post merge..]
> After having thought about its purpose for a while, I came to the
> conclusion that you want to lock a resource when a user starts for
> example editing a page or a ticket.

BEEEP! Think again! :D

The very first place I talk about the locking mechanism, I say this:

  [...] (to tie all the modifications to groups, representable in timeline)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The lock is *not* to be grabbed on read request! It is an exclusive write
lock, grabbed when the middleware *changes data* that is coupled to the
revisioning system, and released immediately when done.

The only real "locking" function it serves is to make sure that two
parallel processes cannot change rows in storage for the same resource at
the same time (we are not locking the row we are applying lock logic to,
but rather all rows *associated with it*, located in other tables).

The MAIN purpose is something entirely different, and I did not explain it
well enough in the OP. How would you, in one query, delete a row *and*
specify who did it? ;)

Think of it this way: when a trigger in the schema detects a change of any
type on a storage table (which means there is some data to revision), it
executes a block of SQL. Now the question becomes: how does the SQL block
in the trigger know which user is performing the query? To accurately
revision the data, we must store with it the name of the person or program
who messed with the values.

1. Tell the system who is making a change (make a new Fact)
2. Tell the revisioning system what resource(s) you plan to change, by
   locking them with your fact_id. The VIEW now lists your fact as
   modifying all relevant attributes.
3. Perform modifications
4. Release your Fact (this cascades a release of locks)

(These four steps are sketched in SQL below.)

The entire purpose of the VIEW is to present to the TRIGGERs this info
that they *cannot* know themselves. The trigger only has a pitiful link to
an attribute, nothing about the user.
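To make the milestone example concrete, this is roughly what happens at
storage level (table and column names after Trac's schema; the statements
are illustrative, not the actual implementation):

  -- One user action: rename 'milestone1' to 'milestone2'.
  UPDATE milestone SET name = 'milestone2' WHERE name = 'milestone1';
  UPDATE ticket SET milestone = 'milestone2' WHERE milestone = 'milestone1';

  -- 1 milestone row + 1000 ticket rows are touched; each row change
  -- fires a revision trigger, so the storage layer records 1001 atomic
  -- changes for a single edit.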
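And here is a minimal SQLite sketch of the four steps. All names (fact,
resource_lock, attribute, fact 7, resource 42) are invented for
illustration; the real model differs:

  PRAGMA foreign_keys = ON;  -- SQLite needs this for ON DELETE CASCADE

  CREATE TABLE fact (
      fact_id    INTEGER PRIMARY KEY,
      author     TEXT    NOT NULL,
      created_at INTEGER NOT NULL
  );

  CREATE TABLE resource_lock (
      resource_id INTEGER NOT NULL UNIQUE,  -- exclusive: one fact per resource
      fact_id     INTEGER NOT NULL REFERENCES fact(fact_id) ON DELETE CASCADE
  );

  CREATE TABLE attribute (
      attribute_id INTEGER PRIMARY KEY,
      resource_id  INTEGER NOT NULL,
      name         TEXT    NOT NULL,
      value        TEXT
  );

  -- 1. Tell the system who is making the change: record a new fact.
  INSERT INTO fact (fact_id, author, created_at)
  VALUES (7, 'terje', strftime('%s', 'now'));

  -- 2. Lock the resource about to change; the UNIQUE constraint makes a
  --    second, parallel lock attempt on the same resource fail.
  INSERT INTO resource_lock (resource_id, fact_id) VALUES (42, 7);

  -- 3. Perform the modifications; the revision triggers fire here and
  --    can map each changed row back to fact 7.
  UPDATE attribute SET value = 'new summary'
  WHERE resource_id = 42 AND name = 'summary';

  -- 4. Release the fact; the cascade drops its locks with it.
  DELETE FROM fact WHERE fact_id = 7;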
So we provide a facility for it to look at, the view, showing the
information the trigger has (attribute id) right next to the information
it needs (fact), so:

  SELECT revision_fact                   -- grab the info our trigger does not have
  FROM get_attribute_revision_fact      -- from the view
  WHERE attribute_id = OLD.attribute_id -- using the attribute id

If we had not locked the resource (in this way or some other), the trigger
could not obtain the author information with any query, because the
information does not exist!

The VIEW is a highly inefficient solution for a large dataset. The fastest
approach is to ATTACH DATABASE ':memory:' AS factprovider; you could then
use triggers to populate an in-memory table with the fact-attribute tuples
when locks are taken (we don't care about attribute links in other
database connections). This avoids a heavy lookup through the view (2
joins) for each revisioned row of data. I use a view because it's the
easiest for experimental work, and it works just fine with a small
dataset. (Both the view and the :memory: variant are sketched below.)

> [...] However, users might exist that will use this mechanism for
> indefinitely locking individual resources, so that eventually the
> administrator of the system will have to remove these locks again,
> manually.

Certainly, this is an issue. However, we are not grabbing the lock on
read, so it would occur only in the case where the middleware crashes or
for some other reason does not release its fact. We will have this problem
in any case; the difference is that we can now detect when it has
happened, because it *will* result in a stale lock. Now you might consider
that a disadvantage, but in *my* (messed up) world it's a major benefit. I
would **MUCH** rather know that something went to pieces (and be able to
track which edit, by which user, at which time) than simply have some
corrupt/missing data somewhere and not know about it. There are several
unexplored options that might help, like using an expiry timestamp on
locks. But inevitably you have to make a decision on what to do, whether
manually or automatically.

> Of course, this would require the db layer to be extended so as to allow
> the user of that system to select for update, which is currently not part
> of its implementation.

Absolutely, the db abstraction will change with the model. The biggest
issue with this solution is coming up with a clean way to generate an
equivalent system across engines. While simple in principle, the differing
support for AFTER UPDATE OF .. ON means that some engines, like mysql and
postgres!=dev, need to combine several triggers into one with IF blocks or
some other technique (see the last sketch below). Exactly how this is done
has not been researched at all. I take it for granted, however, that if
SQLite can provide the goods, it is possible to duplicate it in the larger
engines with other functionality (and I know this to be the case for quite
a few, but maybe not all; mysql4 can never be supported, for example,
since it does not have triggers afaik).

After sleeping on this stuff for some nights, I think I have concluded
that the double-fact system is redundant. A single column will do for this
purpose; the "linked list" does not add anything except practical help in
a few edge cases - it will be removed.

I've started work on a small trigger generator (not Python tho), and it
looks promising (SQLite only so far). I'll inform the list, and supply the
code for testing, as soon as it is *somewhat* more functional.
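To tie the pieces together, here is a rough sketch of what the view and a
revision trigger could look like, continuing the invented schema from the
earlier sketch (the real view is not necessarily shaped like this):

  -- Continuing the invented schema: a history table for attribute rows.
  CREATE TABLE attribute_revision (
      attribute_id  INTEGER NOT NULL,
      old_value     TEXT,
      revision_fact INTEGER NOT NULL   -- who made this change
  );

  -- The view maps what a trigger has (an attribute id) to what it needs
  -- (the fact): attribute -> lock -> fact, the two joins mentioned above.
  CREATE VIEW get_attribute_revision_fact AS
  SELECT a.attribute_id,
         f.fact_id AS revision_fact
  FROM attribute     AS a
  JOIN resource_lock AS rl ON rl.resource_id = a.resource_id
  JOIN fact          AS f  ON f.fact_id = rl.fact_id;

  -- The revision trigger stamps every row change with its author. If no
  -- lock (and thus no fact) exists, the NOT NULL constraint makes the
  -- write fail loudly instead of revisioning anonymously.
  CREATE TRIGGER attribute_revision_trg AFTER UPDATE OF value ON attribute
  BEGIN
      INSERT INTO attribute_revision (attribute_id, old_value, revision_fact)
      VALUES (OLD.attribute_id,
              OLD.value,
              (SELECT revision_fact
               FROM get_attribute_revision_fact
               WHERE attribute_id = OLD.attribute_id));
  END;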
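The :memory: variant could look something like the following. Note the
assumption baked in here: as far as I know, SQLite only lets TEMP triggers
reference tables across attached databases, so these maintenance triggers
(and any revision trigger reading the in-memory table) would presumably
have to be created per connection:

  -- Per-connection setup; ATTACH and TEMP objects are local to this
  -- connection, so other connections never see these tuples.
  ATTACH DATABASE ':memory:' AS factprovider;

  CREATE TABLE factprovider.attribute_fact (
      attribute_id INTEGER PRIMARY KEY,   -- single-row lookup, no joins
      fact_id      INTEGER NOT NULL
  );

  -- Assumption: only TEMP triggers may reach across databases in SQLite.
  CREATE TEMP TRIGGER lock_taken AFTER INSERT ON resource_lock
  BEGIN
      INSERT INTO factprovider.attribute_fact (attribute_id, fact_id)
      SELECT attribute_id, NEW.fact_id
      FROM attribute
      WHERE resource_id = NEW.resource_id;
  END;

  CREATE TEMP TRIGGER lock_released AFTER DELETE ON resource_lock
  BEGIN
      DELETE FROM factprovider.attribute_fact
      WHERE fact_id = OLD.fact_id;
  END;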
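For contrast, here is roughly the same revision trigger on an engine
without AFTER UPDATE OF, in MySQL syntax: the per-column filter has to
move from the trigger declaration into an IF block inside a single AFTER
UPDATE trigger (again purely illustrative; generating these portably is
exactly the unresearched part):

  DELIMITER //
  CREATE TRIGGER attribute_revision_trg AFTER UPDATE ON attribute
  FOR EACH ROW
  BEGIN
      -- No "UPDATE OF value" here; detect the column change by hand.
      -- <=> is MySQL's NULL-safe equality operator.
      IF NOT (OLD.value <=> NEW.value) THEN
          INSERT INTO attribute_revision (attribute_id, old_value, revision_fact)
          VALUES (OLD.attribute_id,
                  OLD.value,
                  (SELECT revision_fact
                   FROM get_attribute_revision_fact
                   WHERE attribute_id = OLD.attribute_id));
      END IF;
  END//
  DELIMITER ;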
(If the weather stays this bad, it'll be this week, I guess ;-)

Terje