On Fri, Jul 09, 2010 at 04:06:48PM +0200, Christian Boos wrote:
> [good post I should have replied to]
Well, what do you know, here I am again ;-) I could not leave this message unanswered. (However, I have cut it *all* out, so you don't know what it said.)

Christian, there are many reasons I am not supplying a full model design up-front. I know you understand it; it's a HUGE task. Even if you all don't want to graph all your resources as I do (I *DON'T* GET IT), maybe there still is something I can contribute to. I *FULLY* admit there are BIG challenges in refactoring the data model, but also keep in mind that I have been promoting a long-term perspective with research as the primary goal. I will happily discuss any part of the future database design with you on e-mail, including the intricate details of storage mechanisms, to the very best of my knowledge. Heck, we'll discuss it over a $drink some day before you know it :-)

In the meantime, here is a slightly further developed version of the previously mentioned model-level revisioning, to show use cases for views and triggers. I thought that might be cool to try.

--------------------------------------------------------------------------
I wrote and "tested" it for **SQLite** <3 JUST to please YOU, Christian!! <3
It was fun to do. I hope you all enjoy hearing about it.
--------------------------------------------------------------------------

TABLES:
  Resources  - let's just say Tickets
  Attributes - defines a "field" for a ticket, relationship to storage
  Storage    - same as ticket_custom
  Facts      - new table; holds the username of the person making the changes

If your mailer is not ASCII-capable, you should replace it!! :D

Views (nested, outermost first):
  ResourceValues
    AttributeRevisions
      AttributeValues
        ResourceAttributes

Tables:    Resources -> Attributes     -> Storage       -> Revisions         -> Facts
Trac eqv:  Ticket       (missing link)    ticket_custom    (forgot the name)    timeline

The "timeline" equivalent is only accurate in the sense that the Facts table now takes over its central job. Facts are the currency of this model: you *must* always obtain a valid fact_id and use it to lock the resources you are going to change (this ties all the modifications into groups, representable in the timeline).

CREATE VIEW get_attribute_revision_fact AS
    SELECT A.id AS attribute_id, F.id AS revision_fact
      FROM Attributes A
     INNER JOIN Resources R ON A.resource_id = R.id
     INNER JOIN Facts F ON COALESCE(R.lock_fact, 0) = F.id
                       AND F.end_time IS NULL;

An Attribute being the link between a resource (Ticket) and its storage (_custom), I only just *skimmed* a quarter of Carsten's post, but I think that's conceptually the same. Don't quote me, I only sketched up a minimal model.

For every row in Resources whose lock_fact is set to an integer corresponding to a rowid in Facts with end_time IS NULL, this view lists the related Attribute ids and the Fact that is currently locking them. The view is used to determine the owners of value change operations (insert, update, delete), depending on who holds the lock. It's just a long list of all attributes that are currently locked, derived from the resource locks in "real time".

A TRIGGER can now use such a VIEW as a value when INSERTing a row into the revisions table. It does require that you lock the resource with a valid fact_id (by INSERTing a row into Facts containing your username/id and leaving end_time as NULL, indicating that the operation is in progress).
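Quick aside, so the rest is easier to read: here is roughly the minimal schema I have in mind. Column names are my own shorthand, inferred from the examples further down (the real tables carry a bit more), so treat it as a sketch rather than the actual schema:

-- Sketch only; names inferred from the examples in this mail.
-- The actual prototype splits storage/revisions per datatype
-- (storage_text, revisions_text); one generic pair is enough here.

CREATE TABLE Facts (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES Facts(id),   -- for nested "changesets"
    author    TEXT NOT NULL,
    created   REAL,
    end_time  REAL                            -- NULL while the edit is open
);

CREATE TABLE Resources (
    id        INTEGER PRIMARY KEY,
    lock_fact INTEGER REFERENCES Facts(id)    -- NULL when unlocked
);

CREATE TABLE Attributes (
    id           INTEGER PRIMARY KEY,
    resource_id  INTEGER NOT NULL REFERENCES Resources(id),
    storage_type TEXT NOT NULL,               -- 'text', ...
    name         TEXT NOT NULL
);

CREATE TABLE storage (                        -- the ticket_custom equivalent
    id           INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES Attributes(id),
    fact_id      INTEGER REFERENCES Facts(id),-- fact that put the value here
    value        TEXT
);

CREATE TABLE revisions (
    id            INTEGER PRIMARY KEY,
    storage_id    INTEGER,
    attribute_id  INTEGER NOT NULL,
    owning_fact   INTEGER,                    -- fact that owned the previous value
    acting_fact   INTEGER,                    -- fact performing this change
    created       REAL,
    revision_type TEXT,                       -- 'create' / 'change' / 'delete'
    value         TEXT
);

Anyway, back to the locking.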
You would then have to supply this fact_id to a locking mechanism together with a resource id, whether directly table-based, or with views to lock across domains, or black magic, or possibly some other way with virtual tables or such... In this case you lock the resource by UPDATE-ing the lock_fact column in the Resources table with an open fact_id. You have to check that *YOU* are actually the one who obtained the lock, as is "traditional", and not just grab it blindly (we all know how that can go..):

UPDATE Resources SET lock_fact=x WHERE id=y AND lock_fact IS NULL;

If someone grabbed the lock before you, the AND lock_fact IS NULL will not match (note it has to be IS NULL; = NULL never matches anything). If this UPDATE actually modified a row, you are *guaranteed* to have the lock. Now select it right back to check the result:

SELECT lock_fact FROM Resources WHERE id=y;

If you get your own fact_id in return, all is good; if not: retry or abort!

Once the resource is locked, each row is auto-revisioned, tagged with the relevant fact_id, so you can afterwards work your way back down to all the changes clustered below a single edit. If the Facts table has a parent_id column, I THINK it should be possible to deal with nested "changesets" this way. I have *NOT* tried that in practice, but with manual intervention from the lock operator (say, a call chain of code that maintains a stack of facts) it should actually be "fairly easy" to track/backtrack: open a new fact by specifying the top stack element as parent (if any) and push the resulting new id onto the stack when moving inwards; pop the stack and update the fact with end_time=.. when retracting... .....I HAVE NOT TESTED THAT AT ALL, TO BE CLEAR.....

When you are done, you UPDATE your Fact with SET end_time=JULIANDAY('NOW'). A different trigger picks up this change and releases the resource rows themselves, if any were actually locked (a rough sketch of that release trigger is a bit further down). Storage is the ticket_custom equivalent in this case.

CREATE TRIGGER storage_changed_to_revision
AFTER UPDATE OF value ON storage
FOR EACH ROW
WHEN OLD.value <> NEW.value   -- no redundancy :D
BEGIN
    INSERT INTO revisions (storage_id, attribute_id, owning_fact,
                           acting_fact, created, revision_type, value)
    VALUES (OLD.rowid, OLD.attribute_id, OLD.fact_id,
            (SELECT revision_fact
               FROM get_attribute_revision_fact
              WHERE attribute_id = OLD.attribute_id),
            JULIANDAY('NOW'), 'change', NEW.value);

    -- Autotag the new data with the current locking fact_id.
    -- owning_fact above is the fact that put the previous value there;
    -- this obviously differs for the INSERT/DELETE triggers.
    UPDATE storage
       SET fact_id = (SELECT revision_fact
                        FROM get_attribute_revision_fact
                       WHERE attribute_id = OLD.attribute_id)
     WHERE rowid = OLD.rowid;
END;

This example will neither block a change NOR perform revisioning if the resource is not locked (but we are free to do what we want; that is, the admin is free to enforce it by generating and installing different trigger sets for the schema, depending on how strict s/he wants to be with that environment). That could easily be done from a command line tool.

The Storage table holds the current value and the fact_id that put it there (the fact holds the username, datetime, ..).

Note that this system keeps a *duplicate* of the current real value in the revision history. Any other way I tried I was doomed: offset issues, could not reconstruct cleanly in SQL, heavy joins or unions or some such, or I would end up with something really cumbersome in code. PLUS we would have had to do extra magic to get all the facts right (AFAICTW/OT). It might seem a little bit redundant, but when you think about it, it's really not.
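While I remember it, here is a minimal sketch of what that release trigger can look like (my sketch, assuming the single lock_fact column used above; the real one may well differ):

-- Sketch: when a Fact is closed (end_time set), drop every resource
-- lock that was held by that fact.
CREATE TRIGGER fact_closed_release_locks
AFTER UPDATE OF end_time ON Facts
FOR EACH ROW
WHEN NEW.end_time IS NOT NULL
BEGIN
    UPDATE Resources SET lock_fact = NULL WHERE lock_fact = OLD.id;
END;

Anyway, back to that "duplicate" value.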
It's just that the revisioning system is *one small step ahead* of where you might expect it to be, and that helps in reassembling a complete changelog from start -> current straight out of the revisions table. I actually think it turned out pretty neat (it's just a baby, ofc) :-)

When a storage row is first INSERTed, it is cloned to revisions *immediately*, with owning_fact NULL (indicating that nobody had put a value in there previously) and acting_fact set to the current locking fact. When it is UPDATEd, the new value is pushed to revisions with owning_fact being the fact that put the previous value there (the *INSERTing* fact the first time around) and acting_fact being the *UPDATING* fact. When it is deleted, the value NULL is pushed to revisions. That way, all lookups into history are done against one table that contains all the information needed for a full reconstruction (JOIN against Facts..).

The "ticket_custom" (storage) is a sort of "caching front-end" to the revisions table; it's an API. Storage holds _nothing_ we don't already have, but it serves as a central interface to *manipulate* and validate data, and "caches" the currently valid value for quick comparison and retrieval.

It is *OBVIOUSLY* a prototype. There is no sanity checking at all, etc. I do *NOT* know if it will work in the real world; all I know is that it works superficially. I have never used triggers for such tasks before, but I *have* seen it baked in the wild, albeit with ASE + procedures. SQLite is VERY fast if you need it to be.. I'm certain that a PRAGMA synchronous = OFF before a bulk edit and ON afterwards would literally FLY ;). Wouldn't it be worth a little more testing, you think???!?

A big side advantage, IF this meets the performance and practicality requirements, is that a piece of external code in whatever language can operate on the Trac database with full integrity and revisioning, as long as it follows the model API, even a shell script piping back and forth to SQLite in a loop (and it would all show up in the timeline). Well, **I** consider that a big advantage, but you guys don't even want a **GRAPH** WTF!!! :D

It is possible to create triggers that monitor the individual columns of the main storage tables (ticket etc.) and push the values to the SAME revisions table IF they change INDIVIDUALLY, even though the data is still stored exactly as it is today (we would obviously generate these triggers with a script). The middleware is then free to push out all the updates it can; the triggers will take care of revisioning only that which has changed.

(The lock_fact would be added to WorkEffort in my world, since that's the "toplevel" entity where you can do it in *one place*. For the current codebase, either a locking table with realm/id is established, or a lock_fact is introduced into Ticket, Milestone and the other tables to bring them into the revisioning system (they need their triggers, revisions table, etc.). Well, actually you could conceivably collapse *all* revisions into one table; do we really need to care about the datatype once revisioned?? Could we not just CAST() it back if we have to restore?? idk..)

It MUST be faster performance-wise to do this directly in SQLite, using its *tailored data monitoring mechanisms*, than to shuttle the data back and forth to the host via queries, no??? .. I have not tested, but I would absolutely think so. The locking drags things down of course, but it is easy to do even if you have to repeat it, and it even HAS *direct* benefits of its own (IMO)!
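Since the test run below refers to them, here is roughly what the INSERT companion trigger looks like (my sketch of it; the DELETE one is the mirror image, pushing a NULL value):

CREATE TRIGGER storage_created_to_revision
AFTER INSERT ON storage
FOR EACH ROW
BEGIN
    -- Clone the new value to revisions right away: owning_fact is NULL
    -- (nothing was there before), acting_fact is the current locking fact.
    INSERT INTO revisions (storage_id, attribute_id, owning_fact,
                           acting_fact, created, revision_type, value)
    VALUES (NEW.rowid, NEW.attribute_id, NULL,
            (SELECT revision_fact
               FROM get_attribute_revision_fact
              WHERE attribute_id = NEW.attribute_id),
            JULIANDAY('NOW'), 'create', NEW.value);

    -- Tag the stored row with the fact that created it.
    UPDATE storage
       SET fact_id = (SELECT revision_fact
                        FROM get_attribute_revision_fact
                       WHERE attribute_id = NEW.attribute_id)
     WHERE rowid = NEW.rowid;
END;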
Pre-generated test data helps us prove the model:

$ rm x.db; sqlite3 x.db < emptyschema.sql
$ ./datagen.pl -tickets 1000 -milestones 10 -changes 10 > dummy.sql
$ sqlite3 x.db < dummy.sql
$ sqlite3 x.db
sqlite> select count(*) from revisions_text;
21508
^^^^^
sqlite> select count(*) from facts;
3041
^^^^

**Note** the number of facts/revisions for a dataset of this size (above). Can YOU please explain to me what the correlation between those numbers is?!??? (Frankly I don't understand it *at all*... the datagen script just emits some SQL, afaik.)

sqlite> select resource_id, attribute_id, attribute_storage, revision_type, value from attributerevisions where attribute_id between 1 and 10;
1  1  text  create  Datagen Component 1
1  1  text  change  Datagen Component 1
1  1  text  change  Datagen Component 1
1  1  text  change  Datagen Component 1
1  2  text  create  Owner: 1
1  2  text  change  Owner: 1 upd 1
1  2  text  change  Owner: 1 upd 2
1  2  text  change  Owner: 1 upd 3
1  2  text  change  Owner: 1 upd 4
1  2  text  delete
1  3  text  create  Datagen Milestone 1
1  3  text  change  Datagen Milestone 1
1  3  text  change  Datagen Milestone 1
1  3  text  change  Datagen Milestone 1
1  3  text  change  Datagen Milestone 1
1  3  text  change  Datagen Milestone 1
1  4  text  create  Description for Mil
2  7  text  create  Datagen Milestone 2
2  7  text  change  Datagen Milestone 2
2  7  text  change  Datagen Milestone 2
2  7  text  change  Datagen Milestone 2
2  8  text  create  Description for Mil
2  8  text  change  Description for Mil
2  8  text  change  Description for Mil
2  8  text  change  Description for Mil
2  8  text  change  Description for Mil
2  8  text  change  Description for Mil

real    0m0.027s
user    0m0.024s
sys     0m0.000s

(Ofc storing the text "change" is redundant etc.; this is just to test.)

And for some revisioned write operations:

$ cat 10k.sql
PRAGMA synchronous = OFF;
BEGIN EXCLUSIVE;
-- Obtain a fact we can use to lock with
INSERT INTO Facts (author) VALUES ('CLI script');                      -- id 1
-- You are *required* to specify lock_fact on creation
INSERT INTO Resources (lock_fact) VALUES (1);                          -- id 1
-- Declare an attribute belonging to the above resource
INSERT INTO Attributes (resource_id, storage_type, name)
    VALUES (1, 'text', 'name');                                        -- id 1
-- And finally INSERT some data; it is copied to revisions.
INSERT INTO storage_text (attribute_id, value)
    VALUES (1, 'Contents of custom field');
-- Now run 10k updates to the same string.
UPDATE storage_text SET value='abcdef' WHERE id=1;
[.... total 10.000 copies of the previous UPDATE statement ....]
-- Wipe the data, close the fact.
DELETE FROM storage_text;
UPDATE Facts SET end_time = JULIANDAY('now') WHERE fact_id=1;
COMMIT;

$ time sqlite3 x.db < 10k.sql
real    0m1.769s
user    0m1.472s
sys     0m0.296s

sqlite> select resource_id, attribute_id, attribute_storage, value from attributerevisions;
-- that being a view over revisions and attributes..
1|1|text|Contents of custom field
1|1|text|abcdef
1|1|text|

The trigger does not revision the 10k changes, because the value is identical!

I have now wiped x.db and changed the trigger to revision unconditionally (removed the WHEN <>). The 2 extra rows are of course from the INSERT and DELETE triggers (the INSERT one is sketched above; they are almost identical)..

$ time sqlite3 x.db < 10k.sql
real    0m3.044s
user    0m2.024s
sys     0m0.984s

sqlite> select count(*) from attributerevisions;
10002

synchronous = ON, BEGIN .. COMMIT:
real    0m2.994s
user    0m1.944s
sys     0m1.036s

synchronous = ON, NO transaction:
real    0m46.058s
user    0m2.356s
sys     0m3.084s

(200+/sec synced without a transaction, high disk load on the puter... not so bad.)

AFAICT the features used here have been available since SQLite 2.5.0, released 2002 Jun 17 (?!??!!)
Certainly works on my debian stable. (I think I used 'em in 4.0 too?)

I will also quickly answer another key point that you raised, and that is the bridge to the data stored in the schema. The correct solution for storage is already in place (save for relations); what is missing is metadata. One proven solution is to provide the middleware with information about where the different data can be found (normal table or custom field) by documenting your domains *within* the model. The middleware must load all this information, which describes the available datatypes, table names, etc., and use it as a basis for decisions and for UI- and query-generation. Our case is not as complex as a full EAV/CR system; the question is only what minimal subset we can derive from this technology to get what we need.

http://jamia.bmjjournals.com/content/6/6/478.full

Christian, I will gladly devote some time to coding SQL, <3 <3 <3 *TO* *YOUR* *SPECS* <3 <3 <3 (wouldn't that be a lot more productive?)

(Sorry, there are probably a lot of mistakes/inaccuracies, technical and otherwise.. you'll have to live with it.. cause I *am* posting now :p)

I will be back online over the week-end (or after?). Nothing from me till then; I hope you enjoy this post meanwhile :-)

..... G*dd*mn Carsten, WHY would you reply wise words to a post I EXPLICITLY said I would NOT revive??.... ;(

Terje