Re: [Trac-dev] Refactoring the Trac data model - relational approach

Christian Boos Fri, 09 Jul 2010 07:07:48 -0700

On 7/9/2010 7:14 AM, Terje S. wrote:

On Thu, Jul 08, 2010 at 03:44:02PM +0200, Christian Boos wrote

As for views, I don't see an use case yet, but if there's one, there
wouldn't be problems to use them, except for updates, which is also
not something consistently supported.

The use case for views in data modelling without update support is
absolutely present! In this context, you could have Work Efforts at
the bottom ("ticket" being a row in effort type). The view would then
have the name Tickets, and join these systems together so they can be
reached in a uniform way from the users perspective (for reporting).

You're overestimating this use case. We still support SQL reports, butthey're "deprecated" since 0.10 and the introduction of custom queries.We're probably not going to remove that until we have a flexible querylanguage (a la FishEye's EyeQL or Mingle's MQL or JIRA's JQL, etc.), butat least we've always warned people that if they use reports, they haveto be ready to adapt and maintain them.

So we don't really want our users to have to fiddle with SQL, it's anadvanced and use at your own risk feature.

Either it is a feature request that must be considered (then discarded,
or implemented which is all work), or patch (review/commit/reject it),
support request (help submitter) etc. This entity also, unambigiously,
is handled by a *WORK*flow system, does that not sound like a hint?;-)

Not really, no. A workflow is more about evolution of state; a wiki pagecould eventually have an associated workflow as well, e.g. edited ->reviewed -> approved -> obsolete (repeat)

The Ticket concept belongs in the front-end system, - as layering of
software dictates - should be translated downwards. You *cannot* model
this *front-end* concept *directly* in the *storage layer* as you do,
without serious side effects.

Well, it's true the current Trac model does that, but in fact a lot ofpeople like this simplicity. Subsuming tickets and milestones into somekind of type hierarchy is certainly appealing, but is a lot more complexthat you seem to imply. Have a look at the SubTickets proposals and therelated tickets for some past discussion on this topic.

come up with a clean "relational" design which improves upon that,
which is normalized, easy to use, fast, scalable, yet extensible in
terms of adding/removing custom properties at run-time? Then I'm all
ears because it's also exactly what I'm looking for.

Not a "perfect" solution, of course, because of the special requirements on
sqlite versions and the complexity of the system. But *signifficant*
improvements can be made, as I hope this post will show more clearly..


Let's see...

The exact workings of the custom field subsystems, I will supply input on
down the line, when a few other key parts of the design has been done. Its
exact final scope of work is quite unclear to me at the moment.


So you start by putting one hard and essential part aside.

if we would fold everything into custom properties, but that would
most certainly be detrimental to performance, due to the very simple
key/value model we use there. So we need something a bit smarter,
but still as flexible.

What you have failed to do, is to discover the fundamental properties of the
entities you are modelling. Both the ticket and the milestone represents
work units in a hierarchy, as is favoured by some project management
disciplines, even if you *still* think they don't ;-)

Yes, and the more you insist on that, the more I suspect you don'treally "get" how Trac works. Please read ticket #710, time management isa non-essential part of Trac. It *could* be supported, but it's not byany mean central to it. Trac is about maintaining the knowledge relatedto a software project, not a project planning solution.

But this is certainly a testimony to the flexibility of the system, itcan be different things to different people. Therefore, hard-wiring astart and end times into every aspect of the system that remotely lookslike an "unit of work" (be it the tickets or the milestones, or evenversions and projects, if I follow you correctly) is not the rightdirection. Making it possible to do so for those who really see theworld like that, sure.

...
And when you take this even further, consider how two Tracs could exchange
a part of this hierarchy, say Trac "1.0" contacts an old .12 env elsewhere
on the globe via XMLRPC. It could then create a local work unit to represent
an external ticket/milestone that will be synced periodically. The model
can handle this simply with a new row in the Effort Type to classify the
entity as "External work". You could now use this entity as a dependency
*anywhere* in your own work breakdown, control it with a local testing
or other workflow etc.

"Subtyping" is certainly a desirable property of the model. Using a typefield is one possibility, but I don't see what's so groundbreaking in that.

Obviously, a plugin can register a row in the Effort Types, and for example
provide support for an external ticket in another issue tracker alltogether:

INSERT INTO WorkEffortTypes (name) VALUES ('External Work'); -- id 1
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Trac .11 XMLRPC');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Trac .11 .db');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Trac .12 XMLRPC');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Trac .12 .db');

Trac x (.11, .12) x (XMLRPC, .db) ... , well this looks a bit Non-1NF,don't you think? ;-)

INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Bugzilla');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Redmine');

Each of these rows would have a corresponding piece of code, core or plugin,
that takes care of the related business logic. This logic is free to sync in
an n-level hierarchy of work below itself from the external source, and this
is completely transparent to the rest of the system. It would appear as local
work; read-only or read-write, pull/push sync, marked explicity or not, or
whatever the code is made and configured to do.


And how is this related to the way you store that information into the DB?

bridge the gap. For an example use case, explain how you would
handle the addition of a custom field "version-fixed" for storing a
reference to the version in which a ticket has been fixed, assuming
the user don't want to abuse the milestone field for such a thing,
like everyone else does ;-)

No problem!

A version is *easily* envisioned as a work effort: You start at one
point in time and finish it later, and it sure is a lot of work in between
(and the exact same thing is true for any other part of repo, *and* of the
repo itself).

Well, you're describing a milestone. I proposed "Version" in my exampleto see how you'd handle a relation to something *not* qualifying as awork effort...

  So in practice you would need some new rows in Assoc Type to
declare the existence of such relationships in the system:

INSERT INTO WorkEffortAssocTypes (name) VALUES ('Repository relations'); -- 1
INSERT INTO WorkEffortAssocTypes (parent_id, name) VALUES (1, 'Fixed in');
INSERT INTO WorkEffortAssocTypes (parent_id, name) VALUES (1, 'Planned for');

These rows must be directly handled by a corresponding piece of middleware
code, core or plugin. Often the child rows fixed/planned are represented
in code as a subclass of a repository relation class, which is in turn
is subclassed from a workeffortassoctype class (not always). Only this
code knows what the relationships means in terms of fields and forms to
display, what to do etc.

Again, nothing strictly related to the precise way things are stored inthe db. There could be other ways. For example, if we would store'fixed-in' directly as a property name, this would make the content ofthe database more "readable", at the cost of a small amount of dataduplication, certainly negligible.

  It must "manually" check that these relations
are only created so they point from/to a work effort with a valid type
(this is the cost of a generic model).


Indeed.

  In the specific use case you
request explained, the corresponding rows to check against might be:

INSERT INTO WorkEffortTypes (name) VALUES ('Repositories'); -- 1
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (1, 'Subversion');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (2, 'Trunk');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (2, 'Branch');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (4, 'Release');
INSERT INTO WorkEffortTypes (parent_id, name) VALUES (5, 'Version');

The relation would then be created by inserting a row into the
WorkEffortAssocs table, pointing from one work unit to another, and
specifying the ID acquired from the previously shown insert as assoc_type
(the "fixed in" id). This code would leave a NULL in the assoc_rule
column, since it is not relevant for this type of relationship at all.
from_eff_id contains the ID of a work effort declared with an effort_type
as ticket (not shown). to_eff_id contains the ID of a work effort
declared as an effort_type from the list immediately above.

Q.E.D. ;-)

No, I'm not convinced, this example hierarchy is really a stretch, theassoc_rule sounds dubious, etc. The above is confusing, at best. I failto see how this could be translated into better code. So I think I'lljoin the choir of people asking to see something more concrete ;-)

The need for versioning of changes should ideally be taken into
account automatically by the storage system. Maybe that's another
hint that perhaps a relational database is not the best suited for
the job. You should really have a look at the
TighterSubversionIntegration and the WhySQLite pages for prior
discussion on this topic.

I have made a note to re-read these pages in the near future.

For SQLite, I submit the following proof-of-concept:

CREATE TRIGGER delete_to_revision
   AFTER DELETE ON storage
     BEGIN
       INSERT INTO revisions
         (storage_id, column1, column2, ..)
       VALUES
         (OLD.rowid, OLD.column1, OLD.column2, 'delete', JULIANDAY('NOW'));
     END;

1) Again, triggers are not part of the minimal common SQL subset we wantto support.2) For similar reasons, we don't want to use DATE or TIME types, theyare highly non portable, it's a can of worms and we're doing quite finewith our microsecond timestamps stored as bigints3) Doing the revisioning the way you do above is simplistic, you storeredundant values even for fields that haven't changed. Guess what, we'redoing the same for the wiki table, but that's precisely part of whatshould be improved.


etc.

I know that criticism is much easier than trying to come up withalternatives, so thanks for trying to do the latter. There are certainly/some/ of your ideas that will prove useful in the long run, and I'mstill grateful for the reference to the book you gave us in your first mail.

Maybe you could approach this whole topic a bit differently: instead oftrying to come with a completely new model by yourself (a model whichwas probably fine for the domain it was designed for, or simply as asample model, but not really adapted to Trac), you could instead spotthe weaknesses and shortcomings in the model I'm trying to develop inGenericTrac. But please first try to understand the problem scope, theconstraints on versioning, flexibility, portability, etc.


-- Christian

--
You received this message because you are subscribed to the Google Groups "Trac 
Development" group.
To post to this group, send email to trac-...@googlegroups.com.
To unsubscribe from this group, send email to 
trac-dev+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/trac-dev?hl=en.

Re: [Trac-dev] Refactoring the Trac data model - relational approach

Reply via email to