On 7/5/2010 11:27 PM, Terje S. wrote:
On Mon, Jul 05, 2010 at 05:12:31PM +0200, Christian Boos wrote:
And there's probably a reason for that, as none of the constraints,
triggers, checks or advanced locking features, not even something as
simple as a proper date field, can be used today in a reasonably
database-neutral way... Even with the restricted feature set of SQL
we're using, we would have trouble porting to Oracle, for example.
I don't dispute this at all, except for the constraints and views. They
are *the* critical elements of a data model, and their core feature set
is available in one form or another across engines, including SQLite, so we
can build a model that is fundamentally the same everywhere.
"Prior to version 3.6.19, SQLite did not support foreign key constraints"
(http://www.sqlite.org/cvstrac/wiki?p=ForeignKeyTriggers)
For example, Python 2.5.4 is bundled with SQLite 3.3.4, Python 2.6.5 has
3.5.9, and only Python 2.7 comes with 3.6.21. So this effectively rules
out relying on foreign keys for the next couple of years. Note that we
can still decorate the statements used to create the tables with such
constraints, for better documentation, but we can't expect them to be
actually functional. And no, the eventual benefits of the trigger-based
workarounds described in the ForeignKeyTriggers wiki page cited above
are not worth their complexity.
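To make this concrete, here is a small sketch (not Trac code): even on an SQLite recent enough to support foreign keys, enforcement is off unless the pragma is enabled per connection, and on the older bundled versions the pragma is silently ignored, so the constraint is pure documentation. Table names here are invented for illustration:

```python
import sqlite3

def connect_with_fk(path=":memory:"):
    # On SQLite >= 3.6.19 this enables FK enforcement for this
    # connection; on older versions it is silently a no-op.
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")
    return con

con = connect_with_fk()
con.execute("CREATE TABLE milestone (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, "
            "milestone INTEGER REFERENCES milestone(id))")
con.execute("INSERT INTO milestone (id) VALUES (1)")
con.execute("INSERT INTO ticket VALUES (1, 1)")       # valid reference
try:
    con.execute("INSERT INTO ticket VALUES (2, 99)")  # dangling reference
    enforced = False  # old SQLite: the bad row is accepted
except sqlite3.IntegrityError:
    enforced = True   # recent SQLite with the pragma on
print("FK enforced:", enforced, "- SQLite", sqlite3.sqlite_version)
```

With any Python whose bundled SQLite predates 3.6.19, `enforced` comes out False, which is exactly the portability problem described above.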
As for views, I don't see a use case yet, but if there's one, there
wouldn't be any problem using them, except for updates, which are also
not consistently supported across engines.
I suppose this is just for illustration, as the above doesn't really
fit Trac well. For example, the "work effort" concept central to the
above example is defined as something "that represents work with a
start..end time", which is not how I would define the project,
milestone and ticket resources in Trac. They have such properties,
but they are accidental rather than essential.
I think this example is highly relevant. It encapsulates a lot of the
central concerns of project management that we have a tendency to reinvent.
Consider the ideas outlined here:
http://trac-hacks.org/wiki/ProjectManagementIdeas
against this model. Do we have a clear path for how to handle such
things in the future, whatever approach is chosen? Why not start with
a simple model that is known to be good? Or do we just not care that some
poor schmuck has to deal with these issues down the road? (I see our
current system as a specific implementation of this general case; well,
part of the system, that is.)
Be assured that I highly value this kind of design document. I did read
an early version of it, I'll re-read it later, and I'm certainly trying
to take that kind of material into account (as well as hundreds of
tickets) when thinking about an enhanced model. So I think we can agree
at least on the target goal: make it possible to code less and create
more (to reuse the Qt motto ;-) ).
Well, the GenericTrac page is more a scratch pad for new ideas. I
haven't yet found the time to write down the last batch of ideas I
had on the topic before the 0.12 release caught most of my
attention, so this is by no means a finalized plan... As for the
non-existing resource, think of it as a kind of "abstract base
class": there's no resource as such, but most of the existing
entities (wiki page, ticket, milestone, repository, etc.) can be
seen as resources; they have a unique identity, properties, and
some of those properties are actually relations to other resources.
Congratulations again on the release. Yes, this is pretty much how I had
understood it, but I still don't think it is fundamentally the right
approach. As I outlined in the fantasy novel about addicted wizards,
I think this will lead to a reinvention of database features, with
huge complexity in the middleware.
Well, I certainly didn't get all the subtleties of that tale (and to be
frank, when I see such contrived analogies, it doesn't encourage me to
take what the author says too seriously). But yes, I don't mind a little
reinvention and abstracting away from the database, if this can help
modularity, code reuse and overall simplification of the system. I think
that trying to map the model directly into the database leads to too
much rigidity (witness the current ticket table for example), and to
compensate for this rigidity, we have workarounds like the current
ticket_custom table or the enum table, which are really hard to optimize,
be it by wizards (addicted? clean? I didn't get it, really). The question
is, can you come up with a clean "relational" design which improves upon
that, which is normalized, easy to use, fast, scalable, yet extensible
in terms of adding/removing custom properties at run-time? Then I'm all
ears because it's also exactly what I'm looking for.
To expand a bit on the ticket module example: it is already able to cope
with a very dynamic data model, thanks to its support for custom
properties. We could actually simplify a lot of things in the code if we
folded everything into custom properties, but that would most
certainly be detrimental to performance, due to the very simple
key/value model we use there. So we need something a bit smarter, but
still as flexible.
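To make the cost of that key/value model concrete, here is a simplified sketch (the table names loosely follow Trac's ticket_custom layout, but this is an illustration, not the real schema): every custom property used in a query costs one extra self-join on the key/value table, which is what makes the model hard to optimize as properties accumulate.

```python
import sqlite3

# Simplified key/value ("EAV") layout for custom ticket properties.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, summary TEXT)")
con.execute("CREATE TABLE ticket_custom "
            "(ticket INTEGER, name TEXT, value TEXT)")
con.execute("INSERT INTO ticket VALUES (1, 'crash on startup')")
con.execute("INSERT INTO ticket_custom VALUES (1, 'severity', 'blocker')")
con.execute("INSERT INTO ticket_custom VALUES (1, 'browser', 'firefox')")

# Filtering on two custom properties already needs two joins;
# n properties need n joins, unlike one WHERE clause on a wide table.
rows = con.execute("""
    SELECT t.id, t.summary
      FROM ticket t
      JOIN ticket_custom c1 ON c1.ticket = t.id
           AND c1.name = 'severity' AND c1.value = 'blocker'
      JOIN ticket_custom c2 ON c2.ticket = t.id
           AND c2.name = 'browser' AND c2.value = 'firefox'
""").fetchall()
print(rows)  # -> [(1, 'crash on startup')]
```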
The model you proposed in the RelationalModel page is only *very
remotely* related to what Trac needs, so maybe you should try to bridge
the gap. For an example use case, explain how you would handle the
addition of a custom field "version-fixed" for storing a reference to
the version in which a ticket has been fixed, assuming the user doesn't
want to abuse the milestone field for such a thing, like everyone else
does ;-)
Along the same lines, what you seem to consider an essential part of your
proposed model (from_date / thru_date) is *not* relevant for Trac, at
least not for tickets. At most, there's a date associated with the closing
time of a ticket, but a ticket can be closed and reopened many times... And
by the way, all changes to tickets are versioned, something which is
also not at all present in your simplistic model.
The need for versioning of changes should ideally be handled
automatically by the storage system. Maybe that's another hint that a
relational database is not the best suited for the job. You
should really have a look at the TighterSubversionIntegration and
WhySQLite pages for prior discussion on this topic.
But you nevertheless nailed the problem scope quite well: how much
do we want to use the data modeling features of SQL versus
abstracting and doing some modeling in the middleware. Doing more in
middleware can give us a higher degree of flexibility, while still
having simple "building blocks" in the database, especially if we
think about future extensions, like supporting more complex 1-n
relations.
Yup, it's just another (current) difference in philosophy ;-)
Personally I favour a complex model and simple middleware *any* day
of the week. Ultimately, from my perspective, Trac is a very database-
centric application. In such cases, I think it makes sense to design
the database in a way that will essentially survive several generations of
code.
There's much more to it than just the database: the database is just a
persistence layer, not at all the level where we place the logic.
Take for example the permission model: before 0.10, permissions were
all placed in the database; starting with 0.11, you can have any
kind of permission policy implemented by plugins, and the old db-based
permission policy is just one among them, one which could even be turned off.
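The plugin-based permission idea can be sketched schematically. This is NOT Trac's actual plugin interface, just an illustration of the pattern: a chain of policies, where the stored-in-the-database policy is merely one link that can be removed.

```python
class DbPolicy:
    """Old-style policy: decisions come from stored (user, action) rows."""
    def __init__(self, grants):
        self.grants = set(grants)
    def check(self, user, action):
        # True if granted, None if this policy has no opinion.
        return (user, action) in self.grants or None

class ReadOnlyPolicy:
    """Example alternative policy implemented purely in code."""
    def check(self, user, action):
        return True if action.endswith("_VIEW") else None

def permitted(policies, user, action):
    # First policy with an opinion wins; default is deny.
    for policy in policies:
        verdict = policy.check(user, action)
        if verdict is not None:
            return verdict
    return False

policies = [ReadOnlyPolicy(), DbPolicy([("alice", "TICKET_MODIFY")])]
print(permitted(policies, "bob", "WIKI_VIEW"))       # -> True
print(permitted(policies, "alice", "TICKET_MODIFY")) # -> True
print(permitted(policies, "bob", "TICKET_MODIFY"))   # -> False
```

Dropping `DbPolicy` from the list "turns off" the db-based policy without touching the schema, which is the point being made: the logic lives above the persistence layer.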
What you're describing here is the fact that the current Trac data
model follows a "specific" style of modeling (still according
to the terminology of that book).
The problem with the current solution is that the model is specific,
but the implemented use case is generic, leading to a redundant approach.
Indeed. Trying to address this problem by going to a more "generalized"
style of modeling is precisely the ambition behind my GenericTrac
approach. How to do it best and how much abstraction is needed is of
course an open question. For example, we perhaps don't want to push
the abstraction to the level of using triples...
I agree, this is one of the toughest things you can possibly envision
encapsulating in a data model. But still, it *is* a relational data model
at the bottom, and I am quite sure the GenericTrac approach would perform
better if implemented on top of XML, raw files or other non-relational
technology (and those approaches may well deserve consideration).
This could well be. In the longer term, one nice side-effect of
abstracting away the details of persistence could be to use a non-SQL
backend, or a (D)VCS system, which could indeed prove to be more suited
to store the kind of data manipulated by Trac.
I'm certainly interested in seeing some constructive criticism of
GenericTrac. As I've said above, I'll soon rewrite some parts of the
proposals and keep you posted. As Remy said, there won't be a
complete rewrite anyway, but rather small steps in this direction, if
that proves to be useful.
Well, it's just that I disagree with the fundamental approach of abstracting
storage in this way, again referring to the wizards. A good experiment would
be to set up some expected usage figures for a small, medium, and large team
in a multi-project scenario. What do they produce in terms of ticket changes
per day, wiki changes per day, commits and so forth over time? (Remember,
all this data is collapsed into one system, with no wizard help.)
We could use these estimates to generate test datasets, and benchmark the
model using simple "substitute middleware", avoiding altogether the (big)
issue of how the resource schema is loaded and maintained in memory.
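A minimal sketch of such an experiment, with invented usage figures (the numbers, table layout, and query are all assumptions for illustration, not real Trac measurements): generate a synthetic change history, then time the kind of query the substitute middleware would issue.

```python
import random
import sqlite3
import time

# Hypothetical "medium team" workload: 1000 tickets, 20000 changes.
random.seed(42)
N_TICKETS, N_CHANGES = 1000, 20000

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ticket_change "
            "(ticket INTEGER, time INTEGER, field TEXT, newvalue TEXT)")
rows = [(random.randrange(N_TICKETS), t, "status",
         random.choice(["new", "assigned", "closed"]))
        for t in range(N_CHANGES)]
con.executemany("INSERT INTO ticket_change VALUES (?, ?, ?, ?)", rows)
con.execute("CREATE INDEX idx_tc ON ticket_change (ticket, time)")

start = time.perf_counter()
# Latest status per ticket: a typical query against versioned changes.
latest = con.execute("""
    SELECT ticket, newvalue FROM ticket_change tc
     WHERE time = (SELECT MAX(time) FROM ticket_change
                    WHERE ticket = tc.ticket)
""").fetchall()
elapsed = time.perf_counter() - start
print(len(latest), "tickets,", round(elapsed * 1000, 1), "ms")
```

Scaling N_TICKETS and N_CHANGES for the small/medium/large scenarios and swapping in alternative schemas would give the comparison figures the experiment is after.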
My concern is that this experiment would reveal artificial bottlenecks
introduced in the middleware because of the abstract model core. As the
dataset grows, the problems grow disproportionately.
For my generic Trac model, I'll eventually go back to a more natural set
of tables, each associated with one kind of resource, precisely to
avoid such bottlenecks. Creating ad-hoc tables for relations between
resources (e.g. a ticket_to_milestone table) is also an option to
explore, so yes, we would effectively have to benchmark things before
committing to a model. Once again, I don't know yet what final form this
will take; in the end it will be a simple, normalized, scalable model,
and I hope a solid foundation to build upon, so maybe even something
that could satisfy your database design taste, who knows?
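The ticket_to_milestone idea could look roughly like this (a hypothetical schema, sketched only to make the option concrete): a dedicated relation table that can be indexed and joined directly, instead of encoding the relation as a string in a key/value row.

```python
import sqlite3

# Hypothetical ad-hoc relation table between two resource kinds.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, summary TEXT);
    CREATE TABLE milestone (name TEXT PRIMARY KEY, due INTEGER);
    CREATE TABLE ticket_to_milestone (
        ticket INTEGER REFERENCES ticket(id),
        milestone TEXT REFERENCES milestone(name),
        PRIMARY KEY (ticket, milestone)
    );
""")
con.execute("INSERT INTO ticket VALUES (1, 'add FK support')")
con.execute("INSERT INTO milestone VALUES ('1.0', 0)")
con.execute("INSERT INTO ticket_to_milestone VALUES (1, '1.0')")

# A direct, index-backed join rather than matching on a value column.
rows = con.execute("""
    SELECT t.summary FROM ticket t
      JOIN ticket_to_milestone tm ON tm.ticket = t.id
     WHERE tm.milestone = '1.0'
""").fetchall()
print(rows)  # -> [('add FK support',)]
```

A separate table per relation kind keeps each one normalized and indexable, at the cost of schema changes when new relation kinds appear, which is exactly the trade-off that would need benchmarking.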
-- Christian
--
You received this message because you are subscribed to the Google Groups "Trac
Development" group.