On 7/5/2010 11:27 PM, Terje S. wrote:
On Mon, Jul 05, 2010 at 05:12:31PM +0200, Christian Boos wrote:
And there's probably a reason for that, as none of the constraints,
triggers, checks or advanced locking features, not even something as
simple as a proper date field, can be used today in a reasonably
database-neutral way... Even with the restricted feature set of SQL
we're using, we would have trouble porting to Oracle, for example.
I don't dispute this at all, except for the constraints and views. They
are *the* critical elements of a data model, and their core feature set
is available in one form or another across engines, including SQLite, so we
can build a model that is fundamentally the same everywhere.
"Prior to version 3.6.19, SQLite did not support foreign key constraints"
(http://www.sqlite.org/cvstrac/wiki?p=ForeignKeyTriggers)
For example, Python 2.5.4 is bundled with SQLite 3.3.4, Python 2.6.5 has
3.5.9, and only Python 2.7 comes with 3.6.21. So this effectively rules
out relying on foreign keys for the next couple of years. Note that we
can still decorate the statements used to create the tables with such
constraints, for better documentation, but we can't expect them to be
actually functional. And no, the eventual benefits of the trigger-based
workarounds described in the ForeignKeyTriggers wiki page cited above
are not worth their complexity.
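To make this concrete, here is a small sketch (not Trac code): even on an SQLite recent enough to support foreign keys, enforcement is off unless the pragma is enabled per connection, and on the older bundled versions the pragma is silently ignored, so the constraint is pure documentation. Table names here are invented for illustration:

```python
import sqlite3

def connect_with_fk(path=":memory:"):
    # On SQLite >= 3.6.19 this enables FK enforcement for this
    # connection; on older versions it is silently a no-op.
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")
    return con

con = connect_with_fk()
con.execute("CREATE TABLE milestone (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, "
            "milestone INTEGER REFERENCES milestone(id))")
con.execute("INSERT INTO milestone (id) VALUES (1)")
con.execute("INSERT INTO ticket VALUES (1, 1)")       # valid reference
try:
    con.execute("INSERT INTO ticket VALUES (2, 99)")  # dangling reference
    enforced = False  # old SQLite: the bad row is accepted
except sqlite3.IntegrityError:
    enforced = True   # recent SQLite with the pragma on
print("FK enforced:", enforced, "- SQLite", sqlite3.sqlite_version)
```

With any Python whose bundled SQLite predates 3.6.19, `enforced` comes out False, which is exactly the portability problem described above.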
As for views, I don't see a use case yet, but if there's one, there
wouldn't be any problem using them, except for updates, which are also
not consistently supported across engines.
I suppose this is just for illustration, as the above doesn't really
fit Trac well. For example, the "work effort" concept central to the
above example is defined as something "that represents work with a
start..end time", which is not how I would define the project,
milestone and ticket resources in Trac. They have such properties,
but they are accidental rather than essential.
I think this example is highly relevant. It encapsulates a lot of the
central concerns of project management that we have a tendency to reinvent.
Consider the ideas outlined here:
http://trac-hacks.org/wiki/ProjectManagementIdeas
against this model. Do we have a clear path for how to handle such
things in the future, whatever approach is chosen? Why not start with
a simple model that is known to be good? Or do we just not care that some
poor schmuck has to deal with these issues down the road? (I see our
current system as a specific implementation of this general case; well,
part of the system, that is.)
Be assured that I highly value this kind of design document. I did read
an early version of it, I'll re-read it later, and I'm certainly trying
to take that kind of material into account (as well as hundreds of
tickets) when thinking about an enhanced model. So I think we can agree
at least on the target goal: make it possible to code less and create
more (to reuse the Qt motto ;-) ).
Well, the GenericTrac page is more a scratch pad for new ideas. I
haven't yet found the time to write down the last batch of ideas I
had on the topic before the 0.12 release caught most of my
attention, so this is by no means a finalized plan... As for the
non-existing resource, think of it as a kind of "abstract base
class": there's no resource as such, but most of the existing
entities (wiki page, ticket, milestone, repository, etc.) can be
seen as resources; they have a unique identity, properties, and
some of those properties are actually relations to other resources.
Congratulations again on the release. Yes, this is pretty much how I had
understood it, but I still don't think it is fundamentally the right
approach. As I outlined in the fantasy novel about addicted wizards,
I think this will lead to a reinvention of database features, with
huge complexity in the middleware.
Well, I certainly didn't get all the subtleties of that tale (and to be
frank, when I see such contrived analogies, it doesn't encourage me to
take what the author says too seriously). But yes, I don't mind a little
reinvention and abstracting away from the database, if this can help
modularity, code reuse and overall simplification of the system. I think
that trying to map the model directly into the database leads to too
much rigidity (witness the current ticket table for example), and to
compensate for this rigidity, we have workarounds like the current
ticket_custom table or the enum table, which are really hard to optimize,
be it by wizards (addicted? clean? I didn't get it, really). The question
is, can you come up with a clean "relational" design which improves upon
that, which is normalized, easy to use, fast, scalable, yet extensible
in terms of adding/removing custom properties at run-time? Then I'm all
ears because it's also exactly what I'm looking for.
To expand a bit on the ticket module example: it is already able to cope
with a very dynamic data model, thanks to its support for custom
properties. We could actually simplify a lot of things in the code if we
folded everything into custom properties, but that would most
certainly be detrimental to performance, due to the very simple
key/value model we use there. So we need something a bit smarter, but
still as flexible.
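To make the cost of that key/value model concrete, here is a simplified sketch (the table names loosely follow Trac's ticket_custom layout, but this is an illustration, not the real schema): every custom property used in a query costs one extra self-join on the key/value table, which is what makes the model hard to optimize as properties accumulate.

```python
import sqlite3

# Simplified key/value ("EAV") layout for custom ticket properties.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, summary TEXT)")
con.execute("CREATE TABLE ticket_custom "
            "(ticket INTEGER, name TEXT, value TEXT)")
con.execute("INSERT INTO ticket VALUES (1, 'crash on startup')")
con.execute("INSERT INTO ticket_custom VALUES (1, 'severity', 'blocker')")
con.execute("INSERT INTO ticket_custom VALUES (1, 'browser', 'firefox')")

# Filtering on two custom properties already needs two joins;
# n properties need n joins, unlike one WHERE clause on a wide table.
rows = con.execute("""
    SELECT t.id, t.summary
      FROM ticket t
      JOIN ticket_custom c1 ON c1.ticket = t.id
           AND c1.name = 'severity' AND c1.value = 'blocker'
      JOIN ticket_custom c2 ON c2.ticket = t.id
           AND c2.name = 'browser' AND c2.value = 'firefox'
""").fetchall()
print(rows)  # -> [(1, 'crash on startup')]
```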
The model you proposed in the RelationalModel page is only *very
remotely* related to what Trac needs, so maybe you should try to bridge
the gap. For an example use case, explain how you would handle the
addition of a custom field "version-fixed" for storing a reference to
the version in which a ticket has been fixed, assuming the user doesn't
want to abuse the milestone field for such a thing, like everyone else
does ;-)
Along the same lines, what you seem to consider an essential part of your
proposed model (from_date / thru_date) is *not* relevant for Trac, at
least not for tickets. At most, there's a date associated with the closing
time of a ticket, but a ticket can be closed and reopened many times... And
by the way, all changes to tickets are versioned, something which is
also not at all present in your simplistic model.
The need for versioning of changes should ideally be handled
automatically by the storage system. Maybe that's another hint that a
relational database is not the best suited for the job. You
should really have a look at the TighterSubversionIntegration and
WhySQLite pages for prior discussion on this topic.
But you nevertheless nailed the problem scope quite well: how much
do we want to use the data modeling features of SQL versus
abstracting and doing some modeling in the middleware. Doing more in
middleware can give us a higher degree of flexibility, while still
having simple "building blocks" in the database, especially if we
think about future extensions, like supporting more complex 1-n
relations.
Yup, it's just another (current) difference in philosophy ;-)
Personally I favour a complex model and simple middleware *any* day
of the week. Ultimately, from my perspective, Trac is a very database-
centric application. In such cases, I think it makes sense to design
the database in a way that will essentially survive several generations of
code.
There's much more to it than just the database: the database is just a
persistence layer, not at all the level where we place the logic.
Take for example the permission model: before 0.10, permissions were
all placed in the database; starting with 0.11, you can have any
kind of permission policy implemented by plugins, and the old db-based
permission policy is just one among them, one which could even be turned off.
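The plugin-based permission idea can be sketched schematically. This is NOT Trac's actual plugin interface, just an illustration of the pattern: a chain of policies, where the stored-in-the-database policy is merely one link that can be removed.

```python
class DbPolicy:
    """Old-style policy: decisions come from stored (user, action) rows."""
    def __init__(self, grants):
        self.grants = set(grants)
    def check(self, user, action):
        # True if granted, None if this policy has no opinion.
        return (user, action) in self.grants or None

class ReadOnlyPolicy:
    """Example alternative policy implemented purely in code."""
    def check(self, user, action):
        return True if action.endswith("_VIEW") else None

def permitted(policies, user, action):
    # First policy with an opinion wins; default is deny.
    for policy in policies:
        verdict = policy.check(user, action)
        if verdict is not None:
            return verdict
    return False

policies = [ReadOnlyPolicy(), DbPolicy([("alice", "TICKET_MODIFY")])]
print(permitted(policies, "bob", "WIKI_VIEW"))       # -> True
print(permitted(policies, "alice", "TICKET_MODIFY")) # -> True
print(permitted(policies, "bob", "TICKET_MODIFY"))   # -> False
```

Dropping `DbPolicy` from the list "turns off" the db-based policy without touching the schema, which is the point being made: the logic lives above the persistence layer.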
What you're describing here is the fact that the current Trac data
model follows a "specific" style of modeling (still according
to the terminology of that book).
The problem with the current solution is that the model is specific,
but the implemented use case is generic, leading to a redundant approach.
Indeed. Trying to address this problem by going to a more "generalized"
style of modeling is precisely the ambition behind my GenericTrac
approach. How to do it best and how much abstraction is needed is of
course an open question. For example, we perhaps don't want to push
the abstraction to the level of using triples...
I agree, this is one of the toughest things you can possibly envision
encapsulating in a data model. But still, it *is* a relational data model
at the bottom, and I am quite sure the GenericTrac approach would perform
better if implemented on top of XML, raw files or other non-relational
technology (and those approaches may well deserve consideration).
This could well be. In the longer term, one nice side-effect of
abstracting away the details of persistence could be to use a non-SQL
backend, or a (D)VCS system, which could indeed prove to be more suited
to store the kind of data manipulated by Trac.
I'm certainly interested in seeing some constructive criticism of
GenericTrac. As I've said above, I'll soon rewrite some parts of the
proposals and keep you posted. As Remy said, there won't be a
complete rewrite anyway, but rather small steps in this direction, if
that proves to be useful.
Well, it's just that I disagree with the fundamental approach of abstracting
storage in this way, again referring to the wizards. A good experiment would
be to set up some expected usage figures for a small, medium, and large team
in a multi-project scenario. What do they produce in terms of ticket changes
per day, wiki changes per day, commits and so forth over time? (Remember,
all this data is collapsed into one system, with no wizard help.)
We could use these estimates to generate test datasets, and benchmark the
model using simple "substitute middleware", avoiding altogether the (big)
issue of how the resource schema is loaded and maintained in memory.
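A minimal sketch of such an experiment, with invented usage figures (the numbers, table layout, and query are all assumptions for illustration, not real Trac measurements): generate a synthetic change history, then time the kind of query the substitute middleware would issue.

```python
import random
import sqlite3
import time

# Hypothetical "medium team" workload: 1000 tickets, 20000 changes.
random.seed(42)
N_TICKETS, N_CHANGES = 1000, 20000

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ticket_change "
            "(ticket INTEGER, time INTEGER, field TEXT, newvalue TEXT)")
rows = [(random.randrange(N_TICKETS), t, "status",
         random.choice(["new", "assigned", "closed"]))
        for t in range(N_CHANGES)]
con.executemany("INSERT INTO ticket_change VALUES (?, ?, ?, ?)", rows)
con.execute("CREATE INDEX idx_tc ON ticket_change (ticket, time)")

start = time.perf_counter()
# Latest status per ticket: a typical query against versioned changes.
latest = con.execute("""
    SELECT ticket, newvalue FROM ticket_change tc
     WHERE time = (SELECT MAX(time) FROM ticket_change
                    WHERE ticket = tc.ticket)
""").fetchall()
elapsed = time.perf_counter() - start
print(len(latest), "tickets,", round(elapsed * 1000, 1), "ms")
```

Scaling N_TICKETS and N_CHANGES for the small/medium/large scenarios and swapping in alternative schemas would give the comparison figures the experiment is after.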
My concern is that this experiment would reveal artificial bottlenecks
introduced in the middleware because of the abstract model core. As the
dataset grows, the problems grow disproportionately.
For my generic Trac model, I'll eventually go back to a more natural set
of tables, each associated with one kind of resource, precisely to
avoid such bottlenecks. Creating ad-hoc tables for relations between
resources (e.g. a ticket_to_milestone table) is also an option to
explore, so yes, we would effectively have to benchmark things before
committing to a model. Once again, I don't know yet what final form this
will take; in the end it will be a simple, normalized, scalable model,
and I hope a solid foundation to build upon, so maybe even something
that could satisfy your database design taste, who knows?
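The ticket_to_milestone idea could look roughly like this (a hypothetical schema, sketched only to make the option concrete): a dedicated relation table that can be indexed and joined directly, instead of encoding the relation as a string in a key/value row.

```python
import sqlite3

# Hypothetical ad-hoc relation table between two resource kinds.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, summary TEXT);
    CREATE TABLE milestone (name TEXT PRIMARY KEY, due INTEGER);
    CREATE TABLE ticket_to_milestone (
        ticket INTEGER REFERENCES ticket(id),
        milestone TEXT REFERENCES milestone(name),
        PRIMARY KEY (ticket, milestone)
    );
""")
con.execute("INSERT INTO ticket VALUES (1, 'add FK support')")
con.execute("INSERT INTO milestone VALUES ('1.0', 0)")
con.execute("INSERT INTO ticket_to_milestone VALUES (1, '1.0')")

# A direct, index-backed join rather than matching on a value column.
rows = con.execute("""
    SELECT t.summary FROM ticket t
      JOIN ticket_to_milestone tm ON tm.ticket = t.id
     WHERE tm.milestone = '1.0'
""").fetchall()
print(rows)  # -> [('add FK support',)]
```

A separate table per relation kind keeps each one normalized and indexable, at the cost of schema changes when new relation kinds appear, which is exactly the trade-off that would need benchmarking.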
-- Christian
--
You received this message because you are subscribed to the Google Groups "Trac
Development" group.