Mike Elston wrote:
Hi Darren,
Thanks for posting this to the list. I haven't included your
conversation with Dave in this reply, to keep the posting short.
A few thoughts...
There already exist a number of systems which store genealogical
information in relational databases, for display by wikis or in more
traditional genealogical style. I have been using the open-source
phpGedView (see phpgedview.sourceforge.net or www.phpgedview.net) for
some while now, and use perl-gedcom utilities for various management
tasks and for writing the occasional one-off processing tool.
phpGedView, GeneoTree, Oxy-Gen and other existing systems are all
designed to parse gedcom files and convert them to SQL databases. It
would seem to be more useful to contribute to existing open-source
systems than to start trying to write a new one for one's own use: for
example, phpGedView already provides tables which allow for multiple
families, aliases, source information and quality by implementing the
full GEDCOM structure.
Speaking just for my own intended project, not Dave's, my understanding is that
what I intend to do is so fundamentally different from any existing genealogy
projects that it really is best for me to start my own, though mine would be
open source and it can still glean things from other projects.
For one thing, my project is more abstract and extensible, and is actually more
of an ontology tool than a genealogy tool, though it would handle genealogy
particularly well. For another thing, my project focuses on presenting chains
of claims, he-said-she-said, with each link fully described rather than reduced
to a simple bibliography entry, since this bears on how much stock one can put
in the claims associated with the sources.
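To make the "chain of claims" idea concrete, here is a minimal sketch in Python.
It is only an illustration under my own assumptions: the Claim class, its field
names and the chain() helper are hypothetical, the people and statements are
made up, and none of it is taken from the actual project design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One link in a he-said-she-said chain (hypothetical structure)."""
    claimant: str                           # who makes or relays the claim
    statement: str                          # what is being asserted
    basis: str                              # how the claimant came to know it
    relayed_from: Optional["Claim"] = None  # the earlier link, if any


def chain(claim: Claim) -> list[Claim]:
    """Return every link, from the most recent claim back to its origin."""
    links = []
    current: Optional[Claim] = claim
    while current is not None:
        links.append(current)
        current = current.relayed_from
    return links


# Made-up example data, purely for illustration.
origin = Claim("Aunt May", "Grandpa was born in 1901", "first-hand experience")
relay = Claim("Cousin Bob", "Grandpa was born in 1901",
              "heard it from Aunt May", origin)
recorded = Claim("Database user", "Grandpa was born in 1901",
                 "told by Cousin Bob", relay)

for depth, link in enumerate(chain(recorded)):
    print(f"{depth}: {link.claimant} ({link.basis})")
```

Walking the chain shows every intermediary and the basis each one gave, which
is the information a reader would need to judge how far to trust the claim.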
In any event, as I said, this project is temporarily shelved while I work on my
SQL replacement first. In fact, if one were to look at my (ostensibly
completely specified) relational Muldis D language, at how its design could be
used by DBMS access tools or ORMs as a front for SQL databases today, and at
how my approach differs from every other SQL tool/ORM out there, all of which
are much more similar to one another, this might give a hint as to how
thoroughly different a genealogy tool I may present from those existing now;
I'm no more constrained by GEDCOM than by SQL.
I must admit, I do like your distinction between "first hand
experience", "assumed most likely considering X" and "just heard it
somewhere" as examples of differing quality of non-record-based sources.
GEDCOM 5.5 (still the de facto standard, and the implementation on which
this list is based) only has QUAY 0/1/2/3 (which it defines basically as
"unreliable" / "questionable" / "secondary" / "primary or
evidence-based"), and these are neither carefully enough defined nor
sufficiently widely or consistently used to be of much value except to
the person ascribing the quality to the source in the first place.
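As a rough illustration of how little the QUAY scale carries, the following
Python sketch pulls QUAY values out of GEDCOM lines and labels them. The labels
paraphrase the summary above rather than quoting the GEDCOM 5.5 text, and the
quay_values() helper and the sample record are assumptions made for the example.

```python
# Labels paraphrase the summary above, not the wording of the GEDCOM 5.5 text.
QUAY_MEANING = {
    0: "unreliable",
    1: "questionable",
    2: "secondary",
    3: "primary or evidence-based",
}


def quay_values(gedcom_lines):
    """Yield (value, meaning) for every QUAY line in an iterable of lines."""
    for line in gedcom_lines:
        parts = line.split()
        # QUAY lines have the form "<level> QUAY <value>".
        if len(parts) >= 3 and parts[1] == "QUAY" and parts[2].isdigit():
            value = int(parts[2])
            yield value, QUAY_MEANING.get(value, "unknown")


# Made-up sample record, purely for illustration.
sample = [
    "0 @I1@ INDI",
    "1 BIRT",
    "2 DATE 1901",
    "2 SOUR @S1@",
    "3 QUAY 2",
]

for value, meaning in quay_values(sample):
    print(value, meaning)
```

A single digit per source citation is all there is, so distinctions such as
"first hand experience" versus "just heard it somewhere" have nowhere to go.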
Yes, each person using the database would be a source themselves, or a proxy for
one, and can talk about their own first hand experience. This isn't like an
encyclopedia where information not first written down elsewhere is disallowed.
Regarding database implementations: I understand (as a software
developer myself) that there is a camp where PostgreSQL is strongly
preferred over MySQL, and the recent acquisition of the latter has added
fuel to the cause. But (imho) you're heavily overstating the case to
claim that PostgreSQL "is a much more reliable and better quality DBMS,
which all the savvy people prefer over MySQL". One factor many people
have to take into account is that many third-party web-space providers
also provide MySQL servers: users may not have the choice. In my
experience, they differ more in implementation than in quality, and
MySQL works perfectly well (and reliably) for the sort of task
genealogical databases require. It's widely-available, well-supported,
industrial-strength, scalable, and I like it :-)
Just my two-penn'orth...
/mike
If you want to see an ostensibly objective example of how PostgreSQL is better
than MySQL, and has been for a long time, just look at the release notes or
change logs for both projects. PostgreSQL is much more stable and free of bugs;
its change logs predominantly deal with new features or performance
enhancements, and its bug fixes tend towards the relatively minor, though there
are some significant ones periodically. MySQL's change logs, in contrast,
predominantly deal with fixing bugs, many of them serious, which would indicate
that MySQL has a lot more bugs in it to begin with. The release notes or change
logs reflect how each project's own developers see their product, never mind
other people.
MySQL has been known, for example in its version 5.1, to declare itself
"generally available"/"production" quality despite having over a hundred serious
bugs in it. In contrast, PostgreSQL would consider such a release to be "alpha"
or maybe "beta" quality.
But even ignoring the change logs, the designs of the two DBMSs reflect
different philosophies: Postgres considers things like data integrity and
consistent behavior to be highly important, while MySQL is much more likely to
accept something built to much lower standards as "good enough". MySQL didn't
and still doesn't support transactions product-wide. Nor, I think, foreign key
constraints. MySQL silently truncates inserted data that is too long for a
field, saying things are okay even though it just lost some data, rather than
raising an error citing 'input too long'. I have first-hand experience with
that. MySQL considers the string 'foo ' to be equal to 'foo', which it clearly
isn't. Neither Postgres nor any proper relational DBMS does these things. And
there are many other citable things. MySQL databases are much more likely to
become corrupted.
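To show what "raising an error" and a strict string comparison look like in
practice, here is a small Python sketch. It uses the SQLite engine bundled with
Python purely as a stand-in, so it demonstrates the two behaviours themselves
and says nothing about what any given MySQL or PostgreSQL version does. The
table, the length limit of 5 and the try_insert() helper are assumptions made
for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint stands in for a length limit that is actually enforced.
conn.execute("CREATE TABLE person (name TEXT CHECK (length(name) <= 5))")


def try_insert(name):
    """Return True if the row was stored, False if the length check refused it."""
    try:
        conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # Over-long input is rejected outright instead of being cut short.
        return False


short_ok = try_insert("Smith")
long_ok = try_insert("Featherstonehaugh")

# Default (binary) comparison: the trailing space is significant.
strict_equal = conn.execute("SELECT 'foo ' = 'foo'").fetchone()[0]
# The RTRIM collation ignores trailing spaces, imitating the lenient comparison.
padded_equal = conn.execute(
    "SELECT 'foo ' = 'foo' COLLATE RTRIM"
).fetchone()[0]

print(short_ok, long_ok, strict_equal, padded_equal)
```

The point of the sketch is the contrast: the over-long name is refused rather
than stored in shortened form, and the lenient comparison has to be asked for
explicitly rather than being the default.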
I liken MySQL to Microsoft Windows, which people use predominantly because it
has a greater number of existing installations, or because, for that same
reason, they have already used it and only it before. While people who have
solid experience with both MySQL and PostgreSQL and who prefer MySQL do exist,
I would think those MySQL users are a minority compared to those for whom MySQL
is the only thing they know, because it was pre-installed. As with MS Windows:
while some people who have solid experience with other major OSs do like
Windows more, I'd think a majority of Windows users are those who don't like it
more, and just use it because it is either all they know or a program they need
requires it.
If both MySQL and PostgreSQL were pre-installed on the same number of hosts, and
the same number of people were solidly experienced with both, I would think that
more people would choose PostgreSQL by default for their next project.
People who are savvy with databases and know both PostgreSQL and MySQL would
prefer PostgreSQL for quality and features hands-down. And if their host
doesn't provide PostgreSQL, they would demand that it be provided, or install
it themselves, or find another host, unless that particular data is
unimportant.
-- Darren Duncan