Mike Elston wrote:
Hi Darren,

Thanks for posting this to the list. I haven't included your conversation with Dave in this reply, to keep the posting short.

A few thoughts...

There already exist a number of systems which store genealogical information in relational databases, for display by wikis or in more traditional genealogical style. I have been using the open-source phpGedView (see phpgedview.sourceforge.net or www.phpgedview.net) for some while now, and use perl-gedcom utilities for various management tasks and for writing the occasional one-off processing tool.

phpGedView, GeneoTree, Oxy-Gen and other existing systems are all designed to parse GEDCOM files and convert them to SQL databases. It would seem more useful to contribute to existing open-source systems than to start writing a new one for one's own use: for example, phpGedView already provides tables which allow for multiple families, aliases, source information and quality by implementing the full GEDCOM structure.

Speaking just for my own intended project, not Dave's, my understanding is that what I intend to do is so fundamentally different from any existing genealogy projects that it really is best for me to start my own, though mine would be open source and it can still glean things from other projects.

For one thing, my project is more abstract and extensible; it is actually more of an ontology tool than a genealogy tool, though it would handle genealogy particularly well. For another, my project focuses on presenting chains of claims (he-said-she-said) with each link fully described, rather than a simple bibliography, since the chain bears on how much stock one can put in the claims associated with the sources.
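To make the "chain of claims" idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `Claim` class, its field names and the `chain` helper are hypothetical and are not taken from the project described above, which is not specified in this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Claim:
    """One link in a he-said-she-said chain (hypothetical structure)."""
    claimant: str                            # who makes or relays the statement
    statement: str                           # what is asserted
    basis: str                               # how the claimant says they know it
    relayed_from: Optional["Claim"] = None   # the previous link, if any


def chain(claim: Claim) -> List[Claim]:
    """Return every link, from the most recent claimant back to the origin."""
    links = []
    current: Optional[Claim] = claim
    while current is not None:
        links.append(current)
        current = current.relayed_from
    return links


# Example data, invented purely for illustration.
origin = Claim("Aunt May", "Grandfather was born in 1890", "first hand experience")
relay = Claim("Cousin Bob", "Grandfather was born in 1890",
              "just heard it somewhere", relayed_from=origin)

for link in chain(relay):
    print(f"{link.claimant}: {link.statement} ({link.basis})")
```

Each link carries its own description, so a reader can judge the whole path a statement took rather than only its final citation.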

In any event, as I said, this project is temporarily shelved while I work on my SQL replacement first. In fact, one could look at my (ostensibly completely specified) relational Muldis D language, at how its design could be used by DBMS access tools or ORMs as a front for SQL databases today, and at how my approach differs from every other SQL tool or ORM out there, which are much more similar to one another. That might give a hint as to how thoroughly different a genealogy tool I may present compared with those existing now; I'm no more constrained by GEDCOM than by SQL.

I must admit, I do like your distinction between "first hand experience", "assumed most likely considering X" and "just heard it somewhere" as examples of differing quality of non-record-based sources. GEDCOM 5.5 (still the de facto standard, and the implementation on which this list is based) only has QUAY 0/1/2/3 (which it defines basically as "unreliable" / "questionable" / "secondary" / "primary or evidence-based"), and these are neither carefully enough defined nor sufficiently widely or consistently used to be of much value except to the person ascribing the quality to the source in the first place.
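As a rough illustration of how little a QUAY value carries, the four levels can be written as a Python enumeration. The member names below paraphrase the short descriptions given above; the `parse_quay` helper is hypothetical and assumes the usual GEDCOM line layout of level, tag and value (for example `3 QUAY 2`).

```python
from enum import IntEnum
from typing import Optional


class Quay(IntEnum):
    """GEDCOM 5.5 QUAY levels, named after the paraphrase in the text."""
    UNRELIABLE = 0
    QUESTIONABLE = 1
    SECONDARY = 2
    PRIMARY = 3


def parse_quay(line: str) -> Optional[Quay]:
    """Return the QUAY level of a GEDCOM line, or None if it is not a QUAY line.

    Hypothetical helper: assumes 'LEVEL TAG VALUE' separated by whitespace.
    """
    parts = line.split()
    if len(parts) == 3 and parts[1] == "QUAY" and parts[2] in {"0", "1", "2", "3"}:
        return Quay(int(parts[2]))
    return None


print(parse_quay("3 QUAY 2"))             # a secondary-evidence assessment
print(parse_quay("1 NAME John /Smith/"))  # not a QUAY line
```

A single integer like this has no room for who made the judgement or on what basis, which is the gap the finer distinctions above would fill.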

Yes, each person using the database would be a source themselves, or a proxy for one, and can talk about their own first hand experience. This isn't like an encyclopedia where information not first written down elsewhere is disallowed.

Regarding database implementations: I understand (as a software developer myself) that there is a camp where PostgreSQL is strongly preferred over MySQL, and the recent acquisition of the latter has added fuel to the cause. But (imho) you're heavily overstating the case to claim that PostgreSQL "is a much more reliable and better quality DBMS, which all the savvy people prefer over MySQL". One factor many people have to take into account is that many third-party web-space providers also provide MySQL servers: users may not have the choice. In my experience, they differ more in implementation than in quality, and MySQL works perfectly well (and reliably) for the sort of task genealogical databases require. It's widely-available, well-supported, industrial-strength, scalable, and I like it :-)

Just my two-penn'orth...

/mike

If you want to see an ostensibly objective example of how PostgreSQL is better than MySQL, and has been for a long time, just look at the release notes or change logs for both projects. PostgreSQL is much more stable and free of bugs: its change logs predominantly deal in new features or performance enhancements, and its bug fixes tend towards the relatively minor, though there are some significant ones periodically. MySQL's change logs, in contrast, predominantly deal with fixing bugs, many of them serious, which would indicate that MySQL has a lot more bugs in it to begin with. The release notes or change logs reflect how each project's own developers see their product, never mind other people.

MySQL has been known, for example in its version 5.1, to declare itself "generally available"/"production" quality despite having over a hundred serious bugs in it. In contrast, the PostgreSQL project would consider such a release to be "alpha" or maybe "beta" quality.

But even ignoring the change logs, the designs of the two DBMSs reflect different philosophies: Postgres considers things like data integrity and consistent behavior to be highly important, while MySQL is much more likely to accept something that meets much lower standards as "good enough". MySQL didn't and still doesn't support transactions product-wide. Nor, I think, foreign key constraints. MySQL silently truncates inserted data that is too long for a field, saying things are okay even though it just lost some data, rather than raising an error citing 'input too long'. I have first-hand experience with that. MySQL considers the string 'foo ' to be = to 'foo', which it clearly isn't. Neither Postgres nor any proper relational DBMS does these things. And there are many other citable things. MySQL databases are much more likely to become corrupted.
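The strict behavior being asked for here can be sketched without a database server. The example below uses Python's built-in `sqlite3` module purely as a stand-in; it demonstrates neither MySQL nor PostgreSQL. SQLite does not enforce declared column lengths, so the `CHECK` constraint, the five-character limit and the `person` table are assumptions made for the illustration.

```python
import sqlite3

# In-memory database, used only as a convenient stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (name TEXT NOT NULL CHECK (length(name) <= 5))"
)


def try_insert(name: str) -> bool:
    """Insert a name; return False if the length check rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # Over-long input is refused outright instead of being shortened.
        return False


# With SQLite's default binary comparison, a trailing space is significant.
trailing_space_equal = conn.execute("SELECT 'foo ' = 'foo'").fetchone()[0]

print(try_insert("Mike"))           # fits within the limit
print(try_insert("Darren Duncan"))  # too long for the limit
print(trailing_space_equal)
```

The point of the sketch is only the shape of the behavior: bad input produces an error the caller must handle, and no data is altered behind the caller's back.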

I liken MySQL to Microsoft Windows, which people use predominantly because it has a greater number of existing installations, or because for that same reason it is the only thing they have used before. While people who have solid experience with both MySQL and PostgreSQL and prefer MySQL do exist, I would think they are a minority compared to those for whom MySQL is the only thing they know, because it was pre-installed. Likewise with MS Windows: while some people with solid experience of other major OSs do like Windows more, I'd think a majority of Windows users don't like it more, and just use it because it is either all they know or a program they need requires it.

If both MySQL and PostgreSQL were pre-installed on the same number of hosts, and the same number of people were solidly experienced with both, I would think that more people would choose PostgreSQL by default for their next project.

People who are savvy with databases and know both PostgreSQL and MySQL would prefer PostgreSQL for quality and features hands-down. And if their host doesn't provide PostgreSQL, they would demand that the host provide it, or install it themselves, or find another host, unless that particular data is unimportant.

-- Darren Duncan
