Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

Eddie Stanley Thu, 15 Mar 2007 06:18:38 -0800

David Fetter wrote:

On Wed, Mar 14, 2007 at 12:07:45PM +1300, Edward Stanley wrote:

On Wed, 14 Mar 2007, David Fetter wrote:

On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote:

David Fetter wrote:

On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote:

David Fetter wrote:

On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:

* Another good example is the "questionnaire".

With all due respect, this is a solved problem *without EAV or
run-time DDL*.  The URL below has one excellent approach to this.


<http://www.varlena.com/GeneralBits/110.php>

Which broadly speaking was the solution I used for my
questionnaire, except I had a restricted set of types so basically
just coerced them to text and side-stepped the inheritance issue.
To the extent that it's dynamic, it's still just EAV though.

That's precisely the difference between the above solution and
yours, and it's the difference between a good design and one that
will come up and bit you on the as^Hnkle.

It's still basically EAV (either approach).  The key fault with EAV
is that the tables have no semantic meaning - answer_int contains
number of oranges, days since birth and the price of a tube ticket
in pennies.

Stuffing all of those into an answer_int is *precisely* what the end
user must not do.  That's pilot error.

Now, with a questionnaire that might not matter because everything
is an "answer" and you're not necessarily going to do much more than
count/aggregate it.

See above.

It doesn't remove the need for run-time DDL if you allow users to
add their own questions.

Sure it does.   When a user, who should be talking with you, wants
to ask a new kind of question, that's the start of a discussion
about what new kind(s) of questions would be generally applicable
in the questionnaire schema.  Then, when you come to an agreement,
you roll it into the new schema, and the whole system gets an
improvement.

Fine, but if you're not letting the user extend the system, then
it's not really addressing Edward's original posting, is it?

It's my contention that Edward's original idea is ill-posed.  SQL is
just fine for doing this kind of thing, and it's *not that hard*.

If the user's talking to me, I might as well just write the DDL
myself - it's the talk that'll take the time, not writing a dozen
lines of SQL.

It's the talk that's the important part.  Machines are really bad at
seeing the broader picture.  In the attempt to "save" a few minutes'
discussion, he's trying to borrow that time from a system asked to do
things that computers are inherently bad at doing, and every end user
will pay that time back at a very high rate of interest.  This is
precisely the kind of false economy that so plagues software
development and maintenance these days.

The interesting part of the problem (from a Comp-Sci point of view)
is precisely in automating part of that discussion.  It's providing
an abstraction so that you don't end up with a mass of attributes
while still providing freedom to the user.

This freedom and efficiency you're talking about is better supplied,
IMHO, by putting a standard DDL for questionnaires up on a pgfoundry
or an SF.net.  That way, improvements to the DDL get spread all over
the world, and a very large amount of wheel reinvention gets avoided.
Reusable components are a big chunk of both freedom and efficiency. :)

Cheers,
D

Maybe I should rethink the problem a bit - from the very brief
initial research I've done, it seems EAV schemas have two common

uses:

1) When new attributes have to be created on-the-fly2) When the number of possible properties for an entity greatly (orders ofmagnitude) exceeds the number of properties any one entity is likely to have.


Um, no.  The first use case is bad coding practice, and the second is
a classic case for a join table, which is the standard way to handle
M:N relationships.

I'm not sure about solving the first problem - there seems to be a lot ofdebate around this. I can see reasons for and against allowing this. HoweverI think the second is a very real problem. One such example is a patientrecord system.
For each patient we have a table of common data (dob, sex, height, weight etc)but as well as this a patient can present with many symptoms. This might be atable of 40,000 possible symptoms.


Here's how I'd do that:

CREATE TABLE patient (
    patient_id SERIAL PRIMARY KEY, /* for simplicity.  Some
                                      combination of columns in the
                                      table would also have a UNIQUE
                                      NOT NULL constraint on it.
                                    */
    ...
);

CREATE TABLE symptom (
    symptom_id SERIAL PRIMARY KEY, /* See above. */
    ...
);

CREATE TABLE patient_presents_with (
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
    UNIQUE(patient_id, symptom_id)
);

Lets say we want to run a query on these symptoms (using a boolean expression)


I'd use something like the following:

SELECT
    p.patient_id,
    p.f_name,
    p.l_name,
    s.symptom_name,
    s.symptom_desc
FROM
    patient p
JOIN
    patient_presents_with ppw
    USING (patient_id)
JOIN
    symptom s
    USING (symptom_id)
WHERE
    s.symptom_name = ALL('foo','bar','baz')
AND
    s.symptom_name = ANY('quux','fleeg');

Are the ALL and ANY functions new to Postgresql 8? I haven't met thembefore.

Anyway this will work for some queries but not others.

What about the following expression? ('foo' && 'bar') | ('baz' && !'quxx') | 'fleeg'Maybe I have misunderstood how these functions work, but I don't thinkthey will handle anything but trivial examples of this problem.

to return the patient records which match the query string on the symptoms.
(This turns out to be a very similar problem to the 'tags' example I firstpresented) - assume a similar schema. With more than a couple of symptoms anda complex tree, the resulting SQL can span pages.
Not really.  See above :)
When I first started thinking about this project I believed the two problemsessentially to be the same class of problem, but this may not be the case.
EAV will bite you.  It's not *that* much work to keep its from biting
you. :)

Cheers,
D



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

Reply via email to