This is only to see a general trend how particular values change by age and
by gender in our patient group.
That is about 12000 patients in our practice.
The pattern won't change much over the years, so one snapshot for that
value will do.

RBS

On Sun, 22 Sep 2019, 22:32 Keith Medcalf, <kmedc...@dessus.com> wrote:

>
> Well, no.  It depends what the data is and what you are measuring.  For
> the "age" in the "patients" record to have any meaning, there can only be
> one sample and it must have been taken "now", and the results are only
> valid when the query is run "now".
>
> That is, if you have one (or more) samples per patient in the database,
> then the results classified by the present (at the time of the query) age
> of the patients are meaningless.  When you run the same query next year
> with exactly the same data, you will get an entirely different answer
> because all the patients will be a year older but otherwise nothing will
> have changed.  In order for the results to be constant you need to know the
> age of the patients at the time the sample was taken and use that as the
> classifier -- then it matters not when you run the actual query, the
> results will be constant for constant data.  Plus, of course, if you have
> multiple num_values per patient that implies different samples from the
> same patient taken at different times, so the fact that the patient is now
> 70 years old when there is one sample that was taken when the patient was 5
> is not comparable to a patient now 70 who had his one sample taken last
> week.  Nor if every patient had a sample taken each year and recorded in
> the database, does the present age of the patient provide any meaningful
> information at all.  (And of course we are assuming that the sample data is
> for Released/Final results and not for interim or retests of the same
> sample).
>
> Of course, the "patients" might not actually be "patients" but rather
> "samples", and the num_values are the final results of some analysis for
> that sample.  In which case if "create unique index results on num_values
> (id, text_term)" fails due to a constraint violation (more than one
> analysis result per sample) then the data contains errors that need to be
> resolved before meaningful results can be obtained.
>
> Typical lab databases will have a samplepoint (the thing from which the
> sample was collected).  There may then be multiple samples taken at
> different times, and on those samples multiple analysis may be run giving
> exactly one final result each.  Generally the analysis timestamp is not
> very useful (but it is usually recorded anyway).
>
> That is, there is exactly one final result per analysis per sample per
> samplepoint.
>
> >-----Original Message-----
> >From: sqlite-users <sqlite-users-boun...@mailinglists.sqlite.org> On
> >Behalf Of Bart Smissaert
> >Sent: Sunday, 22 September, 2019 13:51
> >To: General Discussion of SQLite Database <sqlite-
> >us...@mailinglists.sqlite.org>
> >Subject: Re: [sqlite] What is wrong with this SQL?
> >
> >Thanks, interesting stuff! Will study this.
> >Don't comprehend it fully, but basically query 3 is no good then?
> >
> >RBS
> >
> >On Sun, 22 Sep 2019, 20:36 Keith Medcalf, <kmedc...@dessus.com> wrote:
> >
> >>
> >> On Sunday, 22 September, 2019 12:36, Bart Smissaert <
> >> bart.smissa...@gmail.com> wrote:
> >>
> >> >Have tested this now and no 1 is indeed fastest, with no 2 not far
> >> >behind.  3 is quite a bit slower and for some strange reason gives
> >> >a different value for the females aged 50.
> >> >Will look into this.
> >>
> >> Query 1 can run with either Patients or Num_values in the outer loop
> >> depending on the size of the tables, index availability, and any
> >available
> >> statistics.
> >>
> >> Query 2 requires that Patients be in the outer loop and that an index
> >> exist on num_values id and term_text (in any order as the first two
> >columns
> >> of the index) for efficient lookups.  It will do only one lookup into
> >the
> >> num_values per patients (one of the left joins will always fail, so it
> >will
> >> be short-circuited, but this short-circuiting will still take time to
> >> process) but may join multiple matching num_values records per patients
> >> record.
> >>
> >> Query 3 is the same as Query 2 but has the additional restriction that
> >it
> >> will only handle one num_values per patients, and it will always do two
> >> executions of the subquery per patients record (so it will always be
> >much
> >> slower than the others).
> >>
> >> Your observed difference will be because the Patients:Num_values where
> >> Num_values is constrained to have term_text = 'Haemoglobin estimation'
> >is
> >> not a 1:1 relationship.  That is:
> >>
> >>   select patients.id, count(*)
> >>     from patients, num_values
> >>    where patients.id == num_values.id
> >>      and num_values.term_text = 'Haemoglobin estimation'
> >> group by patients.id
> >>   having count(*) > 1;
> >>
> >> returns at least one row.  And those patients.id probably have age 50
> >and
> >> are Female ...
> >>
> >> >On Sun, 22 Sep 2019, 18:37 Keith Medcalf, <kmedc...@dessus.com> wrote:
> >> >
> >> >>
> >> >> On Sunday, 22 September, 2019 04:10, Bart Smissaert <
> >> >> bart.smissa...@gmail.com> wrote:
> >> >>
> >> >> >Have this query:
> >> >>
> >> >> >SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0)
> >AS
> >> >AGE,
> >> >> >       AVG(BM.numeric_value) AS avg_Hb_Male,
> >> >> >       AVG(BF.numeric_value) as avg_Hb_Female
> >> >> >FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX
> >=
> >> >> 'Male')
> >> >> >                INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX
> >=
> >> >> 'Female')
> >> >> >WHERE BF.term_text = 'Haemoglobin estimation' and
> >> >> >      BM.term_text = 'Haemoglobin estimation'
> >> >> >GROUP BY AGE
> >> >> >HAVING AGE > 18 and AGE < 98
> >> >> >ORDER BY AGE ASC
> >> >>
> >> >> >Which always used to run fine and return rows. There is no error,
> >but
> >> >> >there are no returned rows.
> >> >>
> >> >> What changed between when it was "working" and when it was "not
> >> >working"?
> >> >> As far as I can see it should never have been "working" because the
> >> >eqijoin
> >> >> constraints could never be satisfied except by a Patients that was
> >both
> >> >> Male and Female at the same time in the same record (which is
> >> >impossible).
> >> >>
> >> >>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0)
> >as
> >> >> Age,
> >> >>          avg(case when P.Sex == 'Male' then V.numeric_value else
> >null
> >> >end)
> >> >> as avg_Hb_Male,
> >> >>          avg(case when P.Sex == 'Female' then V.numeric_value else
> >null
> >> >> end) as avg_Hb_Female
> >> >>     from Patients P, num_values V
> >> >>    where P.id == V.id
> >> >>      and V.term_text == 'Haemoglobin estimation'
> >> >> group by Age
> >> >>   having Age between 19 and 97
> >> >> order by Age asc;
> >> >>
> >> >> or, using outer joins:
> >> >>
> >> >>    select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25,
> >0)
> >> >as
> >> >> Age,
> >> >>           avg(BM.numeric_value) as avg_Hb_Male,
> >> >>           avg(BF.numeric_value) as avg_Hb_Female
> >> >>      from Patients as P
> >> >> left join num_values as BM
> >> >>        on P.id == BM.id
> >> >>       and P.Sex == 'Male'
> >> >>       and BM.term_text == 'Haemoglobin estimation'
> >> >> left join num_values as BF
> >> >>        on P.id == BF.id
> >> >>       and P.Sex == 'Female'
> >> >>       and BF.term_text == 'Haemoglobin estimation'
> >> >>  group by Age
> >> >>    having Age between 19 and 97
> >> >>  order by Age asc;
> >> >>
> >> >> or, using correlated subqueries:
> >> >>
> >> >>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0)
> >as
> >> >> Age,
> >> >>          avg((
> >> >>               select numeric_value
> >> >>                 from num_values as V
> >> >>                where P.id == V.id
> >> >>                  and P.Sex == 'Male'
> >> >>                  and V.term_text == 'Haemoglobin estimation'
> >> >>             )) as avg_Hb_Male,
> >> >>          avg((
> >> >>               select numeric_value
> >> >>                 from num_values as V
> >> >>                where P.id == V.id
> >> >>                  and P.Sex == 'Female'
> >> >>                  and V.term_text == 'Haemoglobin estimation'
> >> >>             )) as avg_Hb_Female
> >> >>     from Patients as P
> >> >> group by Age
> >> >>   having Age between 19 and 97
> >> >> order by Age asc;
> >> >>
> >> >> The first one is likely to be the fastest to execute since it does
> >the
> >> >> least record lookups and the optimizer has the best chance at
> >> >optimizing
> >> >> the query.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> sqlite-users mailing list
> >> >> sqlite-users@mailinglists.sqlite.org
> >> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >> >>
> >> >_______________________________________________
> >> >sqlite-users mailing list
> >> >sqlite-users@mailinglists.sqlite.org
> >> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> >>
> >>
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users@mailinglists.sqlite.org
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> >_______________________________________________
> >sqlite-users mailing list
> >sqlite-users@mailinglists.sqlite.org
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to