Thanks, interesting stuff! Will study this.
Don't comprehend it fully, but basically query 3 is no good then?

RBS

On Sun, 22 Sep 2019, 20:36 Keith Medcalf, <kmedc...@dessus.com> wrote:

>
> On Sunday, 22 September, 2019 12:36, Bart Smissaert <
> bart.smissa...@gmail.com> wrote:
>
> >Have tested this now and no 1 is indeed fastest, with no 2 not far
> >behind.  3 is quite a bit slower and for some strange reason gives
> >a different value for the females aged 50.
> >Will look into this.
>
> Query 1 can run with either Patients or Num_values in the outer loop
> depending on the size of the tables, index availability, and any available
> statistics.
>
> Query 2 requires that Patients be in the outer loop and that an index
> exist on num_values id and term_text (in any order as the first two columns
> of the index) for efficient lookups.  It will do only one lookup into the
> num_values per patients (one of the left joins will always fail, so it will
> be short-circuited, but this short-circuiting will still take time to
> process) but may join multiple matching num_values records per patients
> record.
>
> Query 3 is the same as Query 2 but has the additional restriction that it
> will only handle one num_values per patients, and it will always do two
> executions of the subquery per patients record (so it will always be much
> slower than the others).
>
> Your observed difference will be because the Patients:Num_values where
> Num_values is constrained to have term_text = 'Haemoglobin estimation' is
> not a 1:1 relationship.  That is:
>
>   select patients.id, count(*)
>     from patients, num_values
>    where patients.id == num_values.id
>      and num_values.term_text = 'Haemoglobin estimation'
> group by patients.id
>   having count(*) > 1;
>
> returns at least one row.  And those patients.id probably have age 50 and
> are Female ...
>
> >On Sun, 22 Sep 2019, 18:37 Keith Medcalf, <kmedc...@dessus.com> wrote:
> >
> >>
> >> On Sunday, 22 September, 2019 04:10, Bart Smissaert <
> >> bart.smissa...@gmail.com> wrote:
> >>
> >> >Have this query:
> >>
> >> >SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS
> >AGE,
> >> >       AVG(BM.numeric_value) AS avg_Hb_Male,
> >> >       AVG(BF.numeric_value) as avg_Hb_Female
> >> >FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX =
> >> 'Male')
> >> >                INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
> >> 'Female')
> >> >WHERE BF.term_text = 'Haemoglobin estimation' and
> >> >      BM.term_text = 'Haemoglobin estimation'
> >> >GROUP BY AGE
> >> >HAVING AGE > 18 and AGE < 98
> >> >ORDER BY AGE ASC
> >>
> >> >Which always used to run fine and return rows. There is no error, but
> >> >there are no returned rows.
> >>
> >> What changed between when it was "working" and when it was "not
> >working"?
> >> As far as I can see it should never have been "working" because the
> >eqijoin
> >> constraints could never be satisfied except by a Patients that was both
> >> Male and Female at the same time in the same record (which is
> >impossible).
> >>
> >>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> >> Age,
> >>          avg(case when P.Sex == 'Male' then V.numeric_value else null
> >end)
> >> as avg_Hb_Male,
> >>          avg(case when P.Sex == 'Female' then V.numeric_value else null
> >> end) as avg_Hb_Female
> >>     from Patients P, num_values V
> >>    where P.id == V.id
> >>      and V.term_text == 'Haemoglobin estimation'
> >> group by Age
> >>   having Age between 19 and 97
> >> order by Age asc;
> >>
> >> or, using outer joins:
> >>
> >>    select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0)
> >as
> >> Age,
> >>           avg(BM.numeric_value) as avg_Hb_Male,
> >>           avg(BF.numeric_value) as avg_Hb_Female
> >>      from Patients as P
> >> left join num_values as BM
> >>        on P.id == BM.id
> >>       and P.Sex == 'Male'
> >>       and BM.term_text == 'Haemoglobin estimation'
> >> left join num_values as BF
> >>        on P.id == BF.id
> >>       and P.Sex == 'Female'
> >>       and BF.term_text == 'Haemoglobin estimation'
> >>  group by Age
> >>    having Age between 19 and 97
> >>  order by Age asc;
> >>
> >> or, using correlated subqueries:
> >>
> >>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
> >> Age,
> >>          avg((
> >>               select numeric_value
> >>                 from num_values as V
> >>                where P.id == V.id
> >>                  and P.Sex == 'Male'
> >>                  and V.term_text == 'Haemoglobin estimation'
> >>             )) as avg_Hb_Male,
> >>          avg((
> >>               select numeric_value
> >>                 from num_values as V
> >>                where P.id == V.id
> >>                  and P.Sex == 'Female'
> >>                  and V.term_text == 'Haemoglobin estimation'
> >>             )) as avg_Hb_Female
> >>     from Patients as P
> >> group by Age
> >>   having Age between 19 and 97
> >> order by Age asc;
> >>
> >> The first one is likely to be the fastest to execute since it does the
> >> least record lookups and the optimizer has the best chance at
> >optimizing
> >> the query.
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> sqlite-users mailing list
> >> sqlite-users@mailinglists.sqlite.org
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> >_______________________________________________
> >sqlite-users mailing list
> >sqlite-users@mailinglists.sqlite.org
> >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to