Thanks, interesting stuff! Will study this. Don't comprehend it fully, but basically query 3 is no good then?
RBS On Sun, 22 Sep 2019, 20:36 Keith Medcalf, <kmedc...@dessus.com> wrote: > > On Sunday, 22 September, 2019 12:36, Bart Smissaert < > bart.smissa...@gmail.com> wrote: > > >Have tested this now and no 1 is indeed fastest, with no 2 not far > >behind. 3 is quite a bit slower and for some strange reason gives > >a different value for the females aged 50. > >Will look into this. > > Query 1 can run with either Patients or Num_values in the outer loop > depending on the size of the tables, index availability, and any available > statistics. > > Query 2 requires that Patients be in the outer loop and that an index > exist on num_values id and term_text (in any order as the first two columns > of the index) for efficient lookups. It will do only one lookup into the > num_values per patients (one of the left joins will always fail, so it will > be short-circuited, but this short-circuiting will still take time to > process) but may join multiple matching num_values records per patients > record. > > Query 3 is the same as Query 2 but has the additional restriction that it > will only handle one num_values per patients, and it will always do two > executions of the subquery per patients record (so it will always be much > slower than the others). > > Your observed difference will be because the Patients:Num_values where > Num_values is constrained to have term_text = 'Haemoglobin estimation' is > not a 1:1 relationship. That is: > > select patients.id, count(*) > from patients, num_values > where patients.id == num_values.id > and num_values.term_text = 'Haemoglobin estimation' > group by patients.id > having count(*) > 1; > > returns at least one row. And those patients.id probably have age 50 and > are Female ... > > >On Sun, 22 Sep 2019, 18:37 Keith Medcalf, <kmedc...@dessus.com> wrote: > > > >> > >> On Sunday, 22 September, 2019 04:10, Bart Smissaert < > >> bart.smissa...@gmail.com> wrote: > >> > >> >Have this query: > >> > >> >SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS > >AGE, > >> > AVG(BM.numeric_value) AS avg_Hb_Male, > >> > AVG(BF.numeric_value) as avg_Hb_Female > >> >FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX = > >> 'Male') > >> > INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX = > >> 'Female') > >> >WHERE BF.term_text = 'Haemoglobin estimation' and > >> > BM.term_text = 'Haemoglobin estimation' > >> >GROUP BY AGE > >> >HAVING AGE > 18 and AGE < 98 > >> >ORDER BY AGE ASC > >> > >> >Which always used to run fine and return rows. There is no error, but > >> >there are no returned rows. > >> > >> What changed between when it was "working" and when it was "not > >working"? > >> As far as I can see it should never have been "working" because the > >eqijoin > >> constraints could never be satisfied except by a Patients that was both > >> Male and Female at the same time in the same record (which is > >impossible). > >> > >> select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as > >> Age, > >> avg(case when P.Sex == 'Male' then V.numeric_value else null > >end) > >> as avg_Hb_Male, > >> avg(case when P.Sex == 'Female' then V.numeric_value else null > >> end) as avg_Hb_Female > >> from Patients P, num_values V > >> where P.id == V.id > >> and V.term_text == 'Haemoglobin estimation' > >> group by Age > >> having Age between 19 and 97 > >> order by Age asc; > >> > >> or, using outer joins: > >> > >> select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) > >as > >> Age, > >> avg(BM.numeric_value) as avg_Hb_Male, > >> avg(BF.numeric_value) as avg_Hb_Female > >> from Patients as P > >> left join num_values as BM > >> on P.id == BM.id > >> and P.Sex == 'Male' > >> and BM.term_text == 'Haemoglobin estimation' > >> left join num_values as BF > >> on P.id == BF.id > >> and P.Sex == 'Female' > >> and BF.term_text == 'Haemoglobin estimation' > >> group by Age > >> having Age between 19 and 97 > >> order by Age asc; > >> > >> or, using correlated subqueries: > >> > >> select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as > >> Age, > >> avg(( > >> select numeric_value > >> from num_values as V > >> where P.id == V.id > >> and P.Sex == 'Male' > >> and V.term_text == 'Haemoglobin estimation' > >> )) as avg_Hb_Male, > >> avg(( > >> select numeric_value > >> from num_values as V > >> where P.id == V.id > >> and P.Sex == 'Female' > >> and V.term_text == 'Haemoglobin estimation' > >> )) as avg_Hb_Female > >> from Patients as P > >> group by Age > >> having Age between 19 and 97 > >> order by Age asc; > >> > >> The first one is likely to be the fastest to execute since it does the > >> least record lookups and the optimizer has the best chance at > >optimizing > >> the query. > >> > >> > >> > >> > >> _______________________________________________ > >> sqlite-users mailing list > >> sqlite-users@mailinglists.sqlite.org > >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > >> > >_______________________________________________ > >sqlite-users mailing list > >sqlite-users@mailinglists.sqlite.org > >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > _______________________________________________ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users