Re: [sqlite] What is wrong with this SQL?

Keith Medcalf Sun, 22 Sep 2019 12:37:09 -0700

On Sunday, 22 September, 2019 12:36, Bart Smissaert <[email protected]> 
wrote:


>Have tested this now and no 1 is indeed fastest, with no 2 not far
>behind.  3 is quite a bit slower and for some strange reason gives 
>a different value for the females aged 50.
>Will look into this.

Query 1 can run with either Patients or Num_values in the outer loop depending 
on the size of the tables, index availability, and any available statistics.  

Query 2 requires that Patients be in the outer loop and that an index exist on 
num_values id and term_text (in any order as the first two columns of the 
index) for efficient lookups.  It will do only one lookup into the num_values 
per patients (one of the left joins will always fail, so it will be 
short-circuited, but this short-circuiting will still take time to process) but 
may join multiple matching num_values records per patients record.

Query 3 is the same as Query 2 but has the additional restriction that it will 
only handle one num_values per patients, and it will always do two executions 
of the subquery per patients record (so it will always be much slower than the 
others).

Your observed difference will be because the Patients:Num_values where 
Num_values is constrained to have term_text = 'Haemoglobin estimation' is not a 
1:1 relationship.  That is:

  select patients.id, count(*) 
    from patients, num_values
   where patients.id == num_values.id
     and num_values.term_text = 'Haemoglobin estimation'
group by patients.id
  having count(*) > 1;

returns at least one row.  And those patients.id probably have age 50 and are 
Female ...

>On Sun, 22 Sep 2019, 18:37 Keith Medcalf, <[email protected]> wrote:
>
>>
>> On Sunday, 22 September, 2019 04:10, Bart Smissaert <
>> [email protected]> wrote:
>>
>> >Have this query:
>>
>> >SELECT round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) AS
>AGE,
>> >       AVG(BM.numeric_value) AS avg_Hb_Male,
>> >       AVG(BF.numeric_value) as avg_Hb_Female
>> >FROM PATIENTS P INNER JOIN NUM_VALUES BM ON(P.ID = BM.ID AND P.SEX =
>> 'Male')
>> >                INNER join NUM_VALUES BF ON(P.ID = BF.ID AND P.SEX =
>> 'Female')
>> >WHERE BF.term_text = 'Haemoglobin estimation' and
>> >      BM.term_text = 'Haemoglobin estimation'
>> >GROUP BY AGE
>> >HAVING AGE > 18 and AGE < 98
>> >ORDER BY AGE ASC
>>
>> >Which always used to run fine and return rows. There is no error, but
>> >there are no returned rows.
>>
>> What changed between when it was "working" and when it was "not
>working"?
>> As far as I can see it should never have been "working" because the
>eqijoin
>> constraints could never be satisfied except by a Patients that was both
>> Male and Female at the same time in the same record (which is
>impossible).
>>
>>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
>> Age,
>>          avg(case when P.Sex == 'Male' then V.numeric_value else null
>end)
>> as avg_Hb_Male,
>>          avg(case when P.Sex == 'Female' then V.numeric_value else null
>> end) as avg_Hb_Female
>>     from Patients P, num_values V
>>    where P.id == V.id
>>      and V.term_text == 'Haemoglobin estimation'
>> group by Age
>>   having Age between 19 and 97
>> order by Age asc;
>>
>> or, using outer joins:
>>
>>    select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0)
>as
>> Age,
>>           avg(BM.numeric_value) as avg_Hb_Male,
>>           avg(BF.numeric_value) as avg_Hb_Female
>>      from Patients as P
>> left join num_values as BM
>>        on P.id == BM.id
>>       and P.Sex == 'Male'
>>       and BM.term_text == 'Haemoglobin estimation'
>> left join num_values as BF
>>        on P.id == BF.id
>>       and P.Sex == 'Female'
>>       and BF.term_text == 'Haemoglobin estimation'
>>  group by Age
>>    having Age between 19 and 97
>>  order by Age asc;
>>
>> or, using correlated subqueries:
>>
>>   select round(((Julianday('now') - 2415018.5) - P.DOB) / 365.25, 0) as
>> Age,
>>          avg((
>>               select numeric_value
>>                 from num_values as V
>>                where P.id == V.id
>>                  and P.Sex == 'Male'
>>                  and V.term_text == 'Haemoglobin estimation'
>>             )) as avg_Hb_Male,
>>          avg((
>>               select numeric_value
>>                 from num_values as V
>>                where P.id == V.id
>>                  and P.Sex == 'Female'
>>                  and V.term_text == 'Haemoglobin estimation'
>>             )) as avg_Hb_Female
>>     from Patients as P
>> group by Age
>>   having Age between 19 and 97
>> order by Age asc;
>>
>> The first one is likely to be the fastest to execute since it does the
>> least record lookups and the optimizer has the best chance at
>optimizing
>> the query.
>>
>>
>>
>>
>> _______________________________________________
>> sqlite-users mailing list
>> [email protected]
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] What is wrong with this SQL?

Reply via email to