Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?

Keith Medcalf Sat, 08 Nov 2014 05:24:15 -0800

How about the direct approach:

SELECT uid
  FROM resource
 WHERE uid NOT IN (SELECT resource_uid
                     FROM event_participant, event
                    WHERE event_participant.event_uid = event.uid
                      AND event.shift_uid = :shift_uid
                      AND event.date = :event_date)


or perhaps 

SELECT uid
  FROM resource
 WHERE NOT EXISTS (SELECT 1
                     FROM event_participant, event
                    WHERE event_participant.event_uid = event.uid
                      AND resource_uid = resource.uid
                      AND event.shift_uid = :shift_uid
                      AND event.date = :event_date)

Is not the "right way to do it" the one that obtains the result required, not 
the one that uses a "checkbox" implementation?  Using the "checkbox" approach, 
someone might be of the opinion that the "right way to do it" is entirely 
within a hand-coded StartIO channel program implemented for a 3390 disk 
controller, and that is unlikely to be implementable on a bitty-box.


---
Theory is when you know everything but nothing works.  Practice is when 
everything works but no one knows why.  Sometimes theory and practice are 
combined:  nothing works and no one knows why.

>-----Original Message-----
>From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
>boun...@sqlite.org] On Behalf Of Tristan Van Berkom
>Sent: Saturday, 8 November, 2014 03:17
>To: sqlite-users@sqlite.org
>Subject: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?
>
>Hi all,
>
>Today I've stumbled on a situation where I think I really need to use a
>RIGHT OUTER JOIN, and looking at all the examples on the internet I
>could find so far, I'm not finding a way to simulate it properly using
>LEFT OUTER JOINs.
>
>So I thought, before I commit to an inefficient alternative I would
>check to see if someone on this list has an idea of how I can form an
>efficient query for this.
>
>I'll start by highlighting some of the schema which is relevant to the
>query:
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>CREATE TABLE shift (
>  uid INTEGER PRIMARY KEY AUTOINCREMENT,
>
>  ... name and start time/duration of a shift ...
>);
>
>CREATE TABLE resource (
>  uid INTEGER PRIMARY KEY AUTOINCREMENT,
>
>  ... details about the resource ...
>);
>
>CREATE TABLE event (
>  uid        INTEGER  PRIMARY KEY AUTOINCREMENT,
>  date       INTEGER  NOT NULL, /* unix timestamp */
>  shift_uid  INTEGER  NOT NULL REFERENCES shift (uid),
>
>  ... other details about the event ...
>);
>
>CREATE TABLE IF NOT EXISTS event_participant (
>  id              INTEGER CHECK (id > 0),
>  event_uid       INTEGER REFERENCES event (uid) ON DELETE CASCADE,
>  resource_uid    INTEGER REFERENCES resource (uid),
>
>  ... other details about this participant ...
>
>  PRIMARY KEY (event_uid, id)
>);
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>Now my goal here, is to pull out every resource UID which is _not_
>assigned to any event on a given date and shift UID.
>
>My first attempt that gets the job done looks like this:
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~                              Statement 1                             ~
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>SELECT resource.uid
>
>/* All resources... */
>FROM   resource
>
>/* Subtract from this set all the booked resources */
>LEFT JOIN (
>
>    /* All resources which are booked on the given date / shift_uid */
>    SELECT participant.resource_uid AS uid
>    FROM event
>    JOIN event_participant
>      AS participant
>      ON (participant.event_uid = event.uid)
>    WHERE (event.date      = 1411185600 AND
>           event.shift_uid = 1 AND
>           participant.resource_uid IS NOT NULL)
>
>) AS booked_resource ON (booked_resource.uid = resource.uid)
>
>WHERE booked_resource.uid IS NULL;
>
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXPLAIN SAYS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>1|0|0|SEARCH TABLE event USING INDEX event_shift_uid_idx (shift_uid=?)
>(~2 rows)
>1|1|1|SEARCH TABLE event_participant AS participant USING INDEX
>event_participant_pk_idx (event_uid=?) (~5 rows)
>0|0|0|SCAN TABLE resource (~1000000 rows)
>0|1|1|SEARCH SUBQUERY 1 AS booked_resource USING AUTOMATIC COVERING
>INDEX (uid=?) (~2 rows)
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>Now, this is not all that bad, except that I try to avoid nesting select
>statements in cases like this as it prevents the query planner from
>doing anything intelligent.
>
>
>So I made another attempt which discards the nested statement:
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~                              Statement 2                             ~
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>SELECT resource.uid
>
>/* All Resources */
>FROM resource
>
>/* Get the cartesian product of event * resources, dangerously big,
> * but hopefully trimmed down by constraints found in the final WHERE
> * clause
> */
>JOIN event
>
>/* Subtract from this set all the booked resources */
>LEFT JOIN event_participant
>  AS participant
>  ON (participant.event_uid    = event.uid AND
>      participant.resource_uid = resource.uid)
>WHERE event.date               = 1411185600 AND
>      event.shift_uid          = 1 AND
>      participant.resource_uid IS NULL;
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXPLAIN SAYS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>0|0|1|SEARCH TABLE event USING INDEX event_shift_uid_idx (shift_uid=?)
>(~2 rows)
>0|1|0|SCAN TABLE resource (~1000000 rows)
>0|2|2|SEARCH TABLE event_participant AS participant USING INDEX
>event_participant_resource_uid_idx (resource_uid=?) (~2 rows)
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>It turns out that this query is almost semantically the same as the
>first, however without the nested select, EXPLAIN QUERY PLAN seems to
>indicate the performance is better (the SCAN on the 'resource' table
>is still present, but the extra SEARCH SUBQUERY is not gone).
>
>Finally, after doing some more digging, I think I've found that the
>correct way to make this query should really be with RIGHT OUTER JOIN as
>follows:
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>~                              Statement 3                             ~
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>SELECT resource.uid
>
>/* All events on the date specified in the final WHERE clause */
>FROM event
>
>/* Join participants by indexed event key, only those with assigned
> * resources
> */
>JOIN event_participant
>  AS participant
>  ON (participant.event_uid = event.uid AND
>      participant.resource_uid IS NOT NULL)
>
>/* Join in all the resources which do not appear in the preceding set */
>RIGHT OUTER JOIN resource
>  ON (resource.uid = participant.resource_uid)
>WHERE event.DATE      = 1411185600 AND
>      event.shift_uid = 1 AND
>      participant.resource_uid IS NULL;
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>However, unlike other simple use cases of LEFT OUTER JOIN, this one
>cannot simply be inverted to get the same performance/results as
>with RIGHT OUTER JOIN (inverting the statement only works in the
>very simple case where you really only have 2 tables to join).
>
>So, in closing, I have only been doing SQL for a couple of years now
>and would consider myself a 'newbie' at this - and I was wondering if
>someone more enlightened could point out something I've missed.
>
>If it's the case that RIGHT OUTER JOIN would be the correct way but
>SQLite simply doesn't offer that (and there is no way to efficiently
>work around this), I will gladly except that and probably go with a
>variation of 'Statement 2' above.
>
>I would also be interested to know if 'Statement 1' may be superior
>to 'Statement 2' due to the dangerously huge cartesian product of
>the 'event' and 'resource' tables.
>
>Thank you all for your attention :)
>
>Best Regards,
>    -Tristan
>
>
>_______________________________________________
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?

Reply via email to