[sqlalchemy] Subquery giving single values in the SELECT part of a query

Guido Winkelmann Wed, 22 Oct 2014 08:37:28 -0700

Hi,

How can I put a subquery that yields exactly one value inside the select part
of the query without joining the subquery like another table?


I have a table jobs and a table tasks. Every job has got any arbitrary number
(zero or more) of tasks. Tasks can be in one of four states: "queued",
"running", "done" and "failed". When querying the jobs table, I want to get
the counts of "queued", "running", "done" and "failed" tasks for every job,
and I want to be able to sort the output by any of those counts. The SQL I
want would look like this:

=======
SELECT
  jobs_1.id AS jobs_1_id,
  <more columns from jobs>
  (SELECT count(*) FROM tasks t WHERE t.job_id=jobs_1.id AND t.state IS NULL) 
AS t_queued,
  (SELECT count(*) FROM tasks t WHERE t.job_id=jobs_1.id AND t.state=105) AS 
t_running,
  (SELECT count(*) FROM tasks t WHERE t.job_id=jobs_1.id AND t.state=106) AS 
t_done,
  (SELECT count(*) FROM tasks t WHERE t.job_id=jobs_1.id AND t.state=107) AS 
t_failed
FROM
  jobs AS jobs_1
ORDER BY
  t_running DESC;
======

What I have so far in SQLAlchemy looks like this:

======
class Job(Model):
    <stuff>
    tasks_queued = relationship("Task", lazy="dynamic",
                                primaryjoin="(Task.state == None) & 
(Task.job_id == Job.id)")
    tasks_running = relationship("Task", lazy="dynamic",
                                 primaryjoin="(Task.state == %s) & (Task.job_id 
== Job.id)" % DBWorkState.RUNNING)
    tasks_done = relationship("Task", lazy="dynamic",
                              primaryjoin="(Task.state == %s) & (Task.job_id == 
Job.id)" % DBWorkState.DONE)
    tasks_failed = db.relationship("Task", lazy="dynamic",
                                   primaryjoin="(Task.state == %s) & 
(Task.job_id == Job.id)" % DBWorkState.FAILED)

a_q = aliased(Task)
a_r = aliased(Task)
a_d = aliased(Task)
a_f = aliased(Task)
a_job = aliased(Job)

jobs_query = db.session.query(a_job,
                              func.count( a_q.id.distinct()).label("t_queued"),
                              func.count(a_r.id.distinct()).label("t_running"),
                              func.count(a_d.id.distinct()).label("t_done"),
                              func.count(a_f.id.distinct()).label("t_failed")).\
   outerjoin(a_q, a_job.tasks_queued).\
   outerjoin(a_r, a_job.tasks_running).\
   outerjoin(a_d, a_job.tasks_done).\
   outerjoin(a_f, a_job.tasks_failed).\
   group_by(a_job)
======

(The code in question is part of https://github.com/pyfarm/pyfarm-master)

This does not do quite what I want though. It produces this SQL query:

======
SELECT 
  jobs_1.id AS jobs_1_id,
  <more columns from jobs>
  count(DISTINCT tasks_1.id) AS t_queued,
  count(DISTINCT tasks_2.id) AS t_running,
  count(DISTINCT tasks_3.id) AS t_done,
  count(DISTINCT tasks_4.id) AS t_failed 
FROM
  jobs AS jobs_1
  LEFT OUTER JOIN tasks AS tasks_1 ON tasks_1.state IS NULL AND tasks_1.job_id 
= jobs_1.id
  LEFT OUTER JOIN tasks AS tasks_2 ON tasks_2.state = 105 AND tasks_2.job_id = 
jobs_1.id
  LEFT OUTER JOIN tasks AS tasks_3 ON tasks_3.state = 106 AND tasks_3.job_id = 
jobs_1.id
  LEFT OUTER JOIN tasks AS tasks_4 ON tasks_4.state = 107 AND tasks_4.job_id = 
jobs_1.id 
WHERE
  jobs_1.state = 105
GROUP BY
  jobs_1.id,
  <more columns from jobs>
ORDER BY t_running DESC
======

It works, but it causes horrible performance issues in PostgreSQL, to the
point of being unusable.

How can I make SQLAlchemy produce a query that's more like the first SQL query?

        Guido

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

[sqlalchemy] Subquery giving single values in the SELECT part of a query

Reply via email to