Hi,
How can I put a subquery that yields exactly one value inside the select part
of the query without joining the subquery like another table?
I have a table jobs and a table tasks. Every job has got any arbitrary number
(zero or more) of tasks. Tasks can be in one of four states: "queued",
"running", "done" and "failed". When querying the jobs table, I want to get
the counts of "queued", "running", "done" and "failed" tasks for every job,
and I want to be able to sort the output by any of those counts. The SQL I
want would look like this:
=======
SELECT
jobs_1.id AS jobs_1_id,
<more columns from jobs>
(SELECT count(*) FROM tasks t WHERE t.job_id=jobs_1.id AND t.state IS NULL)
AS t_queued,
(SELECT count(*) FROM tasks t WHERE t.job_id=jobs_1.id AND t.state=105) AS
t_running,
(SELECT count(*) FROM tasks t WHERE t.job_id=jobs_1.id AND t.state=106) AS
t_done,
(SELECT count(*) FROM tasks t WHERE t.job_id=jobs_1.id AND t.state=107) AS
t_failed
FROM
jobs AS jobs_1
ORDER BY
t_running DESC;
======
What I have so far in SQLAlchemy looks like this:
======
class Job(Model):
<stuff>
tasks_queued = relationship("Task", lazy="dynamic",
primaryjoin="(Task.state == None) &
(Task.job_id == Job.id)")
tasks_running = relationship("Task", lazy="dynamic",
primaryjoin="(Task.state == %s) & (Task.job_id
== Job.id)" % DBWorkState.RUNNING)
tasks_done = relationship("Task", lazy="dynamic",
primaryjoin="(Task.state == %s) & (Task.job_id ==
Job.id)" % DBWorkState.DONE)
tasks_failed = db.relationship("Task", lazy="dynamic",
primaryjoin="(Task.state == %s) &
(Task.job_id == Job.id)" % DBWorkState.FAILED)
a_q = aliased(Task)
a_r = aliased(Task)
a_d = aliased(Task)
a_f = aliased(Task)
a_job = aliased(Job)
jobs_query = db.session.query(a_job,
func.count( a_q.id.distinct()).label("t_queued"),
func.count(a_r.id.distinct()).label("t_running"),
func.count(a_d.id.distinct()).label("t_done"),
func.count(a_f.id.distinct()).label("t_failed")).\
outerjoin(a_q, a_job.tasks_queued).\
outerjoin(a_r, a_job.tasks_running).\
outerjoin(a_d, a_job.tasks_done).\
outerjoin(a_f, a_job.tasks_failed).\
group_by(a_job)
======
(The code in question is part of https://github.com/pyfarm/pyfarm-master)
This does not do quite what I want though. It produces this SQL query:
======
SELECT
jobs_1.id AS jobs_1_id,
<more columns from jobs>
count(DISTINCT tasks_1.id) AS t_queued,
count(DISTINCT tasks_2.id) AS t_running,
count(DISTINCT tasks_3.id) AS t_done,
count(DISTINCT tasks_4.id) AS t_failed
FROM
jobs AS jobs_1
LEFT OUTER JOIN tasks AS tasks_1 ON tasks_1.state IS NULL AND tasks_1.job_id
= jobs_1.id
LEFT OUTER JOIN tasks AS tasks_2 ON tasks_2.state = 105 AND tasks_2.job_id =
jobs_1.id
LEFT OUTER JOIN tasks AS tasks_3 ON tasks_3.state = 106 AND tasks_3.job_id =
jobs_1.id
LEFT OUTER JOIN tasks AS tasks_4 ON tasks_4.state = 107 AND tasks_4.job_id =
jobs_1.id
WHERE
jobs_1.state = 105
GROUP BY
jobs_1.id,
<more columns from jobs>
ORDER BY t_running DESC
======
It works, but it causes horrible performance issues in PostgreSQL, to the
point of being unusable.
How can I make SQLAlchemy produce a query that's more like the first SQL query?
Guido
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.