Hi All.
I have a select query that uses "subqueryload" and looks like this:
completed_imports = self.ra_import_file.visible() \
.filter(ImportFile.lock_date == None) \
.filter(ImportFile.process_date != None) \
.order_by(ImportFile.process_date.desc()) \
.options(subqueryload('user')) \
.all()
However, intermittently I am finding results from the query where
process_date is None (NULL) in production, which seems weird, since I
am using .filter(ImportFile.process_date != None).
This table is quite busy, and there can be quite a few rows that turn
into "complete imports" by changing their process_date from NULL to an
actual date.
Because I am using subqueryload(), two SQL statements are issued:
First query:
SELECT import_file.create_date ...
FROM import_file
WHERE import_file.import_file_id IN (SELECT
import_file.import_file_id AS import_file_import_file_id
FROM import_file
WHERE import_file.lock_date IS NULL AND import_file.process_date IS
NOT NULL ORDER BY import_file.process_date DESC
Second query:
SELECT "user".password AS user_password... ,
anon_1.import_file_user_id AS anon_1_import_file_user_id
FROM (SELECT import_file.user_id AS import_file_user_id
FROM import_file
WHERE import_file.import_file_id IN (SELECT
import_file.import_file_id AS import_file_import_file_id
FROM import_file
WHERE import_file.lock_date IS NULL AND import_file.process_date IS
NOT NULL) AS anon_1 JOIN "user"
ON anon_1.import_file_user_id = "user".user_id ORDER BY
anon_1.import_file_user_id
My [unconfirmed] theory why I occasionally get rows returned that have
NULL process_dates is that race conditions like this happen when under
high load:
1) Row 'foo' in the file_import table has process_date = NULL
2) First SQL query runs (SELECT import_file.create_date ...) and the
'foo' row is not returned in the result set because it does not match
the "AND import_file.process_date IS NOT NULL" where clause.
3) External process completes import and sets process_date =
<current_date> for row 'foo'
4) Second SQL query runs (SELECT "user".password AS user_password...)
and the 'foo' row is now in the result set because it matches the "AND
import_file.process_date IS NOT NULL" where clause.
5) "subqueryload" joins both SQL queries together in results in python
(sorry, I got a bit lost trying to trace this in sqlalchemy code),
resulting in row 'foo' being in the results, but missing values such
as 'process_date', which are only queried in the first SQL query.
Am I completely wrong? Does subqueryload() only return results if each
row was returned in both SQL queries?
Thanks.
- Paul
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.