Try basing the interval query on a correlated subquery, so instead of
q = session.query(
column,
func.row_number().\
over(order_by=column).\
label('rownum')
).\
from_self(column)
try

subquery = base_query.subquery()
q = session.query(
column,
func.row_number().\
over(order_by=column).\
label('rownum')
).\
join(subquery, subquery.columns[column.name] == column).\
from_self(column)
This should give a SQL statement of
SELECT anon_1.widget_id AS anon_1_widget_id
FROM (
SELECT widget.id AS widget_id, row_number() OVER (ORDER BY widget.id) AS
rownum
FROM widget JOIN (
SELECT widget.id AS id, widget.data AS data
FROM widget
WHERE widget.id < :id_1) AS anon_2
ON anon_2.id = widget.id) AS anon_1
WHERE rownum % 1000 = 1
where base_query is a filtered query on the widget table. In this example I
used the following to take the first half of the widget table:
base_query = session.query(Widget).filter(Widget.id < 5000)
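Here is a minimal, self-contained sketch of the whole approach against an
in-memory SQLite database. The Widget model and the row counts are just
assumptions for illustration, and since from_self() was removed in SQLAlchemy
2.0, the window query is wrapped in an explicit subquery instead, which
produces the same SQL shape as above:

```python
# Sketch of the join-based windowed-range boundary query.
# Assumes a simple Widget table with an integer primary key.
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Widget(Base):
    __tablename__ = "widget"
    id = sa.Column(sa.Integer, primary_key=True)
    data = sa.Column(sa.String)

engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Widget(id=i, data=str(i)) for i in range(1, 10001)])
    session.commit()

    # The filtered query whose rows we actually want to walk.
    base_query = session.query(Widget.id).filter(Widget.id < 5000)
    filtered = base_query.subquery()

    # Number only the rows in the filtered set by joining against it,
    # instead of numbering the entire table.
    windowed = (
        session.query(
            Widget.id.label("id"),
            sa.func.row_number().over(order_by=Widget.id).label("rownum"),
        )
        .join(filtered, filtered.c.id == Widget.id)
        .subquery()
    )

    # Keep every 1000th row as an interval boundary.
    boundaries = [
        row.id
        for row in session.query(windowed.c.id)
        .filter(windowed.c.rownum % 1000 == 1)
        .order_by(windowed.c.id)
    ]
    print(boundaries)
```

With contiguous ids this yields one boundary per 1000 rows of the filtered
set only (ids below 5000), rather than ranges spanning the whole table.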
For reference, the SQL statement in the original post is
SELECT anon_1.widget_id AS anon_1_widget_id
FROM (
SELECT widget.id AS widget_id, row_number() OVER (ORDER BY widget.id) AS
rownum
FROM widget) AS anon_1
WHERE rownum % 1000 = 1
On Tuesday, August 5, 2014 12:12:42 AM UTC+1, Jonathan Vanasco wrote:
>
> I've been working on a migration script, and turned to my trusty friend
> the WindowedRangeQuery recipe:
>
>
> https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
>
> I ran into a small problem and hoping someone else may have done this and
> saved me some work.
>
> I'm trying to work out some edge cases on a large table, and the default
> performance is pretty bad in Postgres.
>
> The two big issues are:
>
> 1. The initial query to create an iterable range is painful. It takes me
> about 2 minutes. This seems to be due to:
> a - loading a lot of data into memory
> b - sequentially scanning the 'id' column. this function won't use
> the primary key, or any other index, for id. it seq scans the whole table.
>
> 2. Window ranges are created for all ids in the table. If I only want to
> work on half the records, with a min/max, the 'iterable' query doesn't use
> the min/max filter values; it generates 100% of potential ranges based on
> the id, and the filters are only used for the 'where' clause.
>
> Has anyone worked out an improved version of this technique, or am I
> cobbling something together myself?
>