It did help me!
But it needs some modifications:
query = (session.query(TableA, TableB)
         .filter(TableA.b_id == None, <other-conditions>))
for a, b in query:
    a.b_id = b.id
    session.add(a)

This is so much faster, but I have to consider those TableA records which 
may have more than one match in TableB (faulty database) :(
Thanks again.
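In case it helps with the duplicate-match problem: one option is to exclude the ambiguous TableB rows up front with a group_by/having query, so each TableA row can only ever see a single candidate. A minimal, self-contained sketch against an in-memory SQLite database — the single `key` column here is a hypothetical stand-in for whatever the real match conditions are:

```python
from sqlalchemy import create_engine, Column, Integer, String, func
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class TableA(Base):
    __tablename__ = "table_a"
    id = Column(Integer, primary_key=True)
    key = Column(String)        # hypothetical match column
    b_id = Column(Integer)

class TableB(Base):
    __tablename__ = "table_b"
    id = Column(Integer, primary_key=True)
    key = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    TableA(id=1, key="x"), TableA(id=2, key="y"), TableA(id=3, key="z"),
    TableB(id=10, key="x"),                       # unique match for "x"
    TableB(id=20, key="y"), TableB(id=21, key="y"),  # faulty duplicates
])
session.commit()

# Keys that occur more than once in TableB (the faulty rows).
dup_keys = [k for (k,) in session.query(TableB.key)
            .group_by(TableB.key)
            .having(func.count(TableB.id) > 1)]

# Join only on unambiguous keys, so each TableA row gets at most one match.
query = (session.query(TableA, TableB)
         .join(TableB, TableA.key == TableB.key)
         .filter(TableA.b_id == None,
                 ~TableB.key.in_(dup_keys)))

for a, b in query:
    a.b_id = b.id
session.commit()

matched = {a.id: a.b_id for a in session.query(TableA)}
print(matched)  # row 1 matched; row 2 skipped (two candidates); row 3 no match
```

The rows left with `b_id == None` afterwards are exactly the ones that need manual attention (no match, or several).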


On Friday, January 2, 2015 2:41:43 AM UTC+3:30, Simon King wrote:
>
> > On 31 Dec 2014, at 16:25, Mehdi <mese...@gmail.com> wrote: 
> > 
> > Hi 
> > What would be the most efficient way to loop over a large table (>120000 
> rows) and, based on some conditions, find the match in another table 
> (>70000 rows)? 
> > Actually I'm working on a given db in which there is TableA with 14 fields 
> and more than 120000 rows, and TableB with 150 fields and about 70000 rows. 
> Now I have to find a match for each row of TableA in TableB and then put the 
> value of TableB.id in a field on TableA (let's say TableA.b_id). 
> > Currently I'm doing it with a loop through TableA with offset/limit, and 
> then inside the loop I try to find the match by querying TableB: 
> > offset = 0 
> > limit = 2000 
> > count = session.query(func.count(TableA.id)).scalar() 
> > while offset < count: 
> >     for a_row in session.query(TableA).offset(offset).limit(limit): 
> >         match = session.query(TableB).filter(some_filters).all() 
> >         if len(match) == 1: 
> >             a_row.b_id = match[0].id 
> >             session.add(a_row) 
> >         else: 
> >             ... 
> >     offset += limit 
> >   
> > But this is a pretty time-consuming process, even though I set deferred 
> columns as much as possible. 
> > I wonder: is it possible and reliable to get all matches with a join 
> subquery and then loop over those results to set the match field? 
> > Thanks. 
>
> Perhaps you could do something like this: 
>
> batch_size = 2000 
> while True: 
>     batch = (session.query(TableA, TableB) 
>              .join(<match-conditions>) 
>              .filter(TableA.b_id == None) 
>              .limit(batch_size) 
>              .all()) 
>     if not batch: 
>         break 
>     for a_row, b_row in batch: 
>         a_row.b_id = b_row.id 
>     session.commit() 
>
> Hope that helps, 
>
> Simon
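For completeness: the batched loop Simon sketches above can also be pushed entirely into the database as a single correlated-subquery UPDATE, so no rows are loaded into Python at all. This is only a sketch under the same assumption of a hypothetical `key` column as the match condition; it also skips rows with more than one TableB match, which covers the faulty-duplicates case:

```python
from sqlalchemy import (create_engine, Column, Integer, String, func,
                        select, update)
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class TableA(Base):
    __tablename__ = "table_a"
    id = Column(Integer, primary_key=True)
    key = Column(String)        # hypothetical match column
    b_id = Column(Integer)

class TableB(Base):
    __tablename__ = "table_b"
    id = Column(Integer, primary_key=True)
    key = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    TableA(id=1, key="x"), TableA(id=2, key="y"),
    TableB(id=10, key="x"),
    TableB(id=20, key="y"), TableB(id=21, key="y"),  # faulty duplicates
])
session.commit()

# Correlated scalar subqueries: the matching TableB.id, and the match count.
match_id = (select(TableB.id)
            .where(TableB.key == TableA.key)
            .scalar_subquery())
match_count = (select(func.count(TableB.id))
               .where(TableB.key == TableA.key)
               .scalar_subquery())

# One set-based statement: only unmatched rows with exactly one match.
stmt = (update(TableA)
        .where(TableA.b_id == None, match_count == 1)
        .values(b_id=match_id))

with engine.begin() as conn:
    conn.execute(stmt)

rows = {a.id: a.b_id for a in Session(engine).query(TableA)}
print(rows)  # row 1 -> 10; row 2 stays None (ambiguous)
```

One UPDATE over 120000 rows should be far faster than any Python-side loop, at the cost of not seeing the ambiguous rows as you go; those can be listed afterwards with the same group_by/having count > 1 query.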

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.