On Oct 13, 2011, at 6:38 AM, Eduardo wrote:

> Hi,
> I am trying to copy a table (or several tables) from one database to
> another. I found a code snippet somewhere on the internet that I have
> slightly modified. I read rows sequentially from the existing table
> and write them to the new one. The code worked for smaller tables (up
> to several thousand rows). However, when working with a table of more
> than a million rows, the session commit failed more and more
> frequently as the number of copied rows grew. The rows that were
> supposed to have been committed in the previous loop seem to still be
> stored in the session when it fills up with the rows from the next
> slice.
> Is there a problem with the code, or does this depend on the RDBMS
> (Postgres) configuration?
> Is there any other way to copy a table from one database to another
> (other than using pg_dump)?

there's nothing obviously wrong with the code, and it's not the case that the 
rows remain in the Session after commit; I don't see them being placed in any 
kind of persistent collection, and the Session does not maintain strong 
references to objects with no pending changes.   A check against 
len(gc.get_objects()) would confirm whether or not the total number of objects 
in memory is growing.
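
For example, a quick diagnostic you could wrap around the commit in your loop 
(just a sketch; "destination" is the session from your code):

    import gc

    # total number of Python objects currently tracked by the garbage
    # collector; sample it before and after each commit() to see
    # whether it keeps growing from one slice to the next
    before = len(gc.get_objects())
    destination.commit()
    print len(gc.get_objects()) - before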

You're also emitting SELECTs with LIMIT/OFFSET, which ensures that psycopg2 
isn't buffering the entire result set, though LIMIT/OFFSET gets very slow as 
you move deeper into the table (I typically use this recipe instead: 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery).
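
The basic idea is to page through the table on an indexed key rather than by 
OFFSET. A minimal keyset-style sketch of that (this simplifies the recipe, and 
assumes the table has an integer primary key column named "id" - adjust for 
your schema):

    last_id = 0
    while True:
        rows = source.query(table).\
            filter(table.c.id > last_id).\
            order_by(table.c.id).\
            limit(slicesize).all()
        if not rows:
            break
        # ... copy the rows as before ...
        last_id = rows[-1].id

Each SELECT then seeks directly on the primary key index instead of scanning 
past everything that's already been read.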

That said, since an operation like this is strictly table->table, you should 
be using the Core for this, feeding the plain tuples from the result set of 
one statement into a single parameter list for the other, which will insert 
the full span of rows in one go (i.e. 
http://www.sqlalchemy.org/docs/core/tutorial.html#executing-multiple-statements).
The ORM code here is unnecessary and complex, and makes for a much slower 
operation.
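
By way of illustration, a rough Core-only version of the copy loop might look 
like this (an untested sketch against the 0.7 API; it takes database URLs and 
calls create_engine() directly rather than using your helper):

    from sqlalchemy import create_engine, MetaData, Table, select

    def pull_data_core(from_url, to_url, tables, slicesize):
        sengine = create_engine(from_url)
        dengine = create_engine(to_url)
        smeta = MetaData(bind=sengine)
        for table_name in tables:
            table = Table(table_name, smeta, autoload=True)
            table.metadata.create_all(dengine)
            offset = 0
            while True:
                rows = sengine.execute(
                    select([table]).
                    order_by(*table.primary_key.columns).
                    limit(slicesize).offset(offset)).fetchall()
                if not rows:
                    break
                # list of dicts -> one executemany-style INSERT
                dengine.execute(table.insert(),
                                [dict(row) for row in rows])
                offset += slicesize

The key difference is the final execute(): passing a list of dictionaries 
along with table.insert() sends the whole slice to the database as a single 
statement with many parameter sets, with no Session and no mapped class 
involved.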



> Thanks
> ___________________________________________________________________________________________
> 
> 
> import time
> from sqlalchemy import MetaData, Table
> from sqlalchemy.ext.declarative import declarative_base
> 
> # create_meta_session_engine() is a helper defined elsewhere in this
> # script; it returns a (MetaData, Session, Engine) triple for a URL
> 
> def pull_data(from_db, to_db, tables, slicesize):
>     smeta, source, sengine = create_meta_session_engine(from_db)
>     smeta = MetaData(bind=sengine)
>     dmeta, destination, dengine = create_meta_session_engine(to_db)
>     for table_name in tables:
>         # reflect the source table and create it on the destination
>         table = Table(table_name, smeta, autoload=True)
>         table.metadata.create_all(dengine)
>         NewRecord = quick_mapper(table)
>         columns = table.columns.keys()
>         rowquery = source.query(table)
>         count = rowquery.count()
>         loop = 0
>         # copy full slices of slicesize rows each
>         while count - (loop * slicesize) > slicesize:
>             oneslice = source.query(table).slice(
>                 slicesize * loop, slicesize * (loop + 1)).all()
>             for record in oneslice:
>                 data = dict((str(column), getattr(record, column))
>                             for column in columns)
>                 destination.add(NewRecord(**data))
>             try:
>                 destination.commit()
>                 loop += 1
>             except Exception:
>                 # retry the same slice after a pause
>                 destination.rollback()
>                 time.sleep(25)
>             print loop
>         # copy whatever is left past the last full slice
>         lastslice = source.query(table).slice(
>             slicesize * loop, count + 1).all()
>         lastpending = True
>         lastloop = 0
>         while lastpending:
>             for record in lastslice:
>                 data = dict((str(column), getattr(record, column))
>                             for column in columns)
>                 destination.add(NewRecord(**data))
>             try:
>                 destination.commit()
>                 lastpending = False
>             except Exception:
>                 destination.rollback()  # was misspelled "roolback"
>                 time.sleep(25)
>                 lastloop += 1
>             print lastloop
>     source.close()
>     destination.close()
> 
> def quick_mapper(table):
>     # build a throwaway declarative class mapped to the given table
>     Base = declarative_base()
>     class GenericMapper(Base):
>         __table__ = table
>     return GenericMapper
> ___________________________________________________________________________________________
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 
