The two instances don't have the same ID; uniqueness is tested on two other
columns:

> session.query(cls).filter(cls.the_unique_column==new_object.the_unique_column).first()

Is there a way to get the tuple (usually a pair) of columns involved in a
unique constraint? If there is, then the line of code above could be applied
generically to all models, which would be sweet :)

Something like:
for row in csv:
    new_object = make_an_object_from_csv_row(row)
    existing_object = get_existing_object(new_object)
    if existing_object is not None:
        new_object.id = existing_object.id
    new_object = session.merge(new_object)
session.commit()
def get_existing_object(new_object):
    # pseudocode: one query per unique constraint, ANDing its columns
    unique_constraints = [constraint for constraint in cls.get_constraints()
                          if constraint.get_type() == UniqueConstraint]
    for constraint in unique_constraints:
        query = session.query(cls)
        for column in constraint.columns:
            query = query.filter(cls.get_attr(column) ==
                                 new_object.get_attr(column))
        existing = query.first()
        if existing is not None:
            return existing
    return None
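Digging a bit, a concrete version of the above seems possible through
`Table.constraints` (a sketch against a modern SQLAlchemy; the City model is
made up for illustration):

```python
# Sketch: find the column tuples of each UniqueConstraint on a mapped class.
from sqlalchemy import Column, Integer, String, UniqueConstraint
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class City(Base):
    __tablename__ = "city"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    country = Column(String(50))
    __table_args__ = (UniqueConstraint("name", "country"),)

def unique_column_tuples(cls):
    """One tuple of column names per UniqueConstraint on the mapped table."""
    return [
        tuple(col.name for col in constraint.columns)
        for constraint in cls.__table__.constraints
        if isinstance(constraint, UniqueConstraint)
    ]

def get_existing_object(session, cls, new_object):
    """First persisted row colliding with new_object on any unique key."""
    for columns in unique_column_tuples(cls):
        query = session.query(cls)
        for name in columns:
            query = query.filter(getattr(cls, name) == getattr(new_object, name))
        existing = query.first()
        if existing is not None:
            return existing
    return None
```

`Table.constraints` also contains the PrimaryKeyConstraint, hence the
isinstance filter.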
--- On Thu, 9/9/10, Michael Bayer <[email protected]> wrote:
> From: Michael Bayer <[email protected]>
> Subject: Re: [sqlalchemy] Copying instances from old to new schema DB
> To: [email protected]
> Date: Thursday, September 9, 2010, 9:54 AM
>
> On Sep 9, 2010, at 11:31 AM, chaouche yacine wrote:
>
> > Thank you Michael, thanks for the catch-the-error part. How about the
> > set-relations-right part, if I decided to go with my fix-as-you-go
> > recipe? Do you have any idea? The dictionary approach seems good, but I
> > am still curious about how to set relations generically on models. I
> > think I'll use it somewhere else in my code.
>
> You don't need to set any relations. If your new B has a fully populated
> primary key, you should just be using session.merge() for the whole thing.
> This can work with your "integrity error" scheme, or more simply with the
> "select first" schemes below:
>
>
> for row in csv:
>     new_object = make_an_object_from_csv_row(row)
>     new_object = session.merge(new_object)
> session.commit()
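A self-contained sketch of what that merge() loop relies on, with a made-up
City model on an in-memory SQLite database (modern SQLAlchemy assumed):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class City(Base):
    __tablename__ = "city"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# simulate a pre-existing row
session.add(City(id=1, name="Algiers"))
session.commit()

# an incoming object with a fully populated primary key: merge() turns it
# into an UPDATE of the existing row rather than a duplicate INSERT
incoming = City(id=1, name="Alger")
merged = session.merge(incoming)
session.commit()
```

merge() never adds the object you pass it; it returns the session-attached
copy, which is why the loop reassigns new_object.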
>
>
> if new_object does *not* have the correct primary key, then:
>
> for row in csv:
>     new_object = make_an_object_from_csv_row(row)
>     existing_object = session.query(cls).filter(
>         cls.the_unique_column == new_object.the_unique_column).first()
>     if existing_object is not None:
>         new_object.id = existing_object.id
>     new_object = session.merge(new_object)
> session.commit()
>
>
> if you don't like the many individual SELECT statements, then:
>
> # load a dictionary of (unique attr, primary key)
> lookup = dict(
>     session.query(cls.the_unique_column, cls.id)
> )
> for row in csv:
>     new_object = make_an_object_from_csv_row(row)
>     if new_object.the_unique_column in lookup:
>         new_object.id = lookup[new_object.the_unique_column]
>     new_object = session.merge(new_object)
> session.commit()
>
> or if most of your rows are replacement rows, quicker to
> preload everything:
>
>
> # load a dictionary of (unique attr, instance)
> lookup = dict(
>     session.query(cls.the_unique_column, cls)
> )
> for row in csv:
>     new_object = make_an_object_from_csv_row(row)
>     if new_object.the_unique_column in lookup:
>         new_object.id = lookup[new_object.the_unique_column].id
>     new_object = session.merge(new_object)
> session.commit()
>
>
>
>
>
> >
> > By the way, I think the pseudocode should have been:
> >
> > instance = next_instance_from_csv()
> > try:
> >     session.commit()
> > except IntegrityError, e:
> >     session.rollback()
> >     if e.orig == UniqueConstraintError:
> >         original_instance = ModelClass.get(instance.id)
> >         for relation_name in instance.get_relation_names():
> >             # Is this correct ?
> >             path_to_attribute = "mapper." + relation_name + ".inverse"  # I don't know where to find this ?
> >             instance.set_attribute(path_to_attribute, original_instance)
> >         session.commit()
> >
> > class BaseModel(DeclarativeBase):
> >     ...
> >     def set_attribute(self, path, value):
> >         nodes = path.split(".")
> >         current_node = self
> >         for next_node in nodes[:-1]:
> >             current_node = getattr(current_node, next_node, None)
> >             if current_node is None:
> >                 raise InvalidAttribute
> >         set_attribute(current_node, nodes[-1], value)  # sqla's set_attribute, for whatever reason...
> >
> >
> > Y.Chaouche
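For the get_relation_names() piece quoted above, modern SQLAlchemy can answer
it through the inspection API; the City/Citizen models here are made up for
illustration:

```python
# Sketch: list the relationship() attribute names of a mapped class.
from sqlalchemy import Column, ForeignKey, Integer, String, inspect
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class City(Base):
    __tablename__ = "city"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    citizens = relationship("Citizen", back_populates="city")

class Citizen(Base):
    __tablename__ = "citizen"
    id = Column(Integer, primary_key=True)
    city_id = Column(Integer, ForeignKey("city.id"))
    city = relationship("City", back_populates="citizens")

def get_relation_names(cls):
    """Names of all relationship() attributes on a mapped class."""
    return [rel.key for rel in inspect(cls).relationships]
```

inspect(cls) returns the Mapper, whose .relationships collection carries one
RelationshipProperty per relationship().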
> >
> >
> > --- On Thu, 9/9/10, Michael Bayer <[email protected]> wrote:
> >
> >> From: Michael Bayer <[email protected]>
> >> Subject: Re: [sqlalchemy] Copying instances from old to new schema DB
> >> To: [email protected]
> >> Date: Thursday, September 9, 2010, 5:59 AM
> >>
> >> On Sep 9, 2010, at 6:31 AM, chaouche yacine wrote:
> >>
> >>> Hello list,
> >>>
> >>> My schema has changed, and now I want to retrieve my old data (of the
> >>> old schema) into the new database (with the new schema) from csv files
> >>> (I export the tables of the old database to csv files and then load
> >>> those csv files into the new database, with some column mapping and
> >>> some gap filling for new columns).
> >>>
> >>> The problem is that my new schema has a UniqueConstraint for some
> >>> tables (e.g. the name column of the city table should be unique within
> >>> a country). This constraint was not present in the old schema and the
> >>> data is corrupt (two cities with the same name in the same country).
> >>> So when I try to insert them into the new database, I get
> >>> IntegrityErrors.
> >>>
> >>> The solution I thought of was:
> >>> * Catch the IntegrityError exception.
> >>> * If it's a problem on a UniqueConstraint, then the exception was
> >>>   raised because I tried to insert instance B that has the same "key"
> >>>   as instance A that was inserted before.
> >>> * So for all children of B (B's relations), set their parent to A.
> >>>   For example, for all citizens of B, set their city to A, because A
> >>>   and B ought to be the same.
> >>> * Then, safely ignore B and move on to the next instance.
> >>>
> >>> Here's what has been done so far (it works; I just use pseudocode for
> >>> "illustration" purposes. If necessary, you can look at the actual
> >>> attached source files):
> >>>
> >>> line = csvloader.next_row()
> >>> ModelClass = get_current_model()
> >>> instance = ModelClass.create_instance(**(to_dict(line)))
> >>> session.add(instance)
> >>>
> >>> I wish I could do something like this:
> >>> try:
> >>>     session.commit()
> >>> except IntegrityError, e:
> >>>     session.rollback()
> >>>     error = get_error()
> >>>     if type_of(error) == UniqueConstraintError:
> >>>         original_instance = ModelClass.get(instance.id)
> >>>         for relation in instance.get_relations():
> >>>             # Is this correct ?
> >>>             instance.relation.inverse = original_instance
> >>>         session.commit()
> >>>
> >>> My questions are: how to write get_error and type_of, where to get
> >>> UniqueConstraintError, how to write get_relations, how to set the
> >>> inverse of a relation (is instance.relation.inverse the right thing
> >>> to set?), and is this approach correct?
> >>
> >> There's no portable way to detect "unique constraint" errors across
> >> DBAPIs; you'd have to catch the specific error that you've observed
> >> your DBAPI emits, as well as the message/codes contained within it,
> >> and code against that. SQLAlchemy wraps DBAPI exceptions in its own
> >> same-named wrapper, and the object emitted by the DBAPI is available
> >> under the ".orig" member.
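A runnable sketch of that, with a made-up City model on in-memory SQLite;
the string matching against e.orig is backend-specific, which is exactly the
non-portability being described:

```python
from sqlalchemy import Column, Integer, String, UniqueConstraint, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class City(Base):
    __tablename__ = "city"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    country = Column(String(50))
    __table_args__ = (UniqueConstraint("name", "country"),)

engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

def is_unique_violation(exc):
    # exc.orig is the raw DBAPI exception; SQLite's message reads
    # "UNIQUE constraint failed: ...", but the message/code differs
    # per backend, so adapt this check to your DBAPI.
    return "unique" in str(exc.orig).lower()

session.add(City(name="Algiers", country="Algeria"))
session.commit()

session.add(City(name="Algiers", country="Algeria"))  # duplicate key
try:
    session.commit()
    duplicate_detected = False
except IntegrityError as e:
    session.rollback()
    duplicate_detected = is_unique_violation(e)
```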
> >>
> >> One improvement to the recipe would be to use savepoints, if supported
> >> by your database, so that when you issue rollback() inside a savepoint
> >> block, the whole transaction isn't rolled back and you can keep the
> >> whole operation in one transaction. The session can start a savepoint
> >> using begin_nested().
> >>
> >> When I do csv loads like these, however, I usually load the full list
> >> of identifiers to be checked into memory ahead of time. If the csv is
> >> less than 100K rows and represents the full table of data, I just load
> >> the whole thing into a dictionary, keyed according to whatever "these
> >> are the fields that are unique" I'm dealing with, and consult the
> >> dictionary as I go along. Otherwise, I load individual blocks of data
> >> as I scan through portions of the csv (like, give me all the records
> >> where id in (a, b, c, d, e, f, g, ...) for this block). There's
> >> nothing "wrong" with the "catch-the-error" approach, except that I
> >> like an entire operation to be in a single transaction, and it's
> >> usually simpler and faster just to work with a Python dictionary.
> >>
> >>
> >>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> <loadCSV.py><db.py>
> >>
> >
> >
> >
> >
>
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.