On 06/07/2010 07:27 PM, Az wrote:
>> By default, deepcopy will make one copy of everything in the object
>> graph reachable by the object you feed it. The scary part is that,
>> unless you also pass in a /memo/ argument to each call to deepcopy, it
>> will copy the entire graph /every single call/. So if you deepcopy the
>> students dictionary and then deepcopy the projects dictionary, each
>> student's allocated_proj attribute will not match any instance in the
>> projects dictionary. This is why a use-case-specific copy function is
>> recommended: it is a lot easier to predict which objects will get copied
>> and which objects will be shared.
>>     
>
>
> Shouldn't it match? I mean the student can only get allocated a
> project if it exists in the projects dictionary... or is that not the
> point?
>
> By use-case-specific, you mean I'll have to redefine deepcopy inside
> each class like this: def __deepcopy__(self): something, something?
>
> The only two places where this is an issue is for Supervisor's
> "offered_proj" attribute (a set) where, naturally, each project is an
> object and in Project where "proj_sup" is naturally a supervisor
> object :D
>
> The usefulness of my data structures comes back to bite me now...
>
>   

In theory, the following will work, ignoring ORM deepcopy issues
discussed at the beginning of this thread:

import copy

memo = {}
copied_students = copy.deepcopy(students, memo)
copied_supervisors = copy.deepcopy(supervisors, memo)
copied_projects = copy.deepcopy(projects, memo)

After these calls, memo maps the id() of each original object to its
copy. Examine memo.values() to see whether it copied more than you
expected. If it copied just what you expected, then my worries were
unfounded.
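A minimal sketch of why the shared memo matters, using stand-in Student
and Project classes (your real classes will differ):

```python
import copy

class Project(object):
    pass

class Student(object):
    def __init__(self, proj):
        self.allocated_proj = proj

proj = Project()
projects = {"p1": proj}
students = {"s1": Student(proj)}

# Separate deepcopy calls without a shared memo: the project is
# copied twice, so the identities diverge.
cs = copy.deepcopy(students)
cp = copy.deepcopy(projects)
assert cs["s1"].allocated_proj is not cp["p1"]

# Shared memo across both calls: the project is copied exactly once,
# so the student's copy points at the copied projects dictionary's entry.
memo = {}
cs2 = copy.deepcopy(students, memo)
cp2 = copy.deepcopy(projects, memo)
assert cs2["s1"].allocated_proj is cp2["p1"]
```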

By use-case-specific, I meant define your own copy_objects function that
explicitly specifies what is copied:

def copy_objects(students, supervisors, projects):
    memo = {}

    def copy_student(student):
        student_id = id(student)
        if student_id in memo:
            return memo[student_id]

        copied_student = Student()
        # Register the copy before filling in attributes, in case of cycles.
        memo[student_id] = copied_student
        copied_student.attr1 = student.attr1
        # [copy the rest of the student's attributes]
        # If you need to copy the student's project:
        copied_student.allocated_proj = copy_project(student.allocated_proj)
        return copied_student

    # [define copy_supervisor and copy_project along the same lines]

    copied_students = dict((key, copy_student(student))
                           for (key, student) in students.iteritems())
    copied_supervisors = dict((key, copy_supervisor(supervisor))
                              for (key, supervisor) in supervisors.iteritems())
    copied_projects = dict((key, copy_project(project))
                           for (key, project) in projects.iteritems())
    return (copied_students, copied_supervisors, copied_projects)

As you can see, this makes it clear which objects are copied and which
are shared. In retrospect, I think I assumed you didn't want to make
copies of your supervisors or projects when I recommended the
use-case-specific approach, which kind of violates the spirit of
deepcopy. Oh well, my bad.
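For completeness, the __deepcopy__ hook you asked about takes a memo
argument, not just self. A sketch, assuming a Supervisor class whose
offered_proj set should share (not copy) its Project objects:

```python
import copy

class Project(object):
    pass

class Supervisor(object):
    def __init__(self, name, offered_proj=None):
        self.name = name
        self.offered_proj = offered_proj if offered_proj is not None else set()

    def __deepcopy__(self, memo):
        # deepcopy calls this with its memo dict; register the copy
        # before copying attributes in case of reference cycles.
        copied = Supervisor(self.name)
        memo[id(self)] = copied
        # Build a new set, but share the Project objects themselves.
        copied.offered_proj = set(self.offered_proj)
        return copied

proj = Project()
sup = Supervisor("Dr. X", set([proj]))
sup_copy = copy.deepcopy(sup)
assert sup_copy is not sup
assert proj in sup_copy.offered_proj  # the project is shared, not copied
```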

>> class Student(object):
>>     [existing definitions]
>>
>>     def create_db_record(self):
>>         result = StudentDBRecord()
>>         result.ee_id = self.ee_id
>>         [copy over other attributes]
>>         return result
>>
>> class StudentDBRecord(object):
>>     pass
>>     
> The create_db_record function... does it have to called explicitly
> somewhere or does it automatically run?
>   

You have to call it explicitly, e.g.:

for unmapped_student in unmapped_students:
    mapped_student = unmapped_student.create_db_record()
    # I assume you want "find or create" behavior,
    # so use session.merge instead of session.add.
    mapped_student = session.merge(mapped_student)

> [...]
>   
>> I think a primary key of
>> (run_id, session_id/trial_id, stud_id) would be good
>>     
> If I make them all primary keys I get a composite key right? Within an
> entire M-C simulation the stud_id's would repeat in groups -- so if
> there are 100 simulations, each stud_id appears 100 times in that
> commit.
>
> Run_id is a fantastic idea! I'd probably have it be the date and time?
> Given that the simulation takes a while to run... the time will have
> changed sufficiently for uniqueness. However, then querying becomes a
> pain because of whatever format the date and time data will be in...
> so in that case, what is a GUID and is that something we could give to
> the Monte-Carlo ourselves before the run as some sort of argument? It
> would be the same for an entire run but different from run to run (so
> not unique from row to row, but unique from one run set to the other).
> Any thoughts on this?
>   

Yes, session_id/trial_id and stud_id can repeat, and you can still group
things together by run_id. Alternatively, you could add an
autoincrementing primary key to SimAllocation, but I believe it is
redundant since the combination (run_id, session_id/trial_id, stud_id)
should be unique anyway. run_id can definitely be a datetime, but I'm
not sure how well sqlite (it sounds like you're using sqlite) supports
datetimes in queries (see
http://www.sqlalchemy.org/docs/reference/dialects/sqlite.html#date-and-time-types).
A GUID (or UUID) is just a 128-bit value (usually random); the benefit
here is you can generate it on the client side and be confident that it
will be unique on the server (to avoid duplicate primary key errors).
Using datetimes or database sequences would also work. You can
definitely pass the run_id as an argument to monteCarloBasic, or to each
object's create_db_record method.
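A sketch of the composite-key idea using plain sqlite3 and a uuid4
run_id (the table and column names below are stand-ins for your
SimAllocation schema):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sim_allocation (
        run_id   TEXT NOT NULL,
        trial_id INTEGER NOT NULL,
        stud_id  INTEGER NOT NULL,
        proj_id  INTEGER,
        PRIMARY KEY (run_id, trial_id, stud_id)
    )
""")

# One run_id per Monte-Carlo run, generated on the client side.
run_id = str(uuid.uuid4())

conn.execute("INSERT INTO sim_allocation VALUES (?, ?, ?, ?)",
             (run_id, 1, 42, 7))
# The same stud_id may repeat in a different trial of the same run...
conn.execute("INSERT INTO sim_allocation VALUES (?, ?, ?, ?)",
             (run_id, 2, 42, 9))
# ...but repeating the full (run_id, trial_id, stud_id) triple is rejected.
try:
    conn.execute("INSERT INTO sim_allocation VALUES (?, ?, ?, ?)",
                 (run_id, 1, 42, 8))
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```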

-Conor
