Re: [sqlalchemy] More efficient Insert mechanism

Mati Skiva Tue, 02 Feb 2010 10:51:35 -0800

Michael Bayer wrote:

Mati Skiva wrote:

I believe I took these issues into account.
But just to be sure, maybe you can highlight something for me.


I assume the followings about the process of session.add:
* after session.add is called, the objects are placed in a to-do pool
(maybe)
* for self-generating-id objects, a special process is used, one that
fetches the generated id after the insert
* for no self-generating-id objects, a regular process of insert is used

I came to this conclusion, because otherwise, after each insert all the
data of the row is retrieved and placed inside the object. Which covers
the newly generated id.


Since you're interested in hacking SQLAlchemy internals, it would helpful
if you could familiarize yourself fully with the SQL expression and
execution API, all of which is public and documented.   "id generation" is
a standard feature of this system and occurs outside of the ORM.  Fetching
the ID for a new row with the SQLA expression API is a matter of calling a
ResultProxy method which the ORM only calls if it doesn't already know the
full set of primary key columns for a specific object.   The backend
implementation of "id fetching" varies wildly across all dialects as well
as versions of each database in use.

The flush procedure updates or expires those fields of the object which
were known to be generated by the database.   Fields which are immediately
updated include some or all of the primary key columns.  Fields which are
expired include any other columns which were generated via server-side SQL
expressions or known schema-level defaults - these are fetched when the
object's expired attributes are next accessed, if at all.

The SQL compiler procedure is aware of these columns when it generates an
individual insert() construct, given a list of parameters which are to be
bound literals and others which are to be embedded SQL expressions.   When
executed, the ResultProxy API provides these details fully - you should
familiarize yourself with all of its methods.

I would also point you to http://www.sqlalchemy.org/trac/ticket/1518 ,
which suggests that we should open up Mapper to being subclassable (for
much more exotic purposes than what you have here).  However, this
subclassing is possible now, which is where you can have your "insertmany"
functionality rolled in a rudimental way for your immediate use case
without burdening SQLAlchemy core with the need to have a fully
tested/supported/performant feature.   See the attached example.

Thank you for the pointers, as well as the example.

I attempted using it, however quickly enough I was forced to dig intothe internals. I shall explain why.

from inspecting mapper.py's _save_obj method, I found that after everyinsert the following items are required:


   * value_params - used by _postfetch
   * params - used by _postfetch
   * ResultProxy.last_inserted_params() - used by _postfetch
   * ResultProxy.postfetch_cols() - used within _postfetch
   * ResultProxy.prefetch_cols() - used within _postfetch
   * ResultProxy.last_inserted_ids() - used by by _save_obj

All of these items shared the same problem, they are all dictionaries orlists. One for each insert, making them single-dimension data (inrespect to the insert).But when executing many, they became either None or kept theirsingle-dimension property (i.e. for two inserts, only one list isreturned, when I expect a list of lists)

That alone prevented me from overloading _save_obj. Because I amrequired to have these items for every inserted row.

So, I set upon transforming these items into multi-dimensional uponexecute-many.It brought me to default.py's __process_defaults where I implemented thefollowings:


   def __process_defaults(self):
       """generate default values for compiled insert/update statements,
       and generate last_inserted_ids() collection."""
       if self.executemany:
           if len(self.compiled.prefetch):
               drunner = self.dialect.defaultrunner(self)
               params = self.compiled_parameters
               for param in params:

# assign each dict of params toself.compiled_parameters;# this allows user-defined default generators toaccess the full

                   # set of bind params for the row
                   self.compiled_parameters = param
                   for c in self.compiled.prefetch:
                       if self.isinsert:
                           val = drunner.get_column_default(c)
                       else:
                           val = drunner.get_column_onupdate(c)
                       if val is not None:
                           param[c.key] = val
               self.compiled_parameters = params

           if self.isinsert:
               self._last_inserted_ids = []
               self._last_inserted_params = []

for compiled_parameters in self.compiled_parameters:

               drunner = self.dialect.defaultrunner(self)

               for c in self.compiled.prefetch:
                   if self.isinsert:
                       val = drunner.get_column_default(c)

                   if val is not None:
                       compiled_parameters[c.key] = val

               if self.isinsert:

self._last_inserted_ids.append([compiled_parameters.get(c.key, None) forc in self.compiled.statement.table.primary_key])

                   self._last_inserted_params.append(compiled_parameters)

           self.postfetch_cols = self.compiled.postfetch

self.prefetch_cols =self.compiled.prefetchelse: ... #rest of the code stays the same


I have one reservations:

* Is there any code that relies on some of the added attributes not toexists upon execute-many?


I was forced to make one extra change, for sqlite.py's post_exec method.

Basically, populating the self._last_inserted_ids attribute. But forsome reason self.cursor.lastrowid is None for execute-many (while it canactually be retrieved from the database server still)

So, these changes yielded...

   def post_exec(self):
       if ( self.cursor.lastrowid == None ):
           self.cursor.execute("SELECT last_insert_rowid()")
           lastrowid = self.cursor.fetchone()[0]
       else:
           lastrowid = self.cursor.lastrowid

if self.compiled.isinsert:

           if ( not self.executemany ):

if not len(self._last_inserted_ids) orself._last_inserted_ids[0] is None:self._last_inserted_ids = [lastrowid] +self._last_inserted_ids[1:]

           else:
               for i in range(len(self._last_inserted_ids)):
                   _last_inserted_ids = self._last_inserted_ids[i]

if not len(_last_inserted_ids) or_last_inserted_ids[0] is None:_last_inserted_ids = [lastrowid] +_last_inserted_ids[1:]

                   self._last_inserted_ids[i] = _last_inserted_ids

And than, everything was working great :)

And I mean everything, including default values, sql expressions andinserted ids.


Would you care to offer your opinion about my efforts?

Thanks in advance,
Mati


This mail was sent via Mobileye Mail-SeCure system.


--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] More efficient Insert mechanism

Reply via email to