[web2py] Re: ORM (?) : A Revisit, NOT a Rebuttal

Anthony Sat, 04 May 2013 10:14:57 -0700


> def a_child_was_born_in(countryName, cityName):
>>     query = (db.City.Name == cityName) & (db.City.Country.belongs(db.
>> Country.Name == countryName))
>>     db(query).update(Population=db.City.Population + 1)
>>
>>
> Obviously I could have done it this way, but I wanted an example that I 
> can use to illustrate the differences.
> Your suggestion is a circumstantial optimization to an example that is 
> meant to show something else.
>


Arnon, we have repeatedly asked you to offer use cases where the ORM will 
be either easier or more efficient than the DAL. In this case, you have 
concocted an example that makes it appear that the ORM has advantages, but 
only because you have used the DAL in the least efficient way possible. If 
your requirement in making comparisons is that the DAL must be required to 
do everything in the same fashion as the ORM, then this is nonsensical. The 
question should be whether the DAL can achieve the same outcome just as 
easily (i.e., code that is similarly easy to produce, understand, test, 
debug, etc.) and with similar efficiency. There's no reason the DAL code 
should therefore have to superficially resemble the ORM code or precisely 
replicate its operations.

Now, you might argue that this is just a bad example, and that there is 
some real-world example where the web2py DAL will be forced to do something 
like you have coded here, and that's where the ORM will have an advantage. 
If that's the case, then you should have no problem presenting such an 
example. If you cannot do so, then it is hard to take this seriously.
  

> So, in order to do the update, we do not first have to query the database 
>> to retrieve the record. This is actually an advantage over the ORM, which 
>> requires that you first retrieve the record before updating it.
>>
>
> That is not true. I could have done it otherwise - this is just for the 
> sake of the example.
> There is nothing architecturally preventing an ORM from issuing an update 
> without a select. 
>
It's an supplementation detail, not an architectural one. You may have 
> encountered ORMs that can't support that, but that doesn't mean that the 
> problem is in the architecture.
>

As far as I can tell, if you want to update a record in SQLA, you must 
first retrieve it from the database (if using the ORM). If you believe that 
is not the case, then please show the code for how you would do this in the 
ORM.

As for the architectural issue, I suppose in principle an ORM could update 
a record without a prior select, but then you lose the other benefits of 
the ORM, as you will not have an object representation of that updated 
record in Python. At this point, you might as well have a DAL. In other 
words, if both the ORM and the DAL handle this case via direct database 
updates, then this example is irrelevant for establishing the supposed 
advantages of an ORM.
 

> Furthermore, because web2py doesn't need to retrieve the records, it also 
>> has a processing and memory advantage over the ORM, which must create the 
>> record object, add it to the session, and hold it in memory.
>>
>
> That again is a circumstantial issue, pertaining to this example. In most 
> cases, there would be more reuse of the objects, so creating them would be 
> beneficial.
>

In web2py, you create the objects when you need them. You do not always 
need them. For example, if you are updating a record, you might retrieve it 
in one request, then update it in a second request -- no need for another 
retrieval in the second request (unless you want to check for intervening 
changes).
 

> Also, I don't see how in web2py's DAL it is any different - all of the 
> objects you are using in the query/update are objects that has to be 
> created for you to use them...
>

No, in web2py, you do not need to create a Row object in order to update a 
record in the database.
 

> In fact, in an ORM, only the objects that are needed for the query may be 
> created for each transaction, but due to how web2py is executing, you are 
> actually having to create the entire-schema of objects from scratch at 
> every request, so I can't see how web2py would create less-objects - it 
> actually would create more...
>

Not sure what you're talking about here. If you are talking about defining 
tables in web2py at every request, it is not required that you define all 
the tables for the entire schema -- you can define conditionally depending 
on what you need. If you are talking about creating instances of Row 
objects, then you are mistaken -- only those you explicitly create are 
created, and exactly when you explicitly create them.
 

> Now, lets say we want to optimize that, so we do a "Lazy" version of those 
>>> functions.
>>>
>>
>> There's not much to optimize here. If you don't know ahead of time that 
>> you will be making two updates to the same record (which may possibly 
>> negate each other),
>>
>
> No. My point here was that a lazy-query might be beneficial in some cases, 
> so it could be integrated into the functions, at least optionally, and be 
> chosen by the caller, based on the circumstances.
>

Yes, and my point was that we're already done optimizing because there is a 
better approach in the DAL. If you have an example where your proposed 
optimization would really be the only way, please show it.
   

> Assuming this is SQLA, I don't think that's quite the right syntax -- it 
>> appears you are creating object instances rather than issuing queries. I 
>> believe it should be something like this:
>>
>> This isn't SQLA - it isn't anything at this point - just a suggestion.
>

Well, that's a problem, as it means you are assuming it is feasible to 
build an ORM with some magic behavior without being able to point to any 
existing implementation of such behavior.

Again, assuming this is SQLA, that's not how it works. SQLA does not cache 
>> queries -- when you run a query, it doesn't know what record will be 
>> retrieved, so it doesn't know whether it already has the associated object 
>> in the session. Hence, it will re-run the query both times. (The exception 
>> to this is when you use .get() to fetch a record by primary key, which we 
>> are not doing here.)
>>
>
> See, here is where you got it all wrong and where everything after that 
> brakes down.
>

Sorry, but I believe it is the other way around.
 

> I'm not sure how SQLA is doing this, I can look it up, but I did not 
> assume this to be SQLA.
>
But I think you are wrong even in that case. An ORM layer should be smart 
> enough to know that if you are issuing the same query twice within a 
> transaction, than it should know it already has what it needs. It does not 
> need to cache the query itself to do that. The way it works, is that for 
> each object, there are attributes that have been asked for by a query. So, 
> say, in this case, in the second query (in the second function) the 
> country-object of "France", will have already been existing in memory, so 
> it would be reused, even if the original query was a join. The ORM should 
> be smart enough to generate the "France" object, even though it is not 
> queried explicitly. It might not even have a primary-key in 
> place, because the join operations did not return it, but it would still 
> exist as an object in memory, and have it's "Name" attribute populated with 
> "France", and his "City" attribute populated by some kind of sequence, that 
> would have the "Paris" City-object in place (again, with not even a 
> primary-key attribute inhabited), which will then have a "Population" 
> attribute, with a value of 2000001. So what would happen is that the ORM 
> object-graph would be traversed based on the query-parameters (again, using 
> MY proposed syntax, NOT the SQLA one that you pointed-out), and for each 
> object already existing there that matches the filter, it 
> would traverse inwards without issuing a query - eventually getting to the 
> "Paris" city-object - the EXACT SAME INSTANCE (!) would be returned, and 
> the value of the "Population" attribute, would be knocked back by one.
> Now granted, for this to work in an "Identity-Map", it should be flexible 
> enough to identify the already-existing "Paris" city object, even though it 
> is not asked for by a primary-key, since it is the only one in existence in 
> this case, as is matching the filter-criteria.
> So to summarize, there would be a single-select at the first 
> function-call, then a lazy-update, than a re-use of the city-object in the 
> second call, then another lazy-update, and then a unit-of-work check, that 
> would compare the attributes with their original values, resulting with 
> zero pending updates - so all-in-all, a single operation as I've described.
>

That all sounds great, but there are some problems. First, it is not how 
SQLA works -- from the docs:

However, it doesn’t do any kind of query caching. This means, if you say 
session.query(Foo).filter_by(name='bar'), even if Foo(name='bar') is right 
there, in the identity map, the session has no idea about that. It has to 
issue SQL to the database, get the rows back, and then when it sees the 
primary key in the row, then it can look in the local identity map and see 
that the object is already there. It’s only when you say query.get({some 
primary key}) that the Session doesn’t have to issue a query...


Additionally, the Session stores object instances using a weak reference by 
default. This also defeats the purpose of using the Session as a cache.

The Session is not designed to be a global object from which everyone 
consults as a “registry” of objects. 


Second, without the primary key, you cannot uniquely identify the 
appropriate record. Let's say there are two records in the database that 
match your query criteria, but you only happen to have one of them in the 
session. If you only query the session, you won't know whether you got the 
right record. I suppose you could get around this if you know for sure 
there is only one record that matches the query, but this won't generally 
be the case. It also won't help in cases where you are retrieving multiple 
records -- even if some of them happen to be in the session, it would still 
need to do a database query to get the others (and it wouldn't know whether 
there are others without actually doing the database query, at which point, 
there is no point to the session query).

Also, note that your proposal requires a mechanism for querying the session 
(using the same syntax as for querying the database). Not only would this 
make for an even greater development effort, but it would add the overhead 
of that session query to every query you run. In fact, even when an object 
is already in the session, it may be nearly as fast or perhaps faster to 
simply query the database directly rather than traverse through the session 
in search of the records. Perhaps this is why SQLA doesn't query the 
session.

Another problem here. Whenever you execute a new query, SQLA flushes the 
>> pending changes. So, when you run the query in the second function, it will 
>> first issue the update to the database from the first change. Once it has 
>> done that, it will ultimately also have to issue the update from the second 
>> function (though perhaps at some later time) in order to have the correct 
>> value in the database.
>>
>
> SQLA would not "necessarily" flush these objects in this case, because 
> this is the same transaction. The ACID "C" of consistency, would assure it 
> that whatever it changed internally, would not have to be flushed before 
> the next query, since nothing else could have changed the transaction that 
> was produced by the database on it's end.
>

The problem is, because SQLA doesn't know what records the query will 
return, it cannot by default skip the flush, otherwise it may retrieve 
stale records. If you know for sure that the query to be issued will not 
return any records that you currently have in the session in a dirty state, 
then you can skip the flush, but you have to be explicit about that, in 
which case, you're no longer benefiting from the automatic behavior of the 
ORM. This is no different from the DAL. In any case, you could not turn off 
automatic flushing in this particular example because you wouldn't know at 
the time of the second query whether or not there are changes that need to 
be flushed.
 

> Granted, the default-behavior of a session in SQLA is to do a flush() 
> before each query (called "autoflushing") but this could be disabled, if 
> you want lazy-updates:
>

No, it could not be disabled without risking conflicts. You can only 
disable it when you know you won't have conflicts (i.e., same as in the 
DAL), which is not the case in this example.

So, I believe we have a minimum of 4 database hits with the ORM (5 if you 
>> lazy load the cities when running the initial query) versus 2 hits with 
>> web2py. 
>>
>
> Nope. We have a single database hit with an ORM.
>

Nope, we have 4 -- see above and below.
  

> We also have more processing and memory usage with the ORM.
>>
>
> Again, since it does not have to re-"execute" the entire 
> database-schema-object-creation on each request, than you got it 
> backwards... 
>

Again, not correct. First, in web2py, you are conflating model definition 
(i.e., DAL.Table) with data object creation (i.e., DAL.Row). Only the table 
definitions happen on each request, and even then, only the ones you need 
for that particular request (if you structure your app that way). As for 
data objects, web2py creates none in this example, and the ORM creates one 
(though it loads the data used to create that object into memory twice).
 

> To summarize:
>>
>>    - The ORM doesn't do direct updates to the database, so it must first 
>>    select the records before updating them, unlike web2py, which can issue a 
>>    direct update.
>>
>> I am sure that even using SQLA, it is possible to do a direct-update - if 
> not using the ORM layer, then surely when using the Core.
>

Yes, I'm sure you can do a direct update using the Core, but recall that 
this is an example where you claimed an ORM would outshine a DAL -- you 
can't then claim that the best way to do it is with the SQLA DAL -- web2py 
already has a DAL.
 

> And I am suggesting developing the same kind of layered-design, so you 
> could still do this via the DAL once you have an ORM on-top of it, though I 
> wouldn't do that explicitly - I would build-up a custom-methode within my 
> ORM class, that would do that using the DAL for me.
>

Fine, but that means this example is moot with regard to why you need an 
ORM to begin with, so we need another example.
 

> But this entire fiasco is completely irrelevant to the example presented, 
> as I used an (arguably poorly-chosen) example just to explaid other 
> capabilities.
>

Though, unfortunately, it doesn't even have the other capabilities you 
claim.
 

> But as you saw, even using my example, it still could be faster and more 
> efficient than the DAL by itself.
>

Seems like the opposite as far as I can tell.
 

>
>>    - The ORM doesn't cache queries, 
>>
>> That's irrelevant since it is not necessary.
>

That means it doesn't cache queries in the sense that you think it does 
(i.e., if you issue the same query, or a query that retrieves some of the 
same records, it does another database hit -- it does not first look for 
the records in the session). And please don't complain about the definition 
of the term "cache queries," as that is the same term used in the SQLA docs 
("query caching", to be more precise) to describe the behavior you imagined 
but does not actually happen.

>
>>    - so it must re-run the query twice, even though the record in 
>>    question is already in the session after the first query.
>>
>> Only if you use auto-flush (in SQLA's case)
>

No, even with auto-flushing turned off it does the select query twice. 
Turning off flushing prevents updates but does not obviate the need for a 
select -- these are two separate issues.

>
>>    - The ORM flushes pending changes before each query, so the first 
>>    update goes to the database before the second query is run, ultimately 
>>    necessitating both updates.
>>
>> (see above)
>

Yes, please do see above.
 

> Also, come to think of it, I am not sure the "flushing" is invalidating 
> the objects that were cashed, I think only a "commit()" does that, so not 
> even if ou leave auto-flush "on", so that the first "update" is being sent 
> when the runtime gets to the second query, it may STILL not follow that the 
> second-query would have to go to the database - it may already have the 
> caches-objects still "valid", even tough the flush() pushed them to the 
> transaction on the database.
>

Again, see above. It still does the query. *After* the query completes, it 
will recognize that the record pulled from the database is the same as the 
one stored in the session, so it will not create a new copy -- but that 
doesn't eliminate the select query.

>
>>    - The ORM must create the record object and manage various operations 
>>    in the session, using more memory and processing time than web2py.
>>
>> That may be so, but it does not automatically mean that it would be less 
> efficient overall, as it has some optimization-capabilities that the DAL 
> does not have.
>

Less efficient in this case and in lots of cases because of the overhead of 
managing the session. Perhaps this is balanced by other cases where it is 
more efficient, but we need to see what those other cases are.

In short, I think you need to look a little more closely at how SQLA really 
works, as it does not appear to do what you think it does. And note that it 
is not a useful response to simply say that we should imagine an ORM that 
does exactly what you wish, even if there is currently no ORM in existence 
that does those things. Your argument here has been that we need an ORM on 
top of the DAL because ORM's by their nature enable additional this 
functionality -- if you can't point to a single ORM that actually has such 
functionality, though, it's hard to argue that such functionality is an 
essential benefit of the ORM pattern, or more generally, that it couldn't 
simply be implemented in a DAL.

Anthony

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

[web2py] Re: ORM (?) : A Revisit, NOT a Rebuttal

Reply via email to