lazy-loading all sub-class columns as a group

Brian Phillips Thu, 30 Jul 2015 11:43:15 -0700

Hello - I have a feature request/suggestion to run by the group for 
feedback.


*tl;dr* - CTI objects that are not fully populated (because they were 
retrieved by the superclass dataset) lazily load missing columns in a 
less-than-optimal way.  I'd like to fix that

*Use Case:*
We have payments, payment_credit_cards and payment_checks set up using 
class table inheritance. We convert this set of objects into an array of 
hashes and then to JSON for an API.

*Problem:*
When constructing the hash for each object, they are already of the correct 
class but only have fields from the parent class loaded.  When accessing 
each column that we want in the resulting hash a new query is performed 
(CTI uses the lazy_attributes plugin to facilitate this).  The 
tactical_eager_loading plugin does help in that each column-specific query 
loads values for all relevant objects in the dataset but if you have a 
large number of columns in a sub-table it results in a lot of additional 
queries when in theory, you should be able to load all of those values at 
once.

*Proposed Solutions:*
In my opinion, we need a way for columns on a sub-class that aren't already 
loaded to be loaded as a group instead of one by one as they are accessed.

*Option #1* - implicitly load missing CTI columns when the first one is 
accessed

This is what I implemented 
here: https://github.com/jeremyevans/sequel/pull/1055

Basically, the interface would work like this:

cc_payment = Payment.where(kind: PaymentCreditCard).first # no columns from 
payment_credit_cards loaded
exp_month = cc_payment.expiration_month # all columns from 
payment_credit_cards loaded
exp_year = cc_payment.expiration_year   # no query done as the 
`expiration_year` column was already loaded on the prior line

*Option #2* - add a method that requires the caller to explicitly load 
missing columns (but still load them as a group)

This would preserve existing behavior in all cases and expose a new method 
that could be called to load extra columns.  Something like this:

cc_payment = Payment.where(kind: PaymentCreditCard).first # no columns from 
payment_credit_cards loaded
cc_payment.load_attributes(cc_payment.class.columns - cc_payment.keys) # 
all columns from payment_credit_cards loaded
exp_month = cc_payment.expiration_month # no query done
exp_year = cc_payment.expiration_year   # no query done

*Option #3* - combination of #1 and #2 with an option in the CTI plugin 
that enables #1's behavior (otherwise, #2 is still available on demand)

Something like this:

class Payment < Sequel::Model
  plugin :class_table_inheritance, bulk_lazy_load: true, ... # enables 
Option #1's behavior
end

class SomeOtherModel < Sequel::Model
  plugin :class_table_inheritance, ... # maintains current behavior
end

On classes that don't have the "bulk_lazy_load" set to "true", you would 
use the "load_attributes" method defined in option #2.  Otherwise, your 
code could rely on option #1's implicit loading.

*My Preference:*
Obviously, since I coded up Option #1, it would be my first choice 
(although I could live with any of the above).  The reason I think #1 (or 
at minimum #3) is the best choice is that I think the caller shouldn't have 
to think about whether the object it's working with is fully loaded or not. 
 In this case, I have a PaymentCreditCard object that may or may not be 
fully populated (depending on if I used the Payment or the 
PaymentCreditCard dataset to retrieve it).  I think it would be a good 
feature for the CTI (and thus the lazy_attributes) plugins to support the 
ability to load all the necessary columns implicitly when it deems it 
necessary in the most efficient way possible.  Right now, it loads the 
columns implicitly, but in a less-than-optimal fashion (retrieving one 
column at a time from the DB).  I'm doubtful that it's obvious to those 
using the CTI plugin that it has this characteristic which is why I think 
it should always be loaded implicitly.  However, I would be fine with 
making it configurable when the plugin is enabled for a class (option #3).

There is some prior art for this exact implementation 
here: http://docs.sqlalchemy.org/en/rel_0_9/orm/loading_columns.html

*The Question:*
What option do you find most intuitive and preferable?

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

lazy-loading all sub-class columns as a group

Reply via email to