Re: lazy-loading all sub-class columns as a group

Jeremy Evans Thu, 30 Jul 2015 13:49:41 -0700

On Thursday, July 30, 2015 at 11:42:36 AM UTC-7, Brian Phillips wrote:
>
> Hello - I have a feature request/suggestion to run by the group for 
> feedback.
>
> *tl;dr* - CTI objects that are not fully populated (because they were 
> retrieved by the superclass dataset) lazily load missing columns in a 
> less-than-optimal way.  I'd like to fix that
>
> *Use Case:*
> We have payments, payment_credit_cards and payment_checks set up using 
> class table inheritance. We convert this set of objects into an array of 
> hashes and then to JSON for an API.
>
> *Problem:*
> When constructing the hash for each object, they are already of the 
> correct class but only have fields from the parent class loaded.  When 
> accessing each column that we want in the resulting hash a new query is 
> performed (CTI uses the lazy_attributes plugin to facilitate this).  The 
> tactical_eager_loading plugin does help in that each column-specific query 
> loads values for all relevant objects in the dataset but if you have a 
> large number of columns in a sub-table it results in a lot of additional 
> queries when in theory, you should be able to load all of those values at 
> once.
>
> *Proposed Solutions:*
> In my opinion, we need a way for columns on a sub-class that aren't 
> already loaded to be loaded as a group instead of one by one as they are 
> accessed.
>
> *Option #1* - implicitly load missing CTI columns when the first one is 
> accessed
>
> This is what I implemented here: 
> https://github.com/jeremyevans/sequel/pull/1055
>
> Basically, the interface would work like this:
>
> cc_payment = Payment.where(kind: PaymentCreditCard).first # no columns 
> from payment_credit_cards loaded
> exp_month = cc_payment.expiration_month # all columns from 
> payment_credit_cards loaded
> exp_year = cc_payment.expiration_year   # no query done as the 
> `expiration_year` column was already loaded on the prior line
>
> *Option #2* - add a method that requires the caller to explicitly load 
> missing columns (but still load them as a group)
>
> This would preserve existing behavior in all cases and expose a new method 
> that could be called to load extra columns.  Something like this:
>
> cc_payment = Payment.where(kind: PaymentCreditCard).first # no columns 
> from payment_credit_cards loaded
> cc_payment.load_attributes(cc_payment.class.columns - cc_payment.keys) # 
> all columns from payment_credit_cards loaded
> exp_month = cc_payment.expiration_month # no query done
> exp_year = cc_payment.expiration_year   # no query done
>
> *Option #3* - combination of #1 and #2 with an option in the CTI plugin 
> that enables #1's behavior (otherwise, #2 is still available on demand)
>
> Something like this:
>
> class Payment < Sequel::Model
>   plugin :class_table_inheritance, bulk_lazy_load: true, ... # enables 
> Option #1's behavior
> end
>
> class SomeOtherModel < Sequel::Model
>   plugin :class_table_inheritance, ... # maintains current behavior
> end
>
> On classes that don't have the "bulk_lazy_load" set to "true", you would 
> use the "load_attributes" method defined in option #2.  Otherwise, your 
> code could rely on option #1's implicit loading.
>
> *My Preference:*
> Obviously, since I coded up Option #1, it would be my first choice 
> (although I could live with any of the above).  The reason I think #1 (or 
> at minimum #3) is the best choice is that I think the caller shouldn't have 
> to think about whether the object it's working with is fully loaded or not. 
>  In this case, I have a PaymentCreditCard object that may or may not be 
> fully populated (depending on if I used the Payment or the 
> PaymentCreditCard dataset to retrieve it).  I think it would be a good 
> feature for the CTI (and thus the lazy_attributes) plugins to support the 
> ability to load all the necessary columns implicitly when it deems it 
> necessary in the most efficient way possible.  Right now, it loads the 
> columns implicitly, but in a less-than-optimal fashion (retrieving one 
> column at a time from the DB).  I'm doubtful that it's obvious to those 
> using the CTI plugin that it has this characteristic which is why I think 
> it should always be loaded implicitly.  However, I would be fine with 
> making it configurable when the plugin is enabled for a class (option #3).
>
> There is some prior art for this exact implementation here: 
> http://docs.sqlalchemy.org/en/rel_0_9/orm/loading_columns.html
>
> *The Question:*
> What option do you find most intuitive and preferable?
>


My main issue with #1 is that it can cause performance issues by retrieving 
unnecessary columns.  In Brian's app, he wants all the columns for the JSON 
serialization, but I'm not sure how common that need is.  Consider a CTI 
subclass table with an integer column and a blob column.  If you want to 
read the integer column, you shouldn't be forced to load the blob column.

I definitely think this is something that Sequel should expose an instance 
method for, so that users have the choice what to load.  I'm OK with option 
#3, since that makes it opt-in for the users that want everything loaded.

I'm eager to get feedback from Sequel users about this.  I don't run CTI in 
production, so while I'm leaning against option #1, I would still consider 
it.  Especially if an instance method is added that chooses what to load, 
then defaulting to loading everything when an unloaded column is queried 
can be worked around by preloading just the columns you are interested in.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

Re: lazy-loading all sub-class columns as a group

Reply via email to