Re: Rcte tree models with many-to-many associations are slow

Jeremy Evans Tue, 03 Dec 2013 17:41:21 -0800

On Tuesday, December 3, 2013 3:07:35 PM UTC-8, Nels Nelson wrote:

> On Tuesday, December 3, 2013 2:38:35 PM UTC-6, Jeremy Evans wrote:
>>
>> On Mon, Dec 2, 2013 at 9:00 PM, Nels Nelson wrote:
>>
>>> Thank you!  I completely failed to find that in the docs.  I'm sure it's 
>>> there, and I just missed it.  It is exactly ideal, and it works perfectly.
>>>
>>
>> Great!
>>
>
> Oops, I spoke too soon.  So, the after_load method applies exclusively to 
> the artist node's albums association after the album association loads, and 
> is defined with a proc on the association definition.  What I really need 
> is something that runs after the artist node loads, and then iterates over 
> the album association members for that node.  Removing the 
> after_initialize method and instead trying to use the after_load method 
> on the album association resulted only in my custom code not getting 
> invoked at all, resulting in the perceived performance improvement.
>


What's the reason you need this to be in after_initialize, instead of 
called lazily when the association is loaded?  Note that if you are eager 
loading the association with the object, after_load will operate as if it 
was called by after_initialize.  In other words:

  Bar.many_to_many :foos, :after_load=>proc{...}
  Bar.eager(:foos).all # the :foos after_load proc has already been called once for each bar
  

> class Artist < Sequel::Model
>   plugin :rcte_tree
>   plugin :caching, GloballyDefinedConcurrentHashMapInstance
>   plugin :tactical_eager_loading
>   many_to_many :albums,
>                :class => Album,
>                :join_table => :artists_to_albums,
>                :left_key => :artist_id,
>                :right_key => :album_id
>
>   def after_initialize
>     for influenced in self.children
>       for album in influenced.albums
>         something_special_with(album)
>       end
>     end
>   end
> end
>
> Many Artists, may be associated to many Albums.  Artists are nodes in a 
> tree.  Upon loading of an Artist node's children Artist nodes, I would like 
> each of the children Artist nodes to have included each of their respective 
> associated Albums so that they are available immediately, and without 
> additional SQL queries.
>

You probably want to add this after plugin :rcte_tree:

  one_to_many :children, :clone=>:children, :eager=>:albums
 
This makes sure that when you load the children association, the albums 
association for the children is eagerly loaded.

I still can't detect a reason that you would want to do this during 
after_initialize, unless something_special_with(album) mutates the 
receiver.  Is that the case?
 

>
> So when I do,
>
>   jimi = Artist.where(:name => 'Jimi Hendrix')
>   # Meanwhile, the after_initialize hook method is executed.
>
> (For the sake of this example, please ignore the fact that in the real 
> world, artists may of course have more than one influencer.)
>

I should probably also ignore that Model.where returns a dataset, not a 
model instance. :)  I'm assuming you want to add a .first at the end there.  
Assuming that's the case, this should be three queries:

1) Retrieve the first artist with the name Jimi Hendrix
2) Retrieve the children of that artist
3) Retrieve the albums for those children
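
To make the query count concrete, here is a plain-Ruby sketch that only simulates queries (no Sequel involved). FakeDB, its data, and every method name below are hypothetical stand-ins, not Sequel API. It shows the general idea: eager loading fetches the albums for all children in one batch, so the total stays at three however many children there are, while loading albums child by child adds one query per child.

```ruby
# Hypothetical in-memory stand-in for a database; it just counts "queries".
class FakeDB
  attr_reader :queries

  def initialize
    @queries = 0
    @artists = [
      { id: 1, parent_id: nil, name: 'Jimi Hendrix' },
      { id: 2, parent_id: 1,   name: 'Child A' },
      { id: 3, parent_id: 1,   name: 'Child B' },
      { id: 4, parent_id: 1,   name: 'Child C' }
    ]
    # Join rows: [artist_id, album title]
    @artist_albums = [[2, 'A1'], [2, 'A2'], [3, 'B1'], [4, 'C1']]
  end

  def artist_by_name(name)
    @queries += 1
    @artists.find { |a| a[:name] == name }
  end

  def children_of(id)
    @queries += 1
    @artists.select { |a| a[:parent_id] == id }
  end

  # One simulated query returns the albums of any number of artists,
  # grouped by artist id.
  def albums_for(ids)
    @queries += 1
    @artist_albums.select { |artist_id, _| ids.include?(artist_id) }
                  .group_by(&:first)
                  .transform_values { |rows| rows.map(&:last) }
  end
end

# Lazy: one albums query per child.
def lazy_albums(db)
  jimi = db.artist_by_name('Jimi Hendrix')
  db.children_of(jimi[:id]).map { |c| db.albums_for([c[:id]])[c[:id]] || [] }
end

# Eager: one albums query covering all children.
def eager_albums(db)
  jimi = db.artist_by_name('Jimi Hendrix')
  kids = db.children_of(jimi[:id])
  by_artist = db.albums_for(kids.map { |c| c[:id] })
  kids.map { |c| by_artist[c[:id]] || [] }
end

lazy_db = FakeDB.new
eager_db = FakeDB.new
lazy_albums(lazy_db)
eager_albums(eager_db)
puts "lazy: #{lazy_db.queries} queries, eager: #{eager_db.queries} queries"
```

With three children, the lazy version issues five simulated queries and the eager version three.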
 

> It would be nice if at most only two queries were executed here.  Only one 
> query would be even better, but I can never keep straight whether or not 
> the children association can be loaded eagerly using rcte_tree -- let alone 
> loaded eagerly while also eagerly loaded with their respective albums 
> association members.  I'm pretty sure the answer is no.
>

Technically, you could do it in two queries if the children association 
uses :eager_graph=>:albums instead of :eager=>:albums (I think, I didn't 
test that).  I'm not sure it would perform better that way, though.
 

> Furthermore, it would be even nicer if after_initialize only ran one time 
> ever, or until the model instances were marked as modified, or until the 
> cached instances were explicitly cleared.  Unfortunately, the only instance 
> that winds up making it into the cache is the instance that gets assigned 
> to jimi.  And of course, none of jimi's albums make it into the cache 
> either.
>

The caching plugin only populates the cache when you do a primary key 
lookup, and only uses cache lookups by primary key, so based on the code 
you've shown, I wouldn't use it.  Roll your own caching:

  def after_initialize
    GloballyDefinedConcurrentHashMapInstance[pk] ||= children.each{|artist| 
      artist.albums.each(&method(:something_special_with))}
  end

So this doesn't cache the object, but caches the presumably expensive 
calculation. Good luck with cache invalidation :)
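
As a plain-Ruby illustration of that memoize-by-primary-key pattern (a sketch only: ResultCache is a hypothetical helper, and the ordinary Hash inside it is an assumed stand-in for the concurrent map), the block runs only when the key has no truthy value stored yet:

```ruby
# Sketch of memoizing an expensive per-record computation by primary key.
class ResultCache
  attr_reader :computations

  def initialize(store = {})
    @store = store        # plain Hash here; assumed stand-in for a concurrent map
    @computations = 0     # counts how often the block actually ran
  end

  # Returns the cached value for pk, running the block only on a miss.
  # Note: ||= runs the block again if the stored value is nil or false.
  def fetch(pk)
    @store[pk] ||= begin
      @computations += 1
      yield
    end
  end

  # Invalidation has to be explicit, e.g. when the record changes.
  def invalidate(pk)
    @store.delete(pk)
  end
end

cache = ResultCache.new
3.times { cache.fetch(1) { [:album_a, :album_b] } }
puts cache.computations # the block ran once for three lookups
```

Deciding when to call invalidate is the hard part, and this sketch does not attempt to solve it.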

> I hope this helps clear things up.  I'm having trouble being as clear as 
> I'd like to be.
>
> I did try using the suggested
>
> one_to_many :children, :clone=>:children, :eager=>:albums
>
>
> on the Artist model definition, but not only does it not seem to achieve 
> the desired performance improvements, it also causes some weird errors when 
> attempting to access a count of the cloned Artist.children association.  I 
> can investigate more about that, if need be.
>

If you post some details about that, it would be helpful.  Sequel 4.4.0 
fixed a couple bugs related to association cloning, but it's possible there 
are more.  You could always manually recreate the association in the 
meantime:

  one_to_many :children, :key=>:parent_id, :class=>self, :eager=>:albums
 

>
>
>>> ... a given node's branch is programmatically pre-order traversed, and 
>>> information retrieved.  I may have to rework this, because it is a bit of a 
>>> performance pain point for the application.
>>>
>>
>> That sounds interesting.  If you want to provide details, I may be able 
>> to give advice in that area. 
>>
>
> The algorithm is an adaptation from another program, and it is rather 
> convoluted.  Basically, it performs a pre-order traversal of a branch 
> starting from a given node, and then does some complex book-keeping along 
> the way, foregoing recursive pre-order traversals of certain nodes meeting 
> specific criteria.
>
> The reason this routine is so slow is because some hundreds of queries and 
> after_initialize hooks get executed against the datasource backend, even if 
> there are only a couple dozen nodes in the branch.
>

I'm assuming that's due to things not being eagerly loaded currently.  
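
Once the children are already in memory, the traversal itself needs no further queries. A minimal sketch, assuming a hypothetical Node class in place of the model (with its children preloaded) and an arbitrary skip criterion passed as a block; your real criteria and book-keeping would go where the block is:

```ruby
# Hypothetical in-memory node; stands in for a model with preloaded children.
class Node
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end
end

# Pre-order traversal: visit the node, then each child's subtree in order.
# If the block returns true for a node, that node is still visited but
# its subtree is not descended into.
def preorder(node, visited = [], &skip_subtree)
  visited << node.name
  return visited if skip_subtree && skip_subtree.call(node)
  node.children.each { |child| preorder(child, visited, &skip_subtree) }
  visited
end

tree = Node.new('root', [
  Node.new('a', [Node.new('a1'), Node.new('a2')]),
  Node.new('b', [Node.new('b1')])
])

p preorder(tree)                         # full pre-order walk
p preorder(tree) { |n| n.name == 'a' }   # visits 'a' but skips a1 and a2
```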
 

> I'm pretty certain that if the loading and caching problem with the 
> example above is solved, then this routine's performance will improve as 
> well.  Furthermore, if I were able to leverage the descendants dataset of 
> the given branch node, and somehow managed to devise a filter based on the 
> algorithm's criteria, then I imagine this bottleneck would be solved 
> completely.
>

There's always:

  jimi.descendants_dataset.where(...).all
 

>>> However, the associations with which I am concerned are not the children 
>>> associations.  They are just the normal many-to-many associations on the 
>>> individual nodes.
>>>
>>
>> OK.  I think the standard association :eager option or the 
>> tactical_eager_loading plugin should help in those cases, modulo caching 
>> issues.
>>
>
> I am having difficulty finding an example of how to apply the :eager option 
> to a many_to_many association definition.  Does this involve defining and 
> specifying a proc?  Or is it as simple as
>
>   many_to_many :albums, :eager => true
>

It would be something like:

  many_to_many :albums, :eager=>:some_Album_association

With this, loading the albums for a given artist would additionally preload 
that association for each of those albums (using a single query).
 

> Thanks again for your time, Jeremy.  I hope that you are not the only one 
> in this forum who ever replies to questions?  :D
>

For most of the hard questions, I am. :)  Many easy questions are answered 
by the documentation, so I do tend to answer most of the questions myself, 
though thankfully other contributors help out, especially when the question 
is more ecosystem related than Sequel related.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/groups/opt_out.
