Re: Streaming, eager loading, and associations

Jeremy Evans Fri, 26 Sep 2014 20:27:41 -0700

On Friday, September 26, 2014 7:28:30 PM UTC-7, Michael Porter wrote:
>
> I've run into an issue with a combination of streaming and associations, 
> and I cannot find a good way around it (though I'll admit relative 
> unfamiliarity with Sequel).
>
> I have a query that will return around 4 million records.  Obviously 
> fetching/processing these all at once would be quite a bit to handle, so I 
> elected to use the streaming functionality.  However, when streaming, I am 
> unable to resolve associations (as the association lookup occurs before the 
> stream read is finished), which throws a "Mysql2::Error: Commands out of 
> sync; you can't run this command now".  To combat this, I read that I could 
> trigger eager loading of the association(s) ... but only if I use ".all". 
>  That's not so great an option, with a dataset this size.
>
> My current test code is below:
> # Database configuration
> DB = Sequel.connect('mysql2://localhost/dummy', :stream => true, :logger 
> => Logger.new(STDOUT)) 
>
> # Database models
> class ReplicaFile < Sequel::Model(:replica_file) 
> end 
>
> class ReplicaHost < Sequel::Model(:replica_host) 
> end 
>
> class ReplicaHostFile < Sequel::Model(:replica_host_file) 
>   many_to_one :file, :class => ReplicaFile, :key => :replica_file_id 
>   many_to_one :host, :class => ReplicaHost, :key => :replica_host_id 
> end 
>
> # Methods 
> def unsynced_files(filters) 
>   ReplicaHostFile.where(filters)
> end 
>
> # Main script
> unsynced_files(:replica_host_id => 127, :file_status_id => [1, 4]).paged_each 
> do |r| 
>   p r.file.filename 
> end
>
>
> Are there any options, short of doing the join in SQL myself and forgoing 
> the models?  That's not an impossibility, but I'd really like being able to 
> use the encapsulation of the models if at all possible.
>
Doing it all in a single SQL query is the fastest method if it's possible. 
 I'd highly recommend that route, especially for 4 million records.


If you really want to do it in ruby, you can either use the sharding 
support and run the streaming query on a different shard than the query to 
get the associated records (Database :servers=>{:assoc=>{}} option and 
using dataset.server(:assoc) for one of the queries).  Alternatively, you 
can just run the query to get associated records in a different Thread from 
the Thread running the sharding query.

Note that you can't eager load the associated records, because eager 
loading requires access to all current records before eager loading, and 
you don't have that when streaming.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

Re: Streaming, eager loading, and associations

Reply via email to