Hi Jeremy,

Would it be possible to give a way to overwrite the "STREAMING_SUPPORTED" 
variable in the mysql2 adapter with an option in paged_each?

That way, we could still use paged_each, but use the normal limit, offset 
approach found in other orms. Wouldn't this resolve the issues that happen 
due to mysql streaming?

Aryk

On Friday, September 26, 2014 at 8:27:10 PM UTC-7, Jeremy Evans wrote:
>
> On Friday, September 26, 2014 7:28:30 PM UTC-7, Michael Porter wrote:
>>
>> I've run into an issue with a combination of streaming and associations, 
>> and I cannot find a good way around it (though I'll admit relative 
>> unfamiliarity with Sequel).
>>
>> I have a query that will return around 4 million records.  Obviously 
>> fetching/processing these all at once would be quite a bit to handle, so I 
>> elected to use the streaming functionality.  However, when streaming, I am 
>> unable to resolve associations (as the association lookup occurs before the 
>> stream read is finished), which throws a "Mysql2::Error: Commands out of 
>> sync; you can't run this command now".  To combat this, I read that I could 
>> trigger eager loading of the association(s) ... but only if I use ".all". 
>>  That's not so great an option, with a dataset this size.
>>
>> My current test code is below:
>> # Database configuration
>> DB = Sequel.connect('mysql2://localhost/dummy', :stream => true, :logger 
>> => Logger.new(STDOUT)) 
>>
>> # Database models
>> class ReplicaFile < Sequel::Model(:replica_file) 
>> end 
>>
>> class ReplicaHost < Sequel::Model(:replica_host) 
>> end 
>>
>> class ReplicaHostFile < Sequel::Model(:replica_host_file) 
>>   many_to_one :file, :class => ReplicaFile, :key => :replica_file_id 
>>   many_to_one :host, :class => ReplicaHost, :key => :replica_host_id 
>> end 
>>
>> # Methods 
>> def unsynced_files(filters) 
>>   ReplicaHostFile.where(filters)
>> end 
>>
>> # Main script
>> unsynced_files(:replica_host_id => 127, :file_status_id => [1, 
>> 4]).paged_each 
>> do |r| 
>>   p r.file.filename 
>> end
>>
>>
>> Are there any options, short of doing the join in SQL myself and forgoing 
>> the models?  That's not an impossibility, but I'd really like being able to 
>> use the encapsulation of the models if at all possible.
>>
> Doing it all in a single SQL query is the fastest method if it's possible. 
>  I'd highly recommend that route, especially for 4 million records.
>
> If you really want to do it in ruby, you can either use the sharding 
> support and run the streaming query on a different shard than the query to 
> get the associated records (Database :servers=>{:assoc=>{}} option and 
> using dataset.server(:assoc) for one of the queries).  Alternatively, you 
> can just run the query to get associated records in a different Thread from 
> the Thread running the sharding query.
>
> Note that you can't eager load the associated records, because eager 
> loading requires access to all current records before eager loading, and 
> you don't have that when streaming.
>
> Thanks,
> Jeremy
>

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

Reply via email to