Hi Jeremy,
Would it be possible to give a way to overwrite the "STREAMING_SUPPORTED"
variable in the mysql2 adapter with an option in paged_each?
That way, we could still use paged_each, but use the normal limit, offset
approach found in other orms. Wouldn't this resolve the issues that happen
due to mysql streaming?
Aryk
On Friday, September 26, 2014 at 8:27:10 PM UTC-7, Jeremy Evans wrote:
>
> On Friday, September 26, 2014 7:28:30 PM UTC-7, Michael Porter wrote:
>>
>> I've run into an issue with a combination of streaming and associations,
>> and I cannot find a good way around it (though I'll admit relative
>> unfamiliarity with Sequel).
>>
>> I have a query that will return around 4 million records. Obviously
>> fetching/processing these all at once would be quite a bit to handle, so I
>> elected to use the streaming functionality. However, when streaming, I am
>> unable to resolve associations (as the association lookup occurs before the
>> stream read is finished), which throws a "Mysql2::Error: Commands out of
>> sync; you can't run this command now". To combat this, I read that I could
>> trigger eager loading of the association(s) ... but only if I use ".all".
>> That's not so great an option, with a dataset this size.
>>
>> My current test code is below:
>> # Database configuration
>> DB = Sequel.connect('mysql2://localhost/dummy', :stream => true, :logger
>> => Logger.new(STDOUT))
>>
>> # Database models
>> class ReplicaFile < Sequel::Model(:replica_file)
>> end
>>
>> class ReplicaHost < Sequel::Model(:replica_host)
>> end
>>
>> class ReplicaHostFile < Sequel::Model(:replica_host_file)
>> many_to_one :file, :class => ReplicaFile, :key => :replica_file_id
>> many_to_one :host, :class => ReplicaHost, :key => :replica_host_id
>> end
>>
>> # Methods
>> def unsynced_files(filters)
>> ReplicaHostFile.where(filters)
>> end
>>
>> # Main script
>> unsynced_files(:replica_host_id => 127, :file_status_id => [1,
>> 4]).paged_each
>> do |r|
>> p r.file.filename
>> end
>>
>>
>> Are there any options, short of doing the join in SQL myself and forgoing
>> the models? That's not an impossibility, but I'd really like being able to
>> use the encapsulation of the models if at all possible.
>>
> Doing it all in a single SQL query is the fastest method if it's possible.
> I'd highly recommend that route, especially for 4 million records.
>
> If you really want to do it in ruby, you can either use the sharding
> support and run the streaming query on a different shard than the query to
> get the associated records (Database :servers=>{:assoc=>{}} option and
> using dataset.server(:assoc) for one of the queries). Alternatively, you
> can just run the query to get associated records in a different Thread from
> the Thread running the sharding query.
>
> Note that you can't eager load the associated records, because eager
> loading requires access to all current records before eager loading, and
> you don't have that when streaming.
>
> Thanks,
> Jeremy
>
--
You received this message because you are subscribed to the Google Groups
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.