Thanks. Is there a nice way to intelligently carve populated models off of
a direct query's results, assuming all of a model's columns are included in
the result and that no row duplication is introduced by a *_to_many
association? I imagine there are quite a few potential pitfalls, but I'd
really like to push developers to encapsulate business rules in models: I'm
restructuring a rather messy, haphazard Perl application, so being able to
populate models even when the result set necessitates direct queries would
be quite handy. If such a method doesn't exist, what gotchas could you
point out in populating and using a model with existing values from a table
(dirty checks, etc.)?
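
Something along these lines is what I had in mind (untested, and I'm only
guessing that Model.load is the right entry point for handing a model a
hash of existing column values; I'm also assuming replica_file's primary
key is a plain id column):

# Hand-written join; carve the replica_file columns off each row and hand
# them to ReplicaFile.load, which (if I understand it correctly) builds an
# instance treated as an existing, unmodified row rather than a new one.
DB[:replica_host_file].
  join(:replica_file, :id => :replica_file_id).
  where(:replica_host_id => 127).
  select(:replica_file_id, :filename).
  each do |row|
    file = ReplicaFile.load(:id       => row[:replica_file_id],
                            :filename => row[:filename])
    # file.new?      => false, I believe
    # file.modified? => false, so dirty tracking should start out clean?
    # Columns not selected above would be missing from the instance, which
    # is presumably one of the gotchas.
    p file.filename
  end

If load isn't the right tool here, or there are other dirty-tracking or
primary-key caveats I'm missing, I'd appreciate a pointer.
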
Thanks again for the help,
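
P.S. To make sure I'm reading the sharding suggestion correctly, is the
idea roughly the following? (Untested sketch, built on the test script
quoted below; :assoc is just the placeholder shard name from your example.)

# Give the Database a second shard (:assoc) that reuses the default connect
# options, so lookups for associated records can run on a separate
# connection from the one holding the open streaming result.
DB = Sequel.connect('mysql2://localhost/dummy',
                    :stream  => true,
                    :servers => {:assoc => {}},
                    :logger  => Logger.new(STDOUT))

unsynced_files(:replica_host_id => 127, :file_status_id => [1, 4]).paged_each do |r|
  # Fetch the related file via the :assoc shard instead of going through
  # r.file, so the lookup doesn't share the streaming connection.
  file = ReplicaFile.dataset.server(:assoc).first(:id => r.replica_file_id)
  p file.filename
end
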
On Friday, September 26, 2014 11:27:10 PM UTC-4, Jeremy Evans wrote:
>
> On Friday, September 26, 2014 7:28:30 PM UTC-7, Michael Porter wrote:
>>
>> I've run into an issue with a combination of streaming and associations,
>> and I cannot find a good way around it (though I'll admit relative
>> unfamiliarity with Sequel).
>>
>> I have a query that will return around 4 million records. Obviously
>> fetching/processing these all at once would be quite a bit to handle, so I
>> elected to use the streaming functionality. However, when streaming, I am
>> unable to resolve associations (as the association lookup occurs before the
>> stream read is finished), which throws a "Mysql2::Error: Commands out of
>> sync; you can't run this command now". To combat this, I read that I could
>> trigger eager loading of the association(s) ... but only if I use ".all".
>> That's not so great an option with a dataset this size.
>>
>> My current test code is below:
>> # Database configuration
>> DB = Sequel.connect('mysql2://localhost/dummy', :stream => true,
>>                     :logger => Logger.new(STDOUT))
>>
>> # Database models
>> class ReplicaFile < Sequel::Model(:replica_file)
>> end
>>
>> class ReplicaHost < Sequel::Model(:replica_host)
>> end
>>
>> class ReplicaHostFile < Sequel::Model(:replica_host_file)
>>   many_to_one :file, :class => ReplicaFile, :key => :replica_file_id
>>   many_to_one :host, :class => ReplicaHost, :key => :replica_host_id
>> end
>>
>> # Methods
>> def unsynced_files(filters)
>>   ReplicaHostFile.where(filters)
>> end
>>
>> # Main script
>> unsynced_files(:replica_host_id => 127, :file_status_id => [1, 4]).paged_each do |r|
>>   p r.file.filename
>> end
>>
>>
>> Are there any options, short of doing the join in SQL myself and forgoing
>> the models? That's not an impossibility, but I'd really like to be able
>> to use the encapsulation of the models if at all possible.
>>
> Doing it all in a single SQL query is the fastest method if it's possible.
> I'd highly recommend that route, especially for 4 million records.
>
> If you really want to do it in Ruby, you can either use the sharding
> support and run the streaming query on a different shard than the query to
> get the associated records (the Database :servers=>{:assoc=>{}} option and
> using dataset.server(:assoc) for one of the queries). Alternatively, you
> can just run the query to get the associated records in a different Thread
> from the Thread running the streaming query.
>
> Note that you can't eager load the associated records, because eager
> loading requires access to all of the current records up front, and you
> don't have that when streaming.
>
> Thanks,
> Jeremy
>