If you need to process a million records, you may want to process them
in a specific order, such as most recently updated first, since those may
be the most important and the batch may take a while to finish.
Processing them in primary key order may well mean that the most
important records are processed last.

The patch I submitted does just what Jack suggests. While you will end
up with about 35 MB of keys for a million records, that is a constant
memory cost: it will not bloat as the batch is processed, even if you
load associations. If 35 MB is too much for an environment with limited
resources, you could break the ID query up into smaller batches using
:limit and :offset, like any other ActiveRecord find.
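
To make the idea concrete, here is a rough sketch in plain Ruby. It is
not the patch itself; FakeTable, ordered_ids, find_by_ids and
each_ordered_batch are names I made up for illustration, and the
in-memory table only stands in for the two kinds of queries involved.

```ruby
# Hypothetical stand-in for a model row.
Record = Struct.new(:id, :updated_at)

# Hypothetical stand-in for a database table (not an ActiveRecord API).
class FakeTable
  def initialize(records)
    @by_id = records.each_with_object({}) { |r, h| h[r.id] = r }
  end

  # Plays the role of "SELECT id FROM t ORDER BY ...": keys only.
  def ordered_ids(order)
    @by_id.values.sort_by(&order).map(&:id)
  end

  # Plays the role of "SELECT * FROM t WHERE id IN (...)".
  # Rows come back in id order here, to mimic a database that does
  # not preserve the order of the IN list.
  def find_by_ids(ids)
    @by_id.values_at(*ids).compact.sort_by(&:id)
  end
end

# Step 1: fetch every id once, in the desired order.
# Step 2: load full records one slice of ids at a time.
def each_ordered_batch(table, order:, batch_size: 1000)
  table.ordered_ids(order).each_slice(batch_size) do |slice|
    rows = table.find_by_ids(slice)
    by_id = rows.each_with_object({}) { |r, h| h[r.id] = r }
    # Restore the requested order; skip ids deleted since step 1.
    yield slice.map { |id| by_id[id] }.compact
  end
end
```

Only the id list stays in memory for the whole run; each batch of full
records can be garbage collected once its block returns. A call would
look like each_ordered_batch(table, order: ->(r) { -r.updated_at },
batch_size: 4) { |batch| ... } to get most recently updated first.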

On Mar 5, 4:36 pm, Ryan Angilly <[email protected]> wrote:
> Yeah that's possible.  But then you could run into the same memory bloat
> problem.  Just a million records with UUID pk's gives you over 35 megabytes
> of data to store in memory (not including any of the overhead due to Ruby's
> Array and String classes).  Maybe that's enough of an edge case where it's
> worth ignoring for now, but then I still come back to the original question:
> can you think of a case where you really need to order it?
>
> On Thu, Mar 5, 2009 at 5:28 PM, Jack Christensen <[email protected]> wrote:
>
> >  Ryan Angilly wrote:
>
> > If, for example, you order by 'Name', and then while batch processing you
> > insert a new record with the name "Angilly", it is possible (probable
> > actually) that your batch processing will miss records.  ID's are not very
> > likely to change, and the auto-incrementing means it's much more difficult
> > to miss records.  This is also the reason that it requires integer ID's (and
> > not UUIDs).
>
> > I haven't looked closely at the code, but wouldn't it be possible to just
> > select all the ID's in one query with whatever order criteria were desired,
> > then use those ID's to select the full records in batches. It would be
> > something like how eager loading does multiple selects in some cases.
>
> > Jack
>
> > Cheers,
> > Ryan
>
> > On Thu, Mar 5, 2009 at 8:27 AM, blj <[email protected]> wrote:
>
> >> Maybe I am missing something obvious, but I do not understand why the
> >> method find_in_batches cannot take an :order option.
>
> >> I have created a ticket http://is.gd/lUwX with a patch to remove the
> >> error raised accordingly.
>
> >> Thanks.