On 9/6/07, Michael Koziarski <[EMAIL PROTECTED]> wrote:

The tests aren't particularly contrived in this case, I've written
> similar stuff in the past.  However the large performance boost you're
> seeing is probably worth investigating whether we can apply those
> joins a little more selectively, or let users opt out of them.


Perhaps a poor choice of words to say contrived... there is definitely use
for :conditions on the eager-loaded table.  My assertion is more along the
lines that the utility of those :conditions is fairly limited in scope and
easily replaced by explicit :joins which can be tailored to a wider variety
of effects.

The solution I'm proposing is the simplest in that it doesn't add any
additional complexity.  If we want something more robust we can add code to
explicitly check the table names in :conditions/:order (this would be ideal
because it could add only the necessary eager joins).  The only downside is
there would be some complexity added to
the construct_finder_sql_for_association_limiting method.

Offering an opt-out option is another possibility (in fact you can opt out
implicitly by NOT referring to another table in the :conditions/:order, but
it could be more fine-grained), but I don't see how to make the interface
make sense.  It's a pretty specific case, and a specific option to control
limited eager loading would seem out of place with the general options
available to :find now.  That's the reason I proposed what I did, because I
think it would make the most sense to those who are unaware of the
internals.

My understanding is that the conditions / joins *have* to be applied
> in order for limit / offset to work with an :order or :conditions when
> using eager loading.  If you apply the limit / offset to the final
> query, it's not limiting correctly, and if you leave the order or
> conditions off that query your paginated result set will behave
> erratically, with varying numbers per page, or repeated entities.


Currently the functions that check this (include_eager_conditions? and
include_eager_order?) aren't smart enough to actually figure out when the
joins HAVE to be applied.  They simply check that another table is referred
to... they don't check whether that table is an eager-loaded table or one
from the explicit :joins or whether it's even included in the query in which
case it causes a database error.

In fact I posted another patch just prior (8496) that fixes an issue with
INNER JOINs that results in breakage.  I didn't comment on that one here
since it should be a much more obvious patch to apply.


> For an example:
>
> @tickets_from_priority_customers = Ticket.find(:all,
> :conditions=>["tickets.open = ? and customers.priority = ?", true,
> true], :include=>[:customers])
>
> Now if you start paginating that list,  your stuff will cause bugs?


Correct.  The basic gist of my argument is that this type of query is
limited because it only lets you operate against the generic LEFT OUTER
JOIN.  It's slightly opaque because you have to know that :include is doing
a LEFT OUTER JOIN customers ON customers.ticket_id = tickets.id.  So the
condition you give selects tickets that have at least 1 customer with
priority=1.  In this case that makes sense, but I don't think it's so
intuitive the other way around.  For instance:

@tickets_from_priority_customers = Ticket.find(:all,
:conditions=>["tickets.open = ? and customers.priority != ?", true,
true], :include=>[:customers])

In that case the developer might think they are querying for all tickets
that aren't associated with a priority customer, when in fact they are
querying for tickets that have at least one non-priority customer.  Granted,
they can get what they want by using the complement of the original query
(in this case false), but my point here is that the user requires a
knowledge of what :include is doing behind the scenes, and the resulting
LEFT JOIN is not particularly flexible.  In addition, the :conditions end up
restricting which associated rows are fetched, with further limits the
general utility.  If I want a list of all tickets with at least one priority
customer associated, it doesn't mean I only want to see the priority
customers in the association.  Typically I would want the whole association,
and therefore I would be better off doing:

@tickets_from_priority_customers = Ticket.find(:all,
:conditions=>["tickets.open = ?",
true], :include=>[:customers], :joins => "INNER JOIN customers AS c2 ON
tickets.id = c2.ticket_id AND c2.priority=1")

or even

@tickets_from_priority_customers = Ticket.find(:all,
:conditions=>["tickets.open = ? AND c2.id IS NOT NULL",
true], :include=>[:customers], :joins => "LEFT JOIN customers AS c2 ON
tickets.id = c2.ticket_id AND c2.priority=1")


I'm not totally wedded to this idea, but all else being equal I would prefer
not to have the eager-loading JOINs added in the pre-query.  Building a
smarter method that checks for specific table names and does a finer-grained
analysis would certainly make the whole thing more robust... that wasn't my
first choice because it's more work and it adds complexity that I wasn't
sure you would want.  If the core team thinks that's a better way to go, I'm
happy to work on a patch though...

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Core" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/rubyonrails-core?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to