On Wed, Mar 07, 2012 at 10:30:54PM +0000, Emmanuel MacCaull scratched on the 
wall:
> Hi,
> 
> I'm writing a custom aggregate function whose result depends on the sort
> order of a subselect. It isn't clear whether this a good idea 

  Not a good idea.

> and the documentation on group_concat(X) seems to suggest that the
> order that rows are processed by an aggregate function is not
> guaranteed ("The order of the concatenated elements is arbitrary."
> -- http://www.sqlite.org/lang_aggfunc.html).

  That's true.

> Given something like this:
> 
> SELECT group_concat(name)
> FROM (SELECT * FROM table ORDER BY name)
> GROUP BY some_id;
> 
> I understood the documentation of group_concat() as meaning that even
> though there is an ORDER BY in the subselect, the order of the
> concatenated names is not guaranteed.
>
> If that is the case, it seems logical that the restriction is only there
> because the order in which the aggregate function processes each row is
> arbitrary. Does anyone know if this is true or is it safe to assume that
> the aggregate will access each row in order?

  Not so much "arbitrary" as undefined.  If you run the same query on
  the same data using the same version of SQLite, I'm sure you'll get
  the same results.  That said, you shouldn't depend on that.
  
  SQL tables themselves have no ordering... they are properly considered
  sets of rows (in the formal mathematical sense of "set").  Similarly,
  most working sets of data (relational variables) have no defined
  ordering.  A *result* set may have an order, defined by an ORDER BY,
  but that order may not hold when the result of a sub-select is cast
  back into a data source (such as above).

  In short, the database engine is able to reorder anything it wants,
  at any time, except for the final ORDER BY.  While a specific query
  may do what you expect (at least for this version of SQLite) things
  may change.  Heck, just adding an index can alter the query plan and
  change orderings.  SQLite even has a pragma to purposely re-order
  queries just to test for unintentional order dependencies.

  For example, most (but not all; and not all the time) database engines
  will implement a GROUP BY clause by sorting the rows according to the
  GROUP BY expressions.  This puts all "like" rows next to each other,
  making it easy to collapse the groups into individual rows.  There
  is no reason to assume that sorting process is stable, or that it
  will preserve the sub-select ordering in any way.  In fact, I
  wouldn't be surprised to find out some query engines just ignores a
  "data source" sub-select's ORDER BY all together, since it doesn't
  make semantic sense.  There are all kinds of additional optimizations
  that can open up.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to