> What database are you using? MySQL - the bane of your life ;)
> It currently does an unfiltered, > unordered, limited to 1 select, which should be fastest on most > databases For certain > databases that don't have a syntax for LIMIT and need to be emulated > with subselects, it's possible performance is bad if they don't have a > good query optimizer. This is probably the case for most databases and most queries. The sort of queries I'm running are aggregate and join heavy, so: DB[:table].select(:foo, :bar, :baz, :sum[:quux]).join(...).join(...).join(...).filter(...).group(:foo, :bar, :baz) Datawarehousing/reporting style queries. This may be a MySQL-specific problem, but the grouping and lack of a WHERE clause forces the query to do full table scans building the whole resultset and then ordering, even if you limit the result to 1 row. Filtering makes a huge difference to this sort of query in MySQL (as you can see from the difference between 2ms and 8s). I don't know if this sort of query would have similar problems on other databases. > I think Sequel's current default behavior is best. We should probably > modify the adapter you are using, though. Fair enough. I hacked up a rough implementation this afternoon, which solved my problems, but I haven't got it to the stage where I'm confident it handles every edge case for general consumption. I can polish this up and submit a patch for the MySQL adapter. Cheers, Roland On Sep 16, 5:16 pm, Jeremy Evans <[email protected]> wrote: > On Sep 16, 3:57 am, Roland Swingler <[email protected]> wrote: > > > Hi, > > > I've had a performance issue in one of my apps, and have traced it > > back to Dataset#columns. This method seems to remove any filtering > > from the dataset, and then query for the first row to get the column > > symbols. This is problematic, because while my filtered query takes > > about 2ms, my unfiltered query takes about 8sec, just to get the > > column names. > > What database are you using? It currently does an unfiltered, > unordered, limited to 1 select, which should be fastest on most > databases, which can choose any row in the table. For certain > databases that don't have a syntax for LIMIT and need to be emulated > with subselects, it's possible performance is bad if they don't have a > good query optimizer. In those cases we can probably override > Dataset#columns in the adapter. > > > So here are some thoughts as to how this could be improved, and I'm > > really asking for problems with these methods that I haven't thought > > of that would make #columns return incorrect results. > > > 1. If I've specified the columns I want returned using > > select(:foo, :bar.as(:baz)), then do we not *always* already know what > > the column names will be, (i.e. [:foo, :baz])? Even if I do an > > unaliased calculation/sql-function usage like :sum[:foo], is the > > column name not just the sql version of the calculation? > > For simple cases (all literal symbols), you might be able to. > However, in most cases the names are database dependent. > > > 2. If I haven't specified the columns, or have a '*' somewhere, can we > > not get the column names from the schema? > > Sequel actually does this for models in simple cases, but not for > datasets. I suppose it could do so for datasets as well. In general > the columns query is fast, so you don't need to. And if you are going > to be reusing the same dataset more than once, you should only be > creating it once and using it multiple times, in which case you only > have to pay the columns penalty once. > > > 3. If the above approaches have flaws, would it not be better to keep > > the filtering on the dataset, as that will presumably be faster in > > almost all cases? > > In most cases, an unfiltered, unordered, limited-to-one query will be > fastest. If you add a filter, the database has to check that filter > (either via table or index scan). > > > I'm happy to create a patch that uses a combination of the above > > approaches, as long as no-one can think of any showstoppers with them. > > I think Sequel's current default behavior is best. We should probably > modify the adapter you are using, though. > > Jeremy -- You received this message because you are subscribed to the Google Groups "sequel-talk" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/sequel-talk?hl=en.
