On Sep 16, 3:57 am, Roland Swingler <[email protected]> wrote: > Hi, > > I've had a performance issue in one of my apps, and have traced it > back to Dataset#columns. This method seems to remove any filtering > from the dataset, and then query for the first row to get the column > symbols. This is problematic, because while my filtered query takes > about 2ms, my unfiltered query takes about 8sec, just to get the > column names.
What database are you using? It currently does an unfiltered, unordered, limited to 1 select, which should be fastest on most databases, which can choose any row in the table. For certain databases that don't have a syntax for LIMIT and need to be emulated with subselects, it's possible performance is bad if they don't have a good query optimizer. In those cases we can probably override Dataset#columns in the adapter. > So here are some thoughts as to how this could be improved, and I'm > really asking for problems with these methods that I haven't thought > of that would make #columns return incorrect results. > > 1. If I've specified the columns I want returned using > select(:foo, :bar.as(:baz)), then do we not *always* already know what > the column names will be, (i.e. [:foo, :baz])? Even if I do an > unaliased calculation/sql-function usage like :sum[:foo], is the > column name not just the sql version of the calculation? For simple cases (all literal symbols), you might be able to. However, in most cases the names are database dependent. > 2. If I haven't specified the columns, or have a '*' somewhere, can we > not get the column names from the schema? Sequel actually does this for models in simple cases, but not for datasets. I suppose it could do so for datasets as well. In general the columns query is fast, so you don't need to. And if you are going to be reusing the same dataset more than once, you should only be creating it once and using it multiple times, in which case you only have to pay the columns penalty once. > 3. If the above approaches have flaws, would it not be better to keep > the filtering on the dataset, as that will presumably be faster in > almost all cases? In most cases, an unfiltered, unordered, limited-to-one query will be fastest. If you add a filter, the database has to check that filter (either via table or index scan). > I'm happy to create a patch that uses a combination of the above > approaches, as long as no-one can think of any showstoppers with them. I think Sequel's current default behavior is best. We should probably modify the adapter you are using, though. Jeremy -- You received this message because you are subscribed to the Google Groups "sequel-talk" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/sequel-talk?hl=en.
