On Sun, 31 Jan 2016 17:39:28 +0100 Yannick Duchêne <yannick_duchene at yahoo.fr> wrote:
> I saw a page (can't retrieve the URL) suggesting to order table
> columns by names. It was strange to me, as I had the idea of a
> hierarchical access for tables access. But I thought "there must be a
> good reason for them to say this".

On average, the quality of advice on the Internet is average; for SQL, doubly so. Some of it is truly terrible. Because the advice on this list is peer-reviewed (in the sense that people who wrote the code participate), it tends to be very good.

My advice, ahem, is to choose chemistry over alchemy. If you don't understand why a piece of advice is well founded, keep checking until you do. While good foundations are sometimes a little mysterious, the contrary is not true: unfounded assumptions flush out early.

I want to answer your question a little abstractly, and then circle back to SQLite.

We know how the table is physically ordered. But there's no WHERE clause; the whole table will be scanned, and an output table for the aggregates must be built regardless. The only difference would arise if the cardinalities of a, b, and c were different. That is, if GROUP BY A produces many fewer rows than GROUP BY B, we would expect it to run faster. Otherwise any timing difference is an artifact of the implementation, not something inherent in the problem. Yet your counts indicate the opposite: GROUP BY B is the smallest output. How to explain? Or, um, EXPLAIN?

Consider that GROUP BY A lets each A output row be summed from the input (from t) sequentially. As the input is consumed, we move to the next output row whenever A changes. There's no seeking from one output row to another.

For GROUP BY B, each input row, read sequentially, means a seek (logically speaking) to the appropriate output row. If the cost of that seek is not O(1), it adds to the time needed to create the output. Because SQLite tables are based on B+ trees, that seek costs O(log2 n). I'd say that's the source of the difference you're seeing.
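To make the sequential-vs-seeking distinction concrete, here is a small sketch (not from the original thread; the column names and data are invented). The first function aggregates input that arrives already ordered by its key, using one running accumulator; the second models a B-tree-like output by binary-searching a sorted key list for every input row, which is where the O(log2 n) per-row cost comes from:

```python
import bisect

# Hypothetical input: rows (a, b, value), physically ordered by a,
# with b values scattered -- stand-ins for the columns in the thread.
rows = [(a, (a * 37) % 50, 1.0) for a in range(1000) for _ in range(5)]

def group_by_ordered(rows):
    # GROUP BY a: input arrives in a-order, so one running accumulator
    # suffices; emit an output row whenever a changes.  No seeking.
    out = []
    cur_key, cur_sum = None, 0.0
    for a, _, v in rows:
        if a != cur_key:
            if cur_key is not None:
                out.append((cur_key, cur_sum))
            cur_key, cur_sum = a, 0.0
        cur_sum += v
    if cur_key is not None:
        out.append((cur_key, cur_sum))
    return out

def group_by_seeking(rows):
    # GROUP BY b: every input row seeks to its output row.  Modeling
    # the output as a sorted structure (as a B-tree is), each seek is
    # a binary search costing O(log2 n) comparisons.
    keys, sums = [], []
    for _, b, v in rows:
        i = bisect.bisect_left(keys, b)
        if i < len(keys) and keys[i] == b:
            sums[i] += v
        else:
            keys.insert(i, b)
            sums.insert(i, v)
    return list(zip(keys, sums))

print(len(group_by_ordered(rows)), len(group_by_seeking(rows)))
```

Both produce correct aggregates; the difference is purely the per-input-row seek, which is the implementation artifact the post describes.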
EXPLAIN shows there's an optimization for GROUP BY A, probably because the output can be constructed sequentially. And that's a defensible choice, because seeking during aggregation isn't usually a high cost (and in fact it's not in your case, either). Aggregations are by definition reductions. Outputs are usually small, because otherwise the GROUP BY produces incomprehensible results. 10,000 rows of aggregated output isn't unheard of, but it's unusually many, and finding one row among 10,000 in a binary tree takes at most 15 comparisons. More commonly outputs are in the hundreds of rows, and need half that many comparisons. It could be faster, but only at the expense of code complexity and memory footprint.

HTH, to see the lite.

--jkl
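You can see the optimization in EXPLAIN QUERY PLAN. The sketch below assumes a schema the thread doesn't show: a WITHOUT ROWID table with PRIMARY KEY (a, b, c), so the physical order is by a, matching "we know how the table is physically ordered". Grouping by the leading column consumes that order directly; grouping by b needs a transient sorting structure, reported as "USE TEMP B-TREE FOR GROUP BY" in recent SQLite versions:

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE t (a INT, b INT, c INT,
                    PRIMARY KEY (a, b, c)) WITHOUT ROWID
""")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(a, (a * 37) % 50, a % 7) for a in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# GROUP BY a: the scan is already in a-order, so no sorter is needed.
print(plan("SELECT a, COUNT(*) FROM t GROUP BY a"))
# GROUP BY b: a temp b-tree gathers the groups.
print(plan("SELECT b, COUNT(*) FROM t GROUP BY b"))

# The post's arithmetic: one row among 10,000 in a binary tree is
# about ceil(log2(10000)) = 14 comparisons -- "at most 15".
print(math.ceil(math.log2(10000)))
```

The exact plan wording varies across SQLite versions, but the presence or absence of the temp b-tree step is the point.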