On Fri, 05 Jul 2013 12:38:37 +0200
Gabriel Corneanu <gabrielcorne...@gmail.com> wrote:

> About complexity: I'm not sure it's NlogN; for each N you need to
> count N-1 columns, that's N^2 IMO.

You're right if the data aren't sorted.  

If the data are sorted, to *find* the largest value smaller than a given
value is O(log N), as you know.  

Counting can be minimized by an algorithm that avoids recounting.
Consider the sequence A,B,C,D,E.  We want to rank them, obtaining:

        A       0
        B       1
        C       2
        D       3
        E       4

We first count all how many are smaller than A, finding 0.  We keep
that number, because it will be useful later.  

To count those smaller than B, we first find the largest value smaller
than B, i.e. A, and look up its count (0), and add 1 to that, producing
1, which we keep.  Then we find B for C and add 1 to B's count,
producing 2 for C, and so on.  For every element N we have a lookup
(log N) and a constant increment: O(N log N).  

> And you have an EXTRA temporary B-TREE? Doesn't it matter??
> Although I don't really understand why, it has an index on it.

Sure it matters!  It's not intrinsic to the operation. It represents an
opportunity to improve the optimizer.  

> My original concern is indeed simplicity and efficiency.
> But I'm NOT advocating any complex RANK function, but rather to
> expose (as pseudo-data) the row index IN THE CURRENT RESULT SET;
> while usually the result set is ordered, it doesn't really matter.
> This information HAS NO other technical meaning (like rowid).

Yes, and my answer to you is two-fold:

1.  If you're only interested in the number of the row as delivered to
the application, you can do that better in your application with i++
than SQLite can.  In that case, both application and library have
exactly the same information, and the computation is trivial.  

(Others on this thread say their favorite container library doesn't
include a "row index" column.  But that library is mated to SQLite
somewhere, and that feature could be added there, perhaps in the
callback function to sqlite3_exec().) 

2.  AIUI you suggested a SQL function that would produce the "row
index".  Such a function could be referenced in e.g. a WHERE clause, a
very bad thing, whether or not is has "other technical meaning".  

In the relational model, there are no pointers and no
pseudo-data.  Everything is referenced by value.  It was an intentional
choice made for *convenience*: Codd recognized that no index has any
meaning without an associated value. Eliminate the index and the value
still stands, leaving the user with one less thing to think about.
OTOH if removing the index does remove information, then the index is
*data* and can be be represented in the database as data i.e., as a
value.  

Something about your "row index" relates back to the data.  The
relational model has no way to represent data except as *values*
Introducing a "row index" function into SQLite would provide complexity
without power: another way to address the data, but no new way to
express or derive anything about them.  

You can't have it both ways.  Either the "row index" has no
relationship to the data and therefore no reason to exist in the
database, or it expresses something implied about the order and
therefore is a value.  Any example you give will fall into one of those
two categories.  

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to