[sqlite] Random-access sequences
On 2016/03/02 2:26 AM, James K. Lowden wrote:
> On Tue, 1 Mar 2016 08:15:25 -0500, Richard Damon wrote:
>>> The theoretical maximum number of rows in a table is 2^64
>>> (18446744073709551616 or about 1.8e+19). This limit is unreachable
>>> since the maximum database size of 140 terabytes will be reached
>>> first. [...]
>>
>> You can hit 2^63 insertions well before hitting the size limit of
>> the database if you have also been doing deletions.
>
> Yes. If you manage 1,000,000 insertions/second, that's 3.15576 *
> 10^13/year. You would run out of integers in 292,271 years.
>
> To get around that, add an "epoch" column, also integer. Initially it
> is always zero. Whenever "position" exceeds 2^63, increment "epoch"
> and reset "position" to zero.
>
> That will give you at least twice as many years.

No, that will give you 2^63 times as many years... which is many, many
times the age of the Universe, and slightly fewer entries than there
are atoms in the known Universe (about 10^80 at last count).

Good idea, I think it will be enough. :)
[sqlite] Random-access sequences
In RAM, the simple implementation would be to have a balanced tree,
ordered by index, where every node knows how many elements are below it
and has pointers to its children. The tree has O(log N) depth,
following a pointer is O(1), and all of your operations involve a small
(constant-bound) number of nodes at each level.

In a typical indexed database, following a pointer (finding the row for
a key) takes O(log N) time rather than O(1), so most of your operations
would now cost O((log N)^2) time.

You could hack your database's B-tree implementation so that each
B-tree page "knows" the number of records "below" the page. At that
point you would essentially be putting the RAM implementation into a
file (with the optimistic assumption that Read(file, page#) is a
constant-time operation in the file system).

In either of the above, autoincrement works fine as long as the total
number of enqueue() calls over the lifetime of the database is not too
large, but in addition to the autoincrement, you need to update the
"parents" of inserted or removed records.

Regards

-----Original Message-----
From: Matthias-Christian Ott [mailto:o...@mirix.org]
Sent: Tuesday, March 01, 2016 10:38 AM
To: sqlite-users at mailinglists.sqlite.org
Subject: Re: [sqlite] Random-access sequences

On 01/03/16 11:59, Matthias-Christian Ott wrote:
> I know that this question is not strictly related to SQLite.
>
> I want to persist a random-access sequence with SQLite. A
> random-access sequence is a queue with random access to its queued
> elements. Suppose that a random-access sequence [x_0, ..., x_n] has
> the following operations:
>
> enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
> dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
> lookup([x_0, ..., x_n], i) = x_i
> update([x_0, ..., x_n], i, y) = [x_0, ..., x_(i - 1), y, x_(i + 1), ..., x_n]
> length([x_0, ..., x_n]) = n + 1

Thinking a bit more about it, I also need the following operation:

delete([x_0, ..., x_n], i) = [x_0, ..., x_(i - 1), x_(i + 1), ..., x_n]

So implementing the data structure with autoincrement does not work
anymore.

- Matthias-Christian
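The counted-tree idea above can be sketched in ordinary RAM. The sketch below substitutes a Fenwick (binary indexed) tree for the balanced tree — the bookkeeping is the same "how many elements are below" counting, and every operation stays O(log N). Slot numbers only ever grow, like AUTOINCREMENT rowids, but lookup is by the current index among live elements. All class and method names here are made up for illustration; this is not the poster's B-tree modification, just the same counting trick in a flat array.

```python
class CountedSlots:
    """Append-only slots with O(log N) delete and i-th-element lookup,
    backed by a Fenwick (binary indexed) tree of alive/dead counts."""

    def __init__(self):
        self.tree = [0]       # 1-based Fenwick array of alive counts
        self.values = []      # slot -> payload; slots are never reused

    def enqueue(self, x):
        self.values.append(x)
        i = len(self.values)           # new 1-based slot number
        cell = 1                       # the new element is alive
        k = i - 1
        while k > i - (i & -i):        # fold in the sub-ranges this cell covers
            cell += self.tree[k]
            k -= k & -k
        self.tree.append(cell)

    def _add(self, slot, delta):       # adjust alive count at a slot
        while slot < len(self.tree):
            self.tree[slot] += delta
            slot += slot & -slot

    def _kth(self, k):                 # slot of the k-th (1-based) live element
        pos, step = 0, 1
        while step * 2 <= len(self.values):
            step *= 2
        while step:                    # binary lifting over prefix counts
            nxt = pos + step
            if nxt < len(self.tree) and self.tree[nxt] < k:
                pos, k = nxt, k - self.tree[nxt]
            step //= 2
        return pos + 1

    def length(self):
        total, slot = 0, len(self.values)
        while slot:
            total += self.tree[slot]
            slot -= slot & -slot
        return total

    def lookup(self, i):               # 0-based index among live elements
        return self.values[self._kth(i + 1) - 1]

    def delete(self, i):
        self._add(self._kth(i + 1), -1)

    def dequeue(self):
        x = self.lookup(0)
        self.delete(0)
        return x
```

Deleting an element shifts the indices of everything after it, exactly as the delete() operation in the quoted message requires, because later lookups count only live slots.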
[sqlite] Random-access sequences
On Tue, 1 Mar 2016 08:15:25 -0500, Richard Damon wrote:
>> The theoretical maximum number of rows in a table is 2^64
>> (18446744073709551616 or about 1.8e+19). [...]
>
> You can hit 2^63 insertions well before hitting the size limit of the
> database if you have also been doing deletions.

Yes. If you manage 1,000,000 insertions/second, that's 3.15576 *
10^13/year. You would run out of integers in 292,271 years.

To get around that, add an "epoch" column, also integer. Initially it
is always zero. Whenever "position" exceeds 2^63, increment "epoch" and
reset "position" to zero.

That will give you at least twice as many years.

--jkl
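The epoch scheme above can be sketched with a composite (epoch, position) key: ordering by epoch first, then position, keeps queue order intact across a rollover. The table and column names below are hypothetical, and the rollover_at parameter exists only so a test can trigger the rollover without 2^63 inserts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE sequence (
        epoch    INTEGER NOT NULL,
        position INTEGER NOT NULL,
        value    TEXT,
        PRIMARY KEY (epoch, position)
    )
""")

def enqueue(value, rollover_at=2**63 - 1):
    # Find the newest (epoch, position); roll the epoch when position
    # reaches the limit, per the scheme in the message above.
    row = con.execute(
        "SELECT epoch, position FROM sequence "
        "ORDER BY epoch DESC, position DESC LIMIT 1").fetchone()
    if row is None:
        epoch, position = 0, 0
    elif row[1] >= rollover_at:
        epoch, position = row[0] + 1, 0
    else:
        epoch, position = row[0], row[1] + 1
    con.execute("INSERT INTO sequence VALUES (?, ?, ?)",
                (epoch, position, value))

def dequeue():
    # Oldest element is the smallest (epoch, position) pair.
    row = con.execute(
        "SELECT epoch, position, value FROM sequence "
        "ORDER BY epoch, position LIMIT 1").fetchone()
    if row:
        con.execute("DELETE FROM sequence WHERE epoch = ? AND position = ?",
                    row[:2])
        return row[2]
```

With rollover_at=1, the third enqueue lands in epoch 1 at position 0, yet dequeue still returns elements in insertion order.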
[sqlite] Random-access sequences
On 01/03/16 11:59, Matthias-Christian Ott wrote:
> I know that this question is not strictly related to SQLite.
>
> I want to persist a random-access sequence with SQLite. A
> random-access sequence is a queue with random access to its queued
> elements. Suppose that a random-access sequence [x_0, ..., x_n] has
> the following operations:
>
> enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
> dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
> lookup([x_0, ..., x_n], i) = x_i
> update([x_0, ..., x_n], i, y) = [x_0, ..., x_(i - 1), y, x_(i + 1), ..., x_n]
> length([x_0, ..., x_n]) = n + 1

Thinking a bit more about it, I also need the following operation:

delete([x_0, ..., x_n], i) = [x_0, ..., x_(i - 1), x_(i + 1), ..., x_n]

So implementing the data structure with autoincrement does not work
anymore.

- Matthias-Christian
[sqlite] Random-access sequences
On 01/03/16 12:07, Stephan Beal wrote:
> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott wrote:
>> Unfortunately, this limits the maximum number of elements that can
>> ever be inserted during a table's life-time to 2^63 - 1. While this
>> might be acceptable in some cases it is an artificial limitation.
>
> Artificial, yes, but so is "64 bits." You will likely hit other
> limitations far before getting anywhere near 2^63 - 1 insertions:
>
> https://www.sqlite.org/limits.html
>
> e.g. point #13:
>
> *Maximum Number Of Rows In A Table*
>
> The theoretical maximum number of rows in a table is 2^64
> (18446744073709551616 or about 1.8e+19). This limit is unreachable
> since the maximum database size of 140 terabytes will be reached
> first. A 140 terabytes database can hold no more than approximately
> 1e+13 rows, and then only if there are no indices and if each row
> contains very little data.

I use SQLite as a buffer for small amounts of data, so I'm nowhere near
the limits of the database. However, the 2^63 - 1 limit for
autoincrement columns applies to the life-time of the table, not to the
current size of the database.

I'm aware that pushing 2^63 - 1 elements through the queue is not
realistic even in the largest deployments. But saying that your
software breaks after 2^63 - 1 elements feels somewhat like saying that
your name can only be 20 characters because some mainframe software
can't handle more than that. It feels like you have failed as a
software developer to deliver a proper solution and now pretend that
your improper solution is proper because nobody will hit the limits.
[sqlite] Random-access sequences
On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott wrote:
> Unfortunately, this limits the maximum number of elements that can
> ever be inserted during a table's life-time to 2^63 - 1. While this
> might be acceptable in some cases it is an artificial limitation.

Artificial, yes, but so is "64 bits." You will likely hit other
limitations far before getting anywhere near 2^63 - 1 insertions:

https://www.sqlite.org/limits.html

e.g. point #13:

*Maximum Number Of Rows In A Table*

The theoretical maximum number of rows in a table is 2^64
(18446744073709551616 or about 1.8e+19). This limit is unreachable
since the maximum database size of 140 terabytes will be reached first.
A 140 terabytes database can hold no more than approximately 1e+13
rows, and then only if there are no indices and if each row contains
very little data.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct
of those who insist on a perfect world, freedom will have to do." --
Bigby Wolf
[sqlite] Random-access sequences
-------- Original message --------
From: Stephan Beal
Date: 01/03/2016 12:07 (GMT+00:00)
To: SQLite mailing list
Subject: Re: [sqlite] Random-access sequences

> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott wrote:
>> Unfortunately, this limits the maximum number of elements that can
>> ever be inserted during a table's life-time to 2^63 - 1. While this
>> might be acceptable in some cases it is an artificial limitation.
>
> Artificial, yes, but so is "64 bits." You will likely hit other
> limitations far before getting anywhere near 2^63 - 1 insertions:
>
> https://www.sqlite.org/limits.html
>
> e.g. point #13:
>
> *Maximum Number Of Rows In A Table*
> [...]

I don't think he's bothered about the maximum number of rows at any one
time, but that he might run out of new rowids. However, this feels like
a needless concern: at 1,000 enqueues/dequeues per second, 2^63 IDs
will last 292 million years.

Graham.
[sqlite] Random-access sequences
I know that this question is not strictly related to SQLite.

I want to persist a random-access sequence with SQLite. A random-access
sequence is a queue with random access to its queued elements. Suppose
that a random-access sequence [x_0, ..., x_n] has the following
operations:

enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
lookup([x_0, ..., x_n], i) = x_i
update([x_0, ..., x_n], i, y) = [x_0, ..., x_(i - 1), y, x_(i + 1), ..., x_n]
length([x_0, ..., x_n]) = n + 1

A naive list-based implementation would have at least one operation in
O(n). A possible implementation in SQLite with all operations in
O(log n) could be:

    CREATE TABLE sequence (
        position INTEGER PRIMARY KEY AUTOINCREMENT,
        ...
    );

    -- enqueue
    INSERT INTO sequence VALUES (...);

    -- dequeue
    SELECT position, ... FROM sequence ORDER BY position ASC LIMIT 1;
    DELETE FROM sequence WHERE position = :position;

    -- lookup
    SELECT position AS start FROM sequence ORDER BY position ASC LIMIT 1;
    SELECT ... FROM sequence WHERE position = :start + :i;

    -- update
    SELECT position AS start FROM sequence ORDER BY position ASC LIMIT 1;
    UPDATE sequence SET ... WHERE position = :start + :i;

    -- length
    SELECT position AS start FROM sequence ORDER BY position ASC LIMIT 1;
    SELECT position AS end FROM sequence ORDER BY position DESC LIMIT 1;
    SELECT :end - :start + 1;

Unfortunately, this limits the maximum number of elements that can ever
be inserted during a table's life-time to 2^63 - 1. While this might be
acceptable in some cases, it is an artificial limitation.

It seems that one way that is not subject to such limitations is to
abuse tables as lists and use algorithms for purely functional data
structures. There is an algorithm that could be used to implement
random-access sequences in SQLite efficiently [1], but it's neither
elegant nor simple.

Does anyone have another idea how to elegantly implement random-access
sequences in O(log n) in SQLite?

- Matthias-Christian

[1] Robert Hood and Robert Melville. Real-time queue operations in pure
LISP.
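As a sanity check, the scheme above can be run through Python's sqlite3 module. The payload column name "value" is an assumption (the original leaves it elided), and the length query reads the largest position via ORDER BY position DESC. The scheme relies on dequeue only ever removing the smallest position, so live positions stay contiguous and position = start + i is a valid index.

```python
import sqlite3

# Runnable sketch of the AUTOINCREMENT queue; 'value' stands in for the
# elided payload columns.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sequence (
    position INTEGER PRIMARY KEY AUTOINCREMENT,
    value    TEXT)""")

def enqueue(x):
    con.execute("INSERT INTO sequence (value) VALUES (?)", (x,))

def dequeue():
    row = con.execute("SELECT position, value FROM sequence "
                      "ORDER BY position ASC LIMIT 1").fetchone()
    if row:
        con.execute("DELETE FROM sequence WHERE position = ?", (row[0],))
        return row[1]

def _start():
    # Smallest live position; MIN() is equivalent to ORDER BY ... LIMIT 1.
    return con.execute("SELECT MIN(position) FROM sequence").fetchone()[0]

def lookup(i):
    row = con.execute("SELECT value FROM sequence WHERE position = ?",
                      (_start() + i,)).fetchone()
    return row[0] if row else None

def update(i, y):
    con.execute("UPDATE sequence SET value = ? WHERE position = ?",
                (y, _start() + i))

def length():
    row = con.execute("SELECT MAX(position) - MIN(position) + 1 "
                      "FROM sequence").fetchone()
    return row[0] or 0
```

Every operation is a constant number of indexed B-tree probes, hence O(log n). The delete() operation added later in the thread is what breaks this: removing an interior position leaves a gap, and start + i no longer addresses the i-th element.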
[sqlite] Random-access sequences
On 3/1/16 7:07 AM, Stephan Beal wrote:
> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott wrote:
>> Unfortunately, this limits the maximum number of elements that can
>> ever be inserted during a table's life-time to 2^63 - 1. While this
>> might be acceptable in some cases it is an artificial limitation.
>
> Artificial, yes, but so is "64 bits." You will likely hit other
> limitations far before getting anywhere near 2^63 - 1 insertions:
>
> https://www.sqlite.org/limits.html
>
> e.g. point #13:
>
> *Maximum Number Of Rows In A Table*
>
> The theoretical maximum number of rows in a table is 2^64
> (18446744073709551616 or about 1.8e+19). This limit is unreachable
> since the maximum database size of 140 terabytes will be reached
> first. [...]

You can hit 2^63 insertions well before hitting the size limit of the
database if you have also been doing deletions. A simple repeated
pattern of "add record N, remove record N-1" will eventually use up the
limit on record numbers.

The answer here is to detect when the record number is getting near
"too big" and then go through and renumber the records back in sequence
from 1, resetting the next record number. This might require changing
data elsewhere if record numbers have been saved anywhere, which means
it might make sense not to wait until overflow is imminent, but to do
it proactively at a moment when minimal records (ideally no records)
are referenced elsewhere (and only in known locations).

2^63 is such a big number that you can likely find a slot during
standard preventive maintenance to do this with the database offline
before anything overflows. At one insertion per nanosecond, you have
almost three centuries before this needs to be done (if I am doing my
math right), so you can do it when you migrate to a new box.

With 32-bit numbers, these limits were readily reachable; 64 bits was a
quantum leap in range, adding 32 doublings to the limit, which will
take a while for usage to catch up with (though it likely will). The
140-terabyte limit is something I could see being hit now in some
applications.

-- 
Richard Damon
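The offline renumbering pass described above can be done in plain SQL: copy the rows in order into a fresh table so AUTOINCREMENT reassigns positions from 1, then swap the tables. This is a sketch under assumptions — the sequence(position, value) schema is hypothetical, and a production version would also verify the sqlite_sequence counter after the rename and fix up any external references to old record numbers.

```python
import sqlite3

def renumber(con):
    """Rebuild the (assumed) sequence table so positions restart from 1,
    preserving queue order. Intended for offline maintenance."""
    con.executescript("""
        BEGIN;
        CREATE TABLE sequence_new (
            position INTEGER PRIMARY KEY AUTOINCREMENT,
            value    TEXT);
        -- Insert in position order so AUTOINCREMENT assigns 1, 2, 3, ...
        INSERT INTO sequence_new (value)
            SELECT value FROM sequence ORDER BY position;
        DROP TABLE sequence;
        ALTER TABLE sequence_new RENAME TO sequence;
        COMMIT;
    """)
```

Running the whole script in one transaction means a crash mid-renumber leaves the original table untouched.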