[sqlite] Random-access sequences

2016-03-02 Thread R Smith


On 2016/03/02 2:26 AM, James K. Lowden wrote:
> On Tue, 1 Mar 2016 08:15:25 -0500
> Richard Damon  wrote:
>
>>> The theoretical maximum number of rows in a table is 2^64
>>> (18446744073709551616 or about 1.8e+19). This limit is unreachable
>>> since the maximum database size of 140 terabytes will be reached
>>> first. A 140 terabytes database can hold no more than approximately
>>> 1e+13 rows, and then only if there are no indices and if each row
>>> contains very little data.
>>>
>> You can hit 2^63 insertions well before hitting the size limit of the
>> database if you have also been doing deletions.
> Yes.  If you manage 1,000,000 insertions/second, that's 3.15576 *
> 10^13/year.  You would run out of 64-bit integers in 584,542 years
> (and of positive rowids, 2^63, in half that).
>
> To get around that, add an "epoch" column, also integer.
> Initially it is always zero.  Whenever "position" exceeds 2^63,
> increment "epoch" and reset "position" to zero.
>
> That will give you at least twice as many years.

No, that will give you 2^63 times as many years... which is many, many 
times the age of the Universe, and a capacity of 2^63 * 2^63 = 2^126 
(about 8.5e+37) entries, still far fewer than there are atoms in the 
known Universe (about 10^80 at last count). Good idea, I think it will 
be enough. :)



[sqlite] Random-access sequences

2016-03-01 Thread Wade, William
In RAM, the simple implementation would be a balanced tree, ordered by 
index, where every node knows how many elements are below it and has pointers 
to its children. The tree has O(log N) depth, following a pointer is O(1), 
and all of your operations touch a small (constant-bounded) number of 
nodes at each level.

In a typical indexed database, following a pointer (finding the row for a key) 
takes O(log N) time rather than O(1), so most of your operations would now 
cost O((log N)^2) time.
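
For comparison, plain SQLite can already express a rank lookup, but without 
per-node counts the engine walks the index one row at a time, so the lookup 
costs O(i) rather than O(log N). A sketch, using the sequence table defined 
later in this thread:

-- rank lookup by scanning: return the i-th element in position order
SELECT * FROM sequence ORDER BY position LIMIT 1 OFFSET :i;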

You could hack your database's B-tree implementation so that each B-tree page 
"knows" the number of records below it. At that point you would essentially 
be putting the RAM implementation into a file (with the optimistic assumption 
that Read(file, page#) is a constant-time operation in the file system).

In either of the above, autoincrement works fine as long as the total number 
of enqueue() calls over the lifetime of the database is not too large; but in 
addition to the autoincrement, you need to update the "parents" of inserted 
or removed records.

Regards

-Original Message-
From: Matthias-Christian Ott [mailto:o...@mirix.org]
Sent: Tuesday, March 01, 2016 10:38 AM
To: sqlite-users at mailinglists.sqlite.org
Subject: Re: [sqlite] Random-access sequences

On 01/03/16 11:59, Matthias-Christian Ott wrote:
> I know that this question is not strictly related to SQLite.
>
> I want to persist a random-access sequence with SQLite. A
> random-access sequence is a queue with random-access to its queued
> elements. Suppose that a random-access sequence [x_0, ..., x_n] has the 
> following operations:
>
> enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
> dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
> lookup([x_0, ..., x_n], i) = x_i
> update([x_0, ..., x_n], i, y) =
>   [x_0, ..., x_(i - 1), y, x_(i + 1), ... x_n]
> length([x_0, ..., x_n]) = n + 1

Thinking a bit more about it, I also need the following operation:

delete([x_0, ..., x_n], i) = [x_0, ..., x_(i - 1), x_(i + 1), ... x_n]

So implementing the data structure with autoincrement does not work anymore.

- Matthias-Christian





[sqlite] Random-access sequences

2016-03-01 Thread James K. Lowden
On Tue, 1 Mar 2016 08:15:25 -0500
Richard Damon  wrote:

> > The theoretical maximum number of rows in a table is 2^64
> > (18446744073709551616 or about 1.8e+19). This limit is unreachable
> > since the maximum database size of 140 terabytes will be reached
> > first. A 140 terabytes database can hold no more than approximately
> > 1e+13 rows, and then only if there are no indices and if each row
> > contains very little data.
> >
> You can hit 2^63 insertions well before hitting the size limit of the 
> database if you have also been doing deletions.

Yes.  If you manage 1,000,000 insertions/second, that's 3.15576 *
10^13/year.  You would run out of 64-bit integers in 584,542 years
(and of positive rowids, 2^63, in half that).

To get around that, add an "epoch" column, also integer.
Initially it is always zero.  Whenever "position" exceeds 2^63,
increment "epoch" and reset "position" to zero.  

That will give you at least twice as many years.  
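
A minimal sketch of that scheme (the epoch/position/value names are 
illustrative, not an established schema):

CREATE TABLE sequence (
  epoch    INTEGER NOT NULL DEFAULT 0,
  position INTEGER NOT NULL,
  value    BLOB,
  PRIMARY KEY (epoch, position)
);

-- oldest element first: epoch-major, position-minor order
SELECT epoch, position, value FROM sequence
  ORDER BY epoch ASC, position ASC LIMIT 1;

-- when position approaches 2^63, the application starts
-- a new epoch and resets position to zero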

--jkl


[sqlite] Random-access sequences

2016-03-01 Thread Matthias-Christian Ott
On 01/03/16 11:59, Matthias-Christian Ott wrote:
> I know that this question is not strictly related to SQLite.
> 
> I want to persist a random-access sequence with SQLite. A random-access
> sequence is a queue with random-access to its queued elements. Suppose
> that a random-access sequence [x_0, ..., x_n] has the following operations:
> 
> enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
> dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
> lookup([x_0, ..., x_n], i) = x_i
> update([x_0, ..., x_n], i, y) =
>   [x_0, ..., x_(i - 1), y, x_(i + 1), ... x_n]
> length([x_0, ..., x_n]) = n + 1

Thinking a bit more about it, I also need the following operation:

delete([x_0, ..., x_n], i) = [x_0, ..., x_(i - 1), x_(i + 1), ... x_n]

So implementing the data structure with autoincrement does not work anymore.

- Matthias-Christian



[sqlite] Random-access sequences

2016-03-01 Thread Matthias-Christian Ott
On 01/03/16 12:07, Stephan Beal wrote:
> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott 
> wrote:
> 
>> Unfortunately, this limits the maximum number of elements that can ever
>> be inserted during a table's life-time to 2^63 - 1. While this might be
>> acceptable in some cases it is an artificial limitation.
>>
> 
> Artificial, yes, but so is "64 bits." You will likely hit other limitations
> far before getting anywhere near 2^63-1 insertions:
> 
> https://www.sqlite.org/limits.html
> 
> e.g. point #13:
> 
> *Maximum Number Of Rows In A Table*
> 
> The theoretical maximum number of rows in a table is 2^64 (18446744073709551616
> or about 1.8e+19). This limit is unreachable since the maximum database
> size of 140 terabytes will be reached first. A 140 terabytes database can
> hold no more than approximately 1e+13 rows, and then only if there are no
> indices and if each row contains very little data.

I use SQLite as a buffer for small amounts of data, so I'm not near the
limits of the database. However, 2^63 - 1 for autoincrement columns
refers to the life-time of the table, not the limits of the database. I'm
aware that pushing 2^63 - 1 elements through the queue is not realistic
even in the largest deployments. But saying that your software breaks
after 2^63 - 1 elements feels somewhat like saying that your name can
only be 20 characters long because some mainframe software can't handle
more than that. It feels like you have failed as a software developer
to deliver a proper solution and are now pretending that your improper
solution is proper because nobody will hit the limits.


[sqlite] Random-access sequences

2016-03-01 Thread Stephan Beal
On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott 
wrote:

> Unfortunately, this limits the maximum number of elements that can ever
> be inserted during a table's life-time to 2^63 - 1. While this might be
> acceptable in some cases it is an artificial limitation.
>

Artificial, yes, but so is "64 bits." You will likely hit other limitations
far before getting anywhere near 2^63-1 insertions:

https://www.sqlite.org/limits.html

e.g. point #13:

*Maximum Number Of Rows In A Table*

The theoretical maximum number of rows in a table is 2^64 (18446744073709551616
or about 1.8e+19). This limit is unreachable since the maximum database
size of 140 terabytes will be reached first. A 140 terabytes database can
hold no more than approximately 1e+13 rows, and then only if there are no
indices and if each row contains very little data.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf


[sqlite] Random-access sequences

2016-03-01 Thread Graham Holden


 Original message 
From: Stephan Beal  
Date: 01/03/2016  12:07  (GMT+00:00) 
To: SQLite mailing list  
Subject: Re: [sqlite] Random-access sequences 

> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott 
> wrote:

>> Unfortunately, this limits the maximum number of elements that can ever
>> be inserted during a table's life-time to 2^63 - 1. While this might be
>> acceptable in some cases it is an artificial limitation.
>>

> Artificial, yes, but so is "64 bits." You will likely hit other limitations
> far before getting anywhere near 2^63-1 insertions:

> https://www.sqlite.org/limits.html

> e.g. point #13:

> *Maximum Number Of Rows In A Table*

I don't think he's bothered about the maximum number of rows at one time, but 
that he might run out of new rowids. However, this feels like a needless 
concern: at 1,000 enqueue/dequeue operations per second, 2^63 IDs will last 
292 million years...?

Graham.


[sqlite] Random-access sequences

2016-03-01 Thread Matthias-Christian Ott
I know that this question is not strictly related to SQLite.

I want to persist a random-access sequence with SQLite. A random-access
sequence is a queue with random-access to its queued elements. Suppose
that a random-access sequence [x_0, ..., x_n] has the following operations:

enqueue([x_0, ..., x_n], x) = [x_0, ..., x_n, x]
dequeue([x_0, ..., x_n]) = [x_1, ..., x_n]
lookup([x_0, ..., x_n], i) = x_i
update([x_0, ..., x_n], i, y) =
  [x_0, ..., x_(i - 1), y, x_(i + 1), ... x_n]
length([x_0, ..., x_n]) = n + 1

A naive list-based implementation would have at least one operation that
is in O(n).

A possible implementation in SQLite with all operations in O(log n)
could be:

CREATE TABLE sequence (
  position INTEGER PRIMARY KEY AUTOINCREMENT,
  ...
);
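-- note: AUTOINCREMENT guarantees strictly increasing positions that are
-- never reused, which is what the position arithmetic below relies on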

-- enqueue
INSERT INTO sequence VALUES (...);

-- dequeue
SELECT position, ... FROM sequence ORDER BY position ASC LIMIT 1;
DELETE FROM sequence WHERE position = :position;
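-- (the SELECT and the DELETE assume one enclosing transaction; otherwise
-- a concurrent dequeue could pick the same row)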

-- lookup
SELECT position AS start, ... FROM sequence
  ORDER BY position ASC LIMIT 1;
SELECT ... FROM sequence WHERE position = :start + :i;

-- update
UPDATE sequence SET ... WHERE position = :i;

-- length
SELECT position AS start, ... FROM sequence
  ORDER BY position ASC LIMIT 1;
SELECT position AS end, ... FROM sequence
  ORDER BY position DESC LIMIT 1;
SELECT :end - :start + 1;
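
Length can also be computed in one statement, assuming positions stay 
contiguous (no interior deletions); each scalar subquery is a single 
min/max aggregate, which SQLite satisfies with an index lookup:

-- length in one query (returns NULL for an empty table)
SELECT (SELECT max(position) FROM sequence)
     - (SELECT min(position) FROM sequence) + 1 AS length;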

Unfortunately, this limits the maximum number of elements that can ever
be inserted during a table's life-time to 2^63 - 1. While this might be
acceptable in some cases it is an artificial limitation.

It seems that one way that is not subject to such limitations is to
abuse tables as lists and use algorithms for purely functional data
structures. There is an algorithm that could be used to implement
random-access sequences in SQLite efficiently [1], but it is neither
elegant nor simple.

Does anyone have another idea how to elegantly implement random-access
sequences in O(log n) in SQLite?

- Matthias-Christian

[1] Robert Hood and Robert Melville, "Real-Time Queue Operations in Pure
LISP", Information Processing Letters 13(2), 1981.


[sqlite] Random-access sequences

2016-03-01 Thread Richard Damon
On 3/1/16 7:07 AM, Stephan Beal wrote:
> On Tue, Mar 1, 2016 at 12:59 PM, Matthias-Christian Ott 
> wrote:
>
>> Unfortunately, this limits the maximum number of elements that can ever
>> be inserted during a table's life-time to 2^63 - 1. While this might be
>> acceptable in some cases it is an artificial limitation.
>>
> Artificial, yes, but so is "64 bits." You will likely hit other limitations
> far before getting anywhere near 2^63-1 insertions:
>
> https://www.sqlite.org/limits.html
>
> e.g. point #13:
>
> *Maximum Number Of Rows In A Table*
>
> The theoretical maximum number of rows in a table is 2^64 (18446744073709551616
> or about 1.8e+19). This limit is unreachable since the maximum database
> size of 140 terabytes will be reached first. A 140 terabytes database can
> hold no more than approximately 1e+13 rows, and then only if there are no
> indices and if each row contains very little data.
>
You can hit 2^63 insertions well before hitting the size limit of the 
database if you have also been doing deletions.

A simple repeated pattern of "add record N, remove record N-1" will 
eventually use up the available record numbers.

The answer here is to detect when the record number is getting close to 
'too big' and then go through and renumber the records back into sequence 
from 1, resetting the next record number. This might require changing 
data elsewhere if record numbers have been saved anywhere, which means 
it might make sense not to wait until overflow is imminent, but to do it 
proactively at a time when minimal records (ideally no records) are 
referenced elsewhere (and only in known locations).
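
A hedged sketch of such an offline renumbering pass, assuming the sequence 
table from earlier in the thread with a single payload column named value 
(illustrative) and no external references to positions:

BEGIN;
-- copy the rows out in order; the temporary table assigns fresh rowids 1..N
CREATE TEMPORARY TABLE compacted AS
  SELECT value FROM sequence ORDER BY position;
DELETE FROM sequence;
-- reset the AUTOINCREMENT counter for the table
DELETE FROM sqlite_sequence WHERE name = 'sequence';
-- re-insert in order; AUTOINCREMENT assigns positions 1..N again
INSERT INTO sequence (value) SELECT value FROM compacted ORDER BY rowid;
DROP TABLE compacted;
COMMIT;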

2^63 is such a big number that you can likely find a time for standard PM 
when you can take the database offline and do this, well before anything 
overflows. At one insertion per nanosecond, you have almost three centuries 
before it needs to be done (if I am doing my math right), so you can do it 
when you migrate to a new box.

With 32-bit numbers, these limits were readily reachable; 64 bits was a 
quantum leap in range, adding 32 doublings to the limit, and it will take a 
while for usage to catch up (though it likely will). The 140-terabyte limit 
is something I could see being hit now in some applications.

-- 
Richard Damon