[sqlite] New word to replace "serverless"

2020-01-28 Thread Howard Chu
Rowan Worth wrote:
> On Tue, 28 Jan 2020 at 06:19, Richard Hipp  wrote:
> 
>> Note that "in-process" and "embedded" are not adequate substitutes for
>> "serverless".  An RDBMS might be in-process or embedded but still be
>> running a server in a separate thread. In fact, that is how most
>> embedded RDBMSes other than SQLite work, if I am not much mistaken.

Wait, really? AFAICS embedded means in-process, no IPC required to operate.

-- 
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FW: Questions about your "Performance Matters" talk re SQLite

2020-01-04 Thread Howard Chu
Keith Medcalf wrote:
> 
> Indeed turning off memstatus leads to a 500% (from ~3s to ~0.5s) performance 
> increase.  
> Changing the threading mode or the indirection level of the mutexes calls 
> seems to have no significant effect.
> 
Goes to show - publishing benchmark results without investigating why they are 
what
they are is mostly pointless. When you suspect mutex contention is a significant
factor, you should use something like mutrace to confirm your suspicion first.

Fundamentally there ought to be no performance difference between running a 
64-threaded
server on a 64-threaded CPU vs 64 single-threaded processes. In practice, the 
single
process with 64 threads ought to be slightly faster, due to less context switch 
overhead
between threads, but if nothing else in the system is contending for CPU then 
context
switching shouldn't even be an issue.

-- 
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS5 Transaction Leads to OOB Read

2019-03-19 Thread Chu
YES. AFL with ASAN.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS5 Transaction Leads to NULL Pointer

2019-03-18 Thread Chu
The code:

```
CREATE VIRTUAL TABLE t1 USING fts5(content);

INSERT INTO t1 VALUES('');

BEGIN ;
DELETE FROM t1 WHERE rowid = 1;
SELECT * FROM t1 WHERE content MATCH '';
INSERT INTO t1 VALUES('');
SELECT * FROM t1 WHERE content MATCH '';
END;
``

As you can see, it creates a virtual table with fts5, and run a transaction on 
it, this will leads to a crash because of null pointer. The ASAN report:

```
➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 1-null-pointer.sql
AddressSanitizer:DEADLYSIGNAL
=
==20822==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc 
0x55df5393c60a bp 0x0001 sp 0x706021b0 T0)
==20822==The signal is caused by a READ memory access.
==20822==Hint: address points to the zero page.
#0 0x55df5393c609 in fts5ChunkIterate 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210934
#1 0x55df5393ca5e in fts5SegiterPoslist 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210970
#2 0x55df5393d65d in fts5IterSetOutputs_Full 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211177
#3 0x55df5393f17e in fts5MultiIterNext 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210732
#4 0x55df539444e9 in fts5MultiIterNew 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211309
#5 0x55df5394702f in sqlite3Fts5IndexQuery 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:213266
#6 0x55df5398a566 in fts5ExprNearInitAll 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205261
#7 0x55df5398a566 in fts5ExprNodeFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205778
#8 0x55df5398ad3d in sqlite3Fts5ExprFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205836
#9 0x55df5398af0d in fts5CursorFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215371
#10 0x55df5398cc9d in fts5FilterMethod 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215653
#11 0x55df538a973a in sqlite3VdbeExec 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:90333
#12 0x55df538c5439 in sqlite3Step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81716
#13 0x55df538c5439 in sqlite3_step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81781
#14 0x55df536f9662 in exec_prepared_stmt 
/root/Documents/sqlite-autoconf-3270200/shell.c:10445
#15 0x55df536f9662 in shell_exec 
/root/Documents/sqlite-autoconf-3270200/shell.c:10752
#16 0x55df536fbdf3 in runOneSqlLine 
/root/Documents/sqlite-autoconf-3270200/shell.c:16106
#17 0x55df5370b466 in process_input 
/root/Documents/sqlite-autoconf-3270200/shell.c:16206
#18 0x55df536d6c98 in main 
/root/Documents/sqlite-autoconf-3270200/shell.c:16967
#19 0x7f5c4f52809a in __libc_start_main ../csu/libc-start.c:308
#20 0x55df536d8599 in _start 
(/root/Documents/sqlite-autoconf-3270200/sqlite3+0x46599)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210934 in fts5ChunkIterate
==20822==ABORTING
```

View detail In gdb:

```
(gdb) r < 1-null-pointer.sql 
The program being debugged has been started already.
Start it from the beginning? (y or n) Y
Starting program: /root/Documents/sqlite-autoconf-3270200/sqlite3 < 
1-null-pointer.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x557fe60a in fts5ChunkIterate (p=p@entry=0x60d00ad8, 
pSeg=pSeg@entry=0x61300b28, pCtx=0x7fffac00, 
xChunk=xChunk@entry=0x55622dc0 ) at 
sqlite3.c:210934
210934pData = fts5LeafRead(p, FTS5_SEGMENT_ROWID(pSeg->pSeg->iSegid, 
pgno));
(gdb) bt
#0  0x557fe60a in fts5ChunkIterate (p=p@entry=0x60d00ad8, 
pSeg=pSeg@entry=0x61300b28, pCtx=0x7fffac00, 
xChunk=xChunk@entry=0x55622dc0 ) at 
sqlite3.c:210934
#1  0x557fea5f in fts5SegiterPoslist (p=0x60d00ad8, 
pSeg=0x61300b28, pColset=pColset@entry=0x602014b8, 
pBuf=pBuf@entry=0x61300ae8)
at sqlite3.c:210970
#2  0x557ff65e in fts5IterSetOutputs_Full (pIter=0x61300ac8, 
pSeg=) at sqlite3.c:211177
#3  0x5580117f in fts5MultiIterNext (p=p@entry=0x60d00ad8, 
pIter=pIter@entry=0x61300ac8, bFrom=bFrom@entry=0, iFrom=iFrom@entry=0)
at sqlite3.c:210732
#4  0x558064ea in fts5MultiIterNew (p=p@entry=0x60d00ad8, 
pStruct=pStruct@entry=0x60402458, flags=flags@entry=16, 
pColset=pColset@entry=0x602014b8, pTerm=, 
nTerm=nTerm@entry=5, iLevel=, nSegment=, 
ppOut=)
at sqlite3.c:211309
#5  0x55809030 in sqlite3Fts5IndexQuery (p=0x60d00ad8, 
pToken=pToken@entry=0x60201498 "", nToken=4, flags=flags@entry=0, 
pColset=pColset@entry=0x602014b8, ppIter=ppIter@entry=0x61300938) 
at sqlite3.c:213266
#6  0x5584c567 in fts5ExprNearInitAll (pExpr=0x60402598, 
pExpr=0x60402598, 

[sqlite] FTS5 Transaction Leads to OOB Read

2019-03-18 Thread Chu
The code:

```
CREATE VIRTUAL TABLE t1 USING fts5(content);

BEGIN;
INSERT INTO t1 (content) VALUES('');
SELECT * FROM 
t1('*');
END;
```

As you can see, it creates a virtual table with fts5, and run a transaction on 
it, this will leads to a OOB READ. The ASAN report:

```
➜  sqlite-crashes ../sqlite-autoconf-3270200/sqlite3 < 2-oob-read.sql
=
==21007==ERROR: AddressSanitizer: heap-buffer-overflow on address 
0x60d02898 at pc 0x7f0cad16e6a3 bp 0x7ffdc88ddc80 sp 0x7ffdc88dd430
READ of size 81 at 0x60d02898 thread T0
#0 0x7f0cad16e6a2  (/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2)
#1 0x563324ca4013 in fts5HashEntrySort 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:207762
#2 0x563324e685f9 in sqlite3Fts5HashScanInit 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:207820
#3 0x563324e685f9 in fts5SegIterHashInit 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:210321
#4 0x563324e685f9 in fts5MultiIterNew 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:211270
#5 0x563324e6b380 in fts5SetupPrefixIter 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:212946
#6 0x563324e6b380 in sqlite3Fts5IndexQuery 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:213275
#7 0x563324eae566 in fts5ExprNearInitAll 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205261
#8 0x563324eae566 in fts5ExprNodeFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205778
#9 0x563324eaed3d in sqlite3Fts5ExprFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:205836
#10 0x563324eaef0d in fts5CursorFirst 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215371
#11 0x563324eb0c9d in fts5FilterMethod 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:215653
#12 0x563324dcd73a in sqlite3VdbeExec 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:90333
#13 0x563324de9439 in sqlite3Step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81716
#14 0x563324de9439 in sqlite3_step 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:81781
#15 0x563324c1d662 in exec_prepared_stmt 
/root/Documents/sqlite-autoconf-3270200/shell.c:10445
#16 0x563324c1d662 in shell_exec 
/root/Documents/sqlite-autoconf-3270200/shell.c:10752
#17 0x563324c1fdf3 in runOneSqlLine 
/root/Documents/sqlite-autoconf-3270200/shell.c:16106
#18 0x563324c2f466 in process_input 
/root/Documents/sqlite-autoconf-3270200/shell.c:16206
#19 0x563324bfac98 in main 
/root/Documents/sqlite-autoconf-3270200/shell.c:16967
#20 0x7f0cacd7009a in __libc_start_main ../csu/libc-start.c:308
#21 0x563324bfc599 in _start 
(/root/Documents/sqlite-autoconf-3270200/sqlite3+0x46599)

0x60d02898 is located 0 bytes to the right of 136-byte region 
[0x60d02810,0x60d02898)
allocated by thread T0 here:
#0 0x7f0cad19f350 in __interceptor_malloc 
(/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
#1 0x563324cdccf3 in sqlite3MemMalloc 
/root/Documents/sqlite-autoconf-3270200/sqlite3.c:22837

SUMMARY: AddressSanitizer: heap-buffer-overflow 
(/lib/x86_64-linux-gnu/libasan.so.5+0xb86a2) 
Shadow bytes around the buggy address:
  0x0c1a7fff84c0: fd fd fd fd fd fd fa fa fa fa fa fa fa fa fd fd
  0x0c1a7fff84d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c1a7fff84e0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c1a7fff84f0: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x0c1a7fff8500: fa fa 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c1a7fff8510: 00 00 00[fa]fa fa fa fa fa fa fa fa fd fd fd fd
  0x0c1a7fff8520: fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa
  0x0c1a7fff8530: fa fa fa fa fa fa 00 00 00 00 00 00 00 00 00 00
  0x0c1a7fff8540: 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa fa
  0x0c1a7fff8550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c1a7fff8560: 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==21007==ABORTING
```

View detail in gdb:

```
(gdb) r < 2-oob-read.sql 
Starting program: /root/Documents/sqlite-autoconf-3270200/sqlite3 < 
2-oob-read.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x5564200f in fts5HashEntrySort (pHash=0x60402018, 

Re: [sqlite] Multi threaded readers on memory sqlite cannot scale

2018-05-15 Thread Howard Chu

Jens Alfke wrote:




On May 14, 2018, at 11:25 PM, Howard Chu <h...@symas.com> wrote:


Do you have a link to the currently updated version of this? Google gives me 
projects that haven't been updated since 2015.


That's the most recent version. So far I haven't seen any compelling new 
features in subsequent SQLite versions to warrant a resync of the code.


It looks to be based on SQLite 3.7.16. New features since then include partial 
indexes, common table expressions, next-gen query planner, deferring foreign 
keys, WITHOUT ROWID tables, row values, secure pointer passing for native 
functions, sqlite3_serialize(), and the upcoming UPSERT … and that's just from 
skimming through the release history. Plus of course all sorts of query planner 
improvements, and misc. bug fixes.


Many of those new features resulted in performance regressions, which is the 
main reason I stayed away from them.


Partial indexes are a must-have for my use case, so when I ran across 
SQLightning in 2016 I quickly gave up on it. There have also been three or four 
bug fixes in recent SQLite releases that fixed serious problems we were having.

If you build a modified version of SQLite in such a way that it can't feasibly* 
be updated by anyone but you, and then have no plans to keep it up to date, it 
isn't a product anyone else can seriously use. It's more of a personal project, 
or a gauntlet thrown down to the SQLite team. Which is fine as it goes, but I 
don't think it's a good idea to suggest other people use it.

—Jens

* I've looked at the source. There's no version history in Git, no copy of the 
original SQLite source files, and no markers I could find in the source that 
show where your changes are. I quickly decided that trying to merge in a newer 
version of SQLite would be a waste of time.


Nonsense.

https://github.com/LMDB/sqlightning/tree/mdb

You didn't look carefully enough, so you're in no position to offer advice.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Multi threaded readers on memory sqlite cannot scale

2018-05-15 Thread Howard Chu

Peter Da Silva wrote:

Now I'm even more confused, there's "sqlightning" and "sqlitening". Are these 
different names for the same project? It doesn't seem so. Which was intended?

https://github.com/LMDB/sqlightning


I was referring to this. ^^

Since it uses LMDB, and LMDB readers are wait-free, you can run as many reads 
as you have CPU threads, without blocking.




On 5/14/18, 8:47 AM, "sqlite-users on behalf of Bernard Ertl" 
<sqlite-users-boun...@mailinglists.sqlite.org on behalf of bern...@interplansystems.com> 
wrote:

 This is the latest:
 
 http://www.sqlitening.planetsquires.com/index.php?topic=9427.0
 
 I contributed to the last SQLitening update.  No one has reported any issues that need fixing or updating since that update.  It seems to be working quite well/stable.
 
 
 
 > On 5/13/18, 6:48 AM, "sqlite-users on behalf of Howard Chu" <sqlite-users-boun...@mailinglists.sqlite.org on behalf of h...@symas.com> wrote:
 
 > SQLightning
 
 > Do you have a link to the currently updated version of this? Google gives me projects that haven't been updated since 2015.
 
 > ___

 sqlite-users mailing list
 sqlite-users@mailinglists.sqlite.org
 http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
 ___
 sqlite-users mailing list
 sqlite-users@mailinglists.sqlite.org
 http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
 


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users




--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Multi threaded readers on memory sqlite cannot scale

2018-05-15 Thread Howard Chu

Peter Da Silva wrote:

On 5/13/18, 6:48 AM, "sqlite-users on behalf of Howard Chu" 
<sqlite-users-boun...@mailinglists.sqlite.org on behalf of h...@symas.com> wrote:

 SQLightning

Do you have a link to the currently updated version of this? Google gives me 
projects that haven't been updated since 2015.


That's the most recent version. So far I haven't seen any compelling new 
features in subsequent SQLite versions to warrant a resync of the code.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Multi threaded readers on memory sqlite cannot scale

2018-05-13 Thread Howard Chu

Clemens Ladisch wrote:

Techno Magos wrote:

So, memory sqlite is not really usable with multiple threads (readers).
While one might expect  that multiple readers of *memory *content could
scale even better than with file content.


Concurrent accesses to the same in-memory data structures must be
serialized.  In shared-cache mode, the connections share the cache, while
on-disk connections each have their own cache.


Is there some special mode possible to achieve scaling up throughput with
multiple threads for memory sqlite content?


Put a DB file on a RAM disk.  Or on a normal disk (with looser synchronous
and journal_mode settings), and rely on the OS file cache.


Or just use SQLightning, which has no scalability limits for readers.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Enabling MMAP in Android

2017-12-12 Thread Howard Chu

J Decker wrote:

On Tue, Dec 12, 2017 at 4:35 PM, Howard Chu <h...@symas.com> wrote:


Martin Raiber wrote:


On 12.12.2017 19:47 Simon Slavin wrote:


On 12 Dec 2017, at 6:27pm, Jens Alfke <j...@mooseyard.com> wrote:

On Dec 12, 2017, at 5:46 AM, Simon Slavin <slav...@bigfraud.org> wrote:



Before you answer that question, you should know that both Windows and
macOS have been proved to have serious bugs in their memory mapping code.


This has been brought up several times recently, but I’ve never seen
any details given about exactly what was wrong with macOS’s mmap
implementation. Does anyone have a pointer to authoritative information
about this?


See this thread:

<http://sqlite.1065341.n5.nabble.com/SQLITE-vs-OSX-mmap-inev
itable-catalog-corruption-td85620.html>

This is the thread which led to memory mapping being disabled for
writing on macOS, as discussed here:

<http://sqlite.1065341.n5.nabble.com/Re-Database-corruption-
and-PRAGMA-fullfsync-on-macOS-td95366i20.html>



There might also be a mmap bug in the Android 7.0 fuse layer:
https://www.mail-archive.com/openldap-its@openldap.org/msg10970.html



There is definitely a bug in Android 7 fuse/mmap. The bug is definitely
not present when bypassing fuse, but only rooted devices can bypass...

My experience was in majority only rooted devices could use fuse.  since

/dev/fuse was rw--- .  (although not all.)


The /sdcard partition is always mounted through fuse because it's a vfat/exfat 
filesystem that doesn't support owner/permission bits, and the fuse driver 
imposes the Android security model on top of it. Many android devices no 
longer include a physical SDcard slot, but still have an internal storage 
partition that's labeled /sdcard and it still behaves this way.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Enabling MMAP in Android

2017-12-12 Thread Howard Chu

Martin Raiber wrote:

On 12.12.2017 19:47 Simon Slavin wrote:

On 12 Dec 2017, at 6:27pm, Jens Alfke <j...@mooseyard.com> wrote:


On Dec 12, 2017, at 5:46 AM, Simon Slavin <slav...@bigfraud.org> wrote:

Before you answer that question, you should know that both Windows and macOS 
have been proved to have serious bugs in their memory mapping code.

This has been brought up several times recently, but I’ve never seen any 
details given about exactly what was wrong with macOS’s mmap implementation. 
Does anyone have a pointer to authoritative information about this?

See this thread:

<http://sqlite.1065341.n5.nabble.com/SQLITE-vs-OSX-mmap-inevitable-catalog-corruption-td85620.html>

This is the thread which led to memory mapping being disabled for writing on 
macOS, as discussed here:

<http://sqlite.1065341.n5.nabble.com/Re-Database-corruption-and-PRAGMA-fullfsync-on-macOS-td95366i20.html>


There might also be a mmap bug in the Android 7.0 fuse layer:
https://www.mail-archive.com/openldap-its@openldap.org/msg10970.html


There is definitely a bug in Android 7 fuse/mmap. The bug is definitely not 
present when bypassing fuse, but only rooted devices can bypass...


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Enabling MMAP in Android

2017-12-12 Thread Howard Chu

Jens Alfke wrote:


I’m skeptical about mmap being broken on Macs, since there are other production 
databases such as Realm* that use it heavily. (Though I am not sure whether 
Realm uses writeable mappings.)

—Jens

* and LMDB, but I am not sure if LMDB is in use on macOS.


LMDB is in common use on MacOS, no issues have been reported. On iOS there's 
this nagging problem that the virtual address space is still limited to 4GB, 
even on 64bit systems.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read-only access which does not block writers

2016-11-28 Thread Howard Chu

Nico Williams wrote:

On Fri, Nov 25, 2016 at 09:08:37AM +0100, Florian Weimer wrote:

I think you misunderstood what I was asking (see the start of the thread).
I need to support Byzantine readers which do not follow the locking
protocol.  Based on the documentation, LMDB uses locks to implement MVCC and
prevent premature page reuse.



There's a good use case for an LMDB-like COW DB such that readers need
never lock.  That would require vacuuming by writing a new file and
renaming it into place, which is a trade-off.

One of the problems with LMDB's locks is that readers need write
permission to the lock file, IIRC :( but at least it's a separate file.


We developed an alternate locking protocol for Postfix. It just uses standard 
fcntl() locks.


http://www.postfix.org/lmdb_table.5.html

Of course nothing comes for free - with this approach, writers don't block 
readers, but readers block writers. That's the best you're going to get 
without custom lock protocols like LMDB uses natively.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read-only access which does not block writers

2016-11-24 Thread Howard Chu

Florian Weimer wrote:

On 11/24/2016 10:41 PM, Howard Chu wrote:

As a
compromise you could use SQLightning, which replaces SQLite's Btree
layer with LMDB. Since LMDB *does* allow readers that don't block
writers.


How does it do that?  Does LMDB perform lock-free optimistic reads and
retroactively verifies that the entire read operation was consistent? The web
page currently says that “readers need write access to locks and lock file”:

  <http://lmdb.tech/doc/>


Readers are lock-free/wait-free. Since LMDB uses MVCC readers get their own 
fully isolated snapshot of the DB so no retroactive verification is needed.


The restriction on opening the database twice within the same process is
something which we would have to work around, too.

I see you are working on eliminating the key size limit, which is nice.


Yeah, should be available in a month or so in 1.0 release candidates.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read-only access which does not block writers

2016-11-24 Thread Howard Chu

Florian Weimer wrote:

On 11/24/2016 02:54 PM, Richard Hipp wrote:

On 11/24/16, Florian Weimer <fwei...@redhat.com> wrote:

I'd like to replace the use of Berkeley DB in RPM with SQLite.

The scenario is special in the follow way.  There is no database server,
all access goes directly to the database.  Unprivileged users without
write access to the RPM database are expected to run read-only queries
against the database.  Privileged users (basically, root) is expected to
use locking to exclude concurrent writers.  But read-only users should
not be able to stop acquisition of a write lock.

Is there a way to do this with SQLite?


Seems like quite a lot of burden to go this route. SQLite has a much larger 
footprint than BDB, and much worse performance overall. As a compromise you 
could use SQLightning, which replaces SQLite's Btree layer with LMDB. Since 
LMDB *does* allow readers that don't block writers. But it would be more 
efficient to just use LMDB directly, and not incur the overhead of the SQL 
translation layer.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-13 Thread Howard Chu

Daniel Meyer wrote:

We are interested in using sqlite as a read only, in memory, parallel
access database.  We have database files that are on the order of 100GB
that we are loading into memory.  We have found great performance when
reading from a single thread.  We need to scale up to have many parallel
reader threads.  Once the DB is created it never needs to be modified.  How
can we allow many reader threads on an in memory, write once read many
times database and achieve multi-core performance?  Is this possible with
sqlite?


Thanks for all the helpful responses.  I have moved forward experimenting
with using parallel readers on an in memory sqlite database.  I have found
that I get true concurrency (multi-core speed up) when I create a new
connection to my database file on disk in every thread.  I've verified this
by running a single long query and then running the same query in several
threads and ensuring the net time is the same as the single thread query
time.

In order to get parallel readers on an in memory database I first loaded
the file into memory with:
rc = sqlite3_open_v2("file::memory:?cache=shared", , SQLITE_OPEN_URI |
SQLITE_OPEN_READWRITE, NULL);
I hold onto the db reference in the main thread after loading the data and
don't close that connection until all the worker threads are done
attempting to run my long query.

I then spawn N threads, each creating their own connection like so:
rc = sqlite3_open_v2("file::memory:?cache=shared", , SQLITE_OPEN_URI |
SQLITE_OPEN_READONLY | SQLITE_OPEN_NOMUTEX, NULL);
I have experimented with the various flags; however, everything I do gives
me serial performance.  There is some kind of mutex locking the database so
that running my query say two times takes twice as long as running it once
whereas with the disk based approach using this connection string in each
thread:
rc = sqlite3_open("data.sl3", );
makes the total time to run the query twice the same as running it just
once.

I am hypothesizing that since we are using 'cache=shared' and hence each
thread is sharing the same cache each read requires locking the database.
What I would like is to get the same kind of behavior as we get with "
file::memory:?cache=shared" wherein every time I open a new connection that
connection points to the same memory; however, does not actually involve
sharing the cache so no global mutex locks the database on every read.

I have put my test code in a gist.
My C code is here:
https://gist.github.com/danielrmeyer/fae54d5993f2800626c616e72782b5eb
I generate the 1.5GB test database with this python 3.4 script:
https://gist.github.com/danielrmeyer/bfa415256502471d1512f2155e76adc2

I compiled the C code on my system with the following command:
gcc -std=gnu99 test.c -o test -lsqlite3 -lpthread (I did download the
amalgamation and copied the sqlite.h and sqlite.o files into my cwd after
building)

I apologize if the C code is not as clean as it could be.  I'm primarily a
Python programmer but figured i'd be more likely to get help with a C test
case so I did my best to hack this together.  The Python GIL was confusing
the situation in any case.

A little background on what I am doing:  I have several large datasets that
I wish to serve up to customers to generate custom reports based on unique
slices of the data.  I am using a cluster of machines with .5TB of memory
each so loading all the data into memory is reasonable in my case.  I've
found that against my production work load I get massive speedups in single
threaded tests against the in memory database relative to the disk
version.  In fact I have found that the single threaded sqlite in memory
tests are faster than all the other database solutions i've looked at so I
am very excited about using sqlite, nevertheless I really need to scale to
many cores.  Also, my work load is highly random so cache is not much
help.  I really want the data in memory.  Any help is greatly appreciated.
I have started experimenting with memory mapped io; however, I have not had
much luck so far.


Use LMDB in SQLightning. Since LMDB reads acquire no locks, they scale 
perfectly linearly across arbitrarily many CPUs. No other DB engine will do 
what you're looking for. Everything else based on locking will bottleneck as 
soon as you extend beyond a single CPU socket.



I would not mind creating a fork of sqlite on github and hacking the code
if someone could give me pointers on what needs to be modified to get this
working.  Certainly if there is an extra flag or URI I need to use to get
concurrent in memory read access that would be great, but I'm willing to
try and modify the source code and sharing with the community if I can
figure out how to get this going.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: [sqlite] Parallel access to read only in memory database

2016-10-11 Thread Howard Chu

Jens Alfke wrote:



On Oct 9, 2016, at 10:41 AM, Howard Chu <h...@symas.com> wrote:

As for code freshness, I've seen no compelling new features from 3.8.x onward 
that would improve performance so there's been no reason to update further.


Perhaps, but there’s important new functionality in newer versions, such as 
partial indexes and indexes on expressions.

If it’s a personal project for you, or a tech demo, then it’s understandable 
for you to drop it when it stops being interesting; but if this is something 
intended for other people to use, they’re going to want to see it supported 
going forward (the way SQLCipher is, for instance.) It would be nice to add a 
note to the readme saying something like “FYI, this is based on SQLite 3.7.17 
and there are no plans to sync with newer versions.”


As with any open source project, if users want to see something change, the 
onus is on them to initiate those changes. Nobody associated with an open 
source project is ever obligated to proactively implement anything. 
particularly things that no user has asked for yet. SQLightning has many 
satisfied users already and none of them have requested the features you 
mentioned.


As for myself, it works for me in my personal builds of Mozilla Seamonkey.

Lack of development activity on a project doesn't mean it's dead and disused - 
it just means the project has accomplished its goals and its users are content.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-09 Thread Howard Chu

Jens Alfke wrote:



On Oct 9, 2016, at 8:15 AM, Howard Chu <h...@symas.com> wrote:

Use SQLightning, it's designed specifically for write once read many workloads.


"SQLite3 ported to use LMDB instead of its original Btree code” — sounds great, 
and the performance figures quoted in the readme are impressive. But the source code 
appears to be a modified version of SQLite’s source, which would make it very 
difficult to keep in sync with SQLite, and (as Domingo pointed out) the last commit 
is three years old and seems to be based on SQLite 3.7.17. So far this looks like an 
exciting proof-of-concept, but not something I’d use in a real project.

(By comparison, SQLCipher is also released as a modified copy of SQLite, but 
they sync with SQLite regularly; the latest version from this April is based on 
3.11.0.)

It would be best if this were implemented as a separate plugin, but as I’m not 
familiar with the innards of SQLite, I’ll assume that simply wasn't feasible. 
(I know SQLite supports VFS plugins, but I think those just operate at the 
paging layer, below the b-tree.)


SQLite3's code structure is monolithic, not modular, so there is no clean way 
to replace its underlying Btree layer. It's necessarily a hack-and-slash 
proposition.


As for code freshness, I've seen no compelling new features from 3.8.x onward 
that would improve performance so there's been no reason to update further. 
Many of the SQLite performance enhancements from 3.8 are in its own Btree 
code, which is entirely ripped out in SQLightning and thus irrelevant. On the 
other hand, there have been multiple regressions (performance and otherwise) 
in subsequent releases, most often in the query planner, which I've chosen to 
avoid.


E.g.
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg85558.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg86191.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg86901.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg89666.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg91201.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg92189.html
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg99646.html

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Parallel access to read only in memory database

2016-10-09 Thread Howard Chu

Daniel Meyer wrote:

We are interested in using sqlite as a read only, in memory, parallel
access database.  We have database files that are on the order of 100GB
that we are loading into memory.  We have found great performance when
reading from a single thread.  We need to scale up to have many parallel
reader threads.  Once the DB is created it never needs to be modified.  How
can we allow many reader threads on an in memory, write once read many
times database and achieve multi-core performance?  Is this possible with
sqlite?


Use SQLightning, it's designed specifically for write once read many workloads.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why so much I/O ? Can sqlite be improved ?

2016-10-01 Thread Howard Chu

Domingo Alvarez Duarte wrote:

Hello Simon !

I already did it without using "wal" and the result was the same.

And even for my surprise in one try I stopped at the middle performed an
"analyze" and the performance deteriorated a lot to a point that I needed to
delete the stats tables to get the better performance without "analyze".

I also tried with the lsm module and got a bit better performance but with an
irregular timing and a bigger disk usage (20%).

Also tested with lmdb with an astonishing insertion rate but with a lot more
disk usage and irregular timing.


Using LMDB the VACUUM command is supposed to be a no-op; at least that's how I 
intended it. Since LMDB deletes records immediately instead of leaving 
tombstones, there is nothing to vacuum.


Also tested with leveldb with a worse performance and almost twice disk space
usage.

The data distribution on some tables seem to fall into the worst corner cases
for btrees.

Cheers !


On 01/10/16 18:26, Simon Slavin wrote:

On 1 Oct 2016, at 10:18pm, Domingo Alvarez Duarte <mingo...@gmail.com> wrote:


About the vacuum I also understand the need to rewrite the whole database
but I'm not sure if it's really necessary to do almost 5 times the database
size in both reads and writes (also an equivalent amount of I/O happened
during insertions).

Can you try it without

db.exec_dml("PRAGMA wal_checkpoint(FULL);");

and see if that improves time ?

That's the only thing I can see.  You're using a nested INSERT OR IGNORE
command I'm not familiar with.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Sync journal's directory per transaction? or until database closed? in PERSIST mode

2016-05-26 Thread Howard Chu

Graham Holden wrote:



 Original message 
From: Dan Kennedy <danielk1...@gmail.com>
Date: 26/05/2016  18:04  (GMT+00:00)
To: sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] Sync journal's directory per transaction? or until 
database closed? in PERSIST mode


On UNIX, it's possible to delete a file from the file-system while

another process has it open. In this case the other process can continue
reading and writing its file-descriptor as normal, but the data is
stored in memory only, not on disk (since the directory entry has been
deleted). Once the process exits or closes the file-descriptor, the data
is lost.

It probably doesn't affect the argument of why SQLite does what it does,
but I thought (but it's been a long time since I poked around UNIX
file-systems) that data can still be written (or read) to disk because the
i-node chain is still present, it's just that there's no name by which
another proces can access it. Presumably, the final close will release the
disk blocks of a marked-as-deleted file.


Correct. "The data is stored in memory only" is completely impractical - you 
can still write much more data than will fit in RAM. One common mistake of new 
Unix admins is to delete large log files from /var/spool when they're low on 
disk space; if they do this without telling syslog to close the files first, 
the space remains in use and can't be freed until syslog is killed and restarted.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Make mmap_size dynamic?

2016-05-03 Thread Howard Chu
Mikael wrote:
> On Tuesday, 3 May 2016, Richard Hipp  <javascript:_e(%7B%7D,'cvml','drh at sqlite.org');>> wrote:
>
>> On 5/2/16, Mikael  wrote:
>>>
>>> If I just force it on (by hacking the build script), as long as mmap_size
>>> always is 2^63, will Sqlite access the file via memory accesses only, and
>>> never using fread/fwrite which would lead to undefined behavior because
>> of
>>> the absence of a UBC?
>>>
>>
>> SQLite only reads using mmap.  Write always happen using the write()
>> (or pwrite()) system call.  So it will never work on OpenBSD.
>>
>> The database file is mmap-ed read-only.  This is so that stray
>> pointers in the application cannot trivially corrupt the database
>> file.
>
>
> Aha.
>
> What do you say about the idea of a memory access only mode, enabled
> with SQLITE_MMAP_NO_FILEIO on any OS, for the daring users who want that?

It's certainly an avenue to getting functionality in OpenBSD. In general, e.g. 
on Linux, there's not much to gain from it. There is a small performance gain 
when all of your data fits in RAM. There is a larger performance loss when 
your DB is larger than RAM - every time you touch a page to write to it, if 
it's not already memory-resident, the OS faults it in from disk. It's a wasted 
page fault if you were simply going to overwrite the entire page (which e.g. 
LMDB does). If you update pages in-place, instead of doing COW as LMDB does, 
it may or may not be a wash, I haven't measured that use case.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Changing the default page_size in 3.12.0

2016-03-05 Thread Howard Chu
Jim Callahan wrote:
> Is 4096 bytes a large enough page size?
>
> Apparently the disk drive industry has shifted from 512 byte sectors to
> 4096 byte sectors.
> http://tilt.lib.tsinghua.edu.cn/docs/tech/tp613_transition_to_4k_sectors.pdf
>
> Should SQLite maintain a 1:1 ratio between page size and sector size?
> or should the page size be a multiple of the sector size? Say 2:1 (8096 or
> 8K)? or 4:1 (16K).
>
> What sizes do other databases use? (SQL Server and Postgres both use 8096
> default)

You mean 8192.

> For years, virtual machines (VM) have used 4K pages (I think this started
> with IBM VM/370);
> while disk drives had 512 byte sectors (an 8:1 ratio).
>
> With a 2:1 ratio, in terms of seek time, one gets the second page for free.
>
> Would 8096 bytes (8K) be too much for a multi-tasking embedded device (such
> as a smart phone?).

You shouldn't even be discussing a hardcoded number. The page size should be 
equal to the page size of the underlying memory management system. 4K on 
common x86 systems, 8K on SPARC, etc. Choosing a number smaller than this will 
cost you in RMW ops whenever the filesystem tries to do an update. Choosing a 
number larger than this is generally going to waste memory.
>
> Are there any benchmarks?
>
> Jim
>
>
>
>
>
>
>
> On Fri, Mar 4, 2016 at 10:48 AM, Richard Hipp  wrote:
>
>> The tip of trunk (3.12.0 alpha) changes the default page size for new
>> database file from 1024 to 4096 bytes.
>>
>>  https://www.sqlite.org/draft/releaselog/3_12_0.html
>>  https://www.sqlite.org/draft/pgszchng2016.html
>>
>> This seems like a potentially disruptive change, so I want to give
>> you, the user community, plenty of time to consider the consequences
>> and potentially talk me out of it.
>>
>> The "Pre-release Snapshot" on the download page
>> (https://www.sqlite.org/download.html) contains this change, if you
>> want to actually evaluate the latest changes in your application.
>>
>> We hope to release 3.12.0 in early April, or maybe even sooner, so if
>> you want to provide feedback, you should do so without unnecessary
>> delay.
>> --
>> D. Richard Hipp
>> drh at sqlite.org
>> ___
>> sqlite-users mailing list
>> sqlite-users at mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Fwd: Re: SQLite and Thunderbird

2016-02-26 Thread Howard Chu
Richard Hipp wrote:
> I suppose that Thunderbird was making use of the fts3_tokenizer()
> interface, which has be removed from standard builds due to security
> concerns, as of version 3.11.0.  You can reenable that feature at
> compile-time by building with -DSQLITE_ENABLE_FTS3_TOKENIZER.  See the
> last bullet (the only bullet under the "Backwards Compability"
> heading) of the release notes at
> https://www.sqlite.org/releaselog/3_11_0.html for links to further
> information.
>
> At this time, you basically have two options:
>
> (1) Compile your system sqlite3.so library using
> SQLITE_ENABLE_FTS3_TOKENIZER and hope that none of the applications
> that link against this library use it in such a way that the
> fts3_tokenizer() could present a security vulnerability.
>
> (2) Statically link against a version of SQLite that you compile
> yourself.  SQlite is a single file of C code ("sqlite3.c") so making
> it a part of the project source tree is not a big deal.
>
> Option (2) seems like the best choice to me since that guarantees that
> Thunderbird will continue to operate regardless of what historical
> version of sqlite3.so happens to be installed (or not installed) on
> the system and regardless of the compile-time options used to create
> that sqlite3.so.  (For example, what if somebody installs a new
> sqlite3.so that omits full-text search?)  Static linking removes a
> dependency and makes Thunderbird more robust.

Thunderbird has *always* used its own statically built sqlite, just like all 
other Mozilla software.

In fact, it has more than one copy:
https://hg.mozilla.org/mozilla-central/file/918df3a0bc1c/db/sqlite3/src

https://hg.mozilla.org/mozilla-central/file/918df3a0bc1c/security/nss/lib/sqlite

>
> On 2/26/16, Steven Haigh  wrote:
>> Dear sqlite-users list,
>>
>> I'd like to try and get some pointers on the following issue as
>> documented on the following included BZ issues.
>>
>> Please CC me as I'm not subscribed to this list.
>>
>>
>>  Forwarded Message 
>> Subject: Re: SQLite and Thunderbird
>> Date: Fri, 26 Feb 2016 10:06:25 +0100
>> From: Jan Stan?k 
>> Organization: Red Hat
>> To: Steven Haigh 
>> CC: nils at redhat.com, stransky at redhat.com
>>
>> Hi,
>> I presume thet this is general thunderbird issue, not Fedora specific
>> one. If so, I would suggest asking at
>> sqlite-users at mailinglists.sqlite.org, they are usually quite helpful.
>>
>> Regards,
>> Jan
>>
>> Dne 26.2.2016 v 07:10 Steven Haigh napsal(a):
>>> Re:
>>>  https://bugzilla.redhat.com/show_bug.cgi?id=1310864
>>>  https://bugzilla.redhat.com/show_bug.cgi?id=1311032
>>>
>>> Hi all,
>>>
>>> Just trying to open a channel of communication regarding these bugs.
>>>
>>> While I believe thunderbird uses a format of call that is depreciated in
>>> the newer SQLite packages, it is not ideal to statically compile
>>> thunderbird against sqlite to make it work (which I believe is the
>>> current fix).
>>>
>>> Any suggestions on a long-term fix?
>>>
>>
>>
>> --
>> Jan Stanek - Red Hat Associate Developer Engineer - Databases Team
>>
>>
>>
>> --
>> Steven Haigh
>>
>> Email: netwiz at crc.id.au
>> Web: https://www.crc.id.au
>> Phone: (03) 9001 6090 - 0412 935 897
>>
>>
>>
>>
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] IS a SQLite db of small size as good as reliable cache?

2016-02-03 Thread Howard Chu
GB wrote:
> So where have you got this wisdom from? It's just plain Bullshit!
>
> Just as most cache managers do, Windows cache manager uses some sort of LRU
> caching scheme. So all data once read from file is kept in memory until either
> some memory pressure occurs or it is simply pushed out by newer data.

No. Windows will toss clean pages out even in the total absence of memory 
pressure. It is moronic, but true. You can repeat the steps I outlined for 
yourself and see.

https://groups.google.com/d/msg/comp.os.ms-windows.nt.misc/449tdNYPoX0/it4cx8Gvv2AJ

> Actually, from what I've experienced, I'd recommend to size sqlite's memory to
> fit for your largest transactions and leave most of the caching to the OS.
>
> And yes, unlike in many *NIX implementations, Windows' FlushFileBuffers() call
> DOES issue according write commands to the hardware, so setting PRAGMA
> synchronous ON/OFF makes a big difference in write performance on Windows
> systems.
>
> Howard Chu schrieb am 30.01.2016 um 23:23:
>> dpb wrote:
>>> Dear Community,
>>>
>>> Assuming that SQLite would arrange the database table contents on page
>>> boundary and
>>>
>>> 1)  if db is small let us say maximum 5MB,
>>>
>>> 2  further if such a db is part of windows service / application running
>>> continuously then may be pages will remain in the memory under normal
>>> conditions.
>>>
>>> Q1) Operating System like Windows would cache the series of pages, is this
>>> behavior not almost same as that of Cache?
>> The Windows cache manager is quite aggressive at evicting cached pages from
>> RAM. It used to be tunable back in Win2000, but none of those tuning knobs
>> survived past WinXP. Generally, if you access some data, leave it for more
>> than 5-10 seconds, don't expect to be able to reference it again without
>> incurring a hard page fault.
>>
>> Note that the cache eviction runs quite frequently - once every 5 seconds or
>> so, and evicts pages regardless of whether there's any memory pressure in the
>> system. It's quite possibly the stupidest cache manager ever written.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-02-01 Thread Howard Chu
Stephen Chrzanowski wrote:
> @Rowan;
>
> First off, whether the OS or SQLite is ACID or not, if you pull the plug on
> your hardware, all bets are off on whether it'll BOOT, let alone recover a
> single transaction.  I get that this could be a useful tool when doing
> disaster proofing, but, at that stage in the game of bulletproofing, you
> can't win every battle, and you're running into that at 100 miles an hour.

Your expectations are pretty low. On a properly configured Unix host, there's 
no reason for a powerfail to prevent a successful reboot. E.g., if you mount 
boot and root filesystems as read-only filesystems, they can never get 
corrupted. If you're using modern filesystems for your writable partitions 
(e.g., FSs with journaling) then there's also no reason for them to fail to 
come back online.

So it just comes down to your application code being reliable.

I should note that SQLightning has none of the problems being described in 
this thread - in its default mode, it is full-ACID and a powerfail cannot lose 
data or corrupt the database. And it does all this while being at least 30% 
faster on writes than vanilla SQLite.

Yes, the OS could have bugs. Yes, the hardware could physically fail. That's 
pretty rare though; HDDs R/W heads auto-retract on powerfail so unless the 
entire mechanism actually jammed, there's no way for a powerfail to cause a 
head crash or any other destructive event.

Bottom line - if your OS reboots successfully, there's no excuse for your 
database to not also come up successfully, fully intact.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] IS a SQLite db of small size as good as reliable cache?

2016-02-01 Thread Howard Chu
James K. Lowden wrote:
> On Sat, 30 Jan 2016 22:23:24 +
> Howard Chu  wrote:
>
>> Note that the cache eviction runs quite frequently - once every 5
>> seconds or so, and evicts pages regardless of whether there's any
>> memory pressure in the system. It's quite possibly the stupidest
>> cache manager ever written.
>
> Any insight into what they were thinking?  Back when I used Windows
> daily, it used to annoy me that every morning the machine had to warm
> up again, to revive the state I'd left it in the night before.  In
> NetBSD I learned that unused memory is unused, so why not use it?

As I understand it, the main rationale is reliability - they don't trust their 
OS to stay up longer than 3 seconds after any particular write operation.

There's a lot more information here
https://groups.google.com/forum/#!topic/comp.os.ms-windows.nt.misc/449tdNYPoX0
That's from 2005 and unfortunately the Windows kernel variables for cache 
tuning no longer exist.

A lot of the relevant info no longer exists on the original websites either, 
but I was able to pull one up from the web archive:

https://web.archive.org/web/20010825042328/http://www.sysinternals.com/ntw2k/source/cacheman.shtml

> I have a feeling that "fast app launching" is the reason, as though
> Windows users were excitedly punching the Start button with a
> stopwatch.  But maybe there's more to it than that?

It may well be a user-oriented philosophy. It is certainly not a 
developer-oriented approach. It was my frustration with slow build times on 
Windows that led me to investigate this in the first place.

https://bugzilla.mozilla.org/show_bug.cgi?id=294122#c69

A lengthier explanation of how it works is online here 
https://msdn.microsoft.com/en-us/library/bb742613.aspx but there's pretty much 
no information there that's actionable - aside from LargeSystemCache there's 
no tuning knobs left.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] IS a SQLite db of small size as good as reliable cache?

2016-01-30 Thread Howard Chu
dpb wrote:
> Dear Community,
>
> Assuming that SQLite would arrange the database table contents on page
> boundary and
>
> 1)  if db is small let us say maximum 5MB,
>
> 2  further if such a db is part of windows service / application running
> continuously then may be pages will remain in the memory under normal
> conditions.
>
> Q1) Operating System like Windows would cache the series of pages, is this
> behavior not almost same as that of Cache?

The Windows cache manager is quite aggressive at evicting cached pages from 
RAM. It used to be tunable back in Win2000, but none of those tuning knobs 
survived past WinXP. Generally, if you access some data, leave it for more 
than 5-10 seconds, don't expect to be able to reference it again without 
incurring a hard page fault.

Note that the cache eviction runs quite frequently - once every 5 seconds or 
so, and evicts pages regardless of whether there's any memory pressure in the 
system. It's quite possibly the stupidest cache manager ever written.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-28 Thread Howard Chu
Simon Slavin wrote:
>
> On 28 Jan 2016, at 1:38pm, Bernard McNeill  wrote:
>
>> ===
>> Like the user reading ?saving OK? and throwing away the
>> Post-It with the original information
>> ===
>>
>> This is exactly my concern.
>> The user throwing away the Post-It is entirely reasonable if he sees a
>> message like that.
>>
>> Do you happen to know if Linux/Debian (which I think uses a journalling
>> filesystem) carries this risk?
>
> The problem is not at the software level.

Not true. There *is* a problem at the software level - on Linux, current BSD 
(and apparently also on QNX) you must fsync the containing directory when you 
make changes to the contents of a directory (create/delete/rename files). This 
is above and beyond whatever lies the hardware layer may tell you. It's a 
documented requirement in Linux, at least. It is also independent of whether 
or not the filesystem uses journaling.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-26 Thread Howard Chu
Richard Hipp wrote:
> On 1/25/16, Howard Chu  wrote:
>>
>> This is actually quite an unusual requirement; on older Unix systems you
>> couldn't even *open* a directory, let alone obtain write access to it or
>> fsync it.
>
> Yeah.  When the SQLITE_DISABLE_DIRSYNC compile-time option is present,
> we disable the directory sync logic for this reason.  Some unixes
> (HP/UX) require -DSQLITE_DISABLE_DIRSYNC in order to work.  But Linux,
> MacOS, and *BSD all work without it, so I thought I'd just not bring
> that up...

I would have to say this is a BSD OS bug as it breaks the guarantees stated in 
the manpages. I.e., rename() and unlink() syscalls are documented to be 
atomic, and fsync() doesn't say anything about being needed to sync a directory.

http://www.unix.com/man-page/FreeBSD/2/fsync
http://www.unix.com/man-page/FreeBSD/2/unlink
http://www.unix.com/man-page/FreeBSD/2/rename/

I no longer have BSD source code on hand but I'd bet that when those manpages 
were written, all directory modifications in the BSD ffs were always 
synchronous. Linux obviously changed this but at least their fsync() manpage 
documents the behavior.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Bug: Successfully committed transaction rolled back after power failure

2016-01-25 Thread Howard Chu
Matthias-Christian Ott wrote:
> On 2016-01-25 16:47, Richard Hipp wrote:
>> On 1/25/16, Matthias-Christian Ott  wrote:
>>>
>>> Does this mean that if I use SQLite SQLITE_EXTRA_DURABLE=0, PRAGMA
>>> journal_mode=DELETE and PRAGMA synchronous=FULL, SQLite could loose a
>>> transaction that it said to be committed depending on the VFS?
>>
>> Sort of.  This appears to be true if you are running on QNX and you
>> lose power (or do a hard reset) shortly after the transaction commits.
>> It might be the case on other OSes/filesystems but it has never before
>> been observed.
>>
>> This is not new behavior.  This is apparently what SQLite has been
>> doing for 15 years, across quadrillions of transactions on many
>> billions of devices, and it has never before caused any issues, until
>> just recently when Mr. Meinlschmidt upgraded to a newer version of
>> QNX.
>
> So it would make sense to add a note that you should check whether your
> target VFS and target operating environment needs an fsync after a
> journal commit if you want to use this journal mode. Would it be
> possible to make SQLITE_EXTRA_DURABLE a pragma? Some GNU/Linux
> distributions package SQLite and therefore not every application can
> compile SQLite with different options.

The question isn't just whether "an fsync is needed" with journal mode - the 
question is *which* fsync is needed? The issue here is that file 
creation/deletion/rename ops require an fsync *on the containing directory*. 
This is actually quite an unusual requirement; on older Unix systems you 
couldn't even *open* a directory, let alone obtain write access to it or fsync 
it.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] POSIX and MMU (was Re: out of the void: xDlSym)

2016-01-17 Thread Howard Chu
Scott Robison wrote:
> On Sat, Jan 16, 2016 at 1:58 PM, James K. Lowden 
> wrote:
>
>> On Fri, 15 Jan 2016 21:41:41 -0500
>> Richard Damon  wrote:
>>
>>> there are machines where it doesn't work (you just need a larger
>>> program space than data space).
>>
>> Huh.  An example of which is the "medium model" of the Intel 8086:
>> 20-bit code pointers and 16-bit data pointers.  A machine for which C
>> compilers existed, and on which no Posix system will ever run (because
>> it lacks an MMU).  Thanks for that.
>>
>
> Sorry for the OT diversion, but I'm just curious as I don't have historical
> POSIX standards for reference. Does POSIX really *require* an MMU?
> Certainly Unix like systems were written for 8086 class computers, but
> given that POSIX was first standardized in 1988 I'm just curious as to
> whether or not an MMU is a requirement or just really nice to have.

ST-Minix ran on MC68000 - no MMU. POSIX API only defines a programming model, 
it doesn't mandate how it gets implemented under the covers.

An MMU *can* make some things easier, but we had fork/exec etc. even without it.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] SQLITE vs. OSX mmap == inevitable catalog corruption?

2015-11-18 Thread Howard Chu
Dan Kennedy wrote:
> On 11/18/2015 03:37 PM, T?r?k Edwin wrote:
>> On 11/17/2015 12:11 AM, Deon Brewis wrote:
>>> We found that using SQLITE memory mapped I/O (mmap_size > 0) in OSX isn't
>>> usable in any way shape or form. It will inevitably lead to catalog
>>> corruption if you hard-reboot OSX, even without the database or application
>>> open.
>>>
>> I tried to turn on mmap I/O in SQLite 3.9.1 in our application on CentOS 7
>> and Debian Jessie, but after 24-48h of continous writes to the DB I always
>> get a corruption:
>> SQLite result 0xb: database corruption at line 76915 of [767c1727fe]
>> SQLite result 0xb: statement aborts at 10: [DELETE FROM revision_blocks
>> WHERE revision_id=:revision_id]
>> Query "DELETE FROM revision_blocks WHERE revision_id=:revision_id" failed:
>> (code 0xb: database disk image is malformed) database disk image is malformed
>>
>> After this 'pragma integrity check' says:
>> row 90814 missing from index sqlite_autoindex_revision_ops_1
>> wrong # of entries in index sqlite_autoindex_revision_ops_1
>> row 1046646 missing from index idx_revmap
>> row 1046646 missing from index sqlite_autoindex_revision_blocks_1
>> wrong # of entries in index idx_revmap
>> wrong # of entries in index sqlite_autoindex_revision_blocks_1
>>
>> There are not reboots involved, just multiple processes accessing a WAL DB.
>> Without mmap I/O I've never seen corrupted DBs in our application.
>
> As of yesterday, SQLite uses a read-only mapping in mmap mode. The db file is
> written using plain old write(), just as in non-mmap mode:

That's the safest way to use mmap, but keep in mind that this requires a 
unified buffer cache and systems like OpenBSD still don't have that, so this 
approach will cause corruptions on systems like that.
>
>http://sqlite.org/src/info/67c5d3c646c8198c
>
> It would be interesting to know if this clears the problem in your 
> environment.


-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] MMAP performance with databases over 2GB

2015-08-08 Thread Howard Chu
Roger Binns wrote:
> It would also be really nice if there wasn't a 2GB mmap limit on 64
> bit machines.  The database would fit in my RAM around 4 times, and in
> the address space more times than there are grains of sand!  Yea I
> know this isn't very Lite ...

SQLightning has no such limit... https://github.com/LMDB/sqlightning there's 
nothing "heavy" about making maximum use of mmap.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Mozilla wiki 'avoid SQLite'

2015-06-18 Thread Howard Chu
Simon Slavin wrote:
> The thing I always found interesting about SQL was that it picks three
> English words, INSERT, DELETE, UPDATE, and says that that's all you need to
> do.  And it's right !  Is there something special about the 'three-ness' of
> database operations ?  Or are you meant to think of it as two writing
> operations (INSERT, DELETE) and a convenience operation which combines them
> (UPDATE) ?  If there was another word, what would it be ?  REPLACE ?
> DUPLICATE ?

LDAP/X.500 has Add/Delete/Modify as well. (It also has Rename, which doesn't 
really make sense for a tabular data store, but is useful for a hierarchical 
data structure.)

> Also, why is there only one English word needed for reading operations ?
> What would a database language look like if it has more than one word ?
> Would there be a difference between FIND and SCAN ?

X.500 has three separate operations Read, List, and Search. (Read = retrieve 
contents of a single entry, List = list the names of the immediate children of 
an entry, Search = search for any entries matching a filter.) LDAP combined 
all of these functions into a single Search operation. It's often considered 
to be a mistake on the part of the LDAP designers.

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] Enabling mmap on OpenBSD: Anything really in the way, and how do?

2015-05-22 Thread Howard Chu
Mikael wrote:
> You who know SQLite all the way through, would either 1) or 2) or something
> else still make a problem on OpenBSD?
>
>
> If not, how do I use SQLite with all-file memory mapping on OpenBSD: I just

Use SQLightning. https://github.com/LMDB/sqlightning

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] How do non-SQLite DBMS communicate?

2015-04-23 Thread Howard Chu
Drago, William @ CSG - NARDA-MITEQ wrote:
> All,
>
> I'm not sure how to ask this question.
>
> When using SQLite the application program accesses the SQLite DBMS
> via
its .dll file. When using something like Oracle Express (a local DBMS)
the application program is communicating with Oracle Express via some
sort of network protocol even though there's no network involved. What
is that called?
>
> My reason for asking is, I'm writing a short white paper describing
> my
use of SQLite (to encourage other engineers where I work to use it too)
and I'm trying to explain the differences between SQLite and other local
database systems. So, SQLite databases are accessed via .dll where as
other local databases run a server that is accessed via ???

The generic term is interprocess communication, IPC. You should be able 
to search on those keywords to find out anything else you want to know.

SQLite is in the class known as "embedded databases" because its code is 
embedded in the application code, instead of running in a separate 
process. There are many other DBs of this class but most of them are 
lower level (key value stores) instead of offering the SQL data model. 
Berkeley DB is another well-known example of this type, as is LMDB.

Many embedded DBs not only run within a single process, but also can 
only support read-write access to a database's files from a single 
process (i.e., no support for multi-process concurrency). Berkeley DB 
and LMDB are somewhat rare since they support transactional 
multi-process concurrency.
>
> Can anyone help me fill in those question marks?
> Thanks,
>
> --
> Bill Drago
> Senior Engineer
> L3 Narda-MITEQ<http://www.nardamicrowave.com/>
> 435 Moreland Road
> Hauppauge, NY 11788
> 631-272-5947 / William.Drago at L-3COM.com<mailto:William.Drago at L-3COM.com>

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


[sqlite] sqlite3 (or sqlite4) performance on NFS

2015-04-12 Thread Howard Chu
Stephen Chrzanowski wrote:
>>From what I understand;
> - Read-Only data
> - Data doesn't change frequently
> - Central repository for data
> - Network latency causing issues
>
> My two cents on this is to keep a database revision ID kicking around and
> do a SQLite backup of the remote data to a local storage medium.  At
> application launch, check the local version of the database, then check the
> NFS version, and if there is a mismatch or a local copy doesn't exist, have
> the application ask (Or force if no local copy exists) to copy the data
> from remote to local, then read data from the local source.  This will be a
> bit of a PITA if you're talking gigabytes of storage on a saturated 100mbit
> network or if drive space is limited.  (I love my quiet GigE network)

Congratulations, you've just reinvented AFS, but without any of its 
convenience.

Seriously - there are many other distributed filesystems out there, all 
designed because users keep running into the same deficiencies of NFS, 
over and over again. Please, can we stop reinventing this wheel now?

-- 
   -- Howard Chu
   CTO, Symas Corp.   http://www.symas.com
   Director, Highland Sun http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-13 Thread Howard Chu

Scott Robison wrote:

On Sat, Sep 13, 2014 at 2:24 PM, Howard Chu <h...@symas.com> wrote:


Scott Robison wrote:


A couple of academic thoughts.




1. If one wanted to embed the journal within the database, would it be
adequate to reserve a specific page as the "root" page of the journal,
then
allocate the remaining pages as normal (either to the journal or the main
database)? This does leave the big hole problem so it may still not be
ideal, but it would give you a known location to find the beginning of the
journal without doubling the database size or requiring an extra file.



Starting with a known location is definitely a step in the right direction.

  2. Building on 1, could sparse files be used to accomplish this? Seek to

"really big constant offset" and do all journaling operations at that
point, allowing the operating system to manage actual disk allocation? If



We're talking about implementing a filesystem. "the operating system" is
your own code, in this case, you don't get to foist the work off onto
anyone else.



No, Simon's original question was to the effect of why doesn't SQLite just
use the already open database file for journaling purposes as well.


OK, maybe I missed that, but I thought that question itself arose from how to 
use SQLite to implement a filesystem, on a raw partition. And the answer to 
that question (operating inside a raw partition) could apply equally well to 
operating inside a single file.


If you preassign a fixed maximum size to the file, you could e.g. reserve the 
tail of the file for the journal, growing backward toward the head of the 
file, while the main data grows the usual direction from the head of the file 
toward the tail. This would basically be your (2) above. On HDDs this approach 
would have horrible seek latencies but it could work OK on SSDs.


The other point though - like the existing journaling filesystems, you should 
not limit yourself to using a single file/storage device. Allow the option of 
storing the journal somewhere else - the performance potential is worth it.



My
point 1 was in response to the need to know where the journal file is, so
just pick a dedicated page in the file as the root page of the journal,
allowing the two files to be co-mingled. It doesn't address every possible
bad reason for co-mingling the data, but it would at least answer the
question "how do you find the journal".

My second point was about existing SQLite database files that live in a
file system managed by some operating system. SQLite already foists that
work off on to someone else, this would be no different. It still may be a
bad idea, but that's not the reason why it wouldn't work. :)




--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-13 Thread Howard Chu

Scott Robison wrote:

On Fri, Sep 12, 2014 at 6:21 PM, Richard Hipp <d...@sqlite.org> wrote:


On Fri, Sep 12, 2014 at 8:07 PM, Simon Slavin <slav...@bigfraud.org>
wrote:


   one thing that annoys me about SQLite is that it needs to make a
journal file which isn't part of the database file.  Why ?  Why can't it
just write the journal to the database file it already has open ?  This
would reduce the problems where the OS prevents an application from
creating a new file because of permissions or sandboxing.


Where in the database does the journal information get stored?  At the
end?  What happens then if the transaction is an INSERT and the size of the
content has to grow?  Does that leave a big hole in the middle of the file
when the journal is removed?  During recovery after a crash, where does the
recovery process go to look for the journal information?   If the journal
is at some arbitrary point in the file, where does it look.  Note that we
cannot write the journal location in the file header because the header
cannot be (safely) changed without first journaling it but we cannot
journal the header without first writing the journal location into the
header.


Journaling filesystems already have this problem. By default they just use a 
section of the partition, reserved at FS creation time. Which leads to the 
problem already described in the video that started this thread - perform a 
large enough write operation and you can exceed the fixed size of the journal, 
which requires the journal data to be split and the operation journal update 
is no longer atomic.


Of course, most journaling filesystems also allow you to optionally specify an 
external journal - i.e., instead of embedding the journal on the filesystem's 
partition, you can use some other block device instead. Naturally you can also 
choose a larger size when doing this. Putting the journal on a separate device 
can bring some major performance benefits, as well as accomodating larger 
transactions.


In the tests I did two years ago, JFS with an external journal was blazingly 
fast. http://symas.com/mdb/microbench/july/#sec11



One idea that might work is to interleave the journal information with the
content.  So for each page in the database, there is a corresponding page
of journal content.  The downside there is that you double the size of the
database file without increasing its storage capacity.


This is why LMDB is much better suited to this task - it uses no journal at 
all, nor does it require compaction/defragmentation/VACUUMing.



A couple of academic thoughts.

1. If one wanted to embed the journal within the database, would it be
adequate to reserve a specific page as the "root" page of the journal, then
allocate the remaining pages as normal (either to the journal or the main
database)? This does leave the big hole problem so it may still not be
ideal, but it would give you a known location to find the beginning of the
journal without doubling the database size or requiring an extra file.


Starting with a known location is definitely a step in the right direction.


2. Building on 1, could sparse files be used to accomplish this? Seek to
"really big constant offset" and do all journaling operations at that
point, allowing the operating system to manage actual disk allocation? If


We're talking about implementing a filesystem. "the operating system" is your 
own code, in this case, you don't get to foist the work off onto anyone else.



this were possible, deleting the journal would be a "fast" truncate
operation. A custom VFS might be able to provide a proof of concept... hmm.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-13 Thread Howard Chu

Scott Robison wrote:

On Sat, Sep 13, 2014 at 1:43 PM, Richard Hipp <d...@sqlite.org> wrote:


Decades ago, files came in all kinds of varieties and permutations.
Details varied from one OS to the next.  But it was common to have a
distinction between text files and binary files (with different APIs for
accessing each.)  It was also common to have a difference between "ordinary
files" (that had to be read sequentially from beginning to end) and
"random-access files", which supported operations similar to lseek().
(Once again, completely incompatible APIs existed for accessing each file
type.)  With binary files, one often had to specify a "block size" which
was the increment in which the file was read and written.  The block size
was typically a property of the file and could not be changed after the
file had been created.  There were often restrictions on the permitted
values for block sizes.  And you couldn't ask the operating system to tell
you whether a file was text or binary or sequential or random-access or
what its block-size was;  you just had to know.  And bewildering problems
resulted if you got it wrong.



And this was not true just of big expensive business class machines. At the
very least (and I suspect more) Commodore 8-bit DOS (which was embedded
within a smart drive with its very own dedicated CPU & RAM) supported
(essentially) sequential byte stream files (no random seeking for these!)
and random access record oriented files (where the record size was set at
file creation time). Man were those a pain in the backside to use.

Now imagine writing an ftp client (or server) for one of these. I wrote both 
for an IBM mainframe, way back when. You had to trust the user to select the 
appropriate record mode for the ftp transfer to succeed without just getting 
garbage on the other end.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-13 Thread Howard Chu

Scott Robison wrote:

On Sat, Sep 13, 2014 at 9:39 AM, Richard Hipp <d...@sqlite.org> wrote:


I say that a filesystem is an eventually-consistent key/value database.
The keys are the filenames and the values are all big BLOBs, specifically
the file content.  Filesystems also have a hierarchical keyspace, which is
an extension from the usual key/value concept, but it is still key/value.



Dan Bernstein, author of qmail & djbdns (among others), used the file
system as a configuration database for those applications. Rather than
having a text configuration file, he used the directory and file names as
keys and their contents as values. I seem to recall him later regretting
this choice (in part, anyway) but I always thought there was a certain
elegance to that solution. It's not perfect, but what is?

OpenLDAP's config database currently uses the filesystem this way as well. 
It's no paragon of efficiency, but it doesn't need to be particularly 
performant in the first place, and it requires zero setup.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-13 Thread Howard Chu

Simon Slavin wrote:

That's not what I meant.  That's the file as a database.  What I want is
the entire volume as a database.


That's exactly what I pointed you to before. The thesis is pretty enlightening 
too.


http://www.fsl.cs.sunysb.edu/docs/kbdbfs-msthesis/

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Howard Chu

Nico Williams wrote:

On ZFS datasets with sync disabled fsync() functions as osync(), as a write
barrier without durability and without the associated penalty.  The obvious
problem is that really do need osync() and fsync(); just one or the other
is not a reasonable compromise.


Write barriers have been debated in Linux ad nauseum. I agree that osync() 
would be great to have, but it's still a die roll - the OS can flush blocks to 
the storage device in order, but without waiting for the storage device's 
buffer to empty, can't make any further ordering promises from there. You need 
device-level ordering support too. - which prompted my suggestion here


http://www.spinics.net/lists/linux-fsdevel/msg70047.html

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Howard Chu

Richard Hipp wrote:

On Thu, Sep 11, 2014 at 5:49 PM, Kees Nuyt <k.n...@zonnet.nl> wrote:



Hi all,

Today I bumped into a presentation about ordering and atomicity
of filesystems that might interest you.

https://www.youtube.com/watch?v=YvchhB1-Aws

The Application/Storage Interface: After All These Years, We're
Still Doing It Wrong
Remzi Arpaci-Dusseau, University of Wisconsin—Madison

Talk at usenix 2014 Published on Sep 4, 2014 by USENIX
Association Videos

Somewhat related to the article drh recently wrote about using
sqlite as an application data store.



Thanks for the link, Kees!

I just finished watching the video.  Remzi Arpaci-Dusseau talks about
research (done by he and his graduate students) into how well application
data survives system crashes.  Remzi observes that filesystem developers
have worked very hard for many years ensuring that filesystem metadata is
preserved in a crash, but they seem less concerned about protecting
application data.

Remzi developed tools (BOB and ALICE) to study various workloads to see how
vulnerable they were to system crashes.  He looked at various
"applications".  His definition of "application" includes standalone
programs like Git and Hg, and database servers like PostgreSQL, and
libraries like SQLite and LevelDB.  At one point he shows a chart that
counts the number of unwarranted assumptions that the applications make
about filesystem behavior.  Such unwarranted assumptions can lead to
corruption following a system crash (or power loss).

SQLite and PostgreSQL came out on top, with just one vulnerability each.
Hg and Git each had many vulnerabilities.  In fairness, Remzi points out
that these vulnerabilities assume a "worst case" filesystem and that many
of them might not exist on a modern filesystem like EXT4.


Actually LMDB comes out on top with zero vulnerabilities. I spoke to the UWisc 
folks to find out what was the one Atomicity vulnerability they reported in 
LMDB and we confirmed that it was not in fact a valid vulnerability.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] presentation about ordering and atomicity of filesystems

2014-09-12 Thread Howard Chu

Simon Slavin wrote:


On 12 Sep 2014, at 3:18am, Scott Robison <sc...@casaderobison.com> wrote:


That was an interesting 90 minutes.


Indeed.  Thanks to Kees for posting it.  Though I was surprised he didn't 
mention the term 'ACID' explicitly.

I'm still of the opinion that we need an actual transactional file system
with equivalents to BEGIN, END and ROLLBACK.  It will have to support many
transactions at the same time, of course, since each process will be doing
its own thing.


There have been such projects. They don't seem to have continued though.
http://mile-outta-boyce.blogspot.ie/2007/05/kernel-berkeley-db-file-system-kbdbfs.html

https://github.com/dmeister/kbdb/tree/master/kbdbfs-1.0

I've got a project underway to retrace their steps, using LMDB instead of 
BerkeleyDB.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite on flash filesystem

2014-08-16 Thread Howard Chu

Simon Slavin wrote:


On 16 Aug 2014, at 9:32pm, Baruch Burstein <bmburst...@gmail.com> wrote:


On Fri, Aug 15, 2014 at 10:45 AM, Levente <leventel...@gmail.com> wrote:


I am thinking about putting an SQLite database on a flash drive (SD card).
I would like to know what kind of file system is the optimal. I want to
query a lot, and insert or update a few times per minute.


Not directly related to your question, but I am curious: Since sqlite's
atomicity  guarantee assumes that write to a disk sector are linear (e.g.
either beginning to end or end to beginning, but never the middle before
either end),


Mmm ... no.  SQLite's atomicity assumes in-order execution of writes.  In other 
words that the changes to the storage device will be done in the order they are 
requested.  Anything else breaks ACID.  However, SQLite does a lot of things to 
try and avoid a corrupted database even if ACID is broken, in an effort to let 
you rescue the maximum amount of data even if something went wrong with 
hardware.  Removing all the 'paranoid about hardware' and 'rescue' code from 
SQLite would make it significantly smaller.


I was wondering if this assumption was ever verified for flash
drives. I have no information to contrary, but while that assumption makes
a lot of sense for a spinning disk, I see no reason to assume this for
flash drives and other SSDs. I would think that they may parallelize writes
to different parts of the "sector" for efficiency.


Current Flash chips are optimized for sequential writes - while they allow 
writes to a sector in random order, doing so requires sending a new row 
address for each random access. Meanwhile, if you simply write them 
sequentially the write address auto-increments. Since operating systems always 
perform whole-sector writes (indeed, they perform whole-page writes) to 
current storage devices, and they use a DMA controller which is programmed 
with a start address and a block length, you can safely assume that they are 
writing each sector sequentially.


There is no efficiency gain in writing multiple sections of a single sector in 
parallel. (In fact it is an efficiency loss.) Multi-channel SSDs parallelize 
across multiple Flash chips, sending whole sectors to each separate channel/chip.



A good question.  By which I mean it's one people should worry about, not one 
which we're not sure about.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite & multi process usage

2014-07-18 Thread Howard Chu

Richard Hipp wrote:

On Thu, Jul 17, 2014 at 5:10 AM, Micka <mickamus...@gmail.com> wrote:


Well,

All of my process are using Mutex to protect sqlite from concurrent access.



What kind of mutex are you using that works across processs?  All the
mutexes I know about only work for a single process.

pthreads supports process-shared mutexes. 
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutexattr_getpshared.html


BerkeleyDB and LMDB use these to support multi-process access to a DB, which 
is why their forks of SQLite3 also support multi-process access.


It seems that FreeBSD and other related BSDs still don't implement these 
though, so you have to use semaphores on them instead. (Either SysV style or 
POSIX named semaphores.)


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite scalability

2013-11-22 Thread Howard Chu

Valentin Davydov wrote:

On Thu, Nov 21, 2013 at 11:43:32AM +0200, Baruch Burstein wrote:

Hi all,

I know SQLite is supposed to support DB sizes in the TB (I think the
default configuration can reach 1TB). I am curious if anyone actually uses
SQlite at anywhere near this.


Yes.


Does anyone use it regularly for DBs 500GB+
in size, or with tables containing 10 billion rows+?


I've got an installation with SQLite DB of several terabytes in size.
It contains about 20 billions thoroughly indexed records, and grows every
day (more precisely, every night) by a few tens of millions of new records.


How much concurrency does your use require?


I've spent some efforts to eliminate concurrency in application. That is,
updates and selects occur at very different times of the day.


How long do selects take (assuming indexes are set correctly?)


It depends of the size of the select. Single row is selected instantaneously.
Check of the uniqueness takes about 1-2 minutes per 1 million of records, most
of time being spent parsing SQL commands. Whereas aggregate functions over
substantional fraction of the entire database, of course, take too long to
be executed in real time.


Use SQLightning instead - concurrency issues are irrelevant then, since 
writers don't block readers. And it will search multiple gigabytes per second, 
as opposed to your millions-per-minute figure above.



Are there problems of locking ("normal" SQLite doesn't usually
suffer from locking since transactions are very quick, but if transactions
can be in the order of 100's of ms, I think locking can easily happen if
the DB is accessed concurrently a few times a second, though I am not sure
if this may only apply to writes).


Yes. Single transaction (insertion of that tens of millions of new recors)
takes hours in the worst case.


There's no good reason for insertion of 10 million records to take hours.


I understand that the answer to most of these questions can be very
hardware (and software) dependent,


Indeed not so. The only hardware capable of storing such amount of data
is an array of magnetic disks, and their latency time (about 10-20 ms for
random access) is much more than any reasonable software overhead. Even
cache (internal SQLite page cache and/or operation system file cache)
occupies the same memory and therefore has almost the same effect. The
only software which determines the performance is SQLite itself, in my
case, perhaps, trees rebalancing algorithm.


1TB SSDs are only ~$500; there's no reason to limit yourself to the slowness 
of magnetic disks these days.


http://www.amazon.com/Samsung-Electronics-EVO-Series-2-5-Inch-MZ-7TE1T0BW/dp/B00E3W16OU

SQLightning uses only the OS filesystem cache, so you get maximal use of the 
available system RAM instead of wasting half of it with redundant copies in 
application-level caches.



but I am just trying to get a feel for
SQLite's applicability for a project I am working on that may reach limits
like these.


The only definive SQLite limits are documentet in the relevant manual page.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Row Level Locking as in InnoDB

2013-11-10 Thread Howard Chu

Raheel Gupta wrote:

Look at the performance difference between BDB and SQLite3 here

http://symas.com/mdb/microbench/#sec1

I did, and I really cant comment on that. The results are of 2012 and its
almost 2013. You should update the page with a newer result set.


Or you could just download the code and run it yourself.


I can't think of any other single feature that would remove the "lite"


I am not a database expert. If you say so, it must be the case.
But if there is a way to implement concurrent writers in SQLite maintaining
the "lite" in SQLite, I would be the most happiest person here :)


You seem to enjoy asking a lot of others, without regard to cost.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Row Level Locking as in InnoDB

2013-11-08 Thread Howard Chu

Raheel Gupta wrote:

Hi,


You use BDB SQL or BDB KV ?

I built BDB 6.0.20 with --enable-sql_compat
It made a libsqlite3.so in the .libs folder which I linked with my QT C++
Application.


You must try it with SQLightning too, https://gitorious.org/mdb/

sqlightning

I tried to build it, but it says lmdb.h missing. Will check it as well.


You must copy (or symlink) the lmdb source code into the SQLightning source 
directory.



On Fri, Nov 8, 2013 at 4:12 PM, Aris Setyawan <aris.s...@gmail.com> wrote:


This is the BDB SQL doc I found.


http://docs.oracle.com/cd/E17076_02/html/bdb-sql/dbfeatures.html#bulkloading

If you insert record in bulk, you can use PRAGMA TXN_BULK for optimization.



I tested TXN_BULK, still pretty slow. Nearly 2.4 times.
Also the space wastage is pretty high. I set a page size of 64K in this and
the space wastage is very high as the DBNAME.db-journal folder has too many
files.
DBNAME.db is 448MB
DBNAME.db-journal is 161MB.

Which is after inserting 10 rows of 4K = 400MB.

In SQLITE 3.7.17 I get only 6.69% wastage of space.
I wish BDB worked faster to see my alternatives.


BDB is inherently a very slow codebase. Look at the performance difference 
between BDB and SQLite3 here http://symas.com/mdb/microbench/#sec1 - on random 
writes BDB KV is only 8% faster than SQLite. If you add the SQL parsing 
overhead on top of that, that 8% performance margin is erased. If you use 
batched random writes, SQLite is already 2% faster than BDB KV, so BDB SQL can 
only be slower than SQLite.


Whatever other differences there may be, there is no performance benefit to 
using BDB as a backend for SQLite. In most cases there is a performance loss.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Row Level Locking as in InnoDB

2013-11-04 Thread Howard Chu

Aris Setyawan wrote:

Hi Howard,


I just looked, sophia is nothing special. See these microbench results.
http://pastebin.com/cFK1JsCN

LMDB's codebase is still smaller and faster. Nothing else touches LMDB's
read
speed.


This is micro benchmark from sophia author compare with lmdb.
http://sphia.org/benchmarks.html


Quite off-topic for this list, but those results are garbage.
https://github.com/pmwkaa/sophia_benchmark/issues/2#issuecomment-27740082

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Row Level Locking as in InnoDB

2013-11-03 Thread Howard Chu

Aris Setyawan wrote:

SQLightning replaces the SQLite backend with Symas' LMDB, which also uses
MVCC
and thus supports high concurrency. It is also many times faster than
BerkeleyDB and vanilla SQLite.


Your MVCC is different compared to InnoDB or BDB locking. Every one
should carefully read each DB's doc, then test it before decide to use
it.


Yes, it's different. In LMDB writers never block readers and readers never 
block writers. The original poster was complaining about SELECT taking a long 
time and preventing other threads from making progress. That problem doesn't 
exist in LMDB. BDB locking *might* be able to avoid this in many cases, if 
there are no hotspots, but is prone to deadlocks which require the calling 
application to retry failed requests.



LMDB is storage engine optimized for read. Why you don't optimize it's
write using cache oblivious data structure, for example LSM tree or
create new, like in sophia (sphia.org) key value DB?


I just looked, sophia is nothing special. See these microbench results.
http://pastebin.com/cFK1JsCN

LMDB's codebase is still smaller and faster. Nothing else touches LMDB's read 
speed.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Row Level Locking as in InnoDB

2013-11-03 Thread Howard Chu

Raheel Gupta wrote:

@Howard I had tested your code earlier but it didnt seem to be stable and
getting it to run was a task. Also I learnt that it is a "in-memory"
database.


False. LMDB is a memory-mapped disk database, that is not the same as an 
in-memory database.


@Aris are you saying BDB is better and faster than SQLite ?


Oracle claims that. From what I can see, Oracle is wrong.



On Sun, Nov 3, 2013 at 8:28 PM, Howard Chu <h...@symas.com> wrote:


Aris Setyawan wrote:


SQLightning replaces the SQLite backend with Symas' LMDB, which also uses

MVCC
and thus supports high concurrency. It is also many times faster than
BerkeleyDB and vanilla SQLite.



Your MVCC is different compared to InnoDB or BDB locking. Every one
should carefully read each DB's doc, then test it before decide to use
it.

LMDB is storage engine optimized for read. Why you don't optimize it's
write using cache oblivious data structure, for example LSM tree or
create new, like in sophia (sphia.org) key value DB?



Because read optimization is what was important to us when I created LMDB.
That's like asking why a hammer isn't a screwdriver.


  On 11/3/13, Howard Chu <h...@symas.com> wrote:



Aris Setyawan wrote:


SQLite do not use row level locking, but db level locking, so it was
the right behavior the second thread was blocked.

For innodb like in SQLite, Oracle have SQLite compatible API, but use
BDB backend.
Since BDB use MVCC (row/page level locking), your threads only blocked
if they will write in the same row/page.

www.oracle.com/technetwork/database/berkeleydb/bdb-
sqlite-comparison-wp-176431.pdf

* You must aware that BDB now have AGPL license.



SQLightning replaces the SQLite backend with Symas' LMDB, which also uses
MVCC
and thus supports high concurrency. It is also many times faster than
BerkeleyDB and vanilla SQLite.

https://gitorious.org/mdb/sqlightning/



On 11/3/13, Raheel Gupta <raheel...@gmail.com> wrote:


Hi,

I have been using SQLite for one project of mine and I will be storing
TBs
of Data.
Now there will be a lot of selections in this database and I have come
across one problem with SQLite.
In journal_mode=delete the selection is database locked.
When one thread does a "TRANSACTION" on the database and soon after
another
thread does "SELECT" on the database (using the same connection) or
vice
versa, the second thread has to wait till the first thread finishes.

In order to avoid this, I had to use journal_mode=wal so that two
threads
dont have to wait when they both are doing SELECTs which might be
taking
3-5 seconds to process.

I was wondering if Row Level Locking would be introduced in
journal_mode=delete as its there in InnoDB for MySQL. Atleast for
selects
or inserts Row Level rocking should be possible as neither modify the
existing rows.

journal_mode=wal is a little slower and has its own limitations over
NFS.

OR if there is a mode equivalent to innodb in SQLITE please do let me
know.
I need to do a lot of selects and inserts in my application and hence a
row
level locking is suitable vs table or database level locking.



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Row Level Locking as in InnoDB

2013-11-03 Thread Howard Chu

Aris Setyawan wrote:

SQLightning replaces the SQLite backend with Symas' LMDB, which also uses
MVCC
and thus supports high concurrency. It is also many times faster than
BerkeleyDB and vanilla SQLite.


Your MVCC is different compared to InnoDB or BDB locking. Every one
should carefully read each DB's doc, then test it before decide to use
it.

LMDB is storage engine optimized for read. Why you don't optimize it's
write using cache oblivious data structure, for example LSM tree or
create new, like in sophia (sphia.org) key value DB?


Because read optimization is what was important to us when I created LMDB. 
That's like asking why a hammer isn't a screwdriver.



On 11/3/13, Howard Chu <h...@symas.com> wrote:

Aris Setyawan wrote:

SQLite do not use row level locking, but db level locking, so it was
the right behavior the second thread was blocked.

For innodb like in SQLite, Oracle have SQLite compatible API, but use
BDB backend.
Since BDB use MVCC (row/page level locking), your threads only blocked
if they will write in the same row/page.

www.oracle.com/technetwork/database/berkeleydb/bdb-sqlite-comparison-wp-176431.pdf

* You must aware that BDB now have AGPL license.


SQLightning replaces the SQLite backend with Symas' LMDB, which also uses
MVCC
and thus supports high concurrency. It is also many times faster than
BerkeleyDB and vanilla SQLite.

https://gitorious.org/mdb/sqlightning/



On 11/3/13, Raheel Gupta <raheel...@gmail.com> wrote:

Hi,

I have been using SQLite for one project of mine and I will be storing
TBs
of Data.
Now there will be a lot of selections in this database and I have come
across one problem with SQLite.
In journal_mode=delete the selection is database locked.
When one thread does a "TRANSACTION" on the database and soon after
another
thread does "SELECT" on the database (using the same connection) or vice
versa, the second thread has to wait till the first thread finishes.

In order to avoid this, I had to use journal_mode=wal so that two
threads
dont have to wait when they both are doing SELECTs which might be taking
3-5 seconds to process.

I was wondering if Row Level Locking would be introduced in
journal_mode=delete as its there in InnoDB for MySQL. Atleast for
selects
or inserts Row Level rocking should be possible as neither modify the
existing rows.

journal_mode=wal is a little slower and has its own limitations over
NFS.

OR if there is a mode equivalent to innodb in SQLITE please do let me
know.
I need to do a lot of selects and inserts in my application and hence a
row
level locking is suitable vs table or database level locking.



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Row Level Locking as in InnoDB

2013-11-03 Thread Howard Chu

Aris Setyawan wrote:

SQLite do not use row level locking, but db level locking, so it was
the right behavior the second thread was blocked.

For innodb like in SQLite, Oracle have SQLite compatible API, but use
BDB backend.
Since BDB use MVCC (row/page level locking), your threads only blocked
if they will write in the same row/page.

www.oracle.com/technetwork/database/berkeleydb/bdb-sqlite-comparison-wp-176431.pdf

* You must aware that BDB now have AGPL license.


SQLightning replaces the SQLite backend with Symas' LMDB, which also uses MVCC 
and thus supports high concurrency. It is also many times faster than 
BerkeleyDB and vanilla SQLite.


https://gitorious.org/mdb/sqlightning/



On 11/3/13, Raheel Gupta <raheel...@gmail.com> wrote:

Hi,

I have been using SQLite for one project of mine and I will be storing TBs
of Data.
Now there will be a lot of selections in this database and I have come
across one problem with SQLite.
In journal_mode=delete the selection is database locked.
When one thread does a "TRANSACTION" on the database and soon after another
thread does "SELECT" on the database (using the same connection) or vice
versa, the second thread has to wait till the first thread finishes.

In order to avoid this, I had to use journal_mode=wal so that two threads
dont have to wait when they both are doing SELECTs which might be taking
3-5 seconds to process.

I was wondering if Row Level Locking would be introduced in
journal_mode=delete as its there in InnoDB for MySQL. Atleast for selects
or inserts Row Level rocking should be possible as neither modify the
existing rows.

journal_mode=wal is a little slower and has its own limitations over NFS.

OR if there is a mode equivalent to innodb in SQLITE please do let me know.
I need to do a lot of selects and inserts in my application and hence a row
level locking is suitable vs table or database level locking.



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] logical to use sqlite to replace Berkeley DB ?

2013-09-13 Thread Howard Chu

Patrick wrote:

Hi Everyone

After Oracle acquired Berkeley DB they changed the license. If people
don't pay a significant licensing fee, it can now only be used for GPL code.

I don't know Berkeley DB very well but I do know a moderate amount of
Sqlite.

I want to tinker with a compiler that uses DB, I was thinking about
ripping it out and replacing it with Sqlite. Does this make sense?

I know they are both zero configuration embedded DBs but DB is a
key-value based one and I am assuming lighter, is this true? Any idea of
how close they would be in terms of memory use and execution speed?


BDB is much faster than SQLite, yes. In fact Oracle supplies a port of SQLite 
that uses BDB as the underlying Btree engine instead of SQLite's native code, 
and there's a significant performance gain.


If you have an app that is comfortably using the key-value API of BDB it would 
introduce major inefficiencies to convert it to using SQL. So no, this doesn't 
seem like a logical action to take.


If you're using BDB and want to switch off it because of the license issue, 
try OpenLDAP LMDB instead. No license hassles, and also several times smaller 
and faster than BDB.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Minimizing Internal Function Calls in SQLite

2013-07-30 Thread Howard Chu

Richard Hipp wrote:

On Tue, Jul 30, 2013 at 8:00 AM, Andrew Beal <ab...@whoi.edu> wrote:


Hi All,

Is there a way to minimize function calls internal to SQLite? At some
points of execution, I have measured as much as 35 deep nested function
calls. Since we are running on an embedded system, we don't support such a
deep call table.



Wow.  What embedded system is it that doesn't support a call stack that is
*only* 35 levels deep?


The other obvious question - what kind of embedded system with such limited 
stack resources actually needs SQL for its data storage purposes? Is your 
schema really so complex? Are your queries? Do you need to do JOINs across 
multiple tables?


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Large Database Windows vs Linux

2013-06-28 Thread Howard Chu

David de Regt wrote:

It's the kind of useful help like this that makes me love the FOSS movement.


All based in facts, of course. http://blog.zorinaq.com/?e=74


-David

-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Walter Hurry
Sent: Friday, June 28, 2013 5:09 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Large Database Windows vs Linux

On Fri, 28 Jun 2013 15:22:57 -0600, Keith Medcalf wrote:


That would explain why the best thing to be done with System Destroyer
(System Restore) is the same as the best way to handle the Hardware
Destroyer (Power Management) in Windows.  Disable it completely.


The best thing to do with Windows is format the drive and install Unix or 
FreeBSD or Linux.



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite 3.7.17 preview - 2x faster?

2013-04-04 Thread Howard Chu

Dan Kennedy wrote:

On 04/04/2013 08:44 PM, Howard Chu wrote:

Richard Hipp wrote:

The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and
solaris.  We have found that it does not work on OpenBSD, for reasons we
have not yet been able to uncove; but as a precaution, memory mapped
I/O is
disabled by default on all of the *BSDs until we understand the problem.


As I understand it, OpenBSD lacks a unified buffer cache. They reported
problems with LMDB in its default mode, too.


But it works in some non-default mode? When both reads and writes are
done via memory mapping? Or some other trick?


Right. It works if you use a writable mmap and do all reads and writes thru 
the map. But any process that comes along and accesses the file using read 
will see invalid/stale information, and start double-caching the file pages.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite 3.7.17 preview - 2x faster?

2013-04-04 Thread Howard Chu

Richard Hipp wrote:

The memory-mapped I/O is only enabled for windows, linux, mac OS-X, and
solaris.  We have found that it does not work on OpenBSD, for reasons we
have not yet been able to uncove; but as a precaution, memory mapped I/O is
disabled by default on all of the *BSDs until we understand the problem.


As I understand it, OpenBSD lacks a unified buffer cache. They reported 
problems with LMDB in its default mode, too. But FreeBSD should be OK. I don't 
know about any of the other BSD variants.



The biggest performance gains occur on windows, mac, and solaris.  The new
code is also faster on linux, but not by as big a factor.  The speed
improvement is also heavily dependent upon workload.  Some operations can
be almost twice as faster.  For others, there is no measurable speed
improvement.

Your feedback on whether or not the new code is faster for you, and whether
or not it even works for you, is very important to us.  Thanks for giving
the new code a try.




--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite 3.7.17 preview - 2x faster?

2013-04-04 Thread Howard Chu

Ryan Johnson wrote:

3. It seems like this would increase the "attack surface" for stray
pointers in the host program. Granted, writes to stray pointers are not
sqlite's fault, but they're an unfortunately common problem... and mmap
makes user bugs more likely to directly corrupt the database on disk.
Perceived reliability might drop as a result (I'm not arguing that the
risk is worth giving up 2x, just pointing it out as a potential
unintended consequence).


This is why OpenLDAP LMDB uses a read-only mmap by default. User bugs get an 
immediate SEGV, and usually the bug becomes obvious and easy to fix.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] experimental (better?) usage based sqlite cache

2013-02-25 Thread Howard Chu

Simon Slavin wrote:


On 25 Feb 2013, at 11:33am, Howard Chu <h...@symas.com> wrote:


Gabriel Corneanu wrote:

Following a few other discussions, I had the feeling that sqlite should
benefit from a cache which discards cached pages in a least frequently
used order.


Just offhand, classical LRU is quite poor in terms of lock overhead.


Gabriel writes "least frequently used".  Howard writes "least recently used".  
You're not writing about the same thing.


Doh, you're right. Sorry for the noise, going back to get some caffeine now.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] experimental (better?) usage based sqlite cache

2013-02-25 Thread Howard Chu

Gabriel Corneanu wrote:

Following a few other discussions, I had the feeling that sqlite should
benefit from a cache which discards cached pages in a least frequently
used order.


Just offhand, classical LRU is quite poor in terms of lock overhead. The CLOCK 
refinement scales much better, because no reorganizing of LRU lists is needed 
during page references.


And of course, having gone thru all of these exercises of fancy 
application-level cache algorithms already, it's still obvious that the best 
approach is to leave it to the kernel.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent read performance

2013-01-14 Thread Howard Chu

Wayne Bradney wrote:

Howard,Thanks for the links - mdb/sqlightning looks interesting.Before I

dive in, however:1. As I understand it, sqlightning is a drop-in replacement
for SQLite.Interop.dll, and I can still use System.Data.SQLite as my
high-level interface?

Right.

2. Is there a compiled release available?

No.

3. How often is sqlightning updated against SQLite? We're at SQLite 3.7.15.2 /
System.Data.SQLite 1.0.84.0 right now

There is no set schedule, basically as interest and demand dictate.

4. Does sqlightning allow lock-free
reads, even against a SQLite memory-backed, shared cache database?

No. But you can get the same effect simply by putting the MDB database onto a 
RAMdisk.



You are putting programming effort into making your code fast, and this is 
costing you (or your employer) programmer time.


For any reasonably useful piece of software, every moment of programmer time
invested in proper coding saves eons of user time. Putting programmer effort
into making correct code fast is always The Right Thing to Do. Software that
delivers the correct answer, late, is still wrong.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent read performance

2013-01-13 Thread Howard Chu

Wayne Bradney wrote:

All access in SQLite is serialized. Apologies if I'm missing something 
fundamental here, but that's not what I'm seeing with a file-backed database 
when shared cache is OFF.My test has a single table with 1M rows, and four 
queries that each yield 100K different rows. I run them two ways: 1. All 
queries in a loop on the same thread in the same connection.2. Each query in 
parallel on separate threads, each with its own connection. If all access were 
serialized, I would expect these two tests to take about the same amount of 
time overall, wouldn't I?In fact, with a file-backed database and shared cache 
OFF, the second run takes about 70% less time.With shared cache ON, they're the 
same. As to your second point, I probably should have made it clear that this 
isn't an internal project, it's a software product, and we don't control where 
it runs. I understand what an SSD is and why it's better than a spindle drive, 
but my question wasn't really meant to solicit suggestions for performa

n
c

  e improvements outside the proposal at hand, which was to retire our existing 
home-grown in-memory cache implementation (which is very fast for concurrent 
reads, but is extremely limited in how it can be queried), and replace it with a 
SQL-capable, relational store and still get roughly the same performance. Our 
expectation was that we could achieve this with SQLite, but were surprised by the 
apparent lack of read-concurrency, and wanted to get some input on what our 
options might be in terms of SQLite configuration of memory-backed databases. > 
From: slav...@bigfraud.org


You should look into MDB, which does no locking for read operations. Reads 
scale perfectly across arbitrarily many CPUs. More info here


http://symas.com/mdb/

and SQLite ported to use MDB as its backend is available here

https://gitorious.org/mdb/sqlightning


Date: Sat, 12 Jan 2013 17:48:56 +
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Concurrent read performance


On 12 Jan 2013, at 5:38pm, Wayne Bradney <wayne_brad...@yahoo.com> wrote:


"mode=memory=shared"




1. when shared cache is enabled, all reads are serialized, and


All access in SQLite is serialised.  All transactions require locking the 
entire database.  SQLite is very simple -- 'lite' -- so queries run extremely 
quickly, so you don't normally realise that any locking has taken place.


2. there doesn't seem to be any way to have a memory-backed database that can 
be accessed by multiple connections without using a shared cache,  then I guess 
I MUST use a file-backed database to get concurrent reads, even though I don't 
need the persistence and don't want to take the I/O hit. Am I making any sense? 
Anything I'm missing?


You are putting programming effort into making your code fast, and this is 
costing you (or your employer) programmer time.


For any reasonably useful piece of software, every moment of programmer time 
invested in proper coding saves eons of user time. Putting programmer effort 
into making correct code fast is always The Right Thing to Do. Software that 
delivers the correct answer, late, is still wrong.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Please help. Read only process being blocked by writer process.

2012-11-29 Thread Howard Chu

Alejandro Martínez wrote:

Thanks Richard, that makes perfect sense.

Thanks Howard, but i don't know what you are talking about, so i will
google "copy-on-write".


See the papers and presentations here:
   http://www.symas.com/mdb

Source code for SQLite is here:
   http://gitorious.org/mdb


Григорий Григоренко, Interesting! I'll consider this approach if at some
point i'm able to go "scorched earth" and start this from scratch, but at
this point i would have to change too much stuff.

I will go with the WAL solution for now. I'm just worried a buggy process
could hang while having a prepared statement open and cause the wal file to
grow forever, then causing errors in other processes.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Please help. Read only process being blocked by writer process.

2012-11-28 Thread Howard Chu

Richard Hipp wrote:

On Wed, Nov 28, 2012 at 9:58 AM, Alejandro Martínez <elpeq...@gmail.com>wrote:


And wouldn't PRAGMA read_uncommitted achieve the effect i was expecting? Or
does that cause other problems?



read_uncommitted only works if both the read and writer are in the same
process and are using shared cache.




Reading "old" or inconsistent data would not be a problem for me. (as long
as it is not corrupted data).



That's really the crux of the problem.  Suppose the writer has needing to
split a btree node, for example.  To do this, the writer would typically
write two new child pages and then overwrite the old b-tree page with the
parent page.  If the writer were part way through this sequence when the
reader comes along, the reader would see corrupt data, since the btree
structure would be only partially updated and hence not well-formed.


That's one of the fundamental problems with update-in-place DB designs. (Aside 
from their crash vulnerability and expensive crash recovery...) MVCC via 
copy-on-write has none of these issues.






On Wed, Nov 28, 2012 at 11:20 AM, Richard Hipp <d...@sqlite.org> wrote:


On Wed, Nov 28, 2012 at 7:09 AM, Alejandro Martínez <elpeq...@gmail.com

wrote:



Ok, i will probably do that. Thank you.

But i'd like to know. Why doesn't this work without wal? A read only
operation shouldn't block, right?



If you are not running WAL, then the database is updated directly.  That
means that there can be no readers active when a write is in progress
because then the readers would see an incomplete and uncommitted
transaction.





And regarding the commit failing, does that need a busy timeout handler
too? From documentation i though it would just wait until all readers

are

done reading and then write. And that further incoming readers would

wait

for those 5 seconds. And i was expecting not to really wait, as the

commit

should be pretty quick, right?

I'm puzzled.



On Wed, Nov 28, 2012 at 10:01 AM, Richard Hipp <d...@sqlite.org> wrote:


On Wed, Nov 28, 2012 at 6:56 AM, Alejandro Martínez <

elpeq...@gmail.com

wrote:



Is that the only way?

When i had done that in the past, the wal file grew constantly and

i

am

afraid it could fill the hard disk.

That could happen if say... one of the reading processes doesn't
properly sqlite3_reset a prepared statement after stepping it.

right?




Correct.  The WAL file will grow until a checkpoint resets it.  And a
checkpoint cannot reset the WAL file while there is a pending

transaction.

So if you have a statement holding a transaction open, the WAL file

will

grow without bound.

The solution there is to not hold read transactions open

indefinitely.

Call sqlite3_reset() when you are done with a statement so that its

implied

read transaction will close.




Thank you for your quick answer.



On Wed, Nov 28, 2012 at 9:52 AM, Richard Hipp <d...@sqlite.org>

wrote:



PRAGMA journal_mode=WAL




--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] light weight write barriers

2012-11-16 Thread Howard Chu

Ric Wheeler wrote:

On 11/16/2012 10:06 AM, Howard Chu wrote:

David Lang wrote:

barriers keep getting mentioned because they are a easy concept to understand.
"do this set of stuff before doing any of this other set of stuff, but I don't
care when any of this gets done" and they fit well with the requirements of the
users.

Users readily accept that if the system crashes, they will loose the most recent
stuff that they did,


*some* users may accept that. *None* should.


but they get annoyed when things get corrupted to the point
that they loose the entire file.

this includes things like modifying one option and a crash resulting in the
config file being blank. Yes, you can do the 'write to temp file, sync file,
sync directory, rename file" dance, but the fact that to do so the user must sit
and wait for the syncs to take place can be a problem. It would be far better to
be able to say "write to temp file, and after it's on disk, rename the file" and
not have the user wait. The user doesn't really care if the changes hit disk
immediately, or several seconds (or even 10s of seconds) later, as long as there
is not any possibility of the rename hitting disk before the file contents.

The fact that this could be implemented in multiple ways in the existing
hardware does not mean that there need to be multiple ways exposed to userspace,
it just means that the cost of doing the operation will vary depending on the
hardware that you have. This also means that if new hardware introduces a new
way of implementing this, that improvement can be passed on to the users without
needing application changes.


There are a couple industry failures here:

1) the drive manufacturers sell drives that lie, and consumers accept it
because they don't know better. We programmers, who know better, have failed
to raise a stink and demand that this be fixed.
   A) Drives should not lose data on power failure. If a drive accepts a write
request and says "OK, done" then that data should get written to stable
storage, period. Whether it requires capacitors or some other onboard power
supply, or whatever, they should just do it. Keep in mind that today, most of
the difference between enterprise drives and consumer desktop drives is just a
firmware change, that hardware is already identical. Nobody should accept a
product that doesn't offer this guarantee. It's inexcusable.
   B) it should go without saying - drives should reliably report back to the
host, when something goes wrong. E.g., if a write request has been accepted,
cached, and reported complete, but then during the actual write an ECC failure
is detected in the cacheline, the drive needs to tell the host "oh by the way,
block XXX didn't actually make it to disk like I told you it did 10ms ago."

If the entire software industry were to simply state "your shit stinks and
we're not going to take it any more" the hard drive industry would have no
choice but to fix it. And in most cases it would be a zero-cost fix for them.

Once you have drives that are actually trustworthy, actually reliable (which
doesn't mean they never fail, it only means they tell the truth about
successes or failures), most of these other issues disappear. Most of the need
for barriers disappear.



I think that you are arguing a fairly silly point.


Seems to me that you're arguing that we should accept inferior technology. 
Who's really being silly?



If you want that behaviour, you have had it for more than a decade - simply
disable the write cache on your drive and you are done.


You seem to believe it's nonsensical for someone to want both fast and 
reliable writes, or that it's unreasonable for a storage device to offer the 
same, cheaply. And yet it is clearly trivial to provide all of the above.



If you - as a user - want to run faster and use applications that are coded to
handle data integrity properly (fsync, fdatasync, etc), leave the write cache
enabled and use file system barriers.


Applications aren't supposed to need to worry about such details, that's why 
we have operating systems.


Drives should tell the truth. In event of an error detected after the fact, 
the drive should report the error back to the host. There's nothing 
nonsensical there.


When a drive's cache is enabled, the host should maintain a queue of written 
pages, of a length equal to the size of the drive's cache. If a drive says 
"hey, block XXX failed" the OS can reissue the write from its own queue. No 
muss, no fuss, no performance bottlenecks. This is what Real Computers did 
before the age of VAX Unix.



Everyone has to trade off cost versus something else and this is a very, very
long standing trade off that drive manufacturers have made.


With the cost of storage falling as rapidly as it has in recent years, this is 
a stupid tradeoff.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun 

Re: [sqlite] light weight write barriers

2012-11-16 Thread Howard Chu

David Lang wrote:

barriers keep getting mentioned because they are a easy concept to understand.
"do this set of stuff before doing any of this other set of stuff, but I don't
care when any of this gets done" and they fit well with the requirements of the
users.

Users readily accept that if the system crashes, they will loose the most recent
stuff that they did,


*some* users may accept that. *None* should.


but they get annoyed when things get corrupted to the point
that they loose the entire file.

this includes things like modifying one option and a crash resulting in the
config file being blank. Yes, you can do the 'write to temp file, sync file,
sync directory, rename file" dance, but the fact that to do so the user must sit
and wait for the syncs to take place can be a problem. It would be far better to
be able to say "write to temp file, and after it's on disk, rename the file" and
not have the user wait. The user doesn't really care if the changes hit disk
immediately, or several seconds (or even 10s of seconds) later, as long as there
is not any possibility of the rename hitting disk before the file contents.

The fact that this could be implemented in multiple ways in the existing
hardware does not mean that there need to be multiple ways exposed to userspace,
it just means that the cost of doing the operation will vary depending on the
hardware that you have. This also means that if new hardware introduces a new
way of implementing this, that improvement can be passed on to the users without
needing application changes.


There are a couple industry failures here:

1) the drive manufacturers sell drives that lie, and consumers accept it 
because they don't know better. We programmers, who know better, have failed 
to raise a stink and demand that this be fixed.
  A) Drives should not lose data on power failure. If a drive accepts a write 
request and says "OK, done" then that data should get written to stable 
storage, period. Whether it requires capacitors or some other onboard power 
supply, or whatever, they should just do it. Keep in mind that today, most of 
the difference between enterprise drives and consumer desktop drives is just a 
firmware change, that hardware is already identical. Nobody should accept a 
product that doesn't offer this guarantee. It's inexcusable.
  B) it should go without saying - drives should reliably report back to the 
host, when something goes wrong. E.g., if a write request has been accepted, 
cached, and reported complete, but then during the actual write an ECC failure 
is detected in the cacheline, the drive needs to tell the host "oh by the way, 
block XXX didn't actually make it to disk like I told you it did 10ms ago."


If the entire software industry were to simply state "your shit stinks and 
we're not going to take it any more" the hard drive industry would have no 
choice but to fix it. And in most cases it would be a zero-cost fix for them.


Once you have drives that are actually trustworthy, actually reliable (which 
doesn't mean they never fail, it only means they tell the truth about 
successes or failures), most of these other issues disappear. Most of the need 
for barriers disappear.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Shared in-memory SQLite database in shared memory

2012-11-03 Thread Howard Chu

Jaco Breitenbach wrote:

Dear all,

My application consists of several indepent processes that must all access
(read) the same data table during processing.  In order to optimise memory
usage I was wondering if it is possible to load an in-memory SQLite
database into shared memory.  The database would be maintained by a
separate management process and all other processes would only require read
access.  This way only one copy of the database would have to be held in
system memory.

Is this possible with the current implementation of SQLite?  Any
suggestions would be welcome.


The OpenLDAP MDB (memory mapped database) library will do exactly what you 
want. Since it uses a shared memory map to access the DB, no matter how many 
processes access it concurrently there's only one copy of data present in RAM. 
It also performs reads faster than anything else, even pure in-memory 
databases. Nothing else is anywhere close to as efficient as MDB for reads.


Read more here http://highlandsun.com/hyc/mdb/

The port of SQLite using MDB as its backend is available on gitorious
https://gitorious.org/mdb/

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] light weight write barriers

2012-11-01 Thread Howard Chu

Alan Cox wrote:

How about that recently preliminary infrastructure to send ORDERED commands
instead of queue draining was deleted from the kernel, because "there's no
difference where to drain the queue, on the kernel or the storage side"?


Send patches.


Isn't any type of kernel-side ordering an exercise in futility, since
  a) the kernel has no knowledge of the disk's actual geometry
  b) most drives will internally re-order requests anyway
  c) cheap drives won't support barriers

Even assuming the drives honored all your requests without lying, how would 
you really want this behavior exposed? From the userland perspective, there 
are very few apps that care. Probably only transactional databases, really.


As a DB author, I'm not sure I'd be keen on this as an open() or fcntl() 
option. Databases that really care would be on dedicated filesystems and/or 
devices, so per-file control would be tedious. You would most likely want to 
say "all writes to this string of devices should be order-preserving" and 
forget about it. With that guarantee, a careful writer can have perfectly 
intact data structures all the time, without ever slowing down for a fsync.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] light weight write barriers

2012-10-30 Thread Howard Chu

On Thu, Oct 25, 2012, Nico Williams wrote:

Incidentally, here's a single-file, bag of b-trees that uses a COW
format: MDB, which can be found in
git://git.openldap.org/openldap.git, in the mdb.master branch.


Complete docs, design notes, and benchmark results are available here:
  http://highlandsun.com/hyc/mdb/

I already discussed some of this on the sqlite-dev mailing list back in July.
I'll also be giving a talk on the design/internals of MDB in Barcelona at 
LinuxCon Europe on Wednesday November 7. Drop by if you're in the area...


Our work on an sqlite 4 backend built on MDB is progressing as well.
--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users