Re: [sqlite] CppSQLite

2004-04-23 Thread Mark D. Anderson
There is no copyright statement or license stated in the article
or in the download, so it isn't clear what the legal status of
CppSQLite is.


On Fri, 23 Apr 2004 14:56:45 +0100, "Rob Groves"
> For those that are interested, a new version of CppSQLite and
> accompanying
> article is available here:
> Main new features are support for pre-compiled SQL and the multithreaded
> features of SQLite, plus removal of Microsoft-specific C++.
> Cheers,
> Rob.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [sqlite] row size limit

2004-04-18 Thread Mark D. Anderson
These disk access issues are why no database I know of actually
stores large objects inline. It would be crazy to do so.
mysql, postgres, and oracle all have support for blobs, and
none of them store them inline.

(btw, if you care about disk io performance for blobs,
you can tune the fs parameters for such large files.)

blobs are a pain no matter how you cut it.

i've built different applications with them in the database,
and without (storing just filenames, with files maintained
by the app, as drh suggested).
I've regretted both approaches :(.

The problems I've seen with not using database blobs:

1. kiss replication goodbye. If you are careful with file naming,
and never modify blobs, and use rsync, you can just barely get by.

2. You can't do text matching as part of a sql request. You have
to do your own external text indexing and join at the app layer.

3. You have to implement your own application-level transaction logic,
so that updates and deletes keep the database and file system
consistent. Except that is hard to do right, so over time
the two *will* get inconsistent.

4. You have to have a shared file system for your database clients.
Which means NFS or samba, or something else which is painful
to set up and administer and is difficult to secure over a WAN.
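To make point 3 concrete, here is a toy Python sketch of that app-level logic (the `store_blob` helper, the `blobs` table, and the directory layout are all made up for illustration): write the file first, then record it, and clean up on failure. Note the comment about the crash window that you cannot close this way.

```python
import os
import sqlite3
import tempfile

def store_blob(db, blob_dir, name, data):
    """Write the blob to a file first, then record it in the database.

    If the INSERT fails, the orphan file is removed; the reverse order
    would risk a database row pointing at a missing file.
    """
    fd, tmp_path = tempfile.mkstemp(dir=blob_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        final_path = os.path.join(blob_dir, name)
        os.rename(tmp_path, final_path)
        db.execute("INSERT INTO blobs(name, path) VALUES (?, ?)",
                   (name, final_path))
        db.commit()
    except Exception:
        # best-effort cleanup; a crash between rename and commit still
        # leaves an orphan file -- exactly the inconsistency described above
        for p in (tmp_path, os.path.join(blob_dir, name)):
            if os.path.exists(p):
                os.remove(p)
        raise

# hypothetical usage
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blobs(name TEXT, path TEXT)")
blob_dir = tempfile.mkdtemp()
store_blob(db, blob_dir, "report.bin", b"hello")
stored_path = db.execute("SELECT path FROM blobs").fetchone()[0]
```

The cleanup branch is exactly the code that rots: every new failure mode (crash mid-rename, full disk, concurrent writer) needs another case.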

The problems I've seen with using database blobs:

1. The blob-specific APIs are usually database-specific, and
typically poorly designed.
With JDBC3/ODBC3 they are at least no longer database-specific.

2. The semantics of blob modification within a larger transaction
are usually poorly documented, or buggy, or both.

3. You don't get to play with the blobs in the external file
system when you want to (even if for read only purposes).

4. Performance can still stink, unless you are careful. At the
wire level, the fetch blocks usually only contain "locator"
objects, and these are converted to data streams only when
asked, and that usually requires a new independent roundtrip
and set up for every blob. So you end up doing things like
using sql functions which fetch the first 1000 bytes of the blobs
(if that is what your app wants), so that they are streamed with
the rest of the fetch.
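The first-N-bytes trick can be demonstrated with SQLite itself, since its `substr()` function works on blobs (the `docs` table and column names here are made up; this uses a modern SQLite that stores blobs natively):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs(id INTEGER PRIMARY KEY, body BLOB)")
db.execute("INSERT INTO docs(body) VALUES (?)", (b"\x00\x01" * 5000,))

# Fetch only a 1000-byte prefix inline with the row, instead of
# pulling back the whole 10000-byte blob (or a locator) per row.
row = db.execute("SELECT id, substr(body, 1, 1000) FROM docs").fetchone()
prefix = row[1]
```

The prefix streams back with the rest of the result, so no extra roundtrip per blob is needed.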



Re: [sqlite] Concurrency, MVCC

2004-04-15 Thread Mark D. Anderson

On Thu, 15 Apr 2004 20:16:32 -0400, "Doug Currie" <[EMAIL PROTECTED]> said:

> I used this design in a proprietary database in the late 1980s. The
> only reason I didn't consider modifying SQLite this way up until now
> is that I was anticipating BTree changes for 3.0, so I confined my
> efforts to the pager layer.

btw, another example of this class of approach, with a bsd-style
license, is GigaBASE from the prolific Konstantin Knizhnik:

It does not offer anything approaching full SQL.
It does however have several features not available in sqlite:
- online backup [1]
- master-slave replication
- group commit [2]
- parallel query (multiple threads for full table scans)

[1] There is a kind of support in sqlite for online backup, via
   echo '.dump' | sqlite ex1 > backup.sql
though this results in a largish file and blocks
everything else.

[2] Group commit is a mechanism that allows pending transactions
to get fsync'd together.
This allows for greater performance, with a risk only of losing
some transactions (at most the size of the group), but not
a greater risk of a corrupted database.
This is more flexible than sqlite's single OFF/NORMAL/FULL knob.
It is also offered by DB2, Oracle, and MySQL.
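A toy illustration of the group-commit idea (not SQLite code; just an append-only log where one fsync covers a whole batch of commits):

```python
import os
import tempfile

class GroupCommitLog:
    """Toy append-only log: commits are buffered and flushed with one
    fsync per group. Losing power loses at most the unflushed group,
    but never corrupts the file, since records are only appended."""

    def __init__(self, path, group_size=8):
        self.f = open(path, "ab")
        self.group_size = group_size
        self.pending = 0

    def commit(self, record):
        self.f.write(record + b"\n")
        self.pending += 1
        if self.pending >= self.group_size:
            self.flush()

    def flush(self):
        self.f.flush()
        os.fsync(self.f.fileno())  # one disk sync covers the whole group
        self.pending = 0

# hypothetical usage
fd, path = tempfile.mkstemp()
os.close(fd)
log = GroupCommitLog(path, group_size=2)
log.commit(b"txn-1")   # buffered, no fsync yet
log.commit(b"txn-2")   # group is full: one fsync covers both
contents = open(path, "rb").read()
```

With group_size=1 this degrades to fsync-per-commit (FULL-style durability); larger groups trade bounded transaction loss for fewer syncs.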

In idle moments I've toyed with what it would take to splice
GigaBASE together with the query parser and planner from
lambda-db or sqlite.
But then I wake up.



Re: [sqlite] Concurrency, MVCC

2004-04-15 Thread Mark D. Anderson

On Wed, 14 Apr 2004 08:13:39 -0400, "D. Richard Hipp" <[EMAIL PROTECTED]>

>* Support for atomic commits of multi-database transactions,
>  which gives you a limited kind of table-level locking,
>  assuming you are willing to put each table in a separate
>  database.

and also, as a consequence, a limited form of concurrent writers:
assuming that table locks are acquired in a consistent order
to avoid deadlock, writers that do not touch the same tables
could run concurrently (in this database-per-table model).

btw, what about offering better behavior around throwing away
cache pages? one approach would be something like a
commit_begin() function, which is offered by some rdbms native
apis. It says "commit what i've done, but at the same time
attempt to acquire the write lock".
Failure to "win" and actually retain the write
lock might not be reported -- the idea is that the application
can at least indicate its intent.

This could also be done as some sort of connection option.

So in the case that a single writer is keeping up with all
requests, it can do so efficiently without throwing away
its pages.



Re: [sqlite] Concurrency, MVCC

2004-04-14 Thread Mark D. Anderson
Wednesday, April 14, 2004, 1:16:54 AM, Andrew Piskorski wrote:
> as far as I can tell, it seems to be describing a system with
> the usual Oracle/PostgreSQL MVCC semantics, EXCEPT of course that
> Currie proposes that each Write transaction must take a lock on the
> database as a whole.

Well, i suppose from a sufficient distance they look alike,
but in practice MVCC and shadow paging are rather different.

In MVCC, each row typically has two hidden fields identifying the first
and last transaction ids for which the row is relevant;
the last transaction id lets readers skip rows that have been deleted.
There are many variants of MVCC, but you get the idea.

Any reader (or writer) knows its own transaction id, and just
ignores rows that are not applicable.

A "vacuum" process is necessary to periodically reclaim space
taken by rows whose last transaction id is lower than that of
any live transaction.
In shadow paging, the basic idea is that any reader or writer
gets a view onto the data based on reachability from "pointers"
in a particular root block. Pages that are reachable from any
live root block are never modified. A vacuum process is required 
to collect the space from blocks that are no longer reachable.
Updates to indexes must be treated in roughly the same way as
data pages, because they contain pointers to different data.
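A toy sketch of the reachability idea (a copy-on-write page store plus per-version root pointers; purely illustrative, with made-up names):

```python
class ShadowStore:
    """Copy-on-write page store: a commit installs a new root; pages
    reachable from any live root are never modified in place."""

    def __init__(self):
        self.pages = {}    # page id -> content
        self.next_id = 0
        self.roots = {}    # version -> page ids reachable from that root

    def write_page(self, content):
        # never overwrite: every write allocates a fresh page
        pid = self.next_id
        self.next_id += 1
        self.pages[pid] = content
        return pid

    def commit(self, version, page_ids):
        self.roots[version] = list(page_ids)

    def vacuum(self):
        # reclaim pages unreachable from every live root
        live = {pid for ids in self.roots.values() for pid in ids}
        for pid in list(self.pages):
            if pid not in live:
                del self.pages[pid]

# hypothetical usage
store = ShadowStore()
v1 = store.write_page("row data, version 1")
store.commit(1, [v1])
v2 = store.write_page("row data, version 2")  # new page; v1 untouched
store.commit(2, [v2])
del store.roots[1]   # version 1 is no longer referenced by any reader
store.vacuum()       # now v1's page can be reclaimed
```

Readers pinned to version 1 would have kept seeing the old page untouched until they finished; only then does vacuum reclaim it.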

Shadow paging can be used for a table-based database, or
a persistent object store.
It certainly is much older than the HUT work; see for example Lorie 77,
"Physical Integrity in a Large Segmented Database."
It falls into the general class of logless transaction systems,
as opposed to the log-based approach that predominates in
current day non-academic database implementations.



Re: [sqlite] Concurrency Proposal

2004-04-14 Thread Mark D. Anderson

On Wed, 31 Mar 2004 12:15:36 +1000, [EMAIL PROTECTED] said:
> G'day,

> [snip of Ben's pseudo-code]

Just to check my understanding: the suggestion here is to reduce
reader-writer conflict windows by buffering of writes.
The writer acquires a read lock at the start of the transaction,
and upgrades to a write lock only when it comes time to commit all
pending IO.

If i understand the proposal, this improves read concurrency
at the cost of write throughput, because write IO is held back
and started later than it could be if it were commenced
as soon as the intent was known.



Re: [sqlite] A proposal for SQLite version 3.0

2004-04-12 Thread Mark D. Anderson
Echoing some others' responses, particularly Darren's:

1. I don't see the rationale for putting much priority on 
multiple string encodings. After all, blobs still can't
be stored natively :).

UTF-16 adds extra complexity, because of embedded nulls,
and because of its own need for a byte-order mark (BOM).
Furthermore, it is not a fixed-width encoding.
(Java, and Windows prior to win2k, behaved as if it were
fixed width, by neglecting surrogate pairs, but they are
just broken.)

UTF-8 is not a fixed-width encoding either, but it does
not have embedded nulls, and it is supported "natively" by
practically all scripting languages.

The wchar_t issue is a pain; on Windows it is typically
a 16-bit type, and on Unix it is typically 32-bit, and in
either case you still don't know whether it is UCS-2, UTF-16,
or (on unix) UCS-4.
But using a char* declaration for a string that can contain
embedded nulls is an invitation to rampant bugs, IMHO.
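Both points are easy to see from a short Python session (Python only illustrates the encodings here; its own strings index by code point):

```python
# U+1D11E (musical G clef) is outside the BMP, so UTF-16 needs a
# surrogate pair: two 16-bit code units for one character.
clef = "\U0001D11E"
utf16 = clef.encode("utf-16-be")   # b'\xd8\x34\xdd\x1e'
utf8 = clef.encode("utf-8")        # b'\xf0\x9d\x84\x9e'

two_code_units = len(utf16) == 4        # not fixed width after all
utf8_has_nulls = 0 in utf8              # UTF-8 of non-NUL text: no zero bytes
utf16_has_nulls = 0 in "A".encode("utf-16-be")  # plain ASCII gets 0x00 bytes
```

So a char* holding UTF-16 text gets truncated at the first ASCII character by any strlen-style code, which is exactly the embedded-null bug class described above.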

2. I would be very interested to hear more about better
concurrency support, particularly along the lines of
the HUT HiBase/Shades ideas.
Doug Currie has written up some ideas about this in
cvstrac ("BlueSky"), but I'd also urge people to read
the HUT papers. At least K. Oksanen's publications are
still online at

3. I'm unsure what is driving the desire for variable typing
among the values of a single column. Is this some deep-seated
distaste for the relational model? :)



Re: [sqlite] sqlite not sharing page cache across processes?

2004-03-22 Thread Mark D. Anderson

On Mon, 22 Mar 2004 14:35:23 -0500, "Doug Currie" <[EMAIL PROTECTED]> said:

> It has no way of knowing (save diffing the db) that "there have been
> no write transactions."

well, that should in principle be easy to accomplish, shouldn't it?
For example, by having any writer increment a counter held
in the meta information in the first page of the file (PageOne
or whatever).
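As it happens, the SQLite 3 file format (then still a proposal) keeps exactly such a counter: a 4-byte big-endian "file change counter" at byte offset 24 of the database header, bumped by each committed write transaction. A quick check (assuming a modern Python with the sqlite3 module):

```python
import os
import sqlite3
import struct
import tempfile

def change_counter(path):
    # SQLite 3 file format: 4-byte big-endian file change counter
    # at byte offset 24 of the 100-byte database header.
    with open(path, "rb") as f:
        f.seek(24)
        return struct.unpack(">I", f.read(4))[0]

fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)

db = sqlite3.connect(path)
db.execute("CREATE TABLE t(x)")
db.commit()
db.close()
before = change_counter(path)

db = sqlite3.connect(path)
db.execute("INSERT INTO t VALUES (1)")
db.commit()
db.close()
after = change_counter(path)   # the write bumped the counter
```

A reader can compare this counter against the value it cached to decide whether its page cache is still valid, without diffing the database.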

> > While I'm making suggestions despite near-total ignorance
> > of sqlite :), have you measured what improvement would
> > be obtained by using readonly mmap() for readers, rather
> > than read()? It would save some malloc churning as well as
> > a memory copy. Performance could be additionally tweaked
> > with madvise(SEQUENTIAL) when suitable.
> These functions are not in ANSI C.

No, but they are POSIX.
And there are lots of functions already in the sqlite code base
that are neither ANSI nor POSIX (primarily for windows or mac).

There are several operating systems, such as Solaris, that
implement their "cp" using mmap rather than read, and
have done so for ages. So this is really not some piece
of exotica.

There probably are some embedded systems that do not
offer mmap. There are circumstances when mmap will not
perform as well as read. For example, the sqlite case of using
a page only once is pessimal for mmap, because
of TLB bookkeeping overhead on setup and unmapping.
Use of mmap would probably make sqlite even more vulnerable to NFS
than it is already.

I would not suggest mmap as the only solution; as with web servers,
I would offer the strategy as a configurable option.

Also, even I would hesitate over suggesting mmap for writers, without
a lot of experimentation.
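For reference, the read-only reader path being suggested looks like this in POSIX terms (shown via Python's mmap module, which wraps mmap(2)/madvise(2); the function name is made up):

```python
import mmap
import os
import tempfile

def read_via_mmap(path):
    """Map a file read-only and copy its contents out: no read() calls
    and no intermediate buffer of our own to malloc and fill."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            if hasattr(mmap, "MADV_SEQUENTIAL"):   # POSIX, Python >= 3.8
                m.madvise(mmap.MADV_SEQUENTIAL)    # hint: sequential scan
            return bytes(m)

# hypothetical usage
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello, mmap")
data = read_via_mmap(path)
```

A real reader would of course hand out page-sized slices of the mapping rather than copying the whole file; the copy here just keeps the sketch self-contained.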

On Mon, 22 Mar 2004 14:56:35 -0500, "D. Richard Hipp" <[EMAIL PROTECTED]>
> Mark D. Anderson wrote:
>  > Have you measured what improvement would be obtained by using readonly mmap()
>  > for readers, rather than read()?
> SQLite supports files that are larger than 4GB.  I don't think
> you can mmap() a file that big on a machine with a 32-bit
> address space.

You'd have to use mmap64 for that, if it was desirable to use mmap
at all.

