Re: [sqlite] CppSQLite
There is no copyright statement or license stated in the article or in the download, so it isn't clear what the legal status of CppSQLite is.

-mda

On Fri, 23 Apr 2004 14:56:45 +0100, "Rob Groves" <[EMAIL PROTECTED]> said:
> For those that are interested, a new version of CppSQLite and
> accompanying article is available here:
>
> http://www.codeproject.com/database/CppSQLite.asp
>
> Main new features are support for pre-compiled SQL and multithreaded
> features of SQLite, plus removal of Microsoft-specific C++.
>
> Cheers,
>
> Rob.
Re: [sqlite] row size limit
These disk access issues are why no database I know of actually stores large objects inline. It would be crazy to do so. mysql, postgres, and oracle all have support for blobs, and none of them store them inline. (btw, if you care about disk io performance for blobs, you can tune the fs parameters for such large files.)

blobs are a pain no matter how you cut it. i've built different applications with them in the database, and without (storing just filenames, with files maintained by the app, as drh suggested). I've regretted both approaches :(.

The problems I've seen with not using database blobs:

1. kiss replication goodbye. If you are careful with file naming, and never modify blobs, and use rsync, you can just barely get by.

2. You can't do text matching as part of a sql request. You have to do your own external text indexing and join at the app layer.

3. You have to implement your own application-level transaction logic, so that updates and deletes leave the database and the file system consistent with each other. Except that is hard to do right, so over time the two *will* get inconsistent. (A sketch of this logic follows this message.)

4. You have to have a shared file system for your database clients. Which means NFS or samba, or something else which is painful to set up and administer, and is difficult to secure over a WAN.

The problems I've seen with using database blobs:

1. The blob-specific APIs are also usually database-specific, and they are typically poorly designed. Now with JDBC3/ODBC3 they are at least not database-specific.

2. The semantics of blob modification within a larger transaction are usually poorly documented, or buggy, or both.

3. You don't get to play with the blobs in the external file system when you want to (even if for read-only purposes).

4. Performance can still stink unless you are careful. At the wire level, the fetch blocks usually contain only "locator" objects, and these are converted to data streams only when asked, which usually requires a new independent round trip and setup for every blob. So you end up doing things like using sql functions which fetch the first 1000 bytes of the blobs (if that is what your app wants), so that they are streamed with the rest of the fetch.

-mda
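To make problem 3 of the first list concrete, here is a minimal sketch of the consistency logic an application ends up writing when blobs live outside the database. It assumes a SQLite 3 style C API; the "blobs" table and the helper name are invented for illustration, not anyone's actual schema:

    #include <stdio.h>
    #include <unistd.h>
    #include <sqlite3.h>

    /* Store a blob as an external file plus a row naming it, trying to
       keep the two consistent. */
    int store_blob_as_file(sqlite3 *db, const char *path,
                           const void *data, size_t len)
    {
        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        if (fwrite(data, 1, len, f) != len) {
            fclose(f);
            unlink(path);
            return -1;
        }
        fflush(f);
        fsync(fileno(f));   /* flush the file before the row that names it */
        fclose(f);

        char *sql = sqlite3_mprintf("INSERT INTO blobs(path) VALUES(%Q)", path);
        int rc = sqlite3_exec(db, sql, 0, 0, 0);
        sqlite3_free(sql);
        if (rc != SQLITE_OK) {
            unlink(path);   /* undo the file write if the insert fails */
            return -1;
        }
        /* A crash between the fsync() and a committed INSERT leaves an
           orphan file; the opposite ordering would risk a dangling row.
           That asymmetry is the drift the message warns about. */
        return 0;
    }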
Re: [sqlite] Concurrency, MVCC
On Thu, 15 Apr 2004 20:16:32 -0400, "Doug Currie" <[EMAIL PROTECTED]> said:
> I used this design in a proprietary database in the late 1980s. The
> only reason I didn't consider modifying SQLite this way up until now
> is that I was anticipating BTree changes for 3.0, so I confined my
> efforts to the pager layer.

btw, another example of this class of approach, with a bsd-style license, is GigaBASE from the prolific Konstantin Knizhnik:

http://www.garret.ru/~knizhnik/gigabase/GigaBASE.htm

It does not offer anything approaching full SQL. It does however have several features not available in sqlite:

- online backup [1]
- master-slave replication
- group commit [2]
- parallel query (multiple threads for full table scans)

[1] There is a kind of support in sqlite for online backup, via

    echo '.dump' | sqlite ex1 > backup.sql

though this results in a largish file and blocks everything else while it runs.

[2] Grouping commits is a mechanism that allows pending transactions to get fsync'd together. This allows for greater performance, with the only risk being the loss of some transactions (at most the size of the group), but no greater risk of a corrupted database. This is more flexible than sqlite's big knob of OFF/NORMAL/FULL. It is also offered by DB2, Oracle, and MySQL.

In idle moments I've toyed with what it would take to splice GigaBASE with the query parser and planner from lambda-db or sqlite. But then I wake up.

-mda
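For readers unfamiliar with group commit, here is a rough sketch of the classic leader/follower arrangement over a shared write-ahead log. All names are invented for illustration; this is neither GigaBASE nor sqlite code:

    #include <pthread.h>
    #include <stddef.h>
    #include <unistd.h>

    static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    static int  log_fd;            /* shared write-ahead log */
    static long synced_seq;        /* highest sequence known durable */
    static long appended_seq;      /* highest sequence appended so far */
    static int  sync_in_progress;

    /* Assumed helper: buffered append, returns this record's sequence. */
    long append_log_record(const void *rec, size_t len);

    void commit_record(const void *rec, size_t len)
    {
        pthread_mutex_lock(&mu);
        long my_seq = append_log_record(rec, len);  /* no fsync yet */
        appended_seq = my_seq;
        while (synced_seq < my_seq) {
            if (!sync_in_progress) {
                /* Become the group leader: one fsync covers every
                   transaction appended up to this point. */
                sync_in_progress = 1;
                long target = appended_seq;
                pthread_mutex_unlock(&mu);
                fsync(log_fd);
                pthread_mutex_lock(&mu);
                synced_seq = target;
                sync_in_progress = 0;
                pthread_cond_broadcast(&cv);   /* wake the group */
            } else {
                pthread_cond_wait(&cv, &mu);   /* piggyback on the leader */
            }
        }
        pthread_mutex_unlock(&mu);
    }

A crash can lose an unsynced group's transactions, but no commit is acknowledged before its fsync completes, so the database itself is no more at risk than with one fsync per transaction.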
Re: [sqlite] Concurrency, MVCC
On Wed, 14 Apr 2004 08:13:39 -0400, "D. Richard Hipp" <[EMAIL PROTECTED]> said:
> * Support for atomic commits of multi-database transactions,
>   which gives you a limited kind of table-level locking,
>   assuming you are willing to put each table in a separate
>   database.

and also a limited form of concurrent writers, as a consequence, right? Assuming that table locks are acquired in a consistent order to avoid deadlock, there could be concurrent writers that do not touch the same tables (in this database-per-table model).

btw, what about offering better behavior regarding throwing away cache pages? One approach would be something like a commit_begin() function, which is offered by some rdbms native apis. It says "commit what i've done, but at the same time attempt to acquire the write lock". Failure to "win" and actually retain the write lock might not be reported -- the idea is that the application can at least indicate its desire. This could also be done as some sort of connection option.

So in the case that a single writer is keeping up with all requests, it can do so efficiently, without throwing away its pages.

-mda
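To make the shape of that suggestion concrete, here is a hypothetical sketch of how a busy single writer might use such a call. sqlite has no commit_begin() today; every name below except sqlite_exec() is invented, and apply() is assumed to open its own transaction:

    #include <sqlite.h>

    /* Hypothetical: commit the current transaction and, in the same
       step, try to retain/reacquire the write lock. Failure to win
       the lock need not be reported. */
    int sqlite_commit_begin(sqlite *db);

    /* Assumed application helpers, invented for illustration. */
    struct work_item;
    struct work_item *next_request(void);
    int  more_requests_pending(void);
    void apply(sqlite *db, struct work_item *w);  /* BEGIN + the writes */

    void writer_loop(sqlite *db)
    {
        for (;;) {
            struct work_item *w = next_request();
            apply(db, w);
            if (more_requests_pending()) {
                /* Commit, but signal the desire to keep writing; if we
                   win the lock, the page cache stays valid for the next
                   transaction instead of being thrown away. */
                sqlite_commit_begin(db);
            } else {
                sqlite_exec(db, "COMMIT", 0, 0, 0);
            }
        }
    }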
Re: [sqlite] Concurrency, MVCC
Wednesday, April 14, 2004, 1:16:54 AM, Andrew Piskorski wrote:
> as far as I can tell, it seems to be describing a system with
> the usual Oracle/PostgreSQL MVCC semantics, EXCEPT of course that
> Currie proposes that each Write transaction must take a lock on the
> database as a whole.

Well, i suppose from a sufficient distance they look alike, but in practice MVCC and shadow paging are rather different.

In MVCC, each row typically has two hidden fields identifying the first and last transaction ids for which the row is relevant. The last transaction id is used to skip rows that have been deleted. There are many variants of MVCC, but you get the idea. Any reader (or writer) knows its own transaction id, and just ignores rows that are not applicable. A "vacuum" process is necessary to periodically reclaim the space taken by rows whose last transaction id is lower than that of any live transaction.

In shadow paging, the basic idea is that any reader or writer gets a view onto the data based on reachability from "pointers" in a particular root block. Pages that are reachable from any live root block are never modified. A vacuum process is required to collect the space from blocks that are no longer reachable. Updates to indexes must be treated in roughly the same way as data pages, because index pages contain pointers to data pages, and updated data lands on new pages.

Shadow paging can be used for a table-based database, or a persistent object store. It is certainly much older than the HUT work; see for example Lorie 77, "Physical Integrity in a Large Segmented Database." It falls into the general class of logless transaction systems, as opposed to the log-based approach that predominates in present-day non-academic database implementations.

-mda
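A minimal sketch of the MVCC visibility rule just described, with invented field names, simplified to assume that every transaction with a lower id has already committed (real systems also consult a snapshot or commit list):

    typedef unsigned long txid;

    struct row_header {
        txid created_by;  /* first transaction for which the row is relevant */
        txid deleted_by;  /* last one; 0 if the row is not (yet) deleted */
    };

    int row_visible(const struct row_header *r, txid my_tx)
    {
        if (r->created_by > my_tx)
            return 0;   /* created after my snapshot: ignore */
        if (r->deleted_by != 0 && r->deleted_by <= my_tx)
            return 0;   /* deleted before my snapshot: ignore */
        return 1;
    }

    /* Vacuum can reclaim a row once deleted_by is nonzero and below
       the id of the oldest live transaction. */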
Re: [sqlite] Concurrency Proposal
On Wed, 31 Mar 2004 12:15:36 +1000, [EMAIL PROTECTED] said:
> G'day,
> [snip of Ben's pseudo-code]

Just to check my understanding: the suggestion here is to reduce reader-writer conflict windows by buffering writes. The writer acquires a read lock at the start of the transaction, and upgrades to a write lock only when it comes time to commit all pending IO.

If i understand the proposal, this improves read concurrency at the cost of write throughput, because write IO is held back and started later than it could be if it were commenced as soon as the intent was known.

-mda
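In pseudocode (mine, not Ben's; every name below is an invented stand-in for pager internals), I read the proposal as roughly:

    typedef struct db_handle db_handle;
    typedef struct page_set page_set;
    void acquire_read_lock(db_handle *);
    page_set *page_set_new(void);
    void run_transaction_logic(db_handle *, page_set *);
    void upgrade_to_write_lock(db_handle *);
    void flush_pages(db_handle *, page_set *);
    void release_write_lock(db_handle *);

    void write_transaction(db_handle *d)
    {
        acquire_read_lock(d);           /* readers are not yet excluded */

        page_set *pending = page_set_new();
        run_transaction_logic(d, pending);  /* every page write is diverted
                                               into the in-memory buffer,
                                               not the file */

        upgrade_to_write_lock(d);       /* readers excluded only from here */
        flush_pages(d, pending);        /* all deferred IO starts now */
        release_write_lock(d);
    }

The throughput cost in the last paragraph is the flush_pages() call: it begins only after the upgrade, rather than overlapping with the transaction logic.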
Re: [sqlite] A proposal for SQLite version 3.0
Echoing some others' responses, particularly Darren's:

1. I don't see the rationale for putting much priority on multiple string encodings. After all, blobs still can't be stored natively :).

UTF-16 adds extra complexity, because of embedded nulls, and because of its own need for a byte-order mark (BOM). Furthermore, it is not a fixed-width encoding. (Java, and Windows prior to win2k, behaved as if it were fixed width, by neglecting surrogate pairs, but they are just broken.) UTF-8 is not a fixed-width encoding either, but it does not have embedded nulls, and it is supported "natively" by practically all scripting languages.

The wchar_t issue is a pain; on Windows it is typically a 16-bit type, and on Unix it is typically 32-bit, and in either case you still don't know whether it holds UCS-2, UTF-16, or (on unix) UCS-4. But using a char* declaration for a string that can contain embedded nulls is an invitation to rampant bugs, IMHO. (A tiny demonstration follows this message.)

2. I would be very interested to hear more about better concurrency support, particularly along the lines of the HUT HiBase/Shades ideas. Doug Currie has written up some ideas about this in cvstrac ("BlueSky"), but I'd also urge people to read the HUT papers. At least K. Oksanen's publications are still online at http://hibase.cs.hut.fi/publications.shtml

3. I'm unsure what is driving a desire for variable typing among the values of a single column. Is this some deep-seated distaste for the relational model :).

-mda
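On the embedded-nulls point, a tiny self-contained demonstration; nothing here is sqlite-specific:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "AB" encoded as UTF-16LE: every other byte is 0x00 */
        const char ab_utf16le[] = { 'A', 0, 'B', 0 };

        /* Any char* (NUL-terminated) API silently truncates the
           string at the first embedded zero byte: */
        printf("%lu\n", (unsigned long)strlen(ab_utf16le));  /* prints 1 */
        return 0;
    }

The same text encoded as UTF-8 is just the bytes "AB" and passes through char* APIs untouched, which is much of the argument above.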
Re: [sqlite] sqlite not sharing page cache across processes?
On Mon, 22 Mar 2004 14:35:23 -0500, "Doug Currie" <[EMAIL PROTECTED]> said:
> It has no way of knowing (save diffing the db) that "there have been
> no write transactions."

well, that should in principle be easy to accomplish, shouldn't it? For example, by having any writer increment a counter held in the meta information in the first page of the file (PageOne or whatever).

> > While I'm making suggestions despite near-total ignorance
> > of sqlite :), have you measured what improvement would
> > be obtained by using readonly mmap() for readers, rather
> > than read()? It would save some malloc churning as well as
> > a memory copy. Performance could be additionally tweaked
> > with madvise(SEQUENTIAL) when suitable.
>
> These functions are not in ANSI C.

No, but they are POSIX. And there are lots of functions already in the sqlite code base that are neither ANSI nor POSIX (primarily for windows or mac). Several operating systems, such as Solaris, implement their "cp" using mmap rather than read, and have done so for ages. So this is really not some piece of exotica. There probably are some embedded systems that do not offer mmap, though.

There are circumstances in which mmap will not perform as well as read. For example, the sqlite case of using a page only once is pessimal for mmap, because of TLB bookkeeping overhead on setup and unmapping. Use of mmap would probably also make sqlite even more vulnerable to NFS than it is already. So I would not suggest mmap as the only strategy; as with web servers, I would offer it as a configurable option. And even I would hesitate to suggest mmap for writers, without a lot of experimentation.

On Mon, 22 Mar 2004 14:56:35 -0500, "D. Richard Hipp" <[EMAIL PROTECTED]> said:
> Mark D. Anderson wrote:
> > Have you measured what improvement would be obtained by using readonly mmap()
> > for readers, rather than read()?
>
> SQLite supports files that are larger than 4GB. I don't think
> you can mmap() a file that big on a machine with a 32-bit
> address space.

You'd have to use mmap64 for that, if it was desirable to use mmap at all.

-mda
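For the curious, a minimal sketch of the readonly-mmap reader strategy under discussion (POSIX calls only; error handling abbreviated, function name invented):

    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    void *map_db_readonly(const char *path, size_t *len_out)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return NULL;

        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return NULL; }

        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);                 /* the mapping outlives the descriptor */
        if (p == MAP_FAILED) return NULL;

        /* tweak readahead for scan-like access, as suggested above */
        madvise(p, st.st_size, MADV_SEQUENTIAL);

        *len_out = st.st_size;
        /* pages are shared with the OS cache: no read() copy and no
           malloc churn, at the cost of TLB setup per mapping */
        return p;
    }

Whether this beats read() depends on reuse: a page touched exactly once pays the mapping overhead without amortizing it, which is the pessimal case mentioned above.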