Re: [sqlite] Read-only access which does not block writers
On Nov 29, 2016, at 9:09 AM, Simon Slavin <slav...@bigfraud.org> wrote:

>> On 29 Nov 2016, at 4:18pm, Mark Hamburg <mhamburg...@gmail.com> wrote:
>>
>> Does this make sense? Does it seem useful? (It seems useful to me when I see multi-megabyte WAL files.)
>
> Sorry, but I cannot spare the time right now to analyze the system you laid out. It usually takes half an hour to diagram out the read and write procedures and point out where multiple simultaneous ones don't fit together.
>
> I can tell you that entire books are written about the difficulties of simultaneous access to a database, and that I've read too many of them. And that it has been proven many times that there's no solution. You cannot design a system which (A) provides up-to-date data to readers, (B) allows writers to get rid of their data immediately without ever locking up, and (C) guarantees that earlier changes to the data are 'saved' before later changes, thus preserving uncorrupted data in the case of power cuts, etc.
>
> It is possible to implement a version if you drop one of the requirements. For example, you can have many simultaneous writers as long as you don't need any readers. Or you can have many readers as long as you have only one writer and you don't need readers to be completely up-to-date.
>
> You might like to read more about Brewer's Theorem:
>
> <https://en.wikipedia.org/wiki/CAP_theorem>
>
> In the meantime, I'm glad that WAL mode seems to be useful for you. If you can cope with big journal files until all connections are closed, it's a good solution.

What I probably haven't accounted for is what it would take to do an atomic/safe swap of the WAL files in my double-WAL scenario. I need to give that some more thought.

In the single-WAL scenario, what I probably really want is a way for the checkpoint operation on the write connection to do its work and then wait for exclusive access — standard reader/writer lock pattern — to do the WAL reset.
This would presumably limit the time that the readers were blocked, since I would expect the WAL reset to be reasonably quick if the checkpoint copying work were already complete. Furthermore, the write operation would only be blocked for the length of the longest outstanding read, so as long as I favor big transactions for writes and short queries for reads, the writer shouldn't be blocked for too long either.

Are there checkpoint settings that achieve this, or do I need to build that logic into my code?

Mark

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
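[Editor's note: SQLite does expose checkpoint modes along these lines. A minimal sketch using Python's sqlite3 module: `PRAGMA wal_checkpoint(RESTART)` copies the WAL into the database and then waits for readers so the next writer restarts the log from the beginning; `TRUNCATE` additionally truncates the WAL file to zero bytes. The file path here is a throwaway temp file.]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t(x)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

# TRUNCATE: checkpoint all frames, wait for readers to move past the
# current WAL, then truncate the WAL file to zero bytes.
busy, log_frames, ckpt_frames = conn.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)"
).fetchone()
print(busy)  # 0 when the checkpoint ran to completion
```

With no concurrent readers, the checkpoint completes and the `-wal` file ends up zero bytes long, which is exactly the reset behavior asked about above.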
Re: [sqlite] Read-only access which does not block writers
One other question about WAL mode and simultaneous readers and writers: how are people setting their page caches? My read is that shared cache is probably not what's wanted.

I was setting my reader caches to be bigger than my writer cache on the assumption that writers write and then move on, whereas readers would benefit from having more data cached. I'm now thinking, however, that the disk cache should get me the latter effect, and that increasing the size of the write cache should allow the writer to run longer without having to flush from memory to disk.

Is there any standard advice in this regard, or is this the sort of question where the answer is "experiment"?

Mark
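[Editor's note: for reference, `PRAGMA cache_size` is per-connection, so the writer and readers can be tuned independently, as the post contemplates. A negative value is a size in KiB rather than a page count. A small sketch, using a throwaway temp file:]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")

writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")
# Negative cache_size is in KiB: an 8 MiB page cache lets the writer
# run longer before spilling dirty pages to disk.
writer.execute("PRAGMA cache_size=-8192")

# Each connection gets its own cache; the reader can be sized separately.
reader = sqlite3.connect(path)
reader.execute("PRAGMA cache_size=-2048")
```

Since the setting is per-connection (and resets on close), it is easy to experiment with different writer/reader splits at runtime.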
Re: [sqlite] Read-only access which does not block writers
Once I figured out how to reliably get the reader and writer connections open on a database — key point: when creating the database, let the writer get through all of the creation work before opening any readers — I've been getting great concurrency behavior for simultaneous reads and writes in WAL mode.

What's less great is that if you have enough read activity, the checkpoint logic may never be able to actually reset the WAL, and the WAL can get very large. Basically, if there is a read in progress that uses the WAL, the WAL can't get reset.

This would obviously be a change to the WAL implementation, but I've been thinking that it ought to be possible to use essentially two WALs as successive appendages to the main database and to swap their roles as the earlier WAL gets written into the main database. In other words, something like the following:

Logical view: DB ++ WAL_A ++ WAL_B, where ++ is essentially the WAL overlay logic.

Writes always go into the second WAL. Checkpoints copy from the first WAL into the database. When a checkpoint is done, if there are no reads with holds on WAL_A, we can swap the roles of the WALs, reset WAL_A (which is now the second WAL and hence the target for writes), and start copying WAL_B (now the first WAL) into the database.

The trick comes in what we do when we start a read. If the first WAL has not been fully checkpointed, then we need to grab both WALs. (We could ignore the second WAL if it is empty, but that's immaterial to this discussion.) If, however, the first WAL has been fully checkpointed, then a read need not grab it, which leaves us free to do the WAL swap.

Such a scheme won't help with long reads blocking WAL reset, but it seems like it would eliminate issues with a steady stream of small read operations blocking WAL reset.

Does this make sense? Does it seem useful? (It seems useful to me when I see multi-megabyte WAL files.)
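[Editor's note: the proposed scheme can be sketched as a toy in-memory model. This is not real SQLite internals — the database is a dict, the WALs are lists, and the reader-pin count stands in for the real read marks — but it shows the overlay order and the swap condition described above.]

```python
# Toy model of the double-WAL proposal (not real SQLite internals):
# writes append to the second WAL, checkpoints drain the first WAL,
# and the two swap roles when no reader still pins WAL_A.
class DoubleWal:
    def __init__(self):
        self.db = {}           # main database file: key -> value
        self.wal_a = []        # first WAL (checkpoint source)
        self.wal_b = []        # second WAL (write target)
        self.readers_on_a = 0  # readers whose snapshot includes wal_a

    def write(self, key, value):
        # Writes always go into the second WAL.
        self.wal_b.append((key, value))

    def checkpoint(self):
        # Copy the first WAL into the main database.
        for key, value in self.wal_a:
            self.db[key] = value
        self.wal_a.clear()
        # Swap roles only if no read holds a snapshot on WAL_A.
        if self.readers_on_a == 0:
            self.wal_a, self.wal_b = self.wal_b, self.wal_a

    def read(self, key):
        # A reader's view: DB ++ WAL_A ++ WAL_B, newest change wins.
        for wal in (self.wal_b, self.wal_a):
            for k, v in reversed(wal):
                if k == key:
                    return v
        return self.db.get(key)
```

In this model a steady stream of short reads only delays the swap until `readers_on_a` drops to zero, at which point the drained WAL becomes the new write target — the behavior the proposal is after.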
Mark
[sqlite] Putting an index on a boolean
Though to the extent that speed is proportional to data size, it would be good to use something other than hexadecimal to store UUIDs. Binary blobs would be the most compact, but ASCII85 encoding would work well if you need strings.

Also, if these values are reused repeatedly, as I suspect projectID and groupID might be, then it may be useful to intern them into a table and use integer keys. We got a noticeable performance improvement when I made that sort of change recently in our project. (I also implemented a string-to-integer-to-string cache that sits ahead of hitting the database.)

Mark

> On Dec 12, 2015, at 1:07 PM, Darren Duncan wrote:
>
> On 2015-12-12 12:56 PM, Cecil Westerhof wrote:
>>>> By the way: I am thinking about using UUID for projectID and groupID, but I heard somewhere that it was a bad idea to use UUID for an indexed field. Is this true?
>>>
>>> I think you might have misunderstood. UUID is almost always a good field to index.
>>
>> I was told that because of the nature of random UUIDs (which I will be using) it is hard to create a good index. The article said that data that is really random cannot be indexed very efficiently. But I do not have to worry about it then. :-) It was a few years back, so it is also possible that the problem has been solved by now.
>
> Cecil, it isn't about randomness, it is about uniqueness or cardinality. The fields that index the best are ones with many different values, in particular key fields where every record has a different value from every other record. UUIDs have this quality in spades. It is even more important to index such fields if you will either be searching/filtering on them or if they are the parent in a foreign key constraint. This has always been the case; it's not a new thing. -- Darren Duncan
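[Editor's note: both suggestions above — store UUIDs as 16-byte blobs rather than hex strings, and intern repeated values behind integer keys — can be sketched briefly. The table and function names here are illustrative, not from the original post.]

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Interning table: each distinct UUID gets a small integer key.
conn.execute("CREATE TABLE ids(id INTEGER PRIMARY KEY, uuid BLOB UNIQUE)")

def intern_uuid(conn, u):
    """Map a UUID to its integer key, inserting it on first sight."""
    blob = u.bytes  # 16-byte blob instead of a 36-character hex string
    row = conn.execute("SELECT id FROM ids WHERE uuid = ?", (blob,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO ids(uuid) VALUES (?)", (blob,))
    return cur.lastrowid

u = uuid.uuid4()
a = intern_uuid(conn, u)
b = intern_uuid(conn, u)  # second lookup returns the same key
```

Other tables then reference the small integer `id` instead of repeating the UUID, which keeps their rows and indexes compact; a dict in front of `intern_uuid` would give the in-memory cache the post mentions.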
[sqlite] Dual WAL mode?
I know I'm being pretty hand-wavy here, since I don't know all of the details of the WAL implementation and I'm not fluent in the terminology, but it seems like checkpoint starvation shouldn't have to allow the WAL to grow without bound, provided the individual read and write processes don't have long-running transactions.

As I understand it, writers always append to the WAL and use a view of the database that includes the main file plus any changes in the WAL. Readers use the main file plus any changes up through a particular point in the WAL: the point of the last completed transaction prior to the start of the read operation. A checkpoint can move data from the WAL into the main file provided it doesn't move anything beyond the limit point for any client. (Any reader client, since writers use changes through the end of the WAL.) Checkpoint starvation (and continued WAL expansion) occurs when the checkpointer fails to make it to the end of the WAL because it is blocked by some reader's change limit.

My thinking is that this could be addressed by splitting the WAL into two files full of changes that logically follow each other. Assuming readers move forward frequently enough, they should eventually move their limit point from the first file into the second. At that point, when the checkpointer finishes transferring changes from the first WAL file into the main database file, it can swap the first WAL file to follow the second WAL file as empty space for future changes.

Would this work, and if so, is there a reason this hasn't been done?

Mark
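[Editor's note: the starvation mechanism described here — a reader's limit point blocking the checkpointer — can be observed directly. `PRAGMA wal_checkpoint(PASSIVE)` reports `(busy, frames in the WAL, frames checkpointed)`; with a read transaction pinning an older snapshot, the checkpointed count stays behind the log. A sketch using a throwaway temp file:]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
w = sqlite3.connect(path)
w.execute("PRAGMA journal_mode=WAL")
w.execute("CREATE TABLE t(x)")
w.commit()

# A reader opens a transaction, fixing its limit point in the WAL.
r = sqlite3.connect(path)
r.execute("BEGIN")
r.execute("SELECT count(*) FROM t").fetchone()

# The writer appends frames past the reader's limit point.
w.execute("INSERT INTO t VALUES (1)")
w.commit()

# PASSIVE checkpoints what it can but cannot pass the reader's mark,
# so checkpointed frames lag the total frames in the log.
busy, log, ckpt = w.execute("PRAGMA wal_checkpoint(PASSIVE)").fetchone()

r.commit()  # reader moves on; now the WAL can be fully reset
busy2, log2, ckpt2 = w.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
```

Once the reader releases its snapshot, a blocking checkpoint completes and the WAL resets — which is why the post argues that readers who "move forward frequently enough" would let a split WAL recycle its halves.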
[sqlite] How would sqlite read this from disk?
On Oct 30, 2015, at 5:56 AM, Richard Hipp wrote:

>> Will SQLite rewrite the whole row if you just change field2 from one float to another?
>
> Yes. Not just the whole row but the whole page on which that row resides. And even if SQLite did just try to write the 8 bytes that changed, your OS and disk controller will both end up writing the entire sector and/or track, so it amounts to about the same I/O either way.

I knew it would dirty the whole page. I was thinking about the memory work and ignoring that the disk I/O swamps that. Will it at least avoid dirtying continuation pages if the space before the blob doesn't change?

Mark
[sqlite] How would sqlite read this from disk?
> On Oct 29, 2015, at 12:24 PM, Richard Hipp wrote:
>
> If you do have large BLOBs or strings, SQLite handles this best if the large blob/string is stored in a table by itself and then linked into other tables using an integer primary key. For example:
>
>     CREATE TABLE BigBlobTab (
>       blobid INTEGER PRIMARY KEY,
>       content BLOB  -- big BLOB or text field.
>     );
>     CREATE TABLE OtherStuff (
>       field1 VARCHAR(10),
>       field2 FLOAT,
>       field3 BOOLEAN,
>       blobid INT REFERENCES BigBlobTab
>     );

Will SQLite rewrite the whole row if you just change field2 from one float to another?

Mark
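[Editor's note: the payoff of the quoted schema is that updating a small field rewrites only the small row, not the large blob it references. A quick sketch with a megabyte blob; the inserted values are illustrative.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BigBlobTab (
  blobid INTEGER PRIMARY KEY,
  content BLOB  -- big BLOB or text field.
);
CREATE TABLE OtherStuff (
  field1 VARCHAR(10),
  field2 FLOAT,
  field3 BOOLEAN,
  blobid INT REFERENCES BigBlobTab
);
""")
conn.execute("INSERT INTO BigBlobTab(content) VALUES (?)", (b"\x00" * 1_000_000,))
conn.execute("INSERT INTO OtherStuff VALUES ('a', 1.0, 1, 1)")

# Updating field2 rewrites only the small OtherStuff row (and its page),
# leaving the megabyte blob's overflow pages untouched.
conn.execute("UPDATE OtherStuff SET field2 = 2.5 WHERE blobid = 1")
```

This keeps frequently updated scalar fields out of the page chain that holds the blob, which is the point of Hipp's recommendation.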
[sqlite] WAL, threads, shared cache, etc
I have a database for which I essentially want to support three streams of operations: writing, reading fast queries (e.g., looking up a single record), and reading complex queries (e.g., finding all of the records matching a particular criterion). I would like these to run with as little interference between them as possible.

I'm on iOS, so I can't use processes (just to head off the whole "don't use threads, use processes!" spiel). That last point, however, leads to the issue that the SQLite documentation says very little about threading other than "SQLite is threadsafe; don't use threads."

So, I wanted to see whether I have the right checklist in mind for implementing this:

1. Use WAL mode so that the reads and the writes can proceed essentially in parallel. (Reads can interfere with checkpoints, but assuming the system quiesces often and checkpoints then, that shouldn't be an extended problem.)
2. Use one thread (or, on iOS, probably one GCD dispatch queue) per stream of work.
3. Open a connection per thread?
4. Shared cache? Yes or no?
5. Anything else?

Thanks.

Mark

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
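[Editor's note: the checklist above — WAL mode, one thread per stream of work, one connection per thread — can be sketched in Python, standing in for the GCD dispatch queues the post describes. The single writer drains a work queue; each reader opens its own connection. Names and schema are illustrative.]

```python
import os
import queue
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "app.db")
write_q = queue.Queue()

def writer():
    # The writer thread owns the only writing connection (checklist items
    # 1-3): it enables WAL and does all creation work before readers start.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS records(id INTEGER PRIMARY KEY, body TEXT)")
    conn.commit()
    while True:
        item = write_q.get()
        if item is None:  # sentinel: shut down
            break
        conn.execute("INSERT INTO records(body) VALUES (?)", (item,))
        conn.commit()
    conn.close()

def reader(results):
    # Each reader stream opens its own connection on its own thread.
    conn = sqlite3.connect(path)
    results.append(conn.execute("SELECT count(*) FROM records").fetchone()[0])
    conn.close()

t = threading.Thread(target=writer)
t.start()
write_q.put("hello")
write_q.put(None)
t.join()

results = []
r = threading.Thread(target=reader, args=(results,))
r.start()
r.join()
```

Funneling all writes through one queue-draining thread sidesteps writer-vs-writer contention entirely, while WAL lets the reader connections proceed without blocking the writer.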
Re: [sqlite] Booleans in SQLite
The real argument for adding boolean support is not about space but about compatibility with dynamic languages that have a boolean type and exploit SQLite's dynamic typing of values. Without a boolean type in SQLite, a glue layer has to guess whether a 0 means zero or false, whether a "NO" means the string "NO" or false, and so on.

Mark
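[Editor's note: Python's sqlite3 module illustrates the glue-layer problem directly — a `bool` is bound as an INTEGER, and on the way back out there is no way to tell it apart from a genuine number.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags(name TEXT, enabled)")  # no declared type
# Python's True is stored as the INTEGER 1.
conn.execute("INSERT INTO flags VALUES (?, ?)", ("dark_mode", True))
value = conn.execute("SELECT enabled FROM flags").fetchone()[0]
# The glue layer gets back an int, not a bool: the boolean-ness is lost.
print(type(value))
```

The round trip turns `True` into `1`; reconstructing the original type requires out-of-band knowledge (declared column types, converters, or convention), which is exactly the guessing the post describes.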
Re: [sqlite] Db design question (so. like a tree)
One of the questions that I believe was raised but not answered on this thread was how to make sure that you don't have circular relationships, particularly given that SQLite isn't good at scanning the tree. If you can control the ids, then simply require that the id of a child be greater than the ids of its parents.

Mark
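[Editor's note: the id-ordering rule can be enforced declaratively with a CHECK constraint, so the database itself rejects any edge that could close a cycle. The schema below is an illustrative sketch, not from the original thread.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node(id INTEGER PRIMARY KEY);
CREATE TABLE edge(
  parent INTEGER REFERENCES node(id),
  child  INTEGER REFERENCES node(id),
  CHECK (child > parent)  -- ids strictly increase down the tree, so no cycles
);
""")
conn.execute("INSERT INTO node VALUES (1), (2)")
conn.execute("INSERT INTO edge VALUES (1, 2)")  # fine: child id > parent id
try:
    conn.execute("INSERT INTO edge VALUES (2, 1)")  # would permit a cycle
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Since every edge strictly increases the id, any path through the tree is strictly increasing and can never return to its start — the constraint makes cycles unrepresentable rather than something to scan for.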
Re: [sqlite] Join performance in SQLite
Assuming memory is sufficiently inexpensive, I would think that it would almost always be useful to build an index for any field in a join rather than doing a full scan. (Or, better yet, build a hash table if memory is sufficient.) Indices maintained in the database then become optimizations that avoid starting the query with an index build.

Mark
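[Editor's note: SQLite's planner can in fact do this — for a join on an unindexed column it may build a transient "automatic index" for the duration of the query, which `EXPLAIN QUERY PLAN` reports as an AUTOMATIC COVERING INDEX. Whether it chooses one depends on version and estimated table sizes, so treat the output below as something to inspect rather than a guarantee.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a(x INTEGER);
CREATE TABLE b(x INTEGER);  -- note: no index on b.x
""")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM a JOIN b ON a.x = b.x"
).fetchall()
for row in plan:
    # The last column is the plan detail; look for an AUTOMATIC index on b.
    print(row[-1])
```

A persistent `CREATE INDEX` on `b(x)` is then, as the post says, an optimization that spares each query the cost of rebuilding that transient index.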