Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Mark Hamburg
On Nov 29, 2016, at 9:09 AM, Simon Slavin <slav...@bigfraud.org> wrote:
> 
> 
>> On 29 Nov 2016, at 4:18pm, Mark Hamburg <mhamburg...@gmail.com> wrote:
>> 
>> Does this make sense? Does it seem useful? (It seems useful to me when I see 
>> multi-megabyte WAL files.)
> 
> Sorry, but I cannot spare the time right now to analyze the system you laid 
> out.  It usually takes half an hour to diagram out the read and write 
> procedures and point out where multiple simultaneous ones don’t fit together.
> 
> I can tell you that entire books are written about the difficulties of 
> simultaneous access to a database, and that I’ve read too many of them.  And 
> that it has been proven many times that there’s no solution.  You cannot 
> design a system which (A) provides up-to-date data to readers (B) allows 
> writers to get rid of their data immediately without ever locking up and (C) 
> guarantees that earlier changes to the data are 'saved' before later changes, 
> thus preserving uncorrupted data in the case of power-cuts, etc.
> 
> It is possible to implement a version if you drop one of the requirements.  
> For example, you can have many simultaneous writers as long as you don’t need 
> any readers.  Or you can have many readers as long as you have only one 
> writer and you don’t need readers to be completely up-to-date.
> 
> You might like to read more about Brewer’s Theorem:
> 
> <https://en.wikipedia.org/wiki/CAP_theorem>
> 
> In the meantime, I'm glad that WAL mode seems to be useful for you. If you 
> can cope with big journal files until all connections are closed, it's a good 
> solution.

What I probably haven't accounted for is what it would take to do an 
atomic/safe swap of the WAL files in my double WAL scenario. I need to give 
that some more thought.

In the single WAL scenario, what I probably really want is a way for the 
checkpoint operation on the write connection to do its work then wait for 
exclusive access — standard reader/writer lock pattern — to do the WAL reset. 
This would presumably limit the time that the readers were blocked since I 
would expect the WAL reset to be reasonably quick if the checkpoint copying 
work were already complete. Furthermore, the write operation would only be 
blocked for the length of the longest outstanding read, so as long as I favor 
big transactions for writes and short queries for reads, the writer shouldn't be 
blocked for too long either. Are there checkpoint settings that achieve this, or 
do I need to build that logic into my code?
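
For concreteness, the kind of call I'm imagining on the write connection looks 
roughly like this (a sketch only; I'm assuming the RESTART/TRUNCATE checkpoint 
modes block for readers the way the documentation describes, and the timeout 
value is just a placeholder):

    #include <sqlite3.h>

    /* Sketch: blocking checkpoint on the write connection. TRUNCATE (or
     * RESTART) first copies the WAL into the database, then waits for readers
     * before resetting the WAL. The busy timeout bounds how long the writer
     * waits on a long-running read. */
    static int checkpoint_and_reset(sqlite3 *writer)
    {
        int wal_frames = 0, checkpointed = 0;
        sqlite3_busy_timeout(writer, 2000);
        return sqlite3_wal_checkpoint_v2(writer, "main",
                                         SQLITE_CHECKPOINT_TRUNCATE,
                                         &wal_frames, &checkpointed);
    }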

Mark


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Mark Hamburg
One other question about WAL mode and simultaneous readers and writers: How are 
people setting their page caches? My read is that shared cache is probably not 
what's wanted. I was setting my reader caches to be bigger than my writer cache 
under the assumption that writers write and then move on, whereas readers would 
benefit from having more data cached. But I'm now thinking that the disk cache 
should be getting me the latter effect, and that increasing the size of the write 
cache should allow the writer to run longer without having to flush from memory 
to disk. Is there any standard advice in this regard, or is this the sort of 
question where the answer is "experiment"?
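
To make the question concrete, the per-connection knob I'm referring to is just 
cache_size; something like this (the sizes are placeholders, not a 
recommendation):

    #include <sqlite3.h>

    /* Sketch: different page-cache sizes on the writer and reader connections.
     * Negative values are KiB rather than pages; the numbers are made up. */
    static void tune_caches(sqlite3 *writer, sqlite3 *reader)
    {
        sqlite3_exec(writer, "PRAGMA cache_size = -16384;", 0, 0, 0); /* ~16 MB */
        sqlite3_exec(reader, "PRAGMA cache_size = -4096;",  0, 0, 0); /* ~4 MB  */
    }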

Mark



Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Mark Hamburg
Once I figured out how to reliably get the reader and writer connections open 
on a database — key point, when creating the database let the writer get 
through all of the creation work before opening any readers — I've been getting 
great concurrency behavior for simultaneous reads and writes in WAL mode.
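
For anyone else setting this up, the ordering that has worked for me looks 
roughly like the following (a sketch; the file name and schema are placeholders):

    #include <sqlite3.h>

    /* Sketch: let the writer create the database, switch it to WAL, and finish
     * the schema before any read-only connection is opened. */
    static int open_connections(sqlite3 **writer, sqlite3 **reader)
    {
        int rc = sqlite3_open_v2("app.db", writer,
                                 SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, NULL);
        if (rc != SQLITE_OK) return rc;

        sqlite3_exec(*writer, "PRAGMA journal_mode=WAL;", 0, 0, 0);
        sqlite3_exec(*writer,
                     "CREATE TABLE IF NOT EXISTS item("
                     "  id INTEGER PRIMARY KEY, body TEXT);",
                     0, 0, 0);

        /* Only now open the readers. */
        return sqlite3_open_v2("app.db", reader, SQLITE_OPEN_READONLY, NULL);
    }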

What's less great is that if you have enough read activity, the checkpoint 
logic may never be able to actually reset the WAL and the WAL can get very 
large. Basically, if there is a read in progress that uses the WAL, the WAL 
can't be reset.

This would obviously be a change to the WAL implementation, but I've been 
thinking that it ought to be possible to use two WALs as successive appendages 
to the main database and to swap their roles as the earlier WAL gets written 
into the main database. In other words, something like the following:

Logical view: DB ++ WAL_A ++ WAL_B

where ++ is essentially the WAL overlay logic. Writes always go into the second 
WAL. Checkpoints copy from the first WAL into the database. When a checkpoint 
is done, if there are no reads with holds on WAL_A, we can swap the roles of 
the WALs, reset WAL_A (which is now the second WAL and hence the target for 
writes), and start copying WAL_B (which is now the first WAL) into the database.

The trick comes in what we do when we start a read. If the first WAL has not 
been fully checkpointed, then we need to grab both WALs. (We could ignore the 
second WAL if it is empty, but that's immaterial to this discussion.) If, 
however, the first WAL has been fully checkpointed, then a read need not grab 
it, which leaves us free to do the WAL swap.
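
Sketching the bookkeeping I have in mind (hypothetical, of course; none of this 
is how SQLite is actually structured, and the types and helper names below are 
invented):

    /* Hypothetical double-WAL state; not actual SQLite code. */
    typedef struct Wal Wal;          /* opaque stand-in for a WAL file        */
    void wal_reset(Wal *wal);        /* hypothetical: truncate a WAL for reuse */

    typedef struct DualWal {
        Wal *first;               /* being checkpointed into the main db      */
        Wal *second;              /* receiving all new writes                 */
        int  firstCheckpointed;   /* first WAL fully copied into the db?      */
        int  readersHoldingFirst; /* readers whose snapshot needs the first   */
    } DualWal;

    /* Called after a checkpoint pass over the first WAL completes. */
    static void maybe_swap(DualWal *w)
    {
        if (w->firstCheckpointed && w->readersHoldingFirst == 0) {
            Wal *tmp  = w->first;      /* old first becomes the empty tail... */
            w->first  = w->second;     /* ...and the old second is next to be */
            w->second = tmp;           /* checkpointed into the database.     */
            wal_reset(w->second);
            w->firstCheckpointed = 0;
        }
    }

    /* When a read starts: if the first WAL is fully checkpointed, the reader
     * only needs to grab the second WAL, which is what leaves us free to swap. */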

Such a scheme won't help with long reads blocking WAL reset, but it seems like 
it would eliminate issues with having a steady stream of small read operations 
blocking WAL reset.

Does this make sense? Does it seem useful? (It seems useful to me when I see 
multi-megabyte WAL files.)

Mark



[sqlite] Putting an index on a boolean

2015-12-12 Thread Mark Hamburg
Though to the extent that speed is proportional to data size, it would be good 
to use something other than hexadecimal to store UUIDs. Binary blobs would be 
the most compact, but ASCII85 encoding would work well if you need strings.

Also, if these values are reused repeatedly as I suspect projectID and groupID 
might be, then it may be useful to intern them into a table and use integer 
keys. We got a noticeable performance improvement when I made that sort of 
change recently in our project. (I also implemented a 
string-to-integer-to-string cache that sits ahead of hitting the database.)
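
For illustration, the interning table itself is nothing exotic (a sketch; the 
table and column names are invented):

    #include <sqlite3.h>

    /* Sketch: intern UUID blobs to small integer keys. */
    static void create_intern_table(sqlite3 *db)
    {
        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS uuid_intern("
            "  id   INTEGER PRIMARY KEY,"
            "  uuid BLOB NOT NULL UNIQUE);",
            0, 0, 0);
        /* To intern a value:
         *   INSERT OR IGNORE INTO uuid_intern(uuid) VALUES (?1);
         *   SELECT id FROM uuid_intern WHERE uuid = ?1;
         * Other tables then store and index uuid_intern.id instead of the UUID. */
    }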

Mark

> On Dec 12, 2015, at 1:07 PM, Darren Duncan  wrote:
> 
> On 2015-12-12 12:56 PM, Cecil Westerhof wrote:
>>>> By the way: I am thinking about using UUID for projectID and groupID, but I
>>>> heard somewhere that it was a bad idea to use UUID for an indexed field. Is
>>>> this true??
>>> 
>>> I think you might have misunderstood.  UUID is almost always a good
>>> field to index.
>> 
>> I was told because of the nature of random UUID (what I will be using) it
>> is hard to create a good index. The article said that data that is really
>> random cannot be indexed very efficiently. But I do not have to worry about
>> it then. :-) It has been a few years back, so it is also possible that the
>> problem has been solved nowadays.
> 
> Cecil, it isn't about randomness, it is about uniqueness or cardinality.  The 
> fields that index the best are ones with many different values, in particular 
> key fields where every record has a different value from every other record. 
> UUIDs have this quality in spades.  It is even more important to index such 
> fields if you will either be searching/filtering on them or if they are the 
> parent in a foreign key constraint.  This has always been the case; it's not a 
> new thing. -- Darren Duncan
> 


[sqlite] Dual WAL mode?

2015-12-07 Thread Mark Hamburg
I know I'm being pretty hand-wavy here since I don't know all of the
details of the WAL implementation and I'm not fluent in the terminology,
but it seems like checkpoint starvation shouldn't have to allow the WAL to
grow without bound provided the individual read and write processes don't
have long running transactions.

As I understand it, writers always append to the WAL and use a view of the
database that includes the main file plus any changes in the WAL. Readers
use the main file plus any changes up through a particular point in the WAL:
the point of the last completed transaction prior to the start of the read
operation.

A checkpoint can move data from the WAL into the main file provided it
doesn't move anything beyond the limit point for any client. (Any reader
client, since writers use changes through the end of the WAL.) Checkpoint
starvation (and continued WAL expansion) occurs when the checkpointer fails to
make it to the end of the WAL because it is blocked by some reader's change
limit.
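
Restating that constraint as rough pseudocode (this is just my mental model,
not actual SQLite internals):

    /* My mental model, not real SQLite code: a checkpoint can only copy WAL
     * frames up to the oldest limit point held by an active reader. */
    static unsigned max_copy_frame(unsigned wal_end_frame,
                                   const unsigned *reader_limit, int n_readers)
    {
        unsigned limit = wal_end_frame;
        for (int i = 0; i < n_readers; i++) {
            if (reader_limit[i] < limit) limit = reader_limit[i];
        }
        /* The WAL can only be reset once a checkpoint reaches wal_end_frame,
         * i.e. once no reader's limit lags behind the end of the WAL. */
        return limit;
    }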

My thinking is that this could be addressed by splitting the WAL into two
files full of changes that logically follow each other. Assuming readers
move forward frequently enough, they should eventually move their limit
point from the first file into the second. At this point, when the
checkpointer finishes transferring changes for the first WAL file into the
main database file, it can swap the first WAL file to follow the second WAL
file as empty space for future changes.

Would this work, and if so, is there a reason this hasn't been done?

Mark


[sqlite] How would sqlite read this from disk?

2015-10-30 Thread Mark Hamburg
On Oct 30, 2015, at 5:56 AM, Richard Hipp  wrote:

>> Will SQLite rewrite the whole row if you just change field2 from one float
>> to another?
> 
> Yes.  Not just the whole row but the whole page on which that row
> resides.  And even if SQLite did just try to write the 8 bytes that
> changes, your OS and disk controller will both end up writing the
> entire sector and/or track, so it amounts to about the same I/O either
> way.

I knew it would dirty the whole page. I was thinking about the memory work and 
ignoring that the disk I/O swamps that. Will it at least avoid dirtying 
continuation pages if the space before the blob doesn't change?

Mark


[sqlite] How would sqlite read this from disk?

2015-10-30 Thread Mark Hamburg

> On Oct 29, 2015, at 12:24 PM, Richard Hipp  wrote:
> 
> If you do have large BLOBs or strings, SQLite handles this best if the
> large blob/string is stored in a table by itself and then linked into
> other tables using an integer primary key.  For example:
> 
>   CREATE TABLE BigBlobTab (
>   blobid INTEGER PRIMARY KEY,
>   content BLOB -- big BLOB or text field.
>   );
>   CREATE TABLE OtherStuff (
>   field1 VARCHAR(10),
>   field2 FLOAT,
>   field3 BOOLEAN,
>   blobid INT REFERENCES BigBlobTab
>   );

Will SQLite rewrite the whole row if you just change field2 from one float to 
another?

Mark




[sqlite] WAL, threads, shared cache, etc

2011-04-13 Thread Mark Hamburg
I have a database for which I essentially want to support three streams of 
operations: writing, reading fast queries (e.g., lookup a single record), and 
reading complex queries (e.g., find all of the records matching a particular 
criterion). I would like to have these run with as little interference as 
possible between them. I'm on iOS, so I can't use processes (just to avoid the 
whole "don't use threads, use processes!" spiel). That last point, however, 
leads to the issue that the SQLite documentation says very little about 
threading other than "SQLite is threadsafe, don't use threads." So, I wanted to 
see whether I have the right checklist in mind for implementing this:

1. Use WAL mode so that the reads and the writes can proceed essentially in 
parallel. (Reads can interfere with checkpoints, but assuming the system 
quiesces often and checkpoints at those times, that shouldn't be an extended 
problem.)

2. Use one thread (or on iOS probably one GCD dispatch queue) per stream of 
work.

3. Open a connection per thread?

4. Shared cache? Yes or no?

5. Anything else?
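
To make the checklist concrete, here is roughly the setup I have in mind (a 
sketch only; the file name, queue labels, and the choice of separate private 
caches are my assumptions):

    #include <dispatch/dispatch.h>
    #include <sqlite3.h>

    /* Sketch: one serial dispatch queue plus one connection per stream of
     * work; WAL set on the writer; readers opened read-only; no shared cache. */
    typedef struct {
        dispatch_queue_t queue;
        sqlite3         *db;
    } Stream;

    static void open_stream(Stream *s, const char *label, int flags)
    {
        s->queue = dispatch_queue_create(label, NULL);  /* NULL => serial queue */
        sqlite3_open_v2("app.db", &s->db, flags, NULL);
    }

    static void setup(Stream *writer, Stream *fastReads, Stream *bigReads)
    {
        open_stream(writer, "db.write", SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE);
        sqlite3_exec(writer->db, "PRAGMA journal_mode=WAL;", 0, 0, 0);
        open_stream(fastReads, "db.read.fast", SQLITE_OPEN_READONLY);
        open_stream(bigReads,  "db.read.big",  SQLITE_OPEN_READONLY);
    }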

Thanks.
Mark



Re: [sqlite] Booleans in SQLite

2009-09-07 Thread Mark Hamburg
The real argument for adding boolean support is not about space but  
about compatibility with dynamic languages with a boolean type that  
are exploiting SQLite's dynamic typing of values. Without a boolean  
type in SQLite, a glue layer has to guess whether a 0 means zero or  
false or a "NO" means the string "NO" or false or...

Mark



Re: [sqlite] Db design question (so. like a tree)

2009-06-04 Thread Mark Hamburg
One of the questions that I believe was raised but not answered on this 
thread was how to make sure that you don't have circular relationships, 
particularly given that SQLite isn't good at scanning the tree. If you can 
control the ids, then simply require that the id of a child be greater than 
the ids of its parents.
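
For example (a sketch; the table and column names are invented):

    #include <sqlite3.h>

    /* Sketch: if a child's id is always greater than its parents' ids, a
     * simple CHECK on the edge table rules out cycles. */
    static void create_edge_table(sqlite3 *db)
    {
        sqlite3_exec(db,
            "CREATE TABLE node_edge("
            "  parent INTEGER NOT NULL,"
            "  child  INTEGER NOT NULL,"
            "  CHECK (child > parent));",
            0, 0, 0);
    }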

Mark



Re: [sqlite] Join performance in SQLite

2009-05-30 Thread Mark Hamburg
Assuming memory is sufficiently inexpensive, I would think that it  
would almost always be useful to build an index for any field in a  
join rather than doing a full scan. (Or better yet, build a hash table  
if memory is sufficient.) Indices maintained in the database then  
become optimizations to avoid starting the query with an index build.

Mark
