Re: [sqlite] Safe sqlite over remote filesystem?
On 15 August 2018 at 14:12, Wout Mertens wrote:

> On Tue, Aug 14, 2018 at 6:28 PM Rowan Worth wrote:
> > FWIW in the building I work in we have 20-30 users hitting around a dozen
> > SQLite DBs 8 hours a day 5 days a week, with all DBs served over nfs.
>
> Multiple writers? I presume you use WAL mode?

Yes, all clients read and write the DBs. No, we don't use WAL mode, as the clients all run on different machines (WAL only works for multiple clients on the same machine).

> > We plan to migrate to an actual SQL server for that reason, but please
> > don't take it as a criticism of SQLite - I think it does a marvelous job
> > in a scenario it definitely wasn't designed for.
>
> Before you do that, did you see bedrockdb? http://bedrockdb.com/
> (Although it looks like the github repo is being a little bit ignored by
> Expensify)

I've given a reasonable amount of thought towards some kind of "server-fied" SQLite backend, but the main problem I'm facing is that our application's DB layer is very tightly coupled to SQLite and its library bindings. The changes/abstractions required for the application to talk to such a backend are the same amount of work as having it talk to an SQL server.

Replication might actually allow that work to be bypassed, but there are a lot of unknowns there as to failure modes, and as to how to manage the machines involved in replication as users drop in and out of different DBs in an ad-hoc fashion. Also, on the infrastructure side we have a push towards net-booted diskless nodes...

-Rowan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Safe sqlite over remote filesystem?
On 15 August 2018 at 13:57, Wout Mertens wrote:

> For the interested:
>
> In NixOS (http://nixos.org), a very interesting Linux distribution, the
> entire OS (libraries, binaries, shared files, up to and including
> configuration files) is composed out of "build products" that are
> addressed by "input hash".
>
> The input hash is calculated on everything that is used to generate the
> build product from scratch. For a text file, this is simply the content
> hash of the text. For a binary, it is the input hash of the compiler,
> libraries, build scripts, and all the build flags, plus the content hash
> of the source, all hashed together.
>
> A build product (whether file or directory) is stored in
> /nix/store/<input hash>-human-readable-name.

It's not clear whether you're involved in NixOS development or just a user, but you might be interested in ipfs: https://ipfs.io/

It's marketed as an http competitor, but if I understand correctly it's basically a distributed hash-addressed data store. They provide fairly regular file-system semantics on top of that, I believe, but NixOS would probably be happy with a simpler VFS which exposes the hashes themselves.

See also venti, because how can you talk about hash-addressed storage without a reference to plan 9 ;)

-Rowan
Re: [sqlite] Safe sqlite over remote filesystem?
Simon Slavin wrote:
> On 14 Aug 2018, at 3:09pm, Clemens Ladisch wrote:
> > However, there are other file operations that are properly synchronized,
> > e.g., it is not possible for two NFS clients to create a directory with
> > the same name.
>
> You are correct. But there's still a problem with simulating a mutex
> system.
>
> Suppose two computers try to create a remote folder with the same name at
> the same time. The request which is processed first gets "completed
> successfully". If everything is done correctly, the second request would
> get "folder with that name already exists". But since the job has been
> done (a folder with that name does exist, so the task must have been
> completed, so the programmer should be happy) the second request is quite
> likely to get "completed successfully" too.

In practice, NFS implementations do report this error correctly.

> It takes a very fussy filesystem programmer to make their code report "the
> thing you asked for is complete as you requested, but I didn't do it".

EEXIST is just one of many error codes that is simply passed through from the real file system. An NFS server would have to go out of its way to change this error into something else. And while trying to be 'clever' with locking could give performance gains, no such benefit exists for mangling the mkdir() result.

Regards,
Clemens
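[Editor's note: Clemens' point, that mkdir() is one of the few operations NFS synchronizes properly, is the basis of the classic directory-lock pattern. Here is a minimal sketch in Python; the lock-directory name and the timeout values are made up for illustration, and this is not SQLite's own locking code.]

```python
import os
import tempfile
import time

def acquire(lockdir, timeout=10.0):
    """Try to take the lock by creating a directory; spin until timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lockdir)  # atomic even over NFS: exactly one client wins
            return True
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False   # someone else still holds the lock
            time.sleep(0.1)

def release(lockdir):
    os.rmdir(lockdir)

# Demo against a local scratch path; on NFS the lock directory would sit
# next to the database, e.g. "mydb.sqlite.lock" (name is illustrative).
lock = os.path.join(tempfile.mkdtemp(), "mydb.sqlite.lock")
first = acquire(lock, timeout=1.0)   # wins immediately
second = acquire(lock, timeout=0.3)  # times out, lock is already held
release(lock)
third = acquire(lock, timeout=1.0)   # wins again after release
```

The weakness of this pattern is stale locks: if a client crashes while holding the lock, the directory stays behind and every other client times out until someone cleans it up by hand.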
[sqlite] Safe sqlite over remote filesystem?
I know it's a bit of an annoying and thankless task, but visiting the github issues and PRs every week or so (at least twice a month) and making sure they progress is a great way of gaining adoption…

So far I have unfortunately not needed Bedrock myself (it would be overkill), but I just got a new project that shows promise in that direction :)
Re: [sqlite] Safe sqlite over remote filesystem?
Ah, to clarify: we're very, very actively developing on Bedrock every single day. In fact, we're about to roll it out to our new 3-datacenter, 6-server cluster of 384-core, 3TB RAM, 100Gbps fiber-connected machines! All of Expensify is powered by it, so it's been battle tested with over a decade of real-world use at scale -- both very big and very small.

That said, we personally find it much easier to just build from source, and thus don't really maintain a public binary ready to go -- which I recognize makes it a little less accessible, but Bedrock bliss is just a "make" away. Lmk how I can help!

-david

On Tue, Aug 14, 2018 at 11:13 PM Wout Mertens wrote:
> On Tue, Aug 14, 2018 at 6:28 PM Rowan Worth wrote:
> > FWIW in the building I work in we have 20-30 users hitting around a dozen
> > SQLite DBs 8 hours a day 5 days a week, with all DBs served over nfs.
>
> Multiple writers? I presume you use WAL mode?
>
> > Erm, I got a bit carried away. My point is, it's not all doom and gloom.
>
> Yey :) I think that might be good enough (see my previous email)
>
> > And nfs on OSX seems to be a non-starter; not sure what's going on with
> > that client implementation but the brief experimentation I've done with
> > it suggested that corruption was guaranteed.
>
> Aww :( Dammit Apple. I'll have to experiment too.
>
> > We plan to migrate to an actual SQL server for that reason, but please
> > don't take it as a criticism of SQLite - I think it does a marvelous job
> > in a scenario it definitely wasn't designed for.
>
> Before you do that, did you see bedrockdb? http://bedrockdb.com/
> (Although it looks like the github repo is being a little bit ignored by
> Expensify)
>
> Wout.
Re: [sqlite] Safe sqlite over remote filesystem?
On Tue, Aug 14, 2018 at 4:10 PM Clemens Ladisch wrote:
> So as long as all programs that access the database cooperate, they can
> switch to a different locking implementation, such as the unix-dotfile
> VFS:
>
> https://www.sqlite.org/vfs.html#standard_unix_vfses
>
> Note: this makes all accesses, even reads, take an exclusive lock.

Actually, I think that can work! Thanks!

Wout.
Re: [sqlite] Safe sqlite over remote filesystem?
On Tue, Aug 14, 2018 at 6:28 PM Rowan Worth wrote:
> FWIW in the building I work in we have 20-30 users hitting around a dozen
> SQLite DBs 8 hours a day 5 days a week, with all DBs served over nfs.

Multiple writers? I presume you use WAL mode?

> Erm, I got a bit carried away. My point is, it's not all doom and gloom.

Yey :) I think that might be good enough (see my previous email)

> And nfs on OSX seems to be a non-starter; not sure what's going on with
> that client implementation but the brief experimentation I've done with it
> suggested that corruption was guaranteed.

Aww :( Dammit Apple. I'll have to experiment too.

> We plan to migrate to an actual SQL server for that reason, but please
> don't take it as a criticism of SQLite - I think it does a marvelous job
> in a scenario it definitely wasn't designed for.

Before you do that, did you see bedrockdb? http://bedrockdb.com/
(Although it looks like the github repo is being a little bit ignored by Expensify)

Wout.
Re: [sqlite] Safe sqlite over remote filesystem?
On Tue, Aug 14, 2018 at 6:13 PM Alek Paunov wrote:
> I am curious: did you consider adapting writing in your use-case to
> Bedrock? AFAIK, you can read from Bedrock instance DBs safely without
> further adaptation.

Right, Bedrock is amazing, but in this particular use-case the only available communication channel is the filesystem.

For the interested:

In NixOS (http://nixos.org), a very interesting Linux distribution, the entire OS (libraries, binaries, shared files, up to and including configuration files) is composed out of "build products" that are addressed by "input hash".

The input hash is calculated on everything that is used to generate the build product from scratch. For a text file, this is simply the content hash of the text. For a binary, it is the input hash of the compiler, libraries, build scripts, and all the build flags, plus the content hash of the source, all hashed together.

A build product (whether file or directory) is stored in /nix/store/<input hash>-human-readable-name. If you assume that a given input hash will always result in the same build product, you can just check if what you want exists at /nix/store/... and if so, use that. NixOS uses a sqlite db to manage metadata.

The entire OS image is simply a build product consisting of all the composing packages symlinked together, with an activation script that sets up the filesystem and runs daemons. This image is symlinked to /run/current-system. Upgrading, downgrading, patching, configuring etc. are all done by building a new image from scratch and replacing that symlink. All these actions are therefore atomic (this is the mindblowing part about NixOS). Furthermore, since input hashes stay mostly the same, building a new image happens in a matter of seconds most of the time, since everything besides what you changed gets reused (more mind blowing).

And so the problem I was idly musing about is that of running Linux Vagrant VMs on OS X, while sharing my /nix/store over NFS. That way the VMs can stay small, plus their build products are retained between launches. Since the OS type is part of the input hash, there is no problem sharing the store between OS X and Linux. And that would Just Work, were it not for the sqlite-over-NFS-exploding hiccup :)

That said, from what Rowan wrote, I should just try it, and simply avoid concurrent builds (and make backups).
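[Editor's note: the input-hash scheme Wout describes can be sketched like this. This is purely illustrative; Nix's real derivation hashing differs in its details, truncation, and base-32 encoding.]

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hash of a file's bytes; enough on its own for a plain text file."""
    return hashlib.sha256(data).hexdigest()

def input_hash(compiler, libraries, build_script, flags, source):
    """Hash everything that goes into producing a build product.

    Each argument is itself a hash (or a literal flag string), so the
    result changes whenever any input changes, recursively.
    """
    h = hashlib.sha256()
    for part in (compiler, *libraries, build_script, *flags, source):
        h.update(part.encode())
        h.update(b"\0")  # separator so adjacent inputs cannot run together
    return h.hexdigest()

# Store path for a build product (hash truncated, name is illustrative):
ih = input_hash("gcc-hash", ["glibc-hash"], "build-script-hash",
                ["-O2"], content_hash(b"int main(){}"))
store_path = f"/nix/store/{ih[:32]}-hello-1.0"
```

Because the same inputs always produce the same path, checking whether a build product already exists is a single stat of `store_path`, which is exactly the cache-hit behaviour described above.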
Re: [sqlite] Safe sqlite over remote filesystem?
NFS is a remote/network file system. iSCSI gives you a local file system on a remote block device.

iSCSI is just transporting the SCSI protocol over a "different" physical layer, sort of like how you can transport SCSI over really really fat parallel SCSI cables, PATA cables, or SATA cables. (That is, pSCSI, sSCSI, and iSCSI are identical in this respect: they are all filesystemless block transport protocols.)

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.

> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Dennis Clarke
> Sent: Tuesday, 14 August, 2018 13:10
> To: sqlite-users@mailinglists.sqlite.org
> Subject: Re: [sqlite] Safe sqlite over remote filesystem?
>
> Read all of this repeatedly. Excellent post.
>
> > But if your nfs solution is configured not to lie, to honour lock and
> > sync
>
> Had to pop up here briefly. I ran into a number of problems with nfs
> clients of various types, wherein the most brutal would be VMware ESXi
> hosts. Running backend network attached storage from Oracle, which is
> actually based on Solaris with ZFS, can be terrifying if the actual disk
> controllers are doing cache at all. ZFS is a filesystem that uses tons of
> memory for cache, and the actual flush of writes to disk happens long
> after a given IO operation has completed. Migration away from NFS to
> iSCSI was a smart choice, and I wonder if you have any iSCSI attached
> devices and what have you seen?
>
> Dennis
Re: [sqlite] Safe sqlite over remote filesystem?
Read all of this repeatedly. Excellent post.

> But if your nfs solution is configured not to lie, to honour lock and sync

Had to pop up here briefly. I ran into a number of problems with nfs clients of various types, wherein the most brutal would be VMware ESXi hosts. Running backend network attached storage from Oracle, which is actually based on Solaris with ZFS, can be terrifying if the actual disk controllers are doing cache at all. ZFS is a filesystem that uses tons of memory for cache, and the actual flush of writes to disk happens long after a given IO operation has completed. Migration away from NFS to iSCSI was a smart choice, and I wonder if you have any iSCSI attached devices and what have you seen?

Dennis
Re: [sqlite] Safe sqlite over remote filesystem?
On 14 Aug 2018, at 3:09pm, Clemens Ladisch wrote:
> However, there are other file operations that are properly synchronized,
> e.g., it is not possible for two NFS clients to create a directory with
> the same name.

You are correct. But there's still a problem with simulating a mutex system.

Suppose two computers try to create a remote folder with the same name at the same time. The request which is processed first gets "completed successfully". If everything is done correctly, the second request would get "folder with that name already exists". But since the job has been done (a folder with that name does exist, so the task must have been completed, so the programmer should be happy) the second request is quite likely to get "completed successfully" too. It takes a very fussy filesystem programmer to make their code report "the thing you asked for is complete as you requested, but I didn't do it".

Problems like this occur with many methods of trying to simulate locking using file system calls. It would slow mundane file operations down too much to be as fussy as a mutex system needs to be.

Simon.
Re: [sqlite] Safe sqlite over remote filesystem?
FWIW in the building I work in we have 20-30 users hitting around a dozen SQLite DBs 8 hours a day 5 days a week, with all DBs served over nfs. The number of corruptions I've seen in the last 5 years which nfs *might* be responsible for is *very low*. The number of corruptions where nfs was *definitely* responsible is... zero I think? Definitely single digits.

And off the top of my head I can recall four incidents where corruption was found in a copy of the database during our backup procedure, while the main DB was intact. The thing is, we don't use SQLite's backup api in this procedure -- we lock the DB and make a copy using OS syscalls. Ironically this was the cause of the vast majority of our corruptions, because our initial implementation did not account for POSIX fcntl locking semantics: when we closed the file descriptor used in the copy, we inadvertently dropped SQLite's locks.

Anyway, the incidents I'm talking about occurred long after that bug was fixed. Somehow a regular copy of a file, involving a single nfs client, wound up corrupt. I've looked very closely at these, and the corruption affected a single 4k region of the DB. It's been observed on DBs using both 4k and 1k page sizes. It could be a broken nfs cache, but the weird part is that the first 512 bytes of the region are absolutely fine -- exactly the same as the non-corrupt DB. Then the next 3584 bytes are random garbage. This is not a solved mystery, and probably it never will be, as it's been many months since we've seen these particular symptoms. Maybe the problem stemmed from nfs, but it could also be a faulty disk controller, or a strange kernel interaction with multiple fds, or a memory corruption in our application itself (which is kind of what I lean towards, given the corruption is 512 bytes misaligned from the page boundary).

Erm, I got a bit carried away. My point is, it's not all doom and gloom. I'm sure nfs implementations used to be much worse, and I'm not saying they're perfect these days -- we get semi-regular reports from windows users regarding DB corruption, and there's one particular customer we have whose NAS server likes to keep hold of file locks on behalf of some unknown client, long after all client machines have been rebooted (resulting in SQLite being unable to access the DBs). And nfs on OSX seems to be a non-starter; not sure what's going on with that client implementation, but the brief experimentation I've done with it suggested that corruption was guaranteed.

But if your nfs solution is configured not to lie, to honour lock and sync requests, things work well. Corruption has almost been a non-issue for us since we fixed our application's bug. The bigger problem we face is concurrency, due to the single writer model and the writer-starvation solution locking new readers out of the DB until all current readers are done (so the writer can finish). We plan to migrate to an actual SQL server for that reason, but please don't take it as a criticism of SQLite - I think it does a marvelous job in a scenario it definitely wasn't designed for.

-Rowan

On 14 August 2018 at 21:07, Wout Mertens wrote:
> Idle musing again, I'm pretty bad at dropping thoughts that are not
> immediately applicable to me, sorry.
>
> I know that multi-writer sqlite and NFS don't play well with each other.
>
> However, I wonder if some constraints could be added that would make this
> situation safe.
>
> My problem space is that of a shared NixOS package store between VMs,
> which holds metadata about the available packages:
>
>    - many writers need access to the same db
>    - their only communication channel is the POSIX filesystem that holds
>      the db
>    - they only write "seldomly", every few seconds at the fastest
>    - they do read all the time
>    - it is ok if read data is a little bit stale (10s is acceptable)
>    - it is ok if write transactions fail and can be retried
>    - it is ok if writes are slow
>    - it is never ok for data to be corrupt
>
> Is there a way to safely use sqlite in this situation, perhaps by using
> extra lock files or some other additional mechanism?
>
> One solution I can think of involves sending all writes through a single
> master, via files describing changes and lots of polling, but that seems
> really outlandish.
>
> Wout.
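[Editor's note: the fcntl gotcha Rowan describes bites because POSIX record locks belong to the process, not to the file descriptor: closing *any* descriptor for a file drops *all* of the process's locks on it. A small demonstration on Linux/POSIX; the scratch file here merely stands in for a database.]

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.db")  # stand-in for the DB file
open(path, "w").close()

def other_process_can_lock():
    """Fork a child that tries a non-blocking exclusive lock on `path`."""
    pid = os.fork()
    if pid == 0:
        fd = os.open(path, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(0)  # got the lock: the parent no longer holds one
        except OSError:
            os._exit(1)  # denied: the parent's lock is still in place
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0

db_fd = os.open(path, os.O_RDWR)   # the fd "SQLite" keeps open
fcntl.lockf(db_fd, fcntl.LOCK_EX)  # and its exclusive lock
locked_while_held = not other_process_can_lock()    # lock is visible

copy_fd = os.open(path, os.O_RDONLY)  # a second fd, e.g. for a backup copy
os.close(copy_fd)  # closing it silently drops db_fd's lock as well
lock_survived_close = not other_process_can_lock()  # lock is gone
```

SQLite's own unix VFS goes to some lengths to work around this semantic, which is why the standing advice is to never open and close extra file descriptors on a live database file from within the same process.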
Re: [sqlite] Safe sqlite over remote filesystem?
On 2018-08-14 16:07, Wout Mertens wrote:
> Is there a way to safely use sqlite in this situation, perhaps by using
> extra lock files or some other additional mechanism?
>
> One solution I can think of involves sending all writes through a single
> master, via files describing changes and lots of polling, but that seems
> really outlandish.

From the mail archives I see you asked a few questions about the Bedrock SQLite cluster solution [1] last year. I am curious: did you consider adapting writing in your use-case to Bedrock? AFAIK, you can read from Bedrock instance DBs safely without further adaptation.

Kind Regards,
Alek

[1] https://github.com/Expensify/Bedrock
Re: [sqlite] Safe sqlite over remote filesystem?
Wout Mertens wrote:
> I know that multi-writer sqlite and NFS don't play well with each other.
>
> However, I wonder if some constraints could be added that would make this
> situation safe.

NFS locking implementations tend to be bad. However, there are other file operations that are properly synchronized, e.g., it is not possible for two NFS clients to create a directory with the same name. So as long as all programs that access the database cooperate, they can switch to a different locking implementation, such as the unix-dotfile VFS:

https://www.sqlite.org/vfs.html#standard_unix_vfses

Note: this makes all accesses, even reads, take an exclusive lock.

Regards,
Clemens
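[Editor's note: selecting the dotfile VFS needs no code changes beyond the open call, since a VFS can be named in a URI filename. A quick sketch using Python's sqlite3 module; the database path and table are illustrative.]

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "shared.db")  # illustrative path

# "vfs=unix-dotfile" makes SQLite take its lock by atomically creating a
# "<db>.lock" entry next to the database, instead of using fcntl()
# byte-range locks -- the part NFS implementations tend to get wrong.
conn = sqlite3.connect(f"file:{db_path}?vfs=unix-dotfile", uri=True)
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT OR REPLACE INTO kv VALUES ('pkg', 'metadata')")
conn.commit()
rows = conn.execute("SELECT v FROM kv WHERE k = 'pkg'").fetchall()
conn.close()
```

Note Clemens' caveat still applies: under this VFS even a plain SELECT takes the exclusive lock, so concurrent readers block each other, and a crashed client leaves a stale .lock behind.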
Re: [sqlite] Safe sqlite over remote filesystem?
Have you read through this document: https://www.sqlite.org/howtocorrupt.html

On Tue, Aug 14, 2018 at 9:46 AM, Wout Mertens wrote:
> In the meantime I found this discussion from 2011 about sqlite on CIFS:
> http://sqlite.1065341.n5.nabble.com/How-to-make-SQLite-run-safely-on-CIFS-mounted-file-system-tt37415.html#none
>
> Basically using any networked filesystem as a backing store for sqlite is
> madness? I imagine not much about that changed in the last 7 years.
>
> Using the per-host-file-messaging as a communication channel to a single
> master that also exports the NFS doesn't seem that outlandish any more.
>
> On Tue, Aug 14, 2018 at 3:07 PM Wout Mertens wrote:
> > Idle musing again, I'm pretty bad at dropping thoughts that are not
> > immediately applicable to me, sorry.
> >
> > I know that multi-writer sqlite and NFS don't play well with each other.
> >
> > However, I wonder if some constraints could be added that would make
> > this situation safe.
> >
> > My problem space is that of a shared NixOS package store between VMs,
> > which holds metadata about the available packages:
> >
> >    - many writers need access to the same db
> >    - their only communication channel is the POSIX filesystem that
> >      holds the db
> >    - they only write "seldomly", every few seconds at the fastest
> >    - they do read all the time
> >    - it is ok if read data is a little bit stale (10s is acceptable)
> >    - it is ok if write transactions fail and can be retried
> >    - it is ok if writes are slow
> >    - it is never ok for data to be corrupt
> >
> > Is there a way to safely use sqlite in this situation, perhaps by using
> > extra lock files or some other additional mechanism?
> >
> > One solution I can think of involves sending all writes through a
> > single master, via files describing changes and lots of polling, but
> > that seems really outlandish.
> >
> > Wout.
Re: [sqlite] Safe sqlite over remote filesystem?
In the meantime I found this discussion from 2011 about sqlite on CIFS:
http://sqlite.1065341.n5.nabble.com/How-to-make-SQLite-run-safely-on-CIFS-mounted-file-system-tt37415.html#none

Basically using any networked filesystem as a backing store for sqlite is madness? I imagine not much about that changed in the last 7 years.

Using the per-host-file-messaging as a communication channel to a single master that also exports the NFS doesn't seem that outlandish any more.

On Tue, Aug 14, 2018 at 3:07 PM Wout Mertens wrote:
> Idle musing again, I'm pretty bad at dropping thoughts that are not
> immediately applicable to me, sorry.
>
> I know that multi-writer sqlite and NFS don't play well with each other.
>
> However, I wonder if some constraints could be added that would make this
> situation safe.
>
> My problem space is that of a shared NixOS package store between VMs,
> which holds metadata about the available packages:
>
>    - many writers need access to the same db
>    - their only communication channel is the POSIX filesystem that holds
>      the db
>    - they only write "seldomly", every few seconds at the fastest
>    - they do read all the time
>    - it is ok if read data is a little bit stale (10s is acceptable)
>    - it is ok if write transactions fail and can be retried
>    - it is ok if writes are slow
>    - it is never ok for data to be corrupt
>
> Is there a way to safely use sqlite in this situation, perhaps by using
> extra lock files or some other additional mechanism?
>
> One solution I can think of involves sending all writes through a single
> master, via files describing changes and lots of polling, but that seems
> really outlandish.
>
> Wout.
[sqlite] Safe sqlite over remote filesystem?
Idle musing again, I'm pretty bad at dropping thoughts that are not immediately applicable to me, sorry.

I know that multi-writer sqlite and NFS don't play well with each other.

However, I wonder if some constraints could be added that would make this situation safe.

My problem space is that of a shared NixOS package store between VMs, which holds metadata about the available packages:

- many writers need access to the same db
- their only communication channel is the POSIX filesystem that holds the db
- they only write "seldomly", every few seconds at the fastest
- they do read all the time
- it is ok if read data is a little bit stale (10s is acceptable)
- it is ok if write transactions fail and can be retried
- it is ok if writes are slow
- it is never ok for data to be corrupt

Is there a way to safely use sqlite in this situation, perhaps by using extra lock files or some other additional mechanism?

One solution I can think of involves sending all writes through a single master, via files describing changes and lots of polling, but that seems really outlandish.

Wout.