Just for others reading who are too lazy to click the link: documents and attachments are not written atomically, but the header is. Twice. When we open a file, we start at the end and scan backwards until we find a valid header. So if you catch something halfway through being written, you won't find a valid header that refers to the partially written doc or attachment. The first header you find will always™ refer to a consistent database that precedes it.
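A toy model might make the "scan backwards for a valid header" part concrete. This is not CouchDB's actual on-disk format. The block size, the `HDR!` marker, the CRC32 checksum, the JSON `root` dict and all the function names are hypothetical. Document bytes are appended non-atomically. A commit then writes a checksummed header twice, each copy starting on a block boundary. On open, we walk block boundaries backwards from the end and take the first header whose checksum verifies. A `bytearray` stands in for the file, and comments mark where real code would call fsync().

```python
import json
import struct
import zlib

BLOCK = 64              # hypothetical block size, tiny so the demo stays small
HEADER_MAGIC = b"HDR!"  # hypothetical marker, NOT CouchDB's real format


def encode_header(payload: bytes) -> bytes:
    """Frame a header as: magic | payload length | CRC32 of payload | payload."""
    return HEADER_MAGIC + struct.pack(">II", len(payload), zlib.crc32(payload)) + payload


def append_data(buf: bytearray, data: bytes) -> int:
    """Append a document/attachment body (not atomic) and return its offset."""
    offset = len(buf)
    buf += data
    return offset


def commit(buf: bytearray, root: dict) -> None:
    """Commit by writing the header twice, each copy on a block boundary."""
    # Real code would fsync() here, so the data is on disk
    # before any header points at it.
    payload = json.dumps(root).encode()
    for _ in range(2):
        buf += b"\x00" * (-len(buf) % BLOCK)  # pad up to a block boundary
        buf += encode_header(payload)
    # ...and fsync() again here, so the new header itself is durable.


def try_read_header(buf: bytes, pos: int):
    """Return the decoded header at `pos`, or None if it is missing or torn."""
    if buf[pos:pos + 4] != HEADER_MAGIC or pos + 12 > len(buf):
        return None
    length, crc = struct.unpack(">II", buf[pos + 4:pos + 12])
    payload = buf[pos + 12:pos + 12 + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None  # partially written header: keep scanning backwards
    return json.loads(payload)


def find_latest_header(buf: bytes):
    """Scan block boundaries from the end of the file towards the start."""
    pos = (len(buf) - 1) // BLOCK * BLOCK
    while pos >= 0:
        header = try_read_header(buf, pos)
        if header is not None:
            return header
        pos -= BLOCK
    return None
```

Truncating the buffer anywhere, whether mid-document or mid-header, models a copy taken while a write is in flight. In that case `find_latest_header` falls back to the last header whose bytes all made it. Because headers are only written after the data they point to, that header refers only to complete data. The checksum makes a false match on stray document bytes unlikely, though not impossible, in this sketch.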
There are some caveats to this, in that we're assuming that fsync() really does get data to disk and that it acts as a write barrier. From what I've read, these are pretty much the same assumptions that projects like PostgreSQL make.

HTH,
Paul Davis

On Fri, Jan 29, 2010 at 1:57 AM, David Van Couvering <[email protected]> wrote:
> Like, yeah, I get it, thanks! I love all the spiral notebook slides :)
>
> I knew it was append-only, but I missed the fact that it's always
> consistent, and lock-free. Having worked on lots of databases and with db
> teams, and all the issues around concurrency, consistency, deadlocks, etc.,
> this is a Big Deal.
>
> Great work, Damien, et al.!
>
> David
>
> On Thu, Jan 28, 2010 at 10:40 PM, Kevin Ferguson <[email protected]> wrote:
>
>> David, check out this preso; it should explain how the CouchDB storage
>> engine works in a bit more detail.
>>
>> http://jchrisa.net/drl/_design/sofa/_show/post/How-CouchDB-Treats-the-Disks
>>
>> Kevin
>>
>> On Thu, Jan 28, 2010 at 10:37 PM, David Van Couvering <[email protected]> wrote:
>>
>> > Thinking further on this: I think one difference is that with Derby each
>> > table is its own file. Since you need to maintain relational consistency
>> > in a SQL DB, this is a problem if you're copying lots of table files while
>> > transactions are in flight. Sybase could be a single file (or raw device),
>> > or it could be split across multiple devices for better throughput, so
>> > again you have potential consistency problems (plus I think you tended to
>> > back up through a special API, as standard file copy doesn't work on raw
>> > devices).
>> >
>> > I guess in CouchDB, since it's a single file, *perhaps* you don't have the
>> > same issues around maintaining consistency...
>> >
>> > On Thu, Jan 28, 2010 at 10:33 PM, David Van Couvering <[email protected]> wrote:
>> >
>> > > Well, that's true of any database, but for most databases if you try to
>> > > take a "snapshot" of a db file while it's running, chances are pretty
>> > > good you're not going to like what you get. More often than not a
>> > > transaction will have been half-written to the file when you "grab" it
>> > > and back it up. And as we all know, the only thing worse than finding a
>> > > worm in an apple is finding half a worm...
>> > >
>> > > That's why "online backup" is a key feature requirement for any
>> > > database worth its salt. We had to implement it for Sybase, and we had
>> > > to implement it for Apache Derby.
>> > >
>> > > What I'm hearing here is that because of the way CouchDB works (I'm not
>> > > sure I fully grok it), at any point you take a copy of the db file, it's
>> > > in a consistent state. I guess what this means is that each document is
>> > > written as a single atomic write, so you can't end up with half a
>> > > document in your backup.
>> > >
>> > > I know you've told me this works, but call me paranoid: is that really
>> > > true? What if the document is umpteen gajillibytes long? Is it still
>> > > written as a *single atomic write* to the disk? Do you all lock the
>> > > file each time you do a write? Does Time Machine lock the file against
>> > > writes the whole time it's reading it/taking a snapshot of it? Just want
>> > > to make really sure we're all on the same page here.
>> > >
>> > > Thanks,
>> > >
>> > > David
>> > >
>> > > On Thu, Jan 28, 2010 at 9:51 PM, de Saint Martin Cédric <[email protected]> wrote:
>> > >
>> > >> Tell me if I'm wrong, but CouchDB's databases are stored in files.
>> > >> So if you back up all these files, there is no risk of corrupting your
>> > >> data.
>> > >> Conclusion: Time Machine will work.
>> > >>
>> > >> On 29 Jan 2010, at 00:41, David Van Couvering wrote:
>> > >>
>> > >> > Anyone know if a Time Machine backup of a running CouchDB will work,
>> > >> > or is there a likelihood of corruption?
>> > >> >
>> > >> > --
>> > >> > David W. Van Couvering
>> > >> >
>> > >> > http://www.linkedin.com/in/davidvc
>> > >> > http://davidvancouvering.blogspot.com
>> > >> > http://twitter.com/dcouvering
