Re: Database size seems off even after compaction runs.

CGS Fri, 23 Dec 2011 17:39:49 -0800

1. Think of the service as a discrete (quantized) stream of data rather than a continuous one. Switching from one db to another is just diverting the flow from one db to the other between two data transmission sequences. The actual implementation depends on your project. I don't know your project, but for the sake of argument, let's consider two databases: B (at the back-end) and F (at the front-end). Also, let's say F is connected to another HTTP server (maybe it's just me, but I don't rely on CouchDB alone to answer all HTTP requests).

First, let's reclaim the space from B. I create a database BT and start transferring all the available documents to it (deleting a document only makes it unavailable). Once I finish, I just "cut the pipe" between B and F (stopping replication or whatever mechanism you use to connect B with F) and "redirect the pipe" toward BT (starting replication or any other mechanism you use; for replication I would add a filter, but that's another story). You can also do it in the reverse order (redirect first, then cut). Once the data flow is redirected, delete B and re-create it. That deletes the file from the hard disk and creates a new one.

Second, to reclaim F, the procedure is the same, except that it is handled by the HTTP server (the redirection can be done even with a simple JavaScript command; all you need to do is switch the old page to a temporary new one). If programmed correctly, the user wouldn't notice anything except a slight delay in loading the page (the redirection).

Maybe I have worked too much with YAWS and Erlang, but I usually create a simple application that checks the data for correctness before injecting it into the database. The added delay is negligible (I use bulk operations, whose peak rate is higher than the volume of documents YAWS can send), and the switch can be done with a simple command sent to the TCP server within the Erlang application.
That covers the back-end database. For the front-end, the redirection is just a matter of replacing the web page (no service interruption for YAWS; it is a bit more complex if you use a file cache). That would be my design for this particular example.
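A minimal in-memory sketch of the switch for B, in Python. The names `Pipe`, `live_documents` and `switch_database` are hypothetical; plain dicts stand in for the databases, and a document flagged `"_deleted": True` stands in for a deleted CouchDB document. Because the model is single-threaded, the copy and the switch naturally happen "between two data transmission sequences"; a real setup would have to make sure no writes land on B after the copy starts (or use the reverse order mentioned above).

```python
from typing import Dict

Doc = dict
Database = Dict[str, Doc]


class Pipe:
    """Hypothetical stand-in for whatever links the back-end to F
    (replication, an Erlang app, ...). New writes go to the active database."""

    def __init__(self, databases: Dict[str, Database], active: str) -> None:
        self.databases = databases
        self.active = active

    def write(self, doc_id: str, doc: Doc) -> None:
        self.databases[self.active][doc_id] = doc


def live_documents(db: Database) -> Database:
    # A deleted document is only unavailable, not gone: skip anything flagged as deleted.
    return {doc_id: doc for doc_id, doc in db.items() if not doc.get("_deleted", False)}


def switch_database(pipe: Pipe, old: str, new: str) -> None:
    # 1. Create the temporary database and copy only the available documents.
    pipe.databases[new] = live_documents(pipe.databases[old])
    # 2. "Cut the pipe" to the old database and "redirect" it to the new one.
    pipe.active = new
    # 3. Delete the old database and re-create it empty (on disk: a brand-new file).
    del pipe.databases[old]
    pipe.databases[old] = {}


# Usage: "b" was deleted in B, so it is not carried over to BT.
dbs = {"B": {"a": {"v": 1}, "b": {"_deleted": True}}}
pipe = Pipe(dbs, "B")
switch_database(pipe, "B", "BT")
pipe.write("c", {"v": 3})
print(pipe.active, dbs["BT"], dbs["B"])  # BT {'a': {'v': 1}, 'c': {'v': 3}} {}
```

The switch itself is a single assignment, which is what makes it cheap enough to slip between two transmissions.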

2. Would it? If you transfer only the available documents from B to BT or from F to FT (from the example above), BT/FT will use only the space of the documents you want to keep (the transfer is done not through CouchDB replication but with a bit of handiwork, or perhaps with filtered replication, though I am not sure about that). Once B/F is deleted, the file containing the database is deleted from the hard disk (the physical space the file occupied is freed, meaning the OS can reuse it), so no history is kept when the database is created again. That definitely reclaims the space.
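To see why starting from a fresh database reclaims more than compaction does, here is a toy model in Python. It assumes, in simplified form, that compaction keeps the latest revision of every document, and that for a deleted document this latest revision is a small tombstone (CouchDB keeps tombstones so that replication can propagate deletions). A freshly created database that receives only the available documents has no tombstones at all. Sizes are a crude estimate (length of the JSON text), not CouchDB's on-disk format, and `compacted`, `fresh_copy` and `approx_size` are hypothetical helpers.

```python
import json
from typing import Dict, List, Tuple

# One entry per revision ever written: (doc_id, revision number, body).
Revision = Tuple[str, int, dict]


def compacted(history: List[Revision]) -> Dict[str, dict]:
    """Keep only the latest revision of every document, tombstones included."""
    latest: Dict[str, Tuple[int, dict]] = {}
    for doc_id, rev, body in history:
        if doc_id not in latest or rev > latest[doc_id][0]:
            latest[doc_id] = (rev, body)
    return {doc_id: body for doc_id, (_, body) in latest.items()}


def fresh_copy(history: List[Revision]) -> Dict[str, dict]:
    """What a newly created database holds after receiving only available documents."""
    return {doc_id: body for doc_id, body in compacted(history).items()
            if not body.get("_deleted", False)}


def approx_size(store: Dict[str, dict]) -> int:
    # Crude size estimate: characters of the id plus the JSON-encoded body.
    return sum(len(doc_id) + len(json.dumps(body)) for doc_id, body in store.items())


# Usage: "b" is created and later deleted.
history = [("a", 1, {"v": 1}), ("b", 1, {"v": 2}), ("b", 2, {"_deleted": True})]
print(approx_size(compacted(history)), approx_size(fresh_copy(history)))  # 28 9
```

With many deleted documents the tombstones add up, which is one reason a compacted database can still look larger than its live data.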

Of course, even for this example, there are limitations to such a design. But it can be a starting point for designing your project. If you want something simpler, then maybe you should ask the developers to add a "no history" option to CouchDB (it wouldn't be a bad idea, and I am not being ironic here).

But, as I mentioned before, the design depends only on your project, and there is no general solution.


I hope this opinion will help you in your project.

CGS






On 12/24/2011 01:09 AM, Mark Hahn wrote:

That means, you move the data from one to the other, filtering out the deleted documents, and when it's over, you switch to the newly constructed database, while the other gets emptied (deleted and re-created).

1) How exactly could you make this switch without interrupting service?

2) Wouldn't this procedure create the exact same eventual consistency
problems that deleting documents in a db would?
