1. Think of the service as a stream of discrete data transmissions rather than a continuous one. Switching from one db to another then just means diverting the flow from one db to the other in between two data transmission sequences. The actual implementation depends on your project.

I don't know your project, so, just for the sake of argument, let's consider two databases: B (at the back-end) and F (at the front-end). Also, let's say F is connected to another HTTP server (maybe it's just me, but I don't rely on CouchDB alone to respond to all HTTP requests).

Let's reclaim the space from B first. I create a database BT and start transferring all the available documents to it (a delete event for a document just makes it unavailable). Once that is finished, I "cut the pipe" between B and F (stopping replication or whatever mechanism you use to connect B with F) and "redirect the pipe" toward BT (starting replication or any other mechanism you use; for replication I would add a filter, but that's another story). You can also do it in the reverse order (redirecting first and cutting afterwards). Once the data flow is redirected, delete B and re-create it. That deletes the file from the hard disk and creates a new one.

Second, to reclaim F, the procedure is the same, except that the switch is handled by the HTTP server (the redirection can be done even with a simple JavaScript command; all one needs to do is switch the old page to a temporary new one). If programmed correctly, the user wouldn't notice anything except a slight delay in loading the page (the redirection).

Maybe I have worked too much with YAWS and Erlang, but I usually create a simple application which checks the correctness of the data before injecting it into the database. The delay is negligible (I use bulk operations, whose peak throughput is higher than the volume of documents YAWS can send), and the switch can be done by a simple command sent to the TCP server within the Erlang application. That's for the back-end database. For the front-end, the redirection is just replacing the web page (no service interruption for YAWS; it is a bit more complex if a file cache is used). That would be my design for this particular example.

2. Would it? If you transfer only the available documents from B to BT or from F to FT (from the example above), BT/FT uses only the space of the documents you want to keep (the transfer is done not through CouchDB replication but with a bit of manual work, or maybe with filtered replication, but I am not sure about that). Once B/F is deleted, the file containing the database is deleted from the hard disk (the physical space the file occupied is freed, meaning it can be reused by the OS), so no history is kept when the database is created again. That reclaims the space for sure.
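
The manual transfer described above (copy only the available documents into a temporary database, then delete and re-create the old one) could look roughly like the Python sketch below. It is only an illustration under several assumptions that are not part of the original message: a CouchDB instance at http://localhost:5984 without authentication, lowercase database names "b" and "bt" standing in for B and BT, no attachments, and a database small enough to move in a single request. The helper names (couch, live_docs, copy_live_docs, recreate) are made up for this example.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumption: local CouchDB, no authentication


def couch(method, path, body=None):
    """Send one JSON request to CouchDB and return the decoded reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        COUCH + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def live_docs(all_docs_reply):
    """Turn an _all_docs?include_docs=true reply into documents for _bulk_docs.

    _all_docs lists only documents that are not deleted. Dropping _rev lets
    the target database store each one as a new document with no history.
    """
    docs = []
    for row in all_docs_reply["rows"]:
        doc = dict(row["doc"])  # shallow copy, the reply itself stays untouched
        doc.pop("_rev", None)
        docs.append(doc)
    return docs


def copy_live_docs(source, target):
    """Create `target` and fill it with the available documents of `source`."""
    couch("PUT", "/" + target)
    reply = couch("GET", "/" + source + "/_all_docs?include_docs=true")
    couch("POST", "/" + target + "/_bulk_docs", {"docs": live_docs(reply)})


def recreate(db):
    """Delete the database (and its file on disk), then create it again empty."""
    couch("DELETE", "/" + db)
    couch("PUT", "/" + db)
```

With a running server, the back-end step would be copy_live_docs("b", "bt"), then redirecting whatever feeds F from "b" to "bt", then recreate("b"). Anything written to "b" between the copy and the redirection is not carried over by this sketch, and the copied documents start a new revision history in "bt".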

Of course, even for this example, such a design has limitations. But it can be a starting point for designing your project. If you want something simpler, then maybe you should ask the developers to add a "no history" option to CouchDB (it wouldn't be a bad idea, and I am not being ironic here).

But, as I mentioned before, the design depends only on your project, and there is no general solution.

I hope this opinion will help you in your project.

CGS






On 12/24/2011 01:09 AM, Mark Hahn wrote:
  That means, you move the data from one to the other, filtering out the
deleted documents, and when it's over, you switch to the newly constructed
database, while the other gets emptied (deleted and re-created).

1) How exactly could you make this switch without interrupting service?

2) Wouldn't this procedure create the exact same eventual consistency
problems that deleting documents in a db would?

