1. Think of the service as a discrete (quantized) stream of data rather
than a continuous one. Switching from one db to another just means
diverting the flow from one db to the other between two data
transmission sequences.
The actual implementation depends on your project. I don't know your
project, so, for the sake of argument, let's consider two databases: B
(at the back-end) and F (at the front-end). Let's also say that F is
connected to another HTTP server (maybe it's just me, but I don't rely
on CouchDB alone to answer all HTTP requests).
First, let's reclaim the space from B. I create a database BT and
transfer all the available documents to it (a delete event for a
document just makes it unavailable, so deleted documents are not
copied). Once the transfer is finished, I "cut the pipe" between B and F
(stopping replication, or whatever mechanism you use to connect B with
F) and "redirect the pipe" toward BT (starting replication, or whatever
other mechanism you use; for replication I would add a filter, but
that's another story). You can also do it in the reverse order
(redirecting first and cutting afterwards). Once the data flow is
redirected, delete B and re-create it. That deletes the file from the
hard disk and creates a new one.
Second, to reclaim F, follow the same procedure, except that the switch
is handled by the HTTP server (the redirection can be done even with a
simple JavaScript command; all you need to do is switch the old page to
a temporary new one). If programmed correctly, the user won't notice
anything except a slight delay in loading the page (the redirection).
Maybe I have worked too much with YAWS and Erlang, but I usually write a
simple application that checks the correctness of the data before
inserting it into the database. The added delay is negligible (I use
bulk operations, which peak higher than the volume of documents YAWS can
send), and the switch can be done with a simple command sent to the TCP
server within the Erlang application. That covers the back-end database.
For the front-end, the redirection is just a matter of replacing the web
page (no service interruption for YAWS, though it is a bit more complex
if you use a file cache). That would be my design for this particular
example.
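
To make the "switch between two transmission sequences" idea concrete,
here is a minimal sketch in Python (not the Erlang/YAWS setup described
above). The class name SwitchableSink and the send(target, batch)
callable are hypothetical, invented only for this illustration. The one
point it shows is that the active target can change only between two
whole batches, never in the middle of one.

```python
import threading


class SwitchableSink:
    """Routes whole batches to whichever target is currently active."""

    def __init__(self, send, target):
        self._send = send          # hypothetical callable: send(target, batch)
        self._target = target      # e.g. the name of database B
        self._lock = threading.Lock()

    def write_batch(self, batch):
        # The lock is held for the whole send, so a switch can only
        # happen between two batches, never in the middle of one.
        with self._lock:
            self._send(self._target, batch)

    def switch(self, new_target):
        # Waits for any in-flight batch, then redirects all later batches.
        with self._lock:
            old, self._target = self._target, new_target
        return old


# Usage with an in-memory stand-in for the two databases.
received = {"b": [], "bt": []}
sink = SwitchableSink(lambda target, batch: received[target].extend(batch), "b")
sink.write_batch([1, 2])   # goes to "b"
sink.switch("bt")          # "redirect the pipe"
sink.write_batch([3])      # goes to "bt"
```

In a real deployment send would be a bulk insert into CouchDB, and
switch would be triggered by whatever control command the application
exposes.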
2. Would it? If you transfer only the available documents from B to BT
or from F to FT (following the example above), BT/FT uses only the space
of the documents you want to keep (the transfer is done not through
plain CouchDB replication but with a bit of manual work, or maybe with
filtered replication, but I am not sure about that). Once B/F is
deleted, the file containing the database is removed from the hard disk
(the physical space the file occupied is freed, meaning the OS can reuse
it), so no history is kept when the database is created again. That
reclaims the space for sure.
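
The "bit of manual work" could look roughly like the Python sketch
below, which uses only CouchDB's standard HTTP endpoints (_all_docs,
_bulk_docs, PUT and DELETE on the database). The helper names and the
base URL are assumptions made for this example. It also assumes that
writes to the source are paused or already redirected during the copy,
that the database fits in one _all_docs response, and that the
documents carry no attachments; a real tool would have to handle all
three.

```python
import json
import urllib.request


def http_json(method, url, body=None):
    """Send one JSON request to CouchDB and return the decoded reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def live_docs(rows):
    """Turn _all_docs rows into documents that a fresh database accepts."""
    docs = []
    for row in rows:
        doc = dict(row["doc"])
        doc.pop("_rev", None)   # the target starts a new revision history
        docs.append(doc)
    return docs


def copy_live_docs(base, src, dst, request=http_json):
    """Create dst and fill it with the documents currently visible in src."""
    request("PUT", f"{base}/{dst}")
    # _all_docs does not list deleted documents, so they are left behind.
    listing = request("GET", f"{base}/{src}/_all_docs?include_docs=true")
    docs = live_docs(listing["rows"])
    if docs:
        request("POST", f"{base}/{dst}/_bulk_docs", {"docs": docs})
    return len(docs)


def recreate(base, db, request=http_json):
    """Delete db (removing its file) and create an empty one, same name."""
    request("DELETE", f"{base}/{db}")
    request("PUT", f"{base}/{db}")
```

With a server at, say, http://localhost:5984 (an assumed address), the
sequence for the back-end would be copy_live_docs(base, "b", "bt"),
then redirecting the data flow to bt, then recreate(base, "b"). The
request parameter exists only so that the HTTP layer can be replaced
when testing.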
Of course, even for this example, such a design has its limitations.
But it can be a starting point for designing your project.
If you want something simpler, then maybe you should ask the developers
to add a "no history" option to CouchDB (it wouldn't be a bad idea, and
I am not being ironic here).
But, as I mentioned before, the design depends only on your project and
there is no general solution.
I hope this helps you with your project.
CGS
On 12/24/2011 01:09 AM, Mark Hahn wrote:
That means, you move the data from one to the other, filtering out the
deleted documents, and when it's over, you switch to the newly constructed
database, while the other gets emptied (deleted and re-created).
1) How exactly could you make this switch without interrupting service?
2) Wouldn't this procedure create the exact same eventual consistency
problems that deleting documents in a db would?