English is not my native language either, so I may have misunderstood
your point.
That is the general case, with or without your specific intervention.
Nevertheless, writing directly to the hard disk for every document is
not really the best practice, because you cut the life of your hard
disk at least in half (I've experienced such practices and seen some
monstrous results). In addition, as far as I understand how CouchDB
works, writing documents one by one forces the two headers to be
updated every time a document arrives, whereas buffering the documents
and flushing them to the hard disk at once requires only one update of
the two headers per batch. So you have to balance your design between
speed and the safety of your data.
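Just to make the difference concrete, here is a rough sketch in Python
(using the requests library; the database URL and the documents are
invented, and error handling is minimal):

import requests

COUCH = "http://localhost:5984/mydb"   # invented database URL
docs = [{"type": "reading", "value": i} for i in range(100)]

# Alternative A -- one request per document: each document is written
# (and, as I understand it, the headers updated) separately.
for doc in docs:
    requests.post(COUCH, json=doc).raise_for_status()

# Alternative B -- buffered: a single _bulk_docs request stores the
# whole batch, so the headers are updated once per flush.
requests.post(COUCH + "/_bulk_docs", json={"docs": docs}).raise_for_status()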
But as long as you don't expect a power failure (a little bit of
optimism here, I know) and you take care not to use more resources than
you have, buffering will lose no data if your application is
write-safe: for example, check that the write operation completed
correctly before discarding the data from the buffer, and use two
buffers, one as an accumulator and one for interfacing with the writing
part (there is a small sketch below). I know that's a bit of extra work
and it uses more resources, but it gives you good control over how your
data flows toward the hard disk. Combining this with CouchDB's ability
to recover quickly from errors, my impression is that the drawback of
that 10 docs per second rate can be ignored. But, of course, that's my
opinion, and a rather subjective one.
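Roughly what I mean by the two buffers, again only a sketch with names
of my own choosing:

import requests

COUCH = "http://localhost:5984/mydb"   # invented database URL

accumulator = []    # buffer that receives incoming documents
write_buffer = []   # buffer handed to the writing part

def add(doc):
    accumulator.append(doc)

def flush():
    # Swap the buffers so new documents keep accumulating while the
    # previous batch is being written.
    global accumulator, write_buffer
    if not accumulator:
        return
    write_buffer, accumulator = accumulator, []

    resp = requests.post(COUCH + "/_bulk_docs", json={"docs": write_buffer})
    resp.raise_for_status()
    results = resp.json()

    # Only discard documents that CouchDB confirmed; put anything that
    # failed back into the accumulator so it is not lost.
    failed = [doc for doc, res in zip(write_buffer, results)
              if not res.get("ok")]
    accumulator = failed + accumulator
    write_buffer = []

The point of the swap is that new documents keep accumulating while the
previous batch is being written, and nothing leaves memory until
CouchDB confirms it was stored.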
Well, I know you have a lot of experience, and I hope you don't take
this post as a lecture (far be it from me to lecture anyone while I
still have so much to learn), but rather as a more detailed opinion in
support of my previously expressed suggestion. I hope I didn't bore you
to death with this post.
Cheers,
CGS
On 10/27/2011 10:09 AM, Konstantin Cherkasov wrote:
I do not quite understand why you would lose one day of your customer data.
Excuse my English.
I meant "one day" = "once", in the sense that a buffer usually does not
guarantee durability.
In other words, if you choose to buffer the data, then you accept that
there is some probability the data will be lost.