On Mon, Oct 10, 2016 at 4:36 PM, Jan Lehnardt <j...@apache.org> wrote:
> > On 10 Oct 2016, at 14:59, Bogdan Andu <bog...@gmail.com> wrote:
> > yes, I know, but the couchdb storage engine cannot optimize
> > this while operating normally; the database is only optimized
> > after compaction has finished.
> > I presume that the entire btree is traversed to detect revisions and
> > btree nodes.
> > I have no revisions on documents.
> > My case clearly leans toward the unused nodes.
> > Couldn't those nodes be detected in a timely manner
> > while inserting documents (appending to the end of the file),
> > and be deleted automatically?
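As an aside, compaction itself can be triggered on demand over CouchDB's HTTP API (POST /{db}/_compact, Content-Type: application/json, admin credentials required). A minimal sketch — the host, database name, and lack of auth here are placeholders, not a real setup:

```python
import urllib.request

# Hypothetical local server and db name; adjust for a real deployment.
req = urllib.request.Request(
    "http://localhost:5984/mydb/_compact",
    data=b"{}",
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Actually sending it requires a running CouchDB with admin credentials:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())  # {"ok": true} once the compaction task is accepted
```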
> we could do that, but then we’d open ourselves up for database corruption
> during power-, hardware- or software-failures. There are sophisticated
> techniques to safeguard against that, but they come with their own set
> of trade-offs, one of which is code complexity. Other databases have
> millions of lines of code in just this area and CouchDB is <100kLoC total.
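The crash safety Jan mentions comes from the append-only discipline: a commit is only "real" once a valid marker is on disk after the data, so a torn write at the tail is simply ignored on the next open. This is a generic sketch of that idea, not CouchDB's actual file format:

```python
import os
import struct
import zlib

MAGIC = b"CMT1"

def append_commit(path, payload: bytes):
    """Append a length-prefixed record followed by a checksummed commit
    marker. If power fails mid-write, the marker is absent or corrupt
    and the record is ignored on the next open — nothing earlier in the
    file is ever touched, so nothing earlier can be corrupted."""
    with open(path, "ab") as f:
        f.write(struct.pack(">I", len(payload)) + payload)
        f.write(MAGIC + struct.pack(">I", zlib.crc32(payload)))
        f.flush()
        os.fsync(f.fileno())

def read_committed(path):
    """Return only the records whose commit marker checks out."""
    out = []
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while pos + 4 <= len(data):
        (n,) = struct.unpack_from(">I", data, pos)
        rec_end = pos + 4 + n
        marker_end = rec_end + 8
        if marker_end > len(data):
            break  # torn write at the tail: stop here
        payload = data[pos + 4:rec_end]
        magic = data[rec_end:rec_end + 4]
        (crc,) = struct.unpack_from(">I", data, rec_end + 4)
        if magic != MAGIC or crc != zlib.crc32(payload):
            break  # corrupt tail: everything before it is still intact
        out.append(payload)
        pos = marker_end
    return out
```

Reclaiming space in place would mean overwriting earlier regions of the file, which is exactly what this scheme avoids — hence the trade-off Jan describes.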
There is an interesting project called Scalaris (http://scalaris.zib.de/)
that uses the Paxos commit protocol and algorithms borrowed from torrent
technology, but they do not store the db on disk.
Another interesting technology is the Hibari database, which uses a
concept of bricks and virtual nodes.
> > But I assume that the btree must be traversed every time an insert is
> > made (or may be traversed from a few nodes above the last 100 or 1000
> > new documents).
> Yes, for individual docs it is each time; for bulk doc requests with
> somewhat sequential doc ids, it is about once per bulk.
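The amortization can be made concrete with some back-of-the-envelope arithmetic: every commit rewrites roughly one root-to-leaf path of interior nodes, so batching commits cuts the number of appended nodes by the batch size. A simplified model (the depth of 3 and the uniform-path assumption are illustrative, not measured CouchDB behavior):

```python
def nodes_appended(num_docs: int, bulk_size: int, tree_depth: int = 3) -> int:
    """Rough count of btree nodes appended to the file: each commit
    copies about one root-to-leaf path (~tree_depth nodes), so fewer
    commits means fewer copied interior nodes left behind as garbage."""
    commits = -(-num_docs // bulk_size)  # ceiling division
    return commits * tree_depth

# 10,000 docs one at a time: 10,000 commits * 3 nodes = 30,000 nodes
# 10,000 docs in bulks of 1,000: 10 commits * 3 nodes = 30 nodes
```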
> > Now the problem consists in why and how those nodes become unusable.
> > What are the conditions necessary for the db to produce dead nodes?
> As soon as a document (or set of docs in a bulk docs request) is written,
> we stop referencing existing btree nodes up the tree in the particular
but I think that no longer referencing the nodes does not mean they are
garbage-collected.
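Right — the old cells stay in the file; only the in-memory/on-disk index stops pointing at them, and compaction later rewrites just the live ones. A toy append-only store (a flat log rather than CouchDB's actual btree) makes the distinction visible:

```python
class AppendOnlyStore:
    """Toy copy-on-write store: every put appends a new cell; old cells
    remain in the log but become unreferenced — the 'garbage' that
    compaction reclaims."""

    def __init__(self):
        self.log = []    # the append-only "file": (key, value) cells
        self.index = {}  # key -> offset of the currently live cell

    def put(self, key, value):
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def garbage_cells(self):
        """Offsets present in the log but no longer referenced."""
        live = set(self.index.values())
        return [i for i in range(len(self.log)) if i not in live]

    def compact(self):
        """Rewrite only the live cells into a fresh log."""
        new_log, new_index = [], {}
        for key, off in self.index.items():
            new_log.append(self.log[off])
            new_index[key] = len(new_log) - 1
        self.log, self.index = new_log, new_index
```

After `put("a", 1); put("a", 2)` the first cell for "a" is still in the log, just unreferenced — deleting it in place would require overwriting the middle of an append-only file, which is the safety trade-off discussed above.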
> > If you could manage to avoid this, I think you would have a
> > self-compacting database.
> > Just my 2 cents.
> Again, this is a significant engineering effort. E.g. InnoDB does what
> you propose and it took 100s of millions of dollars and 10 years to get
> up to speed and reliability. CouchDB does not have these kinds of
> > just a side question: wouldn't it be nice to have multiple storage
> > engines that all follow the same replication protocol?
> We are working on this already :)
Wow, and what are the candidates for alternative backends? I presume one
is LevelDB, because everybody has it. Even Mnesia has it.