On Mon, Oct 10, 2016 at 4:36 PM, Jan Lehnardt <j...@apache.org> wrote:

>
> > On 10 Oct 2016, at 14:59, Bogdan Andu <bog...@gmail.com> wrote:
> >
> > Yes, I know, but the CouchDB storage engine cannot optimize
> > this while operating normally; the database is only optimized
> > after compaction has finished.
> >
> > I presume that the entire btree is traversed to detect revisions and
> > unused btree nodes.
> >
> > I have no old revisions on my documents.
> >
> > My case clearly leans toward the unused nodes.
> >
> > Couldn't those nodes be detected in a timely manner while inserting
> > documents (appending to the end of the file), and be deleted
> > automatically?
>
> we could do that, but then we’d open ourselves up for database corruption
> during power-, hardware- or software-failures. There are sophisticated
> techniques to safeguard against that, but they come with their own set
> of trade-offs, one of which is code complexity. Other databases have
> millions of lines of code in just this area and CouchDB is <100kLoC total.
>
There is an interesting project called Scalaris (http://scalaris.zib.de/)
that uses the Paxos commit protocol and algorithms borrowed from torrent
technology, but it does not store the db on disk.
Another interesting technology is the Hibari database, which uses a concept
of bricks and virtual nodes.


> > But I assume that the btree must be traversed every time an insert is
> > done (or may be traversed from a few nodes above the last 100 or 1000
> > new documents).
>
> Yes, for individual docs it is each time; for bulk doc requests with
> somewhat sequential doc ids, it is about once per bulk.
>
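Good to know. Just to check I understood: something like the sketch below
(Python with only the stdlib; host, db name and doc ids are made up, and I
am assuming a local install that accepts unauthenticated writes) should
rewrite the btree path roughly once per batch rather than once per document,
right?

import json
import urllib.request

# Batch 1000 documents into a single _bulk_docs request, so the btree
# path is rewritten roughly once per batch instead of once per document.
# The host and database name are made up for the example.
docs = {"docs": [{"_id": "doc-%04d" % i, "value": i} for i in range(1000)]}
req = urllib.request.Request(
    "http://localhost:5984/mydb/_bulk_docs",
    data=json.dumps(docs).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # expect 201 when the batch is accepted
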
> > Now the problem consists in why and how those nodes become unusable.
> >
> > What are the conditions necessary for the db to produce dead nodes?
>
> As soon as a document (or set of docs in a bulk docs request) is written,
> we stop referencing existing btree nodes up the tree in the particular
> branch.
>
But I think that no longer referencing the nodes does not mean they get
garbage-collected.
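
To make sure we mean the same thing, here is a toy sketch of how I picture
the append-only, copy-on-write tree (Python, not CouchDB's actual code or
on-disk format): every insert appends fresh copies of the nodes on the
root-to-leaf path, and the superseded copies stay in the file as dead space
until compaction rewrites it.

import bisect

# Toy model of an append-only file: nodes are only ever appended,
# never overwritten in place.
file_nodes = []

def append(node):
    file_nodes.append(node)
    return len(file_nodes) - 1        # "offset" of the newly written node

def make_leaf(keys):
    return append({"keys": keys})

def make_inner(seps, child_offsets):
    return append({"seps": seps, "children": child_offsets})

def insert(root_off, key):
    # Copy-on-write insert: rewrite every node on the root-to-leaf path.
    node = file_nodes[root_off]
    if "children" not in node:                  # leaf: append a copy with the new key
        return make_leaf(sorted(node["keys"] + [key]))
    i = bisect.bisect_left(node["seps"], key)   # choose the subtree to descend into
    children = list(node["children"])
    children[i] = insert(children[i], key)
    return make_inner(node["seps"], children)   # the old inner node is now dead

def reachable(off, seen=None):
    seen = set() if seen is None else seen
    seen.add(off)
    for child in file_nodes[off].get("children", []):
        reachable(child, seen)
    return seen

root = make_inner(["m"], [make_leaf(["a"]), make_leaf(["x"])])
for key in ["b", "c", "d"]:
    root = insert(root, key)

live = reachable(root)
print(len(file_nodes), "nodes in the file,", len(file_nodes) - len(live), "dead")

So after only three inserts, most of the nodes in the file are unreferenced,
and nothing short of compaction reclaims that space.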

>
>
> > If you could manage to avoid this, I think you would have a
> > self-compacting database.
> >
> > Just my 2 cents.
>
> Again, this is a significant engineering effort. E.g. InnoDB does what
> you propose and it took 100s of millions of dollars and 10 years to get
> up to speed and reliability. CouchDB does not have these kinds of
> resources.
>
> >
> > Just a side question: wouldn't it be nice to have multiple storage
> > engines (following the same replication protocol, of course)?
>
> We are working on this already :)
>
Wow, and what are the candidates for alternative backends? I presume one of
them is LevelDB, because everybody has it. Even Mnesia has it.


/Bogdan
