Yes, I know, but the CouchDB storage engine cannot optimize
this while operating normally. Only after compaction has finished is the
database optimized.
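
For reference, a minimal sketch of how compaction can be requested and how the wasted space can be read back over the HTTP API. The host, port and the sample numbers are assumptions; `db1` is the database discussed in this thread. CouchDB exposes `POST /{db}/_compact` and reports sizes in `GET /{db}` (`disk_size`/`data_size` on 1.x, a `sizes` object on 2.x). The helper names are made up for this example.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5984"  # assumption: local node on the default port
DB_NAME = "db1"                      # the database discussed in this thread


def compaction_request(base_url, db):
    """Build (but do not send) the POST that asks CouchDB to compact `db`."""
    return urllib.request.Request(
        url=f"{base_url}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_db_info(base_url, db):
    """Return the JSON document served at GET /{db} (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/{db}") as resp:
        return json.load(resp)


def wasted_fraction(db_info):
    """Share of the file that is not live data, from a GET /{db} response."""
    if "sizes" in db_info:  # 2.x style response
        file_size = db_info["sizes"]["file"]
        live = db_info["sizes"]["active"]
    else:  # 1.x style response
        file_size = db_info["disk_size"]
        live = db_info["data_size"]
    return 1 - live / file_size


# Hypothetical 1.x response, for illustration only (not measured values).
sample_info = {"disk_size": 1000, "data_size": 250}
print(f"wasted: {wasted_fraction(sample_info):.0%}")
```

To actually start a compaction one would pass the request to `urllib.request.urlopen(compaction_request(BASE_URL, DB_NAME))`; if admins are configured on the server, the request also needs admin credentials.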

I presume that the entire btree is traversed to detect revisions and unused
btree nodes.

I have no revisions on documents.

My case clearly leans toward the unused nodes.
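
A quick back-of-the-envelope check of that, using only the sizes reported in this thread (2.5 GB, 350 MB, 69.3 MB, 362849 docs). The one assumption is decimal units (1 MB = 10^6 bytes); the function names are just for this sketch.

```python
DOCS = 362_849  # document count reported in this thread

# Sizes reported in this thread, converted assuming 1 MB = 10**6 bytes.
SIZES_BYTES = {
    "1.6.1 original": 2.5e9,
    "1.6.1 compacted": 350e6,
    "2.0 replicated": 69.3e6,
}


def bytes_per_doc(size_bytes, docs=DOCS):
    """Average file bytes attributable to one document."""
    return size_bytes / docs


def dead_fraction(before, after):
    """Share of the file that compaction was able to drop."""
    return 1 - after / before


for label, size in SIZES_BYTES.items():
    print(f"{label:16s} {bytes_per_doc(size):8.0f} bytes/doc")

print(f"reclaimed by compaction on 1.6.1: {dead_fraction(2.5e9, 350e6):.0%}")
```

So roughly 6.9 KB per document before compaction versus a bit under 1 KB after, for JSON documents that are themselves well under 1 KB. Since there is only one revision per document, the difference cannot be old revisions.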

Couldn't those nodes be detected in a timely manner
while inserting documents (appending to the end of the file), and be deleted
automatically?

But I assume the btree would then have to be traversed every time an insert
is done (or maybe only from a few nodes above the last 100 or 1000 new
documents).

Now the question is why and how those nodes become unusable.

What conditions are necessary for the db to produce dead nodes?

If you could manage to avoid this, I think you would have a self-compacting
database.
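
To make the question concrete, here is a toy model of what the replies quoted below describe: a copy-on-write btree on an append-only file, where nothing already written is ever overwritten. It is only a sketch, not CouchDB's actual code. `AppendOnlyTree`, `ORDER` and `simulate` are invented for the illustration, and the real node sizes and split rules differ. Every insert appends a fresh copy of each node on the path from the leaf up to the root, so the previous copies of those nodes become dead even though no document was ever updated or deleted.

```python
import bisect
import random

ORDER = 4  # max keys per node; tiny on purpose (an assumption of this toy)


class AppendOnlyTree:
    """Immutable B+tree: every changed node is re-appended, never overwritten."""

    def __init__(self):
        self.written = 0  # nodes ever appended to the imaginary file
        self.root = self._node("leaf", [], [])

    def _node(self, kind, keys, children):
        self.written += 1  # one more node appended at the end of the file
        return (kind, tuple(keys), tuple(children))

    def insert(self, key):
        result = self._insert(self.root, key)
        if len(result) == 1:
            self.root = result[0]
        else:  # the old root split, so a new root goes on top
            left, sep, right = result
            self.root = self._node("inner", [sep], [left, right])

    def _insert(self, node, key):
        kind, keys, children = node
        keys, children = list(keys), list(children)
        if kind == "leaf":
            bisect.insort(keys, key)
            if len(keys) <= ORDER:
                return (self._node("leaf", keys, []),)
            mid = len(keys) // 2  # split; the separator stays in the right leaf
            return (self._node("leaf", keys[:mid], []), keys[mid],
                    self._node("leaf", keys[mid:], []))
        idx = bisect.bisect_right(keys, key)
        result = self._insert(children[idx], key)
        if len(result) == 1:
            children[idx] = result[0]
        else:
            left, sep, right = result
            children[idx:idx + 1] = [left, right]
            keys.insert(idx, sep)
        if len(keys) <= ORDER:
            return (self._node("inner", keys, children),)
        mid = len(keys) // 2  # split; the middle key moves up to the parent
        return (self._node("inner", keys[:mid], children[:mid + 1]), keys[mid],
                self._node("inner", keys[mid + 1:], children[mid + 1:]))

    def live_nodes(self):
        """Count nodes still reachable from the current root."""
        stack, count = [self.root], 0
        while stack:
            _, _, children = stack.pop()
            count += 1
            stack.extend(children)
        return count

    def all_keys(self, node=None):
        """Return every stored key in sorted order (sanity check)."""
        kind, keys, children = self.root if node is None else node
        if kind == "leaf":
            return list(keys)
        return [k for child in children for k in self.all_keys(child)]


def simulate(n_docs, seed=0):
    """Insert n_docs distinct random keys, one write per document."""
    rng = random.Random(seed)
    keys = rng.sample(range(10 * n_docs), n_docs)
    tree = AppendOnlyTree()
    for k in keys:
        tree.insert(k)
    return tree, keys


tree, keys = simulate(1000)
live = tree.live_nodes()
print(f"nodes written: {tree.written}, still live: {live}, "
      f"dead: {1 - live / tree.written:.0%}")
```

In this model most of the nodes ever written are already dead after an insert-only workload, which suggests the condition for producing dead nodes is simply "a write happened". Avoiding them would mean updating nodes in place, which gives up the append-only property.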

Just my 2 cents.

Just a side question: wouldn't it be nice to have multiple storage engines
that all follow the same replication protocol, of course?

/Bogdan

On Mon, Oct 10, 2016 at 3:21 PM, Jan Lehnardt <[email protected]> wrote:

>
> > On 10 Oct 2016, at 12:42, Bogdan Andu <[email protected]> wrote:
> >
> > but that still does not explain the huge difference in size between
> > (1) and (2), given that the docs are simple JSON and no document
> > ever had 2 revisions.
>
> That is just how CouchDB works. It never overwrites any data it has on
> disk, that includes all the btree nodes that are obviated with each
> new document write.
>
> >
> > Compaction means discarding all revisions but the latest (newest)
>
> Compaction also means removing unused btree nodes.
>
> Best
> Jan
> --
>
> >
> > /Bogdan
> >
> > On Mon, Oct 10, 2016 at 1:05 PM, Jan Lehnardt <[email protected]> wrote:
> >
> >>
> >>> On 10 Oct 2016, at 11:54, Bogdan Andu <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I return with updated info :
> >>>
> >>> I compacted db1 (CouchDB/1.6.1) on the source and it is now 350 MB,
> >>> down from 2.5 GB, with 362849 documents.
> >>> I also compacted the views, but that made no difference in size.
> >>>
> >>> The database stores documents of the following form:
> >>>
> >>> {
> >>>  "_id": "00006df04672a0c0e0da142ad8cd90b9",
> >>>  "_rev": "1-a14afd34d5a52e3f6ae515c9adcff2d3",
> >>>  "local_id": "110361",
> >>>  "email": "[email protected]",
> >>>  "sent_date": "2007-06-29 12:20:31",
> >>>  "regtype": "n"
> >>> }
> >>>
> >>> That is a huge difference between 2.5 GB and 350 MB, and the
> >>> documents had no revisions.
> >>>
> >>> If Couch is able to reduce a db's size by this much through compaction,
> >>> why can it not maintain approximately the same size during
> >>> normal operations (there are no deletions, no updates, only insertions)?
> >>
> >> For CouchDB, compaction is considered normal operation.
> >>
> >>>
> >>> Maybe the b-tree is optimized only during compaction, and not during
> >>> repeated insertions (approx. 2000 insertions/day).
> >>>
> >>> And for the sake of completeness:
> >>>
> >>> after replication to CouchDB 2.0, the same database
> >>> (generating the views took ~20 minutes for 362849 docs) has:
> >>>
> >>> 69.3 MB / 362849 documents
> >>>
> >>> Now the big surprise is the huge difference from the
> >>> size that resulted after compaction on 1.6.1.
> >>>
> >>>
> >>> To summarize:
> >>>
> >>>     version  state                 size     docs
> >>> (1) 1.6.1    original              2.5 GB   362849
> >>> (2) 1.6.1    compacted             350 MB   362849
> >>> (3) 2.0      replicated from (1)   69.3 MB  362849
> >>
> >> These numbers confirm the significant improvements that were done
> >> to the compactor for 2.0. I’m glad it’s showing for you :)
> >>
> >> https://blog.couchdb.org/2016/08/10/
> >>
> >> Best
> >> Jan
> >> --
> >>
> >>
> >>>
> >>> /Bogdan
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Oct 7, 2016 at 4:43 PM, Adam Kocoloski <[email protected]> wrote:
> >>>
> >>>> Lots of good questions there.
> >>>>
> >>>> On the storage size, note that even when you write only one revision of
> >>>> each document the database will accumulate some wasted space. Inserts to
> >>>> the database cause internal btree structures to be updated, and due to the
> >>>> copy-on-write nature of the storage engine the old btree nodes are left
> >>>> around in the file.
> >>>>
> >>>> We did make some changes in the compaction system that produce smaller
> >>>> files at the end of the day. You can read more about those changes at
> >>>> https://blog.couchdb.org/2016/08/10/feature-compaction/, but they don’t
> >>>> explain the difference that you reported. Perhaps you didn’t compact the
> >>>> source database at all?
> >>>>
> >>>> You are correct that both design documents and mango will build
> >>>> btree-based indexes to answer their queries. I would like to see us add
> >>>> functionality to mango over time so that it can cover the large majority of
> >>>> use cases where folks need to appeal to views in design documents, but
> >>>> we’re not quite there yet. One example where mango cannot help you today is
> >>>> reduce functions; if you want to aggregate the values in your index you
> >>>> need to drop down and build a view for that.
> >>>>
> >>>> In terms of performance, mango should be moderately faster at building an
> >>>> index because there’s no JavaScript roundtrip. Querying performance should
> >>>> be ~identical. Cheers,
> >>>>
> >>>> Adam
> >>>>
> >>>>> On Oct 7, 2016, at 7:56 AM, Thanos Vassilakis <[email protected]> wrote:
> >>>>>
> >>>>> Good questions
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On Oct 7, 2016, at 5:29 AM, Bogdan Andu <[email protected]> wrote:
> >>>>>>
> >>>>>> I see the data management is totally different (and better).
> >>>>>> Now there is a _dbs.couch, a registry-like database of databases,
> >>>>>> and the actual databases are located in data/shards subdirectories.
> >>>>>>
> >>>>>> So only replication works here,
> >>>>>> and one can replicate many databases in parallel.
> >>>>>>
> >>>>>> Another difference I see is the size of the databases.
> >>>>>>
> >>>>>> Version 2.0 keeps databases very small compared to version 1.6.1.
> >>>>>>
> >>>>>> Is there any change in the storage engine that makes such a big
> >>>>>> difference in database sizes?
> >>>>>>
> >>>>>> All records in db1 in 1.6.1 have only one revision, in the (1-...) format.
> >>>>>>
> >>>>>> db1 in 1.6.1 is 2.5GB with 362849 records
> >>>>>> after replication:
> >>>>>> db1 in 2.0 has 69.3 MB with 362849 records
> >>>>>>
> >>>>>> When is it recommended to use design documents, and when mango queries?
> >>>>>> Is mango intended to replace design documents? I assume both
> >>>>>> build a view tree for the query in question.
> >>>>>>
> >>>>>> Which one is faster?
> >>>>>> What are the use cases for each of the query methods?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Bogdan
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Fri, Oct 7, 2016 at 11:20 AM, max <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Install version 2.0 on another server, or just make it listen on a
> >>>>>>> different port than 1.6, then replicate your data ;)
> >>>>>>>
> >>>>>>> 2016-10-07 9:49 GMT+02:00 Bogdan Andu <[email protected]>:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I configured a single-node CouchDB 2.0 instance and
> >>>>>>>> copied the 1.6.1 couch databases into its data directory.
> >>>>>>>>
> >>>>>>>> But the databases do not show up in Fauxton, only the
> >>>>>>>> test databases:
> >>>>>>>>
> >>>>>>>> ["_global_changes","_replicator","_users","verifytestdb"].
> >>>>>>>>
> >>>>>>>> Is there a way to make CouchDB 2.0 read 1.6.1 couch files
> >>>>>>>> without importing?
> >>>>>>>>
> >>>>>>>> /Bogdan
> >>>>>>>
> >>>>
> >>>>
> >>
> >> --
> >> Professional Support for Apache CouchDB:
> >> https://neighbourhood.ie/couchdb-support/
> >>
> >>
>
