Updated DEB / RPM packages and container images for 3.2.1

2022-03-02 Thread Adam Kocoloski
Hi all, we’ve published updated convenience binaries and images for CouchDB 
3.2.1 to Artifactory (DEB/RPM) and Docker Hub. Notable changes include

- Upgrading Erlang 20 to Erlang 23
- Upgrading Debian 10 to Debian 11 as the base for the Docker image
- Adding ppc64le support (alongside amd64 and arm64) for the Debian package and 
container image

The Erlang upgrade in particular should resolve a reported issue where CouchDB 
instances cannot replicate with a subset of TLS-protected endpoints, including 
the NPM registry. Enjoy!
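
If you just want to grab the new builds, something along these lines should work
(the official image name on Docker Hub is assumed to be "couchdb"):

# Container image
docker pull couchdb:3.2.1

# Debian/Ubuntu hosts that already have the CouchDB apt repository configured
sudo apt-get update && sudo apt-get install --only-upgrade couchdb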

Adam



Re: Setting up smoosh for database compaction

2021-08-18 Thread Adam Kocoloski
Hi Paul, sorry to hear you’re finding it a challenge to configure. The default 
configuration described in the documentation does give you an example of how 
things are set up:

https://docs.couchdb.org/en/3.1.1/maintenance/compaction.html#channel-configuration

Cross-referenced from that section you can find the full configuration 
reference that describes all the supported configuration keys at the channel 
level:

https://docs.couchdb.org/en/3.1.1/config/compaction.html#config-compactions

The general idea is that you create [smoosh.<channel_name>] configuration blocks 
with whatever settings you deem appropriate to match a certain set of files and 
prioritize them, and then use the [smoosh] block to activate those channels.
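
For instance, defining and activating an extra channel over the config API might
look roughly like this (the channel name "big_dbs" and the thresholds are purely
illustrative):

# Create a hypothetical ratio-based channel for large databases
curl -X PUT http://admin:password@localhost:5984/_node/_local/_config/smoosh.big_dbs/priority -d '"ratio"'
curl -X PUT http://admin:password@localhost:5984/_node/_local/_config/smoosh.big_dbs/min_size -d '"1073741824"'

# Activate it alongside the default database channels
curl -X PUT http://admin:password@localhost:5984/_node/_local/_config/smoosh/db_channels -d '"upgrade_dbs,ratio_dbs,slack_dbs,big_dbs"'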

Can you say a little more about what you’re finding lacking in the docs? Cheers,

Adam

> On Aug 18, 2021, at 2:58 AM, Paul Milner  wrote:
> 
> Hello
> 
> I'm looking at the maintenance of my databases and how I could implement
> tools to do that. Smoosh seems to be the main option, but I'm struggling to
> set it up as the documentation seems a bit limited.
> 
> I have only really found this:
> 
> 5.1. Compaction — Apache CouchDB® 3.1 Documentation
> 
> 
> I could do it manually but wanted to explore this first and was wondering
> if there are any smoosh examples about, that could help me on my way?
> 
> If anyone could point me in the right direction please, I would appreciate
> it.
> 
> Thanks a lot
> Best regards
> Paul



Re: CouchDB and RabbitMQ clusters

2021-07-15 Thread Adam Kocoloski
That’s typically how it works for a well-behaved Erlang application, yes. 
CouchDB does work this way; I’m not 100% certain about RabbitMQ but it probably 
does as well. Cheers,

Adam
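
P.S. If you want to verify on a given box, epmd itself can list whatever has
registered with it (the output shown is just an example):

# Show the Erlang node names registered with the local epmd
epmd -names
# e.g.
#   name couchdb at port 44551
#   name rabbit at port 25672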

> On Jul 15, 2021, at 5:11 AM, Andrea Brancatelli 
>  wrote:
> 
> Hello everybody, 
> 
> I have a general Erlang question but I think you could help me with
> that... 
> 
> I need to run CouchDB and RabbitMQ on the same set of (three) nodes, all
> clustered together. 
> 
> What happens with epmd? Erlang's documentation
> (https://erlang.org/doc/man/epmd.html) is pretty vague: "The daemon is
> started automatically by command erl(1) [1] if the node is to be
> distributed and no running instance is present."... 
> 
> So what happens? The first one between Couch and Rabbit who starts opens
> epmd and the second one just hooks to the already running copy?
> 
> Thanks. 
> 
> -- 
> 
> Andrea Brancatelli
> 
> 
> 
> Links:
> --
> [1] https://erlang.org/doc/man/erl.html



Re: couchdb 3.X partitioning

2020-07-07 Thread Adam Kocoloski
The tricky part is that partitioned databases have a hard requirement that 
document IDs contain a “:” to demarcate the partition from the rest of the 
document ID. Replication can’t change document IDs, but if the source database 
happens to fulfill that requirement for all of its documents (excluding _design 
documents), then you could create a partitioned database on the target and 
replicate into it. But that’s a pretty unlikely coincidence.

Switching to partitioned databases is unfortunately more likely to require an 
external ETL job.
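
If your source really does satisfy that ID requirement, the rough shape would be
something like the sketch below (database names and credentials are made up, and
the selector is just one way to leave _design documents behind):

# Create the target as a partitioned database
curl -X PUT 'http://admin:password@localhost:5984/orders_partitioned?partitioned=true'

# Replicate into it, skipping design documents
curl -X POST http://admin:password@localhost:5984/_replicate \
     -H 'Content-Type: application/json' \
     -d '{"source": "http://admin:password@localhost:5984/orders",
          "target": "http://admin:password@localhost:5984/orders_partitioned",
          "selector": {"_id": {"$regex": "^[^_]"}}}'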

Adam

> On Jul 7, 2020, at 8:30 AM, Jan Lehnardt  wrote:
> 
> Hi Sharath,
> 
>> On 7. Jul 2020, at 14:17, Sharath  wrote:
>> 
>> Hi,
>> 
>> Got couchdb 3.1 running and migrated my database (replicated) over.
>> 
>> Read about partitioning and have the following questions:
>> 
>> Can a partitioned database be created when replicating from another couchdb
>> instance?
> 
> Do you mean with the `create_target: true` option? Probably not, but you can
> create the database yourself as partitioned and then replicate over.
> 
> Best
> Jan
> —
> 
>> 
>> [I think not but have to ask]
>> 
>> thanks
>> Sharath



Re: How to work with Indexes

2020-06-04 Thread Adam Kocoloski
Hi Piotr,

To first order, yes, I’d try to have every query an application issues be 
satisfied by an index.

If you’ve got some queries that you run infrequently in the background those 
may not warrant an index, as the resources required to keep the index 
up-to-date would be greater than the cost of just scanning the entire database.
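
In your example the sort on version matters too, so an index covering both the
selector field and the sort field is the one to add (database name is
hypothetical):

curl -X POST http://admin:password@localhost:5984/mydb/_index \
     -H 'Content-Type: application/json' \
     -d '{"index": {"fields": ["historyId", "version"]},
          "name": "historyId-version-json-index",
          "type": "json"}'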

Cheers, Adam

> On Jun 4, 2020, at 8:53 AM, Piotr Zarzycki  wrote:
> 
> Hello everyone,
> 
> We are building JS based application which storing data in CouchDb. It is a
> greenfield application in case of front end and database structure. I would
> use some examples at the beginning.
> 
> I have some selector which gets me data from db:
> 
> const q = {
>  selector: {
>historyId: document.historyId,
>  },
>  sort: [{ version: "desc" }],
>  limit: 500,
>};
> 
> 
> In the results I will get data along with warning:
> 
> "warning": "no matching index found, create an index to optimize query time"
> 
> 
> I can easily get rid of that warning by adding using Fauxton index:
> 
> {
>   "index": {
>  "fields": [
> "historyId"
>  ]
>   },
>   "name": "historyId-json-index",
>   "type": "json"
> }
> 
> 
> My question is - How do you guys approach to working with indexes ? Should
> I add it to each query which I'm doing ?
> 
> Thoughts about approaches would be much appreciated. :)
> 
> Thanks,
> -- 
> 
> Piotr Zarzycki
> 
> Patreon: *https://www.patreon.com/piotrzarzycki
> *



Re: running couchdb on a .app domain (https enforced)

2019-08-12 Thread Adam Kocoloski
This means something else is already listening on one of the ports that CouchDB 
is trying to use.
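
To track down the culprit, checking the ports CouchDB binds by default (5984 for
the HTTP interface, 4369 for epmd, 5986 for the node-local port on 2.x) is a
reasonable start:

# Linux: list listeners on the usual CouchDB-related ports
sudo ss -ltnp | grep -E ':5984|:5986|:4369'
# or with lsof
sudo lsof -iTCP:5984 -sTCP:LISTEN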

Adam

> On Aug 12, 2019, at 3:24 AM, Rene Veerman  wrote:
> 
> eaddrinuse



Re: High memory consumption of a single node CouchDB server

2019-06-18 Thread Adam Kocoloski
>>>>>>> 
>>>>>>>> Hey guys. I bet it's a mailbox leaking memory. I am very
>>> interested
>>>>> in
>>>>>>>> debugging issues like this too.
>>>>>>>> 
>>>>>>>> I can suggest to get an erlang shell and run these commands to
>>> see
>>>>> the
>>>>>>> top
>>>>>>>> memory consuming processes
>>>>>>>> 
>>> https://www.mail-archive.com/user@couchdb.apache.org/msg29365.html
>>>>>>>> 
>>>>>>>> One issue I will be reporting soon is if one of your nodes is
>>> down
>>>>> for
>>>>>>> some
>>>>>>>> amount of time, it seems like all databases independently try
>> and
>>>>> retry
>>>>>>> to
>>>>>>>> query the missing node and fail, resulting in printing a lot of
>>>> logs
>>>>>> for
>>>>>>>> each db which can overwhelm the logger process. If you have a
>> lot
>>>> of
>>>>>> DBs
>>>>>>>> this makes the problem worse, but it doesn't happen right away
>>> for
>>>>> some
>>>>>>>> reason.
>>>>>>>> 
>>>>>>>> On Fri, Jun 14, 2019 at 4:25 PM Adrien Vergé <
>>>>> adrien.ve...@tolteck.com
>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Jérôme and Adam,
>>>>>>>>> 
>>>>>>>>> That's funny, because I'm investigating the exact same
>> problem
>>>>> these
>>>>>>>> days.
>>>>>>>>> We have a two CouchDB setups:
>>>>>>>>> - a one-node server (q=2 n=1) with 5000 databases
>>>>>>>>> - a 3-node cluster (q=2 n=3) with 5 databases
>>>>>>>>> 
>>>>>>>>> ... and we are experiencing the problem on both setups. We've
>>>> been
>>>>>>> having
>>>>>>>>> this problem for at least 3-4 months.
>>>>>>>>> 
>>>>>>>>> We've monitored:
>>>>>>>>> 
>>>>>>>>> - The number of open files: it's relatively low (both the
>>>> system's
>>>>>>> total
>>>>>>>>> and or fds opened by beam.smp).
>>>>>>>>>  https://framapic.org/wQUf4fLhNIm7/oa2VHZyyoPp9.png
>>>>>>>>> 
>>>>>>>>> - The usage of RAM, total used and used by beam.smp
>>>>>>>>>  https://framapic.org/DBWIhX8ZS8FU/MxbS3BmO0WpX.png
>>>>>>>>>  It continuously grows, with regular spikes, until killing
>>>> CouchDB
>>>>>>> with
>>>>>>>> an
>>>>>>>>> OOM. After restart, the RAM usage is nice and low, and no
>>> spikes.
>>>>>>>>> 
>>>>>>>>> - /_node/_local/_system metrics, before and after restart.
>>> Values
>>>>>> that
>>>>>>>>> significantly differ (before / after restart) are listed
>> here:
>>>>>>>>>  - uptime (obviously ;-))
>>>>>>>>>  - memory.processes : + 3732 %
>>>>>>>>>  - memory.processes_used : + 3735 %
>>>>>>>>>  - memory.binary : + 17700 %
>>>>>>>>>  - context_switches : + 17376 %
>>>>>>>>>  - reductions : + 867832 %
>>>>>>>>>  - garbage_collection_count : + 448248 %
>>>>>>>>>  - words_reclaimed : + 112755 %
>>>>>>>>>  - io_input : + 44226 %
>>>>>>>>>  - io_output : + 157951 %
>>>>>>>>> 
>>>>>>>>> Before CouchDB restart:
>>>>>>>>> {
>>>>>>>>>  "uptime":2712973,
>>>>>>>>>  "memory":{
>>>>>>>>>"other":7250289,
>>>>>>>>>"atom":512625,
>>>>>>>>>"atom_used":510002,
>>>>>>>>>"processes":1877591424,
>&g

Re: High memory consumption of a single node CouchDB server

2019-06-14 Thread Adam Kocoloski
Hi Adrien,

There are some additional metrics in the _system output that you omitted 
regarding message queue lengths and process counts. Did you see any significant 
difference in those?

The reason I’m asking is to try and figure out whether a small set of known 
processes within the Erlang VM are consuming a lot of memory (possibly because 
they have large message backlogs), or whether you might have a large number of 
processes hanging around and never getting cleaned up.

Aside from the memory numbers, most of the other metrics you pointed out 
(context_switches, reductions, etc.) are simple counters and so they’re only 
really useful when you look at their derivative.
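
If it helps, those fields can be pulled straight out of the _system endpoint and
compared between snapshots (jq is just a convenience here):

curl -s http://admin:password@localhost:5984/_node/_local/_system \
  | jq '{process_count: .process_count, message_queues: .message_queues}'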

Adam

> On Jun 14, 2019, at 9:24 AM, Adrien Vergé  wrote:
> 
> Hi Jérôme and Adam,
> 
> That's funny, because I'm investigating the exact same problem these days.
> We have a two CouchDB setups:
> - a one-node server (q=2 n=1) with 5000 databases
> - a 3-node cluster (q=2 n=3) with 5 databases
> 
> ... and we are experiencing the problem on both setups. We've been having
> this problem for at least 3-4 months.
> 
> We've monitored:
> 
> - The number of open files: it's relatively low (both the system's total
> and or fds opened by beam.smp).
>  https://framapic.org/wQUf4fLhNIm7/oa2VHZyyoPp9.png
> 
> - The usage of RAM, total used and used by beam.smp
>  https://framapic.org/DBWIhX8ZS8FU/MxbS3BmO0WpX.png
>  It continuously grows, with regular spikes, until killing CouchDB with an
> OOM. After restart, the RAM usage is nice and low, and no spikes.
> 
> - /_node/_local/_system metrics, before and after restart. Values that
> significantly differ (before / after restart) are listed here:
>  - uptime (obviously ;-))
>  - memory.processes : + 3732 %
>  - memory.processes_used : + 3735 %
>  - memory.binary : + 17700 %
>  - context_switches : + 17376 %
>  - reductions : + 867832 %
>  - garbage_collection_count : + 448248 %
>  - words_reclaimed : + 112755 %
>  - io_input : + 44226 %
>  - io_output : + 157951 %
> 
> Before CouchDB restart:
> {
>  "uptime":2712973,
>  "memory":{
>"other":7250289,
>"atom":512625,
>"atom_used":510002,
>"processes":1877591424,
>"processes_used":1877504920,
>"binary":177468848,
>"code":9653286,
>"ets":16012736
>  },
>  "run_queue":0,
>  "ets_table_count":102,
>  "context_switches":1621495509,
>  "reductions":968705947589,
>  "garbage_collection_count":331826928,
>  "words_reclaimed":269964293572,
>  "io_input":8812455,
>  "io_output":20733066,
>  ...
> 
> After CouchDB restart:
> {
>  "uptime":206,
>  "memory":{
>"other":6907493,
>"atom":512625,
>"atom_used":497769,
>"processes":49001944,
>"processes_used":48963168,
>"binary":997032,
>"code":9233842,
>"ets":4779576
>  },
>  "run_queue":0,
>  "ets_table_count":102,
>  "context_switches":1015486,
>  "reductions":111610788,
>  "garbage_collection_count":74011,
>  "words_reclaimed":239214127,
>  "io_input":19881,
>  "io_output":13118,
>  ...
> 
> Adrien
> 
> On Fri, Jun 14, 2019 at 15:11, Jérôme Augé  wrote:
> 
>> Ok, so I'll setup a cron job to journalize (every minute?) the output from
>> "/_node/_local/_system" and wait for the next OOM kill.
>> 
>> Any property from "_system" to look for in particular?
>> 
>> Here is a link to the memory usage graph:
>> https://framapic.org/IzcD4Y404hlr/06rm0Ji4TpKu.png
>> 
>> The memory usage varies, but the general trend is to go up with some
>> regularity over a week until we reach OOM. When "beam.smp" is killed, it's
>> reported as consuming 15 GB (as seen in the kernel's OOM trace in syslog).
>> 
>> Thanks,
>> Jérôme
>> 
>> On Fri, Jun 14, 2019 at 13:48, Adam Kocoloski  wrote:
>> 
>>> Hi Jérôme,
>>> 
>>> Thanks for a well-written and detailed report (though the mailing list
>>> strips attachments). The _system endpoint provides a lot of useful data
>> for
>>> debugging these kinds of situations; do you have a snapshot of the output
>>> when the system was consuming a lot of memory?
>>> 
>>> 
>>> 
>> http://docs.couchdb.org/en/stable/api/server/common.html#node-node-name-system
>>> 
>>> A

Re: High memory consumption of a single node CouchDB server

2019-06-14 Thread Adam Kocoloski
Hi Jérôme,

Thanks for a well-written and detailed report (though the mailing list strips 
attachments). The _system endpoint provides a lot of useful data for debugging 
these kinds of situations; do you have a snapshot of the output when the system 
was consuming a lot of memory?

http://docs.couchdb.org/en/stable/api/server/common.html#node-node-name-system

Adam

> On Jun 14, 2019, at 5:44 AM, Jérôme Augé  wrote:
> 
> Hi,
> 
> I'm having a hard time figuring out the high memory usage of a CouchDB server.
> 
> What I'm observing is that the memory consumption from the "beam.smp" process 
> gradually rises until it triggers the kernel's OOM (Out-Of-Memory) which kill 
> the "beam.smp" process.
> 
> It also seems that many databases are not compacted: I've made a script to 
> iterate over the databases to compute de fragmentation factor, and it seems I 
> have around 2100 databases with a frag > 70%.
> 
> We have a single CouchDB v2.1.1server (configured with q=8 n=1) and around 
> 2770 databases.
> 
> The server initially had 4 GB of RAM, and we are now with 16 GB w/ 8 vCPU, 
> and it still regularly reaches OOM. From the monitoring I see that with 16 GB 
> the OOM is almost triggered once per week (c.f. attached graph).
> 
> The memory usage seems to increase gradually until it reaches OOM.
> 
> The Couch server is mostly used by web clients with the PouchDB JS API.
> 
> We have ~1300 distinct users and by monitoring the netstat/TCP established 
> connections I guess we have around 100 (maximum) users at any given time. 
> From what I understanding of the application's logic, each user access 2 
> private databases (read/write) + 1 common database (read-only).
> 
> On-disk usage of CouchDB's data directory is around 40 GB.
> 
> Any ideas on what could cause such behavior (increasing memory usage over the 
> course of a week)? Or how to find what is happening behind the scene?
> 
> Regards,
> Jérôme


Re: Help in understanding an error message

2019-06-06 Thread Adam Kocoloski
Hi Andrea,

That message is a timeout from one of the workers tasked with reading a 
database shard and responding with that shard’s portion of the response to an 
_all_docs request.

CouchDB spins up redundant workers for each copy of a shard when you make a 
request to _all_docs, so it’s entirely possible that this timeout gets masked 
by the redundancy and you don’t see any user-facing consequences. It is a 
little odd — normally CouchDB will gracefully shut down the workers that it 
doesn’t need. But on its own I would say this is not a major cause for concern.

Adam

> On Jun 6, 2019, at 4:18 AM, Andrea Brancatelli 
>  wrote:
> 
> From time to time I get these errors in my CouchDB log. 
> 
> Can anyone help to understand what it's complaining about? 
> 
> [error] 2019-06-05T14:48:57.184907Z couchdb@10.133.79.176 <0.18022.1113>
>  rexi_server: from: couchdb@10.133.79.175(<12084.22463.1153>)
> mfa: fabric_rpc:all_docs/3 exit:timeout
> [{rexi,init_stream,1,[{file,"src/rexi.erl"},{line,265}]},{rexi,stream2,3,[{file,"src/rexi.erl"},{line,205}]},{fabric_rpc,view_cb,2,[{file,"src/fabric_rpc.erl"},{line,462}]},{couch_mrview,map_fold,3,[{file,"src/couch_mrview.erl"},{line,526}]},{couch_bt_engine,include_reductions,4,[{file,"src/couch_bt_engine.erl"},{line,1074}]},{couch_bt_engine,skip_deleted,4,[{file,"src/couch_bt_engine.erl"},{line,1069}]},{couch_btree,stream_kv_node2,8,[{file,"src/couch_btree.erl"},{line,848}]},{couch_btree,stream_kp_node,8,[{file,"src/couch_btree.erl"},{line,819}]}]
> 
> It's a CouchDB 2.3.1 on FreeBSD 11.2 release p9, with Erlang 21.3.8.1 
> 
> Thanks. 
> 
> -- 
> 
> Andrea Brancatelli



Re: How to notify when an update happened in the document

2019-05-13 Thread Adam Kocoloski
Hi,

The /db/_changes endpoint will inform you of updates to documents in CouchDB. 
If you want the content of the current “winning” revision of the document to be 
included with the notification, add include_docs=true to the query string when 
you make the request. Cheers,

Adam
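
P.S. A minimal example, following the feed continuously with the winning
revision bodies included (database name is hypothetical):

curl -N 'http://admin:password@localhost:5984/mydb/_changes?feed=continuous&include_docs=true&since=now&heartbeat=10000'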

> On May 13, 2019, at 10:48 AM, Soheil Pourbafrani  
> wrote:
> 
> Hi,
> 
> Using a stream processing engine I need CouchDB data to find out how to
> perform processing on the incomming data. The CouchDB data can be updated.
> I don't want to read data from CouchDB on each incomming message
> processing. So does CouchDB provide any feature for this?



Re: Shards-directory still big after deleting big database in Fauxton

2019-05-12 Thread Adam Kocoloski
Hi Frank,

Thanks for the followup. Definitely appreciate that the clustering feature adds 
complexity and is not appropriate for everyone. The only problem with running 
1.x is that we are not providing any updates at all to that release series - 
even security patches.

Was there something in particular that led you to downgrade?

Cheers, Adam

> On May 11, 2019, at 3:42 AM, Frank Röhm  wrote:
> 
> In the end I indeed downgraded to couchdb v1.5 because I don’t use all this 
> cluster feature and prefer to handle one file for each db ;)
> So all is running again but with couch 1.5 on Ubuntu 14.04
> And with Futon instead of Fauxton. 
> 
> Thanks.
> 
> frank
> 
>> On May 2, 2019, at 15:21, Jan Lehnardt  wrote:
>> 
>> Glad this worked out. Quick tip then, unless you run this on an 8-core (or 
>> more) machine, you might want to look into reducing your q for this 
>> database. q=2 or $num_cores is a good rule of thumb. You can use our 
>> couch-continuum tool to migrate an existing db: 
>> https://npmjs.com/couch-continuum
>> 
>> Cheers
>> Jan
>> —
>> 
>>> On 2. May 2019, at 14:17, Frank Röhm  wrote:
>>> 
>>> OK, I found it.
>>> In the 8 shards subdirectories (from -1fff to 
>>> e000-) there was still 8 frwiki directories 
>>> (frwiki.1510609658.couch) with each 5 GB.
>>> I deleted them with:
>>> 
>>> find . -name frwiki.1510609658.couch -delete
>>> 
>>> from the shards dir and gone they are.
>>> Hopefully it won’t affect my CouchDB, but as I heard this is very robust ;)
>>> 
>>> I think I can stick to the v2.x now, no need to downgrade now, ouff.
>>> 
>>> frank
>>> 
 On May 2, 2019, at 07:25, Joan Touzet  wrote:
 
 Look for a .deleted directory under your data/ directory. The files may 
 not have been deleted but moved aside due to the enable_database_recovery 
 setting, or because the DB was still in use when you restarted CouchDB.
 
 Another useful command is:
 
 $ du -sh /opt/couchdb/data/*
 
 which should tell you where the storage is being used. Does this show 
 anything useful to you?
 
 -Joan
 
> On 2019-05-01 2:22 p.m., Frank Walter wrote:
> Hello
> I have CouchDB v2.3.1 (on Ubuntu 14.04) and I use it only for creating
> Wikipedia databases with mwscrape.
> My shards folder was too big, over 50 GB big, so I deleted one big db
> (frwiki) which had 32 GB in Fauxton. That db is gone now.
> After this, I thought now my shards folder should be about 20 GB but it
> is still 52 GB.
> I don't find any documentation about that in the CouchDB Doc.
> I restarted CouchDB (/etc/init.d/couchdb restart) but nothing changes.
> How can I reduce the size of shards? How can I get rid of this ghost-db?
> My next step would be, if I cannot solve this issue, to uninstall
> CouchDB 2.x and reinstall 1.x, because I dont need that feature of
> cluster server anyway. I see only inconvenience for my use.
> Thanks
> frank
>>> 
>> 
> 



Re: Disk full

2019-05-02 Thread Adam Kocoloski
Hi Willem,

Good question. CouchDB has a 100% copy-on-write storage engine, including for 
all updates to btree nodes, etc., so any update to the database will 
necessarily increase the file size before compaction. Looking at your info I 
don’t see a heavy source of updates, so it is a little puzzling.
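
For what it’s worth, comparing sizes.file and sizes.active in the database info
is a quick way to estimate how much space compaction could reclaim (jq is just a
convenience here):

curl -s http://localhost:5984/xxx_1590 \
  | jq '{file: .sizes.file, active: .sizes.active, fraction_active: (.sizes.active / .sizes.file)}'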

Adam


> On May 2, 2019, at 12:53 PM, Willem Bison  wrote:
> 
> Hi Adam,
> 
> I ran "POST compact" on the DB mentioned in my post and 'disk_size' went
> from 729884227 (yes, it had grown that much in 1 hour !?) to 1275480.
> 
> Wow.
> 
> I disabled compacting because I thought it was useless in our case since
> the db's and the docs are so small. I do wonder how it is possible for a db
> to grow so much when its being deleted several times a week. What is all
> the 'air' ?
> 
> On Thu, 2 May 2019 at 18:31, Adam Kocoloski  wrote:
> 
>> Hi Willem,
>> 
>> Compaction would certainly reduce your storage space. You have such a
>> small number of documents in these databases that it would be a fast
>> operation.  Did you try it and run into issues?
>> 
>> Changing cluster.q shouldn’t affect the overall storage consumption.
>> 
>> Adam
>> 
>>> On May 2, 2019, at 12:15 PM, Willem Bison  wrote:
>>> 
>>> Hi,
>>> 
>>> Our CouchDb 2.3.1 standalone server (AWS Ubuntu 18.04) is using a lot of
>>> disk space, so much so that it regularly causes a disk full and a crash.
>>> 
>>> The server contains approximately 100 databases each with a reported
>>> (Fauxton) size of less than 2.5Mb and less than 250 docs. Yesterday the
>>> 'shards' folders combined exceeded a total 14G causing the server to
>> crash.
>>> 
>>> The server is configured with
>>> cluster.n = 1 and
>>> cluster.q = 8
>>> because that was suggested during setup.
>>> 
>>> When I write this the 'shards' folders look like this:
>>> /var/lib/couchdb/shards# du -hs *
>>> 869M -1fff
>>> 1.4G 2000-3fff
>>> 207M 4000-5fff
>>> 620M 6000-7fff
>>> 446M 8000-9fff
>>> 458M a000-bfff
>>> 400M c000-dfff
>>> 549M e000-
>>> 
>>> One of the largest files is this:
>>> curl localhost:5984/xxx_1590
>>> {
>>>   "db_name": "xxx_1590",
>>>   "purge_seq":
>>> 
>> "0-g1FTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFNSApBMqv___39WIgMedXksQJKhAUgBlc4nRu0DiFoC5iYpgOy3J9L-BRAz9-NXm8iQJE_YYgeQxfFEWnwAYvF9oNosADncXo4",
>>>   "update_seq":
>>> 
>> "3132-g1FWeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8sxI18ChKUgCSSfYgdUkMDNw1-JQ6gJTGg42UxacuAaSuHqxOAo-6PBYgydAApIBK52clchNUuwCidn9Wog5BtQcgau9nJQoTVPsAohboXsksAJuwX9Y",
>>>   "sizes": {
>>>   "file": 595928643,
>>>   "external": 462778,
>>>   "active": 1393380
>>>   },
>>>   "other": {
>>>   "data_size": 462778
>>>   },
>>>   "doc_del_count": 0,
>>>   "doc_count": 74,
>>>   "disk_size": 595928643,
>>>   "disk_format_version": 7,
>>>   "data_size": 1393380,
>>>   "compact_running": false,
>>>   "cluster": {
>>>   "q": 8,
>>>   "n": 1,
>>>   "w": 1,
>>>   "r": 1
>>>   },
>>>   "instance_start_time": "0"
>>> }
>>> 
>>> curl localhost:5984/xxx_1590/_local_docs
>>> {"total_rows":null,"offset":null,"rows":[
>>> 
>> {"id":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","key":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","value":{"rev":"0-3022"}},
>>> 
>> {"id":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","key":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","value":{"rev":"0-18"}},
>>> 
>> {"id":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","key":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","value":{"rev":"0-3749"}},
>>> 
>> {"id":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","key":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","value":{"rev":"0-172"}}
>>> ]}
>>> 
>>> Each database push/pull replicates with a small number of clients (< 10).
>>> Most of the documents contain orders that are shortlived. We throw away
>> all
>>> db's 3 times a week as a brute force purge.
>>> Compacting has been disabled because it takes too much cpu and was
>>> considered useless in our case (small db's, purging).
>>> 
>>> I read this:
>>> https://github.com/apache/couchdb/issues/1621
>>> but I'm not sure how it helps me.
>>> 
>>> These are my questions:
>>> How is it possible that such a small db occupies so much space?
>>> What can I do to reduce this?
>>> Would changing 'cluster.q' have any effect or would the same amount of
>>> bytes be used in less folders? (am I correct in assuming that cluster.q
>>> 1
>>> is pointless in standalone configuration?)
>>> 
>>> Thanks!
>>> Willem
>> 
>> 



Re: Disk full

2019-05-02 Thread Adam Kocoloski
Hi Willem,

Compaction would certainly reduce your storage space. You have such a small 
number of documents in these databases that it would be a fast operation.  Did 
you try it and run into issues?

Changing cluster.q shouldn’t affect the overall storage consumption.
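
Triggering compaction manually is a single request per database (admin
credentials assumed):

curl -X POST http://admin:password@localhost:5984/xxx_1590/_compact \
     -H 'Content-Type: application/json'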

Adam

> On May 2, 2019, at 12:15 PM, Willem Bison  wrote:
> 
> Hi,
> 
> Our CouchDb 2.3.1 standalone server (AWS Ubuntu 18.04) is using a lot of
> disk space, so much so that it regularly causes a disk full and a crash.
> 
> The server contains approximately 100 databases each with a reported
> (Fauxton) size of less than 2.5Mb and less than 250 docs. Yesterday the
> 'shards' folders combined exceeded a total 14G causing the server to crash.
> 
> The server is configured with
> cluster.n = 1 and
> cluster.q = 8
> because that was suggested during setup.
> 
> When I write this the 'shards' folders look like this:
> /var/lib/couchdb/shards# du -hs *
> 869M -1fff
> 1.4G 2000-3fff
> 207M 4000-5fff
> 620M 6000-7fff
> 446M 8000-9fff
> 458M a000-bfff
> 400M c000-dfff
> 549M e000-
> 
> One of the largest files is this:
> curl localhost:5984/xxx_1590
> {
>"db_name": "xxx_1590",
>"purge_seq":
> "0-g1FTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFNSApBMqv___39WIgMedXksQJKhAUgBlc4nRu0DiFoC5iYpgOy3J9L-BRAz9-NXm8iQJE_YYgeQxfFEWnwAYvF9oNosADncXo4",
>"update_seq":
> "3132-g1FWeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8sxI18ChKUgCSSfYgdUkMDNw1-JQ6gJTGg42UxacuAaSuHqxOAo-6PBYgydAApIBK52clchNUuwCidn9Wog5BtQcgau9nJQoTVPsAohboXsksAJuwX9Y",
>"sizes": {
>"file": 595928643,
>"external": 462778,
>"active": 1393380
>},
>"other": {
>"data_size": 462778
>},
>"doc_del_count": 0,
>"doc_count": 74,
>"disk_size": 595928643,
>"disk_format_version": 7,
>"data_size": 1393380,
>"compact_running": false,
>"cluster": {
>"q": 8,
>"n": 1,
>"w": 1,
>"r": 1
>},
>"instance_start_time": "0"
> }
> 
> curl localhost:5984/xxx_1590/_local_docs
> {"total_rows":null,"offset":null,"rows":[
> {"id":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","key":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","value":{"rev":"0-3022"}},
> {"id":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","key":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","value":{"rev":"0-18"}},
> {"id":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","key":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","value":{"rev":"0-3749"}},
> {"id":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","key":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","value":{"rev":"0-172"}}
> ]}
> 
> Each database push/pull replicates with a small number of clients (< 10).
> Most of the documents contain orders that are shortlived. We throw away all
> db's 3 times a week as a brute force purge.
> Compacting has been disabled because it takes too much cpu and was
> considered useless in our case (small db's, purging).
> 
> I read this:
> https://github.com/apache/couchdb/issues/1621
> but I'm not sure how it helps me.
> 
> These are my questions:
> How is it possible that such a small db occupies so much space?
> What can I do to reduce this?
> Would changing 'cluster.q' have any effect or would the same amount of
> bytes be used in less folders? (am I correct in assuming that cluster.q > 1
> is pointless in standalone configuration?)
> 
> Thanks!
> Willem



Re: [DISCUSS] On the _changes feed - how hard should we strive for exactly once semantics?

2019-03-07 Thread Adam Kocoloski
Bah, our “cue”, not our “queue” ;)

Adam

> On Mar 7, 2019, at 7:35 AM, Adam Kocoloski  wrote:
> 
> Hi Garren,
> 
> In general we wouldn’t know ahead of time whether we can complete in five 
> seconds. I believe the way it works is that we start a transaction, issue a 
> bunch of reads, and after 5 seconds any additional reads will start to fail 
> with something like “read version too old”. That’s our queue to start a new 
> transaction. All the reads that completed successfully are fine, and the 
> CouchDB API layer can certainly choose to start streaming as soon as the 
> first read completes (~2ms after the beginning of the transaction).
> 
> Agree with Bob that steering towards a larger number of short-lived 
> operations is the way to go in general. But I also want to balance that with 
> backwards-compatibility where it makes sense.
> 
> Adam
> 
>> On Mar 7, 2019, at 7:22 AM, Garren Smith  wrote:
>> 
>> I agree that option A seems the most sensibile. I just want to understand
>> this comment:
>> 
>>>> A _changes request that cannot be satisfied within the 5 second limit
>> will be implemented as multiple FoundationDB transactions under the covers
>> 
>> How will we know if a change request cannot be completed in 5 seconds? Can
>> we tell that beforehand. Or would we try and complete a change request. The
>> transaction fails after 5 seconds and then do multiple transactions to get
>> the full changes? If that is the case the response from CouchDB to the user
>> will be really slow as they have already waited 5 seconds and have still
>> not received anything. Or if we start streaming a result back to the user
>> in the first transaction (Is this even possible?) then we would somehow
>> need to know how to continue the changes feed after the transaction has
>> failed.
>> 
>> Then Bob from your comment:
>> 
>>>> Forcing clients to do short (<5s) requests feels like a general good, as
>> long as meaningful things can be done in that time-frame, which I strongly
>> believe from what we've said elsewhere that they can.
>> 
>> That makes sense, but how would we do that? How do you help a user to make
>> sure their request is under 5 seconds?
>> 
>> Cheers
>> Garren
>> 
>> 
>> 
>> On Thu, Mar 7, 2019 at 11:15 AM Robert Newson  wrote:
>> 
>>> Hi,
>>> 
>>> Given that option A is the behaviour of feed=continuous today (barring the
>>> initial whole-snapshot phase to catch up to "now") I think that's the right
>>> move.  I confess to not reading your option B too deeply but I was there on
>>> IRC when the first spark was lit. We can build some sort of temporary
>>> multi-index on FDB today, that's clear, but it's equally clear that we
>>> should avoid doing so if at all possible.
>>> 
>>> Perhaps the future Redwood storage engine for FDB will, as you say,
>>> significantly improve on this, but, even if it does, I'm not 100% convinced
>>> we should expose it. Forcing clients to do short (<5s) requests feels like
>>> a general good, as long as meaningful things can be done in that
>>> time-frame, which I strongly believe from what we've said elsewhere that
>>> they can.
>>> 
>>> CouchDB's API, as we both know from rich (heh, and sometimes poor)
>>> experience in production, has a lot of endpoints of wildly varying
>>> performance characteristics. It's right that we evolve away from that where
>>> possible, and this seems a great candidate given the replicator in ~all
>>> versions of CouchDB will handle the change without blinking.
>>> 
>>> We have the same issue for _all_docs and _view and _find, in that the user
>>> might ask for more data back than can be sent within a single FDB
>>> transaction. I suggest that's a new thread, though.
>>> 
>>> --
>>> Robert Samuel Newson
>>> rnew...@apache.org
>>> 
>>> On Thu, 7 Mar 2019, at 01:24, Adam Kocoloski wrote:
>>>> Hi all, as the project devs are working through the design for the
>>>> _changes feed in FoundationDB we’ve come across a limitation that is
>>>> worth discussing with the broader user community. FoundationDB
>>>> currently imposes a 5 second limit on all transactions, and read
>>>> versions from old transactions are inaccessible after that window. This
>>>> means that, unlike a single CouchDB storage shard, it is not possible
>>>> to grab a long-lived snapshot of the entire database.
>>>> 
>>>>

Re: [DISCUSS] On the _changes feed - how hard should we strive for exactly once semantics?

2019-03-07 Thread Adam Kocoloski
Hi Garren,

In general we wouldn’t know ahead of time whether we can complete in five 
seconds. I believe the way it works is that we start a transaction, issue a 
bunch of reads, and after 5 seconds any additional reads will start to fail 
with something like “read version too old”. That’s our queue to start a new 
transaction. All the reads that completed successfully are fine, and the 
CouchDB API layer can certainly choose to start streaming as soon as the first 
read completes (~2ms after the beginning of the transaction).

Agree with Bob that steering towards a larger number of short-lived operations 
is the way to go in general. But I also want to balance that with 
backwards-compatibility where it makes sense.

Adam

> On Mar 7, 2019, at 7:22 AM, Garren Smith  wrote:
> 
> I agree that option A seems the most sensibile. I just want to understand
> this comment:
> 
>>> A _changes request that cannot be satisfied within the 5 second limit
> will be implemented as multiple FoundationDB transactions under the covers
> 
> How will we know if a change request cannot be completed in 5 seconds? Can
> we tell that beforehand. Or would we try and complete a change request. The
> transaction fails after 5 seconds and then do multiple transactions to get
> the full changes? If that is the case the response from CouchDB to the user
> will be really slow as they have already waited 5 seconds and have still
> not received anything. Or if we start streaming a result back to the user
> in the first transaction (Is this even possible?) then we would somehow
> need to know how to continue the changes feed after the transaction has
> failed.
> 
> Then Bob from your comment:
> 
>>> Forcing clients to do short (<5s) requests feels like a general good, as
> long as meaningful things can be done in that time-frame, which I strongly
> believe from what we've said elsewhere that they can.
> 
> That makes sense, but how would we do that? How do you help a user to make
> sure their request is under 5 seconds?
> 
> Cheers
> Garren
> 
> 
> 
> On Thu, Mar 7, 2019 at 11:15 AM Robert Newson  wrote:
> 
>> Hi,
>> 
>> Given that option A is the behaviour of feed=continuous today (barring the
>> initial whole-snapshot phase to catch up to "now") I think that's the right
>> move.  I confess to not reading your option B too deeply but I was there on
>> IRC when the first spark was lit. We can build some sort of temporary
>> multi-index on FDB today, that's clear, but it's equally clear that we
>> should avoid doing so if at all possible.
>> 
>> Perhaps the future Redwood storage engine for FDB will, as you say,
>> significantly improve on this, but, even if it does, I'm not 100% convinced
>> we should expose it. Forcing clients to do short (<5s) requests feels like
>> a general good, as long as meaningful things can be done in that
>> time-frame, which I strongly believe from what we've said elsewhere that
>> they can.
>> 
>> CouchDB's API, as we both know from rich (heh, and sometimes poor)
>> experience in production, has a lot of endpoints of wildly varying
>> performance characteristics. It's right that we evolve away from that where
>> possible, and this seems a great candidate given the replicator in ~all
>> versions of CouchDB will handle the change without blinking.
>> 
>> We have the same issue for _all_docs and _view and _find, in that the user
>> might ask for more data back than can be sent within a single FDB
>> transaction. I suggest that's a new thread, though.
>> 
>> --
>>  Robert Samuel Newson
>>  rnew...@apache.org
>> 
>> On Thu, 7 Mar 2019, at 01:24, Adam Kocoloski wrote:
>>> Hi all, as the project devs are working through the design for the
>>> _changes feed in FoundationDB we’ve come across a limitation that is
>>> worth discussing with the broader user community. FoundationDB
>>> currently imposes a 5 second limit on all transactions, and read
>>> versions from old transactions are inaccessible after that window. This
>>> means that, unlike a single CouchDB storage shard, it is not possible
>>> to grab a long-lived snapshot of the entire database.
>>> 
>>> In extant versions of CouchDB we rely on this long-lived snapshot
>>> behavior for a number of operations, some of which are user-facing. For
>>> example, it is possible to make a request to the _changes feed for a
>>> database of an arbitrary size and, if you’ve got the storage space and
>>> time to spare, you can pull down a snapshot of the entire database in a
>>> single request. That snapshot will contain exactly one

[DISCUSS] On the _changes feed - how hard should we strive for exactly once semantics?

2019-03-06 Thread Adam Kocoloski
Hi all, as the project devs are working through the design for the _changes 
feed in FoundationDB we’ve come across a limitation that is worth discussing 
with the broader user community. FoundationDB currently imposes a 5 second 
limit on all transactions, and read versions from old transactions are 
inaccessible after that window. This means that, unlike a single CouchDB 
storage shard, it is not possible to grab a long-lived snapshot of the entire 
database.

In extant versions of CouchDB we rely on this long-lived snapshot behavior for 
a number of operations, some of which are user-facing. For example, it is 
possible to make a request to the _changes feed for a database of an arbitrary 
size and, if you’ve got the storage space and time to spare, you can pull down 
a snapshot of the entire database in a single request. That snapshot will 
contain exactly one entry for each document in the database. In CouchDB 1.x the 
documents appear in the order in which they were most recently updated. In 
CouchDB 2.x there is no guaranteed ordering, although in practice the documents 
are roughly ordered by most recent edit. Note that you really do have to 
complete the operation in a single HTTP request; if you chunk up the requests 
or have to retry because the connection was severed then the exactly-once 
guarantees disappear.

We have a couple of different options for how we can implement _changes with 
FoundationDB as a backing store, I’ll describe them below and discuss the 
tradeoffs

## Option A: Single Version Index, long-running operations as multiple 
transactions

In this option the internal index has exactly one entry for each document at 
all times. A _changes request that cannot be satisfied within the 5 second 
limit will be implemented as multiple FoundationDB transactions under the 
covers. These transactions will have different read versions, and a document 
that gets updated in between those read versions will show up *multiple times* 
in the response body. The entire feed will be totally ordered, and later 
occurrences of a particular document are guaranteed to represent more recent 
edits than the earlier occurrences. In effect, it’s rather like the 
semantics of a feed=continuous request today, but with much better ordering and 
zero possibility of “rewinds” where large portions of the ID space get replayed 
because of issues in the cluster.

This option is very efficient internally and does not require any background 
maintenance. A future enhancement in FoundationDB’s storage engine is designed 
to enable longer-running read-only transactions, so we will likely to be able 
to improve the semantics with this option over time.

## Option B: Multi-Version Index

In this design the internal index can contain multiple entries for a given 
document. Each entry includes the sequence at which the document edit was made, 
and may also include a sequence at which it was overwritten by a more recent 
edit.

The implementation of a _changes request would start by getting the current 
version of the datastore (call this the read version), and then as it examines 
entries in the index it would skip over any entries where there’s a “tombstone” 
sequence less than the read version. Crucially, if the request needs to be 
implemented across multiple transactions, each transaction would use the same 
read version when deciding whether to include entries in the index in the 
_changes response. The readers would know to stop when and if they encounter an 
entry where the created version is greater than the read version. Perhaps a 
diagram helps to clarify, a simplified version of the internal index might look 
like

{“seq”: 1, “id”: ”foo”}
{“seq”: 2, “id”: ”bar”, “tombstone”: 5}
{“seq”: 3, “id”: “baz”}
{“seq”: 4, “id”: “bif”, “tombstone": 6}
{“seq”: 5, “id”: “bar”}
{“seq”: 6, “id”: “bif”}

A _changes request which happens to commence when the database is at sequence 5 
would return (ignoring the format of “seq” for simplicity)

{“seq”: 1, “id”: ”foo”}
{“seq”: 3, “id”: “baz”}
{“seq”: 4, “id”: “bif”}
{“seq”: 5, “id”: “bar”}

i.e., the first instance “bar” would be skipped over because a more recent 
version exists within the time horizon, but the first instance of “bif” would 
included because “seq”: 6 is outside our horizon.

The downside of this approach is someone has to go in and clean up tombstoned 
index entries eventually (or else provision lots and lots of storage space). 
One way we could do this (inside CouchDB) would be to have each _changes 
session record its read version somewhere, and then have a background process 
go in and remove tombstoned entries where the tombstone is less than the 
earliest read version of any active request. It’s doable, but definitely more 
load on the server.

Also, note this approach is not guaranteeing that the older versions of the 
documents referenced in those tombstoned entries are actually accessible. Much 
like today, the changes feed would include a 

Re: Can a couched 2.3.0 be downgraded to 2.1.1 without changes to the data

2019-02-04 Thread Adam Kocoloski
Ah, we need to do a better job documenting this for sure. I think you _cannot_ 
do an in-place downgrade because the on-disk header format changed to v7 to 
include some metadata needed for the purge feature that was re-introduced in 
2.3.0:

https://github.com/apache/couchdb/commit/78a388d 


Bill’s suggestion of replication is probably going to be the way to go here. 
Someone more familiar with the disk format version changes and upgrade / 
downgrade logic might correct me, but this is my current understanding.
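
If you go the replication route, it would just be an ordinary pull into a fresh
database on the 2.1.1 side, along the lines of this sketch (hostnames and the
database name are placeholders):

# Create the target on the 2.1.1 node, then pull from the 2.3.0 node
curl -X PUT http://admin:password@couch211.example.com:5984/mydb
curl -X POST http://admin:password@couch211.example.com:5984/_replicate \
     -H 'Content-Type: application/json' \
     -d '{"source": "http://admin:password@couch230.example.com:5984/mydb",
          "target": "http://admin:password@couch211.example.com:5984/mydb"}'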

Adam

> On Feb 4, 2019, at 3:21 PM, Bill Stephenson  wrote:
> 
> Have you tried replicating your data to the 2.2.1 DB?
> 
> 
> Bill
> 
> 
> 
>> On Feb 4, 2019, at 2:19 PM, Compu Net  wrote:
>> 
>> Can a couched 2.3.0 be downgraded to 2.1.1 without changes to the data ?
> 



Re: How many replications per server?

2018-11-06 Thread Adam Kocoloski
Hmph, that seems to be an oversight in the documentation! Here’s the set of 
performance-related configuration options that can be specified as top-level 
fields in the replication specification:

worker_batch_size
worker_processes
http_connections
connection_timeout
retries_per_request
socket_options
checkpoint_interval
use_checkpoints

These are all documented in the [replicator] section of the configuration docs, 
which is where you’d go to set the defaults for all replications mediated by 
that server:

http://docs.couchdb.org/en/stable/config/replicator.html#replicator

Configuring one of those fields in the replication doc will always override the 
default for the server. There are several other additional fields that are 
meaningful in a replication document — I haven’t checked to see if every one is 
documented. The code that validates them all is here:

https://github.com/apache/couchdb/blob/2.2.0/src/couch_replicator/src/couch_replicator_docs.erl#L469-L529
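
As an illustration (database names are made up), a _replicator document that
dials down the per-job resources might look like:

curl -X POST http://admin:password@localhost:5984/_replicator \
     -H 'Content-Type: application/json' \
     -d '{"source": "http://admin:password@localhost:5984/central",
          "target": "http://admin:password@localhost:5984/userdb-abc123",
          "continuous": true,
          "worker_processes": 1,
          "http_connections": 2,
          "connection_timeout": 30000}'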

Looks like we have a bit of homework to do here … Cheers,

Adam

> On Nov 6, 2018, at 2:15 AM, Andrea Brancatelli  
> wrote:
> 
> Hi Adam, 
> 
> can you elaborate a bit on the "It's also possible to override resource
> settings on a per-replication basis" topic? 
> 
> I can't seem to find anything here:
> http://docs.couchdb.org/en/stable/replication/replicator.html 
> 
> Neither here:
> http://docs.couchdb.org/en/stable/api/server/common.html#replicate 
> 
> ---
> 
> Andrea Brancatelli
> 
> On 2018-10-30 17:17, Adam Kocoloski wrote:
> 
>> The `worker_processes` and `http_connections` in particular can have a 
>> significant impact on the resource consumption of each replication job. If 
>> your goal is to host a large number of lightweight replications you could 
>> reduce those settings, and then configure the scheduler to keep a large 
>> `max_jobs` running. It's also possible to override resource settings on a 
>> per-replication basis.



Re: How many replications per server?

2018-10-30 Thread Adam Kocoloski
Precisely :)

I agree the settings can be difficult to grok. I’ll bet a few examples in a 
blog post would go a long way towards illustrating the interplay between them. 
Cheers,

Adam

> On Oct 30, 2018, at 1:03 PM, Andrea Brancatelli  
> wrote:
> 
> Thanks Adam, that's what I thought as well, but believe me I'm having a 
> really hard time understanding the explanation of max_jobs and max_churns 
> from the docs.
> 
> I don't exactly get the difference between those two values. My first guess 
> was that max_jobs was a systemwide max value while max_churn would define how 
> many jobs would run at the same time.
> 
> I tried it and it wasn't working as expected.
> 
> Now I just reread it and I'm guessing that
> 
> while true {
> 
>   if (jobs > max_jobs) {
> 
> for (x = 1 to max_churn) {
> 
>   kill_or_start(something)
> 
> }
> 
>   }
> 
>   sleep(interval)
> 
> }
> 
> 
> 
> Is this correct?
> 
> 
> 
> ---
> Andrea Brancatelli
> 
> On 2018-10-30 17:17, Adam Kocoloski wrote:
> 
>> Hi Andrea, your numbers don't sound crazy for an out-of-the-box setup.
>> 
>> Worth noting that in CouchDB 2.1 and above there is a replication scheduler 
>> which can cycle through an ~unlimited number of continuous replications 
>> within a defined resource envelope. The scheduler is documented here:
>> 
>> http://docs.couchdb.org/en/stable/replication/replicator.html#replication-scheduler
>> 
>> There are a number of configuration properties that govern the behavior of 
>> the scheduler and also the default resources allocated to any particular 
>> replication. These are clustered in the [replicator] configuration block:
>> 
>> http://docs.couchdb.org/en/stable/config/replicator.html#replicator
>> 
>> The `worker_processes` and `http_connections` in particular can have a 
>> significant impact on the resource consumption of each replication job. If 
>> your goal is to host a large number of lightweight replications you could 
>> reduce those settings, and then configure the scheduler to keep a large 
>> `max_jobs` running. It's also possible to override resource settings on a 
>> per-replication basis.
>> 
>> Cheers, Adam
>> 
>> 
>>> On Oct 30, 2018, at 11:52 AM, Stefan Klein >> <mailto:st.fankl...@gmail.com>> wrote:
>>> 
>>> Hi,
>>> 
>>> can't comment on the behavior of recent, 2.x, versions of couchdb.
>>> 
>>> Long time ago, with couchdb 1.4 or so I ran a similar test.
>>> Our solution was to:
>>> * keep a list of "active" users (by our application specific definition)
>>> * listen to _db_changes
>>> * run one-shot replications for the changed documents to the per-user dbs
>>> of the users who got access to the documents and are "active"
>>> When a users becomes "active" - again determined by application logic - a
>>> one-shot replication is run to bring the per-user db up to date.
>>> 
>>> Sadly this logic is deeply integrated in our application code and can't be
>>> easily extracted to a module (we're using nodejs).
>>> It's also basically unchanged since then and we have to adapt to couchdb
>>> 2.x.
>>> 
>>> regards,
>>> Stefan
>>> 
>>> 
>>> On Tue, Oct 30, 2018 at 16:22, Andrea Brancatelli <
>>> abrancate...@schema31.it> wrote:
>>> 
>>>> Sorry the attachment got stripped - here it is:
>>>> https://pasteboard.co/HKRwOFy.png <https://pasteboard.co/HKRwOFy.png>
>>>> 
>>>> ---
>>>> 
>>>> Andrea Brancatelli
>>>> 
>>>> On 2018-10-30 15:51, Andrea Brancatelli wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have a bare curiosity - I know it's a pretty vague question, but how
>>>> many continuous replication jobs one can expect to run on a single "common"
>>>> machine?
>>>>> 
>>>>> 
>>>>> With common I'd say a quad/octa core with ~16GB RAM...
>>>>> 
>>>>> I don't need an exact number, just the order of it... 1? 10? 100? 1000?
>>>>> 
>>>>> I've read a lot about the per-user approach, the filtered replication
>>>> and all that stuff, but on a test server with 64 replication jobs (1
>>>> central user and 32 test users) the machine is totally bent on its knees:
>>>>> 
>>>>> 
>>>>> root@bigdata-free-rm-01:~/asd # uptime
>>>>> 3:50PM up 5 days, 4:55, 3 users, load averages: 9.28, 9.84, 9.39
>>>>> 
>>>>> I'm attaching a screenshot of current htop output (filtered for CouchDB
>>>> user, but it's the only thing running on the machine)...
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Andrea Brancatelli
>> 
>> 



Re: How many replications per server?

2018-10-30 Thread Adam Kocoloski
Hi Andrea, your numbers don’t sound crazy for an out-of-the-box setup.

Worth noting that in CouchDB 2.1 and above there is a replication scheduler 
which can cycle through an ~unlimited number of continuous replications within 
a defined resource envelope. The scheduler is documented here:

http://docs.couchdb.org/en/stable/replication/replicator.html#replication-scheduler
 


There are a number of configuration properties that govern the behavior of the 
scheduler and also the default resources allocated to any particular 
replication. These are clustered in the [replicator] configuration block:

http://docs.couchdb.org/en/stable/config/replicator.html#replicator 


The `worker_processes` and `http_connections` in particular can have a 
significant impact on the resource consumption of each replication job. If your 
goal is to host a large number of lightweight replications you could reduce 
those settings, and then configure the scheduler to keep a large `max_jobs` 
running. It’s also possible to override resource settings on a per-replication 
basis.
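
For example (the values are purely illustrative), shrinking the per-job
footprint and raising the scheduler capacity could be done over the config API:

curl -X PUT http://admin:password@localhost:5984/_node/_local/_config/replicator/worker_processes -d '"1"'
curl -X PUT http://admin:password@localhost:5984/_node/_local/_config/replicator/http_connections -d '"2"'
curl -X PUT http://admin:password@localhost:5984/_node/_local/_config/replicator/max_jobs -d '"500"'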

Cheers, Adam


> On Oct 30, 2018, at 11:52 AM, Stefan Klein  wrote:
> 
> Hi,
> 
> can't comment on the behavior of recent, 2.x, versions of couchdb.
> 
> Long time ago, with couchdb 1.4 or so I ran a similar test.
> Our solution was to:
> * keep a list of "active" users (by our application specific definition)
> * listen to _db_changes
> * run one-shot replications for the changed documents to the per-user dbs
> of the users who got access to the documents and are "active"
> When a users becomes "active" - again determined by application logic - a
> one-shot replication is run to bring the per-user db up to date.
> 
> Sadly this logic is deeply integrated in our application code and can't be
> easily extracted to a module (we're using nodejs).
> It's also basically unchanged since then and we have to adapt to couchdb
> 2.x.
> 
> regards,
> Stefan
> 
> 
> On Tue, Oct 30, 2018 at 16:22, Andrea Brancatelli <
> abrancate...@schema31.it> wrote:
> 
>> Sorry the attachment got stripped - here it is:
>> https://pasteboard.co/HKRwOFy.png
>> 
>> ---
>> 
>> Andrea Brancatelli
>> 
>> On 2018-10-30 15:51, Andrea Brancatelli wrote:
>> 
>>> Hi,
>>> 
>>> I have a bare curiosity - I know it's a pretty vague question, but how
>> many continuous replication jobs one can expect to run on a single "common"
>> machine?
>>> 
>>> With common I'd say a quad/octa core with ~16GB RAM...
>>> 
>>> I don't need an exact number, just the order of it... 1? 10? 100? 1000?
>>> 
>>> I've read a lot about the per-user approach, the filtered replication
>> and all that stuff, but on a test server with 64 replication jobs (1
>> central user and 32 test users) the machine is totally bent on its knees:
>>> 
>>> root@bigdata-free-rm-01:~/asd # uptime
>>> 3:50PM up 5 days, 4:55, 3 users, load averages: 9.28, 9.84, 9.39
>>> 
>>> I'm attaching a screenshot of current htop output (filtered for CouchDB
>> user, but it's the only thing running on the machine)...
>>> 
>>> --
>>> 
>>> Andrea Brancatelli



Re: View Issues Upgrading From 2.0.0 to 2.2.0

2018-10-24 Thread Adam Kocoloski
Hi Maggie,

The mailing list strips out attached images, so that part didn’t come through. 
However, I think the stack trace is sufficient to figure this out. I suspect 
the `node2@dev02` node is still running 2.0.0, while an _all_docs HTTP request 
landed on another node in the cluster that is already running 2.2.0. It turns 
out that the RPC message format for this type of request 
changed in between 2.0 and 2.2, and as a result the 2.2 node is sending a 
message that the 2.0 node cannot understand. The 2.2 codebase contains an 
upgrade clause to understand the old message format; unfortunately the 2.0 node 
does not understand the new format and so the process for this request crashes.

Upgrading the entire cluster to 2.2 ought to make this message disappear. We 
should do a better job of documenting these occasional RPC format changes and 
known good upgrade paths through them.
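
A quick way to confirm what each node is actually running (credentials and
hostnames are placeholders):

# List the nodes in the cluster ...
curl -s http://admin:password@dev01:5984/_membership | jq -r '.cluster_nodes[]'
# ... then ask each one for its version
curl -s http://dev01:5984/ | jq -r '.version'
curl -s http://dev02:5984/ | jq -r '.version'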

Adam

> On Oct 23, 2018, at 5:30 PM, maggie.ji...@tdameritrade.com.INVALID wrote:
> 
> Hello,
>  
> We are in the process of upgrading from 2.0.0 to 2.2.0. We are pretty close 
> but are seeing the below issues when verifying the installation. This is what 
> I see in the log files:
>  
> [error] 2018-10-23T21:08:10.960427Z 
> dev-appinv2@devctlvsse00202.pteassociatesys.local 
>  emulator  
> Error in process <0.14288.0> on node 'node2@dev02' with exit value:
> {{badmatch,{error,{function_clause,nil,[{fabric_rpc,all_docs,[<<"shards/4000-5fff/
>  
> inventory.1539111041">>,[{user_ctx,{user_ctx,null,[<<"_admin">>],undefined}}],{mrargs,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,fwd,268435456,0,0,undefined,false,true,false,true,true,[],false,undefined,undefined,true,[{namespace,<<"_design">>}]}],[{file,"src/fabric_rpc.erl"},{line,95}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]}}},[{ddoc_cache_entry_validation_funs,recover,1,[{file,"src/ddoc_cache_entry_validation_funs.erl"},{line,33}]},{ddoc_cache_entry,do_open,1,[{file,"src/ddoc_cache_entry.erl"},{line,294}]}]}
>  
> We think it might be related to the “inventory” database but that one has 
> never had views created. We’re not sure why it would be erroring out there?
>  
> This is what the UI shows us:
> 
>  
> Thank you,
>  
> Maggie



Re: couchdb erlang reduce - aggregate object

2018-06-13 Thread Adam Kocoloski
Hi David, we added that in version 2.0:

https://github.com/apache/couchdb/commit/5a2ec4f50 


The behavior is documented on master:

http://docs.couchdb.org/en/master/ddocs/ddocs.html#reduce-and-rereduce-functions
 


Cheers, Adam

> On Jun 13, 2018, at 4:59 PM, David Park  wrote:
> 
> hey robert, _sum doesn't work on objects :(
> 
> David Park   Software Developer
> 
> On Wed, Jun 13, 2018 at 3:58 PM Robert Samuel Newson 
> wrote:
> 
>> the built-in reduce ("_sum") handles both arrays and objects, so you
>> should be able to just do;
>> 
>> map:
>> function(doc) {
>>  emit(doc.label, {"base":doc.basePoints, "bonus":doc.bonusPoints});
>> }
>> 
>> reduce:
>> _sum
>> 
>> B.
>> 
>>> On 13 Jun 2018, at 20:40, Aurélien Bénel  wrote:
>>> 
>>> Dear David,
>>> 
 I desire that value to be split out by base and bonus points.
>>> 
>>> Then you’ll need an array as the key: a first member for the selection
>> and a second one for the grouping.
>>> 
>>> 
>>> MAP :
>>> 
>>> function(o) {
>>>  emit( [o.label, "base"] , o.basePoints);
>>>  emit( [o.label, "bonus"], o.bonusPoints);
>>> }
>>> 
>>> REDUCE
>>> 
>>> _sum
>>> 
>>> QUERY
>>> 
>>> ?group=true=["gold"]=["gold",{}]
>>> 
>>> 
>>> Regards,
>>> 
>>> Aurélien
>>> 
>>> 
 Début du message réexpédié :
 
 De: David Park 
 Objet: Rép : couchdb erlang reduce - aggregate object
 Date: 13 juin 2018 à 20:17:19 UTC+2
 À: user@couchdb.apache.org
 Répondre à: user@couchdb.apache.org
 
 ok. so here's my test
 
 3 docs
 
 {"_id": "a", "label": "gold", "basePoints": 1000, "bonusPoints": 2000}
 {"_id": "b", "label": "gold", "basePoints": 1, "bonusPoints": 2}
 {"_id": "c", "label": "silver", "basePoints": 1, "bonusPoints":
>> 2}
 
 Then I took your advice and wrote a javascript (not erlang) view
 
 function(o) {
  emit(o.label, o.basePoints);
  emit(o.label, o.bonusPoints);
 }
 
 then threw in a reduce of
 
 _sum
 
 then I queried my view with group = true where key = label
 And I get a response of
 {
 "rows":[
  {"key":"gold","value":33000}
 ]
 }
 
 I desire that value to be split out by base and bonus points.
 
 David Park   Software Developer
>> 
>> 



Re: Sharding: all databases or just one?

2018-01-14 Thread Adam Kocoloski
Hi Ulrich,

Assuming userdb-XYZ already existed, yes you should have needed a revision ID 
to issue a successful PUT request there.
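
For reference, the flow I'd expect to work there (a sketch, using the
node-local port and a placeholder database name):

    # fetch the current shard map; the response carries the _rev you need
    curl http://localhost:5986/_dbs/userdb-XYZ
    # edit the by_node / by_range / changelog sections, keep the _rev,
    # then write the whole document back
    curl -X PUT http://localhost:5986/_dbs/userdb-XYZ \
         -H "Content-Type: application/json" -d @updated-shard-map.json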

The statement in the docs is a statement about the availability of the system 
for reads and writes. The cluster will continue to function in that degraded 
state with two nodes down, but most of the time you wouldn’t want to conduct 
planned maintenance in a way that lowers the number of live replicas for a 
shard below your desired replica count.

Appreciate the feedback about the docs being vague in this respect. I suspect 
part of the problem is a relative lack of open source tooling and documentation 
around this specific process, and so folks are left to try to infer the best 
practice from other parts of the docs.

The number of unique shards per database has little to no bearing on the 
durability and availability of the system. Rather, it affects the overall 
throughput achievable for that database. More shards means higher throughput 
for writes, view indexing, and document lookups that specify the _id directly.

 Cheers, Adam

> On Jan 14, 2018, at 7:56 AM, Ulrich Mayring <u...@mayring.de> wrote:
> 
> Hi Adam,
> 
> this is interesting, I was able to send PUT requests to 
> http://localhost:5986/_dbs/userdb-XYZ without giving a revision. Is this 
> intended or should I try to reproduce the issue and file a bug report?
> 
> If I understand you correctly, then with default settings (replica level of 3 
> and 8 shards) I cannot remove a node from a 3-node cluster, else I would lose 
> some shards. Including perhaps users from _users database. So I would need 4 
> or 5 nodes, then I could remove one?
> 
> The docs might be a little confusing (to me) in that regard. They say:
> 
> n=3 Any 2 nodes can be down
> 
> I believe this is only true if you have as many nodes as shards (8 per 
> default)?
> 
> Ulrich
> 
> Am 12.01.18 um 03:11 schrieb Adam Kocoloski:
>> Hi Ulrich, sharding is indeed per-database. This allows for an important 
>> degree of flexibility but it does introduce maintenance overhead when you 
>> have a lot of databases. The system databases you mentioned do have their 
>> own sharding documents which can be modified if you want to redistribute 
>> them across the cluster. Note that this is not required as you scale the 
>> cluster; nodes can still access the information in those databases 
>> regardless of the presence of a “local” shard. Of course if you’re planning 
>> on removing a node hosting shards of those databases you should move the 
>> shards first to preserve the replica level.
>> The sharding document is a normal document and absolutely does have 
>> revisions. We found the changelog to be a useful asset when resolving any 
>> merge conflicts introduced in a concurrent rebalancing exercise. Cheers,
>> Adam
>>> On Jan 7, 2018, at 6:08 AM, Ulrich Mayring <u...@mayring.de> wrote:
>>> 
>>> Hello,
>>> 
>>> I haven't quite understood the 2.1.1 documentation for sharding in one 
>>> aspect: it is described how to get the sharding document for one database, 
>>> how to edit it by e. g. adding a node to it and how to upload it again. 
>>> I've tried that and it works fine.
>>> 
>>> However, if I have the couch_per_user feature turned on, then there are 
>>> potentially thousands of databases. Suppose I add a new node to the 
>>> cluster, do I then need to follow this procedure for all databases in order 
>>> to balance data? Or is it enough to do it for one database? I suppose an 
>>> equivalent question would be: are the shards per database or per cluster?
>>> 
>>> And, somewhat related: what about the _users, _global_changes and 
>>> _replicator databases? Do I need to edit their sharding document as well, 
>>> whenever I add or remove a cluster node?
>>> 
>>> I also find it interesting that the sharding document has no revisions and 
>>> instead relies on changelog entries.
>>> 
>>> many thanks in advance for any enlightenment,
>>> 
>>> Ulrich
>>> 
> 
> 



Re: Sharding: all databases or just one?

2018-01-11 Thread Adam Kocoloski
Hi Ulrich, sharding is indeed per-database. This allows for an important degree 
of flexibility but it does introduce maintenance overhead when you have a lot 
of databases. The system databases you mentioned do have their own sharding 
documents which can be modified if you want to redistribute them across the 
cluster. Note that this is not required as you scale the cluster; nodes can 
still access the information in those databases regardless of the presence of a 
“local” shard. Of course if you’re planning on removing a node hosting shards 
of those databases you should move the shards first to preserve the replica 
level.

The sharding document is a normal document and absolutely does have revisions. 
We found the changelog to be a useful asset when resolving any merge conflicts 
introduced in a concurrent rebalancing exercise. Cheers,

Adam

> On Jan 7, 2018, at 6:08 AM, Ulrich Mayring  wrote:
> 
> Hello,
> 
> I haven't quite understood the 2.1.1 documentation for sharding in one 
> aspect: it is described how to get the sharding document for one database, 
> how to edit it by e. g. adding a node to it and how to upload it again. I've 
> tried that and it works fine.
> 
> However, if I have the couch_per_user feature turned on, then there are 
> potentially thousands of databases. Suppose I add a new node to the cluster, 
> do I then need to follow this procedure for all databases in order to balance 
> data? Or is it enough to do it for one database? I suppose an equivalent 
> question would be: are the shards per database or per cluster?
> 
> And, somewhat related: what about the _users, _global_changes and _replicator 
> databases? Do I need to edit their sharding document as well, whenever I add 
> or remove a cluster node?
> 
> I also find it interesting that the sharding document has no revisions and 
> instead relies on changelog entries.
> 
> many thanks in advance for any enlightenment,
> 
> Ulrich
> 



Re: include_docs and no doc issue

2017-12-20 Thread Adam Kocoloski
Hiya Carlos, yes, the race condition definitely still exists.

As far as I know the guidance in the wiki is still accurate. The brute force 
way to eliminate the issue is to emit the doc directly in the view definition 
and skip include_docs altogether, but of course that has a whole set of 
associated tradeoffs ...
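
i.e. something along these lines (a sketch), the obvious tradeoff being that
the view roughly doubles in size on disk because the document body is copied
into it:

    function (doc) {
      // store the whole document as the row value so queries no longer
      // need include_docs=true (and sidestep the race described above)
      emit(doc._id, doc);
    }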

Cheers, Adam


> On Dec 19, 2017, at 6:32 AM, Carlos Alonso  wrote:
> 
> Hello everyone!!
> 
> We're experiencing an issue? in which non deterministically we query a view
> with include_docs = true and some of the fetched documents are missing.
> 
> We've found a few lines in the old wiki (
> https://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options) that warns
> about a race condition that could be the cause of this behaviour.
> 
> I was wondering if this still holds true and if there's anything we can do
> to fix it/work around.
> 
> Regards
> 
> 
> -- 
> [image: Cabify - Your private Driver] 
> 
> *Carlos Alonso*
> Data Engineer
> Madrid, Spain
> 
> carlos.alo...@cabify.com
> 
> [image: Facebook] [image: Twitter]
> [image: Instagram] [image:
> Linkedin] 
> 
> -- 
> Este mensaje y cualquier archivo adjunto va dirigido exclusivamente a su 
> destinatario, pudiendo contener información confidencial sometida a secreto 
> profesional. No está permitida su reproducción o distribución sin la 
> autorización expresa de Cabify. Si usted no es el destinatario final por 
> favor elimínelo e infórmenos por esta vía. 
> 
> This message and any attached file are intended exclusively for the 
> addressee, and it may be confidential. You are not allowed to copy or 
> disclose it without Cabify's prior written authorization. If you are not 
> the intended recipient please delete it from your system and notify us by 
> e-mail.



Re: Many logs since Couchdb 2.1.1

2017-12-15 Thread Adam Kocoloski
Those log messages seem to be truncated before they get to the names of the 
actual database and associated index. Is it the same one every time in your 
logs?

Adam

> On Dec 15, 2017, at 8:11 AM, Frédéric Alix  wrote:
> 
> Hi,
> 
> Since I updated from CouchDB 2.0 to 2.1.1, my logs are very more verbose.
> Before this upgrade, I had 5 Go logs / days. Now it is more than 100 Go.
> I have this sort of message non stop:
> 
> [info] 2017-12-15T13:07:41.271204Z couchdb@127.0.0.1 <0.16545.1>  
> Index shutdown by monitor notice for db: shards/6
> [info] 2017-12-15T13:07:41.271358Z couchdb@127.0.0.1 <0.16545.1>  
> Closing index for db: shards/6000-7fff/
> 
> What do you think about this ? What I miss exactly in my conf ?
> 
> Regards, Frederic



Re: Difficulty replicating behind a proxy server

2017-12-15 Thread Adam Kocoloski
Thanks. If you’re interested in poking around the codebase I think the place 
where one could bypass the proxy URL for localhost addresses is here:

https://github.com/apache/couchdb/blob/2.1.1/src/couch_replicator/src/couch_replicator_httpc.erl#L43-L55

Cheers, Adam

> On Dec 14, 2017, at 11:25 PM, Jake Kroon <jkr...@immersivetechnologies.com> 
> wrote:
> 
> Hi Adam,
> Thank you very much for the prompt feedback. At the nearest convenience I 
> will raise an issue at https://github.com/apache/couchdb/issues.
> We will also have a look into whether we can potentially implement a fix for 
> this issue.
> Thanks again,
> Jake Kroon
> 
> On 2017-12-14 05:57, Adam Kocoloski <k...@apache.org> wrote:
>> Hi Jake, ugh, yes, I think you’ve hit the nail on the head. In your first 
>> configuration, CouchDB is seeing an HTTP-based target and assuming it needs 
>> to use the proxy. Your second configuration fails because the replicator 
>> does not yet work with local clustered databases.>
> 
>> 
>> I think the quickest fix here is to add a check for localhost URLs and not 
>> use the proxy for those. Can you file an issue at 
>> https://github.com/apache/couchdb/issues ?>
> 
>> 
>> Thanks, Adam>
>> 
>>> On Dec 12, 2017, at 11:44 PM, Jake Kroon <jk...@immersivetechnologies.com> 
>>> wrote:>
>>>> 
>>> Hi,>
>>>> 
>>> We are experiencing issues when running CouchDB 2.0.0 behind a proxy, 
>>> specifically when trying to perform replication. When performing 
>>> replication without a proxy we do not experience any issues. We're 
>>> attempting to start the replication by adding a document to the 
>>> /_replicator/ database, but as you can see below, it changes to an error 
>>> state with an "invalid json" error:>
> 
>>>> 
>>> {>
>>> "_id": "rep_init",>
>>> "_rev": "20-81f3b1999b7f8e9ac51a45e3acbc4432",>
>>> "source": 
>>> "https://:@dbgateway..com/",>
>>> "target": "http://127.0.0.1:5984/

Re: Difficulty replicating behind a proxy server

2017-12-13 Thread Adam Kocoloski
Hi Jake, ugh, yes, I think you’ve hit the nail on the head. In your first 
configuration, CouchDB is seeing an HTTP-based target and assuming it needs to 
use the proxy. Your second configuration fails because the replicator does not 
yet work with local clustered databases.

I think the quickest fix here is to add a check for localhost URLs and not use 
the proxy for those. Can you file an issue at 
https://github.com/apache/couchdb/issues ?

Thanks, Adam

> On Dec 12, 2017, at 11:44 PM, Jake Kroon  
> wrote:
> 
> Hi,
> 
> We are experiencing issues when running CouchDB 2.0.0 behind a proxy, 
> specifically when trying to perform replication. When performing replication 
> without a proxy we do not experience any issues. We're attempting to start 
> the replication by adding a document to the /_replicator/ database, but as 
> you can see below, it changes to an error state with an "invalid json" error:
> 
> {
>  "_id": "rep_init",
>  "_rev": "20-81f3b1999b7f8e9ac51a45e3acbc4432",
>  "source": 
> "https://:@dbgateway..com/",
>  "target": "http://127.0.0.1:5984/",
>  "create_target": false,
>  "continuous": false,
>  "filter": "replication_filter/no_ddocs",
>  "owner": null,
>  "proxy": "http://:",
>  "_replication_state": "error",
>  "_replication_state_time": "2017-12-07T15:59:12+08:00",
>  "_replication_state_reason": "{invalid_json,{error,{1,invalid_json}}}",
>  "_replication_id": "d449aac07eeb8da0e322d4646e0b0f9a"
> }
> 
> We suspect what's happening is that CouchDB is attempting to contact the 
> proxy server for the target database, since the target db name includes the 
> full address and database name of the local CouchDB server. Since the proxy 
> server is a remote server and the target database is on the local PC, it 
> would not be able to access the target address and would return an error HTML 
> page, rather than the JSON that CouchDB would be expecting.
> 
> Based on this assumption, we attempted to just specify the target db name as 
> opposed to a URL pointing directly to it, hoping that this would imply that 
> the target db is in the local database. However, this results in a different 
> error:
> 
> {
>  "_id": "rep_init",
>  "_rev": "43-f0b5d20b601a1e8d9348476133b7b782",
>  "source": 
> "https://:@dbgateway..com/,",
>  "target": "",
>  "create_target": false,
>  "continuous": false,
>  "proxy": "http://:",
>  "owner": null,
>  "_replication_state": "error",
>  "_replication_state_time": "2017-12-07T16:49:39+08:00",
>  "_replication_state_reason": "{db_not_found,<<\"could not open 
> \">>}",
>  "_replication_id": "85dfa1ffdbbc3a7e346a360e53d7d71f"
> }
> 
> Note that the target database does actually exist, so we're not sure why it 
> would be saying that the db is not found. We've also tried setting 
> "create_target" to true, but continue to receive the same error. We have 
> spent a considerable amount of time trying to figure out what we are doing 
> wrong by reading through the CouchDB documentation. However, we have not as 
> of yet come to a solution.
> 
> Any support or advice that can be provided would be most appreciated.
> 
> Thank you very much.
> 
> Kind regards,
> Jake Kroon
> 
> 



Re: Automatic Compaction of _global_changes

2017-12-11 Thread Adam Kocoloski
Great, glad to hear it. And yes, each compaction daemon only watches the shard 
files that are hosted on the local node. Cheers,

Adam

> On Dec 11, 2017, at 2:37 PM, <melvin@tdameritrade.com> 
> <melvin@tdameritrade.com> wrote:
> 
> Hi Adam,
> 
> Thanks for the clarification.  I updated my config and it worked!
> 
> In the below example, I had included the compaction rules for all four nodes 
> in the config for one node.  In retrospect, this probably doesn't make sense 
> since I think the compaction daemon would only see its local shards.  So I'll 
> have each node's config only be aware of the full paths of the local shards.
> 
> Thanks again, hopefully a bug fix will be provided soon.
> 
> /mel
> 
> -Original Message-
> From: Adam Kocoloski [mailto:kocol...@apache.org 
> <mailto:kocol...@apache.org>] 
> Sent: Friday, December 08, 2017 10:57 PM
> To: user@couchdb.apache.org <mailto:user@couchdb.apache.org>
> Subject: Re: Automatic Compaction of _global_changes
> 
> Hi Melvin, right, it needs to be the full path as in my example below:
> 
>> shards/-1fff/_global_changes.1512750761
> 
> i.e. you need to include the "shards/xx-yy/" piece as well.
> 
> It is a bit curious that you’ve got those 4 separate timestamps. Typically 
> you’d see all the shards with the same timestamp. Did you try to create the 
> _global_changes database multiple times or anything funny like that? Are each 
> of the associated files actually growing in size?
> 
> Cheers, Adam
> 
>> On Dec 8, 2017, at 3:59 PM, <melvin@tdameritrade.com> 
>> <melvin@tdameritrade.com> wrote:
>> 
>> I tried the workaround two different ways, and it doesn't seem to work 
>> either. I have a 4 node cluster, and I added this:
>> 
>> [compactions]
>> _global_changes.1483029052 = [{db_fragmentation, "70%"}]
>> _global_changes.1483029075 = [{db_fragmentation, "70%"}]
>> _global_changes.1483029126 = [{db_fragmentation, "70%"}]
>> _global_changes.1483029167 = [{db_fragmentation, "70%"}]
>> 
>> I also tried this:
>> 
>> [compactions]
>> _global_changes.1483029052.couch = [{db_fragmentation, "70%"}]
>> _global_changes.1483029075.couch = [{db_fragmentation, "70%"}]
>> _global_changes.1483029126.couch = [{db_fragmentation, "70%"}]
>> _global_changes.1483029167.couch = [{db_fragmentation, "70%"}]
>> 
>> -Original Message-
>> From: Lew, Melvin K 
>> Sent: Friday, December 08, 2017 3:03 PM
>> To: user@couchdb.apache.org
>> Subject: RE: Automatic Compaction of _global_changes
>> 
>> Thanks Adam! I've submitted the following issue: 
>> https://github.com/apache/couchdb/issues/1059
>> 
>> Thanks also for the tip on the security vulnerability. We'll upgrade to 
>> CouchDB 2.1.1 soon.  Fortunately this is an internal database on a 
>> firewalled corporate network so we have a bit more time.
>> 
>> -Original Message-
>> From: Adam Kocoloski [mailto:kocol...@apache.org] 
>> Sent: Friday, December 08, 2017 11:42 AM
>> To: user@couchdb.apache.org
>> Subject: Re: Automatic Compaction of _global_changes
>> 
>> Hiya Melvin, this looks like a bug. I think what’s happening is the 
>> compaction daemon is walking the list of database *shards* on the node and 
>> comparing those names directly against the keys in that config block. The 
>> shard files have internal names like
>> 
>> shards/-1fff/_global_changes.1512750761
>> 
>> If you want to test this out you could look for the full path to one of your 
>> _global_changes shards and supply that as the key instead of just 
>> “_global_changes”. Repeating the config entry for every one of the shards 
>> could also be a workaround for you until we get this patched. Can you file 
>> an issue for it at 
>> https://github.com/apache/couchdb/issues?

Re: Automatic Compaction of _global_changes

2017-12-08 Thread Adam Kocoloski
Hi Melvin, right, it needs to be the full path as in my example below:

> shards/-1fff/_global_changes.1512750761

i.e. you need to include the "shards/xx-yy/" piece as well.
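
So for one of your shards the entry would look something like this (a sketch —
take the exact range and timestamp from your own data directory or shard
listing, and repeat for each local shard):

    [compactions]
    shards/00000000-1fffffff/_global_changes.1483029052 = [{db_fragmentation, "70%"}]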

It is a bit curious that you’ve got those 4 separate timestamps. Typically 
you’d see all the shards with the same timestamp. Did you try to create the 
_global_changes database multiple times or anything funny like that? Are each 
of the associated files actually growing in size?

Cheers, Adam

> On Dec 8, 2017, at 3:59 PM, <melvin@tdameritrade.com> 
> <melvin@tdameritrade.com> wrote:
> 
> I tried the workaround two different ways, and it doesn't seem to work 
> either. I have a 4 node cluster, and I added this:
> 
> [compactions]
> _global_changes.1483029052 = [{db_fragmentation, "70%"}]
> _global_changes.1483029075 = [{db_fragmentation, "70%"}]
> _global_changes.1483029126 = [{db_fragmentation, "70%"}]
> _global_changes.1483029167 = [{db_fragmentation, "70%"}]
> 
> I also tried this:
> 
> [compactions]
> _global_changes.1483029052.couch = [{db_fragmentation, "70%"}]
> _global_changes.1483029075.couch = [{db_fragmentation, "70%"}]
> _global_changes.1483029126.couch = [{db_fragmentation, "70%"}]
> _global_changes.1483029167.couch = [{db_fragmentation, "70%"}]
> 
> -Original Message-
> From: Lew, Melvin K 
> Sent: Friday, December 08, 2017 3:03 PM
> To: user@couchdb.apache.org
> Subject: RE: Automatic Compaction of _global_changes
> 
> Thanks Adam! I've submitted the following issue: 
> https://github.com/apache/couchdb/issues/1059
> 
> Thanks also for the tip on the security vulnerability. We'll upgrade to 
> CouchDB 2.1.1 soon.  Fortunately this is an internal database on a firewalled 
> corporate network so we have a bit more time.
> 
> -Original Message-
> From: Adam Kocoloski [mailto:kocol...@apache.org] 
> Sent: Friday, December 08, 2017 11:42 AM
> To: user@couchdb.apache.org
> Subject: Re: Automatic Compaction of _global_changes
> 
> Hiya Melvin, this looks like a bug. I think what’s happening is the 
> compaction daemon is walking the list of database *shards* on the node and 
> comparing those names directly against the keys in that config block. The 
> shard files have internal names like
> 
> shards/-1fff/_global_changes.1512750761
> 
> If you want to test this out you could look for the full path to one of your 
> _global_changes shards and supply that as the key instead of just 
> “_global_changes”. Repeating the config entry for every one of the shards 
> could also be a workaround for you until we get this patched. Can you file an 
> issue for it at 
> https://github.com/apache/couchdb/issues?
> 
> By the way, releases prior to 1.7.1 and 2.1.1 have a fairly serious security 
> vulnerability, it’d be good if you could upgrade. Cheers,
> 
> Adam
> 
>> On Dec 6, 2017, at 2:21 PM, melvin@tdameritrade.com wrote:
>> 
>> Hi,
>> 
>> I'm using couchdb 2.0.0 on RHEL 7.2 and I'm looking to configure automatic 
>> compaction of _global_changes but I can't seem to get it to work.  I've 
>> checked the file size and data size of the _global_changes database so I 
>> know the criteria I've specified have been met. I don't get an error upon 
>> couchdb startup, but nothing happens.  When I tried setting a _default 
>> compaction rule, then compaction does happen for all databases including 
>> _global_changes.  Any ideas? I hope I'm just missing something obvious. 
>> Please let me know if any more detail is needed.
>> 
>> This is what I have in local.ini that does not work:
>> [compactions]
>> _global_changes = [{db_fragmentation, "70%"}]
>> 
>> Putting this into local.ini does work, but I don't want to compact all 
>> databases:
>> [compactions]
>> _default = [{db_fragmentation, "70%"}]
>> 
>> For the purposes of my testing, I've also added:
>> [compaction_daemon]
>> check_interval = 30
>> 
>> Thanks in advance!
> 



Re: Automatic Compaction of _global_changes

2017-12-08 Thread Adam Kocoloski
Hiya Melvin, this looks like a bug. I think what’s happening is the compaction 
daemon is walking the list of database *shards* on the node and comparing those 
names directly against the keys in that config block. The shard files have 
internal names like

shards/-1fff/_global_changes.1512750761

If you want to test this out you could look for the full path to one of your 
_global_changes shards and supply that as the key instead of just 
“_global_changes”. Repeating the config entry for every one of the shards could 
also be a workaround for you until we get this patched. Can you file an issue 
for it at https://github.com/apache/couchdb/issues?

By the way, releases prior to 1.7.1 and 2.1.1 have a fairly serious security 
vulnerability, it’d be good if you could upgrade. Cheers,

Adam

> On Dec 6, 2017, at 2:21 PM, melvin@tdameritrade.com wrote:
> 
> Hi,
> 
> I'm using couchdb 2.0.0 on RHEL 7.2 and I'm looking to configure automatic 
> compaction of _global_changes but I can't seem to get it to work.  I've 
> checked the file size and data size of the _global_changes database so I know 
> the criteria I've specified have been met. I don't get an error upon couchdb 
> startup, but nothing happens.  When I tried setting a _default compaction 
> rule, then compaction does happen for all databases including 
> _global_changes.  Any ideas? I hope I'm just missing something obvious. 
> Please let me know if any more detail is needed.
> 
> This is what I have in local.ini that does not work:
> [compactions]
> _global_changes = [{db_fragmentation, "70%"}]
> 
> Putting this into local.ini does work, but I don't want to compact all 
> databases:
> [compactions]
> _default = [{db_fragmentation, "70%"}]
> 
> For the purposes of my testing, I've also added:
> [compaction_daemon]
> check_interval = 30
> 
> Thanks in advance!



Re: 100% CPU on only a single node because of couchjs processes

2017-12-05 Thread Adam Kocoloski
Hi Geoff, a couple of additional questions:

1) Are you making these view requests with stale=ok or stale=update_after?
2) What are you using for N and Q in the [cluster] configuration settings?
3) Did you take advantage of the (barely-documented) “zones” attribute when 
defining cluster members?
4) Do you have any other JS code besides the view definitions?

Regarding #1, the cluster will actually select shards differently depending on 
the use of those query parameters. When your request stipulates that you’re OK 
with stale results the cluster *will* select a “primary” copy in order to 
improve the consistency of repeated requests to the same view. The algorithm 
for choosing those primary copies is somewhat subtle hence my question #3.
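
(By "stale requests" in #1 I mean queries of this shape — placeholder names:

    curl 'http://localhost:5984/dbname/_design/ddoc/_view/by_something?stale=ok'
    curl 'http://localhost:5984/dbname/_design/ddoc/_view/by_something?stale=update_after'

as opposed to the default behaviour, which blocks until the index is brought
up to date.)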

If you’re not using stale requests I have a much harder time explaining why the 
100% CPU issue would migrate from node to node like that.

Adam

> On Dec 5, 2017, at 9:36 AM, Geoffrey Cox  wrote:
> 
> Thanks for the responses, any other thoughts?
> 
> FYI: I’m trying to work on a very focused test case that I can share with
> the Dev team, but it is taking a little while to narrow down the exact
> cause.
> On Tue, Dec 5, 2017 at 4:43 AM Robert Samuel Newson 
> wrote:
> 
>> Sorry to contradict you, but Cloudant deploys clusters across amazon AZ's
>> as standard. It's fast enough. It's cross-region that you need to avoid.
>> 
>> B.
>> 
>>> On 5 Dec 2017, at 09:11, Jan Lehnardt  wrote:
>>> 
>>> Heya Geoff,
>>> 
>>> a CouchDB cluster is designed to run in the same data center / with
>> local are networking latencies. A cluster across AWS Availability Zones
>> won’t work as you see. If you want CouchDB’s in both AZs, use regular
>> replication and keep the clusters local to the AZ.
>>> 
>>> Best
>>> Jan
>>> --
>>> 
 On 4. Dec 2017, at 19:46, Geoffrey Cox  wrote:
 
 Hi,
 
 I've spent days using trial and error to try and figure out why I am
 getting a very high CPU load on only a single node in my cluster. I'm
 hoping someone has an idea of what is going on as I'm getting stuck.
 
 Here's my configuration:
 
 1. 2 node cluster:
1. Each node is located in a different AWS availability zone
2. Each node is a t2 medium instance (2 CPU cores, 4 GB Mem)
 2. A haproxy server is load balancing traffic to the nodes using round
 robin
 
 The problem:
 
 1. After users make changes via PouchDB, a backend runs a number of
 routines that use views to calculate notifications. The issue is that
>> on a
 single node, the couchjs processes stack up and then start to consume
 nearly all the available CPU. This server then becomes the "workhorse"
>> that
 always does *all* the heavy duty couchjs processing until I restart
>> this
 node.
 2. It is important to note that both nodes have couchjs processes, but
 it is only a single node that has the couchjs processes that are using
>> 100%
 CPU
 3. I've even resorted to setting `os_process_limit = 10` and this just
 results in each couchjs process taking over 10% each! In other words,
>> the
 couchjs processes just eat up all the CPU no matter how many couchjs
 process there are!
 4. The CPU usage will eventually clear after all the processing is
>> done,
 but then as soon as there is more to process the workhorse node will
>> get
 bogged down again.
 5. If I restart the workhorse node, the other node then becomes the
 workhorse node. This is the only way to get the couchjs processes to
>> "move"
 to another node.
 6. The problem is that this design is not scalable as only one node can
 be the workhorse node at any given time. Moreover this causes specific
 instances to run out of CPU credits. Shouldn't the couchjs processes be
 spread out over all my nodes? From what I can tell, if I add more
>> nodes I'm
 still going to have the issue where only one of the nodes is getting
>> bogged
 down. Is it possible that the problem is that I have 2 nodes and
>> really I
 need at least 3 nodes? (I know a 2-node cluster is not very typical)
 
 
 Things I've checked:
 
 1. Ensured that the load balancing is working, i.e. haproxy is indeed
 distributing traffic accordingly
 2. I've tried setting `os_process_limit = 10` and
>> `os_process_soft_limit
 = 5` to see if I could force a more conservative usage of couchjs
 processes, but instead the couchjs processes just consume all the CPU
>> load.
 3. I've tried simulating the issue locally with VMs and I cannot
 duplicate any such load. My guess is that this is because the nodes are
 located on the same box so hop distance between nodes is very small and
 this somehow keeps the CPU usage to a minimum
 4. I've tried isolating the issue by creating short code snippets that
 intentionally try to spawn 

Re: Armhf Snap for Couchdb

2017-12-02 Thread Adam Kocoloski

> On Dec 1, 2017, at 5:11 PM, Dave Cottlehuber  wrote:
> 
> On Fri, 1 Dec 2017, at 22:06, Ryan J. Yoder wrote:
>> I'm trying to install couchdb via snap on a raspberrypi. I've installed
>> on my laptop, so i know the snap is in the store, but the armhf snap
>> doesnt appear to be in the store. I can see it here
>> (https://launchpad.net/~couchdb/+snap/couchdb) (and it appears to run).
>> Why isn't it in the store? I can download and install it, but i would
>> much prefere to install it directly from the store.
>> 
>> Thanks
> 
> Hi Ryan,
> 
> http://docs.couchdb.org/en/latest/install/snap.html is all I know about,
> and as I don't use Ubuntu I can't help further.
> 
> Can you try that, and report back with any specific error messages you
> get to the list?
> 
> Can anybody else help Ryan out on how to install ubuntu snaps for Couch?
> I'm not familiar with how this is set up. Please Cc him as I don't think
> he's subscribed to our list.
> 
> A+
> Dave
> —
>  Dave Cottlehuber
>  d...@apache.org
>  Sent from my Couch

Good question, I recall some activity around getting CouchDB into the store 
back in March but I wasn’t tracking it closely. Perhaps Robert or Michael has a 
more detailed view. Cheers,

Adam

Re: CouchDB sync error net::ERR_CONNECTION_RESET

2017-11-29 Thread Adam Kocoloski
Hi Yvonne, yes, either the remote CouchDB instance needs to be directly 
accessible from the internet, or you need some kind of tunnel or proxy that 
will provide access to it. PouchDB is not special in this respect; it’s using 
the same API as everyone else. Cheers,

Adam
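
P.S. If you do end up going the SSH tunnel route you mention, it doesn't need
Node.js or any script at all — a plain ssh port forward is enough (a sketch,
assuming the remote CouchDB listens on 5984 and you have shell access to that
host):

    # forward local port 5984 to the CouchDB instance on the remote host
    ssh -N -L 5984:localhost:5984 youruser@remote-host
    # then point PouchDB at http://localhost:5984/<dbname> as if it were local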

> On Nov 29, 2017, at 5:15 AM, Yvonne Aburrow <yabur...@brookes.ac.uk> wrote:
> 
> oops I mean SSH tunnel
> 
> Regards,
> 
> *Yvonne Aburrow*
> Applications Developer
> Information Systems Team
> *IT Services <http://www.brookes.ac.uk/obis/>*
> Oxford Brookes University <http://www.brookes.ac.uk/>
> 
> extn 2706
> 
> For enquiries and issues with live systems, please email
> broo...@service-now.com
> 
> 
> 
> On 29 November 2017 at 09:48, Yvonne Aburrow <yabur...@brookes.ac.uk> wrote:
> 
>> Thanks Adam
>> 
>> Unfortunately that still didn't work - I got ERR_CONNECTION_RESET again.
>> 
>> I am wondering if I need to use a SSL tunnel to create a local instance (I
>> had to do that to access Fauxton).
>> 
>> There doesn't seem to be a script for SSL tunnelling that doesn't involve
>> using Node.js
>> 
>> Regards,
>> 
>> *Yvonne Aburrow*
>> Applications Developer
>> Information Systems Team
>> *IT Services <http://www.brookes.ac.uk/obis/>*
>> Oxford Brookes University <http://www.brookes.ac.uk/>
>> 
>> extn 2706
>> 
>> For enquiries and issues with live systems, please email
>> broo...@service-now.com
>> 
>> 
>> 
>> On 28 November 2017 at 20:40, Adam Kocoloski <kocol...@apache.org> wrote:
>> 
>>> Hi Yvonne,
>>> 
>>> It looks like you’re trying to make an HTTPS connection to port 5984. Did
>>> you customize the server config? The default CouchDB configuration us port
>>> *6984* for HTTPS. Cheers,
>>> 
>>> Adam
>>> 
>>>> On Nov 28, 2017, at 9:37 AM, Yvonne Aburrow <yabur...@brookes.ac.uk>
>>> wrote:
>>>> 
>>>> I am trying to sync a local PouchDB instance with a remote CouchDB
>>> instance
>>>> on Google App Engine. I am not using Node.js
>>>> 
>>>> I have successfully logged in to the remote instance
>>>> <https://stackoverflow.com/questions/47474384/couchdb-login-
>>> access-on-google-app-engine>,
>>>> but I am getting the following error when I try to sync:
>>>> 
>>>>   replication paused (e.g. user went offline)
>>>>   pouchdb-6.3.4.min.js:9 GET
>>>> https://:5984/pouchnotes/?_nonce=1511870134012
>>>> net::ERR_CONNECTION_RESET
>>>> 
>>>> *This is my sync function:*
>>>> 
>>>>   PouchNotesObj.prototype.syncnoteset = function (start, end) {
>>>>   var start = new Date().getTime();
>>>>   document.getElementById("syncbutton").innerHTML = "Syncing...";
>>>> 
>>>>   var i,
>>>>   that = this,
>>>> 
>>>>   options = {
>>>>   doc_ids:['1450853987668']
>>>> };
>>>> 
>>>> 
>>>>   //options.include_docs = true;
>>>> 
>>>>   if(start){ options.startkey = start; }
>>>>   if(end){ options.endkey = end; }
>>>> 
>>>>   PouchDB.sync(this.dbname, this.remote, { retry: true })
>>>>   //this.pdb.sync(this.remote, { doc_id:['1450853987668'] })
>>>>   .on('change', function (info) {
>>>>  console.log('change');
>>>>   document.getElementById("syncbutton").innerHTML = "Sync Notes";
>>>>   }).on('paused', function () {
>>>>  console.log('replication paused (e.g. user went offline)');
>>>>   document.getElementById("syncbutton").innerHTML = "Sync Notes";
>>>>   }).on('active', function () {
>>>>  console.log('replicate resumed (e.g. user went back online)');
>>>>   document.getElementById("syncbutton").innerHTML = "Sync Notes";
>>>>   }).on('denied', function (info) {
>>>>  console.log('a document failed to replicate, e.g. due to
>>>> permissions');
>>>>   document.getElementById("syncbutton").innerHTML = "Sync Notes";
>>>>   }).on('complete', function (info) {
>>>> console.log("Sync Complete");
>>>> document.getElementById("syncbutton").innerHTML = "Sync
>>> Notes";
>>>> that.viewnoteset();
>>>> that.formobject.reset();
>>>> that.show(that.formobject.dataset.show);
>>>> that.hide(that.formobject.dataset.hide);
>>>> var end = new Date().getTime();
>>>> console.log("Time Taken - " + (end - start) + " ms");
>>>>   }).on('error', function (error) {
>>>> console.log("Sync Error:" + JSON.stringify(error));
>>>> alert("Sync Error:" + error);
>>>> that.showerror(error);
>>>>   });
>>>> 
>>>>   }
>>>> 
>>>> Any idea what is causing the connection to reset?
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>>> *Yvonne Aburrow*
>>>> Applications Developer
>>>> Information Systems Team
>>>> *IT Services <http://www.brookes.ac.uk/obis/>*
>>>> Oxford Brookes University <http://www.brookes.ac.uk/>
>>>> 
>>>> extn 2706
>>> 
>>> 
>> 



Re: CouchDB sync error net::ERR_CONNECTION_RESET

2017-11-28 Thread Adam Kocoloski
Hi Yvonne,

It looks like you’re trying to make an HTTPS connection to port 5984. Did you 
customize the server config? The default CouchDB configuration uses port *6984* 
for HTTPS. Cheers,

Adam

> On Nov 28, 2017, at 9:37 AM, Yvonne Aburrow  wrote:
> 
> I am trying to sync a local PouchDB instance with a remote CouchDB instance
> on Google App Engine. I am not using Node.js
> 
> I have successfully logged in to the remote instance
> ,
> but I am getting the following error when I try to sync:
> 
>replication paused (e.g. user went offline)
>pouchdb-6.3.4.min.js:9 GET
> https://:5984/pouchnotes/?_nonce=1511870134012
> net::ERR_CONNECTION_RESET
> 
> *This is my sync function:*
> 
>PouchNotesObj.prototype.syncnoteset = function (start, end) {
>var start = new Date().getTime();
>document.getElementById("syncbutton").innerHTML = "Syncing...";
> 
>var i,
>that = this,
> 
>options = {
>doc_ids:['1450853987668']
>  };
> 
> 
>//options.include_docs = true;
> 
>if(start){ options.startkey = start; }
>if(end){ options.endkey = end; }
> 
>PouchDB.sync(this.dbname, this.remote, { retry: true })
>//this.pdb.sync(this.remote, { doc_id:['1450853987668'] })
>.on('change', function (info) {
>   console.log('change');
>document.getElementById("syncbutton").innerHTML = "Sync Notes";
>}).on('paused', function () {
>   console.log('replication paused (e.g. user went offline)');
>document.getElementById("syncbutton").innerHTML = "Sync Notes";
>}).on('active', function () {
>   console.log('replicate resumed (e.g. user went back online)');
>document.getElementById("syncbutton").innerHTML = "Sync Notes";
>}).on('denied', function (info) {
>   console.log('a document failed to replicate, e.g. due to
> permissions');
>document.getElementById("syncbutton").innerHTML = "Sync Notes";
>}).on('complete', function (info) {
>  console.log("Sync Complete");
>  document.getElementById("syncbutton").innerHTML = "Sync Notes";
>  that.viewnoteset();
>  that.formobject.reset();
>  that.show(that.formobject.dataset.show);
>  that.hide(that.formobject.dataset.hide);
>  var end = new Date().getTime();
>  console.log("Time Taken - " + (end - start) + " ms");
>}).on('error', function (error) {
>  console.log("Sync Error:" + JSON.stringify(error));
>  alert("Sync Error:" + error);
>  that.showerror(error);
>});
> 
>}
> 
> Any idea what is causing the connection to reset?
> 
> 
> Regards,
> 
> *Yvonne Aburrow*
> Applications Developer
> Information Systems Team
> *IT Services *
> Oxford Brookes University 
> 
> extn 2706



Re: Strange Errors After Reboot

2017-10-11 Thread Adam Kocoloski
Hi Steven, cryptic isn’t it?

The “undef” error means that the VM cannot find the function 
couch_httpd_auth:basic_authentication_handler/1. Sure enough, that function is 
not defined in that module, nor is it listed in the default server 
configuration. Your configuration has a customized "[chttpd] 
authentication_handlers” block which adds this handler. That’s why you’re 
getting this crash.
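
For comparison, the stock handler list that ships in default.ini looks roughly
like this (from memory — double-check against the default.ini of your install
before copying it):

    [chttpd]
    authentication_handlers = {chttpd_auth, cookie_authentication_handler}, {chttpd_auth, default_authentication_handler}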

Do you happen to know where that configuration showed up? Cheers,

Adam

> On Oct 10, 2017, at 11:20 AM, Steven Hammond  wrote:
> 
> Hello,
> 
> I managed to get an instance of CouchDB running on a virtual machine
> running Ubuntu. Things seemed just fine until I rebooted the server. Now I
> get a 500 error with every request.
> 
> couchdb@couchserver:~$ curl http://localhost:5984
> {"error":"unknown_error","reason":"undef","ref":1372225368}
> 
> On the server, the error looks like this.
> 
> [error] 2017-10-10T15:14:39.690521Z couchdb@localhost <0.8457.0> 4dc9fc7aad
> req_err(1372225368) unknown_error : undef
> 
> [<<"couch_httpd_auth:basic_authentication_handler/1">>,<<"chttpd:authenticate_request/2
> L510">>,<<"chttpd:process_request/1 L289">>,<<"chttpd:handle_request_int/1
> L231">>,<<"mochiweb_http:headers/6 L91">>,<<"proc_lib:init_p_do_apply/3
> L247">>]
> [notice] 2017-10-10T15:14:39.690663Z couchdb@localhost <0.8457.0>
> 4dc9fc7aad localhost:5984 127.0.0.1 undefined GET / 500 ok 0
> 
> 
> And here is what the server shows when couchdb is started, with the log
> level set to debug. I've googled everything I can think of, to no avail.
> I'm pretty new to CouchDB so any help is appreciated.
> 
> Thanks,
> Steve
> 
> 
> 
> shammond42@couchserver:~$ sudo -i -u couchdb /home/couchdb/bin/couchdb
> Configuration Settings:
>  [admins]
> shammond42="-pbkdf2-ba88473098fca0902fa38563526b768128ffd0eb,ae88623bb18783ff5264964f06191e40,10"
>  [attachments] compressible_types="text/*, application/javascript,
> application/json, application/xml"
>  [attachments] compression_level="8"
>  [chttpd] authentication_handlers="{couch_httpd_auth,
> basic_authentication_handler},{couch_httpd_auth,
> cookie_authentication_handler}, {couch_httpd_auth,
> proxy_authentication_handler},{couch_httpd_auth,
> default_authentication_handler}"
>  [chttpd] backlog="512"
>  [chttpd] bind_address="0.0.0.0"
>  [chttpd] docroot="./share/www"
>  [chttpd] port="5984"
>  [chttpd] require_valid_user="true"
>  [chttpd] socket_options="[{recbuf, 262144}, {sndbuf, 262144}, {nodelay,
> true}]"
>  [cluster] n="1"
>  [cluster] q="8"
>  [compaction_daemon] check_interval="300"
>  [compaction_daemon] min_file_size="131072"
>  [compactions] _default="[{db_fragmentation, \"70%\"},
> {view_fragmentation, \"60%\"}]"
>  [cors] credentials="true"
>  [cors] headers="accept, authorization, content-type, origin, referer"
>  [cors] methods="GET, PUT, POST, HEAD, DELETE"
>  [cors] origins="http://localhost:3000,http://10.0.42.8:3000;
>  [couch_httpd_auth] allow_persistent_cookies="false"
>  [couch_httpd_auth] auth_cache_size="50"
>  [couch_httpd_auth] authentication_db="_users"
>  [couch_httpd_auth] authentication_redirect="/_utils/session.html"
>  [couch_httpd_auth] iterations="10"
>  [couch_httpd_auth] proxy_use_secret="true"
>  [couch_httpd_auth] require_valid_user="true"
>  [couch_httpd_auth] secret="73130080693b036da8f1eb56733e7f87"
>  [couch_httpd_auth] timeout="600"
>  [couch_peruser] delete_dbs="false"
>  [couch_peruser] enable="false"
>  [couchdb] attachment_stream_buffer_size="4096"
>  [couchdb] changes_doc_ids_optimization_threshold="100"
>  [couchdb] database_dir="./data"
>  [couchdb] default_security="admin_local"
>  [couchdb] delayed_commits="false"
>  [couchdb] file_compression="snappy"
>  [couchdb] max_dbs_open="500"
>  [couchdb] os_process_timeout="5000"
>  [couchdb] uuid="8b728ef48e287299ae55d231d6e5887a"
>  [couchdb] view_index_dir="./data"
>  [csp] enable="true"
>  [daemons] auth_cache="{couch_auth_cache, start_link, []}"
>  [daemons] compaction_daemon="{couch_compaction_daemon, start_link, []}"
>  [daemons] couch_peruser="{couch_peruser, start_link, []}"
>  [daemons] external_manager="{couch_external_manager, start_link, []}"
>  [daemons] httpd="{couch_httpd, start_link, []}"
>  [daemons] index_server="{couch_index_server, start_link, []}"
>  [daemons] os_daemons="{couch_os_daemons, start_link, []}"
>  [daemons] query_servers="{couch_proc_manager, start_link, []}"
>  [daemons] uuids="{couch_uuids, start, []}"
>  [daemons] vhosts="{couch_httpd_vhost, start_link, []}"
>  [database_compaction] checkpoint_after="5242880"
>  [database_compaction] doc_buffer_size="524288"
>  [httpd] allow_jsonp="false"
>  [httpd] authentication_handlers="{couch_httpd_auth,
> cookie_authentication_handler}, {couch_httpd_auth,
> proxy_authentication_handler},{couch_httpd_auth,
> default_authentication_handler}"
>  [httpd] bind_address="127.0.0.1"
>  [httpd] default_handler="{couch_httpd_db, handle_request}"
>  [httpd] enable_cors="true"
>  [httpd] 

Re: CouchDB 1.6 changes feed throughput decrease

2017-06-28 Thread Adam Kocoloski
Hi Carlos, I think the ML might have stripped the screenshot.

I know there have been a couple of bugs that caused exactly that behavior in 
the past. Will see what I can dig up from JIRA. Cheers,

Adam
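
P.S. For anyone wanting to reproduce the isolation test Carlos describes, it
boils down to something like this (a sketch with placeholder host/db names):

    curl -s 'http://dbhost:5984/dbname/_changes?feed=continuous&since=0' > /dev/null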

> On Jun 28, 2017, at 12:03 PM, Carlos Alonso  wrote:
> 
> Hi guys, we're seeing CouchDB changes feed dropping throughput after a few 
> minutes. Is this something known?
> 
> I've tried to isolate the problem by just reading from the changes feed and 
> throwing it into /dev/null. The send/receive speeds consistently go down and 
> I have no clue. Have any of you seen this before? Please find attached a 
> screenshot of our metrics. The purple line is the bytes sent metric from the 
> CouchDB node host and the blue one is the bytes received from where I read 
> it. The second peak is because I restart the curl process.
> 
> Any clue?
> 
> Regards
> -- 
>   
> Carlos Alonso
> Data Engineer
> Madrid, Spain
> carlos.alo...@cabify.com 
> Prueba gratis con este código 
> #CARLOSA6319 
>     
> 
> Este mensaje y cualquier archivo adjunto va dirigido exclusivamente a su 
> destinatario, pudiendo contener información confidencial sometida a secreto 
> profesional. No está permitida su reproducción o distribución sin la 
> autorización expresa de Cabify. Si usted no es el destinatario final por 
> favor elimínelo e infórmenos por esta vía. 
> 
> This message and any attached file are intended exclusively for the 
> addressee, and it may be confidential. You are not allowed to copy or 
> disclose it without Cabify's prior written authorization. If you are not the 
> intended recipient please delete it from your system and notify us by e-mail.



Re: _global_changes purpose

2017-06-27 Thread Adam Kocoloski
The guarantee on the per-DB _changes feed is much stronger — if you got a 201 
or 202 HTTP response back, the updated documents will always show up in 
_changes eventually.

It'd be nice if we could offer the same guarantee on _db_updates but there are 
some non-trivial technical challenges under the hood. In the current 
implementation you’d basically need to crash the entire cluster within 1 second 
of the update being committed to the DB in order for it to be lost from the 
_db_updates feed. I think that’s still a useful alternative to opening up 100s 
of connections to listen to every database’s feed directly.
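
Subscribing to it is straightforward (a sketch — the endpoint requires server
admin credentials by default):

    curl 'http://admin:password@localhost:5984/_db_updates?feed=continuous'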

Cheers, Adam

> On Jun 27, 2017, at 2:42 AM, Vladimir Kuznetsov  wrote:
> 
> Joan, thanks for the pointer to the documentation. 
> 
> Sorry for being annoying, I have one more question. The doc states the 
> following:  
> 
> "Note: This was designed without the guarantee that a DB event will be 
> persisted or ever occur in the _db_updates feed. It probably will, but it 
> isn't guaranteed". 
> 
> Ok, I understand events in _db_updates feed are not guaranteed to be in 
> order, timing is also is not guaranteed, that's fine. What makes me really 
> confused is "DB event is not guaranteed to ever occur in _db_updates feed". 
> What's the point of using _db_updates if I cannot rely on it? Recalling the 
> use case you mentioned earlier in this thread: "you have 100 databases and 
> you want to know when something changes on all of them", how do I know for 
> sure a change in some database occurred if it is not even guaranteed to 
> eventually appear in _db_updates?
> 
> Another question - is this true for per-db _changes feed i.e. is it also not 
> guaranteed that ANY change will eventually appear in _changes?
> 
> thanks,
> --Vovan
> 
>> On Jun 26, 2017, at 11:12 PM, Joan Touzet  wrote:
>> 
>> I'll update the docs. However, for now we have:
>> 
>> ---
>> When a database is created, deleted, or updated, a corresponding event will 
>> be persisted to disk (Note: This was designed without the guarantee that a 
>> DB event will be persisted or ever occur in the _db_updates feed. It 
>> probably will, but it isn't guaranteed). Users can subscribe to a 
>> _changes-like feed of these database events by querying the _db_updates 
>> endpoint.
>> 
>> When an admin user queries the /_db_updates endpoint, they will see the 
>> account name associated with the DB update as well as update
>> ---
>> And technically, the endpoint can work without the _global_changes database, 
>> but be aware:
>> 
>> ---
>> 3: global_changes, update_db: (true/false) A flag setting whether to update 
>> the global_changes database. If false, changes will be lost and there will 
>> be no performance impact of global_changes on the cluster.
>> ---
>> 
>> This is all from https://github.com/apache/couchdb-global-changes
>> 
>> I also learned something new today!
>> 
>> -Joan
>> 
>> - Original Message -
>> From: "Vladimir Kuznetsov" 
>> To: "Joan Touzet" 
>> Cc: user@couchdb.apache.org
>> Sent: Tuesday, 27 June, 2017 1:53:02 AM
>> Subject: Re: _global_changes purpose
>> 
>> Thanks Joan. 
>> 
>> Very good to know. It'd be great to have this reflected somewhere in the 
>> official couchdb 2.0 docs. Probably it is already there I just could not 
>> find that...
>> 
>> thanks,
>> --Vovan
>> 
>>> On Jun 26, 2017, at 10:42 PM, Joan Touzet  wrote:
>>> 
>>> _db_updates is powered by the _global_changes database.
>>> 
>>> -Joan
>>> 
>>> - Original Message -
>>> From: "Vladimir Kuznetsov" 
>>> To: user@couchdb.apache.org, "Joan Touzet" 
>>> Sent: Tuesday, 27 June, 2017 12:39:55 AM
>>> Subject: Re: _global_changes purpose
>>> 
>>> Hi Joan
>>> 
>>> I heard /_db_updates is the feed-like thing I could subscribe to listen to 
>>> the global updates(same way you described). It is not very clear why would 
>>> I need access to _global_changes database when I already have /_db_updates 
>>> method with pagination and long-polling features.
>>> 
>>> Is listening on _global_changes's /_changes feed the same as listening on 
>>> /_db_updates? Or is there any difference? What is preferred?
>>> 
>>> thanks,
>>> --Vovan
>>> 
>>> 
 On Jun 26, 2017, at 9:21 PM, Joan Touzet  wrote:
 
 Say you have 100 databases and you want to know when something changes on 
 all
 of them. In 1.x you have to open 100 _changes continuous feeds to get that
 information. In 2.x you have to open a single connection to 
 _global_changes.
 
 Think of the possibilities.
 
 -Joan
 
 - Original Message -
 From: "Vladimir Kuznetsov" 
 To: user@couchdb.apache.org
 Sent: Monday, 26 June, 2017 8:47:46 PM
 Subject: _global_changes purpose
 
 Hi guys
 
 I cannot find any good explanation what's the purpose of 

Re: Stale mango queries?

2017-06-12 Thread Adam Kocoloski
Hi Assaf, Mango does not support stale queries at this time. COUCHDB-3209 is 
filed for that enhancement. The indexing performance is approximately 
equivalent to Erlang views, not JS. Cheers,

Adam

> On Jun 12, 2017, at 9:42 AM, aa mm  wrote:
> 
> Hi.
> 
> Does mango views support stale queries?
> Does the mango views index efficiency equivalent to erlang views or
> javascript views?
> 
> Thanks,
> Assaf.



Re: _all_docs consistency in cluster

2017-06-06 Thread Adam Kocoloski
Hi Carlos, yes …

The _all_docs, _changes, _view and _find endpoints do *not* apply any quorum to 
their results. They generate a merged result set from exactly one copy of every 
shard, and the copy that is selected is not always stable. So this is the 
defined (if admittedly unexpected) behavior.

One best practice if you’re adding a new node to a cluster (or rebuilding an 
unhealthy one) is to set the maintenance mode flag on the node’s config to true:

[couchdb]
maintenance_mode = true
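
The same flag can also be flipped at runtime through the node-local config API
rather than editing the ini and restarting (a sketch, assuming the backdoor
port 5986 is reachable on that node):

    curl -X PUT 'http://localhost:5986/_config/couchdb/maintenance_mode' -d '"true"'
    # shard sync and view builds show up in _active_tasks on the clustered port
    curl 'http://localhost:5984/_active_tasks'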

This will cause the node to not participate in any read operations, but it will 
still receive and synchronize data. You can watch for pending_changes messages 
in the logs and view builds in active_tasks and lift the flag once those clear 
out. Cheers,

Adam

> On Jun 6, 2017, at 10:39 AM, Carlos Alonso  wrote:
> 
> Hi guys.
> 
> I've been experimenting with operating a CouchDB 2.0 cluster and I've seen
> the following unexpected behaviour.
> 
> Having a db with just one shard and just one replica, and a client that just
> inserts a new document and reads /db/_all_docs on a loop every second (just
> to simulate a controlled load) I expect the number of docs on every read to
> be sequentially incrementing, and it actually is.
> 
> However, at some point I add a new replica for that shard, in a different
> node of the cluster and, while the newly added node is synchronising its
> shard the number of read documents is not incremented anymore!! It is only
> when the new repica is synchronised that the numbers match again.
> 
> It feels like while replicating those requests using the new replica as
> coordinator are resolved locally instead of via quorum as I'd expect.
> 
> Has anyone seen something similar?
> -- 
> [image: Cabify - Your private Driver] 
> 
> *Carlos Alonso*
> Data Engineer
> Madrid, Spain
> 
> carlos.alo...@cabify.com
> 
> Prueba gratis con este código
> #CARLOSA6319 
> [image: Facebook] [image: Twitter]
> [image: Instagram] [image:
> Linkedin] 
> 
> -- 
> Este mensaje y cualquier archivo adjunto va dirigido exclusivamente a su 
> destinatario, pudiendo contener información confidencial sometida a secreto 
> profesional. No está permitida su reproducción o distribución sin la 
> autorización expresa de Cabify. Si usted no es el destinatario final por 
> favor elimínelo e infórmenos por esta vía. 
> 
> This message and any attached file are intended exclusively for the 
> addressee, and it may be confidential. You are not allowed to copy or 
> disclose it without Cabify's prior written authorization. If you are not 
> the intended recipient please delete it from your system and notify us by 
> e-mail.



Re: How scalable is replication?

2017-04-11 Thread Adam Kocoloski
Hi Simon, it is definitely possible for a single cluster to manage thousands of 
replication jobs simultaneously — we have done this in production — but it does 
introduce scalability challenges and quite a bit of computational overhead. 
There’s a lot of work happening to improve CouchDB’s handling of large numbers 
of replications right now in 
https://issues.apache.org/jira/browse/COUCHDB-3324; Nick Vatamaniuc recently 
summarized the effort on the dev@couchdb mailing list:

http://mail-archives.apache.org/mod_mbox/couchdb-dev/201704.mbox/browser
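
For scale context, each of those replication jobs is typically just a document
in the _replicator database along these lines (a sketch with placeholder
names):

    {
      "_id": "push-users-to-replica-042",
      "source": "https://central.example.com/users",
      "target": "https://replica-042.example.com/users",
      "continuous": true
    }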

Cheers, Adam

> On Apr 11, 2017, at 4:45 AM, Simon Temple  wrote:
> 
> I'm searching for information about replication scalability, has anyone 
> deployed replication across hundreds or even thousands of instances?
> 
> Can I realistically expect a single cluster to act as a replication point for 
> a thousand replicas?
> 
> The database I wish to replicate contains user information.  This information 
> is not updated very frequently but is read often. 
> 
> TIA
> 
> Simon
> 
>  



Re: Pagination recipe, Do you really need to keep updating the start key?

2017-04-07 Thread Adam Kocoloski
No, the “startkey_doc_id” is only considered after the start key is applied. 
For example:

key=[02134], id=123
key=[02134], id=567 <- your second query starts from here
…
key=[02134, Jackson], id=234 <- not here

Make sense? In other words, the view is actually indexed under the hood by the 
[key, docid] combination, and the “startkey_doc_id” field exposes that detail. 
Cheers,

Adam

> On Apr 7, 2017, at 1:15 PM, Jason Gordon  
> wrote:
> 
> Hi Jan,
> 
> You raise a good point.  But it leads to another, similar question
> 
> If rows 50 and 51 both contain a user with lastname Jackson, the recipe
> says to use the following as the next key
> 
> startkey=[02134,Jackson] endKey=[02134,{}] limit=51 startkey_doc_id=234
> (assuming _id for Jackson 234)
> 
> but why not
> 
> startkey=[02134] endKey=[02134,{}] startkey_doc_id=234 limit=51
> 
> Would they not accomplish the same thing?  Is the 2nd query inefficient?
> 
> Thanks,
> 
> Jason
> 
> 
> 
> 
> 
> 
> Jason Gordon  | Principal | A S S U R E B R I D G E
> Office:  +1 888 409 6995  |  Mobile:  +1 978 885 6102  |  Fax: +1 888 409
> 6995
> Email: jason.gor...@assurebridge.com
> 
> On Fri, Apr 7, 2017 at 5:32 AM, Jan Lehnardt  wrote:
> 
>> 
>>> On 6 Apr 2017, at 21:18, Jason Gordon 
>> wrote:
>>> 
>>> The CouchDB docs 6.2.5 pagination recipe recommends to use the
>>> "startkey_docid for pagination if, and only if, the extra row you fetch
>> to
>>> find the next page has the same key as the current startkey"
>>> 
>>> Why can't you keep the start key the same and just keep updating the
>>> startkey_docid?
>>> 
>>> For example:
>>> 
>>> if a view emits a key of [doc.zipcode, doc.lastname]
>>> 
>>> And I'm looking for all people in a given zipcode.
>>> and I do an initial query with startkey=[02134] endKey=[02134,{}]
>> limit=51
>>> 
>>> the 50th user has a last name of Jackson (_id 123)  and the 51st user
>> has a
>>> lastname of Johnson (_id 234).
>>> 
>>> I could ask for the next page in two ways:
>>> 
>>> *startkey=[02134,Johnson] endKey=[02134,{}] limit=51*
>>> 
>>> OR
>>> 
>>> *startkey=[02134] endKey=[02134,{}] startkey_doc_id=234 limit=51*
>>> 
>>> Is there something wrong with the second approach?  Would it perform
>> poorly?
>> 
>> What if lastname in row 51 is also Jackson? :)
>> 
>> Best
>> Jan
>> --
>> 
>>> 
>>> Thanks
>>> 
>>> Jason
>>> 
>>> 
>>> Jason Gordon  | Principal | A S S U R E B R I D G E
>>> Office:  +1 888 409 6995  |  Mobile:  +1 978 885 6102  |  Fax: +1 888 409
>>> 6995
>>> Email: jason.gor...@assurebridge.com
>> 
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>> 
>> 



Re: views failing due to fabric_worker_timeout and OS process timed out

2017-02-21 Thread Adam Kocoloski
Hi Gustavo, there are a couple of things going on here. Let’s address them 
individually:

> On Feb 21, 2017, at 6:17 PM, Gustavo Delfino  wrote:
> 
> Hi, I am evaluating using CouchDB and all worked well with a small test 
> database. Now I am trying to use it with a much larger database and I am 
> having an issue creating views. My view map function is very simple:
> 
> function (doc) {
>var trw_id;
>if(doc.customer_id){
>  emit(doc.customer_id, doc._id);
>}
> }
> 
> With a few hundred documents it works well but not as the size of the db 
> grows (or maybe I have an issue with the function above).
> 
> I can see in the log how the shards start working:
> 
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29209.6>  
> Starting index update for db: shards/2000-3fff/vw.1487715840 idx: 
> _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29194.6>  
> Starting index update for db: shards/-1fff/vw.1487715840 idx: 
> _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29191.6>  
> Starting index update for db: shards/6000-7fff/vw.1487715840 idx: 
> _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29205.6>  
> Starting index update for db: shards/8000-9fff/vw.1487715840 idx: 
> _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29218.6>  
> Starting index update for db: shards/4000-5fff/vw.1487715840 idx: 
> _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29228.6>  
> Starting index update for db: shards/a000-bfff/vw.1487715840 idx: 
> _design/appname
> [info] 2017-02-21T22:38:58.786000Z couchdb@localhost <0.29225.6>  
> Starting index update for db: shards/c000-dfff/vw.1487715840 idx: 
> _design/appname
> [info] 2017-02-21T22:38:58.788000Z couchdb@localhost <0.29208.6>  
> Starting index update for db: shards/e000-/vw.1487715840 idx: 
> _design/appname
> 
> I see high CPU activity signaling that the index is being created and 
> suddenly it stops:
> 
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 
> fabric_worker_timeout 
> map_view,couchdb@localhost,<<"shards/-1fff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 
> fabric_worker_timeout 
> map_view,couchdb@localhost,<<"shards/2000-3fff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 
> fabric_worker_timeout 
> map_view,couchdb@localhost,<<"shards/4000-5fff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 
> fabric_worker_timeout 
> map_view,couchdb@localhost,<<"shards/6000-7fff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 
> fabric_worker_timeout 
> map_view,couchdb@localhost,<<"shards/8000-9fff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 
> fabric_worker_timeout 
> map_view,couchdb@localhost,<<"shards/a000-bfff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 
> fabric_worker_timeout 
> map_view,couchdb@localhost,<<"shards/c000-dfff/vw.1487715840">>
> [error] 2017-02-21T22:39:58.931000Z couchdb@localhost <0.19734.6> d4985a33d1 
> fabric_worker_timeout 
> map_view,couchdb@localhost,<<"shards/e000-/vw.1487715840”>>

These timeouts are the expected behavior in 2.0 when a request for a view hits 
a configurable limit. The default timeout is 60 seconds. I believe this may 
have been a change from 1.x where the socket would sit open as long as 
necessary. If you need to recover that behavior you can set

[fabric]
request_timeout = infinity

You could also configure some other number in milliseconds:

; Give up after 10 seconds
[fabric]
request_timeout = 10000

In any case the indexing jobs should have continued even after this timeout. I 
think that’s why the request worked when you reloaded the page.

> [error] 2017-02-21T22:39:59.00Z couchdb@localhost <0.19734.6> d4985a33d1 
> req_err(1329706011) unknown_error : function_clause
>[<<"couch_mrview_show:list_cb/2 L212">>,<<"fabric_view_map:go/7 
> L52">>,<<"couch_query_servers:with_ddoc_proc/2 
> L421">>,<<"chttpd:process_request/1 L293">>,<<"chttpd:handle_request_int/1 
> L229">>,<<"mochiweb_http:headers/6 L122">>,<<"proc_lib:init_p_do_apply/3 
> L237">>]
> [notice] 2017-02-21T22:39:59.002000Z couchdb@localhost <0.19734.6> d4985a33d1 
> 127.0.0.1:5984 127.0.0.1 undefined GET 
> /dbname/_design/appname/_list/data/customer_id?key=%22PRIV-SE270_FC_AZT10L16_016%22
>  500 ok 60218

Here the “60218” number is the response time in milliseconds, which confirms 
that you bumped into the default timeout. However, you should 

Re: Relaxo Ruby gem no longer targets CouchDB

2017-02-12 Thread Adam Kocoloski
Hi Samuel, thanks for sending this note. I have to admit that I find some of 
the feedback confusing — a new query server was one of the headline features of 
2.0 — but that’s neither here nor there. It’s good of you to provide a clear 
statement about the future of the Relaxo projects. Cheers,

Adam

> On Feb 11, 2017, at 5:48 PM, Samuel Williams  
> wrote:
> 
> My bad, the gem is actually "Relaxo::QueryServer" :) But I'm sure no one cares.
> 
> On 12 February 2017 at 11:46, Samuel Williams
>  wrote:
>> Hi Guys,
>> 
>> It's with a heavy (and slightly frustrated) heart that I've decided to
>> stop investing time and effort into CouchDB. The 2.0 release moves
>> further away from the core principals of CouchDB that made it
>> attractive to me. In addition, a lot of issues in the core design of
>> CouchDB (e.g. better query server & schema) seem to be ignored for
>> years and so I've given up hope that there would be improvements.
>> 
>> I'm not trying to be negative too much, the 2.0 release looks really
>> great in a lot of ways - it's simply not what I'm after for my
>> personal projects.
>> 
>> The main point of this email is regarding the gems I maintained and 
>> published.
>> 
>> For several years, I maintained an unpopular set of Ruby gems:
>> "Relaxo", "Relaxo::Model" and "Relaxo::Query::Server". They are not
>> used much but they were pretty decent client libraries. I'm
>> refactoring the first two gems (never had a 1.0 release) as a Git
>> based transactional database. So, from the next release (probably
>> 0.6.0), they will have breaking API changes and no longer work with
>> CouchDB. The third one - Relaxo::Query::Server, may be modified to be
>> a git-based map-reduce server, so eventually that will be unavailable
>> too.
>> 
>> I'm just wanted to let anyone know, officially, what's happening with
>> these gems as I feel it would be unfortunate for someone to be
>> depending on them and not know what's happening or why. If you are
>> stuck using these gems, know that they are now unmaintained in their
>> current form, but you can pin to version "~> 0.4.0" and things would
>> keep working. In addition, I've updated the confluence wiki to point
>> to the best other option I know of (Couchrest) and removed links to
>> these gems. When I update these gems, later today, they should not be
>> published on the CouchDB news feed as they are no longer relevant.
>> 
>> Thanks so much everyone.
>> 
>> Kind regards,
>> Samuel



Re: Storing mango indexes in the ddoc

2017-01-26 Thread Adam Kocoloski
Hi Stefan,

Yes, Mango indexes are stored in design documents. The POST /db/_index endpoint 
accepts a “ddoc” field in its JSON request body that allows you to specify the ID 
of the design document where you want to store the index. If you don’t specify 
that field a new doc is created automatically:

http://docs.couchdb.org/en/2.0.0/api/database/find.html#db-index 


If you already have the design document created you can replicate it in or use 
whatever ddoc management strategy you’ve been using with views to manage your 
Mango indexes as well. Cheers,

Adam
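
As a small sketch, creating a Mango index inside a named design document looks
like this with curl (the database name, index name and field are assumptions
for illustration):

curl -X POST http://localhost:5984/mydb/_index \
     -H "Content-Type: application/json" \
     -d '{"index": {"fields": ["lastname"]}, "ddoc": "my-mango-indexes", "name": "lastname-idx", "type": "json"}'

The index then lives in _design/my-mango-indexes, so it can be replicated and
managed like any other design document.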

> On Jan 26, 2017, at 4:35 PM, Stefan du Fresne  wrote:
> 
> Hi,
> 
> In the same way you define views in the ddoc, is there any way of defining 
> mango indexes in the ddoc? If not, is there a pattern that is recommended for 
> getting them deployed to be used that I’m missing? Or is it expected that 
> when you use these indexes you’re responsible for some code that checks to 
> see if they already exist and then adds them if they aren’t at some point in 
> your application?
> 
> Cheers,
> Stefan



Re: Is there any technical reason why _revisions cannot be returned with _changes and/or views?

2016-12-12 Thread Adam Kocoloski
Hi Robert, that seems like a fine idea.

I think the lack of _revisions support in the changes feed is largely a 
historical artifact. Once upon a time the seq_tree that powers the changes feed 
only had access to the leaf revisions of each document as opposed to the full 
revision tree. If you wanted the full revision tree you needed to do a separate 
btree lookup on the id_tree, and so the changes feed avoided that overhead. In 
2.0 we include direct access to the revision tree in both btrees, and so the 
overhead of adding _revisions to the feed directly should be much smaller.

Cheers, Adam

> On Dec 12, 2016, at 9:27 PM, Robert Payne  wrote:
> 
> One of the major sync bottle necks we have (even with _bulk_get support) is 
> that the changes feed as well as views cannot have the revs=true parameter in 
> addition to include_docs=true to ensure all returned documents include their 
> revisions list. We need these revision lists to ensure our app can do offline 
> edits and then upload them with the _bulk_docs + new_edits flag.
> 
> Digging through the source it looks pretty easy to add support to allow 
> revs=true to be included, but I'm curious if there is any reason why this 
> would be a bad idea?
> 
> Cheers,
> Robert



Re: 2.0 _purge returning "not implemented"

2016-11-10 Thread Adam Kocoloski
Today there is not, but I’m keenly interested in pursuing this option. It’s 
technically feasible — the full deletion would be done by the compactor — but 
we want to be fairly careful in minimizing the ramifications for replication if 
a user configures this option.

My ideal scenario is that we can configure a database to “auto-purge” a deleted 
revision during compaction when we know that said revision has been recorded by 
every replication peer for which we have a checkpoint record. Still needs some 
discussion though.

I agree that without some sort of option to remove the tombstones CouchDB is 
not a good fit for a use case where the data has a shelf life of a couple of 
days. Cheers,

Adam

> On Nov 10, 2016, at 4:54 PM, Geoff Bomford  
> wrote:
> 
> I don't know?
> 
> Is there an option to enable deletion of deleted documents, especially when 
> there is no replication???
> 
> 
> -Original Message- From: Brian Brown
> Sent: Thursday, November 10, 2016 09:56
> To: user@couchdb.apache.org
> Subject: Re: 2.0 _purge returning "not implemented"
> 
> 
> 
> On 11/09/2016 03:30 PM, Geoff Bomford wrote:
>> Thanks Adam,
>> 
>> OK, then maybe Couchdb isn't what I'm looking for because I really
>> need to purge deleted documents.
>> 
>> I'm not using replication, and my data has a life of a couple of days,
>> maximum. So I have a lot of data coming in, being updated, and then
>> being deleted. All those old deleted documents are just going to get
>> in the way.
> Is there not still a config option that tells couch not to store old
> version of documents whatsoever? That sounds like exactly what you need
> in this case...
> 
> --Brian 
> 



Re: 2.0 _changes request returning all changes

2016-11-09 Thread Adam Kocoloski
Hi Geoff, did you add ?filter=_doc_ids to the query string parameters?

http://docs.couchdb.org/en/2.0.0/api/database/changes.html#post--db-_changes 


Cheers, Adam
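
Putting the filter and the doc_ids body together, a minimal sketch (host and
database name are placeholders):

curl -X POST 'http://localhost:5984/db/_changes?filter=_doc_ids' \
     -H "Content-Type: application/json" \
     -d '{"doc_ids": ["NEW712064", "NTG2192"]}'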

> On Nov 9, 2016, at 12:19 AM, Geoff Bomford  
> wrote:
> 
> I’m using couchdb 2.0 and trying to get a list of revision ids for a batch of 
> documents so I can purge them.
> 
> When I submit a POST to /db/_changes...
> 
> {
>"doc_ids": [
>"NEW712064",
>"NTG2192"
>]
> }
> 
> ...the response returns all changes, for all document ids.
> 
> Am I doing something wrong?



Re: 2.0 _purge returning "not implemented"

2016-11-09 Thread Adam Kocoloski
Hi Geoff,

Hmm, no … purge does not work at the clustered database level in the 2.0 
release. It looks like we may have failed to document this change beyond the 
API response that you got. There are some subtleties in how we have to manage 
that operation in a cluster. For example, if you happen to purge a document 
while only two replicas out of three are online, the third one may ultimately 
get re-propagated to the other two shards at a later date.

I know some folks are working on adding this capability back in to the 
clustered API, but until that work lands if you absolutely need to purge 
documents you can navigate to the shard-level API on a different port and 
execute the purge request there. That’s not for the faint of heart, though. 
Cheers,

Adam
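
Purely as a sketch of what such a shard-level request looks like: the shard
range, timestamp suffix and revision below are placeholders, the node-local
port is assumed to be 5986, and the call has to be repeated on every node that
hosts a copy of the shard.

# body maps each document ID to the list of revisions to purge (1.x _purge format)
curl -X POST 'http://localhost:5986/shards%2F00000000-1fffffff%2Fdbname.1478649600/_purge' \
     -H "Content-Type: application/json" \
     -d '{"NEW712064": ["1-967a00dff5e02add41819138abb3284d"]}'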

> On Nov 9, 2016, at 12:22 AM, Geoff Bomford  
> wrote:
> 
> I’m trying to purge some documents by POSTing to /db/_purge
> 
> The response I am getting is...
> 
> {
>"error": "not_implemented",
>"reason": "this feature is not yet implemented"
> }
> 
> Is purge meant to be working in 2.0??



Re: Modifying a cluster

2016-11-02 Thread Adam Kocoloski
Hi Simon,

Maintenance mode is a bit different than simply taking the node out of the load 
balancer configuration. It goes further and actually prevents the node from 
contributing to a clustered response coordinated by _any_ node in the cluster. 
Under normal circumstances a node can still contribute to a result by 
forwarding its data on to the node that handled the HTTP request.

On the _global_changes bit — you might have uncovered a bug. We should look 
into that.

Adam

> On Nov 1, 2016, at 11:16 PM, Simon Keary <ske...@immersivetechnologies.com> 
> wrote:
> 
> 
> 
> Thanks Adam,
> 
> Sorry, yes, it was the "/_dbs" endpoint I meant.
> 
> That all makes sense. What you're suggesting does seem simpler. I presume just 
> removing the node out of the load balancer temporarily has the same effect as 
> putting it into "maintenance mode"? I presume also that I could just have a 
> two, instead of 3, node cluster with q=8, r=1, w=1, n=2 and then remove the 
> second node temporarily, do updates, and then add it back in? 
> 
> I'm still confused why the PUT to "/_dbs/_global_changes" fails with "Only 
> reserved document ids may start with underscore"? Is there any other way to 
> add nodes for the system databases? I'm just thinking of the longer term 
> issue of adding nodes to the cluster if we want to expand the size for 
> additional durability/performance. I can see how do it with the other user 
> databases but not the system databases.
> 
> Thanks,
> Simon
> 
> 
> -Original Message-
> From: Adam Kocoloski [mailto:kocol...@apache.org] 
> Sent: Tuesday, 1 November 2016 10:44 PM
> To: user@couchdb.apache.org
> Subject: Re: Modifying a cluster
> 
> Hi Simon, that sounds more or less correct. I think you meant the 
> “_dbs/<dbname>” endpoint instead of “_all_dbs”.
> 
> I’d agree that the process you outlined is a lot of manual labor. This is 
> part of the price that we pay for having the flexibility to define a 
> different sharding topology for each database in the cluster.
> 
> I might suggest a somewhat different approach - run a 3 node, n=3 cluster, 
> then put the nodes into “maintenance mode” one at a time to patch and upgrade 
> them. The maintenance mode flag will allow the node to continue to 
> participate in the cluster and receive updates, but will prevent it from 
> responding to the client until you determine that it’s healthy again. Running 
> n=3 ensures that you will always have two live nodes durably committing data 
> at any point in time. I appreciate that this may be more expensive than the 
> n=2 model, but it’s far simpler operationally (as you won’t have to modify 
> the sharding setup at all) and is a configuration that is much more 
> extensively tested.
> 
> If you want to use this technique the relevant configuration setting is
> 
> [couchdb]
> maintenance_mode = true
> 
> Cheers, Adam
> 
>> On Nov 1, 2016, at 3:36 AM, Simon Keary <ske...@immersivetechnologies.com> 
>> wrote:
>> 
>> Hi All,
>> 
>> I have a two node cluster with the following configuration:
>> 
>> q=8, r=1, w=2, n=2
>> 
>> From time to time I want to be able to patch/upgrade the servers 
>> by adding two new nodes (servers) to the cluster and then removing the 
>> previous two. In this scenario I think all nodes in the cluster at any time 
>> (2-4) should have copies of all shards of all databases. My understanding is 
>> then to add a node I need to:
>> 
>> 1. Add the node to the list of cluster nodes via a PUT to /_nodes 2. 
>> For each database update the /_all_dbs/ pseudo document. For 
>> each shard in the document add the new node.
>> 
>> There are a few things I'm not clear of:
>> 
>> 1. Is this generally right? Assuming it is:
>> 2. With a large amount of databases it seems impractical to manually add a 
>> node since a document for each database will need to be modified and the 
>> modification isn't trivial. At the moment I have a JS script to do this but 
>> wanted to check I'm not missing something?
>> 3. I don't really understand how the system databases (_users, _metadata, 
>> _replication, _global_changes) fit into the picture? It looks like I need to 
>> treat them as normal databases and add all the shards for them to the new 
>> node? Doing a PUT to (for instance) _all_dbs/_global_changes to do this 
>> fails with "Only reserved document ids may start with underscore" so I'm a 
>> little confused...
>> 
>> Thanks for any help!
>> Simon
>> 
>> 

Re: Modifying a cluster

2016-11-01 Thread Adam Kocoloski
Hi Simon, that sounds more or less correct. I think you meant the 
“_dbs/<dbname>” endpoint instead of “_all_dbs”.

I’d agree that the process you outlined is a lot of manual labor. This is part 
of the price that we pay for having the flexibility to define a different 
sharding topology for each database in the cluster.

I might suggest a somewhat different approach - run a 3 node, n=3 cluster, then 
put the nodes into “maintenance mode” one at a time to patch and upgrade them. 
The maintenance mode flag will allow the node to continue to participate in the 
cluster and receive updates, but will prevent it from responding to the client 
until you determine that it’s healthy again. Running n=3 ensures that you will 
always have two live nodes durably committing data at any point in time. I 
appreciate that this may be more expensive than the n=2 model, but it’s far 
simpler operationally (as you won’t have to modify the sharding setup at all) 
and is a configuration that is much more extensively tested.

If you want to use this technique the relevant configuration setting is

[couchdb]
maintenance_mode = true

Cheers, Adam
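
A sketch of toggling that flag at runtime during a rolling upgrade, assuming
the node-local admin port (5986) is reachable and exposes the 1.x-style
configuration API:

# take the node out of coordination duty before patching it
curl -X PUT http://localhost:5986/_config/couchdb/maintenance_mode -d '"true"'

# put it back once the node is healthy and caught up
curl -X PUT http://localhost:5986/_config/couchdb/maintenance_mode -d '"false"'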

> On Nov 1, 2016, at 3:36 AM, Simon Keary  
> wrote:
> 
> Hi All,
> 
> I have a two node cluster with the following configuration:
> 
> q=8, r=1, w=2, n=2
> 
> From time to time I want to be able to patch/upgrade the servers 
> by adding two new nodes (servers) to the cluster and then removing the 
> previous two. In this scenario I think all nodes in the cluster at any time 
> (2-4) should have copies of all shards of all databases. My understanding is 
> then to add a node I need to:
> 
> 1. Add the node to the list of cluster nodes via a PUT to /_nodes
> 2. For each database update the /_all_dbs/ pseudo document. 
> For each shard in the document add the new node.
> 
> There are a few things I'm not clear of:
> 
> 1. Is this generally right? Assuming it is:
> 2. With a large amount of databases it seems impractical to manually add a 
> node since a document for each database will need to be modified and the 
> modification isn't trivial. At the moment I have a JS script to do this but 
> wanted to check I'm not missing something?
> 3. I don't really understand how the system databases (_users, _metadata, 
> _replication, _global_changes) fit into the picture? It looks like I need to 
> treat them as normal databases and add all the shards for them to the new 
> node? Doing a PUT to (for instance) _all_dbs/_global_changes to do this fails 
> with "Only reserved document ids may start with underscore" so I'm a little 
> confused...
> 
> Thanks for any help!
> Simon
> 
> 



Re: Node names and cluster with CouchDB 2.0

2016-10-25 Thread Adam Kocoloski
That would be _most_ welcome. Glad we got this sorted out. Cheers,

Adam

> On Oct 25, 2016, at 6:56 PM, Simon Keary <ske...@immersivetechnologies.com> 
> wrote:
> 
> 
> 
> Hi Adam,
> 
> Thanks so much - I have a cluster working now!
> 
> A key part I was missing was adding these lines to vm.args:
> 
> -kernel inet_dist_listen_min 9100
> -kernel inet_dist_listen_max 9200
> 
> I had incorrectly assumed that these were the default settings based on the 
> documentation.
> 
> I'm (obviously) very new to clustering but I'll see if I can suggest a couple 
> of tweaks to the documentation via a PR .
> 
> Thanks again,
> Simon
> 
> 
> -Original Message-
> From: Adam Kocoloski [mailto:kocol...@apache.org] 
> Sent: Wednesday, 26 October 2016 2:39 AM
> To: user@couchdb.apache.org
> Subject: Re: Node names and cluster with CouchDB 2.0
> 
> Hi Simon, if your nodes need to find each other by IP address you should use 
> -name. You can specify it in VM.args like
> 
> -name couchdb@<ip-address>
> 
> There shouldn't be a need to change the "couchdb@" element for each node 
> unless you want to run multiple nodes on the same IP (not recommended).
> 
> When you add the peer nodes into the _nodes database you should use the same 
> format; each document should have an ID like "couchdb@<ip-address>".
> 
> The firewall ports look fine provided that your vm.args file includes these 
> lines
> 
> -kernel inet_dist_listen_min 9100
> -kernel inet_dist_listen_max 9200
> 
> Cheers, Adam
> 
>> On Oct 25, 2016, at 1:17 AM, Simon Keary <ske...@immersivetechnologies.com> 
>> wrote:
>> 
>> 
>> 
>> Hi All,
>> 
>> I'm trying to setup a test CouchDB cluster across two machines and 
>> struggling to get it working and understanding how the node names work. I am 
>> able to successfully call the _nodes endpoint on one of the machines to 
>> add the other but I don't think this is doing the right thing since the 
>> cluster_nodes array is updated with the new node but the all_nodes array 
>> isn't...
>> 
>> Where I think I'm going wrong is setting the node names in the the vm.args 
>> files and then specifying the correct node name when trying to add the 
>> second machine to the first instance. In my case the machines have host 
>> names but neither machine can contact the other via this and can only 
>> contact the other via IP address. Ideally I'd only like the machines to 
>> connect to each other via IP and not host name as this will make maintenance 
>> of the cluster much simpler going forward. In this situation I'm not sure of:
>> 
>> * Whether I should be setting -sname or -name in vm.args. When using -name,  
>> my testing suggests I should use a name of the form "node1" with no @ 
>> symbol. Is this right?
>> * For -name I'm not sure whether it's supposed to be "node1@<ip address>" or "node1@<host name>" or "node1@localhost"?
>> * Once I've set the name I'm not sure how to refer to the other node when 
>> trying to add it via curl. If I've used -sname should I just use the short 
>> name (e.g. "node2") or "node2@"? If -name is used in vm.args 
>> is the correct thing to do to use "node2@"?
>> 
>> I don't believe I have a connectivity issue between the nodes but the ports 
>> that are open between them are 4369, 5984, 5986, 9100-9200 (all TCP) and all 
>> UDP.
>> 
>> When trying to remove the node, to try further variants to see if they work, 
>> I get an error about a document conflict. I suspect this is because CouchDB 
>> is getting into a strange state  but this makes a trial and error process of 
>> figuring out how the node names work as difficult to say the least.
>> 
>> Any help would be appreciated - Thanks!
>> Simon
>> 
>> 
>> 
>> 



Re: Node names and cluster with CouchDB 2.0

2016-10-25 Thread Adam Kocoloski
Hi Simon, if your nodes need to find each other by IP address you should use 
-name. You can specify it in VM.args like

-name couchdb@<ip-address>

There shouldn't be a need to change the "couchdb@" element for each node unless 
you want to run multiple nodes on the same IP (not recommended).

When you add the peer nodes into the _nodes database you should use the same 
format; each document should have an ID like "couchdb@<ip-address>".

The firewall ports look fine provided that your vm.args file includes these 
lines

-kernel inet_dist_listen_min 9100
-kernel inet_dist_listen_max 9200

Cheers, Adam
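
To make that concrete, a minimal sketch for joining a second node (the IP
addresses, cookie and admin credentials are placeholders):

# vm.args on the first node (the cookie must be identical on every node)
-name couchdb@192.168.0.1
-setcookie monster
-kernel inet_dist_listen_min 9100
-kernel inet_dist_listen_max 9200

# from the first node, register the second node in the _nodes database
curl -X PUT http://admin:password@localhost:5986/_nodes/couchdb@192.168.0.2 -d '{}'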

> On Oct 25, 2016, at 1:17 AM, Simon Keary  
> wrote:
> 
> 
> 
> Hi All,
> 
> I'm trying to setup a test CouchDB cluster across two machines and struggling 
> to get it working and understanding how the node names work. I am able to 
> successfully call the _nodes endpoint on one of the machines to add the other 
> but I don't think this is doing the right thing since the cluster_nodes array 
> is updated with the new node but the all_nodes array isn't...
> 
> Where I think I'm going wrong is setting the node names in the the vm.args 
> files and then specifying the correct node name when trying to add the second 
> machine to the first instance. In my case the machines have host names but 
> neither machine can contact the other via this and can only contact the other 
> via IP address. Ideally I'd only like the machines to connect to each other 
> via IP and not host name as this will make maintenance of the cluster much 
> simpler going forward. In this situation I'm not sure of:
> 
> * Whether I should be setting -sname or -name in vm.args. When using -name,  
> my testing suggests I should use a name of the form "node1" with no @ symbol. 
> Is this right?
> * For -name I'm not sure whether it's supposed to be "node1@<ip address>" 
> or "node1@<host name>" or "node1@localhost"?
> * Once I've set the name I'm not sure how to refer to the other node when 
> trying to add it via curl. If I've used -sname should I just use the short 
> name (e.g. "node2") or "node2@"? If -name is used in vm.args 
> is the correct thing to do to use "node2@"?
> 
> I don't believe I have a connectivity issue between the nodes but the ports 
> that are open between them are 4369, 5984, 5986, 9100-9200 (all TCP) and all 
> UDP.
> 
> When trying to remove the node, to try further variants to see if they work, 
> I get an error about a document conflict. I suspect this is because CouchDB 
> is getting into a strange state  but this makes a trial and error process of 
> figuring out how the node names work as difficult to say the least.
> 
> Any help would be appreciated - Thanks!
> Simon
> 
> 
> 
> 



Re: CouchDB at ApacheCon EU / Apache BigData EU

2016-10-25 Thread Adam Kocoloski
Hi Jan, awesome - thanks for representing CouchDB! Wish I could be there, I’m 
especially interested in that second talk :) Cheers,

Adam

> On Oct 23, 2016, at 7:52 AM, Jan Lehnardt  wrote:
> 
> Hey all,
> 
> I’m pleased to announced that I’ll be speaking at ApacheCon EU 
> (http://events.linuxfoundation.org/events/apachecon-europe/program/schedule) 
> and Apache BigData EU 
> (http://events.linuxfoundation.org/events/apache-big-data-europe/program/schedule)
>  in Seville, Spain on the week of November 14-18.
> 
> My talks are:
> - Introducing Apache CouchDB 2.0 (one version at each event)
> - A 2.0 is Not Going to Kill You But It Will Try: Lessons learned from 
> shipping 2.0
> - Building Inclusive Communities: Lessons learned and to be learned from 
> running CouchDB and Hoodie
> - Apache CouchDB 2.0 Sync Deep Dive: Replication Explained in excruciating 
> detail: design decisions and trade offs.
> 
> I’ll be there all week and I hope to see you there!
> 
> Tickets are still available:
> 
> ApacheCon Big Data EU: 
> http://events.linuxfoundation.org/events/apache-big-data-europe/attend/register
> ApacheCon EU: 
> http://events.linuxfoundation.org/events/apache-big-data-europe/attend/register
> 
> Special rates for committers and academics are available.
> 
> Best
> Jan
> --
> 
> 



Re: CouchDB 2.0 Command Line Options

2016-10-18 Thread Adam Kocoloski
Hi Melvin, I have a theory here. Did the original $ROOTDIR/etc/vm.args file 
still contain a “-name” directive?

Multiple “-args_file” inclusions are OK, but when there are conflicting “-name” 
settings the node will default to nonode@nohost. When you specify both “-name” 
and “-sname” the latter will take precedence. Try removing the “-name” from 
$ROOTDIR/etc/vm.args and setting it using either of your first two methods. 
Cheers,

Adam
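
In other words, once the shared etc/vm.args no longer carries its own -name,
something along these lines should work per instance (paths and hostname taken
from the message below, adjust as needed):

export ERL_FLAGS="-couch_ini /myinstance1/default.ini /myinstance1/local.ini -name myinstance1@hostname1.domain"
./bin/couchdb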

> On Oct 18, 2016, at 11:33 AM, melvin@tdameritrade.com wrote:
> 
> Hi,
> 
> I work with Maggie, and had a follow up question about running multiple 
> instances of couchdb using one binaries directory.  When setting up a 
> cluster, I understand I need to set the identity of each node in vm.args.  I 
> tried the following to have couchdb look in a different location for the 
> vm.args file:
> 
>   export ERL_FLAGS="-couch_ini /myinstance1/default.ini 
> /myinstance1/local.ini -args_file /myinstance1/vm.args"
> 
> However, the couchdb log shows the node identity as: nonode@nohost
> 
> Then I tried specifying -name in ERL_FLAGS:
> 
>   export ERL_FLAGS="-couch_ini /myinstance1/default.ini 
> /myinstance1/local.ini -name metc/yinstance1@hostname1.domain"
> 
> And I still get: nonode@nohost
> 
> Finally, I tried specifying -sname in ERL_FLAGS, and this seems to work, but 
> gives a shortname:
> 
>   export ERL_FLAGS="-couch_ini /myinstance1/default.ini 
> /myinstance1/local.ini -sname myinstance1"
> 
> And now I get: myinstance1@hostname1.  Is there any way I can either point 
> couchdb at my instance's vm.args file, or I can specify the fully qualified 
> -name option correctly?
> 
> Thanks for your help!
> 
> -Original Message-
> From: Jiang, Maggie 
> Sent: Thursday, October 06, 2016 5:27 PM
> To: user@couchdb.apache.org
> Subject: RE: CouchDB 2.0 Command Line Options
> 
> Thanks Adam and Jan. We were able to have a few instances running with the 
> workaround.
> 
> -Original Message-
> From: Adam Kocoloski [mailto:kocol...@apache.org] 
> Sent: Thursday, October 06, 2016 12:47 PM
> To: user@couchdb.apache.org
> Subject: Re: CouchDB 2.0 Command Line Options
> 
> Thanks Jan, that was a bit cleaner than I realized. It’s a decent workaround. 
> Important to note that every file you want to be consulted needs to be listed 
> there — the couch_ini flag will override the default search space entirely.
> 
> Adam
> 
>> On Oct 6, 2016, at 11:48 AM, Jan Lehnardt <j...@apache.org> wrote:
>> 
>> try
>> export ERL_FLAGS=“-couch_ini /path/to/default.ini /path/to/local_one.ini”
>> ./bin/couchdb
>> 
>> Analogous with local_two.ini etc.
>> 
>> Best
>> Jan
>> --
>> 
>>> On 06 Oct 2016, at 17:41, maggie.ji...@tdameritrade.com wrote:
>>> 
>>> Hi Adam,
>>> 
>>> Thanks for replying! I need to point it to another location. We plan to 
>>> have a few instances of CouchDB for different purposes on our dev servers 
>>> and would like to use the same binaries to start it up (but pointing to 
>>> different local.ini files on start up). Without the -a option, I'd have to 
>>> build CouchDB 2.0 3 times (for example) in order to be able to start up 3 
>>> instances of it on a dev server.
>>> 
>>> Maggie
>>> 
>>> -Original Message-
>>> From: Adam Kocoloski [mailto:kocol...@apache.org] 
>>> Sent: Thursday, October 06, 2016 11:10 AM
>>> To: user@couchdb.apache.org
>>> Subject: Re: CouchDB 2.0 Command Line Options
>>> 
>>> Hi Maggie,
>>> 
>>> You’re right, the “-a” switch is ignored in 2.0. That’s a miss on our part. 
>>> I filed 
>>> https://issues.apache.org/jira/browse/COUCHDB-3183. You can still drop files 
>>> in the local.d directory and they should take 
>>> precedence over any files in default.* as well as local.ini.
>>> 
>>> There are other undocumented ways to customize the list of configuration 
>>> files that are consulted using flags in the vm.args file (and probably 
>>> using some environment variables as well), but I wouldn’t really recommend 
>>> going there. Does the local.d option work for you or do you need to point 
>>> to another location?
>>> 
>>> Adam
>>> 
>>>> On Oct 5, 2016, at 10:20 AM, maggie.ji...@t

Re: CouchDB: 2.0 & 1.6.1 database compatibility

2016-10-07 Thread Adam Kocoloski
Lots of good questions there.

On the storage size, note that even when you write only one revision of each 
document the database will accumulate some wasted space. Inserts to the 
database cause internal btree structures to be updated, and due to the 
copy-on-write nature of the storage engine the old btree nodes are left around 
in the file.

We did make some changes in the compaction system that produce smaller files at 
the end of the day. You can read more about those changes here - 
https://blog.couchdb.org/2016/08/10/feature-compaction/ 
 - but they don’t 
explain the difference that you reported. Perhaps you didn’t compact the source 
database at all?

You are correct that both design documents and mango will build btree-based 
indexes to answer their queries. I would like to see us add functionality to 
mango over time so that it can cover the large majority of use cases where 
folks need to appeal to views in design documents, but we’re not quite there 
yet. One example where mango cannot help you today is reduce functions; if you 
want to aggregate the values in your index you need to drop down and build a 
view for that.

In terms of performance, mango should be moderately faster at building an index 
because there’s no JavaScript roundtrip. Querying performance should be 
~identical. Cheers,

Adam
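
As a sketch of that last point, a design document with a built-in _sum reduce
can be uploaded with curl (database name, design document name and the amount
field are assumptions):

curl -X PUT http://localhost:5984/mydb/_design/totals \
     -H "Content-Type: application/json" \
     -d '{"views": {"by_customer": {"map": "function (doc) { if (doc.customer_id) { emit(doc.customer_id, doc.amount); } }", "reduce": "_sum"}}}'

Querying /mydb/_design/totals/_view/by_customer?group=true then returns one
aggregated row per customer_id, which is the kind of result Mango cannot
produce on its own today.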

> On Oct 7, 2016, at 7:56 AM, Thanos Vassilakis  wrote:
> 
> Good questions 
> 
> Sent from my iPhone
> 
>> On Oct 7, 2016, at 5:29 AM, Bogdan Andu  wrote:
>> 
>> I see the data management is totally different(and better).
>> now there is a _dbs.couch for a registry-like database for databases
>> and actual databases are located in data/shards subdirectories.
>> 
>> so.. only replication works here..
>> and one can replicate many databases in parallel.
>> 
>> another difference I see is the size of databases.
>> 
>> 2.0 version keep a very small size of databases compared to 1.6.1 version.
>> 
>> Is there any change in storage engine that makes so big differences in
>> database sizes?
>> 
>> all records in db1 in 1.6.1 have only one revision like (1-...) format
>> 
>> db1 in 1.6.1 is 2.5GB with 362849 records
>> after replication:
>> db1 in 2.0 has 69.3 MB with 362849 records
>> 
>> when is recommended to use design documents and when mango queries.
>> is mango intended to replace design documents although I assume both
>> build a view tree for the query in question.
>> 
>> which one is faster?
>> what are the use-cases for each one of the query methods?
>> 
>> Thanks,
>> 
>> Bogdan
>> 
>> 
>> 
>>> On Fri, Oct 7, 2016 at 11:20 AM, max  wrote:
>>> 
>>> Hi,
>>> 
>>> Install 2.0 version on another server or just make it listen on different
>>> port than 1.6 then replicate your data ;)
>>> 
>>> 2016-10-07 9:49 GMT+02:00 Bogdan Andu :
>>> 
 Hello,
 
 I configured a single-node CouchDB 2.0 instance and
 I copied in data directory 1.6.1 couch databases.
 
 But the databases does not show up in Fauxton, only the
 test databases:
 
 ["_global_changes","_replicator","_users","verifytestdb"].
 
 Is there a way to make CouchDB 2.0 read 1.6.1 couch files
 
 without importing?
 
 /Bogdan
>>> 



Re: CouchDB 2.0 Command Line Options

2016-10-06 Thread Adam Kocoloski
Thanks Jan, that was a bit cleaner than I realized. It’s a decent workaround. 
Important to note that every file you want to be consulted needs to be listed 
there — the couch_ini flag will override the default search space entirely.

Adam

> On Oct 6, 2016, at 11:48 AM, Jan Lehnardt <j...@apache.org> wrote:
> 
> try
> export ERL_FLAGS=“-couch_ini /path/to/default.ini /path/to/local_one.ini”
> ./bin/couchdb
> 
> Analogous with local_two.ini etc.
> 
> Best
> Jan
> --
> 
>> On 06 Oct 2016, at 17:41, maggie.ji...@tdameritrade.com wrote:
>> 
>> Hi Adam,
>> 
>> Thanks for replying! I need to point it to another location. We plan to have 
>> a few instances of CouchDB for different purposes on our dev servers and 
>> would like to use the same binaries to start it up (but pointing to 
>> different local.ini files on start up). Without the -a option, I'd have to 
>> build CouchDB 2.0 3 times (for example) in order to be able to start up 3 
>> instances of it on a dev server.
>> 
>> Maggie
>> 
>> -Original Message-
>> From: Adam Kocoloski [mailto:kocol...@apache.org] 
>> Sent: Thursday, October 06, 2016 11:10 AM
>> To: user@couchdb.apache.org
>> Subject: Re: CouchDB 2.0 Command Line Options
>> 
>> Hi Maggie,
>> 
>> You’re right, the “-a” switch is ignored in 2.0. That’s a miss on our part. 
>> I filed 
>> https://issues.apache.org/jira/browse/COUCHDB-3183. You can still drop files 
>> in the local.d directory and they should take 
>> precedence over any files in default.* as well as local.ini.
>> 
>> There are other undocumented ways to customize the list of configuration 
>> files that are consulted using flags in the vm.args file (and probably using 
>> some environment variables as well),, but I wouldn’t really recommend going 
>> there. Does the local.d option work for you or do you need to point to 
>> another location?
>> 
>> Adam
>> 
>>> On Oct 5, 2016, at 10:20 AM, maggie.ji...@tdameritrade.com wrote:
>>> 
>>> Hi,
>>> 
>>> I'm looking to point to a different local.ini than the one located in the 
>>> CouchDB 2.0 etc folder. Is there a way to specify this when starting up 
>>> CouchDB or is there another way?  Looking at the docs, it seems like we 
>>> should still be using the "-a" option but after looking at the startup 
>>> scripts in the bin folder there is no code there to accept the "-a" switch 
>>> anymore.
>>> 
>>> Thanks,
>>> 
>>> Maggie
>> 
> 
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 



Re: CouchDB 2.0 Command Line Options

2016-10-06 Thread Adam Kocoloski
Hi Maggie,

You’re right, the “-a” switch is ignored in 2.0. That’s a miss on our part. I 
filed https://issues.apache.org/jira/browse/COUCHDB-3183. You can still drop 
files in the local.d directory and they should take precedence over any files 
in default.* as well as local.ini.

There are other undocumented ways to customize the list of configuration files 
that are consulted using flags in the vm.args file (and probably using some 
environment variables as well), but I wouldn’t really recommend going there. 
Does the local.d option work for you or do you need to point to another 
location?

Adam

> On Oct 5, 2016, at 10:20 AM, maggie.ji...@tdameritrade.com wrote:
> 
> Hi,
> 
> I'm looking to point to a different local.ini than the one located in the 
> CouchDB 2.0 etc folder. Is there a way to specify this when starting up 
> CouchDB or is there another way?  Looking at the docs, it seems like we 
> should still be using the "-a" option but after looking at the startup 
> scripts in the bin folder there is no code there to accept the "-a" switch 
> anymore.
> 
> Thanks,
> 
> Maggie



Re: Sharding question for clustered CouchDB 2.0

2016-07-26 Thread Adam Kocoloski
Hi Peyton,

It’s expected. The global_changes DB contains one document for every other 
database in the cluster. If you’re primarily writing to one database the 
associated doc in global_changes DB will have a ton of revisions and the shard 
hosting that doc will grow quickly. Other shards of global_changes won’t see 
the same growth. The good news as you've noticed is that it should also compact 
right back down.

Cheers, Adam

> On Jul 26, 2016, at 10:00 AM, Peyton Vaughn  wrote:
> 
> Thank you sooo much Eric - I find examples, in absence of documentation, a
> tremendous help - that was exactly what I needed.
> 
> Turns out it's the "global_changes" database that's the culprit - but as
> was expected, compaction fixes the disparity in storage usage.
> Given that even global_changes is sharded, is it a concern at all that some
> shards end up significantly larger than others? The most egregious example
> from my 3-node cluster looks like:
> 29G /usr/src/couchdb/dev/lib/node1/data/shards/-1fff
> 8.0G /usr/src/couchdb/dev/lib/node1/data/shards/c000-dfff
> 510M /usr/src/couchdb/dev/lib/node1/data/shards/8000-9fff
> 508M /usr/src/couchdb/dev/lib/node1/data/shards/e000-
> 1.7G /usr/src/couchdb/dev/lib/node1/data/shards/4000-5fff
> 56K /usr/src/couchdb/dev/lib/node1/data/shards/6000-7fff
> 510M /usr/src/couchdb/dev/lib/node1/data/shards/2000-3fff
> 1.7G /usr/src/couchdb/dev/lib/node1/data/shards/a000-bfff
> 42G /usr/src/couchdb/dev/lib/node1/data/shards
> 
> Given that there is a global_changes DB in each shard, obviously not an
> even distribution...
> 
> But maybe this is known/welcome behavior... mainly including the above info
> in case it's of interest to the 2.0 beta testing efforts.
> 
> 
> If I could ask one more question: how do I trigger compaction on the
> sharded views? Using the same base URLs that worked for DB compaction, I
> tried appending '_compact/[design doc name]' which gets me
> {"error":"not_found","reason":"missing"}, and I also tried hitting the
> '/[DB]/_view_cleanup' endpoint, which gives me a longer
> '{"error":"badmatch","reason":"{database_does_not_exist,\n
> [{mem3_shards,load_shards_from_db' response.
> 
> Apologies if I'm overlooking something obvious.
> Thanks again for the help,
> peyton
> 
> 
> On Mon, Jul 25, 2016 at 11:29 AM, Eiri  wrote:
> 
>> 
>> Hey Peyton,
>> 
>> Here is the example. First, get a list of all the shards from admin port
>> (15986)
>> 
>> http :15986/_all_dbs
>> [
>>“_replicator”,
>>“_users”,
>>“dbs”,
>>“shards/-/koi.1469199178”
>> ]
>> 
>> You are interested in the databases with “shards” prefix and need to run
>> usual compaction on each of them. The only catch is that the name have to
>> be url encoded. So in my case:
>> 
>> $ http post :15986/shards%2F-%2Fkoi.1469199178/_compact
>> content-type:application/json
>> {
>>“ok”: true
>> }
>> 
>> Mind that content-type have to be specified. And of course it need to be
>> ran on all the nodes, admin interface not clustered, i.e. the API commands
>> will not be carried across cluster.
>> 
>> Regards,
>> Eric
>> 
>> 
>>> On Jul 25, 2016, at 12:04 PM, Peyton Vaughn  wrote:
>>> 
>>> Apologies - bad copy paste - I'm doing this against port 15986. (All
>> nodes
>>> in the cluster are 1598[46], since they are not in a single container).
>>> ~>curl -H "Content-Type: application/json" -X POST '
>>> http://localhost:15986/shards-1fff/_compact' --user
>>> admin:wacit
>>> {"error":"illegal_database_name","reason":"Name: 'shards&'. Only
>> lowercase
>>> characters (a-z), digits (0-9), and any of the characters _, $, (, ), +,
>> -,
>>> and / are allowed. Must begin with a letter."}
>>> ~>curl -H "Content-Type: application/json" -X POST '
>>> http://localhost:15986/shards/-1fff/_compact' --user
>> admin:wacit
>>> {"error":"not_found","reason":"no_db_file"}
>>> ~>curl -H "Content-Type: application/json" -X POST '
>>> http://localhost:15986/shards\/-1fff/_compact' --user
>>> admin:wacit
>>> {"error":"illegal_database_name","reason":"Name: 'shards\\'. Only
>> lowercase
>>> characters (a-z), digits (0-9), and any of the characters _, $, (, ), +,
>> -,
>>> and / are allowed. Must begin with a letter."}
>>> ~>curl -H "Content-Type: application/json" -X POST '
>>> http://localhost:15986/staging_inventory/_compact' --user admin:wacit
>>> {"error":"not_found","reason":"no_db_file"}
>>> 
>>> Is it possible to get an example?
>>> 
>>> On Mon, Jul 25, 2016 at 10:58 AM, Jan Lehnardt  wrote:
>>> 
 
> On 25 Jul 2016, at 16:35, Peyton Vaughn  wrote:
> 
> I'm afraid I must echo Teo's question: how do I run compaction at the
 shard
> level?
> 
> Fauxton lists all of my shards as:
> 
> shards/-1fff/_global_changes.1469456629 This 

Re: CouchDB 2.0 and zones

2016-04-28 Thread Adam Kocoloski
Yes, it does. The API has changed a bit, though. You need to label each of the 
nodes in the /_nodes database with a “zone” attribute and then define a config 
setting like

[cluster]
placement = metro-dc-a:2,metro-dc-b:1

which will ensure that two replicas for a shard will be hosted on nodes with 
the zone attribute set to “metro-dc-a” and one replica will be hosted on a node 
with the zone attribute set to “metro-dc-b”.

Note that you can also use this system to ensure certain nodes in the cluster 
do not host _any_ replicas for newly created databases, by giving them a zone 
attribute that does not appear in the [cluster] placement string. This is 
definitely an area where we need to beef up the documentation. Cheers,

Adam
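
A sketch of the node-labelling half, assuming the node-local admin port and a
made-up node name; each document in the _nodes database gets a "zone" field
matching the placement string:

# fetch the node document to obtain its current _rev
curl http://localhost:5986/_nodes/couchdb@node1.example.com

# write it back with a zone, using the _rev returned above in place of 1-...
curl -X PUT http://localhost:5986/_nodes/couchdb@node1.example.com \
     -d '{"_id": "couchdb@node1.example.com", "_rev": "1-...", "zone": "metro-dc-a"}'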

> On Apr 28, 2016, at 10:04 AM, Jason Gordon  
> wrote:
> 
> Hi,
> 
> Does CouchDB 2.0 support BigCouch-style zones?  If not, is it in the
> roadmap?  I would like to have a cluster where two nodes are in one data
> center and a third node (for hot failover) is in a remote data center.  I
> know replication could be used  but a cluster with zones seems more elegant.
> 
> Thanks,
> 
> Jason



Re: 2.0 Clustering Data Encryption

2016-04-27 Thread Adam Kocoloski
The data exchange between nodes uses the Erlang distribution protocol, not 
HTTP. It’s an unencrypted TCP channel by default but as Bob mentioned it is 
possible to configure the distribution to use TLS. We could do more to document 
the minimal steps required to enable this in CouchDB.

Adam

> On Apr 26, 2016, at 7:00 PM, david  wrote:
> 
> Not an expert but it seems that by default it uses http, which is not encrypted; 
> however you could configure it to use https and then it would be encrypted.
> 
> Again I am not an expert on this particular topic;
> 
> I am sure the docs will give guidance.
> 
> 
> 
> 
> 
> On 4/26/16 5:57 PM, Gabriel Mancini wrote:
>> i have this question too
>> 
>> On Tue, Apr 26, 2016 at 6:43 PM, Oleg Cohen 
>> wrote:
>> 
>>> Greetings,
>>> 
>>> I would like to understand if the data exchanged between cluster nodes is
>>> securely encrypted. Is there any documentation that explains how the data
>>> is passed around?
>>> 
>>> Thank you!
>>> Oleg
>> 
>> 
>> 
> 



Re: Leap year bug?

2016-02-29 Thread Adam Kocoloski
Definitely a bug. We set the favicon.ico to expire “one year from now”, and we 
do that by incrementing the year by one and keeping all other elements of the 
date the same :)

https://github.com/apache/couchdb/blob/1.6.1/src/couchdb/couch_httpd_misc_handlers.erl#L49
 


I suspect it’s still a bug in master as well. Do you mind filing a JIRA?

https://issues.apache.org/jira/browse/COUCHDB 


Thanks, Adam

> On Feb 29, 2016, at 11:47 AM, Matthew Buckett  
> wrote:
> 
> I was just running my couch instance today and whenever I request the
> favicon (/favicon.ico) I get this in my logs:
> 
> couchdb_1 | [error] [<0.20390.0>] Uncaught error in HTTP request:
> {error,if_clause}
> couchdb_1 | [info] [<0.20390.0>] Stacktrace:
> [{calendar,date_to_gregorian_days,3,
> couchdb_1 |
> [{file,"calendar.erl"},{line,116}]},
> couchdb_1 |   {calendar,day_of_the_week,3,
> couchdb_1 |
> [{file,"calendar.erl"},{line,151}]},
> couchdb_1 |   {couch_util,rfc1123_date,1,
> couchdb_1 |
> [{file,"couch_util.erl"},{line,462}]},
> couchdb_1 |   {couch_httpd_misc_handlers,
> couchdb_1 |   handle_favicon_req,2,
> couchdb_1 |
> [{file,"couch_httpd_misc_handlers.erl"},
> couchdb_1 |{line,53}]},
> couchdb_1 |   
> {couch_httpd,handle_request_int,5,
> couchdb_1 |
> [{file,"couch_httpd.erl"},{line,318}]},
> couchdb_1 |   {mochiweb_http,headers,5,
> couchdb_1 |
> [{file,"mochiweb_http.erl"},{line,94}]},
> couchdb_1 |   {proc_lib,init_p_do_apply,3,
> couchdb_1 |
> [{file,"proc_lib.erl"},{line,239}]}]
> couchdb_1 | [info] [<0.20390.0>] 192.168.99.1 - - GET /favicon.ico 500
> couchdb_1 | [error] [<0.20390.0>] httpd 500 error response:
> couchdb_1 |  {"error":"unknown_error","reason":"if_clause"}
> 
> The last part of the couch code is:
> 
> rfc1123_date(UniversalTime) ->
>{{,MM,DD},{Hour,Min,Sec}} = UniversalTime,
>DayNumber = calendar:day_of_the_week({,MM,DD}),
> 
> Anyone else seeing this today, this is with Erlang R16B03
> (erts-5.10.4) and CouchDB 1.6.1?
> 
> -- 
>  Matthew Buckett, VLE Developer, IT Services, University of Oxford



Re: Couchdb Attachments architecture

2015-09-22 Thread Adam Kocoloski
Yes, actually we do have something in the works related to COUCHDB-769 to 
offload attachments to an external object storage system. We preserve the 
CouchDB API so an end user can’t tell if the offloading is happening. I’m 
overdue to get the code posted, and it’s only a prototype so it’ll need some 
work, but glad to see there’s interest here. I’ll look to get this posted on 
Wednesday. Cheers,

Adam

> On Sep 17, 2015, at 8:44 PM, Michael Power  wrote:
> 
> Looks like there is interest in it, but nothing concrete, maybe something in 
> the works.  I’ll have to see how long we can continue running on attachments 
> through couchdb and plan a migration. 
> 
> Michael Power



Re: Query view with keys, is order guaranteed?

2015-08-24 Thread Adam Kocoloski
Yes, that’s the defined behavior, and it will persist in 2.x. You need to take 
this into account when you write your map function; e.g. if you `emit(["keyB", 
"keyA"], "value");` and then query for ["keyA","keyB"] you will not get a match. 
Regards,

Adam
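
For completeness, a sketch of the kind of request being discussed, with
placeholder database, design document and view names:

curl -X POST http://localhost:5984/mydb/_design/app/_view/by_key \
     -H "Content-Type: application/json" \
     -d '{"keys": ["keyA", "keyB"]}'

The rows come back grouped in the order the keys appear in the request body,
as described above.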

 On Aug 22, 2015, at 8:02 AM, Stefan Klein st.fankl...@gmail.com wrote:
 
 Hi CouchDB users,
 
 when I query a view with keys=[keyA, keyB] the returned rows also
 list the matches for keyA first, then matches for keyB. If I query
 with [keyB,keyA] the results reflect this and list matches for
 keyB first.
 This is for my local couchdb 1.6.1.
 Is this behaviour guaranteed for 1.6.1?
 Will it also be guaranteed for 2.x?
 Or does couchdb just happen to behave so on my installation?
 
 thanks,
 Stefan
 



Re: Starting couchd 2.0 under diverse network infrastructures

2015-08-18 Thread Adam Kocoloski
Good question. The node name is set by the vm.args file. It defaults to “-name 
couchdb” which will cause CouchDB to try to discover the system hostname, but 
you could set that value directly as “-name couchdb@FQDN” or even “-name 
couchdb@IPV4”. The key point is to set it to something that can be routed 
from other nodes if you ever plan to exercise the clustering capability.

Does that make sense? Cheers,

Adam

 On Aug 18, 2015, at 5:39 AM, Antoine Duchâteau aduch...@gmail.com wrote:
 
 Hi list,
 
 I've got the following problem with CouchDB 2.0
 
 If I create a database when the computer has one hostname and subsequently 
 start the database again when the hostname is different, the database exposed 
 on 5984 appears empty because there is a mismatch in the node name and fabric 
 does not find the shards anymore.
 
 What is the best way to handle this situation ?
 I tried using hostname to force the hostname before launching couchdb but it 
 only works if I first disconnect all network connections... Which is not 
 really acceptable.
 
 Is there a way to force the hostname CouchDB is going to use ?
 
 Thanks in advance,
 Antoine
 
 -- 
 *Antoine Duchâteau*
 Managing Director
 Email : a...@taktik.be mailto:a...@taktik.be
 GSM : +32 499 534 536
 
 ___
 *TAKTIK SA*
 Parc de l'Alliance
 Avenue de Finlande 8
 1420 Braine l'Alleud
 Tel : +32 2 333.58.40
 http://www.taktik.be/http://www.taktik.be
 



Re: CouchDB Filtered Replication - Group of Doc's Don't appear to be batched

2015-08-03 Thread Adam Kocoloski
Hi Bob,

The replicator will dynamically choose _bulk_docs batch sizes based on the 
number of documents that are ready to be transmitted to the target. It’s 
possible to set an upper bound on the size of the batch, but at this time it’s 
not possible to set a lower bound.

It sounds like what’s happening here is that the replicator is faster than the 
filter function, and that it’s constantly waiting for the next document to pass 
the filter. One sanity check you might try is to request the filtered _changes 
feed directly and see what kind of throughput you get. The filtered _changes 
feed sets the upper bound on the replication throughput you can achieve with 
that filter. If the _changes feed is fast but the replication is slow, the next 
thing you should try to do is minimize the replication-related resource 
consumption on the source — e.g., is the replication mediated by the server 
hosting the source database, and if so do you have an opportunity to mediate 
the replication on a different server? Cheers,

Adam
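
P.S. A quick way to gauge the filtered feed on its own (database and filter
names below are placeholders) is to time a direct request:

    time curl 'http://source-host:5984/source_db/_changes?filter=myddoc/myfilter&limit=5000' > /dev/null

If that request crawls, the bottleneck is the filter evaluation itself rather
than the replicator.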

 On Jul 24, 2015, at 10:50 PM, Bob Hathaway bob_hatha...@ymail.com wrote:
 
 Our couchdb replication runs fast without filters to a couch instance.  But 
 from that
 couch instance to another a filter is doing a simple check and sync'ing about 
 10x slower.
 Looking at couch debug logging, the bulk_docs Content-length is 10x smaller  
 with the filter.
 With filter the _bulk_docs rate per minute appears identical to the number of 
 docs sync'd.
 Without the filter, the _bulk_docs rate per minute is 10x less than docs 
 sync'd.
 
 It would appear the couch replication protocol groups 10 docs in a _bulk_docs 
 POST without a replication filter
 but  with the filter there is no grouping and bulk_docs appears to only 
 contain a single doc.
 
 Does couchdb filter replication not group docs and send 1 doc in each 
 _bulk_docs call to the target host?
 
 Is there some configuration which would allow the filter replication to group 
 docs to speed up replication?
 
 -- 
 Robert Hathaway
 President  Chief Software Architect
 SOA Object Systems, LLC
 office:  201-408-5828
 cell:201-390-7602
 email: rjh...@gmail.com
 



Re: High CPU after loading 15K documents

2015-07-07 Thread Adam Kocoloski
Hi Kiril, no it’s not normal. Did you mean `couchjs` when you said `couchdb`? 
If so it’s likely related to indexing. Do your views have custom reduce 
functions? A poorly-behaved custom reduce function (e.g. one that doesn’t 
really “reduce” its output) could yield very slow indexing. Does the dashboard 
indicate any indexing jobs ongoing?
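
For reference, the difference looks roughly like this (sketches only, not your
actual code):

    // well-behaved: the reduction stays small however many rows feed into it
    function (keys, values, rereduce) {
      return sum(values);
    }

    // poorly-behaved: the "reduction" grows with its input, and indexing crawls
    function (keys, values, rereduce) {
      return values;
    }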

Adam

 On Jul 7, 2015, at 1:42 PM, Kiril Stankov ki...@open-net.biz wrote:
 
 Hi,
 
 I've loaded 15K docs to a single DB, with 7 views, which do not emit docs 
 (just id's).
 Since then (4 hours now) the CPU is constantly high (~15-20%) with most of it 
 divided between couchdb and beam.smp.
 In addition couch became significantly slower than before (when there were ~ 
 3K docs).
 Is that normal? Can it be related to indexing? How long can it take?
 
 Thanks in advance.
 
 Kiril.
 



Re: Recover space imposed by 4K minimum document size?

2015-06-30 Thread Adam Kocoloski
Perhaps try triggering the compaction directly from the API with curl?

http://docs.couchdb.org/en/1.6.1/api/database/compact.html
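
Something along these lines (substitute your database name; admin credentials
may be required):

    curl -X POST -H 'Content-Type: application/json' \
         http://127.0.0.1:5984/dbname/_compact

You can then poll GET /dbname and watch for compact_running to flip back to
false.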

Adam

 On Jun 30, 2015, at 3:45 AM, Travis Downs travis.do...@gmail.com wrote:
 
 I ran compaction via the button in _utils. I did notice that when I
 clicked the button, the spinner in the UI never stops, but I did check
 that compact_running was false for the DB in question - so I assumed
 it finished. I suppose some issue with _utils could instead mean it
 never started? Is there some way to distinguish the two cases?
 
 On Mon, Jun 29, 2015 at 5:49 PM, Adam Kocoloski kocol...@apache.org wrote:
 Database compaction should absolutely recover that space. Can you share a 
 few more details? Are you sure the compaction completes successfully? Cheers,
 
 Adam
 
 On Jun 29, 2015, at 8:19 PM, Travis Downs travis.do...@gmail.com wrote:
 
 I have an issue where I'm posting single smallish (~500 bytes)
 documents to couchdb, yet the DB size is about 10x larger than
 expected (i.e., 10x larger than the aggregate size of the documents).
 
 Documents are not deleted or modified after posting.
 
 It seems like what is happening is that every individual (unbatched
 write) always takes 4K due to the nature of the append-only algorithm
 writing 2 x 2K blocks for each modification as documented here:
 
 http://guide.couchdb.org/draft/btree.html
 
 OK, that's fine. What I don't understand is why the compact
 operation doesn't recover this space?
 
 I do recover the space if I replicate this DB somewhere else. The full
 copy takes about 10x less space. I would expect replicate to be able
 to do the same thing in place. Is there some option I'm missing?
 
 Note that I cannot use bulk writes since the documents are posted one
 by one by different clients.
 
 
 



Re: Recover space imposed by 4K minimum document size?

2015-06-30 Thread Adam Kocoloski
Ah, this one I think I can explain. The compactor in CouchDB 1.x writes 
documents directly to the new file in batches. If the IDs of those documents 
are essentially random in nature, the compacted file can end up with a lot of 
wasted space. By contrast, if the document IDs in the _changes feed are roughly 
ordered, then the compactor will write a large block of IDs to the same node in 
the ID btree and then not touch that btree again, resulting in a more compact 
file. When you increased the buffer size you decreased the number of rewrites 
that any individual btree node goes through during compaction.

The new compactor in the upcoming 2.0 release eliminates this inefficiency; it 
generates an optimal file size regardless of ID selection. It does this by 
maintaining the updated ID tree in a separate .meta file during compaction and 
then streaming the btree from that .meta file in-order at the end of the 
compaction.

Travis, I guess it’s possible that you could be bumping into this as well, 
although 10x sounds extreme.

Adam

 On Jun 30, 2015, at 2:19 AM, Alexander Shorin kxe...@gmail.com wrote:
 
 If documents are too small, compaction cannot retrieve all the disk
 space back. See this thread with the similar question:
 http://qnalist.com/questions/5836043/couchdb-database-size
 
 Question why is still open for me, but at least solution there is.
 --
 ,,,^..^,,,
 
 
 On Tue, Jun 30, 2015 at 3:49 AM, Adam Kocoloski kocol...@apache.org wrote:
 Database compaction should absolutely recover that space. Can you share a 
 few more details? Are you sure the compaction completes successfully? Cheers,
 
 Adam
 
 On Jun 29, 2015, at 8:19 PM, Travis Downs travis.do...@gmail.com wrote:
 
 I have an issue where I'm posting single smallish (~500 bytes)
 documents to couchdb, yet the DB size is about 10x larger than
 expected (i.e., 10x larger than the aggregate size of the documents).
 
 Documents are not deleted or modified after posting.
 
 It seems like what is happening is that every individual (unbatched
 write) always takes 4K due to the nature of the append-only algorithm
 writing 2 x 2K blocks for each modification as documented here:
 
 http://guide.couchdb.org/draft/btree.html
 
 OK, that's fine. What I don't understand is why the compact
 operation doesn't recover this space?
 
 I do recover the space if I replicate this DB somewhere else. The full
 copy takes about 10x less space. I would expect replicate to be able
 to do the same thing in place. Is there some option I'm missing?
 
 Note that I cannot use bulk writes since the documents are posted one
 by one by different clients.
 
 
 



Re: Recover space imposed by 4K minimum document size?

2015-06-29 Thread Adam Kocoloski
Database compaction should absolutely recover that space. Can you share a few 
more details? Are you sure the compaction completes successfully? Cheers,

Adam

 On Jun 29, 2015, at 8:19 PM, Travis Downs travis.do...@gmail.com wrote:
 
 I have an issue where I'm posting single smallish (~500 bytes)
 documents to couchdb, yet the DB size is about 10x larger than
 expected (i.e., 10x larger than the aggregate size of the documents).
 
 Documents are not deleted or modified after posting.
 
 It seems like what is happening is that every individual (unbatched
 write) always takes 4K due to the nature of the append-only algorithm
 writing 2 x 2K blocks for each modification as documented here:
 
 http://guide.couchdb.org/draft/btree.html
 
 OK, that's fine. What I don't understand is why the compact
 operation doesn't recover this space?
 
 I do recover the space if I replicate this DB somewhere else. The full
 copy takes about 10x less space. I would expect replicate to be able
 to do the same thing in place. Is there some option I'm missing?
 
 Note that I cannot use bulk writes since the documents are posted one
 by one by different clients.
 



Re: CouchDB utc_id behavior

2015-06-18 Thread Adam Kocoloski
Awesome, thanks for replying back to the thread Kiril. Cheers,

Adam

 On Jun 18, 2015, at 1:14 PM, Kiril Stankov ki...@open-net.biz wrote:
 
 Solved.
 If Couch starts ~ 30 sec. after boot completion, the problem seems to go away.
 Thanks all.
 
 *With best regards,*
 Kiril Stankov
 
 On 18-Jun-15 2:15 PM, Kiril Stankov wrote:
 Yes, the one from 12:59 is Ok. The one from 12:51 is not.
 I will try now to delay the start of Couchdb with a minute after all other 
 daemons started and see if it will help.
 May be the hardware clock is off and it takes time until ntpd syncs it and 
 CouchDB starts in between.
 
 
 
 *With best regards,*
 Kiril Stankov,
 
 On 17-Jun-15 6:41 PM, Adam Kocoloski wrote:
 Do you happen to know if either one of these was correct-ish? Do you see 
 any timestamps in the access logs that are also “off”?
 
 05188de92ef02f - Mon, 15 Jun 2015 12:51:05 GMT
 05188e0805067f - Mon, 15 Jun 2015 12:59:42 GMT
 
 Adam
 
 On Jun 16, 2015, at 12:44 PM, Kiril Stankov ki...@open-net.biz wrote:
 
 Ubuntu, couch 1.6.1, apt-get update few days ago.
 -- 
 Regards,
 
 Kiril Stankov,
 OpenNet Software ltd.
 
 
  Original Message 
 From: Nick North nort...@gmail.com
 Sent: June 16, 2015 7:41:50 PM GMT+03:00
 To: user@couchdb.apache.org, Joan Touzet woh...@apache.org
 Subject: Re: CouchDB utc_id behavior
 
 Thanks for the correction Joan - I had forgotten about the possibility of
 the clock jumping backwards. However, in this case the obvious causes don't
 seem to apply, and I'm slightly at a loss. I can't hypothesise a mechanism
 that would cause now() to return these times if the OS clocks are in sync.
 Kiril - what setup are you running on?
 
 Nick
 
 On Tue, 16 Jun 2015 at 15:48 Kiril Stankov ki...@open-net.biz wrote:
 
 Hi,
 As I wrote, ntpd is running and both machines have synced time. They were
 not down for weeks.
 -- 
 Regards,
 
 Kiril Stankov,
 OpenNet Software ltd.
 
 
  Original Message 
 From: Joan Touzet woh...@apache.org
 Sent: June 16, 2015 5:38:28 PM GMT+03:00
 To: user@couchdb.apache.org
 Subject: Re: CouchDB utc_id behavior
 
 now() is not guaranteed to be monotonically increasing if the system
 clock rolls backwards, which various things can cause.
 
 You should look into setting up ntpd for your two machines at the very
 least.
 
 -Joan
 
 - Original Message -
 From: Nick North nort...@gmail.com
 To: user@couchdb.apache.org
 Sent: Monday, June 15, 2015 12:14:50 PM
 Subject: Re: CouchDB utc_id behavior
 
 The utc_id algorithm uses Erlang's now() function for generating
 timestamps. This is guaranteed to be monotonic increasing, but not
 necessarily to be in very close correspondence with the operating
 system
 clock all the time, especially if you call it very frequently.
 However, I'm
 surprised that calls seconds apart are giving this problem. Are your
 machines VMs? There can be clock problems when they are suspended and
 reactivated, with clocks initially having the time when the machine
 was
 suspended, and then jumping forward, but that's unlikely if they are
 in
 fairly constant use. For what it's worth, I use utc_id timestamps for
 sorting documents, and have not seen this problem, but that doesn't
 help
 you very much.
 
 Nick
 
 On Mon, 15 Jun 2015 at 16:42 Kiril Stankov ki...@open-net.biz
 wrote:
 
 Hi,
 
 nope, this is not the case.
 The newer document has older ID, this is the problem.
 
 05188de92ef02f < 05188e0805067f
 
 But
 05188de92ef02f
 was created after
 05188e0805067f
 
 
 
  
 *With best regards,*
 Kiril Stankov,
 
 On 15-Jun-15 6:08 PM, Alexander Shorin wrote:
 Time resolution is in microseconds, so difference in one second
 generated notable leap forward.
 -- 
 ,,,^..^,,,
 
 
 On Mon, Jun 15, 2015 at 5:10 PM, Kiril Stankov
 ki...@open-net.biz
 wrote:
 Hi all,
 
 I have two CouchDB (v1.6.1) servers, fully synchronized between
 them
 (master-to-master).
 The uuids algorithm is utc_id.
 The servers are synchronized via ntp and there is practically no
 time
 offset
 between them.
 I notice a strange behavior of the ID's of newly posted
 documents.
 In some cases, posting to server1, will generate ID, which is
 later
 than a
 subsequent post to server 2.
 E.g., posting to server 1 generates ID:
 05188e0805067f_1
 and then, few seconds later, posting to server 2 generates:
 05188de92ef02f_2
 As you see, the timestamp of the second message is earlier than
 the
 first
 (_1  _2 are suffixes for the two servers).
 This is causing me a big mess, as I use the timestamp to sort
 and order
 documents.
 Any idea why this happens?
 Can someone, please, shed more light on how CouchDB reads the
 time
 for the
 generation of the ID?
 Or if you have an idea what may be causing this behavior.
 
 Thanks in advance

Re: DB size question

2015-06-18 Thread Adam Kocoloski
Yep, it’s normal. The wasted space is due to the purely copy-on-write nature of 
the btree indexes that the database maintains. Two main things you can do to 
reduce the overhead:

* use the _bulk_docs endpoint
* choose a long common prefix for the _ids of the documents in a given payload
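
For example, a single request combining both suggestions (document contents
here are made up):

    curl -X POST http://127.0.0.1:5984/dbname/_bulk_docs \
         -H 'Content-Type: application/json' \
         -d '{"docs": [{"_id": "import-0001-a", "v": 1},
                       {"_id": "import-0001-b", "v": 2}]}'

The shared "import-0001" prefix keeps neighbouring updates in the same region
of the ID btree.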

Yes, periodic compaction and cleanup is a good practice. Compaction only 
requires 1-2 extra file descriptors. It will use up to `doc_buffer_size` bytes 
to store docs in memory (default 512k), and will fsync after it fills the 
buffer `checkpoint_after` times (default 10). A larger buffer should result in 
a slightly faster compaction and a slightly more compact file. You probably 
don’t want to bother changing the checkpoint frequency. Cheers,

Adam

 On Jun 18, 2015, at 2:11 PM, Kiril Stankov ki...@open-net.biz wrote:
 
 Hi,
 
 I'm importing now a big number of documents in CouchDB.
 The documents have only single revision. And they will stay with single rev 
 in one of the DB's
 I notice that the Db size grows significantly, then, after compact drops by 
 70%.
 
 This process - import of single version documents will occur once a week.
 
 Why is so much space wasted? Is it something normal?
 
 Is it a good practice to run periodically compact and cleanup?
 
 Is there some DB size limit, after which the compact and cleanup may cause 
 issue or have problems to run? E.g. file descriptors, memory. How should I 
 configure checkpoint_after, doc_buffer_size?
 
 Thanks in advance.
 



Re: CouchDB utc_id behavior

2015-06-17 Thread Adam Kocoloski
Do you happen to know if either one of these was correct-ish? Do you see any 
timestamps in the access logs that are also “off”?

05188de92ef02f - Mon, 15 Jun 2015 12:51:05 GMT
05188e0805067f - Mon, 15 Jun 2015 12:59:42 GMT

Adam

 On Jun 16, 2015, at 12:44 PM, Kiril Stankov ki...@open-net.biz wrote:
 
 Ubuntu, couch 1.6.1, apt-get update few days ago. 
 -- 
 Regards,
 
 Kiril Stankov,
 OpenNet Software ltd.
 
 
  Original Message 
 From: Nick North nort...@gmail.com
 Sent: June 16, 2015 7:41:50 PM GMT+03:00
 To: user@couchdb.apache.org, Joan Touzet woh...@apache.org
 Subject: Re: CouchDB utc_id behavior
 
 Thanks for the correction Joan - I had forgotten about the possibility of
 the clock jumping backwards. However, in this case the obvious causes don't
 seem to apply, and I'm slightly at a loss. I can't hypothesise a mechanism
 that would cause now() to return these times if the OS clocks are in sync.
 Kiril - what setup are you running on?
 
 Nick
 
 On Tue, 16 Jun 2015 at 15:48 Kiril Stankov ki...@open-net.biz wrote:
 
 Hi,
 As I wrote, ntpd is running and both machines have synced time. They were
 not down for weeks.
 --
 Regards,
 
 Kiril Stankov,
 OpenNet Software ltd.
 
 
  Original Message 
 From: Joan Touzet woh...@apache.org
 Sent: June 16, 2015 5:38:28 PM GMT+03:00
 To: user@couchdb.apache.org
 Subject: Re: CouchDB utc_id behavior
 
 now() is not guaranteed to be monotonically increasing if the system
 clock rolls backwards, which various things can cause.
 
 You should look into setting up ntpd for your two machines at the very
 least.
 
 -Joan
 
 - Original Message -
 From: Nick North nort...@gmail.com
 To: user@couchdb.apache.org
 Sent: Monday, June 15, 2015 12:14:50 PM
 Subject: Re: CouchDB utc_id behavior
 
 The utc_id algorithm uses Erlang's now() function for generating
 timestamps. This is guaranteed to be monotonic increasing, but not
 necessarily to be in very close correspondence with the operating
 system
 clock all the time, especially if you call it very frequently.
 However, I'm
 surprised that calls seconds apart are giving this problem. Are your
 machines VMs? There can be clock problems when they are suspended and
 reactivated, with clocks initially having the time when the machine
 was
 suspended, and then jumping forward, but that's unlikely if they are
 in
 fairly constant use. For what it's worth, I use utc_id timestamps for
 sorting documents, and have not seen this problem, but that doesn't
 help
 you very much.
 
 Nick
 
 On Mon, 15 Jun 2015 at 16:42 Kiril Stankov ki...@open-net.biz
 wrote:
 
 Hi,
 
 nope, this is not the case.
 The newer document has older ID, this is the problem.
 
 05188de92ef02f < 05188e0805067f
 
 But
 05188de92ef02f
 was created after
 05188e0805067f
 
 
 
 
 *With best regards,*
 Kiril Stankov,
 
 On 15-Jun-15 6:08 PM, Alexander Shorin wrote:
 Time resolution is in microseconds, so difference in one second
 generated notable leap forward.
 --
 ,,,^..^,,,
 
 
 On Mon, Jun 15, 2015 at 5:10 PM, Kiril Stankov
 ki...@open-net.biz
 wrote:
 Hi all,
 
 I have two CouchDB (v1.6.1) servers, fully synchronized between
 them
 (master-to-master).
 The uuids algorithm is utc_id.
 The servers are synchronized via ntp and there is practically no
 time
 offset
 between them.
 I notice a strange behavior of the ID's of newly posted
 documents.
 In some cases, posting to server1, will generate ID, which is
 later
 than a
 subsequent post to server 2.
 E.g., posting to server 1 generates ID:
 05188e0805067f_1
 and then, few seconds later, posting to server 2 generates:
 05188de92ef02f_2
 As you see, the timestamp of the second message is earlier than
 the
 first
 (_1  _2 are suffixes for the two servers).
 This is causing me a big mess, as I use the timestamp to sort
 and order
 documents.
 Any idea why this happens?
 Can someone, please, shed more light on how CouchDB reads the
 time
 for the
 generation of the ID?
 Or if you have an idea what may be causing this behavior.
 
 Thanks in advance!
 
 
 
 *With best regards,*
 Kiril Stankov,
 
 
 
 
 



Re: a case for couchdb

2015-06-12 Thread Adam Kocoloski
Hi Francesco, welcome!

My perception is that the CouchDB development community is as strong as it’s 
been at any point in the past several years. The Weekly News 
(http://blog.couchdb.org/) can give you a feel for all of the things that are 
going on these days. I’d also emphasize:

* We’ve grown the committer base to 40+, many of whom are active on a daily 
basis
* Neighbourhoodie Software offers professional support: 
http://neighbourhood.ie/couchdb-support/
* IBM provides a compatible cloud service and on-premises enterprise software: 
https://cloudant.com/
* Smileupps is building out a hosting and marketplace business: 
https://www.smileupps.com/

As far as 2.0 is concerned, I know it’s been a long time coming. I can tell you 
the folks on my team are treating it as a high priority. It’s actively being 
worked and we discuss the status in the weekly CouchDB IRC meetings. Happy to 
get into more of the specifics with you here.

IBM is committed to CouchDB as a core technology for our suite of data services 
in the cloud, so you’ll see us continue to invest and work with the rest of the 
community across the board to help evolve the technology and grow the 
community. Cheers,

—
Adam Kocoloski
IBM Distinguished Engineer
CTO, Cloud Data Services

 On Jun 12, 2015, at 12:19 PM, Francesco Zamboni f.zamb...@mastertek.it 
 wrote:
 
 Hello to everybody.
 My company is evaluating the use of couchdb for a new project.
 The set of features that couchdb have on paper matches almost perfectly the
 application that we've to develop, but there're some worries about the low
 level of activity apparent in the project, its scarce adoption and the push
 toward couchbase we see in a lot of threads around the web (while I'm aware
 that couchbase does not offer the same features of couchdb, including some
 of those that are in our desiderata).
 As the project is going to be quite big for us, investing a lot of effort
 in a technology that have an uncertain future could be a disastrous choice.
 For this reasons I want to ask to some more experienced users some help in
 making my case for using couchdb, especially in knowing how alive the
 development community is, how's going the 2.0 implementation and so on.
 Thank you in advance,
 
 -- 
 Francesco Zamboni



Re: Ordered changes feed in v2.0

2015-03-31 Thread Adam Kocoloski
A few scenarios here:

* Any v2.x database with no replicas (N=1) and only one shard (Q=1) will 
recover the v1.x behavior exactly.
* Any database with replicas but only one shard will have roughly ordered 
events. If a replica dies and gets repopulated from its peers you may see some 
reordering.
* Each shard of a database contributes to the _changes feed as fast as it can 
without any regard to total ordering.

The v2.x _changes feed still guarantees at least once processing; i.e., you 
can still store the last processed change and continue from there at any moment.

If you need sharding _and_ complete ordering of events then you should probably 
store a timestamp with each document, build a view keyed on timestamp, and have 
your async process consume that.
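
For example, assuming each event document carries a numeric `timestamp` field,
a map function as simple as

    function (doc) {
      if (doc.timestamp) {
        emit(doc.timestamp, null);
      }
    }

lets the consumer page through the view with startkey set to the last
timestamp it processed.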

Adam

 On Mar 30, 2015, at 7:07 AM, Roald de Vries webthusi...@gmail.com wrote:
 
 Hi all,
 
 I’m considering how I can make my application forward compatible with v2.0, 
 and I see a potential problem with that (for my use case):
 
 I have a front-end that generates events, which I store in CouchDB. An 
 asynchronous back-end process listens to a feed of these events, and writes 
 an aggregate back to the db. I can’t use a view for this, because the 
 aggregate also depends on documents referenced from the event documents.
 
 For this to work, I need to be sure to process every event, and in the 
 correct order. I v1.X, I can simply store the last processed change, and 
 continue from there at any moment. Would this still work in v2.0? As I 
 understand now, the order of the changes in v2.0 is no longer ordered. Is 
 there a way to still get a consistently ordered feed from?
 
 Thanks in advance, Roald



Re: Spring Data Adapter for CouchDB: CouchRepository

2015-01-14 Thread Adam Kocoloski
Hi Rodrigo, thanks! I know Spring Data support is something that has come up in 
conversations with Cloudant prospects. Cheers,

Adam

 On Jan 14, 2015, at 5:10 AM, Rodrigo Witzel rwitze...@googlemail.com wrote:
 
 Hello Couch-er
 
 if you are a Java Developer and you want to use the Spring Data framework 
 together with CouchDB, then you should have a look on 
 https://github.com/rwitzel/CouchRepository
 
 CouchRepository is a thin adapter for the existing CouchDB Java drivers and 
 provides the Spring Data API.
 
 Regards
 Rod
 
 



Re: understanding couchdb errors

2014-12-30 Thread Adam Kocoloski
Hi Cory, line 440 in the 1.2.1 release of Apache CouchDB looks like this:

   {ok, <<RawBin:TotalBytes/binary>>} = file:pread(Fd, Pos, TotalBytes),

Your initial mail reported a 'badmatch' error on this line and indicated a 
response of the form {ok, Bin}. The match that's failing is therefore the 
TotalBytes check; i.e. CouchDB tried to read TotalBytes bytes from Fd starting 
at position Pos but failed to do so.

The TotalBytes value is not checksummed, so this is indeed a common way that 
disk failure manifests in CouchDB. When we go to read a term from disk we first 
read a few bytes from the file which represent the size of the term (i.e., 
TotalBytes). What's probably happened is that the TotalBytes value is junk, and 
CouchDB tried to read some crazy large value from disk. The Erlang I/O 
subsystem returned all bytes up to EOF, and the process crashes when the sizes 
don't match.

If you're inclined to do some surgery and you can figure out the value of Pos 
from the stacktrace you can try truncating the file just before that number of 
bytes; the resulting file is still a valid CouchDB view group, and whatever is 
missing will automatically be reindexed. Or you could blow away the view group 
file entirely and have it rebuild from scratch.
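
If you do go down that road, a rough sketch of the surgery (the path and
signature are placeholders, the Pos value is the one from the stacktrace below
and will differ for you; stop CouchDB and take a copy of the file first):

    cp /path/to/.dbname_design/SIG.view /somewhere/safe/
    truncate -s 8607618203 /path/to/.dbname_design/SIG.view

On the next open CouchDB walks back to the last intact header, and whatever was
lost past that point gets reindexed.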

Adam

 On Dec 23, 2014, at 4:50 PM, Cory Zue c...@dimagi.com wrote:
 
 Hey Alexander,
 
 Thanks for the note. Upgrading sounds like a good suggestion, although my
 first priority right now is just understanding if I have any reason to be
 concerned about data corruption/loss. Do these stack traces provide any
 more insight?
 
 1.
 
 
  [{couch_file,read_raw_iolist_int,3,[{file,couch_file.erl},{line,440}]},
   {couch_file,maybe_read_more_iolist,4,[{file,couch_file.erl},{line,430}]},
   {couch_file,handle_call,3,[{file,couch_file.erl},{line,336}]},
   {gen_server,handle_msg,5,[{file,gen_server.erl},{line,588}]},
   {proc_lib,init_p_do_apply,3,[{file,proc_lib.erl},{line,227}]}]}
 
 2.
 
 [{couch_file,read_raw_iolist_int,3,[{file,\couch_file.erl\},{line,440}]},\n
 {couch_file,maybe_read_more_iolist,4,[{file,\couch_file.erl\},{line,430}]},\n
 {couch_file,handle_call,3,[{file,\couch_file.erl\},{line,336}]},\n
 {gen_server,handle_msg,5,[{file,\gen_server.erl\},{line,588}]},\n
 {proc_lib,init_p_do_apply,3,[{file,\proc_lib.erl\},{line,
 227}]}]},reason:{gen_server,call,[0.17085.3,{pread_iolist,8607618203},
 infinity]}}
 
  also, any suggestions on {"error":"unknown_error","reason":"function_clause"}?
 
 thanks in advance!
 Cory
 On Tue, Dec 23, 2014 at 3:20 AM, Alexander Shorin kxe...@gmail.com wrote:
 
 Hi,
 
 Without complete stacktrace it's hard to say, but
 badmatch/function_clause commonly is about unhandled case in code.
 According your CouchDB version the very first advice is to upgrade it
 up to the latest stable release since alot of things had been fixed
 since 1.2.1 day. Security ones are too.
 --
 ,,,^..^,,,
 
 
 On Tue, Dec 23, 2014 at 6:37 AM, Cory Zue c...@dimagi.com wrote:
 Hi all,
 
 We recently had an accident and lost our database and had to restore
 from a
 daily backup. Since restoring, couchdb has seemed to work ok for the most
 part, but has been giving strange, hard-to-reproduce 

Re: How get results in reversed order by date and without id field (stackoverflow)

2014-05-05 Thread Adam Kocoloski
Indeed, it may be a bit quirky but if you do what it asks you should get what 
you're looking for:

 myview?startkey=[3566120224,{}]&endkey=[3566120224]&descending=true

That is, CouchDB always wants to start the traversal at the start key and 
finish at the end key. If you're supplying descending=true, that means the 
start key must sort _after_ the end key.

Adam

On May 5, 2014, at 3:19 PM, Nicolas Palacios npalacio...@gmail.com wrote:

 Hi Jens, thanks for your comment.
 
 But if I try with:
  myview?startkey=[3566120224]&endkey=[3566120224,{}]&descending=true
 
 I get this error:
 
  {"error":"query_parse_error","reason":"No rows can match your key
  range, reverse your start_key and end_key or set descending=false"}
 
 I'm lost. with it, also I want exclude the id field in order to
 get a shortest json result.
 
 
 
 2014-05-05 15:09 GMT-04:00 Jens Alfke j...@couchbase.com:
 
 
 On May 5, 2014, at 11:53 AM, Nicolas Palacios npalacio...@gmail.com
 wrote:
 
    myview?startkey=[3566120224]&endkey=[3566120224,{}]&reversed=true
 
 It’s “descending”, not “reversed”.
 Docs are here: http://docs.couchdb.org/en/latest/api/ddoc/views.html
 
 —Jens



Re: Help understanding crash log

2014-05-01 Thread Adam Kocoloski
On May 1, 2014, at 8:47 AM, Interactive Blueprints 
p.van.der.e...@interactiveblueprints.nl wrote:

 2014-05-01 13:14 GMT+02:00 Herman Chan herman...@gmail.com:
 Thanks Adam,
 
 It seems like it is happening again, with more info this time.  It looks 
 like I am hitting some sort of system limit, can anyone point out where to 
 look next?
 
 Just guessing here..
 What could be is that you hit the max open file limit of your system.
 With ulimit -a you can see the limits on your system.
 Usually the max open file limit is somewhere around 1024.
 I noticed that couchdb loves to have a lot of files open simultaneously.
 
 Iin the same shell you start couchdb, right before you start couchdb,
 you can do a ulimit -a 4096 (or another large value), this should
 give coudhb the ability to open more files.
 
 Hope this helps.
 
 Pieter van der Eems
 Interactive Blueprints

That's a good thought Pieter, though typically in that case you'll see an 
'emfile' error in the logs. This particular system_limit error (with {erlang, 
spawn_link, ...} following it) occurs when the Erlang VM has reached the 
maximum number of processes it's allowed to spawn. Judging from the *long* list 
of processes linked to couch_httpd in this stacktrace I'd say Herman's client 
is improperly leaving connections open. Herman, did you intend to have 1000s of 
open TCP connections on this server? Regards,

Adam

Re: Help understanding crash log

2014-05-01 Thread Adam Kocoloski
 May 2014 14:28:04 GMT] [info] [0.17536.33] Index shutdown by 
 monitor notice for db: group_98ff493c-63e8-4714-9940-ccea514d4b1d idx: 
 _design/filters
 [Thu, 01 May 2014 14:28:04 GMT] [info] [0.27768.43] Closing index for db: 
 group_56a0df90-c79e-4863-ae71-2bde3cb0d801 idx: _design/hub sig: 
 4f6edcabc4b7a6357b714e1391ed93ac
 
 On 2014-05-01, at 9:18 AM, Adam Kocoloski kocol...@apache.org wrote:
 
 On May 1, 2014, at 8:47 AM, Interactive Blueprints 
 p.van.der.e...@interactiveblueprints.nl wrote:
 
 2014-05-01 13:14 GMT+02:00 Herman Chan herman...@gmail.com:
 Thanks Adam,
 
 It seems like it is happening again, with more info this time.  It looks 
 like I am hitting some sort of system limit, can anyone point out where to 
 look next?
 
 Just guessing here..
 What could be is that you hit the max open file limit of your system.
 With ulimit -a you can see the limits on your system.
 Usually the max open file limit is somewhere around 1024.
 I noticed that couchdb loves to have a lot of files open simultaneously.
 
 Iin the same shell you start couchdb, right before you start couchdb,
 you can do a ulimit -a 4096 (or another large value), this should
 give coudhb the ability to open more files.
 
 Hope this helps.
 
 Pieter van der Eems
 Interactive Blueprints
 
 That's a good thought Pieter, though typically in that case you'll see an 
 'emfile' error in the logs. This particular system_limit error (with 
 {erlang, spawn_link, ...} following it) occurs when the Erlang VM has 
 reached the maximum number of processes it's allowed to spawn. Judging from 
 the *long* list of processes linked to couch_httpd in this stacktrace I'd 
 say Herman's client is improperly leaving connections open. Herman, did you 
 intend to have 1000s of open TCP connections on this server? Regards,
 
 Adam
 



Re: Help understanding crash log

2014-05-01 Thread Adam Kocoloski
Sure, here are a few rules of thumb:

* 1 process per inbound TCP connection
* 4 processes per open DB (up to [couchdb] max_dbs_open DBs will be kept open 
simultaneously)
* 3 processes per open view group (I might be off by one or two here)

More Erlang processes require more RAM, so don't go crazy.
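
As a rough worked example (the numbers are made up):

    2,000 open connections        ->   2,000 processes
    40,000 open DBs x 4           -> 160,000 processes
    10,000 open view groups x 3   ->  30,000 processes
    -------------------------------------------------
    budget ~200,000, so +P 512000 leaves comfortable headroom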

Adam

On May 1, 2014, at 12:24 PM, Herman Chan herman...@gmail.com wrote:

 Thanks Adam,
 
 We just tried that and it seems to hold up.  Just wondering if there is some 
 kind of formula on what to set ERL_FLAGS to?
 
 Herman
 On 2014-05-01, at 10:51 AM, Adam Kocoloski kocol...@apache.org wrote:
 
 Hi Herman, I think those are just the view groups shutting down after the 
 parent DB crashed because you ran out of processes.
 
 You can increase the maximum number of processes via the ERL_FLAGS 
 environment variable, e.g.
 
  $ ERL_FLAGS="+P 512000" erl
 Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] 
 [async-threads:0] [hipe] [kernel-poll:false]
 
 Eshell V5.8.2  (abort with ^G)
  1> erlang:system_info(process_limit).
 512000
 
 The default is 256k, assuming you've got enough RAM you can bump that up to 
 1M with impunity. Regards,
 
 Adam
 
 On May 1, 2014, at 10:43 AM, Herman Chan herman...@gmail.com wrote:
 
 We do have 1000+ connection to the db, which we are trying to dial down.  
 However, even with lower connection, we hit the crash again, this time I 
 was able to get a better log.  You are right that we are hitting some limit,
 
 before the crash, the log shows that couch is still trying to open up index 
 from a reboot that we did.  Once it crash, the log start print out with 
 Index shutdown by monitor.  Is there any limit parameter that we can 
 increase?
 
 [Thu, 01 May 2014 14:28:04 GMT] [error] [emulator] Too many processes
 [Thu, 01 May 2014 14:28:04 GMT] [error] [emulator] Error in process 
 0.3672.477 with exit value: 
 {system_limit,[{erlang,spawn_opt,[proc_lib,init_p,[0.3672.477,[],gen,init_it,[
 gen_server,0.3672.477,0.3672.477,couch_db,{42 
 bytes,/usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch,0.21556.480,[{user_ctx,{user_ctx,null,
 [6 bytes],undefined... 
 
 
 [Thu, 01 May 2014 14:28:04 GMT] [error] [0.21556.480] ** Generic server 
 0.21556.480 terminating 
 ** Last message in was {'EXIT',0.3672.477,
  {system_limit,
   [{erlang,spawn_opt,
 [proc_lib,init_p,
  [0.3672.477,[],gen,init_it,
   [gen_server,0.3672.477,0.3672.477,couch_db,

 {group_370c0635-e593-45ed-ac96-75e6b318cb35,
 
 /usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch,
 0.21556.480,
 [{user_ctx,
   {user_ctx,null,[_admin],undefined}}]},
[]]],
  [link]]},
{proc_lib,start_link,5},
{couch_db,start_link,3},
{couch_server,'-open_async/5-fun-0-',4}]}}
 ** When Server state == {file,
  {file_descriptor,prim_file,
  {#Port0.898531,307709}},
  1261681}
 ** Reason for termination == 
 ** {system_limit,
 [{erlang,spawn_opt,
  [proc_lib,init_p,
   [0.3672.477,[],gen,init_it,
[gen_server,0.3672.477,0.3672.477,couch_db,
 {group_370c0635-e593-45ed-ac96-75e6b318cb35,
  
 /usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch,
  0.21556.480,
  [{user_ctx,{user_ctx,null,[_admin],undefined}}]},
 []]],
   [link]]},
  {proc_lib,start_link,5},
  {couch_db,start_link,3},
  {couch_server,'-open_async/5-fun-0-',4}]}
 
 [Thu, 01 May 2014 14:28:04 GMT] [error] [0.21556.480] 
 {error_report,0.31.0,
   {0.21556.480,crash_report,
[[{initial_call,{couch_file,init,['Argument__1']}},
  {pid,0.21556.480},
  {registered_name,[]},
  {error_info,
   {exit,
{system_limit,
 [{erlang,spawn_opt,
   [proc_lib,init_p,
[0.3672.477,[],gen,init_it,
 [gen_server,0.3672.477,0.3672.477,
  couch_db,
  
 {group_370c0635-e593-45ed-ac96-75e6b318cb35,
   
 /usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch,
   0.21556.480,
   [{user_ctx,
 {user_ctx,null,
  [_admin

Re: Help understanding crash log

2014-05-01 Thread Adam Kocoloski
Great, glad we got that sorted out.

On the topic of max_dbs_open - it can be a funny optimization dance, because 
the data structures that manage the list of open databases will become less 
efficient as the number of open databases increases. In particular, it becomes 
expensive to find the least recently used (LRU) DB in order to close it.
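
For reference, the knob lives in the [couchdb] section of local.ini; the value
below is purely illustrative:

    [couchdb]
    ; size this to the number of DBs actually touched over a short window
    max_dbs_open = 5000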

In my experience, increasing max_dbs_open makes a lot of sense if e.g. you have 
1000 active databases. By the time you reach 100k databases it's often better 
to match the max_dbs_open to the number of DBs that are queried over some short 
timespan, so that it's cheap to find the right one to close. Best,

Adam

On May 1, 2014, at 2:47 PM, Herman Chan herman...@gmail.com wrote:

 Thanks Adam,
 
 It make sense now why we crashed, we've set a very high number on 
 max_dbs_open (something like 10) and with the formula you described, 
 it'll create something like 800,000 processes (we have around 800,000 db on 
 this box), which is higher than what we set on ERL_FLAGS.
 
 Thanks for your help!
 
 Herman
 
 On 2014-05-01, at 2:31 PM, Adam Kocoloski kocol...@apache.org wrote:
 
 Sure, here are a few rules of thumb:
 
 * 1 process per inbound TCP connection
 * 4 processes per open DB (up to [couchdb] max_dbs_open DBs will be kept 
 open simultaneously)
 * 3 processes per open view group (I might be off by one or two here)
 
 More Erlang processes require more RAM, so don't go crazy.
 
 Adam
 
 On May 1, 2014, at 12:24 PM, Herman Chan herman...@gmail.com wrote:
 
 Thanks Adam,
 
 We just tried that and it seems to hold up.  Just wondering if there is 
 some kind of formula on what to set ERL_FLAGS to?
 
 Herman
 On 2014-05-01, at 10:51 AM, Adam Kocoloski kocol...@apache.org wrote:
 
 Hi Herman, I think those are just the view groups shutting down after the 
 parent DB crashed because you ran out of processes.
 
 You can increase the maximum number of processes via the ERL_FLAGS 
 environment variable, e.g.
 
  $ ERL_FLAGS="+P 512000" erl
 Erlang R14B01 (erts-5.8.2) [source] [64-bit] [smp:4:4] [rq:4] 
 [async-threads:0] [hipe] [kernel-poll:false]
 
 Eshell V5.8.2  (abort with ^G)
  1> erlang:system_info(process_limit).
 512000
 
 The default is 256k, assuming you've got enough RAM you can bump that up 
 to 1M with impunity. Regards,
 
 Adam
 
 On May 1, 2014, at 10:43 AM, Herman Chan herman...@gmail.com wrote:
 
 We do have 1000+ connection to the db, which we are trying to dial down.  
 However, even with lower connection, we hit the crash again, this time I 
 was able to get a better log.  You are right that we are hitting some 
 limit,
 
 before the crash, the log shows that couch is still trying to open up 
 index from a reboot that we did.  Once it crash, the log start print out 
 with Index shutdown by monitor.  Is there any limit parameter that we 
 can increase?
 
 [Thu, 01 May 2014 14:28:04 GMT] [error] [emulator] Too many processes
 [Thu, 01 May 2014 14:28:04 GMT] [error] [emulator] Error in process 
 0.3672.477 with exit value: 
 {system_limit,[{erlang,spawn_opt,[proc_lib,init_p,[0.3672.477,[],gen,init_it,[
 gen_server,0.3672.477,0.3672.477,couch_db,{42 
 bytes,/usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch,0.21556.480,[{user_ctx,{user_ctx,null,
 [6 bytes],undefined... 
 
 
 [Thu, 01 May 2014 14:28:04 GMT] [error] [0.21556.480] ** Generic server 
 0.21556.480 terminating 
 ** Last message in was {'EXIT',0.3672.477,
{system_limit,
 [{erlang,spawn_opt,
   [proc_lib,init_p,
[0.3672.477,[],gen,init_it,
 [gen_server,0.3672.477,0.3672.477,couch_db,
  
 {group_370c0635-e593-45ed-ac96-75e6b318cb35,
   
 /usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch,
   0.21556.480,
   [{user_ctx,
 {user_ctx,null,[_admin],undefined}}]},
  []]],
[link]]},
  {proc_lib,start_link,5},
  {couch_db,start_link,3},
  {couch_server,'-open_async/5-fun-0-',4}]}}
 ** When Server state == {file,
{file_descriptor,prim_file,
{#Port0.898531,307709}},
1261681}
 ** Reason for termination == 
 ** {system_limit,
   [{erlang,spawn_opt,
[proc_lib,init_p,
 [0.3672.477,[],gen,init_it,
  [gen_server,0.3672.477,0.3672.477,couch_db,
   {group_370c0635-e593-45ed-ac96-75e6b318cb35,

 /usr/local/var/lib/couchdb/group_370c0635-e593-45ed-ac96-75e6b318cb35.couch,
0.21556.480,
[{user_ctx,{user_ctx,null,[_admin],undefined}}]},
   []]],
 [link]]},
{proc_lib,start_link,5},
{couch_db,start_link,3},
{couch_server,'-open_async/5-fun-0-',4}]}
 
 [Thu, 01 May 2014 14:28

Re: CouchDB load spike (even with low traffic)?

2014-04-30 Thread Adam Kocoloski
Thanks Mike! I filed https://issues.apache.org/jira/browse/COUCHDB-2231 and 
linked your gist in there as a possible solution.

Adam

On Apr 30, 2014, at 1:12 AM, Mike Marino mmar...@gmail.com wrote:

 Hi Marty,
 
 It's difficult for me to tell the reason that couchdb is not stopping using
 your init script, but we had a similar issue that I fixed by patching the
 couchdb startup script (executable).  The issue was that the 'shepherd'
 program was respawning couch after a requested shutdown.
 
 This was discussed some time a while ago on the list and I sent our fix
 out, but I don't think it was ever integrated.  Anyways, here's the gist
 (for 1.3, though I think the file has remained the same in the newer
 versions):
 
 https://gist.github.com/7601778
 
 Cheers,
 Mike
 
 Am 30.04.2014 um 06:52 schrieb Marty Hu marty...@gmail.com:
 
 Okay, after doing a bit more work this is what I found out:
 
 1. When I start couchdb on a fresh server, it appears to run correctly.
 
 2. However, the conventional sudo service couchdb stop does not actually
 stop couchdb correctly. I know this because I can kill the couchdb
 processes with ps -U couchdb -o pid= | xargs kill -9
 
 3. We use chef for configuration, so at a set interval it will queue up a
 sudo service couchdb restart, which will try to stop the process (the
 process won't stop) and then start a new process (this process will
 actually try to start). However, the second process will not be able to
 bind to the port (the first process never got killed and still holds it) so
 will throw the error.
 
 I imagine that this is a configuration issue (and so not really a fault of
 your guys) but welcoming any tips about how to deal with this short of
 changing the init script to be a messy killer.
 
 
 On Tue, Apr 29, 2014 at 6:54 PM, Adam Kocoloski kocol...@apache.org wrote:
 
 Hi Marty, the mailing list stripped out the attachments except for
 
 spike.txt.
 
 
 I don't know if they're the cause of the load spikes that you see, but the
 
 eaddrinuse errors are not normal. They can be caused by another process
 
 listening on the same port as CouchDB. Fairly peculiar stuff.
 
 
 The timeout trying to open the splits-v0.1.7 at 21:23 does line up with
 
 your report that the system was heavily loaded at the time, but there's
 
 really not too much to go on here.
 
 
 Regards, Adam
 
 
 On Apr 29, 2014, at 7:46 PM, Marty Hu marty...@gmail.com wrote:
 
 
 Thanks for the follow-up.
 
 
 I've attached nagios graphs (load, disk, and ping) of one such event,
 
 which occurred at 2:24pm (after the drop in disk) according to my nagios
 
 emails. I've also attached database logs (with some client-specific queries
 
 removed). The error was fixed around 2:30pm. Notably, the log files are in
 
 GMT.
 
 
 Unfortunately I don't have any graphs for the event other than what's on
 
 nagios.
 
 
 Are the connection errors with CouchDB normal? We get them continuously
 
 (around every minute) even during normal operation with the DB not crashing.
 
 
 
 On Tue, Apr 29, 2014 at 2:34 AM, Alexander Shorin kxe...@gmail.com
 
 wrote:
 
 Hi Marty,
 
 
 thanks for following up! I see your problem, but what would we need:
 
 
 1. CouchDB stats graphs and your system disk, network and memory ones.
 
 If you cannot share them in public, feel free to send me in private.
 
 We need to know they are related. For instance, high memory usage may
 
 be caused by uploading high amount of big files: you'll easily notice
 
 that comparing CouchDB, network and memory graphs for the spike
 
 period.
 
 
 2. CouchDB log entries for spike event. Graphs can only show you
 
 that's something going wrong and we could only guess (almost we guess
 
 right, but without much precise) what's exactly going wrong. Logs will
 
 help to us to find out actual requests that causes memory spike.
 
 
 After that we can start to think about the problem. For instance, if
 
 spikes are happens due to large attachments uploads, there is no much
 
 to do. On other hand, query server may easily eat quite big chunk of
 
 memory. We'll easily notice that by monitoring /_active_tasks resource
 
 (if problem is in views) or by looking through logs for the spike
 
 period. And this case can be fixed.
 
 
 Not sure which tools you're using for monitoring and graphs drawing,
 
 but take a look on next projects:
 
 - https://github.com/gws/munin-plugin-couchdb - Munin plugin for
 
 CouchDB monitoring. Suddenly, it doesn't handles system metrics for
 
 CouchDB process - I'll only add this during this week, but make sure
 
 you have similar plugin for your monitoring system.
 
 - https://github.com/etsy/skyline - anomalies detector. spikes are so
 
 - https://github.com/etsy/oculus - metrics correlation tool. it would
 
 be very-very easily to compare multiple graphs for anomaly period with
 
 it.
 
 
 --
 
 ,,,^..^,,,
 
 
 
 On Tue, Apr 29, 2014 at 8:15 AM, Marty Hu marty...@gmail.com wrote:
 
 We're been running CouchDB v1.5.0 on AWS

Re: BigCouch merge - conflict management

2014-04-30 Thread Adam Kocoloski
Hi Klaus, N is the number of replicas of each document, not the number of nodes 
in the cluster.  You could have a 30 node cluster with N=1 and spread a 
database across all those nodes (Q = 30). With N=1 you'll still have the 
consistency properties that you desire. Regards,

Adam
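
P.S. In BigCouch (and in the merged 2.x API) both values can be set per
database at creation time, e.g. (the hostname is a placeholder):

    curl -X PUT 'http://node1:5984/dbname?n=1&q=30'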

On Apr 30, 2014, at 3:46 AM, Schroiff, Klaus kla...@fast.au.fujitsu.com wrote:

 Hi Joan,
 
 My thoughts circled more around N=x (-say- 5) ,W=1,R=1.
 Or to phrase it differently - we'd like to use horizontal scaling whereas HA 
 is not achieve in the same cluster but via (external) replication.
 
 So in this case the behaviour is still the same as with N=W=R=1 ?
 
 Thanks
 
 Klaus



Re: Compaction halted right away

2014-04-30 Thread Adam Kocoloski
Hi Boaz, ugh. It looks like you're using deflate-1 for the file compression, 
right? Were you ever using snappy on this database? You say the compaction 
stops right away; is there a .compact file? If there is, can you remove it and 
try again?

Adam

On Apr 30, 2014, at 1:30 AM, Boaz Citrin bcit...@gmail.com wrote:

 Hello,
 
 My database is functioning well, except compaction won't work.
 
 It stops right away. Attached log with errors.
 
 Version is 1.2.1 on Windows.
 
 Any idea?
 
 Thanks,
 
 Boaz
 couch.zip



Re: Help understanding crash log

2014-04-30 Thread Adam Kocoloski
The process that manages the .ini configuration information inside the server 
was taking longer than 5 seconds to respond to requests. This caused other 
processes to crash. Eventually the death rate for processes in the supervision 
tree grew large enough that the VM decided to nuke the entire Erlang 
application.

Of course, this raises the question -- why was couch_config timing out on 
register requests? There's nothing in that log which can help us figure that 
out.

Adam

On Apr 30, 2014, at 11:47 PM, Herman Chan herman...@gmail.com wrote:

 hi all,
 
 Our couchdb server (1.2) just died and had to be restarted, can someone 
 explain to me the crash log below?
 
 Thu, 01 May 2014 03:30:02 GMT] [error] [0.4723.2131] {error_report,0.31.0,
 {0.4723.2131,crash_report,
  [[{initial_call,
 {couch_index_server,init,['Argument__1']}},
{pid,0.4723.2131},
{registered_name,[]},
{error_info,
 {exit,
  {timeout,
   {gen_server,call,
[couch_config,
 {register,
  #Funcouch_index_server.config_change.2,
  0.4723.2131}]}},
  [{gen_server,init_it,6},
   {proc_lib,init_p_do_apply,3}]}},
{ancestors,
 [couch_secondary_services,couch_server_sup,
  0.32.0]},
{messages,[]},
{links,[0.27000.2602]},
{dictionary,[]},
{trap_exit,true},
{status,running},
{heap_size,377},
{stack_size,24},
{reductions,144}],
   []]}}
 [Thu, 01 May 2014 03:30:02 GMT] [error] [0.27000.2602] 
 {error_report,0.31.0,
  {0.27000.2602,supervisor_report,
   [{supervisor,{local,couch_secondary_services}},
{errorContext,start_error},
{reason,
 {timeout,
  {gen_server,call,
   [couch_config,
{register,
 #Funcouch_index_server.config_change.2,
 0.4723.2131}]}}},
{offender,
 [{pid,0.27013.2602},
  {name,index_server},
  {mfargs,{couch_index_server,start_link,[]}},
  {restart_type,permanent},
  {shutdown,brutal_kill},
  {child_type,worker}]}]}}
 [Thu, 01 May 2014 03:30:02 GMT] [error] [0.27000.2602] 
 {error_report,0.31.0,
 {0.27000.2602,supervisor_report,
   [{supervisor,{local,couch_secondary_services}},
{errorContext,start_error},
{reason,
 {timeout,
  {gen_server,call,
   [couch_config,
{register,
 #Funcouch_index_server.config_change.2,
 0.4723.2131}]}}},
{offender,
 [{pid,0.27013.2602},
  {name,index_server},
  {mfargs,{couch_index_server,start_link,[]}},
  {restart_type,permanent},
  {shutdown,brutal_kill},
  {child_type,worker}]}]}}
 [Thu, 01 May 2014 03:30:02 GMT] [error] [0.27000.2602] 
 {error_report,0.31.0,
  {0.27000.2602,supervisor_report,
   [{supervisor,{local,couch_secondary_services}},
{errorContext,shutdown},
{reason,reached_max_restart_intensity},
{offender,
 [{pid,0.27013.2602},
  {name,index_server},
  {mfargs,{couch_index_server,start_link,[]}},
  {restart_type,permanent},
  {shutdown,brutal_kill},
  {child_type,worker}]}]}}
 [Thu, 01 May 2014 03:30:02 GMT] [error] [0.83.0] {error_report,0.31.0,
   {0.83.0,supervisor_report,
[{supervisor,{local,couch_server_sup}},
 {errorContext,child_terminated},
 

Re: CouchDB load spike (even with low traffic)?

2014-04-29 Thread Adam Kocoloski
Hi Marty, the mailing list stripped out the attachments except for spike.txt.

I don't know if they're the cause of the load spikes that you see, but the 
eaddrinuse errors are not normal. They can be caused by another process 
listening on the same port as CouchDB. Fairly peculiar stuff.
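
When that error shows up it's worth checking what is actually bound to the port
(assuming the default 5984); on Linux, for example:

    sudo lsof -i :5984
    # or
    sudo netstat -tlnp | grep 5984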

The timeout trying to open the splits-v0.1.7 at 21:23 does line up with your 
report that the system was heavily loaded at the time, but there's really not 
too much to go on here.

Regards, Adam

On Apr 29, 2014, at 7:46 PM, Marty Hu marty...@gmail.com wrote:

 Thanks for the follow-up.
 
 I've attached nagios graphs (load, disk, and ping) of one such event, which 
 occurred at 2:24pm (after the drop in disk) according to my nagios emails. 
 I've also attached database logs (with some client-specific queries removed). 
 The error was fixed around 2:30pm. Notably, the log files are in GMT.
 
 Unfortunately I don't have any graphs for the event other than what's on 
 nagios. 
 
 Are the connection errors with CouchDB normal? We get them continuously 
 (around every minute) even during normal operation with the DB not crashing.
 
 
 On Tue, Apr 29, 2014 at 2:34 AM, Alexander Shorin kxe...@gmail.com wrote:
 Hi Marty,
 
 thanks for following up! I see your problem, but what would we need:
 
 1. CouchDB stats graphs and your system disk, network and memory ones.
 If you cannot share them in public, feel free to send me in private.
 We need to know they are related. For instance, high memory usage may
 be caused by uploading high amount of big files: you'll easily notice
 that comparing CouchDB, network and memory graphs for the spike
 period.
 
 2. CouchDB log entries for spike event. Graphs can only show you
 that's something going wrong and we could only guess (almost we guess
 right, but without much precise) what's exactly going wrong. Logs will
 help to us to find out actual requests that causes memory spike.
 
 After that we can start to think about the problem. For instance, if
 spikes are happens due to large attachments uploads, there is no much
 to do. On other hand, query server may easily eat quite big chunk of
 memory. We'll easily notice that by monitoring /_active_tasks resource
 (if problem is in views) or by looking through logs for the spike
 period. And this case can be fixed.
 
 Not sure which tools you're using for monitoring and graphs drawing,
 but take a look on next projects:
 - https://github.com/gws/munin-plugin-couchdb - Munin plugin for
 CouchDB monitoring. Suddenly, it doesn't handles system metrics for
 CouchDB process - I'll only add this during this week, but make sure
 you have similar plugin for your monitoring system.
 - https://github.com/etsy/skyline - anomalies detector. spikes are so
 - https://github.com/etsy/oculus - metrics correlation tool. it would
 be very-very easily to compare multiple graphs for anomaly period with
 it.
 
 --
 ,,,^..^,,,
 
 
 On Tue, Apr 29, 2014 at 8:15 AM, Marty Hu marty...@gmail.com wrote:
  We're been running CouchDB v1.5.0 on AWS and its been working fine.
  Recently AWS came out with new prices for their new m3 instances so we
  switched our CouchDB instance to use an m3.large. We have a relatively
  small database with  10GB of data in it.
 
  Our steady state metrics for it are system loads of 0.2 and memory usages
  of 5% or so. However, we noticed that every few hours (3-4 times per day)
  we get a huge spike that floors our load to 1.5 or so and memory usage to
  close to 100%.
 
  We don't run any cronjobs that involve the database and our traffic flow
  about the same over the day. We do run a continuous replication from one
  database on the west coast to another on the east coast.
 
  This has been stumping me for a bit - any ideas?
 
 
 spike.txt



Re: Compaction of a database with the same number of documents is getting slower over time

2014-04-22 Thread Adam Kocoloski
To first order it shouldn't be the case.

Is the number of deleted documents growing with time? That'd have an impact.

Adam

 On Apr 22, 2014, at 8:06 AM, Boaz Citrin bcit...@gmail.com wrote:
 
 Hello,
 
 Our database contains more or less the same number of documents, however
 the documents themselves change frequently.
 I would expect that compaction time will be the same, but I see that over
 time it takes longer to compact the database.
 
 Why is it so? Any way to overcome this?
 
 Thanks,
 
 Boaz


Re: Compaction of a database with the same number of documents is getting slower over time

2014-04-22 Thread Adam Kocoloski
Sorry, just so we're super clear -- the doc_count is roughly constant but the 
doc_del_count is rising? Compaction time scales with the sum of those two 
values. Your options to get it back down under control at the moment are

1) Purge the deleted docs (lots of caveats about replication, potential for 
view index resets, etc.)
2) Rotate over to a new database, either by querying both or by replicating 
non-deleted docs to the new one

Neither one is particularly palatable. CouchDB currently keeps the tombstones 
around forever so that replication can always work. Making changes on that 
front is a pretty subtle thing but maybe not completely impossible.

Also, there's a new compactor in the works that is faster and generates smaller 
files.

Adam

On Apr 22, 2014, at 9:46 AM, Boaz Citrin bcit...@gmail.com wrote:

 Yes, the documents change quickly, Many deletions and insertions, but total
 number don't change that much.
 
 
 On Tue, Apr 22, 2014 at 4:41 PM, Adam Kocoloski 
 adam.kocolo...@gmail.comwrote:
 
 To first order it shouldn't be the case.
 
 Is the number of deleted documents growing with time? That'd have an
 impact.
 
 Adam
 
 On Apr 22, 2014, at 8:06 AM, Boaz Citrin bcit...@gmail.com wrote:
 
 Hello,
 
 Our database contains more or less the same number of documents, however
 the documents themselves change frequently.
 I would expect that compaction time will be the same, but I see that over
 time it takes longer to compact the database.
 
 Why is it so? Any way to overcome this?
 
 Thanks,
 
 Boaz
 



Re: Issues with terabytes databases

2014-04-22 Thread Adam Kocoloski
Hi Jean-Yves, welcome! Always nice to hear reports from production deployments. 
Replies inline:

On Apr 22, 2014, at 10:09 AM, Jean-Yves Moulin jean-yves.mou...@eileo.com 
wrote:

 Hi everybody,
 
 we use CouchDB in production for more than two years now. And we are almost 
 happy with it :-) We have a heavy writing workload, with very few update, and 
 we never delete data. Some of our databases are terabytes with billions of 
 documents (sometimes 20 millions of doc per day). But we are experiencing 
 some issues, and the only solution was to split our data: today we create a 
 new database each week, with even and odd on two different servers (thus we 
 have on-line and off-line servers). This is not perfect, and we look forward 
 BigCouch :-)

You and me both :)

 Below is some of our current problems with these big databases. For the 
 record, we use couchdb-1.2 and couchdb-1.4 on twelve servers running FreeBSD 
 (because we like ZFS).
 
 I don't know if these issues are known or not (or specific to us).
 
 * Overall speed: we are far from our real server performance: it seems that 
 CouchDB is not able to use the full potential of the system. Even with 24 
 disks in RAID10, we can't go faster that 2000 doc/sec (with an average 
 document size of 1k, that's only a few MB/s on disk) on replication or 
 compaction. CPU and disk are almost idle. Tweaking the number of Erlang I/O 
 thread doesn't help.

For replication, there are various parameters that you can supply to allocate 
more resources to a given job. For your example I might try something like

{
  source: ...,
  target: ...,
  worker_processes: 0.8 * number of cores on server mediating replication,
  worker_batch_size: 500,
  http_connections: 300
}

though even with those settings I don't usually see the replicator exceed 4000 
docs/sec or so. At that point it's generally used up a good chunk of the CPU on 
a standard dual socket server. Note that mediating the replication on a 
less-loaded server can sometimes help significantly.
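
For reference, those options go straight into the body of the _replicate call.
A sketch, with placeholder URLs and worker_processes sized for a ~24-core box:

curl -X POST http://localhost:5984/_replicate \
  -H "Content-Type: application/json" \
  -d '{"source": "http://source-host:5984/big_db",
       "target": "http://target-host:5984/big_db",
       "worker_processes": 20,
       "worker_batch_size": 500,
       "http_connections": 300}'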

 * Insert time: At 1000 PUT/sec the insert time is good, even without bulk. 
 But it collapses when launching view calculation, replication or compaction. 
 So, we use stale view in our applications and views are processed regularly 
 by a crontab scripts. We avoid compaction on live servers. Compaction are 
 launched manually on off-line servers only. We also avoid replication on 
 heavy loaded servers.

We've done some work internally at Cloudant on ensuring that e.g. compaction 
only runs in the background and does not impact the throughput of interactive 
operations. We need to do some work to get that into a place where it can be 
contributed back. I don't have a better answer for this one at the moment.

 * Compaction: When size of database increase, compaction time can be really 
 really long. It will be great if compaction process can run faster on already 
 compressed doc. This is our biggest drawback, which implies the database 
 split each week. And the speed decreases slowly: compaction starts fast 
 (2000 doc/sec) but slow down to ~100 doc/sec after hundred of millions of 
 documents.

There's a new compactor in the works that's significantly faster and also 
generates smaller post-compaction files. It also eliminates this exponential 
falloff in throughput that you observed.

 Is there other people using CouchDB this kind of database ? How do you handle 
 a write-heavy workload ?

Billion document databases with 20 million updates per day are certainly within 
scope for BigCouch. Cheers,

Adam

 
 Sorry for my english and thank you for the reading.
 
 Best,
 



Re: Compaction of a database with the same number of documents is getting slower over time

2014-04-22 Thread Adam Kocoloski
You can either run a filtered replication or block them with a 
validate_doc_update function on the target database. Something like the 
following would allow you to delete docs on the target but block the 
replication of tombstones:

function (newDoc, oldDoc, userCtx) {
  // any update to an existing doc is OK
  if (oldDoc) {
    return;
  }

  // reject tombstones for docs we don't know about
  if (newDoc._deleted) {
    throw({forbidden: "We're rejecting tombstones for unknown docs"});
  }
}
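
For the filtered replication route mentioned above, a rough sketch (the design
doc name, filter name, and database names are placeholders):

# 1. Store a filter on the source that drops tombstones
curl -X PUT http://localhost:5984/source_db/_design/repl \
  -H "Content-Type: application/json" \
  -d '{"filters": {"no_deleted": "function(doc, req) { return !doc._deleted; }"}}'

# 2. Replicate through it
curl -X POST http://localhost:5984/_replicate \
  -H "Content-Type: application/json" \
  -d '{"source": "source_db", "target": "target_db", "filter": "repl/no_deleted"}'

Note that filtered replications run slower than unfiltered ones, since every
changed document has to pass through the filter.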

Regards, Adam

On Apr 22, 2014, at 10:22 AM, Boaz Citrin bcit...@gmail.com wrote:

 You got it right.
 
 How can I only replicate non deleted docs?
 On 22 Apr 2014 at 17:15, Adam Kocoloski kocol...@apache.org wrote:
 
 Sorry, just so we're super clear -- the doc_count is roughly constant
 but the doc_del_count is rising? Compaction time scales with the sum of
 those two values. Your options to get it back down under control at the
 moment are
 
 1) Purge the deleted docs (lots of caveats about replication, potential
 for view index resets, etc.)
 2) Rotate over to a new database, either by querying both or by
 replicating non-deleted docs to the new one
 
 Neither one is particularly palatable. CouchDB currently keeps the
 tombstones around forever so that replication can always work. Making
 changes on that front is a pretty subtle thing but maybe not completely
 impossible.
 
 Also, there's a new compactor in the works that is faster and generates
 smaller files.
 
 Adam
 
 On Apr 22, 2014, at 9:46 AM, Boaz Citrin bcit...@gmail.com wrote:
 
 Yes, the documents change quickly, Many deletions and insertions, but
 total
 number don't change that much.
 
 
 On Tue, Apr 22, 2014 at 4:41 PM, Adam Kocoloski 
 adam.kocolo...@gmail.comwrote:
 
 To first order it shouldn't be the case.
 
 Is the number of deleted documents growing with time? That'd have an
 impact.
 
 Adam
 
 On Apr 22, 2014, at 8:06 AM, Boaz Citrin bcit...@gmail.com wrote:
 
 Hello,
 
 Our database contains more or less the same number of documents,
 however
 the documents themselves change frequently.
 I would expect that compaction time will be the same, but I see that
 over
 time it takes longer to compact the database.
 
 Why is it so? Any way to overcome this?
 
 Thanks,
 
 Boaz
 
 
 



Re: Reseting DB Sequence Number

2014-03-15 Thread Adam Kocoloski
Hi Behrad, no, there's no way to reset that sequence number.  How many deleted 
docs do you have in your DB?  Is the aim behind your question to avoid sending 
those deleted docs through the view generation process?  Note that the actual 
number of entries in the index is going to be significantly less than 50M; each 
document that ever existed in your database shows up exactly once, regardless 
of how often it was edited.
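
A quick way to check those numbers is the database info document:

curl http://localhost:5984/my_db
# the response includes "doc_count" and "doc_del_count" among other fields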

Adam

On Mar 15, 2014, at 8:28 AM, Behrad behr...@gmail.com wrote:

 Is there any solution to reset a long-existing (update-intensive) Db's seq
 number?
 I see my view generation gets over 50,000,000 changes but I have only
 1,000,000 docs (most changes are archiving doc deletions)
 
 -- 
 --Behrad



Re: Replication vs. Compaction

2014-02-18 Thread Adam Kocoloski
On Feb 18, 2014, at 11:38 AM, Jan Lehnardt j...@apache.org wrote:

 On 31 Jan 2014, at 20:08 , Jason Smith j...@apache.org wrote:
 
 However there is a pathological situation where you are
 updating faster than the compactor can run, and you will get an infinite
 loop (plus very heavy i/o and filesystem waste as the compactor is
 basically duplicating your .couch into a .couch.compact forever).
 
 Just a little clarification on this point: CouchDB will try to catch up,
 I think 10 times, before giving up and reporting the result in the logs.
 
 Best
 Jan
 -- 

I'm not aware of any limit to the number of iterations executed by the 
compactor.  Regards,

Adam

Re: Revisions lost on CouchDB 1.2.0

2014-01-31 Thread Adam Kocoloski
On Jan 31, 2014, at 6:36 PM, Luca Morandini lmorand...@ieee.org wrote:

 On 31/01/14 19:42, Robert Samuel Newson wrote:
 
 As Simon says, this is the normal and expected behavior of CouchDB after
 database compaction (or replication). CouchDB is not a revision control 
 system,
 it only keeps the latest versions (including conflicts) of every document
 (including deleted ones).
 
 Compaction was indeed run (the sysadmin has no recollection of it, but it is 
 shown in the logs, my bad I overlooked it).
 
 I interpreted the _revs_limit parameter as the number of revisions to keep 
 (including their data), or does CouchDB keep metadata only after a compaction 
 ?
 
 Regards,
 Luca Morandini

Correct, _revs_limit controls the number of revisions about which metadata is 
kept.  It's relevant for replication -- if more than _revs_limit updates are 
applied in between replications to a target DB spurious conflicts can be 
generated.

Compaction only ever preserves the body of the latest revision on each edit 
branch.

Adam 

Re: Replication of attachment is extremely slow.. LOGGED INFORMATION

2014-01-24 Thread Adam Kocoloski
Well how about that. Go Nick!

Adam

 On Jan 24, 2014, at 7:31 PM, Paul Davis paul.joseph.da...@gmail.com wrote:
 
 Interesting note, this is fixed by applying the patch to COUCHDB-1953.
 
 On Fri, Jan 24, 2014 at 3:03 PM, Rian R. Maloney rian.malo...@yahoo.com 
 wrote:
 Will do Dave. Thanks again for the help
 
 Im away from desk at the moment but - Ive recreated this on 2 windows 7 pcs 
 and a MAC OS X mini.
 
 Im putting together a cleansed file - this format is used for check image 
 exchange between banks so I need to remove personal data.
 
 Thanks
 Rian
 
 
 
 On Friday, January 24, 2014 4:51 PM, Paul Davis 
 paul.joseph.da...@gmail.com wrote:
 
 Duplicated locally. Poking around at debugging what's going on.
 
 
 On Fri, Jan 24, 2014 at 1:08 PM, Scott Weber scotty2...@sbcglobal.net 
 wrote:
 Thanks. I'll remember that for next time.
 
 Hopefully, there won't be a next time for a while :-)
 
 
 
 
 - Original Message -
 From: Jens Alfke j...@couchbase.com
 To: user@couchdb.apache.org user@couchdb.apache.org; Scott Weber 
 scotty2...@sbcglobal.net
 Cc: replicat...@couchdb.apache.org replicat...@couchdb.apache.org
 Sent: Friday, January 24, 2014 2:07 PM
 Subject: Re: Replication of attachment is extremely slow.. LOGGED 
 INFORMATION
 
 
 On Jan 24, 2014, at 11:41 AM, Scott Weber scotty2...@sbcglobal.net wrote:
 
 I do not see the log attached. It must have been stripped by the list 
 server.
 I will copy/paste it to the bottom of this email.  It is 1024 lines.
 
 Speaking of netiquette :) please don't paste a thousand lines of logs into 
 a message. It messes up the threaded view of the conversation.
 
 The listserv running this group is really MIME-unfriendly (ironic, 
 considering this thread) so attachments are out; but nowadays we have lots 
 of nice online tools like Pastebin and Gist for hosting big blobs of text 
 that you can then post a URL to.
 
 —Jens


Re: Number of surviving revisions after compaction

2014-01-20 Thread Adam Kocoloski
On Jan 20, 2014, at 11:29 AM, Vladimir Ralev vladimir.ra...@gmail.com wrote:

 Hello all
 
 I was reading about  _revs_limit
 http://wiki.apache.org/couchdb/HTTP_database_API#Accessing_Database-specific_options
 which
 defaults to 1000 or so here
 http://wiki.apache.org/couchdb/HTTP_database_API#Accessing_Database-specific_options
 
 It seems to imply that those 1000 revisions will be preserved even after
 compaction.Is this correct and does it mean that the database will be as
 much as 1000x bigger than it needs to be after compaction.
 
 I have a database that I want to perform maintenance on so i remove it from
 traffic and want to reduce the number of revisions to 1 again safely. Is
 there some shortcut to do that?

Hi, that setting controls the number of revisions about which the server keeps 
a record, not the number where the actual body of the rev is preserved.  
Compaction only ever preserves the last revision of each edit branch; this is 
not configurable.  The _revs_limit setting impacts replication, e.g. if you 
make 1001 edits on a source server in between replications to a target the 
replicator will not be able to piece together edit 1 and edit 1002 and you'll 
end up with a spurious conflict on the target.
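
If you do want to inspect or change the setting for a particular database, it
is exposed over HTTP (a sketch against a local node; the value is just an
example):

curl http://localhost:5984/my_db/_revs_limit
curl -X PUT http://localhost:5984/my_db/_revs_limit -d '100'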

Adam

Re: Errors when trying to compact

2014-01-20 Thread Adam Kocoloski
Hi Vladimir, compaction is an asynchronous process and so only a small subset 
of the possible errors that could occur during a compaction will surface in 
time to be reported as part of the response to your POST request.
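
The reliable places to look afterwards are the server log (as you did) and
_active_tasks, which lists running compactions and their progress, e.g.:

curl http://localhost:5984/_active_tasks
# returns a JSON array of running tasks; the exact fields vary by version

If the compaction job disappears from that list without the database size
dropping, something went wrong and the log will have the details.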

Adam

On Jan 20, 2014, at 5:05 PM, Vladimir Ralev vladimir.ra...@gmail.com wrote:

 Upgraded to latest bigcouch and it solved this problem. Still would
 appreciate it if someone can advise whether this is a common problem I should
 watch for.
 
 
 On Mon, Jan 20, 2014 at 10:16 PM, Vladimir Ralev
 vladimir.ra...@gmail.comwrote:
 
 Hi,
 
 
 So I am trying to compact a bigcouch DB (couch 1.1.1, on r15b) with
 
 
 
 curl -H "Content-Type: application/json" -X POST http://localhost:5984/my_db/_compact
 # => {"ok":true}
 
 and it does indeed return {ok, true}
 
 I dont think the compaction worked, the size is unchanged.
 
 however when I look into the logs in fact this error is displayed:
 
 [Mon, 20 Jan 2014 16:13:17 GMT] [error] [emulator] [] Error in
 process 0.3208.218 on node 'bigcouch@bigcouch-server' with exit value:
 {undef,[{couch_task_status,update,[Copied ~p of ~p changes
 (~p%),[0,7386,0]]},{couch_db_updater,'-copy_compact/3-fun-0-',6},{couch_btree,stream_kv_node2,8},{couch_btree,stream_kp_node,8},{couch_btree,fold,4},{couch_db_updater...
 
 
 Could it be failing silently and is there some other way to debug it?
 Those are debug logs BTW, just a single error level line.
 



Re: Clarification on UUIDs Configuration

2014-01-20 Thread Adam Kocoloski
On Jan 20, 2014, at 3:36 PM, Jens Alfke j...@couchbase.com wrote:

 
 On Jan 20, 2014, at 12:19 PM, Stefan Klein st.fankl...@gmail.com wrote:
 
 a performance impact of random document ids.
 If the document ids are not sequential larger portions of the b-tree need
 to be rewriten.
 Is this related only to inserts or also to updates?
 
 It only applies to inserts, because if nodes are added to the b-tree in 
 random order, more rebalancing will be necessary. Adding them in sequential 
 order is more optimal.
 
 Updates don't change the structure of the tree (only the contents of leaf 
 nodes) so their ordering doesn't matter as much.
 
 —Jens

Well, at the end of the day the goal is that documents which mutated 
concurrently share long common id prefixes, because if they do they'll share 
many of the same inner nodes in their respective paths to the root, and we can 
optimize away extra rewrites of those inner nodes.

The easiest place to achieve this is during insertion by a judicious choice of 
document ID, but if for some reason you have a subset of documents in your 
database which are hot (i.e., frequently updated relative to the others) and 
you can afford to update them via _bulk_docs then it would make sense to give 
that document class a common ID prefix so that you can benefit from this group 
commit optimization.
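
As a sketch of that last point (the names and values here are made up), a hot
document class sharing an ID prefix and updated together through _bulk_docs
would look something like:

curl -X POST http://localhost:5984/my_db/_bulk_docs \
  -H "Content-Type: application/json" \
  -d '{"docs": [
        {"_id": "counter:page_views", "value": 1042},
        {"_id": "counter:api_calls",  "value": 331}
      ]}'

(For updates to existing documents each entry would also need its current
_rev, omitted here.)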

Adam

Re: A CouchDB/Cloudant Scale Authorization question

2014-01-09 Thread Adam Kocoloski
Hi Andy, you're right that a DB per user-feature is currently the way to go to 
achieve the kind of access control granularity that you have in mind.  100k 
databases in a Cloudant account is not all that uncommon.
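
For the per-database grants themselves, stock CouchDB uses the _security
object (Cloudant's own API for this differs slightly); a minimal sketch with
placeholder database and user names:

curl -X PUT http://localhost:5984/joe_recipes/_security \
  -H "Content-Type: application/json" \
  -d '{"admins":  {"names": ["joe"],      "roles": []},
       "members": {"names": ["joes_mom"], "roles": []}}'

Members can read and write documents; finer-grained edit control on top of
that comes from a validate_doc_update function. Cheers,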

Adam

On Jan 9, 2014, at 8:55 AM, Andy Dorman ador...@ironicdesign.com wrote:

 Hi, we are new to document databases and CouchDB, but we are very excited 
 about the possibilities of CouchDB, Cloudant, and PouchDB, especially for 
 mobile applications.
 
 We are beginning a major update to a mobile first design of a web app that 
 has used an SQL db for over 13 years.  The app currently has thousands of 
 users (and will hopefully grow to tens of thousands once we have a mobile 
 version running) with 10 shareable features (Calendar, Recipes, etc.) for 
 each user.  Each user needs to be able to grant read or edit access to 
 each feature to some number (usually anywhere from 2 to 50) of other users.
 
 This access model needs read/write authorization to be per user per feature. 
 ie, Joe (a user) can grant edit access for his Recipes (a feature) to his Mom 
 (another user) and read access for his Calendar (another feature) to his wife 
 (another user).
 
 We really want to use Pouchdb in the client and Couchdb/Cloudant on the 
 server-side as that solves a LOT of issues regarding replication and network 
 access for mobile clients.
 
 However, it looks to us like the only way to implement this access model 
 using CouchDB's built-in auth features is to define a database for each 
 user-feature combination.  So Joe could grant edit access to his Recipe 
 database to his Mom and read access to his Calendar database to Fred and 
 edit access to his wife.
 
 Our first question is: Is it scalable for an app with several thousand(s) 
 users and 10 features to use a separate database for each user-feature? With 
 10,000 users and 10 features, that would come to 100,000 databases for our 
 app.
 
 The second question would be is there another way (other than us writing a 
 server-side middle layer REST-ful app to handle authorization) to handle 
 authorization at a per user per feature level?  Our original design using 
 CouchDB had a single database per user and a doc-type or document per 
 feature.  But we have been unable to figure out a way to have CouchDB control 
 authorization for each document or doc_type.
 
 Thank you for any insight or references to documentation that might explain a 
 way to implement CouchDB authorization at the doc_type or document level.
 
 -- 
 Andy Dorman
 


