Re: Configuring a Replication-Only Server

2023-11-28 Thread Robert Newson
Hi,

Not great options out of the box, unfortunately.

1) The autoupdate property (true|false) in the design document itself disables 
"background" indexing of its views.
2) ken.ignore config entries let you block index building for specific databases by name.
3) Disable ken entirely.

You'd also have to ensure users couldn't trigger interactive view building 
(i.e., don't let them call _view, _search, _find), but I think that's already 
achieved in your proposed setup.

autoupdate seems inappropriate, as it would also inhibit background building 
of the ddoc's views everywhere they're replicated to.

the ken.ignore property seems a maybe-viable option but is not great: you'd need 
to list the dbname (and that's the _shard_-level name, I think) for every db you 
don't want to index.

disabling ken entirely might suit you if you query your views often (and thus 
they don't get too stale), if you're prepared to take the latency hit when they are 
stale, or if you're happy to build a homegrown ken of your own for just your indexes.
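
For illustration, rough sketches of options 1 and 2 (check the config reference 
for your CouchDB version before relying on the exact keys). Option 1 is a 
top-level field in the design document:

  {
    "_id": "_design/example",
    "autoupdate": false,
    "views": {
      "by_type": { "map": "function (doc) { emit(doc.type, null); }" }
    }
  }

Option 2 is an entry per database in the [ken.ignore] section of local.ini (and, 
per the caveat above, the key may need to be the shard-level name of the form 
shards/<range>/<dbname>.<timestamp> rather than the plain database name):

  [ken.ignore]
  some_db = true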

Beyond that it would need to be an enhancement or fork.

HTH,

B.

> On 28 Nov 2023, at 22:45, Diana Thayer  wrote:
> 
> Howdy!
> 
> Is it possible to configure CouchDB to only index design documents on
> certain databases?
> 
> For context, I'm developing a service that wraps CouchDB so that users can
> generate access tokens they can replicate with. These tokens look like
> URLs, but they point to a proxy that processes the request before passing
> it to the server. As these tokens permit replication *only*, it would be
> pointless to build indices off the design documents that users replicate,
> as they will never be able to query them.
> 
> However, the service also uses CouchDB to store its own data, and I do want
> it to be able to build and query indices. So, is there a way to permit both
> use-cases in one CouchDB server?
> 
> Best regards,
> Diana



Re: Writing an index with a linked document

2023-11-12 Thread Robert Newson
https://docs.couchdb.org/en/stable/api/ddoc/search.html#get--db-_design-ddoc-_search-index

You might mean ?include_fields=["createdDate"] ?
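
i.e. something like this (a sketch with illustrative names; the field must have 
been indexed with {"store": true} for it to come back):

  GET /dbname/_design/ddoc/_search/myindex?q=type:user&include_fields=["createdDate"]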

B.

> On 12 Nov 2023, at 19:12, TDAS  wrote:
> 
> …thinking further on this, can I return a number of fields with the index 
> that aren’t searched? EG if I have a ‘doc.createdDate’, how can I just return 
> that with the data?
> 
>> On 12 Nov 2023, at 18:19, TDAS  wrote:
>> 
>> Basically, I was hoping that I could have the search query return the name 
>> of the person linked to that document. Just to save doing further queries to 
>> convert a list of IDs to users.
>> 
>>> On 12 Nov 2023, at 17:24, Robert Newson  wrote:
>>> 
>>> chatgpt makes everything up. :)
>>> 
>>> You can't fetch another document during the indexing callbacks.
>>> 
>>> Perhaps explain what you're trying to achieve?
>>> 
>>> B.
>>> 
>>>> On 11 Nov 2023, at 23:54, TDAS  
>>>> wrote:
>>>> 
>>>> getDoc doesn’t exist? Did chatgpt just make that up?! Man…
>>>> 
>>>> Is there an alternative?
>>>> 
>>>>> On 11 Nov 2023, at 22:52, Robert Newson  wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> The problem is that getDoc() function doesn't exist, and so the 
>>>>> evaluation of this throws an error, which causes the document not to be 
>>>>> indexed at all.
>>>>> 
>>>>> B.
>>>>> 
>>>>>> On 11 Nov 2023, at 17:30, TDAS  
>>>>>> wrote:
>>>>>> 
>>>>>> Hey all
>>>>>> 
>>>>>> I have Clouseau running, and have written a search index which is 
>>>>>> working nicely.
>>>>>> 
>>>>>> However when I try to link a document, the search stops returning any 
>>>>>> results.
>>>>>> 
>>>>>> I’ve checked it with chatgpt (so it must be right, hey!) :)
>>>>>> 
>>>>>> Can anyone point out what I’m doing wrong?
>>>>>> 
>>>>>> 
>>>>>> The doc.owner is the ID of the user document, and the commented out 
>>>>>> section is the lookup I’m trying (that breaks the search). I’ve tried 
>>>>>> indexing it under ‘default’ to see if that was it, and also tried using 
>>>>>> a different index name, like ‘user’.
>>>>>> 
>>>>>> function (doc) {
>>>>>>   if(!doc.deleted && doc.type) {
>>>>>>     index('type', doc.type, {"store":true})
>>>>>> 
>>>>>>     if (doc.type === 'user' && doc.firstname && doc.lastname) {
>>>>>>       index('default', doc.firstname + ' ' + doc.lastname, {"store": true});
>>>>>>     }
>>>>>>     if(doc.addresses) {
>>>>>>       for(const address of doc.addresses) {
>>>>>>         if(address.postcode)
>>>>>>           index('default', address.postcode, {"store": true})
>>>>>>         index('default', address.main.replace(/\n/g, ', '), {"store": true})
>>>>>>       }
>>>>>>     }
>>>>>>     if(doc.email) {
>>>>>>       index('default', doc.email, {"store": true})
>>>>>>     }
>>>>>>     if(doc.c_provider) {
>>>>>>       index('default', doc.c_provider, {"store": true})
>>>>>>     }
>>>>>>     if(doc.c_policy_number) {
>>>>>>       index('default', doc.c_policy_number, {"store": true})
>>>>>>     }
>>>>>> 
>>>>>>     // if (doc.owner) {
>>>>>>     //   var userDoc = getDoc(doc.owner);
>>>>>>     //   if (userDoc && userDoc.firstname && userDoc.lastname) {
>>>>>>     //     index('owner', userDoc.firstname + ' ' + userDoc.lastname, { "store": true });
>>>>>>     //   }
>>>>>>     // }
>>>>>>   }
>>>>>> }
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 



Re: Writing an index with a linked document

2023-11-12 Thread Robert Newson
chatgpt makes everything up. :)

You can't fetch another document during the indexing callbacks.
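
The usual workaround (a sketch, not something built into the search indexer) is to 
denormalise the fields you need onto the document when it is written, then index 
them directly, e.g.:

  // assumes doc.owner_name was copied from the owner's user doc at write time
  if (doc.owner_name) {
    index('owner', doc.owner_name, {"store": true});
  }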

Perhaps explain what you're trying to achieve?

B.

> On 11 Nov 2023, at 23:54, TDAS  wrote:
> 
> getDoc doesn’t exist? Did chatgpt just make that up?! Man…
> 
> Is there an alternative?
> 
>> On 11 Nov 2023, at 22:52, Robert Newson  wrote:
>> 
>> Hi,
>> 
>> The problem is that getDoc() function doesn't exist, and so the evaluation 
>> of this throws an error, which causes the document not to be indexed at all.
>> 
>> B.
>> 
>>> On 11 Nov 2023, at 17:30, TDAS  
>>> wrote:
>>> 
>>> Hey all
>>> 
>>> I have Clouseau running, and have written a search index which is working 
>>> nicely.
>>> 
>>> However when I try to link a document, the search stops returning any 
>>> results.
>>> 
>>> I’ve checked it with chatgpt (so it must be right, hey!) :)
>>> 
>>> Can anyone point out what I’m doing wrong?
>>> 
>>> 
>>> The doc.owner is the ID of the user document, and the commented out section 
>>> is the lookup I’m trying (that breaks the search). I’ve tried indexing it 
>>> under ‘default’ to see if that was it, and also tried using a different 
>>> index name, like ‘user’.
>>> 
>>> function (doc) {
>>> if(!doc.deleted && doc.type) {
>>>  index('type', doc.type, {"store":true})
>>> 
>>>  if (doc.type === 'user' && doc.firstname && doc.lastname) {
>>>  index('default', doc.firstname + ' ' + doc.lastname, {"store": true});
>>>  }
>>>  if(doc.addresses) {
>>>for(const address of doc.addresses) {
>>>  if(address.postcode)
>>>  index('default', address.postcode, {"store": true})
>>>  index('default', address.main.replace(/\n/g, ', '), {"store": true})
>>>}
>>>  }
>>>  if(doc.email) {
>>>index('default', doc.email, {"store": true})
>>>  }
>>>  if(doc.c_provider) {
>>>index('default', doc.c_provider, {"store": true})
>>>  }
>>>  if(doc.c_policy_number) {
>>>index('default', doc.c_policy_number, {"store": true})
>>>  }
>>> 
>>>  // if (doc.owner) {
>>>  //   var userDoc = getDoc(doc.owner);
>>>  //   if (userDoc && userDoc.firstname && userDoc.lastname) {
>>>  //   index('owner', userDoc.firstname + ' ' + userDoc.lastname, { 
>>> "store": true });
>>>  //   }
>>>  // }
>>> }
>>> }
>> 
>> 
> 



Re: Writing an index with a linked document

2023-11-11 Thread Robert Newson
Hi,

The problem is that getDoc() function doesn't exist, and so the evaluation of 
this throws an error, which causes the document not to be indexed at all.

B.

> On 11 Nov 2023, at 17:30, TDAS  wrote:
> 
> Hey all
> 
> I have Clouseau running, and have written a search index which is working 
> nicely.
> 
> However when I try to link a document, the search stops returning any results.
> 
> I’ve checked it with chatgpt (so it must be right, hey!) :)
> 
> Can anyone point out what I’m doing wrong?
> 
> 
> The doc.owner is the ID of the user document, and the commented out section 
> is the lookup I’m trying (that breaks the search). I’ve tried indexing it 
> under ‘default’ to see if that was it, and also tried using a different index 
> name, like ‘user’.
> 
> function (doc) {
>   if(!doc.deleted && doc.type) {
>index('type', doc.type, {"store":true})
> 
>if (doc.type === 'user' && doc.firstname && doc.lastname) {
>index('default', doc.firstname + ' ' + doc.lastname, {"store": true});
>}
>if(doc.addresses) {
>  for(const address of doc.addresses) {
>if(address.postcode)
>index('default', address.postcode, {"store": true})
>index('default', address.main.replace(/\n/g, ', '), {"store": true})
>  }
>}
>if(doc.email) {
>  index('default', doc.email, {"store": true})
>}
>if(doc.c_provider) {
>  index('default', doc.c_provider, {"store": true})
>}
>if(doc.c_policy_number) {
>  index('default', doc.c_policy_number, {"store": true})
>}
> 
>// if (doc.owner) {
>//   var userDoc = getDoc(doc.owner);
>//   if (userDoc && userDoc.firstname && userDoc.lastname) {
>//   index('owner', userDoc.firstname + ' ' + userDoc.lastname, { 
> "store": true });
>//   }
>// }
>  }
> }




Re: Search issues

2023-09-14 Thread Robert Newson
Hi,

It's true not 'true' (boolean not string).
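
i.e., a corrected sketch of the quoted index function:

  function (doc) {
    if (doc.type === 'user' && doc.firstname && doc.lastname) {
      // boolean true, not the string 'true'
      index('name', doc.firstname + ' ' + doc.lastname, {"store": true});
      index('firstname', doc.firstname, {"store": true});
    }
  }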

B.

> On 13 Sep 2023, at 22:54, TDAS  wrote:
> 
> Ok success!
> Well, nearly… I’m not getting anything useful back unless I include_docs, 
> which I don’t want to do…
> How can I populate fields here? I’ve tried adding a fields array, and also 
> using highlight_fields
> {
>"total_rows": 2,
>"bookmark": 
> "g1BteJzLYWBgYMpgTmEQTM4vTc5ISXIwNDLXMwBCwxyQVCJDUv3___-zMpjc7D9rbnEAiiUy4lGfxwIkGRqA1H-Yto9rXoPEEnmyADx8Hsc",
>"rows": [
>{
>"id": "bf9143999ed3eedd7cbc7ff3560588eb",
>"order": [
>1.1976816654205322,
>1
>],
>"fields": {}
>},
>{
>"id": "bf9143999ed3eedd7cbc7ff35601949d",
>"order": [
>1.1047163009643555,
>12
>],
>"fields": {}
>}
>]
> }
> 
> This is my index:
> 
> function (doc) {
>  if (doc.type === 'user' && doc.firstname && doc.lastname) {
>index('name', doc.firstname + ' ' + doc.lastname, { 'store': 'true' });
>index('firstname', doc.firstname, { 'store': 'true' });
>}
> }
> 
> 
> 
>> On 13 Sep 2023, at 22:06, TDAS  wrote:
>> 
>> Ok thanks, any tips on installing jdk 8 on Debian bullseye? I can’t find 
>> anywhere with any suggestions for a version that early, apart from one which 
>> says I need to sign up with Oracle! this is becoming a much bigger headache 
>> than I had envisaged
>> 
>>> On 13 Sep 2023, at 21:56, Robert Newson  wrote:
>>> 
>>> There's https://docs.couchdb.org/en/stable/install/search.html
>>> 
>>> You can use up to java 8 but nothing newer.
>>> 
>>> We dropped log4j btw, though clouseau only ever used log4j 1.x which was 
>>> not affected by the Log4Shell vulnerability. clouseau uses slf4j and you 
>>> need to choose which adapter you'd like.
>>> 
>>> The next major release of couchdb will include an alternative Lucene 
>>> indexing system that works with Java 11 through 20 and will include the 
>>> Java artifacts necessary to run the whole stack.
>>> 
>>> B.
>>> 
>>>> On 13 Sep 2023, at 21:45, TDAS  
>>>> wrote:
>>>> 
>>>> Greetings
>>>> 
>>>> I’ve set up an index in a design doc, and I’m trying to use _search after 
>>>> a long wait I get:
>>>> 
>>>> {
>>>> "error": "ou_est_clouseau",
>>>> "reason": "Could not connect to the Clouseau Java service at 
>>>> clouseau@127.0.0.1"
>>>> }
>>>> 
>>>> So, I’ve been looking into installing this but am going down a rabbit hole.
>>>> 
>>>> First I need java which I don’t have - I install that, then google tells 
>>>> me it will only compile with jdk6, which I can’t find. Then I see it uses 
>>>> log4j and I’m remembering the vulnerability that caused a load of 
>>>> headaches.
>>>> 
>>>> Any advice? Is there an idiot guide somewhere?
>>>> 
>>>> TIA
>>>> 
>>>> TC
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
> 



Re: Search issues

2023-09-13 Thread Robert Newson
There's https://docs.couchdb.org/en/stable/install/search.html

You can use up to java 8 but nothing newer.

We dropped log4j btw, though clouseau only ever used log4j 1.x which was not 
affected by the Log4Shell vulnerability. clouseau uses slf4j and you need to 
choose which adapter you'd like.

The next major release of couchdb will include an alternative Lucene indexing 
system that works with Java 11 through 20 and will include the Java artifacts 
necessary to run the whole stack.

B.

> On 13 Sep 2023, at 21:45, TDAS  wrote:
> 
> Greetings
> 
> I’ve set up an index in a design doc, and I’m trying to use _search after a 
> long wait I get:
> 
>{
>"error": "ou_est_clouseau",
>"reason": "Could not connect to the Clouseau Java service at 
> clouseau@127.0.0.1"
>}
> 
> So, I’ve been looking into installing this but am going down a rabbit hole.
> 
> First I need java which I don’t have - I install that, then google tells me 
> it will only compile with jdk6, which I can’t find. Then I see it uses log4j 
> and I’m remembering the vulnerability that caused a load of headaches.
> 
> Any advice? Is there an idiot guide somewhere?
> 
> TIA
> 
> TC
> 
> 
> 
> 



Re: Intercepting HTTP Requests

2023-07-08 Thread Robert Newson
Rewrite functions are also optional: a user with access to call the _rewrite 
endpoint could simply PUT /dbname/docid instead. You'd need external 
enforcement to ensure they did not do so. _rewrite is also deprecated.

Only admins can create (or delete) databases, and ordinary users should not be 
granted admin rights.

B.

> On 8 Jul 2023, at 21:03, Ronnie Royston  wrote:
> 
> The aim is to implement a least privilege model, i.e., each user is granted
> the minimum system resources and authorizations that they need.
> https://csrc.nist.gov/glossary/term/least_privilege
> 
> Will try it with _rewrite as a function.
> 
> In addition to per document authorization, what limits a user/member from
> creating an infinite number of databases? It seems like a native rich auth
> model could be built with a *request function* having req, oldDoc, newDoc,
> userCtx, and secObj *but* for max power the verify function would also need
> to call/request other endpoints, for example, .length of GET all db with
> owner/author = userCtx.id/sub in order to limit db's per user.
> 
> On Sat, Jul 8, 2023 at 2:41 PM Robert Newson  wrote:
> 
>> Hi,
>> 
>> Currently there is no fine-grained read access controls within a database
>> and our advice is to separate documents into different databases to achieve
>> this level of control or, as you suggest, you can put such logic in an
>> application or proxy that mediates all access to couchdb.
>> 
>> Show functions are optional, a user could simply call GET /dbname/docid
>> and bypass any logic you might add there.
>> 
>> as an aside, fine-grained _write_ access is supported, through the
>> validate_doc_update functions.
>> 
>> We are looking at enhancing this area of couchdb. That work exists at
>> https://github.com/apache/couchdb/pull/4139 and has recently seen some
>> significant activity that raises the odds of it landing in a future couchdb
>> release. We'd benefit from knowing if it would address your needs.
>> 
>> hth,
>> B.
>> 
>>> On 8 Jul 2023, at 20:27, Ronnie Royston 
>> wrote:
>>> 
>>> I am a CouchDB user. I need more granularity in terms of DB
>> authorization,
>>> e.g. limit who can read a document in a shared database.
>>> 
>>> It appears that show functions do get passed the request object, (doc,
>>> req), however it looks like this is discouraged via a deprecation
>> warning.
>>> Update validation documents pass (newDoc, oldDoc, userCtx, secObj) to the
>>> query server, however I need the request object, and for *all* HTTP
>> methods.
>>> 
>>> src/chttpd/src/chttpd_node.erl seems to handle HTTP requests but I do not
>>> know Erlang well enough to pipe all requests out. I would really like to
>>> allow clients/browsers to communicate directly with couch (albeit via
>>> recommended reverse proxy) and not force all db requests through, for
>>> example, Node.js.
>>> 
>>> It seems like the query server architecture is 99% there in terms of
>> what I
>>> need - it's just that I need the full request object and need my
>> validation
>>> to get called for every HTTP method.
>>> 
>>> How can I restrict access to a document in a shared database based on
>>> userID? I believe I need to intercept HTTP requests and validate them,
>>> right?
>>> 
>>> --
>> 
>> 
> 
> -- 
> Ronnie Royston
> (504) 460-1592



Re: Intercepting HTTP Requests

2023-07-08 Thread Robert Newson
Hi,

Currently there is no fine-grained read access controls within a database and 
our advice is to separate documents into different databases to achieve this 
level of control or, as you suggest, you can put such logic in an application 
or proxy that mediates all access to couchdb.

Show functions are optional, a user could simply call GET /dbname/docid and 
bypass any logic you might add there.

as an aside, fine-grained _write_ access is supported, through the 
validate_doc_update functions.

We are looking at enhancing this area of couchdb. That work exists at 
https://github.com/apache/couchdb/pull/4139 and has recently seen some 
significant activity that raises the odds of it landing in a future couchdb 
release. We'd benefit from knowing if it would address your needs.

hth,
B.

> On 8 Jul 2023, at 20:27, Ronnie Royston  wrote:
> 
> I am a CouchDB user. I need more granularity in terms of DB authorization,
> e.g. limit who can read a document in a shared database.
> 
> It appears that show functions do get passed the request object, (doc,
> req), however it looks like this is discouraged via a deprecation warning.
> Update validation documents pass (newDoc, oldDoc, userCtx, secObj) to the
> query server, however I need the request object, and for *all* HTTP methods.
> 
> src/chttpd/src/chttpd_node.erl seems to handle HTTP requests but I do not
> know Erlang well enough to pipe all requests out. I would really like to
> allow clients/browsers to communicate directly with couch (albeit via
> recommended reverse proxy) and not force all db requests through, for
> example, Node.js.
> 
> It seems like the query server architecture is 99% there in terms of what I
> need - it's just that I need the full request object and need my validation
> to get called for every HTTP method.
> 
> How can I restrict access to a document in a shared database based on
> userID? I believe I need to intercept HTTP requests and validate them,
> right?
> 
> --



Re: Quorum: which nodes have a right to vote?

2023-06-14 Thread Robert Newson
Hi,

They are not votes. We are simply waiting to hear the first two of the three 
expected responses before returning a response to the client. No node will revert 
its write if another node fails its write. Each of the three nodes might return a 
different status code due to ordering (e.g., one node might return 201 for a write 
and another might return 409, which ends up adding a conflict into the document).

couchdb uses the shard map to know which nodes should host the document (based 
solely on its id) and directs reads and writes to those nodes only, correct.

If no node that hosts the range is available you will get an error trying to 
read or write a document in that range and no write happens; there is no 
"hinted handoff" in our variant of dynamo.

If the above happens for all ranges then, yes, all reads and writes for that 
database will fail.

B.


> On 14 Jun 2023, at 09:22, Luca Morandini  wrote:
> 
> On Wed, 14 Jun 2023 at 17:23, Robert Newson  wrote:
> 
>> 
>> There are no votes, no elections and there are no leader nodes.
>> 
> 
> As I see it, when there is a quorum to reach there is an implicit voting,
> but never mind.
> 
> 
> When couchdb believes nodes to be down, the quorum is implicitly lowered to
>> avoid the latency penalty.
>> 
> 
> So, it is kind of a "soft quorum".
> 
> Going back to my original question: only the nodes that host the shards are
> queried, but when there are not enough surviving nodes the quorum is
> lowered.
> 
> As a corollary, I assume that when at least one shard is no
> longer reachable (no one of the surviving nodes hosts it) the cluster stops
> accepting requests on that database: is that so?
> 
> Thanks for the answer,
> 
> Luca Morandini



Re: Quorum: which nodes have a right to vote?

2023-06-14 Thread Robert Newson
Hi,

There are no votes, no elections and there are no leader nodes.

CouchDB chooses availability over consistency and will accept reads/writes even 
if only one node (that hosts the shard ranges being read/written) is up. 

In a 3-node, 3-replica cluster, where every node hosts a copy of every shard, 
any single node can be up to allow all reads and writes to succeed.

Every node in the cluster can coordinate a read or write. The coordinator 
creates N concurrent and independent read/write requests and sends them to the 
appropriate nodes (that the shard map indicates for that document id). The 
coordinator waits for a quorum of replies before merging those replies into the 
http response to the client, up to the request timeout parameter. If at least 
one write occurred but quorum was not reached, CouchDB returns a 202 status code; 
if quorum was reached, a 201. For reads, 200 is returned whether quorum was reached 
or not (the difference is that you get a faster reply when it is, otherwise you're 
waiting for the timeout). 

When couchdb believes nodes to be down, the quorum is implicitly lowered to 
avoid the latency penalty.

In your scenario the two offline nodes would not get the writes at the time, 
for obvious reasons, but once up again they will receive those writes from the 
surviving nodes, restoring the expected N level of redundancy.

B.

> On 14 Jun 2023, at 07:11, Luca Morandini  wrote:
> 
> Folks,
> 
> A student (I teach CouchDB as part of a Cloud Computing course), pointed
> out that, on a 4-node, 3-replica cluster, the database should stop
> accepting requests when 2 nodes are down.
> 
> His rationale is: the quorum (assuming its default value of 2) in principle
> can be reached, but since some of the shards are not present on both nodes,
> the quorum of replicas cannot be reached even when there are still 2 nodes
> standing.
> 
> This did not chime with my experience, hence I did a little experiment:
> - set a cluster with 4 nodes and cluster parameters set to
> "q":8,"n":3,"w":2,"r":2;
> - created a database;
> - added a few documents;
> - stopped 2 nodes out of 4;
> - added another 10,000 documents without a hiccup.
> 
> I checked the two surviving nodes, and there were 6 shard files
> representing the 8 shards in each node: 4 shards were replicated, and 4
> were not.
> Therefore, about 5,000 of the write operations must have hit the
> un-replicated shards.
> 
> In other words, who has the vote in a quorum election: all the nodes, or
> only the nodes that host the shard with the sought document?
> 
> Cheers,
> 
> Luca Morandini



Re: Quorum formula

2023-06-11 Thread Robert Newson
Hi,

The code is definitive: 

https://github.com/apache/couchdb/blob/604526f5f93df28138a165a666e39ff37f3fdc06/src/mem3/src/mem3.erl#L391

n(DbName) div 2 + 1;

That is, (N/2) + 1, where (N/2) is rounded down to the nearest integer.

For odd numbers of N (the only kind we recommend) the doc formulation is 
equivalent due to rounding.

However, we will amend the documentation as it is wrong for even numbers 
(because of the lack of rounding).
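
(For example: with N=3 both formulations give 2, since 3 div 2 + 1 = 2 and 
(3+1)/2 = 2; with N=4 the code gives 4 div 2 + 1 = 3, while the unrounded doc 
formula gives (4+1)/2 = 2.5.)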

Thanks for bringing this to our attention.

B.

> On 10 Jun 2023, at 13:05, Luca Morandini  wrote:
> 
> Folks,
> 
> The doc states that the formula to get the write and read quorum is:
> The default required size of a quorum is equal to r=w=((n+1)/2)
> https://docs.couchdb.org/en/stable/cluster/sharding.html#quorum
> 
> However, the code suggests otherwise:
> WR = N div 2 + 1,
> https://github.com/apache/couchdb/blob/604526
> f5f93df28138a165a666e39ff37f3fdc06/src/fabric/src/fabric_db_info.erl#L159
> 
> Am I missing something?
> 
> Cheers,
> 
> Luca Morandini



Re: Repeated documents returned by text-search pagination.

2022-09-20 Thread Robert Newson
Hi,

The bookmark encodes the "order" property of the last result from each shard 
range, and a query with a bookmark parameter simply retrieves matches that come 
after those order values. If the database changes between queries (documents 
added, changed or removed) such that the overall ordering of search results also 
changes, it is normal to see search results repeated (a change added an item to a 
previous page, pushing every later result further down the list) or missing (a 
change removed an item from a previous page, moving every later result "up").
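
A sketch of the paging loop (illustrative names): request the first page with

  GET /dbname/_design/ddoc/_search/myindex?q=...&limit=200

then pass the bookmark from each response into the next request as &bookmark=..., 
stopping when a page comes back empty. De-duplicating by id on the client is the 
usual way to cope with the repeats described above.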

B.

> On 20 Sep 2022, at 01:23, Luca Morandini  wrote:
> 
> Hi,
> 
> I added a text search index to a 4-node, Kubernetes-deployed,
> clustered database and started querying it.
> 
> The queries work, but I noticed that a variable (say, 1%-8%)
> proportion of the documents ids returned in batches through pagination
> (using bookmarks) was already returned by previous pages. The
> duplicated IDs change somewhat at every run, hence ithe phenomenon
> seems to be random.
> 
> I did not use stale in the requests, just the query, a limit set to
> 200, and the bookmark returned by the previous pagination response.
> 
> There are no errors in the log of either CocuhDB or Clouseau.
> 
> Could someone shed some light on this?
> 
> Cheers,
> 
> Luca Morandini



Re: Clouseau instances co-location

2022-08-05 Thread Robert Newson
It doesn't have to be. Couchdb and Clouseau communicate over Erlang RPC (the 
same protocol the couchdb nodes use to talk to each other). You can specify the 
Clouseau node name in the couchdb configuration. But do note that they are 
still _paired_. Each couchdb node should be configured to talk to its own, 
separate, Clouseau node.
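
For example (a sketch; see the search install docs for the exact settings), each 
couchdb node's local.ini would point at its own Clouseau node:

  [dreyfus]
  name = clouseau1@clouseau-host-1.example.com

with the matching node name configured on the Clouseau side.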

B.

> On 4 Aug 2022, at 01:32, Luca Morandini  wrote:
> 
> Hi Folks,
> 
> Possibly a naive question: do Clouseau instances have to run on the
> same nodes CouchDB instances reside?
> 
> From the installation recipe, it looks like a Clouseau process has to
> be co-located on the same VM of the CouchDB Instance.
> 
> For the sake of robustness, I'd rather have Clouseau run on a
> different VM: is that at all possible?
> 
> Cheers,
> 
> Luca Morandini



Re: Disabling CouchDB server signature

2022-07-20 Thread Robert Newson
Hi,

The easiest approach would be to have haproxy send something else instead, but 
note that some tools might break if they can't retrieve the welcome message. 
I've confirmed that the replicator would not be affected. We welcome reports of 
your success and/or issues you face by removing this.

Something like this in the haproxy frontend;

 http-request return status 200 content-type application/json string "{}" if { 
path eq / }

B.

> On 20 Jul 2022, at 06:40, Arcadius Ahouansou  wrote:
> 
> Hello.
> By default, CouchDB exposes its current version to the worlds i.e
> going to curl http://MYHOST:MYPORT/
> I get the pretty json response below.
> 
> Please what is the recommended way of disabling this and display an
> empty json or remove at least the version.
> Note that I have haproxy in front of couchdb.
> Thank you very much.
> 
> Arcadius
> 
> {
> 
>   - "couchdb": "Welcome",
>   - "version": "3.2.2",
>   - "git_sha": "d5b746b7c",
>   - "uuid": "ce35097f091bda955f1a7b46adddaaca",
>   - "features": [...],
>   - "vendor": {
>  - "name": "The Apache Software Foundation"
>   }
> 
> }



Re: Shards cannot be read after move to a different cluster

2022-07-08 Thread Robert Newson
Hi,

The config file isn't monitored, so just changing the file won't help; you'd 
need to restart couchdb.

Did you have anything bypassed in the first place, though?

Could you explain the replication problems you encountered?

I can say for sure that it is generally unsafe to modify shard files 
out-of-band while couchdb is running, as it appears you did at step 5. Couchdb 
may well have opened the shard files (created at step 3) and it holds them open 
in a cache.

I don't think we have written up how to do this properly (we strongly advise 
replication instead) but I did write an SO post a while ago: 
https://stackoverflow.com/questions/6676972/moving-a-shard-from-one-bigcouch-server-to-another-for-balancing.
 The sharding scheme for bigcouch is the same as for couchdb 3.x.

The essential difference is to _not_ create the clustered database at the 
target cluster until _after_ you've copied the shard files over. You then 
create the '_dbs' doc yourself. (Note that in bigcouch this database was 
called "dbs").

B.

> On 8 Jul 2022, at 09:08, Luca Morandini  wrote:
> 
> On Fri, 8 Jul 2022 at 17:17, Robert Newson  wrote:
>> 
>> Hi,
>> 
>> There's a bug in 3.1.0 that affects you. Namely that the default 5 second 
>> gen_server timeout is used for some requests if ioq bypass is enabled. 
>> Please check if your config has a [ioq.bypass] section and try again without 
>> bypasses for a time.
> 
> Thanks for taking the time to answer me.
> 
> I set all the settings of the [ioq.bypass] section to false, set the
> cluster in maintenance mode, waited a couple minutes, than set
> maintenance to false... but no joy.
> 
> 
>> If you could explain your migration process in more detail perhaps we can 
>> find other explanations. I note that such migrations are better done online 
>> using replication, moving the files around is a bit more challenging.
> 
> I tried replication, but it failed, hence the shard files copy.
> 
> The procedure I followed (a tad simplified):
> - set the source cluster in maintenance mode;
> - copied the shard files to a shared disk;
> - created a database with the same name on the target cluster;
> - changed the database id on the copied shard files to match the
> newly-created one on the target cluster;
> - set the target cluster to maintenance mode;
> - copied the shard files from the shared disk to the target cluster
> data directories, making sure to get the shard directories right;
> - unset the maintenance mode on the target cluster.
> 
> The procedure above worked for a few databases (including one that
> -with replicas- was 6GB) but failed with the 200GB database.
> 
> Cheers,
> 
> Luca Morandini



Re: Shards cannot be read after move to a different cluster

2022-07-08 Thread Robert Newson
Hi,

There's a bug in 3.1.0 that affects you: the default 5 second gen_server 
timeout is used for some requests if ioq bypass is enabled. Please 
check if your config has a [ioq.bypass] section and try again without bypasses 
for a time.

If you could explain your migration process in more detail perhaps we can find 
other explanations. I note that such migrations are better done online using 
replication, moving the files around is a bit more challenging.

B.

> On 7 Jul 2022, at 08:28, Luca Morandini  wrote:
> 
> Dear All,
> 
> I moved some CouchDB 3.1.0 databases to a new 4-node cluster via
> copying the shard files.
> 
> The operation worked for 5 out of 6 databases; the biggest database
> (about 200GB, 12 shards, 2 replicas) did not come online on the new
> cluster.
> 
> I suspect high disk latency, but... could someone shed some light on this?
> 
> The relevant logs are:
> 
> [info] 2022-07-06T04:30:44.697901Z couchdb@10.0.0.80
> \u003c0.228.0\u003e  db
> shards/9553-aaa7/twitter.1657067184 died with reason
> {timeout,{gen_server,call,[\u003c0.26790.5\u003e,find_header]}}
> [error] 2022-07-06T04:30:44.698269Z couchdb@10.0.0.80
> \u003c0.26789.5\u003e  CRASH REPORT Process
> (\u003c0.26789.5\u003e) with 2 neighbors exited with reason:
> {timeout,{gen_server,call,[\u003c0.26790.5\u003e,find_header]}} at
> gen_server:call/2(line:206) \u003c= couch_file:read_header/1(line:378)
> \u003c= couch_bt_engine:init/2(line:157) \u003c=
> couch_db_engine:init/3(line:775) \u003c=
> couch_db_updater:init/1(line:43) \u003c=
> proc_lib:init_p_do_apply/3(line:247); initial_call:
> {couch_db_updater,init,['Argument__1']}, ancestors:
> [\u003c0.26784.5\u003e], message_queue_len: 0, messages: [], links:
> [\u003c0.26784.5\u003e,\u003c0.26790.5\u003e], dictionary:
> [{io_priority,{db_update,\u003c\u003c\"shards/9553-aaa7/twitter.16570671...\"\u003e\u003e}},...],
> trap_exit: false, status: running, heap_size: 610, stack_size: 27,
> reductions: 250
> [error] 2022-07-06T04:56:10.077664Z couchdb@10.0.0.80
> \u003c0.6591.6\u003e  CRASH REPORT Process
> (\u003c0.6591.6\u003e) with 2 neighbors exited with reason:
> {timeout,{gen_server,call,[\u003c0.6593.6\u003e,find_header]}} at
> gen_server:call/2(line:206) \u003c= couch_file:read_header/1(line:378)
> \u003c= couch_bt_engine:init/2(line:157) \u003c=
> couch_db_engine:init/3(line:775) \u003c=
> couch_db_updater:init/1(line:43) \u003c=
> proc_lib:init_p_do_apply/3(line:247); initial_call:
> {couch_db_updater,init,['Argument__1']}, ancestors:
> [\u003c0.6584.6\u003e], message_queue_len: 0, messages: [], links:
> [\u003c0.6584.6\u003e,\u003c0.6593.6\u003e], dictionary:
> [{io_priority,{db_update,\u003c\u003c\"shards/9553-aaa7/twitter.16570671...\"\u003e\u003e}},...],
> trap_exit: false, status: running, heap_size: 610, stack_size: 27,
> reductions: 250
> [info] 2022-07-06T04:56:10.077711Z couchdb@10.0.0.80
> \u003c0.228.0\u003e  db
> shards/9553-aaa7/twitter.1657067184 died with reason
> {timeout,{gen_server,call,[\u003c0.6593.6\u003e,find_header]}}
> [info] 2022-07-07T06:44:13.863950Z couchdb@10.0.0.80
> \u003c0.228.0\u003e  db
> shards/9553-aaa7/twitter.1657067184 died with reason
> {timeout,{gen_server,call,[\u003c0.9139.29\u003e,find_header]}}
> [error] 2022-07-07T06:44:13.864516Z couchdb@10.0.0.80
> \u003c0.9152.29\u003e  CRASH REPORT Process
> (\u003c0.9152.29\u003e) with 2 neighbors exited with reason:
> {timeout,{gen_server,call,[\u003c0.9139.29\u003e,find_header]}} at
> gen_server:call/2(line:206) \u003c= couch_file:read_header/1(line:378)
> \u003c= couch_bt_engine:init/2(line:157) \u003c=
> couch_db_engine:init/3(line:775) \u003c=
> couch_db_updater:init/1(line:43) \u003c=
> proc_lib:init_p_do_apply/3(line:247); initial_call:
> {couch_db_updater,init,['Argument__1']}, ancestors:
> [\u003c0.9136.29\u003e], message_queue_len: 0, messages: [], links:
> [\u003c0.9136.29\u003e,\u003c0.9139.29\u003e], dictionary:
> [{io_priority,{db_update,\u003c\u003c\"shards/9553-aaa7/twitter.16570671...\"\u003e\u003e}},...],
> trap_exit: false, status: running, heap_size: 610, stack_size: 27,
> reductions: 250
> 
> Cheers,
> 
> Luca Morandini



Re: Issue with search

2022-05-09 Thread Robert Newson
Hi Rick,

I think the explanation is straightforward given your last comment. Indexes 
are not replicated; they are only built locally. So that original error is 
likely a timeout while waiting for the index to build.

B.

> On 9 May 2022, at 21:16, Rick Jarvis  wrote:
> 
> It would appear it is actually speed. For some reason, search is a lot slower 
> on a rebuilt vm with the same specs.
> 
> Is there any control over indexing / verifying it’s happening ok? I’m not at 
> all sure how the search functionality works, if I’m honest...
> 
> -- 
> Rick Jarvis
> 
> On 9 May 2022 at 17:40:23, Rick Jarvis (r...@magicmail.mooo.com) wrote:
> 
> Apologies if this is a duplicate, I think I might have used the wrong email 
> address:
> 
> I’ve migrated couchdb over to a new server, using replication. Latest couchdb 
> running on Debian 11.
> 
> Everything is working except ‘find’ (using Nano, from a NodeJs app).
> 
> It’s a long time since I set the search tree up, but essentially it looks 
> like the below (if of interest) - I can’t really remember how it works!
> 
> Is there anything I need to do since moving to the new server to make it 
> work? I can’t see any obvious errors in the couchdb logs. Client times out, 
> and I get this in the node logs (I think this is coming from the error, but 
> it’s production, so difficult to pinpoint atm):
> 
> 0|main  | Error: error happened in your connection
> 0|main  | at responseHandler 
> (/var/app/node_modules/nano/lib/nano.js:120:16)
> 0|main  | at axios.then.catch 
> (/var/app/node_modules/nano/lib/nano.js:405:13)
> 0|main  | at process._tickCallback (internal/process/next_tick.js:68:7)
> 
> var subquery = {  // q is search term
>selector: {
>$and: [
>{
>$or: [{ type: { $eq: 'case' } }, { type: { $eq: 'user' } 
> }, { type: { $eq: 'property' } }],
>},
>{
>$or: [
>{ email: { $regex: q } },
>{ firstname: { $regex: q } },
>{ lastname: { $regex: q } },
>{ p_forenames: { $regex: q } },
>{ p_surname: { $regex: q } },
>{ p_forenames2: { $regex: q } },
>{ p_surname2: { $regex: q } },
>{ n_company_name: { $regex: q } },
>{ n_property_address: { $regex: q } },
>{ n_property_address_postcode: { $regex: q } },
>{ c_provider: { $regex: q } },
>{ c_policy_number: { $regex: q } },
>{ c_product_code: { $regex: q } },
>],
>},
>],
>},
>fields: [
>'type',
>'_id',
>'firstname',
>'lastname',
>'p_forenames',
>'p_surname',
>'p_forenames2',
>'p_surname2',
>'email',
>'role',
>'clientid',
>'adviser',
>'p_joint',
>'n_property_address',
>'n_property_address_postcode',
>'c_provider',
>'c_policy_number',
>'c_product_code',
>'mb_deleted',
>'n_company_name',
>'created',
>],
>limit: 50,
>};
> 
>couch.find(subquery, function(err, data) { ... })
> 
> -- 
> Rick Jarvis



Re: Ubuntu update fails

2021-10-01 Thread Robert Newson
Hi,

Bintray went offline a while ago. Our official instructions and docs were 
updated ahead of that to point to the new location for our binary artefacts.

At https://docs.couchdb.org/en/stable/install/index.html check the 
"Installation using the Apache CouchDB convenience binary packages" section; 
we'll keep that up to date should it ever change again.


B.

> On 30 Sep 2021, at 23:27, Bill Stephenson  wrote:
> 
> I'm trying to update my DigitalOcean VPS that has my CouchDB running on it 
> and I get this error message:
> 
> sudo apt-get update
> 
>   Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease
>   [109 kB]
>   Hit:2 http://mirrors.digitalocean.com/ubuntu xenial InRelease
>   Hit:3 http://mirrors.digitalocean.com/ubuntu xenial-updates InRelease
>   Hit:4 http://mirrors.digitalocean.com/ubuntu xenial-backports InRelease
>   Hit:5 http://ppa.launchpad.net/certbot/certbot/ubuntu xenial InRelease
>   Hit:6 http://ppa.launchpad.net/ondrej/apache2/ubuntu xenial InRelease
>   Ign:7 https://apache.bintray.com/couchdb-deb xenial InRelease
>   Ign:8 https://apache.bintray.com/couchdb-deb bionic InRelease
>   Ign:9 https://apache.bintray.com/couchdb-deb xenial Release
>   Ign:10 https://apache.bintray.com/couchdb-deb bionic Release
> 
>   ...
> 
>   Err:19 https://apache.bintray.com/couchdb-deb xenial/main amd64 Packages
>  403  Forbidden
>   Err:20 https://apache.bintray.com/couchdb-deb bionic/main amd64 Packages
>  403  Forbidden
>   Fetched 109 kB in 4s (22.6 kB/s)
>   Reading package lists... Done
>   W: The repository 'https://apache.bintray.com/couchdb-deb xenial
>   Release' does not have a Release file.
>   N: Data from such a repository can't be authenticated and is
>   therefore potentially dangerous to use.
>   N: See apt-secure(8) manpage for repository creation and user
>   configuration details.
>   W: The repository 'https://apache.bintray.com/couchdb-deb bionic
>   Release' does not have a Release file.
>   N: Data from such a repository can't be authenticated and is
>   therefore potentially dangerous to use.
>   N: See apt-secure(8) manpage for repository creation and user
>   configuration details.
>   E: Failed to fetch
>   
> https://apache.bintray.com/couchdb-deb/dists/xenial/main/binary-amd64/Packages
>   403  Forbidden
>   E: Failed to fetch
>   
> https://apache.bintray.com/couchdb-deb/dists/bionic/main/binary-amd64/Packages
>   403  Forbidden
>   E: Some index files failed to download. They have been ignored, or
>   old ones used instead.
> 
> I've never had an issue doing this before. Any ideas on where to go from here?
> 
> 
> -- 
> 
> Kindest Regards,
> 
> Bill Stephenson
> Tech Support
> www.ezInvoice.com
> 1-417-546-8390
> 



Re: attachments timestamp

2021-09-08 Thread Robert Newson
You can add a timestamp as a field in your document when you write it, or add it 
in an _update handler (though note that update handlers are deprecated and slow).
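
A minimal sketch of such a handler (illustrative field and handler names), called 
as PUT /db/_design/ddoc/_update/stamp/docid around the time the attachment is 
uploaded:

  function (doc, req) {
    if (!doc) {
      return [null, {code: 404, json: {error: 'not_found'}}];
    }
    // record a server-side timestamp on the document itself
    doc.attachment_uploaded_at = new Date().toISOString();
    return [doc, {json: {ok: true, id: doc._id}}];
  }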

B.

> On 8 Sep 2021, at 21:34, Sultan Dadakhanov  wrote:
> 
> Thanks. May be is it possible to extend/update the _attachments object and
> add a custom field like timestamp?
> 
> On Wed, Sep 8, 2021 at 11:28 PM Robert Newson  wrote:
> 
>> Hi,
>> 
>> Unfortunately, no. CouchDB only stores what you put in and does not add
>> supplemental data like a timestamp. If you have the couchdb log you might
>> find a record of the original PUT request, though.
>> 
>> B.
>> 
>>> On 8 Sep 2021, at 20:42, Sultan Dadakhanov  wrote:
>>> 
>>> Googled but unsuccessfully
>>> 
>>> Is it possible to know the timestamp of attachments? When exactly they
>>> uploaded to document?
>> 
>> 



Re: attachments timestamp

2021-09-08 Thread Robert Newson
Hi,

Unfortunately, no. CouchDB only stores what you put in and does not add 
supplemental data like a timestamp. If you have the couchdb log you might find 
a record of the original PUT request, though.

B.

> On 8 Sep 2021, at 20:42, Sultan Dadakhanov  wrote:
> 
> Googled but unsuccessfully
> 
> Is it possible to know the timestamp of attachments? When exactly they
> uploaded to document?



Re: Setting up smoosh for database compaction

2021-08-19 Thread Robert Newson
Hi Paul,


I think that's reasonable, though do note that compaction is also for 
performance, even if you never update or delete a document, as couchdb defers 
rebalancing the b+tree disk structures until compaction time (i.e., couchdb isn't 
adhering strictly to the b+tree algorithm from the literature).

Left uncompacted, lookup/insert performance will drop from roughly O(log n) 
to O(n) over time (though only as a consequence of writing documents).
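
(If you do go manual, a compaction cycle can be triggered per database and per 
view group by an admin, e.g.

  POST /dbname/_compact
  POST /dbname/_compact/designdocname

both with a Content-Type: application/json header.)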

None of what I’ve said will apply in CouchDB 4.0 (compaction no longer required 
there)

B (short for Bob)

> On 19 Aug 2021, at 11:32, Paul Milner  wrote:
> 
> Hi B (?? ;-) )
> 
> I have a log database that could encounter high frequency updates and
> deletes. It's not required to be read by multiple users, but will be
> updated by all users. So rather than compacting it, which at certain
> frequencies of updates could lead to possible race conditions (thinking of
> extremes), I was going to do the following steps:
> 
> 1) Switch the active log to a new database
> 2) Copy the old database without orphans/history to the new database
> 3) delete the old database
> 
> I would toggle databases as needed.
> 
> Best regards
> Paul
> 
> On Thu, 19 Aug 2021 at 10:24, Robert Newson  wrote:
> 
>> Hi Paul,
>> 
>> We welcome feedback on why the automatic compaction system (in its default
>> configuration or custom) is not appropriate for you.
>> 
>> B.
>> 
>>> On 19 Aug 2021, at 05:29, Paul Milner  wrote:
>>> 
>>> Hi Adam
>>> 
>>> Thanks for the feedback. I was actually struggling with which options to
>> set per channel and what to set them to. Anyway after more thought, I’ve
>> decided on a manual approach as I need it to be more custom than automatic.
>>> 
>>> But thanks again
>>> I appreciate it.
>>> 
>>> Best regards
>>> Paul
>>> 
>>> Sent from my iPad
>>> 
>>>> On 18 Aug 2021, at 20:01, Adam Kocoloski  wrote:
>>>> 
>>>> Hi Paul, sorry to hear you’re finding it a challenge to configure. The
>> default configuration described in the documentation does give you an
>> example of how things are set up:
>>>> 
>>>> 
>> https://docs.couchdb.org/en/3.1.1/maintenance/compaction.html#channel-configuration
>>>> 
>>>> Cross-referenced from that section you can find the full configuration
>> reference that describes all the supported configuration keys at the
>> channel level:
>>>> 
>>>> 
>> https://docs.couchdb.org/en/3.1.1/config/compaction.html#config-compactions
>>>> 
>>>> The general idea is that you create [smoosh.]
>> configuration blocks with whatever settings you deem appropriate to match a
>> certain set of files and prioritize them, and then use the [smoosh] block
>> to activate those channels.
>>>> 
>>>> Can you say a little more about what you’re finding lacking in the
>> docs? Cheers,
>>>> 
>>>> Adam
>>>> 
>>>>> On Aug 18, 2021, at 2:58 AM, Paul Milner 
>> wrote:
>>>>> 
>>>>> Hello
>>>>> 
>>>>> I'm looking at the maintenance of my databases and how I could
>> implement
>>>>> tools to do that. Smoosh seems to be the main option, but I'm
>> struggling to
>>>>> set it up as the documentation seems a bit limited.
>>>>> 
>>>>> I have only really found this:
>>>>> 
>>>>> 5.1. Compaction — Apache CouchDB® 3.1 Documentation
>>>>> <
>> https://docs.couchdb.com/en/3.1.1/maintenance/compaction.html#database-compaction
>>> 
>>>>> 
>>>>> I could do it manually but wanted to explore this first and was
>> wondering
>>>>> if there are any smoosh examples about, that could help me on my way?
>>>>> 
>>>>> If anyone could point me in the right direction please, I would
>> appreciate
>>>>> it.
>>>>> 
>>>>> Thanks a lot
>>>>> Best regards
>>>>> Paul
>>>> 
>> 
>> 



Re: Setting up smoosh for database compaction

2021-08-19 Thread Robert Newson
Hi Paul,

We welcome feedback on why the automatic compaction system (in its default 
configuration or custom) is not appropriate for you.

B.

> On 19 Aug 2021, at 05:29, Paul Milner  wrote:
> 
> Hi Adam
> 
> Thanks for the feedback. I was actually struggling with which options to set 
> per channel and what to set them to. Anyway after more thought, I’ve decided 
> on a manual approach as I need it to be more custom than automatic. 
> 
> But thanks again
> I appreciate it. 
> 
> Best regards
> Paul 
> 
> Sent from my iPad
> 
>> On 18 Aug 2021, at 20:01, Adam Kocoloski  wrote:
>> 
>> Hi Paul, sorry to hear you’re finding it a challenge to configure. The 
>> default configuration described in the documentation does give you an 
>> example of how things are set up:
>> 
>> https://docs.couchdb.org/en/3.1.1/maintenance/compaction.html#channel-configuration
>> 
>> Cross-referenced from that section you can find the full configuration 
>> reference that describes all the supported configuration keys at the channel 
>> level:
>> 
>> https://docs.couchdb.org/en/3.1.1/config/compaction.html#config-compactions
>> 
>> The general idea is that you create [smoosh.] configuration 
>> blocks with whatever settings you deem appropriate to match a certain set of 
>> files and prioritize them, and then use the [smoosh] block to activate those 
>> channels.
>> 
>> Can you say a little more about what you’re finding lacking in the docs? 
>> Cheers,
>> 
>> Adam
>> 
>>> On Aug 18, 2021, at 2:58 AM, Paul Milner  wrote:
>>> 
>>> Hello
>>> 
>>> I'm looking at the maintenance of my databases and how I could implement
>>> tools to do that. Smoosh seems to be the main option, but I'm struggling to
>>> set it up as the documentation seems a bit limited.
>>> 
>>> I have only really found this:
>>> 
>>> 5.1. Compaction — Apache CouchDB® 3.1 Documentation
>>> 
>>> 
>>> I could do it manually but wanted to explore this first and was wondering
>>> if there are any smoosh examples about, that could help me on my way?
>>> 
>>> If anyone could point me in the right direction please, I would appreciate
>>> it.
>>> 
>>> Thanks a lot
>>> Best regards
>>> Paul
>> 



Re: CouchDB and RabbitMQ clusters

2021-07-15 Thread Robert Newson
Just agreeing with all previous responses but would add that it might make 
sense in your setup to put epmd under direct management (runit, systemd, etc.) 
and arrange for it to start before either service. And another note: if 
epmd _crashes_ then existing nodes do not re-register (and that’s as fun as it 
sounds). Epmd is very reliable but I have encountered it crashing under heavy 
registration load (caused by a faulting application).

B.

> On 15 Jul 2021, at 17:53, Joan Touzet  wrote:
> 
> I have seen it work this way myself with these two applications. Whoever
> starts first starts epmd, and the second sees epmd running on startup
> and simply connects to that.
> 
> -Joan
> 
> On 15/07/2021 10:10, Adam Kocoloski wrote:
>> That’s typically how it works for a well-behaved Erlang application, yes. 
>> CouchDB does work this way; I’m not 100% certain about RabbitMQ but it 
>> probably does as well. Cheers,
>> 
>> Adam
>> 
>>> On Jul 15, 2021, at 5:11 AM, Andrea Brancatelli 
>>>  wrote:
>>> 
>>> Hello everybody, 
>>> 
>>> I have a general Erlang question but I think you could help me with
>>> that... 
>>> 
>>> I need to run CouchDB and RabbitMQ on the same set of (three) nodes, all
>>> clustered together. 
>>> 
>>> What happens with epmd? Erlang's documentation
>>> (https://erlang.org/doc/man/epmd.html) is pretty vague: "The daemon is
>>> started automatically by command erl(1) [1] if the node is to be
>>> distributed and no running instance is present."... 
>>> 
>>> So what happens? The first one between Couch and Rabbit who starts opens
>>> epmd and the second one just hooks to the already running copy?
>>> 
>>> Thanks. 
>>> 
>>> -- 
>>> 
>>> Andrea Brancatelli
>>> 
>>> 
>>> 
>>> Links:
>>> --
>>> [1] https://erlang.org/doc/man/erl.html
>> 



Re: Proxying document updates with update handlers

2021-05-28 Thread Robert Newson
Hi,

It’s worth remembering that the reason the new _rev is not available in your 
_update handler is because the database update happens afterward, and thus the 
value is not known. Indeed, it is not known if the update even succeeded (or 
failed because couchdb crashed, or there was a validate_doc_update error or 
you’d get a 409).

An _update handler is just a way to move some logic to the server side; it isn't 
being processed inline with the database update itself.

Note also that the presence of an update handler doesn't force all clients to 
use it; you'd need to enforce that some other way.

Jan is (of course) correct that this kind of thing is better done at the client 
end.

B.

> On 28 May 2021, at 07:50, Jan Lehnardt  wrote:
> 
> Hi Aurélien,
> 
> we generally recommend doing this kind of stuff outside of CouchDB,
> these days.
> 
> A Node.js proxy that does this completely and reliably is maybe 50
> lines of code and not a huge operational overhead, while granted not
> as neat as doing all this inside of CouchDB.
> 
> As for getting the new _rev in the function: the rev will be generated
> from the result that the function returns, so there is no way to get
> that other than calculating it yourself (it is deterministic), but
> that requires knowledge of erlang term encoding and such things. I’ve
> done it in JS (for other things), but it is not pretty.
> 
> Best
> Jan
> — 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 
> 24/7 Observation for your CouchDB Instances:
> https://opservatory.app
> 
>> On 27. May 2021, at 22:08, Aurélien Bénel  wrote:
>> 
>> Dear all,
>> 
>> I have known update handlers for quite long but I never used them "for real" 
>> myself... My current idea, which must be very common, is to proxy updates of 
>> whole documents in order to add some accountability of who contributed to 
>> the document and when.
>> 
>>   # rewrites.json
>>   [{
>>  "from": "",
>>  "to": "elsewhere",
>>  "method": "GET"
>>}, {
>> "from": "",
>> "to": "_update/accounting"
>>   }, {
>>  "from": ":object",
>> "to": "../../:object",
>>  "method": "GET"
>>   }, {
>>  "from": ":object",
>>  "to": "_update/accounting/:object"
>>   }]
>> 
>>   # accounting.js
>>   function(doc, req) {
>> var o = JSON.parse(req.body);
>> o._id = o._id || req.id || req.uuid;
>> var h = doc && doc.history || [];
>> h.push({
>>   user: req.userCtx.name,
>>   timestamp: new Date()
>> });
>> o.history = h;
>> return [o, {json: {ok: true, id: o._id }}];
>>   }
>> 
>> Tested on CouchDB 2.3.1, it *nearly* emulates the direct update of a 
>> document and adds contributions accounting, however I face two problems ;
>> 
>> 1. In the update handler, I see no way to get the new `_rev` value  (which 
>> should be returned either in the JSON body or as an ETag for compatibility 
>> with normal update of an object). Is there a secret builtin function that 
>> could be used to get (or set) this? Or is it set afterwards and then cannot 
>> be get or set at this stage of the process?
>> 
>> 2. In the update handler, when used with POST (with the `_id` in the body 
>> but not in the URI),  it seems that `doc` is always null (even when the ID 
>> refers to an existing document)... Is this behaviour intended? I feel that 
>> the documentation could be interpreted both ways... 
>> Of course, we can still use PUT. But I wanted a complete emulation of normal 
>> updates (with both methods)...
>> 
>> Any help or insight would be appreciated.
>> 
>> 
>> Regards,
>> 
>> Aurélien
>> 
>> 
> 



Re: Compatibility of proxy authentication in CouchDB ecosystem

2021-05-26 Thread Robert Newson
Hi,

I can confirm that Cloudant does not enable the proxy authentication handler 
nor supports externalising authentication/authorization decisions in any other 
way. Use either IBM IAM or the CouchDB _users database within your account 
(note that the _users database option is not available for Transaction Engine 
instances).

B.

> On 26 May 2021, at 08:38, Aurélien Bénel  wrote:
> 
> Dear all,
> 
> I'm totally aware that this list is dedicated to Apache CouchDB and not to 
> IBM Cloudant, but please consider my question as related to Apache CouchDB 
> compatibility with its ecosystem.
> 
> As stated by Apache CouchDB documentation: 
> 
> "Proxy authentication is very useful in case your application already uses 
> some external authentication service and you don’t want to duplicate users 
> and their roles in CouchDB."
> Source: 
> https://docs.couchdb.org/en/latest/api/server/authn.html#proxy-authentication
> 
> Hence a reverse proxy can authenticate a user and send the username (as 
> `X-Auth-CouchDB-UserName` HTTP header) along with a token (as 
> `X-Auth-CouchDB-Token`) generated from this username and a general shared 
> secret (not related with this particular user). 
> 
> As stated: 
> 
> This authentication method allows creation of a User Context Object for 
> remotely authenticated user. 
> 
> This user context can be useful in a `validate_doc_update` function (for 
> authorizations) or in an `update` function (for accounting).
> 
> Among third party CouchDB hosts, IBM cloudant is one of the most famous. 
> However the product is slightly different, especially concerning security 
> (see: 
> https://cloud.ibm.com/docs/Cloudant?topic=Cloudant-couchdb-and-cloudant). 
> 
> Whereas I am familiar with proxy authentication in Apache CouchDB, I didn't 
> manage to setup a similar feature in Cloudant nor to find documentation about 
> it. In particular, IBM "API keys" (composed by a key and a password) don't 
> seem to be compatible with setting a different username as 
> `X-Auth-CouchDB-UserName`. 
> 
> Does anyone succeeded in setting up (in Cloudant) a user context different 
> from the credentials used for authentication? Or is there a doc anywhere 
> saying that it is not possible?
> 
> Or, is there a similar hosted service (esp. with free tier) but with proxy 
> authentication enabled?
> 
> 
> Best regards,
> 
> Aurélien
> 
> P.S. My question was asked also on StackOverflow: 
> https://stackoverflow.com/questions/67537968



Re: [Eventual consitency] Is the uniqness of the _id guaranteed across nodes within a cluster

2020-12-13 Thread Robert Newson


_id is indeed unique across the nodes of the cluster but that isn't helpful to 
your cause, because a document can have multiple, equally valid versions 
(called "revisions" in couchdb terms).

In CouchDB 2.x and 3.x, and with a default "N" value of three, each of the 
three nodes will accept a write independently of the others. There is an 
anti-entropy system that ensures, in the absence of error, all writes at any 
one of those nodes will reach all of the others, introducing a "conflict" 
revision if necessary.

What this means is if you had two writers trying to create "slot1:booked", they 
might each succeed in updating at least one of the three nodes. Once both 
writes have reached all three nodes, any subsequent read will see only one of 
those writes (the so-called "winner"). The other write is retrievable with 
extra options (?conflicts=true and others).
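
For example, a read like

  GET /dbname/slot1:booked?conflicts=true

returns the winning revision along with a _conflicts array listing the other 
revision ids.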

So, you could not build a mutual-exclusion lock or auto-incrementing counter 
using a single document.

We are building a new version of CouchDB (to be designated 4.0 when it's 
finished) that fundamentally shifts from an availability focus to a 
consistency focus (AP -> CP). In that version you could attempt concurrent 
updates to the same document and be sure that only one write will succeed and 
the others definitively and permanently fail.

B.

> On 13 Dec 2020, at 10:26, Jan Lehnardt  wrote:
> 
> Hi Olaf,
> 
>> On 13. Dec 2020, at 11:13, Olaf Krueger  wrote:
>> 
>> Hi again,
>> 
>> we're working on a booking app.
> In order to prevent over booking of a particular slot (a slot can be booked 
>> only once), it's crucial to know if a slot is already booked or even not.
>> 
>> We're using a cluster of 3 CouchDB nodes, so having the eventual consistency 
>> issue in mind the question is, if it's possible to basically achieve the 
>> above requirement.
>> 
>> The idea is to create an own document for each booking by using a custom 
>> _id, e.g.
>> {
>>  _id: = "slot1:booked 
>> }
>> 
> But that would probably only work if CouchDB guarantees the uniqueness across 
>> all nodes.
>> Is that the case?
>> Or do we have to accept that consistency is sacrificed in favour of 
>> availability?
> 
> Yes.
> 
>> Or do we need to think about using another DB which sacrifies availability 
>> in favor of consistency?
> 
> Unless you can resolve a double-booking after the fact, yes. A memcached or 
> redis instance is often used in conjunction with CouchDB, but other 
> unique-id-register mechanisms exist.
> 
> Best
> Jan
> —
>> 
>> Many thanks!
>> Olaf
>> 
>> 
> 



Re: Back to "Admin Party"

2020-05-04 Thread Robert Newson
From 3.0 onward couchdb won't even start unless there's at least one admin 
configured.
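
For context, a sketch of the minimal admin setup that satisfies this check, mirroring the ini snippets elsewhere in this thread; the account name and password are placeholders, and CouchDB hashes the plaintext value on first start:

# in local.ini
[admins]
admin = s3cretpassword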

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 4 May 2020, at 22:20, Bill Stephenson wrote:
> Thank you Joan!
> 
> It took me some time to figure out where those CouchDB config files are 
> on my Mac but I did get it back to “Admin Party” and got some testing 
> done.
> 
> What I want to play with is configuring my web apps to use a locally 
> installed CouchDB instead of my internet based CouchDB server. I think 
> this can be a pretty cool option for end users. All their data is in 
> their hands and control and they have options to backup their data 
> onsite and offsite, and using 3rd party services. My initial tests show 
> my app to be a lot faster, which should be expected because it’s not 
> running over the internet at all. 
> 
> I’m not thinking using “Admin Party” is the best option, but it may be 
> the easiest for end users so I wanted to look at how that would work.  
> 
> With PouchDB.js I can create a DB for the user, and a “_user” file, and 
> some files in their DB that my apps use. I’ve not figured out how to 
> create a “_security” file for a user’s DB with PouchDB yet, but I’m not 
> sure I can’t. 
> 
> I know there are some issues using this approach. For example, before I 
> got my CouchDB back to "Admin Party” I had to create a form field that 
> contains the user’s CouchDB admin credentials. That’s not good. And 
> with "Admin Party” a web app could potentially grab all a user’s data. 
> 
>  And there are surely some issues I don’t know of yet, but it’s a 
> pretty cool way to approach that component of developing and using web 
> apps if it can be secure and I’d like to work with it. 
> 
> Any thoughts on this?
> 
> —Bill
> 
> 
> > On May 3, 2020, at 2:51 PM, Joan Touzet  wrote:
> > 
> > Remove all admin users defined in [admins].
> > 
> > On 2020-05-03 3:23 p.m., Bill Stephenson wrote:
> >> No, I want to play with it a bit to see how I can use it in Admin Party 
> >> mode (as if I just installed it).
> >> —Bill
> >>> On May 3, 2020, at 2:19 PM, Daniel Holth  wrote:
> >>> 
> >>> Do you want to find the local.ini and put a new password in? It'll encode
> >>> the plaintext password typed into the correct .ini field
> >>> 
> >>> On Sun, May 3, 2020, 3:18 PM Bill Stephenson 
> >>> wrote:
> >>> 
>  Is there way to reset my CouchDB (2.3.0) to "Admin Party" that I 
>  installed
>  on my Mac Mini?
>  
>  
>  -Bill
>  
>  
>  
> 
>


Re: Disable Compaction for a single database

2020-04-28 Thread Robert Newson
Noting that a) replication only replicates the latest revision, not the older ones, 
and b) compaction is not optional; you are strongly advised not to go this way. 

--
 Robert Samuel Newson
 rnew...@apache.org


On Tue, 28 Apr 2020, at 11:46, Garren Smith wrote:
> I think it would be better to create a daily or hourly snapshot of your 
> database instead of relying on a database that doesn't run compaction. 
> Depending on the versioning history of a CouchDB database is a really bad 
> idea.
> As Bob said, rather create new docs than one document with lots of revisions. 
> PouchDB is slow to replicate documents with lots of revisions versus lots of 
> new documents. 
> 
> Cheers
> Garren
> 
> 
> 
> On Tue, Apr 28, 2020 at 9:06 AM Andrea Brancatelli 
>  wrote:
>> Hello Robert, 
>> 
>>  I see your point and mostly understand it. The plan was not to "use"
>>  this secondary database as an active one, but as a passively replicated
>>  database from a main instance, so performances of the secondary database
>>  weren't a big priority - the idea is to keep the whole "journal" of the
>>  main database. 
>> 
>>  We thought of having multiple copies of the documents as well, but the
>>  "client" is a React/Pouch application and that would become a pita. 
>> 
>>  My plan was to have a main database with a very aggressive compaction
>>  rule, so that pouch replication would be as fast as possibile and the
>>  local storage be as little as possible (also because pouch isn't blazing
>>  fast with local views and indexes when you have a lot of documents) and
>>  a secondary replicated database with a more relaxed compaction rule (as
>>  I was saying maybe disabled at all) to run backups on or to do
>>  post-mortem analysis of any problem that may rise on business logic. 
>> 
>>  ---
>> 
>>  Andrea Brancatelli
>> 
>>  On 2020-04-27 20:34, Robert Samuel Newson wrote:
>> 
>>  > Hi,
>>  > 
>>  > This is the most common mistake made with CouchDB, that it provides (or 
>> could provide) a full history of document changes.
>>  > 
>>  > Compaction is essential, it's the only time that the b+tree's are 
>> rebalanced and obsolete version of b+tree
>>  > nodes are removed from disk.
>>  > 
>>  > If the old revisions of your documents really matter, make new documents 
>> instead of updating them, and use some scheme of your choice to group them 
>> (you could use a view on some property common to all revisions of
>>  > the same logical document).
>>  > 
>>  > B.
>>  > 
>>  >> On 27 Apr 2020, at 17:10, Andrea Brancatelli 
>>  wrote:
>>  >> 
>>  >> Let's say I'd like to keep the whole revision history for documents in a
>>  >> specific database (but maybe drop old views, if it's possible). 
>>  >> 
>>  >> What compaction setting would do that overriding the more-reasonable
>>  >> default we usually have?
>>  >> 
>>  >> -- 
>>  >> 
>>  >> Andrea Brancatelli

Re: REST API stops after some calls

2020-01-16 Thread Robert Newson
It’s not clear what you’re reporting here. 

Do you get a response or not?
If you do, please show it. 
If not, check couch.log for output from that time and show that. 

> On 16 Jan 2020, at 14:09, Betto McRose [icarus]  wrote:
> 
> Hi all
> I got this issue I can't figure out what I'm missing
> I have tested in two applications: one with java-ee 8, the other in quarkus
> both call the CouchDB REST API for post() and _bulk operations
> 
> with java-ee I made some tricks because I need to create a document with
> every entity, from a mmsql record and went fine
> 
> but with quarkus app, I need to:
> -retrieve 100 non-migrated documents (could be more)
> -create 100 entities and persist in mmsql --no problems till here--
> -updates those 100 documents with 'migrated' back to CouchDB
> first with one-by-one, now in a bulk operation
> ---still no problems till here--
> 
> the problem appears when (and depends on ???, still don't know)
> doing the query for the 'next' 100 or calling the bulk insert/update operation
> 
> doing one by one occurs after 50 inserts/update
> or calling the _bulk, occurs after 20 times
> 
> calls the REST API and stops there
> 
> CouchDB is installed with default values
> on a Windows 10 PRO i7
> with 16GB RAM
> and 500GB SSD
> 
> 
> 
> 
> -- 
> 
> 
> [icarus]
> https://about.me/mcrose
> 
> Betto McRose
> Java/JavaEE Developer



Re: Expected behavior on conflict between PouchDB and CouchDB

2019-12-14 Thread Robert Newson
The algorithm for choosing the winner is a little complicated and not really 
helpful to you here, though I'll describe it at the end anyway.

What matters is that couchdb and pouchdb retain both versions of your document. 
Your application semantics determine what should happen once a document has 
more than one edit branch. You can fetch the doc with ?conflicts=true to 
determine if there are alternatives, and then decide whether to merge those 
revisions, or delete the "winner" to promote the "loser", etc. There's lots of 
articles about conflict resolution and our docs site also talks about this.

The winner is chosen roughly as follows;

1) The longest edit branch wins
2) Edit branches that end in a non-deleted revision win over ones that end with 
a deleted revision
3) If there are multiple branches after considering 1) and 2) the _rev values 
are sorted and the one on the end is chosen.
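
A hedged sketch of resolving a conflict by hand once you have found one (database, document and revisions are placeholders):

# see whether the document has conflicts
curl 'http://localhost:5984/mydb/mydoc?conflicts=true'
# either merge and PUT a new revision, or promote the "loser" by deleting the current winner
curl -X DELETE 'http://localhost:5984/mydb/mydoc?rev=2-winningrev'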

> On 13 Dec 2019, at 23:58, Kiril Stankov  wrote:
> 
> Hi, thanks for your reply.
> Yes, looking at the Couch Fauxton side I found the conflicts.
> Can anyone explain how the winner is selected?
> Anyway to disable this feature, so the GET method returns the previous
> revision of the doc, before the conflict happened?
> 
> 
> On 12/12/19 10:18 PM, Robert Newson wrote:
>> Overwrote, are you sure? Was there no other revision available?
>> 
>> What should happen is that both versions of the document will be replicated 
>> to both sides, and one of them (the same one) will be chosen as the 
>> "winner". The other is always available until you delete it. Query with 
>> /dbname/docid?conflicts=true to see if you get a _conflicts member with the 
>> losing revision, then query with /dbname/docid?rev=X where X is the losing 
>> revision to confirm it's your "lost" update.
>> 
>> B.
>> 
>>> On 12 Dec 2019, at 12:03, Kiril Stankov  wrote:
>>> 
>>> Hi all,
>>> 
>>> I have 1 local PouchDB and 1 remote CouchDB (cluster mode).
>>> As I wanted to prepare for conflicts and also monitor for them I did the
>>> following test:
>>> - Pouch and Couch were in sync with few docs in the same DB.
>>> - stopped the connection between Pouch and Couch on network level.
>>> - modified a doc in Pouch
>>> - modified the same doc on Couch
>>> - restored connectivity
>>> - Expected Behavior: conflict
>>> - Observed behavior: Couch version of the doc overwrote the Pouch version.
>>> 
>>> Read the documentation  here:
>>> https://pouchdb.com/guides/replication.html
>>> and here
>>> https://pouchdb.com/guides/conflicts.htm
>>> <https://pouchdb.com/guides/conflicts.html>
>>> 
>>> but it doesn't seem to discuss this case.
>>> 
>>> What is the designed behavior?
>>> Thanks in advance!
>>> 
>>> Kiril.
> 



Re: Expected behavior on conflict between PouchDB and CouchDB

2019-12-12 Thread Robert Newson
Overwrote, are you sure? Was there no other revision available?

What should happen is that both versions of the document will be replicated to 
both sides, and one of them (the same one) will be chosen as the "winner". The 
other is always available until you delete it. Query with 
/dbname/docid?conflicts=true to see if you get a _conflicts member with the 
losing revision, then query with /dbname/docid?rev=X where X is the losing 
revision to confirm it's your "lost" update.

B.

> On 12 Dec 2019, at 12:03, Kiril Stankov  wrote:
> 
> Hi all,
> 
> I have 1 local PouchDB and 1 remote CouchDB (cluster mode).
> As I wanted to prepare for conflicts and also monitor for them I did the
> following test:
>  - Pouch and Couch were in sync with few docs in the same DB.
>  - stopped the connection between Pouch and Couch on network level.
>  - modified a doc in Pouch
>  - modified the same doc on Couch
>  - restored connectivity
>  - Expected Behavior: conflict
>  - Observed behavior: Couch version of the doc overwrote the Pouch version.
> 
> Read the documentation  here:
> https://pouchdb.com/guides/replication.html
> and here
> https://pouchdb.com/guides/conflicts.htm
> 
> 
> but it doesn't seem to discuss this case.
> 
> What is the designed behavior?
> Thanks in advance!
> 
> Kiril.



Re: Disk full

2019-05-02 Thread Robert Newson
Indeed puzzling.

If you delete the database (DELETE /dbname) and if this succeeds (2xx response) 
then all of the db data is deleted fully. If you think you're seeing data 
persisting after deletion you have a problem (the delete is failing, or you're 
not really deleting the db, or something extremely strange is happening).

Another cause of invisible bloat would be failed writes (especially ones with 
attachment data) as we'll write the data as we go but if the write then fails 
that leaves the partial write in the file with nothing pointing back at it. 
Compaction will clean that up, of course.

Compaction is essential in practically all cases. You could maybe get away with 
disabling it if you don't create, update or delete a document but even in that 
case the files will grow on restart (and perhaps when the db is closed and 
reopened?) as we'll append a new database footer.
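
A sketch of kicking off compaction by hand, assuming default ports, admin credentials and a placeholder database name:

curl -u admin:password -X POST http://localhost:5984/mydb/_compact \
     -H 'Content-Type: application/json'
# watch it finish
curl http://localhost:5984/mydb | jq .compact_running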

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 2 May 2019, at 18:02, Adam Kocoloski wrote:
> Hi Willem,
> 
> Good question. CouchDB has a 100% copy-on-write storage engine, 
> including for all updates to btree nodes, etc. so any updates to the 
> database will necessarily increase the file size before compaction. 
> Looking at your info I don’t see a heavy source of updates, so it is a 
> little puzzling.
> 
> Adam
> 
> 
> > On May 2, 2019, at 12:53 PM, Willem Bison  wrote:
> > 
> > Hi Adam,
> > 
> > I ran "POST compact" on the DB mentioned in my post and 'disk_size' went
> > from 729884227 (yes, it had grown that much in 1 hour !?) to 1275480.
> > 
> > Wow.
> > 
> > I disabled compacting because I thought it was useless in our case since
> > the db's and the docs are so small. I do wonder how it is possible for a db
> > to grow so much when its being deleted several times a week. What is all
> > the 'air' ?
> > 
> > On Thu, 2 May 2019 at 18:31, Adam Kocoloski  wrote:
> > 
> >> Hi Willem,
> >> 
> >> Compaction would certainly reduce your storage space. You have such a
> >> small number of documents in these databases that it would be a fast
> >> operation.  Did you try it and run into issues?
> >> 
> >> Changing cluster.q shouldn’t affect the overall storage consumption.
> >> 
> >> Adam
> >> 
> >>> On May 2, 2019, at 12:15 PM, Willem Bison  wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> Our CouchDb 2.3.1 standalone server (AWS Ubuntu 18.04) is using a lot of
> >>> disk space, so much so that it regularly causes a disk full and a crash.
> >>> 
> >>> The server contains approximately 100 databases each with a reported
> >>> (Fauxton) size of less than 2.5Mb and less than 250 docs. Yesterday the
> >>> 'shards' folders combined exceeded a total 14G causing the server to
> >> crash.
> >>> 
> >>> The server is configured with
> >>> cluster.n = 1 and
> >>> cluster.q = 8
> >>> because that was suggested during setup.
> >>> 
> >>> When I write this the 'shards' folders look like this:
> >>> /var/lib/couchdb/shards# du -hs *
> >>> 869M -1fff
> >>> 1.4G 2000-3fff
> >>> 207M 4000-5fff
> >>> 620M 6000-7fff
> >>> 446M 8000-9fff
> >>> 458M a000-bfff
> >>> 400M c000-dfff
> >>> 549M e000-
> >>> 
> >>> One of the largest files is this:
> >>> curl localhost:5984/xxx_1590
> >>> {
> >>>   "db_name": "xxx_1590",
> >>>   "purge_seq":
> >>> 
> >> "0-g1FTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFNSApBMqv___39WIgMedXksQJKhAUgBlc4nRu0DiFoC5iYpgOy3J9L-BRAz9-NXm8iQJE_YYgeQxfFEWnwAYvF9oNosADncXo4",
> >>>   "update_seq":
> >>> 
> >> "3132-g1FWeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8sxI18ChKUgCSSfYgdUkMDNw1-JQ6gJTGg42UxacuAaSuHqxOAo-6PBYgydAApIBK52clchNUuwCidn9Wog5BtQcgau9nJQoTVPsAohboXsksAJuwX9Y",
> >>>   "sizes": {
> >>>   "file": 595928643,
> >>>   "external": 462778,
> >>>   "active": 1393380
> >>>   },
> >>>   "other": {
> >>>   "data_size": 462778
> >>>   },
> >>>   "doc_del_count": 0,
> >>>   "doc_count": 74,
> >>>   "disk_size": 595928643,
> >>>   "disk_format_version": 7,
> >>>   "data_size": 1393380,
> >>>   "compact_running": false,
> >>>   "cluster": {
> >>>   "q": 8,
> >>>   "n": 1,
> >>>   "w": 1,
> >>>   "r": 1
> >>>   },
> >>>   "instance_start_time": "0"
> >>> }
> >>> 
> >>> curl localhost:5984/xxx_1590/_local_docs
> >>> {"total_rows":null,"offset":null,"rows":[
> >>> 
> >> {"id":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","key":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","value":{"rev":"0-3022"}},
> >>> 
> >> {"id":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","key":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","value":{"rev":"0-18"}},
> >>> 
> >> {"id":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","key":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","value":{"rev":"0-3749"}},
> >>> 
> >> {"id":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","key":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","value":{"rev":"0-172"}}
> >>> ]}
> >>> 
> >>> Each database push/pull replicates 

Re: CouchDB replication crashing

2019-04-23 Thread Robert Newson
Hi,

The most likely explanation is there is a document that you update frequently 
that happens to land in the 8000-9fff shard range.

Noting that you did not need to delete and replace the file, we strongly 
recommend against modifying database files directly, as compaction would have 
fixed this fragmentation issue for you as it writes out a new file and then 
swaps at the end. Compaction is an essential maintenance task so perhaps you 
should revise your schedule (or start compacting regularly if you have not been 
so far).
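
If it helps, a sketch of checking what maintenance work is currently running (default port and placeholder admin credentials assumed):

# lists running compactions, indexers and replications with their progress
curl -u admin:password http://localhost:5984/_active_tasks | jq .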

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 22 Apr 2019, at 22:37, Delfino Gustavo wrote:
> (resending hopefully with corrected ascii table layout )
> 
> It turned out that my replications started failing shortly after my 
> last message. I was able to fix them, and I would like to share how. 
> 
> The problem was on my local server with a specific large database. As I 
> have had fragmentation issues in the past with a log file so I 
> suspected that this could be the problem. I used the contig.exe tool in 
> order to check for the number of fragments of the respective data file 
> (.couch) in each of the shards. This is what I found:
> 
> Shard 1: "-1fff" 330 fragments (~8GB)
> 
> Shard 2: "2000-3fff" 286 fragments (~6GB) 
> 
> Shard 3: "4000-5fff" 333 fragments (~7GB) 
> 
> Shard 4: "6000-7fff" 305 fragments (~6GB) 
> 
> Shard 5: "8000-9fff" 100601 fragments (~10GB) 
> 
> Shard 6: "a000-bfff" 346 fragments (~8GB) 
> 
> Shard 7: "c000-dfff" 362 fragments (~10GB) 
> 
> Shard 8: "e000-" 252 fragments (~5GB)
> 
> I have no idea why the data file for shard 5 became so fragmented. The 
> contig.exe tool would not defragment it probably because it was too 
> large but I was able to drastically reduce the number of fragments by 
> shutting down CouchDB and replacing the file with a duplicate of 
> itself. Now the replication is working fine. 
> 
> Regards,
> 
> Gustavo Delfino
> 
> 
> -Original Message-
> From: Delfino Gustavo 
> Sent: Monday, April 22, 2019 2:05 PM
> To: user@couchdb.apache.org
> Subject: RE: CouchDB replication crashing
> 
> Thank you Nick and Peter for your help. It is now back to normal.
> 
> I had plenty of disk space on CouchDB database hard disks, but the 
> startup disk was 99.5% full. Besides dealing with this space situation 
> I ran the defragment tool as I know that it can also be an issue under 
> windows. Thankfully I'll be moving this DB to Linux soon.
> 
> Regards,
> 
> Gustavo Delfino
> 
> -Original Message-
> From: Nick Vatamaniuc 
> Sent: Monday, April 22, 2019 12:19 PM
> To: user@couchdb.apache.org
> Subject: Re: CouchDB replication crashing
> 
> Hi Gustavo,
> 
> "no match of right hand value {error,eio}" looks like a failure to 
> write to a block device. Perhaps disks are full, or there is some kind 
> of throttling
> (over-quota) issue?
> 
> Replication will restart on failures. There is an exponential backoff 
> on repeated errors so if your replications crash too many times in a 
> row they'd be penalized and wait a bit. But I'd investigate why there 
> are these EIO errors happening first.
> 
> Cheers,
> -Nick
> 
> On Mon, Apr 22, 2019 at 11:39 AM Delfino Gustavo 
> wrote:
> 
> > I am doing multimaster replication where each server is configured to 
> > push its changes to the other one.  The log I sent was from the remote 
> > server trying to send the changes to my local server. As an experiment 
> > I instructed the local server to pull from the remote and that is not 
> > working either. This is a segment of the log in the local server:
> >
> > [error] 2019-04-22T13:39:06.972000Z couchdb@localhost <0.20502.2490>
> >  CRASH REPORT Process  (<0.20502.2490>) with 1 neighbors 
> > exited with reason: no match of right hand value {error,eio} at
> > couch_bt_engine_stream:write/2(line:60) <=
> > couch_stream:do_write/2(line:302) <=
> > couch_stream:handle_call/3(line:278)
> > <= gen_server:try_handle_call/4(line:615) <=
> > gen_server:handle_msg/5(line:647) <=
> > proc_lib:init_p_do_apply/3(line:247)
> > at gen_server:terminate/7(line:812) <= 
> > proc_lib:init_p_do_apply/3(line:247); initial_call:
> > {couch_stream,init,['Argument__1']}, ancestors: [<0.27672.2480>], messages:
> > [], links: [<0.27672.2480>], dictionary:
> > [{io_priority,{interactive,<<"shards/8000-9fff/jobhandlersims.
> > 15...">>}}],
> > trap_exit: false, status: running, heap_size: 987, stack_size: 27,
> > reductions: 596
> > [error] 2019-04-22T13:39:07.081000Z couchdb@localhost <0.18062.2491>
> >  Replicator, request PUT to "http:// 
> > nearbyhost.com:5984/jobhandlersims/361c9af728e2775b4e611efcad07bc57-1:results?new_edits=false"
> > failed due to error {code,500}
> > [error] 2019-04-22T13:39:07.30Z couchdb@localhost <0.4458.2476>
> >  Replicator, request PUT to "http:// 
> > 

Re: Leaking memory in logger process couch 2.3.1

2019-03-21 Thread Robert Newson
Hi,

Eek. This queue should never get this big, it indicates that there is far too 
much logging traffic generated and your target (file or syslog server) can't 
take it. It looks like you have 'debug' level set which goes a long way to 
explaining it. I would return to the default level of 'notice' for a 
significant reduction in logging volume.
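
A sketch of putting the level back at runtime, assuming the node-local config API is available on this release and the admin credentials are placeholders:

curl -u admin:password -X PUT \
     http://localhost:5984/_node/_local/_config/log/level -d '"notice"'

The same change can be made by setting level = notice in the [log] section of local.ini and restarting.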

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 21 Mar 2019, at 18:34, Vladimir Ralev wrote:
> Hello,
> 
> I am testing couch 2.3.1 in various configurations and while loading high
> number of test DBs I notice a ton of memory being eaten at some point and
> never recovered. More than 20 gigs and going into swap, at which point I kill
> the machine.
> 
> So I went into the remsh to see where the memory goes and it is the logging
> process. Take a look at the message queue len 4671185:
> 
> (couc...@couch01.int.test)65> MQSizes2 = lists:map(fun(A) -> {_,B} =
> process_info(A,message_queue_len), {B,A} end, processes()).
> (couc...@couch01.int.test)66> {_,BadProcess} =
> hd(lists:reverse(lists:sort(MQSizes2))).
> (couc...@couch01.int.test)67> process_info(BadProcess).
> [{registered_name,couch_log_server},
>  {current_function,{prim_file,drv_get_response,1}},
>  {initial_call,{proc_lib,init_p,5}},
>  {status,running},
>  {message_queue_len,4671185},
>  {messages,[{'$gen_cast',{log,{log_entry,debug,<0.8973.15>,
> 
>  [79,83,32,80,114,111,99,101,115,115,32,[...]|...],
>  "",
> 
>  ["2019",45,["0",51],45,"21",84,["0",50],58,"40",58|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.8973.15>,
> 
>  [79,83,32,80,114,111,99,101,115,115,32|...],
>  "",
> 
>  ["2019",45,["0",51],45,"21",84,["0",50],58,[...]|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.15949.9>,
> 
>  [79,83,32,80,114,111,99,101,115,115|...],
>  "",
> 
>  ["2019",45,["0",51],45,"21",84,[[...]|...],58|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.8971.15>,
> 
>  [79,83,32,80,114,111,99,101,115|...],
>  "",
> 
>  ["2019",45,["0",51],45,"21",84,[...]|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.9015.15>,
>  [79,83,32,80,114,111,99,101|...],
>  "",
> 
>  ["2019",45,["0",51],45,"21",84|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.9015.15>,
>  [79,83,32,80,114,111,99|...],
>  "",
> 
>  ["2019",45,["0",51],45,[...]|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.8973.15>,
>  [79,83,32,80,114,111|...],
>  "",
>  ["2019",45,[[...]|...],45|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.15949.9>,
>  [79,83,32,80,114|...],
>  "",
>  ["2019",45,[...]|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.8971.15>,
>  [79,83,32,80|...],
>  "",
>  ["2019",45|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.8973.15>,
>  [79,83,32|...],
>  "",
>  [[...]|...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.15949.9>,
>  [79,83|...],
>  "",
>  [...]}}},
> {'$gen_cast',{log,{log_entry,debug,<0.9015.15>,
>  [79|...],
>  [...],...}}},
> {'$gen_cast',{log,{log_entry,debug,<0.8971.15>,[...],...}}},
> {'$gen_cast',{log,{log_entry,debug,<0.8973.15>,...}}},
> {'$gen_cast',{log,{log_entry,debug,...}}},
> {'$gen_cast',{log,{log_entry,...}}},
> {'$gen_cast',{log,{...}}},
> {'$gen_cast',{log,...}},
> {'$gen_cast',{...}},
> {'$gen_cast',...},
> {...}|...]},
>  {links,[<0.122.0>,#Port<0.2149>]},
>  {dictionary,[{'$initial_call',{couch_log_server,init,1}},
>   {'$ancestors',[couch_log_sup,<0.121.0>]}]},
>  {trap_exit,true},
>  {error_handler,error_handler},
>  {priority,normal},
>  {group_leader,<0.120.0>},
>  {total_heap_size,10957},
>  {heap_size,4185},
>  {stack_size,29},
>  {reductions,292947037857},
>  {garbage_collection,[{max_heap_size,#{error_logger => true,kill =>
> true,size => 0}},
>   

Re: r and w parameters in couch2.x

2019-03-12 Thread Robert Newson
Thanks Jan, it was useful to clarify what N means here in case the OP would 
increase N if they added more nodes.

N=3 is the default, three separate copies of any individual document, even if 
you had 100 nodes in your cluster (any given document would be stored on 3 of 
those 100 nodes).
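
For illustration, a sketch of how n (copies) and q (shards) can be chosen per database at creation time; the database name and values are placeholders:

curl -u admin:password -X PUT 'http://localhost:5984/mydb?n=3&q=8'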

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Tue, 12 Mar 2019, at 07:25, Jan Lehnardt wrote:
> Specifically, n is the number of copies of your data, not the number of 
> nodes in the system. You can tweak read concurrency performance by 
> increasing a database’s number of shards (q) and adding more nodes for 
> those shards to live on, at the expense of view, all_docs and changes 
> requests becoming more expensive.
> 
> > On 12. Mar 2019, at 08:08, Vladimir Ralev  wrote:
> > 
> > OK, I see. Thank you.
> > 
> > On Mon, Mar 11, 2019 at 8:48 PM Robert Newson  wrote:
> > 
> >> Hi,
> >> 
> >> Yes, you will have 4 copies of your data, your nodes will be mirrors of
> >> each other in effect.
> >> 
> >> R and W only control one thing; the number of replies we wait for before
> >> returning your response. All N requests are made, in parallel,  no matter
> >> what setting for R or W you use. You're not saving I/O by changing it, you
> >> are just modifying your latency (lower values of R and W will lower request
> >> latency) and consistency (higher values of R and W will improve
> >> consistency, though nothing delivers strong consistency in CouchDB).
> >> 
> >> Your understanding is not quite right, and so neither are the
> >> inferences made from that base.
> >> 
> >> B.
> >> 
> >> --
> >>  Robert Samuel Newson
> >>  rnew...@apache.org
> >> 
> >> On Mon, 11 Mar 2019, at 15:25, Vladimir Ralev wrote:
> >>> Ah thanks a lot for the reply.
> >>> 
> >>> The idea for n = 4 is both fault tolerance and performance. Since I have
> >>> very few writes, I expect replication IO and view indexing IO to be
> >> minimal
> >>> and I have no issues with temporary inconsistencies and conflicts.
> >>> 
> >>> My understanding is that since there are very few writes, the 4 nodes
> >> will
> >>> behave almost like 4 independent single nodes and will be able to serve
> >> the
> >>> read requests independently without having to proxy to cluster peers and
> >>> thus avoiding a great deal of extra network and disk IO.
> >>> 
> >>> R=3 to me means 3 times the IO and thus 3 machines will be busy for the
> >>> same read request instead of serving other requests. Which I understand
> >> is
> >>> 3 times less performance from the cluster as a whole.
> >>> 
> >>> If my understanding is correct, I imagine this would be a common use-case
> >>> for couch?
> >>> 
> >>> On Mon, Mar 11, 2019 at 4:58 PM Robert Newson 
> >> wrote:
> >>> 
> >>>> r and w are no longer configurable from the config file by design. The
> >>>> default is n/2+1 (so 3 in your case) unless you specify r or w as
> >> request
> >>>> parameters.
> >>>> 
> >>>> setting n = 4 for a 4 node cluster is very unusual, do you really need
> >> 4
> >>>> full copies of your data?
> >>>> 
> >>>> couchdb will also automatically lower both r and w if nodes are
> >> offline.
> >>>> 
> >>>> The default of n=3, r=w=2 is appropriate in almost all cases as the
> >> right
> >>>> balance between data safety and availability. Nothing you've said so
> >> far
> >>>> suggests it would be good to deviate from those settings.
> >>>> 
> >>>> --
> >>>>  Robert Samuel Newson
> >>>>  rnew...@apache.org
> >>>> 
> >>>> On Mon, 11 Mar 2019, at 14:52, Vladimir Ralev wrote:
> >>>>> Hi all,
> >>>>> 
> >>>>> I am looking into running a 4-node couchdb 2.3 with this config in
> >>>>> default.ini and I made sure no other config file override them:
> >>>>> [cluster]
> >>>>> q = 8
> >>>>> n = 4
> >>>>> r = 1
> >>>>> w = 1
> >>>>> 
> >>>>> But when i create a test DB and check the settings I get:
> >>>>> curl -s couch01:5984/mytest1234 |jq .   "cluster": { "q": 8, "n": 4, "w": 3, "r": 3 },

Re: r and w parameters in couch2.x

2019-03-11 Thread Robert Newson
Hi,

Yes, you will have 4 copies of your data, your nodes will be mirrors of each 
other in effect.

R and W only control one thing; the number of replies we wait for before 
returning your response. All N requests are made, in parallel,  no matter what 
setting for R or W you use. You're not saving I/O by changing it, you are just 
modifying your latency (lower values of R and W will lower request latency) and 
consistency (higher values of R and W will improve consistency, though nothing 
delivers strong consistency in CouchDB).
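
A sketch of overriding r and w per request instead, with placeholder database and document names:

# wait for a reply from only one replica when reading
curl 'http://localhost:5984/mydb/mydoc?r=1'
# acknowledge the write after only one replica has it
curl -X PUT 'http://localhost:5984/mydb/mydoc?w=1' \
     -H 'Content-Type: application/json' -d '{"example": true}'

As above, all N copies are still read or written; only the reply threshold changes.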

Your understanding is not quite right, and so neither are the inferences 
made from that base.

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 11 Mar 2019, at 15:25, Vladimir Ralev wrote:
> Ah thanks a lot for the reply.
> 
> The idea for n = 4 is both fault tolerance and performance. Since I have
> very few writes, I expect replication IO and view indexing IO to be minimal
> and I have no issues with temporary inconsistencies and conflicts.
> 
> My understanding is that since there are very few writes, the 4 nodes will
> behave almost like 4 independent single nodes and will be able to serve the
> read requests independently without having to proxy to cluster peers and
> thus avoiding a great deal of extra network and disk IO.
> 
> R=3 to me means 3 times the IO and thus 3 machines will be busy for the
> same read request instead of serving other requests. Which I understand is
> 3 times less performance from the cluster as a whole.
> 
> If my understanding is correct, I imagine this would be a common use-case
> for couch?
> 
> On Mon, Mar 11, 2019 at 4:58 PM Robert Newson  wrote:
> 
> > r and w are no longer configurable from the config file by design. The
> > default is n/2+1 (so 3 in your case) unless you specify r or w as request
> > parameters.
> >
> > setting n = 4 for a 4 node cluster is very unusual, do you really need 4
> > full copies of your data?
> >
> > couchdb will also automatically lower both r and w if nodes are offline.
> >
> > The default of n=3, r=w=2 is appropriate in almost all cases as the right
> > balance between data safety and availability. Nothing you've said so far
> > suggests it would be good to deviate from those settings.
> >
> > --
> >   Robert Samuel Newson
> >   rnew...@apache.org
> >
> > On Mon, 11 Mar 2019, at 14:52, Vladimir Ralev wrote:
> > > Hi all,
> > >
> > > I am looking into running a 4-node couchdb 2.3 with this config in
> > > default.ini and I made sure no other config file override them:
> > > [cluster]
> > > q = 8
> > > n = 4
> > > r = 1
> > > w = 1
> > >
> > > But when i create a test DB and check the settings I get:
> > > curl -s couch01:5984/mytest1234 |jq .   "cluster": { "q": 8,
> > "n": 4,
> > > "w": 3, "r": 3 },
> > >
> > > r and w settings are not respected and seem stuck to be the defaults.
> > >
> > > When I kill 3 of the machines and test reads and writes, they still work
> > > fine so it doesn't seem like the r and w are actually 3 either. I checked
> > > if the debug logs printed out the r and w anywhere to confirm what is
> > being
> > > configured or executed but there is nothing.
> > >
> > > It is unclear if r and w are active in this version of couch. I can see
> > > that
> > > they have been partially removed from the documentation
> > > https://docs.couchdb.org/en/master/cluster/theory.html as opposed to
> > > couchdb 2.0.0 original doc
> > >
> > https://web.archive.org/web/20160109122310/https://docs.couchdb.org/en/stable/cluster/theory.html
> > >
> > > Additionally curl -s couch01:5984/mytest1234/doc?r=3
> > > still works even if 3 out of the 4 nodes are dead which is unexpected per
> > > the quorum documentation here
> > > https://docs.couchdb.org/en/master/cluster/sharding.html#quorum
> > >
> > > My specific concern with r and w is that if r is 3 this means 3 times
> > more
> > > network and disk IO since it will have to read 3 times from remote
> > > machines. My use case really doesn't need this and performance will
> > suffer.
> > > This is a little hard to test so I was hopinh someone can shed some light
> > > on the current situation with r and w in couch 2.3.
> > >
> > > Thanks
> > >
> >
>


Re: r and w parameters in couch2.x

2019-03-11 Thread Robert Newson
r and w are no longer configurable from the config file by design. The default 
is n/2+1 (so 3 in your case) unless you specify r or w as request parameters.

setting n = 4 for a 4 node cluster is very unusual, do you really need 4 full 
copies of your data?

couchdb will also automatically lower both r and w if nodes are offline.

The default of n=3, r=w=2 is appropriate in almost all cases as the right 
balance between data safety and availability. Nothing you've said so far 
suggests it would be good to deviate from those settings.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Mon, 11 Mar 2019, at 14:52, Vladimir Ralev wrote:
> Hi all,
> 
> I am looking into running a 4-node couchdb 2.3 with this config in
> default.ini and I made sure no other config file override them:
> [cluster]
> q = 8
> n = 4
> r = 1
> w = 1
> 
> But when i create a test DB and check the settings I get:
> curl -s couch01:5984/mytest1234 |jq .   "cluster": { "q": 8, "n": 4,
> "w": 3, "r": 3 },
> 
> r and w settings are not respected and seem stuck to be the defaults.
> 
> When I kill 3 of the machines and test reads and writes, they still work
> fine so it doesn't seem like the r and w are actually 3 either. I checked
> if the debug logs printed out the r and w anywhere to confirm what is being
> configured or executed but there is nothing.
> 
> It is unclear if r and w are active in this version of couch. I can see 
> that
> they have been partially removed from the documentation
> https://docs.couchdb.org/en/master/cluster/theory.html as opposed to
> couchdb 2.0.0 original doc
> https://web.archive.org/web/20160109122310/https://docs.couchdb.org/en/stable/cluster/theory.html
> 
> Additionally curl -s couch01:5984/mytest1234/doc?r=3
> still works even if 3 out of the 4 nodes are dead which is unexpected per
> the quorum documentation here
> https://docs.couchdb.org/en/master/cluster/sharding.html#quorum
> 
> My specific concern with r and w is that if r is 3 this means 3 times more
> network and disk IO since it will have to read 3 times from remote
> machines. My use case really doesn't need this and performance will suffer.
> This is a little hard to test so I was hoping someone can shed some light
> on the current situation with r and w in couch 2.3.
> 
> Thanks
>


Re: [DISCUSS] On the _changes feed - how hard should we strive for exactly once semantics?

2019-03-07 Thread Robert Newson
I said a bunch of this on IRC also but Adam has it. Further operations within 
the 'expired' txn just fail. We recognise that and start a new one. In the 
_changes case, we'd send a last_seq row and end the request, but this isn't 
going to be a great answer (at least, not a backward compatible answer) for 
_view, _all_docs and _find.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 7 Mar 2019, at 12:37, Adam Kocoloski wrote:
> Bah, our “cue”, not our “queue” ;)
> 
> Adam
> 
> > On Mar 7, 2019, at 7:35 AM, Adam Kocoloski  wrote:
> > 
> > Hi Garren,
> > 
> > In general we wouldn’t know ahead of time whether we can complete in five 
> > seconds. I believe the way it works is that we start a transaction, issue a 
> > bunch of reads, and after 5 seconds any additional reads will start to fail 
> > with something like “read version too old”. That’s our queue to start a new 
> > transaction. All the reads that completed successfully are fine, and the 
> > CouchDB API layer can certainly choose to start streaming as soon as the 
> > first read completes (~2ms after the beginning of the transaction).
> > 
> > Agree with Bob that steering towards a larger number of short-lived 
> > operations is the way to go in general. But I also want to balance that 
> > with backwards-compatibility where it makes sense.
> > 
> > Adam
> > 
> >> On Mar 7, 2019, at 7:22 AM, Garren Smith  wrote:
> >> 
> >> I agree that option A seems the most sensibile. I just want to understand
> >> this comment:
> >> 
> >>>> A _changes request that cannot be satisfied within the 5 second limit
> >> will be implemented as multiple FoundationDB transactions under the covers
> >> 
> >> How will we know if a change request cannot be completed in 5 seconds? Can
> >> we tell that beforehand. Or would we try and complete a change request. The
> >> transaction fails after 5 seconds and then do multiple transactions to get
> >> the full changes? If that is the case the response from CouchDB to the user
> >> will be really slow as they have already waited 5 seconds and have still
> >> not received anything. Or if we start streaming a result back to the user
> >> in the first transaction (Is this even possible?) then we would somehow
> >> need to know how to continue the changes feed after the transaction has
> >> failed.
> >> 
> >> Then Bob from your comment:
> >> 
> >>>> Forcing clients to do short (<5s) requests feels like a general good, as
> >> long as meaningful things can be done in that time-frame, which I strongly
> >> believe from what we've said elsewhere that they can.
> >> 
> >> That makes sense, but how would we do that? How do you help a user to make
> >> sure their request is under 5 seconds?
> >> 
> >> Cheers
> >> Garren
> >> 
> >> 
> >> 
> >> On Thu, Mar 7, 2019 at 11:15 AM Robert Newson  wrote:
> >> 
> >>> Hi,
> >>> 
> >>> Given that option A is the behaviour of feed=continuous today (barring the
> >>> initial whole-snapshot phase to catch up to "now") I think that's the 
> >>> right
> >>> move.  I confess to not reading your option B too deeply but I was there 
> >>> on
> >>> IRC when the first spark was lit. We can build some sort of temporary
> >>> multi-index on FDB today, that's clear, but it's equally clear that we
> >>> should avoid doing so if at all possible.
> >>> 
> >>> Perhaps the future Redwood storage engine for FDB will, as you say,
> >>> significantly improve on this, but, even if it does, I'm not 100% 
> >>> convinced
> >>> we should expose it. Forcing clients to do short (<5s) requests feels like
> >>> a general good, as long as meaningful things can be done in that
> >>> time-frame, which I strongly believe from what we've said elsewhere that
> >>> they can.
> >>> 
> >>> CouchDB's API, as we both know from rich (heh, and sometimes poor)
> >>> experience in production, has a lot of endpoints of wildly varying
> >>> performance characteristics. It's right that we evolve away from that 
> >>> where
> >>> possible, and this seems a great candidate given the replicator in ~all
> >>> versions of CouchDB will handle the change without blinking.
> >>> 
> >>> We have the same issue for _all_docs and _view and _find, in that the user
> >>> 

Re: [DISCUSS] On the _changes feed - how hard should we strive for exactly once semantics?

2019-03-07 Thread Robert Newson
Hi,

Given that option A is the behaviour of feed=continuous today (barring the 
initial whole-snapshot phase to catch up to "now") I think that's the right 
move.  I confess to not reading your option B too deeply but I was there on IRC 
when the first spark was lit. We can build some sort of temporary multi-index 
on FDB today, that's clear, but it's equally clear that we should avoid doing 
so if at all possible. 

Perhaps the future Redwood storage engine for FDB will, as you say, 
significantly improve on this, but, even if it does, I'm not 100% convinced we 
should expose it. Forcing clients to do short (<5s) requests feels like a 
general good, as long as meaningful things can be done in that time-frame, 
which I strongly believe from what we've said elsewhere that they can.

CouchDB's API, as we both know from rich (heh, and sometimes poor) experience 
in production, has a lot of endpoints of wildly varying performance 
characteristics. It's right that we evolve away from that where possible, and 
this seems a great candidate given the replicator in ~all versions of CouchDB 
will handle the change without blinking.

We have the same issue for _all_docs and _view and _find, in that the user 
might ask for more data back than can be sent within a single FDB transaction. 
I suggest that's a new thread, though.
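
For reference, a sketch of the short, resumable requests being described, using parameters the _changes API already has; the database name and sequence are placeholders:

# take a bounded slice of the feed
curl 'http://localhost:5984/mydb/_changes?limit=1000'
# continue from the last_seq value the previous response returned
curl 'http://localhost:5984/mydb/_changes?limit=1000&since=1000-g1AAAA'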

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Thu, 7 Mar 2019, at 01:24, Adam Kocoloski wrote:
> Hi all, as the project devs are working through the design for the 
> _changes feed in FoundationDB we’ve come across a limitation that is 
> worth discussing with the broader user community. FoundationDB 
> currently imposes a 5 second limit on all transactions, and read 
> versions from old transactions are inaccessible after that window. This 
> means that, unlike a single CouchDB storage shard, it is not possible 
> to grab a long-lived snapshot of the entire database.
> 
> In extant versions of CouchDB we rely on this long-lived snapshot 
> behavior for a number of operations, some of which are user-facing. For 
> example, it is possible to make a request to the _changes feed for a 
> database of an arbitrary size and, if you’ve got the storage space and 
> time to spare, you can pull down a snapshot of the entire database in a 
> single request. That snapshot will contain exactly one entry for each 
> document in the database. In CouchDB 1.x the documents appear in the 
> order in which they were most recently updated. In CouchDB 2.x there is 
> no guaranteed ordering, although in practice the documents are roughly 
> ordered by most recent edit. Note that you really do have to complete 
> the operation in a single HTTP request; if you chunk up the requests or 
> have to retry because the connection was severed then the exactly-once 
> guarantees disappear.
> 
> We have a couple of different options for how we can implement _changes 
> with FoundationDB as a backing store, I’ll describe them below and 
> discuss the tradeoffs
> 
> ## Option A: Single Version Index, long-running operations as multiple 
> transactions
> 
> In this option the internal index has exactly one entry for each 
> document at all times. A _changes request that cannot be satisfied 
> within the 5 second limit will be implemented as multiple FoundationDB 
> transactions under the covers. These transactions will have different 
> read versions, and a document that gets updated in between those read 
> versions will show up *multiple times* in the response body. The entire 
> feed will be totally ordered, and later occurrences of a particular 
> document are guaranteed to represent more recent edits than than the 
> earlier occurrences. In effect, it’s rather like the semantics of a 
> feed=continuous request today, but with much better ordering and zero 
> possibility of “rewinds” where large portions of the ID space get 
> replayed because of issues in the cluster.
> 
> This option is very efficient internally and does not require any 
> background maintenance. A future enhancement in FoundationDB’s storage 
> engine is designed to enable longer-running read-only transactions, so 
> we will likely to be able to improve the semantics with this option 
> over time.
> 
> ## Option B: Multi-Version Index
> 
> In this design the internal index can contain multiple entries for a 
> given document. Each entry includes the sequence at which the document 
> edit was made, and may also include a sequence at which it was 
> overwritten by a more recent edit.
> 
> The implementation of a _changes request would start by getting the 
> current version of the datastore (call this the read version), and then 
> as it examines entries in the index it would skip over any entries 
> where there’s a “tombstone” sequence less than the read version. 
> Crucially, if the request needs to be implemented across multiple 
> transactions, each transaction would use the same read version when 
> deciding whether to include entries in the 

Re: Restoring a 2.2 CouchDB on a newly installed 2.3

2019-02-13 Thread Robert Newson
Upgrades are more straightforward, you only update the software. CouchDB 2.3 
can read databases made with CouchDB 2.2 (and earlier). What you've done is 
unusual, created a new, separate CouchDB server and moved all the database 
files over by an external method.

The procedure outlined is not ideal but it should work. The best way to do it 
is to copy the .couch files for the database under shards/ and the 
corresponding document from the '_dbs' database (which is reachable at 
localhost:5986 not :5984). This method does not require downtime as long as 
there was no existing database of the same name on the destination server (if 
there is, you can delete it before copying over the _dbs document).
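
A sketch of that copy, assuming single-node installs, the node-local port, and placeholder names:

# on the source: fetch the shard map document
curl http://localhost:5986/_dbs/mydb > mydb_shardmap.json
# edit node names in by_node/by_range and remove _rev if needed, then on the target:
curl -X PUT http://localhost:5986/_dbs/mydb -d @mydb_shardmap.json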

Again, for version upgrades, you don't have to move a thing, just update the 
software and restart couchdb. For moving data, we recommend replication (which 
works if the source or target is couchdb 1.x, 2.0, 2.1, 2.2, 2.3, and so on).

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Wed, 13 Feb 2019, at 06:58, jtuc...@objektfabrik.de wrote:
> So here is something that seems to work:
> 
>   * install couchdb on Windows Machine
>   * set up as single node
>   * create a db of the same name as the one of the origin machine (Linux
> based in my case)
>   * stop CouchDB service
>   * copy all of the .couch files from the /data/shards directory from
> the linux machine to local machine
>   * move them one by one to the /data/shards directory on the target machine
>   * replace the numeric part of the pre-existing shard files
> (mydb.1550039930.couch) with the one of the files that were created
> when you created the database on the Windows machine
>   * restart CouchDB service
> 
> Fauxton can see the database and lists the number of documents. 
> Attachmants can be opened with Fauxton. Our Application can open 
> documents and read attachments.
> 
> So this seems to work. But as I understand it, this is *not* a 
> recommended strategy for backup/restore and transport of data. Too bad 
> the single file thing doesn't work any more... This might especially be 
> a bad idea whenever you update CouchDB I guess.
> 
> 
> Joachim
> 
> 
> 
> 
> 
> 
> 
> On 12.02.19 at 21:09, Krawetzky, Peter J wrote:
> > Is it possible the encoding or character set is unreadable to the windows 
> > instance?  These two OS's are completely different and in some cases I've 
> > had issues copying from one OS to another and vice versa.
> >
> >
> >
> >
> > -Original Message-
> > From: Joachim Tuchel 
> > Sent: Tuesday, February 12, 2019 3:03 PM
> > To: user@couchdb.apache.org
> > Subject: [EXTERNAL] Re: Restoring a 2.2 CouchDB on a newly installed 2.3
> >
> >
> > Thanks for the info. Seems like a little more complex than before...
> >
> > So the first thing I’ll try is copying the full /data directory (I need to 
> > get this running now).
> >
> > Replication is a bit difficult if you cannot open ports and the dev 
> > machines don’t have a fixed ip.
> >
> > Does anybody have recipes for such scenarios? I also wonder what a simple 
> > backup/restore looks like..
> >
> > Joachim
> >
> >> On 12.02.2019 at 20:16, Robert Newson wrote:
> >>
> >> since 2.0 there is more to this than copying the dbname.couch file around. 
> >> For one thing, every database is now sharded, so you have several .couch 
> >> files to copy (even if you only have one node). So make sure you've copied 
> >> them all and kept their directory hierarchy. In addition there is a meta 
> >> database called '_dbs' which is how couchdb knows where the shards of all 
> >> databases are. In your old install, you will have a document in _dbs 
> >> database named after the database you copied. You'll need to copy it to 
> >> your new cluster and modify the node names embedded in the by_node and 
> >> by_range attributes (assuming your new machine _has_ a different name. if 
> >> they're both 'couchdb@127.0.0.1' you won't need this step).
> >>
> >> We recommend replication as the means to move data from one couchdb 
> >> instance to another rather than moving database files around by hand.
> >>
> >> B.
> >>
> >> --
> >>   Robert Samuel Newson
> >>   rnew...@apache.org
> >>
> >>> On Tue, 12 Feb 2019, at 18:57, jtuc...@objektfabrik.de wrote:
> >>> Hi,
> >>>
> >>>
> >>> we have a 2.2 instance running on Ubuntu Linux. It is a single node setup.
> >>>

Re: Restoring a 2.2 CouchDB on a newly installed 2.3

2019-02-12 Thread Robert Newson
since 2.0 there is more to this than copying the dbname.couch file around. For 
one thing, every database is now sharded, so you have several .couch files to 
copy (even if you only have one node). So make sure you've copied them all and 
kept their directory hierarchy. In addition there is a meta database called 
'_dbs' which is how couchdb knows where the shards of all databases are. In 
your old install, you will have a document in _dbs database named after the 
database you copied. You'll need to copy it to your new cluster and modify the 
node names embedded in the by_node and by_range attributes (assuming your new 
machine _has_ a different name. if they're both 'couchdb@127.0.0.1' you won't 
need this step).

We recommend replication as the means to move data from one couchdb instance to 
another rather than moving database files around by hand.
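
For comparison, a sketch of the recommended route; the URLs and credentials are placeholders:

curl -X POST http://localhost:5984/_replicate \
     -H 'Content-Type: application/json' \
     -d '{"source": "http://user:pass@oldhost:5984/mydb",
          "target": "http://user:pass@localhost:5984/mydb",
          "create_target": true}'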

B.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Tue, 12 Feb 2019, at 18:57, jtuc...@objektfabrik.de wrote:
> Hi,
> 
> 
> we have a 2.2 instance running on Ubuntu Linux. It is a single node setup.
> 
> I am setting up a new Windows Development machine and tried copying the 
> .couch file from a backup to this new machine. On the CouchDb website, 
> only 2.3 is available for download for Windows.
> 
> So I copied the .couch from Linux to this Windows Machine into 
> C:\CouchDB\data. It is readable (no permission problems). But the 
> database doesn't show up in Fauxton, no matter how often I restart the 
> CouchDb service or WIndows.
> 
> The new machine is also set up as a single node and this way of doing 
> things has been working for years now (just the path where the databases 
> are has changed). If I create a database using Fauxton, it doesn't show 
> up in /data, but in the /shards subdirectories. But I don't have any 
> /shard files on the production Linux box.
> 
> So how can I restore this single database in a Windows Development machine?
> 
> 
> I am sure I forgot to mention something important about my versions and 
> setup. Sorry for that, please ask...
> 
> 
> Thanks
> 
> 
> Joachim
> 
> 


Re: incomplete replication under 2.0.0

2017-03-09 Thread Robert Newson
Were the six missing documents newer on the target? That is, did you delete 
them on the target and expect another replication to restore them?

Sent from my iPhone

> On 9 Mar 2017, at 22:08, Christopher D. Malon  wrote:
> 
> I replicated a database (continuously), but ended up with fewer
> documents in the target than in the source.  Even if I wait,
> the remaining documents don't appear.
> 
> 1. Here's the DB entry on the source machine, showing 12 documents:
> 
> {"db_name":"library","update_seq":"61-g1FTeJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rkQGPoiQFIJlkD1bHjE-dA0hdPFgdIz51CSB19WB1BnjU5bEASYYGIAVUOh-_mRC1CyBq9-P3D0TtAYja-1mJbATVPoCoBbqXKQsA-0Fvaw","sizes":{"file":181716,"external":11524,"active":60098},"purge_seq":0,"other":{"data_size":11524},"doc_del_count":0,"doc_count":12,"disk_size":181716,"disk_format_version":6,"data_size":60098,"compact_running":false,"instance_start_time":"0"}
> 
> 2. Here's the DB entry on the target machine, showing 6 documents:
> 
> {"db_name":"library","update_seq":"6-g1FTeJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8rkQGPoiQFIJlkD1bHhE-dA0hdPFgdIz51CSB19QTV5bEASYYGIAVUOh-_GyFqF0DU7idG7QGI2vvEqH0AUQvyfxYA1_dvNA","sizes":{"file":82337,"external":2282,"active":5874},"purge_seq":0,"other":{"data_size":2282},"doc_del_count":0,"doc_count":6,"disk_size":82337,"disk_format_version":6,"data_size":5874,"compact_running":false,"instance_start_time":"0"}
> 
> 3. Here's _active_tasks for the task, converted to YAML for readability:
> 
> - changes_pending: 0
>  checkpoint_interval: 3
>  checkpointed_source_seq: 
> 61-g1JTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWyl
> pvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDArUGRJcblmajUVBDZVVaWJJchZfSwAucPQkW
> RV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu
> 1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
>  continuous: !!perl/scalar:JSON::PP::Boolean 1
>  database: shards/-1fff/_replicator.1489086006
>  doc_id: 172.16.100.222_library
>  doc_write_failures: 0
>  docs_read: 12
>  docs_written: 12
>  missing_revisions_found: 12
>  node: couchdb@localhost
>  pid: <0.5521.0>
>  replication_id: c60427215125bd97559d069f6fb3ddb4+continuous+create_target
>  revisions_checked: 12
>  source: http://172.16.100.222:5984/library/
>  source_seq: 
> 61-g1JTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWylpvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDArUGRJcblmajUVBDZVVaWJJchZfSwAucPQkWRV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
>  started_on: 1489086008
>  target: http://localhost:5984/library/
>  through_seq: 
> 61-g1JTeJyd0EsOgjAQBuAqxsfSE-gRKK08VnIT7UwhSBAWylpvojfRm-hNsLQkbAgRNtOkk__L5M8IIcvEkmSNRYmJhDArUGRJcblmajUVBDZVVaWJJchZfSwAucPQkWRV5jKKT3kke-KwVRP2jWBpgdMAwcOuTJ8U1tKhkSZaYhS5x2GodKylWyPZWnJ9QW3KBkr5TE1yV4_CHu1dMeyQ-c4o7Wm0V9u4F9setaM_GzfK2yifWplrxYeAcuGOuulrNN3X1PTFgXPqd-XSHxdwuSQ
>  type: replication
>  updated_on: 1489096815
>  user: peer
> 
> 4. Here's the _replicator record for the task:
> 
> {"_id":"172.16.100.222_library","_rev":"2-8e6cf63bc167c7c7e4bd38242218572c","schema":1,"storejson":null,"source":"http://172.16.100.222:5984/library","target":"http://localhost:5984/library","create_target":true,"dont_storejson":1,"wholejson":{},"user_ctx":{"roles":["_admin"],"name":"peer"},"continuous":true,"owner":null,"_replication_state":"triggered","_replication_state_time":"2017-03-09T19:00:08+00:00","_replication_id":"c60427215125bd97559d069f6fb3ddb4"}
> 
> There should have been no conflicting transactions on the target host.
> The appearance of "61-*" in through_seq of the _active_tasks entry
> gives me a false sense of security; I only noticed the missing documents
> by chance.
> 
> A fresh replication to a different target succeeded without any
> missing documents.
> 
> Is there anything here that would tip me off that the target wasn't
> in sync with the source?  Is there a good way to resolve the condition?
> 
> Thanks,
> Christopher



Re: Deleted document in _changes is missing

2016-12-25 Thread Robert Newson
Deleted docs return 404 when fetched, that's normal. If you're fetching an 
older revision than the latest, it will also be missing if you've compacted the 
database. 
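
To illustrate with the database from your mail (assuming the tombstone revision is still available, i.e. hasn't been discarded by compaction):

# fetch the tombstone explicitly by revision
curl 'http://localhost:5984/dcm_assets_tp/assets_245?rev=2-338d783957e141566caf3662cc0726bb'

# list whichever leaf revisions the server still has, as JSON
curl -H 'Accept: application/json' \
  'http://localhost:5984/dcm_assets_tp/assets_245?open_revs=all'

On an uncompacted database you'd normally expect the first call to return a small body with "_deleted": true and the second to return an array with one "ok" entry per available leaf; an empty array, as you're seeing, suggests the server has nothing left it can return for those revisions.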

Sent from my iPhone

> On 24 Dec 2016, at 17:32, Ian Goodacre  wrote:
> 
> Hi all,
> 
> I am running CouchDB 1.6.1 on Linux.
> 
> I have a database that has many deleted documents and I am able to retrieve 
> most of these but there are a few that I am unable to retrieve. When I 
> attempt to retrieve these, I get 404 with error 'not_found' and reason 
> 'missing'.
> 
> I would like to understand why these few documents are different - why am I 
> unable to retrieve these deleted documents?
> 
> For example, _changes response includes:
> 
>{
>  "deleted": true,
>  "changes": [
>{
>  "rev": "2-338d783957e141566caf3662cc0726bb"
>}
>  ],
>  "id": "assets_245",
>  "seq": 2355
>},
> 
> 
> When I attempt to retrieve this with:
> 
> http://localhost:5984/dcm_assets_tp/assets_245?rev=2-338d783957e141566caf3662cc0726bb
> 
> I get a 404 response.
> 
> I am expecting to get the deleted document, even it if only contains _id, 
> _rev and _deleted.
> 
> Also, I don't understand the response to
> 
> curl --noproxy '*' -X GET 
> 'http://localhost:5984/dcm_assets_tp/assets_245?open_revs=all'
> 
> which is
> 
> --a341c8902ae323bd6ea7d938bc0c2ac5--
> 
> And I get the same in response to
> 
> http://localhost:5984/dcm_assets_tp/assets_245?revs=true&open_revs=all
> 
> But, if I add -H 'Accept: application/json' then I get an empty array ([]):
> 
> curl --noproxy '*' -X GET -H 'Accept: application/json' 
> 'http://localhost:5984/dcm_assets_tp/assets_245?open_revs=all'
> 
> 
> I must be misunderstanding something (or a lot of things). Any help would be 
> appreciated.
> 
> Regards,
> Ian
> 



Re: Getting 403 when trying to delete a document

2016-08-28 Thread Robert Newson
HTTP or HTTPS? Mobile operators do all sorts of awful things if they can see plaintext traffic. 

Sent from my iPhone

> On 25 Aug 2016, at 13:17, herman...@gmail.com wrote:
> 
> No, I don't have that.  The weird thing is it works after awhile.  The only 
> difference is that I was using my cell phone data plan when I was getting the 
> 403
> 
> Sent from my iPhone
> 
>> On Aug 25, 2016, at 3:09 AM, Robert Newson <rnew...@apache.org> wrote:
>> 
>> Maybe you have a design doc with a validate_doc_update function that is 
>> throwing "forbidden". 
>> 
>> Sent from my iPhone
>> 
>>> On 24 Aug 2016, at 23:47, herman...@gmail.com wrote:
>>> 
>>> Hi there,
>>> 
>>> Trying to delete a document and getting a 403 back.  The delete is executed 
>>> as an admin user, and I can still create or update documents, just not 
>>> delete.  Any hints? Running 1.3.1
>>> 
>>> Thanks
>>> Herman
>>> 
>>> Sent from my iPhone
>> 



Re: Getting 403 when trying to delete a document

2016-08-25 Thread Robert Newson
Maybe you have a design doc with a validate_doc_update function that is 
throwing "forbidden". 

Sent from my iPhone

> On 24 Aug 2016, at 23:47, herman...@gmail.com wrote:
> 
> Hi there,
> 
> Trying to delete a document and getting a 403 back.  The delete is executed as 
> an admin user, and I can still create or update documents, just not delete.  
> Any hints? Running 1.3.1
> 
> Thanks
> Herman
> 
> Sent from my iPhone



Re: Can I run couchdb instance without installing?

2016-08-13 Thread Robert Newson
dev/run
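
That is, run it straight from a source checkout instead of installing. A rough sketch (flag names can differ between versions, so check dev/run --help):

git clone https://github.com/apache/couchdb.git
cd couchdb
./configure && make
dev/run -n 1 --admin=adm:pass    # throwaway node; Ctrl-C stops it

Everything it writes stays under the checkout (in dev/), so deleting the directory removes all trace of it.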

Sent from my iPhone

> On 13 Aug 2016, at 19:19, Cihad Guzel  wrote:
> 
> Hi
> 
> I want to use couchdb for my project testing. So I want to embed couchdb in
> my project. Then I run couchdb with my script programmatically and run my
> tests. After the tests, I stop the couchdb instance with my script. But I don't
> want to install couchdb because I want to embed it as a binary. Is it possible?
> 
> Can I run it without installing?
> 
> regards
> Cihad Guzel


Re: Sharding question for clustered CouchDB 2.0

2016-07-23 Thread Robert Newson
You'll need to do so on port 5986, the node-local interface. 
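
For example (the shard name below is made up; list the real ones first, and remember to URL-encode the slashes in the shard path):

# shard-level databases visible on this node
curl http://localhost:5986/_all_dbs

# compact one shard of "mydb"
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:5986/shards%2F00000000-1fffffff%2Fmydb.1469000000/_compact'

Repeat for each shard range, on every node that hosts a copy.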

Sent from my iPhone

> On 23 Jul 2016, at 07:15, Constantin Teodorescu <braila...@gmail.com> wrote:
> 
>> On Sat, Jul 23, 2016 at 12:47 AM, Robert Newson <rnew...@apache.org> wrote:
>> 
>> Are you updating one doc over and over? That's my inference. Also you'll
>> need to run compaction on all shards then look at the distribution
>> afterward.
> 
> How do I run compaction on all shards?
> On Fauxton UI I didn't found anywhere any button for database or view
> compaction! :-(
> 
> Teo



Re: Sharding question for clustered CouchDB 2.0

2016-07-22 Thread Robert Newson
Are you updating one doc over and over? That's my inference. Also you'll need 
to run compaction on all shards then look at the distribution afterward. 

Sent from my iPhone

> On 22 Jul 2016, at 21:02, Peyton Vaughn  wrote:
> 
> Hi,
> 
> I've been working through getting a Couch cluster set up in Kubernetes.
> Finally got to the point of testing it and am a bit surprised by the
> distribution of data I see amongst the shards (this is for 2 nodes on 2
> separate host):
> 
> node1:
> ~>du -hs *
> 
> 6.7G  shards/-1fff
> 855M  shards/2000-3fff
> 859M  shards/4000-5fff
> 856M  shards/6000-7fff
> 859M  shards/8000-9fff
> 858M  shards/a000-bfff
> 6.5G  shards/c000-dfff
> 851M  shards/e000-
> 
> node2:
> ~>du -hs *
> 853M  -1fff
> 855M  2000-3fff
> 859M  4000-5fff
> 856M  6000-7fff
> 859M  8000-9fff
> 858M  a000-bfff
> 853M  c000-dfff
> 851M  e000-
> 
> Two of the shards really stand out in terms of disk usage... so I was
> wondering if this is expected behavior, or have I managed to misconfigure
> something?
> 
> 
> I really appreciate any insight - am really trying to understand 2.0 as
> best I can.
> Thanks!
> Peyton



Re: The state of filtered replication

2016-05-26 Thread Robert Newson
All replications should checkpoint periodically too, not just at the end. The 
log will show this as a PUT to a _local URL. 
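
Concretely (log line abbreviated, and the replication id below is made up):

# each checkpoint appears in couch.log roughly as:
#   PUT /targetdb/_local/6e3f1c0... 201

# and the checkpoint document itself can be read back:
curl http://localhost:5984/targetdb/_local/6e3f1c0...

Its body carries source_last_seq and a session history, which shows how far a replication has really progressed between restarts.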

Sent from my iPhone

> On 26 May 2016, at 14:04, Paul Okstad <poks...@gmail.com> wrote:
> 
> I'll double check my situation since I have not thoroughly verified it. This 
> particular issue occurs between restarts of the server where I make no 
> changes to the continuous replications in the _replicator DB, but it may also 
> be related to the issue of too many continuous replications causing a 
> replications to stall out from lack of resources. It's possible that I 
> assumed they were starting over from seq 1 when in fact they were never able 
> to complete a full replication in the first place.
> 
> -- 
> Paul Okstad
> 
>> On May 26, 2016, at 2:51 AM, Robert Newson <rnew...@apache.org> wrote:
>> 
>> There must be something else wrong. Filtered replications definitely make 
>> and resume from checkpoints, same as unfiltered.
>> 
>> We mix the filter code and parameters into the replication checkpoint id to 
>> ensure we start from 0 for a potentially different filtering. Perhaps you 
>> are changing those? Or maybe supplying since_seq as well (which overrides 
>> the checkpoint)?
>> 
>> Sent from my iPhone
>> 
>>> On 25 May 2016, at 16:39, Paul Okstad <poks...@gmail.com> wrote:
>>> 
>>> This isn’t just a problem of filtered replication, it’s a major issue in 
>>> the database-per-user strategy (at least in the v1.6.1 I’m using). I’m also 
>>> using a database-per-user design with thousands of users and a single 
>>> global database. If a small fraction of the users (hundreds) has 
>>> continuously ongoing replications from the user DB to the global DB, it 
>>> will cause extremely high CPU utilization. This is without any replication 
>>> filtered javascript function.
>>> 
>>> Another huge issue with filtered replications is that they lose their place 
>>> when replications are restarted. In other words, they don’t keep track of 
>>> sequence ID between restarts of the server or stopping and starting the 
>>> same replication. So for example, if I want to perform filtered replication 
>>> of public documents from the global DB to the public DB, and I have a ton 
>>> of documents in global, then each time I restart the filtered replication 
>>> it will begin from sequence #1. I’m guessing this is due to the fact that 
>>> CouchDB does not know if the filter function has been modified between 
>>> replications, but this behavior is still very disappointing.
>>> 
>>> — 
>>> Paul Okstad
>>> http://pokstad.com <http://pokstad.com/>
>>> 
>>> 
>>> 
>>>> On May 25, 2016, at 4:25 AM, Stefan Klein <st.fankl...@gmail.com> wrote:
>>>> 
>>>> 2016-05-25 12:48 GMT+02:00 Stefan du Fresne <ste...@medicmobile.org>:
>>>> 
>>>> 
>>>> 
>>>>> So to be clear, this is effectively replacing replication— where the
>>>>> client negotiates with the server for a collection of changes to download—
>>>>> with a daemon that builds up a collection of documents that each client
>>>>> should get (and also presumably delete), which clients can then query for
>>>>> when they’re able?
>>>> 
>>>> Sorry, didn't describe well enough.
>>>> 
>>>> On Serverside we have one big database containing all documents and one db
>>>> for each user.
>>>> The clients always replicate to and from their individual userdb,
>>>> unfiltered. So the db for a user is a 1:1 copy of their pouchdb/... on
>>>> their client.
>>>> 
>>>> Initially we set up a filtered replication for each user from servers main
>>>> database to the server copy of the users database.
>>>> With this we ran into performance problems and sooner or later we probably
>>>> would have ran into issues with open file descriptors.
>>>> 
>>>> So what we do instead is listening to the changes of the main database and
>>>> distribute the documents to the servers userdb, which then are synced with
>>>> the clients.
>>>> 
>>>> Note: this is only for documents the users actually work with (as in
>>>> possibly modify), for queries on the data we query views on the main
>>>> database.
>>>> 
>>>> For the way back, we listen to the _dbchanges, so we get an event for
>>>> changes on the users dbs, get that change from the users db and determine
>>>> what to do with it.
>>>> We do not replicate back users changes to the main database but rather have
>>>> an internal API to evaluate all kinds of constrains on users input.
>>>> If you do not have to check users input, you could certainly listen to
>>>> _dbchanges and "blindly" one-shot replicate from the changed DB to your
>>>> main DB.
>>>> 
>>>> -- 
>>>> Stefan
>> 



Re: The state of filtered replication

2016-05-26 Thread Robert Newson
There must be something else wrong. Filtered replications definitely make and 
resume from checkpoints, same as unfiltered.

We mix the filter code and parameters into the replication checkpoint id to 
ensure we start from 0 for a potentially different filtering. Perhaps you are 
changing those? Or maybe supplying since_seq as well (which overrides the 
checkpoint)?
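
To make that concrete: these two documents get different checkpoint ids even though source and target are identical, so the second starts again from 0 (db names and filter are placeholders):

{"source":"http://host:5984/maindb", "target":"http://host:5984/publicdb",
 "filter":"app/public", "query_params":{"type":"post"}, "continuous":true}

{"source":"http://host:5984/maindb", "target":"http://host:5984/publicdb",
 "filter":"app/public", "query_params":{"type":"comment"}, "continuous":true}

And if a document also sets "since_seq", that value overrides any existing checkpoint, which can equally look like the replication forgot its place.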

Sent from my iPhone

> On 25 May 2016, at 16:39, Paul Okstad  wrote:
> 
> This isn’t just a problem of filtered replication, it’s a major issue in the 
> database-per-user strategy (at least in the v1.6.1 I’m using). I’m also using 
> a database-per-user design with thousands of users and a single global 
> database. If a small fraction of the users (hundreds) has continuously 
> ongoing replications from the user DB to the global DB, it will cause 
> extremely high CPU utilization. This is without any replication filtered 
> javascript function.
> 
> Another huge issue with filtered replications is that they lose their place 
> when replications are restarted. In other words, they don’t keep track of 
> sequence ID between restarts of the server or stopping and starting the same 
> replication. So for example, if I want to perform filtered replication of 
> public documents from the global DB to the public DB, and I have a ton of 
> documents in global, then each time I restart the filtered replication it 
> will begin from sequence #1. I’m guessing this is due to the fact that 
> CouchDB does not know if the filter function has been modified between 
> replications, but this behavior is still very disappointing.
> 
> — 
> Paul Okstad
> http://pokstad.com 
> 
> 
> 
>> On May 25, 2016, at 4:25 AM, Stefan Klein  wrote:
>> 
>> 2016-05-25 12:48 GMT+02:00 Stefan du Fresne :
>> 
>> 
>> 
>>> So to be clear, this is effectively replacing replication— where the
>>> client negotiates with the server for a collection of changes to download—
>>> with a daemon that builds up a collection of documents that each client
>>> should get (and also presumably delete), which clients can then query for
>>> when they’re able?
>> 
>> Sorry, didn't describe well enough.
>> 
>> On Serverside we have one big database containing all documents and one db
>> for each user.
>> The clients always replicate to and from their individual userdb,
>> unfiltered. So the db for a user is a 1:1 copy of their pouchdb/... on
>> their client.
>> 
>> Initially we set up a filtered replication for each user from servers main
>> database to the server copy of the users database.
>> With this we ran into performance problems and sooner or later we probably
>> would have ran into issues with open file descriptors.
>> 
>> So what we do instead is listening to the changes of the main database and
>> distribute the documents to the servers userdb, which then are synced with
>> the clients.
>> 
>> Note: this is only for documents the users actually work with (as in
>> possibly modify), for queries on the data we query views on the main
>> database.
>> 
>> For the way back, we listen to the _dbchanges, so we get an event for
>> changes on the users dbs, get that change from the users db and determine
>> what to do with it.
>> We do not replicate back users changes to the main database but rather have
>> an internal API to evaluate all kinds of constrains on users input.
>> If you do not have to check users input, you could certainly listen to
>> _dbchanges and "blindly" one-shot replicate from the changed DB to your
>> main DB.
>> 
>> -- 
>> Stefan
> 



Re: 2.0 Clustering Data Encryption

2016-04-27 Thread Robert Newson
Recent Erlang versions make it possible to encrypt the rpc traffic. We don't 
currently include those settings in the run scripts. 

http://erlang.org/doc/apps/ssl/ssl_distribution.html
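
Very roughly, following that page, the extra flags would go into vm.args on every node (paths are placeholders and the exact set of options depends on your Erlang release):

-proto_dist inet_tls
-ssl_dist_opt server_certfile /path/to/node-cert.pem
-ssl_dist_opt server_keyfile /path/to/node-key.pem
-ssl_dist_opt client_certfile /path/to/node-cert.pem

Treat that as a sketch to adapt, not a drop-in config; the linked documentation covers the full option list and how to get ssl loaded early enough in the boot sequence.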

> On 26 Apr 2016, at 22:43, Oleg Cohen  wrote:
> 
> Greetings,
> 
> I would like to understand if the data exchanged between cluster nodes is 
> securely encrypted. Is there any documentation that explains how the data is 
> passed around?
> 
> Thank you!
> Oleg


Re: CouchDB (1.6.1) crash

2016-03-29 Thread Robert Newson
emfile means you ran out of file descriptors. 
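
A quick way to confirm and then raise it (numbers are only examples):

# what the running VM actually has
grep 'open files' /proc/$(pidof beam.smp)/limits

# raise it for the couchdb user, e.g. in /etc/security/limits.conf:
#   couchdb  soft  nofile  16384
#   couchdb  hard  nofile  32768

Restart couchdb afterwards so the new limit is picked up (and note that su-based init scripts may also need pam_limits enabled before limits.conf is honoured).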

> On 29 Mar 2016, at 05:04, Raja  wrote:
> 
> Hi Everyone,
> 
> We seem to be getting a crash when loading a lot of records in a short
> interval into CouchDB. The crash details are available at:
> https://gist.github.com/rajasaur/c0140776d5d8d78d0200
> 
> This has only happened under load  when we are migrating a lot of mysql
> records into CouchDB. In the process of moving them to CouchDB, I use an
> adhoc query to get the results (for each record) and use the output of the
> query to do some calculation, which goes into the CouchDB document.
> 
> I looked at some of the similar failures in the groups and it seems to be
> mostly related to the number of processes open, which I have not changed at
> all (defaulting to 256k)
> 
> 1> erlang:system_info(process_limit).
> 
> 262144
> Im thinking of trying the following to minimize the load from a code
> perspective:
> a) Make it a named view rather than adhoc view that could then be called
> instead of creating an adhoc view, once for every record.
> b) Increasing the number of processes.
> 
> Are there any other pointers that I should try? Any help would be greatly
> appreciated.
> 
> Thanks
> Raja
> 
> 
> -- 
> Raja
> rajasaur at gmail.com


Re: two couchdb docker containers writing to the same mount?

2015-10-03 Thread Robert Newson
It's definitely not supposed to run this way. You'll certainly corrupt your 
databases if you allow two couchdb instances to write to the same files. 

> On 3 Oct 2015, at 04:56, Dan Santner  wrote:
> 
> I think this is just not the way couch was meant to be used but….
> 
> I setup two docker containers which shared the couchdb database directory and 
> ran them simultaneously.
> 
> Mostly it worked like a cluster….but there were plenty of strange errors.  I 
> assume couch isn’t meant to run this way.  I assume each instance of couch 
> wants to own those files?


Re: Should I use master or developer-preview-2.0 branch?

2015-10-02 Thread Robert Newson
Definitely master, a lot of work has been done in the year(!) since the 
preview. 

> On 2 Oct 2015, at 12:14, Ying Bian  wrote:
> 
> OK. I think I would stay on master. Thanks,
> 
> -Ying
> 
>> On Oct 2, 2015, at 18:43, Alexander Shorin  wrote:
>> 
>> Hi,
>> 
>> They are mostly identical, except master evolves faster while
>> developer-preview is always a bit outdated.
>> --
>> ,,,^..^,,,
>> 
>> 
>>> On Fri, Oct 2, 2015 at 1:39 PM, Ying Bian  wrote:
>>> I want to try out couchdb 2.0 in my new product. Which branch should I use? 
>>> I see the version on both branches are set to 2.0.0.
>>> What’s the difference?
>>> 
>>> -Ying
> 


Re: su vs sudo in init script

2015-09-30 Thread Robert Newson
It's an ignorable error caused by the code server scanning for .beam files 
starting in current working dir. The init script should cd to somewhere that 
couchdb can read, but does not. Using sudo must have a side effect of changing 
cwd. I strongly advise returning to su but adding a cd call to somewhere 
couchdb can read. 
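
Something along these lines in the init script's run_command(), before the su call (the path is just an example of somewhere the couchdb user can read):

cd /usr/local/var/lib/couchdb || exit 1
if su $COUCHDB_USER -c "$command"; then
    # ... existing success handling ...
fi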

> On 29 Sep 2015, at 12:12, Tom Chiverton  wrote:
> 
> I've just installed the latest CouchDb (from source, using EPEL's erlang) on 
> the latest Amazon Linux, and although it starts up, the 'verify your 
> installation' tests from Futon do not complete, and errors like this:
> 
> [Fri, 25 Sep 2015 15:14:05 GMT] [error] [<0.20.0>] {error_report,<0.9.0>,
>  {<0.20.0>,std_error,
>   "File operation error: eacces. Target: 
> ./couch_os_daemons.beam. Function: get_file. Process: code_server."}}
> 
> are logged.
> If I alter the init script run_command() to be
>   if sudo -i -u $COUCHDB_USER $command ; then
> instead of 
>   if su ...
> then everything goes smoothly.
> 
> Does anyone know what I might have done wrong ? To add the user I did:
> sudo adduser -r --home /usr/local/var/lib/couchdb -M --shell /bin/bash 
> --comment "CouchDB Administrator" couchdb
> sudo chown -R couchdb:couchdb /usr/local/etc/couchdb
> sudo chown -R couchdb:couchdb /usr/local/var/lib/couchdb
> sudo chown -R couchdb:couchdb /usr/local/var/log/couchdb
> sudo chown -R couchdb:couchdb /usr/local/var/run/couchdb
> sudo chown -R couchdb:couchdb /usr/local/lib/couchdb
> sudo chmod 0770 /usr/local/etc/couchdb
> sudo chmod 0770 /usr/local/var/lib/couchdb
> sudo chmod 0770 /usr/local/var/log/couchdb
> sudo chmod 0770 /usr/local/var/run/couchdb
> -- 
> 
> 
> 
> Tom Chiverton
> Lead Developer
> 
> e:t...@extravision.com
> p:0161 817 2922
> t:@extravision
> w:www.extravision.com
> 
> 
> 
> Registered in the UK at: 107 Timber Wharf, 33 Worsley Street, Manchester, M15 
> 4LD.
> Company Reg No: 0‌‌5017214 VAT: GB 8‌‌24 5386 19
> 
> This e-mail is intended solely for the person to whom it is addressed and may 
> contain confidential or privileged information.
> Any views or opinions presented in this e-mail are solely of the author and 
> do not necessarily represent those of Extravision Ltd.
> 


Re: Crashing couchdb

2014-08-06 Thread Robert Newson
The default timeout in the vhost module is a bug; 5s is not long enough for that. 

Sent from my iPhone

 On 6 Aug 2014, at 12:48, Jason Woods de...@jasonwoods.me.uk wrote:
 
 Hi all,
 
 Hopefully someone can help shed some light on this. The logs aren't the 
 easiest thing to understand :(
 CouchDB is 1.6.0. Thousands of document, all are several MB in size.
 
 When we put a little bit of a read strain on it, it appears that memory 
 consumption climbs to ridiculous proportions, around 5-6GB.
 Then it crashes completely. A few errors appear and then all processes end.
 Before we upgraded to 1.6.0 we were using 1.3.1, and under the same 
 conditions, it would continuously crash and restart processes, and eventually 
 OOM-killer kicks in and destroys everything nice in the world. Since 
 upgrading to 1.6.0 this morning it appears to just completely crash instead.
 
 When I say bit of a read strain... to me it looks honestly like a maximum of 
 1 or 2 reads a second maximum. Though the transfer time will potentially be 
 long due to the several MB size.
 
 Here is what appears in the log a few seconds before the crash:
 
 [Wed, 06 Aug 2014 11:31:03 GMT] [error] [0.531.0] {error_report,0.30.0,
 {0.531.0,crash_report,
  [[{initial_call,
 {mochiweb_acceptor,init,
  ['Argument__1','Argument__2','Argument__3']}},
{pid,0.531.0},
{registered_name,[]},
{error_info,
 {exit,
  {timeout,
   {gen_server,call,[couch_httpd_vhost,get_state]}},
  [{gen_server,call,2},
   {couch_httpd_vhost,dispatch_host,1},
   {couch_httpd,handle_request,5},
   {mochiweb_http,headers,5},
   {proc_lib,init_p_do_apply,3}]}},
{ancestors,
 [couch_httpd,couch_secondary_services,
  couch_server_sup,0.31.0]},
{messages,
 [{#Ref0.0.0.3824,
   {vhosts_state,[],
[_utils,_uuids,_session,_oauth,_users],
#Funcouch_httpd.8.2523}}]},
{links,[0.126.0,#Port0.3479]},
{dictionary,[{couch_rewrite_count,0}]},
{trap_exit,false},
{status,running},
{heap_size,2584},
{stack_size,24},
{reductions,941}],
   []]}}
 [Wed, 06 Aug 2014 11:31:31 GMT] [error] [0.530.0] {error_report,0.30.0,
 {0.530.0,crash_report,
  [[{initial_call,
 {mochiweb_acceptor,init,
  ['Argument__1','Argument__2','Argument__3']}},
{pid,0.530.0},
{registered_name,[]},
{error_info,
 {exit,
  {timeout,
   {gen_server,call,[couch_httpd_vhost,get_state]}},
  [{gen_server,call,2},
   {couch_httpd_vhost,dispatch_host,1},
   {couch_httpd,handle_request,5},
   {mochiweb_http,headers,5},
   {proc_lib,init_p_do_apply,3}]}},
{ancestors,
 [couch_httpd,couch_secondary_services,
  couch_server_sup,0.31.0]},
{messages,
 [{#Ref0.0.0.3825,
   {vhosts_state,[],
[_utils,_uuids,_session,_oauth,_users],
#Funcouch_httpd.8.2523}}]},
{links,[0.126.0,#Port0.3480]},
{dictionary,[{couch_rewrite_count,0}]},
{trap_exit,false},
{status,running},
{heap_size,2584},
{stack_size,24},
{reductions,941}],
   []]}}
 
 Any advice on diagnosing is gratefully appreciated.
 
 Regards,
 
 Jason


Re: badmatch on initial config file password

2014-07-02 Thread Robert Newson
Sorry about that. Fixed on master. 
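
Until you're on a build with the fix, one workaround sketch: let a running CouchDB hash the password via the config API, then copy the hashed value into the ini your script generates:

curl -X PUT http://localhost:5984/_config/admins/admin -d '"password"'
# read back the hashed form to embed under [admins] in couch.ini:
curl http://localhost:5984/_config/admins/admin

A pre-hashed value (a -pbkdf2-... string on recent releases) isn't re-hashed at startup, so it should sidestep the crash.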

Sent from my iPhone

 On 2 Jul 2014, at 01:53, Nathan Vander Wilt nate-li...@calftrail.com wrote:
 
 I am trying to set up CouchDB from a script, which makes a couch.ini config 
 file that includes this line:
 
 
 [admins]
 admin = password
 
 On my local machine with CouchDB 1.5.0 that gets automatically hashed on 
 first launch. However, when another developer on the project tries to start 
 CouchDB 1.6.0 on his machine, he gets:
 
 $ couchdb -a local_data/couch.ini -o /dev/null -e local_data/logs/couch.stderr
 Apache CouchDB 1.6.0 (LogLevel=info) is starting.
 {init terminating in 
 do_boot,{{badmatch,{error,{bad_return,{{couch_app,start,[normal,[/usr/local/etc/couchdb/default.ini,/usr/local/etc/couchdb/local.ini]]},{'EXIT',{{badmatch,{error,{shutdown,{failed_to_start_child,couch_primary_services,{shutdown,{failed_to_start_child,couch_server,{function_clause,[{couch_passwords,hash_admin_password,[password],[{file,couch_passwords.erl},{line,30}]},{couch_server,'-hash_admin_passwords/1-fun-0-',2,[{file,couch_server.erl},{line,148}]},{lists,foreach,2,[{file,lists.erl},{line,1336}]},{couch_server,init,1,[{file,couch_server.erl},{line,172}]},{gen_server,init_it,6,[{file,gen_server.erl},{line,306}]},{proc_lib,init_p_do_apply,3,[{file,proc_lib.erl},{line,239}]}]}}},[{couch_server_sup,start_server,1,[{file,couch_server_sup.erl},{line,98}]},{application_master,start_it_old,4,[{file,application_master.erl},{line,272}]}]}},[{couch,start,0,[{file,couch.erl},{line,18}]},{init,start_it,1,[]},{init,start_em,1,[]}]}}
 Crash dump was written to: erl_crash.dump
 init terminating in do_boot ()
 
 
 Has this behavior changed in CouchDB 1.6.0? Is there a way to set a password 
 via plaintext config anymore?
 
 thanks,
 -natevw


Re: Issue with CouchDB

2014-06-17 Thread Robert Newson
Sounds like COUCHDB-1415.

Sent from my iPhone

 On 17 Jun 2014, at 12:34, kankanala karthik karthi...@beehyv.com wrote:
 
 Hi All,
 
 In the TAMA implementation, I came across an issue with Couchdb. (Version 
 1.2.0) ,  
 
 We are using  named documents to maintain unique constraint logic in the 
 application. (named documents : whose _id is user defined but not couch 
 generated.)
 
 
 We are using the  REST API to add the documents to Couchdb, where we found 
 below strange behavior : 
 
 
 When we try to recreate the documents using HTTP PUT which have been deleted 
 in the past(because of bug in the code), the documents are not created the 
 first time .
 
 HTTP Put - Returns HTTP 200, but doc is not saved in couchdb. 
 Again trying the same request, 
 HTTP Put - Returns HTTP 200 and adds the doc in database.
 
 HTTP PUT request needs to be sent twice to create and save the doc.
 
  I have checked that the above bug is reproducible for deleted docs, i.e the 
 response for GET _id is {error:not_found,reason:deleted}.
 
 This looks like a bug in CouchDB to me, could you please let us know if you 
 could think of any scenario where above error might occur and any possible 
 workarounds/solutions ?
 
 Thanks,
 Karthik.


Re: CouchDB: Most recently added document that is NOT a design document

2014-01-06 Thread Robert Newson

read _changes?descending=true row by row until you reach a non-design document? 
The doc to ddoc ratio should be strongly in your favor.
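
A rough sketch of that (db name is a placeholder):

curl 'http://localhost:5984/mydb/_changes?descending=true&limit=20'

Skip any rows whose "id" begins with "_design/"; the first remaining row is the most recently *changed* non-design document, which, per Jens's point below, is the closest you can get without storing a creation timestamp yourself.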

B.

On 6 Jan 2014, at 22:00, Stanley Iriele siriele...@gmail.com wrote:

 Could you do what Jens just mentioned and just make a filter? That way a
 seq number plus the filter should get you what you want
 On Jan 6, 2014 1:28 PM, Jens Alfke j...@couchbase.com wrote:
 
 
 On Jan 6, 2014, at 12:42 PM, Hank Knight hknight...@gmail.com wrote:
 
 I want the ID of the most recently added document that is NOT a design
 document.
 
 There’s nothing built-in for that. CouchDB doesn’t track the order in
 which documents are created, only the order in which they’re changed.
 
 You could put a “date_created” property in a document and populate it with
 a timestamp when the doc is first created; then you can make a view that
 emits those as keys, and query it in reverse order.
 
 —Jens



Re: Disabling doc include

2014-01-02 Thread Robert Newson

It is relevant: the OP could use multiple databases to expose the subset of 
documents to the appropriate subset of users.

Mentioning Couchbase is not relevant. :)

B.

On 2 Jan 2014, at 00:40, Jens Alfke j...@couchbase.com wrote:

 
 On Jan 1, 2014, at 3:27 PM, Robert Newson rnew...@apache.org wrote:
 
 There’s no document level read protection, but you can certainly grant or 
 deny read access to users on a per database basis.
 
 Yes, but that isn’t relevant to what the OP is trying to do, i.e. give users 
 access to some data but not all of it.
 
 The restrictive proxy approach is brittle, it requires that you know all the 
 URL patterns to block and keep them up to date when you upgrade CouchDB. It 
 can work, it’s just not awesome.
 
 Yes. I only brought it up because it’s the only way I know of to enable some 
 form of per-document read protection using Apache CouchDB (as opposed to 
 something similar-but-not-the-same, like Couchbase Sync Gateway.)
 
 —Jens
 



Re: Disabling doc include

2014-01-02 Thread Robert Newson

It doesn’t achieve the same effect, though: the virtual host + url rewriter is 
not an access control mechanism. You’re still granting database-wide read 
permissions to the user.

B.


On 2 Jan 2014, at 09:09, Florian Westreicher Bakk.techn. st...@meredrica.org 
wrote:

 I put a design doc behind a desk record / virtual host, that should do the 
 trick. The user that is used by the app is read only 
 
 Robert Newson rnew...@apache.org wrote:
 there’s no notion of read-protection in CouchDB.
 
 There’s no document level read protection, but you can certainly grant
 or deny read access to users on a per database basis. That’s by design
 due to the ease that information could leak out through views
 (particularly reduce views). The restrictive proxy approach is brittle,
 it requires that you know all the URL patterns to block and keep them
 up to date when you upgrade CouchDB. It can work, it’s just not
 awesome.
 
 B.
 
 .
 
 On 1 Jan 2014, at 20:47, Jens Alfke j...@couchbase.com wrote:
 
 
 On Dec 31, 2013, at 1:44 AM, meredrica st...@meredrica.org wrote:
 
 I expose CouchDB directly to mobile clients and wanted to hide some 
 information from them.
 
 You can’t really do that; there’s no notion of read-protection in
 CouchDB.
 As a workaround you can put CouchDB behind a proxy or gateway, and
 restrict the URL patterns that clients are allowed to send.
 
 —Jens
 
 
 -- 
 Sent from Kaiten Mail. Please excuse my brevity.



Re: Disabling doc include

2014-01-01 Thread Robert Newson
"there’s no notion of read-protection in CouchDB."

There’s no document level read protection, but you can certainly grant or deny 
read access to users on a per database basis. That’s by design due to the ease 
that information could leak out through views (particularly reduce views). The 
restrictive proxy approach is brittle, it requires that you know all the URL 
patterns to block and keep them up to date when you upgrade CouchDB. It can 
work, it’s just not awesome.
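
For reference, per-database read access is granted through the _security object, e.g. (names are placeholders):

curl -X PUT 'http://admin:pass@localhost:5984/somedb/_security' \
  -H 'Content-Type: application/json' \
  -d '{"admins":{"names":[],"roles":[]},"members":{"names":["alice"],"roles":["staff"]}}'

Once members is non-empty, only those users (and server admins) can read anything in that database; everyone else is rejected for every document and view in it.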

B.

 .

On 1 Jan 2014, at 20:47, Jens Alfke j...@couchbase.com wrote:

 
 On Dec 31, 2013, at 1:44 AM, meredrica st...@meredrica.org wrote:
 
 I expose CouchDB directly to mobile clients and wanted to hide some 
 information from them.
 
 You can’t really do that; there’s no notion of read-protection in CouchDB.
 As a workaround you can put CouchDB behind a proxy or gateway, and restrict 
 the URL patterns that clients are allowed to send.
 
 —Jens
 



Re: [ANNOUNCE] Nick North elected as CouchDB committer

2014-01-01 Thread Robert Newson
Welcome!

On 1 Jan 2014, at 20:20, Simon Metson si...@cloudant.com wrote:

 w00t! 
 
 
 On Wednesday, 1 January 2014 at 19:24, Dave Cottlehuber wrote:
 
 Dear community,
 
 There's nothing like starting off the New Year with a New Committer!!
 
 I am pleased to announce that the CouchDB Project Management Committee
 has elected Nick North as a CouchDB committer.
 
 Apache ID: nicknorth
 
 IRC nick: NickN
 
 Twitter: @_shastra
 
 By default, external contributions to the project follow the
 Review-Then-Commit model. Being a committer means you can follow the
 Commit-Then-Review model. In other words, Nick can now make changes at
 will.
 
 This election was made in recognition of Nick's existing contributions
 and commitment to the project.
 
 Please join me in extending a warm welcome to Nick!
 
 On behalf of the CouchDB PMC,
 
 Dave Cottlehuber 
 
 



Re: Timeout using Erlang views with large documents

2013-12-21 Thread Robert Newson
I filed https://issues.apache.org/jira/browse/COUCHDB-2013 for this.

The patch will be a little more involved than just changing the prompt function 
as the run method does not respect the timeout for many of its clauses. While 
changing the gen_server call to infinity is an easy fix it removes any upper 
limit on execution time of a map or reduce function. Perhaps that’s fine, maybe 
we allow native processes to take forever (in which case we should remove all 
the existing timeout plumbing), but I can’t quite convince myself of that.

On 20 Dec 2013, at 15:10, Adam Kocoloski kocol...@apache.org wrote:

 Hey folks, back to the original question, the native process gen_server 
 respects the timeout internally but the public API in the module still makes 
 a gen_server:call with the default 5 second timeout:
 
 https://github.com/apache/couchdb/blob/1.5.0/src/couchdb/couch_native_process.erl#L62-L63
 
 Contrast this with the OS process version where it sets the timeout on the 
 client call to infinity (thus leaving it to the server to control the flow):
 
 https://github.com/apache/couchdb/blob/1.5.0/src/couchdb/couch_os_process.erl#L51-L58
 
 Teaching the native_process API to do the same would be a welcome change.  Is 
 there a JIRA for this one already?
 
 Adam
 



Re: Timeout using Erlang views with large documents

2013-12-18 Thread Robert Newson
I've confirmed that the native view server honors that timeout. Can
you tell me what:

curl localhost:5984/_config/couchdb/os_process_timeout

returns? You might need to bounce couchdb in any case, as it applies
this timeout setting when it creates the process, and we keep a pool
of them around, so changes to timeout after that won't be picked up
until they're rebuilt. Restarting couchdb is the quickest way to
ensure that.
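
For example, to raise it and then bounce the server (the value is in milliseconds; 60000 is only an illustration):

curl -X PUT http://localhost:5984/_config/couchdb/os_process_timeout -d '"60000"'
# then restart couchdb so the pooled view-server processes are recreated with the new value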

B.


On 18 December 2013 16:20, david martin david.mar...@lymegreen.co.uk wrote:
 Futon on Apache CouchDB 1.2 (according to Futon)
 {couchdb:Welcome,version:1.2.0} according to ?
 CouchDB 1.4.0 Ubuntu according to Package name

 I set os_process_timeout 50 (effective infinity).

  I ALWAYS get the VERY unhelpful message which merely prints the document
 contents.

 Error: timeout   % yes I know this but cannot do anything about it

 {gen_server,call, % it's in a gen_server yes I know this!
 [0.14190.8,   % this is its PID yes I know this!
  {prompt,[map_doc,   % it is a MAP function yes I know
 this!
 {[{_id,61c3f496b9e4c8dc29b95270d9000370}, % it is the document I
 am processing, Yes I know this!
 {_rev,9-e48194151642345e0e3a4a5edfee56e4},
 .

 Yes it is a large and complex document (16K lines to make this happen on
 fast machine much less on Raspberry Pi).
 Yes it uses Erlang view function.
 Yes I DO want it to hog resources until it is finished.
 Yes I am the administrator.
 No  I AM NOT INTERFERING WITH ANYTHING ELSE.
 No I cannot dictate how big or small the document is.
 Yes this is important to me.
 I have not pursued this as I was using rcouch, I could not find the source
 of the timeout message.
 I did not want to have to rebuild to fix this.
 I did not want to bother the Couchdb team as I was using a fork of CouchDB.
 Simlar issues have been raised and no answers forthcoming.
 Mentions of hidden tweaks, this is not good for you, have you got big
 documents  etc.

 How do I get this NOT to timeout?

 On rcouch I would change a value and rebuild a release to fix this (if I
 could identify the source).
 If anybody can give a clue I will test their hypothesis and report back to
 the list.

 --
 David Martin



Re: Timeout using Erlang views with large documents

2013-12-18 Thread Robert Newson
couch_native_server has the set_timeout callback, though. I'll re-test shortly.

B.


On 18 December 2013 18:17, Alexander Shorin kxe...@gmail.com wrote:
 iirc native query server has hardcoded timeout 5000 and ignores
 os_process_timeout setting.
 --
 ,,,^..^,,,


 On Wed, Dec 18, 2013 at 10:05 PM, Robert Newson rnew...@apache.org wrote:
 I've confirmed that the native view server honors that timeout, can
 you tell me what;

 curl localhost:5984/_config/couchdb/os_process_timeout

 returns? You might need to bounce couchdb in any case, as it applies
 this timeout setting when it creates the process, and we keep a pool
 of them around, so changes to timeout after that won't be picked up
 until they're rebuild. restarting couchdb is the quickest way to
 ensure that.

 B.


 On 18 December 2013 16:20, david martin david.mar...@lymegreen.co.uk wrote:
 Futon on Apache CouchDB 1.2 (according to Futon)
 {couchdb:Welcome,version:1.2.0} according to ?
 CouchDB 1.4.0 Ubuntu according to Package name

 I set os_process_timeout 50 (effective infinity).

  I ALWAYS get the VERY unhelpful message which merely prints the document
 contents.

 Error: timeout   % yes I know this but cannot do anything about it

 {gen_server,call, % it's in a gen_server yes I know this!
 [0.14190.8,   % this is its PID yes I know this!
  {prompt,[map_doc,   % it is a MAP function yes I know
 this!
 {[{_id,61c3f496b9e4c8dc29b95270d9000370}, % it is the document I
 am processing, Yes I know this!
 {_rev,9-e48194151642345e0e3a4a5edfee56e4},
 .

 Yes it is a large and complex document (16K lines to make this happen on
 fast machine much less on Raspberry Pi).
 Yes it uses Erlang view function.
 Yes I DO want it to hog resources until it is finished.
 Yes I am the administrator.
 No  I AM NOT INTERFERING WITH ANYTHING ELSE.
 No I cannot dictate how big or small the document is.
 Yes this is important to me.
 I have not pursued this as I was using rcouch, I could not find the source
 of the timeout message.
 I did not want to have to rebuild to fix this.
 I did not want to bother the Couchdb team as I was using a fork of CouchDB.
 Simlar issues have been raised and no answers forthcoming.
 Mentions of hidden tweaks, this is not good for you, have you got big
 documents  etc.

 How do I get this NOT to timeout?

 On rcouch I would change a value and rebuild a release to fix this (if I
 could identify the source).
 If anybody can give a clue I will test their hypothesis and report back to
 the list.

 --
 David Martin



Re: Timeout using Erlang views with large documents

2013-12-18 Thread Robert Newson
Yes, reconfirmed my finding. I added ?LOG_INFO lines to the
set_timeout clause in couch_native_server and it gets the current
os_process_timeout value. That's a bit silly (given it's not an os
process) but at least it's configurable. I stand by my original reply.

B.


On 18 December 2013 18:31, Robert Newson rnew...@apache.org wrote:
 couch_native_server has the set_timeout callback, though. I'll re-test 
 shortly.

 B.


 On 18 December 2013 18:17, Alexander Shorin kxe...@gmail.com wrote:
 iirc native query server has hardcoded timeout 5000 and ignores
 os_process_timeout setting.
 --
 ,,,^..^,,,


 On Wed, Dec 18, 2013 at 10:05 PM, Robert Newson rnew...@apache.org wrote:
 I've confirmed that the native view server honors that timeout, can
 you tell me what;

 curl localhost:5984/_config/couchdb/os_process_timeout

 returns? You might need to bounce couchdb in any case, as it applies
 this timeout setting when it creates the process, and we keep a pool
 of them around, so changes to timeout after that won't be picked up
 until they're rebuild. restarting couchdb is the quickest way to
 ensure that.

 B.


 On 18 December 2013 16:20, david martin david.mar...@lymegreen.co.uk 
 wrote:
 Futon on Apache CouchDB 1.2 (according to Futon)
 {couchdb:Welcome,version:1.2.0} according to ?
 CouchDB 1.4.0 Ubuntu according to Package name

 I set os_process_timeout 50 (effective infinity).

  I ALWAYS get the VERY unhelpful message which merely prints the document
 contents.

 Error: timeout   % yes I know this but cannot do anything about it

 {gen_server,call, % it's in a gen_server yes I know this!
 [0.14190.8,   % this is its PID yes I know this!
  {prompt,[map_doc,   % it is a MAP function yes I know
 this!
 {[{_id,61c3f496b9e4c8dc29b95270d9000370}, % it is the document 
 I
 am processing, Yes I know this!
 {_rev,9-e48194151642345e0e3a4a5edfee56e4},
 .

 Yes it is a large and complex document (16K lines to make this happen on
 fast machine much less on Raspberry Pi).
 Yes it uses Erlang view function.
 Yes I DO want it to hog resources until it is finished.
 Yes I am the administrator.
 No  I AM NOT INTERFERING WITH ANYTHING ELSE.
 No I cannot dictate how big or small the document is.
 Yes this is important to me.
 I have not pursued this as I was using rcouch, I could not find the source
 of the timeout message.
 I did not want to have to rebuild to fix this.
 I did not want to bother the Couchdb team as I was using a fork of CouchDB.
 Simlar issues have been raised and no answers forthcoming.
 Mentions of hidden tweaks, this is not good for you, have you got big
 documents  etc.

 How do I get this NOT to timeout?

 On rcouch I would change a value and rebuild a release to fix this (if I
 could identify the source).
 If anybody can give a clue I will test their hypothesis and report back to
 the list.

 --
 David Martin



Re: Timeout using Erlang views with large documents

2013-12-18 Thread Robert Newson
"There is something hard coded in there and I will find it eventually
and find why it was put there and by whom."

This attitude might discourage people from helping you with your efforts.

B.


On 18 December 2013 22:33, david martin david.mar...@lymegreen.co.uk wrote:
 On 18/12/13 18:05, Robert Newson wrote:

 I've confirmed that the native view server honors that timeout, can
 you tell me what;

 curl localhost:5984/_config/couchdb/os_process_timeout


 restart CouchDB  on 1.2 (latest in Ubuntu) then

 curl david:@localhost:5984/_config/couchdb/os_process_timeout
 50
 rerun gives
 Error: timeout

 {gen_server,call,
 [0.200.0,
  {prompt,[map_doc,
 {[{_id,61c3f496b9e4c8dc29b95270d9000370},
 {_rev,9-e48194151642345e0e3a4a5edfee56e4},
 {test,
  {[{hey,
 {[{_id,
 61c3f496b9e4c8dc29b95270d9000370},}

 Test JSON here ~16K lines

 https://friendpaste.com/6LkCbdENAe1gOZlD9DWCod

 Code as in couchdb/erlang  list in Using the Erlang view server to Educate
 in CouchDB

 I have looked for this for some time hoping next release would fix it.
 There is something hard coded in there and I will find it eventually and
 find why it was put there and by whom.




 returns? You might need to bounce couchdb in any case, as it applies
 this timeout setting when it creates the process, and we keep a pool
 of them around, so changes to timeout after that won't be picked up
 until they're rebuild. restarting couchdb is the quickest way to
 ensure that.

 B.


 On 18 December 2013 16:20, david martin david.mar...@lymegreen.co.uk
 wrote:

 Futon on Apache CouchDB 1.2 (according to Futon)
 {couchdb:Welcome,version:1.2.0} according to ?
 CouchDB 1.4.0 Ubuntu according to Package name

 I set os_process_timeout 50 (effective infinity).

   I ALWAYS get the VERY unhelpful message which merely prints the
 document
 contents.

 Error: timeout   % yes I know this but cannot do anything about it

 {gen_server,call, % it's in a gen_server yes I know this!
  [0.14190.8,   % this is its PID yes I know this!
   {prompt,[map_doc,   % it is a MAP function yes I know
 this!
 {[{_id,61c3f496b9e4c8dc29b95270d9000370}, % it is the
 document I
 am processing, Yes I know this!
 {_rev,9-e48194151642345e0e3a4a5edfee56e4},
  .

 Yes it is a large and complex document (16K lines to make this happen on
 fast machine much less on Raspberry Pi).
 Yes it uses Erlang view function.
 Yes I DO want it to hog resources until it is finished.
 Yes I am the administrator.
 No  I AM NOT INTERFERING WITH ANYTHING ELSE.
 No I cannot dictate how big or small the document is.
 Yes this is important to me.
 I have not pursued this as I was using rcouch, I could not find the
 source
 of the timeout message.
 I did not want to have to rebuild to fix this.
 I did not want to bother the Couchdb team as I was using a fork of
 CouchDB.
 Simlar issues have been raised and no answers forthcoming.
 Mentions of hidden tweaks, this is not good for you, have you got
 big
 documents  etc.

 How do I get this NOT to timeout?

 On rcouch I would change a value and rebuild a release to fix this (if I
 could identify the source).
 If anybody can give a clue I will test their hypothesis and report back
 to
 the list.

 --
 David Martin




 --
 David Martin



Re: New errors introduced after migrating from couchdb v1.2 to v1.5

2013-12-17 Thread Robert Newson
emfile: you ran out of file descriptors.

B.


On 17 December 2013 21:02, Glen Aidukas gaidu...@behaviormatrix.com wrote:
 Hello,

 I am hoping someone knows what my issue might be.  We recently migrated our 
 data from a couchdb v1.2 server over to a new build with more resources 
 running v1.5.

 We are now seeing some errors with the following in the logs.


 {error:{{case_clause,{{badmatch,{error,emfile}},\n   
 [{couch_file,init,1},\n{gen_server,init_it,6},\n  
   {proc_lib,init_p_do_apply,3}]}},\n [{couch_server,handle_info,2},\n  
 {gen_server,handle_msg,5},\n  
 {proc_lib,init_p_do_apply,3}]},reason:{gen_server,call,\n
 [couch_server,\n {open,\z_673_results\,\n   
 [{user_ctx,{user_ctx,null,[],undefined}}]},\n infinity]}}


 Everything else seems to be working properly.

 Any ideas on what the issue may be?

 Thanks!

 -Glen



Re: New errors introduced after migrating from couchdb v1.2 to v1.5

2013-12-17 Thread Robert Newson
Yup, thanks ubuntu/debian for that (longstanding annoyance). By the way, it's
/etc/pam.d/su that matters here, since couchdb su's to the couchdb user during startup.
common-session clearly works too.
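
i.e. the matching line for /etc/pam.d/su would be:

session    required   pam_limits.so

so that limits.conf is consulted when the init script su's to the couchdb user.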

B.


On 17 December 2013 21:56, Glen Aidukas gaidu...@behaviormatrix.com wrote:
 We just figured this out.

 I had placed in file: /etc/security/limits.conf the following:

 *   softnofile  16384
 *   hardnofile  32768

 But it turns out that we also needed to edit: /etc/pam.d/common-session  and 
 add the following line to the end:

 session required pam_limits.so

 Once we did this and restarted, the problem went away! :)

 Thanks for getting back to me though...  :)

 -Glen


 -Original Message-
 From: matt j. sorenson [mailto:m...@sorensonbros.net]
 Sent: Tuesday, December 17, 2013 4:48 PM
 To: user@couchdb.apache.org
 Subject: Re: New errors introduced after migrating from couchdb v1.2 to v1.5

 On Tue, Dec 17, 2013 at 3:27 PM, Robert Newson rnew...@apache.org wrote:

 emfile: you ran out of file descriptors.

 B.


 Can this be solved with a bigger thesaurus? haha (sorry) --Matt


Re: New errors introduced after migrating from couchdb v1.2 to v1.5

2013-12-17 Thread Robert Newson
And, for posterity, you can run:

cat /proc/`pidof beam.smp`/limits

to check that the limit was applied.

B.


On 17 December 2013 22:11, Robert Newson rnew...@apache.org wrote:
 Yup, thanks ubuntu/debian for that (longstanding annoyance). btw, It's
 /etc/pam.d/su though, couchdb su's to the couchdb user during startup.
 common-session clearly works, though.

 B.


 On 17 December 2013 21:56, Glen Aidukas gaidu...@behaviormatrix.com wrote:
 We just figured this out.

 I had placed in file: /etc/security/limits.conf the following:

 *   softnofile  16384
 *   hardnofile  32768

 But it turns out that we also needed to edit: /etc/pam.d/common-session  and 
 add the following line to the end:

 session required pam_limits.so

 Once we did this and restarted, the problem went away! :)

 Thanks for getting back to me though...  :)

 -Glen


 -Original Message-
 From: matt j. sorenson [mailto:m...@sorensonbros.net]
 Sent: Tuesday, December 17, 2013 4:48 PM
 To: user@couchdb.apache.org
 Subject: Re: New errors introduced after migrating from couchdb v1.2 to v1.5

 On Tue, Dec 17, 2013 at 3:27 PM, Robert Newson rnew...@apache.org wrote:

 emfile: you ran out of file descriptors.

 B.


 Can this be solved with a bigger thesaurus? haha (sorry) --Matt


Re: Force the start sequence of a replication

2013-12-12 Thread Robert Newson
Hi,

Add a property called since_seq to your second replication with the
update sequence you wish to start at, like:

{source:source url here, target:target url, since_seq:9}

This was introduced in CouchDB 1.2.0:

* Added optional field `since_seq` to replication objects/documents.
  It allows to bootstrap a replication from a specific source sequence
  number.

Also works at Cloudant.
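
A sketch of the second, continuous replication (URLs, filter name and the sequence value are placeholders; use the last sequence recorded by your completed one-shot run):

curl -X POST http://localhost:5984/_replicate \
  -H 'Content-Type: application/json' \
  -d '{"source":"http://host:5984/a","target":"http://host:5984/b",
       "continuous":true,"filter":"app/files_folders_deleted","since_seq":9012}'

The same fields work in a _replicator document if you want the replication to survive restarts.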

B.

On 12 December 2013 08:16, Zoé Bellot zoe.bel...@cozycloud.cc wrote:
 Hello couchDB users,

 I would like to replicate two databases with a one-shot replication with a
 filter which takes documents with a field 'docType' equals to File or
 Folder. Once, this replication done, I would like to do a continuous
 replication between these two databases with a filter which takes documents
 with a field 'docType' equals to File or Folder and deleted document.

 However, I would like to start the second replication at the last sequence
 of the first replication. I can find the sequence number in the first
 replication information but I don't know how force the second replication
 to start at this sequence.
 Is there a way to do this ?

 Thanks for your answer.

 Sorry for my english, I'm not a native English speaker.

 Zoé


Re: bulk update failing when document has attachments?

2013-12-11 Thread Robert Newson
I think your image\/png is just an artifact of your printing method,
you don't need to escape the forward slash in content_type; see the
example below:

{_id:doc1,_rev:1-96e2a6c78b8bfb227e79e1fbb16873f9,_attachments:{att1:{content_type:image/png,revpos:1,digest:md5-XUFAKrxLKna5cZ2REBfFkg==,length:5,stub:true}}}

B.


On 11 December 2013 12:35, Daniel Gonzalez gonva...@gonvaled.com wrote:
 That would work *only* for that prefix (data:image/png;base64,), or any
 prefix which happens to have the same length. Not very robust.

 I just discovered that the data coming from the front-end comes in data-uri
 format (rfc2397). This should handle any rfc2397 prefix:
 http://stackoverflow.com/a/20518589/647991 (maybe buggy, just implemented).

 Another question: even after removing the data-uri prefix, I am still
 getting problems. I think my content type is not right.

 Must content_type be escaped? That is:

 'content_type': 'image/png', - 'content_type': 'image\/png',

 The only reference I see to that is an example here:
 http://wiki.apache.org/couchdb/HTTP_Document_API#Inline_Attachments

 But no real explanation of why. It seems no other strings must be escaped
 for couchdb. The only requirement that couchdb seems to impose on top of
 json is that the data in the attachment must be in base64 format.

 But now it seems that the content_type must escape the slashes (/). Why? It
 does not seem to be a json feature: slashes are fine in any json string. So
 what is that?

 I would like to know the specificacion for the format expected for
 content_type. Does that have a name? I am calling it escaped mediatype.
 Is it part of a more generic escaping process expected by couchdb, or only
 the content_type is affected? Is there an official name for that?


 On Wed, Dec 11, 2013 at 12:18 PM, Johannes Jörg Schmidt 
 schm...@netzmerk.com wrote:

 data.slice(22)

 2013/12/11 Daniel Gonzalez gonva...@gonvaled.com:
  Thanks, I just realized about this. The base64 is coming from the
  javascript frontend (chose file in a form). So I need to remove the
  prefix data:image/png;base64,.
  Not sure how to do this without rolling my own regexes though.
 
 
  On Wed, Dec 11, 2013 at 12:01 PM, Alexander Shorin kxe...@gmail.com
 wrote:
 
  Hi,
 
  _attachments data should be valid base64 encoded string, while you have:
  
 
 data:image/png;base64,iVBORw0KGgoNSUhEUgAAAM8AAADkCAIAAACwiOf9A3NCSVQICAjb4U/gAAAgAElEQVR4nO...
 
  Chars : and , are invalid for base64.
  --
  ,,,^..^,,,
 
 
  On Wed, Dec 11, 2013 at 2:49 PM, Daniel Gonzalez gonva...@gonvaled.com
 
  wrote:
   Hi,
  
   (SO reference: http://stackoverflow.com/q/20516980/647991. I post
 there
   because formatting makes things much easier to read, replies /
 comments
  are
   well organized, and the up/downvote mechanism works)
  
   I am performing the following operation:
  
   1. Prepare some documents: `docs = [ doc1, doc2, ... ]`. The documents
  have
   *maybe* attachments
   2. I `POST` to `_bulk_docs` the list of documents
   3. I get an `Exception  Problems updating list of documents (length =
  1):
   (500, ('badarg', '58'))`
  
   My `bulk_docs` is (in this case just one):
  
   [   {   '_attachments': {   'image.png': {   'content_type':
   'image/png',
'data':
  
 
 'data:image/png;base64,iVBORw0KGgoNSUhEUgAAAM8AAADkCAIAAACwiOf9A3NCSVQICAjb4U/gAAAgAElEQVR4nO...'}},
   '_id': '08b8fc66-cd90-47a1-9053-4f6fefabdfe3',
   '_rev': '15-ff3d0e8baa56e5ad2fac4937264fb3f6',
   'docmeta': {   'created': '2013-10-01 14:48:24.311257',
  'updated': [   '2013-10-01
 14:48:24.394157',
 '2013-12-11
 08:19:47.271812',
 '2013-12-11
 08:25:05.662546',
 '2013-12-11
 10:38:56.116145']},
   'org_id': 45345,
   'outputs_id': None,
   'properties': {   'auto-t2s': False,
 'content_type': 'image/png',
 'lang': 'es',
 'name': 'dfasdfasdf',
 'text': 'erwerwerwrwerwr'},
   'subtype': 'voicemail-st',
   'tags': ['RRR-ccc-dtjkqx'],
   'type': 'recording'}]
  
   This is the detailed exception:
  
   Traceback (most recent call last):
 File portal_support_ut.py, line 470, in test_UpdateDoc
   self.ps.UpdateDoc(self.org_id, what, doc_id, new_data)
 File
  
 
 /home/gonvaled/projects/new-wavilon-portal/python_modules/wav/ps/complex_ops.py,
   line 349, in UpdateDoc
   success, doc = database.UpdateDoc(doc_id, new_data)
 File
  
 
 /home/gonvaled/projects/new-wavilon-portal/python_modules/wav/cdb/core/updater.py,
   line 38, in UpdateDoc
   res = self.SaveDoc(doc_id, doc)
 File
  
 
 

Re: bulk update failing when document has attachments?

2013-12-11 Thread Robert Newson
http://json.org/string.gif shows escaping for the backslash, not the forward
slash. Page 194 of the PDF talks about escaping the forward slash within a
RegExp statement in JavaScript, which is not JSON.

B.


On 11 December 2013 12:58, Daniel Gonzalez gonva...@gonvaled.com wrote:
 It is not an artifact: I am taking that from the couchdb documentation.

 And according to Alexander Shorin forward slashes **really** need to be
 escaped in json. But it is not me who must do that, but the library
 converting the python objects to couchdb, so that can not be my problem.

 Now I am left with a badarg exception, which I can not relate to my input
 data:

 Exception  Problems updating list of documents (length = 1): (500,
 ('badarg', '46'))

 What does that '46' mean?


 On Wed, Dec 11, 2013 at 1:47 PM, Robert Newson rnew...@apache.org wrote:

 I think your image\/png is just an artifact of your printing method,
 you don't need to escape the forward slash in content_type, see
 example below;


 {_id:doc1,_rev:1-96e2a6c78b8bfb227e79e1fbb16873f9,_attachments:{att1:{content_type:image/png,revpos:1,digest:md5-XUFAKrxLKna5cZ2REBfFkg==,length:5,stub:true}}}

 B.


 On 11 December 2013 12:35, Daniel Gonzalez gonva...@gonvaled.com wrote:
  That would work *only* for that prefix (data:image/png;base64,), or any
  prefix which happens to have the same length. Not very robust.
 
  I just discovered that the data coming from the front-end comes in
 data-uri
  format (rfc2397). This should handle any rfc2397 prefix:
  http://stackoverflow.com/a/20518589/647991 (maybe buggy, just
 implemented).
 
  Another question: even after removing the data-uri prefix, I am still
  getting problems. I think my content type is not right.
 
  Must content_type be escaped? That is:
 
  'content_type': 'image/png', - 'content_type': 'image\/png',
 
  The only reference I see to that is an example here:
  http://wiki.apache.org/couchdb/HTTP_Document_API#Inline_Attachments
 
  But no real explanation of why. It seems no other strings must be escaped
  for couchdb. The only requirement that couchdb seems to impose on top of
  json is that the data in the attachment must be in base64 format.
 
  But now it seems that the content_type must escape the slashes (/). Why?
 It
  does not seem to be a json feature: slashes are fine in any json string.
 So
  what is that?
 
  I would like to know the specificacion for the format expected for
  content_type. Does that have a name? I am calling it escaped mediatype.
  Is it part of a more generic escaping process expected by couchdb, or
 only
  the content_type is affected? Is there an official name for that?
 
 
  On Wed, Dec 11, 2013 at 12:18 PM, Johannes Jörg Schmidt 
  schm...@netzmerk.com wrote:
 
  data.slice(22)
 
  2013/12/11 Daniel Gonzalez gonva...@gonvaled.com:
   Thanks, I just realized about this. The base64 is coming from the
   javascript frontend (chose file in a form). So I need to remove the
   prefix data:image/png;base64,.
   Not sure how to do this without rolling my own regexes though.
  
  
   On Wed, Dec 11, 2013 at 12:01 PM, Alexander Shorin kxe...@gmail.com
  wrote:
  
   Hi,
  
   _attachments data should be valid base64 encoded string, while you
 have:
   
  
 
 data:image/png;base64,iVBORw0KGgoNSUhEUgAAAM8AAADkCAIAAACwiOf9A3NCSVQICAjb4U/gAAAgAElEQVR4nO...
  
   Chars : and , are invalid for base64.
   --
   ,,,^..^,,,
  
  
   On Wed, Dec 11, 2013 at 2:49 PM, Daniel Gonzalez 
 gonva...@gonvaled.com
  
   wrote:
Hi,
   
(SO reference: http://stackoverflow.com/q/20516980/647991. I post
  there
because formatting makes things much easier to read, replies /
  comments
   are
well organized, and the up/downvote mechanism works)
   
I am performing the following operation:
   
1. Prepare some documents: `docs = [ doc1, doc2, ... ]`. The
 documents
   have
*maybe* attachments
2. I `POST` to `_bulk_docs` the list of documents
3. I get an `Exception  Problems updating list of documents
 (length =
   1):
(500, ('badarg', '58'))`
   
My `bulk_docs` is (in this case just one):
   
[   {   '_attachments': {   'image.png': {   'content_type':
'image/png',
 'data':
   
  
 
 'data:image/png;base64,iVBORw0KGgoNSUhEUgAAAM8AAADkCAIAAACwiOf9A3NCSVQICAjb4U/gAAAgAElEQVR4nO...'}},
'_id': '08b8fc66-cd90-47a1-9053-4f6fefabdfe3',
'_rev': '15-ff3d0e8baa56e5ad2fac4937264fb3f6',
'docmeta': {   'created': '2013-10-01 14:48:24.311257',
   'updated': [   '2013-10-01
  14:48:24.394157',
  '2013-12-11
  08:19:47.271812',
  '2013-12-11
  08:25:05.662546',
  '2013-12-11
  10:38:56.116145']},
'org_id': 45345,
'outputs_id': None,
'properties': {   'auto-t2s

Re: bulk update failing when document has attachments?

2013-12-11 Thread Robert Newson
➜  ~  curl localhost:5984/db1/_bulk_docs
-H "content-type: application/json" -d
'{"docs":[{"_attachments":{"foo":{"data":"aGVsbG8="}}}]}'
[{"ok":true,"id":"b5d2060479624e483a8fe4747f001dbe","rev":"1-12c665c499a525a3a1a9ad35c90604a1"}]

➜  ~  curl localhost:5984/db1/b5d2060479624e483a8fe4747f001dbe
{"_id":"b5d2060479624e483a8fe4747f001dbe","_rev":"1-12c665c499a525a3a1a9ad35c90604a1","_attachments":{"foo":{"content_type":"application/octet-stream","revpos":1,"digest":"md5-XUFAKrxLKna5cZ2REBfFkg==","length":5,"stub":true}}}

➜  ~  curl localhost:5984/db1/b5d2060479624e483a8fe4747f001dbe/foo
hello%

Maybe you left escaped newlines in your base64 input?

B.


On 11 December 2013 13:11, Robert Newson rnew...@apache.org wrote:
 http://json.org/string.gif talks escaping back slash, not forward
 slash. The PDF page 194 talks about escaping forward slash within a
 RegExp statement in Javascript, which is not JSON.

 B.


 On 11 December 2013 12:58, Daniel Gonzalez gonva...@gonvaled.com wrote:
 It is not an artifact: I am taking that from the couchdb documentation.

 And according to Alexander Shorin forward slashes **really** need to be
 escaped in json. But it is not me who must do that, but the library
 converting the python objects to couchdb, so that can not be my problem.

 Now I am left with a badarg exception, which I can not relate to my input
 data:

 Exception  Problems updating list of documents (length = 1): (500,
 ('badarg', '46'))

 What does that '46' mean?


 On Wed, Dec 11, 2013 at 1:47 PM, Robert Newson rnew...@apache.org wrote:

 I think your image\/png is just an artifact of your printing method,
 you don't need to escape the forward slash in content_type, see
 example below;


 {"_id":"doc1","_rev":"1-96e2a6c78b8bfb227e79e1fbb16873f9","_attachments":{"att1":{"content_type":"image/png","revpos":1,"digest":"md5-XUFAKrxLKna5cZ2REBfFkg==","length":5,"stub":true}}}

 B.


 On 11 December 2013 12:35, Daniel Gonzalez gonva...@gonvaled.com wrote:
  That would work *only* for that prefix (data:image/png;base64,), or any
  prefix which happens to have the same length. Not very robust.
 
  I just discovered that the data coming from the front-end comes in
 data-uri
  format (rfc2397). This should handle any rfc2397 prefix:
  http://stackoverflow.com/a/20518589/647991 (maybe buggy, just
 implemented).
 
  Another question: even after removing the data-uri prefix, I am still
  getting problems. I think my content type is not right.
 
  Must content_type be escaped? That is:
 
   'content_type': 'image/png'  ->  'content_type': 'image\/png',
 
  The only reference I see to that is an example here:
  http://wiki.apache.org/couchdb/HTTP_Document_API#Inline_Attachments
 
  But no real explanation of why. It seems no other strings must be escaped
  for couchdb. The only requirement that couchdb seems to impose on top of
  json is that the data in the attachment must be in base64 format.
 
  But now it seems that the content_type must escape the slashes (/). Why?
 It
  does not seem to be a json feature: slashes are fine in any json string.
 So
  what is that?
 
   I would like to know the specification for the format expected for
   content_type. Does that have a name? I am calling it "escaped mediatype".
  Is it part of a more generic escaping process expected by couchdb, or
 only
  the content_type is affected? Is there an official name for that?
 
 
  On Wed, Dec 11, 2013 at 12:18 PM, Johannes Jörg Schmidt 
  schm...@netzmerk.com wrote:
 
  data.slice(22)
 
  2013/12/11 Daniel Gonzalez gonva...@gonvaled.com:
   Thanks, I just realized about this. The base64 is coming from the
   javascript frontend (chose file in a form). So I need to remove the
   prefix data:image/png;base64,.
   Not sure how to do this without rolling my own regexes though.
  
  
   On Wed, Dec 11, 2013 at 12:01 PM, Alexander Shorin kxe...@gmail.com
  wrote:
  
   Hi,
  
   _attachments data should be valid base64 encoded string, while you
 have:
   
  
 
 data:image/png;base64,iVBORw0KGgoNSUhEUgAAAM8AAADkCAIAAACwiOf9A3NCSVQICAjb4U/gAAAgAElEQVR4nO...
  
   Chars : and , are invalid for base64.
   --
   ,,,^..^,,,
  
  
   On Wed, Dec 11, 2013 at 2:49 PM, Daniel Gonzalez 
 gonva...@gonvaled.com
  
   wrote:
Hi,
   
(SO reference: http://stackoverflow.com/q/20516980/647991. I post
  there
because formatting makes things much easier to read, replies /
  comments
   are
well organized, and the up/downvote mechanism works)
   
I am performing the following operation:
   
1. Prepare some documents: `docs = [ doc1, doc2, ... ]`. The
 documents
   have
*maybe* attachments
2. I `POST` to `_bulk_docs` the list of documents
3. I get an `Exception  Problems updating list of documents
 (length =
   1):
(500, ('badarg', '58'))`
   
My `bulk_docs` is (in this case just one):
   
[   {   '_attachments': {   'image.png': {   'content_type':
'image/png',
 'data':
   
  
 
 'data:image/png;base64

Re: bulk update failing when document has attachments?

2013-12-11 Thread Robert Newson
forward slash is a
"Unicode-character-except-"-or-\-or-control-character". The picture does
show that you *can* escape a forward slash with \/ but the 'any' track
allows an unescaped forward slash. It's not news that JSON (while
ostensibly simple) is not well-defined, though. I suggest we all just
have some cake.

B.

On 11 December 2013 13:18, Daniel Gonzalez gonva...@gonvaled.com wrote:
 Funny that you do not need to escape it. The spec says you should:

 char
 any-Unicode-character-
 except-"-or-\-or-
 control-character
 \"
 \\
 \/
 \b
 \f
 \n
 \r
 \t
 \u four-hex-digits

 Anyway, my problem has been solved. I am not escaping anything in the
 content_type: the json library is probably doing that. What I need to do
 is to attach real base64 encoded data, which has solved my problem.




 On Wed, Dec 11, 2013 at 2:15 PM, Robert Newson rnew...@apache.org wrote:

 ➜  ~  curl localhost:5984/db1/_bulk_docs
 -H "content-type: application/json" -d
 '{"docs":[{"_attachments":{"foo":{"data":"aGVsbG8="}}}]}'

 [{"ok":true,"id":"b5d2060479624e483a8fe4747f001dbe","rev":"1-12c665c499a525a3a1a9ad35c90604a1"}]

 ➜  ~  curl localhost:5984/db1/b5d2060479624e483a8fe4747f001dbe

 {"_id":"b5d2060479624e483a8fe4747f001dbe","_rev":"1-12c665c499a525a3a1a9ad35c90604a1","_attachments":{"foo":{"content_type":"application/octet-stream","revpos":1,"digest":"md5-XUFAKrxLKna5cZ2REBfFkg==","length":5,"stub":true}}}

 ➜  ~  curl localhost:5984/db1/b5d2060479624e483a8fe4747f001dbe/foo
 hello%

 Maybe you left escaped newlines in your base64 input?

 B.


 On 11 December 2013 13:11, Robert Newson rnew...@apache.org wrote:
  http://json.org/string.gif talks escaping back slash, not forward
  slash. The PDF page 194 talks about escaping forward slash within a
  RegExp statement in Javascript, which is not JSON.
 
  B.
 
 
  On 11 December 2013 12:58, Daniel Gonzalez gonva...@gonvaled.com
 wrote:
  It is not an artifact: I am taking that from the couchdb documentation.
 
  And according to Alexander Shorin forward slashes **really** need to be
  escaped in json. But it is not me who must do that, but the library
  converting the python objects to couchdb, so that can not be my problem.
 
  Now I am left with a badarg exception, which I can not relate to my
 input
  data:
 
  Exception  Problems updating list of documents (length = 1): (500,
  ('badarg', '46'))
 
  What does that '46' mean?
 
 
  On Wed, Dec 11, 2013 at 1:47 PM, Robert Newson rnew...@apache.org
 wrote:
 
  I think your image\/png is just an artifact of your printing method,
  you don't need to escape the forward slash in content_type, see
  example below;
 
 
 
  {"_id":"doc1","_rev":"1-96e2a6c78b8bfb227e79e1fbb16873f9","_attachments":{"att1":{"content_type":"image/png","revpos":1,"digest":"md5-XUFAKrxLKna5cZ2REBfFkg==","length":5,"stub":true}}}
 
  B.
 
 
  On 11 December 2013 12:35, Daniel Gonzalez gonva...@gonvaled.com
 wrote:
   That would work *only* for that prefix (data:image/png;base64,), or
 any
   prefix which happens to have the same length. Not very robust.
  
   I just discovered that the data coming from the front-end comes in
  data-uri
   format (rfc2397). This should handle any rfc2397 prefix:
   http://stackoverflow.com/a/20518589/647991 (maybe buggy, just
  implemented).
  
   Another question: even after removing the data-uri prefix, I am still
   getting problems. I think my content type is not right.
  
   Must content_type be escaped? That is:
  
    'content_type': 'image/png'  ->  'content_type': 'image\/png',
  
   The only reference I see to that is an example here:
   http://wiki.apache.org/couchdb/HTTP_Document_API#Inline_Attachments
  
   But no real explanation of why. It seems no other strings must be
 escaped
   for couchdb. The only requirement that couchdb seems to impose on
 top of
   json is that the data in the attachment must be in base64 format.
  
   But now it seems that the content_type must escape the slashes (/).
 Why?
  It
   does not seem to be a json feature: slashes are fine in any json
 string.
  So
   what is that?
  
    I would like to know the specification for the format expected for
    content_type. Does that have a name? I am calling it "escaped mediatype".
   Is it part of a more generic escaping process expected by couchdb, or
  only
   the content_type is affected? Is there an official name for that?
  
  
   On Wed, Dec 11, 2013 at 12:18 PM, Johannes Jörg Schmidt 
   schm...@netzmerk.com wrote:
  
   data.slice(22)
  
   2013/12/11 Daniel Gonzalez gonva...@gonvaled.com:
Thanks, I just realized about this. The base64 is coming from the
javascript frontend (chose file in a form). So I need to remove
 the
prefix data:image/png;base64,.
Not sure how to do this without rolling my own regexes though.
   
   
On Wed, Dec 11, 2013 at 12:01 PM, Alexander Shorin 
 kxe...@gmail.com
   wrote:
   
Hi,
   
_attachments data should be valid base64 encoded string, while
 you
  have:

   
  
 
 data:image/png;base64

Re: Please add me to ContributorsGroup

2013-12-10 Thread Robert Newson
Hi Michael,

This is the CouchDB user list and the
https://wiki.apache.org/couchdb/People_on_the_Couch page is for users of
CouchDB, not MongoDB.

B.



On 10 December 2013 13:03, Michael Giglhuber m.giglhu...@newelements.dewrote:

  Hi all,

 I would  be glad, if you add me to the 
 ContributorsGrouphttps://wiki.apache.org/couchdb/ContributorsGroupso I can 
 edit the page
 https://wiki.apache.org/couchdb/People_on_the_Couch . We love the mongodb
  and we use it within our Web Analytics / Live Chat Software.

 My Nickname is MichaelGiglhuber



 Best Regards

 Michael







 Michael Giglhuber

 Marketing  Communication Manager





Re: public_field: make sub-field visible

2013-12-09 Thread Robert Newson
Yeah, it only works on top level fields right now.
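
A possible workaround, sketched here as an illustration rather than
anything CouchDB provides: mirror the sub-field into a top-level field
whenever the user doc is saved, and expose that field instead (the
"publicField" name is hypothetical).

    // Copy appData.field to a top-level field before saving the user doc,
    // then expose it with: [couch_httpd_auth] public_fields = publicField
    function withPublicField(userDoc) {
      if (userDoc.appData && userDoc.appData.field !== undefined) {
        userDoc.publicField = userDoc.appData.field;
      }
      return userDoc;
    }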

B.


On 9 December 2013 17:48, Stefan Klein st.fankl...@gmail.com wrote:
 Sorry, hit send to fast. :(

 2013/12/9 Stefan Klein st.fankl...@gmail.com

 Hi couch users,

 i got some application specific data in my user documents and have to make
 one of the fields visible to other users.
 public_field works fine, for top level fields. I try to make a sub field
 visible.

 {
   "_id": "org.couchdb.user:someuser",
   "_rev": "somerev",
   "appData": {
     "field": "should be visible",
     "secretfield": "should not be visible"
   }
 }



 i tried
 [couch_httpd_auth]
 public_fields = appData.field

 and
  public_fields = appData["field"]

 both didn't work, so i guess it is not possible and public_fields only work
 on top level fields?

 Thanks,
 Stefan


Re: Răspuns: CouchDB design doc editor and couch.js sandbox

2013-12-09 Thread Robert Newson
The more tools the better, imo.

B.


On 9 December 2013 22:41, Skitsanos i...@skitsanos.com wrote:
 Salut Dragos,

 I guess you weren't aware of Kanapes IDE (http://kanapeside.com), a fully
 featured CouchDB IDE, made in Bucharest btw...



 On Tuesday, December 10, 2013, Dragos Stoica wrote:

 Hello all,


 I would like to present a tool that we use for ddoc editing and for
 small couch.js tasks.
 The tool is a CouchDB DB itself and was developed with couchapp.


 The tool is availble on github:
 https://github.com/dragos-constantin-stoica/designeditor

 Any feedback is welcome!

 All the best,

 Dragos STOICA
 0735220023



 --
 
 *Skitsanos *-- White Label design and development
 ph.: +1 (941) 840-0716 | email: i...@skitsanos.com | facebook:
 ΣΚΙΤΣΑΝΟΣhttp://www.facebook.com/skitsanoscom
  | web: http://skitsanos.com/


Re: [ANN] CouchDB-Lucene Package availability for OpenSuSE systems

2013-12-08 Thread Robert Newson
Brilliant!

Pull Requests for the features in your fork would be gratefully received
too.
On 8 Dec 2013 15:21, Marcello Barnaba v...@openssl.it wrote:


 Hello list,

 I have built a package of CouchDB-Lucene for OpenSuSE (11.4 ~ 13.1)
 systems.

 It is available on
 https://build.opensuse.org/package/show/home:vjt:ifad/couchdb-lucene. The
 same buildservice repo also contains the latest releases of CouchDB (1.5.0)
 and Erlang (R16).

 This package is built from our CouchDB-Lucene fork (
 https://github.com/ifad/couchdb-lucene), that increases the precisionStep
 of date fields, to allow more granularity in range queries and perform
 exact sorting, allows queries to be POST’ed and includes the full
 ooxml-schemas from Apache POI, allowing Tika to parse Office documents that
 use exotic features.

 We are using this release in production since July 2013, and we didn’t
 encounter any issue with it - it works beautifully, backing an application
 with tens of thousands of CouchDB documents, and some thousands of Office
 attachments.

 Feedback is of course appreciated. :-)

 Enjoy!

 ~Marcello


Re: use startkey/endkey to discribe a range problem

2013-12-08 Thread Robert Newson
CouchDB views are one dimensional so you will not succeed with a two
dimensional geo query. You could try couchdb-lucene which can.
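
If couchdb-lucene (or GeoCouch) isn't an option, a rough workaround --
my sketch, assuming the [lng, lat] keys from the original post -- is to
let the view range narrow only the longitude and filter latitude on the
client:

    // Query ?startkey=[min_lng]&endkey=[max_lng,{}] so the index narrows
    // the first key element, then drop rows outside the latitude band.
    function inBoundingBox(rows, minLat, maxLat) {
      return rows.filter(function (row) {
        var lat = row.key[1];
        return lat >= minLat && lat <= maxLat;
      });
    }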
On 8 Dec 2013 15:51, Qaqabincs luji...@gmail.com wrote:

 I use a view to query an area, and emit [lng, lat] as the key, so I use
 ...?startkey=[min_lng, min_lat]&endkey=[max_lng, max_lat] to find
 all places within the quadrangular area, but the results are exactly
 x∈[min_lng, max_lng], y∈[-∞, +∞] together with x∈[-∞, +∞], y∈[min_lat,
 max_lat] (like a cross-shaped stripe area).

 what is the correct usage of startkey/endkey to express a quadrangular area?

 -
 qaqabincs



Re: What would be the best way to exactly duplicate a document, with attachments?

2013-12-05 Thread Robert Newson
https://wiki.apache.org/couchdb/Replication#Named_Document_Replication ?
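
Roughly (my sketch of that page; the host names are placeholders, and Node
18+'s global fetch is assumed), a named-document replication is a normal
_replicate call with a doc_ids list. Note it preserves the _id, so renaming
to doc_id2 would still need the copy-then-delete step Benoit describes below:

    // Replicate just one document, attachments included, keeping its _id.
    async function replicateOneDoc() {
      const res = await fetch('http://host1:5984/_replicate', {
        method: 'POST',
        headers: { 'content-type': 'application/json' },
        body: JSON.stringify({
          source: 'db1',
          target: 'http://host2:5984/db2',
          doc_ids: ['doc_id1']
        })
      });
      return res.json();
    }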

On 5 December 2013 08:10, Benoit Chesneau bchesn...@gmail.com wrote:
 this is not really possible directly for now.


 maybe copy to a new doc id, replicate this doc id, and delete it on the source?

 (why rename it on the other host?)

 On Thursday, December 5, 2013, Daniel Gonzalez wrote:

 Hi all,

 Let's say I have host1/db1/doc_id1 which I want to duplicate to
 host2/db2/doc_id2.

 The original document has (maybe) attachments. Currently what I am doing
 is:

 1.
 Get doc1. I get an attachment stub.
 2.
 Put this document to host2/db2/doc_id2
 3.
 Now I should loop through all attachments, get them, and put them to the
 new document. Probably something will not be guessed right (content_type,
 ...).

 Instead, I would like to use a method which:

 1.
 Duplicates the document by using a single GET to obtain the original
 documents + attachments, and creates the new document with a single
 PUT/POST (is this possible, no matter the amount of attachments?)
 2.
 Does not use the filesystem to save the attachments
 3.
 Does not use replication
 4.
 Does not recreate the attachments: specifically, no re-guessing of the
 content type and other attachment properties.
 To make it clear: I want an exact copy of the original document, except
 that:

 1.
 The doc_id is different
 2.
 It is in a different host/database than the original document
 3.
 There is no revision

 Is there any built-in couchdb support for this?

 Thanks,
 Daniel


 Link to original SO question: http://stackoverflow.com/q/20386913/647991



Re: Is startkey_docid as scalable as startkey?

2013-12-05 Thread Robert Newson
To be clearer, startkey_docid is *ignored* unless you also specify startkey.

B.


On 5 December 2013 23:23, Robert Newson rnew...@apache.org wrote:
 The question is meaningless, let me explain.

 startkey_docid (and endkey_docid) are used for selecting ranges where
 the view key is the same, it is *not* a separate index. Views are in
 key order only.

 under the covers, the true view key is actually [emitted_key_order,
 doc._id], the rows are unique in the b+tree.

 B.


 On 5 December 2013 23:14, Nathan Vander Wilt nate-li...@calftrail.com wrote:
 Let's say for every doc I `emit([doc.user])` and, when a user requests a 
 document ID I have my middleware `GET 
 …/docs_by_user?startkey=[req.user.name]&endkey=[req.user.name,{}]&include_docs=true&limit=1&startkey_docid=req.param.id`.
  I return the row's doc or 404 if the range is empty. Basically I'm giving 
 each user read access to their own objects without having to give them 
 their own database.

 I'm wondering though, if `startkey_docid` is as scalable as `startkey` 
 itself. IIRC, the doc ids are simply a final extra group level internally 
 (clearly they determine sort order) but if this behaves more like 
 `skip=lots` instead, then of course relying heavily on the query above 
 would be something of an anti-pattern.

 (Bonuses: If this _is_ still a reasonable solution, I'm assuming I can't 
 simplify my emit/query to use `key=name&startkey_docid=id` right? 
 Alternatively, would it be more efficient but just-as-correct to emit plain 
 string keys and limit my range to `startkey=name&endkey=name+\0`?)

 thanks,
 -natevw


Re: Is startkey_docid as scalable as startkey?

2013-12-05 Thread Robert Newson
The question is meaningless, let me explain.

startkey_docid (and endkey_docid) are used for selecting ranges where
the view key is the same, it is *not* a separate index. Views are in
key order only.

under the covers, the true view key is actually [emitted_key_order,
doc._id], the rows are unique in the b+tree.

B.


On 5 December 2013 23:14, Nathan Vander Wilt nate-li...@calftrail.com wrote:
 Let's say for every doc I `emit([doc.user])` and, when a user requests a 
 document ID I have my middleware `GET 
  …/docs_by_user?startkey=[req.user.name]&endkey=[req.user.name,{}]&include_docs=true&limit=1&startkey_docid=req.param.id`.
  I return the row's doc or 404 if the range is empty. Basically I'm giving 
 each user read access to their own objects without having to give them 
 their own database.

 I'm wondering though, if `startkey_docid` is as scalable as `startkey` 
 itself. IIRC, the doc ids are simply a final extra group level internally 
 (clearly they determine sort order) but if this behaves more like 
 `skip=lots` instead, then of course relying heavily on the query above would 
 be something of an anti-pattern.

 (Bonuses: If this _is_ still a reasonable solution, I'm assuming I can't 
 simplify my emit/query to use `key=name&startkey_docid=id` right? 
 Alternatively, would it be more efficient but just-as-correct to emit plain 
 string keys and limit my range to `startkey=name&endkey=name+\0`?)

 thanks,
 -natevw


Re: Is startkey_docid as scalable as startkey?

2013-12-05 Thread Robert Newson
Well, that'll teach me to multi-task and skim emails...

startkey_docid is the same 'scalability' as startkey, in the sense
that startkey and startkey+startkey_docid are O(log n) lookups.

key=name&startkey_docid=id ought to work as key=foo is, internally,
startkey=foo&endkey=foo (possibly verbatim).

To get back to your use case, I'm assuming doc.user is not unique but,
somehow, you know the doc id of the user you're looking for? If so,
why not just use _all_docs?key=req.param.id and don't build the view
at all?
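
For what it's worth, a sketch of the lookup under discussion (names taken
from the example above; the database/ddoc paths and Node 18+ fetch are
assumptions of mine):

    // startkey_docid only narrows rows whose view key equals startkey.
    async function docForUser(name, id) {
      const params = new URLSearchParams({
        startkey: JSON.stringify([name]),
        endkey: JSON.stringify([name, {}]),
        startkey_docid: id,
        include_docs: 'true',
        limit: '1'
      });
      const res = await fetch(
        'http://localhost:5984/db/_design/app/_view/docs_by_user?' + params);
      const body = await res.json();
      return body.rows.length ? body.rows[0].doc : null; // null -> send a 404
    }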




On 5 December 2013 23:23, Robert Newson rnew...@apache.org wrote:
 To be clearer, startkey_docid is *ignored* unless you also specify startkey.

 B.


 On 5 December 2013 23:23, Robert Newson rnew...@apache.org wrote:
 The question is meaningless, let me explain.

 startkey_docid (and endkey_docid) are used for selecting ranges where
 the view key is the same, it is *not* a separate index. Views are in
 key order only.

 under the covers, the true view key is actually [emitted_key_order,
 doc._id], the rows are unique in the b+tree.

 B.


 On 5 December 2013 23:14, Nathan Vander Wilt nate-li...@calftrail.com 
 wrote:
 Let's say for every doc I `emit([doc.user])` and, when a user requests a 
 document ID I have my middleware `GET 
  …/docs_by_user?startkey=[req.user.name]&endkey=[req.user.name,{}]&include_docs=true&limit=1&startkey_docid=req.param.id`.
  I return the row's doc or 404 if the range is empty. Basically I'm giving 
 each user read access to their own objects without having to give them 
 their own database.

 I'm wondering though, if `startkey_docid` is as scalable as `startkey` 
 itself. IIRC, the doc ids are simply a final extra group level internally 
 (clearly they determine sort order) but if this behaves more like 
 `skip=lots` instead, then of course relying heavily on the query above 
 would be something of an anti-pattern.

 (Bonuses: If this _is_ still a reasonable solution, I'm assuming I can't 
  simplify my emit/query to use `key=name&startkey_docid=id` right? 
  Alternatively, would it be more efficient but just-as-correct to emit plain 
  string keys and limit my range to `startkey=name&endkey=name+\0`?)

 thanks,
 -natevw


Re: Why do couchdb reduce functions have to be commutative

2013-12-03 Thread Robert Newson
At your own risk. CouchDB makes no promise not to break reduce
functions that don't follow the rules, though we won't do it
capriciously.

B.


On 3 December 2013 18:00, Oliver Dain opub...@dains.org wrote:
 Hi Robert,

 Thanks very much for the reply. That makes sense.

 I gather this means that if I'm running a single server, at least with
 today's code, commutative isn't required? If so, is that something I can
 count on? For example, if I know my application is quite small and will
 never be sharded, is it safe for me to use a non-commutative reduce?

 Thanks,
 Oliver


 On Tue, Dec 3, 2013 at 9:57 AM, Oliver Dain oli...@dains.org wrote:

 Because the order that we pass keys and values to the reduce function
 is not defined. In sharded situations (like bigcouch, which is being
 merged) an intermediate reduce value on an effectively random subset
 of keys/values is generated at each node and a final rereduce is done
 on all the intermediates. The constraints on reduce functions exist in
 anticipation of clustering.

 B.


 On 1 December 2013 21:45, Oliver Dain opub...@dains.org wrote:
  Hey CouchDB users,
 
  I've just started messing around with CouchDB and I understand why CouchDB
  reduce functions need to be associative, but I don't understand why they
  also have to be commutative. I posted a much more detailed version of this
  question to StackOverflow yesterday, but haven't gotten an answer yet (my
  SO experience says that means I probably won't ever get one). Figured it
  might be smart to explicitly loop in the couch community.
 
  The original StackOverflow question is here:
 
  http://stackoverflow.com/questions/20303355/why-do-couchdb-reduce-functions-have-to-be-commutative
 
  Any thoughts would be appreciated!
 
  Thanks,
  Oliver




Re: Why do couchdb reduce functions have to be commutative

2013-12-02 Thread Robert Newson
Because the order that we pass keys and values to the reduce function
is not defined. In sharded situations (like bigcouch, which is being
merged) an intermediate reduce value on an effectively random subset
of keys/values is generated at each node and a final rereduce is done
on all the intermediates. The constraints on reduce functions exist in
anticipation of clustering.
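
As an illustration (my example, not from the thread; doc.author is a
made-up field), a reduce that only counts is safe however the rows are
grouped or ordered, including the rereduce pass over intermediate values
from different shards:

    // Map: one row per document.
    function (doc) {
      emit(doc.author, 1);
    }

    // Reduce: associative and commutative. In the first pass the values
    // are the emitted 1s; in the rereduce pass they are intermediate
    // sums. sum() gives the right answer in both cases, in any order.
    function (keys, values, rereduce) {
      return sum(values);
    }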

B.


On 1 December 2013 21:45, Oliver Dain opub...@dains.org wrote:
 Hey CouchDB users,

 I've just started messing around with CouchDB and I understand why CouchDB
 reduce functions need to be associative, but I don't understand why they
 also have to be commutative. I posted a much more detailed version of this
 question to StackOverflow yesterday, but haven't gotten an answer yet (my
 SO experience says that means I probably won't ever get one). Figured it
 might be smart to explicitly loop in the couch community.

 The original StackOverflow question is here:

 http://stackoverflow.com/questions/20303355/why-do-couchdb-reduce-functions-have-to-be-commutative

 Any thoughts would be appreciated!

 Thanks,
 Oliver


Re: Doucment Update Conflict in Futon

2013-11-29 Thread Robert Newson
Odd, sounds like Futon is confused. Try clearing your browser cache
and reloading the page. (That or someone else is editing the document
in another window)

B.


On 29 November 2013 09:54, John Norris j...@norricorp.f9.co.uk wrote:
 Just to add, I notice there is a 409 error in the logs - a conflict when the
 _rev value is not part of the post. But I am doing this via Futon which
 should handle this (and has done previously).
 Regards,
 John



Re: couchdb teporaly average view

2013-11-29 Thread Robert Newson
map:

emit(doc.created, doc.value);

reduce:

_stats

then query with startkey and endkey appropriately. This will give you
the sum of all values between the two keys and the number of rows.
Divide one by the other to derive mean average. This will work for
startkey/endkey's that span hours, days or weeks.
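
Put concretely (a sketch; the database and design doc names are
placeholders, and Node 18+ fetch is assumed), the design document and the
client-side division look like this:

    // _design/sensors:
    // { "views": { "temperature": {
    //     "map": "function (doc) { if (doc.type === 'temperature') { emit(doc.created, doc.value); } }",
    //     "reduce": "_stats" } } }

    // Query the reduce over any window and derive the mean from _stats,
    // which returns { sum, count, min, max, sumsqr }.
    async function averageBetween(startMillis, endMillis) {
      const res = await fetch(
        'http://localhost:5984/sensors/_design/sensors/_view/temperature' +
        '?reduce=true&startkey=' + startMillis + '&endkey=' + endMillis);
      const body = await res.json();
      if (!body.rows.length) return null;
      const stats = body.rows[0].value;
      return stats.sum / stats.count;
    }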

B.


On 28 November 2013 20:22, Gerardo Di Iorio aret...@gmail.com wrote:
 Sorry for previous post
 Hi,
 i try to use couchdb for store data sensor.
 My document is
 {
    "_id": "a631c192ffebb7b0d543863925f4e8f9",
    "_rev": "1-9e8bacc2a3b79a2dc37ffeb5c53383f9",
    "source": "sensorbox",
    "location": "living room",
    "type": "temperature",
    "value": 25.3,
    "created": 1358270795751
 }
 I now need to create a view for
 1) average temp every hour
 2) average temp every day
 3) average temp every week

 Is it possible to reuse in avg_day the data of the avg_hour view?
 If yes, can you send me an example?

 Regards
 Gerardo Di Iorio




 2013/11/28 Gerardo Di Iorio aret...@gmail.com

 Hi,
 i try to use couchdb for store data sensor.
 My document is

 {
id:123
source: 




Re: proposed feature - list function /update handler half baby

2013-11-27 Thread Robert Newson
What request would trigger this fold? What arguments would it take?
I'm not sure what's painful about the existing _bulk_docs read and
write APIs, though they exist primarily for bulk importing/exporting; most
database interactions are at the document or view level.

Since the word transaction was mentioned, it's worth remembering
that couchdb supports fully ACID transactions at the individual
document level (and, yes, that's not an oxymoron). This idea would not
(could not) change that. For example, any document updated during this
fold would stay updated, even if the fold failed before reaching the
end.

Is there value in a bulk version of update handlers, I guess?

As for "list function / update handler half baby", I have no idea what
it means...


B.


On 27 November 2013 08:58, Benoit Chesneau bchesn...@gmail.com wrote:
 On Wed, Nov 27, 2013 at 9:46 AM, Alexander Shorin kxe...@gmail.com wrote:

 On Wed, Nov 27, 2013 at 12:38 PM, Benoit Chesneau bchesn...@gmail.com
 wrote:
  On Wed, Nov 27, 2013 at 9:33 AM, Alexander Shorin kxe...@gmail.com
 wrote:
 
  mmm..this would require to be database admin and might not been
  optimal. May be have just update function name there? Also, I believe,
  such call will ignore any custom response from update function, right?
  --
  ,,,^..^,,,
 
 
  The purpose would be to only handle the updates of docs coming in a bulk
  update. The return message won't change,
 
  For the admin rights, I don't see, all the rights working for a bulk
 update
  will be applied there.

 Due to custom source code execution, like in the case of temp views. While
 it might be ok for sandboxed languages, I don't like to have such a
 feature for the Python query server or even the Erlang one, for obvious
 reasons(:


 Well, let the operator decide. Want sandboxed execution? Run couchdb or the
 view script in a protected process, either using a vm, seccomp, limits...
 Other than that, you know what could happen when doing that. Not all
 people are using a couch exposed to the public. Only to the applications.
 And it's not more dangerous than running a script in the view engine,
 either temp views or normal views.





  Note that such function could introduce the possibility to have
  transactions. Imagine you could also access to the database api in such
  function...

 Hm..to have real transaction feature you need to operate with all
 posted documents e.g. this would be not update function, but something
 different with signature (docs, req) // note docs instead of doc


 Yes eventually such functions should have a req, and then we pass doc after
 doc like for the views. That would be better anyway. The response may be
 adapted, I agree. But it would be returned by couchdb, not by the function.
 This function would only return docs.



 For access to the database api inside design functions I'm not
 sure...I'm playing within for lua query server - it's cool and
 very-very powerful feature, but it works just because I could pass
 native Erlang couch_* functions into it. For other languages I fear we
 have to setup some generic RPC service on CouchDB side for that (and
 make a lot of work for public API) and not sure that this is wise
 idea.


 You can pass any call using STDIO too. we just need to make the protocol
 a little more asynchronous, passing messages to the erlang side. What we
 are doing with lists.

 It will be a little faster using a native language but that's all. Anyway,
 these are implementation details. I really think that such an idea could be
 very powerful.



 --
 ,,,^..^,,,



Re: view used startkey endkey problem

2013-11-25 Thread Robert Newson
Views can be used to look up a specific key or a contiguous range of
keys, the original poster is wrong to think that each item in the view
is separately queryable.

That said, [600,69] is greater than [400,50] and less than [1000, 100]
and so should be returned, even in 1.0.4.

B.


On 25 November 2013 22:16, Andy Wenk a...@nms.de wrote:
 I am not sure how it was in old 1.0.4. But is it a problem to upgrade to
 1.5? 1.0.4 is really outdated ...


 On 25 November 2013 15:51, Qaqabincs luji...@gmail.com wrote:

 hi, all,

 I use couchdb-1.0.4, and I use a design view:

 "findBoys": {
   "map": "function(doc) {
       if (doc.boy) {
         emit([doc.boy.height, doc.boy.weight], doc);
       }
     }"
 }

 if I query this view via Futon, there are 13 boys in the list, but if I set
 a range on this view, such as:
 http://localhost:5984/repos/_design/namelist/_view/findBoys?startkey=[400,50]&endkey=[1000,100]
 couchdb returns me a result like:
 {"total_rows":13,"offset":13,"rows":[]}
 but there is at least one boy whose height=600 & weight=69 !

 and, when I use couchdb-1.5.0, the view works all right, it can return
 the exact data.

 are startkey & endkey not supported by couchdb-1.0.4? or am I using
 mistaken syntax?



 Qaqabincs




 --
 Andy Wenk
 Hamburg - Germany
 RockIt!

 http://www.couchdb-buch.de
 http://www.pg-praxisbuch.de

 GPG fingerprint: C044 8322 9E12 1483 4FEC 9452 B65D 6BE3 9ED3 9588


Re: Debian init script stop/restart does not work

2013-11-22 Thread Robert Newson
That would be great!

On 22 November 2013 14:55, Mike Marino mmar...@gmail.com wrote:
 I have definitely had a similar issue, and had to fix the script
 myself ( We use a CRUX distribution, so everything was built from
 scratch ).  If it would help, I could dig up how I fixed it on my
 system.

 On Fri, Nov 22, 2013 at 3:54 PM, Robert Newson rnew...@apache.org wrote:
 Yup, we know. The start/stop code is quite complicated (*too*
 complicated) and seems to go wrong more and more.

 Jan and I are going to spend some time digging into it over the weekend.

 The main issue is that the pid in the pidfile is wrong (generally on
 first start since boot) so stop fails. I've even seen it let two
 couchdb's start at once.

 B.


 On 22 November 2013 14:45, Alexander Uvarov alexander.uva...@gmail.com 
 wrote:
 Debian init script does not work as expected in stop/restart cases. It 
 successfully executes, but process still alive and database still respond. 
 Anyone with same problems?

 Using 1.4, 1.5 built by hands.


Re: Debian init script stop/restart does not work

2013-11-22 Thread Robert Newson
Thanks!

On 22 November 2013 15:42, Mike Marino mmar...@gmail.com wrote:
 Ok, so this is for couchdb 1.3.0 (downloaded tar.gz).  The changes I
 made (please ignore my misspelling of shepherd :-) ) are here:

 https://gist.github.com/mgmarino/7601778

 To summarize, I essentially track the pid of the shepherd program,
 which would otherwise respawn couchdb, and kill this program also
 during a shutdown.  This certainly doesn't address any larger issues
 of the complication of the start / stop, but perhaps it will at least
 be informative.  I also haven't checked to see if anything changed
 between 1.3 and 1.4/1.5, so I hope it's still relevant.

 hth,
 Mike


 On Fri, Nov 22, 2013 at 3:57 PM, Robert Newson rnew...@apache.org wrote:
 That would be great!

 On 22 November 2013 14:55, Mike Marino mmar...@gmail.com wrote:
 I have definitely had a similar issue, and had to fix the script
 myself ( We use a CRUX distribution, so everything was built from
 scratch ).  If it would help, I could dig up how I fixed it on my
 system.

 On Fri, Nov 22, 2013 at 3:54 PM, Robert Newson rnew...@apache.org wrote:
 Yup, we know. The start/stop code is quite complicated (*too*
 complicated) and seems to go wrong more and more.

 Jan and I are going to spend some time digging into it over the weekend.

 The main issue is that the pid in the pidfile is wrong (generally on
 first start since boot) so stop fails. I've even seen it let two
 couchdb's start at once.

 B.


 On 22 November 2013 14:45, Alexander Uvarov alexander.uva...@gmail.com 
 wrote:
 Debian init script does not work as expected in stop/restart cases. It 
 successfully executes, but process still alive and database still 
 respond. Anyone with same problems?

 Using 1.4, 1.5 built by hands.


Re: Need help for Couch DB data export

2013-11-22 Thread Robert Newson
_bulk_docs requires a different input format than _all_docs produces. You can't
pipe one to the other.
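
If replication really isn't an option and you want to reuse the FILE.txt
export, a small conversion step (my sketch, reusing the filenames from the
original mail) turns the _all_docs shape into the {"docs": [...]} shape
that _bulk_docs expects:

    // Sketch: node convert.js FILE.txt > BULK.txt
    const fs = require('fs');

    const exported = JSON.parse(fs.readFileSync(process.argv[2], 'utf8'));
    const docs = exported.rows.map(function (row) {
      const doc = row.doc;
      delete doc._rev;  // let the target database assign fresh revisions
      return doc;       // note: attachment stubs, if any, need separate handling
    });
    process.stdout.write(JSON.stringify({ docs: docs }) + '\n');

    // Then: curl -d @BULK.txt -H "Content-type: application/json" \
    //            -X POST http://localhost:5984/testdb1/_bulk_docs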
On 22 Nov 2013 21:22, Andy Wenk a...@nms.de wrote:

 Hi Sreedhar,


 On 21 November 2013 14:51, Sreedhar P V venkatasridha...@gmail.com
 wrote:

  Hi Team,
 
  I am using couchdb for one of my projects and need to import the client
  data in couchdb. I have good experience on RDBMS but not on NoSQL. I need
  your assistance for the below scenario.
 
  To export the data I have used below curl command:
  curl -X GET "http://127.0.0.1:5984/testdb/_all_docs?include_docs=true" > FILE.txt
 
  To import the FILE.txt, I used _bulk_docs as below:
  curl -d @FILE.txt -H "Content-type: application/json" -X POST http://localhost:5984/testdb1/_bulk_docs
 
 
  Below is the sample data for GET with _all_docs. But it is not
  working for POST with _bulk_docs, which expects "docs" in the txt
  file.
 
  {"total_rows":2,"offset":0,"rows":[
  {"id":"docid1","key":"docid1","value":{"rev":"2-d872217c576ee407a11a5de30ce2bb30"},"doc":{"_id":"docid1","_rev":"2-d872217c576ee407a11a5de30ce2bb30","sdfs":"data"}},
  {"id":"docid2","key":"docid2","value":{"rev":"1-f58d4c586789ca8e166a53ece008b23b"},"doc":{"_id":"docid2","_rev":"1-f58d4c586789ca8e166a53ece008b23b","f2":"dsvsd"}}
  ]}
  We need to import the huge data dump which we got from client. Please
 guide
  me to import the huge client data. Thank you in advance.
 
 
 
  Thanks,
  Sreedhar
 

 don't think in terms of a SQL-dump. The example above looks like you export
 all documents from testdb and you want to put them into testdb1. For this
 job, CouchDB provides the _replication mechanism. It would look something
 like:

 curl -X POST http://127.0.0.1:5984/_replicate \
 -H "content-type: application/json" \
 -d '{"source": "testdb",
  "target": "testdb1",
  "create_target": true}'
 If you just need a subset of the data, you can use a filter; create it
 in a _design document:

 curl -X PUT http://127.0.0.1:5984/kina/_design/default \
 -H "content-type: application/json" \
 -d '{"filters": {
   "country": "function(doc, req) { return \"US\" == doc.lang; }"
 }}'

 Then within the replication:

 curl -X POST http://127.0.0.1:5984/_replicate \
 -H "content-type: application/json" \
 -d '{"source": "testdb",
  "target": "testdb1",
  "create_target": true,
  "filter": "default/country"}'
 For more info on replication please see
 http://docs.couchdb.org/en/latest/intro/api.html#replication

 Cheers

 Andy

 --
 Andy Wenk
 Hamburg - Germany
 RockIt!

 http://www.couchdb-buch.de
 http://www.pg-praxisbuch.de

 GPG fingerprint: C044 8322 9E12 1483 4FEC 9452 B65D 6BE3 9ED3 9588



Re: building couchdb 1.5 on centos 6.4

2013-11-21 Thread Robert Newson
Agreed, the doc does say "There are lots of Erlang packages. If there
is a problem with your install, try a different mix." but that's a bit
weak.

B.


On 21 November 2013 15:51, Vivek Pathak vpat...@orgmeta.com wrote:
 Yes there was a missing dependency to erlang-asn1 on centos 6.4

 Perhaps the wiki document
 (http://docs.couchdb.org/en/latest/install/unix.html ) can just say yum
 install erlang instead of several (but incomplete) erlang-* it installs.

 The complete erl package does get this specific dependency (and any others
 if they were missing)

 Thanks
 Vivek


 On 11/21/2013 10:07 AM, Robert Newson wrote:

 asn1 comes from your erlang install, we don't ship it, but it implies
 you're missing standard parts of erlang. I'm assuming debian or
 ubuntu, therefore apt-get install erlang-asn1 and probably others.
 The policy that forces package maintainers to subdivide erlang because
 it's possible but not wise irks me.

 B.

 On 21 November 2013 14:58, Vivek Pathak vpat...@orgmeta.com wrote:

 Thanks - I tried it.  There is something odd going on.  It says there is
 no
 such app, and while couchdb source seems to refer to it, I can not find
 it
 anywhere in the filesystem.


 Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:8:8] [rq:8]
 [async-threads:0] [kernel-poll:false]

 Eshell V5.8.5  (abort with ^G)
 1 application:start(asn1).
 {error,{no such file or directory,asn1.app}}


 # pwd
 /opt/apache-couchdb-1.5.0/src

 # grep -R asn1 *
 couchdb/couch_app.erl:case start_apps([crypto, asn1, public_key,
 sasl,
 inets, oauth, ssl, ibrowse, syntax_tools, compiler, xmerl, mochiweb,
 os_mon]) of




 On 11/21/2013 09:36 AM, Robert Newson wrote:

 {app_would_not_start,asn1} is pretty telling.

 try 'erl' then application:start(asn1). and see what error you get.
 If it's a not_started for some other app, try starting that one.
 You'll probably do this a few times before finding the thing that
 fails to start. Likely, it will be one that requires zlib or libssl or
 something.

 B.


 On 21 November 2013 14:31, Vivek Pathak vpat...@orgmeta.com wrote:

 Hi

 I was following the steps given in
 http://docs.couchdb.org/en/latest/install/unix.html for couchdb 1.5
 install
 from source.

 After installing dependencies through the yum commands, I could build
 and
 install couchdb.  Next I made changes to its home directory and gave
 permissions for user couchdb to those directories.

 Then I started couchdb but got error :

   sudo -i -u couchdb /usr/local/bin/couchdb
  {init terminating in


 do_boot,{{badmatch,{error,{{app_would_not_start,asn1},{couch_app,start,[normal,[/usr/local/etc/couchdb/default.ini,/usr/local/etc/couchdb/local.ini]],[{couch,start,0},{init,start_it,1},{init,start_em,1}]}}

 Crash dump was written to: erl_crash.dump
 init terminating in do_boot ()

 Both erl and js-devel versions appear to be reasonable for what is
 given
 in
 the install guide.

 The crash dump is a large file.  Is there any guide on how to actually
 locate what is causing the problem?

 Thank you
 Vivek






Re: BigCouch

2013-11-21 Thread Robert Newson
Hi,

There has been a protracted lull in the bigcouch merger work but we're
doing some more at couchhack in December and then a whole lot more in
Q1, hopefully to completion.

We're not yet sure what migration will look like. At worst, it will be
replication based but we're mindful to do better.

0.4 is stable enough but note that it's 2+ years old from the author's
(Cloudant's) point of view and is no longer under active development
or maintenance.

For clarity, the merge is emphatically not the bigcouch code base but,
rather, the code that Cloudant runs in production as of around April.
Once the merge is done, we can look at porting all the core
improvements we've made since then.

B.



On 21 November 2013 16:31, Dan Santner dansant...@me.com wrote:
 I use bigcouch in a production system.  In my case it's been pretty stable 
 but we don't have massive usage yet.

 It was one of the easiest things for me to setup honestly.  And if you don't 
 actually need your own instance running on your own nodes just use cloudant 
 which makes the deployment steps go away :).

 Dan.
 On Nov 21, 2013, at 10:24 AM, Jens Rantil jens.ran...@gmail.com wrote:

 Hi,

 I think I have a use case that would fit BigCouch quite well. This brings
 me to three questions:

   - When is the BigCouch merge expected to be released? Maybe I should
   focus on getting that one working if it's near in time.
   - If I choose to go with the original BigCouch, will it be easy to
   migrate to CouchDB when the BigCouch merge is done?
   - The BigCouch version is 0.4, but I guess it's production ready?


 Cheers,
 Jens



Re: building couchdb 1.5 on centos 6.4

2013-11-21 Thread Robert Newson
{app_would_not_start,asn1} is pretty telling.

try 'erl' then application:start(asn1). and see what error you get.
If it's a not_started for some other app, try starting that one.
You'll probably do this a few times before finding the thing that
fails to start. Likely, it will be one that requires zlib or libssl or
something.

B.


On 21 November 2013 14:31, Vivek Pathak vpat...@orgmeta.com wrote:
 Hi

 I was following the steps given in
 http://docs.couchdb.org/en/latest/install/unix.html for couchdb 1.5 install
 from source.

 After installing dependencies through the yum commands, I could build and
 install couchdb.  Next I made changes to its home directory and gave
 permissions for user couchdb to those directories.

 Then I started couchdb but got error :

 sudo -i -u couchdb /usr/local/bin/couchdb
{init terminating in
 do_boot,{{badmatch,{error,{{app_would_not_start,asn1},{couch_app,start,[normal,[/usr/local/etc/couchdb/default.ini,/usr/local/etc/couchdb/local.ini]],[{couch,start,0},{init,start_it,1},{init,start_em,1}]}}

 Crash dump was written to: erl_crash.dump
 init terminating in do_boot ()

 Both erl and js-devel versions appear to be reasonable for what is given in
 the install guide.

 The crash dump is a large file.  Is there any guide on how to actually
 locate what is causing the problem?

 Thank you
 Vivek





Re: Considering CouchDB

2013-11-20 Thread Robert Newson
1) "a stop the world lock when writing to disk"

There's no such thing in couchdb. Databases are append-only, there's a
single writer per database, but concurrent PUT/POST requests are faster than
serial anyway, and writes to different databases are fully independent.

2) Stack traces are hard to read, not impossible, but couchdb will
send useful errors, we don't just dump stack traces (that is, the
crash only thing does not extend to the API)

3) Compaction can take a long time, sure. Factor it into your planning
(that is, separate your data into multiple databases along natural
partitions). The bigcouch merge will add sharding, which divides this
problem by a configurable factor.

Your two remaining questions are too nebulous to answer but seem to be
predicated on couchdb being clustered, which it currently isn't.

B.




On 20 November 2013 15:53, Diogo Moitinho de Almeida diogo...@gmail.com wrote:
 Hello,

 Based on the research that I've done, CouchDB seems like a very good fit
 for the problem I'm trying to solve, but when talking to people from within
 the company, they've expressed that there are some unadvertised downsides
 to it (though they tried using it 3 years ago) and that we would have to
 migrate fairly quickly off of it.

 The worries that they expressed were:
 1. Scalability (with a stop the world lock when writing to disk).
 2. Erlang stack traces not being the easiest to deal with. (I assumed with
 the crash only design, that common problems could be solved by restarting
 a node).
 3. Garbage collection being a very slow process.

 And questions that I had were:
 1. Would anyone have approximate numbers on how it scales? In terms of
 writes, size of the data, and cluster size? How big of a cluster have
 people gotten working without issues? Have people tested this out in the
 petabyte range?
 2. Would anyone have approximate times on how long a map reduce query could
 take (with whatever sized data, I'm just curious)?

 The use that I'd be looking at is about 200 writes/second of documents 50
 kB each with updates fairly rarely, though if everything goes well, the
 documents could be a bit larger with a more writes and a lot more updates.

 Thanks in advance for any help,
 Diogo


Re: Considering CouchDB

2013-11-20 Thread Robert Newson
I guess this was released from moderation by someone that didn't see
your other email after you subscribed, let's consider this thread
dead?

B.


On 19 November 2013 21:16, Diogo Moitinho de Almeida diogo...@gmail.com wrote:
 Hello,

 Based on the research that I've done, CouchDB seems like a very good fit for
 the problem I'm trying to solve, but when talking to people from within the
 company, they've expressed that there are some unadvertised downsides to it
 (though they tried using it 3 years ago) and that we would have to migrate
 fairly quickly off of it.

 The worries that they expressed were:
 1. Scalability (with a stop the world lock when writing to disk).
 2. Erlang stack traces not being the easiest to deal with. (I assumed with
 the crash only design, that common problems could be solved by restarting
 a node).
 3. Garbage collection being a very slow process.

 And questions that I had were:
 1. Would anyone have approximate numbers on how it scales? In terms of
 writes, size of the data, and cluster size? How big of a cluster have people
 gotten working without issues? Have people tested this out in the petabyte
 range?
 2. Would anyone have approximate times on how long a map reduce query could
 take (with whatever sized data, I'm just curious)?

 The use that I'd be looking at is about 200 writes/second of documents 50
 kB each with updates fairly rarely, though if everything goes well, the
 documents could be a bit larger with a more writes and a lot more updates.

 Thanks in advance for any help,
 Diogo


Re: Considering CouchDB

2013-11-20 Thread Robert Newson
"A write requires updating views and reads have to wait for the update"

is not true. Database writes are not coupled to view updates.
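
To make the decoupling concrete (my sketch, not from the thread; the
database/ddoc/view names are placeholders and Node 18+ fetch is assumed):
a view read triggers the index catch-up by default, and a reader that
prefers speed over freshness can opt out with stale=ok:

    async function readStale(db, ddoc, view) {
      // stale=ok serves whatever is already indexed without updating the
      // view first; stale=update_after serves it, then kicks off the update.
      const res = await fetch(
        `http://localhost:5984/${db}/_design/${ddoc}/_view/${view}?stale=ok`);
      return res.json();
    }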

Sent from my iPad

 On 20 Nov 2013, at 20:59, Mark Hahn m...@reevuit.com wrote:
 
 A write requires updating views and reads have
 to wait for the update

