internal_server_error : No DB shards could be opened.

2017-02-06 Thread Garth Gutenberg
Hey guys.  I'm having a problem that I hope someone can shed some light on.

I have a 3-node cluster.  I just imported 420 DBs into node 0 (about 20 GB
on disk) via bulk insert and triggered view indexes along the way.  Nodes 1
and 2 were happily replicating (or whatever the cluster term for that is
now), and all was good.  The import completed and all the boxes were dormant.
However when I load Fauxton on node 1, I get the following message beside
each DB:

This database failed to load.


Each DB has the following log entries for it when it's being accessed on
this node:

[error] 2017-02-06T17:35:32.309294Z
couc...@couchdb1.aries.aws.weeverapps.com.aries.aws.weeverapps.com
<0.1206.9> 0ef0b93a1b req_err(1995524407) internal_server_error : No DB
shards could be opened.
[<<"fabric_util:get_shard/4 L180">>,<<"fabric:get_security/2
L146">>,<<"chttpd_auth_request:db_authorization_check/1
L87">>,<<"chttpd_auth_request:authorize_request/1
L19">>,<<"chttpd:process_request/1 L291">>,<<"chttpd:handle_request_int/1
L229">>,<<"mochiweb_http:headers/6 L122">>,<<"proc_lib:init_p_do_apply/3
L237">>]
[notice] 2017-02-06T17:35:32.309544Z
couc...@couchdb1.aries.aws.weeverapps.com.aries.aws.weeverapps.com
<0.1206.9> 0ef0b93a1b couchdb1.aries.aws.weeverapps.com:5984 10.150.0.42
undefined GET /app_18950%2Fconfig 500 ok 1

It looks like it's appending the search domain to the FQDN for some reason,
but only on this node.  Also, if I query for membership I get:

{"all_nodes":["couc...@couchdb0.aries.aws.weeverapps.com",
"couc...@couchdb2.aries.aws.weeverapps.com"],
"cluster_nodes":["couc...@couchdb0.aries.aws.weeverapps.com",
"couc...@couchdb1.aries.aws.weeverapps.com",
"couc...@couchdb2.aries.aws.weeverapps.com"]}

Both nodes 0 and 2 appear to be operating fine.  Thankfully this is still
in a lab environment, but we'd really like to get this into production, so
we'd like to understand and solve this problem ASAP.
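
In the membership output above, couchdb1 is listed under cluster_nodes but
not under all_nodes.  Here is a minimal Python sketch for spotting that kind
of mismatch in a parsed /_membership response.  The function name and the
node names in the sample are made up for illustration (the real names are
truncated above), and fetching the JSON is left to whatever HTTP client you
prefer.

```python
def missing_nodes(membership):
    """Return nodes listed in cluster_nodes that are absent from all_nodes."""
    connected = set(membership.get("all_nodes", []))
    return sorted(
        node for node in membership.get("cluster_nodes", [])
        if node not in connected
    )


# Hypothetical node names, shaped like the /_membership output above.
example = {
    "all_nodes": [
        "couchdb@node0.example.com",
        "couchdb@node2.example.com",
    ],
    "cluster_nodes": [
        "couchdb@node0.example.com",
        "couchdb@node1.example.com",
        "couchdb@node2.example.com",
    ],
}

print(missing_nodes(example))  # ['couchdb@node1.example.com']
```

An empty list means both lists name the same nodes.  Anything else is a node
that is configured as part of the cluster but does not show up in all_nodes,
which is worth checking against the node name that node actually started
with.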


How do you know when a new node in a cluster is "synced"?

2017-01-25 Thread Garth Gutenberg
Scenario:

I have a three node cluster.  One of the nodes goes offline (server dies,
whatever).  I bring up a new node with no data and it starts sync'ing with
the other nodes in the cluster.

How do I know when this sync is complete and the new node has all the
data?  I'm dealing with thousands of DBs, so doing a doc count in each one
isn't really feasible - at least not in a timely manner.  Is there a log or
API somewhere that indicates completion of data synchronization from a
server perspective, not just individual DBs?


Re: Fauxton does not URLencode its links

2017-01-25 Thread Garth Gutenberg
Hey guys.  Sorry to revive this, but I don't think this fix is fully
working.  In general it seems ok, but it's not applying to Views.  Steps to
reproduce:

Create a DB called "a/b".
Create a new View (_design/test with the view name "new-view" is fine).
Under Design Documents in the left nav you'll see "test".  Expand it and
click on "new-view".  Note that the DB name is *not* URL-encoded, and
navigation breaks.
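
To illustrate what the encoding should look like, here is a small Python
sketch that builds the CouchDB HTTP API path for a view with every path
segment percent-encoded.  The helper name is made up, and this is the HTTP
API path rather than Fauxton's own internal route format, which I have not
checked.

```python
from urllib.parse import quote


def view_path(db_name, design_doc, view_name):
    """Build /{db}/_design/{ddoc}/_view/{view} with each segment encoded."""
    # safe="" makes quote() encode "/" too; by default it leaves "/" alone,
    # which is exactly the failure mode for a DB named "a/b".
    return "/{}/_design/{}/_view/{}".format(
        quote(db_name, safe=""),
        quote(design_doc, safe=""),
        quote(view_name, safe=""),
    )


print(view_path("a/b", "test", "new-view"))
# /a%2Fb/_design/test/_view/new-view
```

Without the encoding, the slash in "a/b" is read as a path separator and the
request is routed to a database called "a".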

On Thu, Dec 1, 2016 at 1:27 PM, Garth Gutenberg <garth.gutenb...@gmail.com>
wrote:

> Thanks Robert.  I'll test that out next week.
>
> On Wed, Nov 30, 2016 at 5:04 PM, Robert Kowalski <r...@kowalski.gd> wrote:
>
>> Hi Garth,
>>
>> I think it is already fixed on master:
>> https://github.com/apache/couchdb-fauxton/commit/1aa4ca6f34a718c294a06a1301f39fe05f157a1c
>>
>>
>> On Wed, Nov 30, 2016 at 10:25 PM, Garth Gutenberg
>> <garth.gutenb...@gmail.com> wrote:
>> > Hey folks.  Fauxton is not URL encoding any of its navigation links in
>> > Couch 2.0.  It's a pretty major issue for us as all of our databases use
>> > slashes in their names, which makes navigating around Fauxton unbearable.
>> >
>> > I created a ticket for it here:
>> > https://issues.apache.org/jira/browse/COUCHDB-3229
>> >
>> > Can someone please take a look at this issue?
>>
>
>


Re: How do you know when a new node in a cluster is "synced"?

2017-01-25 Thread Garth Gutenberg
Kind of a follow-up question to this.  I've found in my testing that when a
new node comes online in a cluster, it only syncs the raw data, but not the
views.  Is there a way to enable syncing of views across cluster nodes as
well?  Basically I want all the nodes in my cluster to be exact replicas of
each other.  We have some relatively large DBs (~4GB) whose views take
a while to generate.

To expand on the previous scenario, if the downed node comes up without any
views, and a client hits it, that client needs to wait for the view to be
generated - even though it exists on the other nodes in the cluster.  And
that wait time can be 15-30 mins in some cases, which really isn't
acceptable when the view is already generated, just not on this particular
node.

On Wed, Jan 25, 2017 at 8:59 AM, Garth Gutenberg <garth.gutenb...@gmail.com>
wrote:

> Scenario:
>
> I have a three node cluster.  One of the nodes goes offline (server dies,
> whatever).  I bring up a new node with no data and it starts sync'ing with
> the other nodes in the cluster.
>
> How do I know when this sync is complete and the new node has all the
> data?  I'm dealing with thousands of DBs, so doing a doc count in each one
> isn't really feasible - at least not in a timely manner.  Is there a log or
> API somewhere that indicates completion of data synchronization from a
> server perspective, not just individual DBs?
>


Re: Fauxton does not URLencode its links

2017-01-25 Thread Garth Gutenberg
Is there an ETA on when this will be complete?

On Wed, Jan 25, 2017 at 12:07 PM, Garren Smith <gar...@apache.org> wrote:

> Hi Garth,
>
> You are right. We have done the encoding for the _all_dbs section but
> haven't finished the work for the database page.
>
> Cheers
> Garren
>


Re: How do you know when a new node in a cluster is "synced"?

2017-01-25 Thread Garth Gutenberg
Great info.  Thanks!

On Wed, Jan 25, 2017 at 12:11 PM, Paul Davis <paul.joseph.da...@gmail.com>
wrote:

> Garth,
>
> The way to tell when a cluster is sync'ed is by looking at the
> `internal_replication_jobs` key in the JSON blob returned from the
> _system endpoint on the 5984 port from each node in the cluster. Once
> it's zero (or close to it) on each node you're done getting data to the
> new node. Though it can lie sometimes if there's a busy cluster and
> things get wonky. To be extra sure you can run this on each node in
> the cluster:
>
> `mem3_sync:initial_sync(nodes()).`
>
> Basically all that does is queue up every database for internal
> replication. If that doesn't change the count from zero then you
> should be good to go.
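
A minimal Python sketch of the check described above, assuming you have
already fetched and parsed the _system JSON from each node.  The function
name, node labels and sample numbers are made up; only the
`internal_replication_jobs` key comes from the description above.

```python
def nodes_still_syncing(system_by_node, threshold=0):
    """Return {node: job_count} for nodes that still have pending jobs.

    system_by_node maps a node label to its parsed _system JSON.  A node
    whose response lacks the key is reported with None, since its state
    is unknown rather than known to be clean.
    """
    pending = {}
    for node, system in system_by_node.items():
        jobs = system.get("internal_replication_jobs")
        if jobs is None or jobs > threshold:
            pending[node] = jobs
    return pending


# Made-up sample responses, trimmed to the one key of interest.
sample = {
    "node0": {"internal_replication_jobs": 0},
    "node1": {"internal_replication_jobs": 37},
    "node2": {"internal_replication_jobs": 0},
}

print(nodes_still_syncing(sample))  # {'node1': 37}
```

Since the count can be "close to" zero rather than exactly zero on a busy
cluster, the threshold is a parameter.  An empty result means every node is
at or below it.
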
>
>
> To your second question, it would be impossible to automatically sync
> a view when rebuilding it via internal replication. Each view depends
> on the order of updates to the shard. With internal replication that
> order is undefined and would change each time during a rebuild as the
> order of updates via internal replication is approximately random.
> Also, it most definitely would not match the order of the source shard
> when documents have been updated.
>
> However, the answer to your specific problem is to use
> maintenance_mode on the node being rebuilt. Before you first boot the
> node you're wanting to rebuild (or before you connect it to the
> cluster) you just need to set the `[couchdb] maintenance_mode = true`
> parameter. This prevents the node from participating in any
> interactive requests, which prevents it from responding to view queries
> before it's built its views. Then you just need to watch _active_tasks
> for your view to build before setting maintenance_mode back to false
> or deleting it.
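
A rough Python sketch for the "watch _active_tasks" step, assuming the
parsed JSON array from _active_tasks is already in hand.  The function name
and the sample tasks are made up for illustration.  The sketch relies on
view builds appearing as tasks of type "indexer" with "database",
"design_document" and "progress" fields.

```python
def running_index_builds(active_tasks):
    """Return (database, design_document, progress) for each indexer task."""
    return [
        (task.get("database"), task.get("design_document"), task.get("progress"))
        for task in active_tasks
        if task.get("type") == "indexer"
    ]


# Made-up sample: one view build in progress plus an unrelated task.
sample_tasks = [
    {
        "type": "indexer",
        "database": "shards/00000000-7fffffff/demo.1485360000",
        "design_document": "_design/test",
        "progress": 42,
    },
    {"type": "replication", "source": "a", "target": "b"},
]

for database, design_doc, progress in running_index_builds(sample_tasks):
    print(database, design_doc, "{}%".format(progress))
```

An empty result only means nothing is building at that moment.  It does not
prove that a build was ever triggered for every shard.
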
>
> You may also want to make a view query to the individual shards over
> the 5986 node local port as well to make sure that there was a build
> triggered for each shard.
>
> Paul


Fauxton does not URLencode its links

2016-11-30 Thread Garth Gutenberg
Hey folks.  Fauxton is not URL encoding any of its navigation links in
Couch 2.0.  It's a pretty major issue for us as all of our databases use
slashes in their names, which makes navigating around Fauxton unbearable.

I created a ticket for it here:
https://issues.apache.org/jira/browse/COUCHDB-3229

Can someone please take a look at this issue?


/_stats not working on 2.0

2016-12-01 Thread Garth Gutenberg
When I try hitting localhost:5984/_stats I get:

{"error":"not_found","reason":"Database does not exist."}

On my 1.6 box it returns stats as expected.  All of the 2.0 docs still
refer to /_stats, but it's not working for me.

Of note, I'm testing this against a cluster setup.  Could that be the
problem?  Is /_stats not supported in a cluster?  If not, what is the
equivalent?


Re: Fauxton does not URLencode its links

2016-12-01 Thread Garth Gutenberg
Thanks Robert.  I'll test that out next week.

On Wed, Nov 30, 2016 at 5:04 PM, Robert Kowalski <r...@kowalski.gd> wrote:

> Hi Garth,
>
> I think it is already fixed on master:
> https://github.com/apache/couchdb-fauxton/commit/1aa4ca6f34a718c294a06a1301f39fe05f157a1c
>
>
> On Wed, Nov 30, 2016 at 10:25 PM, Garth Gutenberg
> <garth.gutenb...@gmail.com> wrote:
> > Hey folks.  Fauxton is not URL encoding any of its navigation links in
> > Couch 2.0.  It's a pretty major issue for us as all of our databases use
> > slashes in their names, which makes navigating around Fauxton unbearable.
> >
> > I created a ticket for it here:
> > https://issues.apache.org/jira/browse/COUCHDB-3229
> >
> > Can someone please take a look at this issue?
>


Re: /_stats not working on 2.0

2016-12-01 Thread Garth Gutenberg
Awesome!  Works like a charm.  Thanks.

On Thu, Dec 1, 2016 at 4:00 PM, Eiri <e...@eiri.ca> wrote:

> Hi Garth,
>
> Stats are node-specific, so in 2.0 they are on the node interface,
> /_node/{node@name}/_stats, e.g. /_node/node1@127.0.0.1/_stats
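
A small Python sketch that builds the per-node stats URL from the pattern
above.  The helper names and the base URL are assumptions for illustration.
The node name is the example from this message, and the list of nodes could
come from the cluster_nodes field of /_membership.

```python
def node_stats_url(base_url, node_name):
    """Build the node-specific stats URL: {base}/_node/{node@name}/_stats."""
    return "{}/_node/{}/_stats".format(base_url.rstrip("/"), node_name)


def all_stats_urls(base_url, cluster_nodes):
    """Build one stats URL per node name, e.g. from /_membership."""
    return [node_stats_url(base_url, node) for node in cluster_nodes]


print(node_stats_url("http://localhost:5984", "node1@127.0.0.1"))
# http://localhost:5984/_node/node1@127.0.0.1/_stats
```

Because stats are per node, getting a cluster-wide picture means fetching
each URL and combining the results yourself.
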
>
>
> Regards,
> Eric
>
> > On Dec 1, 2016, at 15:26, Garth Gutenberg <garth.gutenb...@gmail.com> wrote:
> >
> > When I try hitting localhost:5984/_stats I get:
> >
> > {"error":"not_found","reason":"Database does not exist."}
> >
> > On my 1.6 box it returns stats as expected.  All of the 2.0 docs still
> > refer to /_stats, but it's not working for me.
> >
> > Of note, I'm testing this against a cluster setup.  Could that be the
> > problem?  Is /_stats not supported in a cluster?  If not, what is the
> > equivalent?
>
>


Restoring a single DB in a cluster

2017-03-26 Thread Garth Gutenberg
Hello developers.  I can't find any documentation on this, so if it exists,
please point me at it.  Otherwise, I need some input.

What is the official, sanctioned way to restore a DB in a cluster
configuration?

Scenario:
3 servers in a cluster.  A client deletes a bunch of docs from their DB, or
some other DB-specific catastrophe happens.  What is the correct approach
for restoring this DB from a backup?  Let's assume that I have disk
snapshots of every node in the cluster taken at *approximately* the same
time from the day before.

In the old days, pre-cluster, it was as simple as just copying the
.couch file... but with shards and clusters, this just doesn't seem
feasible.  Any input would be greatly appreciated.