Thanks for the script, Alan.

I tried setting up a basic replication between localhost endpoints
on Erlang 22 with the 3.2.1 release, and that seems to work:
https://gist.github.com/nickva/5a89198c62fdd3ec97693c87833d5738

Looking at the differences between our setups I noticed a few things:

 * I haven't tried TLS on the endpoints; I wonder if that's the cause.
Would you be able to try it locally (or via a VPN) without TLS?
Sometimes an Erlang release gets installed or built without crypto
support, and the problem only shows up at runtime when something
tries to use that functionality. If you get a chance to find and run
a remsh script, check the output of:

crypto:info_lib().
[{<<"OpenSSL">>,269488239,
  <<"OpenSSL 1.1.1f  31 Mar 2020">>}]

ssl:versions().
[{ssl_app,"9.2"},
 {supported,['tlsv1.2']},
 {supported_dtls,['dtlsv1.2']},
 {available,['tlsv1.3','tlsv1.2','tlsv1.1',tlsv1,sslv3]},
 {available_dtls,['dtlsv1.2',dtlsv1]}]

Output along those lines would indicate you have the crypto
application installed and linked to the OpenSSL library.

 * Another potential issue is that the curl script wraps the JSON
body in single quotes:

curl -X POST http://$USERPASS@localhost:5984/_replicator -d \
'{"source":"http://$USERPASS@localhost:5984/workqueue_inbox/",
"target":"https://$REMOTEHOST/couchdb/workqueue/", "continuous":true}' \
-H "Content-Type: application/json"

That would make the target the literal
`https://$REMOTEHOST/couchdb/workqueue/` string, since the shell does
not substitute $REMOTEHOST (or $USERPASS in the source) inside single
quotes. That's probably not the cause here, but I thought I'd mention
it just in case.

  * `https://$REMOTEHOST/couchdb/workqueue`. I could see the db / url
parser being confused by the url path there: the path
/couchdb/workqueue could be split up so that couchdb is taken as the
database and workqueue as the document, whereas in this case
workqueue is actually the database. Would you be able to test a setup
where the URL path looks like http://domain.name.ext/dbname for an
endpoint?
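
To illustrate the quoting point above, here is a minimal sketch of the
difference between single and double quotes. The USERPASS and
REMOTEHOST values are made-up placeholders, and the curl call is left
as a comment so nothing is sent anywhere:

```shell
#!/bin/sh
# Placeholder values, assumed for illustration only.
USERPASS='user:pass'
REMOTEHOST='remote.example.org'

# Single quotes: the shell leaves $REMOTEHOST untouched.
literal='{"target":"https://$REMOTEHOST/couchdb/workqueue/"}'

# Double quotes: variables are expanded; the inner JSON quotes are escaped.
expanded="{\"source\":\"http://$USERPASS@localhost:5984/workqueue_inbox/\",\"target\":\"https://$REMOTEHOST/couchdb/workqueue/\",\"continuous\":true}"

printf '%s\n' "$literal"
printf '%s\n' "$expanded"

# The request could then be sent like this (not executed here):
# curl -X POST "http://$USERPASS@localhost:5984/_replicator" \
#      -H "Content-Type: application/json" -d "$expanded"
```

The first line printed still contains the text $REMOTEHOST, while the
second one carries the substituted host name.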

Do the logs show any stack traces besides the one you indicated in
the initial email? Anything with a module name and a line number
would help.
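
One way to look for such entries is a rough grep for the {file,...}
and {line,N} tuples that Erlang stack frames carry. This is only a
sketch: the log path below is an assumption and depends on how CouchDB
was installed, and find_stack_frames is a helper name made up here.

```shell
#!/bin/sh
# Assumed log location; point LOG at wherever your CouchDB writes its log.
LOG="${LOG:-/var/log/couchdb/couchdb.log}"

# Print, with line numbers, the log lines that look like Erlang stack
# frames, i.e. that contain a {file,"....erl"} or a {line,N} tuple.
find_stack_frames() {
    grep -nE '\{file,"[^"]+\.erl"\}|\{line,[0-9]+\}' "$1"
}

if [ -r "$LOG" ]; then
    find_stack_frames "$LOG" || echo "no stack frames found in $LOG"
fi
```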

Thanks,
-Nick

On Sat, Feb 26, 2022 at 4:24 PM Alan Malta <alanma...@gmail.com> wrote:
>
> Hi Nick,
>
> Thank you for your prompt response.
>
> Yes, I confirm that CouchDB 3.1.2 is running with Erlang 22; and that user
> and password only have basic chars a-z.
>
> I wiped out all my setup, started from scratch and managed to reproduce
> this replication issue with the following set
> of commands:
> https://gist.github.com/amaltaro/67bd133c519300fb82dd0cad372cf1a0
>
> while reproducing it, I defined only one way replication. However, my
> previous setup had it bi-directional and both
> of them were in a failed state. I also added some extra checks and
> information in the gist above, in case it turns out
> to be helpful.
>
> I haven't yet tried to replicate data among two instances running the same
> version. Reason is, during this migration,
> I believe it will be impossible to swap all my services to the new CouchDB
> version, so there should be a period of
> time (around a month) where I will need to keep this hybrid setup.
>
> Thank you again!
> Alan.
>
> On Sat, Feb 26, 2022 at 12:26 PM Nick Vatamaniuc <vatam...@gmail.com> wrote:
>
> > Hi Alan,
> >
> > Thanks for reaching out.
> >
> > It looks like CouchDB had failed to parse the replication document,
> > and couldn't turn it into a proper replication job.
> >
> > The 'undef' error could suggest running on an unsupported version of
> > Erlang. It's a generic "this function doesn't exist" error in Erlang.
> > Are you running on at least Erlang 20?
> >
> > Does the target url have any unusual characters in it, or something
> > that might cause parsing errors (say, ':' or '@' characters for
> > example).
> >
> > Would it be possible to have an example script which fails. Ideally, a
> > set of curl commands creating dbs, then the replication job using
> > similar parameters you had?
> >
> > Cheers,
> > -Nick
> >
> > On Sat, Feb 26, 2022 at 9:29 AM Alan Malta <alanma...@gmail.com> wrote:
> > >
> > > Hi everyone,
> > >
> > > after a delay of many years to migrate to (almost) the latest CouchDB
> > > version, I started working with CouchDB 3.1.2.
> > >
> > > My tests with replication to/from the same node/localhost have been
> > > successful. But now that I am trying multiple push/pull replications
> > > with a remote host, they get into a "failed" state.
> > >
> > > I just learned about the "_scheduler/jobs" API - and I am likely missing
> > > some crucial knowledge here - and when I compare it against the documents
> > > in the "_replicator" database, I see an inconsistent definition for
> > > either the source or the target database.
> > > For instance, the "_scheduler/jobs" gives me the following output for one
> > > of the replications:
> > >
> > >
> > > {"database":"_replicator","doc_id":"87463eb82b3e1dcd7a3178276800026e","id":null,"source":"http://admin:*****@localhost:5984/my_db_name/","target":null,"state":"failed","error_count":1,"info":{"error":"{error,undef}"},"start_time":"2022-02-26T13:43:42Z","last_updated":"2022-02-26T13:43:42Z"},
> > > while the "_replicator" db lists this document as:
> > >
> > >
> > > {"id":"87463eb82b3e1dcd7a3178276800026e","key":"87463eb82b3e1dcd7a3178276800026e","value":{"rev":"2-590d4eadf029c21303ce77116d2f3f92"},"doc":{"_id":"87463eb82b3e1dcd7a3178276800026e","_rev":"2-590d4eadf029c21303ce77116d2f3f92","source":"http://admin:*****@localhost:5984/my_db_name","target":"https://alanblah.blah.blah/couchdb/wmstats","continuous":true,"filter":"WMStatsAgent/repfilter","owner":"admin","_replication_state":"failed","_replication_state_time":"2022-02-26T13:43:42Z","_replication_state_reason":"{error,undef}"}},
> > > in short, the "target" parameter is defined as null in the "jobs" output.
> > > Is it because the replication failed somehow?
> > >
> > > Just in case, this is the only error I see in the couch log regarding
> > > that replication - on the node that triggered the replication:
> > >
> > > [error] 2022-02-26T13:43:42.016495Z couchdb@127.0.0.1 <0.534.0> --------
> > > Error processing replication doc `87463eb82b3e1dcd7a3178276800026e` from
> > > `shards/00000000-7fffffff/_replicator.1645882154`: {error,undef}
> > >
> > > I also wonder if the replication protocol is compatible among different
> > > releases of CouchDB? In my case, target is still on the super old version
> > > 1.6.1 while source is on 3.1.2
> > >
> > > Thank you very much for any help that you can provide.
> > > Best,
> > > Alan.
> >
