Re: [DISCUSS] OpenTelemetry for Zookeeper?

2024-01-02 Thread Andor Molnar
Thanks Chris.

Unfortunately both ZooKeeper client and server code reside in
zookeeper-server Maven project, so I think the separation that you
suggest is not feasible at the moment.

Andor




On Tue, 2024-01-02 at 21:13 -0500, Christopher wrote:
> I think open tracing is a good library to use. Accumulo started using
> it
> after HTrace went to the attic. Accumulo also uses ZooKeeper. It
> remains to
> be seen if open telemetry will be stable in the long term, since it
> is
> relatively new.
> 
> Lots of projects in an application stack may want to use open
> telemetry,
> but it may be difficult because of dependency hell and class path
> issues if
> lots of libraries in an application are all using open telemetry, but
> using
> different incompatible versions. That was certainly a big problem
> with
> HTrace when Hadoop was using one version and Accumulo was using a
> different
> version (notably, this problem arose, even though HTrace was
> originally
> derived from code that started out in Accumulo as "cloudtrace",
> because it
> was modified to be more general purpose and began being used by
> multiple
> applications in the application stack). This is not a unique issue,
> but
> other places where this could be a problem, such as logging, have
> well
> established solutions, like slf4j. It's not clear to me yet whether
> open
> telemetry can be relied upon to be stable like that in the long term.
> If
> zookeeper starts using it, it would be safest to use it only on the
> server
> code, at least to start, and not require it as a dependency of
> zookeeper
> client code, to make sure it doesn't conflict with other applications
> that
> use ZooKeeper.
> 
> If it is used for client code, it should be possible to completely
> disable
> it with a property so nothing breaks if it is disabled and it is
> missing
> from the class path, or if a different version that is incompatible
> exists
> on the class path. That could be hard to do.
> 
> This doesn't mean it shouldn't be done, just that if it is done,
> these are
> some things to consider to try to avoid potential problems down the
> line.
> 
> On Tue, Jan 2, 2024, 10:57 Andor Molnar  wrote:
> 
> > Hi all,
> > 
> > Inspired by the following CURATOR ticket I started to think about
> > what
> > needs to be done for ZooKeeper to support OpenTelemetry.
> > 
> > CURATOR-695 Open Telemetry Tracing Driver [1]
> > 
> > Unfortunately we don't have such generic tracing driver, even
> > ZooTrace
> > class looks unusable for this use case, but we should be able to
> > implement it in a generic fashion. Start the trace in
> > PrepRequestProcessor when request comes in and finish it in
> > FinalRequestProcessor with adding some in-process events too.
> > 
> > It's never that simple obviously, because, for instance, we also
> > need
> > to track the failing code paths too, but looks to me a good
> > starting
> > point and something we should invest into.
> > 
> > Thoughts?
> > 
> > Regards,
> > Andor
> > 
> > [1] https://issues.apache.org/jira/browse/CURATOR-695
> > 
> > 
> > 
> > 



Re: [DISCUSS] OpenTelemetry for Zookeeper?

2024-01-02 Thread Christopher
I think open tracing is a good library to use. Accumulo started using it
after HTrace went to the attic. Accumulo also uses ZooKeeper. It remains to
be seen if open telemetry will be stable in the long term, since it is
relatively new.

Lots of projects in an application stack may want to use open telemetry,
but it may be difficult because of dependency hell and class path issues if
lots of libraries in an application are all using open telemetry, but using
different incompatible versions. That was certainly a big problem with
HTrace when Hadoop was using one version and Accumulo was using a different
version (notably, this problem arose, even though HTrace was originally
derived from code that started out in Accumulo as "cloudtrace", because it
was modified to be more general purpose and began being used by multiple
applications in the application stack). This is not a unique issue, but
other places where this could be a problem, such as logging, have well
established solutions, like slf4j. It's not clear to me yet whether open
telemetry can be relied upon to be stable like that in the long term. If
zookeeper starts using it, it would be safest to use it only on the server
code, at least to start, and not require it as a dependency of zookeeper
client code, to make sure it doesn't conflict with other applications that
use ZooKeeper.

If it is used for client code, it should be possible to completely disable
it with a property so nothing breaks if it is disabled and it is missing
from the class path, or if a different version that is incompatible exists
on the class path. That could be hard to do.

This doesn't mean it shouldn't be done, just that if it is done, these are
some things to consider to try to avoid potential problems down the line.

On Tue, Jan 2, 2024, 10:57 Andor Molnar  wrote:

> Hi all,
>
> Inspired by the following CURATOR ticket I started to think about what
> needs to be done for ZooKeeper to support OpenTelemetry.
>
> CURATOR-695 Open Telemetry Tracing Driver [1]
>
> Unfortunately we don't have such generic tracing driver, even ZooTrace
> class looks unusable for this use case, but we should be able to
> implement it in a generic fashion. Start the trace in
> PrepRequestProcessor when request comes in and finish it in
> FinalRequestProcessor with adding some in-process events too.
>
> It's never that simple obviously, because, for instance, we also need
> to track the failing code paths too, but looks to me a good starting
> point and something we should invest into.
>
> Thoughts?
>
> Regards,
> Andor
>
> [1] https://issues.apache.org/jira/browse/CURATOR-695
>
>
>
>


Lack of support for TLS-only ZK cluster

2024-01-02 Thread Abhilash Kishore
Many organizations, large and small, have strict security and compliance
requirements to only accept encrypted/TLS connections and not plain text
connections.

I'd like to discuss an issue which is preventing us from starting our ZK
clusters in TLS only mode (for client traffic).

As per dynamic reconfig doc
,

> Starting with 3.5.0 the *clientPort* and *clientPortAddress* configuration
> parameters should no longer be used. Instead, this information is now part
> of the server keyword specification, which becomes as follows:
> server. = ::[:role];[ address>:]



Let's say the dynamic config entry of a server is
"server.1=abhilash-ubuntu:3183:4183:participant;0.0.0.0:2181". The server
starts up with a (plaintext) clientPort listener on 2181.

Now, if we want to make this server TLS-only, what options do we have? We
want to stop accepting plaintext traffic on 2181 and make the same port
accept TLS connections only (make clientPort as secureClientPort).

If we add "secureClientPort=2181" in zoo.cfg, then ZK server first starts a
plaintext listener on 2181 because of ";0.0.0.0:2181" in "server.1" dynamic
config entry and then attempts to start a TLS client listener on the same
port (2181) and fails. The reason for this behavior is already described in
ZOOKEEPER-4276  (highly
recommended pre-read).

It is not possible to just remove the "" part from the
"server.1" entry as well (I believe it is mandatory from v3.5). I tried:

[zk: localhost:2181(CONNECTED) 4] reconfig -remove 1
[zk: localhost:2181(CONNECTED) 5] reconfig -add
server.1=abhilash-ubuntu:3183:4183:participant
Arguments are not valid :


The reconfig command does not allow us to add a server entry without ";[:]".

How do we support a "TLS-only" cluster in this case?

My recommendation:

   1. If both clientPort and secureClientPort are not set in zoo.cfg, then
   use the client port address from dynamic config.
   2. If only clientPort is set in zoo.cfg, then it has to match the port
   in dynamic config and ZK starts a plaintext listener on this port.
   3. If only secureClientPort is set in zoo.cfg, then it has to match the
   port in dynamic config and ZK starts a TLS listener on this port.
   4. If both clientPort and secureClientPort are set in zoo.cfg, then the
   client port in zoo.cfg should match the port in dynamic config. ZK starts a
   plaintext listener on clientPort and TLS listener on secureClientPort (dual
   mode).


This would reintroduce the requirement to set "clientPort" in zoo.cfg if
someone wants to start the cluster in dual mode.

For example,

secureClientPort=2182
server.1=abhilash-ubuntu:3183:4183:participant;0.0.0.0:2181

will no longer be a valid config because of rule 3 above.

It has to be:

clientPort=2181
secureClientPort=2182
server.1=abhilash-ubuntu:3183:4183:participant;0.0.0.0:2181


I can create a PR to make the above changes, but first I'd like to know
your thoughts on this and discuss further on whether there's a better way
to handle this.

Regards,
Abhilash Kishore


[DISCUSS] OpenTelemetry for Zookeeper?

2024-01-02 Thread Andor Molnar
Hi all,

Inspired by the following CURATOR ticket I started to think about what
needs to be done for ZooKeeper to support OpenTelemetry.

CURATOR-695 Open Telemetry Tracing Driver [1]

Unfortunately we don't have such generic tracing driver, even ZooTrace
class looks unusable for this use case, but we should be able to
implement it in a generic fashion. Start the trace in
PrepRequestProcessor when request comes in and finish it in
FinalRequestProcessor with adding some in-process events too.

It's never that simple obviously, because, for instance, we also need
to track the failing code paths too, but looks to me a good starting
point and something we should invest into.

Thoughts?

Regards,
Andor

[1] https://issues.apache.org/jira/browse/CURATOR-695