2019-07-05 15:21:31 UTC - Santiago Del Campo: @David Kjerrumgaard Do you have some troubleshooting steps in mind?

Lately I've been getting a better understanding of the BookKeeper output, and I might know how to solve some issues... but even after that, I can see other problems when I try to deploy the broker pods again:

*Broker pods logs*

```
org.apache.pulsar.broker.PulsarServerException: java.lang.RuntimeException: java.lang.RuntimeException: Can't create a producer on assignment topic <persistent://public/functions/assignments>
```

```
Caused by: org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger
```
----
2019-07-05 15:23:31 UTC - Santiago Del Campo: I've been surfing the Pulsar documentation trying to better understand the architecture... but sometimes I think I'm fighting more with the Kubernetes deployment design than with Pulsar itself :thinking_face:
----
2019-07-05 17:00:26 UTC - David Kjerrumgaard: We would have to have a more 
interactive debugging session, as I am not 100% clear on the environment you 
are working in and the steps taken to produce the issue.  It sounds like you 
spin up the Pulsar cluster with the standard Helm chart, and then "re-deploy" 
new bookie pods that are configured to point to the existing ZK node. Then you 
see "invalid cookie" and "bad segment" errors.
----
2019-07-05 17:01:50 UTC - David Kjerrumgaard: I don't think you can deploy a new "set" of bookies all at once. You would need to keep at least one "old" bookie around to serve reads of the existing data. Otherwise all the ledger metadata in the ZK nodes (which refers to the old, now-deleted bookies) would be invalid.
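
Something like a rolling replacement (one bookie at a time, waiting for each to come back before touching the next) keeps the metadata valid the whole way through. A rough sketch, assuming the bookies run as a StatefulSet called `bookkeeper` in a `pulsar` namespace (names are guesses, adjust to your Helm release):

```
# Hypothetical names; adjust the StatefulSet/namespace to your deployment.
# Replace one bookie at a time so the remaining ones can keep serving
# reads of existing ledgers while the new pod rejoins the cluster.
for i in 0 1 2; do
  kubectl delete pod "bookkeeper-$i" -n pulsar
  kubectl wait --for=condition=Ready "pod/bookkeeper-$i" -n pulsar --timeout=300s
done
```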
----
2019-07-05 17:28:01 UTC - Santiago Del Campo: I'd appreciate that a lot; right now I'm kind of lost on how to troubleshoot from here.

How could we have a more interactive debugging session?
----
2019-07-05 17:44:47 UTC - David Kjerrumgaard: Let me ask you this: do you really replace all the bookies in the cluster at the same time?
----
2019-07-05 18:57:48 UTC - Santiago Del Campo: Yeah.. so, we use Rancher2 as our Kubernetes cluster administrator... it provides a UI to visualize everything deployed inside a specific cluster. I simply click a "redeploy" button for the bookie workload, which contains all the bookie pods, and all the current pods are replaced with new ones... that's all.
----
2019-07-05 19:04:16 UTC - David Kjerrumgaard: That is most likely the issue 
then. All the metadata that is kept in the ZK pod is for the old bookies. 
Therefore you will need to initialize the metadata again, since it is now 
essentially a new cluster.... 
<https://pulsar.apache.org/docs/en/admin-api-clusters/#initialize-cluster-metadata>
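
For reference, that re-initialization is the `initialize-cluster-metadata` command from that page; a minimal sketch, where the cluster name, ZK address, and service URLs are placeholders you'd swap for your own:

```
# Placeholders throughout; use your cluster name, ZK address, and service URLs.
bin/pulsar initialize-cluster-metadata \
  --cluster pulsar-cluster \
  --zookeeper zk-0.zookeeper:2181 \
  --configuration-store zk-0.zookeeper:2181 \
  --web-service-url http://broker.pulsar:8080 \
  --broker-service-url pulsar://broker.pulsar:6650
```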
----
2019-07-05 19:09:47 UTC - David Kjerrumgaard: Bookies have a strict cookie-validation mechanism to ensure data consistency. If a bookie is added without proper initialization, it will fail cookie validation. The cookie is stored on the bookie's local disk and validated against the expected cookie value kept in ZK. Since you are adding new bookies, they don't have the proper cookie value.
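
You can actually see the mismatch: the expected cookies are registered under `/ledgers/cookies` in ZK, and each bookie keeps its local copy in the `VERSION` file of its journal/ledger directories. A quick way to compare (the paths below are common defaults and may differ in your chart):

```
# List the cookies that ZooKeeper expects (default ledger root is /ledgers)
bin/pulsar zookeeper-shell -server zk-0.zookeeper:2181 ls /ledgers/cookies

# Print the local cookie on one bookie pod (path is a typical Helm default)
kubectl exec -n pulsar bookkeeper-0 -- \
  cat /pulsar/data/bookkeeper/journal/current/VERSION
```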
----
2019-07-05 19:12:16 UTC - Santiago Del Campo: Perfect, I understand that... so that means that whenever I redeploy the bookie pods, for whatever reason, I also have to make sure the ZKs are aware of this change by re-initializing the general cluster metadata?
----
2019-07-05 19:27:25 UTC - David Kjerrumgaard: correct
----
2019-07-05 19:28:09 UTC - David Kjerrumgaard: Since no data is kept on the brokers (excluding the cache), you are basically starting over with a new, empty cluster. HTH
----
2019-07-05 19:45:17 UTC - Santiago Del Campo: And how impossible would it be to do it backwards... like forcing the new bookie pods to have the same metadata that the ZKs expect? That way it'd be easier to power off a machine for some maintenance and turn it back on.
----
2019-07-05 19:49:10 UTC - Santiago Del Campo: As far as I understand, what is really breaking everything is not that old topics or messages are lost in the redeploy... because I am not dealing with persistent data, and the default topics Pulsar generates are created automatically on a new deploy... the thing here is the strict validation mechanism a bookie has to pass to boot correctly.

In that case, what I need is a way to set up the bookies so they can adapt to machines that may be turned off at some point... if that is possible, of course.
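
From what I've read, that sounds like what a StatefulSet with persistent volumes gives you: the pod comes back with the same identity and the same journal/ledger disks, so its local cookie still matches what ZK expects. Something like this to check whether our chart is wired that way (labels and names are guesses):

```
# Hypothetical names/labels; adjust to your release.
# If these PVCs exist, a restarted bookie reuses its old data and cookie.
kubectl get pvc -n pulsar -l app=bookkeeper
kubectl get statefulset bookkeeper -n pulsar \
  -o jsonpath='{.spec.volumeClaimTemplates[*].metadata.name}'
```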
----
2019-07-05 22:42:14 UTC - t: @t has joined the channel
----
2019-07-06 05:39:57 UTC - vikash: Hello, I am facing continuous disconnects of my WebSocket producer, like "Closing connection". I have used the WebSocket API; is there any setting in Apache Pulsar to keep the connection alive for a long time?
----
2019-07-06 05:56:57 UTC - vikash: I am also getting this error:
----
2019-07-06 05:56:58 UTC - vikash:
```
05:55:23.260 [ForkJoinPool.commonPool-worker-3] ERROR org.apache.pulsar.broker.web.PulsarWebResource - Policies not found for c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients namespace
05:55:23.260 [ForkJoinPool.commonPool-worker-3] WARN  org.apache.pulsar.broker.service.ServerCnx - Failed to get Partitioned Metadata [/20.43.19.64:58104] persistent://c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients/0370bc8f-a880-43b7-8121-930996c67e52: Policies not found for c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients namespace
org.apache.pulsar.broker.web.RestException: Policies not found for c091e548-b45a-49b4-b8ec-2cb5e27c7af6/visurClients namespace
        at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$4(PulsarWebResource.java:679) ~[org.apache.pulsar-pulsar-broker-2.3.2.jar:2.3.2]
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_212]
        at org.apache.pulsar.zookeeper.ZooKeeperDataCache.lambda$0(ZooKeeperDataCache.java:67) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.2.jar:2.3.2]
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_212]
        at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$14(ZooKeeperCache.java:354) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.2.jar:2.3.2]
        at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_212]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) ~[?:1.8.0_212]
        at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$12(ZooKeeperCache.java:339) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.2.jar:2.3.2]
        at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) [?:1.8.0_212]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_212]
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_212]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_212]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [?:1.8.0_212]
```
----
