2018-01-07 18:53:40 UTC - Daniel Ferreira Jorge: Hello, I am trying to deploy 
the kubernetes manifests following the exact instructions on the documentation 
and everything goes as expected, except that "pulsar-perf produce" produces 0 
messages per second... and if I go to the broker logs I see thousands of 
messages like: Write did not succeed to 10.142.0.19:3181, bookieIndex 1, but we 
have already fixed it.
----
2018-01-07 18:55:06 UTC - Matteo Merli: it looks like all writes are failing on 
the bookie
----
2018-01-07 18:55:34 UTC - Matteo Merli: or that it’s not reachable by the 
broker (though it’s registered as available in ZK)
----
2018-01-07 18:56:16 UTC - Daniel Ferreira Jorge: I can ping any bookie from 
any broker
----
2018-01-07 18:56:39 UTC - Matteo Merli: are the bookie logs telling anything?
----
2018-01-07 18:56:54 UTC - Daniel Ferreira Jorge: nope... just initialized
----
2018-01-07 18:57:43 UTC - Matteo Merli: any other error/warn message in broker 
log? It should say something on the reason why the write failed in the first 
place
----
2018-01-07 18:57:43 UTC - Daniel Ferreira Jorge: nothing is happening
----
2018-01-07 18:59:33 UTC - Matteo Merli: Can it be a problem with the IP that is 
being advertised by the bookies?
----
2018-01-07 18:59:45 UTC - Matteo Merli: Are you using StatefulSet or DaemonSet?
----
2018-01-07 18:59:59 UTC - Daniel Ferreira Jorge: daemon
----
2018-01-07 19:00:12 UTC - Daniel Ferreira Jorge: the exact deployment from the 
repository
----
2018-01-07 19:00:13 UTC - Matteo Merli: are you setting the `advertisedAddress` 
in bookie config?
----
2018-01-07 19:00:17 UTC - Daniel Ferreira Jorge: absolutely nothing changed
----
2018-01-07 19:00:20 UTC - Matteo Merli: ok
----
2018-01-07 19:01:41 UTC - Daniel Ferreira Jorge: the advertisedAddress is 
status.hostIP
----
2018-01-07 19:02:27 UTC - Daniel Ferreira Jorge: and the ip 10.142.0.19:3181 is 
the ip of a bookie node
----
2018-01-07 19:02:36 UTC - Daniel Ferreira Jorge: so the broker knows where it is
----
2018-01-07 19:02:44 UTC - Matteo Merli: ok, telnetting there from broker works?
----
2018-01-07 19:03:13 UTC - Daniel Ferreira Jorge: pinging "bookie" works
----
2018-01-07 19:03:49 UTC - Matteo Merli: my concern is that the Pod is not being 
exposed on the host network
----
2018-01-07 19:04:22 UTC - Daniel Ferreira Jorge: I tried pinging everything 
from everywhere
----
2018-01-07 19:04:24 UTC - Matteo Merli: broker might be able to “ping” but the 
Pod still needs to be bound on 3181 in the bookie host machine
----
2018-01-07 19:04:38 UTC - Matteo Merli: try telnet instead of ping
----
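Editor's note: ping only shows that the host answers ICMP; it says nothing 
about whether anything is listening on TCP port 3181. A minimal sketch of the 
telnet-style check suggested above, in Python. The helper name `can_connect` 
and the 3-second default timeout are illustrative assumptions, not part of 
Pulsar or BookKeeper.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run from inside a broker pod, `can_connect("10.142.0.19", 3181)` returning 
False while ping still succeeds would suggest that the bookie port is not 
exposed on the host.
----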
2018-01-07 19:04:45 UTC - Daniel Ferreira Jorge: ok
----
2018-01-07 19:04:49 UTC - Daniel Ferreira Jorge: let me redeploy
----
2018-01-07 19:04:56 UTC - Daniel Ferreira Jorge: I will report in 5 min
----
2018-01-07 19:05:05 UTC - Matteo Merli: :+1:
----
2018-01-07 19:06:03 UTC - Matteo Merli: uhm, just looking at the bookie.yaml file
----
2018-01-07 19:06:48 UTC - Matteo Merli: I think the problem is indeed the IP 
exposed
----
2018-01-07 19:06:54 UTC - Matteo Merli: In this change   
<https://github.com/apache/incubator-pulsar/pull/764>
----
2018-01-07 19:07:26 UTC - Matteo Merli: I had put the hostIP.. but this was 
missing the change to bind the bookie on `hostNetwork`
----
2018-01-07 19:07:28 UTC - Matteo Merli: :confused:
----
2018-01-07 19:10:34 UTC - Daniel Ferreira Jorge: so, I have to remove the 
advertised address?
----
2018-01-07 19:10:55 UTC - Matteo Merli: try this:
----
2018-01-07 19:10:55 UTC - Matteo Merli: 
<https://gist.github.com/merlimat/dad357c1cccde8e0b634a9639e1fcb16>
----
2018-01-07 19:11:36 UTC - Matteo Merli: enabling `hostNetwork` tells Kubernetes 
to expose 3181 in the host network (rather than just bind it on the Pod IP)
----
2018-01-07 19:12:32 UTC - Daniel Ferreira Jorge: great! trying now... will 
report back soon! thank you @Matteo Merli
----
2018-01-07 19:25:25 UTC - Daniel Ferreira Jorge: @Matteo Merli if I try to 
deploy with hostNetwork, the bookies fail to start because they cannot find 
zookeeper anymore: "zk-0.zookeeper: Name or service not known"
----
2018-01-07 19:26:08 UTC - Matteo Merli: ok, let me try it as well
----
2018-01-07 19:26:56 UTC - Daniel Ferreira Jorge: ok, im using kube 1.8.4 on GKE
----
2018-01-07 19:28:38 UTC - Sijie Guo: I think you need to expose hostPort?
----
2018-01-07 19:29:31 UTC - Matteo Merli: yes, I got confused with hostNetwork 
but that’s going too far
----
2018-01-07 19:29:35 UTC - Sijie Guo: ports:
                  - name: client
                    containerPort: 3181
                    # we are using `status.hostIP` for the bookie's advertised
                    # address. export 3181 as the hostPort,
                    # so that the containers are able to access the host port
                    hostPort: 3181
----
2018-01-07 19:38:49 UTC - Daniel Ferreira Jorge: exposing the hostPort works... 
but isn't it against best practices?
----
2018-01-07 19:39:15 UTC - Matteo Merli: the restriction with host port is that 
you can only have 1 pod per host
----
2018-01-07 19:39:32 UTC - Matteo Merli: but that’s anyway implied by using 
DaemonSet
----
2018-01-07 19:42:02 UTC - Sijie Guo: Bookkeeper needs a reliable ID for bookie 
advertisement. Unfortunately, in a DaemonSet, the host IP is the only way to 
achieve that, because the pod IP can change when the pod is restarted.
----
2018-01-07 19:43:19 UTC - Matteo Merli: yes, StatefulSet is better for that 
because it preserves the Pod IP, but the support for local volumes is still a 
bit green
----
2018-01-07 19:43:57 UTC - Daniel Ferreira Jorge: yes, I'm only trying to deploy 
the manifests supplied, because I'm getting an error with the helm chart I'm 
making... In my chart, the brokers cannot find zookeeper... I'm trying to debug 
that but with no success so far... the bookies are deployed as statefulsets 
with useHostNameAsBookieID: "true"
----
2018-01-07 19:45:04 UTC - Daniel Ferreira Jorge: but when the brokers are 
deployed, they do not start because they cannot find zookeeper... the bookies 
found zookeeper
----
2018-01-07 19:45:52 UTC - Matteo Merli: that is strange, do they use the same 
zk connection string?
----
2018-01-07 19:46:13 UTC - Daniel Ferreira Jorge: the exact same
----
2018-01-07 19:46:44 UTC - Daniel Ferreira Jorge: I spent 7 hours trying to find 
something wrong...
----
2018-01-07 19:47:19 UTC - Matteo Merli: 
<https://github.com/apache/incubator-pulsar/pull/1035>
----
2018-01-07 19:49:31 UTC - Daniel Ferreira Jorge: also, with the manifests from 
the repo, nothing is shown in grafana
----
2018-01-07 19:49:45 UTC - Daniel Ferreira Jorge: the pulsar dashboard works
----
2018-01-07 19:49:54 UTC - Daniel Ferreira Jorge: maybe there is some config 
missing
----
2018-01-07 19:53:58 UTC - Daniel Ferreira Jorge: @Daniel Ferreira Jorge 
uploaded a file: 
<https://apache-pulsar.slack.com/files/U8E1J0DHS/F8PCK1JF7/stack.txt|stack.txt> 
and commented: The error I'm getting when initializing a broker with my chart 
is this
----
2018-01-07 19:55:51 UTC - Daniel Ferreira Jorge: the bookies can access the 
"alpha-pulsar-zookeeper-X.alpha-pulsar-zookeeper", and the metadata is already 
initialized in zookeeper
----
2018-01-07 20:39:10 UTC - Matteo Merli: does the container restart at that 
point?
----
2018-01-07 20:39:20 UTC - Daniel Ferreira Jorge: yes, many times
----
2018-01-07 20:40:22 UTC - Matteo Merli: there should be no difference with what 
the bookies are doing then
----
2018-01-07 20:40:42 UTC - Daniel Ferreira Jorge: I even tried putting an init 
container on the broker to wait for like 5min to make sure everything else is 
already up
----
2018-01-07 20:40:53 UTC - Matteo Merli: and the DNS error, while it might happen 
before the ZK pods are active, should resolve after that
----
2018-01-07 20:41:48 UTC - Matteo Merli: one thing catches the eye:
----
2018-01-07 20:41:50 UTC - Matteo Merli: 
`alpha-pulsar-zookeeper-0.alpha-pulsar-zookeeper, 
alpha-pulsar-zookeeper-1.alpha-pulsar-zookeeper, 
alpha-pulsar-zookeeper-2.alpha-pulsar-zookeeper`
----
2018-01-07 20:41:57 UTC - Matteo Merli: there’s a space after the `,`
----
2018-01-07 20:42:19 UTC - Matteo Merli: looks like the DNS name it’s trying to 
use is ` alpha-pulsar-zookeeper-1.alpha-pulsar-zookeeper`
----
2018-01-07 20:42:40 UTC - Matteo Merli: so, picking zk-0 works but zk-1 and 
zk-2 won’t
----
2018-01-07 20:43:26 UTC - Daniel Ferreira Jorge: well, that may be the issue... 
I will try to change that, the bookkeeper pods have the same string with spaces
----
2018-01-07 20:43:34 UTC - Daniel Ferreira Jorge: and it works there
----
2018-01-07 20:43:52 UTC - Matteo Merli: ZK clients pick one random server to 
connect to
----
2018-01-07 20:44:14 UTC - Matteo Merli: if it picks the first in the list, 
that one won’t have the extra space
----
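Editor's note: the effect described above can be reproduced with a small 
Python sketch. It assumes a client that splits the connection string on 
commas without trimming; this is an illustration, not the actual ZooKeeper 
client or Pulsar broker code, and the helper names `naive_hosts` and 
`normalized` are hypothetical.

```python
def naive_hosts(conn):
    """Split on commas only, keeping any whitespace around each entry."""
    return conn.split(",")


def normalized(conn):
    """Strip whitespace around each entry and drop empty entries."""
    return ",".join(h.strip() for h in conn.split(",") if h.strip())


# The connection string from the chart, with a space after each comma.
conn = ("alpha-pulsar-zookeeper-0.alpha-pulsar-zookeeper, "
        "alpha-pulsar-zookeeper-1.alpha-pulsar-zookeeper, "
        "alpha-pulsar-zookeeper-2.alpha-pulsar-zookeeper")

# Entries that differ from their stripped form carry stray whitespace
# and would not resolve as DNS names.
bad = [h for h in naive_hosts(conn) if h != h.strip()]
print(bad)

# A connection string that is safe for either parsing style.
print(normalized(conn))
```

Under that assumption only the first entry is a valid DNS name, so whether a 
client starts depends on which server it happens to pick. Writing the string 
without spaces avoids relying on how each component parses it.
----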
2018-01-07 20:44:40 UTC - Matteo Merli: can you check the BK log for which ZK 
server it actually connected to?
----
2018-01-07 20:44:59 UTC - Daniel Ferreira Jorge: sure
----
2018-01-07 20:45:07 UTC - Daniel Ferreira Jorge: I will report back in 2 min
+1 : Matteo Merli
----
2018-01-07 20:49:58 UTC - Daniel Ferreira Jorge: @Daniel Ferreira Jorge 
uploaded a file: 
<https://apache-pulsar.slack.com/files/U8E1J0DHS/F8PCTFMV3/stack.txt|stack.txt> 
and commented: These are the logs from the BK-0 pod... seems it connected to 
ZK-2...
----
2018-01-07 20:50:44 UTC - Daniel Ferreira Jorge: BK-1 and BK-2 connected to ZK-1
----
2018-01-07 20:51:00 UTC - Matteo Merli: Uhm, interesting
----
2018-01-07 20:51:19 UTC - Matteo Merli: It might be related to how the property 
file is loaded
----
2018-01-07 20:51:42 UTC - Daniel Ferreira Jorge: maybe BK strips the string? 
and the broker does not?
----
2018-01-07 20:53:15 UTC - Matteo Merli: I think in broker, we’re just picking 
that as a String
----
2018-01-07 20:53:27 UTC - Matteo Merli: in BK, it’s reading the property as a 
list:
----
2018-01-07 20:53:28 UTC - Matteo Merli: 
<https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/AbstractConfiguration.java#L157>
----
2018-01-07 20:56:02 UTC - Daniel Ferreira Jorge: it was the spaces... 7 hours 
on this man... 7!
----
2018-01-07 20:56:14 UTC - Daniel Ferreira Jorge: I removed and it works now
----
2018-01-07 20:56:57 UTC - Daniel Ferreira Jorge: unbelievable
----
2018-01-07 20:57:17 UTC - Daniel Ferreira Jorge: thanks for the help @Matteo 
Merli
----
2018-01-07 20:57:55 UTC - Matteo Merli: :grinning:
----
2018-01-07 21:10:17 UTC - Matteo Merli: Sorry for that
----
2018-01-07 21:11:03 UTC - Daniel Ferreira Jorge: sorry for what??
----
