2018-05-17 09:15:45 UTC - Xiaolin Zhang: @Xiaolin Zhang has joined the channel
----
2018-05-17 15:01:19 UTC - Igor Zubchenok: Hello guys!
We're connecting pulsar client to a single broker by URL. If this broker goes
down, pulsar client does not try to connect to other working brokers. I
expected Pulsar client to do this.
What guidelines do you have to handle this case?
----
2018-05-17 15:03:18 UTC - Matteo Merli: A common way is to either use a VIP
load balancer for service discovery (if available) or set up a DNS name that
resolves to the list of broker IPs
----
2018-05-17 15:13:12 UTC - Igor Zubchenok: We'll try setting up a DNS name that
resolves to the list of broker IPs.
However, we eventually broke Pulsar and now get this exception:
org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger
----
2018-05-17 15:14:19 UTC - Igor Zubchenok: BookKeeper logs many exceptions
after we restarted the main broker.
----
2018-05-17 15:15:47 UTC - Vasily Yanov: exceptions:
```
2018-05-17 15:14:36,509 - WARN [bookkeeper-ml-workers-38-1:ServerCnx@650] -
[/1.1.1.1:32945][<persistent://server-ali-t2-1526567437134/prod-pulsar-cluster-1/session_queue/7092e9f8-80a1-4769-8a89-76c682a38a73>][sender]
Failed to create consumer:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger
java.util.concurrent.CompletionException:
org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger
at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
```
----
2018-05-17 15:17:35 UTC - Vasily Yanov: ```
2018-05-17 15:14:51,507 - ERROR
[BookKeeperClientWorker-23-1:PersistentDispatcherSingleActiveConsumer@323] -
[<persistent://server-ali-t2-1526567437134/prod-pulsar-cluster-1/session_init/1ab5aea6-bebc-472d-b4b2-7492f3af44a4>
/
sender-Consumer{subscription=PersistentSubscription{topic=<persistent://server-ali-t2-1526567437134/prod-pulsar-cluster-1/session_init/1ab5aea6-bebc-472d-b4b2-7492f3af44a4>,
name=sender}, consumerId=1746, consumerName=38bc2, address=/1.1.1.1:32945}]
Error reading entries at 1092669:1 : Bookie operation timeout - Retrying to
read in 15.0 seconds
2018-05-17 15:15:06,507 - WARN [BookKeeperClientWorker-22-1:PendingAddOp@238]
- Write did not succeed: L1101172 E0 on 1.1.1.1:3181, rc = -23
2018-05-17 15:15:06,507 - WARN
[BookKeeperClientWorker-22-1:RackawareEnsemblePlacementPolicy@553] - Failed to
choose a bookie: excluded [<Bookie:1.1.1.1:3181>,
<Bookie:3.3.3.3:3181>, <Bookie:2.2.2.2:3181>], fallback to choose
bookie randomly from the cluster.
```
----
2018-05-17 15:18:35 UTC - Vasily Yanov: ```
2018-05-17 15:17:21,509 - WARN [BookKeeperClientWorker-25-1:PendingAddOp@238]
- Write did not succeed: L1100903 E418 on 2.2.2.2:3181, rc = -23
2018-05-17 15:17:21,509 - WARN [BookKeeperClientWorker-25-1:LedgerHandle@919]
- Write did not succeed to 2.2.2.2:3181, bookieIndex 0, but we have already
fixed it.
2018-05-17 15:17:21,509 - WARN [BookKeeperClientWorker-25-1:PendingAddOp@238]
- Write did not succeed: L1100903 E421 on 2.2.2.2:3181, rc = -23
2018-05-17 15:17:21,509 - WARN [BookKeeperClientWorker-25-1:LedgerHandle@919]
- Write did not succeed to 2.2.2.2:3181, bookieIndex 0, but we have already
fixed it.
```
----
2018-05-17 15:32:38 UTC - Igor Zubchenok: We're trying to reproduce it again
and will send you all the logs (zk+bk+broker from all 3 nodes) at INFO level
----
2018-05-17 15:42:37 UTC - Sijie Guo: @Igor Zubchenok do you mind describing the
sequence of steps that leads to this?
----
2018-05-17 15:49:38 UTC - Igor Zubchenok: - we set up 3 nodes: pulsar-01,
pulsar-02, pulsar-03
- we start pulsar-03, then pulsar-02, then pulsar-01
- then our pulsar client connects to pulsarbroker-03 directly (no VIP load
balancing or multiple IP addresses in DNS)
- then we stop pulsarbroker-01, wait 30 seconds, start pulsarbroker-01
- then we wait 2 minutes, stop pulsarbroker-02, wait 30 seconds, start
pulsarbroker-02
- then we wait 2 minutes, stop pulsarbroker-03
- here our instance gets a lot of exceptions
- we wait 30 seconds, start pulsarbroker-03
- our instance cannot work even after the restart until we create other
topics/properties
----
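The restart sequence Igor lists can be scripted so that it is repeatable. The following is only a sketch under assumptions that are not in the chat: the brokers are reachable over ssh under the names pulsar-01 to pulsar-03, and the systemd unit is called `pulsar-broker` (a hypothetical name). It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Stop the broker on host $1, wait $2 seconds, then start it again.
# "pulsar-broker" is a hypothetical unit name; ssh access is assumed.
bounce_broker() {
  run ssh "$1" systemctl stop pulsar-broker
  run sleep "$2"
  run ssh "$1" systemctl start pulsar-broker
}

# Sequence from the list above: bounce 01, 02 and 03 in turn,
# with two minutes between one broker and the next.
restart_sequence() {
  bounce_broker pulsar-01 30
  run sleep 120
  bounce_broker pulsar-02 30
  run sleep 120
  bounce_broker pulsar-03 30
}
```

Calling `restart_sequence` prints the planned commands; `DRY_RUN=0 restart_sequence` would actually execute them.
----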
2018-05-17 15:59:41 UTC - Sijie Guo: a couple more questions:
- are you using the default configuration, basically replication settings 2/2/2?
- the WARN logging seems to be normal when you kill a pulsar instance, because
it basically tries to write the entries to the pulsar broker (bookie) you
stopped, and then it does ensemble changes.
- the last step seems a bit unusual. “instance cannot work until create another
topics”? can you describe more about “instance cannot work”?
----
2018-05-17 16:05:02 UTC - Vasily Yanov: @Sijie Guo
1. it should be, because we didn't change anything in bookkeeper.conf except zk
hosts and journalSyncData
2. ok
----
2018-05-17 16:05:37 UTC - Igor Zubchenok: 3. my instance cannot work cause I
get
`org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger`
note: we stop/start only broker during testing.
P.S. we've failed to reproduce the issue with the steps above, but we've deleted
more than 40GB of bookkeeper and zookeeper data.
----
2018-05-17 16:05:41 UTC - Vasily Yanov: btw: what parameters should I check in
order to be sure about replication values?
----
2018-05-17 16:15:31 UTC - Sijie Guo: @Vasily Yanov in the broker conf,
managedLedgerDefaultEnsembleSize / managedLedgerDefaultWriteQuorum /
managedLedgerDefaultAckQuorum
----
2018-05-17 16:16:03 UTC - Sijie Guo: > we stop/start only broker during
testing.
so you run bookies and brokers as separate processes, or in same process?
----
2018-05-17 16:17:04 UTC - Vasily Yanov: ```
cat /opt/pulsar/conf/broker.conf | grep -E
"managedLedgerDefaultEnsembleSize|managedLedgerDefaultWriteQuorum|managedLedgerDefaultAckQuorum"
managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2
```
----
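The three settings only make sense when ensemble size >= write quorum >= ack quorum. A minimal sketch of a check, assuming a properties-style file like the broker.conf grepped above; `conf_get` and `check_quorums` are illustrative helper names, not Pulsar tools:

```shell
#!/bin/sh
# Print the value of key $2 from the properties-style file $1
# (the last uncommented "key=value" line wins).
conf_get() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Print the three managed-ledger settings and succeed only when
# ensemble size >= write quorum >= ack quorum.
check_quorums() (
  e=$(conf_get "$1" managedLedgerDefaultEnsembleSize)
  w=$(conf_get "$1" managedLedgerDefaultWriteQuorum)
  a=$(conf_get "$1" managedLedgerDefaultAckQuorum)
  echo "ensemble=$e write=$w ack=$a"
  [ "$e" -ge "$w" ] && [ "$w" -ge "$a" ]
)
```

For example, `check_quorums /opt/pulsar/conf/broker.conf` would print the three values and return a non-zero status if they are out of order or missing.
----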
2018-05-17 16:17:11 UTC - Sijie Guo: > my instance cannot work cause I get
`org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger`
interesting. I am wondering if those are transient errors. does the client
succeed after retries?
----
2018-05-17 16:17:24 UTC - Sijie Guo: @Vasily Yanov thank you
----
2018-05-17 16:18:18 UTC - Vasily Yanov: no
----
2018-05-17 16:19:07 UTC - Vasily Yanov: how can I check whether the bookie and
broker run as separate processes or not?
----
2018-05-17 16:25:27 UTC - Sijie Guo: how do you start the pulsar brokers?
----
2018-05-17 16:25:58 UTC - Igor Zubchenok: no, it does not succeed, we tried
several times
----
2018-05-17 16:36:40 UTC - Vasily Yanov: as systemd unit:
----
2018-05-17 16:36:51 UTC - Vasily Yanov: ```
[Unit]
Description=Apache Pulsar
Documentation=https://pulsar.incubator.apache.org/
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
#ExecStart=/opt/pulsar/bin/pulsar-daemon start broker
ExecStart=/opt/pulsar/bin/pulsar broker
ExecStop=/opt/pulsar/bin/pulsar-daemon stop broker
Restart=on-failure
SyslogIdentifier=broker
LimitNOFILE=64536
LimitNPROC=8192
[Install]
WantedBy=multi-user.target
```
----
2018-05-17 16:42:28 UTC - Karthik Palanivelu: Hello there, I am trying to build
pulsar in docker. I am trying to bring up 2 bookies on ports 3181 and 3182
respectively on the same host. It is failing with the exception below. Can you
please help with how I can run multiple bookies on the same host:
----
2018-05-17 16:45:41 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a
file: <https://apache-pulsar.slack.com/files/U7VRE0Q1G/FASJYB0HL/-.m|Untitled>
----
2018-05-17 16:54:50 UTC - Karthik Palanivelu: Hi, Parameter does not work on
RHEL, you need to wrap it in another script like below:
```
#!/bin/bash
export JAVA_HOME=/opt/jdk1.8
/opt/pulsar/bin/pulsar broker
```
----
2018-05-17 17:10:15 UTC - Vasily Yanov: Hi! I think it's not our case.
----
2018-05-17 17:40:44 UTC - Ali Ahmed: @Karthikeyan Palanivelu How are you
configuring the bookies ? are you using docker compose ?
----
2018-05-17 17:45:25 UTC - Matteo Merli: @Karthikeyan Palanivelu Are they
sharing the same disk paths?
----
2018-05-17 17:46:39 UTC - Matteo Merli: BookKeeper has a mechanism (called
“cookies”) to ensure a bookie's advertised name matches the data it contains.
If there are discrepancies, it refuses to start up
----
2018-05-17 17:47:23 UTC - Matteo Merli: in this case it looks like one bookie is
trying to start with the data that is supposed to belong to the other bookie
----
2018-05-17 17:48:12 UTC - Sijie Guo: @Vasily Yanov it seems that this script
only starts broker. do you have a separate script to start bookie?
----
2018-05-17 17:48:29 UTC - Vasily Yanov: yes
----
2018-05-17 17:49:03 UTC - Vasily Yanov: ```
[Unit]
Description=Bookkeeper
Documentation=something realy strange
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/opt/pulsar/bin/bookkeeper bookie
#ExecStart=/opt/pulsar/bin/pulsar-daemon start bookie
ExecStop=/opt/pulsar/bin/pulsar-daemon stop bookie
Restart=on-failure
SyslogIdentifier=bookkeeper
LimitNOFILE=64536
LimitNPROC=8192
[Install]
WantedBy=multi-user.target
```
----
2018-05-17 17:49:06 UTC - Sijie Guo: oh so you started bookie and broker
separately, and during your tests, you only kill brokers?
----
2018-05-17 17:49:13 UTC - Vasily Yanov: yes
----
2018-05-17 17:49:17 UTC - Vasily Yanov: right
----
2018-05-17 17:49:24 UTC - Sijie Guo: that’s interesting.
----
2018-05-17 17:49:57 UTC - Vasily Yanov: exactly. Only brokers were affected
with systemctl stop|start
----
2018-05-17 17:50:19 UTC - Sijie Guo: what is the hardware of these 3 nodes?
like number of cpus, number of disks, memory size?
----
2018-05-17 17:50:58 UTC - Vasily Yanov: 8xCPU, 32Gb RAM, 2x2Tb HDD in RAID1
----
2018-05-17 17:51:57 UTC - Vasily Yanov: brokers and bookies start with:
-Xms4g -Xmx8g -XX:MaxDirectMemorySize=8g
----
2018-05-17 18:50:37 UTC - Sijie Guo: @Vasily Yanov interesting. the hardware
settings and jvm settings seem to be good. and since you are only
starting/stopping brokers, reads from bookies shouldn't time out, that's a bit
strange. unless starting/stopping brokers impacts the disks. is the RAID1 used
only by pulsar?
----
2018-05-17 18:52:12 UTC - Vasily Yanov: Yes. It is used only by
pulsar/bookkeeper/zookeeper
----
2018-05-17 18:53:34 UTC - Sijie Guo: do you have any monitoring mechanisms to
see what’s happening around network/disks?
----
2018-05-17 18:56:03 UTC - Karthik Palanivelu: @Ali Ahmed @Matteo Merli I am not
using docker compose. Yes, I am trying to assign two bookies to the same host
with the same data dir. How could I give each bookie its own path? I prefer to
keep the base path /prod/data/, inside which the bookies should write their data.
----
2018-05-17 19:00:43 UTC - Sijie Guo: @Karthikeyan Palanivelu: you can create a
subdirectory /prod/data/bookie-x for each bookie, for example
/prod/data/bookie-3181 and /prod/data/bookie-3182. then when you start the
docker container, pass the environment variables
journalDirectory=/prod/data/bookie-x/journal and
ledgerDirectories=/prod/data/bookie-x/ledgers
----
2018-05-17 19:01:47 UTC - Sijie Guo: so each bookie will use its own
directory; those two environment variables configure the bookie docker
process to use different directories.
----
2018-05-17 19:01:57 UTC - Sijie Guo: does that address your requirement?
----
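A minimal sketch of the per-bookie layout Sijie describes. `bookie_env` is a hypothetical helper, and adding `bookiePort` is an assumption based on the two ports mentioned earlier; with docker, the directories would presumably also need to be mounted into each container as volumes.

```shell
#!/bin/sh
# Base path from the chat; override BASE to try this somewhere else.
BASE="${BASE:-/prod/data}"

# Create the journal and ledger directories for the bookie on port $1 and
# print the settings that point that bookie at them.
bookie_env() (
  dir="$BASE/bookie-$1"
  mkdir -p "$dir/journal" "$dir/ledgers"
  echo "bookiePort=$1"
  echo "journalDirectory=$dir/journal"
  echo "ledgerDirectories=$dir/ledgers"
)
```

Running `bookie_env 3181` and `bookie_env 3182` gives each bookie a separate set of values, which can then be passed to the container as environment variables.
----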
2018-05-17 19:02:19 UTC - Karthik Palanivelu: Yes Cool That works for me...
heavy_check_mark : Sijie Guo
----
2018-05-17 19:02:53 UTC - Sijie Guo: that’s interesting. after that, does it
never succeed, or does it eventually succeed?
----
2018-05-17 19:03:06 UTC - Karthik Palanivelu: One more question, how can I
associate the data created by bookie-1 to bookie-2 when bookie-1 is dead?
----
2018-05-17 19:04:34 UTC - Karthik Palanivelu: Or is that even advisable?
----
2018-05-17 19:04:38 UTC - Ali Ahmed: I don’t think it’s advisable
----
2018-05-17 19:05:30 UTC - Ali Ahmed: you generally don’t re associate node
data, you keep enough nodes with replicas to tolerate failures
----
2018-05-17 19:06:56 UTC - Karthik Palanivelu: Ok Cool got it.
----
2018-05-17 19:11:25 UTC - Karthik Palanivelu: Related to the above
question/answer: if some data is left behind after a few containers die, do I
need to clean up the disk eventually?
----
2018-05-17 19:21:15 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a
file:
<https://apache-pulsar.slack.com/files/U7VRE0Q1G/FARJR4DNG/-.xml|Untitled> and
commented: Do we have this feature built in for Pulsar instance of BookKeeper?
----
2018-05-17 19:29:32 UTC - Sijie Guo: @Karthikeyan Palanivelu
- wondering where the text is from? it seems to be outdated. e.g.
BookKeeperTools was removed and the new command is `bin/bookkeeper shell
recover`; bookies now support adding a new disk on the fly; and so on.
> Do we have this feature built in for Pulsar instance of BookKeeper?
It should also be available in the shell script shipped as part of pulsar:
`bin/bookkeeper shell recover`
back to your original question:
> if some data is left behind after a few containers die, do I need to clean
up the disk eventually?
do you need this data? if you need this data, you don’t need to do anything,
just relaunch your docker process.
if you don’t need this data, you can simply wipe it out by removing the
directory, or use the tool `bin/bookkeeper shell bookieformat`
----
2018-05-17 19:32:18 UTC - Ali Ahmed: @Karthikeyan Palanivelu I think you may be
looking at twitter bookkeeper which is an old repo
----
2018-05-17 19:32:29 UTC - Ali Ahmed: the new location is here
```<https://github.com/apache/bookkeeper>```
----
2018-05-17 21:08:27 UTC - Karthik Palanivelu: Oh sure got it. Let me try this
option. Reason is in case the docker IP changes on the host upon restart/start
we should have a means to associate the data with Bookie.
----
2018-05-17 21:08:53 UTC - Sijie Guo: oh i see
----
2018-05-17 21:09:33 UTC - Sijie Guo: you actually can configure
advertisedAddress; you can probably use the host IP as the advertisedAddress
----
2018-05-17 21:09:48 UTC - Sijie Guo: this can be done in k8s, so I assume it
is doable using plain docker
----
2018-05-17 21:10:29 UTC - Sijie Guo: I am not sure if anyone in the slack
channel knows how docker can use the host IP. if anyone has a quick answer,
please help. otherwise I can look around.
----
2018-05-17 21:12:20 UTC - Matteo Merli: It should be something like :
```
# docker options go before the image name; sh -c runs the chained commands
docker run -e advertisedAddress=1.2.3.4 -p 3181:3181 apachepulsar/pulsar \
  sh -c "bin/apply-config-from-env.py conf/bookkeeper.conf && bin/pulsar bookie"
```
+1 : Sijie Guo
----
2018-05-17 21:13:09 UTC - Matteo Merli: Haven’t really tried the precise
command..
----
2018-05-17 21:14:07 UTC - Matteo Merli: point is that you just need to pass env
variable inside the container and then you can use the
`apply-config-from-env.py` to have the values replaced in the config files
----
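To illustrate the idea Matteo describes (environment values replacing entries in a config file), here is a rough sketch. It is not the real `apply-config-from-env.py`: `apply_env_to_conf` is a hypothetical helper that takes the key names explicitly and only rewrites `key=...` lines already present in the file.

```shell
#!/bin/sh
# Illustration only, not the real apply-config-from-env.py.
# Usage: apply_env_to_conf FILE KEY...
# For each KEY with a non-empty environment variable of the same name,
# rewrite the matching "KEY=..." line in FILE with that value.
# Values containing "|", "&" or "\" would need escaping; not handled here.
apply_env_to_conf() (
  conf=$1
  shift
  for key in "$@"; do
    eval "value=\${$key-}"          # look up the variable named by $key
    [ -n "$value" ] || continue     # unset or empty: leave the file alone
    sed "s|^$key=.*|$key=$value|" "$conf" > "$conf.tmp" &&
      mv "$conf.tmp" "$conf"
  done
)
```

With `advertisedAddress=1.2.3.4` exported, `apply_env_to_conf conf/bookkeeper.conf advertisedAddress` would update that one line and leave the rest of the file unchanged.
----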
2018-05-17 22:34:16 UTC - Igor Zubchenok: We've just released our Pulsar based
solution to production.
----
2018-05-17 22:34:21 UTC - Igor Zubchenok: :slightly_smiling_face:
passenger_ship : Matteo Merli, Ali Ahmed, Sijie Guo, Jerry Peng, Jon Bock,
Guillaume LECROC
----
2018-05-17 22:35:39 UTC - Ali Ahmed: Would you like to add a logo to apache
pulsar’s website ?
----
2018-05-17 22:38:23 UTC - Igor Zubchenok: Why not, could you send svg?
----
2018-05-17 22:41:28 UTC - Ali Ahmed: it’s in the pulsar repo
“./site/img/pulsar.svg”
----
2018-05-17 22:41:45 UTC - Ali Ahmed: @Ali Ahmed uploaded a file:
<https://apache-pulsar.slack.com/files/U6EHQ91KM/FARUUQSSW/pulsar.svg|pulsar.svg>
----
2018-05-18 01:05:17 UTC - Karthik Palanivelu: @Sijie Guo @Matteo Merli Will try
this option and will get back to you. Thanks and Appreciate your time.
----
2018-05-18 04:24:23 UTC - Anand Ranganathan: @Anand Ranganathan has joined the
channel
----
2018-05-18 08:00:46 UTC - Marco Didonna: @Marco Didonna has joined the channel
----
2018-05-18 08:18:44 UTC - Vasily Yanov: @Sijie Guo I have zabbix, but I saw
nothing strange in the disk/CPU state at the moment in question. Ok. Will
continue my investigation
----