2018-05-17 09:15:45 UTC - Xiaolin Zhang: @Xiaolin Zhang has joined the channel
----
2018-05-17 15:01:19 UTC - Igor Zubchenok: Hello guys!

We're connecting the Pulsar client to a single broker by URL. If this broker 
goes down, the Pulsar client does not try to connect to the other working 
brokers, although I expected it to.
What guidelines do you have for handling this case?
----
2018-05-17 15:03:18 UTC - Matteo Merli: A common way is to either use a VIP 
load balancer for service discovery (if available) or set up a DNS name that 
resolves to the list of IPs of the brokers.
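Before pointing clients at such a DNS name, it can help to check which brokers the name actually expands to. A minimal sketch, assuming glibc's `getent` is available; `pulsar.example.internal` is a hypothetical name, and 6650 is Pulsar's default binary-protocol port:

```shell
#!/usr/bin/env bash
# Print the IPv4 addresses a DNS name resolves to.
# Assumes glibc `getent`, whose ahostsv4 output has the address in column 1.
resolve_ipv4() {
  getent ahostsv4 "$1" | awk '{ print $1 }'
}

# Turn IPv4 addresses on stdin into de-duplicated Pulsar service URLs.
to_service_urls() {
  local port="${1:-6650}" ip
  sort -u | while read -r ip; do
    if [[ -n "$ip" ]]; then
      printf 'pulsar://%s:%s\n' "$ip" "$port"
    fi
  done
}
```

For example, `resolve_ipv4 pulsar.example.internal | to_service_urls` should print one line per broker if the DNS records list every broker.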
----
2018-05-17 15:13:12 UTC - Igor Zubchenok: We'll try setting up a DNS name that 
resolves to the list of broker IPs.

However, we finally broke Pulsar and got this exception:
org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: 
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering 
ledger
----
2018-05-17 15:14:19 UTC - Igor Zubchenok: BookKeeper logs many exceptions 
after restarting the main broker.
----
2018-05-17 15:15:47 UTC - Vasily Yanov: exceptions:
```
2018-05-17 15:14:36,509 - WARN  [bookkeeper-ml-workers-38-1:ServerCnx@650] - 
[/1.1.1.1:32945][<persistent://server-ali-t2-1526567437134/prod-pulsar-cluster-1/session_queue/7092e9f8-80a1-4769-8a89-76c682a38a73>][sender]
 Failed to create consumer: 
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering 
ledger
java.util.concurrent.CompletionException: 
org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: 
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering 
ledger
        at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at 
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
```
----
2018-05-17 15:17:35 UTC - Vasily Yanov: ```
2018-05-17 15:14:51,507 - ERROR 
[BookKeeperClientWorker-23-1:PersistentDispatcherSingleActiveConsumer@323] - 
[<persistent://server-ali-t2-1526567437134/prod-pulsar-cluster-1/session_init/1ab5aea6-bebc-472d-b4b2-7492f3af44a4>
 / 
sender-Consumer{subscription=PersistentSubscription{topic=<persistent://server-ali-t2-1526567437134/prod-pulsar-cluster-1/session_init/1ab5aea6-bebc-472d-b4b2-7492f3af44a4>,
 name=sender}, consumerId=1746, consumerName=38bc2, address=/1.1.1.1:32945}] 
Error reading entries at 1092669:1 : Bookie operation timeout - Retrying to 
read in 15.0 seconds
2018-05-17 15:15:06,507 - WARN  [BookKeeperClientWorker-22-1:PendingAddOp@238] 
- Write did not succeed: L1101172 E0 on 1.1.1.1:3181, rc = -23
2018-05-17 15:15:06,507 - WARN  
[BookKeeperClientWorker-22-1:RackawareEnsemblePlacementPolicy@553] - Failed to 
choose a bookie: excluded [<Bookie:1.1.1.1:3181>, 
<Bookie:3.3.3.3:3181>, <Bookie:2.2.2.2:3181>], fallback to choose 
bookie randomly from the cluster.
```
----
2018-05-17 15:18:35 UTC - Vasily Yanov: ```
2018-05-17 15:17:21,509 - WARN  [BookKeeperClientWorker-25-1:PendingAddOp@238] 
- Write did not succeed: L1100903 E418 on 2.2.2.2:3181, rc = -23
2018-05-17 15:17:21,509 - WARN  [BookKeeperClientWorker-25-1:LedgerHandle@919] 
- Write did not succeed to 2.2.2.2:3181, bookieIndex 0, but we have already 
fixed it.
2018-05-17 15:17:21,509 - WARN  [BookKeeperClientWorker-25-1:PendingAddOp@238] 
- Write did not succeed: L1100903 E421 on 2.2.2.2:3181, rc = -23
2018-05-17 15:17:21,509 - WARN  [BookKeeperClientWorker-25-1:LedgerHandle@919] 
- Write did not succeed to 2.2.2.2:3181, bookieIndex 0, but we have already 
fixed it.
```
----
2018-05-17 15:32:38 UTC - Igor Zubchenok: We're trying to reproduce it again 
and will send you all the logs (zk+bk+broker from all 3 nodes) at INFO level
----
2018-05-17 15:42:37 UTC - Sijie Guo: @Igor Zubchenok do you mind describing the 
sequence on how this happens?
----
2018-05-17 15:49:38 UTC - Igor Zubchenok: - we set up 3 nodes: pulsar-01, 
pulsar-02, pulsar-03
- we start pulsar-03, then pulsar-02, then pulsar-01
- then our Pulsar client connects to pulsarbroker-03 directly (no VIP load 
balancing or multiple IP addresses in DNS)
- then we stop pulsarbroker-01, wait 30 seconds, start pulsarbroker-01
- then we wait 2 minutes, stop pulsarbroker-02, wait 30 seconds, start 
pulsarbroker-02
- then we wait 2 minutes, stop pulsarbroker-03
- at this point our instance throws a lot of exceptions
- we wait 30 seconds, start pulsarbroker-03
- our instance does not work even after the restart, until we create new 
topics/properties
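The reproduction steps above can be scripted so the timing is identical on every run. A sketch, assuming (hypothetically) that the nodes are reachable over `ssh` as `pulsar-01` to `pulsar-03` and that the broker runs as a systemd unit named `pulsar-broker`; by default it only prints the commands:

```shell
#!/usr/bin/env bash
# Replays the broker stop/start sequence from the reproduction steps.
# Hypothetical: ssh host names pulsar-01..03 and the unit name "pulsar-broker".
UNIT="${UNIT:-pulsar-broker}"

# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Wait $1 seconds; PAUSE_SCALE=0 skips the waits, e.g. for a dry run.
pause() {
  sleep $(( $1 * ${PAUSE_SCALE:-1} ))
}

reproduce() {
  local host
  for host in pulsar-01 pulsar-02; do
    run ssh "$host" sudo systemctl stop "$UNIT"
    pause 30
    run ssh "$host" sudo systemctl start "$UNIT"
    pause 120
  done
  # The exceptions appeared after stopping the broker the client was connected to.
  run ssh pulsar-03 sudo systemctl stop "$UNIT"
  pause 30
  run ssh pulsar-03 sudo systemctl start "$UNIT"
}
```

Calling `DRY_RUN=0 reproduce` executes the commands instead of printing them.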
----
2018-05-17 15:59:41 UTC - Sijie Guo: a couple more questions:

- are you using the default configuration, basically replication settings 2/2/2?
- the WARN logging seems to be normal when you kill a Pulsar instance, because 
it basically tries to write the entries to the Pulsar broker (bookie) you 
stopped, and then it does ensemble changes.
- the last step seems a bit unusual: “instance cannot work until create another 
topics”? can you describe “instance cannot work” in more detail?
----
2018-05-17 16:05:02 UTC - Vasily Yanov: @Sijie Guo
1. it should be, because we didn't change anything in bookkeeper.conf except 
the zk hosts and journalSyncData
2. ok
----
2018-05-17 16:05:37 UTC - Igor Zubchenok: 3. my instance cannot work because I 
get 
`org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: 
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering 
ledger`
note: we stop/start only brokers during testing.
P.S. we have failed to reproduce the issue with the steps above, but we've 
deleted more than 40GB of BookKeeper and ZooKeeper data.
----
2018-05-17 16:05:41 UTC - Vasily Yanov: btw: which parameters should I check to 
be sure about the replication values?
----
2018-05-17 16:15:31 UTC - Sijie Guo: @Vasily Yanov in the broker conf, 
managedLedgerDefaultEnsembleSize / managedLedgerDefaultWriteQuorum / 
managedLedgerDefaultAckQuorum
----
2018-05-17 16:16:03 UTC - Sijie Guo: > we stop/start only broker during 
testing.

so do you run bookies and brokers as separate processes, or in the same process?
----
2018-05-17 16:17:04 UTC - Vasily Yanov: ```
cat /opt/pulsar/conf/broker.conf | grep -E 
"managedLedgerDefaultEnsembleSize|managedLedgerDefaultWriteQuorum|managedLedgerDefaultAckQuorum"
managedLedgerDefaultEnsembleSize=2
managedLedgerDefaultWriteQuorum=2
managedLedgerDefaultAckQuorum=2
```
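The three values can also be checked mechanically. BookKeeper requires ensemble size >= write quorum >= ack quorum, and with Qa = Qw = 2 every entry must be acknowledged by both bookies it is written to. The sketch below validates that ordering and, as a rough rule assumed here, reports how many bookies beyond the ensemble size remain available for ensemble changes when a bookie goes away. The bookie count is passed in by hand:

```shell
#!/usr/bin/env bash
# Print the value of key $2 from the properties file $1 (last occurrence wins).
conf_value() {
  awk -F= -v k="$2" '$1 == k { v = substr($0, length(k) + 2) } END { print v }' "$1"
}

# Validate E >= Qw >= Qa >= 1 in broker.conf $1 for a cluster of $2 bookies,
# and report "spare" bookies: those outside an ensemble that an ensemble
# change could switch to (an assumed rule of thumb, not a BookKeeper metric).
check_replication() {
  local conf="$1" bookies="$2" e qw qa
  e="$(conf_value "$conf" managedLedgerDefaultEnsembleSize)"
  qw="$(conf_value "$conf" managedLedgerDefaultWriteQuorum)"
  qa="$(conf_value "$conf" managedLedgerDefaultAckQuorum)"
  if ! (( e >= qw && qw >= qa && qa >= 1 )); then
    echo "invalid: E=$e Qw=$qw Qa=$qa" >&2
    return 1
  fi
  if (( bookies < e )); then
    echo "only $bookies bookies for an ensemble of $e" >&2
    return 1
  fi
  echo "E=$e Qw=$qw Qa=$qa bookies=$bookies spare=$(( bookies - e ))"
}
```

With the 2/2/2 values above on a 3-bookie cluster, `check_replication /opt/pulsar/conf/broker.conf 3` reports one spare bookie.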
----
2018-05-17 16:17:11 UTC - Sijie Guo: > my instance cannot work cause I get 
`org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: 
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering 
ledger`

interesting. I am wondering whether those are transient errors. does the client 
succeed after retries?
----
2018-05-17 16:17:24 UTC - Sijie Guo: @Vasily Yanov thank you
----
2018-05-17 16:18:18 UTC - Vasily Yanov: no
----
2018-05-17 16:19:07 UTC - Vasily Yanov: how can I check whether the bookie and 
broker run as separate processes or not?
----
2018-05-17 16:25:27 UTC - Sijie Guo: how do you start the pulsar brokers?
----
2018-05-17 16:25:58 UTC - Igor Zubchenok: no, it does not succeed, we tried 
several times
----
2018-05-17 16:36:40 UTC - Vasily Yanov: as a systemd unit:
----
2018-05-17 16:36:51 UTC - Vasily Yanov: ```
[Unit]

Description=Apache Pulsar
Documentation=https://pulsar.incubator.apache.org/
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
#ExecStart=/opt/pulsar/bin/pulsar-daemon start broker
ExecStart=/opt/pulsar/bin/pulsar broker
ExecStop=/opt/pulsar/bin/pulsar-daemon stop broker
Restart=on-failure
SyslogIdentifier=broker
LimitNOFILE=64536
LimitNPROC=8192

[Install]
WantedBy=multi-user.target
```
----
2018-05-17 16:42:28 UTC - Karthik Palanivelu: Hello there, I am trying to build 
Pulsar in Docker. I am trying to bring up 2 bookies on ports 3181 and 3182 
respectively on the same host, and it is failing with the exception below. Can 
you please help me figure out how to run multiple bookies on the same host?
----
2018-05-17 16:45:41 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a 
file: <https://apache-pulsar.slack.com/files/U7VRE0Q1G/FASJYB0HL/-.m|Untitled>
----
2018-05-17 16:54:50 UTC - Karthik Palanivelu: Hi, the parameter does not work 
on RHEL; you need to wrap it in another script, like below:

```
#!/bin/bash

export JAVA_HOME=/opt/jdk1.8
/opt/pulsar/bin/pulsar broker
```
----
2018-05-17 17:10:15 UTC - Vasily Yanov: Hi! I think it's not our case.
----
2018-05-17 17:40:44 UTC - Ali Ahmed: @Karthikeyan Palanivelu How are you 
configuring the bookies? Are you using docker compose?
----
2018-05-17 17:45:25 UTC - Matteo Merli: @Karthikeyan Palanivelu Are they 
sharing the same disk paths?
----
2018-05-17 17:46:39 UTC - Matteo Merli: BookKeeper has a mechanism (called 
“cookies”) to ensure a bookie's advertised name matches the data it contains. 
If there are discrepancies, it refuses to start up
----
2018-05-17 17:47:23 UTC - Matteo Merli: in this case it looks like one bookie 
is trying to start with the data that is supposed to belong to the other bookie
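To illustrate the idea only: the toy function below records which bookie identity first used a data directory and refuses a different identity later, which is roughly the situation of two bookies sharing one path. It is a hypothetical model, not BookKeeper's cookie implementation or on-disk format; the `identity` file name is made up:

```shell
#!/usr/bin/env bash
# Toy model of a startup identity check (NOT BookKeeper's real cookie code).
# Claims directory $1 for bookie identity $2, e.g. "10.0.0.1:3181".
claim_data_dir() {
  local dir="$1" id="$2" owner
  local stamp="$dir/identity"   # hypothetical marker file
  mkdir -p "$dir" || return 1
  if [[ -f "$stamp" ]]; then
    owner="$(<"$stamp")"
    if [[ "$owner" != "$id" ]]; then
      echo "refusing to start: $dir belongs to $owner, not $id" >&2
      return 1
    fi
  else
    # First use: record the owner.
    printf '%s\n' "$id" > "$stamp"
  fi
}
```

The second bookie fails as soon as it sees a directory already claimed by the first one, which is why each bookie needs its own directories.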
----
2018-05-17 17:48:12 UTC - Sijie Guo: @Vasily Yanov it seems that this script 
only starts the broker. do you have a separate script to start the bookie?
----
2018-05-17 17:48:29 UTC - Vasily Yanov: yes
----
2018-05-17 17:49:03 UTC - Vasily Yanov: ```
[Unit]

Description=Bookkeeper
Documentation=something realy strange
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/opt/pulsar/bin/bookkeeper bookie
#ExecStart=/opt/pulsar/bin/pulsar-daemon start bookie
ExecStop=/opt/pulsar/bin/pulsar-daemon stop bookie
Restart=on-failure
SyslogIdentifier=bookkeeper
LimitNOFILE=64536
LimitNPROC=8192

[Install]
WantedBy=multi-user.target
```
----
2018-05-17 17:49:06 UTC - Sijie Guo: oh so you started bookie and broker 
separately, and during your tests, you only kill brokers?
----
2018-05-17 17:49:13 UTC - Vasily Yanov: yes
----
2018-05-17 17:49:17 UTC - Vasily Yanov: right
----
2018-05-17 17:49:24 UTC - Sijie Guo: that’s interesting.
----
2018-05-17 17:49:57 UTC - Vasily Yanov: exactly. Only brokers were affected 
with systemctl stop|start
----
2018-05-17 17:50:19 UTC - Sijie Guo: what is the hardware of these 3 nodes? 
like number of CPUs, number of disks, memory size?
----
2018-05-17 17:50:58 UTC - Vasily Yanov: 8x CPU, 32 GB RAM, 2x 2 TB HDD in RAID1
----
2018-05-17 17:51:57 UTC - Vasily Yanov: brokers and bookies start with:
-Xms4g -Xmx8g -XX:MaxDirectMemorySize=8g
----
2018-05-17 18:50:37 UTC - Sijie Guo: @Vasily Yanov interesting. the hardware 
settings and JVM settings seem to be good. and since you are only 
starting/stopping brokers, reads from bookies shouldn't time out, which is a bit 
strange, unless starting/stopping brokers impacts the disks. is the RAID1 used 
only by pulsar?
----
2018-05-17 18:52:12 UTC - Vasily Yanov: Yes. It is used only by 
pulsar/bookkeeper/zookeeper
----
2018-05-17 18:53:34 UTC - Sijie Guo: do you have any monitoring mechanisms to 
see what’s happening around network/disks?
----
2018-05-17 18:56:03 UTC - Karthik Palanivelu: @Ali Ahmed @Matteo Merli I am not 
using docker compose. Yes, I am trying to assign two bookies on the same host to 
the same data dir. How can I give each one its own path? I prefer to keep the 
base path /prod/data/, inside which the bookies should write their data.
----
2018-05-17 19:00:43 UTC - Sijie Guo: @Karthikeyan Palanivelu: you can create a 
subdirectory /prod/data/bookie-x for each bookie, for example 
/prod/data/bookie-3181 and /prod/data/bookie-3182. then, when you start each 
docker container, pass the environment variables 
journalDirectory=/prod/data/bookie-x/journal and 
ledgerDirectories=/prod/data/bookie-x/ledgers
----
2018-05-17 19:01:47 UTC - Sijie Guo: so each bookie will use its own separate 
directory; those two environment variables configure the bookie docker process 
to use different directories.
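A sketch of that layout for two bookies on one host, printing the `docker run` command for each. Assumptions: the `apachepulsar/pulsar` image, settings applied with `bin/apply-config-from-env.py`, the host directory mounted at the same path inside the container, and `BASE_DIR` defaulting to `/prod/data`. Other required settings, such as the ZooKeeper connection, are left out:

```shell
#!/usr/bin/env bash
# Base directory on the host; each bookie gets $BASE_DIR/bookie-<port>.
BASE_DIR="${BASE_DIR:-/prod/data}"

# Create the journal/ledger directories for the bookie on port $1 and print a
# docker run command that points the bookie at them. The host directory is
# mounted at the same path inside the container.
bookie_run_cmd() {
  local port="$1"
  local dir="$BASE_DIR/bookie-$port"
  mkdir -p "$dir/journal" "$dir/ledgers" || return 1
  local -a cmd=(
    docker run -d
    -p "$port:$port"
    -v "$dir:$dir"
    -e "bookiePort=$port"
    -e "journalDirectory=$dir/journal"
    -e "ledgerDirectories=$dir/ledgers"
    apachepulsar/pulsar
    bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && bin/pulsar bookie"
  )
  # %q quotes each argument so the printed line can be pasted into a shell.
  printf '%q ' "${cmd[@]}"
  echo
}
```

For the two bookies in the question: `for p in 3181 3182; do bookie_run_cmd "$p"; done`.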
----
2018-05-17 19:01:57 UTC - Sijie Guo: does that address your requirement?
----
2018-05-17 19:02:19 UTC - Karthik Palanivelu: Yes, cool, that works for me...
heavy_check_mark : Sijie Guo
----
2018-05-17 19:02:53 UTC - Sijie Guo: that’s interesting. does it never succeed 
after that, or does it eventually succeed?
----
2018-05-17 19:03:06 UTC - Karthik Palanivelu: One more question: how can I 
associate the data created by bookie-1 with bookie-2 when bookie-1 is dead?
----
2018-05-17 19:04:34 UTC - Karthik Palanivelu: Or is that even advisable?
----
2018-05-17 19:04:38 UTC - Ali Ahmed: I don’t think it’s advisable
----
2018-05-17 19:05:30 UTC - Ali Ahmed: you generally don’t re-associate node 
data; you keep enough nodes with replicas to tolerate failures
----
2018-05-17 19:06:56 UTC - Karthik Palanivelu: Ok Cool got it.
----
2018-05-17 19:11:25 UTC - Karthik Palanivelu: Related to the above 
question/answer: if some residual data is left after a few containers die, do I 
need to clean up the disk eventually?
----
2018-05-17 19:21:15 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a 
file: 
<https://apache-pulsar.slack.com/files/U7VRE0Q1G/FARJR4DNG/-.xml|Untitled> and 
commented: Do we have this feature built in for Pulsar instance of BookKeeper?
----
2018-05-17 19:29:32 UTC - Sijie Guo: @Karthikeyan Palanivelu 

- I am wondering where the text is from; it seems to be outdated. e.g. 
BookKeeperTools has been removed and the new command is `bin/bookkeeper shell 
recover`; bookies now support adding a new disk on-the-fly; and so on.

> Do we have this feature built in for Pulsar instance of BookKeeper?

It should also be available in the shell script shipped as part of Pulsar: 
`$ bin/bookkeeper shell recover`

back to your original question:

> If I get a residue of data left after being few containers are dead, do 
I need to clean up the disk eventually?

do you need this data? if you need it, you don’t need to do anything, just 
relaunch your docker process.

if you don’t need this data, you can simply wipe it out by removing the 
directory, or use the tool `bin/bookkeeper shell bookieformat`
----
2018-05-17 19:32:18 UTC - Ali Ahmed: @Karthikeyan Palanivelu I think you may be 
looking at twitter bookkeeper which is an old repo
----
2018-05-17 19:32:29 UTC - Ali Ahmed: the new location is here: 
<https://github.com/apache/bookkeeper>
----
2018-05-17 21:08:27 UTC - Karthik Palanivelu: Oh sure, got it. Let me try this 
option. The reason is that if the docker IP changes on the host upon 
restart/start, we should have a means to associate the data with the bookie.
----
2018-05-17 21:08:53 UTC - Sijie Guo: oh i see
----
2018-05-17 21:09:33 UTC - Sijie Guo: you can actually configure 
advertisedAddress; you can probably use the host IP as the advertisedAddress
----
2018-05-17 21:09:48 UTC - Sijie Guo: it is possible to do this in k8s, so I 
assume it is doable using plain docker
----
2018-05-17 21:10:29 UTC - Sijie Guo: I am not sure if anyone in the slack 
channel knows how docker can use the host IP; if anyone has a quick answer, 
please help. otherwise I can look around.
----
2018-05-17 21:12:20 UTC - Matteo Merli: It should be something like : 

```
docker run -e advertisedAddress=1.2.3.4 -p 3181:3181 apachepulsar/pulsar \
  bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && bin/pulsar bookie"
```
+1 : Sijie Guo
----
2018-05-17 21:13:09 UTC - Matteo Merli: Haven’t really tried the precise 
command..
----
2018-05-17 21:14:07 UTC - Matteo Merli: the point is that you just need to pass 
env variables into the container, and then you can use 
`apply-config-from-env.py` to have the values replaced in the config files
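Roughly, the substitution works like this: every `key=value` line whose key matches an exported environment variable gets that variable's value. The shell sketch below approximates that behaviour for illustration; it is not the actual `apply-config-from-env.py` code, which may handle more cases:

```shell
#!/usr/bin/env bash
# Rewrite conf file $1 in place: each "key=value" line whose key names an
# exported environment variable gets that variable's value. Comment lines,
# blank lines and lines without "=" are copied unchanged.
# Illustrative only; not the actual apply-config-from-env.py implementation.
apply_env_to_conf() {
  local conf="$1" tmp rc
  tmp="$(mktemp)" || return 1
  awk '
    /^[ \t]*#/ || index($0, "=") == 0 { print; next }
    {
      key = substr($0, 1, index($0, "=") - 1)
      if (key in ENVIRON) print key "=" ENVIRON[key]
      else print
    }' "$conf" > "$tmp" && cat "$tmp" > "$conf"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Writing back with `cat` instead of `mv` keeps the original file's permissions.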
----
2018-05-17 22:34:16 UTC - Igor Zubchenok: We've just released our Pulsar based 
solution to production.
----
2018-05-17 22:34:21 UTC - Igor Zubchenok: :slightly_smiling_face:
passenger_ship : Matteo Merli, Ali Ahmed, Sijie Guo, Jerry Peng, Jon Bock, 
Guillaume LECROC
----
2018-05-17 22:35:39 UTC - Ali Ahmed: Would you like to add a logo to the Apache 
Pulsar website?
----
2018-05-17 22:38:23 UTC - Igor Zubchenok: Why not, could you send svg?
----
2018-05-17 22:41:28 UTC - Ali Ahmed: it’s in the pulsar repo 
“./site/img/pulsar.svg”
----
2018-05-17 22:41:45 UTC - Ali Ahmed: @Ali Ahmed uploaded a file: 
<https://apache-pulsar.slack.com/files/U6EHQ91KM/FARUUQSSW/pulsar.svg|pulsar.svg>
----
2018-05-18 01:05:17 UTC - Karthik Palanivelu: @Sijie Guo @Matteo Merli Will try 
this option and will get back to you. Thanks and Appreciate your time.
----
2018-05-18 04:24:23 UTC - Anand Ranganathan: @Anand Ranganathan has joined the 
channel
----
2018-05-18 08:00:46 UTC - Marco Didonna: @Marco Didonna has joined the channel
----
2018-05-18 08:18:44 UTC - Vasily Yanov: @Sijie Guo I have zabbix, but I saw 
nothing strange regarding the disk/CPU state at the mentioned moment. Ok. Will 
continue my investigation
----