Slack digest for #general - 2019-10-23

Apache Pulsar Slack Wed, 23 Oct 2019 02:11:46 -0700

2019-10-22 10:51:26 UTC - sunliuchang: @sunliuchang has joined the channel
----
2019-10-22 12:17:25 UTC - Retardust: The consumer gets stuck after a restart. 
Only after restarting both Pulsar and the consumer does it continue to process 
the backlog.
Any ideas? There is nothing in the logs.
----
2019-10-22 12:37:34 UTC - Sijie Guo: Can you get the topic stats using 
`pulsar-admin topics stats`?
----
2019-10-22 12:41:53 UTC - Alexandre DUVAL: Hi, I have an issue with a function: 
it starts fine, and after consuming 3xxx messages it gets stuck. There is no 
error, the function is reported as running, and the message it gets stuck on is 
never the same. Each time I restart the function, it again processes 3xxx 
messages and then gets stuck. There is RAM available. It gets stuck in 
context.publish(). Do you have any idea?
----
2019-10-22 12:44:10 UTC - Retardust: {
  "msgRateIn" : 0.0,
  "msgThroughputIn" : 0.0,
  "msgRateOut" : 10.00003482795463,
  "msgThroughputOut" : 3106588.20290695,
  "averageMsgSize" : 0.0,
  "storageSize" : 10668635541,
  "publishers" : [ ],
  "subscriptions" : {
    "skdf4k" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 383636,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "msgRateExpired" : 0.0,
      "consumers" : [ ],
      "isReplicated" : false
    },
    "journal_consumer" : {
      "msgRateOut" : 10.00003482795463,
      "msgThroughputOut" : 3106588.20290695,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 383736,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "type" : "Failover",
      "activeConsumerName" : "91149",
      "msgRateExpired" : 0.0,
      "consumers" : [ {
        "msgRateOut" : 10.00003482795463,
        "msgThroughputOut" : 3106588.20290695,
        "msgRateRedeliver" : 0.0,
        "consumerName" : "91149",
        "availablePermits" : 0,
        "unackedMessages" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "metadata" : { },
        "connectedSince" : "2019-10-22T12:14:02.351Z",
        "clientVersion" : "2.4.1",
        "address" : "/172.28.117.8:60366"
      } ],
      "isReplicated" : true
    }
  },
  "replication" : { },
  "deduplicationStatus" : "Disabled"
}



seems ok
----
2019-10-22 12:44:41 UTC - Retardust: but it's after restart
----
2019-10-22 12:45:06 UTC - Retardust: I will try to get the stats when the problem occurs
----
2019-10-22 12:48:29 UTC - Raph: @Raph has joined the channel
----
2019-10-22 12:57:03 UTC - Alexandre DUVAL: 
----
2019-10-22 13:08:23 UTC - Sijie Guo: OK :ok_hand:
----
2019-10-22 13:23:41 UTC - Alexandre DUVAL: I bumped my function worker from 
2.4.0 to 2.4.1 and now get this error:
----
2019-10-22 13:23:43 UTC - Alexandre DUVAL: ```
13:10:51.038 [clevercloud/functions/accessLogsCleverCloudADCHaproxy-0] ERROR org.apache.pulsar.functions.instance.JavaInstanceRunnable - [clevercloud/functions/accessLogsCleverCloudADCHaproxy:0] Uncaught exception in Java Instance
java.lang.RuntimeException: User class constructor throws exception
        at org.apache.pulsar.functions.utils.Reflections.createInstance(Reflections.java:126) ~[org.apache.pulsar-pulsar-functions-utils-2.4.1.jar:2.4.1]
        at org.apache.pulsar.functions.instance.JavaInstanceRunnable.setupJavaInstance(JavaInstanceRunnable.java:189) ~[org.apache.pulsar-pulsar-functions-instance-2.4.1.jar:?]
        at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:234) [org.apache.pulsar-pulsar-functions-instance-2.4.1.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_192]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_192]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_192]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_192]
        at org.apache.pulsar.functions.utils.Reflections.createInstance(Reflections.java:118) ~[org.apache.pulsar-pulsar-functions-utils-2.4.1.jar:2.4.1]
        ... 3 more
Caused by: java.lang.LinkageError: ClassCastException: attempting to castjar:file:/pulsar/lib/javax.ws.rs-javax.ws.rs-api-2.1.jar!/javax/ws/rs/client/ClientBuilder.class to file:/tmp/pulsar-nar/pulsar-functions-0.1.0-SNAPSHOT.jar-unpacked/javax/ws/rs/client/ClientBuilder.class
        at javax.ws.rs.client.ClientBuilder.newBuilder(ClientBuilder.java:81) ~[javax.ws.rs-javax.ws.rs-api-2.1.jar:2.1]
        at javax.ws.rs.client.ClientBuilder.newClient(ClientBuilder.java:97) ~[javax.ws.rs-javax.ws.rs-api-2.1.jar:2.1]
        at com.clevercloud.pulsar.util.GeoIPAPI.updateDatabase(GeoIPAPI.java:101) ~[?:?]
        at com.clevercloud.pulsar.util.GeoIPAPI.<init>(GeoIPAPI.java:45) ~[?:?]
        at com.clevercloud.pulsar.function.ApplicationsAddonsHaproxyAccessLogs.<init>(ApplicationsAddonsHaproxyAccessLogs.java:28) ~[?:?]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_192]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_192]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_192]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_192]
        at org.apache.pulsar.functions.utils.Reflections.createInstance(Reflections.java:118) ~[org.apache.pulsar-pulsar-functions-utils-2.4.1.jar:2.4.1]
        ... 3 more
13:10:51.047 [clevercloud/functions/accessLogsCleverCloudADCHaproxy-0] INFO  org.apache.pulsar.functions.instance.JavaInstanceRunnable - Closing instance
```
----
2019-10-22 14:12:24 UTC - Tim Howard: ASF Jenkins still sideways? it looks like 
it from the build statuses...
----
2019-10-22 14:18:17 UTC - Matteo Merli: We’re still working and getting closer 
to a solution 
----
2019-10-22 14:18:51 UTC - Tim Howard: thanks for the update
----
2019-10-22 14:49:12 UTC - dbartz: @dbartz has joined the channel
----
2019-10-22 15:04:53 UTC - Alexandre DUVAL: @Matteo Merli After bump to 2.4.1 
it's not shaded anymore?
----
2019-10-22 15:05:43 UTC - Matteo Merli: No, the change was meant for 2.5 though 
got backported to 2.4.1 as well
----
2019-10-22 15:06:26 UTC - Matteo Merli: the function framework is not shaded 
anymore, rather it’s using different classloaders for framework and user code
----
2019-10-22 15:16:28 UTC - Alexandre DUVAL: So how should I use this?
----
2019-10-22 15:19:05 UTC - Alexandre DUVAL: Do you have an example?
----
2019-10-22 15:33:19 UTC - Retardust: ```
public class Bridge implements MessageListener<byte[]> {

    private final Producer<JournalBatch> batchProducer;
    private final JournalBatchParser parser;

    @Override
    @SneakyThrows
    public void received(Consumer<byte[]> consumer, Message<byte[]> msg) {
        JournalBatch batch = parse(msg);
        batchProducer.sendAsync(batch)
                .thenAccept(messageId -> ack(consumer, msg));
    }

    private JournalBatch parse(Message<byte[]> msg) {
        return parser.parse(msg.getData());
    }

    @SneakyThrows
    private void ack(Consumer<byte[]> consumer, Message<byte[]> msg) {
        consumer.acknowledgeCumulativeAsync(msg)
                .thenAccept(d -> log.debug("Message ack"));
    }
}

```

Is that OK for connecting two topics while preserving order and at-least-once 
guarantees?
Which settings should I pay attention to?
----
2019-10-22 15:35:48 UTC - Matteo Merli: make sure to set 
`blockIfQueueFull(true)` when creating the `batchProducer`
heavy_check_mark : Retardust
----
2019-10-22 15:36:27 UTC - Matteo Merli: to get backpressure (instead of error) 
when publishing on the downstream topic
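
As a rough illustration of the difference between backpressure and an error, here is a toy model of a bounded pending-message queue in plain Java. It is not the Pulsar client: `PendingQueueSketch` and its methods are hypothetical names, and the only idea borrowed from the thread is that `blockIfQueueFull(true)` makes a full queue block the caller rather than fail the send.

```java
import java.util.concurrent.Semaphore;

final class PendingQueueSketch {
    private final Semaphore slots;
    private final boolean blockIfQueueFull;

    PendingQueueSketch(int maxPendingMessages, boolean blockIfQueueFull) {
        this.slots = new Semaphore(maxPendingMessages);
        this.blockIfQueueFull = blockIfQueueFull;
    }

    /** Reserves a pending slot; blocks or throws when the queue is full, depending on the flag. */
    void send() {
        if (blockIfQueueFull) {
            slots.acquireUninterruptibly();   // backpressure: the caller waits for a free slot
        } else if (!slots.tryAcquire()) {
            throw new IllegalStateException("pending queue is full");
        }
    }

    /** Called when a pending message is confirmed, which frees its slot. */
    void onBrokerAck() {
        slots.release();
    }

    int freeSlots() {
        return slots.availablePermits();
    }

    public static void main(String[] args) {
        PendingQueueSketch failFast = new PendingQueueSketch(2, false);
        failFast.send();
        failFast.send();
        try {
            failFast.send();                  // queue is full and blocking is disabled
        } catch (IllegalStateException e) {
            System.out.println("third send rejected: " + e.getMessage());
        }
        failFast.onBrokerAck();
        failFast.send();                      // succeeds again once a slot is free
    }
}
```

With the flag set to `true`, the third `send()` would instead wait until `onBrokerAck()` is called from another thread, which is what slows the upstream listener down to the downstream publish rate.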
----
2019-10-22 15:39:11 UTC - Matteo Merli: also, you’d need to handle send 
failures. There are 2 possible ways:
 1. Set `sendTimeout` to 0, so that the producer retries forever
 2. Negative ack on publish error:

```
batchProducer.sendAsync(batch)
                .thenAccept(messageId -> ack(consumer, msg))
                .exceptionally(ex -> {
                     // negativeAcknowledge asks the broker to redeliver the message later
                     consumer.negativeAcknowledge(msg);
                     return null;
                });
```
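
To see how the `thenAccept` / `exceptionally` chain above behaves, here is a small stand-alone sketch using only `java.util.concurrent.CompletableFuture`. The class `AckOrNackSketch` and the two lists are hypothetical stand-ins for the real consumer's ack and negative-ack calls; no Pulsar classes are involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class AckOrNackSketch {
    final List<String> acked = new ArrayList<>();
    final List<String> nacked = new ArrayList<>();

    /** Wires the same thenAccept/exceptionally chain around an arbitrary send future. */
    CompletableFuture<Void> bridge(String msg, CompletableFuture<Long> sendResult) {
        return sendResult
                .thenAccept(messageId -> acked.add(msg))   // runs only if the send succeeded
                .exceptionally(ex -> {                      // runs only if an earlier stage failed
                    nacked.add(msg);
                    return null;
                });
    }

    public static void main(String[] args) {
        AckOrNackSketch sketch = new AckOrNackSketch();

        // A send that succeeded: the message is acked.
        sketch.bridge("m1", CompletableFuture.completedFuture(1L)).join();

        // A send that failed: the ack step is skipped and the message is nacked.
        CompletableFuture<Long> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("publish failed"));
        sketch.bridge("m2", failed).join();

        System.out.println("acked=" + sketch.acked + " nacked=" + sketch.nacked);
    }
}
```

One consequence of this shape is that an exception thrown inside the `thenAccept` step also lands in `exceptionally`, so a failing ack would be treated like a failed publish.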
----
2019-10-22 15:45:14 UTC - Retardust: ok
----
2019-10-22 15:48:06 UTC - Retardust: But for throughput and latency, everything 
seems OK?
It's not fast right now :( I have only one consumer, because I need to parse 
a single ordered stream without partitioning.

I see 200 Mbit/s throughput,
20% CPU usage,
not a lot of GC pauses.
Where could the bottleneck be?

Should I check settings like direct buffers, for example?
----
2019-10-22 15:50:08 UTC - Retardust: Overrides of the defaults:

Producer:
batching enabled, 50 ms window, up to 500 messages
maxPendingMessages: 1000
LZ4 compression

Consumer:
max receiver queue size: 1000
2019-10-22 15:50:33 UTC - Retardust: message sizes are somewhere between 5 KB and 1 MB
----
2019-10-22 15:50:52 UTC - Matteo Merli: are the messages batched in the 
upstream topic?
----
2019-10-22 15:51:19 UTC - Matteo Merli: also, check the topic stats for the 
upstream topic
----
2019-10-22 15:51:35 UTC - Matteo Merli: `pulsar-admin topics stats $TOPIC`
----
2019-10-22 15:52:31 UTC - Matteo Merli: and check for :

```
  "availablePermits" : 766, // Number of flow-control permits that Pulsar
                                  // has currently from a consumer. When > 0, it
                                  // means Pulsar can push more messages. When it's
                                  // <= 0, the broker will pause the delivery to
                                  // adjust to consumer processing speed
```
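
The permit mechanism described in that comment can be pictured with a toy model in plain Java. This is not broker or client code: `PermitFlowControl` and its methods are hypothetical names, and the model only captures the rule stated above (one permit per pushed message, delivery pauses at zero).

```java
final class PermitFlowControl {
    private int availablePermits;

    /** Consumer side: allow the broker to push n more messages. */
    void grant(int n) {
        availablePermits += n;
    }

    /** Broker side: push up to `backlog` messages, one permit each; returns how many were sent. */
    int dispatch(int backlog) {
        int sent = Math.max(0, Math.min(backlog, availablePermits));
        availablePermits -= sent;
        return sent;
    }

    int availablePermits() {
        return availablePermits;
    }

    public static void main(String[] args) {
        PermitFlowControl flow = new PermitFlowControl();
        flow.grant(1000);                       // e.g. a receiver queue of 1000
        System.out.println(flow.dispatch(400)); // 400 sent, 600 permits left
        System.out.println(flow.dispatch(900)); // only 600 sent, delivery pauses at 0 permits
        flow.grant(500);                        // consumer has processed 500 messages
        System.out.println(flow.dispatch(300)); // 300 sent, 200 permits left
    }
}
```

Read this way, a consumer that cannot keep up would show `availablePermits` sitting at or near 0 while traffic is flowing, whereas a value that stays well above 0 suggests the broker is not being held back by the consumer.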
----
2019-10-22 15:53:23 UTC - Retardust: availablePermits = 1000 in stats at least
----
2019-10-22 15:54:03 UTC - Matteo Merli: when traffic is ongoing?
----
2019-10-22 15:54:11 UTC - Matteo Merli: then consumer is fast enough
----
2019-10-22 15:54:53 UTC - Retardust: Is there a Prometheus metric to check this? 
I don't see one.
----
2019-10-22 15:57:55 UTC - Matteo Merli: no, it’s not reported on Prometheus
----
2019-10-22 16:08:41 UTC - Retardust: "availablePermits" : 600,
under load.
But there is a huge lag (I reset the offset and am waiting for it to reprocess).
CPU is OK, GC is OK :) network is OK :)

But the rate is still low:

```
      "msgRateOut" : 4.999984407631958,
      "msgThroughputOut" : 2559363.268668303,

```
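
A quick sanity check on those two numbers, as a sketch: assuming `msgThroughputOut` is in bytes per second and `msgRateOut` in messages per second (which is how the topic stats are usually documented), dividing one by the other gives the average dispatched message size, and multiplying the throughput by 8 gives bits per second. `TopicStatsMath` and its method names are made up for illustration.

```java
final class TopicStatsMath {
    /** Average dispatched message size in bytes, or 0 when nothing is flowing. */
    static double averageMessageBytes(double msgRateOut, double msgThroughputOut) {
        return msgRateOut <= 0.0 ? 0.0 : msgThroughputOut / msgRateOut;
    }

    /** Converts bytes per second to megabits per second (1 Mbit = 1,000,000 bits). */
    static double megabitsPerSecond(double bytesPerSecond) {
        return bytesPerSecond * 8.0 / 1_000_000.0;
    }

    public static void main(String[] args) {
        double rate = 4.999984407631958;        // msgRateOut from the stats above
        double throughput = 2559363.268668303;  // msgThroughputOut from the stats above
        System.out.printf("avg message size: %.0f bytes%n", averageMessageBytes(rate, throughput));
        System.out.printf("dispatch rate: %.2f Mbit/s%n", megabitsPerSecond(throughput));
    }
}
```

Under those unit assumptions, the figures work out to roughly 512 KB per message and about 20.5 Mbit/s.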
----
2019-10-22 16:24:56 UTC - Retardust: And what stats should I check on the 
upstream topic? The permits are from the upstream topic's consumer,
and the rates are from the upstream topic too.
----
2019-10-22 16:27:34 UTC - Retardust: downstream topic stats are weird
```
{
  "msgRateIn" : 0.0,
  "msgThroughputIn" : 0.0,
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "averageMsgSize" : 0.0,
  "storageSize" : 64107851814,
  "publishers" : [ {
    "msgRateIn" : 0.0,
    "msgThroughputIn" : 0.0,
    "averageMsgSize" : 0.0,
    "producerId" : 0,
    "metadata" : { },
    "producerName" : "kappa-1295-20",
    "connectedSince" : "2019-10-22T15:44:42.119Z",
    "clientVersion" : "2.4.1",
    "address" : "/172.28.117.36:58302"
  } ],
```
----
2019-10-22 16:39:03 UTC - Retardust: ```

786837 2019-10-22 18:57:38,992 INFO  [ pulsar-client-io-1-1 ] o.a.p.c.i.ClientCnx | [corpint5.moscow.alfaintra.net/172.28.117.19:9022] Broker notification of Closed consumer: 0
786838 2019-10-22 18:57:38,993 INFO  [ pulsar-client-io-1-1 ] o.a.p.c.i.ConnectionHandler | [persistent://t1/n1/queue_journal] [journal_consumer] Closed connection [id: 0x6bb31ad1, L:/172.17.0.50:58302 - R:corpint5.moscow.alfaintra.net/172.28.117.19:9022] -- Will try again in 0.1 s
786940 2019-10-22 18:57:39,095 INFO  [ pulsar-timer-6-1 ] o.a.p.c.i.ConnectionHandler | [persistent://t1/n1/queue_journal] [journal_consumer] Reconnecting after timeout
787007 2019-10-22 18:57:39,162 INFO  [ pulsar-client-io-1-1 ] o.a.p.c.i.ConsumerImpl | [persistent://t1/n1/queue_journal][journal_consumer] Subscribing to topic on cnx [id: 0x6bb31ad1, L:/172.17.0.50:58302 - R:corpint5.moscow.alfaintra.net/172.28.117.19:9022]
787010 2019-10-22 18:57:39,165 INFO  [ pulsar-client-io-1-1 ] o.a.p.c.i.ConsumerImpl | [persistent://t1/n1/queue_journal][journal_consumer] Subscribed to topic on corpint5.moscow.alfaintra.net/172.28.117.19:9022 -- consumer: 0
1170871 2019-10-22 19:04:03,026 INFO  [ pulsar-client-io-1-1 ] o.a.p.c.i.ClientCnx | [corpint5.moscow.alfaintra.net/172.28.117.19:9022] Broker notification of Closed consumer: 0
1170872 2019-10-22 19:04:03,027 INFO  [ pulsar-client-io-1-1 ] o.a.p.c.i.ConnectionHandler | [persistent://t1/n1/queue_journal] [journal_consumer] Closed connection [id: 0x6bb31ad1, L:/172.17.0.50:58302 - R:corpint5.moscow.alfaintra.net/172.28.117.19:9022] -- Will try again in 0.1 s
1170974 2019-10-22 19:04:03,129 INFO  [ pulsar-timer-6-1 ] o.a.p.c.i.ConnectionHandler | [persistent://t1/n1/queue_journal] [journal_consumer] Reconnecting after timeout
1171101 2019-10-22 19:04:03,256 INFO  [ pulsar-client-io-1-1 ] o.a.p.c.i.ConsumerImpl | [persistent://t1/n1/queue_journal][journal_consumer] Subscribing to topic on cnx [id: 0x6bb31ad1, L:/172.17.0.50:58302 - R:corpint5.moscow.alfaintra.net/172.28.117.19:9022]
1171104 2019-10-22 19:04:03,259 INFO  [ pulsar-client-io-1-1 ] o.a.p.c.i.ConsumerImpl | [persistent://t1/n1/queue_journal][journal_consumer] Subscribed to topic on corpint5.moscow.alfaintra.net/172.28.117.19:9022 -- consumer: 0
```

what could be the reason?
----
2019-10-22 16:50:55 UTC - Alexandre DUVAL: @xiaolong.ran hi, @Sijie Guo told me 
to tag you on this. It's about a function that gets stuck on context.publish 
after processing ~3000 messages. Same thing after multiple restarts.
----
2019-10-22 16:51:12 UTC - Alexandre DUVAL: This function had been running for a 
few weeks, and today this happened.
----
2019-10-22 17:36:49 UTC - Retardust: I'm wondering why
msgRateIn per topic is 3mb
but publish throughput for the producer is 12.87 msg/s --- 816.96 Mbit/s
:slightly_smiling_face:
----
2019-10-22 17:37:32 UTC - Sergey Zhemzhitsky: What do you guys think about the 
recent announcement of Splunk's acquisition of Streamlio?
Splunk already has Kafka and Flink internally, so I’m worried about 
Pulsar’s destiny.
```
Streamlio's experience with Pulsar, combined with Splunk's existing expertise 
in Apache Flink and Apache Kafka will result in the world's best real-time 
stream processing solution.
...
Splunk intends to continue to maintain Apache Pulsar and other projects through 
our acquisition of Streamlio. We're eager to find new ways to support the 
Apache Software Foundation, and the Pulsar project.
```

<https://www.splunk.com/blog/2019/10/21/splunk-to-expand-streaming-expertise-announces-intent-to-acquire-streamlio-open-source-distributed-messaging-leader.html>
----
2019-10-22 17:41:50 UTC - Endre Karlson: @Sijie Guo ^??
----
2019-10-22 17:42:25 UTC - Matteo Merli: Splunk has committed to ensuring the 
ongoing growth and success of Apache Pulsar through contributions and 
continuing support of the open source community (see Splunk blog).
----
2019-10-22 17:47:36 UTC - Sergey Zhemzhitsky: Well, Splunk will be fully 
committed in case it decides to replace its internal dataflows going through 
Kafka with Pulsar )
----
2019-10-22 17:59:11 UTC - Retardust: and
pulsar_rate_in : 80msg/s
but
rate(pulsar_storage_backlog_size[1m]) for same topic is
3 000 000 msg/s

what?))
----
2019-10-22 17:59:19 UTC - Matteo Merli: For now, we can only comment that 
Splunk plans to use Apache Pulsar in a number of its internal services and 
products.

I’m the least worried about Pulsar’s destiny :slightly_smiling_face: We’ve been 
working on maturing the technology for many years now and we’ll continue on the 
same path, to take it to the next level.

At the same time you might have noticed that the community has considerably 
expanded, with many companies invested in it for critical systems and 
contributing back.
----
2019-10-22 18:07:55 UTC - Retardust: pulsar_storage_backlog_size seems to be in 
bytes, not messages
?
documentation says it's messages
----
2019-10-22 18:46:47 UTC - Vladimir Shchur: Can you please comment regarding 
streamlio cloud? Is it discontinued?
----
2019-10-22 18:50:43 UTC - David Kjerrumgaard: @Vladimir Shchur The free trial 
period for Streamlio cloud has concluded. Any existing trials will continue 
until they conclude, but we will not be accepting new trial applications at 
this time.
----
2019-10-22 18:55:01 UTC - Vladimir Shchur: @David Kjerrumgaard what about 
non-trial offering? We've regarded Streamlio as Pulsar as a service platform on 
AWS and planned to have some business with it, is it gone?
----
2019-10-22 18:59:13 UTC - Chris Bartholomew: FYI, we offer Pulsar as a service 
on AWS and GCP. Azure coming soon. <https://kafkaesque.io/>
----
2019-10-22 19:02:36 UTC - Sijie Guo: @Sergey Zhemzhitsky Apache Pulsar is a 
100% open source project, hosted at the vendor-independent Apache Software 
Foundation. PMC is the group of people who lead the direction and development 
of Pulsar. Pulsar PMC members come from many different companies: Yahoo, Yahoo! 
JAPAN, Zhaopin, etc. It will not fall apart due to one vendor acquisition. That’s 
also the whole point of running Pulsar in the Apache way.  Community lives much 
much longer than vendors.

Also the Pulsar community is growing really fast. Many large companies have 
already invested heavily in using Pulsar in their mission critical services. 
For example, Tencent (one of the largest and most valuable internet companies) 
has adopted and run Pulsar at a very large scale 
(<https://streamnative.io/blog/tech/2019-10-22-powering-tencent-billing-platform-with-apache-pulsar/>).
 It uses Pulsar to power its billing platform for processing tens of billions 
of transactions every day for its total escrowed accounts of 30 billion 
dollars.

We, StreamNative, a company also founded by a group of Pulsar/BookKeeper PMC 
members, will continue our commitments to provide commercial support for Pulsar 
and work with the broader Pulsar community including Splunk to push the project 
to the next level. We are really positive about the project and the community, 
as we have helped a lot of companies adopt Pulsar and have seen a fast-growing 
adoption pace. It just takes some time for those adopters to share their 
stories publicly. We have published some of the success stories 
(<https://streamnative.io/success-stories/>) and more will be coming.

Hope this gives you some ideas.
heart_eyes : Pierre Zemb
----
2019-10-22 19:06:34 UTC - Vladimir Shchur: Thank you! I've evaluated your 
project as well, but looks like pulsar functions support is missing, which is 
crucial for us
----
2019-10-22 19:07:45 UTC - Chris Bartholomew: Pulsar functions is definitely on 
the roadmap.
----
2019-10-22 19:08:09 UTC - Vladimir Shchur: Any time commitments available?
----
2019-10-22 19:09:23 UTC - Chris Bartholomew: Definitely before the end of Dec, 
if not earlier.
----
2019-10-22 19:09:55 UTC - Vladimir Shchur: Thank you, will keep that in mind
----
2019-10-22 19:10:46 UTC - Chris Bartholomew: We will be supporting all Pulsar 
core features (Functions, IO Connectors,  Schema registry).
----
2019-10-22 19:12:02 UTC - Luke Lu: Congrats on closing the seed round 
:slightly_smiling_face:
----
2019-10-22 22:46:58 UTC - Nicolas Ha: The same Kubernetes issue as last time has 
popped up again: 2 of the 3 brokers are in `CrashLoopBackOff` state. Here 
is what I see in the logs:
```
22:41:56.520 [main] ERROR org.apache.pulsar.PulsarBrokerStarter - Failed to start pulsar service.
org.apache.pulsar.broker.PulsarServerException: java.lang.RuntimeException: org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:472) ~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
        at org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273) ~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
        at org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:332) [org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
Caused by: java.lang.RuntimeException: org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger
        at org.apache.pulsar.functions.worker.WorkerService.start(WorkerService.java:206) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
        at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1046) ~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:459) ~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
        ... 2 more
Caused by: org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger
        at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:271) ~[org.apache.pulsar-pulsar-client-api-2.4.1.jar:2.4.1]
        at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:88) ~[org.apache.pulsar-pulsar-client-original-2.4.1.jar:2.4.1]
        at org.apache.pulsar.functions.worker.FunctionMetaDataManager.getServiceRequestManager(FunctionMetaDataManager.java:484) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
        at org.apache.pulsar.functions.worker.FunctionMetaDataManager.<init>(FunctionMetaDataManager.java:74) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
        at org.apache.pulsar.functions.worker.WorkerService.start(WorkerService.java:156) ~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
        at org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1046) ~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:459) ~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
        ... 2 more
```
Not being a kubernetes expert, what would you check first?
Note, there is plenty of disk
----
2019-10-22 22:48:58 UTC - Nicolas Ha: kubectl describe deployments.apps broker
```
Name:                   broker
Namespace:              default
CreationTimestamp:      Mon, 02 Sep 2019 01:54:28 +0100
Labels:                 app=pulsar
                        component=broker
Annotations:            deployment.kubernetes.io/revision: 2
                        kubectl.kubernetes.io/last-applied-configuration:
                          {"apiVersion":"apps/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"broker","namespace":"default"},"spec":{"replicas":3,...
Selector:               app=pulsar,component=broker
Replicas:               3 desired | 3 updated | 3 total | 1 available | 2 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       app=pulsar
                component=broker
  Annotations:  prometheus.io/port: 8080
                prometheus.io/scrape: true
  Containers:
   broker:
    Image:       apachepulsar/pulsar-all:2.4.1
    Ports:       8080/TCP, 6650/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      sh
      -c
    Args:
      bin/apply-config-from-env.py conf/broker.conf && bin/apply-config-from-env.py conf/pulsar_env.sh && bin/gen-yml-from-env.py conf/functions_worker.yml && bin/pulsar broker

    Limits:
      memory:  2Gi
    Requests:
      memory:  2Gi
    Environment Variables from:
      broker-config  ConfigMap  Optional: false
    Environment:
      advertisedAddress:   (v1:status.podIP)
    Mounts:               <none>
  Volumes:                <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      False   MinimumReplicasUnavailable
OldReplicaSets:  <none>
NewReplicaSet:   broker-84678846d6 (3/3 replicas created)
Events:          <none>
```
----
2019-10-22 22:58:08 UTC - Ambud Sharma: congratulations @Matteo Merli
----
2019-10-22 23:10:00 UTC - Matteo Merli: thanks @Ambud Sharma
----
2019-10-23 03:39:50 UTC - xiaolong.ran: Hello, is there any log information 
about this error in the broker?
----
2019-10-23 05:39:46 UTC - Sijie Guo: it seems that it failed to recover a ledger
----
2019-10-23 05:40:02 UTC - Sijie Guo: did you replace any disk or erase disks 
before?
----
2019-10-23 05:55:25 UTC - Sijie Guo: 
<https://medium.com/streamnative/how-to-use-apache-pulsar-manager-with-herddb-dd265c955ca4>
----
2019-10-23 05:56:29 UTC - Sijie Guo: The new blog post from @Enrico Olivelli 
about using HerdDB in Pulsar Manager.
+1 : Retardust
----
2019-10-23 06:31:01 UTC - Retardust: Also, there is no info about the 
pulsar_msg_backlog metric in the documentation; I will open a PR.
----
2019-10-23 07:58:52 UTC - Nicolas Ha: Nothing between functioning fine and the 
errors, which is why I am puzzled
----
2019-10-23 07:59:56 UTC - Nicolas Ha: Although I went from 5 nodes to 3 nodes. 
Could that be it?
----
2019-10-23 08:00:21 UTC - Nicolas Ha: And more importantly, if that's the case 
how can I recover?
----
