2019-10-22 10:51:26 UTC - sunliuchang: @sunliuchang has joined the channel
----
2019-10-22 12:17:25 UTC - Retardust: Consumer gets stuck after a restart, and only
after restarting both Pulsar and the consumer does it continue to consume the backlog.
Any ideas? Nothing in the logs
----
2019-10-22 12:37:34 UTC - Sijie Guo: Can you get the topic stats using
`pulsar-admin topics stats`?
----
2019-10-22 12:41:53 UTC - Alexandre DUVAL: Hi, I have an issue with a function:
it starts well, and after consuming ~3xxx messages it gets stuck. No error, the
function is considered running, and the stuck message is never the same. Each time I
restart the function, it again processes ~3xxx messages and then gets stuck. There is
RAM available. It gets stuck on context.publish(). Do you have an idea?
----
2019-10-22 12:44:10 UTC - Retardust: ```
{
  "msgRateIn" : 0.0,
  "msgThroughputIn" : 0.0,
  "msgRateOut" : 10.00003482795463,
  "msgThroughputOut" : 3106588.20290695,
  "averageMsgSize" : 0.0,
  "storageSize" : 10668635541,
  "publishers" : [ ],
  "subscriptions" : {
    "skdf4k" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 383636,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "msgRateExpired" : 0.0,
      "consumers" : [ ],
      "isReplicated" : false
    },
    "journal_consumer" : {
      "msgRateOut" : 10.00003482795463,
      "msgThroughputOut" : 3106588.20290695,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 383736,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 0,
      "type" : "Failover",
      "activeConsumerName" : "91149",
      "msgRateExpired" : 0.0,
      "consumers" : [ {
        "msgRateOut" : 10.00003482795463,
        "msgThroughputOut" : 3106588.20290695,
        "msgRateRedeliver" : 0.0,
        "consumerName" : "91149",
        "availablePermits" : 0,
        "unackedMessages" : 0,
        "blockedConsumerOnUnackedMsgs" : false,
        "metadata" : { },
        "connectedSince" : "2019-10-22T12:14:02.351Z",
        "clientVersion" : "2.4.1",
        "address" : "/172.28.117.8:60366"
      } ],
      "isReplicated" : true
    }
  },
  "replication" : { },
  "deduplicationStatus" : "Disabled"
}
```
seems ok
----
2019-10-22 12:44:41 UTC - Retardust: but it's after restart
----
2019-10-22 12:45:06 UTC - Retardust: I will try to get the stats while the problem is occurring
----
2019-10-22 12:48:29 UTC - Raph: @Raph has joined the channel
----
2019-10-22 12:57:03 UTC - Alexandre DUVAL:
----
2019-10-22 13:08:23 UTC - Sijie Guo: OK :ok_hand:
----
2019-10-22 13:23:41 UTC - Alexandre DUVAL: I bumped my function worker from
2.4.0 to 2.4.1 and now get this error:
----
2019-10-22 13:23:43 UTC - Alexandre DUVAL: ```13:10:51.038
[clevercloud/functions/accessLogsCleverCloudADCHaproxy-0] ERROR
org.apache.pulsar.functions.instance.JavaInstanceRunnable -
[clevercloud/functions/accessLogsCleverCloudADCHaproxy:0] Uncaught exception in
Java Instance
java.lang.RuntimeException: User class constructor throws exception
at
org.apache.pulsar.functions.utils.Reflections.createInstance(Reflections.java:126)
~[org.apache.pulsar-pulsar-functions-utils-2.4.1.jar:2.4.1]
at
org.apache.pulsar.functions.instance.JavaInstanceRunnable.setupJavaInstance(JavaInstanceRunnable.java:189)
~[org.apache.pulsar-pulsar-functions-instance-2.4.1.jar:?]
at
org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:234)
[org.apache.pulsar-pulsar-functions-instance-2.4.1.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method) ~[?:1.8.0_192]
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
~[?:1.8.0_192]
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
~[?:1.8.0_192]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
~[?:1.8.0_192]
at
org.apache.pulsar.functions.utils.Reflections.createInstance(Reflections.java:118)
~[org.apache.pulsar-pulsar-functions-utils-2.4.1.jar:2.4.1]
... 3 more
Caused by: java.lang.LinkageError: ClassCastException: attempting to cast
jar:file:/pulsar/lib/javax.ws.rs-javax.ws.rs-api-2.1.jar!/javax/ws/rs/client/ClientBuilder.class
to
file:/tmp/pulsar-nar/pulsar-functions-0.1.0-SNAPSHOT.jar-unpacked/javax/ws/rs/client/ClientBuilder.class
at javax.ws.rs.client.ClientBuilder.newBuilder(ClientBuilder.java:81)
~[javax.ws.rs-javax.ws.rs-api-2.1.jar:2.1]
at javax.ws.rs.client.ClientBuilder.newClient(ClientBuilder.java:97)
~[javax.ws.rs-javax.ws.rs-api-2.1.jar:2.1]
at
com.clevercloud.pulsar.util.GeoIPAPI.updateDatabase(GeoIPAPI.java:101) ~[?:?]
at com.clevercloud.pulsar.util.GeoIPAPI.<init>(GeoIPAPI.java:45)
~[?:?]
at
com.clevercloud.pulsar.function.ApplicationsAddonsHaproxyAccessLogs.<init>(ApplicationsAddonsHaproxyAccessLogs.java:28)
~[?:?]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method) ~[?:1.8.0_192]
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
~[?:1.8.0_192]
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
~[?:1.8.0_192]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
~[?:1.8.0_192]
at
org.apache.pulsar.functions.utils.Reflections.createInstance(Reflections.java:118)
~[org.apache.pulsar-pulsar-functions-utils-2.4.1.jar:2.4.1]
... 3 more
13:10:51.047 [clevercloud/functions/accessLogsCleverCloudADCHaproxy-0] INFO
org.apache.pulsar.functions.instance.JavaInstanceRunnable - Closing instance```
----
2019-10-22 14:12:24 UTC - Tim Howard: ASF Jenkins still sideways? It looks like
it from the build statuses...
----
2019-10-22 14:18:17 UTC - Matteo Merli: We’re still working and getting closer
to a solution
----
2019-10-22 14:18:51 UTC - Tim Howard: thanks for the update
----
2019-10-22 14:49:12 UTC - dbartz: @dbartz has joined the channel
----
2019-10-22 15:04:53 UTC - Alexandre DUVAL: @Matteo Merli After bumping to 2.4.1
it's not shaded anymore?
----
2019-10-22 15:05:43 UTC - Matteo Merli: No, the change was meant for 2.5, though
it got backported to 2.4.1 as well
----
2019-10-22 15:06:26 UTC - Matteo Merli: the function framework is not shaded
anymore; rather, it's using different classloaders for framework and user code
----
2019-10-22 15:16:28 UTC - Alexandre DUVAL: So how should I use this?
----
2019-10-22 15:19:05 UTC - Alexandre DUVAL: Do you have example?
----
2019-10-22 15:33:19 UTC - Retardust: ```
public class Bridge implements MessageListener<byte[]> {

    private final Producer<JournalBatch> batchProducer;
    private final JournalBatchParser parser;

    @Override
    @SneakyThrows
    public void received(Consumer<byte[]> consumer, Message<byte[]> msg) {
        JournalBatch batch = parse(msg);
        batchProducer.sendAsync(batch)
                .thenAccept(messageId -> ack(consumer, msg));
    }

    private JournalBatch parse(Message<byte[]> msg) {
        return parser.parse(msg.getData());
    }

    @SneakyThrows
    private void ack(Consumer<byte[]> consumer, Message<byte[]> msg) {
        consumer.acknowledgeCumulativeAsync(msg)
                .thenAccept(d -> log.debug("Message ack"));
    }
}
```
Is that ok for connecting two topics while preserving order and at-least-once
guarantees?
Which settings should I pay attention to?
----
2019-10-22 15:35:48 UTC - Matteo Merli: make sure to set
`blockIfQueueFull(true)` when creating the `batchProducer`
heavy_check_mark : Retardust
----
2019-10-22 15:36:27 UTC - Matteo Merli: to get backpressure (instead of error)
when publishing on the downstream topic
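For illustration, a minimal sketch of setting that flag when building the producer (the service URL, topic name, and JSON schema choice are placeholders, not from this thread):
```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")          // placeholder URL
        .build();

// Block sendAsync() while the pending-message queue is full,
// instead of failing the send, so the bridge gets backpressure:
Producer<JournalBatch> batchProducer = client
        .newProducer(Schema.JSON(JournalBatch.class))   // schema is an assumption
        .topic("persistent://t1/n1/journal-batches")    // placeholder topic
        .blockIfQueueFull(true)
        .create();
```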
----
2019-10-22 15:39:11 UTC - Matteo Merli: also, you'd need to handle send
failures. There are 2 possible ways:
1. `sendTimeout` set to 0, to have the producer retry forever
2. Negative ack on publish error:
```
batchProducer.sendAsync(batch)
.thenAccept(messageId -> ack(consumer, msg))
.exceptionally(ex -> {
consumer.negativeAck(msg);
return null;
});
```
----
2019-10-22 15:45:14 UTC - Retardust: ok
----
2019-10-22 15:48:06 UTC - Retardust: But for throughput and latency, does everything
seem ok?
It's not fast right now :( I have only one consumer, because I need to parse a
single ordered stream without partitioning.
I see 200 Mbit/s throughput,
20% CPU usage,
and not a lot of GC pauses.
Where could the bottleneck be?
Should I check settings like direct buffers, for example?
----
2019-10-22 15:50:08 UTC - Retardust: default overrides
Producer:
batching enabled, 50 ms window, up to 500 messages
maxPendingMessages: 1000
LZ4 compression
Consumer:
receiver queue size: 1000
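For reference, those overrides expressed as client-builder calls (a sketch: the `client` instance, schema, and downstream topic name are assumptions; the upstream topic and subscription name are taken from the stats earlier in the thread):
```java
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.CompressionType;
import org.apache.pulsar.client.api.SubscriptionType;

Producer<JournalBatch> producer = client
        .newProducer(Schema.JSON(JournalBatch.class))
        .topic("persistent://t1/n1/journal-batches")        // placeholder downstream topic
        .enableBatching(true)
        .batchingMaxPublishDelay(50, TimeUnit.MILLISECONDS) // 50 ms batching window
        .batchingMaxMessages(500)                           // up to 500 messages per batch
        .maxPendingMessages(1000)
        .compressionType(CompressionType.LZ4)
        .create();

Consumer<byte[]> consumer = client
        .newConsumer()
        .topic("persistent://t1/n1/queue_journal")          // upstream topic from the stats
        .subscriptionName("journal_consumer")
        .subscriptionType(SubscriptionType.Failover)        // matches "type" : "Failover" in the stats
        .receiverQueueSize(1000)
        .subscribe();
```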
----
2019-10-22 15:50:33 UTC - Retardust: messages are somewhere between 5 KB and 1 MB
----
2019-10-22 15:50:52 UTC - Matteo Merli: are the messages batched in the
upstream topic?
----
2019-10-22 15:51:19 UTC - Matteo Merli: also, check the topic stats for the
upstream topic
----
2019-10-22 15:51:35 UTC - Matteo Merli: `pulsar-admin topics stats $TOPIC`
----
2019-10-22 15:52:31 UTC - Matteo Merli: and check for:
```
"availablePermits" : 766, // Number of flow-control permits that Pulsar
                          // currently has from a consumer. When > 0, it
                          // means Pulsar can push more messages. When it's
                          // <= 0, the broker will pause the delivery to
                          // adjust to the consumer processing speed
```
----
2019-10-22 15:53:23 UTC - Retardust: availablePermits = 1000 in stats at least
----
2019-10-22 15:54:03 UTC - Matteo Merli: when traffic is ongoing?
----
2019-10-22 15:54:11 UTC - Matteo Merli: then consumer is fast enough
----
2019-10-22 15:54:53 UTC - Retardust: Is there a Prometheus metric to check this?
I don't see one
----
2019-10-22 15:57:55 UTC - Matteo Merli: no, it’s not reported on Prometheus
----
2019-10-22 16:08:41 UTC - Retardust: "availablePermits" : 600,
under load.
But there is a huge lag (I reset the offset and am waiting for it to reprocess).
CPU is ok, GC is ok :) network is ok :)
but the rate is still low
```
"msgRateOut" : 4.999984407631958,
"msgThroughputOut" : 2559363.268668303,
```
----
2019-10-22 16:24:56 UTC - Retardust: And what stats should I check on the
upstream topic? The permits are on the upstream topic's consumer,
and the rates are from the upstream topic too
----
2019-10-22 16:27:34 UTC - Retardust: downstream topic stats are weird
```
{
"msgRateIn" : 0.0,
"msgThroughputIn" : 0.0,
"msgRateOut" : 0.0,
"msgThroughputOut" : 0.0,
"averageMsgSize" : 0.0,
"storageSize" : 64107851814,
"publishers" : [ {
"msgRateIn" : 0.0,
"msgThroughputIn" : 0.0,
"averageMsgSize" : 0.0,
"producerId" : 0,
"metadata" : { },
"producerName" : "kappa-1295-20",
"connectedSince" : "2019-10-22T15:44:42.119Z",
"clientVersion" : "2.4.1",
"address" : "/172.28.117.36:58302"
} ],
```
----
2019-10-22 16:39:03 UTC - Retardust: ```
786837 2019-10-22 18:57:38,992 INFO [ pulsar-client-io-1-1 ] o.a.p.c.i.ClientCnx | [corpint5.moscow.alfaintra.net/172.28.117.19:9022] Broker notification of Closed consumer: 0
786838 2019-10-22 18:57:38,993 INFO [ pulsar-client-io-1-1 ] o.a.p.c.i.ConnectionHandler | [persistent://t1/n1/queue_journal] [journal_consumer] Closed connection [id: 0x6bb31ad1, L:/172.17.0.50:58302 - R:corpint5.moscow.alfaintra.net/172.28.117.19:9022] -- Will try again in 0.1 s
786940 2019-10-22 18:57:39,095 INFO [ pulsar-timer-6-1 ] o.a.p.c.i.ConnectionHandler | [persistent://t1/n1/queue_journal] [journal_consumer] Reconnecting after timeout
787007 2019-10-22 18:57:39,162 INFO [ pulsar-client-io-1-1 ] o.a.p.c.i.ConsumerImpl | [persistent://t1/n1/queue_journal][journal_consumer] Subscribing to topic on cnx [id: 0x6bb31ad1, L:/172.17.0.50:58302 - R:corpint5.moscow.alfaintra.net/172.28.117.19:9022]
787010 2019-10-22 18:57:39,165 INFO [ pulsar-client-io-1-1 ] o.a.p.c.i.ConsumerImpl | [persistent://t1/n1/queue_journal][journal_consumer] Subscribed to topic on corpint5.moscow.alfaintra.net/172.28.117.19:9022 -- consumer: 0
1170871 2019-10-22 19:04:03,026 INFO [ pulsar-client-io-1-1 ] o.a.p.c.i.ClientCnx | [corpint5.moscow.alfaintra.net/172.28.117.19:9022] Broker notification of Closed consumer: 0
1170872 2019-10-22 19:04:03,027 INFO [ pulsar-client-io-1-1 ] o.a.p.c.i.ConnectionHandler | [persistent://t1/n1/queue_journal] [journal_consumer] Closed connection [id: 0x6bb31ad1, L:/172.17.0.50:58302 - R:corpint5.moscow.alfaintra.net/172.28.117.19:9022] -- Will try again in 0.1 s
1170974 2019-10-22 19:04:03,129 INFO [ pulsar-timer-6-1 ] o.a.p.c.i.ConnectionHandler | [persistent://t1/n1/queue_journal] [journal_consumer] Reconnecting after timeout
1171101 2019-10-22 19:04:03,256 INFO [ pulsar-client-io-1-1 ] o.a.p.c.i.ConsumerImpl | [persistent://t1/n1/queue_journal][journal_consumer] Subscribing to topic on cnx [id: 0x6bb31ad1, L:/172.17.0.50:58302 - R:corpint5.moscow.alfaintra.net/172.28.117.19:9022]
1171104 2019-10-22 19:04:03,259 INFO [ pulsar-client-io-1-1 ] o.a.p.c.i.ConsumerImpl | [persistent://t1/n1/queue_journal][journal_consumer] Subscribed to topic on corpint5.moscow.alfaintra.net/172.28.117.19:9022 -- consumer: 0
```
what could be the reason?
----
2019-10-22 16:50:55 UTC - Alexandre DUVAL: @xiaolong.ran hi, @Sijie Guo told me
to tag you on this. It's about a function stuck on context.publish() after processing
~3000 messages. Same after multiple restarts.
----
2019-10-22 16:51:12 UTC - Alexandre DUVAL: This function was running for a few
weeks and today got this.
----
2019-10-22 17:36:49 UTC - Retardust: I'm wondering why
msgRateIn per topic is 3 MB
but publish throughput for the producer is 12.87 msg/s --- 816.96 Mbit/s
:slightly_smiling_face:
----
2019-10-22 17:37:32 UTC - Sergey Zhemzhitsky: What do you guys think about
recent announcement of Streamlio acquisition by Splunk?
Splunk has already had Kafka and Flink internally, so I’m worried about
Pulsar’s destiny.
```
Streamlio's experience with Pulsar, combined with Splunk's existing expertise
in Apache Flink and Apache Kafka will result in the world's best real-time
stream processing solution.
...
Splunk intends to continue to maintain Apache Pulsar and other projects through
our acquisition of Streamlio. We're eager to find new ways to support the
Apache Software Foundation, and the Pulsar project.
```
<https://www.splunk.com/blog/2019/10/21/splunk-to-expand-streaming-expertise-announces-intent-to-acquire-streamlio-open-source-distributed-messaging-leader.html>
----
2019-10-22 17:41:50 UTC - Endre Karlson: @Sijie Guo ^??
----
2019-10-22 17:42:25 UTC - Matteo Merli: Splunk has committed to ensuring the
ongoing growth and success of Apache Pulsar through contributions and
continuing support of the open source community (see Splunk blog).
----
2019-10-22 17:47:36 UTC - Sergey Zhemzhitsky: Well, Splunk will be fully
committed in case it decides to replace its internal dataflows going through
Kafka with Pulsar )
----
2019-10-22 17:59:11 UTC - Retardust: and
pulsar_rate_in : 80msg/s
but
rate(pulsar_storage_backlog_size[1m]) for same topic is
3 000 000 msg/s
what?))
----
2019-10-22 17:59:19 UTC - Matteo Merli: For now, we can only comment that
Splunk plans to use Apache Pulsar in a number of its internal services and
products.
I’m the least worried about Pulsar’s destiny :slightly_smiling_face: We’ve been
working on maturing the technology for many years now and we’ll continue on the
same path, to take it to the next level.
At the same time, you might have noticed that the community has considerably
expanded, with many companies invested in it for critical systems and
contributing back.
----
2019-10-22 18:07:55 UTC - Retardust: pulsar_storage_backlog_size seems to be in
bytes, not messages?
The documentation says it's in messages
----
2019-10-22 18:46:47 UTC - Vladimir Shchur: Can you please comment regarding
streamlio cloud? Is it discontinued?
----
2019-10-22 18:50:43 UTC - David Kjerrumgaard: @Vladimir Shchur The free trial
period for Streamlio cloud has concluded. Any existing trials will continue
until they conclude, but we will not be accepting new trial applications at
this time.
----
2019-10-22 18:55:01 UTC - Vladimir Shchur: @David Kjerrumgaard what about the
non-trial offering? We've regarded Streamlio as a Pulsar-as-a-service platform on
AWS and planned to do some business with it; is it gone?
----
2019-10-22 18:59:13 UTC - Chris Bartholomew: FYI, we offer Pulsar as a service
on AWS and GCP. Azure coming soon. <https://kafkaesque.io/>
----
2019-10-22 19:02:36 UTC - Sijie Guo: @Sergey Zhemzhitsky Apache Pulsar is a
100% open source project, hosted at the vendor-independent Apache Software
Foundation. PMC is the group of people who lead the direction and development
of Pulsar. The Pulsar PMC comes from many different companies: Yahoo, Yahoo! JAPAN,
Zhaopin, etc. It will not fall apart due to one vendor acquisition. That’s
also the whole point of running Pulsar in the Apache way. Community lives much
much longer than vendors.
Also the Pulsar community is growing really fast. Many large companies have
already invested heavily in using Pulsar in their mission critical services.
For example, Tencent (one of the largest and most valuable internet companies)
has adopted and run Pulsar at a very large scale
(<https://streamnative.io/blog/tech/2019-10-22-powering-tencent-billing-platform-with-apache-pulsar/>).
It uses Pulsar to power its billing platform for processing tens of billions
of transactions every day for its total escrowed accounts of 30 billion
dollars.
We, StreamNative, a company also founded by a group of Pulsar/BookKeeper PMC
members, will continue our commitments to provide commercial support for Pulsar
and work with the broader Pulsar community including Splunk to push the project
to next level. We are really positive about the project and the community, as
we have helped a lot of companies adopt Pulsar and have seen a fast-growing
adoption pace. It just takes some time for those adopters to share their
stories publicly. We have published some of the success stories
(<https://streamnative.io/success-stories/>) and more will be coming.
Hope this gives you some ideas.
heart_eyes : Pierre Zemb
----
2019-10-22 19:06:34 UTC - Vladimir Shchur: Thank you! I've evaluated your
project as well, but it looks like Pulsar Functions support is missing, which is
crucial for us
----
2019-10-22 19:07:45 UTC - Chris Bartholomew: Pulsar functions is definitely on
the roadmap.
----
2019-10-22 19:08:09 UTC - Vladimir Shchur: Any time commitments available?
----
2019-10-22 19:09:23 UTC - Chris Bartholomew: Definitely before the end of Dec,
if not earlier.
----
2019-10-22 19:09:55 UTC - Vladimir Shchur: Thank you, will keep that in mind
----
2019-10-22 19:10:46 UTC - Chris Bartholomew: We will be supporting all Pulsar
core features (Functions, IO Connectors, Schema registry).
----
2019-10-22 19:12:02 UTC - Luke Lu: Congrats on closing the seed round
:slightly_smiling_face:
----
2019-10-22 22:46:58 UTC - Nicolas Ha: I am having the same Kubernetes issue as
last time pop up again: 2 of the 3 brokers are in `CrashLoopBackOff` state. Here
is what I see in the logs
```
22:41:56.520 [main] ERROR org.apache.pulsar.PulsarBrokerStarter - Failed to
start pulsar service.
org.apache.pulsar.broker.PulsarServerException: java.lang.RuntimeException:
org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger
at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:472)
~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
at
org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.start(PulsarBrokerStarter.java:273)
~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
at
org.apache.pulsar.PulsarBrokerStarter.main(PulsarBrokerStarter.java:332)
[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
Caused by: java.lang.RuntimeException:
org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger
at
org.apache.pulsar.functions.worker.WorkerService.start(WorkerService.java:206)
~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
at
org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1046)
~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:459)
~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
... 2 more
Caused by:
org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger
at
org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:271)
~[org.apache.pulsar-pulsar-client-api-2.4.1.jar:2.4.1]
at
org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:88)
~[org.apache.pulsar-pulsar-client-original-2.4.1.jar:2.4.1]
at
org.apache.pulsar.functions.worker.FunctionMetaDataManager.getServiceRequestManager(FunctionMetaDataManager.java:484)
~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
at
org.apache.pulsar.functions.worker.FunctionMetaDataManager.<init>(FunctionMetaDataManager.java:74)
~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
at
org.apache.pulsar.functions.worker.WorkerService.start(WorkerService.java:156)
~[org.apache.pulsar-pulsar-functions-worker-2.4.1.jar:2.4.1]
at
org.apache.pulsar.broker.PulsarService.startWorkerService(PulsarService.java:1046)
~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:459)
~[org.apache.pulsar-pulsar-broker-2.4.1.jar:2.4.1]
... 2 more
```
Not being a Kubernetes expert, what would you check first?
Note: there is plenty of disk space
----
2019-10-22 22:48:58 UTC - Nicolas Ha: kubectl describe deployments.apps broker
```
Name: broker
Namespace: default
CreationTimestamp: Mon, 02 Sep 2019 01:54:28 +0100
Labels: app=pulsar
component=broker
Annotations: deployment.kubernetes.io/revision: 2
  kubectl.kubernetes.io/last-applied-configuration:
  {"apiVersion":"apps/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"broker","namespace":"default"},"spec":{"replicas":3,...
Selector: app=pulsar,component=broker
Replicas: 3 desired | 3 updated | 3 total | 1 available | 2
unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=pulsar
component=broker
Annotations: prometheus.io/port: 8080
             prometheus.io/scrape: true
Containers:
broker:
Image: apachepulsar/pulsar-all:2.4.1
Ports: 8080/TCP, 6650/TCP
Host Ports: 0/TCP, 0/TCP
Command:
sh
-c
Args:
bin/apply-config-from-env.py conf/broker.conf &&
bin/apply-config-from-env.py conf/pulsar_env.sh &&
bin/gen-yml-from-env.py conf/functions_worker.yml && bin/pulsar broker
Limits:
memory: 2Gi
Requests:
memory: 2Gi
Environment Variables from:
broker-config ConfigMap Optional: false
Environment:
advertisedAddress: (v1:status.podIP)
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available False MinimumReplicasUnavailable
OldReplicaSets: <none>
NewReplicaSet: broker-84678846d6 (3/3 replicas created)
Events: <none>
```
----
2019-10-22 22:58:08 UTC - Ambud Sharma: congratulations @Matteo Merli
----
2019-10-22 23:10:00 UTC - Matteo Merli: thanks @Ambud Sharma
----
2019-10-23 03:39:50 UTC - xiaolong.ran: Hello, is there any log information
about this error in the broker?
----
2019-10-23 05:39:46 UTC - Sijie Guo: it seems that it failed to recover a ledger
----
2019-10-23 05:40:02 UTC - Sijie Guo: did you replace any disk or erase disks
before?
----
2019-10-23 05:55:25 UTC - Sijie Guo:
<https://medium.com/streamnative/how-to-use-apache-pulsar-manager-with-herddb-dd265c955ca4>
----
2019-10-23 05:56:29 UTC - Sijie Guo: The new blog post from @Enrico Olivelli
about using HerdDB in Pulsar Manager.
+1 : Retardust
----
2019-10-23 06:31:01 UTC - Retardust: And there is no info about the
pulsar_msg_backlog metric in the documentation; I will open a PR
----
2019-10-23 07:58:52 UTC - Nicolas Ha: Nothing between functioning fine and the
errors, which is why I am puzzled
----
2019-10-23 07:59:56 UTC - Nicolas Ha: Although I went from 5 nodes to 3 nodes.
Could that be it?
----
2019-10-23 08:00:21 UTC - Nicolas Ha: And more importantly, if that's the case
how can I recover?
----