2020-03-03 10:01:00 UTC - Kannan: This is because the broker advertises the k8s
service name (which is pulsaristio-broker) and the broker znode session gets
constructed using this service name. So, with 1 broker it works, but when
another broker is added it uses the same service name and hence fails with the
error "broker-znode owned by different session."
----
2020-03-03 10:21:20 UTC - eilonk: Has anyone here deployed pulsar with helm
using tls authentication?
While the chart allows you to create a tls secret for the admin certs, what
happens when I want to use different ones for the consumers? For the brokers?
(In a different type of deployment I would give the file path to these
certificates, but that's not how helm works.)
----
2020-03-03 13:13:03 UTC - Santiago Del Campo: It was a problem with K8s and
Linux host networking: for some reason the bookkeeper pod is accessible from
outside, but the pod can't access itself... as far as I know, it seems to be a
bug with k8s and iptables.
----
2020-03-03 13:20:37 UTC - Greg: Hello there, we are currently facing an issue
with pulsar 2.5.0 (it was working fine in 2.4.2). At some point we are no longer
able to subscribe to a given topic, with this error sent by the server:
[<non-persistent://public/default/cluster][Proxy-vm-k8s-worker-infinity-2]>
Failed to create consumer: Topic does not have schema to check
----
2020-03-03 13:21:56 UTC - Greg: We suspect the topic is being cleaned up at
some point because of inactivity, and when the client automatically reconnects,
the recreated topic no longer has a schema linked to it
----
2020-03-03 13:23:32 UTC - Greg: Any hint on how to work around this, or what
information is needed to pinpoint the issue?
----
2020-03-03 13:26:15 UTC - Greg: I just realized we are using the 2.4.2 Java
client on top of the 2.5.0 server version of Pulsar; I will try with the 2.5.0
Java client
----
2020-03-03 13:32:11 UTC - Ryan Slominski: Thanks for pointing out the
getCurrentRecord method. I missed that as I thought the input of the function
was always passed in as a parameter in the method signature (in all the
examples I've seen it's a String named "input"). I guess the input parameter
appears in two places then?
----
2020-03-03 13:34:15 UTC - Penghui Li: Have you deleted the schema of this topic
before?
----
2020-03-03 13:34:56 UTC - Antti Kaikkonen:
<https://pulsar.apache.org/docs/en/functions-develop/#available-apis>
You have to use the Pulsar Functions SDK instead of the language-native function
interface to get access to the context object.
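For illustration, a minimal Java sketch of an SDK-style function that uses the
context object (the class name is hypothetical; the input still arrives as a
method parameter alongside the context):
```import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;
import org.apache.pulsar.functions.api.Record;

// SDK-style function: implements the Pulsar Functions Function interface,
// so process() receives both the input and a Context.
public class EchoFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        // The current record is also reachable through the context.
        Record<?> record = context.getCurrentRecord();
        context.getLogger().info("Got '{}' from topic {}",
                input, record.getTopicName().orElse("unknown"));
        return input;
    }
}```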
----
2020-03-03 13:38:32 UTC - Greg: no, this topic is created by the first
publisher with a schema (STRING)
----
2020-03-03 13:39:15 UTC - Greg: and at some point this topic is removed
automatically
----
2020-03-03 13:39:22 UTC - Ryan Slominski: Yeah, it looks like you can get the
input from the method signature AND from the context object. In other words,
the SDK version didn't really need to include the input as a method parameter -
all you need is the context object.
+1 : Antti Kaikkonen
----
2020-03-03 13:39:41 UTC - Greg: and we cannot reconnect to it manually because
the schema is not available
----
2020-03-03 13:40:24 UTC - Greg: I think the topic is recreated automatically by
the Java client, right?
----
2020-03-03 13:40:26 UTC - Penghui Li: OK, so it happens when the topic is
removed and then a consumer tries to connect to it?
----
2020-03-03 13:40:34 UTC - Greg: yes
----
2020-03-03 13:41:40 UTC - Greg: because it looks like it has been recreated
automatically (by the client reconnection mechanism?) without a schema
----
2020-03-03 13:43:47 UTC - Penghui Li: No, `Topic does not have schema to check`
means the broker can get a schema from the consumer but can't find any schema on
this topic. So, it's better to check whether the topic has any schema
----
2020-03-03 13:44:16 UTC - Greg: Sorry, that was what I meant: the topic has lost
the schema
----
2020-03-03 13:44:24 UTC - Penghui Li: yes
----
2020-03-03 13:44:53 UTC - Greg: We verified using pulsar-admin and can confirm
the topic has lost the schema, but I cannot explain why
----
2020-03-03 13:44:55 UTC - Penghui Li: Here is the source code:
```return getSchema(schemaId).thenCompose(existingSchema -> {
    if (existingSchema != null && !existingSchema.schema.isDeleted()) {
        if (strategy == SchemaCompatibilityStrategy.BACKWARD ||
                strategy == SchemaCompatibilityStrategy.FORWARD ||
                strategy == SchemaCompatibilityStrategy.FORWARD_TRANSITIVE ||
                strategy == SchemaCompatibilityStrategy.FULL) {
            return checkCompatibilityWithLatest(schemaId, schemaData,
                    SchemaCompatibilityStrategy.BACKWARD);
        } else {
            return checkCompatibilityWithAll(schemaId, schemaData, strategy);
        }
    } else {
        return FutureUtil.failedFuture(new IncompatibleSchemaException(
                "Topic does not have schema to check"));
    }
});```
----
2020-03-03 13:45:58 UTC - Penghui Li: > We verified using pulsar-admin and can
confirm the topic has lost the schema, but I cannot explain why
OK, thanks.
----
2020-03-03 13:47:00 UTC - Greg: and I also see that the schema has been cleaned:
[<non-persistent://public/default/cluster>] Topic deleted successfully due to
inactivity
----
2020-03-03 13:47:34 UTC - Greg: so I don't know how the topic was recreated
(something automatic somewhere :wink:)
----
2020-03-03 13:49:07 UTC - Greg: sorry, I meant the topic has been cleaned, not
the schema...
----
2020-03-03 13:49:44 UTC - Penghui Li: Pulsar enables topic auto-creation by
default. So, if a producer or a consumer tries to connect to a topic that does
not exist, the broker will create the topic.
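For illustration, a rough Java sketch of that behavior (the subscription name is
hypothetical; the topic and STRING schema match the ones discussed above):
subscribing to a topic that no longer exists auto-creates it under the default
settings, and the schema carried by the consumer is what the broker then checks.
```import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class SubscribeExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker address
                .build();

        // With the default allowAutoTopicCreation=true, subscribing to a
        // topic that does not exist makes the broker create it on the fly.
        // The STRING schema sent by this consumer is what gets checked
        // against whatever schema (if any) the topic currently has.
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("non-persistent://public/default/cluster")
                .subscriptionName("monitor-example") // hypothetical name
                .subscribe();

        consumer.close();
        client.close();
    }
}```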
----
2020-03-03 13:50:13 UTC - Greg: but in this case the topic is well created with
the schema
----
2020-03-03 13:51:03 UTC - Ryan Slominski: FYI - looks like syntax highlighting
of method getCurrentRecord is broken in the 2.5.0 docs. Not sure why it isn't
blue like the rest of the methods. Probably why I didn't see it.
----
2020-03-03 13:51:16 UTC - Greg: so I suppose it's a client that was already
connected trying to use a topic that has been cleaned
----
2020-03-03 13:51:20 UTC - Greg: Is that possible?
----
2020-03-03 13:52:50 UTC - Greg: Can a topic be cleaned when there are
publishers/consumers connected to it?
----
2020-03-03 13:53:05 UTC - Penghui Li: A topic is only deleted when there are no
publishers, no consumers, and no subscriptions.
----
2020-03-03 13:53:18 UTC - Penghui Li: > Can a topic be cleaned when there
are publishers/consumers connected to it? (edited)
NO
----
2020-03-03 13:53:43 UTC - Greg: We are using a pulsar cluster with 3 instances
----
2020-03-03 13:54:23 UTC - Greg: I think we don't have the issue with only one
Pulsar instance, but I need to verify to confirm
----
2020-03-03 14:00:03 UTC - Penghui Li: OK, I will try to reproduce it. It looks
like this is a bug related to schema deletion.
----
2020-03-03 14:00:18 UTC - Alex Yaroslavsky: Hi,
Can't figure out how to deal with this one.
I get the following error in the function log:
```[ERROR] python_instance.py: Exception while executing user method
Traceback (most recent call last):
  File "/opt/pulsar/instances/python-instance/python_instance.py", line 242, in actual_execution
    output_object = self.function_class.process(input_object, self.contextimpl)
  File "/tmp/pulsar_functions/c1-d1/in/in_router/0/in_router.py", line 8, in process
    context.publish(self.out_topic, item, properties={ 'tenant' : context.get_function_tenant() })
  File "/opt/pulsar/instances/python-instance/contextimpl.py", line 190, in publish
    output_bytes, partial(self.callback_wrapper, callback, topic_name, self.get_message_id()), **message_conf)
  File "/opt/pulsar/instances/python-instance/pulsar/__init__.py", line 894, in send_async
    replication_clusters, disable_replication, event_timestamp)
  File "/opt/pulsar/instances/python-instance/pulsar/__init__.py", line 928, in _build_msg
    mb.property(k, v)
ArgumentError: Python argument types in
    MessageBuilder.property(MessageBuilder, str, unicode)
did not match C++ signature:
    property(pulsar::MessageBuilder {lvalue}, std::string, std::string)```
The function code is this:
```from pulsar import Function

class RoutingFunction(Function):
    def __init__(self):
        self.out_topic = "<persistent://internal/emsaas/out>"

    def process(self, item, context):
        context.publish(self.out_topic, item, properties={ 'tenant' : context.get_function_tenant() })```
----
2020-03-03 14:01:07 UTC - Penghui Li: @Greg Can you help create an issue on
GitHub, and provide the steps to reproduce if possible?
----
2020-03-03 14:01:15 UTC - Greg: OK thanks Penghui, tell me if you need more
information, traces or tests from me!
+1 : Penghui Li
----
2020-03-03 14:01:29 UTC - Greg: yes
----
2020-03-03 14:01:36 UTC - Penghui Li: Thanks
----
2020-03-03 14:03:46 UTC - Ryan Slominski: Looks like Docusaurus code tabs
interfere with the markdown, because the raw markdown works fine. Might be
related to this bug: <https://github.com/facebook/docusaurus/issues/1260>
----
2020-03-03 14:08:45 UTC - Tobias Macey: @Sijie Guo is there anywhere that I can
track progress on the Kafka protocol layer that you are working on?
----
2020-03-03 14:10:59 UTC - Penghui Li: @Greg I can't reproduce it on master
branch
----
2020-03-03 14:11:04 UTC - Penghui Li: ```{
"version": 2,
"schemaInfo": {
"name": "test",
"schema": "",
"type": "STRING",
"properties": {}
}
}```
----
2020-03-03 14:11:56 UTC - Greg: I'll try to see how to reproduce it easily, and
will also see if I have the issue with only one broker
----
2020-03-03 14:12:01 UTC - Penghui Li: This is the schema created by the
consumer. I created the consumer after the topic was deleted.
----
2020-03-03 14:12:39 UTC - Greg: but in my case the consumer is already
connected when the topic is deleted
----
2020-03-03 14:13:03 UTC - Greg: I don't know why it is deleted; I thought it was
because no messages were sent
----
2020-03-03 14:13:04 UTC - Penghui Li: Ok, let me try
----
2020-03-03 14:17:45 UTC - Penghui Li: ```22:14:20.129
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConnectionHandler@114] INFO
org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Closed
connection [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650] -- Will try
again in 0.1 s
22:14:20.230
[pulsar-timer-4-1:org.apache.pulsar.client.impl.ConnectionHandler@117] INFO
org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Reconnecting
after timeout
22:14:20.236
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@550] INFO
org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribing to topic
on cnx [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650]
22:14:20.270
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@662] INFO
org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribed to topic
on /127.0.0.1:6650 -- consumer: 0
22:15:00.982 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ClientCnx@629]
INFO org.apache.pulsar.client.impl.ClientCnx - [/127.0.0.1:6650] Broker
notification of Closed consumer: 0
22:15:00.982
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConnectionHandler@114] INFO
org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Closed
connection [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650] -- Will try
again in 0.1 s
22:15:01.084
[pulsar-timer-4-1:org.apache.pulsar.client.impl.ConnectionHandler@117] INFO
org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Reconnecting
after timeout
22:15:01.085
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@550] INFO
org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribing to topic
on cnx [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650]
22:15:01.120
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@662] INFO
org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribed to topic
on /127.0.0.1:6650 -- consumer: 0
22:15:30.509 [pulsar-client-io-1-1:org.apache.pulsar.client.impl.ClientCnx@629]
INFO org.apache.pulsar.client.impl.ClientCnx - [/127.0.0.1:6650] Broker
notification of Closed consumer: 0
22:15:30.510
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConnectionHandler@114] INFO
org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Closed
connection [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650] -- Will try
again in 0.1 s
22:15:30.611
[pulsar-timer-4-1:org.apache.pulsar.client.impl.ConnectionHandler@117] INFO
org.apache.pulsar.client.impl.ConnectionHandler - [test] [test] Reconnecting
after timeout
22:15:30.613
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@550] INFO
org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribing to topic
on cnx [id: 0x448f874d, L:/127.0.0.1:53657 - R:/127.0.0.1:6650]
22:15:30.650
[pulsar-client-io-1-1:org.apache.pulsar.client.impl.ConsumerImpl@662] INFO
org.apache.pulsar.client.impl.ConsumerImpl - [test][test] Subscribed to topic
on /127.0.0.1:6650 -- consumer: 0```
Looks like it's not easy to reproduce. I tried deleting the topic only, and
deleting both the topic and the schema. The consumer can still reconnect
successfully. And when I delete the schema, I can see that the last schema
version has changed.
----
2020-03-03 14:18:05 UTC - Penghui Li: ```lipenghui@lipenghuideMacBook-Pro-2
apache-pulsar-2.6.0-SNAPSHOT % bin/pulsar-admin schemas get test
{
"version": 4,
"schemaInfo": {
"name": "test",
"schema": "",
"type": "STRING",
"properties": {}
}
}
lipenghui@lipenghuideMacBook-Pro-2 apache-pulsar-2.6.0-SNAPSHOT %
bin/pulsar-admin topics delete test -f -d
lipenghui@lipenghuideMacBook-Pro-2 apache-pulsar-2.6.0-SNAPSHOT %
bin/pulsar-admin schemas get test
{
"version": 6,
"schemaInfo": {
"name": "test",
"schema": "",
"type": "STRING",
"properties": {}
}
}```
----
2020-03-03 14:20:32 UTC - Greg: wow ok :disappointed: I will try to find the
exact way to reproduce, thanks for the help
----
2020-03-03 14:29:39 UTC - Geetish Sanjeeb Nayak: @Geetish Sanjeeb Nayak has
joined the channel
----
2020-03-03 14:31:54 UTC - Greg: So here is a reproduction with 2 Pulsar nodes:
```Node 1 :
14:23:30.460 [ForkJoinPool.commonPool-worker-1] WARN
org.apache.pulsar.broker.service.AbstractTopic -
[<non-persistent://public/default/cluster>] Error getting policies
java.util.concurrent.CompletableFuture cannot be cast to
org.apache.pulsar.common.policies.data.Policies and publish throttling will be
disabled
14:23:30.461 [ForkJoinPool.commonPool-worker-1] INFO
org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling
for <non-persistent://public/default/cluster>
14:23:30.477 [ForkJoinPool.commonPool-worker-1] INFO
org.apache.pulsar.broker.service.BrokerService - Created topic
NonPersistentTopic{topic=<non-persistent://public/default/cluster>}
14:25:20.219 [pulsar-io-24-3] INFO
org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic -
[<non-persistent://public/default/cluster>] Topic deleted
14:25:20.219 [pulsar-io-24-3] INFO
org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic -
[<non-persistent://public/default/cluster>] Topic deleted successfully due to
inactivity
14:28:24.086 [pulsar-io-24-4] WARN
org.apache.pulsar.broker.service.AbstractTopic -
[<non-persistent://public/default/cluster>] Error getting policies
java.util.concurrent.CompletableFuture cannot be cast to
org.apache.pulsar.common.policies.data.Policies and publish throttling will be
disabled
14:28:24.086 [pulsar-io-24-4] INFO
org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling
for <non-persistent://public/default/cluster>
14:28:24.087 [pulsar-io-24-4] INFO
org.apache.pulsar.broker.service.BrokerService - Created topic
NonPersistentTopic{topic=<non-persistent://public/default/cluster>}
Node 2 :
14:25:20.201 [pulsar-ordered-OrderedExecutor-5-0-EventThread] INFO
org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000
sessionid:0x100d2a693590018 local:/10.200.8.111:41790
remoteserver:zookeeper/10.200.7.69:2181 lastZxid:4294984843 xid:719 sent:719
recv:753 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch
event: WatchedEvent state:SyncConnected type:NodeDataChanged
path:/schemas/public/default/cluster
14:27:04.984 [pulsar-io-24-3] INFO org.apache.pulsar.broker.service.ServerCnx
- [/10.200.8.112:36886] Subscribing on topic
<non-persistent://public/default/cluster> /
Monitor-istra-monitor-667c9fbbcc-ldlsf
14:27:05.086 [Thread-14] WARN org.apache.pulsar.broker.service.ServerCnx -
[/10.200.8.112:36886][<non-persistent://public/default/cluster][Monitor-istra-monitor-667c9fbbcc-ldlsf]>
Failed to create consumer: Topic does not have schema to check```
----
2020-03-03 14:33:21 UTC - Greg: The publisher does not disconnect but does not
send any messages; after 1 minute the topic is deleted. Then a new publisher
connects on the other node and gets the error message
----
2020-03-03 14:33:41 UTC - Greg: Will try with only one node
----
2020-03-03 14:40:35 UTC - Santiago Del Campo: Hello! Any idea about this?
We're running a Pulsar cluster on top of K8s. The broker takes high-load traffic
through websockets. Over time, the CPU usage rises up to 3 cores of the server;
could this be related to some CPU usage "leak"?
Broker version is 2.4.2
----
2020-03-03 14:48:21 UTC - Greg: @Penghui Li: I cannot reproduce it with only one
Pulsar instance; the topic is never cleaned. But with several instances I can
easily reproduce it, as the topic is deleted after 1 minute and then I cannot
reconnect to it anymore
----
2020-03-03 14:56:45 UTC - Penghui Li: Does "instance" mean a Pulsar broker here?
----
2020-03-03 15:15:51 UTC - Greg: yes
----
2020-03-03 15:34:06 UTC - Pavel Tishkevich: Hello.
We've increased the number of brokers from 3 to 12 to have better availability
when a broker fails, as @Sijie Guo suggested.
But after that, latency for some operations increased significantly (for
example, `ConsumerBuilder.subscribe()` increased from 0.6s to 1.5s).
What is causing this latency growth? Can it be tuned with this number of
brokers?
Also, we've noticed that zookeeper is more loaded after adding brokers (see
attached). Is this normal? What can we do to decrease latency?
----
2020-03-03 17:24:46 UTC - Sijie Guo: We are about to open source KoP in the
coming week.
----
2020-03-03 17:27:06 UTC - Sijie Guo: what kind of hardware are you using for
your zookeeper cluster?
----
2020-03-03 17:49:45 UTC - Gabriel Paiz III: @Gabriel Paiz III has joined the
channel
----
2020-03-03 17:57:53 UTC - Devin G. Bost: I was just going to ask the same
question.
----
2020-03-03 17:58:10 UTC - Devin G. Bost: More brokers put greater load on
ZooKeeper.
----
2020-03-03 18:03:04 UTC - Devin G. Bost: I know there are folks using helm.
----
2020-03-03 18:03:29 UTC - Devin G. Bost: I don't know specifics though.
----
2020-03-03 18:19:50 UTC - Devin G. Bost: @Penghui Li I'm not sure if it's the
same issue, but I know that sometimes the schema isn't created depending on the
order in which components are created. For example, if a source is created
before a function, or if a function is created before a sink, the schema is not
created.
----
2020-03-03 18:20:36 UTC - Devin G. Bost: Do you have logs associated with the
event? I think I've seen this happen before, but we need to capture logs around
what's happening to identify the cause.
----
2020-03-03 18:24:54 UTC - Santiago Del Campo: Which logs exactly... the
broker's? :thinking_face:
I did some research about this and found an issue on GitHub related to a CPU
usage leak when the broker was under high load with websocket communication and
an exclusive subscription to a topic was attempted (which, btw, is our case due
to the app architecture around Pulsar consumption).
----
2020-03-03 18:26:05 UTC - Santiago Del Campo: According to the GitHub issue, and
if I understand correctly, a fix was made in 2.4.2 and 2.5.0 :thinking_face:
----
2020-03-03 18:28:26 UTC - Devin G. Bost: Do you have the issue number? I can
check on this.
----
2020-03-03 18:28:38 UTC - Devin G. Bost: Yes, I meant broker logs.
----
2020-03-03 18:29:42 UTC - Devin G. Bost: Is the issue intermittent? If so, it
may be tricky to reproduce.
----
2020-03-03 18:30:09 UTC - Santiago Del Campo: issue:
<https://github.com/apache/pulsar/issues/5200>
apparently fixed with: <https://github.com/apache/pulsar/pull/5204>
----
2020-03-03 18:31:50 UTC - Devin G. Bost: Thanks
----
2020-03-03 18:33:57 UTC - Santiago Del Campo: Thanks to you
:slightly_smiling_face:
----
2020-03-03 18:41:47 UTC - Santiago Del Campo: This is the exception received
from stdout.
Tell me if you need anything else.
----
2020-03-03 18:41:47 UTC - Santiago Del Campo: ```18:38:41.381 [pulsar-web-32-7]
WARN org.apache.pulsar.websocket.ConsumerHandler - [172.31.49.0:33282] Failed
in creating subscription xxxxxxxxxxxxxxxxx on topic
<non-persistent://public/default/xxxxxxxxxxxxxxxxxxx>
org.apache.pulsar.client.api.PulsarClientException$ConsumerBusyException:
Exclusive consumer is already connected
at
org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:276)
~[org.apache.pulsar-pulsar-client-api-2.4.2.jar:2.4.2]
at
org.apache.pulsar.client.impl.ConsumerBuilderImpl.subscribe(ConsumerBuilderImpl.java:91)
~[org.apache.pulsar-pulsar-client-original-2.4.2.jar:2.4.2]
at
org.apache.pulsar.websocket.ConsumerHandler.<init>(ConsumerHandler.java:111)
~[org.apache.pulsar-pulsar-websocket-2.4.2.jar:2.4.2]
at
org.apache.pulsar.websocket.WebSocketConsumerServlet.lambda$configure$0(WebSocketConsumerServlet.java:44)
~[org.apache.pulsar-pulsar-websocket-2.4.2.jar:2.4.2]
at
org.eclipse.jetty.websocket.server.WebSocketServerFactory.acceptWebSocket(WebSocketServerFactory.java:229)
[org.eclipse.jetty.websocket-websocket-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.websocket.server.WebSocketServerFactory.acceptWebSocket(WebSocketServerFactory.java:214)
[org.eclipse.jetty.websocket-websocket-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.websocket.servlet.WebSocketServlet.service(WebSocketServlet.java:160)
[org.eclipse.jetty.websocket-websocket-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
[javax.servlet-javax.servlet-api-3.1.0.jar:3.1.0]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)
[org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
[org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.apache.pulsar.broker.web.ResponseHandlerFilter.doFilter(ResponseHandlerFilter.java:53)
[org.apache.pulsar-pulsar-broker-2.4.2.jar:2.4.2]
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
[org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:542)
[org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)
[org.eclipse.jetty-jetty-servlet-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:173)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.Server.handle(Server.java:494)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)
[org.eclipse.jetty-jetty-server-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
[org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
[org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
[org.eclipse.jetty-jetty-io-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
[org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
[org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
[org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
[org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)
[org.eclipse.jetty-jetty-util-9.4.20.v20190813.jar:9.4.20.v20190813]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_232]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_232]
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]```
----
2020-03-03 19:11:04 UTC - eilonk: If anyone was wondering, this isn't related
to helm but to kubernetes itself - it just requires mounting volumes with the
matching secrets and then editing the chart templates to use them.
----
2020-03-03 19:16:54 UTC - Devin G. Bost: Zookeeper needs to be properly tuned
to function at its best.
----
2020-03-03 19:17:04 UTC - Devin G. Bost: It needs very fast disk storage.
----
2020-03-03 19:30:55 UTC - Alexander Ursu: Quick question, has anyone tried to
put the Pulsar brokers directly behind a reverse proxy, specifically Traefik in
my case, without first standing behind the Pulsar proxy? Are there any inherent
problems with this approach, or are there any opinions on it?
I know the Pulsar proxy can do service discovery, but so can Traefik, since I
have all my components running in docker containers in a swarm.
----
2020-03-03 19:33:00 UTC - Rolf Arne Corneliussen: Simple question: for a
_partitioned_ topic, if you set a *compaction-threshold*, will it apply to the
topic as a whole or for each individual partition? I assume the latter, but I
could not find the answer in the documentation.
----
2020-03-03 19:37:42 UTC - Addison Higham: @Alexander Ursu it is a bit more
complex than that. Topics belong to a broker; for admin calls, HTTP redirects
are used and the proxy follows those for you, so you could likely do that with
Traefik fairly easily. But for the TCP protocol (which actually sends/receives
messages), you need to ensure you are being routed to the proper broker. When
you connect a consumer/producer via a proxy, it looks up the broker for a topic
(either by talking to zookeeper directly or by making an HTTP call to the
brokers, depending on how your proxy is configured) and then forwards requests.
There is additional complexity if you are doing authz/authn there. Making
Traefik do all that would be a fair bit more complicated and requires
domain-specific knowledge of Pulsar's protocol
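To make the difference concrete, here is a rough Java sketch (the URLs are
placeholders): a client pointed at the Pulsar proxy only needs to reach the
proxy, while a client pointed directly at the brokers must be able to reach
whichever broker the topic lookup redirects it to.
```import org.apache.pulsar.client.api.PulsarClient;

public class ClientUrls {
    public static void main(String[] args) throws Exception {
        // Via the Pulsar proxy: the proxy performs the topic lookup and
        // forwards the binary protocol to the owning broker.
        PulsarClient viaProxy = PulsarClient.builder()
                .serviceUrl("pulsar://pulsar-proxy.example.com:6650") // placeholder
                .build();

        // Directly against brokers: the lookup response redirects the client
        // to the broker that owns the topic, so every broker must be reachable
        // from the client; a generic TCP proxy in between breaks this unless
        // it understands Pulsar's routing.
        PulsarClient direct = PulsarClient.builder()
                .serviceUrl("pulsar://broker-1.example.com:6650") // placeholder
                .build();

        viaProxy.close();
        direct.close();
    }
}```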
----
2020-03-03 19:40:11 UTC - Alexander Ursu: Thanks for the input! So I guess
using the Pulsar proxy is still a much more sane approach, and then putting
that behind Traefik would be straightforward?
----
2020-03-03 19:40:40 UTC - Addison Higham: It should be. In our k8s we use an AWS
NLB -> Pulsar proxy.
----
2020-03-03 21:31:53 UTC - Ryan Slominski: `java.lang.IllegalStateException:
State is not enabled.`
I turned on debugging and I see this. That explains why my Pulsar Function that
attempts to use state is not working. So, how do I enable state in a Pulsar
Function?
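For reference, a rough sketch of what a stateful function looks like with the
Java SDK (the class name is hypothetical; it assumes state is actually enabled
on the cluster, otherwise calls like incrCounter() fail with exactly this
"State is not enabled" error):
```import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Counts how many times each word has been seen, using the built-in
// state API backed by BookKeeper stream storage.
public class WordCountFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) {
        for (String word : input.split("\\s+")) {
            context.incrCounter(word, 1);
        }
        return null;
    }
}```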
----
2020-03-03 21:45:47 UTC - Ryan Slominski: Tried updating conf/bookkeeper.conf
with:
`extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent`
Still no luck.
----
2020-03-03 22:04:54 UTC - Alexander Ursu: Might anyone have an idea/opinion on
connecting a Pulsar cluster to a sink that handles an arbitrary number of topics
with arbitrary schemas, and that can be connected to Grafana for graphing,
preferably through SQL or an SQL-like query language?
----
2020-03-03 22:12:38 UTC - Alexander Ursu: Or has anyone had any experience with
putting any sort of visualizations over Presto after configuring it to work
with Pulsar
----
2020-03-03 22:18:42 UTC - Ali Ahmed: @Alexander Ursu you can try this
<http://rickbergfalk.github.io/sqlpad/>
----
2020-03-03 22:42:23 UTC - Ryan Slominski: FYI - This does work if you deploy
the function to the cluster. Just doesn't work with Local Function Runner
----
2020-03-03 23:59:09 UTC - Derek Moore: @Derek Moore has joined the channel
----
2020-03-04 02:31:48 UTC - Eugen: I'm currently testing a Pulsar cluster and I
experience message loss when producing from a single thread to a single topic
(asynchronously). The following list shows the sequence number ranges that are
missing when sending at 200,000 msg/sec. It does not matter whether I consume in
a tailing fashion or seek to the beginning and receive everything - the result
is the same - which leads me to believe the producing side is the problem:
```11671-11840=169
23172-23397=225
23660-24127=467
24866-26299=1433
28529-28619=90
30519-30552=33
41185-41933=748
57788-58270=482
58534-58778=244
...```
Under what circumstances can message loss occur? I was under the impression
that that's pretty much impossible in pulsar.
----
2020-03-04 02:33:13 UTC - Joe Francis: It is impossible, if the message is
acked to the producer
----
2020-03-04 02:33:59 UTC - Eugen: FWIW, I am using the same benchmark code to
produce data for Pulsar and Kafka (which I'm comparing Pulsar to); the Kafka
variant uses the same message generation threading and logic, and with Kafka
there is no message loss.
----
2020-03-04 02:34:55 UTC - Eugen: I also thought it was impossible, even if
brokers and bookies fail, as long as the producer is alive, which it is
----
2020-03-04 02:35:21 UTC - Joe Francis: Published messages are durable. A
message is published only after you receive an ack for it.
----
2020-03-04 02:36:14 UTC - Eugen: I'm ignoring acks
----
2020-03-04 02:36:25 UTC - Joe Francis: :slightly_smiling_face:
----
2020-03-04 02:37:00 UTC - Eugen: I'm just sending using producer.sendAsync()
----
2020-03-04 02:37:22 UTC - Eugen: so you are saying that if I do not wait for an
ack, messages may be lost?
----
2020-03-04 02:37:29 UTC - Eugen: In other words, I need to send synchronously?
----
2020-03-04 02:37:35 UTC - Joe Francis: Not at all
----
2020-03-04 02:39:21 UTC - Joe Francis: But the ack is the guarantee that your
message was persisted. Asynchronous send allows you to pipeline sends, instead
of waiting for the roundtrip before the next send.
----
2020-03-04 02:42:16 UTC - Eugen: I heard (I believe here in Slack) that as long
as a producer lives, it will buffer messages in the face of broker failures, and
make sure there can be no reordering or message loss. As I'm facing message loss
(always in batches), either my producer is buggy (but it works with Kafka), or
something is happening in Pulsar that I thought was impossible. So how could
some messages get lost?
----
2020-03-04 02:43:43 UTC - Eugen: Obviously there could be message "loss" at the
end, as long as brokers (or bookies) aren't recovering, but I still don't see
how that could happen with messages in the middle.
----
2020-03-04 02:48:15 UTC - Joe Francis: That is correct. But if the operation
times out, the client returns an error (the future completes with an error) and
the app has to republish
----
2020-03-04 02:49:06 UTC - Joe Francis: The broker retries will not exceed the
timeout window. Within the timeout window, there will be retries.
----
2020-03-04 02:50:21 UTC - Joe Francis: The timeout window essentially is a
limit - The app tells Pulsar, if you cannot publish within this time, kick up
the error to me
----
2020-03-04 02:50:52 UTC - Eugen: In which case there will be message loss (or
reordering in case I resend), right?
----
2020-03-04 02:51:14 UTC - Eugen: I will look into the timeout setting next. Do
you happen to know what the default is?
----
2020-03-04 02:52:13 UTC - Joe Francis: What happens on a timeout is that
everything after that message that is pending in the client send buffer is
cleaned out. The app has to resend everything starting from the message that
returned the error.
----
2020-03-04 02:52:57 UTC - Eugen: So before resending it needs to clear some
flag that it received the nack?
----
2020-03-04 02:53:22 UTC - Eugen: Otherwise how would the client know if it's a
resend (aware of the problem) or not
----
2020-03-04 02:58:43 UTC - Joe Francis: Let's say you get an error on the 23rd
message after you send 50 async. You will get errors on all of the 23..50
messages (futures). You just have to resend starting from 23...
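A rough Java sketch of that pattern (the topic and payloads are hypothetical):
pipeline the sendAsync() calls, then, if any future completes exceptionally,
resend from the first failed message onward.
```import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class PipelinedSend {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker address
                .build();
        Producer<byte[]> producer = client.newProducer(Schema.BYTES)
                .topic("persistent://public/default/bench") // hypothetical topic
                .create();

        // Pipeline 50 sends without waiting for each round trip.
        CompletableFuture<?>[] futures = new CompletableFuture<?>[50];
        for (int i = 0; i < 50; i++) {
            byte[] payload = Integer.toString(i).getBytes(StandardCharsets.UTF_8);
            futures[i] = producer.sendAsync(payload);
        }

        // Find the first send that failed; the application has to resend
        // everything from that message onward, since later pending sends
        // are failed as well once one of them errors out.
        int firstFailed = -1;
        for (int i = 0; i < 50; i++) {
            try {
                futures[i].join();
            } catch (Exception e) {
                firstFailed = i;
                break;
            }
        }
        if (firstFailed >= 0) {
            // resend messages firstFailed..49 here
        }

        producer.close();
        client.close();
    }
}```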
----
2020-03-04 03:01:11 UTC - Joe Francis: I am not clear on what happened in your
case, but I am pretty confident that if you received the ack for the publish,
that message will be delivered.
----
2020-03-04 03:05:11 UTC - Eugen: So in your example, I send msg 50
asynchronously, and then the future for msg 23 returns an error. Everything
before that is lost, but everything after that (like msg 51 which follows
immediately after the asynchronously sent msg 50) gets sent? Then I don't see
how I can prevent message loss / reordering when sending asynchronously.
----
2020-03-04 03:17:30 UTC - Joe Francis: That is a very good question - I have to
look at the code and see what happens. @Matteo Merli may know it off the top of
his head
----
2020-03-04 03:25:20 UTC - Eugen: I will try to set `brokerDeduplicationEnabled`
to true
----
2020-03-04 03:59:45 UTC - Kannan: Broker znode session names are constructed
using the broker advertised address. By default the broker advertises its IP,
and in k8s, if we try to advertise the broker service name, then the other
brokers fail to create a session because of a session conflict. How do we
resolve this?
----
2020-03-04 04:20:22 UTC - Eugen: adding
```brokerDeduplicationEnabled=true```
to broker.conf did not fix this
----
2020-03-04 04:48:22 UTC - Matteo Merli: @Eugen The most probable cause of the
publish failure is the message being rejected directly on the client side
because the buffer is full.
The producer has a queue where messages are kept before they are acknowledged by
the broker.
When the queue gets full (the default size is 1K messages), the producer will,
by default, reject new sendAsync() operations (by failing the associated future
objects).
One way to avoid that is to set `blockIfQueueFull=true` on the
`ProducerBuilder`. This will make the sendAsync operation throttle (blocking)
when there's no more space.
+1 : Eugen, Ali Ahmed
----
2020-03-04 04:49:48 UTC - Matteo Merli: Another way you can get a publish
failure is through timeouts (the default is 30s). You can disable the timeout,
making the client keep trying to publish the message forever, by setting
`sendTimeout=0`.
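Putting the two suggestions together, a minimal sketch of the producer
configuration (the topic name is hypothetical):
```import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class SafeProducerConfig {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // assumed broker address
                .build();

        Producer<byte[]> producer = client.newProducer(Schema.BYTES)
                .topic("persistent://public/default/bench") // hypothetical topic
                // Block sendAsync() instead of failing it when the pending
                // queue (default ~1000 messages) is full.
                .blockIfQueueFull(true)
                // Optionally enlarge the pending queue.
                .maxPendingMessages(10000)
                // 0 disables the send timeout, so the client keeps retrying.
                .sendTimeout(0, TimeUnit.SECONDS)
                .create();

        producer.close();
        client.close();
    }
}```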
----
2020-03-04 04:58:29 UTC - Eugen: Good news - I will give this a shot!
----
2020-03-04 05:13:51 UTC - Eugen: The 30 sec timeout can't be what is causing my
problem though, because it occurs after only a couple of seconds
----
2020-03-04 05:30:12 UTC - Sijie Guo: Are you using the same service name among
different broker nodes?
----
2020-03-04 05:38:25 UTC - Eugen: Note to self and others following this thread:
the queue size can be changed via `ProducerBuilder.maxPendingMessages()`
----
2020-03-04 05:42:36 UTC - Kannan: yes
----
2020-03-04 07:36:35 UTC - Greg: No, I can reproduce it easily, but only with
several brokers. If I use a single broker I can't reproduce it, as the topic is
never cleaned. I need to understand why the topic is cleaned even when there are
subscribers/consumers connected...
----
2020-03-04 07:39:45 UTC - Greg: We managed to work around the issue by setting
brokerDeleteInactiveTopicsEnabled=false, but this is a temporary solution...
----
2020-03-04 07:50:47 UTC - Sijie Guo: each broker should have its own advertised
address
----
2020-03-04 07:50:50 UTC - Sijie Guo: you can’t assign same address to different
brokers.
----
2020-03-04 07:56:59 UTC - Kannan: By default, if we advertise the broker IP,
it's not accessible from its own pod with mTLS (<http://pod-ip:8080>)
----
2020-03-04 08:09:16 UTC - Greg: @Penghui Li: Hi, I just reproduced the issue
with 2 brokers. I have a publisher and a consumer connected to the topic on
broker 1, and I see that broker 2 deletes the topic:
```Broker 1 :
08:01:56.281 [pulsar-io-24-3] INFO org.apache.pulsar.broker.service.ServerCnx
- [/10.200.8.120:51372] Subscribing on topic
<non-persistent://public/default/cluster> /
Monitor-istra-monitor-667c9fbbcc-9cbns
08:01:56.309 [pulsar-io-24-3] WARN
org.apache.pulsar.broker.service.AbstractTopic -
[<non-persistent://public/default/cluster>] Error getting policies
java.util.concurrent.CompletableFuture cannot be cast to
org.apache.pulsar.common.policies.data.Policies and publish throttling will be
disabled
08:01:56.309 [pulsar-io-24-3] INFO
org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling
for <non-persistent://public/default/cluster>
08:01:56.314 [pulsar-io-24-3] INFO
org.apache.pulsar.broker.service.BrokerService - Created topic
NonPersistentTopic{topic=<non-persistent://public/default/cluster>}
08:01:57.199 [Thread-9] INFO
org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic -
[<non-persistent://public/default/cluster][Monitor-istra-monitor-667c9fbbcc-9cbns]>
Created new subscription for 0
08:01:57.199 [Thread-9] INFO org.apache.pulsar.broker.service.ServerCnx -
[/10.200.8.120:51372] Created subscription on topic
<non-persistent://public/default/cluster> /
Monitor-istra-monitor-667c9fbbcc-9cbns
08:01:57.888 [pulsar-io-24-3] INFO org.apache.pulsar.broker.service.ServerCnx
- [/10.200.8.120:51372][<non-persistent://public/default/cluster>] Creating
producer. producerId=0
08:01:58.025 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO
org.apache.pulsar.broker.service.ServerCnx - [/10.200.8.120:51372] Created new
producer:
Producer{topic=NonPersistentTopic{topic=<non-persistent://public/default/cluster>},
client=/10.200.8.120:51372,
producerName=Monitor-istra-monitor-667c9fbbcc-9cbns, producerId=0}
08:03:10.130 [pulsar-ordered-OrderedExecutor-5-0-EventThread] INFO
org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000
sessionid:0x100d2a69359002d local:/10.200.3.150:50400
remoteserver:zookeeper/10.200.7.69:2181 lastZxid:4294987042 xid:600 sent:600
recv:631 queuedpkts:0 pendingresp:0 queuedevents:0] Received ZooKeeper watch
event: WatchedEvent state:SyncConnected type:NodeDataChanged
path:/schemas/public/default/cluster
Broker 2 :
08:01:55.954 [ForkJoinPool.commonPool-worker-0] WARN
org.apache.pulsar.broker.service.AbstractTopic -
[<non-persistent://public/default/cluster>] Error getting policies
java.util.concurrent.CompletableFuture cannot be cast to
org.apache.pulsar.common.policies.data.Policies and publish throttling will be
disabled
08:01:55.955 [ForkJoinPool.commonPool-worker-0] INFO
org.apache.pulsar.broker.service.AbstractTopic - Disabling publish throttling
for <non-persistent://public/default/cluster>
08:01:55.959 [ForkJoinPool.commonPool-worker-0] INFO
org.apache.pulsar.broker.service.BrokerService - Created topic
NonPersistentTopic{topic=<non-persistent://public/default/cluster>}
08:03:10.129 [pulsar-ordered-OrderedExecutor-4-0-EventThread] INFO
org.apache.pulsar.zookeeper.ZooKeeperCache - [State:CONNECTED Timeout:30000
sessionid:0x200eccd91c10031 local:/10.200.7.133:36742
remoteserver:zookeeper/10.200.8.87:2181 lastZxid:4294987046 xid:186 sent:186
recv:202 queuedpkts:0 pendingresp:0 queuedevents:1] Received ZooKeeper watch
event: WatchedEvent state:SyncConnected type:NodeDataChanged
path:/schemas/public/default/cluster
08:03:10.132 [pulsar-io-24-3] INFO
org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic -
[<non-persistent://public/default/cluster>] Topic deleted
08:03:10.133 [pulsar-io-24-3] INFO
org.apache.pulsar.broker.service.nonpersistent.NonPersistentTopic -
[<non-persistent://public/default/cluster>] Topic deleted successfully due to
inactivity```
----
2020-03-04 08:11:58 UTC - Greg: Is it normal that the topic is created on both
brokers?
----
2020-03-04 08:14:20 UTC - Penghui Li: Ok, let me take a look.
----
2020-03-04 08:19:52 UTC - Greg: thanks
----
2020-03-04 08:31:09 UTC - Vincent LE MAITRE: Hi, I would like to develop Pulsar
functions with state. On my local standalone Pulsar installation it works fine
(state is enabled by default). On a Kubernetes Pulsar installation, deployed
using the Pulsar helm templates, the state feature seems to be disabled by
default, and I am not able to enable it. Has anyone succeeded in activating
function state on a Kubernetes deployment? Does the provided helm template allow
enabling Pulsar Functions state? Thanks
----
2020-03-04 08:42:02 UTC - Pavel Tishkevich: Each ZooKeeper node has 12 CPU
cores, 64GB of RAM, and CPU idle > 75%.
On each zookeeper node, 1 broker and 1 bookie are also deployed.
System load is not high: after adding more brokers it is never more than 5
(less than 50%).
There are almost no processes blocked by I/O.
Also, we have disabled fsync in ZooKeeper.
All of this suggests the problem is not related to hardware/disk speed.
----
2020-03-04 08:42:21 UTC - Pavel Tishkevich:
----