2020-03-09 09:59:58 UTC - Pedro Cardoso: Does Pulsar support creating consumers without associated topics, and adding & removing topics on the fly at runtime? ---- 2020-03-09 10:28:41 UTC - Kartik Gupta: @Kartik Gupta has joined the channel ---- 2020-03-09 10:29:29 UTC - Atif: Is there any way to bring up a local copy of Pulsar for running integration tests sans testcontainers? ---- 2020-03-09 10:32:12 UTC - Ali Ahmed: <https://github.com/streamlio/pulsar-embedded-tutorial/blob/master/src/test/java/org/apache/pulsar/PulsarEmbeddedTest.java> +1 : Atif, Kartik Gupta ---- 2020-03-09 10:33:11 UTC - Atif: thanks! ---- 2020-03-09 10:40:52 UTC - rani: I’m currently experiencing a similar problem. Any hints @Sijie Guo? Or have you had any breakthroughs @gfouquier? ---- 2020-03-09 11:32:37 UTC - gfouquier: no. I just reduced the verbosity to keep the log small, but it seems impossible to move them out of syslog ---- 2020-03-09 13:30:09 UTC - Meyappan Ramasamy: hi team, getting the below error when trying to publish from pulsar-client ---- 2020-03-09 13:30:12 UTC - Meyappan Ramasamy: "Namespace missing local cluster name in clusters list: local_cluster=local ns=public/default clusters=[pulsarbm]" ---- 2020-03-09 14:08:38 UTC - Andy Papia: @Andy Papia has joined the channel ---- 2020-03-09 14:11:40 UTC - Andy Papia: Just getting started with Pulsar. When you auto-create a topic, can you start publishing and consuming right away, or is there a delay? With Kafka, it takes like 2-3 seconds for the brokers to learn about the new topic from ZooKeeper, and I'm wondering if Pulsar has the same issue. 
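[Editor's note] The "Namespace missing local cluster name" error above means the broker's configured cluster name (`local` here) is not in the namespace's cluster list (`pulsarbm`). A hedged sketch of how this is typically resolved with pulsar-admin, assuming `pulsarbm` is the intended cluster name:

```
# Inspect which clusters the namespace is assigned to
bin/pulsar-admin namespaces policies public/default

# Either set clusterName=pulsarbm in conf/broker.conf,
# or add the broker's cluster to the namespace's cluster list:
bin/pulsar-admin namespaces set-clusters public/default --clusters pulsarbm
```

These are admin commands against a running cluster; the right direction of the fix depends on which name (local or pulsarbm) is actually correct, as Sijie asks further down.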
---- 2020-03-09 14:25:42 UTC - Josh Ryan: @Josh Ryan has joined the channel ---- 2020-03-09 15:11:56 UTC - Arun J: @Arun J has joined the channel ---- 2020-03-09 15:21:56 UTC - David Kjerrumgaard: @Andy Papia There is a delay, but it is on the order of milliseconds ---- 2020-03-09 15:22:15 UTC - Andy Papia: :thumbsup: ---- 2020-03-09 15:23:16 UTC - Andy Papia: Another quick question that I haven't seen addressed in the docs. When you create a topic, I assume you can set the replication for it, like 2-way, 3-way, etc., as you can with Kafka? ---- 2020-03-09 15:30:01 UTC - Matteo Merli: You set that on the namespace associated with the topic:
<https://pulsar.apache.org/docs/en/pulsar-admin/#set-persistence> ---- 2020-03-09 15:31:28 UTC - Matteo Merli: You can create a consumer with a regex expression to match topics. ---- 2020-03-09 15:38:26 UTC - Andy Papia: oh cool. missed that. ---- 2020-03-09 15:48:10 UTC - Pedro Cardoso: Can I change that regex over time? ---- 2020-03-09 15:48:53 UTC - Manuel Mueller: took a while but I figured it out -> it is the `wal_level` which needs to be set correctly. In my default Docker Postgres the level was set to “replica”, which doesn't offer enough data for Debezium to work ---- 2020-03-09 15:49:39 UTC - Andy Papia: thinking about how to secure Pulsar. Is it possible to use a sidecar pattern with Istio in K8s to do mTLS? Has anyone done this, or is it best to use the built-in TLS? ---- 2020-03-09 15:50:15 UTC - Matteo Merli: You can restart the consumer application with a new regex ---- 2020-03-09 15:50:46 UTC - Atif: Yes, our team has been working to see if this is possible ---- 2020-03-09 15:51:31 UTC - Atif: We got it to work, however we need a few fixes that are only available in the recently released Istio 1.5.0 ---- 2020-03-09 15:52:52 UTC - Andy Papia: cool ---- 2020-03-09 15:52:53 UTC - Atif: I can ask folks to share our setup here @Andy Papia ---- 2020-03-09 15:53:08 UTC - Andy Papia: That would be great. I'd love to see an example setup. ---- 2020-03-09 15:54:57 UTC - Sijie Guo: The cluster “local” isn’t in the cluster list of public/default ---- 2020-03-09 15:55:40 UTC - Sijie Guo: Did you configure the right cluster name? Are you using local or pulsarbm as the cluster name? ---- 2020-03-09 15:56:00 UTC - Ian: I'm looking at using Pulsar for a scenario where an application would be processing messages which correspond to thousands of customers. It seems like having 1 topic per customer would require either thousands of producers, or creating a new producer every time a message is to be sent. Neither seems optimal. 
Is there something I'm missing, or in this case would it be better to not have a separate topic per customer, so that only one producer is required? ---- 2020-03-09 16:00:01 UTC - Manuel Mueller: could you elaborate on why you would try it this way? Even though Pulsar can handle a lot of topics - I do not really understand the need for it. ---- 2020-03-09 16:06:02 UTC - Ian: I was thinking it may be useful for per-customer configuration and monitoring, but it does seem that the added complexity would outweigh the benefit. ---- 2020-03-09 16:07:26 UTC - Manuel Mueller: I would agree - currently I am investigating the same concept due to GDPR compliance. Depending on your use case, the added complexity and potential for failure get higher ---- 2020-03-09 16:13:08 UTC - Ian: Yes, I can see how it would be useful for GDPR compliance. ---- 2020-03-09 16:14:41 UTC - Ian: Keeping a map of producers in memory might work fine, but I'm not sure what the costs of this would be. ---- 2020-03-09 16:34:39 UTC - Andres Riofrio: Hi! I’m using pulsar-client to access IoT metrics from our provider (Tuya) and I seem to have misunderstood how failover subscriptions work with partitioned topics. I have two consumers (A and B) running on the same subscription to a partitioned topic. Message X was sent to a partition. Consumer A received message X and crashed without acknowledging it. It restarted. Consumer B also received it and restarted without acknowledging it. Then message Y was sent to the same partition by the same producer. Consumer A spun back up and received message X. It again crashed without acknowledging it. However, before Consumer A had exited, Consumer B received message Y. About 10 seconds later, it received message X. This breaks the assumption my code makes that consumers always receive messages in the same partition in order. I expected that Consumer B wouldn’t receive message Y (or any messages for that partition) until Consumer A had exited. 
The documentation for `SubscriptionType.Failover` says “On each partition, at most one consumer will be active at a given point in time.” Furthermore, I expected that B would receive message X before Y, because both were unacknowledged and X was produced before Y. What am I missing? I would appreciate any insights! ---- 2020-03-09 17:10:37 UTC - Atif: Interesting, I've been thinking about this as well. 1. With the above approach the issue becomes that there is a lot of overhead, with very minimal data flowing through per topic and a large number of producers per individual. 2. Another approach I can think of is moving this complexity to the processing side, i.e. encrypt the data per customer ID, which acts as a key, into a single Pulsar topic. Whenever data needs to be decrypted, every key corresponds to an encryption/decryption key in a lookup-table sort of service that is fetched during processing. Producers have access to this lookup service and can encrypt data while writing to the topic. Right to forget simply translates to deleting the key from the lookup table? ---- 2020-03-09 17:51:30 UTC - Bobby: when attempting to set up a new environment in pulsar-manager, I'm getting an error saying "broker is exist" ---- 2020-03-09 17:51:35 UTC - Bobby: do we have any ideas what could cause that? 
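[Editor's note] The per-customer-key idea Atif describes is usually called crypto-shredding. A minimal, illustrative Java sketch (a toy, not a Pulsar API: the key store is an in-memory map, and a real system would use AES/GCM with proper IVs, key rotation, and a hardened key service): "right to be forgotten" becomes dropping the key, after which that customer's messages already written to the topic are unrecoverable.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.HashMap;
import java.util.Map;

// Toy lookup service: customer ID -> AES key.
public class CryptoShredding {
    private final Map<String, SecretKey> keys = new HashMap<>();

    SecretKey keyFor(String customerId) throws Exception {
        SecretKey k = keys.get(customerId);
        if (k == null) {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            k = gen.generateKey();
            keys.put(customerId, k);
        }
        return k;
    }

    // Producers would call this before writing the payload to the shared topic.
    byte[] encrypt(String customerId, byte[] payload) throws Exception {
        Cipher c = Cipher.getInstance("AES");
        c.init(Cipher.ENCRYPT_MODE, keyFor(customerId));
        return c.doFinal(payload);
    }

    // Consumers fetch the key during processing; a shredded key means the
    // ciphertext still sitting in the topic can never be read again.
    byte[] decrypt(String customerId, byte[] payload) throws Exception {
        SecretKey k = keys.get(customerId);
        if (k == null) throw new IllegalStateException("key shredded: data unrecoverable");
        Cipher c = Cipher.getInstance("AES");
        c.init(Cipher.DECRYPT_MODE, k);
        return c.doFinal(payload);
    }

    // "Right to forget": delete only the key, not the topic data.
    void forget(String customerId) { keys.remove(customerId); }
}
```

This keeps one producer and one topic while still allowing per-customer erasure, at the cost of running and securing the key-lookup service.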
---- 2020-03-09 17:52:06 UTC - Pradeesh: ---- 2020-03-09 17:52:58 UTC - Pradeesh: @Roman Popenov @Sijie Guo ^^ ---- 2020-03-09 17:53:13 UTC - Pradeesh: error says “broker is exist” ---- 2020-03-09 17:53:17 UTC - Pradeesh: not sure what that means ---- 2020-03-09 17:53:33 UTC - Roman Popenov: You can try checking what the service URL is with ```bin/pulsar-admin clusters get ${cluster}``` ---- 2020-03-09 17:53:42 UTC - Pradeesh: ok ---- 2020-03-09 17:53:55 UTC - Roman Popenov: And pass that URL ---- 2020-03-09 17:54:37 UTC - Roman Popenov: ``` /pulsar/bin/pulsar-admin clusters list -> this will return the clusters /pulsar/bin/pulsar-admin clusters get my-dev-cluster -> to see the URLs``` ---- 2020-03-09 17:55:15 UTC - Pradeesh: ```# bin/pulsar-admin clusters get pulsar-dev-dev { "serviceUrl" : "<http://172.27.223.166:8080/>", "brokerServiceUrl" : "<pulsar://172.27.223.166:6650/>"``` ---- 2020-03-09 17:55:35 UTC - Pradeesh: so my service URL is correct then ---- 2020-03-09 17:56:22 UTC - Pradeesh: when I try it, it says “environment is exist” ---- 2020-03-09 17:57:40 UTC - Roman Popenov: oh, refresh the page ---- 2020-03-09 17:58:32 UTC - Pradeesh: refresh worked …thanks ---- 2020-03-09 17:58:39 UTC - Pradeesh: haha ---- 2020-03-09 17:58:41 UTC - Roman Popenov: Yeah, there is a bug ---- 2020-03-09 17:59:10 UTC - Roman Popenov: When you put in the wrong URL - it creates everything, but doesn’t close the popup ---- 2020-03-09 18:01:02 UTC - Pradeesh: does the manager really provide any value? ---- 2020-03-09 18:01:27 UTC - Pradeesh: does it show us live metrics or data on different topics - throughput etc.? ---- 2020-03-09 18:02:02 UTC - Roman Popenov: I think it’s configurable and you can set the refresh, but I found that there were some discrepancies and didn’t really dig too deep ---- 2020-03-09 18:02:55 UTC - Roman Popenov: > does the manager really provide any value It is what it is ---- 2020-03-09 18:26:10 UTC - Sijie Guo: the manager is mainly a management console. 
it does basic stats collection (throughput by topics, namespaces and tenants) and provides a way to manage your clusters via a GUI. If you are looking to visualize the metrics, Prometheus + Grafana is the preferable solution. If you are using Datadog, you can send the metrics to Datadog to visualize them as well. ---- 2020-03-09 19:11:14 UTC - Andy Papia: I'm deploying Pulsar for the first time on Kubernetes with Helm. Does anyone know why the chart makes such large resource reservations for ZooKeeper? ``` resources: requests: memory: 15Gi cpu: 4``` ---- 2020-03-09 19:11:40 UTC - Andy Papia: I would think that the load on ZK would be pretty light. ---- 2020-03-09 19:13:12 UTC - Roman Popenov: Yeah, those are pretty random ---- 2020-03-09 19:13:52 UTC - Andy Papia: ok. I'll try smaller values. Just wanted to sanity check. ---- 2020-03-09 19:15:16 UTC - Roman Popenov: 1 CPU and 1Gi is more than enough for 150k messages/s and ample tenants with topics ---- 2020-03-09 19:41:32 UTC - Alexander Ursu: Hi, are there any examples anywhere of a JSON schema file? The only example I can find uses AVRO, but I dislike how all quotes have to be escaped. Are JSON schema files any better in comparison? ---- 2020-03-09 20:12:52 UTC - Andy Papia: Doing some benchmarking on AWS. Does it make sense to run the bookies on i3 (storage optimized) instances? What about the brokers? If I understand correctly, the brokers are stateless, so would m5 (general purpose) or c5 (compute optimized) be best? ---- 2020-03-09 20:25:42 UTC - Sijie Guo: you can check out this <http://pulsar.apache.org/docs/en/schema-manage/#upload-a-schema> if the schema is a struct schema, the `schema` definition is the JSON string of the schema definition written using the AVRO schema specification. ---- 2020-03-09 20:36:20 UTC - Greg Methvin: What usually causes publishes to time out? Is it possible it can be caused by a client-side issue? Let’s say I have something like 100k simultaneous publishes happening at once. 
Is it possible that could cause the publishes to time out on the client side? ---- 2020-03-09 20:58:10 UTC - Sijie Guo: for bookies, just make sure you have fast disks and allocate enough memory for both the JVM and the filesystem. so pick machines that have fast disks. for brokers, there are not many constraints. ---- 2020-03-09 20:58:24 UTC - Andy Papia: Bringing up Pulsar for the first time, but one of my bookkeepers is failing to start with this exception: ```20:52:38.624 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4 bookieHost: "100.96.14.4:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ] is not matching with [4 bookieHost: "100.96.11.2:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ]``` ---- 2020-03-09 20:59:03 UTC - Andy Papia: :thumbsup: ---- 2020-03-09 21:04:51 UTC - Sijie Guo: timeouts can happen if the client reaches the max-pending threshold. When the queue is full, a client configured with blockIfQueueFull will wait for previous outstanding requests to come back. If the previous requests don’t come back, the requests blocking on the full queue will fail with a send timeout once they have waited longer than the send timeout. And the reason the previous requests don’t come back is usually increased latency or reaching the disk bandwidth limit on the bookies. But you might have to look into bookie metrics to understand more. If you send 100k messages simultaneously (depending on what your send timeout is), there is a chance that your requests timed out because the pending queue was full. ---- 2020-03-09 21:05:39 UTC - Sijie Guo: the bookieHost changed upon restart. Are you using containers? If you are using containers, you need to use a stable name for the advertised address. 
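[Editor's note] The blockIfQueueFull behaviour described above can be illustrated with a plain bounded queue. This is a simplified model, not the Pulsar client's actual code: when the pending queue is full and earlier sends are not drained in time, a bounded wait gives up, which is the client-side analogue of a publish timing out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class PendingQueueDemo {
    // Try to enqueue a "send": block up to sendTimeoutMs waiting for room,
    // then give up -- the analogue of a publish timing out client-side.
    static boolean trySend(BlockingQueue<String> pending, String msg, long sendTimeoutMs)
            throws InterruptedException {
        return pending.offer(msg, sendTimeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> pending = new ArrayBlockingQueue<>(2); // like maxPendingMessages = 2
        System.out.println(trySend(pending, "m1", 100)); // true: room available
        System.out.println(trySend(pending, "m2", 100)); // true: queue now full
        // Nothing drains the queue here (no broker acks), so this waits
        // the full timeout and then fails:
        System.out.println(trySend(pending, "m3", 100)); // false: timed out
    }
}
```

In the real client the queue drains as the broker/bookies acknowledge writes, which is why bookie-side latency or disk bandwidth limits surface as client-side send timeouts.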
---- 2020-03-09 21:07:22 UTC - Sijie Guo: the bookie performs very strict verification of its environment and configuration to ensure consistency. so it keeps a cookie with the system settings (e.g. bookie host, directories, etc.). The cookie is kept locally and in ZooKeeper, and it is checked each time the bookie restarts. so you need to make sure those settings are not changed across bookie restarts. ---- 2020-03-09 21:29:07 UTC - Greg Methvin: thanks, I think that makes sense ---- 2020-03-09 21:30:15 UTC - Greg Methvin: we also set blockIfQueueFull to match our previous setup with RabbitMQ, so this makes sense ---- 2020-03-09 21:31:28 UTC - Greg Methvin: and we saw high thread usage during this situation as well ---- 2020-03-09 22:12:28 UTC - Bobby: does anyone know why this wouldn't work? ---- 2020-03-09 22:12:29 UTC - Sijie Guo: I see. You can also check your BookKeeper dashboard to see if there is increasing add latency or journal latency. Those are good resources for insight as well. ---- 2020-03-09 22:12:35 UTC - Bobby: ```COSML-9841305:~/viper_repos/hewish$ go get -u <http://github.com/apache/pulsar/pulsar-client-go/pulsar|github.com/apache/pulsar/pulsar-client-go/pulsar> # <http://github.com/apache/pulsar/pulsar-client-go/pulsar|github.com/apache/pulsar/pulsar-client-go/pulsar> In file included from ../../go/src/github.com/apache/pulsar/pulsar-client-go/pulsar/c_client.go:24: ./c_go_pulsar.h:22:10: fatal error: 'pulsar/c/client.h' file not found #include <pulsar/c/client.h> ^~~~~~~~~~~~~~~~~~~ 1 error generated.``` ---- 2020-03-09 22:12:49 UTC - Bobby: did the repo move? ---- 2020-03-09 22:14:11 UTC - Sijie Guo: you are using a c-go wrapper, so you need to make sure you have installed the C/C++ library locally. 
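[Editor's note] Sijie's cookie explanation can be sketched as a simplified model (not BookKeeper's actual classes, which also involve ZooKeeper storage and versioning): on startup the bookie compares the identity it stored on a previous boot against the one it would generate now, and refuses to start on any mismatch. This is exactly why an IP-based bookieHost breaks across pod restarts.

```java
import java.util.Objects;

// Simplified model of BookKeeper's startup cookie check.
public class CookieCheck {
    static final class Cookie {
        final String bookieHost, journalDir, ledgerDirs, instanceId;
        Cookie(String host, String journal, String ledgers, String instance) {
            bookieHost = host; journalDir = journal; ledgerDirs = ledgers; instanceId = instance;
        }
        boolean matches(Cookie other) {
            return Objects.equals(bookieHost, other.bookieHost)
                && Objects.equals(journalDir, other.journalDir)
                && Objects.equals(ledgerDirs, other.ledgerDirs)
                && Objects.equals(instanceId, other.instanceId);
        }
    }

    static void verifyOnStartup(Cookie stored, Cookie current) {
        if (!stored.matches(current)) {
            // An IP-based bookieHost changes when a pod is rescheduled,
            // while the stored cookie does not -- so startup fails here.
            throw new IllegalStateException("InvalidCookie: " + stored.bookieHost
                + " is not matching with " + current.bookieHost);
        }
    }
}
```

With a stable hostname (or after clearing stale data, as in the thread below), the stored and regenerated cookies agree and the check passes.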
---- 2020-03-09 22:14:41 UTC - Bobby: ok thanks ---- 2020-03-09 22:18:44 UTC - Sijie Guo: or you can consider using the native go client - <https://github.com/apache/pulsar-client-go> ---- 2020-03-09 22:22:58 UTC - Bobby: i got a few errors after installing libpulsar, but it appears to work ---- 2020-03-09 22:23:00 UTC - Bobby: thank you ---- 2020-03-09 22:56:30 UTC - Sijie Guo: you are welcome ---- 2020-03-10 00:32:43 UTC - Andy Papia: ok I'm using the helm chart and running containers on Kubernetes. not sure how to keep a stable name but I'll look into it. thanks for the context. ---- 2020-03-10 00:39:32 UTC - Sijie Guo: you can use hostName. ---- 2020-03-10 00:39:51 UTC - Sijie Guo: In the bookkeeper config map, you can configure useHostname so that bookkeeper pods can use hostname as the bookkeeper identifier. ---- 2020-03-10 00:41:21 UTC - Andy Papia: nice I'll try that ---- 2020-03-10 01:07:59 UTC - Andy Papia: hmm I've added that to the configmap and did a new install from helm but I get the same error: ```01:07:05.863 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4 bookieHost: "100.96.13.6:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ] is not matching with [4 bookieHost: "100.96.11.2:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ]``` ---- 2020-03-10 01:09:34 UTC - Andy Papia: This is the configmap: ```PULSAR_PREFIX_useHostNameAsBookieID: "true"``` ---- 2020-03-10 01:23:43 UTC - Sijie Guo: Remove PULSAR_PREFIX_ ---- 2020-03-10 01:23:49 UTC - Andy Papia: hmm wasn't sure if PULSAR_PREFIX was needed so I removed it ---- 2020-03-10 01:23:58 UTC - Andy Papia: but I'm still getting this: ```01:21:19.922 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie 
server org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4 bookieHost: "my-pulsar-bookkeeper-0.my-pulsar-bookkeeper.default.svc.cluster.local:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ] is not matching with [4 bookieHost: "100.96.11.2:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ]``` ---- 2020-03-10 01:24:36 UTC - Sijie Guo: So you already have data stored on the disks. ---- 2020-03-10 01:24:43 UTC - Andy Papia: so it seems to be using the hostname now but it doesn't match the cookie? ---- 2020-03-10 01:24:49 UTC - Andy Papia: ahh that makes sense. I'll delete the volumes. ---- 2020-03-10 01:24:51 UTC - Andy Papia: thanks! ---- 2020-03-10 01:24:59 UTC - Sijie Guo: No problem ---- 2020-03-10 04:25:08 UTC - Jeon.DeukJin: Hello, doesn't it provide ZSTD compression yet? ---- 2020-03-10 04:25:10 UTC - Jeon.DeukJin: <http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/CompressionType.html> ---- 2020-03-10 04:25:59 UTC - Jeon.DeukJin: but here it is provided: <http://pulsar.apache.org/api/client/2.5.0-SNAPSHOT/org/apache/pulsar/client/api/CompressionType.html> ---- 2020-03-10 04:27:33 UTC - Jeon.DeukJin: GitHub: <https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/CompressionType.java> ---- 2020-03-10 04:27:54 UTC - Jeon.DeukJin: `/** Compress with Zstandard codec. */` `ZSTD,` ---- 2020-03-10 04:50:28 UTC - Antti Kaikkonen: After I started using function state in my source connector, Pulsar standalone appears to randomly stop working, usually within a couple of minutes. I can't find any errors in the logs, but the CompletableFutures returned by getStateAsync never complete. Restarting the source connector doesn't help, but restarting Pulsar standalone does. Has anyone experienced anything similar? 
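[Editor's note] The stable-bookie-ID fix discussed above comes down to a single BookKeeper setting; a sketch of the bookkeeper.conf entry (note that, as the thread shows, the cookie mismatch persists until the stale data on the bookie volumes is also removed):

```
# bookkeeper.conf (or the corresponding bookkeeper ConfigMap entry)
useHostNameAsBookieID=true
```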
---- 2020-03-10 04:56:06 UTC - Antti Kaikkonen: I have also tried creating a duplicate of the source connector (with a different name and output topic) after the failure happens, but it also doesn't work until I restart Pulsar. ---- 2020-03-10 08:23:36 UTC - Ken Huang: Does the WebSocket API support authentication? ----
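[Editor's note] On the WebSocket question: the WebSocket API does go through Pulsar's pluggable authentication. A hedged sketch of token-based auth against the WebSocket producer endpoint (host, topic, and token are placeholders):

```
ws://broker:8080/ws/v2/producer/persistent/public/default/my-topic?token=<jwt-token>
```

The token can alternatively be sent as an `Authorization: Bearer <jwt-token>` header during the WebSocket handshake.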
