2020-03-09 09:59:58 UTC - Pedro Cardoso: Does Pulsar support creating consumers without associated topics, and adding & removing topics on the fly at runtime? ---- 2020-03-09 10:28:41 UTC - Kartik Gupta: @Kartik Gupta has joined the channel ---- 2020-03-09 10:29:29 UTC - Atif: Is there any way to bring up a local copy of Pulsar for running integration tests sans testcontainers? ---- 2020-03-09 10:32:12 UTC - Ali Ahmed: <https://github.com/streamlio/pulsar-embedded-tutorial/blob/master/src/test/java/org/apache/pulsar/PulsarEmbeddedTest.java> +1 : Atif, Kartik Gupta ---- 2020-03-09 10:33:11 UTC - Atif: thanks! ---- 2020-03-09 10:40:52 UTC - rani: I’m currently experiencing a similar problem. Any hints @Sijie Guo? Or have you had any breakthroughs @gfouquier? ---- 2020-03-09 11:32:37 UTC - gfouquier: no. I just reduced the verbosity to keep the log small, but it seems impossible to move them out of syslog ---- 2020-03-09 13:30:09 UTC - Meyappan Ramasamy: hi team, getting the below error when trying to publish from pulsar-client ---- 2020-03-09 13:30:12 UTC - Meyappan Ramasamy: "Namespace missing local cluster name in clusters list: local_cluster=local ns=public/default clusters=[pulsarbm]" ---- 2020-03-09 14:08:38 UTC - Andy Papia: @Andy Papia has joined the channel ---- 2020-03-09 14:11:40 UTC - Andy Papia: Just getting started with Pulsar. When you auto-create a topic, can you start publishing and consuming right away, or is there a delay? With Kafka, it takes like 2-3 seconds for the brokers to learn about the new topic from ZooKeeper, and I'm wondering if Pulsar has the same issue. 
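[Editor's note] The "Namespace missing local cluster name" error above means the broker's configured cluster name (`local` here) is not in the namespace's cluster list (`pulsarbm`). A hedged sketch of how this is typically resolved with pulsar-admin, assuming `pulsarbm` is the intended cluster name:

```
# Inspect which clusters the namespace is assigned to
bin/pulsar-admin namespaces policies public/default

# Either set clusterName=pulsarbm in conf/broker.conf,
# or add the broker's cluster to the namespace's cluster list:
bin/pulsar-admin namespaces set-clusters public/default --clusters pulsarbm
```

These are admin commands against a running cluster; the right direction of the fix depends on which name (local or pulsarbm) is actually correct, as Sijie asks further down.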
---- 2020-03-09 14:25:42 UTC - Josh Ryan: @Josh Ryan has joined the channel ---- 2020-03-09 15:11:56 UTC - Arun J: @Arun J has joined the channel ---- 2020-03-09 15:21:56 UTC - David Kjerrumgaard: @Andy Papia There is a delay, but it is on the order of milliseconds ---- 2020-03-09 15:22:15 UTC - Andy Papia: :thumbsup: ---- 2020-03-09 15:23:16 UTC - Andy Papia: Another quick question that I haven't seen addressed in the docs. When you create a topic, I assume you can set the replication for it, like 2-way, 3-way, etc., as you can with Kafka? ---- 2020-03-09 15:30:01 UTC - Matteo Merli: You set that on the namespace associated with the topic:
<https://pulsar.apache.org/docs/en/pulsar-admin/#set-persistence> ---- 2020-03-09 15:31:28 UTC - Matteo Merli: You can create a consumer with a regex expression to match topics. ---- 2020-03-09 15:38:26 UTC - Andy Papia: oh cool. missed that. ---- 2020-03-09 15:48:10 UTC - Pedro Cardoso: Can I change that regex over time? ---- 2020-03-09 15:48:53 UTC - Manuel Mueller: took a while but I figured it out -> it is the `wal_level` which needs to be set correctly. In my default Docker Postgres the level was set to “replica”, which doesn't offer enough data for Debezium to work ---- 2020-03-09 15:49:39 UTC - Andy Papia: thinking about how to secure Pulsar. Is it possible to use a sidecar pattern with Istio in K8s to do mTLS? Has anyone done this, or is it best to use the built-in TLS? ---- 2020-03-09 15:50:15 UTC - Matteo Merli: You can restart the consumer application with a new regex ---- 2020-03-09 15:50:46 UTC - Atif: Yes, our team has been working to see if this is possible ---- 2020-03-09 15:51:31 UTC - Atif: We got it to work, however we need a few fixes that are only available in the recently released Istio 1.5.0 ---- 2020-03-09 15:52:52 UTC - Andy Papia: cool ---- 2020-03-09 15:52:53 UTC - Atif: I can ask folks to share our setup here @Andy Papia ---- 2020-03-09 15:53:08 UTC - Andy Papia: That would be great. I'd love to see an example setup. ---- 2020-03-09 15:54:57 UTC - Sijie Guo: The cluster “local” isn’t in the cluster list of public/default ---- 2020-03-09 15:55:40 UTC - Sijie Guo: Did you configure the right cluster name? Are you using local or pulsarbm as the cluster name? ---- 2020-03-09 15:56:00 UTC - Ian: I'm looking at using Pulsar for a scenario where an application would be processing messages which correspond to thousands of customers. It seems like having 1 topic per customer would require either thousands of producers, or creating a new producer every time a message is to be sent. Neither seems optimal. 
Is there something I'm missing, or in this case would it be better to not have a separate topic per customer, so that only one producer is required? ---- 2020-03-09 16:00:01 UTC - Manuel Mueller: could you elaborate on why you would try it this way? Even though Pulsar can handle a lot of topics - I do not really understand the need for it. ---- 2020-03-09 16:06:02 UTC - Ian: I was thinking it may be useful for per-customer configuration and monitoring, but it does seem that the added complexity would outweigh the benefit. ---- 2020-03-09 16:07:26 UTC - Manuel Mueller: I would agree - currently I am investigating the same concept due to GDPR compliance. Depending on your use case, the added complexity and potential for failure get higher ---- 2020-03-09 16:13:08 UTC - Ian: Yes, I can see how it would be useful for GDPR compliance. ---- 2020-03-09 16:14:41 UTC - Ian: Keeping a map of producers in memory might work fine, but I'm not sure what the costs of this would be. ---- 2020-03-09 16:34:39 UTC - Andres Riofrio: Hi! I’m using pulsar-client to access IoT metrics from our provider (Tuya) and I seem to have misunderstood how failover subscriptions work with partitioned topics. I have two consumers (A and B) running on the same subscription to a partitioned topic. Message X was sent to a partition. Consumer A received message X and crashed without acknowledging it. It restarted. Consumer B also received it and restarted without acknowledging it. Then message Y was sent to the same partition by the same producer. Consumer A spun back up and received message X. It again crashed without acknowledging it. However, before Consumer A had exited, Consumer B received message Y. About 10 seconds later, it received message X. This breaks the assumption my code makes that consumers always receive messages in the same partition in order. I expected that Consumer B wouldn’t receive message Y (or any messages for that partition) until Consumer A had exited. 
The documentation for `SubscriptionType.Failover` says “On each partition, at most one consumer will be active at a given point in time.” Furthermore, I expected that B would receive message X before Y, because both were unacknowledged and X was produced before Y. What am I missing? I would appreciate any insights! ---- 2020-03-09 17:10:37 UTC - Atif: Interesting, I've been thinking about this as well. 1. With the above approach the issue becomes that there is a lot of overhead, with very minimal data flowing through per topic and a large number of producers per individual. 2. Another approach I can think of is moving this complexity to the processing side, i.e. encrypt the data per customer ID, which acts as a key, into a single Pulsar topic. Whenever data needs to be decrypted, every key corresponds to an encryption/decryption key in a lookup-table sort of service that is fetched during processing. Producers have access to this lookup service and can encrypt data while writing to the topic. Right to forget simply translates to deleting the key from the lookup table? ---- 2020-03-09 17:51:30 UTC - Bobby: when attempting to set up a new environment in pulsar-manager, I'm getting an error saying "broker is exist" ---- 2020-03-09 17:51:35 UTC - Bobby: do we have any ideas what could cause that? 
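[Editor's note] The per-customer-key idea Atif describes is usually called crypto-shredding. A minimal, illustrative Java sketch (a toy, not a Pulsar API: the key store is an in-memory map, and a real system would use AES/GCM with proper IVs, key rotation, and a hardened key service): "right to be forgotten" becomes dropping the key, after which that customer's messages already written to the topic are unrecoverable.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.HashMap;
import java.util.Map;

// Toy lookup service: customer ID -> AES key.
public class CryptoShredding {
    private final Map<String, SecretKey> keys = new HashMap<>();

    SecretKey keyFor(String customerId) throws Exception {
        SecretKey k = keys.get(customerId);
        if (k == null) {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            k = gen.generateKey();
            keys.put(customerId, k);
        }
        return k;
    }

    // Producers would call this before writing the payload to the shared topic.
    byte[] encrypt(String customerId, byte[] payload) throws Exception {
        Cipher c = Cipher.getInstance("AES");
        c.init(Cipher.ENCRYPT_MODE, keyFor(customerId));
        return c.doFinal(payload);
    }

    // Consumers fetch the key during processing; a shredded key means the
    // ciphertext still sitting in the topic can never be read again.
    byte[] decrypt(String customerId, byte[] payload) throws Exception {
        SecretKey k = keys.get(customerId);
        if (k == null) throw new IllegalStateException("key shredded: data unrecoverable");
        Cipher c = Cipher.getInstance("AES");
        c.init(Cipher.DECRYPT_MODE, k);
        return c.doFinal(payload);
    }

    // "Right to forget": delete only the key, not the topic data.
    void forget(String customerId) { keys.remove(customerId); }
}
```

This keeps one producer and one topic while still allowing per-customer erasure, at the cost of running and securing the key-lookup service.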
---- 2020-03-09 17:52:06 UTC - Pradeesh: ---- 2020-03-09 17:52:58 UTC - Pradeesh: @Roman Popenov @Sijie Guo ^^ ---- 2020-03-09 17:53:13 UTC - Pradeesh: error says “broker is exist” ---- 2020-03-09 17:53:17 UTC - Pradeesh: not sure what that means ---- 2020-03-09 17:53:33 UTC - Roman Popenov: You can try checking what the service URL is with ```bin/pulsar-admin clusters get ${cluster}``` ---- 2020-03-09 17:53:42 UTC - Pradeesh: ok ---- 2020-03-09 17:53:55 UTC - Roman Popenov: And pass that URL ---- 2020-03-09 17:54:37 UTC - Roman Popenov: ``` /pulsar/bin/pulsar-admin clusters list -> this will return the clusters /pulsar/bin/pulsar-admin clusters get my-dev-cluster -> to see the URLs``` ---- 2020-03-09 17:55:15 UTC - Pradeesh: ```# bin/pulsar-admin clusters get pulsar-dev-dev { "serviceUrl" : "<http://172.27.223.166:8080/>", "brokerServiceUrl" : "<pulsar://172.27.223.166:6650/>"``` ---- 2020-03-09 17:55:35 UTC - Pradeesh: so my service URL is correct then ---- 2020-03-09 17:56:22 UTC - Pradeesh: when I try it, it says “environment is exist” ---- 2020-03-09 17:57:40 UTC - Roman Popenov: oh, refresh the page ---- 2020-03-09 17:58:32 UTC - Pradeesh: refresh worked …thanks ---- 2020-03-09 17:58:39 UTC - Pradeesh: haha ---- 2020-03-09 17:58:41 UTC - Roman Popenov: Yeah, there is a bug ---- 2020-03-09 17:59:10 UTC - Roman Popenov: When you put in the wrong URL - it creates everything, but doesn’t close the popup ---- 2020-03-09 18:01:02 UTC - Pradeesh: does the manager really provide any value? ---- 2020-03-09 18:01:27 UTC - Pradeesh: does it show us live metrics or data on different topics - throughput etc.? ---- 2020-03-09 18:02:02 UTC - Roman Popenov: I think it’s configurable and you can set the refresh, but I found that there were some discrepancies and didn’t really dig too deep ---- 2020-03-09 18:02:55 UTC - Roman Popenov: > does the manager really provide any value It is what it is ---- 2020-03-09 18:26:10 UTC - Sijie Guo: the manager is mainly a management console. 
it does basic stats collection (throughput by topics, namespaces and tenants) and provides a way to manage your clusters via a GUI. If you are looking to visualize the metrics, Prometheus + Grafana is the preferable solution. If you are using Datadog, you can send the metrics to Datadog to visualize them as well. ---- 2020-03-09 19:11:14 UTC - Andy Papia: I'm deploying Pulsar for the first time on Kubernetes with Helm. Does anyone know why the chart makes such large resource reservations for ZooKeeper? ``` resources: requests: memory: 15Gi cpu: 4``` ---- 2020-03-09 19:11:40 UTC - Andy Papia: I would think that the load on ZK would be pretty light. ---- 2020-03-09 19:13:12 UTC - Roman Popenov: Yeah, those are pretty random ---- 2020-03-09 19:13:52 UTC - Andy Papia: ok. I'll try smaller values. Just wanted to sanity check. ---- 2020-03-09 19:15:16 UTC - Roman Popenov: 1 CPU and 1Gi is more than enough for 150k messages/s and ample tenants with topics ---- 2020-03-09 19:41:32 UTC - Alexander Ursu: Hi, are there any examples anywhere of a JSON schema file? The only example I can find uses AVRO, but I dislike how all quotes have to be escaped. Are JSON schema files any better in comparison? ---- 2020-03-09 20:12:52 UTC - Andy Papia: Doing some benchmarking on AWS. Does it make sense to run the bookies on i3 (storage optimized) instances? What about the brokers? If I understand correctly, the brokers are stateless, so would m5 (general purpose) or c5 (compute optimized) be best? ---- 2020-03-09 20:25:42 UTC - Sijie Guo: you can check out this <http://pulsar.apache.org/docs/en/schema-manage/#upload-a-schema> if the schema is a struct schema, the `schema` definition is the JSON string of the schema definition written using the AVRO schema specification. ---- 2020-03-09 20:36:20 UTC - Greg Methvin: What usually causes publishes to time out? Is it possible it can be caused by a client-side issue? Let’s say I have something like 100k simultaneous publishes happening at once. 
Is it possible that could cause the publishes to time out on the client side? ---- 2020-03-09 20:58:10 UTC - Sijie Guo: for bookies, just make sure you have fast disks and allocate enough memory for both the JVM and the filesystem. so pick machines that have fast disks. for brokers, there are not many constraints. ---- 2020-03-09 20:58:24 UTC - Andy Papia: Bringing up Pulsar for the first time, but one of my bookkeepers is failing to start with this exception: ```20:52:38.624 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4 bookieHost: "100.96.14.4:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ] is not matching with [4 bookieHost: "100.96.11.2:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ]``` ---- 2020-03-09 20:59:03 UTC - Andy Papia: :thumbsup: ---- 2020-03-09 21:04:51 UTC - Sijie Guo: timeouts can happen if the client reaches the max-pending threshold. When the queue is full, a client configured with blockIfQueueFull will wait for previous outstanding requests to come back. If the previous requests don’t come back, the requests blocking on the full queue will fail with a send timeout once they have waited longer than the send timeout. And the reason the previous requests don’t come back is usually increased latency or reaching the disk bandwidth limit on the bookies. But you might have to look into bookie metrics to understand more. If you send 100k messages simultaneously (depending on what your send timeout is), there is a chance that your requests timed out because the pending queue was full. ---- 2020-03-09 21:05:39 UTC - Sijie Guo: the bookieHost changed upon restart. Are you using containers? If you are using containers, you need to use a stable name for the advertised address. 
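[Editor's note] The blockIfQueueFull behaviour described above can be illustrated with a plain bounded queue. This is a simplified model, not the Pulsar client's actual code: when the pending queue is full and earlier sends are not drained in time, a bounded wait gives up, which is the client-side analogue of a publish timing out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class PendingQueueDemo {
    // Try to enqueue a "send": block up to sendTimeoutMs waiting for room,
    // then give up -- the analogue of a publish timing out client-side.
    static boolean trySend(BlockingQueue<String> pending, String msg, long sendTimeoutMs)
            throws InterruptedException {
        return pending.offer(msg, sendTimeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> pending = new ArrayBlockingQueue<>(2); // like maxPendingMessages = 2
        System.out.println(trySend(pending, "m1", 100)); // true: room available
        System.out.println(trySend(pending, "m2", 100)); // true: queue now full
        // Nothing drains the queue here (no broker acks), so this waits
        // the full timeout and then fails:
        System.out.println(trySend(pending, "m3", 100)); // false: timed out
    }
}
```

In the real client the queue drains as the broker/bookies acknowledge writes, which is why bookie-side latency or disk bandwidth limits surface as client-side send timeouts.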
---- 2020-03-09 21:07:22 UTC - Sijie Guo: the bookie performs very strict verification of its environment and configuration to ensure consistency. so it keeps a cookie with the system settings (e.g. bookie host, directories, etc.). The cookie is kept locally and in ZooKeeper, and it is checked each time the bookie restarts. so you need to make sure those settings are not changed across bookie restarts. ---- 2020-03-09 21:29:07 UTC - Greg Methvin: thanks, I think that makes sense ---- 2020-03-09 21:30:15 UTC - Greg Methvin: we also set blockIfQueueFull to match our previous setup with RabbitMQ, so this makes sense ---- 2020-03-09 21:31:28 UTC - Greg Methvin: and we saw high thread usage during this situation as well ---- 2020-03-09 22:12:28 UTC - Bobby: does anyone know why this wouldn't work? ---- 2020-03-09 22:12:29 UTC - Sijie Guo: I see. You can also check your BookKeeper dashboard to see if there is increasing add latency or journal latency. Those are good resources for insight as well. ---- 2020-03-09 22:12:35 UTC - Bobby: ```COSML-9841305:~/viper_repos/hewish$ go get -u <http://github.com/apache/pulsar/pulsar-client-go/pulsar|github.com/apache/pulsar/pulsar-client-go/pulsar> # <http://github.com/apache/pulsar/pulsar-client-go/pulsar|github.com/apache/pulsar/pulsar-client-go/pulsar> In file included from ../../go/src/github.com/apache/pulsar/pulsar-client-go/pulsar/c_client.go:24: ./c_go_pulsar.h:22:10: fatal error: 'pulsar/c/client.h' file not found #include <pulsar/c/client.h> ^~~~~~~~~~~~~~~~~~~ 1 error generated.``` ---- 2020-03-09 22:12:49 UTC - Bobby: did the repo move? ---- 2020-03-09 22:14:11 UTC - Sijie Guo: you are using a c-go wrapper, so you need to make sure you have installed the C/C++ library locally. 
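[Editor's note] Sijie's cookie explanation can be sketched as a simplified model (not BookKeeper's actual classes, which also involve ZooKeeper storage and versioning): on startup the bookie compares the identity it stored on a previous boot against the one it would generate now, and refuses to start on any mismatch. This is exactly why an IP-based bookieHost breaks across pod restarts.

```java
import java.util.Objects;

// Simplified model of BookKeeper's startup cookie check.
public class CookieCheck {
    static final class Cookie {
        final String bookieHost, journalDir, ledgerDirs, instanceId;
        Cookie(String host, String journal, String ledgers, String instance) {
            bookieHost = host; journalDir = journal; ledgerDirs = ledgers; instanceId = instance;
        }
        boolean matches(Cookie other) {
            return Objects.equals(bookieHost, other.bookieHost)
                && Objects.equals(journalDir, other.journalDir)
                && Objects.equals(ledgerDirs, other.ledgerDirs)
                && Objects.equals(instanceId, other.instanceId);
        }
    }

    static void verifyOnStartup(Cookie stored, Cookie current) {
        if (!stored.matches(current)) {
            // An IP-based bookieHost changes when a pod is rescheduled,
            // while the stored cookie does not -- so startup fails here.
            throw new IllegalStateException("InvalidCookie: " + stored.bookieHost
                + " is not matching with " + current.bookieHost);
        }
    }
}
```

With a stable hostname (or after clearing stale data, as in the thread below), the stored and regenerated cookies agree and the check passes.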
---- 2020-03-09 22:14:41 UTC - Bobby: ok thanks ---- 2020-03-09 22:18:44 UTC - Sijie Guo: or you can consider using the native go client - <https://github.com/apache/pulsar-client-go> ---- 2020-03-09 22:22:58 UTC - Bobby: i got a few errors after installing libpulsar, but it appears to work ---- 2020-03-09 22:23:00 UTC - Bobby: thank you ---- 2020-03-09 22:56:30 UTC - Sijie Guo: you are welcome ---- 2020-03-10 00:32:43 UTC - Andy Papia: ok I'm using the helm chart and running containers on Kubernetes. not sure how to keep a stable name but I'll look into it. thanks for the context. ---- 2020-03-10 00:39:32 UTC - Sijie Guo: you can use hostName. ---- 2020-03-10 00:39:51 UTC - Sijie Guo: In the bookkeeper config map, you can configure useHostname so that bookkeeper pods can use hostname as the bookkeeper identifier. ---- 2020-03-10 00:41:21 UTC - Andy Papia: nice I'll try that ---- 2020-03-10 01:07:59 UTC - Andy Papia: hmm I've added that to the configmap and did a new install from helm but I get the same error: ```01:07:05.863 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4 bookieHost: "100.96.13.6:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ] is not matching with [4 bookieHost: "100.96.11.2:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ]``` ---- 2020-03-10 01:09:34 UTC - Andy Papia: This is the configmap: ```PULSAR_PREFIX_useHostNameAsBookieID: "true"``` ---- 2020-03-10 01:23:43 UTC - Sijie Guo: Remove PULSAR_PREFIX_ ---- 2020-03-10 01:23:49 UTC - Andy Papia: hmm wasn't sure if PULSAR_PREFIX was needed so I removed it ---- 2020-03-10 01:23:58 UTC - Andy Papia: but I'm still getting this: ```01:21:19.922 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie 
server org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie [4 bookieHost: "my-pulsar-bookkeeper-0.my-pulsar-bookkeeper.default.svc.cluster.local:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ] is not matching with [4 bookieHost: "100.96.11.2:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "8991dd28-e0fa-4a2f-b233-9dc529d859aa" ]``` ---- 2020-03-10 01:24:36 UTC - Sijie Guo: So you already have data stored on the disks. ---- 2020-03-10 01:24:43 UTC - Andy Papia: so it seems to be using the hostname now but it doesn't match the cookie? ---- 2020-03-10 01:24:49 UTC - Andy Papia: ahh that makes sense. I'll delete the volumes. ---- 2020-03-10 01:24:51 UTC - Andy Papia: thanks! ---- 2020-03-10 01:24:59 UTC - Sijie Guo: No problem ---- 2020-03-10 04:25:08 UTC - Jeon.DeukJin: Hello, doesn't it provide ZSTD compression yet? ---- 2020-03-10 04:25:10 UTC - Jeon.DeukJin: <http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/CompressionType.html> ---- 2020-03-10 04:25:59 UTC - Jeon.DeukJin: but here it is provided: <http://pulsar.apache.org/api/client/2.5.0-SNAPSHOT/org/apache/pulsar/client/api/CompressionType.html> ---- 2020-03-10 04:27:33 UTC - Jeon.DeukJin: GitHub: <https://github.com/apache/pulsar/blob/master/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/CompressionType.java> ---- 2020-03-10 04:27:54 UTC - Jeon.DeukJin: `/** Compress with Zstandard codec. */` `ZSTD,` ---- 2020-03-10 04:50:28 UTC - Antti Kaikkonen: After I started using function state in my source connector, Pulsar standalone appears to randomly stop working, usually within a couple of minutes. I can't find any errors in the logs, but the CompletableFutures returned by getStateAsync never complete. Restarting the source connector doesn't help, but restarting Pulsar standalone does. Has anyone experienced anything similar? 
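[Editor's note] The stable-bookie-ID fix discussed above comes down to a single BookKeeper setting; a sketch of the bookkeeper.conf entry (note that, as the thread shows, the cookie mismatch persists until the stale data on the bookie volumes is also removed):

```
# bookkeeper.conf (or the corresponding bookkeeper ConfigMap entry)
useHostNameAsBookieID=true
```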
---- 2020-03-10 04:56:06 UTC - Antti Kaikkonen: I have also tried creating a duplicate of the source connector (with a different name and output topic) after the failure happens, but it also doesn't work until I restart Pulsar. ---- 2020-03-10 08:23:36 UTC - Ken Huang: Does the WebSocket API support authentication? ----
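[Editor's note] On the WebSocket question: the WebSocket API does go through Pulsar's pluggable authentication. A hedged sketch of token-based auth against the WebSocket producer endpoint (host, topic, and token are placeholders):

```
ws://broker:8080/ws/v2/producer/persistent/public/default/my-topic?token=<jwt-token>
```

The token can alternatively be sent as an `Authorization: Bearer <jwt-token>` header during the WebSocket handshake.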
