2020-09-17 09:42:58 UTC - charles: Anyone experiencing this kind of *TimeoutException* using the java client as well? Any solution direction? ---- 2020-09-17 12:11:06 UTC - dipali bhat: @dipali bhat has joined the channel ---- 2020-09-17 12:12:54 UTC - dipali bhat: Hi All, i am looking to deploy pulsar in kubernetes for production. i am not finding any reference document with detailed steps and yaml files ---- 2020-09-17 12:13:01 UTC - dipali bhat: any help please ---- 2020-09-17 12:15:05 UTC - dipali bhat: k great thanks Manuel ---- 2020-09-17 12:15:21 UTC - Manuel Mueller: I actually just mixed up the channel :smile: ---- 2020-09-17 12:15:33 UTC - dipali bhat: oh :slightly_smiling_face: ---- 2020-09-17 12:15:56 UTC - dipali bhat: let me follow the doc ... ---- 2020-09-17 12:16:17 UTC - dipali bhat: actually i have a 3 node kubernetes cluster ---- 2020-09-17 12:16:32 UTC - dipali bhat: and planning to deploy pulsar ---- 2020-09-17 12:16:40 UTC - Manuel Mueller: yea - you probably will not have access to this, since this will be a company-internal document ---- 2020-09-17 12:16:42 UTC - dipali bhat: i hope the yaml files are also provided ---- 2020-09-17 12:16:59 UTC - dipali bhat: oh ---- 2020-09-17 12:17:27 UTC - Manuel Mueller: but did you try to follow this guideline? <https://pulsar.apache.org/docs/en/deploy-kubernetes/> ---- 2020-09-17 12:17:52 UTC - dipali bhat: yes.. but i could not find the yaml files for reference ---- 2020-09-17 12:19:04 UTC - dipali bhat: if there is some good documentation that will help ---- 2020-09-17 12:22:41 UTC - Manuel Mueller: > The YAML resource definitions for Pulsar components can be found in the `kubernetes` folder of the <https://pulsar.apache.org/download|Pulsar source package>. 
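On charles's *TimeoutException* question at the top: with the Java client this typically means an operation (such as a send or lookup) was not acknowledged within the client's `operationTimeout` or the producer's `sendTimeout`, both of which can be raised on the respective builders (`PulsarClient.builder().operationTimeout(...)`, `ProducerBuilder.sendTimeout(...)`). Retrying with exponential backoff is a common complement; below is a minimal sketch of such a backoff schedule (the helper class and the numbers are illustrative, not from this thread):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class RetryBackoff {
    // Doubling backoff capped at `cap`; one entry per retry attempt.
    // A caller would sleep for schedule.get(i) after the i-th timed-out
    // send() before retrying (the Pulsar calls themselves are omitted).
    public static List<Duration> schedule(Duration base, Duration cap, int attempts) {
        List<Duration> out = new ArrayList<>();
        Duration d = base;
        for (int i = 0; i < attempts; i++) {
            out.add(d);
            d = d.multipliedBy(2);
            if (d.compareTo(cap) > 0) {
                d = cap;
            }
        }
        return out;
    }
}
```

Capping the delay keeps worst-case retry latency bounded while still backing off quickly from a struggling broker.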
---- 2020-09-17 12:23:30 UTC - Manuel Mueller: you can try with those as a reference - but most likely you will need to customize them a lot to fit your needs ---- 2020-09-17 12:23:57 UTC - Manuel Mueller: <https://pulsar.apache.org/docs/v2.0.0-rc1-incubating/deployment/Kubernetes/> ---- 2020-09-17 12:24:02 UTC - Manuel Mueller: i found this guide to be helpful ---- 2020-09-17 12:25:37 UTC - dipali bhat: i am following that too but the " The deployment method shown in this guide relies on <http://yaml.org/|YAML> definitions for Kubernetes <https://kubernetes.io/docs/resources-reference/v1.6/|resources>. The `kubernetes` subdirectory of the <https://pulsar.apache.org/download|Pulsar package> holds resource definitions for:" ---- 2020-09-17 12:25:58 UTC - dipali bhat: i downloaded and don't know which yaml to follow ---- 2020-09-17 12:26:16 UTC - dipali bhat: as there are many for zookeeper, bookie, broker etc ---- 2020-09-17 12:26:22 UTC - dipali bhat: so the confusion ---- 2020-09-17 12:36:49 UTC - dipali bhat: the document also talks about a `deployment/kubernetes/generic` folder. To begin, `cd` into the appropriate folder.... but the generic folder does not exist ---- 2020-09-17 12:36:53 UTC - dipali bhat: incomplete ---- 2020-09-17 12:36:58 UTC - dipali bhat: doc.. ---- 2020-09-17 12:47:19 UTC - Huanli Meng: @Huanli Meng set the channel topic: - Pulsar Summit Asia 2020 CFP is now open: <http://pulsar.apache.org/blog/2020/09/01/pulsar-summit-asia-2020-cfp/> - Pulsar Flink Connector 2.5.0 release blog <https://streamnative.io/blog/release/2020-09-17-pulsar-flink-connector-250> +1 : Frank Kelly ---- 2020-09-17 13:32:07 UTC - Helder Sousa: @Helder Sousa has joined the channel ---- 2020-09-17 13:36:22 UTC - Helder Sousa: Hi Team. How is `seek(MessageId)` implemented for partitions if Pulsar doesn't support it? <https://github.com/apache/pulsar/pull/7518> ---- 2020-09-17 13:52:09 UTC - Praveen Sannadi: Hello all. 
Recently I have been working on configuring pulsar manager with a custom Mysql DB. We deploy pulsar manager using a helm chart (referring to the Apache pulsar helm chart). So initially I thought that to configure the custom Mysql DB we just need to change the details below ```REDIRECT_HOST: "<http://127.0.0.1>" REDIRECT_PORT: "9527" DRIVER_CLASS_NAME: org.postgresql.Driver URL: jdbc:<postgresql://127.0.0.1:5432/pulsar_manager> LOG_LEVEL: DEBUG``` But when I looked more into the apache/pulsar-manager repo I found <https://github.com/apache/pulsar-manager/commit/c1cbe2688f2e286e619a7cca48343798ac388520#diff-660a4c1a5cbedaa6e2556dde19faeb2d> I think the MySql configuration was removed in this PR-190. I am not sure why, but can we have a workaround or a way to use a custom Mysql DB config with pulsar-manager? To be specific, using the apache pulsar helm chart. I am a really worried person now :pensive:. ---- 2020-09-17 13:56:51 UTC - Frank Kelly: These are the production Kubernetes deployment instructions we (roughly) follow using Helm <https://pulsar.apache.org/docs/en/helm-install/> ---- 2020-09-17 14:24:23 UTC - Marcus E: @Marcus E has joined the channel ---- 2020-09-17 14:34:11 UTC - Robin Custers: @Robin Custers has joined the channel ---- 2020-09-17 14:34:23 UTC - Brent Evans: We've introduced an ELB now and have our discovery being done by brokerUrls pointing to that ELB, however we're still seeing the same issue with 504s. These seem to pick up once we have data being published to Pulsar topics. While no data is being published to the topics, the 504s don't seem to be happening.
The only non-INFO messages in the Pulsar Proxy logs (these don't correlate to times our services are getting 504s): ```Sep 17 08:33:18 pulsar[21993]: 08:33:18.892 [pulsar-proxy-io-2-2] WARN org.apache.pulsar.proxy.server.LookupProxyHandler - [/172.16.3.41:21552] Failed to get TopicsOfNamespace public/tracking: org.apache.pulsar.client.api.PulsarClientException: Disconnected from server at <http://pulsar-broker-elb.elb.amazonaws.com/17.177.35.162:6650|pulsar-broker-elb.elb.amazonaws.com/17.177.35.162:6650> Sep 17 08:33:18 pulsar[21993]: 08:33:18.900 [pulsar-proxy-io-2-8] WARN org.apache.pulsar.common.protocol.PulsarHandler - [[id:0xc83d319c, L:/172.16.3.217:47504 - R:<http://pulsar-broker-elb.elb.amazonaws.com/17.177.80.246:6650|pulsar-broker-elb.elb.amazonaws.com/17.177.80.246:6650>]] Forcing connection to close after keep-alive timeout``` *Config:* ```# The ZooKeeper quorum connection string (as a comma-separated list) zookeeperServers= # Configuration store connection string (as a comma-separated list) configurationStoreServers= # if Service Discovery is Disabled this url should point to the discovery service provider. brokerServiceURL=<pulsar://pulsar-broker-elb.elb.amazonaws.com:6650> # These settings are unnecessary if `zookeeperServers` is specified brokerWebServiceURL=<http://pulsar-broker-elb.elb.amazonaws.com:8080>``` ---- 2020-09-17 14:40:50 UTC - Robin Custers: Hi all, I need some input on best practices related to topic organisation in apache pulsar. To give a bit of context: • we use pulsar as an event hub in a setup where we have different applications dealing with processing data of employers and employees. Each of these applications has its own responsibility (e.g. tax calculation, reporting,..). • Employers have no influence on each other • The order of events happened for an employer is important in the applications • There are about 40 000 employers in the system. 
They have a number of employees ranging from 1 to about 10 000. What would be the best way to set up our topics and why? • have one topic with partitions where we partition on the employer key for example. So for example: 1 topic and 3 partitions (we have 3 brokers) • have one topic per employer (and possibly a partitioned topic for large employers). In this case 40 000 topics which will be handled by the 3 brokers ---- 2020-09-17 16:11:00 UTC - Addison Higham: do you see error logs in your broker logs? ---- 2020-09-17 16:19:02 UTC - Addison Higham: apologies, I misread the original question, it doesn't look to be currently implemented in C++. it should be fairly easy to implement; if you wouldn't mind seeing if there is an open issue, and opening one if there isn't, that would be a good first step towards that ---- 2020-09-17 16:23:49 UTC - Addison Higham: in Pulsar, partitioned topics are composed of multiple topics under the hood, the Reader API works fine against those underlying topics, it just doesn't work for creating a reader against all the topics at once, like with the consumer API. You can create readers for each of the topic partitions without issue. Additionally, when you call `seek(MessageId)` that only really makes sense in the context of a single partition, as other topics won't have that message id ---- 2020-09-17 16:26:07 UTC - Addison Higham: I still believe MySQL should be supported, the default implementation was just changed. You will note that that commit is about making sure the SQL is more compliant instead of using custom features of mysql or postgres ---- 2020-09-17 16:42:26 UTC - Addison Higham: @Robin Custers With the data here, I don't think there is quite enough to give a recommendation, both are viable in Pulsar, but do have some trade-offs, which I can give you some more details around: - The biggest trade-off/design choice in these situations is often about your consumption patterns. 
if your consumers are always consuming the data for all customers, then doing a single partitioned topic makes more sense, as you don't need to have each consumer create 40k subscriptions to all the topics. If your consumers are primarily dealing with individual customers, then you will waste a lot of bandwidth as consumers receive the whole stream but only care about a small subset. If you do both patterns, then it may actually make sense to have both, with producers producing to customer topics, then perhaps a pulsar function that merges all the messages into one single topic, as that can help you handle both use cases - 40,000 topics is well within the realm of Pulsar to handle, but it may take a bit more resources relative to a single partitioned topic, specifically, you may need a slightly bigger zookeeper cluster or slightly more RAM and CPU for brokers, but that would be very slight compared to bigger questions about how many consumers are connecting to a given topic, how many producers, etc - handling more topics requires a bit more tuning (specifically, you want to increase the number of "bundles" in a namespace, that is a group of topics that are scheduled together, you may also need to tune some timeouts for when subscribing to thousands of topics at once), it can also be a bit more complicated in failure scenarios and take longer for recovery One thing to note though: you may want to consider more than 3 partitions even with only 3 brokers; it can be more efficient to have 2 or 3 partitions per broker than a single huge topic per broker. 
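To make the per-partition reader point above concrete: a partitioned topic's N internal topics follow Pulsar's `<topic>-partition-<i>` naming convention, and each of those names can be passed to `client.newReader().topic(...)` to read (and `seek`) a single partition. A small sketch that just derives those names (the topic name and partition count below are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionNames {
    // Internal per-partition topic names for a partitioned topic.
    // Each returned name can be handed to client.newReader().topic(...)
    // to read one partition; seek(MessageId) then targets that partition only.
    public static List<String> of(String topic, int partitions) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            names.add(topic + "-partition-" + i);
        }
        return names;
    }
}
```

For example, `PartitionNames.of("persistent://public/default/events", 3)` yields the three internal topics a 3-partition `events` topic is made of.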
---- 2020-09-17 16:51:45 UTC - Lari Hotari: > - handling more topics requires a bit more tuning (specifically, you want to increase the number of "bundles" in a namespace, that is a group of topics that are scheduled together, you may also need to tune some timeouts for when subscribing to thousands of topics at once), it can also be a bit more complicated in failure scenarios and take longer for recovery @Addison Higham interesting. is there some tuning guide available where it would be possible to gain the knowledge to do the required tuning? I have a use case where the solution would eventually be using 10k-100k topics and it will soon proceed to a phase where more load / performance tests would be performed, so that's why I'd like to know more about scaling Pulsar. ---- 2020-09-17 16:55:27 UTC - Helder Sousa: Thanks Addison. I understand what you are saying. Do you know of an example/test case that shows this working? I'm a bit confused by the docs and by the fact there is an open PR to implement that support: > <https://pulsar.apache.org/docs/en/concepts-clients/> > _Non-partitioned topics only_ > _The reader interface for Pulsar cannot currently be used with <https://pulsar.apache.org/docs/en/concepts-messaging#partitioned-topics|partitioned topics>._ ---- 2020-09-17 16:56:55 UTC - Lari Hotari: By now, I've found the OpenMessaging Benchmark and the config for Pulsar there: <https://github.com/openmessaging/openmessaging-benchmark/tree/master/driver-pulsar/deploy> . Looks like it has `defaultNumberOfNamespaceBundles=64` . I don't see timeouts tuning in the OMB Pulsar config. ---- 2020-09-17 16:58:25 UTC - Addison Higham: The most important one is the number of bundles by far. Bundles will naturally split or you can manually split them, but if you know you are using lots of topics, you can create a namespace with more bundles. For the timeouts, that has more to do with subscribing to thousands of topics at once with a multi-topic subscription. 
Let me check around for a better reference, I don't believe there is anything in the project's docs at the moment (but should be!) but there may be some details on this in some of the presentations +1 : Lari Hotari, charles ---- 2020-09-17 17:01:23 UTC - Lari Hotari: Thanks, any reference would be very helpful. I have a few on my watch list that I haven't seen yet. I'm planning to watch "StreamNative Webinar - How to Operate Pulsar in Production, Jul 28, 2020". ---- 2020-09-17 18:12:51 UTC - Yarden Arane: *Functions*: Subscriptions' Cursor Management Hi all, I was wondering if it is acceptable to modify the cursor of a function's active subscription from within the function itself? Example using the pulsar-admin api: ```public class CursorManagementFunction implements Function<String, Void> { PulsarAdmin admin; MessageId prevMessageId; @Override public Void process(String input, Context context) { String topic = context.getCurrentRecord().getTopicName(); String subName = context.getFunctionName(); admin.topics().resetCursor(topic, subName, prevMessageId); return null; } }``` Any red flags with such an approach? p.s. is there a way to access a function's subscription (within the function) without using the pulsar-admin api? ```public class CursorManagementFunction implements Function<String, Void> { MessageId prevMessageId; @Override public Void process(String input, Context context) { String topic = context.getCurrentRecord().getTopicName(); // pseudo code: getSubscription().resetCursor(topic, prevMessageId) return null; } }``` ---- 2020-09-17 22:21:04 UTC - el akroudi abdessamad: Hello, ---- 2020-09-17 22:21:55 UTC - el akroudi abdessamad: i set up 6 vms, 3 for bookkeeper and 3 for zookeeper, but when i try to start bookkeeper i get this exception ---- 2020-09-17 22:21:57 UTC - el akroudi abdessamad: `22:07:47.648 [main] ERROR org.apache.bookkeeper.bookie.Bookie - There are directories without a cookie, and this is neither a new environment, nor is storage expansion enabled. 
Empty directories are [data/bookkeeper/journal/current, data/bookkeeper/ledgers/current]` `22:07:47.650 [main] INFO org.apache.bookkeeper.proto.BookieNettyServer - Shutting down BookieNettyServer` `22:07:47.668 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server` `org.apache.bookkeeper.bookie.BookieException$InvalidCookieException:` `at org.apache.bookkeeper.bookie.Bookie.checkEnvironmentWithStorageExpansion(Bookie.java:468) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.bookie.Bookie.checkEnvironment(Bookie.java:250) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:688) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:136) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:105) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:41) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:301) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.server.Main.doMain(Main.java:221) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.server.Main.main(Main.java:203) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` `at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:313) [org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]` ---- 2020-09-17 22:22:49 UTC - el akroudi abdessamad: when i did bin/bookkeeper shell listbookies -rw i see only one bookie ---- 2020-09-17 22:23:01 UTC - el akroudi abdessamad: `22:11:13.415 [main-EventThread] INFO org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client 
is connected now.` `ReadWrite Bookies :` `10.10.10.11(broker1.pulsar):3181` ---- 2020-09-17 22:23:19 UTC - el akroudi abdessamad: any idea to resolve this configuration issue i guess ---- 2020-09-17 22:23:24 UTC - el akroudi abdessamad: thank you in advance ---- 2020-09-18 05:23:40 UTC - Brent Evans: No errors in the broker logs, just warnings: ```Sep 17 08:33:36 ip-XXX.eu-south-1.compute.internal pulsar[4613]: 08:33:36.053 [pulsar-io-23-2] WARN org.apache.pulsar.common.protocol.PulsarHandler - [[id: 0x1eec5052, L:/197.16.2.202:6650 - R:/197.16.1.254:5770]] Forcing connection to close after keep-alive timeout Sep 17 20:46:42 ip-XXX.eu-south-1.compute.internal pulsar[4604]: 20:46:42.556 [BookKeeperClientWorker-OrderedExecutor-5-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [197.16.2.174:3181, 197.16.1.50:3181] is not adhering to Placement Policy. quarantinedBookies: [] Sep 17 22:12:35 ip-XXX.eu-south-1.compute.internal pulsar[4604]: 22:12:35.058 [main-EventThread] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [197.16.3.124:3181, 197.16.1.50:3181] is not adhering to Placement Policy. quarantinedBookies: []``` ---- 2020-09-18 05:49:36 UTC - Rahul Vashishth: <https://github.com/apache/pulsar-helm-chart> ---- 2020-09-18 05:51:01 UTC - Rahul Vashishth: Alternatively there are two more charts you might find interesting for your purpose <https://github.com/kafkaesque-io/pulsar-helm-chart> <https://github.com/streamnative/charts> ---- 2020-09-18 06:35:20 UTC - Will Wong: @Will Wong has joined the channel ---- 2020-09-18 06:42:13 UTC - Linton: Hi, I’m relatively new to pulsar. Does anyone know if non-persistent topics can be configured to auto-expire/be automatically deleted after some period of time or if the consumers/producers have both disconnected? ---- 2020-09-18 06:42:37 UTC - Linton: (not sure if this is the right channel for this question, apologies in advance) ---- 2020-09-18 07:03:09 UTC - Enrico: Thanks so much. 
Do you have any tips to make consumers faster? because they look like turtles. how to speed them up? ---- 2020-09-18 07:15:33 UTC - Sankararao Routhu: Hi, I am testing async replication using global zookeeper, and I see messages are getting replicated (hopping) between the clusters indefinitely. A message published in west is replicated to east and then from east again replicated back to west and so on... It never stops. Topic stats show backlogSize and storageSize increasing (though I published only one message and stopped my publisher). The moment I attach a consumer, the backlogSize comes down to 0. But I didn't receive any messages on my consumer (as the message was already consumed earlier). Am I missing any configuration? ---- 2020-09-18 07:48:21 UTC - Robin Custers: alright thanks for the answers ---- 2020-09-18 08:04:40 UTC - Praveen Sannadi: Thanks for your reply on this Addison. Your reply made me :grinning:. So I can still pass the Mysql details in the above variables of the apache helm chart, which will in turn point the pulsar manager DB to the custom Mysql, right? Or do we need to do any other specific configurations? I am trying it with a local pulsar manager (on docker) now. But I am looking for the apache pulsar chart configurations as well. ---- 2020-09-18 08:17:02 UTC - Enrico: Hi, i have a little problem testing pulsar: if i use only one consumer i consume for example 100k msg/s with 90Mbit/s, but if i use 2 consumers in shared mode, each consumer consumes 50k msg/s with 45 Mbit/s. why? my producer sends messages with 200Mbit/s, so it's not a slow producer problem. If i use 2 consumers in Exclusive mode, each consumer consumes 100k msg/s with 90Mbit/s, but they get the same messages so this does not solve my problem ---- 2020-09-18 08:17:11 UTC - Enrico: Is it broker config? ----
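On Enrico's shared-mode observation: that split is expected. A Shared subscription fans one dispatch stream out across its consumers (roughly round-robin), so two consumers each see about half the per-topic rate; aggregate throughput is bounded by the topic's dispatch rate, not by the consumer count. Typical client-side knobs are `receiverQueueSize` on the consumer builder and consuming via a `MessageListener` or `receiveAsync` instead of blocking `receive()`. A toy round-robin split, just to illustrate the arithmetic (the class and numbers are made up, not Pulsar internals):

```java
public class SharedDispatch {
    // Round-robin `total` messages across `consumers` receivers,
    // mirroring how a Shared subscription fans out one dispatch stream.
    public static int[] split(int total, int consumers) {
        int[] counts = new int[consumers];
        for (int i = 0; i < total; i++) {
            counts[i % consumers]++;
        }
        return counts;
    }
}
```

So `split(100_000, 2)` gives each consumer ~50k msg/s, matching what Enrico measured; to raise per-consumer rates, the single topic's dispatch itself has to go faster (or the load has to be spread over more partitions).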