2018-12-10 09:43:59 UTC - Maarten Tielemans: morning all. looking at "scaling" my single node Pulsar setup to a multi node setup, however most documentation I find immediately speaks about 6 nodes/VMs. would it also be possible currently to deploy Pulsar on 2 i3.xlarge nodes? (and have zookeeper, bookkeeper and pulsar run on each) ---- 2018-12-10 10:03:56 UTC - Sijie Guo: @Maarten Tielemans: I just discussed this with @richardliu above - you can start by deploying Pulsar to one node (or a small number of nodes) and expand later. I made some changes to the deployment documentation. <https://github.com/apache/pulsar/pull/3152/files> ---- 2018-12-10 10:24:30 UTC - Maarten Tielemans: Thanks @Sijie Guo ---- 2018-12-10 10:41:59 UTC - Christophe Bornet: OK. So the fallback can only be done manually? ---- 2018-12-10 11:01:37 UTC - David Tinker: What happens if `Consumer.acknowledgeAsync()` (Java client) fails? Does it get retried or something? Or should I handle that myself somehow? ---- 2018-12-10 11:04:11 UTC - David Tinker: It would be nice if there was more Apache Pulsar on Stack Overflow. I could go post my question there if you like? It would probably be good for the project if that was the preferred way to ask questions, with a link then posted in this channel. +1 : jia zhai, Sijie Guo ---- 2018-12-10 11:06:56 UTC - jia zhai: @David Tinker You may need to handle that yourself. There is currently no ack status kept. ---- 2018-12-10 11:08:21 UTC - David Tinker: Tx. Should I retry a few times and toss my consumer and re-connect if that doesn't work? ---- 2018-12-10 11:12:39 UTC - jia zhai: usually, if a message is not acked successfully, it will get redelivered to the consumer after the ack timeout ---- 2018-12-10 11:14:07 UTC - Sijie Guo: I think Stack Overflow is also preferred. That would also generally be good for sharing knowledge. People from the community are monitoring Stack Overflow as well ---- 2018-12-10 11:14:34 UTC - David Tinker: Ok.
So it is probably sufficient to just consider async acked messages to be acked immediately, as they will be re-delivered later in any case. I am counting "messages in flight" for flow control purposes. ---- 2018-12-10 11:16:36 UTC - Ivan Kelly: I'd use the debezium connector rather than rolling my own solution for bringing data from mysql to pulsar. I think it's available in master now, so it will be in a release in the next month or so ---- 2018-12-10 11:17:00 UTC - David Tinker: <https://stackoverflow.com/questions/53704514/how-should-apache-pulsar-consumer-acknowledgeasync-failure-be-handled> ---- 2018-12-10 12:20:23 UTC - Maarten Tielemans: If you were to use multiple bookies, how many of them would need to ack a produced message before a consumer would receive it? ---- 2018-12-10 12:24:56 UTC - Sijie Guo: @Maarten Tielemans :
it is configurable. you can configure it in `conf/broker.conf`:
```
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=<replicas>

# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=<replicas>

# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=<replicas>
```
you can also configure the replication settings per namespace via `bin/pulsar-admin namespaces set-persistence` ---- 2018-12-10 13:24:05 UTC - Ezequiel Lovelle: @Ezequiel Lovelle has joined the channel ---- 2018-12-10 13:33:12 UTC - Christophe Bornet: The `unload` command does indeed work. @Matteo Merli Can you give more info on what this command does internally? Is it safe to execute periodically, e.g. in a cron, to ensure an automatic fallback after some time? ---- 2018-12-10 14:31:13 UTC - Samuel Sun: <https://builds.apache.org/job/pulsar_precommit_java8/5237/console> ---- 2018-12-10 14:31:45 UTC - Samuel Sun: can I rerun this Jenkins job? It could have failed due to other reasons, not the PR itself.
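The `acknowledgeAsync()` handling discussed earlier (retry a few times, otherwise rely on redelivery after the ack timeout) can be sketched roughly like this. This is a hypothetical stdlib-only helper, not Pulsar client code; `ackOnce` stands in for a call such as `() -> consumer.acknowledgeAsync(msgId)`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch: retry an async ack a few times; if every attempt
// fails, give up and let the ack timeout trigger redelivery, as described
// in the discussion above.
class AckRetry {
    static CompletableFuture<Boolean> ackWithRetries(Supplier<CompletableFuture<Void>> ackOnce,
                                                     int retriesLeft) {
        return ackOnce.get()
                .handle((v, t) -> t == null)          // true if this attempt succeeded
                .thenCompose(ok -> {
                    if (ok || retriesLeft <= 0) {
                        // Acked, or out of retries: rely on redelivery after the ack timeout.
                        return CompletableFuture.completedFuture(ok);
                    }
                    return ackWithRetries(ackOnce, retriesLeft - 1);
                });
    }
}
```

If all attempts fail, the consumer simply keeps the message counted as in flight until redelivery, which matches the "consider it acked and let it be redelivered" approach above.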
---- 2018-12-10 14:32:06 UTC - Samuel Sun: <https://github.com/apache/pulsar/pull/3151> ---- 2018-12-10 14:41:49 UTC - Matteo Merli: You can comment `run java8 tests` to have Jenkins start again ---- 2018-12-10 14:42:14 UTC - Matteo Merli: On the PR itself ---- 2018-12-10 14:42:53 UTC - Samuel Sun: sure ---- 2018-12-10 14:43:25 UTC - Samuel Sun: nice ---- 2018-12-10 14:44:41 UTC - Maarten Tielemans: Following the deploy on bare metal guide (<https://pulsar.apache.org/docs/en/deploy-bare-metal/>), with the change that I try to run zookeeper, bookkeeper and pulsar on the same node, I started two nodes/instances of zookeeper and initialised the cluster metadata. However, when I try to start bookkeeper I receive the following error ---- 2018-12-10 14:44:50 UTC - Maarten Tielemans: ---- 2018-12-10 14:48:12 UTC - Matteo Merli: Unload will trigger the current brokers to do a graceful close of the topics and then release the ownership. The topic will be automatically reassigned to a new broker based on load and current constraints. The only downside of it is the latency blip perceived by clients during the failover ---- 2018-12-10 14:53:21 UTC - Christophe Bornet: So what is your recommendation? Should we monitor for failover and, when primary brokers come back to life, ask for an unload? Or maybe do an automatic unload each time we detect a broker does an inactive to active transition? ---- 2018-12-10 14:53:59 UTC - Christophe Bornet: Shouldn't Pulsar do it by itself ideally? ---- 2018-12-10 15:04:08 UTC - Grégory Guichard: Hi, is there a limit on concurrent connections on a Pulsar broker? My broker doesn't accept new connections after 10 000 ---- 2018-12-10 15:23:44 UTC - Rohit Rajan: @Rohit Rajan has joined the channel ---- 2018-12-10 16:18:07 UTC - Mike Card: @Matteo Merli Have you guys ever run a test like this on pulsar, i.e.
two parallel producers calling the synchronous send() API as fast as possible, both publishing to the same partitioned topic (in my test there were 48 partitions) which is being consumed downstream by 2 tasks each running a shared subscription to consume the topic, each using synchronous receives? ---- 2018-12-10 17:08:01 UTC - Matteo Merli: @Grégory Guichard There is no artificial limit. Have you checked the OS file-descriptors limit for the process? ---- 2018-12-10 17:14:52 UTC - Matteo Merli: What is your use case for isolation exactly? The case for primary/secondary was a bit complicated to begin with. In general, at Yahoo we have a few namespaces isolated to a subset of brokers (set as the “primary”), with fallback to the general pool (secondary set to “.*“) in case all the brokers from the primary were unavailable ---- 2018-12-10 17:16:50 UTC - Christophe Bornet: I'm testing rack aware placement and seeing unexpected behavior. I publish on a topic with E=2,Qw=2,Qa=2 (standard config). I have 2 bookies on rack "eu" and 2 on rack "us". When I start publishing, one bookie of each rack gets used. When I stopped a bookie from "eu", I expected it would have been replaced by another bookie of the same rack, but a bookie from "us" got used instead. Am I missing something? ---- 2018-12-10 17:17:57 UTC - Matteo Merli: Can you check in the broker logs if the rack info has been picked up correctly? ---- 2018-12-10 17:18:34 UTC - Matteo Merli: It should print something there when it gets the bookies' racks ---- 2018-12-10 17:23:28 UTC - Christophe Bornet:
```
bin/pulsar-admin --admin-url <http://pulsar1-eu:8080> bookies racks-placement
{
  "default" : {
    "bk1-eu" : { "rack" : "eu" },
    "bk2-eu" : { "rack" : "eu" },
    "bk1-us" : { "rack" : "us" },
    "bk2-us" : { "rack" : "us" },
    "pulsar1-eu" : { "rack" : "eu" },
    "pulsar2-eu" : { "rack" : "eu" }
  }
}
```
---- 2018-12-10 17:23:37 UTC - Christophe Bornet: Is that the info ?
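As the rest of this thread establishes, the keys in a racks-placement mapping like the one above have to match the exact address each bookie advertises, typically `host:port`. A hypothetical stdlib-only helper (not Pulsar code) to normalize such a bookie-to-rack map, assuming the default bookie port 3181:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: the rack-aware policy matches rack keys against the
// advertised "host:port" bookie address, so an entry like "bk1-eu" (no port)
// is never matched. Append the default port where it is missing.
class RackPlacement {
    static Map<String, String> withPorts(Map<String, String> bookieToRack, int defaultPort) {
        Map<String, String> out = new TreeMap<>();
        bookieToRack.forEach((bookie, rack) ->
                out.put(bookie.contains(":") ? bookie : bookie + ":" + defaultPort, rack));
        return out;
    }
}
```

The port (3181 here) is an assumption; it must match whatever your bookies actually advertise.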
---- 2018-12-10 17:24:14 UTC - Matteo Merli: In broker logs, it should print info regarding the racks for each bookie ---- 2018-12-10 17:25:16 UTC - Christophe Bornet: OK. That's DEBUG info? ---- 2018-12-10 17:25:20 UTC - Matteo Merli: In any case, I think the bookie address should include the port as well: `bk1-eu:3181` And make sure `bk1-eu` is the same address advertised by the bookies ---- 2018-12-10 17:25:56 UTC - Christophe Bornet: oh ! probably ! ---- 2018-12-10 17:26:28 UTC - Christophe Bornet: I'm using docker images to run the cluster ---- 2018-12-10 17:26:35 UTC - Christophe Bornet: with docker compose ---- 2018-12-10 17:26:59 UTC - Matteo Merli: I see, in any case try adding the port as well ---- 2018-12-10 17:27:13 UTC - Christophe Bornet: how do I activate the logs for the broker rack ? ---- 2018-12-10 17:28:12 UTC - Matteo Merli: It would be automatically printed as info logs ---- 2018-12-10 17:29:07 UTC - Matteo Merli: when a broker discovers bookies there will be one line about that.
if it refers to “/default” rack … it means it has not picked up the configured rack info ---- 2018-12-10 17:29:16 UTC - Matteo Merli: Take a look at this unit test: <https://github.com/apache/pulsar/blob/master/pulsar-zookeeper-utils/src/test/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMappingTest.java#L72> ---- 2018-12-10 17:32:49 UTC - Christophe Bornet: Indeed the port is missing ---- 2018-12-10 17:33:18 UTC - Christophe Bornet: I have very few INFO logs in the docker logs ---- 2018-12-10 17:33:57 UTC - Christophe Bornet:
```
[conf/broker.conf] Applying config clusterName = test
[conf/broker.conf] Applying config configurationStoreServers = zk:2181
[conf/broker.conf] Applying config zookeeperServers = zk
2018-12-09 22:59:53,519 CRIT Supervisor running as root (no user in config file)
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/bookie.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/broker.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/functions_worker.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/global-zk.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/local-zk.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/presto_worker.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/proxy.conf" during parsing
2018-12-09 22:59:53,526 INFO RPC interface 'supervisor' initialized
2018-12-09 22:59:53,527 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-12-09 22:59:53,527 INFO supervisord started with pid 1
2018-12-09 22:59:54,529 INFO spawned: 'broker' with pid 17
2018-12-09 22:59:56,520 INFO success: broker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
```
---- 2018-12-10 17:34:21 UTC - Christophe
Bornet: That's all I got from a broker spawned yesterday... ---- 2018-12-10 17:34:22 UTC - Maarten Tielemans: @Sijie Guo In case you run both zookeeper and a bookie on the same node, the Prometheus ports conflict. You may want to highlight that in the documentation ---- 2018-12-10 17:40:30 UTC - Matteo Merli: It should have printed a lot more info logs. Are you starting with supervisord ? ---- 2018-12-10 17:40:45 UTC - Matteo Merli: That would collect logs under /var/logs/supervisor/… ---- 2018-12-10 17:42:49 UTC - Christophe Bornet: I'm starting this way
```
pulsar1-eu:
  hostname: pulsar1-eu
  image: apachepulsar/pulsar-test-latest-version:latest
  command: bin/run-broker.sh
  environment:
    clusterName: test
    zookeeperServers: zk
    configurationStoreServers: zk:2181
  networks:
    pulsar:
      ipv4_address: 172.22.0.16
```
---- 2018-12-10 17:43:36 UTC - Christophe Bornet: Yes it seems to be started by supervisord ---- 2018-12-10 17:43:44 UTC - Matteo Merli: Oh I see. This is the “test” image that we use for integration tests. I would recommend using the official image `apachepulsar/pulsar` ---- 2018-12-10 17:45:04 UTC - Matteo Merli: with command:
```
bin/apply-config-from-env.py conf/broker.conf &&
bin/apply-config-from-env.py conf/pulsar_env.sh &&
bin/pulsar broker
```
---- 2018-12-10 17:48:25 UTC - Ryan Samo: Hey guys, I am trying to get the websocket proxy to work with authentication and authorization but I keep running into trouble. I have a root CA and certs in place that work today via the Java client to a Pulsar proxy to the brokers, all working with no problem. I granted consume and produce to the client cert CN as well as the proxy cert CN. If I start a WebSocket proxy and attempt to use the same proxy cert I get the following error: failed to get Partitioned metadata : Valid Proxy Client role should be provided for getPartitionMetadataRequest Can you please guide me to what might be wrong, since the grants and certs are the same but only the websocket proxy has an issue?
Thanks! ---- 2018-12-10 17:48:37 UTC - Christophe Bornet: OK will do. Shouldn't I add `bin/watch-znode.py -z $zookeeperServers -p /initialized-$clusterName -w` before ? ---- 2018-12-10 17:51:35 UTC - Matteo Merli: That I think was to wait for that z-node to be created when creating a new cluster ---- 2018-12-10 17:51:55 UTC - Matteo Merli: Not sure it’s strictly required in a general deployment ---- 2018-12-10 17:52:12 UTC - Ryan Samo: Oh and it shows in the logs that the client cert is being seen by the brokers, which is good, but it does say the originalPrincipal is null. ---- 2018-12-10 17:57:20 UTC - Christophe Bornet: Yes. My docker-compose is for tests only ---- 2018-12-10 17:58:21 UTC - Christophe Bornet:
```
17:38:22.527 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Reloading the bookie rack affinity mapping cache.
17:38:22.554 [zk-cache-callback-executor-OrderedExecutor-3-0] WARN org.apache.pulsar.zookeeper.ZooKeeperDataCache - Reloading ZooKeeperDataCache failed at path: /bookies
java.lang.RuntimeException: java.net.UnknownHostException: bk1-eu
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$1(ZkBookieRackAffinityMapping.java:128) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684) ~[?:1.8.0_181]
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$0(ZkBookieRackAffinityMapping.java:123) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at java.util.TreeMap.forEach(TreeMap.java:1005) ~[?:1.8.0_181]
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.deserialize(ZkBookieRackAffinityMapping.java:122) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.deserialize(ZkBookieRackAffinityMapping.java:1) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$9(ZooKeeperCache.java:325) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:572) ~[org.apache.pulsar-pulsar-broker-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) ~[org.apache.pulsar-pulsar-broker-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
Caused by: java.net.UnknownHostException: bk1-eu
    at org.apache.bookkeeper.net.BookieSocketAddress.<init>(BookieSocketAddress.java:55) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$1(ZkBookieRackAffinityMapping.java:125) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    ... 9 more
```
So the wrong entries make the ZkBookieRackAffinityMapping fail completely. Maybe they could just be ignored ? ---- 2018-12-10 17:58:38 UTC - Christophe Bornet: I'll remove them for now ---- 2018-12-10 18:05:43 UTC - Christophe Bornet: What is the use of `hostname` in racks-info ? Do I have to put it or is it guessed from the bookie name if not present ? ---- 2018-12-10 18:06:27 UTC - Matteo Merli: The hostname is there for info purposes. It can be helpful if bookies are advertising just IP addresses ---- 2018-12-10 18:07:13 UTC - Matteo Merli: Regarding the DNS error: the problem is that BookieSocketAddress is creating an InetSocketAddress in the constructor and failing the DNS lookup ---- 2018-12-10 18:10:46 UTC - Christophe Bornet: I've removed the wrong entries and put in the ones with the ports. Now I see the policy correctly loaded.
But in the logs I still don't see which bookies are effectively selected ---- 2018-12-10 18:13:56 UTC - Christophe Bornet:
```
18:00:50.892 [zk-cache-callback-executor-OrderedExecutor-3-0] INFO org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Bookie rack info updated to {default={bk2-eu:3181=BookieInfo(rack=eu, hostname=null), bk1-eu:3181=BookieInfo(rack=eu, hostname=null), bk2-us:3181=BookieInfo(rack=us, hostname=null), bk1-us:3181=BookieInfo(rack=us, hostname=null)}}. Notifying rackaware policy.
18:06:02.040 [pulsar-web-358] INFO org.eclipse.jetty.server.RequestLog - 172.22.0.1 - - [10/Dec/2018:18:06:02 +0000] "GET /admin/v2/persistent/public/default/test9003/partitions HTTP/1.1" 200 16 "-" "Pulsar-Java-v2.3.0-SNAPSHOT" 1
18:06:02.147 [pulsar-1-5] INFO org.eclipse.jetty.server.RequestLog - 172.22.0.1 - - [10/Dec/2018:18:06:02 +0000] "GET /lookup/v2/topic/persistent/public/default/test9003 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.3.0-SNAPSHOT" 20
18:06:02.185 [pulsar-1-4] INFO org.apache.pulsar.broker.namespace.OwnershipCache - Trying to acquire ownership of public/default/0x40000000_0x50000000
18:06:02.203 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - Successfully acquired ownership of /namespace/public/default/0x40000000_0x50000000
18:06:02.204 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.eclipse.jetty.server.RequestLog - 172.22.0.1 - - [10/Dec/2018:18:06:02 +0000] "GET /lookup/v2/topic/persistent/public/default/test9003?authoritative=true HTTP/1.1" 200 166 "-" "Pulsar-Java-v2.3.0-SNAPSHOT" 27
18:06:02.206 [pulsar-1-15] INFO org.apache.pulsar.broker.PulsarService - Loading all topics on bundle: public/default/0x40000000_0x50000000
18:06:02.209 [pulsar-ordered-OrderedExecutor-5-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/test9003
18:06:02.227 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [172.22.0.14:3181, 172.22.0.15:3181] for ledger: 1057
18:06:02.229 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/test9003] Created ledger 1057
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.persistent.DispatchRateLimiter - [<persistent://public/default/test9003>] [null] setting message-dispatch-rate DispatchRate{dispatchThrottlingRateInMsg=0, dispatchThrottlingRateInByte=0, ratePeriodInSecond=1}
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.persistent.DispatchRateLimiter - [<persistent://public/default/test9003>] [null] configured message-dispatch rate at broker DispatchRate{dispatchThrottlingRateInMsg=0, dispatchThrottlingRateInByte=0, ratePeriodInSecond=1}
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Created topic <persistent://public/default/test9003> - dedup is disabled
18:06:02.240 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.PulsarService - Loaded 1 topics on public/default/0x40000000_0x50000000 -- time taken: 0.032 seconds
18:06:02.400 [pulsar-io-21-6] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /172.22.0.1:39696
18:06:02.422 [pulsar-io-21-6] INFO org.apache.pulsar.broker.service.ServerCnx - [/172.22.0.1:39696][<persistent://public/default/test9003>] Creating producer. producerId=0
18:06:02.424 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/172.22.0.1:39696] Created new producer: Producer{topic=PersistentTopic{topic=<persistent://public/default/test9003>}, client=/172.22.0.1:39696, producerName=test-4-174, producerId=0}
```
---- 2018-12-10 18:19:32 UTC - Christophe Bornet: My use case is for a multi-region cluster where I want to avoid cross-dc bandwidth in "normal" mode, with fallback to producing/consuming on the other dc in case of failure of a dc ---- 2018-12-10 18:20:47 UTC - Christophe Bornet: Note that if a dc fails, production/consumption on that dc is probably also OOO but at least no data is lost thanks to the sync replication of bookies on the other dc ---- 2018-12-10 18:21:44 UTC - Christophe Bornet: There could also be use cases where we would shut down brokers in one region for maintenance purposes and have the traffic go to the other region for some time ---- 2018-12-10 18:22:11 UTC - Christophe Bornet: And at the end of the maintenance we would like to have the traffic go back to the local brokers ---- 2018-12-10 18:28:34 UTC - Matteo Merli: Got it. In all these cases I think it would be preferable to have manually triggered failover, if this is meant for these special conditions (eg: failback into one DC or planned maintenance) ---- 2018-12-10 18:31:03 UTC - David Kjerrumgaard: @Ryan Samo This is the code block that is throwing the exception you are seeing. If the originalPrincipal is indeed null, then the call to invalidOriginalPrincipal will return true, and cause the exception to be thrown. ---- 2018-12-10 18:33:47 UTC - David Kjerrumgaard: @Ryan Samo FYI.....This is the logic for the invalidOriginalPrincipal method ---- 2018-12-10 18:35:44 UTC - Ryan Samo: Yeah I saw the code block and also the invalid block, but I guess I’m confused as to what the “originalPrincipal” really was and why it would ever be null?
It’s like the client cert makes it to the brokers but not the proxy cert? ---- 2018-12-10 18:37:41 UTC - David Kjerrumgaard: @Ryan Samo Can you step through the above code block in a debugger? That would help us identify the issue ---- 2018-12-10 18:38:48 UTC - David Kjerrumgaard: Do you see a log message similar to the following? `log.info("[{}] Client successfully authenticated with {} role {} and originalPrincipal {}", remoteAddress, authMethod, authRole, originalPrincipal);` ---- 2018-12-10 18:38:56 UTC - Ryan Samo: Sure, let me give it a shot and maybe it’ll stand out to me. ---- 2018-12-10 18:39:34 UTC - Ryan Samo: Yup I sure did, it said it was ok, the client cert gave that message ---- 2018-12-10 18:40:06 UTC - David Kjerrumgaard: but the value for originalPrincipal was `null` , correct? ---- 2018-12-10 18:40:50 UTC - Ryan Samo: Let me take another look ---- 2018-12-10 18:44:58 UTC - Christophe Bornet: Yes, probably. ---- 2018-12-10 18:45:10 UTC - Ryan Samo: “Client successfully authenticated with tls role websocket and originalPrincipal null” ---- 2018-12-10 18:45:25 UTC - Ryan Samo: That’s in the broker ---- 2018-12-10 18:47:22 UTC - Ryan Samo: The websocket proxy shows “Authenticated WebSocket client devmclient1 on topic <persistent://testtenant/ns1/testtopic>“ ---- 2018-12-10 18:48:07 UTC - Ryan Samo: My certs are “websocket” and “devmclient1” ---- 2018-12-10 18:50:10 UTC - David Kjerrumgaard: and when you connect with the java client, which cert do you use? Maybe it is worth trying to connect with the “websocket” cert via the java client just to see if that works? ---- 2018-12-10 18:51:13 UTC - David Kjerrumgaard: just to rule out the cert as the issue, and isolate it to the websocket proxy ---- 2018-12-10 18:52:07 UTC - Ryan Samo: Gotcha, ok let me try that ---- 2018-12-10 19:09:54 UTC - Ryan Samo: Ok, so on my java client path I use a Pulsar proxy with the cert named proxy and the client cert is named devmclient1.
If I use those together it works fine. If I swap the client devmclient1 cert to the WebSocket cert I get: Client successfully authenticated with tls role proxy and originalPrincipal websocket 14:05:20.536 [pulsar-io-21-15] WARN org.apache.pulsar.broker.service.ServerCnx - [/] Valid Proxy Client role should be provided for lookup with role proxy and proxyClientAuthRole websocket on topic ---- 2018-12-10 19:10:30 UTC - Ryan Samo: Also if I try to use the proxy cert on the client I get the same error ---- 2018-12-10 19:20:14 UTC - David Kjerrumgaard: @Ryan Samo So there appears to be an issue with the webSocket cert. ---- 2018-12-10 19:20:55 UTC - Ryan Samo: Ok, let me generate a new cert and try it once more ---- 2018-12-10 19:21:00 UTC - Ryan Samo: Thanks! ---- 2018-12-10 19:21:08 UTC - David Kjerrumgaard: no problem.....good luck!! ---- 2018-12-10 20:42:47 UTC - Ben Devore: @Ben Devore has joined the channel ---- 2018-12-10 20:45:17 UTC - Thor Sigurjonsson: @Thor Sigurjonsson has joined the channel ---- 2018-12-10 21:06:59 UTC - Christophe Bornet: I'm still not seeing bookie selection info in the logs. Any hint ? ---- 2018-12-10 21:08:52 UTC - Emma Pollum: I'm running into issues creating a function in Pulsar. When I try to create it, I get `Function worker service is not done initializing. Please try again in a little while.` ---- 2018-12-10 21:09:05 UTC - Emma Pollum: My pulsar cluster has been running for a few days though.... ---- 2018-12-10 21:27:50 UTC - Emma Pollum: Is there a separate pulsar-functions service that needs to be launched? ---- 2018-12-10 21:38:55 UTC - Mike Card: @Matteo Merli Oh and I had the message routing mode on the producers set to round robin and message batching set to true as well ---- 2018-12-10 21:56:40 UTC - David Kjerrumgaard: @Emma Pollum No, there isn't a separate service that needs to be launched.
Can you scan your Pulsar Broker log files for any errors / entries related to the function worker service? ---- 2018-12-10 22:00:50 UTC - Emma Pollum: I think I found the issue, it looks like you need to set up the bookkeeper conf file to enable the function worker +1 : David Kjerrumgaard ---- 2018-12-10 22:01:04 UTC - Emma Pollum: <https://pulsar.apache.org/docs/fr/deploy-bare-metal/#enabling-pulsar-functions-optional> ---- 2018-12-10 22:09:31 UTC - Matteo Merli: > 18:00:50.892 [zk-cache-callback-executor-OrderedExecutor-3-0] INFO org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Bookie rack info updated to {default={bk2-eu:3181=BookieInfo(rack=eu, hostname=null), bk1-eu:3181=BookieInfo(rack=eu, hostname=null), bk2-us:3181=BookieInfo(rack=us, hostname=null), bk1-us:3181=BookieInfo(rack=us, hostname=null)}}. Notifying rackaware policy.
That’s a start. I think there should be some other message at some point, though I don’t remember the exact format. Does the rack-aware policy work now after you kill one of the bookies? Before, without the :3181, it for sure wasn’t being picked up ---- 2018-12-10 22:40:07 UTC - Christophe Bornet: It still doesn't work because `ZkBookieRackAffinityMapping.getRack()` gets called with an address in the form `bk1-eu` without the port, and `racksWithHost` has keys with the port ---- 2018-12-10 22:46:11 UTC - Emma Pollum: What is the best way to get a list of all the bookies in the cluster? ---- 2018-12-10 22:56:06 UTC - Christophe Bornet: It seems that `getZkBookieRackMappingCache` tried to update `racksWithHost` with the correct keys but the reference used by `getRack` is still the old one. ---- 2018-12-10 23:01:23 UTC - David Kjerrumgaard: Are you running in a K8s environment?
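For reference, the bare-metal doc linked above enables the functions worker through a broker config flag; a sketch of the relevant fragment (check the linked page for the exact setting in your version):

```
# conf/broker.conf
functionsWorkerEnabled=true
```

After changing this, the brokers need to be restarted for the function worker service to initialize.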
---- 2018-12-10 23:13:51 UTC - Matteo Merli: `bookkeeper shell listbookies -readwrite` ---- 2018-12-10 23:20:00 UTC - Christophe Bornet: There should be a workaround for the `racksWithHost` but it doesn't seem to work for me : <https://github.com/apache/pulsar/blob/v2.2.0/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMapping.java#L118> ---- 2018-12-10 23:50:21 UTC - Christophe Bornet: @Matteo Merli it works if I return `racksWithHost` instead of `racks` at <https://github.com/apache/pulsar/blob/v2.2.0/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMapping.java#L134> . I think it's a bug. Should I do a PR ? ---- 2018-12-11 00:41:02 UTC - Mike Card: @Matteo Merli repeated my test using the asynchronous send API. I would guess the synchronous send API is doing exactly what I am doing here: `retryEventProducer.sendAsync(newRetryRefBuffer.array()).thenAccept(msgId -> {});` ---- 2018-12-11 00:42:15 UTC - Mike Card: I still get the same 64-byte message truncation I was seeing before. If send() is just calling the sendAsync() API then perhaps there is a problem queuing messages in the send queue under very high (say 15 kHz) write rates ---- 2018-12-11 00:44:21 UTC - Mike Card: @Matteo Merli when I switched to the asynchronous send API I set block-if-queue-full to true on all the producers. ---- 2018-12-11 01:46:16 UTC - Harry Rickards: @Harry Rickards has joined the channel ---- 2018-12-11 06:58:15 UTC - Cristian: @Cristian has joined the channel ---- 2018-12-11 07:03:03 UTC - Cristian: Hello people! I'm trying to understand Pulsar's schema registry and seeing how it compares with the one that Confluent developed for Kafka.
I don't see in the docs whether Pulsar supports configuring evolution compatibility modes for Avro schemas (this is what I mean: <https://docs.confluent.io/current/avro.html#avro-backward-compatibility>) ---- 2018-12-11 07:13:17 UTC - Sijie Guo: @Cristian I think the evolution compatibility modes will be supported in the upcoming 2.3.0 release. It is not yet supported in 2.2.0. ---- 2018-12-11 07:44:47 UTC - Ivan Kelly: @Sijie Guo we need to document that. there's very little documentation on actually using schemas +1 : jia zhai ---- 2018-12-11 09:10:16 UTC - 陈琳: @陈琳 has joined the channel ----
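The producer-side flow control that comes up in this log (counting "messages in flight", and the block-if-queue-full behavior Mike Card mentions) can be sketched with a stdlib-only semaphore. This is a hypothetical illustration, not the Pulsar client; `sendAsync` here is a stand-in for `Producer.sendAsync(byte[])`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical sketch: cap the number of in-flight async sends with a
// semaphore, releasing a permit when each send completes. This is roughly
// the back-pressure that blockIfQueueFull(true) provides inside the client.
class BoundedSender {
    private final Semaphore permits;
    private final Function<byte[], CompletableFuture<Object>> sendAsync;

    BoundedSender(int maxInFlight, Function<byte[], CompletableFuture<Object>> sendAsync) {
        this.permits = new Semaphore(maxInFlight);
        this.sendAsync = sendAsync;
    }

    CompletableFuture<Object> send(byte[] payload) throws InterruptedException {
        permits.acquire();  // blocks the caller once maxInFlight sends are outstanding
        return sendAsync.apply(payload)
                .whenComplete((msgId, t) -> permits.release());
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The point of the design is that the producer thread stalls instead of queueing unboundedly at very high write rates; the permit is returned whether the send succeeds or fails.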
