2018-12-10 09:43:59 UTC - Maarten Tielemans: morning all. looking at "scaling" my single node Pulsar setup to a multi node setup, however most documentation I find immediately speaks about 6 nodes/VMs. would it also be possible currently to deploy Pulsar on 2 i3.xlarge nodes? (and have zookeeper, bookkeeper and pulsar run on each) ---- 2018-12-10 10:03:56 UTC - Sijie Guo: @Maarten Tielemans: I just discussed this with @richardliu above - you can start by deploying Pulsar to one node (or a small number of nodes) and expand later. I made some changes to the deployment documentation. <https://github.com/apache/pulsar/pull/3152/files> ---- 2018-12-10 10:24:30 UTC - Maarten Tielemans: Thanks @Sijie Guo ---- 2018-12-10 10:41:59 UTC - Christophe Bornet: OK. So the fallback can only be done manually? ---- 2018-12-10 11:01:37 UTC - David Tinker: What happens if `Consumer.acknowledgeAsync()` (Java client) fails? Does it get retried or something? Or should I handle that myself somehow? ---- 2018-12-10 11:04:11 UTC - David Tinker: It would be nice if there was more Apache Pulsar on Stack Overflow. I could go post my question there if you like? It would probably be good for the project if that was the preferred way to ask questions, with a link then posted in this channel. +1 : jia zhai, Sijie Guo ---- 2018-12-10 11:06:56 UTC - jia zhai: @David Tinker You may need to handle that yourself. There is currently no ack status kept. ---- 2018-12-10 11:08:21 UTC - David Tinker: Tx. Should I retry a few times and toss my consumer and re-connect if that doesn't work? ---- 2018-12-10 11:12:39 UTC - jia zhai: usually, if a message is not acked successfully, it will get redelivered to the consumer after the ack timeout ---- 2018-12-10 11:14:07 UTC - Sijie Guo: I think Stack Overflow is also preferred. That would also generally be good for sharing knowledge. People from the community are monitoring Stack Overflow as well ---- 2018-12-10 11:14:34 UTC - David Tinker: Ok.
So it is probably sufficient to just consider async acked messages to be acked immediately, as they will be re-delivered later in any case. I am counting "messages in flight" for flow control purposes. ---- 2018-12-10 11:16:36 UTC - Ivan Kelly: I'd use the debezium connector rather than rolling my own solution for bringing data from mysql to pulsar. I think it's available in master now, so it will be in a release in the next month or so ---- 2018-12-10 11:17:00 UTC - David Tinker: <https://stackoverflow.com/questions/53704514/how-should-apache-pulsar-consumer-acknowledgeasync-failure-be-handled> ---- 2018-12-10 12:20:23 UTC - Maarten Tielemans: If you were to use multiple bookies, how many of them would need to ack a produced message before a consumer would receive it? ---- 2018-12-10 12:24:56 UTC - Sijie Guo: @Maarten Tielemans :
it is configurable. you can configure it in `conf/broker.conf`:
```
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=<replicas>

# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=<replicas>

# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=<replicas>
```
you can also configure the replication settings per namespace via `bin/pulsar-admin namespaces set-persistence` ---- 2018-12-10 13:24:05 UTC - Ezequiel Lovelle: @Ezequiel Lovelle has joined the channel ---- 2018-12-10 13:33:12 UTC - Christophe Bornet: The `unload` command does indeed work. @Matteo Merli Can you give more info on what this command does internally? Is it safe to execute periodically, e.g. in a cron, to ensure an automatic fallback after some time? ---- 2018-12-10 14:31:13 UTC - Samuel Sun: <https://builds.apache.org/job/pulsar_precommit_java8/5237/console> ---- 2018-12-10 14:31:45 UTC - Samuel Sun: can I rerun this Jenkins job? It could have failed due to other reasons, not the PR itself.
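The `acknowledgeAsync()` handling discussed earlier (retry a few times, otherwise rely on redelivery after the ack timeout) can be sketched roughly like this. This is a hypothetical stdlib-only helper, not Pulsar client code; `ackOnce` stands in for a call such as `() -> consumer.acknowledgeAsync(msgId)`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch: retry an async ack a few times; if every attempt
// fails, give up and let the ack timeout trigger redelivery, as described
// in the discussion above.
class AckRetry {
    static CompletableFuture<Boolean> ackWithRetries(Supplier<CompletableFuture<Void>> ackOnce,
                                                     int retriesLeft) {
        return ackOnce.get()
                .handle((v, t) -> t == null)          // true if this attempt succeeded
                .thenCompose(ok -> {
                    if (ok || retriesLeft <= 0) {
                        // Acked, or out of retries: rely on redelivery after the ack timeout.
                        return CompletableFuture.completedFuture(ok);
                    }
                    return ackWithRetries(ackOnce, retriesLeft - 1);
                });
    }
}
```

If all attempts fail, the consumer simply keeps the message counted as in flight until redelivery, which matches the "consider it acked and let it be redelivered" approach above.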
---- 2018-12-10 14:32:06 UTC - Samuel Sun: <https://github.com/apache/pulsar/pull/3151> ---- 2018-12-10 14:41:49 UTC - Matteo Merli: You can comment `run java8 tests` to have Jenkins start again ---- 2018-12-10 14:42:14 UTC - Matteo Merli: On the PR itself ---- 2018-12-10 14:42:53 UTC - Samuel Sun: sure ---- 2018-12-10 14:43:25 UTC - Samuel Sun: nice ---- 2018-12-10 14:44:41 UTC - Maarten Tielemans: Following the deploy on bare metal guide (<https://pulsar.apache.org/docs/en/deploy-bare-metal/>), with the change that I try to run zookeeper, bookkeeper and pulsar on the same node, I started two nodes/instances of zookeeper and initialised the cluster metadata. However, when I try to start bookkeeper I receive the following error ---- 2018-12-10 14:44:50 UTC - Maarten Tielemans: ---- 2018-12-10 14:48:12 UTC - Matteo Merli: Unload will trigger the current brokers to do a graceful close of the topics and then release the ownership. The topic will be automatically reassigned to a new broker based on load and current constraints. The only downside of it is the latency blip perceived by clients during the failover ---- 2018-12-10 14:53:21 UTC - Christophe Bornet: So what is your recommendation? Should we monitor for failover and, when primary brokers come back to life, ask for an unload? Or maybe do an automatic unload each time we detect a broker does an inactive to active transition? ---- 2018-12-10 14:53:59 UTC - Christophe Bornet: Shouldn't Pulsar do it by itself ideally? ---- 2018-12-10 15:04:08 UTC - Grégory Guichard: Hi, is there a limit on concurrent connections on a Pulsar broker? My broker doesn't accept new connections after 10 000 ---- 2018-12-10 15:23:44 UTC - Rohit Rajan: @Rohit Rajan has joined the channel ---- 2018-12-10 16:18:07 UTC - Mike Card: @Matteo Merli Have you guys ever run a test like this on pulsar, i.e.
two parallel producers calling the synchronous send() API as fast as possible, both publishing to the same partitioned topic (in my test there were 48 partitions) which is being consumed downstream by 2 tasks each running a shared subscription to consume the topic, each using synchronous receives? ---- 2018-12-10 17:08:01 UTC - Matteo Merli: @Grégory Guichard There is no artificial limit. Have you checked the OS file-descriptors limit for the process? ---- 2018-12-10 17:14:52 UTC - Matteo Merli: What is your use case for isolation exactly? The case for primary/secondary was a bit complicated to begin with. In general, at Yahoo we have a few namespaces isolated to a subset of brokers (set as the “primary”), with fallback to the general pool (secondary set to “.*“) in case all the brokers from the primary were unavailable ---- 2018-12-10 17:16:50 UTC - Christophe Bornet: I'm testing rack aware placement and seeing unexpected behavior. I publish on a topic with E=2,Qw=2,Qa=2 (standard config). I have 2 bookies on rack "eu" and 2 on rack "us". When I start publishing, one bookie of each rack gets used. When I stopped a bookie from "eu", I expected it would have been replaced by another bookie of the same rack, but a bookie from "us" got used instead. Am I missing something? ---- 2018-12-10 17:17:57 UTC - Matteo Merli: Can you check in the broker logs if the rack info has been picked up correctly? ---- 2018-12-10 17:18:34 UTC - Matteo Merli: It should print something there when it gets the bookies' racks ---- 2018-12-10 17:23:28 UTC - Christophe Bornet:
```
bin/pulsar-admin --admin-url <http://pulsar1-eu:8080> bookies racks-placement
{
  "default" : {
    "bk1-eu" : { "rack" : "eu" },
    "bk2-eu" : { "rack" : "eu" },
    "bk1-us" : { "rack" : "us" },
    "bk2-us" : { "rack" : "us" },
    "pulsar1-eu" : { "rack" : "eu" },
    "pulsar2-eu" : { "rack" : "eu" }
  }
}
```
---- 2018-12-10 17:23:37 UTC - Christophe Bornet: Is that the info ?
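As the rest of this thread establishes, the keys in a racks-placement mapping like the one above have to match the exact address each bookie advertises, typically `host:port`. A hypothetical stdlib-only helper (not Pulsar code) to normalize such a bookie-to-rack map, assuming the default bookie port 3181:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: the rack-aware policy matches rack keys against the
// advertised "host:port" bookie address, so an entry like "bk1-eu" (no port)
// is never matched. Append the default port where it is missing.
class RackPlacement {
    static Map<String, String> withPorts(Map<String, String> bookieToRack, int defaultPort) {
        Map<String, String> out = new TreeMap<>();
        bookieToRack.forEach((bookie, rack) ->
                out.put(bookie.contains(":") ? bookie : bookie + ":" + defaultPort, rack));
        return out;
    }
}
```

The port (3181 here) is an assumption; it must match whatever your bookies actually advertise.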
---- 2018-12-10 17:24:14 UTC - Matteo Merli: In broker logs, it should print info regarding the racks for each bookie ---- 2018-12-10 17:25:16 UTC - Christophe Bornet: OK. That's DEBUG info? ---- 2018-12-10 17:25:20 UTC - Matteo Merli: In any case, I think the bookie address should include the port as well: `bk1-eu:3181` And make sure `bk1-eu` is the same address advertised by the bookies ---- 2018-12-10 17:25:56 UTC - Christophe Bornet: oh ! probably ! ---- 2018-12-10 17:26:28 UTC - Christophe Bornet: I'm using docker images to run the cluster ---- 2018-12-10 17:26:35 UTC - Christophe Bornet: with docker compose ---- 2018-12-10 17:26:59 UTC - Matteo Merli: I see, in any case try adding the port as well ---- 2018-12-10 17:27:13 UTC - Christophe Bornet: how do I activate the logs for the broker rack ? ---- 2018-12-10 17:28:12 UTC - Matteo Merli: It would be automatically printed as info logs ---- 2018-12-10 17:29:07 UTC - Matteo Merli: when a broker discovers bookies there will be one line about that.
if it refers to “/default” rack … it means it has not picked up the configured rack info ---- 2018-12-10 17:29:16 UTC - Matteo Merli: Take a look at this unit test: <https://github.com/apache/pulsar/blob/master/pulsar-zookeeper-utils/src/test/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMappingTest.java#L72> ---- 2018-12-10 17:32:49 UTC - Christophe Bornet: Indeed the port is missing ---- 2018-12-10 17:33:18 UTC - Christophe Bornet: I have very few INFO logs in the docker logs ---- 2018-12-10 17:33:57 UTC - Christophe Bornet:
```
[conf/broker.conf] Applying config clusterName = test
[conf/broker.conf] Applying config configurationStoreServers = zk:2181
[conf/broker.conf] Applying config zookeeperServers = zk
2018-12-09 22:59:53,519 CRIT Supervisor running as root (no user in config file)
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/bookie.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/broker.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/functions_worker.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/global-zk.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/local-zk.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/presto_worker.conf" during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/proxy.conf" during parsing
2018-12-09 22:59:53,526 INFO RPC interface 'supervisor' initialized
2018-12-09 22:59:53,527 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-12-09 22:59:53,527 INFO supervisord started with pid 1
2018-12-09 22:59:54,529 INFO spawned: 'broker' with pid 17
2018-12-09 22:59:56,520 INFO success: broker entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
```
---- 2018-12-10 17:34:21 UTC - Christophe
Bornet: That's all I got from a broker spawned yesterday... ---- 2018-12-10 17:34:22 UTC - Maarten Tielemans: @Sijie Guo In case you run both zookeeper and a bookie on the same node, the Prometheus ports conflict. You may want to highlight that in the documentation ---- 2018-12-10 17:40:30 UTC - Matteo Merli: It should have printed a lot more info logs. Are you starting with supervisord ? ---- 2018-12-10 17:40:45 UTC - Matteo Merli: That would collect logs under /var/logs/supervisor/… ---- 2018-12-10 17:42:49 UTC - Christophe Bornet: I'm starting this way
```
pulsar1-eu:
  hostname: pulsar1-eu
  image: apachepulsar/pulsar-test-latest-version:latest
  command: bin/run-broker.sh
  environment:
    clusterName: test
    zookeeperServers: zk
    configurationStoreServers: zk:2181
  networks:
    pulsar:
      ipv4_address: 172.22.0.16
```
---- 2018-12-10 17:43:36 UTC - Christophe Bornet: Yes it seems to be started by supervisord ---- 2018-12-10 17:43:44 UTC - Matteo Merli: Oh I see. This is the “test” image that we use for integration tests. I would recommend using the official image `apachepulsar/pulsar` ---- 2018-12-10 17:45:04 UTC - Matteo Merli: with command:
```
bin/apply-config-from-env.py conf/broker.conf &&
bin/apply-config-from-env.py conf/pulsar_env.sh &&
bin/pulsar broker
```
---- 2018-12-10 17:48:25 UTC - Ryan Samo: Hey guys, I am trying to get the websocket proxy to work with authentication and authorization but I keep running into trouble. I have a root CA and certs in place that work today via the Java client to a Pulsar proxy to the brokers, all working with no problem. I granted consume and produce to the client cert CN as well as the proxy cert CN. If I start a WebSocket proxy and attempt to use the same proxy cert I get the following error: failed to get Partitioned metadata : Valid Proxy Client role should be provided for getPartitionMetadataRequest Can you please guide me to what might be wrong, since the grants and certs are the same but only the websocket proxy has an issue?
Thanks! ---- 2018-12-10 17:48:37 UTC - Christophe Bornet: OK will do. Shouldn't I add `bin/watch-znode.py -z $zookeeperServers -p /initialized-$clusterName -w` before ? ---- 2018-12-10 17:51:35 UTC - Matteo Merli: That I think was to wait for that z-node to be created when creating a new cluster ---- 2018-12-10 17:51:55 UTC - Matteo Merli: Not sure it’s strictly required in a general deployment ---- 2018-12-10 17:52:12 UTC - Ryan Samo: Oh and it shows in the logs that the client cert is being seen by the brokers, which is good, but it does say the originalPrincipal is null. ---- 2018-12-10 17:57:20 UTC - Christophe Bornet: Yes. My docker-compose is for tests only ---- 2018-12-10 17:58:21 UTC - Christophe Bornet:
```
17:38:22.527 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Reloading the bookie rack affinity mapping cache.
17:38:22.554 [zk-cache-callback-executor-OrderedExecutor-3-0] WARN org.apache.pulsar.zookeeper.ZooKeeperDataCache - Reloading ZooKeeperDataCache failed at path: /bookies
java.lang.RuntimeException: java.net.UnknownHostException: bk1-eu
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$1(ZkBookieRackAffinityMapping.java:128) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684) ~[?:1.8.0_181]
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$0(ZkBookieRackAffinityMapping.java:123) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at java.util.TreeMap.forEach(TreeMap.java:1005) ~[?:1.8.0_181]
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.deserialize(ZkBookieRackAffinityMapping.java:122) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.deserialize(ZkBookieRackAffinityMapping.java:1) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$9(ZooKeeperCache.java:325) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:572) ~[org.apache.pulsar-pulsar-broker-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) ~[org.apache.pulsar-pulsar-broker-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
Caused by: java.net.UnknownHostException: bk1-eu
    at org.apache.bookkeeper.net.BookieSocketAddress.<init>(BookieSocketAddress.java:55) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
    at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$1(ZkBookieRackAffinityMapping.java:125) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
    ... 9 more
```
So the wrong entries make the ZkBookieRackAffinityMapping fail completely. Maybe they could just be ignored ? ---- 2018-12-10 17:58:38 UTC - Christophe Bornet: I'll remove them for now ---- 2018-12-10 18:05:43 UTC - Christophe Bornet: What is the use of `hostname` in racks-info ? Do I have to put it or is it guessed from the bookie name if not present ? ---- 2018-12-10 18:06:27 UTC - Matteo Merli: The hostname is there for info purposes. It can be helpful if bookies are advertising just IP addresses ---- 2018-12-10 18:07:13 UTC - Matteo Merli: Regarding the DNS error: the problem is that BookieSocketAddress is creating an InetSocketAddress in the constructor and failing the DNS lookup ---- 2018-12-10 18:10:46 UTC - Christophe Bornet: I've removed the wrong entries and put in the ones with the ports. Now I see the policy correctly loaded.
But in the logs I still don't see which bookies are effectively selected ---- 2018-12-10 18:13:56 UTC - Christophe Bornet:
```
18:00:50.892 [zk-cache-callback-executor-OrderedExecutor-3-0] INFO org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Bookie rack info updated to {default={bk2-eu:3181=BookieInfo(rack=eu, hostname=null), bk1-eu:3181=BookieInfo(rack=eu, hostname=null), bk2-us:3181=BookieInfo(rack=us, hostname=null), bk1-us:3181=BookieInfo(rack=us, hostname=null)}}. Notifying rackaware policy.
18:06:02.040 [pulsar-web-358] INFO org.eclipse.jetty.server.RequestLog - 172.22.0.1 - - [10/Dec/2018:18:06:02 +0000] "GET /admin/v2/persistent/public/default/test9003/partitions HTTP/1.1" 200 16 "-" "Pulsar-Java-v2.3.0-SNAPSHOT" 1
18:06:02.147 [pulsar-1-5] INFO org.eclipse.jetty.server.RequestLog - 172.22.0.1 - - [10/Dec/2018:18:06:02 +0000] "GET /lookup/v2/topic/persistent/public/default/test9003 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.3.0-SNAPSHOT" 20
18:06:02.185 [pulsar-1-4] INFO org.apache.pulsar.broker.namespace.OwnershipCache - Trying to acquire ownership of public/default/0x40000000_0x50000000
18:06:02.203 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - Successfully acquired ownership of /namespace/public/default/0x40000000_0x50000000
18:06:02.204 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.eclipse.jetty.server.RequestLog - 172.22.0.1 - - [10/Dec/2018:18:06:02 +0000] "GET /lookup/v2/topic/persistent/public/default/test9003?authoritative=true HTTP/1.1" 200 166 "-" "Pulsar-Java-v2.3.0-SNAPSHOT" 27
18:06:02.206 [pulsar-1-15] INFO org.apache.pulsar.broker.PulsarService - Loading all topics on bundle: public/default/0x40000000_0x50000000
18:06:02.209 [pulsar-ordered-OrderedExecutor-5-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/test9003
18:06:02.227 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [172.22.0.14:3181, 172.22.0.15:3181] for ledger: 1057
18:06:02.229 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/test9003] Created ledger 1057
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.persistent.DispatchRateLimiter - [<persistent://public/default/test9003>] [null] setting message-dispatch-rate DispatchRate{dispatchThrottlingRateInMsg=0, dispatchThrottlingRateInByte=0, ratePeriodInSecond=1}
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.persistent.DispatchRateLimiter - [<persistent://public/default/test9003>] [null] configured message-dispatch rate at broker DispatchRate{dispatchThrottlingRateInMsg=0, dispatchThrottlingRateInByte=0, ratePeriodInSecond=1}
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.service.BrokerService - Created topic <persistent://public/default/test9003> - dedup is disabled
18:06:02.240 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO org.apache.pulsar.broker.PulsarService - Loaded 1 topics on public/default/0x40000000_0x50000000 -- time taken: 0.032 seconds
18:06:02.400 [pulsar-io-21-6] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /172.22.0.1:39696
18:06:02.422 [pulsar-io-21-6] INFO org.apache.pulsar.broker.service.ServerCnx - [/172.22.0.1:39696][<persistent://public/default/test9003>] Creating producer. producerId=0
18:06:02.424 [ForkJoinPool.commonPool-worker-2] INFO org.apache.pulsar.broker.service.ServerCnx - [/172.22.0.1:39696] Created new producer: Producer{topic=PersistentTopic{topic=<persistent://public/default/test9003>}, client=/172.22.0.1:39696, producerName=test-4-174, producerId=0}
```
---- 2018-12-10 18:19:32 UTC - Christophe Bornet: My use case is for a multi-region cluster where I want to avoid cross-dc bandwidth in "normal" mode, with fallback to producing/consuming on the other dc in case of failure of a dc ---- 2018-12-10 18:20:47 UTC - Christophe Bornet: Note that if a dc fails, production/consumption on that dc is probably also OOO but at least no data is lost thanks to the sync replication of bookies on the other dc ---- 2018-12-10 18:21:44 UTC - Christophe Bornet: There could also be use cases where we would shut down brokers in one region for maintenance purposes and have the traffic go to the other region for some time ---- 2018-12-10 18:22:11 UTC - Christophe Bornet: And at the end of the maintenance we would like to have the traffic go back to the local brokers ---- 2018-12-10 18:28:34 UTC - Matteo Merli: Got it. In all these cases I think it would be preferable to have manually triggered failover, if this is meant for these special conditions (eg: failback into one DC or planned maintenance) ---- 2018-12-10 18:31:03 UTC - David Kjerrumgaard: @Ryan Samo This is the code block that is throwing the exception you are seeing. If the originalPrincipal is indeed null, then the call to invalidOriginalPrincipal will return true, and cause the exception to be thrown. ---- 2018-12-10 18:33:47 UTC - David Kjerrumgaard: @Ryan Samo FYI.....This is the logic for the invalidOriginalPrincipal method ---- 2018-12-10 18:35:44 UTC - Ryan Samo: Yeah I saw the code block and also the invalid block, but I guess I’m confused as to what the “originalPrincipal” really was and why it would ever be null?
It’s like the client cert makes it to the brokers but not the proxy cert? ---- 2018-12-10 18:37:41 UTC - David Kjerrumgaard: @Ryan Samo Can you step through the above code block in a debugger? That would help us identify the issue ---- 2018-12-10 18:38:48 UTC - David Kjerrumgaard: Do you see a log message similar to the following? `log.info("[{}] Client successfully authenticated with {} role {} and originalPrincipal {}", remoteAddress, authMethod, authRole, originalPrincipal);` ---- 2018-12-10 18:38:56 UTC - Ryan Samo: Sure, let me give it a shot and maybe it’ll stand out to me. ---- 2018-12-10 18:39:34 UTC - Ryan Samo: Yup I sure did, it said it was ok, the client cert gave that message ---- 2018-12-10 18:40:06 UTC - David Kjerrumgaard: but the value for originalPrincipal was `null` , correct? ---- 2018-12-10 18:40:50 UTC - Ryan Samo: Let me take another look ---- 2018-12-10 18:44:58 UTC - Christophe Bornet: Yes, probably. ---- 2018-12-10 18:45:10 UTC - Ryan Samo: “Client successfully authenticated with tls role websocket and originalPrincipal null” ---- 2018-12-10 18:45:25 UTC - Ryan Samo: That’s in the broker ---- 2018-12-10 18:47:22 UTC - Ryan Samo: The websocket proxy shows “Authenticated WebSocket client devmclient1 on topic <persistent://testtenant/ns1/testtopic>“ ---- 2018-12-10 18:48:07 UTC - Ryan Samo: My certs are “websocket” and “devmclient1” ---- 2018-12-10 18:50:10 UTC - David Kjerrumgaard: and when you connect with the java client, which cert do you use? Maybe it is worth trying to connect with the “websocket” cert via the java client just to see if that works? ---- 2018-12-10 18:51:13 UTC - David Kjerrumgaard: just to rule out the cert as the issue, and isolate it to the websocket proxy ---- 2018-12-10 18:52:07 UTC - Ryan Samo: Gotcha, ok let me try that ---- 2018-12-10 19:09:54 UTC - Ryan Samo: Ok, so on my java client path I use a Pulsar proxy with the cert named proxy and the client cert is named devmclient1.
If I use those together it works fine. If I swap the client devmclient1 cert to the WebSocket cert I get: Client successfully authenticated with tls role proxy and originalPrincipal websocket 14:05:20.536 [pulsar-io-21-15] WARN org.apache.pulsar.broker.service.ServerCnx - [/] Valid Proxy Client role should be provided for lookup with role proxy and proxyClientAuthRole websocket on topic ---- 2018-12-10 19:10:30 UTC - Ryan Samo: Also if I try to use the proxy cert on the client I get the same error ---- 2018-12-10 19:20:14 UTC - David Kjerrumgaard: @Ryan Samo So there appears to be an issue with the webSocket cert. ---- 2018-12-10 19:20:55 UTC - Ryan Samo: Ok, let me generate a new cert and try it once more ---- 2018-12-10 19:21:00 UTC - Ryan Samo: Thanks! ---- 2018-12-10 19:21:08 UTC - David Kjerrumgaard: no problem.....good luck!! ---- 2018-12-10 20:42:47 UTC - Ben Devore: @Ben Devore has joined the channel ---- 2018-12-10 20:45:17 UTC - Thor Sigurjonsson: @Thor Sigurjonsson has joined the channel ---- 2018-12-10 21:06:59 UTC - Christophe Bornet: I'm still not seeing bookie selection info in the logs. Any hint ? ---- 2018-12-10 21:08:52 UTC - Emma Pollum: I'm running into issues creating a function in Pulsar. When I try to create it, I get `Function worker service is not done initializing. Please try again in a little while.` ---- 2018-12-10 21:09:05 UTC - Emma Pollum: My pulsar cluster has been running for a few days though.... ---- 2018-12-10 21:27:50 UTC - Emma Pollum: Is there a separate pulsar-functions service that needs to be launched? ---- 2018-12-10 21:38:55 UTC - Mike Card: @Matteo Merli Oh and I had the message routing mode on the producers set to round robin and message batching set to true as well ---- 2018-12-10 21:56:40 UTC - David Kjerrumgaard: @Emma Pollum No, there isn't a separate service that needs to be launched.
Can you scan your Pulsar Broker log files for any errors / entries related to the function worker service? ---- 2018-12-10 22:00:50 UTC - Emma Pollum: I think I found the issue, it looks like you need to set up the bookkeeper conf file to enable the function worker +1 : David Kjerrumgaard ---- 2018-12-10 22:01:04 UTC - Emma Pollum: <https://pulsar.apache.org/docs/fr/deploy-bare-metal/#enabling-pulsar-functions-optional> ---- 2018-12-10 22:09:31 UTC - Matteo Merli: > 18:00:50.892 [zk-cache-callback-executor-OrderedExecutor-3-0] INFO org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Bookie rack info updated to {default={bk2-eu:3181=BookieInfo(rack=eu, hostname=null), bk1-eu:3181=BookieInfo(rack=eu, hostname=null), bk2-us:3181=BookieInfo(rack=us, hostname=null), bk1-us:3181=BookieInfo(rack=us, hostname=null)}}. Notifying rackaware policy.
That’s a start. I think there should be some other message at some point, though I don’t remember the exact format. Does the rack-aware policy work now after you kill one of the bookies? Before, without the :3181, it for sure wasn’t being picked up ---- 2018-12-10 22:40:07 UTC - Christophe Bornet: It still doesn't work because `ZkBookieRackAffinityMapping.getRack()` gets called with an address in the form `bk1-eu` without the port, and `racksWithHost` has keys with the port ---- 2018-12-10 22:46:11 UTC - Emma Pollum: What is the best way to get a list of all the bookies in the cluster? ---- 2018-12-10 22:56:06 UTC - Christophe Bornet: It seems that `getZkBookieRackMappingCache` tried to update `racksWithHost` with the correct keys but the reference used by `getRack` is still the old one. ---- 2018-12-10 23:01:23 UTC - David Kjerrumgaard: Are you running in a K8s environment?
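For reference, the bare-metal doc linked above enables the functions worker through a broker config flag; a sketch of the relevant fragment (check the linked page for the exact setting in your version):

```
# conf/broker.conf
functionsWorkerEnabled=true
```

After changing this, the brokers need to be restarted for the function worker service to initialize.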
---- 2018-12-10 23:13:51 UTC - Matteo Merli: `bookkeeper shell listbookies -readwrite` ---- 2018-12-10 23:20:00 UTC - Christophe Bornet: There should be a workaround for the `racksWithHost` but it doesn't seem to work for me : <https://github.com/apache/pulsar/blob/v2.2.0/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMapping.java#L118> ---- 2018-12-10 23:50:21 UTC - Christophe Bornet: @Matteo Merli it works if I return `racksWithHost` instead of `racks` at <https://github.com/apache/pulsar/blob/v2.2.0/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMapping.java#L134> . I think it's a bug. Should I do a PR ? ---- 2018-12-11 00:41:02 UTC - Mike Card: @Matteo Merli repeated my test using the asynchronous send API. I would guess the synchronous send API is doing exactly what I am doing here: `retryEventProducer.sendAsync(newRetryRefBuffer.array()).thenAccept(msgId -> {});` ---- 2018-12-11 00:42:15 UTC - Mike Card: I still get the same 64-byte message truncation I was seeing before. If send() is just calling the sendAsync() API then perhaps there is a problem queuing messages in the send queue under very high (say 15 kHz) write rates ---- 2018-12-11 00:44:21 UTC - Mike Card: @Matteo Merli when I switched to the asynchronous send API I set block-if-queue-full to true on all the producers. ---- 2018-12-11 01:46:16 UTC - Harry Rickards: @Harry Rickards has joined the channel ---- 2018-12-11 06:58:15 UTC - Cristian: @Cristian has joined the channel ---- 2018-12-11 07:03:03 UTC - Cristian: Hello people! I'm trying to understand Pulsar's schema registry and seeing how it compares with the one that Confluent developed for Kafka.
I don't see in the docs whether Pulsar supports configuring evolution compatibility modes for Avro schemas (this is what I mean: <https://docs.confluent.io/current/avro.html#avro-backward-compatibility>) ---- 2018-12-11 07:13:17 UTC - Sijie Guo: @Cristian I think the evolution compatibility modes will be supported in the upcoming 2.3.0 release. It is not yet supported in 2.2.0. ---- 2018-12-11 07:44:47 UTC - Ivan Kelly: @Sijie Guo we need to document that. there's very little documentation on actually using schemas +1 : jia zhai ---- 2018-12-11 09:10:16 UTC - 陈琳: @陈琳 has joined the channel ----
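The producer-side flow control that comes up in this log (counting "messages in flight", and the block-if-queue-full behavior Mike Card mentions) can be sketched with a stdlib-only semaphore. This is a hypothetical illustration, not the Pulsar client; `sendAsync` here is a stand-in for `Producer.sendAsync(byte[])`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical sketch: cap the number of in-flight async sends with a
// semaphore, releasing a permit when each send completes. This is roughly
// the back-pressure that blockIfQueueFull(true) provides inside the client.
class BoundedSender {
    private final Semaphore permits;
    private final Function<byte[], CompletableFuture<Object>> sendAsync;

    BoundedSender(int maxInFlight, Function<byte[], CompletableFuture<Object>> sendAsync) {
        this.permits = new Semaphore(maxInFlight);
        this.sendAsync = sendAsync;
    }

    CompletableFuture<Object> send(byte[] payload) throws InterruptedException {
        permits.acquire();  // blocks the caller once maxInFlight sends are outstanding
        return sendAsync.apply(payload)
                .whenComplete((msgId, t) -> permits.release());
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The point of the design is that the producer thread stalls instead of queueing unboundedly at very high write rates; the permit is returned whether the send succeeds or fails.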
