Slack digest for #general - 2020-07-02

Apache Pulsar Slack Thu, 02 Jul 2020 02:12:09 -0700

2020-07-01 10:28:45 UTC - Nhat Ha Trinh: @Nhat Ha Trinh has joined the channel
----
2020-07-01 10:30:36 UTC - Richard Wilkinson: @Richard Wilkinson has joined the 
channel
----
2020-07-01 10:41:39 UTC - Nhat Ha Trinh: Hi, I've got a question about Pulsar 
Encryption. pulsar-client, pulsar-client-admin and 
pulsar-client-messagecrypto-bc version 2.5.2 are used and I've got a 
NoSuchMethodError with org/apache/pulsar/shade/io/netty/buffer/ByteBuf when it 
tried to send data with Pulsar Encryption. I thought the 
org.apache.pulsar.shade.io.netty.buffer.ByteBuf
 should be included in the pulsar-client package.  I wonder if anyone knows the root 
cause of this. Thanks
----
2020-07-01 11:54:28 UTC - Hiroyuki Yamada: @Sijie Guo @Penghui Li Found another 
bug in 2.6.0.  <https://github.com/apache/pulsar/issues/7414> PTAL.
----
2020-07-01 11:58:54 UTC - Penghui Li: @Hiroyuki Yamada Thanks, I will take a 
look.
man-bowing : Hiroyuki Yamada
+1 : Hiroyuki Yamada
----
2020-07-01 11:59:20 UTC - Hiroyuki Yamada: Thanks !
----
2020-07-01 12:27:41 UTC - Nhat Ha Trinh: I found the same issue on GitHub, but I 
am not sure why the ticket was closed 
(<https://github.com/apache/pulsar/issues/6834>)
----
2020-07-01 12:38:08 UTC - Penghui Li: I have pushed a PR 
<https://github.com/apache/pulsar/pull/7416>. Please help take a look.
+1 : Hiroyuki Yamada
----
2020-07-01 12:42:15 UTC - Hiroyuki Yamada: Great !
----
2020-07-01 16:33:46 UTC - Abhishek Varshney: I am trying to understand the 
`effectively-once` delivery semantics by referring to this article. 
<https://www.splunk.com/en_us/blog/it/effectively-once-semantics-in-apache-pulsar.html>.
It says,
> The fundamental capability that is required for effectively-once 
consumption is to tie the act of processing the data and storing its 
transformed output with the act of “acknowledging” the message in a single 
atomic action.
> Since the Consumer API is not apt for this, we have introduced the 
<http://pulsar.apache.org/docs/en/concepts-clients/#reader-interface|Reader> 
concept in Pulsar
The question that I have is, why is the `consumer` API not apt for this? Why can't 
the consumer `consume` a message, record its `id` and `ack` back to the broker 
using `consumer` API itself?
----
2020-07-01 16:44:26 UTC - Addison Higham: I am not totally sure what question 
you are asking, but if I understand correctly, let me just create a scenario that 
illustrates it:


1. consumer receives the message, processes it and makes a change
2. consumer acks (which does essentially what you described, it sends back the 
id)
3. before the ack is confirmed, the connection breaks

Upon reconnect, the client may see the same message again. Since processing the 
message and acking the message aren't done atomically (or transactionally), 
there can always be some event in between processing the message and acking the 
message that means we can't guarantee effectively-once WITHOUT the client 
keeping state.
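
The scenario above can be sketched as a small simulation. This is a toy model, not the Pulsar client API: `ToySubscription`, `consume`, and the integer message ids are all made up for illustration (real Pulsar message ids are structured values, not plain integers). It shows the "change" being applied twice when an ack is lost, and how client-side state avoids that.

```python
class ToySubscription:
    """Toy stand-in for a broker subscription (not the Pulsar client API).

    Anything whose ack never reached the broker is delivered again.
    """

    def __init__(self, messages):
        self.messages = list(messages)  # (message_id, payload) pairs, in order
        self.acked = set()

    def pending(self):
        return [(mid, body) for mid, body in self.messages if mid not in self.acked]

    def ack(self, message_id, delivered=True):
        # delivered=False models step 3: the connection breaks before
        # the broker records the ack.
        if delivered:
            self.acked.add(message_id)


def consume(messages, lost_ack_id, remember_last_id):
    sub = ToySubscription(messages)
    total = 0         # the "change" made by processing: a running sum
    last_seen = None  # client-side state: highest id already processed

    # First connection: process until the ack for lost_ack_id goes missing.
    for mid, body in sub.pending():
        total += body
        last_seen = mid
        if mid == lost_ack_id:
            sub.ack(mid, delivered=False)
            break
        sub.ack(mid)

    # After reconnect: the broker redelivers everything it has no ack for.
    for mid, body in sub.pending():
        if remember_last_id and last_seen is not None and mid <= last_seen:
            sub.ack(mid)  # already applied; just ack again and move on
            continue
        total += body
        last_seen = mid
        sub.ack(mid)
    return total
```

With messages `[(1, 10), (2, 20), (3, 30)]` and the ack for id 2 lost, `consume(..., remember_last_id=False)` returns 80 because message 2 is counted twice, while `remember_last_id=True` returns the correct 60.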
----
2020-07-01 16:51:29 UTC - Addison Higham: The obvious thing here is that if the 
client knows the position, (the last message id it saw) it can either reset the 
cursor position or just skip any messages before the ID it saw last.

That is possible with the consumer API, however, the reader API is a better fit 
for that use case. If you already have to keep state, then you are doing the 
thing that the consumer API gives you: tracking the state of your cursor. 
Internally, a reader is very similar to a consumer, the biggest differences are:
1. the cursor position isn't persisted in bookkeeper
2. the reader auto-acks
3. upon re-connect the reader doesn't re-send any messages

The first point there means readers can be slightly less expensive, but the next 2 
points are actually advantages if you are going to track cursor position 
locally, as now you don't need to skip messages or worry about acks, as you 
will just reset your cursor position as needed
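
A minimal sketch of "tracking cursor position locally", assuming the output lives in a store that supports transactions. SQLite stands in for that store here, a plain Python list stands in for the topic, and integer ids stand in for Pulsar message ids; `open_store`, `last_position`, and `process_from` are hypothetical names, not part of any Pulsar API. The point is that the transformed output and the last processed id are committed in one transaction, so a restart resumes exactly after the last committed message.

```python
import sqlite3


def open_store(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS output (message_id INTEGER PRIMARY KEY, value TEXT)"
    )
    # Single-row table holding the locally tracked position.
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint "
        "(id INTEGER PRIMARY KEY CHECK (id = 0), last_message_id INTEGER)"
    )
    db.commit()
    return db


def last_position(db):
    row = db.execute("SELECT last_message_id FROM checkpoint WHERE id = 0").fetchone()
    return row[0] if row else None


def process_from(db, log, crash_after=None):
    """Process messages after the stored position; optionally stop early."""
    start = last_position(db)
    handled = 0
    for message_id, payload in log:
        if start is not None and message_id <= start:
            continue  # a real reader would be positioned here instead of scanning
        with db:  # one transaction: output and position commit together
            db.execute(
                "INSERT INTO output VALUES (?, ?)", (message_id, payload.upper())
            )
            db.execute(
                "INSERT OR REPLACE INTO checkpoint VALUES (0, ?)", (message_id,)
            )
        handled += 1
        if crash_after is not None and handled == crash_after:
            return  # simulate the process dying between messages
```

Calling `process_from(db, log, crash_after=2)` and then `process_from(db, log)` again writes each message's output once, because the second run starts from the committed position rather than from any broker-side ack state.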
----
2020-07-01 16:59:47 UTC - Abhishek Varshney: Thanks for the explanation 
@Addison Higham.
1. So, that means with the `reader` API `ack` is not required as the consumer 
keeps track of the offset. Please correct me if wrong.
2. Does this also mean that with the `reader` API, negative acks are not 
supported?
3. Also, with the `consumer` API, if the consumer just checks the `id` of the 
last message it has processed before processing this message, wouldn't that 
achieve the `effectively-once` semantics? Or do you see any issues with that 
approach?
----
2020-07-01 17:03:53 UTC - Addison Higham: 1. I haven't looked at the details of 
how exactly it works in a while, but if I am remembering correctly, acks are 
still sent, but the client does it automatically as soon as it sees the message 
(i.e. before you do any processing)
2. I would assume no, but I am not positive
3. this can work with exclusive consumers, but is more complicated with any 
other consumer types
+1 : Abhishek Varshney
----
2020-07-01 20:08:37 UTC - Joshua Decosta: Has anyone successfully enabled TLS 
encryption on the local standalone setup? I’m trying to follow the docs to 
enable it but the conf files note to not use `useTls` anymore and I know from 
enabling authn/authz on pulsar that there are configurations that need to be 
included in the `conf/standalone`. Any help would be appreciated.
----
2020-07-01 20:09:54 UTC - Joshua Decosta: Also is there a way to use JWT for 
Authn/Authz for everything except the broker to broker communication?
----
2020-07-02 00:43:01 UTC - Ambud Sharma: thanks @Matteo Merli for the 
confirmation; are there any thoughts on introducing some sort of checkpointing 
barrier in the future to preserve ordering?
----
2020-07-02 03:22:56 UTC - Luke Stephenson: With pulsar 2.6.0 I'm seeing the CPU 
use on brokers increase even when the cluster should be idle.  I'll start a 
thread with a bit more info.
----
2020-07-02 03:23:15 UTC - Luke Stephenson: Here is a screenshot showing the CPU 
increasing.
----
2020-07-02 03:25:20 UTC - Luke Stephenson: So I restarted all the brokers.  CPU 
initially starts low.  But after a while something happens which triggers the 
CPU to start a linear growth curve.
----
2020-07-02 03:27:07 UTC - Luke Stephenson: Looking at the process, there are a 
couple of threads using a substantial amount of CPU.
```root@pulsar-broker-1:/pulsar#  top -n 1 -H -p 51
top - 03:25:44 up 2 days,  1:44,  0 users,  load average: 0.12, 0.09, 0.14
Threads: 158 total,   0 running, 158 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.3 us,  3.3 sy,  0.0 ni, 93.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15576.3 total,   7665.2 free,   2606.7 used,   5304.5 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  12755.5 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  134 root      20   0 9000576   1.8g  30612 S   6.7  12.2   0:42.58 bookkeeper-ml-w
  135 root      20   0 9000576   1.8g  30612 S   6.7  12.2   0:41.10 bookkeeper-ml-w```
----
2020-07-02 03:28:11 UTC - Luke Stephenson: And if I look at the output of stack 
for that thread, the thread is `bookkeeper-ml-workers-OrderedExecutor-6-0`.
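
One common way to get from a `top -H` thread PID to a Java thread name is to convert the PID to hexadecimal and look for it as the `nid=` field in a `jstack` thread dump. The sketch below assumes a JDK whose `jstack` prints `nid` in hexadecimal; the helper names are made up for illustration.

```python
def jstack_nid(thread_pid):
    """Format a thread PID from `top -H` the way jstack prints it (hex nid)."""
    return f"nid={thread_pid:#x}"


def find_thread_name(jstack_output, thread_pid):
    """Return the quoted thread name whose header line carries the matching nid."""
    needle = jstack_nid(thread_pid)
    for line in jstack_output.splitlines():
        # Thread header lines start with the quoted thread name. Compare whole
        # whitespace-separated tokens so nid=0x86 does not match nid=0x860.
        if line.startswith('"') and needle in line.split():
            return line.split('"')[1]
    return None
```

For the two PIDs in the `top` output above, `jstack_nid(134)` gives `nid=0x86` and `jstack_nid(135)` gives `nid=0x87`.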
----
2020-07-02 03:29:31 UTC - Luke Stephenson: That is as far as I've got so far.  
That thread is in a waiting state when I look at it.
----
2020-07-02 03:46:33 UTC - yingziyu: @yingziyu has joined the channel
----
2020-07-02 03:46:56 UTC - Luke Stephenson: If I had to guess, a scheduled job 
is rescheduling itself and it's getting more frequently executed.
----
2020-07-02 03:47:10 UTC - Luke Stephenson: Bookies are showing no load.
----
2020-07-02 03:52:30 UTC - Alan Kittel: I am deploying a cluster on AWS EC2 with 
Terraform.  I would like to be able to turn off the instances in the cluster 
when I am not using them, but the bookie instances fail to start back up if I 
do so.  For the bookies, I am using the default configuration from the 
Terraform and Ansible configuration files for Pulsar to set up.  I noticed that 
the `/mnt/journal` and `/mnt/storage` directories are being mounted to 
ephemeral storage on a nvme drive, as defined in `setup-disk.yaml`, which would 
be lost if the instance were shut down.  I attempted to mount these directories 
to the existing EBS volume and to attach additional EBS volumes to the 
instances and mount to those volumes, but had the same issue where the bookies 
would fail to boot up after being turned off.

For anyone who might be able to help me with this issue, I have a couple of 
questions:
1. Is it possible to set up the bookies so that they can be turned off and back 
on again later?
2. If yes to the first question, if `/mnt/journal` and `/mnt/storage` on the 
bookies were to be mounted to a persistent (but slower) volume to retain data 
after shut down, would this be a cause for concern in terms of I/O performance?
----
2020-07-02 04:21:06 UTC - Luke Stephenson: When the CPU started to increase, I 
must have triggered something which caused the broker to load a bundle. I see 
`00:28:56.865 [pulsar-1-20] INFO org.apache.pulsar.broker.PulsarService - 
Loading all topics on bundle: goanna/test/0xc9249246_0xdb6db6d8`.  Shortly 
afterwards there are also a lot of exceptions reported
----
2020-07-02 06:14:19 UTC - Sijie Guo: It looks to be something related to 
tiered storage
----
2020-07-02 06:14:26 UTC - Sijie Guo: and it hits an NPE
----
2020-07-02 06:15:53 UTC - Sijie Guo: If the broker is configured to use Token 
authentication, you also need to use Token authentication for broker to 
broker communication.

Alternatively you can configure multiple authentication providers. Pulsar 
supports chaining authentication providers.
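
As a rough illustration of both options, a broker configuration fragment might look like the following. This is a sketch, not a tested setup: the file paths are placeholders, and pairing the token provider with the TLS provider is only one example of listing more than one provider.

```
# conf/broker.conf or conf/standalone.conf (paths below are placeholders)
authenticationEnabled=true
# Comma-separated list of enabled providers: token plus TLS in this example
authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderToken,org.apache.pulsar.broker.authentication.AuthenticationProviderTls
tokenSecretKey=file:///path/to/secret.key
# Credentials the broker itself presents when it connects to other brokers
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationToken
brokerClientAuthenticationParameters=file:///path/to/broker-token.txt
```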
----
2020-07-02 06:16:18 UTC - Sijie Guo: What are the issues you hit?
----
2020-07-02 06:17:12 UTC - Luke Stephenson: I am making use of tiered storage
----
2020-07-02 06:19:08 UTC - Sijie Guo: 1. If you turn off the instances, it will 
trigger bookie auto-recovery. So when you turn off an instance, you need to 
ensure data is re-replicated to other instances before turning off other 
instances.
2. Ideally you should put the storage in a persistent volume. 
----
2020-07-02 06:20:14 UTC - Sijie Guo: This seems to be related to the issue 
reported here: <https://github.com/apache/pulsar/pull/7389>
----
