2020-04-03 09:35:58 UTC - Foong Fook Won: @Foong Fook Won has joined the channel
----
2020-04-03 11:15:56 UTC - Poul Henriksen: I have a question about the message listener implementation in the Java client.
In <https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/MessageListener.html#received-org.apache.pulsar.client.api.Consumer-org.apache.pulsar.client.api.Message-> it is stated that:
```This method is called whenever a new message is received. Messages are guaranteed to be delivered in order and from the same thread for a single consumer. This method will only be called once for each message, unless either application or broker crashes. Application is responsible of handling any exception that could be thrown while processing the message.```
Imagine a case of having 2 listener threads in the pool (A and B), and 3 consumers with the following characteristics:
- Consumer 1: 1 message every 20 seconds, but processing always takes 20 seconds
- Consumer 2: 1 message every hour, processing always takes 1 millisecond
- Consumer 3: 1000 messages every second, processing always takes 1 millisecond
Consumers 1 and 3 could then be assigned to the same thread (Thread A). When Consumer 1 is using Thread A, it will block it for 20 seconds, and Consumer 3 will not get any messages processed in that period, even though it could just have used the idling Thread B for processing. Is my understanding correct? If yes, then that seems like a potential performance issue...
----
2020-04-03 13:59:04 UTC - Dhaval Kotecha: @Dhaval Kotecha has joined the channel
----
2020-04-03 16:50:41 UTC - Matteo Merli: The listener is implemented using a thread pool, and you can define the size of that pool (the default is 1). Each consumer is assigned to one of the threads in the pool, in round-robin fashion. If you have 3 consumers and 3 threads, they won't impact each other.
----
2020-04-03 16:51:07 UTC - Matteo Merli: Also, you can always "jump" to other threads to do the processing.
----
2020-04-03 16:56:46 UTC - Poul Henriksen: The above was just an example. With 1000s of consumers I don’t want 1000s of threads just to avoid this; that would defeat the purpose of a thread pool.
----
2020-04-03 16:56:56 UTC - Poul Henriksen: What do you mean by jumping?
----
2020-04-03 16:59:48 UTC - Matteo Merli: Using your own thread pool
----
2020-04-03 17:00:32 UTC - Matteo Merli: At the end of the day, in the presence of blocking operations, it's not possible to offer a "fair" execution environment in a general form.
----
2020-04-03 17:01:08 UTC - Matteo Merli: If you have more information on the priorities of each consumer, you can process their messages in different sets of threads.
----
2020-04-03 17:01:36 UTC - Poul Henriksen: Yes, using my own thread pool would be my workaround. I just wanted to make sure I understood the behaviour.
----
2020-04-03 17:04:02 UTC - Poul Henriksen: I would want all threads to be active whenever there are messages to process. I don’t think this is an issue of fairness so much as one of utilising available resources.
----
2020-04-03 17:14:03 UTC - Poul Henriksen: But thanks for your answer. I'll look at implementing a workaround for it.
----
2020-04-03 17:30:55 UTC - Matteo Merli: The reason consumers are pinned to 1 thread in the listener is to maintain ordering. If you don't need ordering, you can just create an Executor with N threads and do your processing there for all the consumers. All the threads will be busy processing messages, in any order. Just be careful to put a max size on the Executor queue so that backpressure can be applied back on the consumers.
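[Editor's note: a minimal sketch of the listener-pool configuration Matteo describes above, using the Java client's `listenerThreads` builder setting; the service URL, topic, and subscription names are placeholders, not values from this conversation.]
```
import org.apache.pulsar.client.api.*;

public class ListenerPoolSketch {
    public static void main(String[] args) throws PulsarClientException {
        // Size the shared listener thread pool (the default is 1). Each
        // consumer is pinned to one of these threads, assigned round-robin.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .listenerThreads(3)
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")          // placeholder
                .subscriptionName("my-sub") // placeholder
                .messageListener((c, msg) -> {
                    // Runs on the consumer's pinned listener thread; blocking
                    // here stalls every consumer sharing that thread, which is
                    // exactly the scenario Poul describes.
                    c.acknowledgeAsync(msg);
                })
                .subscribe();
    }
}
```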
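[Editor's note: and a sketch of the hand-off Matteo suggests in the last message above: a shared Executor with a bounded queue. CallerRunsPolicy is one assumed way to push back on the listener thread when the queue fills, which in turn applies backpressure to the consumers; per-consumer ordering is lost across the workers, which is why Poul objects below.]
```
import java.util.concurrent.*;

import org.apache.pulsar.client.api.*;

public class ExecutorHandOffSketch {
    public static void main(String[] args) throws PulsarClientException {
        // Bounded queue + CallerRunsPolicy: when the queue is full, the
        // listener thread runs the task itself, which stalls delivery to
        // its consumers and so pushes back on them.
        ExecutorService workers = new ThreadPoolExecutor(
                8, 8, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1_000),
                new ThreadPoolExecutor.CallerRunsPolicy());

        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();

        client.newConsumer()
                .topic("my-topic")          // placeholder
                .subscriptionName("my-sub") // placeholder
                .messageListener((c, msg) -> workers.execute(() -> {
                    try {
                        // Application-specific processing would go here; note
                        // that messages from the same consumer may now be
                        // handled out of order.
                        c.acknowledgeAsync(msg);
                    } catch (Exception e) {
                        c.negativeAcknowledge(msg);
                    }
                }))
                .subscribe();
    }
}
```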
----
2020-04-03 17:39:42 UTC - Poul Henriksen: I do want to ensure ordering though, so unfortunately it will not be that easy.
----
2020-04-03 17:43:59 UTC - Matteo Merli: Then I think using 2 (or more) sets of thread pools, based on processing time, could be a way to do it.
----
2020-04-03 17:48:08 UTC - Adime: @Adime has joined the channel
----
2020-04-03 19:32:02 UTC - Tobias Macey: Hello folks, I am the host of the Data Engineering Podcast and Podcast.__init__, and I am working with O'Reilly on a new book that showcases the work and challenges of data engineers. I am collecting short essays (~400 - 500 words) from industry veterans and experts such as yourself that will be compiled into the final work. This project is a collection of pearls of wisdom gathered from leading experts like you -- whether you are or have been a data engineer, or you work closely with data engineers in your daily work. The collection will include multiple and varied perspectives on key topics, ideas, skills, and future-focused thoughts that data engineers should be aware of.
We're looking for articles on a wide range of topics: building pipelines, batch and stream processing, data privacy, data security, data governance and lineage, data storage, career advice, data team makeup and culture -- and everything in between, from granular issues to big-picture concepts. What are the challenges? How did you fail? What have you learned? When did the light bulb go off? What do you wish you knew when you started?
Submissions are being accepted via the Google form found at <http://boundlessnotions.com/97things|boundlessnotions.com/97things>, which also contains additional details and instructions. Feel free to submit multiple articles, but please have them to me within the next 3 - 4 weeks so that we can be sure to hit our publishing deadline. You can find out more about the existing "97 Things Every Programmer Should Know" book at <https://www.safaribooksonline.com/library/view/97-things-every/9780596809515/>. If you know anyone else who might be interested in participating, please pass this invitation along to your friends and colleagues as well. Thank you for your time and interest, and we look forward to learning from you!
:tada: Matteo Merli, Karthik Ramasamy · :clap: Karthik Ramasamy
----
2020-04-03 23:19:11 UTC - Tim Corbett: I've finished the first draft of my C# performance-testing client. There are some feature gaps, but I hope it shows the issue pretty well. The next few posts will contain the output from that tool running with 200 consumers against a single topic in KeyShared mode.
----
2020-04-03 23:19:57 UTC - Tim Corbett: Ignore the "super-accurate" instance clocks that allow end-to-end latency to come back negative. This is a static load of 200 consumers.
```
=================================================
Receive latency:
Min: 0.0264 Max: 294.9339 Mean: 74.62139855072465
Out-of-bounds results: 0 / 69
0.001 0.01 0.1 1 10 100 1000 10000 100000 1000000
000000000010000000000000000000000001441332211920000000000000000000000000000000000000000000
-------------------------------------------------
End-to-end latency:
Min: -8.7073 Max: 1.4427 Mean: -4.261844927536234
Out-of-bounds results: 61 / 69
0.1 1 10 100 1000 10000 100000 1000000
002004020800000000000000000000000000000000000000000000000000000000000000
=================================================
```
----
2020-04-03 23:20:16 UTC - Tim Corbett: This is from adding a consumer:
```
=================================================
Receive latency:
Min: 0.0179 Max: 2706.6598 Mean: 67.08200945945943
Out-of-bounds results: 0 / 74
0.001 0.01 0.1 1 10 100 1000 10000 100000 1000000
000000000190000000000000100154310301951001100310000000000000000000000000000000000000000000
-------------------------------------------------
End-to-end latency:
Min: 4.3639 Max: 2607.2919 Mean: 1037.0302081081088
Out-of-bounds results: 0 / 74
0.1 1 10 100 1000 10000 100000 1000000
000000000000100000600000000002112001790000000000000000000000000000000000
=================================================
```
----
2020-04-03 23:20:37 UTC - Tim Corbett: And then removing that added consumer, which ended up even worse:
```
=================================================
Receive latency:
Min: 0.02 Max: 9150.0224 Mean: 78.59670518518516
Out-of-bounds results: 0 / 135
0.001 0.01 0.1 1 10 100 1000 10000 100000 1000000
000000000050000000000000000041121210931000000000000000000000000000000000000000000000000000
-------------------------------------------------
End-to-end latency:
Min: 5.8923 Max: 8831.887 Mean: 4458.832968888888
Out-of-bounds results: 0 / 135
0.1 1 10 100 1000 10000 100000 1000000
000000000000000000100000000000000100678695760000000000000000000000000000
=================================================
```
----
2020-04-03 23:21:02 UTC - Tim Corbett: However, the removal case may not have been a clean shutdown. That is something I will investigate as I improve my tool.
----
2020-04-03 23:22:12 UTC - Tim Corbett: The funny line of numbers at the bottom of the reports is a log-linear histogram, normalized such that 9 represents the largest bucket and the other numbers indicate sizes relative to that bucket.
----
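[Editor's note: a minimal Java sketch of the normalization Tim describes; his tool is C#, and `renderRow` and its bucket counts are hypothetical, not code from it.]
```
// Render one histogram row: each bucket becomes a single digit 0-9,
// where 9 marks the largest bucket and the other digits scale linearly
// against it (one plausible reading of Tim's description).
static String renderRow(long[] buckets) {
    long max = 0;
    for (long b : buckets) {
        max = Math.max(max, b);
    }
    StringBuilder row = new StringBuilder(buckets.length);
    for (long b : buckets) {
        // An empty histogram prints all zeros; otherwise scale to 0-9.
        row.append(max == 0 ? 0 : Math.round(9.0 * b / max));
    }
    return row.toString();
}
```
With this linear scaling a small but non-empty bucket can round down to 0, so a real implementation might clamp non-empty buckets to at least 1 to keep them visible.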
