2018-01-22 18:59:47 UTC - Allen Wang: @Matteo Merli It looks like the Kafka wrapper always subscribes to topics using “failover” mode and limits the consumption rate when there is a large number of partitions.
----
2018-01-22 19:02:05 UTC - Matteo Merli: Using the “failover” mode matches the same mode of consumption as Kafka. The problem with using the “Shared” subscription type is that the Kafka API does not express individual acks, just offset updates
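For reference, a minimal sketch of picking the subscription type with the native Java client (builder-style API from later client releases; the service URL, topic, and subscription names here are illustrative):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SubscriptionType;

public class SubscriptionModes {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Failover: one active consumer per partition, matching Kafka's model.
        // Shared: messages go round-robin to all consumers, acked individually.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-subscription")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        consumer.close();
        client.close();
    }
}
```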
----
2018-01-22 19:05:27 UTC - Allen Wang: We created 20 consumer instances but they 
can only consume at 90 messages/second (aggregated) when the producers produce 
at 1000 messages/second. The topic has 4800 partitions.
----
2018-01-22 19:08:16 UTC - Allen Wang: It looks like in “failover” mode, only 
one consumer instance will be consuming, correct?
----
2018-01-22 19:30:49 UTC - Matteo Merli: one per partition
----
2018-01-22 19:33:02 UTC - Matteo Merli: I haven’t seen this issue. Let me run the test with the same number of partitions to see if I can reproduce it
----
2018-01-22 22:41:00 UTC - Nicolas Ha: What is the common way to `.receive` 
continuously from multiple consumers? Put all these `.receive` operations in a 
thread pool?
And if you want to cancel it, check before calling the next `.receive` for a 
given consumer?
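i.e. something like this sketch (assuming an `executor` and `consumer` already exist; uses `java.util.concurrent`):
```
// Poll with a timeout so the loop can notice a cancellation flag.
AtomicBoolean cancelled = new AtomicBoolean(false);
executor.submit(() -> {
    while (!cancelled.get()) {
        Message<byte[]> msg = consumer.receive(100, TimeUnit.MILLISECONDS);
        if (msg != null) {
            // process the message, then ack it
            consumer.acknowledge(msg);
        }
    }
    return null;
});
```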
----
2018-01-22 22:41:47 UTC - Matteo Merli: I think the easiest approach is to set a listener on the consumer
----
2018-01-22 22:42:20 UTC - Matteo Merli: the listeners for all the topics are invoked on a dedicated thread pool (default size: 1 thread)
----
2018-01-22 22:42:55 UTC - Matteo Merli: 
<https://github.com/apache/incubator-pulsar/blob/master/pulsar-client/src/test/java/org/apache/pulsar/client/tutorial/SampleConsumerListener.java#L37>
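A minimal version of that pattern could look like this (topic and subscription names are illustrative; the pool size can be raised with `listenerThreads` on the client builder):
```
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class ListenerExample {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .listenerThreads(4) // size of the shared listener thread pool
                .build();

        client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-subscription")
                .messageListener((consumer, msg) -> {
                    // Invoked on the listener pool for each incoming message
                    System.out.println("Received: " + new String(msg.getData()));
                    consumer.acknowledgeAsync(msg);
                })
                .subscribe();
    }
}
```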
eyes : Nicolas Ha
+1 : Nicolas Ha
----
2018-01-22 22:50:14 UTC - Allen Wang: @Matteo Merli We changed the consumer to the native consumer using “shared” mode. It works well and has no problem consuming at a high rate with a large number of partitions.
----
2018-01-22 22:50:56 UTC - Matteo Merli: Ok, but that shouldn’t have issues even 
in the Failover configuration :slightly_smiling_face:
----
2018-01-22 22:51:54 UTC - Matteo Merli: I haven’t gotten to reproducing it yet. Will do shortly
----
2018-01-22 23:29:08 UTC - Jaebin Yoon: We currently have 10 brokers and 10 bookies in the cluster, and 20 producers producing to a topic with 4800 partitions. I noticed that only 4 brokers are being used currently. I'm not sure how the current load balancer works, but will this be rebalanced when the traffic increases? Where should I look to tweak this load balancing?
----
2018-01-22 23:33:54 UTC - Matteo Merli: There are a few parameters to look at:
 1. The topic assignments to brokers are done in terms of “bundles”, that is, in groups of topics
 2. Topics are matched to bundles by hashing on the name
 3. Effectively, a bundle is a hash range that topics fall into
 4. Initially, the default is to have 4 “bundles” for a namespace
 5. When the traffic increases on a given bundle, it will be split in 2 and reassigned to a different broker
 6. There are some adjustable thresholds that can be used to control when the split happens, based on number of topics/partitions, messages in/out, bytes in/out, etc.
 7. It’s also possible to specify a higher number of bundles when creating a namespace, as shown below
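For example, with the `pulsar-admin` CLI (namespace name illustrative; namespaces were named property/cluster/namespace at the time):
```
bin/pulsar-admin namespaces create my-prop/us-west/my-ns --bundles 16
```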
----
2018-01-22 23:34:46 UTC - Matteo Merli: In addition, there are the load-manager thresholds that control when a broker should offload some of its bundles to other brokers
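For example, `broker.conf` carries settings along these lines (key names from the broker config; the values here are illustrative, not recommendations):
```
# Allow hot bundles to be split automatically
loadBalancerAutoBundleSplitEnabled=true
# Split a bundle once it crosses these thresholds
loadBalancerNamespaceBundleMaxTopics=1000
loadBalancerNamespaceBundleMaxMsgRate=30000
loadBalancerNamespaceBundleMaxBandwidthMbytes=100
# Shed bundles off a broker whose resource usage exceeds this percentage
loadBalancerBrokerOverloadedThresholdPercentage=85
```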
----
2018-01-22 23:38:23 UTC - Jaebin Yoon: @Matteo Merli thanks a lot for the detailed explanation. This gives me some ideas. I'll look into it.
----
2018-01-23 03:01:23 UTC - Matteo Merli: @Allen Wang I’m running the producers and consumers with 4800 partitions, using `pulsar-perf` with the consumers in Failover mode (and running multiple of them). I’m not seeing any strange behavior; the traffic is evenly spread across all the available consumers. I haven’t tested with the Kafka wrapper yet; that will be my next test.
----
2018-01-23 12:01:12 UTC - Benjamin Lupton: What are the minimum requirements for Apache Pulsar? Looking at <https://pulsar.incubator.apache.org/docs/latest/deployment/cluster/>, that is quite an expensive set of requirements for an early startup.
----
2018-01-23 13:57:35 UTC - jia zhai: @Benjamin Lupton You need 3 kinds of clusters: bookie, broker, and ZooKeeper. But if you don't have enough resources, it is OK to run the bookie, ZooKeeper, and broker on the same machine.
There is already a command and config that can run the broker and bookie together: `PulsarBrokerStarter --broker-conf broker.conf --run-bookie --bookie-conf bookie.conf`. PR 1023 contains more info: <https://github.com/apache/incubator-pulsar/pull/1023>
----
2018-01-23 16:19:28 UTC - Matteo Merli: @Benjamin Lupton As @jia zhai said, there are several components, but in a small deployment they can be collapsed into a handful of nodes. If you’re in AWS, there’s a Terraform+Ansible combination to get a cluster up with 3 nodes: <http://pulsar.apache.org/docs/latest/deployment/aws-cluster/> (+ 3 small VMs for ZooKeeper)
----
2018-01-23 16:20:14 UTC - Matteo Merli: and even the 3 ZK processes could be 
co-hosted on the same 3 VMs
----
2018-01-23 16:20:52 UTC - Matteo Merli: In general, if you want the data to be replicated on at least 3 machines, having 3 nodes is the minimum cluster size.
----
2018-01-23 16:21:23 UTC - Matteo Merli: (If you don’t need replication, then the minimum is to use the Standalone service)
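Standalone runs a broker, a bookie, and ZooKeeper together in a single process:
```
bin/pulsar standalone
```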
----
