2018-01-22 18:59:47 UTC - Allen Wang: @Matteo Merli It looks like the Kafka wrapper always subscribes to topics using “failover” mode and limits the consumption rate when there is a large number of partitions.
----
2018-01-22 19:02:05 UTC - Matteo Merli: Using the “failover” mode matches the same mode of consumption as Kafka. The problem with using the “Shared” subscription type is that the Kafka API does not express individual acks, just offset updates
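For reference, a minimal sketch of picking the subscription type with the native Java client (builder-style API from later client releases; the service URL, topic, and subscription names here are illustrative):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SubscriptionType;

public class SubscriptionModes {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Failover: one active consumer per partition, matching Kafka's model.
        // Shared: messages go round-robin to all consumers, acked individually.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-subscription")
                .subscriptionType(SubscriptionType.Failover)
                .subscribe();

        consumer.close();
        client.close();
    }
}
```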
----
2018-01-22 19:05:27 UTC - Allen Wang: We created 20 consumer instances but they 
can only consume at 90 messages/second (aggregated) when the producers produce 
at 1000 messages/second. The topic has 4800 partitions.
----
2018-01-22 19:08:16 UTC - Allen Wang: It looks like in “failover” mode, only 
one consumer instance will be consuming, correct?
----
2018-01-22 19:30:49 UTC - Matteo Merli: one per partition
----
2018-01-22 19:33:02 UTC - Matteo Merli: I haven’t seen this issue. Let me run the test with the same number of partitions to see if I can reproduce it
----
2018-01-22 22:41:00 UTC - Nicolas Ha: What is the common way to `.receive` 
continuously from multiple consumers? Put all these `.receive` operations in a 
thread pool?
And if you want to cancel it, check before calling the next `.receive` for a 
given consumer?
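i.e. something like this sketch (assuming an `executor` and `consumer` already exist; uses `java.util.concurrent`):
```
// Poll with a timeout so the loop can notice a cancellation flag.
AtomicBoolean cancelled = new AtomicBoolean(false);
executor.submit(() -> {
    while (!cancelled.get()) {
        Message<byte[]> msg = consumer.receive(100, TimeUnit.MILLISECONDS);
        if (msg != null) {
            // process the message, then ack it
            consumer.acknowledge(msg);
        }
    }
    return null;
});
```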
----
2018-01-22 22:41:47 UTC - Matteo Merli: I think the easiest approach is to set a listener on the consumer
----
2018-01-22 22:42:20 UTC - Matteo Merli: the listeners for all the topics are invoked on a dedicated thread pool (default size: 1 thread)
----
2018-01-22 22:42:55 UTC - Matteo Merli: 
<https://github.com/apache/incubator-pulsar/blob/master/pulsar-client/src/test/java/org/apache/pulsar/client/tutorial/SampleConsumerListener.java#L37>
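A minimal version of that pattern could look like this (topic and subscription names are illustrative; the pool size can be raised with `listenerThreads` on the client builder):
```
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class ListenerExample {
    public static void main(String[] args) throws PulsarClientException {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .listenerThreads(4) // size of the shared listener thread pool
                .build();

        client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-subscription")
                .messageListener((consumer, msg) -> {
                    // Invoked on the listener pool for each incoming message
                    System.out.println("Received: " + new String(msg.getData()));
                    consumer.acknowledgeAsync(msg);
                })
                .subscribe();
    }
}
```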
eyes : Nicolas Ha
+1 : Nicolas Ha
----
2018-01-22 22:50:14 UTC - Allen Wang: @Matteo Merli We changed the consumer to the native consumer using “shared” mode. It works well and has no problem consuming at a high rate with a large number of partitions.
----
2018-01-22 22:50:56 UTC - Matteo Merli: Ok, but that shouldn’t have issues even 
in the Failover configuration :slightly_smiling_face:
----
2018-01-22 22:51:54 UTC - Matteo Merli: I haven’t gotten to reproducing it yet. Will do shortly
----
2018-01-22 23:29:08 UTC - Jaebin Yoon: We currently have 10 brokers and 10 bookies in the cluster, and 20 producers producing to a topic with 4800 partitions. I noticed that only 4 brokers are being used currently. I'm not sure how the current load balancer works, but will this be rebalanced when the traffic increases? Where should I look to tweak this load balancing?
----
2018-01-22 23:33:54 UTC - Matteo Merli: There are a few parameters to look at:
 1. The topic assignments to brokers are done in terms of “bundles”, that is, in groups of topics
 2. Topics are matched to bundles by hashing on the name
 3. Effectively, a bundle is a hash range that topics fall into
 4. Initially, the default is to have 4 “bundles” for a namespace
 5. When the traffic increases on a given bundle, it will be split in 2 and reassigned to a different broker
 6. There are some adjustable thresholds that can be used to control when the split happens, based on number of topics/partitions, messages in/out, bytes in/out, etc.
 7. It’s also possible to specify a higher number of bundles when creating a namespace, as shown below
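For example, with the `pulsar-admin` CLI (namespace name illustrative; namespaces were named property/cluster/namespace at the time):
```
bin/pulsar-admin namespaces create my-prop/us-west/my-ns --bundles 16
```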
----
2018-01-22 23:34:46 UTC - Matteo Merli: In addition, there are the load-manager thresholds that control when a broker should offload some of its bundles to other brokers
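For example, `broker.conf` carries settings along these lines (key names from the broker config; the values here are illustrative, not recommendations):
```
# Allow hot bundles to be split automatically
loadBalancerAutoBundleSplitEnabled=true
# Split a bundle once it crosses these thresholds
loadBalancerNamespaceBundleMaxTopics=1000
loadBalancerNamespaceBundleMaxMsgRate=30000
loadBalancerNamespaceBundleMaxBandwidthMbytes=100
# Shed bundles off a broker whose resource usage exceeds this percentage
loadBalancerBrokerOverloadedThresholdPercentage=85
```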
----
2018-01-22 23:38:23 UTC - Jaebin Yoon: @Matteo Merli thanks a lot for the detailed explanation. This gives me some ideas. I'll look into it.
----
2018-01-23 03:01:23 UTC - Matteo Merli: @Allen Wang I’m running the producers and consumers with 4800 partitions, using `pulsar-perf` with the consumers in Failover mode (and running multiple of them). I’m not seeing any strange behavior; the traffic is evenly spread across all the available consumers. I haven’t tested with the Kafka wrapper yet; that will be my next test.
----
2018-01-23 12:01:12 UTC - Benjamin Lupton: What are the minimum requirements for Apache Pulsar? Looking at <https://pulsar.incubator.apache.org/docs/latest/deployment/cluster/>, that is quite an expensive set of requirements for an early startup.
----
2018-01-23 13:57:35 UTC - jia zhai: @Benjamin Lupton You need 3 kinds of clusters: bookie, broker, and ZooKeeper. But if you don't have enough resources, it is OK to run the bookie, ZooKeeper, and broker on the same machine.
There is already a command and config that can run the broker and bookie together: `PulsarBrokerStarter --broker-conf broker.conf --run-bookie --bookie-conf bookie.conf`. PR 1023 contains more info: <https://github.com/apache/incubator-pulsar/pull/1023>
----
2018-01-23 16:19:28 UTC - Matteo Merli: @Benjamin Lupton As @jia zhai said, there are several components, but in a small deployment they can be collapsed into a handful of nodes. If you’re in AWS, there’s a Terraform+Ansible combination to get a cluster up with 3 nodes: <http://pulsar.apache.org/docs/latest/deployment/aws-cluster/> (+ 3 small VMs for ZooKeeper)
----
2018-01-23 16:20:14 UTC - Matteo Merli: and even the 3 ZK processes could be 
co-hosted on the same 3 VMs
----
2018-01-23 16:20:52 UTC - Matteo Merli: In general, if you want the data to be replicated on at least 3 machines, having 3 nodes is the minimum cluster size.
----
2018-01-23 16:21:23 UTC - Matteo Merli: (If you don’t need replication, then the minimum is to use the Standalone service)
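Standalone runs a broker, a bookie, and ZooKeeper together in a single process:
```
bin/pulsar standalone
```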
----
