2020-04-05 12:17:55 UTC - Franck Schmidlin: I want to deploy on AWS as well,
but remain elastic. Can the bookies be deployed separately from the brokers? As
far as I can tell, the bookies are the only components that require long-term
storage, so they could be on EC2 with EBS.
Also, long-term retention is not a requirement, so could I have a small number
of bookies for a larger number of brokers?
This way I could use Fargate for additional brokers if/when required?
Has anyone drawn a deployment diagram anywhere?
----
2020-04-05 12:27:07 UTC - yujun: @yujun has joined the channel
----
2020-04-05 16:34:44 UTC - Jesus Ramirez: @Jesus Ramirez has joined the channel
----
2020-04-05 16:35:12 UTC - Jesus Ramirez: Hi guys!
----
2020-04-05 16:36:35 UTC - Jesus Ramirez: I've been checking on different
solutions like Kafka, Pulsar and KubeMQ.
----
2020-04-05 16:38:04 UTC - Jesus Ramirez: I've noticed that Kafka has a
limitation on consumers: you need at least as many partitions as consumers.
Does Pulsar have the same limitation?
----
2020-04-05 16:39:03 UTC - Jesus Ramirez: I've been checking the documentation
and I can't find anything about this
----
2020-04-05 16:40:06 UTC - Chris Bartholomew: You can connect many consumers to 
a single topic partition.
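
As an illustration (not part of the original thread): in Pulsar this is what a Shared subscription provides, where messages from one partition are spread across all attached consumers. The toy dispatcher below only mimics that round-robin behaviour in plain Python. `dispatch_round_robin` and the consumer names are hypothetical, and no Pulsar client API is used.

```python
from collections import defaultdict
from itertools import cycle

def dispatch_round_robin(messages, consumers):
    """Hand each message of a single partition to the next consumer in turn."""
    assignment = defaultdict(list)
    turn = cycle(consumers)
    for msg in messages:
        assignment[next(turn)].append(msg)
    return dict(assignment)

# One partition, three consumers: more consumers than partitions is fine here.
result = dispatch_round_robin(range(6), ["c1", "c2", "c3"])
print(result)  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```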
----
2020-04-05 16:43:06 UTC - Jesus Ramirez: thank you!
----
2020-04-05 16:58:31 UTC - Matteo Merli: There's a limit on the size of the
range-set that is stored. By default it would store up to 50K disjoint ranges.
After that, delivery to the consumer is also stalled until the "holes" in the
ack sequence are filled.
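
As an illustration (not part of the original thread), the idea of a range-set with "holes" can be sketched in a few lines. This is a simplified model: message ids are plain integers here, the helper names are hypothetical, and the 50K figure is simply the default quoted above, used as a constant.

```python
MAX_RANGES = 50_000  # default limit quoted in the thread; a plain constant here

def acked_ranges(acked_ids):
    """Collapse acked message ids into sorted, disjoint inclusive ranges."""
    ranges = []
    for i in sorted(set(acked_ids)):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i          # extend the current range
        else:
            ranges.append([i, i])      # a "hole" precedes i, start a new range
    return [tuple(r) for r in ranges]

def delivery_stalled(acked_ids, max_ranges=MAX_RANGES):
    """True once the number of disjoint ranges exceeds the limit (assumed rule)."""
    return len(acked_ranges(acked_ids)) > max_ranges

acks = [0, 1, 2, 5, 6, 9]
print(acked_ranges(acks))                    # [(0, 2), (5, 6), (9, 9)]
print(acked_ranges(acks + [3, 4]))           # [(0, 6), (9, 9)]  holes 3-4 filled
print(delivery_stalled(acks, max_ranges=2))  # True
```

Acking the missing ids merges neighbouring ranges, which is how filling the holes brings the count back under the limit.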
----
2020-04-05 17:11:47 UTC - Shangpeng Sun: Cool! Another question related to
this: is the cursor ledger updated per ack, or periodically in batches? I
suppose it's the latter because the broker needs to accumulate some acks to
calculate the range-sets. However, will this cause a consistency problem? If the
broker crashes before updating the cursor ledger, the recent acks will be lost
----
2020-04-05 17:18:50 UTC - David Kjerrumgaard: @Franck Schmidlin I am not that
familiar with AWS Fargate, but based on what I have seen/read it looks like it
would be possible to set up a Pulsar cluster to run on that service. However, it
looks like there would be a lot of steps required to set up each of these
services, i.e. ZK, BookKeeper, and the Brokers. Then you would also need to
initialize the cluster metadata, etc. IMHO it would be easier to just set up
the Pulsar cluster on EKS using the Helm chart included with the Pulsar
distribution, which performs all these steps for you. Also, I am not sure how
Fargate handles pod failures, but with EKS you can define StatefulSets which
ensure that a minimum number of pods of a given type are running at all times.
+1 : Franck Schmidlin
----
2020-04-05 17:37:16 UTC - Franck Schmidlin: Which bits of Pulsar are elastic?
Can I have a fixed set of ZK and BK instances and varying numbers of brokers to
meet demand?
Or is that silly and there is a fixed n to 1 ratio between brokers and BK?
Thx
----
2020-04-05 17:39:24 UTC - Matteo Merli: There are 2 ways the acks are batched:
1. The client library by default groups them by 100 millis (can be set to 0)
2. The broker only persists to the ledger every 1 sec by default
In case of failures, there will be a limited amount of duplicates. Turning the
delays to 0 will reduce the amount of dups, though it will never be guaranteed
to have no dups (with Consumer API)
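
As an illustration (not part of the original thread), the effect of the broker-side flush interval on duplicates can be modelled roughly as follows. This is an idealised sketch: `redelivered_after_crash` is a hypothetical helper, only the broker-side delay is modelled (not the client-side grouping), and flushes are assumed to happen at exact multiples of the interval.

```python
def redelivered_after_crash(ack_times_ms, flush_interval_ms, crash_time_ms):
    """Return the ack times that were not yet persisted when the broker crashed.

    Those messages would be delivered again after recovery (duplicates).
    An interval of 0 is modelled as "persist every ack immediately"; a real
    system can still lose acks that are in flight, so this is a lower bound.
    """
    if flush_interval_ms == 0:
        last_flush = crash_time_ms
    else:
        last_flush = (crash_time_ms // flush_interval_ms) * flush_interval_ms
    return [t for t in ack_times_ms if last_flush < t <= crash_time_ms]

acks = [100, 900, 1100, 1700, 1950]
print(redelivered_after_crash(acks, 1000, 1990))  # [1100, 1700, 1950]
print(redelivered_after_crash(acks, 100, 1990))   # [1950]
```

A shorter interval shrinks the window of unpersisted acks, which matches the point that lowering the delays reduces duplicates without eliminating them.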
----
2020-04-05 17:42:12 UTC - Shangpeng Sun: Nice, this makes a lot of sense, 
thanks for the help!
----
2020-04-05 17:42:27 UTC - Matteo Merli: ZK is fixed (and having a big ZK
cluster doesn't increase the write throughput).

Brokers and bookies are elastic and they can be independently scaled. There's no
pre-fixed ratio between the two.

As a high-level general rule:
• Increase brokers to increase serving capacity (generally limited by
CPU/network bandwidth)
• Increase bookies to increase disk IO and storage capacity
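
As an illustration (not part of the original thread), the rule above can be turned into a back-of-the-envelope sizing helper in which the two tiers are computed independently. Everything here is an assumption: `size_cluster`, its parameters and the example numbers are hypothetical, only storage capacity (not disk IO) is modelled for bookies, and the bookie count is floored at the replica count.

```python
import math

def size_cluster(ingress_mb_s, per_broker_mb_s, retention_s, replicas, per_bookie_gb):
    """Rough, independent sizing of brokers (serving) and bookies (storage)."""
    # Brokers: driven by throughput each broker can serve (CPU/network bound).
    brokers = math.ceil(ingress_mb_s / per_broker_mb_s)
    # Bookies: driven by how much replicated data must be kept on disk.
    stored_gb = ingress_mb_s * retention_s * replicas / 1024
    bookies = max(replicas, math.ceil(stored_gb / per_bookie_gb))
    return brokers, bookies

print(size_cluster(200, 80, 3600, 3, 500))  # (3, 5)  one hour of retention
print(size_cluster(200, 80, 60, 3, 500))    # (3, 3)  short retention, few bookies
print(size_cluster(400, 80, 60, 3, 500))    # (5, 3)  more load, only brokers grow
```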
----
2020-04-05 17:42:44 UTC - Matteo Merli: You're welcome
----
2020-04-05 17:44:50 UTC - Franck Schmidlin: Thx. So containerised brokers on
something like Fargate, scaling from 0 to x based on load, is not an entirely
stupid idea?
----
2020-04-05 17:47:06 UTC - Matteo Merli: No
+1 : Franck Schmidlin
----
2020-04-05 17:48:05 UTC - Matteo Merli: For bookies, it's a bit more complex:

• It's easy to scale up
• Scale down has to be done 1 by 1, letting data get re-replicated first
+1 : Franck Schmidlin, Pierre Zemb
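
As an illustration (not part of the original thread), the "1 by 1" scale-down rule can be sketched as a loop that refuses to remove another bookie until the data is fully replicated again. The function and both callbacks (`fully_replicated`, `decommission`) are hypothetical placeholders for whatever checks and tooling a real deployment uses.

```python
def scale_down_bookies(bookies, target, fully_replicated, decommission):
    """Remove bookies one at a time until `target` remain.

    fully_replicated(remaining) -> bool and decommission(bookie) are
    hypothetical callbacks supplied by the caller.
    """
    remaining = list(bookies)
    while len(remaining) > target:
        if not fully_replicated(remaining):
            break                   # stop here; the caller retries later
        victim = remaining.pop()
        decommission(victim)        # its data must be re-replicated before the next removal
    return remaining

removed = []
left = scale_down_bookies(["b1", "b2", "b3", "b4"], 2, lambda r: True, removed.append)
print(left, removed)  # ['b1', 'b2'] ['b4', 'b3']
```

Returning early instead of blocking keeps the sketch simple: calling it again later resumes the scale-down once re-replication has caught up.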
----
2020-04-05 17:50:17 UTC - Franck Schmidlin: But I could scale up to deal with a
seasonal spike and slowly scale down as messages expire (past retention). Which
would work for me as well, I think.
Thx
----
2020-04-05 19:19:16 UTC - Vladimir Shchur: @Matteo Merli can you please add a few
words about service discovery for such an elastic solution? My attempt to run
Pulsar on k8s in GCP failed because the brokers could not discover the bookies
after the bookies' IPs changed. How is this supposed to be handled?
----
2020-04-05 19:56:25 UTC - Matteo Merli: The bookies need to expose a "stable"
identifier. That is used by the client to establish a relationship between the
data and its location.

The stable identifier can be either the IP or the hostname of the bookies.
+1 : Franck Schmidlin
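
As an illustration (not part of the original thread), the reason the identifier has to be stable can be shown with a toy lookup: data is recorded against a bookie id, so if that id is an IP that later changes, the data can no longer be located. `locate`, the ledger name and all addresses below are hypothetical.

```python
def locate(ledger_metadata, registered_bookies, ledger_id):
    """Return the bookie ids holding `ledger_id` that are still registered."""
    return [b for b in ledger_metadata[ledger_id] if b in registered_bookies]

# Ledger metadata written while the bookie had IP 10.0.0.5.
by_ip = {"ledger-1": ["10.0.0.5:3181"]}
by_host = {"ledger-1": ["bookie-0.bookkeeper:3181"]}

# After a restart the bookie gets a new IP but keeps its hostname.
print(locate(by_ip, {"10.0.0.9:3181"}, "ledger-1"))               # []
print(locate(by_host, {"bookie-0.bookkeeper:3181"}, "ledger-1"))  # ['bookie-0.bookkeeper:3181']
```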
----
2020-04-05 20:07:46 UTC - steven meadows: Do you know whether the Pulsar schema
registry supports integration with Kafka?
----
2020-04-05 20:56:21 UTC - Franck Schmidlin: Just read about tiered storage.
Even better, I probably don't need to worry about scaling bookies.
----
2020-04-06 02:01:42 UTC - Prashanth Tirupachur Vasanthakrishnan: @Prashanth 
Tirupachur Vasanthakrishnan has joined the channel
----
