2018-05-09 09:32:20 UTC - Byron: @Idan I set up a cluster on k8s with no tuning 
(frankly, I am completely new to tuning things on the JVM) and easily achieve 
your throughput requirements.
----
2018-05-09 09:32:34 UTC - Byron: @Byron uploaded a file: 
<https://apache-pulsar.slack.com/files/UACD54WB1/FALNQVA8J/pulsar-dash.png|pulsar-dash.png>
----
2018-05-09 09:33:13 UTC - Byron: the irregular patterns are because I am still 
testing workloads on it, but it has been running for several days without issue 
so far.
----
2018-05-09 09:34:48 UTC - Byron: just mentioning my anecdote to hopefully 
reduce your doubt
+1 : Sijie Guo, Ali Ahmed, Matteo Merli
----
2018-05-09 09:53:56 UTC - Idan: Thank you guys. I'll take this into 
consideration
----
2018-05-09 13:34:20 UTC - William Fry: @William Fry has joined the channel
----
2018-05-09 15:33:24 UTC - Rob V: Hi there, regarding Pulsar Functions, the only 
comparable solution that I know of is Kafka Streams, where you deploy a Java 
application that does the message transformation. My experience running 
Kafka and Kafka Streams in production under significant workload is that you do 
have to be careful configuring your application (memory/CPU/Kafka 
consumer/producer fine-tuning). Although Functions sound like an interesting 
idea, how is it supposed to scale and make sure that functions running on the 
cluster won't have an impact on the cluster itself? Also, does it support state 
handling like Kafka Streams does?
----
2018-05-09 15:41:38 UTC - George Goranov: @George Goranov has joined the channel
----
2018-05-09 16:39:01 UTC - Matteo Merli: > Although Functions sound like an 
interesting idea, how is it supposed to scale and make sure that functions 
running on the cluster won’t have an impact on the cluster itself?

@Rob V There are different deployment options: 
 * Local runner (similar to Kafka Streams, you start your instances 
independently) 
 * Managed mode: there’s a pool of workers that executes instances of the 
functions. The workers can run as part of the broker (makes sense for a small 
single-tenant cluster) or as a separate service layer. Inside each worker, 
function instances can run as threads, processes or, in the near future, as 
containers
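
For illustration, a minimal sketch of what such a function looks like in Java, 
assuming the standard `pulsar-functions-api` interface (the class name is a 
placeholder):

```java
// Minimal sketch of a Pulsar Function: transforms each input message and returns the
// value to be published to the output topic. Assumes the pulsar-functions-api dependency.
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class ExclamationFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        return input + "!";
    }
}
```

The same class can be started with the local runner during development or 
submitted to the managed workers.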
----
2018-05-09 16:44:07 UTC - Rob V: thank you @Matteo Merli
----
2018-05-09 17:13:32 UTC - Sebastian Schepens: @Sebastian Schepens has joined 
the channel
----
2018-05-09 17:44:08 UTC - Karthik Palanivelu: Hi there, I am getting the 
following error when starting standalone on a RHEL7 box. It is not using the IP 
from client.conf and is using 127.0.0.1, which is failing in this case. Can you 
please let me know which setting needs to be changed for that IP?
----
2018-05-09 17:44:15 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a 
file: <https://apache-pulsar.slack.com/files/U7VRE0Q1G/FALFZ3Z33/-.lua|Untitled>
----
2018-05-09 17:51:15 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a 
file: <https://apache-pulsar.slack.com/files/U7VRE0Q1G/FAN45KQ23/-.php|Untitled>
----
2018-05-09 17:53:17 UTC - Matteo Merli: @Karthikeyan Palanivelu In this case it 
is the functions worker embedded in the standalone service. You can disable the 
functions worker, if you’re not using it, by passing 
`--no-functions-worker` to the `pulsar standalone` command
----
2018-05-09 18:05:54 UTC - Karthik Palanivelu: @Matteo Merli That works. Is 
there a separate config to add the IP, or should it read from client.conf? 
Please let me know where it is picking up the 127.0.0.1 from.
----
2018-05-09 18:24:13 UTC - Matteo Merli: The function worker is trying to use 
some topics for its internal management. In standalone mode we assume that 
connecting to 127.0.0.1 always works (which might not be true in your case)
----
2018-05-09 18:34:24 UTC - Daniel Ferreira Jorge: Hi, I'm running a small pulsar 
cluster in production on kubernetes (GKE). It has 5 bookies (one bookie per 
node), each one with a 3TB persistent disk. The brokers are on another set of 
nodes, as is the zookeeper cluster. The write quorum is 3 and the ack quorum is 2. 
I need to migrate these 5 bookies to a new set of 5 nodes, without downtime and 
without impacting producers and consumers. I'm trying to avoid having to use 
auto-recovery, since this would have to copy the data of each old bookie to each 
new bookie and would take a long time. This procedure has to be done 
periodically (roughly every 3 months) so the worker nodes in GKE can keep up 
with newer kubernetes versions on the master nodes.

If I just delete a bookie pod so it will be rescheduled on another node by 
kubernetes (a procedure that takes roughly a minute due to the migration of the 
persistent disk to the new node), will this be enough? Will it be safe? Will the 
bookie come back online and rejoin the cluster "gracefully" since it will keep 
the data?

Do you guys recommend a better way to do a "rolling migration" of bookie pods 
to new kubernetes nodes?
----
2018-05-09 18:45:54 UTC - Matteo Merli: > If I just delete a bookie pod so 
it will be rescheduled on another node by kubernetes (a procedure that takes 
roughly a minute due to the migration of the persistent disk to the new node), 
will this be enough? Will it be safe? Will the bookie come back online and 
rejoin the cluster “gracefully” since it will keep the data?
> Do you guys recommend a better way to do a “rolling migration” of bookie 
pods to new kubernetes nodes?

Since the data is stored in the persistent volume, the bookie pods become 
essentially stateless, so you’re right that auto-recovery is not needed. 
The only thing that needs to be ensured is that the new bookie Pod identifies 
itself with the same “name” as the old bookie. That is required so that BK 
clients can connect to the right Pod and assume it still has the data. 

Bookies have a “cookie” verification mechanism that prevents a bookie from 
impersonating a different one. One such example would be a new “bookie-2” trying 
to start with the same persistent volume that was previously used by 
“bookie-1”. 

To ensure the same identifier is used, the easiest way in K8S is to use a 
StatefulSet for the bookies and configure them to register and advertise with the 
“hostname” rather than the “IP address” (which is the default). The Pod 
IP can change when the Pod is rescheduled to a different node, but the Pod name 
will remain the same (“bookie-1”). That should already be configured that way in 
the example K8S deployment specs: 
<https://github.com/apache/incubator-pulsar/blob/master/deployment/kubernetes/google-kubernetes-engine/bookie.yaml#L53>
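
For reference, a minimal sketch of the server-side option involved, assuming 
BookKeeper’s stock `ServerConfiguration` API (the corresponding bookkeeper.conf 
key is `useHostNameAsBookieID`; the class name below is just a placeholder):

```java
// Sketch only: the BookKeeper setting that makes a bookie register and advertise with
// its hostname instead of its IP address, so a rescheduled Pod keeps the same identity.
// Assumes the org.apache.bookkeeper:bookkeeper-server dependency is on the classpath.
import org.apache.bookkeeper.conf.ServerConfiguration;

public class BookieIdConfigSketch {
    public static void main(String[] args) {
        ServerConfiguration conf = new ServerConfiguration();
        conf.setUseHostNameAsBookieID(true); // same as useHostNameAsBookieID=true in bookkeeper.conf
        System.out.println("useHostNameAsBookieID=" + conf.getUseHostNameAsBookieID());
    }
}
```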
----
2018-05-09 18:51:09 UTC - Daniel Ferreira Jorge: I use a statefulset for the 
bookies. So it's safe to simply delete the pod and wait for it to be 
rescheduled, great! One question regarding that: when we delete a pod on 
kubernetes, it will send a SIGTERM to the pod to terminate the main process, 
wait for the grace period and then send a SIGKILL. Does the bookkeeper handle 
the SIGTERM to stop itself gracefully?
----
2018-05-09 18:52:52 UTC - Daniel Ferreira Jorge: Also, after I delete my first 
bookie and it comes back online, how can I verify that everything is OK? (if it 
rejoined the cluster, if there are no under-replicated ledgers, etc)
----
2018-05-09 18:53:00 UTC - Matteo Merli: I believe it should do a graceful 
shutdown on SIGTERM, flush all the write cache and then exit
----
2018-05-09 18:54:29 UTC - Ali Ahmed: kubernetes passes SIGTERM to PID 1; I 
don’t think the bookie is running as that
----
2018-05-09 18:55:51 UTC - Ali Ahmed: there are some docs here 
<https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html>
----
2018-05-09 18:59:59 UTC - Matteo Merli: > Also, after I delete my first 
bookie and it comes back online, how can I verify that everything is OK? (if it 
rejoined the cluster, if there are no under-replicated ledgers, etc) 

Good question. The auto-recovery itself doesn’t verify all the data; it’s just 
based on the metadata, though it double-checks a few entries per ledger. 

I think you can indeed use auto-recovery, perhaps manually, to force the 
double-check
----
2018-05-09 19:00:09 UTC - Joe Francis: @Daniel Ferreira Jorge @Matteo Merli 
Wouldn't making the existing 5 bookies read-only and adding 5 new bookies be 
simpler?
----
2018-05-09 19:00:33 UTC - Joe Francis: The read-only bookies can be dropped 
once the ledgers roll off
----
2018-05-09 19:01:06 UTC - Daniel Ferreira Jorge: @Ali Ahmed Hmm... I will have 
to check that! Thanks
----
2018-05-09 19:01:38 UTC - Matteo Merli: Like `bin/bookkeeper shell recover 
$SRC_BOOKIE`
----
2018-05-09 19:02:55 UTC - Matteo Merli: In any case, for the restart: if it’s 
not a graceful shutdown, it just means it will replay some entries from the 
journal
----
2018-05-09 19:04:45 UTC - Daniel Ferreira Jorge: @Matteo Merli Ok that is 
great! So to check each bookie I can do a recovery just to be sure, correct?
----
2018-05-09 19:06:13 UTC - Daniel Ferreira Jorge: @Joe Francis I could not 
understand how this would work... could you explain it, please?
----
2018-05-09 19:08:28 UTC - Joe Francis: Add 5 new bookies. Then make the 
existing ones read-only. New data will get written to the new bookies, since the 
old ones are now read-only. Once all consumers are done with reading data on the 
old ledgers, those ledgers will get deleted. At that point you can retire the 
old bookies.
----
2018-05-09 19:09:18 UTC - Daniel Ferreira Jorge: That would not work for me, 
because I keep the data for re-consumption forever
----
2018-05-09 19:10:06 UTC - Matteo Merli: @Joe Francis yes, that would be the 
best strategy for on-prem bare metal, but if you’re in the cloud with persistent 
volumes it’s easier to just restart the Pod on a different node
----
2018-05-09 19:11:39 UTC - Daniel Ferreira Jorge: I only need to change the 
underlying node that the pod is running on, but when the pod comes back online, 
everything should be the same as it was before.
----
2018-05-09 19:12:09 UTC - Daniel Ferreira Jorge: @Matteo Merli @Joe Francis 
@Ali Ahmed Thank you very much for all the help
----
2018-05-09 19:27:54 UTC - Byron: :point_up: great information
----
2018-05-10 01:55:36 UTC - Senthilkumar Kalaiselvan: <!here> Hi all, good 
morning... Looking into this doc <https://pulsar.incubator.apache.org/>, it seems 
Apache Pulsar is a good candidate for my streaming use case... Can 
someone help me understand how it differs from Kafka? Any documentation 
with benchmarking numbers?
----
2018-05-10 02:03:54 UTC - Ali Ahmed: In simple terms, pulsar could be considered 
a 2nd-generation distributed pub-sub. It provides a superset of kafka 
capabilities and has a kafka compatibility api. If you can describe your use 
case in more detail we can provide more guidance.
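
As a rough illustration of the Kafka compatibility API, a minimal sketch 
assuming the `pulsar-client-kafka` wrapper is used in place of the regular Kafka 
client dependency (topic name and service URL below are placeholders):

```java
// Sketch only: standard Kafka producer code, but with the pulsar-client-kafka wrapper on
// the classpath the same API publishes to a Pulsar topic through the Pulsar service URL.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaCompatSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "pulsar://localhost:6650"); // Pulsar service URL, not a Kafka broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        Producer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("my-topic", "key", "hello from the Kafka API"));
        producer.close();
    }
}
```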
----
2018-05-10 02:09:33 UTC - Senthilkumar Kalaiselvan: At a high level, we collect 
different data from servers (timestamp is the key) and push it to a message 
buffer (Kafka or Pulsar?). When consuming it back, the same data should be 
consumed based on timestamp, i.e. strong ordering... also, sometimes we want to 
move the consumer position to some timestamp, for example: start consuming from 
now - 3 hours, now - 1 hour, etc.

Traffic: 250K requests/sec, each request ~1.2 KB.
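
For reference, a minimal Java sketch of rewinding a subscription by publish 
time, assuming a Pulsar client version that supports `Consumer.seek(long timestamp)` 
(topic, subscription and service URL are placeholders):

```java
// Sketch only: rewind an existing subscription to "now - 3 hours" by publish time.
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class SeekByTimestampSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/server-metrics")
                .subscriptionName("replay")
                .subscribe();

        // Move the subscription cursor back to messages published in the last 3 hours.
        consumer.seek(System.currentTimeMillis() - TimeUnit.HOURS.toMillis(3));

        Message<byte[]> msg = consumer.receive();
        System.out.println("First replayed message id: " + msg.getMessageId());

        consumer.close();
        client.close();
    }
}
```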
----
2018-05-10 02:09:59 UTC - Senthilkumar Kalaiselvan: Looking to see the 
benchmarking numbers, if any...
----
2018-05-10 02:10:44 UTC - Ali Ahmed: kafka and pulsar have already been 
benchmarked.
<http://openmessaging.cloud/docs/benchmarks/>
----
2018-05-10 02:11:28 UTC - Ali Ahmed: pulsar today is about 4x faster; with 2.0 
on the horizon it will get a lot faster
----
2018-05-10 02:11:58 UTC - Ali Ahmed: pulsar has much stronger data durability 
and ordering guarantees than Kafka
----
2018-05-10 02:12:32 UTC - Senthilkumar Kalaiselvan: Awesome! I'll get started 
on building a POC..
----
2018-05-10 02:12:58 UTC - Ali Ahmed: your request volume is modest so it’s not 
a problem at all for a mid-size cluster
----
2018-05-10 02:13:08 UTC - Ali Ahmed: no specific tuning is necessary
----
2018-05-10 02:13:49 UTC - Senthilkumar Kalaiselvan: 250MBps is the starting 
load; it can grow up to 500MBps in 2019 Q1.
----
2018-05-10 02:15:21 UTC - Ali Ahmed: that’s not a problem, you just add nodes 
if needed. There is no rebalancing or downtime in pulsar; it just scales 
horizontally
----
2018-05-10 02:15:55 UTC - Senthilkumar Kalaiselvan: nice.
----
2018-05-10 02:18:50 UTC - Ali Ahmed: I recommend doing the POC with the 2.0 RC; 
it will go GA within a few weeks.
----
2018-05-10 02:19:09 UTC - Senthilkumar Kalaiselvan: Cool, Thanks.
----
