2018-03-23 09:17:04 UTC - Piotr: @Ivan Kelly also found this written by you, great read! :slightly_smiling_face: : <https://streaml.io/blog/bookkeeper-toab/> . A question regarding <http://streaml.io|streaml.io> - am I correct in assuming you will be providing some kind of managed hosting of Pulsar clusters? And do you have any timeline for that? ---- 2018-03-23 10:24:20 UTC - Ivan Kelly: we're focusing on on-prem right now. I haven't heard any plans for SaaS, but I'm remote so it may have passed me by ---- 2018-03-23 10:25:35 UTC - Piotr: ok ---- 2018-03-23 13:24:14 UTC - Jon Bock: Piotr, which cloud would be of most interest to you? We're gathering information to help us in our planning. ---- 2018-03-23 13:24:43 UTC - Jon Bock: Or are you agnostic about which cloud a SaaS offering would be hosted on? ---- 2018-03-23 13:47:08 UTC - Piotr: @Jon Bock AWS, and in my case it would probably have to be in an EU region, as EU data protection laws are getting even stricter in May this year ---- 2018-03-23 14:04:03 UTC - Ivan Kelly: the US is ok according to GDPR ---- 2018-03-23 14:06:07 UTC - Piotr: ah ok, if you say so - about to meet with some lawyers to get more clarity on that :slightly_smiling_face: Anyhow, I'll probably have to wait for any offering to be available in EU (Ireland), since that's where I'm hosted ---- 2018-03-23 14:07:14 UTC - Piotr: but I guess you'll start with the most popular US regions etc and move on from there ---- 2018-03-23 17:19:01 UTC - Igor Zubchenok: Hello!! Is there a lightweight way to post a message/batch of messages to a topic? Due to my business logic I have to send messages to arbitrary topics, and I worry about the time and resources needed to create a producer just for a single message. ---- 2018-03-23 17:20:16 UTC - Matteo Merli: Where are you publishing from? ---- 2018-03-23 17:22:00 UTC - Igor Zubchenok: From different stateless instances of my server (Java) ---- 2018-03-23 17:22:38 UTC - Matteo Merli: Well, you could create the producer on startup and keep reusing it for each message ---- 2018-03-23 17:22:45 UTC - Matteo Merli: (on each instance) ---- 2018-03-23 17:23:08 UTC - Igor Zubchenok: Will I have to create millions of producers for millions of topics on every instance? ---- 2018-03-23 17:23:18 UTC - Matteo Merli: Another option would be to use the WebSocket proxy: <http://pulsar.apache.org/docs/latest/clients/WebSocket/> ---- 2018-03-23 17:24:25 UTC - Matteo Merli: yes, what is the reason for creating millions of topics? (Many times it's easier to work around that assumption than to deal with the resources necessary to support them) ---- 2018-03-23 17:25:53 UTC - Igor Zubchenok: I have an independent queue (topic) per user. Millions of topics is a major reason to start using Pulsar instead of Kafka. ---- 2018-03-23 17:29:53 UTC - Igor Zubchenok: I've checked the WebSocket API and there is nothing about sending messages to an arbitrary topic. You have to specify the topic in the WebSocket URL. Did I miss something? ---- 2018-03-23 17:30:24 UTC - Matteo Merli: yes, it's working in the same way, you need to have 1 producer per topic ---- 2018-03-23 17:30:41 UTC - Matteo Merli: it's just better suited if you have short-lived processes ---- 2018-03-23 17:32:50 UTC - Igor Zubchenok: So this means establishing a connection and a WebSocket handshake for every single message posted to an arbitrary topic. Right?
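A minimal sketch of the reuse pattern Matteo suggests above (one client per process, one producer created at startup and reused for every send), using the Pulsar Java client; the service URL and topic name are placeholders, and the builder-style API shown here comes from newer client releases:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class NotificationPublisher {
    public static void main(String[] args) throws Exception {
        // One client per process: TCP connections to brokers are pooled inside it.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")                      // placeholder
                .build();

        // One producer per topic, created once at startup and reused afterwards.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://sample/standalone/ns1/notifications")  // placeholder
                .create();

        // Reusing the producer avoids the per-message lookup/handshake cost.
        for (int i = 0; i < 10; i++) {
            producer.send(("event-" + i).getBytes());
        }

        producer.close();
        client.close();
    }
}
```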
---- 2018-03-23 17:33:28 UTC - Matteo Merli: Well, the websocket session is meant to be long-lived as well ---- 2018-03-23 17:33:45 UTC - Matteo Merli: you can use the same session to publish many messages ---- 2018-03-23 17:33:56 UTC - Matteo Merli: (with ordering) ---- 2018-03-23 17:34:00 UTC - Igor Zubchenok: Caching millions of websocket connections in memory does not sound like a good way. ---- 2018-03-23 17:34:20 UTC - Matteo Merli: Yes, it's more efficient to use the Java client anyway ---- 2018-03-23 17:34:20 UTC - Igor Zubchenok: As well as Producers. ---- 2018-03-23 17:34:30 UTC - Matteo Merli: where connections are pooled ---- 2018-03-23 17:34:46 UTC - Matteo Merli: it's just the memory that you need to keep all the objects ---- 2018-03-23 17:35:16 UTC - Igor Zubchenok: Is a Producer instance lightweight? ---- 2018-03-23 17:35:49 UTC - Matteo Merli: it is in the sense that there are no threads / TCP connections specific to a publisher ---- 2018-03-23 17:36:22 UTC - Igor Zubchenok: What actually happens when a new Producer is created? Why is it a long operation? ---- 2018-03-23 17:37:04 UTC - Matteo Merli: the client will go to the broker and establish a session (check permissions, etc.) ---- 2018-03-23 17:37:50 UTC - Igor Zubchenok: Why can't this be done once per `namespace`? ---- 2018-03-23 17:37:57 UTC - Matteo Merli: if the number of active topics is much smaller than the total, you could keep a cache of producers ---- 2018-03-23 17:38:11 UTC - Matteo Merli: no, each topic can be served by different machines ---- 2018-03-23 17:38:39 UTC - Igor Zubchenok: It is done by a hash of the topic, isn't it? ---- 2018-03-23 17:39:09 UTC - Matteo Merli: the grouping is done by hash, but the assignment is dynamic ---- 2018-03-23 17:40:58 UTC - Igor Zubchenok: So to find the proper broker instance you need meta information about all available brokers for a namespace, and then you choose the proper one by the hash of the topic. Sounds like there is no actual need to contact the broker for every single topic. ---- 2018-03-23 17:42:16 UTC - Matteo Merli: It's more complicated than that :slightly_smiling_face: because the assignments and even the hash split can change over time ---- 2018-03-23 17:44:07 UTC - Igor Zubchenok: What does the Producer of a topic do when this happens? ---- 2018-03-23 17:44:34 UTC - Igor Zubchenok: I guess the same could be done by such a 'ProducerForNamespace'. ---- 2018-03-23 17:45:27 UTC - Matteo Merli: To clarify, even though a cluster can support millions of topics, it's not always the best way to model a problem. For example, you could use a database to store per-user data and the message bus as a machine-to-machine notification channel (eg: 1 topic per server) ---- 2018-03-23 17:46:39 UTC - Matteo Merli: When a topic moves to a different broker, the producer gets notified (either gracefully or forcefully) and it will redo the service discovery and session creation ---- 2018-03-23 17:48:24 UTC - Igor Zubchenok: So the same re-discovery and re-creation could work for a 'ProducerForNamespace' in the same manner. ---- 2018-03-23 17:51:06 UTC - Igor Zubchenok: That's what I have now, I store queues in Cassandra. ---- 2018-03-23 17:53:08 UTC - Igor Zubchenok: However, when I write a message to Cassandra I have no fast way to know where to send a notification, because the user could be connected to an arbitrary server instance. So every 100ms I scan for new messages for the users connected to each server. And Cassandra is dying... :slightly_smiling_face: ---- 2018-03-23 17:53:26 UTC - Igor Zubchenok: Am I wrong?
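To illustrate Matteo's point about keeping a cache of producers when the set of active topics is much smaller than the total, here is a rough sketch of a bounded, LRU-style producer cache; the size limit and the topic-keyed layout are assumptions for the example, not anything provided by the Pulsar client itself:

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

// Bounded LRU cache of producers keyed by topic name. Evicted producers are
// closed asynchronously so the cache never holds more than MAX_PRODUCERS.
public class ProducerCache {
    private static final int MAX_PRODUCERS = 10_000; // tune to available memory

    private final PulsarClient client;
    private final Map<String, Producer<byte[]>> producers =
            new LinkedHashMap<String, Producer<byte[]>>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Producer<byte[]>> eldest) {
                    if (size() > MAX_PRODUCERS) {
                        eldest.getValue().closeAsync(); // release the broker-side session
                        return true;
                    }
                    return false;
                }
            };

    public ProducerCache(PulsarClient client) {
        this.client = client;
    }

    public synchronized Producer<byte[]> forTopic(String topic) throws PulsarClientException {
        Producer<byte[]> p = producers.get(topic);
        if (p == null) {
            // One broker lookup + session per *topic*, not per message.
            p = client.newProducer().topic(topic).create();
            producers.put(topic, p);
        }
        return p;
    }
}
```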
---- 2018-03-23 17:56:17 UTC - Matteo Merli: I see, would a single queue which broadcasts info to all servers work? Each server could just drop the message if the user is not connected there.
Of course it doesn't scale indefinitely, but you can always group users to servers. ---- 2018-03-23 17:58:56 UTC - Matteo Merli: For that to work, several changes would be needed: * Right now, publishing is tied to a producer session for efficiency reasons. There is some context expected on both client and broker around this producer. * For example, internally the topic name doesn't need to be repeated for each message (and many other things like that) * If a topic moves, the notification is at the topic level, so if the broker doesn't know a client is interested in publishing to a topic, it cannot notify it. ---- 2018-03-23 18:00:02 UTC - Piotr: How about using Pulsar Functions for this? Can't you use Functions to forward messages to other topics? In case you want to broadcast messages etc ---- 2018-03-23 18:00:23 UTC - Matteo Merli: good point ---- 2018-03-23 18:01:11 UTC - Igor Zubchenok: It could be a namespace context/notifications/etc. P.S. I have stateless servers, so caching Producers goes against the stateless idea. ---- 2018-03-23 18:01:28 UTC - Igor Zubchenok: Functions sound good if I put the topic name in a message? Will it work well? ---- 2018-03-23 18:01:52 UTC - Matteo Merli: you need some info to know where to route to ---- 2018-03-23 18:03:11 UTC - Igor Zubchenok: Can I specify an arbitrary target topic in a function? ---- 2018-03-23 18:05:12 UTC - Matteo Merli: you can use the supplied context object to publish to an arbitrary topic ---- 2018-03-23 18:06:13 UTC - Matteo Merli: though, again, if you want to fan out to an unlimited number of topics, the same resources/concerns would apply ---- 2018-03-23 18:08:28 UTC - Igor Zubchenok: I see. Am I doing something wrong? Is it an anti-pattern to have a separate queue per user? ---- 2018-03-23 18:09:21 UTC - Piotr: Igor, I assume you want to broadcast system-wide messages to users or something? How about having one topic just for system-wide messages? ---- 2018-03-23 18:09:32 UTC - Piotr: that would also mean using a lot less resources ---- 2018-03-23 18:10:04 UTC - Matteo Merli: > I see. Am I doing something wrong? Is it an anti-pattern to have a separate queue per user? If the number of users is unbounded, yes. ---- 2018-03-23 18:10:09 UTC - Piotr: the ability to have millions of topics (one per user) is also the reason why I switched from Kafka to Pulsar ---- 2018-03-23 18:12:36 UTC - Piotr: also, don't you have it on your roadmap to support millions of topics further ahead? So the question is whether this is a requirement you have from day 0, Igor, or if the actual number of active topics is going to be less than that (maybe you don't need unlimited retention on all topics, for example) ---- 2018-03-23 18:13:26 UTC - Piotr: I guess using Pulsar as a mail inbox for all your users is not optimal - you could use Cassandra or something for that and use Pulsar for realtime data ---- 2018-03-23 18:14:03 UTC - Matteo Merli: Yes, even now it's running in production with several million topics at Yahoo (though, not one topic per user :wink: ) ---- 2018-03-23 18:14:53 UTC - Igor Zubchenok: The requirement is to have millions-to-millions messaging. Scalable. ---- 2018-03-23 18:15:34 UTC - Igor Zubchenok: and stateless server instances. ---- 2018-03-23 18:15:44 UTC - Matteo Merli: also, 1M is very different from 100M ---- 2018-03-23 18:16:08 UTC - Matteo Merli: (I mean in terms of design assumptions) ---- 2018-03-23 18:17:18 UTC - Piotr: @Matteo Merli are "idle" topics (with no active consumers or producers) costly?
Or can you have millions of those in the store just waiting? ---- 2018-03-23 18:18:36 UTC - Matteo Merli: the cost is mostly in terms of memory in the broker. There's a small amount of objects for each topic that is "loaded" ---- 2018-03-23 18:19:49 UTC - Piotr: Ok. How about running several clusters and choosing one based on user id (sharding)? heavy_plus_sign : Igor Zubchenok ---- 2018-03-23 18:20:25 UTC - Matteo Merli: That is not the biggest problem, you can do that already by having multiple brokers ---- 2018-03-23 18:20:26 UTC - Piotr: or simply assigning a cluster id at user creation time or something ---- 2018-03-23 18:23:09 UTC - Igor Zubchenok: This is what I want to avoid - grouping users and dispatching messages to the proper server where the user is connected. ---- 2018-03-23 18:24:42 UTC - Igor Zubchenok: @Piotr this could be a solution for scaling, but the problem with Producers is still there. ---- 2018-03-23 18:26:20 UTC - Piotr: @Igor Zubchenok how about using sticky sessions? ---- 2018-03-23 18:26:29 UTC - Matteo Merli: @Igor Zubchenok Most notification systems that need to scale to hundreds of millions of users typically use a cascading mechanism of notifications ---- 2018-03-23 18:26:42 UTC - Piotr: you would still keep servers stateless, but using a cookie you could make each client target the same server most of the time ---- 2018-03-23 18:27:35 UTC - Igor Zubchenok: This becomes a problem if a server instance goes down and users reconnect to other available instances. ---- 2018-03-23 18:28:17 UTC - Piotr: I'm guessing you're designing some kind of chat app here? In that case each user will keep messaging almost the same users over a period of time - and by using sticky sessions the requests from that user will keep going to the same server until it goes down and you need to switch. The point is that a producer cache should then work nicely ---- 2018-03-23 18:28:49 UTC - Matteo Merli: a common way to do that is to have "pods" of servers handling groups of users, where each pod has a max size ---- 2018-03-23 18:29:02 UTC - Piotr: Igor: sticky sessions don't mean your servers become stateful - it just means that the load balancer will keep sending requests from the same user to the same server for as long as that server is online ---- 2018-03-23 18:30:38 UTC - Piotr: <https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html> ---- 2018-03-23 18:34:45 UTC - Igor Zubchenok: Agree, this is a solution - having groups of servers for a partition of users and reconnecting a user to the same group of servers. But the possibility to provide a ProducerForNamespace would solve this problem in a much more elegant and scalable way, and it would be superb. ---- 2018-03-23 18:35:16 UTC - Piotr: hehe ---- 2018-03-23 18:35:28 UTC - Matteo Merli: > But the possibility to provide a ProducerForNamespace would solve this problem in a much more elegant and scalable way, and it would be superb. Let me think a bit more about it. :slightly_smiling_face: ---- 2018-03-23 18:37:52 UTC - Igor Zubchenok: Otherwise I have to configure user partitioning in my solution, and adding a new partition will result in session closing. As a result - a lot of additional development and operations. And sticky-to-group sessions. ---- 2018-03-23 18:38:23 UTC - Piotr: I'm thinking there would probably be a tradeoff either way - isn't creating a producer also a way to avoid doing security checks every time you send a message?
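As a rough illustration of the Functions idea discussed above (a function that reads a target from the incoming message and uses the supplied context object to publish to an arbitrary topic), here is a hypothetical dispatch function; the `userId|body` payload format and the per-user topic naming are made up for the example:

```java
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// All servers publish to one shared "inbox" topic; the function forwards each
// message to a per-user topic derived from the payload.
public class DispatchFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) {
        // Hypothetical payload format: "<userId>|<body>"
        int sep = input.indexOf('|');
        String userId = input.substring(0, sep);
        String body = input.substring(sep + 1);

        // Context.publish sends to an arbitrary topic; the producers behind it
        // are managed (and reused) by the Functions runtime.
        context.publish("persistent://sample/standalone/ns1/user-" + userId, body);
        return null; // nothing goes to the function's own output topic
    }
}
```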
---- 2018-03-23 18:38:45 UTC - Matteo Merli: exactly ---- 2018-03-23 18:38:59 UTC - Piotr: Igor: why sticky to a group? Session affinity is a yes/no config thing in the load balancer :slightly_smiling_face: ---- 2018-03-23 18:39:18 UTC - Piotr: unless I missed something, sticky sessions + producer caching should solve your problems beautifully ---- 2018-03-23 18:40:06 UTC - Piotr: sticky sessions = on the first request the load balancer assigns the client to 1 server for X amount of time, or until that server dies - then it selects a new one. So... quite scalable, elastic etc etc ---- 2018-03-23 18:45:08 UTC - Igor Zubchenok: @Piotr To have only a few producers I need to know which group of servers a user is connected to and send messages to that group. A server instance reads all messages in the group's topic and filters them. So it is the same problem and solution I had with Kafka, in general. ---- 2018-03-23 18:46:40 UTC - Igor Zubchenok: Another issue here is that a user could be offline and all servers will drop the message. So the message will be undelivered. ---- 2018-03-23 18:48:11 UTC - Piotr: you should ack messages after they have been read to avoid that ---- 2018-03-23 18:48:26 UTC - Igor Zubchenok: So you mean still have a topic per user? ---- 2018-03-23 18:48:35 UTC - Piotr: also, I am still not sure we are on the same page with that thing with the groups. Why would you need groups of servers? :slightly_smiling_face: ---- 2018-03-23 18:48:48 UTC - Piotr: Ok here is the flow in my mind ---- 2018-03-23 18:49:07 UTC - Piotr: Sending a message from user A to user B: ---- 2018-03-23 18:49:37 UTC - Piotr: 1) Client sends a request, for example POST /users/userA/messages ---- 2018-03-23 18:50:30 UTC - Piotr: 2) Load balancer sends the request to server 1, and along with the response from the server it also sets a cookie saying "you belong to server 1" (session affinity / sticky sessions) ---- 2018-03-23 18:51:38 UTC - Piotr: 2b) On server 1, a producer is created for topic <persistent://yourapp/global/userMessages/userA>, a message is sent through that producer and the producer is put in a pool/cache ---- 2018-03-23 18:52:20 UTC - Piotr: 3) Client sends another message request for the same destination user (user A) POST /users/userA/messages, this time it also includes the cookie "you belong to server 1" ---- 2018-03-23 18:52:31 UTC - Piotr: 4) Load balancer sees the cookie, and sends the request to server 1 ---- 2018-03-23 18:52:59 UTC - Piotr: 5) server 1 already has a producer for <persistent://yourapp/global/userMessages/userA> in its cache, so it can use that - avoiding having to create it again ---- 2018-03-23 18:53:19 UTC - Piotr: Does that make sense? :slightly_smiling_face: ---- 2018-03-23 18:53:25 UTC - Piotr: Or did I misunderstand something? ---- 2018-03-23 18:54:59 UTC - Piotr: Also, you could do it all through websockets of course - then you wouldn't even need the cookie - the websocket session would suffice as a connection to one and the same server for its lifetime ---- 2018-03-23 18:55:42 UTC - Igor Zubchenok: I meant a group of servers is needed so there is another server the user can connect to, to avoid session re-initialization if some server instance goes down. ---- 2018-03-23 18:56:07 UTC - Igor Zubchenok: And I actually use websocket connections.
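For the earlier "single queue broadcast to all servers" idea, a sketch of the consuming side: each server subscribes to the same notifications topic under its own subscription name, so every server sees every message and simply drops the ones for users it doesn't currently hold a connection for. The topic name, per-server subscription naming and the session map are assumptions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

// One subscription per server on a shared topic = every server receives every
// notification (a broadcast). Messages for users not connected locally are dropped.
public class NotificationListener {
    private final Map<String, Object> connectedSessions = new ConcurrentHashMap<>();

    public void start(PulsarClient client, String serverId) throws Exception {
        client.newConsumer()
                .topic("persistent://sample/standalone/ns1/server-notifications") // placeholder
                .subscriptionName("server-" + serverId)  // own subscription per server
                .messageListener((Consumer<byte[]> consumer, Message<byte[]> msg) -> {
                    String userId = msg.getKey();        // assume the user id travels as the message key
                    Object session = connectedSessions.get(userId);
                    if (session != null) {
                        // push msg.getData() down this user's websocket here
                    }
                    consumer.acknowledgeAsync(msg);      // ack immediately; it is only a notification
                })
                .subscribe();
    }
}
```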
---- 2018-03-23 18:58:54 UTC - Piotr: ok I guess you have a reason for grouping the servers together ---- 2018-03-23 18:58:59 UTC - Igor Zubchenok: There is no problem with a consumer per user, because there is a limited number of users connected to a server. The problem is with Producers, when a user can send a message to another arbitrary user. To send a message I would have to create and cache millions of Producers on a single server instance, which is not scalable. So the solution is to know which group of servers a user is connected to, and send messages to that group, where the message is 'forwarded' to the proper user's topic. ---- 2018-03-23 19:00:51 UTC - Matteo Merli: @Igor Zubchenok regarding the acking problem: if you treat the message bus as a notification channel, it's not a problem to ack messages immediately even if a consumer is not connected ---- 2018-03-23 19:01:27 UTC - Piotr: I think you can safely assume that a user will message at most X other users within a reasonable amount of time. And if you can direct all outgoing messages to the same server for a period of time, then that server can cache those producers - no need to cache more producers than you need ---- 2018-03-23 19:01:46 UTC - Matteo Merli: You just need to make sure that a server fetches the state for a user from the database when it first starts serving that user ---- 2018-03-23 19:02:34 UTC - Igor Zubchenok: Sounds like I need a cluster of dispatchers on top of Pulsar to forward messages to the proper user topic. Every dispatcher serves a partition of users and should have cached producers per user topic. ---- 2018-03-23 19:03:24 UTC - Piotr: I really don't think you need that :slightly_smiling_face: But I can't explain my idea in any other way now ---- 2018-03-23 19:03:36 UTC - Piotr: or I missed something essential ---- 2018-03-23 19:04:50 UTC - Igor Zubchenok: It is not only a chat. It's real-time tracking, broadcasting and order dispatching. ---- 2018-03-23 19:06:08 UTC - Matteo Merli: @Igor Zubchenok if you assign a user to a set of servers, you can broadcast the notification to that specific group ---- 2018-03-23 19:06:15 UTC - Karthik Palanivelu: Hi @Matteo Merli, I brought up 3 standalone Pulsar instances and am trying to configure the clusters. I am following your instructions and issued the command below, and got a Connection refused error: pulsar-admin clusters create c --url <pulsar://IP:8080>. I get the same error when I run list as well. But telnet works fine. Any setting I need to make? Please advise. ---- 2018-03-23 19:06:18 UTC - Igor Zubchenok: Agree, if messages are in Cassandra. ---- 2018-03-23 19:06:22 UTC - Piotr: ok. Yeah I think broadcasting is the tricky part ---- 2018-03-23 19:06:32 UTC - Matteo Merli: At that point you don't need 1 topic per user anymore ---- 2018-03-23 19:07:37 UTC - Matteo Merli: @Karthik Palanivelu the pulsar-admin URL should use HTTP ---- 2018-03-23 19:09:52 UTC - Matteo Merli: Correct, but using the message bus for notifications will solve the polling/scanning heavy_check_mark : Igor Zubchenok ---- 2018-03-23 19:11:12 UTC - Karthik Palanivelu: @Matteo Merli I tried with the url and broker-url options and both return connection refused. I can access these instances from my local env, the 3 standalone instances are in AWS ---- 2018-03-23 19:13:43 UTC - Igor Zubchenok: So there are 3 ways: 1. Have a ProducerForNamespace. 2.
Have a cluster of dispatchers for partitions of users that just cache Producers for every user in the dispatcher's partition and forward messages to the proper users' topics. 3. Store messages outside of Pulsar, use sticky-to-server (or group of servers) user sessions, and send notifications about new messages via Pulsar through a single topic per server (or group of servers) -- this can be done with Kafka as well. ---- 2018-03-23 19:14:27 UTC - Piotr: Igor: do you want to broadcast exactly the same message to absolutely everyone? ---- 2018-03-23 19:15:29 UTC - Igor Zubchenok: Nope. Broadcast is to selected users, sending orders. This requires a queue per user. ---- 2018-03-23 19:16:14 UTC - Piotr: aha ok ---- 2018-03-23 19:16:29 UTC - Igor Zubchenok: Every operation in the system can generate messages to different users. So arbitrary sending is needed. ---- 2018-03-23 19:16:40 UTC - Piotr: so it's a multicast then ---- 2018-03-23 19:16:58 UTC - Piotr: @Matteo Merli do you think efficient multicasting is possible using Functions? ---- 2018-03-23 19:17:17 UTC - Piotr: I can't seem to find any available API for Functions :slightly_smiling_face: ---- 2018-03-23 19:17:27 UTC - Igor Zubchenok: The messages sent to arbitrary users are actually different. It's not just sending one message to multiple users. ---- 2018-03-23 19:18:30 UTC - Piotr: ah ok. So not even multicasting, but batching, since this will probably be done by one and the same process in huge batches. Got it ---- 2018-03-23 19:20:18 UTC - Piotr: with RabbitMQ you could do it quite fast using routing keys - I am wondering if Functions can be used to do something similar ---- 2018-03-23 19:20:57 UTC - Igor Zubchenok: It would be great if I could send 1000 messages to 1000 different topics in a single batch. :wink: Currently I have to split the batch per topic and create 1000 producers. ---- 2018-03-23 19:25:18 UTC - Piotr: yep... I think I read somewhere that Functions can be used to forward messages from one topic to another... So then you could just have one topic to send all messages to, and then have a Function forwarding them to the right topic. But maybe that was the Kafka docs, as I can't find it again ---- 2018-03-23 19:27:11 UTC - Igor Zubchenok: if you use functions for that. @Matteo Merli said there is a context in a function to send a message to an arbitrary topic. ---- 2018-03-23 19:27:49 UTC - Sijie Guo: > I can't seem to find any available API for Functions we are still actively working on the Functions documentation. you can check out this: <https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/> <https://streaml.io/blog/pulsar-functions/> or you can check out the code :slightly_smiling_face: ---- 2018-03-23 19:28:24 UTC - Matteo Merli: A possible way would be to use a partitioned topic for that and have the function use a failover subscription +1 : Igor Zubchenok ---- 2018-03-23 19:28:48 UTC - Matteo Merli: And you publish using the user name as the message key +1 : Igor Zubchenok ---- 2018-03-23 19:29:40 UTC - Sijie Guo: > if you use functions for that. @Matteo Merli said there is a context in a function to send a message to an arbitrary topic. just fyi (not to interrupt the discussion): <https://github.com/apache/incubator-pulsar/blob/master/pulsar-functions/api-java/src/main/java/org/apache/pulsar/functions/api/Context.java#L135> +1 : Igor Zubchenok ---- 2018-03-23 19:31:58 UTC - Piotr: nice - thanks. I really like the Functions feature so far.
Then I think your only problem left to solve should be the ability to have tens of millions of topics ---- 2018-03-23 19:32:36 UTC - Piotr: which should be possible currently, and even more with this: <https://github.com/apache/incubator-pulsar/wiki/PIP-8-Pulsar-beyond-1M-topics> ---- 2018-03-23 19:33:20 UTC - Karthik Palanivelu: @Matteo Merli is this because the superuser roles setting is empty in my standalone conf? ---- 2018-03-23 19:35:50 UTC - Igor Zubchenok: are these functions like a cluster of dispatchers that forwards messages to the proper topic, with Producer instance caching? ---- 2018-03-23 19:36:27 UTC - Igor Zubchenok: if yes, this simplifies the problem significantly. ---- 2018-03-23 20:04:03 UTC - Igor Zubchenok: I'm so happy my case can most likely be solved with Pulsar + a dispatch function. I've checked the mentioned Context and found that there is nothing like Publishers there, just a `publish` method to an arbitrary `topic`. @Matteo Merli said that it will cost resources. Could anyone provide any details on sending messages with this `Context.publish` method? ---- 2018-03-23 20:17:12 UTC - Piotr: yes, I wonder if that will be much more efficient than just doing it with publishers ---- 2018-03-23 20:17:43 UTC - Piotr: at a minimum, you'll get rid of some roundtrip time ---- 2018-03-23 20:19:07 UTC - Igor Zubchenok: At least this is partitioned, and only a limited number of publishers would need to be cached (if they are cached under the hood of the Context class) ---- 2018-03-23 20:21:37 UTC - Igor Zubchenok: If not cached and the same technique is used for sending messages to topics, it sounds the same as creating a Publisher for every single message. ---- 2018-03-23 20:21:58 UTC - Igor Zubchenok: @Matteo Merli @Sijie Guo could you provide some details on this please? heavy_plus_sign : Piotr ---- 2018-03-23 21:23:22 UTC - Matteo Merli: That is correct. By using a partitioned topic with hashing based on the message key, and failover subscription mode for the function, each instance will only see messages for a certain subset of users ---- 2018-03-23 22:14:49 UTC - Sijie Guo: @Igor Zubchenok sorry, was away for a meeting. will reply soon ok_hand : Igor Zubchenok ---- 2018-03-23 22:23:13 UTC - Igor Zubchenok: great! ---- 2018-03-23 22:23:56 UTC - Igor Zubchenok: > with hashing based on the message key maybe hash(userId) or hash(target topic)? ---- 2018-03-23 22:24:41 UTC - Igor Zubchenok: welcome back :) ---- 2018-03-23 22:55:14 UTC - Sijie Guo: > Could anyone provide any details on sending messages with this `Context.publish` method? currently `Context.publish` is just wrapping the producers, so fundamentally it isn't more efficient than just using a plain Pulsar producer. > If not cached and the same technique is used for sending messages to topics, it sounds the same as creating a Publisher for every single message. there are producer instances cached, so to some extent it is "more" efficient than creating a producer for every single message. back to your use case: 1) if you are using one giant topic with many partitions (as Matteo suggested), you can use use_name as the key for routing messages, so the same user will end up on the same partition. it is the most efficient option for functions, as it caches only one topic producer which handles multiple partitions. 2) if you wish to use a topic per user, that means each topic will have very low traffic, and the cache is not that effective. but something like a NamespaceProducer (as you suggested) is probably better to be introduced.
A NamespaceProducer should be very straightforward to implement though. ---- 2018-03-23 22:55:33 UTC - Sijie Guo: @Igor Zubchenok hope this gives you a better idea. /cc @Matteo Merli if he has more to add ---- 2018-03-23 23:04:19 UTC - Igor Zubchenok: Seems I did not get the idea around `use_name`. What is use_name? ---- 2018-03-23 23:04:58 UTC - Sijie Guo: @Igor Zubchenok sorry, userId? ---- 2018-03-23 23:05:29 UTC - Sijie Guo: @Igor Zubchenok - this is the thread I am referring to regarding "user_name" +1 : Igor Zubchenok ---- 2018-03-23 23:05:46 UTC - Sijie Guo: @Igor Zubchenok I tagged you in the thread you had with @Matteo Merli ---- 2018-03-23 23:11:10 UTC - Igor Zubchenok: Great. All this makes sense and actually looks scalable. Thank you for your help! ---- 2018-03-23 23:11:11 UTC - Igor Zubchenok: BTW I hope you'll eventually consider implementing NamespaceProducer as a more efficient way, and one that's easy to implement inside Pulsar. :slightly_smiling_face: heavy_plus_sign : Piotr ---- 2018-03-23 23:17:41 UTC - Sijie Guo: no problem. feel free to ping us if you have more questions. ---- 2018-03-23 23:17:59 UTC - Sijie Guo: I created an issue for this namespace producer idea - <https://github.com/apache/incubator-pulsar/issues/1432> +1 : Igor Zubchenok ---- 2018-03-23 23:18:03 UTC - Sijie Guo: just fyi ---- 2018-03-23 23:18:46 UTC - Sijie Guo: feel free to drop your ideas in that issue, so we can come up with a clear picture of the requirements ---- 2018-03-24 00:31:02 UTC - Igor Zubchenok: 1. Is there any way to flush the batch of a producer? ---- 2018-03-24 00:35:17 UTC - Igor Zubchenok: 2. When is MessageListener.reachedEndOfTopic called? How do I send something to a topic to trigger this at all consumers? ---- 2018-03-24 00:36:11 UTC - Matteo Merli: 1. Calling `send()` on the last message ---- 2018-03-24 00:37:15 UTC - Matteo Merli: 2. `reachedEndOfTopic` is only called when the topic is "terminated". That's an explicit operation that "seals" the topic and doesn't allow any new messages to be published anymore ---- 2018-03-24 00:38:54 UTC - Igor Zubchenok: 1. send() of what object (I use the Java client)? There is no such method producer.send() - when I send, I only get a MessageId instance. ---- 2018-03-24 00:39:48 UTC - Matteo Merli: Let's say you're publishing 10 messages. You can call `producer.sendAsync()` 9 times and `producer.send()` for the last message ---- 2018-03-24 00:40:14 UTC - Matteo Merli: (in current master code, the `producer.send()` triggers an immediate flush) ---- 2018-03-24 00:40:29 UTC - Igor Zubchenok: Only async calls use batches? ---- 2018-03-24 00:40:51 UTC - Matteo Merli: Yes, the batching is done transparently in the producer ---- 2018-03-24 00:41:14 UTC - Matteo Merli: batching adds a small delay to allow for grouping the messages ---- 2018-03-24 00:41:35 UTC - Matteo Merli: when used on sync send, it would slow down the throughput ---- 2018-03-24 00:42:16 UTC - Igor Zubchenok: `producer.send()` will trigger a flush, but it will make my call synchronous, right? I prefer an async-only API. ---- 2018-03-24 00:42:51 UTC - Matteo Merli: yes, then why do you need to flush? ---- 2018-03-24 00:43:38 UTC - Matteo Merli: just configure the batch grouping time on the producer, and use sendAsync(), tracking the CompletableFuture ---- 2018-03-24 00:43:42 UTC - Igor Zubchenok: I use an infinite timeout and max number of messages, to send everything I already have in a single batch. :slightly_smiling_face: ---- 2018-03-24 00:44:27 UTC - Matteo Merli: there's not much advantage in having a huge batch.
the max is capped at 128Kb anyway ---- 2018-03-24 00:44:37 UTC - Igor Zubchenok: It's ok. ---- 2018-03-24 00:44:39 UTC - Matteo Merli: there's no throughput advantage after that ---- 2018-03-24 00:45:57 UTC - Matteo Merli: I would say to configure the batch grouping time to a value that you're comfortable with in terms of latency, eg: 1ms, 10ms or 100ms, and forget about it ---- 2018-03-24 00:46:08 UTC - Igor Zubchenok: Ok, <128Kb of my data that I want to send in a single batch and flush in an async way. ---- 2018-03-24 00:47:53 UTC - Igor Zubchenok: Does that make sense? ---- 2018-03-24 00:48:47 UTC - Matteo Merli: I think I don't fully understand your goal :slightly_smiling_face: ---- 2018-03-24 00:55:08 UTC - Igor Zubchenok: :slightly_smiling_face: Ok, I'm writing a high performance server. Everything must be async. At some point I have many messages (with total size < 128Kb) to send and I don't want to send all these messages one by one. So I would like to use a batch to send them, and I have to set some batch timeout. But my server will wait too long for the timeout. And I don't want my messages to be split if the batch preparation took longer than the timeout. However, I know when the batch is actually ready. ---- 2018-03-24 00:57:33 UTC - Igor Zubchenok: clearer now? ---- 2018-03-24 00:59:49 UTC - Igor Zubchenok: How do I terminate a topic? I use a topic for session initialization messages. When all messages are sent I want to terminate it. ---- 2018-03-24 01:02:16 UTC - Matteo Merli: There's a REST / Java / CLI command to terminate a topic. Though keep in mind that terminating is different from deleting ---- 2018-03-24 01:02:41 UTC - Igor Zubchenok: I don't need to delete, because I need the messages to be delivered. ---- 2018-03-24 01:04:42 UTC - Igor Zubchenok: Neither the Producer nor the PulsarClient classes have anything like terminate. ---- 2018-03-24 01:05:04 UTC - Matteo Merli: > But my server will wait too long for the timeout That shouldn't require many resources in an async server though. > And I don't want my messages to be split if the batch preparation took longer than the timeout. This I don't understand why it is important ---- 2018-03-24 01:07:16 UTC - Matteo Merli: It's in the admin API. It's a separate artifact; basically it's a wrapper on the REST API. <http://pulsar.apache.org/api/admin/org/apache/pulsar/client/admin/PersistentTopics.html#terminateTopicAsync-java.lang.String-> ---- 2018-03-24 01:07:32 UTC - Igor Zubchenok: Thanks! Will look into it. ---- 2018-03-24 01:10:53 UTC - Igor Zubchenok: > This I don't understand why it is important I guess all messages in a batch are queued together and not mixed with other producers' messages. So if the timeout is small, messages are sent in a few batches and could get mixed. ---- 2018-03-24 01:11:55 UTC - Igor Zubchenok: > That shouldn't require many resources in an async server though. But it adds needless latency. ---- 2018-03-24 01:13:54 UTC - Matteo Merli: Yes, but if you set the batch timeout to 1ms, the latency impact is very low and you batch down to 1K writes/s, which is very sustainable on brokers/bookies ---- 2018-03-24 01:16:08 UTC - Igor Zubchenok: I'm always looking for maximum performance, so the ideal way would be if I can control the batch - all that's needed is to trigger flushAsync manually. ---- 2018-03-24 01:17:27 UTC - Igor Zubchenok: And all immediately flushed messages would not be mixed with other producers' sends. ---- 2018-03-24 01:21:57 UTC - Matteo Merli: Yes, it makes sense and that's easy to expose. It's already implemented in that way.
I'm just not sure about the effective improvement it would get: with 1ms, the latency increase is ~0.5ms on average, which is still much lower than the overall system latency (~5ms) ---- 2018-03-24 01:26:51 UTC - Igor Zubchenok: Just for max theoretical performance. Should I file an issue on GitHub then? ---- 2018-03-24 01:34:19 UTC - Matteo Merli: Sure, go ahead ---- 2018-03-24 01:41:18 UTC - Igor Zubchenok: <https://github.com/apache/incubator-pulsar/issues/1433> ---- 2018-03-24 02:08:54 UTC - Matteo Merli: :+1: ---- 2018-03-24 08:09:43 UTC - Igor Zubchenok: Unfortunately PulsarAdmin does not work due to issue #1409 <https://github.com/apache/incubator-pulsar/issues/1409> ----
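Putting Matteo's batching advice into a sketch: enable batching with a small max publish delay, publish with sendAsync(), and use a blocking send() on the last message when an explicit flush point is needed (newer client releases also expose a flushAsync() on the producer for exactly this). Topic, delay values and payloads are placeholders:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class BatchedPublisher {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://sample/standalone/ns1/session-init")  // placeholder
                .enableBatching(true)
                .batchingMaxPublishDelay(1, TimeUnit.MILLISECONDS)          // small grouping window
                .batchingMaxMessages(1000)
                .create();

        List<byte[]> payloads = Arrays.asList("m1".getBytes(), "m2".getBytes(), "m3".getBytes());

        // All but the last message go out asynchronously and are grouped into batches.
        for (int i = 0; i < payloads.size() - 1; i++) {
            producer.sendAsync(payloads.get(i));
        }
        // Per the discussion above, a blocking send() on the last message forces the
        // pending batch out instead of waiting for the max publish delay.
        producer.send(payloads.get(payloads.size() - 1));

        producer.close();
        client.close();
    }
}
```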

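Finally, a sketch of the publish side of the partitioned-topic dispatch Matteo proposed: all servers write to a single partitioned topic with the user id as the message key, so a given user's messages always hash to the same partition, and the dispatch function (run with a failover subscription) handles a stable subset of users. The topic name and key format are assumptions:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class DispatchPublisher {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // A single "dispatch" topic shared by all server instances,
        // created beforehand as a partitioned topic.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://sample/standalone/ns1/dispatch")  // placeholder
                .create();

        // The user id as message key: key-hash routing keeps each user on one
        // partition, so a failover-subscribed dispatch function instance sees a
        // stable subset of users.
        producer.newMessage()
                .key("user-42")
                .value("order accepted".getBytes())
                .send();

        producer.close();
        client.close();
    }
}
```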