2018-03-23 09:17:04 UTC - Piotr: @Ivan Kelly also found this written by you, 
great read! :slightly_smiling_face: : 
<https://streaml.io/blog/bookkeeper-toab/> . A question regarding 
<http://streaml.io|streaml.io> - am I correct in assuming you will be providing 
some kind of managed hosting of pulsar clusters? And got any timeline for that?
----
2018-03-23 10:24:20 UTC - Ivan Kelly: we're focusing on on-prem right now. I 
haven't heard any plans for saas, but I'm remote so it may have passed me by
----
2018-03-23 10:25:35 UTC - Piotr: ok
----
2018-03-23 13:24:14 UTC - Jon Bock: Piotr, what cloud would be of most interest 
to you?  We’re gathering information to help us in our planning.
----
2018-03-23 13:24:43 UTC - Jon Bock: Or are you agnostic to which cloud a SaaS 
offering would be hosted on?
----
2018-03-23 13:47:08 UTC - Piotr: @Jon Bock AWS, and in my case it would 
probably have to be in the EU region as EU data protection laws are getting 
even more strict in may this year
----
2018-03-23 14:04:03 UTC - Ivan Kelly: the US is ok according to GDPR
----
2018-03-23 14:06:07 UTC - Piotr: ah ok, if you say so - about to meet with some 
lawyers to get more clarity regarding that :slightly_smiling_face: Anyhow, I 
probably will have to wait for any offering to be available in EU-ireland since 
that´s where I´m hosted
----
2018-03-23 14:07:14 UTC - Piotr: but I guess you´ll start with the most popular 
US regions etc and move on from there
----
2018-03-23 17:19:01 UTC - Igor Zubchenok: Hello!!
Is there a lightweight way to post a message/batch of messages to a topic?
Due to my business logic I have to send messages to arbitrary topic and I worry 
about time and resources needed to create a publisher just for a single message.
----
2018-03-23 17:20:16 UTC - Matteo Merli: From where are you publishing from?
----
2018-03-23 17:22:00 UTC - Igor Zubchenok: From different stateless instances of 
my server (java)
----
2018-03-23 17:22:38 UTC - Matteo Merli: Well, you could create the producer on 
startup and keep reusing that for each message
----
2018-03-23 17:22:45 UTC - Matteo Merli: (on each instance)
----
2018-03-23 17:23:08 UTC - Igor Zubchenok: Will I have to create millions 
producers for millions topics at every instace?
----
2018-03-23 17:23:18 UTC - Matteo Merli: Other option could be to use the 
WebSocket proxy: <http://pulsar.apache.org/docs/latest/clients/WebSocket/>
----
2018-03-23 17:24:25 UTC - Matteo Merli: yes, what is the reason for creating 
millions of topics? (Many times is easier to work around that assumption, than 
dealing with resources necessaries to support them)
----
2018-03-23 17:25:53 UTC - Igor Zubchenok: I have an independent queue(topic) 
per every user. Millions of topics is a major reason to start using Pulsar 
instead of Kafka.
----
2018-03-23 17:29:53 UTC - Igor Zubchenok: I've checked WebSocket API and there 
is nothing about sending messages to an arbitrary topic. You have to specify 
topic in URL to WebSocket. Did I miss something?
----
2018-03-23 17:30:24 UTC - Matteo Merli: yes, it’s working in the same way, you 
need to have 1 producer per topic
----
2018-03-23 17:30:41 UTC - Matteo Merli: it’s just more suited if you have 
short-living processes
----
2018-03-23 17:32:50 UTC - Igor Zubchenok: So this means a connection 
establishing and a WebSocket handshake for every single message posted to an 
arbitrary topic. Right?
----
2018-03-23 17:33:28 UTC - Matteo Merli: Well, webosocket session is meant to be 
long-living as well
----
2018-03-23 17:33:45 UTC - Matteo Merli: you can use same session to publish 
many messages
----
2018-03-23 17:33:56 UTC - Matteo Merli: (with ordering)
----
2018-03-23 17:34:00 UTC - Igor Zubchenok: Caching in memory millions of 
websocket connections does not sounds as a good way.
----
2018-03-23 17:34:20 UTC - Matteo Merli: Yes, it’s more efficient to use Java 
client anywya
----
2018-03-23 17:34:20 UTC - Igor Zubchenok: As well as Producers.
----
2018-03-23 17:34:30 UTC - Matteo Merli: where connections are pooled
----
2018-03-23 17:34:46 UTC - Matteo Merli: it’s just the memory that you need to 
keep all the objects
----
2018-03-23 17:35:16 UTC - Igor Zubchenok: Is Producer instance a lightweight?
----
2018-03-23 17:35:49 UTC - Matteo Merli: it is in the sense that there are no 
threads / TCP connections specifics for a publisher
----
2018-03-23 17:36:22 UTC - Igor Zubchenok: What is actually happens when a new 
Producer is created? Why is it a long operation?
----
2018-03-23 17:37:04 UTC - Matteo Merli: client will go to broker and check and 
it establishes a sessions (check permissions, etc)
----
2018-03-23 17:37:50 UTC - Igor Zubchenok: Why this cannot be done once per 
`namespace`?
----
2018-03-23 17:37:57 UTC - Matteo Merli: if the number of active topics is much 
smaller than the total, you could keep a cache of producers
----
2018-03-23 17:38:11 UTC - Matteo Merli: no, each topic can be served by 
different machines
----
2018-03-23 17:38:39 UTC - Igor Zubchenok: It is done by a hash of topic, isn't 
it?
----
2018-03-23 17:39:09 UTC - Matteo Merli: the grouping is done by hash, but the 
assingment is dynamic
----
2018-03-23 17:40:58 UTC - Igor Zubchenok: So to find a proper broker instance 
you should have a meta information about all available brokers for namespace 
and then choose proper by hash of a topic. Sounds like there is no actual need 
to contact broker for every single topic.
----
2018-03-23 17:42:16 UTC - Matteo Merli: It’s more complicated than that 
:slightly_smiling_face: because that assignments and even the hash split can 
change over time
----
2018-03-23 17:44:07 UTC - Igor Zubchenok: What do Producer of a topic do when 
this happens?
----
2018-03-23 17:44:34 UTC - Igor Zubchenok: I guess the same can be done by such 
'ProducerForNamespace'.
----
2018-03-23 17:45:27 UTC - Matteo Merli: To clarify, even though a cluster can 
support millions of topics, it’s not always the best way to model a problem. 
For example, you could use a database to store per-user data and message bus as 
a machine to machine notification channel (eg: 1 topic per server)
----
2018-03-23 17:46:39 UTC - Matteo Merli: When topic moves to a different broker, 
producer gets notified (either gracefully or forcefully) and it will redo the 
service discovery and session creation
----
2018-03-23 17:48:24 UTC - Igor Zubchenok: So it could be ok for re-discovery 
and re-creation for 'ProducerForNamespace' in the same manner.
----
2018-03-23 17:51:06 UTC - Igor Zubchenok: That's what I have now, I store 
queues in Cassandra.
----
2018-03-23 17:53:08 UTC - Igor Zubchenok: However when I write a message to 
Cassandra I have no fast way to know where to send a notification cause user 
could be connected to arbitrary server instance. So I do 100ms period scanning 
of new messages for connected to server users. And Cassandra is dying... 
:slightly_smiling_face:
----
2018-03-23 17:53:26 UTC - Igor Zubchenok: Am I wrong?
----
2018-03-23 17:56:17 UTC - Matteo Merli: I see, would a single queue which 
broadcast info to all servers work? Each server could just drop the message if 
user is not connected there. 

Of course it doesn’t scale indefinitely, but you can always group users to 
servers.
----
2018-03-23 17:58:56 UTC - Matteo Merli: For that to work, there would be 
needing several changes : 
 * Right now, producer is based on that producer session for efficiency reason. 
There is some context expected on both client and broker around this producer. 
 * For example, internally the topic name doesn’t need to be repeated for each 
message (and many other things like that)
 * If a topic moves, the notification is at topic level, though if broker 
doesn’t know a client is interested in publishing to a topic, it cannot notify 
it.
----
2018-03-23 18:00:02 UTC - Piotr: How about using Pulsar Functions for this? 
Can´t you use Functions to forward messages to other topics? In case you want 
to broadcast messages etc
----
2018-03-23 18:00:23 UTC - Matteo Merli: good point
----
2018-03-23 18:01:11 UTC - Igor Zubchenok: It could be a namespace 
context/notifications/etc.

P.S. I have stateless servers, so caching of Producers goes against stateless 
idea.
----
2018-03-23 18:01:28 UTC - Igor Zubchenok: Functions sounds good if I put topic 
name in a message? Will it work good?
----
2018-03-23 18:01:52 UTC - Matteo Merli: you need some info to know where to 
route to
----
2018-03-23 18:03:11 UTC - Igor Zubchenok: Can I specify arbitrary target topic 
in a function?
----
2018-03-23 18:05:12 UTC - Matteo Merli: you can use the supplied context object 
to publish to arbitrary topic
----
2018-03-23 18:06:13 UTC - Matteo Merli: though, again, if you want to fan-out 
to unlimited number of topic, the same reources/concerns would apply
----
2018-03-23 18:08:28 UTC - Igor Zubchenok: I see. Do I do something wrong? Is it 
an anti-pattern to have a separated queue per user?
----
2018-03-23 18:09:21 UTC - Piotr: Igor, I assume you want to broadcast system 
wide messages to users or something? How about you have one topic just for 
system wide messages?
----
2018-03-23 18:09:32 UTC - Piotr: that would also mean using a lot less resources
----
2018-03-23 18:10:04 UTC - Matteo Merli: &gt; I see. Do I do something wrong? Is 
it an anti-pattern to have a separated queue per user? 

If users is unbound, yes.
----
2018-03-23 18:10:09 UTC - Piotr: the ability to have millions of topics (one 
per user) is also the reason why I switched from kafka to pulsar
----
2018-03-23 18:12:36 UTC - Piotr: also, don´t you have it on your roadmap to 
support millions of topics further ahead? So question is if this is a 
requirement you have from day 0 igor, or if the actual number of active topics 
is going to be less than that (maybe you don´t need unlimited retention on all 
topics, for example)
----
2018-03-23 18:13:26 UTC - Piotr: I guess using pulsar for a mail inbox for all 
your users is not optimal - you could use cassandra or something for that and 
use pulsar for realtime data
----
2018-03-23 18:14:03 UTC - Matteo Merli: Yes, even currently it’s running in 
production with several million topics at Yahoo (though, not one topic per user 
:wink: )
----
2018-03-23 18:14:53 UTC - Igor Zubchenok: Requirement is to have 
millions-to-millions messaging. scaleable.
----
2018-03-23 18:15:34 UTC - Igor Zubchenok: and stateless server instances.
----
2018-03-23 18:15:44 UTC - Matteo Merli: also 1M is very different from 100M
----
2018-03-23 18:16:08 UTC - Matteo Merli: (I mean in terms of design assumptions)
----
2018-03-23 18:17:18 UTC - Piotr: @Matteo Merli are "idle" topics (with no 
active consumers or producers) costly? Or can you have millions of those in the 
store just waiting?
----
2018-03-23 18:18:36 UTC - Matteo Merli: the cost is mostly in terms of memory 
in broker. There’s a small amount of objects for each topic that is “loaded”
----
2018-03-23 18:19:49 UTC - Piotr: Ok. How about running several clusters and 
choosing one based on user id (sharding)?
heavy_plus_sign : Igor Zubchenok
----
2018-03-23 18:20:25 UTC - Matteo Merli: That is not the biggest problem, you 
can do that already by having multiple brokers
----
2018-03-23 18:20:26 UTC - Piotr: or simply assigning a cluster id at user 
creation time or something
----
2018-03-23 18:23:09 UTC - Igor Zubchenok: This is what I want to avoid - 
grouping users and dispatching messages to a proper server where user is 
connected.
----
2018-03-23 18:24:42 UTC - Igor Zubchenok: @Piotr this could be a solution for 
scaling, but the problem with Producers is still actual.
----
2018-03-23 18:26:20 UTC - Piotr: @Igor Zubchenok how about using sticky 
sessions?
----
2018-03-23 18:26:29 UTC - Matteo Merli: @Igor Zubchenok Most of notifications 
systems that needs to scale to 100 of millions of users typically use a 
cascading mechanism of notifications
----
2018-03-23 18:26:42 UTC - Piotr: you would still keep servers stateless, but 
using a cookie you could make each client target the same servers most of the 
time
----
2018-03-23 18:27:35 UTC - Igor Zubchenok: This became a problem if a server 
instance goes down and users reconnect to other available instances.
----
2018-03-23 18:28:17 UTC - Piotr: I´m guessing you´re designing some kind of 
chat app here? In that case each user will keep messaging almost the same users 
in a period of time - and by using sticky sessions the requests from that user 
will keep going to the same server until it goes down and you need to switch. 
Point is then a producer cache should work nicely
----
2018-03-23 18:28:49 UTC - Matteo Merli: a common way for that is to have “pods” 
of servers handling groups of users,  where each pod has a max-size
----
2018-03-23 18:29:02 UTC - Piotr: Igor: sticky sessions doesnt mean your servers 
become stateful - it just means that the load balancer will keep sending 
requests from the same user to the same server for as long as that server is 
online
----
2018-03-23 18:30:38 UTC - Piotr: 
<https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html>
----
2018-03-23 18:34:45 UTC - Igor Zubchenok: Agree, this is a solution - having 
groups of servers for partition of users and reconnect user to the same group 
of servers. But possibility that provide ProducerForNamespace would solve this 
problem in much more elegant and scaleable way and it would be superb.
----
2018-03-23 18:35:16 UTC - Piotr: hehe
----
2018-03-23 18:35:28 UTC - Matteo Merli: &gt;  But possibility that provide 
ProducerForNamespace would solve this problem in much more elegant and 
scaleable way and it would be superb. 

Let me think a bit more about it. :slightly_smiling_face:
----
2018-03-23 18:37:52 UTC - Igor Zubchenok: Otherwise I have to configure user 
partitioning in my solution, adding a new partition will result in session 
closing. As a result - lot of additional development and operations. And 
sticky-to-group sessions.
----
2018-03-23 18:38:23 UTC - Piotr: I´m thinking there would probably be a tradeof 
either way - isn´t creating a producer also a way to avoid doing security 
checks on every time you send a message?
----
2018-03-23 18:38:45 UTC - Matteo Merli: exactly
----
2018-03-23 18:38:59 UTC - Piotr: Igor: why sticky to group? Session affinity is 
a yes/no config thing in the load balancer :slightly_smiling_face:
----
2018-03-23 18:39:18 UTC - Piotr: unless I missed something sticky sessions + 
producer caching should solve your problems beautifully
----
2018-03-23 18:40:06 UTC - Piotr: sticky sessions = on first request load 
balancer assigns the client to 1 server for X amount of time, or until that 
server dies - then it selects a new one. So... quite scalable, elastic etc etc
----
2018-03-23 18:45:08 UTC - Igor Zubchenok: @Piotr To have a few producers I need 
to know to which group of servers a user is connected and send messages to the 
group. Server instance reads all messages in topic of a group and filters all 
messages. So it is the same problem and solution that was with Kafka, in 
general.
----
2018-03-23 18:46:40 UTC - Igor Zubchenok: Another issue here is that user could 
be offline and all servers will drop message. So the message will be 
undelivered.
----
2018-03-23 18:48:11 UTC - Piotr: you should ack messages after they have been 
read to avoid that
----
2018-03-23 18:48:26 UTC - Igor Zubchenok: So you mean still have a topic per 
user?
----
2018-03-23 18:48:35 UTC - Piotr: also, I am still not sure we are on the same 
page with that thing with the groups. Why would you need groups of servers? 
:slightly_smiling_face:
----
2018-03-23 18:48:48 UTC - Piotr: Ok here is the flow in my mind
----
2018-03-23 18:49:07 UTC - Piotr: Sending a message from user A to user B:
----
2018-03-23 18:49:37 UTC - Piotr: 1) Client sends request for example POST 
/users/userA/messages
----
2018-03-23 18:50:30 UTC - Piotr: 2) Load balancer sends the request to server 
1, and along with the response from the server it also sets a cookie saying 
"you belong to server 1" (session affinity/ sticky sessions)
----
2018-03-23 18:51:38 UTC - Piotr: 2b) On server 1, a producer is created for 
topic <persistent://yourapp/global/userMessages/userA>, a message is sent 
through that producer and the producer is put in a pool/cache
----
2018-03-23 18:52:20 UTC - Piotr: 3) Client sends another message request for 
the same destination user (user A) POST /users/userA/messages, this time it 
also includes the cookie "you belong to server 1"
----
2018-03-23 18:52:31 UTC - Piotr: 4) Load balancer sees the cookie, and sends 
the request to server 1
----
2018-03-23 18:52:59 UTC - Piotr: 5) server 1 already has a producer for 
<persistent://yourapp/global/userMessages/userA> in its cache, so it can use 
that - avoiding having to create it again
----
2018-03-23 18:53:19 UTC - Piotr: Does that make sense? :slightly_smiling_face:
----
2018-03-23 18:53:25 UTC - Piotr: Or did I misunderstand something?
----
2018-03-23 18:54:59 UTC - Piotr: Also, you could do it all through websockets 
of course - then you wouldn´t even need the cookie - the websocket session 
would suffice as a connection to one and the same server for its lifetime
----
2018-03-23 18:55:42 UTC - Igor Zubchenok: I meant group of server was needed to 
have another server where user can connect to avoid session re-initialization 
if some server instance is down.
----
2018-03-23 18:56:07 UTC - Igor Zubchenok: And I actually use websocket 
connections.
----
2018-03-23 18:58:54 UTC - Piotr: ok I guess you have a reason for grouping the 
servers together
----
2018-03-23 18:58:59 UTC - Igor Zubchenok: There is no problem with consumer per 
user, cause there are limited number of users connected to a server. Problem is 
with Producers, when a user can send a message to another arbitrary user. To 
send a message I should create and cache millions of Producers at a single 
server instance which is not scalable.

So solution is to know which group of servers a user is connected, and send 
messages to that group, where message is 'forwarded' to proper user's topic.
----
2018-03-23 19:00:51 UTC - Matteo Merli: @Igor Zubchenok regarding the acking 
problem. If you treat the message bus as a notification, it's not a problem to 
ack messages immediately even if a consumer is not connected 
----
2018-03-23 19:01:27 UTC - Piotr: I think you can safely assume that a user will 
message at most X other users within a reasonable amount of time. And if you 
can direct all outgoing messages to the same server for a period of time, then 
that server can cache those producers - no need to cache more producers than 
you need
----
2018-03-23 19:01:46 UTC - Matteo Merli: You just need to make sure that a 
server fetches the state for the users from the database, when it forst start 
serving that user 
----
2018-03-23 19:02:34 UTC - Igor Zubchenok: Sounds like I need a cluster of 
dispatchers on the top of Pulsar to forward messages to proper user topic. 
Every dispatcher serves a partition of users and should have cached producers 
per topic of user.
----
2018-03-23 19:03:24 UTC - Piotr: I really don´t think you need that 
:slightly_smiling_face: But I can´t explain my idea in any other way now
----
2018-03-23 19:03:36 UTC - Piotr: or I missed something essential
----
2018-03-23 19:04:50 UTC - Igor Zubchenok: It is not only a chat. It real-time 
tracking, broadcasting and orders dispatching.
----
2018-03-23 19:06:08 UTC - Matteo Merli: @Igor Zubchenok if you assign a user to 
this set of servers, you can broadcast the notification to that specific group 
----
2018-03-23 19:06:15 UTC - Karthik Palanivelu: Hi @Matteo Merli, I brought up 3 
standalone pulsar and trying to configure the clusters. I am following your 
instructions and issued the below command to receive error of Connection 
refused..\pulsar-admin clusters create c —url <pulsar://IP:8080>. I am getting 
same error when I fired the list as well, I am getting the same error. But 
telnet works fine. Any setting I need to make? Please advice.
----
2018-03-23 19:06:18 UTC - Igor Zubchenok: Agree if messages are in Cassandra.
----
2018-03-23 19:06:22 UTC - Piotr: ok. Yeah I think broadcasting is the tricky 
part
----
2018-03-23 19:06:32 UTC - Matteo Merli: At that point you don't need 1 topic 
per user anymore
----
2018-03-23 19:07:37 UTC - Matteo Merli: @Karthik Palanivelu the pulsar admin 
URL should use the HTTP 
----
2018-03-23 19:09:52 UTC - Matteo Merli: Correct, but using the message bus for 
notifications will solve the polling/scanning
heavy_check_mark : Igor Zubchenok
----
2018-03-23 19:11:12 UTC - Karthik Palanivelu: @Matteo Merli I tried with url 
and broker-url options and both all returning connection refused. I can access 
these instances from my local env, 3 standalone are in AWS 
----
2018-03-23 19:13:43 UTC - Igor Zubchenok: So there are 3 ways:
1. Have a ProducerForNamespace.
2. Have a cluster of dispatchers for partitions of users that just cache 
Producers for every user in dispatcher's partition and forward messages to 
proper users' topics.
3. Store messages out of Pulsar, use sticky to server(or group of servers) user 
sessions, and send notifications about new messages via Pulsar via single topic 
per server(or group of servers) -- such can be done with Kafka as well.
----
2018-03-23 19:14:27 UTC - Piotr: Igor: do you want to broadcast exactly the 
same message to absolutely everyone?
----
2018-03-23 19:15:29 UTC - Igor Zubchenok: Nope. Broadcast is to selected users, 
sending orders. This require a queue per user.
----
2018-03-23 19:16:14 UTC - Piotr: aha ok
----
2018-03-23 19:16:29 UTC - Igor Zubchenok: Every operation in system can 
generate messages to different users. So arbitrary sending is needed.
----
2018-03-23 19:16:40 UTC - Piotr: so it´s a multicast then
----
2018-03-23 19:16:58 UTC - Piotr: @Matteo Merli do you think efficient 
multicasting is possible using Functions?
----
2018-03-23 19:17:17 UTC - Piotr: I can´t seem to find any available API for 
Functions :slightly_smiling_face:
----
2018-03-23 19:17:27 UTC - Igor Zubchenok: Messages that are sent to arbitrary 
users are actually different. It's not just sending a message to multiple users.
----
2018-03-23 19:18:30 UTC - Piotr: ah ok. So not even multicasting, but batching 
since this will probably be done by one and the same process in huge batches. 
Got it
----
2018-03-23 19:20:18 UTC - Piotr: with rabbitmq you could do it quite fast using 
routing keys - I am thinking if Functions can be used to do something similar
----
2018-03-23 19:20:57 UTC - Igor Zubchenok: It would be great if I can send 1000 
messages to 1000 different topics in a single batch. :wink: Currently I have to 
split the batch per topic and create 1000 producers.
----
2018-03-23 19:25:18 UTC - Piotr: yep... I think I read somewhere that Functions 
can be used to forward messages from one topic to another... So then you could 
just have one topic to send all messages and then have a Function forwarding 
them to the right topic. But maybe that was the kafka docs as I can´t find that 
again
----
2018-03-23 19:26:59 UTC - Igor Zubchenok: 
----
2018-03-23 19:27:11 UTC - Igor Zubchenok: if you use functions for that. 
@Matteo Merli told there is a context in function to send a message to 
arbitrary topic.
----
2018-03-23 19:27:49 UTC - Sijie Guo: &gt; I can´t seem to find any available 
API for Functions

we are still actively working on the function documentation. 

you can checkout this:

<https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/>
<https://streaml.io/blog/pulsar-functions/>

or you can checkout the code :slightly_smiling_face:
----
2018-03-23 19:28:24 UTC - Matteo Merli: A possible way would be to use 
partitioned topic for that and have the function use failover subscription 
+1 : Igor Zubchenok
----
2018-03-23 19:28:48 UTC - Matteo Merli: And you publish using user name as 
message key
+1 : Igor Zubchenok
----
2018-03-23 19:29:40 UTC - Sijie Guo: &gt; if you use functions for that. 
@Matteo Merli told there is a context in function to send a message to 
arbitrary topic.

just fyi (not interrupt the discussion) : 
<https://github.com/apache/incubator-pulsar/blob/master/pulsar-functions/api-java/src/main/java/org/apache/pulsar/functions/api/Context.java#L135>
+1 : Igor Zubchenok
----
2018-03-23 19:31:58 UTC - Piotr: nice - thanks. I really like the Functions 
feature so far. Then I think your only problem left to solve should be the 
ability to have tens of millions of topics
----
2018-03-23 19:32:36 UTC - Piotr: which should be possible currently, and even 
more with this: 
<https://github.com/apache/incubator-pulsar/wiki/PIP-8-Pulsar-beyond-1M-topics>
----
2018-03-23 19:33:20 UTC - Karthik Palanivelu: @Matteo Merli is this because of 
super user roles is empty in my standalone conf?
----
2018-03-23 19:35:50 UTC - Igor Zubchenok: these functions are like a cluster of 
dispatchers that forwards messages to proper topic with Producer instances 
caching?
----
2018-03-23 19:36:27 UTC - Igor Zubchenok: if yes, this simplifies the problem 
significantly.
----
2018-03-23 20:04:03 UTC - Igor Zubchenok: I so much happy my case most likely 
can be solved with Pulsar + dispatch function.

I've checked mentioned Context and found that there is nothing like Publishers. 
Just a `publish` method to an arbitrary `topic`. @Matteo Merli told that it 
will cost resources. Could anyone provide any details on sending messages with 
this `Context.publish` method?
----
2018-03-23 20:17:12 UTC - Piotr: yes I wonder if that will be much more 
efficient than just doing it with publishers
----
2018-03-23 20:17:43 UTC - Piotr: at a minimum, you´ll get rid of some roundtrip 
time
----
2018-03-23 20:19:07 UTC - Igor Zubchenok: This is at least partitioned and 
limited number of publishers should be cached (and if they are cached under the 
hood of Context class)
----
2018-03-23 20:21:37 UTC - Igor Zubchenok: If not cached and same technique is 
used for sending messages to topics, it's sounds like the same as creating a 
Publisher for every single message.
----
2018-03-23 20:21:58 UTC - Igor Zubchenok: @Matteo Merli @Sijie Guo could you 
provide some details on this please?
heavy_plus_sign : Piotr
----
2018-03-23 21:23:22 UTC - Matteo Merli: That is correct. By using a 
parrtitioned topic with hashing based on the message key, and failover 
subscription mode for the function, each instance will only see messages for a 
certain subset of users
----
2018-03-23 22:14:49 UTC - Sijie Guo: @Igor Zubchenok sorry was away for a 
meeting. will reply soon
ok_hand : Igor Zubchenok
----
2018-03-23 22:23:13 UTC - Igor Zubchenok: great!
----
2018-03-23 22:23:56 UTC - Igor Zubchenok: &gt; with hashing based on the 
message key
maybe hash(userId) or hash(target topic)?
----
2018-03-23 22:24:41 UTC - Igor Zubchenok: welcome back:)
----
2018-03-23 22:55:14 UTC - Sijie Guo: &gt; Could anyone provide any details on 
sending messages with this `Context.publish` method?

currently `Context.publish` is just wrapping the producers. so fundamentally it 
isn’t more efficient than just using the plain pulsar producer.

&gt; If not cached and same technique is used for sending messages to topics, 
it’s sounds like the same as creating a Publisher for every single message.

there are producer instances cached. so at some extend, it is “more” efficient 
than just creating a producer for every single message.

back to your use case:

1) if you are using a giant topic with many partitions (as what Matteo 
suggested), you can use use_name as the key for routing messages. so same user 
will end up at same partition. it is most efficient for functions, as it caches 
only one topic producer which handles multiple partitions.

2) if you wish to use a topic per user, that means each topic will have very 
low traffic. cache is not that efficient. but something like NamespaceProducer 
(as what you suggested) is probably better to be introduced. NamespaceProducer 
should be very straightforward to implement though.
----
2018-03-23 22:55:33 UTC - Sijie Guo: @Igor Zubchenok hope this give you better 
idea. /cc @Matteo Merli if he has more to add
----
2018-03-23 23:04:19 UTC - Igor Zubchenok: Seems I did not got the idea around 
`use_name`. What is use_name?
----
2018-03-23 23:04:58 UTC - Sijie Guo: @Igor Zubchenok sorry, userId?
----
2018-03-23 23:05:29 UTC - Sijie Guo: @Igor Zubchenok - this is the thread I am 
referring to regarding “user_name”
+1 : Igor Zubchenok
----
2018-03-23 23:05:46 UTC - Sijie Guo: @Igor Zubchenok I tagged you in the tread 
you had with @Matteo Merli
----
2018-03-23 23:11:10 UTC - Igor Zubchenok: Great. All this makes sense and 
actually looks scaleable. Thank you for your help!
----
2018-03-23 23:11:11 UTC - Igor Zubchenok: BTW I hope you'll eventually consider 
to implement NamespaceProducer as more efficient way and easy to implement 
inside Pulsar. :slightly_smiling_face:
heavy_plus_sign : Piotr
----
2018-03-23 23:17:41 UTC - Sijie Guo: no problem. feel free to ping us if you 
have more questions.
----
2018-03-23 23:17:59 UTC - Sijie Guo: I created an issue for this namespace 
producer idea - <https://github.com/apache/incubator-pulsar/issues/1432>
+1 : Igor Zubchenok
----
2018-03-23 23:18:03 UTC - Sijie Guo: just fyi
----
2018-03-23 23:18:46 UTC - Sijie Guo: feel free to drop your ideas in that 
issue, so we can come up an clear picture about the requirements
----
2018-03-24 00:31:02 UTC - Igor Zubchenok: 1. Is there any way to flush the 
batch of a producer?
----
2018-03-24 00:35:17 UTC - Igor Zubchenok: 2. When is 
MessageListener.reachedEndOfTopic called? How to send something to a topic to 
trigger this at all consumers?
----
2018-03-24 00:36:11 UTC - Matteo Merli: 1. Calling `send()` on the last message
----
2018-03-24 00:37:15 UTC - Matteo Merli: 2. `reachedEndOfTopic` is only called 
when the topic is “terminated”. That’s an explicit operation that “seals” the 
topic and doesn’t allow any new message to be published anymore
----
2018-03-24 00:38:54 UTC - Igor Zubchenok: 1. Send() of what object (I use java 
client)? No such method producer.send(), when I send I only get MessageId 
instance.
----
2018-03-24 00:39:48 UTC - Matteo Merli: Let say you’re publishing 10 messages. 
You can call `producer.sendAsync()` 9 times and `producer.send()` for the last 
message
----
2018-03-24 00:40:14 UTC - Matteo Merli: (in current master code, the 
`producer.send()` triggers immediate flush
----
2018-03-24 00:40:29 UTC - Igor Zubchenok: Only async calls use batches?
----
2018-03-24 00:40:51 UTC - Matteo Merli: Yes, the batch is done transparently in 
producer
----
2018-03-24 00:41:14 UTC - Matteo Merli: batching adds a small delay to allow 
for grouping the messages
----
2018-03-24 00:41:35 UTC - Matteo Merli: when used on sync send, it would slow 
down the throughput
----
2018-03-24 00:42:16 UTC - Igor Zubchenok: `producer.send()` will triggers 
flush, but it will make my call synchronized, right? I prefer async API only.
----
2018-03-24 00:42:51 UTC - Matteo Merli: yes, then why do you need to flush?
----
2018-03-24 00:43:38 UTC - Matteo Merli: just configure the batch grouping time 
on producer, and use sendAsync(), tracking the CompletableFuture
----
2018-03-24 00:43:42 UTC - Igor Zubchenok: I use infinite timeout and max number 
of messages to send everything I already have to send in a single batch. 
:slightly_smiling_face:
----
2018-03-24 00:44:27 UTC - Matteo Merli: there’s not much advantage in having a 
huge batch. the max is capped at 128Kb anyway
----
2018-03-24 00:44:37 UTC - Igor Zubchenok: It's ok.
----
2018-03-24 00:44:39 UTC - Matteo Merli: tehre’s no throughtput advantage after 
that
----
2018-03-24 00:45:57 UTC - Matteo Merli: I would say to configure the batch 
grouping time to a value that you’re confortable with, in terms of latency. eg: 
1ms, 10ms or 100ms and forget about it
----
2018-03-24 00:46:08 UTC - Igor Zubchenok: Ok, &lt;128Kb of my data I want to 
send in a single batch and flush it in async way.
----
2018-03-24 00:47:53 UTC - Igor Zubchenok: Has it any sense?
----
2018-03-24 00:48:47 UTC - Matteo Merli: I think I don’t fully understand your 
goal :slightly_smiling_face:
----
2018-03-24 00:55:08 UTC - Igor Zubchenok: :slightly_smiling_face: Ok
I writing high performance server. Everything must be async.
At some time I have many messages (with total size &lt; 128Kb) to send and I 
don't want to send all these messages one-by-one. So I would like to use batch 
to send them and I have to set some batch timeout. But my server will wait too 
long until timeout. And don't want my messages to be splitted if batch 
preparation was longer than timeout.
However I know when batch is actually ready.
----
2018-03-24 00:57:33 UTC - Igor Zubchenok: so more clearly?
----
2018-03-24 00:59:49 UTC - Igor Zubchenok: How to terminate a topic?
I use topic for session initialization messages. When all messages are sent I 
want to terminate it.
----
2018-03-24 01:02:16 UTC - Matteo Merli: There’s a REST / Java / CLI command to 
terminate a topic. Though keep in mind that terminate is different than deletion
----
2018-03-24 01:02:41 UTC - Igor Zubchenok: I don't need to delete cause I need 
messages to be delivered.
----
2018-03-24 01:04:42 UTC - Igor Zubchenok: Neither Producer or PulsarClient 
classes have anything like terminate.
----
2018-03-24 01:05:04 UTC - Matteo Merli: &gt; But my server will wait too long 
until timeout

That shouldn’t require much resources in a async server though. 

&gt; And don’t want my messages to be splitted if batch preparation was longer 
than timeout.

This I don’t understand why is important
----
2018-03-24 01:07:16 UTC - Matteo Merli: It’s on admin API. It’s a separate 
artifact. basically it’s a wrapper on REST API. 
<http://pulsar.apache.org/api/admin/org/apache/pulsar/client/admin/PersistentTopics.html#terminateTopicAsync-java.lang.String->
----
2018-03-24 01:07:32 UTC - Igor Zubchenok: Thanks! Will look into it.
----
2018-03-24 01:10:53 UTC - Igor Zubchenok: &gt;This I don’t understand why is 
important
I guess all messages in a batch are queued together and not mixed with other 
producers' messages.
So if timeout is small, messages are send in few batches and messages could be 
mixed.
----
2018-03-24 01:11:55 UTC - Igor Zubchenok: &gt;That shouldn’t require much 
resources in a async server though.
But it makes needless latency.
----
2018-03-24 01:13:54 UTC - Matteo Merli: Yes, but if you set the batch timeout 
to 1ms, the latency impact is very low and you can batch to a 1K write/s which 
is very sustainable on broker/bookies
----
2018-03-24 01:16:08 UTC - Igor Zubchenok: I always looking for the maximum 
performance so the ideal way if I can control the batch - all that needed is to 
trigger flushAsync manually.
----
2018-03-24 01:17:27 UTC - Igor Zubchenok: And all immediately flushed messages 
are not mixed with another producers sends.
----
2018-03-24 01:21:57 UTC - Matteo Merli: Yes, it makes sense and that’s easy to 
expose. It’s already implemented in that way. I’m not just sure about the 
effective improvement that it could get. with 1ms , the latency is increase of 
~0.5 average which is still much lower than the overall system latency (~5ms)
----
2018-03-24 01:26:51 UTC - Igor Zubchenok: Just for max theoretical performance. 
Should I file an issue then at github?
----
2018-03-24 01:34:19 UTC - Matteo Merli: Sure, go ahead
----
2018-03-24 01:41:18 UTC - Igor Zubchenok: 
<https://github.com/apache/incubator-pulsar/issues/1433>
----
2018-03-24 02:08:54 UTC - Matteo Merli: :+1:
----
2018-03-24 08:09:43 UTC - Igor Zubchenok: Unfortunately PulsarAdmin does not 
work due to issue #1409
<https://github.com/apache/incubator-pulsar/issues/1409>
----

Reply via email to