2018-03-23 09:17:04 UTC - Piotr: @Ivan Kelly also found this written by you, great read! :slightly_smiling_face: : <https://streaml.io/blog/bookkeeper-toab/> . A question regarding <http://streaml.io|streaml.io> - am I correct in assuming you will be providing some kind of managed hosting of Pulsar clusters? And do you have any timeline for that? ---- 2018-03-23 10:24:20 UTC - Ivan Kelly: we're focusing on on-prem right now. I haven't heard any plans for SaaS, but I'm remote so it may have passed me by ---- 2018-03-23 10:25:35 UTC - Piotr: ok ---- 2018-03-23 13:24:14 UTC - Jon Bock: Piotr, which cloud would be of most interest to you? We're gathering information to help us in our planning. ---- 2018-03-23 13:24:43 UTC - Jon Bock: Or are you agnostic about which cloud a SaaS offering would be hosted on? ---- 2018-03-23 13:47:08 UTC - Piotr: @Jon Bock AWS, and in my case it would probably have to be in an EU region, as EU data protection laws are getting even stricter in May this year ---- 2018-03-23 14:04:03 UTC - Ivan Kelly: the US is ok according to GDPR ---- 2018-03-23 14:06:07 UTC - Piotr: ah ok, if you say so - about to meet with some lawyers to get more clarity on that :slightly_smiling_face: Anyhow, I'll probably have to wait for any offering to be available in EU (Ireland), since that's where I'm hosted ---- 2018-03-23 14:07:14 UTC - Piotr: but I guess you'll start with the most popular US regions etc and move on from there ---- 2018-03-23 17:19:01 UTC - Igor Zubchenok: Hello!! Is there a lightweight way to post a message/batch of messages to a topic? Due to my business logic I have to send messages to arbitrary topics, and I worry about the time and resources needed to create a producer just for a single message. ---- 2018-03-23 17:20:16 UTC - Matteo Merli: Where are you publishing from? ---- 2018-03-23 17:22:00 UTC - Igor Zubchenok: From different stateless instances of my server (Java) ---- 2018-03-23 17:22:38 UTC - Matteo Merli: Well, you could create the producer on startup and keep reusing it for each message ---- 2018-03-23 17:22:45 UTC - Matteo Merli: (on each instance) ---- 2018-03-23 17:23:08 UTC - Igor Zubchenok: Will I have to create millions of producers for millions of topics on every instance? ---- 2018-03-23 17:23:18 UTC - Matteo Merli: Another option would be to use the WebSocket proxy: <http://pulsar.apache.org/docs/latest/clients/WebSocket/> ---- 2018-03-23 17:24:25 UTC - Matteo Merli: yes, what is the reason for creating millions of topics? (Many times it's easier to work around that assumption than to deal with the resources necessary to support them) ---- 2018-03-23 17:25:53 UTC - Igor Zubchenok: I have an independent queue (topic) per user. Millions of topics is a major reason to start using Pulsar instead of Kafka. ---- 2018-03-23 17:29:53 UTC - Igor Zubchenok: I've checked the WebSocket API and there is nothing about sending messages to an arbitrary topic. You have to specify the topic in the WebSocket URL. Did I miss something? ---- 2018-03-23 17:30:24 UTC - Matteo Merli: yes, it's working in the same way, you need to have 1 producer per topic ---- 2018-03-23 17:30:41 UTC - Matteo Merli: it's just better suited if you have short-lived processes ---- 2018-03-23 17:32:50 UTC - Igor Zubchenok: So this means establishing a connection and a WebSocket handshake for every single message posted to an arbitrary topic. Right?
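A minimal sketch of the reuse pattern Matteo suggests above (one client per process, one producer created at startup and reused for every send), using the Pulsar Java client; the service URL and topic name are placeholders, and the builder-style API shown here comes from newer client releases:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class NotificationPublisher {
    public static void main(String[] args) throws Exception {
        // One client per process: TCP connections to brokers are pooled inside it.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")                      // placeholder
                .build();

        // One producer per topic, created once at startup and reused afterwards.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://sample/standalone/ns1/notifications")  // placeholder
                .create();

        // Reusing the producer avoids the per-message lookup/handshake cost.
        for (int i = 0; i < 10; i++) {
            producer.send(("event-" + i).getBytes());
        }

        producer.close();
        client.close();
    }
}
```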
---- 2018-03-23 17:33:28 UTC - Matteo Merli: Well, the websocket session is meant to be long-lived as well ---- 2018-03-23 17:33:45 UTC - Matteo Merli: you can use the same session to publish many messages ---- 2018-03-23 17:33:56 UTC - Matteo Merli: (with ordering) ---- 2018-03-23 17:34:00 UTC - Igor Zubchenok: Caching millions of websocket connections in memory does not sound like a good way. ---- 2018-03-23 17:34:20 UTC - Matteo Merli: Yes, it's more efficient to use the Java client anyway ---- 2018-03-23 17:34:20 UTC - Igor Zubchenok: As well as Producers. ---- 2018-03-23 17:34:30 UTC - Matteo Merli: where connections are pooled ---- 2018-03-23 17:34:46 UTC - Matteo Merli: it's just the memory that you need to keep all the objects ---- 2018-03-23 17:35:16 UTC - Igor Zubchenok: Is a Producer instance lightweight? ---- 2018-03-23 17:35:49 UTC - Matteo Merli: it is in the sense that there are no threads / TCP connections specific to a publisher ---- 2018-03-23 17:36:22 UTC - Igor Zubchenok: What actually happens when a new Producer is created? Why is it a long operation? ---- 2018-03-23 17:37:04 UTC - Matteo Merli: the client will go to the broker and establish a session (check permissions, etc.) ---- 2018-03-23 17:37:50 UTC - Igor Zubchenok: Why can't this be done once per `namespace`? ---- 2018-03-23 17:37:57 UTC - Matteo Merli: if the number of active topics is much smaller than the total, you could keep a cache of producers ---- 2018-03-23 17:38:11 UTC - Matteo Merli: no, each topic can be served by different machines ---- 2018-03-23 17:38:39 UTC - Igor Zubchenok: It is done by a hash of the topic, isn't it? ---- 2018-03-23 17:39:09 UTC - Matteo Merli: the grouping is done by hash, but the assignment is dynamic ---- 2018-03-23 17:40:58 UTC - Igor Zubchenok: So to find the proper broker instance you need meta information about all available brokers for a namespace, and then you choose the proper one by the hash of the topic. Sounds like there is no actual need to contact the broker for every single topic. ---- 2018-03-23 17:42:16 UTC - Matteo Merli: It's more complicated than that :slightly_smiling_face: because the assignments and even the hash split can change over time ---- 2018-03-23 17:44:07 UTC - Igor Zubchenok: What does the Producer of a topic do when this happens? ---- 2018-03-23 17:44:34 UTC - Igor Zubchenok: I guess the same could be done by such a 'ProducerForNamespace'. ---- 2018-03-23 17:45:27 UTC - Matteo Merli: To clarify, even though a cluster can support millions of topics, it's not always the best way to model a problem. For example, you could use a database to store per-user data and the message bus as a machine-to-machine notification channel (eg: 1 topic per server) ---- 2018-03-23 17:46:39 UTC - Matteo Merli: When a topic moves to a different broker, the producer gets notified (either gracefully or forcefully) and it will redo the service discovery and session creation ---- 2018-03-23 17:48:24 UTC - Igor Zubchenok: So the same re-discovery and re-creation could work for a 'ProducerForNamespace' in the same manner. ---- 2018-03-23 17:51:06 UTC - Igor Zubchenok: That's what I have now, I store queues in Cassandra. ---- 2018-03-23 17:53:08 UTC - Igor Zubchenok: However, when I write a message to Cassandra I have no fast way to know where to send a notification, because the user could be connected to an arbitrary server instance. So every 100ms I scan for new messages for the users connected to each server. And Cassandra is dying... :slightly_smiling_face: ---- 2018-03-23 17:53:26 UTC - Igor Zubchenok: Am I wrong?
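To illustrate Matteo's point about keeping a cache of producers when the set of active topics is much smaller than the total, here is a rough sketch of a bounded, LRU-style producer cache; the size limit and the topic-keyed layout are assumptions for the example, not anything provided by the Pulsar client itself:

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

// Bounded LRU cache of producers keyed by topic name. Evicted producers are
// closed asynchronously so the cache never holds more than MAX_PRODUCERS.
public class ProducerCache {
    private static final int MAX_PRODUCERS = 10_000; // tune to available memory

    private final PulsarClient client;
    private final Map<String, Producer<byte[]>> producers =
            new LinkedHashMap<String, Producer<byte[]>>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Producer<byte[]>> eldest) {
                    if (size() > MAX_PRODUCERS) {
                        eldest.getValue().closeAsync(); // release the broker-side session
                        return true;
                    }
                    return false;
                }
            };

    public ProducerCache(PulsarClient client) {
        this.client = client;
    }

    public synchronized Producer<byte[]> forTopic(String topic) throws PulsarClientException {
        Producer<byte[]> p = producers.get(topic);
        if (p == null) {
            // One broker lookup + session per *topic*, not per message.
            p = client.newProducer().topic(topic).create();
            producers.put(topic, p);
        }
        return p;
    }
}
```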
---- 2018-03-23 17:56:17 UTC - Matteo Merli: I see, would a single queue which broadcasts info to all servers work? Each server could just drop the message if the user is not connected there.
Of course it doesn't scale indefinitely, but you can always group users to servers. ---- 2018-03-23 17:58:56 UTC - Matteo Merli: For that to work, several changes would be needed: * Right now, publishing is tied to a producer session for efficiency reasons. There is some context expected on both client and broker around this producer. * For example, internally the topic name doesn't need to be repeated for each message (and many other things like that) * If a topic moves, the notification is at the topic level, so if the broker doesn't know a client is interested in publishing to a topic, it cannot notify it. ---- 2018-03-23 18:00:02 UTC - Piotr: How about using Pulsar Functions for this? Can't you use Functions to forward messages to other topics? In case you want to broadcast messages etc ---- 2018-03-23 18:00:23 UTC - Matteo Merli: good point ---- 2018-03-23 18:01:11 UTC - Igor Zubchenok: It could be a namespace context/notifications/etc. P.S. I have stateless servers, so caching Producers goes against the stateless idea. ---- 2018-03-23 18:01:28 UTC - Igor Zubchenok: Functions sound good if I put the topic name in a message? Will it work well? ---- 2018-03-23 18:01:52 UTC - Matteo Merli: you need some info to know where to route to ---- 2018-03-23 18:03:11 UTC - Igor Zubchenok: Can I specify an arbitrary target topic in a function? ---- 2018-03-23 18:05:12 UTC - Matteo Merli: you can use the supplied context object to publish to an arbitrary topic ---- 2018-03-23 18:06:13 UTC - Matteo Merli: though, again, if you want to fan out to an unlimited number of topics, the same resources/concerns would apply ---- 2018-03-23 18:08:28 UTC - Igor Zubchenok: I see. Am I doing something wrong? Is it an anti-pattern to have a separate queue per user? ---- 2018-03-23 18:09:21 UTC - Piotr: Igor, I assume you want to broadcast system-wide messages to users or something? How about having one topic just for system-wide messages? ---- 2018-03-23 18:09:32 UTC - Piotr: that would also mean using a lot less resources ---- 2018-03-23 18:10:04 UTC - Matteo Merli: > I see. Am I doing something wrong? Is it an anti-pattern to have a separate queue per user? If the number of users is unbounded, yes. ---- 2018-03-23 18:10:09 UTC - Piotr: the ability to have millions of topics (one per user) is also the reason why I switched from Kafka to Pulsar ---- 2018-03-23 18:12:36 UTC - Piotr: also, don't you have it on your roadmap to support millions of topics further ahead? So the question is whether this is a requirement you have from day 0, Igor, or if the actual number of active topics is going to be less than that (maybe you don't need unlimited retention on all topics, for example) ---- 2018-03-23 18:13:26 UTC - Piotr: I guess using Pulsar as a mail inbox for all your users is not optimal - you could use Cassandra or something for that and use Pulsar for realtime data ---- 2018-03-23 18:14:03 UTC - Matteo Merli: Yes, even now it's running in production with several million topics at Yahoo (though, not one topic per user :wink: ) ---- 2018-03-23 18:14:53 UTC - Igor Zubchenok: The requirement is to have millions-to-millions messaging. Scalable. ---- 2018-03-23 18:15:34 UTC - Igor Zubchenok: and stateless server instances. ---- 2018-03-23 18:15:44 UTC - Matteo Merli: also, 1M is very different from 100M ---- 2018-03-23 18:16:08 UTC - Matteo Merli: (I mean in terms of design assumptions) ---- 2018-03-23 18:17:18 UTC - Piotr: @Matteo Merli are "idle" topics (with no active consumers or producers) costly?
Or can you have millions of those in the store just waiting? ---- 2018-03-23 18:18:36 UTC - Matteo Merli: the cost is mostly in terms of memory in the broker. There's a small amount of objects for each topic that is "loaded" ---- 2018-03-23 18:19:49 UTC - Piotr: Ok. How about running several clusters and choosing one based on user id (sharding)? heavy_plus_sign : Igor Zubchenok ---- 2018-03-23 18:20:25 UTC - Matteo Merli: That is not the biggest problem, you can do that already by having multiple brokers ---- 2018-03-23 18:20:26 UTC - Piotr: or simply assigning a cluster id at user creation time or something ---- 2018-03-23 18:23:09 UTC - Igor Zubchenok: This is what I want to avoid - grouping users and dispatching messages to the proper server where the user is connected. ---- 2018-03-23 18:24:42 UTC - Igor Zubchenok: @Piotr this could be a solution for scaling, but the problem with Producers is still there. ---- 2018-03-23 18:26:20 UTC - Piotr: @Igor Zubchenok how about using sticky sessions? ---- 2018-03-23 18:26:29 UTC - Matteo Merli: @Igor Zubchenok Most notification systems that need to scale to hundreds of millions of users typically use a cascading mechanism of notifications ---- 2018-03-23 18:26:42 UTC - Piotr: you would still keep servers stateless, but using a cookie you could make each client target the same server most of the time ---- 2018-03-23 18:27:35 UTC - Igor Zubchenok: This becomes a problem if a server instance goes down and users reconnect to other available instances. ---- 2018-03-23 18:28:17 UTC - Piotr: I'm guessing you're designing some kind of chat app here? In that case each user will keep messaging almost the same users over a period of time - and by using sticky sessions the requests from that user will keep going to the same server until it goes down and you need to switch. The point is that a producer cache should then work nicely ---- 2018-03-23 18:28:49 UTC - Matteo Merli: a common way to do that is to have "pods" of servers handling groups of users, where each pod has a max size ---- 2018-03-23 18:29:02 UTC - Piotr: Igor: sticky sessions don't mean your servers become stateful - it just means that the load balancer will keep sending requests from the same user to the same server for as long as that server is online ---- 2018-03-23 18:30:38 UTC - Piotr: <https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html> ---- 2018-03-23 18:34:45 UTC - Igor Zubchenok: Agree, this is a solution - having groups of servers for a partition of users and reconnecting a user to the same group of servers. But the possibility to provide a ProducerForNamespace would solve this problem in a much more elegant and scalable way, and it would be superb. ---- 2018-03-23 18:35:16 UTC - Piotr: hehe ---- 2018-03-23 18:35:28 UTC - Matteo Merli: > But the possibility to provide a ProducerForNamespace would solve this problem in a much more elegant and scalable way, and it would be superb. Let me think a bit more about it. :slightly_smiling_face: ---- 2018-03-23 18:37:52 UTC - Igor Zubchenok: Otherwise I have to configure user partitioning in my solution, and adding a new partition will result in session closing. As a result - a lot of additional development and operations. And sticky-to-group sessions. ---- 2018-03-23 18:38:23 UTC - Piotr: I'm thinking there would probably be a tradeoff either way - isn't creating a producer also a way to avoid doing security checks every time you send a message?
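As a rough illustration of the Functions idea discussed above (a function that reads a target from the incoming message and uses the supplied context object to publish to an arbitrary topic), here is a hypothetical dispatch function; the `userId|body` payload format and the per-user topic naming are made up for the example:

```java
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// All servers publish to one shared "inbox" topic; the function forwards each
// message to a per-user topic derived from the payload.
public class DispatchFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) {
        // Hypothetical payload format: "<userId>|<body>"
        int sep = input.indexOf('|');
        String userId = input.substring(0, sep);
        String body = input.substring(sep + 1);

        // Context.publish sends to an arbitrary topic; the producers behind it
        // are managed (and reused) by the Functions runtime.
        context.publish("persistent://sample/standalone/ns1/user-" + userId, body);
        return null; // nothing goes to the function's own output topic
    }
}
```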
---- 2018-03-23 18:38:45 UTC - Matteo Merli: exactly ---- 2018-03-23 18:38:59 UTC - Piotr: Igor: why sticky to a group? Session affinity is a yes/no config thing in the load balancer :slightly_smiling_face: ---- 2018-03-23 18:39:18 UTC - Piotr: unless I missed something, sticky sessions + producer caching should solve your problems beautifully ---- 2018-03-23 18:40:06 UTC - Piotr: sticky sessions = on the first request the load balancer assigns the client to 1 server for X amount of time, or until that server dies - then it selects a new one. So... quite scalable, elastic etc etc ---- 2018-03-23 18:45:08 UTC - Igor Zubchenok: @Piotr To have only a few producers I need to know which group of servers a user is connected to and send messages to that group. A server instance reads all messages in the group's topic and filters them. So it is the same problem and solution I had with Kafka, in general. ---- 2018-03-23 18:46:40 UTC - Igor Zubchenok: Another issue here is that a user could be offline and all servers will drop the message. So the message will be undelivered. ---- 2018-03-23 18:48:11 UTC - Piotr: you should ack messages after they have been read to avoid that ---- 2018-03-23 18:48:26 UTC - Igor Zubchenok: So you mean still have a topic per user? ---- 2018-03-23 18:48:35 UTC - Piotr: also, I am still not sure we are on the same page with that thing with the groups. Why would you need groups of servers? :slightly_smiling_face: ---- 2018-03-23 18:48:48 UTC - Piotr: Ok here is the flow in my mind ---- 2018-03-23 18:49:07 UTC - Piotr: Sending a message from user A to user B: ---- 2018-03-23 18:49:37 UTC - Piotr: 1) Client sends a request, for example POST /users/userA/messages ---- 2018-03-23 18:50:30 UTC - Piotr: 2) Load balancer sends the request to server 1, and along with the response from the server it also sets a cookie saying "you belong to server 1" (session affinity / sticky sessions) ---- 2018-03-23 18:51:38 UTC - Piotr: 2b) On server 1, a producer is created for topic <persistent://yourapp/global/userMessages/userA>, a message is sent through that producer and the producer is put in a pool/cache ---- 2018-03-23 18:52:20 UTC - Piotr: 3) Client sends another message request for the same destination user (user A) POST /users/userA/messages, this time it also includes the cookie "you belong to server 1" ---- 2018-03-23 18:52:31 UTC - Piotr: 4) Load balancer sees the cookie, and sends the request to server 1 ---- 2018-03-23 18:52:59 UTC - Piotr: 5) server 1 already has a producer for <persistent://yourapp/global/userMessages/userA> in its cache, so it can use that - avoiding having to create it again ---- 2018-03-23 18:53:19 UTC - Piotr: Does that make sense? :slightly_smiling_face: ---- 2018-03-23 18:53:25 UTC - Piotr: Or did I misunderstand something? ---- 2018-03-23 18:54:59 UTC - Piotr: Also, you could do it all through websockets of course - then you wouldn't even need the cookie - the websocket session would suffice as a connection to one and the same server for its lifetime ---- 2018-03-23 18:55:42 UTC - Igor Zubchenok: I meant a group of servers is needed so there is another server the user can connect to, to avoid session re-initialization if some server instance goes down. ---- 2018-03-23 18:56:07 UTC - Igor Zubchenok: And I actually use websocket connections.
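For the earlier "single queue broadcast to all servers" idea, a sketch of the consuming side: each server subscribes to the same notifications topic under its own subscription name, so every server sees every message and simply drops the ones for users it doesn't currently hold a connection for. The topic name, per-server subscription naming and the session map are assumptions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

// One subscription per server on a shared topic = every server receives every
// notification (a broadcast). Messages for users not connected locally are dropped.
public class NotificationListener {
    private final Map<String, Object> connectedSessions = new ConcurrentHashMap<>();

    public void start(PulsarClient client, String serverId) throws Exception {
        client.newConsumer()
                .topic("persistent://sample/standalone/ns1/server-notifications") // placeholder
                .subscriptionName("server-" + serverId)  // own subscription per server
                .messageListener((Consumer<byte[]> consumer, Message<byte[]> msg) -> {
                    String userId = msg.getKey();        // assume the user id travels as the message key
                    Object session = connectedSessions.get(userId);
                    if (session != null) {
                        // push msg.getData() down this user's websocket here
                    }
                    consumer.acknowledgeAsync(msg);      // ack immediately; it is only a notification
                })
                .subscribe();
    }
}
```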
---- 2018-03-23 18:58:54 UTC - Piotr: ok I guess you have a reason for grouping the servers together ---- 2018-03-23 18:58:59 UTC - Igor Zubchenok: There is no problem with a consumer per user, because there is a limited number of users connected to a server. The problem is with Producers, when a user can send a message to another arbitrary user. To send a message I would have to create and cache millions of Producers on a single server instance, which is not scalable. So the solution is to know which group of servers a user is connected to, and send messages to that group, where the message is 'forwarded' to the proper user's topic. ---- 2018-03-23 19:00:51 UTC - Matteo Merli: @Igor Zubchenok regarding the acking problem: if you treat the message bus as a notification channel, it's not a problem to ack messages immediately even if a consumer is not connected ---- 2018-03-23 19:01:27 UTC - Piotr: I think you can safely assume that a user will message at most X other users within a reasonable amount of time. And if you can direct all outgoing messages to the same server for a period of time, then that server can cache those producers - no need to cache more producers than you need ---- 2018-03-23 19:01:46 UTC - Matteo Merli: You just need to make sure that a server fetches the state for a user from the database when it first starts serving that user ---- 2018-03-23 19:02:34 UTC - Igor Zubchenok: Sounds like I need a cluster of dispatchers on top of Pulsar to forward messages to the proper user topic. Every dispatcher serves a partition of users and should have cached producers per user topic. ---- 2018-03-23 19:03:24 UTC - Piotr: I really don't think you need that :slightly_smiling_face: But I can't explain my idea in any other way now ---- 2018-03-23 19:03:36 UTC - Piotr: or I missed something essential ---- 2018-03-23 19:04:50 UTC - Igor Zubchenok: It is not only a chat. It's real-time tracking, broadcasting and order dispatching. ---- 2018-03-23 19:06:08 UTC - Matteo Merli: @Igor Zubchenok if you assign a user to a set of servers, you can broadcast the notification to that specific group ---- 2018-03-23 19:06:15 UTC - Karthik Palanivelu: Hi @Matteo Merli, I brought up 3 standalone Pulsar instances and am trying to configure the clusters. I am following your instructions and issued the command below, and got a Connection refused error: pulsar-admin clusters create c --url <pulsar://IP:8080>. I get the same error when I run list as well. But telnet works fine. Any setting I need to make? Please advise. ---- 2018-03-23 19:06:18 UTC - Igor Zubchenok: Agree, if messages are in Cassandra. ---- 2018-03-23 19:06:22 UTC - Piotr: ok. Yeah I think broadcasting is the tricky part ---- 2018-03-23 19:06:32 UTC - Matteo Merli: At that point you don't need 1 topic per user anymore ---- 2018-03-23 19:07:37 UTC - Matteo Merli: @Karthik Palanivelu the pulsar-admin URL should use HTTP ---- 2018-03-23 19:09:52 UTC - Matteo Merli: Correct, but using the message bus for notifications will solve the polling/scanning heavy_check_mark : Igor Zubchenok ---- 2018-03-23 19:11:12 UTC - Karthik Palanivelu: @Matteo Merli I tried with the url and broker-url options and both return connection refused. I can access these instances from my local env, the 3 standalone instances are in AWS ---- 2018-03-23 19:13:43 UTC - Igor Zubchenok: So there are 3 ways: 1. Have a ProducerForNamespace. 2.
Have a cluster of dispatchers for partitions of users that just cache Producers for every user in the dispatcher's partition and forward messages to the proper users' topics. 3. Store messages outside of Pulsar, use sticky-to-server (or group of servers) user sessions, and send notifications about new messages via Pulsar through a single topic per server (or group of servers) -- this can be done with Kafka as well. ---- 2018-03-23 19:14:27 UTC - Piotr: Igor: do you want to broadcast exactly the same message to absolutely everyone? ---- 2018-03-23 19:15:29 UTC - Igor Zubchenok: Nope. Broadcast is to selected users, sending orders. This requires a queue per user. ---- 2018-03-23 19:16:14 UTC - Piotr: aha ok ---- 2018-03-23 19:16:29 UTC - Igor Zubchenok: Every operation in the system can generate messages to different users. So arbitrary sending is needed. ---- 2018-03-23 19:16:40 UTC - Piotr: so it's a multicast then ---- 2018-03-23 19:16:58 UTC - Piotr: @Matteo Merli do you think efficient multicasting is possible using Functions? ---- 2018-03-23 19:17:17 UTC - Piotr: I can't seem to find any available API for Functions :slightly_smiling_face: ---- 2018-03-23 19:17:27 UTC - Igor Zubchenok: The messages sent to arbitrary users are actually different. It's not just sending one message to multiple users. ---- 2018-03-23 19:18:30 UTC - Piotr: ah ok. So not even multicasting, but batching, since this will probably be done by one and the same process in huge batches. Got it ---- 2018-03-23 19:20:18 UTC - Piotr: with RabbitMQ you could do it quite fast using routing keys - I am wondering if Functions can be used to do something similar ---- 2018-03-23 19:20:57 UTC - Igor Zubchenok: It would be great if I could send 1000 messages to 1000 different topics in a single batch. :wink: Currently I have to split the batch per topic and create 1000 producers. ---- 2018-03-23 19:25:18 UTC - Piotr: yep... I think I read somewhere that Functions can be used to forward messages from one topic to another... So then you could just have one topic to send all messages to, and then have a Function forwarding them to the right topic. But maybe that was the Kafka docs, as I can't find it again ---- 2018-03-23 19:27:11 UTC - Igor Zubchenok: if you use functions for that. @Matteo Merli said there is a context in a function to send a message to an arbitrary topic. ---- 2018-03-23 19:27:49 UTC - Sijie Guo: > I can't seem to find any available API for Functions we are still actively working on the Functions documentation. you can check out this: <https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/> <https://streaml.io/blog/pulsar-functions/> or you can check out the code :slightly_smiling_face: ---- 2018-03-23 19:28:24 UTC - Matteo Merli: A possible way would be to use a partitioned topic for that and have the function use a failover subscription +1 : Igor Zubchenok ---- 2018-03-23 19:28:48 UTC - Matteo Merli: And you publish using the user name as the message key +1 : Igor Zubchenok ---- 2018-03-23 19:29:40 UTC - Sijie Guo: > if you use functions for that. @Matteo Merli said there is a context in a function to send a message to an arbitrary topic. just fyi (not to interrupt the discussion): <https://github.com/apache/incubator-pulsar/blob/master/pulsar-functions/api-java/src/main/java/org/apache/pulsar/functions/api/Context.java#L135> +1 : Igor Zubchenok ---- 2018-03-23 19:31:58 UTC - Piotr: nice - thanks. I really like the Functions feature so far.
Then I think your only problem left to solve should be the ability to have tens of millions of topics ---- 2018-03-23 19:32:36 UTC - Piotr: which should be possible currently, and even more with this: <https://github.com/apache/incubator-pulsar/wiki/PIP-8-Pulsar-beyond-1M-topics> ---- 2018-03-23 19:33:20 UTC - Karthik Palanivelu: @Matteo Merli is this because the superuser roles setting is empty in my standalone conf? ---- 2018-03-23 19:35:50 UTC - Igor Zubchenok: are these functions like a cluster of dispatchers that forwards messages to the proper topic, with Producer instance caching? ---- 2018-03-23 19:36:27 UTC - Igor Zubchenok: if yes, this simplifies the problem significantly. ---- 2018-03-23 20:04:03 UTC - Igor Zubchenok: I'm so happy my case can most likely be solved with Pulsar + a dispatch function. I've checked the mentioned Context and found that there is nothing like Publishers there, just a `publish` method to an arbitrary `topic`. @Matteo Merli said that it will cost resources. Could anyone provide any details on sending messages with this `Context.publish` method? ---- 2018-03-23 20:17:12 UTC - Piotr: yes, I wonder if that will be much more efficient than just doing it with publishers ---- 2018-03-23 20:17:43 UTC - Piotr: at a minimum, you'll get rid of some roundtrip time ---- 2018-03-23 20:19:07 UTC - Igor Zubchenok: At least this is partitioned, and only a limited number of publishers would need to be cached (if they are cached under the hood of the Context class) ---- 2018-03-23 20:21:37 UTC - Igor Zubchenok: If not cached and the same technique is used for sending messages to topics, it sounds the same as creating a Publisher for every single message. ---- 2018-03-23 20:21:58 UTC - Igor Zubchenok: @Matteo Merli @Sijie Guo could you provide some details on this please? heavy_plus_sign : Piotr ---- 2018-03-23 21:23:22 UTC - Matteo Merli: That is correct. By using a partitioned topic with hashing based on the message key, and failover subscription mode for the function, each instance will only see messages for a certain subset of users ---- 2018-03-23 22:14:49 UTC - Sijie Guo: @Igor Zubchenok sorry, was away for a meeting. will reply soon ok_hand : Igor Zubchenok ---- 2018-03-23 22:23:13 UTC - Igor Zubchenok: great! ---- 2018-03-23 22:23:56 UTC - Igor Zubchenok: > with hashing based on the message key maybe hash(userId) or hash(target topic)? ---- 2018-03-23 22:24:41 UTC - Igor Zubchenok: welcome back :) ---- 2018-03-23 22:55:14 UTC - Sijie Guo: > Could anyone provide any details on sending messages with this `Context.publish` method? currently `Context.publish` is just wrapping the producers, so fundamentally it isn't more efficient than just using a plain Pulsar producer. > If not cached and the same technique is used for sending messages to topics, it sounds the same as creating a Publisher for every single message. there are producer instances cached, so to some extent it is "more" efficient than creating a producer for every single message. back to your use case: 1) if you are using one giant topic with many partitions (as Matteo suggested), you can use use_name as the key for routing messages, so the same user will end up on the same partition. it is the most efficient option for functions, as it caches only one topic producer which handles multiple partitions. 2) if you wish to use a topic per user, that means each topic will have very low traffic, and the cache is not that effective. but something like a NamespaceProducer (as you suggested) is probably better to be introduced.
A NamespaceProducer should be very straightforward to implement though. ---- 2018-03-23 22:55:33 UTC - Sijie Guo: @Igor Zubchenok hope this gives you a better idea. /cc @Matteo Merli if he has more to add ---- 2018-03-23 23:04:19 UTC - Igor Zubchenok: Seems I did not get the idea around `use_name`. What is use_name? ---- 2018-03-23 23:04:58 UTC - Sijie Guo: @Igor Zubchenok sorry, userId? ---- 2018-03-23 23:05:29 UTC - Sijie Guo: @Igor Zubchenok - this is the thread I am referring to regarding "user_name" +1 : Igor Zubchenok ---- 2018-03-23 23:05:46 UTC - Sijie Guo: @Igor Zubchenok I tagged you in the thread you had with @Matteo Merli ---- 2018-03-23 23:11:10 UTC - Igor Zubchenok: Great. All this makes sense and actually looks scalable. Thank you for your help! ---- 2018-03-23 23:11:11 UTC - Igor Zubchenok: BTW I hope you'll eventually consider implementing NamespaceProducer as a more efficient way, and one that's easy to implement inside Pulsar. :slightly_smiling_face: heavy_plus_sign : Piotr ---- 2018-03-23 23:17:41 UTC - Sijie Guo: no problem. feel free to ping us if you have more questions. ---- 2018-03-23 23:17:59 UTC - Sijie Guo: I created an issue for this namespace producer idea - <https://github.com/apache/incubator-pulsar/issues/1432> +1 : Igor Zubchenok ---- 2018-03-23 23:18:03 UTC - Sijie Guo: just fyi ---- 2018-03-23 23:18:46 UTC - Sijie Guo: feel free to drop your ideas in that issue, so we can come up with a clear picture of the requirements ---- 2018-03-24 00:31:02 UTC - Igor Zubchenok: 1. Is there any way to flush the batch of a producer? ---- 2018-03-24 00:35:17 UTC - Igor Zubchenok: 2. When is MessageListener.reachedEndOfTopic called? How do I send something to a topic to trigger this at all consumers? ---- 2018-03-24 00:36:11 UTC - Matteo Merli: 1. Calling `send()` on the last message ---- 2018-03-24 00:37:15 UTC - Matteo Merli: 2. `reachedEndOfTopic` is only called when the topic is "terminated". That's an explicit operation that "seals" the topic and doesn't allow any new messages to be published anymore ---- 2018-03-24 00:38:54 UTC - Igor Zubchenok: 1. send() of what object (I use the Java client)? There is no such method producer.send() - when I send, I only get a MessageId instance. ---- 2018-03-24 00:39:48 UTC - Matteo Merli: Let's say you're publishing 10 messages. You can call `producer.sendAsync()` 9 times and `producer.send()` for the last message ---- 2018-03-24 00:40:14 UTC - Matteo Merli: (in current master code, the `producer.send()` triggers an immediate flush) ---- 2018-03-24 00:40:29 UTC - Igor Zubchenok: Only async calls use batches? ---- 2018-03-24 00:40:51 UTC - Matteo Merli: Yes, the batching is done transparently in the producer ---- 2018-03-24 00:41:14 UTC - Matteo Merli: batching adds a small delay to allow for grouping the messages ---- 2018-03-24 00:41:35 UTC - Matteo Merli: when used on sync send, it would slow down the throughput ---- 2018-03-24 00:42:16 UTC - Igor Zubchenok: `producer.send()` will trigger a flush, but it will make my call synchronous, right? I prefer an async-only API. ---- 2018-03-24 00:42:51 UTC - Matteo Merli: yes, then why do you need to flush? ---- 2018-03-24 00:43:38 UTC - Matteo Merli: just configure the batch grouping time on the producer, and use sendAsync(), tracking the CompletableFuture ---- 2018-03-24 00:43:42 UTC - Igor Zubchenok: I use an infinite timeout and max number of messages, to send everything I already have in a single batch. :slightly_smiling_face: ---- 2018-03-24 00:44:27 UTC - Matteo Merli: there's not much advantage in having a huge batch.
the max is capped at 128Kb anyway ---- 2018-03-24 00:44:37 UTC - Igor Zubchenok: It's ok. ---- 2018-03-24 00:44:39 UTC - Matteo Merli: there's no throughput advantage after that ---- 2018-03-24 00:45:57 UTC - Matteo Merli: I would say to configure the batch grouping time to a value that you're comfortable with in terms of latency, eg: 1ms, 10ms or 100ms, and forget about it ---- 2018-03-24 00:46:08 UTC - Igor Zubchenok: Ok, <128Kb of my data that I want to send in a single batch and flush in an async way. ---- 2018-03-24 00:47:53 UTC - Igor Zubchenok: Does that make sense? ---- 2018-03-24 00:48:47 UTC - Matteo Merli: I think I don't fully understand your goal :slightly_smiling_face: ---- 2018-03-24 00:55:08 UTC - Igor Zubchenok: :slightly_smiling_face: Ok, I'm writing a high performance server. Everything must be async. At some point I have many messages (with total size < 128Kb) to send and I don't want to send all these messages one by one. So I would like to use a batch to send them, and I have to set some batch timeout. But my server will wait too long for the timeout. And I don't want my messages to be split if the batch preparation took longer than the timeout. However, I know when the batch is actually ready. ---- 2018-03-24 00:57:33 UTC - Igor Zubchenok: clearer now? ---- 2018-03-24 00:59:49 UTC - Igor Zubchenok: How do I terminate a topic? I use a topic for session initialization messages. When all messages are sent I want to terminate it. ---- 2018-03-24 01:02:16 UTC - Matteo Merli: There's a REST / Java / CLI command to terminate a topic. Though keep in mind that terminating is different from deleting ---- 2018-03-24 01:02:41 UTC - Igor Zubchenok: I don't need to delete, because I need the messages to be delivered. ---- 2018-03-24 01:04:42 UTC - Igor Zubchenok: Neither the Producer nor the PulsarClient classes have anything like terminate. ---- 2018-03-24 01:05:04 UTC - Matteo Merli: > But my server will wait too long for the timeout That shouldn't require many resources in an async server though. > And I don't want my messages to be split if the batch preparation took longer than the timeout. This I don't understand why it is important ---- 2018-03-24 01:07:16 UTC - Matteo Merli: It's in the admin API. It's a separate artifact; basically it's a wrapper on the REST API. <http://pulsar.apache.org/api/admin/org/apache/pulsar/client/admin/PersistentTopics.html#terminateTopicAsync-java.lang.String-> ---- 2018-03-24 01:07:32 UTC - Igor Zubchenok: Thanks! Will look into it. ---- 2018-03-24 01:10:53 UTC - Igor Zubchenok: > This I don't understand why it is important I guess all messages in a batch are queued together and not mixed with other producers' messages. So if the timeout is small, messages are sent in a few batches and could get mixed. ---- 2018-03-24 01:11:55 UTC - Igor Zubchenok: > That shouldn't require many resources in an async server though. But it adds needless latency. ---- 2018-03-24 01:13:54 UTC - Matteo Merli: Yes, but if you set the batch timeout to 1ms, the latency impact is very low and you batch down to 1K writes/s, which is very sustainable on brokers/bookies ---- 2018-03-24 01:16:08 UTC - Igor Zubchenok: I'm always looking for maximum performance, so the ideal way would be if I can control the batch - all that's needed is to trigger flushAsync manually. ---- 2018-03-24 01:17:27 UTC - Igor Zubchenok: And all immediately flushed messages would not be mixed with other producers' sends. ---- 2018-03-24 01:21:57 UTC - Matteo Merli: Yes, it makes sense and that's easy to expose. It's already implemented in that way.
I'm just not sure about the effective improvement it would get: with 1ms, the latency increase is ~0.5ms on average, which is still much lower than the overall system latency (~5ms) ---- 2018-03-24 01:26:51 UTC - Igor Zubchenok: Just for max theoretical performance. Should I file an issue on GitHub then? ---- 2018-03-24 01:34:19 UTC - Matteo Merli: Sure, go ahead ---- 2018-03-24 01:41:18 UTC - Igor Zubchenok: <https://github.com/apache/incubator-pulsar/issues/1433> ---- 2018-03-24 02:08:54 UTC - Matteo Merli: :+1: ---- 2018-03-24 08:09:43 UTC - Igor Zubchenok: Unfortunately PulsarAdmin does not work due to issue #1409 <https://github.com/apache/incubator-pulsar/issues/1409> ----
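Putting Matteo's batching advice into a sketch: enable batching with a small max publish delay, publish with sendAsync(), and use a blocking send() on the last message when an explicit flush point is needed (newer client releases also expose a flushAsync() on the producer for exactly this). Topic, delay values and payloads are placeholders:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class BatchedPublisher {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://sample/standalone/ns1/session-init")  // placeholder
                .enableBatching(true)
                .batchingMaxPublishDelay(1, TimeUnit.MILLISECONDS)          // small grouping window
                .batchingMaxMessages(1000)
                .create();

        List<byte[]> payloads = Arrays.asList("m1".getBytes(), "m2".getBytes(), "m3".getBytes());

        // All but the last message go out asynchronously and are grouped into batches.
        for (int i = 0; i < payloads.size() - 1; i++) {
            producer.sendAsync(payloads.get(i));
        }
        // Per the discussion above, a blocking send() on the last message forces the
        // pending batch out instead of waiting for the max publish delay.
        producer.send(payloads.get(payloads.size() - 1));

        producer.close();
        client.close();
    }
}
```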

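Finally, a sketch of the publish side of the partitioned-topic dispatch Matteo proposed: all servers write to a single partitioned topic with the user id as the message key, so a given user's messages always hash to the same partition, and the dispatch function (run with a failover subscription) handles a stable subset of users. The topic name and key format are assumptions:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class DispatchPublisher {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // A single "dispatch" topic shared by all server instances,
        // created beforehand as a partitioned topic.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://sample/standalone/ns1/dispatch")  // placeholder
                .create();

        // The user id as message key: key-hash routing keeps each user on one
        // partition, so a failover-subscribed dispatch function instance sees a
        // stable subset of users.
        producer.newMessage()
                .key("user-42")
                .value("order accepted".getBytes())
                .send();

        producer.close();
        client.close();
    }
}
```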