Re: Stepping down
On 21/01/2018 at 16:24, Tomasz Sterna wrote:
> On Sun, 21.01.2018 at 15:01 +0100, Alexandre Jousset wrote:
>> [...] I don't know if I'm skilled enough, but instead of letting it die I would like to become the maintainer if nobody with better skills wants to :-)
> Judging by your contributions to jabberd2, I see no problem in passing the project to you.

Thanks for this :-) Of course, I suggest giving other people some time to volunteer if they wish. Some said they could host the jabberd2.org website or the mailing list, and that may help another maintainer, but if I become the new maintainer I think I could do that too. Note, however, that I only have access to Linux boxes, so I may need help from someone else with the ports to other OSes. That may be a problem.

>> BTW I was recently doing some load tests and thinking about solving the SPOF of the router process, [...]
> We already had a _lengthy_ discussion on the list about my vision of how to multiply the router: https://www.mail-archive.com/jabberd2@lists.xiaoka.com/msg01909.html Your work still lives in: https://github.com/jabberd2/jabberd2/tree/mesh

I know :-) My concern is that the work I've done there is complex and not thoroughly tested. That does not mean it should be abandoned, but maybe a shorter-term solution could be found just to remove the SPOF (without necessarily implementing the whole router mesh). And as I said at the time, I'm not at all happy with the routing graph discovery feature I implemented. I see 3 approaches here:
- a heartbeat / failover solution, for cases where the single router (which could be switched without disconnecting users) is not a bottleneck?
- a simplification of my work, to get a router mesh that is less dynamic but easier to implement? I haven't thought a lot about this yet, but I checked out the "mesh" branch again on my computer and I'm currently digging into it to see how to simplify it.
Any advice welcome :-)
- work fast and hard to simplify / finish the mesh branch code :-)

I also experimented with a "multi-router" setup, which works great, but it requires all the c2s's and all the sm's to be connected to all routers at the same time in order to work as expected. So it does not allow a failover setup, just a kind of multithread (actually multiprocess / multihost) setup.

> But my latest approach was to ditch the router component in favor of a message bus (using 0MQ). See discussion at https://gitter.im/jabberd2/jabberd2?at=56b8b4e9939ffd5d15f671e1 This is what the https://github.com/jabberd2/jabberd2/commits/ashnazg branch implemented and what the jabberd3 code (which was born of the ashnazg branch) was going for.

Just a question about this: you are talking about 0MQ, but the source points to nanomsg...? Anyway, I think it is indeed a better solution and it looks promising, and this is one of the reasons I think a shorter-term / simpler solution should be found for the router SPOF in j2. I mean, priority should be given to j3 (while of course continuing to maintain j2 "normally"). And about j3, I'm afraid I didn't fully understand what you said, I'm sorry. You said in your first message that you are going away from all XMPP work, and that you're opening the source of j3 (and another project), but you only say you're stepping down from the j2 lead. Just to be clear, is it the same for the other projects?

The rest of this post is about other topics I wanted to raise on the list. I started to implement a Redis storage backend based on the BDB one. It is still experimental, but I get good performance with it, especially with millions of users in my tests (see below). I'd like to know if people would be interested in it. I made the assumption (based on some comparison pages found on the web) that BDB doesn't scale well with a lot of users.
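To make the Redis backend idea concrete, here is a minimal sketch of one possible key layout for per-user storage "types". This is purely illustrative: the key naming and the `FakeRedis` stand-in are my own assumptions, not taken from the experimental branch (the real driver would be C against hiredis or similar).

```python
# Sketch (not the actual driver): one way a Redis storage backend
# could lay out jabberd2 storage "types" per user. Key names here
# are hypothetical.

class FakeRedis:
    """Tiny in-memory stand-in for a Redis hash API (HSET/HGETALL)."""
    def __init__(self):
        self.data = {}
    def hset(self, key, field, value):
        self.data.setdefault(key, {})[field] = value
    def hgetall(self, key):
        return dict(self.data.get(key, {}))

def storage_key(stype, owner):
    # e.g. "active:user@example.com" or "roster-items:user@example.com"
    return f"{stype}:{owner}"

r = FakeRedis()
r.hset(storage_key("vcard", "alice@example.com"), "fn", "Alice")
print(r.hgetall("vcard:alice@example.com"))  # {'fn': 'Alice'}
```

One key per (type, owner) pair keeps lookups O(1) and lets a cluster of Redis nodes shard by key, which is presumably where the scaling advantage over BDB would come from.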
I also started to run some load tests, using a home-made XMPP tester (I tried Tsung but I'm not at ease with it, nor with Erlang to customize it) on a "few-nodes" setup, and I managed to connect up to 5M simultaneous users with a simple scenario (connect, then send messages randomly from time to time, but without fetching rosters or presences for the moment). My questions to the list about this are the following:
- What is your biggest known j2 installation in terms of number of accounts and simultaneously connected users?
- Same question in a test environment?
- Do you know of an XMPP stress tester (apart from Tsung) that is able to connect millions of users? My searches only led me to Tsung in that category.

Thanks,
-- Alexandre (Midnite) Jousset
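As a back-of-envelope check on the 5M figure above: a single source IP can only hold about 64k simultaneous TCP connections to one server ip:port pair, so a tester at that scale needs many source addresses (or many server ports). The numbers below are illustrative arithmetic, not details of the actual test rig.

```python
# Rough capacity arithmetic for a millions-of-clients load test.
# Assumption: one server ip:port target, default ephemeral port range.

EPHEMERAL_PORTS = 65535 - 1024          # usable source ports per IP
TARGET_CLIENTS = 5_000_000              # the 5M figure from the tests

def source_ips_needed(clients, ports_per_ip=EPHEMERAL_PORTS):
    # ceiling division without floats
    return -(-clients // ports_per_ip)

print(source_ips_needed(TARGET_CLIENTS))   # 78
```

So a "few-nodes" setup reaching 5M clients implies dozens of source IPs (or listening ports) spread across the tester hosts.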
Re: Clustered Jabberd2
Hi Sylvain,

On 17/05/2013 at 13:41, Sylvain Guglielmi wrote:
> I've been reading the mailing-list archives (especially this thread: http://www.mail-archive.com/jabberd2@lists.xiaoka.com/msg01908.html ), and I think the discussed changes would suit my needs pretty well. The feature I like most is allowing multiple routers and Session Managers (SMs) on a single domain name, to make the jabberd2 service more resilient and easily scalable. I think that already works with multiple c2s instances (I'll try this in the upcoming days).
> - I think the branch made by Alexandre Jousset has not been merged into master. Am I right on this?

You're right. Unfortunately I had to stop working on this for a while, but I'll publish my (unfinished) changes ASAP and I may be able to finish them soon.

> - If not, is there any plan to do it, or to implement similar features? Also, when having multiple SMs, I guess there are two main ways to do it:
> - each SM only hosts the data of a subset of the users (if one SM goes down, some people will have issues);

This is the current way it is done, and the way (albeit different) it will be done in my implementation. The advantage in my implementation is that if one SM node is down, new users (including users that have been disconnected) will be able to connect using the other SMs.

> or
> - every user's session data is on every SM (more reliable, more storage required, and I guess harder to implement: you would have to guarantee that one user cannot open two sessions on different SMs at once, the admin would have to set up replication for the databases of each SM, etc.). Both views are valid I guess. I still don't get which approach was most favoured or was going to be tried.

This one would be tricky, I think...

> I'm still not very at ease with the jabberd2 code. I've just started writing my own highly customisable roster module to plug into already-existing roster databases. But I'd gladly help with that if needed.

Thanks.
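The "each SM hosts a subset of the users" model discussed above amounts to a stable mapping from bare JID to one SM instance, so that every component agrees on which SM owns a given user. A minimal sketch, with illustrative names (the real logic would live in the C router):

```python
import hashlib

# Stable hash of the bare JID picks one SM out of the configured list.
# Deterministic: same JID always maps to the same SM on every node.

def pick_sm(bare_jid, sms):
    digest = hashlib.sha1(bare_jid.encode()).digest()
    return sms[int.from_bytes(digest[:4], "big") % len(sms)]

sms = ["sm1.example.com", "sm2.example.com", "sm3.example.com"]
assert pick_sm("alice@example.com", sms) == pick_sm("alice@example.com", sms)
```

If an SM drops out of the list, the modulo reassigns its users to the survivors, which matches the property mentioned above: users of a dead SM can reconnect through the remaining ones.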
> @Alexandre Jousset: if you're still living near Paris, I would gladly invite you to discuss this over a drink. ^_^ Anyone else in the area?

Yes, I live near Paris and I work in Paris. I'm OK to have a drink sometime, somewhere, with you ;-) Contact me in private for this :-)
Router mesh, some news
Hi Tomasz, hi all,

FYI I'm still working (when possible, mainly in my spare time) on this feature, even if I'm quite silent. I would say that I'm at ~70% of the code modifications; the remaining 30% are the remote router management, i.e. connections / disconnections / reconnections / (un)binding propagation. For the moment there are 384 '+' and 269 '-' in the diffs ;-) When I have something that can be shown and that is a bit tested, I'll push it to my repo on GitHub and send a message here explaining what I've done.
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 18/10/2012 at 16:42, Tomasz Sterna wrote:
> On 2012-10-18, Thu at 16:12 +0200, Alexandre Jousset wrote:
>>> What if you do not manage all the routers in the mesh? And you were given a password to access only one or two routers of the mesh?
>> I think it is pretty unusual for the admin not to have access to all routers (at least all routers managing the same domains). I'm sure there could be cases and this would add a lot of flexibility, but see below for the drawbacks.
> I've been building collaborative mesh networks (ircd, eggdrop) a lot. Believe me, the situations where you have just one entry point to the network are not that rare. Besides, it goes along with the philosophy of jabberd2. router-users.xml is there for a reason. If it were assumed that one administrator controls the whole components network, there would be no need for separate users.
>> The problem with the multi-hop proposal is that you have to manage cases where there are cyclic connections, e.g. A = B = C = A.
> What exactly is the problem with cyclic connections?
>> A solution may be to add the ID of the component binding the domain / bare JID to the bound route, and to check whether that combination is already bound, but this will increase CPU usage and the data structure sizes.
> TTL/distance would be enough. This does not increase the data structures that much, and the CPU use is negligible - you have to choose the route anyway. Premature optimisation is the root of all evil - let's concentrate on the design first. Now that I think of it, implementing distance would be beneficial anyway, as we could mark routers on slow connections as less preferable.

For the moment I have 2 hash tables (I finally differentiated them): one for domains, where we don't really care about the size of the values, and one for bare JID bindings, where the value is just the component_t. This component_t can be the local component for local connections, or another router's connection for remote ones.
>> So we would have to add a char *, malloc'ed, strcpy'ed, etc., for each new connection in bare JID binding mode. So this would add CPU and memory consumption just for multi-hop support. It's a choice to make; if you really want me to do this I'll do it, but I'm a bit against that solution.
> I'm sorry, I don't understand. Give me a for-instance.

Forget about all this ;-) As I was writing a response to this post, I understood that my problem was indeed an implementation issue, and I found a solution for it ;-) Roughly: instead of storing a component_t as the bare JID hash table value, I'll store a pointer to an element of a pool of component_t / component ID / distance associations.
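The TTL/distance idea discussed above can be sketched as follows: a bind is re-advertised to neighbouring routers with distance+1 and dropped once it exceeds a maximum (or once a router already knows an equal-or-better route), which also breaks A = B = C = A cycles. Structures and the distance cap are my own illustrative assumptions, loosely following the discussion.

```python
# Toy model of distance-limited bind propagation in a router mesh.

MAX_DISTANCE = 4

class Router:
    def __init__(self, name):
        self.name = name
        self.neighbours = []
        self.routes = {}          # target -> (via, distance)

    def bind(self, target, via, distance=0):
        known = self.routes.get(target)
        if known is not None and known[1] <= distance:
            return                # equal/better route known: stop (breaks cycles)
        self.routes[target] = (via, distance)
        if distance + 1 <= MAX_DISTANCE:
            for n in self.neighbours:
                n.bind(target, via=self.name, distance=distance + 1)

a, b, c = Router("A"), Router("B"), Router("C")
a.neighbours, b.neighbours, c.neighbours = [b], [c], [a]   # a cycle
a.bind("example.com", via="local-sm")
print(c.routes["example.com"])    # ('B', 2): learned through the cycle, no loop
```

The distance value doubles as the "prefer closer routers" metric Tomasz mentions: when two routes to the same target exist, the lower distance wins.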
Re: jabberd2 in cluster? ideas, proof of concept and questions...
About this topic, I have some more comments and questions:

On 15/10/2012 at 02:22, Alexandre Jousset wrote:
> On 12/10/2012 at 19:53, Tomasz Sterna wrote:
>> We do. In the simplest way to do it, routers don't forward other routers' binding requests.
> Of course it is possible to implement it to allow multi-hops, but I'm afraid this could lead to problems (and inefficiency) for no real gain (except a simpler configuration). Of course it would be easier to list only one router of the mesh when adding a new one, but I would prefer sacrificing this ease in favor of efficiency. After all, the administrator has full knowledge of their server architecture. So when adding a router, the config file should list *all* other already-running routers in the mesh.
>> What if you do not manage all the routers in the mesh? And you were given a password to access only one or two routers of the mesh?
> I think it is pretty unusual for the admin not to have access to all routers (at least all routers managing the same domains). I'm sure there could be cases, and this would add a lot of flexibility, but see below for the drawbacks.
>> In my proposal nothing stops you from making each router know all the others to make it more efficient, but it shouldn't be _required_.
> The problem with the multi-hop proposal is that you have to manage cases where there are cyclic connections, e.g. A = B = C = A. A solution may be to add the ID of the component binding the domain / bare JID to the bound route, and to check whether that combination is already bound, but this would increase CPU usage and the data structure sizes.

For the moment I have 2 hash tables (I finally differentiated them): one for domains, where we don't really care about the size of the values, and one for bare JID bindings, where the value is just the component_t. This component_t can be the local component for local connections, or another router's connection for remote ones.
So we would have to add a char *, malloc'ed, strcpy'ed, etc., for each new connection in bare JID binding mode. This would add CPU and memory consumption just for multi-hop support. It's a choice to make; if you really want me to do this I'll do it, but I'm a bit against that solution. Of course, if you have a better solution... :-)
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 15/10/2012 at 10:03, Tomasz Sterna wrote:
> On 2012-10-15, Mon at 02:22 +0200, Alexandre Jousset wrote:
>> We talked earlier about weighted randomization instead of priorities. With weighted randomization it is impossible to be sure that a local component will be preferred; this is why I made an implicit priority for local components, still using weighted random between local components, or between remote components when needed.
> Right. But I still don't see a rationale: why are local components better than remote ones? Why should a local component be preferred just because the connection happened to come from the local c2s?

Going to a remote component involves going through the local router, then through the remote router, then to the remote component. It adds a hop plus a (physical) network access.

>> To do otherwise, we would need weighted random + priorities, which would add more complexity and misunderstanding to the configuration process.
> I was thinking more of a binary switch "prefer local components", rather than reintroducing priorities.

Ok.
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 15/10/2012 at 14:43, Tomasz Sterna wrote:
> On 2012-10-15, Mon at 12:15 +0200, Alexandre Jousset wrote:
>>> But I still don't see a rationale: why are local components better than remote ones? Why should a local component be preferred just because the connection happened to come from the local c2s?
>> Going to a remote component involves going through the local router, then through the remote router, then to the remote component. It adds a hop plus a (physical) network access.
> That's a technical detail that should not affect the load balancing algorithm. If I set up two components, one with weight 1 and one with weight 2, the second one should get twice as many requests as the first, regardless of where it is connected and where the requests are coming from.

You're right. I think both behaviors make sense, so I agree with you about adding a binary switch in the configuration to tell the router whether it should care about local vs. remote components or not. This would make, for example, the use of a (separate and protocol-agnostic) load balancer in front of the cluster more efficient. It's true that in that example the weighted randomization would be useless, but as the binary switch is easy to implement, letting the admin choose is better.
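The behaviour agreed on above can be sketched in a few lines: plain weighted random across all binds, with an optional "prefer local components" binary switch instead of priorities. The data layout is an illustrative assumption, not the router's actual structures.

```python
import random

# binds: list of (component_name, weight, is_local)

def choose_route(binds, prefer_local=False):
    if prefer_local:
        local = [b for b in binds if b[2]]
        if local:
            binds = local         # fall back to remote only if no local bind
    names = [b[0] for b in binds]
    weights = [b[1] for b in binds]
    return random.choices(names, weights=weights)[0]

binds = [("sm-local", 1, True), ("sm-remote", 2, False)]
# With the switch on, the local component always wins while it is up:
assert choose_route(binds, prefer_local=True) == "sm-local"
```

With the switch off, `sm-remote` gets roughly twice the traffic of `sm-local`, which is exactly the weight semantics Tomasz describes.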
Re: jabberd2 in cluster? ideas, proof of concept and questions...
I'm responding to this older message to ask a question:

On 11/09/2012 at 13:35, Tomasz Sterna wrote:
> [...] Components have their own names. Each component needs to be uniquely named.

Is it because components could previously have the same name that there is "switch(targets->rtype)" at router/router.c:502, along with the multi attribute, route_MULTI_TO and route_MULTI_FROM? If not, I have misunderstood something about the purpose of these...? If yes, I think I can remove that functionality completely...?
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 15/10/2012 at 19:38, Tomasz Sterna wrote:
> On 2012-10-15, Mon at 18:29 +0200, Alexandre Jousset wrote:
>> Is it because components could previously have the same name that there is "switch(targets->rtype)" at router/router.c:502, along with the multi attribute, route_MULTI_TO and route_MULTI_FROM? If not, I have misunderstood something about the purpose of these...?
> Take a look in the code at how these are set. :-) In short, they choose which attribute the router computes the hash on. In the case of jabberd2 component protocol connections we have route_MULTI_TO, which computes the hash of the stanza's 'to' attribute. In the case of legacy component connections we compute the hash of the 'from' attribute. Legacy connections are used to connect transports, so we need to make sure all your packets are directed to the same transport instance - hence the 'from' attribute (route_MULTI_FROM).

Ok, I was confused by the 'from' attribute deciding which component the packet is sent to... I understand now.

> jabberd2 component protocol connections are jabberd2 components like sm and s2s, and we need to make sure all stanzas sent to you are directed to the same sm instance - thus we compute the hash of the 'to' attribute to select the sm instance to route to. So, you may remove this support from jabberd2 protocol connections, but we need to keep the route_MULTI_FROM behavior for legacy component connections, as we cannot extend that protocol.

Ok, thanks for the explanation.
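The route_MULTI_TO / route_MULTI_FROM behaviour explained above boils down to hashing one stanza attribute to pick a stable instance among several components sharing a name. A minimal sketch; the hash function (crc32) and structures are illustrative, not jabberd2's actual ones.

```python
import zlib

# rtype selects which attribute the hash is computed on:
#   MULTI_TO   -> stanza 'to'   (jabberd2 component protocol, e.g. sm)
#   MULTI_FROM -> stanza 'from' (legacy transports)

def pick_instance(stanza, instances, rtype):
    attr = stanza["to"] if rtype == "MULTI_TO" else stanza["from"]
    return instances[zlib.crc32(attr.encode()) % len(instances)]

sms = ["sm-instance-1", "sm-instance-2"]
stanza = {"to": "alice@example.com", "from": "bob@legacy.example.org"}
# Same 'to' always lands on the same sm instance:
assert pick_instance(stanza, sms, "MULTI_TO") == pick_instance(stanza, sms, "MULTI_TO")
```

Hashing 'to' keeps one user's inbound stanzas on one sm; hashing 'from' keeps one user's traffic on one transport instance, which is why the legacy behaviour must stay.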
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 12/10/2012 at 19:53, Tomasz Sterna wrote:
> We do. In the simplest way to do it, routers don't forward other routers' binding requests.
>> Of course it is possible to implement it to allow multi-hops, but I'm afraid this could lead to problems (and inefficiency) for no real gain (except a simpler configuration). Of course it would be easier to list only one router of the mesh when adding a new one, but I would prefer sacrificing this ease in favor of efficiency. After all, the administrator has full knowledge of their server architecture. So when adding a router, the config file should list *all* other already-running routers in the mesh.
> What if you do not manage all the routers in the mesh? And you were given a password to access only one or two routers of the mesh? In my proposal nothing stops you from making each router know all the others to make it more efficient, but it shouldn't be _required_.

Ok.

>> Also, in the pseudo-code I've written (and started to implement) I had to make a distinction between local components and remote routers, just for efficiency, to allow a local component to be used preferably before trying a remote one. So local components have greater priorities than remote ones, and both are chosen with weighted random within their category. What do you think about this?
> Explicit is better than implicit. If you want local components to have higher priority - just say so in the configuration file. But the default should be that remote binds are equal to local ones.

We talked earlier about weighted randomization instead of priorities. With weighted randomization it is impossible to be sure that a local component will be preferred; this is why I made an implicit priority for local components, still using weighted random between local components, or between remote components when needed. To do otherwise, we would need weighted random + priorities, which would add more complexity and misunderstanding to the configuration process.
But maybe I've misunderstood something?

>> Finally, I've added a routers.xml file (with a final 's'; the name can be changed, of course) to allow reloading it dynamically to change connection settings if needed. What do you think about this? Do you think it could be necessary?
> Seems reasonable and simple. remote-routers.xml maybe?

Ok.
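For illustration, a remote-routers.xml along the lines discussed could look something like this. The element names are pure guesses on my part (no such schema exists in the source); the point is just that each entry names a peer router plus its reconnection interval, and that the file can be copied verbatim to every host and reloaded on a signal.

```xml
<!-- Hypothetical schema, for discussion only. -->
<remote-routers>
  <router>
    <ip>192.0.2.10</ip>
    <port>5347</port>
    <retry>5</retry>   <!-- seconds between reconnection attempts -->
  </router>
  <router>
    <ip>192.0.2.11</ip>
    <port>5347</port>
    <retry>5</retry>
  </router>
</remote-routers>
```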
Re: Working around client bugs in server software
On 12/10/2012 at 10:42, Tomasz Sterna wrote:
> There is a SMACK-324 [1] bug affecting a lot of Java client applications (including most Android clients). It would be trivial to work around it in the jabberd2 codebase, but it just doesn't feel right. From a practical point of view: there is a trivial fix we can apply - let's just do it and make our users happy. And the current jabberd2 development philosophy is a stable server that just works. But there is a danger of:
> - never fixing the original issue if we stop exposing it
> - jabberd becoming an unmaintainable bag of patches for problems not in jabberd
> I would like to hear your opinions on how we should approach such issues.
> [1] http://issues.igniterealtime.org/browse/SMACK-324

According to this link the bug is marked as resolved and fixed. Do you think clients haven't upgraded and still carry the bug?
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 17/09/2012 at 17:50, Tomasz Sterna wrote:
>> This case could be resolved if the router auto-binds the user@domain route. There could still be problems if there is more than one router. But that case (2+ routers auto-binding user@domain at the same time) could be fixed by the conflict resolution we discussed before, just by canceling all the binds for user@domain...? What do you think?
> Brilliant idea! Works for me.

Thanks :-)

> I would just make it temporary and extend it to all routing levels. Whenever the router makes a (random) decision to choose one of several equal binds to route to, it sticks to this decision for a predefined time.

Ok. I'm going to change my pseudo-code according to our latest decisions, and when everything is OK I'll start to write real code. I'll keep you informed when there is something interesting and / or more questions.
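The "stick to a random decision for a predefined time" rule above can be sketched as a small cache keyed by routing target. The 60-second window and all names are illustrative assumptions.

```python
import random
import time

STICKY_SECONDS = 60

class StickyBalancer:
    def __init__(self):
        self.cache = {}                    # target -> (choice, expiry)

    def route(self, target, candidates, now=None):
        now = time.monotonic() if now is None else now
        cached = self.cache.get(target)
        if cached and cached[1] > now:
            return cached[0]               # still sticky: repeat the choice
        choice = random.choice(candidates) # re-roll and remember
        self.cache[target] = (choice, now + STICKY_SECONDS)
        return choice

b = StickyBalancer()
first = b.route("example.com", ["sm1", "sm2"], now=0.0)
# Within the window the decision is stable:
assert b.route("example.com", ["sm1", "sm2"], now=30.0) == first
```

Making the decision expire (rather than permanent) is what lets the mesh recover when the chosen bind goes away, which seems to be the point of Tomasz's "temporary" qualifier.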
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 19/09/2012 at 00:19, Tomasz Sterna wrote:
> On 2012-09-18, Tue at 21:50 +0200, Alexandre Jousset wrote:
>> About routing levels and the user@domain binding... With this solution there's no more domain-only level at the beginning, so each SM should bind bare JIDs and domains directly (still with auto-binding). Moreover, as the same SM will manage all sessions of user@domain, there is no full-JID binding level either... This simplifies binding and routing (and all the changes needed to the code), as we only have to maintain 2 hash tables (preferably): one for domains (with multiple routes/priorities) and one for bare JIDs (with only 1 route and no priority; everything would be managed by the SM)...
> Do we need priorities in the case of domain binds? This would cause the sm with the highest priority to take all the load. Maybe weights instead of priorities. This would help tune the load balancing.

I thought that was what you meant previously, sorry. But you're right, ok for this.

>> To sum it up, it is the end of adaptive routing... And this has an implication: the router's memory consumption will be higher than before, with the single-router architecture... So wouldn't it be a good solution to have an --enable-multi-router option at ./configure time? It would add the burden of maintaining both binding/routing solutions (hopefully only in the router), but it would make the changes invisible to currently deployed servers.
> Not really. You need bare-JID level binds only when there is more than one component handling the domain. If there is only one, you can do domain-based routing without the need for user@domain binding. This approach does not use more memory than the current solution.

I was thinking about multiple SMs handling one domain. Maybe it would be a good thing to state this (only 1 SM per domain to use less memory) in the install / config guide after the changes.

> I won't accept an #ifdef'd implementation. Even dynamic modules make releases difficult. Having switchable code is even more of a PITA.
> There were some screw-ups with the libsubst and mio implementations in the past.

Ok again :-) Sorry for insisting ;-)

Now I have a question about the router interconnections. In my proof of concept, each router had the IPs/ports of every other router (including itself, just to be able to copy/paste this part of the config file), and each one connected to the others in a client-server way, trying to reconnect roughly every X seconds (with a customizable X) in case of error / lost connection. The client router component bound the domains it managed on the server, so when the server wanted to send a packet it used that bind information to route. Each router was at the same time a client and a server, receiving packets from other routers on its client side and routing outgoing packets on its server side. This architecture is far from perfect of course, but it has the advantage of being symmetric, which avoids problems about which router has to connect to which, among other issues. How do you see these interconnections? How should they be configured? What happens (and how) when one adds or removes a router/host from the cluster? I'm thinking about something like editing the router.xml file on each host, or a specific routers.xml containing only this information that could be copied verbatim to each host, plus sending a SIGsomething to reload it, but if you have a better idea you're welcome :-)
Re: jabberd2 in cluster? ideas, proof of concept and questions...
About this, and after reading/writing other posts in other parts of this thread:

On 17/09/2012 at 17:50, Tomasz Sterna wrote:
> I would just make it temporary and extend it to all routing levels. Whenever the router makes a (random) decision to choose one of several equal binds to route to, it sticks to this decision for a predefined time.

I don't see any need to stick with a (now weighted) random decision other than in the user@domain auto-bind case. According to the pseudo-code I've just written, there are 3 cases where the router makes a random decision:
1) to=ad...@example.com (with or without resource) or to=example.com, and no example.com domain bound: weighted random on the default routes (whether we accept multiple default routes or not, and how, is another question)
2) to=ad...@example.com (with or without resource), no ad...@example.com bare JID bound, and more than one component accepting example.com: weighted random on the example.com routes + auto-bind ad...@example.com to the chosen route
3) to=example.com and more than one component accepting example.com: weighted random on the example.com routes
Do you see any other case?
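The three cases above can be sketched as one decision function. The routing tables are plain dicts here and every name is an illustrative pseudo-structure, not the real router's hash tables.

```python
import random

def weighted(routes):
    # routes: list of (component, weight)
    return random.choices([r for r, _ in routes], [w for _, w in routes])[0]

def route(to_bare, to_domain, domains, bare_jids, default_routes):
    # a bound bare JID needs no random decision at all
    if to_bare and to_bare in bare_jids:
        return bare_jids[to_bare], None
    routes = domains.get(to_domain)
    if routes is None:
        # case 1: unknown domain -> weighted random on default routes
        return weighted(default_routes), "case1"
    if to_bare:
        # case 2: weighted random on domain routes + auto-bind the bare JID
        choice = weighted(routes)
        bare_jids[to_bare] = choice
        return choice, "case2"
    # case 3: stanza addressed to the domain itself
    return weighted(routes), "case3"

domains = {"example.com": [("sm1", 1), ("sm2", 1)]}
bare_jids = {}
comp, case = route("admin@example.com", "example.com", domains, bare_jids, [("def", 1)])
assert case == "case2" and bare_jids["admin@example.com"] == comp
```

After the case-2 auto-bind, subsequent stanzas for the same bare JID hit the first branch and are routed deterministically, which is the whole point of the scheme.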
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 17/09/2012 at 10:05, Tomasz Sterna wrote:
> On 2012-09-16, Sun at 23:06 +0200, Alexandre Jousset wrote:
>> Err... Sorry again, but for the delivery case I've found this: http://xmpp.org/rfcs/rfc3921.html#rules (see 11.1.4.1 for messages)... This page is what I saw before posting my solution (I remember now). Anyway, it covers messages, which can be delivered to *all* resources with the same priority without ACK problems, and there are other treatments (see the same URL) for other types of stanzas. Am I still wrong and mixing things up, or...? ;-)
> I still don't see how this might work. Could you give an example protocol flow?

The question was: "But then - what happens if two resources of the same priority get connected to two different sm instances?" After reading the link I posted in my previous message, I don't see what the problem with this could be...? The router should send the messages to both SMs where the resources with the same priority are bound; they will know what to do with them.
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 17/09/2012 at 11:02, Tomasz Sterna wrote:
> On 2012-09-17, Mon at 10:55 +0200, Alexandre Jousset wrote:
>> The question was: "But then - what happens if two resources of the same priority get connected to two different sm instances?" After reading the link I posted in my previous message, I don't see what the problem with this could be...? The router should send the messages to both SMs where the resources with the same priority are bound; they will know what to do with them.
> What about iq and/or presence?
> - an iq-get would get two responses then
> - a presence-subscribe could get both an accept and a deny; which one to use?

This would cause problems only if the stanza is sent to user@domain. And in that case, the link I posted (RFC 3921) says that for IQs the server should reply with an error on behalf of the user. And "For presence stanzas other than those of type probe, the server MUST deliver the stanza to all available resources"; I suppose that in the latter case the response includes the full JID...?
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 17/09/2012 at 13:10, Tomasz Sterna wrote:
> Again. There is a 'u...@example.com/foo' resource with priority 1 bound on sm1. There is a 'u...@example.com/bar' resource with priority 1 bound on sm2.
> 1. There is an incoming iq-get request for the u...@example.com vCard.
> - it is being sent to sm1 and sm2
> - sm1 and sm2 answer on behalf of the user
> - the querying user gets two responses

I see a possibility for this, but it looks hackish...: the router looks into the stanzas when there is more than one possible recipient component (in the user@domain case). If it is an IQ, it generates the error itself. Or it passes the stanza to one of the components (at random), which will generate the error message... I'm not happy with these ideas, though... I'm still trying to think of a better solution.

> 2. Presence case - you're right. Presence packets are replicated to all resources, so we're good here.

Ok.
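The delivery rules being debated here (following RFC 3921's bare-JID handling) can be modeled as a tiny dispatch table: messages and presence fan out to every SM holding a same-priority resource, while a bare-JID iq gets exactly one server-generated reply, avoiding the duplicate-response problem. This is a simplified model of the discussion, not jabberd2's actual dispatch code.

```python
def dispatch(stanza_type, sms):
    """Decide what to do with a bare-JID stanza matching resources on
    several SMs. Returns a list of (action, sm) decisions."""
    if stanza_type in ("message", "presence"):
        return [("deliver", sm) for sm in sms]       # replicate to all SMs
    if stanza_type == "iq":
        return [("error-on-behalf-of-user", None)]   # exactly one reply
    raise ValueError(stanza_type)

print(dispatch("message", ["sm1", "sm2"]))
print(dispatch("iq", ["sm1", "sm2"]))
```

The open question in the thread is *who* produces the single iq reply: the router itself (which must then inspect stanzas), or one SM picked at random.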
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 14/09/2012 at 16:08, Tomasz Sterna wrote:
> Let's say that we won't allow several SM instances to handle resources of the same user. How? This needs a tiny modification of the C2S/SM protocol. Instead of sending the user session creation request to the user's domain, let's send it to the user's bare JID. This way, if there is no session for u...@example.com bound on the example.com SMs, the router will revert to routing by domain and will pick one SM at random. Then this SM will bind the 'u...@example.com' name, and all subsequent user sessions will be created with this SM. So there will be no need for communication between SMs. (There is a possibility of a race - the router handling several session creation requests and pushing them to several random SMs before the first one binds the user's bare JID.)

This case could be resolved if the router auto-binds the user@domain route. There could still be problems if there is more than one router. But that case (2+ routers auto-binding user@domain at the same time) could be fixed by the conflict resolution we discussed before, just by canceling all the binds for user@domain...? What do you think?
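The C2S/SM protocol tweak Tomasz describes can be sketched as follows: session creation is addressed to the bare JID; on a miss, the router falls back to domain-level routing, picks an SM at random, and records the resulting bare-JID bind. Structures are illustrative only.

```python
import random

def create_session(bare_jid, domain, bare_binds, domain_binds):
    sm = bare_binds.get(bare_jid)
    if sm is None:
        sm = random.choice(domain_binds[domain])   # fall back to domain level
        bare_binds[bare_jid] = sm                  # the chosen SM binds the bare JID
    return sm

bare_binds = {}
domain_binds = {"example.com": ["sm1", "sm2"]}
first = create_session("user@example.com", "example.com", bare_binds, domain_binds)
# Every later session of the same user lands on the same SM:
assert create_session("user@example.com", "example.com", bare_binds, domain_binds) == first
```

The race in Tomasz's parenthesis is the window between the `random.choice` fallback and the bind becoming visible: two concurrent requests for the same user can each take the fallback path and pick different SMs.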
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 14/09/2012 at 21:17, Tomasz Sterna wrote:
> There is nothing in XMPP about delivering to the most recent resource. I would like to stick to the specification :-)

Sorry, I mixed up the notions of binding and delivering :-(
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 16/09/2012 at 21:54, Alexandre Jousset wrote:
> On 14/09/2012 at 21:17, Tomasz Sterna wrote:
>> There is nothing in XMPP about delivering to the most recent resource. I would like to stick to the specification :-)
> Sorry, I mixed up the notions of binding and delivering :-(

Err... Sorry again, but for the delivery case I've found this: http://xmpp.org/rfcs/rfc3921.html#rules (see 11.1.4.1 for messages)... This page is what I saw before posting my solution (I remember now). Anyway, it covers messages, which can be delivered to *all* resources with the same priority without ACK problems, and there are other treatments (see the same URL) for other types of stanzas. Am I still wrong and mixing things up, or...? ;-)
Re: jabberd2 in cluster? ideas, proof of concept and questions...
Hi, On 13/09/2012 16:15, James Wilson wrote: I've been watching this discussion unfold and thought I might contribute. Thanks for your contribution. Personally, I have not run a jabberd2 instance in a long time, but this question below: On 13/09/2012, at 11:35 PM, Tomasz Sterna wrote: On 2012-09-13, Thu at 15:07 +0200, Alexandre Jousset wrote: But then - what happens if two resources of the same priority get connected to two different sm instances? *This* was my real question ;-) I don't have an answer. I will have to think about it. leads me to believe that they should act like so: [...] I don't know if this would work in practice, but this is one way I see the issue of the above question being resolved. I don't know what Tomasz thinks about this, but I think it is quite complicated for just that case. I was thinking about another idea: AFAIK the protocol says that in that case the message should either be duplicated, and we have seen previously that this may lead to problems (IQs, ACKs), or sent to one of the recipients based on the implementation's choice. Maybe we could just record the time when each session is started, add this information to each related bind request, keep that information at the router level in the hash table values' structure, and use it in that case. The message would then be delivered to the recipient of the most recently started session. ...?
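The timestamp idea above could be sketched like this (Python, illustrative only; `pick_recipient` and the bind tuple layout are assumptions, not actual jabberd2 structures):

```python
def pick_recipient(binds):
    """binds: list of (component, priority, session_start_time) tuples,
    one per bind of the target bare JID.
    Keep only the highest-priority binds, then break the tie by
    delivering to the most recently started session."""
    top = max(priority for _, priority, _ in binds)
    candidates = [b for b in binds if b[1] == top]
    return max(candidates, key=lambda b: b[2])[0]
```

The router would fill the start-time field from the value the SM sends along with each bind request, so no inter-SM communication is needed to break the tie.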
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 14/09/2012 16:08, Tomasz Sterna wrote: [...] (There is a possibility of a race - the router handling several session creation requests and pushing them to several random SMs before the first one binds the user's bare JID.) This race condition has, in theory, a small probability of happening, but I actually see some cases where it is possible (e.g. a c2s crash and / or restart, network problems...), especially with a lot of users and auto-reconnecting clients. What exactly will happen if this race condition occurs? I think it may also be complicated to recover from. You haven't given your opinion about my idea...? Its advantage is that no such race condition could occur. The only drawback is that the SM needs to keep track of the session start time (which may already be the case, I haven't checked) and to send it with each bind related to that session. In any case the SM has to be patched at least to implement the new binding algorithm, so this addition would be just a small change. What do you think?
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 13/09/2012 00:05, Tomasz Sterna wrote: [...] Looks simple. Too simple? ;-) It's never too simple :-) I think that, as you said before, the current implementation was designed open enough to be adapted, and that will greatly simplify the coding of these new features. In real life the incoming part of the split would get disabled parts already handled on the accepting part, and all disconnected sessions would have to cope. Err... I'm not sure I understand this one. Sorry if my English and / or understanding is too poor :-/ BTW I have another question. AFAIK routing to a domain only happens from c2s to sm when a user connects. Then the sm answers with the domain in the from part and gives its ID for further communication. So, after this moment, c2s knows to which component it should send messages for that user session. My question is: is this the only case where routing to a domain is needed? If yes, then for domain routing (e.g. when to is example.com) one should only route to one of the bound components serving that domain, maybe randomly. If no, ...? That is not the same as routing to u...@example.com without a resource, because in that case we said we should duplicate the message to all components bound to this user (whatever their resources).
Re: jabberd2 in cluster? ideas, proof of concept and questions...
On 13/09/2012 14:57, Tomasz Sterna wrote: On 2012-09-13, Thu at 13:45 +0200, Alexandre Jousset wrote: AFAIK routing to a domain only happens from c2s to sm when a user connects. Then the sm answers with the domain in the from part and gives its ID for further communication. So, after this moment, c2s knows to which component it should send messages for that user session. My question is: is this the only case where routing to a domain is needed? The other case is communicating with the jabber server (sm) itself. You can disco the domain to see the server features. You can xmpp-ping the domain; you can even get the server presence (it has some resources answering). If yes, then for domain routing (e.g. when to is example.com) one should only route to one of the bound components serving that domain, maybe randomly. If no, ...? Not randomly. To the highest priority bind. Yes, sorry, I meant randomly between components bound with the same priority. But what happens if there are many binds of the same priority? You cannot route randomly, as messages need to go to all highest-priority resources. You cannot route to all, as iq requests would get a response many times. See above. That is not the same as routing to u...@example.com without a resource, because in that case we said we should duplicate the message to all components bound to this user (whatever their resources). Not all. I suggested that the priority of the bare-JID bind should be equal to the highest priority resource connected to the sm. Yes, sorry again, I meant taking priority into account. But then - what happens if two resources of the same priority get connected to two different sm instances? *This* was my real question ;-)
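The dispatch rules Tomasz describes (messages to all highest-priority binds, an iq to exactly one of them) could be sketched as follows (Python, illustrative only; `route_stanza` is a hypothetical name, not a jabberd2 function):

```python
import random

def route_stanza(kind, binds):
    """binds: list of (component, priority) tuples for the target JID.
    Messages fan out to every highest-priority bind (duplicating a
    message is safe); an iq goes to exactly one of them, since a
    duplicated iq would produce multiple responses."""
    top = max(priority for _, priority in binds)
    targets = [component for component, priority in binds if priority == top]
    if kind == "message":
        return targets
    return [random.choice(targets)]
```

This makes the unresolved question concrete: when equal-priority resources live on different SMs, the message branch must reach every one of those SMs, which is exactly what a single bare-JID bind cannot express.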
Re: jabberd2 in cluster? ideas, proof of concept and questions...
Hello, On 03/09/2012 18:10, Tomasz Sterna wrote: I didn't get to designing the routing exchange protocol yet. Building a working implementation of adaptive binding of components should shed some light on what needs to be exchanged. Ok. I started to look at the process you sent earlier, and I could start to think about the tree structure just by following the process. That led me to some questions about it: 1) A minor one: is it a typo when you wrote example.org instead of example.com in some places? 2) Is it OK to assume that all non-full JIDs are of priority 0 (or, better said, have no priority at all)? 3) I have a doubt when you say in different places that a component needs to bind bare JIDs / full JIDs from now on. Does that mean that it needs to send bind requests for all sessions it already has, or just make *new* bind requests of the type you mentioned? 4) In this process, what if component1 disconnects? I suppose that the router needs to crawl through its tree to update it, but that can be CPU intensive and can cause lags... I know that this event is not a normal event anyway and is not supposed to happen often, so it may be negligible. 5) A Jabber protocol question; I know I could find the answer in the online docs, but as I have you at hand ;-) Is it possible to have 2 identical full JIDs connected at the same time? With equal or different priorities? I suppose the answer is no to both questions, but just to be sure... 6) The tree structure I'm thinking about uses hashes to find the next node (root → domain → user → resource), and finally a pointer or an array of pointers to components for the leaves. If the answer to question 5) is no, there is only one case where there can be more than one leaf: at each node there is a default route leading to one or more components. If I use a statically sized array to store them, that will use more memory. So the best would be to use a linked list, but that would make the lookup slower.
I would tend to use linked lists because the cases where one has to crawl through them are (relatively) less frequent. What do you think? That's it for the questions. If you need, I could draw a diagram of the tree structure I'm thinking about to make things clearer; just tell me.
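The tree described in question 6) might look like this (Python used for brevity instead of C; `RouteNode` and `lookup` are hypothetical names). Each level is a hash keyed by the next JID part, and every node carries a default route used when the lookup stops there:

```python
class RouteNode:
    """One level of the routing tree: root, domain, user or resource.
    children maps the next JID part to the node below; default holds
    the component(s) to use when the lookup stops at this node."""
    def __init__(self):
        self.children = {}
        self.default = []  # list (or fixed-size array) of components

def lookup(root, domain, user=None, resource=None):
    """Walk root -> domain -> user -> resource as far as the JID and
    the tree allow, and return the deepest default route found."""
    node = root.children.get(domain)
    if node is None:
        return root.default
    for part in (user, resource):
        if part is None or part not in node.children:
            return node.default
        node = node.children[part]
    return node.default
```

With this shape, the array-vs-linked-list question only concerns the `default` field, since every other lookup step is a hash access.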
Re: jabberd2 in cluster? ideas, proof of concept and questions...
Hm, I'm answering myself, after thinking some more ;-) On 05/09/2012 21:04, Alexandre Jousset wrote: 1) A minor one: is it a typo when you wrote example.org instead of example.com in some places? Obviously, yes. 2) Is it OK to assume that all non-full JIDs are of priority 0 (or, better said, have no priority at all)? Obviously, yes. 3) I have a doubt when you say in different places that a component needs to bind bare JIDs / full JIDs from now on. Does that mean that it needs to send bind requests for all sessions it already has, or just make *new* bind requests of the type you mentioned? I think the component has to send all its already online sessions. And I think this can be a hint for router synchronisation in a multi-router implementation. In a multi-router implementation, why not consider a router as a more or less normal component (when connected to other routers)? With one exception: when a random choice has to be made between components, only the local components have a chance. I have to think more about this... 4) In this process, what if component1 disconnects? I suppose that the router needs to crawl through its tree to update it, but that can be CPU intensive and can cause lags... I know that this event is not a normal event anyway and is not supposed to happen often, so it may be negligible. 5) A Jabber protocol question; I know I could find the answer in the online docs, but as I have you at hand ;-) Is it possible to have 2 identical full JIDs connected at the same time? With equal or different priorities? I suppose the answer is no to both questions, but just to be sure... Obviously, no. 6) The tree structure I'm thinking about uses hashes to find the next node (root → domain → user → resource), and finally a pointer or an array of pointers to components for the leaves.
If the answer to question 5) is no, there is only one case where there can be more than one leaf: at each node there is a default route leading to one or more components. If I use a statically sized array to store them, that will use more memory. So the best would be to use a linked list, but that would make the lookup slower. I would tend to use linked lists because the cases where one has to crawl through them are (relatively) less frequent. What do you think? Silly me: the only case where there is a choice between multiple components is the domain case. So I can use an array with a reasonable, configurable-at-compile-time size.
Re: jabberd2 in cluster? ideas, proof of concept and questions...
Hi Tomasz, Thanks for your answer. I'll study your message in detail when I have time. I think I'll be able to work on this topic over the weekend. Regards,