Re: coping with lost connectivity (was: mesh portal discovery)
On Sat, 12 Jan 2008 at 12:00:02 -0500, Benjamin M. Schwartz wrote:
> This is precisely what I am saying. Telepathy should only register a disconnect if there is no way to route between two XOs. The mesh system should be designed so that moving about within the mesh, or handing off between Salut and Gabble, or switching from one internet-connected wireless network to another, does not cause a Telepathy disconnect.

For the record: Telepathy, which is a standard API used on OLPC and elsewhere, does not and will not do this. A Telepathy Connection object represents a connection (e.g. a Gabble connection represents a TCP connection to the server) and we're not going to mangle the API to behave otherwise.

You're right that a possible solution for activities' networking would be for some lower layer to paper over the cracks. Telepathy is the wrong layer to be doing this, but the Presence Service could do it, or so could a library in the sugar.* hierarchy.

However, each activity is fundamentally going to need a way to sync its state on initial connection; for at least the medium term, as Sjoerd said, we propose that the same mechanism be used to resync after connectivity loss. If we have enough developer time to be able to work on library code for activities' networking, it's likely that the API will involve an activity-supplied "resync" callback that's called when initially joining an activity, and when connectivity is regained after a connection loss.

> In each of these cases, the path between XOs remains routable, with a gap of at most a few seconds.

I'm not convinced this is even technically feasible, given the constraints of the quality of the underlying network (if the packets aren't arriving, there's nothing we can do about it). It's certainly a much lower priority right now than making the servers scale properly.

Many of Sjoerd's bugfixes to Salut have been to do with detecting and signalling loss of connectivity: not in the sense of "I lost my IP address", but in the sense of "I'm sending packets to Fred and he's not acknowledging any of them, so either he's not receiving them or I'm not receiving the acknowledgements". No amount of programming will fix packets just not turning up, so the only improvements we can bring here are by tweaking the trade-off of bandwidth use vs timely error recovery.

This suggests that however much we can improve the API, activities will still have to be able to deal with situations where other users fall off the network, in a more or less graceful way. If you can do that, then you can use the same mechanisms to resync after connectivity changes, in the way that Sjoerd suggests.

We can probably never make it entirely transparent, because however quick it becomes to reconnect after connectivity loss, you can't guarantee that you haven't lost messages; and in a message-passing system, as soon as you can't make that guarantee, the only way back to a consistent state is to ask someone else what's going on, which is exactly the resync I'm talking about.

Simon
___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
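Editorial sketch: the "resync callback" idea in the message above could look roughly like the following. This is not an existing Sugar, Presence Service or Telepathy API; `SharedSession`, `Counter` and the shape of the `peers` argument are all hypothetical, chosen only to show that the initial join and a later reconnect can share one code path.

```python
class SharedSession:
    """Hypothetical helper: runs one activity-supplied resync callback
    both on the initial join and whenever connectivity comes back."""

    def __init__(self, resync_callback):
        self._resync = resync_callback
        self.connected = False
        self.resync_count = 0

    def on_joined(self, peers):
        # Initial join: we know nothing yet, so ask for the full state.
        self.connected = True
        self._run_resync(peers)

    def on_connection_lost(self):
        # From here on we cannot know which messages we missed.
        self.connected = False

    def on_connection_regained(self, peers):
        # Same code path as the initial join.
        self.connected = True
        self._run_resync(peers)

    def _run_resync(self, peers):
        self.resync_count += 1
        self._resync(peers)


class Counter:
    """Toy activity state (an assumption for illustration): a shared counter."""

    def __init__(self):
        self.value = 0

    def resync(self, peers):
        # "Ask someone else what's going on": adopt the highest value seen.
        self.value = max([self.value] + [peer["value"] for peer in peers])
```

An activity would hand its own `resync` method to the helper, for example `SharedSession(Counter().resync)`, and would not need to distinguish a first join from a recovery.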
Re: mesh portal discovery
On Thu, Jan 10, 2008 at 02:29:15PM -0500, Benjamin M. Schwartz wrote:
> Morgan Collett wrote:
> > We'll add some API to PresenceService and sugar.presence, and put some signal into Sugar similar to the buddy-left signal to indicate you were disconnected, and ensure that the activity gets back into an unshared state. If we find the shared activity ID in presence we can attempt to rejoin, handling switching of one IP address to another without changing from gabble to salut (or vice versa). Then Activities will only need to hook the disconnected signal to clean up state, if that is necessary.
>
> This is not an acceptable long-term solution. It means that whenever people are moving about the mesh in a multi-server environment, their activities will spontaneously disconnect, and possibly reconnect some time later. This is hugely disruptive to activities with shared state, continuous action, or just about any other collaboration style. From the users' perspective, it will inevitably render the sharing system unreliable and frustrating.

Activities need to cope with people coming and going anyway. If you're in a mesh-only environment, the mesh can be split into two or more parts at any point and later on merge again. Salut will model that as people disconnecting and later on connecting again; your application _must_ be able to synchronize the shared state if needed in some way.

> As an example, consider two students using Distance (my activity) to measure out a distance of 15 meters. As they move apart, one of them switches into the domain of another mesh-portal, and its IP address changes. It suddenly drops out of the activity, and measurement stops. Some time later, it may rejoin automatically, at which point the students will have to reinitiate measurement manually. If IP address switches are triggered automatically, and silently, then they must be handled automatically, and silently.

That's mostly up to the application. Telepathy shouldn't hide the fact that we're not actually connected anymore, and applications should do something useful with that info. A better long-term solution would probably be to use mobile IP, so you don't get disconnected when switching between networks.

Sjoerd
--
A transistor protected by a fast-acting fuse will protect the fuse by blowing first.
Re: mesh portal discovery
Sjoerd Simons wrote:
> Activities need to cope with people coming and going anyway. If you're in a mesh-only environment, the mesh can be split into two or more parts at any point and later on merge again. Salut will model that as people disconnecting and later on connecting again; your application _must_ be able to synchronize the shared state if needed in some way.

"cope" is exactly the right word. It is simply impossible, in many cases, to handle a mesh split without disruption. For example, any code that creates a distributed lock will fail if the group splits and rejoins. If your Activity uses this common structure, then there will inevitably be a major discontinuity when the group rejoins.

I understand that mesh splits are inevitable, but every effort should be undertaken to minimize their frequency.

> > If IP address switches are triggered automatically, and silently, then they must be handled automatically, and silently.
>
> That's mostly up to the application. Telepathy shouldn't hide the fact that we're not actually connected anymore, and applications should do something useful with that info. A better long-term solution would probably be to use mobile IP, so you don't get disconnected when switching between networks.

This is precisely what I am saying. Telepathy should only register a disconnect if there is no way to route between two XOs. The mesh system should be designed so that moving about within the mesh, or handing off between Salut and Gabble, or switching from one internet-connected wireless network to another, does not cause a Telepathy disconnect. In each of these cases, the path between XOs remains routable, with a gap of at most a few seconds.

I understand that this is not easy, and that it will not be implemented immediately, but we should not profess ourselves satisfied with anything less reliable.
Re: mesh portal discovery
On Sat, Jan 12, 2008 at 12:00:02PM -0500, Benjamin M. Schwartz wrote:
> Sjoerd Simons wrote:
> > Activities need to cope with people coming and going anyway. If you're in a mesh-only environment, the mesh can be split into two or more parts at any point and later on merge again. Salut will model that as people disconnecting and later on connecting again; your application _must_ be able to synchronize the shared state if needed in some way.
>
> "cope" is exactly the right word. It is simply impossible, in many cases, to handle a mesh split without disruption. For example, any code that creates a distributed lock will fail if the group splits and rejoins. If your Activity uses this common structure, then there will inevitably be a major discontinuity when the group rejoins.

Distributed locks aren't a good design to use in activities even without the splitting problem; they're usually very sensitive to latency, which is something you shouldn't assume to be low in any case. Especially in case of temporary network issues, it can easily take multiple minutes for a message to arrive at one or all of your nodes.

> I understand that mesh splits are inevitable, but every effort should be undertaken to minimize their frequency.

Of course. And we do try to minimise it. But your application really should be designed to cope with splitting and merging, because it will happen in the field. I would recommend everyone take this as a basic requirement when designing their protocols. And don't hesitate to ask us (the Collabora people) for advice in specific cases.

> > > If IP address switches are triggered automatically, and silently, then they must be handled automatically, and silently.
> >
> > That's mostly up to the application. Telepathy shouldn't hide the fact that we're not actually connected anymore, and applications should do something useful with that info. A better long-term solution would probably be to use mobile IP, so you don't get disconnected when switching between networks.
>
> This is precisely what I am saying. Telepathy should only register a disconnect if there is no way to route between two XOs. The mesh system should be designed so that moving about within the mesh, or handing off between Salut and Gabble, or switching from one internet-connected wireless network to another, does not cause a Telepathy disconnect. In each of these cases, the path between XOs remains routable, with a gap of at most a few seconds. I understand that this is not easy, and that it will not be implemented immediately, but we should not profess ourselves satisfied with anything less reliable.

Well. The simplest way to do that is to have some nice helper classes that can reconnect and reshare your activity if you get disconnected. But it is still up to the application to recover from this nicely, which is basically recovering from seeing everyone split away from you and come back again. Hmmm, that does look a lot like the mesh-splitup issue we mentioned earlier on.. :)

Sjoerd
--
The price of success in philosophy is triviality. -- C. Glymour.
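Editorial sketch: one way to design shared state that copes with "splitting and merging" without a distributed lock is to make the state itself mergeable. The example below uses a last-writer-wins map, which is a technique swapped in here for illustration and not something proposed in the thread; the entry layout `(stamp, author, value)` and all names are assumptions.

```python
def merge(a, b):
    """Merge two {key: (stamp, author, value)} maps.

    For each key the entry with the larger (stamp, author) pair wins, so
    the result does not depend on merge order or on how many times the
    same data is merged. `stamp` is assumed to be a logical counter that
    each writer increments; `author` only breaks ties.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


# Two halves of a split mesh edit independently...
left = {"title": (2, "ann", "Ann's title")}
right = {"title": (2, "bob", "Bob's title"), "colour": (1, "bob", "red")}

# ...and both sides converge on the same state once the mesh rejoins.
rejoined = merge(left, right)
```

The price is that one of two concurrent edits is silently discarded; whether that is acceptable depends on the activity.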
Re: mesh portal discovery
D'oh. Still recovering from a wicked cold.

2008-01-11T16:49:06 Bennett Todd:
> If so, you could use them to assign the lower 24 bits of a 10/24 addr until you've shipped 16 million XOs,

That should have read 10/8. Sorry.

-Bennett
Re: mesh portal discovery
On Wed, 09 Jan 2008 at 22:17:18 -0500, John Watlington wrote:
> We have a presence service which provides a way for P2P applications to find one another, even after the IP changes.

Presence Service isn't magical. If a laptop's IP address changes, in the link-local backend (Salut) this will most likely appear as a disconnect + reconnect (and the user will leave all shared activities they were currently in). This is somewhat unavoidable, but if it's a hard requirement that Salut do its best to survive IP addresses changing, file a bug against telepathy-salut.

In the server-based backend, an IP address change *will* cause a disconnect and reconnect. This is definitely unavoidable, since XMPP uses a long-lived TCP connection to the server.

Simon
Re: mesh portal discovery
Simon McVittie wrote:
> On Wed, 09 Jan 2008 at 22:17:18 -0500, John Watlington wrote:
> > We have a presence service which provides a way for P2P applications to find one another, even after the IP changes.
>
> Presence Service isn't magical. If a laptop's IP address changes, in the link-local backend (Salut) this will most likely appear as a disconnect + reconnect (and the user will leave all shared activities they were currently in). This is somewhat unavoidable, but if it's a hard requirement that Salut do its best to survive IP addresses changing, file a bug against telepathy-salut.
>
> In the server-based backend, an IP address change *will* cause a disconnect and reconnect. This is definitely unavoidable, since XMPP uses a long-lived TCP connection to the server.

As mentioned in #5620, activities aren't aware of the dropped connection, and still show "shared" in the sharing combobox. We don't yet have a (standard) way for activities to detect the disconnection and handle it gracefully.

So "the user will leave all shared activities" means the activities keep running with no indication to the user that a disconnection occurred, except that sharing stops working...

Morgan
Re: mesh portal discovery
On Thu, 10 Jan 2008, [EMAIL PROTECTED] wrote:

for #2 the basic approach is the same as LVS uses in tunneling mode. see http://www.linuxvirtualserver.org/VS-IPTunneling.html for a diagram and explanation.

This is basically what I was suggesting earlier: don't worry about the outbound traffic, just bounce the inbound traffic to the closest node (via a tunnel) before sending it over the air.

this could be a matter of using the existing LVS code and replacing the server selection logic with something that is aware of the wireless topology.

to avoid a routing loop where the packet gets bounced back and forth between MPP boxes, you should be able to set things up so that the load balancing is only done on packets coming in from the outside (I don't know if iptables can do this stock, but it should be a simple, if ugly, hack to make packets arriving through a tunnel bypass the LVS code and get inserted just past it in the IP stack)

the worst case with this model should be that some inbound packets get relayed to the wrong MPP and make more hops than they need to over the air.

another thought that hit me. you have a mesh routing daemon (I don't know if it's in kernel space or user space) to decide how to get the packets to the target laptop over the mesh. what if this routing daemon is told about tunnels to other MPP nodes and treats them like one radio hop for the routing decision? the result should be that if the laptop is closer to another MPP node, the inbound packet will go over the wire until it is as close to the laptop as possible.

David Lang
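Editorial sketch: the last idea above, treating a wired tunnel between MPP nodes as "one radio hop" in the routing decision, can be illustrated with a plain breadth-first search. This is a toy model under stated assumptions, not the actual mesh routing daemon; the topology and every node name (`mpp_a`, `xo1`, and so on) are made up.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Breadth-first search over undirected links.

    Every link counts as one hop, whether it is a radio link or a
    tunnel between two mesh portals (MPPs).
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal not reachable


# Hypothetical topology: xo3 is three radio hops from mpp_a, one from mpp_b.
radio = [("mpp_a", "xo1"), ("xo1", "xo2"), ("xo2", "xo3"), ("mpp_b", "xo3")]
tunnel = [("mpp_a", "mpp_b")]

radio_only = shortest_path(radio, "mpp_a", "xo3")
with_tunnel = shortest_path(radio + tunnel, "mpp_a", "xo3")
```

With the tunnel included, an inbound packet arriving at `mpp_a` crosses the wire to `mpp_b` and makes a single radio hop instead of three.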
Re: mesh portal discovery
On Thu, 2008-01-10 at 09:00 +, Simon McVittie wrote:
> On Wed, 09 Jan 2008 at 22:17:18 -0500, John Watlington wrote:
> > We have a presence service which provides a way for P2P applications to find one another, even after the IP changes.
>
> Presence Service isn't magical. If a laptop's IP address changes, in the link-local backend (Salut) this will most likely appear as a disconnect + reconnect (and the user will leave all shared activities they were currently in). This is somewhat unavoidable, but if it's a hard requirement that Salut do its best to survive IP addresses changing, file a bug against telepathy-salut.
>
> In the server-based backend, an IP address change *will* cause a disconnect and reconnect. This is definitely unavoidable, since XMPP uses a long-lived TCP connection to the server.

IP addresses are going to change; that's a fact of life. The best anyone can do is try to not make an IP address change a traumatic experience for the user, and provide mechanisms to ensure that whatever the user was working on at the time doesn't just disappear in a puff of smoke.

Dan
Re: mesh portal discovery
On Thu, 10 Jan 2008, Dan Williams wrote:
> > In the server-based backend, an IP address change *will* cause a disconnect and reconnect. This is definitely unavoidable, since XMPP uses a long-lived TCP connection to the server.
>
> IP addresses are going to change; that's a fact of life. The best anyone can do is try to not make an IP address change a traumatic experience for the user, and provide mechanisms to ensure that whatever the user was working on at the time doesn't just disappear in a puff of smoke.

this means changing every app to be aware of IP changes so that they know that they need to re-connect to the far end. and for many apps, modifying them to be able to pick up where they left off (and to do so in a secure way so that bad guys can't claim to be you on a new IP address and connect into an authenticated session)

good luck in re-writing the world.

now, if you are willing to throw away all existing software (and solve the reconnect security problems) you may be able to make it work, but there are no apps that work this way today that I am aware of.

David Lang
Re: mesh portal discovery
On Fri, 2008-01-11 at 00:09 +, [EMAIL PROTECTED] wrote:
> On Thu, 10 Jan 2008, Dan Williams wrote:
> > > In the server-based backend, an IP address change *will* cause a disconnect and reconnect. This is definitely unavoidable, since XMPP uses a long-lived TCP connection to the server.
> >
> > IP addresses are going to change; that's a fact of life. The best anyone can do is try to not make an IP address change a traumatic experience for the user, and provide mechanisms to ensure that whatever the user was working on at the time doesn't just disappear in a puff of smoke.
>
> this means changing every app to be aware of IP changes so that they know that they need to re-connect to the far end. and for many apps, modifying them to be able to pick up where they left off (and to do so in a secure way so that bad guys can't claim to be you on a new IP address and connect into an authenticated session)
>
> good luck in re-writing the world.
>
> now, if you are willing to throw away all existing software (and solve the reconnect security problems) you may be able to make it work, but there are no apps that work this way today that I am aware of.

The world changed underneath the apps, but the apps weren't modified to handle it. It's not 1997 anymore. People no longer only use desktop workstations with static IP addresses. Laptops are everywhere. You don't keep the same IP address when you walk from Starbucks to Panera.

Mobile IP may mostly solve this; and that's fine. But until then, the apps are going to suck if they don't handle address changes, which are simply a fact of life.

It's not that hard to write an app that notices and handles IP address changes. Not handling this in apps that are written for or ported to the XO is just plain laziness. When porting or writing, you need to handle the always-fullscreen-window case, you need to handle the security system, and you need to be aware of IP address changes.

Welcome to 2008.

Dan
Re: mesh portal discovery
On Thu, 10 Jan 2008, Dan Williams wrote:
> On Fri, 2008-01-11 at 00:09 +, [EMAIL PROTECTED] wrote:
> > On Thu, 10 Jan 2008, Dan Williams wrote:
> > > > In the server-based backend, an IP address change *will* cause a disconnect and reconnect. This is definitely unavoidable, since XMPP uses a long-lived TCP connection to the server.
> > >
> > > IP addresses are going to change; that's a fact of life. The best anyone can do is try to not make an IP address change a traumatic experience for the user, and provide mechanisms to ensure that whatever the user was working on at the time doesn't just disappear in a puff of smoke.
> >
> > this means changing every app to be aware of IP changes so that they know that they need to re-connect to the far end. and for many apps, modifying them to be able to pick up where they left off (and to do so in a secure way so that bad guys can't claim to be you on a new IP address and connect into an authenticated session)
> >
> > good luck in re-writing the world.
> >
> > now, if you are willing to throw away all existing software (and solve the reconnect security problems) you may be able to make it work, but there are no apps that work this way today that I am aware of.
>
> The world changed underneath the apps, but the apps weren't modified to handle it. It's not 1997 anymore. People no longer only use desktop workstations with static IP addresses. Laptops are everywhere. You don't keep the same IP address when you walk from Starbucks to Panera.

but you don't continue to use your laptop as you walk from starbucks to panera. you close your laptop at starbucks, walk to panera and open it again. or starbucks and panera are part of the same network, so you don't actually change addresses as you move between them.

and when you suspend and resume there are going to be apps that quit on you.

> Mobile IP may mostly solve this; and that's fine. But until then, the apps are going to suck if they don't handle address changes, which are simply a fact of life.
>
> It's not that hard to write an app that notices and handles IP address changes. Not handling this in apps that are written for or ported to the XO is just plain laziness. When porting or writing, you need to handle the always-fullscreen-window case, you need to handle the security system, and you need to be aware of IP address changes.

you have to modify both the client and the server to survive the changes. you can't just modify the client when you port it to the XO.

> Welcome to 2008.

but even the XO apps lose the connection to their peers and require manual actions to re-establish them when they change their IP address.

you say 'welcome to 2008', I say none of your software works the way you claim it does.

you are probably thinking of web-based things, and HTTP is designed so that every request-response pair can be a separate TCP connection (with state held via other means). that will survive IP changes (although even there they will lose any transactions in flight and require them to be manually restarted, including large transfers)

there are very few (if any) applications that use long-term connections that will handle IP changes (frankly, most of them won't handle their connection being interrupted at all)

if you think that I am wrong and there are lots of apps that use long-term connections and recover from IP changes, please provide examples.

David Lang
Re: mesh portal discovery
Dan Williams wrote:
> It's not that hard to write an app that notices and handles IP address changes. Not handling this in apps that are written for or ported to the XO is just plain laziness. When porting or writing, you need to handle the always-fullscreen-window case, you need to handle the security system, and you need to be aware of IP address changes.

No and yes. I agree that this is the desired behavior, but it cannot be handled by individual activities. Correctly designed activities aren't even aware that they are operating over an IP network. Once Telepathy's streaming media support is in, there will be almost no excuse to have the other participant's IP address in your code, ever.

Telepathy must handle these network topology changes seamlessly, invisibly, and entirely behind the abstraction barrier. The routing system must be designed to make this possible. I know nothing about routing, but if a participant's IP address is about to change, perhaps the change should be broadcast over the network, so that Telepathy knows who to hand off the connection to.

--Ben
Re: mesh portal discovery
[EMAIL PROTECTED] wrote:
> On Thu, 10 Jan 2008, Dan Williams wrote:
> > > In the server-based backend, an IP address change *will* cause a disconnect and reconnect. This is definitely unavoidable, since XMPP uses a long-lived TCP connection to the server.
> >
> > IP addresses are going to change; that's a fact of life. The best anyone can do is try to not make an IP address change a traumatic experience for the user, and provide mechanisms to ensure that whatever the user was working on at the time doesn't just disappear in a puff of smoke.
>
> this means changing every app to be aware of IP changes so that they know that they need to re-connect to the far end. and for many apps, modifying them to be able to pick up where they left off (and to do so in a secure way so that bad guys can't claim to be you on a new IP address and connect into an authenticated session)
>
> good luck in re-writing the world.

We'll add some API to PresenceService and sugar.presence, and put some signal into Sugar similar to the buddy-left signal to indicate you were disconnected, and ensure that the activity gets back into an unshared state.

If we find the shared activity ID in presence we can attempt to rejoin, handling switching of one IP address to another without changing from gabble to salut (or vice versa).

Then Activities will only need to hook the disconnected signal to clean up state, if that is necessary.

/handwave

Morgan
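Editorial sketch: the flow proposed above (signal the disconnect, drop back to an unshared state, rejoin if the shared activity ID is still visible in presence) could be modelled like this. None of these names exist in PresenceService or sugar.presence; `ActivitySharing`, `connect_disconnected` and `handle_disconnect` are hypothetical stand-ins for API that the message only promises.

```python
class ActivitySharing:
    """Hypothetical model of one activity's sharing state."""

    def __init__(self, activity_id):
        self.activity_id = activity_id
        self.shared = False
        self._disconnected_handlers = []

    def connect_disconnected(self, handler):
        # Activities hook this only if they need to clean up state.
        self._disconnected_handlers.append(handler)

    def join(self):
        self.shared = True

    def handle_disconnect(self, visible_activity_ids):
        """Called by the (imagined) presence layer after a connection loss.

        `visible_activity_ids` stands for the shared activities that
        presence can see once the connection has been re-established.
        """
        self.shared = False                      # back to the unshared state
        for handler in self._disconnected_handlers:
            handler()                            # let the activity clean up
        if self.activity_id in visible_activity_ids:
            self.join()                          # attempt to rejoin
        return self.shared
```

Whether the rejoin leaves the activity in a consistent state is still the activity's problem; this only covers the signalling.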
Re: mesh portal discovery
> IP addresses are going to change; that's a fact of life.

> I know nothing about routing, but if a participant's IP address is about to change, perhaps the change should be broadcast over the network, so that Telepathy knows who to handoff the connection to.

To re-ground this discussion, if two mesh portals appear on the network, at different IP addresses, a laptop can continue to use the old one for its existing connections, yet switch its primary address to a new (better) one for new connections.

IPv6 includes host-based tools for making IP address changes easier. In particular, it requires the kernel to be able to process several global IP addresses for a given hardware interface. The latest is marked "preferred", the rest are marked "deprecated". When creating new connections, it normally uses the preferred address. But communication over all of the addresses continues to work (as long as the network outside the kernel has connectivity at that address). Linux implements all of this for IPv6. I don't know if the Linux kernel can do the same for IPv4, but it would be a natural extension.

Some applications care what IP address they are using; bind (DNS) in particular watches for interfaces to go up or down, or to change. If Telepathy wants to do the same, yet there is no low-overhead way to do it, then another natural extension would be to extend inotify (or raw sockets, or some other kernel mechanism) to report such changes. This would avoid polling for them.

As long as the previous mesh portal continues to work for a short while, there should be no need for nonstandard mechanisms to let applications know that the IP address is *about* to change. Instead they will naturally find out after it *does* change.

John Gilmore
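Editorial sketch: on the question of a low-overhead way to learn about address changes without polling, Linux rtnetlink sockets can deliver `RTM_NEWADDR` and `RTM_DELADDR` messages to subscribers. The sketch below is an illustration under that assumption, not something proposed in the thread; the function names are made up, and the numeric constants are copied from the kernel's rtnetlink headers.

```python
import socket
import struct

NLMSG_HDR = struct.Struct("=IHHII")   # struct nlmsghdr: len, type, flags, seq, pid
IFADDRMSG = struct.Struct("=BBBBI")   # struct ifaddrmsg: family, prefixlen, flags, scope, index
RTM_NEWADDR = 20
RTM_DELADDR = 21
RTMGRP_IPV4_IFADDR = 0x10
RTMGRP_IPV6_IFADDR = 0x100


def open_address_monitor():
    """Open a netlink socket subscribed to address changes (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR))
    return sock


def parse_address_events(buf):
    """Return (kind, address_family, interface_index) for each address
    message in a buffer received from the monitor socket."""
    events = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size or offset + length > len(buf):
            break  # truncated or malformed message
        if (msg_type in (RTM_NEWADDR, RTM_DELADDR)
                and length >= NLMSG_HDR.size + IFADDRMSG.size):
            family, _prefixlen, _aflags, _scope, index = IFADDRMSG.unpack_from(
                buf, offset + NLMSG_HDR.size)
            kind = "added" if msg_type == RTM_NEWADDR else "removed"
            events.append((kind, family, index))
        offset += (length + 3) & ~3  # netlink messages are 4-byte aligned
    return events
```

A caller would block in `sock.recv(65535)` on the socket returned by `open_address_monitor()` and pass each buffer to `parse_address_events`, so it only wakes up when an address actually changes.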
Re: mesh portal discovery
On Thu, 10 Jan 2008, John Gilmore wrote:
> > IP addresses are going to change; that's a fact of life.
>
> > I know nothing about routing, but if a participant's IP address is about to change, perhaps the change should be broadcast over the network, so that Telepathy knows who to handoff the connection to.
>
> To re-ground this discussion, if two mesh portals appear on the network, at different IP addresses, a laptop can continue to use the old one for its existing connections, yet switch its primary address to a new (better) one for new connections.

why is it that the laptop needs to switch IP addresses?

is it that the new portal won't talk to the old IP address?

or is it that outbound traffic could go out either portal, but inbound traffic would still go to the old portal and make more hops over the radio than is necessary?

or something else?

> IPv6 includes host-based tools for making IP address changes easier. In particular, it requires the kernel to be able to process several global IP addresses for a given hardware interface. The latest is marked "preferred", the rest are marked "deprecated". When creating new connections, it normally uses the preferred address. But communication over all of the addresses continues to work (as long as the network outside the kernel has connectivity at that address). Linux implements all of this for IPv6. I don't know if the Linux kernel can do the same for IPv4, but it would be a natural extension.
>
> Some applications care what IP address they are using; bind (DNS) in particular watches for interfaces to go up or down, or to change. If Telepathy wants to do the same, yet there is no low-overhead way to do it, then another natural extension would be to extend inotify (or raw sockets, or some other kernel mechanism) to report such changes. This would avoid polling for them.

but is it really the right thing to try and do this on the laptops (in the OS and all the software), or should we do it in the portal boxes instead?

> As long as the previous mesh portal continues to work for a short while, there should be no need for nonstandard mechanisms to let applications know that the IP address is *about* to change. Instead they will naturally find out after it *does* change.

this gets back to my question about exactly why the change is a problem.

David Lang
mesh portal discovery
Right now we have a problem with mesh portal discovery. The DHCP procedure currently being used only discovers the nearest mesh portal when it is first run (DHCP_DISCOVER), not when it tries to renew (DHCP_REQUEST). Furthermore, as the address previously assigned indicates which mesh portal was selected, it seems like we should always be discovering, not renewing...

There are larger issues which will probably need a day of discussion later surrounding IPv6 deployment, such as cooperation between RADVD and mesh portal discovery... (Please defer discussion on this right now)

wad
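Editorial sketch: the problem described above can be shown with a toy model. This is not the XO's DHCP client; `MeshClient`, the portal names and the hop counts are invented, and the model assumes only what the message states: a discover picks the nearest portal, while a renewal keeps whichever portal issued the lease.

```python
class MeshClient:
    """Toy model of a laptop choosing a mesh portal via DHCP."""

    def __init__(self):
        self.portal = None

    def discover(self, hops_to_portal):
        # DHCP_DISCOVER: the nearest portal answers, so we end up with it.
        self.portal = min(hops_to_portal, key=hops_to_portal.get)
        return self.portal

    def renew(self, hops_to_portal):
        # DHCP_REQUEST (renewal): we keep the portal that issued the lease,
        # however far away it now is. Only if it has vanished do we
        # fall back to discovering again.
        if self.portal not in hops_to_portal:
            return self.discover(hops_to_portal)
        return self.portal


client = MeshClient()
first = client.discover({"portal_a": 1, "portal_b": 4})

# The laptop moves: portal_b is now much closer than portal_a.
after_move = {"portal_a": 5, "portal_b": 1}
renewed = client.renew(after_move)
rediscovered = client.discover(after_move)
```

In this model the renewal leaves the laptop on the distant portal and only a fresh discover moves it to the nearer one, which is the behaviour behind the suggestion to always discover.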
Re: mesh portal discovery
John Watlington [EMAIL PROTECTED] wrote on 01/09/2008 11:34:29 AM:
> Right now we have a problem with mesh portal discovery. The DHCP procedure currently being used only discovers the nearest mesh portal when it is first run (DHCP_DISCOVER), not when it tries to renew (DHCP_REQUEST). Furthermore, as the address previously assigned indicates which mesh portal was selected, it seems like we should always be discovering, not renewing...

This is the expected behavior since the special anycast address is only used during discovery.

> There are larger issues which will probably need a day of discussion later surrounding IPv6 deployment, such as cooperation between RADVD and mesh portal discovery... (Please defer discussion on this right now)

The largest issue is how wrong, ugly and painful it is to use DHCP on a mesh network. Because of RADV, IPv6 doesn't have that issue.

The original mesh portal discovery method was proprietary but also extremely lightweight, and did what it was supposed to do with minimal code.

Using DHCP is the absolute ugliest hack that I have ever encountered: you can't legally have more than one server per layer-2 network to begin with; it makes the address configuration inconsistent (one method for a school server mesh and a different one for a non-school-server mesh); and, to add insult to injury, it forces the use of a DHCP server process, using several megabytes of RAM in every laptop just to distribute name server and gateway IP addresses, and it has effectively broken Internet sharing via the mesh for several months now. I am not even mentioning the unnecessary broadcasts forced by the fact that you have to have pretty short leases given the dynamic character of the network.

We put a lot of effort into putting anycast address support in the mesh, specifically to address the need to select the optimal path to a specific service in the path discovery process itself.

We ended up with the DHCP monstrosity just so that we don't use anything new in what is in effect a new way of doing local area networking.

M.
Re: mesh portal discovery
On Jan 9, 2008 11:21 AM, Michail Bletsas [EMAIL PROTECTED] wrote: ... The largest issue is how wrong, ugly and painful it is to use DHCP on a mesh network. Because of RADV, IPv6 doesn't have that issue. The original mesh portal discovery method was proprietary but also extremely lightweight and did what it was supposed to do with minimal code. ... I'm late to the party. How/who was it proprietary?
Re: mesh portal discovery
On Wed, 2008-01-09 at 12:18 -0600, Charles Durrett wrote: On Jan 9, 2008 11:21 AM, Michail Bletsas [EMAIL PROTECTED] wrote: ... The largest issue is how wrong, ugly and painful it is to use DHCP on a mesh network. Because of RADV, IPv6 doesn't have that issue. The original mesh portal discovery method was proprietary but also extremely lightweight and did what it was supposed to do with minimal code. ... I'm late to the party. How/who was it proprietary? The original method used UDP packets with a custom format. I wouldn't say proprietary so much as non-standard, since the code that created the UDP packets was open for all to see. Dan
Re: mesh portal discovery
On Wed, 2008-01-09 at 11:34 -0500, John Watlington wrote: Right now we have a problem with mesh portal discovery. The DHCP procedure currently being used only discovers the nearest mesh portal when it is first run (DHCP_DISCOVER), not when it tries to renew (DHCP_REQUEST). Furthermore, as the address previously assigned indicates which mesh portal was selected, it seems like we should always be discovering, not renewing... Legacy IP doesn't work well and doesn't really give us what we need in the long term... or even the medium term. We've known that all along. What do you propose to do about it? Throw away pointless engineering into cobbling together some way of making Legacy IP work a bit better? I seriously hope not. Just switch off the Legacy IP, as we should have done months ago, and get on with making things work properly. Anything else is a distraction. -- dwmw2 ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
On Jan 9, 2008, at 2:08 PM, David Woodhouse wrote: On Wed, 2008-01-09 at 11:34 -0500, John Watlington wrote: Right now we have a problem with mesh portal discovery. The DHCP procedure currently being used only discovers the nearest mesh portal when it is first run (DHCP_DISCOVER), not when it tries to renew (DHCP_REQUEST). Furthermore, as the address previously assigned indicates which mesh portal was selected, it seems like we should always be discovering, not renewing... Legacy IP doesn't work well and doesn't really give us what we need in the long term... or even the medium term. We've known that all along. What do you propose to do about it? Throw away pointless engineering into cobbling together some way of making Legacy IP work a bit better? I seriously hope not. Just switch off the Legacy IP, as we should have done months ago, and get on with making things work properly. Anything else is a distraction. Unsolicited RAs for IPv6 mean that IPv6 isn't the panacea to this problem. It's easy to discover the shortest way out of the mesh (nearest mesh portal), but setting up the larger mesh network infrastructure means you also need to provide a way to route packets back INTO the mesh through the MPP nearest the destination laptop. I have yet to see a good description of how to make IPv6 work right on a mesh with multiple portals. One would be welcome! I have such a method for IPv4 defined, but due to an error in modifying the DHCP client, it doesn't handle laptops moving around in the mesh once they've chosen an MPP. (BTW, the error was mine) wad
Re: mesh portal discovery
On Wed, 2008-01-09 at 14:33 -0500, John Watlington wrote: Unsolicited RAs for IPv6 mean that IPv6 isn't the panacea to this problem. It's easy to discover the shortest way out of the mesh (nearest mesh portal), but setting up the larger mesh network infrastructure means you also need to provide a way to route packets back INTO the mesh through the MPP nearest the destination laptop. I have yet to see a good description of how to make IPv6 work right on a mesh with multiple portals. One would be welcome! I talked to cscott and Michail about this briefly when I was in Boston in December. I suspect we should turn off the automatic response to RA in the kernel, and handle it in userspace. We need some special handling in userspace anyway, to pick up DNS server details from RA. We can also check the mesh path length to the origin of each RA we see, and choose the best one. I have such a method for IPv4 defined, but due to an error in modifying the DHCP client, it doesn't handle laptops moving around in the mesh once they've chosen an MPP. (BTW, the error was mine) Is there a hack which would work around that -- like reducing the lease time? -- dwmw2
Re: mesh portal discovery
What do you propose to do about it? Throw away pointless engineering into cobbling together some way of making Legacy IP work a bit better? I seriously hope not. Just switch off the Legacy IP, as we should have done months ago, and get on with making things work properly. Anything else is a distraction. You definitely live in a universe different from mine. Regardless of how much we try to make the XO to only talk to other XOs at the p2p application level, there is this small thingy out there called the web which is going to require Legacy IP for the foreseeable future... M. ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
On Jan 9, 2008, at 2:40 PM, David Woodhouse wrote: On Wed, 2008-01-09 at 14:33 -0500, John Watlington wrote: Unsolicited RAs for IPv6 mean that IPv6 isn't the panacea to this problem. It's easy to discover the shortest way out of the mesh (nearest mesh portal), but setting up the larger mesh network infrastructure means you also need to provide a way to route packets back INTO the mesh through the MPP nearest the destination laptop. I have yet to see a good description of how to make IPv6 work right on a mesh with multiple portals. One would be welcome! I talked to cscott and Michail about this briefly when I was in Boston in December. I suspect we should turn off the automatic response to RA in the kernel, and handle it in userspace. We need some special handling in userspace anyway, to pick up DNS server details from RA. We can also check the mesh path length to the origin of each RA we see, and choose the best one. Sounds like a plan. Sometime next week we should start outlining the work that needs to happen. I have such a method for IPv4 defined, but due to an error in modifying the DHCP client, it doesn't handle laptops moving around in the mesh once they've chosen an MPP. (BTW, the error was mine) Is there a hack which would work around that -- like reducing the lease time? Heh. JG wants to increase the lease time --- I want to reduce it. It doesn't really make a difference, as once the lease time has expired dhclient first tries to request the existing lease from the previous DHCP server. As long as it can communicate with it by hopping through the mesh, it will renew the existing lease and never discover a closer MPP/DHCP server. This was the problem that prompted my original message on this thread. wad
Re: mesh portal discovery
David Woodhouse [EMAIL PROTECTED] wrote on 01/09/2008 02:40:46 PM: We can also check the mesh path length to the origin of each RA we see, and choose the best one. The way this was originally implemented, in a way that can be used for any well-defined service (not just network gateways), was to assign an anycast MAC address to each such service. So when a node wants to see whether another node in the mesh provides such a service, all it has to do is a path discovery for the MAC address corresponding to that service. If the path discovery is successful, both the presence of the service and the optimal path to it have been discovered. In the case of the mesh portal (a NAT Internet gateway in our case) we need to get back the IP address of the gateway as well as DNS info. A simple Python server listening at a predefined port was providing that. That simple server has been replaced by a complete DHCP server in our current implementation. M.
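For readers trying to picture the "simple Python server" mentioned above, here is a minimal sketch of what such a responder could look like. It is not the original code: the port number, the JSON payload and the function names are all assumptions made for illustration, since the thread only says that the original used UDP packets with a custom format on a predefined port.

```python
import json
import socket

# Hypothetical value: the thread does not give the real port or packet format.
DISCOVERY_PORT = 16900


def encode_reply(gateway_ip, dns_servers):
    """Pack the gateway address and DNS servers into one UDP payload."""
    reply = {"gateway": gateway_ip, "dns": list(dns_servers)}
    return json.dumps(reply).encode("utf-8")


def decode_reply(payload):
    """Inverse of encode_reply; returns (gateway_ip, dns_servers)."""
    info = json.loads(payload.decode("utf-8"))
    return info["gateway"], info["dns"]


def serve_once(sock, gateway_ip, dns_servers):
    """Answer one discovery request on an already-bound UDP socket."""
    _request, client = sock.recvfrom(512)
    sock.sendto(encode_reply(gateway_ip, dns_servers), client)


def make_server_socket(bind_addr="0.0.0.0", port=DISCOVERY_PORT):
    """Create the UDP socket a portal would listen on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock
```

A portal would call make_server_socket() once and then serve_once() in a loop. The point of the sketch is only that the information handed back (gateway plus DNS) is small, which is the contrast being drawn with running a full DHCP server.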
Re: mesh portal discovery
On Jan 9, 2008, at 2:55 PM, Michail Bletsas wrote: David Woodhouse [EMAIL PROTECTED] wrote on 01/09/2008 02:40:46 PM: We can also check the mesh path length to the origin of each RA we see, and choose the best one. The way this was originally implemented in a way that can be used for any well defined service (not just network gateways), was to assign an anycast MAC address to such well defined services. So when a node is looking to see if there is another node providing such a service in the mesh, all that it has to do is a path discovery for the MAC address corresponding to that service. If the path discovery is successful, both the presence of the service as well as the optimal path to it has been discovered. In the case of the mesh portal (A NAT Internet Gateway in our case) we need to get back the IP address of the gateway as well as DNS info. A simple python server listening at a predefined port was providing that. That simple server has been replaced by a complete DHCP server in our current implementation. The DHCP server was needed anyway. And to implement shortest path routing both for sent and received packets, we needed a mechanism for receiving an IP address that reflected the nearest MPP anyway (or use NAT, something we would like to avoid inside the school) wad ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
On Wed, 2008-01-09 at 14:43 -0500, Michail Bletsas wrote: What do you propose to do about it? Throw away pointless engineering into cobbling together some way of making Legacy IP work a bit better? I seriously hope not. Just switch off the Legacy IP, as we should have done months ago, and get on with making things work properly. Anything else is a distraction. You definitely live in a universe different from mine. Regardless of how much we try to make the XO to only talk to other XOs at the p2p application level, there is this small thingy out there called the web which is going to require Legacy IP for the foreseeable future... NAT-PT and proxying should solve that problem relatively simply. I should investigate the implementation at http://tomicki.net/naptd.php -- dwmw2 ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
The DHCP server was needed anyway. And to implement shortest path routing both for sent and received packets, we needed a mechanism for receiving an IP address that reflected the nearest MPP anyway (or use NAT, something we would like to avoid inside the school) I completely fail to see why we need the DHCP server to get the IP address of the nearest MPP or get the optimal path to and from it. As for the multiple radio scenario at the schools, you can address that in two different ways: bridging between the interfaces on the server or, if you want something quicker, using a different autoIP range for each channel. M.
Re: mesh portal discovery
David Woodhouse [EMAIL PROTECTED] wrote on 01/09/2008 02:57:50 PM: NAT-PT and proxying should solve that problem relatively simply. I should investigate the implementation at http://tomicki.net/naptd.php Running application proxies on every XO that wants to act as a mesh portal? M. ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
David Woodhouse [EMAIL PROTECTED] wrote on 01/09/2008 03:17:30 PM: On Wed, 2008-01-09 at 15:15 -0500, Michail Bletsas wrote: Running application proxies on every XO that wants to act as a mesh portal? Running NAT-PT. Since they're required to run NAT as it is anyway, that shouldn't be too much of a problem. And this 'mesh portal' mode isn't something we really have working anyway. It does require application level proxies to look inside packets (FTP and DNS are examples, per its documentation). And we had the mesh portal thing working perfectly fine until we decided to add DHCP in the picture M. ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
On Wed, 9 Jan 2008, John Watlington wrote: In the case of the mesh portal (A NAT Internet Gateway in our case) we need to get back the IP address of the gateway as well as DNS info. A simple python server listening at a predefined port was providing that. That simple server has been replaced by a complete DHCP server in our current implementation. The DHCP server was needed anyway. And to implement shortest path routing both for sent and received packets, we needed a mechanism for receiving an IP address that reflected the nearest MPP anyway (or use NAT, something we would like to avoid inside the school) you really don't want to have your IP address change because you moved and a different MPP is now closer to you. David Lang
Re: mesh portal discovery
I completely fail to see why we need the DHCP server to get the IP address of the nearest MPP or get the optimal path to and from it. The MPP discovery mechanism originally proposed worked great for getting packets out of the mesh through the shortest path. The problem was that outside of running NAT on each MPP, there wasn't a good way to ensure that packets sent to that laptop entered the mesh through the same MPP. That is currently handled (in IPv4) by using a different DHCP range for each MPP, and routing to the appropriate MPP based on those ranges. As for the multiple radio scenario at the schools, you can address that with two different ways: bridging between the interfaces on the server or -if you want something quicker-, use a different autoIP range for each channel. We briefly discussed using a different autoIP range, and decided it was difficult to implement. wad ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
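The per-MPP DHCP range scheme described above can be sketched as a simple lookup: given a laptop's address, find the range it falls in and therefore the MPP that inbound traffic should be routed to. The range values and MPP names below are invented for the example; the thread does not list the real ones.

```python
import ipaddress

# Hypothetical ranges, one per MPP; the thread does not give the real ones.
MPP_RANGES = {
    "mpp-1": ipaddress.ip_network("172.18.0.0/23"),
    "mpp-2": ipaddress.ip_network("172.18.2.0/23"),
    "mpp-3": ipaddress.ip_network("172.18.4.0/23"),
}


def mpp_for_address(addr):
    """Return the name of the MPP whose DHCP range contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for name, network in MPP_RANGES.items():
        if ip in network:
            return name
    return None
```

This also makes the limitation discussed in the thread visible: the mapping is fixed at lease time, so a laptop that keeps renewing the same lease keeps the same inbound MPP no matter where it moves.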
Re: mesh portal discovery
The MPP discovery mechanism originally proposed worked great for getting packets out of the mesh through the shortest path. The problem was that outside of running NAT on each MPP, there wasn't a good way to ensure that packets sent to that laptop entered the mesh through the same MPP. The only reason that I can see to use DHCP is if you want to distribute routable IPv4 addresses, something that would be glorious if it could happen but which I don't see happening very often. If you are not running NAT on the MPPs and you have multiple MPPs per mesh and the external routing protocol decided that packets should return through a different portal, how much do you think you are gaining by using the same path inside the mesh (which, by the way, is different in each direction anyway!)? We briefly discussed using a different autoIP range, and decided it was difficult to implement. Again, I fail to see why - it can be non-standard but it is definitely not difficult to implement. M.
Re: mesh portal discovery
On Wed, 9 Jan 2008, John Watlington wrote: I completely fail to see why we need the DHCP server to get the IP address of the nearest MPP or get the optimal path to and from it. The MPP discovery mechanism originally proposed worked great for getting packets out of the mesh through the shortest path. The problem was that outside of running NAT on each MPP, there wasn't a good way to ensure that packets sent to that laptop entered the mesh through the same MPP. That is currently handled (in IPv4) by using a different DHCP range for each MPP, and routing to the appropriate MPP based on those ranges. this sounds like the mobile IP problem. could you do something along the lines of having the MPP boxes within a mesh talk to each other (either over the mesh or over the Internet) so that they know what boxes are closest to each one. then have the mesh route the traffic out to the nearest MPP. response traffic would go to the MPP that allocated the IP address, and that box then tunnels the packet over to the MPP box closest to the laptop (similar to how LVS does load balancing), and that box then sends it over the radio. David Lang ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
On Wed, 2008-01-09 at 15:59 -0500, John Watlington wrote: On Jan 9, 2008, at 3:47 PM, Michail Bletsas wrote: The MPP discovery mechanism originally proposed worked great for getting packets out of the mesh through the shortest path. The problem was that outside of running NAT on each MPP, there wasn't a good way to ensure that packets sent to that laptop entered the mesh through the same MPP. The only reason that I can see to use DHCP is if you want to distribute routable IPv4 addresses, something that would be glorious if it could happen but which I don't see happening very often. Neither do I. But I don't want to impose a NAT between two laptops in the same school. It will break P2P applications. If you are not running NAT on the MPPs and you have multiple MPPs per mesh and the external routing protocol decided that packets should return through a different portal, what much do you think you are gaining by using the same path inside the mesh (which b.t.w. is different in each direction anyway!)? I don't care about using the same path, but sending packets for six hops through the mesh when proper routing can reduce it to a single hop seems like piss-poor design. And it makes the mesh interfaces on a single server serve the entire school. Why bother with multiple MPPs at all? We briefly discussed using a different autoIP range, and decided it was difficult to implement. Again fail to see why - it can be non-standard but definitely not difficult to implement. IIRC, Dan Williams was the person looking into it. It wasn't a Network Manager change, it was a change to Avahi, and would either have to be pushed upstream or maintained indefinitely by us. Plus, AutoIP addresses aren't EVER supposed to be routed --- they are strictly link local due to the assignment process. Thanks for the discussion --- we need to figure out a solution for IPv6 going forward, as the current approaches will absolutely not extend to IPv6.
DHCP did what we needed back then, namely 1) a robust discovery mechanism 2) well-tested backoff mechanisms 3) well-known and standardized behavior and packet format 4) well-tested and security-audited server and client. In the School Server case, using DHCP allowed us to collapse two steps of the connection process into one. With the previous method, you would have to _both_ find the MPP using the non-standard MPP discovery method, and then do a DHCP run to get your address from the school server. Using DHCP here _already_ can provide the address of your gateway. You could conceivably do both these operations in parallel but since you have to do DHCP anyway, it's pointless to do some other MPP discovery mechanism. In a school setting autoip might work, but might mean more traffic because of potential address conflicts and the resolution process. So if you want dynamic addressing in the school, DHCP is about the only easy way to do that, and once you're using DHCP the old MPP discovery mechanism is pointless. The above benefit does not apply in the XO-as-MPP case because autoip addressing is used; however, the same codepath is used in NetworkManager as in the school server case, and therefore there is less code to maintain, fewer codepaths to test, and fewer opportunities for stuff to go wrong. The only real solution to this problem that doesn't suck is to use IPv6 auto addressing for everything. Dan
Re: mesh portal discovery
John,

The DHCP procedure currently being used only discovers the nearest mesh portal when it is first run (DHCP_DISCOVER), not when it tries to renew (DHCP_REQUEST). Furthermore, as the address previously assigned indicates which mesh portal was selected, it seems like we should always be discovering, not renewing...

You probably don't want that: a mesh point might have equal cost routes to several mesh portals. In that case you want some hysteresis: only change to a new MPP if it offers a big advantage over the current one.

As long as it can communicate with it by hopping through the mesh, it will renew the existing lease and never discover a closer MPP/DHCP server This was the problem that prompted my original message on this thread.

One way to do this would be to run a simple daemon that

1. Periodically sends traffic to the anycast address. If you want to use dhclient for this ( assuming it is patched as described here: http://www.cozybit.com/projects/mpp-utils/index.html#update ) you could send frames to the anycast address like this:
   # dhclient eth0 -1 -lf /dev/null -sf /bin/true
2. Compare the metric of the best mpp with the current mpp. This can be done via iwpriv fwt_list calls.
3. If the cost difference justifies it, wipe out the existing leases and re-discover
   # rm /var/lib/dhcp3/* ; dhclient eth0

Cheers, Javier -- Javier Cardona cozybit Inc.
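The hysteresis rule in step 3 above can be sketched as a small decision function. This is only an illustration: it assumes that a lower metric means a better path, that the two costs have already been read from the forwarding table by some other means (parsing the fwt_list output is not shown), and the 30% threshold is an arbitrary example value, not something proposed in the thread.

```python
def should_rediscover(current_cost, best_cost, min_gain=0.3):
    """Decide whether to drop the lease and re-run discovery.

    current_cost: path metric to the MPP we currently hold a lease from.
    best_cost:    path metric to the best MPP seen via the anycast address.
    min_gain:     required relative improvement (hypothetical default).

    Assumes lower metrics are better. Returns True only when the best MPP
    beats the current one by at least min_gain, so that equal-cost or
    nearly equal-cost portals do not cause flapping.
    """
    if current_cost <= 0:
        return False
    improvement = (current_cost - best_cost) / current_cost
    return improvement >= min_gain
```

A daemon built on this would call should_rediscover() after each periodic probe and only run the lease wipe and dhclient when it returns True.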
Re: mesh portal discovery
On Wed, 9 Jan 2008, Javier Cardona wrote:

The DHCP procedure currently being used only discovers the nearest mesh portal when it is first run (DHCP_DISCOVER), not when it tries to renew (DHCP_REQUEST). Furthermore, as the address previously assigned indicates which mesh portal was selected, it seems like we should always be discovering, not renewing...

You probably don't want that: a mesh point might have equal cost routes to several mesh portals. In that case you want some hysteresis: only change to a new MPP if it offers a big advantage over the current one.

As long as it can communicate with it by hopping through the mesh, it will renew the existing lease and never discover a closer MPP/DHCP server This was the problem that prompted my original message on this thread.

One way to do this would be to run a simple daemon that

1. Periodically sends traffic to the anycast address. If you want to use dhclient for this ( assuming it is patched as described here: http://www.cozybit.com/projects/mpp-utils/index.html#update ) you could send frames to the anycast address like this:
   # dhclient eth0 -1 -lf /dev/null -sf /bin/true
2. Compare the metric of the best mpp with the current mpp. This can be done via iwpriv fwt_list calls.
3. If the cost difference justifies it, wipe out the existing leases and re-discover
   # rm /var/lib/dhcp3/* ; dhclient eth0

you really don't want to change the IP of the laptop any more than you absolutely must, it's too likely to disrupt existing connections.

as I understand it the mesh is (close to) continuously reconfiguring itself to find the most efficient path across it. is the resulting information available to all of the MPP nodes? if it is you should be able to do something like the following.

1. on initial connection use the existing process to make a 'best guess' to find a DHCP server and get an IP address.
2. outbound packets use this IP address no matter which MPP the packets go through.
3. inbound packets go to the MPP that initially gave out the IP address.
3a. if that MPP determines that it is still the closest MPP to the end node, it sends the packet out normally.
3b. if the packet arrives at the MPP over a tunnel from another MPP, don't check the routing, just send it out over the mesh (avoids routing loops)
3c. if the MPP determines that another MPP is significantly closer to the end node, it tunnels the packet over to the closer MPP, which then sends it over the mesh to the end node.

I think that step 3 can be tested without extensive code changes by using hooks in iptables. Iptables has the ability to call out to userspace code as part of its processing decision. if that userspace code reports that the end-node is closest to this MPP then it routes the packet normally; if it thinks that another MPP is closer, it returns something to indicate which remote node to use, and then the packet gets routed through a tunnel to that node (a simple GRE tunnel will do, we just need to encapsulate the packet)

This approach requires that all of the MPP boxes know which one of them is closest to each end-node. If the current mesh structure does not provide this info to all nodes then an additional daemon would need to share this info (possibly over the same tunnels that are used to relay the traffic)

I will say up front that I haven't done the iptables-userspace hooking in any of my projects, but this should be an easy way to prototype this before adding this type of routing to the kernel.

This approach is safe: the worst case is that inbound packets take a longer path than optimal to get to the node (either they don't get re-routed when they should or they get re-routed when they shouldn't; either way they take more hops over the radio than necessary). By not changing the IP address of the node it avoids breaking existing connections at the cost of an additional hop over wired networks.

Potential problems: if you are doing NAT on the MPP then this approach won't work (because the outbound packets don't all go through the same MPP). if the different MPP boxes are on different Internet connections and there is egress filtering outside the MPP boxes, that filtering would need to allow the mesh IPs out through all MPP boxes.

David Lang
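The inbound handling in steps 3a to 3c above can be written down as a small decision function, which may help when prototyping the userspace side of the iptables hook. This is a sketch under assumptions: the hop-count table is taken as given (the thread notes that sharing it between MPPs may need an extra daemon), the names are hypothetical, and the "significantly closer" margin of two hops is an example value only.

```python
def route_inbound(self_id, hops_by_mpp, arrived_via_tunnel, min_gain=2):
    """Decide what an MPP should do with a packet bound for a laptop.

    self_id:            identifier of the MPP making the decision.
    hops_by_mpp:        dict mapping MPP id -> mesh hop count to the laptop;
                        must contain an entry for self_id.
    arrived_via_tunnel: True if another MPP already forwarded this packet.
    min_gain:           hops another MPP must save before we tunnel to it
                        (hypothetical threshold).

    Returns ("deliver", self_id) or ("tunnel", other_mpp_id).
    """
    # 3b: never re-route a tunnelled packet, which avoids routing loops.
    if arrived_via_tunnel:
        return ("deliver", self_id)

    best = min(hops_by_mpp, key=hops_by_mpp.get)
    saved = hops_by_mpp[self_id] - hops_by_mpp[best]

    # 3c: hand off only when another MPP is significantly closer.
    if best != self_id and saved >= min_gain:
        return ("tunnel", best)

    # 3a: this MPP is still the closest, or close enough.
    return ("deliver", self_id)
```

Because a tunnelled packet is always delivered locally, a stale or inconsistent hop table can at worst cost extra radio hops, which matches the worst case described in the message.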
Re: mesh portal discovery
Just switch off the Legacy IP, as we should have done months ago, and get on with making things work properly. Anything else is a distraction. I sympathize with how overworked OLPC developers are. But a number of G1G1 systems are getting into the hands of articulate net-aware people. If they become disenchanted by the Legacy IP performance of the OLPC, what they say might result in hurting the whole project. I completely fail to see why we need the DHCP server ... I don't have wireless at home. First tried stopping NetManager, and manually setting the IP address. That worked, but screwed something up so I could not use the system in a cafe. So I restored the OLPC, and added a DHCP server in my home, __just__ for the OLPC. Effort that I had not anticipated needing to expend. mikus ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
On Jan 9, 2008, at 6:32 PM, [EMAIL PROTECTED] wrote: On Wed, 9 Jan 2008, John Watlington wrote: Sounds like you are volunteering to write that code. We need it by early next week. Yes, mobile IPv6 is one of our long-term solution possibilities. two things. 1. I didn't realize that participation wasn't desired except by people volunteering to write the code (the discussion sounded like people were trying to figure out what to do) 2. while I referred to mobile IP, the solution that I then outlined is not that complete. do you want me to shut up until I have completed code to present? or do you want people discussing options and trying to figure out a strategy that will work? Sorry about the snap, but you have to realize that I am currently the developer and QA department for the school server, and it isn't even my current job. And I really do need the answer implemented over the next week... Some handwaving about the best way to do it from the peanut gallery is fine, but don't expect to be listened to unless you can point to existing software or are willing to help implement the solution. We don't have infinite (or even sufficient) resources, nor the time for development and testing of complex solutions. I don't understand your dislike of changing IP addresses. My current laptop does it multiple times a day just fine, and it should only happen when a student is moving from one place to another (as long as Javier's comments about avoiding flapping are heeded). We have a presence service which provides a way for P2P applications to find one another, even after the IP changes. wad
Re: mesh portal discovery
On Jan 9, 2008, at 6:50 PM, Mikus Grinbergs wrote: Just switch off the Legacy IP, as we should have done months ago, and get on with making things work properly. Anything else is a distraction. I sympathize with how overworked OLPC developers are. But a number of G1G1 systems are getting into the hands of articulate net-aware people. If they become disenchanted by the Legacy IP performance of the OLPC, what they say might result in hurting the whole project. You misunderstood our local IPv6 evangelist, he wasn't proposing to disable IPv4 on the laptop, just not to support it on the school server mesh. Given that all mesh capable devices will support IPv6, he's probably got a point. Here is my take-home summary of this thread: Short term solution is to turn off IPv6 on the mesh, and tell kids that if their network performance degrades, they should click on the circle again which will trigger an IPv4 DHCP discovery of the nearest MPP. Long term solution is probably to move to IPv6 only, using a user space agent to decide which RAs to listen to. This user space agent can implement Javier's suggestion to avoid flapping between MPPs. Mobile IPv6 would be frosting on the cake, but doesn't help with the primary problem of MPP selection. Thanks, wad ___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: mesh portal discovery
On Wed, 9 Jan 2008, John Watlington wrote: On Jan 9, 2008, at 6:50 PM, Mikus Grinbergs wrote: Just switch off the Legacy IP, as we should have done months ago, and get on with making things work properly. Anything else is a distraction. I sympathize with how overworked OLPC developers are. But a number of G1G1 systems are getting into the hands of articulate net-aware people. If they become disenchanted by the Legacy IP performance of the OLPC, what they say might result in hurting the whole project. You misunderstood our local IPv6 evangelist, he wasn't proposing to disable IPv4 on the laptop, just not to support it on the school server mesh. Given that all mesh capable devices will support IPv6, he's probably got a point. Here is my take-home summary of this thread: Short term solution is to turn off IPv6 on the mesh, and tell kids that if their network performance degrades, they should click on the circle again which will trigger an IPv4 DHCP discovery of the nearest MPP. Long term solution is probably to move to IPv6 only, using a user space agent to decide which RAs to listen to. This user space agent can implement Javier's suggestion to avoid flapping between MPPs. Mobile IPv6 would be frosting on the cake, but doesn't help with the primary problem of MPP selection. I'm trying to make sure I fully understand the problem it sounds as if you have a good mechanism in the mesh for the laptops to send packets to the nearest MPP the problem is that if they get an IP address from a MPP that is a long way away (either initially due to a problem or over time as the laptop moves more hops away from the MPP) the fact that reply packets will always go the the MPP that gave out the IP address (due to normal IP routing) results in a slow reply as these packets start taking longer to get from the MPP to the laptop. is this correct so far? 
This problem is further complicated by the IPv6 equivalent of DHCP making it more likely that the initial 'registration' with an MPP is less than optimal. And to top things off, since the replies are typically larger than the requests (which is why people live with DSL that is only 512 kbit/s outbound but 1.5 Mbit/s inbound), the additional delays on the inbound leg are significantly worse.

I am making the assumption that the MPP machines know the wireless topology from each of their points of view, i.e. not only do they know how to get to the wireless nodes from themselves (and how many hops away they are), but they also know this information for each of the other MPP nodes. If this assumption is not currently true, a daemon would need to be run to keep the MPP boxes in agreement over who is the best gateway to the laptop.

If I am on track so far, let me see if I can divide the resulting problem into three cases:

1. The MPP boxes involved are 'owned' by separate entities and may not know about each other over the wired network.

2. The MPP boxes involved are associated with each other and may be two or more network hops apart, but are all managed as part of the same set (with egress filters configured so that outbound traffic could come from any of the MPP boxes).

3. The MPP boxes involved with the mesh network are tightly coupled: all connected by a high-speed wired network on the same subnet (no routing between them, all in the same broadcast domain).

Addressing these one at a time:

For #1, I can't think of any reasonable way to move a machine from talking to one MPP to another short of true mobile IP solutions.

For #2, the basic approach is the same as LVS uses in tunneling mode; see http://www.linuxvirtualserver.org/VS-IPTunneling.html for a diagram and explanation. This is basically what I was suggesting earlier: don't worry about the outbound traffic, just bounce the inbound traffic to the closest node (via a tunnel) before sending it over the air.
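To make the idea for #2 a bit more concrete, here is a small Python sketch of the selection step only. Given each MPP's view of how many wireless hops away a laptop is, the MPP that receives an inbound packet either delivers it over the air itself or tunnels it to whichever MPP is closest. The shared hop table, the function name and the return values are all hypothetical; this is not LVS code, just the decision logic under the assumption stated above that every MPP knows every other MPP's hop counts.

```python
from typing import Dict, Tuple

# Hypothetical shared view: hop_table[mpp][laptop] = wireless hops from mpp to laptop
HopTable = Dict[str, Dict[str, int]]


def route_inbound(receiving_mpp: str, laptop: str, hop_table: HopTable) -> Tuple[str, str]:
    """Return ("deliver", mpp) or ("tunnel", mpp) for a packet addressed to laptop."""
    # Keep only the MPPs that currently have a wireless path to the laptop.
    candidates = {
        mpp: hops[laptop] for mpp, hops in hop_table.items() if laptop in hops
    }
    if not candidates:
        raise LookupError(f"no MPP can reach {laptop}")
    # Prefer the fewest hops; on a tie keep the packet where it already is,
    # then fall back to the MPP id so the choice is deterministic.
    closest = min(candidates, key=lambda m: (candidates[m], m != receiving_mpp, m))
    action = "deliver" if closest == receiving_mpp else "tunnel"
    return action, closest
```

For example, with `{"mpp1": {"xo": 5}, "mpp2": {"xo": 1}}`, a packet for `xo` arriving at `mpp1` would be tunneled to `mpp2`, while the same packet arriving at `mpp2` would be delivered directly.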
The tunneling approach for #2 could be a matter of using the existing LVS code and replacing the server selection logic with something that is aware of the wireless topology. To avoid a routing loop where the packet gets bounced back and forth between MPP boxes, you should be able to set things up so that the load balancing is only done on packets coming in from the outside. (I don't know if iptables can do this stock, but it should be a simple, if ugly, hack to make packets arriving through a tunnel bypass the LVS code and get inserted just past it in the IP stack.) The worst case with this model should be that some inbound packets get relayed to the wrong MPP and make more hops than they need to over the air.

For #3, I am looking at other server load balancing options, specifically the CLUSTERIP target available in iptables: http://flaviostechnotalk.com/wordpress/index.php/2005/06/12/loadbalancer-less-clusters-on-linux

What this does is define an IP address that exists on all machines and uses a multicast MAC address; this forces the switch to send the packet to