Re: coping with lost connectivity (was: mesh portal discovery)

2008-01-13 Thread Simon McVittie
On Sat, 12 Jan 2008 at 12:00:02 -0500, Benjamin M. Schwartz wrote:
 This is precisely what I am saying.  Telepathy should only register a
 disconnect if there is no way to route between two XOs.  The mesh system
 should be designed so that moving about within the mesh, or handing off
 between Salut and Gabble, or switching from one internet-connected wireless
 network to another, does not cause a Telepathy disconnect.

For the record: Telepathy, which is a standard API used on OLPC and elsewhere,
does not and will not do this. A Telepathy Connection object represents a
connection (e.g. a Gabble connection represents a TCP connection to the
server) and we're not going to mangle the API to behave otherwise.

You're right that a possible solution for activities' networking would
be for some lower layer to paper over the cracks - Telepathy is the
wrong layer to be doing this, but the Presence Service could do it, or
so could a library in the sugar.* hierarchy.

However, each activity is fundamentally going to need a way to sync its state
on initial connection; for at least the medium term, as Sjoerd said, we propose
that the same mechanism be used to resync after connectivity loss.

If we have enough developer time to be able to work on library code for
activities' networking, it's likely that the API will involve an
activity-supplied resync callback that's called when initially joining
an activity, and when connectivity is regained after a connection loss.
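
To make the shape of such an API concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: SharedSession, CounterActivity and their method names are hypothetical and are not part of Sugar, the Presence Service or Telepathy. The point is simply that one activity-supplied callback serves both the initial join and the recovery after a connection loss.

```python
class SharedSession:
    """Hypothetical library-side helper (not an existing Sugar API)."""

    def __init__(self, resync_cb):
        # resync_cb is supplied by the activity and called with a reason string.
        self._resync_cb = resync_cb
        self.connected = False

    def on_joined(self):
        # Initial join: the activity has no shared state yet, so it must sync.
        self.connected = True
        self._resync_cb("joined")

    def on_connection_lost(self):
        # Nothing to sync while offline; just remember that we dropped out.
        self.connected = False

    def on_connection_regained(self):
        # Messages may have been missed while offline, so sync again.
        self.connected = True
        self._resync_cb("reconnected")


class CounterActivity:
    """Toy activity whose whole shared state is a single integer."""

    def __init__(self, fetch_state):
        # fetch_state stands in for "ask another participant for the state".
        self._fetch_state = fetch_state
        self.value = 0
        self.resyncs = []

    def resync(self, reason):
        self.value = self._fetch_state()
        self.resyncs.append(reason)
```

An activity would construct SharedSession(activity.resync) once and leave it to the library to decide when the callback fires.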

 In each of these cases, the path between XOs
 remains routable, with a gap of at most a few seconds.

I'm not convinced this is even technically feasible, given the
constraints of the quality of the underlying network (if the packets
aren't arriving, there's nothing we can do about it). It's certainly a
much lower priority right now than making the servers scale properly.

Many of Sjoerd's bugfixes to Salut have been to do with detecting and
signalling loss of connectivity - not in the sense of "I lost my IP
address", but in the sense of "I'm sending packets to Fred and he's not
acknowledging any of them, so either he's not receiving them or I'm not
receiving the acknowledgements". No amount of programming will fix
packets just not turning up, so the only improvements we can bring here
are by tweaking the trade-off of bandwidth use vs timely error recovery.

This suggests that however much we can improve the API, activities will still
have to be able to deal with situations where other users fall off the
network, in a more or less graceful way - and if you can do that, then you
can use the same mechanisms to resync after connectivity changes, in the way
that Sjoerd suggests.

We can probably never make it entirely transparent, because however
quick it becomes to reconnect after connectivity loss, you can't
guarantee that you haven't lost messages; and in a message-passing
system, as soon as you can't make that guarantee, the only way back to a
consistent state is to ask someone else what's going on, which is exactly
the resync I'm talking about.
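
As a rough illustration of that argument, the sketch below numbers each sender's messages and falls back to a full snapshot as soon as a gap shows up. Everything here is hypothetical (the Receiver class, the request_snapshot callable, the key/value state); it also assumes the snapshot already reflects every message up to the one that revealed the gap. Note that a gap is only noticed once a later message arrives, which is one more reason to resync after every reconnect anyway.

```python
class Receiver:
    """Applies incremental updates; resyncs when a sequence gap is seen."""

    def __init__(self, request_snapshot):
        # request_snapshot: hypothetical callable that asks a peer for the
        # full current state and returns it as a dict.
        self._request_snapshot = request_snapshot
        self._next_seq = {}   # sender -> next sequence number expected
        self.state = {}

    def handle(self, sender, seq, key, value):
        expected = self._next_seq.get(sender, 0)
        if seq < expected:
            # Already seen (or covered by a snapshot): ignore it.
            return "duplicate"
        if seq > expected:
            # At least one message was lost, so incremental updates can no
            # longer be trusted; replace local state with a peer's snapshot.
            self.state = dict(self._request_snapshot())
            self._next_seq[sender] = seq + 1
            return "resynced"
        self.state[key] = value
        self._next_seq[sender] = seq + 1
        return "applied"
```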

Simon
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: mesh portal discovery

2008-01-12 Thread Sjoerd Simons
On Thu, Jan 10, 2008 at 02:29:15PM -0500, Benjamin M. Schwartz wrote:
 
 Morgan Collett wrote:
  We'll add some API to PresenceService and sugar.presence, and put some
  signal into Sugar similar to the buddy-left signal to indicate you were
  disconnected, and ensure that the activity gets back into an unshared state.
  
  If we find the shared activity ID in presence we can attempt to rejoin,
  handling a switch from one IP address to another without changing from
  gabble to salut (or vice versa).
  
  Then Activities will only need to hook the disconnected signal to clean
  up state, if that is necessary.
 
 This is not an acceptable long-term solution.  It means that whenever people
 are moving about the mesh in a multi-server environment, their activities
 will spontaneously disconnect, and possibly reconnect some time later.  This
 is hugely disruptive to activities with shared state, continuous action, or
 just about any other collaboration style.  From the users' perspective, it
 will inevitably render the sharing system unreliable and frustrating.

Activities need to cope with people coming and going anyway. If you're in a
mesh-only environment, the mesh can be split into two or more parts at any
point and later merge again. Salut will model that as people disconnecting and
later connecting again, so your application _must_ be able to synchronize the
shared state in some way if needed.

 As an example, consider two students using Distance (my activity) to measure
 out a distance of 15 meters.  As they move apart, one of them switches into
 the domain of another mesh-portal, and its IP address changes.  It suddenly
 drops out of the activity, and measurement stops.  Some time later, it may
 rejoin automatically, at which point the students will have to reinitiate
 measurement manually.
 
 If IP address switches are triggered automatically, and silently, then they
 must be handled automatically, and silently.

That's mostly up to the application. Telepathy shouldn't hide the fact that
we're not actually connected anymore, and applications should do something
useful with that info. A better long-term solution would probably be to use
mobile IP, so you don't get disconnected when switching between networks.

  Sjoerd
-- 
A transistor protected by a fast-acting fuse will protect the fuse by
blowing first.


Re: mesh portal discovery

2008-01-12 Thread Benjamin M. Schwartz

Sjoerd Simons wrote:
 Activities need to cope with people coming and going anyway. If you're in a
 mesh-only environment, the mesh can be split into two or more parts at any
 point and later merge again. Salut will model that as people disconnecting
 and later connecting again, so your application _must_ be able to synchronize
 the shared state in some way if needed.

"Cope" is exactly the right word.  It is simply impossible, in many cases, to
handle a mesh split without disruption.  For example, any code that creates a
distributed lock will fail if the group splits and rejoins.  If your Activity
uses this common structure, then there will inevitably be a major discontinuity
when the group rejoins.  I understand that mesh splits are inevitable, but every
effort should be undertaken to minimize their frequency.
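
To illustrate the failure mode, here is a toy simulation of a naive distributed lock. It is a sketch under assumptions, not any real Sugar or Telepathy mechanism: NaiveLock and its methods are hypothetical, and the lock is granted whenever no reachable peer is known to hold it. After a split, each half can grant the lock independently, and the conflict only becomes visible when the halves merge.

```python
class NaiveLock:
    """Grants the lock when no *reachable* peer is known to hold it."""

    def __init__(self, members):
        # Start with everyone in a single partition.
        self.partitions = [set(members)]
        self.holders = set()

    def _partition_of(self, member):
        return next(p for p in self.partitions if member in p)

    def acquire(self, member):
        visible = self._partition_of(member)
        if self.holders & visible:
            return False          # someone we can see already holds it
        self.holders.add(member)
        return True

    def split(self, *groups):
        # The mesh splits into the given groups of members.
        self.partitions = [set(g) for g in groups]

    def merge(self):
        # All partitions rejoin into one.
        self.partitions = [set().union(*self.partitions)]

    def conflict(self):
        # True when two holders can see each other.
        return any(len(self.holders & p) > 1 for p in self.partitions)
```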

 If IP address switches are triggered automatically, and silently, then they
 must be handled automatically, and silently.
 
 That's mostly up to the application. Telepathy shouldn't hide the fact that
 we're not actually connected anymore, and applications should do something
 useful with that info. A better long-term solution would probably be to use
 mobile IP, so you don't get disconnected when switching between networks.

This is precisely what I am saying.  Telepathy should only register a disconnect
if there is no way to route between two XOs.  The mesh system should be designed
so that moving about within the mesh, or handing off between Salut and Gabble,
or switching from one internet-connected wireless network to another, does not
cause a Telepathy disconnect.  In each of these cases, the path between XOs
remains routable, with a gap of at most a few seconds.  I understand that this
is not easy, and that it will not be implemented immediately, but we should not
profess ourselves satisfied with anything less reliable.


Re: mesh portal discovery

2008-01-12 Thread Sjoerd Simons
On Sat, Jan 12, 2008 at 12:00:02PM -0500, Benjamin M. Schwartz wrote:
 
 Sjoerd Simons wrote:
  Activities need to cope with people coming and going anyway. If you're in a
  mesh-only environment, the mesh can be split into two or more parts at any
  point and later merge again. Salut will model that as people disconnecting
  and later connecting again, so your application _must_ be able to
  synchronize the shared state in some way if needed.
 
 "Cope" is exactly the right word.  It is simply impossible, in many cases, to
 handle a mesh split without disruption.  For example, any code that creates a
 distributed lock will fail if the group splits and rejoins.  If your Activity
 uses this common structure, then there will inevitably be a major
 discontinuity when the group rejoins.

Distributed locks aren't a good design to use in activities even without the
splitting problem; they're usually very sensitive to latency, which is
something you shouldn't assume to be low in any case. Especially during
temporary network issues, it can easily take multiple minutes for a message to
arrive at one or all of your nodes.

I understand that mesh splits are inevitable, but every effort should be
undertaken to minimize their frequency.

Of course. And we do try to minimise it. But your application really should be
designed to cope with splitting and merging, because it will happen in the
field. I would recommend everyone take this as a basic requirement when
designing their protocols. And don't hesitate to ask us (the Collabora people)
for advice in specific cases.

  If IP address switches are triggered automatically, and silently, then
  they must be handled automatically, and silently.
  
  That's mostly up to the application. Telepathy shouldn't hide the fact that
  we're not actually connected anymore, and applications should do something
  useful with that info. A better long-term solution would probably be to use
  mobile IP, so you don't get disconnected when switching between networks.
 
 This is precisely what I am saying.  Telepathy should only register a
 disconnect if there is no way to route between two XOs.  The mesh system
 should be designed so that moving about within the mesh, or handing off
 between Salut and Gabble, or switching from one internet-connected wireless
 network to another, does not cause a Telepathy disconnect.
 In each of these cases, the path between XOs remains routable, with a gap of
 at most a few seconds.  I understand that this is not easy, and that it will
 not be implemented immediately, but we should not profess ourselves satisfied
 with anything less reliable.

Well. The simplest way to do that is to have some nice helper classes that
can reconnect and reshare your activity if you get disconnected.

But it is still up to the application to recover from this nicely, which is
basically recovering from seeing everyone split away from you and come back
again... Hmmm, that does look a lot like the mesh-splitup issue we
mentioned earlier on. :)

  Sjoerd
-- 
The price of success in philosophy is triviality.
-- C. Glymour.
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: mesh portal discovery

2008-01-11 Thread Bennett Todd
D'oh. Still recovering from a wicked cold.

2008-01-11T16:49:06 Bennett Todd:
 If so, you could use them to assign the lower 24
 bits of a 10/24 addr until you've shipped 16 million XOs,

10/8

Sorry.
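
The arithmetic behind the correction can be checked with Python's standard ipaddress module. This is only a worked example: the "serial" below is a hypothetical 24-bit per-laptop identifier, since the message does not say which identifier would be used, and the sketch ignores that the first and last addresses of the block would normally be reserved.

```python
import ipaddress

# 10/8 leaves 32 - 8 = 24 host bits, i.e. 2**24 (about 16 million) addresses.
NET = ipaddress.ip_network("10.0.0.0/8")


def address_for_serial(serial):
    """Map a 24-bit identifier (hypothetical) onto an address inside 10/8."""
    if not 0 <= serial < NET.num_addresses:
        raise ValueError("serial does not fit in 24 bits")
    # Adding an int to an IPv4Address offsets it by that many addresses.
    return NET.network_address + serial
```

By contrast, a /24 prefix leaves only 8 host bits, so 10.0.0.0/24 holds just 256 addresses.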

-Bennett




Re: mesh portal discovery

2008-01-10 Thread Simon McVittie

On Wed, 09 Jan 2008 at 22:17:18 -0500, John Watlington wrote:
 We have a presence service which provides a way for P2P applications to find
 one another, even after the IP changes.

Presence Service isn't magical. If a laptop's IP address changes, in the
link-local backend (Salut) this will most likely appear as a disconnect +
reconnect (and the user will leave all shared activities they were currently
in). This is somewhat unavoidable, but if it's a hard requirement that Salut
do its best to survive IP addresses changing, file a bug against
telepathy-salut.

In the server-based backend, an IP address change *will* cause a
disconnect and reconnect. This is definitely unavoidable, since XMPP
uses a long-lived TCP connection to the server.

Simon


Re: mesh portal discovery

2008-01-10 Thread Morgan Collett
Simon McVittie wrote:
 On Wed, 09 Jan 2008 at 22:17:18 -0500, John Watlington wrote:
 We have a presence service which provides a way for P2P applications to find
 one another, even after the IP changes.
 
 Presence Service isn't magical. If a laptop's IP address changes, in the
 link-local backend (Salut) this will most likely appear as a disconnect +
 reconnect (and the user will leave all shared activities they were currently
 in). This is somewhat unavoidable, but if it's a hard requirement that Salut
 do its best to survive IP addresses changing, file a bug against
 telepathy-salut.
 
 In the server-based backend, an IP address change *will* cause a
 disconnect and reconnect. This is definitely unavoidable, since XMPP
 uses a long-lived TCP connection to the server.

As mentioned in #5620, activities aren't aware of the dropped
connection, and still show "shared" in the sharing combobox. We don't
yet have a (standard) way for activities to detect the disconnection and
handle it gracefully. So "user will leave all shared activities" means
the activities keep running with no indication to the user that
disconnection occurred, except that sharing stops working...

Morgan


Re: mesh portal discovery

2008-01-10 Thread david
On Thu, 10 Jan 2008, [EMAIL PROTECTED] wrote:

 for #2 the basic approach is the same as LVS uses in tunneling mode; see
 http://www.linuxvirtualserver.org/VS-IPTunneling.html for a diagram and
 explanation

  This is basically what I was suggesting earlier: don't worry about the
 outbound traffic, just bounce the inbound traffic to the closest node (via a
 tunnel) before sending it over the air. this could be a matter of using the
 existing LVS code and replacing the server selection logic with something that
 is aware of the wireless topology.

 to avoid a routing loop where the packet gets bounced back and forth between 
 MPP boxes, you should be able to set things up so that the load balancing is 
 only done on packets coming in from the outside (I don't know if iptables can 
 do this stock, but it should be a simple, if ugly hack to make packets 
 arriving through a tunnel bypass the LVS code and get inserted just past it 
 in the IP stack)

 the worst case with this model should be that some inbound packets get 
 relayed to the wrong MPP and make more hops than they need to over the air.

another thought that hit me.

you have a mesh routing daemon (I don't know if it's in kernel space or 
user space) to decide how to get the packets to the target laptop over the 
mesh.

what if this routing daemon is told about tunnels to other MPP nodes and 
treats them like one radio hop for the routing decision? the result should 
be that if the node is closer to another MPP node the inbound packet will 
go over the wire until it is as close to the laptop as possible.
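
a rough sketch of that idea, in Python, assuming every link (radio hop or wired tunnel between MPPs) counts as exactly one hop. the node names and the topology are made up for illustration, and this is a plain breadth-first search, not the actual mesh routing daemon.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search; every link (radio hop or tunnel) costs one hop."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    previous = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None                     # destination unreachable


# Hypothetical topology: the laptop is four radio hops from mpp1
# but only one radio hop from mpp2.
RADIO = [("mpp1", "xo_a"), ("xo_a", "xo_b"), ("xo_b", "xo_c"),
         ("xo_c", "laptop"), ("mpp2", "laptop")]
TUNNEL = [("mpp1", "mpp2")]
```

with only RADIO links, a packet arriving at mpp1 takes four radio hops to reach the laptop; with TUNNEL added, it goes over the wire to mpp2 and makes a single radio hop.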

David Lang


Re: mesh portal discovery

2008-01-10 Thread Dan Williams
On Thu, 2008-01-10 at 09:00 +, Simon McVittie wrote:
 
 On Wed, 09 Jan 2008 at 22:17:18 -0500, John Watlington wrote:
  We have a presence service which provides a way for P2P applications to
  find one another, even after the IP changes.
 
 Presence Service isn't magical. If a laptop's IP address changes, in the
 link-local backend (Salut) this will most likely appear as a disconnect +
 reconnect (and the user will leave all shared activities they were currently
 in). This is somewhat unavoidable, but if it's a hard requirement that Salut
 do its best to survive IP addresses changing, file a bug against
 telepathy-salut.
 
 In the server-based backend, an IP address change *will* cause a
 disconnect and reconnect. This is definitely unavoidable, since XMPP
 uses a long-lived TCP connection to the server.

IP addresses are going to change; that's a fact of life.  The best
anyone can do is try to not make an IP address change a traumatic
experience for the user, and provide mechanisms to ensure that whatever
the user was working on at the time doesn't just disappear in a puff of
smoke.

Dan




Re: mesh portal discovery

2008-01-10 Thread david
On Thu, 10 Jan 2008, Dan Williams wrote:


 In the server-based backend, an IP address change *will* cause a
 disconnect and reconnect. This is definitely unavoidable, since XMPP
 uses a long-lived TCP connection to the server.

 IP addresses are going to change; that's a fact of life.  The best
 anyone can do is try to not make an IP address change a traumatic
 experience for the user, and provide mechanisms to ensure that whatever
 the user was working on at the time doesn't just disappear in a puff of
 smoke.


this means changing every app to be aware of IP changes so that they know
that they need to reconnect to the far end. and for many apps,
modifying them to be able to pick up where they left off (and to do so in
a secure way so that bad guys can't claim to be you on a new IP address
and connect into an authenticated session)

good luck in re-writing the world.

now, if you are willing to throw away all existing software (and solve the
reconnect security problems) you may be able to make it work, but there
are no apps that work this way today that I am aware of.

David Lang


Re: mesh portal discovery

2008-01-10 Thread Dan Williams
On Fri, 2008-01-11 at 00:09 +, [EMAIL PROTECTED] wrote:
 On Thu, 10 Jan 2008, Dan Williams wrote:
 
 
  In the server-based backend, an IP address change *will* cause a
  disconnect and reconnect. This is definitely unavoidable, since XMPP
  uses a long-lived TCP connection to the server.
 
  IP addresses are going to change; that's a fact of life.  The best
  anyone can do is try to not make an IP address change a traumatic
  experience for the user, and provide mechanisms to ensure that whatever
  the user was working on at the time doesn't just disappear in a puff of
  smoke.
 
 
 this means changing every app to be aware of IP changes so that they know
 that they need to reconnect to the far end. and for many apps,
 modifying them to be able to pick up where they left off (and to do so in
 a secure way so that bad guys can't claim to be you on a new IP address
 and connect into an authenticated session)
 
 good luck in re-writing the world.
 
 now, if you are willing to throw away all existing software (and solve the
 reconnect security problems) you may be able to make it work, but there
 are no apps that work this way today that I am aware of.

The world changed underneath the apps, but the apps weren't modified to
handle it.  It's not 1997 anymore.  People no longer only use desktop
workstations with static IP addresses.  Laptops are everywhere.  You
don't keep the same IP address when you walk from Starbucks to Panera.

Mobile IP may mostly solve this; and that's fine.  But until then, the
apps are going to suck if they don't handle address changes which are
simply a fact of life.

It's not that hard to write an app that notices and handles IP address
changes.  Not handling this in apps that are written for or ported to
the XO is just plain laziness.  When porting or writing, you need to
handle the always-fullscreen-window case, you need to handle the
security system, and you need to be aware of IP address changes.
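
As a minimal sketch of what "noticing" could look like, the class below compares the current address against the last one seen and fires a callback when they differ. It is an illustration only: AddressWatcher is hypothetical, the way the address is obtained is deliberately left to the caller (get_address is an injected callable), and it polls rather than relying on any kernel notification.

```python
class AddressWatcher:
    """Calls on_change(old, new) whenever the observed address differs."""

    def __init__(self, get_address, on_change):
        # get_address: caller-supplied callable returning the current address
        # as a string; how it is obtained is outside this sketch.
        self._get_address = get_address
        self._on_change = on_change
        self._last = get_address()

    def poll(self):
        # Intended to be called periodically, e.g. from a main-loop timer.
        current = self._get_address()
        if current != self._last:
            old, self._last = self._last, current
            self._on_change(old, current)
            return True
        return False
```

The on_change handler is where an app would tear down and re-establish its connections.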

Welcome to 2008.

Dan




Re: mesh portal discovery

2008-01-10 Thread david
On Thu, 10 Jan 2008, Dan Williams wrote:

 On Fri, 2008-01-11 at 00:09 +, [EMAIL PROTECTED] wrote:
 On Thu, 10 Jan 2008, Dan Williams wrote:


 In the server-based backend, an IP address change *will* cause a
 disconnect and reconnect. This is definitely unavoidable, since XMPP
 uses a long-lived TCP connection to the server.

 IP addresses are going to change; that's a fact of life.  The best
 anyone can do is try to not make an IP address change a traumatic
 experience for the user, and provide mechanisms to ensure that whatever
 the user was working on at the time doesn't just disappear in a puff of
 smoke.


 this means changing every app to be aware of IP changes so that they know
 that they need to reconnect to the far end. and for many apps,
 modifying them to be able to pick up where they left off (and to do so in
 a secure way so that bad guys can't claim to be you on a new IP address
 and connect into an authenticated session)

 good luck in re-writing the world.

 now, if you are willing to throw away all existing software (and solve the
 reconnect security problems) you may be able to make it work, but there
 are no apps that work this way today that I am aware of.

 The world changed underneath the apps, but the apps weren't modified to
 handle it.  It's not 1997 anymore.  People no longer only use desktop
 workstations with static IP addresses.  Laptops are everywhere.  You
 don't keep the same IP address when you walk from Starbucks to Panera.

but you don't continue to use your laptop as you walk from starbucks to
panera, you close your laptop at starbucks, walk to panera and open it
again. or starbucks and panera are part of the same network so you don't
actually change addresses as you move between them.

and when you suspend and resume there are going to be apps that quit on 
you.

 Mobile IP may mostly solve this; and that's fine.  But until then, the
 apps are going to suck if they don't handle address changes which are
 simply a fact of life.

 It's not that hard to write an app that notices and handles IP address
 changes.  Not handling this in apps that are written for or ported to
 the XO is just plain laziness.  When porting or writing, you need to
 handle the always-fullscreen-window case, you need to handle the
 security system, and you need to be aware of IP address changes.

you have to modify both the client and the server to survive the changes. 
you can't just modify the client when you port it to the XO.

 Welcome to 2008.

but even the XO apps lose the connection to their peers and require
manual actions to re-establish them when they change their IP address. you
say 'welcome to 2008', I say none of your software works the way you claim
it does.

you are probably thinking of web based things, and HTTP is designed so
that every request-response pair can be a separate TCP connection (with
state held via other means), that will survive IP changes (although even
there they will lose any transactions in flight and require them to be
manually restarted, including large transfers)

there are very few (if any) applications that use long-term connections 
that will handle IP changes (frankly, most of them won't handle their 
connection being interrupted at all)

if you think that I am wrong and there are lots of apps that use long-term 
connections and recover from IP changes, please provide examples.

David Lang



Re: mesh portal discovery

2008-01-10 Thread Benjamin M. Schwartz

Dan Williams wrote:
 It's not that hard to write an app that notices and handles IP address
 changes.  Not handling this in apps that are written for or ported to
 the XO is just plain laziness.  When porting or writing, you need to
 handle the always-fullscreen-window case, you need to handle the
 security system, and you need to be aware of IP address changes.

No and yes.  I agree that this is the desired behavior, but it cannot be handled
by individual activities.  Correctly designed activities aren't even aware that
they are operating over an IP network.  Once Telepathy's streaming media support
is in, there will be almost no excuse to have the other participant's IP address
in your code, ever.

Telepathy must handle these network topology changes seamlessly, invisibly, and
entirely behind the abstraction barrier.  The routing system must be designed to
make this possible.

I know nothing about routing, but if a participant's IP address is about to
change, perhaps the change should be broadcast over the network, so that
Telepathy knows who to handoff the connection to.

--Ben


Re: mesh portal discovery

2008-01-10 Thread Morgan Collett
[EMAIL PROTECTED] wrote:
 On Thu, 10 Jan 2008, Dan Williams wrote:
 
 In the server-based backend, an IP address change *will* cause a
 disconnect and reconnect. This is definitely unavoidable, since XMPP
 uses a long-lived TCP connection to the server.
 IP addresses are going to change; that's a fact of life.  The best
 anyone can do is try to not make an IP address change a traumatic
 experience for the user, and provide mechanisms to ensure that whatever
 the user was working on at the time doesn't just disappear in a puff of
 smoke.

 
 this means changing every app to be aware of IP changes so that they know
 that they need to reconnect to the far end. and for many apps,
 modifying them to be able to pick up where they left off (and to do so in 
 a secure way so that bad guys can't claim to be you on a new IP address 
 and connect into an authenticated session)
 
 good luck in re-writing the world.

We'll add some API to PresenceService and sugar.presence, and put some
signal into Sugar similar to the buddy-left signal to indicate you were
disconnected, and ensure that the activity gets back into an unshared state.

If we find the shared activity ID in presence we can attempt to rejoin,
handling a switch from one IP address to another without changing from
gabble to salut (or vice versa).

Then Activities will only need to hook the disconnected signal to clean
up state, if that is necessary.
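
In very rough Python, the flow might look like the sketch below. None of these names exist yet: on_disconnected stands in for a handler of the proposed signal, and visible_activity_ids stands in for whatever the Presence Service reports after reconnecting.

```python
class Activity:
    """Toy model of an activity reacting to the proposed disconnected signal."""

    def __init__(self, activity_id):
        self.activity_id = activity_id
        self.shared = True
        self.log = []

    def on_disconnected(self):
        # Hypothetical handler: drop back to an unshared state and clean up.
        self.shared = False
        self.log.append("cleaned up")

    def try_rejoin(self, visible_activity_ids):
        # Rejoin only if the same shared activity ID is visible again.
        if not self.shared and self.activity_id in visible_activity_ids:
            self.shared = True
            self.log.append("rejoined")
        return self.shared
```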

/handwave

Morgan


Re: mesh portal discovery

2008-01-10 Thread John Gilmore
 IP addresses are going to change; that's a fact of life.

 I know nothing about routing, but if a participant's IP address is about to
 change, perhaps the change should be broadcast over the network, so that
 Telepathy knows who to handoff the connection to.

To re-ground this discussion, if two mesh portals appear on the
network, at different IP addresses, a laptop can continue to use the
old one for its existing connections, yet switch its primary address
to a new (better) one for new connections.

IPv6 includes host-based tools for making IP address changes easier.
In particular, it requires the kernel to be able to process several
global IP addresses for a given hardware interface.  The latest is
marked preferred, the rest are marked deprecated.  When creating
new connections, it normally uses the preferred address.  But 
communication over all of the addresses continues to work (as long
as the network outside the kernel has connectivity at that address).

Linux implements all of this for IPv6.  I don't know if the Linux kernel 
can do the same for IPv4, but it would be a natural extension.

Some applications care what IP address they are using; bind (DNS) in
particular watches for interfaces to go up or down, or to change.  If
Telepathy wants to do the same, yet there is no low-overhead way to do
it, then another natural extension would be to extend inotify (or raw
sockets, or some other kernel mechanism) to report such changes.
This would avoid polling for them.

As long as the previous mesh portal continues to work for a short
while, there should be no need for nonstandard mechanisms to let
applications know that the IP address is *about* to change.  Instead
they will naturally find out after it *does* change.

John Gilmore


Re: mesh portal discovery

2008-01-10 Thread david
On Thu, 10 Jan 2008, John Gilmore wrote:

 IP addresses are going to change; that's a fact of life.

 I know nothing about routing, but if a participant's IP address is about to
 change, perhaps the change should be broadcast over the network, so that
 Telepathy knows who to handoff the connection to.

 To re-ground this discussion, if two mesh portals appear on the
 network, at different IP addresses, a laptop can continue to use the
 old one for its existing connections, yet switch its primary address
 to a new (better) one for new connections.

why is it that the laptop needs to switch IP addresses?

is it that the new portal won't talk to the old IP address?

or is it that outbound traffic could go out either portal, but inbound
traffic would still go to the old portal and make more hops over the radio
than is necessary?

or something else?

 IPv6 includes host-based tools for making IP address changes easier.
 In particular, it requires the kernel to be able to process several
 global IP addresses for a given hardware interface.  The latest is
 marked preferred, the rest are marked deprecated.  When creating
 new connections, it normally uses the preferred address.  But
 communication over all of the addresses continues to work (as long
 as the network outside the kernel has connectivity at that address).

 Linux implements all of this for IPv6.  I don't know if the Linux kernel
 can do the same for IPv4, but it would be a natural extension.

 Some applications care what IP address they are using; bind (DNS) in
 particular watches for interfaces to go up or down, or to change.  If
 Telepathy wants to do the same, yet there is no low-overhead way to do
 it, then another natural extension would be to extend inotify (or raw
 sockets, or some other kernel mechanism) to report such changes.
 This would avoid polling for them.

but is it really the right thing to try and do this on the laptops (in the 
OS and all the software), or should we do it in the portal boxes instead?

 As long as the previous mesh portal continues to work for a short
 while, there should be no need for nonstandard mechanisms to let
 applications know that the IP address is *about* to change.  Instead
 they will naturally find out after it *does* change.

this gets back to my question about exactly why the change is a problem.

David Lang


mesh portal discovery

2008-01-09 Thread John Watlington

Right now we have a problem with mesh portal discovery.

The DHCP procedure currently being used only discovers
the nearest mesh portal when it is first run (DHCP_DISCOVER),
not when it tries to renew (DHCP_REQUEST).   Furthermore,
as the address previously assigned indicates which mesh portal
was selected, it seems like we should always be discovering, not
renewing...

There are larger issues which will probably need a day of discussion
later surrounding IPv6 deployment, such as cooperation between RADVD
and mesh portal discovery...   (Please defer discussion on this right  
now)

wad


Re: mesh portal discovery

2008-01-09 Thread Michail Bletsas
John Watlington [EMAIL PROTECTED] wrote on 01/09/2008 11:34:29 AM:

 
 Right now we have a problem with mesh portal discovery.
 
 The DHCP procedure currently being used only discovers
 the nearest mesh portal when it is first run (DHCP_DISCOVER),
 not when it tries to renew (DHCP_REQUEST).   Furthermore,
 as the address previously assigned indicates which mesh portal
 was selected, it seems like we should always be discovering, not
 renewing...
This is the expected behavior since the special anycast address is only 
used during discovery.


 
 There are larger issues which will probably need a day of discussion
 later surrounding IPv6 deployment, such as cooperation between RADVD
 and mesh portal discovery...   (Please defer discussion on this right 
 now)
 
The largest issue is how wrong, ugly and painful it is to use DHCP on a mesh 
network.
Because of RADV, IPv6 doesn't have that issue. The original mesh portal 
discovery method was proprietary but also extremely lightweight, and did 
what it was supposed to do with minimal code.

Using DHCP is absolutely the ugliest hack that I have ever encountered, 
because you can't legally have more than one server per layer-2 network to 
begin with, it makes the address configuration inconsistent (one method 
for a school server mesh and a different one for a non-school server mesh) 
and, to add insult to injury, it forces the use of a DHCP server process, 
utilizing several megabytes of RAM in every laptop just to distribute name 
server and GW IP addresses, having effectively broken Internet sharing via 
the mesh for several months now.

I am not even mentioning the unnecessary broadcasts forced by the fact that 
you have to have pretty short leases given the dynamic character of the 
network.

We put a lot of effort into putting anycast address support in the mesh, 
specifically to address the need of selecting the optimal path to a specific 
service in the path discovery process itself. We ended up with the DHCP 
monstrosity just so that we don't use anything new in what is in effect a 
new way of doing local area networking.

M.


Re: mesh portal discovery

2008-01-09 Thread Charles Durrett
On Jan 9, 2008 11:21 AM, Michail Bletsas [EMAIL PROTECTED] wrote:

 ...



 The largest issue is how wrong, ugly and painful is to use DHCP on a mesh
 network.
 Because of RADV, IPv6 doesn't have that issue. The original mesh portal
 discovery method was
 proprietory but also extremely lightweight and did what it was supposed to
 do with minimal code.

 ...


I'm late to the party.   How/who was it proprietary?


Re: mesh portal discovery

2008-01-09 Thread Dan Williams
On Wed, 2008-01-09 at 12:18 -0600, Charles Durrett wrote:
 
 On Jan 9, 2008 11:21 AM, Michail Bletsas [EMAIL PROTECTED] wrote:
 ...
  
 The largest issue is how wrong, ugly and painful is to use
 DHCP on a mesh 
 network.
 Because of RADV, IPv6 doesn't have that issue. The original
 mesh portal
 discovery method was
 proprietory but also extremely lightweight and did what it was
 supposed to
 do with minimal code.
 
 ...
 
 I'm late to the party.   How/who was it proprietary?

The original method used UDP packets with a custom format.  I wouldn't
say proprietary so much as non-standard since the code that created
the UDP packets was open for all to see.

Dan




Re: mesh portal discovery

2008-01-09 Thread David Woodhouse

On Wed, 2008-01-09 at 11:34 -0500, John Watlington wrote:
 
 Right now we have a problem with mesh portal discovery.
 
 The DHCP procedure currently being used only discovers
 the nearest mesh portal when it is first run (DHCP_DISCOVER),
 not when it tries to renew (DHCP_REQUEST).   Furthermore,
 as the address previously assigned indicates which mesh portal
 was selected, it seems like we should always be discovering, not
 renewing...

Legacy IP doesn't work well and doesn't really give us what we need in
the long term... or even the medium term. We've known that all along.

What do you propose to do about it? Throw away pointless engineering
into cobbling together some way of making Legacy IP work a bit better? I
seriously hope not. Just switch off the Legacy IP, as we should have
done months ago, and get on with making things work properly. Anything
else is a distraction.

-- 
dwmw2



Re: mesh portal discovery

2008-01-09 Thread John Watlington

On Jan 9, 2008, at 2:08 PM, David Woodhouse wrote:


 On Wed, 2008-01-09 at 11:34 -0500, John Watlington wrote:

 Right now we have a problem with mesh portal discovery.

 The DHCP procedure currently being used only discovers
 the nearest mesh portal when it is first run (DHCP_DISCOVER),
 not when it tries to renew (DHCP_REQUEST).   Furthermore,
 as the address previously assigned indicates which mesh portal
 was selected, it seems like we should always be discovering, not
 renewing...

 Legacy IP doesn't work well and doesn't really give us what we need in
 the long term... or even the medium term. We've known that all along.

 What do you propose to do about it? Throw away pointless engineering
 into cobbling together some way of making Legacy IP work a bit  
 better? I
 seriously hope not. Just switch off the Legacy IP, as we should have
 done months ago, and get on with making things work properly. Anything
 else is a distraction.

Unsolicited RAs for IPv6 mean that IPv6 isn't the panacea to this problem.
It's easy to discover the shortest way out of the mesh (nearest mesh
portal), but setting up the larger mesh network infrastructure means you
also need to provide a way to route packets back INTO the mesh through the
MPP nearest the destination laptop.

I have yet to see a good description of how to make IPv6 work right on a
mesh with multiple portals.  One would be welcome!

I have such a method for IPv4 defined, but due to an error in modifying the
DHCP client, it doesn't handle laptops moving around in the mesh once
they've chosen an MPP.   (BTW, the error was mine)

wad




Re: mesh portal discovery

2008-01-09 Thread David Woodhouse

On Wed, 2008-01-09 at 14:33 -0500, John Watlington wrote:
 Unsolicited RAs for IPv6 mean that IPv6 isn't the panacea to this
 problem. It's easy to discover the shortest way out of the mesh
 (nearest mesh portal), but setting up the larger mesh network
 infrastructure means you also need to provide a way to route packets
 back INTO the mesh through the MPP nearest the destination laptop.
 
 I have yet to see a good description of how to make IPv6 work right on
 a mesh with multiple portals.  One would be welcome!

I talked to cscott and Michail about this briefly when I was in Boston
in December. I suspect we should turn off the automatic response to RA
in the kernel, and handle it in userspace. We need some special handling
in userspace anyway, to pick up DNS server details from RA. We can also
check the mesh path length to the origin of each RA we see, and choose
the best one.
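
A minimal sketch of that selection step, assuming a userspace daemon has
already collected the RAs it has seen together with a mesh path length for
each one. The RouterAdvert type, its field names and the example addresses
are illustrative assumptions, not an existing API:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RouterAdvert:
    """One received Router Advertisement (all fields are illustrative)."""
    router: str       # link-local address of the advertising router
    prefix: str       # advertised prefix
    path_length: int  # mesh hops to the RA's origin (hypothetical metric)


def choose_best_ra(adverts: Iterable[RouterAdvert]) -> Optional[RouterAdvert]:
    """Return the RA whose origin is the fewest mesh hops away, or None."""
    return min(adverts, key=lambda ra: ra.path_length, default=None)


seen = [
    RouterAdvert("fe80::1", "2001:db8:1::/64", 4),
    RouterAdvert("fe80::2", "2001:db8:2::/64", 1),
]
best = choose_best_ra(seen)
```

On a tie, min() keeps the first RA it saw, so a real daemon would probably
want some stickiness towards the router it is already using.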

 I have such a method for IPv4 defined, but due to an error in
 modifying the DHCP client, it doesn't handle laptops moving around in
 the mesh once they've chosen an MPP.   (BTW, the error was mine)

Is there a hack which would work around that -- like reducing the lease
time?

-- 
dwmw2



Re: mesh portal discovery

2008-01-09 Thread Michail Bletsas
 What do you propose to do about it? Throw away pointless engineering
 into cobbling together some way of making Legacy IP work a bit better? I
 seriously hope not. Just switch off the Legacy IP, as we should have
 done months ago, and get on with making things work properly. Anything
 else is a distraction.
 
You definitely live in a universe different from mine.
Regardless of how much we try to make the XO talk only to other XOs at 
the p2p application level, there is this small thingy out there called the 
web which is going to require Legacy IP for the foreseeable future...

M.


Re: mesh portal discovery

2008-01-09 Thread John Watlington

On Jan 9, 2008, at 2:40 PM, David Woodhouse wrote:


 On Wed, 2008-01-09 at 14:33 -0500, John Watlington wrote:
 Unsolicited RAs for IPv6 mean that IPv6 isn't the panacea to this
 problem. It's easy to discover the shortest way out of the mesh
 (nearest mesh portal), but setting up the larger mesh network
 infrastructure means you also need to provide a way to route packets
 back INTO the mesh through the MPP nearest the destination laptop.

 I have yet to see a good description of how to make IPv6 work right on
 a mesh with multiple portals.  One would be welcome!

 I talked to cscott and Michail about this briefly when I was in Boston
 in December. I suspect we should turn off the automatic response to RA
 in the kernel, and handle it in userspace. We need some special  
 handling
 in userspace anyway, to pick up DNS server details from RA. We can  
 also
 check the mesh path length to the origin of each RA we see, and choose
 the best one.

Sounds like a plan.  Sometime next week we should start outlining the
work that needs to happen.

 I have such a method for IPv4 defined, but due to an error in
 modifying the DHCP client, it doesn't handle laptops moving around in
 the mesh once they've chosen an MPP.   (BTW, the error was mine)

 Is there a hack which would work around that -- like reducing the  
 lease
 time?

Heh. JG wants to increase the lease time --- I want to reduce it.
It doesn't really make a difference, as once the lease time has expired
dhclient first tries to request the existing lease from the previous DHCP
server.  As long as it can communicate with it by hopping through the mesh,
it will renew the existing lease and never discover a closer MPP/DHCP
server.  This was the problem that prompted my original message on this
thread.
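
To make the renew-first behaviour concrete, here is a toy model of it. It
is not dhclient code; the function and the server names are made up for
illustration. It only shows why shortening the lease does not help while
the old server stays reachable:

```python
def next_dhcp_server(current_server, reachable, nearest):
    """Toy model of which server the client ends up using.

    current_server: server that issued the existing lease, or None
    reachable:      set of servers reachable over the mesh (any hop count)
    nearest:        server a fresh DHCP_DISCOVER would find
    """
    if current_server is not None and current_server in reachable:
        # The renewal (DHCP_REQUEST) goes to the old server and succeeds,
        # so the nearest server is never consulted.
        return current_server
    # Only a failed renewal (or no lease at all) leads back to discovery.
    return nearest


# The laptop has moved next to mpp-b, but mpp-a still answers over the mesh.
server = next_dhcp_server("mpp-a", {"mpp-a", "mpp-b"}, "mpp-b")
```

However short the lease, the outcome is the same as long as the first
branch is taken.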

wad






Re: mesh portal discovery

2008-01-09 Thread Michail Bletsas
David Woodhouse [EMAIL PROTECTED] wrote on 01/09/2008 02:40:46 PM:


 We can also
 check the mesh path length to the origin of each RA we see, and choose
 the best one.
 

The way this was originally implemented, in a way that can be used for any 
well-defined service (not just network gateways), was to assign an anycast 
MAC address to each such service.

So when a node is looking to see if there is another node providing such a 
service in the mesh, all that it has to do is a path discovery for the MAC 
address corresponding to that service. If the path discovery is 
successful, both the presence of the service and the optimal path 
to it have been discovered.

In the case of the mesh portal (A NAT Internet Gateway in our case) we 
need to get back the IP address of the gateway as well as DNS info.
A simple python server listening at a predefined port was providing that. 
That simple server has been replaced by a complete DHCP server in our 
current implementation.
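
For readers who never saw that server, a rough sketch of what such a
responder could look like follows. This is not the original code: the JSON
wire format, the "MPP?" query string and all function names are
assumptions made up for the example.

```python
import json
import socket


def build_reply(gateway_ip, dns_servers):
    """Encode the gateway and DNS info as a small JSON datagram."""
    reply = {"gateway": gateway_ip, "dns": list(dns_servers)}
    return json.dumps(reply).encode("utf-8")


def parse_reply(payload):
    """Decode a reply datagram into (gateway_ip, dns_servers)."""
    info = json.loads(payload.decode("utf-8"))
    return info["gateway"], info["dns"]


def serve_once(sock, gateway_ip, dns_servers):
    """Answer a single query on an already-bound UDP socket."""
    _query, client = sock.recvfrom(512)
    sock.sendto(build_reply(gateway_ip, dns_servers), client)


def query(server_addr, timeout=2.0):
    """Ask the portal at server_addr (host, port) for its gateway/DNS info."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(b"MPP?", server_addr)
        payload, _ = sock.recvfrom(512)
    return parse_reply(payload)
```

In the scheme described above the client would address the query to the IP
that maps to the anycast MAC, so the mesh path discovery itself picks the
nearest portal; that part lives below this code and is not shown.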


M.



Re: mesh portal discovery

2008-01-09 Thread John Watlington

On Jan 9, 2008, at 2:55 PM, Michail Bletsas wrote:

 David Woodhouse [EMAIL PROTECTED] wrote on 01/09/2008 02:40:46 PM:


 We can also
 check the mesh path length to the origin of each RA we see, and  
 choose
 the best one.


 The way this was originally implemented in a way that can be used  
 for any
 well defined service (not just network gateways),
 was to assign an anycast MAC address to such well defined services.

 So when a node is looking to see if there is another node providing  
 such a
 service in the mesh, all that it has to do is a path discovery for  
 the MAC
 address corresponding to that service. If the path discovery is
 successful, both the presence of the service as well as the optimal  
 path
 to it has been discovered.

 In the case of the mesh portal (A NAT Internet Gateway in our case) we
 need to get back the IP address of the gateway as well as DNS info.
 A simple python server listening at a predefined port was providing  
 that.
 That simple server has been replaced by a complete DHCP server in our
 current implementation.


The DHCP server was needed anyway.   And to implement shortest path
routing both for sent and received packets, we needed a mechanism for
receiving an IP address that reflected the nearest MPP anyway (or use NAT,
something we would like to avoid inside the school).

wad



Re: mesh portal discovery

2008-01-09 Thread David Woodhouse

On Wed, 2008-01-09 at 14:43 -0500, Michail Bletsas wrote:
  What do you propose to do about it? Throw away pointless engineering
  into cobbling together some way of making Legacy IP work a bit better? I
  seriously hope not. Just switch off the Legacy IP, as we should have
  done months ago, and get on with making things work properly. Anything
  else is a distraction.
  
 You definitely live in a universe different from mine.
 Regardless of how much we try to make  the XO to only talk to other XOs at 
 the p2p application level, there is this small thingy out there called the 
 web which is going to require Legacy IP for the foreseeable future...

NAT-PT and proxying should solve that problem relatively simply. I
should investigate the implementation at http://tomicki.net/naptd.php

-- 
dwmw2



Re: mesh portal discovery

2008-01-09 Thread Michail Bletsas
 
 
 The DHCP server was needed anyway.   And to implement shortest path
 routing both for sent and received packets, we needed a mechanism for
 receiving an IP address that reflected the nearest MPP anyway (or use 
 NAT, something we would like to avoid inside the school)
 

I completely fail to see why we need the DHCP server to get the IP address 
of the nearest MPP or get the optimal path to and from it.

As for the multiple radio scenario at the schools, you can address that 
in two different ways: bridging between the interfaces on the server or, 
if you want something quicker, using a different autoIP range for each 
channel.

M.




Re: mesh portal discovery

2008-01-09 Thread Michail Bletsas
David Woodhouse [EMAIL PROTECTED] wrote on 01/09/2008 02:57:50 PM:


 
 NAT-PT and proxying should solve that problem relatively simply. I
 should investigate the implementation at http://tomicki.net/naptd.php
 
Running application proxies on every XO that wants to act as a mesh 
portal?

M.



Re: mesh portal discovery

2008-01-09 Thread Michail Bletsas
David Woodhouse [EMAIL PROTECTED] wrote on 01/09/2008 03:17:30 PM:

 
 On Wed, 2008-01-09 at 15:15 -0500, Michail Bletsas wrote:
  Running application proxies on every XO that wants to act as a mesh 
  portal?
 
 Running NAT-PT. Since they're required to run NAT as it is anyway, that
 shouldn't be too much of a problem. And this 'mesh portal' mode isn't
 something we really have working anyway. 
 
It does require application level proxies to look inside packets (FTP and 
DNS are examples, per its documentation).

And we had the mesh portal thing working perfectly fine until we decided 
to add DHCP to the picture.

M.


Re: mesh portal discovery

2008-01-09 Thread david
On Wed, 9 Jan 2008, John Watlington wrote:

 In the case of the mesh portal (A NAT Internet Gateway in our case) we
 need to get back the IP address of the gateway as well as DNS info.
 A simple python server listening at a predefined port was providing
 that.
 That simple server has been replaced by a complete DHCP server in our
 current implementation.


 The DHCP server was needed anyway.   And to implement shortest path
 routing both for sent and received packets, we needed a mechanism for
 receiving an IP address that reflected the nearest MPP anyway (or use
 NAT,
 something we would like to avoid inside the school)

you really don't want to have your IP address change because you moved and 
a different MPP is now closer to you.

David Lang


Re: mesh portal discovery

2008-01-09 Thread John Watlington

 I completely fail to see why we need the DHCP server to get the IP  
 address
 of the nearest MPP or get the optimal path to and from it.

The MPP discovery mechanism originally proposed worked great for getting
packets out of the mesh through the shortest path.   The problem was that
outside of running NAT on each MPP, there wasn't a good way to ensure
that packets sent to that laptop entered the mesh through the same MPP.

That is currently handled (in IPv4) by using a different DHCP range for
each MPP, and routing to the appropriate MPP based on those ranges.
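
As an illustration of the range-based routing, the lookup amounts to a
prefix match on the laptop's address. The address plan and MPP names below
are invented for the example; the real ranges are whatever the school
server is configured with:

```python
import ipaddress

# Hypothetical address plan: one DHCP range per MPP.
MPP_RANGES = {
    "mpp-1": ipaddress.ip_network("172.18.0.0/23"),
    "mpp-2": ipaddress.ip_network("172.18.2.0/23"),
    "mpp-3": ipaddress.ip_network("172.18.4.0/23"),
}


def mpp_for(address):
    """Return the MPP whose DHCP range contains address, or None."""
    addr = ipaddress.ip_address(address)
    for name, network in MPP_RANGES.items():
        if addr in network:
            return name
    return None


# A packet for this laptop should enter the mesh through mpp-2.
entry_point = mpp_for("172.18.3.17")
```

This also shows the cost of the scheme: the address encodes the MPP, so a
laptop that moves closer to another MPP needs a new address to get the
shorter inbound path.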

 As for the multiple radio scenario at the schools, you can address  
 that
 with two different ways: bridging  between the interfaces on the  
 server or
 -if you want something quicker-, use a different autoIP range for each
 channel.


We briefly discussed using a different autoIP range, and decided it was
difficult to implement.

wad




Re: mesh portal discovery

2008-01-09 Thread Michail Bletsas
 
 The MPP discovery mechanism originally proposed worked great for getting
 packets out of the mesh through the shortest path.   The problem was 
 that
 outside of running NAT on each MPP, there wasn't a good way to ensure
 that packets sent to that laptop entered the mesh through the same MPP.
The only reason that I can see to use DHCP is if you want to distribute 
routable IPv4 addresses, something that would be glorious if it could 
happen but which I don't see happening very often.

If you are not running NAT on the MPPs and you have multiple MPPs per mesh 
and the external routing protocol decided that packets should return 
through a different portal, how much do you think you are gaining by 
using the same path inside the mesh (which b.t.w. is different in each 
direction anyway!)?






 
 
 We briefly discussed using a different autoIP range, and decided it was
 difficult to implement.
 
Again, I fail to see why: it may be non-standard, but it is definitely not 
difficult to implement.

M.




Re: mesh portal discovery

2008-01-09 Thread david
On Wed, 9 Jan 2008, John Watlington wrote:

 I completely fail to see why we need the DHCP server to get the IP
 address
 of the nearest MPP or get the optimal path to and from it.

 The MPP discovery mechanism originally proposed worked great for getting
 packets out of the mesh through the shortest path.   The problem was
 that
 outside of running NAT on each MPP, there wasn't a good way to ensure
 that packets sent to that laptop entered the mesh through the same MPP.

 That is currently handled (in IPv4) by using a different DHCP range
 for each
 MPP, and routing to the appropriate MPP based on those ranges.

this sounds like the mobile IP problem.

could you do something along the lines of having the MPP boxes within a 
mesh talk to each other (either over the mesh or over the Internet) so 
that they know which boxes are closest to each one?

then have the mesh route the traffic out to the nearest MPP.

response traffic would go to the MPP that allocated the IP address, and 
that box then tunnels the packet over to the MPP box closest to the laptop 
(similar to how LVS does load balancing), and that box then sends it over 
the radio.

David Lang


Re: mesh portal discovery

2008-01-09 Thread Dan Williams
On Wed, 2008-01-09 at 15:59 -0500, John Watlington wrote:
 On Jan 9, 2008, at 3:47 PM, Michail Bletsas wrote:
 
  The MPP discovery mechanism originally proposed worked great for  
  getting
  packets out of the mesh through the shortest path.   The problem was
  that outside of running NAT on each MPP, there wasn't a good way  
  to ensure
  that packets sent to that laptop entered the mesh through the same  
  MPP.
 
  The only reason that I can see to use DHCP is if you want to  
  distribute
  routable IPv4 addresses, something that would be glorious if it could
  happen but  which I don't see happening very often.
 
 Neither do I.  But I don't want to impose a NAT between two laptops  
 in the
 same school.   It will break P2P applications.
 
  If you are not running NAT on the MPPs and you have multiple MPPs  
  per mesh
  and the external routing protocol decided that packets should return
  through a different portal, what much do you think you are gaining by
  using the same path inside the mesh (which b.t.w. is different in each
  direction anyway!)?
 
 I don't care about using the same path, but sending packets for six  
 hops
 through the mesh when proper routing can reduce it to a single hop seems
 like piss-poor design.  And it makes the mesh interfaces on a single  
 server
 serve the entire school.  Why bother with multiple MPPs at all ?
 
  We briefly discussed using a different autoIP range, and decided  
  it was
  difficult to implement.
 
  Again fail to see why - it can be non-standard but definitely not
  difficult to implement.
 
 IIRC, Dan Williams was the person looking into it.  It wasn't a  
 Network Manager
 change, it was a change to Avahi, and would either have to be pushed  
 upstream
 or maintained indefinitely by us.  Plus, AutoIP addresses aren't EVER  
 supposed
 to be routed --- they are strictly link local due to the assignment  
 process.
 
 Thanks for the discussion --- we need to figure out a solution for
 IPv6 going forward, as absolutely none of the current approaches will
 extend to IPv6.

- DHCP did what we needed back then, namely
1) a robust discovery mechanism
2) well-tested backoff mechanisms
3) well-known and standardized behavior and packet format
4) well-tested and security audited server and client

In the School Server case, using DHCP allowed us to collapse two
steps of the connection process into one.  With the previous method, you
would have to _both_ find the MPP using the non-standard MPP discovery
method, and then do a DHCP run to get your address from the school
server.  Using DHCP here _already_ can provide the address of your
gateway.

You could conceivably do both these operations in parallel but since you
have to do DHCP anyway, it's pointless to do some other MPP discovery
mechanism.  In a school setting autoip might work, but might mean more
traffic because of potential address conflicts and the resolution
process.  So if you want dynamic addressing in the school, DHCP is about
the only easy way to do that, and once you're using DHCP the old MPP
discovery mechanism is pointless.

The above benefit does not apply in the XO-as-MPP case because autoip
addressing is used, however the same codepath is used in NetworkManager
as the school server case, and therefore there is less code to maintain,
and fewer codepaths to test, and fewer opportunities for stuff to go
wrong.

The only real solution to this problem that doesn't suck is to use IPv6
auto addressing for everything.

Dan




Re: mesh portal discovery

2008-01-09 Thread Javier Cardona
John,

 The DHCP procedure currently being used only discovers
 the nearest mesh portal when it is first run (DHCP_DISCOVER),
 not when it tries to renew (DHCP_REQUEST).   Furthermore,
 as the address previously assigned indicates which mesh portal
 was selected, it seems like we should always be discovering, not
 renewing...

You probably don't want that:  a mesh point might have equal cost
routes to several mesh portals.  In that case you want some
hysteresis:  only change to a new MPP if it offers a big advantage
over the current one.

 As long as it can communicate with it by hopping through the mesh, it
 will renew the existing lease and never discover a closer MPP/DHCP server
 This was the problem that prompted my original message on this thread.

One way to do this would be to run a simple daemon that

  1. Periodically sends traffic to the anycast address.  If you want
to use dhclient for this ( assuming it is patched as described here:
http://www.cozybit.com/projects/mpp-utils/index.html#update ) you
could send frames to the anycast address like this:

  # dhclient eth0 -1 -lf /dev/null -sf /bin/true

  2. Compares the metric of the best mpp with the current mpp.  This
can be done via iwpriv fwt_list calls.

  3. If the cost difference justifies it, wipes out the existing leases
and re-discovers

  # rm /var/lib/dhcp3/* ; dhclient eth0
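
The decision in steps 2 and 3 could be sketched as below. The cost values
are assumed to come from the mesh forwarding table (for instance, parsed
from the fwt_list output); the parsing is not shown, and the function name
and the min_improvement threshold are illustrative assumptions:

```python
def pick_mpp(current, costs, min_improvement=2):
    """Decide which MPP to use, with hysteresis.

    current:         name of the MPP in use, or None
    costs:           dict mapping MPP name -> path cost (lower is better)
    min_improvement: how much better a new MPP must be before we switch
    """
    if not costs:
        return current                 # nothing heard, keep what we have
    best = min(costs, key=costs.get)
    if current not in costs:
        return best                    # current MPP is no longer reachable
    if costs[current] - costs[best] >= min_improvement:
        return best                    # big enough advantage: re-discover
    return current                     # equal or marginal: stay put


# Equal-cost routes to two portals: no flapping, we stay on "a".
choice = pick_mpp("a", {"a": 3, "b": 3})
```

Only when pick_mpp returns something other than the current MPP would the
daemon go on to wipe the leases and re-run dhclient as in step 3.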

Cheers,

Javier

-- 
Javier Cardona
cozybit Inc.


Re: mesh portal discovery

2008-01-09 Thread david
On Wed, 9 Jan 2008, Javier Cardona wrote:

 The DHCP procedure currently being used only discovers
 the nearest mesh portal when it is first run (DHCP_DISCOVER),
 not when it tries to renew (DHCP_REQUEST).   Furthermore,
 as the address previously assigned indicates which mesh portal
 was selected, it seems like we should always be discovering, not
 renewing...

 You probably don't want that:  a mesh point might have equal cost
 routes to several mesh portals.  In that case you want some
 hysteresis:  only change to a new MPP if it offers a big advantage
 over the current one.

 As long as it can communicate with it by hopping through the mesh, it
 will renew the existing lease and never discover a closer MPP/DHCP server
 This was the problem that prompted my original message on this thread.

 One way to do this would be to run a simple daemon that

  1. Periodically sends traffic to the anycast address.  If you want
 to use dhclient for this ( assuming it is patched as described here:
 http://www.cozybit.com/projects/mpp-utils/index.html#update ) you
 could send frames to the anycast address like this:

  # dhclient eth0 -1 -lf /dev/null -sf /bin/true

  2. Compare the metric of the best mpp with the current mpp.  This
 can be done via iwpriv fwt_list calls.

  3. If the cost difference justifies it, wipe out the existing leases
 and re-discover

  # rm /var/lib/dhcp3/* ; dhclient eth0

you really don't want to change the IP of the laptop any more than you 
absolutely must, it's too likely to disrupt existing connections.

as I understand it the mesh is (close to) continuously reconfiguring 
itself to find the most efficient path across it.

is the resulting information available to all of the MPP nodes?

if it is you should be able to do something like the following.

1. on initial connection use the existing process to make a 'best guess' 
to find a DHCP server and get an IP address.

2. outbound packets use this IP address no matter which MPP the packets go 
through.

3. inbound packets go to the MPP that initially gave out the IP address.

3a. if that MPP determines that it is still the closest MPP to the end 
node, it sends the packet out normally.

3b. if the packet arrives at the MPP over a tunnel from another MPP, don't 
check the routing, just send it out over the mesh (avoids routing loops)

3c. if the MPP determines that another MPP is significantly closer to the 
end node, it tunnels the packet over to the closer MPP, which then sends 
it over the mesh to the end node.
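
a rough sketch of the decision in steps 3a-3c, as it might run on each MPP. 
the function name, the cost inputs and the margin that stands in for 
"significantly closer" are all assumptions for illustration, and how the 
MPPs learn each other's costs is left open:

```python
def forward_inbound(arrived_via_tunnel, local_cost, peer_costs, margin=2):
    """Decide what this MPP does with a packet for an end node.

    arrived_via_tunnel: True if another MPP tunnelled the packet to us
    local_cost:         our mesh path cost to the end node
    peer_costs:         dict mapping other MPP -> its cost to the end node
    margin:             how much closer a peer must be to count

    Returns ("mesh", None) or ("tunnel", peer_name).
    """
    if arrived_via_tunnel:
        # 3b: never re-check routing for tunnelled packets (no loops).
        return ("mesh", None)
    if peer_costs:
        best_peer = min(peer_costs, key=peer_costs.get)
        if local_cost - peer_costs[best_peer] >= margin:
            # 3c: another MPP is significantly closer, hand it over.
            return ("tunnel", best_peer)
    # 3a: we are still (about) the closest MPP, send it out normally.
    return ("mesh", None)


decision = forward_inbound(False, 6, {"mpp-b": 1, "mpp-c": 4})
```

because a tunnelled packet is always sent straight to the mesh, a packet 
crosses at most one tunnel even if the MPPs disagree about who is closest.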

I think that step 3 can be tested without extensive code changes by using 
hooks in iptables. Iptables has the ability to call out to userspace code 
as part of its processing decision. If that userspace code reports that 
the end-node is closest to this MPP then it routes the packet normally; if 
it thinks that another MPP is closer, it returns something to indicate 
which remote node to use, and then the packet gets routed through a tunnel 
to that node (a simple GRE tunnel will do, we just need to encapsulate 
the packet).

This approach requires that all of the MPP boxes know which one of them is 
closest to each end-node. If the current mesh structure does not provide 
this info to all nodes then an additional daemon would need to share this 
info (possibly over the same tunnels that are used to relay the traffic)

I will say up front that I haven't done the iptables-userspace hooking in 
any of my projects, but this should be an easy way to prototype this 
before adding this type of routing to the kernel.

This approach is safe, the worst case is that inbound packets take a 
longer path than optimal to get to the node (either they don't get 
re-routed when they should or they get re-routed when they shouldn't, 
either way they take more hops over the radio than necessary). By not 
changing the IP address of the node it avoids breaking existing 
connections at the cost of an additional hop over wired networks.


Potential problems

if you are doing NAT on the MPP then this approach won't work (because the 
outbound packets don't all go through the same MPP)

if the different MPP boxes are on different Internet connections and there 
is egress filtering outside the MPP boxes, that filtering would need to 
allow the mesh IP's out through all MPP boxes.

David Lang



Re: mesh portal discovery

2008-01-09 Thread Mikus Grinbergs
 Just switch off the Legacy IP, as we should have
 done months ago, and get on with making things work properly.
 Anything else is a distraction.

I sympathize with how overworked OLPC developers are.  But a number 
of G1G1 systems are getting into the hands of articulate net-aware 
people.  If they become disenchanted by the Legacy IP performance of 
the OLPC, what they say might result in hurting the whole project.



 I completely fail to see why we need the DHCP server  ...

I don't have wireless at home.  First tried stopping NetManager, and 
manually setting the IP address.  That worked, but screwed something 
up so I could not use the system in a cafe.  So I restored the OLPC, 
and added a DHCP server in my home, __just__ for the OLPC.  Effort 
that I had not anticipated needing to expend.

mikus



Re: mesh portal discovery

2008-01-09 Thread John Watlington

On Jan 9, 2008, at 6:32 PM, [EMAIL PROTECTED] wrote:

 On Wed, 9 Jan 2008, John Watlington wrote:

 Sounds like you are volunteering to write that code.
 We need it by early next week.

 Yes, mobile IPv6 is one of our long-term solution possibilities.

 two things.

 1. I didn't realize that participation wasn't desired except by  
 people volunterring to write the code (the discussion sounded like  
 people were trying to figure out what to do)

 2. while I referred to mobile IP, the solution that I then outlined  
 is not that complete.

 do you want me to shut up until I have completed code to present?  
 or do you want people discussing options and trying to figure out a  
 strategy that will work?

Sorry about the snap, but you have to realize that I am currently the
developer and QA department for the school server, and it isn't even my
current job.   And I really do need the answer implemented over the next
week...

Some handwaving about the best way to do it from the peanut gallery is
fine, but don't expect to be listened to unless you can point to existing
software or are willing to help implement the solution.   We don't have
infinite (or even sufficient) resources, nor the time for development and
testing of complex solutions.

I don't understand your dislike of changing IP addresses.  My current
laptop does it multiple times a day just fine, and it should only happen
when a student is moving from one place to another (as long as Javier's
comments about avoiding flapping are heeded).   We have a presence service
which provides a way for P2P applications to find one another, even after
the IP changes.

wad



Re: mesh portal discovery

2008-01-09 Thread John Watlington

On Jan 9, 2008, at 6:50 PM, Mikus Grinbergs wrote:

 Just switch off the Legacy IP, as we should have
 done months ago, and get on with making things work properly.
 Anything else is a distraction.

 I sympathize with how overworked OLPC developers are.  But a number
 of G1G1 systems are getting into the hands of articulate net-aware
 people.  If they become disenchanted by the Legacy IP performance of
 the OLPC, what they say might result in hurting the whole project.

You misunderstood our local IPv6 evangelist: he wasn't proposing to disable
IPv4 on the laptop, just not to support it on the school server mesh.  Given
that all mesh-capable devices will support IPv6, he's probably got a point.


Here is my take-home summary of this thread:

Short term solution is to turn off IPv6 on the mesh, and tell kids that if
their network performance degrades, they should click on the circle again,
which will trigger an IPv4 DHCP discovery of the nearest MPP.

Long term solution is probably to move to IPv6 only, using a user space
agent to decide which RAs to listen to.  This user space agent can implement
Javier's suggestion to avoid flapping between MPPs.   Mobile IPv6 would
be frosting on the cake, but doesn't help with the primary problem of MPP
selection.
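
To make the anti-flapping idea concrete, here is a rough sketch of how such
an agent might choose between MPPs. It is only an illustration under stated
assumptions: the hop-count input, the switch_margin threshold and the
MppSelector name are all hypothetical, and the real work of listening to and
filtering RAs is not shown. The point is just that the agent sticks with its
current MPP unless a rival is better by a clear margin.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MppSelector:
    """Pick an MPP by hop count, switching only when the gain is clear."""

    switch_margin: int = 2          # hypothetical: hops a rival must save
    current: Optional[str] = None   # MPP we are currently using, if any

    def update(self, hops: Dict[str, int]) -> Optional[str]:
        """hops maps an MPP identifier to its current hop count."""
        if not hops:
            # Nothing reachable: forget the current choice.
            self.current = None
            return None
        # Nearest MPP; ties go to the alphabetically first name.
        best = min(sorted(hops), key=lambda mpp: hops[mpp])
        if self.current not in hops:
            # No usable current MPP: take the nearest one.
            self.current = best
        elif hops[self.current] - hops[best] >= self.switch_margin:
            # The rival saves at least switch_margin hops: switch.
            self.current = best
        return self.current
```

With a margin of 2, a laptop sitting roughly halfway between two MPPs stays
where it is while the hop counts wobble by one, and only moves once the other
MPP is clearly closer.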

Thanks,
wad



Re: mesh portal discovery

2008-01-09 Thread david
On Wed, 9 Jan 2008, John Watlington wrote:

 On Jan 9, 2008, at 6:50 PM, Mikus Grinbergs wrote:

 Just switch off the Legacy IP, as we should have
 done months ago, and get on with making things work properly.
 Anything else is a distraction.

 I sympathize with how overworked OLPC developers are.  But a number
 of G1G1 systems are getting into the hands of articulate net-aware
 people.  If they become disenchanted by the Legacy IP performance of
 the OLPC, what they say might result in hurting the whole project.

 You misunderstood our local IPv6 evangelist, he wasn't proposing to
 disable
 IPv4 on the laptop, just not to support it on the school server
 mesh.  Given that
 all mesh capable devices will support IPv6, he's probably got a point.


 Here is my take-home summary of this thread:

 Short term solution is to turn off IPv6 on the mesh, and tell kids
 that if their
 network performance degrades, they should click on the circle again
 which will trigger an IPv4 DHCP discovery of the nearest MPP.

 Long term solution is probably to move to IPv6 only, using a user space
 agent to decide which RAs to listen to.  This user space agent can
 implement
 Javier's suggestion to avoid flapping between MPPs.   Mobile IPv6 would
 be frosting on the cake, but doesn't help with the primary problem of
 MPP
 selection.

I'm trying to make sure I fully understand the problem

it sounds as if you have a good mechanism in the mesh for the laptops to 
send packets to the nearest MPP

the problem is that if they get an IP address from a MPP that is a long 
way away (either initially due to a problem or over time as the laptop 
moves more hops away from the MPP) the fact that reply packets will always 
go to the MPP that gave out the IP address (due to normal IP routing) 
results in a slow reply as these packets start taking longer to get from 
the MPP to the laptop.

is this correct so far?

this problem is further complicated by the IPv6 equivalent of DHCP making 
it more likely that the initial 'registration' with a MPP is less optimal.

and to top things off, since the replies are typically larger than the 
requests (which is why people live with DSL that is only 512Kb outbound, 
but is 1.5Mb inbound) the additional delays on the inbound leg are 
significantly worse.


I am making the assumption that the MPP machines know the wireless 
topology from each of their points of view, i.e. not only do they know 
how to get to the wireless nodes from themselves (and how many hops away 
they are), but they also know this information for each of the other MPP 
nodes. If this assumption is not true currently, a daemon would need to be 
run to keep the MPP boxes in agreement over who is the best gateway to the 
laptop.
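
as a rough sketch of what that agreement could look like (an assumption on 
my part, not existing code: the hop tables, the names and the tie-break rule 
are all hypothetical, and how the tables get exchanged between MPP boxes is 
not shown), each box could run the same deterministic selection over the 
shared hop counts:

```python
from typing import Dict

# Hypothetical input: each MPP reports its hop count to every laptop it can
# reach, as {mpp_name: {laptop_name: hops}}.
HopTable = Dict[str, Dict[str, int]]


def best_gateways(tables: HopTable) -> Dict[str, str]:
    """Return {laptop: mpp}, choosing the MPP with the fewest hops.

    MPPs are visited in name order and only a strictly smaller hop count
    replaces an earlier choice, so ties go to the alphabetically first MPP.
    Every box running this over the same tables gets the same answer.
    """
    best: Dict[str, str] = {}
    for mpp in sorted(tables):
        for laptop, hops in tables[mpp].items():
            chosen = best.get(laptop)
            if chosen is None or hops < tables[chosen][laptop]:
                best[laptop] = mpp
    return best
```

the deterministic tie-break matters more than the exact metric: if two boxes 
disagree about who owns a laptop, inbound packets could be bounced between 
them.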



If I am on track so far let me see if I can divide the resulting problem 
into three cases


1. the MPP boxes involved are 'owned' by separate entities and may not know 
about each other over the wired network.

2. the MPP boxes involved are associated with each other and are all 
managed as part of the same set (with egress filters configured so that 
outbound traffic could come from any of the MPP boxes), but may be two or 
more network hops away from each other

3. the MPP boxes involved with the mesh network are tightly coupled (all 
connected with a high-speed wire network on the same subnet, with no routing 
between them, all on the same broadcast domain)


addressing these one at a time.

for #1 I can't think of any reasonable way to move a machine from talking 
to one MPP to another short of true mobile IP solutions.

for #2 the basic approach is the same as LVS uses in tunneling mode; see 
http://www.linuxvirtualserver.org/VS-IPTunneling.html for a diagram and 
explanation

   This is basically what I was suggesting earlier: don't worry about the 
outbound traffic, just bounce the inbound traffic to the closest node (via 
a tunnel) before sending it over the air. this could be a matter of 
using the existing LVS code and replacing the server selection logic with 
something that is aware of the wireless topology.

to avoid a routing loop where the packet gets bounced back and forth 
between MPP boxes, you should be able to set things up so that the load 
balancing is only done on packets coming in from the outside (I don't know 
if iptables can do this stock, but it should be a simple, if ugly, hack to 
make packets arriving through a tunnel bypass the LVS code and get 
inserted just past it in the IP stack)

the worst case with this model should be that some inbound packets get 
relayed to the wrong MPP and make more hops than they need to over the 
air.
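
the decision each MPP would make for an inbound packet can be sketched 
roughly as below. this is only an illustration of the logic, not LVS or 
iptables code: the function name, the arrived_via_tunnel flag and the 
gateways mapping (laptop -> nearest MPP) are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def next_step(
    dst_laptop: str,
    arrived_via_tunnel: bool,
    local_mpp: str,
    gateways: Dict[str, str],
) -> Tuple[str, Optional[str]]:
    """Decide what an MPP does with a packet addressed to a laptop.

    Returns ("air", None) to transmit over the mesh from this MPP, or
    ("tunnel", mpp) to relay the packet to another MPP over the wire.
    """
    if arrived_via_tunnel:
        # Already relayed once by another MPP: never relay again, which is
        # what prevents packets bouncing back and forth between MPP boxes.
        return ("air", None)
    # Packet came from the outside: look up the MPP believed to be nearest,
    # defaulting to ourselves when the laptop is unknown.
    target = gateways.get(dst_laptop, local_mpp)
    if target == local_mpp:
        return ("air", None)
    return ("tunnel", target)
```

because a tunnelled packet always goes straight to the air, a stale or wrong 
mapping costs at most one extra wired hop plus some extra wireless hops, 
never a loop.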

for #3 I am looking at other server load balancing options, specifically 
the CLUSTERIP target available in iptables 
http://flaviostechnotalk.com/wordpress/index.php/2005/06/12/loadbalancer-less-clusters-on-linux

what this does is to define an IP address that exists on all machines and 
uses a multicast MAC address, this forces the switch to send the packet to