Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-17 Thread Giannis Galanis
Simon,

I think the email I sent you was incomplete; my connection was poor and Gmail
must have saved the wrong draft. But 1-2-3 is what I intended to send you.

I also meant to ask: how many times do you try _init_connection before you
assume the connection is down?


I hope so. I have a tarball with the patch, but I'm still waiting for
 Update.1 approval (it's unclear whether I can build RPMs for Joyride
 before I get Update.1 approval or not). If you're at 1CC, could you please
 annoy the ApprovalForUpdate people in person until they either look at
 their
 bugs, or confirm whether I'm still allowed to build RPMs in Koji?


I can definitely try to arrange this. But, can you please send me the
tarball to test it in the mean time?

 2. We need to be able to restart PS. As you say this is not possible, but
 if
  we restart sugar will PS restart as well?

 Yes, that's right (the D-Bus session bus will exit, which causes
 D-Bus services like PS to exit too unless they've specifically asked not
 to).

 I see you assigned the bug about need to be able to cope with PS
 restarts to
 yourself. Unless you're planning to implement the necessary Python code
 in sugar.presence yourself, please don't.

I don't think it's feasible to implement correct handling of PS restarts in
 sugar.presence for Update.1, so unless the release engineering team
 specifically tell me to, I won't be addressing that bug until a later
 release.


OK, I will reassign the bug to presenceservice. As long as restarting Sugar
works, we can stick to that for now.


 3. We need to force gabble to run. We have several instances of 4193
 (almost
  all XOs connected to schoolserver,AP are running salut). Or at least to
  force trying to connect to jabber server.

 Please see my comments on #4193 regarding steps to take to debug (I think
 it's #4193 I commented on - I can't remember bug numbers, and Trac is
 down at the moment).

 In summary:

 * try resolving the server with getent hosts jabber.laptop.org
 * try pinging it with ping jabber.laptop.org
 * try connecting via TCP with telnet jabber.laptop.org 5222
   (type hello and press Enter, if all goes well you should get
 disconnected
   with an error message that mentions XML not well formed)


The bug is indeed #4193. I have replied to your post, but as Trac is
down you probably haven't seen it.
I ran all three tests:

$ getent hosts jabber.laptop.org
 2001:4830:2446:ff00:201:6cff:fe07:68ec jabber.laptop.org   -- frequent reply
 18.85.46.41 jabber.laptop.org   -- rare reply

$ ping jabber.laptop.org
 PING jabber.laptop.org (18.85.46.41) 56(84) bytes of data.
 64 bytes from jabber.laptop.org (18.85.46.41): icmp_seq=1 ttl=63 time=67.4 ms
 ...

$telnet jabber.laptop.org 5222
 blabla... connected
hello
 replied with an xml packet with xml-not-well-formed included

So it seems that it is a PS issue. Perhaps it is not waiting long enough, or
doesn't make enough attempts when trying to connect. I have reassigned the bug
to presenceservice.


If any of these steps fail, Gabble won't be able to connect either, and
 there's nothing Gabble can do about it - talk to the Network Manager
 maintainer instead, since that's the component responsible for getting
 network connectivity and DNS on the XO.

 If you check the Gabble log you'll probably find that Gabble is trying
 to connect, but failing because either it can't resolve
 jabber.laptop.org in DNS, or it can't get a TCP connection there. That was
 my
 diagnosis of two of the cases you mentioned in your bug with 3 sets of
 logs
 (which may have been #4193?). In the third case it looked as though you
 hadn't
 waited long enough for the log to indicate success or failure.

  4. The process of trying to connect to the jabber server, is done by
  telepathy-gabble, or by the presence



What I meant here is: does the PS check whether the Jabber server is
accessible and then run telepathy-gabble, or is that one of the tasks of
telepathy-gabble (which, as I see, you replied to)?

 Depends what you mean. The Presence Service is responsible for choosing when
 to try to connect (at which time it calls the Connect() D-Bus method
 on Gabble), but it's Gabble that actually opens a TCP socket to the Jabber
 server and tries to talk to it. You can see this in the PS log, for
 instance:

 1194431620.966651 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at
 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): connecting...
 1194431620.967008 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at
 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): Connect()
 succeeded

 (note that Connect() succeeded is a bit misleading - it just means
 that the connection manager has said OK, I'll try, rather than that it
 has actually been able to connect.)

 In the telepathy-gabble log you'll then see something like this:

 ** (telepathy-gabble:25330): DEBUG: do_connect: calling lm_connection_open
 Going to connect to olpc.collabora.co.uk
 

Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-17 Thread Giannis Galanis
Yes, I have seen this ticket in the past. Detecting whether an XO is
actually there or not is a simple task, and I am currently working on a
simple script that will list the properly connected XOs along with the
temporarily disconnected ones.

Displaying this information in the neighbor view, as a dotted line or a
grey color perhaps, is a very useful idea. The problem is that bugs are
dealt with according to priority, and enhancements, although very practical,
can generally cause other bugs or take several builds until they work
properly.

Since we are in code freeze, a quick solution must be implemented to solve
the current situation, i.e. that it takes up to an hour for a disconnected XO
to disappear (just reported as #4735).

yani


On Nov 7, 2007 5:49 PM, Eben Eliason [EMAIL PROTECTED] wrote:

   1. We need to fix the timeout for icons to disappear. Can we try
 Guillaume's
   patch?
 
  I hope so. I have a tarball with the patch, but I'm still waiting for
  Update.1 approval (it's unclear whether I can build RPMs for Joyride
  before I get Update.1 approval or not). If you're at 1CC, could you
 please
  annoy the ApprovalForUpdate people in person until they either look at
 their
  bugs, or confirm whether I'm still allowed to build RPMs in Koji?

 Just a mention, since this thread is getting a lot of attention. There
 is an added visual element which should be in play here, according to
 the design.  There should be an intermediate state before XOs
 disappear from the view, as outlined in:

 http://dev.laptop.org/ticket/3657


   2. We need to be able to restart PS. As you say this is not possible,
 but if
   we restart sugar will PS restart as well?
 
  Yes, that's right (the D-Bus session bus will exit, which causes
  D-Bus services like PS to exit too unless they've specifically asked not
 to).
 
  I see you assigned the bug about need to be able to cope with PS
 restarts to
  yourself. Unless you're planning to implement the necessary Python code
  in sugar.presence yourself, please don't.
 
  I don't think it's feasible to implement correct handling of PS restarts
 in
  sugar.presence for Update.1, so unless the release engineering team
  specifically tell me to, I won't be addressing that bug until a later
  release.
 
   3. We need to force gabble to run. We have several instances of 4193
 (almost
   all XOs connected to schoolserver,AP are running salut). Or at least
 to
   force trying to connect to jabber server.
 
  Please see my comments on #4193 regarding steps to take to debug (I
 think
  it's #4193 I commented on - I can't remember bug numbers, and Trac is
  down at the moment).
 
  In summary:
 
  * try resolving the server with getent hosts jabber.laptop.org
  * try pinging it with ping jabber.laptop.org
  * try connecting via TCP with telnet jabber.laptop.org 5222
(type hello and press Enter, if all goes well you should get
 disconnected
with an error message that mentions XML not well formed)
 
  If any of these steps fail, Gabble won't be able to connect either, and
  there's nothing Gabble can do about it - talk to the Network Manager
  maintainer instead, since that's the component responsible for getting
  network connectivity and DNS on the XO.
 
  If you check the Gabble log you'll probably find that Gabble is trying
  to connect, but failing because either it can't resolve
  jabber.laptop.org in DNS, or it can't get a TCP connection there. That
 was my
  diagnosis of two of the cases you mentioned in your bug with 3 sets of
 logs
  (which may have been #4193?). In the third case it looked as though you
 hadn't
  waited long enough for the log to indicate success or failure.
 
   4. The process of trying to connect to the jabber server, is done by
   telepathy-gabble, or by the presence
 
  Depends what you mean. The Presence Service is responsible for choosing
 when
  to try to connect (at which time it calls the Connect() D-Bus method
  on Gabble), but it's Gabble that actually opens a TCP socket to the
 Jabber
  server and tries to talk to it. You can see this in the PS log, for
  instance:
 
  1194431620.966651 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at
  0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0):
 connecting...
  1194431620.967008 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at
  0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): Connect()
  succeeded
 
  (note that Connect() succeeded is a bit misleading - it just means
  that the connection manager has said OK, I'll try, rather than that it
  has actually been able to connect.)
 
  In the telepathy-gabble log you'll then see something like this:
 
  ** (telepathy-gabble:25330): DEBUG: do_connect: calling
 lm_connection_open
  Going to connect to olpc.collabora.co.uk
  Trying 195.10.223.134 port 5222...
  ** (telepathy-gabble:25330): DEBUG: tp_base_connection_change_status:
  was 4294967295, now 1, for reason 1
  ** 

Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-08 Thread Morgan Collett
Simon McVittie wrote:

 PS makes an unlimited number of connection attempts, with a short
 delay between each one (we should probably change this to use an
 exponential backoff process so the delays get longer as you're offline
 for longer, up to a maximum of perhaps 10 minutes).

#2522.
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-08 Thread Simon McVittie

On Wed, 07 Nov 2007 at 17:49:52 -0500, Eben Eliason wrote:
 Just a mention, since this thread is getting a lot of attention. There
 is an added visual element which should be in play here, according to
 the design.  There should be an intermediate state before XOs
 disappear from the view, as outlined in:
 
 http://dev.laptop.org/ticket/3657

As outlined in that bug, this has nothing to do with the Telepathy
backends and PS, it's just a layout/presentation tweak in the Sugar
shell (and indeed, the bug is assigned to sugar, not presence-service).

Getting more information from the network, via Telepathy and PS, to the shell
would be necessary to take it beyond what you suggest in comment#2, but
is not feasible in the short term (i.e. Update.1).

Simon


Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-08 Thread Simon McVittie

On Wed, 07 Nov 2007 at 13:36:45 -0500, Giannis Galanis wrote:
 I can definitely try to arrange this. But, can you please send me the
 tarball to test it in the mean time?

Will do.

 I don't think it's feasible to implement correct handling of PS restarts in
  sugar.presence for Update.1, so unless the release engineering team
  specifically tell me to, I won't be addressing that bug until a later
  release.
 
 Ok, i will reassign the bug to presenceservice. As long as restarting sugar
 works, we can stick to that for now.

No, it's not a Presence Service bug, it's a Sugar bug (the sugar.presence
module is part of Sugar, and it's that module that will have to be changed).
Please assign to component Sugar, with owner smcv or morgs, and keep the
'collaboration' keyword (we use that to keep track of collaboration-related
bugs in other people's components).

 $getent hosts jabber.laptop.org
  2001:4830:2446:ff00:201:6cff:fe07:68ec jabber.laptop.org   -
 frequent reply
  18.85.46.41 jabber.laptop.org  --rare reply
 
 $ping jabber.laptop.org
  PING jabber.laptop.org (18.85.46.41) 56(84) bytes of data.
  64 bytes from jabber.laptop.org (18.85.46.41): icmp_seq=1 ttl=63 time=
 67.4 ms
  ...
 
 $telnet jabber.laptop.org 5222
  blabla... connected
 hello
  replied with an xml packet with xml-not-well-formed included
 
 so it seems that it is a PS issue. Perhaps it is not waiting long enough, or
 doesnt make enough tries when trying to connect. I have reassigned the bug
 to presenceservice.

Was all this done on a machine exhibiting the failure you mention?

PS makes an unlimited number of connection attempts, with a short
delay between each one (we should probably change this to use an
exponential backoff process so the delays get longer as you're offline
for longer, up to a maximum of perhaps 10 minutes).
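
As a rough illustration of the exponential backoff suggested above (this is a sketch of the proposal, not what PS does today), the delays could be generated like this. The 5-second starting delay and the doubling factor are assumptions made for the example; only the 10-minute cap comes from the text, and backoff_delays is a hypothetical name.

```python
import itertools


def backoff_delays(initial=5.0, factor=2.0, maximum=600.0):
    """Yield reconnect delays in seconds, growing by `factor` up to `maximum`.

    initial and factor are illustrative assumptions; maximum=600.0 is the
    "perhaps 10 minutes" cap mentioned in the mail.
    """
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


# First ten delays a client would wait between connection attempts.
first_ten = list(itertools.islice(backoff_delays(), 10))
print(first_ten)
```

A caller would sleep for each yielded delay before the next attempt and start a fresh generator once a connection succeeds, so the delays begin short again after a reconnect.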

 What I meant here is, Does the PS check if jabber server is accessible, and
 then runs telepathy-gabble?, or this is one of the tasks of
 telepathy-gabble?, which as I see you replied to

Like I said, the PS doesn't check whether the server is accessible, it
just optimistically tries to connect anyway. I believe this is the right
thing to do.

 have you tried to check connecting to gabble with the laptops available
 there? Does it work fine?

Not recently with XOs, I must admit (downloading filesystem images takes
a while) but it's always worked fine from my jhbuild.

 Perhaps you can connect to an XO here with ssh, and debug real time what is
 exactly happening.

Talk to me on #sugar when you have an Internet-accessible XO that's
exhibiting this problem. I'm smcv on IRC.

 it was suggested (i think bug 4700) that it is possible that the jabber
 server might have a limit in number of users. Is this possible?

It's possible, but it's always worked for me...

Simon


Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-07 Thread Giannis Galanis
Simon,

I think the email I sent you was incomplete; my connection was poor and
Gmail must have saved the wrong draft. But 1-2-3 is what I intended to
send you.

I also meant to ask: how many times do you try _init_connection before you
assume the connection is down?


I hope so. I have a tarball with the patch, but I'm still waiting for
 Update.1 approval (it's unclear whether I can build RPMs for Joyride
 before I get Update.1 approval or not). If you're at 1CC, could you please
 annoy the ApprovalForUpdate people in person until they either look at
 their
 bugs, or confirm whether I'm still allowed to build RPMs in Koji?


I can definitely try to arrange this. But, can you please send me the
tarball to test it in the mean time?

 2. We need to be able to restart PS. As you say this is not possible, but
 if
  we restart sugar will PS restart as well?

 Yes, that's right (the D-Bus session bus will exit, which causes
 D-Bus services like PS to exit too unless they've specifically asked not
 to).

 I see you assigned the bug about need to be able to cope with PS
 restarts to
 yourself. Unless you're planning to implement the necessary Python code
 in sugar.presence yourself, please don't.

I don't think it's feasible to implement correct handling of PS restarts in
 sugar.presence for Update.1, so unless the release engineering team
 specifically tell me to, I won't be addressing that bug until a later
 release.


OK, I will reassign the bug to presenceservice. As long as restarting Sugar
works, we can stick to that for now.


 3. We need to force gabble to run. We have several instances of 4193
 (almost
  all XOs connected to schoolserver,AP are running salut). Or at least to
  force trying to connect to jabber server.

 Please see my comments on #4193 regarding steps to take to debug (I think
 it's #4193 I commented on - I can't remember bug numbers, and Trac is
 down at the moment).

 In summary:

 * try resolving the server with getent hosts jabber.laptop.org 
 * try pinging it with ping jabber.laptop.org
 * try connecting via TCP with telnet jabber.laptop.org 5222
   (type hello and press Enter, if all goes well you should get
 disconnected
   with an error message that mentions XML not well formed)


The bug is indeed #4193. I have replied to your post, but as Trac is
down you probably haven't seen it.
I ran all three tests:

$ getent hosts jabber.laptop.org
 2001:4830:2446:ff00:201:6cff:fe07:68ec jabber.laptop.org   -- frequent reply
 18.85.46.41 jabber.laptop.org   -- rare reply

$ ping jabber.laptop.org
 PING jabber.laptop.org (18.85.46.41) 56(84) bytes of data.
 64 bytes from jabber.laptop.org (18.85.46.41): icmp_seq=1 ttl=63 time=67.4 ms
 ...

$telnet jabber.laptop.org 5222
 blabla... connected
hello
 replied with an xml packet with xml-not-well-formed included

So it seems that it is a PS issue. Perhaps it is not waiting long enough, or
doesn't make enough attempts when trying to connect. I have reassigned the bug
to presenceservice.


If any of these steps fail, Gabble won't be able to connect either, and
 there's nothing Gabble can do about it - talk to the Network Manager
 maintainer instead, since that's the component responsible for getting
 network connectivity and DNS on the XO.

 If you check the Gabble log you'll probably find that Gabble is trying
 to connect, but failing because either it can't resolve
 jabber.laptop.org in DNS, or it can't get a TCP connection there. That was
 my
 diagnosis of two of the cases you mentioned in your bug with 3 sets of
 logs
 (which may have been #4193?). In the third case it looked as though you
 hadn't
 waited long enough for the log to indicate success or failure.

  4. The process of trying to connect to the jabber server, is done by
  telepathy-gabble, or by the presence



What I meant here is: does the PS check whether the Jabber server is
accessible and then run telepathy-gabble, or is that one of the tasks of
telepathy-gabble (which, as I see, you replied to)?

 Depends what you mean. The Presence Service is responsible for choosing when
 to try to connect (at which time it calls the Connect() D-Bus method
 on Gabble), but it's Gabble that actually opens a TCP socket to the Jabber
 server and tries to talk to it. You can see this in the PS log, for
 instance:

 1194431620.966651 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at
 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): connecting...
 1194431620.967008 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at
 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): Connect()
 succeeded

 (note that Connect() succeeded is a bit misleading - it just means
 that the connection manager has said OK, I'll try, rather than that it
 has actually been able to connect.)

 In the telepathy-gabble log you'll then see something like this:

 ** (telepathy-gabble:25330): DEBUG: do_connect: calling lm_connection_open
 Going to connect to olpc.collabora.co.uk

Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-07 Thread Eben Eliason
  1. We need to fix the timeout for icons to disappear. Can we try Guillaume's
  patch?

 I hope so. I have a tarball with the patch, but I'm still waiting for
 Update.1 approval (it's unclear whether I can build RPMs for Joyride
 before I get Update.1 approval or not). If you're at 1CC, could you please
 annoy the ApprovalForUpdate people in person until they either look at their
 bugs, or confirm whether I'm still allowed to build RPMs in Koji?

Just a mention, since this thread is getting a lot of attention. There
is an added visual element which should be in play here, according to
the design.  There should be an intermediate state before XOs
disappear from the view, as outlined in:

http://dev.laptop.org/ticket/3657


  2. We need to be able to restart PS. As you say this is not possible, but if
  we restart sugar will PS restart as well?

 Yes, that's right (the D-Bus session bus will exit, which causes
 D-Bus services like PS to exit too unless they've specifically asked not to).

 I see you assigned the bug about need to be able to cope with PS restarts to
 yourself. Unless you're planning to implement the necessary Python code
 in sugar.presence yourself, please don't.

 I don't think it's feasible to implement correct handling of PS restarts in
 sugar.presence for Update.1, so unless the release engineering team
 specifically tell me to, I won't be addressing that bug until a later
 release.

  3. We need to force gabble to run. We have several instances of 4193 (almost
  all XOs connected to schoolserver,AP are running salut). Or at least to
  force trying to connect to jabber server.

 Please see my comments on #4193 regarding steps to take to debug (I think
 it's #4193 I commented on - I can't remember bug numbers, and Trac is
 down at the moment).

 In summary:

 * try resolving the server with getent hosts jabber.laptop.org
 * try pinging it with ping jabber.laptop.org
 * try connecting via TCP with telnet jabber.laptop.org 5222
   (type hello and press Enter, if all goes well you should get disconnected
   with an error message that mentions XML not well formed)

 If any of these steps fail, Gabble won't be able to connect either, and
 there's nothing Gabble can do about it - talk to the Network Manager
 maintainer instead, since that's the component responsible for getting
 network connectivity and DNS on the XO.
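
The resolve and TCP-connect steps above can also be scripted. The following is only a sketch using Python's standard socket module: check_xmpp_reachability is a hypothetical helper name, the 10-second timeout is an assumption, and the ping step is left out because it needs raw sockets. It tries every address the resolver returns, which may matter when a host resolves to both IPv6 and IPv4 addresses.

```python
import socket


def check_xmpp_reachability(host, port=5222, timeout=10.0):
    """Return ("ok", address) or (failing_step, detail).

    Step 1 mirrors `getent hosts`: resolve the name.
    Step 2 mirrors `telnet host 5222`: open a TCP connection.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("resolve-failed", str(exc))

    last_error = "no addresses returned"
    # Try each resolved address in turn and stop at the first that connects.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return ("ok", sockaddr[0])
        except OSError as exc:
            last_error = "{}: {}".format(sockaddr[0], exc)
    return ("connect-failed", last_error)
```

Calling check_xmpp_reachability("jabber.laptop.org") on an affected XO would show which of the two steps fails first. A result of "ok" only means a TCP connection could be opened; it says nothing about the XMPP handshake that follows.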

 If you check the Gabble log you'll probably find that Gabble is trying
 to connect, but failing because either it can't resolve
 jabber.laptop.org in DNS, or it can't get a TCP connection there. That was my
 diagnosis of two of the cases you mentioned in your bug with 3 sets of logs
 (which may have been #4193?). In the third case it looked as though you hadn't
 waited long enough for the log to indicate success or failure.

  4. The process of trying to connect to the jabber server, is done by
  telepathy-gabble, or by the presence

 Depends what you mean. The Presence Service is responsible for choosing when
 to try to connect (at which time it calls the Connect() D-Bus method
 on Gabble), but it's Gabble that actually opens a TCP socket to the Jabber
 server and tries to talk to it. You can see this in the PS log, for
 instance:

 1194431620.966651 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at
 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): connecting...
 1194431620.967008 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at
 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): Connect()
 succeeded

 (note that Connect() succeeded is a bit misleading - it just means
 that the connection manager has said OK, I'll try, rather than that it
 has actually been able to connect.)

 In the telepathy-gabble log you'll then see something like this:

 ** (telepathy-gabble:25330): DEBUG: do_connect: calling lm_connection_open
 Going to connect to olpc.collabora.co.uk
 Trying 195.10.223.134 port 5222...
 ** (telepathy-gabble:25330): DEBUG: tp_base_connection_change_status:
 was 4294967295, now 1, for reason 1
 ** (telepathy-gabble:25330): DEBUG: tp_base_connection_change_status:
 emitting status-changed to 1, for reason 1

 (here status 4294967295 means haven't tried to connect yet, status 1 means
 connecting, reason 1 means by user request)

 If the TCP connection succeeds, in the telepathy-gabble log you'll see:

 Connection success.
 SEND:
 - ---
 <?xml version='1.0' encoding='UTF-8'?>
 - ---

 Some XMPP handshaking will follow.

 Finally, when Gabble has finished doing its initial setup on the
 connection, it will signal that it's become connected:

 ** (telepathy-gabble:25330): DEBUG: tp_base_connection_change_status:
 was 1, now 0, for reason 1
 ** (telepathy-gabble:25330): DEBUG: tp_base_connection_change_status:
  emitting status-changed to 0, for reason 1

 (here status 0 means connected)
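
When reading a long telepathy-gabble log, it can help to translate these numbers mechanically. The sketch below covers only the status and reason values explained in this mail; any other value is printed as a raw number rather than guessed. describe_status_change is a hypothetical helper, not part of Gabble or PS.

```python
import re

# Only the values explained in the mail are named here.
STATUS_NAMES = {
    4294967295: "not yet tried",
    1: "connecting",
    0: "connected",
}
REASON_NAMES = {1: "requested by user"}

_CHANGE = re.compile(r"was (\d+), now (\d+), for reason (\d+)")


def describe_status_change(text):
    """Turn a 'was X, now Y, for reason Z' log fragment into words.

    Returns None if the text does not contain such a fragment.
    """
    match = _CHANGE.search(text)
    if match is None:
        return None
    old, new, reason = (int(group) for group in match.groups())
    return "{} -> {} ({})".format(
        STATUS_NAMES.get(old, "status {}".format(old)),
        STATUS_NAMES.get(new, "status {}".format(new)),
        REASON_NAMES.get(reason, "reason {}".format(reason)),
    )


print(describe_status_change("was 4294967295, now 1, for reason 1"))
print(describe_status_change("was 1, now 0, for reason 1"))
```

Because the log wraps the status line, the function matches on the "was ..., now ..." fragment alone instead of the full tp_base_connection_change_status line.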

 and the Presence Service will receive the StatusChanged signal via D-Bus:

 1194431621.473412 DEBUG 

Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Giannis Galanis
Sjoerd, Guillaume, Simon,

What does proper notification mean? In which cases does it happen?

Probably not when an XO moves slowly to a place with poor connectivity.

In the case of a temporary (short) disruption of connectivity, how much
time does it generally take for it to return? You mentioned that in the past
XOs were appearing and disappearing constantly. This implies that the
common drop of connectivity is on the scale of a few seconds. If it is lost
for more than a few minutes, then it is not bad for the XO to leave and
return. So I believe that 1h or even 10min are too long as timeouts.

There are a couple more things I would like to address:

1. Is there a way to restart the presence service? That way we can
resolve a weird state. Will killing and restarting the process work?

2. At what point in the source code does the presence service
i. try to connect to the jabber server?
ii. run gabble?

3. I noticed the dbus diagram is updated. Indeed we have a better picture of
what's happening. But we still need some more information, like:
i. a state diagram of the presence service
ii. what type of communication is taking place between NM and PS
iii. when the connection is switched from linklocal to schoolserver (for
example), what steps are taking place in the presence service
iv. whether internet connectivity is detected by NM and sent to PS, or
detected by PS

yani




On 10/30/07, Sjoerd Simons [EMAIL PROTECTED]  wrote:

 On Fri, Oct 26, 2007 at 02:48:55PM -0400, Giannis Galanis wrote:
   Sjoerd,
 
  I would like to ask you,
 
  you replied at one of the bugs:

 Moving from a bug report to a private mail might not be a great idea.
 Could you in the future just put your questions in the bug report so we can
 have the discussion in a more public fashion :)

  Salut used to drop the presence of people for which it couldn't resolve
  the extra information, but this seemed to give a lot of problems in the
  mesh (people appearing and disappearing all the time). So as a workaround
  we switched to only dropping presence iff all info about a node has gone.
  Which has the downside that nodes that are really gone can still appear
  on the mesh view for some time (specifically when they didn't send a
  proper mdns bye packet or when that was dropped).
 
  iff all info about a node has gone
  what does this mean?

 It means that it is hard to decide when a node has really gone or if the
 network link to a certain node is just (temporarily) bad.

 In the OLPC office, the second case apparently happens a lot.

  how often do you refresh?

 The refresh is done by avahi. Avahi tries every few minutes. Guillaume
 worked on a patch to make the effect of being unsure about a user less bad
 (as in, assume that if you're unsure about them for a certain period of
 time, they're actually really gone). It still needs to be finished though.

 Which means, from an end-user's point of view, that if a user went away
 without doing proper notification, they will only stay on the mesh view
 for a limited amount of time (say a maximum of 10 minutes, instead of the
 current situation of more than an hour).



  Sjoerd
 --
 Kindness is the beginning of cruelty.
 -- Muad'dib [Frank Herbert, Dune]



Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Morgan Collett
Giannis Galanis wrote:
 1. Is there a way to restart the presence service? In that way we can
 resolve a weird state. Will killing restarting the porcess work?

Killing it will result in it being restarted. However, Sugar remains in
an inconsistent state, with buddies stuck on the mesh view if they left
while PS was not running, since PS never told Sugar that they left.

 2. At what point in the source code, the presence serivce
 i.will try to connect to the jabber server?

If the server plugin detects that NM has an IP address. See below for
details.

 ii. run gabble?

The connection manager plugins are both started at startup, unless
PRESENCE_SERVICE_DEBUG=disable-gabble or disable-salut are set.

 3. I noticed the dbus diagram is updated. Indeed we have a better
 picture of whats happening. But, still we need some more information like:
 i. state diagram of the presence service

PS was designed to run with both plugins (which talk to gabble and
salut) running concurrently. However due to confusion about server and
link local activities displayed on the same mesh view with no
differentiation, and problems in connecting to some shared activities
you can see presence for, that was disabled in the run up to Trial3.

* Both plugins are started when PS starts.
* Salut succeeds faster than Gabble, so link local buddies are shown first.
* If PS gets an IP address from NM, it starts the server plugin (gabble).
* When the server plugin has started, PS stops the link local plugin
(salut). Link local buddies disappear.
* If NM loses the IP address, the server plugin stops itself.
* When the server plugin stops, PS starts the link local plugin.
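
The switching described in the list above can be modelled with a very small toy class. This is only a sketch of the transitions as listed, not the PS code: PresencePlugins and its method names are hypothetical, and it collapses "server plugin started" into the moment NM reports an IP address, ignoring the async calls in between.

```python
class PresencePlugins:
    """Toy model of which connection-manager plugin is showing buddies."""

    def __init__(self):
        # Salut succeeds faster than Gabble, so link-local is active first.
        self.link_local_active = True
        # The server plugin (gabble) needs an IP address from NM.
        self.server_active = False

    def ip_address_acquired(self):
        # PS starts the server plugin; once started, link-local is stopped.
        self.server_active = True
        self.link_local_active = False

    def ip_address_lost(self):
        # The server plugin stops itself; PS then starts link-local again.
        self.server_active = False
        self.link_local_active = True

    def visible_buddies(self):
        return "server" if self.server_active else "link-local"


plugins = PresencePlugins()
print(plugins.visible_buddies())   # before NM reports an IP address
plugins.ip_address_acquired()
print(plugins.visible_buddies())   # after the server plugin has started
plugins.ip_address_lost()
print(plugins.visible_buddies())   # after NM loses the IP address
```

In this simplified model exactly one plugin is active at any time, which matches the behaviour after concurrent operation was disabled.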

Is that sufficient for your state diagram? There's a lot more that
happens, with async calls, so a complete state diagram would be rather
complex.

 ii. what type of communication is taking place between NM and PS
 iii. when connection is switched from linklocal to schoolserver(for
 example) what steps are taking place in the presence service
 iv. the internet connectivity is detected by NM and sent to PS, or
 detected by PS

PS's server plugin watches NM signals on the D-Bus system bus.

Regards
Morgan



Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Simon McVittie

In reply to your previous mail, iff means if and only if. It's often
used by mathematicians.

On Tue, 06 Nov 2007 at 03:23:39 -0500, Giannis Galanis wrote:
 What does proper notification mean? Which are the cases that it happens?

If Salut is explicitly asked to disconnect, it will tell Avahi to delete
all its mDNS records (this actually consists of re-sending all the
records it was advertising, with the Time To Live set to 0 seconds).
This is sometimes referred to as a goodbye packet. See
http://files.multicastdns.org/draft-cheshire-dnsext-multicastdns.txt
section 11.2 Goodbye Packets.

The only time we'll currently do this is when switching off Salut because
Gabble has connected successfully.
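
To make the goodbye mechanism concrete, here is a minimal sketch in Python. The Record type, its fields and the example names are assumptions invented for illustration; this is not Avahi's or Salut's API, only the "same records, TTL set to 0" idea described above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for an advertised mDNS record."""
    name: str
    rtype: str
    ttl: int  # seconds


def goodbye_records(advertised):
    """Return copies of the advertised records with the TTL set to 0."""
    return [replace(record, ttl=0) for record in advertised]
```

Sending the returned records instead of the originals is what tells other hosts to drop them from their caches.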

 Presumably this does not happen if an XO moves slowly to a place with poor
 connectivity.

This is never done in response to network conditions - we can't know that
we've lost network connectivity until it's too late.

If the Time To Live on our mDNS records expires, that should have the same
effect; however, as Sjoerd explained, we currently ignore that, because
the 1CC mesh network is apparently unstable enough that the TTL
sometimes expires even for laptops that are actually present.

 In the case of a temporary (short) disruption of connectivity, how much
 time does it generally take for it to return? You mentioned that in the past
 XOs were appearing and disappearing constantly. This implies that the
 common drop of connectivity is on the scale of a few seconds.

You tell me! :-) I don't have enough XOs to replicate the conditions of
a large mesh network like 1CC, so I can't comment on packet loss rates.
Perhaps Dan Williams (who used to maintain Presence Service) could help
you.

 If it is lost
 for more than a few minutes, then it is not bad for the XO to leave and
 return. So I believe that timeouts of 1h or even 10min are too long.

I believe we're currently using Avahi's default timeouts, which are
those recommended in the mDNS draft (linked above). If I'm right about
that, then we're using 120 second TTLs for the SRV and A records.

Assuming Salut and Avahi follow the draft's recommendations, this means
that for the records representing activities, buddies and laptops, if we
haven't seen an announcement of a particular record, we will:

- re-query after 96 - 98.4 seconds;
- if no reply, re-query after 102 - 104.4 seconds;
- if no reply, re-query after 114 - 116.4 seconds;
- if no reply, assume the record has vanished after 120 seconds.

(In each of the ranges given for the re-queries, the exact time is
chosen at random, to avoid simultaneous queries from everyone in the
network.)

The timeout is reset as soon as we see any announcement of a record.
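
The ranges above can be reproduced with a little arithmetic. The sketch below assumes the re-queries sit at 80%, 85% and 95% of the TTL with up to 2% of random delay added; those percentages are inferred from the figures in this mail, not checked against the draft or against Avahi, and the function names are made up.

```python
import random


def requery_windows(ttl_seconds, percents=(80, 85, 95), jitter_percent=2):
    """Return (earliest, latest) re-query times in seconds for each attempt."""
    return [
        (ttl_seconds * p / 100, ttl_seconds * (p + jitter_percent) / 100)
        for p in percents
    ]


def pick_requery_times(ttl_seconds, rng=random):
    """Pick one random time inside each window, so hosts don't all query at once."""
    return [rng.uniform(low, high) for low, high in requery_windows(ttl_seconds)]
```

For a 120 second TTL, requery_windows(120) gives the three ranges listed above; the record is treated as gone once the full 120 seconds pass without an announcement.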

The only ones whose disappearance matters are the SRV and A records - if
a TXT record fails to disappear when it should, we don't really care.
TXT records have a substantially longer timeout (the draft recommends 75
minutes).

 There are a couple more things I would like to address:
 
 1. Is there a way to restart the presence service? In that way we can
 resolve a weird state. Will killing and restarting the process work?

Only if client code that accesses the PS is amended to cope with this
(I just filed #4681 to represent this). Until #4681 is closed, if the PS
was restarted, nothing would work - use Ctrl+Alt+Backspace to restart all of
Sugar. Please see the bug for more details or to reply.

 2. At what point in the source code does the presence service
 i. try to connect to the jabber server?
 ii. run gabble?

I'll answer (ii.) first. Gabble is automatically run by the session bus
(dbus-daemon) via service activation, the first time the Presence Service
uses it, if it isn't already running. So there is no explicit code in the PS
to run Gabble.

OK, now (i.):

When Network Manager indicates that we have a valid IP address, we run
the _init_connection method of the ServerPlugin instance. If the Gabble
connection fails, we schedule a timer (currently 5 seconds) and retry
running _init_connection when the timer runs out. (classes
TelepathyPlugin and ServerPlugin, methods _init_connection,
_reconnect_cb, _could_connect, _handle_connection_status_change.)

What _init_connection does is: If there's already a Gabble connection and it's
connected, it'll be used. (class ServerPlugin, method
_find_existing_connection). Otherwise we make a new connection (method
_make_new_connection).

ServerPlugin (src/server_plugin.py) inherits from TelepathyPlugin
(src/telepathy_plugin.py) so some of the methods I mentioned are defined
in TelepathyPlugin, some in ServerPlugin, and some are defined in
TelepathyPlugin but overridden in ServerPlugin.
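
The retry behaviour described above can be sketched roughly as follows. The method names are the ones mentioned in this mail, but the bodies are guesses written for illustration, not the actual PS source; the connect and schedule callables are hypothetical stand-ins for the Telepathy connection request and the main loop timer.

```python
RECONNECT_TIMEOUT_SECONDS = 5  # "currently 5 seconds", per the description above


class ServerPluginSketch:
    """Illustrative only; not the real ServerPlugin."""

    def __init__(self, connect, schedule):
        self._connect = connect    # hypothetical: returns a connection, or None on failure
        self._schedule = schedule  # hypothetical: schedule(seconds, callback)
        self._conn = None
        self.attempts = 0

    def _find_existing_connection(self):
        # Reuse a connection that is already up, if there is one.
        return self._conn

    def _make_new_connection(self):
        self.attempts += 1
        return self._connect()

    def _init_connection(self):
        conn = self._find_existing_connection()
        if conn is None:
            conn = self._make_new_connection()
        if conn is None:
            # Connection failed: retry when the timer runs out.
            self._schedule(RECONNECT_TIMEOUT_SECONDS, self._reconnect_cb)
        else:
            self._conn = conn

    def _reconnect_cb(self):
        self._init_connection()
        return False  # one-shot timer; a new one is scheduled on failure
```

Note that this sketch retries indefinitely; whether the real code gives up after some number of attempts is not stated here.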

 ii. what type of communication is taking place between NM and PS

D-Bus messages, on the system bus.

 iv. the internet connectivity is detected by NM and sent to PS, or detected
 by PS

Internet connectivity isn't really detected, as such. The PS listens for
signals from Network Manager that 

Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Simon McVittie
On Tue, 06 Nov 2007 at 12:05:59 +0200, Morgan Collett wrote:
 Giannis Galanis wrote:
  1. Is there a way to restart the presence service? In that way we can
  resolve a weird state. Will killing and restarting the process work?
 
 Killing it will result in it being restarted. However, Sugar remains in
 an inconsistent state, with buddies stuck on the mesh view if they left
 while PS was not running, since PS never told Sugar that they left.

The situation would actually be worse than that, see #4681.

Simon


Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Giannis Galanis
Thank you all for your replies. They clear up the picture a lot.

To summarize:

1. We need to fix the timeout for icons to disappear. Can we try Guillaume's
patch? Also, we need to be able to determine which icons are currently not
available (but still appearing). I believe that the failed entries in
_presence._tcp form a complete list. Is this correct?

2. We need to be able to restart PS. As you say this is not possible, but if
we restart sugar will PS restart as well?

3. We need to force gabble to run. We have several instances of #4193 (almost
all XOs connected to schoolserver,AP are running salut). Or at least
force an attempt to connect to the jabber server.

4. The process of trying to connect to the jabber server, is done by
telepathy-gabble, or by the presence

On 11/6/07, Simon McVittie  [EMAIL PROTECTED] wrote:

 In reply to your previous mail, iff means if and only if. It's often
 used by mathematicians.

 On Tue, 06 Nov 2007 at 03:23:39 -0500, Giannis Galanis wrote:
  What does proper notification mean? In which cases does it happen?


 If Salut is explicitly asked to disconnect, it will tell Avahi to delete
 all its mDNS records (this actually consists of re-sending all the
 records it was advertising, with the Time To Live set to 0 seconds).
 This is sometimes referred to as a goodbye packet. See
 http://files.multicastdns.org/draft-cheshire-dnsext-multicastdns.txt
 section 11.2 Goodbye Packets.

 The only time we'll currently do this is when switching off Salut because
 Gabble has connected successfully.

  Presumably this does not happen if an XO moves slowly to a place with poor
  connectivity.

 This is never done in response to network conditions - we can't know that
 we've lost network connectivity until it's too late.

 If the Time To Live on our mDNS records expires, that should have the same
 effect; however, as Sjoerd explained, we currently ignore that, because
 the 1CC mesh network is apparently unstable enough that the TTL
 sometimes expires even for laptops that are actually present.

  In the case of a temporary (short) disruption of connectivity, how much
  time does it generally take for it to return? You mentioned that in the past
  XOs were appearing and disappearing constantly. This implies that the
  common drop of connectivity is on the scale of a few seconds.

 You tell me! :-) I don't have enough XOs to replicate the conditions of
 a large mesh network like 1CC, so I can't comment on packet loss rates.
 Perhaps Dan Williams (who used to maintain Presence Service) could help
 you.

  If it is lost
  for more than a few minutes, then it is not bad for the XO to leave and
  return. So I believe that timeouts of 1h or even 10min are too long.

 I believe we're currently using Avahi's default timeouts, which are
 those recommended in the mDNS draft (linked above). If I'm right about
 that, then we're using 120 second TTLs for the SRV and A records.

 Assuming Salut and Avahi follow the draft's recommendations, this means
 that for the records representing activities, buddies and laptops, if we
 haven't seen an announcement of a particular record, we will:

 - re-query after 96 - 98.4 seconds;
 - if no reply, re-query after 102 - 104.4 seconds;
 - if no reply, re-query after 114 - 116.4 seconds;
 - if no reply, assume the record has vanished after 120 seconds.

 (In each of the ranges given for the re-queries, the exact time is
 chosen at random, to avoid simultaneous queries from everyone in the
 network.)

 The timeout is reset as soon as we see any announcement of a record.

 The only ones whose disappearance matters are the SRV and A records - if
 a TXT record fails to disappear when it should, we don't really care.
 TXT records have a substantially longer timeout (the draft recommends 75
 minutes).

  There are a couple more things I would like to address:
 
  1. Is there a way to restart the presence service? In that way we can
  resolve a weird state. Will killing and restarting the process work?

 Only if client code that accesses the PS is amended to cope with this
 (I just filed #4681 to represent this). Until #4681 is closed, if the PS
 was restarted, nothing would work - use Ctrl+Alt+Backspace to restart all of
 Sugar. Please see the bug for more details or to reply.

  2. At what point in the source code does the presence service
  i. try to connect to the jabber server?
  ii. run gabble?

 I'll answer (ii.) first. Gabble is automatically run by the session bus
 (dbus-daemon) via service activation, the first time the Presence Service
 uses it, if it isn't already running. So there is no explicit code in the PS
 to run Gabble.

 OK, now (i.):

 When Network Manager indicates that we have a valid IP address, we run
 the _init_connection method of the ServerPlugin instance. If the Gabble
 connection fails, we schedule a timer (currently 5 seconds) and retry
 running _init_connection when the timer runs out. (classes