MPP tests
During my last week at 1cc, I performed several tests on Mesh Portal Point (MPP) configurations. It is probably useful for everyone to have some insight into what is and isn't possible with an MPP (especially after tomorrow's network presentation!).

For those who don't know yet: an XO acts as an MPP when it shares its private internet connectivity with other XOs in the mesh. The MPPs I tested successfully shared their connectivity through the msh0 interface. Their own connectivity came via:

1) a GSM/USB modem through a ppp0 interface (thanks to Ankur!!)
2) a simple WiFi AP
3) a School WiFi

The commonly discussed MPP cases are home scenarios or under-the-tree scenarios (where at least one XO has access to an internet connection). In the GSM modem case specifically, probably only one XO will have the device, due to cost etc.

It was rarely discussed, however, whether an MPP can be useful at the school (am I wrong?). That is probably because schools were initially configured with meshes, and an MPP is probably useless there (if the MPP, the XO and the School are all in the same mesh, the XO can reach the School directly). Now that schools are mostly connected with access points, an MPP can be useful at school as well! Other XOs in the neighborhood can join a School WiFi via an MPP, combining the benefits of the AP in the dense environment with those of the mesh in the sparse environment.

*Michali*, I have a question for you: if some XOs are many hops away (5 and 10) from the School, can an XO halfway act as an MPP to practically increase the TTL? I can understand that it would probably work if the MPP had two mshX interfaces on different channels (the second being an active antenna). Is this a good way to bridge two mesh clouds?

Results:

1) GSM modem / Simple WiFi

* All XOs (including the MPP) had access to the internet.
* The client XOs showed the msh0 address of the MPP XO as their DNS server (in resolv.conf).
* jabber: All XOs (including the MPP) could collaborate perfectly via a jabber server (any publicly routable jabber server should work; I used schoolserver.laptop.org).
* salut: I disabled gabble on all XOs with sugar-control-panel -s jabber foo
  a) All XOs (including the MPP) shared *Presence information*, i.e. new XO arrivals, new activities, who joins an activity, etc.
  b) Only the client XOs could perform actual collaboration!! Why is this happening? I found it rather strange. I was under the impression that Presence data were very similar to Activity sharing data. *Dafydd*, can you explain this? (or anyone from Collabora)

2) School WiFi (tested with the media lab 802.11 network, which is also connected to schoolserver.laptop.org)

All of the above was also true here. What makes this case more special than simple WiFi is that the client XOs would share the benefits of being at the school. However:

* The client XOs could *not* register.
* They also could *not* resolve schoolserver (the simplest way to tell whether the School services are accessible). They could ping schoolserver.laptop.org, but this is publicly routable anyway.
* They could ping 172.18.0.1.

Basically, they could reach the school server machine, but didn't treat it as a school! *Wad*, can you explain this behavior?

yanni
___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: #7188 HIGH Never A: Both telepathies are permanently down
Sorry for the multiple tickets. I had a problem accessing Trac.
Re: New update.1 build 706
Here is a list of some bugs discovered tonight on 706:

5848: The mesh circle in the main view disappeared
7121: Chat would not load
7119: USB stick was too slow to mount (1 min)
7118: Letters in all Sugar activities became tiny!

** 5848 probably isn't linked to the new libertas

On Tue, May 27, 2008 at 10:30 PM, Build Announcer v2 [EMAIL PROTECTED] wrote:
> http://pilgrim.laptop.org/~pilgrim/olpc/streams/update.1/build706
>
> Changes in build 706 from build: 705
>
> Size delta: 0.13M
>
> -kernel 2.6.22-20080312.2.olpc.f3687aa7e09fd65
> +kernel 2.6.22-20080523.1.olpc.28f4cb6e780db07
> -libertas-usb8388-firmware 2:5.110.22.p1-1.fc7
> +libertas-usb8388-firmware 2:5.110.22.p14-1.fc7
> -xorg-x11-drv-evdev 1.2.0-2norel.olpc2
> +xorg-x11-drv-evdev 1.2.0-3norel.olpc2
> -xorg-x11-server-Xorg 1.4-8.olpc2
> +xorg-x11-server-Xorg 1.4-9.olpc2
>
> --- Changes for kernel 2.6.22-20080523.1.olpc.28f4cb6e780db07 from 2.6.22-20080312.2.olpc.f3687aa7e09fd65 ---
>   + Patch the kernel, xorg-x11-server, and xorg-x11-drv-evdev packages
>   + Kernel source code taken from the 'stable' branch of
>   + Thanks to Blake Setlow for writing all the code and for pushing me to
>
> --- Changes for xorg-x11-drv-evdev 1.2.0-3norel.olpc2 from 1.2.0-2norel.olpc2 ---
>   + Patch the kernel, xorg-x11-server, and xorg-x11-drv-evdev packages
>   + Kernel source code taken from the 'stable' branch of
>   + Thanks to Blake Setlow for writing all the code and for pushing me to
>
> --- Changes for xorg-x11-server-Xorg 1.4-9.olpc2 from 1.4-8.olpc2 ---
>   + Patch the kernel, xorg-x11-server, and xorg-x11-drv-evdev packages
>   + Kernel source code taken from the 'stable' branch of
>   + Thanks to Blake Setlow for writing all the code and for pushing me to
>
> --
> This mail was automatically generated
> See http://dev.laptop.org/~rwh/announcer/update.1-pkgs.html for aggregate logs
> See http://dev.laptop.org/~rwh/announcer/joyride_vs_update1.html for a comparison
Re: New network scripts/tools for testing
On Mon, May 12, 2008 at 10:58 AM, Mikus Grinbergs [EMAIL PROTECTED] wrote:
> Some small discrepancies in the output of the new 'olpc-netstatus':
>
> 1) I have a wired connection. (NO wireless.) I do not understand why, but for some Joyride builds, the wired connection gets assigned to 'eth0', and for others it gets assigned to 'eth1'. My current build (1932) assigns it to 'eth1'. The result is 'olpc-connections' and 'olpc-netstatus' have NOTHING to report for 'eth0' (that interface is there, but does not have an IPv4 address).

olpc-netstatus should work either way. I see that it properly detected that eth1 is your ethernet. It should also work if it were the other way around: it scans all eth* interfaces and checks which one has an IP (for now, if both have an IP it will only choose one).

By the way, *I think* eth1 shows up as the wireless when you upgrade the build with the eth/USB adapter plugged in (not 100% sure).

As for olpc-connections, this is a bug. It is not smart enough to determine whether eth0 or eth1 is active. I will make sure this is fixed before I put it in the build.

> 2) My connection goes through a proxy. The result is that 'olpc-connections' and 'olpc-netstatus' show the Proxy-system IP, where they claim to be showing the Jabber-system IP.

This I don't know how to fix. Perhaps there is nothing I can do. I will have to ask at 1cc.

> 3) (For nameserver?) 'olpc-netstatus' refers to /root/test. My system has no such file.

Oh, this is a terrible mistake! I was testing with a sample resolv.conf file and I forgot about it. I have now updated it properly on the wiki.

Thanks a lot for the feedback!
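A minimal sketch of the interface selection described in the reply above (scan all eth* interfaces and pick one that has an IP). This is not the olpc-netstatus code; the function name and the input mapping are hypothetical, and the choice of the lowest-numbered interface when several have an address is an assumption made for the example.

```python
import re


def pick_active_ethernet(addresses):
    """Return the first eth* interface that has an IPv4 address, or None.

    `addresses` is a hypothetical mapping of interface name to its IPv4
    address (or None when the interface exists but has no address).
    """
    # Consider only eth0, eth1, ... and ignore msh0, ppp0, lo, etc.
    candidates = sorted(
        (name for name in addresses if re.fullmatch(r"eth\d+", name)),
        key=lambda name: int(name[3:]),
    )
    for name in candidates:
        if addresses[name]:
            # If several interfaces have an address, only the first is chosen.
            return name
    return None


if __name__ == "__main__":
    # Mirrors the reported case: eth0 exists without an address, eth1 is wired.
    print(pick_active_ethernet({"eth0": None, "eth1": "192.168.1.20", "msh0": "169.254.3.7"}))
```

Keeping the selection independent of the interface number is what lets the same check work whether the wired adapter comes up as eth0 or as eth1.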
New network scripts/tools for testing
For the past couple of weeks I have been developing several network testing scripts that make testing a more pleasant experience! The scripts collect and display information about the network configuration, the telepathies and their status, the neighbor XOs and the forwarding tables.

For details have a look at http://wiki.laptop.org/go/Network_Resources (I created this page to group general network info, including important wiki pages, scripts, bugs etc., but now the scripts take up 90% of the page!)

A short overview:

*olpc-connections*: Tracks any change in msh0/eth0, DNS, Telepathy status, jabber, and the number of XOs in the neighborhood.

*olpc-xos -avahi*: Displays the XOs currently seen by Avahi. You may also run it continuously with *-c*. It will then continuously scan for changes, and display the list with a timestamp whenever a change is identified.

*olpc-xos -sugar*: The same as above, but for the sugar XOs. It works with salut and gabble. (Note that sometimes avahi and salut show different XOs.)

** When running a test that involves collaboration, it is very useful to have the above scripts run at boot and log to a file. All changes are timestamped, so you can track down bugs much more easily.

*olpc-mesh*: Collects data from the firmware ioctls and displays the complete forwarding table in a readable manner. It can also replace each MAC address with the corresponding nickname if you have a mac-nick table. You can create this table from the neighbor XOs with *olpc-xos -mac*, which was written for this purpose.

*sugar-xos*: (This was written by Guillaume in Python.) Displays a list of the sugar XOs. It is separate so that it can be used as a library by olpc-xos, olpc-connections and olpc-netstatus. The reason it is split from olpc-xos is that the latter does more processing (it tracks changes) and also works for Avahi, *and* also because the former is written in Python, which I know very little of! So I used the first as an input to the second (perhaps I will clean this up in the future).

*sugar-telepathies*: (Thanks Daf for your help!) Lists the presence service Telepathies; it is used as a library by olpc-connections and olpc-netstatus.

Also, I updated olpc-netstatus and olpc-netlog:

*olpc-netstatus*: (Older versions have been in our builds for several months now.) This tool collects network info and other info such as build, libertas etc. It determines which configuration you are connected to (Simple WiFi, School WiFi, Simple Mesh, School Mesh), checks which Telepathy is currently active, and whether there is a connection to Jabber.
New stuff:
* checks if a school is present
* reads the Telepathies from D-Bus (not the ps list) and displays their status
* shows uptime
* shows the number of XOs connected

*olpc-log*: (Previously named olpc-netlog, and also present in our builds for a long time now.) It gathers all possible logs (messages, activities, dmesg, etc.) and stores them in a tarball named by S/N and timestamp. It also collects several files and the output of network commands like i[f|w]config, route, olpc-netstatus etc. For the complete list of what is logged, check olpc-log --help (olpc-*net*log --help for older versions).
New stuff:
* includes a config file
* includes a progress bar (sometimes it might take even 1 min)

With the help of Michael, the scripts will shortly be available in the next joyride. I would also highly recommend that olpc-connections log by default at startup (when debug logs are enabled).

Waiting for any recommendations/feedback

yanni
If you need to customize an XO automatically, you should use the Action usb stick
The action USB key makes it possible to customize the XO's NAND image automatically. It is also possible to set certain functions to be performed at every boot. It is a very useful tool for quickly customizing many XOs, collecting information onto the USB stick, transferring files to the XO, etc.

The required files can be found here: http://wiki.laptop.org/images/a/a9/Actionkey.tar.gz

To prepare the key:

1) Edit the action script with commands to be executed once. This could involve copying files or collecting data.
2) Edit the rc.tweaks script with commands to be executed at every boot. This could involve setting variables, running testing tools, etc.
3) Copy all files (boot dir, bzimage, action, rc.tweaks) to the root of the stick, along with any other files you need to perform your customization.

To install the key:

1) Turn the XO off.
2) Boot the XO with the USB stick plugged in, without holding any game key.
3) Wait until you see the "no job control" message in the shell, and you are done!
4) To turn off, type exit, or simply hold the power button. Remove the stick before you boot again.

Notes:

* The key only works on activated machines with developer keys.
* The path to the USB stick is /mnt/usb.
* Don't erase the export PATH=.. line from the action script.
* If you copy files back to the USB stick, make sure you type sync or exit before removing the stick.
* If you want to install an rpm, first copy it locally with the action script. Then set it to install with the rc.tweaks script, and clear the instruction afterwards so that it doesn't install again the next time.
* When booting with the action script, the wireless firmware is not loaded.
* Thank mstone for creating this amazing tool!
Re: list of laptops connected to jabber
Thanks, that's exactly what I needed.

On Tue, Apr 22, 2008 at 5:56 AM, Guillaume Desmottes [EMAIL PROTECTED] wrote:
> Le lundi 21 avril 2008 à 10:05 +0200, Guillaume Desmottes a écrit :
> > Le samedi 19 avril 2008 à 22:18 +0300, Giannis Galanis a écrit :
> > > In the testbed in peabody, the list of peers seen from the server is usually a superset of what we see from each individual mesh view.
> >
> > Could be related to: #6883 #6884 #6888
> >
> > > It would be very useful if we could get the list shown by the analyze activity, but from the console. I believe it would easy to do that from the telepathy-gabble logs. For every new arrival or departure there must a specific entry.
> >
> > Actually there are different levels for this:
> > - Contacts know as online by Gabble
> > - OLPC Buddy in the PS
> > - Buddy displayed by sugar in the mesh view
> >
> > And of course, bugs can occur in each level. telepathy-gabble.log gives us enough information to track the first level (but not easy to read as log can be a mess). presence-service.log for the second. And currently the only way to check 3 is to manually count buddies.
> >
> > I agree with you, more helper would be welcome to debug these kinds of bugs.
>
> I wrote a simple script listing all the buddies known by the PS. See https://dev.laptop.org/ticket/6918
>
> Maybe we should ship it with images?
>
> G.
Re: list of laptops connected to jabber
In the testbed in Peabody, the list of peers seen from the server is usually a superset of what we see in each individual mesh view.

It would be very useful if we could get the list shown by the Analyze activity, but from the console. I believe it would be easy to do that from the telepathy-gabble logs. For every new arrival or departure there must be a specific entry.

On 4/19/08, Dafydd Harries [EMAIL PROTECTED] wrote:
> On 18/04/2008 at 23:25, Giannis Galanis wrote:
> > When connecting to a jabber server, how can we check the list of XOs that are seen in the mesh view, or the analyze activity?
> >
> > Is checking the gabble log the only way? What records in the log indicate arrival or departure?
> >
> > When testing with 50 or 100 XOs connected it is often impractical to detect missing icons, and a commandline tool would be of more help.
>
> There is a presence service monitor tool; I think it's included in the Analyze activity. If you log into the Jabber server, you can just use ejabberdctl to list the connected users.
>
> --
> Dafydd
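A small sketch of the comparison discussed above: given the list of users the server reports and the list one XO shows, find which peers are missing from that XO's view. How the two lists are obtained is left open here; the function name, the dictionary keys and the sample nicknames are all hypothetical.

```python
def compare_views(server_users, mesh_view):
    """Compare the server's peer list with what one XO displays.

    Both arguments are iterables of peer identifiers (for example nicknames
    or JIDs). The result names the peers the XO fails to show and any peers
    the XO shows that the server does not report.
    """
    server = set(server_users)
    seen = set(mesh_view)
    return {
        "missing_from_view": sorted(server - seen),
        "unknown_to_server": sorted(seen - server),
    }


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the result.
    report = compare_views(["xo-01", "xo-02", "xo-03"], ["xo-01", "xo-03"])
    print(report["missing_from_view"])
```

With 50 or 100 XOs this kind of set difference is far easier to check than counting icons by eye.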
Re: Tickets for mesh problems
We have these network UI bugs:

5904 - GUI problem updating buddies clustered around shared activity
5459 - second circle in sugar home view provides false information
5908 - Laptop unable to connect to schoolserver jabber server

Also, 4193 - "Two XOs were connected to an access point and were still running salut" is a dup of 5908.
list of laptops connected to jabber
When connecting to a jabber server, how can we check the list of XOs that are seen in the mesh view, or the analyze activity? Is checking the gabble log the only way? What records in the log indicate arrival or departure? When testing with 50 or 100 XOs connected it is often impractical to detect missing icons, and a commandline tool would be of more help.
Re: New update.1 build 699
Chris, what does inhibit-idle-suspend do?

On Wed, Mar 12, 2008 at 11:45 PM, Build Announcer v2 [EMAIL PROTECTED] wrote:
> http://pilgrim.laptop.org/~pilgrim/olpc/streams/update.1/build699
>
> Changes in build 699 from build: 698
>
> Size delta: -1.31M
>
> -kernel 2.6.22-20080304.1.olpc.914fce4d9a8baf3
> +kernel 2.6.22-20080312.2.olpc.f3687aa7e09fd65
> -ohm 0.1.1-6.10.20080119git.fc7
> +ohm 0.1.1-6.11.20080119git.fc7
> -sugar 0.75.13-1.olpc2
> +sugar 0.75.14-1.olpc2
> -sugar-presence-service 0.75.1-1.olpc2
> +sugar-presence-service 0.75.2-1.olpc2
> -Read 44
> -Chat 35
> -Web 86
> -Write 55
> -Record 53
> -Paint 19
>
> --- Changes for ohm 0.1.1-6.11.20080119git.fc7 from 0.1.1-6.10.20080119git.fc7 ---
>   + touch /etc/ohm/inhibit-idle-suspend to allow sleep without idle.
>
> --- Changes for sugar 0.75.14-1.olpc2 from 0.75.13-1.olpc2 ---
>   + Fix #6671 #5933 #6405
>
> --
> This mail was automatically generated
> See http://dev.laptop.org/~rwh/announcer/update.1-pkgs.html for aggregate logs
> See http://dev.laptop.org/~rwh/announcer/joyride_vs_update1.html for a comparison
Re: New update.1 build 699
OK. Should we test automatic suspend by removing the file? Or don't we consider it a priority any more?

On Fri, Mar 14, 2008 at 12:05 AM, Chris Ball [EMAIL PROTECTED] wrote:
> Hi,
>
> > what does inhibit-idle-suspend do?
>
> It allows you to disable automatic idle suspend while keeping enabled the explicit suspend on power button press or lid close. Previously, there was only the inhibit-suspend file that inhibits both of the above.
>
> - Chris.
> --
> Chris Ball [EMAIL PROTECTED]
Re: Preparing the XOs for next week's test
Kim,

You can see the diffs here: http://dev.laptop.org/~rwh/announcer/joyride_vs_update1.html

You will see that there is a lot more new stuff in joyride than just a telepathy-salut update.

One thing we can do is use 693 and install the specific package, or make a new Update.1 build that includes it. But we should test it individually before putting it in the update.1 build.

Another thing we can do is have half the XOs with 1721 on one channel, and the other half with 693 on another channel.

Also, how about using B4s? Is there any effect on performance other than suspend/resume? I remember Ricardo saying there were hardware changes related to 4470. Ricardo, can you confirm this?

If we decide on the build by tonight, I can have all the XOs updated and ready by tomorrow.

On Sat, Feb 23, 2008 at 5:08 PM, Kim Quirk [EMAIL PROTECTED] wrote:
> Agreed that Read sharing is the highest priority application. My concern is if there are a lot of other things in joyride, then it will take us a long time to get a release out based on joyride. If we pull the fix for Read back into update.1, (and other things that we find next week), then we won't waste time on testing or finding bugs in joyride.
>
> Does anyone have a good feel for the differences between today's Update.1 and joyride 1721 -- or can someone list the diffs so we can make a decision?
>
> Kim
>
> On Sat, Feb 23, 2008 at 8:13 AM, Walter Bender [EMAIL PROTECTED] wrote:
> > Read sharing is a critical feature. Please do test it.
> >
> > -walter
> >
> > On 2/23/08, Morgan Collett [EMAIL PROTECTED] wrote:
> > > Giannis Galanis wrote:
> > > > 2. I will try to update all of them with the build we will agree to initially test with. This would be 693/D13? There is a new version of telepathy-salut in 1721, which apparently only fixes smth related to stream tube flush(which i dont know what it is). I dont believe it important to our test. Other than that Update.1 i think should be ok.
> > >
> > > As I said in reply to Chris's mail, the salut fix is for Read in #6483. If you are going to test sharing PDFs in Read, please use Joyride-1721 otherwise there is a high chance it won't work at all under any conditions.
> > >
> > > Morgan
> >
> > --
> > Walter Bender
> > One Laptop per Child
> > http://laptop.org
Re: Preparing the XOs for next week's test
Kim,

The suspend/resume problems will show up even with 2 XOs. This cannot be fixed at the moment. As Michalis mentioned in another email, testing S/R and mesh scalability together will just break the test. We have to test them individually.

On Sun, Feb 24, 2008 at 10:35 PM, Kim Quirk [EMAIL PROTECTED] wrote:
> Right. But the suspend and resume problems we've seen with the mesh and sharing can be recreated on a relatively small number of laptops (10). So we will either fix the problems, or turn off suspend in order to test for scaling issues above 50.
>
> We have 50 MPs for next weeks testing. So we should be ok.
>
> Kim
>
> On Sun, Feb 24, 2008 at 10:12 PM, John Gilmore [EMAIL PROTECTED] wrote:
> > > Ricardo, if you think there is anything else different with B4s in regards to network performance, please tell us.
> >
> > I'm not aware of anything in hardware.
> >
> > They don't suspend. So if MP's have networking trouble that happens when a laptop suspends, the trouble won't happen on a B4.
> >
> > John
Re: Suspended time vs Resumed time in an idle XO
In case you need it, I am resending the script because it was blocked.

Attachment: suspendtime (binary data)
Suspended time vs Resumed time in an idle XO
I have noticed that an idle machine will resume for some time and then suspend again, several times, for no apparent reason.

I wrote a simple script that checks every 1 sec whether the machine is suspended or not. It gives a timeline of suspended times and resumed times.

I left an XO completely idle overnight for 12 h. The results were:

* It resumed about 80 times
* It was resuming every 1 min to 10 min
* The total suspended time was 90%

Do these numbers seem normal?

Chris was mentioning the other day the additional power consumed to resume the XO. I assume that resuming/suspending on a regular basis is not very power efficient.

Also, this script made it easy to examine what happens to timeouts that are interrupted by suspends. The result is that the suspended time extends the timeout. The timeout does not expire relative to absolute time, but relative to the time the CPU is alive. So if a 10 min timeout is interrupted by a 2 min suspend, the timeout will expire 12 min after the point it was started.

Scott, does this agree with what you expected?

Attachment: suspendtime.sh (Bourne shell script)
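The timeout observation above can be worked through with a short sketch. This is not the attached suspendtime.sh; it only models a timer that counts time while the CPU is alive, with a hypothetical function name and the assumption that suspend intervals do not overlap and are given in wall-clock minutes since the timer was armed.

```python
def wall_clock_expiry(timeout, suspends):
    """Wall-clock time at which a timer counting only awake time expires.

    timeout  -- awake time the timer needs before it fires
    suspends -- list of (start, duration) pairs in wall-clock time since the
                timer was armed; assumed not to overlap each other
    """
    now = 0.0            # wall-clock time elapsed so far
    remaining = timeout  # awake time still needed
    for start, duration in sorted(suspends):
        awake = start - now
        if awake >= remaining:
            break  # the timer fires before this suspend begins
        remaining -= awake
        now = start + duration  # the timer makes no progress while suspended
    return now + remaining


if __name__ == "__main__":
    # A 10 min timeout interrupted by one 2 min suspend: prints 12.0
    print(wall_clock_expiry(10, [(4, 2)]))
```

The same model shows why an entry that expires by comparing a stored timestamp against the wall clock behaves differently: it would still expire 10 minutes after it was set, regardless of suspends.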
Re: Suspended time vs Resumed time in an idle XO
It is possible indeed. We have to check. I will try to test how it works with avahi.

On Fri, Feb 22, 2008 at 3:25 PM, Ricardo Carrano [EMAIL PROTECTED] wrote:
> Yanni,
>
> But we should note that not everything that expires does so because of a timer. A cache entry may have an associated timestamp and expire at timestamp + ttl.
>
> > I have noticed that an idle machine will resume for some time, and suspend again, several times for no reaso
> >
> > The result is that the suspended time extends the timeout. The timeout does not expire relative to the absolute time, but the time the CPU is alive. So if a 10min timeout is interrupted by a 2min suspend, the timeout will expire 12min after the point it was executed. Scott, does this agree with what you expected?
Preparing the XOs for next week's test
A couple of things for next week's test.

1. We have about 45 XOs in the conference room, and we can bring that up to 80 by collecting other XOs in the office. Do you think this is enough?

2. I will try to update all of them with the build we agree to test with initially. Would this be 693/D13? There is a new version of telepathy-salut in 1721, which apparently only fixes something related to stream tube flush (I don't know what that is). I don't believe it is important to our test. Other than that, I think Update.1 should be OK.

3. I can also disable suspend/resume on all of them in case we decide we don't want it enabled. Doing this in advance will save a lot of time on Monday.

If there is anything you think might be useful to prepare in advance, let me know!
Re: How XO's know XS
When the XO connects to a mesh channel, it sends a specific request for a School server. If it receives a reply, it knows there is a School server somewhere in the mesh, and it will then attempt to connect to it. If no reply is received within a certain timeout, the XO will connect to simple mesh.

On Wed, Feb 20, 2008 at 5:31 PM, Shikhar [EMAIL PROTECTED] wrote:
> I was wondering how an individual XO identifies a school server. On the wiki I see that '...When a laptop is activated, it is associated in some way (TBD) with a school server.' (http://wiki.laptop.org/go/XS_Server_Services#Security_and_Identity)
>
> Best
> Shikhar
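The decision described in the reply above can be sketched as follows. This is only an illustration of "wait for a reply, fall back on timeout", not the XO's actual implementation: the queue standing in for the network, the function name, the 10 second default and the sample address are all assumptions made for the example.

```python
import queue


def choose_mesh_configuration(replies, timeout_s=10.0):
    """Wait for a school server reply and pick the configuration.

    `replies` is a queue.Queue standing in for whatever delivers answers to
    the school server request (hypothetical). Returns a pair of
    (configuration name, reply or None).
    """
    try:
        reply = replies.get(timeout=timeout_s)
    except queue.Empty:
        # Nobody answered within the timeout: connect to simple mesh.
        return ("simple mesh", None)
    # A reply means a school server exists somewhere in the mesh.
    return ("school mesh", reply)


if __name__ == "__main__":
    inbox = queue.Queue()
    inbox.put("172.18.0.1")  # made-up reply payload
    print(choose_mesh_configuration(inbox, timeout_s=0.1))
    print(choose_mesh_configuration(queue.Queue(), timeout_s=0.1))
```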
Re: Salut and Suspend/Resume issues
On Feb 19, 2008 10:13 AM, Ricardo Carrano [EMAIL PROTECTED] wrote:
> > I was asking whether it would help to have the wireless module wake us on multicast packets instead of only unicast. Are you saying that it would?
>
> It seems so, though it would, as John points out, make resumes far more constant. It seems we have to find a creative way out of this tough choice (automated suspend vs mesh) or face it.
>
> > > Avahi entries will expire after some time. Suspend will prevent it to update its cache.
> >
> > Yani's bug report (#6467) suggests that Avahi entries often expire immediately upon resume: After the XO resumes (probably after beinng suspended for several minutes) all the icons in the mesh view vanish, except the mesh circles.
>
> I read this as the avahi-cache expiring its entries. Yanni can you put timeframes on this? Could check how long does it take to expiry an entry (TO) and then check if:
> Suspend time > TO - all entries vanish
> Suspend time < TO - no entries vanish
> Suspend time ~ TO - some entries vanish

There are 2 cases where icons vanish due to suspend.

1st: The moment you resume (it generally happens after long suspends), all icons vanish instantly (APs/XOs). This bug (#6467) suggests that sugar has a problem with suspend/resume. The icons slowly reappear. I assume that if the avahi peer list is intact, all XOs return.

2nd: The avahi list sometimes loses some or all of the peers at resume. This is also under #6467, but it seems technically different. One possible explanation could be that during the suspend the XO resumed several times without my noticing! Within these time frames it realized that the other suspended XOs were gone, so it cleared its cache. Then, when I resumed it myself, I observed that the cache was clean!

Now, regarding the timeouts of avahi. This is a 3rd thing. When an XO leaves the channel we have 4 states:

   mm:ss
1. 00:00 XO leaves the channel (manually, or it suspended)
2. 10:00 Avahi notices the XO left, and reports it as failed
3. 30:00 Icon disappears in the mesh view
4. 60:00 Avahi cache is cleared

Additionally there is a bug (#5501) according to which, if a NEW XO arrives between states 2 and 3, then ALL failed avahi peers are instantly cleared and the corresponding icons vanish.

So, the 3rd case is the following: assume a mesh has e.g. 20 XOs, and I use my XO so that it doesn't suspend, but the other 19 are suspended. If a new XO arrives after 10 min, then all 19 XOs instantly vanish from the mesh.

So the TO time is between 10 and 30 min, but closer to 10 min if many XOs suspend/resume. So if the resume time is < 10 min, everything is fine!!

What I don't know is whether an XO, when it resumes, sends any avahi packet to notify its presence/return. Because if it doesn't, then the XO won't exist in the others' cache lists, so the others won't search for it. Sjoerd, can you answer this? This would explain why after resume some XOs take too long to see each other again.

If you combine this with the 2nd case, you will see that in the natural case, where XOs are resumed by their users at random points in time, they will all clear their caches unless they resume concurrently. So in the end, all will have empty caches!!

> > Thanks,
> >
> > - Chris.
> > --
> > Chris Ball [EMAIL PROTECTED]
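The four states and the #5501 behaviour described above can be written down as a small lookup. The 10/30/60 minute thresholds are the ones reported in this message; the function name and status strings are hypothetical, and this is a model of the observations, not of Avahi's or the presence service's real logic.

```python
NOTICED_FAILED_MIN = 10  # Avahi reports the departed XO as failed
ICON_REMOVED_MIN = 30    # icon disappears from the mesh view
CACHE_CLEARED_MIN = 60   # Avahi cache entry is cleared


def peer_status(elapsed_min, new_arrival_min=None):
    """Observed status of a departed XO, `elapsed_min` minutes after it left.

    `new_arrival_min` is the time (same clock) at which a NEW XO joined,
    if any. Per bug #5501, an arrival between states 2 and 3 clears all
    failed peers immediately.
    """
    if (new_arrival_min is not None
            and NOTICED_FAILED_MIN <= new_arrival_min < ICON_REMOVED_MIN
            and elapsed_min >= new_arrival_min):
        return "cleared early (#5501)"
    if elapsed_min < NOTICED_FAILED_MIN:
        return "present"
    if elapsed_min < ICON_REMOVED_MIN:
        return "failed, icon still shown"
    if elapsed_min < CACHE_CLEARED_MIN:
        return "icon gone, still cached"
    return "cache cleared"


if __name__ == "__main__":
    for minute in (5, 12, 45, 60):
        print(minute, peer_status(minute))
    # A new XO arriving at minute 11 wipes the failed peer at once.
    print(12, peer_status(12, new_arrival_min=11))
```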
Re: Salut and Suspend/Resume issues
For the protocol to be healthy, not only do you have to wake up every 10 min to send your request, but you also have to be awake to receive the others' requests. These also come every 10 min, but at different offsets. That's why I believe the only way would be to have 10 off - 10 on.

Still, due to bug 5501, if you miss a single request you are prone to be deleted right away. So 10 off - 10 on might not work either.

In fact, although the requests are sent every 10 min, the icon will hold for 30 min in total until it is deleted. Bug 5501, however, will delete the entry if a new host arrives within that timeframe.

On Feb 19, 2008 1:19 PM, Benjamin M. Schwartz [EMAIL PROTECTED] wrote:
> On Tue, 2008-02-19 at 13:11 -0500, Giannis Galanis wrote:
> > The wakeup required is T minutes for every T minutes.
> >
> > Actually you would need to be awake for T minutes and suspended for T minutes to be sure u are ok. So for T=10min, as in this case: 9off, 11on, 9 off, 11on
> >
> > but this is not very effective in terms of suspend/resume
>
> I meant to imply that this would work only if the wireless hardware wakes up the system for every broadcast.
>
> --Ben
Re: Salut and Suspend/Resume issues
The list expires in 10-30 min. But we can't wait 30 min before suspending; it is way too long.

On Feb 19, 2008 11:37 AM, Ricardo Carrano [EMAIL PROTECTED] wrote:
> Yanni,
>
> As I posted in the bug, I believe that you are observing the entries on the avahi cache expiring. So, your first scenario would happen when the suspend time is longer than the time it takes for all entries to expire. The second scenario would happen when the suspend time is not long enough to make all cached entries to go away.

Oh, I see what you mean. But I think both cases occur when the suspend time is longer than the time to expire.
The first is a UI effect, and might have no relation to salut, but to the mesh view in general.
The second is an avahi effect: the avahi cache is changed.
Both happen in long suspends.

> And the third scenario seems related to previous reports you've made on the Xmas tree effect, so not related to suspend/resume.

The xmas tree effect appears when XOs leave the connection while others return. Suspend/resume enhances this effect dramatically, because within 1-2 min everyone goes away, and they return at random times according to when they resume.

In my suspend-salut tests, the xmas tree effect (although NOT related to suspend/resume) affects salut a lot more than the other 2 scenarios.

My point is that we must fix it anyway. But especially now!! What do you think?

I have 2 questions that will help (me) understand a lot about the situation:

1. When an XO resumes, does it send any notification via avahi that it is back? Because if it doesn't, then other XOs that have cleared it from their lists will never search for it.

2. Every XO scans the network every 10 min, in multicast packets, to check whether its avahi peers are alive. Do these packets include the addresses of the peers/targets? I think they do, unless I am very confused. Couldn't we wake/resume the target XO when it receives these specific packets?
we need to do some sniffing On Feb 19, 2008 1:13 PM, Giannis Galanis [EMAIL PROTECTED] wrote: On Feb 19, 2008 10:13 AM, Ricardo Carrano [EMAIL PROTECTED] wrote: I was asking whether it would help to have the wireless module wake us on multicast packets instead of only unicast. Are you saying that it would? It seems so, though it would, as John points out, make resumes far more constant. It seems we have to find a creative way out of this tough choice (automated suspend vs mesh) or face it. Avahi entries will expire after some time. Suspend will prevent it to update its cache. Yani's bug report (#6467) suggests that Avahi entries often expire immediately upon resume: After the XO resumes (probably after beinng suspended for several minutes) all the icons in the mesh view vanish, except the mesh circles. I read this as the avahi-cache expiring its entries. Yanni can you put timeframes on this? Could check how long does it take to expiry an entry (TO) and then check if: Suspend time TO - all entries vanish Suspend time TO - no entries vanish Supens time ~ TO - some entries vanish There as 2 cases where icons vanish due to suspend. 1st: The moment you resume(it generally happens after long suspends), all icons vanish instantly(APs/XOs). This bug (#6467) suggests that sugar has a problem with suspend resume. The icons slowly reappear. I assume that if the avahi peer list is intact that all XOs return. 2nd: The avahi list smtimes looses some or all of the peers at resume. This is also under 6467, but it seems technicaly different. One possible explanation could be that during suspend th XO resumes several times, but i didnt notice it! And within this time frames it realized that the other suspended XOs are gone, so it cleared its cache. Now when I resumed it myself, I observed that the cache is clean!! Now, regarding the timeouts of avahi. This is a 3rd thing: When an XO leaves the channel we have 4 states: mm:ss 1. 00:00 XO leave the channel(manually/or ti suspended) 2. 
10:00 Avahi notices teh XO left, and reports it as failed 3. 30:00 Icon dissappears in the mesh view 4. 60:00 Avahi cache is cleared Additionally there is a bug(#5501) according to which, is a NEW XO arrives between states 2 and 3, then instantly ALL failed avahi peers are cleared and the corresponding icons vanish. So, the 3rd case is the following: Assume a mesh has e.g. 20 XOs, and I use my XO so it doesnt suspend, but the rest 19 of them are suspended. If in 10mins a new XO arrives, then all the 19 XOs instantly vanish from the mesh. So the TO time is between 10-30min... but closer to 10min if many XOs suspend/resume So if resume time 10min everything is fine!! What i dont know is when an XO resumes if it sends any avahi packet no notify tis presence/return. Because if it doesnt, then the XO
Re: Salut and Suspend/Resume issues
On Feb 19, 2008 12:55 PM, Benjamin M. Schwartz [EMAIL PROTECTED] wrote:
On Tue, 2008-02-19 at 12:29 -0500, Giannis Galanis wrote: The way avahi works is that every several minutes (a predetermined timeout) each host sends a multicast request for all peers in its list. Then all peers receiving this request send a multicast reply. The packets are multicast because the mesh is mobile/dynamic, so we don't know where the target is or which route is ideal.
The problem is that with a timeout of T minutes and N laptops, there is a wakeup required every T/N minutes, on average?
The wakeup required is T minutes for every T minutes. Actually you would need to be awake for T minutes and suspended for T minutes to be sure you are OK. So for T=10min, as in this case: 9 off, 11 on, 9 off, 11 on. But this is not very effective in terms of suspend/resume.
Based on your description, it sounds as if this could be fixed by a small change in Avahi's timeout behavior. If I reach the timeout, I send a broadcast saying "Everyone, what's your status?". In reply, all users send a broadcast "My status is X". All peers receive all of these broadcasts, and reset their timers to zero. In this way, all laptops wake up together once every T minutes. Surely the solution is not this simple...
The problem is that the others won't know YOUR status. I think the confirmation of status is not announced/beaconed, but requested first. But someone from collabora must confirm this.
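To make the T/N question concrete, here is a minimal sketch that compares independent per-laptop timers with the proposed "everyone resets on any broadcast" behaviour. It models only the timer arithmetic; whether avahi can actually be changed this way is exactly what is left open above. The function name and parameters are hypothetical.

```python
def wake_times(offsets, period=10, horizon=40, synchronized=False):
    """Return the minutes at which at least one laptop's timer fires.

    offsets      -- initial firing time of each laptop's timer (minutes)
    synchronized -- if True, every laptop resets its timer whenever any
                    timer fires (the proposal); if False, each laptop
                    only reschedules its own timer.
    """
    timers = list(offsets)
    wakes = []
    while min(timers) < horizon:
        now = min(timers)
        wakes.append(now)
        if synchronized:
            timers = [now + period] * len(timers)
        else:
            timers = [t + period if t == now else t for t in timers]
    return wakes


if __name__ == "__main__":
    print(wake_times([0, 3, 7], horizon=20))                     # independent
    print(wake_times([0, 3, 7], horizon=20, synchronized=True))  # proposal
```

With three laptops and T=10, the independent case needs a wakeup roughly every T/N minutes, while the synchronized case needs one per T. The sketch says nothing about the objection that status is requested rather than announced.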
Re: Salut and Suspend/Resume issues
On Feb 19, 2008 2:48 PM, Ricardo Carrano [EMAIL PROTECTED] wrote: Yanni, Timeout is a value, not a range. The effects brought by the timeout may manifest in a period (a range). Did a use it otherwise? Because of the effects of xmas tree, the timeout for a failed XO until it's icon is removed is 10-30min. I believe everyone will agree that 30 minutes is a long time to wait (and like Polychronis added) defeat the whole idea of a presence service. But, what I want to stress is that we are dealing with different issues here. I don't believe this 30 minutes or the xmas tree effect is related to suspend/resume. Those seem like bugs somewhere in the stack of software that support presence, while the suspend/resume issues are clearly a side effect of the multicast traffic not being heard by a suspended XO. There are way too many issues. Theses bugs (30min/xmas tree) enhance the effects of suspend/resume on the mesh. I believe that since we have the big test week coming, everyone must be aware of them, or else noone will interpret the results properly. The direct suspend/resume bugs are: 1. Why the mesh view empties after a long suspend, and how this affects the mesh view 2. Why some times the avahi cache is cleared after resume. Ricardo, do you have anwers to the questions I posted before? : 1. When a XO resumes, does it send any notification via avahi, that it is back? Because if it doesnt, then other XOs that have cleared it from their lists, they will never search for it. 2. Every scans the network every 10min, to check whether its avahi peers are alive, in multicast packets. Do these packets include the address of the peers/targets? I think they do, unless i am very confused. Couldn't we awake/resume the target XO when it receives these specific packets? On Feb 19, 2008 3:00 PM, Giannis Galanis [EMAIL PROTECTED] wrote: The list expires in 10min-30min. But we cant wait 30min before suspending, it is way too long. 
On Feb 19, 2008 11:37 AM, Ricardo Carrano [EMAIL PROTECTED] wrote: Yanni, As I posted in the bug, I believe that you are observing the entries on the avahi cache expiring. So, your first scenario would happen when the suspend time is longer than the time it takes for all entries to expire. The second scenario would happen when the suspend time is not long enough to make all cached entries to go away. Oh i see that you mean. But, i think both cases are when the suspend time is longer than time to expire. The first is UI effect, and might have no relation to salut, but to mesh view in general The second is an avahi effect, that the avahi cache is chagned Both, are in long suspends And the third scenario seems related to previous reports you've made on the Xmas tree effect, so not related to suspend/resume. The xmas tree effect appears when XOs leave connection, while others return. Suspend/resume enhances this effect dramatically, because in 1-2min everyone goes away, and they return at random time according to when they resume. In my suspend-salut tests , the xmas tree effect(although NOT related to suspend/resume), it affects salut alot more then the other 2 scenarios My point is that we must fix it anyway. But especially now!! What do you think? I have 2 questions that will help (me) understand alot about the situation: 1. When a XO resumes, does it send any notification via avahi, that it is back? Because if it doesnt, then other XOs that have cleared it from their lists, they will never search for it. 2. Every scans the network every 10min, to check whether its avahi peers are alive, in multicast packets. Do these packets include the address of the peers/targets? I think they do, unless i am very confused. Couldn't we awake/resume the target XO when it receives these specific packets? 
we need to do some sniffing On Feb 19, 2008 1:13 PM, Giannis Galanis [EMAIL PROTECTED] wrote: On Feb 19, 2008 10:13 AM, Ricardo Carrano [EMAIL PROTECTED] wrote: I was asking whether it would help to have the wireless module wake us on multicast packets instead of only unicast. Are you saying that it would? It seems so, though it would, as John points out, make resumes far more constant. It seems we have to find a creative way out of this tough choice (automated suspend vs mesh) or face it. Avahi entries will expire after some time. Suspend will prevent it to update its cache. Yani's bug report (#6467) suggests that Avahi entries often expire immediately upon resume: After the XO resumes (probably after beinng suspended for several minutes) all the icons in the mesh view vanish, except the mesh circles. I read this as the avahi-cache expiring its entries. Yanni can you put timeframes on this? Could check how long does it take to expiry
Re: Salut and Suspend/Resume issues
On Feb 19, 2008 4:10 PM, Ricardo Carrano [EMAIL PROTECTED] wrote: Yanni,
Did I use it otherwise? Because of the effects of the xmas tree, the timeout for a failed XO until its icon is removed is 10-30 min.
I am talking about the time it takes for an avahi entry to expire. From what you said, it is 10 minutes.
Oh OK. It is not 10 min. Avahi checks every 10 min that its peers are alive. An active entry will never expire. A failed entry will naturally expire after an additional 20 min (30 in total). BUT, it can expire instantly due to the xmas tree bug (5501).
Ricardo, do you have answers to the questions I posted before?
Let's see:
1. When an XO resumes, does it send any notification via avahi that it is back? Because if it doesn't, then other XOs that have cleared it from their lists will never search for it.
I believe there is no "I am back" notification different from the normal way presence information is exchanged by the protocol.
If not, then we have a problem. The other XOs will never know it is here, so they will never search for it. I think the "are you alive" request is destination specific. I will do some sniffing and find out.
2. Every XO scans the network every 10 min, in multicast packets, to check whether its avahi peers are alive. Do these packets include the address of the peers/targets? I think they do, unless I am very confused. Couldn't we wake/resume the target XO when it receives these specific packets?
That's the point. mDNS is multicast, and the XOs, when suspended, don't listen to multicast frames.
The suspended XO can be set up to wake up on multicast packets. This is technically possible afaik.
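The 10-30 min window described above can be written down as a small calculation. This is a sketch under stated assumptions, not the avahi or presence-service logic: it assumes the peer is marked failed exactly 10 min after it goes silent (the worst case), expires naturally at 30 min, and is dropped immediately if a new peer arrives in between while bug 5501 is present. All names are hypothetical.

```python
CHECK_INTERVAL = 10   # minutes between avahi liveness checks (from the thread)
NATURAL_EXPIRY = 30   # minutes until a failed entry expires on its own


def icon_removed_at(new_peer_arrivals, buggy=True):
    """Minutes after a peer goes silent at which its icon disappears.

    new_peer_arrivals -- minutes (relative to the peer going silent) at
                         which other, new peers join the mesh
    buggy             -- model bug 5501: a new arrival while the entry is
                         in the failed state removes it immediately
    """
    if buggy:
        for t in sorted(new_peer_arrivals):
            if CHECK_INTERVAL <= t < NATURAL_EXPIRY:
                return t
    return NATURAL_EXPIRY


if __name__ == "__main__":
    print(icon_removed_at([]))            # nobody arrives: natural expiry
    print(icon_removed_at([5, 12, 25]))   # arrival at 12 min triggers removal
```

In a mesh where XOs keep resuming at random times, arrivals shortly after the 10 min mark are likely, which would explain an effective timeout closer to 10 min than to 30.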
Salut and Suspend/Resume issues
There are a couple of important issues/bugs regarding Salut and Suspend/Resume.
FIRST, there is a sugar issue (or at least it seems so). When an XO resumes after a long suspend, all icons (APs and XOs, but not the meshes) instantly vanish (#6467). Then they slowly reappear. With the APs the situation is pretty straightforward, but with the XOs we have several cases:
- all XOs in the mesh return almost instantly
- all or some XOs return slowly, one by one
- nothing returns, and the avahi peer list is empty (#6498)
It seems that although suspend should keep the previous situation frozen, the avahi peer list is in fact affected.
SECOND, we have a network issue, which amounts to a war between suspend/resume and avahi/salut. Suspend is interrupted only by unicast packets, but salut/avahi rely on multicast packets. The result is that when an XO that appears in the mesh view is suspended, avahi treats it just as if it had left the mesh.
- When an XO is being used (not suspended), all other suspended XOs in the mesh will start failing one by one
- From the moment an XO is suspended, its icon will vanish in about 10-30 min (#6282)
- If new XOs join the mesh within this time, then the icon will vanish instantly!! (#5501)
- If several removed XOs gradually start to resume, their icons will start returning
As you can see, the XOs have very little chance to even see each other.
RESULT: A mesh of several XOs will avoid icons flashing here and there ONLY if no XO has been idle for more than 10 min, which is rather unlikely. Considering the effects of the FIRST issue, you would practically have to restart sugar or switch channel back and forth to return to your original status.
Salut/avahi are very sluggish in handling failed connections, and suspend/resume aggravates this effect.
Re: New joyride build 1687
Perhaps it is not important, but the changes shown from olpc-utils 0.67 to 0.68 are wrong. They include things that were added in previous versions ages ago. On Feb 13, 2008 2:24 AM, Build Announcer v2 [EMAIL PROTECTED] wrote: http://xs-dev.laptop.org/~cscott/olpc/streams/joyride/build1687 Changes in build 1687 from build: 1686 Size delta: 0.00M -rainbow 0.7.9-1.olpc2 +rainbow 0.7.10-1.olpc2 -xterm 231-1.fc7 +xterm 232-1.fc7 -krb5-libs 1.6.1-4.fc7 +krb5-libs 1.6.1-6.fc7 -olpc-utils 0.67-1.olpc2 +olpc-utils 0.68-1.olpc2 -openldap 2.3.34-6.fc7 +openldap 2.3.34-7.fc7 -Terminal 8 +Terminal 9 -olpcsudo 1.3-0 --- Changes for rainbow 0.7.10-1.olpc2 from 0.7.9-1.olpc2 --- + Symlink ~/{.macromedia,.adobe} -> ~/.instance to ease --- Changes for xterm 232-1.fc7 from 231-1.fc7 --- + update to 232 --- Changes for olpc-utils 0.68-1.olpc2 from 0.67-1.olpc2 --- + Import olpc-netstatus 0.4 from Yanni + dlo#5746: Do not try to rename msh0. + dlo#5153: Fix sysfs path to rtap + Use GPLv2+ license tag as nothing in this package is GPLv2-only. + Make preview cleaner robust in the case of a missing datastore + Do not bother running journal cleaner on fresh installations (saves time on first boot) + Add a silly TODO list + Bump revision to 0.65 + Import olpc-netlog-0.3 and olpc-netstatus-0.3 + Add 'clean-previews' and incorporate it into olpc-configure. + 'become_root' script merged upstream. + Update License field to GPLv2 in order to match the COPYING file. + Install a simple 'become_root' script to ease dlo#5537. + Rename RPMDIST to DISTVER and DISTVAR to DIST + dlo#5626: Fix permissions in /home/bernie. + Insert extra spacing at the top for cosmetic reasons + Spacing fixes + Add missing cron job for olpc-pwr-prof + Power profile scripts + Construct Rainbow's spool dir if it doesn't exist - #5033 + Ensure /security has reasonable permissions. + Depend on /usr/bin/find + Remove files in $OLPC_HOME before creating them. 
+ Add missing dependencies. + Use /ofw/openprom/model instead of olpc-bios-sig + Add more missing dependencies + Remove stray reference to olpc-bios-sig.c. + Pass absolute paths to rpmbuild + Add back sbin dirs to unprivileged users PATH + Invoke rainbow-replay-spool + Remove stupid 'exit 0' in zzz_olpc.sh that makes bash *exit* rather than skip the scriptlet + Depend on tcpdump for olpc-netcapture. + Fix version replacement in spec file + Merge olpc-netstatus 0.2 + Merge olpc-netlog 0.2 + Really bump revision + Add a couple of new languages + Add missing files + Ensure correct keyboard is loaded even on first boot + Don't create /root/.i18n as it makes us loose the boot time optimization + Add code to help us improve boot time + Add VMware configuration. + Fix http://dev.laptop.org/ticket/5320 + Display motd in profile, not through /bin/login + Simplyfy setxkb invocation + Add ASCII art for motd (need more translations) + More languages for the motd + Replace fake input driver hack with proper config option. + Fix http://dev.laptop.org/ticket/5114 + Simplify test for Geode + Reindent with TABs to match other init scripts + Remove check for A-test boards (the following code is harmelss) + Be a little more verbose on progress. + Fix https://dev.laptop.org/ticket/5217: Update library index + Only run checks on start + Use $OLPC_HOME consistently + Only run hardware configuration on startup. + Fix numeric test on empty flag file. + Bump revision + Add olpc-netcapture to %files + Fix olpc#5195: Console font too small when using pretty boot. 
+ Bump revision + Add autoconf check for PAM + Update spec file + Merge branch 'master' of ssh://[EMAIL PROTECTED]/git/projects/olpc-utils + Automatically push to origin on bumprev + Fix bumprev rule + Bump revision + Reorganize variables + Fix http://dev.laptop.org/ticket/4928 + Fix permissions on /home/olpc + Bump revision + Pacify automake's portability warnings + Update spec file + Even more aggressive packaging automation + Add script to import srpms in Fedora. + Merge commit 'cscott/master' + Explicitly strip NUL from mfg tags + Add cvs-import.sh to EXTRADIST + Fix https://dev.laptop.org/ticket/4762 + Bump revision + Separate out configuration done to /home and /. + Create /home/devkey.html, which can be used to request a developer key. + Automate the release process a bit more. + Approximate XOs DPI on emulators. + ReTAB. + Automate specfile generation some more + Ignore a few more generated files. + Set i18n settings from the new manufacturing data tags + Go back to starting sugar with /usr/bin/sugar. + Bump revision + Add bumprev rule + Merge branch 'master' of ssh://[EMAIL PROTECTED]/git/projects/olpc-utils + Add rule to generate RPM changelog. + Add support for X 1.3 + Bump revision
Re: [Server-devel] Mesh Portal Question
I believe that in the blinding table of XO-1 you have to include the anycast address C027C027C027. I think the last digits of the address are custom, but I am not sure. Still, in your case, you should only blind XO-2 to XO-1, and don't forget to invert the blinding table.
On Feb 11, 2008 11:41 AM, John Watlington [EMAIL PROTECTED] wrote: Waqas, Are you explicitly blinding the laptops to force that network configuration? Can XO-2 talk to XO-1 fine? Can XO-1 talk to the server? We do this regularly; it has been tested and works. John
On Feb 9, 2008, at 6:45 AM, Waqas Toor wrote: Hello All, Here is my scenario: XS XO-1 --- XO-2 I am unable to access the school server from XO-2 via the XO-1 route. I have build 656 on my XOs and server build 150 on my server, with 1 active antenna. What could be the problem? How can I access the XS across different hops of XOs, given that the automatic configuration didn't create a route to the server? Regards -- Waqas Toor member olpc Pakistan team
Re: Salut/avahi/meshview issues
2. It takes up to 10min for avahi even to detect the inactivity of a peer. i.e. If an XOs switches channels, for up to 10min avahi wont even know(it used to be 1-2min). Is this with or without the patch from bug #6162 ? If without, then the time it takes avahi to discover it should still be 2 mintues. I'd like to how you test this. Oh and please file a bug, so we can actually track these issues. The patch 6162, as well as the patch of 5501 are in included in the 689/690 that I am testing. So this indeed explains the 10minutes(Actually i just found out of this bug). 3. It will take a total of about 30min for the XO to vanish from the mesh view(this is tooo long!) Again, file a bug. Needed info here is if there is a time difference between when avahi marks something as removed, when salut sends out the removed signal and when it actually disappears from the mesh view. This is now filed as 6282, with all dbusmonitor/avahibrowse logs to compare. This case is also an example of a avahi/mesh view inconsistency. Icons disappear form the mesh view/ but remain for about 1h longer in the avahi cache But these details should continue in trac anyway. 4. Avahi/mesh view respond independently. The situation used to be that when an entry dissappeared in avahi, it disappeared in mesh view, and the same when new peers arrive. This relation was very consistent. However, now we have the following cases: a) an XO will vanish from the mesh view, but remain indefinitely in the avahi cache as failed to resolve b) sometimes avahi shows alot less peers than the mesh view. The extra peers in the mesh view are definitely active since they properly respond to activity joining/sharing. c)sometimes avahi included more active peers than the mesh view. does anyone know why this is happening? Is it a bug? I have logs, if needed, that compare avahi-browse with timestamped dbus-monitor logs, that indicate the inconsistencies. 
Well you all list them as undesired behaviour, so i would say they're bugs. 5. An important improvement is that peers will not generally fail alot on their own. So, if many XOs join a mesh channel, and noone goes away, the will not start failing. This used to be a common effect after 4-5 XOs. However, i noticed once in 1cc, 61 active XOs in the mesh view! When you say salut, you actually mean avahi. It would help if you could be clear on what you mean :) This improvement is probably caused by the fix in #5501. I mean avahi indeed. In the past these two were very tight to each other. And i believe that the only direct way to examine salut is by checking the buddy list in the Analyze activity. I remember Ricardo had an interesting case were the buddy list included plenty of XOs, which were also properly sharing in the mesh view, but the avahi list was empty. Does this seem possible? (unfortunately no log at the moment) Anyway for all the bugs you should have filed instead of sending this mail, i will need tcpdump logs, avahi logs, salut logs and if possible meshview logs indicating when contacts are removed from the mesh from a machine where you say the behaviour. Preferably with timestamps I updated the trac with logs/tcpdumps/dbusmon/screenshots...enjoy! The reason i send first this email before filing tons of bugs is because i though it was necessary to describe the big picture, and the current status of salut. And also to avoid duplicate bugs, or bugs that are in fact intentional mods. This conversation was unfortunately directed towards other issues(wireless difficulties is a sensitive subject at olpc!), but in fact its purpose was to determine some very specific bugs in salut, that have nothing to do at the point with scalability or robustness of the protocol. When these are resolved, we can proceed with scalability, for which i am very confident. I believe our current salut/avahi issues are described in the following points: 1. 
I was under the impression that when a peer switches channels it sends a goodbye signal. And in fact only anorthodoxically removed peers(after crashes/poweroffs by pressing the button etc) would delay to disappear from mesh views. The 10min TTL is not unreasonable, but it should only be used for a routine check. In fact peers that leave/arrive should inform the mesh instantly. In that case the 10min TLL will only affect only the mesh points with noisy links that their goodbye signals will get lost. And these connections are less priority anyway. Also we could send 2/3 goodbye signals to ensure delivery. 2. We should definitely decrease the timeout window between a lost peer being detected, and the actual disappearance from the mesh view. This used to be 10min, now it is 20min, but really, to my experience, if a peer is for more than 1-2min away he aint coming back. 3. Should we make the above TTL and timeout to be user specific, or custom anyway?. Will there be a problem if two
Re: Salut/avahi/meshview issues
On Jan 31, 2008 10:54 AM, Ricardo Carrano [EMAIL PROTECTED] wrote:
I believe our current salut/avahi issues are described in the following points:
1. I was under the impression that when a peer switches channels it sends a goodbye signal, and that only peers removed abnormally (after crashes, power-offs by pressing the button, etc.) would be slow to disappear from mesh views. The 10 min TTL is not unreasonable, but it should only be used for a routine check. Peers that leave/arrive should inform the mesh instantly. In that case the 10 min TTL will only affect the mesh points with noisy links whose goodbye signals get lost, and these connections are lower priority anyway. Also, we could send 2-3 goodbye signals to ensure delivery.
Mm, it seems that some dbus signal, or the respective processing by the PS, is missing. Is there an NM dbus signal when we change channels? This should be easy to determine.
It must be very easy for the PS to detect a channel change, or in any case when the XO leaves the channel. The point is whether avahi supports such notifications, so the other peers can instantly remove the entry.
2. We should definitely decrease the timeout window between a lost peer being detected and its actual disappearance from the mesh view. This used to be 10 min, now it is 20 min, but really, in my experience, if a peer is away for more than 1-2 min it isn't coming back.
From what you describe this does not seem related to the protocol itself, right? I believe it is important to achieve our goals without making the protocol more chatty.
This timeout is client specific, and doesn't affect the protocol itself at all. The reason this timeout exists (to my knowledge anyway) is that sometimes a peer seems undiscoverable, but in fact it is just the effect of a poor link, so the peer rejoins shortly after. The effect would be that XOs move around the mesh view. To avoid this, we wait for several minutes before actually removing the XO. In my opinion, the more we hide from the user, the more she gets confused. Keeping the icon in the mesh view while the connection is down just messes things up. I also remember there was the idea of keeping the lost icon in the mesh view but notifying the user somehow, like changing its outline to a dotted line or something. But this is a UI issue.
3. Should we make the above TTL and timeout user specific, or at least customizable? Will there be a problem if two XOs have different TTLs? I would assume that there won't. The idea is that it is a waste of our resources to try to find the ideal values of the TTL and timeout by asking the collabora team to fix, and fix again, whereas we can run the tests here in 1cc and find out ourselves which values suit us best. Is it easy to implement such a patch?
I believe it is useful to have some controls in order to help tune things. But not all of them need to be translated into user-friendly controls. I believe your question is how we could change this setting ourselves. Did I get it right?
Exactly. By no means do we need to have these controls user friendly. We only need the ability to tune them dynamically ourselves for testing and evaluation purposes.
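One way the "tune them ourselves for testing" idea could look is sketched below. This is purely illustrative: neither salut nor the presence service is known to read these settings, and the environment variable names `PS_PEER_TTL_MIN` and `PS_REMOVAL_GRACE_MIN`, the class, and the function are all hypothetical. The defaults are the 10 min check and 20 min removal window mentioned in the thread.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PresenceTimers:
    peer_ttl_min: float = 10.0       # routine liveness check interval
    removal_grace_min: float = 20.0  # wait between "failed" and icon removal


def load_timers(env=None):
    """Build timer settings, letting testers override the defaults.

    env -- mapping to read overrides from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    defaults = PresenceTimers()
    return PresenceTimers(
        peer_ttl_min=float(env.get("PS_PEER_TTL_MIN", defaults.peer_ttl_min)),
        removal_grace_min=float(
            env.get("PS_REMOVAL_GRACE_MIN", defaults.removal_grace_min)
        ),
    )


if __name__ == "__main__":
    # Example: keep the 10 min check but drop icons 2 min after a failure.
    print(load_timers({"PS_REMOVAL_GRACE_MIN": "2"}))
```

Because the removal window is client specific, a tester could in principle run XOs with different values side by side, as point 3 suggests.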
Re: jabber for non-wireless XO ?
Can you please specify which jabber server you tried to connect to, and which build you are running?
On Jan 20, 2008 7:42 PM, Mikus Grinbergs [EMAIL PROTECTED] wrote: I don't have any wireless. I do have a wired ethernet connection to a LAN (which in turn uses a proxy to reach the internet). Even when I specify in sugar-control-panel the name of a real server, my XO is not accessing jabber (the field in olpc-netstatus is shown blank). I believe my proxy can correctly pass requests for ports 5222-5223. Does telepathy work with a wired ethernet? Does it have a problem if the connection is through a proxy? mikus
Salut/avahi/meshview issues
I understand that salut is not very popular lately, since we are drifting mostly towards infra mode. Still, it is the preferable way for G1G1 laptops to talk to each other, since there is no SS, and the public jabber server is not guaranteed to be available and may become overcrowded in the future.
I have conducted several tests with a group of 9 XOs blinded with each other. The most important issue is the response of the mesh view when an XO leaves the mesh. The results were:
1. The xmas tree effect is still here, i.e. XOs occasionally vanish and reappear in different positions. This is because of the following: when the avahi cache includes several inactive/departed peers (reported as failed) and a new peer arrives, all the inactive peers vanish from the screen instantly (#5501). If their inactivity was temporary, they will reappear shortly in a different location. If, for example, 3-4 XOs are moved simultaneously (by user intervention) from ch6 to ch11 and then back to ch6, the icons won't have time to disappear. BUT, the first to return to ch6 will trigger the bug for the others, which will instantly vanish. Shortly after, they will naturally all return one by one to ch6 and reappear in different locations. There was a patch for this issue (5501), which was included in 678+, but it has no effect.
2. It takes up to 10 min for avahi even to detect the inactivity of a peer, i.e. if an XO switches channels, for up to 10 min avahi won't even know (it used to be 1-2 min).
3. It will take a total of about 30 min for the XO to vanish from the mesh view (this is too long!).
4. Avahi and the mesh view respond independently. It used to be that when an entry disappeared in avahi it disappeared in the mesh view, and the same when new peers arrived. This relation was very consistent. However, now we have the following cases:
a) an XO will vanish from the mesh view, but remain indefinitely in the avahi cache as "failed to resolve"
b) sometimes avahi shows a lot fewer peers than the mesh view. The extra peers in the mesh view are definitely active, since they properly respond to activity joining/sharing.
c) sometimes avahi includes more active peers than the mesh view.
Does anyone know why this is happening? Is it a bug? I have logs, if needed, that compare avahi-browse with timestamped dbus-monitor logs and show the inconsistencies.
5. An important improvement is that peers will generally not fail a lot on their own. So, if many XOs join a mesh channel and no one goes away, they will not start failing. This used to be a common effect after 4-5 XOs. However, I once noticed 61 active XOs in the mesh view in 1cc! This shows that salut is more capable than we expected.
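The avahi/mesh view inconsistencies in point 4 come down to comparing two peer lists taken at the same moment. A minimal sketch of that comparison follows; it assumes the peer names have already been extracted from the avahi-browse and dbus-monitor logs (the parsing is not shown, and the function name and sample names are hypothetical).

```python
def compare_views(avahi_peers, meshview_peers):
    """Classify peers by where they are visible.

    Returns a dict with three sorted lists:
      only_in_avahi    -- cases a) and c): cached by avahi, no icon
      only_in_meshview -- case b): icon shown, missing from avahi
      in_both          -- the two views agree
    """
    avahi, mesh = set(avahi_peers), set(meshview_peers)
    return {
        "only_in_avahi": sorted(avahi - mesh),
        "only_in_meshview": sorted(mesh - avahi),
        "in_both": sorted(avahi & mesh),
    }


if __name__ == "__main__":
    # Made-up peer names, for illustration only.
    report = compare_views(["xo-a", "xo-b", "xo-c"], ["xo-b", "xo-c", "xo-d"])
    for key, names in report.items():
        print(key, names)
```

Running this on snapshots taken at regular intervals would show how long each inconsistency lasts, which is the information the bug reports need.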
Re: Testing the Wireless driver changes
I see. But I hope this is not being done only because of the airline issue, and that there are other reasons why it is useful to boot with the firmware unloaded. As far as the airline issue is concerned, we should not take it too seriously. As long as we have a working solution, it is fine; not many people will use it anyway. You see my point? Also, if it is something simple, we can quickly implement it for Update.1.
On Jan 17, 2008 7:36 PM, David Woodhouse [EMAIL PROTECTED] wrote:
On Thu, 2008-01-17 at 19:16 -0500, Giannis Galanis wrote: It must be noted that the important issue of this discussion is how to have the radio blocked from BEFORE the XO boots, so as not to be conflicting with the airline regulations.
We should change the firmware so that it isn't active automatically as soon as it's loaded -- let the driver activate it when it's appropriate. Then the decision as to whether the radio is blocked can properly be handled in userspace, and the device can be left quiescent if appropriate. -- dwmw2
Re: Testing the Wireless driver changes
David,

There are a couple of issues I would like to address, mostly related to the new wireless driver.

First, the netstat command: about 50% of the time it becomes very slow (practically freezes) and spews a getnameinfo error. The result from strace is:

    socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
    connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr(172.18.0.1)}, 28) = 0
    fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    gettimeofday({1200442106, 340565}, NULL) = 0
    poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
    send(4, \270\227\1\0\0\1\0\0\0\0\0\0\1e\0017\1c\1e\1c\0010\1e\1f\1f\1f..., 90, MSG_NOSIGNAL) = 90
    poll( unfinished ...

It seems (according to Bernie) that netstat makes queries to the DNS server, which is temporarily down. If you execute the command a couple of times it works again, but this is a very regular phenomenon. This should be a network issue and not a driver issue, but can you confirm that?

Also, the msh0 interface is named msh0_rename. Is there a reason for that? Will this change back to normal in the future? How will it be in Update.1? This inconsistency causes some issues in the olpc-netstatus command utility.

Can you also please describe, from the user's perspective, what has changed or improved in the new driver, so we know where to start testing from. For example:
- what is the situation with mesh on or off?
- is the mesh-start file still in use?
- are there improvements related to #4470?

thanx,
yani

On Jan 15, 2008 6:40 PM, Kim Quirk [EMAIL PROTECTED] wrote:

David,

Yani is back from his time off and finished with his exams (at least for now). Before the new year break, he had been working on testing, documenting and debugging issues mostly associated with avahi and telepathy, but also with wireless. He and Ricardo have been our wireless test experts.

Now that he is back, it would be great if you and Michail can provide some thoughts on the highest priority testing that we should do here or at Michail's house (for a little more controlled RF setting), so we can try to find bugs as quickly as possible.

Also - Ricardo, you might be able to give us some indication of your availability for testing and how many laptops you have in Brazil, etc.

Thanks,
Kim
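A note on the strace output in the netstat message above: the visible part of the send() payload (single-character labels e, 7, c, e, c, 0, e, f, f, f) looks like the start of a reverse (PTR) lookup for an IPv6 address in nibble format, which would fit netstat calling getnameinfo on each address. The sketch below only shows how such a query name is built; the address used is made up so that its low nibbles match the visible payload, and is not taken from the original report.

```python
import ipaddress


def reverse_lookup_name(address):
    """Return the DNS name a resolver queries for a reverse (PTR) lookup."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    # Hypothetical link-local address; only its last nibbles mirror the trace.
    print(reverse_lookup_name("fe80::217:c4ff:fe0c:ec7e"))
    # IPv4 addresses use the in-addr.arpa tree instead.
    print(reverse_lookup_name("172.18.0.1"))
```

If reverse lookups are indeed what stalls the command, running `netstat -n` (numeric output, no name resolution) should avoid the delay while the DNS server is unreachable.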
Re: The reason we see icons flashing here and there in the mesh view.. i.e. xmas tree effect
The test showed that the effect is not a result of a network failure. It occurs naturally, every time a new host arrives while another host appears dead. "Dead" can also mean a host that simply disconnected from the channel by user intervention.

The best and simplest way to recreate the effect in any environment (noisy or not) is to:

1. Connect 3 XOs successfully to the same mesh.
2. Move XO1 and XO2 to another channel, and verify that they show as failed when running avahi-browse on XO3.
3. Reconnect XO1 and XO2 to the initial channel at the same time.
4. While the XOs are trying to connect (30 sec), check that they still show as failed when running avahi-browse on XO3.
5. Observe the screen on XO3: the icons of XO1 and XO2 will jump at almost the same time.

To the best of my understanding, the effect:
- is not related to a noisy environment
- does not require a large number of laptops
- can be recreated 100% of the times you try the above.

I believe that if the emulator you operate uses the proper timeouts, you will see the effect.

yani

On Dec 14, 2007 4:31 AM, Sjoerd Simons [EMAIL PROTECTED] wrote:

On Thu, Dec 13, 2007 at 11:18:01PM -0500, Giannis Galanis wrote:

THE TEST: 6 XOs connected to channel 11, with forwarding tables blinded only to themselves, so no other element in the mesh can interfere. The cache list was scanned continuously on all XOs using a script. If all XOs remained idle, they all showed reliably in each other's mesh view. Every 5-10 mins an XO showed as dead in some other XOs' scans, but this was shortly recovered, and there was no visual effect in the mesh view.

Could you provide a packet trace of one of these XOs in this test? (Install tcpdump and run ``tcpdump -i msh0 -n -s 1500 -w some filename''.) I'm surprised that with only 6 laptops you hit this case so often. Of course the RF environment in the OLPC is quite crowded, which could trigger this.

Can you also run: http://people.collabora.co.uk/~sjoerd/mc-test.py
Run it as ``python mc-test.py server'' on one machine and just ``python mc-test.py'' on the others. This should give you an indication of the amount of multicast packet loss, which can help me to recreate a comparable setting here by using netem.

If you switched an XO manually to another channel, again it showed dead in all others. If you reconnected to channel 11, there is again no effect in the mesh view. If you never reconnected, in about 10-15 minutes the entry is deleted, and the corresponding XO icon disappeared from the view. Therefore, it is common and expected for XOs to show as dead in the Avahi cache for some time.

THE BUG: IF a new XO appears (a message is received through Avahi), WHILE there are 1 or more XOs in the cache that are reported as dead, THEN Avahi crashes temporarily and the cache CLEARS. At this point ALL XOs that are listed as dead instantly disappear from the mesh view.

Interesting. Could you file a trac bug with this info, with me cc'd?

Sjoerd
-- Everything should be made as simple as possible, but not simpler. -- Albert Einstein
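The behaviour reported in this thread can be written down as a toy model, which may help when checking an emulator against the reproduction steps. This is only a sketch of the observed behaviour as described above, not Avahi's or the presence service's actual code; the class and method names are hypothetical.

```python
class PeerCacheModel:
    """Toy model of the report: a new arrival flushes every 'dead' entry."""

    def __init__(self, peers=()):
        # peer name -> "alive" or "dead"
        self.state = {peer: "alive" for peer in peers}

    def mark_dead(self, peer):
        """A peer stops answering (e.g. it switched channels)."""
        if peer in self.state:
            self.state[peer] = "dead"

    def arrive(self, peer):
        """A peer announcement is received; return the peers whose icons vanish."""
        vanished = sorted(p for p, s in self.state.items() if s == "dead")
        for p in vanished:
            del self.state[p]
        self.state[peer] = "alive"
        return vanished


if __name__ == "__main__":
    # View from XO3 while XO1 and XO2 change channel and come back.
    cache = PeerCacheModel(["xo1", "xo2"])
    cache.mark_dead("xo1")
    cache.mark_dead("xo2")
    print(cache.arrive("xo1"))  # both stale entries are flushed at once
    print(cache.arrive("xo2"))  # nothing left to flush
```

In this model both icons are removed on the first arrival and re-added as the peers announce themselves, which matches the "jump" seen on XO3.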
Re: connection to jabber.laptop.org
On Dec 13, 2007 5:08 PM, John Watlington [EMAIL PROTECTED] wrote:

On Dec 13, 2007, at 1:05 PM, Giannis Galanis wrote:

I also installed the rpm on a custom machine (not a school server) in 1CC. I must note that the ejabberdctl-extra.diff patch in the wiki page is for another version than 1.1.4. I used the config file ejabberd.cfg which I got from jabber.laptop.org.

John, I didn't use the jtest account, but omicron, which danny created for me last week. I got the file from /home/wad.

The accounts are not registered in the ejabberd.cfg file. They are kept in a database (which can be dumped and reloaded using ejabberdctl). Thus, omicron doesn't have an account on your new machine.

I couldn't register the admin account (is this necessary? because it is not stated in the wiki).

This is absolutely necessary, and was the sticking point for me last week on a schoolserver.

I tried:
    ejabberdctl register localhost admin admin (according to wiki)
    ejabberdctl ejabberd register admin localhost admin (according to the previous email)
or
    ejabberdctl register admin localhost admin
Every time I received:
    RPC failed on the node [EMAIL PROTECTED] : nodedown
or similar.

Take a look at the command line parameters to ejabberdctl. I wouldn't expect those commands to work.

The command line parameters of ejabberdctl are not easy to find. Why do you think the above commands would not work? You said you used ejabberdctl ejabberd register admin localhost admin and managed to register.

I also tried:
    ejabberdctl delete-older-users
    ejabberdctl status
    ejabberdctl --node localhost status
I received:
    RPC failed on the node {1st [EMAIL PROTECTED] : nodedown

Can anyone from collabora please specify the single correct way to configure this, because we will never get it right.

Yes, please.

Also, I couldn't connect to http://yourserver:5280/admin/. Perhaps this is expected, since the admin account was not successfully created.

Correct. You were trying http://18.85.46.175:5280/admin/, right?

I could telnet 18.85.46.175 5222 from an XO, or telnet localhost 5222, and successfully connected. Note that 18.85.46.175 is the server's IP.

I tried to connect an XO to the custom jabber server with sugar-control-panel -s jabber 18.85.46.175, followed by a sugar reboot. The gabble logs, which I attach, show an initial successful connection, which failed later on. Also on the server side, the following message popped up:

    INFO REPORT: [(0.185.0:ejabberd_listener:90):(#port0.388) Accepted connection ({0,0,0,0,0,65535,46935,5098,56209}) - ({0,0,0,0,65535,46935,11951,5223})]

or similar. In the end, the XO connected to salut.

However, no other XO has managed to connect to jabber.laptop.org successfully in the past week. Is there a reason for this?

That is interesting. The server is up and running, and thinks that 140 of the 7500+ registered users are currently using it. You might try restarting it?

wad

There are 140 people connected to jabber.laptop.org? With what command can you see this? No one at the office has connected recently.
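The telnet checks in this thread (port 5222 for XMPP clients, port 5280 for the admin page) can be repeated from a script. A minimal sketch using only the standard socket module; the function name is hypothetical, and "localhost" is a placeholder for the server being tested. A successful connect only shows that something is listening on the port, not that ejabberd accepts logins.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder host; replace with the jabber server's address.
    for port in (5222, 5280):
        print(port, port_is_open("localhost", port, timeout=2.0))
```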
Re: connection to jabber.laptop.org
Oh, I thought we were using 1.1.3. I was using the guidelines from http://wiki.laptop.org/go/Ejabberd_Configuration. Ricardo applied the patches provided in the page and compiled them. Is this necessary with 1.1.4?

Now 1.1.3 runs, but XOs cannot connect. I set the jabber server with sugar-control-panel, but the XO connects to salut. It can telnet to the server, though.

Also, I couldn't connect to http://server:5280/admin/ as indicated in the wiki. Is this really necessary?

thanx
yani

On Dec 6, 2007 1:25 PM, Robert McQueen [EMAIL PROTECTED] wrote:

Yani,

It's no problem, just that Danny said you were logged in and I was worried you were changing the configuration or something! Which version are you putting on the school server? The version on jabber.laptop.org is in an RPM at ~robot101/ejabberd/F-7/i386/ejabberd-1.1.4-1.1.20071205svn1027.fc7.olpc.i386.rpm on that machine.

Regards,
Rob

Giannis Galanis wrote:

Robert,

It was me that logged in to the jabber server. Sorry I did not let you know, I didn't think it was important. I am trying to set up a jabber server on another school server, but I was having continuous errors with the config file. I logged on to copy the config file used by jabber.laptop.org. It works now.

thanx
yani
Re: ip4-address buddy property - still needed?
The feature, although not usable by the activities, has other benefits. By observing the buddy list, you acquire instant information about the network connection of the users.

When connected to channel 1, for example:
- 169.254.x.x addresses are in link-local
- 172.18.x.x are connected to the schoolserver

When connected to a jabber server:
- 169.254.x.x are connected through an MPP
- 18.x.x.x are media lab
- 172.18.x.x are connected to the schoolserver in olpc
- etc

This information is continuously used in network testing, and is also useful from the user's perspective:
1. in the case of connecting to multiple jabber servers, the user should be able to tell which XO in the neighbor view belongs to the same school
2. get the geographical location of another user

In future versions of the neighbor view, or through other activities, the user should be able to filter for specific XOs according to location or school (in case he's connected to many servers). Two children in the same school should be able to recognize each other even if they are connected through a jabber server other than the one in the school. It can also be useful for locating an XO in case of theft.

I have also added a ticket (#4405) for adding the public IP to the buddy list properties. It is a small piece of data (both IPs, private and public), which can be harmlessly incorporated in the telepathy services.

Please let me know if you agree,
yani

On 10/25/07, Jim Gettys [EMAIL PROTECTED] wrote:

It seems, from your discussion, like unless someone grumbles today, this should be removed immediately. And be removed within a week, even if someone grumbles...

- Jim

On Thu, 2007-10-25 at 10:15 +0100, Simon McVittie wrote:

We still have one set of OLPC-specific patches to Salut (the link-local collaboration backend) that has been rejected upstream, which is the one that adds support for the deprecated ip4-address buddy property. This was used during a transitional period to enable simple TCP-based collaboration for activities that didn't use Tubes; Sjoerd is reluctant to keep this patch set, because it's meant to have gone away by now!

Is anyone still using this property? If not, can we kill it? It was added in Trial-2, and it was meant to be gone by Trial-3 but was left in just in case, so it really ought to disappear. When it does, we can delete some code from Salut and Presence Service.

Places it's exposed in the APIs, which I propose to get rid of:

- PS D-Bus API: Buddy.GetProperties() returns a dict that contains ip4-address: 10.0.0.1 (or whatever), and the Buddy.PropertyChanged signal includes a dict that can contain the same
- sugar.presence: Buddy has a GLib property ip4-address (aka buddy.props.ip4_address) and can emit it in its property-changed signal

The Read activity appears to be the only thing in my jhbuild that uses ip4-address (#4297). It should be ported to either stream tubes (when they're ready in Salut, which should be this or next week) or D-Bus tubes (now). Gabble already supports stream tubes, so stream-tube support can be implemented on a branch and tested against Gabble. Porting from plain TCP to stream tubes should be very straightforward; I hope to produce a proof-of-concept patch for Read later today.

Simon

-- Jim Gettys One Laptop Per Child
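The address-range reading of the buddy list described at the top of this thread can be sketched as a small lookup. This is only an illustration of the testing use case, assuming the ranges named in the message (169.254.x.x, 172.18.x.x, 18.x.x.x); the labels and the function name are informal and hypothetical, and the meaning of a range depends on how the observer is connected, as the message explains.

```python
import ipaddress

# Ranges as named in the message above; the labels are informal.
RANGES = [
    (ipaddress.ip_network("169.254.0.0/16"), "link-local (or behind an MPP when seen via jabber)"),
    (ipaddress.ip_network("172.18.0.0/16"), "schoolserver"),
    (ipaddress.ip_network("18.0.0.0/8"), "media lab"),
]


def classify(address):
    """Return an informal label for the network a buddy address belongs to."""
    ip = ipaddress.ip_address(address)
    for network, label in RANGES:
        if ip in network:
            return label
    return "unknown"


if __name__ == "__main__":
    # Sample addresses for illustration; 10.0.0.1 is the example from the API description.
    for addr in ("169.254.3.7", "172.18.0.25", "18.85.46.41", "10.0.0.1"):
        print(addr, "->", classify(addr))
```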
Re: log-collect / log-send
Pascal,

I have been working on something similar. It is a console script that gathers network-related logs, and it will be available in the next joyride. At the moment it includes:

    var/log/messages
    var/log/xorg.0.log
    /home/olpc.sugar.logs/presenceservice
    /home/olpc.sugar.logs/gabble
    /home/olpc.sugar.logs/salut

and the following info:

- build
- firmware
- model
- time
- mac
- ips of all interfaces
- network topology
- jabber server
- salut or gabble

The gzipped tar is ~20kb, which is pretty low. However, other tests (for specific activities, for example) will require other logs. I believe that a complete log activity should have a list of options like:

- network logs
- kernel logs
- activities logs
- all logs

...so the user can choose according to the problem. Also, the activity should be able to enable All Logs, from the .xinitrc, .sugar.debug files, or perhaps the full kernel logs.

I was planning to add the above features to my script, but a sugar activity is better than a console script. Since we are working on the same thing, we can use each other's help and create a single application.

yani

On 10/29/07, Pascal Scheffers [EMAIL PROTECTED] wrote:

I've created a rough-cut log-collector, it's in d.l.o/git/project/log-activity/log-collect.py

For now, it just outputs some system info, tell me what's missing or what would be interesting to include? I don't know yet how to list installed activities... would that be just `ls /usr/share/activities/`? Or is there a package list?

And then the main purpose: sending logs to OLPC, either using http-post or email or usb-stick or... but what logs should I collect? Just all of them? ~/.sugar/default/logs/* and /var/log/* ? Or should it be more selective? And some information from the journal, perhaps?

What about privacy/sensitive information? Will there be any in the logs or system info?

- Pascal

Current log-activity.py output:

    bios-version: Q2C18
    uptime: 434169.21 430235.72
    wireless_mac: 00-17-C4-05-2A-58
    uuid: 8A401F4E-E312-47F9-96C8-A488C99BDA2F
    localization: ??
    kernel_version: Linux version 2.6.22-20071018.1.olpc.d4414541d2be66a ([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)) #1 PREEMPT Thu Oct 18 11:44:14 EDT 2007
    diskfree: 716 MB
    laptop-info-version: 0.1
    memfree: 63496 kB
    serial-number: SHF7250025C
    disksize: 1024 MB
    keyboard: ??-??-??
    olpc_build: OLPC build joyride 58 (stream joyride; variant devel_jffs2)
    country: USA
    board-revision: B4
    motherboard-number: QTFLCA72400063
    POWER_SUPPLY_NAME=olpc-battery
    POWER_SUPPLY_TYPE=Battery
    POWER_SUPPLY_STATUS=Full
    POWER_SUPPLY_PRESENT=1
    POWER_SUPPLY_HEALTH=Good
    POWER_SUPPLY_TECHNOLOGY=LiFe
    POWER_SUPPLY_VOLTAGE_AVG=6792960
    POWER_SUPPLY_CURRENT_AVG=0
    POWER_SUPPLY_CAPACITY=97
    POWER_SUPPLY_CAPACITY_LEVEL=Full
    POWER_SUPPLY_TEMP=2508
    POWER_SUPPLY_TEMP_AMBIENT=4300
    POWER_SUPPLY_ACCUM_CURRENT=8390
    POWER_SUPPLY_MANUFACTURER=BYD
    POWER_SUPPLY_SERIAL_NUMBER=5d0d0100daff
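The "gzipped tar of selected logs" idea discussed in this thread can be sketched with the standard tarfile module. This is not the script mentioned above nor log-collect.py; the function name is hypothetical, and the demo uses a temporary file in place of the real log paths, which would come from whichever option (network, kernel, activities, all) the user picks.

```python
import os
import tarfile
import tempfile


def bundle_logs(paths, out_path):
    """Write the files from `paths` that exist into a gzipped tar; return those added."""
    added = []
    with tarfile.open(out_path, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path):
                # Store without the leading slash so the archive unpacks relatively.
                tar.add(path, arcname=path.lstrip("/"))
                added.append(path)
    return added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a real log file.
        sample = os.path.join(tmp, "messages")
        with open(sample, "w") as handle:
            handle.write("sample log line\n")
        archive = os.path.join(tmp, "logs.tar.gz")
        print(bundle_logs([sample, os.path.join(tmp, "missing.log")], archive))
        print(os.path.getsize(archive), "bytes")
```

Missing files are skipped silently here; a real collector would probably want to record which logs were absent, since that is itself useful debugging information.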
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
Simon,

I think the email I sent you was incomplete; my connection was poor and gmail must have saved the wrong draft. But 1-2-3 is what I intended to send you. I also meant to ask: how many times do you try _init_connection before you assume the connection is down?

I hope so. I have a tarball with the patch, but I'm still waiting for Update.1 approval (it's unclear whether I can build RPMs for Joyride before I get Update.1 approval or not). If you're at 1CC, could you please annoy the ApprovalForUpdate people in person until they either look at their bugs, or confirm whether I'm still allowed to build RPMs in Koji?

I can definitely try to arrange this. But can you please send me the tarball to test it in the meantime?

2. We need to be able to restart PS. As you say this is not possible, but if we restart sugar will PS restart as well?

Yes, that's right (the D-Bus session bus will exit, which causes D-Bus services like PS to exit too unless they've specifically asked not to). I see you assigned the bug about "need to be able to cope with PS restarts" to yourself. Unless you're planning to implement the necessary Python code in sugar.presence yourself, please don't. I don't think it's feasible to implement correct handling of PS restarts in sugar.presence for Update.1, so unless the release engineering team specifically tell me to, I won't be addressing that bug until a later release.

Ok, I will reassign the bug to presenceservice. As long as restarting sugar works, we can stick to that for now.

3. We need to force gabble to run. We have several instances of 4193 (almost all XOs connected to schoolserver,AP are running salut). Or at least to force trying to connect to jabber server.

Please see my comments on #4193 regarding steps to take to debug (I think it's #4193 I commented on - I can't remember bug numbers, and Trac is down at the moment). In summary:
* try resolving the server with getent hosts jabber.laptop.org
* try pinging it with ping jabber.laptop.org
* try connecting via TCP with telnet jabber.laptop.org 5222 (type hello and press Enter, if all goes well you should get disconnected with an error message that mentions XML not well formed)

The bug is indeed 4193. I have replied to your post, but as the trac is down you probably haven't seen it. I made all three tests:

    $ getent hosts jabber.laptop.org
    2001:4830:2446:ff00:201:6cff:fe07:68ec jabber.laptop.org   - frequent reply
    18.85.46.41 jabber.laptop.org   -- rare reply

    $ ping jabber.laptop.org
    PING jabber.laptop.org (18.85.46.41) 56(84) bytes of data.
    64 bytes from jabber.laptop.org (18.85.46.41): icmp_seq=1 ttl=63 time= 67.4 ms
    ...

    $ telnet jabber.laptop.org 5222
    blabla... connected
    hello
    replied with an xml packet with xml-not-well-formed included

So it seems that it is a PS issue. Perhaps it is not waiting long enough, or doesn't make enough tries when trying to connect. I have reassigned the bug to presenceservice.

If any of these steps fail, Gabble won't be able to connect either, and there's nothing Gabble can do about it - talk to the Network Manager maintainer instead, since that's the component responsible for getting network connectivity and DNS on the XO. If you check the Gabble log you'll probably find that Gabble is trying to connect, but failing because either it can't resolve jabber.laptop.org in DNS, or it can't get a TCP connection there. That was my diagnosis of two of the cases you mentioned in your bug with 3 sets of logs (which may have been #4193?). In the third case it looked as though you hadn't waited long enough for the log to indicate success or failure.

4. The process of trying to connect to the jabber server, is done by telepathy-gabble, or by the presence

What I meant here is: does the PS check whether the jabber server is accessible and then run telepathy-gabble, or is this one of the tasks of telepathy-gabble? I see you have already replied to this:

Depends what you mean. The Presence Service is responsible for choosing when to try to connect (at which time it calls the Connect() D-Bus method on Gabble), but it's Gabble that actually opens a TCP socket to the Jabber server and tries to talk to it. You can see this in the PS log, for instance:

    1194431620.966651 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): connecting...
    1194431620.967008 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): Connect() succeeded

(note that Connect() succeeded is a bit misleading - it just means that the connection manager has said OK, I'll try, rather than that it has actually been able to connect.)

In the telepathy-gabble log you'll then see something like this:

    ** (telepathy-gabble:25330): DEBUG: do_connect: calling lm_connection_open
    Going to connect to olpc.collabora.co.uk
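The getent result quoted in this thread returned an IPv6 address most of the time and the IPv4 address only rarely, so it may be worth recording which address families the resolver hands back before blaming any one component. A minimal sketch using socket.getaddrinfo; the function name is hypothetical, and "localhost" stands in for the jabber server so the example runs without network access.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve(host, port=5222):
    """Return the sorted, unique (family, address) pairs the resolver gives for host."""
    results = set()
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        results.add((FAMILY_NAMES.get(family, str(family)), sockaddr[0]))
    return sorted(results)


if __name__ == "__main__":
    # Placeholder host; on an XO this would be the configured jabber server.
    for family, address in resolve("localhost"):
        print(family, address)
```

If only an IPv6 address comes back on a network without IPv6 routing, a client that tries that address first could fail even though ping and telnet (which fell back to IPv4 in the test above) succeed; this is a guess to check against the logs, not a confirmed diagnosis.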
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
Yes, I have seen this ticket in the past. Detecting whether an XO is actually there or not is a simple task to accomplish, and I am currently working on a simple script that will give a list of the properly connected XOs, along with the temporarily disconnected ones. It is a very useful idea to display this information in the neighbor view, as a dotted line or a grey color perhaps.

The problem is that bugs are dealt with according to priority, and enhancements, although very practical, can generally cause other bugs, or take several builds until they work properly. Since we are in code freeze, a quick solution must be implemented to solve the current situation, i.e. that it takes up to an hour for a disconnected XO to disappear (just reported as #4735).

yani

On Nov 7, 2007 5:49 PM, Eben Eliason [EMAIL PROTECTED] wrote:

1. We need to fix the timeout for icons to disappear. Can we try Guillaume's patch?

I hope so. I have a tarball with the patch, but I'm still waiting for Update.1 approval (it's unclear whether I can build RPMs for Joyride before I get Update.1 approval or not). If you're at 1CC, could you please annoy the ApprovalForUpdate people in person until they either look at their bugs, or confirm whether I'm still allowed to build RPMs in Koji?

Just a mention, since this thread is getting a lot of attention. There is an added visual element which should be in play here, according to the design. There should be an intermediate state before XOs disappear from the view, as outlined in: http://dev.laptop.org/ticket/3657

2. We need to be able to restart PS. As you say this is not possible, but if we restart sugar will PS restart as well?

Yes, that's right (the D-Bus session bus will exit, which causes D-Bus services like PS to exit too unless they've specifically asked not to). I see you assigned the bug about need to be able to cope with PS restarts to yourself.
Unless you're planning to implement the necessary Python code in sugar.presence yourself, please don't. I don't think it's feasible to implement correct handling of PS restarts in sugar.presence for Update.1, so unless the release engineering team specifically tell me to, I won't be addressing that bug until a later release. 3. We need to force gabble to run. We have several instances of 4193 (almost all XOs connected to schoolserver,AP are running salut). Or at least to force trying to connect to jabber server. Please see my comments on #4193 regarding steps to take to debug (I think it's #4193 I commented on - I can't remember bug numbers, and Trac is down at the moment). In summary: * try resolving the server with getent hosts jabber.laptop.org * try pinging it with ping jabber.laptop.org * try connecting via TCP with telnet jabber.laptop.org 5222 (type hello and press Enter, if all goes well you should get disconnected with an error message that mentions XML not well formed) If any of these steps fail, Gabble won't be able to connect either, and there's nothing Gabble can do about it - talk to the Network Manager maintainer instead, since that's the component responsible for getting network connectivity and DNS on the XO. If you check the Gabble log you'll probably find that Gabble is trying to connect, but failing because either it can't resolve jabber.laptop.org in DNS, or it can't get a TCP connection there. That was my diagnosis of two of the cases you mentioned in your bug with 3 sets of logs (which may have been #4193?). In the third case it looked as though you hadn't waited long enough for the log to indicate success or failure. 4. The process of trying to connect to the jabber server, is done by telepathy-gabble, or by the presence Depends what you mean. 
The Presence Service is responsible for choosing when to try to connect (at which time it calls the Connect() D-Bus method on Gabble), but it's Gabble that actually opens a TCP socket to the Jabber server and tries to talk to it. You can see this in the PS log, for instance: 1194431620.966651 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): connecting... 1194431620.967008 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): Connect() succeeded (note that Connect() succeeded is a bit misleading - it just means that the connection manager has said OK, I'll try, rather than that it has actually been able to connect.) In the telepathy-gabble log you'll then see something like this: ** (telepathy-gabble:25330): DEBUG: do_connect: calling lm_connection_open Going to connect to olpc.collabora.co.uk Trying 195.10.223.134 port 5222... ** (telepathy-gabble:25330): DEBUG: tp_base_connection_change_status: was 4294967295, now 1, for reason 1 **
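The PS log lines quoted in this message have a regular shape (epoch timestamp, level, logger, message), so the gap between "connecting..." and "Connect() succeeded" can be measured with a small parser. A minimal sketch, assuming every relevant line follows the format of the two samples; the function names are hypothetical. As noted above, "Connect() succeeded" only means the connection manager agreed to try, so this measures the request, not the actual connection.

```python
import re

LINE_RE = re.compile(r"^(?P<ts>\d+\.\d+) (?P<level>[A-Z]+) (?P<logger>[^:]+): (?P<msg>.*)$")


def parse_line(line):
    """Split a PS log line into (timestamp, level, logger, message), or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return (float(match.group("ts")), match.group("level"),
            match.group("logger"), match.group("msg"))


def first_timestamp(lines, ending):
    """Return the timestamp of the first line whose message ends with `ending`."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[3].endswith(ending):
            return parsed[0]
    return None


if __name__ == "__main__":
    sample = [
        "1194431620.966651 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at 0x85f1e14 "
        "(telepathy_plugin+TelepathyPlugin at 0x82c8fb0): connecting...",
        "1194431620.967008 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at 0x85f1e14 "
        "(telepathy_plugin+TelepathyPlugin at 0x82c8fb0): Connect() succeeded",
    ]
    start = first_timestamp(sample, "connecting...")
    accepted = first_timestamp(sample, "Connect() succeeded")
    print("request accepted after %.6f s" % (accepted - start))
```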
Re: Salut (link-local) protocol changing - don't expect interop between versions
Since you are updating the presence service, it is a good opportunity to fix the switch from salut to gabble. When internet connectivity is detected, salut should stop, and gabble should start right after. However, this doesn't work properly even on the latest builds, especially when the XO connects through the schoolserver. It has even been documented (bug 4193) that an XO was connected to the medialab AP and was still running Salut. The neighbor view included several XOs and could share properly. It is pretty high priority to make this work properly.

Also, when connected to a school server, it is faster to communicate with others in the mesh through salut than through jabber. So it can be useful for the user to force salut even when jabber is available.

yani

On 10/18/07, Simon McVittie [EMAIL PROTECTED] wrote:

Just a heads-up for anyone who isn't already aware:

We're replacing the Salut (link-local collaborative backend) rMulticast protocol with a better version, over the next week or so (bug #4044). This is an incompatible change; there may in fact be more than one incompatible change involved, if we have to change the protocol further when it's had larger-scale testing.

As a result, until further notice, Salut is not expected to be compatible between different versions. Please do not report bugs in link-local (serverless) collaboration unless all participants in the activity are running exactly the same snapshot of Salut (e.g. the same XO image). We'll freeze the network protocol again between now and the 1.0 freeze.

The improved rMulticast protocol either fixes, or will enable us to fix, #3294, #3969, #3465, #3338 and possibly #4127; we might also take the opportunity to improve the mDNS part of the protocol.

Checking the version on an OLPC: rpm -q telepathy-salut
Checking the version in jhbuild: ls -d source/telepathy-salut-*, see which one has the latest date in the directory name

Regards,
Simon
on behalf of the OLPC Telepathy team
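Before reporting a link-local collaboration bug, the "same snapshot of Salut on every participant" condition can be checked mechanically once the output of rpm -q telepathy-salut has been collected from each XO. A minimal sketch; the function names, host names and version strings are hypothetical placeholders, not real package versions.

```python
def group_by_version(versions):
    """Map each reported version string to the sorted list of hosts running it."""
    groups = {}
    for host, version in versions.items():
        groups.setdefault(version.strip(), []).append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def all_same_version(versions):
    """Return True if every host reported exactly the same version string."""
    return len(group_by_version(versions)) <= 1


if __name__ == "__main__":
    # Hypothetical collected output of `rpm -q telepathy-salut` per XO.
    collected = {
        "xo1": "telepathy-salut-A\n",
        "xo2": "telepathy-salut-A\n",
        "xo3": "telepathy-salut-B\n",
    }
    print(all_same_version(collected))
    print(group_by_version(collected))
```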
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
Simon,

I think the email I sent you was incomplete; my connection was poor and gmail must have saved the wrong draft. But 1-2-3 is what I intended to send you. I also meant to ask: how many times do you try _init_connection before you assume the connection is down?

I hope so. I have a tarball with the patch, but I'm still waiting for Update.1 approval (it's unclear whether I can build RPMs for Joyride before I get Update.1 approval or not). If you're at 1CC, could you please annoy the ApprovalForUpdate people in person until they either look at their bugs, or confirm whether I'm still allowed to build RPMs in Koji?

I can definitely try to arrange this. But can you please send me the tarball to test it in the meantime?

2. We need to be able to restart PS. As you say this is not possible, but if we restart sugar will PS restart as well?

Yes, that's right (the D-Bus session bus will exit, which causes D-Bus services like PS to exit too unless they've specifically asked not to). I see you assigned the bug about need to be able to cope with PS restarts to yourself. Unless you're planning to implement the necessary Python code in sugar.presence yourself, please don't. I don't think it's feasible to implement correct handling of PS restarts in sugar.presence for Update.1, so unless the release engineering team specifically tell me to, I won't be addressing that bug until a later release.

Ok, I will reassign the bug to presenceservice. As long as restarting sugar works, we can stick to that for now.

3. We need to force gabble to run. We have several instances of 4193 (almost all XOs connected to schoolserver, AP are running salut). Or at least to force trying to connect to jabber server.

Please see my comments on #4193 regarding steps to take to debug (I think it's #4193 I commented on - I can't remember bug numbers, and Trac is down at the moment). In summary:

* try resolving the server with getent hosts jabber.laptop.org
* try pinging it with ping jabber.laptop.org
* try connecting via TCP with telnet jabber.laptop.org 5222 (type hello and press Enter, if all goes well you should get disconnected with an error message that mentions XML not well formed)

The bug is indeed 4193. I have replied to your post, but as the trac is down you probably haven't seen it. I made all three tests:

$ getent hosts jabber.laptop.org
2001:4830:2446:ff00:201:6cff:fe07:68ec jabber.laptop.org - frequent reply
18.85.46.41 jabber.laptop.org -- rare reply

$ ping jabber.laptop.org
PING jabber.laptop.org (18.85.46.41) 56(84) bytes of data.
64 bytes from jabber.laptop.org (18.85.46.41): icmp_seq=1 ttl=63 time=67.4 ms
...

$ telnet jabber.laptop.org 5222
blabla... connected
hello
(replied with an xml packet with xml-not-well-formed included)

So it seems that it is a PS issue. Perhaps it is not waiting long enough, or doesn't make enough tries when trying to connect. I have reassigned the bug to presenceservice.

If any of these steps fail, Gabble won't be able to connect either, and there's nothing Gabble can do about it - talk to the Network Manager maintainer instead, since that's the component responsible for getting network connectivity and DNS on the XO. If you check the Gabble log you'll probably find that Gabble is trying to connect, but failing because either it can't resolve jabber.laptop.org in DNS, or it can't get a TCP connection there. That was my diagnosis of two of the cases you mentioned in your bug with 3 sets of logs (which may have been #4193?). In the third case it looked as though you hadn't waited long enough for the log to indicate success or failure.

4. The process of trying to connect to the jabber server, is done by telepathy-gabble, or by the presence

What I meant here is: does the PS check if the jabber server is accessible and then run telepathy-gabble, or is this one of the tasks of telepathy-gabble? Which, as I see, you replied to.

Depends what you mean. The Presence Service is responsible for choosing when to try to connect (at which time it calls the Connect() D-Bus method on Gabble), but it's Gabble that actually opens a TCP socket to the Jabber server and tries to talk to it. You can see this in the PS log, for instance:

1194431620.966651 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): connecting...
1194431620.967008 DEBUG s-p-s.telepathy_plugin: ServerPlugin object at 0x85f1e14 (telepathy_plugin+TelepathyPlugin at 0x82c8fb0): Connect() succeeded

(note that Connect() succeeded is a bit misleading - it just means that the connection manager has said OK, I'll try, rather than that it has actually been able to connect.)

In the telepathy-gabble log you'll then see something like this:

** (telepathy-gabble:25330): DEBUG: do_connect: calling lm_connection_open
Going to connect to olpc.collabora.co.uk
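The three checks summarized earlier in this message (resolve the name, then open a TCP connection to port 5222) can also be scripted, which may help when repeating them on many XOs. This is only a rough sketch using the Python standard library; the function names resolve and tcp_reachable are made up for illustration and are not part of Presence Service or Gabble. Ping is left out because it needs raw sockets or an external command.

```python
import socket


def resolve(host, port=5222):
    """Return the sorted addresses the resolver gives for host (empty list if none).

    Hypothetical helper, roughly what 'getent hosts' reports.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry's last field is the socket address; its first item is the IP.
    return sorted({info[4][0] for info in infos})


def tcp_reachable(host, port=5222, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper, roughly the 'telnet host 5222' step without
    sending any data.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling resolve("jabber.laptop.org") lists every address returned, so a mix of IPv6 and IPv4 answers like the one in the getent output above would be visible at a glance; tcp_reachable("jabber.laptop.org") then tells whether port 5222 accepts a connection at all. If both succeed while Gabble still does not connect, that points away from DNS and basic connectivity, in line with the reasoning in this message.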
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
Sjoerd, Guillaume, Simon,

What does proper notification mean? Which are the cases that it happens? Probably this is not if an XO moves slowly to a place with poor connectivity.

In the case of a temporary (short) disruption of connectivity, how much time does it generally take for it to return? You mentioned that in the past XOs were appearing and disappearing constantly. This implies that the common drop of connectivity is in the scale of a few seconds. If it is lost for more than a few minutes, then it is not bad for the XO to leave and return. So I believe that 1h or even 10min are too long timeouts.

There are a couple more things I would like to address:

1. Is there a way to restart the presence service? In that way we can resolve a weird state. Will killing and restarting the process work?

2. At what point in the source code does the presence service
i. try to connect to the jabber server?
ii. run gabble?

3. I noticed the dbus diagram is updated. Indeed we have a better picture of what's happening. But we still need some more information, like:
i. a state diagram of the presence service
ii. what type of communication is taking place between NM and PS
iii. when the connection is switched from linklocal to schoolserver (for example), what steps are taking place in the presence service
iv. whether internet connectivity is detected by NM and sent to PS, or detected by PS

yani

On 10/30/07, Sjoerd Simons [EMAIL PROTECTED] wrote:

On Fri, Oct 26, 2007 at 02:48:55PM -0400, Giannis Galanis wrote:

Sjoerd, I would like to ask you, you replied at one of the bugs:

Moving from a bugreport to a private mail might not be a great idea.. Could you in the future just put your questions in the bugreport so we can have the discussion in a more public fashion :)

Salut used to drop the presence of people for which it couldn't resolve the extra information, but this seemed to give a lot of problems in the mesh (people appearing and disappearing all the time). So as a workaround we switched to only dropping presence iff all info about a node has gone. Which has the downside that nodes that are really gone can still appear on the mesh view for some time (specifically when they didn't send a proper mdns bye packet or when that was dropped).

"iff all info about a node has gone": what does this mean?

It means that it is hard to decide when a node has really gone or if the network link to a certain node is just (temporarily) bad. In the OLPC office, the second case apparently happens a lot.

how often do you refresh?

The refresh is done by avahi. Avahi tries every few minutes.

Guillaume worked on a patch to make the effect of being unsure about a user less bad (as in, assume that if you're unsure about them for a certain period of time, they're actually really gone).. It still needs to be finished though. Which means, from an end-user's point of view, that if a user went away without doing proper notification, then they will only stay on the meshview for a limited amount of time (say a maximum of 10 minutes, instead of the current situation of more than an hour).

Sjoerd
--
Kindness is the beginning of cruelty. -- Muad'dib [Frank Herbert, Dune]
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
Thank you all for your replies. They clear the picture a lot. To summarize:

1. We need to fix the timeout for icons to disappear. Can we try Guillaume's patch? Also, we need to be able to resolve which icons are currently not available (but still appearing). I believe that the failed entries in _presence._tcp are a complete list. Is this correct?

2. We need to be able to restart PS. As you say this is not possible, but if we restart sugar will PS restart as well?

3. We need to force gabble to run. We have several instances of 4193 (almost all XOs connected to schoolserver, AP are running salut). Or at least to force trying to connect to jabber server.

4. The process of trying to connect to the jabber server, is done by telepathy-gabble, or by the presence

On 11/6/07, Simon McVittie [EMAIL PROTECTED] wrote:

In reply to your previous mail, iff means if and only if. It's often used by mathematicians.

On Tue, 06 Nov 2007 at 03:23:39 -0500, Giannis Galanis wrote:

What does proper notification mean? Which are the cases that it happens?

If Salut is explicitly asked to disconnect, it will tell Avahi to delete all its mDNS records (this actually consists of re-sending all the records it was advertising, with the Time To Live set to 0 seconds). This is sometimes referred to as a goodbye packet. See http://files.multicastdns.org/draft-cheshire-dnsext-multicastdns.txt section 11.2 Goodbye Packets.

The only time we'll currently do this is when switching off Salut because Gabble has connected successfully.

Probably this is not if an XO moves slowly to a place with poor connectivity.

This is never done in response to network conditions - we can't know that we've lost network connectivity until it's too late.

If the Time To Live on our mDNS records expires, that should have the same effect; however, as Sjoerd explained, we currently ignore that, because the 1CC mesh network is apparently unstable enough that the TTL sometimes expires even for laptops that are actually present.

In the case of a temporary (short) disruption of connectivity, how much time does it generally take for it to return? You mentioned that in the past XOs were appearing and disappearing constantly. This implies that the common drop of connectivity is in the scale of a few seconds.

You tell me! :-) I don't have enough XOs to replicate the conditions of a large mesh network like 1CC, so I can't comment on packet loss rates. Perhaps Dan Williams (who used to maintain Presence Service) could help you.

If it is lost for more than a few minutes, then it is not bad for the XO to leave and return. So I believe that 1h or even 10min are too long timeouts.

I believe we're currently using Avahi's default timeouts, which are those recommended in the mDNS draft (linked above). If I'm right about that, then we're using 120 second TTLs for the SRV and A records. Assuming Salut and Avahi follow the draft's recommendations, this means that for the records representing activities, buddies and laptops, if we haven't seen an announcement of a particular record, we will:

- re-query after 96 - 98.4 seconds;
- if no reply, re-query after 102 - 104.4 seconds;
- if no reply, re-query after 114 - 116.4 seconds;
- if no reply, assume the record has vanished after 120 seconds.

(In each of the ranges given for the re-queries, the exact time is chosen at random, to avoid simultaneous queries from everyone in the network.)

The timeout is reset as soon as we see any announcement of a record.

The only ones whose disappearance matters are the SRV and A records - if a TXT record fails to disappear when it shouldn't, we don't really care. TXT records have a substantially longer timeout (the draft recommends 75 minutes).
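For anyone wanting to see where the re-query times above come from, here is a small sketch of the arithmetic. It assumes the windows are fixed percentages of the TTL (80, 85 and 95 percent) plus up to 2 percent of random delay; those percentages are back-derived from the numbers quoted in this message for a 120 second TTL, not taken from the Avahi source, and requery_windows is a made-up name.

```python
def requery_windows(ttl_seconds, percents=(80, 85, 95), jitter_percent=2):
    """Return (earliest, latest) re-query times in seconds.

    percents and jitter_percent are assumptions inferred from the
    96 - 98.4, 102 - 104.4 and 114 - 116.4 second ranges quoted above.
    """
    return [
        (ttl_seconds * p / 100, ttl_seconds * (p + jitter_percent) / 100)
        for p in percents
    ]


ttl = 120  # TTL in seconds that the message gives for SRV and A records
for earliest, latest in requery_windows(ttl):
    print(f"re-query between {earliest} and {latest} seconds")
print(f"assume the record has vanished after {ttl} seconds")
```

Under the same assumption, plugging in the 75 minute TXT lifetime (4500 seconds) shows why stale TXT data lingers far longer than stale SRV and A records.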
There are a couple more things I would like to address:

1. Is there a way to restart the presence service? In that way we can resolve a weird state. Will killing and restarting the process work?

Only if client code that accesses the PS is amended to cope with this (I just filed #4681 to represent this). Until #4681 is closed, if the PS was restarted, nothing would work - use Ctrl+Alt+Backspace to restart all of Sugar. Please see the bug for more details or to reply.

2. At what point in the source code does the presence service i. try to connect to the jabber server? ii. run gabble?

I'll answer (ii.) first. Gabble is automatically run by the session bus (dbus-daemon) via service activation, the first time the Presence Service uses it, if it isn't already running. So there is no explicit code in the PS to run Gabble.

OK, now (i.): When Network Manager indicates that we have a valid IP address, we run the _init_connection method of the ServerPlugin instance. If the Gabble connection fails, we schedule a timer (currently 5 seconds) and retry running _init_connection when the timer runs out. (classes
Re: log-collect / log-send
Pascal,

I have been working on something similar. It is a console script that gathers network-related logs, and will be available in the next joyride. At the moment it includes:

var/log/messages
var/log/xorg.0.log
/home/olpc.sugar.logs/presenceservice
/home/olpc.sugar.logs/gabble
/home/olpc.sugar.logs/salut

and the following info:

build
firmware
model
time
mac
ips of all interfaces
network topology
jabber server
salut or gabble

The gzipped tar is ~20kb, which is pretty low. However, other tests (for specific activities, for example) will require other logs.

I believe that a complete log activity should have a list of options like:

network logs
kernel logs
activities logs
all logs
...

so the user can choose according to the problem. Also, the activity should be able to enable All Logs, from the .xinitrc, .sugar.debug files, or perhaps the full kernel logs.

I was planning to add the above features in my script, but a sugar activity is better than a console script. Since we are working on the same thing we can use each other's help, and create a single application.

yani

On 10/29/07, Pascal Scheffers [EMAIL PROTECTED] wrote:

I've created a rough-cut log-collector, it's in d.l.o/git/project/log-activity/log-collect.py

For now, it just outputs some system info, tell me what's missing or what would be interesting to include? I don't know yet how to list installed activities... would that be just `ls /usr/share/activities/`? Or is there a package list?

And then the main purpose: sending logs to OLPC, either using http-post or email or usb-stick or... but what logs should I collect? Just all of them? ~/.sugar/default/logs/* and /var/log/* ? Or should it be more selective? And some information from the journal, perhaps?

What about privacy/sensitive information? Will there be any in the logs or system info?

- Pascal

Current log-activity.py output:

bios-version: Q2C18
uptime: 434169.21 430235.72
wireless_mac: 00-17-C4-05-2A-58
uuid: 8A401F4E-E312-47F9-96C8-A488C99BDA2F
localization: ??
kernel_version: Linux version 2.6.22-20071018.1.olpc.d4414541d2be66a ([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)) #1 PREEMPT Thu Oct 18 11:44:14 EDT 2007
diskfree: 716 MB
laptop-info-version: 0.1
memfree: 63496 kB
serial-number: SHF7250025C
disksize: 1024 MB
keyboard: ??-??-??
olpc_build: OLPC build joyride 58 (stream joyride; variant devel_jffs2)
country: USA
board-revision: B4
motherboard-number: QTFLCA72400063
POWER_SUPPLY_NAME=olpc-battery
POWER_SUPPLY_TYPE=Battery
POWER_SUPPLY_STATUS=Full
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_HEALTH=Good
POWER_SUPPLY_TECHNOLOGY=LiFe
POWER_SUPPLY_VOLTAGE_AVG=6792960
POWER_SUPPLY_CURRENT_AVG=0
POWER_SUPPLY_CAPACITY=97
POWER_SUPPLY_CAPACITY_LEVEL=Full
POWER_SUPPLY_TEMP=2508
POWER_SUPPLY_TEMP_AMBIENT=4300
POWER_SUPPLY_ACCUM_CURRENT=8390
POWER_SUPPLY_MANUFACTURER=BYD
POWER_SUPPLY_SERIAL_NUMBER=5d0d0100daff
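Since both messages in this thread are about bundling a set of log files into one small archive, here is a rough sketch of that step in Python, the language of log-collect.py. It is not the code of either script: collect_logs is a made-up name, and the caller supplies the list of paths, because the exact locations are only loosely given in the thread.

```python
import os
import tarfile


def collect_logs(paths, archive_path):
    """Write the readable files among paths into a gzipped tar.

    Returns the list of paths that were actually added; missing or
    unreadable files are skipped rather than treated as errors.
    Hypothetical helper for illustration only.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path) and os.access(path, os.R_OK):
                # Store only the file name, so the archive does not
                # reveal the directory layout of the laptop.
                tar.add(path, arcname=os.path.basename(path))
                added.append(path)
    return added
```

A console script and a Sugar activity could both call a function like this with different path lists (network logs, kernel logs, all logs), which is one way to get the single shared collector suggested in the thread. Two files with the same base name would collide in the archive, so a real collector would probably need a smarter arcname.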
Re: log-collect / log-send
Eduardo,

There is a wiki page with some similar info: http://wiki.laptop.org/go/Developer_Environment

I just realized that this page was created and edited by you. So you have written scripts for this purpose as well?

I have attached my two scripts. They are written in bash, but they are not commented.

netstatus gathers network info like:

mac
ip
eth0, msh0, eth1, etc
dns
jabber server
MPP, AP, schoolserver, linklocal
gabble/salut

netlog gathers the following:

output from netstatus
info file with build, firmware, model
messages
Xorg.0.log (thanks Jim for the comment in the trac)
presenceservice.log
gabble.log
salut.log

yani

On 10/30/07, Eduardo Silva [EMAIL PROTECTED] wrote:

Hi Guys,

I have been working on something similar. It is a console script that gather networks related logs, and will be available in the next joyride.

It would be better to focus on developing just one main class to collect this information, with different front-ends: a console script and the UI under the log activity. In this way we can avoid duplicating code. Giannis, where is your source code? It would be cool if you and Pascal could merge a final python class.

I was planning to add the above features in my script, but a sugar activity is better than a console script. Since we are working on the same thing we can use each other's help, and create a single application.

both can be useful, but using just ONE collector ;)

cheers.
Eduardo.

netlog Description: Binary data
netstatus Description: Binary data
Re: ip4-address buddy property - still needed?
The feature, although not usable by the activities, has other benefits. By observing the buddy list, you acquire instant information about the network connection of the users:

when connected to channel 1, for example:
169.254.x.x addresses are in link-local
172.18.x.x are connected to schoolserver

when connected to a jabber server:
169.254.x.x are connected through an MPP
18.x.x.x are media lab
172.18.x.x are connected to schoolserver in olpc
etc

It is information continuously used in network testing, and also useful from the user's perspective:

1. in the case of connecting to multiple jabber servers, the user should be able to tell which XO in the neighbor view belongs to the same school
2. get the geographical location of another user

In future versions of the neighbor view, or through other activities, the user should be able to filter for specific XOs according to location, or school (in the case he's connected to many servers). Two children in the same school should be able to recognize each other even if they are connected through a jabber server other than the one in the school.

It can also be useful for locating an XO in case of theft.

I have also added a ticket (4405) for adding the public IP in the buddy list properties.

It is a small piece of data (both IPs, private and public), which can be harmlessly incorporated in the telepathy services.

Please let me know if you agree,
yani

On 10/25/07, Jim Gettys [EMAIL PROTECTED] wrote:

It seems, from your discussion, like unless someone grumbles today, this should be removed immediately. And it should be removed within a week, even if someone grumbles...

- Jim

On Thu, 2007-10-25 at 10:15 +0100, Simon McVittie wrote:

We still have one set of OLPC-specific patches to Salut (the link-local collaboration backend) that has been rejected upstream, which is the one that adds support for the deprecated ip4-address buddy property. This was used during a transitional period to enable simple TCP-based collaboration for activities that didn't use Tubes; Sjoerd is reluctant to keep this patch set, because it's meant to have gone away by now!

Is anyone still using this property? If not, can we kill it? It was added in Trial-2, and it was meant to be gone by Trial-3 but was left in just in case, so it really ought to disappear. When it does, we can delete some code from Salut and Presence Service.

Places it's exposed in the APIs, which I propose to get rid of:

PS D-Bus API: Buddy.GetProperties() returns a dict that contains ip4-address: 10.0.0.1 (or whatever), and Buddy.PropertyChanged signal includes a dict that can contain the same

sugar.presence: Buddy has a GLib property ip4-address (aka buddy.props.ip4_address) and can emit it in its property-changed signal

The Read activity appears to be the only thing in my jhbuild that uses ip4-address (#4297). It should be ported to either stream tubes (when they're ready in Salut, which should be this or next week) or D-Bus tubes (now). Gabble already supports stream tubes, so stream-tube support can be implemented on a branch and tested against Gabble.

Porting from plain TCP to stream tubes should be very straightforward; I hope to produce a proof-of-concept patch for Read later today.

Simon

--
Jim Gettys
One Laptop Per Child
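The address-to-network mapping described at the top of this thread (169.254.x.x, 172.18.x.x, 18.x.x.x) is easy to express in code, which may make the testing use case more concrete. The sketch below is only an illustration: classify_buddy is a made-up name, the labels are informal, and treating "media lab" as the whole 18.0.0.0/8 block is an assumption based on the shorthand used in the message.

```python
import ipaddress

# Ranges as described in the thread. The /8 for the Media Lab is an
# assumption based on the "18.x.x.x" shorthand, not a confirmed value.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")
SCHOOL_SERVER = ipaddress.ip_network("172.18.0.0/16")
MEDIA_LAB = ipaddress.ip_network("18.0.0.0/8")


def classify_buddy(ip4_address, via_jabber):
    """Guess how a buddy is connected from its ip4-address property.

    via_jabber is True when the buddy was seen through a jabber server
    (Gabble) and False when seen on the mesh (Salut). Hypothetical
    helper for illustration only.
    """
    addr = ipaddress.ip_address(ip4_address)
    if addr in LINK_LOCAL:
        # Seen through a jabber server, a 169.254 address implies an MPP.
        return "behind an MPP" if via_jabber else "link-local"
    if addr in SCHOOL_SERVER:
        return "school server"
    if via_jabber and addr in MEDIA_LAB:
        return "media lab"
    return "unknown"
```

For example, classify_buddy("169.254.10.2", via_jabber=True) reports an XO that reaches the jabber server through an MPP, which is the kind of at-a-glance information the reply argues for keeping.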
Presence service bugs/enhancements
Simon,

The following are the current bugs/enhancements regarding the presence service. They are listed from high to low priority with their corresponding trac number.

1. The presence service should detect internet connectivity more efficiently and switch to gabble when appropriate (4193)

2. In link-local, XOs are seen in the neighbor view but cannot be shared with. Sometimes they are not connected to the mesh anymore, but still present. In some such cases avahi-browse cannot resolve the services of the corresponding XO. This is high priority, but I don't have a log file for a blocking case, although I have experienced it in build617 (4402)

3. Ability to switch from gabble to salut manually using the options: auto, salut, gabble (4403)

4. Ability to keep an activity alive when passing from salut to gabble and vice versa. This can occur automatically when internet connectivity is dynamically lost or recovered (4404)

5. In gabble, the public IP must be available in the buddy list, or at least be accessible through the jabber server upon request (4405)

6. The jabber servers should be switchable (to change from one to the other) in a neater way than accessing the config file and rebooting. This can probably be invoked by sending something like ..xmlns:stream="http://etherx.jabber.org/streams" to="jabber.laptop.org" as I noticed in the log files. If it is simple to apply, can you describe how it can be done properly? (not on trac)

Thanks,
yani
Network connectivity test update
Kim, Ricardo,

I have updated the Network connectivity test page (http://wiki.laptop.org/go/Test_Network_Configuration). I have added some additional information concerning the IP addresses and the resolv.conf file, in order to make things clearer.

I also updated the connectivity_status script. It can now detect if the XO acts as an MPP, connects through an Ethernet adapter, etc. It can be useful for machines running old builds.

I will be waiting for possible suggestions.

yani
Re: Bugs and more
server, for instance) and the laptops can't decide if they should be talking local-link or through the jabber server. I'm sure that will mess up tubes sharing.

I agree, I just had another issue in Alex's XOs where two XOs were connected to MediaLab802.11 but were still running salut. It was displaying 18.85 and 169.254 XOs, which seemed to be all from this room.. this bug is very interesting! I had seen it in the past, but I couldn't describe it. Even when in AP mode, avahi, which runs in the background, still creates the presence list from the mesh that it is connected to (ch 6 in this case). It can be accessed by avahi-browse -t _presence._tcp. But now it included 18.85 and 169 XOs (bug 4193)

= Once you've explored a few of these things from this email, I would like you to send it out to the devel mailing list for review. You might get a few good suggestions on other things for the status program and then figure out how to check it in so we can all use it for debug.

yani

On 10/12/07, Kim Quirk [EMAIL PROTECTED] wrote:

Yani,

Lots of good work on this document! I've added my comments inline below. Copying Alex for comments as well.

Kim

On 10/12/07, Giannis Galanis [EMAIL PROTECTED] wrote:

Kim,

A couple of things in case I forget later today.

First, concerning the storage of the WEP keys, deleting the nm/networks.cfg does not work. It is recreated after some time. You must delete it and reboot before trying to reconnect to the AP. I think it is an important bug that the APs don't refresh in the neighbor screen when they are configured. I had to reboot more than 10 times to finish my tests for different types of WEP keys. (bug 4190)

= Could you tell if you needed to reboot the AP and wait for it to settle and reboot the XO -- each just once, but you would have to do the order properly. It is my expectation that after setting up the AP, it would need to be rebooted. And it makes sense to me that the XO would need its file removed and then to be rebooted in order to 'see' it as a new AP. This should be in the release note. Here is what I *think* the release note should say (please make appropriate changes, etc and add it to the Kqrelease):

If there is a change to the configuration of your infrastructure AP, then after making changes and rebooting the AP; please delete the network config file and reboot the XO. (you might want to put this in steps and say how to find the config file, etc)

Concerning the authentication via password and not WEP key, it is as I described to you in my last email. In fact each manufacturer has its own hashing algorithm, so it is virtually impossible to try all combinations. We must come up with a convention (e.g. only the airport algorithm or something)

= We are in the business of the laptop and not the AP. So we are not going to be able to test with all the Access Points out there and all their configurations. What we can do is to document the ones we have tested (and the work arounds, as you found with the Airport Extreme); and make sure we invite others to add their support notes about any problems or advice for working with other APs.

The 3 items (2 circles + battery) in the donut appeared again (4191), and I reported it for the second time, this time as Wireless and not as Network Manager, in case it goes through more efficiently. No one replied to the first bug, although it must be very important.

= I believe Dan's comments on this is that it is probably just a UI bug and not affecting the functioning. If you agree with that, then you should put a note in the bug about your thoughts and re-assign it to 'Sugar' so the right person will look at it. I would let it remain as 'high' priority until we know more. Also, can you put a note in the release notes on this one.

Finally, have a look at this page: http://wiki.laptop.org/go/Test_Network_Configuration

It includes a detailed guide of how to examine your network, including MPP, 169.x addresses, Gabble/Salut etc. I have also included a script which collects all the useful information (resolv.conf, ip, jabber server.. etc) and displays it in a neat format.

= This looks great, Yani. I have a couple of questions and I added a few notes. On the IP Addresses section, I don't believe that you get a 169.x.x.x address when connected to an Infrastructure AP. Normally you would get a real IP address given to you by the DHCP on the internet side of the infrastructure AP. Perhaps your AP (the airport extreme) was not connected to the internet when you did your testing. Also, I'm pretty sure you will not get a 192.x.x.x from the school server mesh. Usually you see 192.x.x.x from people setting up a home wireless network. So I made some changes to that section of your document.

I'm not sure your DNS check info (resolv.conf) is correct for the case of 169.x.x.x. You say it is because the XO is connecting through an MPP. But I think you