Inline. Regards, Ronak
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:testing-[EMAIL PROTECTED] On Behalf Of Michail Bletsas
> Sent: Thursday, March 27, 2008 10:53 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; Titus Brown; [EMAIL PROTECTED]; testing-[EMAIL PROTECTED]
> Subject: [Testing] [sugar] Automated testing, OLPC, code+screencasts.
>
> "Benjamin M. Schwartz" <[EMAIL PROTECTED]> wrote on 03/27/2008 01:37:16 AM:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Michail Bletsas wrote:
> > | [EMAIL PROTECTED] wrote on 03/26/2008 09:19:19 PM:
> > |
> > |> 2. Many, and perhaps most, of OLPC's remaining difficult bugs are related
> > |> to the network. They are most commonly related to the closed wireless
> > |> firmware, which is buggy and lacks key features regarding mesh routing and
> > |> multicast.
> > |
> > | Can you qualify your statement?
> > I have seen the wireless hardware silently drop all outgoing packets but
> > continue to route incoming packets for several minutes, until forcibly
> > reset by the user (about a month ago). The firmware is so unstable that
> > the wireless driver even contains a mechanism to recognize when the
> > firmware has wedged and reset it. This is what I mean by buggy.
>
> So according to your thinking everything that has a reset button is buggy.
> I guess that, technically speaking, you are correct ;-)
> I also tend to believe that a "thick" firmware like the one that we use on
> the 8388 will always have bugs given that it is several hundred thousand
> lines of code so I don't feel bad for putting the reset functionality
> there in the first place.
>
> You are also very quick to point fingers to the firmware for everything
> that goes wrong with the networking subsystem of the laptop.
> The behavior that you are describing can be explained when the wireless
> firmware doesn't communicate with the host CPU and is only forwarding
> frames for other mesh nodes.
> There has also been a major rewrite of the (completely open source) driver
> in use with the laptop, after which we started to see that behavior (which
> was not observed before the rewrite).
> It is very easy to point fingers on religious grounds, it is much more
> difficult to fix problems.
>
> > | What features does it lack when it comes to mesh routing?
> > For me, the #1 missing feature is whitelisted wake-on-multicast. To be
> > specific, it should be possible for the firmware to be told which
> > multicast addresses refer to this host. The firmware would then wake up
> > the CPU only when a multicast packet arrives with a destination that is on
> > the whitelist. Without this feature, we are forced to choose between
> > never waking on multicast, and missing lots of important packets, and
> > waking up on every single multicast packet, which essentially means never
> > sleeping at all.
>
> First of all, what you are describing is standard WOL behavior (Wakeup on
> LAN) which was not present in the original spec of the mesh firmware in
> favor of the more general wakeup on broadcast, mcast or unicast. Marvell
> is working on adding that in.
> So, no bug here, just oversight on our part which is going to be remedied.
>
> Even with that support in place, we will still be "missing lots of
> important packets" unless we decide to wakeup on every multicast frame. So
> a more specific filter is required because you don't want to wakeup on
> Avahi announcements but you do want to wakeup on traffic from activities
> that you already participate in. You can do that on the application level, by
> stopping the Avahi listener before you suspend, however that will add a
> lot of time to suspend and resume.

[Ronak] We are introducing a filtered mechanism in the firmware that will allow the driver to program a handful of multicast addresses and hence will allow the device to wake up the host on some (and not all) of the multicast addresses.
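The whitelisted wake-on-multicast filter described above can be sketched as follows. This is only an illustration in Python, not firmware or driver code: the `WakeFilter` class, its capacity of 8 entries, and the group address `239.1.2.3` are hypothetical, and a real filter would more likely match on multicast MAC addresses than on IP addresses.

```python
import ipaddress


class WakeFilter:
    """Hypothetical model of a small multicast wake-up whitelist."""

    def __init__(self, capacity=8):
        # Assumption: the firmware can hold only a handful of entries.
        self.capacity = capacity
        self.allowed = set()

    def program(self, address):
        """Add a multicast group; reject non-multicast input or overflow."""
        addr = ipaddress.ip_address(address)
        if not addr.is_multicast:
            raise ValueError(f"{address} is not a multicast address")
        if addr not in self.allowed and len(self.allowed) >= self.capacity:
            raise OverflowError("wake filter table is full")
        self.allowed.add(addr)

    def should_wake(self, destination):
        """Return True only if the frame's destination is whitelisted."""
        return ipaddress.ip_address(destination) in self.allowed


wake_filter = WakeFilter()
wake_filter.program("239.1.2.3")  # hypothetical group used by a joined activity

print(wake_filter.should_wake("239.1.2.3"))    # traffic for a whitelisted group
print(wake_filter.should_wake("224.0.0.251"))  # mDNS (Avahi) group, not whitelisted
```

With a table like this, the host would stay asleep through Avahi announcements on 224.0.0.251 and wake only for groups the driver has explicitly programmed.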
>
> > My #2 missing feature is a control for transmit gain and receive gain. By
> > decreasing gain, the range of each transmission could be reduced, turning
> > dense meshes in a single classroom into multihop meshes. This might
> > compensate somewhat for the firmware's simplistic multicast routing. It's
> > not clear that this would work, but at present we cannot even try it.
>
> Why would you ever want to turn a classroom into a multihop mesh?
> Just because you have a hammer, doesn't turn everything into a nail.
> That is exactly the approach that has created all the unrealistic
> expectations about what the mesh can and cannot do.
> If you are in a classroom, an AP will always be a lot more efficient since
> it doesn't have to deal with the mesh control plane traffic.
>
> As far as the support for transmit gain and receive gain is concerned,
> transmit power control is definitely supported and the firmware even
> supports per frame tx power setting. The D/A on the power amplifier used
> on the XO's module is not fast enough for that to work, so one has to
> settle for coarser grain control. The bottom line is that this is a
> hardware limitation, not software.
>
> I don't really understand what receive gain adjustment will buy you in a
> dense scenario. One of the fundamental issues with WiFi radios in general,
> is that interference range is much larger than decode range. What you can
> play with is the clear channel assessment threshold, however that is
> different from receiver gain (usually done via an AGC in the analog
> domain).
>
> > Smart multicast routing is the other obvious missing feature; I appreciate
> > that this is still considered an academic research problem.
> It is and the 802.11s standards committee is also struggling with it.
>
> > | Can you point me to a better working implementation out there when it
> > | comes to multicast routing?
> > No, I cannot.
> >
> > This wireless firmware may be the best mesh implementation in all of
> > history, in the whole world. It's still disgustingly buggy, and has
> > already set the project back months.

[Ronak] Not sure how far this is actually true. But we have been through this in an earlier email thread and I don't feel the need to reiterate the explanations here.

> > Its multicast and wakeup behaviors
> > have forced us to drop critical features. The software team has come to
> > regard the wireless system as so unpredictable that any task involving it
> > is "science, not programming".
>
> Just looking at the number of bugs in the trac contradicts your statement.
> And yes, the wireless subsystem does many things that existing radios
> don't do.
> It is also asked to do "magic" as opposed to what physics realistically
> allows.
> That's the main bug with it right now. It just doesn't make spectrum out
> of thin air...
>
> > I am also quite convinced that if OLPC developers were free to read the
> > source code and modify it, given access to Marvell's internal
> > documentation, we would be much further along.
>
> That is generally true. It runs against long established practice in the
> wireless industry that is enforced by some valid and some not so valid
> reasons. Unfortunately, there is no example in the industry right now of
> an open fully-functional low-level wireless stack and that will take some
> time to change. If the XO ends up being produced in really high volumes,
> then we will definitely revisit that. The bottom line is that right now
> the volumes of the devices that require their radios to be "closed" are
> much higher than those of the open source devices.
>
> > |> 3. Almost all of OLPC's major bugs are Heisenbugs. They often don't
> > |> appear at all with only one laptop, and appear rarely until one has 12 or
> > |> more laptops sharing a wireless mesh.
> > |
> > | And most of them are due to the fact that our application traffic
> > | saturates the wireless spectrum.
> >
> > Indeed. And that is due to a mismatch between Salut, which assumes
> > efficient multicast routing, and the firmware, which doesn't provide it.
> > I know very little about the ongoing work with Cerebro, but that seems to
> > be a very reasonable next step.
>
> Yes, it is.
>
> M.
>
> _______________________________________________
> Testing mailing list
> [email protected]
> http://lists.laptop.org/listinfo/testing
