Re: e100: checksum mismatch on 82551ER rev10
Hi,

> If the EEPROM has a broken checksum, the user should have an option that allows him to try and use the device anyways, end of story.

I've come across this problem a number of times on e1000 chips (to be clear, it was vendor programming issues). The driver already has the option to read and write the EEPROM. All we need is the ability for the driver to hang around so that we can use ethtool to fix it.

At the moment we carry an out-of-tree patch to do this.

Anton
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e100: checksum mismatch on 82551ER rev10
[EMAIL PROTECTED] said:

> And BTW I want to remind the entire world that the last time Intel cried wolf to all of us about vendors using broken EEPROMs with their networking chips it turned out to be a bug in one of the patches Intel put into the Linux driver. :-)
>
> Intel should really humble themselves and help users instead of hinder them. Putting the blame on other vendors does not help users, I don't care how you spin it. It only serves to make Intel look like a bunch of control freaks, and that pisses off users to no end.

The real problem here is neither Intel nor users. It's crappy vendor QA.

I recently had to deal with a batch of e1000 cards that had the *wrong* EEPROMs, with *correct* checksums. So of course the driver didn't complain, never mind the fact that the EEPROM might claim you have a copper card when it's really fiber. And that's the best case, because it fails obviously. Far worse is when an EEPROM is close enough to work, but claims the wrong chipset revision and causes the driver to do totally wrong things in strange circumstances.

I think this is what Auke is worried about. If you can't trust the EEPROM, all sorts of maddeningly subtle things can go wrong. And it isn't likely to be properly diagnosed by an end user.

The sad thing is that the checksum can only protect against a subset of EEPROM problems. But it does help. As a counterexample, a power failure last weekend corrupted the EEPROM of the onboard e100 in one of my servers, and this EEPROM check led to an immediate diagnosis of the problem.

> Please put the option into the e100 driver to allow trying to use the device even if the EEPROM checksum is wrong.

There is already support for EEPROM read/write in ethtool. I used it to fix the e1000 cards in question. If e100 implements ethtool -E, all that's needed is documentation on where in the EEPROM the checksum is stored and how to calculate it. I don't doubt the freely available PDFs for e100 chipsets cover this.
Jason
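To make the "where is the checksum stored and how to calculate it" point concrete, here is a minimal sketch in Python. It assumes the convention that the EEPROM is an array of 16-bit words whose sum, including a checksum stored in the last word, equals 0xBABA modulo 2^16. Both the constant and the checksum location are assumptions here and should be confirmed against the driver source or the chipset datasheet before writing anything back with ethtool -E. The function names are made up for illustration.

```python
# Assumed target value for the 16-bit sum of all EEPROM words.
# Verify against the datasheet / driver source for your chip.
EEPROM_SUM = 0xBABA


def word_sum(words):
    """Sum of 16-bit words, truncated to 16 bits."""
    return sum(words) & 0xFFFF


def expected_checksum(words):
    """Checksum word that makes the whole image sum to EEPROM_SUM.

    Assumes the checksum occupies the last word of the image.
    """
    return (EEPROM_SUM - word_sum(words[:-1])) & 0xFFFF


def checksum_ok(words):
    """True if the stored checksum matches the computed one."""
    return words[-1] == expected_checksum(words)


def repair(words):
    """Return a copy of the image with the checksum word recomputed."""
    fixed = list(words)
    fixed[-1] = expected_checksum(words)
    return fixed


if __name__ == "__main__":
    # Toy five-word image with a deliberately wrong checksum word.
    image = [0x1234, 0xABCD, 0x0000, 0xFFFF, 0x0000]
    print("valid before:", checksum_ok(image))
    fixed = repair(image)
    print("new checksum word: 0x%04X" % fixed[-1])
    print("valid after:", checksum_ok(fixed))
```

Note that recomputing the checksum only makes the image self-consistent. As described above, it does nothing for an EEPROM whose contents are wrong but whose checksum is correct.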
Re: e100: checksum mismatch on 82551ER rev10
Auke Kok wrote:
> Charlie Brady wrote:
>> Let's assume that these things are all true, and the NIC currently does not work perfectly, just imperfectly, but acceptably. With the recent driver change, it now does not work at all. That's surely a bug in the driver.
>
> There is no logic in that sentence at all. You're saying that the driver is broken because it doesn't fix an error in the EEPROM?

It's broken because it bails completely instead of just emitting a warning message.

You wouldn't believe the number of hours people spend out there trying to get a Linux box up when there's no network access. Bailing out and completely disabling the hardware on checksum errors is shooting those people in the foot, because they'll need to try and debug the driver, or the hardware, or do something else entirely, perhaps on an embedded device, and you're basically telling them "We at Intel do not want to allow you to even attempt to make your hardware work."

By refusing to add an option to NOT bail, you're adding "And we're happy to handicap any attempts you might make at it."

> We're trying extremely hard to fix real errors here

You're not fixing anything, you're creating a problem for the user, sorry.

> (especially when we find that hardware resellers send out hardware with EEPROM problems) and you are asking for a workaround that will (likely) introduce random errors and failure into your kernel.

You've established yourself that the most likely cause of the error is that the vendor forgot to run a checksumming tool. That's hardly "random errors and failure".

You're trying to pull Linux end users into a war between Intel and its vendors, so you can make end users scream at the vendors when they forget to run the checksum tool. Well, perhaps you should drop that and instead make it so that the *tools* bail when the checksum is wrong, not the end user's driver.

> If you want to recalculate the checksum yourself and put it in the EEPROM then I am also fine with that.

Could you please provide a method and/or tool to do that?

> But we can't support an option that allows all users to willingly enable a piece of non-properly-working hardware.

The tactful thing to do would be to put out a big fat error message during boot, but not bail. If you're worried that the end user might not see the message, then bail, but provide an option to load anyway.

This is the only constructive and meaningful way forward. There's no point in holding the end user hostage.

> The bottom line is that your problem is that a specific hardware vendor is/was selling badly configured hardware, and you buy it from them, even after it's End Of Lifed for that vendor. Even though that vendor did buy the units properly configured and had all the tools needed to configure them properly.

There's no way for me to make Nokia do anything about this problem. Please don't try to drag me into an Intel vs. vendors war just for the purpose of making me a number in their statistics. (Maybe you could improve your tools so they'll want to fix the checksum.)

> I can maybe fix your problem by seeing if we can get you an eeprom update

Any chance you could get one of those for me?

(Yeah, I do realize that I'm criticizing and then asking for help. Cocky :-D.)
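The two behaviours proposed above (warn loudly and carry on, or bail by default but offer an override) can be sketched as a small decision function. This is an illustration in Python only, not driver code: `probe`, `allow_bad_checksum` and the logger name are hypothetical names chosen for this sketch, and a real driver would expose the override as a load-time option instead.

```python
import logging

log = logging.getLogger("nic_probe_sketch")  # hypothetical logger name


def probe(checksum_valid, allow_bad_checksum=False):
    """Decide whether to bring the device up.

    Returns True if the device should be initialised, False if the
    probe should bail. `allow_bad_checksum` stands in for a
    user-supplied override option (hypothetical name).
    """
    if checksum_valid:
        return True
    if allow_bad_checksum:
        # Warn loudly, but let the user try the hardware anyway,
        # e.g. so the EEPROM can be repaired from the running system.
        log.warning("EEPROM checksum mismatch; continuing because "
                    "the override option is set")
        return True
    # Default: refuse, but tell the user how to get past the check.
    log.error("EEPROM checksum mismatch; refusing to load "
              "(set the override option to try anyway)")
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(probe(checksum_valid=False))
    print(probe(checksum_valid=False, allow_bad_checksum=True))
```

Keeping the default strict while naming the override in the error message addresses both concerns raised in this thread: a corrupted EEPROM is still diagnosed immediately, and the user is not locked out of the hardware.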
Re: e100: checksum mismatch on 82551ER rev10
From: Molle Bestefich [EMAIL PROTECTED]
Date: Fri, 4 Aug 2006 13:04:07 +0200

> You're trying to pull Linux end users into a war between Intel and its vendors, so you can make end users scream at the vendors when they forget to run the checksum tool.

I totally agree, Intel driver maintainers generally act like complete idiots in these kinds of situations.

If the EEPROM has a broken checksum, the user should have an option that allows him to try and use the device anyways, end of story.

It is only self-serving to not provide this option to the user. People make errors, EEPROMs get shipped with bad checksums, but the device might still be usable. That is life, get over it.
Re: e100: checksum mismatch on 82551ER rev10
From: David Miller [EMAIL PROTECTED]
Date: Fri, 04 Aug 2006 04:20:24 -0700 (PDT)

> I totally agree, Intel driver maintainers generally act like complete idiots in these kinds of situations.
>
> If the EEPROM has a broken checksum, the user should have an option that allows him to try and use the device anyways, end of story.

And BTW I want to remind the entire world that the last time Intel cried wolf to all of us about vendors using broken EEPROMs with their networking chips it turned out to be a bug in one of the patches Intel put into the Linux driver. :-)

Intel should really humble themselves and help users instead of hinder them. Putting the blame on other vendors does not help users, I don't care how you spin it. It only serves to make Intel look like a bunch of control freaks, and that pisses off users to no end.

Please put the option into the e100 driver to allow trying to use the device even if the EEPROM checksum is wrong. If an Intel developer doesn't do it, I will.