Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread Jeff Garzik

Kok, Auke wrote:

Adam Jackson wrote:

On Tue, 2007-10-23 at 09:18 -0700, Kok, Auke wrote:

Adam Jackson wrote:

When the EEPROM gets corrupted, you can fix it with ethtool, but only if
the module loads and creates a network device.  But, without this option,
if the EEPROM is corrupted, the driver will not create a network device.

Signed-off-by: Adam Jackson [EMAIL PROTECTED]

NAK

wrong list, not sent to me, and while for e100 I was OK with this patch, for 
e1000
it really does not make sense to 'just allow' a bad checksum - if your eeprom is
randomly messed up then you cannot just fix it like this anyway.

That's strange, I managed to recover an otherwise horked e1000 with it.
What should I have done instead?



Dump the eeprom and send us a copy, plus any and all information to the card,
system etc.. I realize that you need the patch to actually create it but the
danger is that people will start using it *without* troubleshooting the real
issue. In various systems the eeprom checksum failure is actually due to a
misconfigured powersavings feature and the checksum is really not bad at all, 
but
the card just reports random values.

In any case, this patch should not be merged. We often send it around to users 
to
debug their issue in case it involves eeproms, but merging it will just conceal
the real issue and all of a sudden a flood of people stop reporting *real* 
issues
to us.



Sorry, I disagree.  Just as with e100, if there is a clear way the user 
can recover their setup -- and Adam says his was effective -- I don't 
see why we should be denying users the ability to use their own hardware.


Jeff


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread Kok, Auke
Jeff Garzik wrote:
 Kok, Auke wrote:
 Adam Jackson wrote:
 On Tue, 2007-10-23 at 09:18 -0700, Kok, Auke wrote:
 Adam Jackson wrote:
 When the EEPROM gets corrupted, you can fix it with ethtool, but
 only if
 the module loads and creates a network device.  But, without this
 option,
 if the EEPROM is corrupted, the driver will not create a network
 device.

 Signed-off-by: Adam Jackson [EMAIL PROTECTED]
 NAK

 wrong list, not sent to me, and while for e100 I was OK with this
 patch, for e1000
 it really does not make sense to 'just allow' a bad checksum - if
 your eeprom is
 randomly messed up then you cannot just fix it like this anyway.
 That's strange, I managed to recover an otherwise horked e1000 with it.
 What should I have done instead?


 Dump the eeprom and send us a copy, plus any and all information to
 the card,
 system etc.. I realize that you need the patch to actually create it
 but the
 danger is that people will start using it *without* troubleshooting
 the real
 issue. In various systems the eeprom checksum failure is actually due
 to a
 misconfigured powersavings feature and the checksum is really not bad
 at all, but
 the card just reports random values.

 In any case, this patch should not be merged. We often send it around
 to users to
 debug their issue in case it involves eeproms, but merging it will
 just conceal
 the real issue and all of a sudden a flood of people stop reporting
 *real* issues
 to us.
 
 
 Sorry, I disagree.  Just as with e100, if there is a clear way the user
 can recover their setup -- and Adam says his was effective -- I don't
 see why we should be denying users the ability to use their own hardware.


That's not even relevant, I already offer the same patch offline to people who
*really* only have a wrong checksum, AFTER we check the contents of their eeprom
for them.

We help everyone out, and if you merge this patch you will prevent users from
getting to us for support in the first place.

I realize that we should probably document the bad eeprom checksum case more
decently but I think merging this patch is a bad idea for the *user* in all 
cases.

You completely bypass the fact that e100 eeproms and e1000 eeproms are 
completely
different beasts as well, one can be practically empty in all cases (e100), the
other one every bit counts (most e1000's), which is an unfair representation and
falsely tells the user that he can just do this without any information or
disclaimer as to what he may expect afterwards.


Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread Dave Jones
On Tue, Oct 23, 2007 at 04:40:01PM -0400, Jeff Garzik wrote:

   In any case, this patch should not be merged. We often send it around to 
   users to
   debug their issue in case it involves eeproms, but merging it will just 
   conceal
   the real issue and all of a sudden a flood of people stop reporting *real* 
   issues
   to us.
  
  Sorry, I disagree.  Just as with e100, if there is a clear way the user 
  can recover their setup -- and Adam says his was effective -- I don't 
  see why we should be denying users the ability to use their own hardware.
 
Indeed. This is a common enough problem that not including it causes more pain
than its worth.  I have two affected boxes myself that I actually thought
the hardware was dead before I tried ajax's patch.

People aren't going to report this as a bug. They aren't going to try out 
patches,
they're going to do what I did and stick another network card in the box and
go on with life.

Our users deserve better than this.

Dave

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread Alan Cox
 People aren't going to report this as a bug. They aren't going to try out 
 patches,
 they're going to do what I did and stick another network card in the box and
 go on with life.
 
 Our users deserve better than this.

Agreed. By all means warn people, or give them a 1-800 Intel number to
phone, but they should be able to continue as well.

Alan
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread David Miller
From: Kok, Auke [EMAIL PROTECTED]
Date: Tue, 23 Oct 2007 14:01:21 -0700

 We help everyone out, and if you merge this patch you will prevent
 users from getting to us for support in the first place.

If using the bad eeprom has to be explicitly enabled by the user, your
argument holds no water.  We just need to make sure the patch does
that.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread David Miller
From: Dave Jones [EMAIL PROTECTED]
Date: Tue, 23 Oct 2007 17:20:26 -0400

 Indeed. This is a common enough problem that not including it causes
 more pain than its worth.  I have two affected boxes myself that I
 actually thought the hardware was dead before I tried ajax's patch.

 People aren't going to report this as a bug. They aren't going to
 try out patches, they're going to do what I did and stick another
 network card in the box and go on with life.

 Our users deserve better than this.

Seconded.  The resistence to this patch is just flat-out rediculious,
just like it was in the e100 case.

And I think all of this e1000 is different! talk is merely a
scarecrow for the fact that Intel simply doesn't want this patch
merged for some other reason.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread Kok, Auke
Dave Jones wrote:
 On Tue, Oct 23, 2007 at 04:40:01PM -0400, Jeff Garzik wrote:
 
In any case, this patch should not be merged. We often send it around to 
 users to
debug their issue in case it involves eeproms, but merging it will just 
 conceal
the real issue and all of a sudden a flood of people stop reporting 
 *real* issues
to us.
   
   Sorry, I disagree.  Just as with e100, if there is a clear way the user 
   can recover their setup -- and Adam says his was effective -- I don't 
   see why we should be denying users the ability to use their own hardware.
  
 Indeed. This is a common enough problem that not including it causes more pain
 than its worth.  I have two affected boxes myself that I actually thought
 the hardware was dead before I tried ajax's patch.


look: You should have reported this to us and you didn't. Now you are using the
fact that you did not report it as an argument which is out of place.

why do you say it is common? how often have you seen this and not reported it 
back
to our support? are you willingly trying to frustrate this issue?


Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread Kok, Auke
David Miller wrote:
 From: Dave Jones [EMAIL PROTECTED]
 Date: Tue, 23 Oct 2007 17:20:26 -0400
 
 Indeed. This is a common enough problem that not including it causes
 more pain than its worth.  I have two affected boxes myself that I
 actually thought the hardware was dead before I tried ajax's patch.

 People aren't going to report this as a bug. They aren't going to
 try out patches, they're going to do what I did and stick another
 network card in the box and go on with life.

 Our users deserve better than this.
 
 Seconded.  The resistence to this patch is just flat-out rediculious,
 just like it was in the e100 case.
 
 And I think all of this e1000 is different! talk is merely a
 scarecrow for the fact that Intel simply doesn't want this patch
 merged for some other reason.

no, e1000 eeproms contain many timing information and bits crucial to getting 
the
adapter working in the first place. All of these are documented in our 
PUBLICALLY
available SDM which is downloadable from our e1000.sf.net website. (e.g. 8254x
sdm, section 5.6, page 98+). For pci-e silicon this gets much more complex.

we haven't even heard from the user what hardware he has nor gotten an eeprom 
dump
from him.

I'm not hiding anything and you're deliberately creating a negative atmosphere 
here.

The people who do have eeprom checksum issues have come to us in the past. The
fact that you didn't see them means that they *properly* made it to us.

As a matter of fact I am still working on a permanent solution for bad eeprom
checksums on lenovo T60 laptops. Should I just drop that issue and leave the 
real
problem unsolved?

This patch only affirms *YOUR* point of view, not that of many customers who 
have
come to us and received help with many issues. You're completely ignoring that 
and
that is unfair.

If we want this patch in the kernel in some form that actually shows in a decent
way what a user *should* do and more importantly should *know*, then maybe we 
can
talk about that.

The patch in question does not add any extra information nor does it do some
sanity checking on the eeprom values or turn off any of the problematic features
that we really should disable. We even have code that allows some of the 
hardware
to run without a properly setup eeprom in a few hardware cases. And we 
definately
should print out a lot more warnings to the user that running with garbage 
eeprom
data is _not_ a good idea.


Auke


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread Stephen Hemminger
On Tue, 23 Oct 2007 16:03:38 -0700
Kok, Auke [EMAIL PROTECTED] wrote:

 Dave Jones wrote:
  On Tue, Oct 23, 2007 at 04:40:01PM -0400, Jeff Garzik wrote:
  
 In any case, this patch should not be merged. We often send it around 
  to users to
 debug their issue in case it involves eeproms, but merging it will 
  just conceal
 the real issue and all of a sudden a flood of people stop reporting 
  *real* issues
 to us.

Sorry, I disagree.  Just as with e100, if there is a clear way the user 
can recover their setup -- and Adam says his was effective -- I don't 
see why we should be denying users the ability to use their own hardware.
   
  Indeed. This is a common enough problem that not including it causes more 
  pain
  than its worth.  I have two affected boxes myself that I actually thought
  the hardware was dead before I tried ajax's patch.
 
 
 look: You should have reported this to us and you didn't. Now you are using 
 the
 fact that you did not report it as an argument which is out of place.
 
 why do you say it is common? how often have you seen this and not reported it 
 back
 to our support? are you willingly trying to frustrate this issue?
 
 
 Auke

What about a compromise like ignore_checksum module option?
That way users with bad checksums wouldn't just ignore the problem (no one 
reads console logs),
but would have a way to correct the checksum.

There are many reasons would want the ability to fix the problem themselves 
without
asking Intel.  

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add eeprom_bad_csum_allow module option to e1000.

2007-10-23 Thread Dave Jones
On Tue, Oct 23, 2007 at 04:03:38PM -0700, Kok, Auke wrote:
  Dave Jones wrote:
   On Tue, Oct 23, 2007 at 04:40:01PM -0400, Jeff Garzik wrote:
   
  In any case, this patch should not be merged. We often send it around 
   to users to
  debug their issue in case it involves eeproms, but merging it will 
   just conceal
  the real issue and all of a sudden a flood of people stop reporting 
   *real* issues
  to us.
 
 Sorry, I disagree.  Just as with e100, if there is a clear way the user 
 can recover their setup -- and Adam says his was effective -- I don't 
 see why we should be denying users the ability to use their own 
   hardware.

   Indeed. This is a common enough problem that not including it causes more 
   pain
   than its worth.  I have two affected boxes myself that I actually thought
   the hardware was dead before I tried ajax's patch.
  
  
  look: You should have reported this to us and you didn't. Now you are using 
  the
  fact that you did not report it as an argument which is out of place.

you're missing the point.  It looks like a hardware failure. Why would I report 
this? 

  why do you say it is common? how often have you seen this and not reported 
  it back
  to our support? are you willingly trying to frustrate this issue?

Not at all. The only frustration here is that I used to have a kernel that
worked, upgraded, and thought that my hardware was broken.
How many other users thought the same ?

Dave

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html