Bug#572201: Further queries

2010-04-07 Thread stephen mulcahy

Ben Hutchings wrote:

> On Tue, 2010-03-16 at 10:33 +, stephen mulcahy wrote:
> [...]
>>> We will shortly update the official kernel packages to incorporate this
>>> release, so you could just wait a day or two and update.  However I'm not
>>> aware of any changes in 2.6.32.10 that would fix this sort of bug.
>>
>> Again, I scanned the changelogs and nothing jumped out at me. I'll try
>> the updated package when you release it to see if it makes a difference.
> [...]
>
> Have you done this and did it help?


Hi,

I just tried the updated package and it doesn't help. Nodes are still
dropping out after running the Hadoop Terasort (behaviour which doesn't
happen with the 2.6.30 kernel).

There are still no messages in the logs, but as usual, ifdown followed by
ifup makes things right.


Anything else I can run for diagnostics?

-stephen

--
Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie
Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway)



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#572201: Further queries

2010-04-04 Thread Ben Hutchings
On Tue, 2010-03-16 at 10:33 +, stephen mulcahy wrote:
[...]
>> We will shortly update the official kernel packages to incorporate this
>> release, so you could just wait a day or two and update.  However I'm not
>> aware of any changes in 2.6.32.10 that would fix this sort of bug.
>
> Again, I scanned the changelogs and nothing jumped out at me. I'll try
> the updated package when you release it to see if it makes a difference.
[...]

Have you done this and did it help?

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.




Bug#572201: Further queries

2010-03-16 Thread stephen mulcahy

Ben Hutchings wrote:

> On Mon, Mar 15, 2010 at 05:20:32PM +, stephen mulcahy wrote:
> All pause frames should be dropped, either by the hardware or the driver.
> So it's not unexpected that these are equal.


Ok, thanks for the clarification.


> It might be interesting to see what happens if you disable pause frame
> handling with this command:
>
> ethtool -A eth0 autoneg off rx off tx off


I tried this and re-ran my Hadoop test, and I'm seeing the same drop-outs
from systems as with pause frame handling enabled. Running ethtool -S eth0
on a dropped-out system gives the following output.


NIC statistics:
 tx_bytes: 45900034824
 tx_zero_rexmt: 40968086
 tx_one_rexmt: 0
 tx_many_rexmt: 0
 tx_late_collision: 0
 tx_fifo_errors: 0
 tx_carrier_errors: 0
 tx_excess_deferral: 0
 tx_retry_error: 0
 rx_frame_error: 0
 rx_extra_byte: 0
 rx_late_collision: 0
 rx_runt: 0
 rx_frame_too_long: 0
 rx_over_errors: 0
 rx_crc_errors: 0
 rx_frame_align_error: 0
 rx_length_error: 0
 rx_unicast: 42104294
 rx_multicast: 897
 rx_broadcast: 564
 rx_packets: 42105755
 rx_errors_total: 0
 tx_errors_total: 0
 tx_deferral: 0
 tx_packets: 40968086
 rx_bytes: 48159336484
 tx_pause: 0
 rx_pause: 0
 rx_drop_frame: 0
 tx_unicast: 3322
 tx_multicast: 4392
 tx_broadcast: 23998478524

and no messages in the system logs.
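
For anyone wanting to check output like the above across many nodes, here is a minimal sketch that parses `ethtool -S` text and lists counters that look like problems and are nonzero. The function names and the keyword list are assumptions made for illustration; they are not part of ethtool or the forcedeth driver, and the sample text is an excerpt of the statistics quoted above.

```python
def parse_ethtool_stats(text):
    """Parse the text printed by `ethtool -S <iface>` into {counter_name: int}."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and anything that is not "name: number".
        if not sep or not value.isdigit():
            continue
        stats[name.strip()] = int(value)
    return stats


def nonzero_problem_counters(stats):
    """Return counters whose names hint at errors, drops or pause frames and are > 0.

    The keyword list is a guess based on the counter names in this thread.
    """
    keywords = ("error", "drop", "pause", "collision", "runt", "too_long")
    return {
        name: value
        for name, value in stats.items()
        if value > 0 and any(word in name for word in keywords)
    }


# Excerpt of the output quoted in this message.
SAMPLE = """NIC statistics:
 tx_bytes: 45900034824
 tx_zero_rexmt: 40968086
 rx_crc_errors: 0
 rx_errors_total: 0
 tx_errors_total: 0
 tx_pause: 0
 rx_pause: 0
 rx_drop_frame: 0
"""

stats = parse_ethtool_stats(SAMPLE)
print(nonzero_problem_counters(stats))
```

On a live system the text could be obtained with something like `subprocess.run(["ethtool", "-S", "eth0"], capture_output=True, text=True).stdout`; the interface name eth0 is taken from the commands in this thread.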

These systems are running with DHCP (and have Avahi installed). Is it
possible these are related to the problem? (But again, why would it only
show up when running the 2.6.32 kernel?)



> I can't see any major changes in the forcedeth driver since 2.6.30.


I scanned what changelogs I could find also and nothing jumped out at me 
that could be the cause of this.



> We will shortly update the official kernel packages to incorporate this
> release, so you could just wait a day or two and update.  However I'm not
> aware of any changes in 2.6.32.10 that would fix this sort of bug.


Again, I scanned the changelogs and nothing jumped out at me. I'll try 
the updated package when you release it to see if it makes a difference.


Let me know if there's any further testing I can do before I roll the 
systems back to 2.6.30 and put them back into production.


Thanks,

-stephen

--
Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie
Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway)






Bug#572201: Further queries

2010-03-15 Thread stephen mulcahy

Hi,

Any further thoughts on this?

In the ethtool output, I notice the following

rx_pause: 46798
rx_drop_frame: 46798

I've checked some other machines and I don't see any of either stat - 
possibly because these are specific to some nic drivers? Anyway, is it 
normal for those numbers to be the same?


As I said, I'm not seeing the behaviour with the 2.6.30 kernel - so 
wondering what has changed.


I see Linux 2.6.32.10 was just released, is it worth my while building 
that and seeing if I can reproduce the problem?


-stephen

--
Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie
Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway)






Bug#572201: Further queries

2010-03-15 Thread Ben Hutchings
On Mon, Mar 15, 2010 at 05:20:32PM +, stephen mulcahy wrote:
> Hi,
>
> Any further thoughts on this?
>
> In the ethtool output, I notice the following
>
> rx_pause: 46798
> rx_drop_frame: 46798
>
> I've checked some other machines and I don't see any of either stat -
> possibly because these are specific to some nic drivers?

The statistics available through ethtool are entirely driver-dependent.
There is a small set of standard statistics which are shown in
/proc/net/dev and under /sys/class/net/<name>/statistics/.
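
As a rough illustration of the standard statistics, the sketch below reads every counter file under the per-interface sysfs statistics directory. The function name and the `sysfs_root` parameter are hypothetical, added so the code can be pointed at a test directory; the interface name eth0 in the usage comment is taken from the commands in this thread.

```python
from pathlib import Path


def read_standard_stats(iface, sysfs_root="/sys/class/net"):
    """Return {counter_name: int} read from <sysfs_root>/<iface>/statistics/."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    stats = {}
    for entry in sorted(stats_dir.iterdir()):
        try:
            # Each file is expected to hold a single decimal number.
            stats[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Skip anything unreadable or non-numeric rather than failing.
            continue
    return stats


# Usage on a Linux host (not executed here):
#   for name, value in read_standard_stats("eth0").items():
#       print(name, value)
```

Unlike the `ethtool -S` counters, these names should be the same on every driver, which makes them easier to compare between machines.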

> Anyway, is it normal for those numbers to be the same?

All pause frames should be dropped, either by the hardware or the driver.
So it's not unexpected that these are equal.
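
One way to confirm that the two counters really move together is to take two snapshots of the statistics and compare the deltas. This is only a sketch: the function name is hypothetical, the first snapshot uses the values reported in this thread, and the second snapshot is invented for illustration.

```python
def counter_deltas(before, after):
    """Return {name: after - before} for counters that changed between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# First snapshot: values reported in the thread.
before = {"rx_pause": 46798, "rx_drop_frame": 46798, "rx_crc_errors": 0}
# Second snapshot: made-up values, purely illustrative.
after = {"rx_pause": 46810, "rx_drop_frame": 46810, "rx_crc_errors": 0}

deltas = counter_deltas(before, after)
# Equal deltas are consistent with every received pause frame being counted as dropped.
in_lockstep = deltas.get("rx_pause") == deltas.get("rx_drop_frame")
print(deltas, in_lockstep)
```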

It might be interesting to see what happens if you disable pause frame
handling with this command:

ethtool -A eth0 autoneg off rx off tx off

> As I said, I'm not seeing the behaviour with the 2.6.30 kernel - so
> wondering what has changed.

I can't see any major changes in the forcedeth driver since 2.6.30.

> I see Linux 2.6.32.10 was just released, is it worth my while building
> that and seeing if I can reproduce the problem?

We will shortly update the official kernel packages to incorporate this
release, so you could just wait a day or two and update.  However I'm not
aware of any changes in 2.6.32.10 that would fix this sort of bug.

Ben.

-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein


