
Bug#875621: ditto

2018-11-12 Thread Josip Rodin

Hi,

Without this kernel module shipped, users of the X1 gen6 are forced to
compile it themselves, which is significantly more taxing than just having
to unblacklist the module.

Can't it be shipped, yet added to the default blacklist, until the Yoga X11e
issue is resolved?

(It goes without saying that it's already most annoying to have to fiddle
with anything at all to get basic mouse support on this machine, it's like
it's 1998 all over again and I'm too old for this...)

TIA.

-- 
 2. That which causes joy or happiness.

Bug#719958: traffic control simple token bucket filter within prio broken in wheezy

2013-08-21 Thread Josip Rodin

On Sat, Aug 17, 2013 at 06:30:48PM +0200, Josip Rodin wrote:
  LOCATION OFFSET COUNT
 net_tx_action 0 1
 
 qdisc tbf 20: parent 1:2 rate 2Kbit burst 20Kb lat 4295.0s
  Sent 1235809 bytes 6051 pkt (dropped 182, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

JFTR I worked around this problem by giving up on sch_tbf - I replaced it
with an equivalent simple sch_htb setup (htb qdisc, htb class, sfq qdisc).

-- 
 2. That which causes joy or happiness.


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20130821182927.ga18...@entuzijast.net

Bug#719958: traffic control simple token bucket filter within prio broken in wheezy

2013-08-17 Thread Josip Rodin

Package: linux-image-3.2.0-4-amd64
Version: 3.2.46-1

Hi,

I have a gateway machine, with $iface_Internet == xenbr2 and $iface_intranet
== xenbr0, running these traffic control rules on the outside interface,
which are supposed to be a trivial ToS match and a 20 Mbps limit:

tc qdisc del dev $iface_Internet root || true
tc qdisc add dev $iface_Internet root handle 1: prio
tc qdisc add dev $iface_Internet parent 1:1 handle 10: sfq
tc qdisc add dev $iface_Internet parent 1:2 handle 20: tbf rate 20mbit buffer 20480 limit 16384
tc qdisc add dev $iface_Internet parent 1:3 handle 30: sfq

This worked just fine for about seven years on a machine running
squeeze, and on a fair few distro+kernel versions before that.
I changed the rate from 10mbit to 20mbit on 2012-10-12, and everything kept
working fine.

However, the upgrade to this new kernel appears to have killed it - the tbf
rule is causing outgoing HTTP connections to max out at around 8 Kbps.

When I remove tbf, everything is fine.

I think there's a software problem there - even if these rules were somehow
broken to begin with, this is a poor way of telling me that.

Please fix it. TIA.

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130817080030.ga31...@entuzijast.net

Bug#719958: traffic control simple token bucket filter within prio broken in wheezy

2013-08-17 Thread Josip Rodin

On Sat, Aug 17, 2013 at 12:33:02PM +0200, Josip Rodin wrote:
 On Sat, Aug 17, 2013 at 12:23:21PM +0200, Ben Hutchings wrote:
   tc qdisc add dev $iface_Internet parent 1:2 handle 20: tbf rate 20mbit 
   buffer 20480 limit 16384
   
   However, the upgrade to this new kernel appears to have killed it - the 
   tbf
   rule is causing outgoing HTTP connections to max out at around 8 Kbps.
  [...]
  
  This might be the same as bug #708995.  Does turning off GRO on the
  internal interface (not the bridge but the physical interface) work
  around it?
 
 Yes, it looks like ifenslave -c bond0 eth0  ethtool -K eth0 gro off makes
 TBF precise again, and vice versa.

That's on one machine. But on another wheezy machine with the same setup but
somewhat different hardware, turning off GRO didn't help.

How do I debug this further?

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130817105659.ga32...@entuzijast.net

Bug#719958: traffic control simple token bucket filter within prio broken in wheezy

2013-08-17 Thread Josip Rodin

On Sat, Aug 17, 2013 at 12:23:21PM +0200, Ben Hutchings wrote:
  tc qdisc add dev $iface_Internet parent 1:2 handle 20: tbf rate 20mbit 
  buffer 20480 limit 16384
  
  However, the upgrade to this new kernel appears to have killed it - the tbf
  rule is causing outgoing HTTP connections to max out at around 8 Kbps.
 [...]
 
 This might be the same as bug #708995.  Does turning off GRO on the
 internal interface (not the bridge but the physical interface) work
 around it?

Yes, it looks like ifenslave -c bond0 eth0  ethtool -K eth0 gro off makes
TBF precise again, and vice versa.

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130817103302.ga26...@entuzijast.net

Bug#719958: traffic control simple token bucket filter within prio broken in wheezy

2013-08-17 Thread Josip Rodin

On Sat, Aug 17, 2013 at 02:06:57PM +0200, Ben Hutchings wrote:
 On Sat, 2013-08-17 at 12:56 +0200, Josip Rodin wrote:
  On Sat, Aug 17, 2013 at 12:33:02PM +0200, Josip Rodin wrote:
   On Sat, Aug 17, 2013 at 12:23:21PM +0200, Ben Hutchings wrote:
     tc qdisc add dev $iface_Internet parent 1:2 handle 20: tbf rate 20mbit
     buffer 20480 limit 16384
     
     However, the upgrade to this new kernel appears to have killed it - the
     tbf rule is causing outgoing HTTP connections to max out at around 8 Kbps.
    [...]
    
    This might be the same as bug #708995.  Does turning off GRO on the
    internal interface (not the bridge but the physical interface) work
    around it?
   
   Yes, it looks like ifenslave -c bond0 eth0  ethtool -K eth0 gro off 
   makes
   TBF precise again, and vice versa.
  
  That's on one machine. But on another wheezy machine with the same setup but
  somewhat different hardware, turning off GRO didn't help.
  
  How do I debug this further?
 
 You could try using the perf dropmonitor script as I described on my bug
 report.

Didn't you say that was also broken? :)

 The other machine might also have LRO enabled on the internal interface,
 although this is supposed to be disabled for bridged interfaces.  If the
 other machine is also passing traffic from another VM on the same
 physical host, it might be necessary to disable TSO on the interface
 within the other VM.

There's no distinction here between physical interfaces; I receive traffic
on a bond0 through several VLANs.

On one machine there's eth0 and eth2 behind that bond0, and that's the
one where the workaround works. On the other one, there's only eth0 behind
that bond0 (by accident), and the workaround doesn't make tbf work, oddly
enough. I also tried removing other offload options, but that didn't make a
dent.

The machines have different hardware but identical netfilter and tc rules,
and I shift traffic between them by moving the IP addresses, using
keepalived.

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130817125423.ga20...@entuzijast.net

Bug#719958: traffic control simple token bucket filter within prio broken in wheezy

2013-08-17 Thread Josip Rodin

On Sat, Aug 17, 2013 at 02:58:07PM +0200, Ben Hutchings wrote:
    How do I debug this further?
   
   You could try using the perf dropmonitor script as I described on my bug
   report.
  
  Didn't you say that was also broken? :)
 [...]
 
 It's fixed now.

Hmm. Googling says it was fixed in May, so it doesn't sound like something
that's going to come close to entering 3.2...

So I took the new script and placed it into
/usr/share/perf_3.2-core/scripts/python/net_dropmonitor.py

But I still can't seem to run it:

% perf script net_dropmonitor
invalid or unsupported event: 'skb:kfree_skb'

Help?

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130817133342.ga27...@entuzijast.net

Bug#719958: traffic control simple token bucket filter within prio broken in wheezy

2013-08-17 Thread Josip Rodin

On Sat, Aug 17, 2013 at 04:08:12PM +0200, Ben Hutchings wrote:
 On Sat, 2013-08-17 at 15:33 +0200, Josip Rodin wrote:
  On Sat, Aug 17, 2013 at 02:58:07PM +0200, Ben Hutchings wrote:
      How do I debug this further?
     
     You could try using the perf dropmonitor script as I described on my
     bug report.
    
    Didn't you say that was also broken? :)
   [...]
   
   It's fixed now.
  
  Hmm. Googling says it was fixed in May, so it doesn't sound like something
  that's going to come close to entering 3.2...
  
  So I took the new script and placed into
  /usr/share/perf_3.2-core/scripts/python/net_dropmonitor.py
  
  But I still can't seem to run it:
  
  % perf script net_dropmonitor
  invalid or unsupported event: 'skb:kfree_skb'
  
  Help?
 
 Try running it as root...

Well, that was stupid. Anyway, my test file transfer that drags along like
this:

Length: 2586317 (2,5M) [application/octet-stream]
Saving to: /dev/null

 8% [==] 207.064 13,7K/s  eta 2m 18s  
^C

Results in this:

Starting trace (Ctrl-C to dump results)
^CGathering kallsyms data
 LOCATION OFFSET COUNT
net_tx_action 0 1

At the same time, the tc output changes from:

qdisc prio 1: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 948738 bytes 5358 pkt (dropped 117, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 10: parent 1:1 limit 127p quantum 1514b divisor 1024
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc tbf 20: parent 1:2 rate 2Kbit burst 20Kb lat 4295.0s
 Sent 948738 bytes 5358 pkt (dropped 117, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 30: parent 1:3 limit 127p quantum 1514b divisor 1024
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

to this:

qdisc prio 1: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1235809 bytes 6051 pkt (dropped 182, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 10: parent 1:1 limit 127p quantum 1514b divisor 1024
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc tbf 20: parent 1:2 rate 2Kbit burst 20Kb lat 4295.0s
 Sent 1235809 bytes 6051 pkt (dropped 182, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc sfq 30: parent 1:3 limit 127p quantum 1514b divisor 1024
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
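The difference between the two readouts is easier to see as a delta. The following is only a sketch that parses the "Sent ..." counter line as it is printed above and subtracts the first tbf readout from the second; the function names are made up for illustration.

```python
import re

# Counter line as printed by `tc -s qdisc show` in the readouts above.
STATS_RE = re.compile(
    r"Sent (\d+) bytes (\d+) pkt "
    r"\(dropped (\d+), overlimits (\d+) requeues (\d+)\)"
)
KEYS = ("bytes", "packets", "dropped", "overlimits", "requeues")


def parse_stats(line):
    """Return the counters of one 'Sent ...' line as a dict of ints."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError("not a tc statistics line: %r" % line)
    return dict(zip(KEYS, map(int, match.groups())))


def delta(before, after):
    """Per-counter difference between two parsed readouts."""
    return {key: after[key] - before[key] for key in before}


# The two "qdisc tbf 20:" counter lines quoted above.
BEFORE = " Sent 948738 bytes 5358 pkt (dropped 117, overlimits 0 requeues 0)"
AFTER = " Sent 1235809 bytes 6051 pkt (dropped 182, overlimits 0 requeues 0)"

if __name__ == "__main__":
    print(delta(parse_stats(BEFORE), parse_stats(AFTER)))
```

For these two lines the delta is 287071 bytes in 693 packets, with 65 more drops and the overlimits counter still at 0.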

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130817163048.ga23...@entuzijast.net

Bug#700755: huge slab_unreclaimable in Xen domU

2013-02-20 Thread Josip Rodin

On Wed, Feb 20, 2013 at 10:27:02AM +, Ian Campbell wrote:
 On Sun, 2013-02-17 at 00:22 +0100, Josip Rodin wrote:
  Package: linux-image-2.6.32-5-xen-amd64
 
 This is in a guest, right? Is it possible to try the non-Xen amd64
 flavour? I forget the exact status in Squeeze but IIRC most of the domU
 functionality is present in the -amd64 flavour with the -xen-amd64
 flavour only being required for dom0 and some of the more advanced domU
 features.
 
 The reason I ask this is that the non-xen flavour is closer to mainline
 and therefore should be easier to track down the issue with.
 
 If you are also able separately to try this with the Wheezy kernel that
 would be very useful too.

OK, I can install both (it's got PV-GRUB), which do you prefer to test first?
I'm asking because it'll likely take a few weeks for the bug to appear,
judging by what it did before.

  The thing I noticed was the slab_unreclaimable explosion, by a factor
  of 122. That... doesn't sound like something that should be happening.
 
 Indeed. Is the system responsive enough to login and
 examine /proc/slabinfo? There is probably one which has exploded in
 size, it may even be sufficient to observe this over time and see if one
 seems to be slowly creeping upwards towards $doom.
 
  I'm going to try to run slabtop the next time I catch it in this state,
  in order to try to glean some more information.
 
 That would be great.

I did post two consecutive slabtop results... I thought they had all the
relevant info from /proc/slabinfo.

The two large elements that grew both in the total number of objects and
the active number were (extracted from my previous message):

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
first readout:
 65419  65419 100%    4.00K  14179        8    453728K kmalloc-4096
 65390  65390 100%    2.06K  13338       15    426816K net_namespace
second readout:
 65428  65428 100%    4.00K  14181        8    453792K kmalloc-4096
 65391  65391 100%    2.06K  13339       15    426848K net_namespace
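To put numbers on the growth between the two readouts, here is a rough sketch. It takes the SLABS / OBJ/SLAB / CACHE SIZE values as 14179 / 8 / 453728K for kmalloc-4096 and 13338 / 15 / 426816K for net_namespace (and likewise for the second readout), a reading that is at least consistent with 32K per slab in every row. The `Row` type and function names are hypothetical.

```python
from collections import namedtuple

Row = namedtuple("Row", "objs active slabs objs_per_slab cache_kib name")

# Values transcribed from the two slabtop readouts above.
FIRST = [
    Row(65419, 65419, 14179, 8, 453728, "kmalloc-4096"),
    Row(65390, 65390, 13338, 15, 426816, "net_namespace"),
]
SECOND = [
    Row(65428, 65428, 14181, 8, 453792, "kmalloc-4096"),
    Row(65391, 65391, 13339, 15, 426848, "net_namespace"),
]


def kib_per_slab(row):
    """Cache size divided by slab count, as a sanity check on the columns."""
    return row.cache_kib / row.slabs


def growth(first, second):
    """Map cache name to (object, slab, KiB) growth between two readouts."""
    old = {row.name: row for row in first}
    return {
        row.name: (
            row.objs - old[row.name].objs,
            row.slabs - old[row.name].slabs,
            row.cache_kib - old[row.name].cache_kib,
        )
        for row in second
    }


if __name__ == "__main__":
    for name, (objs, slabs, kib) in growth(FIRST, SECOND).items():
        print("%-14s +%d objects, +%d slabs, +%dK" % (name, objs, slabs, kib))
```

With these values kmalloc-4096 grew by 9 objects (2 slabs, 64K) and net_namespace by 1 object (1 slab, 32K) between the readouts.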

How do I trace which process is calling this?

In comparison, now, under seemingly normal circumstances, slabtop looks like
this on that machine:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 56124  25272  45%    0.11K   1559       36      6236K buffer_head
 24843  12898  51%    0.19K   1183       21      4732K dentry
 23100  16107  69%    1.01K   1540       15     24640K nfs_inode_cache
 11456   6403  55%    0.06K    179       64       716K kmalloc-64
 10208   8864  86%    0.12K    319       32      1276K kmalloc-128
  7308   5275  72%    0.55K    522       14      4176K radix_tree_node
  4947   4940  99%    0.08K     97       51       388K sysfs_dir_cache
  3584   3573  99%    0.01K      7      512        28K kmalloc-8
  3200   2016  63%    0.79K    160       20      2560K ext3_inode_cache
  2068   1981  95%    0.18K     94       22       376K vm_area_struct
  1792   1790  99%    0.02K      7      256        28K kmalloc-16
  1692   1631  96%    0.63K    141       12      1128K proc_inode_cache
  1632   1588  97%    1.00K    102       16      1632K kmalloc-1024
  1472   1442  97%    0.25K     92       16       368K kmalloc-256
  1428   1129  79%    0.19K     68       21       272K kmalloc-192
  1296   1284  99%    4.00K    162        8      5184K kmalloc-4096
  1275   1270  99%    2.06K     85       15      2720K net_namespace
[...]

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130220111856.ga20...@entuzijast.net

Bug#700755: huge slab_unreclaimable in Xen domU

2013-02-20 Thread Josip Rodin

On Wed, Feb 20, 2013 at 11:35:44AM +, Ian Campbell wrote:
  OK, I can install both (it's got PV-GRUB), which do you prefer to test 
  first?
  I'm asking because it'll likely take a few weeks for the bug to appear,
  judging by what it did before.
 
 Probably at this stage I would be more interested in making sure Wheezy
 was going to be OK first.

ACK

 I'm not sure. The net_namespace one should be easy enough to track in
 the code since:
 net_cachep = kmem_cache_create("net_namespace", sizeof(struct net),
 and therefore users of net_cachep must be responsible, I'd expect there
 to be not all that many of those. Are you actually using network
 namespaces in the guest?

If I am, I'm not doing it intentionally :) I'd assume this was part of the
LXC functionality, but as Ben noticed before, there's code in vsftpd that
triggers their use...?

 The Debian kernels have SLUB:
 /boot/config-2.6.32-5-xen-amd64:CONFIG_SLUB_DEBUG=y
 /boot/config-2.6.32-5-xen-amd64:CONFIG_SLUB=y
 (same as native). Documentation/vm/slub.txt has some info on adding
 debugging stuff there, e.g. adding slub_debug to the command line. It
 doesn't look like rebuilding with the other two option would initially
 be useful (the first is equivalent to the command line option anyway)

I'm wary of enabling slub_debug by default; the document says it's okay to
enable individual items on runtime:

% ls -l /sys/kernel/slab/net_namespace/ | grep rw
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 min_partial
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 order
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 poison
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 reclaim_account
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 red_zone
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 remote_node_defrag_ratio
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 sanity_checks
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 shrink
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 store_user
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 trace
-rw-r--r-- 1 root root 4096 2013-02-20 15:41 validate

From the documentation, I probably want:
U   User tracking (free and alloc)

But which of the above files corresponds to that, 'store_user' or?
I'll have to go look at the source.
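The `ls -l ... | grep rw` step above can also be scripted. This is only a sketch: it lists the regular files in a slab cache's sysfs directory whose owner-write bit is set, and the helper name is made up. As far as I can tell from mainline SLUB, `store_user` reflects the SLAB_STORE_USER flag that the `U` option sets, but that is worth checking against the source of the kernel actually running.

```python
import os
import stat


def writable_attrs(cache_dir):
    """Names of regular files in cache_dir whose owner-write bit is set."""
    names = []
    for name in sorted(os.listdir(cache_dir)):
        mode = os.stat(os.path.join(cache_dir, name)).st_mode
        if stat.S_ISREG(mode) and mode & stat.S_IWUSR:
            names.append(name)
    return names


if __name__ == "__main__":
    # Path taken from the listing above; it only exists on SLUB kernels.
    path = "/sys/kernel/slab/net_namespace"
    if os.path.isdir(path):
        print("\n".join(writable_attrs(path)))
    else:
        print("no such directory: %s" % path)
```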

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130220144349.ga15...@entuzijast.net

Bug#519586: Huge Slab Unreclaimable and continually growing

2013-02-16 Thread Josip Rodin

On Sat, Feb 16, 2013 at 03:13:06AM +, Ben Hutchings wrote:
 On Fri, 2013-02-15 at 08:56 +0100, Josip Rodin wrote:
   I appear to be experiencing a serious problem with a 768 MB RAM Xen domU
   machine running an NFS client - every now and then (for months now), often
   in the middle of the night, it enters some kind of a broken state where a
   few semi-random processes (mainly apache2's and vsftpd's which are told to
   serve files from the NFS mount)
 [...]
  I caught it earlier just now, at:
  
  [950084.590733] active_anon:2805 inactive_anon:11835 isolated_anon:0
  [950084.590735]  active_file:76 inactive_file:516 isolated_file:32
  [950084.590737]  unevictable:783 dirty:1 writeback:0 unstable:0
  [950084.590739]  free:26251 slab_reclaimable:15733 slab_unreclaimable:128868
  [950084.590741]  mapped:938 shmem:75 pagetables:651 bounce:0
  
  And snuck in a few slabtops (even some -o invocations were getting killed,
  along with my shell and pretty much everything else):
 [...]
   65390  65390 100%    2.06K  13338       15    426816K net_namespace
 [...]
 
 Looks like CVE-2011-2189, for which there was a fix/workaround in:
 
 vsftpd (2.3.2-3+squeeze2) stable-security; urgency=high
 
* Non-maintainer upload by the Security Team.
* Disable network isolation due to a problem with cleaning up network
  namespaces fast enough in kernels < 2.6.35 (CVE-2011-2189).
  Thanks Ben Hutchings for the patch!
* Fix possible DoS via globa expressions in STAT commands by
  limiting the matching loop (CVE-2011-0762; Closes: #622741).
 
  -- Nico Golde n...@debian.org  Wed, 07 Sep 2011 20:39:59 +
 
 Do you have an old version of vsftpd, or perhaps an upstream version
 which doesn't include the workaround?

No, 2.3.2-3+squeeze2 is there, has been since 2012-03-22.

 Anyway, I'm closing the bug report; please don't hijack closed bugs.

Eh? It was not closed for being fixed, it was closed en masse on a
procedural reason that could easily be wrong, and I don't believe I was
hijacking it; you just confirmed that this is a kernel problem above,
so how could this possibly be improper?!

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20130216213705.ga13...@entuzijast.net

Bug#519586: Huge Slab Unreclaimable and continually growing

2013-02-14 Thread Josip Rodin

On Tue, Jan 22, 2013 at 10:59:17AM +0100, Josip Rodin wrote:
 I appear to be experiencing a serious problem with a 768 MB RAM Xen domU
 machine running an NFS client - every now and then (for months now), often
 in the middle of the night, it enters some kind of a broken state where a
 few semi-random processes (mainly apache2's and vsftpd's which are told to
 serve files from the NFS mount) start battling it out for the memory, and
 everything including sshd starts invoking the OOM killer, over and over
 again. Nothing seems to halt the downward spiral; manual invocation of the
 OOM killer does nothing of any use. Terminating all processes is the only
 thing that makes it go quiet, but then that's effectively the same as a
 reboot.
 
 This is the SysRq+M output on the machine once it's been in the broken state
 for a while:
[...]
 active_anon:394 inactive_anon:3197 isolated_anon:0
  active_file:25 inactive_file:176 isolated_file:32
  unevictable:2659 dirty:1 writeback:0 unstable:0
  free:21456 slab_reclaimable:16177 slab_unreclaimable:143165
  mapped:677 shmem:76 pagetables:455 bounce:0
[...]
 The thing I noticed was the slab_unreclaimable explosion, by a factor
 of 122. That... doesn't sound like something that should be happening.
 
 Googling for slab_unreclaimable found me this old bug report about
 slab_unreclaimable domU problems that was mass-closed with the switch to the
 new paravirtops Xen release. Granted, our use case is not Samba like with
 the original reporter, but the pattern of a file server was close enough for
 me to be uncomfortable with it :|

I caught it earlier just now, at:

[950084.590733] active_anon:2805 inactive_anon:11835 isolated_anon:0
[950084.590735]  active_file:76 inactive_file:516 isolated_file:32
[950084.590737]  unevictable:783 dirty:1 writeback:0 unstable:0
[950084.590739]  free:26251 slab_reclaimable:15733 slab_unreclaimable:128868
[950084.590741]  mapped:938 shmem:75 pagetables:651 bounce:0
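For scale, the slab_unreclaimable counts are in pages. A quick conversion, assuming 4 KiB pages (the usual size on amd64; this is an assumption, not something the dumps state) and the 768 MB of RAM mentioned above:

```python
PAGE_SIZE_KIB = 4   # assumption: 4 KiB pages, the usual size on amd64
TOTAL_RAM_MIB = 768  # RAM of the domU as stated in the report

# slab_unreclaimable page counts from the two dumps quoted above.
READOUTS = {
    "earlier SysRq+M dump": 143165,
    "this dump": 128868,
}


def pages_to_mib(pages, page_kib=PAGE_SIZE_KIB):
    """Convert a page count to MiB."""
    return pages * page_kib / 1024.0


if __name__ == "__main__":
    for label, pages in READOUTS.items():
        mib = pages_to_mib(pages)
        share = 100.0 * mib / TOTAL_RAM_MIB
        print("%-22s %7d pages = %.1f MiB (%.0f%% of RAM)"
              % (label, pages, mib, share))
```

Under that assumption the two readouts come to roughly 559 MiB and 503 MiB of unreclaimable slab.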

And snuck in a few slabtops (even some -o invocations were getting killed,
along with my shell and pretty much everything else):

 Active / Total Objects (% used): 555753 / 587128 (94.7%)
 Active / Total Slabs (% used)  : 49430 / 49430 (100.0%)
 Active / Total Caches (% used) : 65 / 76 (85.5%)
 Active / Total Size (% used)   : 546613.78K / 553025.01K (98.8%)
 Minimum / Average / Maximum Object : 0.01K / 0.94K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 90993  66836  73%    0.19K   4333       21     17332K dentry
 75840  73664  97%    0.12K   2370       32      9480K kmalloc-128
 68096  68092  99%    0.01K    133      512       532K kmalloc-8
 65888  65655  99%    0.25K   4118       16     16472K kmalloc-256
 65820  65778  99%    1.00K   4767       16     76272K kmalloc-1024
 65436  65414  99%    0.63K   5454       12     43632K proc_inode_cache
 65419  65419 100%    4.00K  14179        8    453728K kmalloc-4096
 65390  65390 100%    2.06K  13338       15    426816K net_namespace
  4998   4990  99%    0.08K     98       51       392K sysfs_dir_cache
  4224   2018  47%    0.06K     66       64       264K kmalloc-64
  2288   2107  92%    0.18K    104       22       416K vm_area_struct
  1792   1789  99%    0.02K      7      256        28K kmalloc-16
  1470   1203  81%    0.19K     70       21       280K kmalloc-192
  1300    402  30%    0.79K     65       20      1040K ext3_inode_cache
   896    731  81%    0.03K      7      128        28K anon_vma
   784    532  67%    0.55K     56       14       448K radix_tree_node

A bit later:

 Active / Total Objects (% used): 555403 / 586704 (94.7%)
 Active / Total Slabs (% used)  : 49394 / 49394 (100.0%)
 Active / Total Caches (% used) : 65 / 76 (85.5%)
 Active / Total Size (% used)   : 546552.82K / 552827.43K (98.9%)
 Minimum / Average / Maximum Object : 0.01K / 0.94K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 90993  66779  73%    0.19K   4333       21     17332K dentry
 75840  73654  97%    0.12K   2370       32      9480K kmalloc-128
 68096  68092  99%    0.01K    133      512       532K kmalloc-8
 65888  65601  99%    0.25K   4118       16     16472K kmalloc-256
 65852  65741  99%    1.00K   4760       16     76160K kmalloc-1024
 65436  65409  99%    0.63K   5454       12     43632K proc_inode_cache
 65428  65428 100%    4.00K  14181        8    453792K kmalloc-4096
 65391  65391 100%    2.06K  13339       15    426848K net_namespace
  4998   4986  99%    0.08K     98       51       392K sysfs_dir_cache
  4224   2017  47%    0.06K     66       64       264K kmalloc-64
  2134   2108  98%    0.18K     97       22       388K vm_area_struct
  1792   1789  99%    0.02K      7      256        28K kmalloc-16
  1449   1078  74%    0.19K     69       21       276K kmalloc-192
  1100    376  34%    0.79K     55       20       880K ext3_inode_cache
   896    639  71%    0.03K      7      128        28K anon_vma
   714    554  77%    0.55K     51       14

Bug#685360: [PATCH 1/1] HID: Fix missing Unifying device issue

2012-09-27 Thread Josip Rodin

On Mon, Sep 24, 2012 at 11:30:28AM +0200, Nestor Lopez Casado wrote:
 Josip, this is a different issue from the one addressed with the patch.
 
 1) Can you try it on a 3.2 kernel ?

I can try that too, I'll let you know how it went.

(Unfortunately the machine is in the same room with a crib, so I don't
get a lot of time slots for testing. *shrug* :)

 2) The problem you describe, does it happen all the time ?

Yes. The keyboard simply stopped working after I upgraded to Linux 3.2.
It works fine under 3.1 and earlier, and under Windows.

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20120927070426.ga27...@entuzijast.net

Bug#685360: [PATCH 1/1] HID: Fix missing Unifying device issue

2012-09-27 Thread Josip Rodin

On Thu, Sep 27, 2012 at 09:04:26AM +0200, Josip Rodin wrote:
 On Mon, Sep 24, 2012 at 11:30:28AM +0200, Nestor Lopez Casado wrote:
  Josip, this is a different issue from the one addressed with the patch.
  
  1) Can you try it on a 3.2 kernel ?
 
 I can try that too, I'll let you know how it went.

Same thing, it doesn't work.

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20120927185628.ga30...@entuzijast.net

Bug#685360: [PATCH 1/1] HID: Fix missing Unifying device issue

2012-09-23 Thread Josip Rodin

On Fri, Sep 21, 2012 at 12:21:34PM +0200, Nestor Lopez Casado wrote:
 This patch fixes an issue introduced after commit 4ea5454203d991ec
 
 After that commit, hid-core silently discards any incoming packet
 that arrives while any hid driver's probe function is being executed.

I managed to test this now, on top of Linux 3.5, but it didn't fix my
keyboard. I still get the same sequence of messages with hid.debug=1:

+usb 5-2: new full-speed USB device number 3 using ohci_hcd
+drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 0
+drivers/hid/hid-logitech-dj.c: logi_dj_probe called for ifnum 0
+drivers/hid/hid-logitech-dj.c: logi_dj_probe: ignoring ifnum 0
+drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 1
+drivers/hid/hid-logitech-dj.c: logi_dj_probe called for ifnum 1
+drivers/hid/hid-logitech-dj.c: logi_dj_probe: ignoring ifnum 1
+drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 2
+drivers/hid/hid-logitech-dj.c: logi_dj_probe called for ifnum 2
+drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0110 wIndex=0x0002 wLength=7
+drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0111 wIndex=0x0002 wLength=20
+drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0120 wIndex=0x0002 wLength=15
+drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0121 wIndex=0x0002 wLength=32
+logitech-djreceiver 0003:046D:C52B.0005: claimed by neither input, hiddev nor hidraw
+logitech-djreceiver 0003:046D:C52B.0005: logi_dj_probe:hid_hw_start returned error

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20120923102318.ga12...@entuzijast.net

Bug#685360: Logitech USB keyboard broken with Linux 3.2 (regression from 3.1)

2012-09-17 Thread Josip Rodin

On Mon, Sep 17, 2012 at 12:57:06PM +0200, Jiri Kosina wrote:
 On Wed, 12 Sep 2012, Nestor Lopez Casado wrote:
 
  Take a look at this thread ... where a patch was published ...
  
  https://bugs.launchpad.net/ubuntu/+bug/958174
  
  Your issue may come from the same problem.
  
  I will get back to you next week. I am OOO until monday.
 
 So, what is the progress here, please? Nestor, Josip?

There is no progress because for some reason I did not receive Nestor's
previous e-mail; I'll try to test it this week.

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20120917110008.ga16...@entuzijast.net

Bug#685360: Acknowledgement (AMD SB 750 + Logitech USB keyboard broken and system unbootable with Linux 3.2 (regression from 2.6.38))

2012-09-11 Thread Josip Rodin

Control: retitle -1 AMD SB 750 + Logitech USB keyboard brokenness with Linux 3.2 (regression from 3.1)

On Mon, Aug 20, 2012 at 06:26:42PM +0200, Josip Rodin wrote:
 I'll try to bisect this now with my config.

It looks like it's definitely in some way related to the introduction of
CONFIG_HID_LOGITECH_DJ in 3.2+, because 3.1.0 works fine...

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20120911224629.ga32...@entuzijast.net

Bug#685360: Acknowledgement (AMD SB 750 + Logitech USB keyboard broken and system unbootable with Linux 3.2 (regression from 2.6.38))

2012-09-11 Thread Josip Rodin

On Wed, Sep 12, 2012 at 12:46:29AM +0200, Josip Rodin wrote:
 On Mon, Aug 20, 2012 at 06:26:42PM +0200, Josip Rodin wrote:
  I'll try to bisect this now with my config.
 
 It looks like it's definitely in some way related with the introduction of
 CONFIG_HID_LOGITECH_DJ in 3.2+, because 3.1.0 works fine...

The dmesg difference between 3.1.0 (working) and 3.2.0 (broken) is a bit
confusing - on one USB port, there's no change, but on the other the new
module reports a failure (this output is with hid.debug=1 and is a bit fuzzy
because of random harmless changes like spelling fixes or device indices
5 vs 3):

-usb 2-5: new high speed USB device number 2 using ehci_hcd
-usb 5-1: new low speed USB device number 2 using ohci_hcd
+usb 2-5: new high-speed USB device number 2 using ehci_hcd
+usb 3-1: new low-speed USB device number 2 using ohci_hcd
 drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 0
-input: Logitech USB Receiver as /devices/pci:00/:00:12.0/usb5/5-1/5-1:1.0/input/input3
+input: Logitech USB Receiver as /devices/pci:00/:00:12.0/usb3/3-1/3-1:1.0/input/input3
 generic-usb 0003:046D:C51B.0001: input: USB HID v1.11 Mouse [Logitech USB Receiver] on usb-:00:12.0-1/input
 drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 1
 drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0110 wIndex=0x0001 wLength=7
 generic-usb 0003:046D:C51B.0002: claimed by neither input, hiddev nor hidraw
-usb 5-2: new full speed USB device number 3 using ohci_hcd
+usb 3-2: new full-speed USB device number 3 using ohci_hcd
 drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 0
-drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Set_Report wValue=0x0200 wIndex=0x wLength=1
-input: Logitech USB Receiver as /devices/pci:00/:00:12.0/usb5/5-2/5-2:1.0/input/input4
-generic-usb 0003:046D:C52B.0003: input: USB HID v1.11 Keyboard [Logitech USB Receiver] on usb-:00:12.0-2/in
+drivers/hid/hid-logitech-dj.c: logi_dj_probe called for ifnum 0
+drivers/hid/hid-logitech-dj.c: logi_dj_probe: ignoring ifnum 0
 drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 1
-input: Logitech USB Receiver as /devices/pci:00/:00:12.0/usb5/5-2/5-2:1.1/input/input5
-generic-usb 0003:046D:C52B.0004: input: USB HID v1.11 Mouse [Logitech USB Receiver] on usb-:00:12.0-2/input
+drivers/hid/hid-logitech-dj.c: logi_dj_probe called for ifnum 1
+drivers/hid/hid-logitech-dj.c: logi_dj_probe: ignoring ifnum 1
 drivers/hid/usbhid/hid-core.c: HID probe called for ifnum 2
+drivers/hid/hid-logitech-dj.c: logi_dj_probe called for ifnum 2
 drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0110 wIndex=0x0002 wLength=7
 drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0111 wIndex=0x0002 wLength=20
 drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0120 wIndex=0x0002 wLength=15
 drivers/hid/usbhid/hid-core.c: submitting ctrl urb: Get_Report wValue=0x0121 wIndex=0x0002 wLength=32
-generic-usb 0003:046D:C52B.0005: claimed by neither input, hiddev nor hidraw
-usb 2-5: reset high speed USB device number 2 using ehci_hcd
+logitech-djreceiver 0003:046D:C52B.0005: claimed by neither input, hiddev nor hidraw
+logitech-djreceiver 0003:046D:C52B.0005: logi_dj_probe:hid_hw_start returned error:-19

Now where did the devices 0003:046D:C52B.000[34] go with the new kernel?
Are they the ones that logi_dj_probe sees as 0 and 1?

The code says:

/* Ignore interfaces 0 and 1, they will not carry any data, dont create
 * any hid_device for them */
if (intf->cur_altsetting->desc.bInterfaceNumber !=
    LOGITECH_DJ_INTERFACE_NUMBER) {
	dbg_hid("%s: ignoring ifnum %d\n", __func__,
		intf->cur_altsetting->desc.bInterfaceNumber);
	return -ENODEV;
}

Well, that probably explains it. But why does it do that?
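
To spell out what that check means for the three interfaces of this receiver, here is a minimal Python model of the decision. It is only a sketch: the function name is made up for illustration, and the value 2 for LOGITECH_DJ_INTERFACE_NUMBER is an assumption inferred from the log above, where ifnum 0 and 1 are ignored and ifnum 2 is not.

```python
import errno

# Assumption: inferred from the log (ifnum 0 and 1 ignored, ifnum 2 probed).
LOGITECH_DJ_INTERFACE_NUMBER = 2


def logi_dj_probe_result(ifnum):
    """Model the early-exit check: any interface other than the DJ one
    is rejected with -ENODEV, so no hid_device is created for it."""
    if ifnum != LOGITECH_DJ_INTERFACE_NUMBER:
        return -errno.ENODEV
    return 0  # the probe continues; later steps are not modelled here


for ifnum in range(3):
    print(ifnum, logi_dj_probe_result(ifnum))
```

The sketch covers the interface filter only. In the log above the probe for ifnum 2 also fails later on, in hid_hw_start, and that part is not modelled.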

lsusb -v says the following about the hardware:

Bus 005 Device 003: ID 046d:c52b Logitech, Inc. Unifying Receiver
Device Descriptor:
  bLength18
  bDescriptorType 1
  bcdUSB   2.00
  bDeviceClass0 (Defined at Interface level)
  bDeviceSubClass 0 
  bDeviceProtocol 0 
  bMaxPacketSize0 8
  idVendor   0x046d Logitech, Inc.
  idProduct  0xc52b Unifying Receiver
  bcdDevice   12.01
  iManufacturer   1 Logitech
  iProduct2 USB Receiver
  iSerial 0 
  bNumConfigurations  1
  Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength   84
bNumInterfaces  3
bConfigurationValue 1
iConfiguration  4 RQR12.01_B0019
bmAttributes 0xa0
  (Bus Powered)
  Remote Wakeup
MaxPower   98mA
Interface Descriptor:
  bLength 9

Bug#666386: more info

2012-09-03 Thread Josip Rodin

On Sun, Sep 02, 2012 at 02:42:38PM +0100, Ben Hutchings wrote:
 On Sat, 2012-09-01 at 10:45 +0200, Bastian Blank wrote:
  On Fri, Aug 31, 2012 at 05:52:03PM +0200, Josip Rodin wrote:
   auto vlan2
   iface vlan2 inet manual
 vlan-raw-device xenbr0
  
  Is vlan-over-bridge documented to be supported?
 
 If it was not supported then bridge devices would have
 NETIF_F_VLAN_CHALLENGED and you would not be able to create VLAN devices
 on top of them.  But I don't expect this to work *well* at present.
 
 Rebooting, however... something is very wrong here.
 
  Usually I would use:
  | iface xenbr2 inet static
  |   bridge-ports bond0.2
  
   But as soon as I generate any traffic to or from 192.168.54.0/24 and that
   virtual machine (notice - not the right VLAN), the whole system instantly
   reboots, with no messages in syslog.
  
  Does it work without bond?
  
  I would switch to openvswitch. It documents bond/vlan setups, so they
  most likely work. (I don't use bond yet, but the rest works pretty
  flawless, however I have to submit the openvswitch support.)
 
 Definitely worth trying.

Ah, good catch, I do actually seem to want to use the underlying device for
vlan, rather than the bridge device. All my other setups are like that.

It shouldn't crash anyway...

JFTR the other thing I noticed, before I read the mail, was that the remote
machine was actually visible on that L2 segment through ARP, but the outside
world can't ping it - it's as if xen-netback/front don't let that through,
for no apparent reason. And then, when I initiate traffic towards the
outside world from the machine, it all goes poof.

-- 
 2. That which causes joy or happiness.



Bug#666386: more info

2012-09-03 Thread Josip Rodin

On Mon, Sep 03, 2012 at 06:27:12PM +0200, Josip Rodin wrote:
 On Sun, Sep 02, 2012 at 02:42:38PM +0100, Ben Hutchings wrote:
  On Sat, 2012-09-01 at 10:45 +0200, Bastian Blank wrote:
   On Fri, Aug 31, 2012 at 05:52:03PM +0200, Josip Rodin wrote:
auto vlan2
iface vlan2 inet manual
  vlan-raw-device xenbr0
   
   Is vlan-over-bridge documented to be supported?
  
  If it was not supported then bridge devices would have
  NETIF_F_VLAN_CHALLENGED and you would not be able to create VLAN devices
  on top of them.  But I don't expect this to work *well* at present.
  
  Rebooting, however... something is very wrong here.
  
   Usually I would use:
   | iface xenbr2 inet static
   |   bridge-ports bond0.2
   
But as soon as I generate any traffic to or from 192.168.54.0/24 and 
that
virtual machine (notice - not the right VLAN), the whole system 
instantly
reboots, with no messages in syslog.
   
   Does it work without bond?
 
 Ah, good catch, I do actually seem to want to use the underlying device for
 vlan, rather than the bridge device. All my other setups are like that.

Confirming it works with fixed vlan-raw-device, pointed to eth2. I should
test with bond now, that was probably it...

-- 
 2. That which causes joy or happiness.



Bug#666386: more info

2012-08-31 Thread Josip Rodin

Hi,

I had removed the igb-based eth0 from the bonding interface, and the machine
was running fine with it, but when the time had come to get some Xen domUs
running on it, it failed miserably on me once again.

The updated setup is:

auto bond0
iface bond0 inet manual
  slaves eth2
  bond_mode active-backup
  bond_miimon 100
auto xenbr0
iface xenbr0 inet static
  bridge-ports bond0
  bridge-fd 0
  address 192.168.54.2
  netmask 255.255.255.0
auto vlan2
iface vlan2 inet manual
  vlan-raw-device xenbr0
auto xenbr2
iface xenbr2 inet static
  bridge-ports vlan2
  bridge-fd 0
  address 213.202.97.156
  netmask 255.255.255.240
  gateway 213.202.97.145

And the virtual machine has simply this:

vif = [
'mac=00:16:3e:7a:32:9b, bridge=xenbr2',
]

But as soon as I generate any traffic to or from 192.168.54.0/24 and that
virtual machine (notice - not the right VLAN), the whole system instantly
reboots, with no messages in syslog.

I should probably use the hypervisor's noreboot option, but I don't have
a connection to its IPMI out-of-band access controller, and I'm off-site,
so I'm SOL.

This is with linux-image-3.2.0-0.bpo.2-amd64 and with latest .bpo.3.

I'm going to try fiddling with ethtool -K eth2 gro/lro off, but with the
reboots taking 3min on this hardware, this is most annoying...

-- 
 2. That which causes joy or happiness.



Bug#666386: igb + bnx2 + ifenslave + brctl + vconfig = largely broken

2012-04-11 Thread Josip Rodin

On Sat, Apr 07, 2012 at 04:29:38AM +0100, Ben Hutchings wrote:
 I would like to take this upstream now, but first I need to check
 whether it has already been fixed after 2.6.32.  Please can you test the
 current kernel package from testing, unstable or squeeze-backports
 (linux-image-3.2.0-2-amd64 or linux-image-3.2.0-0.bpo.2-amd64)?

I installed linux-image-3.2.0-0.bpo.2-amd64, plus the upgraded linux-base
and initramfs-tools, plus the indicated firmware-bnx2 upgrade -- and then
rebooted into that kernel, but the machine wouldn't respond to ping over
the xenbr2 interface (the one with the default gateway).

I logged into it fine through the xenbr54 interface, and tried to ping the
default gateway, and it didn't work. This was with the workaround - only
bnx2/eth2 in the bonding interface. Then I removed the default gateway
and added it back just to see if it'll work, and then it started pinging.
Weird.

After that, I tried to reproduce this bug, but failed; it looks like the bug
is fixed there. I noticed a significant lag with some of those bonding
--detach/--change-active actions, but after a few seconds everything
continued to work fine.

-- 
 2. That which causes joy or happiness.




Bug#666386: igb + bnx2 + ifenslave + brctl + vconfig = largely broken

2012-04-04 Thread Josip Rodin

On Mon, Apr 02, 2012 at 05:22:37AM +0100, Ben Hutchings wrote:
 On Sun, 2012-04-01 at 12:40 +0200, Josip Rodin wrote:
  On Sun, Apr 01, 2012 at 03:09:56AM +0100, Ben Hutchings wrote:
   I bet this is due to the combination of LRO plus bridging.  We try to
   turn off LRO in devices under a bridge, but that won't work if there's
   an intermediate bonding device.
   
   If you run:
   
   # ethtool -K eth0 lro off
   # ethtool -K eth2 lro off
   
   does the bridge start working?
  
  Err...
  
  % sudo ethtool -K eth0 lro off
  Cannot set large receive offload settings: Operation not supported
  % sudo ethtool -K eth2 lro off
  Cannot set large receive offload settings: Operation not supported
 
 Hmm.  Well it shouldn't be a problem but you could try also turning off
 GRO (similar commands).

Ah, there we go. Once I ran "sudo ethtool -K eth0 gro off",
"sudo ifenslave bond54 eth0" produced a still-working bond54.
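
A minimal Python sketch of that workaround order, for reference: turn GRO off on each slave first, then enslave it. It only builds and prints the command lines and does not run them. The helper name is made up for illustration, and applying the same steps to eth2 is an assumption; only eth0 was tested above.

```python
def gro_off_commands(bond, slaves):
    """Return the commands, in order, that disable GRO on each slave
    and then add that slave to the bond."""
    commands = []
    for dev in slaves:
        commands.append(["ethtool", "-K", dev, "gro", "off"])
        commands.append(["ifenslave", bond, dev])
    return commands


# Dry run: print the commands instead of executing them.
for argv in gro_off_commands("bond54", ["eth0", "eth2"]):
    print(" ".join(argv))
```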

  That's with eth0 removed from bonding, and eth2 inside.
 
 So the bonding device has only one slave now?

Yes, it was like that.

 What if you take the bonding device out completely and add eth2 directly
 to the bridge?

I think I had already tested that and everything was fine, too.
Do you want me to test that or is the GRO removal conclusive?

-- 
 2. That which causes joy or happiness.




Bug#666386: igb + bnx2 + ifenslave + brctl + vconfig = largely broken

2012-04-01 Thread Josip Rodin

On Sun, Apr 01, 2012 at 03:09:56AM +0100, Ben Hutchings wrote:
 I bet this is due to the combination of LRO plus bridging.  We try to
 turn off LRO in devices under a bridge, but that won't work if there's
 an intermediate bonding device.
 
 If you run:
 
 # ethtool -K eth0 lro off
 # ethtool -K eth2 lro off
 
 does the bridge start working?

Err...

% sudo ethtool -K eth0 lro off
Cannot set large receive offload settings: Operation not supported
% sudo ethtool -K eth2 lro off
Cannot set large receive offload settings: Operation not supported

That's with eth0 removed from bonding, and eth2 inside.

-- 
 2. That which causes joy or happiness.




Bug#666386: igb + bnx2 + ifenslave + brctl + vconfig = largely broken

2012-03-30 Thread Josip Rodin

Package: linux-image-2.6.32-5-xen-amd64
Version: 2.6.32-41

Hi,

The machine is a new IBM x3550 M3, with this network hardware:

% lspci | grep Ethernet
0b:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit 
Ethernet (rev 20)
0b:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit 
Ethernet (rev 20)
1a:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection 
(rev 01)
1a:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection 
(rev 01)

One of each brand (eth0 and eth2) has a working cable plugged into a
working Ethernet switch that's set up so that it serves a native VLAN
(otherwise known as ID 54) and VLAN ID 2 trunked (tagged), among others.

The devices are:

lrwxrwxrwx 1 root root 0 Mar 19 15:42 /sys/class/net/eth0 -> ../../devices/pci:00/:00:07.0/:1a:00.0/net/eth0/
lrwxrwxrwx 1 root root 0 Mar 19 15:42 /sys/class/net/eth2 -> ../../devices/pci:00/:00:01.0/:0b:00.0/net/eth2/

So, if I read that right, eth0 is Intel, and eth2 is Broadcom.

The desired network setup is, in interfaces(5) format:

iface bond54 inet manual
  slaves eth0 eth2
  bond_mode active-backup
  bond_miimon 100

iface xenbr54 inet static
  bridge-ports bond54
  bridge-fd 0
  address 192.168.54.2
  netmask 255.255.255.0

iface vlan2 inet manual
  vlan-raw-device xenbr54

iface xenbr2 inet static
  bridge-ports vlan2
  bridge-fd 0
  address 213.202.97.156
  netmask 255.255.255.240
  gateway 213.202.97.145

This used to work for me elsewhere; however, on this machine it's broken as
follows:

Everything starts up fine, and the machine is perfectly usable (albeit I
only used SSH) over the xenbr54 interface.

However, over the xenbr2 interface, all the small network packets pass, such
as ICMP, or the bringup and teardown of HTTP connections, but as soon as I
try to actually GET something non-trivial over a seemingly established HTTP
connection, the machine pretends it doesn't see that incoming traffic.

Like this:

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
--2012-03-30 11:15:23--  http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response...

In parallel, the trace shows:

% sudo tshark -n -i xenbr2
  0.00 213.202.97.156 -> 161.53.160.11 TCP 51657 > 80 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=232632046 TSER=0 WS=1
  0.001797 161.53.160.11 -> 213.202.97.156 TCP 80 > 51657 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=643552423 TSER=232632046 WS=8
  0.001816 213.202.97.156 -> 161.53.160.11 TCP 51657 > 80 [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=232632046 TSER=643552423
  0.001906 213.202.97.156 -> 161.53.160.11 HTTP GET /debian/ls-lR.gz HTTP/1.0
  0.003625 161.53.160.11 -> 213.202.97.156 TCP 80 > 51657 [ACK] Seq=1 Ack=131 Win=6912 Len=0 TSV=643552423 TSER=232632046

And then it sits there. The server machine (which I happen to have control
over) says:

  0.00 213.202.97.156 -> 161.53.160.11 TCP 51660 > 80 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=232668023 TSER=0 WS=1
  0.23 161.53.160.11 -> 213.202.97.156 TCP 80 > 51660 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=643588400 TSER=232668023 WS=8
  0.003117 213.202.97.156 -> 161.53.160.11 TCP 51660 > 80 [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=232668024 TSER=643588400
  0.003125 213.202.97.156 -> 161.53.160.11 HTTP GET /debian/ls-lR.gz HTTP/1.0
  0.003145 161.53.160.11 -> 213.202.97.156 TCP 80 > 51660 [ACK] Seq=1 Ack=131 Win=6912 Len=0 TSV=643588401 TSER=232668024
  0.003480 161.53.160.11 -> 213.202.97.156 TCP [TCP segment of a reassembled PDU]
  0.003500 161.53.160.11 -> 213.202.97.156 TCP [TCP segment of a reassembled PDU]
  0.204965 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
  0.613959 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
  1.428964 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
  3.061959 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
  6.329958 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
 12.853960 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
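
The spacing of those retransmissions is what one would expect from a sender that never gets its data acknowledged: each gap is roughly twice the previous one. A quick Python check of that, using only the six retransmission timestamps from the server trace above (the variable names are just for illustration):

```python
# Timestamps (seconds) of the six retransmissions in the server-side trace.
retransmits = [0.204965, 0.613959, 1.428964, 3.061959, 6.329958, 12.853960]

# Gap between each retransmission and the one before it.
gaps = [later - earlier
        for earlier, later in zip(retransmits, retransmits[1:])]

# Ratio of each gap to the previous gap; about 2 means exponential backoff.
ratios = [b / a for a, b in zip(gaps, gaps[1:])]

for gap, ratio in zip(gaps[1:], ratios):
    print(f"gap {gap:.3f}s, {ratio:.2f}x the previous gap")
```

So the server keeps backing off and resending the same data, which fits the client side never seeing the data segments at all.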

And then I Ctrl+C that wget, and the traces show:

(on the client)
  8.017451 213.202.97.156 -> 161.53.160.11 TCP 51664 > 80 [FIN, ACK] Seq=131 Ack=1 Win=5840 Len=0 TSV=232696067 TSER=643614440
  8.057740 161.53.160.11 -> 213.202.97.156 TCP [TCP Previous segment lost] 80 > 51664 [ACK] Seq=4345 Ack=132 Win=6912 Len=0 TSV=643616454 TSER=232696067

(on the server)
  8.017218 213.202.97.156 -> 161.53.160.11 TCP 51664 > 80 [FIN, ACK] Seq=131 Ack=1 Win=5840 Len=0 TSV=232696067 TSER=643614440
  8.055647 161.53.160.11 -> 213.202.97.156 TCP 80 > 51664 [ACK] Seq=4345 Ack=132 Win=6912 Len=0

Bug#599161: ditto

2012-01-04 Thread Josip Rodin

On Tue, Jan 03, 2012 at 01:42:38PM +, Ian Campbell wrote:
 On Wed, 2011-12-28 at 01:49 +0100, Josip Rodin wrote:
  This clock jump by 2999 seconds also happened here, so per:
  
  http://old-list-archives.xen.org/archives/html/xen-devel/2011-02/msg01557.html
  
  we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
  the dom0. This seemed to have avoided the problem, but since then, the clock
  jumps started happening like this:
  
  Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc 
  unstable (delta = -811538856601 ns)
  
  In addition, now I checked what the said machine thinks is its clocksource:
  
  % cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
  /sys/devices/system/clocksource/clocksource0/available_clocksource
  xen
  xen
  
  So there's neither pit nor tsc in the available list :)
 
 A PV kernel will (or should) always use xen as its clocksource. This
 is a PV timesource based around the TSC + correction factors (to account
 for drift and PCPU migration).
 
 The clocksource=pit on the hypervisor command line controls the
 hypervisor's own timesource and not the dom0 kernels. I'm not sure how
 you query the hypervisor for its timesource but I guess it'll be in xl
 dmesg somewhere (Platform timer is ...).

Ah, d'oh :) sorry, I wasn't really thinking.

The xm dmesg output on HP DL360 machines that we have set to clocksource=pit
and that have nevertheless happened to shift by more than 35996 seconds
in at least five incidents in the last six months says:

(XEN) Platform timer is 1.193MHz PIT

On a couple of FS RX300's that happened not to have clocksource=pit set but
had time shift by 2999.69 seconds it's this:

(XEN) Platform timer is 14.318MHz HPET

Both also show the following message after the time shift:

(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.


 The message you quote above says *tsc* unstable. Prior to that was the
 system actually using the tsc clocksource? It really shouldn't have
 been... Before that message did available_clocksource contain TSC? What
 about current_clocksource? (Before here ~= on a freshly booted system)

The dom0 machines where we set clocksource=pit do see the sole xen
clocksource. That didn't stop the time from going awry.

On the dom0 machines that don't have the hypervisor fixated on
clocksource=pit:

* one dom0 that sees both xen and tsc in available_clocksource, but uses
  xen as current_clocksource. Not sure what it used at the time of the
  failure in September, probably the same because we didn't touch that. 
* one that recently failed has:

% dmesg | grep unstable
[4613030.883101] Clocksource tsc unstable (delta = -2999660301416 ns)
% cat /sys/devices/system/clocksource/clocksource0/*
xen
xen
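
For scale, the delta in that dmesg line works out to just under 3000 seconds, the same size as the jumps discussed earlier in this thread. A minimal Python sketch of the conversion (the helper name is made up for illustration, and the regular expression assumes the message format shown above):

```python
import re


def clocksource_delta_seconds(line):
    """Extract the 'delta = N ns' value from a 'Clocksource ... unstable'
    kernel message and return it in seconds, or None if it is absent."""
    match = re.search(r"delta = (-?\d+) ns", line)
    if match is None:
        return None
    return int(match.group(1)) / 1e9


line = "[4613030.883101] Clocksource tsc unstable (delta = -2999660301416 ns)"
print(clocksource_delta_seconds(line))
```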

 What are your exact hypervisor and kernel command lines? Other than
 clocksource=pit are you overriding anything else in this regard?

Most of the machines now seem to have:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS1,115200n1 elevator=deadline"
GRUB_CMDLINE_XEN="dom0_mem=512M clocksource=pit cpuidle=0"

The machines without clocksource=pit only had dom0_mem=512M for the
hypervisor and nothing for the dom0 kernel.

 Can you press the 's' hypervisor debug key and report the resulting text
 from dmesg. (press a debug key == xl debug-key s + xl dmesg or press
 Ctrl-A 3 times on serial then press 's').

(Note that I used xm for both of those commands, I don't have xl.)

This is the output on a couple of the DL360's with clocksource=pit:

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3066 
(count=1)
(XEN) dom2: mode=0,ofs=0x21e231c896,khz=2333479,inc=1,vtsc count: 10647611967 
kernel, 454486411 user
(XEN) dom12: mode=0,ofs=0x21a01e68ddeb,khz=2333479,inc=1,vtsc count: 2478607037 
kernel, 199833427 user
(XEN) dom17: mode=0,ofs=0x8d12c3820bf0b,khz=2333479,inc=1,vtsc count: 918220049 
kernel, 56818086 user
(XEN) dom18: mode=0,ofs=0x8d1334e2f635f,khz=2333479,inc=1,vtsc count: 
4707785417 kernel, 197043637 user
(XEN) dom21: mode=0,ofs=0x1004cc1e5bf801,khz=2333479,inc=1,vtsc count: 
6386763431 kernel, 166512523 user
(XEN) dom22: mode=0,ofs=0x14b5955232a7e1,khz=2333479,inc=1,vtsc count: 
2218555643 kernel, 88962103 user

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=1715 
(count=1)
(XEN) dom1: mode=0,ofs=0x149170bd5f,khz=2333479,inc=1,vtsc count: 36234921552 
kernel, 294922844 user

This is the output on an RX300 without clocksource=pit:

(XEN) TSC marked as reliable, warp = 0 (count=2)
(XEN) dom1: mode=0,ofs=0x59e046806,khz=2400116,inc=1
(XEN) No domains have emulated TSC

And finally this is the output on the odd machine that has tsc as an
available clock source:

(XEN) TSC marked as reliable, warp = 0 (count=2)
(XEN) dom1: mode=0,ofs=0x593b1f9e8,khz=2400190,inc=1
(XEN) dom4: mode=0,ofs=0xf3c77d49e41e6,khz=2400190,inc=1
(XEN) No domains have emulated TSC

In the latter case, I've no idea why the domU with the ID 4

Bug#599161: ditto

2011-12-27 Thread Josip Rodin


This clock jump by 2999 seconds also happened here, so per:

http://old-list-archives.xen.org/archives/html/xen-devel/2011-02/msg01557.html

we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
the dom0. This seemed to have avoided the problem, but since then, the clock
jumps started happening like this:

Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc unstable 
(delta = -811538856601 ns)

In addition, now I checked what the said machine thinks is its clocksource:

% cat /sys/devices/system/clocksource/clocksource0/current_clocksource 
/sys/devices/system/clocksource/clocksource0/available_clocksource
xen
xen

So there's neither pit nor tsc in the available list :)
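
The same check can be scripted to compare several dom0 machines. A minimal Python sketch, assuming the two sysfs files shown above; the function name and the overridable base directory are made up for illustration:

```python
from pathlib import Path

# Directory holding the two sysfs files read with cat above.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) where current is the clocksource in
    use and available is the list of clocksources the kernel offers."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__" and SYSFS_DIR.is_dir():
    current, available = read_clocksources()
    print("current:", current)
    print("available:", ", ".join(available))
```

Passing a different base directory makes it possible to run the function against saved copies of those files.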

-- 
 2. That which causes joy or happiness.




Bug#622779: sparc config missing SERIAL_8250{,_PCI}

2011-04-15 Thread Josip Rodin

On Fri, Apr 15, 2011 at 04:14:12AM +0100, Ben Hutchings wrote:
 On Thu, 2011-04-14 at 17:10 +0200, Josip Rodin wrote:
  Package: linux-image-2.6.32-5-sparc64
  Version: 2.6.32-31
  
  Hi,
  
  /boot/config-2.6.32-5-sparc64 does not include CONFIG_SERIAL_8250
  or SERIAL_8250_PCI, so it's impossible to use PCI cards with serial ports
  on them, which is useful for accessing e.g. serial consoles of other
  machines from a sparc machine.
 [...]
 
 Is a module OK or do you want it built-in for some reason?  (That would
 be necessary for a serial console, but you can presumably already use
 the built-in serial port.)

Yes, in fact I only tested it as a module :)

For the serial console of the sparc machine itself, the ttyS* of sun* serial
modules are used, and on server hardware these usually hardwire to ALOM.
This is to be able to get a usable physical port to connect to another
machine.

-- 
 2. That which causes joy or happiness.




Bug#622779: sparc config missing SERIAL_8250{,_PCI}

2011-04-14 Thread Josip Rodin

Package: linux-image-2.6.32-5-sparc64
Version: 2.6.32-31

Hi,

/boot/config-2.6.32-5-sparc64 does not include CONFIG_SERIAL_8250
or SERIAL_8250_PCI, so it's impossible to use PCI cards with serial ports
on them, which is useful for accessing e.g. serial consoles of other
machines from a sparc machine.

This used to be impossible (upstream), but the issue that had caused that
has long been fixed. (We got it to work in March 2009, judging by a
/boot/config-2.6.28.7 of mine.)

Please add this so that the PCI serial cards in lebrun.debian.org
and schroeder.debian.org stop being useless with the default kernel.
TIA :)

-- 
 2. That which causes joy or happiness.




Bug#610118: swapper: page allocation failure. order:0, mode:0x4020

2011-01-15 Thread Josip Rodin

Package: linux-image-2.6.32-5-xen-amd64
Version: 2.6.32-23~bpo50+1

Hi,

Something like this was mentioned misplaced in #592497, and about a
different network driver, so I'm filing a new bug because it should
be unrelated to both issues over there :)

I've just seen something similar with a tg3, randomly during normal
operation (after 100 days of uptime), but also on a Xen dom0.
Nothing very bad seems to have happened, the domUs haven't complained
at all, but it's suspicious.

Jan 15 13:39:10 virgo kernel: [9658533.157616] swapper: page allocation 
failure. order:0, mode:0x4020
Jan 15 13:39:10 virgo kernel: [9658533.157634] Pid: 0, comm: swapper Not 
tainted 2.6.32-bpo.5-xen-amd64 #1
Jan 15 13:39:10 virgo kernel: [9658533.157642] Call Trace:
Jan 15 13:39:10 virgo kernel: [9658533.157649]  IRQ  [810baed4] ? 
__alloc_pages_nodemask+0x55b/0x5cf
Jan 15 13:39:10 virgo kernel: [9658533.157677]  [81295624] ? 
tcp_rcv_established+0x688/0x6d9
Jan 15 13:39:10 virgo kernel: [9658533.157690]  [810e70d2] ? 
new_slab+0x5b/0x1ca
Jan 15 13:39:10 virgo kernel: [9658533.157700]  [810e7431] ? 
__slab_alloc+0x1f0/0x39b
Jan 15 13:39:10 virgo kernel: [9658533.157712]  [81258103] ? 
__netdev_alloc_skb+0x29/0x43
Jan 15 13:39:10 virgo kernel: [9658533.157723]  [810e7e63] ? 
__kmalloc_node_track_caller+0xbb/0x11b
Jan 15 13:39:10 virgo kernel: [9658533.157734]  [81258103] ? 
__netdev_alloc_skb+0x29/0x43
Jan 15 13:39:10 virgo kernel: [9658533.157744]  [81257518] ? 
__alloc_skb+0x69/0x15a
Jan 15 13:39:10 virgo kernel: [9658533.157754]  [81258103] ? 
__netdev_alloc_skb+0x29/0x43
Jan 15 13:39:10 virgo kernel: [9658533.157785]  [a0019c25] ? 
tg3_alloc_rx_skb+0xd2/0x146 [tg3]
Jan 15 13:39:10 virgo kernel: [9658533.157805]  [a0021292] ? 
tg3_poll+0x484/0x93d [tg3]
Jan 15 13:39:10 virgo kernel: [9658533.157818]  [8100e5b5] ? 
xen_force_evtchn_callback+0x9/0xa
Jan 15 13:39:10 virgo kernel: [9658533.157829]  [8100ec72] ? 
check_events+0x12/0x20
Jan 15 13:39:10 virgo kernel: [9658533.157840]  [8125e633] ? 
net_rx_action+0xae/0x1c9
Jan 15 13:39:10 virgo kernel: [9658533.157852]  [810548ca] ? 
__do_softirq+0xdd/0x19f
Jan 15 13:39:10 virgo kernel: [9658533.157863]  [81012cac] ? 
call_softirq+0x1c/0x30
Jan 15 13:39:10 virgo kernel: [9658533.157873]  [8101422b] ? 
do_softirq+0x3f/0x7c
Jan 15 13:39:10 virgo kernel: [9658533.157883]  [81054739] ? 
irq_exit+0x36/0x76
Jan 15 13:39:10 virgo kernel: [9658533.157894]  [811f14b1] ? 
xen_evtchn_do_upcall+0x33/0x42
Jan 15 13:39:10 virgo kernel: [9658533.157905]  [81012cfe] ? 
xen_do_hypervisor_callback+0x1e/0x30
Jan 15 13:39:10 virgo kernel: [9658533.157913]  EOI  [810093aa] ? 
hypercall_page+0x3aa/0x1001
Jan 15 13:39:10 virgo kernel: [9658533.157929]  [810093aa] ? 
hypercall_page+0x3aa/0x1001
Jan 15 13:39:10 virgo kernel: [9658533.157940]  [8100e633] ? 
xen_safe_halt+0xc/0x15
Jan 15 13:39:10 virgo kernel: [9658533.157950]  [8100bf3f] ? 
xen_idle+0x37/0x40
Jan 15 13:39:10 virgo kernel: [9658533.157959]  [81010eb1] ? 
cpu_idle+0xa2/0xda
Jan 15 13:39:10 virgo kernel: [9658533.157977]  [81502cd1] ? 
start_kernel+0x3dc/0x3e8
Jan 15 13:39:10 virgo kernel: [9658533.157987]  [81504c7d] ? 
xen_start_kernel+0x57c/0x581
Jan 15 13:39:10 virgo kernel: [9658533.157995] Mem-Info:
Jan 15 13:39:10 virgo kernel: [9658533.158000] Node 0 DMA per-cpu:
Jan 15 13:39:10 virgo kernel: [9658533.158009] CPU0: hi:0, btch:   1 
usd:   0
Jan 15 13:39:10 virgo kernel: [9658533.158017] CPU1: hi:0, btch:   1 
usd:   0
Jan 15 13:39:10 virgo kernel: [9658533.158024] CPU2: hi:0, btch:   1 
usd:   0
Jan 15 13:39:10 virgo kernel: [9658533.158031] CPU3: hi:0, btch:   1 
usd:   0
Jan 15 13:39:10 virgo kernel: [9658533.158038] Node 0 DMA32 per-cpu:
Jan 15 13:39:10 virgo kernel: [9658533.158046] CPU0: hi:  186, btch:  31 
usd: 197
Jan 15 13:39:10 virgo kernel: [9658533.158053] CPU1: hi:  186, btch:  31 
usd: 182
Jan 15 13:39:10 virgo kernel: [9658533.158061] CPU2: hi:  186, btch:  31 
usd: 140
Jan 15 13:39:10 virgo kernel: [9658533.158068] CPU3: hi:  186, btch:  31 
usd:  85
Jan 15 13:39:10 virgo kernel: [9658533.158080] active_anon:5831 
inactive_anon:8962 isolated_anon:0
Jan 15 13:39:10 virgo kernel: [9658533.158082]  active_file:36954 
inactive_file:24010 isolated_file:0
Jan 15 13:39:10 virgo kernel: [9658533.158084]  unevictable:5 dirty:2821 
writeback:0 unstable:0
Jan 15 13:39:10 virgo kernel: [9658533.158086]  free:753 slab_reclaimable:25777 
slab_unreclaimable:3367
Jan 15 13:39:10 virgo kernel: [9658533.158088]  mapped:4481 shmem:151 
pagetables:489 bounce:0
Jan 15 13:39:10 virgo kernel: [9658533.158109] Node 0 DMA free:1976kB min:80kB 
low:100kB high:120kB active_anon:308kB inactive_anon:512kB active_file:5712kB 
inactive_file:3932kB unevictable:0kB isolated(anon):0kB

Bug#598057: our xen-netfront in featureset=xen kernels has smartpoll enabled, but probably shouldn't

2010-09-25 Thread Josip Rodin

Package: linux-image-2.6.32-5-xen-amd64
Version: 2.6.32-23

Hi,

I just witnessed a strange situation - a domU had its kernel updated from
2.6.32-4-amd64 to 2.6.32-bpo.5-xen-amd64, and all seemed well, but after
two hours it stopped responding on its (statically configured) eth0 device.

tcpdump of the bridge and the vif on the dom0 said that everything was all
right, the domU was making ARP requests for its gateway, and the gateway was
responding, but then the domU would just repeat the same request, over and
over again.

That sounded to me like a xen-netfront problem. I happen to watch xen.git,
so I know about this recent back-and-forth:

http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=refs/heads/xen/netfront

After September 10th this year, several bugs were identified in this new
smartpoll logic. I then checked in our package, and we seem to be using
an August 13th copy, so we're missing those.

I'm not at all sure that this was the issue in my problem, because I don't
completely grok all that stuff - I don't exactly know if xennet_interrupt()
or smart_poll_function() are what's getting stuck in my use case - but
debian/patches/features/all/xen/pvops.patch includes the smartpoll changes
and plain drivers/net/xen-netfront.c doesn't, and the latter has worked fine
for many months here while the former screwed us shortly after installation,
so that's suspicious enough for me.

Please update the kernel pvops patch to include the more recent xen/netfront
branch - where the benefit is both in that known smartpoll bugs are fixed,
and this new feature is turned off by default until upstream is more
comfortable with it. TIA.

--
2. That which causes joy or happiness.


Bug#598057: our xen-netfront in featureset=xen kernels has smartpoll enabled, but probably shouldn't

2010-09-25 Thread Josip Rodin

forcemerge 596635 598057
thanks

On Sun, Sep 26, 2010 at 12:34:28AM +0100, Ben Hutchings wrote:
 We know; this is already going to be fixed:
 
   * [x86/xen] Disable netfront's smartpoll mode by default. (Closes: #596635)

Sorry, I didn't check the applicable bug list before sending, because of
the simple fact that linux-2.6's bug page wouldn't load within half a dozen
times 8 seconds, so I gave up and just checked
linux-image-2.6.32-5-xen-amd64's bug page, which didn't have anything
that resembled this. Oh well.

-- 
 2. That which causes joy or happiness.



-- 
Archive: http://lists.debian.org/20100926003128.ga24...@entuzijast.net

Bug#597276: qla2xxx_eh_abort(5) - kernel NULL pointer dereference

2010-09-22 Thread Josip Rodin

On Sun, Sep 19, 2010 at 11:44:50PM -0700, Giridhar Malavali wrote:
 Thanks for letting us know about this problem. Can u please provide logs
 with ql2xextended_error_logging enabled. Also, can u please provide more
 details about the test case.

OK. The machine has this hardware:

% sudo lspci -v
[...]
0b:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
        Subsystem: Hewlett-Packard Company Device 7041
        Flags: bus master, fast devsel, latency 0, IRQ 16
        I/O ports at 5000 [size=256]
        Memory at fdef (64-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at d000 [disabled] [size=256K]
        Capabilities: [44] Power Management version 2
        Capabilities: [4c] Express Endpoint, MSI 00
        Capabilities: [64] Message Signalled Interrupts: Mask- 64bit+ Queue=0/4 Enable-
        Capabilities: [74] Vital Product Data ?
        Capabilities: [7c] MSI-X: Enable- Mask- TabSize=16
        Capabilities: [100] Advanced Error Reporting ?
        Capabilities: [138] Power Budgeting ?
        Kernel driver in use: qla2xxx
        Kernel modules: qla2xxx

0b:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
        Subsystem: Hewlett-Packard Company Device 7041
        Flags: bus master, fast devsel, latency 0, IRQ 17
        I/O ports at 5400 [size=256]
        Memory at fdee (64-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at d004 [disabled] [size=256K]
        Capabilities: [44] Power Management version 2
        Capabilities: [4c] Express Endpoint, MSI 00
        Capabilities: [64] Message Signalled Interrupts: Mask- 64bit+ Queue=0/4 Enable-
        Capabilities: [74] Vital Product Data ?
        Capabilities: [7c] MSI-X: Enable- Mask- TabSize=16
        Capabilities: [100] Advanced Error Reporting ?
        Capabilities: [138] Power Budgeting ?
        Kernel driver in use: qla2xxx
        Kernel modules: qla2xxx

13:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
        Subsystem: Hewlett-Packard Company Device 7041
        Flags: bus master, fast devsel, latency 0, IRQ 17
        I/O ports at 6000 [size=256]
        Memory at fdff (64-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at d020 [disabled] [size=256K]
        Capabilities: [44] Power Management version 2
        Capabilities: [4c] Express Endpoint, MSI 00
        Capabilities: [64] Message Signalled Interrupts: Mask- 64bit+ Queue=0/4 Enable-
        Capabilities: [74] Vital Product Data ?
        Capabilities: [7c] MSI-X: Enable- Mask- TabSize=16
        Capabilities: [100] Advanced Error Reporting ?
        Capabilities: [138] Power Budgeting ?
        Kernel driver in use: qla2xxx
        Kernel modules: qla2xxx

13:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 02)
        Subsystem: Hewlett-Packard Company Device 7041
        Flags: bus master, fast devsel, latency 0, IRQ 18
        I/O ports at 6400 [size=256]
        Memory at fdfe (64-bit, non-prefetchable) [size=16K]
        [virtual] Expansion ROM at d024 [disabled] [size=256K]
        Capabilities: [44] Power Management version 2
        Capabilities: [4c] Express Endpoint, MSI 00
        Capabilities: [64] Message Signalled Interrupts: Mask- 64bit+ Queue=0/4 Enable-
        Capabilities: [74] Vital Product Data ?
        Capabilities: [7c] MSI-X: Enable- Mask- TabSize=16
        Capabilities: [100] Advanced Error Reporting ?
        Capabilities: [138] Power Budgeting ?
        Kernel driver in use: qla2xxx
        Kernel modules: qla2xxx

Anyway, we had been running an earlier 2.6.32 kernel up until a few days
ago, which gave us this on boot:

[2.656008] QLogic Fibre Channel HBA Driver: 8.03.01-k6-debug
[2.656188] qla2xxx :0b:00.0: PCI INT A - GSI 16 (level, low) - IRQ 16
[2.710842] qla2xxx :0b:00.0: Found an ISP2432, irq 16, iobase 0xc9c6c000
[2.719526] qla2xxx :0b:00.0: MSI-X: Unsupported ISP2432 (0x2, 0x0).
[2.727776]   alloc irq_desc for 61 on node -1
[2.727778]   alloc kstat_irqs on node -1
[2.728002] qla2xxx :0b:00.0: irq 61 for MSI/MSI-X
[2.728184] qla2xxx :0b:00.0: MSI: Enabled.
[2.732040] IRQ 59/cciss0: IRQF_DISABLED is not guaranteed on shared IRQs
[2.732058] cciss0: 0x3230 at PCI :06:00.0 IRQ 59 using DAC
[2.747326] qla2xxx :0b:00.0: Configuring PCI space...
[2.747479]  cciss/c0d0: p1
[2.755773] qla2xxx :0b:00.0: setting latency timer to 64
[2.756280]  p2
[2.760467] qla2xxx :0b:00.0: FLTL[DEF] = 0x11400.
[2.773807] qla2xxx :0b:00.0: FLT[DEF]: boot=0x0 fw=0x2 vpd_nvram=0x48000 vpd=0x0 nvram=0x0 fdt=0x11000 flt=0x11400
[2.787143] qla2xxx :0b:00.0: FDT[MID]: (0xbf/0x80) erase=0x7ffd0352 pro=0 upro=0 wrtd=0x9c blk=0x8000.
[2.789701] qla2xxx :0b:00.0:

Bug#597276: qla2xxx_eh_abort(5) - kernel NULL pointer dereference

2010-09-18 Thread Josip Rodin

Package: linux-2.6
Version: 2.6.32-21~bpo50+1

Hi,

Got this in dmesg on a server:

Sep 18 02:46:52 birdun kernel: [387093.744649] qla2xxx_eh_abort(5): aborting sp 8801b58013c0 from RISC. pid=46881441.
Sep 18 02:46:56 birdun kernel: [387093.836909] BUG: unable to handle kernel NULL pointer dereference at 0040
Sep 18 02:46:56 birdun kernel: [387093.924511] IP: [812f8ea1] _spin_lock_irqsave+0x1a/0x34
Sep 18 02:46:56 birdun kernel: [387093.996511] PGD 22d846067 PUD 22d678067 PMD 0
Sep 18 02:46:56 birdun kernel: [387094.048511] Oops: 0002 [#1] SMP
Sep 18 02:46:56 birdun kernel: [387094.086651] last sysfs file: /sys/devices/pci:00/:00:04.0/:13:00.0/host4/rport-4:0-3/target4:0:3/fc_transport/target4:0:3/node_name
Sep 18 02:46:56 birdun kernel: [387094.236007] CPU 4
Sep 18 02:46:56 birdun kernel: [387094.260007] Modules linked in: ipmi_devintf nf_conntrack_ipv6 ip6t_LOG ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT ipt_LOG iptable_filter ip_tables x_tables bonding xfs exportfs dm_round_robin dm_multipath scsi_dh loop snd_pcsp snd_pcm snd_timer psmouse ipmi_si rng_core snd soundcore i5000_edac serio_raw hpilo ipmi_msghandler snd_page_alloc edac_core evdev container i5k_amb button processor shpchp pci_hotplug ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod st ch osst sd_mod crc_t10dif sg sr_mod cdrom ata_piix ata_generic qla2xxx scsi_transport_fc libata scsi_tgt cciss usbhid hid bnx2 ehci_hcd uhci_hcd floppy usbcore nls_base scsi_mod thermal fan thermal_sys
Sep 18 02:46:56 birdun kernel: [387095.008511] Pid: 763, comm: scsi_eh_5 Not tainted 2.6.32-bpo.5-amd64 #1 ProLiant DL360 G5
Sep 18 02:46:56 birdun kernel: [387095.104511] RIP: 0010:[812f8ea1]  [812f8ea1] _spin_lock_irqsave+0x1a/0x34
Sep 18 02:46:56 birdun kernel: [387095.204007] RSP: 0018:88022b1c5d70  EFLAGS: 00010082
Sep 18 02:46:56 birdun kernel: [387095.264511] RAX: 0282 RBX: 0040 RCX: 381d
Sep 18 02:46:56 birdun kernel: [387095.348511] RDX: 0001 RSI: 0282 RDI: 0040
Sep 18 02:46:56 birdun kernel: [387095.432258] RBP: 8801b58013c0 R08: 000a26c8 R09: 000a
Sep 18 02:46:56 birdun kernel: [387095.512512] R10:  R11: 81673868 R12: 0001
Sep 18 02:46:56 birdun kernel: [387095.596512] R13: 88014066e100 R14: 8801b5801e80 R15: 
Sep 18 02:46:56 birdun kernel: [387095.684513] FS:  () GS:880008d0() knlGS:
Sep 18 02:46:56 birdun kernel: [387095.780002] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
Sep 18 02:46:56 birdun kernel: [387095.844512] CR2: 0040 CR3: 00022d42b000 CR4: 06e0
Sep 18 02:46:56 birdun kernel: [387095.928512] DR0:  DR1:  DR2: 
Sep 18 02:46:56 birdun kernel: [387096.012511] DR3:  DR6: 0ff0 DR7: 0400
Sep 18 02:46:56 birdun kernel: [387096.096005] Process scsi_eh_5 (pid: 763, threadinfo 88022b1c4000, task 88022ba39c40)
Sep 18 02:46:56 birdun kernel: [387096.192511] Stack:
Sep 18 02:46:56 birdun kernel: [387096.216511]  381d a014cb8b  0286
Sep 18 02:46:56 birdun kernel: [387096.300959] 0 ff10 8801b58013c0 2002 0286
Sep 18 02:46:56 birdun kernel: [387096.390206] 0 88022df0a900 88022b1c 88022b881840 a01407e4
Sep 18 02:46:56 birdun kernel: [387096.480511] Call Trace:
Sep 18 02:46:56 birdun kernel: [387096.508511]  [a014cb8b] ? qla24xx_abort_command+0x3f/0x1db [qla2xxx]
Sep 18 02:46:56 birdun kernel: [387096.592513]  [a01407e4] ? qla2xxx_eh_abort+0xf2/0x250 [qla2xxx]
Sep 18 02:46:56 birdun kernel: [387096.672511]  [a001ccde] ? scsi_error_handler+0x302/0x5b5 [scsi_mod]
Sep 18 02:46:56 birdun kernel: [387096.756512]  [a001c9dc] ? scsi_error_handler+0x0/0x5b5 [scsi_mod]
Sep 18 02:46:56 birdun kernel: [387096.836513]  [81063601] ? kthread+0x79/0x81
Sep 18 02:46:56 birdun kernel: [387096.896512]  [81011baa] ? child_rip+0xa/0x20
Sep 18 02:46:56 birdun kernel: [387096.956511]  [81063588] ? kthread+0x0/0x81
Sep 18 02:46:56 birdun kernel: [387097.012512]  [81011ba0] ? child_rip+0x0/0x20
Sep 18 02:46:56 birdun kernel: [387097.072511] Code: 31 d2 89 d0 c3 f0 83 2f 01 79 05 e8 ca ae e9 ff c3 48 83 ec 08 9c 58 0f 1f 44 00 00 48 89 c6 fa 66 0f 1f 44 00 00 ba 00 00 01 00 f0 0f c1 17 0f b7 ca c1 ea 10 39 d1 74 07 f3 90 0f b7 0f eb f5
Sep 18 02:46:56 birdun kernel: [387097.292511] RIP  [812f8ea1] _spin_lock_irqsave+0x1a/0x34
Sep 18 02:46:56 birdun kernel: [387097.364514]  RSP 88022b1c5d70
Sep 18 02:46:56 birdun kernel: [387097.404511] CR2: 0040
Sep 18 02:46:56 birdun kernel:

Bug#594604: linux-image-2.6.32-5-sparc64-smp: Kernel panic - not syncing: Irrecoverable deferred error trap.

2010-08-28 Thread Josip Rodin

On Fri, Aug 27, 2010 at 07:29:13PM -0700, David Miller wrote:
 From: Josip Rodin j...@debbugs.entuzijast.net
 Date: Fri, 27 Aug 2010 21:31:37 +0200

  David, can you please queue this sunxvr500.c post-2.6.32 bugfix
  to sta...@kernel.org?

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bdd32ce95f79fb5cc964cd789d7ae4500bba7c6f

  Same story as the last one :)

 As you asked me to last week, I submitted this fix to all of the
 active branches of -stable earlier this week, so it should show up in
 the next round of -stable releases. :-)

Oh, I'm sorry about that, I actually thought it was a different one.
Oh well, better safe than sorry :) Thanks.

-- 
 2. That which causes joy or happiness.

-- 
Archive: http://lists.debian.org/20100828150515.ga24...@entuzijast.net

Bug#594604: linux-image-2.6.32-5-sparc64-smp: Kernel panic - not syncing: Irrecoverable deferred error trap.

2010-08-27 Thread Josip Rodin

On Fri, Aug 27, 2010 at 11:49:06PM +0800, James Andrewartha wrote:
 Package: linux-2.6
 Version: 2.6.32-21
 Severity: important
 Tags: patch
 
 I'm getting the same error and kernel log as mentioned in
 http://thread.gmane.org/gmane.linux.ports.sparc/13092 for which there is
 a patch at http://thread.gmane.org/gmane.linux.ports.sparc/13092/focus=13101
 The hardware is a SunBlade 2000 with an XVR-1200. I've tested the patch
 against linux-source-2.6.32 version 2.6.32-21 and it boots successfully.
 This patch was included in 2.6.34 as commit
 bdd32ce95f79fb5cc964cd789d7ae4500bba7c6f.

David, can you please queue this sunxvr500.c post-2.6.32 bugfix
to sta...@kernel.org?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bdd32ce95f79fb5cc964cd789d7ae4500bba7c6f

Same story as the last one :)

-- 
 2. That which causes joy or happiness.



-- 
Archive: http://lists.debian.org/20100827193137.ga22...@entuzijast.net

Bug#574243: please restore Sun XVR video drivers, and add the latest one [was Re: Debian Sparc 5.4 on Sun blade 2500]

2010-08-02 Thread Josip Rodin

On Sun, Aug 01, 2010 at 09:45:45PM -0400, Moritz Muehlenhoff wrote:
 On Wed, Mar 17, 2010 at 01:13:45AM +0100, Josip Rodin wrote:
  Package: linux-2.6
  Version: 2.6.32-9
  
  On Tue, Mar 16, 2010 at 03:24:37PM -0700, David Miller wrote:
Josip, please make sure this gets fixed, please get my
sunxvr1000 driver added (attached) and then add:

CONFIG_FB_XVR500=y
CONFIG_FB_XVR2500=y
CONFIG_FB_XVR1000=y

to the config for sparc64.
   
   Ok, further checking shows that lenny has XVR500 and XVR2500
   enabled (doing a test install with a XVR-500 card right now)
   but testing doesn't.
  
  Indeed, they seem to have gone missing somehow from
  debian/config/sparc/config where there's just:
  
  # CONFIG_FB_LEO is not set
  # CONFIG_FB_S1D13XXX is not set
  
  whereas in the same source we have arch/sparc/configs/sparc64_defconfig
  where there's:
  
  # CONFIG_FB_LEO is not set
  CONFIG_FB_XVR500=y
  CONFIG_FB_XVR2500=y
  # CONFIG_FB_S1D13XXX is not set
  
  I'm filing a bug report with this message, thanks for the exact hint.
  
  As for the new driver that Dave mentioned as an attachment, it's
  already in mainline at:
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2d378b9179881b46a0faf11430efb421fe03ddd8
  That should apply pretty easily to .32 stable.
 
 I've added the patch and activated it in the Sparc config. Do you have the
 hardware, can you test a build?

Me personally, no, but if you post a link I'm sure someone will crop up :)
Just yesterday we had someone ask about XVR-1200.

-- 
 2. That which causes joy or happiness.



-- 
Archive: http://lists.debian.org/20100802074356.ga29...@orion.carnet.hr

upgrade to new xen domU on old xen dom0?

2010-03-27 Thread Josip Rodin

Hi,

If I try to boot 2.6.32-4-xen-amd64 on a 2.6.26-2-xen-amd64 (lenny) dom0,
it gets stuck at:

[0.120653] XENBUS: Device with no driver: device/vbd/769
[0.120658] XENBUS: Device with no driver: device/vif/0
[0.120663] XENBUS: Device with no driver: device/console/0
[0.120679] /build/mattems-linux-2.6_2.6.32-10-amd64-Ff7Wwa/linux-2.6-2.6.32-10/debian/build/source_amd64_xen/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[0.120822] Freeing unused kernel memory: 588k freed
[0.121088] Write protecting the kernel read-only data: 4264k
Loading, please wait...
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... FATAL: Error inserting fan (/lib/modules/2.6.32-4-xen-amd64/kernel/drivers/acpi/fan.ko): No such device
FATAL: Error inserting thermal (/lib/modules/2.6.32-4-xen-amd64/kernel/drivers/acpi/thermal.ko): No such device
[0.610445] blkfront: xvda1: barriers enabled
done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Waiting for root file system ...

Can anything be done? I thought the domUs were supposed to be a safe
upgrade?

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20100327105625.ga20...@orion.carnet.hr

Re: upgrade to new xen domU on old xen dom0?

2010-03-27 Thread Josip Rodin

On Sat, Mar 27, 2010 at 12:02:01PM +0000, Ian Campbell wrote:
 xen-blkfront is a module in the pvops based 2.6.32-x-xen-amd64 whereas
 it was statically linked in the non-pvops 2.6.26-x-xen-amd64 images.
 This already happened in Lenny for 32 bit guests (sort of) since the
 -686-bigmem kernel (which supports Xen) also uses modules for the
 drivers. I think the change is generally a step in the right direction.
 
 Perhaps running mkinitramfs within the 2.6.26 environment causes the
 2.6.32 initrd to not contain the correct module? (since it can't detect
 the requirement for the module because the current kernel has it
 statically linked?)
 
 This should be fixable with some configuration in the guest (e.g. add
 the modules to /etc/initramfs-tools/modules).

I ran the default install of the image package on the guest running .18,
and then copied the image and initrd over to the parent.
I extracted that initrd image now and I see

lib/modules/2.6.32-4-xen-amd64/kernel/drivers/block/xen-blkfront.ko

in it. Are you saying it could have gotten missed by the initrd init scripts
even though it's there? Couldn't we fix that automatic detection?

I diffed the trees and noticed that kernel/drivers/net/xen-netfront.ko
is missing from the initrd, but that's probably non-fatal.
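
A minimal sketch of the check described above, for anyone repeating it: given a listing of the paths inside an unpacked initrd, report which Xen frontend modules are absent. The function name and the sample listing are hypothetical; only the two module file names and the directory layout come from this thread.

```python
import posixpath

# Module file names mentioned in this thread.
REQUIRED = ("xen-blkfront.ko", "xen-netfront.ko")


def missing_modules(paths, required=REQUIRED):
    """Return the required module file names that do not appear in paths."""
    present = {posixpath.basename(p.strip()) for p in paths if p.strip()}
    return [name for name in required if name not in present]


# Hypothetical listing, shaped like the paths seen in the extracted initrd.
listing = [
    "lib/modules/2.6.32-4-xen-amd64/kernel/drivers/block/xen-blkfront.ko",
    "lib/modules/2.6.32-4-xen-amd64/kernel/drivers/acpi/fan.ko",
]

print(missing_modules(listing))
```

This only says whether the file is in the image; it says nothing about whether the init scripts actually load it.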

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20100327161006.ga25...@orion.carnet.hr

Re: upgrade to new xen domU on old xen dom0?

2010-03-27 Thread Josip Rodin

On Sat, Mar 27, 2010 at 11:56:25AM +0100, joy wrote:
 [0.610445] blkfront: xvda1: barriers enabled
 done.
 Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
 Begin: Waiting for root file system ...
 
 Can anything be done? I thought the domUs were supposed to be a safe
 upgrade?

I missed Bastian's message as I'm not subscribed - please keep me in Cc:.

 What was the last known working version?

The one from lenny. Well, for some values of working at least :)

 You are supposed to provide the complete log. There are several possible
 pitfalls.
 
 If you are upgrading from an old-style image and followed old
 documentation it is most likely a wrong root device.

I just replaced the kernel and ramdisk settings on the old dom0.
The relevant settings, that work with our .26 and .18, are:

root= '/dev/hda1 ro'
disk= [ 'phy:pavo/lastovo,hda1,w' ]

What do I need to change?

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20100327161510.ga27...@orion.carnet.hr

Re: upgrade to new xen domU on old xen dom0?

2010-03-27 Thread Josip Rodin

On Sat, Mar 27, 2010 at 05:28:28PM +0100, Bastian Blank wrote:
   What was the last known working version?
  The one from lenny. Well, for some values of working at least :)
 
 Well, Lenny has two variants: the early pv-ops one and the old-style one.

We had early pvops in lenny? Where? :)

   If you are upgrading from an old-style image and followed old
   documentation it is most likely a wrong root device.
  I just replaced the kernel and ramdisk settings on the old dom0.
  The relevant settings, that work with our .26 and .18, are:
  root= '/dev/hda1 ro'
  disk= [ 'phy:pavo/lastovo,hda1,w' ]
 
 Yeah, using [hs]d[a-z]* was already deprecated in Lenny.
 
  What do I need to change?
 
 Use xvda as device name.

OK, that works, thanks. We have got to get this documented somewhere
now that the deprecated option is broken. There is no mention of it at
http://wiki.debian.org/Xen and simple googling is far from conclusive.
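
For the record, a sketch of what the changed domU settings could look like. This assumes that both the root= argument and the disk entry were switched from hda1 to xvda1; the thread only confirms that using xvda as the device name works, so the exact lines are an assumption. (xm domain config files are Python syntax, so the fragment is valid Python.)

```python
# Hypothetical domU config fragment after the change.
# Before (worked with the .18 and .26 guest kernels):
#   root = '/dev/hda1 ro'
#   disk = ['phy:pavo/lastovo,hda1,w']
root = '/dev/xvda1 ro'
disk = ['phy:pavo/lastovo,xvda1,w']

# The guest-side device named in root= should match the one exported in disk.
guest_device = disk[0].split(',')[1]
print(root.split()[0], guest_device)
```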

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20100327170726.ga6...@orion.carnet.hr

Re: upgrade to new xen domU on old xen dom0?

2010-03-27 Thread Josip Rodin

On Sat, Mar 27, 2010 at 06:11:15PM +0000, Ian Campbell wrote:
  OK, that works, thanks. We have got to get this documented somewhere
  now that the deprecated option is broken. There is no mention of it at
  http://wiki.debian.org/Xen and simple googling is far from conclusive.
 
 Would you mind updating the wiki with your findings?

OK, done.

Speaking of the new domU, does anyone know anything about this:

[0.00] Calgary: detecting Calgary via BIOS EBDA area
[0.00] Calgary: Unable to locate Rio Grande table in EBDA - bailing!

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20100327210925.ga29...@orion.carnet.hr

Re: Xen dom0 2.6.32 stable branch

2010-03-24 Thread Josip Rodin

On Thu, Mar 18, 2010 at 01:52:34AM +0100, joy wrote:
 On Wed, Feb 24, 2010 at 01:05:53PM +0100, joy wrote:
  http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=refs/heads/xen/stable
 
 It's great to see the new packages :) I didn't want to rain on the parade by
 instantly filing bug reports, but I must point out a bit of a problem with
 the .32 kernel that may have something to do with (the lack of) the NX bit:
 
 http://lists.xensource.com/archives/html/xen-devel/2010-03/msg00243.html
 http://lists.xensource.com/archives/html/xen-devel/2010-03/msg00658.html
 
 I'm getting ready to start bisecting.

Sadly that didn't help, but regardless, that problem was fixed yesterday with
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=de67ec8b23629776f786d62c3109552ea7f8cc27

Please update the package with the up-to-date xen/stable.
Want a critical bug report as a reminder? :)

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20100324092424.ga16...@orion.carnet.hr

Bug#575183: fails to boot on SGI C2108-F6 server under Xen 3.4 hypervisor

2010-03-24 Thread Josip Rodin


Ian Campbell wrote:
 IIRC these kernels require a newer hypervisor than is in stable at the
 moment, at a minimum you need 3.4.3, RC's are available in testing.

I'd just like to confirm this, I distinctly recall seeing the mention of
the exact Mercurial changeset on the xen-devel list for a new hypercall.
Russell also mentioned it implicitly at
http://etbe.coker.com.au/2010/03/21/xen-debian-squeeze/

William, did you try the hypervisor upgrade, does it work then?

The new paravirt_ops Xen dom0 kernel packages should probably simply have a:

Conflicts: xen-hypervisor-3.4-$ARCH (<< 3.4.3~rc3), xen-hypervisor-3.2-$ARCH, xen-hypervisor-3.0-$ARCH

-- 
 2. That which causes joy or happiness.



-- 
Archive: http://lists.debian.org/20100324094420.ga19...@orion.carnet.hr

Bug#575183: fails to boot on SGI C2108-F6 server under Xen 3.4 hypervisor

2010-03-24 Thread Josip Rodin

On Wed, Mar 24, 2010 at 11:09:45AM +0100, Bastian Blank wrote:
   IIRC these kernels require a newer hypervisor than is in stable at the
   moment, at a minimum you need 3.4.3, RC's are available in testing.
 
  The new paravirt_ops Xen dom0 kernel packages should probably simply have a:
  Conflicts: xen-hypervisor-3.4-$ARCH (<< 3.4.3~rc3), xen-hypervisor-3.2-$ARCH, xen-hypervisor-3.0-$ARCH
 
 No. Kernels are co-installable.

Oh, crap, I forgot, yes. Maybe postinst messages then? Since the alternative
is usually an instant reboot loop, which will inevitably result in people
complaining.

-- 
 2. That which causes joy or happiness.



-- 
Archive: http://lists.debian.org/20100324114720.ga17...@orion.carnet.hr

Re: Xen dom0 2.6.32 stable branch

2010-03-17 Thread Josip Rodin

On Wed, Feb 24, 2010 at 01:05:53PM +0100, joy wrote:
 http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=refs/heads/xen/stable

It's great to see the new packages :) I didn't want to rain on the parade by
instantly filing bug reports, but I must point out a bit of a problem with
the .32 kernel that may have something to do with (the lack of) the NX bit:

http://lists.xensource.com/archives/html/xen-devel/2010-03/msg00243.html
http://lists.xensource.com/archives/html/xen-devel/2010-03/msg00658.html

I'm getting ready to start bisecting.

-- 
 2. That which causes joy or happiness.


-- 
Archive: http://lists.debian.org/20100318005234.ga28...@orion.carnet.hr

Bug#574243: please restore Sun XVR video drivers, and add the latest one [was Re: Debian Sparc 5.4 on Sun blade 2500]

2010-03-16 Thread Josip Rodin

Package: linux-2.6
Version: 2.6.32-9

On Tue, Mar 16, 2010 at 03:24:37PM -0700, David Miller wrote:
Josip, please make sure this gets fixed, please get my
sunxvr1000 driver added (attached) and then add:

CONFIG_FB_XVR500=y
CONFIG_FB_XVR2500=y
CONFIG_FB_XVR1000=y

to the config for sparc64.

Ok, further checking shows that lenny has XVR500 and XVR2500
enabled (doing a test install with a XVR-500 card right now)
but testing doesn't.

Indeed, they seem to have gone missing somehow from
debian/config/sparc/config where there's just:

# CONFIG_FB_LEO is not set
# CONFIG_FB_S1D13XXX is not set

whereas in the same source we have arch/sparc/configs/sparc64_defconfig
where there's:

# CONFIG_FB_LEO is not set
CONFIG_FB_XVR500=y
CONFIG_FB_XVR2500=y
# CONFIG_FB_S1D13XXX is not set
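
The comparison above can be sketched as a small script: parse both config fragments and list the options that the upstream defconfig enables but the Debian config does not. The function name is made up for illustration, and the two strings are just the excerpts quoted above, not the full files.

```python
def enabled_options(config_text):
    """Return the set of option names set to y or m in a kernel config fragment."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or "# CONFIG_FOO is not set"
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


# Excerpt of debian/config/sparc/config as quoted above.
debian_config = """\
# CONFIG_FB_LEO is not set
# CONFIG_FB_S1D13XXX is not set
"""

# Excerpt of arch/sparc/configs/sparc64_defconfig as quoted above.
upstream_defconfig = """\
# CONFIG_FB_LEO is not set
CONFIG_FB_XVR500=y
CONFIG_FB_XVR2500=y
# CONFIG_FB_S1D13XXX is not set
"""

missing = sorted(enabled_options(upstream_defconfig) - enabled_options(debian_config))
print(missing)
```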

I'm filing a bug report with this message, thanks for the exact hint.

As for the new driver that Dave mentioned as an attachment, it's
already in mainline at:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2d378b9179881b46a0faf11430efb421fe03ddd8
That should apply pretty easily to .32 stable.

I'm not sure offhand what the Debian policy is about tracking linux-stable
vs. adding new code, but either way this seems pretty uncontroversial -
it's a separate new driver which won't hurt any existing users, because
its OF match of SUNW,gfb simply does not overlap with anything else
in drivers/video/.

--
2. That which causes joy or happiness.

Bug#572442: sparc 2.6.29+ NMI watchdog deadlock on Sun Fire V240 etc

2010-03-04 Thread Josip Rodin

Package: linux-2.6
Severity: serious
Tags: upstream patch

Hi there,

Ever since kernel 2.6.29 came out, several classes of sparc machines have
been unable to upgrade, because they would get stuck while initializing
the new NMI watchdog code.

The process of trying to figure it out is mostly documented in this
long-running mailing list thread that spanned many months:
http://lists.debian.org/debian-sparc/2009/08/msg5.html
http://lists.debian.org/debian-sparc/2009/09/msg00018.html
http://lists.debian.org/debian-sparc/2009/10/msg00015.html
http://lists.debian.org/debian-sparc/2009/11/msg00034.html
http://lists.debian.org/debian-sparc/2009/12/msg0.html

Had this gone unattended, sparc release requalification might have been in
trouble, because the bug affects the Fire V240 sparc buildd machines as well
as Jurij Smakov's test machine, and that's a lot in our little universe :)

Fortunately David Miller came to the rescue and personally debugged the
problem on one of the buildds, and fixed the problem. His solution, that
we are currently running on schroeder.debian.org, is attached.

Please include the patch in the sparc kernel package so that we can test
it widely, preferably ASAP. TIA.

- Forwarded message from David Miller da...@davemloft.net -

Date: Wed, 03 Mar 2010 09:11:41 -0800 (PST)
Subject: Re: Sparc release requalification


Ok, I think I fixed it.

Attached are two versions of the fix, the first attachment is
for 2.6.33 and the second one is for any kernel 2.6.32 and
previous.

Give it a good test on any machine you've seen this problem on
and let me know how it goes.

Thanks.

From 8a4fd1e4922413cfdfa6c51a59efb720d904a5eb Mon Sep 17 00:00:00 2001
From: David S. Miller da...@davemloft.net
Date: Wed, 3 Mar 2010 09:06:03 -0800
Subject: [PATCH] sparc64: Make prom entry spinlock NMI safe.

If we do something like try to print to the OF console from an NMI
while we're already in OpenFirmware, we'll deadlock on the spinlock.

Use a raw spinlock and disable NMIs when we take it.

Signed-off-by: David S. Miller da...@davemloft.net
---
 arch/sparc/prom/p1275.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/prom/p1275.c b/arch/sparc/prom/p1275.c
index 4b7c937..2d8b70d 100644
--- a/arch/sparc/prom/p1275.c
+++ b/arch/sparc/prom/p1275.c
@@ -32,10 +32,9 @@ extern void prom_cif_interface(void);
 extern void prom_cif_callback(void);
 
 /*
- * This provides SMP safety on the p1275buf. prom_callback() drops this lock
- * to allow recursuve acquisition.
+ * This provides SMP safety on the p1275buf.
  */
-DEFINE_SPINLOCK(prom_entry_lock);
+DEFINE_RAW_SPINLOCK(prom_entry_lock);
 
 long p1275_cmd(const char *service, long fmt, ...)
 {
@@ -47,7 +46,9 @@ long p1275_cmd(const char *service, long fmt, ...)

p = p1275buf.prom_buffer;
 
-   spin_lock_irqsave(&prom_entry_lock, flags);
+   raw_local_save_flags(flags);
+   raw_local_irq_restore(PIL_NMI);
+   raw_spin_lock(&prom_entry_lock);
 
p1275buf.prom_args[0] = (unsigned long)p;   /* service */
strcpy (p, service);
@@ -139,7 +140,8 @@ long p1275_cmd(const char *service, long fmt, ...)
va_end(list);
x = p1275buf.prom_args [nargs + 3];
 
-   spin_unlock_irqrestore(&prom_entry_lock, flags);
+   raw_spin_unlock(&prom_entry_lock);
+   raw_local_irq_restore(flags);
 
return x;
 }
-- 
1.6.6.1


sparc64: Make prom entry spinlock NMI safe.

If we do something like try to print to the OF console from an NMI
while we're already in OpenFirmware, we'll deadlock on the spinlock.

Disable NMIs when we take it.

Signed-off-by: David S. Miller da...@davemloft.net

diff --git a/arch/sparc/prom/p1275.c b/arch/sparc/prom/p1275.c
index 4b7c937..815cab6 100644
--- a/arch/sparc/prom/p1275.c
+++ b/arch/sparc/prom/p1275.c
@@ -32,8 +32,7 @@ extern void prom_cif_interface(void);
 extern void prom_cif_callback(void);
 
 /*
- * This provides SMP safety on the p1275buf. prom_callback() drops this lock
- * to allow recursuve acquisition.
+ * This provides SMP safety on the p1275buf.
  */
 DEFINE_SPINLOCK(prom_entry_lock);
 
@@ -47,7 +46,9 @@ long p1275_cmd(const char *service, long fmt, ...)

p = p1275buf.prom_buffer;
 
-   spin_lock_irqsave(&prom_entry_lock, flags);
+   raw_local_save_flags(flags);
+   raw_local_irq_restore(PIL_NMI);
+   spin_lock(&prom_entry_lock);
 
p1275buf.prom_args[0] = (unsigned long)p;   /* service */
strcpy (p, service);
@@ -139,7 +140,8 @@ long p1275_cmd(const char *service, long fmt, ...)
va_end(list);
x = p1275buf.prom_args [nargs + 3];
 
-   spin_unlock_irqrestore(&prom_entry_lock, flags);
+   spin_unlock(&prom_entry_lock);
+   raw_local_irq_restore(flags);
 
return x;
 }


- End forwarded message -

-- 
 2. That which causes joy or happiness.



-- 

Bug#534978: clock drift in Xen domU with clocksource=xen

2010-03-04 Thread Josip Rodin

On Thu, Mar 04, 2010 at 05:21:31PM +0100, Markus Hochholdinger wrote:
  In my case this manifested itself when some PHP profiling via microtime()
  suddenly became useless, and it also caused occasional PostgreSQL errors
  with tables that had timestamp columns as keys, since it became possible
  for two independent transactions to come in at the exact same time.
 
 have you any service like ntp running on these boxes!? What will the app
 do if ntp corrects the time!?

NTP always corrects time in a very subtle manner (see its documentation).
I guess it's possible for it to screw with this, yet it never has.

-- 
 2. That which causes joy or happiness.



-- 
Archive: http://lists.debian.org/20100304162452.ga22...@orion.carnet.hr

Bug#534978: clock drift in Xen domU with clocksource=xen

2010-02-25 Thread Josip Rodin

Hi,

Markus Hochholdinger wrote:
 Here is my solution to this problem, lenny xen kernel:
 * dom0 with clocksource=jiffies and /proc/sys/xen/independent_wallclock=0
 * domU with clocksource=jiffies and /proc/sys/xen/independent_wallclock=0

Using jiffies as a clock source is not a solution, it's a workaround,
because its resolution (1/CONFIG_HZ seconds) is not good enough for reading
microseconds. That is, time read with microsecond precision will only be
non-decreasing: consecutive readouts can return the same value.
This will cause problems for any program that wants its time readouts to
be strictly increasing, as real-world time usually is :)

In other words you will get this (real example from a while ago):

% i=0; while :; do i=$((i+1)); if [ $i = 20 ]; then break; fi; date --rfc-3339=ns; done
2009-09-23 13:35:13.123400807+02:00
2009-09-23 13:35:13.127400857+02:00
2009-09-23 13:35:13.131400906+02:00
2009-09-23 13:35:13.135400956+02:00
2009-09-23 13:35:13.139401005+02:00
2009-09-23 13:35:13.143401055+02:00
2009-09-23 13:35:13.147401104+02:00
2009-09-23 13:35:13.151401154+02:00
2009-09-23 13:35:13.151401154+02:00
2009-09-23 13:35:13.155401203+02:00
2009-09-23 13:35:13.155401203+02:00
2009-09-23 13:35:13.159401253+02:00
2009-09-23 13:35:13.163401302+02:00
2009-09-23 13:35:13.167401352+02:00
2009-09-23 13:35:13.171401401+02:00
2009-09-23 13:35:13.171401401+02:00
2009-09-23 13:35:13.175401451+02:00
2009-09-23 13:35:13.179401500+02:00
2009-09-23 13:35:13.183401550+02:00
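
The repeats in that listing are easy to miss by eye, so here is a small
sketch (added for illustration, not part of the original test) that takes
the nanosecond fields from the readouts above and counts how often the clock
failed to advance. The variable names are made up for this example.

```python
# Nanosecond fields copied from the listing above; every readout fell
# within the same second (13:35:13), so the fraction alone is enough.
readouts = """
123400807 127400857 131400906 135400956 139401005 143401055 147401104
151401154 151401154 155401203 155401203 159401253 163401302 167401352
171401401 171401401 175401451 179401500 183401550
""".split()

ns = [int(r) for r in readouts]

# Difference between each readout and the one before it.
deltas = [later - earlier for earlier, later in zip(ns, ns[1:])]

# A delta of zero means two consecutive calls saw the very same time.
repeats = sum(1 for d in deltas if d == 0)

# The distinct non-zero steps show the granularity of the clock.
steps = sorted({d for d in deltas if d != 0})

print("readouts:", len(ns))
print("repeated readouts:", repeats)
print("non-zero steps (ns):", steps)
```

Every non-zero step is almost exactly 4 ms, which is what one would expect
from a jiffies clock on a kernel built with HZ=250.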

In my case this manifested itself when some PHP profiling via microtime()
suddenly became useless, and it also caused occasional PostgreSQL errors
with tables that had timestamp columns as keys, since it became possible
for two independent transactions to come in at the exact same time.

Having said that, if one doesn't switch to jiffies but wants to use live
migration, that ends up being hampered by the "Time went backwards" problem.

-- 
 2. That which causes joy or happiness.



Archive: http://lists.debian.org/20100225111854.ga18...@orion.carnet.hr

Bug#534978: clock drift in Xen domU with clocksource=xen

2010-02-25 Thread Josip Rodin

On Thu, Feb 25, 2010 at 01:33:05PM +0100, Bastian Blank wrote:
 On Thu, Feb 25, 2010 at 12:18:54PM +0100, Josip Rodin wrote:
  Using jiffies as a clock source is not a solution, it's a workaround,
  because its resolution (CONFIG_HZ^1) is not good enough for reading
  microseconds, that is, time with microseconds will become just monotonic.
  This will cause problems for any program that wants its time readouts to
  be strictly increasing, as real-world time usually is :)
 
 No. The time resolution is not defined and within one step it will
 always provide the same value.

What? :) The problem here is that a time readout function provides the same
value across *two* steps. A monotonic function is one which allows for that.
A strictly increasing function is one which does not. Most of the time,
just monotonic is okay, but not always.
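
To pin the two terms down, here is a minimal sketch of both properties as
predicates over a list of readouts. The function names and the sample values
are invented for this example; they are not taken from any library.

```python
def is_monotonic(values):
    """True if no readout is smaller than the one before it (ties allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_strictly_increasing(values):
    """True if every readout is larger than the one before it (no ties)."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical readouts in milliseconds: the clock stalls once at 104.
stalling_clock = [100, 104, 104, 108]
# Hypothetical readouts from a clock that always moves forward.
advancing_clock = [100, 101, 104, 108]

print(is_monotonic(stalling_clock), is_strictly_increasing(stalling_clock))
print(is_monotonic(advancing_clock), is_strictly_increasing(advancing_clock))
```

The stalling clock passes the first check and fails the second, which is
exactly the situation described above.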

   and it also caused occasional PostgreSQL errors
  with tables that had timestamp columns as keys, since it became possible
  for two independent transactions to come in at the exact same time.
 
 Äh, where is documented, that this supposed to work anyway?

The key column has a unique constraint and a default value of current
timestamp. Even if two perfectly concurrent writers come in to add a new
record, it's still logical to expect for them to be serialized to a minimal
extent, because the database itself is explicitly instructed to input all
values and maintain their uniqueness. The expectation that all updates take
at least one minimal unit of time is perhaps not theoretically valid, but
it's certainly like that in the real world (every action takes *some*
perceivable time).

-- 
 2. That which causes joy or happiness.



Archive: http://lists.debian.org/20100225132755.ga11...@orion.carnet.hr

Bug#534978: clock drift in Xen domU with clocksource=xen

2010-02-25 Thread Josip Rodin

On Thu, Feb 25, 2010 at 03:01:41PM +0100, Bastian Blank wrote:
   No. The time resolution is not defined and within one step it will
   always provide the same value.
  What? :) The problem here is that a time readout function provides the same
  value across *two* steps. A monotonic function is one which allows for that.
  A strictly increasing function is one which does not. Most of the time,
  just monotonic is okay, but not always.
 
 No, the time is only monotone, not strictly monotone. (With discreet
 values, it is not possible to make it strictly monotone.)

You mean discrete. It's only impossible to make it strictly monotone at a
resolution finer than the smallest unit of time (or one that converges
to zero).

But anyway, the problem isn't just the monotonicity, it's simply that e.g.
with HZ of 250, a jiffy takes 4ms, so if you need to do anything with
something that takes a comparable amount of time, you're shit outta luck.
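
The arithmetic can be sketched as follows. This is only a model of a clock
that advances once per tick, with helper names invented for the example;
HZ values other than 250 are listed as common kernel config choices, not
as anything measured here.

```python
def tick_us(hz):
    """Length of one jiffy in microseconds for a given HZ."""
    return 1_000_000 // hz


def quantize(time_us, hz):
    """What a jiffies-based clock would report at true time time_us."""
    step = tick_us(hz)
    return (time_us // step) * step


def measured_duration_us(start_us, duration_us, hz):
    """Duration of an operation as seen through the quantized clock."""
    return quantize(start_us + duration_us, hz) - quantize(start_us, hz)


for hz in (100, 250, 1000):
    print(f"HZ={hz}: one jiffy = {tick_us(hz)} us")

# A 3 ms operation timed with HZ=250 reads as either nothing or a full
# tick, depending only on where it starts relative to the tick boundary.
print(measured_duration_us(0, 3000, 250))
print(measured_duration_us(2000, 3000, 250))
```

In other words, anything that takes about as long as a jiffy cannot be
timed meaningfully with such a clock.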

    and it also caused occasional PostgreSQL errors
    with tables that had timestamp columns as keys, since it became possible
    for two independent transactions to come in at the exact same time.
   Äh, where is documented, that this supposed to work anyway?
  The key column has a unique constraint and a default value of current
  timestamp. Even if two perfectly concurrent writers come in to add a new
  record, it's still logical to expect for them to be serialized to a minimal
  extent, because the database itself is explicitly instructed to input all
  values and maintain their uniqueness. The expectation that all updates take
  at least one minimal unit of time is perhaps not theoretically valid, but
  it's certainly like that in the real world (every action takes *some*
  perceivable time).
 
 Wrong answer. Where is this documented as working in the postgresql
 documentation?

I have no idea. Why would I need exact documentation of this use case?
The unique and default key parameters, and the definition of the timestamp
data type are documented. Indeed, I just checked and the resolution of a
timestamp is explicitly documented as 1 microsecond, so if the underlying
system has a resolution of 4000 microseconds, that simply precludes it.
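
A toy model of that collision, as a sketch only: this is plain Python
standing in for a table with a unique timestamp key, not PostgreSQL, and the
class, function and arrival times are all invented for the example.

```python
class DuplicateKey(Exception):
    """Raised when a new row's timestamp key already exists."""


class TimestampKeyedTable:
    """Rows keyed by the current time as the system clock reports it."""

    def __init__(self, clock_resolution_us):
        self.clock_resolution_us = clock_resolution_us
        self.rows = {}

    def insert(self, true_time_us, payload):
        # The clock can only report multiples of its resolution.
        key = true_time_us - (true_time_us % self.clock_resolution_us)
        if key in self.rows:
            raise DuplicateKey(key)
        self.rows[key] = payload
        return key


def count_rejected(clock_resolution_us, arrival_times_us):
    """Insert one row per arrival time and count unique-key failures."""
    table = TimestampKeyedTable(clock_resolution_us)
    rejected = 0
    for n, arrival in enumerate(arrival_times_us):
        try:
            table.insert(arrival, f"row {n}")
        except DuplicateKey:
            rejected += 1
    return rejected


# Four hypothetical writers arriving 1.5 ms apart.
arrivals_us = [1_000_000, 1_001_500, 1_003_000, 1_004_500]

print("rejected with a 1 us clock:", count_rejected(1, arrivals_us))
print("rejected with a 4000 us clock:", count_rejected(4000, arrivals_us))
```

With a microsecond clock every writer gets its own key; with a 4000
microsecond clock, writers that land inside the same jiffy collide.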
If you're trying to argue that nobody should be using anything using
microseconds because they're not supported by clocksource=jiffies, well,
then we might as well cease this pointless discussion.

-- 
 2. That which causes joy or happiness.



Archive: http://lists.debian.org/20100225150156.ga...@orion.carnet.hr

Xen dom0 2.6.32 stable branch

2010-02-24 Thread Josip Rodin

Hi,

Just in case I'm the first to notice, we now have:
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=refs/heads/xen/stable

This is the new upstream default branch, with paravirt_ops dom0, and based
on 2.6.32-stable, so it's presumably suitable for inclusion as a new patch
that would restore our packages linux-image-2.6-xen-{686,amd64} which were
previously based on forward-ported .18 dom0 patches. Yay.

-- 
 2. That which causes joy or happiness.


Archive: http://lists.debian.org/20100224120553.ga2...@orion.carnet.hr

Re: [Users] New Kernel Patch

2010-01-16 Thread Josip Rodin

On Sat, Jan 16, 2010 at 12:17:19PM +0100, Suno Ano wrote:
 currently (January 2010) mainline is in development for the .33 release,
 .32 is stable and used by most Linux Distributions like for example
 Debian, Ubuntu, Suse, etc.
 
 From what it looks now Debian and Ubuntu are going into freeze for their
 next stable release in March 2010. Will there be an up-to-date OpenVZ
 kernel patch available by then? Debian is targeting to ship .32 with
 their next stable release called squeeze.
 
 In case OpenVZ will not be available on at least one of the major Linux
 distributions and its offsprings, no need to mention how horrid that
 would be ...

http://lists.debian.org/debian-devel-announce/2009/10/msg3.html said
OpenVZ will remain supported, but
and http://lists.debian.org/debian-release/2009/08/msg00233.html had
previously gone unanswered, and I don't see anything new at
http://packages.debian.org/linux-image-2.6-openvz-686

I'm thinking the most usable compromise would be if someone volunteered to
maintain the Debian packages of the actual kernel stable release 2.6.27 -
where the meaning of "stable" more closely corresponds to the Debian
stable release concept. For off-the-shelf usage, mainline releases can
satisfy the same definition, but for corner cases it's doubtful because
they tend to move too fast for people to track them reliably.

I have to mention that Xen has a similar problem - there are XCI 2.6.27
patches which seem to be maintained, whereas it's doubtful anyone really
wants to continue forward-porting the old branch to .32. Xen upstream do
have an advanced paravirt_ops dom0 branch (it's much further along than LXC
vs. OpenVZ, judging by the LXC description in this thread), but it would
still be a regression compared to the old branch for some people who use
some of those still-unimplemented features, so it's not a drop-in
replacement yet.

I'm Cc:ing Adrian Bunk - given that you initiated the marking of .27 as
"the real stable", and Greg KH is still maintaining .27 upstream, I can't
help but wonder if you might be willing to maintain those packages? :)

Also Cc:'ing the debian-kernel mailing list.

-- 
 2. That which causes joy or happiness.



Bug#559035: fyi .27 stable has it

2010-01-11 Thread Josip Rodin


stable/linux-2.6.27.y has this patch since 2009-10-12:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commitdiff;h=2578cf95969936c372db29ee2bbc21c9b6a299aa
it's included in the release since v2.6.27.37.

-- 
 2. That which causes joy or happiness.




Bug#535331: ditto

2009-11-07 Thread Josip Rodin

On Fri, Oct 23, 2009 at 11:54:21AM +0200, Josip Rodin wrote:
 I've experienced the same problem. I've got two lenny machines which have
[...]

FWIW Here's the last upgrade output pasted exactly as it just happened:

% sudo apt-get upgrade
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following packages will be upgraded:
  linux-image-2.6.26-2-686
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 20,2MB of archives.
After this operation, 0B of additional disk space will be used.
Do you want to continue [Y/n]? 
Get:1 http://security.debian.org lenny/updates/main linux-image-2.6.26-2-686 2.6.26-19lenny2 [20,2MB]
Fetched 20,2MB in 21s (929kB/s)
Preconfiguring packages ...
(Reading database ... 24703 files and directories currently installed.)
Preparing to replace linux-image-2.6.26-2-686 2.6.26-19lenny1 (using .../linux-image-2.6.26-2-686_2.6.26-19lenny2_i386.deb) ...
The directory /lib/modules/2.6.26-2-686 still exists. Continuing as directed.
Done.
Unpacking replacement linux-image-2.6.26-2-686 ...
Setting up linux-image-2.6.26-2-686 (2.6.26-19lenny2) ...
Running depmod.
Running mkinitramfs-kpkg.
Not updating initrd symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny1 was configured last, according to dpkg)
Not updating image symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny1 was configured last, according to dpkg)
% 

% sudo perl -pi -e 's,^(my \$loader\s+=\s+),$1lilo,' /var/lib/dpkg/info/linux-image-2.6.26-2-686.postinst

% sudo dpkg-reconfigure linux-image-2.6.26-2-686
Running depmod.
Running mkinitramfs-kpkg.
Not updating initrd symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny2 was configured last, according to dpkg)
Not updating image symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny2 was configured last, according to dpkg)
You already have a LILO configuration in /etc/lilo.conf
Running boot loader as requested
Testing lilo.conf ... 
Testing successful.
Installing the partition boot sector... 
Running /sbin/lilo  ... 
Installation successful.
% 

 [...] after upgrading linux-image-2.6.26-2-686, I just get [...]

FWIW it also happens on the amd64 version, exactly the same:

% sudo apt-get upgrade
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following packages will be upgraded:
  linux-image-2.6.26-2-amd64
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 20,9MB of archives.
After this operation, 4096B of additional disk space will be used.
Do you want to continue [Y/n]? 
Get:1 http://security.debian.org lenny/updates/main linux-image-2.6.26-2-amd64 2.6.26-19lenny2 [20,9MB]
Fetched 20,9MB in 20s (1013kB/s)   
Preconfiguring packages ...
(Reading database ... 20849 files and directories currently installed.)
Preparing to replace linux-image-2.6.26-2-amd64 2.6.26-19lenny1 (using .../linux-image-2.6.26-2-amd64_2.6.26-19lenny2_amd64.deb) ...
The directory /lib/modules/2.6.26-2-amd64 still exists. Continuing as directed.
Done.
Unpacking replacement linux-image-2.6.26-2-amd64 ...
Setting up linux-image-2.6.26-2-amd64 (2.6.26-19lenny2) ...
Running depmod.
Running mkinitramfs-kpkg.
Not updating initrd symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny1 was configured last, according to dpkg)
Not updating image symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny1 was configured last, according to dpkg)
% 

% sudo perl -pi -e 's,^(my \$loader\s+=\s+),$1lilo,' /var/lib/dpkg/info/linux-image-2.6.26-2-amd64.postinst

% sudo dpkg-reconfigure linux-image-2.6.26-2-amd64
Running depmod.
Running mkinitramfs-kpkg.
Not updating initrd symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny2 was configured last, according to dpkg)
Not updating image symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny2 was configured last, according to dpkg)
You already have a LILO configuration in /etc/lilo.conf
Running boot loader as requested
Testing lilo.conf ... 
Testing successful.
Installing the partition boot sector... 
Running /sbin/lilo  ... 
Installation successful.
% 

-- 
 2. That which causes joy or happiness.




Bug#535331: ditto

2009-10-23 Thread Josip Rodin

Hi,

I've experienced the same problem. I've got two lenny machines which have
GPT partition tables and Linux root on LVM, and they can't use anything but
LILO (there are some novelty hacks for GRUB but I haven't been able to test
them yet because this is in production). I have kernel-img.conf set up
right, but after upgrading linux-image-2.6.26-2-686, I just get the "not
updating symbolic links" messages and no triggers or boot loaders are run.
If left unattended, this typically renders these two systems unbootable.

It really looks like a failure to define the $loader variable in the
predefined variables section. If I just put 'lilo' in there and re-run
dpkg-reconfigure linux-image-2.6.26-2-686, the output changes to:

Running depmod.
Running mkinitramfs-kpkg.
Not updating initrd symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny1 was configured last, according to dpkg)
Not updating image symbolic links since we are being updated/reinstalled 
(2.6.26-19lenny1 was configured last, according to dpkg)
You already have a LILO configuration in /etc/lilo.conf
Running boot loader as requested
Testing lilo.conf ... 
Testing successful.
Installing the partition boot sector... 
Running /sbin/lilo  ... 
Installation successful.

This is what would be expected. The run_lilo() function goes out of its way
to determine whether the existence of /etc/lilo.conf is sufficient reason to
run lilo, so there doesn't appear to be any reason to completely omit it.

-- 
 2. That which causes joy or happiness.




Bug#535331: ditto

2009-10-23 Thread Josip Rodin

On Fri, Oct 23, 2009 at 01:15:52PM +0200, maximilian attems wrote:
 On Fri, Oct 23, 2009 at 11:54:21AM +0200, Josip Rodin wrote:
  Hi,
  
  I've experienced the same problem. I've got two lenny machines which have
  GPT paritition tables and Linux root on LVM, and they can't use anything but
  LILO (there are some novelty hacks for GRUB but I haven't been able to test
  them yet because this is in production). I have kernel-img.conf set up
  right, but after upgrading linux-image-2.6.26-2-686, I just get the not
  updating symbolic links messages and no triggers or boot loaders are run.
  If left unattended, this typically renders these two systems unbootable.
  
  It really looks like a failure to define the $loader variable in the
  predefined variables section. If I just put 'lilo' in there and re-run
  dpkg-reconfigure linux-image-2.6.26-2-686, the output changes to:
  
  Running depmod.
  Running mkinitramfs-kpkg.
  Not updating initrd symbolic links since we are being updated/reinstalled 
  (2.6.26-19lenny1 was configured last, according to dpkg)
  Not updating image symbolic links since we are being updated/reinstalled 
  (2.6.26-19lenny1 was configured last, according to dpkg)
  You already have a LILO configuration in /etc/lilo.conf
  Running boot loader as requested
  Testing lilo.conf ... 
  Testing successful.
  Installing the partition boot sector... 
  Running /sbin/lilo  ... 
  Installation successful.
  
  This is what would be expected. The run_lilo() function goes out of its way
  to determine whether the existence of /etc/lilo.conf is sufficient reason to
  run lilo, so there doesn't appear to be any reason to completely omit it.
 
 from the affected box:
 cat /etc/kernel-img.conf

I fail to see the benefit, but here goes - on both it's identical:

% cat /etc/kernel-img.conf
# Kernel image management overrides
# See kernel-img.conf(5) for details
do_symlinks = yes
relative_links = yes
do_bootloader = yes
do_bootfloppy = no
do_initrd = yes
link_in_boot = yes
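
For anyone who wants to check the same settings on another box, here is a
rough sketch that reads a file in this "key = value" form and reports
whether the boot loader is supposed to be run. It is a simplified parser
written for this example, not the one the kernel postinst uses, and it
assumes every value is a plain yes/no as in the file above.

```python
def parse_kernel_img_conf(text):
    """Parse 'key = value' lines into a dict of booleans.

    Simplified on purpose: comments start with '#', and any value other
    than 'yes' (case-insensitive) is treated as False.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key.lower()] = value.lower() == "yes"
    return settings


# The file contents quoted above, verbatim.
sample = """\
# Kernel image management overrides
# See kernel-img.conf(5) for details
do_symlinks = yes
relative_links = yes
do_bootloader = yes
do_bootfloppy = no
do_initrd = yes
link_in_boot = yes
"""

settings = parse_kernel_img_conf(sample)
print("boot loader requested:", settings.get("do_bootloader", False))
```

On the affected machines this says the boot loader was requested, which
matches the point that the configuration itself was not the problem.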

-- 
 2. That which causes joy or happiness.




Bug#525958: Sparc release requalification

2009-09-16 Thread Josip Rodin

On Tue, Sep 15, 2009 at 08:38:22PM +0100, Jurij Smakov wrote:
    If the PROM console driver still has some utility, maybe the boot option is
    the way to go... does it? Does anyone still manufacture new machines with
    new and strange console types that we don't support? :)
   
   The PROM console driver has no relevance today at all.
   
   It should simply never be used.
  
  OK, cool, please remove it, and also please propagate that to the stable
  branches so we don't miss anyone who's not on the bleeding edge.
 
 Please feel free to follow up on http://bugs.debian.org/525958 which I've
 filed in April to have the CONFIG_PROM_CONSOLE removed.

Yeah, definitely, Cc:ed. Dave has in the meantime killed it completely in
his stable tree:

http://git.kernel.org/?p=linux/kernel/git/davem/sparc-2.6.git;a=commit;h=09d3f3f0e02c8a900d076c302c5c02227f33572d

There's another commit that takes it out of the defconfigs completely - it
was already unset there for a while now. So it's pretty much official now
(well, Linus still has to take it in, but that should be a formality).

-- 
 2. That which causes joy or happiness.




Bug#514418: [FIX]: ultra45 boot failing...

2009-02-09 Thread Josip Rodin

On Sun, Feb 08, 2009 at 04:58:08PM -0800, David Miller wrote:
 So you're saying that X working is more important than machines
 actually booting at all?  These priorities are wrong.

When N (where N > 0) users complain about dead X, and 0 users complain
about not being able to boot, the priorities are fairly clear...

-- 
 2. That which causes joy or happiness.




Bug#514418: [FIX]: ultra45 boot failing...

2009-02-09 Thread Josip Rodin

On Mon, Feb 09, 2009 at 12:28:18AM -0800, David Miller wrote:
   So you're saying that X working is more important than machines
   actually booting at all?  These priorities are wrong.
  
  When N (where N > 0) users complain about dead X, and 0 users complain
  about not being able to boot, the priorities are fairly clear...
 
 If their machine won't even boot into the installer they are unlikely
 to even make a report.

Nobody (before you that is) reported an installer failure on these machines,
so the situation is still clear from our point of view - it's certainly not
perfect or even good, but the system as a whole depends on user input.

 Furthermore the point remains that you put a change into the kernel
 that I would never have advocated had you presented the bug to
 me.  I would have suggested ways to fix the X server and even
 worked on the patch.
 
 But since nobody contacted me about this, a broken change went into
 the kernel instead.

That is true, someone should have contacted you (sparclinux list at least)
about that.

But then, it would have been completely your prerogative to respond to that
simply by saying "DTRT and go upgrade X, patching old X is a waste of my
time", and I guess nobody wanted to risk hearing that answer? :)

-- 
 2. That which causes joy or happiness.




Bug#500358: Fix found

2008-11-11 Thread Josip Rodin

On Sun, Nov 09, 2008 at 10:30:38PM +0100, Bastian Blank wrote:
  SPARC is a traditionally brand architecture. This case affects Ultra 5
  and may be several other workstation. So if something doesn't function
  on one box it doesn't function on a whole generation of boxes. I think
  this is quite a big part of all Debian SPARC users.
 
 This still does not qualify for the severity grave:
 
 | makes the package in question unusable or mostly so,
 
 It still runs. And the Sparc machines I use don't show such problems.

Seeing how you're interested in this kind of bureaucratic nitpicking :p
I should point out that "grave" is actually too light a severity for this
bug, and "critical" should be used instead - the kernel upgrade broke the X
server, so it's a critical bug by definition ("makes unrelated software on
the system break"). The part that fit the grave severity was "makes the
package in question mostly unusable", which is what any typical X user would
say in this situation.

In any case this is a pointless exercise, let's just make sure the bug is
fixed and go forward. I hope I'll be verifying the submitted patch on my
Ultra 5 soon.

-- 
 2. That which causes joy or happiness.




Bug#498536: Acknowledgement (.26 breaks firmware loading for qla2xxx on sparc)

2008-09-13 Thread Josip Rodin


This was fixed a couple of days ago by Andrew Vasquez with some
help from Dave Miller. The patch has been sent to the linux-scsi
list/maintainers for inclusion in -next as well as in -stable
(Message-Id: [EMAIL PROTECTED]).

-- 
 2. That which causes joy or happiness.




Bug#498536: .26 breaks firmware loading for qla2xxx on sparc

2008-09-10 Thread Josip Rodin

Package: linux-image-2.6.26-1-sparc64-smp
Version: 2.6.26-4
Severity: grave
Tags: upstream

Hi,

qla2xxx's firmware loading thingy got hosed between .25 and .26,
I've already reported something along these lines to upstream, but I just
verified it with our kernel image so I'm filing it here too.

[egin: Loading essential drivers... ...
49.444722] SCSI subsystem initialized
[   49.524563] QLogic Fibre Channel HBA Driver: 8.02.01-k4-debug
[   49.602779] qla2xxx 0001:00:04.0: Found an ISP2200, irq 19, iobase 0x07fd
[010
[   49.714344] qla2xxx 0001:00:04.0: Configuring PCI space...
[   49.789041] scsi(0): No matching ROM signature.
[   49.851178] qla2xxx 0001:00:04.0: Configure NVRAM parameters...
[   50.025526] qla2xxx 0001:00:04.0: Inconsistent NVRAM detected: checksum=0x0 i
L=4qla2xxx 0001:00:04.0: Falling back to functioning (yet invalid -- WWPN) def
Bults.
[   50.230463] scsi(0): NVRAM configuration failed!
[   50.293695] qla2xxx 0001:00:04.0: Verifying loaded RISC code...
[   50.373898] scsi(0):  Load RISC code 
0   50.449209] firmware: requesting ql2200_fw.bin
[  110.508456] scsi(0): Failed to load firmware image (ql2200_fw.bin).
[  110.593119] qla2xxx 0001:00:04.0: Firmware image unavailable.
[  110.671025] qla2xxx 0001:00:04.0: Firmware images can be retrieved from: ftp:
[/ftp.qlogic.com/outgoing/linux/firmware/.
d  110.819988] scsi(0): Setup chip  FAILED .
a  110.884362] qla2xxx 0001:00:04.0: Failed to initialize adapter
[  110.963425] scsi(0): Failed to initialize adapter - Adapter flags 10.
[one.

-- 
 2. That which causes joy or happiness.




Bug#439072: closed by maximilian attems [EMAIL PROTECTED] (Re: snd-intel8x0 line-in not working in later 2.6.x kernels)

2008-05-20 Thread Josip Rodin

On Tue, May 20, 2008 at 04:54:06PM +0000, Debian Bug Tracking System wrote:
 closing as according to upstream not a driver issue.
 marked as resolved thus closing.

I would appreciate it if you could first answer the question which I asked
in August last year (which was the reason I didn't close the bug myself):

 Can someone tell me the proper steps to test the default value to see if
 the bug was really just some local mishap?

It's a bit annoying to see people ignore you for nine months,
and then close the bug report rather than answering it.

-- 
 2. That which causes joy or happiness.




Bug#439072: closed by maximilian attems [EMAIL PROTECTED] (Re: snd-intel8x0 line-in not working in later 2.6.x kernels)

2008-05-20 Thread Josip Rodin

reopen 439072
thanks

On Tue, May 20, 2008 at 07:08:59PM +0200, Josip Rodin wrote:
 On Tue, May 20, 2008 at 04:54:06PM +0000, Debian Bug Tracking System wrote:
  closing as according to upstream not a driver issue.
  marked as resolved thus closing.
 
 I would appreciate it if you could first answer the question which I asked
 in August last year (which was the reason I didn't close the bug myself):
 
  Can someone tell me the proper steps to test the default value to see if
  the bug was really just some local mishap?
 
 It's a bit annoying to see people ignore you for nine months,
 and then close the bug report rather than answering it.

I almost forgot - that's disregarding the simple fact that even if
the driver made no mistake with this, this 2ch vs 4ch issue is not documented
anywhere, neither in kernel (linux-2.6/Documentation/sound/alsa/?) nor in
userland (alsamixer(1), arecord(1), ...?), so if anyone else sees this
behaviour, whether due to one's own change or a possible bad default,
there is no recourse.

Closing bugs just like that may well be acceptable elsewhere, but I thought
we had a bit higher standards in Debian. :|

-- 
 2. That which causes joy or happiness.




Bug#439072: closed by maximilian attems [EMAIL PROTECTED] (Re: snd-intel8x0 line-in not working in later 2.6.x kernels)

2008-05-20 Thread Josip Rodin

On Tue, May 20, 2008 at 08:01:05PM +0200, maximilian attems wrote:
    closing as according to upstream not a driver issue.
    marked as resolved thus closing.
   
   I would appreciate it if you could first answer the question which I asked
   in August last year (which was the reason I didn't close the bug myself):
   
    Can someone tell me the proper steps to test the default value to see if
    the bug was really just some local mishap?
   
   It's a bit annoying to see people ignore you for nine months,
   and then close the bug report rather than answering it.
  
  I almost forgot - that's disregarding the simple fact that even if
  the driver made no mistake with this, this 2ch vs 4ch issue is not documented
  anywhere, neither in kernel (linux-2.6/Documentation/sound/alsa/?) nor in
  userland (alsamixer(1), arecord(1), ...?), so if anyone else sees this
  behaviour, whether due to one's own change or a possible bad default,
  there is no recourse.
  
  Closing bugs just like that may well be acceptable elsewhere, but I thought
  we had a bit higher standards in Debian. :|
 
 great.
 
 firstly this is *not* a kernel bug.

Well, assertions are nice, but useless. It's a bug in the kernel module
if it produces completely unexpected results after an option is changed -
if such results are expected, then that expectation needs to be documented
*somewhere*.

 secondly this i not the way you'll get alsa userland support

You're free to clone and/or reassign the bug to the right alsa userland
packages.

 thirdly your message did mention that you even *tried* current 2.6.25

The Documentation/sound/alsa/ that I checked was with 2.6.25.4, so, yes,
the problem applies to the current version.

 candidate to closure unless you provide info that is is a kernel bug.

Again with the closure... You really should read the manual regarding
how to deal with bug reports. For example,
http://www.debian.org/doc/developers-reference/ch-pkgs.en.html#s-bug-handling

-- 
 2. That which causes joy or happiness.




Re: sparc and testing migration

2007-12-11 Thread Josip Rodin

On Fri, Nov 30, 2007 at 01:18:29AM +0100, Josip Rodin wrote:
 On Sun, Sep 02, 2007 at 07:30:24PM +0200, Andreas Barth wrote:
  as you all are probably aware, we currently have some quite bad issues
  with the sparc buildds for some times, especially
  http://bugs.debian.org/433187 unkillable processes on the buildds.
  
  I hope that the mentioned RC bug can be fixed soon - if so, we're happy
  to stop ignoring issues on sparc (or rather: we probably will find us in
  the situation that such cases cease to exist).
 
 I haven't seen a reply to this mail, so JFTR - that bug was fixed.
 
 Though I'm still not sure if packages built on the new buildd are getting
 uploaded, I haven't been able to contact James about it.

I happened to run into him last night on IRC - he confirmed that things
are all right on lebrun, whereas we still need to upgrade the kernel on
spontini before that machine can also build unstable.

-- 
 2. That which causes joy or happiness.



Re: sparc and testing migration

2007-11-29 Thread Josip Rodin

On Sun, Sep 02, 2007 at 07:30:24PM +0200, Andreas Barth wrote:
 as you all are probably aware, we currently have some quite bad issues
 with the sparc buildds for some times, especially
 http://bugs.debian.org/433187 unkillable processes on the buildds.
 
 I hope that the mentioned RC bug can be fixed soon - if so, we're happy
 to stop ignoring issues on sparc (or rather: we probably will find us in
 the situation that such cases cease to exist).

I haven't seen a reply to this mail, so JFTR - that bug was fixed.

Though I'm still not sure if packages built on the new buildd are getting
uploaded, I haven't been able to contact James about it.

-- 
 2. That which causes joy or happiness.



Bug#433187: Installing Debian on Ultrasparc III machines

2007-09-27 Thread Josip Rodin

On Mon, Sep 24, 2007 at 09:57:23PM +0200, Josip Rodin wrote:
   kernel which works well - at least on our US III machine. We've applied
   179c85ea53bef807621f335767e41e23f86f01df to make sure that the system
   doesn't create unkillable processes anymore if you use the libc6 from
   _lenny_.
   BTW, lebrun.d.o, also an USIII, running 2.6.23-rc6 plus the aforementioned
   patch still created unkillable dpkg-query processes.
   
   BTW, I got around to changing the input/output-device on lebrun today,
   so I'll be able to get register dumps in case it goes dead.
  
  I'm not sure if those problems are related :) The register dumps would
  be needed if the kernel fails to initialize the CPU
 
 Fabio told me that break+p output might be useful in this case too,
 I'm just repeating :)

In any case, I let it run some more, and then when it went more or less
dead, I tried to press the said key combination on the keyboard - to no
avail. Break+p would be Ctrl+Pause+p? Didn't work, and Alt+Pause+p also
didn't work. What was even more annoying was the fact that Stop+a got me
the PROM shell, but I wasn't able to type anything in it (including 'go'),
so that effectively freezes the machine.

Please tell me if I did something stunningly stupid...

-- 
 2. That which causes joy or happiness.




Bug#433187: Installing Debian on Ultrasparc III machines

2007-09-24 Thread Josip Rodin

On Wed, Sep 19, 2007 at 12:10:26AM +0200, Josip Rodin wrote:
  kernel which works well - at least on our US III machine. We've applied
  179c85ea53bef807621f335767e41e23f86f01df to make sure that the system
  doesn't create unkillable processes anymore if you use the libc6 from
  _lenny_.
 
 BTW, lebrun.d.o, also an USIII, running 2.6.23-rc6 plus the aforementioned
 patch still created unkillable dpkg-query processes.

BTW, I got around to changing the input/output-device on lebrun today,
so I'll be able to get register dumps in case it goes dead.

Right now its buildd has been building for over 3.5 hours, and it has
created this one process:

buildd   20263  100  0.5 1941872 11472 ?   RN   18:25 192:03 dpkg-query --search libc.so.6

But it keeps moving! The load was around 5 when I checked this.

I went to run 'less buildd.log', but that process just stopped responding
instantly. I tried stracing it, and that strace stopped responding :)
The load went up to 7 after that.

-- 
 2. That which causes joy or happiness.




Bug#433187: Installing Debian on Ultrasparc III machines

2007-09-24 Thread Josip Rodin

On Mon, Sep 24, 2007 at 09:53:44PM +0200, Bernd Zeimetz wrote:
  kernel which works well - at least on our US III machine. We've applied
  179c85ea53bef807621f335767e41e23f86f01df to make sure that the system
  doesn't create unkillable processes anymore if you use the libc6 from
  _lenny_.
  BTW, lebrun.d.o, also an USIII, running 2.6.23-rc6 plus the aforementioned
  patch still created unkillable dpkg-query processes.
  
  BTW, I got around to changing the input/output-device on lebrun today,
  so I'll be able to get register dumps in case it goes dead.
 
 I'm not sure if those problems are related :) The register dumps would
 be needed if the kernel fails to initialize the CPU

Fabio told me that break+p output might be useful in this case too,
I'm just repeating :)

-- 
 2. That which causes joy or happiness.




Bug#433187: Installing Debian on Ultrasparc III machines

2007-09-18 Thread Josip Rodin

On Tue, Sep 18, 2007 at 11:40:19PM +0200, Bernd Zeimetz wrote:
 kernel which works well - at least on our US III machine. We've applied
 179c85ea53bef807621f335767e41e23f86f01df to make sure that the system
 doesn't create unkillable processes anymore if you use the libc6 from
 _lenny_. Please read the following
 
 WARNING: using the libc6 from Etch on an US III machine results in a
 freeze (badly, as in not reacting to stop+a/break) of your system if you
 do things like using aptitude after becoming root by the use of su/sudo.
 This is not that bad with the libc6 from testing, but this is definitely
 NOT fixed.

BTW, lebrun.d.o, also an USIII, running 2.6.23-rc6 plus the aforementioned
patch still created unkillable dpkg-query processes.

[EMAIL PROTECTED]:/home/buildd/build/chroot-unstable/lib]# ./libc-2.6.1.so  

GNU C Library stable release version 2.6.1, by Roland McGrath et al.
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.2.1 (Debian 4.2.1-5).
Compiled on a Linux 2.6.17-rc1 system on 2007-09-04.
Available extensions:
crypt add-on version 2.1 by Michael Glad and others
GNU Libidn by Simon Josefsson
Native POSIX Threads Library by Ulrich Drepper et al
BIND-8.2.3-T5B
software FPU emulation by Richard Henderson, Jakub Jelinek and others
For bug reporting instructions, please see:
http://www.gnu.org/software/libc/bugs.html.

Outside the chroot it's etch.

-- 
 2. That which causes joy or happiness.




Bug#433187: linux-2.6 - [sparc64-smp] produces unkillable processes

2007-09-06 Thread Josip Rodin

On Tue, Sep 04, 2007 at 10:46:47AM +0200, Fabio Massimo Di Nitto wrote:
  We (David Miller and I) are already working on this. We finally got some
  info dump from a debugging patched kernel and I expect we will have a fix
  within the next 3/4 weeks.
  From our first look it seems like a futex bug and some users have
  reported that the latest 2.6.23-rcX do not show this behavior. Clearly we
  also want to figure out a fix for .22.
 
  Fabio
  I should mention that lebrun.d.o is still dead since the last attempt
  (ssh unresponsive since 2007-08-30 ~21:25), when it was running a 2.6.22.5
  with one davem patch applied (one line in kernel/futex_compat.c). If you
  need something more done to lebrun, such as kicking it back to life,
  just tell me...
 
  If you have console access, it would be good to get a processor dump by
  break + p.
  
  I can easily reproduce that with my Sparc Ultra60 here, which is running
  as buildd for experimental. The machine has the very same problem. I
  will try that tonight.
 
 It is also worth checking with .23-rcX since it has been reported to be
 working.

lebrun.d.o exploded again after a few hours of building under 2.6.23-rc5. :(

-- 
 2. That which causes joy or happiness.



Bug#433187: linux-2.6 - [sparc64-smp] produces unkillable processes

2007-09-04 Thread Josip Rodin

On Tue, Sep 04, 2007 at 06:16:05AM +0200, Fabio Massimo Di Nitto wrote:
  #433187 is the bug that has killed the buildds on lebrun and spontini,
  right?
  
  AIUC, yes. at least i can reproduce that on my buildd.
 
 Hi guys,
 
 We (David Miller and I) are already working on this. We finally got some
 info dump from a debugging patched kernel and I expect we will have a fix
 within the next 3/4 weeks.
 From our first look it seems like a futex bug and some users have
 reported that the latest 2.6.23-rcX do not show this behavior. Clearly we
 also want to figure out a fix for .22.
 
 Fabio

I should mention that lebrun.d.o is still dead since the last attempt
(ssh unresponsive since 2007-08-30 ~21:25), when it was running a 2.6.22.5
with one davem patch applied (one line in kernel/futex_compat.c). If you
need something more done to lebrun, such as kicking it back to life,
just tell me...

-- 
 2. That which causes joy or happiness.



Bug#433187: linux-2.6 - [sparc64-smp] produces unkillable processes

2007-09-04 Thread Josip Rodin

On Tue, Sep 04, 2007 at 10:17:33AM +0200, Fabio Massimo Di Nitto wrote:
  I should mention that lebrun.d.o is still dead since the last attempt
  (ssh unresponsive since 2007-08-30 ~21:25), when it was running a 2.6.22.5
  with one davem patch applied (one line in kernel/futex_compat.c). If you
  need something more done to lebrun, such as kicking it back to life,
  just tell me...
 
 If you have console access, it would be good to get a processor dump by
 break + p.

Unfortunately it's impossible to get that remotely on lebrun, I could never
get its RSC to work right. Running consolehistory only gets me as far as
the first getty prints the issue file, and then zilch. :(

 I personally have no say on how buildds should be managed.. i guess it's
 up to you guys if you want to kick it back.

That question was for James :)

 If you do so just make sure you can grab CPU register dumps from console.

At this point I'm not sure if it would be possible to see them even if
I plugged the monitor into the VGA port, because I redirected output to
rsc-console. sigh

-- 
 2. That which causes joy or happiness.



Bug#433187: linux-2.6 - [sparc64-smp] produces unkillable processes

2007-09-03 Thread Josip Rodin

Hi,

#433187 is the bug that has killed the buildds on lebrun and spontini, right?

-- 
 2. That which causes joy or happiness.



Bug#439072: snd-intel8x0 line-in not working in later 2.6.x kernels

2007-08-28 Thread Josip Rodin

On Wed, Aug 22, 2007 at 04:13:19AM +0200, Josip Rodin wrote:
 On Wed, Aug 22, 2007 at 02:52:59AM +0200, Josip Rodin wrote:
  I'm reporting this bug that I have been seeing for a while and which is
  a regression from a few months/years ago - the line-in input simply doesn't
  work right. arecord(1) just doesn't record anything with it, it doesn't show
  any errors, it records silence. The recording from the same external source
  works just fine with the microphone input.
 
  This works just fine with the same hardware in MS Windows (ugh), and it also
  worked fine with an earlier 2.6.x kernel version that I had been using when
  I was still running sarge on this machine. But, I removed it in the meantime
  so I don't know which one it was. I think it was 2.6.16 or so, but I'm not
  sure. It's definitely not working with >= 2.6.18 (I still have one of those
  and it behaves the same as 2.6.21).
 
 Oh, I might have been too vague there. I can't exactly reproduce the old
 state because I changed much of my other hardware in this machine since and
 my old kernel images won't boot; and then I also noticed that there was once
 an old OSS driver and then I switched to alsa, but I don't have backups of
 my ancient /etc/modules file so I don't know when that was.
 
 I re-selected the old-style i810_audio driver in 2.6.21 and compiled it,
 unloaded the ALSA driver, loaded the old driver, and voila, everything went
 back to normal, I can hear the TV sound just fine. So, it might be that this
 is an OSS->ALSA regression that slipped through the cracks?

After an upstream developer helped debug it, it seems that it works if I use
alsamixer to change the mixer from the 4 channel mode to the 2 channel mode.
The 2ch mode is supposed to be the default; I have no idea why my alsamixer
setup used the 4ch mode, at least I certainly don't remember ever fiddling
with that setting (because I have no idea what it really means :).

Can someone tell me the proper steps to test the default value to see if
the bug was really just some local mishap? Move away
/var/lib/alsa/asound.state and reload the modules?
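
The test asked about above could be sketched roughly as follows. This assumes
the saved mixer state lives in /var/lib/alsa/asound.state as mentioned, and
that nothing is holding the sound device open so the driver can be unloaded.
The function names are made up for illustration, and the state file may get
rewritten at shutdown, so treat this as a sketch rather than the proper
procedure.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Move the saved mixer state out of the way, keeping a backup copy.
# The default path is the one mentioned above; pass another path to override.
stash_alsa_state() {
    state=${1:-/var/lib/alsa/asound.state}
    if [ -e "$state" ]; then
        mv "$state" "$state.saved" || return 1
    fi
}

# Unload and reload a sound driver module.
# Needs root, and fails if something still has the device open.
reload_sound_driver() {
    modprobe -r "$1" && modprobe "$1"
}

# Intended use (as root):
#   stash_alsa_state
#   reload_sound_driver snd-intel8x0
```

After the reload, alsamixer should show whatever the driver picks when no
saved state is present; moving the .saved file back restores the old setup.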

-- 
 2. That which causes joy or happiness.



Bug#439072: snd-intel8x0 line-in not working in later 2.6.x kernels

2007-08-23 Thread Josip Rodin

On Thu, Aug 23, 2007 at 07:04:13PM +0200, Adrian Bunk wrote:
  While browsing kernel options, I noticed:
  
  Please contact Adrian Bunk [EMAIL PROTECTED] if you had to
  say Y here because your hardware is not properly supported
  by ALSA.
  
  ...in the description of CONFIG_OSS_OBSOLETE, so, here I am :)
  
  This is Debian bug #439072 (and #384933 also looks suspiciously similar,
  if I might add).
 
 Please do the following:
 - check at the ALSA bug tracking system [1] whether your problem was 
   already reported
 - if there isn't already a bug for it, open a new bug
 - in any case, tell me the bug number so that I can track this issue

I found several bug reports that sounded familiar, but none of them
described this exact issue. This is the new ticket I just filed:

https://bugtrack.alsa-project.org/alsa-bug/view.php?id=3335

-- 
 2. That which causes joy or happiness.



Bug#439072: snd-intel8x0 line-in not working in later 2.6.x kernels

2007-08-22 Thread Josip Rodin

Hi Adrian,

On Wed, Aug 22, 2007 at 04:13:19AM +0200, Josip Rodin wrote:
  I'm reporting this bug that I have been seeing for a while and which is
  a regression from a few months/years ago - the line-in input simply doesn't
  work right. arecord(1) just doesn't record anything with it, it doesn't show
  any errors, it records silence. The recording from the same external source
  works just fine with the microphone input.
 
 I re-selected the old-style i810_audio driver in 2.6.21 and compiled it,
 unloaded the ALSA driver, loaded the old driver, and voila, everything went
 back to normal, I can hear the TV sound just fine. So, it might be that this
 is an OSS->ALSA regression that slipped through the cracks?

While browsing kernel options, I noticed:

Please contact Adrian Bunk [EMAIL PROTECTED] if you had to
say Y here because your hardware is not properly supported
by ALSA.

...in the description of CONFIG_OSS_OBSOLETE, so, here I am :)

This is Debian bug #439072 (and #384933 also looks suspiciously similar,
if I might add).

-- 
 2. That which causes joy or happiness.



Bug#439072: snd-intel8x0 line-in not working in later 2.6.x kernels

2007-08-21 Thread Josip Rodin

Package: linux-2.6

Hi,

I'm reporting this bug that I have been seeing for a while and which is
a regression from a few months/years ago - the line-in input simply doesn't
work right. arecord(1) just doesn't record anything with it, it doesn't show
any errors, it records silence. The recording from the same external source
works just fine with the microphone input.
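
A quick check for the "records silence" symptom might look like the sketch
below. It assumes the capture source has already been switched to line-in in
the mixer, that the default capture device is the affected card, and that
"silence" means all-zero samples, which may not hold if the input picks up
analog noise. The function name and the file path are made up.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if FILE contains nothing but zero bytes,
# i.e. digital silence in a raw signed PCM capture.
# Note that an empty file also counts as silent here.
is_digital_silence() {
    [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]
}

# Intended use: capture five seconds of raw 16-bit audio, then inspect it.
#   arecord -d 5 -f S16_LE -r 44100 -t raw /tmp/linein.raw
#   is_digital_silence /tmp/linein.raw && echo "only silence recorded"
```

Running the same capture with the source plugged into the mic input gives a
file to compare against.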

I and apparently many other people noticed the same bug indirectly, by
noticing that the sound input from the TV card isn't working. (Many TV cards
ship with a line-out and a short cable which connects the TV sound output
with the regular sound card input.)

I also tried changing the external sources - I tried to record using a
microphone via the mic in, and that still works just fine, whereas the
microphone via the line in also fails to record anything.
And finally, the TV card is heard just fine when plugged into the mic in.

This works just fine with the same hardware in MS Windows (ugh), and it also
worked fine with an earlier 2.6.x kernel version that I had been using when
I was still running sarge on this machine. But, I removed it in the meantime
so I don't know which one it was. I think it was 2.6.16 or so, but I'm not
sure. It's definitely not working with >= 2.6.18 (I still have one of those
and it behaves the same as 2.6.21).

I have previously recorded my problem in the Ubuntu bug report #29789[1],
which includes a bit of a convoluted description at the beginning which also
talks about an unrelated module, but it appears that several other people
are seeing the same problem as I am.

The Debian bug report #384933 mentions that snd-emu10k1 also has
a dysfunctional line-in.

The bug report #374545 mentions something vaguely similar, but even though
they have an explicit error message there, the suspicious bit was that the
regression happened from .15 to .16, which should be around the same time
as this.

I grepped the kernel patch files for .13, .14, .15, .16, .17, and they
rarely ever mention snd-intel8x0. The quirks option was added in .14,
but offhand that doesn't seem to be related to the line-in.
(The snd-emu10k1 driver is even more rarely mentioned in the said patches.)

Any help would be appreciated. TIA.

[1] https://bugs.launchpad.net/ubuntu/+bug/29789

-- 
 2. That which causes joy or happiness.



Bug#439072: snd-intel8x0 line-in not working in later 2.6.x kernels

2007-08-21 Thread Josip Rodin

On Wed, Aug 22, 2007 at 02:52:59AM +0200, Josip Rodin wrote:
 I'm reporting this bug that I have been seeing for a while and which is
 a regression from a few months/years ago - the line-in input simply doesn't
 work right. arecord(1) just doesn't record anything with it, it doesn't show
 any errors, it records silence. The recording from the same external source
 works just fine with the microphone input.

 This works just fine with the same hardware in MS Windows (ugh), and it also
 worked fine with an earlier 2.6.x kernel version that I had been using when
 I was still running sarge on this machine. But, I removed it in the meantime
 so I don't know which one it was. I think it was 2.6.16 or so, but I'm not
 sure. It's definitely not working with >= 2.6.18 (I still have one of those
 and it behaves the same as 2.6.21).

Oh, I might have been too vague there. I can't exactly reproduce the old
state because I changed much of my other hardware in this machine since and
my old kernel images won't boot; and then I also noticed that there was once
an old OSS driver and then I switched to alsa, but I don't have backups of
my ancient /etc/modules file so I don't know when that was.

I re-selected the old-style i810_audio driver in 2.6.21 and compiled it,
unloaded the ALSA driver, loaded the old driver, and voila, everything went
back to normal, I can hear the TV sound just fine. So, it might be that this
is an OSS->ALSA regression that slipped through the cracks?

-- 
 2. That which causes joy or happiness.



Bug#409244: bug

2007-02-07 Thread Josip Rodin

Hi,

Thinking about this, there's actually another thing bothering me - you can't
use firmware-qlogic without the 'MODULES=most' option, IOW, there seems to
be no way to build a minimal initrd with just the qla2xxx module and the
firmware file.

-- 
 2. That which causes joy or happiness.



Bug#409244: initramfs doesn't include the udev firmware helper

2007-02-01 Thread Josip Rodin

Package: initramfs-tools
Version: 0.85e

Hi,

The other day I tried to boot a Sun Fire 280R that works nicely with kernel
2.4.30; however, it didn't work, because the qla2xxx driver can't find
the firmware image, and it fails to load properly, meaning I can't access
the hard disks in the machine, and... flop. :)

I worked around this by including the proprietary file downloaded from the
URL provided in kernel config help, ql2200_fw.bin, using a hook file.

It was necessary to load qla2xxx *after* init-premount, because it needs
udev to be loaded in order to access the firmware helper.

But, for udev to actually use the firmware helper, it sounds like this is
also needed:
copy_exec /lib/udev/firmware.agent /lib/udev/

After that, the hook file that installs into /lib/firmware also needed:
mkdir -p ${DESTDIR}/lib/firmware

Those two problems are more general; another cp/copy_exec for the actual
file is probably a matter for another package, with license issues sorting
and all that.
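
The hook steps described above can be sketched as one function. This is only
an illustration, not the hook that was actually used: the function name is
made up, and the copy_exec defined here is a simplified stand-in for when the
real one from /usr/share/initramfs-tools/hook-functions has not been sourced
(the stand-in does not pull in library dependencies). DESTDIR is the root of
the image being built, as set by mkinitramfs.

```shell
#!/bin/sh
# A real hook would first source the helper library:
#   . /usr/share/initramfs-tools/hook-functions
# If copy_exec is not defined yet, fall back to a simplified stand-in
# (assumption: a plain copy is good enough for a dry run).
command -v copy_exec >/dev/null 2>&1 || copy_exec() {
    mkdir -p "${DESTDIR}$2" && cp "$1" "${DESTDIR}$2"
}

# Hypothetical helper: add_firmware_support AGENT FIRMWARE
# Copies the udev firmware helper and one firmware file into the image.
add_firmware_support() {
    agent=$1
    firmware=$2
    # The helper that udev runs when a driver requests firmware.
    copy_exec "$agent" /lib/udev/ || return 1
    # The target directory does not exist in the image by default.
    mkdir -p "${DESTDIR}/lib/firmware" || return 1
    cp "$firmware" "${DESTDIR}/lib/firmware/"
}

# Intended use inside a hook:
#   add_firmware_support /lib/udev/firmware.agent /lib/firmware/ql2200_fw.bin
```

The source location of ql2200_fw.bin in the usage line is an assumption; the
file only has to end up in /lib/firmware inside the image.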

Cf. http://lists.debian.org/debian-sparc/2007/01/msg00074.html and
http://lists.debian.org/debian-sparc/2007/02/msg2.html

-- 
 2. That which causes joy or happiness.



Bug#409244: initramfs doesn't include the udev firmware helper

2007-02-01 Thread Josip Rodin

On Thu, Feb 01, 2007 at 01:41:36PM +0100, maximilian attems wrote:
  The other day I tried to boot a Sun Fire 280R that works nicely with kernel
  2.4.30; however, it didn't work, because the qla2xxx driver can't find
  the firmware image, and it fails to load properly, meaning I can't access
  the hard disks in the machine, and... flop. :)
 
 did you try to use firmware-qlogic?
 afaik it has the necessary hooks.

Oh, nice, thanks. It does all the same. Someone should have told me about
it; I should file a bug on that package for having a completely useless
description:

Description: Binary firmware for QLOGIC
 This package contains the binary firmware for QLOGIC.

apt-cache search qla2xxx returns nothing, and that's really easily fixable.

But, the existence of that package doesn't really help the installation
system, or any other driver which uses udev firmware.agent. It would be good
if at least the firmware helper was added by default - it's trivially small
overhead, and it helps reduce confusion for people trying to roll in their
own firmware images and whatnot.

-- 
 2. That which causes joy or happiness.



Bug#409244: initramfs doesn't include the udev firmware helper

2007-02-01 Thread Josip Rodin

clone 409244 -1
retitle -1 [sparc] Sun Fire 280R with disks on qla2xxx not bootable by default any more due to lack of firmware-qlogic
reassign -1 debian-installer
clone 409244 -2
retitle -2 firmware-qlogic description is inadequate
reassign -2 firmware-qlogic
severity 409244 wishlist
merge 355881 409244
thanks

On Thu, Feb 01, 2007 at 09:33:10PM +0100, maximilian attems wrote:
 On Thu, Feb 01, 2007 at 04:33:09PM +0100, Josip Rodin wrote:
  On Thu, Feb 01, 2007 at 01:41:36PM +0100, maximilian attems wrote:
    The other day I tried to boot a Sun Fire 280R that works nicely with kernel
    2.4.30; however, it didn't work, because the qla2xxx driver can't find
    the firmware image, and it fails to load properly, meaning I can't access
    the hard disks in the machine, and... flop. :)
   
   did you try to use firmware-qlogic?
   afaik it has the necessary hooks.
  
  Oh, nice, thanks. It does all the same. Someone should have told me about
  it; I should file a bug on that package for having a completely useless
  description:
  
  Description: Binary firmware for QLOGIC
   This package contains the binary firmware for QLOGIC.
  
  apt-cache search qla2xxx returns nothing, and that's really easily fixable.
 
 you want me to reassign this as wishlist against it?

Hm, I've performed the bug surgery above :)

  But, the existence of that package doesn't really help the installation
  system, or any other driver which uses udev firmware.agent. It would be good
  if at least the firmware helper was added by default - it's trivially small
  overhead, and it helps reduce confusion for people trying to roll in their
  own firmware images and whatnot.
 
 well module-init-tools modinfo will get info about modules
 needing firmware, then the helper gets added. that is a
 post-etch item, there is an open bug report against initramfs-tools
 tracking that.

Oh, I see, #355881. Merged.

-- 
 2. That which causes joy or happiness.


