from:"Anton Ivanov"

Bug#989571: linux-image-5.10.0-0.bpo.3-amd64: Incorrect large USB disk sizing leading to data corruption

2021-06-07 Thread Anton Ivanov


Close please.

The 17G came from trying to blank the drive, which for some reason 
disconnected in the process, so the data was written to a regular file 
named sda in /dev. The loop mount and everything after it followed from 
that left-over /dev/sda file. Thanks for pointing me in the right direction 
and apologies.
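
A quick way to tell a real device node from a stray regular file of the same name is to look at the file type bits. The sketch below is illustrative only; the helper name describe_node is made up for this example and is not part of any tool mentioned in this thread.

```python
import os
import stat


def describe_node(path):
    """Classify a path as a block device, a regular file, or something else."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISREG(mode):
        # A regular file under /dev is what a write to a vanished device can leave behind.
        return "regular file"
    return "other"


# Hypothetical usage: a healthy /dev/sda, if present, should be a block device.
print(describe_node("/dev/sda"))
```

If this reports "regular file" for a name under /dev, anything written to that path went into the file, not onto a disk.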


I am going to continue investigating why I got the data corruption in 
the first place, before I tried to blank it, but it looks like it may 
have been a hardware issue with the original USB-to-ATA bridge.


--
Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#989571: linux-image-5.10.0-0.bpo.3-amd64: Incorrect large USB disk sizing leading to data corruption

2021-06-07 Thread Anton Ivanov

Package: src:linux
Version: 5.10.13-1~bpo10+1
Severity: critical
Justification: causes serious data loss

Dear Maintainer,

Large USB drives (for example, a Seagate 4TB Backup) which work perfectly fine 
with 4.19 are detected with the wrong size. The 4TB USB drive is identified 
as a 17GB device and, for some unfathomable reason, mounted as loop. The 
result is severe data corruption making all 4TB of data on the drive 
unrecoverable.

Tested with the original USB bridge coming with the drive and after attaching 
the SATA drive inside to an alternative USB bridge. Same result in both cases.

-- Package-specific info:
** Version:
Linux version 5.10.0-0.bpo.3-amd64 (debian-kernel@lists.debian.org) (gcc-8 
(Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP Debian 
5.10.13-1~bpo10+1 (2021-02-11)

** Command line:
BOOT_IMAGE=diskless/amd64/vmlinuz-5.10.0-0.bpo.3-amd64 
initrd=diskless/amd64/initrd.img-5.10.0-0.bpo.3-amd64 root=/dev/nfs ip=dhcp 
nfsroot=192.168.3.3:/exports/boot/madding mitigations=off rw  --

** Tainted: S (4)
 * SMP kernel oops on an officially SMP incapable processor

** Kernel log:
[754632.929276] nfs: server 192.168.3.3 OK
[754635.600887] rpc_check_timeout: 443 callbacks suppressed
[754635.600889] nfs: server 192.168.3.3 not responding, still trying
[754635.612996] nfs: server 192.168.3.3 not responding, still trying
[754635.625266] nfs: server 192.168.3.3 not responding, still trying
[754635.625462] nfs: server 192.168.3.3 not responding, still trying
[754635.637374] nfs: server 192.168.3.3 not responding, still trying
[754635.649472] nfs: server 192.168.3.3 not responding, still trying
[754635.661739] nfs: server 192.168.3.3 not responding, still trying
[754635.661922] nfs: server 192.168.3.3 not responding, still trying
[754635.673850] nfs: server 192.168.3.3 not responding, still trying
[754635.686131] nfs: server 192.168.3.3 not responding, still trying
[791938.374623] lxc-bridge0: port 3(tap-opsft2-0) entered blocking state
[791938.374628] lxc-bridge0: port 3(tap-opsft2-0) entered forwarding state
[791938.374654] lxc-bridge0: port 4(tap-opsft3-0) entered blocking state
[791938.374655] lxc-bridge0: port 4(tap-opsft3-0) entered forwarding state
[791938.375075] lxc-bridge0: port 2(tap-opsft1-0) entered blocking state
[791938.375078] lxc-bridge0: port 2(tap-opsft1-0) entered forwarding state
[791938.388241] k8-bridge0: port 2(tap-opsft1-1) entered blocking state
[791938.388243] k8-bridge0: port 2(tap-opsft1-1) entered forwarding state
[791938.388402] k8-bridge0: port 4(tap-opsft3-1) entered blocking state
[791938.388405] k8-bridge0: port 4(tap-opsft3-1) entered forwarding state
[791938.388481] k8-bridge0: port 3(tap-opsft2-1) entered blocking state
[791938.388484] k8-bridge0: port 3(tap-opsft2-1) entered forwarding state
[801076.265404] usb 4-2.4: new SuperSpeed Gen 1 USB device number 5 using 
xhci_hcd
[801076.289933] usb 4-2.4: New USB device found, idVendor=174c, idProduct=55aa, 
bcdDevice= 1.00
[801076.289937] usb 4-2.4: New USB device strings: Mfr=2, Product=3, 
SerialNumber=1
[801076.289939] usb 4-2.4: Product: ASM105x
[801076.289940] usb 4-2.4: Manufacturer: ASMT
[801076.289942] usb 4-2.4: SerialNumber: 
[801076.291139] scsi host10: uas
[801076.291557] scsi 10:0:0:0: Direct-Access ASMT 2115 0
PQ: 0 ANSI: 6
[801076.292065] sd 10:0:0:0: Attached scsi generic sg0 type 0
[801076.292232] sd 10:0:0:0: [sda] Spinning up disk...
[801077.321342] ..ready
[801082.447597] sd 10:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 
TB/3.64 TiB)
[801082.447600] sd 10:0:0:0: [sda] 4096-byte physical blocks
[801082.447673] sd 10:0:0:0: [sda] Write Protect is off
[801082.447674] sd 10:0:0:0: [sda] Mode Sense: 43 00 00 00
[801082.447832] sd 10:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[801082.448032] sd 10:0:0:0: [sda] Optimal transfer size 33553920 bytes not a 
multiple of physical block size (4096 bytes)
[801082.494646] sd 10:0:0:0: [sda] Attached SCSI disk
[801150.687429] loop: module loaded
[801150.815997] EXT4-fs (loop0): mounted filesystem with ordered data mode. 
Opts: (null)
[803002.579925] blk_update_request: I/O error, dev loop0, sector 0 op 
0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803002.579960] blk_update_request: I/O error, dev loop0, sector 0 op 
0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803017.725341] EXT4-fs (loop0): mounted filesystem with ordered data mode. 
Opts: (null)
[803081.125594] blk_update_request: I/O error, dev loop0, sector 0 op 
0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803081.125635] blk_update_request: I/O error, dev loop0, sector 0 op 
0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803085.522063] EXT4-fs (loop0): mounted filesystem with ordered data mode. 
Opts: (null)
[803239.336895] blk_update_request: I/O error, dev loop0, sector 0 op 
0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803239.336950] blk_update_request: I/O
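
The size the kernel logged for sda can be checked by hand from the block count in the "logical blocks" line above. This is only a worked calculation. The function name capacity is an assumption made for this sketch.

```python
def capacity(blocks, block_size=512):
    """Return (bytes, decimal TB, binary TiB) for a block count."""
    size = blocks * block_size
    return size, size / 10**12, size / 2**40


# Block count taken from the "sd 10:0:0:0: [sda]" line in the kernel log.
size, tb, tib = capacity(7814037168)
print(f"{size} bytes = {tb:.2f} TB = {tib:.2f} TiB")
```

The result matches the 4.00 TB/3.64 TiB printed by the sd driver, so the log itself shows the disk being sized as 4TB, not 17GB.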

Bug#940821: NFS Caching broken in 4.19.37

2021-02-26 Thread Anton Ivanov


On 26/02/2021 15:03, Timo Rothenpieler wrote:
I think I can reproduce this, or something that at least looks very 
similar to this, on 5.10. Namely on 5.10.17 (On both Client and Server).


I think this is a different issue - see below.



We are running slurm, and since a while now (coincides with updating 
from 5.4 to 5.10, but a whole bunch of other stuff was updated at the 
same time, so it took me a while to correlate this) the logs it writes 
have been truncated, but only while they're being observed on the 
client, using tail -f or something like that.


Looks like this then:

On Server:

store01 /srv/export/home/users/timo/TestRun # ls -l slurm-41101.out
-rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
store01 /srv/export/home/users/timo/TestRun # wc -l slurm-41101.out
61 slurm-41101.out


On Client:

timo@login01 ~/TestRun $ ls -l slurm-41101.out
-rw-r--r-- 1 timo timo 1931 Feb 26 15:46 slurm-41101.out
timo@login01 ~/TestRun $ wc -l slurm-41101.out
24 slurm-41101.out


See https://gist.github.com/BtbN/b9eb4fc08ccc53bb20087bce0bf9f826 for 
the respective file-contents.


If I run the same test job, wait until it's done, and then look at its 
slurm.out file, it matches between NFS Client and Server.
If I tail -f the slurm.out on an NFS client, the file stops getting 
updated on the client, but keeps getting more logs written to it on 
the NFS server.


The slurm.out file is being written to by another NFS client, which is 
running on one of the compute nodes of the system. It's being read 
from a login node.


If these are two different clients, then what you see is possible on NFS 
with client side caching. If you have multiple clients reading/writing 
to the same files you usually need to tune the caching options and/or 
use locking. I suspect that if you leave it for a while (until the cache 
expires) it will sort itself out.


In my test-case it is just one client, it missed a file deletion and 
nothing short of an unmount and remount fixes that. I have waited for 30 
mins+. It does not seem to refresh or expire. I also see the opposite 
behavior - the bug shows up on 4.x up to at least 5.4. I do not see it 
on 5.10.


Brgds,







Timo


On 21.02.2021 16:53, Anton Ivanov wrote:

Client side. This seems to be an entirely client side issue.

A variety of kernels on the clients starting from 4.9 and up to 5.10 
using 4.19 servers. I have observed it on a 4.9 client versus 4.9 
server earlier.


4.9 fails, 4.19 fails, 5.2 fails, 5.4 fails, 5.10 works.

At present the server is at 4.19.67 in all tests.

Linux jain 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 
(2019-11-11) x86_64 GNU/Linux


I can set-up a couple of alternative servers during the week, but so 
far everything is pointing towards a client fs cache issue, not a 
server one.


Brgds,






--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

Bug#940821: NFS Caching broken in 4.19.37

2021-02-21 Thread Anton Ivanov


On 21/02/2021 14:37, Bruce Fields wrote:

On Sun, Feb 21, 2021 at 11:38:51AM +0000, Anton Ivanov wrote:

On 21/02/2021 09:13, Salvatore Bonaccorso wrote:

On Sat, Feb 20, 2021 at 08:16:26PM +0000, Chuck Lever wrote:

Confirming you are varying client-side kernels. Should the Linux
NFS client maintainers be Cc'd?

Ok, agreed. Let's add them as well. NFS client maintainers, any ideas
on how to tackle this?

This is not observed with Debian backports 5.10 package

uname -a
Linux madding 5.10.0-0.bpo.3-amd64 #1 SMP Debian 5.10.13-1~bpo10+1
(2021-02-11) x86_64 GNU/Linux

I'm still unclear: when you say you tested a certain kernel: are you
varying the client-side kernel version, or the server side, or both at
once?


Client side. This seems to be an entirely client side issue.

A variety of kernels on the clients starting from 4.9 and up to 5.10 
using 4.19 servers. I have observed it on a 4.9 client versus 4.9 server 
earlier.


4.9 fails, 4.19 fails, 5.2 fails, 5.4 fails, 5.10 works.

At present the server is at 4.19.67 in all tests.

Linux jain 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) 
x86_64 GNU/Linux


I can set-up a couple of alternative servers during the week, but so far 
everything is pointing towards a client fs cache issue, not a server one.


Brgds,


--b.



--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

Bug#940821: NFS Caching broken in 4.19.37

2021-02-21 Thread Anton Ivanov

On 21/02/2021 09:13, Salvatore Bonaccorso wrote:

Hi,

On Sat, Feb 20, 2021 at 08:16:26PM +0000, Chuck Lever wrote:

On Feb 20, 2021, at 3:13 PM, Anton Ivanov
wrote:

On 20/02/2021 20:04, Salvatore Bonaccorso wrote:

Hi,

On Mon, Jul 08, 2019 at 07:19:54PM +0100, Anton Ivanov wrote:

Hi list,

NFS caching appears broken in 4.19.37.

The more cores/threads the easier to reproduce. Tested with identical
results on Ryzen 1600 and 1600X.

1. Mount an openwrt build tree over NFS v4
2. Run make -j `cat /proc/cpuinfo | grep vendor | wc -l` ; make clean in a
loop
3. Result after 3-4 iterations:

State on the client

ls -laF
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 8
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../

State as seen on the server (mounted via nfs from localhost):

ls -laF
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
-rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h

Actual state on the filesystem:

ls -laF
/exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../
-rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h

So the client has quite clearly lost the plot. Telling it to drop caches and
re-reading the directory shows the file present.
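
The three listings above can be compared mechanically by diffing the directory contents as seen through the NFS mount against the exported path. The sketch below is only an illustration. The function name missing_on_client is made up for this example, and the caller supplies the two paths.

```python
import os


def missing_on_client(client_dir, server_dir):
    """Return names present in server_dir but absent from client_dir."""
    return sorted(set(os.listdir(server_dir)) - set(os.listdir(client_dir)))
```

Called with the NFS-mounted directory as client_dir and the export path as server_dir, a non-empty result would correspond to the state shown above, where ipcbuf.h exists on the server but is not visible to the client.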

It is possible to reproduce this using a linux kernel tree too, it just takes
many more iterations - 10+ at least.

Both client and server run 4.19.37 from Debian buster. This is filed as
debian bug 931500. I originally thought it to be autofs related, but IMHO it
is actually something fundamentally broken in nfs caching resulting in cache
corruption.

According to the reporter downstream in Debian, at
https://bugs.debian.org/940821#26 this still seems reproducible with
more recent kernels than the one initially reported. Is there anything Anton
can provide to try to track down the issue?

Anton, can you reproduce with current stable series?

100% reproducible with any kernel from 4.9 to 5.4, stable or backports. It may
exist in earlier versions, but I do not have a machine with anything before 4.9
to test at present.

Confirming you are varying client-side kernels. Should the Linux
NFS client maintainers be Cc'd?

Ok, agreed. Let's add them as well. NFS client maintainers, any ideas
on how to tackle this?

This is not observed with Debian backports 5.10 package

uname -a
Linux madding 5.10.0-0.bpo.3-amd64 #1 SMP Debian 5.10.13-1~bpo10+1
(2021-02-11) x86_64 GNU/Linux

I left the testcase running for ~ 4 hours on a 6core/12thread Ryzen. It
should have blown up 10 times by now.

So one of the commits between 5.4 and 5.10.13 fixed it.

If nobody can think of a particular commit which fixes it, I can try
bisecting it during the week.

Reproducing it takes anywhere from 1-2 make clean && make cycles to one
afternoon, depending on the number of machine cores. The more cores/threads,
the faster it triggers.

I tried playing with protocol minor versions, caching options, etc - it is
still reproducible for any nfs4 settings as long as there is client side
caching of metadata.

Regards,
Salvatore

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

--
Chuck Lever

Regards,
Salvatore

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

Bug#940821: NFS Caching broken in 4.19.37

2021-02-20 Thread Anton Ivanov

On 20/02/2021 20:04, Salvatore Bonaccorso wrote:

Hi,

On Mon, Jul 08, 2019 at 07:19:54PM +0100, Anton Ivanov wrote:

Hi list,

NFS caching appears broken in 4.19.37.

The more cores/threads the easier to reproduce. Tested with identical
results on Ryzen 1600 and 1600X.

1. Mount an openwrt build tree over NFS v4
2. Run make -j `cat /proc/cpuinfo | grep vendor | wc -l` ; make clean in a
loop
3. Result after 3-4 iterations:

State on the client

ls -laF
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 8
drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../

State as seen on the server (mounted via nfs from localhost):

Actual state on the filesystem:

So the client has quite clearly lost the plot. Telling it to drop caches and
re-reading the directory shows the file present.

It is possible to reproduce this using a linux kernel tree too, it just takes
many more iterations - 10+ at least.

Anton, can you reproduce with current stable series?

100% reproducible with any kernel from 4.9 to 5.4, stable or backports.
It may exist in earlier versions, but I do not have a machine with
anything before 4.9 to test at present.

Reproducing it takes anywhere from 1-2 make clean && make cycles to one
afternoon, depending on the number of machine cores. The more cores/threads,
the faster it triggers.

I tried playing with protocol minor versions, caching options, etc - it
is still reproducible for any nfs4 settings as long as there is client
side caching of metadata.

Regards,
Salvatore

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/

Bug#940821: closed by Bastian Blank (No response by submitter)

2021-02-20 Thread Anton Ivanov


On 20/02/2021 10:33, Debian Bug Tracking System wrote:

This is an automatic notification regarding your Bug report
which was filed against the src:linux package:

#940821: linux-image-5.2.0-2-amd64: file cache corruption with nfs4

It has been closed by Bastian Blank.

Their explanation is attached below along with your original report.
If this explanation is unsatisfactory and you have not received a
better one in a separate message then please contact Bastian Blank by
replying to this email.



I missed the question. Probably hit the spam bucket for some reason.

I am able to reproduce it with more recent versions as well.

The most recent one I have around is 5.4.0-0.bpo.2-amd64

Still reproducible 100% - just tested it.

It is trivial to reproduce if anyone actually bothers to do so. Just 
grab a big enough tree where make runs truly in parallel - openwrt is 
best, but even the Linux kernel does the job.


Mount it via nfs4 from another server (it will work even locally, but 
takes longer to reproduce - may take a whole afternoon)


Run while make -j 12 clean && make -j 12 ; do true ; done

Leave it to run. On 6 cores/12 threads it takes 2-3 builds of openwrt or 
~ 5-8 linux kernel builds to blow up. More cores - faster. Fewer cores - 
slower.
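
The same loop can be driven from a script that also reports which iteration failed. This is a minimal sketch under stated assumptions: run_until_failure is a hypothetical helper and the iteration cap is arbitrary. Neither comes from the original report.

```python
import os
import subprocess


def run_until_failure(commands, max_iterations):
    """Run the commands in order, repeatedly; return the failing iteration or None."""
    for iteration in range(1, max_iterations + 1):
        for argv in commands:
            if subprocess.run(argv).returncode != 0:
                return iteration
    return None


# One job per logical CPU, mirroring the -j 12 used on a 6 core/12 thread machine.
jobs = str(os.cpu_count() or 1)
build_cycle = [["make", "-j", jobs, "clean"], ["make", "-j", jobs]]
# run_until_failure(build_cycle, max_iterations=50)  # 50 is an arbitrary cap
```

Run from the top of the NFS-mounted tree, it stops at the first failing make, like the shell loop does.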


I sent it to the mailing list too, but nobody could be bothered to even 
ask any questions.


--
Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#945213: Info received (Bug#945213: linux-image-5.2.0-3-amd64: OOM handling broken if hugepages are enabled)

2019-11-24 Thread Anton Ivanov


[0.00] Linux version 5.2.0-3-amd64 (debian-kernel@lists.debian.org) 
(gcc version 8.3.0 (Debian 8.3.0-22)) #1 SMP Debian 5.2.17-1 (2019-09-26)
[0.00] Command line: BOOT_IMAGE=diskless/amd64/vmlinuz-5.2.0-3-amd64 
initrd=diskless/amd64/initrd.img-5.2.0-3-amd64 root=/dev/nfs ip=dhcp 
nfsroot=192.168.3.3:/exports/boot/buster-bess mitigations=off rw  --
[0.00] random: get_random_u32 called from bsp_init_amd+0x20b/0x2b0 with 
crng_init=0
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, 
using 'standard' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009e7ff] usable
[0.00] BIOS-e820: [mem 0x0009e800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x9dc43fff] usable
[0.00] BIOS-e820: [mem 0x9dc44000-0x9ddc] reserved
[0.00] BIOS-e820: [mem 0x9ddd-0x9ddd] ACPI data
[0.00] BIOS-e820: [mem 0x9dde-0x9e13bfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x9e13c000-0x9e694fff] reserved
[0.00] BIOS-e820: [mem 0x9e695000-0x9e695fff] usable
[0.00] BIOS-e820: [mem 0x9e696000-0x9e89bfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x9e89c000-0x9ecb1fff] usable
[0.00] BIOS-e820: [mem 0x9ecb2000-0x9eff3fff] reserved
[0.00] BIOS-e820: [mem 0x9eff4000-0x9eff] usable
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfed0-0xfed00fff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfed8] reserved
[0.00] BIOS-e820: [mem 0xff00-0x] reserved
[0.00] BIOS-e820: [mem 0x00011000-0x00015eff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.7 present.
[0.00] DMI: System manufacturer System Product Name/F2A55, BIOS 5301 
10/10/2012
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 3501.783 MHz processor
[0.003478] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.003479] e820: remove [mem 0x000a-0x000f] usable
[0.003485] last_pfn = 0x15f000 max_arch_pfn = 0x4
[0.003490] MTRR default type: uncachable
[0.003490] MTRR fixed ranges enabled:
[0.003491]   0-9 write-back
[0.003492]   A-B write-through
[0.003493]   C-D2FFF write-protect
[0.003494]   D3000-E7FFF uncachable
[0.003494]   E8000-F write-protect
[0.003495] MTRR variable ranges enabled:
[0.003496]   0 base  mask 8000 write-back
[0.003497]   1 base 8000 mask E000 write-back
[0.003498]   2 base 9F00 mask FF00 uncachable
[0.003498]   3 disabled
[0.003499]   4 disabled
[0.003499]   5 disabled
[0.003500]   6 disabled
[0.003500]   7 disabled
[0.003501] TOM2: 00015f00 aka 5616M
[0.003713] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
[0.003882] e820: update [mem 0x9f00-0x] usable ==> reserved
[0.003887] last_pfn = 0x9f000 max_arch_pfn = 0x4
[0.007940] found SMP MP-table at [mem 0x000fd870-0x000fd87f]
[0.030016] Using GB pages for direct mapping
[0.030018] BRK [0x133801000, 0x133801fff] PGTABLE
[0.030020] BRK [0x133802000, 0x133802fff] PGTABLE
[0.030021] BRK [0x133803000, 0x133803fff] PGTABLE
[0.030074] BRK [0x133804000, 0x133804fff] PGTABLE
[0.030076] BRK [0x133805000, 0x133805fff] PGTABLE
[0.030380] BRK [0x133806000, 0x133806fff] PGTABLE
[0.030449] BRK [0x133807000, 0x133807fff] PGTABLE
[0.030551] BRK [0x133808000, 0x133808fff] PGTABLE
[0.030642] BRK [0x133809000, 0x133809fff] PGTABLE
[0.030767] BRK [0x13380a000, 0x13380afff] PGTABLE
[0.030857] BRK [0x13380b000, 0x13380bfff] PGTABLE
[0.030919] BRK [0x13380c000, 0x13380cfff] PGTABLE
[0.031040] RAMDISK: [mem 0x7e75-0x7fff]
[0.031046] ACPI: Early table checksum verification disabled
[0.039448] ACPI: RSDP 0x000F0490 24 (v02 ALASKA)
[0.039451] ACPI: XSDT 0x9DDD8078 64 (v01 ALASKA A M I
01072009 AMI  00010013)
[0.039457] ACPI: FACP 0x9DDDE868 00010C (v05 ALASKA A M I
01072009 AMI  00010013)
[0.039461] ACPI BIOS

Bug#945213: linux-image-5.2.0-3-amd64: OOM handling broken if hugepages are enabled

2019-11-22 Thread Anton Ivanov


On 22/11/2019 19:32, Ben Hutchings wrote:

Control: reassign -1 src:linux 5.2.17-1
Control: tag -1 moreinfo

On Thu, 2019-11-21 at 08:58 +0000, Anton Ivanov wrote:

Package: linux-image-5.2.0-3-amd64
Version: 5.2.17+1
Severity: important

Dear Maintainer,

OOM handling appears to be broken in 5.2.17-1 if hugepages are enabled.

Test system: AMD A4-5300, 40G RAM, no swap, booted disklessly.

Without hugepages enabled I can compile dpdk without any issues. With huge
pages enabled it will reproducibly OOM when trying to link one of the
libraries. There is 20G+ of free RAM at that point according to free, with the
rest being mostly used as buffers.

It is sufficient to just enable huge pages to trigger this (2G out of 40G),
they are not allocated or used by anything.

What do you mean by "if hugepages are enabled"?  hugetlbfs and THP are
enabled by default.


$ tail -2 sysctl.conf

vm.nr_hugepages=1024

If you do not have that, the compile completes fine. If you have that, the 
compile blows up when linking one of the dpdk libraries. At that point 
the machine has ~ 20G free RAM.
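
The "2G out of 40G" figure follows from the page count if the huge page size is 2048 kB. That size is an assumption here: it is the usual x86_64 default but is not stated in the report. The sketch below computes the reservation from /proc/meminfo-style text; hugepage_reservation_kib is a name made up for this example.

```python
def hugepage_reservation_kib(meminfo_text):
    """Return KiB reserved for huge pages, from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.split()
    total = int(fields["HugePages_Total"][0])
    page_kib = int(fields["Hugepagesize"][0])
    return total * page_kib


# Assumed sample: 1024 pages (from the sysctl.conf line) of 2048 kB each.
sample = "HugePages_Total:    1024\nHugepagesize:       2048 kB\n"
print(hugepage_reservation_kib(sample) // (1024 * 1024), "GiB")
```

On a live system the same function can be fed the contents of /proc/meminfo.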

You need to provide a log of the OOM messages.


Ack. I will re-run the tests tomorrow and update the bug with detailed 
logs and the OOM.




Ben.



--
Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#945213: linux-image-5.2.0-3-amd64: OOM handling broken if hugepages are enabled

2019-11-21 Thread Anton Ivanov

Package: linux-image-5.2.0-3-amd64
Version: 5.2.17+1
Severity: important

Dear Maintainer,

OOM handling appears to be broken in 5.2.17-1 if hugepages are enabled.

Test system: AMD A4-5300, 40G RAM, no swap, booted disklessly.

Without hugepages enabled I can compile dpdk without any issues. With huge
pages enabled it will reproducibly OOM when trying to link one of the
libraries. There is 20G+ of free RAM at that point according to free, with the
rest being mostly used as buffers.

It is sufficient to just enable huge pages to trigger this (2G out of 40G),
they are not allocated or used by anything. 


-- System Information:
Debian Release: 10.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.2.0-3-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_GB:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Bug#940820: UML not loading on Debian buster with a 5.2 kernel from testing

2019-10-14 Thread Anton Ivanov


This is a regression related to the VA randomization setting.

UML will boot on a Debian 4.19 kernel host with kernel.randomize_va_space = 2

UML will not boot on a Debian 5.2 kernel host with kernel.randomize_va_space = 2

UML will boot on 5.2 once kernel.randomize_va_space is set to 0 on the host.

So something has changed in how randomization is implemented between 4.19 
and 5.2.
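
For reference, the values of this sysctl as described in the kernel's sysctl documentation can be decoded with a small helper. This is a sketch only; the function names are made up for this example, and the procfs path is the standard location of the setting on Linux.

```python
MODES = {
    0: "address space randomization off",
    1: "randomize mmap base, stack and VDSO",
    2: "as 1, plus heap (brk) randomization",
}


def describe_randomize_va_space(raw):
    """Map the sysctl's textual value to a short description."""
    return MODES.get(int(raw.strip()), "unknown value")


def read_randomize_va_space(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting from procfs (Linux only)."""
    with open(path) as handle:
        return describe_randomize_va_space(handle.read())
```

The difference between 1 and 2 is heap (brk) randomization, and the follow-up messages on this bug compare sbrk(0) values between the two host kernels.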


--
Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#941637: linux-image-4.19.0-6-amd64: noht flag on command line has no effect for 6 core/12 Thread Ryzens

2019-10-03 Thread Anton Ivanov


On 03/10/2019 16:06, Salvatore Bonaccorso wrote:

Control: tags -1 + moreinfo

Hi

On Thu, Oct 03, 2019 at 09:24:26AM +0100, Anton Ivanov wrote:

Package: src:linux
Version: 4.19.67-2+deb10u1
Severity: important

Dear Maintainer,

noht has no effect.

I have been trying to chase down a weird hang which occurs only on 6
core/12 thread Ryzens (I cannot reproduce it on 4/8 or older CPUs).

As a part of that I tried to disable ht. Well, it cannot be disabled
- the noht command line arg has no effect whatsoever.

As ht can be a security hole this may have security implications as
well.

Do you mean 'nosmt'? (See kernel-parameters.txt).

You can find further information as well in
Documentation/admin-guide/hw-vuln/l1tf.rst.


I picked up noht from an older document somewhere and I cannot remember 
the actual source. It was definitely in the older version of RHEL 
guides, etc.


I can see that the parameter is nosmt now.

You can close the bug.
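
A small check for which of the two flags discussed here is present on a kernel command line might look like this. It is only a sketch: smt_related_flags is a hypothetical helper, and it does nothing more than look for the tokens.

```python
def smt_related_flags(cmdline):
    """Report which of the two flags appear as tokens (bare or with =value)."""
    tokens = cmdline.split()
    return {
        flag: any(t == flag or t.startswith(flag + "=") for t in tokens)
        for flag in ("noht", "nosmt")
    }


# Command line copied from the original report.
reported = ("BOOT_IMAGE=/boot/vmlinuz-4.19.0-6-amd64 "
            "root=UUID=8eb17efb-6574-42d0-885e-487b98364059 "
            "ro mitigations=off noht quiet")
print(smt_related_flags(reported))
```

For the reported command line this shows noht present and nosmt absent, which matches the outcome of the bug.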



Regards,
Salvatore



--
Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#941637: linux-image-4.19.0-6-amd64: noht flag on command line has no effect for 6 core/12 Thread Ryzens

2019-10-03 Thread Anton Ivanov

Package: src:linux
Version: 4.19.67-2+deb10u1
Severity: important

Dear Maintainer,

noht has no effect.

I have been trying to chase down a weird hang which occurs only on 6 core/12 
thread Ryzens (I cannot reproduce it on 4/8 or older CPUs).

As a part of that I tried to disable ht. Well, it cannot be disabled - the noht 
command line arg has no effect whatsoever.

As ht can be a security hole this may have security implications as well.

-- Package-specific info:
** Version:
Linux version 4.19.0-6-amd64 (debian-kernel@lists.debian.org) (gcc version 
8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.19.0-6-amd64 
root=UUID=8eb17efb-6574-42d0-885e-487b98364059 ro mitigations=off noht quiet

** Not tainted

** Kernel log:
[4.833468] EDAC amd64: Node 0: DRAM ECC disabled.
[4.833470] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[4.892875] EDAC amd64: Node 0: DRAM ECC disabled.
[4.892877] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[4.932919] EDAC amd64: Node 0: DRAM ECC disabled.
[4.932920] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[4.968846] audit: type=1400 audit(1570086470.642:2): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-senddoc" 
pid=638 comm="apparmor_parser"
[4.969330] audit: type=1400 audit(1570086470.642:3): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-xpdfimport" 
pid=643 comm="apparmor_parser"
[4.971460] audit: type=1400 audit(1570086470.642:4): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-oopslash" 
pid=636 comm="apparmor_parser"
[4.972463] pktcdvd: pktcdvd0: writer mapped to sr0
[4.973798] audit: type=1400 audit(1570086470.646:5): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=639 
comm="apparmor_parser"
[4.973802] audit: type=1400 audit(1570086470.646:6): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" 
pid=639 comm="apparmor_parser"
[4.976702] EDAC amd64: Node 0: DRAM ECC disabled.
[4.976704] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[4.977529] audit: type=1400 audit(1570086470.650:7): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=646 
comm="apparmor_parser"
[4.977534] audit: type=1400 audit(1570086470.650:8): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="man_filter" pid=646 
comm="apparmor_parser"
[4.977537] audit: type=1400 audit(1570086470.650:9): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="man_groff" pid=646 
comm="apparmor_parser"
[4.977935] audit: type=1400 audit(1570086470.650:10): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=647 
comm="apparmor_parser"
[5.036714] EDAC amd64: Node 0: DRAM ECC disabled.
[5.036716] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[5.057409] new mount options do not match the existing superblock, will be 
ignored
[5.108619] EDAC amd64: Node 0: DRAM ECC disabled.
[5.108621] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[5.130890] fuse init (API version 7.27)
[5.164629] EDAC amd64: Node 0: DRAM ECC disabled.
[5.164630] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[5.212714] EDAC amd64: Node 0: DRAM ECC disabled.
[5.212716] EDAC amd64:

Bug#940820: linux-image-5.2.0-2-amd64: breaks UML all versions, both debian stock and compiled from source.

2019-09-20 Thread Anton Ivanov


Looks like the culprit is a different default ELF start address on 5.x

What changes is not the sbrk(0) or _end - these are pretty much 
identical to those on 4.x. It is the START which after some "fixups" in 
arch/um/kernel/uml.lds.S becomes __binary_start


I do not see an easy way to fix it :(

A.

On 20/09/2019 15:48, Anton Ivanov wrote:
These are the Start (that is what sbrk(0) returns) and &_end values I 
get for the two kernels:


Linux 4.19 on host - Start 1645867008 end 1631412224 diff 14454784

Linux 5.2 on host - Start 93825006145536 end 1631412224 diff 
93823374733312
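
The differences quoted above can be re-derived, and printing the addresses in hex makes the jump in the Start value easier to see. This is just a worked calculation on the numbers from this message; the variable names are made up for this example.

```python
# Values quoted in the message, as (sbrk(0) "Start", &_end) per host kernel.
observed = {
    "4.19": (1645867008, 1631412224),
    "5.2": (93825006145536, 1631412224),
}

for kernel, (start, end) in observed.items():
    print(f"{kernel}: start={start:#x} end={end:#x} diff={start - end}")
```

The &_end value is the same on both hosts. Only the value returned by sbrk(0) moves.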


I think the whole logic in UML here is broken because with memory 
model = large &_end is less than start to start off with so reserving 
XM gap does not quite make sense.


I am going to see if I can sort out the UML side, but I think we still 
need to check the host kernel side and what the reason is for the sudden 
change in behavior.


A.

On 20/09/2019 11:12, Anton Ivanov wrote:

Package: src:linux
Version: 5.2.9-2
Severity: important

Dear Maintainer,

Any attempt to run UML on a machine running 5.2.9-2 results in:

Adding 9382334992 bytes to physical memory to account for 
exec-shield gap

Too few physical memory! Needed=93823417974784, given=547037904896

Running the same UML images on 4.19 debian stock has no issues.

A.

-- Package-specific info:
** Version:
Linux version 5.2.0-2-amd64 (debian-kernel@lists.debian.org) (gcc 
version 8.3.0 (Debian 8.3.0-21)) #1 SMP Debian 5.2.9-2 (2019-08-21)


** Command line:
BOOT_IMAGE=/boot/vmlinuz-5.2.0-2-amd64 
root=UUID=8eb17efb-6574-42d0-885e-487b98364059 ro mitigations=off 
noht quiet


** Not tainted

** Kernel log:
[    3.684402] input: HD-Audio Generic Front Mic as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input8
[    3.684490] input: HD-Audio Generic Rear Mic as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input9
[    3.684555] input: HD-Audio Generic Line as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input10
[    3.685553] input: HD-Audio Generic Line Out as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input11
[    3.685627] input: HD-Audio Generic Front Headphone as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input12

[    3.806626] kvm: Nested Virtualization enabled
[    3.806636] kvm: Nested Paging enabled
[    3.806637] SVM: Virtual VMLOAD VMSAVE supported
[    3.806637] SVM: Virtual GIF supported
[    3.820371] MCE: In-kernel MCE decoding enabled.
[    3.824533] EDAC amd64: Node 0: DRAM ECC disabled.
[    3.824536] EDAC amd64: ECC disabled in the BIOS or no ECC 
capability, module will not load.
 Either enable ECC checking or force module loading 
by setting 'ecc_enable_override'.
 (Note that use of the override may cause unknown 
side effects.)

[    3.872569] pktcdvd: pktcdvd0: writer mapped to sr0
[    3.900858] EDAC amd64: Node 0: DRAM ECC disabled.
[    3.900860] EDAC amd64: ECC disabled in the BIOS or no ECC 
capability, module will not load.
 Either enable ECC checking or force module loading 
by setting 'ecc_enable_override'.
 (Note that use of the override may cause unknown 
side effects.)

[    3.948661] EDAC amd64: Node 0: DRAM ECC disabled.
[    3.948662] EDAC amd64: ECC disabled in the BIOS or no ECC 
capability, module will not load.
 Either enable ECC checking or force module loading 
by setting 'ecc_enable_override'.
 (Note that use of the override may cause unknown 
side effects.)

[    3.996651] EDAC amd64: Node 0: DRAM ECC disabled.
[    3.996652] EDAC amd64: ECC disabled in the BIOS or no ECC 
capability, module will not load.
 Either enable ECC checking or force module loading 
by setting 'ecc_enable_override'.
 (Note that use of the override may cause unknown 
side effects.)
[    4.002382] audit: type=1400 audit(1568973482.655:2): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="libreoffice-xpdfimport" pid=706 comm="apparmor_parser"
[    4.002712] audit: type=1400 audit(1568973482.655:3): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="libreoffice-senddoc" pid=701 comm="apparmor_parser"
[    4.005254] audit: type=1400 audit(1568973482.659:4): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="libreoffice-oopslash" pid=699 comm="apparmor_parser"
[    4.007555] audit: type=1400 audit(1568973482.659:5): 
apparmor="STATUS" operation="profile_load" profile="unconfined" 
name="nvidia_modprobe" pid=702 comm="apparmor_parser"
[    4.007558] audit: type=1400 audit(1568973482.659:6): 
apparmor="STATUS" operation="profile_load" profile="unconfined"

Bug#940820: linux-image-5.2.0-2-amd64: breaks UML all versions, both debian stock and compiled from source.

2019-09-20 Thread Anton Ivanov

Package: src:linux
Version: 5.2.9-2
Severity: important

Dear Maintainer,

Any attempt to run UML on a machine running 5.2.9-2 results in:

Adding 9382334992 bytes to physical memory to account for exec-shield gap
Too few physical memory! Needed=93823417974784, given=547037904896

Running the same UML images on 4.19 debian stock has no issues.

A.

-- Package-specific info:
** Version:
Linux version 5.2.0-2-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 
(Debian 8.3.0-21)) #1 SMP Debian 5.2.9-2 (2019-08-21)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-5.2.0-2-amd64 
root=UUID=8eb17efb-6574-42d0-885e-487b98364059 ro mitigations=off noht quiet

** Not tainted

** Kernel log:
[3.684402] input: HD-Audio Generic Front Mic as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input8
[3.684490] input: HD-Audio Generic Rear Mic as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input9
[3.684555] input: HD-Audio Generic Line as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input10
[3.685553] input: HD-Audio Generic Line Out as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input11
[3.685627] input: HD-Audio Generic Front Headphone as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input12
[3.806626] kvm: Nested Virtualization enabled
[3.806636] kvm: Nested Paging enabled
[3.806637] SVM: Virtual VMLOAD VMSAVE supported
[3.806637] SVM: Virtual GIF supported
[3.820371] MCE: In-kernel MCE decoding enabled.
[3.824533] EDAC amd64: Node 0: DRAM ECC disabled.
[3.824536] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[3.872569] pktcdvd: pktcdvd0: writer mapped to sr0
[3.900858] EDAC amd64: Node 0: DRAM ECC disabled.
[3.900860] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[3.948661] EDAC amd64: Node 0: DRAM ECC disabled.
[3.948662] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[3.996651] EDAC amd64: Node 0: DRAM ECC disabled.
[3.996652] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[4.002382] audit: type=1400 audit(1568973482.655:2): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-xpdfimport" 
pid=706 comm="apparmor_parser"
[4.002712] audit: type=1400 audit(1568973482.655:3): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-senddoc" 
pid=701 comm="apparmor_parser"
[4.005254] audit: type=1400 audit(1568973482.659:4): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-oopslash" 
pid=699 comm="apparmor_parser"
[4.007555] audit: type=1400 audit(1568973482.659:5): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=702 
comm="apparmor_parser"
[4.007558] audit: type=1400 audit(1568973482.659:6): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" 
pid=702 comm="apparmor_parser"
[4.011004] audit: type=1400 audit(1568973482.663:7): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=709 
comm="apparmor_parser"
[4.011007] audit: type=1400 audit(1568973482.663:8): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="man_filter" pid=709 
comm="apparmor_parser"
[4.011009] audit: type=1400 audit(1568973482.663:9): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="man_groff" pid=709 
comm="apparmor_parser"
[4.012542] audit: type=1400 audit(1568973482.667:10): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="/usr/sbin/ntpd" pid=705 
comm="apparmor_parser"
[4.052465] EDAC amd64: Node 0: DRAM ECC disabled.
[4.052466] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[4.132680] EDAC amd64: Node 0: DRAM ECC disabled.
[4.132682] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.

Bug#940821: linux-image-5.2.0-2-amd64: file cache corruption with nfs4

2019-09-20 Thread Anton Ivanov

Package: src:linux
Version: 5.2.9-2
Severity: critical
Justification: breaks unrelated software

Dear Maintainer,

NFSv4 caching is completely broken on SMP.

How to reproduce:

Option 1. Clone openwrt, then run:

while make clean && make -j `nproc` ; do true ; done

It will break within several runs, depending on the number of CPUs.

Symptom of breakage: a directory on the client looks empty. Example (mnt is an 
NFSv4 mount):

ls -laF 
/mnt/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 8
drwxr-xr-x 2 anivanov anivanov 4096 Sep 20 10:51 ./
drwxr-xr-x 3 anivanov anivanov 4096 Sep 20 10:51 ../

While it actually has a file in it (same on server):

ls -laF 
/exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm
total 12
drwxr-xr-x 2 anivanov anivanov 4096 Sep 20 10:51 ./
drwxr-xr-x 3 anivanov anivanov 4096 Sep 20 10:51 ../
-rw-r--r-- 1 anivanov anivanov   32 Sep 20 10:51 ipcbuf.h

This cache entry on the client does not expire as it should per the NFSv4 
caching documentation. The only ways of dealing with it are a reboot, an 
unmount or dropping the caches.
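
The comparison of the two listings above can be automated. This is a 
minimal sketch, assuming both the NFS mount and the exported directory 
are reachable from the machine running it (for example on the server, 
with its own export mounted). The function name is hypothetical.

```python
import os
from typing import List


def missing_on_client(client_dir: str, server_dir: str) -> List[str]:
    """Return names present in server_dir but absent from client_dir."""
    client_names = set(os.listdir(client_dir))
    server_names = set(os.listdir(server_dir))
    return sorted(server_names - client_names)
```

Called with the client path as the first argument and the export path 
as the second, a non-empty result such as ['ipcbuf.h'] would correspond 
to the state shown in the listings.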

Option 2. Have your $HOME on nfsv4 and use thunderbird. Move mails between 
folders. Sooner or later (usually sooner) you will lose an email.

So this is both "breaks unrelated software" and "data loss" depending on what 
you are doing.

Tested on:

AMD Ryzen 5 2400G, AMD Ryzen 5 1600X, AMD Ryzen 5 1600, AMD A8-6500

Shows up on all of them. Fastest on the 6 core 12 thread Ryzens, slowest on the 
AMD A8 (takes up to 3 iterations of make there).

Brgds,

A.

-- Package-specific info:
** Version:
Linux version 5.2.0-2-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 
(Debian 8.3.0-21)) #1 SMP Debian 5.2.9-2 (2019-08-21)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-5.2.0-2-amd64 
root=UUID=8eb17efb-6574-42d0-885e-487b98364059 ro mitigations=off noht quiet

** Not tainted

** Kernel log:
[3.684402] input: HD-Audio Generic Front Mic as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input8
[3.684490] input: HD-Audio Generic Rear Mic as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input9
[3.684555] input: HD-Audio Generic Line as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input10
[3.685553] input: HD-Audio Generic Line Out as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input11
[3.685627] input: HD-Audio Generic Front Headphone as 
/devices/pci:00/:00:08.1/:09:00.3/sound/card0/input12
[3.806626] kvm: Nested Virtualization enabled
[3.806636] kvm: Nested Paging enabled
[3.806637] SVM: Virtual VMLOAD VMSAVE supported
[3.806637] SVM: Virtual GIF supported
[3.820371] MCE: In-kernel MCE decoding enabled.
[3.824533] EDAC amd64: Node 0: DRAM ECC disabled.
[3.824536] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[3.872569] pktcdvd: pktcdvd0: writer mapped to sr0
[3.900858] EDAC amd64: Node 0: DRAM ECC disabled.
[3.900860] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[3.948661] EDAC amd64: Node 0: DRAM ECC disabled.
[3.948662] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[3.996651] EDAC amd64: Node 0: DRAM ECC disabled.
[3.996652] EDAC amd64: ECC disabled in the BIOS or no ECC capability, 
module will not load.
Either enable ECC checking or force module loading by setting 
'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[4.002382] audit: type=1400 audit(1568973482.655:2): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-xpdfimport" 
pid=706 comm="apparmor_parser"
[4.002712] audit: type=1400 audit(1568973482.655:3): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-senddoc" 
pid=701 comm="apparmor_parser"
[4.005254] audit: type=1400 audit(1568973482.659:4): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="libreoffice-oopslash" 
pid=699 comm="apparmor_parser"
[4.007555] audit: type=1400 audit(1568973482.659:5): apparmor="STATUS" 
operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=702 
comm="apparmor_parser"
[4.007558] audit: type=1400 audit(1568973482.659:6):

Bug#931500:

2019-07-08 Thread Anton Ivanov


Same picture with different NFS minor versions - 4.0, 4.1

Same picture with and without hyperthreading

Same picture with different mitigations switched on or off via the kernel 
command line.


100% reproducible within 4-5 repeats of make -j `cat /proc/cpuinfo | 
grep processor | wc -l` ; make clean on an openwrt tree. Reproducing it 
on a linux tree takes a bit longer (10-12 repeats), but it is also 
reproducible there.
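
For counting how many repeats it takes, a small harness along these 
lines may help. It is only a sketch: the function name and the 
max_repeats limit are my own, and it treats a non-zero exit status of 
the build command as the failure.

```python
import subprocess
from typing import Sequence


def repeats_until_failure(build: Sequence[str], clean: Sequence[str],
                          cwd: str = ".", max_repeats: int = 20) -> int:
    """Run build then clean in a loop.

    Returns the 1-based repeat at which the build command first exits
    non-zero, or 0 if it never fails within max_repeats.
    """
    for attempt in range(1, max_repeats + 1):
        if subprocess.run(build, cwd=cwd).returncode != 0:
            return attempt
        # The exit status of the clean step is deliberately ignored.
        subprocess.run(clean, cwd=cwd)
    return 0


# Example call (run from the top of the tree on the NFS mount):
#   import os
#   repeats_until_failure(["make", "-j", str(os.cpu_count() or 1)],
#                         ["make", "clean"])
```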


So the executive summary is: NFS is broken. Completely. That is not a 
level 6 bug, it is a much higher one; please adjust the priority 
accordingly.


--

Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#931500: Acknowledgement (linux-image-4.19.0-5-amd64: kernel deadlock with autofs)

2019-07-08 Thread Anton Ivanov


The most interesting part - it is always the same file.

ls -laF 
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm/ipcbuf.h


It becomes invisible from the client, but exists on the server. It usually 
takes ~4-5 builds in a loop to achieve that.


A.

On 08/07/2019 12:01, Anton Ivanov wrote:


On 08/07/2019 11:59, Anton Ivanov wrote:
There are clearly some issues with nfs across an autofs mount (maybe 
for hard mounts as well), so this may warrant an upgrade.


Example test.  Run make -j 12 ; make clean in a loop on an nfs 
mounted openwrt tree until it fails (usually 2-3 iterations).


State on the client

ls -laF 
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm


total 8
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../

State as seen on the server (mounted via nfs across localhost):

ls -laF 
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../
-rw-r--r-- 1 anivanov anivanov   32 Jul  8 11:40 ipcbuf.h

State on the filesystem:

ls -laF 
/exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../
-rw-r--r-- 1 anivanov anivanov   32 Jul  8 11:40 ipcbuf.h

So actually this looks like the caching on NFS is royally fubar


Dropping caches restores things to normal, but that is not a solution. 
It is a diagnosis.



--
Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#931500: Acknowledgement (linux-image-4.19.0-5-amd64: kernel deadlock with autofs)

2019-07-08 Thread Anton Ivanov

There are clearly some issues with nfs across an autofs mount (maybe for 
hard mounts as well), so this may warrant an upgrade.


Example test.  Run make -j 12 ; make clean in a loop on an nfs mounted 
openwrt tree until it fails (usually 2-3 iterations).


State on the client

ls -laF 
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm


total 8
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../

State as seen on the server (mounted via nfs across localhost):

ls -laF 
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../
-rw-r--r-- 1 anivanov anivanov   32 Jul  8 11:40 ipcbuf.h

State on the filesystem:

ls -laF 
/exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../
-rw-r--r-- 1 anivanov anivanov   32 Jul  8 11:40 ipcbuf.h

So actually this looks like the caching on NFS is royally fubar

--

Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#931500: Acknowledgement (linux-image-4.19.0-5-amd64: kernel deadlock with autofs)

2019-07-08 Thread Anton Ivanov




On 08/07/2019 11:59, Anton Ivanov wrote:
There are clearly some issues with nfs across an autofs mount (maybe 
for hard mounts as well), so this may warrant an upgrade.


Example test.  Run make -j 12 ; make clean in a loop on an nfs mounted 
openwrt tree until it fails (usually 2-3 iterations).


State on the client

ls -laF 
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm


total 8
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../

State as seen on the server (mounted via nfs across localhost):

ls -laF 
/var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../
-rw-r--r-- 1 anivanov anivanov   32 Jul  8 11:40 ipcbuf.h

State on the filesystem:

ls -laF 
/exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm

total 12
drwxr-xr-x 2 anivanov anivanov 4096 Jul  8 11:40 ./
drwxr-xr-x 3 anivanov anivanov 4096 Jul  8 11:40 ../
-rw-r--r-- 1 anivanov anivanov   32 Jul  8 11:40 ipcbuf.h

So actually this looks like the caching on NFS is royally fubar


Dropping caches restores things to normal, but that is not a solution. 
It is a diagnosis.


--
Anton R. Ivanov
https://www.kot-begemot.co.uk/

Bug#931500: linux-image-4.19.0-5-amd64: kernel deadlock with autofs

2019-07-06 Thread Anton Ivanov

Package: src:linux
Version: 4.19.37-5
Severity: normal
File: linux-image-4.19.0-5-amd64

Dear Maintainer,

An attempt to mount an NFS filesystem via autofs while it is being unmounted 
sometimes results in a deadlock.

This is easier to reproduce with NFSv3. It is more difficult, but still 
possible, with NFSv4.

I have been unable to reproduce it on any CPU with fewer cores/threads than 
a Ryzen 5 1600 (6/12). It is reliably reproducible on any Ryzen with 6 cores 
and 12 threads or more.

It is not easy to trigger: it usually takes up to 1-2 days of regular 
mounts/unmounts at the normal autofs 5 min unmount interval. It may 
sometimes happen in less than 30 minutes. In my case the culprits were system 
stats scripts executed every 5 minutes from cron. Raising the autofs timeout to 
600s eliminated the deadlocks.

The deadlock is usually hard and it is impossible to use Alt-SysRQ. The only 
time I managed to obtain a trace it was as follows:
Jun 28 12:56:01 sleer kernel: [101497.077162] rcu: INFO: rcu_sched 
self-detected stall on CPU
Jun 28 12:56:01 sleer kernel: [101497.077172] rcu: #0118-...!: (5250 ticks this 
GP) idle=6fa/1/0x4002 softirq=514095/514095 fqs=175 
Jun 28 12:56:01 sleer kernel: [101497.077174] rcu: #011 (t=5250 jiffies 
g=2596081 q=15)
Jun 28 12:56:01 sleer kernel: [101497.077179] rcu: rcu_sched kthread starved 
for 4900 jiffies! g2596081 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=7
Jun 28 12:56:01 sleer kernel: [101497.077180] rcu: RCU grace-period kthread 
stack dump:
Jun 28 12:56:01 sleer kernel: [101497.077182] rcu_sched   R  running task   
 010  2 0x8000
Jun 28 12:56:01 sleer kernel: [101497.077185] Call Trace:
Jun 28 12:56:01 sleer kernel: [101497.077192]  ? __schedule+0x2a2/0x870
Jun 28 12:56:01 sleer kernel: [101497.077194]  schedule+0x28/0x80
Jun 28 12:56:01 sleer kernel: [101497.077196]  schedule_timeout+0x16b/0x390
Jun 28 12:56:01 sleer kernel: [101497.077200]  ? 
__next_timer_interrupt+0xc0/0xc0
Jun 28 12:56:01 sleer kernel: [101497.077203]  rcu_gp_kthread+0x40d/0x850
Jun 28 12:56:01 sleer kernel: [101497.077205]  ? call_rcu_sched+0x20/0x20
Jun 28 12:56:01 sleer kernel: [101497.077207]  kthread+0x112/0x130
Jun 28 12:56:01 sleer kernel: [101497.077209]  ? kthread_bind+0x30/0x30
Jun 28 12:56:01 sleer kernel: [101497.077211]  ret_from_fork+0x1f/0x40
Jun 28 12:56:01 sleer kernel: [101497.077213] NMI backtrace for cpu 8
Jun 28 12:56:01 sleer kernel: [101497.077215] CPU: 8 PID: 21552 Comm: 
localStorage DB Tainted: GE 4.19.0-5-amd64 #1 Debian 4.19.37-5
Jun 28 12:56:01 sleer kernel: [101497.077216] Hardware name: System 
manufacturer System Product Name/PRIME B450M-A, BIOS 0604 12/07/2018
Jun 28 12:56:01 sleer kernel: [101497.077217] Call Trace:
Jun 28 12:56:01 sleer kernel: [101497.077218]  
Jun 28 12:56:01 sleer kernel: [101497.077220]  dump_stack+0x5c/0x80
Jun 28 12:56:01 sleer kernel: [101497.077223]  
nmi_cpu_backtrace.cold.4+0x13/0x50
Jun 28 12:56:01 sleer kernel: [101497.077225]  ? 
lapic_can_unplug_cpu.cold.29+0x3b/0x3b
Jun 28 12:56:01 sleer kernel: [101497.077227]  
nmi_trigger_cpumask_backtrace+0xf9/0xfb
Jun 28 12:56:01 sleer kernel: [101497.077229]  rcu_dump_cpu_stacks+0x9b/0xcb
Jun 28 12:56:01 sleer kernel: [101497.077231]  
rcu_check_callbacks.cold.80+0x1db/0x338
Jun 28 12:56:01 sleer kernel: [101497.077234]  ? tick_sched_do_timer+0x60/0x60
Jun 28 12:56:01 sleer kernel: [101497.077236]  update_process_times+0x28/0x60
Jun 28 12:56:01 sleer kernel: [101497.077238]  tick_sched_handle+0x22/0x60
Jun 28 12:56:01 sleer kernel: [101497.077240]  tick_sched_timer+0x37/0x70
Jun 28 12:56:01 sleer kernel: [101497.077241]  __hrtimer_run_queues+0x100/0x280
Jun 28 12:56:01 sleer kernel: [101497.077243]  hrtimer_interrupt+0x100/0x220
Jun 28 12:56:01 sleer kernel: [101497.077245]  ? handle_irq_event+0x47/0x5c
Jun 28 12:56:01 sleer kernel: [101497.077247]  
smp_apic_timer_interrupt+0x6a/0x140
Jun 28 12:56:01 sleer kernel: [101497.077248]  apic_timer_interrupt+0xf/0x20
Jun 28 12:56:01 sleer kernel: [101497.077249]  
Jun 28 12:56:01 sleer kernel: [101497.077251] RIP: 
0010:smp_call_function_many+0x1f8/0x250
Jun 28 12:56:01 sleer kernel: [101497.077253] Code: c7 e8 0c c4 5e 00 3b 05 1a 
86 01 01 0f 83 8c fe ff ff 48 63 d0 48 8b 0b 48 03 0c d5 00 b7 8c a4 8b 51 18 
83 e2 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 60 e3 b2 a4 4c 89 
fe 89 df
Jun 28 12:56:01 sleer kernel: [101497.077254] RSP: 0018:b93dc9cd3bb8 
EFLAGS: 0202 ORIG_RAX: ff13
Jun 28 12:56:01 sleer kernel: [101497.077256] RAX:  RBX: 
9309fec22c00 RCX: 9309fea27000
Jun 28 12:56:01 sleer kernel: [101497.077256] RDX: 0001 RSI: 
 RDI: 9309fec22c08
Jun 28 12:56:01 sleer kernel: [101497.077257] RBP: 9309fec22c08 R08: 
0004 R09: 9309fec22c48
Jun 28 12:56:01 sleer kernel: [101497.077258] R10: 9309fec22c08 R11: 
0008 R12: a3a6ca90
Jun 28 12:56:01 sleer kernel:

Bug#931048: linux-image-4.19.0-4-amd64: bridge MAC learning is broken

2019-06-25 Thread Anton Ivanov

Package: src:linux
Version: 4.19.28-1
Severity: normal
File: linux-image-4.19.0-4-amd64

Dear Maintainer,

Bridge MAC learning is completely broken at present. 

How to reproduce:
1. Build one or more MINIMAL VMs, or connect machines with MINIMAL installs, to 
interfaces which are joined to a Linux bridge.
2. Observe the bridge fdb using the bridge utility or brctl.
3. Run traffic.

Obvious issues:

1. MACs expire even if there are gigabytes of traffic flowing to/from them. The 
refresh, if used, is completely broken.
2. MACs are not immediately reinstated into the forwarding database if there is 
traffic upon expiry.

Observations:

This seems to be a result of learning being tightly bound to the idea of a 
neighbour and to the neighbour discovery code. MACs are learned 
instantaneously if one of the hosts issues a multicast join, e.g. performs 
IPv6 neighbour discovery or runs avahi. If these are not present, the bridge 
code does not function as it should. While this is good as an idea, it should 
not completely replace learning from unicast traffic.
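
Step 2 of the reproduction can be scripted by comparing two snapshots of 
the forwarding database. The sketch below assumes the four-column layout 
printed by "brctl showmacs <bridge>" (port no, mac addr, is local?, 
ageing timer); the class and function names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class FdbEntry(NamedTuple):
    port: int
    is_local: bool
    ageing: float  # seconds since the entry was last refreshed


def parse_showmacs(text: str) -> Dict[str, FdbEntry]:
    """Parse 'brctl showmacs' style output into {mac: FdbEntry}."""
    entries: Dict[str, FdbEntry] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields and start with a port number;
        # the header line and anything else is skipped.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        port, mac, local, ageing = fields
        entries[mac.lower()] = FdbEntry(int(port), local == "yes",
                                        float(ageing))
    return entries


def expired(before: Dict[str, FdbEntry],
            after: Dict[str, FdbEntry]) -> List[str]:
    """Learned (non-local) MACs present in 'before' but gone in 'after'."""
    return sorted(mac for mac, entry in before.items()
                  if not entry.is_local and mac not in after)
```

Taking one snapshot before and one during a long unicast transfer, a 
non-empty result from expired() for a host that is still sending would 
match issue 1 above.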

-- Package-specific info:
** Version:
Linux version 4.19.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 
8.3.0 (Debian 8.3.0-2)) #1 SMP Debian 4.19.28-1 (2019-03-12)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.19.0-4-amd64 
root=UUID=3db3d925-a3d9-4c1d-b63d-c087261f1fb2 ro quiet

** Tainted: WE (8704)
 * Taint on warning.
 * Unsigned module has been loaded.
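
For reference, the number in parentheses on the Tainted line is a 
bitmask, and the letters can be recovered from it. The sketch below uses 
the bit-to-letter table from the kernel's tainted-kernels documentation; 
the function name is mine, and it does not print the 'G' placeholder the 
kernel shows when no proprietary module is loaded.

```python
# Bit position -> taint letter, per the kernel's tainted-kernels document.
TAINT_BITS = {
    0: "P", 1: "F", 2: "S", 3: "R", 4: "M", 5: "B", 6: "U", 7: "D",
    8: "A", 9: "W", 10: "C", 11: "I", 12: "O", 13: "E", 14: "L",
    15: "K", 16: "X", 17: "T",
}


def decode_taint(value: int) -> str:
    """Return the taint letters set in value, lowest bit first."""
    return "".join(letter for bit, letter in sorted(TAINT_BITS.items())
                   if value & (1 << bit))


print(decode_taint(8704))  # 8704 = 2**9 + 2**13
```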

** Kernel log:
Unable to read kernel log; any relevant messages should be attached

** Model information
sys_vendor: System manufacturer
product_name: System Product Name
product_version: System Version
chassis_vendor: Default string
chassis_version: Default string
bios_vendor: American Megatrends Inc.
bios_version: 0409
board_vendor: ASUSTeK COMPUTER INC.
board_name: PRIME B450M-A
board_version: Rev X.0x

** Loaded modules:
cfg80211(E)
bnep(E)
nfnetlink_queue(E)
nfnetlink_log(E)
nfnetlink(E)
bluetooth(E)
drbg(E)
ansi_cprng(E)
ecdh_generic(E)
squashfs(E)
loop(E)
ufs(E)
qnx4(E)
hfsplus(E)
hfs(E)
minix(E)
ntfs(E)
msdos(E)
jfs(E)
xfs(E)
dm_mod(E)
cpuid(E)
uas(E)
usb_storage(E)
xt_nat(E)
xt_tcpudp(E)
xt_conntrack(E)
iptable_nat(E)
nf_nat_ipv4(E)
nf_nat(E)
nf_conntrack(E)
nf_defrag_ipv6(E)
nf_defrag_ipv4(E)
ip6table_filter(E)
ip6_tables(E)
nfsv3(E)
rpcsec_gss_krb5(E)
nfsv4(E)
dns_resolver(E)
nfs(E)
fscache(E)
iptable_filter(E)
veth(E)
bridge(E)
8021q(E)
garp(E)
mrp(E)
stp(E)
llc(E)
fuse(E)
tun(E)
binfmt_misc(E)
nls_ascii(E)
eeepc_wmi(E)
asus_wmi(E)
nls_cp437(E)
sparse_keymap(E)
rfkill(E)
wmi_bmof(E)
vfat(E)
fat(E)
edac_mce_amd(E)
uvcvideo(E)
videobuf2_vmalloc(E)
videobuf2_memops(E)
videobuf2_v4l2(E)
kvm_amd(E)
videobuf2_common(E)
ccp(E)
amdkfd(E)
videodev(E)
rng_core(E)
media(E)
snd_usb_audio(E)
joydev(E)
snd_usbmidi_lib(E)
kvm(E)
snd_rawmidi(E)
evdev(E)
snd_seq_device(E)
irqbypass(E)
efi_pstore(E)
crct10dif_pclmul(E)
crc32_pclmul(E)
snd_hda_codec_realtek(E)
snd_hda_codec_generic(E)
amdgpu(E)
ghash_clmulni_intel(E)
snd_hda_codec_hdmi(E)
efivars(E)
snd_hda_intel(E)
pcspkr(E)
chash(E)
snd_hda_codec(E)
gpu_sched(E)
snd_hda_core(E)
ttm(E)
snd_hwdep(E)
k10temp(E)
sp5100_tco(E)
snd_pcm_oss(E)
snd_mixer_oss(E)
drm_kms_helper(E)
snd_pcm(E)
snd_timer(E)
drm(E)
snd(E)
soundcore(E)
sg(E)
wmi(E)
video(E)
button(E)
pcc_cpufreq(E)
acpi_cpufreq(E)
hwmon_vid(E)
parport_pc(E)
nfsd(E)
auth_rpcgss(E)
ppdev(E)
nfs_acl(E)
lockd(E)
lp(E)
grace(E)
parport(E)
sunrpc(E)
efivarfs(E)
ip_tables(E)
x_tables(E)
autofs4(E)
ext4(E)
crc16(E)
mbcache(E)
jbd2(E)
fscrypto(E)
ecb(E)
btrfs(E)
zstd_decompress(E)
zstd_compress(E)
xxhash(E)
raid10(E)
raid456(E)
async_raid6_recov(E)
async_memcpy(E)
async_pq(E)
async_xor(E)
async_tx(E)
xor(E)
raid6_pq(E)
libcrc32c(E)
crc32c_generic(E)
raid0(E)
multipath(E)
linear(E)
raid1(E)
md_mod(E)
sd_mod(E)
hid_generic(E)
usbhid(E)
hid(E)
crc32c_intel(E)
aesni_intel(E)
aes_x86_64(E)
crypto_simd(E)
cryptd(E)
glue_helper(E)
ahci(E)
mptsas(E)
xhci_pci(E)
libahci(E)
igb(E)
mptscsih(E)
r8169(E)
i2c_piix4(E)
xhci_hcd(E)
realtek(E)
mptbase(E)
i2c_algo_bit(E)
libphy(E)
libata(E)
scsi_transport_sas(E)
dca(E)
usbcore(E)
usb_common(E)
scsi_mod(E)
gpio_amdpt(E)
gpio_generic(E)

** PCI devices:
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device 
[1022:15d0]
Subsystem: ASUSTeK Computer Inc. Device [1043:876b]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- 

00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device 
[1022:1452]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: 
Kernel driver in use: pcieport

00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc.

Bug#924460: linux-image-4.19.0-0.bpo.2-amd64: Weird hangs on AMD Ryzen

2019-03-13 Thread Anton Ivanov

Package: src:linux
Version: 4.19.16-1~bpo9+1
Severity: important

Dear Maintainer,

Occasional hangs, under X only. During the hang no new processes can be
spawned from any terminal window in the X session, and windows which use DRM
(firefox, thunderbird, etc.) do not update. Windows can be moved and
it is possible to switch to a new desktop.

At the same time the rest of the machine works fine. Switching to a text
console works fine and any processes launched from there also work fine.
Firefox and other processes relying on DRM during the hang are shown in
D state.
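
To capture which processes are in D state during a hang, something like 
the following can be run from the text console. It is a minimal sketch 
that reads /proc/<pid>/stat directly; the function names are 
hypothetical, and proc_root is a parameter only so the code can be 
pointed at a copy of /proc.

```python
import os
from typing import List, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """List (pid, comm) for every process in uninterruptible sleep."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        path = os.path.join(proc_root, name, "stat")
        try:
            with open(path, errors="replace") as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited or entry unreadable while scanning
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```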

The machine recovers by itself in less than a minute. The hang frequency
is roughly once every 3-4 hours.

I am using an up-to-date out-of-tree it87 version to get the right sensors
on the MB. The bug shows both with and without this driver.

I also had to pull the most recent firmware from kernel.org for the video.

The bug is not observed when using a plug-in video card (Nvidia Quadro 290
NVS) so this looks like something related to DRM or amdgpu power management.

-- Package-specific info:
** Version:
Linux version 4.19.0-0.bpo.2-amd64 (debian-kernel@lists.debian.org) (gcc 
version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP Debian 4.19.16-1~bpo9+1 
(2019-02-07)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.19.0-0.bpo.2-amd64 
root=UUID=3db3d925-a3d9-4c1d-b63d-c087261f1fb2 ro quiet

** Tainted: WOE (12800)
 * Taint on warning.
 * Out-of-tree module has been loaded.
 * Unsigned module has been loaded.

** Kernel log:
[665617.595702] CR2: 557931720f18 CR3: 00024536e000 CR4: 
003406e0
[665617.595703] Call Trace:
[665617.595751]  optc1_lock+0x9e/0xb0 [amdgpu]
[665617.595796]  dcn10_pipe_control_lock.part.25+0x2d/0x70 [amdgpu]
[665617.595840]  dcn10_apply_ctx_for_surface+0xdf/0x540 [amdgpu]
[665617.595883]  ? hubbub1_verify_allow_pstate_change_high+0x82/0x1a0 [amdgpu]
[665617.595924]  dc_commit_state+0x23d/0x550 [amdgpu]
[665617.595963]  ? set_freesync_on_streams.part.7+0xce/0x2c0 [amdgpu]
[665617.596002]  ? mod_freesync_set_user_enable+0x16d/0x1b0 [amdgpu]
[665617.596046]  amdgpu_dm_atomic_commit_tail+0x33e/0xe60 [amdgpu]
[665617.596079]  ? amdgpu_bo_pin_restricted+0x68/0x280 [amdgpu]
[665617.596083]  ? _cond_resched+0x16/0x40
[665617.596085]  ? wait_for_completion_timeout+0x3b/0x1a0
[665617.596087]  ? refcount_inc_checked+0x5/0x30
[665617.596119]  ? amdgpu_bo_ref+0x17/0x20 [amdgpu]
[665617.596127]  commit_tail+0x3d/0x70 [drm_kms_helper]
[665617.596133]  drm_atomic_helper_commit+0xb4/0x120 [drm_kms_helper]
[665617.596147]  drm_atomic_connector_commit_dpms+0xe5/0xf0 [drm]
[665617.596159]  drm_mode_obj_set_property_ioctl+0x247/0x290 [drm]
[665617.596170]  ? drm_connector_set_obj_prop+0x80/0x80 [drm]
[665617.596181]  drm_connector_property_set_ioctl+0x3e/0x60 [drm]
[665617.596191]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[665617.596194]  ? sock_write_iter+0x87/0x100
[665617.596204]  drm_ioctl+0x2ff/0x390 [drm]
[665617.596215]  ? drm_connector_set_obj_prop+0x80/0x80 [drm]
[665617.596217]  ? do_iter_write+0xd6/0x180
[665617.596248]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[665617.596251]  do_vfs_ioctl+0xa2/0x640
[665617.596254]  ? do_sigaction+0xad/0x1e0
[665617.596256]  ksys_ioctl+0x70/0x80
[665617.596258]  __x64_sys_ioctl+0x16/0x20
[665617.596260]  do_syscall_64+0x55/0x110
[665617.596262]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[665617.596264] RIP: 0033:0x7fb56083a017
[665617.596265] Code: 00 00 00 48 8b 05 81 7e 2b 00 64 c7 00 26 00 00 00 48 c7 
c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 
f0 ff ff 73 01 c3 48 8b 0d 51 7e 2b 00 f7 d8 64 89 01 48
[665617.596266] RSP: 002b:7ffd64cbfd08 EFLAGS: 3246 ORIG_RAX: 
0010
[665617.596267] RAX: ffda RBX:  RCX: 
7fb56083a017
[665617.596268] RDX: 7ffd64cbfd40 RSI: c01064ab RDI: 
000e
[665617.596269] RBP: 7ffd64cbfd40 R08: 556b0190 R09: 
556aff1154d0
[665617.596270] R10:  R11: 3246 R12: 
c01064ab
[665617.596270] R13: 000e R14: 556afdb28fb0 R15: 
556afd86d580
[665617.596272] ---[ end trace 070aabde88b649c0 ]---
[665929.195580] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 1us * 
10 tries - optc1_lock line:628
[665929.195675] WARNING: CPU: 4 PID: 15694 at 
/build/linux-qcc0VE/linux-4.19.16/drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254
 generic_reg_wait+0xe5/0x150 [amdgpu]
[665929.195676] Modules linked in: 8021q garp mrp stp llc nls_utf8 isofs uas 
usb_storage fuse ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs dm_mod cpuid 
nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache binfmt_misc eeepc_wmi 
asus_wmi sparse_keymap rfkill wmi_bmof nls_ascii uvcvideo nls_cp437 amdkfd vfat 
videobuf2_vmalloc videobuf2_memops fat videobuf2_v4l2 videobuf2_common 
efi_pstore videodev edac_mce_amd snd_usb_audio media amdgpu 
snd_hda_codec_realtek kvm_amd snd_hda_codec_generic joydev ccp snd_usbmidi_lib 
snd_rawmidi rng_core

Bug#884284: nfs-kernel-server: NFSv4 broken

2017-12-13 Thread Anton Ivanov

Package: nfs-kernel-server
Version: 1:1.3.4-2.1
Severity: important

Dear Maintainer,

NFSv4 in stretch is broken and unusable.

After some time the server exporting the directories starts throwing

[1130732.440356] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[1130734.801510] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[1173981.176268] NFS: nfs4_reclaim_open_state: Lock reclaim failed!

messages, reads and writes slow down to a crawl, and in the end there is
no choice but to reboot the server. Restarting nfs-kernel-server,
unmounting from all known clients and remounting does not help.

I have now been forced to downgrade back to nfsv3 across the board.
The same setup works fine with NFSv3.

NFSv4 used to work perfectly fine in jessie and before that.

I am not sure if this started from the stretch upgrade or after one
of the stretch mid-life kernel updates (I think it is the latter).

Setup: Standard mid-size classic Linux/Unix multiuser install. Server(s)
exporting $HOME and other directories to a local network. Clients mount
via autofs when needed. Most directories are mounted from at least 2 (usually
more) clients.


-- Package-specific info:
-- rpcinfo --
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp  58357  mountd
    100005    1   tcp  37131  mountd
    100005    2   udp  54135  mountd
    100005    2   tcp  32951  mountd
    100005    3   udp  47587  mountd
    100005    3   tcp  41773  mountd
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    3   udp   2049
    100021    1   udp  46283  nlockmgr
    100021    3   udp  46283  nlockmgr
    100021    4   udp  46283  nlockmgr
    100021    1   tcp  40039  nlockmgr
    100021    3   tcp  40039  nlockmgr
    100021    4   tcp  40039  nlockmgr
    100004    2   udp    856  ypserv
    100004    1   udp    856  ypserv
    100004    2   tcp    857  ypserv
    100004    1   tcp    857  ypserv
    100009    1   udp    866  yppasswdd
 600100069    1   udp    874  fypxfrd
 600100069    1   tcp    875  fypxfrd
    100007    2   udp    969  ypbind
    100007    1   udp    969  ypbind
    100007    2   tcp    970  ypbind
    100007    1   tcp    970  ypbind
    100024    1   udp  44513  status
    100024    1   tcp  58657  status
-- /etc/default/nfs-kernel-server --
RPCNFSDCOUNT=8
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS="--manage-gids"
NEED_SVCGSSD=""
RPCSVCGSSDOPTS=""
-- /etc/exports --
/exports
192.168.0.0/16(rw,async,no_root_squash,no_subtree_check,nohide,fsid=root) 
127.0.0.0/8(rw,async,no_root_squash,no_subtree_check,nohide,fsid=root)
/exports/md0
192.168.0.0/16(rw,async,no_root_squash,no_subtree_check,nohide) 
127.0.0.0/8(rw,async,no_root_squash,no_subtree_check,nohide)
/exports/md1
192.168.0.0/16(rw,async,no_root_squash,no_subtree_check,nohide) 
127.0.0.0/8(rw,async,no_root_squash,no_subtree_check,nohide)
/exports/md2
192.168.0.0/16(rw,async,no_root_squash,no_subtree_check,nohide) 
127.0.0.0/8(rw,async,no_root_squash,no_subtree_check,nohide)
-- /proc/fs/nfs/exports --
# Version 1.1
# Path Client(Flags) # IPs
/exports/md0
192.168.0.0/16(rw,no_root_squash,async,wdelay,nohide,no_subtree_check,uuid=a114f04d:9e54427e:b051ce17:4dc02e9f,sec=1)
/exports
192.168.0.0/16(rw,no_root_squash,async,wdelay,nohide,no_subtree_check,fsid=0,uuid=a3734f7a:774744b7:b41d4cea:bc2a4f0f,sec=1)
/exports
127.0.0.0/8(rw,no_root_squash,async,wdelay,nohide,no_subtree_check,fsid=0,uuid=a3734f7a:774744b7:b41d4cea:bc2a4f0f,sec=1)
/exports/md0
127.0.0.0/8(rw,no_root_squash,async,wdelay,nohide,no_subtree_check,uuid=a114f04d:9e54427e:b051ce17:4dc02e9f,sec=1)

-- System Information:
Debian Release: 9.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_GB:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages nfs-kernel-server depends on:
ii  init-system-helpers  1.48
ii  keyutils             1.5.9-9
ii  libblkid1            2.29.2-1
ii  libc6                2.24-11+deb9u1
ii  libcap2              1:2.25-1
ii  libsqlite3-0         3.16.2-5
ii  libtirpc1            0.2.5-1.2
ii  libwrap0             7.6.q-26
ii  lsb-base             9.20161125
ii  netbase              5.4
ii  nfs-common           1:1.3.4-2.1
ii  ucf                  3.0036

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

-- no debconf information

OpenWRT Build Process broken on recent Debian NFS

2017-11-11 Thread Anton Ivanov

Hi all,

I am observing an interesting issue with the OpenWRT build process when
building on an up-to-date stretch host. It no longer works on NFS on
debian (it used to work).

If I run make with a clean freshly cloned directory tree on a normally
mounted filesystem it completes OK.

If I do a fresh git clone and mount the filesystem via NFS, I get the following:

SHELL= flock /var/autofs/local/src/openwrt/tmp/.patch-2.7.5.tar.xz.flock
-c '    /var/autofs/local/src/openwrt/scripts/download.pl
"/var/autofs/local/src/openwrt/dl" "patch-2.7.5.tar.xz"
"e3da7940431633fb65a01b91d3b7a27a" "" "@GNU/patch"'
flock: /var/autofs/local/src/openwrt/tmp/.patch-2.7.5.tar.xz.flock: Bad
file descriptor
Makefile:23: recipe for target
'/var/autofs/local/src/openwrt/dl/patch-2.7.5.tar.xz' failed

The results are the same if I mount the system via autofs or directly
via command line mount.

If I run the flock statement "by hand" it completes OK as well so this
happens only if it is invoked out of the openwrt build process (I smell
a race here somewhere...).
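
To check whether locking itself fails on the mount, independent of the build system, the same kind of lock can be taken directly. A minimal sketch, assuming Python 3 is available on the build host; `try_flock` is a hypothetical helper, and the path is whatever file on the NFS mount is to be tested. It calls flock(2) through Python's `fcntl` module and tries both a read-only and a read-write descriptor, because the flock(2) manual page notes that on NFS the lock is emulated with byte-range locks and an exclusive lock then needs a descriptor opened for writing.

```python
import errno
import fcntl
import os


def try_flock(path: str, writable: bool) -> str:
    """Try an exclusive, non-blocking flock() on `path`.

    Returns "ok" on success, or the symbolic errno name on failure.
    """
    flags = os.O_CREAT | (os.O_RDWR if writable else os.O_RDONLY)
    fd = os.open(path, flags, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
```

Calling `try_flock(path, False)` and `try_flock(path, True)` on a file under the NFS mount and comparing the two results may help narrow down whether the "Bad file descriptor" error depends on how the lock file was opened.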

I wish I could pinpoint the exact moment it broke. However, as the
actual problem is with downloads/stamps it is difficult to determine the
actual point in time it stopped working.

I tried running the build on a "pristine" stretch with no updates; it was
already broken, so this most likely happened somewhere between jessie and
stretch.

Any ideas (I do not want to file a Debian bug before narrowing it down)?

A.

Bug#752403: linux-image-3.12-0.bpo.1-amd64: gre fragmentation broken

2014-06-23 Thread Anton Ivanov

Package: src:linux
Version: 3.12.9-1~bpo70+1
Severity: important

Dear Maintainer,

The following should set up a GRE tunnel which has MTU 1500 and
fragments GRE correctly as needed.

ip link add gt0 type gretap remote 10.0.48.1 local 192.168.128.1
ip link set gt0 up
ifconfig gt0 mtu 1500

This works fine on 3.2 from wheezy.

Well, on 3.12 (and also tested on 3.10 from OpenWRT Barrier Breaker)
it does not. For some reason the kernel transmits _ONLY_ the second
frag, not the first (big) one. As a result anything relying on 1500
mtu GRE breaks outright.
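
For reference, the arithmetic behind the two fragments can be sketched as follows. This assumes an underlay path MTU of 1500 bytes, an outer IPv4 header without options (20 bytes), a basic GRE header without key, checksum or sequence fields (4 bytes) and an untagged inner Ethernet header (14 bytes). None of these values are stated in the report, and `gretap_fragments` is a hypothetical helper.

```python
from typing import List

OUTER_IP_HDR = 20   # IPv4 header without options (assumed)
GRE_HDR = 4         # basic GRE header, no key/checksum/sequence (assumed)
INNER_ETH_HDR = 14  # untagged Ethernet header carried inside gretap


def gretap_fragments(inner_mtu: int, path_mtu: int) -> List[int]:
    """Return the on-wire IPv4 packet sizes for one full-size inner frame."""
    # Everything after the outer IP header is payload to be fragmented.
    payload = GRE_HDR + INNER_ETH_HDR + inner_mtu
    if OUTER_IP_HDR + payload <= path_mtu:
        return [OUTER_IP_HDR + payload]  # fits, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (path_mtu - OUTER_IP_HDR) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(payload, per_fragment)
        sizes.append(OUTER_IP_HDR + chunk)
        payload -= chunk
    return sizes


print(gretap_fragments(1500, 1500))
```

Under these assumptions a full 1500-byte inner packet becomes a 1500-byte first fragment followed by a 58-byte second fragment, which fits the description of a big first fragment and a small second one.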

I have noticed that backports is now @ 3.14, I will retest with that
shortly.


-- Package-specific info:
** Version:
Linux version 3.12-0.bpo.1-amd64 (debian-kernel@lists.debian.org) (gcc version 
4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.12.9-1~bpo70+1 (2014-02-07)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.12-0.bpo.1-amd64 
root=UUID=49a2baa4-c4fb-4b25-a847-da38aabf6eb4 ro quiet rootdelay=10

** Not tainted

** Kernel log:
[  121.018798] ppdev: user-space parallel port driver
[  126.617541] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery 
directory
[  126.683548] NFSD: starting 90-second grace period (net 81883dc0)
[  153.542230] usb 4-3: USB disconnect, device number 2
[  153.576943] lenovo_tpkbd 0003:17EF:6009.0002: usb_submit_urb(ctrl) failed: 
-19
[  153.577006] lenovo_tpkbd 0003:17EF:6009.0002: usb_submit_urb(ctrl) failed: 
-19
[  180.619815] ip_tables: (C) 2000-2006 Netfilter Core Team
[  180.652075] ip6_tables: (C) 2000-2006 Netfilter Core Team
[  180.692646] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[  181.426186] tg3 :04:00.0 eth0: Link is down
[  184.867657] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
[  184.870369] device eth2 entered promiscuous mode
[  185.310452] IPv6: ADDRCONF(NETDEV_CHANGE): tap1: link becomes ready
[  185.310517] br0: port 2(tap1) entered forwarding state
[  185.310540] br0: port 2(tap1) entered forwarding state
[  185.340931] IPv6: ADDRCONF(NETDEV_CHANGE): tap0: link becomes ready
[  185.341022] br0: port 1(tap0) entered forwarding state
[  185.341053] br0: port 1(tap0) entered forwarding state
[  186.308096] br0: port 2(tap1) entered disabled state
[  186.969420] tg3 :04:00.0 eth0: Link is up at 1000 Mbps, full duplex
[  186.969449] tg3 :04:00.0 eth0: Flow control is off for TX and off for RX
[  187.945130] e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: 
None
[  187.945457] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  200.392550] br0: port 1(tap0) entered forwarding state
[  251.635353] Key type dns_resolver registered
[  251.665092] NFS: Registering the id_resolver key type
[  251.665138] Key type id_resolver registered
[  251.665143] Key type id_legacy registered
[  499.831589] br0: port 1(tap0) entered disabled state
[  526.554515] device l2tp0 entered promiscuous mode
[  526.554713] br0: port 3(l2tp0) entered forwarding state
[  526.554730] br0: port 3(l2tp0) entered forwarding state
[  541.589078] br0: port 3(l2tp0) entered forwarding state
[  906.452815] device l2tp1 entered promiscuous mode
[  906.452948] br0: port 4(l2tp1) entered forwarding state
[  906.452964] br0: port 4(l2tp1) entered forwarding state
[  921.507056] br0: port 4(l2tp1) entered forwarding state
[22276.366266] perf samples too long (2505 > 2500), lowering 
kernel.perf_event_max_sample_rate to 5
[25678.723114] ICMPv6 checksum failed [2a01:348:6:4c4::1 > 2a01:348:6:4c4::2]
[33901.361167] nr_pdflush_threads exported in /proc is scheduled for removal
[33901.361441] sysctl: The scan_unevictable_pages sysctl/node-interface has 
been disabled for lack of a legitimate use case.  If you have one, please send 
an email to linux...@kvack.org.
[34047.286645] device br0 entered promiscuous mode
[34054.745393] device br0 left promiscuous mode
[34509.021723] device lo entered promiscuous mode
[34523.995709] device lo left promiscuous mode
[34565.706977] device lo entered promiscuous mode
[34574.849005] device lo left promiscuous mode
[94270.499549] ICMPv6 checksum failed [2a01:348:6:4c4::1 > 2a01:348:6:4c4::2]
[100331.491781] ICMPv6 checksum failed [2a01:348:6:4c4::1 > 2a01:348:6:4c4::2]
[106812.568949] ICMPv6 checksum failed [2a01:348:6:4c4::1 > 2a01:348:6:4c4::2]
[126549.143683] CE: hpet increased min_delta_ns to 20115 nsec
[126549.143785] CE: hpet increased min_delta_ns to 30172 nsec
[126549.143893] CE: hpet increased min_delta_ns to 45258 nsec
[156243.297918] br0: port 4(l2tp1) entered disabled state
[156243.297967] br0: port 3(l2tp0) entered disabled state
[156260.892403] device l2tp1 left promiscuous mode
[156260.892419] br0: port 4(l2tp1) entered disabled state
[156260.892763] device l2tp0 left promiscuous mode
[156260.892769] br0: port 3(l2tp0) entered disabled state
[156260.893006] device tap1 left promiscuous mode
[156260.893012] br0: port 2(tap1) entered disabled state
[156260.893230] device tap0 left promiscuous mode
[156260.893235] br0: port 1(tap0) entered

Bug#751215: linux-image-3.12-0.bpo.1-amd64: bridge broken for tunnel interfaces

2014-06-11 Thread Anton Ivanov

Package: src:linux
Version: 3.12.9-1~bpo70+1
Severity: important

Dear Maintainer,

Tunnel interfaces using Ethernet over L2TPv3 are broken for bridge use.

Scenario:

l2tp0-br0-l2tp1 

ARP request from l2tp0 is emitted OK
ARP request travels across the bridge to l2tp1 OK (l2tp packets observed on
host)
ARP reply on l2tp1 is emitted OK
ARP reply is emitted by br0 as l2tp0, but the packet never arrives
on the l2tp client. It is eaten by the kernel somewhere. I have used several
alternative userspace eol2tp implementations (all working and tested) to test
this. Running them under a debugger shows that they never get the encapsulated
ARP reply packet (for some reason it ends up being eaten by the kernel). At the
same time tcpdump shows it on localhost.

As a result anything connected to l2tp0 can ping host (on br0), but cannot
ping anything on l2tp1 and vice versa.

Additional info not provided by normal scripts - tunnel setup:

ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 encap udp \
tunnel_id 1 peer_tunnel_id 1 udp_sport 16384 udp_dport 16385

ip l2tp add session name l2tp0 tunnel_id 1 session_id 0x \
peer_session_id 0x \
cookie deadbeefdeadbeef \
peer_cookie beefdeadbeefdead

/sbin/ifconfig l2tp0 mtu 1500 up

/sbin/brctl addif br0 l2tp0

ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 encap udp \
tunnel_id 2 peer_tunnel_id 2 udp_sport 16386 udp_dport 16387

ip l2tp add session name l2tp1 tunnel_id 2 session_id 0x \
peer_session_id 0x \
cookie deadbeefdeadbeef \
peer_cookie beefdeadbeefdead

/sbin/ifconfig l2tp1 mtu 1500 up

/sbin/brctl addif br0 l2tp1

So if I configure the bridge like that (perfectly legit config) it does not
work. 

-- Package-specific info:
** Version:
Linux version 3.12-0.bpo.1-amd64 (debian-kernel@lists.debian.org) (gcc version 
4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.12.9-1~bpo70+1 (2014-02-07)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.12-0.bpo.1-amd64 
root=UUID=49a2baa4-c4fb-4b25-a847-da38aabf6eb4 ro quiet rootdelay=10

** Not tainted

** Kernel log:
[   18.661366] l2tp_netlink: L2TP netlink interface
[   18.662553] l2tp_eth: L2TP ethernet pseudowire support (L2TPv3)
[   21.573268] EXT4-fs (md1): mounting ext3 file system using the ext4 subsystem
[   21.615349] EXT4-fs (md1): mounted filesystem with ordered data mode. Opts: 
(null)
[   21.643764] EXT4-fs (md2): mounting ext3 file system using the ext4 subsystem
[   21.704521] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: 
(null)
[   21.728474] EXT4-fs (sda1): mounting ext3 file system using the ext4 
subsystem
[   21.757964] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: 
(null)
[   21.781781] EXT4-fs (sdd1): mounting ext3 file system using the ext4 
subsystem
[   21.810221] EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts: 
(null)
[   23.570895] tg3 :04:00.0: irq 50 for MSI/MSI-X
[   24.346403] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   26.936826] tg3 :04:00.0 eth0: Link is up at 1000 Mbps, full duplex
[   26.936838] tg3 :04:00.0 eth0: Flow control is on for TX and on for RX
[   26.936877] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   61.093148] Bridge firewalling registered
[   61.141009] tun: Universal TUN/TAP device driver, 1.6
[   61.141015] tun: (C) 1999-2004 Max Krasnyansky m...@qualcomm.com
[   61.178845] IPv6: ADDRCONF(NETDEV_UP): tap0: link is not ready
[   61.180350] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
[   61.182004] device tap0 entered promiscuous mode
[   61.183701] device tap1 entered promiscuous mode
[   61.209149] br0: port 2(tap1) entered forwarding state
[   61.209163] br0: port 2(tap1) entered forwarding state
[   62.140931] br0: port 2(tap1) entered disabled state
[   98.287522] RPC: Registered named UNIX socket transport module.
[   98.287528] RPC: Registered udp transport module.
[   98.287531] RPC: Registered tcp transport module.
[   98.287534] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   98.331544] FS-Cache: Loaded
[   98.354281] FS-Cache: Netfs 'nfs' registered for caching
[   98.397833] Installing knfsd (copyright (C) 1996 o...@monad.swb.de).
[   98.683597] fuse init (API version 7.22)
[  104.033349] sit: IPv6 over IPv4 tunneling driver
[  120.652842] Bluetooth: Core ver 2.16
[  120.652897] NET: Registered protocol family 31
[  120.652902] Bluetooth: HCI device and connection manager initialized
[  120.652922] Bluetooth: HCI socket layer initialized
[  120.652928] Bluetooth: L2CAP socket layer initialized
[  120.652942] Bluetooth: SCO socket layer initialized
[  120.713330] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[  120.713337] Bluetooth: BNEP filters: protocol multicast
[  120.713352] Bluetooth: BNEP socket layer initialized
[  120.742620] Bluetooth: RFCOMM TTY layer initialized
[  120.742648] Bluetooth:

Bug#663906: linux-image-2.6.32-5-amd64: ksm does not work

2012-03-14 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-35squeeze2
Severity: normal


Enabling KSM with echo 1 > /sys/kernel/mm/ksm/run has no effect:
full_scans is always 0 and none of the other variables increment.
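
A small sketch for watching the counters over time, assuming Python is available and the sysfs file names from the kernel's KSM documentation (`run`, `full_scans`, `pages_shared`, `pages_sharing`, `pages_unshared`, `pages_volatile`); `read_ksm_counters` is a hypothetical helper. The directory is a parameter so the function can also be pointed at a saved copy of the files.

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"
KSM_FILES = ("run", "full_scans", "pages_shared", "pages_sharing",
             "pages_unshared", "pages_volatile")


def read_ksm_counters(base: str = KSM_DIR) -> dict:
    """Read the integer KSM counters that exist under `base`."""
    counters = {}
    for name in KSM_FILES:
        path = os.path.join(base, name)
        try:
            with open(path) as handle:
                counters[name] = int(handle.read().strip())
        except (OSError, ValueError):
            # File missing or not an integer: skip it rather than guess.
            continue
    return counters
```

Calling it twice a few seconds apart and comparing `full_scans` shows whether ksmd is scanning at all. Note that, per the kernel's KSM documentation, only memory that an application has registered with madvise(MADV_MERGEABLE) is considered for merging.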



-- Package-specific info:
** Version:
Linux version 2.6.32-5-amd64 (Debian 2.6.32-35squeeze2) (da...@debian.org) (gcc 
version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Fri Sep 9 20:23:16 UTC 2011

** Command line:
BOOT_IMAGE=/boot/vmlinuz-2.6.32-5-amd64 
root=UUID=49a2baa4-c4fb-4b25-a847-da38aabf6eb4 ro quiet

** Tainted: P (1)
 * Proprietary module has been loaded.

** Kernel log:
[1888762.816160] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 7e 61 43 00 00 01 00
[1888762.816185] ata1.00: cmd a0/01:00:00:00:08/00:00:00:00:00/a0 tag 0 dma 
2048 in
[1888762.816188]  res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 
(timeout)
[1888762.816195] ata1.00: status: { DRDY }
[1888762.816207] ata1: hard resetting link
[1888763.136059] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[1888763.160243] ata1.00: configured for UDMA/100
[1888763.165702] ata1: EH complete
[1888793.816223] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[1888793.816236] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 7e 61 43 00 00 01 00
[1888793.816260] ata1.00: cmd a0/01:00:00:00:08/00:00:00:00:00/a0 tag 0 dma 
2048 in
[1888793.816264]  res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 
(timeout)
[1888793.816270] ata1.00: status: { DRDY }
[1888793.816284] ata1: hard resetting link
[1888794.136058] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[1888794.160243] ata1.00: configured for UDMA/100
[1888794.160859] ata1: EH complete
[124.816182] ata1.00: limiting speed to UDMA/66:PIO4
[124.816192] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[124.816202] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 7e 61 43 00 00 01 00
[124.816226] ata1.00: cmd a0/01:00:00:00:08/00:00:00:00:00/a0 tag 0 dma 
2048 in
[124.816230]  res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 
(timeout)
[124.816236] ata1.00: status: { DRDY }
[124.816249] ata1: hard resetting link
[125.136058] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[125.160243] ata1.00: configured for UDMA/66
[125.160812] sr 0:0:0:0: [sr0] Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[125.160820] sr 0:0:0:0: [sr0] Sense Key : Aborted Command [current] 
[descriptor]
[125.160827] Descriptor sense data with sense descriptors (in hex):
[125.160831] 72 0b 00 00 00 00 00 0e 09 0c 00 00 00 02 00 00 
[125.160845] 00 08 00 00 a0 40 
[125.160853] sr 0:0:0:0: [sr0] Add. Sense: No additional sense information
[125.160860] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 7e 61 43 00 00 01 00
[125.160874] end_request: I/O error, dev sr0, sector 33129740
[125.160907] ata1: EH complete
[155.816222] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 
frozen
[155.816235] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 05 2c 00 00 01 00
[155.816259] ata1.00: cmd a0/01:00:00:00:08/00:00:00:00:00/a0 tag 0 dma 
2048 in
[155.816263]  res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 
(timeout)
[155.816269] ata1.00: status: { DRDY }
[155.816283] ata1: hard resetting link
[156.136058] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[156.160244] ata1.00: configured for UDMA/66
[156.160811] ata1: EH complete
[1898559.432027] usb 1-6: new high speed USB device using ehci_hcd and address 4
[1898560.344047] hub 1-0:1.0: unable to enumerate USB device on port 6
[1898560.716021] usb 4-2: new full speed USB device using ohci_hcd and address 2
[1898560.918991] usb 4-2: not running at top speed; connect to a high speed hub
[1898560.942991] usb 4-2: New USB device found, idVendor=1949, idProduct=0004
[1898560.942998] usb 4-2: New USB device strings: Mfr=1, Product=2, 
SerialNumber=3
[1898560.943004] usb 4-2: Product: Amazon Kindle
[1898560.943008] usb 4-2: Manufacturer: Amazon
[1898560.943012] usb 4-2: SerialNumber: B008D0A112830JDV
[1898560.944961] usb 4-2: configuration #1 chosen from 1 choice
[1898561.378518] Initializing USB Mass Storage driver...
[1898561.378723] scsi7 : SCSI emulation for USB Mass Storage devices
[1898561.378972] usbcore: registered new interface driver usb-storage
[1898561.378979] USB Mass Storage support registered.
[1898561.382766] usb-storage: device found at 2
[1898561.382772] usb-storage: waiting for device to settle before scanning
[1898566.382615] usb-storage: device scan complete
[1898566.389592] scsi 7:0:0:0: Direct-Access Kindle   Internal Storage 0100 
PQ: 0 ANSI: 2
[1898566.392415] sd 7:0:0:0: Attached scsi generic sg5 type 0
[1898566.416569] sd 7:0:0:0: [sdc] 6410688 512-byte logical blocks: (3.28 
GB/3.05 GiB)
[1898566.534602] sd 7:0:0:0: [sdc] Write Protect is off
[1898566.534611] sd 7:0:0:0: [sdc] Mode Sense: 0f 00 00 00
[1898566.534616] sd 7:0:0:0: [sdc] Assuming drive cache: write through

Bug#661162: This is fixed in 3.2 from backports

2012-03-09 Thread Anton Ivanov


Upgrade to backports fixes that.

Brgds,

A.



--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4f5a166d.9050...@kot-begemot.co.uk

Bug#661162: iwl wifi hangs on lots of traffic

2012-02-24 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-41
Severity: normal

iwl wifi hangs on heavy traffic. Mounting an nfs server via wifi
and untarring linux kernel gives a reproducible hang 1 out of 4 
times or so.

At the same time if the traffic is light it can work for days
with no problem.

-- Package-specific info:
** Version:
Linux version 2.6.32-5-amd64 (Debian 2.6.32-41) (b...@decadent.org.uk) (gcc 
version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Mon Jan 16 16:22:28 UTC 2012

** Command line:
BOOT_IMAGE=/boot/vmlinuz-2.6.32-5-amd64 
root=UUID=8aa6c209-ca56-4f03-a332-97c130d8cbf8 ro quiet

** Tainted: W (512)
 * Taint on warning.

** Kernel log:
[18243.724048] wlan0: associated
[18304.830702] wlan0: deauthenticated from 28:94:0f:75:75:f0 (Reason: 1)
[18308.103844] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[18308.107980] wlan0: direct probe responded
[18308.107987] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[18308.109739] wlan0: authenticated
[18308.109776] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[18308.113929] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[18308.113933] wlan0: associated
[18318.117506] wlan0: deauthenticating from 28:94:0f:75:75:e0 by local choice 
(reason=3)
[18321.397240] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[18321.402371] wlan0: direct probe responded
[18321.402377] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[18321.405751] wlan0: authenticated
[18321.405781] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[18321.408651] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[18321.408656] wlan0: associated
[18904.803655] wlan0: deauthenticated from 28:94:0f:75:75:e0 (Reason: 1)
[18908.156216] wlan0: direct probe to AP 28:94:0f:75:75:f0 (try 1)
[18908.355928] wlan0: direct probe to AP 28:94:0f:75:75:f0 (try 2)
[18908.356458] wlan0: direct probe responded
[18908.356463] wlan0: authenticate with AP 28:94:0f:75:75:f0 (try 1)
[18908.357216] wlan0: authenticated
[18908.357240] wlan0: associate with AP 28:94:0f:75:75:f0 (try 1)
[18908.555631] wlan0: associate with AP 28:94:0f:75:75:f0 (try 2)
[18908.556893] wlan0: RX AssocResp from 28:94:0f:75:75:f0 (capab=0x111 status=0 
aid=1)
[18908.556897] wlan0: associated
[19504.521603] wlan0: deauthenticating from 28:94:0f:75:75:f0 by local choice 
(reason=3)
[19504.561894] wlan0: direct probe to AP 28:94:0f:75:75:f0 (try 1)
[19504.562363] wlan0: direct probe responded
[19504.562366] wlan0: authenticate with AP 28:94:0f:75:75:f0 (try 1)
[19504.563622] wlan0: authenticated
[19504.563640] wlan0: associate with AP 28:94:0f:75:75:f0 (try 1)
[19504.759735] wlan0: associate with AP 28:94:0f:75:75:f0 (try 2)
[19504.762632] wlan0: deauthenticated from 28:94:0f:75:75:f0 (Reason: 6)
[19517.708072] wlan0: direct probe to AP 28:94:0f:75:75:f0 (try 1)
[19517.709101] wlan0: direct probe responded
[19517.709105] wlan0: authenticate with AP 28:94:0f:75:75:f0 (try 1)
[19517.904967] wlan0: authenticate with AP 28:94:0f:75:75:f0 (try 2)
[19517.905550] wlan0: authenticated
[19517.905581] wlan0: associate with AP 28:94:0f:75:75:f0 (try 1)
[19517.906740] wlan0: RX AssocResp from 28:94:0f:75:75:f0 (capab=0x111 status=0 
aid=1)
[19517.906744] wlan0: associated
[19560.702855] wlan0: deauthenticating from 28:94:0f:75:75:f0 by local choice 
(reason=3)
[19560.733096] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[19560.739341] wlan0: direct probe responded
[19560.739347] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[19560.741039] wlan0: authenticated
[19560.741071] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[19560.744230] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[19560.744236] wlan0: associated
[19644.170131] svc: failed to register lockdv1 RPC service (errno 97).
[20104.699514] wlan0: deauthenticated from 28:94:0f:75:75:e0 (Reason: 1)
[20107.943709] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[20107.947453] wlan0: direct probe responded
[20107.947460] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[20107.949211] wlan0: authenticated
[20107.949244] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[20107.961955] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[20107.961960] wlan0: associated
[20704.640487] wlan0: deauthenticated from 28:94:0f:75:75:e0 (Reason: 1)
[20707.980510] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[20707.984186] wlan0: direct probe responded
[20707.984192] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[20707.991442] wlan0: authenticated
[20707.991477] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[20707.994404] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[20707.994409] wlan0: associated
[21304.628216] wlan0: deauthenticated from 28:94:0f:75:75:e0 (Reason: 1)
[21307.937258] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[21307.940867] wlan0: direct probe responded
[21307.940874] wlan0: authenticate with AP

Bug#661163: e1000e ignores mitigation settings

2012-02-24 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-41
Severity: normal

Attempts to do QoS (CBQ using tc) show tell-tale signs of
interrupt mitigation at work. Bandwidth fluctuates and is
not measured properly.

I have set all applicable settings via ethtool to 0, and I have
also passed the relevant Intel-style parameters to modprobe:

TxIntDelay=0x0 TxAbsIntDelay=0x0 RxIntDelay=0x0 RxAbsIntDelay=0x0
InterruptThrottleRate=0x0

Neither makes any difference. The card continues to
behave as if mitigation is enabled.
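
One way to put a number on this is to measure the interrupt rate of the NIC under load. A minimal sketch, assuming the usual /proc/interrupts layout (IRQ label, one count column per CPU, then controller and device names) and that the interface name appears in the line; `irq_total` is a hypothetical helper and `eth0` in the usage note is only an example name.

```python
def irq_total(interrupts_text: str, device: str) -> int:
    """Sum the per-CPU interrupt counts of all lines naming `device`."""
    total = 0
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Skip the CPU header line and anything without an "NN:" label.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        # Device names come last; shared IRQs list them comma-separated.
        if not any(f.rstrip(",").startswith(device) for f in fields[1:]):
            continue
        for field in fields[1:]:
            if not field.isdigit():
                break  # counts end where the controller name begins
            total += int(field)
    return total
```

Reading /proc/interrupts twice, one second apart, and subtracting `irq_total(text, "eth0")` of the first sample from the second gives interrupts per second. Comparing that figure with the packet rate over the same interval would indicate whether interrupts are still being coalesced.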

Brgds,

-- Package-specific info:
** Version:
Linux version 2.6.32-5-amd64 (Debian 2.6.32-41) (b...@decadent.org.uk) (gcc 
version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Mon Jan 16 16:22:28 UTC 2012

** Command line:
BOOT_IMAGE=/boot/vmlinuz-2.6.32-5-amd64 
root=UUID=8aa6c209-ca56-4f03-a332-97c130d8cbf8 ro quiet

** Tainted: W (512)
 * Taint on warning.

** Kernel log:
[18243.724048] wlan0: associated
[18304.830702] wlan0: deauthenticated from 28:94:0f:75:75:f0 (Reason: 1)
[18308.103844] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[18308.107980] wlan0: direct probe responded
[18308.107987] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[18308.109739] wlan0: authenticated
[18308.109776] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[18308.113929] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[18308.113933] wlan0: associated
[18318.117506] wlan0: deauthenticating from 28:94:0f:75:75:e0 by local choice 
(reason=3)
[18321.397240] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[18321.402371] wlan0: direct probe responded
[18321.402377] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[18321.405751] wlan0: authenticated
[18321.405781] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[18321.408651] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[18321.408656] wlan0: associated
[18904.803655] wlan0: deauthenticated from 28:94:0f:75:75:e0 (Reason: 1)
[18908.156216] wlan0: direct probe to AP 28:94:0f:75:75:f0 (try 1)
[18908.355928] wlan0: direct probe to AP 28:94:0f:75:75:f0 (try 2)
[18908.356458] wlan0: direct probe responded
[18908.356463] wlan0: authenticate with AP 28:94:0f:75:75:f0 (try 1)
[18908.357216] wlan0: authenticated
[18908.357240] wlan0: associate with AP 28:94:0f:75:75:f0 (try 1)
[18908.555631] wlan0: associate with AP 28:94:0f:75:75:f0 (try 2)
[18908.556893] wlan0: RX AssocResp from 28:94:0f:75:75:f0 (capab=0x111 status=0 
aid=1)
[18908.556897] wlan0: associated
[19504.521603] wlan0: deauthenticating from 28:94:0f:75:75:f0 by local choice 
(reason=3)
[19504.561894] wlan0: direct probe to AP 28:94:0f:75:75:f0 (try 1)
[19504.562363] wlan0: direct probe responded
[19504.562366] wlan0: authenticate with AP 28:94:0f:75:75:f0 (try 1)
[19504.563622] wlan0: authenticated
[19504.563640] wlan0: associate with AP 28:94:0f:75:75:f0 (try 1)
[19504.759735] wlan0: associate with AP 28:94:0f:75:75:f0 (try 2)
[19504.762632] wlan0: deauthenticated from 28:94:0f:75:75:f0 (Reason: 6)
[19517.708072] wlan0: direct probe to AP 28:94:0f:75:75:f0 (try 1)
[19517.709101] wlan0: direct probe responded
[19517.709105] wlan0: authenticate with AP 28:94:0f:75:75:f0 (try 1)
[19517.904967] wlan0: authenticate with AP 28:94:0f:75:75:f0 (try 2)
[19517.905550] wlan0: authenticated
[19517.905581] wlan0: associate with AP 28:94:0f:75:75:f0 (try 1)
[19517.906740] wlan0: RX AssocResp from 28:94:0f:75:75:f0 (capab=0x111 status=0 
aid=1)
[19517.906744] wlan0: associated
[19560.702855] wlan0: deauthenticating from 28:94:0f:75:75:f0 by local choice 
(reason=3)
[19560.733096] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[19560.739341] wlan0: direct probe responded
[19560.739347] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[19560.741039] wlan0: authenticated
[19560.741071] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[19560.744230] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[19560.744236] wlan0: associated
[19644.170131] svc: failed to register lockdv1 RPC service (errno 97).
[20104.699514] wlan0: deauthenticated from 28:94:0f:75:75:e0 (Reason: 1)
[20107.943709] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[20107.947453] wlan0: direct probe responded
[20107.947460] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[20107.949211] wlan0: authenticated
[20107.949244] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[20107.961955] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[20107.961960] wlan0: associated
[20704.640487] wlan0: deauthenticated from 28:94:0f:75:75:e0 (Reason: 1)
[20707.980510] wlan0: direct probe to AP 28:94:0f:75:75:e0 (try 1)
[20707.984186] wlan0: direct probe responded
[20707.984192] wlan0: authenticate with AP 28:94:0f:75:75:e0 (try 1)
[20707.991442] wlan0: authenticated
[20707.991477] wlan0: associate with AP 28:94:0f:75:75:e0 (try 1)
[20707.994404] wlan0: RX AssocResp from 28:94:0f:75:75:e0 (capab=0x431 status=0 
aid=1)
[20707.994409] wlan0: associated

Bug#618744: nfsd gets stuck in D state

2012-02-12 Thread Anton Ivanov


On 10/02/12 01:38, Jonathan Nieder wrote:

Hi,

Anton Ivanov wrote:

   

nfsd gets stuck in D state. Initially some machines, later all which read off
the nfs server fail to read. Messages like:

Mar 17 22:03:34 localhost kernel: [1899559.532028] statd: server rpc.statd not 
responding, timed out
Mar 17 22:03:34 localhost kernel: [1899559.532055] lockd: cannot monitor greebo
 

Mph, that's no good.

What kernel do you use now?  Any changes?  Can you use alt+sysrq; w
while in that state to get backtraces for blocked tasks, in case it
helps find a deadlock?  (You may need to use echo 1 > /proc/sys/kernel/sysrq
before that will work; see Documentation/sysrq.txt for details.)
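
On a machine without a console keyboard the same blocked-task dump can be requested through procfs. A minimal sketch, assuming root privileges; the helper name and the proc_root parameter are hypothetical and exist only so the function can be pointed at a scratch directory:

```python
from pathlib import Path


def request_blocked_task_dump(proc_root: str = "/proc") -> None:
    """Ask the kernel to log backtraces of blocked (D state) tasks."""
    root = Path(proc_root)
    # Enable all SysRq functions; this is what the keyboard combination needs.
    (root / "sys" / "kernel" / "sysrq").write_text("1\n")
    # 'w' is the SysRq command that dumps tasks in uninterruptible sleep.
    (root / "sysrq-trigger").write_text("w\n")
```

Calling request_blocked_task_dump() as root should leave the backtraces in the kernel log, where dmesg can pick them up.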
   


I have upgraded most machines to current stable and I have stopped 
seeing this.



Hope that helps,
Jonathan

   





--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4f380add.3090...@kot-begemot.co.uk

Bug#576405: linux-image-2.6.26: Deadlock during combined NFS3/NFS4 use

2011-09-22 Thread Anton Ivanov


On 12/09/11 10:07, Jonathan Nieder wrote:

Hi Anton,

Anton Ivanov wrote:

   

When an export is exported and mounted via autofs using BOTH
NFSv3 and NFSv4 the NFSv4 one deadlocks.
 

We ought to have put this in the hands of upstream about a year ago.
Better late than never, so I will echo Ben:

Ben Hutchings wrote:

   

OK.  Next, can you test whether the kernel version in unstable
(linux-image-2.6.32-4-* version 2.6.32-10) or testing
(linux-image-2.6.32-3-* version 2.6.32-9) also has this bug?
 


I cannot reproduce it on 2.6.32 with autofs5 (squeeze). You can close it.

In general 2.6.26/autofs4 (lenny) was quite fragile with automounted 
nfs4. nfs4 itself was OK, autofs itself was OK, together the combination 
was rather explosive.


Squeeze fixed all of that. I have yet to observe a single problem with 
nfs4 + autofs on my squeeze systems.



Can you reproduce this with a recent (3.x) kernel?  (If so, upstream
might care, and if not, we can try to find the fix and backport it.)

Thanks for an interesting report, and sorry to have left it hanging.
Jonathan

   



--
Humans are allergic to change. They love to say, We've always
done it this way. I try to fight that. That's why I have a clock
on my wall that runs counter-clockwise.   -- R.A. Grace Hopper

A. R. Ivanov
E-mail:  anton.iva...@kot-begemot.co.uk




--
Archive: http://lists.debian.org/4e7b3f3d.2010...@kot-begemot.co.uk

Bug#576405: linux-image-2.6.26: Deadlock during combined NFS3/NFS4 use

2011-09-12 Thread Anton Ivanov


On 12/09/11 10:07, Jonathan Nieder wrote:

Hi Anton,

Anton Ivanov wrote:

   

When an export is exported and mounted via autofs using BOTH
NFSv3 and NFSv4 the NFSv4 one deadlocks.
 

We ought to have put this in the hands of upstream about a year ago.
Better late than never, so I will echo Ben:

Ben Hutchings wrote:

   

OK.  Next, can you test whether the kernel version in unstable
(linux-image-2.6.32-4-* version 2.6.32-10) or testing
(linux-image-2.6.32-3-* version 2.6.32-9) also has this bug?
 

Can you reproduce this with a recent (3.x) kernel?  (If so, upstream
might care, and if not, we can try to find the fix and backport it.)
   



I will try to find some time to retest with current and 3.x this week. I 
do not recall seeing it on 2.6.32 lately. I have, however, changed back 
to v3 a lot of autofs entries so this is not indicative.



Thanks for an interesting report, and sorry to have left it hanging.
   


No worries.


Jonathan

   



--
Humans are allergic to change. They love to say, We've always
done it this way. I try to fight that. That's why I have a clock
on my wall that runs counter-clockwise.   -- R.A. Grace Hopper

A. R. Ivanov
E-mail:  anton.iva...@kot-begemot.co.uk




--
Archive: http://lists.debian.org/4e6dd52a.9000...@kot-begemot.co.uk

Bug#629428: linux-image-2.6.32-5-686: rtl818x broken for RTL-8185 IEEE 802.11a/b/g

2011-06-06 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-34squeeze1
Severity: normal


The card is identified correctly, but never manages to associate 
with an AP.

The card is a generic rtl8185 cannibalised out of a Maplin HTC
PC bundle.

The dmesg output is self-explanatory.

Unfortunately the web site on sourceforge seems a bit blank so
I cannot pick up a more recent driver to test build to see 
if this can be fixed.

-- Package-specific info:
** Version:
Linux version 2.6.32-5-686 (Debian 2.6.32-34squeeze1) (da...@debian.org) (gcc 
version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed May 18 07:08:50 UTC 2011

** Command line:
BOOT_IMAGE=/boot/vmlinuz-2.6.32-5-686 
root=UUID=ddf5a907-47ff-4167-adc8-f00fd59ed1f8 ro acpi_enforce_resources=lax 
quiet

** Not tainted

** Kernel log:
[2.646116] uhci_hcd :00:10.3: PCI INT B - Link[ALKB] - GSI 21 (level, 
low) - IRQ 21
[2.646134] uhci_hcd :00:10.3: setting latency timer to 64
[2.646141] uhci_hcd :00:10.3: UHCI Host Controller
[2.646160] uhci_hcd :00:10.3: new USB bus registered, assigned bus 
number 5
[2.646199] uhci_hcd :00:10.3: irq 21, io base 0xf400
[2.646281] usb usb5: New USB device found, idVendor=1d6b, idProduct=0001
[2.646289] usb usb5: New USB device strings: Mfr=3, Product=2, 
SerialNumber=1
[2.646295] usb usb5: Product: UHCI Host Controller
[2.646300] usb usb5: Manufacturer: Linux 2.6.32-5-686 uhci_hcd
[2.646306] usb usb5: SerialNumber: :00:10.3
[2.646640] usb usb5: configuration #1 chosen from 1 choice
[2.647236] hub 5-0:1.0: USB hub found
[2.647276] hub 5-0:1.0: 2 ports detected
[2.824026] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[2.988236] ata2.00: ATA-7: Maxtor 6Y080M0, YAR511W0, max UDMA/100
[2.988245] ata2.00: 156301488 sectors, multi 16: LBA48 
[3.004253] ata2.00: configured for UDMA/100
[3.004477] scsi 1:0:0:0: Direct-Access ATA  Maxtor 6Y080M0   YAR5 
PQ: 0 ANSI: 5
[3.303701] sd 1:0:0:0: [sda] 156301488 512-byte logical blocks: (80.0 
GB/74.5 GiB)
[3.303789] sd 1:0:0:0: [sda] Write Protect is off
[3.303796] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[3.303832] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[3.304151]  sda: sda1 sda2
[3.326469] sd 1:0:0:0: [sda] Attached SCSI disk
[3.599716] PM: Starting manual resume from disk
[3.599727] PM: Resume from partition 8:1
[3.599732] PM: Checking hibernation image.
[3.600083] PM: Error -22 checking image file
[3.600087] PM: Resume from disk failed.
[3.674961] kjournald starting.  Commit interval 5 seconds
[3.674984] EXT3-fs: mounted filesystem with ordered data mode.
[5.254134] udev[309]: starting version 164
[5.907667] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[5.936574] input: Power Button as 
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[5.936593] ACPI: Power Button [PWRB]
[5.936735] input: Sleep Button as 
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input1
[5.936752] ACPI: Sleep Button [SLPB]
[5.936915] input: Power Button as 
/devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[5.936925] ACPI: Power Button [PWRF]
[6.076088] processor LNXCPU:00: registered as cooling_device0
[6.113043] input: PC Speaker as /devices/platform/pcspkr/input/input3
[6.197120] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[6.261327] parport_pc 00:0a: reported by Plug and Play ACPI
[6.261383] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[6.458051] cfg80211: Using static regulatory domain info
[6.458058] cfg80211: Regulatory domain: US
[6.458064]  (start_freq - end_freq @ bandwidth), (max_antenna_gain, 
max_eirp)
[6.458072]  (2402000 KHz - 2472000 KHz @ 4 KHz), (600 mBi, 2700 mBm)
[6.458079]  (517 KHz - 519 KHz @ 4 KHz), (600 mBi, 2300 mBm)
[6.458087]  (519 KHz - 521 KHz @ 4 KHz), (600 mBi, 2300 mBm)
[6.458095]  (521 KHz - 523 KHz @ 4 KHz), (600 mBi, 2300 mBm)
[6.458102]  (523 KHz - 533 KHz @ 4 KHz), (600 mBi, 2300 mBm)
[6.458110]  (5735000 KHz - 5835000 KHz @ 4 KHz), (600 mBi, 3000 mBm)
[6.459361] cfg80211: Calling CRDA for country: US
[7.418256] rtl8180 :00:08.0: PCI INT A - GSI 16 (level, low) - IRQ 16
[7.659638] phy0: Selected rate control algorithm 'minstrel'
[7.661526] phy0: hwaddr 00:e0:46:50:00:40, RTL8185vD + rtl8225z2
[8.002042] ACPI: PCI Interrupt Link [ALKC] enabled at IRQ 22
[8.002065] VIA 82xx Audio :00:11.5: PCI INT C - Link[ALKC] - GSI 22 
(level, low) - IRQ 22
[8.002248] VIA 82xx Audio :00:11.5: setting latency timer to 64
[9.498139] Adding 1959888k swap on /dev/sda1.  Priority:-1 extents:1 
across:1959888k 
[9.763059] EXT3 FS on sda2, internal journal
[9.974774] loop: module loaded
[   10.124184] it87: Found IT8716F chip at 0x290, revision 1
[   10.124196] it87: in3 is VCC (+5V)
[   10.124200]

Bug#621737: linux-image-2.6.32-5-powerpc: ath ignores regulatory domain setting

2011-04-10 Thread Anton Ivanov


On 04/10/11 00:55, Ben Hutchings wrote:

On Fri, 2011-04-08 at 13:39 +0100, Anton Ivanov wrote:
   

Package: linux-2.6
Version: 2.6.32-30
Severity: minor


ath driver ignores reg domain setting passed via cfg80211 and uses one
from EEPROM instead. This setting on a lot of cheap cards is CN. As a result
the reg domain is set incorrectly (and for some countries illegally).
 

[...]

What do you mean by 'passed via cfg80211'?  Are you setting the
ieee80211_regdom module parameter?

Ben.

   

Yes. No effect. ath still reads from eeprom.

--
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ai...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






--
Archive: http://lists.debian.org/4da14eb0.10...@sigsegv.cx

Bug#621737: linux-image-2.6.32-5-powerpc: ath ignores regulatory domain setting

2011-04-10 Thread Anton Ivanov


On 04/10/11 16:23, Stefan Lippers-Hollmann wrote:

Hi

On Sunday 10 April 2011, Anton Ivanov wrote:
   

On 04/10/11 00:55, Ben Hutchings wrote:
 

On Fri, 2011-04-08 at 13:39 +0100, Anton Ivanov wrote:
   

[...]
   

ath driver ignores reg domain setting passed via cfg80211 and uses one
from EEPROM instead. This setting on a lot of cheap cards is CN. As a result
the reg domain is set incorrectly (and for some countries illegally).
 

[...]
   

What do you mean by 'passed via cfg80211'?  Are you setting the
ieee80211_regdom module parameter?
   

[...]
   

Yes. No effect. ath still reads from eeprom.
 

The EEPROM settings are authoritative, you can only restrict the
regulatory settings further to aid regulatory compliance in different
regions, but never relax them. Tools like crda always intersect the
EEPROM's (OTP in newer chipset generations) with the chosen regulatory
domain as provided by wireless-regdb or the in-kernel regdb; regulatory
hints like IEEE 802.11d may also restrict the allowed frequencies even
further.

http://wireless.kernel.org/en/users/Drivers/ath#Regulatory

This is intended behaviour and required for FCC compliance (keep in mind
that calibration data is also only validated for the given regdomain),
not a bug.
   


So a card that returns only CN from EEPROM is basically intended to be 
sold _ONLY_ in China. Right?


Brgds,


Regards
Stefan Lippers-Hollmann

   



--
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ai...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






--
Archive: http://lists.debian.org/4da1f416.9000...@sigsegv.cx

Bug#621737: linux-image-2.6.32-5-powerpc: ath ignores regulatory domain setting

2011-04-10 Thread Anton Ivanov


OK, cool, thanks, you can close the bug.

I agree - thankfully it is not a 5GHz part so no harm done from it 
reporting CN.


Brgds,

On 04/10/11 20:08, Stefan Lippers-Hollmann wrote:

Hi

On Sunday 10 April 2011, Anton Ivanov wrote:
   

On 04/10/11 16:23, Stefan Lippers-Hollmann wrote:
 

Hi

On Sunday 10 April 2011, Anton Ivanov wrote:

   

On 04/10/11 00:55, Ben Hutchings wrote:

 

On Fri, 2011-04-08 at 13:39 +0100, Anton Ivanov wrote:
   

[...]
   

Yes. No effect. ath still reads from eeprom.

 

The EEPROM settings are authoritative, you can only restrict the
regulatory settings further to aid regulatory compliance in different
regions, but never relax them. Tools like crda always intersect the
EEPROM's (OTP in newer chipset generations) with the chosen regulatory
domain as provided by wireless-regdb or the in-kernel regdb; regulatory
hints like IEEE 802.11d may also restrict the allowed frequencies even
further.

http://wireless.kernel.org/en/users/Drivers/ath#Regulatory

This is intended behaviour and required for FCC compliance (keep in mind
that calibration data is also only validated for the given regdomain),
not a bug.

   

So a card that returns only CN from EEPROM is basically intended to be
sold _ONLY_ in China. Right?
 

[...]

Correct, it's arguably even illegal to sell in ETSI regions. Although
it's technically a little more complex as Atheros groups regdom regions
with identical mappings together[1], which makes reading the EEPROM
based regulatory domain code a bit strange (the alphabetically first
match corresponding to the regdom group gets printed to dmesg).

In your particular case, with a 2.4 GHz-only AR2417 PHY, 0x52
(APL1_WORLD vs ETSI1_WORLD, GB) doesn't actually do any harm, as 'CN'
allows channel 1-13 just as well as the most permissive regdomains
(ch14 in Japan is only allowed for CSMA/CA == 11 MBit/s, not the more
common OFDM rates (= 54 MBit/s)). So even though your device is
wrongly programmed, it doesn't actually limit your abilities (unless
you'd add an additional 5 GHz capable card, which would suffer from an
'unfortunate' intersection) - and neither allows you to access
non-public frequency bands. This situation would be seriously worse
(both technically and legally) for 5 GHz operations, but your device
doesn't support that anyways.

country CN:
 (2402 - 2482 @ 40), (N/A, 20)
 (5735 - 5835 @ 40), (N/A, 30)

country GB:
 (2402 - 2482 @ 40), (N/A, 20)
 (5170 - 5250 @ 40), (N/A, 20)
 (5250 - 5330 @ 40), (N/A, 20), DFS
 (5490 - 5710 @ 40), (N/A, 27), DFS
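
The intersection described above can be checked by hand with the two country entries quoted in this message. A rough sketch, assuming a simplified rule model (frequency range in MHz, maximum bandwidth, maximum EIRP) and ignoring the DFS flag; Rule and intersect are illustrative names, not the kernel's or crda's actual code:

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    start_mhz: int
    end_mhz: int
    max_bw_mhz: int
    max_eirp_dbm: int


# Values copied from the "country CN" and "country GB" entries above.
CN = [Rule(2402, 2482, 40, 20), Rule(5735, 5835, 40, 30)]
GB = [
    Rule(2402, 2482, 40, 20),
    Rule(5170, 5250, 40, 20),
    Rule(5250, 5330, 40, 20),
    Rule(5490, 5710, 40, 27),
]


def intersect(a: List[Rule], b: List[Rule]) -> List[Rule]:
    """Keep only what both rule sets allow, with the stricter limits."""
    result = []
    for ra in a:
        for rb in b:
            start = max(ra.start_mhz, rb.start_mhz)
            end = min(ra.end_mhz, rb.end_mhz)
            if start < end:  # the two frequency ranges overlap
                result.append(
                    Rule(
                        start,
                        end,
                        min(ra.max_bw_mhz, rb.max_bw_mhz),
                        min(ra.max_eirp_dbm, rb.max_eirp_dbm),
                    )
                )
    return result


print(intersect(CN, GB))
```

With these tables only the 2.4 GHz rule survives: the CN 5 GHz range (5735 - 5835) does not overlap any of the GB 5 GHz ranges, which is the 'unfortunate' intersection an additional 5 GHz capable card would run into.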


However I'm aware of the sad truth that most commonly sold cards are
wrongly programmed for CN or (worse for 2.4 GHz operations) US...

Regards
Stefan Lippers-Hollmann

[1] http://wireless.kernel.org/en/users/Drivers/ath#line-28

   



--
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ai...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






--
Archive: http://lists.debian.org/4da20b48.4070...@sigsegv.cx

Bug#621737: linux-image-2.6.32-5-powerpc: ath ignores regulatory domain setting

2011-04-08 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-30
Severity: minor


ath driver ignores reg domain setting passed via cfg80211 and uses one 
from EEPROM instead. This setting on a lot of cheap cards is CN. As a result
the reg domain is set incorrectly (and for some countries illegally).

The dmesg output is self-explanatory.
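
The relevant lines can be pulled out of a long kernel log mechanically. A small sketch, assuming the message formats seen in the log below ("ath: Country alpha2 being used" and "cfg80211: Calling CRDA for country"); the function name and the "eeprom"/"crda" labels are made up for illustration:

```python
import re
from typing import List, Tuple

# Patterns modelled on the log lines quoted in this report.
ALPHA2_RE = re.compile(r"ath: Country alpha2 being used: ([A-Z0-9]{2})")
CRDA_RE = re.compile(r"cfg80211: Calling CRDA for country: ([A-Z0-9]{2})")


def regdomain_events(dmesg_text: str) -> List[Tuple[str, str]]:
    """Return (source, country) pairs in the order they appear in the log."""
    events = []
    for line in dmesg_text.splitlines():
        for source, pattern in (("eeprom", ALPHA2_RE), ("crda", CRDA_RE)):
            match = pattern.search(line)
            if match:
                events.append((source, match.group(1)))
    return events
```

Fed the log below, such a filter would show the statically configured EU domain being followed by CN once the card is inserted.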


-- Package-specific info:
** Version:
Linux version 2.6.32-5-powerpc (Debian 2.6.32-30) (b...@decadent.org.uk) (gcc 
version 4.3.5 (Debian 4.3.5-4) ) #1 Wed Jan 12 04:47:03 UTC 2011

** Command line:
root=/dev/hda4 ro 

** Tainted: W (512)
 * Taint on warning.

** Kernel log:
[37906.170686] PHY ID: 2060e1, addr: 0
[37908.211322] eth1: Airport waking up
[37908.659045] hda: host max PIO4 wanted PIO255(auto-tune) selected PIO4
[37908.659178] adb: starting probe task...
[37908.660207] hda: UDMA/66 mode selected
[37908.664146] hdc: host max PIO4 wanted PIO255(auto-tune) selected PIO4
[37908.664517] hdc: MWDMA2 mode selected
[37908.907906] adb devices: [2]: 2 c4 [3]: 3 1 [7]: 7 1f
[37908.913808] ADB keyboard at 2, handler 1
[37908.928828] ADB mouse at 3, handler set to 4 (trackpad)
[37908.987527] adb: finished probe task...
[37909.512236] PM: Finishing wakeup.
[37909.512244] Restarting tasks ... done.
[37909.986288] ath5k 0001:11:00.0: enabling device ( - 0002)
[37909.986396] ath5k 0001:11:00.0: registered as 'phy1'
[37910.772036] ath: EEPROM regdomain: 0x809c
[37910.772046] ath: EEPROM indicates we should expect a country code
[37910.772054] ath: doing EEPROM country-regdmn map search
[37910.772061] ath: country maps to regdmn code: 0x52
[37910.772069] ath: Country alpha2 being used: CN
[37910.772075] ath: Regpair used: 0x52
[37910.847157] agpgart-uninorth :00:0b.0: putting AGP V2 device into 4x mode
[37910.847179] radeonfb :00:10.0: putting AGP V2 device into 4x mode
[37910.874578] phy1: Selected rate control algorithm 'minstrel'
[37910.878605] ath5k phy1: Atheros AR2417 chip found (MAC: 0xf0, PHY: 0x70)
[37910.878634] cfg80211: Calling CRDA for country: CN
[37912.612186] ondemand governor failed, too long transition latency of HW, 
fallback to performance governor
[37988.110403] ondemand governor failed, too long transition latency of HW, 
fallback to performance governor
[37988.359273] hda: host max PIO4 wanted PIO0 selected PIO0
[38096.963649] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[38102.988503] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[38103.192041] ADDRCONF(NETDEV_UP): eth0: link is not ready
[38103.454283] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[38106.938822] wlan0: direct probe to AP 00:90:4c:91:00:03 (try 1)
[38106.940456] wlan0: direct probe responded
[38106.940465] wlan0: authenticate with AP 00:90:4c:91:00:03 (try 1)
[38106.942155] wlan0: authenticated
[38106.942187] wlan0: associate with AP 00:90:4c:91:00:03 (try 1)
[38106.98] wlan0: RX AssocResp from 00:90:4c:91:00:03 (capab=0x411 status=0 
aid=5)
[38106.944457] wlan0: associated
[38106.945747] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[38153.913695] tun0: Disabled Privacy Extensions
[38513.727694] pcmcia_socket pcmcia_socket0: pccard: card ejected from slot 0
[38513.763178] wlan0: deauthenticating from 00:90:4c:91:00:03 by local choice 
(reason=3)
[38513.808570] ath5k phy1: failed to wakeup the MAC Chip
[38514.658638] ADDRCONF(NETDEV_UP): eth0: link is not ready
[38582.438927] cfg80211: Using static regulatory domain info
[38582.438937] cfg80211: Regulatory domain: EU
[38582.438941]  (start_freq - end_freq @ bandwidth), (max_antenna_gain, 
max_eirp)
[38582.438949]  (2402000 KHz - 2482000 KHz @ 4 KHz), (600 mBi, 2000 mBm)
[38582.438956]  (517 KHz - 519 KHz @ 4 KHz), (600 mBi, 2300 mBm)
[38582.438964]  (519 KHz - 521 KHz @ 4 KHz), (600 mBi, 2300 mBm)
[38582.438971]  (521 KHz - 523 KHz @ 4 KHz), (600 mBi, 2300 mBm)
[38582.438978]  (523 KHz - 533 KHz @ 4 KHz), (600 mBi, 2000 mBm)
[38582.438985]  (549 KHz - 571 KHz @ 4 KHz), (600 mBi, 3000 mBm)
[38582.442195] cfg80211: Calling CRDA for country: EU
[38582.442360] cfg80211: Calling CRDA for country: EU
[38590.296292] pcmcia_socket pcmcia_socket0: pccard: CardBus card inserted into 
slot 0
[38590.296354] pci 0001:11:00.0: reg 10 32bit mmio: [0x00-0x00]
[38590.366503] ath5k 0001:11:00.0: enabling device ( - 0002)
[38590.366594] ath5k 0001:11:00.0: registered as 'phy0'
[38590.869302] ath: EEPROM regdomain: 0x809c
[38590.869309] ath: EEPROM indicates we should expect a country code
[38590.869315] ath: doing EEPROM country-regdmn map search
[38590.869321] ath: country maps to regdmn code: 0x52
[38590.869326] ath: Country alpha2 being used: CN
[38590.869331] ath: Regpair used: 0x52
[38590.870149] phy0: Selected rate control algorithm 'minstrel'
[38590.902742] ath5k phy0: Atheros AR2417 chip found (MAC: 0xf0, PHY: 0x70)
[38590.902765] cfg80211: Calling CRDA for country: CN
[38591.064341] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[38593.360236] wlan0: direct probe to AP 00:90:4c:91:00:03 (try 1)

Bug#618744: linux-image-2.6.32-5-amd64: nfsd gets stuck in D state

2011-03-18 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-30
Severity: important


nfsd gets stuck in D state. Initially some machines, later all which read off
the nfs server fail to read. Messages like:

Mar 17 22:03:34 localhost kernel: [1899559.532028] statd: server rpc.statd not 
responding, timed out
Mar 17 22:03:34 localhost kernel: [1899559.532055] lockd: cannot monitor greebo

appear in the kernel.
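
Tasks stuck in D state can be listed straight from procfs, without relying on ps. A minimal sketch, assuming the usual /proc/PID/stat layout ("pid (comm) state ..."); the function names and the proc_root parameter are hypothetical:

```python
from pathlib import Path
from typing import List, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Split a /proc/PID/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so search for the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every process in uninterruptible sleep."""
    found = []
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_stat(stat_file.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

On the affected server, blocked_tasks() would be expected to list the nfsd threads and the rpc.nfsd process from the trace below.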

I tried to restart nfs-kernel-daemon and got:

Mar 17 22:26:14 localhost kernel: [1900919.668030] rpcbind: server localhost 
not responding, timed out
Mar 17 22:26:15 localhost kernel: [1900920.452074] INFO: task rpc.nfsd:12072 
blocked for more than 120 seconds.
Mar 17 22:26:15 localhost kernel: [1900920.452150] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 17 22:26:15 localhost kernel: [1900920.452226] rpc.nfsd  D 
 0 12072  12061 0x0004
Mar 17 22:26:15 localhost kernel: [1900920.452238]  814611f0 
0082 880146d0 88007f803110
Mar 17 22:26:15 localhost kernel: [1900920.452249]  03961016 
0001 f9e0 880035d3bfd8
Mar 17 22:26:15 localhost kernel: [1900920.452258]  00015780 
00015780 880003863880 880003863b78
Mar 17 22:26:15 localhost kernel: [1900920.452267] Call Trace:
Mar 17 22:26:15 localhost kernel: [1900920.452284]  [810b9ea8] ? 
__alloc_pages_nodemask+0x11c/0x5f4
Mar 17 22:26:15 localhost kernel: [1900920.452303]  [812fb05a] ? 
__mutex_lock_common+0x122/0x192
Mar 17 22:26:15 localhost kernel: [1900920.452315]  [812fb182] ? 
mutex_lock+0x1a/0x31
Mar 17 22:26:15 localhost kernel: [1900920.452339]  [a0f7ac44] ? 
write_ports+0x2a/0x28a [nfsd]
Mar 17 22:26:15 localhost kernel: [1900920.452348]  [810b92e4] ? 
__get_free_pages+0x9/0x46
Mar 17 22:26:15 localhost kernel: [1900920.452358]  [81106ce3] ? 
simple_transaction_get+0x8c/0xa6
Mar 17 22:26:15 localhost kernel: [1900920.452375]  [a0f7ac1a] ? 
write_ports+0x0/0x28a [nfsd]
Mar 17 22:26:15 localhost kernel: [1900920.452392]  [a0f7a971] ? 
nfsctl_transaction_write+0x43/0x64 [nfsd]
Mar 17 22:26:15 localhost kernel: [1900920.452409]  [a0f7b85a] ? 
nfsctl_transaction_read+0x27/0x4d [nfsd]
Mar 17 22:26:15 localhost kernel: [1900920.452420]  [810ef252] ? 
vfs_read+0xa6/0xff
Mar 17 22:26:15 localhost kernel: [1900920.452428]  [810ef367] ? 
sys_read+0x45/0x6e
Mar 17 22:26:15 localhost kernel: [1900920.452437]  [81010b42] ? 
system_call_fastpath+0x16/0x1b
Mar 17 22:26:44 localhost kernel: [1900949.668032] rpcbind: server localhost 
not responding, timed out


The machine is stable, has been used for around a year in the current hardware 
config with 2.6.26 and 2.6.32.bpo

My suspicion is offload in e1000. Something similar to this one:

http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg02489.html

I have turned off all offload except checksumming for the time being.
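
For reference, a sketch of how such a workaround could be scripted. The exact set of features that was disabled is not stated in this report, so the default list (tso, gso, gro, lro) is an assumption, as is the function name; checksum offload (rx, tx) is deliberately left untouched:

```python
from typing import List, Sequence

# Assumed feature set: segmentation and receive-coalescing offloads only.
DEFAULT_OFFLOADS = ("tso", "gso", "gro", "lro")


def ethtool_offload_off_command(
    interface: str, features: Sequence[str] = DEFAULT_OFFLOADS
) -> List[str]:
    """Build an 'ethtool -K' argument list that turns the given features off."""
    argv = ["ethtool", "-K", interface]
    for feature in features:
        argv += [feature, "off"]
    return argv


print(" ".join(ethtool_offload_off_command("eth0")))
```

The resulting list can be handed to subprocess.run(..., check=True) as root; "ethtool -k eth0" then shows which settings the driver actually accepted.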

-- Package-specific info:
** Version:
Linux version 2.6.32-5-amd64 (Debian 2.6.32-30) (b...@decadent.org.uk) (gcc 
version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 03:40:32 UTC 2011

** Command line:
BOOT_IMAGE=/boot/vmlinuz-2.6.32-5-amd64 
root=UUID=49a2baa4-c4fb-4b25-a847-da38aabf6eb4 ro quiet

** Tainted: P (1)
 * Proprietary module has been loaded.

** Kernel log:
[4.200149] raid0:   == UNIQUE
[4.200150] raid0: 1 zones
[4.200151] raid0: looking at sdb3
[4.200153] raid0:   comparing sdb3(1297270784)
[4.200155]  with sda3(1297270784)
[4.200156] raid0:   EQUAL
[4.200158] raid0: FINAL 1 zones
[4.200162] raid0: done.
[4.200164] raid0 : md_size is 2594541568 sectors.
[4.200167] *** md2 configuration *
[4.200168] zone0=[sda3/sdb3/]
[4.200172] zone offset=0kb device offset=0kb size=1297270784kb
[4.200173] **
[4.200174] 
[4.200225] md2: detected capacity change from 0 to 1328405282816
[4.202838]  md2: unknown partition table
[4.461090] kjournald starting.  Commit interval 5 seconds
[4.461102] EXT3-fs: mounted filesystem with ordered data mode.
[5.654925] udev[404]: starting version 164
[6.032719] input: Power Button as 
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2
[6.032728] ACPI: Power Button [PWRB]
[6.032795] input: Power Button as 
/devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[6.032798] ACPI: Power Button [PWRF]
[6.038008] input: PC Speaker as /devices/platform/pcspkr/input/input4
[6.046956] processor LNXCPU:00: registered as cooling_device0
[6.047038] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[6.150018] EDAC MC: Ver: 2.1.0 Jan 12 2011
[6.170251] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[6.208884] parport_pc 00:07: reported by Plug and Play ACPI
[6.208964] parport0: PC-style at 0x378 (0x778), irq 7 [PCSPP,TRISTATE]
[6.295118] EDAC amd64_edac:  Ver: 3.2.0

Bug#613225: Acknowledgement (linux-image-2.6.32: fails to suspend to RAM)

2011-02-27 Thread Anton Ivanov


You can close this one or downgrade it.

There is a problem somewhere in the suspend/resume, but it is too 
esoteric and hard to reproduce.


Upgrade to KDE4 left a total mess in various KDE cache files resulting 
in the machine keeping open files on autofs NFS across a VPN link and 
not unmounting (as it should) stuff before trying to suspend.


So on the negative side - if there are files open on an automounted NFS 
mount there may be circumstances where the kernel can go berserk when it 
tries to suspend.


On the positive side - once the KDE cache, history, etc was cleared and 
the files were not being accessed any more the machine started 
suspending properly once again. I have not been able to reproduce this 
issue from that point onwards.
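
A pre-suspend check for this situation could start by listing the NFS mounts that are still present. A minimal sketch, assuming the standard /proc/mounts layout (device, mount point, filesystem type, options); the function name is hypothetical:

```python
from typing import List


def nfs_mount_points(mounts_text: str) -> List[str]:
    """Return the mount points of all NFS mounts listed in /proc/mounts text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 3 is the filesystem type; NFSv4 mounts show up as 'nfs4'.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points
```

Each returned path can then be checked for open files (for example with lsof or fuser) before the machine is asked to suspend.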


Brgds,

On 13/02/11 16:03, Debian Bug Tracking System wrote:

Thank you for filing a new Bug report with Debian.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
  Debian Kernel Team <debian-kernel@lists.debian.org>

If you wish to submit further information on this problem, please
send it to 613...@bugs.debian.org.

Please do not send mail to ow...@bugs.debian.org unless you wish
to report a problem with the Bug-tracking system.

   





--
Archive: http://lists.debian.org/4d6a4887.1010...@sigsegv.cx

Bug#613225: linux-image-2.6.32: fails to suspend to RAM

2011-02-13 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-30
Severity: normal
File: linux-image-2.6.32


Current .32 and last BPO for lenny prior to squeeze release fail to 
suspend to RAM most of the time. Up to 2.6.32-15~bpo50+1 suspend 
was flawless.

The machine is a TiBook G4 1GHz with 1G of RAM and a new harddrive.


-- Package-specific info:
** Version:
Linux version 2.6.32-5-powerpc (Debian 2.6.32-30) (b...@decadent.org.uk) (gcc 
version 4.3.5 (Debian 4.3.5-4) ) #1 Wed Jan 12 04:47:03 UTC 2011

** Command line:
root=/dev/hda4 ro 

** Not tainted

** Kernel log:
[   19.862778] airport 0.0003:radio: WEP supported, 104-bit key
[   19.870813] airport 0.0003:radio: WPA-PSK supported
[   22.229679] Adding 1464836k swap on /dev/hda3.  Priority:-1 extents:1 
across:1464836k 
[   22.540964] EXT3 FS on hda4, internal journal
[   23.024549] loop: module loaded
[   23.146664] SCSI subsystem initialized
[   23.596933] irq: irq 1 on host 
/pci@f200/mac-io@17/interrupt-controller@4 mapped to virtual irq 30
[   23.596974] irq: irq 2 on host 
/pci@f200/mac-io@17/interrupt-controller@4 mapped to virtual irq 31
[   23.604578] irq: irq 61 on host 
/pci@f200/mac-io@17/interrupt-controller@4 mapped to virtual irq 61
[   24.019623] input: PowerMac Beep as 
/devices/pci0001:10/0001:10:17.0/input/input5
[   26.930734] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   27.427036] RPC: Registered udp transport module.
[   27.433683] RPC: Registered tcp transport module.
[   27.440302] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   27.518815] Slow work thread pool: Starting up
[   27.525637] Slow work thread pool: Ready
[   27.532918] FS-Cache: Loaded
[   27.652868] FS-Cache: Netfs 'nfs' registered for caching
[   27.743174] Installing knfsd (copyright (C) 1996 o...@monad.swb.de).
[   35.221394] ondemand governor failed, too long transition latency of HW, 
fallback to performance governor
[   54.138182] svc: failed to register lockdv1 RPC service (errno 97).
[   54.148002] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery 
directory
[   54.188930] NFSD: starting 90-second grace period
[   56.203753] hda: host max PIO4 wanted PIO0 selected PIO0
[   58.267898] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   60.748975] Bluetooth: Core ver 2.15
[   60.762837] NET: Registered protocol family 31
[   60.770678] Bluetooth: HCI device and connection manager initialized
[   60.778526] Bluetooth: HCI socket layer initialized
[   60.838194] Bluetooth: L2CAP ver 2.14
[   60.845924] Bluetooth: L2CAP socket layer initialized
[   60.896361] Bluetooth: RFCOMM TTY layer initialized
[   60.904497] Bluetooth: RFCOMM socket layer initialized
[   60.912310] Bluetooth: RFCOMM ver 1.11
[   61.010434] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   61.018283] Bluetooth: BNEP filters: protocol multicast
[   61.226590] Bridge firewalling registered
[   61.331182] Bluetooth: SCO (Voice Link) ver 0.6
[   61.339212] Bluetooth: SCO socket layer initialized
[   61.747515] lp: driver loaded but no devices found
[   64.940332] nf_conntrack version 0.5.0 (16120 buckets, 64480 max)
[   64.959457] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please 
use
[   64.967574] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module 
option or
[   64.975699] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
[   65.205175] ip_tables: (C) 2000-2006 Netfilter Core Team
[   65.717038] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   66.171018] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   66.395982] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   68.261963] radeonfb :00:10.0: Invalid ROM contents
[   68.262528] radeonfb :00:10.0: Invalid ROM contents
[   68.421552] [drm] Initialized drm 1.1.0 20060810
[   68.654781] [drm] radeon defaulting to userspace modesetting.
[   68.660143] [drm] Initialized radeon 1.32.0 20080528 for :00:10.0 on 
minor 0
[   68.832689] eth1: Lucent/Agere firmware doesn't support manual roaming
[   69.047762] agpgart-uninorth :00:0b.0: putting AGP V2 device into 4x mode
[   69.047783] radeonfb :00:10.0: putting AGP V2 device into 4x mode
[   69.313279] [drm] Setting GART location based on new memory map
[   69.313426] [drm] Loading R200 Microcode
[   69.315238] platform radeon_cp.0: firmware: requesting radeon/R200_cp.bin
[   69.389693] [drm] writeback test succeeded in 1 usecs
[   73.564031] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   73.909699] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   74.085192] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   75.425353] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   75.753039] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   76.267197] eth1: Lucent/Agere firmware doesn't support manual roaming
[   82.122204] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   82.314088] ADDRCONF(NETDEV_UP): eth0: link is not ready
[   82.410441] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   83.511624] ADDRCONF(NETDEV_UP): eth1:

Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems

2011-02-09 Thread Anton Ivanov

Hi Ben,

You were correct.

It is offload and it is X and/or pulse which is throwing enough TCP at
the system to trigger the memory allocation failures.

You can close the bug now.

Turning off all offloads except checksumming looks like a valid
workaround. I have had the system running for a while. The memory
allocation failures should have shown up by now.

It may be worth having an init script as a part of the ethtool
package which sets offloads and defaults to turning off segmentation
offloads if there is no swap. I will be happy to write it, if you and
the ethtool maintainer think it is a good idea.
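
A minimal sketch of what such a script could decide, written in Python rather than as a real init script. It assumes swap is detected from the SwapTotal field of /proc/meminfo and that the segmentation-related offloads to disable are tso, gso and gro; the function names and the interface name are illustrative and not part of the ethtool package.

```python
import shlex

# Offload features to switch off on swapless machines (an assumption of this
# sketch); checksum offloads (rx/tx) are deliberately left alone.
SEGMENTATION_OFFLOADS = ("tso", "gso", "gro")


def swap_total_kb(meminfo_text):
    """Return the SwapTotal value in kB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0  # field missing: treat as no swap


def ethtool_command(iface, meminfo_text):
    """Build the ethtool argument list for iface, or None if swap exists."""
    if swap_total_kb(meminfo_text) > 0:
        return None
    args = ["ethtool", "-K", iface]
    for feature in SEGMENTATION_OFFLOADS:
        args += [feature, "off"]
    return args


if __name__ == "__main__":
    # Sample text instead of the real /proc/meminfo, so this runs anywhere.
    sample = "MemTotal:  981888 kB\nSwapTotal:  0 kB\nSwapFree:  0 kB\n"
    cmd = ethtool_command("eth0", sample)
    print(shlex.join(cmd) if cmd else "swap present, offloads untouched")
```

The command is only built, not executed; a real script would still have to run it per interface and cope with drivers that do not support a given feature.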

Brgds,

-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ai...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4d52c09a.70...@sigsegv.cx

Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems

2011-02-06 Thread Anton Ivanov

Ben Hutchings wrote:
 On Sat, 2011-02-05 at 13:45 +, Anton Ivanov wrote:
   
 Ben Hutchings wrote:
 
 On Mon, 2011-01-31 at 11:16 +, Anton Ivanov wrote:
   
   
 Package: linux-2.6
 Version: 2.6.32-30~bpo50+1
 Severity: normal


 I keep getting VM failure messages. I suspect the machine 
 is simply a bit too slow for the network card which is in 
 it. It is a via Nehemia at 1.7GHz with an extra Intel 
 GigE server adapter. The backtraces look like showing 
 problems in the network receive/xmit routines.
 
 
 This is an allocation failure for a *huge* allocation (order 5 = 128 KB
 chunk) in atomic (non-sleeping) context.  I think this may be related to
 (1) use of GRO on the receive path to coalesce packets (2) a
 netfilter/iptables rule that requires the packet to be duplicated, or
 requires the contents to be made contiguous.
   
   
 1. Do you mean gso? I do not see gro as an option on ethtool.
 

 I mean what I said.  Install ethtool from squeeze.
   
Understood. Will test and submit results.
   
 2. I think I know the culprit. I have recently made the machine to
 double up as a X-term. Some pixmap updates can easily pass around chunks
 that size. I have a couple of other systems with similar hardware so I
 will see if I can reproduce it with them.
 

 That doesn't require contiguous blocks.  But it will still reduce the
 amount of free memory.

   
 3. While the machine has a few netfilter rules they are all on another
 interface (towards a wifi AP) and it does not do any NAT so no need to
 reconstruct packets.
 

 That's strange.

   
The only traffic of note the machine has is NFS, Xterm and a bit of
mysql from time to time. NFS is mostly read and clients use -orsize=4096.

[snip]

   

 Really, you think Linux hasn't improved in 7 years?
   
Oh it has.  It is now much better at handling failed hardware/hardware
gone away. Fair point. I will test what exactly happens nowadays if you
swap to a device and the device suddenly goes away.

 Ben.

   
Brgds,

-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ai...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






Archive: http://lists.debian.org/4d4e572a.4040...@sigsegv.cx

Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems

2011-02-05 Thread Anton Ivanov

Ben Hutchings wrote:
 On Mon, 2011-01-31 at 11:16 +, Anton Ivanov wrote:
   
 Package: linux-2.6
 Version: 2.6.32-30~bpo50+1
 Severity: normal


 I keep getting VM failure messages. I suspect the machine 
 is simply a bit too slow for the network card which is in 
 it. It is a via Nehemia at 1.7GHz with an extra Intel 
 GigE server adapter. The backtraces look like showing 
 problems in the network receive/xmit routines.
 

 This is an allocation failure for a *huge* allocation (order 5 = 128 KB
 chunk) in atomic (non-sleeping) context.  I think this may be related to
 (1) use of GRO on the receive path to coalesce packets (2) a
 netfilter/iptables rule that requires the packet to be duplicated, or
 requires the contents to be made contiguous.
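
The size quoted above follows from the page allocator's order arithmetic: an order-n request asks for 2^n physically contiguous pages. A small worked example, assuming the 4 kB page size of 32-bit x86 (the function name is illustrative):

```python
PAGE_SIZE_KB = 4  # assumed page size for a 686 kernel


def order_to_kb(order, page_size_kb=PAGE_SIZE_KB):
    """Size in kB of one contiguous block of the given allocation order."""
    return (1 << order) * page_size_kb


if __name__ == "__main__":
    for order in range(6):
        print(f"order {order}: {order_to_kb(order)} kB")
```

Order 5 therefore needs 32 adjacent free pages, which is why fragmentation can matter more than the total amount of free memory.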
   

1. Do you mean gso? I do not see gro as an option on ethtool.

2. I think I know the culprit. I have recently made the machine
double up as an X-term. Some pixmap updates can easily pass around chunks
that size. I have a couple of other systems with similar hardware so I
will see if I can reproduce it with them.

3. While the machine has a few netfilter rules they are all on another
interface (towards a wifi AP) and it does not do any NAT so no need to
reconstruct packets.
   
 The machine is swapless and is used mostly as an NFS 
 server. It was not showing this behaviour under 2.6.26
 
 [...]

 Probably because e1000 did not use LRO or GRO there.  You can test this
 by turning off GRO with 'ethtool -K eth0 gro off'.

 However I would also recommend configuring the machine with some swap
 space.  The kernel has trouble defragmenting memory without swapping.
   
It is my always-on server with everything raid-ed. If I configure swap
the reliability is out of the window. I made that mistake once a while
back (7 years or so) and it ended up with some serious damage. The only
way to get swap for it is hardware RAID.
 Ben.

   


-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ai...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






Archive: http://lists.debian.org/4d4d545e.3010...@sigsegv.cx

Bug#611622: linux-image-2.6.32-bpo.5-686: VM problems

2011-01-31 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-30~bpo50+1
Severity: normal


I keep getting VM failure messages. I suspect the machine 
is simply a bit too slow for the network card which is in 
it. It is a VIA Nehemiah at 1.7GHz with an extra Intel 
GigE server adapter. The backtraces appear to show 
problems in the network receive/xmit routines.

The machine is swapless and is used mostly as an NFS 
server. It was not showing this behaviour under 2.6.26.

Best Regards,

-- Package-specific info:
** Version:
Linux version 2.6.32-bpo.5-686 (Debian 2.6.32-30~bpo50+1) 
(norb...@tretkowski.de) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Tue Jan 
18 23:27:36 UTC 2011

** Command line:
auto BOOT_IMAGE=Lin_2.6.32-bpo ro root=900 acpi_enforce_resources=lax

** Not tainted

** Kernel log:
[120567.850253] HighMem: 1*4kB 5*8kB 0*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 204kB
[120567.850276] 171605 total pagecache pages
[120567.850281] 0 pages in swap cache
[120567.850286] Swap cache stats: add 0, delete 0, find 0/0
[120567.850291] Free swap  = 0kB
[120567.850295] Total swap = 0kB
[120567.867575] 245472 pages RAM
[120567.867584] 19186 pages HighMem
[120567.867588] 3410 pages reserved
[120567.867592] 29182 pages shared
[120567.867596] 216768 pages non-shared
[120567.867696] swapper: page allocation failure. order:5, mode:0x4020
[120567.867705] Pid: 0, comm: swapper Not tainted 2.6.32-bpo.5-686 #1
[120567.867710] Call Trace:
[120567.867731]  [c108c099] ? __alloc_pages_nodemask+0x484/0x4d9
[120567.867743]  [c108c0fa] ? __get_free_pages+0xc/0x17
[120567.867752]  [c10ae8fe] ? __kmalloc+0x30/0x128
[120567.867763]  [c11d29ec] ? pskb_expand_head+0x4f/0x157
[120567.867772]  [c11d2e2f] ? __pskb_pull_tail+0x40/0x1f6
[120567.867786]  [c11d9ad4] ? dev_queue_xmit+0xe4/0x38e
[120567.867801]  [c11fb191] ? ip_finish_output+0x0/0x5c
[120567.867810]  [c11fb156] ? ip_finish_output2+0x187/0x1c2
[120567.867820]  [c11fa657] ? ip_local_out+0x15/0x17
[120567.867829]  [c11fae38] ? ip_queue_xmit+0x31e/0x379
[120567.867838]  [c10add2d] ? __slab_alloc+0x97/0x431
[120567.867849]  [c126e058] ? _spin_lock_bh+0x8/0x1e
[120567.867870]  [f8775a86] ? __nf_ct_refresh_acct+0x66/0xa4 [nf_conntrack]
[120567.867884]  [c1209e8e] ? tcp_transmit_skb+0x595/0x5cc
[120567.867894]  [c120bef2] ? tcp_write_xmit+0x7a3/0x874
[120567.867903]  [c1207926] ? tcp_ack+0x1611/0x1802
[120567.867912]  [c120921b] ? tcp_established_options+0x1d/0x8b
[120567.867921]  [c12094df] ? tcp_current_mss+0x38/0x53
[120567.867931]  [c120c009] ? __tcp_push_pending_frames+0x1e/0x50
[120567.867940]  [c1207b32] ? tcp_data_snd_check+0x1b/0xd2
[120567.867949]  [c12081d1] ? tcp_rcv_established+0xd2/0x626
[120567.867960]  [c120e958] ? tcp_v4_do_rcv+0x15f/0x2cf
[120567.867970]  [c120ee9a] ? tcp_v4_rcv+0x3d2/0x602
[120567.867980]  [c11f71d6] ? ip_local_deliver_finish+0x10c/0x18c
[120567.867989]  [c11f6dfc] ? ip_rcv_finish+0x2c4/0x2d8
[120567.867999]  [c11d8d99] ? netif_receive_skb+0x3bb/0x3d6
[120567.868095]  [f7c6ca2c] ? e1000_clean_rx_irq+0x351/0x400 [e1000]
[120567.868130]  [f7c703c6] ? e1000_clean+0x29f/0x40d [e1000]
[120567.868142]  [c104684c] ? hrtimer_get_next_event+0x8c/0xa0
[120567.868155]  [c103b2df] ? get_next_timer_interrupt+0x190/0x1fb
[120567.868165]  [c1007569] ? sched_clock+0x5/0x7
[120567.868175]  [c1047beb] ? sched_clock_local+0x15/0x11b
[120567.868184]  [c11d9319] ? net_rx_action+0x96/0x194
[120567.868196]  [c10354dc] ? __do_softirq+0xaa/0x151
[120567.868205]  [c10355b4] ? do_softirq+0x31/0x3c
[120567.868213]  [c103568a] ? irq_exit+0x26/0x58
[120567.868225]  [c1004699] ? do_IRQ+0x78/0x89
[120567.868234]  [c10037f0] ? common_interrupt+0x30/0x38
[120567.868250]  [c101a818] ? native_safe_halt+0x2/0x3
[120567.868259]  [c1008597] ? default_idle+0x3c/0x5a
[120567.868267]  [c1002389] ? cpu_idle+0x89/0xa4
[120567.868279]  [c13bf7fc] ? start_kernel+0x318/0x31d
[120567.868284] Mem-Info:
[120567.868288] DMA per-cpu:
[120567.868293] CPU0: hi:0, btch:   1 usd:   0
[120567.868298] Normal per-cpu:
[120567.868304] CPU0: hi:  186, btch:  31 usd:  43
[120567.868308] HighMem per-cpu:
[120567.868314] CPU0: hi:   18, btch:   3 usd:   2
[120567.868327] active_anon:21435 inactive_anon:28240 isolated_anon:0
[120567.868331]  active_file:7502 inactive_file:163906 isolated_file:0
[120567.868334]  unevictable:0 dirty:30 writeback:0 unstable:0
[120567.868338]  free:4413 slab_reclaimable:3626 slab_unreclaimable:1704
[120567.868341]  mapped:1903 shmem:189 pagetables:547 bounce:0
[120567.868359] DMA free:3556kB min:64kB low:80kB high:96kB active_anon:0kB 
inactive_anon:904kB active_file:8kB inactive_file:10832kB unevictable:0kB 
isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:104kB 
slab_unreclaimable:248kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[120567.868375] lowmem_reserve[]: 0 861 935 935
[120567.868397] Normal
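
The per-zone free-list lines in a log like the one above can be checked by hand: each count*sizekB term is the number of free blocks of that size. A rough sketch, assuming that line format (the helper names are made up for illustration), which totals the HighMem line from this report and finds the largest block actually available:

```python
import re

# Matches terms such as "5*8kB"; the trailing "= 204kB" has no '*' and is skipped.
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")


def parse_free_list(line):
    """Return [(count, size_kb), ...] from a 'N*SkB ...' free-list line."""
    return [(int(c), int(s)) for c, s in BLOCK_RE.findall(line)]


def total_free_kb(blocks):
    """Sum of all free memory described by the block list, in kB."""
    return sum(count * size for count, size in blocks)


def largest_free_block_kb(blocks):
    """Largest block size with at least one free block, or 0 if none."""
    sizes = [size for count, size in blocks if count > 0]
    return max(sizes) if sizes else 0


if __name__ == "__main__":
    highmem = ("HighMem: 1*4kB 5*8kB 0*16kB 1*32kB 2*64kB 0*128kB 0*256kB "
               "0*512kB 0*1024kB 0*2048kB 0*4096kB = 204kB")
    blocks = parse_free_list(highmem)
    print(total_free_kb(blocks), largest_free_block_kb(blocks))
```

For the HighMem zone the terms add up to the reported 204 kB and the largest free block is 64 kB. The Normal zone line is cut off in the log, so the same check cannot be run for it here.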

Bug#610859: linux-image-2.6.32-bpo.5: Sound unusable on 1GHz Powerbook (TiBook)

2011-01-23 Thread Anton Ivanov

Package: linux-2.6
Version: 2.6.32-15~bpo50+1
Severity: normal
File: linux-image-2.6.32-bpo.5


The sound module is unusable.

Any playback starts OK, but gets interrupted with hissing/ticking
by other activity. Moving windows, disk activity, etc. cause sound
interruptions up to a couple of seconds in length. With hands off
there is the occasional hiss-n-tick.

I have tried both the AOA and powermac sound modules. AOA does not
detect the onboard sound card. Powermac detects it, but hisses/ticks.

I am observing the same with 2.6.26 and with 2.6.32 from backports.

-- Package-specific info:
** Version:
Linux version 2.6.32-bpo.5-powerpc (Debian 2.6.32-15~bpo50+1) 
(norb...@tretkowski.de) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 Sat Jun 12 
11:36:21 UTC 2010

** Command line:
root=/dev/hda4 ro 

** Not tainted

** Kernel log:
[  119.408798] ADDRCONF(NETDEV_UP): eth1: link is not ready
[  119.634006] eth1: Lucent/Agere firmware doesn't support manual roaming
[  119.705128] eth1: New link status: Connected (0001)
[  119.705685] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[  128.768782] tun: Universal TUN/TAP device driver, 1.6
[  128.768793] tun: (C) 1999-2004 Max Krasnyansky m...@qualcomm.com
[  128.771214] tun0: Disabled Privacy Extensions
[  130.690459] eth1: no IPv6 routers present
[  192.420338] eth1: New link status: AP Out of Range (0004)
[  192.476681] irq: irq 1 on host /pci@f200/mac-io@17/interrupt-controller@4 mapped to virtual irq 30
[  192.476723] irq: irq 2 on host /pci@f200/mac-io@17/interrupt-controller@4 mapped to virtual irq 31
[  192.522927] eth1: New link status: AP In Range (0005)
[  192.577697] irq: irq 61 on host /pci@f200/mac-io@17/interrupt-controller@4 mapped to virtual irq 61
[  192.991053] input: PowerMac Beep as /devices/pci0001:10/0001:10:17.0/input/input5
[  309.309899] eth1: New link status: AP Out of Range (0004)
[  310.245024] eth1: New link status: AP In Range (0005)
[  453.575083] eth1: New link status: AP Out of Range (0004)
[  453.673571] eth1: New link status: AP In Range (0005)
[  463.199890] eth1: New link status: AP Out of Range (0004)
[  463.298378] eth1: New link status: AP In Range (0005)
[  914.766359] eth1: New link status: AP Out of Range (0004)
[  914.867941] eth1: New link status: AP In Range (0005)
[ 1760.135494] input: PowerMac Beep as /devices/pci0001:10/0001:10:17.0/input/input6
[ 2101.517585] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 2101.811146] eth1: New link status: Connected (0001)
[ 2101.811702] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 2107.495653] eth1: New link status: Disconnected (0002)
[ 2108.696631] eth1: New link status: Connected (0001)
[ 2112.075720] eth1: no IPv6 routers present
[ 2112.875570] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 2113.011062] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 2113.250253] eth1: Lucent/Agere firmware doesn't support manual roaming
[ 2113.941866] eth1: New link status: Connected (0001)
[ 2113.942576] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 2119.300944] eth1: New link status: Disconnected (0002)
[ 2120.489911] eth1: New link status: Connected (0001)
[ 2123.945793] eth1: no IPv6 routers present
[ 2139.908804] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 2140.086915] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 2140.347740] eth1: Lucent/Agere firmware doesn't support manual roaming
[ 2140.589689] eth1: New link status: Connected (0001)
[ 2140.590395] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 2146.189335] eth1: New link status: Disconnected (0002)
[ 2147.406630] eth1: New link status: Connected (0001)
[ 2151.312134] eth1: no IPv6 routers present
[ 2304.962355] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 2305.191994] eth1: New link status: Connected (0001)
[ 2305.192556] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 2310.687020] eth1: New link status: Disconnected (0002)
[ 2312.007900] eth1: New link status: Connected (0001)
[ 2315.490011] eth1: no IPv6 routers present
[ 2316.217673] usb 2-1: new full speed USB device using ohci_hcd and address 2
[ 2316.426840] usb 2-1: New USB device found, idVendor=19d2, idProduct=0103
[ 2316.426856] usb 2-1: New USB device strings: Mfr=3, Product=2, SerialNumber=4
[ 2316.426866] usb 2-1: Product: ZTE WCDMA Technologies MSM
[ 2316.426874] usb 2-1: Manufacturer: ZTE,Incorporated
[ 2316.426882] usb 2-1: SerialNumber: P673A3H3GD01
[ 2316.428864] usb 2-1: configuration #1 chosen from 1 choice
[ 2317.081553] Initializing USB Mass Storage driver...
[ 2317.084319] scsi0 : SCSI emulation for USB Mass Storage devices
[ 2317.086268] usbcore: registered new interface driver usb-storage
[ 2317.086278] USB Mass Storage support registered.
[ 2317.093232] usb-storage: device found at 2
[ 2317.093239] usb-storage: waiting for device to settle before scanning
[ 2318.651190] usb 2-1: USB disconnect, address 2
[ 2319.080219] usb 2-1: new full speed USB device using ohci_hcd and address 3
[

Bug#610859: linux-image-2.6.32-bpo.5: Sound unusable on 1GHz Powerbook (TiBook)

2011-01-23 Thread Anton Ivanov


Ben Hutchings wrote:

On Sun, 2011-01-23 at 12:55 +, Anton Ivanov wrote:
  

Package: linux-2.6
Version: 2.6.32-15~bpo50+1



Why are you using a 7 month old version?  Try the current version
(2.6.32-30).

Ben.

  

The current one does not manifest the problem. You can close this bug.

Apologies, I should have updated to the latest first instead of looking at 
the sources and wondering why the hell it is doing this :)


Brgds,



Archive: http://lists.debian.org/4d3c672b.9090...@sigsegv.cx

Bug#534444: Acknowledgement (linux-image-2.6.26-2-486: CBQ broken)

2010-07-11 Thread Anton Ivanov

I thought we had it closed :)

I have not tried with anything past 2.6.26.

2.6.26 as discussed before on this bug has two problems:

1. Counterintuitive use of the word borrow in the stats.

2. Bad estimator precision compared to 2.6.18 and older kernels.

If you know about both you can work around them. I do not believe that
the use of Borrow will be changed any time soon. It will be interesting to
see if the timers have improved, but I have no test rig to test it right now.

Brgds,

Moritz Muehlenhoff wrote:
 tags 53 moreinfo
 thanks

 On Wed, Jun 24, 2009 at 11:08:36PM +0100, Ben Hutchings wrote:
   
 On Wed, 2009-06-24 at 16:42 +0100, Anton Ivanov wrote:
 
 Additional information.

 It does not do it on all classes. I can observe it on a particular class
 parented to the root CBQ qdisc with multiple burstable children. 

 isolated put on another class parented to the root qdisc is similarly
 ignored. 

 I will try to dig through the source to see exactly where the bug is,
 but it is definitely a bug (the results of tc are confirmed by delay
 measurements and bandwidth measurements in the relevant classes).
   
 Please also test whether the configuration you're trying is still broken
 in kernel version 2.6.30 (from unstable).  If so, please report this bug
 upstream on bugzilla.kernel.org.
 

 Anton,
 does still occur with current kernels? Did you report it in the kernel.org
 bugzilla?

 Cheers,
 Moritz

   


-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ai...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






Archive: http://lists.debian.org/4c39a35b.5030...@sigsegv.cx

Bug#576405: Acknowledgement (linux-image-2.6.26: Deadlock during combined NFS3/NFS4 use)

2010-04-04 Thread Anton Ivanov

Just remembered something - I did quite a lot of copying off exports that
were manually mounted via v3 and at the same time accessed elsewhere
via v4. So this is probably autofs related and the export needs to be
mounted under an autofs point for it to be triggered.

The only reference I have been able to find to something similar was in
the fedora bugs for v9. By the look of it using autofs to set up a Unix
workstation network properly has become something of a lost art :(

The setup can be found in detail at:
http://foswiki.sigsegv.cx/bin/view/Net/LinuxNFSv4

I have used the v3 setup in production environments with tens of users
and on my home network for many years and the only reason to look at v4
at all is that most apps like firefox, etc have now moved to use sqlite
which locks/unlocks like crazy so v3 starts hitting performance
limitations.

Brgds,

On Sun, 2010-04-04 at 09:33 +, Debian Bug Tracking System wrote:
 Thank you for filing a new Bug report with Debian.
 
 This is an automatically generated reply to let you know your message
 has been received.
 
 Your message is being forwarded to the package maintainers and other
 interested parties for their attention; they will reply in due course.
 
 Your message has been sent to the package maintainer(s):
  unknown-pack...@qa.debian.org
 
 If you wish to submit further information on this problem, please
 send it to 576...@bugs.debian.org.
 
 Please do not send mail to ow...@bugs.debian.org unless you wish
 to report a problem with the Bug-tracking system.
 
-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ariva...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715




Archive: http://lists.debian.org/1270377747.18041.7.ca...@moonbird.sigsegv.cx

Bug#576405: linux-image-2.6.26: Deadlock during combined NFS3/NFS4 use

2010-04-04 Thread Anton Ivanov

Package: linux-image-2.6.26
Version: nfsfix.1
Severity: important


When an export is exported and mounted via autofs using BOTH
NFSv3 and NFSv4 the NFSv4 one deadlocks.

Setup - transition from v3 to v4.

System A is still perusing the old map:
cat /etc/auto.local | grep iPodResolution
iPodResolution  -rsize=4096,wsize=4096,rw   eden:/exports/md4/videoiPod

System B is using the newer version of the same map with a v4 mount:
cat /var/yp/auto.local | grep iPodResolution
iPodResolution  -fstype=nfs4   eden:/md4/videoiPod

If system B is writing to the mount and A is reading from it B starts getting 
I/O errors/BAD FDs. If B is running from disk lots of things fail. If B is
running diskless - total lock up. 

Same setup with V3 only has been working flawlessly for 5+ years. Same setup 
with V4 only (when the v3 machines are off) seems to work OK as well.
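
To make the difference between the two map entries explicit, here is a rough sketch that splits an autofs map line into key, mount options and location. It only handles the simple single-location form shown above; the function name and the returned dictionary layout are illustrative assumptions, not anything autofs itself provides.

```python
def parse_map_entry(line):
    """Split 'key [-options] location' into its parts."""
    fields = line.split()
    key = fields[0]
    if len(fields) > 2 and fields[1].startswith("-"):
        options = fields[1][1:].split(",")  # drop the leading '-'
        location = fields[2]
    else:
        options = []
        location = fields[1]
    return {
        "key": key,
        "options": options,
        "location": location,
        # An entry counts as NFSv4 here only if it says so explicitly.
        "is_nfs4": "fstype=nfs4" in options,
    }


if __name__ == "__main__":
    system_a = "iPodResolution  -rsize=4096,wsize=4096,rw   eden:/exports/md4/videoiPod"
    system_b = "iPodResolution  -fstype=nfs4   eden:/md4/videoiPod"
    for entry in (system_a, system_b):
        print(parse_map_entry(entry))
```

Both entries share the key and the server but use different export paths, presumably because the v4 path is relative to the server's NFSv4 pseudo-root.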

Fairly easy to reproduce.

I am not sure at this point if autofs has any role in this and I will not be in 
a position to retest until 19th. Apologies.

Tested with: stock debian, older version with just the nfs regression fix, 
stock recompiled with preempt, 686 and 486 versions.

-- System Information:
Debian Release: 5.0.3
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-2-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26 depends on:
ii  coreutils 6.10-6 The GNU core utilities
ii  debconf [debconf-2.0] 1.5.24 Debian configuration management sy

linux-image-2.6.26 recommends no packages.

Versions of packages linux-image-2.6.26 suggests:
ii  fdutils   5.5-20060227-3 Linux floppy utilities
pn  ksymoops  none (no description available)
pn  linux-doc-2.6.26 | linux- none (no description available)

-- debconf-show failed


Archive: 
http://lists.debian.org/2010040408.10335.29590.report...@eden.sigsegv.cx

Bug#576405: linux-image-2.6.26: Deadlock during combined NFS3/NFS4 use

2010-04-04 Thread Anton Ivanov

On Sun, 2010-04-04 at 20:40 +0100, Ben Hutchings wrote:
 On Sun, 2010-04-04 at 09:44 +0100, Anton Ivanov wrote:
  Package: linux-image-2.6.26
  Version: nfsfix.1
 
 What does that version mean?  Have you applied your own patches?  Can
 you reproduce this with an official kernel package?

That is a rather old official kernel with only the NFS regression fix
applied (the one I dug out a while back - bug 524199). It worked and I have
not touched it since.

That is just one machine on which I am observing it. I also see it on
several other machines with:

1. Latest Official - 486, 686
2. Rebuilt with only preemption enabled
3. Official as released only with NFS regression applied (524199)

I have not done a full matrix of all client/server options for these
as there are quite a few possible combinations. 

I can reboot all of the machines in question into official and retest
formally after I am back from holidays on the 19th. I would not expect
to see anything particularly different though (even on the ones that do
not run official the difference from official is marginal - a few
option tweaks and/or official patches from debian-security applied by
hand to older kernels).

-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.iva...@kot-begemot.co.uk
WWW: http://www.kot-begemot.co.uk/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ai...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715




Archive: http://lists.debian.org/1270412451.30401.10.ca...@localhost.localdomain

Bug#534430: Info received (linux-image-2.6.26: CBQ broken)

2010-02-20 Thread Anton Ivanov

Sure, that is the same bug. I actually thought that I was updating that
one when submitting the recent bug reports.

Close please.
-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.iva...@kot-begemot.co.uk
WWW: http://www.kot-begemot.co.uk/





Archive: http://lists.debian.org/1266659595.31060.0.ca...@vorlon.sigsegv.cx

Bug#534430: Info received (linux-image-2.6.26: CBQ broken)

2010-01-31 Thread Anton Ivanov

On Sun, 2010-01-31 at 13:12 +0100, Moritz Muehlenhoff wrote:
 On Wed, Jan 27, 2010 at 11:38:07AM +, Anton Ivanov wrote:
  Sorry, ignore my previous email. 
  
  I think I got to it, for whatever reason it is not getting set in
  cbq_set_lss(), just can't figure out what is wrong.
 
 Anton, 
 as per your posting on linux-netdev I understand this bug can be closed?

Yes. 

It is bad English in the output of tc combined with bad timing since
the kernel has gone to high perf timers. 2.6.9 and even 2.6.18 delivered
considerably better traffic shaping performance.



 
 |  I am using CBQ myself with recent kernels and never found it
 |  'borrowing', could you post a copy of your rules, or better, a subset of
 |  them demonstrating the problem?
 |
 | Actually after going through it several times and looking at the code
 | Jarek pointed out it doesn't.
 |
 | Just the stats are very confusing and precision is not particularly
 | great.
 
 Cheers,
 Moritz
 
-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.iva...@kot-begemot.co.uk
WWW: http://www.kot-begemot.co.uk/






Bug#534430: linux-image-2.6.26: CBQ broken

2010-01-27 Thread Anton Ivanov

I have finally gotten around to looking at it properly (it has been
annoying me all morning so I had no choice but to get to it).

There is no way I can see the current kernel code to work.

It sets borrow to be _ALWAYS_ equal to the parent on line 2077 of
sch_cbq.c. For a bounded class it should change the used bandwidth in
the parent as in the other bits of code around this part and after that
set it to NULL. That bit of code is completely missing.

I just downloaded 33-rc5 and will check whether it is there.

Brgds,

-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ariva...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






Bug#534430: linux-image-2.6.26: CBQ broken

2010-01-27 Thread Anton Ivanov

I think I found the breakage.

Have a look at cbq_set_lss()

Instead of checking the flags with  and working based on that it
actually ANDs the flags every time (which if the flag is not already set
results in an eternal false). 

I am rebuilding it at the moment after replacing the offending  with
. If it works after that we can hopefully consider this one closed.
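
The operator characters in the message above seem to have been lost in the archive, so the following only illustrates the general pitfall being described and is not the actual kernel code: testing a flag with a bitwise AND versus overwriting the flag word with an AND-assignment, which can never set a bit that was not already set. The flag name and value are hypothetical. A follow-up in this thread ("Sorry, ignore my previous email") appears to withdraw the diagnosis, so treat this purely as an illustration.

```python
FLAG_BOUNDED = 0x01  # hypothetical flag bit, for illustration only


def is_set(flags, flag):
    """Test a flag without modifying the flag word."""
    return bool(flags & flag)


def buggy_enable(flags, flag):
    """AND-assignment: the bit survives only if it was already set."""
    flags &= flag
    return flags


def correct_enable(flags, flag):
    """OR-assignment: sets the bit regardless of its previous state."""
    flags |= flag
    return flags


if __name__ == "__main__":
    print(is_set(buggy_enable(0, FLAG_BOUNDED), FLAG_BOUNDED))
    print(is_set(correct_enable(0, FLAG_BOUNDED), FLAG_BOUNDED))
```

Starting from a cleared flag word, the AND-assignment leaves the bit cleared forever, while the OR-assignment sets it.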

Brgds,

On Wed, 2010-01-27 at 10:49 +, Anton Ivanov wrote:
 I have finally gotten around to look at it properly (it has been
 annoying me all morning so I did not have choice, but to get to it).
 
 There is no way I can see the current kernel code to work.
 
 It sets borrow to be _ALWAYS_ equal to the parent on line 2077 of
 cbq_shed.c. For a bounded class it should change the used bandwidth in
 the parent as in the other bits of code around this part and after that
 set it to NULL. That bit of code is completely missing.
 
 I just downloaded 33-rc5 will look if it is there.
 
 Brgds,
 
-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  aiva...@sigsegv.cx
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ariva...@sigsegv.cx
Fingerprint: C824 CBD7 EE4B D7F8 5331  89D5 FCDA 572E DDE5 E715






Bug#534430: Info received (linux-image-2.6.26: CBQ broken)

2010-01-27 Thread Anton Ivanov

Sorry, ignore my previous email. 

I think I got to it, for whatever reason it is not getting set in
cbq_set_lss(), just can't figure out what is wrong.

Brgds,


-- 
   Understanding is a three-edged sword:
your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail:  anton.iva...@kot-begemot.co.uk
WWW: http://www.kot-begemot.co.uk/






Bug#552255: linux-image-2.6.26-2-686: /proc permission bypass

2009-10-25 Thread Anton Ivanov


[snip]

 I imagine such applications are already totally insecure.

Sure, agree 100%. However, under normal circumstances they can be bolted
down by a sysadmin using directory permissions until the developers see
the light.

 
> > Fourth, during the discussion it was claimed that this does not work on
> > Linux proper.
>
> In a listing of /proc/self/fd the files appear with read and/or write
> permissions depending on the file descriptor mode.  But when a process
> tries to open them they are treated as symbolic links, which have no
> permissions of their own.  This is fairly obvious when looking at the
> code and it's not something we change.

I did not have the time to look at it in detail. After one of the people
on the cc-list of the actual discussion said that it does not apply to
plain Linux and that this is Debian-specific, I looked at the current
Debian patch for .26. I saw that there are some patches that apply to
the relevant files for proc, but I have not had the time to decipher
what they do.

 
> > I have some doubts about the claim, but cannot verify it
> > (I am off on holiday in an hour or so). It maybe  Debian specific or
> > specific to a patch which Debian and more than one other distro is using
> > (ptrace comes to mind). I personally do not think that is the case,
> > however it is worth checking and if it is coming from the ptrace patches
> > double check if they do not introduce something worse than that
> > somewhere.
>
> I don't know what patches you're talking about.

See above. As I said, I have not had the time to test this vs a vanilla
kernel. I am on my way to chop wood for a week instead of chopping code.
Sorry.

I will forward you the relevant email in case it does not make it
through the bugtraq moderator queue.

 
> Ben.
 





Bug#552255: [Fwd: Re: /proc filesystem allows bypassing directory permissions on Linux]

2009-10-25 Thread Anton Ivanov

Personally, I think the chap needs his ceiling replastered. Too many
scratches from the nose being ploughed through it at high velocity.

As I said, I did not have the resources to test if he is right or wrong
yesterday.

Brgds,

---BeginMessage---

On 24.10.2009 22:05, Anton Ivanov wrote:

It works on Debian 2.6.26 out of the box. It is not an obscure patched
kernel case I am afraid.

If you redirect an FD to a file, using the redirected FD via /proc lets
you bypass the directory permissions of the place where the file lives.
Thankfully, file permissions still apply, so you need an app which has
silly file perms in a bolted-down directory for this.

Symlinking the same file to a link on a normal ext3 or nfs filesystem as
a sanity check shows correct permission behaviour. If you try to write
to that symlink you get permission denied so the permissions on the fs
actually work.

No need to be root, nothing. It is not a case of forgetting to drop EID
or something else like that either. It looks like what it says on the
tin: permission bypass.

Not that I would have expected anything different considering who posted
it in the first place.

Thus Debian kernel team should be blamed for that misbehaviour. Don't worry, 
hardlinks behave just the same way, as you describe. Use authentic Linux 
kernels, if you dislike that.

--

Sincerely Your, Dan.

---End Message---

Bug#552255: linux-image-2.6.26-2-686: /proc permission bypass

2009-10-24 Thread Anton Ivanov

Package: linux-image-2.6.26-2-686
Version: 2.6.26-17
Severity: important


Currently discussed on bugtraq

Cut-n-pasting the email

Hi!

This is forward from lkml, so no, I did not invent this
hole. Unfortunately, I do not think lkml sees this as a security hole,
so...

Jamie Lokier said:
a) the current permission model under /proc/PID/fd has a security
   hole (which Jamie is worried about)
  
  I believe it's bugtraq time. Being able to reopen a file with additional
  permissions looks like a security problem...
  
  Jamie, do you have some test script? And do you want your 15 minutes
   of bugtraq fame? ;-).

 The reopen does check the inode permission, but it does not require
 you have any reachable path to the file.  Someone _might_ use that as
 a traditional unix security mechanism, but if so it's probably quite rare.

Ok, I got this, with two users. I guess it is real (but obscure)
security hole.

So, we have this scenario. pavel/root is not doing anything interesting in
the background.

pa...@toy:/tmp$ uname -a
Linux toy.ucw.cz 2.6.32-rc3 #21 Mon Oct 19 07:32:02 CEST 2009 armv5tel GNU/Linux
pa...@toy:/tmp$ mkdir my_priv; cd my_priv
pa...@toy:/tmp/my_priv$ echo this file should never be writable > unwritable_file
# lock down directory
pa...@toy:/tmp/my_priv$ chmod 700 .
# relax file permissions, directory is private, so this is safe
# check link count on unwritable_file. We would not want someone 
# to have a hard link to work around our permissions, would we?
pa...@toy:/tmp/my_priv$ chmod 666 unwritable_file 
pa...@toy:/tmp/my_priv$ cat unwritable_file 
this file should never be writable
pa...@toy:/tmp/my_priv$ cat unwritable_file 
got you
# Security problem here

[Please pause here for a while before reading how guest did it.]

Unexpected? Well, yes, to me anyway. Linux specific? Yes, I think so.

So what did happen? User guest was able to work around directory
permissions in the background, using /proc filesystem.

gu...@toy:~$ bash 3< /tmp/my_priv/unwritable_file
# Running inside nested shell
gu...@toy:~$ read A <&3
gu...@toy:~$ echo $A
this file should never be writable

gu...@toy:~$ cd /tmp/my_priv
gu...@toy:/tmp/my_priv$ ls
unwritable_file

# pavel did chmod 000, chmod 666 here
gu...@toy:/tmp/my_priv$ ls
ls: cannot open directory .: Permission denied

# Linux correctly prevents guest from writing to that file
gu...@toy:/tmp/my_priv$ cat unwritable_file
cat: unwritable_file: Permission denied
gu...@toy:/tmp/my_priv$ echo got you >&3
bash: echo: write error: Bad file descriptor

# ...until we take a way around it with /proc filesystem. Oops.
gu...@toy:/tmp/my_priv$ echo got you > /proc/self/fd/3
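
The re-open step that the transcript relies on can also be sketched in Python. This is a minimal, Linux-only illustration under stated assumptions: it runs as a single user on a scratch file, so it shows only that a read-only descriptor can be re-opened for writing through /proc/self/fd. It does not reproduce the two-user directory lockdown from the report. The helper names `reopen_for_writing` and `demo` are made up for this example.

```python
import os
import tempfile


def reopen_for_writing(read_fd: int) -> int:
    """Re-open an already open descriptor through /proc with write access.

    Linux-specific. The open is checked against the file's own permission
    bits, but the caller needs no usable path to the file.
    """
    return os.open(f"/proc/self/fd/{read_fd}", os.O_WRONLY | os.O_TRUNC)


def demo() -> str:
    # Scratch file standing in for unwritable_file from the report.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "w") as f:
        f.write("this file should never be writable\n")

    read_fd = os.open(path, os.O_RDONLY)
    try:
        try:
            os.write(read_fd, b"got you\n")    # like `echo got you >&3`
        except OSError:
            pass                                # descriptor is read-only

        write_fd = reopen_for_writing(read_fd)  # like `> /proc/self/fd/3`
        os.write(write_fd, b"got you\n")
        os.close(write_fd)

        with open(path) as f:
            return f.read()
    finally:
        os.close(read_fd)
        os.unlink(path)
```

Calling `demo()` on a Linux system with /proc mounted returns the file contents after the second write, which went through even though the only descriptor held was opened read-only.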


-- Package-specific info:

-- System Information:
Debian Release: 5.0.2
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26 (SMP w/1 CPU core; PREEMPT)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-2-686 depends on:
ii  debconf [debconf-2.0] 1.5.24 Debian configuration management sy
ii  initramfs-tools [linux-initra 0.92o  tools for generating an initramfs
ii  module-init-tools 3.4-1  tools for managing Linux kernel mo

Versions of packages linux-image-2.6.26-2-686 recommends:
ii  libc6-i6862.7-18 GNU C Library: Shared libraries [i

Versions of packages linux-image-2.6.26-2-686 suggests:
ii  grub   0.97-47lenny2 GRand Unified Bootloader (Legacy v
ii  lilo   1:22.8-7  LInux LOader - The Classic OS load
pn  linux-doc-2.6.26   none(no description available)

-- debconf-show failed




Bug#552255: linux-image-2.6.26-2-686: /proc permission bypass

2009-10-24 Thread Anton Ivanov

We have been going back and forth on this with a couple of people.
It has not shown up on BUGTRAQ yet because it is sitting in the
moderator queue.

First of all, any permission bypass is bad. Principle of least surprise.

Second, the important thing here is that directory permissions are
ignored. Whatever the reason, that is not good. The case shown by Pavel
is an extreme example (using 666), but you can most likely have a less
extreme example where this can be put to good use.

Third, there is a non-zero class of applications in which idiocy such as
666 files protected only by the directory one level above is likely to
be found: those ported from Windows. Under Windows, locking is mandatory
rather than advisory and apps tend to scribble under themselves. So if
you open a file with an exclusive read/write lock, nobody else can
read or write it regardless of permissions. When a program gets ported
to Unix, the developers (or the porting toolkit) replace that code with
flock or fcntl locks, which are advisory, and the file becomes nicely
accessible. There is no such code in Debian proper, but that does not
mean there is no such code out there in the wild.
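
The point about advisory locks can be illustrated with a short Python sketch. It assumes a local Unix filesystem where `flock` behaves as documented, and the function name `advisory_lock_demo` is invented for this example. One handle takes an exclusive lock, a second handle that politely asks for the lock is refused, and a third that never asks writes to the file anyway.

```python
import fcntl
import os
import tempfile
from typing import Tuple


def advisory_lock_demo() -> Tuple[bool, str]:
    fd, path = tempfile.mkstemp()
    os.close(fd)

    holder = open(path, "w")
    fcntl.flock(holder, fcntl.LOCK_EX)       # exclusive advisory lock

    # A cooperating program asks for the lock first and is refused.
    polite = open(path, "a")
    try:
        fcntl.flock(polite, fcntl.LOCK_EX | fcntl.LOCK_NB)
        refused = False
    except OSError:
        refused = True

    # A program that never asks simply writes; the lock does not stop it.
    with open(path, "a") as rude:
        rude.write("scribbled\n")

    polite.close()
    holder.close()                           # closing releases the lock
    with open(path) as f:
        content = f.read()
    os.unlink(path)
    return refused, content
```

The lock only protects the file from programs that check it, which is why a port that relied on exclusive locks for protection ends up depending on file and directory permissions instead.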

Fourth, during the discussion it was claimed that this does not work on
Linux proper. I have some doubts about the claim, but cannot verify it
(I am off on holiday in an hour or so). It may be Debian-specific, or
specific to a patch which Debian and more than one other distro is using
(ptrace comes to mind). I personally do not think that is the case.
However, it is worth checking, and if it does come from the ptrace
patches, double-checking that they do not introduce something worse
somewhere else.

Cheers,

On Sat, 2009-10-24 at 21:18 +0100, Ben Hutchings wrote:
 On Sat, 2009-10-24 at 20:19 +0100, Anton Ivanov wrote:
  Package: linux-image-2.6.26-2-686
  Version: 2.6.26-17
  Severity: important
  
  
  Currently discussed on bugtraq
  
  Cut-n-pasting the email
  
  Hi!
  
  This is forward from lkml, so no, I did not invent this
  hole. Unfortunately, I do not think lkml sees this as a security hole,
  so...
  
  Jamie Lokier said:
  a) the current permission model under /proc/PID/fd has a security
 hole (which Jamie is worried about)

I believe its bugtraq time. Being able to reopen file with additional
permissions looks like  a security problem...

Jamie, do you have some test script? And do you want your 15 minutes
 of bugtraq fame? ;-).
  
   The reopen does check the inode permission, but it does not require
   you have any reachable path to the file.  Someone _might_ use that as
   a traditional unix security mechanism, but if so it's probably quite rare.
  
  Ok, I got this, with two users. I guess it is real (but obscure)
  security hole.
 
 So obscure that it doesn't really count as important.
 
  So, we have this scenario. pavel/root is not doing anything interesting in
  the background.
  
  pa...@toy:/tmp$ uname -a
  Linux toy.ucw.cz 2.6.32-rc3 #21 Mon Oct 19 07:32:02 CEST 2009 armv5tel 
  GNU/Linux
  pa...@toy:/tmp$ mkdir my_priv; cd my_priv
  pa...@toy:/tmp/my_priv$ echo this file should never be writable > unwritable_file
  # lock down directory
  pa...@toy:/tmp/my_priv$ chmod 700 .
  # relax file permissions, directory is private, so this is safe
  # check link count on unwritable_file. We would not want someone 
  # to have a hard link to work around our permissions, would we?
  pa...@toy:/tmp/my_priv$ chmod 666 unwritable_file 
 [...]
 
 But who's really going to do that, other than to demonstrate this?
 
 Ben.
 

Bug#534430: linux-image-2.6.26: CBQ broken

2009-07-26 Thread Anton Ivanov

Bounded classes are allowed to borrow at least under some circumstances.

In my config there is a bounded class parented to root on my DSL uplink
and a hierarchy sitting under it where most classes are allowed to
borrow. If the root class is bounded it all works like a breeze. I
used a replica of this setup under BSD for nearly 10 years and
recently moved it to Linux.

Because the root class was borrowing, the classes in the underlying
hierarchy were routinely exceeding their allocated bandwidths. As a
result, no QoS.

For the moment I have worked around it by bringing down the bandwidth of
the parent root CBQ qdisc. It is still borrowing:

class cbq 2:16 parent 2: rate 38bit (bounded) prio 2
 Sent 1041746585 bytes 7852722 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
  borrowed 7984534 overactions 0 avgidle 78 undertime 0

However, it just gets dropped by the qdisc.

Bounded classes in lower levels in the hierarchy actually work. Putting
a few more classes between the root and the first class that is bounded
does not. Overall, it is broken and broken pretty badly. 

I have not had the time to sit down and read the actual code yet to see
exactly where it is broken. Apologies,

Best Regards,

On Sat, 2009-07-25 at 22:30 +0200, Moritz Muehlenhoff wrote:
 On Wed, Jun 24, 2009 at 10:21:05AM +0100, Anton Ivanov wrote:
  Package: linux-image-2.6.26
  Version: nfsfix.1
  Severity: normal
  
  
  CBQ is completely broken. The borrowed counters never increase
  and from there on the  bandwidth computation is totally fubar
 
 Please explain the problem more verbosely. What exactly did you
 do and what result did you expect?
 
 Cheers,
 Moritz
 

Bug#521727: Preempt

2009-07-15 Thread Anton Ivanov

I can confirm that.

While the difference between PREEMPT and normal kernels with 2.6.18 and
prior to that was mostly for connoisseurs, with 2.6.26 it is clearly
visible with the naked eye (tested on a bog standard 2GHz Athlon XP).

This should not really be the case especially under light or no load.
There is yet another performance regression somewhere besides the NFS
one and the block IO one which got fixed recently.


Bug#534430: linux-image-2.6.26: CBQ broken

2009-06-24 Thread Anton Ivanov

Package: linux-image-2.6.26
Version: nfsfix.1
Severity: normal


CBQ is completely broken. The borrowed counters never increase and from there 
on the  bandwidth computation is totally fubar.

-- System Information:
Debian Release: 5.0.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26 depends on:
ii  coreutils 6.10-6 The GNU core utilities
ii  debconf [debconf-2.0] 1.5.24 Debian configuration management sy

linux-image-2.6.26 recommends no packages.

Versions of packages linux-image-2.6.26 suggests:
ii  fdutils   5.5-20060227-3 Linux floppy utilities
pn  ksymoops  none (no description available)
pn  linux-doc-2.6.26 | linux- none (no description available)

-- debconf-show failed



Bug#534444: linux-image-2.6.26-2-486: CBQ broken

2009-06-24 Thread Anton Ivanov

Package: linux-image-2.6.26-2-486
Version: 2.6.26-15
Severity: normal


CBQ fails to perform correctly. 


class cbq 2:16 parent 2: leaf 76: rate 32bit (bounded) prio 1
 Sent 551644 bytes 1279 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
  borrowed 810 overactions 0 avgidle 18424 undertime 0

This should never happen.
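
A check for this symptom can be scripted. The following Python sketch assumes the statistics output looks like the excerpt above (a `class cbq ...` header line followed by indented counter lines); output from other iproute2 versions may be laid out differently. The function name `bounded_borrowers` and the `SAMPLE` constant are made up for this example.

```python
import re
from typing import Dict

# Sample copied from the report above.
SAMPLE = """\
class cbq 2:16 parent 2: leaf 76: rate 32bit (bounded) prio 1
 Sent 551644 bytes 1279 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
  borrowed 810 overactions 0 avgidle 18424 undertime 0
"""


def bounded_borrowers(tc_output: str) -> Dict[str, int]:
    """Map class id to borrowed count for bounded classes that borrowed."""
    offenders: Dict[str, int] = {}
    current = None      # class id of the stanza being read
    bounded = False
    for line in tc_output.splitlines():
        header = re.match(r"class cbq (\S+) ", line)
        if header:
            current = header.group(1)
            bounded = "(bounded)" in line
            continue
        stats = re.search(r"\bborrowed (\d+)", line)
        if stats and current and bounded and int(stats.group(1)) > 0:
            offenders[current] = int(stats.group(1))
    return offenders
```

Applied to the sample, the function reports class 2:16 with a borrowed count of 810, which is the combination the report says should never occur.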

From there on the entire CBQ subsystem is fubar...

-- Package-specific info:

-- System Information:
Debian Release: 5.0.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-2-486 depends on:
ii  debconf [debconf-2.0] 1.5.24 Debian configuration management sy
ii  initramfs-tools [linux-initra 0.92o  tools for generating an initramfs
ii  module-init-tools 3.4-1  tools for managing Linux kernel mo

linux-image-2.6.26-2-486 recommends no packages.

Versions of packages linux-image-2.6.26-2-486 suggests:
ii  grub   0.97-47lenny2 GRand Unified Bootloader (Legacy v
ii  lilo   1:22.8-7  LInux LOader - The Classic OS load
pn  linux-doc-2.6.26   none(no description available)

-- debconf-show failed




Bug#534444: Acknowledgement (linux-image-2.6.26-2-486: CBQ broken)

2009-06-24 Thread Anton Ivanov

Additional information.

It does not do it on all classes. I can observe it on a particular class
parented to the root CBQ qdisc with multiple burstable children. 

The isolated flag put on another class parented to the root qdisc is
similarly ignored.

I will try to dig through the source to see exactly where the bug is,
but it is definitely a bug (the results of tc are confirmed by delay
measurements and bandwidth measurements in the relevant classes).

Brgds,

On Wed, 2009-06-24 at 12:09 +, Debian Bug Tracking System wrote:
 Thank you for filing a new Bug report with Debian.
 
 This is an automatically generated reply to let you know your message
 has been received.
 
 Your message is being forwarded to the package maintainers and other
 interested parties for their attention; they will reply in due course.
 
 Your message has been sent to the package maintainer(s):
  Debian Kernel Team debian-kernel@lists.debian.org
 
 If you wish to submit further information on this problem, please
 send it to 534...@bugs.debian.org, as before.
 
 Please do not send mail to ow...@bugs.debian.org unless you wish
 to report a problem with the Bug-tracking system.
 
 

Bug#524199: test build

2009-04-16 Thread Anton Ivanov

It is loaded on one of my machines (the one that sees heavy use). The
results should be available in 6-8 hours. 

On Thu, 2009-04-16 at 00:23 -0600, dann frazier wrote:
 Can you test this build to see if it fixes the issue?
   http://people.debian.org/~dannf/bugs/524199/
 

Bug#524199: test build

2009-04-16 Thread Anton Ivanov

Does not fix it I am afraid. 

It took longer to show up, but it is showing up nonetheless. So it
is not just that bit of code. There is something buggered elsewhere in
the NFS subsystem I am afraid :-(

Took around 8 hours of medium level usage - reading mail, digging for
stuff around the internet, browsing mplayer sources, etc.

aiva...@falkor:~$ uptime
 16:19:30 up  7:47,  2 users,  load average: 0.82, 1.03, 0.95

It started doing it right after I ran a couple of find scripts on
stuff mounted over NFS.

This is on my workstation, which is a diskless P3 running off an etch NFS
server.

On Thu, 2009-04-16 at 00:23 -0600, dann frazier wrote:
 Can you test this build to see if it fixes the issue?
   http://people.debian.org/~dannf/bugs/524199/
 

Bug#524199: linux-image-2.6.26-1-686: nfs unusable

2009-04-15 Thread Anton Ivanov

Package: linux-image-2.6.26-1-686
Version: 2.6.26-13lenny2
Severity: important
Tags: patch


Over time, any system with NFS usage grinds to a crawl. While root or /usr on
NFS are most affected, the same problem should affect other NFS usage. The
symptoms are extremely high system load, taking 20 minutes to open/close
applications, taking half a minute to create the menu on right click in KDE,
becoming unusable for 5 minutes while rebuilding menus in KDE, etc.

This has been reported multiple times against other packages elsewhere (mostly
KDE, due to the idiotic way it builds its menus).

The reason seems to be: 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=23918b03060f6e572168fdde1798a905679d2e06

discussed here: http://lkml.indiana.edu/hypermail/linux/kernel/0812.2/00418.html

On the positive side this made me find out that a whole bunch of programs
are coded with the left foot. This patch needs to be retrofitted
into the kernel to make it usable again for anyone using NFS and especially
diskless clients.

Otherwise, NFS in the current kernel is unusable. After waiting for 30 minutes
for tex to update its map on an otherwise fast machine I downgraded all of my
diskless clients to 2.6.18.


-- Package-specific info:

-- System Information:
Debian Release: 5.0.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.18-6-686 (SMP w/1 CPU core)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages linux-image-2.6.26-1-686 depends on:
ii  debconf [debconf-2.0] 1.5.24 Debian configuration management sy
ii  initramfs-tools [linux-initra 0.92o  tools for generating an initramfs
ii  module-init-tools 3.4-1  tools for managing Linux kernel mo

Versions of packages linux-image-2.6.26-1-686 recommends:
ii  libc6-i6862.7-18 GNU C Library: Shared libraries [i

Versions of packages linux-image-2.6.26-1-686 suggests:
ii  grub   0.97-47lenny2 GRand Unified Bootloader (Legacy v
ii  lilo   1:22.8-7  LInux LOader - The Classic OS load
pn  linux-doc-2.6.26   none(no description available)

-- debconf-show failed




Bug#524199: linux-image-2.6.26-1-686: nfs unusable

2009-04-15 Thread Anton Ivanov

Frankly, this deserves a higher bug rating than important.

A Unix without a working NFS is not a Unix. At least not a usable one.

I can now confirm that downgrading to 2.6.18 seems to fix it. All 3
machines I have with diskless Lenny are still up and usable. By this
time they would have been out of commission with 2.6.26.

On Wed, 2009-04-15 at 13:52 -0400, John Morrissey wrote:
On Wed, Apr 15, 2009 at 01:30:51PM +0100, Anton Ivanov wrote:
Over time, any system with NFS usage grinds to a crawl. While root or /usr
on NFS are most affected the same problem should affect other NFS usage.
The symptoms are extremely high system load,
[snip]
The reason seems to be:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=23918b03060f6e572168fdde1798a905679d2e06

FWIW, we're also seeing this, on machines mounting several large NFS volumes
that serve virtual accounts (i.e., they all have the same UID/GID and no
shell access).

After ~two days of uptime, these moderately loaded machines (running Apache,
ProFTPD, and Courier IMAP) start spending 90% of CPU time in system state
and become so unresponsive that they must be rebooted. Running the 2.6.28
that was recently in sid (which contains the commit Anton mentions) fixed
this.

john

Bug#421443: linux-image-2.6.18-4-686: ide tape broken and ide scsi disabled, ide tapes unuseable

2007-04-29 Thread Anton Ivanov

Package: linux-image-2.6.18-4-686
Version: 2.6.18.dfsg.1-12
Severity: important


Kernel detects ide tape

ide-tape: hdb - ht0: Seagate STT2A rev 8A51
ide-tape: hdb - ht0: 1000KBps, 6*54kB buffer, 9720kB pipeline, 108ms tDSC, DMA

After which all userland utilities fail to access it or issue any
commands to it.

The drive mostly works using ide-tape in 2.6.14 and 2.6.16 on write (some 
read problems). Works fine in 2.6.16 using ide-scsi read/write.

Frankly, anyone who has had to use IDE tapes knows that Linus can go get
lost with his statement that the IDE tape driver is now a perfect
replacement for ide-scsi (made multiple times on lkml since 2003). It
isn't. In fact I have yet to see a kernel release where it works fine.

So disabling IDE-SCSI is not nice. That is the only means to use ide
tape drives at the moment.

-- System Information:
Debian Release: 4.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-686
Locale: LANG=en_GB, LC_CTYPE=en_GB (charmap=ISO-8859-1)

Versions of packages linux-image-2.6.18-4-686 depends on:
ii  coreutils 5.97-5.3   The GNU core utilities
ii  debconf [debconf-2.0] 1.5.11 Debian configuration management sy
ii  initramfs-tools [linux-initra 0.85g  tools for generating an initramfs
ii  module-init-tools 3.3-pre4-2 tools for managing Linux kernel mo

Versions of packages linux-image-2.6.18-4-686 recommends:
ii  libc6-i686  2.3.6.ds1-13 GNU C Library: Shared libraries [i

-- debconf information excluded

