Bug#522726: kernel problem after a simple 'rm' command: RESERVE_SPACE(805) failed in function encode_lookup

2010-04-05 Thread Aleksandr Levchuk
Hi Ben,

No, I haven't had a chance to check whether the bug exists in a newer version.
We changed our NFS server from Linux to OpenSolaris.

But it was a major problem. It recurred every time a user attempted a
filesystem operation on a very long filename (e.g. 500 characters). Any
fs write operation (rm, creating a new file) would cause a kernel panic.

The crash happened several times a year. In all cases it was when
someone accidentally passed data instead of a filename to a piece
of code that expects filenames.
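
To illustrate the failure mode above: a caller can reject strings that
are clearly data rather than filenames before they ever reach the
filesystem. The following is only a minimal sketch in Python, not
something we ran on the affected system. The function name
`check_filename` and its parameters are hypothetical, and it assumes the
input is a single path component (no directory part). It works around
the trigger at the application level and does not fix the kernel bug.

```python
import os


def check_filename(name, directory=".", name_max=None):
    """Raise ValueError if `name` is not a plausible single filename.

    Hypothetical helper: `name_max` may be passed explicitly; otherwise
    the limit is queried for the filesystem that holds `directory`.
    """
    if name_max is None:
        # Longest path component the filesystem accepts (often 255 bytes).
        name_max = os.pathconf(directory, "PC_NAME_MAX")

    # Limits are counted in bytes, so measure the encoded form.
    encoded = os.fsencode(name)

    if not encoded:
        raise ValueError("empty filename")
    if b"\n" in encoded or b"\0" in encoded:
        # Newlines are legal in filenames, but here they almost always
        # mean that file contents were passed by mistake.
        raise ValueError("filename contains a newline or NUL byte")
    if len(encoded) > name_max:
        raise ValueError(
            "filename is %d bytes; limit is %d" % (len(encoded), name_max)
        )
```

A script would call `check_filename(arg)` on each argument before
invoking `os.remove(arg)` or `open(arg, "w")`, and report the ValueError
to the user instead of passing the string on.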

Alex


On Mon, Apr 5, 2010 at 3:45 PM, Ben Hutchings wrote:
> On Sun, 2009-04-05 at 23:08 -0700, Aleksandr Levchuk wrote:
>> Package: nfs-kernel-server
>> Version: 1:1.0.10-6+etch.1
>> Severity: important
>>
>> My very stable server crashed as a result of a 'rm' command in an
>> NFS-mounted home directory. The argument to 'rm' was a file name (with
>> newlines), but that file did not exist.
> [...]
>
> Sorry for the delay in replying to this.  The nfs-kernel-server package
> only contains supporting scripts, but the bug is clearly in the kernel
> itself (linux-image-* packages).
>
> The system you reported this bug from was apparently running Linux
> 2.6.22.  I assume that is the same version in which you saw this bug.
> Have you seen the bug reoccur in any more recent kernel version?
>
> Ben.
>
> --
> Ben Hutchings
> Once a job is fouled up, anything done to improve it makes it worse.
>



-- 
-
Aleksandr Levchuk
Administrator of Bioinformatic Systems and Databases

Homepage: http://biocluster.ucr.edu/~alevchuk/
Cell Phone: (951) 368-0004
Lab Phone: (951) 905-5232

Institute for Integrative Genome Biology
University of California, Riverside
-






2010-04-05 Thread Ben Hutchings
On Sun, 2009-04-05 at 23:08 -0700, Aleksandr Levchuk wrote:
> Package: nfs-kernel-server
> Version: 1:1.0.10-6+etch.1
> Severity: important
> 
> My very stable server crashed as a result of a 'rm' command in an
> NFS-mounted home directory. The argument to 'rm' was a file name (with
> newlines), but that file did not exist.
[...]

Sorry for the delay in replying to this.  The nfs-kernel-server package
only contains supporting scripts, but the bug is clearly in the kernel
itself (linux-image-* packages).

The system you reported this bug from was apparently running Linux
2.6.22.  I assume that is the same version in which you saw this bug.
Have you seen the bug reoccur in any more recent kernel version?

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.




2009-04-05 Thread Aleksandr Levchuk
Package: nfs-kernel-server
Version: 1:1.0.10-6+etch.1
Severity: important

My very stable server crashed as a result of a 'rm' command in an
NFS-mounted home directory. The argument to 'rm' was a file name (with
newlines), but that file did not exist.

The NFS client and the NFS server were the same machine.

Surprisingly, this caused a big problem inside the kernel: the stack
trace shows a large number of NFS calls.

Here is what I did and what I got in response:
alevc...@biocluster:~/.html/cellwall$ rm 'source_fasta_tair-v20080412-seq---_downloaded-2009-04-04
> source_fasta_tair-v20080412-pep---_downloaded-2009-04-04
> source_fasta_tair-v20080412-cds---_downloaded-2009-04-04
> source_fasta_tair-v20080412-cdna--_downloaded-2009-04-04
> source_fasta_tair-v20080229-igenic_downloaded-2009-04-04
> source_fasta_tair-v20080228-intron_downloaded-2009-04-04
> source_fasta_tigr-v6-0-all-seq_downloaded-2009-04-04
> source_fasta_tigr-v6-0-all-pep_downloaded-2009-04-04
> source_fasta_jgi-poptr-v1-1_prot--_downloaded-2009-04-04
> source_fasta_jgi-phypa-v1-1_trans-_downloaded-2009-04-04
> source_fasta_jgi-phypa-v1-1_prot--_downloaded-2009-04-04
> source_fasta_uniprot-v14-9-_tremb-_downloaded-2009-04-04
> source_fasta_uniprot-v14-9-_sprot-_downloaded-2009-04-04
> source_fasta_jgi-poptr-v1-1_trans-_downloaded-2009-04-04'
Segmentation fault

Message from sysl...@biocluster at Sat Apr  4 23:06:56 2009 ...
biocluster kernel: [ cut here ]

Message from sysl...@biocluster at Sat Apr  4 23:06:56 2009 ...
biocluster kernel: invalid opcode:  [1] SMP

Message from sysl...@biocluster at Sat Apr  4 23:06:56 2009 ...
biocluster kernel: invalid opcode:  [1] SMP

Message from sysl...@biocluster at Sat Apr  4 23:06:56 2009 ...
biocluster kernel: [ cut here ]


Here is what /var/log/messages showed immediately after:

Apr  4 22:39:40 biocluster -- MARK --
Apr  4 22:59:40 biocluster -- MARK --
Apr  4 23:06:56 biocluster kernel: RESERVE_SPACE(805) failed in function encode_lookup
Apr  4 23:06:56 biocluster kernel: CPU 15
Apr  4 23:06:56 biocluster kernel: Modules linked in: tcp_diag
inet_diag nfsd exportfs button ac battery autofs4 ib_ipoib ipv6 nfs
lockd nfs_acl sunrpc quota_v1 ext2 ext3 jbd mbcache dm_snapshot
dm_mirror dm_mod qla2xxx mppVhba mppUpper sg rdma_ucm rdma_cm ib_cm
iw_cm ib_sa ib_addr ib_umad ib_ipath ib_uverbs mlx4_ib ib_mad ib_core
loop psmouse serio_raw i2c_i801 i2c_core shpchp pci_hotplug pcspkr
mlx4_core igb evdev xfs ide_cd cdrom ata_generic sd_mod ata_piix
libata piix generic ide_core ehci_hcd uhci_hcd firmware_class
scsi_transport_fc mptsas mptscsih mptbase e1000 scsi_transport_sas
scsi_mod thermal processor fan
Apr  4 23:06:56 biocluster kernel: Pid: 12459, comm: rm Not tainted
2.6.22-3-amd64 #1
Apr  4 23:06:56 biocluster kernel: RIP: 0010:[]
[] :nfs:encode_lookup+0x34/0x5c
Apr  4 23:06:56 biocluster kernel: RSP: 0018:81053e8b38d8  EFLAGS: 00010292
Apr  4 23:06:56 biocluster kernel: RAX: 0037 RBX:
031d RCX: 804afd28
Apr  4 23:06:56 biocluster kernel: RDX: 804afd28 RSI:
0092 RDI: 804afd20
Apr  4 23:06:56 biocluster kernel: RBP: 0325 R08:
804afd28 R09: 
Apr  4 23:06:56 biocluster kernel: R10: 0046 R11:
8100010ceb40 R12: 81070967edb0
Apr  4 23:06:56 biocluster kernel: R13: 810e2c4343a8 R14:
88408091 R15: 81070967edb0
Apr  4 23:06:56 biocluster kernel: FS:  2b5b8bc496e0()
GS:810f0463a6c0() knlGS:
Apr  4 23:06:56 biocluster kernel: CS:  0010 DS:  ES:  CR0:
8005003b
Apr  4 23:06:56 biocluster kernel: CR2: 00403940 CR3:
000b7e1ee000 CR4: 06e0
Apr  4 23:06:56 biocluster kernel: Process rm (pid: 12459, threadinfo
81053e8b2000, task 810c73dad020)
Apr  4 23:06:56 biocluster kernel: Stack:  810e2c4343a8
81053e8b3a38 81063849b884 884080f3
Apr  4 23:06:56 biocluster kernel:  81063849b8ac 810e2c4343b0
81063849ba38 810e2c4343b0
Apr  4 23:06:56 biocluster kernel:  0004 
 81063849b884
Apr  4 23:06:56 biocluster kernel: Call Trace:
Apr  4 23:06:56 biocluster kernel:  []
:nfs:nfs4_xdr_enc_lookup+0x62/0x85
Apr  4 23:06:56 biocluster kernel:  []
:sunrpc:call_transmit+0x1c1/0x22d
Apr  4 23:06:56 biocluster kernel:  []
:sunrpc:__rpc_execute+0x7d/0x234
Apr  4 23:06:56 biocluster kernel:  []
:sunrpc:rpc_call_sync+0x75/0x9c
Apr  4 23:06:56 biocluster kernel:  [] touch_atime+0xbe/0x101
Apr  4 23:06:56 biocluster kernel:  []
:nfs:nfs4_proc_lookup+0xe5/0x25c
Apr  4 23:06:56 biocluster kernel:  []
get_page_from_freelist+0x363/0x4de
Apr  4 23:06:56 biocluster kernel:  []
:nfs:nfs_lookup+0xf6/0x262
Apr  4 23:06:56 biocluster kernel:  [] do_lookup+0x63/0x1ae
Apr  4 23:06:56 biocluster kernel:  [] dput+0x1c/0x10b
Apr  4 23:06:56 biocluster kernel:  []
current_fs_time+0x3b/0x40
Apr  4 23:06:5