Re: [uml-devel] Issues with a rather unusual configured NFS server
richard -rw- weinberger wrote:
>>>> Thanks for the report. I think I see the problem--after this commit
>>>> nfs4_set_delegation() failures result in nfs4_put_delegation being
>>>> called, but nfs4_put_delegation doesn't free the nfs4_file that has
>>>> already been set by alloc_init_deleg().
>>>>
>>>> Let me think about how to fix that
>>>
>>> Sorry for the slow response--can you check whether this fixes the
>>> problem?
>>>
>> Yes.
>>
>> With the attached patch the problem can't be reproduced any longer
>> with the prepared test case and current git kernels.
> BTW: Is nobody else fuzz testing NFS? Or are these bugs just more
> likely to hit on UML? This is not the first NFS issue found by Toralf
> using UML and Trinity.

Kernel thread scheduling is likely very different on UML than on other architectures. My guess is that there could well be gaps where no kernel thread is scheduled (because another process is running), followed by resumption of a thread other than the one which would have been resumed on other virtualization.

--
] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works| network architect [
] m...@sandelman.ca http://www.sandelman.ca/| ruby on rails[
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Expertise required:USB bulk-throughput and memory leak detection
[EMAIL PROTECTED] wrote:
> 1.) detecting memory leaks caused by our driver code.

Your code will of course be allocating buffers. If you are allocating from a specific slab and you have leaks, they will show up in /proc/slabinfo.

I wrote some code last month, which I called slabwatch, to track each item from slabinfo over time and let me plot it. I knew I had a memory leak which under heavy (network) test loads eventually brought down the kernel, and I was able to determine what was leaking, and eventually where the leak was.

http://www.sandelman.ca/software/slabwatch-1.3.tgz

It's very rudimentary.

> 2.) Neither have we been able to find a tool which shows % utilisation
> of kernel memory used by our driver. (kernel memory monitoring)

If you know what pools you are allocating from, then you should be able to see what is going on. If you are able to, you may want to allocate (at least experimentally) your own slab, because then you can see precisely what is going on.

This isn't trivial for skb allocations, because you have to use a custom destructor for the skb, as kfree_skb() would otherwise free things into the wrong slab.

I wouldn't have minded having a per-interface pool of memory (and I can see a lot of uses where it would be valuable to limit skb's allocated to the capture port of a snort sensor, for instance, while not starving the management port), but I don't know if skb->destructor is sufficiently under-used to permit such a thing to be trivially implemented.

I don't know the situation with USB drivers.
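For illustration, here is a minimal user-space sketch of the idea of watching one cache in /proc/slabinfo over time. It is not the slabwatch code; the struct and function names (slab_sample, parse_slabinfo_line, slab_growth) are made up for this example, and it assumes the usual layout where each data line starts with the cache name followed by active_objs, num_objs and objsize.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One sample of a cache's counters, taken from a /proc/slabinfo line. */
struct slab_sample {
    char name[64];
    unsigned long active_objs;
    unsigned long num_objs;
    unsigned long objsize;
};

/*
 * Parse one data line, assumed to begin with:
 *   name active_objs num_objs objsize ...
 * Returns 0 on success, -1 for header/comment lines or malformed input.
 */
int parse_slabinfo_line(const char *line, struct slab_sample *out)
{
    assert(line != NULL && out != NULL);
    if (line[0] == '#' || strncmp(line, "slabinfo", 8) == 0)
        return -1;
    if (sscanf(line, "%63s %lu %lu %lu", out->name, &out->active_objs,
               &out->num_objs, &out->objsize) != 4)
        return -1;
    return 0;
}

/* Growth in live objects between two samples of the same cache. */
long slab_growth(const struct slab_sample *before,
                 const struct slab_sample *after)
{
    return (long)after->active_objs - (long)before->active_objs;
}
```

Sampling the file at intervals and plotting slab_growth() per cache would show a leaking cache as a line that keeps climbing under load while the others level off.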
failure of cache_alloc_refill to terminate
I am running a UserModeLinux of 2.6.19-rc4 vintage. It failed to stop properly on halt. Yes, there is some custom code in it, and it is possible that some memory got walked on.

Attaching GDB to it shows that it is stuck in:

    slab.c, cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)

in the loop for:

    while (batchcount > 0) {..}

because the decrement:

    while (slabp->inuse < cachep->num && batchcount--) {

is never performed. I don't know why slabp->inuse < cachep->num, but because it is never true, the batchcount never gets decremented.

Since I think the point of the batchcount was to prevent it from trying too hard, I am uncertain if it was intended that the batchcount not decrement if the inuse check fails. I think the point was to limit how many things we add to the slab at once.

(Yes, I suspect that the slab got corrupted.)

The stack trace is:

#0  cache_alloc_refill (cachep=0x8800600, flags=208) at mm/slab.c:2954
#1  0x080bfbf0 in __kmalloc (size=1024, flags=<value optimized out>) at mm/slab.c:3085
#2  0x080d8c34 in alloc_fd_array (num=256) at include/linux/slab.h:146
#3  0x080d91da in alloc_fdtable (nr=255) at fs/file.c:274
#4  0x080d926a in expand_fdtable (files=0x8c12a80, nr=255) at fs/file.c:304
#5  0x080d9315 in expand_files (files=0x8c12a80, nr=255) at fs/file.c:347
#6  0x0807fffb in dup_fd (oldf=0x82d1480, errorp=0x840fe3c) at kernel/fork.c:679
#7  0x08080237 in copy_files (clone_flags=17, tsk=0x8771380) at kernel/fork.c:764
#8  0x08080706 in copy_process (clone_flags=17, stack_start=3214960796, regs=0x86894b4, stack_size=0, parent_tidptr=0x0, child_tidptr=0x0, pid=952) at kernel/fork.c:1107
#9  0x080810cd in do_fork (clone_flags=17, stack_start=3214960796, regs=0x86894b4, stack_size=0, parent_tidptr=0x0, child_tidptr=0x0) at kernel/fork.c:1368
#10 0x080628b8 in sys_fork () at arch/um/kernel/syscall.c:34
#11 0x08065492 in handle_syscall (r=0x86894b4) at arch/um/kernel/skas/syscall.c:38
#12 0x08078647 in handle_trap (pid=6063, regs=0x86894b4, local_using_sysemu=2) at arch/um/os-Linux/skas/process.c:151
#13 0x08078d95 in userspace (regs=0x86894b4) at arch/um/os-Linux/skas/process.c:302
#14 0x08065104 in fork_handler () at arch/um/kernel/skas/process.c:96
#15 0x in ?? ()
(gdb)

which is not really specific to rebooting/shutdown.
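To make the short-circuit behaviour concrete, here is a small user-space model of the two loop conditions quoted in this report. It is not the kernel's cache_alloc_refill(): the function names and the max_passes guard are invented for the example, and it assumes the outer loop keeps being handed the same slab, which is only one guess at what a corrupted list might do.

```c
#include <assert.h>

/*
 * One pass of the inner loop over a single slab. Because of &&,
 * (*batchcount)-- is evaluated only when *inuse < num holds.
 * Returns the number of objects taken.
 */
int take_from_slab(unsigned *inuse, unsigned num, int *batchcount)
{
    int taken = 0;

    assert(inuse != 0 && batchcount != 0);
    while (*inuse < num && (*batchcount)--) {
        (*inuse)++;   /* hand out one object */
        taken++;
    }
    return taken;
}

/*
 * Outer loop that revisits the same slab on every pass (an assumption).
 * The max_passes cap exists only so that the model terminates.
 * Returns the number of passes made.
 */
int refill_passes(unsigned inuse, unsigned num, int batchcount, int max_passes)
{
    int passes = 0;

    while (batchcount > 0 && passes < max_passes) {
        take_from_slab(&inuse, num, &batchcount);
        passes++;
    }
    return passes;
}
```

With a slab that has room, refill_passes(0, 8, 4, 1000) finishes after one pass. With a slab that already claims to be full, refill_passes(8, 8, 4, 1000) runs until the cap, because batchcount is never touched.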
Re: [PATCH 00/14] ppc32: Remove board ports that are no longer maintained
>>>>> "Kumar" == Kumar Gala <[EMAIL PROTECTED]> writes:
    >> When we recover our history from the linuxppc-2.4/2.5 trees we
    >> can show exactly how long it's been since anybody touched ep405.
    >>
    >> Quick googling shows that it's been almost 2 years since the last
    >> mention of ep405 (excluding removal discussions) on
    >> linuxppc-embedded. Last ep405-related commits are more than 2
    >> years ago.

So, I'll bet I can find other parts of the kernel tree that haven't been touched in 2 years. Maybe there isn't anything to fix?

It happens that in our case,
a) the board is the basis for our own board.
b) we only moved to 2.6 in May.

So, I just don't get removing board support files.

--
] Michael Richardson Xelerance Corporation, Ottawa, ON | firewalls [
] mcr @ xelerance.com Now doing IPsec training, see |net architect[
] http://www.sandelman.ca/mcr/www.xelerance.com/training/ |device driver[
]I'm a dad: http://www.sandelman.ca/lrmr/ [
Re: [PATCH 00/14] ppc32: Remove board ports that are no longer maintained
Kumar, I thought that we had some volunteers to take care of some of those.

I know that I still care about ep405, and I'm willing to maintain the code.

--
] Michael Richardson Xelerance Corporation, Ottawa, ON | firewalls [
] mcr @ xelerance.com Now doing IPsec training, see |net architect[
] http://www.sandelman.ca/mcr/www.xelerance.com/training/ |device driver[
]I'm a dad: http://www.sandelman.ca/lrmr/ [
Re: New net features for added performance
>>>>> "Jeff" == Jeff Garzik <[EMAIL PROTECTED]> writes:
    Jeff> 1) Rx Skb recycling. It would be nice to have skbs returned to the
    Jeff> driver after the net core is done with them, rather than have netif_rx
    Jeff> free the skb. Many drivers pre-allocate a number of maximum-sized skbs
    Jeff> into which the net card DMA's data. If netif_rx returned the SKB
    Jeff> instead of freeing it, the driver could simply flip the DescriptorOwned
    Jeff> bit for that buffer, giving it immediately back to the net card.

    Jeff> Disadvantages?

netif_rx() would have to copy the buffer.

Right now, it just puts it on the queue towards the BH. For it to return the skb would require that all processing occur inside of netif_rx() (a la BSD), or that it copy the buffer.

    Jeff> 3) Slabbier packet allocation. Even though skb allocation is decently
    Jeff> fast, you are still looking at an skb buffer head grab and a

I think that if you had this, and you also returned skb's to this list on a per-device basis (change skb->free, I think) instead of to the general pool, you would probably eliminate the need for your request #1.

] Train travel features AC outlets with no take-off restrictions|gigabit is no[
] Michael Richardson, Solidum Systems Oh where, oh where has|problem with[
] [EMAIL PROTECTED] www.solidum.com the little fishy gone?|PAX.port 1100[
] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [
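As a rough illustration of returning buffers to a per-device list rather than a general pool, here is a user-space sketch. Nothing in it is kernel API: struct pkt_buf, struct buf_pool and the pool_* functions are hypothetical stand-ins, and the buffer size and pool depth are arbitrary. The only idea taken from the discussion is that each buffer carries a release hook that hands it back to the device that owns it.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

struct buf_pool;

/* Hypothetical stand-in for an skb: a buffer that remembers its owner. */
struct pkt_buf {
    struct pkt_buf *next;                  /* free-list link */
    struct buf_pool *owner;                /* device pool to return to */
    void (*destructor)(struct pkt_buf *);  /* called when the stack is done */
    unsigned char data[1536];
};

/* Hypothetical per-device pool, preallocated like an rx ring. */
struct buf_pool {
    struct pkt_buf bufs[POOL_SIZE];
    struct pkt_buf *free_list;
    int nfree;
};

/* Release hook: put the buffer back on its own device's free list. */
static void pool_recycle(struct pkt_buf *b)
{
    b->next = b->owner->free_list;
    b->owner->free_list = b;
    b->owner->nfree++;
}

void pool_init(struct buf_pool *p)
{
    int i;

    assert(p != NULL);
    p->free_list = NULL;
    p->nfree = 0;
    for (i = 0; i < POOL_SIZE; i++) {
        p->bufs[i].owner = p;
        p->bufs[i].destructor = pool_recycle;
        pool_recycle(&p->bufs[i]);
    }
}

/* Returns NULL when the device has used up its own allowance. */
struct pkt_buf *pool_get(struct buf_pool *p)
{
    struct pkt_buf *b;

    assert(p != NULL);
    b = p->free_list;
    if (b != NULL) {
        p->free_list = b->next;
        p->nfree--;
    }
    return b;
}

/* What the consumer calls instead of freeing into a general pool. */
void buf_release(struct pkt_buf *b)
{
    assert(b != NULL && b->destructor != NULL);
    b->destructor(b);
}
```

Because pool_get() fails once a device's own buffers are all in flight, one busy interface cannot starve another. The sketch does not guard against releasing the same buffer twice.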
Re: Too long network device names corrupts kernel
>>>>> "Tobias" == Tobias Ringstrom <[EMAIL PROTECTED]> writes:
    Tobias> Btw, does anyone know of a C function that works like strncpy, but does
    Tobias> add a terminating null character, even if the string does not fit, or
    Tobias> does one have to do str[5]=0 first, and then strncpy(str,src,4)?

    str[0]=0;
    strncat(str, src, 4);

Works as you want.

] Train travel features AC outlets with no take-off restrictions|gigabit is no[
] Michael Richardson, Solidum Systems Oh where, oh where has|problem with[
] [EMAIL PROTECTED] www.solidum.com the little fishy gone?|PAX.port 1100[
] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [
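The idiom relies on strncat() appending at most n characters and then always writing a terminating NUL, so the destination must hold n + 1 bytes. A small wrapper makes that explicit; copy_truncated is a name made up for this sketch, not a standard function.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy at most dstsize - 1 characters of src into dst and always
 * NUL-terminate. dst must have room for dstsize bytes.
 */
char *copy_truncated(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    dst[0] = '\0';                          /* start from an empty string */
    return strncat(dst, src, dstsize - 1);  /* appends, then adds the NUL */
}
```

With a five-byte buffer, copying "ethernet0" leaves "ethe" in the buffer, and copying "lo" leaves "lo".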