Re: Sanitize CPU-state when switching tasks (was sanitize CPU-state when switching from virtual-8086 mode to other task)
It took me some time to build a Debian Sid testing environment for
amd64 of the same quality as the one I have for i386, but now it is
ready. It seems that amd64 is also affected, but the lockup is
immediate (which makes exploitation harder).

Here is the OOPS from the serial console, again in __switch_to:

[ 498.783577] fpu exception: [#1] SMP
[ 498.787054] Modules linked in: xt_multiport xt_hashlimit xt_tcpudp ipt_ULOG xt_LOG xt_conntrack iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables fuse snd_pcm snd_page_alloc snd_timer snd soundcore i2c_piix4 psmouse pcspkr evdev serio_raw i2c_core parport_pc parport battery button ac ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common sg sr_mod cdrom ata_generic virtio_net mptspi scsi_transport_spi ata_piix virtio_pci virtio_ring virtio mptscsih mptbase libata scsi_mod
[ 498.787205] CPU: 0 PID: 1783 Comm: Test Not tainted 3.12-1-amd64 #1 Debian 3.12.6-2
[ 498.787205] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 498.787205] task: 88000cb18840 ti: 88000b454000 task.ti: 88000b454000
[ 498.787205] RIP: 0010:[81011730] [81011730] __switch_to+0x2d0/0x490
[ 498.787205] RSP: 0018:88000e0c78b8 EFLAGS: 00010002
[ 498.787205] RAX: 0001 RBX: 88000e0b77c0 RCX: c100
[ 498.787205] RDX: RSI: 51e3f800 RDI: c100
[ 498.787205] RBP: 88000cb18840 R08: R09: 3314
[ 498.787205] R10: 1746 R11: 000f R12:
[ 498.787205] R13: R14: 88000fc11780 R15:
[ 498.787205] FS: 7fb651e3f800() GS:88000fc0() knlGS:
[ 498.787205] CS: 0010 DS: ES: CR0: 80050033
[ 498.787205] CR2: 7f72ddfcc990 CR3: 0e22d000 CR4: 06f0
[ 498.787205] Stack:
[ 498.787205] 88000e0b7bc0 00010fc14330 88000b4efac0 88000e0b77c0
[ 498.787205] 88000fc142c0 88000b5d3b40 88000e0b77c0
[ 498.787205] 8148febe 88000e0b77c0 0086 000142c0
[ 498.787205] Call Trace:
[ 498.787205] Code: ff 66 2e 0f 1f 84 00 00 00 00 00 bf 7d 00 00 00 e8 e6 00 01 00 84 c0 0f 85 d7 fd ff ff 0f 06 66 66 90 66 90 e9 cb fd ff ff 66 90 <0f> 77 db 83 94 04 00 00 66 90 eb 74 b8 ff ff ff ff 48 8b bb 98
[ 498.787205] RIP [81011730] __switch_to+0x2d0/0x490
[ 498.787205] RSP 88000e0c78b8
[ 498.787205] ---[ end trace 3f873c38e16c8005 ]---
[ 498.787205] Fixing recursive fault but reboot is needed!

I'll try to follow the same line as before: understand it, write a
local root exploit for it (I have a feeling that this might be really
hard on that kernel) and test it on the bare hardware afterwards.

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
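In an oops like the one above, the "Code:" line marks the byte at the faulting instruction pointer with angle brackets. As a minimal sketch for reading such a line (the helper name `faulting_bytes` is made up for illustration and is not part of any kernel tooling), the marked byte and its successor can be pulled out like this:

```python
import re

# "Code:" bytes copied from the amd64 oops above; "<..>" marks the byte
# at the faulting instruction pointer.
CODE_LINE = (
    "ff 66 2e 0f 1f 84 00 00 00 00 00 bf 7d 00 00 00 e8 e6 00 01 00 "
    "84 c0 0f 85 d7 fd ff ff 0f 06 66 66 90 66 90 e9 cb fd ff ff 66 90 "
    "<0f> 77 db 83 94 04 00 00 66 90 eb 74 b8 ff ff ff ff 48 8b bb 98"
)


def faulting_bytes(code_line, count=2):
    """Return `count` bytes, as ints, starting at the <..>-marked byte."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"<[0-9a-fA-F]{2}>", tok):
            window = [tok.strip("<>")] + tokens[i + 1:i + count]
            return [int(b, 16) for b in window]
    raise ValueError("no <..> marker found in Code: line")


if __name__ == "__main__":
    print(" ".join(f"{b:02x}" for b in faulting_bytes(CODE_LINE)))
```

For this oops the two bytes are 0f 77, which is the opcode of the EMMS instruction. The same byte pair is marked in the 32-bit oops from the first report, so both traces stop at the same kind of instruction inside __switch_to.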
Re: Sanitize CPU-state when switching tasks (was sanitize CPU-state when switching from virtual-8086 mode to other task)
Borislav Petkov wrote:
> On Wed, Jan 08, 2014 at 09:42:40AM -0800, H. Peter Anvin wrote:
>> Adding Borislav.
>>
>> Boris, do you happen to know of any erratum on AMD E-350 which
>> may be in play here?
>
> Interesting. Well, nothing looks even remotely related from looking
> at the F14h rev guide here:
>
> http://developer.amd.com/wordpress/media/2012/10/47534_14h_Mod_00h-0Fh_Rev_Guide.pdf
>
> Btw, hd (if that is your real name :-)), can you post
> /proc/cpuinfo?

Of course (you can also find it in the Debian bug report [1]):

processor : 0
vendor_id : AuthenticAMD
cpu family : 20
model : 1
model name : AMD E-350 Processor
stepping: 0
microcode : 0x528
cpu MHz : 1596.563
cache size : 512 KB
fdiv_bug: no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips: 3193.12
clflush size: 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

> I think I might have a E-350 here too and I could try to reproduce.
> Btw, how exactly do you trigger?
>
> You run
> FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage.c first
> to modify shmem_xattr_handlers and then
> ManipulatedXattrHandlerForPrivEscalation.c? You need a 32-bit
> kernel and userspace, right? Anything else?

Yes: I used the standard Debian Sid 486 kernel (32-bit). The first
tool might trigger the OOPS too early; this seems to be harmless to
the kernel, so one can invoke it repeatedly until the handler pointer
has been modified. Since I hardcoded the Debian kernel addresses
(copied from System.map), this is very unlikely to give you root on
another kernel, but the math OOPS should be reproducible.

Does this sound fishy (from [2])?

  "There is no need to save any active fpu state to the task
  structure memory if the task is dead. Just drop the state instead."

My rogue process might interfere with that: change control registers,
cause an exception and then exit quickly.

Or could it be invalid CPU feature detection, perhaps related to [3]?

The math-restore/__do_switch combination has already occurred in
older bug reports, e.g. [4] (very close) and [5] (similar, poor
info). )))OOPS "EIP is at math_state_restore"((( seems to be a
suitable search expression.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=733551
[2] http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/02182.html
[3] http://lkml.indiana.edu/hypermail/linux/kernel/0905.2/02599.html
[4] https://lkml.org/lkml/2008/6/16/146
[5] http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1536

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
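Since the CPU feature set comes up repeatedly in this thread (erratum lookup, possible feature misdetection, and the "noxsave" switch mentioned in another message), here is a minimal sketch for checking individual flags in /proc/cpuinfo output. The `cpu_flags` helper is hypothetical, and the sample text is only the model name and flags lines from the output posted above:

```python
# "model name" and "flags" lines copied from the /proc/cpuinfo output above.
CPUINFO = (
    "model name : AMD E-350 Processor\n"
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt "
    "pdpe1gb rdtscp lm constant_tsc nonstop_tsc extd_apicid aperfmperf pni "
    "monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm "
    "sse4a misalignsse 3dnowprefetch ibs skinit wdt arat hw_pstate npt lbrv "
    "svm_lock nrip_save pausefilter\n"
)


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    flags = cpu_flags(CPUINFO)
    for name in ("fxsr", "xsave"):
        print(name, "present" if name in flags else "absent")
```

On the flags line as posted, fxsr is listed and xsave is not. That is only an observation about the posted output; whether it has any bearing on the bug is not established in this thread.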
Re: Sanitize CPU-state when switching tasks (was sanitize CPU-state when switching from virtual-8086 mode to other task)
Update to the issue:

* Although first observed with virtual-8086 mode, the bug is not
  specific to virtual-8086 mode. It can also be triggered with normal
  x86 userspace code, even with better reproducibility.

* Ben Hutchings looked at the Debian bug report [1]. He failed to
  reproduce it on his hardware, so it might be specific to some CPU
  models (currently my AMD E-350 is the only machine known to be
  affected).

* When mmap_min_addr is deactivated, the NULL dereference during task
  switch is exploitable. This works both on native hardware and within
  VirtualBox. See [2] for a POC to gain root privileges.

* It seems that when changing the FPU control word with "fstcw" just
  before exit of the process, another process can suffer when doing
  __do_switch. This is probably related to the xsave instruction and
  an x86 processor bug workaround, see the "noxsave" switch [3]:

    [BUGS=X86] Disables x86 extended register state save and restore
    using xsave. The kernel will fallback to enabling legacy
    floating-point and sse state.

hd

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=733551
[2] http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/
[3] https://www.kernel.org/doc/Documentation/kernel-parameters.txt

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
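The exploitability claim above depends on mmap_min_addr being deactivated. As a small sketch for checking that precondition on a test machine, the sysctl can be read from /proc/sys/vm/mmap_min_addr. The helper names below are made up for illustration, and the check only looks at the sysctl value: other restrictions such as security modules are not considered.

```python
from pathlib import Path

SYSCTL_PATH = "/proc/sys/vm/mmap_min_addr"


def parse_min_addr(text):
    """Parse the decimal value as printed by the sysctl file."""
    return int(text.strip())


def sysctl_allows_null_page(min_addr):
    """True if the sysctl itself does not block a mapping at address 0."""
    return min_addr == 0


def read_min_addr(path=SYSCTL_PATH):
    """Return the current value, or None if the file does not exist."""
    p = Path(path)
    return parse_min_addr(p.read_text()) if p.exists() else None


if __name__ == "__main__":
    value = read_min_addr()
    if value is None:
        print("mmap_min_addr sysctl not available on this system")
    else:
        state = "deactivated" if sysctl_allows_null_page(value) else "active"
        print(f"vm.mmap_min_addr = {value} ({state})")
```

A non-zero value means the sysctl blocks the NULL-page mapping that the POC referenced in [2] relies on.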
Re: Sanitize FPU-state when switching tasks (was sanitize CPU-state when switching from virtual-8086 mode to other task)
H. Peter Anvin wrote:
> On 12/31/2013 11:21 AM, Konrad Rzeszutek Wilk wrote:
>>
>> So, I am wondering if this is related to " x86/fpu: CR0.TS should
>> be set before trap into PV guest's #NM exception handle" which
>> does have a similar pattern - you do enough of the task switches
>> and the FPU is screwed.
>>
>> See
>> http://mid.gmane.org/1383720072-6242-1-git-send-email-gaoyang@taobao.com
>>
>>
>> (I thought there was a thread about this on LKML too but I can't
>> find it).
>
> That would be a bug in Xen, so I guess you're surmising a similar
> bug in VirtualBox?

Not sure about that yet, but the whole thing gets even funnier the
longer I play with it. Here is some more information from my latest
tests:

* Although first observed with virtual-8086 mode, the bug is not
  specific to virtual-8086 mode. It can also be triggered with normal
  x86 userspace code, even with better reproducibility.

* It seems that when changing the FPU control word with "fstcw" just
  before exit of the process, another process can suffer when doing
  __do_switch.

* By having two rogue processes write data to each other via a
  socket, the time and code position of the OOPS can be influenced.

* When mmap_min_addr is deactivated, the NULL dereferences during
  task switch are exploitable, but I did not get full ring-0 code
  execution yet. Putting EIP into the NULL segment seems to have
  failed, perhaps because of a wrong RPL? I hope to fix that during
  the next days.

You can find the new, improved test code at [1].

hd

[1] http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/FpuStateTaskSwitchOops.c

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
Re: Sanitize CPU-state when switching from virtual-8086 mode to other task
H. Peter Anvin wrote:
> On 12/29/2013 12:44 PM, halfdog wrote:
>> H. Peter Anvin wrote:
>>> On 12/28/2013 02:02 PM, halfdog wrote:
>>>> It seems that missing CPU-state sanitation during task
>>>> switching triggers kernel-panic. This might be related to
>>>> unhandled FPU-errors. See [1] for POC and serial console log
>>>> of OOPs. Due to missing real 32-bit x86-hardware it is not
>>>> clear, if this issue might be related to subtle differences in
>>>> virtual-8086 mode handling when inside a virtualbox guest.
>>>>
>>
>>> This oops happens inside the guest? Either way, I would be
>>> *very* skeptical of Virtualbox in this case.
>>
>>> You can run a 32-bit kernel on 64-bit hardware, you know...
>>
>> I know, but hardware was occupied with long-running simulation.
>>
>> With the initial POC, there might be a timing issue involved, with
>> different process layout, exception does not occur in swith_to but
>> sometimes on other locations.
>>
>> I created a new random-code testcase [1] , which works around that
>> problem. When booted a Debian initrd and tried id, OOPSes are
>> fired like wild but at least system does not lock up immediately.
>>
>
> Still in VirtualBox?

Yes, again. After comparing the results from the initrd on real
hardware with those from VBox, I am starting to understand the timing
problem involved and why timing in VBox is different: the test
program usually OOPSes when it touches the FPU multiple times.
Otherwise, when it is terminated before the second FPU interaction,
the OOPS happens on the next invocation, which stumbles over the
invalid CPU state left by the prior invocation.

With the improved code, I can rather reliably bring the CPU into that
state, so that the next process that is invoked and touches FPU/MMX
state is OOPSed. I am currently searching for SUID binaries and
daemons running with UID=0 that might show an interesting reaction to
that event, but so far only at the DoS level: e.g. after running the
V2 test program once and then connecting via SSH, this currently
kills the ssh daemon nicely.

It seems that the machine lockup occurs when e.g. the switch to the
idle task happens at exactly the right moment. I currently cannot
trigger that on real hardware, but I am still working on it.

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
Re: Sanitize CPU-state when switching from virtual-8086 mode to other task
H. Peter Anvin wrote:
> On 12/28/2013 02:02 PM, halfdog wrote:
>> It seems that missing CPU-state sanitation during task switching
>> triggers kernel-panic. This might be related to unhandled
>> FPU-errors. See [1] for POC and serial console log of OOPs. Due
>> to missing real 32-bit x86-hardware it is not clear, if this
>> issue might be related to subtle differences in virtual-8086
>> mode handling when inside a virtualbox guest.
>>
>
> This oops happens inside the guest? Either way, I would be *very*
> skeptical of Virtualbox in this case.
>
> You can run a 32-bit kernel on 64-bit hardware, you know...

I know, but the hardware was occupied with a long-running simulation.

With the initial POC, there might be a timing issue involved: with a
different process layout, the exception does not occur in switch_to
but sometimes at other locations.

I created a new random-code testcase [1], which works around that
problem. When I booted a Debian initrd and tried id, OOPSes were
fired like wild, but at least the system did not lock up immediately.

hd

[1] http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/Virtual86RandomCode.c

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
Sanitize CPU-state when switching from virtual-8086 mode to other task
It seems that missing CPU-state sanitization during task switching
triggers a kernel panic. This might be related to unhandled FPU errors.
See [1] for a POC and the serial console log of the oops. Since I have no
real 32-bit x86 hardware available, it is not clear whether this issue is
related to subtle differences in virtual-8086 mode handling inside a
VirtualBox guest.

hd

[1] http://www.halfdog.net/Security/2013/Vm86SyscallTaskSwitchKernelPanic/

[ 348.270712] fpu exception: [#1]
[ 348.270763] Modules linked in: nfnetlink_log nfnetlink xt_multiport xt_hashlimit xt_tcpudp ipt_ULOG xt_LOG xt_conntrack iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables snd_pcm snd_page_alloc snd_timer snd parport_pc soundcore microcode psmouse serio_raw pcspkr evdev parport ac battery button i2c_piix4 i2c_core ext4 crc16 mbcache jbd2 sg sr_mod sd_mod cdrom crc_t10dif ata_generic ata_piix mptspi scsi_transport_spi mptscsih libata mptbase pcnet32 mii scsi_mod
[ 348.270763] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 3.11-2-486 #1 Debian 3.11.10-1
[ 348.270763] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 348.270763] task: cf835400 ti: cf93 task.ti: cf84a000
[ 348.270763] EIP: 0060:[<c10013e0>] EFLAGS: 00010002 CPU: 0
[ 348.270763] EIP is at __switch_to+0x190/0x300
[ 348.270763] EAX: cd2eec00 EBX: cd2eec00 ECX: EDX:
[ 348.270763] ESI: cf835400 EDI: 0001 EBP: cd2eedf8 ESP: cf931a40
[ 348.270763] DS: 007b ES: 007b FS: GS: 00e0 SS: 0068
[ 348.270763] CR0: 80050033 CR2: b76997e0 CR3: 0d11a000 CR4: 0690
[ 348.270763] Stack:
[ 348.270763] 4a6ef7ab ccee9c80 ccee9900 cf835400 c13978cf cd2eec00 00200082 c15de480
[ 348.270763] 0018 67bf6d70 cf93 cd2eec00 1625d3df 0051 cd2eec2c c1056e15
[ 348.270763] 00200086 000a cf931a90 c1006cc8 00393f1e 5d3e5d0f 0040
[ 348.270763] Call Trace:
[ 348.270763] [<c13978cf>] ? __schedule+0x1ef/0x510
[ 348.270763] [<c1056e15>] ? update_curr+0x95/0x140
[ 348.270763] [<c1006cc8>] ? sched_clock+0x8/0x10
[ 348.270763] [<c13973d5>] ? schedule_hrtimeout_range_clock+0x165/0x180
[ 348.270763] [<c1044e9f>] ? __flush_work+0xbf/0x100
[ 348.270763] [<d0a4fa59>] ? nf_nat_get_offset+0x39/0x60 [nf_nat]
[ 348.270763] [<d0a68df7>] ? tcp_packet+0x637/0xf40 [nf_conntrack]
[ 348.270763] [<c124932c>] ? tty_write_room+0xc/0x20
[ 348.270763] [<c1246fb9>] ? n_tty_poll+0x189/0x1a0
[ 348.270763] [<c13973ff>] ? schedule_hrtimeout_range+0xf/0x20
[ 348.270763] [<c11093a0>] ? poll_schedule_timeout+0x20/0x40
[ 348.270763] [<c1109c77>] ? do_select+0x537/0x5f0
[ 348.270763] [<c11094d0>] ? poll_select_copy_remaining+0x110/0x110
[ 348.270763] [<c11094d0>] ? poll_select_copy_remaining+0x110/0x110
[ 348.270763] [<c11094d0>] ? poll_select_copy_remaining+0x110/0x110
[ 348.270763] [<c11094d0>] ? poll_select_copy_remaining+0x110/0x110
[ 348.270763] [<c12f688d>] ? nf_iterate+0x7d/0x90
[ 348.270763] [<c1067e6c>] ? __getnstimeofday+0x2c/0x110
[ 348.270763] [<c133f7f2>] ? bictcp_cong_avoid+0x12/0x4a0
[ 348.270763] [<c1067f55>] ? getnstimeofday+0x5/0x20
[ 348.270763] [<c131116b>] ? tcp_ack+0x82b/0xdc0
[ 348.270763] [<c10353a0>] ? local_bh_enable+0x70/0x80
[ 348.270763] [<c1300301>] ? ip_finish_output+0x151/0x350
[ 348.270763] [<c10c612a>] ? put_compound_page+0xa/0xe0
[ 348.270763] [<c1311b07>] ? tcp_rcv_established+0xf7/0x7a0
[ 348.270763] [<c12c1edc>] ? sk_reset_timer+0xc/0x20
[ 348.270763] [<c131a94e>] ? tcp_v4_do_rcv+0x15e/0x3b0
[ 348.270763] [<c12c3558>] ? release_sock+0x88/0xf0
[ 348.270763] [<c13088d7>] ? tcp_sendmsg+0x177/0xc60
[ 348.270763] [<c1056e15>] ? update_curr+0x95/0x140
[ 348.270763] [<c1109e5c>] ? core_sys_select+0x12c/0x220
[ 348.270763] [<c12beee1>] ? sock_aio_write+0xe1/0x110
[ 348.270763] [<c10f9cda>] ? do_sync_write+0x6a/0xa0
[ 348.270763] [<c112b673>] ? fsnotify+0x203/0x2f0
[ 348.270763] [<c1109fdf>] ? SyS_select+0x8f/0xc0
[ 348.270763] [<c100aca2>] ? syscall_trace_leave+0xa2/0xb0
[ 348.270763] [<c1398fef>] ? syscall_call+0x7/0xb
[ 348.270763] Code: e9 1d ff ff ff 8d b6 00 00 00 00 b8 7d 00 00 00 e8 36 b8 00 00 84 c0 0f 85 e1 fe ff ff 0f 06 8d 74 26 00 e9 d6 fe ff ff 8d 76 00 <0f> 77 db 83 4c 02 00 00 89 f6 8d b6 00 00 00 00 eb 66 b8 ff ff
[ 348.270763] EIP: [<c10013e0>] __switch_to+0x190/0x300 SS:ESP 0068:cf931a40
[ 348.270763] ---[ end trace c3836805b501f815 ]---
[ 348.274764] ------------[ cut here ]------------
[ 348.278424] kernel BUG at /build/linux-tAcKXn/linux-3.11.10/kernel/exit.c:870!
[ 348.278764] invalid opcode: [#2]
[ 348.278764] Modules linked in: nfnetlink_log nfnetlink xt_multiport xt_hashlimit xt_tcpudp ipt_ULOG xt_LOG xt_conntrack iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables snd_pcm snd_page_alloc snd_timer snd parport_pc soundcore microcode psmouse serio_raw pcspkr evdev parport ac battery button i2c_piix4 i2c_core ext4 crc16 mbcache jbd2 sg sr_mod sd_mod cdrom crc_t10dif ata_generic ata_piix mptspi scsi_transport_spi mptscsih libata mptbase pcnet32 mii scsi_mod
[ 348.278764] CPU: 0 PID: 2220 Comm: sshd Tainted: G D 3.11-2-486 #1 Debian 3.11.10-1
[ 348.278764] Hardware
Re: [PATCH] exec: do not leave bprm->interp on stack
Kees Cook wrote:
> On Tue, Nov 6, 2012 at 12:10 AM, P J P <ppan...@redhat.com> wrote:
>>
>> Hello Kees, Al,
>>
>> +-- On Sat, 27 Oct 2012, Kees Cook wrote --+
>> | If we change binfmt_script to not make a recursive call, then we still
>> | need to keep the interp change somewhere off the stack. I still think
>> | my patchset is the least bad.
>> |
>> | Al, do you have something else in mind?
>>
>> Guys, are there any updates further?
>>
>> Al, what's your take on the *rare* extra call to request_module?
>
> Without any other feedback, I'd like to use my minimal allocation
> patch, since it fixes the problem and doesn't change any of the
> semantics of how/when loading happens.

As a first step, I think that we can go with Kees' (nice/small/simple)
patch. In the long run, exec should be reworked. Not only is interp
modified, credentials are also set, e.g. when using "ping" as
interpreter. With opaque error handling and retry logic, this might be
the beginning of a future local root exploit (I tried, but have not
managed it yet).

Also, a remark from Prasad Pandit did not make it to the list (or at
least I missed the replies):

> Yesterday, while testing Kees' patch I was reading through
> execve(2) manual which says: path name must be a valid executable
> which is NOT a script.
>
> $ man execve
> ...
> Interpreter scripts
> An interpreter script is a text file that has execute permission
> enabled and whose first line is of the form:
>
> #! interpreter [optional-arg]
>
> The interpreter must be a valid path name for an executable which
> is not itself a script.

Does someone know what POSIX says about that? I guess that interp
recursion might have some use cases: a script uses interp, but interp was
wrapped by the admin or distribution folks into another script to fix
something, e.g. to pass an additional argument.

hd

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
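The point about keeping the interp change "somewhere off the stack" can be illustrated in user space. What follows is only a minimal sketch under stated assumptions, not the patch discussed in this thread: `struct binprm_sketch`, `change_interp()`, `release_interp()` and `handler_sketch()` are hypothetical names, and `strdup`/`free` stand in for kernel allocation.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two relevant fields of struct linux_binprm. */
struct binprm_sketch {
	const char *filename;	/* name handed to execve, owned by the caller */
	const char *interp;	/* initially aliases filename */
};

/* Replace interp with a heap copy so that it never points into a stack frame. */
int change_interp(struct binprm_sketch *bprm, const char *interp)
{
	/* Copy first: the new name may alias the current bprm->interp. */
	char *copy = strdup(interp);

	if (copy == NULL)
		return -ENOMEM;
	/* Free an earlier heap copy, but never the caller-owned filename. */
	if (bprm->interp != bprm->filename)
		free((void *)bprm->interp);
	bprm->interp = copy;
	return 0;
}

/* Cleanup on exec success or failure, comparable to how argv is cleaned up. */
void release_interp(struct binprm_sketch *bprm)
{
	if (bprm->interp != bprm->filename)
		free((void *)bprm->interp);
	bprm->interp = bprm->filename;
}

/* Hypothetical handler that parses a name into a local buffer. */
int handler_sketch(struct binprm_sketch *bprm, const char *parsed)
{
	char local[128];

	strncpy(local, parsed, sizeof(local) - 1);
	local[sizeof(local) - 1] = '\0';
	/* Storing "local" itself in bprm->interp would dangle after return. */
	return change_interp(bprm, local);
}
```

With this ownership rule, a retry after a failed handler always sees either the original filename or a valid heap string, regardless of which stack frames have already been unwound.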
Re: [PATCH] binfmt_script: do not leave interp on stack
Kees Cook wrote:
> On Thu, Oct 11, 2012 at 07:32:40PM -0700, Kees Cook wrote:
>> +	/*
>> +	 * Since bprm is already modified, we cannot continue if the the
>> +	 * handlers for starting the new interpreter have failed.
>> +	 * Make sure that we do not return -ENOEXEC, as that would
>> +	 * allow searching for handlers to continue.
>> +	 */
>> +	if (retval == -ENOEXEC)
>> +		retval = -EINVAL;
>
> After looking at this some more, I wonder if this should be -ELOOP
> instead? Or maybe that should happen if/when the recursion depth problem is
> fixed?
>
> This is much more obvious, instead of "Invalid argument":
> $ ./dotest.sh
> file-AAAfile-: bad interpreter: Too many levels of symbolic links

In my opinion, a different, more specific error code is nice, but when
it is not self-explanatory, it needs to be documented to avoid
confusion. I do not know what the most accepted way to change syscall
return value semantics would be: reinterpret existing codes or add new
ones. According to the man pages, many codes already have a meaning and
only some could be reinterpreted in that way:

E2BIG: The total number of bytes in the environment (envp) and argument
list (argv) is too large. (not perfect, because usually only associated
with mem/file size issues)

ELOOP: Too many symbolic links were encountered in resolving filename or
the name of a script or ELF interpreter. (currently no distinction from
real symlink problems)

EMFILE: The process has the maximum number of files open. (too generic?)

This one already has a meaning, but only for ELF, not for scripts (but
since a script might also call ELF in the end, the user cannot know):

EINVAL: An ELF executable had more than one PT_INTERP segment (i.e.,
tried to name more than one interpreter).

These are not yet used by execve, but I think it is a bad idea to add
them, since some programs might be confused by an unexpected error code:

ELIBMAX: Attempting to link in too many shared libraries (not a really
good match)

EMLINK: Too many links (somewhat generic, do not know if usually used
another way).

It is strange: going by its current description, this one suits best.
The only reason why we want to get rid of it is that it triggers module
reloading and another round of execution:

ENOEXEC: An executable is not in a recognized format, is for the wrong
architecture, or has some other format error that means it cannot be
executed.

Perhaps it would be better to continue returning ENOEXEC from the
syscall in that case, but change the logic for module reloading (use
some other return value meaning in the binfmt handlers, internal to the
kernel)?

> More importantly, I also wonder if interp handling to just be
> changed to be an allocation that needs to be cleaned up, as done with
> argv?

You mean like an allocation on the new process' growing stack? That
would be cleaned up automatically if something goes wrong during exec.

> Right now interp just points to the filename argument handed to
> do_execve. Especially since it looks like binfmt_misc is vulnerable
> to this as well, since it runs the risk of getting -ENOEXEC from
> search_binary_handler, leaving bprm->interp pointing into the stack,
> only to get it recalled after module loading attempts:
>
> static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
> {
> ...
>	char iname[BINPRM_BUF_SIZE];
> ...
>	bprm->interp = iname;	/* for binfmt_script */
> ...
>	retval = search_binary_handler (bprm, regs);
>	if (retval < 0)
>		goto _error;
> ...
> _ret:
>	return retval;
> _error:
>	if (fd_binary > 0)
>		sys_close(fd_binary);
>	bprm->interp_flags = 0;
>	bprm->interp_data = 0;
>	goto _ret;
> }

Correct. I hope the patch will be a formality as soon as the discussion
on this one is done.

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
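The idea of keeping ENOEXEC for user space while using a different value between the handlers can be sketched in user space as follows. Everything here is an assumption made for illustration: `BINFMT_NOT_MINE`, the two toy handlers and `search_handlers()` are hypothetical and do not correspond to existing kernel interfaces.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical internal value meaning "not my format, try the next handler". */
#define BINFMT_NOT_MINE 1

typedef int (*binfmt_handler)(const char *buf);

/* Toy script handler: claims "#!" files; only "/bin/sh" counts as loadable. */
int script_handler(const char *buf)
{
	if (strncmp(buf, "#!", 2) != 0)
		return BINFMT_NOT_MINE;
	/* From here on the handler owns the file, so a failure is final. */
	if (strcmp(buf + 2, "/bin/sh") != 0)
		return -ENOEXEC;
	return 0;
}

/* Toy ELF handler: only checks the magic bytes. */
int elf_handler(const char *buf)
{
	if (strncmp(buf, "\177ELF", 4) != 0)
		return BINFMT_NOT_MINE;
	return 0;
}

/*
 * Walk the handlers. Only BINFMT_NOT_MINE continues the search; -ENOEXEC
 * from a handler that claimed the file is passed through unchanged.
 * *tried reports how many handlers were called.
 */
int search_handlers(const binfmt_handler *handlers, size_t count,
		    const char *buf, size_t *tried)
{
	size_t i;

	*tried = 0;
	for (i = 0; i < count; i++) {
		int ret = handlers[i](buf);

		(*tried)++;
		if (ret != BINFMT_NOT_MINE)
			return ret;
	}
	return -ENOEXEC;	/* no handler recognised the format */
}
```

In this sketch user space sees ENOEXEC both for an unknown format and for a script whose interpreter failed to load, but only the first case walks on to further handlers, which is the point where the kernel would also consider loading modules.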
Re: Updated: [PATCH] hardening: add PROT_FINAL prot flag to mmap/mprotect
PaX Team wrote:
> On 7 Oct 2012 at 9:43, Ard Biesheuvel wrote:
>
>> 2012/10/6 PaX Team <pagee...@freemail.hu>:
>>> sadly, this is not true at all, for multiple reasons:
>>>
>> .. snip ...
>>>
>>> cheers,
>>> PaX Team
>>>
>>
>> So can I summarize your position as that there is no merit at all in
>> the ability to inhibit future permissions of existing mappings?
>
> i believe i answered this in the previous mail already:
>
>> there's certainly a point (i've been doing it for 12 years now), but to
>> make an mprotect flag into an actual security feature, it had better pass
>> simple tests, such as non-circumventability. any method relying on
>> userland playing nice is already suspect of being the wrong way and right
>> now i don't see how PROT_FINAL could be used for actual security.
>
> so if PROT_FINAL wants to be useful, you'd have to present a case of
> how it does something useful *while* an exploited userland cannot get
> around it. in fact i think i already told you that presenting your own
> use case in more detail (read: source code, policy, etc) would be a
> great step in 'selling the idea'.

I like the idea of final memory protection, but I guess it is quite
tricky to make it non-circumventable with respect to reading or
modification. As for blocking code execution, this feature makes it
harder but does not prevent it: an attacker who can already execute code
(e.g. via ROP) still has ways to execute more, e.g. load more stack data
and stay in ROP, map new segments, write to a file and map it r-x, or
exec the new file. Per-application policies to prevent that could,
however, be simpler with PROT_FINAL than without it.

From my point of view, protection against reading/modification only
makes sense if the current VM and all clones stay protected, e.g.
against reads via /proc/$$/mem, against the process ptrace-attaching to
itself or its clones, and against core dumps. Otherwise, except for the
last point, the process could fork, the parent could modify the child
via ptrace or the child's /proc mem file, and then the parent just waits
or commits suicide. If the memory content or the modification of a
running process matters that much for the success of an attack, an
attacker may well make that effort.

But if PROT_FINAL could be made that solid, it might be quite
interesting, especially with some procfs setting such as
final-modification-action: ignore (do not check final, e.g. for
debugging), log (log and fail), kill (get rid of the process
immediately). With a kernel-wide default and, e.g., per-process policy
changes restricted to uid 0, that would still allow all debugging.

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
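For context on what "final" would change, the following user-space sketch shows today's behaviour with the standard calls only: a mapping that was restricted to read-only can simply be reopened for writing by another mprotect(). It deliberately does not use PROT_FINAL, which is only a proposal in this thread; the function name `permissions_can_be_reopened()` is made up for the example.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a mapping that was made read-only could be made writable
 * again, 0 if the second mprotect() was refused, -1 on setup failure.
 */
int permissions_can_be_reopened(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int reopened;

	if (page <= 0)
		return -1;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 42;

	/* Drop write permission, as a hardened program might do after setup. */
	if (mprotect(p, (size_t)page, PROT_READ) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	/* Nothing in the standard API pins the mapping to PROT_READ. */
	reopened = (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == 0);
	if (reopened)
		p[0] = 43;	/* writing works again */

	munmap(p, (size_t)page);
	return reopened;
}
```

On a stock Linux system the function is expected to return 1; hardened kernels or security modules may behave differently. Code that already runs inside the process can issue the second call itself, which is why the discussion above focuses on circumvention.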
Re: [PATCH v2] Fix kernel stack data disclosure in binfmt_script during execve
Randy Dunlap wrote: > On 09/20/2012 09:05 AM, halfdog wrote: > >> halfdog wrote: >> >> Now this is the updated and also tested patch (vs. linux-3.5.4 kernel) to fix >> https://bugzilla.kernel.org/show_bug.cgi?id=46841 . See also >> http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ >> This patch adresses the stack data disclosure but does not deal with the >> excessive recursion (to be handled in separate patch if needed). >> >> --- fs/binfmt_script.c 2012-09-14 22:28:08.0 + >> +++ fs/binfmt_script.c 2012-09-20 16:01:58.951942355 + > > > Incorrect diff/patch format for kernel patches. > It should be apply-able by using 'patch -p1'. > ... OK, formatting changed: * patch depth level added * comment style changed * goto-s now on own line Has any one looked at the logic apart from the styling? Are there any flaws? > Oh, the patch is not signed off. Yes. Anyone who likes it can sign it off or even resubmit it in his name. --- linux-3.5.4/fs/binfmt_script.c 2012-09-14 22:28:08.0 + +++ linux-3.5.4/fs/binfmt_script.c 2012-09-23 02:28:39.905123091 + @@ -14,12 +14,25 @@ #include #include +/* + * Check if this handler is suitable to load the "binary" identified + * by first BINPRM_BUF_SIZE bytes in bprm->buf. + * returns: -ENOEXEC if this handler is not suitable for that type + * of binary. In that case, the handler must not modify any of the + * data associated with bprm. + * Any error if the binary should have been handled by this loader + * but handling failed. In that case. FIXME: be defensive? also + * kill bprm->mm or bprm->file also to make it impossible, that + * upper search_binary_handler can continue handling? + * 0 (OK) otherwise, the new executable is ready in bprm->mm. 
+ */ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs) { const char *i_arg, *i_name; char *cp; struct file *file; - char interp[BINPRM_BUF_SIZE]; + char bprm_buf_copy[BINPRM_BUF_SIZE]; + const char *bprm_old_interp_name; int retval; if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') || @@ -30,25 +43,32 @@ static int load_script(struct linux_binp * Sorta complicated, but hopefully it will work. -TYT */ - bprm->recursion_depth++; - allow_write_access(bprm->file); - fput(bprm->file); - bprm->file = NULL; + /* +* Keep bprm unchanged until we known, that this is a script +* to be handled by this loader. Copy bprm->buf for sure, +* otherwise returning -ENOEXEC will make other handlers see +* modified data. (hd) +*/ + memcpy(bprm_buf_copy, bprm->buf, BINPRM_BUF_SIZE); - bprm->buf[BINPRM_BUF_SIZE - 1] = '\0'; - if ((cp = strchr(bprm->buf, '\n')) == NULL) - cp = bprm->buf+BINPRM_BUF_SIZE-1; + bprm_buf_copy[BINPRM_BUF_SIZE - 1]='\0'; + if ((cp = strchr(bprm_buf_copy, '\n')) == NULL) + cp = bprm_buf_copy+BINPRM_BUF_SIZE-1; *cp = '\0'; - while (cp > bprm->buf) { + while (cp > bprm_buf_copy) { cp--; if ((*cp == ' ') || (*cp == '\t')) *cp = '\0'; else break; } - for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++); + for (cp = bprm_buf_copy+2; (*cp == ' ') || (*cp == '\t'); cp++); if (*cp == '\0') - return -ENOEXEC; /* No interpreter name found */ + /* +* No interpreter name found. No problem to let other handlers +* retry, we did not change anything. +*/ + return -ENOEXEC; i_name = cp; i_arg = NULL; for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++) @@ -57,45 +77,94 @@ static int load_script(struct linux_binp *cp++ = '\0'; if (*cp) i_arg = cp; - strcpy (interp, i_name); + + /* +* So this is our point-of-no-return: modification of bprm +* will be irreversible, so if we fail to setup execution +* using the new interpreter name (i_name), we have to make +* sure, that no other handler tries again. 
(hd) +*/ + /* * OK, we've parsed out the interpreter name and * (optional) argument. * Splice in (1) the interpreter's name for argv[0] -* (2) (optional) argument to interpreter -* (3) filename of shell script (replace argv[0]) +* (2) (optional) argument to interpreter +* (3) filename of shell script (replace argv[0]) * * This is done in reverse order, because of how the * user environment and a
Re: [PATCH v2] Fix kernel stack data disclosure in binfmt_script during execve
Randy Dunlap wrote:
> On 09/20/2012 09:05 AM, halfdog wrote:
>> halfdog wrote:
>>> Now this is the updated and also tested patch (vs. linux-3.5.4
>>> kernel) to fix https://bugzilla.kernel.org/show_bug.cgi?id=46841 .
>>>
>>> See also
>>> http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
>>>
>>> This patch addresses the stack data disclosure but does not deal
>>> with the excessive recursion (to be handled in separate patch if
>>> needed).
>>>
>>> --- fs/binfmt_script.c 2012-09-14 22:28:08.0 +
>>> +++ fs/binfmt_script.c 2012-09-20 16:01:58.951942355 +
>
> Incorrect diff/patch format for kernel patches. It should be
> apply-able by using 'patch -p1'.
> ...

OK, formatting changed:

* patch depth level added
* comment style changed
* goto-s now on own line

Has anyone looked at the logic apart from the styling? Are there any
flaws?

> Oh, the patch is not signed off.

Yes. Anyone who likes it can sign it off or even resubmit it in his
name.

--- linux-3.5.4/fs/binfmt_script.c 2012-09-14 22:28:08.0 +
+++ linux-3.5.4/fs/binfmt_script.c 2012-09-23 02:28:39.905123091 +
@@ -14,12 +14,25 @@
 #include <linux/err.h>
 #include <linux/fs.h>

+/*
+ * Check if this handler is suitable to load the "binary" identified
+ * by first BINPRM_BUF_SIZE bytes in bprm->buf.
+ * returns: -ENOEXEC if this handler is not suitable for that type
+ * of binary. In that case, the handler must not modify any of the
+ * data associated with bprm.
+ * Any error if the binary should have been handled by this loader
+ * but handling failed. In that case. FIXME: be defensive? also
+ * kill bprm->mm or bprm->file also to make it impossible, that
+ * upper search_binary_handler can continue handling?
+ * 0 (OK) otherwise, the new executable is ready in bprm->mm.
+ */
 static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
 {
 	const char *i_arg, *i_name;
 	char *cp;
 	struct file *file;
-	char interp[BINPRM_BUF_SIZE];
+	char bprm_buf_copy[BINPRM_BUF_SIZE];
+	const char *bprm_old_interp_name;
 	int retval;

 	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
@@ -30,25 +43,32 @@ static int load_script(struct linux_binp
 	 * Sorta complicated, but hopefully it will work. -TYT
 	 */

-	bprm->recursion_depth++;
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
+	/*
+	 * Keep bprm unchanged until we know, that this is a script
+	 * to be handled by this loader. Copy bprm->buf for sure,
+	 * otherwise returning -ENOEXEC will make other handlers see
+	 * modified data. (hd)
+	 */
+	memcpy(bprm_buf_copy, bprm->buf, BINPRM_BUF_SIZE);

-	bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
-	if ((cp = strchr(bprm->buf, '\n')) == NULL)
-		cp = bprm->buf+BINPRM_BUF_SIZE-1;
+	bprm_buf_copy[BINPRM_BUF_SIZE - 1]='\0';
+	if ((cp = strchr(bprm_buf_copy, '\n')) == NULL)
+		cp = bprm_buf_copy+BINPRM_BUF_SIZE-1;
 	*cp = '\0';
-	while (cp > bprm->buf) {
+	while (cp > bprm_buf_copy) {
 		cp--;
 		if ((*cp == ' ') || (*cp == '\t'))
 			*cp = '\0';
 		else
 			break;
 	}
-	for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
+	for (cp = bprm_buf_copy+2; (*cp == ' ') || (*cp == '\t'); cp++);
 	if (*cp == '\0')
-		return -ENOEXEC; /* No interpreter name found */
+		/*
+		 * No interpreter name found. No problem to let other handlers
+		 * retry, we did not change anything.
+		 */
+		return -ENOEXEC;
 	i_name = cp;
 	i_arg = NULL;
 	for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
@@ -57,45 +77,94 @@ static int load_script(struct linux_binp
 		*cp++ = '\0';
 	if (*cp)
 		i_arg = cp;
-	strcpy (interp, i_name);
+
+	/*
+	 * So this is our point-of-no-return: modification of bprm
+	 * will be irreversible, so if we fail to setup execution
+	 * using the new interpreter name (i_name), we have to make
+	 * sure, that no other handler tries again. (hd)
+	 */
+
 	/*
 	 * OK, we've parsed out the interpreter name and
 	 * (optional) argument.
 	 * Splice in (1) the interpreter's name for argv[0]
-	 *           (2) (optional) argument to interpreter
-	 *           (3) filename of shell script (replace argv[0])
+	 *           (2) (optional) argument to interpreter
+	 *           (3) filename of shell script (replace argv[0])
 	 *
 	 * This is done in reverse order, because of how the
 	 * user environment and arguments are stored.
 	 */
+
+	/*
+	 * Ugly: we store pointer to local stack frame in bprm,
+	 * so make sure to clear this up before returning.
+	 */
+	bprm_old_interp_name = bprm->interp
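
The "Ugly: we store pointer to local stack frame in bprm" comment in the patch describes a save-and-restore pattern. The patch text is cut off at that point, so what follows is only a minimal userspace sketch of the idea, assuming the intent is to put the old pointer back before the function returns. `struct interp_model`, `run_with_local_interp` and `interp_length` are hypothetical names for illustration, not kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the single bprm field of interest. */
struct interp_model {
	const char *interp;
};

/* Example callee: uses p->interp only while the caller's frame is alive. */
static int interp_length(struct interp_model *p)
{
	return (int)strlen(p->interp);
}

/*
 * Temporarily point p->interp at a buffer in this stack frame, call
 * next(), then restore the old pointer so that nothing refers to the
 * frame once this function has returned.
 */
static int run_with_local_interp(struct interp_model *p, const char *name,
				 int (*next)(struct interp_model *))
{
	char local[64];
	const char *old_interp = p->interp;	/* save */
	int ret;

	strncpy(local, name, sizeof(local) - 1);
	local[sizeof(local) - 1] = '\0';

	p->interp = local;			/* points into this frame */
	ret = next(p);
	p->interp = old_interp;			/* restore before returning */
	return ret;
}
```

Calling `run_with_local_interp(&p, "/bin/sh", interp_length)` lets the callee see the local name, while `p.interp` holds its previous value again afterwards. The restore has to happen on every return path, which is presumably why the patch notes mention moving the goto-s onto their own lines.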
[PATCH] Fix kernel stack data disclosure in binfmt_script during execve
halfdog wrote:
> Kirill A. Shutemov wrote:
>> On Wed, Aug 22, 2012 at 09:49:35PM +0000, halfdog wrote:
>>> Got a hint via IRC, that I should not send patch idea for review
>>> to "generic" list, but to maintainers and last (or relevant)
>>> committers of code.
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=bf2a9a39639b8b51377905397a5005f444e9a892
>>>
>>> ...
>>> halfdog wrote:
>>>> halfdog wrote:
>>>>> I'm searching for a patch for linux kernel stack disclosure
>>>>> in binfmt_script with crafted interpreter names when
>>>>> CONFIG_MODULES is active (see [1]).
>>>>
>>>> Please disregard my previous proposal [2], since it did not
>>>> address the problem directly (referencing local stack frame
>>>> data from bprm structure) but worked around it. I suspect,
>>>> that this could increase probability to reintroduce similar
>>>> bugs.
>>>>
>>>> Opinions on (untested sketch for) second solution: Could
>>>> someone look at the source code comments and changes in patch
>>>> to judge, if this is going in the right direction?
>>>>
>>>> Explanation of patch: Since load_script will start to
>>>> irreversibly change bprm structures at some point (using stack
>>>> local data was one of those changes), try to delay this point.
>>>> Run checks if load_script could be the right handler, if not
>>>> give other binfmt handlers the chance to do so.
>>>>
>>>> If binfmt_script is the right one, try to load the interpreter
>>>> (causing bprm modification), if failing make sure that no
>>>> other binfmt handler has the chance to continue on the now
>>>> modified bprm data.
>>>>
>>>> CAVEAT: This assumes, that if binfmt_script could handle the
>>>> load, that it would be the one and only binfmt with that
>>>> ability, so no other one, e.g. binfmt_misc should have the
>>>> chance to do so. If this assumption is wrong, leaving
>>>> binfmt_script would have to rollback all bprm changes (e.g.
>>>> restore old credentials).
>>>>
>>>> [1] http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
>>>> [2] http://lkml.org/lkml/2012/8/18/75
>>
>> What about (untested):
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 574cf4d..ef13850 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1438,7 +1438,8 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
>>  		}
>>  		read_unlock(&binfmt_lock);
>>  #ifdef CONFIG_MODULES
>> -		if (retval != -ENOEXEC || bprm->mm == NULL) {
>> +		if (retval != -ENOEXEC || bprm->mm == NULL ||
>> +		    bprm->recursion_depth > BINPRM_MAX_RECURSION) {
>>  			break;
>>  		} else {
>>  #define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))
>
> From my understanding, this patch should not fix the problem, since
> recursion depth is reset back to old value after call of binfmt
> handler. This is done, so that fs/exec does not have to trust all
> binfmts to reset the depth by themselves when leaving.
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=fs/exec.c;h=da27b91ff1e8cbe87d0fe42aa5d39513e6a9deeb;hb=HEAD
>
> 1408 read_unlock(&binfmt_lock);
> 1409 retval = fn(bprm, regs);
> 1410 /*
> 1411  * Restore the depth counter to its starting value
> 1412  * in this call, so we don't have to rely on every
> 1413  * load_binary function to restore it on return.
> 1414  */
> 1415 bprm->recursion_depth = depth;
>
> I guess, the problem is, that recursion_depth usually not only limits
> the depth, but also the maximal number of binfmt_xxx calls. That's
> why the use of local stack-frame data in bprm does not matter: there
> is no going up the stack AND using bprm->interpreter, the last error
> terminates the search.
>
> In the POC, search is not terminated because of ENOEXEC when max depth
> reached and due to special filename, mod-loader triggers also (about 30
> times? I do not know, if that could be a problem also, interfering with
> other module loads. Usually non-root users cannot trigger rapid module
> loads easily).
What about (untested):

Now this is the updated and also tested patch (vs. linux-3.5.4 kernel)
to fix https://bugzilla.kernel.org/show_bug.cgi?id=46841 .

See also
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

This patch addresses the stack data disclosure but does not deal with
the excessive recursion (to be handled in separate patch if needed).

--- fs/binfmt_script.c 2012-09-14 22:28:08.0 +
+++ fs/binfmt_script.c 2012-09-20 16:01:58.951942355 +
@@ -14,12 +14,24 @@
 #include <linux/err.h>
 #include <linux/fs.h>

+/** Check if this handler is suitable to load the "binary" identified
+ * by first BINPRM_BUF_SIZE bytes in bprm->buf.
+ * @returns -ENOEXEC if this handler is not suitable for that type
+ * of binary. In that case, the handler must not modify any of the
+ * data associated with bprm.
+ * Any error if the binary should have been handled by this loader
+ * but handling failed
Re: Software interrupt 0x8 guest crash from userspace: virtualbox emulation or guest kernel bug?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

halfdog wrote:
> I have observed a strange guest kernel crash in virtualbox and am
> currently trying to understand it. Since I have no real 32-bit
> Intel platform any more, I cannot verify that this crash would
> happen on native 32bit also, so perhaps someone could check that.
> ...

For the record: no checks needed any more, root cause of issue already
found.

- -- 
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAlBBrXEACgkQxFmThv7tq+6JHQCfaC1tCY2iLFcmoVwtA0UUZXpx
5TcAn1KhhpvgkaLGQ80AlErXzv6bN8SJ
=uJvv
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Search for patch for kernel stack data disclosure in binfmt_script during execve
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kirill A. Shutemov wrote:
> On Wed, Aug 22, 2012 at 09:49:35PM +0000, halfdog wrote:
>> Got a hint via IRC, that I should not send patch idea for review
>> to "generic" list, but to maintainers and last (or relevant)
>> committers of code.
>>
>> http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=bf2a9a39639b8b51377905397a5005f444e9a892
>>
>> ...
>> halfdog wrote:
>>> halfdog wrote:
>>>> I'm searching for a patch for linux kernel stack disclosure
>>>> in binfmt_script with crafted interpreter names when
>>>> CONFIG_MODULES is active (see [1]).
>>>
>>> Please disregard my previous proposal [2], since it did not
>>> address the problem directly (referencing local stack frame
>>> data from bprm structure) but worked around it. I suspect,
>>> that this could increase probability to reintroduce similar
>>> bugs.
>>>
>>> Opinions on (untested sketch for) second solution: Could
>>> someone look at the source code comments and changes in patch
>>> to judge, if this is going in the right direction?
>>>
>>> Explanation of patch: Since load_script will start to
>>> irreversibly change bprm structures at some point (using stack
>>> local data was one of those changes), try to delay this point.
>>> Run checks if load_script could be the right handler, if not
>>> give other binfmt handlers the chance to do so.
>>>
>>> If binfmt_script is the right one, try to load the interpreter
>>> (causing bprm modification), if failing make sure that no
>>> other binfmt handler has the chance to continue on the now
>>> modified bprm data.
>>>
>>> CAVEAT: This assumes, that if binfmt_script could handle the
>>> load, that it would be the one and only binfmt with that
>>> ability, so no other one, e.g. binfmt_misc should have the
>>> chance to do so. If this assumption is wrong, leaving
>>> binfmt_script would have to rollback all bprm changes (e.g.
>>> restore old credentials).
>>>
>>> [1] http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
>>> [2] http://lkml.org/lkml/2012/8/18/75
>
> What about (untested):
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 574cf4d..ef13850 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1438,7 +1438,8 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
>  		}
>  		read_unlock(&binfmt_lock);
>  #ifdef CONFIG_MODULES
> -		if (retval != -ENOEXEC || bprm->mm == NULL) {
> +		if (retval != -ENOEXEC || bprm->mm == NULL ||
> +		    bprm->recursion_depth > BINPRM_MAX_RECURSION) {
>  			break;
>  		} else {
>  #define printable(c) (((c)=='\t') || ((c)=='\n') || (0x20<=(c) && (c)<=0x7e))

- From my understanding, this patch should not fix the problem, since
recursion depth is reset back to old value after call of binfmt
handler. This is done, so that fs/exec does not have to trust all
binfmts to reset the depth by themselves when leaving.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=blob;f=fs/exec.c;h=da27b91ff1e8cbe87d0fe42aa5d39513e6a9deeb;hb=HEAD

1408 read_unlock(&binfmt_lock);
1409 retval = fn(bprm, regs);
1410 /*
1411  * Restore the depth counter to its starting value
1412  * in this call, so we don't have to rely on every
1413  * load_binary function to restore it on return.
1414  */
1415 bprm->recursion_depth = depth;

I guess, the problem is, that recursion_depth usually not only limits
the depth, but also the maximal number of binfmt_xxx calls. That's why
the use of local stack-frame data in bprm does not matter: there is no
going up the stack AND using bprm->interpreter, the last error
terminates the search.

In the POC, search is not terminated because of ENOEXEC when max depth
reached and due to special filename, mod-loader triggers also (about 30
times? I do not know, if that could be a problem also, interfering with
other module loads. Usually non-root users cannot trigger rapid module
loads easily).

- -- 
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAlA3U3QACgkQxFmThv7tq+7hTgCZAcQFn70FUWnAhvoMYhm8EcFT
8vQAn06VtbeY5P0cPGd9fcxL6AaEF8oS
=An9g
-----END PGP SIGNATURE-----
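
The argument about the restored depth counter can be checked with a toy model. This is a hedged userspace sketch, not the kernel code: it models a single handler only, leaves out the `bprm->mm == NULL` part of the condition, and uses hypothetical names (`struct depth_model`, `search_model`, `load_script_model`, `MAX_DEPTH` as a stand-in for `BINPRM_MAX_RECURSION`). It counts how often the proposed extra check would stop the search and how often the search would carry on.

```c
#include <errno.h>

#define MAX_DEPTH 4	/* stand-in for BINPRM_MAX_RECURSION */

/* Hypothetical model state: current depth plus two counters. */
struct depth_model {
	int depth;	/* models bprm->recursion_depth */
	int stopped;	/* proposed check ended the search here */
	int continued;	/* search would go on (e.g. to module loading) */
};

static int search_model(struct depth_model *p);

/* Models load_script: refuse when too deep, else recurse one level. */
static int load_script_model(struct depth_model *p)
{
	if (p->depth > MAX_DEPTH)
		return -ENOEXEC;
	p->depth++;
	return search_model(p);
}

/* Models one pass of search_binary_handler with the proposed check. */
static int search_model(struct depth_model *p)
{
	int depth = p->depth;
	int retval = load_script_model(p);

	/* Restore the depth counter to its starting value in this call. */
	p->depth = depth;

	if (retval != -ENOEXEC || p->depth > MAX_DEPTH)
		p->stopped++;
	else
		p->continued++;
	return retval;
}
```

Starting from depth 0, the model reports the check firing once, in the innermost call, while the five outer calls see their restored depth and continue. Under these assumptions that matches the objection raised above: the outer levels never observe a depth above the limit.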
Re: Search for patch for kernel stack data disclosure in binfmt_script during execve
Got a hint via IRC, that I should not send patch idea for review to
"generic" list, but to maintainers and last (or relevant) committers
of code.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=bf2a9a39639b8b51377905397a5005f444e9a892

CC to generic just for the record

halfdog wrote:
> halfdog wrote:
>> I'm searching for a patch for linux kernel stack disclosure in
>> binfmt_script with crafted interpreter names when CONFIG_MODULES
>> is active (see [1]).
>
> Please disregard my previous proposal [2], since it did not address
> the problem directly (referencing local stack frame data from bprm
> structure) but worked around it. I suspect, that this could increase
> probability to reintroduce similar bugs.
>
> Opinions on (untested sketch for) second solution: Could someone look
> at the source code comments and changes in patch to judge, if this is
> going in the right direction?
>
> Explanation of patch: Since load_script will start to irreversibly
> change bprm structures at some point (using stack local data was one
> of those changes), try to delay this point. Run checks if load_script
> could be the right handler, if not give other binfmt handlers the
> chance to do so.
>
> If binfmt_script is the right one, try to load the interpreter
> (causing bprm modification), if failing make sure that no other binfmt
> handler has the chance to continue on the now modified bprm data.
>
> CAVEAT: This assumes, that if binfmt_script could handle the load,
> that it would be the one and only binfmt with that ability, so no
> other one, e.g. binfmt_misc should have the chance to do so. If this
> assumption is wrong, leaving binfmt_script would have to rollback all
> bprm changes (e.g. restore old credentials).
>
> hd
>
> [1] http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
> [2] http://lkml.org/lkml/2012/8/18/75

-- 
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
Re: Search for patch for kernel stack data disclosure in binfmt_script during execve
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

halfdog wrote:
> I'm searching for a patch for linux kernel stack disclosure in
> binfmt_script with crafted interpreter names when CONFIG_MODULES
> is active (see [1]).

Please disregard my previous proposal [2], since it did not address
the problem directly (referencing local stack frame data from bprm
structure) but worked around it. I suspect, that this could increase
probability to reintroduce similar bugs.

Opinions on (untested sketch for) second solution: Could someone look
at the source code comments and changes in patch to judge, if this is
going in the right direction?

Explanation of patch: Since load_script will start to irreversibly
change bprm structures at some point (using stack local data was one
of those changes), try to delay this point. Run checks if load_script
could be the right handler, if not give other binfmt handlers the
chance to do so.

If binfmt_script is the right one, try to load the interpreter
(causing bprm modification), if failing make sure that no other binfmt
handler has the chance to continue on the now modified bprm data.

CAVEAT: This assumes, that if binfmt_script could handle the load,
that it would be the one and only binfmt with that ability, so no
other one, e.g. binfmt_misc should have the chance to do so. If this
assumption is wrong, leaving binfmt_script would have to rollback all
bprm changes (e.g. restore old credentials).

hd

[1] http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
[2] http://lkml.org/lkml/2012/8/18/75

- -- 
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAlAwphsACgkQxFmThv7tq+6UAQCgh7IA8UcqNieV41YKHS5/YxGE
IbcAn1uP1nIakg/gD1KlV0KNnLIfitEp
=5Klt
-----END PGP SIGNATURE-----

--- fs/binfmt_script.c 2012-01-19 23:04:48.0 +
+++ fs/binfmt_script.c 2012-08-19 07:08:42.540611605 +
@@ -14,12 +14,24 @@
 #include <linux/err.h>
 #include <linux/fs.h>

+/** Check if this handler is suitable to load the "binary" identified
+ * by first BINPRM_BUF_SIZE bytes in bprm->buf.
+ * @returns -ENOEXEC if this handler is not suitable for that type
+ * of binary. In that case, the handler must not modify any of the
+ * data associated with bprm.
+ * Any error if the binary should have been handled by this loader
+ * but handling failed. In that case. FIXME: be defensive? also
+ * kill bprm->mm or bprm->file also to make it impossible, that
+ * upper search_binary_handler can continue handling?
+ * 0 (OK) otherwise, the new executable is ready in bprm->mm.
+ */
 static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
 {
 	const char *i_arg, *i_name;
 	char *cp;
 	struct file *file;
-	char interp[BINPRM_BUF_SIZE];
+	char bprm_buf_copy[BINPRM_BUF_SIZE];
+	char *bprm_old_interp_name;
 	int retval;

 	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
@@ -30,25 +42,29 @@ static int load_script(struct linux_binp
 	 * Sorta complicated, but hopefully it will work. -TYT
 	 */

-	bprm->recursion_depth++;
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
+	/* Keep bprm unchanged until we know, that this is a script
+	 * to be handled by this loader. Copy bprm->buf for sure,
+	 * otherwise returning -ENOEXEC will make other handlers see
+	 * modified data. (hd)
+	 */
+	memcpy(bprm_buf_copy, bprm->buf, BINPRM_BUF_SIZE);

-	bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
-	if ((cp = strchr(bprm->buf, '\n')) == NULL)
-		cp = bprm->buf+BINPRM_BUF_SIZE-1;
+	bprm_buf_copy[BINPRM_BUF_SIZE - 1]='\0';
+	if ((cp = strchr(bprm_buf_copy, '\n')) == NULL)
+		cp = bprm_buf_copy+BINPRM_BUF_SIZE-1;
 	*cp = '\0';
-	while (cp > bprm->buf) {
+	while (cp > bprm_buf_copy) {
 		cp--;
 		if ((*cp == ' ') || (*cp == '\t'))
 			*cp = '\0';
 		else
 			break;
 	}
-	for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
+	for (cp = bprm_buf_copy+2; (*cp == ' ') || (*cp == '\t'); cp++);
 	if (*cp == '\0')
-		return -ENOEXEC; /* No interpreter name found */
+		/* No interpreter name found. No problem to let other handlers
+		 * retry, we did not change anything. */
+		return -ENOEXEC;
 	i_name = cp;
 	i_arg = NULL;
 	for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
@@ -57,45 +73,83 @@ static int load_script(struct linux_binp
 		*cp++ = '\0';
 	if (*cp)
 		i_arg = cp;
-	strcpy (interp, i_name);
+
+	/* So this is our point-of-no-return: modification of bprm
+	 * will be irreversible, so if we fail to setup execution
+	 * using the new interpreter name (i_name), we have to make
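
The "keep bprm unchanged until we know" step of the patch can be tried outside the kernel. The following is a minimal userspace sketch, assuming a fixed-size input buffer like `bprm->buf`; `BUF_SIZE` and `parse_shebang` are hypothetical names, and the parsing loop is modelled on the one in the diff rather than copied from any kernel release. All in-place edits happen on a scratch copy, so a caller that gets "not a script" back still sees the original bytes.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 128	/* stand-in for BINPRM_BUF_SIZE */

/*
 * Parse "#!interp [arg]" from buf without modifying buf.
 * buf and scratch must both be BUF_SIZE bytes long.
 * On success returns 0 and sets *i_name (and *i_arg, or NULL) to
 * point into scratch. Returns -1 if buf is not a usable script header.
 */
static int parse_shebang(const char *buf, char *scratch,
			 const char **i_name, const char **i_arg)
{
	char *cp;

	if (buf[0] != '#' || buf[1] != '!')
		return -1;

	/* Work on a private copy; buf stays untouched. */
	memcpy(scratch, buf, BUF_SIZE);
	scratch[BUF_SIZE - 1] = '\0';

	/* Cut the line at the first newline (or at the buffer end). */
	cp = strchr(scratch, '\n');
	if (cp == NULL)
		cp = scratch + BUF_SIZE - 1;
	*cp = '\0';

	/* Strip trailing blanks. */
	while (cp > scratch) {
		cp--;
		if (*cp == ' ' || *cp == '\t')
			*cp = '\0';
		else
			break;
	}

	/* Skip blanks after "#!". */
	for (cp = scratch + 2; *cp == ' ' || *cp == '\t'; cp++)
		;
	if (*cp == '\0')
		return -1;	/* no interpreter name found */

	*i_name = cp;
	*i_arg = NULL;
	for (; *cp && *cp != ' ' && *cp != '\t'; cp++)
		;
	while (*cp == ' ' || *cp == '\t')
		*cp++ = '\0';
	if (*cp)
		*i_arg = cp;
	return 0;
}
```

A caller would pass a zero-padded `char buf[BUF_SIZE]` and a second array of the same size as scratch space. The pointers returned through `i_name` and `i_arg` are only valid as long as the scratch buffer is, which is the same lifetime question the patch raises for the stack-local copy.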
Re: Search for patch for kernel stack data disclosure in binfmt_script during execve
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 halfdog wrote: I'm searching for a patch for linux kernel stack disclosure in binfmt_script with crafted interpreter names when CONFIG_MODULES is active (see [1]). Please disregard my previous proposal [2], since it did not address the problem directly (referencing local stack frame data from bprm structure) but worked around it. I suspect, that this could increase probability to reintroduce similar bugs. Opinions on (untested sketch for) second solution: Could someone look on the source code comments and changes in patch to judge, if this is going in the right direction? Explanation of patch: Since load_script will start to irreversibly change bprm structures at some point (using stack local data was one of those changes), try to delay this point. Run checks if load_script could be the right handler, if not give other binfmt handlers the chance to do so. If binfmt_script is the right one, try to load the interpreter (causing bprm modification), if failing make sure that no other binfmt handler has the chance to continue on the now modified bprm data. CAVEAT: This assumes, that if binfmt_script could handle the load, that it would be the one and only binfmt with that ability, so no other one, e.g. binfmt_misc should have the chance to do so. If this assumption is wrong, leaving binfmt_script would have to rollback all bprm changes (e.g. restore old credentials). 
hd

[1] http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
[2] http://lkml.org/lkml/2012/8/18/75

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAlAwphsACgkQxFmThv7tq+6UAQCgh7IA8UcqNieV41YKHS5/YxGE
IbcAn1uP1nIakg/gD1KlV0KNnLIfitEp
=5Klt
-----END PGP SIGNATURE-----

--- fs/binfmt_script.c 2012-01-19 23:04:48.0 +
+++ fs/binfmt_script.c 2012-08-19 07:08:42.540611605 +
@@ -14,12 +14,24 @@
 #include <linux/err.h>
 #include <linux/fs.h>

+/** Check if this handler is suitable to load the binary identified
+ * by first BINPRM_BUF_SIZE bytes in bprm->buf.
+ * @returns -ENOEXEC if this handler is not suitable for that type
+ * of binary. In that case, the handler must not modify any of the
+ * data associated with bprm.
+ * Any error if the binary should have been handled by this loader
+ * but handling failed. In that case. FIXME: be defensive? also
+ * kill bprm->mm or bprm->file also to make it impossible, that
+ * upper search_binary_handler can continue handling?
+ * 0 (OK) otherwise, the new executable is ready in bprm->mm.
+ */
 static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
 {
 	const char *i_arg, *i_name;
 	char *cp;
 	struct file *file;
-	char interp[BINPRM_BUF_SIZE];
+	char bprm_buf_copy[BINPRM_BUF_SIZE];
+	char *bprm_old_interp_name;
 	int retval;

 	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
@@ -30,25 +42,29 @@ static int load_script(struct linux_binp
 	 * Sorta complicated, but hopefully it will work.  -TYT
 	 */

-	bprm->recursion_depth++;
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
+	/* Keep bprm unchanged until we known, that this is a script
+	 * to be handled by this loader. Copy bprm->buf for sure,
+	 * otherwise returning -ENOEXEC will make other handlers see
+	 * modified data. (hd)
+	 */
+	memcpy(bprm_buf_copy, bprm->buf, BINPRM_BUF_SIZE);

-	bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
-	if ((cp = strchr(bprm->buf, '\n')) == NULL)
-		cp = bprm->buf+BINPRM_BUF_SIZE-1;
+	bprm_buf_copy[BINPRM_BUF_SIZE - 1]='\0';
+	if ((cp = strchr(bprm_buf_copy, '\n')) == NULL)
+		cp = bprm_buf_copy+BINPRM_BUF_SIZE-1;
 	*cp = '\0';
-	while (cp > bprm->buf) {
+	while (cp > bprm_buf_copy) {
 		cp--;
 		if ((*cp == ' ') || (*cp == '\t'))
 			*cp = '\0';
 		else
 			break;
 	}
-	for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
+	for (cp = bprm_buf_copy+2; (*cp == ' ') || (*cp == '\t'); cp++);
 	if (*cp == '\0')
-		return -ENOEXEC; /* No interpreter name found */
+		/* No interpreter name found. No problem to let other handlers
+		 * retry, we did not change anything. */
+		return -ENOEXEC;
 	i_name = cp;
 	i_arg = NULL;
 	for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
@@ -57,45 +73,83 @@ static int load_script(struct linux_binp
 		*cp++ = '\0';
 	if (*cp)
 		i_arg = cp;
-	strcpy (interp, i_name);
+
+	/* So this is our point-of-no-return: modification of bprm
+	 * will be irreversible, so if we fail to setup execution
+	 * using the new interpreter name (i_name), we have to make
Search for patch for kernel stack disclosure in binfmt_script during execve
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm searching for a patch for the Linux kernel stack disclosure in binfmt_script with crafted interpreter names when CONFIG_MODULES is active (see [1]).

The simplest solution would be to return an error from load_script (in fs/binfmt_script.c) when the maximal recursion depth is reached, but I'm not sure whether that is clean or whether it could have side effects. Apart from that, any later change to the loop condition in search_binary_handler (in fs/exec.c) could have side effects that are hard to see and hence reintroduce the bug (it is a challenge to get that right in the documentation).

Any comments?

--- fs/binfmt_script.c 2012-01-19 23:04:48.0 +
+++ fs/binfmt_script.c 2012-08-18 13:55:25.735748407 +
@@ -22,9 +22,8 @@
 	char interp[BINPRM_BUF_SIZE];
 	int retval;

-	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
-	    (bprm->recursion_depth > BINPRM_MAX_RECURSION))
-		return -ENOEXEC;
+	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!')) return -ENOEXEC;
+	if (bprm->recursion_depth > BINPRM_MAX_RECURSION) return -ENOMEM;
 	/*
 	 * This section does the #! interpretation.
 	 * Sorta complicated, but hopefully it will work.  -TYT

hd

[1] http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAlAvn0MACgkQxFmThv7tq+6nUACfdk7KWESuC6J1FXZcrMaa3kCb
eWoAn0wV6INdYGjAZydd6ytO0i5BnhGa
=cxbR
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Software interrupt 0x8 guest crash from userspace: virtualbox emulation or guest kernel bug?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have observed a strange guest kernel crash in VirtualBox and am currently trying to understand it. Since I no longer have a real 32-bit Intel platform, I cannot verify whether this crash would also happen on native 32-bit, so perhaps someone could check that. I have also collected information about the crash [1], but currently fail to understand why this is happening.

In short: Calling "int 0x8" in an i386 guest on an amd64 host crashes the guest. It seems that "int 0x8" is handled by a task gate that fails to initialize "gs" correctly. The crash can be reproduced using [2]; the same program does not crash the host. Due to the lack of test platforms it is not clear whether this only affects VirtualBox guests.

Questions:

* Does this idt entry seem sane or could it be really broken? The code says

  ./arch/x86/kernel/traps.c: set_intr_gate_ist(8, &double_fault, DOUBLEFAULT_STACK);

  which seems consistent with the observed idt setup. I'm not sure about privilege levels: is it possible to invoke this interrupt on native systems too and cause the same behavior?
* If broken, what is the idt on a native i386 system (not a guest) on a real 32-bit CPU? Could someone with such a system send me: grep "idt_table" in System.map, "gdb --core /proc/kcore" and "x/64x [address of idt_table]" (see also [1])?
* If broken, why? Same outcome on a native i386 platform?
* If not broken on native: why this interaction with VirtualBox?

hd

[1] http://www.halfdog.net/Security/2012/VirtualBoxSoftwareInterrupt0x8GuestCrash/
[2] http://www.halfdog.net/Security/2012/VirtualBoxSoftwareInterrupt0x8GuestCrash/RtcInt.c

--
http://www.halfdog.net/
PGP: 156A AE98 B91F 0114 FE88 2BD8 C459 9386 feed a bee
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAlAuqz8ACgkQxFmThv7tq+6CzwCginL/PMRVIKxRV4YRXtRIRF+O
tO4An2KcZs5caaoTFu+UGJQLtFOrmKpS
=9P33
-----END PGP SIGNATURE-----