Re: [PATCH 2/6] pstore: Add event tracing support
On 9/19/2018 2:43 AM, Sai Prakash Ranjan wrote: On 9/19/2018 2:14 AM, Steven Rostedt wrote: On Tue, 18 Sep 2018 23:22:48 +0530 Sai Prakash Ranjan wrote: On 9/18/2018 5:04 AM, Steven Rostedt wrote: It looks like pstore_event_call() gets called from a trace event. You can't call kmalloc() from one. One thing is that kmalloc has tracepoints itself. You trace those you just entered an infinite loop. Ok will remove it in v2. But any alternative way to do this? I think I describe it below. Ok got it, will change and post the 2nd version soon. Hi Steven, Instead of dummy iterator, can't we have something like below, there won't be any infinite loop if we trace kmalloc in this case. This is same as tp_printk. diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 018cbbefb769..271b0573f44a 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8644,8 +8644,14 @@ void __init early_trace_init(void) static_key_enable(&tracepoint_printk_key.key); } - if (tracepoint_pstore) - static_key_enable(&tracepoint_pstore_key.key); + if (tracepoint_pstore) { + tracepoint_pstore_iter = + kmalloc(sizeof(*tracepoint_pstore_iter), GFP_KERNEL); + if (WARN_ON(!tracepoint_pstore_iter)) + tracepoint_pstore = 0; + else + static_key_enable(&tracepoint_pstore_key.key); + } tracer_alloc_buffers(); } diff --git a/fs/pstore/ftrace.c b/fs/pstore/ftrace.c index f5263b6fb96f..0534546aef6d 100644 --- a/fs/pstore/ftrace.c +++ b/fs/pstore/ftrace.c @@ -73,7 +73,6 @@ void notrace pstore_event_call(struct trace_event_buffer *fbuffer) struct trace_event *event; struct seq_buf *seq; unsigned long flags; - gfp_t gfpflags; if (!psinfo) if (!psinfo) return; @@ -81,20 +80,17 @@ void notrace pstore_event_call(struct trace_event_buffer *fbuffer) if (unlikely(oops_in_progress)) return; - pstore_record_init(&record, psinfo); - record.type = PSTORE_TYPE_EVENT; - - /* Can be called in atomic context */ - gfpflags = (in_atomic() || irqs_disabled()) ? 
GFP_ATOMIC : GFP_KERNEL; - - iter = kmalloc(sizeof(*iter), gfpflags); + iter = tracepoint_pstore_iter; if (!iter) return; + pstore_record_init(&record, psinfo); + record.type = PSTORE_TYPE_EVENT; + event_call = fbuffer->trace_file->event_call; if (!event_call || !event_call->event.funcs || !event_call->event.funcs->trace) - goto fail_event; + return; event = &fbuffer->trace_file->event_call->event; @@ -116,9 +112,6 @@ void notrace pstore_event_call(struct trace_event_buffer *fbuffer) psinfo->write(&record); spin_unlock_irqrestore(&psinfo->buf_lock, flags); - -fail_event: - kfree(iter); } Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
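The idea in the diff above (allocate the iterator once at init time, then only reuse it from the event callback) can be illustrated outside the kernel. The following is a minimal userspace sketch, not the pstore code: `struct scratch_iter`, `scratch_init()` and `record_event()` are hypothetical names, and `malloc()` stands in for `kmalloc()`. It does not show the locking a shared buffer would need against concurrent callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the preallocated iterator. */
struct scratch_iter {
	char buf[256];
	size_t len;
};

static struct scratch_iter *scratch;	/* allocated once, at init */
static bool events_enabled;

/* Init path: allocating is fine here; on failure the feature stays off. */
int scratch_init(void)
{
	scratch = malloc(sizeof(*scratch));
	if (!scratch) {
		events_enabled = false;
		return -1;
	}
	events_enabled = true;
	return 0;
}

/*
 * Hot path: never allocates, so it cannot re-enter the allocator
 * (and, in the kernel case, the allocator's own tracepoints).
 * Returns the number of bytes stored, or -1 if the feature is off.
 */
int record_event(const char *msg)
{
	size_t n;

	assert(msg != NULL);
	if (!events_enabled || !scratch)
		return -1;

	n = strlen(msg);
	if (n > sizeof(scratch->buf) - 1)
		n = sizeof(scratch->buf) - 1;	/* truncate to fit */
	memcpy(scratch->buf, msg, n);
	scratch->buf[n] = '\0';
	scratch->len = n;
	return (int)n;
}
```

Calling `record_event()` before `scratch_init()` simply returns -1, which mirrors the `if (!iter) return;` check in the proposed callback.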
Re: [PATCH 3.16 52/63] xfs: validate cached inodes are free when allocated
On Sat, Sep 22, 2018 at 01:15:42AM +0100, Ben Hutchings wrote:
> 3.16.58-rc1 review patch. If anyone has any objections, please let me know.
>
> --
>
> From: Dave Chinner
>
> commit afca6c5b2595fc44383919fba740c194b0b76aff upstream.
>
> A recent fuzzed filesystem image cached random dcache corruption
> when the reproducer was run. This often showed up as panics in
> lookup_slow() on a null inode->i_ops pointer when doing pathwalks.

.

> [bwh: Backported to 3.16:
> - Look up mode in XFS inode, not VFS inode
> - Use positive error codes, and EIO instead of EFSCORRUPTED]

Again, why EIO? And

> Signed-off-by: Ben Hutchings
> ---
> fs/xfs/xfs_icache.c | 73 +
> 1 file changed, 48 insertions(+), 25 deletions(-)
>
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -133,6 +133,46 @@ xfs_inode_free(
> }
>
> /*
> + * If we are allocating a new inode, then check what was returned is
> + * actually a free, empty inode. If we are not allocating an inode,
> + * then check we didn't find a free inode.
> + *
> + * Returns:
> + * 0 if the inode free state matches the lookup context
> + * ENOENT if the inode is free and we are not allocating
> + * EFSCORRUPTED if there is any state mismatch at all

You changed the code but not the comment.

Cheers,

Dave.
--
Dave Chinner
dchin...@redhat.com
Re: [PATCH 3.16 51/63] xfs: catch inode allocation state mismatch corruption
On Sat, Sep 22, 2018 at 01:15:42AM +0100, Ben Hutchings wrote:
> 3.16.58-rc1 review patch. If anyone has any objections, please let me know.
>
> --
>
> From: Dave Chinner
>
> commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream.
>
> We recently came across a V4 filesystem causing memory corruption
> due to a newly allocated inode being setup twice and being added to
> the superblock inode list twice. From code inspection, the only way
> this could happen is if a newly allocated inode was not marked as
> free on disk (i.e. di_mode wasn't zero).

> Signed-Off-By: Dave Chinner
> Reviewed-by: Carlos Maiolino
> Tested-by: Carlos Maiolino
> Reviewed-by: Darrick J. Wong
> Signed-off-by: Darrick J. Wong
> [bwh: Backported to 3.16:
> - Look up mode in XFS inode, not VFS inode
> - Use positive error codes, and EIO instead of EFSCORRUPTED]

Why EIO?

Cheers,

Dave.
--
Dave Chinner
dchin...@redhat.com
Re: [Patch v7 21/22] CIFS: SMBD: Upper layer performs SMB read via RDMA write through memory registration
Hi,

>> + req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
>> + if (need_invalidate)
>> +         req->Channel = SMB2_CHANNEL_RDMA_V1;
>> + req->ReadChannelInfoOffset =
>> +         offsetof(struct smb2_read_plain_req, Buffer);
>> + req->ReadChannelInfoLength =
>> +         sizeof(struct smbd_buffer_descriptor_v1);
>> + v1 = (struct smbd_buffer_descriptor_v1 *) &req->Buffer[0];
>> + v1->offset = rdata->mr->mr->iova;
>
> It's unnecessary, and possibly leaking kernel information, to use
> the IOVA as the offset of a memory region which is registered using
> an FRWR. Because such regions are based on the exact bytes targeted
> by the memory handle, the offset can be set to any value, typically
> zero, but nearly arbitrary. As long as the (offset + length) does
> not wrap or otherwise overflow, offset can be set to anything
> convenient.
>
> Since SMB reads and writes range up to 8MB, I'd suggest zeroing the
> least significant 23 bits, which should guarantee it. The other 41
> bits, party on. You could randomize them, pass some clever identifier
> such as MID sequence, whatever.

I just tested that setting:

mr->iova &= (PAGE_SIZE - 1);
mr->iova |= 0x;

after the ib_map_mr_sg() and before doing the IB_WR_REG_MR, seems to work.

metze
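To make the quoted suggestion concrete (low 23 bits zero, upper 41 bits chosen freely, and offset + length must not wrap), here is a small userspace sketch. It is only an illustration of the arithmetic, not CIFS or RDMA code: `make_offset()`, `offset_is_safe()` and `SMB_MAX_IO_SHIFT` are hypothetical names, and the sketch follows the reviewer's variant rather than the in-page masking tested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 8 MB = 2^23 bytes, the upper bound on SMB read/write size quoted above. */
#define SMB_MAX_IO_SHIFT 23

/*
 * Build an advertised offset whose low 23 bits are zero and whose
 * upper 41 bits carry a caller-chosen tag (random value, sequence
 * number, and so on). The tag must fit in 41 bits.
 */
uint64_t make_offset(uint64_t tag)
{
	assert((tag >> (64 - SMB_MAX_IO_SHIFT)) == 0);
	return tag << SMB_MAX_IO_SHIFT;
}

/* True if offset + length does not wrap around 64 bits. */
bool offset_is_safe(uint64_t offset, uint64_t length)
{
	return length <= UINT64_MAX - offset;
}
```

One detail worth noting: with all 41 tag bits set and a length of exactly 8 MB, the sum reaches 2^64 and the check above rejects it, so a caller picking random tags may want to avoid the all-ones value.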
Re: [PATCH v12 1/2] leds: core: Introduce LED pattern trigger
Hi, On 22 September 2018 at 06:18, Pavel Machek wrote: > On Sat 2018-09-22 00:11:29, Jacek Anaszewski wrote: >> On 09/21/2018 11:17 PM, Pavel Machek wrote: >> > On Fri 2018-09-21 22:59:40, Jacek Anaszewski wrote: >> >> Hi Baolin, >> >> >> >> On 09/21/2018 05:31 AM, Baolin Wang wrote: >> >>> Hi Jacek and Pavel, >> >>> >> >>> On 11 September 2018 at 10:47, Baolin Wang >> >>> wrote: >> This patch adds one new led trigger that LED device can configure >> the software or hardware pattern and trigger it. >> >> Consumers can write 'pattern' file to enable the software pattern >> which alters the brightness for the specified duration with one >> software timer. >> >> Moreover consumers can write 'hw_pattern' file to enable the hardware >> pattern for some LED controllers which can autonomously control >> brightness over time, according to some preprogrammed hardware >> patterns. >> >> Signed-off-by: Raphael Teysseyre >> Signed-off-by: Baolin Wang >> > >> >>> Do you have any comments for the v12 patch set? Thanks. >> >> >> >> We will probably have to remove hw_pattern from ledtrig-pattern >> >> since we are unable to come up with generic interface for it. >> >> Unless thread [0] will end up with some brilliant ideas. So far >> >> we're waiting for Pavel's reply. >> >> >> >> [0] https://lkml.org/lkml/2018/9/13/1216 >> > >> > To paint a picture: >> > >> > brightness >> > >> >rise hold lower hold down >> > ^XXX >> > | X XX >> > | X XX >> > | X XX >> > +---> time >> > >> > This is what Baolin's hardware can do, right? >> > >> > This is also what pattern trigger can do, right? >> > >> > So all we need to do is match the two interfaces, so that hw_pattern >> > returns -EINVAL on patterns hardware can not actually do. >> > >> > I believe I described code to do that in [0] above. >> >> You said that we should get the same effect by writing the >> same series of tuples to either pattern or hw_pattern file. 
>> >> Below command consists of four tuples (marked with brackets >> to highlight), and it will activate breathing mode in Baolin's >> hw_pattern: >> >> "[0 rise_duration] [brightness high_duration] [brightness fall_duration] >> [0 low_duration]" >> >> Now, I can't see how these four tuples could force the software >> fallback to produce breathing effect you depicted. > > I really should get some sleep now. But my intention was that software > fallback produces just that with those four tuples. (If it does not, > we can fix the software fallback to do just that). I agree with Jacek. For our SC27XX led, we just need set 4 components (low state, rise stage, high stage and fall stage) for one hardware pattern to enable the breathing mode, but it is hard to use software pattern to simulate the hardware breathing mode if failed to set the hardware pattern, especially for the rising time and falling time which are hard for software pattern to simulate. -- Baolin Wang Best Regards
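For readers trying to picture how a software fallback could turn the four tuples into the trapezoid drawn earlier in the thread, here is one possible interpretation as a userspace sketch. It assumes (this is not stated as settled anywhere in the thread) that each tuple's brightness is ramped linearly toward the next tuple's brightness over the tuple's own duration, wrapping at the end. `struct led_tuple` and `pattern_brightness_at()` are hypothetical names, not the ledtrig-pattern API.

```c
#include <assert.h>
#include <stddef.h>

/* One "brightness duration" pair as written to the pattern file. */
struct led_tuple {
	unsigned int brightness;
	unsigned int duration_ms;
};

/*
 * Brightness at time t_ms for a repeating pattern of n tuples.
 * Assumed semantics: during tuple i the level moves linearly from
 * p[i].brightness to p[(i + 1) % n].brightness.
 */
unsigned int pattern_brightness_at(const struct led_tuple *p, size_t n,
				   unsigned int t_ms)
{
	unsigned long total = 0;
	unsigned long t;
	size_t i;

	assert(p != NULL && n > 0);

	for (i = 0; i < n; i++)
		total += p[i].duration_ms;
	if (total == 0)
		return p[0].brightness;

	t = t_ms % total;	/* the pattern repeats */
	for (i = 0; i < n; i++) {
		unsigned long d = p[i].duration_ms;

		if (t < d) {
			long from = p[i].brightness;
			long to = p[(i + 1) % n].brightness;

			/* linear interpolation, integer arithmetic */
			return (unsigned int)(from + (to - from) * (long)t / (long)d);
		}
		t -= d;
	}
	return p[0].brightness;	/* not reached when total > 0 */
}
```

With the tuples `{0, 1000} {255, 500} {255, 1000} {0, 500}` this yields a rise over the first second, a hold at 255, a fall over one second, and a hold at 0. Whether a timer-driven implementation can approximate the hardware's rise and fall closely enough is exactly the open question in the replies above.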
Re: [PATCH] zram: fix missing zero pages for memory tracking
On Wed, Sep 19, 2018 at 04:29:16PM +0900, Sergey Senozhatsky wrote:
> On (09/19/18 14:18), Minchan Kim wrote:
> > We need to count zero filled pages as well as other pages in zram.
>
> A nit,
>
> 'ZRAM_FLAG_SHIFT + 1' covers all ZRAM_SAME pages, not only
> zero filled pages.

Ah, now I got your point. I was brainfart.
I will drop this patch and find other reason I have missed.

Thanks, Sergey.
[PATCH] torture-test modules: Remove unnecessary "ret" variables
Remove return variables (declared as "ret") in cases where, depending on whether a condition evaluates as true, the result of a function call can be immediately returned instead of storing the result in the return variable. When the condition evaluates as false, the constant initially stored in the return variable at declaration is returned instead. Signed-off-by: Pierce Griffiths --- kernel/torture.c | 22 -- 1 file changed, 8 insertions(+), 14 deletions(-) diff --git a/kernel/torture.c b/kernel/torture.c index 1ac24a826589..f4cec6db7f3c 100644 --- a/kernel/torture.c +++ b/kernel/torture.c @@ -233,16 +233,15 @@ torture_onoff(void *arg) */ int torture_onoff_init(long ooholdoff, long oointerval) { - int ret = 0; - #ifdef CONFIG_HOTPLUG_CPU onoff_holdoff = ooholdoff; onoff_interval = oointerval; if (onoff_interval <= 0) return 0; - ret = torture_create_kthread(torture_onoff, NULL, onoff_task); -#endif /* #ifdef CONFIG_HOTPLUG_CPU */ - return ret; + return torture_create_kthread(torture_onoff, NULL, onoff_task); +#else /* #ifdef CONFIG_HOTPLUG_CPU */ + return 0; +#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */ } EXPORT_SYMBOL_GPL(torture_onoff_init); @@ -513,15 +512,13 @@ static int torture_shutdown(void *arg) */ int torture_shutdown_init(int ssecs, void (*cleanup)(void)) { - int ret = 0; - torture_shutdown_hook = cleanup; if (ssecs > 0) { shutdown_time = ktime_add(ktime_get(), ktime_set(ssecs, 0)); - ret = torture_create_kthread(torture_shutdown, NULL, + return torture_create_kthread(torture_shutdown, NULL, shutdown_task); } - return ret; + return 0; } EXPORT_SYMBOL_GPL(torture_shutdown_init); @@ -619,13 +616,10 @@ static int torture_stutter(void *arg) /* * Initialize and kick off the torture_stutter kthread. 
*/ -int torture_stutter_init(int s) +int torture_stutter_init(const int s) { - int ret; - stutter = s; - ret = torture_create_kthread(torture_stutter, NULL, stutter_task); - return ret; + return torture_create_kthread(torture_stutter, NULL, stutter_task); } EXPORT_SYMBOL_GPL(torture_stutter_init); -- 2.19.0
[PATCH 3.16 16/63] KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Paolo Bonzini commit ce14e868a54edeb2e30cb7a7b104a2fc4b9d76ca upstream. Int the next patch the emulator's .read_std and .write_std callbacks will grow another argument, which is not needed in kvm_read_guest_virt and kvm_write_guest_virt_system's callers. Since we have to make separate functions, let's give the currently existing names a nicer interface, too. Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12) Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: - Drop change to handle_invvpid() - Adjust context] Signed-off-by: Ben Hutchings --- --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6027,8 +6027,7 @@ static int nested_vmx_check_vmptr(struct vmcs_read32(VMX_INSTRUCTION_INFO), &gva)) return 1; - if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmptr, - sizeof(vmptr), &e)) { + if (kvm_read_guest_virt(vcpu, gva, &vmptr, sizeof(vmptr), &e)) { kvm_inject_page_fault(vcpu, &e); return 1; } @@ -6539,8 +6538,8 @@ static int handle_vmread(struct kvm_vcpu vmx_instruction_info, &gva)) return 1; /* _system ok, as nested_vmx_check_permission verified cpl=0 */ - kvm_write_guest_virt_system(&vcpu->arch.emulate_ctxt, gva, -&field_value, (is_long_mode(vcpu) ? 8 : 4), NULL); + kvm_write_guest_virt_system(vcpu, gva, &field_value, + (is_long_mode(vcpu) ? 8 : 4), NULL); } nested_vmx_succeed(vcpu); @@ -6575,8 +6574,8 @@ static int handle_vmwrite(struct kvm_vcp if (get_vmx_mem_address(vcpu, exit_qualification, vmx_instruction_info, &gva)) return 1; - if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, - &field_value, (is_long_mode(vcpu) ? 8 : 4), &e)) { + if (kvm_read_guest_virt(vcpu, gva, &field_value, + (is_long_mode(vcpu) ? 
8 : 4), &e)) { kvm_inject_page_fault(vcpu, &e); return 1; } @@ -6669,9 +6668,9 @@ static int handle_vmptrst(struct kvm_vcp vmx_instruction_info, &vmcs_gva)) return 1; /* ok to use *_system, as nested_vmx_check_permission verified cpl=0 */ - if (kvm_write_guest_virt_system(&vcpu->arch.emulate_ctxt, vmcs_gva, -(void *)&to_vmx(vcpu)->nested.current_vmptr, -sizeof(u64), &e)) { + if (kvm_write_guest_virt_system(vcpu, vmcs_gva, + (void *)&to_vmx(vcpu)->nested.current_vmptr, + sizeof(u64), &e)) { kvm_inject_page_fault(vcpu, &e); return 1; } @@ -6723,8 +6722,7 @@ static int handle_invept(struct kvm_vcpu if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION), vmx_instruction_info, &gva)) return 1; - if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand, - sizeof(operand), &e)) { + if (kvm_read_guest_virt(vcpu, gva, &operand, sizeof(operand), &e)) { kvm_inject_page_fault(vcpu, &e); return 1; } --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4178,11 +4178,10 @@ static int kvm_fetch_guest_virt(struct x exception); } -int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt, +int kvm_read_guest_virt(struct kvm_vcpu *vcpu, gva_t addr, void *val, unsigned int bytes, struct x86_exception *exception) { - struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0; return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access, @@ -4190,26 +4189,24 @@ int kvm_read_guest_virt(struct x86_emula } EXPORT_SYMBOL_GPL(kvm_read_guest_virt); -static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt, - gva_t addr, void *val, unsigned int bytes, - struct x86_exception *exception) +static int emulator_read_std(struct x86_emulate_ctxt *ctxt, +gva_t addr, void *val, unsigned int bytes, +struct x86_exception *exception) { struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception); } -int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt, -
[PATCH 3.16 07/63] usbip: usbip_host: refine probe and disconnect debug msgs to be useful
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Shuah Khan commit 28b68acc4a88dcf91fd1dcf2577371dc9bf574cc upstream. Refine probe and disconnect debug msgs to be useful and say what is in progress. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings --- drivers/staging/usbip/stub_dev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/drivers/staging/usbip/stub_dev.c +++ b/drivers/staging/usbip/stub_dev.c @@ -343,7 +343,7 @@ static int stub_probe(struct usb_device struct bus_id_priv *busid_priv; int rc; - dev_dbg(&udev->dev, "Enter\n"); + dev_dbg(&udev->dev, "Enter probe\n"); /* check we should claim or not by busid_table */ busid_priv = get_busid_priv(udev_busid); @@ -446,7 +446,7 @@ static void stub_disconnect(struct usb_d struct bus_id_priv *busid_priv; int rc; - dev_dbg(&udev->dev, "Enter\n"); + dev_dbg(&udev->dev, "Enter disconnect\n"); busid_priv = get_busid_priv(udev_busid); if (!busid_priv) {
[PATCH 3.16 08/63] usbip: usbip_host: delete device from busid_table after rebind
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: "Shuah Khan (Samsung OSG)" commit 1e180f167d4e413afccbbb4a421b48b2de832549 upstream. Device is left in the busid_table after unbind and rebind. Rebind initiates usb bus scan and the original driver claims the device. After rescan the device should be deleted from the busid_table as it no longer belongs to usbip_host. Fix it to delete the device after device_attach() succeeds. Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings --- drivers/staging/usbip/stub_main.c | 6 ++ 1 file changed, 6 insertions(+) --- a/drivers/staging/usbip/stub_main.c +++ b/drivers/staging/usbip/stub_main.c @@ -205,6 +205,9 @@ static ssize_t rebind_store(struct devic if (!bid) return -ENODEV; + /* mark the device for deletion so probe ignores it during rescan */ + bid->status = STUB_BUSID_OTHER; + /* device_attach() callers should hold parent lock for USB */ if (bid->udev->dev.parent) device_lock(bid->udev->dev.parent); @@ -216,6 +219,9 @@ static ssize_t rebind_store(struct devic return ret; } + /* delete device from busid_table */ + del_match_busid((char *) buf); + return count; }
[PATCH 3.16 06/63] usbip: usbip_host: fix to hold parent lock for device_attach() calls
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Shuah Khan commit 4bfb141bc01312a817d36627cc47c93f801c216d upstream. usbip_host calls device_attach() without holding dev->parent lock. Fix it. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings --- drivers/staging/usbip/stub_main.c | 5 + 1 file changed, 5 insertions(+) --- a/drivers/staging/usbip/stub_main.c +++ b/drivers/staging/usbip/stub_main.c @@ -205,7 +205,12 @@ static ssize_t rebind_store(struct devic if (!bid) return -ENODEV; + /* device_attach() callers should hold parent lock for USB */ + if (bid->udev->dev.parent) + device_lock(bid->udev->dev.parent); ret = device_attach(&bid->udev->dev); + if (bid->udev->dev.parent) + device_unlock(bid->udev->dev.parent); if (ret < 0) { dev_err(&bid->udev->dev, "rebind failed\n"); return ret;
[PATCH 3.16 03/63] Revert "vti4: Don't override MTU passed on link creation via IFLA_MTU"
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Ben Hutchings This reverts commit 5a79e43ffa5014c020e0d0f4e383205f87b10111, which was commit 03080e5ec72740c1a62e6730f2a5f3f114f11b19 upstream, as it causes test failures. It should not have been backported to anything older than 4.16. Thanks to Alistair Strachan for debugging this. Cc: Alistair Strachan Signed-off-by: Ben Hutchings --- net/ipv4/ip_vti.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index b4fc9e710308..778ee1bf40ad 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -359,6 +359,7 @@ static int vti_tunnel_init(struct net_device *dev) memcpy(dev->dev_addr, &iph->saddr, 4); memcpy(dev->broadcast, &iph->daddr, 4); + dev->mtu= ETH_DATA_LEN; dev->flags = IFF_NOARP; dev->iflink = 0; dev->addr_len = 4;
[PATCH 3.16 10/63] usbip: usbip_host: fix NULL-ptr deref and use-after-free errors
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: "Shuah Khan (Samsung OSG)" commit 22076557b07c12086eeb16b8ce2b0b735f7a27e7 upstream. usbip_host updates device status without holding lock from stub probe, disconnect and rebind code paths. When multiple requests to import a device are received, these unprotected code paths step all over each other and drive fails with NULL-ptr deref and use-after-free errors. The driver uses a table lock to protect the busid array for adding and deleting busids to the table. However, the probe, disconnect and rebind paths get the busid table entry and update the status without holding the busid table lock. Add a new finer grain lock to protect the busid entry. This new lock will be held to search and update the busid entry fields from get_busid_idx(), add_match_busid() and del_match_busid(). match_busid_show() does the same to access the busid entry fields. get_busid_priv() changed to return the pointer to the busid entry holding the busid lock. stub_probe(), stub_disconnect() and stub_device_rebind() call put_busid_priv() to release the busid lock before returning. This changes fixes the unprotected code paths eliminating the race conditions in updating the busid entries. 
Reported-by: Jakub Jirasek Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filenames, context] Signed-off-by: Ben Hutchings --- drivers/staging/usbip/stub.h | 2 ++ drivers/staging/usbip/stub_dev.c | 33 - drivers/staging/usbip/stub_main.c | 40 ++- 3 files changed, 60 insertions(+), 15 deletions(-) --- a/drivers/staging/usbip/stub.h +++ b/drivers/staging/usbip/stub.h @@ -87,6 +87,7 @@ struct bus_id_priv { struct stub_device *sdev; struct usb_device *udev; char shutdown_busid; + spinlock_t busid_lock; }; /* stub_priv is allocated from stub_priv_cache */ @@ -97,6 +98,7 @@ extern struct usb_device_driver stub_dri /* stub_main.c */ struct bus_id_priv *get_busid_priv(const char *busid); +void put_busid_priv(struct bus_id_priv *bid); int del_match_busid(char *busid); void stub_device_cleanup_urbs(struct stub_device *sdev); --- a/drivers/staging/usbip/stub_dev.c +++ b/drivers/staging/usbip/stub_dev.c @@ -341,7 +341,7 @@ static int stub_probe(struct usb_device struct stub_device *sdev = NULL; const char *udev_busid = dev_name(&udev->dev); struct bus_id_priv *busid_priv; - int rc; + int rc = 0; dev_dbg(&udev->dev, "Enter probe\n"); @@ -358,13 +358,15 @@ static int stub_probe(struct usb_device * other matched drivers by the driver core. * See driver_probe_device() in driver/base/dd.c */ - return -ENODEV; + rc = -ENODEV; + goto call_put_busid_priv; } if (udev->descriptor.bDeviceClass == USB_CLASS_HUB) { dev_dbg(&udev->dev, "%s is a usb hub device... skip!\n", udev_busid); - return -ENODEV; + rc = -ENODEV; + goto call_put_busid_priv; } if (!strcmp(udev->bus->bus_name, "vhci_hcd")) { @@ -372,13 +374,16 @@ static int stub_probe(struct usb_device "%s is attached on vhci_hcd... 
skip!\n", udev_busid); - return -ENODEV; + rc = -ENODEV; + goto call_put_busid_priv; } /* ok, this is my device */ sdev = stub_device_alloc(udev); - if (!sdev) - return -ENOMEM; + if (!sdev) { + rc = -ENOMEM; + goto call_put_busid_priv; + } dev_info(&udev->dev, "usbip-host: register new device (bus %u dev %u)\n", @@ -410,7 +415,9 @@ static int stub_probe(struct usb_device } busid_priv->status = STUB_BUSID_ALLOC; - return 0; + rc = 0; + goto call_put_busid_priv; + err_files: usb_hub_release_port(udev->parent, udev->portnum, (struct usb_dev_state *) udev); @@ -421,6 +428,9 @@ err_port: busid_priv->sdev = NULL; stub_device_free(sdev); + +call_put_busid_priv: + put_busid_priv(busid_priv); return rc; } @@ -459,7 +469,7 @@ static void stub_disconnect(struct usb_d /* get stub_device */ if (!sdev) { dev_err(&udev->dev, "could not get device"); - return; + goto call_put_busid_priv; } dev_set_drvdata(&udev->dev, NULL); @@ -474,12 +484,12 @@ static void stub_disconnect(struct usb_d (struct usb_dev_state *) udev); if (rc) { dev_dbg(&udev->dev, "unable to release port\n"); - return; + goto call_put_busid_p
[PATCH 3.16 18/63] sr: pass down correctly sized SCSI sense buffer
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Jens Axboe commit f7068114d45ec55996b9040e98111afa56e010fe upstream. We're casting the CDROM layer request_sense to the SCSI sense buffer, but the former is 64 bytes and the latter is 96 bytes. As we generally allocate these on the stack, we end up blowing up the stack. Fix this by wrapping the scsi_execute() call with a properly sized sense buffer, and copying back the bits for the CDROM layer. Reported-by: Piotr Gabriel Kosinski Reported-by: Daniel Shapira Tested-by: Kees Cook Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Jens Axboe [bwh: Despite what the "Fixes" field says, a buffer overrun was already possible if the sense data was really > 64 bytes long. Backported to 3.16: - We always need to allocate a sense buffer in order to call scsi_normalize_sense() - Remove the existing conditional heap-allocation of the sense buffer] Signed-off-by: Ben Hutchings --- drivers/scsi/sr_ioctl.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) --- a/drivers/scsi/sr_ioctl.c +++ b/drivers/scsi/sr_ioctl.c @@ -188,30 +188,25 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack struct scsi_device *SDev; struct scsi_sense_hdr sshdr; int result, err = 0, retries = 0; - struct request_sense *sense = cgc->sense; + unsigned char sense_buffer[SCSI_SENSE_BUFFERSIZE]; SDev = cd->device; - if (!sense) { - sense = kmalloc(SCSI_SENSE_BUFFERSIZE, GFP_KERNEL); - if (!sense) { - err = -ENOMEM; - goto out; - } - } - retry: if (!scsi_block_when_processing_errors(SDev)) { err = -ENODEV; goto out; } - memset(sense, 0, sizeof(*sense)); + memset(sense_buffer, 0, sizeof(sense_buffer)); result = scsi_execute(SDev, cgc->cmd, cgc->data_direction, - cgc->buffer, cgc->buflen, (char *)sense, + cgc->buffer, cgc->buflen, sense_buffer, cgc->timeout, IOCTL_RETRIES, 0, NULL); - scsi_normalize_sense((char *)sense, sizeof(*sense), &sshdr); + scsi_normalize_sense(sense_buffer, 
sizeof(sense_buffer), &sshdr); + + if (cgc->sense) + memcpy(cgc->sense, sense_buffer, sizeof(*cgc->sense)); /* Minimal error checking. Ignore cases we know about, and report the rest. */ if (driver_byte(result) != 0) { @@ -268,8 +263,6 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack /* Wake up a process waiting for device */ out: - if (!cgc->sense) - kfree(sense); cgc->stat = err; return err; }
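The pattern used by the fix above (run the command into a full-size local buffer, then copy back only as many bytes as the caller's smaller structure holds) can be shown in isolation. This is a userspace sketch under stated assumptions: the sizes 96 and 64 come from the commit message, while `struct small_sense`, `fake_execute()` and `do_command()` are hypothetical stand-ins for `struct request_sense`, `scsi_execute()` and `sr_do_ioctl()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FULL_SENSE_SIZE  96	/* SCSI sense buffer size per the description */
#define SMALL_SENSE_SIZE 64	/* CDROM-layer sense size per the description */

/* Hypothetical stand-in for the smaller, caller-provided structure. */
struct small_sense {
	unsigned char bytes[SMALL_SENSE_SIZE];
};

/* Hypothetical low-level call that always writes FULL_SENSE_SIZE bytes. */
static void fake_execute(unsigned char *sense)
{
	memset(sense, 0xAB, FULL_SENSE_SIZE);
}

/*
 * Give the low-level call a buffer of the size it expects, then copy
 * back only what fits in the caller's structure (if one was passed).
 */
int do_command(struct small_sense *caller_sense)
{
	unsigned char sense_buffer[FULL_SENSE_SIZE];

	memset(sense_buffer, 0, sizeof(sense_buffer));
	fake_execute(sense_buffer);

	if (caller_sense) {
		assert(sizeof(*caller_sense) <= sizeof(sense_buffer));
		memcpy(caller_sense, sense_buffer, sizeof(*caller_sense));
	}
	return 0;
}
```

Passing the 64-byte structure directly to `fake_execute()` would write 32 bytes past its end, which is the overrun the patch removes.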
[PATCH 3.16 02/63] x86/fpu: Default eagerfpu if FPU and FXSR are enabled
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Ben Hutchings This is a limited version of commit 58122bf1d856 "x86/fpu: Default eagerfpu=on on all CPUs". That commit revealed bugs in the use of eagerfpu together with math emulation or without the FXSR feature. Although those bugs have been fixed upstream, the fixes do not seem to be practical to backport to 3.16. The security issue that motivates using eagerfpu (CVE-2018-3665) is an information leak through speculative execution, and most CPUs lacking the FXSR feature also don't implement speculative execution. The exceptions I am aware of are the Intel Pentium Pro and AMD K6 family, which will remain vulnerable to this issue. Move the eagerfpu variable and associated initialisation into fpu_init(), since xstate_enable_boot_cpu() won't be called at all if XSAVE is disabled. Signed-off-by: Ben Hutchings Cc: Andy Lutomirski Cc: x...@kernel.org --- --- a/arch/x86/kernel/xsave.c +++ b/arch/x86/kernel/xsave.c @@ -509,19 +509,6 @@ static void __init setup_init_fpu_buf(vo xsave_state(init_xstate_buf, -1); } -static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO; -static int __init eager_fpu_setup(char *s) -{ - if (!strcmp(s, "on")) - eagerfpu = ENABLE; - else if (!strcmp(s, "off")) - eagerfpu = DISABLE; - else if (!strcmp(s, "auto")) - eagerfpu = AUTO; - return 1; -} -__setup("eagerfpu=", eager_fpu_setup); - /* * Enable and initialize the xsave feature. 
*/ @@ -560,17 +547,11 @@ static void __init xstate_enable_boot_cp prepare_fx_sw_frame(); setup_init_fpu_buf(); - /* Auto enable eagerfpu for xsaveopt */ - if (cpu_has_xsaveopt && eagerfpu != DISABLE) - eagerfpu = ENABLE; - if (pcntxt_mask & XSTATE_EAGER) { - if (eagerfpu == DISABLE) { + if (!boot_cpu_has(X86_FEATURE_EAGER_FPU)) { pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n", pcntxt_mask & XSTATE_EAGER); pcntxt_mask &= ~XSTATE_EAGER; - } else { - eagerfpu = ENABLE; } } @@ -613,9 +594,6 @@ void eager_fpu_init(void) clear_used_math(); current_thread_info()->status = 0; - if (eagerfpu == ENABLE) - setup_force_cpu_cap(X86_FEATURE_EAGER_FPU); - if (!cpu_has_eager_fpu) { stts(); return; --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -159,6 +159,19 @@ static void init_thread_xstate(void) xstate_size = sizeof(struct i387_fsave_struct); } +static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO; +static int __init eager_fpu_setup(char *s) +{ + if (!strcmp(s, "on")) + eagerfpu = ENABLE; + else if (!strcmp(s, "off")) + eagerfpu = DISABLE; + else if (!strcmp(s, "auto")) + eagerfpu = AUTO; + return 1; +} +__setup("eagerfpu=", eager_fpu_setup); + /* * Called at bootup to set up the initial FPU state that is later cloned * into all processes. @@ -197,6 +210,17 @@ void fpu_init(void) if (xstate_size == 0) init_thread_xstate(); + /* +* We should always enable eagerfpu, but it doesn't work properly +* here without fpu and fxsr. +*/ + if (eagerfpu == AUTO) + eagerfpu = (boot_cpu_has(X86_FEATURE_FPU) && + boot_cpu_has(X86_FEATURE_FXSR)) ? + ENABLE : DISABLE; + if (eagerfpu == ENABLE) + setup_force_cpu_cap(X86_FEATURE_EAGER_FPU); + mxcsr_feature_mask_init(); xsave_init(); eager_fpu_init();
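The decision logic this patch moves into fpu_init() is small enough to restate as a standalone sketch: an explicit "on" or "off" wins, and "auto" resolves to enabled only when both FPU and FXSR are present. The code below is a userspace illustration, not the kernel code; `parse_eagerfpu()` and `resolve_eagerfpu()` are hypothetical names, and the two booleans stand in for the `boot_cpu_has()` checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum eager_mode { EAGER_AUTO, EAGER_ENABLE, EAGER_DISABLE };

/* Parse an "eagerfpu=" value; unknown strings leave the mode unchanged. */
enum eager_mode parse_eagerfpu(const char *s, enum eager_mode current)
{
	assert(s != NULL);
	if (!strcmp(s, "on"))
		return EAGER_ENABLE;
	if (!strcmp(s, "off"))
		return EAGER_DISABLE;
	if (!strcmp(s, "auto"))
		return EAGER_AUTO;
	return current;
}

/*
 * Resolve the final setting. AUTO becomes "enabled" only if the CPU
 * has both an FPU and FXSR; explicit choices are honoured as given.
 */
bool resolve_eagerfpu(enum eager_mode mode, bool has_fpu, bool has_fxsr)
{
	if (mode == EAGER_AUTO)
		mode = (has_fpu && has_fxsr) ? EAGER_ENABLE : EAGER_DISABLE;
	return mode == EAGER_ENABLE;
}
```

Note that, as in the patch, an explicit "on" is not overridden on a CPU without FXSR; only the automatic default takes the feature bits into account.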
[PATCH 3.16 09/63] usbip: usbip_host: run rebind from exit when module is removed
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: "Shuah Khan (Samsung OSG)" commit 7510df3f29d44685bab7b1918b61a8ccd57126a9 upstream. After removing usbip_host module, devices it releases are left without a driver. For example, when a keyboard or a mass storage device are bound to usbip_host when it is removed, these devices are no longer bound to any driver. Fix it to run device_attach() from the module exit routine to restore the devices to their original drivers. This includes cleanup changes and moving device_attach() code to a common routine to be called from rebind_store() and usbip_host_exit(). Signed-off-by: Shuah Khan (Samsung OSG) Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filenames] Signed-off-by: Ben Hutchings --- drivers/staging/usbip/stub_dev.c | 6 +--- drivers/staging/usbip/stub_main.c | 60 +-- 2 files changed, 52 insertions(+), 14 deletions(-) --- a/drivers/staging/usbip/stub_dev.c +++ b/drivers/staging/usbip/stub_dev.c @@ -490,12 +490,8 @@ static void stub_disconnect(struct usb_d busid_priv->sdev = NULL; stub_device_free(sdev); - if (busid_priv->status == STUB_BUSID_ALLOC) { + if (busid_priv->status == STUB_BUSID_ALLOC) busid_priv->status = STUB_BUSID_ADDED; - } else { - busid_priv->status = STUB_BUSID_OTHER; - del_match_busid((char *)udev_busid); - } } #ifdef CONFIG_PM --- a/drivers/staging/usbip/stub_main.c +++ b/drivers/staging/usbip/stub_main.c @@ -28,6 +28,7 @@ #define DRIVER_DESC "USB/IP Host Driver" struct kmem_cache *stub_priv_cache; + /* * busid_tables defines matching busids that usbip can grab. 
A user can change * dynamically what device is locally used and what device is exported to a @@ -188,6 +189,51 @@ static ssize_t store_match_busid(struct static DRIVER_ATTR(match_busid, S_IRUSR | S_IWUSR, show_match_busid, store_match_busid); +static int do_rebind(char *busid, struct bus_id_priv *busid_priv) +{ + int ret; + + /* device_attach() callers should hold parent lock for USB */ + if (busid_priv->udev->dev.parent) + device_lock(busid_priv->udev->dev.parent); + ret = device_attach(&busid_priv->udev->dev); + if (busid_priv->udev->dev.parent) + device_unlock(busid_priv->udev->dev.parent); + if (ret < 0) { + dev_err(&busid_priv->udev->dev, "rebind failed\n"); + return ret; + } + return 0; +} + +static void stub_device_rebind(void) +{ +#if IS_MODULE(CONFIG_USBIP_HOST) + struct bus_id_priv *busid_priv; + int i; + + /* update status to STUB_BUSID_OTHER so probe ignores the device */ + spin_lock(&busid_table_lock); + for (i = 0; i < MAX_BUSID; i++) { + if (busid_table[i].name[0] && + busid_table[i].shutdown_busid) { + busid_priv = &(busid_table[i]); + busid_priv->status = STUB_BUSID_OTHER; + } + } + spin_unlock(&busid_table_lock); + + /* now run rebind */ + for (i = 0; i < MAX_BUSID; i++) { + if (busid_table[i].name[0] && + busid_table[i].shutdown_busid) { + busid_priv = &(busid_table[i]); + do_rebind(busid_table[i].name, busid_priv); + } + } +#endif +} + static ssize_t rebind_store(struct device_driver *dev, const char *buf, size_t count) { @@ -208,16 +254,9 @@ static ssize_t rebind_store(struct devic /* mark the device for deletion so probe ignores it during rescan */ bid->status = STUB_BUSID_OTHER; - /* device_attach() callers should hold parent lock for USB */ - if (bid->udev->dev.parent) - device_lock(bid->udev->dev.parent); - ret = device_attach(&bid->udev->dev); - if (bid->udev->dev.parent) - device_unlock(bid->udev->dev.parent); - if (ret < 0) { - dev_err(&bid->udev->dev, "rebind failed\n"); + ret = do_rebind((char *) buf, bid); + if (ret < 0) return ret; - 
} /* delete device from busid_table */ del_match_busid((char *) buf); @@ -343,6 +382,9 @@ static void __exit usbip_host_exit(void) */ usb_deregister_device_driver(&stub_driver); + /* initiate scan to attach devices */ + stub_device_rebind(); + kmem_cache_destroy(stub_priv_cache); }
[PATCH 3.16 00/63] 3.16.58-rc1 review
This is the start of the stable review cycle for the 3.16.58 release. There are 63 patches in this series, which will be posted as responses to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Mon Sep 24 00:15:41 UTC 2018. Anything received after that time might be too late. All the patches have also been committed to the linux-3.16.y-rc branch of https://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-stable-rc.git . A shortlog and diffstat can be found below. Ben. - Alexander Potapenko (1): scsi: sg: allocate with __GFP_ZERO in sg_build_indirect() [a45b599ad808c3c982fdcdc12b0b8611c2f92824] Alexey Khoroshilov (1): usbip: fix error handling in stub_probe() [3ff67445750a84de67faaf52c6e1895cb09f2c56] Andy Lutomirski (1): x86/entry/64: Remove %ebx handling from error_entry/exit [b3681dd548d06deb2e1573890829dff4b15abf46] Ben Hutchings (2): Revert "vti4: Don't override MTU passed on link creation via IFLA_MTU" [not upstream; the reverted commit was correct for upstream] x86/fpu: Default eagerfpu if FPU and FXSR are enabled [58122bf1d856a4ea9581d62a07c557d997d46a19] Borislav Petkov (1): x86/cpu/AMD: Fix erratum 1076 (CPB bit) [f7f3dc00f61261cdc9ccd8b886f21bc4dffd6fd9] Christoph Paasch (1): net: Set sk_prot_creator when cloning sockets to the right proto [9d538fa60bad4f7b23193c89e843797a1cf71ef3] Cong Wang (1): infiniband: fix a possible use-after-free bug [cb2595c1393b4a5211534e6f0a0fbad369e21ad8] Dave Chinner (2): xfs: catch inode allocation state mismatch corruption [ee457001ed6c6f31ddad69c24c1da8f377d8472d] xfs: validate cached inodes are free when allocated [afca6c5b2595fc44383919fba740c194b0b76aff] Eric Sandeen (2): xfs: don't call xfs_da_shrink_inode with NULL bp [bb3d48dcf86a97dc25fe9fc2c11938e19cb4399a] xfs: set format back to extents if xfs_bmap_extents_to_btree [2c4306f719b083d17df2963bc761777576b8ad1b] Ernesto A . 
Fernández (1): hfsplus: fix NULL dereference in hfsplus_lookup() [a7ec7a4193a2eb3b5341243fc0b621c1ac9e4ec4] Ingo Molnar (2): x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT [d364a7656c1855c940dfa4baf4ebcc3c6a9e6fd2] x86/speculation: Clean up various Spectre related details [21e433bdb95bdf3aa48226fd3d33af608437f293] Jann Horn (1): USB: yurex: fix out-of-bounds uaccess in read handler [f1e255d60ae66a9f672ff9a207ee6cd8e33d2679] Jason Yan (1): scsi: libsas: defer ata device eh commands to libata [318aaf34f1179b39fa9c30fa0f3288b645beee39] Jens Axboe (1): sr: pass down correctly sized SCSI sense buffer [f7068114d45ec55996b9040e98111afa56e010fe] Jiri Kosina (1): x86/speculation: Protect against userspace-userspace spectreRSB [fdf82a7856b32d905c39afc85e34364491e46346] Kees Cook (5): seccomp: add "seccomp" syscall [48dc92b9fc3926844257316e75ba11eb5c742b2c] seccomp: create internal mode-setting function [d78ab02c2c194257a03355fbb79eb721b381d105] seccomp: extract check/assign mode helpers [1f41b450416e689b9b7c8bfb750a98604f687a9b] seccomp: split mode setting routines [3b23dd12846215eff4afb073366b80c0c4d7543e] video: uvesafb: Fix integer overflow in allocation [9f645bcc566a1e9f921bdae7528a01ced5bc3713] Kyle Huey (2): x86/process: Correct and optimize TIF_BLOCKSTEP switch [b9894a2f5bd18b1691cb6872c9afe32b148d0132] x86/process: Optimize TIF checks in __switch_to_xtra() [af8b3cd3934ec60f4c2a420d19a9d416554f140b] Linus Torvalds (2): Fix up non-directory creation in SGID directories [0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7] mm: get rid of vmacache_flush_all() entirely [7a9cdebdcc17e426fb5287e4a82db1dfe86339b2] Mark Salyzyn (1): Bluetooth: hidp: buffer overflow in hidp_process_report [7992c18810e568b95c869b227137a2215702a805] Mel Gorman (2): futex: Remove requirement for lock_page() in get_futex_key() [65d8fc777f6dcfee12785c057a6b57f679641c90] futex: Remove unnecessary warning from get_futex_key [48fb6f4db940e92cfb16cd878cddd59ea6120d06] Nadav Amit (1): 
KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR [e37a75a13cdae5deaa2ea2cbf8d55b5dd08638b6] Paolo Bonzini (4): KVM: x86: introduce linear_{read,write}_system [79367a65743975e5cac8d24d08eccc7fdae832b0] KVM: x86: introduce num_emulated_msrs [62ef68bb4d00f1a662e487f3fc44ce8521c416aa] KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system [ce14e868a54edeb2e30cb7a7b104a2fc4b9d76ca] kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access [3c9fa24ca7c9c4
[PATCH 3.16 11/63] usbip: usbip_host: fix bad unlock balance during stub_probe()
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: "Shuah Khan (Samsung OSG)"

commit c171654caa875919be3c533d3518da8be5be966e upstream.

stub_probe() calls put_busid_priv() in an error path when the device isn't
found in the busid_table. Fix it by making put_busid_priv() safe to be
called with a null struct bus_id_priv pointer.

This problem happens when "usbip bind" is run without loading the
usbip_host driver and then running modprobe. The first failed bind attempt
unbinds the device from the original driver, and when usbip_host is
modprobed, stub_probe() runs, doesn't find the device in its busid table,
and calls put_busid_priv() with a null bus_id_priv pointer.

usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip!

[ 367.359679] =
[ 367.359681] WARNING: bad unlock balance detected!
[ 367.359683] 4.17.0-rc4+ #5 Not tainted
[ 367.359685] -
[ 367.359688] modprobe/2768 is trying to release lock (
[ 367.359689] ==
[ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110
[ 367.359699] Read of size 8 at addr 0058 by task modprobe/2768
[ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5

Fixes: 22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus
Signed-off-by: Shuah Khan (Samsung OSG)
Signed-off-by: Greg Kroah-Hartman
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings
---
 drivers/staging/usbip/stub_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/staging/usbip/stub_main.c
+++ b/drivers/staging/usbip/stub_main.c
@@ -96,7 +96,8 @@ struct bus_id_priv *get_busid_priv(const
 
 void put_busid_priv(struct bus_id_priv *bid)
 {
-	spin_unlock(&bid->busid_lock);
+	if (bid)
+		spin_unlock(&bid->busid_lock);
 }
 
 static int add_match_busid(char *busid)
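The fix above makes the release helper tolerate the "lookup failed" result of its matching lookup helper. A minimal user-space sketch of that get/put pairing follows; struct entry, get_entry() and put_entry() are hypothetical stand-ins, and a plain integer flag models busid_lock rather than a real spinlock.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bus_id_priv; `locked` models busid_lock. */
struct entry {
	int locked;
};

static struct entry table[2];

/* Returns the entry with its lock held, or NULL when idx is not in the table. */
static struct entry *get_entry(size_t idx)
{
	if (idx >= sizeof(table) / sizeof(table[0]))
		return NULL;
	table[idx].locked = 1;
	return &table[idx];
}

/* Safe to call with NULL, so error paths can release unconditionally. */
static void put_entry(struct entry *e)
{
	if (e)
		e->locked = 0;
}
```

With the NULL check inside the release helper, a caller's common exit path can call it without first testing whether the lookup succeeded.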
[PATCH 3.16 04/63] net: Set sk_prot_creator when cloning sockets to the right proto
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Christoph Paasch

commit 9d538fa60bad4f7b23193c89e843797a1cf71ef3 upstream.

sk->sk_prot and sk->sk_prot_creator can differ when the app uses
IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one).
That is why sk_prot_creator exists: it makes sure that sk_prot_free()
does the kmem_cache_free() on the right kmem_cache slab.

Now, if such a socket gets transformed back to a listening socket (using
connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through
sk_clone_lock() when a new connection comes in. But sk_prot_creator will
still point to the IPv6 kmem_cache (as everything got copied in
sk_clone_lock()). When freeing, we will thus put this memory back into the
IPv6 kmem_cache although it was allocated in the IPv4 cache. I have seen
memory corruption happening because of this.

With slub-debugging and MEMCG_KMEM enabled this gives the warning
"cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP"

A C-program to trigger this:

#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP);
	int new_fd, newest_fd, client_fd;
	struct sockaddr_in6 bind_addr;
	struct sockaddr_in bind_addr4, client_addr1, client_addr2;
	struct sockaddr unsp;
	int val;

	memset(&bind_addr, 0, sizeof(bind_addr));
	bind_addr.sin6_family = AF_INET6;
	bind_addr.sin6_port = ntohs(42424);

	memset(&client_addr1, 0, sizeof(client_addr1));
	client_addr1.sin_family = AF_INET;
	client_addr1.sin_port = ntohs(42424);
	client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1");

	memset(&client_addr2, 0, sizeof(client_addr2));
	client_addr2.sin_family = AF_INET;
	client_addr2.sin_port = ntohs(42421);
	client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1");

	memset(&unsp, 0, sizeof(unsp));
	unsp.sa_family = AF_UNSPEC;

	/* Listen on the IPv6 socket and accept one IPv4 client. */
	bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr));

	listen(fd, 5);

	client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1));
	new_fd = accept(fd, NULL, NULL);
	close(fd);

	/* Transform the accepted IPv6 socket into an IPv4 one. */
	val = AF_INET;
	setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val));

	/* connect() with AF_UNSPEC, so the socket can listen again. */
	connect(new_fd, &unsp, sizeof(unsp));

	memset(&bind_addr4, 0, sizeof(bind_addr4));
	bind_addr4.sin_family = AF_INET;
	bind_addr4.sin_port = ntohs(42421);
	bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4));

	listen(new_fd, 5);

	/* The socket accepted here is the one cloned by sk_clone_lock(). */
	client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2));

	newest_fd = accept(new_fd, NULL, NULL);
	close(new_fd);

	close(client_fd);
	close(new_fd);

	return 0;
}

As far as I can see, this bug has been there since the beginning of the
git-days.

Signed-off-by: Christoph Paasch
Reviewed-by: Eric Dumazet
Signed-off-by: David S. Miller
Cc: Thomas King
Signed-off-by: Ben Hutchings
---
 net/core/sock.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1512,6 +1512,8 @@ struct sock *sk_clone_lock(const struct
 
 		sock_copy(newsk, sk);
 
+		newsk->sk_prot_creator = sk->sk_prot;
+
 		/* SANITY */
 		get_net(sock_net(newsk));
 		sk_node_init(&newsk->sk_node);
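The bookkeeping problem described above can be modelled without any networking. The sketch below is a user-space analogy only: struct cache counts live objects instead of managing slabs, and obj_alloc(), obj_clone() and obj_free() are hypothetical names. It shows why a clone must record the cache it was really allocated from, rather than inherit the parent's copied value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical slab-cache stand-in that only counts live objects. */
struct cache {
	int live;
};

struct obj {
	struct cache *prot;     /* cache matching the object's current protocol */
	struct cache *creator;  /* cache the memory was actually taken from */
};

static struct obj *obj_alloc(struct cache *c)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	c->live++;
	o->prot = c;
	o->creator = c;
	return o;
}

/* The clone is allocated from the parent's *current* protocol cache. */
static struct obj *obj_clone(const struct obj *parent)
{
	struct obj *o = obj_alloc(parent->prot);

	if (!o)
		return NULL;
	memcpy(o, parent, sizeof(*o));  /* copies the parent's stale creator too */
	o->creator = parent->prot;      /* the fix: record the real origin */
	return o;
}

/* Freeing always charges the cache named by creator. */
static void obj_free(struct obj *o)
{
	o->creator->live--;
	free(o);
}
```

Without the assignment after memcpy(), freeing the clone would decrement the parent's original cache, which is the mismatch the slub-debugging warning reports.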
[PATCH 3.16 17/63] kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Paolo Bonzini commit 3c9fa24ca7c9c47605672916491f79e8ccacb9e6 upstream. The functions that were used in the emulation of fxrstor, fxsave, sgdt and sidt were originally meant for task switching, and as such they did not check privilege levels. This is very bad when the same functions are used in the emulation of unprivileged instructions. This is CVE-2018-10853. The obvious fix is to add a new argument to ops->read_std and ops->write_std, which decides whether the access is a "system" access or should use the processor's CPL. Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12) Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: Drop change in handle_ud()] Signed-off-by: Ben Hutchings --- --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -104,11 +104,12 @@ struct x86_emulate_ops { * @addr: [IN ] Linear address from which to read. * @val: [OUT] Value read from memory, zero-extended to 'u_long'. * @bytes: [IN ] Number of bytes to read from memory. +* @system:[IN ] Whether the access is forced to be at CPL0. */ int (*read_std)(struct x86_emulate_ctxt *ctxt, unsigned long addr, void *val, unsigned int bytes, - struct x86_exception *fault); + struct x86_exception *fault, bool system); /* * write_std: Write bytes of standard (non-emulated/special) memory. @@ -116,10 +117,11 @@ struct x86_emulate_ops { * @addr: [IN ] Linear address to which to write. * @val: [OUT] Value write to memory, zero-extended to 'u_long'. * @bytes: [IN ] Number of bytes to write to memory. +* @system:[IN ] Whether the access is forced to be at CPL0. */ int (*write_std)(struct x86_emulate_ctxt *ctxt, unsigned long addr, void *val, unsigned int bytes, -struct x86_exception *fault); +struct x86_exception *fault, bool system); /* * fetch: Read bytes of standard (non-emulated/special) memory. *Used for instruction fetch. 
--- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -734,14 +734,14 @@ static int linearize(struct x86_emulate_ static int linear_read_system(struct x86_emulate_ctxt *ctxt, ulong linear, void *data, unsigned size) { - return ctxt->ops->read_std(ctxt, linear, data, size, &ctxt->exception); + return ctxt->ops->read_std(ctxt, linear, data, size, &ctxt->exception, true); } static int linear_write_system(struct x86_emulate_ctxt *ctxt, ulong linear, void *data, unsigned int size) { - return ctxt->ops->write_std(ctxt, linear, data, size, &ctxt->exception); + return ctxt->ops->write_std(ctxt, linear, data, size, &ctxt->exception, true); } static int segmented_read_std(struct x86_emulate_ctxt *ctxt, @@ -755,7 +755,7 @@ static int segmented_read_std(struct x86 rc = linearize(ctxt, addr, size, false, &linear); if (rc != X86EMUL_CONTINUE) return rc; - return ctxt->ops->read_std(ctxt, linear, data, size, &ctxt->exception); + return ctxt->ops->read_std(ctxt, linear, data, size, &ctxt->exception, false); } static int segmented_write_std(struct x86_emulate_ctxt *ctxt, @@ -769,7 +769,7 @@ static int segmented_write_std(struct x8 rc = linearize(ctxt, addr, size, true, &linear); if (rc != X86EMUL_CONTINUE) return rc; - return ctxt->ops->write_std(ctxt, linear, data, size, &ctxt->exception); + return ctxt->ops->write_std(ctxt, linear, data, size, &ctxt->exception, false); } /* @@ -2472,12 +2472,12 @@ static bool emulator_io_port_access_allo #ifdef CONFIG_X86_64 base |= ((u64)base3) << 32; #endif - r = ops->read_std(ctxt, base + 102, &io_bitmap_ptr, 2, NULL); + r = ops->read_std(ctxt, base + 102, &io_bitmap_ptr, 2, NULL, true); if (r != X86EMUL_CONTINUE) return false; if (io_bitmap_ptr + port/8 > desc_limit_scaled(&tr_seg)) return false; - r = ops->read_std(ctxt, base + io_bitmap_ptr + port/8, &perm, 2, NULL); + r = ops->read_std(ctxt, base + io_bitmap_ptr + port/8, &perm, 2, NULL, true); if (r != X86EMUL_CONTINUE) return false; if ((perm >> bit_idx) & mask) --- 
a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4191,10 +4191,15 @@ EXPORT_SYMBOL_GPL(kvm_read_guest_virt); static int emulator_read_std(struct x86_emulate_ctxt *ctxt, gva_t addr, void *val, unsigned int bytes, -struct x86_exception *exception) +struct x86_e
[PATCH 3.16 48/63] video: uvesafb: Fix integer overflow in allocation
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Kees Cook

commit 9f645bcc566a1e9f921bdae7528a01ced5bc3713 upstream.

cmap->len can get close to INT_MAX/2, allowing for an integer overflow in
allocation. This uses kmalloc_array() instead to catch the condition.

Reported-by: Dr Silvio Cesare of InfoSect
Fixes: 8bdb3a2d7df48 ("uvesafb: the driver core")
Signed-off-by: Kees Cook
Signed-off-by: Ben Hutchings
---
 drivers/video/fbdev/uvesafb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/video/fbdev/uvesafb.c
+++ b/drivers/video/fbdev/uvesafb.c
@@ -1059,7 +1059,8 @@ static int uvesafb_setcmap(struct fb_cma
 		    info->cmap.len || cmap->start < info->cmap.start)
 			return -EINVAL;
 
-		entries = kmalloc(sizeof(*entries) * cmap->len, GFP_KERNEL);
+		entries = kmalloc_array(cmap->len, sizeof(*entries),
+					GFP_KERNEL);
 		if (!entries)
 			return -ENOMEM;
 
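The point of switching to kmalloc_array() is that the count-times-size multiplication is checked before any memory is requested. A minimal user-space sketch of the same check follows; alloc_array() is a hypothetical helper built on malloc(), not the kernel function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space analogue of kmalloc_array(): returns NULL
 * instead of allocating a too-small buffer when n * size would wrap. */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

The division form of the test avoids performing the overflowing multiplication in the first place, so the caller sees an ordinary allocation failure.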
[PATCH 3.16 54/63] seccomp: create internal mode-setting function
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Kees Cook commit d78ab02c2c194257a03355fbb79eb721b381d105 upstream. In preparation for having other callers of the seccomp mode setting logic, split the prctl entry point away from the core logic that performs seccomp mode setting. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski Signed-off-by: Ben Hutchings --- kernel/seccomp.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -473,7 +473,7 @@ long prctl_get_seccomp(void) } /** - * prctl_set_seccomp: configures current->seccomp.mode + * seccomp_set_mode: internal function for setting seccomp mode * @seccomp_mode: requested mode to use * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER * @@ -486,7 +486,7 @@ long prctl_get_seccomp(void) * * Returns 0 on success or -EINVAL on failure. */ -long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter) +static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter) { long ret = -EINVAL; @@ -517,3 +517,15 @@ long prctl_set_seccomp(unsigned long sec out: return ret; } + +/** + * prctl_set_seccomp: configures current->seccomp.mode + * @seccomp_mode: requested mode to use + * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER + * + * Returns 0 on success or -EINVAL on failure. + */ +long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter) +{ + return seccomp_set_mode(seccomp_mode, filter); +}
Re: Code of Conduct: Let's revamp it.
On Fri, Sep 21, 2018 at 8:05 PM Joey Pabalinas wrote:
>
> On Fri, Sep 21, 2018 at 07:31:05PM -0400, jonsm...@gmail.com wrote:
> > On Fri, Sep 21, 2018 at 7:17 PM Theodore Y. Ts'o wrote:
> > >
> > > People can decide who they want to respond to, but I'm going to gently
> > > suggest that before people think about responding to a particular
> > > e-mail, that they do a quick check using "git log
> > > --author=xy...@example.com"
> > > then decide how much someone appears to be a member of the community
> > > before deciding how and whether their thoughts are relevant.
> >
> > How does this part apply to email addresses used to commit code?
> >
> > * Publishing others’ private information, such as a physical or electronic
> > address, without explicit permission
> >
> > It appears to me that this would conflict with the GPL since the GPL
> > granted the right to distribute (or even print it in a book) Linux and
> > Linux contains email addresses. This also seems contradictory with
> > the Reply button I used to send this email.
>
> I don't really think email addresses used in patches which are sent,
> voluntarily, to a public mailing list are something you can sanely
> consider "private information".
>
> > How do you reconcile working on a public project while keeping email
> > address secret?
>
> This is a little more delicate, and I admit that I can't really
> think of any real solutions for this part...

I would propose adding a statement to clarify that Linux is a public
project and because of this things like names and email addresses of
people working on the project are public information. I don't see how
any other position is viable since it appears to be a GPL conflict.

But... if this bothers you, simply don't use your private, personal
email address when working on the kernel. Anyone with the skills to
work on the kernel should know enough to be able to create email
aliases. No rule says you have to use your real name either. Aliases
have been used in the past; search through the logs and you will find a
few commits from 'anonymous'. However, commits from anonymous sources
have to go through extra layers of review since the identity of the
contributor and their reputation is unknown.

If someone is using an email alias and has their hidden, true address
published then that would be a CoC violation. Although if you mess up
and submit a patch using your hidden identity, you just published it
and it is no longer hidden.

>
> --
> Cheers,
> Joey Pabalinas

--
Jon Smirl
jonsm...@gmail.com
[PATCH 3.16 61/63] x86/cpu/intel: Add Knights Mill to Intel family
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Piotr Luc

commit 0047f59834e5947d45f34f5f12eb330d158f700b upstream.

Add CPUID of Knights Mill (KNM) processor to Intel family list.

Signed-off-by: Piotr Luc
Reviewed-by: Dave Hansen
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Denys Vlasenko
Cc: H. Peter Anvin
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20161012180520.30976-1-piotr@intel.com
Signed-off-by: Ingo Molnar
Signed-off-by: Ben Hutchings
---
 arch/x86/include/asm/intel-family.h | 1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/include/asm/intel-family.h
+++ b/arch/x86/include/asm/intel-family.h
@@ -67,5 +67,6 @@
 /* Xeon Phi */
 
 #define INTEL_FAM6_XEON_PHI_KNL		0x57 /* Knights Landing */
+#define INTEL_FAM6_XEON_PHI_KNM		0x85 /* Knights Mill */
 
 #endif /* _ASM_X86_INTEL_FAMILY_H */
[PATCH 3.16 56/63] seccomp: split mode setting routines
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Kees Cook commit 3b23dd12846215eff4afb073366b80c0c4d7543e upstream. Separates the two mode setting paths to make things more readable with fewer #ifdefs within function bodies. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski Signed-off-by: Ben Hutchings --- kernel/seccomp.c | 71 1 file changed, 48 insertions(+), 23 deletions(-) --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -489,48 +489,66 @@ long prctl_get_seccomp(void) } /** - * seccomp_set_mode: internal function for setting seccomp mode - * @seccomp_mode: requested mode to use - * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER - * - * This function may be called repeatedly with a @seccomp_mode of - * SECCOMP_MODE_FILTER to install additional filters. Every filter - * successfully installed will be evaluated (in reverse order) for each system - * call the task makes. + * seccomp_set_mode_strict: internal function for setting strict seccomp * * Once current->seccomp.mode is non-zero, it may not be changed. * * Returns 0 on success or -EINVAL on failure. */ -static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter) +static long seccomp_set_mode_strict(void) { + const unsigned long seccomp_mode = SECCOMP_MODE_STRICT; long ret = -EINVAL; if (!seccomp_may_assign_mode(seccomp_mode)) goto out; - switch (seccomp_mode) { - case SECCOMP_MODE_STRICT: - ret = 0; #ifdef TIF_NOTSC - disable_TSC(); + disable_TSC(); #endif - break; + seccomp_assign_mode(seccomp_mode); + ret = 0; + +out: + + return ret; +} + #ifdef CONFIG_SECCOMP_FILTER - case SECCOMP_MODE_FILTER: - ret = seccomp_attach_user_filter(filter); - if (ret) - goto out; - break; -#endif - default: +/** + * seccomp_set_mode_filter: internal function for setting seccomp filter + * @filter: struct sock_fprog containing filter + * + * This function may be called repeatedly to install additional filters. 
+ * Every filter successfully installed will be evaluated (in reverse order) + * for each system call the task makes. + * + * Once current->seccomp.mode is non-zero, it may not be changed. + * + * Returns 0 on success or -EINVAL on failure. + */ +static long seccomp_set_mode_filter(char __user *filter) +{ + const unsigned long seccomp_mode = SECCOMP_MODE_FILTER; + long ret = -EINVAL; + + if (!seccomp_may_assign_mode(seccomp_mode)) + goto out; + + ret = seccomp_attach_user_filter(filter); + if (ret) goto out; - } seccomp_assign_mode(seccomp_mode); out: return ret; } +#else +static inline long seccomp_set_mode_filter(char __user *filter) +{ + return -EINVAL; +} +#endif /** * prctl_set_seccomp: configures current->seccomp.mode @@ -541,5 +559,12 @@ out: */ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter) { - return seccomp_set_mode(seccomp_mode, filter); + switch (seccomp_mode) { + case SECCOMP_MODE_STRICT: + return seccomp_set_mode_strict(); + case SECCOMP_MODE_FILTER: + return seccomp_set_mode_filter(filter); + default: + return -EINVAL; + } }
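The shape the patch arrives at is one small setter per mode plus an entry point that only dispatches. The sketch below models that in user space under stated assumptions: current_mode stands in for current->seccomp.mode, may_assign_mode() is a guess at the check implied by "once current->seccomp.mode is non-zero, it may not be changed" combined with filters being installable repeatedly, and a NULL filter stands in for a failed attach.

```c
#include <errno.h>
#include <stddef.h>

enum { MODE_DISABLED = 0, MODE_STRICT = 1, MODE_FILTER = 2 };

static int current_mode = MODE_DISABLED;  /* models current->seccomp.mode */
static int filters_installed;

/* Assumed rule: a mode may be set if none is set yet or it is unchanged. */
static int may_assign_mode(int mode)
{
	return current_mode == MODE_DISABLED || current_mode == mode;
}

static long set_mode_strict(void)
{
	if (!may_assign_mode(MODE_STRICT))
		return -EINVAL;
	current_mode = MODE_STRICT;
	return 0;
}

static long set_mode_filter(const void *filter)
{
	if (!may_assign_mode(MODE_FILTER))
		return -EINVAL;
	if (!filter)  /* stand-in for a failed filter attach */
		return -EINVAL;
	filters_installed++;
	current_mode = MODE_FILTER;
	return 0;
}

/* Entry point: no mode-specific logic, only dispatch. */
static long set_mode(int mode, const void *filter)
{
	switch (mode) {
	case MODE_STRICT:
		return set_mode_strict();
	case MODE_FILTER:
		return set_mode_filter(filter);
	default:
		return -EINVAL;
	}
}
```

Splitting the setters this way keeps each path free of the other's conditional compilation, which is the readability gain the commit message mentions.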
[PATCH 3.16 14/63] KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Nadav Amit commit e37a75a13cdae5deaa2ea2cbf8d55b5dd08638b6 upstream. The current implementation ignores the LDTR/TR base high 32-bits on long-mode. As a result the loaded segment descriptor may be incorrect. Signed-off-by: Nadav Amit Signed-off-by: Paolo Bonzini [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings --- arch/x86/kvm/emulate.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1467,6 +1467,7 @@ static int __load_segment_descriptor(str ulong desc_addr; int ret; u16 dummy; + u32 base3 = 0; memset(&seg_desc, 0, sizeof seg_desc); @@ -1597,9 +1598,14 @@ static int __load_segment_descriptor(str ret = write_segment_descriptor(ctxt, selector, &seg_desc); if (ret != X86EMUL_CONTINUE) return ret; + } else if (ctxt->mode == X86EMUL_MODE_PROT64) { + ret = ctxt->ops->read_std(ctxt, desc_addr+8, &base3, + sizeof(base3), &ctxt->exception); + if (ret != X86EMUL_CONTINUE) + return ret; } load: - ctxt->ops->set_segment(ctxt, selector, &seg_desc, 0, seg); + ctxt->ops->set_segment(ctxt, selector, &seg_desc, base3, seg); if (desc) *desc = seg_desc; return X86EMUL_CONTINUE;
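In long mode an LDT or TSS descriptor is 16 bytes wide, and the extra dword at offset 8 holds bits 63:32 of the base; that is the value the patch reads into base3. A small sketch of how the full base is assembled from the raw descriptor bytes follows. The function names are hypothetical, and the byte offsets are the standard x86 system-descriptor layout rather than anything taken from the KVM sources.

```c
#include <stdint.h>

/* Low 32 bits of the base, scattered across the first 8 descriptor bytes:
 * base[15:0] in bytes 2..3, base[23:16] in byte 4, base[31:24] in byte 7. */
static uint32_t desc_base_low(const uint8_t *d)
{
	return (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
	       ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
}

/* Full base of a 16-byte long-mode LDT/TSS descriptor: bytes 8..11
 * carry base[63:32] in little-endian order. */
static uint64_t desc_base64(const uint8_t *d)
{
	uint32_t base3 = (uint32_t)d[8] | ((uint32_t)d[9] << 8) |
			 ((uint32_t)d[10] << 16) | ((uint32_t)d[11] << 24);

	return (uint64_t)desc_base_low(d) | ((uint64_t)base3 << 32);
}
```

Dropping base3, as the code did before the patch, silently truncates any base above 4 GiB to its low 32 bits.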
[PATCH 3.16 05/63] usbip: fix error handling in stub_probe()
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Alexey Khoroshilov commit 3ff67445750a84de67faaf52c6e1895cb09f2c56 upstream. If usb_hub_claim_port() fails, no resources are deallocated and if stub_add_files() fails, port is not released. The patch fixes these issues and rearranges error handling code. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Acked-by: Valentina Manea Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings --- drivers/staging/usbip/stub_dev.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) --- a/drivers/staging/usbip/stub_dev.c +++ b/drivers/staging/usbip/stub_dev.c @@ -340,7 +340,6 @@ static int stub_probe(struct usb_device { struct stub_device *sdev = NULL; const char *udev_busid = dev_name(&udev->dev); - int err = 0; struct bus_id_priv *busid_priv; int rc; @@ -401,23 +400,28 @@ static int stub_probe(struct usb_device (struct usb_dev_state *) udev); if (rc) { dev_dbg(&udev->dev, "unable to claim port\n"); - return rc; + goto err_port; } - err = stub_add_files(&udev->dev); - if (err) { + rc = stub_add_files(&udev->dev); + if (rc) { dev_err(&udev->dev, "stub_add_files for %s\n", udev_busid); - dev_set_drvdata(&udev->dev, NULL); - usb_put_dev(udev); - kthread_stop_put(sdev->ud.eh); - - busid_priv->sdev = NULL; - stub_device_free(sdev); - return err; + goto err_files; } busid_priv->status = STUB_BUSID_ALLOC; return 0; +err_files: + usb_hub_release_port(udev->parent, udev->portnum, +(struct usb_dev_state *) udev); +err_port: + dev_set_drvdata(&udev->dev, NULL); + usb_put_dev(udev); + kthread_stop_put(sdev->ud.eh); + + busid_priv->sdev = NULL; + stub_device_free(sdev); + return rc; } static void shutdown_busid(struct bus_id_priv *busid_priv)
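The rearranged error handling above is the usual goto-unwind ladder: each label undoes one acquisition, and labels are ordered so that falling through releases everything acquired before the failure point. A minimal sketch follows; struct dev_state, probe() and the fail_at parameter are hypothetical, with flags standing in for the real resources so the unwinding can be observed.

```c
/* Hypothetical resources tracked as flags so the unwinding is observable. */
struct dev_state {
	int allocated;
	int port_claimed;
	int files_added;
};

/* fail_at: 0 = succeed, 1 = fail while claiming the port,
 * 2 = fail while adding the files. */
static int probe(struct dev_state *s, int fail_at)
{
	int rc;

	s->allocated = 1;

	rc = (fail_at == 1) ? -1 : 0;  /* stand-in for claiming the port */
	if (rc)
		goto err_port;
	s->port_claimed = 1;

	rc = (fail_at == 2) ? -2 : 0;  /* stand-in for adding the files */
	if (rc)
		goto err_files;
	s->files_added = 1;
	return 0;

err_files:
	s->port_claimed = 0;           /* release the port */
err_port:
	s->allocated = 0;              /* free the device */
	return rc;
}
```

Using a single rc variable for every step, as the patch does by dropping err, means the unwind path always returns the code of the step that failed.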
[PATCH 3.16 12/63] futex: Remove requirement for lock_page() in get_futex_key()
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Mel Gorman commit 65d8fc777f6dcfee12785c057a6b57f679641c90 upstream. When dealing with key handling for shared futexes, we can drastically reduce the usage/need of the page lock. 1) For anonymous pages, the associated futex object is the mm_struct which does not require the page lock. 2) For inode based, keys, we can check under RCU read lock if the page mapping is still valid and take reference to the inode. This just leaves one rare race that requires the page lock in the slow path when examining the swapcache. Additionally realtime users currently have a problem with the page lock being contended for unbounded periods of time during futex operations. Task A get_futex_key() lock_page() ---> preempted Now any other task trying to lock that page will have to wait until task A gets scheduled back in, which is an unbound time. With this patch, we pretty much have a lockless futex_get_key(). Experiments show that this patch can boost/speedup the hashing of shared futexes with the perf futex benchmarks (which is good for measuring such change) by up to 45% when there are high (> 100) thread counts on a 60 core Westmere. Lower counts are pretty much in the noise range or less than 10%, but mid range can be seen at over 30% overall throughput (hash ops/sec). This makes anon-mem shared futexes much closer to its private counterpart. Signed-off-by: Mel Gorman [ Ported on top of thp refcount rework, changelog, comments, fixes. 
] Signed-off-by: Davidlohr Bueso Reviewed-by: Thomas Gleixner Cc: Chris Mason Cc: Darren Hart Cc: Hugh Dickins Cc: Linus Torvalds Cc: Mel Gorman Cc: Peter Zijlstra Cc: Sebastian Andrzej Siewior Cc: d...@stgolabs.net Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-d...@stgolabs.net Signed-off-by: Ingo Molnar Signed-off-by: Chenbo Feng Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: s/READ_ONCE/ACCESS_ONCE/] Signed-off-by: Ben Hutchings --- kernel/futex.c | 98 ++ 1 file changed, 91 insertions(+), 7 deletions(-) --- a/kernel/futex.c +++ b/kernel/futex.c @@ -394,6 +394,7 @@ get_futex_key(u32 __user *uaddr, int fsh unsigned long address = (unsigned long)uaddr; struct mm_struct *mm = current->mm; struct page *page, *page_head; + struct address_space *mapping; int err, ro = 0; /* @@ -472,7 +473,19 @@ again: } #endif - lock_page(page_head); + /* +* The treatment of mapping from this point on is critical. The page +* lock protects many things but in this context the page lock +* stabilizes mapping, prevents inode freeing in the shared +* file-backed region case and guards against movement to swap cache. +* +* Strictly speaking the page lock is not needed in all cases being +* considered here and page lock forces unnecessarily serialization +* From this point on, mapping will be re-verified if necessary and +* page lock will be acquired only if it is unavoidable +*/ + + mapping = ACCESS_ONCE(page_head->mapping); /* * If page_head->mapping is NULL, then it cannot be a PageAnon @@ -489,18 +502,31 @@ again: * shmem_writepage move it from filecache to swapcache beneath us: * an unlikely race, but we do need to retry for page_head->mapping. */ - if (!page_head->mapping) { - int shmem_swizzled = PageSwapCache(page_head); + if (unlikely(!mapping)) { + int shmem_swizzled; + + /* +* Page lock is required to identify which special case above +* applies. If this is really a shmem page then the page lock +* will prevent unexpected transitions. 
+*/ + lock_page(page); + shmem_swizzled = PageSwapCache(page) || page->mapping; unlock_page(page_head); put_page(page_head); + if (shmem_swizzled) goto again; + return -EFAULT; } /* * Private mappings are handled in a simple way. * +* If the futex key is stored on an anonymous page, then the associated +* object is the mm which is implicitly pinned by the calling process. +* * NOTE: When userspace waits on a MAP_SHARED mapping, even if * it's a read-only handle, it's expected that futexes attach to * the object not the particular process. @@ -518,16 +544,74 @@ again: key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */ key->private.mm = mm; key->private.address = address; + + get_futex_key_refs(key); /* implies smp_mb(); (B) */ + } else { + struct inode *
[PATCH 3.16 62/63] KVM: x86: introduce num_emulated_msrs
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Paolo Bonzini commit 62ef68bb4d00f1a662e487f3fc44ce8521c416aa upstream. We will want to filter away MSR_IA32_SMBASE from the emulated_msrs if the host CPU does not support SMM virtualization. Introduce the logic to do that, and also move paravirt MSRs to emulated_msrs for simplicity and to get rid of KVM_SAVE_MSRS_BEGIN. Reviewed-by: Radim Krčmář Signed-off-by: Paolo Bonzini Signed-off-by: Ben Hutchings --- arch/x86/kvm/x86.c | 40 +++- 1 file changed, 27 insertions(+), 13 deletions(-) --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -876,17 +876,11 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc); * * This list is modified at module load time to reflect the * capabilities of the host cpu. This capabilities test skips MSRs that are - * kvm-specific. Those are put in the beginning of the list. + * kvm-specific. Those are put in emulated_msrs; filtering of emulated_msrs + * may depend on host virtualization features rather than host cpu features. 
*/ -#define KVM_SAVE_MSRS_BEGIN12 static u32 msrs_to_save[] = { - MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, - MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, - HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, - HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC, - HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, - MSR_KVM_PV_EOI_EN, MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP, MSR_STAR, #ifdef CONFIG_X86_64 @@ -899,7 +893,14 @@ static u32 msrs_to_save[] = { static unsigned num_msrs_to_save; -static const u32 emulated_msrs[] = { +static u32 emulated_msrs[] = { + MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK, + MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW, + HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL, + HV_X64_MSR_TIME_REF_COUNT, HV_X64_MSR_REFERENCE_TSC, + HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME, + MSR_KVM_PV_EOI_EN, + MSR_IA32_TSC_ADJUST, MSR_IA32_TSCDEADLINE, MSR_IA32_MISC_ENABLE, @@ -907,6 +908,8 @@ static const u32 emulated_msrs[] = { MSR_IA32_MCG_CTL, }; +static unsigned num_emulated_msrs; + bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer) { if (efer & efer_reserved_bits) @@ -2774,7 +2777,7 @@ long kvm_arch_dev_ioctl(struct file *fil if (copy_from_user(&msr_list, user_msr_list, sizeof msr_list)) goto out; n = msr_list.nmsrs; - msr_list.nmsrs = num_msrs_to_save + ARRAY_SIZE(emulated_msrs); + msr_list.nmsrs = num_msrs_to_save + num_emulated_msrs; if (copy_to_user(user_msr_list, &msr_list, sizeof msr_list)) goto out; r = -E2BIG; @@ -2786,7 +2789,7 @@ long kvm_arch_dev_ioctl(struct file *fil goto out; if (copy_to_user(user_msr_list->indices + num_msrs_to_save, &emulated_msrs, -ARRAY_SIZE(emulated_msrs) * sizeof(u32))) +num_emulated_msrs * sizeof(u32))) goto out; r = 0; break; @@ -4009,8 +4012,7 @@ static void kvm_init_msr_list(void) u32 dummy[2]; unsigned i, j; - /* skip the first msrs in the list. 
KVM-specific */ - for (i = j = KVM_SAVE_MSRS_BEGIN; i < ARRAY_SIZE(msrs_to_save); i++) { + for (i = j = 0; i < ARRAY_SIZE(msrs_to_save); i++) { if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0) continue; @@ -4035,6 +4037,18 @@ static void kvm_init_msr_list(void) j++; } num_msrs_to_save = j; + + for (i = j = 0; i < ARRAY_SIZE(emulated_msrs); i++) { + switch (emulated_msrs[i]) { + default: + break; + } + + if (j < i) + emulated_msrs[j] = emulated_msrs[i]; + j++; + } + num_emulated_msrs = j; } static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
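The filtering loop that this patch adds for emulated_msrs compacts the array in place and records how many entries survive. A minimal userspace sketch of that pattern follows; it is not the kernel code, and the names filter_in_place(), keep_fn and not_0x9e() are hypothetical, chosen only for illustration. In the patch itself the switch statement has only a default case, so every entry is kept for now.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate standing in for a per-MSR "is it supported?" test. */
typedef int (*keep_fn)(uint32_t msr);

/*
 * Compact list[] in place, keeping only the entries accepted by keep().
 * Order is preserved. Returns the number of surviving entries.
 */
static size_t filter_in_place(uint32_t *list, size_t n, keep_fn keep)
{
	size_t i, j;

	assert(list != NULL && keep != NULL);

	for (i = j = 0; i < n; i++) {
		if (!keep(list[i]))
			continue;		/* dropped: j does not advance */
		if (j < i)
			list[j] = list[i];	/* slide the survivor down */
		j++;
	}
	return j;
}

/* Example predicate (arbitrary value, for illustration only). */
static int not_0x9e(uint32_t msr)
{
	return msr != 0x9e;
}
```

Because the count is returned separately, callers size their copies from it rather than from the array's declared length, which is the same reason the ioctl path switches from ARRAY_SIZE(emulated_msrs) to num_emulated_msrs.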
[PATCH 3.16 57/63] seccomp: add "seccomp" syscall
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Kees Cook commit 48dc92b9fc3926844257316e75ba11eb5c742b2c upstream. This adds the new "seccomp" syscall with both an "operation" and "flags" parameter for future expansion. The third argument is a pointer value, used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...). In addition to the TSYNC flag later in this patch series, there is a non-zero chance that this syscall could be used for configuring a fixed argument area for seccomp-tracer-aware processes to pass syscall arguments in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter" for this syscall. Additionally, this syscall uses operation, flags, and user pointer for arguments because strictly passing arguments via a user pointer would mean seccomp itself would be unable to trivially filter the seccomp syscall itself. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski Signed-off-by: Ben Hutchings --- arch/Kconfig | 1 + arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 1 + include/linux/syscalls.h | 2 ++ include/uapi/asm-generic/unistd.h | 4 ++- include/uapi/linux/seccomp.h | 4 +++ kernel/seccomp.c | 55 --- kernel/sys_ni.c | 3 ++ 8 files changed, 65 insertions(+), 6 deletions(-) --- a/arch/Kconfig +++ b/arch/Kconfig @@ -321,6 +321,7 @@ config HAVE_ARCH_SECCOMP_FILTER - secure_computing is called from a ptrace_event()-safe context - secure_computing return value is checked and a return value of -1 results in the system call being skipped immediately. 
+ - seccomp syscall wired up config SECCOMP_FILTER def_bool y --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -360,3 +360,4 @@ 351i386sched_setattr sys_sched_setattr 352i386sched_getattr sys_sched_getattr 353i386renameat2 sys_renameat2 +354i386seccomp sys_seccomp --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -323,6 +323,7 @@ 314common sched_setattr sys_sched_setattr 315common sched_getattr sys_sched_getattr 316common renameat2 sys_renameat2 +317common seccomp sys_seccomp # # x32-specific system call numbers start at 512 to avoid cache impact --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pi asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2); asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags); +asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, + const char __user *uargs); #endif --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -699,9 +699,11 @@ __SYSCALL(__NR_sched_setattr, sys_sched_ __SYSCALL(__NR_sched_getattr, sys_sched_getattr) #define __NR_renameat2 276 __SYSCALL(__NR_renameat2, sys_renameat2) +#define __NR_seccomp 277 +__SYSCALL(__NR_seccomp, sys_seccomp) #undef __NR_syscalls -#define __NR_syscalls 277 +#define __NR_syscalls 278 /* * All syscalls below here should go away really, --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -10,6 +10,10 @@ #define SECCOMP_MODE_STRICT1 /* uses hard-coded filter. */ #define SECCOMP_MODE_FILTER2 /* uses user-supplied filter. */ +/* Valid operations for seccomp syscall. */ +#define SECCOMP_SET_MODE_STRICT0 +#define SECCOMP_SET_MODE_FILTER1 + /* * All BPF programs must return a 32-bit value. * The bottom 16-bits are for optional return data. 
--- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -18,6 +18,7 @@ #include #include #include +#include /* #define SECCOMP_DEBUG 1 */ @@ -314,7 +315,7 @@ free_prog: * * Returns 0 on success and non-zero otherwise. */ -static long seccomp_attach_user_filter(char __user *user_filter) +static long seccomp_attach_user_filter(const char __user *user_filter) { struct sock_fprog fprog; long ret = -EFAULT; @@ -517,6 +518,7 @@ out: #ifdef CONFIG_SECCOMP_FILTER /** * seccomp_set_mode_filter: internal function for setting seccomp filter + * @flags: flags to change filter behavior * @filter: struct sock_fprog containing filter * * This function may be called repeatedly to install additional filters. @@ -527,11 +529,16 @@ out: * * Returns 0 on success or -EINVAL on failure. */ -static long seccomp_set_mode_filter(char __user *filter) +static long seccomp_set_mode_filter(unsigned int flags, + cons
[PATCH 3.16 01/63] x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT
3.16.58-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Ingo Molnar

commit d364a7656c1855c940dfa4baf4ebcc3c6a9e6fd2 upstream.

I tried to simulate an ancient CPU via this option, and found that it still has
fxsr_opt enabled, confusing the FPU code.

Make the 'nofxsr' option also clear FXSR_OPT flag.

Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Fenghua Yu
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Ingo Molnar
Signed-off-by: Ben Hutchings
---
 arch/x86/kernel/cpu/common.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -199,6 +199,15 @@ static int __init x86_noinvpcid_setup(ch
 }
 early_param("noinvpcid", x86_noinvpcid_setup);
 
+static int __init x86_fxsr_setup(char *s)
+{
+	setup_clear_cpu_cap(X86_FEATURE_FXSR);
+	setup_clear_cpu_cap(X86_FEATURE_FXSR_OPT);
+	setup_clear_cpu_cap(X86_FEATURE_XMM);
+	return 1;
+}
+__setup("nofxsr", x86_fxsr_setup);
+
 #ifdef CONFIG_X86_32
 static int cachesize_override = -1;
 static int disable_x86_serial_nr = 1;
@@ -210,14 +219,6 @@ static int __init cachesize_setup(char *
 }
 __setup("cachesize=", cachesize_setup);
 
-static int __init x86_fxsr_setup(char *s)
-{
-	setup_clear_cpu_cap(X86_FEATURE_FXSR);
-	setup_clear_cpu_cap(X86_FEATURE_XMM);
-	return 1;
-}
-__setup("nofxsr", x86_fxsr_setup);
-
 static int __init x86_sep_setup(char *s)
 {
 	setup_clear_cpu_cap(X86_FEATURE_SEP);
[PATCH 3.16 53/63] xfs: don't call xfs_da_shrink_inode with NULL bp
3.16.58-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Eric Sandeen

commit bb3d48dcf86a97dc25fe9fc2c11938e19cb4399a upstream.

xfs_attr3_leaf_create may have errored out before instantiating a buffer,
for example if the blkno is out of range.  In that case there is no work
to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops
if we try.

This also seems to fix a flaw where the original error from
xfs_attr3_leaf_create gets overwritten in the cleanup case, and it
removes a pointless assignment to bp which isn't used after this.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969
Reported-by: Xu, Wen
Tested-by: Xu, Wen
Signed-off-by: Eric Sandeen
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings
---
 fs/xfs/xfs_attr_leaf.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -701,9 +701,8 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t
 	ASSERT(blkno == 0);
 	error = xfs_attr3_leaf_create(args, blkno, &bp);
 	if (error) {
-		error = xfs_da_shrink_inode(args, 0, bp);
-		bp = NULL;
-		if (error)
+		/* xfs_attr3_leaf_create may not have instantiated a block */
+		if (bp && (xfs_da_shrink_inode(args, 0, bp) != 0))
 			goto out;
 		xfs_idata_realloc(dp, size, XFS_ATTR_FORK);	/* try to put */
 		memcpy(ifp->if_u1.if_data, tmpbuffer, size);	/* it back */
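The two points made in the changelog above (only clean up a buffer that was actually instantiated, and do not let the cleanup overwrite the original error) can be sketched in plain C. This is a simplified userspace illustration under stated assumptions, not the XFS code: struct buf, create_leaf(), shrink() and convert() are hypothetical names, errors are positive as in the 3.16 tree, and the real function goes on to restore the shortform data rather than returning early.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a metadata buffer. */
struct buf {
	int blkno;
};

static int shrink_calls;	/* counts cleanup invocations, for illustration */

/*
 * Hypothetical create step. It can fail before a buffer exists
 * (blkno out of range, *bpp stays NULL) or after one was set up
 * (fail_late set, *bpp is valid).
 */
static int create_leaf(int blkno, int fail_late, struct buf *storage,
		       struct buf **bpp)
{
	*bpp = NULL;
	if (blkno < 0)
		return EINVAL;
	storage->blkno = blkno;
	*bpp = storage;
	return fail_late ? EIO : 0;
}

/* Hypothetical cleanup step; it must never be handed a NULL buffer. */
static int shrink(struct buf *bp)
{
	assert(bp != NULL);
	shrink_calls++;
	return 0;
}

static int convert(int blkno, int fail_late)
{
	struct buf storage;
	struct buf *bp = NULL;
	int error;

	error = create_leaf(blkno, fail_late, &storage, &bp);
	if (error) {
		/* Only undo what was actually instantiated. */
		if (bp)
			(void)shrink(bp);
		return error;	/* the create error is not overwritten */
	}
	return 0;
}
```

With an out-of-range block number the cleanup is skipped entirely, which corresponds to the oops the patch avoids.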
[PATCH 3.16 47/63] uas: replace WARN_ON_ONCE() with lockdep_assert_held()
3.16.58-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Sanjeev Sharma

commit ab945eff8396bc3329cc97274320e8d2c6585077 upstream.

On some architectures spin_is_locked() always returns false in
uniprocessor configurations, so it is advisable to replace it with
lockdep_assert_held().

Signed-off-by: Sanjeev Sharma
Acked-by: Hans de Goede
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Ben Hutchings
---
 drivers/usb/storage/uas.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -158,7 +158,7 @@ static void uas_mark_cmd_dead(struct uas
 	struct scsi_cmnd *cmnd = container_of(scp, struct scsi_cmnd, SCp);
 
 	uas_log_cmd_state(cmnd, caller);
-	WARN_ON_ONCE(!spin_is_locked(&devinfo->lock));
+	lockdep_assert_held(&devinfo->lock);
 	WARN_ON_ONCE(cmdinfo->state & COMMAND_ABORTED);
 	cmdinfo->state |= COMMAND_ABORTED;
 	cmdinfo->state &= ~IS_IN_WORK_LIST;
@@ -185,7 +185,7 @@ static void uas_add_work(struct uas_cmd_
 	struct scsi_cmnd *cmnd = container_of(scp, struct scsi_cmnd, SCp);
 	struct uas_dev_info *devinfo = cmnd->device->hostdata;
 
-	WARN_ON_ONCE(!spin_is_locked(&devinfo->lock));
+	lockdep_assert_held(&devinfo->lock);
 	cmdinfo->state |= IS_IN_WORK_LIST;
 	schedule_work(&devinfo->work);
 }
@@ -287,7 +287,7 @@ static int uas_try_complete(struct scsi_
 	struct uas_cmd_info *cmdinfo = (void *)&cmnd->SCp;
 	struct uas_dev_info *devinfo = (void *)cmnd->device->hostdata;
 
-	WARN_ON_ONCE(!spin_is_locked(&devinfo->lock));
+	lockdep_assert_held(&devinfo->lock);
 	if (cmdinfo->state & (COMMAND_INFLIGHT |
 			      DATA_IN_URB_INFLIGHT |
 			      DATA_OUT_URB_INFLIGHT |
@@ -626,7 +626,7 @@ static int uas_submit_urbs(struct scsi_c
 	struct urb *urb;
 	int err;
 
-	WARN_ON_ONCE(!spin_is_locked(&devinfo->lock));
+	lockdep_assert_held(&devinfo->lock);
 	if (cmdinfo->state & SUBMIT_STATUS_URB) {
 		urb = uas_submit_sense_urb(cmnd, gfp, cmdinfo->stream);
 		if (!urb)
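The reason given in the changelog above can be modelled in a few lines of userspace C. This is only a sketch of the failure mode, not kernel code: struct model_lock and the three helpers are hypothetical, and is_locked_always_false() stands in for a uniprocessor build in which the lock-state query reports "not locked" regardless of the real state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a spinlock: it only tracks whether it is held. */
struct model_lock {
	bool held;
};

typedef bool (*is_locked_fn)(const struct model_lock *lock);

/* Models a build where the lock state is really tracked. */
static bool is_locked_tracked(const struct model_lock *lock)
{
	return lock->held;
}

/* Models a uniprocessor build where the query always reports "not locked". */
static bool is_locked_always_false(const struct model_lock *lock)
{
	(void)lock;
	return false;
}

/*
 * Returns true when a check of the form
 * WARN_ON_ONCE(!spin_is_locked(...)) would fire.
 */
static bool check_fires(is_locked_fn is_locked, const struct model_lock *lock)
{
	assert(is_locked != NULL && lock != NULL);
	return !is_locked(lock);
}
```

In the second model the check fires even though the caller holds the lock, so the warning is a false positive. lockdep_assert_held() avoids this because it consults lockdep's own record of held locks rather than the spinlock word.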
Re: [PATCH] libnvdimm: remove duplicate include
On Wed, Sep 19, 2018 at 5:59 AM Pankaj Gupta wrote:
>
> Removed duplicate include.
>
> Signed-off-by: Pankaj Gupta
> ---
>  drivers/nvdimm/nd-core.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
> index ac68072fb8cd..182258f64417 100644
> --- a/drivers/nvdimm/nd-core.h
> +++ b/drivers/nvdimm/nd-core.h
> @@ -14,7 +14,6 @@
>  #define __ND_CORE_H__
>  #include
>  #include
> -#include
>  #include
>  #include
>  #include

Looks good, applied.
[PATCH 3.16 13/63] futex: Remove unnecessary warning from get_futex_key
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Mel Gorman commit 48fb6f4db940e92cfb16cd878cddd59ea6120d06 upstream. Commit 65d8fc777f6d ("futex: Remove requirement for lock_page() in get_futex_key()") removed an unnecessary lock_page() with the side-effect that page->mapping needed to be treated very carefully. Two defensive warnings were added in case any assumption was missed and the first warning assumed a correct application would not alter a mapping backing a futex key. Since merging, it has not triggered for any unexpected case but Mark Rutland reported the following bug triggering due to the first warning. kernel BUG at kernel/futex.c:679! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3 Hardware name: linux,dummy-virt (DT) task: 80001e271780 task.stack: 10908000 PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679 LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679 pc : [] lr : [] pstate: 8145 The fact that it's a bug instead of a warning was due to an unrelated arm64 problem, but the warning itself triggered because the underlying mapping changed. This is an application issue but from a kernel perspective it's a recoverable situation and the warning is unnecessary so this patch removes the warning. The warning may potentially be triggered with the following test program from Mark although it may be necessary to adjust NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the system. 
#include #include #include #include #include #include #include #include #define NR_FUTEX_THREADS 16 pthread_t threads[NR_FUTEX_THREADS]; void *mem; #define MEM_PROT (PROT_READ | PROT_WRITE) #define MEM_SIZE 65536 static int futex_wrapper(int *uaddr, int op, int val, const struct timespec *timeout, int *uaddr2, int val3) { syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3); } void *poll_futex(void *unused) { for (;;) { futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1); } } int main(int argc, char *argv[]) { int i; mem = mmap(NULL, MEM_SIZE, MEM_PROT, MAP_SHARED | MAP_ANONYMOUS, -1, 0); printf("Mapping @ %p\n", mem); printf("Creating futex threads...\n"); for (i = 0; i < NR_FUTEX_THREADS; i++) pthread_create(&threads[i], NULL, poll_futex, NULL); printf("Flipping mapping...\n"); for (;;) { mmap(mem, MEM_SIZE, MEM_PROT, MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0); } return 0; } Reported-and-tested-by: Mark Rutland Signed-off-by: Mel Gorman Acked-by: Peter Zijlstra (Intel) Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings --- kernel/futex.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) --- a/kernel/futex.c +++ b/kernel/futex.c @@ -583,13 +583,14 @@ again: * this reference was taken by ihold under the page lock * pinning the inode in place so i_lock was unnecessary. The * only way for this check to fail is if the inode was -* truncated in parallel so warn for now if this happens. +* truncated in parallel which is almost certainly an +* application bug. In such a case, just retry. * * We are not calling into get_futex_key_refs() in file-backed * cases, therefore a successful atomic_inc return below will * guarantee that get_futex_key() will still imply smp_mb(); (B). */ - if (WARN_ON_ONCE(!atomic_inc_not_zero(&inode->i_count))) { + if (!atomic_inc_not_zero(&inode->i_count)) { rcu_read_unlock(); put_page(page_head);
[PATCH 3.16 60/63] x86/cpu/AMD: Fix erratum 1076 (CPB bit)
3.16.58-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Borislav Petkov

commit f7f3dc00f61261cdc9ccd8b886f21bc4dffd6fd9 upstream.

CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do
support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that.

Signed-off-by: Borislav Petkov
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Sherry Hurwitz
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20170907170821.16021-1...@alien8.de
Signed-off-by: Ingo Molnar
Signed-off-by: David Woodhouse
Signed-off-by: Greg Kroah-Hartman
[bwh: Backported to 3.16:
 - Change the added case into an if-statement
 - s/x86_stepping/x86_mask/]
Signed-off-by: Ben Hutchings
---
 arch/x86/kernel/cpu/amd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -533,6 +533,16 @@ static void init_amd_ln(struct cpuinfo_x
 	msr_set_bit(MSR_AMD64_DE_CFG, 31);
 }
 
+static void init_amd_zn(struct cpuinfo_x86 *c)
+{
+	/*
+	 * Fix erratum 1076: CPB feature bit not being set in CPUID. It affects
+	 * all up to and including B1.
+	 */
+	if (c->x86_model <= 1 && c->x86_mask <= 1)
+		set_cpu_cap(c, X86_FEATURE_CPB);
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 dummy;
@@ -611,6 +621,9 @@ static void init_amd(struct cpuinfo_x86
 		clear_cpu_cap(c, X86_FEATURE_MCE);
 #endif
 
+	if (c->x86 == 0x17)
+		init_amd_zn(c);
+
 	/* Enable workaround for FXSAVE leak */
 	if (c->x86 >= 6)
 		set_cpu_cap(c, X86_FEATURE_FXSAVE_LEAK);
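The combined condition applied by this backport (family 0x17, then model and stepping both at most 1) can be restated as a small standalone sketch. It is an illustration only: struct cpu_id and apply_erratum_1076() are hypothetical names and do not correspond to struct cpuinfo_x86 or any kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of the CPU identification fields involved. */
struct cpu_id {
	uint8_t family;
	uint8_t model;
	uint8_t stepping;
	bool has_cpb;	/* models the CPB feature flag */
};

/*
 * Force the CPB flag on for the parts covered by the workaround:
 * family 0x17 with model <= 1 and stepping <= 1.
 */
static void apply_erratum_1076(struct cpu_id *c)
{
	assert(c != NULL);

	if (c->family == 0x17 && c->model <= 1 && c->stepping <= 1)
		c->has_cpb = true;
}
```

The flag is only ever set, never cleared, so parts that already report CPB correctly are left unchanged.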
[PATCH 3.16 45/63] x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Peter Zijlstra commit 5800dc5c19f34e6e03b5adab1282535cb102fafd upstream. Nadav reported that on guests we're failing to rewrite the indirect calls to CALLEE_SAVE paravirt functions. In particular the pv_queued_spin_unlock() call is left unpatched and that is all over the place. This obviously wrecks Spectre-v2 mitigation (for paravirt guests) which relies on not actually having indirect calls around. The reason is an incorrect clobber test in paravirt_patch_call(); this function rewrites an indirect call with a direct call to the _SAME_ function, there is no possible way the clobbers can be different because of this. Therefore remove this clobber check. Also put WARNs on the other patch failure case (not enough room for the instruction) which I've not seen trigger in my (limited) testing. Three live kernel image disassemblies for lock_sock_nested (as a small function that illustrates the problem nicely). PRE is the current situation for guests, POST is with this patch applied and NATIVE is with or without the patch for !guests. PRE: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0x817be970 <+0>: push %rbp 0x817be971 <+1>: mov%rdi,%rbp 0x817be974 <+4>: push %rbx 0x817be975 <+5>: lea0x88(%rbp),%rbx 0x817be97c <+12>:callq 0x819f7160 <_cond_resched> 0x817be981 <+17>:mov%rbx,%rdi 0x817be984 <+20>:callq 0x819fbb00 <_raw_spin_lock_bh> 0x817be989 <+25>:mov0x8c(%rbp),%eax 0x817be98f <+31>:test %eax,%eax 0x817be991 <+33>:jne0x817be9ba 0x817be993 <+35>:movl $0x1,0x8c(%rbp) 0x817be99d <+45>:mov%rbx,%rdi 0x817be9a0 <+48>:callq *0x822299e8 0x817be9a7 <+55>:pop%rbx 0x817be9a8 <+56>:pop%rbp 0x817be9a9 <+57>:mov$0x200,%esi 0x817be9ae <+62>:mov$0x817be993,%rdi 0x817be9b5 <+69>:jmpq 0x81063ae0 <__local_bh_enable_ip> 0x817be9ba <+74>:mov%rbp,%rdi 0x817be9bd <+77>:callq 0x817be8c0 <__lock_sock> 0x817be9c2 <+82>:jmp0x817be993 End of assembler dump. 
POST: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0x817be970 <+0>: push %rbp 0x817be971 <+1>: mov%rdi,%rbp 0x817be974 <+4>: push %rbx 0x817be975 <+5>: lea0x88(%rbp),%rbx 0x817be97c <+12>:callq 0x819f7160 <_cond_resched> 0x817be981 <+17>:mov%rbx,%rdi 0x817be984 <+20>:callq 0x819fbb00 <_raw_spin_lock_bh> 0x817be989 <+25>:mov0x8c(%rbp),%eax 0x817be98f <+31>:test %eax,%eax 0x817be991 <+33>:jne0x817be9ba 0x817be993 <+35>:movl $0x1,0x8c(%rbp) 0x817be99d <+45>:mov%rbx,%rdi 0x817be9a0 <+48>:callq 0x810a0c20 <__raw_callee_save___pv_queued_spin_unlock> 0x817be9a5 <+53>:xchg %ax,%ax 0x817be9a7 <+55>:pop%rbx 0x817be9a8 <+56>:pop%rbp 0x817be9a9 <+57>:mov$0x200,%esi 0x817be9ae <+62>:mov$0x817be993,%rdi 0x817be9b5 <+69>:jmpq 0x81063aa0 <__local_bh_enable_ip> 0x817be9ba <+74>:mov%rbp,%rdi 0x817be9bd <+77>:callq 0x817be8c0 <__lock_sock> 0x817be9c2 <+82>:jmp0x817be993 End of assembler dump. NATIVE: (gdb) disassemble lock_sock_nested Dump of assembler code for function lock_sock_nested: 0x817be970 <+0>: push %rbp 0x817be971 <+1>: mov%rdi,%rbp 0x817be974 <+4>: push %rbx 0x817be975 <+5>: lea0x88(%rbp),%rbx 0x817be97c <+12>:callq 0x819f7160 <_cond_resched> 0x817be981 <+17>:mov%rbx,%rdi 0x817be984 <+20>:callq 0x819fbb00 <_raw_spin_lock_bh> 0x817be989 <+25>:mov0x8c(%rbp),%eax 0x817be98f <+31>:test %eax,%eax 0x817be991 <+33>:jne0x817be9ba 0x817be993 <+35>:movl $0x1,0x8c(%rbp) 0x817be99d <+45>:mov%rbx,%rdi 0x817be9a0 <+48>:movb $0x0,(%rdi) 0x817be9a3 <+51>:nopl 0x0(%rax) 0x817be9a7 <+55>:pop%rbx 0x817be9a8 <+56>:pop%rbp 0x817be9a9 <+57>:mov$0x200,%esi 0x817be9ae <+62>:mov$0x817be993,%rdi 0x817be9b5 <+69>:jmpq 0x81063ae0 <__local_bh_enable_ip> 0xf
[PATCH v4 10/19] Smack: Abstract use of file security blob
Don't use the file->f_security pointer directly. Provide a helper function that provides the security blob pointer. Signed-off-by: Casey Schaufler --- security/smack/smack.h | 5 + security/smack/smack_lsm.c | 12 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/security/smack/smack.h b/security/smack/smack.h index 01a922856eba..22ca30379209 100644 --- a/security/smack/smack.h +++ b/security/smack/smack.h @@ -361,6 +361,11 @@ static inline struct task_smack *smack_cred(const struct cred *cred) return cred->security; } +static inline struct smack_known **smack_file(const struct file *file) +{ + return (struct smack_known **)&file->f_security; +} + /* * Is the directory transmuting? */ diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index a06ea8aa89c4..9ec595f0c3f1 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -1571,9 +1571,9 @@ static void smack_inode_getsecid(struct inode *inode, u32 *secid) */ static int smack_file_alloc_security(struct file *file) { - struct smack_known *skp = smk_of_current(); + struct smack_known **blob = smack_file(file); - file->f_security = skp; + *blob = smk_of_current(); return 0; } @@ -1813,7 +1813,9 @@ static int smack_mmap_file(struct file *file, */ static void smack_file_set_fowner(struct file *file) { - file->f_security = smk_of_current(); + struct smack_known **blob = smack_file(file); + + *blob = smk_of_current(); } /** @@ -1830,6 +1832,7 @@ static void smack_file_set_fowner(struct file *file) static int smack_file_send_sigiotask(struct task_struct *tsk, struct fown_struct *fown, int signum) { + struct smack_known **blob; struct smack_known *skp; struct smack_known *tkp = smk_of_task(smack_cred(tsk->cred)); struct file *file; @@ -1842,7 +1845,8 @@ static int smack_file_send_sigiotask(struct task_struct *tsk, file = container_of(fown, struct file, f_owner); /* we don't log here as rc can be overriden */ - skp = file->f_security; + blob = smack_file(file); + skp = *blob; 
rc = smk_access(skp, tkp, MAY_DELIVER, NULL); rc = smk_bu_note("sigiotask", skp, tkp, MAY_DELIVER, rc); if (rc != 0 && has_capability(tsk, CAP_MAC_OVERRIDE)) -- 2.17.1
[PATCH 3.16 52/63] xfs: validate cached inodes are free when allocated
3.16.58-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Dave Chinner

commit afca6c5b2595fc44383919fba740c194b0b76aff upstream.

A recent fuzzed filesystem image caused random dcache corruption when
the reproducer was run. This often showed up as panics in
lookup_slow() on a null inode->i_ops pointer when doing pathwalks.

BUG: unable to handle kernel NULL pointer dereference at
Call Trace:
 lookup_slow+0x44/0x60
 walk_component+0x3dd/0x9f0
 link_path_walk+0x4a7/0x830
 path_lookupat+0xc1/0x470
 filename_lookup+0x129/0x270
 user_path_at_empty+0x36/0x40
 path_listxattr+0x98/0x110
 SyS_listxattr+0x13/0x20
 do_syscall_64+0xf5/0x280
 entry_SYSCALL_64_after_hwframe+0x42/0xb7

but had many different failure modes including deadlocks trying to
lock the inode that was just allocated or KASAN reports of
use-after-free violations.

The cause of the problem was a corrupt INOBT on a v4 fs where the
root inode was marked as free in the inobt record. Hence when we
allocated an inode, it chose the root inode to allocate, found it in
the cache and re-initialised it.

We recently fixed a similar inode allocation issue caused by inobt
record corruption problem in xfs_iget_cache_miss() in commit
ee457001ed6c ("xfs: catch inode allocation state mismatch
corruption"). This change adds similar checks to the cache-hit path
to catch it, and turns the reproducer into a corruption shutdown
situation.

Reported-by: Wen Xu
Signed-Off-By: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Carlos Maiolino
Reviewed-by: Darrick J. Wong
[darrick: fix typos in comment]
Signed-off-by: Darrick J.
Wong [bwh: Backported to 3.16: - Look up mode in XFS inode, not VFS inode - Use positive error codes, and EIO instead of EFSCORRUPTED] Signed-off-by: Ben Hutchings --- fs/xfs/xfs_icache.c | 73 + 1 file changed, 48 insertions(+), 25 deletions(-) --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -133,6 +133,46 @@ xfs_inode_free( } /* + * If we are allocating a new inode, then check what was returned is + * actually a free, empty inode. If we are not allocating an inode, + * then check we didn't find a free inode. + * + * Returns: + * 0 if the inode free state matches the lookup context + * ENOENT if the inode is free and we are not allocating + * EFSCORRUPTEDif there is any state mismatch at all + */ +static int +xfs_iget_check_free_state( + struct xfs_inode*ip, + int flags) +{ + if (flags & XFS_IGET_CREATE) { + /* should be a free inode */ + if (ip->i_d.di_mode != 0) { + xfs_warn(ip->i_mount, +"Corruption detected! Free inode 0x%llx not marked free! (mode 0x%x)", + ip->i_ino, ip->i_d.di_mode); + return EIO; + } + + if (ip->i_d.di_nblocks != 0) { + xfs_warn(ip->i_mount, +"Corruption detected! Free inode 0x%llx has blocks allocated!", + ip->i_ino); + return EIO; + } + return 0; + } + + /* should be an allocated inode */ + if (ip->i_d.di_mode == 0) + return ENOENT; + + return 0; +} + +/* * Check the validity of the inode we just found it the cache */ static int @@ -181,12 +221,12 @@ xfs_iget_cache_hit( } /* -* If lookup is racing with unlink return an error immediately. +* Check the inode free state is valid. This also detects lookup +* racing with unlinks. */ - if (ip->i_d.di_mode == 0 && !(flags & XFS_IGET_CREATE)) { - error = ENOENT; + error = xfs_iget_check_free_state(ip, flags); + if (error) goto out_error; - } /* * If IRECLAIMABLE is set, we've torn down the VFS inode already. @@ -295,29 +335,12 @@ xfs_iget_cache_miss( /* -* If we are allocating a new inode, then check what was returned is -* actually a free, empty inode. 
If we are not allocating an inode, -* the check we didn't find a free inode. +* Check the inode free state is valid. This also detects lookup +* racing with unlinks. */ - if (flags & XFS_IGET_CREATE) { - if (ip->i_d.di_mode != 0) { - xfs_warn(mp, -"Corruption detected! Free inode 0x%llx not marked free on disk", - ino); - error = EIO; - goto out_destroy; - } - if (ip->i_d.di_nblocks != 0) { - xfs_warn(mp, -"Corruption detected! Free inode 0x%llx has blocks allocated!", - ino); - error = EIO; - goto out_destroy; -
[PATCH 3.16 63/63] mm: get rid of vmacache_flush_all() entirely
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Linus Torvalds commit 7a9cdebdcc17e426fb5287e4a82db1dfe86339b2 upstream. Jann Horn points out that the vmacache_flush_all() function is not only potentially expensive, it's buggy too. It also happens to be entirely unnecessary, because the sequence number overflow case can be avoided by simply making the sequence number be 64-bit. That doesn't even grow the data structures in question, because the other adjacent fields are already 64-bit. So simplify the whole thing by just making the sequence number overflow case go away entirely, which gets rid of all the complications and makes the code faster too. Win-win. [ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics also just goes away entirely with this ] Reported-by: Jann Horn Suggested-by: Will Deacon Acked-by: Davidlohr Bueso Cc: Oleg Nesterov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: drop changes to mm debug code] Signed-off-by: Ben Hutchings --- --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -345,7 +345,7 @@ struct kioctx_table; struct mm_struct { struct vm_area_struct *mmap;/* list of VMAs */ struct rb_root mm_rb; - u32 vmacache_seqnum; /* per-thread vmacache */ + u64 vmacache_seqnum; /* per-thread vmacache */ #ifdef CONFIG_MMU unsigned long (*get_unmapped_area) (struct file *filp, unsigned long addr, unsigned long len, --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1291,7 +1291,7 @@ struct task_struct { unsigned brk_randomized:1; #endif /* per-thread vma caching */ - u32 vmacache_seqnum; + u64 vmacache_seqnum; struct vm_area_struct *vmacache[VMACACHE_SIZE]; #if defined(SPLIT_RSS_COUNTING) struct task_rss_statrss_stat; --- a/include/linux/vmacache.h +++ b/include/linux/vmacache.h @@ -15,7 +15,6 @@ static inline void vmacache_flush(struct memset(tsk->vmacache, 0, sizeof(tsk->vmacache)); } -extern void vmacache_flush_all(struct mm_struct 
*mm); extern void vmacache_update(unsigned long addr, struct vm_area_struct *newvma); extern struct vm_area_struct *vmacache_find(struct mm_struct *mm, unsigned long addr); @@ -29,10 +28,6 @@ extern struct vm_area_struct *vmacache_f static inline void vmacache_invalidate(struct mm_struct *mm) { mm->vmacache_seqnum++; - - /* deal with overflows */ - if (unlikely(mm->vmacache_seqnum == 0)) - vmacache_flush_all(mm); } #endif /* __LINUX_VMACACHE_H */ --- a/mm/vmacache.c +++ b/mm/vmacache.c @@ -6,42 +6,6 @@ #include /* - * Flush vma caches for threads that share a given mm. - * - * The operation is safe because the caller holds the mmap_sem - * exclusively and other threads accessing the vma cache will - * have mmap_sem held at least for read, so no extra locking - * is required to maintain the vma cache. - */ -void vmacache_flush_all(struct mm_struct *mm) -{ - struct task_struct *g, *p; - - /* -* Single threaded tasks need not iterate the entire -* list of process. We can avoid the flushing as well -* since the mm's seqnum was increased and don't have -* to worry about other threads' seqnum. Current's -* flush will occur upon the next lookup. -*/ - if (atomic_read(&mm->mm_users) == 1) - return; - - rcu_read_lock(); - for_each_process_thread(g, p) { - /* -* Only flush the vmacache pointers as the -* mm seqnum is already set and curr's will -* be set upon invalidation when the next -* lookup is done. -*/ - if (mm == p->mm) - vmacache_flush(p); - } - rcu_read_unlock(); -} - -/* * This task may be accessing a foreign mm via (for example) * get_user_pages()->find_vma(). The vmacache is task-local and this * task's vmacache pertains to a different mm (ie, its own). There is
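The overflow case that this patch makes unreachable can be illustrated outside the kernel. The sketch below is a userspace model under stated assumptions, not the vmacache code: the helper names are hypothetical, and a cache is simply treated as valid when the sequence number a task recorded equals the current one. It shows why a 32-bit counter needed special handling when it wrapped, while a 64-bit counter does not return to an old value for any number of invalidations that fits in 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Both helpers start from the sequence number a task recorded, apply
 * n invalidations to the shared counter, and report whether the stale
 * per-task cache would still look valid afterwards.
 */
static bool stale_cache_looks_valid_u32(uint32_t recorded, uint64_t n)
{
	uint32_t current_seq;

	assert(n > 0);	/* at least one invalidation happened */
	current_seq = (uint32_t)(recorded + n);	/* wraps modulo 2^32 */
	return current_seq == recorded;
}

static bool stale_cache_looks_valid_u64(uint64_t recorded, uint64_t n)
{
	uint64_t current_seq;

	assert(n > 0);
	current_seq = recorded + n;
	return current_seq == recorded;
}
```

After exactly 2^32 invalidations the 32-bit counter returns to the recorded value, which is the situation the removed vmacache_flush_all() call was there to handle.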
[PATCH 3.16 51/63] xfs: catch inode allocation state mismatch corruption
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Dave Chinner commit ee457001ed6c6f31ddad69c24c1da8f377d8472d upstream. We recently came across a V4 filesystem causing memory corruption due to a newly allocated inode being set up twice and being added to the superblock inode list twice. From code inspection, the only way this could happen is if a newly allocated inode was not marked as free on disk (i.e. di_mode wasn't zero). Running the metadump on an upstream debug kernel fails during inode allocation like so: XFS: Assertion failed: ip->i_d.di_nblocks == 0, file: fs/xfs/xfs_inode.c, line: 838 [ cut here ] kernel BUG at fs/xfs/xfs_message.c:114! invalid opcode: [#1] PREEMPT SMP CPU: 11 PID: 3496 Comm: mkdir Not tainted 4.16.0-rc5-dgc #442 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 RIP: 0010:assfail+0x28/0x30 RSP: 0018:c9000236fc80 EFLAGS: 00010202 RAX: ffea RBX: 4000 RCX: RDX: ffc0 RSI: 000a RDI: 8227211b RBP: c9000236fce8 R08: R09: R10: 0bec R11: f000 R12: c9000236fd30 R13: 8805c76bab80 R14: 8805c77ac800 R15: 88083fb12e10 FS: 7fac8cbff040() GS:88083fd0() knlGS:0000 CS: 0010 DS: ES: CR0: 80050033 CR2: 7fffa6783ff8 CR3: 0005c6e2b003 CR4: 000606e0 Call Trace: xfs_ialloc+0x383/0x570 xfs_dir_ialloc+0x6a/0x2a0 xfs_create+0x412/0x670 xfs_generic_create+0x1f7/0x2c0 ? capable_wrt_inode_uidgid+0x3f/0x50 vfs_mkdir+0xfb/0x1b0 SyS_mkdir+0xcf/0xf0 do_syscall_64+0x73/0x1a0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Extracting the inode number we crashed on from an event trace and looking at it with xfs_db: xfs_db> inode 184452204 xfs_db> p core.magic = 0x494e core.mode = 0100644 core.version = 2 core.format = 2 (extents) core.nlinkv2 = 1 core.onlink = 0 . Confirms that it is not a free inode on disk. xfs_repair also trips over this inode: .
zero length extent (off = 0, fsbno = 0) in ino 184452204 correcting nextents for inode 184452204 bad attribute fork in inode 184452204, would clear attr fork bad nblocks 1 for inode 184452204, would reset to 0 bad anextents 1 for inode 184452204, would reset to 0 imap claims in-use inode 184452204 is free, would correct imap would have cleared inode 184452204 . disconnected inode 184452204, would move to lost+found And so we have a situation where the directory structure and the inobt think the inode is free, but the inode on disk thinks it is still in use. Where this corruption came from is not possible to diagnose, but we can detect it and prevent the kernel from oopsing on lookup. The reproducer now results in: $ sudo mkdir /mnt/scratch/{0,1,2,3,4,5}{0,1,2,3,4,5} mkdir: cannot create directory ‘/mnt/scratch/00’: File exists mkdir: cannot create directory ‘/mnt/scratch/01’: File exists mkdir: cannot create directory ‘/mnt/scratch/03’: Structure needs cleaning mkdir: cannot create directory ‘/mnt/scratch/04’: Input/output error mkdir: cannot create directory ‘/mnt/scratch/05’: Input/output error And this corruption shutdown: [ 54.843517] XFS (loop0): Corruption detected! Free inode 0xafe846c not marked free on disk [ 54.845885] XFS (loop0): Internal error xfs_trans_cancel at line 1023 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x425/0x670 [ 54.848994] CPU: 10 PID: 3541 Comm: mkdir Not tainted 4.16.0-rc5-dgc #443 [ 54.850753] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 54.852859] Call Trace: [ 54.853531] dump_stack+0x85/0xc5 [ 54.854385] xfs_trans_cancel+0x197/0x1c0 [ 54.855421] xfs_create+0x425/0x670 [ 54.856314] xfs_generic_create+0x1f7/0x2c0 [ 54.857390] ?
capable_wrt_inode_uidgid+0x3f/0x50 [ 54.858586] vfs_mkdir+0xfb/0x1b0 [ 54.859458] SyS_mkdir+0xcf/0xf0 [ 54.860254] do_syscall_64+0x73/0x1a0 [ 54.861193] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 54.862492] RIP: 0033:0x7fb73bddf547 [ 54.863358] RSP: 002b:7ffdaa553338 EFLAGS: 0246 ORIG_RAX: 0053 [ 54.865133] RAX: ffda RBX: 7ffdaa55449a RCX: 7fb73bddf547 [ 54.866766] RDX: 0001 RSI: 01ff RDI: 7ffdaa55449a [ 54.868432] RBP: 7ffdaa55449a R08: 01ff R09: 5623a8670dd0 [ 54.870110] R10: 7fb73be72d5b R11: 0246 R12: 01ff [ 54.871752] R13: 7ffdaa5534b0 R14: R15: 7ffdaa553500 [ 54.873429] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 1024 of file fs/xfs/xfs_trans.c. Return address = 814cd050 [ 54.882790] XFS (loop0): Corruption of in-memory data detected. Shutting down filesystem [ 54.884597]
[PATCH 3.16 49/63] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Qu Wenruo commit 389305b2aa68723c754f88d9dbd268a400e10664 upstream. Invalid reloc tree can cause kernel NULL pointer dereference when btrfs does some cleanup of the reloc roots. It turns out that fs_info::reloc_ctl can be NULL in btrfs_recover_relocation() as we allocate relocation control after all reloc roots have been verified. So when we hit: note, we haven't called set_reloc_control() thus fs_info::reloc_ctl is still NULL. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199833 Reported-by: Xu Wen Signed-off-by: Qu Wenruo Tested-by: Gu Jinxiang Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Ben Hutchings --- fs/btrfs/relocation.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1311,18 +1311,19 @@ static void __del_reloc_root(struct btrf struct mapping_node *node = NULL; struct reloc_control *rc = root->fs_info->reloc_ctl; - spin_lock(&rc->reloc_root_tree.lock); - rb_node = tree_search(&rc->reloc_root_tree.rb_root, - root->node->start); - if (rb_node) { - node = rb_entry(rb_node, struct mapping_node, rb_node); - rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root); + if (rc) { + spin_lock(&rc->reloc_root_tree.lock); + rb_node = tree_search(&rc->reloc_root_tree.rb_root, + root->node->start); + if (rb_node) { + node = rb_entry(rb_node, struct mapping_node, rb_node); + rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root); + } + spin_unlock(&rc->reloc_root_tree.lock); + if (!node) + return; + BUG_ON((struct btrfs_root *)node->data != root); } - spin_unlock(&rc->reloc_root_tree.lock); - - if (!node) - return; - BUG_ON((struct btrfs_root *)node->data != root); spin_lock(&root->fs_info->trans_lock); list_del_init(&root->root_list);
[PATCH 3.16 50/63] hfsplus: fix NULL dereference in hfsplus_lookup()
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Ernesto A. Fernández commit a7ec7a4193a2eb3b5341243fc0b621c1ac9e4ec4 upstream. An HFS+ filesystem can be mounted read-only without having a metadata directory, which is needed to support hardlinks. But if the catalog data is corrupted, a directory lookup may still find dentries claiming to be hardlinks. hfsplus_lookup() does check that ->hidden_dir is not NULL in such a situation, but mistakenly does so after dereferencing it for the first time. Reorder this check to prevent a crash. This happens when looking up corrupted catalog data (dentry) on a filesystem with no metadata directory (this could only ever happen on a read-only mount). Wen Xu sent the replication steps in detail to the fsdevel list: https://bugzilla.kernel.org/show_bug.cgi?id=200297 Link: http://lkml.kernel.org/r/20180712215344.q44dyrhymm4ajkao@eaf Signed-off-by: Ernesto A. Fernández Reported-by: Wen Xu Cc: Viacheslav Dubeyko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings --- fs/hfsplus/dir.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/fs/hfsplus/dir.c +++ b/fs/hfsplus/dir.c @@ -74,13 +74,13 @@ again: cpu_to_be32(HFSP_HARDLINK_TYPE) && entry.file.user_info.fdCreator == cpu_to_be32(HFSP_HFSPLUS_CREATOR) && + HFSPLUS_SB(sb)->hidden_dir && (entry.file.create_date == HFSPLUS_I(HFSPLUS_SB(sb)->hidden_dir)-> create_date || entry.file.create_date == HFSPLUS_I(sb->s_root->d_inode)-> - create_date) && - HFSPLUS_SB(sb)->hidden_dir) { + create_date)) { struct qstr str; char name[32];
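For readers skimming the diff: the fix relies on left-to-right short-circuit evaluation of &&. A minimal userspace sketch of the reordered condition follows. The demo_* types and demo_is_hardlink() are hypothetical stand-ins, not the HFS+ structures, and only the date comparison from the original condition is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the inode info and superblock info. */
struct demo_inode_info { unsigned int create_date; };
struct demo_sb_info { struct demo_inode_info *hidden_dir; };

/*
 * Fixed ordering: test hidden_dir first. Because && evaluates left to
 * right and stops at the first false operand, the dereference on the
 * right is never reached when hidden_dir is NULL.
 */
static int demo_is_hardlink(const struct demo_sb_info *sbi,
                            unsigned int entry_date,
                            unsigned int root_date)
{
    return sbi->hidden_dir &&
           (entry_date == sbi->hidden_dir->create_date ||
            entry_date == root_date);
}
```

With the check placed last, as in the code before the patch, the dereference would already have happened by the time NULL was tested.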
[PATCH 3.16 59/63] x86/process: Correct and optimize TIF_BLOCKSTEP switch
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Kyle Huey commit b9894a2f5bd18b1691cb6872c9afe32b148d0132 upstream. The debug control MSR is "highly magical" as the blockstep bit can be cleared by hardware under not well documented circumstances. So a task switch relying on the bit set by the previous task (according to the previous task's thread flags) can trip over this and not update the flag for the next task. To fix this it's required to handle DEBUGCTLMSR_BTF when either the previous or the next or both tasks have the TIF_BLOCKSTEP flag set. While at it avoid branching within the TIF_BLOCKSTEP case and evaluating boot_cpu_data twice in kernels without CONFIG_X86_DEBUGCTLMSR. x86_64: arch/x86/kernel/process.o
text data bss dec hex
3024 8577 16 11617 2d61 Before
3008 8577 16 11601 2d51 After
i386: No change [ tglx: Made the shift value explicit, use a local variable to make the code readable and massaged changelog] Originally-by: Thomas Gleixner Signed-off-by: Kyle Huey Cc: Peter Zijlstra Cc: Andy Lutomirski Link: http://lkml.kernel.org/r/20170214081104.9244-3-kh...@kylehuey.com Signed-off-by: Thomas Gleixner Signed-off-by: David Woodhouse Signed-off-by: Greg Kroah-Hartman [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings --- arch/x86/include/uapi/asm/msr-index.h | 1 + arch/x86/kernel/process.c | 12 +++- 2 files changed, 8 insertions(+), 5 deletions(-) --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -109,6 +109,7 @@ /* DEBUGCTLMSR bits (others vary by model): */ #define DEBUGCTLMSR_LBR (1UL << 0) /* last branch recording */ +#define DEBUGCTLMSR_BTF_SHIFT 1 #define DEBUGCTLMSR_BTF (1UL << 1) /* single-step on branches */ #define DEBUGCTLMSR_TR (1UL << 6) #define DEBUGCTLMSR_BTS (1UL << 7) --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -231,13 +231,15 @@ void __switch_to_xtra(struct task_struct propagate_user_return_notify(prev_p, next_p); - if ((tifp ^
tifn) & _TIF_BLOCKSTEP) { - unsigned long debugctl = get_debugctlmsr(); + if ((tifp & _TIF_BLOCKSTEP || tifn & _TIF_BLOCKSTEP) && + arch_has_block_step()) { + unsigned long debugctl, msk; + rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); debugctl &= ~DEBUGCTLMSR_BTF; - if (tifn & _TIF_BLOCKSTEP) - debugctl |= DEBUGCTLMSR_BTF; - update_debugctlmsr(debugctl); + msk = tifn & _TIF_BLOCKSTEP; + debugctl |= (msk >> TIF_BLOCKSTEP) << DEBUGCTLMSR_BTF_SHIFT; + wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); } if ((tifp ^ tifn) & _TIF_NOTSC) {
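The branch-free update of the BTF bit may be easier to follow in isolation. This is a userspace sketch only: DEMO_TIF_BLOCKSTEP is an illustrative bit index chosen for the example (the real TIF_BLOCKSTEP value depends on the kernel headers), demo_next_debugctl() is a hypothetical name, and plain variables stand in for the MSR read and write.

```c
/* Illustrative bit positions; DEMO_TIF_BLOCKSTEP is an assumption. */
#define DEMO_TIF_BLOCKSTEP      25
#define DEMO_TIF_BLOCKSTEP_MASK (1UL << DEMO_TIF_BLOCKSTEP)
#define DEMO_BTF_SHIFT          1
#define DEMO_BTF                (1UL << DEMO_BTF_SHIFT)

/*
 * Compute the new debug-control value for the incoming task without a
 * conditional: isolate the thread flag bit, shift it down to bit 0,
 * then shift it up into the BTF position.
 */
static unsigned long demo_next_debugctl(unsigned long debugctl,
                                        unsigned long tifn)
{
    unsigned long msk = tifn & DEMO_TIF_BLOCKSTEP_MASK;

    debugctl &= ~DEMO_BTF;                                    /* always clear */
    debugctl |= (msk >> DEMO_TIF_BLOCKSTEP) << DEMO_BTF_SHIFT; /* set if flagged */
    return debugctl;
}
```

Because the bit is always cleared and then rewritten from the next task's flags, the result no longer depends on whatever state the hardware left in the register.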
[PATCH 3.16 37/63] ext4: avoid running out of journal credits when appending to an inline file
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit 8bc1379b82b8e809eef77a9fedbb75c6c297be19 upstream. Use a separate journal transaction if it turns out that we need to convert an inline file to use a data block. Otherwise we could end up failing due to not having journal credits. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings --- --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2701,9 +2701,6 @@ extern struct buffer_head *ext4_get_firs extern int ext4_inline_data_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, int *has_inline); -extern int ext4_try_to_evict_inline_data(handle_t *handle, -struct inode *inode, -int needed); extern void ext4_inline_data_truncate(struct inode *inode, int *has_inline); extern int ext4_convert_inline_data(struct inode *inode); --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -877,11 +877,11 @@ retry_journal: } if (ret == -ENOSPC) { + ext4_journal_stop(handle); ret = ext4_da_convert_inline_data_to_extent(mapping, inode, flags, fsdata); - ext4_journal_stop(handle); if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries)) goto retry_journal; @@ -1839,42 +1839,6 @@ out: return (error < 0 ? error : 0); } -/* - * Called during xattr set, and if we can sparse space 'needed', - * just create the extent tree evict the data to the outer block. - * - * We use jbd2 instead of page cache to move data to the 1st block - * so that the whole transaction can be committed as a whole and - * the data isn't lost because of the delayed page cache write.
- */ -int ext4_try_to_evict_inline_data(handle_t *handle, - struct inode *inode, - int needed) -{ - int error; - struct ext4_xattr_entry *entry; - struct ext4_inode *raw_inode; - struct ext4_iloc iloc; - - error = ext4_get_inode_loc(inode, &iloc); - if (error) - return error; - - raw_inode = ext4_raw_inode(&iloc); - entry = (struct ext4_xattr_entry *)((void *)raw_inode + - EXT4_I(inode)->i_inline_off); - if (EXT4_XATTR_LEN(entry->e_name_len) + - EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size)) < needed) { - error = -ENOSPC; - goto out; - } - - error = ext4_convert_inline_data_nolock(handle, inode, &iloc); -out: - brelse(iloc.bh); - return error; -} - void ext4_inline_data_truncate(struct inode *inode, int *has_inline) { handle_t *handle; --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1028,22 +1028,8 @@ int ext4_xattr_ibody_inline_set(handle_t if (EXT4_I(inode)->i_extra_isize == 0) return -ENOSPC; error = ext4_xattr_set_entry(i, s, inode); - if (error) { - if (error == -ENOSPC && - ext4_has_inline_data(inode)) { - error = ext4_try_to_evict_inline_data(handle, inode, - EXT4_XATTR_LEN(strlen(i->name) + - EXT4_XATTR_SIZE(i->value_len))); - if (error) - return error; - error = ext4_xattr_ibody_find(inode, i, is); - if (error) - return error; - error = ext4_xattr_set_entry(i, s, inode); - } - if (error) - return error; - } + if (error) + return error; header = IHDR(inode, ext4_raw_inode(&is->iloc)); if (!IS_LAST_ENTRY(s->first)) { header->h_magic = cpu_to_le32(EXT4_XATTR_MAGIC);
[PATCH 3.16 55/63] seccomp: extract check/assign mode helpers
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Kees Cook commit 1f41b450416e689b9b7c8bfb750a98604f687a9b upstream. To support splitting mode 1 from mode 2, extract the mode checking and assignment logic into common functions. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski Signed-off-by: Ben Hutchings --- kernel/seccomp.c | 22 ++ 1 file changed, 18 insertions(+), 4 deletions(-) --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -194,7 +194,23 @@ static u32 seccomp_run_filters(int sysca } return ret; } +#endif /* CONFIG_SECCOMP_FILTER */ +static inline bool seccomp_may_assign_mode(unsigned long seccomp_mode) +{ + if (current->seccomp.mode && current->seccomp.mode != seccomp_mode) + return false; + + return true; +} + +static inline void seccomp_assign_mode(unsigned long seccomp_mode) +{ + current->seccomp.mode = seccomp_mode; + set_tsk_thread_flag(current, TIF_SECCOMP); +} + +#ifdef CONFIG_SECCOMP_FILTER /** * seccomp_attach_filter: Attaches a seccomp filter to current. * @fprog: BPF program to install @@ -490,8 +506,7 @@ static long seccomp_set_mode(unsigned lo { long ret = -EINVAL; - if (current->seccomp.mode && - current->seccomp.mode != seccomp_mode) + if (!seccomp_may_assign_mode(seccomp_mode)) goto out; switch (seccomp_mode) { @@ -512,8 +527,7 @@ static long seccomp_set_mode(unsigned lo goto out; } - current->seccomp.mode = seccomp_mode; - set_thread_flag(TIF_SECCOMP); + seccomp_assign_mode(seccomp_mode); out: return ret; }
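The rule encoded by seccomp_may_assign_mode() can be stated as a small truth table. Below is a userspace sketch with the current mode passed in explicitly instead of read from current->seccomp.mode. demo_may_assign_mode() is a hypothetical name; the mode values 1 (strict) and 2 (filter) match the SECCOMP_MODE_* constants, and 0 is taken to mean seccomp is not yet enabled.

```c
#include <stdbool.h>

/*
 * A mode may be assigned if seccomp is not enabled yet (mode 0) or if
 * the task is already in exactly the requested mode. Switching from
 * one enabled mode to a different one is refused.
 */
static bool demo_may_assign_mode(unsigned long current_mode,
                                 unsigned long new_mode)
{
    if (current_mode && current_mode != new_mode)
        return false;

    return true;
}
```

Keeping this check in one helper means the two entry points that the series later splits apart can share the same rule.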
[PATCH v4 13/19] Smack: Abstract use of inode security blob
Don't use the inode->i_security pointer directly. Provide a helper function that provides the security blob pointer. Signed-off-by: Casey Schaufler Reviewed-by: Kees Cook --- security/smack/smack.h | 9 +++-- security/smack/smack_lsm.c | 32 2 files changed, 23 insertions(+), 18 deletions(-) diff --git a/security/smack/smack.h b/security/smack/smack.h index 62a22ad8ce92..add19b7efc96 100644 --- a/security/smack/smack.h +++ b/security/smack/smack.h @@ -366,12 +366,17 @@ static inline struct smack_known **smack_file(const struct file *file) return file->f_security; } +static inline struct inode_smack *smack_inode(const struct inode *inode) +{ + return inode->i_security; +} + /* * Is the directory transmuting? */ static inline int smk_inode_transmutable(const struct inode *isp) { - struct inode_smack *sip = isp->i_security; + struct inode_smack *sip = smack_inode(isp); return (sip->smk_flags & SMK_INODE_TRANSMUTE) != 0; } @@ -380,7 +385,7 @@ static inline int smk_inode_transmutable(const struct inode *isp) */ static inline struct smack_known *smk_of_inode(const struct inode *isp) { - struct inode_smack *sip = isp->i_security; + struct inode_smack *sip = smack_inode(isp); return sip->smk_inode; } diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index d1430341798f..364699ad55b9 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -166,7 +166,7 @@ static int smk_bu_task(struct task_struct *otp, int mode, int rc) static int smk_bu_inode(struct inode *inode, int mode, int rc) { struct task_smack *tsp = smack_cred(current_cred()); - struct inode_smack *isp = inode->i_security; + struct inode_smack *isp = smack_inode(inode); char acc[SMK_NUM_ACCESS_TYPE + 1]; if (isp->smk_flags & SMK_INODE_IMPURE) @@ -198,7 +198,7 @@ static int smk_bu_file(struct file *file, int mode, int rc) struct task_smack *tsp = smack_cred(current_cred()); struct smack_known *sskp = tsp->smk_task; struct inode *inode = file_inode(file); - struct inode_smack *isp 
= inode->i_security; + struct inode_smack *isp = smack_inode(inode); char acc[SMK_NUM_ACCESS_TYPE + 1]; if (isp->smk_flags & SMK_INODE_IMPURE) @@ -228,7 +228,7 @@ static int smk_bu_credfile(const struct cred *cred, struct file *file, struct task_smack *tsp = smack_cred(cred); struct smack_known *sskp = tsp->smk_task; struct inode *inode = file_inode(file); - struct inode_smack *isp = inode->i_security; + struct inode_smack *isp = smack_inode(inode); char acc[SMK_NUM_ACCESS_TYPE + 1]; if (isp->smk_flags & SMK_INODE_IMPURE) @@ -824,7 +824,7 @@ static int smack_set_mnt_opts(struct super_block *sb, /* * Initialize the root inode. */ - isp = inode->i_security; + isp = smack_inode(inode); if (isp == NULL) { isp = new_inode_smack(sp->smk_root); if (isp == NULL) @@ -912,7 +912,7 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm) if (bprm->called_set_creds) return 0; - isp = inode->i_security; + isp = smack_inode(inode); if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task) return 0; @@ -992,7 +992,7 @@ static void smack_inode_free_rcu(struct rcu_head *head) */ static void smack_inode_free_security(struct inode *inode) { - struct inode_smack *issp = inode->i_security; + struct inode_smack *issp = smack_inode(inode); /* * The inode may still be referenced in a path walk and @@ -1020,7 +1020,7 @@ static int smack_inode_init_security(struct inode *inode, struct inode *dir, const struct qstr *qstr, const char **name, void **value, size_t *len) { - struct inode_smack *issp = inode->i_security; + struct inode_smack *issp = smack_inode(inode); struct smack_known *skp = smk_of_current(); struct smack_known *isp = smk_of_inode(inode); struct smack_known *dsp = smk_of_inode(dir); @@ -1358,7 +1358,7 @@ static void smack_inode_post_setxattr(struct dentry *dentry, const char *name, const void *value, size_t size, int flags) { struct smack_known *skp; - struct inode_smack *isp = d_backing_inode(dentry)->i_security; + struct inode_smack *isp = 
smack_inode(d_backing_inode(dentry)); if (strcmp(name, XATTR_NAME_SMACKTRANSMUTE) == 0) { isp->smk_flags |= SMK_INODE_TRANSMUTE; @@ -1439,7 +1439,7 @@ static int smack_inode_removexattr(struct dentry *dentry, const char *name) if (rc != 0) return rc; - isp = d_backing_inode(dentry)->i_security; + isp = smack_inode(d_backing_inode(dentry)
[PATCH v4 09/19] SELinux: Abstract use of file security blob
Don't use the file->f_security pointer directly. Provide a helper function that provides the security blob pointer. Signed-off-by: Casey Schaufler Reviewed-by: Kees Cook --- security/selinux/hooks.c | 18 +- security/selinux/include/objsec.h | 5 + 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index b629cc302088..641a8ce726ff 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -396,7 +396,7 @@ static int file_alloc_security(struct file *file) static void file_free_security(struct file *file) { - struct file_security_struct *fsec = file->f_security; + struct file_security_struct *fsec = selinux_file(file); file->f_security = NULL; kmem_cache_free(file_security_cache, fsec); } @@ -1879,7 +1879,7 @@ static int file_has_perm(const struct cred *cred, struct file *file, u32 av) { - struct file_security_struct *fsec = file->f_security; + struct file_security_struct *fsec = selinux_file(file); struct inode *inode = file_inode(file); struct common_audit_data ad; u32 sid = cred_sid(cred); @@ -2223,7 +2223,7 @@ static int selinux_binder_transfer_file(struct task_struct *from, struct file *file) { u32 sid = task_sid(to); - struct file_security_struct *fsec = file->f_security; + struct file_security_struct *fsec = selinux_file(file); struct dentry *dentry = file->f_path.dentry; struct inode_security_struct *isec; struct common_audit_data ad; @@ -3535,7 +3535,7 @@ static int selinux_revalidate_file_permission(struct file *file, int mask) static int selinux_file_permission(struct file *file, int mask) { struct inode *inode = file_inode(file); - struct file_security_struct *fsec = file->f_security; + struct file_security_struct *fsec = selinux_file(file); struct inode_security_struct *isec; u32 sid = current_sid(); @@ -3570,7 +3570,7 @@ static int ioctl_has_perm(const struct cred *cred, struct file *file, u32 requested, u16 cmd) { struct common_audit_data ad; - struct file_security_struct *fsec 
= file->f_security; + struct file_security_struct *fsec = selinux_file(file); struct inode *inode = file_inode(file); struct inode_security_struct *isec; struct lsm_ioctlop_audit ioctl; @@ -3822,7 +3822,7 @@ static void selinux_file_set_fowner(struct file *file) { struct file_security_struct *fsec; - fsec = file->f_security; + fsec = selinux_file(file); fsec->fown_sid = current_sid(); } @@ -3837,7 +3837,7 @@ static int selinux_file_send_sigiotask(struct task_struct *tsk, /* struct fown_struct is never outside the context of a struct file */ file = container_of(fown, struct file, f_owner); - fsec = file->f_security; + fsec = selinux_file(file); if (!signum) perm = signal_to_av(SIGIO); /* as per send_sigio_to_task */ @@ -3861,7 +3861,7 @@ static int selinux_file_open(struct file *file) struct file_security_struct *fsec; struct inode_security_struct *isec; - fsec = file->f_security; + fsec = selinux_file(file); isec = inode_security(file_inode(file)); /* * Save inode label and policy sequence number @@ -4000,7 +4000,7 @@ static int selinux_kernel_module_from_file(struct file *file) ad.type = LSM_AUDIT_DATA_FILE; ad.u.file = file; - fsec = file->f_security; + fsec = selinux_file(file); if (sid != fsec->sid) { rc = avc_has_perm(&selinux_state, sid, fsec->sid, SECCLASS_FD, FD__USE, &ad); diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h index ad511c3d2eb7..cad8b765f6dd 100644 --- a/security/selinux/include/objsec.h +++ b/security/selinux/include/objsec.h @@ -165,4 +165,9 @@ static inline struct task_security_struct *selinux_cred(const struct cred *cred) return cred->security; } +static inline struct file_security_struct *selinux_file(const struct file *file) +{ + return file->f_security; +} + #endif /* _SELINUX_OBJSEC_H_ */ -- 2.17.1
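As a rough illustration of what the accessor buys: once every user goes through one helper, the way the blob is located can change in a single place. The sketch below is userspace C with hypothetical demo_* types; it does not reproduce the SELinux structures, and the helper body simply mirrors the one-line accessor added in this patch.

```c
/* Hypothetical stand-ins for struct file and its security blob. */
struct demo_file { void *f_security; };
struct demo_file_sec { unsigned int sid; };

/* Single point of access, mirroring the shape of selinux_file(). */
static inline struct demo_file_sec *demo_selinux_file(const struct demo_file *file)
{
    return file->f_security;
}

/* Callers never touch f_security directly. */
static unsigned int demo_file_sid(const struct demo_file *file)
{
    return demo_selinux_file(file)->sid;
}
```

If the blob were later stored differently, only demo_selinux_file() would need to change in this sketch; demo_file_sid() and any similar callers would stay as they are.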
[PATCH 3.16 36/63] jbd2: don't mark block as modified if the handle is out of credits
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit e09463f220ca9a1a1ecfda84fcda658f99a1f12a upstream. Do not set the b_modified flag in the block's journal head until after we're sure that jbd2_journal_dirty_metadata() will not abort with an error due to there not being enough space reserved in the jbd2 handle. Otherwise, future attempts to modify the buffer may lead to a large number of spurious errors and warnings. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: Drop the added logging statement, as it's on a code path that doesn't exist here] Signed-off-by: Ben Hutchings --- --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1288,11 +1288,11 @@ int jbd2_journal_dirty_metadata(handle_t * of the transaction. This needs to be done * once a transaction -bzzz */ - jh->b_modified = 1; if (handle->h_buffer_credits <= 0) { ret = -ENOSPC; goto out_unlock_bh; } + jh->b_modified = 1; handle->h_buffer_credits--; }
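The reordering can be modelled in a few lines of userspace C. This is a sketch under assumptions: the demo_* structs are hypothetical, locking is left out, and the outer "not yet modified" test is assumed from the surrounding kernel code rather than shown in the hunk above.

```c
#include <errno.h>

/* Hypothetical stand-ins for the jbd2 handle and journal head. */
struct demo_handle { int credits; };
struct demo_jhead { int modified; };

/*
 * Charge one credit the first time a buffer is dirtied in a
 * transaction. The modified flag is set only after the credit check
 * has passed, so a failed call leaves the journal head untouched.
 */
static int demo_dirty_metadata(struct demo_handle *h, struct demo_jhead *jh)
{
    if (!jh->modified) {
        if (h->credits <= 0)
            return -ENOSPC;     /* bail out before marking */
        jh->modified = 1;
        h->credits--;
    }
    return 0;
}
```

With the old ordering, a failed call would still leave the flag set, so a later call on the same buffer would skip the credit accounting entirely.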
[PATCH 3.16 58/63] x86/process: Optimize TIF checks in __switch_to_xtra()
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Kyle Huey commit af8b3cd3934ec60f4c2a420d19a9d416554f140b upstream. Help the compiler to avoid reevaluating the thread flags for each checked bit by reordering the bit checks and providing an explicit xor for evaluation. With default defconfigs for each arch, x86_64: arch/x86/kernel/process.o
text data bss dec hex
3056 8577 16 11649 2d81 Before
3024 8577 16 11617 2d61 After
i386: arch/x86/kernel/process.o
text data bss dec hex
2957 8673 8 11638 2d76 Before
2925 8673 8 11606 2d56 After
Originally-by: Thomas Gleixner Signed-off-by: Kyle Huey Cc: Peter Zijlstra Cc: Andy Lutomirski Link: http://lkml.kernel.org/r/20170214081104.9244-2-kh...@kylehuey.com Signed-off-by: Thomas Gleixner [bwh: Backported to 3.16: - We don't do refresh_tr_limit() here - Use ACCESS_ONCE() instead of READ_ONCE()] Signed-off-by: Ben Hutchings --- arch/x86/kernel/process.c | 54 ++- 1 file changed, 31 insertions(+), 23 deletions(-) --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -196,48 +196,56 @@ int set_tsc_mode(unsigned int val) return 0; } +static inline void switch_to_bitmap(struct tss_struct *tss, + struct thread_struct *prev, + struct thread_struct *next, + unsigned long tifp, unsigned long tifn) +{ + if (tifn & _TIF_IO_BITMAP) { + /* +* Copy the relevant range of the IO bitmap.
+* Normally this is 128 bytes or less: +*/ + memcpy(tss->io_bitmap, next->io_bitmap_ptr, + max(prev->io_bitmap_max, next->io_bitmap_max)); + } else if (tifp & _TIF_IO_BITMAP) { + /* +* Clear any possible leftover bits: +*/ + memset(tss->io_bitmap, 0xff, prev->io_bitmap_max); + } +} + void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, struct tss_struct *tss) { struct thread_struct *prev, *next; + unsigned long tifp, tifn; prev = &prev_p->thread; next = &next_p->thread; - if (test_tsk_thread_flag(prev_p, TIF_BLOCKSTEP) ^ - test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) { + tifn = ACCESS_ONCE(task_thread_info(next_p)->flags); + tifp = ACCESS_ONCE(task_thread_info(prev_p)->flags); + switch_to_bitmap(tss, prev, next, tifp, tifn); + + propagate_user_return_notify(prev_p, next_p); + + if ((tifp ^ tifn) & _TIF_BLOCKSTEP) { unsigned long debugctl = get_debugctlmsr(); debugctl &= ~DEBUGCTLMSR_BTF; - if (test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) + if (tifn & _TIF_BLOCKSTEP) debugctl |= DEBUGCTLMSR_BTF; - update_debugctlmsr(debugctl); } - if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^ - test_tsk_thread_flag(next_p, TIF_NOTSC)) { - /* prev and next are different */ - if (test_tsk_thread_flag(next_p, TIF_NOTSC)) + if ((tifp ^ tifn) & _TIF_NOTSC) { + if (tifn & _TIF_NOTSC) hard_disable_TSC(); else hard_enable_TSC(); } - - if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) { - /* -* Copy the relevant range of the IO bitmap. -* Normally this is 128 bytes or less: -*/ - memcpy(tss->io_bitmap, next->io_bitmap_ptr, - max(prev->io_bitmap_max, next->io_bitmap_max)); - } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) { - /* -* Clear any possible leftover bits: -*/ - memset(tss->io_bitmap, 0xff, prev->io_bitmap_max); - } - propagate_user_return_notify(prev_p, next_p); } /*
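The "explicit xor" mentioned in the changelog boils down to reading each flags word once and testing single bits of the difference. A userspace sketch follows; the DEMO_* flag bits and demo_flag_changed() are hypothetical and the bit positions are arbitrary.

```c
/* Arbitrary illustrative flag bits, not the kernel's _TIF_* values. */
#define DEMO_TIF_BLOCKSTEP (1UL << 0)
#define DEMO_TIF_NOTSC     (1UL << 1)

/*
 * tifp and tifn are snapshots of the previous and next task's flags.
 * A bit is set in (tifp ^ tifn) exactly when the two tasks differ in
 * that flag, so one xor serves every per-flag check.
 */
static int demo_flag_changed(unsigned long tifp, unsigned long tifn,
                             unsigned long mask)
{
    return ((tifp ^ tifn) & mask) != 0;
}
```

Taking the two snapshots up front is what lets the compiler keep them in registers instead of reloading the flags for every test.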
[PATCH 3.16 15/63] KVM: x86: introduce linear_{read,write}_system
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Paolo Bonzini

commit 79367a65743975e5cac8d24d08eccc7fdae832b0 upstream.

Wrap the common invocation of ctxt->ops->read_std and ctxt->ops->write_std, so
as to have a smaller patch when the functions grow another argument.

Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12)
Signed-off-by: Paolo Bonzini
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings
---
 arch/x86/kvm/emulate.c | 64 +-
 1 file changed, 32 insertions(+), 32 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -731,6 +731,19 @@ static int linearize(struct x86_emulate_
 }
 
 
+static int linear_read_system(struct x86_emulate_ctxt *ctxt, ulong linear,
+			      void *data, unsigned size)
+{
+	return ctxt->ops->read_std(ctxt, linear, data, size, &ctxt->exception);
+}
+
+static int linear_write_system(struct x86_emulate_ctxt *ctxt,
+			       ulong linear, void *data,
+			       unsigned int size)
+{
+	return ctxt->ops->write_std(ctxt, linear, data, size, &ctxt->exception);
+}
+
 static int segmented_read_std(struct x86_emulate_ctxt *ctxt,
 			      struct segmented_address addr,
 			      void *data,
@@ -1394,8 +1407,7 @@ static int read_interrupt_descriptor(str
 		return emulate_gp(ctxt, index << 3 | 0x2);
 
 	addr = dt.address + index * 8;
-	return ctxt->ops->read_std(ctxt, addr, desc, sizeof *desc,
-				   &ctxt->exception);
+	return linear_read_system(ctxt, addr, desc, sizeof *desc);
 }
 
 static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
@@ -1432,8 +1444,7 @@ static int read_segment_descriptor(struc
 		return emulate_gp(ctxt, selector & 0xfffc);
 
 	*desc_addr_p = addr = dt.address + index * 8;
-	return ctxt->ops->read_std(ctxt, addr, desc, sizeof *desc,
-				   &ctxt->exception);
+	return linear_read_system(ctxt, addr, desc, sizeof(*desc));
 }
 
 /* allowed just for 8 bytes segments */
@@ -1450,8 +1461,7 @@ static int write_segment_descriptor(stru
 		return emulate_gp(ctxt, selector & 0xfffc);
 
 	addr = dt.address + index * 8;
-	return ctxt->ops->write_std(ctxt, addr, desc, sizeof *desc,
-				    &ctxt->exception);
+	return linear_write_system(ctxt, addr, desc, sizeof *desc);
 }
 
 static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
@@ -1599,8 +1609,7 @@ static int __load_segment_descriptor(str
 		if (ret != X86EMUL_CONTINUE)
 			return ret;
 	} else if (ctxt->mode == X86EMUL_MODE_PROT64) {
-		ret = ctxt->ops->read_std(ctxt, desc_addr+8, &base3,
-				sizeof(base3), &ctxt->exception);
+		ret = linear_read_system(ctxt, desc_addr+8, &base3, sizeof(base3));
 		if (ret != X86EMUL_CONTINUE)
 			return ret;
 	}
@@ -1917,11 +1926,11 @@ static int __emulate_int_real(struct x86
 	eip_addr = dt.address + (irq << 2);
 	cs_addr = dt.address + (irq << 2) + 2;
 
-	rc = ops->read_std(ctxt, cs_addr, &cs, 2, &ctxt->exception);
+	rc = linear_read_system(ctxt, cs_addr, &cs, 2);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	rc = ops->read_std(ctxt, eip_addr, &eip, 2, &ctxt->exception);
+	rc = linear_read_system(ctxt, eip_addr, &eip, 2);
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
@@ -2573,27 +2582,23 @@ static int task_switch_16(struct x86_emu
 			  u16 tss_selector, u16 old_tss_sel,
 			  ulong old_tss_base, struct desc_struct *new_desc)
 {
-	const struct x86_emulate_ops *ops = ctxt->ops;
 	struct tss_segment_16 tss_seg;
 	int ret;
 	u32 new_tss_base = get_desc_base(new_desc);
 
-	ret = ops->read_std(ctxt, old_tss_base, &tss_seg, sizeof tss_seg,
-			    &ctxt->exception);
+	ret = linear_read_system(ctxt, old_tss_base, &tss_seg, sizeof tss_seg);
 	if (ret != X86EMUL_CONTINUE)
 		/* FIXME: need to provide precise fault address */
 		return ret;
 
 	save_state_to_tss16(ctxt, &tss_seg);
 
-	ret = ops->write_std(ctxt, old_tss_base, &tss_seg, sizeof tss_seg,
-			     &ctxt->exception);
+	ret = linear_write_system(ctxt, old_tss_base, &tss_seg, sizeof tss_seg);
 	if (ret != X86EMUL_CONTINUE)
 		/* FIXME: need to provide precise fault address */
 		return ret;
 
-	ret = ops->read_std(ctxt, new_tss_base, &tss_seg, sizeof tss_seg,
-			    &ctxt->exception);
+	ret = linear_read_system(ctxt, new_tss_base, &tss_seg
[PATCH 3.16 43/63] x86/speculation: Clean up various Spectre related details
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Ingo Molnar

commit 21e433bdb95bdf3aa48226fd3d33af608437f293 upstream.

Harmonize all the Spectre messages so that a:

    dmesg | grep -i spectre

... gives us most Spectre related kernel boot messages.

Also fix a few other details:

 - clarify a comment about firmware speculation control

 - s/KPTI/PTI

 - remove various line-breaks that made the code uglier

Acked-by: David Woodhouse
Cc: Andy Lutomirski
Cc: Arjan van de Ven
Cc: Borislav Petkov
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Woodhouse
Cc: Greg Kroah-Hartman
Cc: Josh Poimboeuf
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar
Signed-off-by: Greg Kroah-Hartman
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings
---
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -224,8 +224,7 @@ static enum spectre_v2_mitigation_cmd __
 	if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
 		return SPECTRE_V2_CMD_NONE;
 	else {
-		ret = cmdline_find_option(boot_command_line, "spectre_v2", arg,
-					  sizeof(arg));
+		ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg));
 		if (ret < 0)
 			return SPECTRE_V2_CMD_AUTO;
 
@@ -246,8 +245,7 @@ static enum spectre_v2_mitigation_cmd __
 	     cmd == SPECTRE_V2_CMD_RETPOLINE_AMD ||
 	     cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) &&
 	    !IS_ENABLED(CONFIG_RETPOLINE)) {
-		pr_err("%s selected but not compiled in. Switching to AUTO select\n",
-		       mitigation_options[i].option);
+		pr_err("%s selected but not compiled in. Switching to AUTO select\n", mitigation_options[i].option);
 		return SPECTRE_V2_CMD_AUTO;
 	}
 
@@ -317,14 +315,14 @@ static void __init spectre_v2_select_mit
 			goto retpoline_auto;
 		break;
 	}
-	pr_err("kernel not compiled with retpoline; no mitigation available!");
+	pr_err("Spectre mitigation: kernel not compiled with retpoline; no mitigation available!");
 	return;
 
 retpoline_auto:
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
 	retpoline_amd:
 		if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) {
-			pr_err("LFENCE not serializing. Switching to generic retpoline\n");
+			pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n");
 			goto retpoline_generic;
 		}
 		mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD :
@@ -342,7 +340,7 @@ retpoline_auto:
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
 	/*
-	 * If neither SMEP or KPTI are available, there is a risk of
+	 * If neither SMEP nor PTI are available, there is a risk of
 	 * hitting userspace addresses in the RSB after a context switch
 	 * from a shallow call stack to a deeper one. To prevent this fill
 	 * the entire RSB, even when using IBRS.
@@ -356,13 +354,13 @@ retpoline_auto:
 	if ((!boot_cpu_has(X86_FEATURE_KAISER) &&
 	     !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
 		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-		pr_info("Filling RSB on context switch\n");
+		pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
 	}
 
 	/* Initialize Indirect Branch Prediction Barrier if supported */
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
 		setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
-		pr_info("Enabling Indirect Branch Prediction Barrier\n");
+		pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n");
 	}
 
 	/*
@@ -378,8 +376,7 @@ retpoline_auto:
 #undef pr_fmt
 
 #ifdef CONFIG_SYSFS
-ssize_t cpu_show_meltdown(struct device *dev,
-			  struct device_attribute *attr, char *buf)
+ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
 		return sprintf(buf, "Not affected\n");
@@ -388,16 +385,14 @@ ssize_t cpu_show_meltdown(struct device
 	return sprintf(buf, "Vulnerable\n");
 }
 
-ssize_t cpu_show_spectre_v1(struct device *dev,
-			    struct device_attribute *attr, char *buf)
+ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf)
 {
 	if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1))
 		return sprintf(buf, "Not affected\n");
 	return sprintf(buf, "Mitigation: __user pointer sanitization\n");
 }
 
-ssize_t cpu_show_spectre_v2(struct device *dev,
-			    st
[PATCH 3.16 46/63] cdrom: Fix info leak/OOB read in cdrom_ioctl_drive_status
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Scott Bauer

commit 8f3fafc9c2f0ece10832c25f7ffcb07c97a32ad4 upstream.

Like d88b6d04: "cdrom: information leak in cdrom_ioctl_media_changed()"

There is another cast from unsigned long to int which causes
a bounds check to fail with specially crafted input. The value is
then used as an index in the slot array in cdrom_slot_status().

Signed-off-by: Scott Bauer
Signed-off-by: Scott Bauer
Signed-off-by: Jens Axboe
Signed-off-by: Ben Hutchings
---
 drivers/cdrom/cdrom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2528,7 +2528,7 @@ static int cdrom_ioctl_drive_status(stru
 	if (!CDROM_CAN(CDC_SELECT_DISC) ||
 	    (arg == CDSL_CURRENT || arg == CDSL_NONE))
 		return cdi->ops->drive_status(cdi, CDSL_CURRENT);
-	if (((int)arg >= cdi->capacity))
+	if (arg >= cdi->capacity)
 		return -EINVAL;
 	return cdrom_slot_status(cdi, arg);
 }
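Note on why the cast matters: narrowing an unsigned long to int before comparing lets a huge value turn negative and slip under the capacity limit. The sketch below is a userspace illustration only. The function names are hypothetical, capacity is assumed to be non-negative, and the out-of-range conversion is implementation-defined in ISO C (gcc and clang wrap, which is the behaviour shown here).

```c
#include <limits.h>
#include <stdbool.h>

/* Pre-fix check: the unsigned long argument is narrowed to int first. */
static bool slot_rejected_old(unsigned long arg, int capacity)
{
	return (int)arg >= capacity;
}

/* Post-fix check: compare as unsigned long, so large values stay large. */
static bool slot_rejected_new(unsigned long arg, int capacity)
{
	return arg >= (unsigned long)capacity;
}
```

With a capacity of 4, ULONG_MAX passes the old check (it becomes -1 after the cast on common compilers) and is rejected by the new one.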
[PATCH 3.16 44/63] x86/speculation: Protect against userspace-userspace spectreRSB
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Jiri Kosina

commit fdf82a7856b32d905c39afc85e34364491e46346 upstream.

The article "Spectre Returns! Speculation Attacks using the Return Stack
Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks,
making use solely of the RSB contents even on CPUs that don't fallback to
BTB on RSB underflow (Skylake+).

Mitigate userspace-userspace attacks by always unconditionally filling RSB on
context switch when the generic spectrev2 mitigation has been enabled.

[1] https://arxiv.org/pdf/1807.07940.pdf

Signed-off-by: Jiri Kosina
Signed-off-by: Thomas Gleixner
Reviewed-by: Josh Poimboeuf
Acked-by: Tim Chen
Cc: Konrad Rzeszutek Wilk
Cc: Borislav Petkov
Cc: David Woodhouse
Cc: Peter Zijlstra
Cc: Linus Torvalds
Link: https://lkml.kernel.org/r/nycvar.yfh.7.76.1807261308190@cbobk.fhfr.pm
Signed-off-by: Ben Hutchings
---
 arch/x86/kernel/cpu/bugs.c | 38 +++---
 1 file changed, 7 insertions(+), 31 deletions(-)

--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -263,23 +263,6 @@ static enum spectre_v2_mitigation_cmd __
 	return cmd;
 }
 
-/* Check for Skylake-like CPUs (for RSB handling) */
-static bool __init is_skylake_era(void)
-{
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
-	    boot_cpu_data.x86 == 6) {
-		switch (boot_cpu_data.x86_model) {
-		case INTEL_FAM6_SKYLAKE_MOBILE:
-		case INTEL_FAM6_SKYLAKE_DESKTOP:
-		case INTEL_FAM6_SKYLAKE_X:
-		case INTEL_FAM6_KABYLAKE_MOBILE:
-		case INTEL_FAM6_KABYLAKE_DESKTOP:
-			return true;
-		}
-	}
-	return false;
-}
-
 static void __init spectre_v2_select_mitigation(void)
 {
 	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -340,22 +323,15 @@ retpoline_auto:
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
 	/*
-	 * If neither SMEP nor PTI are available, there is a risk of
-	 * hitting userspace addresses in the RSB after a context switch
-	 * from a shallow call stack to a deeper one. To prevent this fill
-	 * the entire RSB, even when using IBRS.
+	 * If spectre v2 protection has been enabled, unconditionally fill
+	 * RSB during a context switch; this protects against two independent
+	 * issues:
 	 *
-	 * Skylake era CPUs have a separate issue with *underflow* of the
-	 * RSB, when they will predict 'ret' targets from the generic BTB.
-	 * The proper mitigation for this is IBRS. If IBRS is not supported
-	 * or deactivated in favour of retpolines the RSB fill on context
-	 * switch is required.
+	 *	- RSB underflow (and switch to BTB) on Skylake+
+	 *	- SpectreRSB variant of spectre v2 on X86_BUG_SPECTRE_V2 CPUs
 	 */
-	if ((!boot_cpu_has(X86_FEATURE_KAISER) &&
-	     !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) {
-		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-		pr_info("Spectre v2 mitigation: Filling RSB on context switch\n");
-	}
+	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
 
 	/* Initialize Indirect Branch Prediction Barrier if supported */
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
[PATCH 3.16 33/63] ext4: never move the system.data xattr out of the inode body
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Theodore Ts'o

commit 8cdb5240ec5928b20490a2bb34cb87e9a5f40226 upstream.

When expanding the extra isize space, we must never move the
system.data xattr out of the inode body. For performance reasons, it
doesn't make any sense, and the inline data implementation assumes
that system.data xattr is never in the external xattr block.

This addresses CVE-2018-10880

https://bugzilla.kernel.org/show_bug.cgi?id=25

Signed-off-by: Theodore Ts'o
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings
---
 fs/ext4/xattr.c | 5 +
 1 file changed, 5 insertions(+)

--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1370,6 +1370,11 @@ retry:
 		/* Find the entry best suited to be pushed into EA block */
 		entry = NULL;
 		for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) {
+			/* never move system.data out of the inode */
+			if ((last->e_name_len == 4) &&
+			    (last->e_name_index == EXT4_XATTR_INDEX_SYSTEM) &&
+			    !memcmp(last->e_name, "data", 4))
+				continue;
 			total_size =
 			EXT4_XATTR_SIZE(le32_to_cpu(last->e_value_size)) +
 					EXT4_XATTR_LEN(last->e_name_len);
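Note on the match in the hunk above: the entry is identified by three things together, its name length, its namespace index, and a length-bounded memcmp(). The check uses an explicit length rather than a string compare. A rough userspace sketch follows. DEMO_INDEX_SYSTEM and is_system_data() are illustrative stand-ins, not ext4 definitions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for EXT4_XATTR_INDEX_SYSTEM. */
#define DEMO_INDEX_SYSTEM 7

/*
 * True only for the "data" attribute in the system namespace.
 * The length is checked first so memcmp() never reads past the name.
 */
static bool is_system_data(unsigned int name_len, unsigned int name_index,
			   const char *name)
{
	return name_len == 4 &&
	       name_index == DEMO_INDEX_SYSTEM &&
	       memcmp(name, "data", 4) == 0;
}
```

An entry named "data" in another namespace, or a longer name that merely starts with "data", does not match and stays eligible for being moved.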
[PATCH 3.16 41/63] USB: yurex: fix out-of-bounds uaccess in read handler
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Jann Horn

commit f1e255d60ae66a9f672ff9a207ee6cd8e33d2679 upstream.

In general, accessing userspace memory beyond the length of the supplied
buffer in VFS read/write handlers can lead to both kernel memory corruption
(via kernel_read()/kernel_write(), which can e.g. be triggered via
sys_splice()) and privilege escalation inside userspace.

Fix it by using simple_read_from_buffer() instead of custom logic.

Fixes: 6bc235a2e24a ("USB: add driver for Meywa-Denki & Kayac YUREX")
Signed-off-by: Jann Horn
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Ben Hutchings
---
 drivers/usb/misc/yurex.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

--- a/drivers/usb/misc/yurex.c
+++ b/drivers/usb/misc/yurex.c
@@ -413,8 +413,7 @@ static int yurex_release(struct inode *i
 static ssize_t yurex_read(struct file *file, char *buffer, size_t count, loff_t *ppos)
 {
 	struct usb_yurex *dev;
-	int retval = 0;
-	int bytes_read = 0;
+	int len = 0;
 	char in_buffer[20];
 	unsigned long flags;
 
@@ -422,26 +421,16 @@ static ssize_t yurex_read(struct file *f
 
 	mutex_lock(&dev->io_mutex);
 	if (!dev->interface) {		/* already disconnected */
-		retval = -ENODEV;
-		goto exit;
+		mutex_unlock(&dev->io_mutex);
+		return -ENODEV;
 	}
 
 	spin_lock_irqsave(&dev->lock, flags);
-	bytes_read = snprintf(in_buffer, 20, "%lld\n", dev->bbu);
+	len = snprintf(in_buffer, 20, "%lld\n", dev->bbu);
 	spin_unlock_irqrestore(&dev->lock, flags);
-
-	if (*ppos < bytes_read) {
-		if (copy_to_user(buffer, in_buffer + *ppos, bytes_read - *ppos))
-			retval = -EFAULT;
-		else {
-			retval = bytes_read - *ppos;
-			*ppos += bytes_read;
-		}
-	}
-
-exit:
 	mutex_unlock(&dev->io_mutex);
-	return retval;
+
+	return simple_read_from_buffer(buffer, count, ppos, in_buffer, len);
 }
 
 static ssize_t yurex_write(struct file *file, const char *user_buffer, size_t count, loff_t *ppos)
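Note on what the removed code got wrong: it copied bytes_read - *ppos bytes without ever looking at count, so a caller with a short buffer got more than it asked for. The sketch below is a userspace approximation of the clamping behaviour one gets from a helper like simple_read_from_buffer(). bounded_read() is a made-up name, memcpy() stands in for copy_to_user(), and the fault path is not modelled.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most `count` bytes starting at *ppos, never reading past
 * `available`. Returns the number of bytes copied, 0 at end of data,
 * or -1 for a negative position (the kernel helper returns -EINVAL).
 */
static long bounded_read(char *to, size_t count, long long *ppos,
			 const char *from, size_t available)
{
	long long pos = *ppos;

	if (pos < 0)
		return -1;
	if ((size_t)pos >= available || count == 0)
		return 0;
	/* Clamp to what is left in the source buffer. */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, from + pos, count);
	*ppos = pos + (long long)count;
	return (long)count;
}
```

A 2-byte read of a 4-byte source returns 2 and advances the position by 2; the next read returns the remaining 2 bytes, and the one after that returns 0.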
[PATCH 3.16 29/63] ext4: make sure bitmaps and the inode table don't overlap with bg descriptors
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Theodore Ts'o

commit 77260807d1170a8cf35dbb06e07461a655f67eee upstream.

It's really bad when the allocation bitmaps and the inode table
overlap with the block group descriptors, since it causes random
corruption of the bg descriptors. So we really want to head those off
at the pass.

https://bugzilla.kernel.org/show_bug.cgi?id=199865

Signed-off-by: Theodore Ts'o
[bwh: Backported to 3.16: Open-code sb_rdonly()]
Signed-off-by: Ben Hutchings
---
 fs/ext4/super.c | 25 +
 1 file changed, 25 insertions(+)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2086,6 +2086,7 @@ static int ext4_check_descriptors(struct
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	ext4_fsblk_t first_block = le32_to_cpu(sbi->s_es->s_first_data_block);
 	ext4_fsblk_t last_block;
+	ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0) + 1;
 	ext4_fsblk_t block_bitmap;
 	ext4_fsblk_t inode_bitmap;
 	ext4_fsblk_t inode_table;
@@ -2118,6 +2119,14 @@ static int ext4_check_descriptors(struct
 			if (!(sb->s_flags & MS_RDONLY))
 				return 0;
 		}
+		if (block_bitmap >= sb_block + 1 &&
+		    block_bitmap <= last_bg_block) {
+			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
+				 "Block bitmap for group %u overlaps "
+				 "block group descriptors", i);
+			if (!(sb->s_flags & MS_RDONLY))
+				return 0;
+		}
 		if (block_bitmap < first_block || block_bitmap > last_block) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 			       "Block bitmap for group %u not in group "
@@ -2132,6 +2141,14 @@ static int ext4_check_descriptors(struct
 			if (!(sb->s_flags & MS_RDONLY))
 				return 0;
 		}
+		if (inode_bitmap >= sb_block + 1 &&
+		    inode_bitmap <= last_bg_block) {
+			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
+				 "Inode bitmap for group %u overlaps "
+				 "block group descriptors", i);
+			if (!(sb->s_flags & MS_RDONLY))
+				return 0;
+		}
 		if (inode_bitmap < first_block || inode_bitmap > last_block) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 			       "Inode bitmap for group %u not in group "
@@ -2146,6 +2163,14 @@ static int ext4_check_descriptors(struct
 			if (!(sb->s_flags & MS_RDONLY))
 				return 0;
 		}
+		if (inode_table >= sb_block + 1 &&
+		    inode_table <= last_bg_block) {
+			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
+				 "Inode table for group %u overlaps "
+				 "block group descriptors", i);
+			if (!(sb->s_flags & MS_RDONLY))
+				return 0;
+		}
 		if (inode_table < first_block ||
 		    inode_table + sbi->s_itb_per_group - 1 > last_block) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
[PATCH 3.16 30/63] ext4: fix false negatives *and* false positives in ext4_check_descriptors()
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Theodore Ts'o

commit 44de022c4382541cebdd6de4465d1f4f465ff1dd upstream.

Ext4_check_descriptors() was getting called before s_gdb_count was
initialized. So for file systems w/o the meta_bg feature, allocation
bitmaps could overlap the block group descriptors and ext4 wouldn't
notice.

For file systems with the meta_bg feature enabled, there was a
fencepost error which would cause the ext4_check_descriptors() to
incorrectly believe that the block allocation bitmap overlaps with the
block group descriptor blocks, and it would reject the mount.

Fix both of these problems.

Signed-off-by: Theodore Ts'o
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings
---
 fs/ext4/super.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2086,7 +2086,7 @@ static int ext4_check_descriptors(struct
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	ext4_fsblk_t first_block = le32_to_cpu(sbi->s_es->s_first_data_block);
 	ext4_fsblk_t last_block;
-	ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0) + 1;
+	ext4_fsblk_t last_bg_block = sb_block + ext4_bg_num_gdb(sb, 0);
 	ext4_fsblk_t block_bitmap;
 	ext4_fsblk_t inode_bitmap;
 	ext4_fsblk_t inode_table;
@@ -3987,6 +3987,7 @@ static int ext4_fill_super(struct super_
 			goto failed_mount2;
 		}
 	}
+	sbi->s_gdb_count = db_count;
 	if (!ext4_check_descriptors(sb, logical_sb_block, &first_not_zeroed)) {
 		ext4_msg(sb, KERN_ERR, "group descriptors corrupted!");
 		goto failed_mount2;
@@ -3999,7 +4000,6 @@ static int ext4_fill_super(struct super_
 		goto failed_mount2;
 	}
 
-	sbi->s_gdb_count = db_count;
 	get_random_bytes(&sbi->s_next_generation, sizeof(u32));
 	spin_lock_init(&sbi->s_next_gen_lock);
 
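Note on the fencepost: the overlap test is the closed range [sb_block + 1, last_bg_block], so adding an extra + 1 to last_bg_block flags the first block after the descriptors as an overlap. The sketch below only illustrates that off-by-one. The function names are hypothetical, and the layout (descriptor blocks directly following the superblock) is an assumption of the example.

```c
#include <stdbool.h>

/* Fixed bound: descriptors occupy sb_block + 1 .. sb_block + num_gdb. */
static bool overlaps_descriptors(unsigned long long blk,
				 unsigned long long sb_block,
				 unsigned long long num_gdb)
{
	unsigned long long last_bg_block = sb_block + num_gdb;

	return blk >= sb_block + 1 && blk <= last_bg_block;
}

/* Pre-fix bound: the range reaches one block too far. */
static bool overlaps_descriptors_old(unsigned long long blk,
				     unsigned long long sb_block,
				     unsigned long long num_gdb)
{
	unsigned long long last_bg_block = sb_block + num_gdb + 1;

	return blk >= sb_block + 1 && blk <= last_bg_block;
}
```

With the superblock at block 0 and two descriptor blocks, a bitmap at block 3 is legitimate; the old bound reports it as overlapping, the fixed bound does not.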
Re: [PATCH] watchdog/hpwdt: Disable PreTimeout when Timeout is smaller
On 09/21/2018 01:50 PM, Jerry Hoemann wrote:
> During module install, disable pretimeout if the requested timeout
> value is not greater than the minimal pretimeout value that is
> supported by hardware.
>
> This makes the module load handling of pretimeout consistent with
> the ioctl handling of pretimeout.
>
> Signed-off-by: Jerry Hoemann

Reviewed-by: Guenter Roeck
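Note on the rule described above, as a minimal sketch: a pretimeout only makes sense when the timeout is strictly greater than the smallest pretimeout the hardware can deliver. DEMO_PRETIMEOUT_SEC and effective_pretimeout() are hypothetical names, and the value 9 is an assumption for the example, not taken from the patch.

```c
#include <stdbool.h>

/* Assumed minimal hardware pretimeout in seconds (illustrative value). */
#define DEMO_PRETIMEOUT_SEC 9

/*
 * Return the pretimeout to program: zero (disabled) unless it was
 * requested and the timeout is strictly greater than the minimum.
 */
static unsigned int effective_pretimeout(unsigned int timeout, bool requested)
{
	if (!requested || timeout <= DEMO_PRETIMEOUT_SEC)
		return 0;
	return DEMO_PRETIMEOUT_SEC;
}
```

Applying the same helper on both the module-load path and the ioctl path is one way to keep the two consistent, which is the stated goal of the patch.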
[PATCH v4 11/19] LSM: Infrastructure management of the file security
Move management of the file->f_security blob out of the
individual security modules and into the infrastructure.
The modules no longer allocate or free the data, instead
they tell the infrastructure how much space they require.

Signed-off-by: Casey Schaufler
---
 include/linux/lsm_hooks.h  |  1 +
 security/apparmor/lsm.c    | 19 +++---
 security/security.c        | 54 +++---
 security/selinux/hooks.c   | 25 ++
 security/smack/smack.h     |  2 +-
 security/smack/smack_lsm.c | 14 +-
 6 files changed, 66 insertions(+), 49 deletions(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 0bef312efd45..167ffbd4d0c0 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -2029,6 +2029,7 @@ struct security_hook_list {
  */
 struct lsm_blob_sizes {
 	int	lbs_cred;
+	int	lbs_file;
 };
 
 /*
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index c2566aaa138e..15716b6ff860 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -431,21 +431,21 @@ static int apparmor_file_open(struct file *file)
 
 static int apparmor_file_alloc_security(struct file *file)
 {
-	int error = 0;
-
-	/* freed by apparmor_file_free_security */
+	struct aa_file_ctx *ctx = file_ctx(file);
 	struct aa_label *label = begin_current_label_crit_section();
-	file->f_security = aa_alloc_file_ctx(label, GFP_KERNEL);
-	if (!file_ctx(file))
-		error = -ENOMEM;
-	end_current_label_crit_section(label);
 
-	return error;
+	spin_lock_init(&ctx->lock);
+	rcu_assign_pointer(ctx->label, aa_get_label(label));
+	end_current_label_crit_section(label);
+	return 0;
 }
 
 static void apparmor_file_free_security(struct file *file)
 {
-	aa_free_file_ctx(file_ctx(file));
+	struct aa_file_ctx *ctx = file_ctx(file);
+
+	if (ctx)
+		aa_put_label(rcu_access_pointer(ctx->label));
 }
 
 static int common_file_perm(const char *op, struct file *file, u32 mask)
@@ -1131,6 +1131,7 @@ static void apparmor_sock_graft(struct sock *sk, struct socket *parent)
  */
 struct lsm_blob_sizes apparmor_blob_sizes = {
 	.lbs_cred = sizeof(struct aa_task_ctx *),
+	.lbs_file = sizeof(struct aa_file_ctx),
 };
 
 static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = {
diff --git a/security/security.c b/security/security.c
index ff7df14f6db1..5430cae73cf6 100644
--- a/security/security.c
+++ b/security/security.c
@@ -40,6 +40,8 @@ struct security_hook_heads security_hook_heads __lsm_ro_after_init;
 
 static ATOMIC_NOTIFIER_HEAD(lsm_notifier_chain);
 
+static struct kmem_cache *lsm_file_cache;
+
 char *lsm_names;
 static struct lsm_blob_sizes blob_sizes;
 
@@ -92,6 +94,13 @@ int __init security_init(void)
 	 */
 	do_security_initcalls();
 
+	/*
+	 * Create any kmem_caches needed for blobs
+	 */
+	if (blob_sizes.lbs_file)
+		lsm_file_cache = kmem_cache_create("lsm_file_cache",
+						   blob_sizes.lbs_file, 0,
+						   SLAB_PANIC, NULL);
 	/*
 	 * The second call to a module specific init function
 	 * adds hooks to the hook lists and does any other early
@@ -101,6 +110,7 @@ int __init security_init(void)
 
 #ifdef CONFIG_SECURITY_LSM_DEBUG
 	pr_info("LSM: cred blob size = %d\n", blob_sizes.lbs_cred);
+	pr_info("LSM: file blob size = %d\n", blob_sizes.lbs_file);
 #endif
 
 	return 0;
@@ -277,6 +287,28 @@ static void __init lsm_set_size(int *need, int *lbs)
 void __init security_add_blobs(struct lsm_blob_sizes *needed)
 {
 	lsm_set_size(&needed->lbs_cred, &blob_sizes.lbs_cred);
+	lsm_set_size(&needed->lbs_file, &blob_sizes.lbs_file);
+}
+
+/**
+ * lsm_file_alloc - allocate a composite file blob
+ * @file: the file that needs a blob
+ *
+ * Allocate the file blob for all the modules
+ *
+ * Returns 0, or -ENOMEM if memory can't be allocated.
+ */
+int lsm_file_alloc(struct file *file)
+{
+	if (!lsm_file_cache) {
+		file->f_security = NULL;
+		return 0;
+	}
+
+	file->f_security = kmem_cache_zalloc(lsm_file_cache, GFP_KERNEL);
+	if (file->f_security == NULL)
+		return -ENOMEM;
+	return 0;
 }
 
 /*
@@ -962,12 +994,28 @@ int security_file_permission(struct file *file, int mask)
 
 int security_file_alloc(struct file *file)
 {
-	return call_int_hook(file_alloc_security, 0, file);
+	int rc = lsm_file_alloc(file);
+
+	if (rc)
+		return rc;
+	rc = call_int_hook(file_alloc_security, 0, file);
+	if (unlikely(rc))
+		security_file_free(file);
+	return rc;
 }
 
 void security_file_free(struct file *file)
 {
+	void *blob;
+
+	if (!lsm_file_cache)
+		return;
+	call_void
[PATCH v4 12/19] SELinux: Abstract use of inode security blob
Don't use the inode->i_security pointer directly.
Provide a helper function that provides the security blob pointer.

Signed-off-by: Casey Schaufler
Reviewed-by: Kees Cook
---
 security/selinux/hooks.c          | 26 +-
 security/selinux/include/objsec.h |  6 ++
 security/selinux/selinuxfs.c      |  4 ++--
 3 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index fdda53552224..248ae907320f 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -275,7 +275,7 @@ static int __inode_security_revalidate(struct inode *inode,
 				       struct dentry *dentry,
 				       bool may_sleep)
 {
-	struct inode_security_struct *isec = inode->i_security;
+	struct inode_security_struct *isec = selinux_inode(inode);
 
 	might_sleep_if(may_sleep);
 
@@ -296,7 +296,7 @@ static int __inode_security_revalidate(struct inode *inode,
 
 static struct inode_security_struct *inode_security_novalidate(struct inode *inode)
 {
-	return inode->i_security;
+	return selinux_inode(inode);
 }
 
 static struct inode_security_struct *inode_security_rcu(struct inode *inode, bool rcu)
@@ -306,7 +306,7 @@ static struct inode_security_struct *inode_security_rcu(struct inode *inode, boo
 	error = __inode_security_revalidate(inode, NULL, !rcu);
 	if (error)
 		return ERR_PTR(error);
-	return inode->i_security;
+	return selinux_inode(inode);
 }
 
 /*
@@ -315,14 +315,14 @@ static struct inode_security_struct *inode_security_rcu(struct inode *inode, boo
 static struct inode_security_struct *inode_security(struct inode *inode)
 {
 	__inode_security_revalidate(inode, NULL, true);
-	return inode->i_security;
+	return selinux_inode(inode);
 }
 
 static struct inode_security_struct *backing_inode_security_novalidate(struct dentry *dentry)
 {
 	struct inode *inode = d_backing_inode(dentry);
 
-	return inode->i_security;
+	return selinux_inode(inode);
 }
 
 /*
@@ -333,7 +333,7 @@ static struct inode_security_struct *backing_inode_security(struct dentry *dentr
 	struct inode *inode = d_backing_inode(dentry);
 
 	__inode_security_revalidate(inode, dentry, true);
-	return inode->i_security;
+	return selinux_inode(inode);
 }
 
 static void inode_free_rcu(struct rcu_head *head)
@@ -346,7 +346,7 @@ static void inode_free_rcu(struct rcu_head *head)
 
 static void inode_free_security(struct inode *inode)
 {
-	struct inode_security_struct *isec = inode->i_security;
+	struct inode_security_struct *isec = selinux_inode(inode);
 	struct superblock_security_struct *sbsec = inode->i_sb->s_security;
 
 	/*
@@ -1500,7 +1500,7 @@ static int selinux_genfs_get_sid(struct dentry *dentry,
 static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dentry)
 {
 	struct superblock_security_struct *sbsec = NULL;
-	struct inode_security_struct *isec = inode->i_security;
+	struct inode_security_struct *isec = selinux_inode(inode);
 	u32 task_sid, sid = 0;
 	u16 sclass;
 	struct dentry *dentry;
@@ -1800,7 +1800,7 @@ static int inode_has_perm(const struct cred *cred,
 		return 0;
 
 	sid = cred_sid(cred);
-	isec = inode->i_security;
+	isec = selinux_inode(inode);
 
 	return avc_has_perm(&selinux_state,
 			    sid, isec->sid, isec->sclass, perms, adp);
@@ -3028,7 +3028,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
 
 	/* Possibly defer initialization to selinux_complete_init. */
 	if (sbsec->flags & SE_SBINITIALIZED) {
-		struct inode_security_struct *isec = inode->i_security;
+		struct inode_security_struct *isec = selinux_inode(inode);
 		isec->sclass = inode_mode_to_security_class(inode->i_mode);
 		isec->sid = newsid;
 		isec->initialized = LABEL_INITIALIZED;
@@ -3128,7 +3128,7 @@ static noinline int audit_inode_permission(struct inode *inode,
 					   unsigned flags)
 {
 	struct common_audit_data ad;
-	struct inode_security_struct *isec = inode->i_security;
+	struct inode_security_struct *isec = selinux_inode(inode);
 	int rc;
 
 	ad.type = LSM_AUDIT_DATA_INODE;
@@ -4148,7 +4148,7 @@ static int selinux_task_kill(struct task_struct *p, struct siginfo *info,
 static void selinux_task_to_inode(struct task_struct *p,
 				  struct inode *inode)
 {
-	struct inode_security_struct *isec = inode->i_security;
+	struct inode_security_struct *isec = selinux_inode(inode);
 	u32 sid = task_sid(p);
 
 	spin_lock(&isec->lock);
@@ -6527,7 +6527,7 @@ static void selinux_release_secctx(char *secdata, u32 seclen)
 
 static void selinux_inode_i
[PATCH 3.16 28/63] ext4: don't allow r/w mounts if metadata blocks overlap the superblock
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Theodore Ts'o

commit 18db4b4e6fc31eda838dd1c1296d67dbcb3dc957 upstream.

If some metadata block, such as an allocation bitmap, overlaps the
superblock, it's very likely that if the file system is mounted
read/write, the results will not be pretty. So disallow r/w mounts
for file systems corrupted in this particular way.

Backport notes:
3.18.y is missing bc98a42c1f7d ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
and e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags")
so we simply use the sb MS_RDONLY check from pre bc98a42c1f7d in place of the sb_rdonly
function used in the upstream variant of the patch.

Signed-off-by: Theodore Ts'o
Signed-off-by: Harsh Shandilya
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Ben Hutchings
---
 fs/ext4/super.c | 6 ++
 1 file changed, 6 insertions(+)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2115,6 +2115,8 @@ static int ext4_check_descriptors(struct
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 				 "Block bitmap for group %u overlaps "
 				 "superblock", i);
+			if (!(sb->s_flags & MS_RDONLY))
+				return 0;
 		}
 		if (block_bitmap < first_block || block_bitmap > last_block) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
@@ -2127,6 +2129,8 @@ static int ext4_check_descriptors(struct
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 				 "Inode bitmap for group %u overlaps "
 				 "superblock", i);
+			if (!(sb->s_flags & MS_RDONLY))
+				return 0;
 		}
 		if (inode_bitmap < first_block || inode_bitmap > last_block) {
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
@@ -2139,6 +2143,8 @@ static int ext4_check_descriptors(struct
 			ext4_msg(sb, KERN_ERR, "ext4_check_descriptors: "
 				 "Inode table for group %u overlaps "
 				 "superblock", i);
+			if (!(sb->s_flags & MS_RDONLY))
+				return 0;
 		}
 		if (inode_table < first_block ||
 		    inode_table + sbi->s_itb_per_group - 1 > last_block) {
[PATCH 3.16 27/63] ext4: always check block group bounds in ext4_init_block_bitmap()
3.16.58-rc1 review patch. If anyone has any objections, please let me know.

--

From: Theodore Ts'o

commit 819b23f1c501b17b9694325471789e6b5cc2d0d2 upstream.

Regardless of whether the flex_bg feature is set, we should always
check to make sure the bits we are setting in the block bitmap are
within the block group bounds.

https://bugzilla.kernel.org/show_bug.cgi?id=199865

Signed-off-by: Theodore Ts'o
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings
---
 fs/ext4/balloc.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -184,7 +184,6 @@ static int ext4_init_block_bitmap(struct
 	unsigned int bit, bit_max;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	ext4_fsblk_t start, tmp;
-	int flex_bg = 0;
 	struct ext4_group_info *grp;
 
 	J_ASSERT_BH(bh, buffer_locked(bh));
@@ -217,22 +216,19 @@ static int ext4_init_block_bitmap(struct
 
 	start = ext4_group_first_block_no(sb, block_group);
 
-	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG))
-		flex_bg = 1;
-
 	/* Set bits for block and inode bitmaps, and inode table */
 	tmp = ext4_block_bitmap(sb, gdp);
-	if (!flex_bg || ext4_block_in_group(sb, tmp, block_group))
+	if (ext4_block_in_group(sb, tmp, block_group))
 		ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data);
 
 	tmp = ext4_inode_bitmap(sb, gdp);
-	if (!flex_bg || ext4_block_in_group(sb, tmp, block_group))
+	if (ext4_block_in_group(sb, tmp, block_group))
 		ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data);
 
 	tmp = ext4_inode_table(sb, gdp);
 	for (; tmp < ext4_inode_table(sb, gdp) +
 		     sbi->s_itb_per_group; tmp++) {
-		if (!flex_bg || ext4_block_in_group(sb, tmp, block_group))
+		if (ext4_block_in_group(sb, tmp, block_group))
 			ext4_set_bit(EXT4_B2C(sbi, tmp - start), bh->b_data);
 	}
 
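Note on why the bounds check has to be unconditional: the bit index is computed as tmp - start, so a block number taken from a corrupted descriptor that lies outside the group produces an index outside the bitmap buffer. A simplified userspace sketch follows. The names are hypothetical, one bit per block is assumed (the EXT4_B2C() cluster conversion is left out), and the bitmap is a plain byte array.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when blk lies inside [group_first, group_first + blocks_per_group). */
static bool block_in_group(uint64_t blk, uint64_t group_first,
			   uint64_t blocks_per_group)
{
	return blk >= group_first && blk - group_first < blocks_per_group;
}

/*
 * Set the bit for blk only when it belongs to the group, so the index
 * can never run past the bitmap. Returns true if a bit was set.
 */
static bool set_bit_if_in_group(uint8_t *bitmap, uint64_t blk,
				uint64_t group_first,
				uint64_t blocks_per_group)
{
	uint64_t bit;

	if (!block_in_group(blk, group_first, blocks_per_group))
		return false;
	bit = blk - group_first;
	bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
	return true;
}
```

For a 16-block group starting at block 100, block 103 sets bit 3, while blocks 99 and 200 are skipped without touching the bitmap.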
[PATCH v4 19/19] LSM: Blob sharing support for S.A.R.A and LandLock
Two proposed security modules require the ability to share security blobs with existing "major" security modules. These modules, S.A.R.A and LandLock, provide significantly different services than SELinux, Smack or AppArmor. Using either in conjunction with the existing modules is quite reasonable. S.A.R.A requires access to the cred, inode and task blobs, while LandLock uses the cred, file, inode and ipc blobs. The use of the cred, file, inode, ipc and task blobs has been abstracted in preceding patches in the series. This patch teaches the affected security modules how to access the part of the blob set aside for their use in the case where blobs are shared. The configuration option CONFIG_SECURITY_STACKING identifies systems where the blobs may be shared. The mechanism for selecting which security modules are active has been changed to allow non-conflicting "major" security modules to be used together. At this time the TOMOYO module can safely be used with any of the others. The two new modules would be non-conflicting as well. 
Signed-off-by: Casey Schaufler --- Documentation/admin-guide/LSM/index.rst | 14 +++-- include/linux/lsm_hooks.h | 2 +- security/Kconfig| 81 + security/apparmor/include/cred.h| 8 +++ security/apparmor/include/file.h| 9 ++- security/apparmor/include/lib.h | 4 ++ security/apparmor/lsm.c | 8 ++- security/security.c | 30 - security/selinux/hooks.c| 3 +- security/selinux/include/objsec.h | 12 security/smack/smack.h | 13 security/smack/smack_lsm.c | 3 +- security/tomoyo/common.h| 5 ++ security/tomoyo/tomoyo.c| 3 +- 14 files changed, 182 insertions(+), 13 deletions(-) diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst index 9842e21afd4a..d3d8af174042 100644 --- a/Documentation/admin-guide/LSM/index.rst +++ b/Documentation/admin-guide/LSM/index.rst @@ -17,10 +17,16 @@ MAC extensions, other extensions can be built using the LSM to provide specific changes to system operation when these tweaks are not available in the core functionality of Linux itself. -The Linux capabilities modules will always be included. This may be -followed by any number of "minor" modules and at most one "major" module. -For more details on capabilities, see ``capabilities(7)`` in the Linux -man-pages project. +The Linux capabilities modules will always be included. For more details +on capabilities, see ``capabilities(7)`` in the Linux man-pages project. + +Security modules that do not use the security data blobs maintained +by the LSM infrastructure are considered "minor" modules. These may be +included at compile time and stacked explicitly. Security modules that +use the LSM maintained security blobs are considered "major" modules. +These may only be stacked if the CONFIG_LSM_STACKED configuration +option is used. If this is chosen all of the security modules selected +will be used. A list of the active security modules can be found by reading ``/sys/kernel/security/lsm``. 
This is a comma separated list, and diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index f6dbde28833a..7e8b32fdf576 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -2082,7 +2082,7 @@ static inline void security_delete_hooks(struct security_hook_list *hooks, #define __lsm_ro_after_init__ro_after_init #endif /* CONFIG_SECURITY_WRITABLE_HOOKS */ -extern int __init security_module_enable(const char *module); +extern bool __init security_module_enable(const char *lsm, const bool stacked); extern void __init capability_add_hooks(void); #ifdef CONFIG_SECURITY_YAMA extern void __init yama_add_hooks(void); diff --git a/security/Kconfig b/security/Kconfig index 22f7664c4977..ed48025ae9e0 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -36,6 +36,28 @@ config SECURITY_WRITABLE_HOOKS bool default n +config SECURITY_STACKING + bool "Security module stacking" + depends on SECURITY + help + Allows multiple major security modules to be stacked. + Modules are invoked in the order registered with a + "bail on fail" policy, in which the infrastructure + will stop processing once a denial is detected. Not + all modules can be stacked. SELinux, Smack and AppArmor are + known to be incompatible. User space components may + have trouble identifying the security module providing + data in some cases. + + If you select this option you will have to select which + of the stackable modules you wish to be active. The + "Default security module" will be ignored. The boot line + "security=" option can be used to specify that one of + the modules identifed for stacking sho
[PATCH 3.16 35/63] ext4: add more inode number paranoia checks
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit c37e9e013469521d9adb932d17a1795c139b36db upstream. If there is a directory entry pointing to a system inode (such as a journal inode), complain and declare the file system to be corrupted. Also, if the superblock's first inode number field is too small, refuse to mount the file system. This addresses CVE-2018-10882. https://bugzilla.kernel.org/show_bug.cgi?id=200069 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings --- fs/ext4/ext4.h | 5 - fs/ext4/inode.c | 3 ++- fs/ext4/super.c | 5 + 3 files changed, 7 insertions(+), 6 deletions(-) --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1422,11 +1422,6 @@ static inline struct timespec ext4_curre static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino) { return ino == EXT4_ROOT_INO || - ino == EXT4_USR_QUOTA_INO || - ino == EXT4_GRP_QUOTA_INO || - ino == EXT4_BOOT_LOADER_INO || - ino == EXT4_JOURNAL_INO || - ino == EXT4_RESIZE_INO || (ino >= EXT4_FIRST_INO(sb) && ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)); } --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3957,7 +3957,8 @@ static int __ext4_get_inode_loc(struct i int inodes_per_block, inode_offset; iloc->bh = NULL; - if (!ext4_valid_inum(sb, inode->i_ino)) + if (inode->i_ino < EXT4_ROOT_INO || + inode->i_ino > le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count)) return -EIO; iloc->block_group = (inode->i_ino - 1) / EXT4_INODES_PER_GROUP(sb); --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3771,6 +3771,11 @@ static int ext4_fill_super(struct super_ } else { sbi->s_inode_size = le16_to_cpu(es->s_inode_size); sbi->s_first_ino = le32_to_cpu(es->s_first_ino); + if (sbi->s_first_ino < EXT4_GOOD_OLD_FIRST_INO) { + ext4_msg(sb, KERN_ERR, "invalid first ino: %u", +sbi->s_first_ino); + goto failed_mount; + } if ((sbi->s_inode_size < EXT4_GOOD_OLD_INODE_SIZE) || (!is_power_of_2(sbi->s_inode_size)) || 
(sbi->s_inode_size > blocksize)) {
[PATCH 3.16 39/63] x86/entry/64: Remove %ebx handling from error_entry/exit
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Andy Lutomirski commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream. error_entry and error_exit communicate the user vs. kernel status of the frame using %ebx. This is unnecessary -- the information is in regs->cs. Just use regs->cs. This makes error_entry simpler and makes error_exit more robust. It also fixes a nasty bug. Before all the Spectre nonsense, the xen_failsafe_callback entry point returned like this: ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS ENCODE_FRAME_POINTER jmp error_exit And it did not go through error_entry. This was bogus: RBX contained garbage, and error_exit expected a flag in RBX. Fortunately, it generally contained *nonzero* garbage, so the correct code path was used. As part of the Spectre fixes, code was added to clear RBX to mitigate certain speculation attacks. Now, depending on kernel configuration, RBX got zeroed and, when running some Wine workloads, the kernel crashes. This was introduced by: commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") With this patch applied, RBX is no longer needed as a flag, and the problem goes away. I suspect that malicious userspace could use this bug to crash the kernel even without the offending patch applied, though. [ Historical note: I wrote this patch as a cleanup before I was aware of the bug it fixed. ] [ Note to stable maintainers: this should probably get applied to all kernels. If you're nervous about that, a more conservative fix to add xorl %ebx,%ebx; incl %ebx before the jump to error_exit should also fix the problem. ] Reported-and-tested-by: M. Vefa Bicakci Signed-off-by: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: Dominik Brodowski Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: xen-de...@lists.xenproject.org Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface") Link: http://lkml.kernel.org/r/b5010a090d3586b2d6e06c7ad3ec5542d1241c45.1532282627.git.l...@kernel.org Signed-off-by: Ingo Molnar [bwh: Backported to 3.16: - error_exit moved EBX to EAX before testing it, so delete both instructions - error_exit does RESTORE_REST earlier, so adjust the offset to saved CS accordingly - Drop inapplicable comment changes - Adjust filename, context] Signed-off-by: Ben Hutchings --- --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -1135,7 +1135,7 @@ ENTRY(\sym) .if \paranoid jmp paranoid_exit /* %ebx: no swapgs flag */ .else - jmp error_exit /* %ebx: no swapgs flag */ + jmp error_exit .endif CFI_ENDPROC @@ -1411,7 +1411,6 @@ END(paranoid_exit) /* * Exception entry point. This expects an error code/orig_rax on the stack. - * returns in "no swapgs flag" in %ebx. */ ENTRY(error_entry) XCPT_FRAME @@ -1440,7 +1439,6 @@ ENTRY(error_entry) * the kernel CR3 here. */ SWITCH_KERNEL_CR3 - xorl %ebx,%ebx testl $3,CS+8(%rsp) je error_kernelspace error_swapgs: @@ -1456,7 +1454,6 @@ error_sti: * for these here too. */ error_kernelspace: - incl %ebx leaq native_irq_return_iret(%rip),%rcx cmpq %rcx,RIP+8(%rsp) je error_bad_iret @@ -1477,22 +1474,18 @@ error_bad_iret: mov %rsp,%rdi call fixup_bad_iret mov %rax,%rsp - decl %ebx /* Return to usergs */ jmp error_sti CFI_ENDPROC END(error_entry) - -/* ebx:no swapgs flag (1: don't need swapgs, 0: need it) */ ENTRY(error_exit) DEFAULT_FRAME - movl %ebx,%eax RESTORE_REST DISABLE_INTERRUPTS(CLBR_NONE) TRACE_IRQS_OFF GET_THREAD_INFO(%rcx) - testl %eax,%eax - jne retint_kernel + testb $3, CS-ARGOFFSET(%rsp) + jz retint_kernel LOCKDEP_SYS_EXIT_IRQ movl TI_flags(%rcx),%edx movl $_TIF_WORK_MASK,%edi
[PATCH 3.16 34/63] ext4: clear i_data in ext4_inode_info when removing inline data
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit 6e8ab72a812396996035a37e5ca4b3b99b5d214b upstream. When converting an inode from storing the data in-line to a data block, ext4_destroy_inline_data_nolock() was only clearing the on-disk copy of the i_blocks[] array. It was not clearing the copy of the i_blocks[] in ext4_inode_info, in i_data[], which is the copy actually used by ext4_map_blocks(). This didn't matter much if we are using extents, since the extents header would be invalid and thus the extents code would re-initialize the extents tree. But if we are using indirect blocks, the previous contents of the i_blocks array will be treated as block numbers, with potentially catastrophic results to the file system integrity and/or user data. This gets worse if the file system is using a 1k block size and s_first_data is zero, but even without this, the file system can get quite badly corrupted. This addresses CVE-2018-10881. https://bugzilla.kernel.org/show_bug.cgi?id=200015 Signed-off-by: Theodore Ts'o Signed-off-by: Ben Hutchings --- fs/ext4/inline.c | 1 + 1 file changed, 1 insertion(+) --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -438,6 +438,7 @@ static int ext4_destroy_inline_data_nolo memset((void *)ext4_raw_inode(&is.iloc)->i_block, 0, EXT4_MIN_INLINE_DATA_SIZE); + memset(ei->i_data, 0, EXT4_MIN_INLINE_DATA_SIZE); if (EXT4_HAS_INCOMPAT_FEATURE(inode->i_sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
[PATCH 3.16 32/63] ext4: always verify the magic number in xattr blocks
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit 513f86d73855ce556ea9522b6bfd79f87356dc3a upstream. If an inode points to a block which is also some other type of metadata block (such as a block allocation bitmap), the buffer_verified flag can be set when it was validated as that other metadata block type; however, it would make a really terrible external attribute block. The reason why we use the verified flag is to avoid constantly reverifying the block. However, it doesn't take much overhead to make sure the magic number of the xattr block is correct, and this will avoid potential crashes. This addresses CVE-2018-10879. https://bugzilla.kernel.org/show_bug.cgi?id=21 Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings --- fs/ext4/xattr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -213,12 +213,12 @@ ext4_xattr_check_block(struct inode *ino { int error; - if (buffer_verified(bh)) - return 0; - if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) || BHDR(bh)->h_blocks != cpu_to_le32(1)) return -EIO; + if (buffer_verified(bh)) + return 0; + if (!ext4_xattr_block_csum_verify(inode, bh)) return -EIO; error = ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size,
[PATCH 3.16 38/63] Fix up non-directory creation in SGID directories
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Linus Torvalds commit 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7 upstream. sgid directories have special semantics, making newly created files in the directory belong to the group of the directory, and newly created subdirectories will also become sgid. This is historically used for group-shared directories. But group directories writable by non-group members should not imply that such non-group members can magically join the group, so make sure to clear the sgid bit on non-directories for non-members (but remember that sgid without group execute means "mandatory locking", just to confuse things even more). Reported-by: Jann Horn Cc: Andy Lutomirski Cc: Al Viro Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings --- fs/inode.c | 6 ++ 1 file changed, 6 insertions(+) --- a/fs/inode.c +++ b/fs/inode.c @@ -1827,8 +1827,14 @@ void inode_init_owner(struct inode *inod inode->i_uid = current_fsuid(); if (dir && dir->i_mode & S_ISGID) { inode->i_gid = dir->i_gid; + + /* Directories are special, and always inherit S_ISGID */ if (S_ISDIR(mode)) mode |= S_ISGID; + else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) && +!in_group_p(inode->i_gid) && +!capable_wrt_inode_uidgid(dir, CAP_FSETID)) + mode &= ~S_ISGID; } else inode->i_gid = current_fsgid(); inode->i_mode = mode;
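The resulting mode computation can be modelled in userspace C as below. This is a sketch, not the kernel function: sgid_child_mode() is a hypothetical name, and the in_group and has_fsetid parameters stand in for in_group_p(inode->i_gid) and capable_wrt_inode_uidgid(dir, CAP_FSETID).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Compute the mode of a new inode created in a directory with mode
 * dir_mode, following the patched inode_init_owner() logic.
 */
mode_t sgid_child_mode(mode_t dir_mode, mode_t mode,
                       bool in_group, bool has_fsetid)
{
    assert(S_ISDIR(dir_mode));

    if (!(dir_mode & S_ISGID))
        return mode; /* parent is not SGID: mode is left alone */

    /* Directories are special, and always inherit S_ISGID. */
    if (S_ISDIR(mode))
        return mode | S_ISGID;

    /*
     * A non-directory asking for SGID plus group-execute keeps the bit
     * only if the creator is in the directory's group or is privileged.
     * SGID without group-execute (mandatory locking) is untouched.
     */
    if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
        !in_group && !has_fsetid)
        mode &= ~(mode_t)S_ISGID;

    return mode;
}
```

For example, a non-member creating a file with mode 02775 in an SGID directory ends up with 0775, while the same request from a group member keeps the SGID bit.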
[PATCH 3.16 23/63] xfs: set format back to extents if xfs_bmap_extents_to_btree
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Eric Sandeen commit 2c4306f719b083d17df2963bc761777576b8ad1b upstream. If xfs_bmap_extents_to_btree fails in a mode where we call xfs_iroot_realloc(-1) to de-allocate the root, set the format back to extents. Otherwise we can assume we can dereference ifp->if_broot based on the XFS_DINODE_FMT_BTREE format, and crash. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199423 Signed-off-by: Eric Sandeen Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong [bwh: Backported to 3.16: - Only one failure path needs to be patched - Adjust filename] Signed-off-by: Ben Hutchings --- fs/xfs/xfs_bmap.c | 4 1 file changed, 4 insertions(+) --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -822,6 +822,8 @@ xfs_bmap_extents_to_btree( *logflagsp = 0; if ((error = xfs_alloc_vextent(&args))) { xfs_iroot_realloc(ip, -1, whichfork); + ASSERT(ifp->if_broot == NULL); + XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS); xfs_btree_del_cursor(cur, XFS_BTREE_ERROR); return error; }
[PATCH 3.16 26/63] ext4: verify the depth of extent tree in ext4_find_extent()
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit bc890a60247171294acc0bd67d211fa4b88d40ba upstream. If there is a corrupted file system where the claimed depth of the extent tree is -1, this can cause a massive buffer overrun leading to sadness. This addresses CVE-2018-10877. https://bugzilla.kernel.org/show_bug.cgi?id=199417 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: return -EIO instead of -EFSCORRUPTED] Signed-off-by: Ben Hutchings --- fs/ext4/ext4_extents.h | 1 + fs/ext4/extents.c | 6 ++ 2 files changed, 7 insertions(+) --- a/fs/ext4/ext4_extents.h +++ b/fs/ext4/ext4_extents.h @@ -103,6 +103,7 @@ struct ext4_extent_header { }; #define EXT4_EXT_MAGIC cpu_to_le16(0xf30a) +#define EXT4_MAX_EXTENT_DEPTH 5 #define EXT4_EXTENT_TAIL_OFFSET(hdr) \ (sizeof(struct ext4_extent_header) + \ --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -851,6 +851,12 @@ ext4_ext_find_extent(struct inode *inode eh = ext_inode_hdr(inode); depth = ext_depth(inode); + if (depth < 0 || depth > EXT4_MAX_EXTENT_DEPTH) { + EXT4_ERROR_INODE(inode, "inode has invalid extent depth: %d", +depth); + ret = -EIO; + goto err; + } /* account possible depth increase */ if (!path) {
[PATCH 3.16 31/63] ext4: add corruption check in ext4_xattr_set_entry()
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit 5369a762c882c0b6e9599e4ebbb3a9ba9eee7e2d upstream. In theory this should have been caught earlier when the xattr list was verified, but in case it got missed, it's simple enough to add a check to make sure we don't overrun the xattr buffer. This addresses CVE-2018-10879. https://bugzilla.kernel.org/show_bug.cgi?id=21 Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger [bwh: Backported to 3.16: - Add inode parameter to ext4_xattr_set_entry() and update callers - Return -EIO instead of -EFSCORRUPTED on error - Adjust context] Signed-off-by: Ben Hutchings --- --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -610,14 +610,20 @@ static size_t ext4_xattr_free_space(stru } static int -ext4_xattr_set_entry(struct ext4_xattr_info *i, struct ext4_xattr_search *s) +ext4_xattr_set_entry(struct ext4_xattr_info *i, struct ext4_xattr_search *s, +struct inode *inode) { - struct ext4_xattr_entry *last; + struct ext4_xattr_entry *last, *next; size_t free, min_offs = s->end - s->base, name_len = strlen(i->name); /* Compute min_offs and last. 
*/ last = s->first; - for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) { + for (; !IS_LAST_ENTRY(last); last = next) { + next = EXT4_XATTR_NEXT(last); + if ((void *)next >= s->end) { + EXT4_ERROR_INODE(inode, "corrupted xattr entries"); + return -EIO; + } if (!last->e_value_block && last->e_value_size) { size_t offs = le16_to_cpu(last->e_value_offs); if (offs < min_offs) @@ -798,7 +804,7 @@ ext4_xattr_block_set(handle_t *handle, s ce = NULL; } ea_bdebug(bs->bh, "modifying in-place"); - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); if (!error) { if (!IS_LAST_ENTRY(s->first)) ext4_xattr_rehash(header(s->base), @@ -851,7 +857,7 @@ ext4_xattr_block_set(handle_t *handle, s s->end = s->base + sb->s_blocksize; } - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); if (error == -EIO) goto bad_block; if (error) @@ -1021,7 +1027,7 @@ int ext4_xattr_ibody_inline_set(handle_t if (EXT4_I(inode)->i_extra_isize == 0) return -ENOSPC; - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); if (error) { if (error == -ENOSPC && ext4_has_inline_data(inode)) { @@ -1033,7 +1039,7 @@ int ext4_xattr_ibody_inline_set(handle_t error = ext4_xattr_ibody_find(inode, i, is); if (error) return error; - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); } if (error) return error; @@ -1059,7 +1065,7 @@ static int ext4_xattr_ibody_set(handle_t if (EXT4_I(inode)->i_extra_isize == 0) return -ENOSPC; - error = ext4_xattr_set_entry(i, s); + error = ext4_xattr_set_entry(i, s, inode); if (error) return error; header = IHDR(inode, ext4_raw_inode(&is->iloc));
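The loop change in ext4_xattr_set_entry() follows a general pattern: compute where the next entry would start and reject it before dereferencing it. The userspace sketch below illustrates the pattern on a deliberately simplified, hypothetical entry format (one length byte followed by that many name bytes, with a zero length terminating the list); it is not the ext4 on-disk layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the entries between first and end. Returns 0 on success and
 * -EIO if following an entry would step to or past end.
 */
int count_entries(const uint8_t *first, const uint8_t *end, size_t *count)
{
    const uint8_t *cur = first;

    assert(first != NULL && end != NULL && count != NULL && first < end);
    *count = 0;

    while (*cur != 0) {
        size_t remaining = (size_t)(end - cur);
        size_t step = 1 + (size_t)*cur;

        /* The next entry header must start strictly before end. */
        if (step >= remaining)
            return -EIO;

        cur += step;
        (*count)++;
    }
    return 0;
}
```

Comparing lengths rather than forming the out-of-range pointer keeps the userspace version free of undefined behaviour; the kernel code compares the pointers directly.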
[PATCH 3.16 42/63] ALSA: rawmidi: Change resized buffers atomically
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Takashi Iwai commit 39675f7a7c7e7702f7d5341f1e0d01db746543a0 upstream. The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the current code is racy. For example, the sequencer client may write to the buffer while it is being resized. As a simple workaround, let's switch to the resized buffer inside the stream runtime lock. Reported-by: syzbot+52f83f0ea8df16932...@syzkaller.appspotmail.com Signed-off-by: Takashi Iwai Signed-off-by: Ben Hutchings --- sound/core/rawmidi.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) --- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -645,7 +645,7 @@ static int snd_rawmidi_info_select_user( int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params * params) { - char *newbuf; + char *newbuf, *oldbuf; struct snd_rawmidi_runtime *runtime = substream->runtime; if (substream->append && substream->use_count > 1) @@ -658,13 +658,17 @@ int snd_rawmidi_output_params(struct snd return -EINVAL; } if (params->buffer_size != runtime->buffer_size) { - newbuf = krealloc(runtime->buffer, params->buffer_size, - GFP_KERNEL); + newbuf = kmalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; + spin_lock_irq(&runtime->lock); + oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; runtime->avail = runtime->buffer_size; + runtime->appl_ptr = runtime->hw_ptr = 0; + spin_unlock_irq(&runtime->lock); + kfree(oldbuf); } runtime->avail_min = params->avail_min; substream->active_sensing = !params->no_active_sensing; @@ -675,7 +679,7 @@ EXPORT_SYMBOL(snd_rawmidi_output_params) int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params * params) { - char *newbuf; + char *newbuf, *oldbuf; struct snd_rawmidi_runtime *runtime = substream->runtime; snd_rawmidi_drain_input(substream); @@ -686,12 +690,16 @@ int 
snd_rawmidi_input_params(struct snd_ return -EINVAL; } if (params->buffer_size != runtime->buffer_size) { - newbuf = krealloc(runtime->buffer, params->buffer_size, - GFP_KERNEL); + newbuf = kmalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; + spin_lock_irq(&runtime->lock); + oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; + runtime->appl_ptr = runtime->hw_ptr = 0; + spin_unlock_irq(&runtime->lock); + kfree(oldbuf); } runtime->avail_min = params->avail_min; return 0;
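The allocate-outside, swap-inside, free-outside pattern used by the patch can be sketched in userspace C with a pthread mutex standing in for the runtime spinlock. struct ring_runtime and ring_resize() are hypothetical names; the sketch follows the output-side variant, which also resets avail.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant snd_rawmidi_runtime fields. */
struct ring_runtime {
    pthread_mutex_t lock;
    char  *buffer;
    size_t buffer_size;
    size_t avail;
    size_t appl_ptr;
    size_t hw_ptr;
};

int ring_resize(struct ring_runtime *rt, size_t new_size)
{
    char *newbuf, *oldbuf;

    assert(rt != NULL && new_size > 0);

    if (new_size == rt->buffer_size)
        return 0;

    /* Allocate before taking the lock; allocation may be slow. */
    newbuf = malloc(new_size);
    if (!newbuf)
        return -ENOMEM;

    /* Switch pointer, size and positions as one step under the lock. */
    pthread_mutex_lock(&rt->lock);
    oldbuf = rt->buffer;
    rt->buffer = newbuf;
    rt->buffer_size = new_size;
    rt->avail = new_size;
    rt->appl_ptr = rt->hw_ptr = 0;
    pthread_mutex_unlock(&rt->lock);

    /* Nothing can reach the old buffer through rt any more. */
    free(oldbuf);
    return 0;
}
```

Unlike krealloc(), the old contents are not carried over, which is why the read and write positions are reset together with the pointer.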
[PATCH 3.16 40/63] infiniband: fix a possible use-after-free bug
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Cong Wang commit cb2595c1393b4a5211534e6f0a0fbad369e21ad8 upstream. ucma_process_join() will free the newly allocated "mc" struct, if there is any error after that, especially the copy_to_user(). But in parallel, ucma_leave_multicast() could find this "mc" through idr_find() before ucma_process_join() frees it, since it is already published. So "mc" could be used in ucma_leave_multicast() after it has been allocated and freed in ucma_process_join(), since we don't refcnt it. Fix this by separating "publish" from ID allocation, so that we can get an ID first and publish it later after copy_to_user(). Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support") Reported-by: Noam Rathaus Signed-off-by: Cong Wang Signed-off-by: Jason Gunthorpe Signed-off-by: Ben Hutchings --- drivers/infiniband/core/ucma.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -180,7 +180,7 @@ static struct ucma_multicast* ucma_alloc return NULL; mutex_lock(&mut); - mc->id = idr_alloc(&multicast_idr, mc, 0, 0, GFP_KERNEL); + mc->id = idr_alloc(&multicast_idr, NULL, 0, 0, GFP_KERNEL); mutex_unlock(&mut); if (mc->id < 0) goto error; @@ -1285,6 +1285,10 @@ static ssize_t ucma_process_join(struct goto err3; } + mutex_lock(&mut); + idr_replace(&multicast_idr, mc, mc->id); + mutex_unlock(&mut); + mutex_unlock(&file->mut); ucma_put_ctx(ctx); return 0;
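The "reserve an ID first, publish the object later" idea can be illustrated with a tiny fixed-size table in userspace C. Everything here is hypothetical: struct id_table and its helpers only imitate the way idr_alloc() with a NULL pointer followed by idr_replace() behaves, and the locking the kernel does around both calls (the mutex named mut in the patch) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ID_TABLE_SIZE 8

struct id_table {
    bool  reserved[ID_TABLE_SIZE];
    void *object[ID_TABLE_SIZE];
};

/* Reserve a free ID without publishing anything: lookups see NULL. */
int id_reserve(struct id_table *t)
{
    assert(t != NULL);

    for (int id = 0; id < ID_TABLE_SIZE; id++) {
        if (!t->reserved[id]) {
            t->reserved[id] = true;
            t->object[id] = NULL;
            return id;
        }
    }
    return -1; /* table full */
}

/* Publish the object once every step that can fail has succeeded. */
void id_publish(struct id_table *t, int id, void *obj)
{
    assert(t != NULL && id >= 0 && id < ID_TABLE_SIZE && t->reserved[id]);
    t->object[id] = obj;
}

/* Look up an ID; a reserved but unpublished ID yields NULL. */
void *id_find(const struct id_table *t, int id)
{
    assert(t != NULL);

    if (id < 0 || id >= ID_TABLE_SIZE || !t->reserved[id])
        return NULL;
    return t->object[id];
}
```

Between id_reserve() and id_publish() a concurrent lookup cannot obtain the half-constructed object, so an error path may free it without racing against another user.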
[PATCH 3.16 19/63] jfs: Fix inconsistency between memory allocation and ea_buf->max_size
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Shankara Pailoor commit 92d34134193e5b129dc24f8d79cb9196626e8d7a upstream. The code is assuming the buffer is max_size length, but we weren't allocating enough space for it. Signed-off-by: Shankara Pailoor Signed-off-by: Dave Kleikamp Signed-off-by: Ben Hutchings --- fs/jfs/xattr.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) --- a/fs/jfs/xattr.c +++ b/fs/jfs/xattr.c @@ -493,15 +493,17 @@ static int ea_get(struct inode *inode, s if (size > PSIZE) { /* * To keep the rest of the code simple. Allocate a -* contiguous buffer to work with +* contiguous buffer to work with. Make the buffer large +* enough to make use of the whole extent. */ - ea_buf->xattr = kmalloc(size, GFP_KERNEL); + ea_buf->max_size = (size + sb->s_blocksize - 1) & + ~(sb->s_blocksize - 1); + + ea_buf->xattr = kmalloc(ea_buf->max_size, GFP_KERNEL); if (ea_buf->xattr == NULL) return -ENOMEM; ea_buf->flag = EA_MALLOC; - ea_buf->max_size = (size + sb->s_blocksize - 1) & - ~(sb->s_blocksize - 1); if (ea_size == 0) return 0;
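The arithmetic in this fix is the usual power-of-two round-up, applied before the allocation rather than after it. A userspace sketch, with hypothetical names (struct ea_buffer_sketch, ea_buffer_alloc) and malloc() in place of kmalloc():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round size up to the next multiple of block_size (a power of two). */
static size_t round_up_to_block(size_t size, size_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return (size + block_size - 1) & ~(block_size - 1);
}

/* Hypothetical stand-in for the two ea_buf fields involved. */
struct ea_buffer_sketch {
    void  *xattr;
    size_t max_size;
};

/*
 * Compute max_size first, then allocate exactly that much, so the
 * recorded capacity and the real allocation always agree.
 * Returns 0 on success and -1 if the allocation fails.
 */
int ea_buffer_alloc(struct ea_buffer_sketch *buf, size_t size,
                    size_t block_size)
{
    assert(buf != NULL);

    buf->max_size = round_up_to_block(size, block_size);
    buf->xattr = malloc(buf->max_size);
    return buf->xattr ? 0 : -1;
}
```

With a 4096-byte block size, a 4097-byte request gives a max_size of 8192; before the fix only 4097 bytes were allocated while later code assumed 8192 were available.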
[PATCH 3.16 24/63] ext4: only look at the bg_flags field if it is valid
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream. The bg_flags field in the block group descriptors is only valid if the uninit_bg or metadata_csum feature is enabled. We were not consistently looking at this field; fix this. Also block group #0 must never have uninitialized allocation bitmaps, or need to be zeroed, since that's where the root inode, and other special inodes are set up. Check for these conditions and mark the file system as corrupted if they are detected. This addresses CVE-2018-10876. https://bugzilla.kernel.org/show_bug.cgi?id=199403 Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: - ext4_read_block_bitmap_nowait() and ext4_read_inode_bitmap() return a pointer (NULL on error) instead of an error code - Open-code sb_rdonly() - Adjust context] Signed-off-by: Ben Hutchings --- fs/ext4/balloc.c | 11 ++- fs/ext4/ialloc.c | 14 -- fs/ext4/mballoc.c | 6 -- fs/ext4/super.c | 11 ++- 4 files changed, 36 insertions(+), 6 deletions(-) --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -453,9 +453,18 @@ ext4_read_block_bitmap_nowait(struct sup goto verify; } ext4_lock_group(sb, block_group); - if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { int err; + if (block_group == 0) { + ext4_unlock_group(sb, block_group); + unlock_buffer(bh); + ext4_error(sb, "Block bitmap for bg 0 marked " + "uninitialized"); + put_bh(bh); + return NULL; + } err = ext4_init_block_bitmap(sb, bh, block_group, desc); set_bitmap_uptodate(bh); set_buffer_uptodate(bh); --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -156,7 +156,16 @@ ext4_read_inode_bitmap(struct super_bloc } ext4_lock_group(sb, block_group); - if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT))) { + if (block_group 
== 0) { + ext4_unlock_group(sb, block_group); + unlock_buffer(bh); + ext4_error(sb, "Inode bitmap for bg 0 marked " + "uninitialized"); + put_bh(bh); + return NULL; + } ext4_init_inode_bitmap(sb, bh, block_group, desc); set_bitmap_uptodate(bh); set_buffer_uptodate(bh); @@ -910,7 +919,8 @@ got: /* recheck and clear flag under lock if we still need to */ ext4_lock_group(sb, group); - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); ext4_free_group_clusters_set(sb, gdp, ext4_free_clusters_after_init(sb, group, gdp)); --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2418,7 +2418,8 @@ int ext4_mb_add_groupinfo(struct super_b * initialize bb_free to be able to skip * empty groups without initialization */ - if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { meta_group_info[i]->bb_free = ext4_free_clusters_after_init(sb, group, desc); } else { @@ -2943,7 +2944,8 @@ ext4_mb_mark_diskspace_used(struct ext4_ #endif ext4_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len); - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); ext4_free_group_clusters_set(sb, gdp, ext4_free_clusters_after_init(sb, --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3080,13 +3080,22 @@ static ext4_group_t ext4_has_uninit_itab ext4_group_t group, ngroups = EXT4_SB(sb)->s_groups_count; struct ext4_group_desc *gdp = NULL; + if (!ext4_has_group_desc_csum(sb)) + return ngroups; + for (group = 0; group < ngroups; group++) { gdp = ext4_get_group_desc(sb, group, NULL); if (!gdp)
[PATCH 3.16 20/63] scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Alexander Potapenko commit a45b599ad808c3c982fdcdc12b0b8611c2f92824 upstream. This shall help avoid copying uninitialized memory to the userspace when calling ioctl(fd, SG_IO) with an empty command. Reported-by: syzbot+7d26fc1eea198488d...@syzkaller.appspotmail.com Signed-off-by: Alexander Potapenko Acked-by: Douglas Gilbert Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen Signed-off-by: Ben Hutchings --- drivers/scsi/sg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -1825,7 +1825,7 @@ retry: num = (rem_sz > scatter_elem_sz_prev) ? scatter_elem_sz_prev : rem_sz; - schp->pages[k] = alloc_pages(gfp_mask, order); + schp->pages[k] = alloc_pages(gfp_mask | __GFP_ZERO, order); if (!schp->pages[k]) goto out;
[PATCH 3.16 25/63] ext4: fix check to prevent initializing reserved inodes
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Theodore Ts'o commit 5012284700775a4e6e3fbe7eac4c543c4874b559 upstream. Commit 8844618d8aa7: "ext4: only look at the bg_flags field if it is valid" will complain if block group zero does not have the EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct, since a freshly created file system has this flag cleared. It gets set almost immediately after the file system is mounted read-write --- but the following somewhat unlikely sequence will end up triggering a false positive report of a corrupted file system: mkfs.ext4 /dev/vdc mount -o ro /dev/vdc /vdc mount -o remount,rw /dev/vdc Instead, when initializing the inode table for block group zero, test to make sure that itable_unused count is not too large, since that is the case that will result in some or all of the reserved inodes getting cleared. This fixes the failures reported by Eric Whitney when running generic/230 and generic/231 in the nojournal test case. 
Fixes: 8844618d8aa7 ("ext4: only look at the bg_flags field if it is valid") Reported-by: Eric Whitney Signed-off-by: Theodore Ts'o [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings --- fs/ext4/ialloc.c | 5 - fs/ext4/super.c | 8 +--- 2 files changed, 5 insertions(+), 8 deletions(-) --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1289,7 +1289,10 @@ int ext4_init_inode_table(struct super_b ext4_itable_unused_count(sb, gdp)), sbi->s_inodes_per_block); - if ((used_blks < 0) || (used_blks > sbi->s_itb_per_group)) { + if ((used_blks < 0) || (used_blks > sbi->s_itb_per_group) || + ((group == 0) && ((EXT4_INODES_PER_GROUP(sb) - + ext4_itable_unused_count(sb, gdp)) < + EXT4_FIRST_INO(sb)))) { ext4_error(sb, "Something is wrong with group %u: " "used itable blocks: %d; " "itable unused count: %u", --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3088,14 +3088,8 @@ static ext4_group_t ext4_has_uninit_itab if (!gdp) continue; - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) - continue; - if (group != 0) + if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) break; - ext4_error(sb, "Inode table for bg 0 marked as " - "needing zeroing"); - if (sb->s_flags & MS_RDONLY) - return ngroups; } return group;
[PATCH 3.16 22/63] scsi: libsas: defer ata device eh commands to libata
3.16.58-rc1 review patch. If anyone has any objections, please let me know. -- From: Jason Yan commit 318aaf34f1179b39fa9c30fa0f3288b645beee39 upstream. When ata device doing EH, some commands still attached with tasks are not passed to libata when abort failed or recover failed, so libata did not handle these commands. After these commands done, sas task is freed, but ata qc is not freed. This will cause ata qc leak and trigger a warning like below: WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037 ata_eh_finish+0xb4/0xcc CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1 .. Call trace: [] ata_eh_finish+0xb4/0xcc [] ata_do_eh+0xc4/0xd8 [] ata_std_error_handler+0x44/0x8c [] ata_scsi_port_error_handler+0x480/0x694 [] async_sas_ata_eh+0x4c/0x80 [] async_run_entry_fn+0x4c/0x170 [] process_one_work+0x144/0x390 [] worker_thread+0x144/0x418 [] kthread+0x10c/0x138 [] ret_from_fork+0x10/0x18 If ata qc leaked too many, ata tag allocation will fail and io blocked for ever. As suggested by Dan Williams, defer ata device commands to libata and merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle ata qcs correctly after this. Signed-off-by: Jason Yan CC: Xiaofei Tan CC: John Garry CC: Dan Williams Reviewed-by: Dan Williams Signed-off-by: Martin K. 
Petersen Signed-off-by: Ben Hutchings --- drivers/scsi/libsas/sas_scsi_host.c | 33 - 1 file changed, 13 insertions(+), 20 deletions(-) --- a/drivers/scsi/libsas/sas_scsi_host.c +++ b/drivers/scsi/libsas/sas_scsi_host.c @@ -250,6 +250,7 @@ out_done: static void sas_eh_finish_cmd(struct scsi_cmnd *cmd) { struct sas_ha_struct *sas_ha = SHOST_TO_SAS_HA(cmd->device->host); + struct domain_device *dev = cmd_to_domain_dev(cmd); struct sas_task *task = TO_SAS_TASK(cmd); /* At this point, we only get called following an actual abort @@ -258,6 +259,14 @@ static void sas_eh_finish_cmd(struct scs */ sas_end_task(cmd, task); + if (dev_is_sata(dev)) { + /* defer commands to libata so that libata EH can +* handle ata qcs correctly +*/ + list_move_tail(&cmd->eh_entry, &sas_ha->eh_ata_q); + return; + } + /* now finish the command and move it on to the error * handler done list, this also takes it off the * error handler pending list. @@ -265,22 +274,6 @@ static void sas_eh_finish_cmd(struct scs scsi_eh_finish_cmd(cmd, &sas_ha->eh_done_q); } -static void sas_eh_defer_cmd(struct scsi_cmnd *cmd) -{ - struct domain_device *dev = cmd_to_domain_dev(cmd); - struct sas_ha_struct *ha = dev->port->ha; - struct sas_task *task = TO_SAS_TASK(cmd); - - if (!dev_is_sata(dev)) { - sas_eh_finish_cmd(cmd); - return; - } - - /* report the timeout to libata */ - sas_end_task(cmd, task); - list_move_tail(&cmd->eh_entry, &ha->eh_ata_q); -} - static void sas_scsi_clear_queue_lu(struct list_head *error_q, struct scsi_cmnd *my_cmd) { struct scsi_cmnd *cmd, *n; @@ -288,7 +281,7 @@ static void sas_scsi_clear_queue_lu(stru list_for_each_entry_safe(cmd, n, error_q, eh_entry) { if (cmd->device->sdev_target == my_cmd->device->sdev_target && cmd->device->lun == my_cmd->device->lun) - sas_eh_defer_cmd(cmd); + sas_eh_finish_cmd(cmd); } } @@ -677,12 +670,12 @@ static void sas_eh_handle_sas_errors(str case TASK_IS_DONE: SAS_DPRINTK("%s: task 0x%p is done\n", __func__, task); - sas_eh_defer_cmd(cmd); + 
sas_eh_finish_cmd(cmd); continue; case TASK_IS_ABORTED: SAS_DPRINTK("%s: task 0x%p is aborted\n", __func__, task); - sas_eh_defer_cmd(cmd); + sas_eh_finish_cmd(cmd); continue; case TASK_IS_AT_LU: SAS_DPRINTK("task 0x%p is at LU: lu recover\n", task); @@ -693,7 +686,7 @@ static void sas_eh_handle_sas_errors(str "recovered\n", SAS_ADDR(task->dev), cmd->device->lun); - sas_eh_defer_cmd(cmd); + sas_eh_finish_cmd(cmd); sas_scsi_clear_queue_lu(work_q, cmd); goto Again; }
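
The control flow of the merged sas_eh_finish_cmd() can be sketched without any kernel types. Everything below is hypothetical (the struct, the enum and the function name are invented for illustration); it only shows the routing decision the patch describes: the task is always ended first, then SATA commands are deferred to the libata queue and everything else goes to the done queue.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for sas_ha->eh_done_q and sas_ha->eh_ata_q. */
enum eh_queue_sketch { EH_DONE_Q, EH_ATA_Q };

/* Hypothetical, minimal model of a command under error handling. */
struct cmd_sketch {
	bool is_sata;               /* stands in for dev_is_sata(dev) */
	bool task_ended;            /* set where sas_end_task() would run */
	enum eh_queue_sketch queue; /* queue the command ends up on */
};

static void eh_finish_cmd_sketch(struct cmd_sketch *cmd)
{
	/* The task is ended for every device type. */
	cmd->task_ended = true;

	if (cmd->is_sata) {
		/* Defer to libata EH so it can release the ata qc. */
		cmd->queue = EH_ATA_Q;
		return;
	}

	/* Non-SATA commands are finished directly. */
	cmd->queue = EH_DONE_Q;
}
```

Folding the device-type test into one function means no caller can pick the wrong variant, which is the leak the commit message describes.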
[PATCH v4 15/19] LSM: Infrastructure management of the task security
Move management of the task_struct->security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules the modules tell the infrastructure how much space is required, and the space is allocated there. The only user of this blob is AppArmor. The AppArmor use is abstracted to avoid future conflict. Signed-off-by: Casey Schaufler --- include/linux/lsm_hooks.h| 2 ++ security/apparmor/include/task.h | 18 +++ security/apparmor/lsm.c | 15 ++--- security/security.c | 54 +++- 4 files changed, 62 insertions(+), 27 deletions(-) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 416b20c3795b..6057c603b979 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -2031,6 +2031,7 @@ struct lsm_blob_sizes { int lbs_cred; int lbs_file; int lbs_inode; + int lbs_task; }; /* @@ -2098,6 +2099,7 @@ extern int lsm_inode_alloc(struct inode *inode); #ifdef CONFIG_SECURITY void lsm_early_cred(struct cred *cred); void lsm_early_inode(struct inode *inode); +void lsm_early_task(struct task_struct *task); #endif #endif /* ! 
__LINUX_LSM_HOOKS_H */ diff --git a/security/apparmor/include/task.h b/security/apparmor/include/task.h index 55edaa1d83f8..039c1e60887a 100644 --- a/security/apparmor/include/task.h +++ b/security/apparmor/include/task.h @@ -14,7 +14,10 @@ #ifndef __AA_TASK_H #define __AA_TASK_H -#define task_ctx(X) ((X)->security) +static inline struct aa_task_ctx *task_ctx(struct task_struct *task) +{ + return task->security; +} /* * struct aa_task_ctx - information for current task label change @@ -36,17 +39,6 @@ int aa_set_current_hat(struct aa_label *label, u64 token); int aa_restore_previous_label(u64 cookie); struct aa_label *aa_get_task_label(struct task_struct *task); -/** - * aa_alloc_task_ctx - allocate a new task_ctx - * @flags: gfp flags for allocation - * - * Returns: allocated buffer or NULL on failure - */ -static inline struct aa_task_ctx *aa_alloc_task_ctx(gfp_t flags) -{ - return kzalloc(sizeof(struct aa_task_ctx), flags); -} - /** * aa_free_task_ctx - free a task_ctx * @ctx: task_ctx to free (MAYBE NULL) @@ -57,8 +49,6 @@ static inline void aa_free_task_ctx(struct aa_task_ctx *ctx) aa_put_label(ctx->nnp); aa_put_label(ctx->previous); aa_put_label(ctx->onexec); - - kzfree(ctx); } } diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c index 15716b6ff860..c97dc3dbb515 100644 --- a/security/apparmor/lsm.c +++ b/security/apparmor/lsm.c @@ -91,19 +91,14 @@ static void apparmor_task_free(struct task_struct *task) { aa_free_task_ctx(task_ctx(task)); - task_ctx(task) = NULL; } static int apparmor_task_alloc(struct task_struct *task, unsigned long clone_flags) { - struct aa_task_ctx *new = aa_alloc_task_ctx(GFP_KERNEL); - - if (!new) - return -ENOMEM; + struct aa_task_ctx *new = task_ctx(task); aa_dup_task_ctx(new, task_ctx(current)); - task_ctx(task) = new; return 0; } @@ -1132,6 +1127,7 @@ static void apparmor_sock_graft(struct sock *sk, struct socket *parent) struct lsm_blob_sizes apparmor_blob_sizes = { .lbs_cred = sizeof(struct aa_task_ctx *), .lbs_file = 
sizeof(struct aa_file_ctx), + .lbs_task = sizeof(struct aa_task_ctx), }; static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = { @@ -1457,15 +1453,10 @@ static int param_set_mode(const char *val, const struct kernel_param *kp) static int __init set_init_ctx(void) { struct cred *cred = (struct cred *)current->real_cred; - struct aa_task_ctx *ctx; - - ctx = aa_alloc_task_ctx(GFP_KERNEL); - if (!ctx) - return -ENOMEM; lsm_early_cred(cred); + lsm_early_task(current); set_cred_label(cred, aa_get_label(ns_unconfined(root_ns))); - task_ctx(current) = ctx; return 0; } diff --git a/security/security.c b/security/security.c index a8f00fdff4d8..7e11de7eec21 100644 --- a/security/security.c +++ b/security/security.c @@ -117,6 +117,7 @@ int __init security_init(void) pr_info("LSM: cred blob size = %d\n", blob_sizes.lbs_cred); pr_info("LSM: file blob size = %d\n", blob_sizes.lbs_file); pr_info("LSM: inode blob size = %d\n", blob_sizes.lbs_inode); + pr_info("LSM: task blob size = %d\n", blob_sizes.lbs_task); #endif return 0; @@ -301,6 +302,7 @@ void __init security_add_blobs(struct lsm_blob_sizes *needed) if (needed->lbs_inode && blob_sizes.lbs_inode == 0) blob_sizes.lbs_inode = sizeof(struct rcu_head); lsm_set_size(&needed->lbs_inode, &blob_sizes.lbs_inode); + lsm_set_size(&needed
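
The excerpt above is cut off at the lsm_set_size() call, so the helper's body is not shown here. As a rough sketch of the idea in the commit message (modules state how much space they need, the infrastructure allocates it), the accounting could look like the following. The function name and the in/out convention are assumptions made for illustration, not a quotation of the kernel helper.

```c
/*
 * Hypothetical sketch of per-module blob accounting.
 * On entry *need holds the number of bytes a module wants; on return it
 * holds the offset of that module's slice inside the shared blob.
 * *total accumulates the size the infrastructure has to allocate.
 */
static void set_size_sketch(int *need, int *total)
{
	int offset;

	if (*need <= 0)
		return; /* module does not use this blob */

	offset = *total;
	*total += *need;
	*need = offset;
}
```

With a single user the assigned offset is zero, which is consistent with task_ctx() in the patch returning task->security unchanged.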
Re: [PATCH 3.16 20/63] scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()
3.16.58-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Alexander Potapenko

commit a45b599ad808c3c982fdcdc12b0b8611c2f92824 upstream.

This shall help avoid copying uninitialized memory to the userspace when
calling ioctl(fd, SG_IO) with an empty command.

Reported-by: syzbot+7d26fc1eea198488d...@syzkaller.appspotmail.com
Signed-off-by: Alexander Potapenko
Acked-by: Douglas Gilbert
Reviewed-by: Johannes Thumshirn
Signed-off-by: Martin K. Petersen
Signed-off-by: Ben Hutchings
---
 drivers/scsi/sg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1825,7 +1825,7 @@ retry:
 		num = (rem_sz > scatter_elem_sz_prev) ?
 			scatter_elem_sz_prev : rem_sz;
 
-		schp->pages[k] = alloc_pages(gfp_mask, order);
+		schp->pages[k] = alloc_pages(gfp_mask | __GFP_ZERO, order);
 		if (!schp->pages[k])
 			goto out;
 

Can't find the corresponding bug.
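
A userspace analogy may help show why the one-flag change matters. In the sketch below, calloc() plays the role that __GFP_ZERO plays for alloc_pages(): the buffer is guaranteed to be zero-filled before anything could be copied out of it. The helper names are hypothetical and this is not the sg driver's code.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a reply buffer that is guaranteed to be
 * zero-filled, so stale heap contents can never reach the consumer even
 * if nothing is ever written into it.
 */
static unsigned char *alloc_reply_buffer(size_t len)
{
	return calloc(1, len);
}

/* Returns 1 if every byte of buf is zero, 0 otherwise. */
static int buffer_is_zeroed(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

A malloc()-based variant would return whatever bytes happened to be in the heap, which is the userspace counterpart of the leak the commit message describes.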
[PATCH v4 16/19] SELinux: Abstract use of ipc security blobs
Don't use the ipc->security pointer directly. Don't use the msg_msg->security pointer directly. Provide helper functions that provides the security blob pointers. Signed-off-by: Casey Schaufler --- security/selinux/hooks.c | 18 +- security/selinux/include/objsec.h | 13 + 2 files changed, 22 insertions(+), 9 deletions(-) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 389e51ef48a5..e6cb5fce5437 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -5884,7 +5884,7 @@ static int ipc_has_perm(struct kern_ipc_perm *ipc_perms, struct common_audit_data ad; u32 sid = current_sid(); - isec = ipc_perms->security; + isec = selinux_ipc(ipc_perms); ad.type = LSM_AUDIT_DATA_IPC; ad.u.ipc_id = ipc_perms->key; @@ -5941,7 +5941,7 @@ static int selinux_msg_queue_associate(struct kern_ipc_perm *msq, int msqflg) struct common_audit_data ad; u32 sid = current_sid(); - isec = msq->security; + isec = selinux_ipc(msq); ad.type = LSM_AUDIT_DATA_IPC; ad.u.ipc_id = msq->key; @@ -5990,8 +5990,8 @@ static int selinux_msg_queue_msgsnd(struct kern_ipc_perm *msq, struct msg_msg *m u32 sid = current_sid(); int rc; - isec = msq->security; - msec = msg->security; + isec = selinux_ipc(msq); + msec = selinux_msg_msg(msg); /* * First time through, need to assign label to the message @@ -6038,8 +6038,8 @@ static int selinux_msg_queue_msgrcv(struct kern_ipc_perm *msq, struct msg_msg *m u32 sid = task_sid(target); int rc; - isec = msq->security; - msec = msg->security; + isec = selinux_ipc(msq); + msec = selinux_msg_msg(msg); ad.type = LSM_AUDIT_DATA_IPC; ad.u.ipc_id = msq->key; @@ -6092,7 +6092,7 @@ static int selinux_shm_associate(struct kern_ipc_perm *shp, int shmflg) struct common_audit_data ad; u32 sid = current_sid(); - isec = shp->security; + isec = selinux_ipc(shp); ad.type = LSM_AUDIT_DATA_IPC; ad.u.ipc_id = shp->key; @@ -6189,7 +6189,7 @@ static int selinux_sem_associate(struct kern_ipc_perm *sma, int semflg) struct common_audit_data ad; u32 sid = 
current_sid(); - isec = sma->security; + isec = selinux_ipc(sma); ad.type = LSM_AUDIT_DATA_IPC; ad.u.ipc_id = sma->key; @@ -6275,7 +6275,7 @@ static int selinux_ipc_permission(struct kern_ipc_perm *ipcp, short flag) static void selinux_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid) { - struct ipc_security_struct *isec = ipcp->security; + struct ipc_security_struct *isec = selinux_ipc(ipcp); *secid = isec->sid; } diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h index 591adb374d69..5bf9f280e9b2 100644 --- a/security/selinux/include/objsec.h +++ b/security/selinux/include/objsec.h @@ -26,6 +26,7 @@ #include #include #include +#include #include #include "flask.h" #include "avc.h" @@ -173,4 +174,16 @@ static inline struct inode_security_struct *selinux_inode( return inode->i_security; } +static inline struct msg_security_struct *selinux_msg_msg( + const struct msg_msg *msg_msg) +{ + return msg_msg->security; +} + +static inline struct ipc_security_struct *selinux_ipc( + const struct kern_ipc_perm *ipc) +{ + return ipc->security; +} + #endif /* _SELINUX_OBJSEC_H_ */ -- 2.17.1
[PATCH v4 02/19] Smack: Abstract use of cred security blob
Don't use the cred->security pointer directly. Provide a helper function that provides the security blob pointer. Signed-off-by: Casey Schaufler --- security/smack/smack.h| 17 +-- security/smack/smack_access.c | 4 +-- security/smack/smack_lsm.c| 57 +-- security/smack/smackfs.c | 18 +-- 4 files changed, 53 insertions(+), 43 deletions(-) diff --git a/security/smack/smack.h b/security/smack/smack.h index f7db791fb566..01a922856eba 100644 --- a/security/smack/smack.h +++ b/security/smack/smack.h @@ -356,6 +356,11 @@ extern struct list_head smack_onlycap_list; #define SMACK_HASH_SLOTS 16 extern struct hlist_head smack_known_hash[SMACK_HASH_SLOTS]; +static inline struct task_smack *smack_cred(const struct cred *cred) +{ + return cred->security; +} + /* * Is the directory transmuting? */ @@ -382,13 +387,19 @@ static inline struct smack_known *smk_of_task(const struct task_smack *tsp) return tsp->smk_task; } -static inline struct smack_known *smk_of_task_struct(const struct task_struct *t) +static inline struct smack_known *smk_of_task_struct( + const struct task_struct *t) { struct smack_known *skp; + const struct cred *cred; rcu_read_lock(); - skp = smk_of_task(__task_cred(t)->security); + + cred = __task_cred(t); + skp = smk_of_task(smack_cred(cred)); + rcu_read_unlock(); + return skp; } @@ -405,7 +416,7 @@ static inline struct smack_known *smk_of_forked(const struct task_smack *tsp) */ static inline struct smack_known *smk_of_current(void) { - return smk_of_task(current_security()); + return smk_of_task(smack_cred(current_cred())); } /* diff --git a/security/smack/smack_access.c b/security/smack/smack_access.c index 9a4c0ad46518..489d49a20b47 100644 --- a/security/smack/smack_access.c +++ b/security/smack/smack_access.c @@ -275,7 +275,7 @@ int smk_tskacc(struct task_smack *tsp, struct smack_known *obj_known, int smk_curacc(struct smack_known *obj_known, u32 mode, struct smk_audit_info *a) { - struct task_smack *tsp = current_security(); + struct task_smack *tsp = 
smack_cred(current_cred()); return smk_tskacc(tsp, obj_known, mode, a); } @@ -635,7 +635,7 @@ DEFINE_MUTEX(smack_onlycap_lock); */ bool smack_privileged_cred(int cap, const struct cred *cred) { - struct task_smack *tsp = cred->security; + struct task_smack *tsp = smack_cred(cred); struct smack_known *skp = tsp->smk_task; struct smack_known_list_elem *sklep; int rc; diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 340fc30ad85d..68ee3ae8f25c 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -122,7 +122,7 @@ static int smk_bu_note(char *note, struct smack_known *sskp, static int smk_bu_current(char *note, struct smack_known *oskp, int mode, int rc) { - struct task_smack *tsp = current_security(); + struct task_smack *tsp = smack_cred(current_cred()); char acc[SMK_NUM_ACCESS_TYPE + 1]; if (rc <= 0) @@ -143,7 +143,7 @@ static int smk_bu_current(char *note, struct smack_known *oskp, #ifdef CONFIG_SECURITY_SMACK_BRINGUP static int smk_bu_task(struct task_struct *otp, int mode, int rc) { - struct task_smack *tsp = current_security(); + struct task_smack *tsp = smack_cred(current_cred()); struct smack_known *smk_task = smk_of_task_struct(otp); char acc[SMK_NUM_ACCESS_TYPE + 1]; @@ -165,7 +165,7 @@ static int smk_bu_task(struct task_struct *otp, int mode, int rc) #ifdef CONFIG_SECURITY_SMACK_BRINGUP static int smk_bu_inode(struct inode *inode, int mode, int rc) { - struct task_smack *tsp = current_security(); + struct task_smack *tsp = smack_cred(current_cred()); struct inode_smack *isp = inode->i_security; char acc[SMK_NUM_ACCESS_TYPE + 1]; @@ -195,7 +195,7 @@ static int smk_bu_inode(struct inode *inode, int mode, int rc) #ifdef CONFIG_SECURITY_SMACK_BRINGUP static int smk_bu_file(struct file *file, int mode, int rc) { - struct task_smack *tsp = current_security(); + struct task_smack *tsp = smack_cred(current_cred()); struct smack_known *sskp = tsp->smk_task; struct inode *inode = file_inode(file); struct inode_smack *isp 
= inode->i_security; @@ -225,7 +225,7 @@ static int smk_bu_file(struct file *file, int mode, int rc) static int smk_bu_credfile(const struct cred *cred, struct file *file, int mode, int rc) { - struct task_smack *tsp = cred->security; + struct task_smack *tsp = smack_cred(cred); struct smack_known *sskp = tsp->smk_task; struct inode *inode = file_inode(file); struct inode_smack *isp = inode->i_security; @@ -429,7 +429,7 @@ static int smk_ptrace_rule_che
Re: [PATCH 0/3] mm: Randomize free memory
On Fri, Sep 21, 2018 at 4:51 PM Elliott, Robert (Persistent Memory) wrote:
>
> > -----Original Message-----
> > From: linux-kernel-ow...@vger.kernel.org On Behalf Of Kees Cook
> > Sent: Friday, September 21, 2018 2:13 PM
> > Subject: Re: [PATCH 0/3] mm: Randomize free memory
> ...
> > I'd be curious to hear more about the mentioned cache performance
> > improvements. I love it when a security feature actually _improves_
> > performance. :)
>
> It's been a problem in the HPC space:
> http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
>
> A kernel module called zonesort is available to try to help:
> https://software.intel.com/en-us/articles/xeon-phi-software
>
> and this abandoned patch series proposed that for the kernel:
> https://lkml.org/lkml/2017/8/23/195
>
> Dan's patch series doesn't attempt to ensure buffers won't conflict, but
> also reduces the chance that the buffers will. This will make performance
> more consistent, albeit slower than "optimal" (which is near impossible
> to attain in a general-purpose kernel). That's better than forcing
> users to deploy remedies like:
> "To eliminate this gradual degradation, we have added a Stream
> measurement to the Node Health Check that follows each job;
> nodes are rebooted whenever their measured memory bandwidth
> falls below 300 GB/s."

Robert, thanks for that! Yes, instead of run-to-run variations
alternating between almost-never-conflict and nearly-always-conflict,
we'll get a random / average distribution of cache conflicts.
Re: Code of Conduct: Let's revamp it.
On Fri, Sep 21, 2018 at 07:31:05PM -0400, jonsm...@gmail.com wrote:
> On Fri, Sep 21, 2018 at 7:17 PM Theodore Y. Ts'o wrote:
> >
> > People can decide who they want to respond to, but I'm going to gently
> > suggest that before people think about responding to a particular
> > e-mail, that they do a quick check using "git log
> > --author=xy...@example.com"
> > then decide how much someone appears to be a member of the community
> > before deciding how and whether their thoughts are relevant.
>
> How does this part apply to email addresses used to commit code?
>
> * Publishing others’ private information, such as a physical or electronic
> address, without explicit permission
>
> It appears to me that this would conflict with the GPL since the GPL
> granted the right to distribute (or even print it in a book) Linux and
> Linux contains email addresses. This also seems contradictory with
> the Reply button I used to send this email.

I don't really think email addresses used in patches which are sent,
voluntarily, to a public mailing list are something you can sanely
consider "private information".

> How do you reconcile working on a public project while keeping email
> address secret?

This is a little more delicate, and I admit that I can't really think
of any real solutions for this part...

--
Cheers,
Joey Pabalinas
Re: [PATCH resend] uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
Andrew Morton wrote:

> Are there such programs?  Do they reference the `private' field?

They would use the keyutils.h header from keyutils package probably.  There
the field was named "priv" not "private".  The kernel's UAPI header should be
amended again to match that.

David
Re: Code of Conduct: Let's revamp it.
On 22/09/2018 01:31, jonsm...@gmail.com wrote:
[...]
> How does this part apply to email addresses used to commit code?
>
> * Publishing others’ private information, such as a physical or electronic
> address, without explicit permission

I need an (explicit) permission to "publish" an already published
email address which is already world-wide known because it can be
found by the simplest and worst search engine as the email address
is in public mailing list archives and git repos?

Sounds pretty absurd as the people themselves already published their
email address.

IMHO you cannot "publish" already published stuff.

Kind regards,
	Bernd, NAL
--
Bernd Petrovitsch                  Email : be...@petrovitsch.priv.at
                     LUGA : http://www.luga.at
Re: [PATCH V4 00/27] C-SKY(csky) Linux Kernel Port
On Thu, Sep 20, 2018 at 10:18:51PM -0700, Arnd Bergmann wrote:
> On Thu, Sep 20, 2018 at 10:52 AM Palmer Dabbelt wrote:
> >
> > On Fri, 14 Sep 2018 07:37:20 PDT (-0700), ren_...@c-sky.com wrote:
> > > On Wed, Sep 12, 2018 at 04:30:36PM +0200, Arnd Bergmann wrote:
> > >> On Wed, Sep 12, 2018 at 3:25 PM Guo Ren wrote:
> > I don't want to hijack this thread, but in RISC-V land we were hoping to
> > have a user ABI free of 32-bit time_t.  Our 32-bit glibc ABI hasn't been
> > finalized yet, and when I talked to the glibc guys a few weeks ago they
> > were happy to let us wait until 32-bit time_t can be removed before we
> > stabilize the ABI.  We've been maintaining out-of-tree glibc patches for
> > a while now, so I'd really like to get them into the next glibc release.
> >
> > Mapping out the schedule more explicitly, as I'm terrible with dates:
> >
> > * 4.19-rc4 was 2018-09-16
> > * 4.19 should be 2018-10-21
> > * 4.20 should be 2019-01-13 (skipping 2 weeks for the holidays)
> > * 4.21 merge window should close 2019-01-27
> > * glibc 2.29 is scheduled for 2019-02-01

Thx for the schedule info.

> > That's very tight, but assuming we at least have a prototype of the API
> > so we can get the rv32i glibc patches in much earlier it might be OK.
> > There was some talk of being able to use some workarounds to do a 32-bit
> > time_t user ABI without the corresponding kernel ABI, so we could always
> > go down that route to start and then decide to deprecate or not deprecate
> > the 32-bit kernel ABI at the last minute -- not something I'm fond of
> > doing, but an option.
> >
> > How close to done do you think the 32-bit time_t will be by the end of
> > the 4.20 merge window?  If it's close enough to start our glibc push then
> > that might be OK.
>
> It will be a bit of a stretch, but it's possible. Most syscalls are
> done in linux-next, I have a few more pending, and only clock_adjtime
> is really missing now (I had some earlier patches that I could revive).

Seems the time schedule is OK. If csky gets into linux-4.20, then the
csky glibc port could remove 32-bit time_t in its patchset before the
glibc 2.29 release.

> My plan was to get that all into 4.20, and then have a conversation about the
> actual syscall table changes in 4.21. If we need it for both csky and rv32,
> we might just change the generic syscall table that way in 4.21 without
> changing all the other ones along with them. I don't want to drag things out
> over too many merge windows though, and my plan was to do all architectures
> together to simplify the version checks in the libc code to only have to check
> for a single version.

Seems that's no problem.

Best Regards
 Guo Ren
RE: [PATCH 0/3] mm: Randomize free memory
> -----Original Message-----
> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Kees Cook
> Sent: Friday, September 21, 2018 2:13 PM
> Subject: Re: [PATCH 0/3] mm: Randomize free memory
...
> I'd be curious to hear more about the mentioned cache performance
> improvements. I love it when a security feature actually _improves_
> performance. :)

It's been a problem in the HPC space:
http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

A kernel module called zonesort is available to try to help:
https://software.intel.com/en-us/articles/xeon-phi-software

and this abandoned patch series proposed that for the kernel:
https://lkml.org/lkml/2017/8/23/195

Dan's patch series doesn't attempt to ensure buffers won't conflict, but
also reduces the chance that the buffers will. This will make performance
more consistent, albeit slower than "optimal" (which is near impossible
to attain in a general-purpose kernel). That's better than forcing
users to deploy remedies like:
    "To eliminate this gradual degradation, we have added a Stream
     measurement to the Node Health Check that follows each job;
     nodes are rebooted whenever their measured memory bandwidth
     falls below 300 GB/s."

---
Robert Elliott, HPE Persistent Memory
Re: [PATCH] selftests: watchdog: Add gettimeout and get|set pretimeout
Hi Jerry,

Thanks for the patch. A few comments below:

On 09/21/2018 04:55 PM, Jerry Hoemann wrote:
> Add command line arguments to call ioctl WDIOC_GETTIMEOUT,
> WDIOC_GETPRETIMEOUT and WDIOC_SETPRETIMEOUT.
>
> Signed-off-by: Jerry Hoemann
> ---
>  tools/testing/selftests/watchdog/watchdog-test.c | 30 +++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c
> index 6e29087..4861e2c 100644
> --- a/tools/testing/selftests/watchdog/watchdog-test.c
> +++ b/tools/testing/selftests/watchdog/watchdog-test.c
> @@ -19,7 +19,7 @@
>
>  int fd;
>  const char v = 'V';
> -static const char sopts[] = "bdehp:t:";
> +static const char sopts[] = "bdehp:t:Tn:N";
>  static const struct option lopts[] = {
>  	{"bootstatus", no_argument, NULL, 'b'},
>  	{"disable", no_argument, NULL, 'd'},
> @@ -27,6 +27,9 @@
>  	{"help", no_argument, NULL, 'h'},
>  	{"pingrate", required_argument, NULL, 'p'},
>  	{"timeout", required_argument, NULL, 't'},
> +	{"gettimeout", no_argument, NULL, 'T'},
> +	{"pretimeout", required_argument, NULL, 'n'},
> +	{"getpretimeout", no_argument, NULL, 'N'},
>  	{NULL, no_argument, NULL, 0x0}
>  };
>
> @@ -71,6 +74,9 @@ static void usage(char *progname)
>  	printf(" -h, --help Print the help message\n");
>  	printf(" -p, --pingrate=P Set ping rate to P seconds (default %d)\n", DEFAULT_PING_RATE);
>  	printf(" -t, --timeout=T Set timeout to T seconds\n");
> +	printf(" -T, --gettimeout Get the timeout\n");
> +	printf(" -n, --pretimeout Set the pretimeout to T seconds\n");
> +	printf(" -N, --getpretimeout Get the pretimeout\n");

How are the new arguments used?

>  	printf("\n");
>  	printf("Parameters are parsed left-to-right in real-time.\n");
>  	printf("Example: %s -d -t 10 -p 5 -e\n", progname);

Please add an example usage for each of these new arguments.

> @@ -135,6 +141,28 @@ int main(int argc, char *argv[])
>  			else
>  				printf("WDIOC_SETTIMEOUT errno '%s'\n", strerror(errno));
>  			break;
> +		case 'T':
> +			ret = ioctl(fd, WDIOC_GETTIMEOUT, &flags);
> +			if (!ret)
> +				printf("Watchdog timeout set to %u seconds.\n", flags);

It would be good to make this message different from the WDIOC_SETTIMEOUT
message. Please update it to reflect that this is the result of a
WDIOC_GETTIMEOUT.

What would the user intend to do with this GETTIMEOUT? Shouldn't this be
the case that it prints the current value and exits, instead of the same
logic as the SETTIMEOUT option?

> +			else
> +				printf("WDIOC_GETTIMEOUT errno '%s'\n", strerror(errno));

Shouldn't this error be an exit condition?

> +			break;
> +		case 'n':
> +			flags = strtoul(optarg, NULL, 0);
> +			ret = ioctl(fd, WDIOC_SETPRETIMEOUT, &flags);
> +			if (!ret)
> +				printf("Watchdog pretimeout set to %u seconds.\n", flags);
> +			else
> +				printf("WDIOC_SETPRETIMEOUT errno '%s'\n", strerror(errno));
> +			break;
> +		case 'N':
> +			ret = ioctl(fd, WDIOC_GETPRETIMEOUT, &flags);
> +			if (!ret)
> +				printf("Watchdog pretimeout set to %u seconds.\n", flags);

It would be good to make this message different from the
WDIOC_SETPRETIMEOUT message. Please update it to reflect that this is
the result of a WDIOC_GETPRETIMEOUT.

What would the user intend to do with this GETPRETIMEOUT? Shouldn't this
be the case that it prints the current value and exits, instead of the
same logic as WDIOC_SETPRETIMEOUT?

> +			else
> +				printf("WDIOC_GETPRETIMEOUT errno '%s'\n", strerror(errno));

Shouldn't this error be an exit condition?

> +			break;
>  		default:
>  			usage(argv[0]);
>  			goto end;
>

Also can you run this test as normal user?

thanks,
-- Shuah
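
One way to read the review comments on the 'T' case is sketched below: report the value with a message that names the ioctl, and hand the caller a status so it can exit on failure. The helper name and the exact message wording are hypothetical; only WDIOC_GETTIMEOUT and <linux/watchdog.h> come from the kernel's watchdog interface.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Hypothetical helper: query the current watchdog timeout on fd.
 * Returns 0 after printing the value, or -1 if the ioctl failed so
 * that the caller can treat the failure as an exit condition.
 */
static int report_timeout(int fd)
{
	int timeout = 0;

	if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) != 0) {
		fprintf(stderr, "WDIOC_GETTIMEOUT error '%s'\n",
			strerror(errno));
		return -1;
	}

	/* Distinct from the "set to" wording used for WDIOC_SETTIMEOUT. */
	printf("WDIOC_GETTIMEOUT returns %d seconds.\n", timeout);
	return 0;
}
```

In the option loop the caller could then leave the program when the helper returns non-zero, and the same shape would apply to WDIOC_GETPRETIMEOUT.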
Re: [PATCH v10 24/26] KVM: s390: device attrs to enable/disable AP interpretation
On 09/17/2018 04:51 AM, David Hildenbrand wrote: Am 12.09.18 um 21:43 schrieb Tony Krowiak: From: Tony Krowiak Introduces two new VM crypto device attributes (KVM_S390_VM_CRYPTO) to enable or disable AP instruction interpretation from userspace via the KVM_SET_DEVICE_ATTR ioctl: * The KVM_S390_VM_CRYPTO_ENABLE_APIE attribute enables hardware interpretation of AP instructions executed on the guest. * The KVM_S390_VM_CRYPTO_DISABLE_APIE attribute disables hardware interpretation of AP instructions executed on the guest. In this case the instructions will be intercepted and pass through to the guest. Signed-off-by: Tony Krowiak --- arch/s390/include/asm/kvm_host.h |1 + arch/s390/include/uapi/asm/kvm.h |2 ++ arch/s390/kvm/kvm-s390.c | 27 +++ 3 files changed, 26 insertions(+), 4 deletions(-) diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index b32bd1b..36d3531 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -719,6 +719,7 @@ struct kvm_s390_crypto { __u32 crycbd; __u8 aes_kw; __u8 dea_kw; + __u8 apie; }; #define APCB0_MASK_SIZE 1 diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h index 8c23afc..a8dbd90 100644 --- a/arch/s390/include/uapi/asm/kvm.h +++ b/arch/s390/include/uapi/asm/kvm.h @@ -161,6 +161,8 @@ struct kvm_s390_vm_cpu_subfunc { #define KVM_S390_VM_CRYPTO_ENABLE_DEA_KW 1 #define KVM_S390_VM_CRYPTO_DISABLE_AES_KW 2 #define KVM_S390_VM_CRYPTO_DISABLE_DEA_KW 3 +#define KVM_S390_VM_CRYPTO_ENABLE_APIE 4 +#define KVM_S390_VM_CRYPTO_DISABLE_APIE5 /* kvm attributes for migration mode */ #define KVM_S390_VM_MIGRATION_STOP0 diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 2cdd980..286c2e0 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -856,12 +856,11 @@ void kvm_s390_vcpu_crypto_reset_all(struct kvm *kvm) static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr) { - if (!test_kvm_facility(kvm, 76)) - 
return -EINVAL; - mutex_lock(&kvm->lock); switch (attr->attr) { case KVM_S390_VM_CRYPTO_ENABLE_AES_KW: + if (!test_kvm_facility(kvm, 76)) + return -EINVAL; get_random_bytes( kvm->arch.crypto.crycb->aes_wrapping_key_mask, sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask)); @@ -869,6 +868,8 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr) VM_EVENT(kvm, 3, "%s", "ENABLE: AES keywrapping support"); break; case KVM_S390_VM_CRYPTO_ENABLE_DEA_KW: + if (!test_kvm_facility(kvm, 76)) + return -EINVAL; get_random_bytes( kvm->arch.crypto.crycb->dea_wrapping_key_mask, sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask)); @@ -876,17 +877,31 @@ static int kvm_s390_vm_set_crypto(struct kvm *kvm, struct kvm_device_attr *attr) VM_EVENT(kvm, 3, "%s", "ENABLE: DEA keywrapping support"); break; case KVM_S390_VM_CRYPTO_DISABLE_AES_KW: + if (!test_kvm_facility(kvm, 76)) + return -EINVAL; kvm->arch.crypto.aes_kw = 0; memset(kvm->arch.crypto.crycb->aes_wrapping_key_mask, 0, sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask)); VM_EVENT(kvm, 3, "%s", "DISABLE: AES keywrapping support"); break; case KVM_S390_VM_CRYPTO_DISABLE_DEA_KW: + if (!test_kvm_facility(kvm, 76)) + return -EINVAL; kvm->arch.crypto.dea_kw = 0; memset(kvm->arch.crypto.crycb->dea_wrapping_key_mask, 0, sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask)); VM_EVENT(kvm, 3, "%s", "DISABLE: DEA keywrapping support"); break; + case KVM_S390_VM_CRYPTO_ENABLE_APIE: + if (!ap_instructions_available()) { + mutex_unlock(&kvm->lock); + return -EOPNOTSUPP; + } + kvm->arch.crypto.apie = 1; + break; + case KVM_S390_VM_CRYPTO_DISABLE_APIE: + kvm->arch.crypto.apie = 0; + break; default: mutex_unlock(&kvm->lock); return -ENXIO; @@ -1493,6 +1508,8 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr) case KVM_S390_VM_CRYPTO_ENABLE_DEA_KW: case KVM_S390_VM_CRYPTO_DISABLE_AES_KW: case KVM_S390_VM_CRYPTO_DISABLE_DEA_KW: + case KVM_S390_VM_CRYPTO_ENABLE_APIE: +
Re: [PATCH v2] slub: extend slub debug to handle multiple slabs
On Thu, 20 Sep 2018 21:00:16 +0100 Aaron Tomlin wrote:

> Extend the slub_debug syntax to "slub_debug=<flags>[,<slub>]*", where <slub>
> may contain an asterisk at the end.  For example, the following would poison
> all kmalloc slabs:
>
> 	slub_debug=P,kmalloc*
>
> and the following would apply the default flags to all kmalloc and all block
> IO slabs:
>
> 	slub_debug=,bio*,kmalloc*
>
> Please note that a similar patch was posted by Iliyan Malchev some time ago
> but was never merged:
>
> 	https://marc.info/?l=linux-mm&m=131283905330474&w=2

Fair enough, I guess.

> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1283,9 +1283,37 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
>  	/*
>  	 * Enable debugging if selected on the kernel commandline.
>  	 */

The above comment is in a strange place.  Can we please move it to
above the function definition in the usual fashion?  And make it
better, if anything seems to be missing.

> -	if (slub_debug && (!slub_debug_slabs || (name &&
> -		!strncmp(slub_debug_slabs, name, strlen(slub_debug_slabs)))))
> -		flags |= slub_debug;
> +
> +	char *end, *n, *glob;

`end' and `glob' could be local to the loop which uses them, which I
find a bit nicer.

`n' is a rotten identifier.  Can't we think of something which
communicates meaning?

> +	int len = strlen(name);
> +
> +	/* If slub_debug = 0, it folds into the if conditional. */
> +	if (!slub_debug_slabs)
> +		return flags | slub_debug;

If we take the above return, the call to strlen() was wasted cycles.
Presumably gcc is smart enough to prevent that, but why risk it.

> +	n = slub_debug_slabs;
> +	while (*n) {
> +		int cmplen;
> +
> +		end = strchr(n, ',');
> +		if (!end)
> +			end = n + strlen(n);
> +
> +		glob = strnchr(n, end - n, '*');
> +		if (glob)
> +			cmplen = glob - n;
> +		else
> +			cmplen = max(len, (int)(end - n));

max_t() exists for this.  Or maybe make `len' size_t, but I expect that
will still warn - that subtraction returns a ptrdiff_t, yes?

> +
> +		if (!strncmp(name, n, cmplen)) {
> +			flags |= slub_debug;
> +			break;
> +		}
> +
> +		if (!*end)
> +			break;
> +		n = end + 1;
> +	}

The code in this loop hurts my brain a bit. I hope it's correct ;)
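
For readers trying to follow the loop under review, here is a minimal userspace sketch of the same matching rule: a comma-separated list of slab names, each optionally ending in '*'. It is not the patch and not what was eventually merged. The function name slab_name_matches() and its variable names are hypothetical, memchr() stands in for the kernel's strnchr(), and it assumes that an entry without '*' is meant to match the slab name exactly. Using size_t lengths throughout sidesteps the max()/max_t() question raised above, and the cursor is called `entry` rather than `n`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace sketch (hypothetical helper, not kernel code): return true if
 * `name` matches any entry of the comma-separated `slab_list`.  An entry
 * may contain '*', in which case only the text before it is compared.
 */
bool slab_name_matches(const char *slab_list, const char *name)
{
	size_t name_len = strlen(name);
	const char *entry = slab_list;

	while (*entry) {
		/* The entry runs up to the next comma or the end of the list. */
		const char *end = strchr(entry, ',');
		size_t entry_len = end ? (size_t)(end - entry) : strlen(entry);
		/* memchr() plays the role of the kernel's strnchr() here. */
		const char *glob = memchr(entry, '*', entry_len);
		bool match;

		if (glob) {
			/* Prefix match on everything before the asterisk. */
			size_t prefix_len = (size_t)(glob - entry);

			match = name_len >= prefix_len &&
				strncmp(name, entry, prefix_len) == 0;
		} else {
			/* Assumed semantics: exact match when there is no glob. */
			match = name_len == entry_len &&
				memcmp(name, entry, entry_len) == 0;
		}

		if (match)
			return true;
		if (!end)
			break;
		entry = end + 1;
	}
	return false;
}
```

With the list "bio*,kmalloc*" this reports a match for "kmalloc-64" and "bio-0" but not for "dentry", which is the behaviour the commit message describes for slub_debug=,bio*,kmalloc*.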
Re: [RFC/PATCH 2/5] device property: introduce notion of subnodes for legacy boards
On Thu, Sep 20, 2018 at 01:16:48PM +0300, Heikki Krogerus wrote:
> On Wed, Sep 19, 2018 at 10:13:26AM -0700, Dmitry Torokhov wrote:
> > > > diff --git a/drivers/base/pset_property.c b/drivers/base/pset_property.c
> > > > index 08ecc13080ae..63f2377aefe8 100644
> > > > --- a/drivers/base/pset_property.c
> > > > +++ b/drivers/base/pset_property.c
> > > > @@ -18,6 +18,11 @@ struct property_set {
> > > >  	struct device *dev;
> > > >  	struct fwnode_handle fwnode;
> > > >  	const struct property_entry *properties;
> > > > +
> > > > +	struct property_set *parent;
> > > > +	/* Entry in parent->children list */
> > > > +	struct list_head child_node;
> > > > +	struct list_head children;
> > >
> > > Add
> > >
> > >         const char *name;
> > >
> > > and you can implement also pset_get_named_child_node().
> >
> > Or
> > 	char name[];
> >
> > to avoid separate allocation.
>
> Let's not do that, especially if you are planning on exporting this
> structure.

Can you please elaborate why? Not using pointer saves us 4/8 bytes +
however much memory we need for bookkeeping for the extra chunk. Given
that majority of pset nodes are unnamed this seems wasteful.

> If the name is coming from .rodata, there is no need to
> allocate anything for the name. Check kstrdup_const().

The data is most likely coming as __initconst so we do need to copy it.

> >
> > Alternatively, we can add it later when we need it, and add
> > device_add_named_child_properties().
> >
> > I'll leave it up to Rafael to decide.
>
> Fair enough.
>
>
> Thanks,
>
> --
> heikki

Thanks.

-- 
Dmitry
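
To make the `char name[]` option discussed above concrete, here is a small userspace sketch of a node that stores its name in a C99 flexible array member, so the node and a copy of its name come from a single allocation. The type struct named_pset and the helper named_pset_alloc() are hypothetical and only illustrate the layout; the real struct property_set has more fields, and the kernel would use kzalloc() rather than malloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a property-set node with an inline name. */
struct named_pset {
	struct named_pset *parent;	/* NULL for a top-level node */
	char name[];			/* flexible array member, "" if unnamed */
};

/*
 * Allocate a node and copy `name` into the same chunk of memory.
 * Passing NULL for `name` creates an unnamed node.
 */
struct named_pset *named_pset_alloc(struct named_pset *parent,
				    const char *name)
{
	size_t name_len = name ? strlen(name) : 0;
	/* One allocation covers the struct, the name and its terminator. */
	struct named_pset *node = malloc(sizeof(*node) + name_len + 1);

	if (!node)
		return NULL;

	node->parent = parent;
	if (name_len)
		memcpy(node->name, name, name_len);
	node->name[name_len] = '\0';
	return node;
}
```

In this layout the name is always copied, which fits the case where the source string is discarded after init. The pointer-based alternative (`const char *name`) can avoid the copy when the string is known to live for the lifetime of the node, at the cost of a pointer in every node, named or not.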
Re: [RFC/PATCH 2/5] device property: introduce notion of subnodes for legacy boards
Hi Heikki,

On Thu, Sep 20, 2018 at 04:53:48PM +0300, Heikki Krogerus wrote:
> Hi Dmitry,
>
> On Mon, Sep 17, 2018 at 11:16:00AM -0700, Dmitry Torokhov wrote:
> > +/**
> > + * device_add_child_properties - Add a collection of properties to a device object.
> > + * @dev: Device to add properties to.
>
> In case you didn't notice my comment for this, you are missing @parent
> here.
>
> But why do you need both the parent and the dev?

I could go by parent only and fetch dev from parent.

>
> > + * @properties: Collection of properties to add.
> > + *
> > + * Associate a collection of device properties represented by @properties as a
> > + * child of given @parent firmware node.  The function takes a copy of
> > + * @properties.
> > + */
> > +struct fwnode_handle *
> > +device_add_child_properties(struct device *dev,
> > +			    struct fwnode_handle *parent,
> > +			    const struct property_entry *properties)
> > +{
> > +	struct property_set *p;
> > +	struct property_set *parent_pset;
> > +
> > +	if (!properties)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	parent_pset = to_pset_node(parent);
>
> For this function, the parent will in practice have to be
> dev_fwnode(dev), so I don't think you need @parent at all, no?
>
> There is something wrong here..

Yes, I expect the majority of the calls will use dev_fwnode(dev) as
parent, but nobody stops you from doing:

	device_add_properties(dev, props);
	c1 = device_add_child_properties(dev, dev_fwnode(dev), cp1);
	c2 = device_add_child_properties(dev, c1, cp2);
	c3 = device_add_child_properties(dev, c2, cp3);
	...

>
> > +	if (!parent_pset)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	p = pset_create_set(properties);
> > +	if (IS_ERR(p))
> > +		return ERR_CAST(p);
> > +
> > +	p->dev = dev;
>
> That looks wrong.
>
> I'm guessing the assumption here is that the child nodes will never be
> assigned to their own devices, but you can't do that. It will limit
> the use of the child nodes to a very small number of cases, possibly
> only to gpios.

If I need to assign a node to a device I'll use the
device_add_properties() API. device_add_child_properties() is for nodes
living "below" the device. All nodes (the primary/secondary and
children) would point to the owning device, just for convenience.

>
> I think that has to be fixed. It should not be a big deal. Just expect
> the child nodes to be removed separately, and add ref counting to the
> struct property_set handling.

Why do we need to remove them separately and what do we need refcounting
for?

>
> > +	p->parent = parent_pset;
> > +	list_add_tail(&p->child_node, &parent_pset->children);
> > +
> > +	return &p->fwnode;
> > +}
> > +EXPORT_SYMBOL_GPL(device_add_child_properties);
>
> The child nodes will change the purpose of the built-in property
> support. Originally the goal was just to support adding of built-in
> device properties to real firmware nodes, but things have changed
> quite a bit from that. These child nodes are purely tied to the
> built-in device property support, so we should be talking about adding
> pset type child nodes to pset type parent nodes in the API:
> fwnode_pset_add_child_node(), or something like that.

OK, I can change device_add_child_properties() to
fwnode_pset_add_child_node() if Rafael would prefer this name.

Thanks.

-- 
Dmitry
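
The ownership model Dmitry describes (children chained below a device's node, every node pointing back at the owning device, and the whole subtree going away with its root) can be sketched in plain userspace C. Everything here is hypothetical: struct prop_node, prop_node_add() and prop_node_free_tree() are illustrative names, a singly linked sibling list stands in for the kernel's struct list_head, and freeing children together with the parent is only one of the two lifetimes being debated in the thread, not a settled design.

```c
#include <stdlib.h>

/* Stand-in for struct device; only its identity matters here. */
struct device {
	const char *name;
};

/* Hypothetical node: owner, parent link, and an ordered child list. */
struct prop_node {
	struct device *dev;		/* owning device, same for the whole tree */
	struct prop_node *parent;
	struct prop_node *first_child;
	struct prop_node *next_sibling;
};

/* Create a node owned by `dev`; append it under `parent` if given. */
struct prop_node *prop_node_add(struct device *dev, struct prop_node *parent)
{
	struct prop_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;

	node->dev = dev;
	node->parent = parent;
	if (parent) {
		/* Append at the tail, mirroring list_add_tail() in the patch. */
		struct prop_node **link = &parent->first_child;

		while (*link)
			link = &(*link)->next_sibling;
		*link = node;
	}
	return node;
}

/* Free a node and everything below it; no separate child removal. */
void prop_node_free_tree(struct prop_node *node)
{
	struct prop_node *child, *next;

	if (!node)
		return;
	for (child = node->first_child; child; child = next) {
		next = child->next_sibling;
		prop_node_free_tree(child);
	}
	free(node);
}
```

Heikki's alternative, in which child nodes can be handed to their own devices and removed separately, would need a reference count per node in place of the single recursive free shown here.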
Re: [PATCH v10 25/26] KVM: s390: CPU model support for AP virtualization
On 09/12/2018 03:43 PM, Tony Krowiak wrote:

From: Tony Krowiak

Introduces a new CPU model feature and two CPU model
facilities to support AP virtualization for KVM guests.

CPU model feature:

The KVM_S390_VM_CPU_FEAT_AP feature indicates that
AP instructions are available on the guest. This
feature will be enabled by the kernel only if the AP
instructions are installed on the linux host. This feature
must be specifically turned on for the KVM guest from
userspace to use the VFIO AP device driver for guest
access to AP devices.

CPU model facilities:

1. AP Query Configuration Information (QCI) facility is installed.

   This is indicated by setting facilities bit 12 for
   the guest. The kernel will not enable this facility
   for the guest if it is not set on the host.

   If this facility is not set for the KVM guest, then only
   APQNs with an APQI less than 16 will be used by a Linux
   guest regardless of the matrix configuration for the virtual
   machine. This is a limitation of the Linux AP bus.

2. AP Facilities Test facility (APFT) is installed.

   This is indicated by setting facilities bit 15 for
   the guest. The kernel will not enable this facility for
   the guest if it is not set on the host.

   If this facility is not set for the KVM guest, then no
   AP devices will be available to the guest regardless of
   the guest's matrix configuration for the virtual
   machine. This is a limitation of the Linux AP bus.

Signed-off-by: Tony Krowiak
Reviewed-by: Christian Borntraeger
Reviewed-by: Halil Pasic
Reviewed-by: David Hildenbrand
Tested-by: Michael Mueller
Tested-by: Farhan Ali
Signed-off-by: Christian Borntraeger
---
 arch/s390/kvm/kvm-s390.c         |    5 +++++
 arch/s390/tools/gen_facilities.c |    2 ++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 286c2e0..f0b8e2a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -371,6 +371,11 @@ static void kvm_s390_cpu_feat_init(void)
 	if (MACHINE_HAS_ESOP)
 		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
+
+	/* Check if AP instructions installed on host */
+	if (ap_instructions_available())
+		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_AP);
+
 	/*
 	 * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
 	 * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
diff --git a/arch/s390/tools/gen_facilities.c b/arch/s390/tools/gen_facilities.c
index 0c85aed..fd788e0 100644
--- a/arch/s390/tools/gen_facilities.c
+++ b/arch/s390/tools/gen_facilities.c
@@ -106,6 +106,8 @@ struct facility_def {
 		.name = "FACILITIES_KVM_CPUMODEL",
 		.bits = (int[]){
+			12, /* AP Query Configuration Information */
+			15, /* AP Facilities Test */
 			156, /* etoken facility */
 			-1  /* END */
 		}

The fixup! patch below modifies this patch (25/26) to illustrate how
David's recommendation will be implemented for v11 of the series. It is
one of three fixup! patches (the other two are in responses to 03/26 and
11/26) included to generate discussion in v10 rather than waiting until
v11 for comments.

---8<---

From: Tony Krowiak
Date: Thu, 20 Sep 2018 13:28:07 -0400
Subject: [FIXUP v10] fixup!: KVM: s390: CPU model support for AP virtualization

---
 arch/s390/kvm/kvm-s390.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index a3a7cd9..ff38251 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -372,10 +372,6 @@ static void kvm_s390_cpu_feat_init(void)
 	if (MACHINE_HAS_ESOP)
 		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);

-	/* Check if AP instructions installed on host */
-	if (ap_instructions_available())
-		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_AP);
-
 	/*
 	 * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
 	 * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
-- 
1.7.1
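
As a reading aid for the two facility bits named in the commit message (12 for QCI, 15 for APFT), the following userspace sketch shows how a bit in an s390-style facility list can be set and tested, and how the two bits combine into the guest-visible behaviour the message describes. The helpers facility_is_set(), facility_set() and guest_max_apqi() are hypothetical names. The sketch assumes the usual s390 convention that facility bits are numbered from the most significant bit of byte 0, and it assumes 255 as the highest APQI when QCI is available; neither detail is stated in the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
	FACILITY_AP_QCI  = 12,	/* AP Query Configuration Information */
	FACILITY_AP_APFT = 15,	/* AP Facilities Test */
};

/* Test facility bit `nr`; bit 0 is the MSB of byte 0 (assumed layout). */
bool facility_is_set(const unsigned char *fac_list, size_t nbytes,
		     unsigned int nr)
{
	if (nr / 8 >= nbytes)
		return false;
	return (fac_list[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/* Set facility bit `nr`; out-of-range bits are silently ignored. */
void facility_set(unsigned char *fac_list, size_t nbytes, unsigned int nr)
{
	if (nr / 8 < nbytes)
		fac_list[nr / 8] |= (unsigned char)(0x80u >> (nr % 8));
}

/*
 * Highest queue index (APQI) a Linux guest would use according to the
 * commit message, or -1 if no AP devices are available at all.
 */
int guest_max_apqi(const unsigned char *fac_list, size_t nbytes)
{
	if (!facility_is_set(fac_list, nbytes, FACILITY_AP_APFT))
		return -1;	/* without APFT: no AP devices */
	if (!facility_is_set(fac_list, nbytes, FACILITY_AP_QCI))
		return 15;	/* without QCI: only APQIs below 16 */
	return 255;		/* assumption: full APQI range with QCI */
}
```

Under these assumptions, both bits land in byte 1 of the facility list: bit 12 is mask 0x08 and bit 15 is mask 0x01.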
Re: Code of Conduct: Let's revamp it.
On Fri, Sep 21, 2018 at 7:17 PM Theodore Y. Ts'o wrote: > > People can decide who they want to respond to, but I'm going to gently > suggest that before people think about responding to a particular > e-mail, that they do a quick check using "git log --author=xy...@example.com" > then decide how much someone appears to be a member of the community > before deciding how and whether their thoughts are relevant. How does this part apply to email addresses used to commit code? * Publishing others’ private information, such as a physical or electronic address, without explicit permission It appears to me that this would conflict with the GPL since the GPL granted the right to distribute (or even print it in a book) Linux and Linux contains email addresses. This also seems contradictory with the Reply button I used to send this email. How do you reconcile working on a public project while keeping email address secret? > > There are a lot of strong feelings on this issue, and allowing people > who aren't members of the Linux kernel development community, to > escalate the rhetoric --- either in the pro- or anti- CoC direction, > and whether on mailing lists, github comment threads, Twitter, or > Reddit --- is not helpful. > > For example > > > >* Showing empathy towards other community members > > > > Your pussy hurts? Maybe you should have just accepted that your a boy! > > > > I think Linus is perfectly fine in conduct. I mean, this bullshit > > pressure comes from corporations and other wierd places (all > > seeing eye) that want to "help". > > There are people commenting from all sides that are wanting to "help". > But I hope that it is obvious that the above text is *not* *helpful*. > Mark, please stop. > > - Ted -- Jon Smirl jonsm...@gmail.com
[PATCH v4] i2c: aspeed: fix invalid clock parameters for very large divisors
The function that computes clock parameters from divisors did not respect the maximum size of the bitfields that the parameters were written to. This fixes the bug. This bug can be reproduced with (and this fix verified with) the test at: https://kunit-review.googlesource.com/c/linux/+/1035/ Discovered-by-KUnit: https://kunit-review.googlesource.com/c/linux/+/1035/ Signed-off-by: Brendan Higgins Reviewed-by: Jae Hyun Yoo --- - v2 updates the title of the patch, renames a local variable, and prints an error when the clock divider is clamped. - v3 adds a missing newline character for the new logging statement. - v4 fixes a cast that I forgot to update in v2. --- drivers/i2c/busses/i2c-aspeed.c | 65 +++-- 1 file changed, 45 insertions(+), 20 deletions(-) diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c index c258c4d9a4c0..49e682f358e0 100644 --- a/drivers/i2c/busses/i2c-aspeed.c +++ b/drivers/i2c/busses/i2c-aspeed.c @@ -142,7 +142,8 @@ struct aspeed_i2c_bus { /* Synchronizes I/O mem access to base. */ spinlock_t lock; struct completion cmd_complete; - u32 (*get_clk_reg_val)(u32 divisor); + u32 (*get_clk_reg_val)(struct device *dev, + u32 divisor); unsigned long parent_clk_frequency; u32 bus_frequency; /* Transaction state. */ @@ -705,16 +706,27 @@ static const struct i2c_algorithm aspeed_i2c_algo = { #endif /* CONFIG_I2C_SLAVE */ }; -static u32 aspeed_i2c_get_clk_reg_val(u32 clk_high_low_max, u32 divisor) +static u32 aspeed_i2c_get_clk_reg_val(struct device *dev, + u32 clk_high_low_mask, + u32 divisor) { - u32 base_clk, clk_high, clk_low, tmp; + u32 base_clk_divisor, clk_high_low_max, clk_high, clk_low, tmp; + + /* +* SCL_high and SCL_low represent a value 1 greater than what is stored +* since a zero divider is meaningless. Thus, the max value each can +* store is every bit set + 1. Since SCL_high and SCL_low are added +* together (see below), the max value of both is the max value of one +* them times two. 
+*/ + clk_high_low_max = (clk_high_low_mask + 1) * 2; /* * The actual clock frequency of SCL is: * SCL_freq = APB_freq / (base_freq * (SCL_high + SCL_low)) * = APB_freq / divisor * where base_freq is a programmable clock divider; its value is -* base_freq = 1 << base_clk +* base_freq = 1 << base_clk_divisor * SCL_high is the number of base_freq clock cycles that SCL stays high * and SCL_low is the number of base_freq clock cycles that SCL stays * low for a period of SCL. @@ -724,47 +736,59 @@ static u32 aspeed_i2c_get_clk_reg_val(u32 clk_high_low_max, u32 divisor) * SCL_low = clk_low + 1 * Thus, * SCL_freq = APB_freq / -* ((1 << base_clk) * (clk_high + 1 + clk_low + 1)) +* ((1 << base_clk_divisor) * (clk_high + 1 + clk_low + 1)) * The documentation recommends clk_high >= clk_high_max / 2 and * clk_low >= clk_low_max / 2 - 1 when possible; this last constraint * gives us the following solution: */ - base_clk = divisor > clk_high_low_max ? + base_clk_divisor = divisor > clk_high_low_max ? ilog2((divisor - 1) / clk_high_low_max) + 1 : 0; - tmp = (divisor + (1 << base_clk) - 1) >> base_clk; - clk_low = tmp / 2; - clk_high = tmp - clk_low; - if (clk_high) - clk_high--; + if (base_clk_divisor > ASPEED_I2CD_TIME_BASE_DIVISOR_MASK) { + base_clk_divisor = ASPEED_I2CD_TIME_BASE_DIVISOR_MASK; + clk_low = clk_high_low_mask; + clk_high = clk_high_low_mask; + dev_err(dev, + "clamping clock divider: divider requested, %u, is greater than largest possible divider, %u.\n", + divisor, (1 << base_clk_divisor) * clk_high_low_max); + } else { + tmp = (divisor + (1 << base_clk_divisor) - 1) + >> base_clk_divisor; + clk_low = tmp / 2; + clk_high = tmp - clk_low; + + if (clk_high) + clk_high--; - if (clk_low) - clk_low--; + if (clk_low) + clk_low--; + } return ((clk_high << ASPEED_I2CD_TIME_SCL_HIGH_SHIFT) & ASPEED_I2CD_TIME_SCL_HIGH_MASK) | ((clk_low << ASPEED_I2CD_TIME_SCL_LOW_SHIFT) & ASPEED_I2CD_TI