RE: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page
> Subject: Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset > in the 1st page > > On 5/17/2018 5:22 PM, Long Li wrote: > > From: Long Li> > There's a typo "recognize" in the patch title > > > > When doing RDMA send, the offset needs to be checked as data may start > > in an offset in the 1st page. > > Doesn't this patch alter the generic smb2pdu.c code too? I think this should > note "any" send, not just RDMA? Yes, but for TCP the direct_pages and page_offset are always NULL and 0 when cache=rdma is in mounting options, so it doesn't really affect he TCP code path in this patch set. This behavior can be changed, and it will work with TCP too. > > Tom. > > > > > Signed-off-by: Long Li > > --- > > fs/cifs/smb2pdu.c | 3 ++- > > fs/cifs/smbdirect.c | 25 +++-- > > 2 files changed, 21 insertions(+), 7 deletions(-) > > > > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index > > 5097f28..fdcf97e 100644 > > --- a/fs/cifs/smb2pdu.c > > +++ b/fs/cifs/smb2pdu.c > > @@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata, > > > > rqst.rq_iov = iov; > > rqst.rq_nvec = 2; > > - rqst.rq_pages = wdata->pages; > > + rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages : > wdata->pages; > > + rqst.rq_offset = wdata->page_offset; > > rqst.rq_npages = wdata->nr_pages; > > rqst.rq_pagesz = wdata->pagesz; > > rqst.rq_tailsz = wdata->tailsz; > > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index > > b0a1955..b46586d 100644 > > --- a/fs/cifs/smbdirect.c > > +++ b/fs/cifs/smbdirect.c > > @@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info, > > struct smb_rqst *rqst) > > > > /* add in the page array if there is one */ > > if (rqst->rq_npages) { > > - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1); > > - buflen += rqst->rq_tailsz; > > + if (rqst->rq_npages == 1) > > + buflen += rqst->rq_tailsz; > > + else > > + buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - > > +rqst->rq_offset + rqst->rq_tailsz; > > } > > > > if (buflen + sizeof(struct smbd_data_transfer) > @@ -2182,8 > > +2184,19 @@ int smbd_send(struct smbd_connection *info, struct > > smb_rqst *rqst) > > > > /* now sending pages if there are any */ > > for (i = 0; i < rqst->rq_npages; i++) { > > - buflen = (i == rqst->rq_npages-1) ? > > - rqst->rq_tailsz : rqst->rq_pagesz; > > + unsigned int offset = 0; > > + if (i == 0) > > + offset = rqst->rq_offset; > > + if (rqst->rq_npages == 1 || i == rqst->rq_npages-1) > > + buflen = rqst->rq_tailsz; > > + else { > > + /* We have at least two pages, and this is not the last > page */ > > + if (i == 0) > > + buflen = rqst->rq_pagesz - rqst->rq_offset; > > + else > > + buflen = rqst->rq_pagesz; > > + } > > + > > nvecs = (buflen + max_iov_size - 1) / max_iov_size; > > log_write(INFO, "sending pages buflen=%d nvecs=%d\n", > > buflen, nvecs); > > @@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info, > struct smb_rqst *rqst) > > remaining_data_length -= size; > > log_write(INFO, "sending pages i=%d offset=%d > size=%d" > > " remaining_data_length=%d\n", > > - i, j*max_iov_size, size, > remaining_data_length); > > + i, j*max_iov_size+offset, size, > remaining_data_length); > > rc = smbd_post_send_page( > > - info, rqst->rq_pages[i], j*max_iov_size, > > + info, rqst->rq_pages[i], j*max_iov_size + > offset, > > size, remaining_data_length); > > if (rc) > > goto done; > >
RE: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page
> Subject: Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset > in the 1st page > > On 5/17/2018 5:22 PM, Long Li wrote: > > From: Long Li > > There's a typo "recognize" in the patch title > > > > When doing RDMA send, the offset needs to be checked as data may start > > in an offset in the 1st page. > > Doesn't this patch alter the generic smb2pdu.c code too? I think this should > note "any" send, not just RDMA? Yes, but for TCP the direct_pages and page_offset are always NULL and 0 when cache=rdma is in mounting options, so it doesn't really affect he TCP code path in this patch set. This behavior can be changed, and it will work with TCP too. > > Tom. > > > > > Signed-off-by: Long Li > > --- > > fs/cifs/smb2pdu.c | 3 ++- > > fs/cifs/smbdirect.c | 25 +++-- > > 2 files changed, 21 insertions(+), 7 deletions(-) > > > > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index > > 5097f28..fdcf97e 100644 > > --- a/fs/cifs/smb2pdu.c > > +++ b/fs/cifs/smb2pdu.c > > @@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata, > > > > rqst.rq_iov = iov; > > rqst.rq_nvec = 2; > > - rqst.rq_pages = wdata->pages; > > + rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages : > wdata->pages; > > + rqst.rq_offset = wdata->page_offset; > > rqst.rq_npages = wdata->nr_pages; > > rqst.rq_pagesz = wdata->pagesz; > > rqst.rq_tailsz = wdata->tailsz; > > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index > > b0a1955..b46586d 100644 > > --- a/fs/cifs/smbdirect.c > > +++ b/fs/cifs/smbdirect.c > > @@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info, > > struct smb_rqst *rqst) > > > > /* add in the page array if there is one */ > > if (rqst->rq_npages) { > > - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1); > > - buflen += rqst->rq_tailsz; > > + if (rqst->rq_npages == 1) > > + buflen += rqst->rq_tailsz; > > + else > > + buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - > > +rqst->rq_offset + rqst->rq_tailsz; > > } > > > > if (buflen + sizeof(struct smbd_data_transfer) > @@ -2182,8 > > +2184,19 @@ int smbd_send(struct smbd_connection *info, struct > > smb_rqst *rqst) > > > > /* now sending pages if there are any */ > > for (i = 0; i < rqst->rq_npages; i++) { > > - buflen = (i == rqst->rq_npages-1) ? > > - rqst->rq_tailsz : rqst->rq_pagesz; > > + unsigned int offset = 0; > > + if (i == 0) > > + offset = rqst->rq_offset; > > + if (rqst->rq_npages == 1 || i == rqst->rq_npages-1) > > + buflen = rqst->rq_tailsz; > > + else { > > + /* We have at least two pages, and this is not the last > page */ > > + if (i == 0) > > + buflen = rqst->rq_pagesz - rqst->rq_offset; > > + else > > + buflen = rqst->rq_pagesz; > > + } > > + > > nvecs = (buflen + max_iov_size - 1) / max_iov_size; > > log_write(INFO, "sending pages buflen=%d nvecs=%d\n", > > buflen, nvecs); > > @@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info, > struct smb_rqst *rqst) > > remaining_data_length -= size; > > log_write(INFO, "sending pages i=%d offset=%d > size=%d" > > " remaining_data_length=%d\n", > > - i, j*max_iov_size, size, > remaining_data_length); > > + i, j*max_iov_size+offset, size, > remaining_data_length); > > rc = smbd_post_send_page( > > - info, rqst->rq_pages[i], j*max_iov_size, > > + info, rqst->rq_pages[i], j*max_iov_size + > offset, > > size, remaining_data_length); > > if (rc) > > goto done; > >
Re: [PATCH 1/2] m68k: Set default dma mask for platform devices
On Fri, 18 May 2018, Christoph Hellwig wrote: > > This implementation of arch_setup_pdev_archdata() differs from the > > powerpc one, in that this one avoids clobbering a device dma mask > > which has already been initialized. > > I think your implementation should move into the generic implementation > in drivers/base/platform.c instead of being stuck in m68k. > > Also powerpc probably wants fixing, but that's something left to the > ppc folks.. On powerpc, all platform devices get a dma mask. But they don't do that in drivers/base/platform.c, they use arch_setup_pdev_archdata(). Why didn't they take the approach you suggest? How would I support the claim that replacing an empty platform device dma mask with 0x is safe on all architectures and platforms? Is there no code conditional upon dev.coherent_dma_mask or dev.dma_mask that could misbehave? (Didn't I cite an example in the other thread?*) If you can convince me that it is safe, I'd be happy to submit the patch you asked for. For now, I still think that patching the platform driver was the correct patch*. Maybe the real problem is your commit 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask"), because it assumes that all dma_ops implementations care about coherent_dma_mask. The dma_map_ops implementations that do use coherent_dma_mask should simply fail when it is unset, right? Would it not be better to revert your patch and fix the dma_map_ops failure paths, than to somehow audit all the platform drivers and patch drivers/base/platform.c? Thanks. * https://lkml.kernel.org/r/alpine.LNX.2.21.1805091804290.72%40nippy.intranet --
Re: [PATCH 1/2] m68k: Set default dma mask for platform devices
On Fri, 18 May 2018, Christoph Hellwig wrote: > > This implementation of arch_setup_pdev_archdata() differs from the > > powerpc one, in that this one avoids clobbering a device dma mask > > which has already been initialized. > > I think your implementation should move into the generic implementation > in drivers/base/platform.c instead of being stuck in m68k. > > Also powerpc probably wants fixing, but that's something left to the > ppc folks.. On powerpc, all platform devices get a dma mask. But they don't do that in drivers/base/platform.c, they use arch_setup_pdev_archdata(). Why didn't they take the approach you suggest? How would I support the claim that replacing an empty platform device dma mask with 0x is safe on all architectures and platforms? Is there no code conditional upon dev.coherent_dma_mask or dev.dma_mask that could misbehave? (Didn't I cite an example in the other thread?*) If you can convince me that it is safe, I'd be happy to submit the patch you asked for. For now, I still think that patching the platform driver was the correct patch*. Maybe the real problem is your commit 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask"), because it assumes that all dma_ops implementations care about coherent_dma_mask. The dma_map_ops implementations that do use coherent_dma_mask should simply fail when it is unset, right? Would it not be better to revert your patch and fix the dma_map_ops failure paths, than to somehow audit all the platform drivers and patch drivers/base/platform.c? Thanks. * https://lkml.kernel.org/r/alpine.LNX.2.21.1805091804290.72%40nippy.intranet --
atención
usuarios de webmail Tenga en cuenta que el 95% de sus correos electrónicos recibidos después de actualizar el servidor webmail recientemente en nuestra base de datos se han suspendido. Reciba su mensaje regularmente. Nuestro personal técnico actualizará su cuenta dentro de los tres días hábiles. De lo contrario, su cuenta será suspendida temporalmente por nuestro servicio. Para reconfirmar su buzón de correo, se enviará la siguiente información: normal: Nombre de usuario: contraseña: Confirmar la contraseña: Una advertencia !! Si esto no le permite actualizar su cuenta dentro de los dos días posteriores a la recepción del correo electrónico, perderá permanentemente el correo web de la cuenta del titular de la cuenta. ¡Gracias por su colaboración! Copyright © 2017-2018 WebMail Technical Support Services, Inc.
atención
usuarios de webmail Tenga en cuenta que el 95% de sus correos electrónicos recibidos después de actualizar el servidor webmail recientemente en nuestra base de datos se han suspendido. Reciba su mensaje regularmente. Nuestro personal técnico actualizará su cuenta dentro de los tres días hábiles. De lo contrario, su cuenta será suspendida temporalmente por nuestro servicio. Para reconfirmar su buzón de correo, se enviará la siguiente información: normal: Nombre de usuario: contraseña: Confirmar la contraseña: Una advertencia !! Si esto no le permite actualizar su cuenta dentro de los dos días posteriores a la recepción del correo electrónico, perderá permanentemente el correo web de la cuenta del titular de la cuenta. ¡Gracias por su colaboración! Copyright © 2017-2018 WebMail Technical Support Services, Inc.
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
Hi Tycho, I love your patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.17-rc5] [cannot apply to next-20180517] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527 config: i386-randconfig-a1-05181545 (attached as .config) compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 reproduce: # save the attached .config to linux build tree make ARCH=i386 All error/warnings (new ones prefixed by >>): kernel/seccomp.c: In function '__seccomp_filter': kernel/seccomp.c:891:46: warning: passing argument 2 of 'seccomp_do_user_notification' makes integer from pointer without a cast seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: expected 'u32' but argument is of type 'struct seccomp_filter *' static void seccomp_do_user_notification(int this_syscall, ^ >> kernel/seccomp.c:891:53: warning: passing argument 3 of >> 'seccomp_do_user_notification' from incompatible pointer type seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but argument is of type 'const struct seccomp_data *' static void seccomp_do_user_notification(int this_syscall, ^ kernel/seccomp.c:891:3: error: too few arguments to function 'seccomp_do_user_notification' seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: declared here static void seccomp_do_user_notification(int this_syscall, ^ kernel/seccomp.c: In function 'seccomp_set_mode_filter': >> kernel/seccomp.c:1036:3: error: implicit declaration of function >> 'get_unused_fd_flags' [-Werror=implicit-function-declaration] listener = get_unused_fd_flags(O_RDWR); ^ >> kernel/seccomp.c:1042:3: error: implicit declaration of function >> 'init_listener' [-Werror=implicit-function-declaration] listener_f = init_listener(current, prepared); ^ kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer without a cast listener_f = init_listener(current, prepared); ^ >> kernel/seccomp.c:1044:4: error: implicit declaration of function >> 'put_unused_fd' [-Werror=implicit-function-declaration] put_unused_fd(listener); ^ >> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput' >> [-Werror=implicit-function-declaration] fput(listener_f); ^ >> kernel/seccomp.c:1080:4: error: implicit declaration of function >> 'fd_install' [-Werror=implicit-function-declaration] fd_install(listener, listener_f); ^ cc1: some warnings being treated as errors vim +/get_unused_fd_flags +1036 kernel/seccomp.c 812 813 #ifdef CONFIG_SECCOMP_FILTER 814 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, 815 const bool recheck_after_trace) 816 { 817 u32 filter_ret, action; 818 struct seccomp_filter *match = NULL; 819 int data; 820 821 /* 822 * Make sure that any changes to mode from another thread have 823 * been seen after TIF_SECCOMP was seen. 824 */ 825 rmb(); 826 827 filter_ret = seccomp_run_filters(sd, ); 828 data = filter_ret & SECCOMP_RET_DATA; 829 action = filter_ret & SECCOMP_RET_ACTION_FULL; 830 831 switch (action) { 832 case SECCOMP_RET_ERRNO: 833 /* Set low-order bits as an errno, capped at MAX_ERRNO. */ 834 if (data > MAX_ERRNO) 835 data = MAX_ERRNO; 836 syscall_set_return_value(current, task_pt_regs(current), 837 -data, 0); 838 goto skip; 839 840 case SECCOMP_RET_TRAP: 841 /* Show the handler the original registers. */ 842 syscall_rollback(current, task_pt_regs(current)); 843 /* Let the filter pass back 16 bits of data. */ 844 seccomp_send_sigsys(this_syscall, data); 845 goto skip; 846 847 case SECCOMP_RET_TRACE: 848 /* We've been put in this state by the ptracer already. */ 849 if (recheck_after_trace) 850 return 0; 851 852 /* ENOSYS these calls if there is no tracer attached. */ 853 if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) { 854
Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace
Hi Tycho, I love your patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.17-rc5] [cannot apply to next-20180517] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527 config: i386-randconfig-a1-05181545 (attached as .config) compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4 reproduce: # save the attached .config to linux build tree make ARCH=i386 All error/warnings (new ones prefixed by >>): kernel/seccomp.c: In function '__seccomp_filter': kernel/seccomp.c:891:46: warning: passing argument 2 of 'seccomp_do_user_notification' makes integer from pointer without a cast seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: expected 'u32' but argument is of type 'struct seccomp_filter *' static void seccomp_do_user_notification(int this_syscall, ^ >> kernel/seccomp.c:891:53: warning: passing argument 3 of >> 'seccomp_do_user_notification' from incompatible pointer type seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but argument is of type 'const struct seccomp_data *' static void seccomp_do_user_notification(int this_syscall, ^ kernel/seccomp.c:891:3: error: too few arguments to function 'seccomp_do_user_notification' seccomp_do_user_notification(this_syscall, match, sd); ^ kernel/seccomp.c:802:13: note: declared here static void seccomp_do_user_notification(int this_syscall, ^ kernel/seccomp.c: In function 'seccomp_set_mode_filter': >> kernel/seccomp.c:1036:3: error: implicit declaration of function >> 'get_unused_fd_flags' [-Werror=implicit-function-declaration] listener = get_unused_fd_flags(O_RDWR); ^ >> kernel/seccomp.c:1042:3: error: implicit declaration of function >> 'init_listener' [-Werror=implicit-function-declaration] listener_f = init_listener(current, prepared); ^ kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer without a cast listener_f = init_listener(current, prepared); ^ >> kernel/seccomp.c:1044:4: error: implicit declaration of function >> 'put_unused_fd' [-Werror=implicit-function-declaration] put_unused_fd(listener); ^ >> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput' >> [-Werror=implicit-function-declaration] fput(listener_f); ^ >> kernel/seccomp.c:1080:4: error: implicit declaration of function >> 'fd_install' [-Werror=implicit-function-declaration] fd_install(listener, listener_f); ^ cc1: some warnings being treated as errors vim +/get_unused_fd_flags +1036 kernel/seccomp.c 812 813 #ifdef CONFIG_SECCOMP_FILTER 814 static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, 815 const bool recheck_after_trace) 816 { 817 u32 filter_ret, action; 818 struct seccomp_filter *match = NULL; 819 int data; 820 821 /* 822 * Make sure that any changes to mode from another thread have 823 * been seen after TIF_SECCOMP was seen. 824 */ 825 rmb(); 826 827 filter_ret = seccomp_run_filters(sd, ); 828 data = filter_ret & SECCOMP_RET_DATA; 829 action = filter_ret & SECCOMP_RET_ACTION_FULL; 830 831 switch (action) { 832 case SECCOMP_RET_ERRNO: 833 /* Set low-order bits as an errno, capped at MAX_ERRNO. */ 834 if (data > MAX_ERRNO) 835 data = MAX_ERRNO; 836 syscall_set_return_value(current, task_pt_regs(current), 837 -data, 0); 838 goto skip; 839 840 case SECCOMP_RET_TRAP: 841 /* Show the handler the original registers. */ 842 syscall_rollback(current, task_pt_regs(current)); 843 /* Let the filter pass back 16 bits of data. */ 844 seccomp_send_sigsys(this_syscall, data); 845 goto skip; 846 847 case SECCOMP_RET_TRACE: 848 /* We've been put in this state by the ptracer already. */ 849 if (recheck_after_trace) 850 return 0; 851 852 /* ENOSYS these calls if there is no tracer attached. */ 853 if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) { 854
[PATCH v2 1/2] [usb-storage] Add support for FL_ALWAYS_SYNC flag in the UAS driver
The ALWAYS_SYNC flag is currently honored by the usb-storage driver but not UAS and is required to work around devices that become unstable upon being queried for cache. This code is taken straight from: drivers/usb/storage/scsiglue.c:284 Signed-off-by: Alexander Kappner--- drivers/usb/storage/uas.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c index 6034c39..9e9de54 100644 --- a/drivers/usb/storage/uas.c +++ b/drivers/usb/storage/uas.c @@ -836,6 +836,12 @@ static int uas_slave_configure(struct scsi_device *sdev) if (devinfo->flags & US_FL_BROKEN_FUA) sdev->broken_fua = 1; + /* UAS also needs to support FL_ALWAYS_SYNC */ + if (devinfo->flags & US_FL_ALWAYS_SYNC) { + sdev->skip_ms_page_3f = 1; + sdev->skip_ms_page_8 = 1; + sdev->wce_default_on = 1; + } scsi_change_queue_depth(sdev, devinfo->qdepth - 2); return 0; } -- 2.1.4
[PATCH v2 2/2] [usb-storage] Add compatibility quirk flags for G-Technologies G-Drive
The "G-Drive" (sold by G-Technology) external USB 3.0 drive hangs on write access under UAS and usb-storage: [ 136.079121] sd 15:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [ 136.079144] sd 15:0:0:0: [sdi] tag#0 Sense Key : Illegal Request [current] [ 136.079152] sd 15:0:0:0: [sdi] tag#0 Add. Sense: Invalid field in cdb [ 136.079176] sd 15:0:0:0: [sdi] tag#0 CDB: Write(16) 8a 08 00 00 00 00 00 00 00 00 00 00 00 08 00 00 [ 136.079180] print_req_error: critical target error, dev sdi, sector 0 [ 136.079183] Buffer I/O error on dev sdi, logical block 0, lost sync page write [ 136.173148] EXT4-fs (sdi): mounted filesystem with ordered data mode. Opts: (null) [ 140.583998] sd 15:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [ 140.584010] sd 15:0:0:0: [sdi] tag#0 Sense Key : Illegal Request [current] [ 140.584016] sd 15:0:0:0: [sdi] tag#0 Add. Sense: Invalid field in cdb [ 140.584022] sd 15:0:0:0: [sdi] tag#0 CDB: Write(16) 8a 08 00 00 00 00 e8 c4 00 18 00 00 00 08 00 00 [ 140.584025] print_req_error: critical target error, dev sdi, sector 3905159192 [ 140.584044] print_req_error: critical target error, dev sdi, sector 3905159192 [ 140.584052] Aborting journal on device sdi-8. The proposed patch adds compatibility quirks. Because the drive requires two quirks (one to work with UAS, and another to work with usb-storage), adding this under unusual_devs.h and not just unusual_uas.h so kernels compiled without UAS receive the quirk. With the patch, the drive works reliably on UAS and usb- storage. (tested on NEC Corporation uPD720200 USB 3.0 host controller). Signed-off-by: Alexander Kappner--- drivers/usb/storage/unusual_devs.h | 9 + drivers/usb/storage/unusual_uas.h | 9 + 2 files changed, 18 insertions(+) diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h index 747d3a9..22fcfcc 100644 --- a/drivers/usb/storage/unusual_devs.h +++ b/drivers/usb/storage/unusual_devs.h @@ -2321,6 +2321,15 @@ UNUSUAL_DEV( 0x4146, 0xba01, 0x0100, 0x0100, "Micro Mini 1GB", USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_NOT_LOCKABLE ), +/* "G-DRIVE" external HDD hangs on write without these. + * Patch submitted by Alexander Kappner + */ +UNUSUAL_DEV(0x4971, 0x8024, 0x, 0x, + "SimpleTech", + "External HDD", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_ALWAYS_SYNC), + /* * Nick Bowler * SCSI stack spams (otherwise harmless) error messages. diff --git a/drivers/usb/storage/unusual_uas.h b/drivers/usb/storage/unusual_uas.h index 38434d8..d0bdebd 100644 --- a/drivers/usb/storage/unusual_uas.h +++ b/drivers/usb/storage/unusual_uas.h @@ -107,3 +107,12 @@ UNUSUAL_DEV(0x4971, 0x8017, 0x, 0x, "External HDD", USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_NO_REPORT_OPCODES), + +/* "G-DRIVE" external HDD hangs on write without these. + * Patch submitted by Alexander Kappner + */ +UNUSUAL_DEV(0x4971, 0x8024, 0x, 0x, + "SimpleTech", + "External HDD", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_ALWAYS_SYNC), -- 2.1.4
[PATCH v2 0/2] Re: usb-storage: Add quirks to make G-Technology "G-Drive" work
v2 of this patch implements the FL_ALWAYS_SYNC flag in the UAS driver, and then adds the flag as quirks for the device at issue. This allows the G-Drive to work under both UAS and usb-storage. Alexander Kappner (2): [usb-storage] Add support for FL_ALWAYS_SYNC flag in the UAS driver [usb-storage] Add compatibility quirk flags for G-Technologies G-Drive drivers/usb/storage/uas.c | 6 ++ drivers/usb/storage/unusual_devs.h | 9 + drivers/usb/storage/unusual_uas.h | 9 + 3 files changed, 24 insertions(+) -- 2.1.4
[PATCH v2 1/2] [usb-storage] Add support for FL_ALWAYS_SYNC flag in the UAS driver
The ALWAYS_SYNC flag is currently honored by the usb-storage driver but not UAS and is required to work around devices that become unstable upon being queried for cache. This code is taken straight from: drivers/usb/storage/scsiglue.c:284 Signed-off-by: Alexander Kappner --- drivers/usb/storage/uas.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c index 6034c39..9e9de54 100644 --- a/drivers/usb/storage/uas.c +++ b/drivers/usb/storage/uas.c @@ -836,6 +836,12 @@ static int uas_slave_configure(struct scsi_device *sdev) if (devinfo->flags & US_FL_BROKEN_FUA) sdev->broken_fua = 1; + /* UAS also needs to support FL_ALWAYS_SYNC */ + if (devinfo->flags & US_FL_ALWAYS_SYNC) { + sdev->skip_ms_page_3f = 1; + sdev->skip_ms_page_8 = 1; + sdev->wce_default_on = 1; + } scsi_change_queue_depth(sdev, devinfo->qdepth - 2); return 0; } -- 2.1.4
[PATCH v2 2/2] [usb-storage] Add compatibility quirk flags for G-Technologies G-Drive
The "G-Drive" (sold by G-Technology) external USB 3.0 drive hangs on write access under UAS and usb-storage: [ 136.079121] sd 15:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [ 136.079144] sd 15:0:0:0: [sdi] tag#0 Sense Key : Illegal Request [current] [ 136.079152] sd 15:0:0:0: [sdi] tag#0 Add. Sense: Invalid field in cdb [ 136.079176] sd 15:0:0:0: [sdi] tag#0 CDB: Write(16) 8a 08 00 00 00 00 00 00 00 00 00 00 00 08 00 00 [ 136.079180] print_req_error: critical target error, dev sdi, sector 0 [ 136.079183] Buffer I/O error on dev sdi, logical block 0, lost sync page write [ 136.173148] EXT4-fs (sdi): mounted filesystem with ordered data mode. Opts: (null) [ 140.583998] sd 15:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [ 140.584010] sd 15:0:0:0: [sdi] tag#0 Sense Key : Illegal Request [current] [ 140.584016] sd 15:0:0:0: [sdi] tag#0 Add. Sense: Invalid field in cdb [ 140.584022] sd 15:0:0:0: [sdi] tag#0 CDB: Write(16) 8a 08 00 00 00 00 e8 c4 00 18 00 00 00 08 00 00 [ 140.584025] print_req_error: critical target error, dev sdi, sector 3905159192 [ 140.584044] print_req_error: critical target error, dev sdi, sector 3905159192 [ 140.584052] Aborting journal on device sdi-8. The proposed patch adds compatibility quirks. Because the drive requires two quirks (one to work with UAS, and another to work with usb-storage), adding this under unusual_devs.h and not just unusual_uas.h so kernels compiled without UAS receive the quirk. With the patch, the drive works reliably on UAS and usb- storage. (tested on NEC Corporation uPD720200 USB 3.0 host controller). Signed-off-by: Alexander Kappner --- drivers/usb/storage/unusual_devs.h | 9 + drivers/usb/storage/unusual_uas.h | 9 + 2 files changed, 18 insertions(+) diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h index 747d3a9..22fcfcc 100644 --- a/drivers/usb/storage/unusual_devs.h +++ b/drivers/usb/storage/unusual_devs.h @@ -2321,6 +2321,15 @@ UNUSUAL_DEV( 0x4146, 0xba01, 0x0100, 0x0100, "Micro Mini 1GB", USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_NOT_LOCKABLE ), +/* "G-DRIVE" external HDD hangs on write without these. + * Patch submitted by Alexander Kappner + */ +UNUSUAL_DEV(0x4971, 0x8024, 0x, 0x, + "SimpleTech", + "External HDD", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_ALWAYS_SYNC), + /* * Nick Bowler * SCSI stack spams (otherwise harmless) error messages. diff --git a/drivers/usb/storage/unusual_uas.h b/drivers/usb/storage/unusual_uas.h index 38434d8..d0bdebd 100644 --- a/drivers/usb/storage/unusual_uas.h +++ b/drivers/usb/storage/unusual_uas.h @@ -107,3 +107,12 @@ UNUSUAL_DEV(0x4971, 0x8017, 0x, 0x, "External HDD", USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_NO_REPORT_OPCODES), + +/* "G-DRIVE" external HDD hangs on write without these. + * Patch submitted by Alexander Kappner + */ +UNUSUAL_DEV(0x4971, 0x8024, 0x, 0x, + "SimpleTech", + "External HDD", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_ALWAYS_SYNC), -- 2.1.4
[PATCH v2 0/2] Re: usb-storage: Add quirks to make G-Technology "G-Drive" work
v2 of this patch implements the FL_ALWAYS_SYNC flag in the UAS driver, and then adds the flag as quirks for the device at issue. This allows the G-Drive to work under both UAS and usb-storage. Alexander Kappner (2): [usb-storage] Add support for FL_ALWAYS_SYNC flag in the UAS driver [usb-storage] Add compatibility quirk flags for G-Technologies G-Drive drivers/usb/storage/uas.c | 6 ++ drivers/usb/storage/unusual_devs.h | 9 + drivers/usb/storage/unusual_uas.h | 9 + 3 files changed, 24 insertions(+) -- 2.1.4
Re: [PATCH 5/6] x86: refcount: prevent gcc distortions
Hi Nadav, Thank you for the patch! Yet something to improve: [auto build test ERROR on tip/auto-latest] [also build test ERROR on v4.17-rc5 next-20180517] [cannot apply to tip/x86/core] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-objtool-use-asm-macro-for-better-compiler-decisions/20180519-045439 config: x86_64-allyesdebian (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): arch/x86/include/asm/refcount.h: Assembler messages: >> arch/x86/include/asm/refcount.h:67: Error: too many positional arguments vim +67 arch/x86/include/asm/refcount.h 63 64 static __always_inline void refcount_inc(refcount_t *r) 65 { 66 asm volatile(LOCK_PREFIX "incl %0\n\t" > 67 "__REFCOUNT_CHECK_LT_ZERO %[counter]" 68 : [counter] "+m" (r->refs.counter) 69 : : "cc", "cx"); 70 } 71 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH 5/6] x86: refcount: prevent gcc distortions
Hi Nadav, Thank you for the patch! Yet something to improve: [auto build test ERROR on tip/auto-latest] [also build test ERROR on v4.17-rc5 next-20180517] [cannot apply to tip/x86/core] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-objtool-use-asm-macro-for-better-compiler-decisions/20180519-045439 config: x86_64-allyesdebian (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): arch/x86/include/asm/refcount.h: Assembler messages: >> arch/x86/include/asm/refcount.h:67: Error: too many positional arguments vim +67 arch/x86/include/asm/refcount.h 63 64 static __always_inline void refcount_inc(refcount_t *r) 65 { 66 asm volatile(LOCK_PREFIX "incl %0\n\t" > 67 "__REFCOUNT_CHECK_LT_ZERO %[counter]" 68 : [counter] "+m" (r->refs.counter) 69 : : "cc", "cx"); 70 } 71 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 9:12 PM Linus Torvalds < torva...@linux-foundation.org> wrote: > (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it > would only be used as a "fill in inode value for regular files for this > superblock"). Actually, maybe we could just make rw_verify_area() use the new file_mmap_size_max() function. We'd just remove the "mmap" part of the name, and have all IO (and all mmap) use that limit. That doesn't sound like a horrible idea to try for 4.18. Hmm? Linus
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 9:12 PM Linus Torvalds < torva...@linux-foundation.org> wrote: > (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it > would only be used as a "fill in inode value for regular files for this > superblock"). Actually, maybe we could just make rw_verify_area() use the new file_mmap_size_max() function. We'd just remove the "mmap" part of the name, and have all IO (and all mmap) use that limit. That doesn't sound like a horrible idea to try for 4.18. Hmm? Linus
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 8:43 PM Al Virowrote: > Not quite. The things like > if (unlikely(*ppos >= inode->i_sb->s_maxbytes)) > return 0; > iov_iter_truncate(iter, inode->i_sb->s_maxbytes); > protect most of the regular files (see mm/filemap.c). And for devices (which is > where the majority of crap ->read()/->write() is) it's obviously not applicable - > ->s_maxbytes of *what*? Yeah that "s_maxbytes of what" is I think the real issue. We should never have made s_maxbytes be super-block specific: we should have made it be per-inode, and then have inode_init_always() initialize it using something like the file_mmap_size_max() logic. (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it would only be used as a "fill in inode value for regular files for this superblock"). Then we could actually protect read/write properly, because many of the nasty bugs have been in character device drivers. Oh well. It would still be a good thing to do some day, I suspect, but it's clearly not the case now, and so s_maxbytes actually has much less coverage than I was hoping for. (And thus also the problems with /proc/vmcore - it never saw s_maxbytes limits before). Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously means that my objections against Vasily's patch are mostly invalid. Even if /proc does use "generic_file_llseek()" a lot and that should limit things to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up the offset. I'd still prefer to limit the damage to just "vmcore". Something like the below COMPLETELY UNTESTED patch? Vasily? Linus fs/proc/vmcore.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index a45f0af22a60..83278c547127 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -491,7 +491,15 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma) } #endif +/* Mark vmcore as being able and willing to do 64-bit mmaps */ +static int vmcore_open(struct inode *inode, struct file *file) +{ + file->f_mode |= FMODE_UNSIGNED_OFFSET; + return 0; +} + static const struct file_operations proc_vmcore_operations = { + .open = vmcore_open, .read = read_vmcore, .llseek = default_llseek, .mmap = mmap_vmcore,
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 8:43 PM Al Viro wrote: > Not quite. The things like > if (unlikely(*ppos >= inode->i_sb->s_maxbytes)) > return 0; > iov_iter_truncate(iter, inode->i_sb->s_maxbytes); > protect most of the regular files (see mm/filemap.c). And for devices (which is > where the majority of crap ->read()/->write() is) it's obviously not applicable - > ->s_maxbytes of *what*? Yeah that "s_maxbytes of what" is I think the real issue. We should never have made s_maxbytes be super-block specific: we should have made it be per-inode, and then have inode_init_always() initialize it using something like the file_mmap_size_max() logic. (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it would only be used as a "fill in inode value for regular files for this superblock"). Then we could actually protect read/write properly, because many of the nasty bugs have been in character device drivers. Oh well. It would still be a good thing to do some day, I suspect, but it's clearly not the case now, and so s_maxbytes actually has much less coverage than I was hoping for. (And thus also the problems with /proc/vmcore - it never saw s_maxbytes limits before). Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously means that my objections against Vasily's patch are mostly invalid. Even if /proc does use "generic_file_llseek()" a lot and that should limit things to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up the offset. I'd still prefer to limit the damage to just "vmcore". Something like the below COMPLETELY UNTESTED patch? Vasily? Linus fs/proc/vmcore.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index a45f0af22a60..83278c547127 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -491,7 +491,15 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma) } #endif +/* Mark vmcore as being able and willing to do 64-bit mmaps */ +static int vmcore_open(struct inode *inode, struct file *file) +{ + file->f_mode |= FMODE_UNSIGNED_OFFSET; + return 0; +} + static const struct file_operations proc_vmcore_operations = { + .open = vmcore_open, .read = read_vmcore, .llseek = default_llseek, .mmap = mmap_vmcore,
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 08:33:37PM -0700, Linus Torvalds wrote: > On Fri, May 18, 2018 at 8:20 PM Linus Torvalds < > torva...@linux-foundation.org> wrote: > > > I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_, > > rather than open up all proc files to issues with 4G+ offsets. > > Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and > ask you how that ever worked for that file, but it's not there, the > s_maxbyte checks are only in lseek and in do_splice(). > > So apparently we protect against llseek + read/write, but we don't protect > against pread64/pwrite64 having offset overflows.. > > That's crazy. That makes all the s_maxbytes protection much less effective > than it should be. Filesystems that don't get the 64-bit case right will > screw up pread64 and friends. > > Al, I'm missing something. Did we always have this gaping hole where we > didn't actually check s_maxbytes against read/write, only > generic_file_llseek? Apparently. Not quite. The things like if (unlikely(*ppos >= inode->i_sb->s_maxbytes)) return 0; iov_iter_truncate(iter, inode->i_sb->s_maxbytes); protect most of the regular files (see mm/filemap.c). And for devices (which is where the majority of crap ->read()/->write() is) it's obviously not applicable - ->s_maxbytes of *what*?
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 08:33:37PM -0700, Linus Torvalds wrote: > On Fri, May 18, 2018 at 8:20 PM Linus Torvalds < > torva...@linux-foundation.org> wrote: > > > I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_, > > rather than open up all proc files to issues with 4G+ offsets. > > Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and > ask you how that ever worked for that file, but it's not there, the > s_maxbyte checks are only in lseek and in do_splice(). > > So apparently we protect against llseek + read/write, but we don't protect > against pread64/pwrite64 having offset overflows.. > > That's crazy. That makes all the s_maxbytes protection much less effective > than it should be. Filesystems that don't get the 64-bit case right will > screw up pread64 and friends. > > Al, I'm missing something. Did we always have this gaping hole where we > didn't actually check s_maxbytes against read/write, only > generic_file_llseek? Apparently. Not quite. The things like if (unlikely(*ppos >= inode->i_sb->s_maxbytes)) return 0; iov_iter_truncate(iter, inode->i_sb->s_maxbytes); protect most of the regular files (see mm/filemap.c). And for devices (which is where the majority of crap ->read()/->write() is) it's obviously not applicable - ->s_maxbytes of *what*?
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 8:20 PM Linus Torvalds < torva...@linux-foundation.org> wrote: > I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_, > rather than open up all proc files to issues with 4G+ offsets. Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and ask you how that ever worked for that file, but it's not there, the s_maxbyte checks are only in lseek and in do_splice(). So apparently we protect against llseek + read/write, but we don't protect against pread64/pwrite64 having offset overflows.. That's crazy. That makes all the s_maxbytes protection much less effective than it should be. Filesystems that don't get the 64-bit case right will screw up pread64 and friends. Al, I'm missing something. Did we always have this gaping hole where we didn't actually check s_maxbytes against read/write, only generic_file_llseek? Apparently. Linus
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 8:20 PM Linus Torvalds < torva...@linux-foundation.org> wrote: > I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_, > rather than open up all proc files to issues with 4G+ offsets. Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and ask you how that ever worked for that file, but it's not there, the s_maxbyte checks are only in lseek and in do_splice(). So apparently we protect against llseek + read/write, but we don't protect against pread64/pwrite64 having offset overflows.. That's crazy. That makes all the s_maxbytes protection much less effective than it should be. Filesystems that don't get the 64-bit case right will screw up pread64 and friends. Al, I'm missing something. Did we always have this gaping hole where we didn't actually check s_maxbytes against read/write, only generic_file_llseek? Apparently. Linus
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 8:15 PM Vasily Gorbikwrote: > Commit be83bbf80682 ("mmap: introduce sane default mmap limits") > introduced "pgoff" limits checks for mmap, considering fs max file size > as upper limit, which made it impossible to mmap /proc/vmcore file > contents above 2Gb (/proc/vmcore appears to be the only procfs file > supporting mmap). > Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value. Ugh. /proc is where a lot of problems *have* been. Admittedly not as much as random drivers, but still. If proc doesn't set s_maxbytes, then we should not raise it here magically, there might be various /proc files that get offset accounting wrong. I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_, rather than open up all proc files to issues with 4G+ offsets. Linus
Re: [PATCH] procfs: fix mmap() for /proc/vmcore
On Fri, May 18, 2018 at 8:15 PM Vasily Gorbik wrote: > Commit be83bbf80682 ("mmap: introduce sane default mmap limits") > introduced "pgoff" limits checks for mmap, considering fs max file size > as upper limit, which made it impossible to mmap /proc/vmcore file > contents above 2Gb (/proc/vmcore appears to be the only procfs file > supporting mmap). > Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value. Ugh. /proc is where a lot of problems *have* been. Admittedly not as much as random drivers, but still. If proc doesn't set s_maxbytes, then we should not raise it here magically, there might be various /proc files that get offset accounting wrong. I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_, rather than open up all proc files to issues with 4G+ offsets. Linus
[PATCH] procfs: fix mmap() for /proc/vmcore
Procfs does not set max file size (s_maxbytes) and ends up with default value of MAX_NON_LFS ((1UL<<31) - 1) Commit be83bbf80682 ("mmap: introduce sane default mmap limits") introduced "pgoff" limits checks for mmap, considering fs max file size as upper limit, which made it impossible to mmap /proc/vmcore file contents above 2Gb (/proc/vmcore appears to be the only procfs file supporting mmap). Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value. Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits") Signed-off-by: Vasily Gorbik--- fs/proc/inode.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 2cf3b74391ca..de1b3b1da883 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -502,6 +502,7 @@ int proc_fill_super(struct super_block *s, void *data, int silent) /* User space would break if executables or devices appear on proc */ s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV; s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC; + s->s_maxbytes = MAX_LFS_FILESIZE; s->s_blocksize = 1024; s->s_blocksize_bits = 10; s->s_magic = PROC_SUPER_MAGIC; -- ⢀⡵⣤⡴⣅ 2.17.0 ⠏⢟⡛⣛⠏⠇ space invaders edition
[PATCH] procfs: fix mmap() for /proc/vmcore
Procfs does not set max file size (s_maxbytes) and ends up with default value of MAX_NON_LFS ((1UL<<31) - 1) Commit be83bbf80682 ("mmap: introduce sane default mmap limits") introduced "pgoff" limits checks for mmap, considering fs max file size as upper limit, which made it impossible to mmap /proc/vmcore file contents above 2Gb (/proc/vmcore appears to be the only procfs file supporting mmap). Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value. Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits") Signed-off-by: Vasily Gorbik --- fs/proc/inode.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 2cf3b74391ca..de1b3b1da883 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -502,6 +502,7 @@ int proc_fill_super(struct super_block *s, void *data, int silent) /* User space would break if executables or devices appear on proc */ s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV; s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC; + s->s_maxbytes = MAX_LFS_FILESIZE; s->s_blocksize = 1024; s->s_blocksize_bits = 10; s->s_magic = PROC_SUPER_MAGIC; -- ⢀⡵⣤⡴⣅ 2.17.0 ⠏⢟⡛⣛⠏⠇ space invaders edition
[v4.17-rc5][bisected] be83bbf80682 breaks /proc/vmcore mmap
Greetings, be83bbf80682 "mmap: introduce sane default mmap limits" introduced a problem with mapping /proc/vmcore if it is bigger than 2gb. This breaks s390 kernel zfcpdump. But it should be a general problem. Please consider the following one-liner fix, if it makes sense. Vasily Gorbik (1): procfs: fix mmap() for /proc/vmcore fs/proc/inode.c | 1 + 1 file changed, 1 insertion(+) -- ⢀⡵⣤⡴⣅ 2.17.0 ⠏⢟⡛⣛⠏⠇ space invaders edition
[v4.17-rc5][bisected] be83bbf80682 breaks /proc/vmcore mmap
Greetings, be83bbf80682 "mmap: introduce sane default mmap limits" introduced a problem with mapping /proc/vmcore if it is bigger than 2gb. This breaks s390 kernel zfcpdump. But it should be a general problem. Please consider the following one-liner fix, if it makes sense. Vasily Gorbik (1): procfs: fix mmap() for /proc/vmcore fs/proc/inode.c | 1 + 1 file changed, 1 insertion(+) -- ⢀⡵⣤⡴⣅ 2.17.0 ⠏⢟⡛⣛⠏⠇ space invaders edition
Re: [PATCH] net: sched: don't disable bh when accessing action idr
On Fri, May 18, 2018 at 8:45 AM, Vlad Buslovwrote: > Underlying implementation of action map has changed and doesn't require > disabling bh anymore. Replace all action idr spinlock usage with regular > calls that do not disable bh. Please explain explicitly why it is not required, don't let people dig, this would save everyone's time. Also, this should be targeted for net-next, right? Thanks.
Re: [PATCH] net: sched: don't disable bh when accessing action idr
On Fri, May 18, 2018 at 8:45 AM, Vlad Buslov wrote: > Underlying implementation of action map has changed and doesn't require > disabling bh anymore. Replace all action idr spinlock usage with regular > calls that do not disable bh. Please explain explicitly why it is not required, don't let people dig, this would save everyone's time. Also, this should be targeted for net-next, right? Thanks.
[PATCH 05/12] arm: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit. Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/arm/include/asm/hw_breakpoint.h | 6 ++- arch/arm/kernel/hw_breakpoint.c | 71 ++-- 2 files changed, 41 insertions(+), 36 deletions(-) diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h index d5a0f52..451544d 100644 --- a/arch/arm/include/asm/hw_breakpoint.h +++ b/arch/arm/include/asm/hw_breakpoint.h @@ -111,6 +111,7 @@ static inline void decode_ctrl_reg(u32 reg, asm volatile("mcr p14, 0, %0, " #N "," #M ", " #OP2 : : "r" (VAL));\ } while (0) +struct perf_event_attr; struct notifier_block; struct perf_event; struct pmu; @@ -118,7 +119,10 @@ struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, int *gen_len, int *gen_type); extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c index 385dcf4..e80d125 100644 --- a/arch/arm/kernel/hw_breakpoint.c +++ b/arch/arm/kernel/hw_breakpoint.c @@ -517,42 +517,42 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, /* * Construct an arch_hw_breakpoint from a perf_event. */ -static int arch_build_bp_info(struct perf_event *bp) +static int arch_build_bp_info(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_X: - info->ctrl.type = ARM_BREAKPOINT_EXECUTE; + hw->ctrl.type = ARM_BREAKPOINT_EXECUTE; break; case HW_BREAKPOINT_R: - info->ctrl.type = ARM_BREAKPOINT_LOAD; + hw->ctrl.type = ARM_BREAKPOINT_LOAD; break; case HW_BREAKPOINT_W: - info->ctrl.type = ARM_BREAKPOINT_STORE; + hw->ctrl.type = ARM_BREAKPOINT_STORE; break; case HW_BREAKPOINT_RW: - info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE; + hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE; break; default: return -EINVAL; } /* Len */ - switch (bp->attr.bp_len) { + switch (attr->bp_len) { case HW_BREAKPOINT_LEN_1: - info->ctrl.len = ARM_BREAKPOINT_LEN_1; + hw->ctrl.len = ARM_BREAKPOINT_LEN_1; break; case HW_BREAKPOINT_LEN_2: - info->ctrl.len = ARM_BREAKPOINT_LEN_2; + hw->ctrl.len = ARM_BREAKPOINT_LEN_2; break; case HW_BREAKPOINT_LEN_4: - info->ctrl.len = ARM_BREAKPOINT_LEN_4; + hw->ctrl.len = ARM_BREAKPOINT_LEN_4; break; case HW_BREAKPOINT_LEN_8: - info->ctrl.len = ARM_BREAKPOINT_LEN_8; - if ((info->ctrl.type != ARM_BREAKPOINT_EXECUTE) + hw->ctrl.len = ARM_BREAKPOINT_LEN_8; + if ((hw->ctrl.type != ARM_BREAKPOINT_EXECUTE) && max_watchpoint_len >= 8) break; default: @@ -565,24 +565,24 @@ static int arch_build_bp_info(struct perf_event *bp) * by the hardware and must be aligned to the appropriate number of * bytes. */ - if
[PATCH 05/12] arm: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit. Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/arm/include/asm/hw_breakpoint.h | 6 ++- arch/arm/kernel/hw_breakpoint.c | 71 ++-- 2 files changed, 41 insertions(+), 36 deletions(-) diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h index d5a0f52..451544d 100644 --- a/arch/arm/include/asm/hw_breakpoint.h +++ b/arch/arm/include/asm/hw_breakpoint.h @@ -111,6 +111,7 @@ static inline void decode_ctrl_reg(u32 reg, asm volatile("mcr p14, 0, %0, " #N "," #M ", " #OP2 : : "r" (VAL));\ } while (0) +struct perf_event_attr; struct notifier_block; struct perf_event; struct pmu; @@ -118,7 +119,10 @@ struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, int *gen_len, int *gen_type); extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c index 385dcf4..e80d125 100644 --- a/arch/arm/kernel/hw_breakpoint.c +++ b/arch/arm/kernel/hw_breakpoint.c @@ -517,42 +517,42 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, /* * Construct an arch_hw_breakpoint from a perf_event. */ -static int arch_build_bp_info(struct perf_event *bp) +static int arch_build_bp_info(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_X: - info->ctrl.type = ARM_BREAKPOINT_EXECUTE; + hw->ctrl.type = ARM_BREAKPOINT_EXECUTE; break; case HW_BREAKPOINT_R: - info->ctrl.type = ARM_BREAKPOINT_LOAD; + hw->ctrl.type = ARM_BREAKPOINT_LOAD; break; case HW_BREAKPOINT_W: - info->ctrl.type = ARM_BREAKPOINT_STORE; + hw->ctrl.type = ARM_BREAKPOINT_STORE; break; case HW_BREAKPOINT_RW: - info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE; + hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE; break; default: return -EINVAL; } /* Len */ - switch (bp->attr.bp_len) { + switch (attr->bp_len) { case HW_BREAKPOINT_LEN_1: - info->ctrl.len = ARM_BREAKPOINT_LEN_1; + hw->ctrl.len = ARM_BREAKPOINT_LEN_1; break; case HW_BREAKPOINT_LEN_2: - info->ctrl.len = ARM_BREAKPOINT_LEN_2; + hw->ctrl.len = ARM_BREAKPOINT_LEN_2; break; case HW_BREAKPOINT_LEN_4: - info->ctrl.len = ARM_BREAKPOINT_LEN_4; + hw->ctrl.len = ARM_BREAKPOINT_LEN_4; break; case HW_BREAKPOINT_LEN_8: - info->ctrl.len = ARM_BREAKPOINT_LEN_8; - if ((info->ctrl.type != ARM_BREAKPOINT_EXECUTE) + hw->ctrl.len = ARM_BREAKPOINT_LEN_8; + if ((hw->ctrl.type != ARM_BREAKPOINT_EXECUTE) && max_watchpoint_len >= 8) break; default: @@ -565,24 +565,24 @@ static int arch_build_bp_info(struct perf_event *bp) * by the hardware and must be aligned to the appropriate number of * bytes. */ - if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE && - info->ctrl.len != ARM_BREAKPOINT_LEN_2 && - info->ctrl.len != ARM_BREAKPOINT_LEN_4) + if (hw->ctrl.type == ARM_BREAKPOINT_EXECUTE && + hw->ctrl.len != ARM_BREAKPOINT_LEN_2 && + hw->ctrl.len != ARM_BREAKPOINT_LEN_4) return -EINVAL; /* Address */ - info->address = bp->attr.bp_addr; + hw->address = attr->bp_addr; /*
[PATCH 06/12] arm64: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit. Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/arm64/include/asm/hw_breakpoint.h | 6 ++- arch/arm64/kernel/hw_breakpoint.c | 79 +- 2 files changed, 45 insertions(+), 40 deletions(-) diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 9f4a3d4..3b43319 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -119,13 +119,17 @@ static inline void decode_ctrl_reg(u32 reg, struct task_struct; struct notifier_block; +struct perf_event_attr; struct perf_event; struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, int *gen_len, int *gen_type, int *offset); extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 92c6973..968c3af 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -421,53 +421,53 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, /* * Construct an arch_hw_breakpoint from a perf_event. */ -static int arch_build_bp_info(struct perf_event *bp) +static int arch_build_bp_info(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_X: - info->ctrl.type = ARM_BREAKPOINT_EXECUTE; + hw->ctrl.type = ARM_BREAKPOINT_EXECUTE; break; case HW_BREAKPOINT_R: - info->ctrl.type = ARM_BREAKPOINT_LOAD; + hw->ctrl.type = ARM_BREAKPOINT_LOAD; break; case HW_BREAKPOINT_W: - info->ctrl.type = ARM_BREAKPOINT_STORE; + hw->ctrl.type = ARM_BREAKPOINT_STORE; break; case HW_BREAKPOINT_RW: - info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE; + hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE; break; default: return -EINVAL; } /* Len */ - switch (bp->attr.bp_len) { + switch (attr->bp_len) { case HW_BREAKPOINT_LEN_1: - info->ctrl.len = ARM_BREAKPOINT_LEN_1; + hw->ctrl.len = ARM_BREAKPOINT_LEN_1; break; case HW_BREAKPOINT_LEN_2: - info->ctrl.len = ARM_BREAKPOINT_LEN_2; + hw->ctrl.len = ARM_BREAKPOINT_LEN_2; break; case HW_BREAKPOINT_LEN_3: - info->ctrl.len = ARM_BREAKPOINT_LEN_3; + hw->ctrl.len = ARM_BREAKPOINT_LEN_3; break; case HW_BREAKPOINT_LEN_4: - info->ctrl.len = ARM_BREAKPOINT_LEN_4; + hw->ctrl.len = ARM_BREAKPOINT_LEN_4; break; case HW_BREAKPOINT_LEN_5: - info->ctrl.len = ARM_BREAKPOINT_LEN_5; + hw->ctrl.len = ARM_BREAKPOINT_LEN_5; break; case HW_BREAKPOINT_LEN_6: - info->ctrl.len = ARM_BREAKPOINT_LEN_6; + hw->ctrl.len = ARM_BREAKPOINT_LEN_6; break; case HW_BREAKPOINT_LEN_7: - info->ctrl.len = ARM_BREAKPOINT_LEN_7; + hw->ctrl.len =
[PATCH 09/12] xtensa: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/xtensa/include/asm/hw_breakpoint.h | 6 +- arch/xtensa/kernel/hw_breakpoint.c | 33 - 2 files changed, 17 insertions(+), 22 deletions(-) diff --git a/arch/xtensa/include/asm/hw_breakpoint.h b/arch/xtensa/include/asm/hw_breakpoint.h index 2525bf6..58632d3 100644 --- a/arch/xtensa/include/asm/hw_breakpoint.h +++ b/arch/xtensa/include/asm/hw_breakpoint.h @@ -30,13 +30,17 @@ struct arch_hw_breakpoint { u16 type; }; +struct perf_event_attr; struct perf_event; struct pt_regs; struct task_struct; int hw_breakpoint_slots(int type); int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -int arch_validate_hwbkpt_settings(struct perf_event *bp); +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/xtensa/kernel/hw_breakpoint.c b/arch/xtensa/kernel/hw_breakpoint.c index 6e34c38..a0def44 100644 --- a/arch/xtensa/kernel/hw_breakpoint.c +++ b/arch/xtensa/kernel/hw_breakpoint.c @@ -47,50 +47,41 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw) /* * Construct an arch_hw_breakpoint from a perf_event. */ -static int arch_build_bp_info(struct perf_event *bp) +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_X: - info->type = XTENSA_BREAKPOINT_EXECUTE; + hw->type = XTENSA_BREAKPOINT_EXECUTE; break; case HW_BREAKPOINT_R: - info->type = XTENSA_BREAKPOINT_LOAD; + hw->type = XTENSA_BREAKPOINT_LOAD; break; case HW_BREAKPOINT_W: - info->type = XTENSA_BREAKPOINT_STORE; + hw->type = XTENSA_BREAKPOINT_STORE; break; case HW_BREAKPOINT_RW: - info->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE; + hw->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE; break; default: return -EINVAL; } /* Len */ - info->len = bp->attr.bp_len; - if (info->len < 1 || info->len > 64 || !is_power_of_2(info->len)) + hw->len = attr->bp_len; + if (hw->len < 1 || hw->len > 64 || !is_power_of_2(hw->len)) return -EINVAL; /* Address */ - info->address = bp->attr.bp_addr; - if (info->address & (info->len - 1)) + hw->address = attr->bp_addr; + if (hw->address & (hw->len - 1)) return -EINVAL; return 0; } -int arch_validate_hwbkpt_settings(struct perf_event *bp) -{ - int ret; - - /* Build the arch_hw_breakpoint. */ - ret = arch_build_bp_info(bp); - return ret; -} - int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data) { -- 2.7.4
[PATCH 06/12] arm64: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit. Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/arm64/include/asm/hw_breakpoint.h | 6 ++- arch/arm64/kernel/hw_breakpoint.c | 79 +- 2 files changed, 45 insertions(+), 40 deletions(-) diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 9f4a3d4..3b43319 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -119,13 +119,17 @@ static inline void decode_ctrl_reg(u32 reg, struct task_struct; struct notifier_block; +struct perf_event_attr; struct perf_event; struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, int *gen_len, int *gen_type, int *offset); extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 92c6973..968c3af 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -421,53 +421,53 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, /* * Construct an arch_hw_breakpoint from a perf_event. */ -static int arch_build_bp_info(struct perf_event *bp) +static int arch_build_bp_info(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_X: - info->ctrl.type = ARM_BREAKPOINT_EXECUTE; + hw->ctrl.type = ARM_BREAKPOINT_EXECUTE; break; case HW_BREAKPOINT_R: - info->ctrl.type = ARM_BREAKPOINT_LOAD; + hw->ctrl.type = ARM_BREAKPOINT_LOAD; break; case HW_BREAKPOINT_W: - info->ctrl.type = ARM_BREAKPOINT_STORE; + hw->ctrl.type = ARM_BREAKPOINT_STORE; break; case HW_BREAKPOINT_RW: - info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE; + hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE; break; default: return -EINVAL; } /* Len */ - switch (bp->attr.bp_len) { + switch (attr->bp_len) { case HW_BREAKPOINT_LEN_1: - info->ctrl.len = ARM_BREAKPOINT_LEN_1; + hw->ctrl.len = ARM_BREAKPOINT_LEN_1; break; case HW_BREAKPOINT_LEN_2: - info->ctrl.len = ARM_BREAKPOINT_LEN_2; + hw->ctrl.len = ARM_BREAKPOINT_LEN_2; break; case HW_BREAKPOINT_LEN_3: - info->ctrl.len = ARM_BREAKPOINT_LEN_3; + hw->ctrl.len = ARM_BREAKPOINT_LEN_3; break; case HW_BREAKPOINT_LEN_4: - info->ctrl.len = ARM_BREAKPOINT_LEN_4; + hw->ctrl.len = ARM_BREAKPOINT_LEN_4; break; case HW_BREAKPOINT_LEN_5: - info->ctrl.len = ARM_BREAKPOINT_LEN_5; + hw->ctrl.len = ARM_BREAKPOINT_LEN_5; break; case HW_BREAKPOINT_LEN_6: - info->ctrl.len = ARM_BREAKPOINT_LEN_6; + hw->ctrl.len = ARM_BREAKPOINT_LEN_6; break; case HW_BREAKPOINT_LEN_7: - info->ctrl.len = ARM_BREAKPOINT_LEN_7; + hw->ctrl.len = ARM_BREAKPOINT_LEN_7; break; case HW_BREAKPOINT_LEN_8: - info->ctrl.len = ARM_BREAKPOINT_LEN_8; + hw->ctrl.len = ARM_BREAKPOINT_LEN_8; break; default: return -EINVAL; @@ -478,37 +478,37 @@ static int arch_build_bp_info(struct perf_event *bp) * AArch32 also requires breakpoints of length 2 for Thumb. * Watchpoints can be of length 1, 2, 4 or 8 bytes.
[PATCH 09/12] xtensa: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/xtensa/include/asm/hw_breakpoint.h | 6 +- arch/xtensa/kernel/hw_breakpoint.c | 33 - 2 files changed, 17 insertions(+), 22 deletions(-) diff --git a/arch/xtensa/include/asm/hw_breakpoint.h b/arch/xtensa/include/asm/hw_breakpoint.h index 2525bf6..58632d3 100644 --- a/arch/xtensa/include/asm/hw_breakpoint.h +++ b/arch/xtensa/include/asm/hw_breakpoint.h @@ -30,13 +30,17 @@ struct arch_hw_breakpoint { u16 type; }; +struct perf_event_attr; struct perf_event; struct pt_regs; struct task_struct; int hw_breakpoint_slots(int type); int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -int arch_validate_hwbkpt_settings(struct perf_event *bp); +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/xtensa/kernel/hw_breakpoint.c b/arch/xtensa/kernel/hw_breakpoint.c index 6e34c38..a0def44 100644 --- a/arch/xtensa/kernel/hw_breakpoint.c +++ b/arch/xtensa/kernel/hw_breakpoint.c @@ -47,50 +47,41 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw) /* * Construct an arch_hw_breakpoint from a perf_event. */ -static int arch_build_bp_info(struct perf_event *bp) +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_X: - info->type = XTENSA_BREAKPOINT_EXECUTE; + hw->type = XTENSA_BREAKPOINT_EXECUTE; break; case HW_BREAKPOINT_R: - info->type = XTENSA_BREAKPOINT_LOAD; + hw->type = XTENSA_BREAKPOINT_LOAD; break; case HW_BREAKPOINT_W: - info->type = XTENSA_BREAKPOINT_STORE; + hw->type = XTENSA_BREAKPOINT_STORE; break; case HW_BREAKPOINT_RW: - info->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE; + hw->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE; break; default: return -EINVAL; } /* Len */ - info->len = bp->attr.bp_len; - if (info->len < 1 || info->len > 64 || !is_power_of_2(info->len)) + hw->len = attr->bp_len; + if (hw->len < 1 || hw->len > 64 || !is_power_of_2(hw->len)) return -EINVAL; /* Address */ - info->address = bp->attr.bp_addr; - if (info->address & (info->len - 1)) + hw->address = attr->bp_addr; + if (hw->address & (hw->len - 1)) return -EINVAL; return 0; } -int arch_validate_hwbkpt_settings(struct perf_event *bp) -{ - int ret; - - /* Build the arch_hw_breakpoint. */ - ret = arch_build_bp_info(bp); - return ret; -} - int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data) { -- 2.7.4
[PATCH 12/12] perf/breakpoint: Clean up and consolidate modify_user_hw_breakpoint_check()
Remove the dance around old and new attributes. Just don't modify the previous breakpoint at all until we have verified everything. Reported-by: Linus TorvaldsOriginal-patch-by: Andy Lutomirski Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- kernel/events/hw_breakpoint.c | 46 --- 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c index 7840746..1af761a 100644 --- a/kernel/events/hw_breakpoint.c +++ b/kernel/events/hw_breakpoint.c @@ -461,37 +461,43 @@ register_user_hw_breakpoint(struct perf_event_attr *attr, } EXPORT_SYMBOL_GPL(register_user_hw_breakpoint); +static void hw_breakpoint_copy_attr(struct perf_event_attr *to, + struct perf_event_attr *from) +{ + to->bp_addr = from->bp_addr; + to->bp_type = from->bp_type; + to->bp_len = from->bp_len; + to->disabled = from->disabled; +} + int modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *attr, bool check) { - u64 old_addr = bp->attr.bp_addr; - u64 old_len = bp->attr.bp_len; - int old_type = bp->attr.bp_type; - bool modify = attr->bp_type != old_type; struct arch_hw_breakpoint hw; - int err = 0; - - bp->attr.bp_addr = attr->bp_addr; - bp->attr.bp_type = attr->bp_type; - bp->attr.bp_len = attr->bp_len; - - if (check && memcmp(>attr, attr, sizeof(*attr))) - return -EINVAL; + int err; err = hw_breakpoint_parse(bp, attr, ); - if (!err && modify) - err = modify_bp_slot(bp, old_type, bp->attr.bp_type); - - if (err) { - bp->attr.bp_addr = old_addr; - bp->attr.bp_type = old_type; - bp->attr.bp_len = old_len; + if (err) return err; + + if (check) { + struct perf_event_attr old_attr; + + old_attr = bp->attr; + hw_breakpoint_copy_attr(_attr, attr); + if (memcmp(_attr, attr, sizeof(*attr))) + return -EINVAL; + } + + if (bp->attr.bp_type != attr->bp_type) { + err = modify_bp_slot(bp, bp->attr.bp_type, attr->bp_type); + if (err) + return err; } + hw_breakpoint_copy_attr(>attr, attr); bp->hw.info = hw; - bp->attr.disabled = attr->disabled; return 0; } -- 2.7.4
[PATCH 10/12] perf/breakpoint: Remove default hw_breakpoint_arch_parse()
All architectures have implemented it, we can now remove the poor weak version. Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/arm/include/asm/hw_breakpoint.h | 1 - arch/arm64/include/asm/hw_breakpoint.h | 1 - arch/powerpc/include/asm/hw_breakpoint.h | 1 - arch/sh/include/asm/hw_breakpoint.h | 1 - arch/x86/include/asm/hw_breakpoint.h | 1 - arch/xtensa/include/asm/hw_breakpoint.h | 1 - kernel/events/hw_breakpoint.c| 17 - 7 files changed, 23 deletions(-) diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h index 451544d..3533fbe 100644 --- a/arch/arm/include/asm/hw_breakpoint.h +++ b/arch/arm/include/asm/hw_breakpoint.h @@ -122,7 +122,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 3b43319..d1a9c73 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -129,7 +129,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 8f647e6..6fe 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -65,7 +65,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); int arch_install_hw_breakpoint(struct perf_event *bp); diff --git a/arch/sh/include/asm/hw_breakpoint.h b/arch/sh/include/asm/hw_breakpoint.h index ed421dc..4910990 100644 --- a/arch/sh/include/asm/hw_breakpoint.h +++ b/arch/sh/include/asm/hw_breakpoint.h @@ -58,7 +58,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h index 8cc161f..525f8c7 100644 --- a/arch/x86/include/asm/hw_breakpoint.h +++ b/arch/x86/include/asm/hw_breakpoint.h @@ -57,7 +57,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git
[PATCH 08/12] sh: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/sh/include/asm/hw_breakpoint.h | 6 +- arch/sh/kernel/hw_breakpoint.c | 37 +++-- 2 files changed, 24 insertions(+), 19 deletions(-) diff --git a/arch/sh/include/asm/hw_breakpoint.h b/arch/sh/include/asm/hw_breakpoint.h index dae622d..ed421dc 100644 --- a/arch/sh/include/asm/hw_breakpoint.h +++ b/arch/sh/include/asm/hw_breakpoint.h @@ -40,6 +40,7 @@ struct sh_ubc { struct clk *clk; /* optional interface clock / MSTP bit */ }; +struct perf_event_attr; struct perf_event; struct task_struct; struct pmu; @@ -54,7 +55,10 @@ static inline int hw_breakpoint_slots(int type) /* arch/sh/kernel/hw_breakpoint.c */ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c index 3594773..78a87b1 100644 --- a/arch/sh/kernel/hw_breakpoint.c +++ b/arch/sh/kernel/hw_breakpoint.c @@ -173,40 +173,40 @@ int arch_bp_generic_fields(int sh_len, int sh_type, return 0; } -static int arch_build_bp_info(struct perf_event *bp) +static int arch_build_bp_info(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - - info->address = bp->attr.bp_addr; + hw->address = attr->bp_addr; /* Len */ - switch (bp->attr.bp_len) { + switch (attr->bp_len) { case HW_BREAKPOINT_LEN_1: - info->len = SH_BREAKPOINT_LEN_1; + hw->len = SH_BREAKPOINT_LEN_1; break; case HW_BREAKPOINT_LEN_2: - info->len = SH_BREAKPOINT_LEN_2; + hw->len = SH_BREAKPOINT_LEN_2; break; case HW_BREAKPOINT_LEN_4: - info->len = SH_BREAKPOINT_LEN_4; + hw->len = SH_BREAKPOINT_LEN_4; break; case HW_BREAKPOINT_LEN_8: - info->len = SH_BREAKPOINT_LEN_8; + hw->len = SH_BREAKPOINT_LEN_8; break; default: return -EINVAL; } /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_R: - info->type = SH_BREAKPOINT_READ; + hw->type = SH_BREAKPOINT_READ; break; case HW_BREAKPOINT_W: - info->type = SH_BREAKPOINT_WRITE; + hw->type = SH_BREAKPOINT_WRITE; break; case HW_BREAKPOINT_W | HW_BREAKPOINT_R: - info->type = SH_BREAKPOINT_RW; + hw->type = SH_BREAKPOINT_RW; break; default: return -EINVAL; @@ -218,19 +218,20 @@ static int arch_build_bp_info(struct perf_event *bp) /* * Validate the arch-specific HW Breakpoint register settings */ -int arch_validate_hwbkpt_settings(struct perf_event *bp) +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); unsigned int align; int ret; - ret = arch_build_bp_info(bp); + ret = arch_build_bp_info(bp, attr, hw); if (ret) return ret; ret = -EINVAL; - switch (info->len) { + switch (hw->len) { case
[PATCH 12/12] perf/breakpoint: Clean up and consolidate modify_user_hw_breakpoint_check()
Remove the dance around old and new attributes. Just don't modify the previous breakpoint at all until we have verified everything. Reported-by: Linus Torvalds Original-patch-by: Andy Lutomirski Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- kernel/events/hw_breakpoint.c | 46 --- 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c index 7840746..1af761a 100644 --- a/kernel/events/hw_breakpoint.c +++ b/kernel/events/hw_breakpoint.c @@ -461,37 +461,43 @@ register_user_hw_breakpoint(struct perf_event_attr *attr, } EXPORT_SYMBOL_GPL(register_user_hw_breakpoint); +static void hw_breakpoint_copy_attr(struct perf_event_attr *to, + struct perf_event_attr *from) +{ + to->bp_addr = from->bp_addr; + to->bp_type = from->bp_type; + to->bp_len = from->bp_len; + to->disabled = from->disabled; +} + int modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *attr, bool check) { - u64 old_addr = bp->attr.bp_addr; - u64 old_len = bp->attr.bp_len; - int old_type = bp->attr.bp_type; - bool modify = attr->bp_type != old_type; struct arch_hw_breakpoint hw; - int err = 0; - - bp->attr.bp_addr = attr->bp_addr; - bp->attr.bp_type = attr->bp_type; - bp->attr.bp_len = attr->bp_len; - - if (check && memcmp(>attr, attr, sizeof(*attr))) - return -EINVAL; + int err; err = hw_breakpoint_parse(bp, attr, ); - if (!err && modify) - err = modify_bp_slot(bp, old_type, bp->attr.bp_type); - - if (err) { - bp->attr.bp_addr = old_addr; - bp->attr.bp_type = old_type; - bp->attr.bp_len = old_len; + if (err) return err; + + if (check) { + struct perf_event_attr old_attr; + + old_attr = bp->attr; + hw_breakpoint_copy_attr(_attr, attr); + if (memcmp(_attr, attr, sizeof(*attr))) + return -EINVAL; + } + + if (bp->attr.bp_type != attr->bp_type) { + err = modify_bp_slot(bp, bp->attr.bp_type, attr->bp_type); + if (err) + return err; } + hw_breakpoint_copy_attr(>attr, attr); bp->hw.info = hw; - bp->attr.disabled = attr->disabled; return 0; } -- 2.7.4
[PATCH 10/12] perf/breakpoint: Remove default hw_breakpoint_arch_parse()
All architectures have implemented it, we can now remove the poor weak version. Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/arm/include/asm/hw_breakpoint.h | 1 - arch/arm64/include/asm/hw_breakpoint.h | 1 - arch/powerpc/include/asm/hw_breakpoint.h | 1 - arch/sh/include/asm/hw_breakpoint.h | 1 - arch/x86/include/asm/hw_breakpoint.h | 1 - arch/xtensa/include/asm/hw_breakpoint.h | 1 - kernel/events/hw_breakpoint.c| 17 - 7 files changed, 23 deletions(-) diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h index 451544d..3533fbe 100644 --- a/arch/arm/include/asm/hw_breakpoint.h +++ b/arch/arm/include/asm/hw_breakpoint.h @@ -122,7 +122,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 3b43319..d1a9c73 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -129,7 +129,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 8f647e6..6fe 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -65,7 +65,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); int arch_install_hw_breakpoint(struct perf_event *bp); diff --git a/arch/sh/include/asm/hw_breakpoint.h b/arch/sh/include/asm/hw_breakpoint.h index ed421dc..4910990 100644 --- a/arch/sh/include/asm/hw_breakpoint.h +++ b/arch/sh/include/asm/hw_breakpoint.h @@ -58,7 +58,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h index 8cc161f..525f8c7 100644 --- a/arch/x86/include/asm/hw_breakpoint.h +++ b/arch/x86/include/asm/hw_breakpoint.h @@ -57,7 +57,6 @@ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/xtensa/include/asm/hw_breakpoint.h b/arch/xtensa/include/asm/hw_breakpoint.h index 58632d3..f3b6036 100644 --- a/arch/xtensa/include/asm/hw_breakpoint.h +++ b/arch/xtensa/include/asm/hw_breakpoint.h @@ -40,7 +40,6 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); int hw_breakpoint_arch_parse(struct perf_event *bp, struct perf_event_attr *attr, struct arch_hw_breakpoint *hw); -#define
[PATCH 08/12] sh: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/sh/include/asm/hw_breakpoint.h | 6 +- arch/sh/kernel/hw_breakpoint.c | 37 +++-- 2 files changed, 24 insertions(+), 19 deletions(-) diff --git a/arch/sh/include/asm/hw_breakpoint.h b/arch/sh/include/asm/hw_breakpoint.h index dae622d..ed421dc 100644 --- a/arch/sh/include/asm/hw_breakpoint.h +++ b/arch/sh/include/asm/hw_breakpoint.h @@ -40,6 +40,7 @@ struct sh_ubc { struct clk *clk; /* optional interface clock / MSTP bit */ }; +struct perf_event_attr; struct perf_event; struct task_struct; struct pmu; @@ -54,7 +55,10 @@ static inline int hw_breakpoint_slots(int type) /* arch/sh/kernel/hw_breakpoint.c */ extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c index 3594773..78a87b1 100644 --- a/arch/sh/kernel/hw_breakpoint.c +++ b/arch/sh/kernel/hw_breakpoint.c @@ -173,40 +173,40 @@ int arch_bp_generic_fields(int sh_len, int sh_type, return 0; } -static int arch_build_bp_info(struct perf_event *bp) +static int arch_build_bp_info(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - - info->address = bp->attr.bp_addr; + hw->address = attr->bp_addr; /* Len */ - switch (bp->attr.bp_len) { + switch (attr->bp_len) { case HW_BREAKPOINT_LEN_1: - info->len = SH_BREAKPOINT_LEN_1; + hw->len = SH_BREAKPOINT_LEN_1; break; case HW_BREAKPOINT_LEN_2: - info->len = SH_BREAKPOINT_LEN_2; + hw->len = SH_BREAKPOINT_LEN_2; break; case HW_BREAKPOINT_LEN_4: - info->len = SH_BREAKPOINT_LEN_4; + hw->len = SH_BREAKPOINT_LEN_4; break; case HW_BREAKPOINT_LEN_8: - info->len = SH_BREAKPOINT_LEN_8; + hw->len = SH_BREAKPOINT_LEN_8; break; default: return -EINVAL; } /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_R: - info->type = SH_BREAKPOINT_READ; + hw->type = SH_BREAKPOINT_READ; break; case HW_BREAKPOINT_W: - info->type = SH_BREAKPOINT_WRITE; + hw->type = SH_BREAKPOINT_WRITE; break; case HW_BREAKPOINT_W | HW_BREAKPOINT_R: - info->type = SH_BREAKPOINT_RW; + hw->type = SH_BREAKPOINT_RW; break; default: return -EINVAL; @@ -218,19 +218,20 @@ static int arch_build_bp_info(struct perf_event *bp) /* * Validate the arch-specific HW Breakpoint register settings */ -int arch_validate_hwbkpt_settings(struct perf_event *bp) +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); unsigned int align; int ret; - ret = arch_build_bp_info(bp); + ret = arch_build_bp_info(bp, attr, hw); if (ret) return ret; ret = -EINVAL; - switch (info->len) { + switch (hw->len) { case SH_BREAKPOINT_LEN_1: align = 0; break; @@ -251,7 +252,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) * Check that the low-order bits of the address are appropriate * for the alignment implied by len. */ - if (info->address & align) + if (hw->address & align) return -EINVAL; return 0; -- 2.7.4
[PATCH 11/12] perf/breakpoint: Pass new breakpoint type to modify_breakpoint_slot()
We soon won't be able to rely on bp->attr anymore to get the new type of the modifying breakpoint because the new attributes are going to be copied only once we successfully modified the breakpoint slot. This will fix the current misdesigned layout where the new attr are copied to the modifying breakpoint before we actually know if the modification will be validated. In order to prepare for that, allow modify_breakpoint_slot() to take the new breakpoint type. Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- kernel/events/hw_breakpoint.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c index d7ed438..7840746 100644 --- a/kernel/events/hw_breakpoint.c +++ b/kernel/events/hw_breakpoint.c @@ -345,13 +345,13 @@ void release_bp_slot(struct perf_event *bp) mutex_unlock(_bp_mutex); } -static int __modify_bp_slot(struct perf_event *bp, u64 old_type) +static int __modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type) { int err; __release_bp_slot(bp, old_type); - err = __reserve_bp_slot(bp, bp->attr.bp_type); + err = __reserve_bp_slot(bp, new_type); if (err) { /* * Reserve the old_type slot back in case @@ -367,12 +367,12 @@ static int __modify_bp_slot(struct perf_event *bp, u64 old_type) return err; } -static int modify_bp_slot(struct perf_event *bp, u64 old_type) +static int modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type) { int ret; mutex_lock(_bp_mutex); - ret = __modify_bp_slot(bp, old_type); + ret = __modify_bp_slot(bp, old_type, new_type); mutex_unlock(_bp_mutex); return ret; } @@ -481,7 +481,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *a err = hw_breakpoint_parse(bp, attr, ); if (!err && modify) - err = modify_bp_slot(bp, old_type); + err = modify_bp_slot(bp, old_type, bp->attr.bp_type); if (err) { bp->attr.bp_addr = old_addr; -- 2.7.4
[PATCH 11/12] perf/breakpoint: Pass new breakpoint type to modify_breakpoint_slot()
We soon won't be able to rely on bp->attr anymore to get the new type of the modifying breakpoint because the new attributes are going to be copied only once we successfully modified the breakpoint slot. This will fix the current misdesigned layout where the new attr are copied to the modifying breakpoint before we actually know if the modification will be validated. In order to prepare for that, allow modify_breakpoint_slot() to take the new breakpoint type. Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- kernel/events/hw_breakpoint.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c index d7ed438..7840746 100644 --- a/kernel/events/hw_breakpoint.c +++ b/kernel/events/hw_breakpoint.c @@ -345,13 +345,13 @@ void release_bp_slot(struct perf_event *bp) mutex_unlock(_bp_mutex); } -static int __modify_bp_slot(struct perf_event *bp, u64 old_type) +static int __modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type) { int err; __release_bp_slot(bp, old_type); - err = __reserve_bp_slot(bp, bp->attr.bp_type); + err = __reserve_bp_slot(bp, new_type); if (err) { /* * Reserve the old_type slot back in case @@ -367,12 +367,12 @@ static int __modify_bp_slot(struct perf_event *bp, u64 old_type) return err; } -static int modify_bp_slot(struct perf_event *bp, u64 old_type) +static int modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type) { int ret; mutex_lock(_bp_mutex); - ret = __modify_bp_slot(bp, old_type); + ret = __modify_bp_slot(bp, old_type, new_type); mutex_unlock(_bp_mutex); return ret; } @@ -481,7 +481,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *a err = hw_breakpoint_parse(bp, attr, ); if (!err && modify) - err = modify_bp_slot(bp, old_type); + err = modify_bp_slot(bp, old_type, bp->attr.bp_type); if (err) { bp->attr.bp_addr = old_addr; -- 2.7.4
[PATCH 03/12] x86: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit. Original-patch-by: Andy LutomirskiSigned-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/x86/include/asm/hw_breakpoint.h | 6 +++- arch/x86/kernel/hw_breakpoint.c | 60 ++-- 2 files changed, 35 insertions(+), 31 deletions(-) diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h index 7892459..8cc161f 100644 --- a/arch/x86/include/asm/hw_breakpoint.h +++ b/arch/x86/include/asm/hw_breakpoint.h @@ -49,11 +49,15 @@ static inline int hw_breakpoint_slots(int type) return HBP_NUM; } +struct perf_event_attr; struct perf_event; struct pmu; extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c index c433791..a8bceb0 100644 --- a/arch/x86/kernel/hw_breakpoint.c +++ b/arch/x86/kernel/hw_breakpoint.c @@ -239,19 +239,20 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw) return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX); } -static int arch_build_bp_info(struct perf_event *bp) +static int arch_build_bp_info(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - - info->address = bp->attr.bp_addr; + hw->address = attr->bp_addr; + hw->mask = 0; /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_W: - info->type = X86_BREAKPOINT_WRITE; + hw->type = X86_BREAKPOINT_WRITE; break; case HW_BREAKPOINT_W | HW_BREAKPOINT_R: - info->type = X86_BREAKPOINT_RW; + hw->type = X86_BREAKPOINT_RW; break; case HW_BREAKPOINT_X: /* @@ -259,23 +260,23 @@ static int arch_build_bp_info(struct perf_event *bp) * acceptable for kprobes. On non-kprobes kernels, we don't * allow kernel breakpoints at all. */ - if (bp->attr.bp_addr >= TASK_SIZE_MAX) { + if (attr->bp_addr >= TASK_SIZE_MAX) { #ifdef CONFIG_KPROBES - if (within_kprobe_blacklist(bp->attr.bp_addr)) + if (within_kprobe_blacklist(attr->bp_addr)) return -EINVAL; #else return -EINVAL; #endif } - info->type = X86_BREAKPOINT_EXECUTE; + hw->type = X86_BREAKPOINT_EXECUTE; /* * x86 inst breakpoints need to have a specific undefined len. * But we still need to check userspace is not trying to setup * an unsupported length, to get a range breakpoint for example. */ - if (bp->attr.bp_len == sizeof(long)) { - info->len = X86_BREAKPOINT_LEN_X; + if (attr->bp_len == sizeof(long)) { + hw->len = X86_BREAKPOINT_LEN_X; return 0; } default: @@ -283,28 +284,26 @@ static int arch_build_bp_info(struct perf_event *bp) } /* Len */ - info->mask = 0; - - switch (bp->attr.bp_len) { + switch (attr->bp_len) { case HW_BREAKPOINT_LEN_1: - info->len = X86_BREAKPOINT_LEN_1; + hw->len =
[PATCH 02/12] perf/breakpoint: Pass arch breakpoint struct to arch_check_bp_in_kernelspace()
We can't pass the breakpoint directly on arch_check_bp_in_kernelspace() anymore because its architecture internal datas (struct arch_hw_breakpoint) are not yet filled by the time we call the function, and most implementation need this backend to be up to date. So arrange the function to take the probing struct instead. Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/arm/include/asm/hw_breakpoint.h | 2 +- arch/arm/kernel/hw_breakpoint.c | 11 +++-- arch/arm64/include/asm/hw_breakpoint.h | 2 +- arch/arm64/kernel/hw_breakpoint.c| 9 ++-- arch/powerpc/include/asm/hw_breakpoint.h | 2 +- arch/powerpc/kernel/hw_breakpoint.c | 6 +-- arch/sh/include/asm/hw_breakpoint.h | 2 +- arch/sh/kernel/hw_breakpoint.c | 9 ++-- arch/x86/include/asm/hw_breakpoint.h | 2 +- arch/x86/kernel/hw_breakpoint.c | 71 +--- arch/xtensa/include/asm/hw_breakpoint.h | 2 +- arch/xtensa/kernel/hw_breakpoint.c | 7 ++-- kernel/events/hw_breakpoint.c| 2 +- 13 files changed, 63 insertions(+), 64 deletions(-) diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h index e46e4e7..d5a0f52 100644 --- a/arch/arm/include/asm/hw_breakpoint.h +++ b/arch/arm/include/asm/hw_breakpoint.h @@ -117,7 +117,7 @@ struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, int *gen_len, int *gen_type); -extern int arch_check_bp_in_kernelspace(struct perf_event *bp); +extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int arch_validate_hwbkpt_settings(struct perf_event *bp); extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c index 629e251..385dcf4 100644 --- a/arch/arm/kernel/hw_breakpoint.c +++ b/arch/arm/kernel/hw_breakpoint.c @@ -456,14 +456,13 @@ static int get_hbp_len(u8 hbp_len) /* * Check whether bp virtual address is in kernel space. */ -int arch_check_bp_in_kernelspace(struct perf_event *bp) +int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw) { unsigned int len; unsigned long va; - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - va = info->address; - len = get_hbp_len(info->ctrl.len); + va = hw->address; + len = get_hbp_len(hw->ctrl.len); return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE); } @@ -576,7 +575,7 @@ static int arch_build_bp_info(struct perf_event *bp) /* Privilege */ info->ctrl.privilege = ARM_BREAKPOINT_USER; - if (arch_check_bp_in_kernelspace(bp)) + if (arch_check_bp_in_kernelspace(info)) info->ctrl.privilege |= ARM_BREAKPOINT_PRIV; /* Enabled? */ @@ -640,7 +639,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) return -EINVAL; /* We don't allow mismatch breakpoints in kernel space. */ - if (arch_check_bp_in_kernelspace(bp)) + if (arch_check_bp_in_kernelspace(info)) return -EPERM; /* diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 4177076..9f4a3d4 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -124,7 +124,7 @@ struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, int *gen_len, int *gen_type, int *offset); -extern int arch_check_bp_in_kernelspace(struct perf_event *bp); +extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int arch_validate_hwbkpt_settings(struct perf_event *bp); extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index
[PATCH 03/12] x86: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit. Original-patch-by: Andy Lutomirski Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/x86/include/asm/hw_breakpoint.h | 6 +++- arch/x86/kernel/hw_breakpoint.c | 60 ++-- 2 files changed, 35 insertions(+), 31 deletions(-) diff --git a/arch/x86/include/asm/hw_breakpoint.h b/arch/x86/include/asm/hw_breakpoint.h index 7892459..8cc161f 100644 --- a/arch/x86/include/asm/hw_breakpoint.h +++ b/arch/x86/include/asm/hw_breakpoint.h @@ -49,11 +49,15 @@ static inline int hw_breakpoint_slots(int type) return HBP_NUM; } +struct perf_event_attr; struct perf_event; struct pmu; extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c index c433791..a8bceb0 100644 --- a/arch/x86/kernel/hw_breakpoint.c +++ b/arch/x86/kernel/hw_breakpoint.c @@ -239,19 +239,20 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw) return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX); } -static int arch_build_bp_info(struct perf_event *bp) +static int arch_build_bp_info(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw) { - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - - info->address = bp->attr.bp_addr; + hw->address = attr->bp_addr; + hw->mask = 0; /* Type */ - switch (bp->attr.bp_type) { + switch (attr->bp_type) { case HW_BREAKPOINT_W: - info->type = X86_BREAKPOINT_WRITE; + hw->type = X86_BREAKPOINT_WRITE; break; case HW_BREAKPOINT_W | HW_BREAKPOINT_R: - info->type = X86_BREAKPOINT_RW; + hw->type = X86_BREAKPOINT_RW; break; case HW_BREAKPOINT_X: /* @@ -259,23 +260,23 @@ static int arch_build_bp_info(struct perf_event *bp) * acceptable for kprobes. On non-kprobes kernels, we don't * allow kernel breakpoints at all. */ - if (bp->attr.bp_addr >= TASK_SIZE_MAX) { + if (attr->bp_addr >= TASK_SIZE_MAX) { #ifdef CONFIG_KPROBES - if (within_kprobe_blacklist(bp->attr.bp_addr)) + if (within_kprobe_blacklist(attr->bp_addr)) return -EINVAL; #else return -EINVAL; #endif } - info->type = X86_BREAKPOINT_EXECUTE; + hw->type = X86_BREAKPOINT_EXECUTE; /* * x86 inst breakpoints need to have a specific undefined len. * But we still need to check userspace is not trying to setup * an unsupported length, to get a range breakpoint for example. */ - if (bp->attr.bp_len == sizeof(long)) { - info->len = X86_BREAKPOINT_LEN_X; + if (attr->bp_len == sizeof(long)) { + hw->len = X86_BREAKPOINT_LEN_X; return 0; } default: @@ -283,28 +284,26 @@ static int arch_build_bp_info(struct perf_event *bp) } /* Len */ - info->mask = 0; - - switch (bp->attr.bp_len) { + switch (attr->bp_len) { case HW_BREAKPOINT_LEN_1: - info->len = X86_BREAKPOINT_LEN_1; + hw->len = X86_BREAKPOINT_LEN_1; break; case HW_BREAKPOINT_LEN_2: - info->len = X86_BREAKPOINT_LEN_2; + hw->len = X86_BREAKPOINT_LEN_2; break; case HW_BREAKPOINT_LEN_4: - info->len = X86_BREAKPOINT_LEN_4; + hw->len = X86_BREAKPOINT_LEN_4; break; #ifdef CONFIG_X86_64 case HW_BREAKPOINT_LEN_8: - info->len = X86_BREAKPOINT_LEN_8; + hw->len =
[PATCH 02/12] perf/breakpoint: Pass arch breakpoint struct to arch_check_bp_in_kernelspace()
We can't pass the breakpoint directly on arch_check_bp_in_kernelspace() anymore because its architecture internal datas (struct arch_hw_breakpoint) are not yet filled by the time we call the function, and most implementation need this backend to be up to date. So arrange the function to take the probing struct instead. Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/arm/include/asm/hw_breakpoint.h | 2 +- arch/arm/kernel/hw_breakpoint.c | 11 +++-- arch/arm64/include/asm/hw_breakpoint.h | 2 +- arch/arm64/kernel/hw_breakpoint.c| 9 ++-- arch/powerpc/include/asm/hw_breakpoint.h | 2 +- arch/powerpc/kernel/hw_breakpoint.c | 6 +-- arch/sh/include/asm/hw_breakpoint.h | 2 +- arch/sh/kernel/hw_breakpoint.c | 9 ++-- arch/x86/include/asm/hw_breakpoint.h | 2 +- arch/x86/kernel/hw_breakpoint.c | 71 +--- arch/xtensa/include/asm/hw_breakpoint.h | 2 +- arch/xtensa/kernel/hw_breakpoint.c | 7 ++-- kernel/events/hw_breakpoint.c| 2 +- 13 files changed, 63 insertions(+), 64 deletions(-) diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h index e46e4e7..d5a0f52 100644 --- a/arch/arm/include/asm/hw_breakpoint.h +++ b/arch/arm/include/asm/hw_breakpoint.h @@ -117,7 +117,7 @@ struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, int *gen_len, int *gen_type); -extern int arch_check_bp_in_kernelspace(struct perf_event *bp); +extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int arch_validate_hwbkpt_settings(struct perf_event *bp); extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c index 629e251..385dcf4 100644 --- a/arch/arm/kernel/hw_breakpoint.c +++ b/arch/arm/kernel/hw_breakpoint.c @@ -456,14 +456,13 @@ static int get_hbp_len(u8 hbp_len) /* * Check whether bp virtual address is in kernel space. */ -int arch_check_bp_in_kernelspace(struct perf_event *bp) +int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw) { unsigned int len; unsigned long va; - struct arch_hw_breakpoint *info = counter_arch_bp(bp); - va = info->address; - len = get_hbp_len(info->ctrl.len); + va = hw->address; + len = get_hbp_len(hw->ctrl.len); return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE); } @@ -576,7 +575,7 @@ static int arch_build_bp_info(struct perf_event *bp) /* Privilege */ info->ctrl.privilege = ARM_BREAKPOINT_USER; - if (arch_check_bp_in_kernelspace(bp)) + if (arch_check_bp_in_kernelspace(info)) info->ctrl.privilege |= ARM_BREAKPOINT_PRIV; /* Enabled? */ @@ -640,7 +639,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) return -EINVAL; /* We don't allow mismatch breakpoints in kernel space. */ - if (arch_check_bp_in_kernelspace(bp)) + if (arch_check_bp_in_kernelspace(info)) return -EPERM; /* diff --git a/arch/arm64/include/asm/hw_breakpoint.h b/arch/arm64/include/asm/hw_breakpoint.h index 4177076..9f4a3d4 100644 --- a/arch/arm64/include/asm/hw_breakpoint.h +++ b/arch/arm64/include/asm/hw_breakpoint.h @@ -124,7 +124,7 @@ struct pmu; extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl, int *gen_len, int *gen_type, int *offset); -extern int arch_check_bp_in_kernelspace(struct perf_event *bp); +extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); extern int arch_validate_hwbkpt_settings(struct perf_event *bp); extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c index 74bb56f..92c6973 100644 --- a/arch/arm64/kernel/hw_breakpoint.c +++ b/arch/arm64/kernel/hw_breakpoint.c @@ -344,14 +344,13 @@ static int get_hbp_len(u8 hbp_len) /* * Check whether bp virtual address is in kernel space. */ -int arch_check_bp_in_kernelspace(struct perf_event *bp) +int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw) { unsigned int len; unsigned long va; - struct arch_hw_breakpoint *info =
[PATCH 04/12] powerpc: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/powerpc/include/asm/hw_breakpoint.h | 6 - arch/powerpc/kernel/hw_breakpoint.c | 41 2 files changed, 26 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 4d0b1bf..8f647e6 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -52,6 +52,7 @@ struct arch_hw_breakpoint { #include #include +struct perf_event_attr; struct perf_event; struct pmu; struct perf_sample_data; @@ -61,7 +62,10 @@ struct perf_sample_data; extern int hw_breakpoint_slots(int type); extern int arch_bp_generic_fields(int type, int *gen_bp_type); extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); int arch_install_hw_breakpoint(struct perf_event *bp); diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 348cac9..fba6527 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -139,30 +139,31 @@ int arch_bp_generic_fields(int type, int *gen_bp_type) /* * Validate the arch-specific HW Breakpoint register settings */ -int arch_validate_hwbkpt_settings(struct perf_event *bp) +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { int ret = -EINVAL, length_max; - struct arch_hw_breakpoint *info = counter_arch_bp(bp); if (!bp) return ret; - info->type = HW_BRK_TYPE_TRANSLATE; - if (bp->attr.bp_type & HW_BREAKPOINT_R) - info->type |= HW_BRK_TYPE_READ; - if (bp->attr.bp_type & HW_BREAKPOINT_W) - info->type |= HW_BRK_TYPE_WRITE; - if (info->type == HW_BRK_TYPE_TRANSLATE) + hw->type = HW_BRK_TYPE_TRANSLATE; + if (attr->bp_type & HW_BREAKPOINT_R) + hw->type |= HW_BRK_TYPE_READ; + if (attr->bp_type & HW_BREAKPOINT_W) + hw->type |= HW_BRK_TYPE_WRITE; + if (hw->type == HW_BRK_TYPE_TRANSLATE) /* must set alteast read or write */ return ret; - if (!(bp->attr.exclude_user)) - info->type |= HW_BRK_TYPE_USER; - if (!(bp->attr.exclude_kernel)) - info->type |= HW_BRK_TYPE_KERNEL; - if (!(bp->attr.exclude_hv)) - info->type |= HW_BRK_TYPE_HYP; - info->address = bp->attr.bp_addr; - info->len = bp->attr.bp_len; + if (!attr->exclude_user) + hw->type |= HW_BRK_TYPE_USER; + if (!attr->exclude_kernel) + hw->type |= HW_BRK_TYPE_KERNEL; + if (!attr->exclude_hv) + hw->type |= HW_BRK_TYPE_HYP; + hw->address = attr->bp_addr; + hw->len = attr->bp_len; /* * Since breakpoint length can be a maximum of HW_BREAKPOINT_LEN(8) @@ -176,12 +177,12 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) if (cpu_has_feature(CPU_FTR_DAWR)) { length_max = 512 ; /* 64 doublewords */ /* DAWR region can't cross 512 boundary */ - if ((bp->attr.bp_addr >> 10) != - ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 10)) + if ((attr->bp_addr >> 10) != + ((attr->bp_addr + attr->bp_len - 1) >> 10)) return -EINVAL; } - if
[PATCH 07/12] sh: Remove "struct arch_hw_breakpoint::name" unused field
This field seem to be unused, perhaps a leftover from old code... Signed-off-by: Frederic WeisbeckerCc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/sh/include/asm/hw_breakpoint.h | 1 - arch/sh/kernel/hw_breakpoint.c | 7 --- 2 files changed, 8 deletions(-) diff --git a/arch/sh/include/asm/hw_breakpoint.h b/arch/sh/include/asm/hw_breakpoint.h index 8a88ed0..dae622d 100644 --- a/arch/sh/include/asm/hw_breakpoint.h +++ b/arch/sh/include/asm/hw_breakpoint.h @@ -10,7 +10,6 @@ #include struct arch_hw_breakpoint { - char*name; /* Contains name of the symbol to set bkpt */ unsigned long address; u16 len; u16 type; diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c index 713699b..3594773 100644 --- a/arch/sh/kernel/hw_breakpoint.c +++ b/arch/sh/kernel/hw_breakpoint.c @@ -248,13 +248,6 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) } /* -* For kernel-addresses, either the address or symbol name can be -* specified. -*/ - if (info->name) - info->address = (unsigned long)kallsyms_lookup_name(info->name); - - /* * Check that the low-order bits of the address are appropriate * for the alignment implied by len. */ -- 2.7.4
[PATCH 04/12] powerpc: Implement hw_breakpoint_arch_parse()
Migrate to the new API in order to remove arch_validate_hwbkpt_settings() that clumsily mixes up architecture validation and commit Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/powerpc/include/asm/hw_breakpoint.h | 6 - arch/powerpc/kernel/hw_breakpoint.c | 41 2 files changed, 26 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/hw_breakpoint.h b/arch/powerpc/include/asm/hw_breakpoint.h index 4d0b1bf..8f647e6 100644 --- a/arch/powerpc/include/asm/hw_breakpoint.h +++ b/arch/powerpc/include/asm/hw_breakpoint.h @@ -52,6 +52,7 @@ struct arch_hw_breakpoint { #include #include +struct perf_event_attr; struct perf_event; struct pmu; struct perf_sample_data; @@ -61,7 +62,10 @@ struct perf_sample_data; extern int hw_breakpoint_slots(int type); extern int arch_bp_generic_fields(int type, int *gen_bp_type); extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw); -extern int arch_validate_hwbkpt_settings(struct perf_event *bp); +extern int hw_breakpoint_arch_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw); +#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused, unsigned long val, void *data); int arch_install_hw_breakpoint(struct perf_event *bp); diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c index 348cac9..fba6527 100644 --- a/arch/powerpc/kernel/hw_breakpoint.c +++ b/arch/powerpc/kernel/hw_breakpoint.c @@ -139,30 +139,31 @@ int arch_bp_generic_fields(int type, int *gen_bp_type) /* * Validate the arch-specific HW Breakpoint register settings */ -int arch_validate_hwbkpt_settings(struct perf_event *bp) +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { int ret = -EINVAL, length_max; - struct arch_hw_breakpoint *info = counter_arch_bp(bp); if (!bp) return ret; - info->type = HW_BRK_TYPE_TRANSLATE; - if (bp->attr.bp_type & HW_BREAKPOINT_R) - info->type |= HW_BRK_TYPE_READ; - if (bp->attr.bp_type & HW_BREAKPOINT_W) - info->type |= HW_BRK_TYPE_WRITE; - if (info->type == HW_BRK_TYPE_TRANSLATE) + hw->type = HW_BRK_TYPE_TRANSLATE; + if (attr->bp_type & HW_BREAKPOINT_R) + hw->type |= HW_BRK_TYPE_READ; + if (attr->bp_type & HW_BREAKPOINT_W) + hw->type |= HW_BRK_TYPE_WRITE; + if (hw->type == HW_BRK_TYPE_TRANSLATE) /* must set alteast read or write */ return ret; - if (!(bp->attr.exclude_user)) - info->type |= HW_BRK_TYPE_USER; - if (!(bp->attr.exclude_kernel)) - info->type |= HW_BRK_TYPE_KERNEL; - if (!(bp->attr.exclude_hv)) - info->type |= HW_BRK_TYPE_HYP; - info->address = bp->attr.bp_addr; - info->len = bp->attr.bp_len; + if (!attr->exclude_user) + hw->type |= HW_BRK_TYPE_USER; + if (!attr->exclude_kernel) + hw->type |= HW_BRK_TYPE_KERNEL; + if (!attr->exclude_hv) + hw->type |= HW_BRK_TYPE_HYP; + hw->address = attr->bp_addr; + hw->len = attr->bp_len; /* * Since breakpoint length can be a maximum of HW_BREAKPOINT_LEN(8) @@ -176,12 +177,12 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) if (cpu_has_feature(CPU_FTR_DAWR)) { length_max = 512 ; /* 64 doublewords */ /* DAWR region can't cross 512 boundary */ - if ((bp->attr.bp_addr >> 10) != - ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 10)) + if ((attr->bp_addr >> 10) != + ((attr->bp_addr + attr->bp_len - 1) >> 10)) return -EINVAL; } - if (info->len > - (length_max - (info->address & HW_BREAKPOINT_ALIGN))) + if (hw->len > + (length_max - (hw->address & HW_BREAKPOINT_ALIGN))) return -EINVAL; return 0; } -- 2.7.4
[PATCH 07/12] sh: Remove "struct arch_hw_breakpoint::name" unused field
This field seem to be unused, perhaps a leftover from old code... Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- arch/sh/include/asm/hw_breakpoint.h | 1 - arch/sh/kernel/hw_breakpoint.c | 7 --- 2 files changed, 8 deletions(-) diff --git a/arch/sh/include/asm/hw_breakpoint.h b/arch/sh/include/asm/hw_breakpoint.h index 8a88ed0..dae622d 100644 --- a/arch/sh/include/asm/hw_breakpoint.h +++ b/arch/sh/include/asm/hw_breakpoint.h @@ -10,7 +10,6 @@ #include struct arch_hw_breakpoint { - char*name; /* Contains name of the symbol to set bkpt */ unsigned long address; u16 len; u16 type; diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c index 713699b..3594773 100644 --- a/arch/sh/kernel/hw_breakpoint.c +++ b/arch/sh/kernel/hw_breakpoint.c @@ -248,13 +248,6 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp) } /* -* For kernel-addresses, either the address or symbol name can be -* specified. -*/ - if (info->name) - info->address = (unsigned long)kallsyms_lookup_name(info->name); - - /* * Check that the low-order bits of the address are appropriate * for the alignment implied by len. */ -- 2.7.4
[PATCH 01/12] perf/breakpoint: Split attribute parse and commit
arch_validate_hwbkpt_settings() mixes up attribute check and commit into a single code entity. Therefore the validation may return an error due to incorrect atributes while still leaving halfway modified architecture breakpoint data. This is harmless when we deal with a new breakpoint but it becomes a problem when we modify an existing breakpoint. Split attribute parse and commit to fix that. The architecture is passed a "struct arch_hw_breakpoint" to fill on top of the new attr and the core takes care about copying the backend data once it's fully validated. The architectures then need to implement the new API. Reported-by: Linus TorvaldsOriginal-patch-by: Andy Lutomirski Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- kernel/events/hw_breakpoint.c | 57 +++ 1 file changed, 41 insertions(+), 16 deletions(-) diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c index 6e28d28..51320c2 100644 --- a/kernel/events/hw_breakpoint.c +++ b/kernel/events/hw_breakpoint.c @@ -400,16 +400,35 @@ int dbg_release_bp_slot(struct perf_event *bp) return 0; } -static int validate_hw_breakpoint(struct perf_event *bp) +#ifndef hw_breakpoint_arch_parse +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { - int ret; + int err; - ret = arch_validate_hwbkpt_settings(bp); - if (ret) - return ret; + err = arch_validate_hwbkpt_settings(bp); + if (err) + return err; + + *hw = bp->hw.info; + + return 0; +} +#endif + +static int hw_breakpoint_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw) +{ + int err; + + err = hw_breakpoint_arch_parse(bp, attr, hw); + if (err) + return err; if (arch_check_bp_in_kernelspace(bp)) { - if (bp->attr.exclude_kernel) + if (attr->exclude_kernel) return -EINVAL; /* * Don't let unprivileged users set a breakpoint in the trap @@ -424,19 +443,22 @@ static int validate_hw_breakpoint(struct perf_event *bp) int register_perf_hw_breakpoint(struct perf_event *bp) { - int ret; + struct arch_hw_breakpoint hw; + int err; - ret = reserve_bp_slot(bp); - if (ret) - return ret; + err = reserve_bp_slot(bp); + if (err) + return err; - ret = validate_hw_breakpoint(bp); - - /* if arch_validate_hwbkpt_settings() fails then release bp slot */ - if (ret) + err = hw_breakpoint_parse(bp, >attr, ); + if (err) { release_bp_slot(bp); + return err; + } - return ret; + bp->hw.info = hw; + + return 0; } /** @@ -464,6 +486,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *a u64 old_len = bp->attr.bp_len; int old_type = bp->attr.bp_type; bool modify = attr->bp_type != old_type; + struct arch_hw_breakpoint hw; int err = 0; bp->attr.bp_addr = attr->bp_addr; @@ -473,7 +496,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *a if (check && memcmp(>attr, attr, sizeof(*attr))) return -EINVAL; - err = validate_hw_breakpoint(bp); + err = hw_breakpoint_parse(bp, attr, ); if (!err && modify) err = modify_bp_slot(bp, old_type); @@ -484,7 +507,9 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *a return err; } + bp->hw.info = hw; bp->attr.disabled = attr->disabled; + return 0; } -- 2.7.4
[PATCH 01/12] perf/breakpoint: Split attribute parse and commit
arch_validate_hwbkpt_settings() mixes up attribute check and commit into a single code entity. Therefore the validation may return an error due to incorrect atributes while still leaving halfway modified architecture breakpoint data. This is harmless when we deal with a new breakpoint but it becomes a problem when we modify an existing breakpoint. Split attribute parse and commit to fix that. The architecture is passed a "struct arch_hw_breakpoint" to fill on top of the new attr and the core takes care about copying the backend data once it's fully validated. The architectures then need to implement the new API. Reported-by: Linus Torvalds Original-patch-by: Andy Lutomirski Signed-off-by: Frederic Weisbecker Cc: Linus Torvalds Cc: Andy Lutomirski Cc: Yoshinori Sato Cc: Rich Felker Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Will Deacon Cc: Mark Rutland Cc: Max Filippov Cc: Chris Zankel Cc: Catalin Marinas Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Jiri Olsa Cc: Namhyung Kim Cc: Joel Fernandes --- kernel/events/hw_breakpoint.c | 57 +++ 1 file changed, 41 insertions(+), 16 deletions(-) diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c index 6e28d28..51320c2 100644 --- a/kernel/events/hw_breakpoint.c +++ b/kernel/events/hw_breakpoint.c @@ -400,16 +400,35 @@ int dbg_release_bp_slot(struct perf_event *bp) return 0; } -static int validate_hw_breakpoint(struct perf_event *bp) +#ifndef hw_breakpoint_arch_parse +int hw_breakpoint_arch_parse(struct perf_event *bp, +struct perf_event_attr *attr, +struct arch_hw_breakpoint *hw) { - int ret; + int err; - ret = arch_validate_hwbkpt_settings(bp); - if (ret) - return ret; + err = arch_validate_hwbkpt_settings(bp); + if (err) + return err; + + *hw = bp->hw.info; + + return 0; +} +#endif + +static int hw_breakpoint_parse(struct perf_event *bp, + struct perf_event_attr *attr, + struct arch_hw_breakpoint *hw) +{ + int err; + + err = hw_breakpoint_arch_parse(bp, attr, hw); + if (err) + return err; if (arch_check_bp_in_kernelspace(bp)) { - if (bp->attr.exclude_kernel) + if (attr->exclude_kernel) return -EINVAL; /* * Don't let unprivileged users set a breakpoint in the trap @@ -424,19 +443,22 @@ static int validate_hw_breakpoint(struct perf_event *bp) int register_perf_hw_breakpoint(struct perf_event *bp) { - int ret; + struct arch_hw_breakpoint hw; + int err; - ret = reserve_bp_slot(bp); - if (ret) - return ret; + err = reserve_bp_slot(bp); + if (err) + return err; - ret = validate_hw_breakpoint(bp); - - /* if arch_validate_hwbkpt_settings() fails then release bp slot */ - if (ret) + err = hw_breakpoint_parse(bp, >attr, ); + if (err) { release_bp_slot(bp); + return err; + } - return ret; + bp->hw.info = hw; + + return 0; } /** @@ -464,6 +486,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *a u64 old_len = bp->attr.bp_len; int old_type = bp->attr.bp_type; bool modify = attr->bp_type != old_type; + struct arch_hw_breakpoint hw; int err = 0; bp->attr.bp_addr = attr->bp_addr; @@ -473,7 +496,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *a if (check && memcmp(>attr, attr, sizeof(*attr))) return -EINVAL; - err = validate_hw_breakpoint(bp); + err = hw_breakpoint_parse(bp, attr, ); if (!err && modify) err = modify_bp_slot(bp, old_type); @@ -484,7 +507,9 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr *a return err; } + bp->hw.info = hw; bp->attr.disabled = attr->disabled; + return 0; } -- 2.7.4
[PATCH 00/12] breakpoint: Rework arch validation v2
Changes since v1: - Avoid arch code duplication between check and commit. Use a temporary struct arch_hw_breakpoint to fill up arch data on the fly (Mark Rutland) - Start with a default weak version of hw_breakpoint_arch_parse() that architectures override one by one (Peter Zijlstra, Andy Lutomirski) git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git perf/breakpoint-v2 HEAD: a350174ed8f7240b91ae46933bfe2274d3d03806 --- Summary --- When we modify a hardware breakpoint, the architecture code fills up the architecture data as the validation of generic attributes progresses. If something goes wrong in the middle, the architecture data changes aren't rolled back and we are left with a halfway fiddled breakpoint. This set fixes the various misdesigns that back this bad behaviour. Thanks, Frederic --- Frederic Weisbecker (12): perf/breakpoint: Split attribute parse and commit perf/breakpoint: Pass arch breakpoint struct to arch_check_bp_in_kernelspace() x86: Implement hw_breakpoint_arch_parse() powerpc: Implement hw_breakpoint_arch_parse() arm: Implement hw_breakpoint_arch_parse() arm64: Implement hw_breakpoint_arch_parse() sh: Remove "struct arch_hw_breakpoint::name" unused field sh: Implement hw_breakpoint_arch_parse() xtensa: Implement hw_breakpoint_arch_parse() perf/breakpoint: Remove default hw_breakpoint_arch_parse() perf/breakpoint: Pass new breakpoint type to modify_breakpoint_slot() perf/breakpoint: Clean up and consolidate modify_user_hw_breakpoint_check() arch/arm/include/asm/hw_breakpoint.h | 7 +- arch/arm/kernel/hw_breakpoint.c | 78 +- arch/arm64/include/asm/hw_breakpoint.h | 7 +- arch/arm64/kernel/hw_breakpoint.c| 86 ++-- arch/powerpc/include/asm/hw_breakpoint.h | 7 +- arch/powerpc/kernel/hw_breakpoint.c | 47 ++- arch/sh/include/asm/hw_breakpoint.h | 8 +- arch/sh/kernel/hw_breakpoint.c | 53 ++--- arch/x86/include/asm/hw_breakpoint.h | 7 +- arch/x86/kernel/hw_breakpoint.c | 131 --- arch/xtensa/include/asm/hw_breakpoint.h | 7 +- arch/xtensa/kernel/hw_breakpoint.c | 40 -- kernel/events/hw_breakpoint.c| 92 +- 13 files changed, 294 insertions(+), 276 deletions(-)
[PATCH 00/12] breakpoint: Rework arch validation v2
Changes since v1: - Avoid arch code duplication between check and commit. Use a temporary struct arch_hw_breakpoint to fill up arch data on the fly (Mark Rutland) - Start with a default weak version of hw_breakpoint_arch_parse() that architectures override one by one (Peter Zijlstra, Andy Lutomirski) git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git perf/breakpoint-v2 HEAD: a350174ed8f7240b91ae46933bfe2274d3d03806 --- Summary --- When we modify a hardware breakpoint, the architecture code fills up the architecture data as the validation of generic attributes progresses. If something goes wrong in the middle, the architecture data changes aren't rolled back and we are left with a halfway fiddled breakpoint. This set fixes the various misdesigns that back this bad behaviour. Thanks, Frederic --- Frederic Weisbecker (12): perf/breakpoint: Split attribute parse and commit perf/breakpoint: Pass arch breakpoint struct to arch_check_bp_in_kernelspace() x86: Implement hw_breakpoint_arch_parse() powerpc: Implement hw_breakpoint_arch_parse() arm: Implement hw_breakpoint_arch_parse() arm64: Implement hw_breakpoint_arch_parse() sh: Remove "struct arch_hw_breakpoint::name" unused field sh: Implement hw_breakpoint_arch_parse() xtensa: Implement hw_breakpoint_arch_parse() perf/breakpoint: Remove default hw_breakpoint_arch_parse() perf/breakpoint: Pass new breakpoint type to modify_breakpoint_slot() perf/breakpoint: Clean up and consolidate modify_user_hw_breakpoint_check() arch/arm/include/asm/hw_breakpoint.h | 7 +- arch/arm/kernel/hw_breakpoint.c | 78 +- arch/arm64/include/asm/hw_breakpoint.h | 7 +- arch/arm64/kernel/hw_breakpoint.c| 86 ++-- arch/powerpc/include/asm/hw_breakpoint.h | 7 +- arch/powerpc/kernel/hw_breakpoint.c | 47 ++- arch/sh/include/asm/hw_breakpoint.h | 8 +- arch/sh/kernel/hw_breakpoint.c | 53 ++--- arch/x86/include/asm/hw_breakpoint.h | 7 +- arch/x86/kernel/hw_breakpoint.c | 131 --- arch/xtensa/include/asm/hw_breakpoint.h | 7 +- arch/xtensa/kernel/hw_breakpoint.c | 40 -- kernel/events/hw_breakpoint.c| 92 +- 13 files changed, 294 insertions(+), 276 deletions(-)
Re: [PATCH 8/9] perf/breakpoint: Split breakpoint "check" and "commit"
On Tue, May 15, 2018 at 09:58:03PM -0700, Andy Lutomirski wrote: > > > > On May 15, 2018, at 8:11 PM, Frederic Weisbecker> > wrote: > > > >> On Wed, May 09, 2018 at 11:17:03AM +0200, Peter Zijlstra wrote: > >>> On Sun, May 06, 2018 at 09:19:54PM +0200, Frederic Weisbecker wrote: > >>> arch/arm/include/asm/hw_breakpoint.h | 5 - > >>> arch/arm/kernel/hw_breakpoint.c | 22 +++--- > >>> arch/arm64/include/asm/hw_breakpoint.h | 5 - > >>> arch/arm64/kernel/hw_breakpoint.c| 22 +++--- > >>> arch/powerpc/include/asm/hw_breakpoint.h | 5 - > >>> arch/powerpc/kernel/hw_breakpoint.c | 22 +++--- > >>> arch/sh/include/asm/hw_breakpoint.h | 5 - > >>> arch/sh/kernel/hw_breakpoint.c | 22 +++--- > >>> arch/x86/include/asm/hw_breakpoint.h | 5 - > >>> arch/x86/kernel/hw_breakpoint.c | 23 +++ > >>> arch/xtensa/include/asm/hw_breakpoint.h | 5 - > >>> arch/xtensa/kernel/hw_breakpoint.c | 22 +++--- > >> > >> Because of those ^, > >> > >>> kernel/events/hw_breakpoint.c| 11 ++- > >> > >> would it not make sense to have a prelimenary patch doing something > >> like: > >> > >> __weak int hw_breakpoint_arch_check(struct perf_event *bp) > >> { > >>return arch_validate_hwbkpt_settings(bp); > >> } > > > > So eventually I fear I can't do that, due to linking order. > > > > Say I convert x86 to implement hw_breakpoint_arch_check(), so I > > remove arch_validate_hwbkpt_settings(). On build time, the weak version > > is still compiled and can't find a declaration for > > arch_validate_hwbkpt_settings(). > > > > I tried to keep the declaration while the definition has been removed but > > it seems the weak version is linked first before it gets later replaced by > > the overriden arch version. So I get a build error. > > > > I could keep arch_validate_hwbkpt_settings() around on all archs and remove > > it in > > the end with the weak version but that would defeat the purpose of removing > > the mid-state in the current patch. > > How about just not using weak functions? Weak functions have annoying issues > like this, and they have trouble generating good code. I much prefer the > pattern: > > in arch header: > extern void arch_func(whatever); > #define arch_func arch_func > > in generic header: > #ifndef arch_func > static inline void arch_func(whatever) ... > #endif Thanks, that works well!
Re: [PATCH 8/9] perf/breakpoint: Split breakpoint "check" and "commit"
On Tue, May 15, 2018 at 09:58:03PM -0700, Andy Lutomirski wrote: > > > > On May 15, 2018, at 8:11 PM, Frederic Weisbecker > > wrote: > > > >> On Wed, May 09, 2018 at 11:17:03AM +0200, Peter Zijlstra wrote: > >>> On Sun, May 06, 2018 at 09:19:54PM +0200, Frederic Weisbecker wrote: > >>> arch/arm/include/asm/hw_breakpoint.h | 5 - > >>> arch/arm/kernel/hw_breakpoint.c | 22 +++--- > >>> arch/arm64/include/asm/hw_breakpoint.h | 5 - > >>> arch/arm64/kernel/hw_breakpoint.c| 22 +++--- > >>> arch/powerpc/include/asm/hw_breakpoint.h | 5 - > >>> arch/powerpc/kernel/hw_breakpoint.c | 22 +++--- > >>> arch/sh/include/asm/hw_breakpoint.h | 5 - > >>> arch/sh/kernel/hw_breakpoint.c | 22 +++--- > >>> arch/x86/include/asm/hw_breakpoint.h | 5 - > >>> arch/x86/kernel/hw_breakpoint.c | 23 +++ > >>> arch/xtensa/include/asm/hw_breakpoint.h | 5 - > >>> arch/xtensa/kernel/hw_breakpoint.c | 22 +++--- > >> > >> Because of those ^, > >> > >>> kernel/events/hw_breakpoint.c| 11 ++- > >> > >> would it not make sense to have a prelimenary patch doing something > >> like: > >> > >> __weak int hw_breakpoint_arch_check(struct perf_event *bp) > >> { > >>return arch_validate_hwbkpt_settings(bp); > >> } > > > > So eventually I fear I can't do that, due to linking order. > > > > Say I convert x86 to implement hw_breakpoint_arch_check(), so I > > remove arch_validate_hwbkpt_settings(). On build time, the weak version > > is still compiled and can't find a declaration for > > arch_validate_hwbkpt_settings(). > > > > I tried to keep the declaration while the definition has been removed but > > it seems the weak version is linked first before it gets later replaced by > > the overriden arch version. So I get a build error. > > > > I could keep arch_validate_hwbkpt_settings() around on all archs and remove > > it in > > the end with the weak version but that would defeat the purpose of removing > > the mid-state in the current patch. > > How about just not using weak functions? Weak functions have annoying issues > like this, and they have trouble generating good code. I much prefer the > pattern: > > in arch header: > extern void arch_func(whatever); > #define arch_func arch_func > > in generic header: > #ifndef arch_func > static inline void arch_func(whatever) ... > #endif Thanks, that works well!
Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
2018-05-19 3:25 GMT+01:00 Dmitry Safonov <0x7f454...@gmail.com>: >> Here is the function: >> 00400842 : >> 400842: 53 push %rbx >> 400843: 55 push %rbp >> 400844: 41 54 push %r12 >> 400846: 41 55 push %r13 >> 400848: 41 56 push %r14 >> 40084a: 41 57 push %r15 >> 40084c: 9c pushfq >> 40084d: 48 89 27mov%rsp,(%rdi) >> 400850: 48 89 fcmov%rdi,%rsp >> 400853: 6a 23 pushq $0x23 >> 400855: 68 5c 08 40 00 pushq $0x40085c >> 40085a: 48 cb lretq >> 40085c: ff d6 callq *%rsi >> 40085e: ea (bad) >> 40085f: 65 08 40 00 or %al,%gs:0x0(%rax) >> 400863: 33 00 xor(%rax),%eax >> 400865: 48 8b 24 24 mov(%rsp),%rsp >> 400869: 9d popfq >> 40086a: 41 5f pop%r15 >> 40086c: 41 5e pop%r14 >> 40086e: 41 5d pop%r13 >> 400870: 41 5c pop%r12 >> 400872: 5d pop%rbp >> 400873: 5b pop%rbx >> 400874: c3 retq >> 400875: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) >> 40087c: 00 00 00 >> 40087f: 90 nop >> >> Looks like mov between registers caused it? The hell. > > Oh, it's not 400850, I missloked, but 40085a so lretq might case it. But it's 002b:417bafe8 USER_DS and sensible address, still no idea. -- Dmitry
Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
2018-05-19 3:25 GMT+01:00 Dmitry Safonov <0x7f454...@gmail.com>: >> Here is the function: >> 00400842 : >> 400842: 53 push %rbx >> 400843: 55 push %rbp >> 400844: 41 54 push %r12 >> 400846: 41 55 push %r13 >> 400848: 41 56 push %r14 >> 40084a: 41 57 push %r15 >> 40084c: 9c pushfq >> 40084d: 48 89 27mov%rsp,(%rdi) >> 400850: 48 89 fcmov%rdi,%rsp >> 400853: 6a 23 pushq $0x23 >> 400855: 68 5c 08 40 00 pushq $0x40085c >> 40085a: 48 cb lretq >> 40085c: ff d6 callq *%rsi >> 40085e: ea (bad) >> 40085f: 65 08 40 00 or %al,%gs:0x0(%rax) >> 400863: 33 00 xor(%rax),%eax >> 400865: 48 8b 24 24 mov(%rsp),%rsp >> 400869: 9d popfq >> 40086a: 41 5f pop%r15 >> 40086c: 41 5e pop%r14 >> 40086e: 41 5d pop%r13 >> 400870: 41 5c pop%r12 >> 400872: 5d pop%rbp >> 400873: 5b pop%rbx >> 400874: c3 retq >> 400875: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) >> 40087c: 00 00 00 >> 40087f: 90 nop >> >> Looks like mov between registers caused it? The hell. > > Oh, it's not 400850, I missloked, but 40085a so lretq might case it. But it's 002b:417bafe8 USER_DS and sensible address, still no idea. -- Dmitry
Re: [RFC v4 3/5] virtio_ring: add packed ring support
On Sat, May 19, 2018 at 09:12:30AM +0800, Jason Wang wrote: > On 2018年05月18日 22:33, Tiwei Bie wrote: > > On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote: > > > On 2018年05月18日 19:29, Tiwei Bie wrote: > > > > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote: > > > > > On 2018年05月16日 22:33, Tiwei Bie wrote: > > > > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote: > > > > > > > On 2018年05月16日 21:45, Tiwei Bie wrote: > > > > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote: > > > > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote: > > > > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote: > > > > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote: > > > > > > [...] > > > > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue > > > > > > > > > > > > *vq, unsigned int head, > > > > > > > > > > > > + unsigned int id, void > > > > > > > > > > > > **ctx) > > > > > > > > > > > > +{ > > > > > > > > > > > > + struct vring_packed_desc *desc; > > > > > > > > > > > > + unsigned int i, j; > > > > > > > > > > > > + > > > > > > > > > > > > + /* Clear data ptr. */ > > > > > > > > > > > > + vq->desc_state[id].data = NULL; > > > > > > > > > > > > + > > > > > > > > > > > > + i = head; > > > > > > > > > > > > + > > > > > > > > > > > > + for (j = 0; j < vq->desc_state[id].num; j++) { > > > > > > > > > > > > + desc = >vring_packed.desc[i]; > > > > > > > > > > > > + vring_unmap_one_packed(vq, desc); > > > > > > > > > > > As mentioned in previous discussion, this probably won't > > > > > > > > > > > work for the case > > > > > > > > > > > of out of order completion since it depends on the > > > > > > > > > > > information in the > > > > > > > > > > > descriptor ring. We probably need to extend ctx to record > > > > > > > > > > > such information. > > > > > > > > > > Above code doesn't depend on the information in the > > > > > > > > > > descriptor > > > > > > > > > > ring. The vq->desc_state[] is the extended ctx. > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > Tiwei Bie > > > > > > > > > Yes, but desc is a pointer to descriptor ring I think so > > > > > > > > > vring_unmap_one_packed() still depends on the content of > > > > > > > > > descriptor ring? > > > > > > > > > > > > > > > > > I got your point now. I think it makes sense to reserve > > > > > > > > the bits of the addr field. Driver shouldn't try to get > > > > > > > > addrs from the descriptors when cleanup the descriptors > > > > > > > > no matter whether we support out-of-order or not. > > > > > > > Maybe I was wrong, but I remember spec mentioned something like > > > > > > > this. > > > > > > You're right. Spec mentioned this. I was just repeating > > > > > > the spec to emphasize that it does make sense. :) > > > > > > > > > > > > > > But combining it with the out-of-order support, it will > > > > > > > > mean that the driver still needs to maintain a desc/ctx > > > > > > > > list that is very similar to the desc ring in the split > > > > > > > > ring. I'm not quite sure whether it's something we want. > > > > > > > > If it is true, I'll do it. So do you think we also want > > > > > > > > to maintain such a desc/ctx list for packed ring? > > > > > > > To make it work for OOO backends I think we need something like > > > > > > > this > > > > > > > (hardware NIC drivers are usually have something like this). > > > > > > Which hardware NIC drivers have this? > > > > > It's quite common I think, e.g driver track e.g dma addr and page frag > > > > > somewhere. e.g the ring->rx_info in mlx4 driver. > > > > It seems that I had a misunderstanding on your > > > > previous comments. I know it's quite common for > > > > drivers to track e.g. DMA addrs somewhere (and > > > > I think one reason behind this is that they want > > > > to reuse the bits of addr field). > > > Yes, we may want this for virtio-net as well in the future. > > > > > > >But tracking > > > > addrs somewhere doesn't means supporting OOO. > > > > I thought you were saying it's quite common for > > > > hardware NIC drivers to support OOO (i.e. NICs > > > > will return the descriptors OOO): > > > > > > > > I'm not familiar with mlx4, maybe I'm wrong. > > > > I just had a quick glance. And I found below > > > > comments in mlx4_en_process_rx_cq(): > > > > > > > > ``` > > > > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx > > > >* descriptor offset can be deduced from the CQE index instead of > > > >* reading 'cqe->index' */ > > > > index = cq->mcq.cons_index & ring->size_mask; > > > > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor; > > > > ``` > > > > > > > > It seems that although they have a completion > > > > queue, they are still using the ring in order. > > > I guess so (at least from the above bits). Git grep -i "out of order" in > > >
Re: [RFC v4 3/5] virtio_ring: add packed ring support
On Sat, May 19, 2018 at 09:12:30AM +0800, Jason Wang wrote: > On 2018年05月18日 22:33, Tiwei Bie wrote: > > On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote: > > > On 2018年05月18日 19:29, Tiwei Bie wrote: > > > > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote: > > > > > On 2018年05月16日 22:33, Tiwei Bie wrote: > > > > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote: > > > > > > > On 2018年05月16日 21:45, Tiwei Bie wrote: > > > > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote: > > > > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote: > > > > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote: > > > > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote: > > > > > > [...] > > > > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue > > > > > > > > > > > > *vq, unsigned int head, > > > > > > > > > > > > + unsigned int id, void > > > > > > > > > > > > **ctx) > > > > > > > > > > > > +{ > > > > > > > > > > > > + struct vring_packed_desc *desc; > > > > > > > > > > > > + unsigned int i, j; > > > > > > > > > > > > + > > > > > > > > > > > > + /* Clear data ptr. */ > > > > > > > > > > > > + vq->desc_state[id].data = NULL; > > > > > > > > > > > > + > > > > > > > > > > > > + i = head; > > > > > > > > > > > > + > > > > > > > > > > > > + for (j = 0; j < vq->desc_state[id].num; j++) { > > > > > > > > > > > > + desc = >vring_packed.desc[i]; > > > > > > > > > > > > + vring_unmap_one_packed(vq, desc); > > > > > > > > > > > As mentioned in previous discussion, this probably won't > > > > > > > > > > > work for the case > > > > > > > > > > > of out of order completion since it depends on the > > > > > > > > > > > information in the > > > > > > > > > > > descriptor ring. We probably need to extend ctx to record > > > > > > > > > > > such information. > > > > > > > > > > Above code doesn't depend on the information in the > > > > > > > > > > descriptor > > > > > > > > > > ring. The vq->desc_state[] is the extended ctx. > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > Tiwei Bie > > > > > > > > > Yes, but desc is a pointer to descriptor ring I think so > > > > > > > > > vring_unmap_one_packed() still depends on the content of > > > > > > > > > descriptor ring? > > > > > > > > > > > > > > > > > I got your point now. I think it makes sense to reserve > > > > > > > > the bits of the addr field. Driver shouldn't try to get > > > > > > > > addrs from the descriptors when cleanup the descriptors > > > > > > > > no matter whether we support out-of-order or not. > > > > > > > Maybe I was wrong, but I remember spec mentioned something like > > > > > > > this. > > > > > > You're right. Spec mentioned this. I was just repeating > > > > > > the spec to emphasize that it does make sense. :) > > > > > > > > > > > > > > But combining it with the out-of-order support, it will > > > > > > > > mean that the driver still needs to maintain a desc/ctx > > > > > > > > list that is very similar to the desc ring in the split > > > > > > > > ring. I'm not quite sure whether it's something we want. > > > > > > > > If it is true, I'll do it. So do you think we also want > > > > > > > > to maintain such a desc/ctx list for packed ring? > > > > > > > To make it work for OOO backends I think we need something like > > > > > > > this > > > > > > > (hardware NIC drivers are usually have something like this). > > > > > > Which hardware NIC drivers have this? > > > > > It's quite common I think, e.g driver track e.g dma addr and page frag > > > > > somewhere. e.g the ring->rx_info in mlx4 driver. > > > > It seems that I had a misunderstanding on your > > > > previous comments. I know it's quite common for > > > > drivers to track e.g. DMA addrs somewhere (and > > > > I think one reason behind this is that they want > > > > to reuse the bits of addr field). > > > Yes, we may want this for virtio-net as well in the future. > > > > > > >But tracking > > > > addrs somewhere doesn't means supporting OOO. > > > > I thought you were saying it's quite common for > > > > hardware NIC drivers to support OOO (i.e. NICs > > > > will return the descriptors OOO): > > > > > > > > I'm not familiar with mlx4, maybe I'm wrong. > > > > I just had a quick glance. And I found below > > > > comments in mlx4_en_process_rx_cq(): > > > > > > > > ``` > > > > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx > > > >* descriptor offset can be deduced from the CQE index instead of > > > >* reading 'cqe->index' */ > > > > index = cq->mcq.cons_index & ring->size_mask; > > > > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor; > > > > ``` > > > > > > > > It seems that although they have a completion > > > > queue, they are still using the ring in order. > > > I guess so (at least from the above bits). Git grep -i "out of order" in > > >
Re: Tasks RCU vs Preempt RCU
On Fri, May 18, 2018 at 11:36:23AM -0700, Joel Fernandes wrote: > Hi, > > I was thinking about tasks-RCU and why its needed. Since preempt-RCU allows > tasks to be preempted in read-sections, can we not just reuse that mechanism > for the trampolines since we track all preempted tasks so we would wait on > all tasks preempted within a trampoline? > > I am trying to understand what will _not_ work if we did that.. I'm guessing > the answer is that that would mean the trampoline has to be wrapped with > rcu_read_{lock,unlock} which may add some overhead, but please let me know > if I'm missing something else.. > > The advantage I guess is possible elimination of an RCU variant, and also > possibly eliminating the tasks RCU thread that monitors.. Anyway I was > thinking more in terms of the effort of reduction of the RCU flavors etc and > reducing complexity ideas. The problem is that if they are preempted while executing in a trampoline, RCU-preempt doesn't queue them nor does it wait on them. And the problem with wrapping them with rcu_read_{lock,unlock} is that there would be a point before the trampoline executed rcu_read_lock() but while it was on the trampoline. Nothing good comes from this. ;-) Thanx, Paul
Re: Tasks RCU vs Preempt RCU
On Fri, May 18, 2018 at 11:36:23AM -0700, Joel Fernandes wrote: > Hi, > > I was thinking about tasks-RCU and why its needed. Since preempt-RCU allows > tasks to be preempted in read-sections, can we not just reuse that mechanism > for the trampolines since we track all preempted tasks so we would wait on > all tasks preempted within a trampoline? > > I am trying to understand what will _not_ work if we did that.. I'm guessing > the answer is that that would mean the trampoline has to be wrapped with > rcu_read_{lock,unlock} which may add some overhead, but please let me know > if I'm missing something else.. > > The advantage I guess is possible elimination of an RCU variant, and also > possibly eliminating the tasks RCU thread that monitors.. Anyway I was > thinking more in terms of the effort of reduction of the RCU flavors etc and > reducing complexity ideas. The problem is that if they are preempted while executing in a trampoline, RCU-preempt doesn't queue them nor does it wait on them. And the problem with wrapping them with rcu_read_{lock,unlock} is that there would be a point before the trampoline executed rcu_read_lock() but while it was on the trampoline. Nothing good comes from this. ;-) Thanx, Paul
Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
2018-05-19 3:22 GMT+01:00 Dmitry Safonov: > On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote: >> > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com> >> > cpu family: 6 >> > model: 142 >> > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz >> > But I usually test kernels in VM. So, I use virt-manager as it's >> > easier to manage >> > multiple VMs. The thing is that I've chosen "Copy host CPU >> > configuration" >> > and for some reason, I don't quite follow virt-manager makes model >> >> "Opteron_G4". >> > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1- >> > 2.fc26). >> > So, cpuinfo in VM says: >> > cpu family: 21 >> > model: 1 >> > model name: AMD Opteron 62xx class CPU >> >> What does guest cpuinfo say for vendor_id? >> >> There are multiple potential screwups here. >> >> 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior >> that’s different from Intel’s, and the test case was definitely >> wrong. But >> KVM has no way to influence it. Are you sure you’re using KVM and >> not QEMU >> TCG? Anyway, the IRET thing is minor compared to your other problems, >> so >> let’s try to fix them first. >> >> 2. Compat fast syscalls are wildly different on AMD and Intel. >> Because of >> this issue, QEMU with KVM is supposed to always report the real >> vendor_id >> no matter -cpu asks for. If we get the wrong vendor_id, then we’re >> at the >> mercy of KVM’s emulation and performance will suck. On older >> kernels, this >> would cause hideous kernel crashes. On new kernels, I would expect >> it to >> merely crash 32-bit user programs or be slow. > > Heh, I didn't know those details, so it looks like it's (2), > vendor_id : AuthenticAMD > in guest. > >> >> > What's worse than registers changes is that some selftests actually >> > lead >> >> to >> > Oops's. The same reason for criu-ia32 fails. >> > I've tested so far v4.15 and v4.16 releases besides master >> > (2c71d338bef2), >> > so it looks to be not a recent regression. >> > Full Oopses: >> > [ 189.100174] BUG: unable to handle kernel paging request at >> >> 417bafe8 >> > [ 189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 >> > PTE >> >> 6991f067 >> > [ 189.100174] Oops: 0001 [#3] SMP NOPTI >> >> Whoa there! 0001 means a failed *kernel* access. >> >> > [ 189.100174] Modules linked in: >> > [ 189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G >> >> Was this sysret_ss_attrs_32 or sysret_ss_attrs_64? > > sysret_ss_attrs_32 survives > >> >> > D 4.17.0-rc5+ #11 >> > [ 189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, >> > 1996), >> > BIOS 1.10.2-1.fc26 04/01/2014 >> > [ 189.103187] RIP: 0033:0x40085a >> >> The oops was caused from CPL 3 at what looks like a totally sensible >> user >> address. Can you disassemble the offending binary and tell me what >> the >> code at 0x40085a is? > > Here is the function: > 00400842 : > 400842: 53 push %rbx > 400843: 55 push %rbp > 400844: 41 54 push %r12 > 400846: 41 55 push %r13 > 400848: 41 56 push %r14 > 40084a: 41 57 push %r15 > 40084c: 9c pushfq > 40084d: 48 89 27mov%rsp,(%rdi) > 400850: 48 89 fcmov%rdi,%rsp > 400853: 6a 23 pushq $0x23 > 400855: 68 5c 08 40 00 pushq $0x40085c > 40085a: 48 cb lretq > 40085c: ff d6 callq *%rsi > 40085e: ea (bad) > 40085f: 65 08 40 00 or %al,%gs:0x0(%rax) > 400863: 33 00 xor(%rax),%eax > 400865: 48 8b 24 24 mov(%rsp),%rsp > 400869: 9d popfq > 40086a: 41 5f pop%r15 > 40086c: 41 5e pop%r14 > 40086e: 41 5d pop%r13 > 400870: 41 5c pop%r12 > 400872: 5d pop%rbp > 400873: 5b pop%rbx > 400874: c3 retq > 400875: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) > 40087c: 00 00 00 > 40087f: 90 nop > > Looks like mov between registers caused it? The hell. Oh, it's not 400850, I missloked, but 40085a so lretq might case it. > >> >> > [ 189.103187] RSP: 002b:417bafe8 EFLAGS: 0206 >> > [ 189.103187] RAX: RBX: 03e8 RCX: >> >> >> > [ 189.103187] RDX: RSI: 00400830 RDI: >> >> 417baff8 >> > [ 189.103187] RBP: 417baff8 R08: R09: >> >> 0077 >> > [
Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
2018-05-19 3:22 GMT+01:00 Dmitry Safonov : > On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote: >> > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com> >> > cpu family: 6 >> > model: 142 >> > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz >> > But I usually test kernels in VM. So, I use virt-manager as it's >> > easier to manage >> > multiple VMs. The thing is that I've chosen "Copy host CPU >> > configuration" >> > and for some reason, I don't quite follow virt-manager makes model >> >> "Opteron_G4". >> > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1- >> > 2.fc26). >> > So, cpuinfo in VM says: >> > cpu family: 21 >> > model: 1 >> > model name: AMD Opteron 62xx class CPU >> >> What does guest cpuinfo say for vendor_id? >> >> There are multiple potential screwups here. >> >> 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior >> that’s different from Intel’s, and the test case was definitely >> wrong. But >> KVM has no way to influence it. Are you sure you’re using KVM and >> not QEMU >> TCG? Anyway, the IRET thing is minor compared to your other problems, >> so >> let’s try to fix them first. >> >> 2. Compat fast syscalls are wildly different on AMD and Intel. >> Because of >> this issue, QEMU with KVM is supposed to always report the real >> vendor_id >> no matter -cpu asks for. If we get the wrong vendor_id, then we’re >> at the >> mercy of KVM’s emulation and performance will suck. On older >> kernels, this >> would cause hideous kernel crashes. On new kernels, I would expect >> it to >> merely crash 32-bit user programs or be slow. > > Heh, I didn't know those details, so it looks like it's (2), > vendor_id : AuthenticAMD > in guest. > >> >> > What's worse than registers changes is that some selftests actually >> > lead >> >> to >> > Oops's. The same reason for criu-ia32 fails. >> > I've tested so far v4.15 and v4.16 releases besides master >> > (2c71d338bef2), >> > so it looks to be not a recent regression. >> > Full Oopses: >> > [ 189.100174] BUG: unable to handle kernel paging request at >> >> 417bafe8 >> > [ 189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 >> > PTE >> >> 6991f067 >> > [ 189.100174] Oops: 0001 [#3] SMP NOPTI >> >> Whoa there! 0001 means a failed *kernel* access. >> >> > [ 189.100174] Modules linked in: >> > [ 189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G >> >> Was this sysret_ss_attrs_32 or sysret_ss_attrs_64? > > sysret_ss_attrs_32 survives > >> >> > D 4.17.0-rc5+ #11 >> > [ 189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, >> > 1996), >> > BIOS 1.10.2-1.fc26 04/01/2014 >> > [ 189.103187] RIP: 0033:0x40085a >> >> The oops was caused from CPL 3 at what looks like a totally sensible >> user >> address. Can you disassemble the offending binary and tell me what >> the >> code at 0x40085a is? > > Here is the function: > 00400842 : > 400842: 53 push %rbx > 400843: 55 push %rbp > 400844: 41 54 push %r12 > 400846: 41 55 push %r13 > 400848: 41 56 push %r14 > 40084a: 41 57 push %r15 > 40084c: 9c pushfq > 40084d: 48 89 27mov%rsp,(%rdi) > 400850: 48 89 fcmov%rdi,%rsp > 400853: 6a 23 pushq $0x23 > 400855: 68 5c 08 40 00 pushq $0x40085c > 40085a: 48 cb lretq > 40085c: ff d6 callq *%rsi > 40085e: ea (bad) > 40085f: 65 08 40 00 or %al,%gs:0x0(%rax) > 400863: 33 00 xor(%rax),%eax > 400865: 48 8b 24 24 mov(%rsp),%rsp > 400869: 9d popfq > 40086a: 41 5f pop%r15 > 40086c: 41 5e pop%r14 > 40086e: 41 5d pop%r13 > 400870: 41 5c pop%r12 > 400872: 5d pop%rbp > 400873: 5b pop%rbx > 400874: c3 retq > 400875: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) > 40087c: 00 00 00 > 40087f: 90 nop > > Looks like mov between registers caused it? The hell. Oh, it's not 400850, I missloked, but 40085a so lretq might case it. > >> >> > [ 189.103187] RSP: 002b:417bafe8 EFLAGS: 0206 >> > [ 189.103187] RAX: RBX: 03e8 RCX: >> >> >> > [ 189.103187] RDX: RSI: 00400830 RDI: >> >> 417baff8 >> > [ 189.103187] RBP: 417baff8 R08: R09: >> >> 0077 >> > [ 189.103187] R10:
Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote: > > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com> > > cpu family: 6 > > model: 142 > > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz > > But I usually test kernels in VM. So, I use virt-manager as it's > > easier to manage > > multiple VMs. The thing is that I've chosen "Copy host CPU > > configuration" > > and for some reason, I don't quite follow virt-manager makes model > > "Opteron_G4". > > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1- > > 2.fc26). > > So, cpuinfo in VM says: > > cpu family: 21 > > model: 1 > > model name: AMD Opteron 62xx class CPU > > What does guest cpuinfo say for vendor_id? > > There are multiple potential screwups here. > > 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior > that’s different from Intel’s, and the test case was definitely > wrong. But > KVM has no way to influence it. Are you sure you’re using KVM and > not QEMU > TCG? Anyway, the IRET thing is minor compared to your other problems, > so > let’s try to fix them first. > > 2. Compat fast syscalls are wildly different on AMD and Intel. > Because of > this issue, QEMU with KVM is supposed to always report the real > vendor_id > no matter -cpu asks for. If we get the wrong vendor_id, then we’re > at the > mercy of KVM’s emulation and performance will suck. On older > kernels, this > would cause hideous kernel crashes. On new kernels, I would expect > it to > merely crash 32-bit user programs or be slow. Heh, I didn't know those details, so it looks like it's (2), vendor_id : AuthenticAMD in guest. > > > What's worse than registers changes is that some selftests actually > > lead > > to > > Oops's. The same reason for criu-ia32 fails. > > I've tested so far v4.15 and v4.16 releases besides master > > (2c71d338bef2), > > so it looks to be not a recent regression. > > Full Oopses: > > [ 189.100174] BUG: unable to handle kernel paging request at > > 417bafe8 > > [ 189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 > > PTE > > 6991f067 > > [ 189.100174] Oops: 0001 [#3] SMP NOPTI > > Whoa there! 0001 means a failed *kernel* access. > > > [ 189.100174] Modules linked in: > > [ 189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G > > Was this sysret_ss_attrs_32 or sysret_ss_attrs_64? sysret_ss_attrs_32 survives > > > D 4.17.0-rc5+ #11 > > [ 189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, > > 1996), > > BIOS 1.10.2-1.fc26 04/01/2014 > > [ 189.103187] RIP: 0033:0x40085a > > The oops was caused from CPL 3 at what looks like a totally sensible > user > address. Can you disassemble the offending binary and tell me what > the > code at 0x40085a is? Here is the function: 00400842 : 400842: 53 push %rbx 400843: 55 push %rbp 400844: 41 54 push %r12 400846: 41 55 push %r13 400848: 41 56 push %r14 40084a: 41 57 push %r15 40084c: 9c pushfq 40084d: 48 89 27mov%rsp,(%rdi) 400850: 48 89 fcmov%rdi,%rsp 400853: 6a 23 pushq $0x23 400855: 68 5c 08 40 00 pushq $0x40085c 40085a: 48 cb lretq 40085c: ff d6 callq *%rsi 40085e: ea (bad) 40085f: 65 08 40 00 or %al,%gs:0x0(%rax) 400863: 33 00 xor(%rax),%eax 400865: 48 8b 24 24 mov(%rsp),%rsp 400869: 9d popfq 40086a: 41 5f pop%r15 40086c: 41 5e pop%r14 40086e: 41 5d pop%r13 400870: 41 5c pop%r12 400872: 5d pop%rbp 400873: 5b pop%rbx 400874: c3 retq 400875: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 40087c: 00 00 00 40087f: 90 nop Looks like mov between registers caused it? The hell. > > > [ 189.103187] RSP: 002b:417bafe8 EFLAGS: 0206 > > [ 189.103187] RAX: RBX: 03e8 RCX: > > > > [ 189.103187] RDX: RSI: 00400830 RDI: > > 417baff8 > > [ 189.103187] RBP: 417baff8 R08: R09: > > 0077 > > [ 189.103187] R10: 0006 R11: R12: > > 417ba000 > > [ 189.103187] R13: 7ffc05207840 R14: R15: > > > > [ 189.103187] FS: 7f98566ecb40() > > GS:9740ffc0() > > knlGS: > > [
Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote: > > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com> > > cpu family: 6 > > model: 142 > > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz > > But I usually test kernels in VM. So, I use virt-manager as it's > > easier to manage > > multiple VMs. The thing is that I've chosen "Copy host CPU > > configuration" > > and for some reason, I don't quite follow virt-manager makes model > > "Opteron_G4". > > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1- > > 2.fc26). > > So, cpuinfo in VM says: > > cpu family: 21 > > model: 1 > > model name: AMD Opteron 62xx class CPU > > What does guest cpuinfo say for vendor_id? > > There are multiple potential screwups here. > > 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior > that’s different from Intel’s, and the test case was definitely > wrong. But > KVM has no way to influence it. Are you sure you’re using KVM and > not QEMU > TCG? Anyway, the IRET thing is minor compared to your other problems, > so > let’s try to fix them first. > > 2. Compat fast syscalls are wildly different on AMD and Intel. > Because of > this issue, QEMU with KVM is supposed to always report the real > vendor_id > no matter -cpu asks for. If we get the wrong vendor_id, then we’re > at the > mercy of KVM’s emulation and performance will suck. On older > kernels, this > would cause hideous kernel crashes. On new kernels, I would expect > it to > merely crash 32-bit user programs or be slow. Heh, I didn't know those details, so it looks like it's (2), vendor_id : AuthenticAMD in guest. > > > What's worse than registers changes is that some selftests actually > > lead > > to > > Oops's. The same reason for criu-ia32 fails. > > I've tested so far v4.15 and v4.16 releases besides master > > (2c71d338bef2), > > so it looks to be not a recent regression. > > Full Oopses: > > [ 189.100174] BUG: unable to handle kernel paging request at > > 417bafe8 > > [ 189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 > > PTE > > 6991f067 > > [ 189.100174] Oops: 0001 [#3] SMP NOPTI > > Whoa there! 0001 means a failed *kernel* access. > > > [ 189.100174] Modules linked in: > > [ 189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G > > Was this sysret_ss_attrs_32 or sysret_ss_attrs_64? sysret_ss_attrs_32 survives > > > D 4.17.0-rc5+ #11 > > [ 189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, > > 1996), > > BIOS 1.10.2-1.fc26 04/01/2014 > > [ 189.103187] RIP: 0033:0x40085a > > The oops was caused from CPL 3 at what looks like a totally sensible > user > address. Can you disassemble the offending binary and tell me what > the > code at 0x40085a is? Here is the function: 00400842 : 400842: 53 push %rbx 400843: 55 push %rbp 400844: 41 54 push %r12 400846: 41 55 push %r13 400848: 41 56 push %r14 40084a: 41 57 push %r15 40084c: 9c pushfq 40084d: 48 89 27mov%rsp,(%rdi) 400850: 48 89 fcmov%rdi,%rsp 400853: 6a 23 pushq $0x23 400855: 68 5c 08 40 00 pushq $0x40085c 40085a: 48 cb lretq 40085c: ff d6 callq *%rsi 40085e: ea (bad) 40085f: 65 08 40 00 or %al,%gs:0x0(%rax) 400863: 33 00 xor(%rax),%eax 400865: 48 8b 24 24 mov(%rsp),%rsp 400869: 9d popfq 40086a: 41 5f pop%r15 40086c: 41 5e pop%r14 40086e: 41 5d pop%r13 400870: 41 5c pop%r12 400872: 5d pop%rbp 400873: 5b pop%rbx 400874: c3 retq 400875: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 40087c: 00 00 00 40087f: 90 nop Looks like mov between registers caused it? The hell. > > > [ 189.103187] RSP: 002b:417bafe8 EFLAGS: 0206 > > [ 189.103187] RAX: RBX: 03e8 RCX: > > > > [ 189.103187] RDX: RSI: 00400830 RDI: > > 417baff8 > > [ 189.103187] RBP: 417baff8 R08: R09: > > 0077 > > [ 189.103187] R10: 0006 R11: R12: > > 417ba000 > > [ 189.103187] R13: 7ffc05207840 R14: R15: > > > > [ 189.103187] FS: 7f98566ecb40() > > GS:9740ffc0() > > knlGS: > > [
Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
> On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com> wrote: > Hi Andy, > 2018-05-18 23:03 GMT+01:00 Andy Lutomirski: >>> On Thu, May 17, 2018 at 4:40 PM Dmitry Safonov wrote: >>> Some selftests are failing, but the same way as before the patch >>> (ITOW, it's not regression): >>> [root@localhost self]# grep FAIL out >>> [FAIL] Reg 1 mismatch: requested 0x0; got 0x3 >>> [FAIL] Reg 15 mismatch: requested 0x8badf00d5aadc0de; got >>> 0xff425aadc0de >>> [FAIL] Reg 15 mismatch: requested 0x8badf00d5aadc0de; got >>> 0xff425aadc0de >>> [FAIL] Reg 15 mismatch: requested 0x8badf00d5aadc0de; got >>> 0xff425aadc0de >> Are you on AMD? Can you try this patch: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes=c88aa6d53840e48970c54f9ef70c79415033b32d >> and give me a Tested-by if it fixes it for you? > Sure. > I'm on Intel actually: > cpu family: 6 > model: 142 > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz > But I usually test kernels in VM. So, I use virt-manager as it's > easier to manage > multiple VMs. The thing is that I've chosen "Copy host CPU configuration" > and for some reason, I don't quite follow virt-manager makes model "Opteron_G4". > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26). > So, cpuinfo in VM says: > cpu family: 21 > model: 1 > model name: AMD Opteron 62xx class CPU What does guest cpuinfo say for vendor_id? There are multiple potential screwups here. 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior that’s different from Intel’s, and the test case was definitely wrong. But KVM has no way to influence it. Are you sure you’re using KVM and not QEMU TCG? Anyway, the IRET thing is minor compared to your other problems, so let’s try to fix them first. 2. Compat fast syscalls are wildly different on AMD and Intel. Because of this issue, QEMU with KVM is supposed to always report the real vendor_id no matter -cpu asks for. If we get the wrong vendor_id, then we’re at the mercy of KVM’s emulation and performance will suck. On older kernels, this would cause hideous kernel crashes. On new kernels, I would expect it to merely crash 32-bit user programs or be slow. > What's worse than registers changes is that some selftests actually lead to > Oops's. The same reason for criu-ia32 fails. > I've tested so far v4.15 and v4.16 releases besides master (2c71d338bef2), > so it looks to be not a recent regression. > Full Oopses: > [ 189.100174] BUG: unable to handle kernel paging request at 417bafe8 > [ 189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE 6991f067 > [ 189.100174] Oops: 0001 [#3] SMP NOPTI Whoa there! 0001 means a failed *kernel* access. > [ 189.100174] Modules linked in: > [ 189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G Was this sysret_ss_attrs_32 or sysret_ss_attrs_64? > D 4.17.0-rc5+ #11 > [ 189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.10.2-1.fc26 04/01/2014 > [ 189.103187] RIP: 0033:0x40085a The oops was caused from CPL 3 at what looks like a totally sensible user address. Can you disassemble the offending binary and tell me what the code at 0x40085a is? > [ 189.103187] RSP: 002b:417bafe8 EFLAGS: 0206 > [ 189.103187] RAX: RBX: 03e8 RCX: > [ 189.103187] RDX: RSI: 00400830 RDI: 417baff8 > [ 189.103187] RBP: 417baff8 R08: R09: 0077 > [ 189.103187] R10: 0006 R11: R12: 417ba000 > [ 189.103187] R13: 7ffc05207840 R14: R15: > [ 189.103187] FS: 7f98566ecb40() GS:9740ffc0() > knlGS: > [ 189.103187] CS: 0010 DS: ES: CR0: 80050033 CS here is the value of CS that the *kernel* has, so 0x10 is normal. > [ 189.103187] CR2: 417bafe8 CR3: 69dc4000 CR4: 007406f0 CR2 is in user space. So the big question is: what happened here? Why did the CPU (or emulated CPU) attempt a privileged access to a user address while running user code?
Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall
> On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com> wrote: > Hi Andy, > 2018-05-18 23:03 GMT+01:00 Andy Lutomirski : >>> On Thu, May 17, 2018 at 4:40 PM Dmitry Safonov wrote: >>> Some selftests are failing, but the same way as before the patch >>> (ITOW, it's not regression): >>> [root@localhost self]# grep FAIL out >>> [FAIL] Reg 1 mismatch: requested 0x0; got 0x3 >>> [FAIL] Reg 15 mismatch: requested 0x8badf00d5aadc0de; got >>> 0xff425aadc0de >>> [FAIL] Reg 15 mismatch: requested 0x8badf00d5aadc0de; got >>> 0xff425aadc0de >>> [FAIL] Reg 15 mismatch: requested 0x8badf00d5aadc0de; got >>> 0xff425aadc0de >> Are you on AMD? Can you try this patch: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes=c88aa6d53840e48970c54f9ef70c79415033b32d >> and give me a Tested-by if it fixes it for you? > Sure. > I'm on Intel actually: > cpu family: 6 > model: 142 > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz > But I usually test kernels in VM. So, I use virt-manager as it's > easier to manage > multiple VMs. The thing is that I've chosen "Copy host CPU configuration" > and for some reason, I don't quite follow virt-manager makes model "Opteron_G4". > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26). > So, cpuinfo in VM says: > cpu family: 21 > model: 1 > model name: AMD Opteron 62xx class CPU What does guest cpuinfo say for vendor_id? There are multiple potential screwups here. 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior that’s different from Intel’s, and the test case was definitely wrong. But KVM has no way to influence it. Are you sure you’re using KVM and not QEMU TCG? Anyway, the IRET thing is minor compared to your other problems, so let’s try to fix them first. 2. Compat fast syscalls are wildly different on AMD and Intel. Because of this issue, QEMU with KVM is supposed to always report the real vendor_id no matter -cpu asks for. If we get the wrong vendor_id, then we’re at the mercy of KVM’s emulation and performance will suck. On older kernels, this would cause hideous kernel crashes. On new kernels, I would expect it to merely crash 32-bit user programs or be slow. > What's worse than registers changes is that some selftests actually lead to > Oops's. The same reason for criu-ia32 fails. > I've tested so far v4.15 and v4.16 releases besides master (2c71d338bef2), > so it looks to be not a recent regression. > Full Oopses: > [ 189.100174] BUG: unable to handle kernel paging request at 417bafe8 > [ 189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE 6991f067 > [ 189.100174] Oops: 0001 [#3] SMP NOPTI Whoa there! 0001 means a failed *kernel* access. > [ 189.100174] Modules linked in: > [ 189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G Was this sysret_ss_attrs_32 or sysret_ss_attrs_64? > D 4.17.0-rc5+ #11 > [ 189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.10.2-1.fc26 04/01/2014 > [ 189.103187] RIP: 0033:0x40085a The oops was caused from CPL 3 at what looks like a totally sensible user address. Can you disassemble the offending binary and tell me what the code at 0x40085a is? > [ 189.103187] RSP: 002b:417bafe8 EFLAGS: 0206 > [ 189.103187] RAX: RBX: 03e8 RCX: > [ 189.103187] RDX: RSI: 00400830 RDI: 417baff8 > [ 189.103187] RBP: 417baff8 R08: R09: 0077 > [ 189.103187] R10: 0006 R11: R12: 417ba000 > [ 189.103187] R13: 7ffc05207840 R14: R15: > [ 189.103187] FS: 7f98566ecb40() GS:9740ffc0() > knlGS: > [ 189.103187] CS: 0010 DS: ES: CR0: 80050033 CS here is the value of CS that the *kernel* has, so 0x10 is normal. > [ 189.103187] CR2: 417bafe8 CR3: 69dc4000 CR4: 007406f0 CR2 is in user space. So the big question is: what happened here? Why did the CPU (or emulated CPU) attempt a privileged access to a user address while running user code?
Re: [PATCH 03/18] printk: Convert pr_fmt from blank define to KBUILD_MODNAME
On Fri, 2018-05-18 at 23:29 +0300, Andy Shevchenko wrote: > On Fri, May 18, 2018 at 12:10 PM, Joe Percheswrote: > > On Fri, 2018-05-18 at 10:42 +0200, Petr Mladek wrote: > > > On Thu 2018-05-10 08:45:29, Joe Perches wrote: > > > [0.00] libftrace: ftrace: allocating 40753 entries in 160 pages > > > [0.004008] apic: APIC: Switch to symmetric I/O mode setup > > > > I believe the uppercase APIC: bit should be removed. > > I disagree on lowering the case of some prefixes. > > In the case when kernel crashes at boot time and only what you have is > what on your screen it's much better to catch with different cases > than if everything would be either upper or lower. I don't parse. My suggestion is to make "apic: " the only prefix. ie: --- arch/x86/kernel/apic/apic.c | 100 +--- 1 file changed, 47 insertions(+), 53 deletions(-) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 2aabd4cb0e3f..67dd4e9191da 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -14,6 +14,8 @@ * Mikael Pettersson : PM converted to driver model. */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include @@ -232,7 +234,7 @@ static int modern_apic(void) */ static void __init apic_disable(void) { - pr_info("APIC: switched to apic NOOP\n"); + pr_info("switched to apic NOOP\n"); apic = _noop; } @@ -430,17 +432,13 @@ int setup_APIC_eilvt(u8 offset, u8 vector, u8 msg_type, u8 mask) reserved = reserve_eilvt_offset(offset, new); if (reserved != new) { - pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for " - "vector 0x%x, but the register is already in use for " - "vector 0x%x on another cpu\n", + pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for vector 0x%x, but the register is already in use for vector 0x%x on another cpu\n", smp_processor_id(), reg, offset, new, reserved); return -EINVAL; } if (!eilvt_entry_is_changeable(old, new)) { - pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for " - "vector 0x%x, but the register is already in use for " - "vector 0x%x on this cpu\n", + pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for vector 0x%x, but the register is already in use for vector 0x%x on this cpu\n", smp_processor_id(), reg, offset, new, old); return -EBUSY; } @@ -624,8 +622,8 @@ static void apic_check_deadline_errata(void) return; setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER); - pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; " - "please update microcode to version: 0x%x (or later)\n", rev); + pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x%x (or later)\n", + rev); } /* @@ -769,14 +767,14 @@ calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc) res = (((u64)deltapm) * mult) >> 22; do_div(res, 100); - pr_warning("APIC calibration not consistent " - "with PM-Timer: %ldms instead of 100ms\n",(long)res); + pr_warn("calibration not consistent with PM-Timer: %ldms instead of 100ms\n", + (long)res); /* Correct the lapic counter value */ res = (((u64)(*delta)) * pm_100ms); do_div(res, deltapm); - pr_info("APIC delta adjusted to PM-Timer: " - "%lu (%ld)\n", (unsigned long)res, *delta); + pr_info("APIC delta adjusted to PM-Timer: %lu (%ld)\n", + (unsigned long)res, *delta); *delta = (long)res; /* Correct the tsc counter value */ @@ -893,7 +891,7 @@ static int __init calibrate_APIC_clock(void) */ if (lapic_timer_frequency < (100 / HZ)) { local_irq_enable(); - pr_warning("APIC frequency too slow, disabling apic timer\n"); + pr_warn("frequency too slow, disabling apic timer\n"); return -1; } @@ -936,8 +934,8 @@ static int __init calibrate_APIC_clock(void) local_irq_enable(); if (levt->features & CLOCK_EVT_FEAT_DUMMY) { - pr_warning("APIC timer disabled due to verification failure\n"); - return -1; + pr_warn("timer disabled due to verification failure\n"); + return -1; } return 0; @@ -1010,8 +1008,8 @@ static void local_apic_timer_interrupt(void) * spurious. */ if (!evt->event_handler) { - pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", - smp_processor_id()); + pr_warn("Spurious LAPIC timer
Re: [PATCH 03/18] printk: Convert pr_fmt from blank define to KBUILD_MODNAME
On Fri, 2018-05-18 at 23:29 +0300, Andy Shevchenko wrote: > On Fri, May 18, 2018 at 12:10 PM, Joe Perches wrote: > > On Fri, 2018-05-18 at 10:42 +0200, Petr Mladek wrote: > > > On Thu 2018-05-10 08:45:29, Joe Perches wrote: > > > [0.00] libftrace: ftrace: allocating 40753 entries in 160 pages > > > [0.004008] apic: APIC: Switch to symmetric I/O mode setup > > > > I believe the uppercase APIC: bit should be removed. > > I disagree on lowering the case of some prefixes. > > In the case when kernel crashes at boot time and only what you have is > what on your screen it's much better to catch with different cases > than if everything would be either upper or lower. I don't parse. My suggestion is to make "apic: " the only prefix. ie: --- arch/x86/kernel/apic/apic.c | 100 +--- 1 file changed, 47 insertions(+), 53 deletions(-) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 2aabd4cb0e3f..67dd4e9191da 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -14,6 +14,8 @@ * Mikael Pettersson : PM converted to driver model. */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include #include #include @@ -232,7 +234,7 @@ static int modern_apic(void) */ static void __init apic_disable(void) { - pr_info("APIC: switched to apic NOOP\n"); + pr_info("switched to apic NOOP\n"); apic = _noop; } @@ -430,17 +432,13 @@ int setup_APIC_eilvt(u8 offset, u8 vector, u8 msg_type, u8 mask) reserved = reserve_eilvt_offset(offset, new); if (reserved != new) { - pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for " - "vector 0x%x, but the register is already in use for " - "vector 0x%x on another cpu\n", + pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for vector 0x%x, but the register is already in use for vector 0x%x on another cpu\n", smp_processor_id(), reg, offset, new, reserved); return -EINVAL; } if (!eilvt_entry_is_changeable(old, new)) { - pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for " - "vector 0x%x, but the register is already in use for " - "vector 0x%x on this cpu\n", + pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for vector 0x%x, but the register is already in use for vector 0x%x on this cpu\n", smp_processor_id(), reg, offset, new, old); return -EBUSY; } @@ -624,8 +622,8 @@ static void apic_check_deadline_errata(void) return; setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER); - pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; " - "please update microcode to version: 0x%x (or later)\n", rev); + pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x%x (or later)\n", + rev); } /* @@ -769,14 +767,14 @@ calibrate_by_pmtimer(long deltapm, long *delta, long *deltatsc) res = (((u64)deltapm) * mult) >> 22; do_div(res, 100); - pr_warning("APIC calibration not consistent " - "with PM-Timer: %ldms instead of 100ms\n",(long)res); + pr_warn("calibration not consistent with PM-Timer: %ldms instead of 100ms\n", + (long)res); /* Correct the lapic counter value */ res = (((u64)(*delta)) * pm_100ms); do_div(res, deltapm); - pr_info("APIC delta adjusted to PM-Timer: " - "%lu (%ld)\n", (unsigned long)res, *delta); + pr_info("APIC delta adjusted to PM-Timer: %lu (%ld)\n", + (unsigned long)res, *delta); *delta = (long)res; /* Correct the tsc counter value */ @@ -893,7 +891,7 @@ static int __init calibrate_APIC_clock(void) */ if (lapic_timer_frequency < (100 / HZ)) { local_irq_enable(); - pr_warning("APIC frequency too slow, disabling apic timer\n"); + pr_warn("frequency too slow, disabling apic timer\n"); return -1; } @@ -936,8 +934,8 @@ static int __init calibrate_APIC_clock(void) local_irq_enable(); if (levt->features & CLOCK_EVT_FEAT_DUMMY) { - pr_warning("APIC timer disabled due to verification failure\n"); - return -1; + pr_warn("timer disabled due to verification failure\n"); + return -1; } return 0; @@ -1010,8 +1008,8 @@ static void local_apic_timer_interrupt(void) * spurious. */ if (!evt->event_handler) { - pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", - smp_processor_id()); + pr_warn("Spurious LAPIC timer interrupt on cpu
Re: [v0 2/2] cpufreq: qcom-fw: Add support for QCOM cpufreq FW driver
Hi Taniya, Thank you for the patch! Yet something to improve: [auto build test ERROR on pm/linux-next] [also build test ERROR on v4.17-rc5 next-20180517] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Taniya-Das/Add-support-for-QCOM-cpufreq-FW-driver/20180519-050902 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next config: arm-allmodconfig (attached as .config) compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=arm All errors (new ones prefixed by >>): drivers/cpufreq/qcom-cpufreq-fw.c: In function 'qcom_cpufreq_fw_cpu_init': >> drivers/cpufreq/qcom-cpufreq-fw.c:77:8: error: 'struct cpufreq_policy' has >> no member named 'table' policy->table = c->table; ^~ drivers/cpufreq/qcom-cpufreq-fw.c: In function 'qcom_cpu_resources_init': >> drivers/cpufreq/qcom-cpufreq-fw.c:187:37: error: passing argument 1 of >> 'platform_get_resource_byname' from incompatible pointer type >> [-Werror=incompatible-pointer-types] res = platform_get_resource_byname(dev, IORESOURCE_MEM, "en_base"); ^~~ In file included from include/linux/of_device.h:6:0, from include/linux/of_platform.h:12, from drivers/cpufreq/qcom-cpufreq-fw.c:11: include/linux/platform_device.h:56:25: note: expected 'struct platform_device *' but argument is of type 'struct device *' extern struct resource *platform_get_resource_byname(struct platform_device *, ^~~~ >> drivers/cpufreq/qcom-cpufreq-fw.c:187:6: error: incompatible types when >> assigning to type 'struct resource' from type 'struct resource *' res = platform_get_resource_byname(dev, IORESOURCE_MEM, "en_base"); ^ >> drivers/cpufreq/qcom-cpufreq-fw.c:188:6: error: wrong type argument to unary >> exclamation mark if (!res) { ^ >> drivers/cpufreq/qcom-cpufreq-fw.c:193:33: error: invalid type argument of >> '->' (have 'struct resource') en_base = devm_ioremap(dev, res->start, resource_size(res)); ^~ >> drivers/cpufreq/qcom-cpufreq-fw.c:193:56: error: incompatible type for >> argument 1 of 'resource_size' en_base = devm_ioremap(dev, res->start, resource_size(res)); ^~~ In file included from include/linux/of_address.h:4:0, from drivers/cpufreq/qcom-cpufreq-fw.c:10: include/linux/ioport.h:196:31: note: expected 'const struct resource *' but argument is of type 'struct resource' static inline resource_size_t resource_size(const struct resource *res) ^ cc1: some warnings being treated as errors vim +77 drivers/cpufreq/qcom-cpufreq-fw.c > 11 #include 12 13 #define INIT_RATE 3UL 14 #define XO_RATE 1920UL 15 #define LUT_MAX_ENTRIES 40U 16 #define CORE_COUNT_VAL(val) ((val & GENMASK(18, 16)) >> 16) 17 #define LUT_ROW_SIZE32 18 19 struct cpufreq_qcom { 20 struct cpufreq_frequency_table *table; 21 struct device *dev; 22 void __iomem *perf_base; 23 void __iomem *lut_base; 24 cpumask_t related_cpus; 25 unsigned int max_cores; 26 }; 27 28 static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS]; 29 30 static int 31 qcom_cpufreq_fw_target_index(struct cpufreq_policy *policy, unsigned int index) 32 { 33 struct cpufreq_qcom *c = policy->driver_data; 34 35 if (index >= LUT_MAX_ENTRIES) { 36 dev_err(c->dev, 37 "Passing an index (%u) that's greater than max (%d)\n", 38 index, LUT_MAX_ENTRIES - 1); 39 return -EINVAL; 40 } 41 42 writel_relaxed(index, c->perf_base); 43 44 /* Make sure the write goes through before proceeding */ 45 mb(); 46 return 0; 47 } 48 49 static unsigned int qcom_cpufreq_fw_get(unsigned int cpu) 50 { 51 struct cpufreq_qcom *c; 52 unsigned int index; 53 54 c = qcom_freq_domain_map[cpu]; 55 if (!c) 56 return -ENODEV; 57 58 index = readl_relaxed(c->perf_base); 59 index = min(index, LUT_MAX_ENTRIES - 1); 60 61 return
Re: [v0 2/2] cpufreq: qcom-fw: Add support for QCOM cpufreq FW driver
Hi Taniya, Thank you for the patch! Yet something to improve: [auto build test ERROR on pm/linux-next] [also build test ERROR on v4.17-rc5 next-20180517] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Taniya-Das/Add-support-for-QCOM-cpufreq-FW-driver/20180519-050902 base: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next config: arm-allmodconfig (attached as .config) compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=arm All errors (new ones prefixed by >>): drivers/cpufreq/qcom-cpufreq-fw.c: In function 'qcom_cpufreq_fw_cpu_init': >> drivers/cpufreq/qcom-cpufreq-fw.c:77:8: error: 'struct cpufreq_policy' has >> no member named 'table' policy->table = c->table; ^~ drivers/cpufreq/qcom-cpufreq-fw.c: In function 'qcom_cpu_resources_init': >> drivers/cpufreq/qcom-cpufreq-fw.c:187:37: error: passing argument 1 of >> 'platform_get_resource_byname' from incompatible pointer type >> [-Werror=incompatible-pointer-types] res = platform_get_resource_byname(dev, IORESOURCE_MEM, "en_base"); ^~~ In file included from include/linux/of_device.h:6:0, from include/linux/of_platform.h:12, from drivers/cpufreq/qcom-cpufreq-fw.c:11: include/linux/platform_device.h:56:25: note: expected 'struct platform_device *' but argument is of type 'struct device *' extern struct resource *platform_get_resource_byname(struct platform_device *, ^~~~ >> drivers/cpufreq/qcom-cpufreq-fw.c:187:6: error: incompatible types when >> assigning to type 'struct resource' from type 'struct resource *' res = platform_get_resource_byname(dev, IORESOURCE_MEM, "en_base"); ^ >> drivers/cpufreq/qcom-cpufreq-fw.c:188:6: error: wrong type argument to unary >> exclamation mark if (!res) { ^ >> drivers/cpufreq/qcom-cpufreq-fw.c:193:33: error: invalid type argument of >> '->' (have 'struct resource') en_base = devm_ioremap(dev, res->start, resource_size(res)); ^~ >> drivers/cpufreq/qcom-cpufreq-fw.c:193:56: error: incompatible type for >> argument 1 of 'resource_size' en_base = devm_ioremap(dev, res->start, resource_size(res)); ^~~ In file included from include/linux/of_address.h:4:0, from drivers/cpufreq/qcom-cpufreq-fw.c:10: include/linux/ioport.h:196:31: note: expected 'const struct resource *' but argument is of type 'struct resource' static inline resource_size_t resource_size(const struct resource *res) ^ cc1: some warnings being treated as errors vim +77 drivers/cpufreq/qcom-cpufreq-fw.c > 11 #include 12 13 #define INIT_RATE 3UL 14 #define XO_RATE 1920UL 15 #define LUT_MAX_ENTRIES 40U 16 #define CORE_COUNT_VAL(val) ((val & GENMASK(18, 16)) >> 16) 17 #define LUT_ROW_SIZE32 18 19 struct cpufreq_qcom { 20 struct cpufreq_frequency_table *table; 21 struct device *dev; 22 void __iomem *perf_base; 23 void __iomem *lut_base; 24 cpumask_t related_cpus; 25 unsigned int max_cores; 26 }; 27 28 static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS]; 29 30 static int 31 qcom_cpufreq_fw_target_index(struct cpufreq_policy *policy, unsigned int index) 32 { 33 struct cpufreq_qcom *c = policy->driver_data; 34 35 if (index >= LUT_MAX_ENTRIES) { 36 dev_err(c->dev, 37 "Passing an index (%u) that's greater than max (%d)\n", 38 index, LUT_MAX_ENTRIES - 1); 39 return -EINVAL; 40 } 41 42 writel_relaxed(index, c->perf_base); 43 44 /* Make sure the write goes through before proceeding */ 45 mb(); 46 return 0; 47 } 48 49 static unsigned int qcom_cpufreq_fw_get(unsigned int cpu) 50 { 51 struct cpufreq_qcom *c; 52 unsigned int index; 53 54 c = qcom_freq_domain_map[cpu]; 55 if (!c) 56 return -ENODEV; 57 58 index = readl_relaxed(c->perf_base); 59 index = min(index, LUT_MAX_ENTRIES - 1); 60 61 return
MANAGER AUDIT AND ACCOUNTING DEPARTMENT
FROM MR NELSON CAPPER MANAGER AUDIT AND ACCOUNTING DEPARTMENT KOREA DEVELOPMENT BANK (K.D.B) LONDON U.K PLEASE READ CAREFULLY This message might meet you in utmost surprise. However, it’s just my urgent need for foreign partner that made me to contact you for this transaction. I am Mr, Nelson Capper Manager Audit Accounting Department Korea Development Bank KDB London U.K I would like to know if this proposal will be worthwhile for your acceptance. I have a Foreign Customer, Manfred Hoffman from Germany who was an Investor, Crude Oil Merchant and Federal Government Contractor he was a victim with Concord Air Line, flight AF4590 killing 113 people crashed on 25 July 2000 near Paris leaving a closing balance of Ten Million Five Hundred Thousand United States Dollars ($10.5m) in one of his Private US Dollar Account that was been managed by me as the Customer’s Account Officer. Base on my security report, these funds can be claim without any hitches as no one is aware of the funds and it’s closing balance except me and the customer who is (Now Deceased) therefore, I can present you as the Next of Kin and we will work out the modalities for the claiming of the funds in accordance with the law. Now, if you are really sure of your trustworthy, accountability and confidentiality on this transaction without disappointment, contact me through email for further details. i expect your letter.After your reply I shall give you the details and procedures of the transaction. Regards Mr, Nelson Capper
MANAGER AUDIT AND ACCOUNTING DEPARTMENT
FROM MR NELSON CAPPER MANAGER AUDIT AND ACCOUNTING DEPARTMENT KOREA DEVELOPMENT BANK (K.D.B) LONDON U.K PLEASE READ CAREFULLY This message might meet you in utmost surprise. However, it’s just my urgent need for foreign partner that made me to contact you for this transaction. I am Mr, Nelson Capper Manager Audit Accounting Department Korea Development Bank KDB London U.K I would like to know if this proposal will be worthwhile for your acceptance. I have a Foreign Customer, Manfred Hoffman from Germany who was an Investor, Crude Oil Merchant and Federal Government Contractor he was a victim with Concord Air Line, flight AF4590 killing 113 people crashed on 25 July 2000 near Paris leaving a closing balance of Ten Million Five Hundred Thousand United States Dollars ($10.5m) in one of his Private US Dollar Account that was been managed by me as the Customer’s Account Officer. Base on my security report, these funds can be claim without any hitches as no one is aware of the funds and it’s closing balance except me and the customer who is (Now Deceased) therefore, I can present you as the Next of Kin and we will work out the modalities for the claiming of the funds in accordance with the law. Now, if you are really sure of your trustworthy, accountability and confidentiality on this transaction without disappointment, contact me through email for further details. i expect your letter.After your reply I shall give you the details and procedures of the transaction. Regards Mr, Nelson Capper
Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
On 5/18/2018 1:58 PM, Long Li wrote: Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical wrote: Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote: From: Long LiWhen cache=rdma is enabled on mount options, CIFS do not allocate internal data buffer pages for I/O, data is read/writen directly to user memory via RDMA. I don't think this should be an option. For direct I/O without signing or encryption CIFS should always use get_user_pages, with or without RDMA. Yes this should be done for all transport. If there are no objections, I'll send patches to change this. Would this help/change performance much? On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O patterns. But I haven't tested on TCP. Maybe it will help a little bit. Well, when the application requests direct i/o on a TCP connection, you definitely don't want to cache it! So even if the performance is different, correctness would dictate doing this. You probably don't need to pin the buffer in the TCP case, which might be worth avoiding. Tom.
Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
On 5/18/2018 1:58 PM, Long Li wrote: Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical wrote: Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote: From: Long Li When cache=rdma is enabled on mount options, CIFS do not allocate internal data buffer pages for I/O, data is read/writen directly to user memory via RDMA. I don't think this should be an option. For direct I/O without signing or encryption CIFS should always use get_user_pages, with or without RDMA. Yes this should be done for all transport. If there are no objections, I'll send patches to change this. Would this help/change performance much? On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O patterns. But I haven't tested on TCP. Maybe it will help a little bit. Well, when the application requests direct i/o on a TCP connection, you definitely don't want to cache it! So even if the performance is different, correctness would dictate doing this. You probably don't need to pin the buffer in the TCP case, which might be worth avoiding. Tom.
[PATCH v3] usbip: vhci_sysfs: fix potential Spectre v1
pdev_nr and rhport can be controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/usb/usbip/vhci_sysfs.c:238 detach_store() warn: potential spectre issue 'vhcis' drivers/usb/usbip/vhci_sysfs.c:328 attach_store() warn: potential spectre issue 'vhcis' drivers/usb/usbip/vhci_sysfs.c:338 attach_store() warn: potential spectre issue 'vhci->vhci_hcd_ss->vdev' drivers/usb/usbip/vhci_sysfs.c:340 attach_store() warn: potential spectre issue 'vhci->vhci_hcd_hs->vdev' Fix this by sanitizing pdev_nr and rhport before using them to index vhcis and vhci->vhci_hcd_ss->vdev respectively. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel=152449131114778=2 Cc: sta...@vger.kernel.org Signed-off-by: Gustavo A. R. Silva--- Changes in v3: - Pass the addresses of pdev_nr and rhport into valid_port and valid_args. Changes in v2: - Place the barriers into valid_port. drivers/usb/usbip/vhci_sysfs.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c index 4880838..be37aec 100644 --- a/drivers/usb/usbip/vhci_sysfs.c +++ b/drivers/usb/usbip/vhci_sysfs.c @@ -10,6 +10,9 @@ #include #include +/* Hardening for Spectre-v1 */ +#include + #include "usbip_common.h" #include "vhci.h" @@ -205,16 +208,20 @@ static int vhci_port_disconnect(struct vhci_hcd *vhci_hcd, __u32 rhport) return 0; } -static int valid_port(__u32 pdev_nr, __u32 rhport) +static int valid_port(__u32 *pdev_nr, __u32 *rhport) { - if (pdev_nr >= vhci_num_controllers) { - pr_err("pdev %u\n", pdev_nr); + if (*pdev_nr >= vhci_num_controllers) { + pr_err("pdev %u\n", *pdev_nr); return 0; } - if (rhport >= VHCI_HC_PORTS) { - pr_err("rhport %u\n", rhport); + *pdev_nr = array_index_nospec(*pdev_nr, vhci_num_controllers); + + if (*rhport >= VHCI_HC_PORTS) { + pr_err("rhport %u\n", *rhport); return 0; } + *rhport = array_index_nospec(*rhport, VHCI_HC_PORTS); + return 1; } @@ -232,7 +239,7 @@ static ssize_t detach_store(struct device *dev, struct device_attribute *attr, pdev_nr = port_to_pdev_nr(port); rhport = port_to_rhport(port); - if (!valid_port(pdev_nr, rhport)) + if (!valid_port(_nr, )) return -EINVAL; hcd = platform_get_drvdata(vhcis[pdev_nr].pdev); @@ -258,7 +265,8 @@ static ssize_t detach_store(struct device *dev, struct device_attribute *attr, } static DEVICE_ATTR_WO(detach); -static int valid_args(__u32 pdev_nr, __u32 rhport, enum usb_device_speed speed) +static int valid_args(__u32 *pdev_nr, __u32 *rhport, + enum usb_device_speed speed) { if (!valid_port(pdev_nr, rhport)) { return 0; @@ -322,7 +330,7 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr, sockfd, devid, speed); /* check received parameters */ - if (!valid_args(pdev_nr, rhport, speed)) + if (!valid_args(_nr, , speed)) return -EINVAL; hcd = platform_get_drvdata(vhcis[pdev_nr].pdev); -- 2.7.4
[PATCH v3] usbip: vhci_sysfs: fix potential Spectre v1
pdev_nr and rhport can be controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/usb/usbip/vhci_sysfs.c:238 detach_store() warn: potential spectre issue 'vhcis' drivers/usb/usbip/vhci_sysfs.c:328 attach_store() warn: potential spectre issue 'vhcis' drivers/usb/usbip/vhci_sysfs.c:338 attach_store() warn: potential spectre issue 'vhci->vhci_hcd_ss->vdev' drivers/usb/usbip/vhci_sysfs.c:340 attach_store() warn: potential spectre issue 'vhci->vhci_hcd_hs->vdev' Fix this by sanitizing pdev_nr and rhport before using them to index vhcis and vhci->vhci_hcd_ss->vdev respectively. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel=152449131114778=2 Cc: sta...@vger.kernel.org Signed-off-by: Gustavo A. R. Silva --- Changes in v3: - Pass the addresses of pdev_nr and rhport into valid_port and valid_args. Changes in v2: - Place the barriers into valid_port. drivers/usb/usbip/vhci_sysfs.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c index 4880838..be37aec 100644 --- a/drivers/usb/usbip/vhci_sysfs.c +++ b/drivers/usb/usbip/vhci_sysfs.c @@ -10,6 +10,9 @@ #include #include +/* Hardening for Spectre-v1 */ +#include + #include "usbip_common.h" #include "vhci.h" @@ -205,16 +208,20 @@ static int vhci_port_disconnect(struct vhci_hcd *vhci_hcd, __u32 rhport) return 0; } -static int valid_port(__u32 pdev_nr, __u32 rhport) +static int valid_port(__u32 *pdev_nr, __u32 *rhport) { - if (pdev_nr >= vhci_num_controllers) { - pr_err("pdev %u\n", pdev_nr); + if (*pdev_nr >= vhci_num_controllers) { + pr_err("pdev %u\n", *pdev_nr); return 0; } - if (rhport >= VHCI_HC_PORTS) { - pr_err("rhport %u\n", rhport); + *pdev_nr = array_index_nospec(*pdev_nr, vhci_num_controllers); + + if (*rhport >= VHCI_HC_PORTS) { + pr_err("rhport %u\n", *rhport); return 0; } + *rhport = array_index_nospec(*rhport, VHCI_HC_PORTS); + return 1; } @@ -232,7 +239,7 @@ static ssize_t detach_store(struct device *dev, struct device_attribute *attr, pdev_nr = port_to_pdev_nr(port); rhport = port_to_rhport(port); - if (!valid_port(pdev_nr, rhport)) + if (!valid_port(_nr, )) return -EINVAL; hcd = platform_get_drvdata(vhcis[pdev_nr].pdev); @@ -258,7 +265,8 @@ static ssize_t detach_store(struct device *dev, struct device_attribute *attr, } static DEVICE_ATTR_WO(detach); -static int valid_args(__u32 pdev_nr, __u32 rhport, enum usb_device_speed speed) +static int valid_args(__u32 *pdev_nr, __u32 *rhport, + enum usb_device_speed speed) { if (!valid_port(pdev_nr, rhport)) { return 0; @@ -322,7 +330,7 @@ static ssize_t attach_store(struct device *dev, struct device_attribute *attr, sockfd, devid, speed); /* check received parameters */ - if (!valid_args(pdev_nr, rhport, speed)) + if (!valid_args(_nr, , speed)) return -EINVAL; hcd = platform_get_drvdata(vhcis[pdev_nr].pdev); -- 2.7.4
Re: [RFC v4 3/5] virtio_ring: add packed ring support
On 2018年05月18日 22:33, Tiwei Bie wrote: On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote: On 2018年05月18日 19:29, Tiwei Bie wrote: On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote: On 2018年05月16日 22:33, Tiwei Bie wrote: On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote: On 2018年05月16日 21:45, Tiwei Bie wrote: On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote: On 2018年05月16日 20:39, Tiwei Bie wrote: On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote: On 2018年05月16日 16:37, Tiwei Bie wrote: [...] +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head, + unsigned int id, void **ctx) +{ + struct vring_packed_desc *desc; + unsigned int i, j; + + /* Clear data ptr. */ + vq->desc_state[id].data = NULL; + + i = head; + + for (j = 0; j < vq->desc_state[id].num; j++) { + desc = >vring_packed.desc[i]; + vring_unmap_one_packed(vq, desc); As mentioned in previous discussion, this probably won't work for the case of out of order completion since it depends on the information in the descriptor ring. We probably need to extend ctx to record such information. Above code doesn't depend on the information in the descriptor ring. The vq->desc_state[] is the extended ctx. Best regards, Tiwei Bie Yes, but desc is a pointer to descriptor ring I think so vring_unmap_one_packed() still depends on the content of descriptor ring? I got your point now. I think it makes sense to reserve the bits of the addr field. Driver shouldn't try to get addrs from the descriptors when cleanup the descriptors no matter whether we support out-of-order or not. Maybe I was wrong, but I remember spec mentioned something like this. You're right. Spec mentioned this. I was just repeating the spec to emphasize that it does make sense. :) But combining it with the out-of-order support, it will mean that the driver still needs to maintain a desc/ctx list that is very similar to the desc ring in the split ring. I'm not quite sure whether it's something we want. If it is true, I'll do it. So do you think we also want to maintain such a desc/ctx list for packed ring? To make it work for OOO backends I think we need something like this (hardware NIC drivers are usually have something like this). Which hardware NIC drivers have this? It's quite common I think, e.g driver track e.g dma addr and page frag somewhere. e.g the ring->rx_info in mlx4 driver. It seems that I had a misunderstanding on your previous comments. I know it's quite common for drivers to track e.g. DMA addrs somewhere (and I think one reason behind this is that they want to reuse the bits of addr field). Yes, we may want this for virtio-net as well in the future. But tracking addrs somewhere doesn't means supporting OOO. I thought you were saying it's quite common for hardware NIC drivers to support OOO (i.e. NICs will return the descriptors OOO): I'm not familiar with mlx4, maybe I'm wrong. I just had a quick glance. And I found below comments in mlx4_en_process_rx_cq(): ``` /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx * descriptor offset can be deduced from the CQE index instead of * reading 'cqe->index' */ index = cq->mcq.cons_index & ring->size_mask; cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor; ``` It seems that although they have a completion queue, they are still using the ring in order. I guess so (at least from the above bits). Git grep -i "out of order" in drivers/net gives some hints. Looks like there're few deivces do this. I guess maybe storage device may want OOO. Right, some iSCSI did. But tracking them elsewhere is not only for OOO. Spec said: for element address " In a used descriptor, Element Address is unused. " for Next flag: " For example, if descriptors are used in the same order in which they are made available, this will result in the used descriptor overwriting the first available descriptor in the list, the used descriptor for the next list overwriting the first available descriptor in the next list, etc. " for in order completion: " This will result in the used descriptor overwriting the first available descriptor in the batch, the used descriptor for the next batch overwriting the first available descriptor in the next batch, etc. " So: - It's an alignment to the spec - device may (or should) overwrite the descriptor make also make address field useless. You didn't get my point... I don't hope so. I agreed driver should track the DMA addrs or some other necessary things from the very beginning. And I also repeated the spec to emphasize that it does make sense. And I'd like to do that. What I was saying is that, to support OOO, we may need to manage these context (which saves DMA addrs etc) via a list which is similar to the desc list maintained via `next` in split ring instead of an array whose elements always
Re: [RFC v4 3/5] virtio_ring: add packed ring support
On 2018年05月18日 22:33, Tiwei Bie wrote: On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote: On 2018年05月18日 19:29, Tiwei Bie wrote: On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote: On 2018年05月16日 22:33, Tiwei Bie wrote: On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote: On 2018年05月16日 21:45, Tiwei Bie wrote: On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote: On 2018年05月16日 20:39, Tiwei Bie wrote: On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote: On 2018年05月16日 16:37, Tiwei Bie wrote: [...] +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head, + unsigned int id, void **ctx) +{ + struct vring_packed_desc *desc; + unsigned int i, j; + + /* Clear data ptr. */ + vq->desc_state[id].data = NULL; + + i = head; + + for (j = 0; j < vq->desc_state[id].num; j++) { + desc = >vring_packed.desc[i]; + vring_unmap_one_packed(vq, desc); As mentioned in previous discussion, this probably won't work for the case of out of order completion since it depends on the information in the descriptor ring. We probably need to extend ctx to record such information. Above code doesn't depend on the information in the descriptor ring. The vq->desc_state[] is the extended ctx. Best regards, Tiwei Bie Yes, but desc is a pointer to descriptor ring I think so vring_unmap_one_packed() still depends on the content of descriptor ring? I got your point now. I think it makes sense to reserve the bits of the addr field. Driver shouldn't try to get addrs from the descriptors when cleanup the descriptors no matter whether we support out-of-order or not. Maybe I was wrong, but I remember spec mentioned something like this. You're right. Spec mentioned this. I was just repeating the spec to emphasize that it does make sense. :) But combining it with the out-of-order support, it will mean that the driver still needs to maintain a desc/ctx list that is very similar to the desc ring in the split ring. I'm not quite sure whether it's something we want. If it is true, I'll do it. So do you think we also want to maintain such a desc/ctx list for packed ring? To make it work for OOO backends I think we need something like this (hardware NIC drivers are usually have something like this). Which hardware NIC drivers have this? It's quite common I think, e.g driver track e.g dma addr and page frag somewhere. e.g the ring->rx_info in mlx4 driver. It seems that I had a misunderstanding on your previous comments. I know it's quite common for drivers to track e.g. DMA addrs somewhere (and I think one reason behind this is that they want to reuse the bits of addr field). Yes, we may want this for virtio-net as well in the future. But tracking addrs somewhere doesn't means supporting OOO. I thought you were saying it's quite common for hardware NIC drivers to support OOO (i.e. NICs will return the descriptors OOO): I'm not familiar with mlx4, maybe I'm wrong. I just had a quick glance. And I found below comments in mlx4_en_process_rx_cq(): ``` /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx * descriptor offset can be deduced from the CQE index instead of * reading 'cqe->index' */ index = cq->mcq.cons_index & ring->size_mask; cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor; ``` It seems that although they have a completion queue, they are still using the ring in order. I guess so (at least from the above bits). Git grep -i "out of order" in drivers/net gives some hints. Looks like there're few deivces do this. I guess maybe storage device may want OOO. Right, some iSCSI did. But tracking them elsewhere is not only for OOO. Spec said: for element address " In a used descriptor, Element Address is unused. " for Next flag: " For example, if descriptors are used in the same order in which they are made available, this will result in the used descriptor overwriting the first available descriptor in the list, the used descriptor for the next list overwriting the first available descriptor in the next list, etc. " for in order completion: " This will result in the used descriptor overwriting the first available descriptor in the batch, the used descriptor for the next batch overwriting the first available descriptor in the next batch, etc. " So: - It's an alignment to the spec - device may (or should) overwrite the descriptor make also make address field useless. You didn't get my point... I don't hope so. I agreed driver should track the DMA addrs or some other necessary things from the very beginning. And I also repeated the spec to emphasize that it does make sense. And I'd like to do that. What I was saying is that, to support OOO, we may need to manage these context (which saves DMA addrs etc) via a list which is similar to the desc list maintained via `next` in split ring instead of an array whose elements always
Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page
On 5/17/2018 5:22 PM, Long Li wrote: From: Long LiThere's a typo "recognize" in the patch title When doing RDMA send, the offset needs to be checked as data may start in an offset in the 1st page. Doesn't this patch alter the generic smb2pdu.c code too? I think this should note "any" send, not just RDMA? Tom. Signed-off-by: Long Li --- fs/cifs/smb2pdu.c | 3 ++- fs/cifs/smbdirect.c | 25 +++-- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 5097f28..fdcf97e 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata, rqst.rq_iov = iov; rqst.rq_nvec = 2; - rqst.rq_pages = wdata->pages; + rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages : wdata->pages; + rqst.rq_offset = wdata->page_offset; rqst.rq_npages = wdata->nr_pages; rqst.rq_pagesz = wdata->pagesz; rqst.rq_tailsz = wdata->tailsz; diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index b0a1955..b46586d 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst) /* add in the page array if there is one */ if (rqst->rq_npages) { - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1); - buflen += rqst->rq_tailsz; + if (rqst->rq_npages == 1) + buflen += rqst->rq_tailsz; + else + buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - rqst->rq_offset + rqst->rq_tailsz; } if (buflen + sizeof(struct smbd_data_transfer) > @@ -2182,8 +2184,19 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst) /* now sending pages if there are any */ for (i = 0; i < rqst->rq_npages; i++) { - buflen = (i == rqst->rq_npages-1) ? - rqst->rq_tailsz : rqst->rq_pagesz; + unsigned int offset = 0; + if (i == 0) + offset = rqst->rq_offset; + if (rqst->rq_npages == 1 || i == rqst->rq_npages-1) + buflen = rqst->rq_tailsz; + else { + /* We have at least two pages, and this is not the last page */ + if (i == 0) + buflen = rqst->rq_pagesz - rqst->rq_offset; + else + buflen = rqst->rq_pagesz; + } + nvecs = (buflen + max_iov_size - 1) / max_iov_size; log_write(INFO, "sending pages buflen=%d nvecs=%d\n", buflen, nvecs); @@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst) remaining_data_length -= size; log_write(INFO, "sending pages i=%d offset=%d size=%d" " remaining_data_length=%d\n", - i, j*max_iov_size, size, remaining_data_length); + i, j*max_iov_size+offset, size, remaining_data_length); rc = smbd_post_send_page( - info, rqst->rq_pages[i], j*max_iov_size, + info, rqst->rq_pages[i], j*max_iov_size + offset, size, remaining_data_length); if (rc) goto done;
Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page
On 5/17/2018 5:22 PM, Long Li wrote: From: Long Li There's a typo "recognize" in the patch title When doing RDMA send, the offset needs to be checked as data may start in an offset in the 1st page. Doesn't this patch alter the generic smb2pdu.c code too? I think this should note "any" send, not just RDMA? Tom. Signed-off-by: Long Li --- fs/cifs/smb2pdu.c | 3 ++- fs/cifs/smbdirect.c | 25 +++-- 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 5097f28..fdcf97e 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata, rqst.rq_iov = iov; rqst.rq_nvec = 2; - rqst.rq_pages = wdata->pages; + rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages : wdata->pages; + rqst.rq_offset = wdata->page_offset; rqst.rq_npages = wdata->nr_pages; rqst.rq_pagesz = wdata->pagesz; rqst.rq_tailsz = wdata->tailsz; diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index b0a1955..b46586d 100644 --- a/fs/cifs/smbdirect.c +++ b/fs/cifs/smbdirect.c @@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst) /* add in the page array if there is one */ if (rqst->rq_npages) { - buflen += rqst->rq_pagesz * (rqst->rq_npages - 1); - buflen += rqst->rq_tailsz; + if (rqst->rq_npages == 1) + buflen += rqst->rq_tailsz; + else + buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - rqst->rq_offset + rqst->rq_tailsz; } if (buflen + sizeof(struct smbd_data_transfer) > @@ -2182,8 +2184,19 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst) /* now sending pages if there are any */ for (i = 0; i < rqst->rq_npages; i++) { - buflen = (i == rqst->rq_npages-1) ? - rqst->rq_tailsz : rqst->rq_pagesz; + unsigned int offset = 0; + if (i == 0) + offset = rqst->rq_offset; + if (rqst->rq_npages == 1 || i == rqst->rq_npages-1) + buflen = rqst->rq_tailsz; + else { + /* We have at least two pages, and this is not the last page */ + if (i == 0) + buflen = rqst->rq_pagesz - rqst->rq_offset; + else + buflen = rqst->rq_pagesz; + } + nvecs = (buflen + max_iov_size - 1) / max_iov_size; log_write(INFO, "sending pages buflen=%d nvecs=%d\n", buflen, nvecs); @@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst) remaining_data_length -= size; log_write(INFO, "sending pages i=%d offset=%d size=%d" " remaining_data_length=%d\n", - i, j*max_iov_size, size, remaining_data_length); + i, j*max_iov_size+offset, size, remaining_data_length); rc = smbd_post_send_page( - info, rqst->rq_pages[i], j*max_iov_size, + info, rqst->rq_pages[i], j*max_iov_size + offset, size, remaining_data_length); if (rc) goto done;
Re: [PATCH net] tuntap: raise EPOLLOUT on device up
On 2018年05月18日 22:46, Michael S. Tsirkin wrote: On Fri, May 18, 2018 at 10:11:54PM +0800, Jason Wang wrote: On 2018年05月18日 22:06, Michael S. Tsirkin wrote: On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote: On 2018年05月18日 21:26, Jason Wang wrote: On 2018年05月18日 21:13, Michael S. Tsirkin wrote: On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote: We return -EIO on device down but can not raise EPOLLOUT after it was up. This may confuse user like vhost which expects tuntap to raise EPOLLOUT to re-enable its TX routine after tuntap is down. This could be easily reproduced by transmitting packets from VM while down and up the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO. Cc: Hannes Frederic SowaCc: Eric Dumazet Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()") Signed-off-by: Jason Wang --- drivers/net/tun.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index d45ac37..1b29761 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, int skb_xdp = 1; bool frags = tun_napi_frags_enabled(tun); - if (!(tun->dev->flags & IFF_UP)) + if (!(tun->dev->flags & IFF_UP)) { Isn't this racy? What if flag is cleared at this point? I think you mean "set at this point"? Then yes, so we probably need to set the bit during tun_net_close(). Thanks Looks no need, vhost will poll socket after it see EIO. So we are ok here? Thanks In fact I don't even understand why does this help any longer. We disable tx polling and only enable it on demand for a better rx performance. You may want to have a look at : commit feb8892cb441c742d4220cf7ced001e7fa070731 Author: Jason Wang Date: Mon Nov 13 11:45:34 2017 +0800 vhost_net: conditionally enable tx polling Thanks Question is, what looks at SOCKWQ_ASYNC_NOSPACE. I think it's tested when packet is transmitted, but there is no guarantee here any packet will ever be transmitted. Well, actually, I do plan to disable vq polling from the beginning. But looks like you do not want this: See https://patchwork.kernel.org/patch/10034025/ Thanks
Re: [PATCH net] tuntap: raise EPOLLOUT on device up
On 2018年05月18日 22:46, Michael S. Tsirkin wrote: On Fri, May 18, 2018 at 10:11:54PM +0800, Jason Wang wrote: On 2018年05月18日 22:06, Michael S. Tsirkin wrote: On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote: On 2018年05月18日 21:26, Jason Wang wrote: On 2018年05月18日 21:13, Michael S. Tsirkin wrote: On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote: We return -EIO on device down but can not raise EPOLLOUT after it was up. This may confuse user like vhost which expects tuntap to raise EPOLLOUT to re-enable its TX routine after tuntap is down. This could be easily reproduced by transmitting packets from VM while down and up the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO. Cc: Hannes Frederic Sowa Cc: Eric Dumazet Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()") Signed-off-by: Jason Wang --- drivers/net/tun.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index d45ac37..1b29761 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, int skb_xdp = 1; bool frags = tun_napi_frags_enabled(tun); - if (!(tun->dev->flags & IFF_UP)) + if (!(tun->dev->flags & IFF_UP)) { Isn't this racy? What if flag is cleared at this point? I think you mean "set at this point"? Then yes, so we probably need to set the bit during tun_net_close(). Thanks Looks no need, vhost will poll socket after it see EIO. So we are ok here? Thanks In fact I don't even understand why does this help any longer. We disable tx polling and only enable it on demand for a better rx performance. You may want to have a look at : commit feb8892cb441c742d4220cf7ced001e7fa070731 Author: Jason Wang Date: Mon Nov 13 11:45:34 2017 +0800 vhost_net: conditionally enable tx polling Thanks Question is, what looks at SOCKWQ_ASYNC_NOSPACE. I think it's tested when packet is transmitted, but there is no guarantee here any packet will ever be transmitted. Well, actually, I do plan to disable vq polling from the beginning. But looks like you do not want this: See https://patchwork.kernel.org/patch/10034025/ Thanks
[PATCH v1] gpu: host1x: Skip IOMMU initialization if firewall is enabled
Host1x's CDMA can't access the command buffers if IOMMU and Host1x firewall are enabled in the kernels config because firewall doesn't map the copied buffer into IOVA space. Fix this by skipping IOMMU initialization if firewall is enabled as firewall merges sparse cmdbufs into a single contiguous buffer and hence IOMMU isn't needed in this case. Signed-off-by: Dmitry Osipenko--- drivers/gpu/host1x/dev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index f1d5f76e9c33..d88073e7d22d 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -218,6 +218,9 @@ static int host1x_probe(struct platform_device *pdev) return err; } + if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL)) + goto skip_iommu; + host->group = iommu_group_get(>dev); if (host->group) { struct iommu_domain_geometry *geometry; -- 2.17.0
[PATCH v1] gpu: host1x: Skip IOMMU initialization if firewall is enabled
Host1x's CDMA can't access the command buffers if IOMMU and Host1x firewall are enabled in the kernels config because firewall doesn't map the copied buffer into IOVA space. Fix this by skipping IOMMU initialization if firewall is enabled as firewall merges sparse cmdbufs into a single contiguous buffer and hence IOMMU isn't needed in this case. Signed-off-by: Dmitry Osipenko --- drivers/gpu/host1x/dev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index f1d5f76e9c33..d88073e7d22d 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -218,6 +218,9 @@ static int host1x_probe(struct platform_device *pdev) return err; } + if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL)) + goto skip_iommu; + host->group = iommu_group_get(>dev); if (host->group) { struct iommu_domain_geometry *geometry; -- 2.17.0
Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA
On 5/17/2018 11:03 PM, Long Li wrote: Subject: Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA On 5/17/2018 8:22 PM, Long Li wrote: From: Long LiThis patchset implements direct user I/O through RDMA. In normal code path (even with cache=none), CIFS copies I/O data from user-space to kernel-space for security reasons. With this patchset, a new mounting option is introduced to have CIFS pin the user-space buffer into memory and performs I/O through RDMA. This avoids memory copy, at the cost of added security risk. What's the security risk? This type of direct i/o behavior is not uncommon, and can certainly be made safe, using the appropriate memory registration and protection domains. Any risk needs to be stated explicitly, and mitigation provided, or at least described. I think it's an assumption that user-mode buffer can't be trusted, so CIFS always copies them into internal buffers, and calculate signature and encryption based on protocol used. With the direct buffer, the user can potentially modify the buffer when signature or encryption is in progress or after they are done. I don't agree that the legacy copying behavior is because the buffer is "untrusted". The buffer is the user's data, there's no trust issue here. If the user application modifies the buffer while it's being sent, it's a violation of the API contract, and the only victim is the application itself. Same applies for receiving data. And as pointed out, most all storage layers, file and block both, use this strategy for direct i/o. Regarding signing, if the application alters the data then the integrity hash will simply do its job and catch the application in the act. Again, nothing suffers but the application. Regarding encryption, I assume you're proposing to encrypt and decrypt the data in a kernel buffer, effectively a copy. So in fact, in the encryption case there's no need to pin and map the user buffer at all. I'll mention however that Windows takes the path of not performing RDMA placement when encrypting data. It saves nothing, and even adds some overhead, because of the need to touch the buffer anyway to manage the encryption/decryption. Bottom line - no security implication for using user buffers directly. Tom. I also want to point out that, I choose to implement .read_iter and .write_iter from file_operations to implement direct I/O (CIFS is already doing this for O_DIRECT, so following this code path will avoid a big mess up). The ideal choice is to implement .direct_IO from address_space_operations that I think eventually we want to move to. Tom. This patchset is RFC. The work is in progress, do not merge. Long Li (9): Introduce offset for the 1st page in data transfer structures Change wdata alloc to support direct pages Change rdata alloc to support direct pages Change function to support offset when reading pages Change RDMA send to regonize page offset in the 1st page Change RDMA recv to support offset in the 1st page Support page offset in memory regsitrations Implement no-copy file I/O interfaces Introduce cache=rdma moutning option fs/cifs/cifs_fs_sb.h | 2 + fs/cifs/cifsfs.c | 19 +++ fs/cifs/cifsfs.h | 3 + fs/cifs/cifsglob.h| 6 + fs/cifs/cifsproto.h | 4 +- fs/cifs/cifssmb.c | 10 +- fs/cifs/connect.c | 13 +- fs/cifs/dir.c | 5 + fs/cifs/file.c| 351 ++ fs/cifs/inode.c | 4 +- fs/cifs/smb2ops.c | 2 +- fs/cifs/smb2pdu.c | 22 ++- fs/cifs/smbdirect.c | 132 ++--- fs/cifs/smbdirect.h | 2 +- fs/read_write.c | 7 + include/linux/ratelimit.h | 2 +- 16 files changed, 489 insertions(+), 95 deletions(-) N�r��y���b�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj"��!tml=
Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA
On 5/17/2018 11:03 PM, Long Li wrote: Subject: Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA On 5/17/2018 8:22 PM, Long Li wrote: From: Long Li This patchset implements direct user I/O through RDMA. In normal code path (even with cache=none), CIFS copies I/O data from user-space to kernel-space for security reasons. With this patchset, a new mounting option is introduced to have CIFS pin the user-space buffer into memory and performs I/O through RDMA. This avoids memory copy, at the cost of added security risk. What's the security risk? This type of direct i/o behavior is not uncommon, and can certainly be made safe, using the appropriate memory registration and protection domains. Any risk needs to be stated explicitly, and mitigation provided, or at least described. I think it's an assumption that user-mode buffer can't be trusted, so CIFS always copies them into internal buffers, and calculate signature and encryption based on protocol used. With the direct buffer, the user can potentially modify the buffer when signature or encryption is in progress or after they are done. I don't agree that the legacy copying behavior is because the buffer is "untrusted". The buffer is the user's data, there's no trust issue here. If the user application modifies the buffer while it's being sent, it's a violation of the API contract, and the only victim is the application itself. Same applies for receiving data. And as pointed out, most all storage layers, file and block both, use this strategy for direct i/o. Regarding signing, if the application alters the data then the integrity hash will simply do its job and catch the application in the act. Again, nothing suffers but the application. Regarding encryption, I assume you're proposing to encrypt and decrypt the data in a kernel buffer, effectively a copy. So in fact, in the encryption case there's no need to pin and map the user buffer at all. I'll mention however that Windows takes the path of not performing RDMA placement when encrypting data. It saves nothing, and even adds some overhead, because of the need to touch the buffer anyway to manage the encryption/decryption. Bottom line - no security implication for using user buffers directly. Tom. I also want to point out that, I choose to implement .read_iter and .write_iter from file_operations to implement direct I/O (CIFS is already doing this for O_DIRECT, so following this code path will avoid a big mess up). The ideal choice is to implement .direct_IO from address_space_operations that I think eventually we want to move to. Tom. This patchset is RFC. The work is in progress, do not merge. Long Li (9): Introduce offset for the 1st page in data transfer structures Change wdata alloc to support direct pages Change rdata alloc to support direct pages Change function to support offset when reading pages Change RDMA send to regonize page offset in the 1st page Change RDMA recv to support offset in the 1st page Support page offset in memory regsitrations Implement no-copy file I/O interfaces Introduce cache=rdma moutning option fs/cifs/cifs_fs_sb.h | 2 + fs/cifs/cifsfs.c | 19 +++ fs/cifs/cifsfs.h | 3 + fs/cifs/cifsglob.h| 6 + fs/cifs/cifsproto.h | 4 +- fs/cifs/cifssmb.c | 10 +- fs/cifs/connect.c | 13 +- fs/cifs/dir.c | 5 + fs/cifs/file.c| 351 ++ fs/cifs/inode.c | 4 +- fs/cifs/smb2ops.c | 2 +- fs/cifs/smb2pdu.c | 22 ++- fs/cifs/smbdirect.c | 132 ++--- fs/cifs/smbdirect.h | 2 +- fs/read_write.c | 7 + include/linux/ratelimit.h | 2 +- 16 files changed, 489 insertions(+), 95 deletions(-) N�r��y���b�X��ǧv�^�){.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w��� ���j:+v���w�j�mzZ+�ݢj"��!tml=
Re: [RFC PATCH 02/09] Change wdata alloc to support direct pages
On 5/17/2018 5:22 PM, Long Li wrote: From: Long LiWhen using direct pages from user space, there is no need to allocate pages. Just ping those user pages for RDMA. Did you mean "pin" those user pages? If so, where does that pinning occur, it's not in this patch. Perhaps this should just say "point to" those user pages. I also don't think this is necessarily only "for RDMA". Perhaps there are other transport scenarios where this is advantageous. Signed-off-by: Long Li --- fs/cifs/cifsproto.h | 2 +- fs/cifs/cifssmb.c | 10 +++--- fs/cifs/file.c | 4 ++-- 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index 365a414..94106b9 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -523,7 +523,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid); int cifs_async_writev(struct cifs_writedata *wdata, void (*release)(struct kref *kref)); void cifs_writev_complete(struct work_struct *work); -struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, +struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages, work_func_t complete); void cifs_writedata_release(struct kref *refcount); int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon, diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 1529a08..3b1731d 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -1983,7 +1983,7 @@ cifs_writev_requeue(struct cifs_writedata *wdata) tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE; } - wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete); + wdata2 = cifs_writedata_alloc(nr_pages, NULL, cifs_writev_complete); if (!wdata2) { rc = -ENOMEM; break; @@ -2067,12 +2067,16 @@ cifs_writev_complete(struct work_struct *work) } struct cifs_writedata * -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete) +cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages, work_func_t complete) { struct cifs_writedata *wdata; /* writedata + number of page pointers */ - wdata = kzalloc(sizeof(*wdata) + + if (direct_pages) { + wdata = kzalloc(sizeof(*wdata), GFP_NOFS); + wdata->direct_pages = direct_pages; + } else + wdata = kzalloc(sizeof(*wdata) + sizeof(struct page *) * nr_pages, GFP_NOFS); if (wdata != NULL) { kref_init(>refcount); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 23fd430..a6ec896 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -1965,7 +1965,7 @@ wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping, { struct cifs_writedata *wdata; - wdata = cifs_writedata_alloc((unsigned int)tofind, + wdata = cifs_writedata_alloc((unsigned int)tofind, NULL, cifs_writev_complete); if (!wdata) return NULL; @@ -2554,7 +2554,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, break; nr_pages = get_numpages(wsize, len, _len); - wdata = cifs_writedata_alloc(nr_pages, + wdata = cifs_writedata_alloc(nr_pages, NULL, cifs_uncached_writev_complete); if (!wdata) { rc = -ENOMEM;
Re: [RFC PATCH 02/09] Change wdata alloc to support direct pages
On 5/17/2018 5:22 PM, Long Li wrote: From: Long Li When using direct pages from user space, there is no need to allocate pages. Just ping those user pages for RDMA. Did you mean "pin" those user pages? If so, where does that pinning occur, it's not in this patch. Perhaps this should just say "point to" those user pages. I also don't think this is necessarily only "for RDMA". Perhaps there are other transport scenarios where this is advantageous. Signed-off-by: Long Li --- fs/cifs/cifsproto.h | 2 +- fs/cifs/cifssmb.c | 10 +++--- fs/cifs/file.c | 4 ++-- 3 files changed, 10 insertions(+), 6 deletions(-) diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index 365a414..94106b9 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -523,7 +523,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, struct mid_q_entry *mid); int cifs_async_writev(struct cifs_writedata *wdata, void (*release)(struct kref *kref)); void cifs_writev_complete(struct work_struct *work); -struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, +struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages, work_func_t complete); void cifs_writedata_release(struct kref *refcount); int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon, diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c index 1529a08..3b1731d 100644 --- a/fs/cifs/cifssmb.c +++ b/fs/cifs/cifssmb.c @@ -1983,7 +1983,7 @@ cifs_writev_requeue(struct cifs_writedata *wdata) tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE; } - wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete); + wdata2 = cifs_writedata_alloc(nr_pages, NULL, cifs_writev_complete); if (!wdata2) { rc = -ENOMEM; break; @@ -2067,12 +2067,16 @@ cifs_writev_complete(struct work_struct *work) } struct cifs_writedata * -cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete) +cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages, work_func_t complete) { struct cifs_writedata *wdata; /* writedata + number of page pointers */ - wdata = kzalloc(sizeof(*wdata) + + if (direct_pages) { + wdata = kzalloc(sizeof(*wdata), GFP_NOFS); + wdata->direct_pages = direct_pages; + } else + wdata = kzalloc(sizeof(*wdata) + sizeof(struct page *) * nr_pages, GFP_NOFS); if (wdata != NULL) { kref_init(>refcount); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 23fd430..a6ec896 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -1965,7 +1965,7 @@ wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping, { struct cifs_writedata *wdata; - wdata = cifs_writedata_alloc((unsigned int)tofind, + wdata = cifs_writedata_alloc((unsigned int)tofind, NULL, cifs_writev_complete); if (!wdata) return NULL; @@ -2554,7 +2554,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from, break; nr_pages = get_numpages(wsize, len, _len); - wdata = cifs_writedata_alloc(nr_pages, + wdata = cifs_writedata_alloc(nr_pages, NULL, cifs_uncached_writev_complete); if (!wdata) { rc = -ENOMEM;
[BUG] 4.17-rcX delays during boot and memory mapping errors
Apologies for not having more in the way of diagnostic info. Every one of the 4.17-rcX series has hung during boot at the point where "acpid" starts up. Beginning with -rc5, the boot actually proceeds to completion after an "uncomfortably" long delay. Attempting to run the X11 server results in the following diagnostic output in "/var/log/Xorg.0.log": (...) [ 109.951] (--) Depth 24 pixmap format is 32 bpp [ 109.951] (II) RADEON(0): [DRI2] Setup complete [ 109.951] (II) RADEON(0): [DRI2] DRI driver: r600 [ 109.951] (II) RADEON(0): [DRI2] VDPAU driver: r600 [ 109.951] Failed to map cursor buffer memory [ 109.951] Failed to map cursor buffer memory [ 109.951] Failed to map cursor buffer memory [ 109.952] Failed to map cursor buffer memory [ 109.952] Failed to map cursor buffer memory [ 109.952] Failed to map cursor buffer memory [ 109.952] (II) RADEON(0): Front buffer size: 8160K [ 109.952] (II) RADEON(0): VRAM usage limit set to 218167K [ 109.952] (==) RADEON(0): DRI3 disabled [ 109.952] (==) RADEON(0): Backing store enabled [ 109.952] (II) RADEON(0): Direct rendering enabled [ 109.952] (II) EXA(0): Driver allocated offscreen pixmaps [ 109.952] (II) EXA(0): Driver registered support for the following operations: [ 109.952] (II) Solid [ 109.952] (II) Copy [ 109.952] (II) Composite (RENDER acceleration) [ 109.952] (II) UploadToScreen [ 109.952] (II) DownloadFromScreen [ 109.952] (EE) Fatal server error: [ 109.952] (EE) failed to map shader -75 [ 109.952] (EE) [ 109.952] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help. [ 109.952] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information. [ 109.952] (EE) [ 109.971] (EE) Server terminated with error (1). Closing log file. Everything works normally with a 4.16 kernel. --Bob
[BUG] 4.17-rcX delays during boot and memory mapping errors
Apologies for not having more in the way of diagnostic info. Every one of the 4.17-rcX series has hung during boot at the point where "acpid" starts up. Beginning with -rc5, the boot actually proceeds to completion after an "uncomfortably" long delay. Attempting to run the X11 server results in the following diagnostic output in "/var/log/Xorg.0.log": (...) [ 109.951] (--) Depth 24 pixmap format is 32 bpp [ 109.951] (II) RADEON(0): [DRI2] Setup complete [ 109.951] (II) RADEON(0): [DRI2] DRI driver: r600 [ 109.951] (II) RADEON(0): [DRI2] VDPAU driver: r600 [ 109.951] Failed to map cursor buffer memory [ 109.951] Failed to map cursor buffer memory [ 109.951] Failed to map cursor buffer memory [ 109.952] Failed to map cursor buffer memory [ 109.952] Failed to map cursor buffer memory [ 109.952] Failed to map cursor buffer memory [ 109.952] (II) RADEON(0): Front buffer size: 8160K [ 109.952] (II) RADEON(0): VRAM usage limit set to 218167K [ 109.952] (==) RADEON(0): DRI3 disabled [ 109.952] (==) RADEON(0): Backing store enabled [ 109.952] (II) RADEON(0): Direct rendering enabled [ 109.952] (II) EXA(0): Driver allocated offscreen pixmaps [ 109.952] (II) EXA(0): Driver registered support for the following operations: [ 109.952] (II) Solid [ 109.952] (II) Copy [ 109.952] (II) Composite (RENDER acceleration) [ 109.952] (II) UploadToScreen [ 109.952] (II) DownloadFromScreen [ 109.952] (EE) Fatal server error: [ 109.952] (EE) failed to map shader -75 [ 109.952] (EE) [ 109.952] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help. [ 109.952] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information. [ 109.952] (EE) [ 109.971] (EE) Server terminated with error (1). Closing log file. Everything works normally with a 4.16 kernel. --Bob
Re: [PATCH v1] gpu: host1x: Utilize IOMMU mapping for firewall-copied buffers
On 19.05.2018 02:52, Dmitry Osipenko wrote: > Map firewall-copied buffers into Host1x's IOVA space, otherwise Host1x > CDMA can't access the command buffers and all submitted jobs fail if IOMMU > and Host1x firewall are enabled in the kernels config. > > Signed-off-by: Dmitry Osipenko> --- > drivers/gpu/host1x/job.c | 58 +++- > include/linux/host1x.h | 4 ++- > 2 files changed, 55 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c > index e2f4a4d93d20..57384a5b5059 100644 > --- a/drivers/gpu/host1x/job.c > +++ b/drivers/gpu/host1x/job.c > @@ -449,10 +449,13 @@ static int validate(struct host1x_firewall *fw, struct > host1x_job_gather *g) > > static inline int copy_gathers(struct host1x_job *job, struct device *dev) > { > + struct host1x *host = dev_get_drvdata(job->channel->dev->parent); > struct host1x_firewall fw; > + dma_addr_t dma_addr; > size_t size = 0; > size_t offset = 0; > unsigned int i; > + int err; > > fw.job = job; > fw.dev = dev; > @@ -466,23 +469,55 @@ static inline int copy_gathers(struct host1x_job *job, > struct device *dev) > size += g->words * sizeof(u32); > } > > + if (host->domain) > + size = iova_align(>iova, size); > + > /* >* Try a non-blocking allocation from a higher priority pools first, >* as awaiting for the allocation here is a major performance hit. >*/ > - job->gather_copy_mapped = dma_alloc_wc(dev, size, >gather_copy, > -GFP_NOWAIT); > + job->gather_copy_mapped = dma_alloc_wc(dev, size, > +>gather_copy_phys, > +GFP_NOWAIT); > > /* the higher priority allocation failed, try the generic-blocking */ > if (!job->gather_copy_mapped) > job->gather_copy_mapped = dma_alloc_wc(dev, size, > ->gather_copy, > +>gather_copy_phys, > GFP_KERNEL); > if (!job->gather_copy_mapped) > return -ENOMEM; > > job->gather_copy_size = size; > > + if (host->domain) { > + unsigned long shift; > + > + shift = iova_shift(>iova); > + job->gather_copy_iova_alloc = alloc_iova( > + >iova, size >> shift, > + host->iova_end >> shift, true); > + if (!job->gather_copy_iova_alloc) > + return -ENOMEM; > + > + job->gather_copy_iova = iova_dma_addr( > + >iova, job->gather_copy_iova_alloc); > + > + err = iommu_map(host->domain, > + job->gather_copy_iova, > + job->gather_copy_phys, > + size, IOMMU_READ); > + if (err) { > + __free_iova(>iova, job->gather_copy_iova_alloc); > + job->gather_copy_iova_alloc = NULL; > + return err; > + } > + > + dma_addr = job->gather_copy_iova; > + } else { > + dma_addr = job->gather_copy_phys; > + } > + > for (i = 0; i < job->num_gathers; i++) { > struct host1x_job_gather *g = >gathers[i]; > void *gather; > @@ -494,7 +529,7 @@ static inline int copy_gathers(struct host1x_job *job, > struct device *dev) > host1x_bo_munmap(g->bo, gather); > > /* Store the location in the buffer */ > - g->base = job->gather_copy; > + g->base = dma_addr; > g->offset = offset; > > /* Validate the job */ > @@ -582,9 +617,20 @@ void host1x_job_unpin(struct host1x_job *job) > > job->num_unpins = 0; > > - if (job->gather_copy_size) > + if (job->gather_copy_size) { > dma_free_wc(job->channel->dev, job->gather_copy_size, > - job->gather_copy_mapped, job->gather_copy); > + job->gather_copy_mapped, job->gather_copy_phys); > + > + if (job->gather_copy_iova_alloc) { > + iommu_unmap(host->domain, > + job->gather_copy_iova, > + job->gather_copy_size); > + > + __free_iova(>iova, job->gather_copy_iova_alloc); > + > + job->gather_copy_iova_alloc = NULL; > + } > + } > } > EXPORT_SYMBOL(host1x_job_unpin); > > diff --git a/include/linux/host1x.h b/include/linux/host1x.h > index 57d26406bdfd..536a678f81d4 100644 > --- a/include/linux/host1x.h > +++ b/include/linux/host1x.h > @@ -235,8 +235,10 @@ struct
Re: [PATCH v1] gpu: host1x: Utilize IOMMU mapping for firewall-copied buffers
On 19.05.2018 02:52, Dmitry Osipenko wrote: > Map firewall-copied buffers into Host1x's IOVA space, otherwise Host1x > CDMA can't access the command buffers and all submitted jobs fail if IOMMU > and Host1x firewall are enabled in the kernels config. > > Signed-off-by: Dmitry Osipenko > --- > drivers/gpu/host1x/job.c | 58 +++- > include/linux/host1x.h | 4 ++- > 2 files changed, 55 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c > index e2f4a4d93d20..57384a5b5059 100644 > --- a/drivers/gpu/host1x/job.c > +++ b/drivers/gpu/host1x/job.c > @@ -449,10 +449,13 @@ static int validate(struct host1x_firewall *fw, struct > host1x_job_gather *g) > > static inline int copy_gathers(struct host1x_job *job, struct device *dev) > { > + struct host1x *host = dev_get_drvdata(job->channel->dev->parent); > struct host1x_firewall fw; > + dma_addr_t dma_addr; > size_t size = 0; > size_t offset = 0; > unsigned int i; > + int err; > > fw.job = job; > fw.dev = dev; > @@ -466,23 +469,55 @@ static inline int copy_gathers(struct host1x_job *job, > struct device *dev) > size += g->words * sizeof(u32); > } > > + if (host->domain) > + size = iova_align(>iova, size); > + > /* >* Try a non-blocking allocation from a higher priority pools first, >* as awaiting for the allocation here is a major performance hit. >*/ > - job->gather_copy_mapped = dma_alloc_wc(dev, size, >gather_copy, > -GFP_NOWAIT); > + job->gather_copy_mapped = dma_alloc_wc(dev, size, > +>gather_copy_phys, > +GFP_NOWAIT); > > /* the higher priority allocation failed, try the generic-blocking */ > if (!job->gather_copy_mapped) > job->gather_copy_mapped = dma_alloc_wc(dev, size, > ->gather_copy, > +>gather_copy_phys, > GFP_KERNEL); > if (!job->gather_copy_mapped) > return -ENOMEM; > > job->gather_copy_size = size; > > + if (host->domain) { > + unsigned long shift; > + > + shift = iova_shift(>iova); > + job->gather_copy_iova_alloc = alloc_iova( > + >iova, size >> shift, > + host->iova_end >> shift, true); > + if (!job->gather_copy_iova_alloc) > + return -ENOMEM; > + > + job->gather_copy_iova = iova_dma_addr( > + >iova, job->gather_copy_iova_alloc); > + > + err = iommu_map(host->domain, > + job->gather_copy_iova, > + job->gather_copy_phys, > + size, IOMMU_READ); > + if (err) { > + __free_iova(>iova, job->gather_copy_iova_alloc); > + job->gather_copy_iova_alloc = NULL; > + return err; > + } > + > + dma_addr = job->gather_copy_iova; > + } else { > + dma_addr = job->gather_copy_phys; > + } > + > for (i = 0; i < job->num_gathers; i++) { > struct host1x_job_gather *g = >gathers[i]; > void *gather; > @@ -494,7 +529,7 @@ static inline int copy_gathers(struct host1x_job *job, > struct device *dev) > host1x_bo_munmap(g->bo, gather); > > /* Store the location in the buffer */ > - g->base = job->gather_copy; > + g->base = dma_addr; > g->offset = offset; > > /* Validate the job */ > @@ -582,9 +617,20 @@ void host1x_job_unpin(struct host1x_job *job) > > job->num_unpins = 0; > > - if (job->gather_copy_size) > + if (job->gather_copy_size) { > dma_free_wc(job->channel->dev, job->gather_copy_size, > - job->gather_copy_mapped, job->gather_copy); > + job->gather_copy_mapped, job->gather_copy_phys); > + > + if (job->gather_copy_iova_alloc) { > + iommu_unmap(host->domain, > + job->gather_copy_iova, > + job->gather_copy_size); > + > + __free_iova(>iova, job->gather_copy_iova_alloc); > + > + job->gather_copy_iova_alloc = NULL; > + } > + } > } > EXPORT_SYMBOL(host1x_job_unpin); > > diff --git a/include/linux/host1x.h b/include/linux/host1x.h > index 57d26406bdfd..536a678f81d4 100644 > --- a/include/linux/host1x.h > +++ b/include/linux/host1x.h > @@ -235,8 +235,10 @@ struct host1x_job { >
[PATCH] usb: dwc2: Fix HiKey regression caused by power_down feature
In 4.17-rc, commit 03ea6d6e9e1f ("usb: dwc2: Enable power down") caused the HiKey board to not correctly handle switching between usb-gadget and usb-host mode. Unplugging the OTG port would result in: [ 42.240973] dwc2 f72c.usb: dwc2_restore_host_registers: no host registers to restore [ 42.249066] dwc2 f72c.usb: dwc2_host_exit_hibernation: failed to restore host registers And the USB-host ports would not function. And plugging in the OTG port, we would see: [ 46.046557] WARNING: CPU: 3 PID: 6 at drivers/usb/dwc2/gadget.c:260 dwc2_hsotg_init_fifo+0x194/0x1a0 [ 46.055761] CPU: 3 PID: 6 Comm: kworker/u16:0 Not tainted 4.17.0-rc5-00030-ge67da8c #231 [ 46.055767] Hardware name: HiKey Development Board (DT) [ 46.055784] Workqueue: dwc2 dwc2_conn_id_status_change ... Thus, this patch sets the hisi params to disable the power_down flag by default, and gets thing working again. Cc: John YounCc: Vardan Mikayelyan Cc: Artur Petrosyan Cc: Grigor Tovmasyan Cc: Felipe Balbi Cc: linux-...@vger.kernel.org Signed-off-by: John Stultz --- drivers/usb/dwc2/params.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/dwc2/params.c b/drivers/usb/dwc2/params.c index f03e418..96b1b25 100644 --- a/drivers/usb/dwc2/params.c +++ b/drivers/usb/dwc2/params.c @@ -70,6 +70,7 @@ static void dwc2_set_his_params(struct dwc2_hsotg *hsotg) GAHBCFG_HBSTLEN_SHIFT; p->uframe_sched = false; p->change_speed_quirk = true; + p->power_down = false; } static void dwc2_set_rk_params(struct dwc2_hsotg *hsotg) -- 2.7.4
[PATCH] usb: dwc2: Fix HiKey regression caused by power_down feature
In 4.17-rc, commit 03ea6d6e9e1f ("usb: dwc2: Enable power down") caused the HiKey board to not correctly handle switching between usb-gadget and usb-host mode. Unplugging the OTG port would result in: [ 42.240973] dwc2 f72c.usb: dwc2_restore_host_registers: no host registers to restore [ 42.249066] dwc2 f72c.usb: dwc2_host_exit_hibernation: failed to restore host registers And the USB-host ports would not function. And plugging in the OTG port, we would see: [ 46.046557] WARNING: CPU: 3 PID: 6 at drivers/usb/dwc2/gadget.c:260 dwc2_hsotg_init_fifo+0x194/0x1a0 [ 46.055761] CPU: 3 PID: 6 Comm: kworker/u16:0 Not tainted 4.17.0-rc5-00030-ge67da8c #231 [ 46.055767] Hardware name: HiKey Development Board (DT) [ 46.055784] Workqueue: dwc2 dwc2_conn_id_status_change ... Thus, this patch sets the hisi params to disable the power_down flag by default, and gets thing working again. Cc: John Youn Cc: Vardan Mikayelyan Cc: Artur Petrosyan Cc: Grigor Tovmasyan Cc: Felipe Balbi Cc: linux-...@vger.kernel.org Signed-off-by: John Stultz --- drivers/usb/dwc2/params.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/dwc2/params.c b/drivers/usb/dwc2/params.c index f03e418..96b1b25 100644 --- a/drivers/usb/dwc2/params.c +++ b/drivers/usb/dwc2/params.c @@ -70,6 +70,7 @@ static void dwc2_set_his_params(struct dwc2_hsotg *hsotg) GAHBCFG_HBSTLEN_SHIFT; p->uframe_sched = false; p->change_speed_quirk = true; + p->power_down = false; } static void dwc2_set_rk_params(struct dwc2_hsotg *hsotg) -- 2.7.4