date:20180518

RE: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page

2018-05-18 Thread Long Li

> Subject: Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset
> in the 1st page
> 
> On 5/17/2018 5:22 PM, Long Li wrote:
> > From: Long Li 
> 
> There's a typo "recognize" in the patch title
> 
> 
> > When doing RDMA send, the offset needs to be checked as data may start
> > in an offset in the 1st page.
> 
> Doesn't this patch alter the generic smb2pdu.c code too? I think this should
> note "any" send, not just RDMA?

Yes, but for TCP the direct_pages and page_offset are always NULL and 0 when 
cache=rdma is in mounting options, so it doesn't really affect he TCP code path 
in this patch set.

This behavior can be changed, and it will work with TCP too.

> 
> Tom.
> 
> >
> > Signed-off-by: Long Li 
> > ---
> >   fs/cifs/smb2pdu.c   |  3 ++-
> >   fs/cifs/smbdirect.c | 25 +++--
> >   2 files changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index
> > 5097f28..fdcf97e 100644
> > --- a/fs/cifs/smb2pdu.c
> > +++ b/fs/cifs/smb2pdu.c
> > @@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
> >
> > rqst.rq_iov = iov;
> > rqst.rq_nvec = 2;
> > -   rqst.rq_pages = wdata->pages;
> > +   rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages :
> wdata->pages;
> > +   rqst.rq_offset = wdata->page_offset;
> > rqst.rq_npages = wdata->nr_pages;
> > rqst.rq_pagesz = wdata->pagesz;
> > rqst.rq_tailsz = wdata->tailsz;
> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index
> > b0a1955..b46586d 100644
> > --- a/fs/cifs/smbdirect.c
> > +++ b/fs/cifs/smbdirect.c
> > @@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info,
> > struct smb_rqst *rqst)
> >
> > /* add in the page array if there is one */
> > if (rqst->rq_npages) {
> > -   buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> > -   buflen += rqst->rq_tailsz;
> > +   if (rqst->rq_npages == 1)
> > +   buflen += rqst->rq_tailsz;
> > +   else
> > +   buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> > +rqst->rq_offset + rqst->rq_tailsz;
> > }
> >
> > if (buflen + sizeof(struct smbd_data_transfer) > @@ -2182,8
> > +2184,19 @@ int smbd_send(struct smbd_connection *info, struct
> > smb_rqst *rqst)
> >
> > /* now sending pages if there are any */
> > for (i = 0; i < rqst->rq_npages; i++) {
> > -   buflen = (i == rqst->rq_npages-1) ?
> > -   rqst->rq_tailsz : rqst->rq_pagesz;
> > +   unsigned int offset = 0;
> > +   if (i == 0)
> > +   offset = rqst->rq_offset;
> > +   if (rqst->rq_npages == 1 || i == rqst->rq_npages-1)
> > +   buflen = rqst->rq_tailsz;
> > +   else {
> > +   /* We have at least two pages, and this is not the last
> page */
> > +   if (i == 0)
> > +   buflen = rqst->rq_pagesz - rqst->rq_offset;
> > +   else
> > +   buflen = rqst->rq_pagesz;
> > +   }
> > +
> > nvecs = (buflen + max_iov_size - 1) / max_iov_size;
> > log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
> > buflen, nvecs);
> > @@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> > remaining_data_length -= size;
> > log_write(INFO, "sending pages i=%d offset=%d
> size=%d"
> > " remaining_data_length=%d\n",
> > -   i, j*max_iov_size, size,
> remaining_data_length);
> > +   i, j*max_iov_size+offset, size,
> remaining_data_length);
> > rc = smbd_post_send_page(
> > -   info, rqst->rq_pages[i], j*max_iov_size,
> > +   info, rqst->rq_pages[i], j*max_iov_size +
> offset,
> > size, remaining_data_length);
> > if (rc)
> > goto done;
> >

RE: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page

2018-05-18 Thread Long Li

> Subject: Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset
> in the 1st page
> 
> On 5/17/2018 5:22 PM, Long Li wrote:
> > From: Long Li 
> 
> There's a typo "recognize" in the patch title
> 
> 
> > When doing RDMA send, the offset needs to be checked as data may start
> > in an offset in the 1st page.
> 
> Doesn't this patch alter the generic smb2pdu.c code too? I think this should
> note "any" send, not just RDMA?

Yes, but for TCP the direct_pages and page_offset are always NULL and 0 when 
cache=rdma is in mounting options, so it doesn't really affect he TCP code path 
in this patch set.

This behavior can be changed, and it will work with TCP too.

> 
> Tom.
> 
> >
> > Signed-off-by: Long Li 
> > ---
> >   fs/cifs/smb2pdu.c   |  3 ++-
> >   fs/cifs/smbdirect.c | 25 +++--
> >   2 files changed, 21 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index
> > 5097f28..fdcf97e 100644
> > --- a/fs/cifs/smb2pdu.c
> > +++ b/fs/cifs/smb2pdu.c
> > @@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
> >
> > rqst.rq_iov = iov;
> > rqst.rq_nvec = 2;
> > -   rqst.rq_pages = wdata->pages;
> > +   rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages :
> wdata->pages;
> > +   rqst.rq_offset = wdata->page_offset;
> > rqst.rq_npages = wdata->nr_pages;
> > rqst.rq_pagesz = wdata->pagesz;
> > rqst.rq_tailsz = wdata->tailsz;
> > diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c index
> > b0a1955..b46586d 100644
> > --- a/fs/cifs/smbdirect.c
> > +++ b/fs/cifs/smbdirect.c
> > @@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info,
> > struct smb_rqst *rqst)
> >
> > /* add in the page array if there is one */
> > if (rqst->rq_npages) {
> > -   buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
> > -   buflen += rqst->rq_tailsz;
> > +   if (rqst->rq_npages == 1)
> > +   buflen += rqst->rq_tailsz;
> > +   else
> > +   buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
> > +rqst->rq_offset + rqst->rq_tailsz;
> > }
> >
> > if (buflen + sizeof(struct smbd_data_transfer) > @@ -2182,8
> > +2184,19 @@ int smbd_send(struct smbd_connection *info, struct
> > smb_rqst *rqst)
> >
> > /* now sending pages if there are any */
> > for (i = 0; i < rqst->rq_npages; i++) {
> > -   buflen = (i == rqst->rq_npages-1) ?
> > -   rqst->rq_tailsz : rqst->rq_pagesz;
> > +   unsigned int offset = 0;
> > +   if (i == 0)
> > +   offset = rqst->rq_offset;
> > +   if (rqst->rq_npages == 1 || i == rqst->rq_npages-1)
> > +   buflen = rqst->rq_tailsz;
> > +   else {
> > +   /* We have at least two pages, and this is not the last
> page */
> > +   if (i == 0)
> > +   buflen = rqst->rq_pagesz - rqst->rq_offset;
> > +   else
> > +   buflen = rqst->rq_pagesz;
> > +   }
> > +
> > nvecs = (buflen + max_iov_size - 1) / max_iov_size;
> > log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
> > buflen, nvecs);
> > @@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info,
> struct smb_rqst *rqst)
> > remaining_data_length -= size;
> > log_write(INFO, "sending pages i=%d offset=%d
> size=%d"
> > " remaining_data_length=%d\n",
> > -   i, j*max_iov_size, size,
> remaining_data_length);
> > +   i, j*max_iov_size+offset, size,
> remaining_data_length);
> > rc = smbd_post_send_page(
> > -   info, rqst->rq_pages[i], j*max_iov_size,
> > +   info, rqst->rq_pages[i], j*max_iov_size +
> offset,
> > size, remaining_data_length);
> > if (rc)
> > goto done;
> >

Re: [PATCH 1/2] m68k: Set default dma mask for platform devices

2018-05-18 Thread Finn Thain

On Fri, 18 May 2018, Christoph Hellwig wrote:

> > This implementation of arch_setup_pdev_archdata() differs from the 
> > powerpc one, in that this one avoids clobbering a device dma mask 
> > which has already been initialized.
> 
> I think your implementation should move into the generic implementation 
> in drivers/base/platform.c instead of being stuck in m68k.
> 
> Also powerpc probably wants fixing, but that's something left to the
> ppc folks..

On powerpc, all platform devices get a dma mask. But they don't do that in 
drivers/base/platform.c, they use arch_setup_pdev_archdata(). Why didn't 
they take the approach you suggest?

How would I support the claim that replacing an empty platform device dma 
mask with 0x is safe on all architectures and platforms?

Is there no code conditional upon dev.coherent_dma_mask or dev.dma_mask 
that could misbehave? (Didn't I cite an example in the other thread?*)

If you can convince me that it is safe, I'd be happy to submit the patch 
you asked for.

For now, I still think that patching the platform driver was the correct 
patch*.

Maybe the real problem is your commit 205e1b7f51e4 ("dma-mapping: warn 
when there is no coherent_dma_mask"), because it assumes that all dma_ops 
implementations care about coherent_dma_mask.

The dma_map_ops implementations that do use coherent_dma_mask should 
simply fail when it is unset, right?

Would it not be better to revert your patch and fix the dma_map_ops 
failure paths, than to somehow audit all the platform drivers and patch 
drivers/base/platform.c?

Thanks.

* https://lkml.kernel.org/r/alpine.LNX.2.21.1805091804290.72%40nippy.intranet

--

Re: [PATCH 1/2] m68k: Set default dma mask for platform devices

2018-05-18 Thread Finn Thain

On Fri, 18 May 2018, Christoph Hellwig wrote:

> > This implementation of arch_setup_pdev_archdata() differs from the 
> > powerpc one, in that this one avoids clobbering a device dma mask 
> > which has already been initialized.
> 
> I think your implementation should move into the generic implementation 
> in drivers/base/platform.c instead of being stuck in m68k.
> 
> Also powerpc probably wants fixing, but that's something left to the
> ppc folks..

On powerpc, all platform devices get a dma mask. But they don't do that in 
drivers/base/platform.c, they use arch_setup_pdev_archdata(). Why didn't 
they take the approach you suggest?

How would I support the claim that replacing an empty platform device dma 
mask with 0x is safe on all architectures and platforms?

Is there no code conditional upon dev.coherent_dma_mask or dev.dma_mask 
that could misbehave? (Didn't I cite an example in the other thread?*)

If you can convince me that it is safe, I'd be happy to submit the patch 
you asked for.

For now, I still think that patching the platform driver was the correct 
patch*.

Maybe the real problem is your commit 205e1b7f51e4 ("dma-mapping: warn 
when there is no coherent_dma_mask"), because it assumes that all dma_ops 
implementations care about coherent_dma_mask.

The dma_map_ops implementations that do use coherent_dma_mask should 
simply fail when it is unset, right?

Would it not be better to revert your patch and fix the dma_map_ops 
failure paths, than to somehow audit all the platform drivers and patch 
drivers/base/platform.c?

Thanks.

* https://lkml.kernel.org/r/alpine.LNX.2.21.1805091804290.72%40nippy.intranet

--

atención

2018-05-18 Thread Administrador de sistema

usuarios de webmail

Tenga en cuenta que el 95% de sus correos electrónicos recibidos después de 
actualizar el servidor webmail recientemente en nuestra base de datos se han 
suspendido. Reciba su mensaje regularmente. Nuestro personal técnico 
actualizará su cuenta dentro de los tres días hábiles. De lo contrario, su 
cuenta será suspendida temporalmente por nuestro servicio. Para reconfirmar su 
buzón de correo, se enviará la siguiente información:

normal:
Nombre de usuario:
contraseña:
Confirmar la contraseña:

Una advertencia !! Si esto no le permite actualizar su cuenta dentro de los dos 
días posteriores a la recepción del correo electrónico, perderá permanentemente 
el correo web de la cuenta del titular de la cuenta.

¡Gracias por su colaboración!
Copyright © 2017-2018 WebMail Technical Support Services, Inc.

atención

2018-05-18 Thread Administrador de sistema

usuarios de webmail

Tenga en cuenta que el 95% de sus correos electrónicos recibidos después de 
actualizar el servidor webmail recientemente en nuestra base de datos se han 
suspendido. Reciba su mensaje regularmente. Nuestro personal técnico 
actualizará su cuenta dentro de los tres días hábiles. De lo contrario, su 
cuenta será suspendida temporalmente por nuestro servicio. Para reconfirmar su 
buzón de correo, se enviará la siguiente información:

normal:
Nombre de usuario:
contraseña:
Confirmar la contraseña:

Una advertencia !! Si esto no le permite actualizar su cuenta dentro de los dos 
días posteriores a la recepción del correo electrónico, perderá permanentemente 
el correo web de la cuenta del titular de la cuenta.

¡Gracias por su colaboración!
Copyright © 2017-2018 WebMail Technical Support Services, Inc.

Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-18 Thread kbuild test robot

Hi Tycho,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc5]
[cannot apply to next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527
config: i386-randconfig-a1-05181545 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   kernel/seccomp.c: In function '__seccomp_filter':
   kernel/seccomp.c:891:46: warning: passing argument 2 of 
'seccomp_do_user_notification' makes integer from pointer without a cast
  seccomp_do_user_notification(this_syscall, match, sd);
 ^
   kernel/seccomp.c:802:13: note: expected 'u32' but argument is of type 
'struct seccomp_filter *'
static void seccomp_do_user_notification(int this_syscall,
^
>> kernel/seccomp.c:891:53: warning: passing argument 3 of 
>> 'seccomp_do_user_notification' from incompatible pointer type
  seccomp_do_user_notification(this_syscall, match, sd);
^
   kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but 
argument is of type 'const struct seccomp_data *'
static void seccomp_do_user_notification(int this_syscall,
^
   kernel/seccomp.c:891:3: error: too few arguments to function 
'seccomp_do_user_notification'
  seccomp_do_user_notification(this_syscall, match, sd);
  ^
   kernel/seccomp.c:802:13: note: declared here
static void seccomp_do_user_notification(int this_syscall,
^
   kernel/seccomp.c: In function 'seccomp_set_mode_filter':
>> kernel/seccomp.c:1036:3: error: implicit declaration of function 
>> 'get_unused_fd_flags' [-Werror=implicit-function-declaration]
  listener = get_unused_fd_flags(O_RDWR);
  ^
>> kernel/seccomp.c:1042:3: error: implicit declaration of function 
>> 'init_listener' [-Werror=implicit-function-declaration]
  listener_f = init_listener(current, prepared);
  ^
   kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer 
without a cast
  listener_f = init_listener(current, prepared);
 ^
>> kernel/seccomp.c:1044:4: error: implicit declaration of function 
>> 'put_unused_fd' [-Werror=implicit-function-declaration]
   put_unused_fd(listener);
   ^
>> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput' 
>> [-Werror=implicit-function-declaration]
   fput(listener_f);
   ^
>> kernel/seccomp.c:1080:4: error: implicit declaration of function 
>> 'fd_install' [-Werror=implicit-function-declaration]
   fd_install(listener, listener_f);
   ^
   cc1: some warnings being treated as errors

vim +/get_unused_fd_flags +1036 kernel/seccomp.c

   812  
   813  #ifdef CONFIG_SECCOMP_FILTER
   814  static int __seccomp_filter(int this_syscall, const struct seccomp_data 
*sd,
   815  const bool recheck_after_trace)
   816  {
   817  u32 filter_ret, action;
   818  struct seccomp_filter *match = NULL;
   819  int data;
   820  
   821  /*
   822   * Make sure that any changes to mode from another thread have
   823   * been seen after TIF_SECCOMP was seen.
   824   */
   825  rmb();
   826  
   827  filter_ret = seccomp_run_filters(sd, );
   828  data = filter_ret & SECCOMP_RET_DATA;
   829  action = filter_ret & SECCOMP_RET_ACTION_FULL;
   830  
   831  switch (action) {
   832  case SECCOMP_RET_ERRNO:
   833  /* Set low-order bits as an errno, capped at MAX_ERRNO. 
*/
   834  if (data > MAX_ERRNO)
   835  data = MAX_ERRNO;
   836  syscall_set_return_value(current, task_pt_regs(current),
   837   -data, 0);
   838  goto skip;
   839  
   840  case SECCOMP_RET_TRAP:
   841  /* Show the handler the original registers. */
   842  syscall_rollback(current, task_pt_regs(current));
   843  /* Let the filter pass back 16 bits of data. */
   844  seccomp_send_sigsys(this_syscall, data);
   845  goto skip;
   846  
   847  case SECCOMP_RET_TRACE:
   848  /* We've been put in this state by the ptracer already. 
*/
   849  if (recheck_after_trace)
   850  return 0;
   851  
   852  /* ENOSYS these calls if there is no tracer attached. */
   853  if (!ptrace_event_enabled(current, 
PTRACE_EVENT_SECCOMP)) {
   854

Re: [PATCH v2 1/4] seccomp: add a return code to trap to userspace

2018-05-18 Thread kbuild test robot

Hi Tycho,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17-rc5]
[cannot apply to next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-trap-to-userspace/20180519-071527
config: i386-randconfig-a1-05181545 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   kernel/seccomp.c: In function '__seccomp_filter':
   kernel/seccomp.c:891:46: warning: passing argument 2 of 
'seccomp_do_user_notification' makes integer from pointer without a cast
  seccomp_do_user_notification(this_syscall, match, sd);
 ^
   kernel/seccomp.c:802:13: note: expected 'u32' but argument is of type 
'struct seccomp_filter *'
static void seccomp_do_user_notification(int this_syscall,
^
>> kernel/seccomp.c:891:53: warning: passing argument 3 of 
>> 'seccomp_do_user_notification' from incompatible pointer type
  seccomp_do_user_notification(this_syscall, match, sd);
^
   kernel/seccomp.c:802:13: note: expected 'struct seccomp_filter *' but 
argument is of type 'const struct seccomp_data *'
static void seccomp_do_user_notification(int this_syscall,
^
   kernel/seccomp.c:891:3: error: too few arguments to function 
'seccomp_do_user_notification'
  seccomp_do_user_notification(this_syscall, match, sd);
  ^
   kernel/seccomp.c:802:13: note: declared here
static void seccomp_do_user_notification(int this_syscall,
^
   kernel/seccomp.c: In function 'seccomp_set_mode_filter':
>> kernel/seccomp.c:1036:3: error: implicit declaration of function 
>> 'get_unused_fd_flags' [-Werror=implicit-function-declaration]
  listener = get_unused_fd_flags(O_RDWR);
  ^
>> kernel/seccomp.c:1042:3: error: implicit declaration of function 
>> 'init_listener' [-Werror=implicit-function-declaration]
  listener_f = init_listener(current, prepared);
  ^
   kernel/seccomp.c:1042:14: warning: assignment makes pointer from integer 
without a cast
  listener_f = init_listener(current, prepared);
 ^
>> kernel/seccomp.c:1044:4: error: implicit declaration of function 
>> 'put_unused_fd' [-Werror=implicit-function-declaration]
   put_unused_fd(listener);
   ^
>> kernel/seccomp.c:1077:4: error: implicit declaration of function 'fput' 
>> [-Werror=implicit-function-declaration]
   fput(listener_f);
   ^
>> kernel/seccomp.c:1080:4: error: implicit declaration of function 
>> 'fd_install' [-Werror=implicit-function-declaration]
   fd_install(listener, listener_f);
   ^
   cc1: some warnings being treated as errors

vim +/get_unused_fd_flags +1036 kernel/seccomp.c

   812  
   813  #ifdef CONFIG_SECCOMP_FILTER
   814  static int __seccomp_filter(int this_syscall, const struct seccomp_data 
*sd,
   815  const bool recheck_after_trace)
   816  {
   817  u32 filter_ret, action;
   818  struct seccomp_filter *match = NULL;
   819  int data;
   820  
   821  /*
   822   * Make sure that any changes to mode from another thread have
   823   * been seen after TIF_SECCOMP was seen.
   824   */
   825  rmb();
   826  
   827  filter_ret = seccomp_run_filters(sd, );
   828  data = filter_ret & SECCOMP_RET_DATA;
   829  action = filter_ret & SECCOMP_RET_ACTION_FULL;
   830  
   831  switch (action) {
   832  case SECCOMP_RET_ERRNO:
   833  /* Set low-order bits as an errno, capped at MAX_ERRNO. 
*/
   834  if (data > MAX_ERRNO)
   835  data = MAX_ERRNO;
   836  syscall_set_return_value(current, task_pt_regs(current),
   837   -data, 0);
   838  goto skip;
   839  
   840  case SECCOMP_RET_TRAP:
   841  /* Show the handler the original registers. */
   842  syscall_rollback(current, task_pt_regs(current));
   843  /* Let the filter pass back 16 bits of data. */
   844  seccomp_send_sigsys(this_syscall, data);
   845  goto skip;
   846  
   847  case SECCOMP_RET_TRACE:
   848  /* We've been put in this state by the ptracer already. 
*/
   849  if (recheck_after_trace)
   850  return 0;
   851  
   852  /* ENOSYS these calls if there is no tracer attached. */
   853  if (!ptrace_event_enabled(current, 
PTRACE_EVENT_SECCOMP)) {
   854

[PATCH v2 1/2] [usb-storage] Add support for FL_ALWAYS_SYNC flag in the UAS driver

2018-05-18 Thread Alexander Kappner

The ALWAYS_SYNC flag is currently honored by the usb-storage driver but not UAS
and is required to work around devices that become unstable upon being
queried for cache. This code is taken straight from:
drivers/usb/storage/scsiglue.c:284

Signed-off-by: Alexander Kappner 
---
 drivers/usb/storage/uas.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 6034c39..9e9de54 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -836,6 +836,12 @@ static int uas_slave_configure(struct scsi_device *sdev)
if (devinfo->flags & US_FL_BROKEN_FUA)
sdev->broken_fua = 1;
 
+   /* UAS also needs to support FL_ALWAYS_SYNC */
+   if (devinfo->flags & US_FL_ALWAYS_SYNC) {
+   sdev->skip_ms_page_3f = 1;
+   sdev->skip_ms_page_8 = 1;
+   sdev->wce_default_on = 1;
+   }
scsi_change_queue_depth(sdev, devinfo->qdepth - 2);
return 0;
 }
-- 
2.1.4

[PATCH v2 2/2] [usb-storage] Add compatibility quirk flags for G-Technologies G-Drive

2018-05-18 Thread Alexander Kappner

The "G-Drive" (sold by G-Technology) external USB 3.0 drive
 hangs on write access under UAS and usb-storage:

[  136.079121] sd 15:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[  136.079144] sd 15:0:0:0: [sdi] tag#0 Sense Key : Illegal Request [current]
[  136.079152] sd 15:0:0:0: [sdi] tag#0 Add. Sense: Invalid field in cdb
[  136.079176] sd 15:0:0:0: [sdi] tag#0 CDB: Write(16) 8a 08 00 00 00 00 00 00 
00 00 00 00 00 08 00 00
[  136.079180] print_req_error: critical target error, dev sdi, sector 0
[  136.079183] Buffer I/O error on dev sdi, logical block 0, lost sync page 
write
[  136.173148] EXT4-fs (sdi): mounted filesystem with ordered data mode. Opts: 
(null)
[  140.583998] sd 15:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[  140.584010] sd 15:0:0:0: [sdi] tag#0 Sense Key : Illegal Request [current]
[  140.584016] sd 15:0:0:0: [sdi] tag#0 Add. Sense: Invalid field in cdb
[  140.584022] sd 15:0:0:0: [sdi] tag#0 CDB: Write(16) 8a 08 00 00 00 00 e8 c4 
00 18 00 00 00 08 00 00
[  140.584025] print_req_error: critical target error, dev sdi, sector 
3905159192
[  140.584044] print_req_error: critical target error, dev sdi, sector 
3905159192
[  140.584052] Aborting journal on device sdi-8.

The proposed patch adds compatibility quirks. Because the drive requires two
quirks (one to work with UAS, and another to work with usb-storage), adding this
under unusual_devs.h and not just unusual_uas.h so kernels compiled without UAS
receive the quirk. With the patch, the drive works reliably on UAS and usb-
storage.
(tested on NEC Corporation uPD720200 USB 3.0 host controller).

Signed-off-by: Alexander Kappner 
---
 drivers/usb/storage/unusual_devs.h | 9 +
 drivers/usb/storage/unusual_uas.h  | 9 +
 2 files changed, 18 insertions(+)

diff --git a/drivers/usb/storage/unusual_devs.h 
b/drivers/usb/storage/unusual_devs.h
index 747d3a9..22fcfcc 100644
--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -2321,6 +2321,15 @@ UNUSUAL_DEV(  0x4146, 0xba01, 0x0100, 0x0100,
"Micro Mini 1GB",
USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_NOT_LOCKABLE ),
 
+/* "G-DRIVE" external HDD hangs on write without these.
+ * Patch submitted by Alexander Kappner 
+ */
+UNUSUAL_DEV(0x4971, 0x8024, 0x, 0x,
+   "SimpleTech",
+   "External HDD",
+   USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+   US_FL_ALWAYS_SYNC),
+
 /*
  * Nick Bowler 
  * SCSI stack spams (otherwise harmless) error messages.
diff --git a/drivers/usb/storage/unusual_uas.h 
b/drivers/usb/storage/unusual_uas.h
index 38434d8..d0bdebd 100644
--- a/drivers/usb/storage/unusual_uas.h
+++ b/drivers/usb/storage/unusual_uas.h
@@ -107,3 +107,12 @@ UNUSUAL_DEV(0x4971, 0x8017, 0x, 0x,
"External HDD",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_REPORT_OPCODES),
+
+/* "G-DRIVE" external HDD hangs on write without these.
+ * Patch submitted by Alexander Kappner 
+ */
+UNUSUAL_DEV(0x4971, 0x8024, 0x, 0x,
+   "SimpleTech",
+   "External HDD",
+   USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+   US_FL_ALWAYS_SYNC),
-- 
2.1.4

[PATCH v2 0/2] Re: usb-storage: Add quirks to make G-Technology "G-Drive" work

2018-05-18 Thread Alexander Kappner

v2 of this patch implements the FL_ALWAYS_SYNC flag in the UAS driver, and then
adds the flag as quirks for the device at issue. This allows the G-Drive to work
under both UAS and usb-storage.

Alexander Kappner (2):
  [usb-storage] Add support for FL_ALWAYS_SYNC flag in the UAS driver
  [usb-storage] Add compatibility quirk flags for G-Technologies G-Drive

 drivers/usb/storage/uas.c  | 6 ++
 drivers/usb/storage/unusual_devs.h | 9 +
 drivers/usb/storage/unusual_uas.h  | 9 +
 3 files changed, 24 insertions(+)

-- 
2.1.4

[PATCH v2 1/2] [usb-storage] Add support for FL_ALWAYS_SYNC flag in the UAS driver

2018-05-18 Thread Alexander Kappner

The ALWAYS_SYNC flag is currently honored by the usb-storage driver but not UAS
and is required to work around devices that become unstable upon being
queried for cache. This code is taken straight from:
drivers/usb/storage/scsiglue.c:284

Signed-off-by: Alexander Kappner 
---
 drivers/usb/storage/uas.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 6034c39..9e9de54 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -836,6 +836,12 @@ static int uas_slave_configure(struct scsi_device *sdev)
if (devinfo->flags & US_FL_BROKEN_FUA)
sdev->broken_fua = 1;
 
+   /* UAS also needs to support FL_ALWAYS_SYNC */
+   if (devinfo->flags & US_FL_ALWAYS_SYNC) {
+   sdev->skip_ms_page_3f = 1;
+   sdev->skip_ms_page_8 = 1;
+   sdev->wce_default_on = 1;
+   }
scsi_change_queue_depth(sdev, devinfo->qdepth - 2);
return 0;
 }
-- 
2.1.4

[PATCH v2 2/2] [usb-storage] Add compatibility quirk flags for G-Technologies G-Drive

2018-05-18 Thread Alexander Kappner

The "G-Drive" (sold by G-Technology) external USB 3.0 drive
 hangs on write access under UAS and usb-storage:

[  136.079121] sd 15:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[  136.079144] sd 15:0:0:0: [sdi] tag#0 Sense Key : Illegal Request [current]
[  136.079152] sd 15:0:0:0: [sdi] tag#0 Add. Sense: Invalid field in cdb
[  136.079176] sd 15:0:0:0: [sdi] tag#0 CDB: Write(16) 8a 08 00 00 00 00 00 00 
00 00 00 00 00 08 00 00
[  136.079180] print_req_error: critical target error, dev sdi, sector 0
[  136.079183] Buffer I/O error on dev sdi, logical block 0, lost sync page 
write
[  136.173148] EXT4-fs (sdi): mounted filesystem with ordered data mode. Opts: 
(null)
[  140.583998] sd 15:0:0:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
[  140.584010] sd 15:0:0:0: [sdi] tag#0 Sense Key : Illegal Request [current]
[  140.584016] sd 15:0:0:0: [sdi] tag#0 Add. Sense: Invalid field in cdb
[  140.584022] sd 15:0:0:0: [sdi] tag#0 CDB: Write(16) 8a 08 00 00 00 00 e8 c4 
00 18 00 00 00 08 00 00
[  140.584025] print_req_error: critical target error, dev sdi, sector 
3905159192
[  140.584044] print_req_error: critical target error, dev sdi, sector 
3905159192
[  140.584052] Aborting journal on device sdi-8.

The proposed patch adds compatibility quirks. Because the drive requires two
quirks (one to work with UAS, and another to work with usb-storage), adding this
under unusual_devs.h and not just unusual_uas.h so kernels compiled without UAS
receive the quirk. With the patch, the drive works reliably on UAS and usb-
storage.
(tested on NEC Corporation uPD720200 USB 3.0 host controller).

Signed-off-by: Alexander Kappner 
---
 drivers/usb/storage/unusual_devs.h | 9 +
 drivers/usb/storage/unusual_uas.h  | 9 +
 2 files changed, 18 insertions(+)

diff --git a/drivers/usb/storage/unusual_devs.h 
b/drivers/usb/storage/unusual_devs.h
index 747d3a9..22fcfcc 100644
--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -2321,6 +2321,15 @@ UNUSUAL_DEV(  0x4146, 0xba01, 0x0100, 0x0100,
"Micro Mini 1GB",
USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_NOT_LOCKABLE ),
 
+/* "G-DRIVE" external HDD hangs on write without these.
+ * Patch submitted by Alexander Kappner 
+ */
+UNUSUAL_DEV(0x4971, 0x8024, 0x, 0x,
+   "SimpleTech",
+   "External HDD",
+   USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+   US_FL_ALWAYS_SYNC),
+
 /*
  * Nick Bowler 
  * SCSI stack spams (otherwise harmless) error messages.
diff --git a/drivers/usb/storage/unusual_uas.h 
b/drivers/usb/storage/unusual_uas.h
index 38434d8..d0bdebd 100644
--- a/drivers/usb/storage/unusual_uas.h
+++ b/drivers/usb/storage/unusual_uas.h
@@ -107,3 +107,12 @@ UNUSUAL_DEV(0x4971, 0x8017, 0x, 0x,
"External HDD",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_NO_REPORT_OPCODES),
+
+/* "G-DRIVE" external HDD hangs on write without these.
+ * Patch submitted by Alexander Kappner 
+ */
+UNUSUAL_DEV(0x4971, 0x8024, 0x, 0x,
+   "SimpleTech",
+   "External HDD",
+   USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+   US_FL_ALWAYS_SYNC),
-- 
2.1.4

[PATCH v2 0/2] Re: usb-storage: Add quirks to make G-Technology "G-Drive" work

2018-05-18 Thread Alexander Kappner

v2 of this patch implements the FL_ALWAYS_SYNC flag in the UAS driver, and then
adds the flag as quirks for the device at issue. This allows the G-Drive to work
under both UAS and usb-storage.

Alexander Kappner (2):
  [usb-storage] Add support for FL_ALWAYS_SYNC flag in the UAS driver
  [usb-storage] Add compatibility quirk flags for G-Technologies G-Drive

 drivers/usb/storage/uas.c  | 6 ++
 drivers/usb/storage/unusual_devs.h | 9 +
 drivers/usb/storage/unusual_uas.h  | 9 +
 3 files changed, 24 insertions(+)

-- 
2.1.4

Re: [PATCH 5/6] x86: refcount: prevent gcc distortions

2018-05-18 Thread kbuild test robot

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.17-rc5 next-20180517]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-objtool-use-asm-macro-for-better-compiler-decisions/20180519-045439
config: x86_64-allyesdebian (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/include/asm/refcount.h: Assembler messages:
>> arch/x86/include/asm/refcount.h:67: Error: too many positional arguments

vim +67 arch/x86/include/asm/refcount.h

63  
64  static __always_inline void refcount_inc(refcount_t *r)
65  {
66  asm volatile(LOCK_PREFIX "incl %0\n\t"
  > 67  "__REFCOUNT_CHECK_LT_ZERO %[counter]"
68  : [counter] "+m" (r->refs.counter)
69  : : "cc", "cx");
70  }
71  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH 5/6] x86: refcount: prevent gcc distortions

2018-05-18 Thread kbuild test robot

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.17-rc5 next-20180517]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-objtool-use-asm-macro-for-better-compiler-decisions/20180519-045439
config: x86_64-allyesdebian (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/include/asm/refcount.h: Assembler messages:
>> arch/x86/include/asm/refcount.h:67: Error: too many positional arguments

vim +67 arch/x86/include/asm/refcount.h

63  
64  static __always_inline void refcount_inc(refcount_t *r)
65  {
66  asm volatile(LOCK_PREFIX "incl %0\n\t"
  > 67  "__REFCOUNT_CHECK_LT_ZERO %[counter]"
68  : [counter] "+m" (r->refs.counter)
69  : : "cc", "cx");
70  }
71  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Linus Torvalds

On Fri, May 18, 2018 at 9:12 PM Linus Torvalds <
torva...@linux-foundation.org> wrote:

> (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
> would only be used as a "fill in inode value for regular files for this
> superblock").

Actually, maybe we could just make rw_verify_area() use the new
file_mmap_size_max() function.

We'd just remove the "mmap" part of the name, and have all IO (and all
mmap) use that limit.

That doesn't sound like a horrible idea to try for 4.18. Hmm?

   Linus

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Linus Torvalds

On Fri, May 18, 2018 at 9:12 PM Linus Torvalds <
torva...@linux-foundation.org> wrote:

> (So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
> would only be used as a "fill in inode value for regular files for this
> superblock").

Actually, maybe we could just make rw_verify_area() use the new
file_mmap_size_max() function.

We'd just remove the "mmap" part of the name, and have all IO (and all
mmap) use that limit.

That doesn't sound like a horrible idea to try for 4.18. Hmm?

   Linus

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Linus Torvalds

On Fri, May 18, 2018 at 8:43 PM Al Viro  wrote:

> Not quite.  The things like
>  if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
>  return 0;
>  iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> protect most of the regular files (see mm/filemap.c).  And for devices
(which is
> where the majority of crap ->read()/->write() is) it's obviously not
applicable -
> ->s_maxbytes of *what*?

Yeah that "s_maxbytes of what" is I think the real issue. We should never
have made s_maxbytes be super-block specific: we should have made it be
per-inode, and then have inode_init_always() initialize it using something
like the file_mmap_size_max() logic.

(So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
would only be used as a "fill in inode value for regular files for this
superblock").

Then we could actually protect read/write properly, because many of the
nasty bugs have been in character device drivers.

Oh well. It would still be a good thing to do some day, I suspect, but it's
clearly not the case now, and so s_maxbytes actually has much less coverage
than I was hoping for.

(And thus also the problems with /proc/vmcore - it never saw s_maxbytes
limits before).

Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously
means that my objections against Vasily's patch are mostly invalid. Even if
/proc does use "generic_file_llseek()" a lot and that should limit things
to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up
the offset.

I'd still prefer to limit the damage to just "vmcore".

Something like the below COMPLETELY UNTESTED patch? Vasily?

 Linus
 fs/proc/vmcore.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af22a60..83278c547127 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -491,7 +491,15 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
 }
 #endif

+/* Mark vmcore as being able and willing to do 64-bit mmaps */
+static int vmcore_open(struct inode *inode, struct file *file)
+{
+	file->f_mode |= FMODE_UNSIGNED_OFFSET;
+	return 0;
+}
+
 static const struct file_operations proc_vmcore_operations = {
+	.open		= vmcore_open,
 	.read		= read_vmcore,
 	.llseek		= default_llseek,
 	.mmap		= mmap_vmcore,

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Linus Torvalds

On Fri, May 18, 2018 at 8:43 PM Al Viro  wrote:

> Not quite.  The things like
>  if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
>  return 0;
>  iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
> protect most of the regular files (see mm/filemap.c).  And for devices
(which is
> where the majority of crap ->read()/->write() is) it's obviously not
applicable -
> ->s_maxbytes of *what*?

Yeah that "s_maxbytes of what" is I think the real issue. We should never
have made s_maxbytes be super-block specific: we should have made it be
per-inode, and then have inode_init_always() initialize it using something
like the file_mmap_size_max() logic.

(So we'd still have a "sb_maxbytes" that filesystems would fill in, but it
would only be used as a "fill in inode value for regular files for this
superblock").

Then we could actually protect read/write properly, because many of the
nasty bugs have been in character device drivers.

Oh well. It would still be a good thing to do some day, I suspect, but it's
clearly not the case now, and so s_maxbytes actually has much less coverage
than I was hoping for.

(And thus also the problems with /proc/vmcore - it never saw s_maxbytes
limits before).

Oh, well. The lack of any meaningful s_maxbytes coverage for proc obviously
means that my objections against Vasily's patch are mostly invalid. Even if
/proc does use "generic_file_llseek()" a lot and that should limit things
to 4G offsets, you can just use pread64/pwrite64 to see if you can screw up
the offset.

I'd still prefer to limit the damage to just "vmcore".

Something like the below COMPLETELY UNTESTED patch? Vasily?

 Linus
 fs/proc/vmcore.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a45f0af22a60..83278c547127 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -491,7 +491,15 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
 }
 #endif

+/* Mark vmcore as being able and willing to do 64-bit mmaps */
+static int vmcore_open(struct inode *inode, struct file *file)
+{
+	file->f_mode |= FMODE_UNSIGNED_OFFSET;
+	return 0;
+}
+
 static const struct file_operations proc_vmcore_operations = {
+	.open		= vmcore_open,
 	.read		= read_vmcore,
 	.llseek		= default_llseek,
 	.mmap		= mmap_vmcore,

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Al Viro

On Fri, May 18, 2018 at 08:33:37PM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2018 at 8:20 PM Linus Torvalds <
> torva...@linux-foundation.org> wrote:
> 
> > I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
> > rather than open up all proc files to issues with 4G+ offsets.
> 
> Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and
> ask you how that ever worked for that file, but it's not there, the
> s_maxbyte checks are only in lseek and in do_splice().
> 
> So apparently we protect against llseek + read/write, but we don't protect
> against pread64/pwrite64 having offset overflows..
> 
> That's crazy. That makes all the s_maxbytes protection much less effective
> than it should be. Filesystems that don't get the 64-bit case right will
> screw up pread64 and friends.
> 
> Al, I'm missing something. Did we always have this gaping hole where we
> didn't actually check s_maxbytes against read/write, only
> generic_file_llseek? Apparently.

Not quite.  The things like
if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
return 0;
iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
protect most of the regular files (see mm/filemap.c).  And for devices (which is
where the majority of crap ->read()/->write() is) it's obviously not applicable 
-
->s_maxbytes of *what*?

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Al Viro

On Fri, May 18, 2018 at 08:33:37PM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2018 at 8:20 PM Linus Torvalds <
> torva...@linux-foundation.org> wrote:
> 
> > I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
> > rather than open up all proc files to issues with 4G+ offsets.
> 
> Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and
> ask you how that ever worked for that file, but it's not there, the
> s_maxbyte checks are only in lseek and in do_splice().
> 
> So apparently we protect against llseek + read/write, but we don't protect
> against pread64/pwrite64 having offset overflows..
> 
> That's crazy. That makes all the s_maxbytes protection much less effective
> than it should be. Filesystems that don't get the 64-bit case right will
> screw up pread64 and friends.
> 
> Al, I'm missing something. Did we always have this gaping hole where we
> didn't actually check s_maxbytes against read/write, only
> generic_file_llseek? Apparently.

Not quite.  The things like
if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
return 0;
iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
protect most of the regular files (see mm/filemap.c).  And for devices (which is
where the majority of crap ->read()/->write() is) it's obviously not applicable 
-
->s_maxbytes of *what*?

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Linus Torvalds

On Fri, May 18, 2018 at 8:20 PM Linus Torvalds <
torva...@linux-foundation.org> wrote:

> I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
> rather than open up all proc files to issues with 4G+ offsets.

Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and
ask you how that ever worked for that file, but it's not there, the
s_maxbyte checks are only in lseek and in do_splice().

So apparently we protect against llseek + read/write, but we don't protect
against pread64/pwrite64 having offset overflows..

That's crazy. That makes all the s_maxbytes protection much less effective
than it should be. Filesystems that don't get the 64-bit case right will
screw up pread64 and friends.

Al, I'm missing something. Did we always have this gaping hole where we
didn't actually check s_maxbytes against read/write, only
generic_file_llseek? Apparently.

   Linus

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Linus Torvalds

On Fri, May 18, 2018 at 8:20 PM Linus Torvalds <
torva...@linux-foundation.org> wrote:

> I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
> rather than open up all proc files to issues with 4G+ offsets.

Hmm. I was going to point to the s_maxbytes check in rw_verify_area() and
ask you how that ever worked for that file, but it's not there, the
s_maxbyte checks are only in lseek and in do_splice().

So apparently we protect against llseek + read/write, but we don't protect
against pread64/pwrite64 having offset overflows..

That's crazy. That makes all the s_maxbytes protection much less effective
than it should be. Filesystems that don't get the 64-bit case right will
screw up pread64 and friends.

Al, I'm missing something. Did we always have this gaping hole where we
didn't actually check s_maxbytes against read/write, only
generic_file_llseek? Apparently.

   Linus

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Linus Torvalds

On Fri, May 18, 2018 at 8:15 PM Vasily Gorbik  wrote:

> Commit be83bbf80682 ("mmap: introduce sane default mmap limits")
> introduced "pgoff" limits checks for mmap, considering fs max file size
> as upper limit, which made it impossible to mmap /proc/vmcore file
> contents above 2Gb (/proc/vmcore appears to be the only procfs file
> supporting mmap).

> Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value.

Ugh. /proc is where a lot of problems *have* been.

Admittedly not as much as random drivers, but still. If proc doesn't set
s_maxbytes, then we should not raise it here magically, there might be
various /proc files that get offset accounting wrong.

I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
rather than open up all proc files to issues with 4G+ offsets.

Linus

Re: [PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Linus Torvalds

On Fri, May 18, 2018 at 8:15 PM Vasily Gorbik  wrote:

> Commit be83bbf80682 ("mmap: introduce sane default mmap limits")
> introduced "pgoff" limits checks for mmap, considering fs max file size
> as upper limit, which made it impossible to mmap /proc/vmcore file
> contents above 2Gb (/proc/vmcore appears to be the only procfs file
> supporting mmap).

> Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value.

Ugh. /proc is where a lot of problems *have* been.

Admittedly not as much as random drivers, but still. If proc doesn't set
s_maxbytes, then we should not raise it here magically, there might be
various /proc files that get offset accounting wrong.

I'd *much* rather just set FMODE_UNSIGNED_OFFSET for /proc/vmcore _only_,
rather than open up all proc files to issues with 4G+ offsets.

Linus

[PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Vasily Gorbik

Procfs does not set max file size (s_maxbytes) and ends up with default
value of
MAX_NON_LFS  ((1UL<<31) - 1)
Commit be83bbf80682 ("mmap: introduce sane default mmap limits")
introduced "pgoff" limits checks for mmap, considering fs max file size
as upper limit, which made it impossible to mmap /proc/vmcore file
contents above 2Gb (/proc/vmcore appears to be the only procfs file
supporting mmap).

Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value.

Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Signed-off-by: Vasily Gorbik 
---
 fs/proc/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2cf3b74391ca..de1b3b1da883 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -502,6 +502,7 @@ int proc_fill_super(struct super_block *s, void *data, int 
silent)
/* User space would break if executables or devices appear on proc */
s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
+   s->s_maxbytes = MAX_LFS_FILESIZE;
s->s_blocksize = 1024;
s->s_blocksize_bits = 10;
s->s_magic = PROC_SUPER_MAGIC;
-- 
⢀⡵⣤⡴⣅  2.17.0
⠏⢟⡛⣛⠏⠇ space invaders edition

[PATCH] procfs: fix mmap() for /proc/vmcore

2018-05-18 Thread Vasily Gorbik

Procfs does not set max file size (s_maxbytes) and ends up with default
value of
MAX_NON_LFS  ((1UL<<31) - 1)
Commit be83bbf80682 ("mmap: introduce sane default mmap limits")
introduced "pgoff" limits checks for mmap, considering fs max file size
as upper limit, which made it impossible to mmap /proc/vmcore file
contents above 2Gb (/proc/vmcore appears to be the only procfs file
supporting mmap).

Reuse MAX_LFS_FILESIZE as procfs s_maxbytes value.

Fixes: be83bbf80682 ("mmap: introduce sane default mmap limits")
Signed-off-by: Vasily Gorbik 
---
 fs/proc/inode.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 2cf3b74391ca..de1b3b1da883 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -502,6 +502,7 @@ int proc_fill_super(struct super_block *s, void *data, int 
silent)
/* User space would break if executables or devices appear on proc */
s->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC | SB_I_NODEV;
s->s_flags |= SB_NODIRATIME | SB_NOSUID | SB_NOEXEC;
+   s->s_maxbytes = MAX_LFS_FILESIZE;
s->s_blocksize = 1024;
s->s_blocksize_bits = 10;
s->s_magic = PROC_SUPER_MAGIC;
-- 
⢀⡵⣤⡴⣅  2.17.0
⠏⢟⡛⣛⠏⠇ space invaders edition

[v4.17-rc5][bisected] be83bbf80682 breaks /proc/vmcore mmap

2018-05-18 Thread Vasily Gorbik

Greetings,

be83bbf80682 "mmap: introduce sane default mmap limits" introduced a
problem with mapping /proc/vmcore if it is bigger than 2gb. This
breaks s390 kernel zfcpdump. But it should be a general problem.

Please consider the following one-liner fix, if it makes sense.

Vasily Gorbik (1):
  procfs: fix mmap() for /proc/vmcore

 fs/proc/inode.c | 1 +
 1 file changed, 1 insertion(+)

-- 
⢀⡵⣤⡴⣅  2.17.0
⠏⢟⡛⣛⠏⠇ space invaders edition

[v4.17-rc5][bisected] be83bbf80682 breaks /proc/vmcore mmap

2018-05-18 Thread Vasily Gorbik

Greetings,

be83bbf80682 "mmap: introduce sane default mmap limits" introduced a
problem with mapping /proc/vmcore if it is bigger than 2gb. This
breaks s390 kernel zfcpdump. But it should be a general problem.

Please consider the following one-liner fix, if it makes sense.

Vasily Gorbik (1):
  procfs: fix mmap() for /proc/vmcore

 fs/proc/inode.c | 1 +
 1 file changed, 1 insertion(+)

-- 
⢀⡵⣤⡴⣅  2.17.0
⠏⢟⡛⣛⠏⠇ space invaders edition

Re: [PATCH] net: sched: don't disable bh when accessing action idr

2018-05-18 Thread Cong Wang

On Fri, May 18, 2018 at 8:45 AM, Vlad Buslov  wrote:
> Underlying implementation of action map has changed and doesn't require
> disabling bh anymore. Replace all action idr spinlock usage with regular
> calls that do not disable bh.

Please explain explicitly why it is not required, don't let people
dig, this would save everyone's time.

Also, this should be targeted for net-next, right?

Thanks.

Re: [PATCH] net: sched: don't disable bh when accessing action idr

2018-05-18 Thread Cong Wang

On Fri, May 18, 2018 at 8:45 AM, Vlad Buslov  wrote:
> Underlying implementation of action map has changed and doesn't require
> disabling bh anymore. Replace all action idr spinlock usage with regular
> calls that do not disable bh.

Please explain explicitly why it is not required, don't let people
dig, this would save everyone's time.

Also, this should be targeted for net-next, right?

Thanks.

[PATCH 05/12] arm: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/arm/include/asm/hw_breakpoint.h |  6 ++-
 arch/arm/kernel/hw_breakpoint.c  | 71 ++--
 2 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/arch/arm/include/asm/hw_breakpoint.h 
b/arch/arm/include/asm/hw_breakpoint.h
index d5a0f52..451544d 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -111,6 +111,7 @@ static inline void decode_ctrl_reg(u32 reg,
asm volatile("mcr p14, 0, %0, " #N "," #M ", " #OP2 : : "r" (VAL));\
 } while (0)
 
+struct perf_event_attr;
 struct notifier_block;
 struct perf_event;
 struct pmu;
@@ -118,7 +119,10 @@ struct pmu;
 extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
  int *gen_len, int *gen_type);
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index 385dcf4..e80d125 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -517,42 +517,42 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl 
ctrl,
 /*
  * Construct an arch_hw_breakpoint from a perf_event.
  */
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_X:
-   info->ctrl.type = ARM_BREAKPOINT_EXECUTE;
+   hw->ctrl.type = ARM_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
-   info->ctrl.type = ARM_BREAKPOINT_LOAD;
+   hw->ctrl.type = ARM_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
-   info->ctrl.type = ARM_BREAKPOINT_STORE;
+   hw->ctrl.type = ARM_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
-   info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
+   hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
 
/* Len */
-   switch (bp->attr.bp_len) {
+   switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_1;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_2;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_4:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_8:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_8;
-   if ((info->ctrl.type != ARM_BREAKPOINT_EXECUTE)
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_8;
+   if ((hw->ctrl.type != ARM_BREAKPOINT_EXECUTE)
&& max_watchpoint_len >= 8)
break;
default:
@@ -565,24 +565,24 @@ static int arch_build_bp_info(struct perf_event *bp)
 * by the hardware and must be aligned to the appropriate number of
 * bytes.
 */
-   if

[PATCH 05/12] arm: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/arm/include/asm/hw_breakpoint.h |  6 ++-
 arch/arm/kernel/hw_breakpoint.c  | 71 ++--
 2 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/arch/arm/include/asm/hw_breakpoint.h 
b/arch/arm/include/asm/hw_breakpoint.h
index d5a0f52..451544d 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -111,6 +111,7 @@ static inline void decode_ctrl_reg(u32 reg,
asm volatile("mcr p14, 0, %0, " #N "," #M ", " #OP2 : : "r" (VAL));\
 } while (0)
 
+struct perf_event_attr;
 struct notifier_block;
 struct perf_event;
 struct pmu;
@@ -118,7 +119,10 @@ struct pmu;
 extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
  int *gen_len, int *gen_type);
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index 385dcf4..e80d125 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -517,42 +517,42 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl 
ctrl,
 /*
  * Construct an arch_hw_breakpoint from a perf_event.
  */
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_X:
-   info->ctrl.type = ARM_BREAKPOINT_EXECUTE;
+   hw->ctrl.type = ARM_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
-   info->ctrl.type = ARM_BREAKPOINT_LOAD;
+   hw->ctrl.type = ARM_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
-   info->ctrl.type = ARM_BREAKPOINT_STORE;
+   hw->ctrl.type = ARM_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
-   info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
+   hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
 
/* Len */
-   switch (bp->attr.bp_len) {
+   switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_1;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_2;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_4:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_8:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_8;
-   if ((info->ctrl.type != ARM_BREAKPOINT_EXECUTE)
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_8;
+   if ((hw->ctrl.type != ARM_BREAKPOINT_EXECUTE)
&& max_watchpoint_len >= 8)
break;
default:
@@ -565,24 +565,24 @@ static int arch_build_bp_info(struct perf_event *bp)
 * by the hardware and must be aligned to the appropriate number of
 * bytes.
 */
-   if (info->ctrl.type == ARM_BREAKPOINT_EXECUTE &&
-   info->ctrl.len != ARM_BREAKPOINT_LEN_2 &&
-   info->ctrl.len != ARM_BREAKPOINT_LEN_4)
+   if (hw->ctrl.type == ARM_BREAKPOINT_EXECUTE &&
+   hw->ctrl.len != ARM_BREAKPOINT_LEN_2 &&
+   hw->ctrl.len != ARM_BREAKPOINT_LEN_4)
return -EINVAL;
 
/* Address */
-   info->address = bp->attr.bp_addr;
+   hw->address = attr->bp_addr;
 
/*

[PATCH 06/12] arm64: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/arm64/include/asm/hw_breakpoint.h |  6 ++-
 arch/arm64/kernel/hw_breakpoint.c  | 79 +-
 2 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
b/arch/arm64/include/asm/hw_breakpoint.h
index 9f4a3d4..3b43319 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -119,13 +119,17 @@ static inline void decode_ctrl_reg(u32 reg,
 
 struct task_struct;
 struct notifier_block;
+struct perf_event_attr;
 struct perf_event;
 struct pmu;
 
 extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
  int *gen_len, int *gen_type, int *offset);
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 92c6973..968c3af 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -421,53 +421,53 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl 
ctrl,
 /*
  * Construct an arch_hw_breakpoint from a perf_event.
  */
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_X:
-   info->ctrl.type = ARM_BREAKPOINT_EXECUTE;
+   hw->ctrl.type = ARM_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
-   info->ctrl.type = ARM_BREAKPOINT_LOAD;
+   hw->ctrl.type = ARM_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
-   info->ctrl.type = ARM_BREAKPOINT_STORE;
+   hw->ctrl.type = ARM_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
-   info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
+   hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
 
/* Len */
-   switch (bp->attr.bp_len) {
+   switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_1;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_2;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_3:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_3;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_3;
break;
case HW_BREAKPOINT_LEN_4:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_5:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_5;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_5;
break;
case HW_BREAKPOINT_LEN_6:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_6;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_6;
break;
case HW_BREAKPOINT_LEN_7:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_7;
+   hw->ctrl.len =

[PATCH 09/12] xtensa: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/xtensa/include/asm/hw_breakpoint.h |  6 +-
 arch/xtensa/kernel/hw_breakpoint.c  | 33 -
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/xtensa/include/asm/hw_breakpoint.h 
b/arch/xtensa/include/asm/hw_breakpoint.h
index 2525bf6..58632d3 100644
--- a/arch/xtensa/include/asm/hw_breakpoint.h
+++ b/arch/xtensa/include/asm/hw_breakpoint.h
@@ -30,13 +30,17 @@ struct arch_hw_breakpoint {
u16 type;
 };
 
+struct perf_event_attr;
 struct perf_event;
 struct pt_regs;
 struct task_struct;
 
 int hw_breakpoint_slots(int type);
 int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-int arch_validate_hwbkpt_settings(struct perf_event *bp);
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
 
diff --git a/arch/xtensa/kernel/hw_breakpoint.c 
b/arch/xtensa/kernel/hw_breakpoint.c
index 6e34c38..a0def44 100644
--- a/arch/xtensa/kernel/hw_breakpoint.c
+++ b/arch/xtensa/kernel/hw_breakpoint.c
@@ -47,50 +47,41 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint 
*hw)
 /*
  * Construct an arch_hw_breakpoint from a perf_event.
  */
-static int arch_build_bp_info(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_X:
-   info->type = XTENSA_BREAKPOINT_EXECUTE;
+   hw->type = XTENSA_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
-   info->type = XTENSA_BREAKPOINT_LOAD;
+   hw->type = XTENSA_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
-   info->type = XTENSA_BREAKPOINT_STORE;
+   hw->type = XTENSA_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
-   info->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE;
+   hw->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
 
/* Len */
-   info->len = bp->attr.bp_len;
-   if (info->len < 1 || info->len > 64 || !is_power_of_2(info->len))
+   hw->len = attr->bp_len;
+   if (hw->len < 1 || hw->len > 64 || !is_power_of_2(hw->len))
return -EINVAL;
 
/* Address */
-   info->address = bp->attr.bp_addr;
-   if (info->address & (info->len - 1))
+   hw->address = attr->bp_addr;
+   if (hw->address & (hw->len - 1))
return -EINVAL;
 
return 0;
 }
 
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
-{
-   int ret;
-
-   /* Build the arch_hw_breakpoint. */
-   ret = arch_build_bp_info(bp);
-   return ret;
-}
-
 int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data)
 {
-- 
2.7.4

[PATCH 06/12] arm64: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/arm64/include/asm/hw_breakpoint.h |  6 ++-
 arch/arm64/kernel/hw_breakpoint.c  | 79 +-
 2 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
b/arch/arm64/include/asm/hw_breakpoint.h
index 9f4a3d4..3b43319 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -119,13 +119,17 @@ static inline void decode_ctrl_reg(u32 reg,
 
 struct task_struct;
 struct notifier_block;
+struct perf_event_attr;
 struct perf_event;
 struct pmu;
 
 extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
  int *gen_len, int *gen_type, int *offset);
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 92c6973..968c3af 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -421,53 +421,53 @@ int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl 
ctrl,
 /*
  * Construct an arch_hw_breakpoint from a perf_event.
  */
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_X:
-   info->ctrl.type = ARM_BREAKPOINT_EXECUTE;
+   hw->ctrl.type = ARM_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
-   info->ctrl.type = ARM_BREAKPOINT_LOAD;
+   hw->ctrl.type = ARM_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
-   info->ctrl.type = ARM_BREAKPOINT_STORE;
+   hw->ctrl.type = ARM_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
-   info->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
+   hw->ctrl.type = ARM_BREAKPOINT_LOAD | ARM_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
 
/* Len */
-   switch (bp->attr.bp_len) {
+   switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_1;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_2;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_3:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_3;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_3;
break;
case HW_BREAKPOINT_LEN_4:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_4;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_5:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_5;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_5;
break;
case HW_BREAKPOINT_LEN_6:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_6;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_6;
break;
case HW_BREAKPOINT_LEN_7:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_7;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_7;
break;
case HW_BREAKPOINT_LEN_8:
-   info->ctrl.len = ARM_BREAKPOINT_LEN_8;
+   hw->ctrl.len = ARM_BREAKPOINT_LEN_8;
break;
default:
return -EINVAL;
@@ -478,37 +478,37 @@ static int arch_build_bp_info(struct perf_event *bp)
 * AArch32 also requires breakpoints of length 2 for Thumb.
 * Watchpoints can be of length 1, 2, 4 or 8 bytes.

[PATCH 09/12] xtensa: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/xtensa/include/asm/hw_breakpoint.h |  6 +-
 arch/xtensa/kernel/hw_breakpoint.c  | 33 -
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/xtensa/include/asm/hw_breakpoint.h 
b/arch/xtensa/include/asm/hw_breakpoint.h
index 2525bf6..58632d3 100644
--- a/arch/xtensa/include/asm/hw_breakpoint.h
+++ b/arch/xtensa/include/asm/hw_breakpoint.h
@@ -30,13 +30,17 @@ struct arch_hw_breakpoint {
u16 type;
 };
 
+struct perf_event_attr;
 struct perf_event;
 struct pt_regs;
 struct task_struct;
 
 int hw_breakpoint_slots(int type);
 int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-int arch_validate_hwbkpt_settings(struct perf_event *bp);
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
 
diff --git a/arch/xtensa/kernel/hw_breakpoint.c 
b/arch/xtensa/kernel/hw_breakpoint.c
index 6e34c38..a0def44 100644
--- a/arch/xtensa/kernel/hw_breakpoint.c
+++ b/arch/xtensa/kernel/hw_breakpoint.c
@@ -47,50 +47,41 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint 
*hw)
 /*
  * Construct an arch_hw_breakpoint from a perf_event.
  */
-static int arch_build_bp_info(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_X:
-   info->type = XTENSA_BREAKPOINT_EXECUTE;
+   hw->type = XTENSA_BREAKPOINT_EXECUTE;
break;
case HW_BREAKPOINT_R:
-   info->type = XTENSA_BREAKPOINT_LOAD;
+   hw->type = XTENSA_BREAKPOINT_LOAD;
break;
case HW_BREAKPOINT_W:
-   info->type = XTENSA_BREAKPOINT_STORE;
+   hw->type = XTENSA_BREAKPOINT_STORE;
break;
case HW_BREAKPOINT_RW:
-   info->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE;
+   hw->type = XTENSA_BREAKPOINT_LOAD | XTENSA_BREAKPOINT_STORE;
break;
default:
return -EINVAL;
}
 
/* Len */
-   info->len = bp->attr.bp_len;
-   if (info->len < 1 || info->len > 64 || !is_power_of_2(info->len))
+   hw->len = attr->bp_len;
+   if (hw->len < 1 || hw->len > 64 || !is_power_of_2(hw->len))
return -EINVAL;
 
/* Address */
-   info->address = bp->attr.bp_addr;
-   if (info->address & (info->len - 1))
+   hw->address = attr->bp_addr;
+   if (hw->address & (hw->len - 1))
return -EINVAL;
 
return 0;
 }
 
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
-{
-   int ret;
-
-   /* Build the arch_hw_breakpoint. */
-   ret = arch_build_bp_info(bp);
-   return ret;
-}
-
 int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data)
 {
-- 
2.7.4

[PATCH 12/12] perf/breakpoint: Clean up and consolidate modify_user_hw_breakpoint_check()

2018-05-18 Thread Frederic Weisbecker

Remove the dance around old and new attributes. Just don't modify the
previous breakpoint at all until we have verified everything.

Reported-by: Linus Torvalds 
Original-patch-by: Andy Lutomirski 
Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 kernel/events/hw_breakpoint.c | 46 ---
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 7840746..1af761a 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -461,37 +461,43 @@ register_user_hw_breakpoint(struct perf_event_attr *attr,
 }
 EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
 
+static void hw_breakpoint_copy_attr(struct perf_event_attr *to,
+   struct perf_event_attr *from)
+{
+   to->bp_addr = from->bp_addr;
+   to->bp_type = from->bp_type;
+   to->bp_len  = from->bp_len;
+   to->disabled = from->disabled;
+}
+
 int
 modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr 
*attr,
bool check)
 {
-   u64 old_addr = bp->attr.bp_addr;
-   u64 old_len  = bp->attr.bp_len;
-   int old_type = bp->attr.bp_type;
-   bool modify  = attr->bp_type != old_type;
struct arch_hw_breakpoint hw;
-   int err = 0;
-
-   bp->attr.bp_addr = attr->bp_addr;
-   bp->attr.bp_type = attr->bp_type;
-   bp->attr.bp_len  = attr->bp_len;
-
-   if (check && memcmp(>attr, attr, sizeof(*attr)))
-   return -EINVAL;
+   int err;
 
err = hw_breakpoint_parse(bp, attr, );
-   if (!err && modify)
-   err = modify_bp_slot(bp, old_type, bp->attr.bp_type);
-
-   if (err) {
-   bp->attr.bp_addr = old_addr;
-   bp->attr.bp_type = old_type;
-   bp->attr.bp_len  = old_len;
+   if (err)
return err;
+
+   if (check) {
+   struct perf_event_attr old_attr;
+
+   old_attr = bp->attr;
+   hw_breakpoint_copy_attr(_attr, attr);
+   if (memcmp(_attr, attr, sizeof(*attr)))
+   return -EINVAL;
+   }
+
+   if (bp->attr.bp_type != attr->bp_type) {
+   err = modify_bp_slot(bp, bp->attr.bp_type, attr->bp_type);
+   if (err)
+   return err;
}
 
+   hw_breakpoint_copy_attr(>attr, attr);
bp->hw.info = hw;
-   bp->attr.disabled = attr->disabled;
 
return 0;
 }
-- 
2.7.4

[PATCH 10/12] perf/breakpoint: Remove default hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

All architectures have implemented it, we can now remove the poor weak
version.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/arm/include/asm/hw_breakpoint.h |  1 -
 arch/arm64/include/asm/hw_breakpoint.h   |  1 -
 arch/powerpc/include/asm/hw_breakpoint.h |  1 -
 arch/sh/include/asm/hw_breakpoint.h  |  1 -
 arch/x86/include/asm/hw_breakpoint.h |  1 -
 arch/xtensa/include/asm/hw_breakpoint.h  |  1 -
 kernel/events/hw_breakpoint.c| 17 -
 7 files changed, 23 deletions(-)

diff --git a/arch/arm/include/asm/hw_breakpoint.h 
b/arch/arm/include/asm/hw_breakpoint.h
index 451544d..3533fbe 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -122,7 +122,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
b/arch/arm64/include/asm/hw_breakpoint.h
index 3b43319..d1a9c73 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -129,7 +129,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 8f647e6..6fe 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -65,7 +65,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
 int arch_install_hw_breakpoint(struct perf_event *bp);
diff --git a/arch/sh/include/asm/hw_breakpoint.h 
b/arch/sh/include/asm/hw_breakpoint.h
index ed421dc..4910990 100644
--- a/arch/sh/include/asm/hw_breakpoint.h
+++ b/arch/sh/include/asm/hw_breakpoint.h
@@ -58,7 +58,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/x86/include/asm/hw_breakpoint.h 
b/arch/x86/include/asm/hw_breakpoint.h
index 8cc161f..525f8c7 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -57,7 +57,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git

[PATCH 08/12] sh: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/sh/include/asm/hw_breakpoint.h |  6 +-
 arch/sh/kernel/hw_breakpoint.c  | 37 +++--
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/arch/sh/include/asm/hw_breakpoint.h 
b/arch/sh/include/asm/hw_breakpoint.h
index dae622d..ed421dc 100644
--- a/arch/sh/include/asm/hw_breakpoint.h
+++ b/arch/sh/include/asm/hw_breakpoint.h
@@ -40,6 +40,7 @@ struct sh_ubc {
struct clk  *clk;   /* optional interface clock / MSTP bit */
 };
 
+struct perf_event_attr;
 struct perf_event;
 struct task_struct;
 struct pmu;
@@ -54,7 +55,10 @@ static inline int hw_breakpoint_slots(int type)
 
 /* arch/sh/kernel/hw_breakpoint.c */
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c
index 3594773..78a87b1 100644
--- a/arch/sh/kernel/hw_breakpoint.c
+++ b/arch/sh/kernel/hw_breakpoint.c
@@ -173,40 +173,40 @@ int arch_bp_generic_fields(int sh_len, int sh_type,
return 0;
 }
 
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
-   info->address = bp->attr.bp_addr;
+   hw->address = attr->bp_addr;
 
/* Len */
-   switch (bp->attr.bp_len) {
+   switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
-   info->len = SH_BREAKPOINT_LEN_1;
+   hw->len = SH_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
-   info->len = SH_BREAKPOINT_LEN_2;
+   hw->len = SH_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_4:
-   info->len = SH_BREAKPOINT_LEN_4;
+   hw->len = SH_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_8:
-   info->len = SH_BREAKPOINT_LEN_8;
+   hw->len = SH_BREAKPOINT_LEN_8;
break;
default:
return -EINVAL;
}
 
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_R:
-   info->type = SH_BREAKPOINT_READ;
+   hw->type = SH_BREAKPOINT_READ;
break;
case HW_BREAKPOINT_W:
-   info->type = SH_BREAKPOINT_WRITE;
+   hw->type = SH_BREAKPOINT_WRITE;
break;
case HW_BREAKPOINT_W | HW_BREAKPOINT_R:
-   info->type = SH_BREAKPOINT_RW;
+   hw->type = SH_BREAKPOINT_RW;
break;
default:
return -EINVAL;
@@ -218,19 +218,20 @@ static int arch_build_bp_info(struct perf_event *bp)
 /*
  * Validate the arch-specific HW Breakpoint register settings
  */
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
unsigned int align;
int ret;
 
-   ret = arch_build_bp_info(bp);
+   ret = arch_build_bp_info(bp, attr, hw);
if (ret)
return ret;
 
ret = -EINVAL;
 
-   switch (info->len) {
+   switch (hw->len) {
case

[PATCH 12/12] perf/breakpoint: Clean up and consolidate modify_user_hw_breakpoint_check()

2018-05-18 Thread Frederic Weisbecker

Remove the dance around old and new attributes. Just don't modify the
previous breakpoint at all until we have verified everything.

Reported-by: Linus Torvalds 
Original-patch-by: Andy Lutomirski 
Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 kernel/events/hw_breakpoint.c | 46 ---
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 7840746..1af761a 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -461,37 +461,43 @@ register_user_hw_breakpoint(struct perf_event_attr *attr,
 }
 EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
 
+static void hw_breakpoint_copy_attr(struct perf_event_attr *to,
+   struct perf_event_attr *from)
+{
+   to->bp_addr = from->bp_addr;
+   to->bp_type = from->bp_type;
+   to->bp_len  = from->bp_len;
+   to->disabled = from->disabled;
+}
+
 int
 modify_user_hw_breakpoint_check(struct perf_event *bp, struct perf_event_attr 
*attr,
bool check)
 {
-   u64 old_addr = bp->attr.bp_addr;
-   u64 old_len  = bp->attr.bp_len;
-   int old_type = bp->attr.bp_type;
-   bool modify  = attr->bp_type != old_type;
struct arch_hw_breakpoint hw;
-   int err = 0;
-
-   bp->attr.bp_addr = attr->bp_addr;
-   bp->attr.bp_type = attr->bp_type;
-   bp->attr.bp_len  = attr->bp_len;
-
-   if (check && memcmp(>attr, attr, sizeof(*attr)))
-   return -EINVAL;
+   int err;
 
err = hw_breakpoint_parse(bp, attr, );
-   if (!err && modify)
-   err = modify_bp_slot(bp, old_type, bp->attr.bp_type);
-
-   if (err) {
-   bp->attr.bp_addr = old_addr;
-   bp->attr.bp_type = old_type;
-   bp->attr.bp_len  = old_len;
+   if (err)
return err;
+
+   if (check) {
+   struct perf_event_attr old_attr;
+
+   old_attr = bp->attr;
+   hw_breakpoint_copy_attr(_attr, attr);
+   if (memcmp(_attr, attr, sizeof(*attr)))
+   return -EINVAL;
+   }
+
+   if (bp->attr.bp_type != attr->bp_type) {
+   err = modify_bp_slot(bp, bp->attr.bp_type, attr->bp_type);
+   if (err)
+   return err;
}
 
+   hw_breakpoint_copy_attr(>attr, attr);
bp->hw.info = hw;
-   bp->attr.disabled = attr->disabled;
 
return 0;
 }
-- 
2.7.4

[PATCH 10/12] perf/breakpoint: Remove default hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

All architectures have implemented it, we can now remove the poor weak
version.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/arm/include/asm/hw_breakpoint.h |  1 -
 arch/arm64/include/asm/hw_breakpoint.h   |  1 -
 arch/powerpc/include/asm/hw_breakpoint.h |  1 -
 arch/sh/include/asm/hw_breakpoint.h  |  1 -
 arch/x86/include/asm/hw_breakpoint.h |  1 -
 arch/xtensa/include/asm/hw_breakpoint.h  |  1 -
 kernel/events/hw_breakpoint.c| 17 -
 7 files changed, 23 deletions(-)

diff --git a/arch/arm/include/asm/hw_breakpoint.h 
b/arch/arm/include/asm/hw_breakpoint.h
index 451544d..3533fbe 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -122,7 +122,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
b/arch/arm64/include/asm/hw_breakpoint.h
index 3b43319..d1a9c73 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -129,7 +129,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 8f647e6..6fe 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -65,7 +65,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
 int arch_install_hw_breakpoint(struct perf_event *bp);
diff --git a/arch/sh/include/asm/hw_breakpoint.h 
b/arch/sh/include/asm/hw_breakpoint.h
index ed421dc..4910990 100644
--- a/arch/sh/include/asm/hw_breakpoint.h
+++ b/arch/sh/include/asm/hw_breakpoint.h
@@ -58,7 +58,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/x86/include/asm/hw_breakpoint.h 
b/arch/x86/include/asm/hw_breakpoint.h
index 8cc161f..525f8c7 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -57,7 +57,6 @@ extern int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw);
 extern int hw_breakpoint_arch_parse(struct perf_event *bp,
struct perf_event_attr *attr,
struct arch_hw_breakpoint *hw);
-#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/xtensa/include/asm/hw_breakpoint.h 
b/arch/xtensa/include/asm/hw_breakpoint.h
index 58632d3..f3b6036 100644
--- a/arch/xtensa/include/asm/hw_breakpoint.h
+++ b/arch/xtensa/include/asm/hw_breakpoint.h
@@ -40,7 +40,6 @@ int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint 
*hw);
 int hw_breakpoint_arch_parse(struct perf_event *bp,
 struct perf_event_attr *attr,
 struct arch_hw_breakpoint *hw);
-#define

[PATCH 08/12] sh: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/sh/include/asm/hw_breakpoint.h |  6 +-
 arch/sh/kernel/hw_breakpoint.c  | 37 +++--
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/arch/sh/include/asm/hw_breakpoint.h 
b/arch/sh/include/asm/hw_breakpoint.h
index dae622d..ed421dc 100644
--- a/arch/sh/include/asm/hw_breakpoint.h
+++ b/arch/sh/include/asm/hw_breakpoint.h
@@ -40,6 +40,7 @@ struct sh_ubc {
struct clk  *clk;   /* optional interface clock / MSTP bit */
 };
 
+struct perf_event_attr;
 struct perf_event;
 struct task_struct;
 struct pmu;
@@ -54,7 +55,10 @@ static inline int hw_breakpoint_slots(int type)
 
 /* arch/sh/kernel/hw_breakpoint.c */
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c
index 3594773..78a87b1 100644
--- a/arch/sh/kernel/hw_breakpoint.c
+++ b/arch/sh/kernel/hw_breakpoint.c
@@ -173,40 +173,40 @@ int arch_bp_generic_fields(int sh_len, int sh_type,
return 0;
 }
 
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
-   info->address = bp->attr.bp_addr;
+   hw->address = attr->bp_addr;
 
/* Len */
-   switch (bp->attr.bp_len) {
+   switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
-   info->len = SH_BREAKPOINT_LEN_1;
+   hw->len = SH_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
-   info->len = SH_BREAKPOINT_LEN_2;
+   hw->len = SH_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_4:
-   info->len = SH_BREAKPOINT_LEN_4;
+   hw->len = SH_BREAKPOINT_LEN_4;
break;
case HW_BREAKPOINT_LEN_8:
-   info->len = SH_BREAKPOINT_LEN_8;
+   hw->len = SH_BREAKPOINT_LEN_8;
break;
default:
return -EINVAL;
}
 
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_R:
-   info->type = SH_BREAKPOINT_READ;
+   hw->type = SH_BREAKPOINT_READ;
break;
case HW_BREAKPOINT_W:
-   info->type = SH_BREAKPOINT_WRITE;
+   hw->type = SH_BREAKPOINT_WRITE;
break;
case HW_BREAKPOINT_W | HW_BREAKPOINT_R:
-   info->type = SH_BREAKPOINT_RW;
+   hw->type = SH_BREAKPOINT_RW;
break;
default:
return -EINVAL;
@@ -218,19 +218,20 @@ static int arch_build_bp_info(struct perf_event *bp)
 /*
  * Validate the arch-specific HW Breakpoint register settings
  */
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
unsigned int align;
int ret;
 
-   ret = arch_build_bp_info(bp);
+   ret = arch_build_bp_info(bp, attr, hw);
if (ret)
return ret;
 
ret = -EINVAL;
 
-   switch (info->len) {
+   switch (hw->len) {
case SH_BREAKPOINT_LEN_1:
align = 0;
break;
@@ -251,7 +252,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
 * Check that the low-order bits of the address are appropriate
 * for the alignment implied by len.
 */
-   if (info->address & align)
+   if (hw->address & align)
return -EINVAL;
 
return 0;
-- 
2.7.4

[PATCH 11/12] perf/breakpoint: Pass new breakpoint type to modify_breakpoint_slot()

2018-05-18 Thread Frederic Weisbecker

We soon won't be able to rely on bp->attr anymore to get the new
type of the modifying breakpoint because the new attributes are going
to be copied only once we successfully modified the breakpoint slot.

This will fix the current misdesigned layout where the new attr are
copied to the modifying breakpoint before we actually know if the
modification will be validated.

In order to prepare for that, allow modify_breakpoint_slot() to take
the new breakpoint type.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 kernel/events/hw_breakpoint.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index d7ed438..7840746 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -345,13 +345,13 @@ void release_bp_slot(struct perf_event *bp)
mutex_unlock(_bp_mutex);
 }
 
-static int __modify_bp_slot(struct perf_event *bp, u64 old_type)
+static int __modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
 {
int err;
 
__release_bp_slot(bp, old_type);
 
-   err = __reserve_bp_slot(bp, bp->attr.bp_type);
+   err = __reserve_bp_slot(bp, new_type);
if (err) {
/*
 * Reserve the old_type slot back in case
@@ -367,12 +367,12 @@ static int __modify_bp_slot(struct perf_event *bp, u64 
old_type)
return err;
 }
 
-static int modify_bp_slot(struct perf_event *bp, u64 old_type)
+static int modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
 {
int ret;
 
mutex_lock(_bp_mutex);
-   ret = __modify_bp_slot(bp, old_type);
+   ret = __modify_bp_slot(bp, old_type, new_type);
mutex_unlock(_bp_mutex);
return ret;
 }
@@ -481,7 +481,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, 
struct perf_event_attr *a
 
err = hw_breakpoint_parse(bp, attr, );
if (!err && modify)
-   err = modify_bp_slot(bp, old_type);
+   err = modify_bp_slot(bp, old_type, bp->attr.bp_type);
 
if (err) {
bp->attr.bp_addr = old_addr;
-- 
2.7.4

[PATCH 11/12] perf/breakpoint: Pass new breakpoint type to modify_breakpoint_slot()

2018-05-18 Thread Frederic Weisbecker

We soon won't be able to rely on bp->attr anymore to get the new
type of the modifying breakpoint because the new attributes are going
to be copied only once we successfully modified the breakpoint slot.

This will fix the current misdesigned layout where the new attr are
copied to the modifying breakpoint before we actually know if the
modification will be validated.

In order to prepare for that, allow modify_breakpoint_slot() to take
the new breakpoint type.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 kernel/events/hw_breakpoint.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index d7ed438..7840746 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -345,13 +345,13 @@ void release_bp_slot(struct perf_event *bp)
mutex_unlock(_bp_mutex);
 }
 
-static int __modify_bp_slot(struct perf_event *bp, u64 old_type)
+static int __modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
 {
int err;
 
__release_bp_slot(bp, old_type);
 
-   err = __reserve_bp_slot(bp, bp->attr.bp_type);
+   err = __reserve_bp_slot(bp, new_type);
if (err) {
/*
 * Reserve the old_type slot back in case
@@ -367,12 +367,12 @@ static int __modify_bp_slot(struct perf_event *bp, u64 
old_type)
return err;
 }
 
-static int modify_bp_slot(struct perf_event *bp, u64 old_type)
+static int modify_bp_slot(struct perf_event *bp, u64 old_type, u64 new_type)
 {
int ret;
 
mutex_lock(_bp_mutex);
-   ret = __modify_bp_slot(bp, old_type);
+   ret = __modify_bp_slot(bp, old_type, new_type);
mutex_unlock(_bp_mutex);
return ret;
 }
@@ -481,7 +481,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, 
struct perf_event_attr *a
 
err = hw_breakpoint_parse(bp, attr, );
if (!err && modify)
-   err = modify_bp_slot(bp, old_type);
+   err = modify_bp_slot(bp, old_type, bp->attr.bp_type);
 
if (err) {
bp->attr.bp_addr = old_addr;
-- 
2.7.4

[PATCH 03/12] x86: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit.

Original-patch-by: Andy Lutomirski 
Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/x86/include/asm/hw_breakpoint.h |  6 +++-
 arch/x86/kernel/hw_breakpoint.c  | 60 ++--
 2 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/hw_breakpoint.h 
b/arch/x86/include/asm/hw_breakpoint.h
index 7892459..8cc161f 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -49,11 +49,15 @@ static inline int hw_breakpoint_slots(int type)
return HBP_NUM;
 }
 
+struct perf_event_attr;
 struct perf_event;
 struct pmu;
 
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index c433791..a8bceb0 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -239,19 +239,20 @@ int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw)
return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX);
 }
 
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
-   info->address = bp->attr.bp_addr;
+   hw->address = attr->bp_addr;
+   hw->mask = 0;
 
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_W:
-   info->type = X86_BREAKPOINT_WRITE;
+   hw->type = X86_BREAKPOINT_WRITE;
break;
case HW_BREAKPOINT_W | HW_BREAKPOINT_R:
-   info->type = X86_BREAKPOINT_RW;
+   hw->type = X86_BREAKPOINT_RW;
break;
case HW_BREAKPOINT_X:
/*
@@ -259,23 +260,23 @@ static int arch_build_bp_info(struct perf_event *bp)
 * acceptable for kprobes.  On non-kprobes kernels, we don't
 * allow kernel breakpoints at all.
 */
-   if (bp->attr.bp_addr >= TASK_SIZE_MAX) {
+   if (attr->bp_addr >= TASK_SIZE_MAX) {
 #ifdef CONFIG_KPROBES
-   if (within_kprobe_blacklist(bp->attr.bp_addr))
+   if (within_kprobe_blacklist(attr->bp_addr))
return -EINVAL;
 #else
return -EINVAL;
 #endif
}
 
-   info->type = X86_BREAKPOINT_EXECUTE;
+   hw->type = X86_BREAKPOINT_EXECUTE;
/*
 * x86 inst breakpoints need to have a specific undefined len.
 * But we still need to check userspace is not trying to setup
 * an unsupported length, to get a range breakpoint for example.
 */
-   if (bp->attr.bp_len == sizeof(long)) {
-   info->len = X86_BREAKPOINT_LEN_X;
+   if (attr->bp_len == sizeof(long)) {
+   hw->len = X86_BREAKPOINT_LEN_X;
return 0;
}
default:
@@ -283,28 +284,26 @@ static int arch_build_bp_info(struct perf_event *bp)
}
 
/* Len */
-   info->mask = 0;
-
-   switch (bp->attr.bp_len) {
+   switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
-   info->len = X86_BREAKPOINT_LEN_1;
+   hw->len =

[PATCH 02/12] perf/breakpoint: Pass arch breakpoint struct to arch_check_bp_in_kernelspace()

2018-05-18 Thread Frederic Weisbecker

We can't pass the breakpoint directly on arch_check_bp_in_kernelspace()
anymore because its architecture internal datas (struct arch_hw_breakpoint)
are not yet filled by the time we call the function, and most
implementation need this backend to be up to date. So arrange the
function to take the probing struct instead.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/arm/include/asm/hw_breakpoint.h |  2 +-
 arch/arm/kernel/hw_breakpoint.c  | 11 +++--
 arch/arm64/include/asm/hw_breakpoint.h   |  2 +-
 arch/arm64/kernel/hw_breakpoint.c|  9 ++--
 arch/powerpc/include/asm/hw_breakpoint.h |  2 +-
 arch/powerpc/kernel/hw_breakpoint.c  |  6 +--
 arch/sh/include/asm/hw_breakpoint.h  |  2 +-
 arch/sh/kernel/hw_breakpoint.c   |  9 ++--
 arch/x86/include/asm/hw_breakpoint.h |  2 +-
 arch/x86/kernel/hw_breakpoint.c  | 71 +---
 arch/xtensa/include/asm/hw_breakpoint.h  |  2 +-
 arch/xtensa/kernel/hw_breakpoint.c   |  7 ++--
 kernel/events/hw_breakpoint.c|  2 +-
 13 files changed, 63 insertions(+), 64 deletions(-)

diff --git a/arch/arm/include/asm/hw_breakpoint.h 
b/arch/arm/include/asm/hw_breakpoint.h
index e46e4e7..d5a0f52 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -117,7 +117,7 @@ struct pmu;
 
 extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
  int *gen_len, int *gen_type);
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
 extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index 629e251..385dcf4 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -456,14 +456,13 @@ static int get_hbp_len(u8 hbp_len)
 /*
  * Check whether bp virtual address is in kernel space.
  */
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
 {
unsigned int len;
unsigned long va;
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
 
-   va = info->address;
-   len = get_hbp_len(info->ctrl.len);
+   va = hw->address;
+   len = get_hbp_len(hw->ctrl.len);
 
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
 }
@@ -576,7 +575,7 @@ static int arch_build_bp_info(struct perf_event *bp)
 
/* Privilege */
info->ctrl.privilege = ARM_BREAKPOINT_USER;
-   if (arch_check_bp_in_kernelspace(bp))
+   if (arch_check_bp_in_kernelspace(info))
info->ctrl.privilege |= ARM_BREAKPOINT_PRIV;
 
/* Enabled? */
@@ -640,7 +639,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
return -EINVAL;
 
/* We don't allow mismatch breakpoints in kernel space. */
-   if (arch_check_bp_in_kernelspace(bp))
+   if (arch_check_bp_in_kernelspace(info))
return -EPERM;
 
/*
diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
b/arch/arm64/include/asm/hw_breakpoint.h
index 4177076..9f4a3d4 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -124,7 +124,7 @@ struct pmu;
 
 extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
  int *gen_len, int *gen_type, int *offset);
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
 extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index

[PATCH 03/12] x86: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit.

Original-patch-by: Andy Lutomirski 
Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/x86/include/asm/hw_breakpoint.h |  6 +++-
 arch/x86/kernel/hw_breakpoint.c  | 60 ++--
 2 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/hw_breakpoint.h 
b/arch/x86/include/asm/hw_breakpoint.h
index 7892459..8cc161f 100644
--- a/arch/x86/include/asm/hw_breakpoint.h
+++ b/arch/x86/include/asm/hw_breakpoint.h
@@ -49,11 +49,15 @@ static inline int hw_breakpoint_slots(int type)
return HBP_NUM;
 }
 
+struct perf_event_attr;
 struct perf_event;
 struct pmu;
 
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
 
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index c433791..a8bceb0 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -239,19 +239,20 @@ int arch_check_bp_in_kernelspace(struct 
arch_hw_breakpoint *hw)
return (va >= TASK_SIZE_MAX) || ((va + len - 1) >= TASK_SIZE_MAX);
 }
 
-static int arch_build_bp_info(struct perf_event *bp)
+static int arch_build_bp_info(struct perf_event *bp,
+ struct perf_event_attr *attr,
+ struct arch_hw_breakpoint *hw)
 {
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
-
-   info->address = bp->attr.bp_addr;
+   hw->address = attr->bp_addr;
+   hw->mask = 0;
 
/* Type */
-   switch (bp->attr.bp_type) {
+   switch (attr->bp_type) {
case HW_BREAKPOINT_W:
-   info->type = X86_BREAKPOINT_WRITE;
+   hw->type = X86_BREAKPOINT_WRITE;
break;
case HW_BREAKPOINT_W | HW_BREAKPOINT_R:
-   info->type = X86_BREAKPOINT_RW;
+   hw->type = X86_BREAKPOINT_RW;
break;
case HW_BREAKPOINT_X:
/*
@@ -259,23 +260,23 @@ static int arch_build_bp_info(struct perf_event *bp)
 * acceptable for kprobes.  On non-kprobes kernels, we don't
 * allow kernel breakpoints at all.
 */
-   if (bp->attr.bp_addr >= TASK_SIZE_MAX) {
+   if (attr->bp_addr >= TASK_SIZE_MAX) {
 #ifdef CONFIG_KPROBES
-   if (within_kprobe_blacklist(bp->attr.bp_addr))
+   if (within_kprobe_blacklist(attr->bp_addr))
return -EINVAL;
 #else
return -EINVAL;
 #endif
}
 
-   info->type = X86_BREAKPOINT_EXECUTE;
+   hw->type = X86_BREAKPOINT_EXECUTE;
/*
 * x86 inst breakpoints need to have a specific undefined len.
 * But we still need to check userspace is not trying to setup
 * an unsupported length, to get a range breakpoint for example.
 */
-   if (bp->attr.bp_len == sizeof(long)) {
-   info->len = X86_BREAKPOINT_LEN_X;
+   if (attr->bp_len == sizeof(long)) {
+   hw->len = X86_BREAKPOINT_LEN_X;
return 0;
}
default:
@@ -283,28 +284,26 @@ static int arch_build_bp_info(struct perf_event *bp)
}
 
/* Len */
-   info->mask = 0;
-
-   switch (bp->attr.bp_len) {
+   switch (attr->bp_len) {
case HW_BREAKPOINT_LEN_1:
-   info->len = X86_BREAKPOINT_LEN_1;
+   hw->len = X86_BREAKPOINT_LEN_1;
break;
case HW_BREAKPOINT_LEN_2:
-   info->len = X86_BREAKPOINT_LEN_2;
+   hw->len = X86_BREAKPOINT_LEN_2;
break;
case HW_BREAKPOINT_LEN_4:
-   info->len = X86_BREAKPOINT_LEN_4;
+   hw->len = X86_BREAKPOINT_LEN_4;
break;
 #ifdef CONFIG_X86_64
case HW_BREAKPOINT_LEN_8:
-   info->len = X86_BREAKPOINT_LEN_8;
+   hw->len =

[PATCH 02/12] perf/breakpoint: Pass arch breakpoint struct to arch_check_bp_in_kernelspace()

2018-05-18 Thread Frederic Weisbecker

We can't pass the breakpoint directly on arch_check_bp_in_kernelspace()
anymore because its architecture internal datas (struct arch_hw_breakpoint)
are not yet filled by the time we call the function, and most
implementation need this backend to be up to date. So arrange the
function to take the probing struct instead.

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/arm/include/asm/hw_breakpoint.h |  2 +-
 arch/arm/kernel/hw_breakpoint.c  | 11 +++--
 arch/arm64/include/asm/hw_breakpoint.h   |  2 +-
 arch/arm64/kernel/hw_breakpoint.c|  9 ++--
 arch/powerpc/include/asm/hw_breakpoint.h |  2 +-
 arch/powerpc/kernel/hw_breakpoint.c  |  6 +--
 arch/sh/include/asm/hw_breakpoint.h  |  2 +-
 arch/sh/kernel/hw_breakpoint.c   |  9 ++--
 arch/x86/include/asm/hw_breakpoint.h |  2 +-
 arch/x86/kernel/hw_breakpoint.c  | 71 +---
 arch/xtensa/include/asm/hw_breakpoint.h  |  2 +-
 arch/xtensa/kernel/hw_breakpoint.c   |  7 ++--
 kernel/events/hw_breakpoint.c|  2 +-
 13 files changed, 63 insertions(+), 64 deletions(-)

diff --git a/arch/arm/include/asm/hw_breakpoint.h 
b/arch/arm/include/asm/hw_breakpoint.h
index e46e4e7..d5a0f52 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -117,7 +117,7 @@ struct pmu;
 
 extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
  int *gen_len, int *gen_type);
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
 extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index 629e251..385dcf4 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -456,14 +456,13 @@ static int get_hbp_len(u8 hbp_len)
 /*
  * Check whether bp virtual address is in kernel space.
  */
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
 {
unsigned int len;
unsigned long va;
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
 
-   va = info->address;
-   len = get_hbp_len(info->ctrl.len);
+   va = hw->address;
+   len = get_hbp_len(hw->ctrl.len);
 
return (va >= TASK_SIZE) && ((va + len - 1) >= TASK_SIZE);
 }
@@ -576,7 +575,7 @@ static int arch_build_bp_info(struct perf_event *bp)
 
/* Privilege */
info->ctrl.privilege = ARM_BREAKPOINT_USER;
-   if (arch_check_bp_in_kernelspace(bp))
+   if (arch_check_bp_in_kernelspace(info))
info->ctrl.privilege |= ARM_BREAKPOINT_PRIV;
 
/* Enabled? */
@@ -640,7 +639,7 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
return -EINVAL;
 
/* We don't allow mismatch breakpoints in kernel space. */
-   if (arch_check_bp_in_kernelspace(bp))
+   if (arch_check_bp_in_kernelspace(info))
return -EPERM;
 
/*
diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
b/arch/arm64/include/asm/hw_breakpoint.h
index 4177076..9f4a3d4 100644
--- a/arch/arm64/include/asm/hw_breakpoint.h
+++ b/arch/arm64/include/asm/hw_breakpoint.h
@@ -124,7 +124,7 @@ struct pmu;
 
 extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
  int *gen_len, int *gen_type, int *offset);
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
+extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
 extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
   unsigned long val, void *data);
diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 74bb56f..92c6973 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -344,14 +344,13 @@ static int get_hbp_len(u8 hbp_len)
 /*
  * Check whether bp virtual address is in kernel space.
  */
-int arch_check_bp_in_kernelspace(struct perf_event *bp)
+int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw)
 {
unsigned int len;
unsigned long va;
-   struct arch_hw_breakpoint *info =

[PATCH 04/12] powerpc: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/powerpc/include/asm/hw_breakpoint.h |  6 -
 arch/powerpc/kernel/hw_breakpoint.c  | 41 
 2 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 4d0b1bf..8f647e6 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -52,6 +52,7 @@ struct arch_hw_breakpoint {
 #include 
 #include 
 
+struct perf_event_attr;
 struct perf_event;
 struct pmu;
 struct perf_sample_data;
@@ -61,7 +62,10 @@ struct perf_sample_data;
 extern int hw_breakpoint_slots(int type);
 extern int arch_bp_generic_fields(int type, int *gen_bp_type);
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
 int arch_install_hw_breakpoint(struct perf_event *bp);
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 348cac9..fba6527 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -139,30 +139,31 @@ int arch_bp_generic_fields(int type, int *gen_bp_type)
 /*
  * Validate the arch-specific HW Breakpoint register settings
  */
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
int ret = -EINVAL, length_max;
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
 
if (!bp)
return ret;
 
-   info->type = HW_BRK_TYPE_TRANSLATE;
-   if (bp->attr.bp_type & HW_BREAKPOINT_R)
-   info->type |= HW_BRK_TYPE_READ;
-   if (bp->attr.bp_type & HW_BREAKPOINT_W)
-   info->type |= HW_BRK_TYPE_WRITE;
-   if (info->type == HW_BRK_TYPE_TRANSLATE)
+   hw->type = HW_BRK_TYPE_TRANSLATE;
+   if (attr->bp_type & HW_BREAKPOINT_R)
+   hw->type |= HW_BRK_TYPE_READ;
+   if (attr->bp_type & HW_BREAKPOINT_W)
+   hw->type |= HW_BRK_TYPE_WRITE;
+   if (hw->type == HW_BRK_TYPE_TRANSLATE)
/* must set alteast read or write */
return ret;
-   if (!(bp->attr.exclude_user))
-   info->type |= HW_BRK_TYPE_USER;
-   if (!(bp->attr.exclude_kernel))
-   info->type |= HW_BRK_TYPE_KERNEL;
-   if (!(bp->attr.exclude_hv))
-   info->type |= HW_BRK_TYPE_HYP;
-   info->address = bp->attr.bp_addr;
-   info->len = bp->attr.bp_len;
+   if (!attr->exclude_user)
+   hw->type |= HW_BRK_TYPE_USER;
+   if (!attr->exclude_kernel)
+   hw->type |= HW_BRK_TYPE_KERNEL;
+   if (!attr->exclude_hv)
+   hw->type |= HW_BRK_TYPE_HYP;
+   hw->address = attr->bp_addr;
+   hw->len = attr->bp_len;
 
/*
 * Since breakpoint length can be a maximum of HW_BREAKPOINT_LEN(8)
@@ -176,12 +177,12 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
if (cpu_has_feature(CPU_FTR_DAWR)) {
length_max = 512 ; /* 64 doublewords */
/* DAWR region can't cross 512 boundary */
-   if ((bp->attr.bp_addr >> 10) != 
-   ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 10))
+   if ((attr->bp_addr >> 10) !=
+   ((attr->bp_addr + attr->bp_len - 1) >> 10))
return -EINVAL;
}
-   if

[PATCH 07/12] sh: Remove "struct arch_hw_breakpoint::name" unused field

2018-05-18 Thread Frederic Weisbecker

This field seem to be unused, perhaps a leftover from old code...

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/sh/include/asm/hw_breakpoint.h | 1 -
 arch/sh/kernel/hw_breakpoint.c  | 7 ---
 2 files changed, 8 deletions(-)

diff --git a/arch/sh/include/asm/hw_breakpoint.h 
b/arch/sh/include/asm/hw_breakpoint.h
index 8a88ed0..dae622d 100644
--- a/arch/sh/include/asm/hw_breakpoint.h
+++ b/arch/sh/include/asm/hw_breakpoint.h
@@ -10,7 +10,6 @@
 #include 
 
 struct arch_hw_breakpoint {
-   char*name; /* Contains name of the symbol to set bkpt */
unsigned long   address;
u16 len;
u16 type;
diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c
index 713699b..3594773 100644
--- a/arch/sh/kernel/hw_breakpoint.c
+++ b/arch/sh/kernel/hw_breakpoint.c
@@ -248,13 +248,6 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
}
 
/*
-* For kernel-addresses, either the address or symbol name can be
-* specified.
-*/
-   if (info->name)
-   info->address = (unsigned long)kallsyms_lookup_name(info->name);
-
-   /*
 * Check that the low-order bits of the address are appropriate
 * for the alignment implied by len.
 */
-- 
2.7.4

[PATCH 04/12] powerpc: Implement hw_breakpoint_arch_parse()

2018-05-18 Thread Frederic Weisbecker

Migrate to the new API in order to remove arch_validate_hwbkpt_settings()
that clumsily mixes up architecture validation and commit

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/powerpc/include/asm/hw_breakpoint.h |  6 -
 arch/powerpc/kernel/hw_breakpoint.c  | 41 
 2 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_breakpoint.h 
b/arch/powerpc/include/asm/hw_breakpoint.h
index 4d0b1bf..8f647e6 100644
--- a/arch/powerpc/include/asm/hw_breakpoint.h
+++ b/arch/powerpc/include/asm/hw_breakpoint.h
@@ -52,6 +52,7 @@ struct arch_hw_breakpoint {
 #include 
 #include 
 
+struct perf_event_attr;
 struct perf_event;
 struct pmu;
 struct perf_sample_data;
@@ -61,7 +62,10 @@ struct perf_sample_data;
 extern int hw_breakpoint_slots(int type);
 extern int arch_bp_generic_fields(int type, int *gen_bp_type);
 extern int arch_check_bp_in_kernelspace(struct arch_hw_breakpoint *hw);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_arch_parse(struct perf_event *bp,
+   struct perf_event_attr *attr,
+   struct arch_hw_breakpoint *hw);
+#define hw_breakpoint_arch_parse hw_breakpoint_arch_parse
 extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
unsigned long val, void *data);
 int arch_install_hw_breakpoint(struct perf_event *bp);
diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index 348cac9..fba6527 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -139,30 +139,31 @@ int arch_bp_generic_fields(int type, int *gen_bp_type)
 /*
  * Validate the arch-specific HW Breakpoint register settings
  */
-int arch_validate_hwbkpt_settings(struct perf_event *bp)
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
int ret = -EINVAL, length_max;
-   struct arch_hw_breakpoint *info = counter_arch_bp(bp);
 
if (!bp)
return ret;
 
-   info->type = HW_BRK_TYPE_TRANSLATE;
-   if (bp->attr.bp_type & HW_BREAKPOINT_R)
-   info->type |= HW_BRK_TYPE_READ;
-   if (bp->attr.bp_type & HW_BREAKPOINT_W)
-   info->type |= HW_BRK_TYPE_WRITE;
-   if (info->type == HW_BRK_TYPE_TRANSLATE)
+   hw->type = HW_BRK_TYPE_TRANSLATE;
+   if (attr->bp_type & HW_BREAKPOINT_R)
+   hw->type |= HW_BRK_TYPE_READ;
+   if (attr->bp_type & HW_BREAKPOINT_W)
+   hw->type |= HW_BRK_TYPE_WRITE;
+   if (hw->type == HW_BRK_TYPE_TRANSLATE)
/* must set alteast read or write */
return ret;
-   if (!(bp->attr.exclude_user))
-   info->type |= HW_BRK_TYPE_USER;
-   if (!(bp->attr.exclude_kernel))
-   info->type |= HW_BRK_TYPE_KERNEL;
-   if (!(bp->attr.exclude_hv))
-   info->type |= HW_BRK_TYPE_HYP;
-   info->address = bp->attr.bp_addr;
-   info->len = bp->attr.bp_len;
+   if (!attr->exclude_user)
+   hw->type |= HW_BRK_TYPE_USER;
+   if (!attr->exclude_kernel)
+   hw->type |= HW_BRK_TYPE_KERNEL;
+   if (!attr->exclude_hv)
+   hw->type |= HW_BRK_TYPE_HYP;
+   hw->address = attr->bp_addr;
+   hw->len = attr->bp_len;
 
/*
 * Since breakpoint length can be a maximum of HW_BREAKPOINT_LEN(8)
@@ -176,12 +177,12 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
if (cpu_has_feature(CPU_FTR_DAWR)) {
length_max = 512 ; /* 64 doublewords */
/* DAWR region can't cross 512 boundary */
-   if ((bp->attr.bp_addr >> 10) != 
-   ((bp->attr.bp_addr + bp->attr.bp_len - 1) >> 10))
+   if ((attr->bp_addr >> 10) !=
+   ((attr->bp_addr + attr->bp_len - 1) >> 10))
return -EINVAL;
}
-   if (info->len >
-   (length_max - (info->address & HW_BREAKPOINT_ALIGN)))
+   if (hw->len >
+   (length_max - (hw->address & HW_BREAKPOINT_ALIGN)))
return -EINVAL;
return 0;
 }
-- 
2.7.4

[PATCH 07/12] sh: Remove "struct arch_hw_breakpoint::name" unused field

2018-05-18 Thread Frederic Weisbecker

This field seem to be unused, perhaps a leftover from old code...

Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 arch/sh/include/asm/hw_breakpoint.h | 1 -
 arch/sh/kernel/hw_breakpoint.c  | 7 ---
 2 files changed, 8 deletions(-)

diff --git a/arch/sh/include/asm/hw_breakpoint.h 
b/arch/sh/include/asm/hw_breakpoint.h
index 8a88ed0..dae622d 100644
--- a/arch/sh/include/asm/hw_breakpoint.h
+++ b/arch/sh/include/asm/hw_breakpoint.h
@@ -10,7 +10,6 @@
 #include 
 
 struct arch_hw_breakpoint {
-   char*name; /* Contains name of the symbol to set bkpt */
unsigned long   address;
u16 len;
u16 type;
diff --git a/arch/sh/kernel/hw_breakpoint.c b/arch/sh/kernel/hw_breakpoint.c
index 713699b..3594773 100644
--- a/arch/sh/kernel/hw_breakpoint.c
+++ b/arch/sh/kernel/hw_breakpoint.c
@@ -248,13 +248,6 @@ int arch_validate_hwbkpt_settings(struct perf_event *bp)
}
 
/*
-* For kernel-addresses, either the address or symbol name can be
-* specified.
-*/
-   if (info->name)
-   info->address = (unsigned long)kallsyms_lookup_name(info->name);
-
-   /*
 * Check that the low-order bits of the address are appropriate
 * for the alignment implied by len.
 */
-- 
2.7.4

[PATCH 01/12] perf/breakpoint: Split attribute parse and commit

2018-05-18 Thread Frederic Weisbecker

arch_validate_hwbkpt_settings() mixes up attribute check and commit into
a single code entity. Therefore the validation may return an error due to
incorrect atributes while still leaving halfway modified architecture
breakpoint data.

This is harmless when we deal with a new breakpoint but it becomes a
problem when we modify an existing breakpoint.

Split attribute parse and commit to fix that. The architecture is
passed a "struct arch_hw_breakpoint" to fill on top of the new attr
and the core takes care about copying the backend data once it's fully
validated. The architectures then need to implement the new API.

Reported-by: Linus Torvalds 
Original-patch-by: Andy Lutomirski 
Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 kernel/events/hw_breakpoint.c | 57 +++
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 6e28d28..51320c2 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -400,16 +400,35 @@ int dbg_release_bp_slot(struct perf_event *bp)
return 0;
 }
 
-static int validate_hw_breakpoint(struct perf_event *bp)
+#ifndef hw_breakpoint_arch_parse
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
-   int ret;
+   int err;
 
-   ret = arch_validate_hwbkpt_settings(bp);
-   if (ret)
-   return ret;
+   err = arch_validate_hwbkpt_settings(bp);
+   if (err)
+   return err;
+
+   *hw = bp->hw.info;
+
+   return 0;
+}
+#endif
+
+static int hw_breakpoint_parse(struct perf_event *bp,
+  struct perf_event_attr *attr,
+  struct arch_hw_breakpoint *hw)
+{
+   int err;
+
+   err = hw_breakpoint_arch_parse(bp, attr, hw);
+   if (err)
+   return err;
 
if (arch_check_bp_in_kernelspace(bp)) {
-   if (bp->attr.exclude_kernel)
+   if (attr->exclude_kernel)
return -EINVAL;
/*
 * Don't let unprivileged users set a breakpoint in the trap
@@ -424,19 +443,22 @@ static int validate_hw_breakpoint(struct perf_event *bp)
 
 int register_perf_hw_breakpoint(struct perf_event *bp)
 {
-   int ret;
+   struct arch_hw_breakpoint hw;
+   int err;
 
-   ret = reserve_bp_slot(bp);
-   if (ret)
-   return ret;
+   err = reserve_bp_slot(bp);
+   if (err)
+   return err;
 
-   ret = validate_hw_breakpoint(bp);
-
-   /* if arch_validate_hwbkpt_settings() fails then release bp slot */
-   if (ret)
+   err = hw_breakpoint_parse(bp, >attr, );
+   if (err) {
release_bp_slot(bp);
+   return err;
+   }
 
-   return ret;
+   bp->hw.info = hw;
+
+   return 0;
 }
 
 /**
@@ -464,6 +486,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, 
struct perf_event_attr *a
u64 old_len  = bp->attr.bp_len;
int old_type = bp->attr.bp_type;
bool modify  = attr->bp_type != old_type;
+   struct arch_hw_breakpoint hw;
int err = 0;
 
bp->attr.bp_addr = attr->bp_addr;
@@ -473,7 +496,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, 
struct perf_event_attr *a
if (check && memcmp(>attr, attr, sizeof(*attr)))
return -EINVAL;
 
-   err = validate_hw_breakpoint(bp);
+   err = hw_breakpoint_parse(bp, attr, );
if (!err && modify)
err = modify_bp_slot(bp, old_type);
 
@@ -484,7 +507,9 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, 
struct perf_event_attr *a
return err;
}
 
+   bp->hw.info = hw;
bp->attr.disabled = attr->disabled;
+
return 0;
 }
 
-- 
2.7.4

[PATCH 01/12] perf/breakpoint: Split attribute parse and commit

2018-05-18 Thread Frederic Weisbecker

arch_validate_hwbkpt_settings() mixes up attribute check and commit into
a single code entity. Therefore the validation may return an error due to
incorrect atributes while still leaving halfway modified architecture
breakpoint data.

This is harmless when we deal with a new breakpoint but it becomes a
problem when we modify an existing breakpoint.

Split attribute parse and commit to fix that. The architecture is
passed a "struct arch_hw_breakpoint" to fill on top of the new attr
and the core takes care about copying the backend data once it's fully
validated. The architectures then need to implement the new API.

Reported-by: Linus Torvalds 
Original-patch-by: Andy Lutomirski 
Signed-off-by: Frederic Weisbecker 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Max Filippov 
Cc: Chris Zankel 
Cc: Catalin Marinas 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Alexander Shishkin 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Joel Fernandes 
---
 kernel/events/hw_breakpoint.c | 57 +++
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 6e28d28..51320c2 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -400,16 +400,35 @@ int dbg_release_bp_slot(struct perf_event *bp)
return 0;
 }
 
-static int validate_hw_breakpoint(struct perf_event *bp)
+#ifndef hw_breakpoint_arch_parse
+int hw_breakpoint_arch_parse(struct perf_event *bp,
+struct perf_event_attr *attr,
+struct arch_hw_breakpoint *hw)
 {
-   int ret;
+   int err;
 
-   ret = arch_validate_hwbkpt_settings(bp);
-   if (ret)
-   return ret;
+   err = arch_validate_hwbkpt_settings(bp);
+   if (err)
+   return err;
+
+   *hw = bp->hw.info;
+
+   return 0;
+}
+#endif
+
+static int hw_breakpoint_parse(struct perf_event *bp,
+  struct perf_event_attr *attr,
+  struct arch_hw_breakpoint *hw)
+{
+   int err;
+
+   err = hw_breakpoint_arch_parse(bp, attr, hw);
+   if (err)
+   return err;
 
if (arch_check_bp_in_kernelspace(bp)) {
-   if (bp->attr.exclude_kernel)
+   if (attr->exclude_kernel)
return -EINVAL;
/*
 * Don't let unprivileged users set a breakpoint in the trap
@@ -424,19 +443,22 @@ static int validate_hw_breakpoint(struct perf_event *bp)
 
 int register_perf_hw_breakpoint(struct perf_event *bp)
 {
-   int ret;
+   struct arch_hw_breakpoint hw;
+   int err;
 
-   ret = reserve_bp_slot(bp);
-   if (ret)
-   return ret;
+   err = reserve_bp_slot(bp);
+   if (err)
+   return err;
 
-   ret = validate_hw_breakpoint(bp);
-
-   /* if arch_validate_hwbkpt_settings() fails then release bp slot */
-   if (ret)
+   err = hw_breakpoint_parse(bp, >attr, );
+   if (err) {
release_bp_slot(bp);
+   return err;
+   }
 
-   return ret;
+   bp->hw.info = hw;
+
+   return 0;
 }
 
 /**
@@ -464,6 +486,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, 
struct perf_event_attr *a
u64 old_len  = bp->attr.bp_len;
int old_type = bp->attr.bp_type;
bool modify  = attr->bp_type != old_type;
+   struct arch_hw_breakpoint hw;
int err = 0;
 
bp->attr.bp_addr = attr->bp_addr;
@@ -473,7 +496,7 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, 
struct perf_event_attr *a
if (check && memcmp(>attr, attr, sizeof(*attr)))
return -EINVAL;
 
-   err = validate_hw_breakpoint(bp);
+   err = hw_breakpoint_parse(bp, attr, );
if (!err && modify)
err = modify_bp_slot(bp, old_type);
 
@@ -484,7 +507,9 @@ modify_user_hw_breakpoint_check(struct perf_event *bp, 
struct perf_event_attr *a
return err;
}
 
+   bp->hw.info = hw;
bp->attr.disabled = attr->disabled;
+
return 0;
 }
 
-- 
2.7.4

[PATCH 00/12] breakpoint: Rework arch validation v2

2018-05-18 Thread Frederic Weisbecker


Changes since v1:

- Avoid arch code duplication between check and commit. Use a temporary
  struct arch_hw_breakpoint to fill up arch data on the fly
  (Mark Rutland)

- Start with a default weak version of hw_breakpoint_arch_parse() that
  architectures override one by one (Peter Zijlstra, Andy Lutomirski)

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
perf/breakpoint-v2

HEAD: a350174ed8f7240b91ae46933bfe2274d3d03806

--- Summary ---

When we modify a hardware breakpoint, the architecture code fills up
the architecture data as the validation of generic attributes progresses.
If something goes wrong in the middle, the architecture data changes
aren't rolled back and we are left with a halfway fiddled breakpoint.

This set fixes the various misdesigns that back this bad behaviour.

Thanks,
Frederic
---

Frederic Weisbecker (12):
  perf/breakpoint: Split attribute parse and commit
  perf/breakpoint: Pass arch breakpoint struct to 
arch_check_bp_in_kernelspace()
  x86: Implement hw_breakpoint_arch_parse()
  powerpc: Implement hw_breakpoint_arch_parse()
  arm: Implement hw_breakpoint_arch_parse()
  arm64: Implement hw_breakpoint_arch_parse()
  sh: Remove "struct arch_hw_breakpoint::name" unused field
  sh: Implement hw_breakpoint_arch_parse()
  xtensa: Implement hw_breakpoint_arch_parse()
  perf/breakpoint: Remove default hw_breakpoint_arch_parse()
  perf/breakpoint: Pass new breakpoint type to modify_breakpoint_slot()
  perf/breakpoint: Clean up and consolidate 
modify_user_hw_breakpoint_check()


 arch/arm/include/asm/hw_breakpoint.h |   7 +-
 arch/arm/kernel/hw_breakpoint.c  |  78 +-
 arch/arm64/include/asm/hw_breakpoint.h   |   7 +-
 arch/arm64/kernel/hw_breakpoint.c|  86 ++--
 arch/powerpc/include/asm/hw_breakpoint.h |   7 +-
 arch/powerpc/kernel/hw_breakpoint.c  |  47 ++-
 arch/sh/include/asm/hw_breakpoint.h  |   8 +-
 arch/sh/kernel/hw_breakpoint.c   |  53 ++---
 arch/x86/include/asm/hw_breakpoint.h |   7 +-
 arch/x86/kernel/hw_breakpoint.c  | 131 ---
 arch/xtensa/include/asm/hw_breakpoint.h  |   7 +-
 arch/xtensa/kernel/hw_breakpoint.c   |  40 --
 kernel/events/hw_breakpoint.c|  92 +-
 13 files changed, 294 insertions(+), 276 deletions(-)

[PATCH 00/12] breakpoint: Rework arch validation v2

2018-05-18 Thread Frederic Weisbecker


Changes since v1:

- Avoid arch code duplication between check and commit. Use a temporary
  struct arch_hw_breakpoint to fill up arch data on the fly
  (Mark Rutland)

- Start with a default weak version of hw_breakpoint_arch_parse() that
  architectures override one by one (Peter Zijlstra, Andy Lutomirski)

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
perf/breakpoint-v2

HEAD: a350174ed8f7240b91ae46933bfe2274d3d03806

--- Summary ---

When we modify a hardware breakpoint, the architecture code fills up
the architecture data as the validation of generic attributes progresses.
If something goes wrong in the middle, the architecture data changes
aren't rolled back and we are left with a halfway fiddled breakpoint.

This set fixes the various misdesigns that back this bad behaviour.

Thanks,
Frederic
---

Frederic Weisbecker (12):
  perf/breakpoint: Split attribute parse and commit
  perf/breakpoint: Pass arch breakpoint struct to 
arch_check_bp_in_kernelspace()
  x86: Implement hw_breakpoint_arch_parse()
  powerpc: Implement hw_breakpoint_arch_parse()
  arm: Implement hw_breakpoint_arch_parse()
  arm64: Implement hw_breakpoint_arch_parse()
  sh: Remove "struct arch_hw_breakpoint::name" unused field
  sh: Implement hw_breakpoint_arch_parse()
  xtensa: Implement hw_breakpoint_arch_parse()
  perf/breakpoint: Remove default hw_breakpoint_arch_parse()
  perf/breakpoint: Pass new breakpoint type to modify_breakpoint_slot()
  perf/breakpoint: Clean up and consolidate 
modify_user_hw_breakpoint_check()


 arch/arm/include/asm/hw_breakpoint.h |   7 +-
 arch/arm/kernel/hw_breakpoint.c  |  78 +-
 arch/arm64/include/asm/hw_breakpoint.h   |   7 +-
 arch/arm64/kernel/hw_breakpoint.c|  86 ++--
 arch/powerpc/include/asm/hw_breakpoint.h |   7 +-
 arch/powerpc/kernel/hw_breakpoint.c  |  47 ++-
 arch/sh/include/asm/hw_breakpoint.h  |   8 +-
 arch/sh/kernel/hw_breakpoint.c   |  53 ++---
 arch/x86/include/asm/hw_breakpoint.h |   7 +-
 arch/x86/kernel/hw_breakpoint.c  | 131 ---
 arch/xtensa/include/asm/hw_breakpoint.h  |   7 +-
 arch/xtensa/kernel/hw_breakpoint.c   |  40 --
 kernel/events/hw_breakpoint.c|  92 +-
 13 files changed, 294 insertions(+), 276 deletions(-)

Re: [PATCH 8/9] perf/breakpoint: Split breakpoint "check" and "commit"

2018-05-18 Thread Frederic Weisbecker

On Tue, May 15, 2018 at 09:58:03PM -0700, Andy Lutomirski wrote:
> 
> 
> > On May 15, 2018, at 8:11 PM, Frederic Weisbecker  
> > wrote:
> > 
> >> On Wed, May 09, 2018 at 11:17:03AM +0200, Peter Zijlstra wrote:
> >>> On Sun, May 06, 2018 at 09:19:54PM +0200, Frederic Weisbecker wrote:
> >>> arch/arm/include/asm/hw_breakpoint.h |  5 -
> >>> arch/arm/kernel/hw_breakpoint.c  | 22 +++---
> >>> arch/arm64/include/asm/hw_breakpoint.h   |  5 -
> >>> arch/arm64/kernel/hw_breakpoint.c| 22 +++---
> >>> arch/powerpc/include/asm/hw_breakpoint.h |  5 -
> >>> arch/powerpc/kernel/hw_breakpoint.c  | 22 +++---
> >>> arch/sh/include/asm/hw_breakpoint.h |  5 -
> >>> arch/sh/kernel/hw_breakpoint.c   | 22 +++---
> >>> arch/x86/include/asm/hw_breakpoint.h |  5 -
> >>> arch/x86/kernel/hw_breakpoint.c  | 23 +++
> >>> arch/xtensa/include/asm/hw_breakpoint.h  |  5 -
> >>> arch/xtensa/kernel/hw_breakpoint.c   | 22 +++---
> >> 
> >> Because of those ^,
> >> 
> >>> kernel/events/hw_breakpoint.c| 11 ++-
> >> 
> >> would it not make sense to have a prelimenary patch doing something
> >> like:
> >> 
> >> __weak int hw_breakpoint_arch_check(struct perf_event *bp)
> >> {
> >>return arch_validate_hwbkpt_settings(bp);
> >> }
> > 
> > So eventually I fear I can't do that, due to linking order.
> > 
> > Say I convert x86 to implement hw_breakpoint_arch_check(), so I
> > remove arch_validate_hwbkpt_settings(). On build time, the weak version
> > is still compiled and can't find a declaration for 
> > arch_validate_hwbkpt_settings().
> > 
> > I tried to keep the declaration while the definition has been removed but
> > it seems the weak version is linked first before it gets later replaced by
> > the overriden arch version. So I get a build error.
> > 
> > I could keep arch_validate_hwbkpt_settings() around on all archs and remove 
> > it in
> > the end with the weak version but that would defeat the purpose of removing
> > the mid-state in the current patch.
> 
> How about just not using weak functions?  Weak functions have annoying issues 
> like this, and they have trouble generating good code. I much prefer the 
> pattern:
> 
> in arch header:
> extern void arch_func(whatever);
> #define arch_func arch_func
> 
> in generic header:
> #ifndef arch_func
> static inline void arch_func(whatever) ...
> #endif

Thanks, that works well!

Re: [PATCH 8/9] perf/breakpoint: Split breakpoint "check" and "commit"

2018-05-18 Thread Frederic Weisbecker

On Tue, May 15, 2018 at 09:58:03PM -0700, Andy Lutomirski wrote:
> 
> 
> > On May 15, 2018, at 8:11 PM, Frederic Weisbecker  
> > wrote:
> > 
> >> On Wed, May 09, 2018 at 11:17:03AM +0200, Peter Zijlstra wrote:
> >>> On Sun, May 06, 2018 at 09:19:54PM +0200, Frederic Weisbecker wrote:
> >>> arch/arm/include/asm/hw_breakpoint.h |  5 -
> >>> arch/arm/kernel/hw_breakpoint.c  | 22 +++---
> >>> arch/arm64/include/asm/hw_breakpoint.h   |  5 -
> >>> arch/arm64/kernel/hw_breakpoint.c| 22 +++---
> >>> arch/powerpc/include/asm/hw_breakpoint.h |  5 -
> >>> arch/powerpc/kernel/hw_breakpoint.c  | 22 +++---
> >>> arch/sh/include/asm/hw_breakpoint.h |  5 -
> >>> arch/sh/kernel/hw_breakpoint.c   | 22 +++---
> >>> arch/x86/include/asm/hw_breakpoint.h |  5 -
> >>> arch/x86/kernel/hw_breakpoint.c  | 23 +++
> >>> arch/xtensa/include/asm/hw_breakpoint.h  |  5 -
> >>> arch/xtensa/kernel/hw_breakpoint.c   | 22 +++---
> >> 
> >> Because of those ^,
> >> 
> >>> kernel/events/hw_breakpoint.c| 11 ++-
> >> 
> >> would it not make sense to have a prelimenary patch doing something
> >> like:
> >> 
> >> __weak int hw_breakpoint_arch_check(struct perf_event *bp)
> >> {
> >>return arch_validate_hwbkpt_settings(bp);
> >> }
> > 
> > So eventually I fear I can't do that, due to linking order.
> > 
> > Say I convert x86 to implement hw_breakpoint_arch_check(), so I
> > remove arch_validate_hwbkpt_settings(). On build time, the weak version
> > is still compiled and can't find a declaration for 
> > arch_validate_hwbkpt_settings().
> > 
> > I tried to keep the declaration while the definition has been removed but
> > it seems the weak version is linked first before it gets later replaced by
> > the overriden arch version. So I get a build error.
> > 
> > I could keep arch_validate_hwbkpt_settings() around on all archs and remove 
> > it in
> > the end with the weak version but that would defeat the purpose of removing
> > the mid-state in the current patch.
> 
> How about just not using weak functions?  Weak functions have annoying issues 
> like this, and they have trouble generating good code. I much prefer the 
> pattern:
> 
> in arch header:
> extern void arch_func(whatever);
> #define arch_func arch_func
> 
> in generic header:
> #ifndef arch_func
> static inline void arch_func(whatever) ...
> #endif

Thanks, that works well!

Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-18 Thread Dmitry Safonov

2018-05-19 3:25 GMT+01:00 Dmitry Safonov <0x7f454...@gmail.com>:
>> Here is the function:
>> 00400842 :
>>   400842:   53  push   %rbx
>>   400843:   55  push   %rbp
>>   400844:   41 54   push   %r12
>>   400846:   41 55   push   %r13
>>   400848:   41 56   push   %r14
>>   40084a:   41 57   push   %r15
>>   40084c:   9c  pushfq
>>   40084d:   48 89 27mov%rsp,(%rdi)
>>   400850:   48 89 fcmov%rdi,%rsp
>>   400853:   6a 23   pushq  $0x23
>>   400855:   68 5c 08 40 00  pushq  $0x40085c
>>   40085a:   48 cb   lretq
>>   40085c:   ff d6   callq  *%rsi
>>   40085e:   ea  (bad)
>>   40085f:   65 08 40 00 or %al,%gs:0x0(%rax)
>>   400863:   33 00   xor(%rax),%eax
>>   400865:   48 8b 24 24 mov(%rsp),%rsp
>>   400869:   9d  popfq
>>   40086a:   41 5f   pop%r15
>>   40086c:   41 5e   pop%r14
>>   40086e:   41 5d   pop%r13
>>   400870:   41 5c   pop%r12
>>   400872:   5d  pop%rbp
>>   400873:   5b  pop%rbx
>>   400874:   c3  retq
>>   400875:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
>>   40087c:   00 00 00
>>   40087f:   90  nop
>>
>> Looks like mov between registers caused it? The hell.
>
> Oh, it's not 400850, I missloked, but 40085a so lretq might case it.

But it's
002b:417bafe8
USER_DS and sensible address, still no idea.

-- 
 Dmitry

Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-18 Thread Dmitry Safonov

2018-05-19 3:25 GMT+01:00 Dmitry Safonov <0x7f454...@gmail.com>:
>> Here is the function:
>> 00400842 :
>>   400842:   53  push   %rbx
>>   400843:   55  push   %rbp
>>   400844:   41 54   push   %r12
>>   400846:   41 55   push   %r13
>>   400848:   41 56   push   %r14
>>   40084a:   41 57   push   %r15
>>   40084c:   9c  pushfq
>>   40084d:   48 89 27mov%rsp,(%rdi)
>>   400850:   48 89 fcmov%rdi,%rsp
>>   400853:   6a 23   pushq  $0x23
>>   400855:   68 5c 08 40 00  pushq  $0x40085c
>>   40085a:   48 cb   lretq
>>   40085c:   ff d6   callq  *%rsi
>>   40085e:   ea  (bad)
>>   40085f:   65 08 40 00 or %al,%gs:0x0(%rax)
>>   400863:   33 00   xor(%rax),%eax
>>   400865:   48 8b 24 24 mov(%rsp),%rsp
>>   400869:   9d  popfq
>>   40086a:   41 5f   pop%r15
>>   40086c:   41 5e   pop%r14
>>   40086e:   41 5d   pop%r13
>>   400870:   41 5c   pop%r12
>>   400872:   5d  pop%rbp
>>   400873:   5b  pop%rbx
>>   400874:   c3  retq
>>   400875:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
>>   40087c:   00 00 00
>>   40087f:   90  nop
>>
>> Looks like mov between registers caused it? The hell.
>
> Oh, it's not 400850, I missloked, but 40085a so lretq might case it.

But it's
002b:417bafe8
USER_DS and sensible address, still no idea.

-- 
 Dmitry

Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Tiwei Bie

On Sat, May 19, 2018 at 09:12:30AM +0800, Jason Wang wrote:
> On 2018年05月18日 22:33, Tiwei Bie wrote:
> > On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:
> > > On 2018年05月18日 19:29, Tiwei Bie wrote:
> > > > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 22:33, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > > > > > [...]
> > > > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue 
> > > > > > > > > > > > *vq, unsigned int head,
> > > > > > > > > > > > + unsigned int id, void 
> > > > > > > > > > > > **ctx)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +   struct vring_packed_desc *desc;
> > > > > > > > > > > > +   unsigned int i, j;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   /* Clear data ptr. */
> > > > > > > > > > > > +   vq->desc_state[id].data = NULL;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   i = head;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > > > > > +   desc = >vring_packed.desc[i];
> > > > > > > > > > > > +   vring_unmap_one_packed(vq, desc);
> > > > > > > > > > > As mentioned in previous discussion, this probably won't 
> > > > > > > > > > > work for the case
> > > > > > > > > > > of out of order completion since it depends on the 
> > > > > > > > > > > information in the
> > > > > > > > > > > descriptor ring. We probably need to extend ctx to record 
> > > > > > > > > > > such information.
> > > > > > > > > > Above code doesn't depend on the information in the 
> > > > > > > > > > descriptor
> > > > > > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > > > > > > 
> > > > > > > > > > Best regards,
> > > > > > > > > > Tiwei Bie
> > > > > > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > > > > > vring_unmap_one_packed() still depends on the content of 
> > > > > > > > > descriptor ring?
> > > > > > > > > 
> > > > > > > > I got your point now. I think it makes sense to reserve
> > > > > > > > the bits of the addr field. Driver shouldn't try to get
> > > > > > > > addrs from the descriptors when cleanup the descriptors
> > > > > > > > no matter whether we support out-of-order or not.
> > > > > > > Maybe I was wrong, but I remember spec mentioned something like 
> > > > > > > this.
> > > > > > You're right. Spec mentioned this. I was just repeating
> > > > > > the spec to emphasize that it does make sense. :)
> > > > > > 
> > > > > > > > But combining it with the out-of-order support, it will
> > > > > > > > mean that the driver still needs to maintain a desc/ctx
> > > > > > > > list that is very similar to the desc ring in the split
> > > > > > > > ring. I'm not quite sure whether it's something we want.
> > > > > > > > If it is true, I'll do it. So do you think we also want
> > > > > > > > to maintain such a desc/ctx list for packed ring?
> > > > > > > To make it work for OOO backends I think we need something like 
> > > > > > > this
> > > > > > > (hardware NIC drivers are usually have something like this).
> > > > > > Which hardware NIC drivers have this?
> > > > > It's quite common I think, e.g driver track e.g dma addr and page frag
> > > > > somewhere. e.g the ring->rx_info in mlx4 driver.
> > > > It seems that I had a misunderstanding on your
> > > > previous comments. I know it's quite common for
> > > > drivers to track e.g. DMA addrs somewhere (and
> > > > I think one reason behind this is that they want
> > > > to reuse the bits of addr field).
> > > Yes, we may want this for virtio-net as well in the future.
> > > 
> > > >But tracking
> > > > addrs somewhere doesn't means supporting OOO.
> > > > I thought you were saying it's quite common for
> > > > hardware NIC drivers to support OOO (i.e. NICs
> > > > will return the descriptors OOO):
> > > > 
> > > > I'm not familiar with mlx4, maybe I'm wrong.
> > > > I just had a quick glance. And I found below
> > > > comments in mlx4_en_process_rx_cq():
> > > > 
> > > > ```
> > > > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> > > >* descriptor offset can be deduced from the CQE index instead of
> > > >* reading 'cqe->index' */
> > > > index = cq->mcq.cons_index & ring->size_mask;
> > > > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
> > > > ```
> > > > 
> > > > It seems that although they have a completion
> > > > queue, they are still using the ring in order.
> > > I guess so (at least from the above bits). Git grep -i "out of order" in
> > >

Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Tiwei Bie

On Sat, May 19, 2018 at 09:12:30AM +0800, Jason Wang wrote:
> On 2018年05月18日 22:33, Tiwei Bie wrote:
> > On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:
> > > On 2018年05月18日 19:29, Tiwei Bie wrote:
> > > > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 22:33, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > > > > > [...]
> > > > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue 
> > > > > > > > > > > > *vq, unsigned int head,
> > > > > > > > > > > > + unsigned int id, void 
> > > > > > > > > > > > **ctx)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +   struct vring_packed_desc *desc;
> > > > > > > > > > > > +   unsigned int i, j;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   /* Clear data ptr. */
> > > > > > > > > > > > +   vq->desc_state[id].data = NULL;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   i = head;
> > > > > > > > > > > > +
> > > > > > > > > > > > +   for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > > > > > +   desc = >vring_packed.desc[i];
> > > > > > > > > > > > +   vring_unmap_one_packed(vq, desc);
> > > > > > > > > > > As mentioned in previous discussion, this probably won't 
> > > > > > > > > > > work for the case
> > > > > > > > > > > of out of order completion since it depends on the 
> > > > > > > > > > > information in the
> > > > > > > > > > > descriptor ring. We probably need to extend ctx to record 
> > > > > > > > > > > such information.
> > > > > > > > > > Above code doesn't depend on the information in the 
> > > > > > > > > > descriptor
> > > > > > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > > > > > > 
> > > > > > > > > > Best regards,
> > > > > > > > > > Tiwei Bie
> > > > > > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > > > > > vring_unmap_one_packed() still depends on the content of 
> > > > > > > > > descriptor ring?
> > > > > > > > > 
> > > > > > > > I got your point now. I think it makes sense to reserve
> > > > > > > > the bits of the addr field. Driver shouldn't try to get
> > > > > > > > addrs from the descriptors when cleanup the descriptors
> > > > > > > > no matter whether we support out-of-order or not.
> > > > > > > Maybe I was wrong, but I remember spec mentioned something like 
> > > > > > > this.
> > > > > > You're right. Spec mentioned this. I was just repeating
> > > > > > the spec to emphasize that it does make sense. :)
> > > > > > 
> > > > > > > > But combining it with the out-of-order support, it will
> > > > > > > > mean that the driver still needs to maintain a desc/ctx
> > > > > > > > list that is very similar to the desc ring in the split
> > > > > > > > ring. I'm not quite sure whether it's something we want.
> > > > > > > > If it is true, I'll do it. So do you think we also want
> > > > > > > > to maintain such a desc/ctx list for packed ring?
> > > > > > > To make it work for OOO backends I think we need something like 
> > > > > > > this
> > > > > > > (hardware NIC drivers are usually have something like this).
> > > > > > Which hardware NIC drivers have this?
> > > > > It's quite common I think, e.g driver track e.g dma addr and page frag
> > > > > somewhere. e.g the ring->rx_info in mlx4 driver.
> > > > It seems that I had a misunderstanding on your
> > > > previous comments. I know it's quite common for
> > > > drivers to track e.g. DMA addrs somewhere (and
> > > > I think one reason behind this is that they want
> > > > to reuse the bits of addr field).
> > > Yes, we may want this for virtio-net as well in the future.
> > > 
> > > >But tracking
> > > > addrs somewhere doesn't means supporting OOO.
> > > > I thought you were saying it's quite common for
> > > > hardware NIC drivers to support OOO (i.e. NICs
> > > > will return the descriptors OOO):
> > > > 
> > > > I'm not familiar with mlx4, maybe I'm wrong.
> > > > I just had a quick glance. And I found below
> > > > comments in mlx4_en_process_rx_cq():
> > > > 
> > > > ```
> > > > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> > > >* descriptor offset can be deduced from the CQE index instead of
> > > >* reading 'cqe->index' */
> > > > index = cq->mcq.cons_index & ring->size_mask;
> > > > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
> > > > ```
> > > > 
> > > > It seems that although they have a completion
> > > > queue, they are still using the ring in order.
> > > I guess so (at least from the above bits). Git grep -i "out of order" in
> > >

Re: Tasks RCU vs Preempt RCU

2018-05-18 Thread Paul E. McKenney

On Fri, May 18, 2018 at 11:36:23AM -0700, Joel Fernandes wrote:
> Hi,
> 
> I was thinking about tasks-RCU and why its needed. Since preempt-RCU allows
> tasks to be preempted in read-sections, can we not just reuse that mechanism
> for the trampolines since we track all preempted tasks so we would wait on
> all tasks preempted within a trampoline?
> 
> I am trying to understand what will _not_ work if we did that.. I'm guessing
> the answer is that that would mean the trampoline has to be wrapped with
> rcu_read_{lock,unlock} which may add some overhead, but please let me know
> if I'm missing something else..
> 
> The advantage I guess is possible elimination of an RCU variant, and also
> possibly eliminating the tasks RCU thread that monitors.. Anyway I was
> thinking more in terms of the effort of reduction of the RCU flavors etc and
> reducing complexity ideas.

The problem is that if they are preempted while executing in a trampoline,
RCU-preempt doesn't queue them nor does it wait on them.

And the problem with wrapping them with rcu_read_{lock,unlock} is that
there would be a point before the trampoline executed rcu_read_lock()
but while it was on the trampoline.  Nothing good comes from this.  ;-)

Thanx, Paul

Re: Tasks RCU vs Preempt RCU

2018-05-18 Thread Paul E. McKenney

On Fri, May 18, 2018 at 11:36:23AM -0700, Joel Fernandes wrote:
> Hi,
> 
> I was thinking about tasks-RCU and why its needed. Since preempt-RCU allows
> tasks to be preempted in read-sections, can we not just reuse that mechanism
> for the trampolines since we track all preempted tasks so we would wait on
> all tasks preempted within a trampoline?
> 
> I am trying to understand what will _not_ work if we did that.. I'm guessing
> the answer is that that would mean the trampoline has to be wrapped with
> rcu_read_{lock,unlock} which may add some overhead, but please let me know
> if I'm missing something else..
> 
> The advantage I guess is possible elimination of an RCU variant, and also
> possibly eliminating the tasks RCU thread that monitors.. Anyway I was
> thinking more in terms of the effort of reduction of the RCU flavors etc and
> reducing complexity ideas.

The problem is that if they are preempted while executing in a trampoline,
RCU-preempt doesn't queue them nor does it wait on them.

And the problem with wrapping them with rcu_read_{lock,unlock} is that
there would be a point before the trampoline executed rcu_read_lock()
but while it was on the trampoline.  Nothing good comes from this.  ;-)

Thanx, Paul

Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-18 Thread Dmitry Safonov

2018-05-19 3:22 GMT+01:00 Dmitry Safonov :
> On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote:
>> > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com>
>> > cpu family: 6
>> > model: 142
>> > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
>> > But I usually test kernels in VM. So, I use virt-manager as it's
>> > easier to manage
>> > multiple VMs. The thing is that I've chosen "Copy host CPU
>> > configuration"
>> > and for some reason, I don't quite follow virt-manager makes model
>>
>> "Opteron_G4".
>> > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-
>> > 2.fc26).
>> > So, cpuinfo in VM says:
>> > cpu family: 21
>> > model: 1
>> > model name: AMD Opteron 62xx class CPU
>>
>> What does guest cpuinfo say for vendor_id?
>>
>> There are multiple potential screwups here.
>>
>> 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
>> that’s different from Intel’s, and the test case was definitely
>> wrong. But
>> KVM has no way to influence it.  Are you sure you’re using KVM and
>> not QEMU
>> TCG? Anyway, the IRET thing is minor compared to your other problems,
>> so
>> let’s try to fix them first.
>>
>> 2. Compat fast syscalls are wildly different on AMD and Intel.
>> Because of
>> this issue, QEMU with KVM is supposed to always report the real
>> vendor_id
>> no matter -cpu asks for.  If we get the wrong vendor_id, then we’re
>> at the
>> mercy of KVM’s emulation and performance will suck.  On older
>> kernels, this
>> would cause hideous kernel crashes.  On new kernels, I would expect
>> it to
>> merely crash 32-bit user programs or be slow.
>
> Heh, I didn't know those details, so it looks like it's (2),
> vendor_id   : AuthenticAMD
> in guest.
>
>>
>> > What's worse than registers changes is that some selftests actually
>> > lead
>>
>> to
>> > Oops's. The same reason for criu-ia32 fails.
>> > I've tested so far v4.15 and v4.16 releases besides master
>> > (2c71d338bef2),
>> > so it looks to be not a recent regression.
>> > Full Oopses:
>> > [  189.100174] BUG: unable to handle kernel paging request at
>>
>> 417bafe8
>> > [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067
>> > PTE
>>
>> 6991f067
>> > [  189.100174] Oops: 0001 [#3] SMP NOPTI
>>
>> Whoa there!  0001 means a failed *kernel* access.
>>
>> > [  189.100174] Modules linked in:
>> > [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G
>>
>> Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?
>
> sysret_ss_attrs_32 survives
>
>>
>> > D   4.17.0-rc5+ #11
>> > [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX,
>> > 1996),
>> > BIOS 1.10.2-1.fc26 04/01/2014
>> > [  189.103187] RIP: 0033:0x40085a
>>
>> The oops was caused from CPL 3 at what looks like a totally sensible
>> user
>> address.  Can you disassemble the offending binary and tell me what
>> the
>> code at 0x40085a is?
>
> Here is the function:
> 00400842 :
>   400842:   53  push   %rbx
>   400843:   55  push   %rbp
>   400844:   41 54   push   %r12
>   400846:   41 55   push   %r13
>   400848:   41 56   push   %r14
>   40084a:   41 57   push   %r15
>   40084c:   9c  pushfq
>   40084d:   48 89 27mov%rsp,(%rdi)
>   400850:   48 89 fcmov%rdi,%rsp
>   400853:   6a 23   pushq  $0x23
>   400855:   68 5c 08 40 00  pushq  $0x40085c
>   40085a:   48 cb   lretq
>   40085c:   ff d6   callq  *%rsi
>   40085e:   ea  (bad)
>   40085f:   65 08 40 00 or %al,%gs:0x0(%rax)
>   400863:   33 00   xor(%rax),%eax
>   400865:   48 8b 24 24 mov(%rsp),%rsp
>   400869:   9d  popfq
>   40086a:   41 5f   pop%r15
>   40086c:   41 5e   pop%r14
>   40086e:   41 5d   pop%r13
>   400870:   41 5c   pop%r12
>   400872:   5d  pop%rbp
>   400873:   5b  pop%rbx
>   400874:   c3  retq
>   400875:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
>   40087c:   00 00 00
>   40087f:   90  nop
>
> Looks like mov between registers caused it? The hell.

Oh, it's not 400850, I missloked, but 40085a so lretq might case it.

>
>>
>> > [  189.103187] RSP: 002b:417bafe8 EFLAGS: 0206
>> > [  189.103187] RAX:  RBX: 03e8 RCX:
>>
>> 
>> > [  189.103187] RDX:  RSI: 00400830 RDI:
>>
>> 417baff8
>> > [  189.103187] RBP: 417baff8 R08:  R09:
>>
>> 0077
>> > [

Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-18 Thread Dmitry Safonov

2018-05-19 3:22 GMT+01:00 Dmitry Safonov :
> On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote:
>> > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com>
>> > cpu family: 6
>> > model: 142
>> > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
>> > But I usually test kernels in VM. So, I use virt-manager as it's
>> > easier to manage
>> > multiple VMs. The thing is that I've chosen "Copy host CPU
>> > configuration"
>> > and for some reason, I don't quite follow virt-manager makes model
>>
>> "Opteron_G4".
>> > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-
>> > 2.fc26).
>> > So, cpuinfo in VM says:
>> > cpu family: 21
>> > model: 1
>> > model name: AMD Opteron 62xx class CPU
>>
>> What does guest cpuinfo say for vendor_id?
>>
>> There are multiple potential screwups here.
>>
>> 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
>> that’s different from Intel’s, and the test case was definitely
>> wrong. But
>> KVM has no way to influence it.  Are you sure you’re using KVM and
>> not QEMU
>> TCG? Anyway, the IRET thing is minor compared to your other problems,
>> so
>> let’s try to fix them first.
>>
>> 2. Compat fast syscalls are wildly different on AMD and Intel.
>> Because of
>> this issue, QEMU with KVM is supposed to always report the real
>> vendor_id
>> no matter -cpu asks for.  If we get the wrong vendor_id, then we’re
>> at the
>> mercy of KVM’s emulation and performance will suck.  On older
>> kernels, this
>> would cause hideous kernel crashes.  On new kernels, I would expect
>> it to
>> merely crash 32-bit user programs or be slow.
>
> Heh, I didn't know those details, so it looks like it's (2),
> vendor_id   : AuthenticAMD
> in guest.
>
>>
>> > What's worse than registers changes is that some selftests actually
>> > lead
>>
>> to
>> > Oops's. The same reason for criu-ia32 fails.
>> > I've tested so far v4.15 and v4.16 releases besides master
>> > (2c71d338bef2),
>> > so it looks to be not a recent regression.
>> > Full Oopses:
>> > [  189.100174] BUG: unable to handle kernel paging request at
>>
>> 417bafe8
>> > [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067
>> > PTE
>>
>> 6991f067
>> > [  189.100174] Oops: 0001 [#3] SMP NOPTI
>>
>> Whoa there!  0001 means a failed *kernel* access.
>>
>> > [  189.100174] Modules linked in:
>> > [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G
>>
>> Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?
>
> sysret_ss_attrs_32 survives
>
>>
>> > D   4.17.0-rc5+ #11
>> > [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX,
>> > 1996),
>> > BIOS 1.10.2-1.fc26 04/01/2014
>> > [  189.103187] RIP: 0033:0x40085a
>>
>> The oops was caused from CPL 3 at what looks like a totally sensible
>> user
>> address.  Can you disassemble the offending binary and tell me what
>> the
>> code at 0x40085a is?
>
> Here is the function:
> 00400842 :
>   400842:   53  push   %rbx
>   400843:   55  push   %rbp
>   400844:   41 54   push   %r12
>   400846:   41 55   push   %r13
>   400848:   41 56   push   %r14
>   40084a:   41 57   push   %r15
>   40084c:   9c  pushfq
>   40084d:   48 89 27mov%rsp,(%rdi)
>   400850:   48 89 fcmov%rdi,%rsp
>   400853:   6a 23   pushq  $0x23
>   400855:   68 5c 08 40 00  pushq  $0x40085c
>   40085a:   48 cb   lretq
>   40085c:   ff d6   callq  *%rsi
>   40085e:   ea  (bad)
>   40085f:   65 08 40 00 or %al,%gs:0x0(%rax)
>   400863:   33 00   xor(%rax),%eax
>   400865:   48 8b 24 24 mov(%rsp),%rsp
>   400869:   9d  popfq
>   40086a:   41 5f   pop%r15
>   40086c:   41 5e   pop%r14
>   40086e:   41 5d   pop%r13
>   400870:   41 5c   pop%r12
>   400872:   5d  pop%rbp
>   400873:   5b  pop%rbx
>   400874:   c3  retq
>   400875:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
>   40087c:   00 00 00
>   40087f:   90  nop
>
> Looks like mov between registers caused it? The hell.

Oh, it's not 400850, I missloked, but 40085a so lretq might case it.

>
>>
>> > [  189.103187] RSP: 002b:417bafe8 EFLAGS: 0206
>> > [  189.103187] RAX:  RBX: 03e8 RCX:
>>
>> 
>> > [  189.103187] RDX:  RSI: 00400830 RDI:
>>
>> 417baff8
>> > [  189.103187] RBP: 417baff8 R08:  R09:
>>
>> 0077
>> > [  189.103187] R10:

Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-18 Thread Dmitry Safonov

On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote:
> > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com>
> > cpu family: 6
> > model: 142
> > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
> > But I usually test kernels in VM. So, I use virt-manager as it's
> > easier to manage
> > multiple VMs. The thing is that I've chosen "Copy host CPU
> > configuration"
> > and for some reason, I don't quite follow virt-manager makes model
> 
> "Opteron_G4".
> > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-
> > 2.fc26).
> > So, cpuinfo in VM says:
> > cpu family: 21
> > model: 1
> > model name: AMD Opteron 62xx class CPU
> 
> What does guest cpuinfo say for vendor_id?
> 
> There are multiple potential screwups here.
> 
> 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
> that’s different from Intel’s, and the test case was definitely
> wrong. But
> KVM has no way to influence it.  Are you sure you’re using KVM and
> not QEMU
> TCG? Anyway, the IRET thing is minor compared to your other problems,
> so
> let’s try to fix them first.
> 
> 2. Compat fast syscalls are wildly different on AMD and Intel.
> Because of
> this issue, QEMU with KVM is supposed to always report the real
> vendor_id
> no matter -cpu asks for.  If we get the wrong vendor_id, then we’re
> at the
> mercy of KVM’s emulation and performance will suck.  On older
> kernels, this
> would cause hideous kernel crashes.  On new kernels, I would expect
> it to
> merely crash 32-bit user programs or be slow.

Heh, I didn't know those details, so it looks like it's (2),
vendor_id   : AuthenticAMD
in guest.

> 
> > What's worse than registers changes is that some selftests actually
> > lead
> 
> to
> > Oops's. The same reason for criu-ia32 fails.
> > I've tested so far v4.15 and v4.16 releases besides master
> > (2c71d338bef2),
> > so it looks to be not a recent regression.
> > Full Oopses:
> > [  189.100174] BUG: unable to handle kernel paging request at
> 
> 417bafe8
> > [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067
> > PTE
> 
> 6991f067
> > [  189.100174] Oops: 0001 [#3] SMP NOPTI
> 
> Whoa there!  0001 means a failed *kernel* access.
> 
> > [  189.100174] Modules linked in:
> > [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G
> 
> Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?

sysret_ss_attrs_32 survives

> 
> > D   4.17.0-rc5+ #11
> > [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996),
> > BIOS 1.10.2-1.fc26 04/01/2014
> > [  189.103187] RIP: 0033:0x40085a
> 
> The oops was caused from CPL 3 at what looks like a totally sensible
> user
> address.  Can you disassemble the offending binary and tell me what
> the
> code at 0x40085a is?

Here is the function:
00400842 :
  400842:   53  push   %rbx
  400843:   55  push   %rbp
  400844:   41 54   push   %r12
  400846:   41 55   push   %r13
  400848:   41 56   push   %r14
  40084a:   41 57   push   %r15
  40084c:   9c  pushfq 
  40084d:   48 89 27mov%rsp,(%rdi)
  400850:   48 89 fcmov%rdi,%rsp
  400853:   6a 23   pushq  $0x23
  400855:   68 5c 08 40 00  pushq  $0x40085c
  40085a:   48 cb   lretq  
  40085c:   ff d6   callq  *%rsi
  40085e:   ea  (bad)  
  40085f:   65 08 40 00 or %al,%gs:0x0(%rax)
  400863:   33 00   xor(%rax),%eax
  400865:   48 8b 24 24 mov(%rsp),%rsp
  400869:   9d  popfq  
  40086a:   41 5f   pop%r15
  40086c:   41 5e   pop%r14
  40086e:   41 5d   pop%r13
  400870:   41 5c   pop%r12
  400872:   5d  pop%rbp
  400873:   5b  pop%rbx
  400874:   c3  retq   
  400875:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
  40087c:   00 00 00 
  40087f:   90  nop

Looks like mov between registers caused it? The hell.

> 
> > [  189.103187] RSP: 002b:417bafe8 EFLAGS: 0206
> > [  189.103187] RAX:  RBX: 03e8 RCX:
> 
> 
> > [  189.103187] RDX:  RSI: 00400830 RDI:
> 
> 417baff8
> > [  189.103187] RBP: 417baff8 R08:  R09:
> 
> 0077
> > [  189.103187] R10: 0006 R11:  R12:
> 
> 417ba000
> > [  189.103187] R13: 7ffc05207840 R14:  R15:
> 
> 
> > [  189.103187] FS:  7f98566ecb40()
> > GS:9740ffc0()
> > knlGS:
> > [

Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-18 Thread Dmitry Safonov

On Fri, 2018-05-18 at 19:05 -0700, Andy Lutomirski wrote:
> > On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com>
> > cpu family: 6
> > model: 142
> > model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz
> > But I usually test kernels in VM. So, I use virt-manager as it's
> > easier to manage
> > multiple VMs. The thing is that I've chosen "Copy host CPU
> > configuration"
> > and for some reason, I don't quite follow virt-manager makes model
> 
> "Opteron_G4".
> > I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-
> > 2.fc26).
> > So, cpuinfo in VM says:
> > cpu family: 21
> > model: 1
> > model name: AMD Opteron 62xx class CPU
> 
> What does guest cpuinfo say for vendor_id?
> 
> There are multiple potential screwups here.
> 
> 1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
> that’s different from Intel’s, and the test case was definitely
> wrong. But
> KVM has no way to influence it.  Are you sure you’re using KVM and
> not QEMU
> TCG? Anyway, the IRET thing is minor compared to your other problems,
> so
> let’s try to fix them first.
> 
> 2. Compat fast syscalls are wildly different on AMD and Intel.
> Because of
> this issue, QEMU with KVM is supposed to always report the real
> vendor_id
> no matter -cpu asks for.  If we get the wrong vendor_id, then we’re
> at the
> mercy of KVM’s emulation and performance will suck.  On older
> kernels, this
> would cause hideous kernel crashes.  On new kernels, I would expect
> it to
> merely crash 32-bit user programs or be slow.

Heh, I didn't know those details, so it looks like it's (2),
vendor_id   : AuthenticAMD
in guest.

> 
> > What's worse than registers changes is that some selftests actually
> > lead
> 
> to
> > Oops's. The same reason for criu-ia32 fails.
> > I've tested so far v4.15 and v4.16 releases besides master
> > (2c71d338bef2),
> > so it looks to be not a recent regression.
> > Full Oopses:
> > [  189.100174] BUG: unable to handle kernel paging request at
> 
> 417bafe8
> > [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067
> > PTE
> 
> 6991f067
> > [  189.100174] Oops: 0001 [#3] SMP NOPTI
> 
> Whoa there!  0001 means a failed *kernel* access.
> 
> > [  189.100174] Modules linked in:
> > [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G
> 
> Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?

sysret_ss_attrs_32 survives

> 
> > D   4.17.0-rc5+ #11
> > [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996),
> > BIOS 1.10.2-1.fc26 04/01/2014
> > [  189.103187] RIP: 0033:0x40085a
> 
> The oops was caused from CPL 3 at what looks like a totally sensible
> user
> address.  Can you disassemble the offending binary and tell me what
> the
> code at 0x40085a is?

Here is the function:
00400842 :
  400842:   53  push   %rbx
  400843:   55  push   %rbp
  400844:   41 54   push   %r12
  400846:   41 55   push   %r13
  400848:   41 56   push   %r14
  40084a:   41 57   push   %r15
  40084c:   9c  pushfq 
  40084d:   48 89 27mov%rsp,(%rdi)
  400850:   48 89 fcmov%rdi,%rsp
  400853:   6a 23   pushq  $0x23
  400855:   68 5c 08 40 00  pushq  $0x40085c
  40085a:   48 cb   lretq  
  40085c:   ff d6   callq  *%rsi
  40085e:   ea  (bad)  
  40085f:   65 08 40 00 or %al,%gs:0x0(%rax)
  400863:   33 00   xor(%rax),%eax
  400865:   48 8b 24 24 mov(%rsp),%rsp
  400869:   9d  popfq  
  40086a:   41 5f   pop%r15
  40086c:   41 5e   pop%r14
  40086e:   41 5d   pop%r13
  400870:   41 5c   pop%r12
  400872:   5d  pop%rbp
  400873:   5b  pop%rbx
  400874:   c3  retq   
  400875:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
  40087c:   00 00 00 
  40087f:   90  nop

Looks like mov between registers caused it? The hell.

> 
> > [  189.103187] RSP: 002b:417bafe8 EFLAGS: 0206
> > [  189.103187] RAX:  RBX: 03e8 RCX:
> 
> 
> > [  189.103187] RDX:  RSI: 00400830 RDI:
> 
> 417baff8
> > [  189.103187] RBP: 417baff8 R08:  R09:
> 
> 0077
> > [  189.103187] R10: 0006 R11:  R12:
> 
> 417ba000
> > [  189.103187] R13: 7ffc05207840 R14:  R15:
> 
> 
> > [  189.103187] FS:  7f98566ecb40()
> > GS:9740ffc0()
> > knlGS:
> > [

Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-18 Thread Andy Lutomirski

> On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com> wrote:

> Hi Andy,

> 2018-05-18 23:03 GMT+01:00 Andy Lutomirski :
>>> On Thu, May 17, 2018 at 4:40 PM Dmitry Safonov  wrote:
>>> Some selftests are failing, but the same way as before the patch
>>> (ITOW, it's not regression):
>>> [root@localhost self]# grep FAIL out
>>> [FAIL]  Reg 1 mismatch: requested 0x0; got 0x3
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xff425aadc0de
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xff425aadc0de
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xff425aadc0de

>> Are you on AMD?  Can you try this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes=c88aa6d53840e48970c54f9ef70c79415033b32d

>> and give me a Tested-by if it fixes it for you?

> Sure.
> I'm on Intel actually:
> cpu family: 6
> model: 142
> model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz

> But I usually test kernels in VM. So, I use virt-manager as it's
> easier to manage
> multiple VMs. The thing is that I've chosen "Copy host CPU configuration"
> and for some reason, I don't quite follow virt-manager makes model
"Opteron_G4".
> I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26).
> So, cpuinfo in VM says:
> cpu family: 21
> model: 1
> model name: AMD Opteron 62xx class CPU

What does guest cpuinfo say for vendor_id?

There are multiple potential screwups here.

1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
that’s different from Intel’s, and the test case was definitely wrong. But
KVM has no way to influence it.  Are you sure you’re using KVM and not QEMU
TCG? Anyway, the IRET thing is minor compared to your other problems, so
let’s try to fix them first.

2. Compat fast syscalls are wildly different on AMD and Intel. Because of
this issue, QEMU with KVM is supposed to always report the real vendor_id
no matter -cpu asks for.  If we get the wrong vendor_id, then we’re at the
mercy of KVM’s emulation and performance will suck.  On older kernels, this
would cause hideous kernel crashes.  On new kernels, I would expect it to
merely crash 32-bit user programs or be slow.

> What's worse than registers changes is that some selftests actually lead
to
> Oops's. The same reason for criu-ia32 fails.
> I've tested so far v4.15 and v4.16 releases besides master (2c71d338bef2),
> so it looks to be not a recent regression.

> Full Oopses:
> [  189.100174] BUG: unable to handle kernel paging request at
417bafe8
> [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE
6991f067
> [  189.100174] Oops: 0001 [#3] SMP NOPTI

Whoa there!  0001 means a failed *kernel* access.

> [  189.100174] Modules linked in:
> [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G

Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?

> D   4.17.0-rc5+ #11
> [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-1.fc26 04/01/2014
> [  189.103187] RIP: 0033:0x40085a

The oops was caused from CPL 3 at what looks like a totally sensible user
address.  Can you disassemble the offending binary and tell me what the
code at 0x40085a is?

> [  189.103187] RSP: 002b:417bafe8 EFLAGS: 0206
> [  189.103187] RAX:  RBX: 03e8 RCX:

> [  189.103187] RDX:  RSI: 00400830 RDI:
417baff8
> [  189.103187] RBP: 417baff8 R08:  R09:
0077
> [  189.103187] R10: 0006 R11:  R12:
417ba000
> [  189.103187] R13: 7ffc05207840 R14:  R15:

> [  189.103187] FS:  7f98566ecb40() GS:9740ffc0()
> knlGS:
> [  189.103187] CS:  0010 DS:  ES:  CR0: 80050033

CS here is the value of CS that the *kernel* has, so 0x10 is normal.

> [  189.103187] CR2: 417bafe8 CR3: 69dc4000 CR4:
007406f0

CR2 is in user space.

So the big question is: what happened here?  Why did the CPU (or emulated
CPU) attempt a privileged access to a user address while running user code?

Re: [PATCH] x86/mm: Drop TS_COMPAT on 64-bit exec() syscall

2018-05-18 Thread Andy Lutomirski

> On May 18, 2018, at 4:10 PM, Dmitry Safonov <0x7f454...@gmail.com> wrote:

> Hi Andy,

> 2018-05-18 23:03 GMT+01:00 Andy Lutomirski :
>>> On Thu, May 17, 2018 at 4:40 PM Dmitry Safonov  wrote:
>>> Some selftests are failing, but the same way as before the patch
>>> (ITOW, it's not regression):
>>> [root@localhost self]# grep FAIL out
>>> [FAIL]  Reg 1 mismatch: requested 0x0; got 0x3
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xff425aadc0de
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xff425aadc0de
>>> [FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got
>>> 0xff425aadc0de

>> Are you on AMD?  Can you try this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes=c88aa6d53840e48970c54f9ef70c79415033b32d

>> and give me a Tested-by if it fixes it for you?

> Sure.
> I'm on Intel actually:
> cpu family: 6
> model: 142
> model name: Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz

> But I usually test kernels in VM. So, I use virt-manager as it's
> easier to manage
> multiple VMs. The thing is that I've chosen "Copy host CPU configuration"
> and for some reason, I don't quite follow virt-manager makes model
"Opteron_G4".
> I'm on Fedora 27, virt-manager 1.4.3, qemu 2.9.1(qemu-2.9.1-2.fc26).
> So, cpuinfo in VM says:
> cpu family: 21
> model: 1
> model name: AMD Opteron 62xx class CPU

What does guest cpuinfo say for vendor_id?

There are multiple potential screwups here.

1. (What I *thought* was going on) AMD CPUs have screwy IRET behavior
that’s different from Intel’s, and the test case was definitely wrong. But
KVM has no way to influence it.  Are you sure you’re using KVM and not QEMU
TCG? Anyway, the IRET thing is minor compared to your other problems, so
let’s try to fix them first.

2. Compat fast syscalls are wildly different on AMD and Intel. Because of
this issue, QEMU with KVM is supposed to always report the real vendor_id
no matter -cpu asks for.  If we get the wrong vendor_id, then we’re at the
mercy of KVM’s emulation and performance will suck.  On older kernels, this
would cause hideous kernel crashes.  On new kernels, I would expect it to
merely crash 32-bit user programs or be slow.

> What's worse than registers changes is that some selftests actually lead
to
> Oops's. The same reason for criu-ia32 fails.
> I've tested so far v4.15 and v4.16 releases besides master (2c71d338bef2),
> so it looks to be not a recent regression.

> Full Oopses:
> [  189.100174] BUG: unable to handle kernel paging request at
417bafe8
> [  189.100174] PGD 69ed4067 P4D 69ed4067 PUD 707fc067 PMD 6c535067 PTE
6991f067
> [  189.100174] Oops: 0001 [#3] SMP NOPTI

Whoa there!  0001 means a failed *kernel* access.

> [  189.100174] Modules linked in:
> [  189.100174] CPU: 0 PID: 2443 Comm: sysret_ss_attrs Tainted: G

Was this sysret_ss_attrs_32 or sysret_ss_attrs_64?

> D   4.17.0-rc5+ #11
> [  189.103187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-1.fc26 04/01/2014
> [  189.103187] RIP: 0033:0x40085a

The oops was caused from CPL 3 at what looks like a totally sensible user
address.  Can you disassemble the offending binary and tell me what the
code at 0x40085a is?

> [  189.103187] RSP: 002b:417bafe8 EFLAGS: 0206
> [  189.103187] RAX:  RBX: 03e8 RCX:

> [  189.103187] RDX:  RSI: 00400830 RDI:
417baff8
> [  189.103187] RBP: 417baff8 R08:  R09:
0077
> [  189.103187] R10: 0006 R11:  R12:
417ba000
> [  189.103187] R13: 7ffc05207840 R14:  R15:

> [  189.103187] FS:  7f98566ecb40() GS:9740ffc0()
> knlGS:
> [  189.103187] CS:  0010 DS:  ES:  CR0: 80050033

CS here is the value of CS that the *kernel* has, so 0x10 is normal.

> [  189.103187] CR2: 417bafe8 CR3: 69dc4000 CR4:
007406f0

CR2 is in user space.

So the big question is: what happened here?  Why did the CPU (or emulated
CPU) attempt a privileged access to a user address while running user code?

Re: [PATCH 03/18] printk: Convert pr_fmt from blank define to KBUILD_MODNAME

2018-05-18 Thread Joe Perches

On Fri, 2018-05-18 at 23:29 +0300, Andy Shevchenko wrote:
> On Fri, May 18, 2018 at 12:10 PM, Joe Perches  wrote:
> > On Fri, 2018-05-18 at 10:42 +0200, Petr Mladek wrote:
> > > On Thu 2018-05-10 08:45:29, Joe Perches wrote:
> > > [0.00] libftrace: ftrace: allocating 40753 entries in 160 pages
> > > [0.004008] apic: APIC: Switch to symmetric I/O mode setup
> > 
> > I believe the uppercase APIC: bit should be removed.
> 
> I disagree on lowering the case of some prefixes.
> 
> In the case when kernel crashes at boot time and only what you have is
> what on your screen it's much better to catch with different cases
> than if everything would be either upper or lower.

I don't parse.
My suggestion is to make "apic: " the only prefix.

ie:
---
 arch/x86/kernel/apic/apic.c | 100 +---
 1 file changed, 47 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 2aabd4cb0e3f..67dd4e9191da 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -14,6 +14,8 @@
  * Mikael Pettersson   :   PM converted to driver model.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -232,7 +234,7 @@ static int modern_apic(void)
  */
 static void __init apic_disable(void)
 {
-   pr_info("APIC: switched to apic NOOP\n");
+   pr_info("switched to apic NOOP\n");
apic = _noop;
 }
 
@@ -430,17 +432,13 @@ int setup_APIC_eilvt(u8 offset, u8 vector, u8 msg_type, 
u8 mask)
reserved = reserve_eilvt_offset(offset, new);
 
if (reserved != new) {
-   pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for "
-  "vector 0x%x, but the register is already in use for "
-  "vector 0x%x on another cpu\n",
+   pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for 
vector 0x%x, but the register is already in use for vector 0x%x on another 
cpu\n",
   smp_processor_id(), reg, offset, new, reserved);
return -EINVAL;
}
 
if (!eilvt_entry_is_changeable(old, new)) {
-   pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for "
-  "vector 0x%x, but the register is already in use for "
-  "vector 0x%x on this cpu\n",
+   pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for 
vector 0x%x, but the register is already in use for vector 0x%x on this cpu\n",
   smp_processor_id(), reg, offset, new, old);
return -EBUSY;
}
@@ -624,8 +622,8 @@ static void apic_check_deadline_errata(void)
return;
 
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
-   pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; "
-  "please update microcode to version: 0x%x (or later)\n", rev);
+   pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; please update 
microcode to version: 0x%x (or later)\n",
+  rev);
 }
 
 /*
@@ -769,14 +767,14 @@ calibrate_by_pmtimer(long deltapm, long *delta, long 
*deltatsc)
 
res = (((u64)deltapm) *  mult) >> 22;
do_div(res, 100);
-   pr_warning("APIC calibration not consistent "
-  "with PM-Timer: %ldms instead of 100ms\n",(long)res);
+   pr_warn("calibration not consistent with PM-Timer: %ldms instead of 
100ms\n",
+  (long)res);
 
/* Correct the lapic counter value */
res = (((u64)(*delta)) * pm_100ms);
do_div(res, deltapm);
-   pr_info("APIC delta adjusted to PM-Timer: "
-   "%lu (%ld)\n", (unsigned long)res, *delta);
+   pr_info("APIC delta adjusted to PM-Timer: %lu (%ld)\n",
+   (unsigned long)res, *delta);
*delta = (long)res;
 
/* Correct the tsc counter value */
@@ -893,7 +891,7 @@ static int __init calibrate_APIC_clock(void)
 */
if (lapic_timer_frequency < (100 / HZ)) {
local_irq_enable();
-   pr_warning("APIC frequency too slow, disabling apic timer\n");
+   pr_warn("frequency too slow, disabling apic timer\n");
return -1;
}
 
@@ -936,8 +934,8 @@ static int __init calibrate_APIC_clock(void)
local_irq_enable();
 
if (levt->features & CLOCK_EVT_FEAT_DUMMY) {
-   pr_warning("APIC timer disabled due to verification failure\n");
-   return -1;
+   pr_warn("timer disabled due to verification failure\n");
+   return -1;
}
 
return 0;
@@ -1010,8 +1008,8 @@ static void local_apic_timer_interrupt(void)
 * spurious.
 */
if (!evt->event_handler) {
-   pr_warning("Spurious LAPIC timer interrupt on cpu %d\n",
-  smp_processor_id());
+   pr_warn("Spurious LAPIC timer

Re: [PATCH 03/18] printk: Convert pr_fmt from blank define to KBUILD_MODNAME

2018-05-18 Thread Joe Perches

On Fri, 2018-05-18 at 23:29 +0300, Andy Shevchenko wrote:
> On Fri, May 18, 2018 at 12:10 PM, Joe Perches  wrote:
> > On Fri, 2018-05-18 at 10:42 +0200, Petr Mladek wrote:
> > > On Thu 2018-05-10 08:45:29, Joe Perches wrote:
> > > [0.00] libftrace: ftrace: allocating 40753 entries in 160 pages
> > > [0.004008] apic: APIC: Switch to symmetric I/O mode setup
> > 
> > I believe the uppercase APIC: bit should be removed.
> 
> I disagree on lowering the case of some prefixes.
> 
> In the case when kernel crashes at boot time and only what you have is
> what on your screen it's much better to catch with different cases
> than if everything would be either upper or lower.

I don't parse.
My suggestion is to make "apic: " the only prefix.

ie:
---
 arch/x86/kernel/apic/apic.c | 100 +---
 1 file changed, 47 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 2aabd4cb0e3f..67dd4e9191da 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -14,6 +14,8 @@
  * Mikael Pettersson   :   PM converted to driver model.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -232,7 +234,7 @@ static int modern_apic(void)
  */
 static void __init apic_disable(void)
 {
-   pr_info("APIC: switched to apic NOOP\n");
+   pr_info("switched to apic NOOP\n");
apic = _noop;
 }
 
@@ -430,17 +432,13 @@ int setup_APIC_eilvt(u8 offset, u8 vector, u8 msg_type, 
u8 mask)
reserved = reserve_eilvt_offset(offset, new);
 
if (reserved != new) {
-   pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for "
-  "vector 0x%x, but the register is already in use for "
-  "vector 0x%x on another cpu\n",
+   pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for 
vector 0x%x, but the register is already in use for vector 0x%x on another 
cpu\n",
   smp_processor_id(), reg, offset, new, reserved);
return -EINVAL;
}
 
if (!eilvt_entry_is_changeable(old, new)) {
-   pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for "
-  "vector 0x%x, but the register is already in use for "
-  "vector 0x%x on this cpu\n",
+   pr_err(FW_BUG "cpu %d, try to use APIC%lX (LVT offset %d) for 
vector 0x%x, but the register is already in use for vector 0x%x on this cpu\n",
   smp_processor_id(), reg, offset, new, old);
return -EBUSY;
}
@@ -624,8 +622,8 @@ static void apic_check_deadline_errata(void)
return;
 
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
-   pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; "
-  "please update microcode to version: 0x%x (or later)\n", rev);
+   pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; please update 
microcode to version: 0x%x (or later)\n",
+  rev);
 }
 
 /*
@@ -769,14 +767,14 @@ calibrate_by_pmtimer(long deltapm, long *delta, long 
*deltatsc)
 
res = (((u64)deltapm) *  mult) >> 22;
do_div(res, 100);
-   pr_warning("APIC calibration not consistent "
-  "with PM-Timer: %ldms instead of 100ms\n",(long)res);
+   pr_warn("calibration not consistent with PM-Timer: %ldms instead of 
100ms\n",
+  (long)res);
 
/* Correct the lapic counter value */
res = (((u64)(*delta)) * pm_100ms);
do_div(res, deltapm);
-   pr_info("APIC delta adjusted to PM-Timer: "
-   "%lu (%ld)\n", (unsigned long)res, *delta);
+   pr_info("APIC delta adjusted to PM-Timer: %lu (%ld)\n",
+   (unsigned long)res, *delta);
*delta = (long)res;
 
/* Correct the tsc counter value */
@@ -893,7 +891,7 @@ static int __init calibrate_APIC_clock(void)
 */
if (lapic_timer_frequency < (100 / HZ)) {
local_irq_enable();
-   pr_warning("APIC frequency too slow, disabling apic timer\n");
+   pr_warn("frequency too slow, disabling apic timer\n");
return -1;
}
 
@@ -936,8 +934,8 @@ static int __init calibrate_APIC_clock(void)
local_irq_enable();
 
if (levt->features & CLOCK_EVT_FEAT_DUMMY) {
-   pr_warning("APIC timer disabled due to verification failure\n");
-   return -1;
+   pr_warn("timer disabled due to verification failure\n");
+   return -1;
}
 
return 0;
@@ -1010,8 +1008,8 @@ static void local_apic_timer_interrupt(void)
 * spurious.
 */
if (!evt->event_handler) {
-   pr_warning("Spurious LAPIC timer interrupt on cpu %d\n",
-  smp_processor_id());
+   pr_warn("Spurious LAPIC timer interrupt on cpu

Re: [v0 2/2] cpufreq: qcom-fw: Add support for QCOM cpufreq FW driver

2018-05-18 Thread kbuild test robot

Hi Taniya,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.17-rc5 next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Taniya-Das/Add-support-for-QCOM-cpufreq-FW-driver/20180519-050902
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
linux-next
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/cpufreq/qcom-cpufreq-fw.c: In function 'qcom_cpufreq_fw_cpu_init':
>> drivers/cpufreq/qcom-cpufreq-fw.c:77:8: error: 'struct cpufreq_policy' has 
>> no member named 'table'
 policy->table = c->table;
   ^~
   drivers/cpufreq/qcom-cpufreq-fw.c: In function 'qcom_cpu_resources_init':
>> drivers/cpufreq/qcom-cpufreq-fw.c:187:37: error: passing argument 1 of 
>> 'platform_get_resource_byname' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
 res = platform_get_resource_byname(dev, IORESOURCE_MEM, "en_base");
^~~
   In file included from include/linux/of_device.h:6:0,
from include/linux/of_platform.h:12,
from drivers/cpufreq/qcom-cpufreq-fw.c:11:
   include/linux/platform_device.h:56:25: note: expected 'struct 
platform_device *' but argument is of type 'struct device *'
extern struct resource *platform_get_resource_byname(struct platform_device 
*,
^~~~
>> drivers/cpufreq/qcom-cpufreq-fw.c:187:6: error: incompatible types when 
>> assigning to type 'struct resource' from type 'struct resource *'
 res = platform_get_resource_byname(dev, IORESOURCE_MEM, "en_base");
 ^
>> drivers/cpufreq/qcom-cpufreq-fw.c:188:6: error: wrong type argument to unary 
>> exclamation mark
 if (!res) {
 ^
>> drivers/cpufreq/qcom-cpufreq-fw.c:193:33: error: invalid type argument of 
>> '->' (have 'struct resource')
 en_base = devm_ioremap(dev, res->start, resource_size(res));
^~
>> drivers/cpufreq/qcom-cpufreq-fw.c:193:56: error: incompatible type for 
>> argument 1 of 'resource_size'
 en_base = devm_ioremap(dev, res->start, resource_size(res));
   ^~~
   In file included from include/linux/of_address.h:4:0,
from drivers/cpufreq/qcom-cpufreq-fw.c:10:
   include/linux/ioport.h:196:31: note: expected 'const struct resource *' but 
argument is of type 'struct resource'
static inline resource_size_t resource_size(const struct resource *res)
  ^
   cc1: some warnings being treated as errors

vim +77 drivers/cpufreq/qcom-cpufreq-fw.c

  > 11  #include 
12  
13  #define INIT_RATE   3UL
14  #define XO_RATE 1920UL
15  #define LUT_MAX_ENTRIES 40U
16  #define CORE_COUNT_VAL(val) ((val & GENMASK(18, 16)) >> 16)
17  #define LUT_ROW_SIZE32
18  
19  struct cpufreq_qcom {
20  struct cpufreq_frequency_table *table;
21  struct device *dev;
22  void __iomem *perf_base;
23  void __iomem *lut_base;
24  cpumask_t related_cpus;
25  unsigned int max_cores;
26  };
27  
28  static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS];
29  
30  static int
31  qcom_cpufreq_fw_target_index(struct cpufreq_policy *policy, unsigned 
int index)
32  {
33  struct cpufreq_qcom *c = policy->driver_data;
34  
35  if (index >= LUT_MAX_ENTRIES) {
36  dev_err(c->dev,
37  "Passing an index (%u) that's greater than max (%d)\n",
38  index, LUT_MAX_ENTRIES - 1);
39  return -EINVAL;
40  }
41  
42  writel_relaxed(index, c->perf_base);
43  
44  /* Make sure the write goes through before proceeding */
45  mb();
46  return 0;
47  }
48  
49  static unsigned int qcom_cpufreq_fw_get(unsigned int cpu)
50  {
51  struct cpufreq_qcom *c;
52  unsigned int index;
53  
54  c = qcom_freq_domain_map[cpu];
55  if (!c)
56  return -ENODEV;
57  
58  index = readl_relaxed(c->perf_base);
59  index = min(index, LUT_MAX_ENTRIES - 1);
60  
61  return

Re: [v0 2/2] cpufreq: qcom-fw: Add support for QCOM cpufreq FW driver

2018-05-18 Thread kbuild test robot

Hi Taniya,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.17-rc5 next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Taniya-Das/Add-support-for-QCOM-cpufreq-FW-driver/20180519-050902
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
linux-next
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/cpufreq/qcom-cpufreq-fw.c: In function 'qcom_cpufreq_fw_cpu_init':
>> drivers/cpufreq/qcom-cpufreq-fw.c:77:8: error: 'struct cpufreq_policy' has 
>> no member named 'table'
 policy->table = c->table;
   ^~
   drivers/cpufreq/qcom-cpufreq-fw.c: In function 'qcom_cpu_resources_init':
>> drivers/cpufreq/qcom-cpufreq-fw.c:187:37: error: passing argument 1 of 
>> 'platform_get_resource_byname' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
 res = platform_get_resource_byname(dev, IORESOURCE_MEM, "en_base");
^~~
   In file included from include/linux/of_device.h:6:0,
from include/linux/of_platform.h:12,
from drivers/cpufreq/qcom-cpufreq-fw.c:11:
   include/linux/platform_device.h:56:25: note: expected 'struct 
platform_device *' but argument is of type 'struct device *'
extern struct resource *platform_get_resource_byname(struct platform_device 
*,
^~~~
>> drivers/cpufreq/qcom-cpufreq-fw.c:187:6: error: incompatible types when 
>> assigning to type 'struct resource' from type 'struct resource *'
 res = platform_get_resource_byname(dev, IORESOURCE_MEM, "en_base");
 ^
>> drivers/cpufreq/qcom-cpufreq-fw.c:188:6: error: wrong type argument to unary 
>> exclamation mark
 if (!res) {
 ^
>> drivers/cpufreq/qcom-cpufreq-fw.c:193:33: error: invalid type argument of 
>> '->' (have 'struct resource')
 en_base = devm_ioremap(dev, res->start, resource_size(res));
^~
>> drivers/cpufreq/qcom-cpufreq-fw.c:193:56: error: incompatible type for 
>> argument 1 of 'resource_size'
 en_base = devm_ioremap(dev, res->start, resource_size(res));
   ^~~
   In file included from include/linux/of_address.h:4:0,
from drivers/cpufreq/qcom-cpufreq-fw.c:10:
   include/linux/ioport.h:196:31: note: expected 'const struct resource *' but 
argument is of type 'struct resource'
static inline resource_size_t resource_size(const struct resource *res)
  ^
   cc1: some warnings being treated as errors

vim +77 drivers/cpufreq/qcom-cpufreq-fw.c

  > 11  #include 
12  
13  #define INIT_RATE   3UL
14  #define XO_RATE 1920UL
15  #define LUT_MAX_ENTRIES 40U
16  #define CORE_COUNT_VAL(val) ((val & GENMASK(18, 16)) >> 16)
17  #define LUT_ROW_SIZE32
18  
19  struct cpufreq_qcom {
20  struct cpufreq_frequency_table *table;
21  struct device *dev;
22  void __iomem *perf_base;
23  void __iomem *lut_base;
24  cpumask_t related_cpus;
25  unsigned int max_cores;
26  };
27  
28  static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS];
29  
30  static int
31  qcom_cpufreq_fw_target_index(struct cpufreq_policy *policy, unsigned 
int index)
32  {
33  struct cpufreq_qcom *c = policy->driver_data;
34  
35  if (index >= LUT_MAX_ENTRIES) {
36  dev_err(c->dev,
37  "Passing an index (%u) that's greater than max (%d)\n",
38  index, LUT_MAX_ENTRIES - 1);
39  return -EINVAL;
40  }
41  
42  writel_relaxed(index, c->perf_base);
43  
44  /* Make sure the write goes through before proceeding */
45  mb();
46  return 0;
47  }
48  
49  static unsigned int qcom_cpufreq_fw_get(unsigned int cpu)
50  {
51  struct cpufreq_qcom *c;
52  unsigned int index;
53  
54  c = qcom_freq_domain_map[cpu];
55  if (!c)
56  return -ENODEV;
57  
58  index = readl_relaxed(c->perf_base);
59  index = min(index, LUT_MAX_ENTRIES - 1);
60  
61  return

MANAGER AUDIT AND ACCOUNTING DEPARTMENT

2018-05-18 Thread Mr.Nelson Capper

FROM MR NELSON  CAPPER
MANAGER AUDIT AND ACCOUNTING DEPARTMENT
KOREA DEVELOPMENT BANK (K.D.B) LONDON U.K

PLEASE READ CAREFULLY

This message might meet you in utmost surprise. However, it’s just my
urgent need for foreign partner that made me to contact you for this
transaction.

I am Mr, Nelson Capper Manager Audit Accounting Department Korea
Development Bank KDB London U.K I would like to know if this proposal
will be worthwhile for your acceptance.

I have a Foreign Customer, Manfred Hoffman from Germany who was an
Investor, Crude Oil Merchant and Federal Government Contractor he was
a victim with Concord Air Line, flight AF4590 killing 113 people
crashed on 25 July 2000 near Paris leaving a closing balance of Ten
Million Five Hundred Thousand United States Dollars ($10.5m) in one of
his Private US Dollar Account that was been managed by me as the
Customer’s Account Officer.

Base on my security report, these funds can be claim without any
hitches as no one is aware of the funds and it’s closing balance
except me and the customer who is (Now Deceased) therefore, I can
present you as the Next of Kin and we will work out the modalities for
the claiming of the funds in
accordance with the law.

Now, if you are really sure of your trustworthy, accountability and
confidentiality on this transaction without disappointment, contact me
through email for further details. i expect your letter.After your
reply I shall give you the details and procedures of the transaction.

Regards

Mr, Nelson Capper

MANAGER AUDIT AND ACCOUNTING DEPARTMENT

2018-05-18 Thread Mr.Nelson Capper

FROM MR NELSON  CAPPER
MANAGER AUDIT AND ACCOUNTING DEPARTMENT
KOREA DEVELOPMENT BANK (K.D.B) LONDON U.K

PLEASE READ CAREFULLY

This message might meet you in utmost surprise. However, it’s just my
urgent need for foreign partner that made me to contact you for this
transaction.

I am Mr, Nelson Capper Manager Audit Accounting Department Korea
Development Bank KDB London U.K I would like to know if this proposal
will be worthwhile for your acceptance.

I have a Foreign Customer, Manfred Hoffman from Germany who was an
Investor, Crude Oil Merchant and Federal Government Contractor he was
a victim with Concord Air Line, flight AF4590 killing 113 people
crashed on 25 July 2000 near Paris leaving a closing balance of Ten
Million Five Hundred Thousand United States Dollars ($10.5m) in one of
his Private US Dollar Account that was been managed by me as the
Customer’s Account Officer.

Base on my security report, these funds can be claim without any
hitches as no one is aware of the funds and it’s closing balance
except me and the customer who is (Now Deceased) therefore, I can
present you as the Next of Kin and we will work out the modalities for
the claiming of the funds in
accordance with the law.

Now, if you are really sure of your trustworthy, accountability and
confidentiality on this transaction without disappointment, contact me
through email for further details. i expect your letter.After your
reply I shall give you the details and procedures of the transaction.

Regards

Mr, Nelson Capper

Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option

2018-05-18 Thread Tom Talpey

On 5/18/2018 1:58 PM, Long Li wrote:

Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option

On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical  wrote:

Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option

On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:

From: Long Li 

When cache=rdma is enabled on mount options, CIFS do not allocate
internal data buffer pages for I/O, data is read/writen directly to
user

memory via RDMA.

I don't think this should be an option.  For direct I/O without
signing or encryption CIFS should always use get_user_pages, with or

without RDMA.

Yes this should be done for all transport. If there are no objections, I'll send

patches to change this.

Would this help/change performance much?

On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O 
patterns.

But I haven't tested on TCP. Maybe it will help a little bit.

Well, when the application requests direct i/o on a TCP connection,
you definitely don't want to cache it! So even if the performance
is different, correctness would dictate doing this.

You probably don't need to pin the buffer in the TCP case, which
might be worth avoiding.

Tom.

Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option

2018-05-18 Thread Tom Talpey

On 5/18/2018 1:58 PM, Long Li wrote:

Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option

On Fri, May 18, 2018 at 12:00 PM, Long Li via samba-technical  wrote:

Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option

On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:

From: Long Li 

When cache=rdma is enabled on mount options, CIFS do not allocate
internal data buffer pages for I/O, data is read/writen directly to
user

memory via RDMA.

I don't think this should be an option.  For direct I/O without
signing or encryption CIFS should always use get_user_pages, with or

without RDMA.

Yes this should be done for all transport. If there are no objections, I'll send

patches to change this.

Would this help/change performance much?

On RDMA, it helps with I/O latency and reduces CPU usage on certain I/O 
patterns.

But I haven't tested on TCP. Maybe it will help a little bit.

Well, when the application requests direct i/o on a TCP connection,
you definitely don't want to cache it! So even if the performance
is different, correctness would dictate doing this.

You probably don't need to pin the buffer in the TCP case, which
might be worth avoiding.

Tom.

[PATCH v3] usbip: vhci_sysfs: fix potential Spectre v1

2018-05-18 Thread Gustavo A. R. Silva

pdev_nr and rhport can be controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:
drivers/usb/usbip/vhci_sysfs.c:238 detach_store() warn: potential spectre issue 
'vhcis'
drivers/usb/usbip/vhci_sysfs.c:328 attach_store() warn: potential spectre issue 
'vhcis'
drivers/usb/usbip/vhci_sysfs.c:338 attach_store() warn: potential spectre issue 
'vhci->vhci_hcd_ss->vdev'
drivers/usb/usbip/vhci_sysfs.c:340 attach_store() warn: potential spectre issue 
'vhci->vhci_hcd_hs->vdev'

Fix this by sanitizing pdev_nr and rhport before using them to index
vhcis and vhci->vhci_hcd_ss->vdev respectively.

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel=152449131114778=2

Cc: sta...@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva 
---
Changes in v3:
 - Pass the addresses of pdev_nr and rhport into valid_port and
   valid_args.

Changes in v2:
 - Place the barriers into valid_port.

 drivers/usb/usbip/vhci_sysfs.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
index 4880838..be37aec 100644
--- a/drivers/usb/usbip/vhci_sysfs.c
+++ b/drivers/usb/usbip/vhci_sysfs.c
@@ -10,6 +10,9 @@
 #include 
 #include 
 
+/* Hardening for Spectre-v1 */
+#include 
+
 #include "usbip_common.h"
 #include "vhci.h"
 
@@ -205,16 +208,20 @@ static int vhci_port_disconnect(struct vhci_hcd 
*vhci_hcd, __u32 rhport)
return 0;
 }
 
-static int valid_port(__u32 pdev_nr, __u32 rhport)
+static int valid_port(__u32 *pdev_nr, __u32 *rhport)
 {
-   if (pdev_nr >= vhci_num_controllers) {
-   pr_err("pdev %u\n", pdev_nr);
+   if (*pdev_nr >= vhci_num_controllers) {
+   pr_err("pdev %u\n", *pdev_nr);
return 0;
}
-   if (rhport >= VHCI_HC_PORTS) {
-   pr_err("rhport %u\n", rhport);
+   *pdev_nr = array_index_nospec(*pdev_nr, vhci_num_controllers);
+
+   if (*rhport >= VHCI_HC_PORTS) {
+   pr_err("rhport %u\n", *rhport);
return 0;
}
+   *rhport = array_index_nospec(*rhport, VHCI_HC_PORTS);
+
return 1;
 }
 
@@ -232,7 +239,7 @@ static ssize_t detach_store(struct device *dev, struct 
device_attribute *attr,
pdev_nr = port_to_pdev_nr(port);
rhport = port_to_rhport(port);
 
-   if (!valid_port(pdev_nr, rhport))
+   if (!valid_port(_nr, ))
return -EINVAL;
 
hcd = platform_get_drvdata(vhcis[pdev_nr].pdev);
@@ -258,7 +265,8 @@ static ssize_t detach_store(struct device *dev, struct 
device_attribute *attr,
 }
 static DEVICE_ATTR_WO(detach);
 
-static int valid_args(__u32 pdev_nr, __u32 rhport, enum usb_device_speed speed)
+static int valid_args(__u32 *pdev_nr, __u32 *rhport,
+ enum usb_device_speed speed)
 {
if (!valid_port(pdev_nr, rhport)) {
return 0;
@@ -322,7 +330,7 @@ static ssize_t attach_store(struct device *dev, struct 
device_attribute *attr,
 sockfd, devid, speed);
 
/* check received parameters */
-   if (!valid_args(pdev_nr, rhport, speed))
+   if (!valid_args(_nr, , speed))
return -EINVAL;
 
hcd = platform_get_drvdata(vhcis[pdev_nr].pdev);
-- 
2.7.4

[PATCH v3] usbip: vhci_sysfs: fix potential Spectre v1

2018-05-18 Thread Gustavo A. R. Silva

pdev_nr and rhport can be controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:
drivers/usb/usbip/vhci_sysfs.c:238 detach_store() warn: potential spectre issue 
'vhcis'
drivers/usb/usbip/vhci_sysfs.c:328 attach_store() warn: potential spectre issue 
'vhcis'
drivers/usb/usbip/vhci_sysfs.c:338 attach_store() warn: potential spectre issue 
'vhci->vhci_hcd_ss->vdev'
drivers/usb/usbip/vhci_sysfs.c:340 attach_store() warn: potential spectre issue 
'vhci->vhci_hcd_hs->vdev'

Fix this by sanitizing pdev_nr and rhport before using them to index
vhcis and vhci->vhci_hcd_ss->vdev respectively.

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel=152449131114778=2

Cc: sta...@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva 
---
Changes in v3:
 - Pass the addresses of pdev_nr and rhport into valid_port and
   valid_args.

Changes in v2:
 - Place the barriers into valid_port.

 drivers/usb/usbip/vhci_sysfs.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c
index 4880838..be37aec 100644
--- a/drivers/usb/usbip/vhci_sysfs.c
+++ b/drivers/usb/usbip/vhci_sysfs.c
@@ -10,6 +10,9 @@
 #include 
 #include 
 
+/* Hardening for Spectre-v1 */
+#include 
+
 #include "usbip_common.h"
 #include "vhci.h"
 
@@ -205,16 +208,20 @@ static int vhci_port_disconnect(struct vhci_hcd 
*vhci_hcd, __u32 rhport)
return 0;
 }
 
-static int valid_port(__u32 pdev_nr, __u32 rhport)
+static int valid_port(__u32 *pdev_nr, __u32 *rhport)
 {
-   if (pdev_nr >= vhci_num_controllers) {
-   pr_err("pdev %u\n", pdev_nr);
+   if (*pdev_nr >= vhci_num_controllers) {
+   pr_err("pdev %u\n", *pdev_nr);
return 0;
}
-   if (rhport >= VHCI_HC_PORTS) {
-   pr_err("rhport %u\n", rhport);
+   *pdev_nr = array_index_nospec(*pdev_nr, vhci_num_controllers);
+
+   if (*rhport >= VHCI_HC_PORTS) {
+   pr_err("rhport %u\n", *rhport);
return 0;
}
+   *rhport = array_index_nospec(*rhport, VHCI_HC_PORTS);
+
return 1;
 }
 
@@ -232,7 +239,7 @@ static ssize_t detach_store(struct device *dev, struct 
device_attribute *attr,
pdev_nr = port_to_pdev_nr(port);
rhport = port_to_rhport(port);
 
-   if (!valid_port(pdev_nr, rhport))
+   if (!valid_port(_nr, ))
return -EINVAL;
 
hcd = platform_get_drvdata(vhcis[pdev_nr].pdev);
@@ -258,7 +265,8 @@ static ssize_t detach_store(struct device *dev, struct 
device_attribute *attr,
 }
 static DEVICE_ATTR_WO(detach);
 
-static int valid_args(__u32 pdev_nr, __u32 rhport, enum usb_device_speed speed)
+static int valid_args(__u32 *pdev_nr, __u32 *rhport,
+ enum usb_device_speed speed)
 {
if (!valid_port(pdev_nr, rhport)) {
return 0;
@@ -322,7 +330,7 @@ static ssize_t attach_store(struct device *dev, struct 
device_attribute *attr,
 sockfd, devid, speed);
 
/* check received parameters */
-   if (!valid_args(pdev_nr, rhport, speed))
+   if (!valid_args(_nr, , speed))
return -EINVAL;
 
hcd = platform_get_drvdata(vhcis[pdev_nr].pdev);
-- 
2.7.4

Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Jason Wang




On 2018年05月18日 22:33, Tiwei Bie wrote:

On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:

On 2018年05月18日 19:29, Tiwei Bie wrote:

On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:

On 2018年05月16日 22:33, Tiwei Bie wrote:

On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:

On 2018年05月16日 21:45, Tiwei Bie wrote:

On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:

On 2018年05月16日 20:39, Tiwei Bie wrote:

On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:

On 2018年05月16日 16:37, Tiwei Bie wrote:

[...]

+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+   struct vring_packed_desc *desc;
+   unsigned int i, j;
+
+   /* Clear data ptr. */
+   vq->desc_state[id].data = NULL;
+
+   i = head;
+
+   for (j = 0; j < vq->desc_state[id].num; j++) {
+   desc = >vring_packed.desc[i];
+   vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the case
of out of order completion since it depends on the information in the
descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.

Best regards,
Tiwei Bie

Yes, but desc is a pointer to descriptor ring I think so
vring_unmap_one_packed() still depends on the content of descriptor ring?


I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleanup the descriptors
no matter whether we support out-of-order or not.

Maybe I was wrong, but I remember spec mentioned something like this.

You're right. Spec mentioned this. I was just repeating
the spec to emphasize that it does make sense. :)


But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?

To make it work for OOO backends I think we need something like this
(hardware NIC drivers are usually have something like this).

Which hardware NIC drivers have this?

It's quite common I think, e.g driver track e.g dma addr and page frag
somewhere. e.g the ring->rx_info in mlx4 driver.

It seems that I had a misunderstanding on your
previous comments. I know it's quite common for
drivers to track e.g. DMA addrs somewhere (and
I think one reason behind this is that they want
to reuse the bits of addr field).

Yes, we may want this for virtio-net as well in the future.


   But tracking
addrs somewhere doesn't means supporting OOO.
I thought you were saying it's quite common for
hardware NIC drivers to support OOO (i.e. NICs
will return the descriptors OOO):

I'm not familiar with mlx4, maybe I'm wrong.
I just had a quick glance. And I found below
comments in mlx4_en_process_rx_cq():

```
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
   * descriptor offset can be deduced from the CQE index instead of
   * reading 'cqe->index' */
index = cq->mcq.cons_index & ring->size_mask;
cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
```

It seems that although they have a completion
queue, they are still using the ring in order.

I guess so (at least from the above bits). Git grep -i "out of order" in
drivers/net gives some hints. Looks like there're few deivces do this.


I guess maybe storage device may want OOO.

Right, some iSCSI did.

But tracking them elsewhere is not only for OOO.

Spec said:

for element address

"
In a used descriptor, Element Address is unused.
"

for Next flag:

"
For example, if descriptors are used in the same order in which they are
made available, this will result in
the used descriptor overwriting the first available descriptor in the list,
the used descriptor for the next list
overwriting the first available descriptor in the next list, etc.
"

for in order completion:

"
This will result in the used descriptor overwriting the first available
descriptor in the batch, the used descriptor
for the next batch overwriting the first available descriptor in the next
batch, etc.
"

So:

- It's an alignment to the spec
- device may (or should) overwrite the descriptor make also make address
field useless.

You didn't get my point...


I don't hope so.


I agreed driver should track the DMA addrs or some
other necessary things from the very beginning. And
I also repeated the spec to emphasize that it does
make sense. And I'd like to do that.

What I was saying is that, to support OOO, we may
need to manage these context (which saves DMA addrs
etc) via a list which is similar to the desc list
maintained via `next` in split ring instead of an
array whose elements always

Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-18 Thread Jason Wang




On 2018年05月18日 22:33, Tiwei Bie wrote:

On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:

On 2018年05月18日 19:29, Tiwei Bie wrote:

On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:

On 2018年05月16日 22:33, Tiwei Bie wrote:

On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:

On 2018年05月16日 21:45, Tiwei Bie wrote:

On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:

On 2018年05月16日 20:39, Tiwei Bie wrote:

On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:

On 2018年05月16日 16:37, Tiwei Bie wrote:

[...]

+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+   struct vring_packed_desc *desc;
+   unsigned int i, j;
+
+   /* Clear data ptr. */
+   vq->desc_state[id].data = NULL;
+
+   i = head;
+
+   for (j = 0; j < vq->desc_state[id].num; j++) {
+   desc = >vring_packed.desc[i];
+   vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the case
of out of order completion since it depends on the information in the
descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.

Best regards,
Tiwei Bie

Yes, but desc is a pointer to descriptor ring I think so
vring_unmap_one_packed() still depends on the content of descriptor ring?


I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleanup the descriptors
no matter whether we support out-of-order or not.

Maybe I was wrong, but I remember spec mentioned something like this.

You're right. Spec mentioned this. I was just repeating
the spec to emphasize that it does make sense. :)


But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?

To make it work for OOO backends I think we need something like this
(hardware NIC drivers are usually have something like this).

Which hardware NIC drivers have this?

It's quite common I think, e.g driver track e.g dma addr and page frag
somewhere. e.g the ring->rx_info in mlx4 driver.

It seems that I had a misunderstanding on your
previous comments. I know it's quite common for
drivers to track e.g. DMA addrs somewhere (and
I think one reason behind this is that they want
to reuse the bits of addr field).

Yes, we may want this for virtio-net as well in the future.


   But tracking
addrs somewhere doesn't means supporting OOO.
I thought you were saying it's quite common for
hardware NIC drivers to support OOO (i.e. NICs
will return the descriptors OOO):

I'm not familiar with mlx4, maybe I'm wrong.
I just had a quick glance. And I found below
comments in mlx4_en_process_rx_cq():

```
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
   * descriptor offset can be deduced from the CQE index instead of
   * reading 'cqe->index' */
index = cq->mcq.cons_index & ring->size_mask;
cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
```

It seems that although they have a completion
queue, they are still using the ring in order.

I guess so (at least from the above bits). Git grep -i "out of order" in
drivers/net gives some hints. Looks like there're few deivces do this.


I guess maybe storage device may want OOO.

Right, some iSCSI did.

But tracking them elsewhere is not only for OOO.

Spec said:

for element address

"
In a used descriptor, Element Address is unused.
"

for Next flag:

"
For example, if descriptors are used in the same order in which they are
made available, this will result in
the used descriptor overwriting the first available descriptor in the list,
the used descriptor for the next list
overwriting the first available descriptor in the next list, etc.
"

for in order completion:

"
This will result in the used descriptor overwriting the first available
descriptor in the batch, the used descriptor
for the next batch overwriting the first available descriptor in the next
batch, etc.
"

So:

- It's an alignment to the spec
- device may (or should) overwrite the descriptor make also make address
field useless.

You didn't get my point...


I don't hope so.


I agreed driver should track the DMA addrs or some
other necessary things from the very beginning. And
I also repeated the spec to emphasize that it does
make sense. And I'd like to do that.

What I was saying is that, to support OOO, we may
need to manage these context (which saves DMA addrs
etc) via a list which is similar to the desc list
maintained via `next` in split ring instead of an
array whose elements always

Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page

2018-05-18 Thread Tom Talpey


On 5/17/2018 5:22 PM, Long Li wrote:

From: Long Li 


There's a typo "recognize" in the patch title



When doing RDMA send, the offset needs to be checked as data may start in an 
offset
in the 1st page.


Doesn't this patch alter the generic smb2pdu.c code too? I think this
should note "any" send, not just RDMA?

Tom.



Signed-off-by: Long Li 
---
  fs/cifs/smb2pdu.c   |  3 ++-
  fs/cifs/smbdirect.c | 25 +++--
  2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 5097f28..fdcf97e 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
  
  	rqst.rq_iov = iov;

rqst.rq_nvec = 2;
-   rqst.rq_pages = wdata->pages;
+   rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages : 
wdata->pages;
+   rqst.rq_offset = wdata->page_offset;
rqst.rq_npages = wdata->nr_pages;
rqst.rq_pagesz = wdata->pagesz;
rqst.rq_tailsz = wdata->tailsz;
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index b0a1955..b46586d 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info, struct 
smb_rqst *rqst)
  
  	/* add in the page array if there is one */

if (rqst->rq_npages) {
-   buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
-   buflen += rqst->rq_tailsz;
+   if (rqst->rq_npages == 1)
+   buflen += rqst->rq_tailsz;
+   else
+   buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - 
rqst->rq_offset + rqst->rq_tailsz;
}
  
  	if (buflen + sizeof(struct smbd_data_transfer) >

@@ -2182,8 +2184,19 @@ int smbd_send(struct smbd_connection *info, struct 
smb_rqst *rqst)
  
  	/* now sending pages if there are any */

for (i = 0; i < rqst->rq_npages; i++) {
-   buflen = (i == rqst->rq_npages-1) ?
-   rqst->rq_tailsz : rqst->rq_pagesz;
+   unsigned int offset = 0;
+   if (i == 0)
+   offset = rqst->rq_offset;
+   if (rqst->rq_npages == 1 || i == rqst->rq_npages-1)
+   buflen = rqst->rq_tailsz;
+   else {
+   /* We have at least two pages, and this is not the last 
page */
+   if (i == 0)
+   buflen = rqst->rq_pagesz - rqst->rq_offset;
+   else
+   buflen = rqst->rq_pagesz;
+   }
+
nvecs = (buflen + max_iov_size - 1) / max_iov_size;
log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
buflen, nvecs);
@@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info, struct 
smb_rqst *rqst)
remaining_data_length -= size;
log_write(INFO, "sending pages i=%d offset=%d size=%d"
" remaining_data_length=%d\n",
-   i, j*max_iov_size, size, remaining_data_length);
+   i, j*max_iov_size+offset, size, 
remaining_data_length);
rc = smbd_post_send_page(
-   info, rqst->rq_pages[i], j*max_iov_size,
+   info, rqst->rq_pages[i], j*max_iov_size + 
offset,
size, remaining_data_length);
if (rc)
goto done;

Re: [RFC PATCH 05/09] Change RDMA send to regonize page offset in the 1st page

2018-05-18 Thread Tom Talpey


On 5/17/2018 5:22 PM, Long Li wrote:

From: Long Li 


There's a typo "recognize" in the patch title



When doing RDMA send, the offset needs to be checked as data may start in an 
offset
in the 1st page.


Doesn't this patch alter the generic smb2pdu.c code too? I think this
should note "any" send, not just RDMA?

Tom.



Signed-off-by: Long Li 
---
  fs/cifs/smb2pdu.c   |  3 ++-
  fs/cifs/smbdirect.c | 25 +++--
  2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 5097f28..fdcf97e 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -3015,7 +3015,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
  
  	rqst.rq_iov = iov;

rqst.rq_nvec = 2;
-   rqst.rq_pages = wdata->pages;
+   rqst.rq_pages = wdata->direct_pages ? wdata->direct_pages : 
wdata->pages;
+   rqst.rq_offset = wdata->page_offset;
rqst.rq_npages = wdata->nr_pages;
rqst.rq_pagesz = wdata->pagesz;
rqst.rq_tailsz = wdata->tailsz;
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index b0a1955..b46586d 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2084,8 +2084,10 @@ int smbd_send(struct smbd_connection *info, struct 
smb_rqst *rqst)
  
  	/* add in the page array if there is one */

if (rqst->rq_npages) {
-   buflen += rqst->rq_pagesz * (rqst->rq_npages - 1);
-   buflen += rqst->rq_tailsz;
+   if (rqst->rq_npages == 1)
+   buflen += rqst->rq_tailsz;
+   else
+   buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) - 
rqst->rq_offset + rqst->rq_tailsz;
}
  
  	if (buflen + sizeof(struct smbd_data_transfer) >

@@ -2182,8 +2184,19 @@ int smbd_send(struct smbd_connection *info, struct 
smb_rqst *rqst)
  
  	/* now sending pages if there are any */

for (i = 0; i < rqst->rq_npages; i++) {
-   buflen = (i == rqst->rq_npages-1) ?
-   rqst->rq_tailsz : rqst->rq_pagesz;
+   unsigned int offset = 0;
+   if (i == 0)
+   offset = rqst->rq_offset;
+   if (rqst->rq_npages == 1 || i == rqst->rq_npages-1)
+   buflen = rqst->rq_tailsz;
+   else {
+   /* We have at least two pages, and this is not the last 
page */
+   if (i == 0)
+   buflen = rqst->rq_pagesz - rqst->rq_offset;
+   else
+   buflen = rqst->rq_pagesz;
+   }
+
nvecs = (buflen + max_iov_size - 1) / max_iov_size;
log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
buflen, nvecs);
@@ -2194,9 +2207,9 @@ int smbd_send(struct smbd_connection *info, struct 
smb_rqst *rqst)
remaining_data_length -= size;
log_write(INFO, "sending pages i=%d offset=%d size=%d"
" remaining_data_length=%d\n",
-   i, j*max_iov_size, size, remaining_data_length);
+   i, j*max_iov_size+offset, size, 
remaining_data_length);
rc = smbd_post_send_page(
-   info, rqst->rq_pages[i], j*max_iov_size,
+   info, rqst->rq_pages[i], j*max_iov_size + 
offset,
size, remaining_data_length);
if (rc)
goto done;

Re: [PATCH net] tuntap: raise EPOLLOUT on device up

2018-05-18 Thread Jason Wang




On 2018年05月18日 22:46, Michael S. Tsirkin wrote:

On Fri, May 18, 2018 at 10:11:54PM +0800, Jason Wang wrote:


On 2018年05月18日 22:06, Michael S. Tsirkin wrote:

On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:

On 2018年05月18日 21:26, Jason Wang wrote:

On 2018年05月18日 21:13, Michael S. Tsirkin wrote:

On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:

We return -EIO on device down but can not raise EPOLLOUT after it was
up. This may confuse user like vhost which expects tuntap to raise
EPOLLOUT to re-enable its TX routine after tuntap is down. This could
be easily reproduced by transmitting packets from VM while down and up
the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.

Cc: Hannes Frederic Sowa 
Cc: Eric Dumazet 
Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
Signed-off-by: Jason Wang 
---
    drivers/net/tun.c | 4 +++-
    1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d45ac37..1b29761 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
tun_struct *tun, struct tun_file *tfile,
    int skb_xdp = 1;
    bool frags = tun_napi_frags_enabled(tun);
    -    if (!(tun->dev->flags & IFF_UP))
+    if (!(tun->dev->flags & IFF_UP)) {

Isn't this racy?  What if flag is cleared at this point?

I think you mean "set at this point"? Then yes, so we probably need to
set the bit during tun_net_close().

Thanks

Looks no need, vhost will poll socket after it see EIO. So we are ok here?

Thanks

In fact I don't even understand why does this help any longer.


We disable tx polling and only enable it on demand for a better rx
performance. You may want to have a look at :

commit feb8892cb441c742d4220cf7ced001e7fa070731
Author: Jason Wang 
Date:   Mon Nov 13 11:45:34 2017 +0800

     vhost_net: conditionally enable tx polling

Thanks


Question is, what looks at SOCKWQ_ASYNC_NOSPACE.
I think it's tested when packet is transmitted,
but there is no guarantee here any packet will
ever be transmitted.



Well, actually, I do plan to disable vq polling from the beginning. But 
looks like you do not want this:


See https://patchwork.kernel.org/patch/10034025/

Thanks

Re: [PATCH net] tuntap: raise EPOLLOUT on device up

2018-05-18 Thread Jason Wang




On 2018年05月18日 22:46, Michael S. Tsirkin wrote:

On Fri, May 18, 2018 at 10:11:54PM +0800, Jason Wang wrote:


On 2018年05月18日 22:06, Michael S. Tsirkin wrote:

On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:

On 2018年05月18日 21:26, Jason Wang wrote:

On 2018年05月18日 21:13, Michael S. Tsirkin wrote:

On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:

We return -EIO on device down but can not raise EPOLLOUT after it was
up. This may confuse user like vhost which expects tuntap to raise
EPOLLOUT to re-enable its TX routine after tuntap is down. This could
be easily reproduced by transmitting packets from VM while down and up
the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.

Cc: Hannes Frederic Sowa 
Cc: Eric Dumazet 
Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
Signed-off-by: Jason Wang 
---
    drivers/net/tun.c | 4 +++-
    1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d45ac37..1b29761 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
tun_struct *tun, struct tun_file *tfile,
    int skb_xdp = 1;
    bool frags = tun_napi_frags_enabled(tun);
    -    if (!(tun->dev->flags & IFF_UP))
+    if (!(tun->dev->flags & IFF_UP)) {

Isn't this racy?  What if flag is cleared at this point?

I think you mean "set at this point"? Then yes, so we probably need to
set the bit during tun_net_close().

Thanks

Looks no need, vhost will poll socket after it see EIO. So we are ok here?

Thanks

In fact I don't even understand why does this help any longer.


We disable tx polling and only enable it on demand for a better rx
performance. You may want to have a look at :

commit feb8892cb441c742d4220cf7ced001e7fa070731
Author: Jason Wang 
Date:   Mon Nov 13 11:45:34 2017 +0800

     vhost_net: conditionally enable tx polling

Thanks


Question is, what looks at SOCKWQ_ASYNC_NOSPACE.
I think it's tested when packet is transmitted,
but there is no guarantee here any packet will
ever be transmitted.



Well, actually, I do plan to disable vq polling from the beginning. But 
looks like you do not want this:


See https://patchwork.kernel.org/patch/10034025/

Thanks

[PATCH v1] gpu: host1x: Skip IOMMU initialization if firewall is enabled

2018-05-18 Thread Dmitry Osipenko

Host1x's CDMA can't access the command buffers if IOMMU and Host1x
firewall are enabled in the kernels config because firewall doesn't map
the copied buffer into IOVA space. Fix this by skipping IOMMU
initialization if firewall is enabled as firewall merges sparse cmdbufs
into a single contiguous buffer and hence IOMMU isn't needed in this case.

Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/host1x/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index f1d5f76e9c33..d88073e7d22d 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -218,6 +218,9 @@ static int host1x_probe(struct platform_device *pdev)
return err;
}
 
+   if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
+   goto skip_iommu;
+
host->group = iommu_group_get(>dev);
if (host->group) {
struct iommu_domain_geometry *geometry;
-- 
2.17.0

[PATCH v1] gpu: host1x: Skip IOMMU initialization if firewall is enabled

2018-05-18 Thread Dmitry Osipenko

Host1x's CDMA can't access the command buffers if IOMMU and Host1x
firewall are enabled in the kernels config because firewall doesn't map
the copied buffer into IOVA space. Fix this by skipping IOMMU
initialization if firewall is enabled as firewall merges sparse cmdbufs
into a single contiguous buffer and hence IOMMU isn't needed in this case.

Signed-off-by: Dmitry Osipenko 
---
 drivers/gpu/host1x/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index f1d5f76e9c33..d88073e7d22d 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -218,6 +218,9 @@ static int host1x_probe(struct platform_device *pdev)
return err;
}
 
+   if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
+   goto skip_iommu;
+
host->group = iommu_group_get(>dev);
if (host->group) {
struct iommu_domain_geometry *geometry;
-- 
2.17.0

Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA

2018-05-18 Thread Tom Talpey

On 5/17/2018 11:03 PM, Long Li wrote:

Subject: Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for
RDMA

On 5/17/2018 8:22 PM, Long Li wrote:

From: Long Li 

This patchset implements direct user I/O through RDMA.

In normal code path (even with cache=none), CIFS copies I/O data from
user-space to kernel-space for security reasons.

With this patchset, a new mounting option is introduced to have CIFS
pin the user-space buffer into memory and performs I/O through RDMA.
This avoids memory copy, at the cost of added security risk.

What's the security risk? This type of direct i/o behavior is not uncommon,
and can certainly be made safe, using the appropriate memory registration
and protection domains. Any risk needs to be stated explicitly, and mitigation
provided, or at least described.

I think it's an assumption that user-mode buffer can't be trusted, so CIFS 
always copies them into internal buffers, and calculate signature and 
encryption based on protocol used.

With the direct buffer, the user can potentially modify the buffer when 
signature or encryption is in progress or after they are done.

I don't agree that the legacy copying behavior is because the buffer is
"untrusted". The buffer is the user's data, there's no trust issue here.
If the user application modifies the buffer while it's being sent, it's
a violation of the API contract, and the only victim is the application
itself. Same applies for receiving data. And as pointed out, most all
storage layers, file and block both, use this strategy for direct i/o.

Regarding signing, if the application alters the data then the integrity
hash will simply do its job and catch the application in the act. Again,
nothing suffers but the application.

Regarding encryption, I assume you're proposing to encrypt and decrypt
the data in a kernel buffer, effectively a copy. So in fact, in the
encryption case there's no need to pin and map the user buffer at all.

I'll mention however that Windows takes the path of not performing
RDMA placement when encrypting data. It saves nothing, and even adds
some overhead, because of the need to touch the buffer anyway to
manage the encryption/decryption.

Bottom line - no security implication for using user buffers directly.

Tom.

I also want to point out that, I choose to implement .read_iter and .write_iter 
from file_operations to implement direct I/O (CIFS is already doing this for 
O_DIRECT, so following this code path will avoid a big mess up).  The ideal 
choice is to implement .direct_IO from address_space_operations that I think 
eventually we want to move to.

Tom.

This patchset is RFC. The work is in progress, do not merge.

Long Li (9):
Introduce offset for the 1st page in data transfer structures
Change wdata alloc to support direct pages
Change rdata alloc to support direct pages
Change function to support offset when reading pages
Change RDMA send to regonize page offset in the 1st page
Change RDMA recv to support offset in the 1st page
Support page offset in memory regsitrations
Implement no-copy file I/O interfaces
Introduce cache=rdma moutning option

   fs/cifs/cifs_fs_sb.h  |   2 +
   fs/cifs/cifsfs.c  |  19 +++
   fs/cifs/cifsfs.h  |   3 +
   fs/cifs/cifsglob.h|   6 +
   fs/cifs/cifsproto.h   |   4 +-
   fs/cifs/cifssmb.c |  10 +-
   fs/cifs/connect.c |  13 +-
   fs/cifs/dir.c |   5 +
   fs/cifs/file.c| 351

++

   fs/cifs/inode.c   |   4 +-
   fs/cifs/smb2ops.c |   2 +-
   fs/cifs/smb2pdu.c |  22 ++-
   fs/cifs/smbdirect.c   | 132 ++---
   fs/cifs/smbdirect.h   |   2 +-
   fs/read_write.c   |   7 +
   include/linux/ratelimit.h |   2 +-
   16 files changed, 489 insertions(+), 95 deletions(-)

N�r��y���b�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!tml=

Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for RDMA

2018-05-18 Thread Tom Talpey

On 5/17/2018 11:03 PM, Long Li wrote:

Subject: Re: [RFC PATCH 00/09] Implement direct user I/O interfaces for
RDMA

On 5/17/2018 8:22 PM, Long Li wrote:

From: Long Li 

This patchset implements direct user I/O through RDMA.

In normal code path (even with cache=none), CIFS copies I/O data from
user-space to kernel-space for security reasons.

With this patchset, a new mounting option is introduced to have CIFS
pin the user-space buffer into memory and performs I/O through RDMA.
This avoids memory copy, at the cost of added security risk.

What's the security risk? This type of direct i/o behavior is not uncommon,
and can certainly be made safe, using the appropriate memory registration
and protection domains. Any risk needs to be stated explicitly, and mitigation
provided, or at least described.

I think it's an assumption that user-mode buffer can't be trusted, so CIFS 
always copies them into internal buffers, and calculate signature and 
encryption based on protocol used.

With the direct buffer, the user can potentially modify the buffer when 
signature or encryption is in progress or after they are done.

I don't agree that the legacy copying behavior is because the buffer is
"untrusted". The buffer is the user's data, there's no trust issue here.
If the user application modifies the buffer while it's being sent, it's
a violation of the API contract, and the only victim is the application
itself. Same applies for receiving data. And as pointed out, most all
storage layers, file and block both, use this strategy for direct i/o.

Regarding signing, if the application alters the data then the integrity
hash will simply do its job and catch the application in the act. Again,
nothing suffers but the application.

Regarding encryption, I assume you're proposing to encrypt and decrypt
the data in a kernel buffer, effectively a copy. So in fact, in the
encryption case there's no need to pin and map the user buffer at all.

I'll mention however that Windows takes the path of not performing
RDMA placement when encrypting data. It saves nothing, and even adds
some overhead, because of the need to touch the buffer anyway to
manage the encryption/decryption.

Bottom line - no security implication for using user buffers directly.

Tom.

I also want to point out that, I choose to implement .read_iter and .write_iter 
from file_operations to implement direct I/O (CIFS is already doing this for 
O_DIRECT, so following this code path will avoid a big mess up).  The ideal 
choice is to implement .direct_IO from address_space_operations that I think 
eventually we want to move to.

Tom.

This patchset is RFC. The work is in progress, do not merge.

Long Li (9):
Introduce offset for the 1st page in data transfer structures
Change wdata alloc to support direct pages
Change rdata alloc to support direct pages
Change function to support offset when reading pages
Change RDMA send to regonize page offset in the 1st page
Change RDMA recv to support offset in the 1st page
Support page offset in memory regsitrations
Implement no-copy file I/O interfaces
Introduce cache=rdma moutning option

   fs/cifs/cifs_fs_sb.h  |   2 +
   fs/cifs/cifsfs.c  |  19 +++
   fs/cifs/cifsfs.h  |   3 +
   fs/cifs/cifsglob.h|   6 +
   fs/cifs/cifsproto.h   |   4 +-
   fs/cifs/cifssmb.c |  10 +-
   fs/cifs/connect.c |  13 +-
   fs/cifs/dir.c |   5 +
   fs/cifs/file.c| 351

++

   fs/cifs/inode.c   |   4 +-
   fs/cifs/smb2ops.c |   2 +-
   fs/cifs/smb2pdu.c |  22 ++-
   fs/cifs/smbdirect.c   | 132 ++---
   fs/cifs/smbdirect.h   |   2 +-
   fs/read_write.c   |   7 +
   include/linux/ratelimit.h |   2 +-
   16 files changed, 489 insertions(+), 95 deletions(-)

N�r��y���b�X��ǧv�^�)޺{.n�+{��ٚ�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!tml=

Re: [RFC PATCH 02/09] Change wdata alloc to support direct pages

2018-05-18 Thread Tom Talpey


On 5/17/2018 5:22 PM, Long Li wrote:

From: Long Li 

When using direct pages from user space, there is no need to allocate pages.

Just ping those user pages for RDMA.


Did you mean "pin" those user pages? If so, where does that pinning
occur, it's not in this patch.

Perhaps this should just say "point to" those user pages.

I also don't think this is necessarily only "for RDMA". Perhaps
there are other transport scenarios where this is advantageous.




Signed-off-by: Long Li 
---
  fs/cifs/cifsproto.h |  2 +-
  fs/cifs/cifssmb.c   | 10 +++---
  fs/cifs/file.c  |  4 ++--
  3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 365a414..94106b9 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -523,7 +523,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, 
struct mid_q_entry *mid);
  int cifs_async_writev(struct cifs_writedata *wdata,
  void (*release)(struct kref *kref));
  void cifs_writev_complete(struct work_struct *work);
-struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
+struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, struct page 
**direct_pages,
work_func_t complete);
  void cifs_writedata_release(struct kref *refcount);
  int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 1529a08..3b1731d 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1983,7 +1983,7 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
}
  
-		wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);

+   wdata2 = cifs_writedata_alloc(nr_pages, NULL, 
cifs_writev_complete);
if (!wdata2) {
rc = -ENOMEM;
break;
@@ -2067,12 +2067,16 @@ cifs_writev_complete(struct work_struct *work)
  }
  
  struct cifs_writedata *

-cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
+cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages, 
work_func_t complete)
  {
struct cifs_writedata *wdata;
  
  	/* writedata + number of page pointers */

-   wdata = kzalloc(sizeof(*wdata) +
+   if (direct_pages) {
+   wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
+   wdata->direct_pages = direct_pages;
+   } else
+   wdata = kzalloc(sizeof(*wdata) +
sizeof(struct page *) * nr_pages, GFP_NOFS);
if (wdata != NULL) {
kref_init(>refcount);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 23fd430..a6ec896 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1965,7 +1965,7 @@ wdata_alloc_and_fillpages(pgoff_t tofind, struct 
address_space *mapping,
  {
struct cifs_writedata *wdata;
  
-	wdata = cifs_writedata_alloc((unsigned int)tofind,

+   wdata = cifs_writedata_alloc((unsigned int)tofind, NULL,
 cifs_writev_complete);
if (!wdata)
return NULL;
@@ -2554,7 +2554,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct 
iov_iter *from,
break;
  
  		nr_pages = get_numpages(wsize, len, _len);

-   wdata = cifs_writedata_alloc(nr_pages,
+   wdata = cifs_writedata_alloc(nr_pages, NULL,
 cifs_uncached_writev_complete);
if (!wdata) {
rc = -ENOMEM;

Re: [RFC PATCH 02/09] Change wdata alloc to support direct pages

2018-05-18 Thread Tom Talpey


On 5/17/2018 5:22 PM, Long Li wrote:

From: Long Li 

When using direct pages from user space, there is no need to allocate pages.

Just ping those user pages for RDMA.


Did you mean "pin" those user pages? If so, where does that pinning
occur, it's not in this patch.

Perhaps this should just say "point to" those user pages.

I also don't think this is necessarily only "for RDMA". Perhaps
there are other transport scenarios where this is advantageous.




Signed-off-by: Long Li 
---
  fs/cifs/cifsproto.h |  2 +-
  fs/cifs/cifssmb.c   | 10 +++---
  fs/cifs/file.c  |  4 ++--
  3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
index 365a414..94106b9 100644
--- a/fs/cifs/cifsproto.h
+++ b/fs/cifs/cifsproto.h
@@ -523,7 +523,7 @@ int cifs_readv_receive(struct TCP_Server_Info *server, 
struct mid_q_entry *mid);
  int cifs_async_writev(struct cifs_writedata *wdata,
  void (*release)(struct kref *kref));
  void cifs_writev_complete(struct work_struct *work);
-struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
+struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages, struct page 
**direct_pages,
work_func_t complete);
  void cifs_writedata_release(struct kref *refcount);
  int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
index 1529a08..3b1731d 100644
--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -1983,7 +1983,7 @@ cifs_writev_requeue(struct cifs_writedata *wdata)
tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
}
  
-		wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);

+   wdata2 = cifs_writedata_alloc(nr_pages, NULL, 
cifs_writev_complete);
if (!wdata2) {
rc = -ENOMEM;
break;
@@ -2067,12 +2067,16 @@ cifs_writev_complete(struct work_struct *work)
  }
  
  struct cifs_writedata *

-cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
+cifs_writedata_alloc(unsigned int nr_pages, struct page **direct_pages, 
work_func_t complete)
  {
struct cifs_writedata *wdata;
  
  	/* writedata + number of page pointers */

-   wdata = kzalloc(sizeof(*wdata) +
+   if (direct_pages) {
+   wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
+   wdata->direct_pages = direct_pages;
+   } else
+   wdata = kzalloc(sizeof(*wdata) +
sizeof(struct page *) * nr_pages, GFP_NOFS);
if (wdata != NULL) {
kref_init(>refcount);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 23fd430..a6ec896 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1965,7 +1965,7 @@ wdata_alloc_and_fillpages(pgoff_t tofind, struct 
address_space *mapping,
  {
struct cifs_writedata *wdata;
  
-	wdata = cifs_writedata_alloc((unsigned int)tofind,

+   wdata = cifs_writedata_alloc((unsigned int)tofind, NULL,
 cifs_writev_complete);
if (!wdata)
return NULL;
@@ -2554,7 +2554,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct 
iov_iter *from,
break;
  
  		nr_pages = get_numpages(wsize, len, _len);

-   wdata = cifs_writedata_alloc(nr_pages,
+   wdata = cifs_writedata_alloc(nr_pages, NULL,
 cifs_uncached_writev_complete);
if (!wdata) {
rc = -ENOMEM;

[BUG] 4.17-rcX delays during boot and memory mapping errors

2018-05-18 Thread Bob Tracy

Apologies for not having more in the way of diagnostic info.

Every one of the 4.17-rcX series has hung during boot at the point where
"acpid" starts up.  Beginning with -rc5, the boot actually proceeds to
completion after an "uncomfortably" long delay.

Attempting to run the X11 server results in the following diagnostic
output in "/var/log/Xorg.0.log":

(...)
[   109.951] (--) Depth 24 pixmap format is 32 bpp
[   109.951] (II) RADEON(0): [DRI2] Setup complete
[   109.951] (II) RADEON(0): [DRI2]   DRI driver: r600
[   109.951] (II) RADEON(0): [DRI2]   VDPAU driver: r600
[   109.951] Failed to map cursor buffer memory
[   109.951] Failed to map cursor buffer memory
[   109.951] Failed to map cursor buffer memory
[   109.952] Failed to map cursor buffer memory
[   109.952] Failed to map cursor buffer memory
[   109.952] Failed to map cursor buffer memory
[   109.952] (II) RADEON(0): Front buffer size: 8160K
[   109.952] (II) RADEON(0): VRAM usage limit set to 218167K
[   109.952] (==) RADEON(0): DRI3 disabled
[   109.952] (==) RADEON(0): Backing store enabled
[   109.952] (II) RADEON(0): Direct rendering enabled
[   109.952] (II) EXA(0): Driver allocated offscreen pixmaps
[   109.952] (II) EXA(0): Driver registered support for the following
operations:
[   109.952] (II) Solid
[   109.952] (II) Copy
[   109.952] (II) Composite (RENDER acceleration)
[   109.952] (II) UploadToScreen
[   109.952] (II) DownloadFromScreen
[   109.952] (EE) 
Fatal server error:
[   109.952] (EE) failed to map shader -75
[   109.952] (EE) 
[   109.952] (EE) 
Please consult the The X.Org Foundation support 
 at http://wiki.x.org
 for help. 
[   109.952] (EE) Please also check the log file at "/var/log/Xorg.0.log" for 
additional information.
[   109.952] (EE) 
[   109.971] (EE) Server terminated with error (1). Closing log file.

Everything works normally with a 4.16 kernel.

--Bob

[BUG] 4.17-rcX delays during boot and memory mapping errors

2018-05-18 Thread Bob Tracy

Apologies for not having more in the way of diagnostic info.

Every one of the 4.17-rcX series has hung during boot at the point where
"acpid" starts up.  Beginning with -rc5, the boot actually proceeds to
completion after an "uncomfortably" long delay.

Attempting to run the X11 server results in the following diagnostic
output in "/var/log/Xorg.0.log":

(...)
[   109.951] (--) Depth 24 pixmap format is 32 bpp
[   109.951] (II) RADEON(0): [DRI2] Setup complete
[   109.951] (II) RADEON(0): [DRI2]   DRI driver: r600
[   109.951] (II) RADEON(0): [DRI2]   VDPAU driver: r600
[   109.951] Failed to map cursor buffer memory
[   109.951] Failed to map cursor buffer memory
[   109.951] Failed to map cursor buffer memory
[   109.952] Failed to map cursor buffer memory
[   109.952] Failed to map cursor buffer memory
[   109.952] Failed to map cursor buffer memory
[   109.952] (II) RADEON(0): Front buffer size: 8160K
[   109.952] (II) RADEON(0): VRAM usage limit set to 218167K
[   109.952] (==) RADEON(0): DRI3 disabled
[   109.952] (==) RADEON(0): Backing store enabled
[   109.952] (II) RADEON(0): Direct rendering enabled
[   109.952] (II) EXA(0): Driver allocated offscreen pixmaps
[   109.952] (II) EXA(0): Driver registered support for the following
operations:
[   109.952] (II) Solid
[   109.952] (II) Copy
[   109.952] (II) Composite (RENDER acceleration)
[   109.952] (II) UploadToScreen
[   109.952] (II) DownloadFromScreen
[   109.952] (EE) 
Fatal server error:
[   109.952] (EE) failed to map shader -75
[   109.952] (EE) 
[   109.952] (EE) 
Please consult the The X.Org Foundation support 
 at http://wiki.x.org
 for help. 
[   109.952] (EE) Please also check the log file at "/var/log/Xorg.0.log" for 
additional information.
[   109.952] (EE) 
[   109.971] (EE) Server terminated with error (1). Closing log file.

Everything works normally with a 4.16 kernel.

--Bob

Re: [PATCH v1] gpu: host1x: Utilize IOMMU mapping for firewall-copied buffers

2018-05-18 Thread Dmitry Osipenko

On 19.05.2018 02:52, Dmitry Osipenko wrote:
> Map firewall-copied buffers into Host1x's IOVA space, otherwise Host1x
> CDMA can't access the command buffers and all submitted jobs fail if IOMMU
> and Host1x firewall are enabled in the kernels config.
> 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/host1x/job.c | 58 +++-
>  include/linux/host1x.h   |  4 ++-
>  2 files changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index e2f4a4d93d20..57384a5b5059 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -449,10 +449,13 @@ static int validate(struct host1x_firewall *fw, struct 
> host1x_job_gather *g)
>  
>  static inline int copy_gathers(struct host1x_job *job, struct device *dev)
>  {
> + struct host1x *host = dev_get_drvdata(job->channel->dev->parent);
>   struct host1x_firewall fw;
> + dma_addr_t dma_addr;
>   size_t size = 0;
>   size_t offset = 0;
>   unsigned int i;
> + int err;
>  
>   fw.job = job;
>   fw.dev = dev;
> @@ -466,23 +469,55 @@ static inline int copy_gathers(struct host1x_job *job, 
> struct device *dev)
>   size += g->words * sizeof(u32);
>   }
>  
> + if (host->domain)
> + size = iova_align(>iova, size);
> +
>   /*
>* Try a non-blocking allocation from a higher priority pools first,
>* as awaiting for the allocation here is a major performance hit.
>*/
> - job->gather_copy_mapped = dma_alloc_wc(dev, size, >gather_copy,
> -GFP_NOWAIT);
> + job->gather_copy_mapped = dma_alloc_wc(dev, size,
> +>gather_copy_phys,
> +GFP_NOWAIT);
>  
>   /* the higher priority allocation failed, try the generic-blocking */
>   if (!job->gather_copy_mapped)
>   job->gather_copy_mapped = dma_alloc_wc(dev, size,
> ->gather_copy,
> +>gather_copy_phys,
>  GFP_KERNEL);
>   if (!job->gather_copy_mapped)
>   return -ENOMEM;
>  
>   job->gather_copy_size = size;
>  
> + if (host->domain) {
> + unsigned long shift;
> +
> + shift = iova_shift(>iova);
> + job->gather_copy_iova_alloc = alloc_iova(
> + >iova, size >> shift,
> + host->iova_end >> shift, true);
> + if (!job->gather_copy_iova_alloc)
> + return -ENOMEM;
> +
> + job->gather_copy_iova = iova_dma_addr(
> + >iova, job->gather_copy_iova_alloc);
> +
> + err = iommu_map(host->domain,
> + job->gather_copy_iova,
> + job->gather_copy_phys,
> + size, IOMMU_READ);
> + if (err) {
> + __free_iova(>iova, job->gather_copy_iova_alloc);
> + job->gather_copy_iova_alloc = NULL;
> + return err;
> + }
> +
> + dma_addr = job->gather_copy_iova;
> + } else {
> + dma_addr = job->gather_copy_phys;
> + }
> +
>   for (i = 0; i < job->num_gathers; i++) {
>   struct host1x_job_gather *g = >gathers[i];
>   void *gather;
> @@ -494,7 +529,7 @@ static inline int copy_gathers(struct host1x_job *job, 
> struct device *dev)
>   host1x_bo_munmap(g->bo, gather);
>  
>   /* Store the location in the buffer */
> - g->base = job->gather_copy;
> + g->base = dma_addr;
>   g->offset = offset;
>  
>   /* Validate the job */
> @@ -582,9 +617,20 @@ void host1x_job_unpin(struct host1x_job *job)
>  
>   job->num_unpins = 0;
>  
> - if (job->gather_copy_size)
> + if (job->gather_copy_size) {
>   dma_free_wc(job->channel->dev, job->gather_copy_size,
> - job->gather_copy_mapped, job->gather_copy);
> + job->gather_copy_mapped, job->gather_copy_phys);
> +
> + if (job->gather_copy_iova_alloc) {
> + iommu_unmap(host->domain,
> + job->gather_copy_iova,
> + job->gather_copy_size);
> +
> + __free_iova(>iova, job->gather_copy_iova_alloc);
> +
> + job->gather_copy_iova_alloc = NULL;
> + }
> + }
>  }
>  EXPORT_SYMBOL(host1x_job_unpin);
>  
> diff --git a/include/linux/host1x.h b/include/linux/host1x.h
> index 57d26406bdfd..536a678f81d4 100644
> --- a/include/linux/host1x.h
> +++ b/include/linux/host1x.h
> @@ -235,8 +235,10 @@ struct

Re: [PATCH v1] gpu: host1x: Utilize IOMMU mapping for firewall-copied buffers

2018-05-18 Thread Dmitry Osipenko

On 19.05.2018 02:52, Dmitry Osipenko wrote:
> Map firewall-copied buffers into Host1x's IOVA space, otherwise Host1x
> CDMA can't access the command buffers and all submitted jobs fail if IOMMU
> and Host1x firewall are enabled in the kernels config.
> 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/host1x/job.c | 58 +++-
>  include/linux/host1x.h   |  4 ++-
>  2 files changed, 55 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index e2f4a4d93d20..57384a5b5059 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -449,10 +449,13 @@ static int validate(struct host1x_firewall *fw, struct 
> host1x_job_gather *g)
>  
>  static inline int copy_gathers(struct host1x_job *job, struct device *dev)
>  {
> + struct host1x *host = dev_get_drvdata(job->channel->dev->parent);
>   struct host1x_firewall fw;
> + dma_addr_t dma_addr;
>   size_t size = 0;
>   size_t offset = 0;
>   unsigned int i;
> + int err;
>  
>   fw.job = job;
>   fw.dev = dev;
> @@ -466,23 +469,55 @@ static inline int copy_gathers(struct host1x_job *job, 
> struct device *dev)
>   size += g->words * sizeof(u32);
>   }
>  
> + if (host->domain)
> + size = iova_align(>iova, size);
> +
>   /*
>* Try a non-blocking allocation from a higher priority pools first,
>* as awaiting for the allocation here is a major performance hit.
>*/
> - job->gather_copy_mapped = dma_alloc_wc(dev, size, >gather_copy,
> -GFP_NOWAIT);
> + job->gather_copy_mapped = dma_alloc_wc(dev, size,
> +>gather_copy_phys,
> +GFP_NOWAIT);
>  
>   /* the higher priority allocation failed, try the generic-blocking */
>   if (!job->gather_copy_mapped)
>   job->gather_copy_mapped = dma_alloc_wc(dev, size,
> ->gather_copy,
> +>gather_copy_phys,
>  GFP_KERNEL);
>   if (!job->gather_copy_mapped)
>   return -ENOMEM;
>  
>   job->gather_copy_size = size;
>  
> + if (host->domain) {
> + unsigned long shift;
> +
> + shift = iova_shift(>iova);
> + job->gather_copy_iova_alloc = alloc_iova(
> + >iova, size >> shift,
> + host->iova_end >> shift, true);
> + if (!job->gather_copy_iova_alloc)
> + return -ENOMEM;
> +
> + job->gather_copy_iova = iova_dma_addr(
> + >iova, job->gather_copy_iova_alloc);
> +
> + err = iommu_map(host->domain,
> + job->gather_copy_iova,
> + job->gather_copy_phys,
> + size, IOMMU_READ);
> + if (err) {
> + __free_iova(>iova, job->gather_copy_iova_alloc);
> + job->gather_copy_iova_alloc = NULL;
> + return err;
> + }
> +
> + dma_addr = job->gather_copy_iova;
> + } else {
> + dma_addr = job->gather_copy_phys;
> + }
> +
>   for (i = 0; i < job->num_gathers; i++) {
>   struct host1x_job_gather *g = >gathers[i];
>   void *gather;
> @@ -494,7 +529,7 @@ static inline int copy_gathers(struct host1x_job *job, 
> struct device *dev)
>   host1x_bo_munmap(g->bo, gather);
>  
>   /* Store the location in the buffer */
> - g->base = job->gather_copy;
> + g->base = dma_addr;
>   g->offset = offset;
>  
>   /* Validate the job */
> @@ -582,9 +617,20 @@ void host1x_job_unpin(struct host1x_job *job)
>  
>   job->num_unpins = 0;
>  
> - if (job->gather_copy_size)
> + if (job->gather_copy_size) {
>   dma_free_wc(job->channel->dev, job->gather_copy_size,
> - job->gather_copy_mapped, job->gather_copy);
> + job->gather_copy_mapped, job->gather_copy_phys);
> +
> + if (job->gather_copy_iova_alloc) {
> + iommu_unmap(host->domain,
> + job->gather_copy_iova,
> + job->gather_copy_size);
> +
> + __free_iova(>iova, job->gather_copy_iova_alloc);
> +
> + job->gather_copy_iova_alloc = NULL;
> + }
> + }
>  }
>  EXPORT_SYMBOL(host1x_job_unpin);
>  
> diff --git a/include/linux/host1x.h b/include/linux/host1x.h
> index 57d26406bdfd..536a678f81d4 100644
> --- a/include/linux/host1x.h
> +++ b/include/linux/host1x.h
> @@ -235,8 +235,10 @@ struct host1x_job {
>

[PATCH] usb: dwc2: Fix HiKey regression caused by power_down feature

2018-05-18 Thread John Stultz

In 4.17-rc, commit 03ea6d6e9e1f ("usb: dwc2: Enable power down")
caused the HiKey board to not correctly handle switching between
usb-gadget and usb-host mode.

Unplugging the OTG port would result in:
[   42.240973] dwc2 f72c.usb: dwc2_restore_host_registers: no host 
registers to restore
[   42.249066] dwc2 f72c.usb: dwc2_host_exit_hibernation: failed to restore 
host registers

And the USB-host ports would not function.

And plugging in the OTG port, we would see:
[   46.046557] WARNING: CPU: 3 PID: 6 at drivers/usb/dwc2/gadget.c:260 
dwc2_hsotg_init_fifo+0x194/0x1a0
[   46.055761] CPU: 3 PID: 6 Comm: kworker/u16:0 Not tainted 
4.17.0-rc5-00030-ge67da8c #231
[   46.055767] Hardware name: HiKey Development Board (DT)
[   46.055784] Workqueue: dwc2 dwc2_conn_id_status_change
...

Thus, this patch sets the hisi params to disable the power_down
flag by default, and gets thing working again.

Cc: John Youn 
Cc: Vardan Mikayelyan 
Cc: Artur Petrosyan 
Cc: Grigor Tovmasyan 
Cc: Felipe Balbi 
Cc: linux-...@vger.kernel.org
Signed-off-by: John Stultz 
---
 drivers/usb/dwc2/params.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/dwc2/params.c b/drivers/usb/dwc2/params.c
index f03e418..96b1b25 100644
--- a/drivers/usb/dwc2/params.c
+++ b/drivers/usb/dwc2/params.c
@@ -70,6 +70,7 @@ static void dwc2_set_his_params(struct dwc2_hsotg *hsotg)
GAHBCFG_HBSTLEN_SHIFT;
p->uframe_sched = false;
p->change_speed_quirk = true;
+   p->power_down = false;
 }
 
 static void dwc2_set_rk_params(struct dwc2_hsotg *hsotg)
-- 
2.7.4

[PATCH] usb: dwc2: Fix HiKey regression caused by power_down feature

2018-05-18 Thread John Stultz

In 4.17-rc, commit 03ea6d6e9e1f ("usb: dwc2: Enable power down")
caused the HiKey board to not correctly handle switching between
usb-gadget and usb-host mode.

Unplugging the OTG port would result in:
[   42.240973] dwc2 f72c.usb: dwc2_restore_host_registers: no host 
registers to restore
[   42.249066] dwc2 f72c.usb: dwc2_host_exit_hibernation: failed to restore 
host registers

And the USB-host ports would not function.

And plugging in the OTG port, we would see:
[   46.046557] WARNING: CPU: 3 PID: 6 at drivers/usb/dwc2/gadget.c:260 
dwc2_hsotg_init_fifo+0x194/0x1a0
[   46.055761] CPU: 3 PID: 6 Comm: kworker/u16:0 Not tainted 
4.17.0-rc5-00030-ge67da8c #231
[   46.055767] Hardware name: HiKey Development Board (DT)
[   46.055784] Workqueue: dwc2 dwc2_conn_id_status_change
...

Thus, this patch sets the hisi params to disable the power_down
flag by default, and gets thing working again.

Cc: John Youn 
Cc: Vardan Mikayelyan 
Cc: Artur Petrosyan 
Cc: Grigor Tovmasyan 
Cc: Felipe Balbi 
Cc: linux-...@vger.kernel.org
Signed-off-by: John Stultz 
---
 drivers/usb/dwc2/params.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/dwc2/params.c b/drivers/usb/dwc2/params.c
index f03e418..96b1b25 100644
--- a/drivers/usb/dwc2/params.c
+++ b/drivers/usb/dwc2/params.c
@@ -70,6 +70,7 @@ static void dwc2_set_his_params(struct dwc2_hsotg *hsotg)
GAHBCFG_HBSTLEN_SHIFT;
p->uframe_sched = false;
p->change_speed_quirk = true;
+   p->power_down = false;
 }
 
 static void dwc2_set_rk_params(struct dwc2_hsotg *hsotg)
-- 
2.7.4

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 2450 matches

Mail list logo