[Spice-devel] [PATCH] Do not loop on ERESTARTSYS using interruptible waits

2015-05-28 Thread Dave Airlie
On 19 May 2015 at 19:54, Frediano Ziglio  wrote:
> This problem happens using KMS surfaces and QXL driver.
> To easy reproduce use KDE Plasma (which use surfaces a lot) and assure
> you are using KMS surfaces (QXL driver on Fedora/RedHat has a patch to
> stop using them). Open some complex application like LibreOffice and
> after a while your machine get stuck using 100% CPU on Xorg.
> The problem occurs as creating new surfaces not interruptible wait
> are used however instead of returning ERESTARTSYS back to userspace
> you try to loop but wait routines always keep returning ERESTARTSYS
> once the signal is marked.
> On out of memory conditions TTM module try to move objects to system
> memory and QXL assure surface is updated before the move.
> The fix handle differently this case using no interruptible wait so
> wait functions will wait instead of returning ERESTARTSYS.
> Note the when the loop occurs driver will send a lot of update requests
> causing more CPU usage on Qemu side too.

I actually don't think we should be enabling surfaces upstream, I
don't mind fixing
the kernel driver to not be crap, but I really don't think surfaces
really help the
SPICE protocol.

I should h ave pushed the disable surface in all cases upstream, feel free to do
so, they were a bad experiment, and nobody ever showed they were faster, or
at least when they hit eviction paths they didn't plummet down the
side of a massive
cliff.

The reason this loops in -ERESTARTSYS is that the hw craps itself if you try
and redo an operation, so you can't go back out to userspace and re-enter the
kernel, just one of many bad design points in the QXL hw.

you should probably drop wait_for_io_cmd completely.

Dave.

>
> Signed-off-by: Frediano Ziglio 
> ---
>  qxl/qxl_cmd.c   | 12 +++-
>  qxl/qxl_drv.h   |  2 +-
>  qxl/qxl_ioctl.c |  2 +-
>  3 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/drivers/gpu/drm/qxl/qxl_cmd.c b/qxl/qxl_cmd.c
> index 9782364..bd5404e 100644
> --- a/drivers/gpu/drm/qxl/qxl_cmd.c
> +++ b/drivers/gpu/drm/qxl/qxl_cmd.c
> @@ -317,14 +317,11 @@ static void wait_for_io_cmd(struct qxl_device *qdev, 
> uint8_t val, long port)
>  {
> int ret;
>
> -restart:
> ret = wait_for_io_cmd_user(qdev, val, port, false);
> -   if (ret == -ERESTARTSYS)
> -   goto restart;
>  }
>
>  int qxl_io_update_area(struct qxl_ddevice *qdev, struct qxl_bo *surf,
> -   const struct qxl_rect *area)
> +   const struct qxl_rect *area, bool intr)
>  {
> int surface_id;
> uint32_t surface_width, surface_height;
> @@ -350,7 +347,7 @@ int qxl_io_update_area(struct qxl_device *qdev, struct 
> qxl_bo *surf,
> mutex_lock(>update_area_mutex);
> qdev->ram_header->update_area = *area;
> qdev->ram_header->update_surface = surface_id;
> -   ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, true);
> +   ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, intr);
> mutex_unlock(>update_area_mutex);
> return ret;
>  }
> @@ -588,10 +585,7 @@ int qxl_update_surface(struct qxl_device *qdev, struct 
> qxl_bo *surf)
> rect.right = surf->surf.width;
> rect.top = 0;
> rect.bottom = surf->surf.height;
> -retry:
> -   ret = qxl_io_update_area(qdev, surf, );
> -   if (ret == -ERESTARTSYS)
> -   goto retry;
> +   ret = qxl_io_update_area(qdev, surf, , false);
> return ret;
>  }
>
> diff --git a/drivers/gpu/drm/drivers/gpu/drm/qxl/qxl_drv.h b/qxl/qxl_drv.h
> index 7c6cafe..6745c44 100644
> --- a/drivers/gpu/drm/qxl/qxl_drv.h
> +++ b/drivers/gpu/drm/qxl/qxl_drv.h
> @@ -462,7 +462,7 @@ void qxl_io_memslot_add(struct qxl_device *qdev, uint8_t 
> id);
>  void qxl_io_notify_oom(struct qxl_device *qdev);
>
>  int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> -  const struct qxl_rect *area);
> +  const struct qxl_rect *area, bool intr);
>
>  void qxl_io_reset(struct qxl_device *qdev);
>  void qxl_io_monitors_config(struct qxl_device *qdev);
> diff --git a/drivers/gpu/drm/qxl/qxl_ioctl.c b/qxl/qxl_ioctl.c
> index b110883..afd7297 100644
> --- a/drivers/gpu/drm/qxl/qxl_ioctl.c
> +++ b/drivers/gpu/drm/qxl/qxl_ioctl.c
> @@ -348,7 +348,7 @@ static int qxl_update_area_ioctl(struct drm_device *dev, 
> void *data,
> goto out2;
> if (!qobj->surface_id)
> DRM_ERROR("got update area for surface with no id %d\n", 
> update_area->handle);
> -   ret = qxl_io_update_area(qdev, qobj, );
> +   ret = qxl_io_update_area(qdev, qobj, , true);
>
>  out2:
> qxl_bo_unreserve(qobj);
> --
> 2.1.0
> ___
> Spice-devel mailing list
> Spice-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/spice-devel


[Spice-devel] [PATCH] Do not loop on ERESTARTSYS using interruptible waits

2015-05-26 Thread Christophe Fergeau
Hey,

On Fri, May 22, 2015 at 09:58:10AM -0400, Frediano Ziglio wrote:
> > >  }
> > >  
> > >  int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> > > - const struct qxl_rect *area)
> > > + const struct qxl_rect *area, bool intr)
> > >  {
> > >   int surface_id;
> > >   uint32_t surface_width, surface_height;
> > > @@ -350,7 +347,7 @@ int qxl_io_update_area(struct qxl_device *qdev, struct
> > > qxl_bo *surf,
> > >   mutex_lock(>update_area_mutex);
> > >   qdev->ram_header->update_area = *area;
> > >   qdev->ram_header->update_surface = surface_id;
> > > - ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, true);
> > > + ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, intr);
> > >   mutex_unlock(>update_area_mutex);
> > >   return ret;
> > >  }
> > > @@ -588,10 +585,7 @@ int qxl_update_surface(struct qxl_device *qdev, 
> > > struct
> > > qxl_bo *surf)
> > >   rect.right = surf->surf.width;
> > >   rect.top = 0;
> > >   rect.bottom = surf->surf.height;
> > > -retry:
> > > - ret = qxl_io_update_area(qdev, surf, );
> > > - if (ret == -ERESTARTSYS)
> > > - goto retry;
> > > + ret = qxl_io_update_area(qdev, surf, , false);
> > 
> > My understanding is that the fix is this hunk? If so, this could be made
> > more obvious with an intermediate commit adding the 'bool intr' arg to
> > qxl_io_update_area and only calling it with 'true' in the appropriate
> > places.
> > This code path is only triggered from qxl_surface_evict() which I assume
> > is not necessarily easily interruptible, so this change makes sense to
> > me. However it would be much better to get a review from Dave Airlie ;)
> > 
> > Christophe
> > 
> 
> Are you asking if just removing the loop would fix the issue?
> So you are proposing a first patch that add the argument always
> passing true and another that change some calls to false? It make
> sense but still the loop should be removed.

Sorry, I was not clear, removing the loop is fine, I was not suggesting
to keep it.

Christophe
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: 



[Spice-devel] [PATCH] Do not loop on ERESTARTSYS using interruptible waits

2015-05-22 Thread Christophe Fergeau
Hey,

On Tue, May 19, 2015 at 05:54:54AM -0400, Frediano Ziglio wrote:
> This problem happens using KMS surfaces and QXL driver.
> To easy reproduce use KDE Plasma (which use surfaces a lot) and assure
> you are using KMS surfaces (QXL driver on Fedora/RedHat has a patch to
> stop using them). Open some complex application like LibreOffice and
> after a while your machine get stuck using 100% CPU on Xorg.
> The problem occurs as creating new surfaces not interruptible wait
> are used however instead of returning ERESTARTSYS back to userspace
> you try to loop but wait routines always keep returning ERESTARTSYS
> once the signal is marked.
> On out of memory conditions TTM module try to move objects to system
> memory and QXL assure surface is updated before the move.
> The fix handle differently this case using no interruptible wait so
> wait functions will wait instead of returning ERESTARTSYS.
> Note the when the loop occurs driver will send a lot of update requests
> causing more CPU usage on Qemu side too.
> 
> Signed-off-by: Frediano Ziglio 
> ---
>  qxl/qxl_cmd.c   | 12 +++-
>  qxl/qxl_drv.h   |  2 +-
>  qxl/qxl_ioctl.c |  2 +-
>  3 files changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drivers/gpu/drm/qxl/qxl_cmd.c b/qxl/qxl_cmd.c
> index 9782364..bd5404e 100644
> --- a/drivers/gpu/drm/qxl/qxl_cmd.c
> +++ b/drivers/gpu/drm/qxl/qxl_cmd.c
> @@ -317,14 +317,11 @@ static void wait_for_io_cmd(struct qxl_device *qdev, 
> uint8_t val, long port)
>  {
>   int ret;
>  
> -restart:
>   ret = wait_for_io_cmd_user(qdev, val, port, false);
> - if (ret == -ERESTARTSYS)
> - goto restart;

I think this one is not directly related to the fix, but can be removed
because wait_for_io_cmd_user(qdev, val, port, false); will call
wait_event_timeout() which cannot return ERESTARTSYS? Or was this loop
causing issues too?

>  }
>  
>  int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> - const struct qxl_rect *area)
> + const struct qxl_rect *area, bool intr)
>  {
>   int surface_id;
>   uint32_t surface_width, surface_height;
> @@ -350,7 +347,7 @@ int qxl_io_update_area(struct qxl_device *qdev, struct 
> qxl_bo *surf,
>   mutex_lock(>update_area_mutex);
>   qdev->ram_header->update_area = *area;
>   qdev->ram_header->update_surface = surface_id;
> - ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, true);
> + ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, intr);
>   mutex_unlock(>update_area_mutex);
>   return ret;
>  }
> @@ -588,10 +585,7 @@ int qxl_update_surface(struct qxl_device *qdev, struct 
> qxl_bo *surf)
>   rect.right = surf->surf.width;
>   rect.top = 0;
>   rect.bottom = surf->surf.height;
> -retry:
> - ret = qxl_io_update_area(qdev, surf, );
> - if (ret == -ERESTARTSYS)
> - goto retry;
> + ret = qxl_io_update_area(qdev, surf, , false);

My understanding is that the fix is this hunk? If so, this could be made
more obvious with an intermediate commit adding the 'bool intr' arg to
qxl_io_update_area and only calling it with 'true' in the appropriate
places.
This code path is only triggered from qxl_surface_evict() which I assume
is not necessarily easily interruptible, so this change makes sense to
me. However it would be much better to get a review from Dave Airlie ;)

Christophe
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: 



[Spice-devel] [PATCH] Do not loop on ERESTARTSYS using interruptible waits

2015-05-22 Thread Frediano Ziglio

> 
> Hey,
> 
> On Tue, May 19, 2015 at 05:54:54AM -0400, Frediano Ziglio wrote:
> > This problem happens using KMS surfaces and QXL driver.
> > To easy reproduce use KDE Plasma (which use surfaces a lot) and assure
> > you are using KMS surfaces (QXL driver on Fedora/RedHat has a patch to
> > stop using them). Open some complex application like LibreOffice and
> > after a while your machine get stuck using 100% CPU on Xorg.
> > The problem occurs as creating new surfaces not interruptible wait
> > are used however instead of returning ERESTARTSYS back to userspace
> > you try to loop but wait routines always keep returning ERESTARTSYS
> > once the signal is marked.
> > On out of memory conditions TTM module try to move objects to system
> > memory and QXL assure surface is updated before the move.
> > The fix handle differently this case using no interruptible wait so
> > wait functions will wait instead of returning ERESTARTSYS.
> > Note the when the loop occurs driver will send a lot of update requests
> > causing more CPU usage on Qemu side too.
> > 
> > Signed-off-by: Frediano Ziglio 
> > ---
> >  qxl/qxl_cmd.c   | 12 +++-
> >  qxl/qxl_drv.h   |  2 +-
> >  qxl/qxl_ioctl.c |  2 +-
> >  3 files changed, 5 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drivers/gpu/drm/qxl/qxl_cmd.c b/qxl/qxl_cmd.c
> > index 9782364..bd5404e 100644
> > --- a/drivers/gpu/drm/qxl/qxl_cmd.c
> > +++ b/drivers/gpu/drm/qxl/qxl_cmd.c
> > @@ -317,14 +317,11 @@ static void wait_for_io_cmd(struct qxl_device *qdev,
> > uint8_t val, long port)
> >  {
> > int ret;
> >  
> > -restart:
> > ret = wait_for_io_cmd_user(qdev, val, port, false);
> > -   if (ret == -ERESTARTSYS)
> > -   goto restart;
> 
> I think this one is not directly related to the fix, but can be removed
> because wait_for_io_cmd_user(qdev, val, port, false); will call
> wait_event_timeout() which cannot return ERESTARTSYS? Or was this loop
> causing issues too?
> 

Yes, but it has the same issue. Try till ERESTARTSYS are gone.
Currently perhaps not broken but prone to have same issue.

> >  }
> >  
> >  int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> > -   const struct qxl_rect *area)
> > +   const struct qxl_rect *area, bool intr)
> >  {
> > int surface_id;
> > uint32_t surface_width, surface_height;
> > @@ -350,7 +347,7 @@ int qxl_io_update_area(struct qxl_device *qdev, struct
> > qxl_bo *surf,
> > mutex_lock(>update_area_mutex);
> > qdev->ram_header->update_area = *area;
> > qdev->ram_header->update_surface = surface_id;
> > -   ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, true);
> > +   ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, intr);
> > mutex_unlock(>update_area_mutex);
> > return ret;
> >  }
> > @@ -588,10 +585,7 @@ int qxl_update_surface(struct qxl_device *qdev, struct
> > qxl_bo *surf)
> > rect.right = surf->surf.width;
> > rect.top = 0;
> > rect.bottom = surf->surf.height;
> > -retry:
> > -   ret = qxl_io_update_area(qdev, surf, );
> > -   if (ret == -ERESTARTSYS)
> > -   goto retry;
> > +   ret = qxl_io_update_area(qdev, surf, , false);
> 
> My understanding is that the fix is this hunk? If so, this could be made
> more obvious with an intermediate commit adding the 'bool intr' arg to
> qxl_io_update_area and only calling it with 'true' in the appropriate
> places.
> This code path is only triggered from qxl_surface_evict() which I assume
> is not necessarily easily interruptible, so this change makes sense to
> me. However it would be much better to get a review from Dave Airlie ;)
> 
> Christophe
> 

Are you asking if just removing the loop would fix the issue?
So you are proposing a first patch that add the argument always passing true 
and another that change some calls to false? It make sense but still the loop 
should be removed.

Frediano