[Bug 82588] X fails to start with linus-tip or drm-next

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82588

Mike Lothian  changed:

   What|Removed |Added

 CC||mike at fireburn.co.uk

--- Comment #2 from Mike Lothian  ---
Both were compiled with the new kabini firmware

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: 
<http://lists.freedesktop.org/archives/dri-devel/attachments/20140813/e262b28a/attachment.html>


[Bug 82588] X fails to start with linus-tip or drm-next

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82588

--- Comment #1 from Mike Lothian  ---
Created attachment 104591
  --> https://bugs.freedesktop.org/attachment.cgi?id=104591&action=edit
Linux Tip



[Bug 82588] New: X fails to start with linus-tip or drm-next

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82588

            Bug ID: 82588
           Summary: X fails to start with linus-tip or drm-next
           Product: DRI
           Version: DRI CVS
         Component: DRM/Radeon
            Status: NEW
          Severity: normal
          Priority: medium
    Classification: Unclassified
                OS: All
          Hardware: Other
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: mike at fireburn.co.uk

Created attachment 104590
  --> https://bugs.freedesktop.org/attachment.cgi?id=104590&action=edit
DRM Next

I've not had a chance to diagnose or bisect this issue yet.



[Bug 82371] Capa Verde (Radeon 7750) GPU lockup using UVD

2014-08-13 Thread bugzilla-dae...@bugzilla.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=82371

--- Comment #1 from Pablo Wagner  ---
Created attachment 146521
  --> https://bugzilla.kernel.org/attachment.cgi?id=146521&action=edit
This is the diff from one version to the problematic one, I think.

I'll try to revert the package version to see if I can use UVD again.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 82371] New: Capa Verde (Radeon 7750) GPU lockup using UVD

2014-08-13 Thread bugzilla-dae...@bugzilla.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=82371

            Bug ID: 82371
           Summary: Capa Verde (Radeon 7750) GPU lockup using UVD
           Product: Drivers
           Version: 2.5
    Kernel Version: 3.13
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri at kernel-bugs.osdl.org
          Reporter: pablow.1422 at gmail.com
        Regression: No

Created attachment 146511
  --> https://bugzilla.kernel.org/attachment.cgi?id=146511&action=edit
dmesg output till complete hang (ssh server not responding)

I'm using Ubuntu 14.04 with the oibaf PPA. After a recent update, the GPU
locks up when using UVD in mplayer and xbmc. I'll try to track down the git
revision of the update, but in the meantime I'm attaching the dmesg output,
which I gathered over ssh from my computer.

Also, I've tried with and without dpm enabled.

Please ask for any other info I could provide. Thanks.



[Bug 82586] New: UBO matrix in std140 struct does not work

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82586

            Bug ID: 82586
           Summary: UBO matrix in std140 struct does not work
           Product: Mesa
           Version: git
         Component: Drivers/Gallium/r600
            Status: NEW
          Severity: normal
          Priority: medium
    Classification: Unclassified
                OS: All
          Hardware: Other
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: pavol at klacansky.com

I am not sure if the problem is actually in my code, but I tested it on the
NVIDIA driver and it worked (though their compiler might have a fix for it
internally).

Here is vertex shader:

#version 330

layout(location = 0) in vec4 position;

layout(std140) uniform Matrices {
    mat4 model;
    mat4 view;
    mat4 projection;
} matrices;

out gl_PerVertex {
    vec4 gl_Position;
};

/* interpolate to compute normal in fragment shader */
out vec3 v_position;

void main(void)
{
    v_position = vec3(matrices.view * matrices.model * position);
    gl_Position = matrices.projection * matrices.view * matrices.model * position;
}


If I use only one matrix from the uniform block, it works. I have printed the
UBO's values and they are just identity matrices.



[Bug 82585] New: geometry shader with optional out variable segfaults

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82585

            Bug ID: 82585
           Summary: geometry shader with optional out variable segfaults
           Product: Mesa
           Version: git
         Component: Drivers/Gallium/r600
            Status: NEW
          Severity: normal
          Priority: medium
    Classification: Unclassified
                OS: All
          Hardware: Other
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: pavol at klacansky.com

Running this shader in the geometry stage segfaults:

#version 330

layout(points) in;
layout(points, max_vertices = 1) out;

in gl_PerVertex {
    vec4 gl_Position;
    float gl_PointSize;
    float gl_ClipDistance[];
} gl_in[];

in vec4 v_colour[];
out vec4 g_colour;

void main() {
    for (int i = 0; i < 1; i++) {
        gl_Position = gl_in[i].gl_Position;
        gl_PrimitiveID = gl_PrimitiveIDIn;
        /* TODO report bug */
        g_colour = v_colour[i];
        EmitVertex();
    }
    EndPrimitive();
}


The following message is from Valgrind:
EE ../../../../../../src/gallium/drivers/r600/r600_shader.c:353
tgsi_is_supported - unsupported src 0 (dimension 1)
EE ../../../../../../src/gallium/drivers/r600/r600_shader.c:157
r600_pipe_shader_create - translation from TGSI failed !
EE ../../../../../../src/gallium/drivers/r600/r600_state_common.c:750
r600_shader_select - Failed to build shader variant (type=2) -22



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

--- Comment #10 from Aaron B  ---
I'd suggest compton myself; I have no tearing when I tell games to limit
themselves.



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #25 from dimangi42 at gmail.com ---
Created attachment 104584
  --> https://bugs.freedesktop.org/attachment.cgi?id=104584&action=edit
dmesg 3.16 mesa 10.3



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #24 from dimangi42 at gmail.com ---
(In reply to comment #22)
> Also does the combination of a newer kernel with mesa 10.3 help?

Does not. Same results with kernel 3.16 and mesa 10.3.



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #23 from dimangi42 at gmail.com ---
(In reply to comment #21)
> Does disabling tiling help?  Add:
>   Option "ColorTiling" "false"
>   Option "ColorTiling2D" "false"
> To the device section of your xorg config.

It does not help. I get back to the original striped screen, even after
removing the new options from xorg.conf.



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #22 from Alex Deucher  ---
Also does the combination of a newer kernel with mesa 10.3 help?



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #21 from Alex Deucher  ---
Does disabling tiling help?  Add:
Option "ColorTiling" "false"
Option "ColorTiling2D" "false"
To the device section of your xorg config.
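For completeness, those options go in a Device section of /etc/X11/xorg.conf like the following (the Identifier string is just an example, and the section may need to be created if the config doesn't have one yet):

```
Section "Device"
    Identifier "Radeon"
    Driver     "radeon"
    Option     "ColorTiling"   "false"
    Option     "ColorTiling2D" "false"
EndSection
```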



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

dimangi42 at gmail.com changed:

   What|Removed |Added

 Attachment #104583|text/plain  |image/jpeg
  mime type||



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #20 from dimangi42 at gmail.com ---
Created attachment 104583
  --> https://bugs.freedesktop.org/attachment.cgi?id=104583&action=edit
mesa 10.3

different kind of wrong display



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #19 from dimangi42 at gmail.com ---
Created attachment 104582
  --> https://bugs.freedesktop.org/attachment.cgi?id=104582&action=edit
dmesg 3.13 mesa 10.3

I did
sudo add-apt-repository ppa:xorg-edgers/ppa
sudo apt-get update
sudo apt-get dist-upgrade

and I get

ii  libegl1-mesa:amd64          10.3.0~git20140812.fa5b76e3-0ubuntu0ricotz~trusty  amd64  free implementation of the EGL API -- runtime
ii  libegl1-mesa-drivers:amd64  10.3.0~git20140812.fa5b76e3-0ubuntu0ricotz~trusty  amd64  free implementation of the EGL API -- hardware drivers
ii  libgl1-mesa-dri:amd64       10.3.0~git20140812.fa5b76e3-0ubuntu0ricotz~trusty  amd64  free implementation of the OpenGL API -- DRI modules
ii  libgl1-mesa-glx:amd64       10.3.0~git20140812.fa5b76e3-0ubuntu0ricotz~trusty  amd64  free implementation of the OpenGL API -- GLX runtime
ii  libglapi-mesa:amd64         10.3.0~git20140812.fa5b76e3-0ubuntu0ricotz~trusty  amd64  free implementation of the GL API -- shared library
ii  libglu1-mesa:amd64          9.0.0-2                                            amd64  Mesa OpenGL utility library (GLU)
ii  libopenvg1-mesa:amd64       10.3.0~git20140812.fa5b76e3-0ubuntu0ricotz~trusty  amd64  free implementation of the OpenVG API -- runtime
ii  libwayland-egl1-mesa:amd64  10.3.0~git20140812.fa5b76e3-0ubuntu0ricotz~trusty  amd64  implementation of the Wayland EGL platform -- runtime
ii  mesa-utils                  8.1.0-2                                            amd64  Miscellaneous Mesa GL utilities
ii  mesa-vdpau-drivers:amd64    10.3.0~git20140812.fa5b76e3-0ubuntu0ricotz~trusty  amd64  Mesa VDPAU video acceleration drivers



GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Thomas Hellstrom
On 08/13/2014 06:30 PM, Alex Deucher wrote:
> On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter  wrote:
>> On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
>>> On 08/13/2014 03:01 PM, Daniel Vetter wrote:
 On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
 From: Jérôme Glisse 

 When experiencing memory pressure we want to minimize pool size so that
 memory we just shrank is not added back again as the next thing.

 This will divide by 2 the maximum pool size for each device each time
 the pool has to shrink. The limit is bumped again if the next allocation
 happens more than one second after the last shrink. The one second delay
 is obviously an arbitrary choice.
>>> Jérôme,
>>>
>>> I don't like this patch. It adds extra complexity and its usefulness is
>>> highly questionable.
>>> There are a number of caches in the system, and if all of them added
>>> some sort of voluntary shrink heuristics like this, we'd end up with
>>> impossible-to-debug unpredictable performance issues.
>>>
>>> We should let the memory subsystem decide when to reclaim pages from
>>> caches and what caches to reclaim them from.
>> Yeah, artificially limiting your cache from growing when your shrinker
>> gets called will just break the equal-memory pressure the core mm uses to
>> rebalance between all caches when workload changes. In i915 we let
>> everything grow without artificial bounds and only rely upon the shrinker
>> callbacks to ensure we don't consume more than our fair share of 
>> available
>> memory overall.
>> -Daniel
> Now when you bring i915 memory usage up, Daniel,
> I can't refrain from bringing up the old user-space unreclaimable kernel
> memory issue, for which gem open is a good example ;) Each time
> user-space opens a gem handle, some un-reclaimable kernel memory is
> allocated, for which there is no accounting, so theoretically I think a
> user can bring a system to unusability this way.
>
> Typically there are various limits on unreclaimable objects like this,
> like open file descriptors, and IIRC the kernel even has an internal
> limit on the number of struct files you initialize, based on the
> available system memory, so dma-buf / prime should already have some
> sort of protection.
 Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
 so there's not really a way to isolate gpu memory usage in a sane way for
 specific processes. But there's also zero limits on actual gpu usage
 itself (timeslices or whatever) so I guess no one asked for this yet.
>>> In its simplest form (like in TTM if correctly implemented by drivers)
>>> this type of accounting stops non-privileged malicious GPU-users from
>>> exhausting all system physical memory causing grief for other kernel
>>> systems but not from causing grief for other GPU users. I think that's
>>> the minimum level that's intended also for example also for the struct
>>> file accounting.
>> I think in i915 we're fairly close on that minimal standard - interactions
>> with shrinkers and oom logic work decently. It starts to fall apart though
>> when we've actually run out of memory - if the real memory hog is a gpu
>> process the oom killer won't notice all that memory since it's not
>> accounted against processes correctly.
>>
>> I don't agree that gpu process should be punished in general compared to
>> other subsystems in the kernel. If the user wants to use 90% of all memory
>> for gpu tasks then I want to make that possible, even if it means that
>> everything else thrashes horribly. And as long as the system recovers and
>> rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a
>> fairly arbitrary (tunable) setting to limit system memory consumption, but
>> I might be wrong on that.
> Yes, it currently limits you to half of memory, but at least we would
> like to make it tuneable since there are a lot of user cases where the
> user wants to use 90% of memory for GPU tasks at the expense of
> everything else.
>
> Alex
>

It's in /sys/devices/virtual/drm/ttm/memory_accounting/*

Run-time tunable, but if you hand this out to users you should probably
write a small app to do the tuning, since raising the limit usually means
adjusting several of the values together.

zone_memory: ro: Total memory in the zone.
used_memory: ro: Currently pinned memory.
available_memory: rw: Allocation limit.
emergency_memory: rw: Allocation limit for CAP_SYS_ADMIN
swap_limit: rw: Swapper thread starts at this limit.

/Thomas


drm/nv50-/disp: audit and version DAC_LOAD method

2014-08-13 Thread Ben Skeggs
- Original Message -
> From: "Dan Carpenter" 
> To: bskeggs at redhat.com
> Cc: dri-devel at lists.freedesktop.org
> Sent: Wednesday, 13 August, 2014 9:29:16 PM
> Subject: re: drm/nv50-/disp: audit and version DAC_LOAD method
> 
> Hello Ben Skeggs,
> 
> The patch c4abd3178e11: "drm/nv50-/disp: audit and version DAC_LOAD
> method" from Aug 10, 2014, leads to the following static checker
> warning:
> 
>   drivers/gpu/drm/nouveau/core/engine/disp/dacnv50.c:78 nv50_dac_sense()
>   warn: 0xfff0 is larger than 16 bits
There was a mess-up when I cleaned up the structs; I've got a fix in my tree
that I'll get to Linus in -fixes.

Thanks,
Ben.

> 
> drivers/gpu/drm/nouveau/core/engine/disp/dacnv50.c
> 65  nv50_dac_sense(NV50_DISP_MTHD_V1)
> 66  {
> 67  union {
> 68  struct nv50_disp_dac_load_v0 v0;
> 69  } *args = data;
> 70  const u32 doff = outp->or * 0x800;
> 71  u32 loadval;
> 72  int ret;
> 73
> 74  nv_ioctl(object, "disp dac load size %d\n", size);
> 75  if (nvif_unpack(args->v0, 0, 0, false)) {
> 76  nv_ioctl(object, "disp dac load vers %d data %08x\n",
> 77   args->v0.version, args->v0.data);
> 78  if (args->v0.data & 0xfff0)
> ^^
> This condition can't be true.  It's not clear what was intended.
> 
> 79  return -EINVAL;
> 80  loadval = args->v0.data;
> 81  } else
> 82  return ret;
> 
> 
> regards,
> dan carpenter
> 


GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Thomas Hellstrom
On 08/13/2014 06:24 PM, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 03:01 PM, Daniel Vetter wrote:
>>> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
 On 08/13/2014 12:42 PM, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>>> From: Jérôme Glisse 
>>>
>>> When experiencing memory pressure we want to minimize pool size so that
>>> memory we just shrank is not added back again as the next thing.
>>>
>>> This will divide by 2 the maximum pool size for each device each time
>>> the pool has to shrink. The limit is bumped again if the next allocation
>>> happens more than one second after the last shrink. The one second delay
>>> is obviously an arbitrary choice.
>> Jérôme,
>>
>> I don't like this patch. It adds extra complexity and its usefulness is
>> highly questionable.
>> There are a number of caches in the system, and if all of them added
>> some sort of voluntary shrink heuristics like this, we'd end up with
>> impossible-to-debug unpredictable performance issues.
>>
>> We should let the memory subsystem decide when to reclaim pages from
>> caches and what caches to reclaim them from.
> Yeah, artificially limiting your cache from growing when your shrinker
> gets called will just break the equal-memory pressure the core mm uses to
> rebalance between all caches when workload changes. In i915 we let
> everything grow without artificial bounds and only rely upon the shrinker
> callbacks to ensure we don't consume more than our fair share of available
> memory overall.
> -Daniel
 Now when you bring i915 memory usage up, Daniel,
 I can't refrain from bringing up the old user-space unreclaimable kernel
 memory issue, for which gem open is a good example ;) Each time
 user-space opens a gem handle, some un-reclaimable kernel memory is
 allocated, for which there is no accounting, so theoretically I think a
 user can bring a system to unusability this way.

 Typically there are various limits on unreclaimable objects like this,
 like open file descriptors, and IIRC the kernel even has an internal
 limit on the number of struct files you initialize, based on the
 available system memory, so dma-buf / prime should already have some
 sort of protection.
>>> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
>>> so there's not really a way to isolate gpu memory usage in a sane way for
>>> specific processes. But there's also zero limits on actual gpu usage
>>> itself (timeslices or whatever) so I guess no one asked for this yet.
>> In its simplest form (like in TTM if correctly implemented by drivers)
>> this type of accounting stops non-privileged malicious GPU-users from
>> exhausting all system physical memory causing grief for other kernel
>> systems but not from causing grief for other GPU users. I think that's
>> the minimum level that's intended also for example also for the struct
>> file accounting.
> I think in i915 we're fairly close on that minimal standard - interactions
> with shrinkers and oom logic work decently. It starts to fall apart though
> when we've actually run out of memory - if the real memory hog is a gpu
> process the oom killer won't notice all that memory since it's not
> accounted against processes correctly.
>
> I don't agree that gpu process should be punished in general compared to
> other subsystems in the kernel. If the user wants to use 90% of all memory
> for gpu tasks then I want to make that possible, even if it means that
> everything else thrashes horribly. And as long as the system recovers and
> rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a
> fairly arbitrary (tunable) setting to limit system memory consumption, but
> I might be wrong on that.

No, that's correct, or rather it's intended to limit pinned
unreclaimable system memory (though part of what's unreclaimable could
actually be made reclaimable if we'd implement another shrinker level).

>>> My comment really was about balancing mm users under the assumption that
>>> they're all unlimited.
>> Yeah, sorry for stealing the thread. I usually bring this up now and
>> again but nowadays with an exponential backoff.
> Oh I'd love to see some cgroups or similar tracking so that server users
> could set sane per-process/user/task limits on how much memory/gpu time
> that group is allowed to consume. It's just that I haven't seen real
> demand for this and so couldn't make the time available to implement it.
> So thus far my goal is to make everything work nicely for unlimited tasks
> right up to the point where the OOM killer needs to step in. Past that
> everything starts to fall apart, but thus far that was good enough for
> desktop usage.

W

GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Daniel Vetter
On Wed, Aug 13, 2014 at 6:38 PM, Daniel Vetter  wrote:
>> Yes, it currently limits you to half of memory, but at least we would
>> like to make it tuneable since there are a lot of user cases where the
>> user wants to use 90% of memory for GPU tasks at the expense of
>> everything else.
>
> Ime a lot of fun stuff starts to happen when you go there. We have piles
> of memory thrashing testcases and generally had lots of fun with our
> shrinker, so I think until you've really beaten onto those paths in
> ttm+radeon I'd keep the limit where it is.

One example that already starts if you go above 50% is that by default
the dirty pagecache is limited to 40% of memory. Above that you start
to stall in writeback, but gpus are really fast at re-dirtying memory.
So we've seen cases where the core mm OOMed with essentially 90% of
memory on writeback and piles of free swap. Waiting a few seconds for
the SSD to catch up would have gotten it out of that tight spot
without killing any process. One side-effect of such fun is that
memory allocations start to fail in really interesting places, and you
need to pile in hacks so make it all a bit more synchronous to avoid
the core mm freaking out.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Daniel Vetter
On Wed, Aug 13, 2014 at 12:30:45PM -0400, Alex Deucher wrote:
> On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter  wrote:
> > On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
> >> On 08/13/2014 03:01 PM, Daniel Vetter wrote:
> >> > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
> >> >> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
> >> >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
> >>  On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> >> > From: Jérôme Glisse 
> >> >
> >> > When experiencing memory pressure we want to minimize pool size so that
> >> > memory we just shrank is not added back again as the next thing.
> >> >
> >> > This will divide by 2 the maximum pool size for each device each time
> >> > the pool has to shrink. The limit is bumped again if the next allocation
> >> > happens more than one second after the last shrink. The one second delay
> >> > is obviously an arbitrary choice.
> >>  Jérôme,
> >> 
> >>  I don't like this patch. It adds extra complexity and its usefulness 
> >>  is
> >>  highly questionable.
> >>  There are a number of caches in the system, and if all of them added
> >>  some sort of voluntary shrink heuristics like this, we'd end up with
> >>  impossible-to-debug unpredictable performance issues.
> >> 
> >>  We should let the memory subsystem decide when to reclaim pages from
> >>  caches and what caches to reclaim them from.
> >> >>> Yeah, artificially limiting your cache from growing when your shrinker
> >> >>> gets called will just break the equal-memory pressure the core mm uses 
> >> >>> to
> >> >>> rebalance between all caches when workload changes. In i915 we let
> >> >>> everything grow without artificial bounds and only rely upon the 
> >> >>> shrinker
> >> >>> callbacks to ensure we don't consume more than our fair share of 
> >> >>> available
> >> >>> memory overall.
> >> >>> -Daniel
> >> >> Now when you bring i915 memory usage up, Daniel,
> >> >> I can't refrain from bringing up the old user-space unreclaimable kernel
> >> >> memory issue, for which gem open is a good example ;) Each time
> >> >> user-space opens a gem handle, some un-reclaimable kernel memory is
> >> >> allocated, for which there is no accounting, so theoretically I think a
> >> >> user can bring a system to unusability this way.
> >> >>
> >> >> Typically there are various limits on unreclaimable objects like this,
> >> >> like open file descriptors, and IIRC the kernel even has an internal
> >> >> limit on the number of struct files you initialize, based on the
> >> >> available system memory, so dma-buf / prime should already have some
> >> >> sort of protection.
> >> > Oh yeah, we have zero cgroups limits or similar stuff for gem 
> >> > allocations,
> >> > so there's not really a way to isolate gpu memory usage in a sane way for
> >> > specific processes. But there's also zero limits on actual gpu usage
> >> > itself (timeslices or whatever) so I guess no one asked for this yet.
> >>
> >> In its simplest form (like in TTM if correctly implemented by drivers)
> >> this type of accounting stops non-privileged malicious GPU-users from
> >> exhausting all system physical memory causing grief for other kernel
> >> systems but not from causing grief for other GPU users. I think that's
> >> the minimum level that's intended also for example also for the struct
> >> file accounting.
> >
> > I think in i915 we're fairly close on that minimal standard - interactions
> > with shrinkers and oom logic work decently. It starts to fall apart though
> > when we've actually run out of memory - if the real memory hog is a gpu
> > process the oom killer won't notice all that memory since it's not
> > accounted against processes correctly.
> >
> > I don't agree that gpu process should be punished in general compared to
> > other subsystems in the kernel. If the user wants to use 90% of all memory
> > for gpu tasks then I want to make that possible, even if it means that
> > everything else thrashes horribly. And as long as the system recovers and
> > rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a
> > fairly arbitrary (tunable) setting to limit system memory consumption, but
> > I might be wrong on that.
> 
> Yes, it currently limits you to half of memory, but at least we would
> like to make it tuneable since there are a lot of user cases where the
> user wants to use 90% of memory for GPU tasks at the expense of
> everything else.

Ime a lot of fun stuff starts to happen when you go there. We have piles
of memory thrashing testcases and generally had lots of fun with our
shrinker, so I think until you've really beaten onto those paths in
ttm+radeon I'd keep the limit where it is.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Daniel Vetter
On Wed, Aug 13, 2014 at 05:09:49PM +0300, Oded Gabbay wrote:
> 
> 
> On 13/08/14 16:01, Daniel Vetter wrote:
> >On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
> >>On 08/13/2014 12:42 PM, Daniel Vetter wrote:
> >>>On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> >From: Jérôme Glisse 
> >
> >When experiencing memory pressure we want to minimize pool size so that
> >memory we just shrank is not added back again as the next thing.
> >
> >This will divide by 2 the maximum pool size for each device each time
> >the pool has to shrink. The limit is bumped again if the next allocation
> >happens more than one second after the last shrink. The one second delay
> >is obviously an arbitrary choice.
> Jérôme,
> 
> I don't like this patch. It adds extra complexity and its usefulness is
> highly questionable.
> There are a number of caches in the system, and if all of them added
> some sort of voluntary shrink heuristics like this, we'd end up with
> impossible-to-debug unpredictable performance issues.
> 
> We should let the memory subsystem decide when to reclaim pages from
> caches and what caches to reclaim them from.
> >>>Yeah, artificially limiting your cache from growing when your shrinker
> >>>gets called will just break the equal-memory pressure the core mm uses to
> >>>rebalance between all caches when workload changes. In i915 we let
> >>>everything grow without artificial bounds and only rely upon the shrinker
> >>>callbacks to ensure we don't consume more than our fair share of available
> >>>memory overall.
> >>>-Daniel
> >>
> >>Now when you bring i915 memory usage up, Daniel,
> >>I can't refrain from bringing up the old user-space unreclaimable kernel
> >>memory issue, for which gem open is a good example ;) Each time
> >>user-space opens a gem handle, some un-reclaimable kernel memory is
> >>allocated, for which there is no accounting, so theoretically I think a
> >>user can bring a system to unusability this way.
> >>
> >>Typically there are various limits on unreclaimable objects like this,
> >>like open file descriptors, and IIRC the kernel even has an internal
> >>limit on the number of struct files you initialize, based on the
> >>available system memory, so dma-buf / prime should already have some
> >>sort of protection.
> >
> >Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
> >so there's not really a way to isolate gpu memory usage in a sane way for
> >specific processes. But there's also zero limits on actual gpu usage
> >itself (timeslices or whatever) so I guess no one asked for this yet.
> >
> >My comment really was about balancing mm users under the assumption that
> >they're all unlimited.
> >-Daniel
> >
> I think the point you brought up becomes very important for compute (HSA)
> processes. I still don't know how to distinguish between legitimate use of
> GPU local memory and misbehaving/malicious processes.
> 
> We have a requirement that HSA processes will be allowed to allocate and pin
> GPU local memory. They do it through an ioctl.
> In the kernel driver, we have an accounting of those memory allocations,
> meaning that I can print a list of all the objects that were allocated by a
> certain process, per device.
> Therefore, in theory, I can reclaim any object, but that will probably break
> the userspace app. If the app is misbehaving/malicious then that's ok, I
> guess. But how do I know that? And what prevents that malicious app from
> re-spawning and doing the same allocation again?

You can't do that in the kernel; these are policy decisions, which are
userspace's job. What we instead need to allow is properly tracking
memory allocations so that memory limits can be set with cgroups. With SVM
you get that for free. Without SVM we need some work in that area, since
currently the memory accounting for gem/ttm drivers is broken.

The other bit is limits for wasting gpu time, and I guess for that we want
a new gpu time cgroup system so that users can set soft/hard limits for
different gpgpu tasks on servers.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
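As an editor's aside: the gpu-time cgroup Daniel wishes for does not exist, but the system-memory half of the limit can be sketched with the cgroup v2 memory controller. This is a hypothetical illustration only; the group name and values are invented, it needs root and a cgroup v2 mount, and (as the mail notes) gem/ttm allocations largely escape this accounting.

```shell
# Hypothetical sketch: cap system memory for a GPU-heavy workload with the
# cgroup v2 memory controller.  Group name and limits are illustrative.
mkdir /sys/fs/cgroup/gpgpu-batch
echo "512M" > /sys/fs/cgroup/gpgpu-batch/memory.max    # hard limit
echo "384M" > /sys/fs/cgroup/gpgpu-batch/memory.high   # soft (throttle) limit
echo $$     > /sys/fs/cgroup/gpgpu-batch/cgroup.procs  # move this shell in
# Caveat from the thread: kernel memory backing gem/ttm objects is not
# charged to the process, which is exactly the accounting gap discussed.
```

This only bounds what the memory controller can see; a real fix needs the gem/ttm accounting work Daniel describes.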


GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Daniel Vetter
On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
> On 08/13/2014 03:01 PM, Daniel Vetter wrote:
> > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
> >> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
> >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>  On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> > From: Jérôme Glisse 
> >
> > When experiencing memory pressure we want to minimize pool size so that
> > memory we just shrank is not added straight back again as the next thing.
> >
> > This will halve the maximum pool size for each device each time the
> > pool has to shrink. The limit is bumped again if the next allocation
> > happens more than one second after the last shrink. The one second
> > delay is obviously an arbitrary choice.
>  Jérôme,
> 
>  I don't like this patch. It adds extra complexity and its usefulness is
>  highly questionable.
>  There are a number of caches in the system, and if all of them added
>  some sort of voluntary shrink heuristics like this, we'd end up with
>  impossible-to-debug unpredictable performance issues.
> 
>  We should let the memory subsystem decide when to reclaim pages from
>  caches and what caches to reclaim them from.
> >>> Yeah, artificially limiting your cache from growing when your shrinker
> >>> gets called will just break the equal-memory pressure the core mm uses to
> >>> rebalance between all caches when workload changes. In i915 we let
> >>> everything grow without artificial bounds and only rely upon the shrinker
> >>> callbacks to ensure we don't consume more than our fair share of available
> >>> memory overall.
> >>> -Daniel
> >> Now when you bring i915 memory usage up, Daniel,
> >> I can't refrain from bringing up the old user-space unreclaimable kernel
> >> memory issue, for which gem open is a good example ;) Each time
> >> user-space opens a gem handle, some un-reclaimable kernel memory is
> >> allocated, for which there is no accounting, so theoretically I think a
> >> user can bring a system to unusability this way.
> >>
> >> Typically there are various limits on unreclaimable objects like this,
> >> like open file descriptors, and IIRC the kernel even has an internal
> >> limit on the number of struct files you initialize, based on the
> >> available system memory, so dma-buf / prime should already have some
> >> sort of protection.
> > Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
> > so there's not really a way to isolate gpu memory usage in a sane way for
> > specific processes. But there's also zero limits on actual gpu usage
> > itself (timeslices or whatever) so I guess no one asked for this yet.
> 
> In its simplest form (like in TTM, if correctly implemented by drivers)
> this type of accounting stops non-privileged malicious GPU users from
> exhausting all system physical memory and causing grief for other kernel
> systems, but not from causing grief for other GPU users. I think that's
> the minimum level that's intended, for example, also for the struct
> file accounting.

I think in i915 we're fairly close on that minimal standard - interactions
with shrinkers and oom logic work decently. It starts to fall apart though
when we've actually run out of memory - if the real memory hog is a gpu
process the oom killer won't notice all that memory since it's not
accounted against processes correctly.

I don't agree that gpu processes should be punished in general compared to
other subsystems in the kernel. If the user wants to use 90% of all memory
for gpu tasks then I want to make that possible, even if it means that
everything else thrashes horribly, as long as the system recovers and
rebalances after that gpu memory hog is gone, of course. IIRC ttm currently
has a fairly arbitrary (tunable) setting to limit system memory
consumption, but I might be wrong on that.

> > My comment really was about balancing mm users under the assumption that
> > they're all unlimited.
> 
> Yeah, sorry for stealing the thread. I usually bring this up now and
> again but nowadays with an exponential backoff.

Oh I'd love to see some cgroups or similar tracking so that server users
could set sane per-process/user/task limits on how much memory/gpu time
that group is allowed to consume. It's just that I haven't seen real
demand for this and so couldn't make the time available to implement it.
So thus far my goal is to make everything work nicely for unlimited tasks
right up to the point where the OOM killer needs to step in. Past that
everything starts to fall apart, but thus far that was good enough for
desktop usage.

Maybe WebGL will finally make this important enough so that we can fix it
for real ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Fence, timeline and android sync points

2014-08-13 Thread Daniel Vetter
On Wed, Aug 13, 2014 at 09:36:04AM -0400, Jerome Glisse wrote:
> On Wed, Aug 13, 2014 at 10:28:22AM +0200, Daniel Vetter wrote:
> > On Tue, Aug 12, 2014 at 06:13:41PM -0400, Jerome Glisse wrote:
> > > Hi,
> > > 
> > > So I went over the whole fence and sync point stuff as it's becoming
> > > a pressing issue. I think we first need to agree on what is the
> > > problem we want to solve and what would be the requirements to solve
> > > it.
> > > 
> > > Problem :
> > >   Explicit synchronization between different hardware blocks over a
> > >   buffer object.
> > > 
> > > Requirements :
> > >   Share common infrastructure.
> > >   Allow optimal hardware command stream scheduling across hardware blocks.
> > >   Allow android sync point to be implemented on top of it.
> > >   Handle/acknowledge exception (like good old gpu lockup).
> > >   Minimize driver changes.
> > > 
> > > Glossary :
> > >   hardware timeline: timeline bound to a specific hardware block.
> > >   pipeline timeline: timeline bound to a userspace rendering pipeline;
> > >  each point on that timeline can be a composite of
> > >  several different hardware pipeline points.
> > >   pipeline: abstract object representing a userspace application
> > > graphic pipeline of each of the application graphic
> > > operations.
> > >   fence: specific point in a timeline where synchronization needs to
> > >  happen.
> > > 
> > > 
> > > So now, the current include/linux/fence.h implementation is, I
> > > believe, missing the objective by confusing hardware and pipeline
> > > timelines and by bolting fences to buffer objects, while what is
> > > really needed is a true and proper timeline for both hardware and
> > > pipeline. But before going further down that road let me look at
> > > things and explain how I see them.
> > 
> > fences can be used free-standing and no one forces you to integrate them
> > with buffers. We actually plan to go this way with the intel svm stuff.
> > Of course for dma-buf the plan is to synchronize using such fences, but
> > that's somewhat orthogonal I think. At least you only talk about fences
> > and timelines and not dma-buf here.
> >  
> > > The current ttm fence has one sole purpose: allow synchronization for
> > > buffer object moves, even though some drivers like radeon slightly
> > > abuse it and use it for things like lockup detection.
> > > 
> > > The new fence wants to expose an api that would allow some
> > > implementation of a timeline. For that it introduces callbacks and
> > > some hard requirements on what the driver has to expose :
> > >   enable_signaling
> > >   [signaled]
> > >   wait
> > > 
> > > Each of those has to do work inside the driver to which the fence
> > > belongs, and each of those can be called more or less from unexpected
> > > contexts (with restrictions like outside irq). So we end up with
> > > things like :
> > > 
> > >  Process 1  Process 2   Process 3
> > >  I_A_schedule(fence0)
> > > CI_A_F_B_signaled(fence0)
> > > I_A_signal(fence0)
> > > 
> > > CI_B_F_A_callback(fence0)
> > > CI_A_F_B_wait(fence0)
> > > Legend:
> > > I_x  in driver x (I_A == in driver A)
> > > CI_x_F_y call in driver X from driver Y (CI_A_F_B == call in driver A
> > > from driver B)
> > > 
> > > So this is a happy mess: everyone calls everyone, and this is bound
> > > to get messy. Yes, I know there are all kinds of requirements on what
> > > happens once a fence is signaled. But those requirements only look
> > > like they are trying to atone for any mess that can happen from the
> > > whole callback dance.
> > > 
> > > While I too was seduced by the whole callback idea a long time ago, I
> > > think it is a highly dangerous path to take, where the combinatorics
> > > of what could happen are bound to explode with the increase in the
> > > number of players.
> > > 
> > > 
> > > So now back to how to solve the problem we are trying to address.
> > > First I want to make an observation: almost all GPUs that exist today
> > > have a command ring on which userspace command buffers are executed,
> > > and inside the command ring you can do something like :
> > > 
> > >   if (condition) execute_command_buffer else skip_command_buffer
> > > 
> > > where condition is a simple expression (memory_address cop value),
> > > with cop one of the generic comparisons (==, <, >, <=, >=). I think
> > > it is a safe assumption that any gpu that slightly matters can do
> > > that. Those who can not should fix their command ring processor.
> > > 
> > > 
> > > With that in mind, I think the proper solution is implementing
> > > timelines and having fences be timeline objects with a way simpler
> > > api. For each hardware timelin

[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #18 from dimangi42 at gmail.com ---
(In reply to comment #16)
> What version of mesa are you using?  Does changing the mesa version help?

Sorry, I'm a bit of a noob here, but how do you replace mesa with an
earlier version?

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: 
<http://lists.freedesktop.org/archives/dri-devel/attachments/20140813/81825666/attachment.html>


GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Thomas Hellstrom
On 08/13/2014 04:09 PM, Oded Gabbay wrote:
>
>
> On 13/08/14 16:01, Daniel Vetter wrote:
>> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
 On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>> From: Jérôme Glisse 
>>
>> When experiencing memory pressure we want to minimize pool size
>> so that memory we just shrank is not added straight back again
>> as the next thing.
>>
>> This will halve the maximum pool size for each device each time
>> the pool has to shrink. The limit is bumped again if the next
>> allocation happens more than one second after the last shrink.
>> The one second delay is obviously an arbitrary choice.
> Jérôme,
>
> I don't like this patch. It adds extra complexity and its
> usefulness is
> highly questionable.
> There are a number of caches in the system, and if all of them added
> some sort of voluntary shrink heuristics like this, we'd end up with
> impossible-to-debug unpredictable performance issues.
>
> We should let the memory subsystem decide when to reclaim pages from
> caches and what caches to reclaim them from.
 Yeah, artificially limiting your cache from growing when your shrinker
 gets called will just break the equal-memory pressure the core mm
 uses to
 rebalance between all caches when workload changes. In i915 we let
 everything grow without artificial bounds and only rely upon the
 shrinker
 callbacks to ensure we don't consume more than our fair share of
 available
 memory overall.
 -Daniel
>>>
>>> Now when you bring i915 memory usage up, Daniel,
>>> I can't refrain from bringing up the old user-space unreclaimable
>>> kernel
>>> memory issue, for which gem open is a good example ;) Each time
>>> user-space opens a gem handle, some un-reclaimable kernel memory is
>>> allocated, for which there is no accounting, so theoretically I think a
>>> user can bring a system to unusability this way.
>>>
>>> Typically there are various limits on unreclaimable objects like this,
>>> like open file descriptors, and IIRC the kernel even has an internal
>>> limit on the number of struct files you initialize, based on the
>>> available system memory, so dma-buf / prime should already have some
>>> sort of protection.
>>
>> Oh yeah, we have zero cgroups limits or similar stuff for gem
>> allocations,
>> so there's not really a way to isolate gpu memory usage in a sane way
>> for
>> specific processes. But there's also zero limits on actual gpu usage
>> itself (timeslices or whatever) so I guess no one asked for this yet.
>>
>> My comment really was about balancing mm users under the assumption that
>> they're all unlimited.
>> -Daniel
>>
> I think the point you brought up becomes very important for compute
> (HSA) processes. I still don't know how to distinguish between
> legitimate use of GPU local memory and misbehaving/malicious processes.
>
> We have a requirement that HSA processes will be allowed to allocate
> and pin GPU local memory. They do it through an ioctl.
> In the kernel driver, we have an accounting of those memory
> allocations, meaning that I can print a list of all the objects that
> were allocated by a certain process, per device.
> Therefore, in theory, I can reclaim any object, but that will probably
> break the userspace app. If the app is misbehaving/malicious then
> that's ok, I guess. But how do I know that? And what prevents that
> malicious app from re-spawning and doing the same allocation again?

If you have per-process accounting of all those memory allocations and
you need to reclaim and there's no nice way of doing it, you should
probably do it like the kernel OOM killer: You simply kill the process
that is most likely to bring back most memory (or use any other
heuristic). Typically the kernel OOM killer does that when all swap
space is exhausted, probably assuming that the process that uses most
swap space is most likely to be malicious, if there are any malicious
processes.

If the process respawns, and you run out of resources again, repeat the
process.

/Thomas


>
> Oded



GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Thomas Hellstrom
On 08/13/2014 03:01 PM, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
 On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> From: Jérôme Glisse 
>
> When experiencing memory pressure we want to minimize pool size so that
> memory we just shrank is not added straight back again as the next thing.
>
> This will halve the maximum pool size for each device each time the
> pool has to shrink. The limit is bumped again if the next allocation
> happens more than one second after the last shrink. The one second
> delay is obviously an arbitrary choice.
 Jérôme,

 I don't like this patch. It adds extra complexity and its usefulness is
 highly questionable.
 There are a number of caches in the system, and if all of them added
 some sort of voluntary shrink heuristics like this, we'd end up with
 impossible-to-debug unpredictable performance issues.

 We should let the memory subsystem decide when to reclaim pages from
 caches and what caches to reclaim them from.
>>> Yeah, artificially limiting your cache from growing when your shrinker
>>> gets called will just break the equal-memory pressure the core mm uses to
>>> rebalance between all caches when workload changes. In i915 we let
>>> everything grow without artificial bounds and only rely upon the shrinker
>>> callbacks to ensure we don't consume more than our fair share of available
>>> memory overall.
>>> -Daniel
>> Now when you bring i915 memory usage up, Daniel,
>> I can't refrain from bringing up the old user-space unreclaimable kernel
>> memory issue, for which gem open is a good example ;) Each time
>> user-space opens a gem handle, some un-reclaimable kernel memory is
>> allocated, for which there is no accounting, so theoretically I think a
>> user can bring a system to unusability this way.
>>
>> Typically there are various limits on unreclaimable objects like this,
>> like open file descriptors, and IIRC the kernel even has an internal
>> limit on the number of struct files you initialize, based on the
>> available system memory, so dma-buf / prime should already have some
>> sort of protection.
> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
> so there's not really a way to isolate gpu memory usage in a sane way for
> specific processes. But there's also zero limits on actual gpu usage
> itself (timeslices or whatever) so I guess no one asked for this yet.

In its simplest form (like in TTM, if correctly implemented by drivers)
this type of accounting stops non-privileged malicious GPU users from
exhausting all system physical memory and causing grief for other kernel
systems, but not from causing grief for other GPU users. I think that's
the minimum level that's intended, for example, also for the struct
file accounting.

> My comment really was about balancing mm users under the assumption that
> they're all unlimited.

Yeah, sorry for stealing the thread. I usually bring this up now and
again but nowadays with an exponential backoff.


> -Daniel

Thomas



GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Oded Gabbay


On 13/08/14 16:01, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
 On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> From: Jérôme Glisse 
>
> When experiencing memory pressure we want to minimize pool size so that
> memory we just shrank is not added straight back again as the next thing.
>
> This will halve the maximum pool size for each device each time the
> pool has to shrink. The limit is bumped again if the next allocation
> happens more than one second after the last shrink. The one second
> delay is obviously an arbitrary choice.
 Jérôme,

 I don't like this patch. It adds extra complexity and its usefulness is
 highly questionable.
 There are a number of caches in the system, and if all of them added
 some sort of voluntary shrink heuristics like this, we'd end up with
 impossible-to-debug unpredictable performance issues.

 We should let the memory subsystem decide when to reclaim pages from
 caches and what caches to reclaim them from.
>>> Yeah, artificially limiting your cache from growing when your shrinker
>>> gets called will just break the equal-memory pressure the core mm uses to
>>> rebalance between all caches when workload changes. In i915 we let
>>> everything grow without artificial bounds and only rely upon the shrinker
>>> callbacks to ensure we don't consume more than our fair share of available
>>> memory overall.
>>> -Daniel
>>
>> Now when you bring i915 memory usage up, Daniel,
>> I can't refrain from bringing up the old user-space unreclaimable kernel
>> memory issue, for which gem open is a good example ;) Each time
>> user-space opens a gem handle, some un-reclaimable kernel memory is
>> allocated, for which there is no accounting, so theoretically I think a
>> user can bring a system to unusability this way.
>>
>> Typically there are various limits on unreclaimable objects like this,
>> like open file descriptors, and IIRC the kernel even has an internal
>> limit on the number of struct files you initialize, based on the
>> available system memory, so dma-buf / prime should already have some
>> sort of protection.
>
> Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
> so there's not really a way to isolate gpu memory usage in a sane way for
> specific processes. But there's also zero limits on actual gpu usage
> itself (timeslices or whatever) so I guess no one asked for this yet.
>
> My comment really was about balancing mm users under the assumption that
> they're all unlimited.
> -Daniel
>
I think the point you brought up becomes very important for compute (HSA)
processes. I still don't know how to distinguish between legitimate use of
GPU local memory and misbehaving/malicious processes.

We have a requirement that HSA processes will be allowed to allocate and
pin GPU local memory. They do it through an ioctl.
In the kernel driver, we have an accounting of those memory allocations,
meaning that I can print a list of all the objects that were allocated by
a certain process, per device.
Therefore, in theory, I can reclaim any object, but that will probably
break the userspace app. If the app is misbehaving/malicious then that's
ok, I guess. But how do I know that? And what prevents that malicious app
from re-spawning and doing the same allocation again?

Oded


Fence, timeline and android sync points

2014-08-13 Thread Christian König
> The whole issue is that today the cs ioctl assumes implied synchronization.
> So this can not change; for now anything that goes through the cs ioctl
> would need to use an implied timeline and have all rings that use a common
> buffer synchronize on it. As long as those rings use different buffers
> there is no need for sync.
Exactly my thoughts.

> Buffer objects are what link hw timelines.
A couple of people at AMD have a problem with that and I'm currently 
working full time on a solution. But solving this and keeping 100% 
backward compatibility at the same time is not an easy task.

> Of course there might be way to be more flexible if timeline are expose to
> userspace and userspace can create several of them for a single process.
Concurrent execution is mostly used for temporary things, e.g. copying a 
result to a userspace buffer while VCE is decoding into the ring buffer 
at a different location. Creating an extra timeline just to 
tell the kernel that two commands are allowed to run in parallel sounds 
like too much overhead to me.

Cheers,
Christian.

On 13.08.2014 at 15:41, Jerome Glisse wrote:
> On Wed, Aug 13, 2014 at 09:59:26AM +0200, Christian König wrote:
>> Hi Jerome,
>>
>> first of all, that finally sounds like somebody is starting to draw the
>> whole picture for me.
>>
>> So far all I have seen was a bunch of specialized requirements and some not
>> so obvious design decisions based on those requirements.
>>
>> So thanks a lot for finally summarizing the requirements from a top above
>> view and I perfectly agree with your analysis of the current fence design
>> and the downsides of that API.
>>
>> Apart from that I also have some comments / requirements that hopefully can
>> be taken into account as well:
>>
>>>pipeline timeline: timeline bound to a userspace rendering pipeline; each
>>>   point on that timeline can be a composite of several
>>>   different hardware pipeline points.
>>>pipeline: abstract object representing a userspace application graphic
>>>  pipeline of each of the application graphic operations.
>> In the long term a requirement for the driver for AMD GFX hardware is that
>> instead of a fixed pipeline timeline we need a bit more flexible model where
>> concurrent execution on different hardware engines is possible as well.
>>
>> So the requirement is that you can do things like submitting a 3D job A, a
>> DMA job B, a VCE job C and another 3D job D that are executed like this:
>>  A
>> /  \
>>B  C
>> \  /
>>  D
>>
>> (Let's just hope that looks as good on your mail client as it looked for
>> me).
> My thinking of hw timelines is that a gpu like amd or nvidia would have
> several different hw timelines. They are per block/engine, so one for the
> dma ring, one for gfx, one for vce, 
>
>   
>> My current thinking is that we avoid having a pipeline object in the kernel
>> and instead letting userspace specify which fence we want to synchronize to
>> explicitly as long as everything stays withing the same client. As soon as
>> any buffer is shared between clients the kernel we would need to fall back
>> to implicitly synchronization to allow backward compatibility with DRI2/3.
> The whole issue is that today the cs ioctl assumes implied synchronization.
> So this can not change; for now anything that goes through the cs ioctl
> would need to use an implied timeline and have all rings that use a common
> buffer synchronize on it. As long as those rings use different buffers
> there is no need for sync.
>
> Buffer objects are what link hw timelines.
>
> Of course there might be way to be more flexible if timeline are expose to
> userspace and userspace can create several of them for a single process.
>
>>>if (condition) execute_command_buffer else skip_command_buffer
>>>
>>> where condition is a simple expression (memory_address cop value), with
>>> cop one of the generic comparisons (==, <, >, <=, >=). I think it is a
>>> safe assumption that any gpu that slightly matters can do that. Those
>>> who can not should fix their command ring processor.
>> At least for some engines on AMD hardware that isn't possible (UVD, VCE and
>> to some extent DMA as well), but I don't see any reason why we shouldn't be
>> able to use software based scheduling on those engines by default. So this
>> isn't really a problem, but just an additional comment to keep in mind.
> Yes, not everything can do that, but as it's a simple memory access with a
> simple comparison it's easy to do on the cpu for limited hardware. But this
> really sounds like something so easy to add to hw ring execution that it is
> a shame hw designers have not already added such a thing.
>
>> Regards,
>> Christian.
>>
>> On 13.08.2014 at 00:13, Jerome Glisse wrote:
>>> Hi,
>>>
>>> So I went over the whole fence and sync point stuff as it's becoming a
>>> pressing issue. I think we first need to agree on what is the problem
>>> we want to solve
>>> 

[Bug 82186] [r600g] BARTS GPU lockup with minecraft shaders

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82186

EoD  changed:

   What|Removed |Added

 Attachment #104555|0   |1
is obsolete||

--- Comment #4 from EoD  ---
Created attachment 104572
  --> https://bugs.freedesktop.org/attachment.cgi?id=104572&action=edit
syslog with kernel 3.16.0, mesa 10.2.4 and libdrm master

A shorter log during the crash



[Bug 81192] Garbled screen with VDPAU playback and h264

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=81192

--- Comment #5 from Simon Gebler  ---
Late answer, but mplayer doesn't give any output.
And I'm wondering if it could be related to Bug 71923



[PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool

2014-08-13 Thread Michel Dänzer
On 13.08.2014 12:52, Jérôme Glisse wrote:
> From: Jérôme Glisse 
> 
> When experiencing memory pressure we want to minimize pool size so that
> memory we just shrank is not added straight back again as the next thing.
> 
> This will halve the maximum pool size for each device each time the
> pool has to shrink. The limit is bumped again if the next allocation
> happens more than one second after the last shrink. The one second
> delay is obviously an arbitrary choice.
> 
> Signed-off-by: Jérôme Glisse 

[...]

> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index 09874d6..ab41adf 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -68,6 +68,8 @@
>   * @list: Pool of free uc/wc pages for fast reuse.
>   * @gfp_flags: Flags to pass for alloc_page.
>   * @npages: Number of pages in pool.
> + * @cur_max_size: Current maximum size for the pool.
> + * @shrink_timeout: Timeout for pool maximum size restriction.
>   */
>  struct ttm_page_pool {
>   spinlock_t  lock;
> @@ -76,6 +78,8 @@ struct ttm_page_pool {
>   gfp_t   gfp_flags;
>   unsignednpages;
>   char*name;
> + unsignedcur_max_size;
> + unsigned long   last_shrink;

s/last_shrink/shrink_timeout/

Looks like maybe you posted an untested stale set of patches?


> @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct 
> ttm_page_pool *pool,
>   pool->nfrees += freed_pages;
>  }
>  
> +static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool)
> +{
> + if (time_before(jiffies, pool->shrink_timeout))
> + return;
> + /* In case we reached zero bounce back to 512 pages. */
> + pool->cur_max_size = max(pool->cur_max_size << 1, 512);

Another 'comparison of distinct pointer types lacks a cast' warning.


Both issues apply to ttm_page_alloc_dma.c as well.


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer


[PATCH 1/3] drm/ttm: set sensible pool size limit.

2014-08-13 Thread Michel Dänzer
On 13.08.2014 12:52, Jérôme Glisse wrote:
> From: Jérôme Glisse 
> 
> Due to a bug in the code, it appears that some of the pools were never
> properly used and were always empty. Before fixing that bug, this patch
> sets a sensible limit on pool size. The magic 64MB number was nominated.
> 
> This is obviously a somewhat arbitrary number, but the intent of the ttm
> pool is to minimize page allocation cost, especially when allocating pages
> that will be marked for exclusion from CPU cache mechanisms. We assume that
> mostly small buffers that are constantly allocated/deallocated might suffer
> from core memory allocation overhead as well as cache status changes. These
> are the assumptions behind the 64MB value.
> 
> This obviously needs serious testing, including monitoring of pool size.
> 
> Signed-off-by: Jérôme Glisse 

[...]

> @@ -393,8 +404,9 @@ int ttm_mem_global_init(struct ttm_mem_global *glob)
>   pr_info("Zone %7s: Available graphics memory: %llu kiB\n",
>   zone->name, (unsigned long long)zone->max_mem >> 10);
>   }
> - ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
> - ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
> + max_pool_size = min(glob->zone_kernel->max_mem >> 3UL, MAX_POOL_SIZE);

This introduces a 'comparison of distinct pointer types lacks a cast'
warning for me.


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer


[Bug 82050] R9270X pyrit benchmark perf regressions with latest kernel/llvm

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82050

--- Comment #12 from Andy Furniss  ---
(In reply to comment #11)
> Created attachment 104549 [details]
> Only flush HDP cache for indirect buffers from userspace
> 
> Does this patch help?

No, I'm afraid that doesn't help either.

Valley is the same - pyrit only slightly different, probably within random
variation. 

I am testing with "bad" llvm, so the numbers are all low.
Since I recorded them, here's a paste of the pyrit results for the good
kernel, head, head + patch 1 and head + patch 2:

On good -

Running benchmark (57982.3 PMKs/s)... / 

Computed 58917.21 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 55101.4 PMKs/s (RTT 1.1)
#2: 'CPU-Core (SSE2)': 757.2 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 756.0 PMKs/s (RTT 3.0)
#4: 'CPU-Core (SSE2)': 755.5 PMKs/s (RTT 2.8)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)


On head

Running benchmark (50267.7 PMKs/s)... \ 

Computed 50096.30 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48501.1 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 757.5 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 757.1 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 757.0 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Head + patch one

Running benchmark (50883.7 PMKs/s)... - 

Computed 51220.59 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48583.5 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 756.5 PMKs/s (RTT 3.0)
#3: 'CPU-Core (SSE2)': 756.0 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 754.2 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Head + patch two

Running benchmark (51348.9 PMKs/s)... | 

Computed 50781.53 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48676.9 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 752.4 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 755.4 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 752.8 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)



GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Daniel Vetter
On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
> > On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
> >> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> >>> From: Jérôme Glisse 
> >>>
> >>> When experiencing memory pressure we want to minimize the pool size so
> >>> that memory we just shrank is not immediately added back.
> >>>
> >>> This halves the maximum pool size for each device each time the pool
> >>> has to shrink. The limit is raised again if the next allocation happens
> >>> more than one second after the last shrink. The one-second delay is
> >>> obviously an arbitrary choice.
> >> Jérôme,
> >>
> >> I don't like this patch. It adds extra complexity and its usefulness is
> >> highly questionable.
> >> There are a number of caches in the system, and if all of them added
> >> some sort of voluntary shrink heuristics like this, we'd end up with
> >> impossible-to-debug unpredictable performance issues.
> >>
> >> We should let the memory subsystem decide when to reclaim pages from
> >> caches and what caches to reclaim them from.
> > Yeah, artificially limiting your cache from growing when your shrinker
> > gets called will just break the equal-memory pressure the core mm uses to
> > rebalance between all caches when workload changes. In i915 we let
> > everything grow without artificial bounds and only rely upon the shrinker
> > callbacks to ensure we don't consume more than our fair share of available
> > memory overall.
> > -Daniel
> 
> Now when you bring i915 memory usage up, Daniel,
> I can't refrain from bringing up the old user-space unreclaimable kernel
> memory issue, for which gem open is a good example ;) Each time
> user-space opens a gem handle, some un-reclaimable kernel memory is
> allocated, for which there is no accounting, so theoretically I think a
> user can bring a system to unusability this way.
> 
> Typically there are various limits on unreclaimable objects like this,
> like open file descriptors, and IIRC the kernel even has an internal
> limit on the number of struct files you initialize, based on the
> available system memory, so dma-buf / prime should already have some
> sort of protection.

Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
so there's not really a way to isolate gpu memory usage in a sane way for
specific processes. But there's also zero limits on actual gpu usage
itself (timeslices or whatever) so I guess no one asked for this yet.

My comment really was about balancing mm users under the assumption that
they're all unlimited.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Thomas Hellstrom
On 08/13/2014 02:40 PM, David Herrmann wrote:
> Hi
>
> On Wed, Aug 13, 2014 at 2:35 PM, Thomas Hellstrom  
> wrote:
>> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
 On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> From: Jérôme Glisse 
>
> When experiencing memory pressure we want to minimize the pool size so
> that memory we just shrank is not immediately added back.
>
> This halves the maximum pool size for each device each time the pool
> has to shrink. The limit is raised again if the next allocation happens
> more than one second after the last shrink. The one-second delay is
> obviously an arbitrary choice.
 Jérôme,

 I don't like this patch. It adds extra complexity and its usefulness is
 highly questionable.
 There are a number of caches in the system, and if all of them added
 some sort of voluntary shrink heuristics like this, we'd end up with
 impossible-to-debug unpredictable performance issues.

 We should let the memory subsystem decide when to reclaim pages from
 caches and what caches to reclaim them from.
>>> Yeah, artificially limiting your cache from growing when your shrinker
>>> gets called will just break the equal-memory pressure the core mm uses to
>>> rebalance between all caches when workload changes. In i915 we let
>>> everything grow without artificial bounds and only rely upon the shrinker
>>> callbacks to ensure we don't consume more than our fair share of available
>>> memory overall.
>>> -Daniel
>> Now when you bring i915 memory usage up, Daniel,
>> I can't refrain from bringing up the old user-space unreclaimable kernel
>> memory issue, for which gem open is a good example ;) Each time
>> user-space opens a gem handle, some un-reclaimable kernel memory is
>> allocated, for which there is no accounting, so theoretically I think a
>> user can bring a system to unusability this way.
>>
>> Typically there are various limits on unreclaimable objects like this,
>> like open file descriptors, and IIRC the kernel even has an internal
>> limit on the number of struct files you initialize, based on the
>> available system memory, so dma-buf / prime should already have some
>> sort of protection.
> gem->filp points to a fresh shmem file, which itself is limited like
> dmabuf. That should suffice, right?
>
> Thanks
> David
I'm thinking of situations where you have a gem name and open a new
handle. It allocates a new unaccounted idr object. Admittedly you'd have
to open a hell of a lot of new handles to stress the system, but that's
an example of the situation I'm thinking of. Similarly perhaps if you
create a gem handle from a prime file-descriptor but I haven't looked at
that code in detail.

Thanks

/Thomas



GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread David Herrmann
Hi

On Wed, Aug 13, 2014 at 2:35 PM, Thomas Hellstrom  
wrote:
> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
 From: Jérôme Glisse 

 When experiencing memory pressure we want to minimize the pool size so
 that memory we just shrank is not immediately added back.

 This halves the maximum pool size for each device each time the pool
 has to shrink. The limit is raised again if the next allocation happens
 more than one second after the last shrink. The one-second delay is
 obviously an arbitrary choice.
>>> Jérôme,
>>>
>>> I don't like this patch. It adds extra complexity and its usefulness is
>>> highly questionable.
>>> There are a number of caches in the system, and if all of them added
>>> some sort of voluntary shrink heuristics like this, we'd end up with
>>> impossible-to-debug unpredictable performance issues.
>>>
>>> We should let the memory subsystem decide when to reclaim pages from
>>> caches and what caches to reclaim them from.
>> Yeah, artificially limiting your cache from growing when your shrinker
>> gets called will just break the equal-memory pressure the core mm uses to
>> rebalance between all caches when workload changes. In i915 we let
>> everything grow without artificial bounds and only rely upon the shrinker
>> callbacks to ensure we don't consume more than our fair share of available
>> memory overall.
>> -Daniel
>
> Now when you bring i915 memory usage up, Daniel,
> I can't refrain from bringing up the old user-space unreclaimable kernel
> memory issue, for which gem open is a good example ;) Each time
> user-space opens a gem handle, some un-reclaimable kernel memory is
> allocated, for which there is no accounting, so theoretically I think a
> user can bring a system to unusability this way.
>
> Typically there are various limits on unreclaimable objects like this,
> like open file descriptors, and IIRC the kernel even has an internal
> limit on the number of struct files you initialize, based on the
> available system memory, so dma-buf / prime should already have some
> sort of protection.

gem->filp points to a fresh shmem file, which itself is limited like
dmabuf. That should suffice, right?

Thanks
David


GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Thomas Hellstrom
On 08/13/2014 12:42 PM, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>>> From: Jérôme Glisse 
>>>
>>> When experiencing memory pressure we want to minimize the pool size so
>>> that memory we just shrank is not immediately added back.
>>>
>>> This halves the maximum pool size for each device each time the pool
>>> has to shrink. The limit is raised again if the next allocation happens
>>> more than one second after the last shrink. The one-second delay is
>>> obviously an arbitrary choice.
>> Jérôme,
>>
>> I don't like this patch. It adds extra complexity and its usefulness is
>> highly questionable.
>> There are a number of caches in the system, and if all of them added
>> some sort of voluntary shrink heuristics like this, we'd end up with
>> impossible-to-debug unpredictable performance issues.
>>
>> We should let the memory subsystem decide when to reclaim pages from
>> caches and what caches to reclaim them from.
> Yeah, artificially limiting your cache from growing when your shrinker
> gets called will just break the equal-memory pressure the core mm uses to
> rebalance between all caches when workload changes. In i915 we let
> everything grow without artificial bounds and only rely upon the shrinker
> callbacks to ensure we don't consume more than our fair share of available
> memory overall.
> -Daniel

Now when you bring i915 memory usage up, Daniel,
I can't refrain from bringing up the old user-space unreclaimable kernel
memory issue, for which gem open is a good example ;) Each time
user-space opens a gem handle, some un-reclaimable kernel memory is
allocated, for which there is no accounting, so theoretically I think a
user can bring a system to unusability this way.

Typically there are various limits on unreclaimable objects like this,
like open file descriptors, and IIRC the kernel even has an internal
limit on the number of struct files you initialize, based on the
available system memory, so dma-buf / prime should already have some
sort of protection.

/Thomas


>> /Thomas
>>> Signed-off-by: Jérôme Glisse 
>>> Cc: Mario Kleiner 
>>> Cc: Michel Dänzer 
>>> Cc: Thomas Hellstrom 
>>> Cc: Konrad Rzeszutek Wilk 
>>> ---
>>>  drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 
>>> +---
>>>  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++--
>>>  2 files changed, 53 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
>>> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> index 09874d6..ab41adf 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> @@ -68,6 +68,8 @@
>>>   * @list: Pool of free uc/wc pages for fast reuse.
>>>   * @gfp_flags: Flags to pass for alloc_page.
>>>   * @npages: Number of pages in pool.
>>> + * @cur_max_size: Current maximum size for the pool.
>>> + * @shrink_timeout: Timeout for pool maximum size restriction.
>>>   */
>>>  struct ttm_page_pool {
>>> spinlock_t  lock;
>>> @@ -76,6 +78,8 @@ struct ttm_page_pool {
>>> gfp_t   gfp_flags;
>>> unsignednpages;
>>> char*name;
>>> +   unsignedcur_max_size;
>>> +   unsigned long   last_shrink;
>>> unsigned long   nfrees;
>>> unsigned long   nrefills;
>>>  };
>>> @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct 
>>> ttm_page_pool *pool,
>>> pool->nfrees += freed_pages;
>>>  }
>>>  
>>> +static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool)
>>> +{
>>> +   if (time_before(jiffies, pool->shrink_timeout))
>>> +   return;
>>> +   /* In case we reached zero bounce back to 512 pages. */
>>> +   pool->cur_max_size = max(pool->cur_max_size << 1, 512);
>>> +   pool->cur_max_size = min(pool->cur_max_size,
>>> +_manager->options.max_size);
>>> +}
>>> +
>>>  /**
>>>   * Free pages from pool.
>>>   *
>>> @@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct 
>>> shrink_control *sc)
>>> if (shrink_pages == 0)
>>> break;
>>> pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
>>> +   /* No matter what make sure the pool do not grow in next 
>>> second. */
>>> +   pool->cur_max_size = pool->cur_max_size >> 1;
>>> +   pool->shrink_timeout = jiffies + HZ;
>>> shrink_pages = ttm_page_pool_free(pool, nr_free,
>>>   sc->gfp_mask);
>>> freed += nr_free - shrink_pages;
>>> @@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, 
>>> unsigned npages, int flags,
>>> }
>>> /* Check that we don't go over the pool limit */
>>> npages = 0;
>>> -   if (pool->npages > _manager->options.max_size) {
>>> -   npages = pool->npages - _manager->options.max_

drm/nv50-/disp: audit and version DAC_LOAD method

2014-08-13 Thread Dan Carpenter
Hello Ben Skeggs,

The patch c4abd3178e11: "drm/nv50-/disp: audit and version DAC_LOAD
method" from Aug 10, 2014, leads to the following static checker
warning:

drivers/gpu/drm/nouveau/core/engine/disp/dacnv50.c:78 nv50_dac_sense()
warn: 0xfff0 is larger than 16 bits

drivers/gpu/drm/nouveau/core/engine/disp/dacnv50.c
65  nv50_dac_sense(NV50_DISP_MTHD_V1)
66  {
67  union {
68  struct nv50_disp_dac_load_v0 v0;
69  } *args = data;
70  const u32 doff = outp->or * 0x800;
71  u32 loadval;
72  int ret;
73  
74  nv_ioctl(object, "disp dac load size %d\n", size);
75  if (nvif_unpack(args->v0, 0, 0, false)) {
76  nv_ioctl(object, "disp dac load vers %d data %08x\n",
77   args->v0.version, args->v0.data);
78  if (args->v0.data & 0xfff0)
^^
This condition can't be true.  It's not clear what was intended.

79  return -EINVAL;
80  loadval = args->v0.data;
81  } else
82  return ret;


regards,
dan carpenter


[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

Michel Dänzer  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |NOTABUG

--- Comment #9 from Michel Dänzer  ---
The tearing is due to xcompmgr; please use a compositing manager which can
unredirect fullscreen windows.



[Bug 82517] [RADEONSI,VDPAU] SIGSEGV in map_msg_fb_buf called from ruvd_destroy, when closing a Tab with accelerated video player

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82517

--- Comment #7 from Christian König  ---
Created attachment 104563
  --> https://bugs.freedesktop.org/attachment.cgi?id=104563&action=edit
Possible fix.

That patch does the trick for me, please test.



[PATCH] Documentation: DocBook: Rename drm_stub.c to drm_drv.c cause 'make xmldocs' failed

2014-08-13 Thread Randy Dunlap
On 08/08/14 12:16, Masanari Iida wrote:
> This patch fixes the 'make xmldocs' failure on Linus's tree and
> linux-next as of Aug 8, 2014.
> 
> When the drm merge for 3.17-rc1 happened, a file was renamed from
> drm_stub.c to drm_drv.c,
> but Documentation/DocBook/drm.tmpl still referenced the old file name.
> 
> Signed-off-by: Masanari Iida 

Applied to my patch queue.  Thanks.

> ---
>  Documentation/DocBook/drm.tmpl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/DocBook/drm.tmpl b/Documentation/DocBook/drm.tmpl
> index 1d3756d..bacefc5 100644
> --- a/Documentation/DocBook/drm.tmpl
> +++ b/Documentation/DocBook/drm.tmpl
> @@ -315,7 +315,7 @@ char *date;
>  drm_dev_unregister() followed by a call to
>  drm_dev_unref().
>
> -!Edrivers/gpu/drm/drm_stub.c
> +!Edrivers/gpu/drm/drm_drv.c
>  
>  
>Driver Load
> 


-- 
~Randy


[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #17 from dimangi42 at gmail.com ---
(In reply to comment #16)
> What version of mesa are you using?  Does changing the mesa version help?

ii  libegl1-mesa:amd64   10.1.3-0ubuntu0.1
amd64free implementation of the EGL API -- runtime
ii  libegl1-mesa-drivers:amd64   10.1.3-0ubuntu0.1
amd64free implementation of the EGL API -- hardware drivers
ii  libgl1-mesa-dri:amd6410.1.3-0ubuntu0.1
amd64free implementation of the OpenGL API -- DRI modules
ii  libgl1-mesa-glx:amd6410.1.3-0ubuntu0.1
amd64free implementation of the OpenGL API -- GLX runtime
ii  libglapi-mesa:amd64  10.1.3-0ubuntu0.1
amd64free implementation of the GL API -- shared library
ii  libglu1-mesa:amd64   9.0.0-2  
amd64Mesa OpenGL utility library (GLU)
ii  libopenvg1-mesa:amd6410.1.3-0ubuntu0.1
amd64free implementation of the OpenVG API -- runtime
ii  libwayland-egl1-mesa:amd64   10.1.3-0ubuntu0.1
amd64implementation of the Wayland EGL platform -- runtime
ii  mesa-utils   8.1.0-2  
amd64Miscellaneous Mesa GL utilities
ii  mesa-vdpau-drivers:amd64 10.1.3-0ubuntu0.1
amd64Mesa VDPAU video acceleration drivers

I will try to downgrade later today and report back.



Fence, timeline and android sync points

2014-08-13 Thread Jerome Glisse
On Wed, Aug 13, 2014 at 05:54:20PM +0200, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 09:36:04AM -0400, Jerome Glisse wrote:
> > On Wed, Aug 13, 2014 at 10:28:22AM +0200, Daniel Vetter wrote:
> > > On Tue, Aug 12, 2014 at 06:13:41PM -0400, Jerome Glisse wrote:
> > > > Hi,
> > > > 
> > > > So I went over the whole fence and sync point stuff as it's becoming a
> > > > pressing issue. I think we first need to agree on what problem we want
> > > > to solve and what the requirements are for solving it.
> > > > 
> > > > Problem :
> > > >   Explicit synchronization between different hardware blocks over a
> > > > buffer object.
> > > > 
> > > > Requirements :
> > > >   Share common infrastructure.
> > > >   Allow optimal hardware command stream scheduling across hardware
> > > > blocks.
> > > >   Allow android sync points to be implemented on top of it.
> > > >   Handle/acknowledge exceptions (like the good old gpu lockup).
> > > >   Minimize driver changes.
> > > > 
> > > > Glossary :
> > > >   hardware timeline: timeline bound to a specific hardware block.
> > > >   pipeline timeline: timeline bound to a userspace rendering pipeline;
> > > >  each point on that timeline can be a composite of
> > > >  several different hardware pipeline points.
> > > >   pipeline: abstract object representing a userspace application's
> > > > graphics pipeline and each of the application's
> > > > graphics operations.
> > > >   fence: specific point in a timeline where synchronization needs to
> > > > happen.
> > > > 
> > > > 
> > > > So now, I believe the current include/linux/fence.h implementation
> > > > misses the objective by confusing hardware and pipeline timelines and
> > > > by bolting fences to buffer objects, while what is really needed is a
> > > > true and proper timeline for both hardware and pipeline. But before
> > > > going further down that road, let me look at things and explain how I
> > > > see them.
> > > 
> > > fences can be used free-standing and no one forces you to integrate them
> > > with buffers. We actually plan to go this way with the intel svm stuff.
> > > Ofc for dma-buf the plan is to synchronize using such fences, but that's
> > > somewhat orthogonal I think. At least you only talk about fences and
> > > timeline and not dma-buf here.
> > >  
> > > > The current ttm fence has one sole purpose: allowing synchronization
> > > > for buffer object moves, even though some drivers like radeon slightly
> > > > abuse it and use it for things like lockup detection.
> > > > 
> > > > The new fence wants to expose an API that would allow some
> > > > implementation of a timeline. For that it introduces callbacks and
> > > > some hard requirements on what the driver has to expose:
> > > >   enable_signaling
> > > >   [signaled]
> > > >   wait
> > > > 
> > > > Each of those has to do work inside the driver to which the fence
> > > > belongs, and each of them can be called from more or less unexpected
> > > > contexts (with restrictions, like outside irq). So we end up with
> > > > things like:
> > > > 
> > > >  Process 1  Process 2   Process 3
> > > >  I_A_schedule(fence0)
> > > > CI_A_F_B_signaled(fence0)
> > > > I_A_signal(fence0)
> > > > 
> > > > CI_B_F_A_callback(fence0)
> > > > CI_A_F_B_wait(fence0)
> > > > Lexique:
> > > > I_x  in driver x (I_A == in driver A)
> > > > CI_x_F_y call in driver X from driver Y (CI_A_F_B call in driver A from 
> > > > driver B)
> > > > 
> > > > So this is a happy mess: everyone calls everyone, and this is bound
> > > > to get messy. Yes, I know there are all kinds of requirements on what
> > > > happens once a fence is signaled. But those requirements only look
> > > > like they are trying to atone for any mess that can happen from the
> > > > whole callback dance.
> > > > 
> > > > While I too was seduced by the whole callback idea a long time ago, I
> > > > think it is a highly dangerous path to take, where the combinatorics
> > > > of what could happen are bound to explode with the increase in the
> > > > number of players.
> > > > 
> > > > 
> > > > So now back to how to solve the problem we are trying to address.
> > > > First I want to make an observation: almost all GPUs that exist today
> > > > have a command ring on which userspace command buffers are executed,
> > > > and inside the command ring you can do something like:
> > > > 
> > > >   if (condition) execute_command_buffer else skip_command_buffer
> > > > 
> > > > where condition is a simple expression (memory_address cop value),
> > > > with cop one of the generic comparisons (==, <, >, <=, >=). I think
> > > > it is a safe assumption
> > > 

[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #16 from Alex Deucher  ---
What version of mesa are you using?  Does changing the mesa version help?



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

--- Comment #8 from Kertesz Laszlo  ---
(In reply to comment #5)
> Created attachment 104559 [details]
> Glxinfo
> 
> My current setup is: SLiM as login manager, Openbox as window manager, feh
> to show the desktop wallpaper, Xcompmgr for transparency, xfce4-panel for
> the panel.
> My autostart for Openbox is like this:
> feh --bg-scale /home/davide/Immagini/background.jpg &
> 
> if which xfce4-panel >/dev/null 2>&1; then
>   (sleep 2 && xcompmgr) &
>   (sleep 2 && xfce4-panel) &
> fi
> 
> I'm using linux 3.15.8, Mesa 10.2.5 and Xorg 1.16 on archlinux.

xcompmgr is a software compositing manager, if I'm not mistaken. Try stopping
it before starting your games.



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

dawide2211 at gmail.com changed:

   What|Removed |Added

 Attachment #104561|Corg.0.log  |Xorg.0.log
description||



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

--- Comment #7 from dawide2211 at gmail.com ---
Created attachment 104561
  --> https://bugs.freedesktop.org/attachment.cgi?id=104561&action=edit
Corg.0.log



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

--- Comment #6 from dawide2211 at gmail.com ---
Created attachment 104560
  --> https://bugs.freedesktop.org/attachment.cgi?id=104560&action=edit
dmesg



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

--- Comment #5 from dawide2211 at gmail.com ---
Created attachment 104559
  --> https://bugs.freedesktop.org/attachment.cgi?id=104559&action=edit
Glxinfo

My current setup is: SLiM as login manager, Openbox as window manager, feh to
show the desktop wallpaper, Xcompmgr for transparency, xfce4-panel for the
panel.
My autostart for Openbox is like this:
feh --bg-scale /home/davide/Immagini/background.jpg &

if which xfce4-panel >/dev/null 2>&1; then
  (sleep 2 && xcompmgr) &
  (sleep 2 && xfce4-panel) &
fi

I'm using linux 3.15.8, Mesa 10.2.5 and Xorg 1.16 on archlinux.



[PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool

2014-08-13 Thread Daniel Vetter
On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
> 
> On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> > From: Jérôme Glisse 
> >
> > When experiencing memory pressure we want to minimize the pool size so
> > that memory we just shrank is not immediately added back.
> >
> > This halves the maximum pool size for each device each time the pool
> > has to shrink. The limit is raised again if the next allocation happens
> > more than one second after the last shrink. The one-second delay is
> > obviously an arbitrary choice.
> 
> Jérôme,
> 
> I don't like this patch. It adds extra complexity and its usefulness is
> highly questionable.
> There are a number of caches in the system, and if all of them added
> some sort of voluntary shrink heuristics like this, we'd end up with
> impossible-to-debug unpredictable performance issues.
> 
> We should let the memory subsystem decide when to reclaim pages from
> caches and what caches to reclaim them from.

Yeah, artificially limiting your cache from growing when your shrinker
gets called will just break the equal-memory pressure the core mm uses to
rebalance between all caches when workload changes. In i915 we let
everything grow without artificial bounds and only rely upon the shrinker
callbacks to ensure we don't consume more than our fair share of available
memory overall.
-Daniel
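For reference, the backoff arithmetic the patch implements (halve the cap on shrink, grow it back after a one-second window, bouncing back from zero to 512 pages) can be modeled in plain userspace C; the names below are illustrative, not the kernel's:

```c
#include <assert.h>

#define POOL_MAX 4096u

/* Userspace model of the patch's backoff arithmetic: the pool cap is
 * halved on every shrink and doubled again once the timeout has passed,
 * clamped to [512, POOL_MAX]. */
struct pool_model {
	unsigned cur_max_size;
	unsigned long shrink_deadline; /* models pool->shrink_timeout (jiffies) */
};

static void model_shrink(struct pool_model *p, unsigned long now, unsigned long hz)
{
	p->cur_max_size >>= 1;         /* halve the cap... */
	p->shrink_deadline = now + hz; /* ...and freeze it for one second */
}

static void model_alloc(struct pool_model *p, unsigned long now)
{
	if (now < p->shrink_deadline)
		return;                /* still inside the backoff window */
	unsigned grown = p->cur_max_size << 1;
	if (grown < 512u)
		grown = 512u;          /* bounce back from zero */
	p->cur_max_size = grown < POOL_MAX ? grown : POOL_MAX;
}
```

Thomas's objection, in this model's terms, is that `cur_max_size` second-guesses the decision the core mm already makes through the shrinker callbacks.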

> 
> /Thomas
> >
> > Signed-off-by: Jérôme Glisse 
> > Cc: Mario Kleiner 
> > Cc: Michel Dänzer 
> > Cc: Thomas Hellstrom 
> > Cc: Konrad Rzeszutek Wilk 
> > ---
> >  drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 
> > +---
> >  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++--
> >  2 files changed, 53 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
> > b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> > index 09874d6..ab41adf 100644
> > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> > @@ -68,6 +68,8 @@
> >   * @list: Pool of free uc/wc pages for fast reuse.
> >   * @gfp_flags: Flags to pass for alloc_page.
> >   * @npages: Number of pages in pool.
> > + * @cur_max_size: Current maximum size for the pool.
> > + * @shrink_timeout: Timeout for pool maximum size restriction.
> >   */
> >  struct ttm_page_pool {
> > spinlock_t  lock;
> > @@ -76,6 +78,8 @@ struct ttm_page_pool {
> > gfp_t           gfp_flags;
> > unsigned        npages;
> > char            *name;
> > +   unsigned        cur_max_size;
> > +   unsigned long   shrink_timeout;
> > unsigned long   nfrees;
> > unsigned long   nrefills;
> >  };
> > @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct 
> > ttm_page_pool *pool,
> > pool->nfrees += freed_pages;
> >  }
> >  
> > +static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool)
> > +{
> > +   if (time_before(jiffies, pool->shrink_timeout))
> > +   return;
> > +   /* In case we reached zero bounce back to 512 pages. */
> > +   pool->cur_max_size = max(pool->cur_max_size << 1, 512u);
> > +   pool->cur_max_size = min(pool->cur_max_size,
> > +_manager->options.max_size);
> > +}
> > +
> >  /**
> >   * Free pages from pool.
> >   *
> > @@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct 
> > shrink_control *sc)
> > if (shrink_pages == 0)
> > break;
> > pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
> > +   /* No matter what make sure the pool do not grow in next 
> > second. */
> > +   pool->cur_max_size = pool->cur_max_size >> 1;
> > +   pool->shrink_timeout = jiffies + HZ;
> > shrink_pages = ttm_page_pool_free(pool, nr_free,
> >   sc->gfp_mask);
> > freed += nr_free - shrink_pages;
> > @@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, 
> > unsigned npages, int flags,
> > }
> > /* Check that we don't go over the pool limit */
> > npages = 0;
> > -   if (pool->npages > _manager->options.max_size) {
> > -   npages = pool->npages - _manager->options.max_size;
> > -   /* free at least NUM_PAGES_TO_ALLOC number of pages
> > -* to reduce calls to set_memory_wb */
> > -   if (npages < NUM_PAGES_TO_ALLOC)
> > -   npages = NUM_PAGES_TO_ALLOC;
> > -   }
> > +   /*
> > +* Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to
> > +* set_memory_wb.
> > +*/
> > +   if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
> > +   npages = pool->npages - pool->cur_max_size;
> > spin_unlock_irqrestore(&pool->lock, irq_flags);
> > if (npages)
> > ttm_page_pool_free(pool, npages, GFP_KERNEL);
> > @@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned 
> > npages, int flags

GEM memory DOS (WAS Re: [PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool)

2014-08-13 Thread Alex Deucher
On Wed, Aug 13, 2014 at 12:24 PM, Daniel Vetter  wrote:
> On Wed, Aug 13, 2014 at 05:13:56PM +0200, Thomas Hellstrom wrote:
>> On 08/13/2014 03:01 PM, Daniel Vetter wrote:
>> > On Wed, Aug 13, 2014 at 02:35:52PM +0200, Thomas Hellstrom wrote:
>> >> On 08/13/2014 12:42 PM, Daniel Vetter wrote:
>> >>> On Wed, Aug 13, 2014 at 11:06:25AM +0200, Thomas Hellstrom wrote:
>>  On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
>> > From: Jérôme Glisse 
>> >
>> > When experiencing memory pressure we want to minimize the pool size so
>> > that memory we just shrunk is not added back again right away.
>> >
>> > This divides the maximum pool size for each device by 2 each time the
>> > pool has to shrink. The limit is bumped again if the next allocation
>> > happens more than one second after the last shrink. The one second
>> > delay is obviously an arbitrary choice.
>>  Jérôme,
>> 
>>  I don't like this patch. It adds extra complexity and its usefulness is
>>  highly questionable.
>>  There are a number of caches in the system, and if all of them added
>>  some sort of voluntary shrink heuristics like this, we'd end up with
>>  impossible-to-debug unpredictable performance issues.
>> 
>>  We should let the memory subsystem decide when to reclaim pages from
>>  caches and what caches to reclaim them from.
>> >>> Yeah, artificially limiting your cache from growing when your shrinker
>> >>> gets called will just break the equal-memory pressure the core mm uses to
>> >>> rebalance between all caches when workload changes. In i915 we let
>> >>> everything grow without artificial bounds and only rely upon the shrinker
>> >>> callbacks to ensure we don't consume more than our fair share of 
>> >>> available
>> >>> memory overall.
>> >>> -Daniel
>> >> Now when you bring i915 memory usage up, Daniel,
>> >> I can't refrain from bringing up the old user-space unreclaimable kernel
>> >> memory issue, for which gem open is a good example ;) Each time
>> >> user-space opens a gem handle, some un-reclaimable kernel memory is
>> >> allocated, for which there is no accounting, so theoretically I think a
>> >> user can bring a system to unusability this way.
>> >>
>> >> Typically there are various limits on unreclaimable objects like this,
>> >> like open file descriptors, and IIRC the kernel even has an internal
>> >> limit on the number of struct files you initialize, based on the
>> >> available system memory, so dma-buf / prime should already have some
>> >> sort of protection.
>> > Oh yeah, we have zero cgroups limits or similar stuff for gem allocations,
>> > so there's not really a way to isolate gpu memory usage in a sane way for
>> > specific processes. But there's also zero limits on actual gpu usage
>> > itself (timeslices or whatever) so I guess no one asked for this yet.
>>
>> In its simplest form (like in TTM if correctly implemented by drivers)
>> this type of accounting stops non-privileged malicious GPU-users from
>> exhausting all system physical memory causing grief for other kernel
>> systems but not from causing grief for other GPU users. I think that's
>> the minimum level that's intended, for example, also for the struct
>> file accounting.
>
> I think in i915 we're fairly close on that minimal standard - interactions
> with shrinkers and oom logic work decently. It starts to fall apart though
> when we've actually run out of memory - if the real memory hog is a gpu
> process the oom killer won't notice all that memory since it's not
> accounted against processes correctly.
>
> I don't agree that gpu process should be punished in general compared to
> other subsystems in the kernel. If the user wants to use 90% of all memory
> for gpu tasks then I want to make that possible, even if it means that
> everything else thrashes horribly. And as long as the system recovers and
> rebalances after that gpu memory hog is gone ofc. Iirc ttm currently has a
> fairly arbitrary (tunable) setting to limit system memory consumption, but
> I might be wrong on that.

Yes, it currently limits you to half of memory, but at least we would
like to make it tunable, since there are a lot of use cases where the
user wants to use 90% of memory for GPU tasks at the expense of
everything else.

Alex

>
>> > My comment really was about balancing mm users under the assumption that
>> > they're all unlimited.
>>
>> Yeah, sorry for stealing the thread. I usually bring this up now and
>> again but nowadays with an exponential backoff.
>
> Oh I'd love to see some cgroups or similar tracking so that server users
> could set sane per-process/user/task limits on how much memory/gpu time
> that group is allowed to consume. It's just that I haven't seen real
> demand for this and so couldn't make the time available to implement it.
> So thus far my goal is to make everything work nicely for unlimited tasks
> right up to the point where the OOM killer needs to step in

[Bug 82517] [RADEONSI,VDPAU] SIGSEGV in map_msg_fb_buf called from ruvd_destroy, when closing a Tab with accelerated video player

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82517

Christian König  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #6 from Christian König  ---
(In reply to comment #5)
> (Or maybe you can reproduce it on your end? Just pick one recored broadcast
> from Twitch, e.g.
> <http://www.twitch.tv/carbotanimations/b/557380266?t=1h01m> and it should
> segfault)

Took me a while, but I was able to reproduce the problem and take a complete
log.

> Another observation I made: if you run a YouTube video in the backgrund in
> parallel to e.g. opening
> <http://www.twitch.tv/carbotanimations/b/557380266?t=1h01m>, then suddenly
> Flash is not crashing. I traced VDPAU with a YT video running in the
> background and attached the file to this post.

No wonder that it crashes: what Flash does here is destroy the device first
and then try to destroy the decoder, which depends on the device:

vdp_device_destroy(1)
vdp_decoder_destroy(2)

Sounds like a bug in Flash to me, but on the other hand the VDPAU state tracker
shouldn't crash; it should just return an error here.

Going to take a look at the code to see if we can't fix that somehow.
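A defensive check along those lines can be sketched in plain C; the structs and names below are schematic stand-ins for the state tracker's objects (a userspace model, not Mesa's actual code):

```c
#include <assert.h>
#include <stddef.h>

typedef int VdpStatus;
enum { VDP_STATUS_OK = 0, VDP_STATUS_INVALID_HANDLE = 1 };

struct device  { int alive; };            /* stand-in for the VDPAU device */
struct decoder { struct device *dev; };   /* decoder depends on the device */

/* Model of the suggested fix: destroying a decoder whose device was
 * already destroyed returns an error instead of dereferencing freed
 * state and crashing. */
static VdpStatus decoder_destroy(struct decoder *dec)
{
	if (!dec || !dec->dev || !dec->dev->alive)
		return VDP_STATUS_INVALID_HANDLE;
	dec->dev = NULL; /* real code would free decoder resources here */
	return VDP_STATUS_OK;
}
```

With this shape, Flash's backwards `vdp_device_destroy(1)` / `vdp_decoder_destroy(2)` ordering would get an error code back rather than a SIGSEGV.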



Fence, timeline and android sync points

2014-08-13 Thread Jerome Glisse
On Wed, Aug 13, 2014 at 04:08:14PM +0200, Christian König wrote:
> >The whole issue is that today cs ioctl assume implied synchronization. So 
> >this
> >can not change, so for now anything that goes through cs ioctl would need to
> >use an implied timeline and have all ring that use common buffer synchronize
> >on it. As long as those ring use different buffer there is no need for sync.
> Exactly my thoughts.
> 
> >Buffer object are what links hw timeline.
> A couple of people at AMD have a problem with that and I'm currently working
> full time on a solution. But solving this and keeping 100% backward
> compatibility at the same time is not an easy task.

Let me rephrase: with the current cs ioctl, synchronization for every buffer
that appears on different hw rings is mandatory, and there is no way to fix that.

That being said, one can imagine a single buffer where one engine works on a
region of it and another hw block on another, non-overlapping region, in which
case there is no need for synchronization between those different hw blocks
(like multi-GPU, each rendering one half of the screen). But to properly do
such a thing you need to expose a timeline or something like that to userspace
and have user space emit syncs on this timeline. So something like:

cs_ioctl(timeline, cs) {return csno + hwtimeline_id;}
timeline_sync(nsync, seqno[], hwtimeline_id[])

When you schedule something using a new ioctl that just takes a timeline as an
extra parameter, you add no synchronization to the timeline; you assume that
user space will call timeline_sync, which inserts a synchronization point on
the timeline. So you can schedule a bunch of cs on different hw blocks; user
space keeps track of the last emitted cs seqno and its associated hw timeline,
and when it wants to synchronize it calls the timeline sync, and any new cs
ioctl on that timeline will have to wait before being able to schedule.

So really I do not see how to fix that properly without a new cs ioctl that
just takes an extra timeline as a parameter (well, in the case of radeon we
can add a timeline chunk to the cs ioctl).
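Assuming the ioctls sketched above existed, the userspace flow could be modeled like this in plain C; `cs_submit` and `timeline_sync` are stand-ins for the proposed interfaces, not an existing kernel API:

```c
#include <assert.h>

enum { HW_GFX, HW_DMA, NUM_HW };

/* Per-hardware-timeline submission counters (stand-in for per-ring
 * sequence numbers maintained by the kernel). */
static unsigned next_seqno[NUM_HW];

/* Stand-in for the proposed cs ioctl with a timeline parameter:
 * schedules a command stream on one hw ring with no implied
 * synchronization and returns its seqno on that ring. */
static unsigned cs_submit(int hw_timeline)
{
	return ++next_seqno[hw_timeline];
}

/* Stand-in for timeline_sync(nsync, seqno[], hwtimeline_id[]): records
 * an explicit sync point so later submissions on the timeline wait for
 * the given (hw ring, seqno) pairs. */
struct sync_point { unsigned seqno[NUM_HW]; };

static void timeline_sync(struct sync_point *sp, int nsync,
			  const int hw[], const unsigned seqno[])
{
	for (int i = 0; i < nsync; i++)
		if (seqno[i] > sp->seqno[hw[i]])
			sp->seqno[hw[i]] = seqno[i];
}
```

The point of the design is visible in the model: submissions carry no synchronization of their own; ordering only appears where user space explicitly plants a sync point.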

> 
> >Of course there might be way to be more flexible if timeline are expose to
> >userspace and userspace can create several of them for a single process.
> Concurrent execution is mostly used for temporary things e.g. copying a
> result to a userspace buffer while VCE is decoding into the ring buffer at a
> different location for example. Creating an extra timeline just to tell the
> kernel that two commands are allowed to run in parallel sounds like to much
> overhead to me.

It was never my intention to create different timelines; like I said above,
when scheduling with an explicit timeline, each time you schedule there is no
synchronization whatsoever. The only time an engine has to wait is when user
space emits an explicit sync point, as said above.

Allowing multiple timelines per process is more for cases where you have a
process working on two distinct problems with no interdependency. Hence one
timeline for each of those tasks, but inside each task you can schedule
things concurrently as said above.

> 
> Cheers,
> Christian.
> 
> On 13.08.2014 at 15:41, Jerome Glisse wrote:
> >On Wed, Aug 13, 2014 at 09:59:26AM +0200, Christian König wrote:
> >>Hi Jerome,
> >>
> >>first of all that finally sounds like somebody starts to draw the whole
> >>picture for me.
> >>
> >>So far all I have seen was a bunch of specialized requirements and some not
> >>so obvious design decisions based on those requirements.
> >>
> >>So thanks a lot for finally summarizing the requirements from a top above
> >>view and I perfectly agree with your analysis of the current fence design
> >>and the downsides of that API.
> >>
> >>Apart from that I also have some comments / requirements that hopefully can
> >>be taken into account as well:
> >>
> >>>   pipeline timeline: timeline bound to a userspace rendering pipeline, 
> >>> each
> >>>  point on that timeline can be a composite of several
> >>>  different hardware pipeline point.
> >>>   pipeline: abstract object representing userspace application graphic 
> >>> pipeline
> >>> of each of the application graphic operations.
> >>In the long term a requirement for the driver for AMD GFX hardware is that
> >>instead of a fixed pipeline timeline we need a bit more flexible model where
> >>concurrent execution on different hardware engines is possible as well.
> >>
> >>So the requirement is that you can do things like submitting a 3D job A, a
> >>DMA job B, a VCE job C and another 3D job D that are executed like this:
> >> A
> >>/  \
> >>   B  C
> >>\  /
> >> D
> >>
> >>(Let's just hope that looks as good on your mail client as it looked for
> >>me).
> >My thinking of a hw timeline is that a gpu like amd or nvidia would have
> >several different hw timelines. They are per block/engine, so one for the
> >dma ring, one for gfx, one for vce, 
> >
> >>My current thinking is that we avoid ha

[Bug 82186] [r600g] BARTS GPU lockup with minecraft shaders

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82186

--- Comment #3 from EoD  ---
Mesa 10.2.4 with kernels 3.16.0 and 3.15.9 gives about the same errors as the
attached dmesg file.

Unfortunately I cannot test current mesa git
(52901ec2615761390f5ef97b11516dae330d27d1) as there are some runtime linking
problems on my system.



[PATCH 3/3] drm/ttm: under memory pressure minimize the size of memory pool

2014-08-13 Thread Thomas Hellstrom

On 08/13/2014 05:52 AM, Jérôme Glisse wrote:
> From: Jérôme Glisse 
>
> When experiencing memory pressure we want to minimize the pool size so
> that memory we just shrunk is not added back again right away.
>
> This divides the maximum pool size for each device by 2 each time the
> pool has to shrink. The limit is bumped again if the next allocation
> happens more than one second after the last shrink. The one second
> delay is obviously an arbitrary choice.

Jérôme,

I don't like this patch. It adds extra complexity and its usefulness is
highly questionable.
There are a number of caches in the system, and if all of them added
some sort of voluntary shrink heuristics like this, we'd end up with
impossible-to-debug unpredictable performance issues.

We should let the memory subsystem decide when to reclaim pages from
caches and what caches to reclaim them from.

/Thomas
>
> Signed-off-by: Jérôme Glisse 
> Cc: Mario Kleiner 
> Cc: Michel Dänzer 
> Cc: Thomas Hellstrom 
> Cc: Konrad Rzeszutek Wilk 
> ---
>  drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 
> +---
>  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 27 ++--
>  2 files changed, 53 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index 09874d6..ab41adf 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -68,6 +68,8 @@
>   * @list: Pool of free uc/wc pages for fast reuse.
>   * @gfp_flags: Flags to pass for alloc_page.
>   * @npages: Number of pages in pool.
> + * @cur_max_size: Current maximum size for the pool.
> + * @shrink_timeout: Timeout for pool maximum size restriction.
>   */
>  struct ttm_page_pool {
>   spinlock_t  lock;
> @@ -76,6 +78,8 @@ struct ttm_page_pool {
>   gfp_t           gfp_flags;
>   unsigned        npages;
>   char            *name;
> + unsigned        cur_max_size;
> + unsigned long   shrink_timeout;
>   unsigned long   nfrees;
>   unsigned long   nrefills;
>  };
> @@ -289,6 +293,16 @@ static void ttm_pool_update_free_locked(struct 
> ttm_page_pool *pool,
>   pool->nfrees += freed_pages;
>  }
>  
> +static inline void ttm_pool_update_max_size(struct ttm_page_pool *pool)
> +{
> + if (time_before(jiffies, pool->shrink_timeout))
> + return;
> + /* In case we reached zero bounce back to 512 pages. */
> + pool->cur_max_size = max(pool->cur_max_size << 1, 512u);
> + pool->cur_max_size = min(pool->cur_max_size,
> +  _manager->options.max_size);
> +}
> +
>  /**
>   * Free pages from pool.
>   *
> @@ -407,6 +421,9 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct 
> shrink_control *sc)
>   if (shrink_pages == 0)
>   break;
>   pool = &_manager->pools[(i + pool_offset)%NUM_POOLS];
> + /* No matter what make sure the pool do not grow in next 
> second. */
> + pool->cur_max_size = pool->cur_max_size >> 1;
> + pool->shrink_timeout = jiffies + HZ;
>   shrink_pages = ttm_page_pool_free(pool, nr_free,
> sc->gfp_mask);
>   freed += nr_free - shrink_pages;
> @@ -701,13 +718,12 @@ static void ttm_put_pages(struct page **pages, unsigned 
> npages, int flags,
>   }
>   /* Check that we don't go over the pool limit */
>   npages = 0;
> - if (pool->npages > _manager->options.max_size) {
> - npages = pool->npages - _manager->options.max_size;
> - /* free at least NUM_PAGES_TO_ALLOC number of pages
> -  * to reduce calls to set_memory_wb */
> - if (npages < NUM_PAGES_TO_ALLOC)
> - npages = NUM_PAGES_TO_ALLOC;
> - }
> + /*
> +  * Free at least NUM_PAGES_TO_ALLOC number of pages to reduce calls to
> +  * set_memory_wb.
> +  */
> + if (pool->npages > (pool->cur_max_size + NUM_PAGES_TO_ALLOC))
> + npages = pool->npages - pool->cur_max_size;
>   spin_unlock_irqrestore(&pool->lock, irq_flags);
>   if (npages)
>   ttm_page_pool_free(pool, npages, GFP_KERNEL);
> @@ -751,6 +767,9 @@ static int ttm_get_pages(struct page **pages, unsigned 
> npages, int flags,
>   return 0;
>   }
>  
> + /* Update pool size in case shrinker limited it. */
> + ttm_pool_update_max_size(pool);
> +
>   /* combine zero flag to pool flags */
>   gfp_flags |= pool->gfp_flags;
>  
> @@ -803,6 +822,8 @@ static void ttm_page_pool_init_locked(struct 
> ttm_page_pool *pool, gfp_t flags,
>   pool->npages = pool->nfrees = 0;
>   pool->gfp_flags = flags;
>   pool->name = name;
> + pool->cur_max_size = _manager->options.max_size;
> + pool->shrink_timeout = jiffies;
>  }
>  
>  int ttm_page_alloc_init(struct ttm_mem_glob

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-13 Thread Michel Dänzer
On 12.08.2014 00:17, Jerome Glisse wrote:
> On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote:
>> On 08/10/2014 08:02 PM, Mario Kleiner wrote:
>>> On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:
 On 08/10/2014 05:11 AM, Mario Kleiner wrote:
>
> The other problem is that probably TTM does not reuse pages from the
> DMA pool. If i trace the __ttm_dma_alloc_page
> 
> and
> __ttm_dma_free_page
> 
> calls for
> those single page allocs/frees, then over a 20 second interval of
> tracing and switching tabs in firefox, scrolling things around etc. i
> find about as many alloc's as i find free's, e.g., 1607 allocs vs.
> 1648 frees.
 This is because historically the pools have been designed to keep only
 pages with nonstandard caching attributes since changing page caching
 attributes have been very slow but the kernel page allocators have been
 reasonably fast.

 /Thomas
>>>
>>> Ok. A bit more ftraceing showed my hang problem case goes through the
>>> "if (is_cached)" paths, so the pool doesn't recycle anything and i see
>>> it bouncing up and down by 4 pages all the time.
>>>
>>> But for the non-cached case, which i don't hit with my problem, could
>>> one of you look at line 954...
>>>
>>> http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954
>>>
>>>
>>> ... and tell me why that unconditional npages = count; assignment
>>> makes sense? It seems to essentially disable all recycling for the dma
>>> pool whenever the pool isn't filled up to/beyond its maximum with free
>>> pages? When the pool is filled up, lots of stuff is recycled, but when
>>> it is already somewhat below capacity, it gets "punished" by not
>>> getting refilled? I'd just like to understand the logic behind that line.
>>>
>>> thanks,
>>> -mario
>>
>> I'll happily forward that question to Konrad who wrote the code (or it
>> may even stem from the ordinary page pool code which IIRC has Dave
>> Airlie / Jerome Glisse as authors)
> 
> This is effectively bogus code; I now wonder how it managed to stay alive.
> The attached patch will fix that.

I haven't tested Mario's scenario specifically, but it survived piglit
and the UE4 Effects Cave Demo (for which 1GB of VRAM isn't enough, so
some BOs ended up in GTT instead with write-combined CPU mappings) on
radeonsi without any noticeable issues.

Tested-by: Michel Dänzer 


-- 
Earthling Michel Dänzer    |  http://www.amd.com
Libre software enthusiast  |  Mesa and X developer


[Bug 82186] [r600g] BARTS GPU lockup with minecraft shaders

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82186

--- Comment #2 from EoD  ---
Created attachment 104555
  --> https://bugs.freedesktop.org/attachment.cgi?id=104555&action=edit
dmesg with kernel 3.14.16, mesa 10.2.4 and libdrm 2.4.54



CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-13 Thread Lucas Stach
On Tuesday, 12.08.2014 at 22:17 -0400, Jerome Glisse wrote:
[...]
> > I haven't tested the patch yet. For the original bug it won't help directly,
> > because the super-slow allocations which cause the desktop stall are
> > tt_cached allocations, so they go through the if (is_cached) code path which
> > isn't improved by Jerome's patch. is_cached always releases memory
> > immediately, so the tt_cached pool just bounces up and down between 4 and 7
> > pages. So this was an independent issue. The slow allocations i noticed were
> > mostly caused by exa allocating new gem bo's, i don't know which path is
> > taken by 3d graphics?
> > 
> > However, the fixed ttm path could indirectly solve the DMA_CMA stalls by
> > completely killing CMA for its intended purpose. Typical CMA sizes are
> > probably around < 100 MB (kernel default is 16 MB, Ubuntu config is 64 MB),
> > and the limit for the page pool seems to be more like 50% of all system RAM?
> > Iow. if the ttm dma pool is allowed to grow that big with recycled pages, it
> > probably will almost completely monopolize the whole CMA memory after a
> > short amount of time. ttm won't suffer stalls if it essentially doesn't
> > interact with CMA anymore after a warmup period, but actual clients which
> > really need CMA (ie., hardware without scatter-gather dma etc.) will be
> > starved of what they need as far as my limited understanding of the CMA
> > goes.
> 
> Yes, currently we allow the pool to be way too big; given that the pool
> was probably never really used, we most likely never had much of an
> issue. So I would hold off applying my patch until proper limits are in
> place. My thinking was to go for something like 32/64M at most, and
> less than that if < 256M total RAM. I also think that we should lower
> the pool size on the first call to shrink and only increase it again
> after some timeout since the last call to shrink, so that when shrink
> is called we minimize our pool size at least for a time. Will put
> together a couple of patches for doing that.
> 
> > 
> > So fwiw probably the fix to ttm will increase the urgency for the CMA people
> > to come up with a fix/optimization for the allocator. Unless it doesn't
> > matter if most desktop systems have CMA disabled by default, and ttm is
> > mostly used by desktop graphics drivers (nouveau, radeon, vmgfx)? I only
> > stumbled over the problem because the Ubuntu 3.16 mainline testing kernels
> > are compiled with CMA on.
> > 
> 
> Enabling CMA on x86 is proof of brain damage; that said, the DMA
> allocator should not use the CMA area for single page allocations.
> 
Harsh words.

Yes, allocating pages unconditionally from CMA if it is enabled is an
artifact of CMA's ARM heritage. While it seems completely backwards to
allocate single pages from CMA on x86, on ARM the CMA pool is the only
way to get lowmem pages on which you are allowed to change the caching
state.

So the obvious fix is to avoid CMA for order 0 allocations on x86. I can
cook a patch for this.
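The dispatch rule Lucas describes can be sketched as a tiny userspace model; `use_cma_area` is an illustrative name, not an existing kernel function:

```c
#include <assert.h>

/* Model of the proposed fix: route order-0 (single page) allocations to
 * the normal buddy allocator and reserve the CMA area for higher-order
 * contiguous requests. Returns nonzero when CMA should be used. */
static int use_cma_area(unsigned order, int cma_enabled)
{
	/* Never service single-page allocations from CMA, so TTM's
	 * page-at-a-time traffic cannot monopolize the CMA region. */
	return cma_enabled && order > 0;
}
```

This keeps CMA available for the clients that genuinely need physically contiguous memory (hardware without scatter-gather DMA) while TTM's order-0 churn goes through the regular page allocator.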

Regards,
Lucas 
-- 
Pengutronix e.K. | Lucas Stach |
Industrial Linux Solutions   | http://www.pengutronix.de/  |



[Bug 80673] XCOM: Enemy Unknown - Wrong read access when starting the game

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=80673

--- Comment #11 from James Legg  ---
A new version of this game has been released which should fix this issue.



Fence, timeline and android sync points

2014-08-13 Thread Daniel Vetter
On Tue, Aug 12, 2014 at 06:13:41PM -0400, Jerome Glisse wrote:
> Hi,
> 
> So I went over the whole fence and sync point stuff, as it's becoming a
> pressing issue. I think we first need to agree on what the problem is that
> we want to solve and what the requirements are to solve it.
> 
> Problem :
>   Explicit synchronization btw different hardware block over a buffer object.
> 
> Requirements :
>   Share common infrastructure.
>   Allow optimal hardware command stream scheduling across hardware blocks.
>   Allow android sync point to be implemented on top of it.
>   Handle/acknowledge exception (like good old gpu lockup).
>   Minimize driver changes.
> 
> Glossary :
>   hardware timeline: timeline bound to a specific hardware block.
>   pipeline timeline: timeline bound to a userspace rendering pipeline, each
>  point on that timeline can be a composite of several
>  different hardware pipeline point.
>   pipeline: abstract object representing userspace application graphic 
> pipeline
> of each of the application graphic operations.
>   fence: specific point in a timeline where synchronization needs to happen.
> 
> 
> So now, the current include/linux/fence.h implementation is, I believe,
> missing the objective by confusing hardware and pipeline timelines and by
> bolting fences to buffer objects, while what is really needed is a true and
> proper timeline for both hardware and pipeline. But before going further
> down that road let me look at things and explain how I see them.

fences can be used free-standing and no one forces you to integrate them
with buffers. We actually plan to go this way with the intel svm stuff.
Ofc for dma-buf the plan is to synchronize using such fences, but that's
somewhat orthogonal I think. At least you only talk about fences and
timeline and not dma-buf here.

> The current ttm fence has one sole purpose: allow synchronization for buffer
> object moves, even though some drivers like radeon slightly abuse it and use
> it for things like lockup detection.
> 
> The new fence wants to expose an api that would allow some implementation of
> a timeline. For that it introduces callbacks and some hard requirements on
> what the driver has to expose:
>   enable_signaling
>   [signaled]
>   wait
> 
> Each of those has to do work inside the driver to which the fence belongs,
> and each of those can be called from more or less unexpected contexts (with
> restrictions, like outside irq). So we end up with things like:
> 
>  Process 1  Process 2   Process 3
>  I_A_schedule(fence0)
> CI_A_F_B_signaled(fence0)
> I_A_signal(fence0)
> CI_B_F_A_callback(fence0)
> CI_A_F_B_wait(fence0)
> Lexicon:
> I_x  in driver x (I_A == in driver A)
> CI_x_F_y  call in driver x from driver y (CI_A_F_B == call in driver A from
> driver B)
> 
> So this is a happy mess: everyone calls everyone, and this is bound to get
> messy. Yes, I know there are all kinds of requirements on what happens once
> a fence is signaled. But those requirements only look like they are trying
> to atone for any mess that can happen from the whole callback dance.
> 
> While I too was seduced by the whole callback idea a long time ago, I think
> it is a highly dangerous path to take, where the combinatorics of what could
> happen are bound to explode with the increase in the number of players.
> 
> 
> So now back to how to solve the problem we are trying to address. First I
> want to make an observation: almost all GPUs that exist today have a command
> ring on which userspace command buffers are executed, and inside the command
> ring you can do something like:
> 
>   if (condition) execute_command_buffer else skip_command_buffer
> 
> where condition is a simple expression (memory_address cop value) with cop
> one of the generic comparisons (==, <, >, <=, >=). I think it is a safe
> assumption that any GPU that slightly matters can do that. Those which
> cannot should fix their command ring processor.
> 
> 
> With that in mind, I think the proper solution is implementing timelines and
> having a fence be a timeline object with a far simpler api. For each
> hardware timeline the driver provides a system memory address at which the
> latest signaled fence sequence number can be read. Each fence object is
> uniquely associated with both a hardware and a pipeline timeline. Each
> pipeline timeline has a wait queue.
> 
> When scheduling something that requires synchronization on a hardware
> timeline, a fence is created and associated with the pipeline timeline and
> hardware timeline. Other hardware blocks that need to wait on a fence can
> use their command ring conditional execution to directly check the fence
> sequence from the other hw block, so you do optimistic scheduling. If
> optimistic scheduling fails (which would be reported by

[Bug 81907] Unreal Engine "Effects Cave" Demo crashes with SIGBUS in __memcpy_sse2_unaligned

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=81907

Christoph Haag  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Christoph Haag  ---
This here doesn't happen anymore with recent mesa git builds either.

Cave Effects demo now runs on my GPU.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: 
<http://lists.freedesktop.org/archives/dri-devel/attachments/20140813/718e45a7/attachment.html>


[Bug 82019] Unreal Engine Effects Cave demo lockup HD 7970M

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82019

Christoph Haag  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Christoph Haag  ---
Well, not sure what exactly fixed it, but with a recent mesa git build it runs
without a hang now.



How to design a DRM KMS driver exposing 2D compositing?

2014-08-13 Thread Pekka Paalanen
On Tue, 12 Aug 2014 09:10:47 -0700
Eric Anholt  wrote:

> Pekka Paalanen  writes:
> 
> > On Mon, 11 Aug 2014 19:27:45 +0200
> > Daniel Vetter  wrote:
> >
> >> On Mon, Aug 11, 2014 at 10:16:24AM -0700, Eric Anholt wrote:
> >> > Daniel Vetter  writes:
> >> > 
> >> > > On Mon, Aug 11, 2014 at 01:38:55PM +0300, Pekka Paalanen wrote:
> >> > >> Hi,
> >> > >> 
> >> > >> there is some hardware than can do 2D compositing with an arbitrary
> >> > >> number of planes. I'm not sure what the absolute maximum number of
> >> > >> planes is, but for the discussion, let's say it is 100.
> >> > >> 
> >> > >> There are many complicated, dynamic constraints on how many, what 
> >> > >> size,
> >> > >> etc. planes can be used at once. A driver would be able to check those
> >> > >> before kicking the 2D compositing engine.
> >> > >> 
> >> > >> The 2D compositing engine in the best case (only few planes used) is
> >> > >> able to composite on the fly in scanout, just like the usual overlay
> >> > >> hardware blocks in CRTCs. When the composition complexity goes up, the
> >> > >> driver can fall back to compositing into a buffer rather than on the
> >> > >> fly in scanout. This fallback needs to be completely transparent to 
> >> > >> the
> >> > >> user space, implying only additional latency if anything.
> >> > >> 
> >> > >> These 2D compositing features should be exposed to user space through 
> >> > >> a
> >> > >> standard kernel ABI, hopefully an existing ABI in the very near future
> >> > >> like the KMS atomic.
> >> > >
> >> > > I presume we're talking about the video core from raspi? Or at least
> >> > > something similar?
> >> > 
> >> > Pekka wasn't sure if things were confidential here, but I can say it:
> >> > Yeah, it's the RPi.
> >> > 
> >> > While I haven't written code using the compositor interface (I just did
> >> > enough to shim in a single plane for bringup, and I'm hoping Pekka and
> >> > company can handle the rest for me :) ), my understanding is that the
> >> > way you make use of it is that you've got your previous frame loaded up
> >> > in the HVS (the plane compositor hardware), then when you're asked to
> >> > put up a new frame that's going to be too hard, you take some
> >> > complicated chunk of your scene and ask the HVS to use any spare
> >> > bandwidth it has while it's still scanning out the previous frame in
> >> > order to composite that piece of new scene into memory.  Then, when it's
> >> > done with the offline composite, you ask the HVS to do the next scanout
> >> > frame using the original scene with the pre-composited temporary buffer.
> >> > 
> >> > I'm pretty comfortable with the idea of having some large number of
> >> > planes preallocated, and deciding that "nobody could possibly need more
> >> > than 16" (or whatever).
> >> > 
> >> > My initial reaction to "we should just punt when we run out of bandwidth
> >> > and have a special driver interface for offline composite" was "that's
> >> > awful, when the kernel could just get the job done immediately, and
> >> > easily, and it would know exactly what it needed to composite to get
> >> > things to fit (unlike userspace)".  I'm trying to come up with what
> >> > benefit there would be to having a separate interface for offline
> >> > composite.  I've got 3 things:
> >> > 
> >> > - Avoids having a potentially long, interruptible wait in the modeset
> >> >   path while the offline composite happens.  But I think we have other
> >> >   interruptible waits in that path already.
> >> > 
> >> > - Userspace could potentially do something else besides use the HVS to
> >> >   get the fallback done.  Video would have to use the HVS, to get the
> >> >   same scaling filters applied as the previous frame where things *did*
> >> >   fit, but I guess you could composite some 1:1 RGBA overlays in GL,
> >> >   which would have more BW available to it than what you're borrowing
> >> >   from the previous frame's HVS capacity.
> >> > 
> >> > - Userspace could potentially use the offline composite interface for
> >> >   things besides just the running-out-of-bandwidth case.  Like, it was
> >> >   doing a nicely-filtered downscale of an overlaid video, then the user
> >> >   hit pause and walked away: you could have a timeout that noticed that
> >> >   the complicated scene hadn't changed in a while, and you'd drop from
> >> >   overlays to a HVS-composited single plane to reduce power.
> >> > 
> >> > The third one is the one I've actually found kind of compelling, and
> >> > might be switching me from wanting no userspace visibility into the
> >> > fallback.  But I don't have a good feel for how much complexity there is
> >> > to our descriptions of planes, and how much poorly-tested interface we'd
> >> > be adding to support this usecase.
> >> 
> >> The compositor should already do a rough bw guesstimate and, if stuff
> >> doesn't change any more, bake the entire scene into a single framebuffer.
> >> The exact same issue happens on more usual hw with video

Fence, timeline and android sync points

2014-08-13 Thread Christian König
Hi Jerome,

first of all, that finally sounds like somebody is starting to draw the 
whole picture for me.

So far all I have seen was a bunch of specialized requirements and some 
not so obvious design decisions based on those requirements.

So thanks a lot for finally summarizing the requirements from a top-down 
view; I perfectly agree with your analysis of the current fence 
design and the downsides of that API.

Apart from that I also have some comments / requirements that hopefully 
can be taken into account as well:

>pipeline timeline: timeline bound to a userspace rendering pipeline, each
>   point on that timeline can be a composite of several
>   different hardware pipeline point.
>pipeline: abstract object representing userspace application graphic 
> pipeline
>  of each of the application graphic operations.
In the long term, a requirement for the driver for AMD GFX hardware is 
that instead of a fixed pipeline timeline we need a somewhat more 
flexible model where concurrent execution on different hardware engines 
is possible as well.

So the requirement is that you can do things like submitting a 3D job A, 
a DMA job B, a VCE job C and another 3D job D that are executed like this:
      A
     / \
    B   C
     \ /
      D

(Let's just hope that looks as good on your mail client as it looked for 
me).
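[Editorial illustration: the A / (B, C) / D dependency graph above can be expressed with per-engine sequence numbers, where D is only runnable once the fences of both B (dma) and C (vce) have signaled. Every name here is hypothetical; real drivers track this per ring.]

```c
#include <stdint.h>

struct engine {
    uint32_t last_emitted;  /* last sequence number handed out */
    uint32_t signaled;      /* last sequence number completed */
};

static uint32_t submit_job(struct engine *e)
{
    return ++e->last_emitted;  /* pretend to queue work, return its fence seq */
}

static int seq_done(const struct engine *e, uint32_t seq)
{
    return (int32_t)(e->signaled - seq) >= 0;  /* wrap-safe comparison */
}

/* D's dependencies: the fences of B on the dma engine and C on the vce
 * engine (which themselves depend on A having run on gfx). */
static int deps_met(const struct engine *dma, uint32_t seq_b,
                    const struct engine *vce, uint32_t seq_c)
{
    return seq_done(dma, seq_b) && seq_done(vce, seq_c);
}
```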

My current thinking is that we avoid having a pipeline object in the 
kernel and instead let userspace specify which fence we want to 
synchronize to explicitly, as long as everything stays within the same 
client. As soon as any buffer is shared between clients, the kernel 
would need to fall back to implicit synchronization to allow backward 
compatibility with DRI2/3.

>if (condition) execute_command_buffer else skip_command_buffer
>
> where condition is a simple expression (memory_address cop value), with cop
> one of the generic comparisons (==, <, >, <=, >=). I think it is a safe
> assumption that any gpu that slightly matters can do that. Those that cannot
> should fix their command ring processor.
At least for some engines on AMD hardware that isn't possible (UVD, VCE 
and to some extent DMA as well), but I don't see any reason why we 
shouldn't be able to use software-based scheduling on those engines by 
default. So this isn't really a problem, just an additional comment 
to keep in mind.

Regards,
Christian.

On 13.08.2014 at 00:13, Jerome Glisse wrote:
> Hi,
>
> So I went over the whole fence and sync point stuff as it's becoming a
> pressing issue. I think we first need to agree on what is the problem we want
> to solve and what would be the requirements to solve it.
>
> Problem :
>    Explicit synchronization between different hardware blocks over a buffer
>    object.
>
> Requirements :
>    Share common infrastructure.
>    Allow optimal hardware command stream scheduling across hardware blocks.
>    Allow android sync points to be implemented on top of it.
>    Handle/acknowledge exceptions (like the good old gpu lockup).
>    Minimize driver changes.
>
> Glossary :
>hardware timeline: timeline bound to a specific hardware block.
>pipeline timeline: timeline bound to a userspace rendering pipeline, each
>   point on that timeline can be a composite of several
>   different hardware pipeline point.
>pipeline: abstract object representing userspace application graphic 
> pipeline
>  of each of the application graphic operations.
>fence: specific point in a timeline where synchronization needs to happen.
>
>
> So now, the current include/linux/fence.h implementation is, I believe,
> missing the objective by confusing hardware and pipeline timelines and by
> bolting fences to buffer objects, while what is really needed is a true and
> proper timeline for both hardware and pipeline. But before going further down
> that road let me look at things and explain how I see them.
>
> The current ttm fence has one sole purpose: allow synchronization for buffer
> object moves, even though some drivers like radeon slightly abuse it and use
> it for things like lockup detection.
>
> The new fence wants to expose an api that would allow some implementation of
> a timeline. For that it introduces callbacks and some hard requirements on
> what the driver has to expose :
>    enable_signaling
>    [signaled]
>    wait
>
> Each of those has to do work inside the driver to which the fence belongs,
> and each of those can be called from more or less unexpected contexts (with
> restrictions, like outside irq). So we end up with things like :
>
>   Process 1  Process 2   Process 3
>   I_A_schedule(fence0)
>  CI_A_F_B_signaled(fence0)
>  I_A_signal(fence0)
>  CI_B_F_A_callback(fence0)
>  CI_A_F_B_wait(fence0)
> 

[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #15 from dimangi42 at gmail.com ---
Created attachment 104553
  --> https://bugs.freedesktop.org/attachment.cgi?id=104553&action=edit
Xorg log

with no acceleration



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #14 from dimangi42 at gmail.com ---
Created attachment 104552
  --> https://bugs.freedesktop.org/attachment.cgi?id=104552&action=edit
dmesg 3.13 no acceleration

With xorg.conf "NoAccel" "true"



[Bug 82473] No picture

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82473

--- Comment #13 from dimangi42 at gmail.com ---
(In reply to comment #10)
> Just to be clear is it a problem specifically with HDMI or using any
> display?  Also make sure you fully uninstalled fglrx.  It overwrites some of
> the libs on your system with it's own versions.  You can also try disabling
> acceleration.  Add:
> Option "NoAccel" "true"
> to the device section of your xorg.conf

Disabling acceleration boots fine, and I get graphics as expected.



Fence, timeline and android sync points

2014-08-13 Thread Jerome Glisse
On Wed, Aug 13, 2014 at 09:59:26AM +0200, Christian König wrote:
> Hi Jerome,
> 
> first of all, that finally sounds like somebody is starting to draw the whole
> picture for me.
> 
> So far all I have seen was a bunch of specialized requirements and some not
> so obvious design decisions based on those requirements.
> 
> So thanks a lot for finally summarizing the requirements from a top-down
> view; I perfectly agree with your analysis of the current fence design
> and the downsides of that API.
> 
> Apart from that I also have some comments / requirements that hopefully can
> be taken into account as well:
> 
> >   pipeline timeline: timeline bound to a userspace rendering pipeline, each
> >  point on that timeline can be a composite of several
> >  different hardware pipeline point.
> >   pipeline: abstract object representing userspace application graphic 
> > pipeline
> > of each of the application graphic operations.
> In the long term, a requirement for the driver for AMD GFX hardware is that
> instead of a fixed pipeline timeline we need a somewhat more flexible model
> where concurrent execution on different hardware engines is possible as well.
> 
> So the requirement is that you can do things like submitting a 3D job A, a
> DMA job B, a VCE job C and another 3D job D that are executed like this:
>       A
>      / \
>     B   C
>      \ /
>       D
> 
> (Let's just hope that looks as good on your mail client as it looked for
> me).

My thinking on hw timelines is that a gpu like amd or nvidia would have several
different hw timelines. They are per block/engine, so one for the dma ring, one
for gfx, one for vce, ...

> My current thinking is that we avoid having a pipeline object in the kernel
> and instead let userspace specify which fence we want to synchronize to
> explicitly, as long as everything stays within the same client. As soon as
> any buffer is shared between clients, the kernel would need to fall back
> to implicit synchronization to allow backward compatibility with DRI2/3.

The whole issue is that today the cs ioctl assumes implied synchronization. So
this cannot change; for now anything that goes through the cs ioctl would need
to use an implied timeline and have all rings that use a common buffer
synchronize on it. As long as those rings use different buffers there is no
need for sync.

Buffer objects are what link hw timelines.

Of course there might be ways to be more flexible if timelines are exposed to
userspace and userspace can create several of them for a single process.

> 
> >   if (condition) execute_command_buffer else skip_command_buffer
> >
> >where condition is a simple expression (memory_address cop value), with cop
> >one of the generic comparisons (==, <, >, <=, >=). I think it is a safe
> >assumption that any gpu that slightly matters can do that. Those that cannot
> >should fix their command ring processor.
> At least for some engines on AMD hardware that isn't possible (UVD, VCE and
> to some extent DMA as well), but I don't see any reason why we shouldn't be
> able to use software-based scheduling on those engines by default. So this
> isn't really a problem, but just an additional comment to keep in mind.

Yes, not everything can do that, but as it's a simple memory access with a
simple comparison it's easy to do on the cpu for limited hardware. But this
really sounds like something so easy to add to hw ring execution that it is a
shame hw designers have not already added such a thing.

> Regards,
> Christian.
> 
> On 13.08.2014 at 00:13, Jerome Glisse wrote:
> >Hi,
> >
> >So I went over the whole fence and sync point stuff as it's becoming a
> >pressing issue. I think we first need to agree on what is the problem we
> >want to solve and what would be the requirements to solve it.
> >
> >Problem :
> >   Explicit synchronization between different hardware blocks over a buffer
> >   object.
> >
> >Requirements :
> >   Share common infrastructure.
> >   Allow optimal hardware command stream scheduling across hardware blocks.
> >   Allow android sync points to be implemented on top of it.
> >   Handle/acknowledge exceptions (like the good old gpu lockup).
> >   Minimize driver changes.
> >
> >Glossary :
> >   hardware timeline: timeline bound to a specific hardware block.
> >   pipeline timeline: timeline bound to a userspace rendering pipeline, each
> >  point on that timeline can be a composite of several
> >  different hardware pipeline point.
> >   pipeline: abstract object representing userspace application graphic 
> > pipeline
> > of each of the application graphic operations.
> >   fence: specific point in a timeline where synchronization needs to happen.
> >
> >
> >So now, the current include/linux/fence.h implementation is, I believe,
> >missing the objective by confusing hardware and pipeline timelines and by
> >bolting fences to buffer objects, while what is really needed is a true and
> >proper time

Fence, timeline and android sync points

2014-08-13 Thread Jerome Glisse
On Wed, Aug 13, 2014 at 10:28:22AM +0200, Daniel Vetter wrote:
> On Tue, Aug 12, 2014 at 06:13:41PM -0400, Jerome Glisse wrote:
> > Hi,
> > 
> > So I went over the whole fence and sync point stuff as it's becoming a
> > pressing issue. I think we first need to agree on what is the problem we
> > want to solve and what would be the requirements to solve it.
> > 
> > Problem :
> >   Explicit synchronization between different hardware blocks over a buffer
> >   object.
> > 
> > Requirements :
> >   Share common infrastructure.
> >   Allow optimal hardware command stream scheduling across hardware blocks.
> >   Allow android sync points to be implemented on top of it.
> >   Handle/acknowledge exceptions (like the good old gpu lockup).
> >   Minimize driver changes.
> > 
> > Glossary :
> >   hardware timeline: timeline bound to a specific hardware block.
> >   pipeline timeline: timeline bound to a userspace rendering pipeline, each
> >  point on that timeline can be a composite of several
> >  different hardware pipeline point.
> >   pipeline: abstract object representing userspace application graphic 
> > pipeline
> > of each of the application graphic operations.
> >   fence: specific point in a timeline where synchronization needs to happen.
> > 
> > 
> > So now, the current include/linux/fence.h implementation is, I believe,
> > missing the objective by confusing hardware and pipeline timelines and by
> > bolting fences to buffer objects, while what is really needed is a true and
> > proper timeline for both hardware and pipeline. But before going further
> > down that road let me look at things and explain how I see them.
> 
> fences can be used free-standing and no one forces you to integrate them
> with buffers. We actually plan to go this way with the intel svm stuff.
> Of course for dma-buf the plan is to synchronize using such fences, but
> that's somewhat orthogonal I think. At least you only talk about fences and
> timelines and not dma-buf here.
>  
> > The current ttm fence has one sole purpose: allow synchronization for
> > buffer object moves, even though some drivers like radeon slightly abuse it
> > and use it for things like lockup detection.
> > 
> > The new fence wants to expose an api that would allow some implementation
> > of a timeline. For that it introduces callbacks and some hard requirements
> > on what the driver has to expose :
> >   enable_signaling
> >   [signaled]
> >   wait
> > 
> > Each of those has to do work inside the driver to which the fence belongs,
> > and each of those can be called from more or less unexpected contexts (with
> > restrictions, like outside irq). So we end up with things like :
> > 
> >  Process 1  Process 2   Process 3
> >  I_A_schedule(fence0)
> > CI_A_F_B_signaled(fence0)
> > I_A_signal(fence0)
> > 
> > CI_B_F_A_callback(fence0)
> > CI_A_F_B_wait(fence0)
> > Legend:
> > I_x: in driver x (I_A == in driver A)
> > CI_x_F_y: call in driver x from driver y (CI_A_F_B == call in driver A from
> > driver B)
> > 
> > So this is a happy mess: everyone calls everyone, and this is bound to get
> > messy. Yes, I know there are all kinds of requirements on what happens once
> > a fence is signaled. But those requirements only look like they are trying
> > to atone for any mess that can happen from the whole callback dance.
> > 
> > While I was too seduced by the whole callback idea a long time ago, I think
> > it is a highly dangerous path to take, where the combinatorics of what
> > could happen are bound to explode with the increase in the number of
> > players.
> > 
> > 
> > So now, back to how to solve the problem we are trying to address. First I
> > want to make an observation: almost all GPUs that exist today have a
> > command ring onto which userspace command buffers are executed, and inside
> > the command ring you can do something like :
> > 
> >   if (condition) execute_command_buffer else skip_command_buffer
> > 
> > where condition is a simple expression (memory_address cop value), with cop
> > one of the generic comparisons (==, <, >, <=, >=). I think it is a safe
> > assumption that any gpu that slightly matters can do that. Those that
> > cannot should fix their command ring processor.
> > 
> > 
> > With that in mind, I think the proper solution is implementing timelines
> > and having a fence be a timeline object with a way simpler api. For each
> > hardware timeline, the driver provides a system memory address at which the
> > latest signaled fence sequence number can be read. Each fence object is
> > uniquely associated with both a hardware and a pipeline timeline. Each
> > pipeline timeline has a wait queue.
> > 
> > When scheduling something that requires synchronization on a hardware
> > time

[Bug 82050] R9270X pyrit benchmark perf regressions with latest kernel/llvm

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82050

Michel Dänzer  changed:

   What|Removed |Added

 Attachment #104475|0   |1
is obsolete||

--- Comment #11 from Michel Dänzer  ---
Created attachment 104549
  --> https://bugs.freedesktop.org/attachment.cgi?id=104549&action=edit
Only flush HDP cache for indirect buffers from userspace

Does this patch help?



[Bug 82544] New: Unreal Engine 4 Elemental fails to start up on Cape Verde with LLVM assertion failure

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82544

  Priority: medium
Bug ID: 82544
  Assignee: dri-devel at lists.freedesktop.org
   Summary: Unreal Engine 4 Elemental fails to start up on Cape
Verde with LLVM assertion failure
  Severity: normal
Classification: Unclassified
OS: All
  Reporter: michel at daenzer.net
  Hardware: Other
Status: NEW
   Version: git
 Component: Drivers/Gallium/radeonsi
   Product: Mesa

Created attachment 104542
  --> https://bugs.freedesktop.org/attachment.cgi?id=104542&action=edit
R600_DEBUG=vs output for failing shader

Trying to start the Elemental demo ends with this LLVM assertion failure:

Elemental: /home/daenzer/src/llvm-git/llvm/lib/CodeGen/VirtRegMap.cpp:369: void
(anonymous namespace)::VirtRegRewriter::rewrite(): Assertion `PhysReg &&
"Invalid SubReg for physical register"' failed.



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

--- Comment #4 from Michel Dänzer  ---
Please attach /var/log/Xorg.0.log and the output of dmesg and glxinfo.

Are you using a compositing manager? If so, make sure it unredirects fullscreen
windows.



[Bug 68856] Rendering artefacts with Unvanquished

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=68856

Michel Dänzer  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Michel Dänzer  ---
Seems to work fine with current Mesa.



CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-13 Thread Mario Kleiner
On 08/13/2014 03:50 AM, Michel Dänzer wrote:
> On 12.08.2014 00:17, Jerome Glisse wrote:
>> On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote:
>>> On 08/10/2014 08:02 PM, Mario Kleiner wrote:
 On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:
> On 08/10/2014 05:11 AM, Mario Kleiner wrote:
>> The other problem is that probably TTM does not reuse pages from the
>> DMA pool. If i trace the __ttm_dma_alloc_page
>> 
>> and
>> __ttm_dma_free_page
>> 
>> calls for
>> those single page allocs/frees, then over a 20 second interval of
>> tracing and switching tabs in firefox, scrolling things around etc. i
>> find about as many alloc's as i find free's, e.g., 1607 allocs vs.
>> 1648 frees.
> This is because historically the pools have been designed to keep only
> pages with nonstandard caching attributes since changing page caching
> attributes have been very slow but the kernel page allocators have been
> reasonably fast.
>
> /Thomas
 Ok. A bit more ftraceing showed my hang problem case goes through the
 "if (is_cached)" paths, so the pool doesn't recycle anything and i see
 it bouncing up and down by 4 pages all the time.

 But for the non-cached case, which i don't hit with my problem, could
 one of you look at line 954...

 http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954


 ... and tell me why that unconditional npages = count; assignment
 makes sense? It seems to essentially disable all recycling for the dma
 pool whenever the pool isn't filled up to/beyond its maximum with free
 pages? When the pool is filled up, lots of stuff is recycled, but when
 it is already somewhat below capacity, it gets "punished" by not
 getting refilled? I'd just like to understand the logic behind that line.

 thanks,
 -mario
>>> I'll happily forward that question to Konrad who wrote the code (or it
>>> may even stem from the ordinary page pool code which IIRC has Dave
>>> Airlie / Jerome Glisse as authors)
>> This is effectively bogus code, i now wonder how it came to stay alive.
>> Attached patch will fix that.
> I haven't tested Mario's scenario specifically, but it survived piglit
> and the UE4 Effects Cave Demo (for which 1GB of VRAM isn't enough, so
> some BOs ended up in GTT instead with write-combined CPU mappings) on
> radeonsi without any noticeable issues.
>
> Tested-by: Michel Dänzer 
>
>

I haven't tested the patch yet. For the original bug it won't help 
directly, because the super-slow allocations which cause the desktop 
stall are tt_cached allocations, so they go through the if (is_cached) 
code path which isn't improved by Jerome's patch. is_cached always 
releases memory immediately, so the tt_cached pool just bounces up and 
down between 4 and 7 pages. So this was an independent issue. The slow 
allocations I noticed were mostly caused by exa allocating new gem bo's; 
I don't know which path is taken by 3d graphics.

However, the fixed ttm path could indirectly solve the DMA_CMA stalls by 
completely killing CMA for its intended purpose. Typical CMA sizes are 
probably around < 100 MB (the kernel default is 16 MB, the Ubuntu config 
is 64 MB), and the limit for the page pool seems to be more like 50% of 
all system RAM. IOW, if the ttm dma pool is allowed to grow that big with 
recycled pages, it will probably almost completely monopolize the whole 
CMA memory after a short amount of time. ttm won't suffer stalls if it 
essentially doesn't interact with CMA anymore after a warmup period, but 
actual clients which really need CMA (i.e., hardware without 
scatter-gather dma etc.) will be starved of what they need, as far as my 
limited understanding of CMA goes.

So fwiw the fix to ttm will probably increase the urgency for the CMA 
people to come up with a fix/optimization for the allocator. Unless it 
doesn't matter, since most desktop systems have CMA disabled by default 
and ttm is mostly used by desktop graphics drivers (nouveau, radeon, 
vmwgfx)?
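[Editorial illustration: a simplified model of the pool-recycling decision being questioned above, not the actual TTM code. An unconditional "release everything returned" policy disables recycling whenever the pool sits below capacity; a capped computation keeps pages up to the limit and releases only the excess.]

```c
#include <stddef.h>

/* Simplified page-pool free path: decide how many of `count` returned
 * pages to hand back to the system versus keep cached for reuse. */
struct page_pool {
    size_t npages_in_pool;  /* pages currently cached */
    size_t max_size;        /* cap on cached pages */
};

/* Keep returned pages up to max_size, release only the excess. */
static size_t pages_to_release(const struct page_pool *p, size_t count)
{
    size_t total = p->npages_in_pool + count;
    return total > p->max_size ? total - p->max_size : 0;
}
```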

[Bug 82397] some GLSL demo render badly on Radeon 7870

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82397

--- Comment #2 from Damian Nowak  ---
Thanks for the info Michel. I'll bump my drivers soon and let you know.



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

--- Comment #3 from dawide2211 at gmail.com ---
Created attachment 104537
  --> https://bugs.freedesktop.org/attachment.cgi?id=104537&action=edit
Example 4, Mount&Blade



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

dawide2211 at gmail.com changed:

   What|Removed |Added

 Attachment #104535|text/plain  |image/jpeg
  mime type||
 Attachment #104535|1   |0
   is patch||



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

dawide2211 at gmail.com changed:

   What|Removed |Added

 Attachment #104534|1   |0
is obsolete||



[Bug 82533] Line of tearing while playing games

2014-08-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=82533

dawide2211 at gmail.com changed:

   What|Removed |Added

 Attachment #104534|text/plain  |image/jpeg
  mime type||
 Attachment #104534|1   |0
   is patch||
 Attachment #104534|0   |1
is obsolete||
