[RFC] drm: add support for tiled/compressed/etc modifier in addfb2

2014-12-13 Thread Ville Syrjälä
On Fri, Dec 12, 2014 at 04:33:38PM -0500, Rob Clark wrote:
> On Fri, Dec 12, 2014 at 3:57 PM, Laurent Pinchart
>  wrote:
> > Hi Rob,
> >
> > On Wednesday 10 December 2014 12:17:51 Rob Clark wrote:
> >> In DRM/KMS we are lacking a good way to deal with tiled/compressed
> >> formats.  Especially in the case of dmabuf/prime buffer sharing, where
> >> we cannot always rely on under-the-hood flags passed to a driver-specific
> >> gem-create ioctl to pass around these extra flags.
> >>
> >> The proposal is to add a per-plane format modifier.  This allows using,
> >> if necessary, different tiling patterns for sub-sampled planes, etc.
> >> The format modifiers are added at the end of the ioctl struct, so for
> >> legacy userspace it will be zero padded.
> >
> > But it will change the size of the structure, and thus the ioctl value. You
> > can't extend existing structures used in ioctls I'm afraid.
> 
> Actually, that is why it will work.  Old userspace passes a smaller
> size, and drm_ioctl() zero-pads the difference.
> 
> The issue is (potentially) in the other direction (new userspace, old
> kernel), since the old kernel will ignore the new fields.  But that can
> be sorted w/ a cap/query.

Or by using up one flag in the ioctl to specify whether the new fields
are valid or not.
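
For illustration, a minimal userspace sketch of that flag-based approach
(the flag name/value, the helper function, and fall_back_to_linear() are
assumptions, not part of this RFC; pitches/offsets omitted for brevity):

	#define DRM_MODE_FB_MODIFIERS	(1 << 1)	/* assumed flag value */

	static int add_tiled_nv12_fb(int fd, __u32 w, __u32 h,
				     __u32 y_handle, __u32 uv_handle)
	{
		struct drm_mode_fb_cmd2 cmd = {
			.width = w,
			.height = h,
			.pixel_format = DRM_FORMAT_NV12,
			.flags = DRM_MODE_FB_MODIFIERS, /* new fields are valid */
		};

		cmd.handles[0] = y_handle;	/* luma plane */
		cmd.handles[1] = uv_handle;	/* chroma plane */
		cmd.modifier[0] = DRM_FOURCC_MOD_SAMSUNG_64_32_TILE;
		cmd.modifier[1] = DRM_FOURCC_MOD_SAMSUNG_64_32_TILE;

		/* An old kernel validates the flags field, so the unknown bit
		 * fails with -EINVAL instead of the modifiers being silently
		 * ignored. */
		return drmIoctl(fd, DRM_IOCTL_MODE_ADDFB2, &cmd);
	}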

> 
> BR,
> -R
> 
> 
> > By the way, if this calls for an addfb3, I would add reserved fields at the
> > end of the structure to make future extensions possible without a new ioctl.
> >
> >> TODO how best to deal with assignment of modifier token values?  The
> >> rough idea was to namespace things with an 8bit vendor-id, and then
> >> beyond that it is treated as an opaque value.  But that was a relatively
> >> arbitrary choice.  There are cases where same tiling pattern and/or
> >> compression is supported by various different vendors.  So we should
> >> standardize to use the vendor-id and value of the first one who
> >> documents the format?
> >>
> >> TODO move definition of tokens to drm_fourcc.h?
> >>
> >> Signed-off-by: Rob Clark 
> >> ---
> >>  include/uapi/drm/drm_mode.h | 30 ++
> >>  1 file changed, 30 insertions(+)
> >>
> >> diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
> >> index aae71cb..c43648c 100644
> >> --- a/include/uapi/drm/drm_mode.h
> >> +++ b/include/uapi/drm/drm_mode.h
> >> @@ -330,6 +330,28 @@ struct drm_mode_fb_cmd {
> >>
> >>  #define DRM_MODE_FB_INTERLACED   (1<<0) /* for interlaced framebuffers */
> >>
> >> +/*
> >> + * Format Modifiers:
> >> + *
> >> + * Format modifiers describe, typically, a re-ordering or modification
> >> + * of the data in a plane of an FB.  This can be used to express tiled/
> >> + * swizzled formats, or compression, or a combination of the two.
> >> + *
> >> + * The upper 8 bits of the format modifier are a vendor-id as assigned
> >> + * below.  The lower 24 bits are assigned as vendor sees fit.
> >> + */
> >> +
> >> +/* Vendor Ids: */
> >> +#define DRM_FOURCC_MOD_VENDOR_SAMSUNG 0x03
> >> +/* ... more */
> >> +
> >> +#define DRM_FOURCC_MOD(vendor, val) (((DRM_FOURCC_MOD_VENDOR_## vendor) << 24) | val)
> >> +
> >> +/* Modifier values: */
> >> +/* 64x32 macroblock, ie NV12MT: */
> >> +#define DRM_FOURCC_MOD_SAMSUNG_64_32_TILE DRM_FOURCC_MOD(SAMSUNG, 123)
> >> +/* ... more */
> >> +
> >>  struct drm_mode_fb_cmd2 {
> >>   __u32 fb_id;
> >>   __u32 width, height;
> >> @@ -349,10 +371,18 @@ struct drm_mode_fb_cmd2 {
> >>* So it would consist of Y as offsets[0] and UV as
> >>* offsets[1].  Note that offsets[0] will generally
> >>* be 0 (but this is not required).
> >> +  *
> >> +  * To accommodate tiled, compressed, etc. formats, a per-plane
> >> +  * modifier can be specified.  The default value of zero
> >> +  * indicates "native" format as specified by the fourcc.
> >> +  * Any other value is a vendor-specific modifier token.  This
> >> +  * allows, for example, different tiling/swizzling patterns on
> >> +  * different planes.
> >> +  * See discussion above of DRM_FOURCC_MOD_xxx.
> >>*/
> >>   __u32 handles[4];
> >>   __u32 pitches[4]; /* pitch for each plane */
> >>   __u32 offsets[4]; /* offset of each plane */
> >> + __u32 modifier[4]; /* ie, tiling, compressed (per plane) */
> >>  };
> >>
> >>  #define DRM_MODE_FB_DIRTY_ANNOTATE_COPY 0x01
> >
> > --
> > Regards,
> >
> > Laurent Pinchart
> >

-- 
Ville Syrjälä
Intel OTC


[PATCH 2/4] drm/cache: Try to be smarter about clflushing on x86

2014-12-13 Thread Matt Turner
On Sat, Dec 13, 2014 at 7:08 PM, Ben Widawsky
 wrote:
> Any GEM driver which has very large objects and a slow CPU is subject to very
> long waits simply for clflushing incoherent objects. Generally, each
> individual object is not a problem, but if you have very large objects, or
> very many objects, the flushing begins to show up in profiles. Because on x86
> we know the cache size, we can easily determine when an object will use all
> the cache, and forego iterating over each cacheline.
>
> We need to be careful when using wbinvd. wbinvd() is itself potentially slow
> because it requires synchronizing the flush across all CPUs so they have a
> coherent view of memory. This can result in either stalling work being done on
> other CPUs, or this call itself stalling while waiting for a CPU to accept the
> interrupt. wbinvd() also has the downside of invalidating all cachelines, so
> we don't want to use it unless we're sure we already own most of the
> cachelines.
>
> The current algorithm is very naive. I think it can be tweaked more, and it
> would be good if someone else gave it some thought. I am pretty confident that
> in i915 we can even skip the IPI in the execbuf path with minimal code change
> (or perhaps just some verifying of the existing code). It would be nice to
> hear what other developers who depend on this code think.
>
> Cc: Intel GFX 
> Signed-off-by: Ben Widawsky 
> ---
>  drivers/gpu/drm/drm_cache.c | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index d7797e8..6009c2d 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -64,6 +64,20 @@ static void drm_cache_flush_clflush(struct page *pages[],
> drm_clflush_page(*pages++);
> mb();
>  }
> +
> +static bool
> +drm_cache_should_clflush(unsigned long num_pages)
> +{
> +   const int cache_size = boot_cpu_data.x86_cache_size;
> +
> +   /* For now the algorithm simply checks if the number of pages to be
> +* flushed is greater than the entire system cache. One could make the
> +* function more aware of the actual system (i.e. if SMP, how large is
> +* the cache, CPU freq., etc.). All those help to determine when to
> +* wbinvd() */
> +   WARN_ON_ONCE(!cache_size);
> +   return !cache_size || num_pages < (cache_size >> 2);
> +}
>  #endif
>
>  void
> @@ -71,7 +85,7 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
>  {
>
>  #if defined(CONFIG_X86)
> -   if (cpu_has_clflush) {
> +   if (cpu_has_clflush && drm_cache_should_clflush(num_pages)) {
> drm_cache_flush_clflush(pages, num_pages);
> return;
> }
> @@ -104,7 +118,7 @@ void
>  drm_clflush_sg(struct sg_table *st)
>  {
>  #if defined(CONFIG_X86)
> -   if (cpu_has_clflush) {
> +   if (cpu_has_clflush && drm_cache_should_clflush(st->nents)) {
> struct sg_page_iter sg_iter;
>
> mb();
> @@ -128,7 +142,7 @@ void
>  drm_clflush_virt_range(void *addr, unsigned long length)
>  {
>  #if defined(CONFIG_X86)
> -   if (cpu_has_clflush) {
> +   if (cpu_has_clflush && drm_cache_should_clflush(length / PAGE_SIZE)) {

If length isn't a multiple of page size, isn't this ignoring the
remainder? Should it be rounding length up to the next multiple of
PAGE_SIZE, like ROUND_UP_TO?
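
Something like this, perhaps - a sketch of that rounding, using the existing
DIV_ROUND_UP() helper from include/linux/kernel.h:

	if (cpu_has_clflush &&
	    drm_cache_should_clflush(DIV_ROUND_UP(length, PAGE_SIZE))) {

That way a sub-page remainder still counts as a full page when deciding
between clflush and wbinvd.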


[PATCH 4/4] drm/i915: Opportunistically reduce flushing at execbuf

2014-12-13 Thread Ben Widawsky
If we're moving a bunch of buffers from the CPU domain to the GPU domain, and
we've already blown out the entire cache via a wbinvd, there is nothing more to
do.

With this and the previous patches, I am seeing a 3x FPS increase on a certain
benchmark which uses a giant 2d array texture. Unless I missed something in the
code, it should only affect non-LLC i915 platforms.

I haven't yet run any numbers for other benchmarks, nor have I attempted to
check if various conformance tests still pass.

NOTE: As mentioned in the previous patch, if one can easily obtain the largest
buffer and attempt to flush it first, the results would be even more desirable.
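
A rough sketch of that idea (hypothetical - the sorting helper is assumed and
not part of this series):

	enum drm_cache_flush flush = DRM_CACHE_FLUSH_NONE;
	struct i915_vma *vma;

	/* Flush the biggest object first: if it alone is large enough to
	 * trip the wbinvd path, every remaining clflush loop is skipped. */
	sort_vmas_by_object_size_desc(vmas);	/* assumed helper */
	list_for_each_entry(vma, vmas, exec_list) {
		if (flush == DRM_CACHE_FLUSH_WBINVD)
			break;	/* the whole cache was already flushed */
		flush = i915_gem_clflush_object(vma->obj, false);
	}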

Cc: DRI Development 
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/i915/i915_drv.h|  3 ++-
 drivers/gpu/drm/i915/i915_gem.c| 12 +---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +---
 drivers/gpu/drm/i915/intel_lrc.c   |  8 +---
 4 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d68c75f..fdb92a3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2642,7 +2642,8 @@ static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
 }

 void i915_gem_reset(struct drm_device *dev);
-bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
+enum drm_cache_flush
+i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
 int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
 int __must_check i915_gem_init(struct drm_device *dev);
 int i915_gem_init_rings(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index de241eb..3746738 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3608,7 +3608,7 @@ err_unpin:
return vma;
 }

-bool
+enum drm_cache_flush
 i915_gem_clflush_object(struct drm_i915_gem_object *obj,
bool force)
 {
@@ -3617,14 +3617,14 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 * again at bind time.
 */
if (obj->pages == NULL)
-   return false;
+   return DRM_CACHE_FLUSH_NONE;

/*
 * Stolen memory is always coherent with the GPU as it is explicitly
 * marked as wc by the system, or the system is cache-coherent.
 */
if (obj->stolen || obj->phys_handle)
-   return false;
+   return DRM_CACHE_FLUSH_NONE;

/* If the GPU is snooping the contents of the CPU cache,
 * we do not need to manually clear the CPU cache lines.  However,
@@ -3635,12 +3635,10 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 * tracking.
 */
if (!force && cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
-   return false;
+   return DRM_CACHE_FLUSH_NONE;

trace_i915_gem_object_clflush(obj);
-   drm_clflush_sg(obj->pages);
-
-   return true;
+   return drm_clflush_sg(obj->pages);
 }

 /** Flushes the GTT write domain for the object if it's dirty. */
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0c25f62..e8eb9e9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -827,7 +827,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
 {
struct i915_vma *vma;
uint32_t flush_domains = 0;
-   bool flush_chipset = false;
+   enum drm_cache_flush flush_chipset = DRM_CACHE_FLUSH_NONE;
int ret;

list_for_each_entry(vma, vmas, exec_list) {
@@ -836,8 +836,10 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
if (ret)
return ret;

-   if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
-   flush_chipset |= i915_gem_clflush_object(obj, false);
+   if (obj->base.write_domain & I915_GEM_DOMAIN_CPU &&
+   flush_chipset != DRM_CACHE_FLUSH_WBINVD) {
+   flush_chipset = i915_gem_clflush_object(obj, false);
+   }

flush_domains |= obj->base.write_domain;
}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 89b5577..a6c6ebd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -611,7 +611,7 @@ static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,
struct intel_engine_cs *ring = ringbuf->ring;
struct i915_vma *vma;
uint32_t flush_domains = 0;
-   bool flush_chipset = false;
+   enum drm_cache_flush flush_chipset = DRM_CACHE_FLUSH_NONE;
int ret;

list_for_each_entry(vma, vmas, exec_list) {
@@ -621,8 +621,10 @@ static int execlists_move_to_gpu(struct intel_ringbuffer *ringbuf,

[PATCH 3/4] drm/cache: Return what type of cache flush occurred

2014-12-13 Thread Ben Widawsky
The caller of the cache flush APIs can sometimes do useful things with the
information of how the cache was flushed. For instance, when giving buffers to
the GPU to read, we need to make sure the CPU caches have been properly
invalidated for all of them (when not using LLC). If we can determine that a
wbinvd() occurred, though, we can skip trying to clflush all remaining objects.

There is a further optimization to be made here in the driver specific code
where it can try to flush the largest object first in hopes of it needing a
wbinvd(). I haven't implemented that yet.

The enum parts of this were very minimally considered for the sake of getting
the data for the profile.

Cc: Intel GFX 
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/drm_cache.c | 34 +-
 include/drm/drmP.h  | 13 ++---
 2 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 6009c2d..433b15d 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -80,21 +80,25 @@ drm_cache_should_clflush(unsigned long num_pages)
 }
 #endif

-void
+int
 drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 {

 #if defined(CONFIG_X86)
if (cpu_has_clflush && drm_cache_should_clflush(num_pages)) {
drm_cache_flush_clflush(pages, num_pages);
-   return;
+   return DRM_CACHE_FLUSH_CL;
}

-   if (wbinvd_on_all_cpus())
+   if (wbinvd_on_all_cpus()) {
printk(KERN_ERR "Timed out waiting for cache flush.\n");
+   return DRM_CACHE_FLUSH_ERROR;
+   } else
+   return DRM_CACHE_FLUSH_WBINVD;

 #elif defined(__powerpc__)
unsigned long i;
+   int ret = DRM_CACHE_FLUSH_NONE;
for (i = 0; i < num_pages; i++) {
struct page *page = pages[i];
void *page_virtual;
@@ -106,15 +110,19 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
flush_dcache_range((unsigned long)page_virtual,
   (unsigned long)page_virtual + PAGE_SIZE);
kunmap_atomic(page_virtual);
+   ret = DRM_CACHE_FLUSH_DCACHE;
}
+   WARN_ON(ret == DRM_CACHE_FLUSH_NONE);
+   return ret;
 #else
printk(KERN_ERR "Architecture has no drm_cache.c support\n");
WARN_ON_ONCE(1);
+   return DRM_CACHE_FLUSH_NONE;
 #endif
 }
 EXPORT_SYMBOL(drm_clflush_pages);

-void
+int
 drm_clflush_sg(struct sg_table *st)
 {
 #if defined(CONFIG_X86)
@@ -126,19 +134,23 @@ drm_clflush_sg(struct sg_table *st)
drm_clflush_page(sg_page_iter_page(&sg_iter));
mb();

-   return;
+   return DRM_CACHE_FLUSH_CL;
}

-   if (wbinvd_on_all_cpus())
+   if (wbinvd_on_all_cpus()) {
printk(KERN_ERR "Timed out waiting for cache flush.\n");
+   return DRM_CACHE_FLUSH_ERROR;
+   } else
+   return DRM_CACHE_FLUSH_WBINVD;
 #else
printk(KERN_ERR "Architecture has no drm_cache.c support\n");
WARN_ON_ONCE(1);
+   return DRM_CACHE_FLUSH_NONE;
 #endif
 }
 EXPORT_SYMBOL(drm_clflush_sg);

-void
+int
 drm_clflush_virt_range(void *addr, unsigned long length)
 {
 #if defined(CONFIG_X86)
@@ -149,14 +161,18 @@ drm_clflush_virt_range(void *addr, unsigned long length)
clflushopt(addr);
clflushopt(end - 1);
mb();
-   return;
+   return DRM_CACHE_FLUSH_CL;
}

-   if (wbinvd_on_all_cpus())
+   if (wbinvd_on_all_cpus()) {
printk(KERN_ERR "Timed out waiting for cache flush.\n");
+   return DRM_CACHE_FLUSH_ERROR;
+   } else
+   return DRM_CACHE_FLUSH_WBINVD;
 #else
printk(KERN_ERR "Architecture has no drm_cache.c support\n");
WARN_ON_ONCE(1);
+   return DRM_CACHE_FLUSH_NONE;
 #endif
 }
 EXPORT_SYMBOL(drm_clflush_virt_range);
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 8ba35c6..09ebe46 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -884,9 +884,16 @@ int drm_noop(struct drm_device *dev, void *data,
 struct drm_file *file_priv);

 /* Cache management (drm_cache.c) */
-void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
-void drm_clflush_sg(struct sg_table *st);
-void drm_clflush_virt_range(void *addr, unsigned long length);
+enum drm_cache_flush {
+   DRM_CACHE_FLUSH_NONE=0,
+   DRM_CACHE_FLUSH_ERROR,
+   DRM_CACHE_FLUSH_CL,
+   DRM_CACHE_FLUSH_WBINVD,
+   DRM_CACHE_FLUSH_DCACHE,
+};
+int drm_clflush_pages(struct page *pages[], unsigned long num_pages);
+int drm_clflush_sg(struct sg_table *st);
+int drm_clflush_virt_range(void *addr, unsigned long length);

 /*
  * These are exported to drivers so that they can implement fencing using
-- 
2.1.3



[PATCH 2/4] drm/cache: Try to be smarter about clflushing on x86

2014-12-13 Thread Ben Widawsky
Any GEM driver which has very large objects and a slow CPU is subject to very
long waits simply for clflushing incoherent objects. Generally, each individual
object is not a problem, but if you have very large objects, or very many
objects, the flushing begins to show up in profiles. Because on x86 we know the
cache size, we can easily determine when an object will use all the cache, and
forego iterating over each cacheline.

We need to be careful when using wbinvd. wbinvd() is itself potentially slow
because it requires synchronizing the flush across all CPUs so they have a
coherent view of memory. This can result in either stalling work being done on
other CPUs, or this call itself stalling while waiting for a CPU to accept the
interrupt. wbinvd() also has the downside of invalidating all cachelines,
so we don't want to use it unless we're sure we already own most of the
cachelines.

The current algorithm is very naive. I think it can be tweaked more, and it
would be good if someone else gave it some thought. I am pretty confident that
in i915 we can even skip the IPI in the execbuf path with minimal code change
(or perhaps just some verifying of the existing code). It would be nice to hear
what other developers who depend on this code think.

Cc: Intel GFX 
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/drm_cache.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index d7797e8..6009c2d 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -64,6 +64,20 @@ static void drm_cache_flush_clflush(struct page *pages[],
drm_clflush_page(*pages++);
mb();
 }
+
+static bool
+drm_cache_should_clflush(unsigned long num_pages)
+{
+   const int cache_size = boot_cpu_data.x86_cache_size;
+
+   /* For now the algorithm simply checks if the number of pages to be
+* flushed is greater than the entire system cache. One could make the
+* function more aware of the actual system (i.e. if SMP, how large is
+* the cache, CPU freq., etc.). All those help to determine when to
+* wbinvd() */
+   WARN_ON_ONCE(!cache_size);
+   return !cache_size || num_pages < (cache_size >> 2);
+}
 #endif

 void
@@ -71,7 +85,7 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 {

 #if defined(CONFIG_X86)
-   if (cpu_has_clflush) {
+   if (cpu_has_clflush && drm_cache_should_clflush(num_pages)) {
drm_cache_flush_clflush(pages, num_pages);
return;
}
@@ -104,7 +118,7 @@ void
 drm_clflush_sg(struct sg_table *st)
 {
 #if defined(CONFIG_X86)
-   if (cpu_has_clflush) {
+   if (cpu_has_clflush && drm_cache_should_clflush(st->nents)) {
struct sg_page_iter sg_iter;

mb();
@@ -128,7 +142,7 @@ void
 drm_clflush_virt_range(void *addr, unsigned long length)
 {
 #if defined(CONFIG_X86)
-   if (cpu_has_clflush) {
+   if (cpu_has_clflush && drm_cache_should_clflush(length / PAGE_SIZE)) {
void *end = addr + length;
mb();
for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
-- 
2.1.3
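
To put numbers on the heuristic above: boot_cpu_data.x86_cache_size is
reported in KiB, so with 4 KiB pages, cache_size >> 2 is the number of pages
that cover the whole cache. A worked example (the cache size is illustrative):

	/*
	 * cache_size = 8192 KiB (an 8 MiB LLC)
	 * cache_size >> 2 = 2048 pages  (2048 * 4 KiB = 8 MiB)
	 *
	 * num_pages < 2048  -> clflush page by page
	 * num_pages >= 2048 -> fall through to wbinvd_on_all_cpus()
	 */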



[PATCH 1/4] drm/cache: Use wbinvd helpers

2014-12-13 Thread Ben Widawsky
When the original drm code was written there were no centralized functions for
doing a coordinated wbinvd across all CPUs. Now (since 2010) there are, so use
them instead of rolling a new one.

Cc: Intel GFX 
Signed-off-by: Ben Widawsky 
---
 drivers/gpu/drm/drm_cache.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index a6b6906..d7797e8 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -64,12 +64,6 @@ static void drm_cache_flush_clflush(struct page *pages[],
drm_clflush_page(*pages++);
mb();
 }
-
-static void
-drm_clflush_ipi_handler(void *null)
-{
-   wbinvd();
-}
 #endif

 void
@@ -82,7 +76,7 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
return;
}

-   if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
+   if (wbinvd_on_all_cpus())
printk(KERN_ERR "Timed out waiting for cache flush.\n");

 #elif defined(__powerpc__)
@@ -121,7 +115,7 @@ drm_clflush_sg(struct sg_table *st)
return;
}

-   if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
+   if (wbinvd_on_all_cpus())
printk(KERN_ERR "Timed out waiting for cache flush.\n");
 #else
printk(KERN_ERR "Architecture has no drm_cache.c support\n");
@@ -144,7 +138,7 @@ drm_clflush_virt_range(void *addr, unsigned long length)
return;
}

-   if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
+   if (wbinvd_on_all_cpus())
printk(KERN_ERR "Timed out waiting for cache flush.\n");
 #else
printk(KERN_ERR "Architecture has no drm_cache.c support\n");
-- 
2.1.3



[PATCH 0/4] [RFC] Sometimes opt for wbinvd over clflush

2014-12-13 Thread Ben Widawsky
While looking at a recent benchmark on a non-LLC platform, Kristian noticed that
the amount of time simply spent clflushing buffers was not only measurable, but
dominating the profile. It's possible I'm oversimplifying the problem, but it
seems like for cases where we have a slow CPU, and when you know the set of BOs
is using all or most of the cache, wbinvd is the optimal solution. The gains are
about 3.5x FPS on the micro-benchmark with these patches.

These patches attempt to make a generic solution which could potentially be used
by other drivers. It can just as easily be implemented solely in i915, and if
that's what people find more desirable and safe, I am happy to do that as well.

I wouldn't say these patches are ready for inclusion as I haven't spent much
time testing or polishing them. I would like feedback on what people think of
the general idea, and thoughts on figuring out when to switch over to wbinvd -
in particular [as mentioned in patch 3] whether I even need to do the
synchronized
wbinvd. (For the time being, I have convinced myself we can avoid it on i915,
but I am quite often wrong about such things; more details in the relevant
patch.)

PPC specific code is only compile tested.

Thanks.

Ben Widawsky (4):
  drm/cache: Use wbinvd helpers
  drm/cache: Try to be smarter about clflushing on x86
  drm/cache: Return what type of cache flush occurred
  drm/i915: Opportunistically reduce flushing at execbuf

 drivers/gpu/drm/drm_cache.c| 54 +-
 drivers/gpu/drm/i915/i915_drv.h|  3 +-
 drivers/gpu/drm/i915/i915_gem.c| 12 +++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +++--
 drivers/gpu/drm/i915/intel_lrc.c   |  8 +++--
 include/drm/drmP.h | 13 +--
 6 files changed, 66 insertions(+), 32 deletions(-)

-- 
2.1.3



[Bug 87278] Packet0 not allowed and GPU fault detected errors with Serious Engine games

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=87278

Christoph Haag  changed:

   What|Removed |Added

 CC||haagch at frickel.club

--- Comment #2 from Christoph Haag  ---
This also happens on a HD 7970M (pitcairn).

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor
Graphics Controller (rev 09)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
Wimbledon XT [Radeon HD 7970M] (rev ff)

On latest mesa git master and both with linux 3.18 and drm-next-3.19.

Setting the game to lowest settings doesn't seem to help.

When looking closely at the ground, the VM faults seem to stop; when looking a
little bit into the distance, they start happening again.

[47303.471209] radeon 0000:01:00.0: GPU fault detected: 147 0x0fe2c401
[47303.471211] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FF03E4E
[47303.471212] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020C4001
[47303.471214] VM fault (0x01, vmid 1) at page 267402830, read from TC (196)
[47303.487684] radeon 0000:01:00.0: GPU fault detected: 147 0x09c2c801
[47303.487689] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FF03E4E
[47303.487691] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020C8001
[47303.487694] VM fault (0x01, vmid 1) at page 267402830, read from TC (200)
[47303.487696] radeon 0000:01:00.0: GPU fault detected: 147 0x09c28401
[47303.487698] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FFFFFFF
[47303.487699] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x020C4001
[47303.487701] VM fault (0x01, vmid 1) at page 268435455, read from TC (196)
[47303.504293] radeon 0000:01:00.0: GPU fault detected: 147 0x09c24801
[47303.504298] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FF03E4E
[47303.504300] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048001
[47303.504302] VM fault (0x01, vmid 1) at page 267402830, read from TC (72)



[Bug 87291] Broken dpm for HD6670 since kernel 3.15

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=87291

Bug ID: 87291
   Summary: Broken dpm for HD6670 since kernel 3.15
   Product: DRI
   Version: XOrg git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: DRM/Radeon
  Assignee: dri-devel at lists.freedesktop.org
  Reporter: e.edgar.frank at gmail.com

Hello there, I'd like to report that since kernel 3.15 it's nearly impossible to
boot past the grub screen: post grub, during X init, my monitor starts to
endlessly power down/power on. The only way to boot is to play around with the
restart button; in around 10 restarts, roughly once I would get X to initialize
properly. The other way is to put radeon.dpm=0 in the kernel line. This way no
errors occur, but I am left without dynamic power management for the GPU.



[Bug 87265] display turned off during normal work, radeon GPU lockup

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=87265

--- Comment #1 from Kamil Páral  ---
This happened before I upgraded to the latest
mesa-dri-drivers-10.3.5-1.20141207.fc21. It's possible it's fixed now. If you
can't easily extract information about what's wrong, let's close it and I'll
reopen if it happens again.



[Bug 80584] XCOM: Enemy Unknown incorrect hair rendering

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=80584

--- Comment #5 from dex+fdobugzilla at dragonslave.de ---
I can confirm this for PITCAIRN as well.

Card: VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Pitcairn
  Subsystem: Micro-Star International Co., Ltd. Device 3036

Kernel: 3.18.0-rc7
libdrm: 2.4.58
mesa: git-gc97cbd7
llvm: 3.5



[Bug 87278] Packet0 not allowed and GPU fault detected errors with Serious Engine games

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=87278

--- Comment #1 from Daniel Scharrer  ---
Created attachment 110809
  --> https://bugs.freedesktop.org/attachment.cgi?id=110809&action=edit
standard output from The Talos Principle and Serious Sam 3

The standard output has this repeated a few times:

radeon: The kernel rejected CS, see dmesg for more information.



[Bug 87278] Packet0 not allowed and GPU fault detected errors with Serious Engine games

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=87278

Bug ID: 87278
   Summary: Packet0 not allowed and GPU fault detected errors with
Serious Engine games
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Drivers/Gallium/radeonsi
  Assignee: dri-devel at lists.freedesktop.org
  Reporter: daniel at constexpr.org

Created attachment 110808
  --> https://bugs.freedesktop.org/attachment.cgi?id=110808&action=edit
dmesg output with the GPU fault errors filtered out

Running Serious Sam 3 or The Talos Principle spams dmesg with thousands of
these errors:

[ 6001.212237] radeon 0000:01:00.0: GPU fault detected: 147 0x02528801
[ 6001.212243] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0FF02192
[ 6001.212246] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x12088001
[ 6001.212249] VM fault (0x01, vmid 9) at page 267395474, read from TC (136)

There are also a few "Packet0 not allowed" errors (followed by a hex dump):

[15446.473341] radeon 0000:01:00.0: Packet0 not allowed!

So far it's only these errors in dmesg - I haven't observed any actual
rendering issues, crashes, or GPU lockups because of this.

I have only attached a filtered kernel log with the GPU fault errors removed -
the full log is available at http://constexpr.org/tmp/serious-dmesg.log (140
MiB).

Both of these games use the Serious Engine 3.5 (Serious Sam 3) or 4 (The Talos
Principle). This is also reproducible with The Talos Principle Public Test
which as of now is still available as a free download on Steam.

Kernel: 3.18.0-gentoo
GPU: Radeon HD 7950
Driver: radeonsi, Mesa 10.5.0-devel (git-ff96537)

This might be related to bug 84500 - however, those spurious Packet0 errors
had been gone for a while with updated Mesa; now I get them again, but only
while running Serious Engine games.



[Bug 89661] Kernel panic when trying to use amdkfd driver on Kaveri

2014-12-13 Thread bugzilla-dae...@bugzilla.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=89661

--- Comment #1 from Bernd Steinhauser  ---
Created attachment 160441
  --> https://bugzilla.kernel.org/attachment.cgi?id=160441&action=edit
Picture of the kernel panic output



[Bug 89661] New: Kernel panic when trying to use amdkfd driver on Kaveri

2014-12-13 Thread bugzilla-dae...@bugzilla.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=89661

Bug ID: 89661
   Summary: Kernel panic when trying to use amdkfd driver on Kaveri
   Product: Drivers
   Version: 2.5
Kernel Version: 3.18.0 + drm-next branch
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Video(DRI - non Intel)
  Assignee: drivers_video-dri at kernel-bugs.osdl.org
  Reporter: linux at bernd-steinhauser.de
Regression: No

The kernel I tried to use was 3.18.0, and I merged the drm-next branch from
git://people.freedesktop.org/~airlied/linux, which includes the HSA driver
amdkfd. CONFIG_HSA_AMD=y is set.
When trying to boot the kernel, I get a kernel panic, as shown in the uploaded
picture.

CPU is a A10-7800 Kaveri, Motherboard is ASRock FM2A88X Extreme6+.



[Bug 66067] Trine 2's fragment normal buffer is mixtextured on Radeon HD 6770 (Juniper)

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=66067

--- Comment #30 from Daniel Scharrer  ---
(In reply to Nicholas Miell from comment #29)
> Does it ever actually sample from a depth texture or is unconditionally
> converting the SHADOW2D to 2D safe?

In the apitraces I have, depth formats (GL_DEPTH_COMPONENT* and
GL_DEPTH*_STENCIL*) are only ever used for renderbuffers. I also haven't
noticed any visual problems with the shadows other than ones that are also
there with fglrx.

No idea if this is true for the whole game and for all video settings (I only
tried Very High graphics detail + FXAA + 2xSSAA). It can also of course change
with future updates.



[PATCH] drm: sti: fix module compilation issue

2014-12-13 Thread Benjamin Gaignard
When compiling as a module some symbols are missing;
export them correctly.

Signed-off-by: Benjamin Gaignard 
---
 drivers/gpu/drm/sti/sti_drm_plane.c | 1 +
 drivers/gpu/drm/sti/sti_hqvdp.c | 1 +
 drivers/gpu/drm/sti/sti_layer.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/sti/sti_drm_plane.c 
b/drivers/gpu/drm/sti/sti_drm_plane.c
index c9dd0e5..bb6a293 100644
--- a/drivers/gpu/drm/sti/sti_drm_plane.c
+++ b/drivers/gpu/drm/sti/sti_drm_plane.c
@@ -194,3 +194,4 @@ struct drm_plane *sti_drm_plane_init(struct drm_device *dev,

return &plane->plane;
 }
+EXPORT_SYMBOL(sti_drm_plane_init);
diff --git a/drivers/gpu/drm/sti/sti_hqvdp.c b/drivers/gpu/drm/sti/sti_hqvdp.c
index 200d020..f3db05d 100644
--- a/drivers/gpu/drm/sti/sti_hqvdp.c
+++ b/drivers/gpu/drm/sti/sti_hqvdp.c
@@ -816,6 +816,7 @@ struct sti_layer *sti_hqvdp_create(struct device *dev)

return &hqvdp->layer;
 }
+EXPORT_SYMBOL(sti_hqvdp_create);

 static void sti_hqvdp_init_plugs(struct sti_hqvdp *hqvdp)
 {
diff --git a/drivers/gpu/drm/sti/sti_layer.c b/drivers/gpu/drm/sti/sti_layer.c
index 480ec1c..899104f 100644
--- a/drivers/gpu/drm/sti/sti_layer.c
+++ b/drivers/gpu/drm/sti/sti_layer.c
@@ -40,6 +40,7 @@ const char *sti_layer_to_str(struct sti_layer *layer)
return "";
}
 }
+EXPORT_SYMBOL(sti_layer_to_str);

 struct sti_layer *sti_layer_create(struct device *dev, int desc,
   void __iomem *baseaddr)
@@ -77,6 +78,7 @@ struct sti_layer *sti_layer_create(struct device *dev, int 
desc,

return layer;
 }
+EXPORT_SYMBOL(sti_layer_create);

 int sti_layer_prepare(struct sti_layer *layer,
  struct drm_crtc *crtc,
-- 
1.9.1



[Bug 87126] Fragment shader value lock up rv635 device

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=87126

pavol at klacansky.com changed:

   What|Removed |Added

   Priority|medium  |highest
   Severity|normal  |major



[Bug 73338] Fan speed in idle at 40% with radeonsi and at 18% with catalyst

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=73338

--- Comment #72 from Martin Peres  ---
Hey Oleg,

(In reply to Chernovsky Oleg from comment #68)
> (In reply to Martin Peres from comment #66)
> 
> Martin, there are a few things I think I didn't get correctly:
> - What should I return if pwm1_enable is 0 and I'm trying to push 50 into
> pwm1?

I made Nouveau return -EINVAL.

> - Can pwm1_enabled value be 2 after I push 0 to it? Radeon default fan value
> comes from fuse and it's not 100%

So, you mean that the boot default for the fan mode is the automatic mode,
right? That's perfectly fine.

The only thing you need to make sure of is that when the user writes 0 to
pwm1_enable, auto fan management gets disabled and the fan speed is set to
100%.

To be honest, I find the mode 0 to be useless, but the spec is the spec :s
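
A minimal sketch of those rules in a driver's fan-control path (the radeon
helper names and the fan_mode field are hypothetical; only the hwmon
semantics - 0 = full speed, 1 = manual, 2 = automatic - come from the sysfs
spec):

	static int fan_set_mode(struct radeon_device *rdev, long mode)
	{
		int ret;

		switch (mode) {
		case 0:	/* disable auto management, force 100% duty */
			ret = radeon_fan_set_manual(rdev);	/* assumed helper */
			if (ret)
				return ret;
			return radeon_fan_set_duty(rdev, 255);	/* assumed helper */
		case 1:
			return radeon_fan_set_manual(rdev);
		case 2:
			return radeon_fan_set_auto(rdev);	/* assumed helper */
		default:
			return -EINVAL;
		}
	}

	static int fan_set_duty(struct radeon_device *rdev, long duty)
	{
		/* writes to pwm1 only make sense in manual mode (1) */
		if (rdev->pm.fan_mode != 1)	/* assumed field */
			return -EINVAL;
		return radeon_fan_set_duty(rdev, duty);
	}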



[Bug 66067] Trine 2's fragment normal buffer is mixtextured on Radeon HD 6770 (Juniper)

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=66067

--- Comment #29 from Nicholas Miell  ---
Does it ever actually sample from a depth texture or is unconditionally
converting the SHADOW2D to 2D safe?



[Bug 66067] Trine 2's fragment normal buffer is mixtextured on Radeon HD 6770 (Juniper)

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=66067

--- Comment #28 from Daniel Scharrer  ---
Created attachment 110801
  --> https://bugs.freedesktop.org/attachment.cgi?id=110801&action=edit
LD_PRELOAD library to fix Trine's ARB shaders

Trine Enchanted Edition (port/remake of Trine 1 in the Trine 2 engine) has the
same lighting problem.

Since the attached radeonsi patch no longer applies, I have created a small
LD_PRELOAD library to change all instances of " SHADOW2D;" to "   2D;" in
ARB shaders. Using this confirms that it's still the same bug.

Obviously fglrx somehow silently falls back to non-shadow samplers. Is this
also the case for the nvidia driver with ARB shaders or does that just get fed
Cg shaders directly? If so, would it be feasible for Mesa to include something
like the "Handle sampler depth compare mode" but for ARB shaders only? Or at
least a drirc option - either to decide based on the compare mode or to disable
shadow samplers completely?
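
For reference, such an interposer boils down to something like this (a sketch;
it assumes the game resolves glProgramStringARB through a symbol LD_PRELOAD
can override rather than via glXGetProcAddress, and that a same-length
replacement keeps the passed-in len valid):

	#define _GNU_SOURCE
	#include <dlfcn.h>
	#include <string.h>
	#include <stdlib.h>
	#include <GL/gl.h>
	#include <GL/glext.h>

	void glProgramStringARB(GLenum target, GLenum format, GLsizei len,
				const void *string)
	{
		static void (*real)(GLenum, GLenum, GLsizei, const void *);
		char *copy, *p;

		if (!real)
			real = dlsym(RTLD_NEXT, "glProgramStringARB");

		copy = malloc(len);
		if (!copy)
			return;
		memcpy(copy, string, len);
		/* " SHADOW2D;" and "       2D;" are both 10 bytes, so the
		 * program source length is unchanged */
		while ((p = memmem(copy, len, " SHADOW2D;", 10)))
			memcpy(p, "       2D;", 10);

		real(target, format, len, copy);
		free(copy);
	}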



[PATCH v2] drm/radeon: evergreen/cayman indirect draw support

2014-12-13 Thread Glenn Kennard
Add the necessary set of commands to support OpenGL
indirect draw calls on evergreen/cayman devices that
do not have VM.

Signed-off-by: Glenn Kennard 
---
Changes since patch V1:
* Removed multi draw indirect, not used by current userspace which instead
  decomposes multi into separate draw indirect calls
* Added validation to reject cs if indirect draw buffer is too small for
  address+offset
* Removed useless calls to evergreen_cs_track_check
* Reject attempt to call PACKET3_SET_BASE with a value other than the
  indirect draw buffer base for chips using VM.

I've left the index buffer logic unchanged; it is outside the scope of this
feature. Additional checking of that would be part of support for
GL_ARB/KHR_robust_buffer_access_behavior.
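
For context, the userspace path this feeds is the standard GL one
(ARB_draw_indirect / OpenGL 4.0; nothing driver-specific, shown here only to
illustrate what the kernel-side validation is for):

	/* The draw parameters live in a buffer object; on the kernel side
	 * this is what PACKET3_SET_BASE + PACKET3_DRAW_INDIRECT consume. */
	typedef struct {
		GLuint count, instanceCount, first, baseInstance;
	} DrawArraysIndirectCommand;

	DrawArraysIndirectCommand cmd = { vertex_count, 1, 0, 0 };
	GLuint buf;

	glGenBuffers(1, &buf);
	glBindBuffer(GL_DRAW_INDIRECT_BUFFER, buf);
	glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(cmd), &cmd, GL_STATIC_DRAW);
	glDrawArraysIndirect(GL_TRIANGLES, 0);	/* offset 0 into the buffer */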

 drivers/gpu/drm/radeon/evergreen_cs.c | 76 +++
 drivers/gpu/drm/radeon/radeon_drv.c   |  3 +-
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c 
b/drivers/gpu/drm/radeon/evergreen_cs.c
index 924b1b7..5152c1f 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -83,6 +83,7 @@ struct evergreen_cs_track {
u32 htile_offset;
u32 htile_surface;
struct radeon_bo*htile_bo;
+   unsigned long   indirect_draw_buffer_size;
 };

 static u32 evergreen_cs_get_aray_mode(u32 tiling_flags)
@@ -1896,6 +1897,14 @@ static int evergreen_packet3_check(struct 
radeon_cs_parser *p,
}
break;
}
+   case PACKET3_INDEX_BUFFER_SIZE:
+   {
+   if (pkt->count != 0) {
+   DRM_ERROR("bad INDEX_BUFFER_SIZE\n");
+   return -EINVAL;
+   }
+   break;
+   }
case PACKET3_DRAW_INDEX:
{
uint64_t offset;
@@ -2006,6 +2015,67 @@ static int evergreen_packet3_check(struct 
radeon_cs_parser *p,
return r;
}
break;
+   case PACKET3_SET_BASE:
+   {
+   /*
+   DW 1 HEADER Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet.
+  2 BASE_INDEX Bits [3:0] BASE_INDEX - Base Index specifies which base address is specified in the last two DWs.
+0001: DX11 Draw_Index_Indirect Patch Table Base: Base address for Draw_Index_Indirect data.
+  3 ADDRESS_LO Bits [31:3] - Lower bits of QWORD-Aligned Address. Bits [2:0] - Reserved
+  4 ADDRESS_HI Bits [31:8] - Reserved. Bits [7:0] - Upper bits of Address [47:32]
+   */
+   if (pkt->count != 2) {
+   DRM_ERROR("bad SET_BASE\n");
+   return -EINVAL;
+   }
+
+   /* currently only supporting setting indirect draw buffer base address */
+   if (idx_value != 1) {
+   DRM_ERROR("bad SET_BASE\n");
+   return -EINVAL;
+   }
+
+   r = radeon_cs_packet_next_reloc(p, &reloc, 0);
+   if (r) {
+   DRM_ERROR("bad SET_BASE\n");
+   return -EINVAL;
+   }
+
+   track->indirect_draw_buffer_size = radeon_bo_size(reloc->robj);
+
+   ib[idx+1] = reloc->gpu_offset;
+   ib[idx+2] = upper_32_bits(reloc->gpu_offset) & 0xff;
+
+   break;
+   }
+   case PACKET3_DRAW_INDIRECT:
+   case PACKET3_DRAW_INDEX_INDIRECT:
+   {
+   u64 size = pkt->opcode == PACKET3_DRAW_INDIRECT ? 16 : 20;
+
+   /*
+   DW 1 HEADER
+  2 DATA_OFFSET Bits [31:0] + byte aligned offset where the required data structure starts. Bits 1:0 are zero
+  3 DRAW_INITIATOR Draw Initiator Register. Written to the VGT_DRAW_INITIATOR register for the assigned context
+   */
+   if (pkt->count != 1) {
+   DRM_ERROR("bad DRAW_INDIRECT\n");
+   return -EINVAL;
+   }
+
+   if (idx_value + size > track->indirect_draw_buffer_size) {
+   dev_warn(p->dev, "DRAW_INDIRECT buffer too small %llu + %llu > %lu\n",
+   idx_value, size, track->indirect_draw_buffer_size);
+   return -EINVAL;
+   }
+
+   r = evergreen_cs_track_check(p);
+   if (r) {
+   dev_warn(p->dev, "%s:%d invalid cmd stream\n", __func__, __LINE__);
+   return r;
+   }
+   break;
+   }
case PACKET3_DISPATCH_DIRECT:
if (pkt->count != 3) {
DRM_ERROR("bad DISPATCH_DIRECT\n");
@@ -3243,7 +3313,13 @@ static int evergreen_vm_packet3_check(struct radeon_device *rdev,


[RFC 01/15] drivers/base: add track framework

2014-12-13 Thread AH
Hi Mark,

Thanks for review.

Mark Brown wrote on 12.12.2014 17:36:
> On Wed, Dec 10, 2014 at 04:48:19PM +0100, Andrzej Hajda wrote:
>> track is a generic framework for safely tracking the presence of any kernel
>> objects having an id. There are no special requirements about the type of an
>> object or its id. Ids shall be unique.
>
> This is pretty confusing, when it talks about "kernel objects" that
> sounds like it means struct kobject but in fact there's no connection
> with that or any of the other kernel object stuff.  Perhaps it makes
> sense but it should be addressed.
>

OK

>> Typical usage of the framework by a consumer looks as follows:
>> 1. Consumer registers notifier callback for objects with given id.
>
> This is also a custom thing not connected with the notifier mechanism
> implemented by struct notifier_block.  Again this should probably be
> addressed, even if it's just used internally it seems like there should
> be some opportunity for code reuse here.

I thought about using notifier_block, but besides the struct itself there
wouldn't be too much code to reuse. In my previous attempt at this
framework, more arguments were passed to the callback, so I had dropped
this idea completely, but now, after simplifying the callback, it fits better.

>
>> +case track_task_up:
>> +node = track_get_node(track, task->type, task->id, true);
>> +
>> +node->up = true;
>> +node->data = task->data;
>> +list_for_each_entry_safe(itb, node->itb_next, &node->itb_head,
>> + list)
>> +itb->callback(itb, node->data, true);
>> +return;
>> +case track_task_down:
>
> I'm not sure the up and down naming is the most obvious naming for
> users.  It's obviously inspired by semaphores but it's not entirely
> obvious that this is going to make things clear and meaningful for
> someone trying to understand the interface.

In my first attempt I called the framework interface_tracker, so it
was named ifup/ifdown after the unix commands :) Now it is less obvious.
Finding good names is always painful; anyway, I will think about it.

>
>> +static int track_process_queue(struct tracker *track)
>> +{
>> +struct track_task *task, *ptask = NULL;
>> +unsigned long flags;
>> +bool empty;
>> +
>> +/* Queue non-emptiness is used as a sentinel to prevent processing
>> + * by multiple threads, so we cannot delete entry from the queue
>> + * until it is processed.
>> + */
>> +while (true) {
>> +spin_lock_irqsave(&track->queue_lock, flags);
>> +
>> +if (ptask)
>> +list_del(&ptask->list);
>> +task = list_first_entry(&track->queue,
>> +struct track_task, list);
>> +
>> +empty = list_empty(&track->queue);
>> +if (empty)
>> +complete_all(&track->queue_empty);
>> +
>> +spin_unlock_irqrestore(&track->queue_lock, flags);
>
> Here we get a task from the head of the list and drop the lock, leaving
> the task on the list...

Yes, and it is explained in the comment a few lines above.

>
>> +kfree(ptask);
>> +
>> +if (empty)
>> +break;
>> +
>> +track_process_task(track, task);
>
> ...we then go and do some other stuff, including processing that task,
> without the lock or any other means I can see of excluding other
> users before going round and removing the task.  This seems to leave us
> vulnerable to double execution.

No, if you look at the track_add_task function you will see that the queue
is processed only if it is initially empty; otherwise the task is only
added to the queue, so it will be processed after the earlier tasks.
So the rule is: whoever adds a task to the queue checks whether the queue
was empty, and in that case processes all tasks from the queue until the
queue becomes empty, even the tasks added by other processes.
This way all tasks are serialized.
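
Condensed, the pattern described here is roughly (a sketch with names
abbreviated from the patch, not the literal code):

	spin_lock_irqsave(&track->queue_lock, flags);
	was_empty = list_empty(&track->queue);
	list_add_tail(&task->list, &track->queue);
	spin_unlock_irqrestore(&track->queue_lock, flags);

	/* Only the thread that found the queue empty drains it; a thread
	 * that raced with us merely appended, and its task will be picked
	 * up by the draining thread before the queue goes empty again. */
	if (was_empty)
		track_process_queue(track);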


> I *think* this is supposed to be
> handled by your comment "Provider should take care of calling
> notifications synchronously in proper order" in the changelog but that's
> a bit obscure, it's not specific about what the requirements are (or
> what the limits are supposed to be on the notification callbacks).

No, this comment is just to avoid advertising an object (i.e. calling
track_up) that could be removed concurrently by another thread.


>
> I'm also unclear what is supposed to happen if adding a notification
> races with removing the thing being watched.
>

The sequence should always be as follows:
1. create thing, then call track_up(thing).
...
2. call track_down(thing) then remove thing.

If we put 1 into the probe and 2 into the remove callback of the driver, it
will be safe - we are synchronised by device_lock. But if, for some reason,
we want to create the object after probe, we should do our own
synchronization or just put device_lock around 1. The same applies if we
want to remove the object earlier.

[Bug 78951] gl_PrimitiveID is zero if no geometry shader is present

2014-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=78951

--- Comment #5 from Ian Romanick  ---
(In reply to pavol from comment #4)
> I do not have a radeonsi-supported device. How does the Intel driver support this?
> I think this is really useful for picking an algorithm.

The piglit test tests/spec/glsl-1.50/execution/primitive-id-no-gs.shader_test
is for this very case.  That test passes at least on my Ivybridge.  I don't
know what we do in the driver to make this work... or if the hardware just
sorts it out.
