i915.conf mmio_debug=1 output during resume of a ThinkPad T440s

2014-12-14 Thread Toralf Förster
A resume of a ThinkPad T440s produced this trace (probably similar to an
earlier reported one):


Dec 14 11:34:17 t44 kernel: [drm:intel_dp_start_link_train] *ERROR* too many full retries, give up
Dec 14 11:34:18 t44 kernel: [drm:intel_dp_start_link_train] *ERROR* too many full retries, give up
Dec 14 11:34:18 t44 kernel: [drm:intel_dp_complete_link_train] *ERROR* failed to train DP, aborting
Dec 14 11:34:18 t44 kernel: [ cut here ]
Dec 14 11:34:18 t44 kernel: WARNING: CPU: 3 PID: 2719 at drivers/gpu/drm/drm_dp_mst_topology.c:1242 process_single_tx_qlock+0x4cf/0x560 [drm_kms_helper]()
Dec 14 11:34:18 t44 kernel: fail
Dec 14 11:34:18 t44 kernel: Modules linked in: ctr ccm ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_multiport nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_recent xt_conntrack iptable_filter ip_tables x_tables nf_conntrack_ftp nf_conntrack af_packet hid_cherry hid_generic usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev arc4 snd_hda_codec_generic x86_pkg_temp_thermal coretemp kvm_intel evdev iTCO_wdt kvm iwlmvm aesni_intel mac80211 aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse atkbd iwlwifi cfg80211 thermal wmi i915 fbcon thinkpad_acpi cfbfillrect bitblit cfbimgblt softcursor font nvram i2c_algo_bit ac cfbcopyarea battery rfkill snd_hda_intel drm_kms_helper ehci_pci snd_hda_controller ehci_hcd tpm_tis
Dec 14 11:34:18 t44 kernel:  tpm drm snd_hda_codec video xhci_hcd snd_pcm intel_gtt i2c_i801 button agpgart i2c_core fb snd_timer e1000e lpc_ich processor usbcore snd ptp fbdev thermal_sys mfd_core soundcore pps_core usb_common hwmon [last unloaded: microcode]
Dec 14 11:34:18 t44 kernel: CPU: 3 PID: 2719 Comm: kworker/u8:129 Not tainted 3.17.6-hardened #10
Dec 14 11:34:18 t44 kernel: Hardware name: LENOVO 20AQCTO1WW/20AQCTO1WW, BIOS GJET80WW (2.30 ) 10/20/2014
Dec 14 11:34:18 t44 kernel: Workqueue: events_unbound 810770d0
Dec 14 11:34:18 t44 kernel:  818147c7  0009 c9002a633740
Dec 14 11:34:18 t44 kernel:  8151e6c5 c9002a633788 c9002a633778 810527cd
Dec 14 11:34:18 t44 kernel:  c04abd68 04da c04ac8be 
Dec 14 11:34:18 t44 kernel: Call Trace:
Dec 14 11:34:18 t44 kernel:  [] dump_stack+0x45/0x5c
Dec 14 11:34:18 t44 kernel:  [] warn_slowpath_common+0x7d/0xa0
Dec 14 11:34:18 t44 kernel:  [] ? paniced+0x4e8/0x14a5 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] ? paniced+0x103e/0x14a5 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] warn_slowpath_fmt+0x60/0x80
Dec 14 11:34:18 t44 kernel:  [] ? paniced+0x103e/0x14a5 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] process_single_tx_qlock+0x4cf/0x560 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] process_single_down_tx_qlock+0x41/0xf0 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] drm_dp_queue_down_tx+0x4e/0x70 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] drm_dp_mst_i2c_xfer+0x195/0x240 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] ? stop_ring+0x61/0x1e0 [i915]
Dec 14 11:34:18 t44 kernel:  [] ? intel_aux_display_runtime_put+0x15/0x20 [i915]
Dec 14 11:34:18 t44 kernel:  [] ? gmbus_xfer+0x459/0x640 [i915]
Dec 14 11:34:18 t44 kernel:  [] __i2c_transfer+0x83/0x270 [i2c_core]
Dec 14 11:34:18 t44 kernel:  [] i2c_transfer+0x51/0x90 [i2c_core]
Dec 14 11:34:18 t44 kernel:  [] drm_do_probe_ddc_edid+0xd7/0x150 [drm]
Dec 14 11:34:18 t44 kernel:  [] drm_get_edid+0x35/0x2c0 [drm]
Dec 14 11:34:18 t44 kernel:  [] ? mutex_unlock+0x15/0x20
Dec 14 11:34:18 t44 kernel:  [] ? intel_dp_mst_enc_funcs+0x10/0x10 [i915]
Dec 14 11:34:18 t44 kernel:  [] drm_dp_mst_get_edid+0x36/0x80 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] intel_dp_mst_get_modes+0x2c/0x60 [i915]
Dec 14 11:34:18 t44 kernel:  [] drm_helper_probe_single_connector_modes_merge_bits+0x212/0x410 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] drm_helper_probe_single_connector_modes+0x2a/0x40 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] drm_fb_helper_probe_connector_modes.isra.3+0x67/0x90 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] drm_fb_helper_hotplug_event+0x5e/0xe0 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] intel_fbdev_output_poll_changed+0x22/0x30 [i915]
Dec 14 11:34:18 t44 kernel:  [] drm_kms_helper_hotplug_event+0x37/0x40 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] drm_helper_hpd_irq_event+0x100/0x150 [drm_kms_helper]
Dec 14 11:34:18 t44 kernel:  [] __i915_drm_thaw+0x18e/0x1f0 [i915]
Dec 14 11:34:18 t44 kernel:  [] ? i915_gem_vm_ops+0x40/0x40 [i915]
Dec 14 11:34:18 t44 kernel:  [] ? pci_pm_default_resume+0x40/0x40
Dec 14 11:34:18 t44 kernel:  [] i915_resume+0x2b/0x50 [i915]
Dec 14 11:34:18 t44 kernel:  [] i915_pm_resume+0x19/0x30 [i915]
Dec 14 11:34:18 t44 kernel:  [] pci_pm_resume+0x78/0xc0
Dec 14 11:34:18 t44 kernel:  [] dpm_run_callback+0x45/0x130
Dec 14 11:34:18 t44 kernel:  

[Bug 80584] XCOM: Enemy Unknown incorrect hair rendering

2014-12-14 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=80584

Felix Schwarz  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|XCOM: Eenemy Unknown        |XCOM: Enemy Unknown
                   |incorrect hair rendering    |incorrect hair rendering

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: 
<http://lists.freedesktop.org/archives/dri-devel/attachments/20141214/14df28b8/attachment.html>


[Bug 80584] XCOM: Eenemy Unknown incorrect hair rendering

2014-12-14 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=80584

Felix Schwarz  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |felix.schwarz at oss.schwarz.eu
             Blocks|                            |77449



[Intel-gfx] [PATCH 2/4] drm/cache: Try to be smarter about clflushing on x86

2014-12-14 Thread Jesse Barnes
On 12/14/2014 4:59 AM, Chris Wilson wrote:
> One of the things wbinvd is considered evil for is that it blocks the
> CPU for an indeterminate amount of time - upsetting latency critical
> aspects of the OS. For example, the x86/mm has similar code to use
> wbinvd for large clflushes that caused a bit of controversy with RT:
>
> http://linux-kernel.2935.n7.nabble.com/PATCH-x86-Use-clflush-instead-of-wbinvd-whenever-possible-when-changing-mapping-td493751.html
>
> and also the use of wbinvd in the nvidia driver has also been noted as
> evil by RT folks.
>
> However as the wbinvd still exists, it can't be all that bad...

Yeah there are definitely tradeoffs here.  In this particular case, 
we're trying to flush out a ~140M object on every frame, which just 
seems silly.

There's definitely room for optimization in Mesa too; avoiding a mapping 
that marks a large bo as dirty would be good, but if we improve the 
kernel in this area, even sloppy apps and existing binaries will speed up.

Maybe we could apply this only on !llc systems or something?  I wonder 
how much wbinvd performance varies across microarchitectures; maybe 
Thomas's issue isn't really relevant anymore (at least one can hope).

When digging into this, I found that an optimization to remove the IPI 
for wbinvd was clobbered during a merge; maybe that should be 
resurrected too.  Surely a single, global wbinvd is sufficient; we don't 
need to do n_cpus^2 wbinvd + the associated invalidation bus signals here...

Alternately, we could insert some delays into this path just to make it 
extra clear to userspace that they really shouldn't be hitting this in 
the common case (and provide some additional interfaces to let them 
avoid it by allowing flushing and dirty management in userspace).

Jesse
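
To make the "only on !llc systems" idea concrete, here is a minimal sketch of a per-object flush decision. It assumes a boolean `has_llc` for the platform and a size cutoff; the name `choose_flush()`, the enum, and the 64 MiB value of `WBINVD_OBJECT_THRESHOLD` are all hypothetical and not from any posted patch. The ~140M object mentioned above would take the wbinvd path on a non-LLC machine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cutoff, not taken from the thread: objects at least this
 * large are assumed to be cheaper to flush with one wbinvd than with
 * per-cache-line clflush. */
#define WBINVD_OBJECT_THRESHOLD ((size_t)64 * 1024 * 1024)

enum flush_strategy {
	FLUSH_CLFLUSH,	/* existing behavior: clflush the object's range */
	FLUSH_WBINVD,	/* write back and invalidate the whole CPU cache */
};

/* Decide how to flush one buffer object of `size` bytes. Following the
 * "only on !llc systems" suggestion, LLC platforms keep the clflush path. */
static enum flush_strategy choose_flush(bool has_llc, size_t size)
{
	if (has_llc)
		return FLUSH_CLFLUSH;
	return size >= WBINVD_OBJECT_THRESHOLD ? FLUSH_WBINVD : FLUSH_CLFLUSH;
}
```

Gating on LLC first keeps LLC platforms on today's path regardless of object size, so any wbinvd latency cost is confined to the !llc systems that benefit from it.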


[Bug 86720] [radeon] Europa Universalis 4 freezing during game start (10.3.3)

2014-12-14 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=86720

--- Comment #6 from Joti Papadopoulos  ---
Actually, i'm having this issue as well with a HD5850. So it's probably not
Cayman specific

OS: Arch Linux
GPU:Radeon HD5850
Mesa 10.3.5 and 10.5git(last tested about a week ago)
CPU:Phenom 2 X3 720



[PATCH 1/1] amdkfd: Fixing topology bug in building sysfs nodes

2014-12-14 Thread Oded Gabbay
From: Ben Goz 

The original code always passed 0 as the node index. This patch fixes the bug
by passing a counter that is incremented for each node.

Signed-off-by: Ben Goz 
Reviewed-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b11792d..cca1708 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -921,7 +921,7 @@ static int kfd_build_sysfs_node_tree(void)
uint32_t i = 0;

list_for_each_entry(dev, _device_list, list) {
-   ret = kfd_build_sysfs_node_entry(dev, 0);
+   ret = kfd_build_sysfs_node_entry(dev, i);
if (ret < 0)
return ret;
i++;
-- 
1.9.1



[PATCH v2 1/1] amdkfd: fix error printing in kfd_ioctl()

2014-12-14 Thread Oded Gabbay
When an ioctl function returns -EAGAIN, don't print an error in kfd_ioctl()

v2: Also don't print an error if the ioctl function returns -ERESTARTSYS

Signed-off-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 7d4974b..e10b5b9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -571,7 +571,7 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, 
unsigned long arg)
break;
}

-   if (err < 0)
+   if ((err < 0) && (err != -EAGAIN) && (err != -ERESTARTSYS))
dev_err(kfd_device,
"ioctl error %ld for ioctl cmd 0x%x (#%d)\n",
err, cmd, _IOC_NR(cmd));
-- 
1.9.1
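
The v2 condition can be read as a small predicate. Below is a userspace sketch of it; the helper name `kfd_ioctl_err_should_log()` is hypothetical. `ERESTARTSYS` is kernel-internal and not exported by userspace `<errno.h>`, so the sketch defines it with its kernel value (512, from `include/linux/errno.h`).

```c
#include <errno.h>
#include <stdbool.h>

/* ERESTARTSYS is kernel-internal; 512 is its value in include/linux/errno.h. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Return true if an ioctl handler's return value deserves an error message.
 * -EAGAIN asks the caller to retry and -ERESTARTSYS means a signal
 * interrupted the call; neither is a real failure, so both stay silent. */
static bool kfd_ioctl_err_should_log(long err)
{
	return err < 0 && err != -EAGAIN && err != -ERESTARTSYS;
}
```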



[PATCH 6/6] amdkfd: Pass queue type to pqm_create_queue()

2014-12-14 Thread Oded Gabbay
From: Ben Goz 

This patch passes the correct queue type to pqm_create_queue() instead of a
fixed KFD_QUEUE_TYPE_COMPUTE type.

Signed-off-by: Ben Goz 
Reviewed-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index fbaa98e..77008d5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -259,8 +259,8 @@ static long kfd_ioctl_create_queue(struct file *filep, 
struct kfd_process *p,
p->pasid,
dev->id);

-   err = pqm_create_queue(>pqm, dev, filep, _properties, 0,
-   KFD_QUEUE_TYPE_COMPUTE, _id);
+   err = pqm_create_queue(>pqm, dev, filep, _properties,
+   0, q_properties.type, _id);
if (err != 0)
goto err_create_queue;

-- 
1.9.1



[PATCH 5/6] amdkfd: Identify SDMA queue in create queue ioctl

2014-12-14 Thread Oded Gabbay
From: Ben Goz 

This patch adds a check to the create queue ioctl path that identifies the SDMA
queue type sent by userspace.

Signed-off-by: Ben Goz 
Reviewed-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4083dbc..fbaa98e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -191,6 +191,8 @@ static int set_queue_properties_from_user(struct 
queue_properties *q_properties,
if (args->queue_type == KFD_IOC_QUEUE_TYPE_COMPUTE ||
args->queue_type == KFD_IOC_QUEUE_TYPE_COMPUTE_AQL)
q_properties->type = KFD_QUEUE_TYPE_COMPUTE;
+   else if (args->queue_type == KFD_IOC_QUEUE_TYPE_SDMA)
+   q_properties->type = KFD_QUEUE_TYPE_SDMA;
else
return -ENOTSUPP;

-- 
1.9.1
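
The if/else chain in set_queue_properties_from_user() maps the userspace queue type onto the driver's internal type. The sketch below expresses the same mapping as a switch. The enums are simplified stand-ins whose names and numeric values are assumptions, not the real `kfd_ioctl.h`/`kfd_priv.h` definitions. `ENOTSUPP` is kernel-internal, so it is defined here with its kernel value (524).

```c
#include <errno.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524 /* kernel-internal value from include/linux/errno.h */
#endif

/* Illustrative stand-ins; values are assumptions, not the uapi values. */
enum ioc_queue_type {
	IOC_QUEUE_TYPE_COMPUTE,
	IOC_QUEUE_TYPE_SDMA,
	IOC_QUEUE_TYPE_COMPUTE_AQL,
};

enum queue_type {
	QUEUE_TYPE_COMPUTE,
	QUEUE_TYPE_SDMA,
};

/* Translate the userspace queue_type into the internal type. Both compute
 * flavors collapse to one internal compute type; anything unknown is
 * rejected. */
static int queue_type_from_user(enum ioc_queue_type in, enum queue_type *out)
{
	switch (in) {
	case IOC_QUEUE_TYPE_COMPUTE:
	case IOC_QUEUE_TYPE_COMPUTE_AQL:
		*out = QUEUE_TYPE_COMPUTE;
		return 0;
	case IOC_QUEUE_TYPE_SDMA:
		*out = QUEUE_TYPE_SDMA;
		return 0;
	default:
		return -ENOTSUPP;
	}
}
```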



[PATCH 4/6] amdkfd: Add SDMA user-mode queues support to QCM

2014-12-14 Thread Oded Gabbay
From: Ben Goz 

This patch adds support for SDMA user-mode queues to the QCM, the queue
management system that manages queues per device and per process.

Signed-off-by: Ben Goz 
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 167 +++--
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |   5 +
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   2 +-
 3 files changed, 164 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7b6df51..55ee2da 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -46,9 +46,24 @@ static int set_pasid_vmid_mapping(struct 
device_queue_manager *dqm,
 static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
struct queue *q,
struct qcm_process_device *qpd);
+
 static int execute_queues_cpsch(struct device_queue_manager *dqm, bool lock);
 static int destroy_queues_cpsch(struct device_queue_manager *dqm, bool lock);

+static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
+   struct queue *q,
+   struct qcm_process_device *qpd);
+
+static void deallocate_sdma_queue(struct device_queue_manager *dqm,
+   unsigned int sdma_queue_id);
+
+static inline
+enum KFD_MQD_TYPE get_mqd_type_from_queue_type(enum kfd_queue_type type)
+{
+   if (type == KFD_QUEUE_TYPE_SDMA)
+   return KFD_MQD_TYPE_CIK_SDMA;
+   return KFD_MQD_TYPE_CIK_CP;
+}

 static inline unsigned int get_pipes_num(struct device_queue_manager *dqm)
 {
@@ -189,7 +204,10 @@ static int create_queue_nocpsch(struct 
device_queue_manager *dqm,
*allocated_vmid = qpd->vmid;
q->properties.vmid = qpd->vmid;

-   retval = create_compute_queue_nocpsch(dqm, q, qpd);
+   if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE)
+   retval = create_compute_queue_nocpsch(dqm, q, qpd);
+   if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+   retval = create_sdma_queue_nocpsch(dqm, q, qpd);

if (retval != 0) {
if (list_empty(>queues_list)) {
@@ -202,7 +220,8 @@ static int create_queue_nocpsch(struct device_queue_manager 
*dqm,

list_add(>list, >queues_list);
dqm->queue_count++;
-
+   if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+   dqm->sdma_queue_count++;
mutex_unlock(>lock);
return 0;
 }
@@ -279,8 +298,7 @@ static int destroy_queue_nocpsch(struct 
device_queue_manager *dqm,
struct queue *q)
 {
int retval;
-   struct mqd_manager *mqd;
-
+   struct mqd_manager *mqd, *mqd_sdma;
BUG_ON(!dqm || !q || !q->mqd || !qpd);

retval = 0;
@@ -294,6 +312,12 @@ static int destroy_queue_nocpsch(struct 
device_queue_manager *dqm,
goto out;
}

+   mqd_sdma = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_SDMA);
+   if (mqd_sdma == NULL) {
+   mutex_unlock(>lock);
+   return -ENOMEM;
+   }
+
retval = mqd->destroy_mqd(mqd, q->mqd,
KFD_PREEMPT_TYPE_WAVEFRONT,
QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS,
@@ -302,7 +326,12 @@ static int destroy_queue_nocpsch(struct 
device_queue_manager *dqm,
if (retval != 0)
goto out;

-   deallocate_hqd(dqm, q);
+   if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE)
+   deallocate_hqd(dqm, q);
+   else if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
+   dqm->sdma_queue_count--;
+   deallocate_sdma_queue(dqm, q->sdma_id);
+   }

mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);

@@ -324,7 +353,7 @@ static int update_queue(struct device_queue_manager *dqm, 
struct queue *q)
BUG_ON(!dqm || !q || !q->mqd);

mutex_lock(>lock);
-   mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+   mqd = dqm->get_mqd_manager(dqm, q->properties.type);
if (mqd == NULL) {
mutex_unlock(>lock);
return -ENOMEM;
@@ -536,6 +565,11 @@ static int init_pipelines(struct device_queue_manager *dqm,
 }


+static int init_sdma_engines(struct device_queue_manager *dqm)
+{
+   return kfd2kgd->init_sdma_engines(dqm->dev->kgd);
+}
+
 static int init_scheduler(struct device_queue_manager *dqm)
 {
int retval;
@@ -549,7 +583,10 @@ static int init_scheduler(struct device_queue_manager *dqm)
return retval;

retval = init_memory(dqm);
+   if (retval != 0)
+   return retval;

+   retval = init_sdma_engines(dqm);
return retval;
 }

@@ -565,6 +602,7 @@ static int initialize_nocpsch(struct device_queue_manager 

[PATCH 3/6] amdkfd: Add SDMA mqd support

2014-12-14 Thread Oded Gabbay
From: Ben Goz 

This patch adds support for SDMA mqd operations:
- init_mqd_sdma
- uninit_mqd_sdma
- load_mqd_sdma
- update_mqd_sdma
- destroy_mqd_sdma
- is_occupied_sdma

It also adds SDMA queue information to some private structures of amdkfd.

Signed-off-by: Ben Goz 
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 121 +++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|   8 ++
 2 files changed, 129 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
index adc3147..9eda956 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -111,6 +111,37 @@ static int init_mqd(struct mqd_manager *mm, void **mqd,
return retval;
 }

+static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
+   struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+   struct queue_properties *q)
+{
+   int retval;
+   struct cik_sdma_rlc_registers *m;
+
+   BUG_ON(!mm || !mqd || !mqd_mem_obj);
+
+   retval = kfd2kgd->allocate_mem(mm->dev->kgd,
+   sizeof(struct cik_sdma_rlc_registers),
+   256,
+   KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+   (struct kgd_mem **) mqd_mem_obj);
+
+   if (retval != 0)
+   return -ENOMEM;
+
+   m = (struct cik_sdma_rlc_registers *) (*mqd_mem_obj)->cpu_ptr;
+
+   memset(m, 0, sizeof(struct cik_sdma_rlc_registers));
+
+   *mqd = m;
+   if (gart_addr != NULL)
+   *gart_addr = (*mqd_mem_obj)->gpu_addr;
+
+   retval = mm->update_mqd(mm, m, q);
+
+   return retval;
+}
+
 static void uninit_mqd(struct mqd_manager *mm, void *mqd,
struct kfd_mem_obj *mqd_mem_obj)
 {
@@ -118,11 +149,24 @@ static void uninit_mqd(struct mqd_manager *mm, void *mqd,
kfd2kgd->free_mem(mm->dev->kgd, (struct kgd_mem *) mqd_mem_obj);
 }

+static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
+   struct kfd_mem_obj *mqd_mem_obj)
+{
+   BUG_ON(!mm || !mqd);
+   kfd2kgd->free_mem(mm->dev->kgd, (struct kgd_mem *) mqd_mem_obj);
+}
+
 static int load_mqd(struct mqd_manager *mm, void *mqd, uint32_t pipe_id,
uint32_t queue_id, uint32_t __user *wptr)
 {
return kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id, wptr);
+}

+static int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
+   uint32_t pipe_id, uint32_t queue_id,
+   uint32_t __user *wptr)
+{
+   return kfd2kgd->hqd_sdma_load(mm->dev->kgd, mqd);
 }

 static int update_mqd(struct mqd_manager *mm, void *mqd,
@@ -170,6 +214,41 @@ static int update_mqd(struct mqd_manager *mm, void *mqd,
return 0;
 }

+static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
+   struct queue_properties *q)
+{
+   struct cik_sdma_rlc_registers *m;
+
+   BUG_ON(!mm || !mqd || !q);
+
+   m = get_sdma_mqd(mqd);
+   m->sdma_rlc_rb_cntl =
+   RB_SIZE((ffs(q->queue_size / sizeof(unsigned int |
+   RB_VMID(q->vmid) |
+   RPTR_WRITEBACK_ENABLE |
+   RPTR_WRITEBACK_TIMER(6);
+
+   m->sdma_rlc_rb_base = lower_32_bits(q->queue_address >> 8);
+   m->sdma_rlc_rb_base_hi = upper_32_bits(q->queue_address >> 8);
+   m->sdma_rlc_rb_rptr_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
+   m->sdma_rlc_rb_rptr_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
+   m->sdma_rlc_doorbell = OFFSET(q->doorbell_off) | DB_ENABLE;
+   m->sdma_rlc_virtual_addr = q->sdma_vm_addr;
+
+   m->sdma_engine_id = q->sdma_engine_id;
+   m->sdma_queue_id = q->sdma_queue_id;
+
+   q->is_active = false;
+   if (q->queue_size > 0 &&
+   q->queue_address != 0 &&
+   q->queue_percent > 0) {
+   m->sdma_rlc_rb_cntl |= RB_ENABLE;
+   q->is_active = true;
+   }
+
+   return 0;
+}
+
 static int destroy_mqd(struct mqd_manager *mm, void *mqd,
enum kfd_preempt_type type,
unsigned int timeout, uint32_t pipe_id,
@@ -179,6 +258,18 @@ static int destroy_mqd(struct mqd_manager *mm, void *mqd,
pipe_id, queue_id);
 }

+/*
+ * preempt type here is ignored because there is only one way
+ * to preempt sdma queue
+ */
+static int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd,
+   enum kfd_preempt_type type,
+   unsigned int timeout, uint32_t pipe_id,
+   uint32_t queue_id)
+{
+   return kfd2kgd->hqd_sdma_destroy(mm->dev->kgd, mqd, timeout);
+}
+
 static bool is_occupied(struct 
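
update_mqd_sdma() above packs several fields into sdma_rlc_rb_cntl using the macros added to cik_reg.h in PATCH 2/6. The sketch below shows that packing in isolation. It makes two assumptions: that the RB_SIZE field holds log2 of the ring size in 4-byte dwords (what ffs() yields minus one for a power-of-two dword count), and that `active` stands for the patch's `queue_size > 0 && queue_address != 0 && queue_percent > 0` test. The helper names are hypothetical. The macros are copied with their arguments parenthesized, which the originals lack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field macros from the cik_reg.h hunk, with parenthesized arguments. */
#define RB_ENABLE               (1u << 0)
#define RB_SIZE(x)              ((uint32_t)(x) << 1)
#define RPTR_WRITEBACK_ENABLE   (1u << 12)
#define RPTR_WRITEBACK_TIMER(x) ((uint32_t)(x) << 16)
#define RB_VMID(x)              ((uint32_t)(x) << 24)

/* floor(log2(n)) for n > 0; equals ffs(n) - 1 when n is a power of two. */
static uint32_t ilog2_u32(uint32_t n)
{
	uint32_t r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/* Build the ring-buffer control word for an SDMA RLC queue. */
static uint32_t sdma_rb_cntl(uint32_t queue_size_bytes, uint32_t vmid,
			     bool active)
{
	uint32_t cntl = RB_SIZE(ilog2_u32(queue_size_bytes / 4)) |
			RB_VMID(vmid) |
			RPTR_WRITEBACK_ENABLE |
			RPTR_WRITEBACK_TIMER(6);

	/* Only an active queue gets its ring buffer enabled. */
	if (active)
		cntl |= RB_ENABLE;
	return cntl;
}
```

For a 4 KiB ring on VMID 8, this yields 0x08061015 when active and 0x08061014 when inactive.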

[PATCH 2/6] drm/radeon: Implement SDMA interface functions

2014-12-14 Thread Oded Gabbay
From: Ben Goz 

This patch implements the new SDMA interface functions. It also adds defines
and structures related to SDMA registers.

Signed-off-by: Ben Goz 
---
 drivers/gpu/drm/radeon/cik_reg.h| 200 +++-
 drivers/gpu/drm/radeon/radeon_kfd.c | 132 +++-
 2 files changed, 329 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik_reg.h b/drivers/gpu/drm/radeon/cik_reg.h
index 79c45e8..5008964 100644
--- a/drivers/gpu/drm/radeon/cik_reg.h
+++ b/drivers/gpu/drm/radeon/cik_reg.h
@@ -147,10 +147,73 @@

 #define CIK_LB_DESKTOP_HEIGHT 0x6b0c

+#define KFD_CIK_SDMA_QUEUE_OFFSET  0x200
+
+#define SQ_IND_INDEX   0x8DE0
+#define SQ_CMD 0x8DEC
+#define SQ_IND_DATA0x8DE4
+
+#define TCP_WATCH0_ADDR_H  (0x32A0*4)
+#define TCP_WATCH1_ADDR_H  (0x32A3*4)
+#define TCP_WATCH2_ADDR_H  (0x32A6*4)
+#define TCP_WATCH3_ADDR_H  (0x32A9*4)
+#define TCP_WATCH0_ADDR_L  (0x32A1*4)
+#define TCP_WATCH1_ADDR_L  (0x32A4*4)
+#define TCP_WATCH2_ADDR_L  (0x32A7*4)
+#define TCP_WATCH3_ADDR_L  (0x32AA*4)
+#define TCP_WATCH0_CNTL(0x32A2*4)
+#define TCP_WATCH1_CNTL(0x32A5*4)
+#define TCP_WATCH2_CNTL(0x32A8*4)
+#define TCP_WATCH3_CNTL(0x32AB*4)
+
 #define CP_HQD_IQ_RPTR 0xC970u
 #define AQL_ENABLE (1U << 0)
-
-#define IDLE   (1 << 2)
+#define SDMA0_RLC0_RB_CNTL 0xD400u
+#defineRB_ENABLE   (1 << 0)
+#defineRB_SIZE(x)  (x << 1)
+#defineRB_SWAP_ENABLE  (1 << 9)
+#defineRPTR_WRITEBACK_ENABLE   (1 << 12)
+#defineRPTR_WRITEBACK_SWAP_ENABLE  (1 << 13)
+#defineRPTR_WRITEBACK_TIMER(x) (x << 16)
+#defineRB_VMID(x)  (x << 24)
+#defineSDMA0_RLC0_RB_BASE  0xD404u
+#defineSDMA0_RLC0_RB_BASE_HI   0xD408u
+#defineSDMA0_RLC0_RB_RPTR  0xD40Cu
+#defineSDMA0_RLC0_RB_WPTR  0xD410u
+#defineSDMA0_RLC0_RB_WPTR_POLL_CNTL0xD414u
+#defineSDMA0_RLC0_RB_WPTR_POLL_ADDR_HI 0xD418u
+#defineSDMA0_RLC0_RB_WPTR_POLL_ADDR_LO 0xD41Cu
+#defineSDMA0_RLC0_RB_RPTR_ADDR_HI  0xD420u
+#defineSDMA0_RLC0_RB_RPTR_ADDR_LO  0xD424u
+#defineSDMA0_RLC0_IB_CNTL  0xD428u
+#defineSDMA0_RLC0_IB_RPTR  0xD42Cu
+#defineSDMA0_RLC0_IB_OFFSET0xD430u
+#defineSDMA0_RLC0_IB_BASE_LO   0xD434u
+#defineSDMA0_RLC0_IB_BASE_HI   0xD438u
+#defineSDMA0_RLC0_IB_SIZE  0xD43Cu
+#defineSDMA0_RLC0_SKIP_CNTL0xD440u
+#defineSDMA0_RLC0_CONTEXT_STATUS   0xD444u
+#defineSELECTED(1 << 0)
+#defineIDLE(1 << 2)
+#defineEXPIRED (1 << 3)
+#defineEXCEPTION   (1 << 4)
+#defineCTXSW_ABLE  (1 << 7)
+#defineCTXSW_READY (1 << 8)
+#defineSDMA0_RLC0_DOORBELL 0xD448u
+#defineOFFSET(x)   (x << 0)
+#defineDB_ENABLE   (1 << 28)
+#defineCAPTURED(1 << 30)
+#defineSDMA0_RLC0_VIRTUAL_ADDR 0xD49Cu
+#defineATC (1 << 0)
+#defineVA_PTR32(1 << 4)
+#defineVA_SHARED_BASE(x)   (x << 8)
+#defineVM_HOLE (1 << 30)
+#defineSDMA0_RLC0_APE1_CNTL0xD4A0u
+#defineSDMA0_RLC0_DOORBELL_LOG 0xD4A4u
+#define

[PATCH 1/6] drm/amd: Add SDMA functions to kfd-->kgd interface

2014-12-14 Thread Oded Gabbay
From: Ben Goz 

This patch adds four new functions to the kfd2kgd interface:

1. init_sdma_engines() - Initializes the SDMA engines through GPU registers.

2. hqd_sdma_load() - Loads SDMA mqd to a H/W SDMA hqd slot. Used only in no HWS
 mode.

3. hqd_sdma_is_occupied() - Checks if an SDMA hqd slot is occupied. Used only
in no HWS mode.

4. hqd_sdma_destroy() - Destructs and preempts the SDMA queue assigned to
that SDMA hqd slot. Used only in no HWS mode.

These functions are needed to support SDMA queue scheduling.

Signed-off-by: Ben Goz 
Reviewed-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h 
b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 47b5519..3da21e7 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -141,13 +141,23 @@ struct kgd2kfd_calls {
  *
  * @init_pipeline: Initialized the compute pipelines.
  *
+ * @init_sdma_engines: Initialize the sdma engines.
+ *
  * @hqd_load: Loads the mqd structure to a H/W hqd slot. used only for no cp
  * sceduling mode.
  *
+ * @hqd_sdma_load: Loads the SDMA mqd structure to a H/W SDMA hqd slot.
+ * used only for no HWS mode.
+ *
  * @hqd_is_occupies: Checks if a hqd slot is occupied.
  *
  * @hqd_destroy: Destructs and preempts the queue assigned to that hqd slot.
  *
+ * @hqd_sdma_is_occupied: Checks if an SDMA hqd slot is occupied.
+ *
+ * @hqd_sdma_destroy: Destructs and preempts the SDMA queue assigned to that
+ * SDMA hqd slot.
+ *
  * @get_fw_version: Returns FW versions from the header
  *
  * This structure contains function pointers to services that the kgd driver
@@ -179,16 +189,19 @@ struct kfd2kgd_calls {
int (*init_memory)(struct kgd_dev *kgd);
int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id,
uint32_t hpd_size, uint64_t hpd_gpu_addr);
-
+   int (*init_sdma_engines)(struct kgd_dev *kgd);
int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
uint32_t queue_id, uint32_t __user *wptr);
-
+   int (*hqd_sdma_load)(struct kgd_dev *kgd, void *mqd);
bool (*hqd_is_occupies)(struct kgd_dev *kgd, uint64_t queue_address,
uint32_t pipe_id, uint32_t queue_id);

int (*hqd_destroy)(struct kgd_dev *kgd, uint32_t reset_type,
unsigned int timeout, uint32_t pipe_id,
uint32_t queue_id);
+   bool (*hqd_sdma_is_occupied)(struct kgd_dev *kgd, void *mqd);
+   int (*hqd_sdma_destroy)(struct kgd_dev *kgd, void *mqd,
+   unsigned int timeout);
uint16_t (*get_fw_version)(struct kgd_dev *kgd,
enum kgd_engine_type type);
 };
-- 
1.9.1



[PATCH 0/6] Add SDMA user-mode queues support to amdkfd

2014-12-14 Thread Oded Gabbay
This patch-set enables HSA processes to create SDMA user-mode queues through
amdkfd.

The queues can be scheduled on either of Kaveri's two SDMA engines. The engine
is assigned when the queue is created, alternating between the first engine and
the second: the first SDMA queue is assigned to engine 1, the second to
engine 2, the third to engine 1, and so forth.
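
The alternation can be modeled with a simple counter. This sketch covers only the behavior described above; `struct sdma_rr` and `sdma_rr_next_engine()` are hypothetical names. The device queue manager in PATCH 4/6 also tracks and frees SDMA queue ids, which this counter does not model.

```c
/* Kaveri has two SDMA engines, per the description above. */
#define NUM_SDMA_ENGINES 2u

/* Hypothetical round-robin state: SDMA queues created so far. */
struct sdma_rr {
	unsigned int created;
};

/* Return the 0-based engine for the next SDMA queue and advance the
 * counter. Engine 0 corresponds to "engine 1", engine 1 to "engine 2". */
static unsigned int sdma_rr_next_engine(struct sdma_rr *rr)
{
	unsigned int engine = rr->created % NUM_SDMA_ENGINES;

	rr->created++;
	return engine;
}
```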

SDMA queues are created and destroyed through the same IOCTLs that are used for
regular compute queues. The create queue ioctl identifies an SDMA queue by the
queue_type argument that the HSA process passes to amdkfd. That argument is
already present in the current interface, so the change is backward compatible.

The patch-set adds four new functions to the interface between kfd and kgd.
Three of those functions are used only in no-HWS mode, which is used during
debug and bring-up.

The main abstraction is done at the MQD level, since an SDMA MQD has a
different layout than a compute MQD.

Oded Gabbay

Ben Goz (6):
  drm/amd: Add SDMA functions to kfd-->kgd interface
  drm/radeon: Implement SDMA interface functions
  amdkfd: Add SDMA mqd support
  amdkfd: Add SDMA user-mode queues support to QCM
  amdkfd: Identify SDMA queue in create queue ioctl
  amdkfd: Pass queue type to pqm_create_queue()

 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   |   6 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 167 -
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |   5 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c   | 121 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |   8 +
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   2 +-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h|  17 +-
 drivers/gpu/drm/radeon/cik_reg.h   | 200 -
 drivers/gpu/drm/radeon/radeon_kfd.c| 132 +-
 9 files changed, 641 insertions(+), 17 deletions(-)

-- 
1.9.1



[PATCH 1/4] amdkfd: fix error printing in kfd_ioctl()

2014-12-14 Thread Oded Gabbay


On 12/14/2014 04:10 PM, Christian König wrote:
> Am 14.12.2014 um 14:35 schrieb Oded Gabbay:
>> When an ioctl function returns -EAGAIN, don't print error in kfd_ioctl()
>
> You most likely want to handle -ERESTARTSYS the same way.
>
> Christian.

Thanks, will fix and resend.

Oded
>
>>
>> Signed-off-by: Oded Gabbay 
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index 7d4974b..69c5fe7 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -571,7 +571,7 @@ static long kfd_ioctl(struct file *filep, unsigned int
>> cmd, unsigned long arg)
>>   break;
>>   }
>> -if (err < 0)
>> +if ((err < 0) && (err != -EAGAIN))
>>   dev_err(kfd_device,
>>   "ioctl error %ld for ioctl cmd 0x%x (#%d)\n",
>>   err, cmd, _IOC_NR(cmd));
>


[Intel-gfx] [PATCH 4/4] drm/i915: Opportunistically reduce flushing at execbuf

2014-12-14 Thread Ben Widawsky
On Sun, Dec 14, 2014 at 03:12:21PM +0200, Ville Syrjälä wrote:
> On Sat, Dec 13, 2014 at 07:08:24PM -0800, Ben Widawsky wrote:
> > If we're moving a bunch of buffers from the CPU domain to the GPU domain, and
> > we've already blown out the entire cache via a wbinvd, there is nothing more to
> > do.
> > 
> > With this and the previous patches, I am seeing a 3x FPS increase on a certain
> > benchmark which uses a giant 2d array texture. Unless I missed something in the
> > code, it should only affect non-LLC i915 platforms.
> > 
> > I haven't yet run any numbers for other benchmarks, nor have I attempted to
> > check if various conformance tests still pass.
> > 
> > NOTE: As mentioned in the previous patch, if one can easily obtain the largest
> > buffer and attempt to flush it first, the results would be even more desirable.
> 
> So even with that optimization if you only have tons of small buffers
> that need to be flushed you'd still take the clflush path for every
> single one.
> 
> How difficult would it be to calculate the total size to be flushed first,
> and then make the clflush vs. wbinvd decision based on that?
> 

I'll write the patch and send it to Eero for test.

It's not hard, and I think that's a good idea as well. One reason I didn't put
such code in this series is that it moves away from a global DRM solution (and
like I said in the cover letter, I am fine with that). Implementing this, I think
in the i915 code we'd just iterate through the BOs until we got to a certain
threshold, then just call wbinvd() from i915 and not even bother with drm_cache.
You could also maybe try to shortcut if there are more than X buffers.

However, for what you describe, I think it might make more sense to let
userspace specify an execbuf flag to do the wbinvd(). Userspace can trivially
determine such info, and it avoids iterating through the buffers an extra time
in the kernel.

I wonder whether clflushing many small objects shows up in profiles. So far,
this specific microbenchmark was the only profile I'd seen where the clflushes
show up.

Thanks.

[snip]
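
The "iterate through the BOs until we got to a certain threshold" approach can be sketched over a plain array of buffer sizes. The name `batch_wants_wbinvd()` and the 8 MiB `WBINVD_BATCH_THRESHOLD` are hypothetical; a real i915 version would walk the execbuf object list instead. The loop stops as soon as the running total reaches the threshold, so large batches are not walked to the end. The comparison is written to avoid overflowing the running total.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cutoff above which one wbinvd is assumed cheaper than
 * clflushing every buffer individually. */
#define WBINVD_BATCH_THRESHOLD ((size_t)8 * 1024 * 1024)

/* Return true once the combined size of the buffers reaches the threshold.
 * Invariant: total < WBINVD_BATCH_THRESHOLD at the top of each iteration,
 * so the subtraction below never underflows and total never overflows. */
static bool batch_wants_wbinvd(const size_t *sizes, size_t count)
{
	size_t total = 0;

	for (size_t i = 0; i < count; i++) {
		if (sizes[i] >= WBINVD_BATCH_THRESHOLD - total)
			return true;
		total += sizes[i];
	}
	return false;
}
```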



[PATCH 4/4] amdkfd: Process-device data creation and lookup split

2014-12-14 Thread Oded Gabbay
From: Alexey Skidanov 

This patch splits the current kfd_get_process_device_data() into two
functions: one that specifically creates a pdd and another that only does
a lookup.

This is done to improve the readability and maintainability of the code.
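
The split separates a side-effect-free lookup from an allocating create. Below is a minimal userspace sketch of that separation over a singly linked list. The struct and function names are hypothetical simplifications, not the amdkfd definitions, and the sketch does not show how the real kfd_create_process_device_data() initializes a pdd.

```c
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the kfd structures. */
struct gpu_dev { int id; };

struct process_device {
	struct gpu_dev *dev;
	struct process_device *next;
};

struct process { struct process_device *pdds; };

/* Lookup only: never allocates; returns NULL if no pdd exists yet. */
static struct process_device *get_pdd(struct gpu_dev *dev, struct process *p)
{
	for (struct process_device *pdd = p->pdds; pdd; pdd = pdd->next)
		if (pdd->dev == dev)
			return pdd;
	return NULL;
}

/* Create only: allocates and links a new pdd; callers decide whether a
 * lookup is needed first. Returns NULL on allocation failure. */
static struct process_device *create_pdd(struct gpu_dev *dev, struct process *p)
{
	struct process_device *pdd = calloc(1, sizeof(*pdd));

	if (!pdd)
		return NULL;
	pdd->dev = dev;
	pdd->next = p->pdds;
	p->pdds = pdd;
	return pdd;
}
```

Callers that only need to check for an existing pdd (like the removed check in kfd_doorbell_mmap()) can then use the lookup without risking an allocation.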

Signed-off-by: Alexey Skidanov 
Reviewed-by: Oded Gabbay 
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  1 -
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c  |  4 ---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c   | 13 ---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  5 +--
 drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 40 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 14 +---
 6 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index f44d673..7b6df51 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -75,7 +75,6 @@ get_sh_mem_bases_nybble_64(struct kfd_process_device *pdd)
nybble = (pdd->lds_base >> 60) & 0x0E;

return nybble;
-
 }

 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index b5791a5..1a9b355 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -137,10 +137,6 @@ int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
if (dev == NULL)
return -EINVAL;

-   /* Find if pdd exists for combination of process and gpu id */
-   if (!kfd_get_process_device_data(dev, process, 0))
-   return -EINVAL;
-
/* Calculate physical address of doorbell */
address = kfd_get_process_doorbells(dev, process);

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index e64aa99..5c91029 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -303,10 +303,11 @@ int kfd_init_apertures(struct kfd_process *process)
while ((dev = kfd_topology_enum_kfd_devices(id)) != NULL &&
id < NUM_OF_SUPPORTED_GPUS) {

-   pdd = kfd_get_process_device_data(dev, process, 1);
-   if (!pdd)
-   return -1;
-
+   pdd = kfd_create_process_device_data(dev, process);
+   if (pdd == NULL) {
+   pr_err("Failed to create process device data\n");
+   goto err;
+   }
/*
 * For 64 bit process aperture will be statically reserved in
 * the x86_64 non canonical process address space
@@ -349,6 +350,10 @@ int kfd_init_apertures(struct kfd_process *process)
}

return 0;
+
+err:
+   mutex_unlock(&process->mutex);
+   return -1;
 }


diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index ba2bba8..a2e053c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -473,8 +473,9 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
struct kfd_process *p);
 void kfd_unbind_process_from_device(struct kfd_dev *dev, unsigned int pasid);
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
-   struct kfd_process *p,
-   int create_pdd);
+   struct kfd_process *p);
+struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
+   struct kfd_process *p);

 /* Process device data iterator */
 struct kfd_process_device *kfd_get_first_process_device_data(struct kfd_process *p);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 3c76ef0..a369c14 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -311,24 +311,29 @@ err_alloc_process:
 }

 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
-   struct kfd_process *p,
-   int create_pdd)
+   struct kfd_process *p)
 {
struct kfd_process_device *pdd = NULL;

list_for_each_entry(pdd, &p->per_device_data, per_device_list)
if (pdd->dev == dev)
-   return pdd;
-
-   if (create_pdd) {
-   pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
-   if 

[PATCH 3/4] amdkfd: Remove duplicate include

2014-12-14 Thread Oded Gabbay
Signed-off-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 69c5fe7..4083dbc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -31,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include "kfd_priv.h"
-- 
1.9.1



[PATCH 2/4] amdkfd: Add number of watch points to topology

2014-12-14 Thread Oded Gabbay
From: Alexey Skidanov 

This patch adds the number of watch points to the node capabilities in the
topology module.
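A sketch of how the patch encodes the watch point count into the capability word: floor(log2(count)) goes into a 4-bit field and a separate bit flags support. The `SK_CAP_*` values below are assumptions chosen for illustration (they mirror the `HSA_CAP_WATCH_POINTS_*` names, not necessarily their values), and `sk_ilog2_u32()` is a hypothetical userspace replacement for the kernel's `__ilog2_u32()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout, named after the HSA_CAP_WATCH_POINTS_* macros. */
#define SK_CAP_WATCH_POINTS_SUPPORTED       0x00000080u
#define SK_CAP_WATCH_POINTS_TOTALBITS_MASK  0x00000f00u
#define SK_CAP_WATCH_POINTS_TOTALBITS_SHIFT 8

/* floor(log2(n)) for n > 0; yields 0 for both n == 0 and n == 1. */
static uint32_t sk_ilog2_u32(uint32_t n)
{
	uint32_t log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Same shape as the node_show() hunk: set the support bit and store
 * log2 of the count in the TOTALBITS field, masked to its width. */
static uint32_t sk_add_watch_point_caps(uint32_t capability,
					uint8_t num_of_watch_points)
{
	uint32_t log_max_watch_addr = sk_ilog2_u32(num_of_watch_points);

	if (log_max_watch_addr) {
		capability |= SK_CAP_WATCH_POINTS_SUPPORTED;
		capability |= (log_max_watch_addr <<
			       SK_CAP_WATCH_POINTS_TOTALBITS_SHIFT) &
			      SK_CAP_WATCH_POINTS_TOTALBITS_MASK;
	}
	return capability;
}
```

A consumer would recover the count as `1 << field`, so Kaveri's 4 watch points round-trip exactly, non-power-of-two counts round down, and a device with a single watch point is not advertised at all because log2(1) is 0 and the `if (log_max_watch_addr)` check skips it. The explicit handling of 0 is a choice of this sketch.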

Reviewed-by: Oded Gabbay 
Signed-off-by: Alexey Skidanov 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 13 +
 3 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 43884eb..436c31c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -32,6 +32,7 @@
 static const struct kfd_device_info kaveri_device_info = {
.max_pasid_bits = 16,
.ih_ring_entry_size = 4 * sizeof(uint32_t),
+   .num_of_watch_points = 4,
.mqd_size_aligned = MQD_SIZE_ALIGNED
 };

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f9fb81e..ba2bba8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -107,6 +107,7 @@ enum cache_policy {
 struct kfd_device_info {
unsigned int max_pasid_bits;
size_t ih_ring_entry_size;
+   uint8_t num_of_watch_points;
uint16_t mqd_size_aligned;
 };

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b11792d..da34e1f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include <linux/log2.h>

 #include "kfd_priv.h"
 #include "kfd_crat.h"
@@ -634,6 +635,7 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
struct kfd_topology_device *dev;
char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
uint32_t i;
+   uint32_t log_max_watch_addr;

/* Making sure that the buffer is an empty string */
buffer[0] = 0;
@@ -708,6 +710,17 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
dev->node_props.location_id);

if (dev->gpu) {
+   log_max_watch_addr =
+   __ilog2_u32(dev->gpu->device_info->num_of_watch_points);
+
+   if (log_max_watch_addr) {
+   dev->node_props.capability |=
+   HSA_CAP_WATCH_POINTS_SUPPORTED;
+   dev->node_props.capability |=
+   (log_max_watch_addr << HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT) &
+   HSA_CAP_WATCH_POINTS_TOTALBITS_MASK;
+   }
+
sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
kfd2kgd->get_max_engine_clock_in_mhz(
dev->gpu->kgd));
-- 
1.9.1



[PATCH 1/4] amdkfd: fix error printing in kfd_ioctl()

2014-12-14 Thread Oded Gabbay
When an ioctl function returns -EAGAIN, don't print error in kfd_ioctl()
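The condition in this patch amounts to a small predicate. A hypothetical helper (`sk_ioctl_err_is_reportable()`, not part of amdkfd) states it on its own so the intent is explicit: negative results are logged, except -EAGAIN, which tells the caller to retry rather than signalling a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Returns true if an ioctl result should be logged as an error.
 * Success (>= 0) is never logged; -EAGAIN is an expected "try again"
 * result and is deliberately filtered out. */
static bool sk_ioctl_err_is_reportable(long err)
{
	return err < 0 && err != -EAGAIN;
}
```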

Signed-off-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 7d4974b..69c5fe7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -571,7 +571,7 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
break;
}

-   if (err < 0)
+   if ((err < 0) && (err != -EAGAIN))
dev_err(kfd_device,
"ioctl error %ld for ioctl cmd 0x%x (#%d)\n",
err, cmd, _IOC_NR(cmd));
-- 
1.9.1



[Intel-gfx] [PATCH 4/4] drm/i915: Opportunistically reduce flushing at execbuf

2014-12-14 Thread Ville Syrjälä
On Sat, Dec 13, 2014 at 07:08:24PM -0800, Ben Widawsky wrote:
> If we're moving a bunch of buffers from the CPU domain to the GPU domain, and
> we've already blown out the entire cache via a wbinvd, there is nothing more to
> do.
> 
> With this and the previous patches, I am seeing a 3x FPS increase on a certain
> benchmark which uses a giant 2d array texture. Unless I missed something in the
> code, it should only affect non-LLC i915 platforms.
> 
> I haven't yet run any numbers for other benchmarks, nor have I attempted to
> check if various conformance tests still pass.
> 
> NOTE: As mentioned in the previous patch, if one can easily obtain the largest
> buffer and attempt to flush it first, the results would be even more 
> desirable.

So even with that optimization if you only have tons of small buffers
that need to be flushed you'd still take the clflush path for every
single one.

How difficult would it be to calculate the total size to be flushed first,
and then make the clflush vs. wbinvd decision based on that?

> 
> Cc: DRI Development 
> Signed-off-by: Ben Widawsky 
> ---
>  drivers/gpu/drm/i915/i915_drv.h|  3 ++-
>  drivers/gpu/drm/i915/i915_gem.c| 12 +---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  8 +---
>  drivers/gpu/drm/i915/intel_lrc.c   |  8 +---
>  4 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d68c75f..fdb92a3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2642,7 +2642,8 @@ static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv)
>  }
>  
>  void i915_gem_reset(struct drm_device *dev);
> -bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
> +enum drm_cache_flush
> +i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force);
>  int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj);
>  int __must_check i915_gem_init(struct drm_device *dev);
>  int i915_gem_init_rings(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index de241eb..3746738 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3608,7 +3608,7 @@ err_unpin:
>   return vma;
>  }
>  
> -bool
> +enum drm_cache_flush
>  i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>   bool force)
>  {
> @@ -3617,14 +3617,14 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>* again at bind time.
>*/
>   if (obj->pages == NULL)
> - return false;
> + return DRM_CACHE_FLUSH_NONE;
>  
>   /*
>* Stolen memory is always coherent with the GPU as it is explicitly
>* marked as wc by the system, or the system is cache-coherent.
>*/
>   if (obj->stolen || obj->phys_handle)
> - return false;
> + return DRM_CACHE_FLUSH_NONE;
>  
>   /* If the GPU is snooping the contents of the CPU cache,
>* we do not need to manually clear the CPU cache lines.  However,
> @@ -3635,12 +3635,10 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>* tracking.
>*/
>   if (!force && cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
> - return false;
> + return DRM_CACHE_FLUSH_NONE;
>  
>   trace_i915_gem_object_clflush(obj);
> - drm_clflush_sg(obj->pages);
> -
> - return true;
> + return drm_clflush_sg(obj->pages);
>  }
>  
>  /** Flushes the GTT write domain for the object if it's dirty. */
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 0c25f62..e8eb9e9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -827,7 +827,7 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>  {
>   struct i915_vma *vma;
>   uint32_t flush_domains = 0;
> - bool flush_chipset = false;
> + enum drm_cache_flush flush_chipset = DRM_CACHE_FLUSH_NONE;
>   int ret;
>  
>   list_for_each_entry(vma, vmas, exec_list) {
> @@ -836,8 +836,10 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring,
>   if (ret)
>   return ret;
>  
> - if (obj->base.write_domain & I915_GEM_DOMAIN_CPU)
> - flush_chipset |= i915_gem_clflush_object(obj, false);
> + if (obj->base.write_domain & I915_GEM_DOMAIN_CPU &&
> + flush_chipset != DRM_CACHE_FLUSH_WBINVD) {
> + flush_chipset = i915_gem_clflush_object(obj, false);
> + }
>  
>   flush_domains |= obj->base.write_domain;
>   }
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 89b5577..a6c6ebd 100644
> --- 

[PATCH 1/4] amdkfd: fix error printing in kfd_ioctl()

2014-12-14 Thread Christian König
Am 14.12.2014 um 14:35 schrieb Oded Gabbay:
> When an ioctl function returns -EAGAIN, don't print error in kfd_ioctl()

You most likely want to handle -ERESTARTSYS the same way.

Christian.

>
> Signed-off-by: Oded Gabbay 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 7d4974b..69c5fe7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -571,7 +571,7 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>   break;
>   }
>   
> - if (err < 0)
> + if ((err < 0) && (err != -EAGAIN))
>   dev_err(kfd_device,
>   "ioctl error %ld for ioctl cmd 0x%x (#%d)\n",
>   err, cmd, _IOC_NR(cmd));



[PATCH 1/3] drm/amd: Add get_fw_version to kfd-->kgd interface

2014-12-14 Thread Oded Gabbay
This should be [PATCH v2 x/3] for all three patches.
Sorry.

Oded

On 12/14/2014 02:29 PM, Oded Gabbay wrote:
> This patch adds a new interface to the kfd-->kgd interface.
> The new interface function retrieves the firmware version that is currently in
> use by the MEC engine. The firmware was uploaded to the MEC engine by the kgd
> (radeon).
>
> v2: Added parameter of engine type to interface function
>
> Signed-off-by: Oded Gabbay 
> ---
>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 15 +++
>   1 file changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 9c729dd..47b5519 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -45,6 +45,17 @@ enum kgd_memory_pool {
>   KGD_POOL_FRAMEBUFFER = 3,
>   };
>
> +enum kgd_engine_type {
> + KGD_ENGINE_PFP = 1,
> + KGD_ENGINE_ME,
> + KGD_ENGINE_CE,
> + KGD_ENGINE_MEC1,
> + KGD_ENGINE_MEC2,
> + KGD_ENGINE_RLC,
> + KGD_ENGINE_SDMA,
> + KGD_ENGINE_MAX
> +};
> +
>   struct kgd2kfd_shared_resources {
>   /* Bit n == 1 means VMID n is available for KFD. */
>   unsigned int compute_vmid_bitmap;
> @@ -137,6 +148,8 @@ struct kgd2kfd_calls {
>*
>* @hqd_destroy: Destructs and preempts the queue assigned to that hqd slot.
>*
> + * @get_fw_version: Returns FW versions from the header
> + *
>* This structure contains function pointers to services that the kgd driver
>* provides to amdkfd driver.
>*
> @@ -176,6 +189,8 @@ struct kfd2kgd_calls {
>   int (*hqd_destroy)(struct kgd_dev *kgd, uint32_t reset_type,
>   unsigned int timeout, uint32_t pipe_id,
>   uint32_t queue_id);
> + uint16_t (*get_fw_version)(struct kgd_dev *kgd,
> + enum kgd_engine_type type);
>   };
>
>   bool kgd2kfd_init(unsigned interface_version,
>


[PATCH 3/3] amdkfd: Display MEC fw version in topology node

2014-12-14 Thread Oded Gabbay
This patch displays the firmware version of the microcode that is currently
running in the MEC.
This is needed for the HSA RT, so it can differentiate its behavior based on the
fw version, e.g. workarounds for bugs in fw.
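A userspace sketch of both ends of this change, under stated assumptions: the kernel side appends one `name value` line per property (the `"%s %u\n"` format used here is an assumption, not copied from kfd_topology.c), and a runtime scans that text for `fw_version` before deciding whether to enable a firmware workaround. Both `sk_show_32bit_prop()` and `sk_find_prop()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Producer side (hypothetical): append "name value\n" to a
 * NUL-terminated buffer of size bufsz, never writing past its end. */
static int sk_show_32bit_prop(char *buf, size_t bufsz, const char *name,
			      unsigned int value)
{
	size_t used = strlen(buf);

	if (used >= bufsz)
		return -1;
	return snprintf(buf + used, bufsz - used, "%s %u\n", name, value);
}

/* Consumer side (hypothetical): find the line whose first word is key
 * and parse its value. Returns 0 on success, -1 if not found. */
static int sk_find_prop(const char *text, const char *key, unsigned int *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the key so "fw" does not match "fw_version". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return sscanf(line + klen + 1, "%u", out) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A runtime would then compare the parsed value against whatever minimum version a given workaround targets; that comparison is left out because the thresholds are firmware-specific.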

v2: Send the KGD_ENGINE_MEC1 as a parameter to the get_fw_version()

Signed-off-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 5733e28..b11792d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -700,8 +700,6 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
dev->node_props.simd_per_cu);
sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
dev->node_props.max_slots_scratch_cu);
-   sysfs_show_32bit_prop(buffer, "engine_id",
-   dev->node_props.engine_id);
sysfs_show_32bit_prop(buffer, "vendor_id",
dev->node_props.vendor_id);
sysfs_show_32bit_prop(buffer, "device_id",
@@ -715,6 +713,12 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
dev->gpu->kgd));
sysfs_show_64bit_prop(buffer, "local_mem_size",
kfd2kgd->get_vmem_size(dev->gpu->kgd));
+
+   sysfs_show_32bit_prop(buffer, "fw_version",
+   kfd2kgd->get_fw_version(
+   dev->gpu->kgd,
+   KGD_ENGINE_MEC1));
+
}

ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
-- 
1.9.1



[PATCH 2/3] drm/radeon: Add implementation of get_fw_version

2014-12-14 Thread Oded Gabbay
This patch implements a new interface that was added to the kfd-->kgd interface.
The new interface function retrieves the firmware version that is currently
in use by a specific engine. The firmware was uploaded to the engine by the
radeon driver.

v2: Returns the fw version of the specific engine, as passed into the function
by a new parameter

Signed-off-by: Oded Gabbay 
---
 drivers/gpu/drm/radeon/radeon_kfd.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index 065d020..242fd8b 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -28,6 +28,8 @@
 #include "cikd.h"
 #include "cik_reg.h"
 #include "radeon_kfd.h"
+#include "radeon_ucode.h"
+#include 

 #define CIK_PIPE_PER_MEC   (4)

@@ -49,6 +51,7 @@ static uint64_t get_vmem_size(struct kgd_dev *kgd);
 static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);

 static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
+static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);

 /*
  * Register access functions
@@ -91,6 +94,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
.hqd_load = kgd_hqd_load,
.hqd_is_occupies = kgd_hqd_is_occupies,
.hqd_destroy = kgd_hqd_destroy,
+   .get_fw_version = get_fw_version
 };

 static const struct kgd2kfd_calls *kgd2kfd;
@@ -561,3 +565,52 @@ static int kgd_hqd_destroy(struct kgd_dev *kgd, uint32_t reset_type,
release_queue(kgd);
return 0;
 }
+
+static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
+{
+   struct radeon_device *rdev = (struct radeon_device *) kgd;
+   const union radeon_firmware_header *hdr;
+
+   BUG_ON(kgd == NULL || rdev->mec_fw == NULL);
+
+   switch (type) {
+   case KGD_ENGINE_PFP:
+   hdr = (const union radeon_firmware_header *) rdev->pfp_fw->data;
+   break;
+
+   case KGD_ENGINE_ME:
+   hdr = (const union radeon_firmware_header *) rdev->me_fw->data;
+   break;
+
+   case KGD_ENGINE_CE:
+   hdr = (const union radeon_firmware_header *) rdev->ce_fw->data;
+   break;
+
+   case KGD_ENGINE_MEC1:
+   hdr = (const union radeon_firmware_header *) rdev->mec_fw->data;
+   break;
+
+   case KGD_ENGINE_MEC2:
+   hdr = (const union radeon_firmware_header *)
+   rdev->mec2_fw->data;
+   break;
+
+   case KGD_ENGINE_RLC:
+   hdr = (const union radeon_firmware_header *) rdev->rlc_fw->data;
+   break;
+
+   case KGD_ENGINE_SDMA:
+   hdr = (const union radeon_firmware_header *)
+   rdev->sdma_fw->data;
+   break;
+
+   default:
+   return 0;
+   }
+
+   if (hdr == NULL)
+   return 0;
+
+   /* Only 12 bit in use*/
+   return hdr->common.ucode_version;
+}
-- 
1.9.1



[PATCH 1/3] drm/amd: Add get_fw_version to kfd-->kgd interface

2014-12-14 Thread Oded Gabbay
This patch adds a new interface to the kfd-->kgd interface.
The new interface function retrieves the firmware version that is currently in
use by the MEC engine. The firmware was uploaded to the MEC engine by the kgd
(radeon).

v2: Added parameter of engine type to interface function
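The kfd-->kgd interface is a table of function pointers that radeon fills in and amdkfd calls through an opaque `kgd_dev` handle. A reduced, hypothetical model of that pattern with only the new callback: the provider knows the device layout, while the consumer only sees the table and the handle. `SK_ENG_MEC1 = 4` matches the position of `KGD_ENGINE_MEC1` in the enum added by this patch; the NULL check on the callback is a defensive choice of this sketch, not something the header requires.

```c
#include <assert.h>
#include <stdint.h>

/* Opaque to the consumer; only the provider knows its layout. */
struct sk_kgd_dev;

enum sk_engine { SK_ENG_PFP = 1, SK_ENG_MEC1 = 4 };

/* Hypothetical ops table modeled on struct kfd2kgd_calls. */
struct sk_kfd2kgd_calls {
	uint16_t (*get_fw_version)(struct sk_kgd_dev *kgd, enum sk_engine type);
};

/* Provider side (the graphics driver). */
struct sk_kgd_dev { uint16_t mec_version; };

static uint16_t sk_provider_get_fw_version(struct sk_kgd_dev *kgd,
					   enum sk_engine type)
{
	return type == SK_ENG_MEC1 ? kgd->mec_version : 0;
}

static const struct sk_kfd2kgd_calls sk_calls = {
	.get_fw_version = sk_provider_get_fw_version,
};

/* Consumer side: calls only through the table. */
static uint16_t sk_consumer_query(const struct sk_kfd2kgd_calls *calls,
				  struct sk_kgd_dev *kgd)
{
	if (!calls->get_fw_version)
		return 0; /* callback not provided */
	return calls->get_fw_version(kgd, SK_ENG_MEC1);
}
```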

Signed-off-by: Oded Gabbay 
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 9c729dd..47b5519 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -45,6 +45,17 @@ enum kgd_memory_pool {
KGD_POOL_FRAMEBUFFER = 3,
 };

+enum kgd_engine_type {
+   KGD_ENGINE_PFP = 1,
+   KGD_ENGINE_ME,
+   KGD_ENGINE_CE,
+   KGD_ENGINE_MEC1,
+   KGD_ENGINE_MEC2,
+   KGD_ENGINE_RLC,
+   KGD_ENGINE_SDMA,
+   KGD_ENGINE_MAX
+};
+
 struct kgd2kfd_shared_resources {
/* Bit n == 1 means VMID n is available for KFD. */
unsigned int compute_vmid_bitmap;
@@ -137,6 +148,8 @@ struct kgd2kfd_calls {
  *
  * @hqd_destroy: Destructs and preempts the queue assigned to that hqd slot.
  *
+ * @get_fw_version: Returns FW versions from the header
+ *
  * This structure contains function pointers to services that the kgd driver
  * provides to amdkfd driver.
  *
@@ -176,6 +189,8 @@ struct kfd2kgd_calls {
int (*hqd_destroy)(struct kgd_dev *kgd, uint32_t reset_type,
unsigned int timeout, uint32_t pipe_id,
uint32_t queue_id);
+   uint16_t (*get_fw_version)(struct kgd_dev *kgd,
+   enum kgd_engine_type type);
 };

 bool kgd2kfd_init(unsigned interface_version,
-- 
1.9.1



[Bug 89731] System doesn't boot on muxed IntelHD + HD5650

2014-12-14 Thread bugzilla-dae...@bugzilla.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=89731

--- Comment #1 from Andrea  ---
A few details on the machine:

Acer 5820TG, it's got an Intel HD first gen + AMD radeon HD5650.

It is a muxed switchable graphics system as far as I know

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 89731] New: System doesn't boot on muxed IntelHD + HD5650

2014-12-14 Thread bugzilla-dae...@bugzilla.kernel.org
https://bugzilla.kernel.org/show_bug.cgi?id=89731

Bug ID: 89731
   Summary: System doesn't boot on muxed IntelHD + HD5650
   Product: Drivers
   Version: 2.5
Kernel Version: 3.17.4-301.fc21.x86_64
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Video(DRI - non Intel)
  Assignee: drivers_video-dri at kernel-bugs.osdl.org
  Reporter: and.bernabei at gmail.com
Regression: No

Created attachment 160521
  --> https://bugzilla.kernel.org/attachment.cgi?id=160521&action=edit
journal log

I installed Fedora21 (also tried updating the packages) but my system doesn't
boot at all when in "Switchable" mode in BIOS.

The only way to have a successful boot is to use the VESA driver (nomodeset),
or using the "Discrete" BIOS vga mode.

You can find the journal log attached; it contains useful call traces.

I am available to help with the debugging of this issue ;)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Intel-gfx] [PATCH 2/4] drm/cache: Try to be smarter about clflushing on x86

2014-12-14 Thread Chris Wilson
On Sat, Dec 13, 2014 at 07:08:22PM -0800, Ben Widawsky wrote:
> Any GEM driver which has very large objects and a slow CPU is subject to very
> long waits simply for clflushing incoherent objects. Generally, each individual
> object is not a problem, but if you have very large objects, or very many
> objects, the flushing begins to show up in profiles. Because on x86 we know the
> cache size, we can easily determine when an object will use all the cache, and
> forgo iterating over each cacheline.
> 
> We need to be careful when using wbinvd. wbinvd() is itself potentially slow
> because it requires synchronizing the flush across all CPUs so they have a
> coherent view of memory. This can result in either stalling work being done on
> other CPUs, or this call itself stalling while waiting for a CPU to accept the
> interrupt. wbinvd() also has the downside of invalidating all cachelines,
> so we don't want to use it unless we're sure we already own most of the
> cachelines.
> 
> The current algorithm is very naive. I think it can be tweaked more, and it
> would be good if someone else gave it some thought. I am pretty confident that
> in i915 we can even skip the IPI in the execbuf path with minimal code change
> (or perhaps just some verifying of the existing code). It would be nice to hear
> what other developers who depend on this code think.

One of the things wbinvd is considered evil for is that it blocks the
CPU for an indeterminate amount of time, upsetting latency-critical
aspects of the OS. For example, the x86/mm has similar code to use
wbinvd for large clflushes that caused a bit of controversy with RT:

http://linux-kernel.2935.n7.nabble.com/PATCH-x86-Use-clflush-instead-of-wbinvd-whenever-possible-when-changing-mapping-td493751.html

and the use of wbinvd in the nvidia driver has also been noted as
evil by RT folks.

However as the wbinvd still exists, it can't be all that bad...

> Cc: Intel GFX 
> Signed-off-by: Ben Widawsky 
> ---
>  drivers/gpu/drm/drm_cache.c | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index d7797e8..6009c2d 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -64,6 +64,20 @@ static void drm_cache_flush_clflush(struct page *pages[],
>   drm_clflush_page(*pages++);
>   mb();
>  }
> +
> +static bool
> +drm_cache_should_clflush(unsigned long num_pages)
> +{
> + const int cache_size = boot_cpu_data.x86_cache_size;

Maybe const int cpu_cache_size = boot_cpu_data.x86_cache_size >> 2; /* in pages */

Just to make it clear where the factor of 4 is required.
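The suggestion is a unit conversion: `x86_cache_size` is reported in KiB, and with 4 KiB pages, shifting right by 2 gives the cache size in pages (a 3072 KiB cache is 3072 / 4 = 768 pages). A sketch of a page-count check built on that, assuming 4 KiB pages; `sk_should_clflush()` is hypothetical, and both the comparison rule and the fallback for a non-positive (unknown) size are choices of this sketch rather than the patch's algorithm.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if an object of num_pages 4 KiB pages should be flushed
 * line by line, false if it is large enough to be a wbinvd candidate. */
static bool sk_should_clflush(unsigned long num_pages, int cache_size_kib)
{
	unsigned long cpu_cache_pages;

	/* Unknown cache size: fall back to per-line clflush. */
	if (cache_size_kib <= 0)
		return true;

	/* KiB to 4 KiB pages: divide by 4, i.e. shift right by 2. */
	cpu_cache_pages = (unsigned long)cache_size_kib >> 2;

	/* Smaller than the whole cache: clflush each line. */
	return num_pages < cpu_cache_pages;
}
```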

How stand alone is this patch? What happens if you just apply this by
itself? I presume it wasn't all that effective since you needed the
additional patches to prevent superfluous flushes. But it should have an
effect to reduce the time it takes to bind framebuffers, etc. I expect
it to overall worsen performance as we do the repeated wbinvd in
execbuffer.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Regression] 3.18 black screen after boot (bisected)

2014-12-14 Thread Heinz Diehl
Hi,

since kernel 3.18 I'm no longer able to run X on my machine. While
3.17.6 is fine, 3.18 leaves me with a black screen when starting
X. Booting into runlevel 1/3 is fine.

I did a "git bisect", and the offending commit is this one:

[root at kiera linux-git]# git bisect bad
83f45fc360c8e16a330474860ebda872d1384c8c is the first bad commit
commit 83f45fc360c8e16a330474860ebda872d1384c8c
Author: Daniel Vetter 
Date:   Wed Aug 6 09:10:18 2014 +0200

drm: Don't grab an fb reference for the idr

[]

I double-checked, and 3.18 is fine with this commit reverted.

My machine is an Asus U45JC laptop, and the CPU is an Intel i450M
(Ironlake). Please contact me if I can help in any way (I'm subscribed
to lkml, but not to other X or kernel related lists).

Thanks,
 Heinz.




[Intel-gfx] [PATCH 1/4] drm/cache: Use wbinvd helpers

2014-12-14 Thread Chris Wilson
On Sat, Dec 13, 2014 at 07:08:21PM -0800, Ben Widawsky wrote:
> When the original drm code was written there were no centralized functions for
> doing a coordinated wbinvd across all CPUs. Now (since 2010) there are, so use
> them instead of rolling a new one.
> 
> Cc: Intel GFX 
> Signed-off-by: Ben Widawsky 
Reviewed-by: Chris Wilson 
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH 2/3] drm/radeon: Add implementation of get_fw_version

2014-12-14 Thread Oded Gabbay


On 12/10/2014 11:57 PM, Alex Deucher wrote:
> On Wed, Dec 10, 2014 at 8:13 AM, Oded Gabbay  wrote:
>> From: Alexey Skidanov 
>>
>> This patch implements a new interface that was added to the kfd-->kgd interface.
>> The new interface function retrieves the firmware version that is currently
>> in use by the MEC engine. The firmware was uploaded to the MEC engine by the
>> radeon driver.
>>
>> Signed-off-by: Alexey Skidanov 
>> Reviewed-by: Oded Gabbay 
>> ---
>>   drivers/gpu/drm/radeon/radeon_kfd.c | 23 +++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
>> index 065d020..223c831 100644
>> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>> @@ -28,6 +28,8 @@
>>   #include "cikd.h"
>>   #include "cik_reg.h"
>>   #include "radeon_kfd.h"
>> +#include "radeon_ucode.h"
>> +#include 
>>
>>   #define CIK_PIPE_PER_MEC   (4)
>>
>> @@ -49,6 +51,7 @@ static uint64_t get_vmem_size(struct kgd_dev *kgd);
>>   static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
>>
>>   static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
>> +static uint16_t get_fw_version(struct kgd_dev *kgd);
>>
>>   /*
>>* Register access functions
>> @@ -91,6 +94,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
>>  .hqd_load = kgd_hqd_load,
>>  .hqd_is_occupies = kgd_hqd_is_occupies,
>>  .hqd_destroy = kgd_hqd_destroy,
>> +   .get_fw_version = get_fw_version
>>   };
>>
>>   static const struct kgd2kfd_calls *kgd2kfd;
>> @@ -561,3 +565,22 @@ static int kgd_hqd_destroy(struct kgd_dev *kgd, uint32_t reset_type,
>>  release_queue(kgd);
>>  return 0;
>>   }
>> +
>> +static uint16_t get_fw_version(struct kgd_dev *kgd)
>
> I think it would be better to call this get_mec_fw_version or add an
> engine and/or instance parameter if it need to query other engines
> (e.g., MEC2 or GFX).
>
Right, I will add the engine parameter.
>> +{
>> +   struct radeon_device *rdev;
>> +   const struct gfx_firmware_header_v1_0 *hdr;
>> +
>> +   BUG_ON(kgd == NULL);
>> +
>> +   rdev = (struct radeon_device *) kgd;
>> +
>> +   BUG_ON(rdev->mec_fw == NULL);
>> +
>> +   hdr = (const struct gfx_firmware_header_v1_0 *)rdev->mec_fw->data;
>> +
>
> Do you care about the fw version of MEC2?
Not currently, as MEC1 & MEC2 have the same fw version, although we load
different fw files into them.

Oded
>
> Alex
>


[Regression] 83f45fc turns machine's screen off

2014-12-14 Thread Emmanuel Benisty
Hi Daniel,

> On Mon, Nov 10, 2014 at 10:19 PM, Daniel Vetter wrote:
>> Adding relevant mailing lists.
>>
>>
>> On Sat, Nov 8, 2014 at 7:34 PM, Emmanuel Benisty wrote:
>>> Hi,
>>>
>>> The following commit permanently turns my screen off when the X server is
>>> started (i3 330M Ironlake):
>>>
>>> [83f45fc360c8e16a330474860ebda872d1384c8c] drm: Don't grab an fb
>>> reference for the idr
>>>
>>> Reverting this commit fixed the issue.
>>
>> This is definitely unexpected. I think we need a bit more data to
>> figure out what's going on here:
>> - Please boot with drm.debug=0xe added to your kernel cmdline and grab
>> the dmesg right after boot-up for both a working or broken kernel.
>
> Please see attached files.
>
>> - Are you using any special i915 cmdline options?
>
> Nope.

Is there anything else I could provide to help fixing this issue? It's
still in Linus' tree.

Thanks in advance.