Re: [RFC PATCH 00/10] Device Memory TCP

2023-07-16 Thread Andy Lutomirski

On 7/10/23 15:32, Mina Almasry wrote:

* TL;DR:

Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.


(I'm writing this as someone who might plausibly use this mechanism, but 
I don't think I'm very likely to end up working on the kernel side, 
unless I somehow feel extremely inspired to implement it for i40e.)


I looked at these patches and the GVE tree, and I'm trying to wrap my 
head around the data path.  As I understand it, for RX:


1. The GVE driver notices that the queue is programmed to use devmem, 
and it programs the NIC to copy packet payloads to the devmem that has 
been programmed.
2. The NIC receives the packet and copies the header to kernel memory 
and the payload to dma-buf memory.

3. The kernel tells userspace where in the dma-buf the data is.
4. Userspace does something with the data.
5. Userspace does DONTNEED to recycle the memory and make it available 
for new received packets.


Did I get this right?
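To make steps 3 to 5 concrete, here is a toy userspace model of the buffer lifecycle. It assumes the dma-buf is split into a few fixed-size slots and that one token names one slot. `rx_desc`, `pool_deliver()` and `pool_dontneed()` are hypothetical names for illustration, not the RFC's uAPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: the dma-buf is carved into POOL_SLOTS fixed-size
 * slots. A slot handed to userspace cannot be reused for a new packet
 * until userspace returns its token (step 5). */
#define POOL_SLOTS 4
#define SLOT_SIZE  4096u

struct rx_desc {          /* what step 3 tells userspace */
	uint32_t offset;  /* byte offset of the payload in the dma-buf */
	uint32_t len;     /* payload length */
	uint32_t token;   /* handle to pass back in step 5 */
};

struct rx_pool {
	bool in_use[POOL_SLOTS];
};

/* Steps 2-3: place a payload in a free slot and describe it.
 * Returns false if the payload is too big or every slot is still held. */
static bool pool_deliver(struct rx_pool *p, uint32_t len, struct rx_desc *out)
{
	if (len > SLOT_SIZE)
		return false;
	for (uint32_t i = 0; i < POOL_SLOTS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			out->offset = i * SLOT_SIZE;
			out->len = len;
			out->token = i;
			return true;
		}
	}
	return false;
}

/* Step 5: userspace is done with the payload, so recycle its slot.
 * Returns false for an unknown or already-returned token. */
static bool pool_dontneed(struct rx_pool *p, uint32_t token)
{
	if (token >= POOL_SLOTS || !p->in_use[token])
		return false;
	p->in_use[token] = false;
	return true;
}
```

In this model, every payload that lands on the queue holds a slot until its token comes back, whether or not it was meant for the target device.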

This seems a bit awkward if there's any chance that packets not intended 
for the target device end up in the rxq.


I'm wondering if a more capable, if somewhat higher-latency, model could 
work where the NIC stores received packets in its own device memory. 
Then userspace (or the kernel or a driver or whatever) could initiate a 
separate DMA from the NIC to the final target *after* reading the 
headers.  Can the hardware support this?


Another way of putting this is: steering received data to a specific 
device based on the *receive queue* forces the logic selecting a 
destination device to be the same as the logic selecting the queue.  RX 
steering logic is pretty limited on most hardware (as far as I know -- 
certainly I've never had much luck doing anything especially intelligent 
with RX flow steering, and I've tried on a couple of different brands of 
supposedly fancy NICs).  But Linux has very nice capabilities to direct 
packets, in software, to where they are supposed to go, and it would be 
nice if all that logic could just work, scalably, with device memory. 
If Linux could examine headers *before* the payload gets DMAed to 
wherever it goes, I think this could plausibly work quite nicely.  One 
could even have an easy-to-use interface in which one directs a *socket* 
to a PCIe device.  I expect, although I've never looked at the 
datasheets, that the kernel could even efficiently make rx decisions 
based on data in device memory on upcoming CXL NICs where device memory 
could participate in the host cache hierarchy.
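A toy model of what "directing a *socket* to a PCIe device" could look like if the kernel sees the headers before the payload moves: the destination is a per-socket setting chosen by software lookup, and the RX queue plays no part in the decision. Every name here is hypothetical, and the linear table scan only stands in for the kernel's real socket lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical places a payload could be copied to once the kernel
 * has looked at the headers. */
enum dest { DEST_HOST_MEMORY, DEST_GPU0, DEST_GPU1 };

struct flow_key {          /* simplified 4-tuple parsed from the headers */
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct sock_binding {      /* per-socket setting, not per-queue */
	struct flow_key key;
	enum dest dest;
};

static int key_eq(const struct flow_key *a, const struct flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Runs after the headers are readable but before the payload leaves
 * NIC-local memory, so arbitrary software steering can happen here.
 * Flows with no binding fall back to ordinary host memory. */
static enum dest choose_dest(const struct sock_binding *tbl, size_t n,
			     const struct flow_key *k)
{
	for (size_t i = 0; i < n; i++)
		if (key_eq(&tbl[i].key, k))
			return tbl[i].dest;
	return DEST_HOST_MEMORY;
}
```

A stray packet on the same queue simply misses the table and goes to host memory instead of consuming device memory.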


My real ulterior motive is that I think it would be great to use an 
ability like this for DPDK-like uses.  Wouldn't it be nifty if I could 
open a normal TCP socket, then, after it's open, ask the kernel to 
kindly DMA the results directly to my application memory (via udmabuf, 
perhaps)?  Or have a whole VLAN or macvlan get directed to a userspace 
queue, etc?



It also seems a bit odd to me that the binding from rxq to dma-buf is 
established by programming the dma-buf.  This makes the security model 
(and the mental model) awkward -- this binding is a setting on the 
*queue*, not the dma-buf, and in a containerized or privilege-separated 
system, a process could have enough privilege to make a dma-buf 
somewhere but not have any privileges on the NIC.  (And may not even 
have the NIC present in its network namespace!)


--Andy


Re: [RFC PATCH 06/10] net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages

2023-07-16 Thread Andy Lutomirski

On 7/10/23 15:32, Mina Almasry wrote:

Add an interface for the user to notify the kernel that it is done reading
the NET_RX dmabuf pages returned as cmsg. The kernel will drop the
reference on the NET_RX pages to make them available for re-use.

Signed-off-by: Mina Almasry 
---



+	for (i = 0; i < num_tokens; i++) {
+		for (j = 0; j < tokens[i].token_count; j++) {
+			struct page *pg = xa_erase(&sk->sk_pagepool,
+						   tokens[i].token_start + j);
+
+			if (pg)
+				put_page(pg);
+			else
+				/* -EINTR here notifies the userspace
+				 * that not all tokens passed to it have
+				 * been freed.
+				 */
+				ret = -EINTR;


Unless I'm missing something, this type of error reporting is 
unrecoverable -- userspace doesn't know how many tokens have been freed.


I think you should either make it explicitly unrecoverable (somehow shut 
down dmabuf handling entirely) or tell userspace how many tokens were 
successfully freed.
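One way to make the second option concrete, sketched in userspace under stated assumptions: release tokens in order, stop at the first one that is not held, and return how many were released, like a short write(). A bool array stands in for `sk_pagepool`, and `release_tokens()` and `frag_pool` are hypothetical names; only the `token_start`/`token_count` fields mirror the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_FRAGS 16

struct token_range {       /* mirrors tokens[i].token_start/token_count */
	uint32_t token_start;
	uint32_t token_count;
};

/* Stand-in for sk->sk_pagepool: true means userspace holds the frag. */
struct frag_pool {
	bool held[NR_FRAGS];
};

/* Release tokens in order and stop at the first one that is not held.
 * Returns the number released, or -EINVAL if not even the first token
 * could be released. Userspace can resubmit from the returned index. */
static long release_tokens(struct frag_pool *p,
			   const struct token_range *tokens, size_t num_tokens)
{
	long freed = 0;

	for (size_t i = 0; i < num_tokens; i++) {
		for (uint32_t j = 0; j < tokens[i].token_count; j++) {
			/* 64-bit sum so token_start + j cannot wrap */
			uint64_t t = (uint64_t)tokens[i].token_start + j;

			if (t >= NR_FRAGS || !p->held[t])
				return freed ? freed : -EINVAL;
			p->held[t] = false;  /* the put_page() equivalent */
			freed++;
		}
	}
	return freed;
}
```

Stopping at the first failure matters: if the loop kept going and only returned a total, userspace would still not know *which* tokens were left behind.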


--Andy


Re: [PATCH 0/2] Nuke PAGE_KERNEL_IO

2021-11-12 Thread Andy Lutomirski

On 10/21/21 11:15, Lucas De Marchi wrote:

Last user of PAGE_KERNEL_IO is the i915 driver. While removing it from
there as we seek to bring the driver to other architectures, Daniel
suggested that we could finish the cleanup and remove it altogether,
through the tip tree. So here I'm sending both commits needed for that.

Lucas De Marchi (2):
   drm/i915/gem: stop using PAGE_KERNEL_IO
   x86/mm: nuke PAGE_KERNEL_IO

  arch/x86/include/asm/fixmap.h | 2 +-
  arch/x86/include/asm/pgtable_types.h  | 7 ---
  arch/x86/mm/ioremap.c | 2 +-
  arch/x86/xen/setup.c  | 2 +-
  drivers/gpu/drm/i915/gem/i915_gem_pages.c | 4 ++--
  include/asm-generic/fixmap.h  | 2 +-
  6 files changed, 6 insertions(+), 13 deletions(-)



Acked-by: Andy Lutomirski 


Re: [PATCH v2 3/4] drm/ttm, drm/vmwgfx: Correctly support support AMD memory encryption

2019-09-03 Thread Andy Lutomirski


> On Sep 3, 2019, at 3:15 PM, Thomas Hellström (VMware) 
>  wrote:
> 
>> On 9/4/19 12:08 AM, Thomas Hellström (VMware) wrote:
>>> On 9/3/19 11:46 PM, Andy Lutomirski wrote:
>>> On Tue, Sep 3, 2019 at 2:05 PM Thomas Hellström (VMware)
>>>  wrote:
>>>> On 9/3/19 10:51 PM, Dave Hansen wrote:
>>>>>> On 9/3/19 1:36 PM, Thomas Hellström (VMware) wrote:
>>>>>> So the question here should really be, can we determine already at mmap
>>>>>> time whether backing memory will be unencrypted and adjust the *real*
>>>>>> vma->vm_page_prot under the mmap_sem?
>>>>>> 
>>>>>> Possibly, but that requires populating the buffer with memory at mmap
>>>>>> time rather than at first fault time.
>>>>> I'm not connecting the dots.
>>>>> 
>>>>> vma->vm_page_prot is used to create a VMA's PTEs regardless of if they
>>>>> are created at mmap() or fault time.  If we establish a good
>>>>> vma->vm_page_prot, can't we just use it forever for demand faults?
>>>> With SEV I think that we could possibly establish the encryption flags
>>>> at vma creation time. But thinking of it, it would actually break with
>>>> SME where buffer content can be moved between encrypted system memory
>>>> and unencrypted graphics card PCI memory behind user-space's back. That
>>>> would imply killing all user-space encrypted PTEs and at fault time set
>>>> up new ones pointing to unencrypted PCI memory..
>>>> 
>>>>> Or, are you concerned that if an attempt is made to demand-fault page
>>>>> that's incompatible with vma->vm_page_prot that we have to SEGV?
>>>>> 
>>>>>> And it still requires knowledge whether the device DMA is always
>>>>>> unencrypted (or if SEV is active).
>>>>> I may be getting mixed up on MKTME (the Intel memory encryption) and
>>>>> SEV.  Is SEV supported on all memory types?  Page cache, hugetlbfs,
>>>>> anonymous?  Or just anonymous?
>>>> SEV AFAIK encrypts *all* memory except DMA memory. To do that it uses a
>>>> SWIOTLB backed by unencrypted memory, and it also flips coherent DMA
>>>> memory to unencrypted (which is a very slow operation and patch 4 deals
>>>> with caching such memory).
>>>> 
>>> I'm still lost.  You have some fancy VMA where the backing pages
>>> change behind the application's back.  This isn't particularly novel
>>> -- plain old anonymous memory and plain old mapped files do this too.
>>> Can't you use all the insert_pfn APIs and call it a day?  What's so
>>> special that you need all this magic?  ISTM you should be able to
>>> allocate memory that's addressable by the device (dma_alloc_coherent()
>>> or whatever) and then map it into user memory just like you'd map any
>>> other page.
>>> 
>>> I feel like I'm missing something here.
>> 
>> Yes, so in this case we use dma_alloc_coherent().
>> 
>> With SEV, that gives us unencrypted pages. (Pages whose linear kernel map is 
>> marked unencrypted). With SME that (typically) gives us encrypted pages. In 
>> both these cases, vm_get_page_prot() returns
>> an encrypted page protection, which lands in vma->vm_page_prot.
>> 
>> In the SEV case, we therefore need to modify the page protection to 
>> unencrypted. Hence we need to know whether we're running under SEV and 
>> therefore need to modify the protection. If not, the user-space PTE would 
>> incorrectly have the encryption flag set.
>> 

I’m still confused. You got unencrypted pages with an unencrypted PFN. Why do 
you need to fiddle?  You have a PFN, and you’re inserting it with 
vmf_insert_pfn().  This should just work, no?  There doesn’t seem to be any 
real funny business in dma_mmap_attrs() or dma_common_mmap().

But, reading this, I have more questions:

Can’t you get rid of cvma by using vmf_insert_pfn_prot()?

Would it make sense to add a vmf_insert_dma_page() to directly do exactly what 
you’re trying to do?

And a broader question just because I’m still confused: why isn’t the 
encryption bit in the PFN?  The whole SEV/SME system seems like it’s trying a 
bit too hard to be fully invisible to the kernel.
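A userspace model of the behavior Thomas describes, assuming the PTE is built as "PFN plus vma->vm_page_prot": because the encryption bit (C-bit) comes from the protection bits rather than the PFN, the same PFN ends up encrypted or not depending on which prot is used. The C-bit position (bit 47) is an assumption, since real hardware reports it via CPUID, and the helpers are illustrative rather than the kernel's, although `prot_decrypted()` is modeled on `pgprot_decrypted()`.

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_USER     (1ULL << 2)
/* Assumed C-bit position; hardware reports the real one via CPUID. */
#define C_BIT        (1ULL << 47)

typedef uint64_t pteval_t;

/* What the fault path effectively does: PFN plus vma->vm_page_prot. */
static pteval_t make_pte(uint64_t pfn, pteval_t prot)
{
	return (pfn << PAGE_SHIFT) | prot;
}

/* Modeled on pgprot_decrypted(): clear the encryption bit. */
static pteval_t prot_decrypted(pteval_t prot)
{
	return prot & ~C_BIT;
}

static int pte_encrypted(pteval_t pte)
{
	return (pte & C_BIT) != 0;
}
```

In this model there are two ways to get an unencrypted user mapping of an unencrypted page: pass an adjusted prot per insertion (which is what vmf_insert_pfn_prot() allows), or let the encryption bit travel with the PFN, as the question above suggests.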
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v2 3/4] drm/ttm, drm/vmwgfx: Correctly support support AMD memory encryption

2019-09-03 Thread Andy Lutomirski
On Tue, Sep 3, 2019 at 2:05 PM Thomas Hellström (VMware)
 wrote:
>
> On 9/3/19 10:51 PM, Dave Hansen wrote:
> > On 9/3/19 1:36 PM, Thomas Hellström (VMware) wrote:
> >> So the question here should really be, can we determine already at mmap
> >> time whether backing memory will be unencrypted and adjust the *real*
> >> vma->vm_page_prot under the mmap_sem?
> >>
> >> Possibly, but that requires populating the buffer with memory at mmap
> >> time rather than at first fault time.
> > I'm not connecting the dots.
> >
> > vma->vm_page_prot is used to create a VMA's PTEs regardless of if they
> > are created at mmap() or fault time.  If we establish a good
> > vma->vm_page_prot, can't we just use it forever for demand faults?
>
> With SEV I think that we could possibly establish the encryption flags
> at vma creation time. But thinking of it, it would actually break with
> SME where buffer content can be moved between encrypted system memory
> and unencrypted graphics card PCI memory behind user-space's back. That
> would imply killing all user-space encrypted PTEs and at fault time set
> up new ones pointing to unencrypted PCI memory..
>
> >
> > Or, are you concerned that if an attempt is made to demand-fault page
> > that's incompatible with vma->vm_page_prot that we have to SEGV?
> >
> >> And it still requires knowledge whether the device DMA is always
> >> unencrypted (or if SEV is active).
> > I may be getting mixed up on MKTME (the Intel memory encryption) and
> > SEV.  Is SEV supported on all memory types?  Page cache, hugetlbfs,
> > anonymous?  Or just anonymous?
>
> SEV AFAIK encrypts *all* memory except DMA memory. To do that it uses a
> SWIOTLB backed by unencrypted memory, and it also flips coherent DMA
> memory to unencrypted (which is a very slow operation and patch 4 deals
> with caching such memory).
>

I'm still lost.  You have some fancy VMA where the backing pages
change behind the application's back.  This isn't particularly novel
-- plain old anonymous memory and plain old mapped files do this too.
Can't you use all the insert_pfn APIs and call it a day?  What's so
special that you need all this magic?  ISTM you should be able to
allocate memory that's addressable by the device (dma_alloc_coherent()
or whatever) and then map it into user memory just like you'd map any
other page.

I feel like I'm missing something here.


Re: [PATCH v2 1/4] x86/mm: Export force_dma_unencrypted

2019-09-03 Thread Andy Lutomirski
On Tue, Sep 3, 2019 at 1:46 PM Thomas Hellström (VMware)
 wrote:
>
> On 9/3/19 6:22 PM, Christoph Hellwig wrote:
> > On Tue, Sep 03, 2019 at 04:32:45PM +0200, Thomas Hellström (VMware) wrote:
> >> Is this a layer violation concern, that is, would you be ok with a similar
> >> helper for TTM, or is it that you want to force the graphics drivers into
> >> adhering strictly to the DMA api, even when it from an engineering
> >> perspective makes no sense?
> > From looking at DRM I strongly believe that making DRM use the DMA
> > mapping properly makes a lot of sense from the engineering perspective,
> > and this series is a good argument for that position.
>
> What I mean with "from an engineering perspective" is that drivers would
> end up with a non-trivial amount of code supporting purely academic
> cases: Setups where software rendering would be faster than gpu
> accelerated, and setups on platforms where the driver would never run
> anyway because the device would never be supported on that platform...
>
> >   If DRM was using
> > the DMA API properly we would not need this series to start with, all the
> > SEV handling is hidden behind the DMA API.  While we had occasional
> > bugs in that support fixing it meant that it covered all drivers
> > properly using that API.
>
> That is not really true. The dma API can't handle faulting of coherent
> pages which is what this series is really all about supporting also with
> SEV active. To handle the case where we move graphics buffers or send
> them to swap space while user-space have them mapped.
>
> To do that and still be fully dma-api compliant we would ideally need,
> for example, an exported dma_pgprot(). (dma_pgprot() by the way is still
> suffering from one of the bugs that you mention above).
>
> Still, I need a way forward and my questions weren't really answered by
> this.
>
>

I read this patch, I read force_dma_unencrypted(), I read the changelog
again, and I haven't the faintest clue what TTM could possibly be
doing with force_dma_unencrypted().

You're saying that TTM needs to transparently change mappings to
relocate objects in memory between system memory and device memory.
Great, I don't see the problem.  Is the issue that you need to
allocate system memory that is addressable by the GPU and that, if the
GPU has insufficient PA bits, you need unencrypted memory?  If so,
this sounds like an excellent use for the DMA API.   Rather than
directly kludging knowledge of force_dma_unencrypted() into the driver,
can't you at least add, if needed, a new helper specifically to
allocate memory that can be addressed by the device?  Like
dma_alloc_coherent()?  Or, if for some reason, dma_alloc_coherent()
doesn't do what you need or your driver isn't ready to use it, then
explain *why* and introduce a new function to solve your problem?

Keep in mind that, depending on just how MKTME ends up being supported
in Linux, it's entirely possible that it will be *backwards* from what
you expect -- high address bits will be needed to ask for
*unencrypted* memory.
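A small model of the "insufficient PA bits" case, assuming SME-style behavior where the encryption bit is folded into the address the device must drive. The C-bit position and the helper names are illustrative assumptions; `mask` plays the role a DMA_BIT_MASK(n) value plays in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed C-bit position, as in SME where it is a high address bit. */
#define C_BIT (1ULL << 47)

/* Address the device must drive to reach 'phys'. In this model the
 * encryption bit is part of the DMA address for encrypted memory. */
static uint64_t dma_addr(uint64_t phys, bool encrypted)
{
	return encrypted ? (phys | C_BIT) : phys;
}

/* Can a device limited to 'mask' address this buffer directly? */
static bool device_can_reach(uint64_t mask, uint64_t phys, bool encrypted)
{
	return (dma_addr(phys, encrypted) & ~mask) == 0;
}
```

A device with a 40-bit mask cannot reach encrypted memory here but can reach the same page unencrypted. Under the MKTME scenario above, the condition in dma_addr() would flip, which is one argument for asking the DMA API for "memory this device can address" rather than encoding either assumption in the driver.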

--Andy

Re: [Intel-gfx] [PATCH] drm/i915: Improve PSR activation timing

2018-02-09 Thread Andy Lutomirski
On Fri, Feb 9, 2018 at 7:39 AM, Rodrigo Vivi <rodrigo.v...@intel.com> wrote:
> Rodrigo Vivi <rodrigo.v...@intel.com> writes:
>
>> "Pandiyan, Dhinakaran" <dhinakaran.pandi...@intel.com> writes:
>>
>>> On Thu, 2018-02-08 at 14:48 -0800, Rodrigo Vivi wrote:
>>>> Hi Andy,
>>>>
>>>> thanks for getting involved with PSR and sorry for not replying sooner.
>>>>
>>>> I first saw this patch on that bugzilla entry but only now I stop to
>>>> really think why I have written the code that way.
>>>>
>>>> So some clarity below.
>>>>
>>>> On Mon, Feb 05, 2018 at 10:07:09PM +, Andy Lutomirski wrote:
>>>> > The current PSR code has two call sites that each schedule delayed
>>>> > work to activate PSR.  As far as I can tell, each call site intends
>>>> > to keep PSR inactive for the given amount of time and then allow it
>>>> > to be activated.
>>>> >
>>>> > The call sites are:
>>>> >
>>>> >  - intel_psr_enable(), which explicitly states in a comment that
>>>> >it's trying to keep PSR off a short time after the display is
>>>> >initialized as a workaround.
>>>>
>>>> First of all I really want to kill this call here and remove the
>>>> FIXME. It was an ugly hack that I added to solve a corner case
>>>> that was leaving me with blank screens when activating so soon.
>>>>
>>>> >
>>>> >  - intel_psr_flush().  There isn't an explicit explanation, but the
>>>> >intent is presumably to keep PSR off until the display has been
>>>> >idle for 100ms.
>>>>
>>>> The reason for 100 is kind of ugly-nonsense-empirical value
>>>> I concluded from VLV/CHV experience.
>>>> On platforms with HW tracking HW waits few identical frames
>>>> until really activating PSR. VLV/CHV activation is immediate.
>>>> But HW is also different and there it seemed that hw needed a
>>>> little more time before starting the transitions.
>>>> Furthermore I didn't want to add that so quickly because I didn't
>>>> want to take the risk of killing battery with software tracking
>>>> when doing transitions so quickly using software tracking.
>>>>
>>>> >
>>>> > The current code doesn't actually accomplish either of these goals.
>>>> > Rather than keeping PSR inactive for the given amount of time, it
>>>> > will schedule PSR for activation after the given time, with the
>>>> > earliest target time in such a request winning.
>>>>
>>>> Putting that way I was asking myself how that hack had ever fixed
>>>> my issue. Because the way you explained here seems obvious that it
>>>> wouldn't ever fix my bug or any other.
>>>>
>>>> So I applied your patch and it made even more sense (without considering
>>>> the fact I want to kill the first call anyways).
>>>>
>>>> So I came back, removed your patch and tried to understand how
>>>> it ever worked.
>>>>
>>>> So, the thing is that intel_psr_flush will never be really executed
>>>> if intel_psr_enable wasn't executed. That is guaranteed by:
>>>>
>>>> mutex_lock(&dev_priv->psr.lock);
>>>> if (!dev_priv->psr.enabled) {
>>>>
>>>> So, intel_psr_enable will be for sure the first one to schedule the
>>>> work delayed to the ugly higher delay.
>>>>
>>>> >
>>>> > In other words, if intel_psr_enable() is immediately followed by
>>>> > intel_psr_flush(), then PSR will be activated after 100ms even if
>>>> > intel_psr_enable() wanted a longer delay.  And, if the screen is
>>>> > being constantly updated so that intel_psr_flush() is called once
>>>> > per frame at 60Hz, PSR will still be activated once every 100ms.
>>>>
>>>> During this time you are right, many calls of intel_psr_exit
>>>> coming from flush functions can be called... But none of
>>>> them will schedule the work with 100 delay.
>>>>
>>>> they will skip on
>>>> if (!work_busy(&dev_priv->psr.work.work))
>>>
>>> Wouldn't work_busy() return false until the work is actually queued
>>> which is 100ms after calling schedule_delayed_work()?
>>
>> That's not my understanding

Re: [Intel-gfx] [PATCH] drm/i915: Improve PSR activation timing

2018-02-08 Thread Andy Lutomirski



> On Feb 8, 2018, at 4:39 PM, Pandiyan, Dhinakaran 
> <dhinakaran.pandi...@intel.com> wrote:
> 
> 
>> On Thu, 2018-02-08 at 14:48 -0800, Rodrigo Vivi wrote:
>> Hi Andy,
>> 
>> thanks for getting involved with PSR and sorry for not replying sooner.
>> 
>> I first saw this patch on that bugzilla entry but only now I stop to
>> really think why I have written the code that way.
>> 
>> So some clarity below.
>> 
>>> On Mon, Feb 05, 2018 at 10:07:09PM +, Andy Lutomirski wrote:
>>> The current PSR code has two call sites that each schedule delayed
>>> work to activate PSR.  As far as I can tell, each call site intends
>>> to keep PSR inactive for the given amount of time and then allow it
>>> to be activated.
>>> 
>>> The call sites are:
>>> 
>>> - intel_psr_enable(), which explicitly states in a comment that
>>>   it's trying to keep PSR off a short time after the display is
>>>   initialized as a workaround.
>> 
>> First of all I really want to kill this call here and remove the
>> FIXME. It was an ugly hack that I added to solve a corner case
>> that was leaving me with blank screens when activating so soon.
>> 
>>> 
>>> - intel_psr_flush().  There isn't an explicit explanation, but the
>>>   intent is presumably to keep PSR off until the display has been
>>>   idle for 100ms.
>> 
>> The reason for 100 is kind of ugly-nonsense-empirical value
>> I concluded from VLV/CHV experience.
>> On platforms with HW tracking HW waits few identical frames
>> until really activating PSR. VLV/CHV activation is immediate.
>> But HW is also different and there it seemed that hw needed a
>> little more time before starting the transitions.
>> Furthermore I didn't want to add that so quickly because I didn't
>> want to take the risk of killing battery with software tracking
>> when doing transitions so quickly using software tracking.
>> 
>>> 
>>> The current code doesn't actually accomplish either of these goals.
>>> Rather than keeping PSR inactive for the given amount of time, it
>>> will schedule PSR for activation after the given time, with the
>>> earliest target time in such a request winning.
>> 
>> Putting that way I was asking myself how that hack had ever fixed
>> my issue. Because the way you explained here seems obvious that it
>> wouldn't ever fix my bug or any other.
>> 
>> So I applied your patch and it made even more sense (without considering
>> the fact I want to kill the first call anyways).
>> 
>> So I came back, removed your patch and tried to understand how
>> it ever worked.
>> 
>> So, the thing is that intel_psr_flush will never be really executed
>> if intel_psr_enable wasn't executed. That is guaranteed by:
>> 
>> mutex_lock(&dev_priv->psr.lock);
>>if (!dev_priv->psr.enabled) {
>> 
>> So, intel_psr_enable will be for sure the first one to schedule the
>> work delayed to the ugly higher delay.
>> 
>>> 
>>> In other words, if intel_psr_enable() is immediately followed by
>>> intel_psr_flush(), then PSR will be activated after 100ms even if
>>> intel_psr_enable() wanted a longer delay.  And, if the screen is
>>> being constantly updated so that intel_psr_flush() is called once
>>> per frame at 60Hz, PSR will still be activated once every 100ms.
>> 
>> During this time you are right, many calls of intel_psr_exit
>> coming from flush functions can be called... But none of
>> them will schedule the work with 100 delay.
>> 
>> they will skip on
>> if (!work_busy(&dev_priv->psr.work.work))

As below, the first call will.  Then, 100ms later, the work will fire.  Then 
the next flush will schedule it again, etc.

> 
> Wouldn't work_busy() return false until the work is actually queued
> which is 100ms after calling schedule_delayed_work()?
> 
> For e.g, flushes at 0, 16, 32...96 will have work_busy() returning false
> until 100ms.
> 
> The first psr_work will end up getting scheduled at 100ms, which I
> believe is not what we want. 

Indeed.  I stuck some printks in and this seems to be what happens.
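The difference between the two policies can be shown with a small simulation of a screen that flushes every 16 ms (roughly 60 Hz) with the 100 ms delay from intel_psr_flush(). It models only the scheduling rule, not i915: `psr_sched`, `old_schedule()`, `new_schedule()` and `first_activation()` are hypothetical names, and time is plain milliseconds rather than jiffies, so there is no wraparound to handle the way time_after() does.

```c
#include <stdbool.h>
#include <stddef.h>

/* At most one pending activation, like a single delayed work or timer. */
struct psr_sched {
	bool pending;
	long deadline;     /* ms */
};

/* Old behaviour: scheduling is a no-op while the work is already
 * pending, so the earliest requested deadline wins. */
static void old_schedule(struct psr_sched *s, long now, long delay)
{
	if (!s->pending) {
		s->pending = true;
		s->deadline = now + delay;
	}
}

/* Intended behaviour, as in intel_psr_schedule(): only ever push the
 * deadline later, so PSR activates no sooner than any caller asked. */
static void new_schedule(struct psr_sched *s, long now, long delay)
{
	long next = now + delay;

	if (!s->pending || next > s->deadline) {
		s->pending = true;
		s->deadline = next;
	}
}

typedef void (*sched_fn)(struct psr_sched *, long, long);

/* Call 'sched' with a 100 ms delay at each (sorted) flush time and
 * return when PSR would first activate, or -1 if nothing is pending. */
static long first_activation(sched_fn sched, const long *flush, size_t n)
{
	struct psr_sched s = { false, 0 };

	for (size_t i = 0; i < n; i++) {
		if (s.pending && s.deadline <= flush[i])
			return s.deadline;     /* fired before this flush */
		sched(&s, flush[i], 100);
	}
	return s.pending ? s.deadline : -1;
}
```

With flushes at 0, 16, ..., 192 ms, the old rule activates PSR at 100 ms while the screen is still updating; the new rule waits until 100 ms after the last flush (292 ms).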

> 
> 
> However, I think 
> 
>if (dev_priv->psr.busy_frontbuffer_bits)
>goto unlock;
> 
>intel_psr_activate(intel_dp);
> 
> in psr_work might prevent activate being called at 100ms if an
> invalidate happened to be called before that.
> 

On my system, invalidate is never called.  Even if it were called, that check 
would only help if we got lucky and the w

Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-05 Thread Andy Lutomirski


> On Feb 5, 2018, at 2:50 PM, Rodrigo Vivi <rodrigo.v...@intel.com> wrote:
> 
>> On Sat, Feb 03, 2018 at 05:33:08PM +0000, Andy Lutomirski wrote:
>>> On Fri, Feb 2, 2018 at 7:18 PM, Andy Lutomirski <l...@kernel.org> wrote:
>>>> On Fri, Feb 2, 2018 at 1:24 AM, Andy Lutomirski <l...@kernel.org> wrote:
>>>>> On Thu, Feb 1, 2018 at 9:20 PM, Chris Wilson <ch...@chris-wilson.co.uk> 
>>>>> wrote:
>>>>> Quoting Andy Lutomirski (2018-02-01 21:04:30)
>>>>>> I got this after a recent suspend/resume:
>>>>>> 
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Lid closed.
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: scan all 
>>>>>> dirs
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>>>>>> scanning /sys/bus
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>>>>>> scanning /sys/class
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Failed to open
>>>>>> configuration file '/etc/systemd/sleep.conf': No such file or
>>>>>> directory
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Suspending...
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>>>>>> sender=n/a destination=n/a object=/org/freedesktop/login1
>>>>>> interface=org.freedesktop.login1.Manager member=PrepareForSleep
>>>>>> cookie=570 reply
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Got message
>>>>>> type=method_call sender=:1.46 destination=:1.1
>>>>>> object=/org/freedesktop/login1/session/_32
>>>>>> interface=org.freedesktop.login1.Session member=ReleaseDevice
>>>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>>>>>> sender=n/a destination=:1.46
>>>>>> object=/org/freedesktop/login1/session/_32
>>>>>> interface=org.freedesktop.login1.Session member=PauseDevice cookie
>>>>>> Feb 01 09:44:34 laptop gnome-shell[2630]: Failed to apply DRM plane
>>>>>> transform 0: Permission denied
>>>>>> Feb 01 09:44:34 laptop gnome-shell[2630]: drmModeSetCursor2 failed
>>>>>> with (Permission denied), drawing cursor with OpenGL from now on
>>>>>> 
>>>>>> But I don't see the word "cursor" in my system logs before the first
>>>>>> suspend.  What am I looking for?  This is Fedora 27 running a Gnome
>>>>>> Wayland session, but it hasn't been reinstalled in some time, so it's
>>>>>> possible that there are some weird settings sitting around.  But I did
>>>>>> check and I have no weird i915 parameters.
>>>>> 
>>>>> You are using gnome-shell as the display server. From that it appears to
>>>>> have started off with a HW cursor and switched to a SW cursor after
>>>>> suspend. Did you notice a change in behaviour? After rebooting or just
>>>>> restarting gnome-shell?
>>>> 
>>>> I think it's less consistently bad after a reboot before suspending.
>>>> 
>>>>> 
>>>>>> Also, are these things potentially related:
>>>>>> 
>>>>>> [ 3067.702527] [drm:intel_pipe_update_start [i915]] *ERROR* Potential
>>>>>> atomic update failure on pipe A
>>>>> 
>>>>> They are just "missed the immediate vblank for the screen update"
>>>>> messages. Should not be related to PSR, but may cause jitter by delaying
>>>>> the odd screen update.
>>>> 
>>>> I just got this one, and the timestamp is at least reasonably close to
>>>> a giant latency spike:
>>>> 
>>>> [  288.799654] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic
>>>> update failure on pipe A (start=31 end=32) time 15 us, min 1073, max
>>>> 1079, scanline start 1087, end 1088
>>>> 
>>>>> 
>>>>>> As I'm typing this, I've seen a couple instances of what seems like a
>>>>>> full *second* of cursor latency, but I've only gotten the potential
>>>>>> atomic update failure once.
>>>>>> 
>>>>>> And is there any straightforward tracing to do to distinguish between
>>>>>> PSR exit latency and other potential s

[PATCH] drm/i915: Improve PSR activation timing

2018-02-05 Thread Andy Lutomirski
The current PSR code has two call sites that each schedule delayed
work to activate PSR.  As far as I can tell, each call site intends
to keep PSR inactive for the given amount of time and then allow it
to be activated.

The call sites are:

 - intel_psr_enable(), which explicitly states in a comment that
   it's trying to keep PSR off a short time after the display is
   initialized as a workaround.

 - intel_psr_flush().  There isn't an explicit explanation, but the
   intent is presumably to keep PSR off until the display has been
   idle for 100ms.

The current code doesn't actually accomplish either of these goals.
Rather than keeping PSR inactive for the given amount of time, it
will schedule PSR for activation after the given time, with the
earliest target time in such a request winning.

In other words, if intel_psr_enable() is immediately followed by
intel_psr_flush(), then PSR will be activated after 100ms even if
intel_psr_enable() wanted a longer delay.  And, if the screen is
being constantly updated so that intel_psr_flush() is called once
per frame at 60Hz, PSR will still be activated once every 100ms.

Rewrite the code so that it does what was intended.  This adds
a new function intel_psr_schedule(), which will enable PSR after
the requested time but no sooner.

Signed-off-by: Andy Lutomirski <l...@kernel.org>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  9 +++--
 drivers/gpu/drm/i915/i915_drv.h |  4 ++-
 drivers/gpu/drm/i915/intel_psr.c| 69 -
 3 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c65e381b85f3..b67db93f905d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2663,8 +2663,13 @@ static int i915_edp_psr_status(struct seq_file *m, void *data)
seq_printf(m, "Active: %s\n", yesno(dev_priv->psr.active));
seq_printf(m, "Busy frontbuffer bits: 0x%03x\n",
   dev_priv->psr.busy_frontbuffer_bits);
-   seq_printf(m, "Re-enable work scheduled: %s\n",
-  yesno(work_busy(&dev_priv->psr.work.work)));
+
+   if (timer_pending(&dev_priv->psr.activate_timer))
+   seq_printf(m, "Activate scheduled: yes, in %ldms\n",
+  (long)(dev_priv->psr.earliest_activate - jiffies) *
+  1000 / HZ);
+   else
+   seq_printf(m, "Re-enable scheduled: no\n");
 
if (HAS_DDI(dev_priv)) {
if (dev_priv->psr.psr2_support)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 46eb729b367d..c0fb7d65cda6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1192,7 +1192,9 @@ struct i915_psr {
bool source_ok;
struct intel_dp *enabled;
bool active;
-   struct delayed_work work;
+   struct timer_list activate_timer;
+   struct work_struct activate_work;
+   unsigned long earliest_activate;
unsigned busy_frontbuffer_bits;
bool psr2_support;
bool aux_frame_sync;
diff --git a/drivers/gpu/drm/i915/intel_psr.c b/drivers/gpu/drm/i915/intel_psr.c
index 55ea5eb3b7df..333d90d4e5af 100644
--- a/drivers/gpu/drm/i915/intel_psr.c
+++ b/drivers/gpu/drm/i915/intel_psr.c
@@ -461,6 +461,30 @@ static void intel_psr_activate(struct intel_dp *intel_dp)
dev_priv->psr.active = true;
 }
 
+static void intel_psr_schedule(struct drm_i915_private *dev_priv,
+  unsigned long min_wait_ms)
+{
+   unsigned long next;
+
+   lockdep_assert_held(&dev_priv->psr.lock);
+
+   /*
+* We update earliest_activate *and* call mod_timer() because it's
+* possible that intel_psr_work() has already been called and is
+* waiting for psr.lock.  If that's the case, we don't want it
+* to immediately enable PSR.
+*
+* We also need to make sure that PSR is never activated earlier
+* than requested to avoid breaking intel_psr_enable()'s workaround
+* for pre-gen9 hardware.
+*/
+   next = jiffies + msecs_to_jiffies(min_wait_ms);
+   if (time_after(next, dev_priv->psr.earliest_activate)) {
+   dev_priv->psr.earliest_activate = next;
+   mod_timer(&dev_priv->psr.activate_timer, next);
+   }
+}
+
 static void hsw_psr_enable_source(struct intel_dp *intel_dp,
  const struct intel_crtc_state *crtc_state)
 {
@@ -544,8 +568,7 @@ void intel_psr_enable(struct intel_dp *intel_dp,
 * - On HSW/BDW we get a recoverable frozen screen until
 *   next exit-activate sequence.
 */
-   schedule_delayed_work(&dev_priv->psr.work,
-         msecs_to_jiffies(intel_dp->panel_power_cycle_d

Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-05 Thread Andy Lutomirski
On Mon, Feb 5, 2018 at 9:17 PM, Pandiyan, Dhinakaran
<dhinakaran.pandi...@intel.com> wrote:
>
> On Mon, 2018-02-05 at 20:35 +0000, Andy Lutomirski wrote:
>> On Mon, Feb 5, 2018 at 6:53 PM, Pandiyan, Dhinakaran
>> <dhinakaran.pandi...@intel.com> wrote:
>> >
>> >
>> >
>> > On Sun, 2018-02-04 at 21:50 +0000, Andy Lutomirski wrote:
>> >> On Sat, Feb 3, 2018 at 5:08 PM, Andy Lutomirski <l...@kernel.org> wrote:
>> >> > On Sat, Feb 3, 2018 at 5:20 AM, Pandiyan, Dhinakaran
>> >> > <dhinakaran.pandi...@intel.com> wrote:
>> >> >>
>> >> >> On Fri, 2018-02-02 at 19:18 +0000, Andy Lutomirski wrote:
>> >> >>> I updated to 4.15, and the situation is much worse.  With
>> >> >>> enable_psr=1, the system survives for several seconds and then the
>> >> >>> screen stops updating entirely.  If I boot with i915.enable_psr=1, I
>> >> >>> get to the Fedora login screen and then the system dies.  If I set
>> >> >>> enable_psr=1 using sysfs, it does a bit after the next resume.  It
>> >> >>> seems like it also sometimes hangs even worse a bit after the screen
>> >> >>> stops updating, but it's hard to tell.
>> >> >>
>> >> >> The login screen freeze sounds like what I have. Does this system have
>> >> >> DMC firmware? If yes, can you try this series
>> >> >> https://patchwork.freedesktop.org/series/37598/. You'll only need
>> >> >> patches 1,8,9 and 10.
>> >> >
>> >> > That fixes the hang.  Feel free to add:
>> >> >
>> >> > Tested-by: Andy Lutomirski <l...@kernel.org>
>> >> >
>> >> > to the i915 parts.  Also, any chance of getting it into the 4.15 stable 
>> >> > kernels?
>> >>
>> >> Correction: I'm still getting a second or two of complete screen
>> >> freezing every now and then.  The kernel says:
>> > Thanks a lot for testing. How do you trigger this freeze? Moving the
>> > cursor? Did you apply these patches on top of drm-tip or was it
>> > mainline?
>> >
>> > I also have another patch here that addresses screen freezes in console
>> > mode with PSR - https://patchwork.freedesktop.org/patch/201144/ in case
>> > that is what you are interested in.
>> >>
>> >> [69400.016524] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic
>> >> update failure on pipe A (start=19 end=20) time 198 us, min 1073, max
>> >> 1079, scanline start 1068, end 1082
>> >>
>> >> So something might still be a bit buggy.
>> >
>> > This series fixes only the long freezes due to frame counter resets, I
>> > am sure there are still other issues with PSR.
>> >
>> > BTW does your patch on top of these patches help with the cursor lag?
>>
>> Maybe, but I'm not 100% sure.  I'm not currently seeing the lag with
>> or without the patch.  I also think my distro fixed the cursor in the
>> meantime so that it uses the HW cursor even after suspend/resume.
>>
>> A couple of questions, though:
>>
>> 1. Does moving the HW cursor cause the hardware to automatically turn off 
>> PSR?
>>
> That is correct.
>
>> 2. When something enables vblank interrupts (using drm_*_vblank_get(),
>> for example), are vblank interrupts generated even if PSR is on?
>
> Enabling vblank interrupts deactivates PSR (except on Braswell afaik)
>
>>   And
>> is the scanline, as returned by intel_get_crtc_scanline(), updated?
>
> I don't think so, I have not really checked but there are no frames
> generated, so the timing related registers will not get updated. This is
> the case with the frame counter register.
>

I bet that's the cause of some of the glitches I'm seeing.  I
instrumented intel_pipe_update_start() like this:

diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index 4a8a5d918a83..6ce0a35187fb 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -97,6 +97,7 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state)
 bool need_vlv_dsi_wa = (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) &&
 intel_crtc_has_type(new_crtc_state, INTEL_OUTPUT_DSI);
 DEFINE_WAIT(wait);
+int first_scanline = -1;

 vblank_start = adjusted_mode->crtc_vblank_start;
 if (adjusted_mode->flags & DRM_MODE_FLAG_INTERLACE)
@@ -131,9 +132,12 @@ void intel

Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-05 Thread Andy Lutomirski
On Mon, Feb 5, 2018 at 6:53 PM, Pandiyan, Dhinakaran
<dhinakaran.pandi...@intel.com> wrote:
>
>
>
> On Sun, 2018-02-04 at 21:50 +0000, Andy Lutomirski wrote:
>> On Sat, Feb 3, 2018 at 5:08 PM, Andy Lutomirski <l...@kernel.org> wrote:
>> > On Sat, Feb 3, 2018 at 5:20 AM, Pandiyan, Dhinakaran
>> > <dhinakaran.pandi...@intel.com> wrote:
>> >>
>> >> On Fri, 2018-02-02 at 19:18 +0000, Andy Lutomirski wrote:
>> >>> I updated to 4.15, and the situation is much worse.  With
>> >>> enable_psr=1, the system survives for several seconds and then the
>> >>> screen stops updating entirely.  If I boot with i915.enable_psr=1, I
>> >>> get to the Fedora login screen and then the system dies.  If I set
>> >>> enable_psr=1 using sysfs, it does a bit after the next resume.  It
>> >>> seems like it also sometimes hangs even worse a bit after the screen
>> >>> stops updating, but it's hard to tell.
>> >>
>> >> The login screen freeze sounds like what I have. Does this system have
>> >> DMC firmware? If yes, can you try this series
>> >> https://patchwork.freedesktop.org/series/37598/. You'll only need
>> >> patches 1,8,9 and 10.
>> >
>> > That fixes the hang.  Feel free to add:
>> >
>> > Tested-by: Andy Lutomirski <l...@kernel.org>
>> >
>> > to the i915 parts.  Also, any chance of getting it into the 4.15 stable 
>> > kernels?
>>
>> Correction: I'm still getting a second or two of complete screen
>> freezing every now and then.  The kernel says:
> Thanks a lot for testing. How do you trigger this freeze? Moving the
> cursor? Did you apply these patches on top of drm-tip or was it
> mainline?
>
> I also have another patch here that addresses screen freezes in console
> mode with PSR - https://patchwork.freedesktop.org/patch/201144/ in case
> that is what you are interested in.
>>
>> [69400.016524] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic
>> update failure on pipe A (start=19 end=20) time 198 us, min 1073, max
>> 1079, scanline start 1068, end 1082
>>
>> So something might still be a bit buggy.
>
> This series fixes only the long freezes due to frame counter resets, I
> am sure there are still other issues with PSR.
>
> BTW does your patch on top of these patches help with the cursor lag?

Maybe, but I'm not 100% sure.  I'm not currently seeing the lag with
or without the patch.  I also think my distro fixed the cursor in the
meantime so that it uses the HW cursor even after suspend/resume.

A couple of questions, though:

1. Does moving the HW cursor cause the hardware to automatically turn off PSR?

2. When something enables vblank interrupts (using drm_*_vblank_get(),
for example), are vblank interrupts generated even if PSR is on?  And
is the scanline, as returned by intel_get_crtc_scanline(), updated?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-04 Thread Andy Lutomirski
On Sat, Feb 3, 2018 at 5:08 PM, Andy Lutomirski <l...@kernel.org> wrote:
> On Sat, Feb 3, 2018 at 5:20 AM, Pandiyan, Dhinakaran
> <dhinakaran.pandi...@intel.com> wrote:
>>
>> On Fri, 2018-02-02 at 19:18 +0000, Andy Lutomirski wrote:
>>> I updated to 4.15, and the situation is much worse.  With
>>> enable_psr=1, the system survives for several seconds and then the
>>> screen stops updating entirely.  If I boot with i915.enable_psr=1, I
>>> get to the Fedora login screen and then the system dies.  If I set
>>> enable_psr=1 using sysfs, it does a bit after the next resume.  It
>>> seems like it also sometimes hangs even worse a bit after the screen
>>> stops updating, but it's hard to tell.
>>
>> The login screen freeze sounds like what I have. Does this system have
>> DMC firmware? If yes, can you try this series
>> https://patchwork.freedesktop.org/series/37598/. You'll only need
>> patches 1,8,9 and 10.
>
> That fixes the hang.  Feel free to add:
>
> Tested-by: Andy Lutomirski <l...@kernel.org>
>
> to the i915 parts.  Also, any chance of getting it into the 4.15 stable 
> kernels?

Correction: I'm still getting a second or two of complete screen
freezing every now and then.  The kernel says:

[69400.016524] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic
update failure on pipe A (start=19 end=20) time 198 us, min 1073, max
1079, scanline start 1068, end 1082

So something might still be a bit buggy.


Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-03 Thread Andy Lutomirski
On Fri, Feb 2, 2018 at 7:18 PM, Andy Lutomirski <l...@kernel.org> wrote:
> On Fri, Feb 2, 2018 at 1:24 AM, Andy Lutomirski <l...@kernel.org> wrote:
>> On Thu, Feb 1, 2018 at 9:20 PM, Chris Wilson <ch...@chris-wilson.co.uk> 
>> wrote:
>>> Quoting Andy Lutomirski (2018-02-01 21:04:30)
>>>> I got this after a recent suspend/resume:
>>>>
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Lid closed.
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: scan all 
>>>> dirs
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>>>> scanning /sys/bus
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>>>> scanning /sys/class
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Failed to open
>>>> configuration file '/etc/systemd/sleep.conf': No such file or
>>>> directory
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Suspending...
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>>>> sender=n/a destination=n/a object=/org/freedesktop/login1
>>>> interface=org.freedesktop.login1.Manager member=PrepareForSleep
>>>> cookie=570 reply
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Got message
>>>> type=method_call sender=:1.46 destination=:1.1
>>>> object=/org/freedesktop/login1/session/_32
>>>> interface=org.freedesktop.login1.Session member=ReleaseDevice
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>>>> sender=n/a destination=:1.46
>>>> object=/org/freedesktop/login1/session/_32
>>>> interface=org.freedesktop.login1.Session member=PauseDevice cookie
>>>> Feb 01 09:44:34 laptop gnome-shell[2630]: Failed to apply DRM plane
>>>> transform 0: Permission denied
>>>> Feb 01 09:44:34 laptop gnome-shell[2630]: drmModeSetCursor2 failed
>>>> with (Permission denied), drawing cursor with OpenGL from now on
>>>>
>>>> But I don't see the word "cursor" in my system logs before the first
>>>> suspend.  What am I looking for?  This is Fedora 27 running a Gnome
>>>> Wayland session, but it hasn't been reinstalled in some time, so it's
>>>> possible that there are some weird settings sitting around.  But I did
>>>> check and I have no weird i915 parameters.
>>>
>>> You are using gnome-shell as the display server. From that it appears to
>>> have started off with a HW cursor and switched to a SW cursor after
>>> suspend. Did you notice a change in behaviour? After rebooting or just
>>> restarting gnome-shell?
>>
>> I think it's less consistently bad after a reboot before suspending.
>>
>>>
>>>> Also, are these things potentially related:
>>>>
>>>> [ 3067.702527] [drm:intel_pipe_update_start [i915]] *ERROR* Potential
>>>> atomic update failure on pipe A
>>>
>>> They are just "missed the immediate vblank for the screen update"
>>> messages. Should not be related to PSR, but may cause jitter by delaying
>>> the odd screen update.
>>
>> I just got this one, and the timestamp is at least reasonably close to
>> a giant latency spike:
>>
>> [  288.799654] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic
>> update failure on pipe A (start=31 end=32) time 15 us, min 1073, max
>> 1079, scanline start 1087, end 1088
>>
>>>
>>>> As I'm typing this, I've seen a couple instances of what seems like a
>>>> full *second* of cursor latency, but I've only gotten the potential
>>>> atomic update failure once.
>>>>
>>>> And is there any straightforward tracing to do to distinguish between
>>>> PSR exit latency and other potential sources of latency?
>>>
>>> It looks plausible that we could at least report how long it takes the
>>> registers to reflect the change in state (but we don't). The best source
>>> of information atm is /sys/kernel/debug/dri/0/i915_edp_psr_status.
>>
>> Hmm.
>>
>> I went and looked at the code, and I noticed what could be bugs or
>> could (more likely) be my confusion since I don't know this code at
>> all:
>>
>> intel_single_frame_update() does something inscrutable to me, but I
>> imagine it does something that causes the next page flip to get
>> noticed by the panel even with PSR on.  But how does the code that
>> calls it know that anything happened?  (Looking at 

Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-03 Thread Andy Lutomirski
On Sat, Feb 3, 2018 at 5:20 AM, Pandiyan, Dhinakaran
<dhinakaran.pandi...@intel.com> wrote:
>
> On Fri, 2018-02-02 at 19:18 +0000, Andy Lutomirski wrote:
>> I updated to 4.15, and the situation is much worse.  With
>> enable_psr=1, the system survives for several seconds and then the
>> screen stops updating entirely.  If I boot with i915.enable_psr=1, I
>> get to the Fedora login screen and then the system dies.  If I set
>> enable_psr=1 using sysfs, it does a bit after the next resume.  It
>> seems like it also sometimes hangs even worse a bit after the screen
>> stops updating, but it's hard to tell.
>
> The login screen freeze sounds like what I have. Does this system have
> DMC firmware? If yes, can you try this series
> https://patchwork.freedesktop.org/series/37598/. You'll only need
> patches 1,8,9 and 10.

That fixes the hang.  Feel free to add:

Tested-by: Andy Lutomirski <l...@kernel.org>

to the i915 parts.  Also, any chance of getting it into the 4.15 stable kernels?

--Andy


Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-02 Thread Andy Lutomirski
On Fri, Feb 2, 2018 at 7:18 PM, Andy Lutomirski <l...@kernel.org> wrote:
> On Fri, Feb 2, 2018 at 1:24 AM, Andy Lutomirski <l...@kernel.org> wrote:
>> On Thu, Feb 1, 2018 at 9:20 PM, Chris Wilson <ch...@chris-wilson.co.uk> 
>> wrote:
>>> Quoting Andy Lutomirski (2018-02-01 21:04:30)
>>>> I got this after a recent suspend/resume:
>>>>
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Lid closed.
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: scan all 
>>>> dirs
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>>>> scanning /sys/bus
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>>>> scanning /sys/class
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Failed to open
>>>> configuration file '/etc/systemd/sleep.conf': No such file or
>>>> directory
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Suspending...
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>>>> sender=n/a destination=n/a object=/org/freedesktop/login1
>>>> interface=org.freedesktop.login1.Manager member=PrepareForSleep
>>>> cookie=570 reply
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Got message
>>>> type=method_call sender=:1.46 destination=:1.1
>>>> object=/org/freedesktop/login1/session/_32
>>>> interface=org.freedesktop.login1.Session member=ReleaseDevice
>>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>>>> sender=n/a destination=:1.46
>>>> object=/org/freedesktop/login1/session/_32
>>>> interface=org.freedesktop.login1.Session member=PauseDevice cookie
>>>> Feb 01 09:44:34 laptop gnome-shell[2630]: Failed to apply DRM plane
>>>> transform 0: Permission denied
>>>> Feb 01 09:44:34 laptop gnome-shell[2630]: drmModeSetCursor2 failed
>>>> with (Permission denied), drawing cursor with OpenGL from now on
>>>>
>>>> But I don't see the word "cursor" in my system logs before the first
>>>> suspend.  What am I looking for?  This is Fedora 27 running a Gnome
>>>> Wayland session, but it hasn't been reinstalled in some time, so it's
>>>> possible that there are some weird settings sitting around.  But I did
>>>> check and I have no weird i915 parameters.
>>>
>>> You are using gnome-shell as the display server. From that it appears to
>>> have started off with a HW cursor and switched to a SW cursor after
>>> suspend. Did you notice a change in behaviour? After rebooting or just
>>> restarting gnome-shell?
>>
>> I think it's less consistently bad after a reboot before suspending.
>>
>>>
>>>> Also, are these things potentially related:
>>>>
>>>> [ 3067.702527] [drm:intel_pipe_update_start [i915]] *ERROR* Potential
>>>> atomic update failure on pipe A
>>>
>>> They are just "missed the immediate vblank for the screen update"
>>> messages. Should not be related to PSR, but may cause jitter by delaying
>>> the odd screen update.
>>
>> I just got this one, and the timestamp is at least reasonably close to
>> a giant latency spike:
>>
>> [  288.799654] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic
>> update failure on pipe A (start=31 end=32) time 15 us, min 1073, max
>> 1079, scanline start 1087, end 1088
>>
>>>
>>>> As I'm typing this, I've seen a couple instances of what seems like a
>>>> full *second* of cursor latency, but I've only gotten the potential
>>>> atomic update failure once.
>>>>
>>>> And is there any straightforward tracing to do to distinguish between
>>>> PSR exit latency and other potential sources of latency?
>>>
>>> It looks plausible that we could at least report how long it takes the
>>> registers to reflect the change in state (but we don't). The best source
>>> of information atm is /sys/kernel/debug/dri/0/i915_edp_psr_status.
>>
>> Hmm.
>>
>> I went and looked at the code, and I noticed what could be bugs or
>> could (more likely) be my confusion since I don't know this code at
>> all:
>>
>> intel_single_frame_update() does something inscrutable to me, but I
>> imagine it does something that causes the next page flip to get
>> noticed by the panel even with PSR on.  But how does the code that
>> calls it know that anything happened?  (Looking at 

Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-02 Thread Andy Lutomirski
On Fri, Feb 2, 2018 at 1:24 AM, Andy Lutomirski <l...@kernel.org> wrote:
> On Thu, Feb 1, 2018 at 9:20 PM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
>> Quoting Andy Lutomirski (2018-02-01 21:04:30)
>>> I got this after a recent suspend/resume:
>>>
>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Lid closed.
>>> Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: scan all 
>>> dirs
>>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>>> scanning /sys/bus
>>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>>> scanning /sys/class
>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Failed to open
>>> configuration file '/etc/systemd/sleep.conf': No such file or
>>> directory
>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Suspending...
>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>>> sender=n/a destination=n/a object=/org/freedesktop/login1
>>> interface=org.freedesktop.login1.Manager member=PrepareForSleep
>>> cookie=570 reply
>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Got message
>>> type=method_call sender=:1.46 destination=:1.1
>>> object=/org/freedesktop/login1/session/_32
>>> interface=org.freedesktop.login1.Session member=ReleaseDevice
>>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>>> sender=n/a destination=:1.46
>>> object=/org/freedesktop/login1/session/_32
>>> interface=org.freedesktop.login1.Session member=PauseDevice cookie
>>> Feb 01 09:44:34 laptop gnome-shell[2630]: Failed to apply DRM plane
>>> transform 0: Permission denied
>>> Feb 01 09:44:34 laptop gnome-shell[2630]: drmModeSetCursor2 failed
>>> with (Permission denied), drawing cursor with OpenGL from now on
>>>
>>> But I don't see the word "cursor" in my system logs before the first
>>> suspend.  What am I looking for?  This is Fedora 27 running a Gnome
>>> Wayland session, but it hasn't been reinstalled in some time, so it's
>>> possible that there are some weird settings sitting around.  But I did
>>> check and I have no weird i915 parameters.
>>
>> You are using gnome-shell as the display server. From that it appears to
>> have started off with a HW cursor and switched to a SW cursor after
>> suspend. Did you notice a change in behaviour? After rebooting or just
>> restarting gnome-shell?
>
> I think it's less consistently bad after a reboot before suspending.
>
>>
>>> Also, are these things potentially related:
>>>
>>> [ 3067.702527] [drm:intel_pipe_update_start [i915]] *ERROR* Potential
>>> atomic update failure on pipe A
>>
>> They are just "missed the immediate vblank for the screen update"
>> messages. Should not be related to PSR, but may cause jitter by delaying
>> the odd screen update.
>
> I just got this one, and the timestamp is at least reasonably close to
> a giant latency spike:
>
> [  288.799654] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic
> update failure on pipe A (start=31 end=32) time 15 us, min 1073, max
> 1079, scanline start 1087, end 1088
>
>>
>>> As I'm typing this, I've seen a couple instances of what seems like a
>>> full *second* of cursor latency, but I've only gotten the potential
>>> atomic update failure once.
>>>
>>> And is there any straightforward tracing to do to distinguish between
>>> PSR exit latency and other potential sources of latency?
>>
>> It looks plausible that we could at least report how long it takes the
>> registers to reflect the change in state (but we don't). The best source
>> of information atm is /sys/kernel/debug/dri/0/i915_edp_psr_status.
>
> Hmm.
>
> I went and looked at the code, and I noticed what could be bugs or
> could (more likely) be my confusion since I don't know this code at
> all:
>
> intel_single_frame_update() does something inscrutable to me, but I
> imagine it does something that causes the next page flip to get
> noticed by the panel even with PSR on.  But how does the code that
> calls it know that anything happened?  (Looking at the commit history,
> maybe this is something special that's only needed on some platforms
> but doesn't replace the normal PSR exit sequence.)
>
> Perhaps more interestingly, intel_psr_flush() does this:
>
> /* By definition flush = invalidate + flush */
> if (frontbuffer_bits)
> intel_psr_exit(dev_priv);
>
> if (!dev_priv->psr.active && !dev_priv->psr.busy_f

Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-01 Thread Andy Lutomirski
On Thu, Feb 1, 2018 at 9:20 PM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> Quoting Andy Lutomirski (2018-02-01 21:04:30)
>> I got this after a recent suspend/resume:
>>
>> Feb 01 09:44:34 laptop systemd-logind[2412]: Lid closed.
>> Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: scan all dirs
>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>> scanning /sys/bus
>> Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
>> scanning /sys/class
>> Feb 01 09:44:34 laptop systemd-logind[2412]: Failed to open
>> configuration file '/etc/systemd/sleep.conf': No such file or
>> directory
>> Feb 01 09:44:34 laptop systemd-logind[2412]: Suspending...
>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>> sender=n/a destination=n/a object=/org/freedesktop/login1
>> interface=org.freedesktop.login1.Manager member=PrepareForSleep
>> cookie=570 reply
>> Feb 01 09:44:34 laptop systemd-logind[2412]: Got message
>> type=method_call sender=:1.46 destination=:1.1
>> object=/org/freedesktop/login1/session/_32
>> interface=org.freedesktop.login1.Session member=ReleaseDevice
>> Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
>> sender=n/a destination=:1.46
>> object=/org/freedesktop/login1/session/_32
>> interface=org.freedesktop.login1.Session member=PauseDevice cookie
>> Feb 01 09:44:34 laptop gnome-shell[2630]: Failed to apply DRM plane
>> transform 0: Permission denied
>> Feb 01 09:44:34 laptop gnome-shell[2630]: drmModeSetCursor2 failed
>> with (Permission denied), drawing cursor with OpenGL from now on
>>
>> But I don't see the word "cursor" in my system logs before the first
>> suspend.  What am I looking for?  This is Fedora 27 running a Gnome
>> Wayland session, but it hasn't been reinstalled in some time, so it's
>> possible that there are some weird settings sitting around.  But I did
>> check and I have no weird i915 parameters.
>
> You are using gnome-shell as the display server. From that it appears to
> have started off with a HW cursor and switched to a SW cursor after
> suspend. Did you notice a change in behaviour? After rebooting or just
> restarting gnome-shell?

I think it's less consistently bad after a reboot before suspending.

>
>> Also, are these things potentially related:
>>
>> [ 3067.702527] [drm:intel_pipe_update_start [i915]] *ERROR* Potential
>> atomic update failure on pipe A
>
> They are just "missed the immediate vblank for the screen update"
> messages. Should not be related to PSR, but may cause jitter by delaying
> the odd screen update.

I just got this one, and the timestamp is at least reasonably close to
a giant latency spike:

[  288.799654] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic
update failure on pipe A (start=31 end=32) time 15 us, min 1073, max
1079, scanline start 1087, end 1088

>
>> As I'm typing this, I've seen a couple instances of what seems like a
>> full *second* of cursor latency, but I've only gotten the potential
>> atomic update failure once.
>>
>> And is there any straightforward tracing to do to distinguish between
>> PSR exit latency and other potential sources of latency?
>
> It looks plausible that we could at least report how long it takes the
> registers to reflect the change in state (but we don't). The best source
> of information atm is /sys/kernel/debug/dri/0/i915_edp_psr_status.

Hmm.

I went and looked at the code, and I noticed what could be bugs or
could (more likely) be my confusion since I don't know this code at
all:

intel_single_frame_update() does something inscrutable to me, but I
imagine it does something that causes the next page flip to get
noticed by the panel even with PSR on.  But how does the code that
calls it know that anything happened?  (Looking at the commit history,
maybe this is something special that's only needed on some platforms
but doesn't replace the normal PSR exit sequence.)

Perhaps more interestingly, intel_psr_flush() does this:

	/* By definition flush = invalidate + flush */
	if (frontbuffer_bits)
		intel_psr_exit(dev_priv);

	if (!dev_priv->psr.active && !dev_priv->psr.busy_frontbuffer_bits)
		if (!work_busy(&dev_priv->psr.work.work))
			schedule_delayed_work(&dev_priv->psr.work,
					      msecs_to_jiffies(100));

I'm guessing that the idea is that we're turning off PSR because we
want the panel to update and we expect that, in 100ms, the update will
have hit the panel and we'll have been idle long enough for it to make
sense to re-enter PSR.  IOW, the code wants PSR to be off for at least
100ms and then to turn back on.  But th

Re: [Intel-gfx] i915 PSR test results and cursor lag

2018-02-01 Thread Andy Lutomirski
On Thu, Feb 1, 2018 at 9:53 AM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> Quoting Andy Lutomirski (2018-02-01 17:40:22)
>> *However*, I do see one unfortunate side effect of turning on PSR.  It
>> seems that, when I move my cursor a little bit after a few seconds of
>> doing nothing, there seems to be a little bit of lag, as if either a
>> few frames are dropped at the beginning of the motion or maybe the
>> entire motion is delayed a bit.  I don't notice a similar delay when
>> typing, so I'm wondering if maybe there's a minor driver bug in which
>> the driver doesn't kick the panel out of PSR quite as quickly when the
>> cursor is updated as it does when the framebuffer is updated.
>
> One thing that's important to know regarding the cursor is whether the
> display server is using a HW cursor or SW cursor. Could you please attach
> the log from the display server (or if you are using a stock
> distribution that's probably enough to work out what it is using)?
> -Chris

Looking at the logs, I see a few things.  First, I have a few of these:

Feb 01 09:24:24 laptop kernel: [drm:intel_pipe_update_start [i915]]
*ERROR* Potential atomic update failure on pipe A
Feb 01 09:24:48 laptop org.gnome.Shell.desktop[3261]: libinput error:
event15 - libinput error: DLL0704:01 06CB:76AE Touchpad: libinput
error: kernel bug: Touch jump detected and discarded.
Feb 01 09:24:48 laptop org.gnome.Shell.desktop[3261]: See
https://wayland.freedesktop.org/libinput/doc/1.9.3/touchpad_jumping_cursor.html
for details
Feb 01 09:24:50 laptop org.gnome.Shell.desktop[3261]: libinput error:
event15 - libinput error: DLL0704:01 06CB:76AE Touchpad: libinput
error: kernel bug: Touch jump detected and discarded.
Feb 01 09:24:50 laptop org.gnome.Shell.desktop[3261]: See
https://wayland.freedesktop.org/libinput/doc/1.9.3/touchpad_jumping_cursor.html
for details

(Hi, Peter!)

So it's entirely possible that what I'm seeing is actually an input
issue that's exacerbated by PSR for some bizarre reason.

I got this after a recent suspend/resume:

Feb 01 09:44:34 laptop systemd-logind[2412]: Lid closed.
Feb 01 09:44:34 laptop systemd-logind[2412]: device-enumerator: scan all dirs
Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
scanning /sys/bus
Feb 01 09:44:34 laptop systemd-logind[2412]:   device-enumerator:
scanning /sys/class
Feb 01 09:44:34 laptop systemd-logind[2412]: Failed to open
configuration file '/etc/systemd/sleep.conf': No such file or
directory
Feb 01 09:44:34 laptop systemd-logind[2412]: Suspending...
Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
sender=n/a destination=n/a object=/org/freedesktop/login1
interface=org.freedesktop.login1.Manager member=PrepareForSleep
cookie=570 reply
Feb 01 09:44:34 laptop systemd-logind[2412]: Got message
type=method_call sender=:1.46 destination=:1.1
object=/org/freedesktop/login1/session/_32
interface=org.freedesktop.login1.Session member=ReleaseDevice
Feb 01 09:44:34 laptop systemd-logind[2412]: Sent message type=signal
sender=n/a destination=:1.46
object=/org/freedesktop/login1/session/_32
interface=org.freedesktop.login1.Session member=PauseDevice cookie
Feb 01 09:44:34 laptop gnome-shell[2630]: Failed to apply DRM plane
transform 0: Permission denied
Feb 01 09:44:34 laptop gnome-shell[2630]: drmModeSetCursor2 failed
with (Permission denied), drawing cursor with OpenGL from now on

But I don't see the word "cursor" in my system logs before the first
suspend.  What am I looking for?  This is Fedora 27 running a Gnome
Wayland session, but it hasn't been reinstalled in some time, so it's
possible that there are some weird settings sitting around.  But I did
check and I have no weird i915 parameters.

Also, are these things potentially related:

[ 3067.702527] [drm:intel_pipe_update_start [i915]] *ERROR* Potential
atomic update failure on pipe A

As I'm typing this, I've seen a couple instances of what seems like a
full *second* of cursor latency, but I've only gotten the potential
atomic update failure once.

And is there any straightforward tracing to do to distinguish between
PSR exit latency and other potential sources of latency?


Re: i915 PSR test results and cursor lag

2018-02-01 Thread Andy Lutomirski
On Thu, Feb 1, 2018 at 9:40 AM, Andy Lutomirski <l...@kernel.org> wrote:
> Hi-
>
> As requested in your blog post, I tested PSR.  I see something like
> 2.69W with PSR off and 2.17W with PSR on.  Screen blanking,
> suspend/resume, and the contents of the screen all seem okay.  This is
> a Dell XPS 13 9350, i.e.:
>
> System Information
> Manufacturer: Dell Inc.
> Product Name: XPS 13 9350
>
> EDID is attached.
>
> *However*, I do see one unfortunate side effect of turning on PSR.  It
> seems that, when I move my cursor a little bit after a few seconds of
> doing nothing, there seems to be a little bit of lag, as if either a
> few frames are dropped at the beginning of the motion or maybe the
> entire motion is delayed a bit.  I don't notice a similar delay when
> typing, so I'm wondering if maybe there's a minor driver bug in which
> the driver doesn't kick the panel out of PSR quite as quickly when the
> cursor is updated as it does when the framebuffer is updated.
>

I'm also getting occasional messages like:

[ 2675.574486] [drm:intel_pipe_update_start [i915]] *ERROR* Potential atomic update failure on pipe A

with PSR on.  But there is nowhere near one of these messages per tiny
lag incident.


i915 PSR test results and cursor lag

2018-02-01 Thread Andy Lutomirski
Hi-

As requested in your blog post, I tested PSR.  I see something like
2.69W with PSR off and 2.17W with PSR on.  Screen blanking,
suspend/resume, and the contents of the screen all seem okay.  This is
a Dell XPS 13 9350, i.e.:

System Information
Manufacturer: Dell Inc.
Product Name: XPS 13 9350

EDID is attached.

*However*, I do see one unfortunate side effect of turning on PSR.  It
seems that, when I move my cursor a little bit after a few seconds of
doing nothing, there seems to be a little bit of lag, as if either a
few frames are dropped at the beginning of the motion or maybe the
entire motion is delayed a bit.  I don't notice a similar delay when
typing, so I'm wondering if maybe there's a minor driver bug in which
the driver doesn't kick the panel out of PSR quite as quickly when the
cursor is updated as it does when the framebuffer is updated.

(A couple of lists are cc'd.)

BTW, switching PSR on and off using
/sys/module/i915/parameters/enable_psr seems to work fine, although it
seems like I may need to suspend/resume to get it to kick in.  But, if
there's really going to be a blacklist or whitelist of panels in
userspace, shouldn't there be an option in sysfs in
/sys/class/drm/card0-eDP-1/ or similar?


--Andy




Skylake underruns on 4.8-rc4

2016-08-29 Thread Andy Lutomirski
My Dell XPS 13 9350 laptop just got a buffer underrun:

[drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

I'm seeing this very occasionally, and the underruns don't come in
groups -- I seem to get one underrun with a black flash and that's it.
This is with just the laptop screen -- nothing at all is plugged into
the USB-C port.

4.8-rc4 has the latest round of fixes applied: i915/skl_dmc_ver1_26.bin
loaded successfully and the SAGV fix is there.

I had the same problem on 4.8-rc3.  4.7 seemed okay.

I have:

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07)

--Andy


[Nouveau] Should I expect nouveau on 4.6 to work on a GM206?

2016-06-26 Thread Andy Lutomirski
On Sun, Jun 26, 2016 at 10:59 AM, Ilia Mirkin  wrote:
> On Sun, Jun 26, 2016 at 1:49 PM, Andy Lutomirski  
> wrote:
>> On Sun, May 29, 2016 at 12:27 PM, Andy Lutomirski  wrote:
>>> On Sun, May 29, 2016 at 12:22 PM, Ilia Mirkin  
>>> wrote:
>>>> On Sun, May 29, 2016 at 3:07 PM, Andy Lutomirski  
>>>> wrote:
>>>>> On Sat, May 28, 2016 at 5:48 PM, Ilia Mirkin  
>>>>> wrote:
>>>>>> Do you have mesa 11.2 or later? GM20x support was only added in mesa 
>>>>>> 11.2.
>>>>>>
>>>>>
>>>>> I just upgraded to 11.2.  I'm getting errors like this in the log:
>>>>>
>>>>> [ 5383.723240] nouveau :09:00.0: fifo: read fault at 011000
>>>>> engine 07 [PBDMA0] client 06 [HOST] reason 00 [PDE] on channel -1
>>>>> [007f9ed000 unknown]
>>>>> [ 5398.722676] nouveau :09:00.0: systemd-logind[30778]: failed to
>>>>> idle channel 2 [systemd-logind[30778]]
>>>>> [ 5413.722853] nouveau :09:00.0: systemd-logind[30778]: failed to
>>>>> idle channel 2 [systemd-logind[30778]]
>>>>>
>>>>> and the display output in general is unreliable enough that I'm having
>>>>> trouble telling whether the performance is remotely reasonable.
>>>>
>>>> If you're having trouble telling, that means it's not :) The error you
>>>> pasted is quite odd. Was there anything in the log before those
>>>> messages? If there's no channel associated, that means that it's the
>>>> background copying between vram and sysmem? Not sure.
>>>
>>> Don't get too excited yet.  In the process of upgrading mesa, I
>>> managed to boot 4.5 without noticing.  I'll post back later today with
>>> actual valid test results.
>>>
>>
>> I replaced the monitor (turns out that my monitor had a known DP
>> problem), and now the screen lights up reliably.  I still get
>
> Great to hear!
>
>> occasional log lines like this:
>>
>> [Jun26 09:25] nouveau :09:00.0: fifo: FB_FLUSH_TIMEOUT
>> [Jun26 09:30] nouveau :09:00.0: fifo: FB_FLUSH_TIMEOUT
>> [Jun26 09:32] nouveau :09:00.0: fifo: CHSW_ERROR 0004
>> [  +0.000162] nouveau :09:00.0: fifo: CHSW_ERROR 0005
>
> These don't sound good at all!
>
>> [Jun26 09:46] nouveau :09:00.0: disp: outp 04:0006:0f44: link
>> training failed
>> [  +0.107894] nouveau :09:00.0: disp: outp 04:0006:0f44: link
>> training failed
>
> These are surprising if your monitor is working. Usually it means
> "couldn't establish link with the monitor". Perhaps something forces
> it to retry and it eventually succeeds.

Given the timing, I'm guessing that it tries a couple of times and
eventually works.

Given that the monitor is a newer, "fixed" revision of a
known-seriously-broken Dell monitor, it wouldn't shock me if what's
actually happening is that the monitor uses buggy DP hardware and the
"fixed" firmware A03 actually works by forcing several retries when
link training fails (as opposed to what A00 - A02 did, which seemed to
involve failing a few times and then crashing, sometimes hard enough
that even the monitor power button stopped working).

>
>>
>> but they aren't causing an obvious problem.
>>
>>>>
>>>> Note that with maxwell we have yet to add EXA support to
>>>> xf86-video-nouveau, so you're ending up with GLAMOR (and Ben and I
>>>> disagree on whether EXA support should be added in the first place).
>>>> There was also an issue that glamor was hitting with nouveau which
>>>> appears to have dissipated, either due to a change in nouveau or a
>>>> change in glamor. So you might consider upgrading to Xorg 1.18.3 (as
>>>> glamor is part of X).
>>
>> I do have a serious performance issue, though: when I scroll in
>> Firefox (default configuration), the whole system drops to ~1fps or
>> less and, if I scroll enough (even putting the mouse over a simple
>> page like start.fedoraproject.org and flicking the wheel up and down a
>> few times), the entire desktop will become unusable for several
>> seconds.  I seem to have this problem under X and under Wayland.
>>
>> For better or for worse, forcing Firefox's layers acceleration on
>> fixes the problem and scrolling is fast.
>>
>> I have no idea whether this is an X problem, a gnome-shell problem, a
>> mesa problem, a kernel problem, or something else.
>
> I believe the issue is with GLAMOR, but I'm not sure - in my

[Nouveau] Should I expect nouveau on 4.6 to work on a GM206?

2016-06-26 Thread Andy Lutomirski
On Sun, May 29, 2016 at 12:27 PM, Andy Lutomirski  wrote:
> On Sun, May 29, 2016 at 12:22 PM, Ilia Mirkin  wrote:
>> On Sun, May 29, 2016 at 3:07 PM, Andy Lutomirski  wrote:
>>> On Sat, May 28, 2016 at 5:48 PM, Ilia Mirkin  
>>> wrote:
>>>> Do you have mesa 11.2 or later? GM20x support was only added in mesa 11.2.
>>>>
>>>
>>> I just upgraded to 11.2.  I'm getting errors like this in the log:
>>>
>>> [ 5383.723240] nouveau :09:00.0: fifo: read fault at 011000
>>> engine 07 [PBDMA0] client 06 [HOST] reason 00 [PDE] on channel -1
>>> [007f9ed000 unknown]
>>> [ 5398.722676] nouveau :09:00.0: systemd-logind[30778]: failed to
>>> idle channel 2 [systemd-logind[30778]]
>>> [ 5413.722853] nouveau :09:00.0: systemd-logind[30778]: failed to
>>> idle channel 2 [systemd-logind[30778]]
>>>
>>> and the display output in general is unreliable enough that I'm having
>>> trouble telling whether the performance is remotely reasonable.
>>
>> If you're having trouble telling, that means it's not :) The error you
>> pasted is quite odd. Was there anything in the log before those
>> messages? If there's no channel associated, that means that it's the
>> background copying between vram and sysmem? Not sure.
>
> Don't get too excited yet.  In the process of upgrading mesa, I
> managed to boot 4.5 without noticing.  I'll post back later today with
> actual valid test results.
>

I replaced the monitor (turns out that my monitor had a known DP
problem), and now the screen lights up reliably.  I still get
occasional log lines like this:

[Jun26 09:25] nouveau :09:00.0: fifo: FB_FLUSH_TIMEOUT
[Jun26 09:30] nouveau :09:00.0: fifo: FB_FLUSH_TIMEOUT
[Jun26 09:32] nouveau :09:00.0: fifo: CHSW_ERROR 0004
[  +0.000162] nouveau :09:00.0: fifo: CHSW_ERROR 0005
[Jun26 09:46] nouveau :09:00.0: disp: outp 04:0006:0f44: link training failed
[  +0.107894] nouveau :09:00.0: disp: outp 04:0006:0f44: link training failed

but they aren't causing an obvious problem.

>>
>> Note that with maxwell we have yet to add EXA support to
>> xf86-video-nouveau, so you're ending up with GLAMOR (and Ben and I
>> disagree on whether EXA support should be added in the first place).
>> There was also an issue that glamor was hitting with nouveau which
>> appears to have dissipated, either due to a change in nouveau or a
>> change in glamor. So you might consider upgrading to Xorg 1.18.3 (as
>> glamor is part of X).

I do have a serious performance issue, though: when I scroll in
Firefox (default configuration), the whole system drops to ~1fps or
less and, if I scroll enough (even putting the mouse over a simple
page like start.fedoraproject.org and flicking the wheel up and down a
few times), the entire desktop will become unusable for several
seconds.  I seem to have this problem under X and under Wayland.

For better or for worse, forcing Firefox's layers acceleration on
fixes the problem and scrolling is fast.

I have no idea whether this is an X problem, a gnome-shell problem, a
mesa problem, a kernel problem, or something else.


DP link training and performance issues with HDMI USB-C dongle and Skylake

2016-06-22 Thread Andy Lutomirski
I have a Dell XPS 13 9350 (Skylake) and a Dell DA200 adapter.  The
latter is a Thunderbolt device that includes an HDMI port and connects
over USB Type C.  I believe that it's internally using DP Alternate
Mode.

When I plug it in on 4.7-rc4, I get spew like this:

[   90.718106] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   91.077604] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   91.437059] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   91.796479] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   92.156101] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   92.515647] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   92.875184] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   93.234735] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   93.594294] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   93.953812] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   94.313390] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   94.673043] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   95.032890] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   95.393016] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   95.752879] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   96.113074] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   96.473068] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   96.833185] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   97.193233] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   97.553138] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   97.913526] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   98.273525] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   98.634178] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   98.993859] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   99.354484] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[   99.714669] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  100.077412] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  100.432684] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  100.792499] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  101.152378] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  101.512265] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  101.872466] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  102.232284] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  102.592251] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  103.111283] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  103.466511] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  103.826082] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  104.191906] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  104.547038] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  104.911264] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  105.270679] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  105.625774] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  105.986064] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  106.350045] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  106.705325] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  107.064897] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  107.431263] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  107.790793] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  108.146016] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  108.506093] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  108.865924] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[  109.225629] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting

[Nouveau] Should I expect nouveau on 4.6 to work on a GM206?

2016-05-29 Thread Andy Lutomirski
On Sun, May 29, 2016 at 12:22 PM, Ilia Mirkin  wrote:
> On Sun, May 29, 2016 at 3:07 PM, Andy Lutomirski  wrote:
>> On Sat, May 28, 2016 at 5:48 PM, Ilia Mirkin  wrote:
>>> Do you have mesa 11.2 or later? GM20x support was only added in mesa 11.2.
>>>
>>
>> I just upgraded to 11.2.  I'm getting errors like this in the log:
>>
>> [ 5383.723240] nouveau :09:00.0: fifo: read fault at 011000
>> engine 07 [PBDMA0] client 06 [HOST] reason 00 [PDE] on channel -1
>> [007f9ed000 unknown]
>> [ 5398.722676] nouveau :09:00.0: systemd-logind[30778]: failed to
>> idle channel 2 [systemd-logind[30778]]
>> [ 5413.722853] nouveau :09:00.0: systemd-logind[30778]: failed to
>> idle channel 2 [systemd-logind[30778]]
>>
>> and the display output in general is unreliable enough that I'm having
>> trouble telling whether the performance is remotely reasonable.
>
> If you're having trouble telling, that means it's not :) The error you
> pasted is quite odd. Was there anything in the log before those
> messages? If there's no channel associated, that means that it's the
> background copying between vram and sysmem? Not sure.

Don't get too excited yet.  In the process of upgrading mesa, I
managed to boot 4.5 without noticing.  I'll post back later today with
actual valid test results.

>
> Note that with maxwell we have yet to add EXA support to
> xf86-video-nouveau, so you're ending up with GLAMOR (and Ben and I
> disagree on whether EXA support should be added in the first place).
> There was also an issue that glamor was hitting with nouveau which
> appears to have dissipated, either due to a change in nouveau or a
> change in glamor. So you might consider upgrading to Xorg 1.18.3 (as
> glamor is part of X).
>
> FWIW a few other people have been using GM20x without incident, but
> this can all be very sensitive to your desktop/etc. Lots of things
> like to use GL nowadays - I stick to a more classic desktop - no
> compositor, simple window manager, etc.

This is GNOME 3 on Fedora 24 Beta.

--Andy


[Nouveau] Should I expect nouveau on 4.6 to work on a GM206?

2016-05-29 Thread Andy Lutomirski
On Sat, May 28, 2016 at 5:48 PM, Ilia Mirkin  wrote:
> Do you have mesa 11.2 or later? GM20x support was only added in mesa 11.2.
>

I just upgraded to 11.2.  I'm getting errors like this in the log:

[ 5383.723240] nouveau :09:00.0: fifo: read fault at 011000 engine 07 [PBDMA0] client 06 [HOST] reason 00 [PDE] on channel -1 [007f9ed000 unknown]
[ 5398.722676] nouveau :09:00.0: systemd-logind[30778]: failed to idle channel 2 [systemd-logind[30778]]
[ 5413.722853] nouveau :09:00.0: systemd-logind[30778]: failed to idle channel 2 [systemd-logind[30778]]

and the display output in general is unreliable enough that I'm having
trouble telling whether the performance is remotely reasonable.

--Andy

> Cheers,
>
>   -ilia
>
> On Sat, May 28, 2016 at 4:51 PM, Andy Lutomirski  wrote:
>> I have the signed firmware (I think) and I'm running a fresh 4.6
>> kernel.  I got an image to show up briefly, rendering the Fedora
>> sign-in screen at something like one frame per ten seconds.  But then
>> I got all kinds of garbage, and I see:
>>
>> [  719.300820] nouveau :09:00.0: disp: outp 04:0006:0f44: link
>> training failed
>>
>> dmesg |grep nouveau says:
>>
>> [   10.053162] fb: switching to nouveaufb from EFI VGA
>> [   10.053349] nouveau :09:00.0: NVIDIA GM206 (126010a1)
>> [   10.174033] nouveau :09:00.0: bios: version 84.06.0d.00.01
>> [   10.174854] nouveau :09:00.0: disp: dcb 15 type 8 unknown
>> [   10.178375] nouveau :09:00.0: fb: 2048 MiB GDDR5
>> [   10.202108] nouveau :09:00.0: DRM: VRAM: 2048 MiB
>> [   10.202109] nouveau :09:00.0: DRM: GART: 1048576 MiB
>> [   10.202113] nouveau :09:00.0: DRM: TMDS table version 2.0
>> [   10.202114] nouveau :09:00.0: DRM: DCB version 4.1
>> [   10.202116] nouveau :09:00.0: DRM: DCB outp 00: 01000f02 00020030
>> [   10.202117] nouveau :09:00.0: DRM: DCB outp 01: 02000f00 
>> [   10.202118] nouveau :09:00.0: DRM: DCB outp 02: 02811f76 04400020
>> [   10.202120] nouveau :09:00.0: DRM: DCB outp 03: 02011f72 00020020
>> [   10.202121] nouveau :09:00.0: DRM: DCB outp 04: 04822f86 04400010
>> [   10.202122] nouveau :09:00.0: DRM: DCB outp 05: 04022f82 00020010
>> [   10.202123] nouveau :09:00.0: DRM: DCB outp 06: 04833f96 04400020
>> [   10.202124] nouveau :09:00.0: DRM: DCB outp 07: 04033f92 00020020
>> [   10.202125] nouveau :09:00.0: DRM: DCB outp 08: 02044f62 00020010
>> [   10.202126] nouveau :09:00.0: DRM: DCB outp 15: 01df5ff8 
>> [   10.202127] nouveau :09:00.0: DRM: DCB conn 00: 1030
>> [   10.202128] nouveau :09:00.0: DRM: DCB conn 01: 00020146
>> [   10.202129] nouveau :09:00.0: DRM: DCB conn 02: 01000246
>> [   10.202130] nouveau :09:00.0: DRM: DCB conn 03: 02000346
>> [   10.202131] nouveau :09:00.0: DRM: DCB conn 04: 00010461
>> [   10.202132] nouveau :09:00.0: DRM: DCB conn 05: 0570
>> [   10.202134] nouveau :09:00.0: DRM: Pointer to flat panel table invalid
>> [   10.214683] nouveau :09:00.0: DRM: unknown connector type 70
>> [   10.214728] nouveau :09:00.0: DRM: failed to create encoder 1/8/0: -19
>> [   10.214730] nouveau :09:00.0: DRM: Unknown-1 has no encoders, removing
>> [   10.369691] nouveau :09:00.0: DRM: MM: using COPY for buffer copies
>> [   10.478561] nouveau :09:00.0: priv: GPC0: 419df4  (1e40820e)
>> [   10.478578] nouveau :09:00.0: priv: GPC1: 419df4  (1e40820e)
>> [   10.607100] nouveau :09:00.0: DRM: allocated 3840x2160 fb:
>> 0x6, bo 88044aad7400
>> [   10.607276] fbcon: nouveaufb (fb0) is primary device
>> [   10.607576] nouveau :09:00.0: fb0: nouveaufb frame buffer device
>> [   10.617064] [drm] Initialized nouveau 1.3.1 20120801 for
>> :09:00.0 on minor 0
>> [  719.282184] nouveau :09:00.0: disp: outp 04:0006:0f44: link
>> training failed
>> [  719.300820] nouveau :09:00.0: disp: outp 04:0006:0f44: link
>> training failed
>>
>>
>>
>> Thanks,
>> Andy
>> ___
>> Nouveau mailing list
>> Nouveau at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/nouveau


Should I expect nouveau on 4.6 to work on a GM206?

2016-05-28 Thread Andy Lutomirski
I have the signed firmware (I think) and I'm running a fresh 4.6
kernel.  I got an image to show up briefly, rendering the Fedora
sign-in screen at something like one frame per ten seconds.  But then
I got all kinds of garbage, and I see:

[  719.300820] nouveau :09:00.0: disp: outp 04:0006:0f44: link training failed

dmesg |grep nouveau says:

[   10.053162] fb: switching to nouveaufb from EFI VGA
[   10.053349] nouveau :09:00.0: NVIDIA GM206 (126010a1)
[   10.174033] nouveau :09:00.0: bios: version 84.06.0d.00.01
[   10.174854] nouveau :09:00.0: disp: dcb 15 type 8 unknown
[   10.178375] nouveau :09:00.0: fb: 2048 MiB GDDR5
[   10.202108] nouveau :09:00.0: DRM: VRAM: 2048 MiB
[   10.202109] nouveau :09:00.0: DRM: GART: 1048576 MiB
[   10.202113] nouveau :09:00.0: DRM: TMDS table version 2.0
[   10.202114] nouveau :09:00.0: DRM: DCB version 4.1
[   10.202116] nouveau :09:00.0: DRM: DCB outp 00: 01000f02 00020030
[   10.202117] nouveau :09:00.0: DRM: DCB outp 01: 02000f00 
[   10.202118] nouveau :09:00.0: DRM: DCB outp 02: 02811f76 04400020
[   10.202120] nouveau :09:00.0: DRM: DCB outp 03: 02011f72 00020020
[   10.202121] nouveau :09:00.0: DRM: DCB outp 04: 04822f86 04400010
[   10.202122] nouveau :09:00.0: DRM: DCB outp 05: 04022f82 00020010
[   10.202123] nouveau :09:00.0: DRM: DCB outp 06: 04833f96 04400020
[   10.202124] nouveau :09:00.0: DRM: DCB outp 07: 04033f92 00020020
[   10.202125] nouveau :09:00.0: DRM: DCB outp 08: 02044f62 00020010
[   10.202126] nouveau :09:00.0: DRM: DCB outp 15: 01df5ff8 
[   10.202127] nouveau :09:00.0: DRM: DCB conn 00: 1030
[   10.202128] nouveau :09:00.0: DRM: DCB conn 01: 00020146
[   10.202129] nouveau :09:00.0: DRM: DCB conn 02: 01000246
[   10.202130] nouveau :09:00.0: DRM: DCB conn 03: 02000346
[   10.202131] nouveau :09:00.0: DRM: DCB conn 04: 00010461
[   10.202132] nouveau :09:00.0: DRM: DCB conn 05: 0570
[   10.202134] nouveau :09:00.0: DRM: Pointer to flat panel table invalid
[   10.214683] nouveau :09:00.0: DRM: unknown connector type 70
[   10.214728] nouveau :09:00.0: DRM: failed to create encoder 1/8/0: -19
[   10.214730] nouveau :09:00.0: DRM: Unknown-1 has no encoders, removing
[   10.369691] nouveau :09:00.0: DRM: MM: using COPY for buffer copies
[   10.478561] nouveau :09:00.0: priv: GPC0: 419df4  (1e40820e)
[   10.478578] nouveau :09:00.0: priv: GPC1: 419df4  (1e40820e)
[   10.607100] nouveau :09:00.0: DRM: allocated 3840x2160 fb: 0x6, bo 88044aad7400
[   10.607276] fbcon: nouveaufb (fb0) is primary device
[   10.607576] nouveau :09:00.0: fb0: nouveaufb frame buffer device
[   10.617064] [drm] Initialized nouveau 1.3.1 20120801 for :09:00.0 on minor 0
[  719.282184] nouveau :09:00.0: disp: outp 04:0006:0f44: link training failed
[  719.300820] nouveau :09:00.0: disp: outp 04:0006:0f44: link training failed



Thanks,
Andy


i915 4.5 bugfix backport and release management issue?

2016-03-29 Thread Andy Lutomirski
On Tue, Mar 29, 2016 at 12:49 AM, Andy Lutomirski  
wrote:
> On Tue, Mar 29, 2016 at 12:43 AM, Daniel Vetter  
> wrote:
>> On Tue, Mar 29, 2016 at 4:39 AM, Andy Lutomirski  
>> wrote:
>>> AFAICT something got rather screwed up in i915 land for 4.5.
>>>
>>> $ git log --oneline --grep='Pretend cursor is always on' v4.5
>>> drivers/gpu/drm/i915/
>>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>>> calculations (v2)
>>>
>>> $ git log --oneline --grep='Pretend cursor is always on' v4.6-rc1
>>> drivers/gpu/drm/i915/
>>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>>> calculations (v2)
>>> b2435692dbb7 drm/i915: Pretend cursor is always on for ILK-style WM
>>> calculations (v2)
>>>
>>> The two patches there are almost, but not quite, the same thing, which
>>> makes me wonder how they both ended up in Linus' tree without an
>>> obvious merge conflict.
>>>
>>> I have no idea what caused this.  However, I think (on very little
>>> inspection, but it's consistent with problems I have with 4.5 on my
>>> laptop) that the first one is an *incorrect* fix for a regression in
>>> 4.5 and the second is a correct fix for the same regression.  4.6-rc1
>>> seems okay.
>>>
>>> I reported the regression and everyone involved has known about it for
>>> weeks.  Nonetheless, 4.5 final is busted.
>>
>> Quoting from e2e407dc093f
>>
>> "(cherry picked from commit b2435692dbb709d4c8ff3b2f2815c9b8423b72bb)"
>>
>> i.e. this is intentionally twice in the history. We started to soak
>> bugfixes in -next and then cherry pick them because we had too much
>> fun with things blowing up, and also too much fun with really messy
>> conflicts. It's not a botched patch in 4.5 or anything else nefarious
>> at all.
>
> Bah, sorry, I read it wrong.  They have the same final state but they
> were on different bases.  I somehow reversed this in my head and
> thought they had the same initial state and different final states.
>

Also, sorry for the excessive diatribe.  I plead sleepiness and
misreading of code.

--Andy
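Daniel's point that e2e407dc093f is an intentional `git cherry-pick -x` of b2435692dbb7 also answers the "no obvious merge conflict" question. The following is a minimal, simplified sketch, assuming only a POSIX shell and git; the throwaway repository, the `fixes` branch, the default branch standing in for -next, the file `wm.c` and the commit messages are all made up and are not the real i915 trees. It applies a fix on one branch, cherry-picks it to another with `-x`, then merges the first branch back in.

```shell
#!/bin/sh
# Sketch: how one fix can legitimately appear twice in history.
# Hypothetical repo, branch names, file and messages (not the i915 trees).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

echo "old watermark code" > wm.c
git add wm.c
git commit -q -m "base"
next=$(git symbolic-ref --short HEAD)   # default branch stands in for -next
git branch fixes                        # stands in for the -fixes branch

# 1. The fix is first applied ("soaked") on -next.
echo "fixed watermark code" > wm.c
git commit -q -am "Pretend cursor is always on (v2)"
orig=$(git rev-parse HEAD)

# 2. It is cherry-picked to -fixes; -x appends
#    "(cherry picked from commit <id>)" to the new commit message.
git checkout -q fixes
git cherry-pick -x "$orig" > /dev/null

# 3. Later, -next is merged as well. Both sides made the identical change
#    to wm.c, so the three-way merge is clean: there is no conflict to notice.
git merge -q --no-edit "$next"

# Two commits with the same subject; only one carries the trailer.
git log --oneline --grep='Pretend cursor is always on' fixes
git log --format=%b --grep='Pretend cursor is always on' fixes | grep 'cherry picked from commit'
```

The final `git log` lists two commits with the same subject, much like the v4.6-rc1 output quoted in this thread. The merge is clean because both sides end in the same content for `wm.c`; only the trailer written by `-x` ties the two commits together.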


i915 4.5 bugfix backport and release management issue?

2016-03-29 Thread Andy Lutomirski
On Tue, Mar 29, 2016 at 12:43 AM, Daniel Vetter  
wrote:
> On Tue, Mar 29, 2016 at 4:39 AM, Andy Lutomirski  
> wrote:
>> AFAICT something got rather screwed up in i915 land for 4.5.
>>
>> $ git log --oneline --grep='Pretend cursor is always on' v4.5
>> drivers/gpu/drm/i915/
>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>> calculations (v2)
>>
>> $ git log --oneline --grep='Pretend cursor is always on' v4.6-rc1
>> drivers/gpu/drm/i915/
>> e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM
>> calculations (v2)
>> b2435692dbb7 drm/i915: Pretend cursor is always on for ILK-style WM
>> calculations (v2)
>>
>> The two patches there are almost, but not quite, the same thing, which
>> makes me wonder how they both ended up in Linus' tree without an
>> obvious merge conflict.
>>
>> I have no idea what caused this.  However, I think (on very little
>> inspection, but it's consistent with problems I have with 4.5 on my
>> laptop) that the first one is an *incorrect* fix for a regression in
>> 4.5 and the second is a correct fix for the same regression.  4.6-rc1
>> seems okay.
>>
>> I reported the regression and everyone involved has known about it for
>> weeks.  Nonetheless, 4.5 final is busted.
>
> Quoting from e2e407dc093f
>
> "(cherry picked from commit b2435692dbb709d4c8ff3b2f2815c9b8423b72bb)"
>
> i.e. this is intentionally twice in the history. We started to soak
> bugfixes in -next and then cherry pick them because we had too much
> fun with things blowing up, and also too much fun with really messy
> conflicts. It's not a botched patch in 4.5 or anything else nefarious
> at all.

Bah, sorry, I read it wrong.  They have the same final state but they
were on different bases.  I somehow reversed this in my head and
thought they had the same initial state and different final states.

>
> - We've genuinely failed to cherry-pick a bugfix over. It happens,
> despite our best efforts (which of course includes running stuff on
> Linus' tree). Please do a reverse bisect so we know which precise
> commit fell through the cracks.

If I find some time, I'll try.  I've already failed miserably at
bisecting this thing once.

--Andy


i915 4.5 bugfix backport and release management issue?

2016-03-28 Thread Andy Lutomirski
Hi all-

AFAICT something got rather screwed up in i915 land for 4.5.

$ git log --oneline --grep='Pretend cursor is always on' v4.5 drivers/gpu/drm/i915/
e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)

$ git log --oneline --grep='Pretend cursor is always on' v4.6-rc1 drivers/gpu/drm/i915/
e2e407dc093f drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
b2435692dbb7 drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)

The two patches there are almost, but not quite, the same thing, which
makes me wonder how they both ended up in Linus' tree without an
obvious merge conflict.

I have no idea what caused this.  However, I think (on very little
inspection, but it's consistent with problems I have with 4.5 on my
laptop) that the first one is an *incorrect* fix for a regression in
4.5 and the second is a correct fix for the same regression.  4.6-rc1
seems okay.

I reported the regression and everyone involved has known about it for
weeks.  Nonetheless, 4.5 final is busted.

Can you please:

a) figure out what happened and send a backport of whatever needs to
be backported to stable@vger.kernel.org.

b) do whatever needs to be done so this doesn't happen again

c) teach the i915 CI system to test Linus' tree as-is in addition to
the development trees.  Linus' tree and the versions of i915 in actual
released versions of Linux are supposed to work.


I hate to nag, but this is at least the third time I've noticed weird
release management issues in i915.  I tripped on a regression a few
releases ago that was known to the i915 team and fixed, but the fix
wasn't actually queued up for the current release.  Before that, I
tripped on a regression caused by an intentional behavior change that
was folded into a merge commit, making it essentially impossible to
bisect and pointlessly hard to understand what was going on
even once I found the offending code.

Thanks,
Andy


[Intel-gfx] Possible 4.5 i915 Skylake regression

2016-03-13 Thread Andy Lutomirski
On Wed, Feb 17, 2016 at 8:18 AM, Daniel Vetter  wrote:
> On Tue, Feb 16, 2016 at 09:26:35AM -0800, Andy Lutomirski wrote:
>> On Tue, Feb 16, 2016 at 9:12 AM, Andy Lutomirski  
>> wrote:
>> > On Tue, Feb 16, 2016 at 8:12 AM, Daniel Vetter  wrote:
>> >> On Mon, Feb 15, 2016 at 06:58:33AM -0800, Andy Lutomirski wrote:
>> >>> On Sun, Feb 14, 2016 at 6:59 PM, Andy Lutomirski  
>> >>> wrote:
>> >>> > Hi-
>> >>> >
>> >>> > On 4.5-rc3 on a Dell XPS 13 9350 (Skylake i915, no nvidia on this
>> >>> > model), shortly after resume, I saw a single black flash on the
>> >>> > screen.  The log said:
>> >>> >
>> >>> > [Feb13 07:05] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR*
>> >>> > CPU pipe A FIFO underrun
>> >>> >
>> >>> > I haven't seen this on 4.4.
>> >>> >
>> >>> > I'd be happy to dig up debugging info, but I don't know what would be
>> >>> > useful.  I have no i915 module options set.
>> >>>
>> >>> It's flashing quite frequently now, although I seem to get the
>> >>> underrun warning only once per resume.
>> >>
>> >> We shut up the warning irq source to avoid hijacking an entire cpu core
>> >> ;-)
>> >>
>> >> There's a fix from Matt right after 4.5-rc4 in Linus' branch. I'm hoping
>> >> that should help.
>> >
>> > Do you mean:
>> >
>> > commit e2e407dc093f530b771ee8bf8fe1be41e3cea8b3
>> > Author: Matt Roper 
>> > Date:   Mon Feb 8 11:05:28 2016 -0800
>> >
>> > drm/i915: Pretend cursor is always on for ILK-style WM calculations 
>> > (v2)
>> >
>> > If so, it didn't help.  I'm currently doing a full rebuild just in
>> > case I messed something up, though.
>> >
>>
>> Definitely not fixed.  It seems to be okay after a reboot until the
>> first suspend/resume.
>>
>> This happened after resuming.  Five cents says it's the root cause.
>
> That's interesting, but doesn't ring a bell unfortunately. Can you try to
> attempt a bisect?
>

I'm giving up on my attempt to bisect for now.  After a bunch of false
starts to avoid this crap, I'm stuck at
651174a4a0ccaf41e14fadc4bc525d61ae7f7b18, which is based on 4.3-rc3
and doesn't merge cleanly up to 4.4.  It's also annoying because it
reproduces reasonably quickly but not instantaneously, and I can never
reproduce it before a suspend/resume, so my bisection attempts are
full of errors.

--Andy

> Thanks, Daniel
>
>>
>> [  160.361200] WARNING: CPU: 2 PID: 2512 at
>> drivers/gpu/drm/i915/intel_uncore.c:599
>> hsw_unclaimed_reg_debug+0x69/0x90 [i915]()
>> [  160.361209] Unclaimed register detected before writing to register 0x20a8
>> [  160.361213] Modules linked in: rfcomm fuse ccm cmac xt_CHECKSUM
>> ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
>> nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>> xt_conntrack ebtable_filter ebtable_nat ebtable_broute bridge stp llc
>> ebtables ip6table_raw ip6table_mangle ip6table_security ip6table_nat
>> nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter
>> ip6_tables iptable_raw iptable_mangle iptable_security iptable_nat
>> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bnep
>> arc4 iwlmvm mac80211 snd_hda_codec_hdmi snd_hda_codec_realtek
>> hid_multitouch snd_hda_codec_generic iwlwifi snd_hda_intel intel_rapl
>> snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hwdep
>> cfg80211 snd_hda_core kvm snd_seq uvcvideo snd_seq_device
>> i2c_designware_platform
>> [  160.361385]  i2c_designware_core btusb snd_pcm videobuf2_vmalloc
>> wmi_mof vfat dell_wmi fat videobuf2_memops btrtl btbcm btintel
>> bluetooth dell_laptop dell_smbios dcdbas videobuf2_v4l2 snd_timer
>> videobuf2_core rtsx_pci_ms snd irqbypass videodev memstick
>> ghash_clmulni_intel joydev mei_me efi_pstore mei i2c_i801 soundcore
>> efivars pcspkr idma64 shpchp virt_dma media rfkill intel_lpss_pci
>> processor_thermal_device intel_soc_dts_iosf wmi acpi_als kfifo_buf
>> int3403_thermal tpm_tis industrialio pinctrl_sunrisepoint tpm
>> intel_hid int3400_thermal pinctrl_intel intel_lpss_acpi sparse_keymap
>> int340x_thermal_zone acpi_thermal_rel intel_lpss nfsd acpi_pad
>> auth_rpcgss nfs_acl lockd binfmt_misc grace sunrpc dm_crypt i915
>> i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
>> fb_sys_fops drm rtsx_pci_sdmmc
>> [  160.

[Intel-gfx] Possible 4.5 i915 Skylake regression

2016-03-11 Thread Andy Lutomirski
On Mon, Feb 22, 2016 at 7:13 PM, Andy Lutomirski  wrote:
> On Wed, Feb 17, 2016 at 5:36 PM, Andy Lutomirski  
> wrote:
>> On Wed, Feb 17, 2016 at 8:18 AM, Daniel Vetter  wrote:
>>> On Tue, Feb 16, 2016 at 09:26:35AM -0800, Andy Lutomirski wrote:
>>>> On Tue, Feb 16, 2016 at 9:12 AM, Andy Lutomirski  
>>>> wrote:
>>>> > On Tue, Feb 16, 2016 at 8:12 AM, Daniel Vetter  
>>>> > wrote:
>>>> >> On Mon, Feb 15, 2016 at 06:58:33AM -0800, Andy Lutomirski wrote:
>>>> >>> On Sun, Feb 14, 2016 at 6:59 PM, Andy Lutomirski  
>>>> >>> wrote:
>>>> >>> > Hi-
>>>> >>> >
>>>> >>> > On 4.5-rc3 on a Dell XPS 13 9350 (Skylake i915, no nvidia on this
>>>> >>> > model), shortly after resume, I saw a single black flash on the
>>>> >>> > screen.  The log said:
>>>> >>> >
>>>> >>> > [Feb13 07:05] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] 
>>>> >>> > *ERROR*
>>>> >>> > CPU pipe A FIFO underrun
>>>> >>> >
>>>> >>> > I haven't seen this on 4.4.
>>>> >>> >
>>>> >>> > I'd be happy to dig up debugging info, but I don't know what would be
>>>> >>> > useful.  I have no i915 module options set.
>>>> >>>
>>>> >>> It's flashing quite frequently now, although I seem to get the
>>>> >>> underrun warning only once per resume.
>>>> >>
>>>> >> We shut up the warning irq source to avoid hijacking an entire cpu core
>>>> >> ;-)
>>>> >>
>>>> >> There's a fix from Matt right after 4.5-rc4 in Linus' branch. I'm hoping
>>>> >> that should help.
>>>> >
>>>> > Do you mean:
>>>> >
>>>> > commit e2e407dc093f530b771ee8bf8fe1be41e3cea8b3
>>>> > Author: Matt Roper 
>>>> > Date:   Mon Feb 8 11:05:28 2016 -0800
>>>> >
>>>> > drm/i915: Pretend cursor is always on for ILK-style WM calculations 
>>>> > (v2)
>>>> >
>>>> > If so, it didn't help.  I'm currently doing a full rebuild just in
>>>> > case I messed something up, though.
>>>> >
>>>>
>>>> Definitely not fixed.  It seems to be okay after a reboot until the
>>>> first suspend/resume.
>>>>
>>>> This happened after resuming.  Five cents says it's the root cause.
>>>
>>> That's interesting, but doesn't ring a bell unfortunately. Can you try to
>>> attempt a bisect?
>>
>> I probably can, but it's very slow.  Is there a reasonably
>> straightforward way to instrument the watermark computation to see
>> what's going wrong?  I'm reasonably confident that the bug is in the
>> resume code or in something that only happens on resume, since I still
>> haven't seen underruns after rebooting before suspending.
>>
>
> With some instrumentation applied, I got this:
>
> [  369.471064] skl_update_wm(crtc-0): computed update
> [  369.471072] skl_update_other_pipe_wm(crtc-0): no change
> [  369.471075] skl_write_wm_values...
> [  369.471078]  CRTC crtc-0 pipe A
> [  369.471083]   wm_linetime = 121
> [  369.471086]   plane_wm level 0 plane 0 = 2147500036
> [  369.471090]   plane_wm level 0 plane 1 = 0
> [  369.471094]   plane_wm level 0 cursor = 2147500036
> [  369.471097]   plane_wm level 1 plane 0 = 2147516439
> [  369.471101]   plane_wm level 1 plane 1 = 0
> [  369.471104]   plane_wm level 1 cursor = 2147516439
> [  369.471108]   plane_wm level 2 plane 0 = 2147516448
> [  369.47]   plane_wm level 2 plane 1 = 0
> [  369.471115]   plane_wm level 2 cursor = 0
> [  369.471118]   plane_wm level 3 plane 0 = 2147532837
> [  369.471121]   plane_wm level 3 plane 1 = 0
> [  369.471125]   plane_wm level 3 cursor = 0
> [  369.471128]   plane_wm level 4 plane 0 = 2147565639
> [  369.471131]   plane_wm level 4 plane 1 = 0
> [  369.471135]   plane_wm level 4 cursor = 0
> [  369.471138]   plane_wm level 5 plane 0 = 2147582038
> [  369.471141]   plane_wm level 5 plane 1 = 0
> [  369.471145]   plane_wm level 5 cursor = 0
> [  369.471148]   plane_wm level 6 plane 0 = 2147582044
> [  369.471151]   plane_wm level 6 plane 1 = 0
> [  369.471155]   plane_wm level 6 cursor = 0
> [  369.471158]   plane_wm level 7 plane 0 = 2147598443
> [  369.471161]   plane_wm level 7 plane 1 = 0
> [  369.47116

[PATCH v1 00/12] PCI: Rework shadow ROM handling

2016-03-11 Thread Andy Lutomirski
On Fri, Mar 11, 2016 at 3:29 PM, Bjorn Helgaas  wrote:
> On Fri, Mar 11, 2016 at 01:16:09PM -0800, Andy Lutomirski wrote:
>> On Tue, Mar 8, 2016 at 9:45 AM, Bjorn Helgaas  wrote:
>> > On Thu, Mar 03, 2016 at 10:53:50AM -0600, Bjorn Helgaas wrote:
>> >> The purpose of this series is to:
>> >>
>> >>   - Fix the "BAR 6: [??? 0x flags 0x2] has bogus alignment"
>> >> messages reported by Linus [1], Andy [2], and others.
>> >>
>> >>   - Move arch-specific shadow ROM location knowledge, e.g.,
>> >> 0xC-0xD, from PCI core to arch code.
>> >>
>> >>   - Fix the ia64 and MIPS Loongson 3 oddity of keeping virtual
>> >> addresses in shadow ROM struct resource (resources should always
>> >> contain *physical* addresses).
>> >>
>> >>   - Remove now-unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
>> >> flags.
>> >>
>> >> This series is based on v4.5-rc1, and it's available on my
>> >> pci/resource git branch (along with a couple tiny unrelated patches)
>> >> at [3].
>> >>
>> >> Bjorn
>> >>
>> >>
>> >> [1] 
>> >> http://lkml.kernel.org/r/CA+55aFyVMfTBB0oz_yx8+eQOEJnzGtCsYSj9QuhEpdZ9BHdq5A
>> >>  at mail.gmail.com
>> >> [2] 
>> >> http://lkml.kernel.org/r/CALCETrV+RwNPzxyL8UVNsrAGu-6cCzD_Cc9PFJT2NCTJPLZZiw
>> >>  at mail.gmail.com
>> >> [3] 
>> >> https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=pci/resource
>> >>
>> >>
>> >> ---
>> >>
>> >> Bjorn Helgaas (12):
>> >>   PCI: Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED
>> >>   PCI: Don't assign or reassign immutable resources
>> >>   PCI: Don't enable/disable ROM BAR if we're using a RAM shadow copy
>> >>   PCI: Set ROM shadow location in arch code, not in PCI core
>> >>   PCI: Clean up pci_map_rom() whitespace
>> >>   ia64/PCI: Use temporary struct resource * to avoid repetition
>> >>   ia64/PCI: Use ioremap() instead of open-coded equivalent
>> >>   ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM 
>> >> resource
>> >>   MIPS: Loongson 3: Use temporary struct resource * to avoid 
>> >> repetition
>> >>   MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in 
>> >> shadow ROM resource
>> >>   PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
>> >>   PCI: Simplify sysfs ROM cleanup
>> >>
>> >>
>> >>  arch/ia64/pci/fixup.c              |   21 +++--
>> >>  arch/ia64/sn/kernel/io_acpi_init.c |   22 ++
>> >>  arch/ia64/sn/kernel/io_init.c      |   51 --
>> >>  arch/mips/pci/fixup-loongson3.c    |   19 +---
>> >>  arch/x86/pci/fixup.c               |   21 +++--
>> >>  drivers/pci/pci-sysfs.c            |   13 +-
>> >>  drivers/pci/remove.c               |    1
>> >>  drivers/pci/rom.c                  |   83 +++-
>> >>  drivers/pci/setup-res.c            |    6 +++
>> >>  include/linux/ioport.h             |    4 --
>> >>  10 files changed, 111 insertions(+), 130 deletions(-)
>> >
>> > I applied this series to pci/resource for v4.6.
>>
>> This gets rid of all the warnings for me until I try to read my i915
>> device's rom using sysfs.  Then I get:
>>
>> i915 :00:02.0: Invalid PCI ROM header signature: expecting 0xaa55,
>> got 0x
>>
>> So I suspect that something is still subtly wrong -- I'd imagine that
>> this should either work or the initialization code should detect that
>> there is no usable ROM and not expose it.
>>
>> (To be clear, there's no regression here.)
>
> Hmmm.  Thanks for testing this.  As you say, I think this is the way
> it's always been, but it does seem non-intuitive.
>
> That "Invalid PCI ROM header signature" warning comes from
> pci_get_rom_size().  We don't call that at enumeration-time; we only
> call it later when somebody tries to read the ROM via sysfs:
>
>   pci_bus_add_device
>     pci_fixup_device(pci_fixup_final)
>       pci_fixup_video # final fixup
>         res->flags = MEM | SHADOW | PCI_FIXED
>     pci_create_sysfs_dev_files
>       if (SHADOW)
> 

[PATCH v1 00/12] PCI: Rework shadow ROM handling

2016-03-11 Thread Andy Lutomirski
On Tue, Mar 8, 2016 at 9:45 AM, Bjorn Helgaas  wrote:
> On Thu, Mar 03, 2016 at 10:53:50AM -0600, Bjorn Helgaas wrote:
>> The purpose of this series is to:
>>
>>   - Fix the "BAR 6: [??? 0x flags 0x2] has bogus alignment"
>> messages reported by Linus [1], Andy [2], and others.
>>
>>   - Move arch-specific shadow ROM location knowledge, e.g.,
>> 0xC-0xD, from PCI core to arch code.
>>
>>   - Fix the ia64 and MIPS Loongson 3 oddity of keeping virtual
>> addresses in shadow ROM struct resource (resources should always
>> contain *physical* addresses).
>>
>>   - Remove now-unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
>> flags.
>>
>> This series is based on v4.5-rc1, and it's available on my
>> pci/resource git branch (along with a couple tiny unrelated patches)
>> at [3].
>>
>> Bjorn
>>
>>
>> [1] 
>> http://lkml.kernel.org/r/CA+55aFyVMfTBB0oz_yx8+eQOEJnzGtCsYSj9QuhEpdZ9BHdq5A 
>> at mail.gmail.com
>> [2] 
>> http://lkml.kernel.org/r/CALCETrV+RwNPzxyL8UVNsrAGu-6cCzD_Cc9PFJT2NCTJPLZZiw 
>> at mail.gmail.com
>> [3] 
>> https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=pci/resource
>>
>>
>> ---
>>
>> Bjorn Helgaas (12):
>>   PCI: Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED
>>   PCI: Don't assign or reassign immutable resources
>>   PCI: Don't enable/disable ROM BAR if we're using a RAM shadow copy
>>   PCI: Set ROM shadow location in arch code, not in PCI core
>>   PCI: Clean up pci_map_rom() whitespace
>>   ia64/PCI: Use temporary struct resource * to avoid repetition
>>   ia64/PCI: Use ioremap() instead of open-coded equivalent
>>   ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM 
>> resource
>>   MIPS: Loongson 3: Use temporary struct resource * to avoid repetition
>>   MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow 
>> ROM resource
>>   PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
>>   PCI: Simplify sysfs ROM cleanup
>>
>>
>>  arch/ia64/pci/fixup.c              |   21 +++--
>>  arch/ia64/sn/kernel/io_acpi_init.c |   22 ++
>>  arch/ia64/sn/kernel/io_init.c      |   51 --
>>  arch/mips/pci/fixup-loongson3.c    |   19 +---
>>  arch/x86/pci/fixup.c               |   21 +++--
>>  drivers/pci/pci-sysfs.c            |   13 +-
>>  drivers/pci/remove.c               |    1
>>  drivers/pci/rom.c                  |   83 +++-
>>  drivers/pci/setup-res.c            |    6 +++
>>  include/linux/ioport.h             |    4 --
>>  10 files changed, 111 insertions(+), 130 deletions(-)
>
> I applied this series to pci/resource for v4.6.

This gets rid of all the warnings for me until I try to read my i915
device's rom using sysfs.  Then I get:

i915 :00:02.0: Invalid PCI ROM header signature: expecting 0xaa55,
got 0x

So I suspect that something is still subtly wrong -- I'd imagine that
this should either work or the initialization code should detect that
there is no usable ROM and not expose it.

(To be clear, there's no regression here.)


[Intel-gfx] i915 Skylake: "Invalid ROM contents"

2016-02-29 Thread Andy Lutomirski
On Sun, Jan 10, 2016 at 11:12 AM, Andy Lutomirski  
wrote:
> On Sun, Jan 10, 2016 at 10:41 AM, Andy Lutomirski  
> wrote:
>> On Wed, Nov 18, 2015 at 8:12 AM, Daniel Stone  
>> wrote:
>>> Hi,
>>>
>>> On 18 November 2015 at 15:59, Andy Lutomirski  
>>> wrote:
>>>> On Wed, Nov 18, 2015 at 2:59 AM, Ville Syrjälä
>>>>  wrote:
>>>>> On Tue, Nov 17, 2015 at 11:43:25AM -0800, Andy Lutomirski wrote:
>>>>>> Typing:
>>>>>>
>>>>>> # cat /sys/devices/pci:00/:00:02.0/rom
>>>>>>
>>>>>> Provokes:
>>>>>>
>>>>>> i915 :00:02.0: Invalid ROM contents
>>>>>
>>>>> Hmm. So there's no PCI option ROM there. I wonder what is there. I
>>>>> get the same on my Braswell BTW. I tried to look through the UEFI
>>>>> spec a bit, and it seems to say that even for non-legacy option ROMs
>>>>> the 0x55aa signature should be there.
>>>>>
>>>>> But this being the GPU means we may be using the shadow ROM stuff,
>>>>> which IIRC assumes that the shadow is at 0xc000. I'm not sure that
>>>>> holds anymore with UEFI, and maybe we should be using some UEFI
>>>>> trick instead to find out where it actually lives?
>>>>>
>>>>> BTW what does 'lspci -vv -s 00:02.0' say on your machine?
>>>>>
>>>>
>>>> 00:02.0 VGA compatible controller: Intel Corporation Sky Lake
>>>> Integrated Graphics (rev 07) (prog-if 00 [VGA controller])
>>>> DeviceName:  Onboard IGD
>>>> Subsystem: Dell Device 0704
>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>> SERR-
>>>> Latency: 0
>>>> Interrupt: pin A routed to IRQ 128
>>>> Region 0: Memory at db00 (64-bit, non-prefetchable) [size=16M]
>>>> Region 2: Memory at 9000 (64-bit, prefetchable) [size=256M]
>>>> Region 4: I/O ports at f000 [size=64]
>>>> Expansion ROM at  [disabled]
>>>
>>> UEFI has an option to enable option ROMs, which is disabled by
>>> default; I wonder if having it disabled prevents all access to the
>>> ROM.
>>>
>>> Mind you, it doesn't seem to be fatal; I've not had any issues with
>>> the same machine that I can pin down to lack of ROM.
>>>
>>
>> FWIW, my logs also get spammed with:
>>
>> [  127.101881] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
>> has bogus alignment
>>
>> I suspect that the PCI core is just failing to recognize that the ROM
>> is disabled.
>>
>
> A bit more info:
>
> I think I only get this error when suspending for the second time
> after boot.  No clue why.
>
> I instrumented the code a bit.  At the time of that error, res->flags
> == 0x2.  It's probably not a coincidence that:
>
> #define IORESOURCE_ROM_SHADOW   (1<<1)  /* ROM is copy at C000:0 */
>
> Should pci_fixup_video check that the resource exists in the first
> place before setting flags on it?

*ping*

Hi, PCI people.

--Andy


[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-12 Thread Andy Lutomirski
On Tue, Jan 12, 2016 at 6:06 PM, Linus Torvalds
 wrote:
> On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson  
> wrote:
>>
>> The double clflush() remains a mystery.
>
> Actually, I think it's explainable.
>
> It's wrong to do the clflush *after* the GPU has done the write, which
> seems to be what you are doing.
>
> Why?
>
> If the GPU really isn't cache coherent, what can happen is:
>
>  - the CPU has the line cached
>
>  - the GPU writes the data
>
>  - you do the clflushopt to invalidate the cacheline
>
>  - you expect to see the GPU data.
>
> Right?
>
> Wrong. The above is complete crap.
>
> Why?
>
> Very simple reason: the CPU may have had the cacheline dirty at some
> level in its caches, so when you did the clflushopt, it didn't just
> invalidate the CPU cacheline, it wrote it back to memory. And in the
> process over-wrote the data that the GPU had written.
>
> Now you can say "but the CPU never wrote to the cacheline, so it's not
> dirty in the CPU caches". That may or may not be true. The CPU may
> have written to it quite a long time ago.
>
> So if you are doing a GPU write, and you want to see the data that the
> GPU wrote, you had better do the clflushopt long *before* the GPU ever
> writes to memory.
>
> Your pattern of doing "flush and read" is simply fundamentally buggy.
> There are only two valid CPU flushing patterns:
>
>  - write and flush (to make the writes visible to the GPU)
>
>  - flush before starting GPU accesses, and then read
>
> At no point can "flush and read" be right.
>
> Now, I haven't actually seen your code, so I'm just going by your
> high-level description of where the CPU flush and CPU read were done,
> but it *sounds* like you did that invalid "flush and read" behavior.

Since barriers are on my mind: how strong a barrier is needed to
prevent cache fills from being speculated across the barrier?

Concretely, if I do:

clflush A
clflush B
load A
load B

the architecture guarantees that (unless store forwarding happens) the
value I see for B is at least as new as the value I see for A *with
respect to other access within the coherency domain*.  But the GPU
isn't in the coherency domain at all.

Is:

clflush A
clflush B
load A
MFENCE
load B

good enough?  If it is, and if

clflush A
clflush B
load A
LOCK whatever
load B

is *not*, then this might account for the performance difference.

In any event, it seems to me that what i915 is trying to do isn't
really intended to be supported for WB memory.  i915 really wants to
force a read from main memory and to simultaneously prevent the CPU
from writing back to main memory.  Ick.  I'd assume that:

clflush A
clflush B
load A
serializing instruction here
load B

is good enough, as long as you make sure that the GPU does its writes
after the clflushes make it all the way out to main memory (which
might require a second serializing instruction in the case of
clflushopt), but this still relies on the hardware prefetcher not
prefetching B too early, which it's permitted to do even in the
absence of any explicit access at all.

Presumably this is good enough on any implementation:

clflush A
clflush B
load A
clflush B
load B

But that will be really, really slow.  And you're still screwed if the
hardware is permitted to arbitrarily change cache lines from S to M.

In other words, I'm not really convinced that x86 was ever intended to
have well-defined behavior if something outside the coherency domain
writes to a page of memory while that page is mapped WB.  Of course,
I'm also not sure how to reliably switch a page from WB to any other
memory type short of remapping it and doing CLFLUSH after remapping.

SDM Volume 3 11.12.4 seems to agree with me.

Could the driver be changed to use WC or UC and to use MOVNTDQA on
supported CPUs to get the performance back?  It sounds like i915 is
effectively doing PIO here, and reasonably modern CPUs have a nice set
of fast PIO instructions.

--Andy


[Intel-gfx] i915 Skylake: "Invalid ROM contents"

2016-01-10 Thread Andy Lutomirski
On Sun, Jan 10, 2016 at 10:41 AM, Andy Lutomirski  
wrote:
> On Wed, Nov 18, 2015 at 8:12 AM, Daniel Stone  wrote:
>> Hi,
>>
>> On 18 November 2015 at 15:59, Andy Lutomirski  wrote:
>>> On Wed, Nov 18, 2015 at 2:59 AM, Ville Syrjälä
>>>  wrote:
>>>> On Tue, Nov 17, 2015 at 11:43:25AM -0800, Andy Lutomirski wrote:
>>>>> Typing:
>>>>>
>>>>> # cat /sys/devices/pci:00/:00:02.0/rom
>>>>>
>>>>> Provokes:
>>>>>
>>>>> i915 :00:02.0: Invalid ROM contents
>>>>
>>>> Hmm. So there's no PCI option ROM there. I wonder what is there. I
>>>> get the same on my Braswell BTW. I tried to look through the UEFI
>>>> spec a bit, and it seems to say that even for non-legacy option ROMs
>>>> the 0x55aa signature should be there.
>>>>
>>>> But this being the GPU means we may be using the shadow ROM stuff,
>>>> which IIRC assumes that the shadow is at 0xc000. I'm not sure that
>>>> holds anymore with UEFI, and maybe we should be using some UEFI
>>>> trick instead to find out where it actually lives?
>>>>
>>>> BTW what does 'lspci -vv -s 00:02.0' say on your machine?
>>>>
>>>
>>> 00:02.0 VGA compatible controller: Intel Corporation Sky Lake
>>> Integrated Graphics (rev 07) (prog-if 00 [VGA controller])
>>> DeviceName:  Onboard IGD
>>> Subsystem: Dell Device 0704
>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>> SERR-
>>> Latency: 0
>>> Interrupt: pin A routed to IRQ 128
>>> Region 0: Memory at db00 (64-bit, non-prefetchable) [size=16M]
>>> Region 2: Memory at 9000 (64-bit, prefetchable) [size=256M]
>>> Region 4: I/O ports at f000 [size=64]
>>> Expansion ROM at  [disabled]
>>
>> UEFI has an option to enable option ROMs, which is disabled by
>> default; I wonder if having it disabled prevents all access to the
>> ROM.
>>
>> Mind you, it doesn't seem to be fatal; I've not had any issues with
>> the same machine that I can pin down to lack of ROM.
>>
>
> FWIW, my logs also get spammed with:
>
> [  127.101881] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
> has bogus alignment
>
> I suspect that the PCI core is just failing to recognize that the ROM
> is disabled.
>

A bit more info:

I think I only get this error when suspending for the second time
after boot.  No clue why.

I instrumented the code a bit.  At the time of that error, res->flags
== 0x2.  It's probably not a coincidence that:

#define IORESOURCE_ROM_SHADOW   (1<<1)  /* ROM is copy at C000:0 */

Should pci_fixup_video check that the resource exists in the first
place before setting flags on it?

--Andy


[Intel-gfx] i915 Skylake: "Invalid ROM contents"

2016-01-10 Thread Andy Lutomirski
On Wed, Nov 18, 2015 at 8:12 AM, Daniel Stone  wrote:
> Hi,
>
> On 18 November 2015 at 15:59, Andy Lutomirski  wrote:
>> On Wed, Nov 18, 2015 at 2:59 AM, Ville Syrjälä
>>  wrote:
>>> On Tue, Nov 17, 2015 at 11:43:25AM -0800, Andy Lutomirski wrote:
>>>> Typing:
>>>>
>>>> # cat /sys/devices/pci:00/:00:02.0/rom
>>>>
>>>> Provokes:
>>>>
>>>> i915 :00:02.0: Invalid ROM contents
>>>
>>> Hmm. So there's no PCI option ROM there. I wonder what is there. I
>>> get the same on my Braswell BTW. I tried to look through the UEFI
>>> spec a bit, and it seems to say that even for non-legacy option ROMs
>>> the 0x55aa signature should be there.
>>>
>>> But this being the GPU means we may be using the shadow ROM stuff,
>>> which IIRC assumes that the shadow is at 0xc000. I'm not sure that
>>> holds anymore with UEFI, and maybe we should be using some UEFI
>>> trick instead to find out where it actually lives?
>>>
>>> BTW what does 'lspci -vv -s 00:02.0' say on your machine?
>>>
>>
>> 00:02.0 VGA compatible controller: Intel Corporation Sky Lake
>> Integrated Graphics (rev 07) (prog-if 00 [VGA controller])
>> DeviceName:  Onboard IGD
>> Subsystem: Dell Device 0704
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> SERR-
>> Latency: 0
>> Interrupt: pin A routed to IRQ 128
>> Region 0: Memory at db00 (64-bit, non-prefetchable) [size=16M]
>> Region 2: Memory at 9000 (64-bit, prefetchable) [size=256M]
>> Region 4: I/O ports at f000 [size=64]
>> Expansion ROM at  [disabled]
>
> UEFI has an option to enable option ROMs, which is disabled by
> default; I wonder if having it disabled prevents all access to the
> ROM.
>
> Mind you, it doesn't seem to be fatal; I've not had any issues with
> the same machine that I can pin down to lack of ROM.
>

FWIW, my logs also get spammed with:

[  127.101881] i915 :00:02.0: BAR 6: [??? 0x flags 0x2]
has bogus alignment

I suspect that the PCI core is just failing to recognize that the ROM
is disabled.

--Andy


[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-09 Thread Andy Lutomirski
On Sat, Jan 9, 2016 at 12:01 AM, Chris Wilson  
wrote:
> On Thu, Jan 07, 2016 at 02:32:23PM -0800, H. Peter Anvin wrote:
>> On 01/07/16 14:29, H. Peter Anvin wrote:
>> >
>> > I would be very interested in knowing if replacing the final clflushopt
>> > with a clflush would resolve your problems (in which case the last mb()
>> > shouldn't be necessary either.)
>> >
>>
>> Never mind.  CLFLUSH is not ordered with regard to CLFLUSHOPT to the
>> same cache line.
>>
>> Could you add a sync_cpu(); call to the end (can replace the final mb())
>> and see if that helps your case?
>
> s/sync_cpu()/sync_core()/
>
> No. I still see failures on Baytrail and Braswell (Pineview is not
> affected) with the final mb() replaced with sync_core(). I can reproduce
> failures on Pineview by tweaking the clflush_cache_range() parameters,
> so I am fairly confident that it is validating the current code.
>
> iirc sync_core() is cpuid, a heavy serialising instruction, an
> alternative to mfence.  Is there anything else that I can infer about
> the nature of my bug from this result?

No clue, but I don't know much about the underlying architecture.

Can you try clflush_cache_ranging one cacheline less and then manually
doing clflushopt; mb on the last cache line, just to make sure that
the helper is really doing the right thing?  You could also try
clflush instead of clflushopt to see if that makes a difference.

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH] x86: Add an explicit barrier() to clflushopt()

2016-01-07 Thread Andy Lutomirski
On Thu, Jan 7, 2016 at 2:16 AM, Chris Wilson  
wrote:
> On Mon, Oct 19, 2015 at 10:58:55AM +0100, Chris Wilson wrote:
>> During testing we observed that the last cacheline was not being flushed
>> from a
>>
>>   mb()
>>   for (addr = addr & -clflush_size; addr < end; addr += clflush_size)
>>           clflushopt();
>>   mb()
>>
>> loop (where the initial addr and end were not cacheline aligned).
>>
>> Changing the loop from addr < end to addr <= end, or replacing the
>> clflushopt() with clflush() both fixed the testcase. This hints that GCC
>> was miscompiling the code within the loop, and specifically that the
>> alternative within clflushopt() was confusing the loop optimizer.
>>
>> Adding a barrier() into clflushopt() is enough for GCC to dtrt, but
>> solving why GCC is not seeing the constraints from the alternative_io()
>> would be smarter...
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92501
>> Testcase: gem_tiled_partial_pwrite_pread/read
>> Signed-off-by: Chris Wilson 
>> Cc: Ross Zwisler 
>> Cc: H. Peter Anvin 
>> Cc: Imre Deak 
>> Cc: Daniel Vetter 
>> Cc: dri-devel at lists.freedesktop.org
>> ---
>>  arch/x86/include/asm/special_insns.h | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/special_insns.h 
>> b/arch/x86/include/asm/special_insns.h
>> index 2270e41b32fd..0c7aedbf8930 100644
>> --- a/arch/x86/include/asm/special_insns.h
>> +++ b/arch/x86/include/asm/special_insns.h
>> @@ -199,6 +199,11 @@ static inline void clflushopt(volatile void *__p)
>>  ".byte 0x66; clflush %P0",
>>  X86_FEATURE_CLFLUSHOPT,
>>  "+m" (*(volatile char __force *)__p));
>> + /* GCC (4.9.1 and 5.2.1 at least) appears to be very confused when
>> +  * meeting this alternative() and demonstrably miscompiles loops
>> +  * iterating over clflushopts.
>> +  */
>> + barrier();
>>  }
>
> Or an alternative:
>
> +#define alternative_output(oldinstr, newinstr, feature, output) \
> +        asm volatile (ALTERNATIVE(oldinstr, newinstr, feature)  \
> +                : output : "i" (0) : "memory")
>
> I would really appreciate some knowledgeable folks taking a look at the
> asm for clflushopt() as it still affects today's kernel and gcc.
>
> Fwiw, I have confirmed that arch/x86/mm/pageattr.c clflush_cache_range()
> is similarly affected.

Unless I'm mis-reading the asm, clflush_cache_range() is compiled
correctly for me.  (I don't know what the %P is for in the asm, but
that shouldn't matter.)  The ALTERNATIVE shouldn't even be visible to
the optimizer.

Can you attach a bad .s file and let us know what gcc version this is?
 (You can usually do 'make foo/bar/baz.s' to get a .s file.)  I'd also
be curious whether changing clflushopt to clwb works around the issue.

--Andy


i915 Skylake crash on 4.4-rc3

2015-12-07 Thread Andy Lutomirski
[53834.386369] traps: gnome-session-b[2308] general protection
ip:7f10efc1fc2b sp:7ffdfde31880 error:0 in
libc-2.22.so[7f10efba1000+1b7000]
[53834.687584] [ cut here ]
[53834.687607] WARNING: CPU: 0 PID: 23730 at
drivers/gpu/drm/i915/i915_gem_context.c:144
i915_gem_context_free+0x196/0x1c0 [i915]()
[53834.687609] WARN_ON(!list_empty(>base.active_list))
[53834.687610] Modules linked in:
[53834.687612]  wmi_mof dell_wmi wmi(E) rfcomm fuse ccm cmac
xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun
nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_broute bridge stp llc
ebtable_nat ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw
ip6table_security ip6table_filter ip6_tables iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_raw iptable_security arc4 bnep iwlmvm mac80211
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_intel iwlwifi snd_hda_codec intel_rapl x86_pkg_temp_thermal
snd_hwdep coretemp snd_hda_core kvm_intel btusb snd_seq btrtl kvm
uvcvideo btbcm cfg80211 btintel bluetooth snd_seq_device
[53834.687656]  videobuf2_vmalloc snd_pcm videobuf2_memops
videobuf2_v4l2 hid_multitouch videobuf2_core sparse_keymap v4l2_common
videodev i2c_designware_platform i2c_designware_core vfat snd_timer
fat irqbypass dell_laptop snd dcdbas efi_pstore pcspkr joydev efivars
media rtsx_pci_ms rfkill soundcore i2c_i801 memstick
pinctrl_sunrisepoint pinctrl_intel intel_lpss_acpi int3400_thermal
int3403_thermal acpi_thermal_rel acpi_pad mei_me tpm_tis mei tpm
idma64 shpchp virt_dma acpi_als kfifo_buf processor_thermal_device
nfsd industrialio intel_soc_dts_iosf iosf_mbi intel_lpss_pci
int340x_thermal_zone intel_lpss auth_rpcgss nfs_acl lockd grace sunrpc
dm_crypt i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect
sysimgblt fb_sys_fops drm rtsx_pci_sdmmc mmc_core crct10dif_pclmul
crc32_pclmul crc32c_intel
[53834.687700]  serio_raw rtsx_pci i2c_hid video [last unloaded: wmi]
[53834.687706] CPU: 0 PID: 23730 Comm: kworker/u8:4 Tainted: G
W   E   4.4.0-rc3+ #19
[53834.687708] Hardware name: Dell Inc. XPS 13 9350/07TYC2, BIOS 1.0.4
10/19/2015
[53834.687726] Workqueue: i915 i915_gem_retire_work_handler [i915]
[53834.687728]   c62a1d15 88017ffb7c70
8142510c
[53834.687731]  88017ffb7cb8 88017ffb7ca8 81092122
8802adfd1240
[53834.687734]  8802845dd800 88028468 88017ffb7d70
8802adfd12b8
[53834.687738] Call Trace:
[53834.687743]  [] dump_stack+0x4e/0x82
[53834.687747]  [] warn_slowpath_common+0x82/0xc0
[53834.687750]  [] warn_slowpath_fmt+0x5c/0x80
[53834.687764]  [] i915_gem_context_free+0x196/0x1c0 [i915]
[53834.68]  [] i915_gem_request_free+0x9f/0xb0 [i915]
[53834.687792]  []
intel_execlists_retire_requests+0x138/0x190 [i915]
[53834.687806]  [] i915_gem_retire_requests+0xd1/0xe0 [i915]
[53834.687827]  []
i915_gem_retire_work_handler+0x58/0x70 [i915]
[53834.687831]  [] process_one_work+0x152/0x400
[53834.687834]  [] worker_thread+0x4b/0x440
[53834.687837]  [] ? process_one_work+0x400/0x400
[53834.687839]  [] ? process_one_work+0x400/0x400
[53834.687842]  [] kthread+0xd8/0xf0
[53834.687845]  [] ? kthread_worker_fn+0x150/0x150
[53834.687849]  [] ret_from_fork+0x3f/0x70
[53834.687851]  [] ? kthread_worker_fn+0x150/0x150
[53834.690419] ---[ end trace 1802637761c0942d ]---

I think this happened when I started emacs.

My user session crashed, but the system is still usable after logging back in.

--Andy


[Intel-gfx] i915 Skylake: "Invalid ROM contents"

2015-11-18 Thread Andy Lutomirski
[adding linux-pci]

On Wed, Nov 18, 2015 at 2:59 AM, Ville Syrjälä
 wrote:
> On Tue, Nov 17, 2015 at 11:43:25AM -0800, Andy Lutomirski wrote:
>> Typing:
>>
>> # cat /sys/devices/pci:00/:00:02.0/rom
>>
>> Provokes:
>>
>> i915 :00:02.0: Invalid ROM contents
>
> Hmm. So there's no PCI option ROM there. I wonder what is there. I
> get the same on my Braswell BTW. I tried to look through the UEFI
> spec a bit, and it seems to say that even for non-legacy option ROMs
> the 0x55aa signature should be there.
>
> But this being the GPU means we may be using the shadow ROM stuff,
> which IIRC assumes that the shadow is at 0xc000. I'm not sure that
> holds anymore with UEFI, and maybe we should be using some UEFI
> trick instead to find out where it actually lives?
>
> BTW what does 'lspci -vv -s 00:02.0' say on your machine?
>

00:02.0 VGA compatible controller: Intel Corporation Sky Lake
Integrated Graphics (rev 07) (prog-if 00 [VGA controller])
DeviceName:  Onboard IGD
Subsystem: Dell Device 0704
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR-  [disabled]
Capabilities: [40] Vendor Specific Information: Len=0c 
Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap:MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl:Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta:CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
OBFF Disabled
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00018  Data: 
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] #1b
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap:Invalidate Queue Depth: 00
ATSCtl:Enable-, Smallest Translation Unit: 00
Capabilities: [300 v1] #13
Kernel driver in use: i915
Kernel modules: i915
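Ville's point about the 0x55aa signature can be checked directly on a ROM dump. A PCI expansion ROM image is expected to begin with the bytes 0x55 0xAA. The minimal C sketch below only tests those two bytes. The helper names are hypothetical, not kernel or pciutils APIs, and the sketch does not model the kernel's full ROM validation. Since the sysfs read here only produced the kernel error, it is most useful on a ROM image dumped by other means.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* A PCI expansion ROM image is expected to start with 0x55 0xAA. */
static bool rom_has_signature(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x55 && buf[1] == 0xAA;
}

/* Reads the first two bytes of a saved ROM dump and checks the signature.
 * Returns 1 if present, 0 if absent or the file is too short,
 * and -1 if the file cannot be opened. */
static int rom_file_has_signature(const char *path)
{
    unsigned char hdr[2];
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    size_t n = fread(hdr, 1, sizeof(hdr), f);
    fclose(f);
    return rom_has_signature(hdr, n) ? 1 : 0;
}
```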

--Andy

>>
>> This is on a Dell XPS 13 9350 (Skylake).  This is 4.3.0 plus some
>> wireless-next bits.
>>
>> --Andy
>>
>> --
>> Andy Lutomirski
>> AMA Capital Management, LLC
>> ___
>> Intel-gfx mailing list
>> Intel-gfx at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Ville Syrjälä
> Intel OTC



-- 
Andy Lutomirski
AMA Capital Management, LLC


i915 Skylake: "Invalid ROM contents"

2015-11-17 Thread Andy Lutomirski
Typing:

# cat /sys/devices/pci:00/:00:02.0/rom

Provokes:

i915 :00:02.0: Invalid ROM contents

This is on a Dell XPS 13 9350 (Skylake).  This is 4.3.0 plus some
wireless-next bits.

--Andy



X using radeon is refusing to start

2015-02-05 Thread Andy Lutomirski
I just started getting X failures that say:

[   739.208] (EE) RADEON(0): [drm] failed to set drm interface version.

I'm not sure what triggered it.

dmesg says:

[  740.156499] [drm:drm_stub_open]
[  740.156502] [drm:drm_open_helper] pid = 2170, minor = 0
[  740.156541] [drm:drm_ioctl] pid=2170, dev=0xe200, auth=1,
DRM_IOCTL_MODE_GETRESOURCES
[  740.156557] [drm:drm_ioctl] pid=2170, dev=0xe200, auth=1,
DRM_IOCTL_MODE_GETRESOURCES
[  740.156575] [drm:drm_release] open_count = 2
[  740.156577] [drm:drm_release] pid = 2170, device = 0xe200, open_count = 2
[  740.158548] [drm:drm_ioctl] pid=2170, dev=0xe200, auth=1,
DRM_IOCTL_SET_VERSION
[  740.158549] [drm:drm_ioctl] ret = -13
[  740.159612] [drm:drm_framebuffer_reference] 8804457110a0: FB ID: 83 (3)

-13 means -EACCES.
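The decoding above follows the kernel's negative-errno convention. The drm debug line prints the ioctl handler's return value, so `ret = -13` is `-EACCES` ("Permission denied" on Linux). Here is a tiny C sketch of that decoding. The helper name is hypothetical:

```c
#include <errno.h>
#include <string.h>

/* Kernel-internal return values use the negative-errno convention, so a
 * drm debug line such as "ret = -13" can be decoded by negating it. */
static const char *kernel_ret_to_string(int ret)
{
    return ret < 0 ? strerror(-ret) : "success (no error)";
}
```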

The X logs say:

[   739.042]
X.Org X Server 1.16.3
Release Date: 2014-12-20
[   739.055] X Protocol Version 11, Revision 0
[   739.059] Build Operating System:  3.17.8-300.bz1178975.fc21.x86_64
[   739.063] Current Operating System: Linux
amaluto.corp.amacapital.net 3.18.3-201.fc21.x86_64 #1 SMP Mon Jan 19
15:59:31 UTC 2015 x86_64
[   739.063] Kernel command line: rd.md=0 rd.dm=0
rd.lvm.lv=vg_amaluto_2014/root  KEYTABLE=us
rd.luks.uuid=luks-1dd64d38-40c0-4e20-ad67-aa2590991023 SYSFONT=True ro
root=/dev/mapper/vg_amaluto_2014-root LANG=en_US.UTF-8 rhgb quiet
[   739.078] Build Date: 31 January 2015  11:23:27PM
[   739.082] Build ID: xorg-x11-server 1.16.3-2.fc21
[   739.087] Current version of pixman: 0.32.6
[   739.096] Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
[   739.097] Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[   739.116] (==) Log file: "/var/log/Xorg.0.log", Time: Thu Feb  5
15:54:44 2015
[   739.122] (==) Using config file: "/etc/X11/xorg.conf"
[   739.127] (==) Using config directory: "/etc/X11/xorg.conf.d"
[   739.131] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[   739.132] (==) No Layout section.  Using the first Screen section.
[   739.132] (==) No screen section available. Using defaults.
[   739.132] (**) |-->Screen "Default Screen Section" (0)
[   739.132] (**) |   |-->Monitor ""
[   739.133] (==) No monitor specified for screen "Default Screen Section".
Using a default monitor configuration.
[   739.133] (==) Automatically adding devices
[   739.133] (==) Automatically enabling devices
[   739.133] (==) Automatically adding GPU devices
[   739.133] (==) FontPath set to:
catalogue:/etc/X11/fontpath.d,
built-ins
[   739.133] (**) ModulePath set to
"/usr/local/lib/xorg/modules,/usr/lib64/xorg/modules"
[   739.133] (II) The server relies on udev to provide the list of
input devices.
If no devices become available, reconfigure udev or disable AutoAddDevices.
[   739.133] (II) Loader magic: 0x81de40
[   739.133] (II) Module ABI versions:
[   739.133] X.Org ANSI C Emulation: 0.4
[   739.133] X.Org Video Driver: 18.0
[   739.133] X.Org XInput driver : 21.0
[   739.133] X.Org Server Extension : 8.0
[   739.135] (II) systemd-logind: took control of session
/org/freedesktop/login1/session/_31
[   739.136] (II) xfree86: Adding drm device (/dev/dri/card0)
[   739.136] (II) systemd-logind: got fd for /dev/dri/card0 226:0 fd 10 paused 0
[   739.146] (--) PCI:*(0:9:0:0) 1002:683f:1787:2318 rev 0, Mem @
0xe000/268435456, 0xf4a0/262144, I/O @ 0xc000/256, BIOS @
0x/131072
[   739.146] (II) LoadModule: "glx"
[   739.147] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so
[   739.150] (II) Module glx: vendor="X.Org Foundation"
[   739.150] compiled for 1.16.3, module version = 1.0.0
[   739.150] ABI class: X.Org Server Extension, version 8.0
[   739.150] (==) AIGLX enabled
[   739.150] (==) Matched ati as autoconfigured driver 0
[   739.150] (==) Matched ati as autoconfigured driver 1
[   739.150] (==) Matched modesetting as autoconfigured driver 2
[   739.150] (==) Matched fbdev as autoconfigured driver 3
[   739.150] (==) Matched vesa as autoconfigured driver 4
[   739.150] (==) Assigned the driver to the xf86ConfigLayout
[   739.150] (II) LoadModule: "ati"
[   739.151] (II) Loading /usr/lib64/xorg/modules/drivers/ati_drv.so
[   739.151] (II) Module ati: vendor="X.Org Foundation"
[   739.151] compiled for 1.16.1, module version = 7.5.0
[   739.152] Module class: X.Org Video Driver
[   739.152] ABI class: X.Org Video Driver, version 18.0
[   739.152] (II) LoadModule: "radeon"
[   739.152] (II) Loading /usr/lib64/xorg/modules/drivers/radeon_drv.so
[   739.153] (II) Module radeon: vendor="X.Org Foundation"
[   739.153] compiled for 1.16.1, module version = 7.5.0
[   739.153] Module class: X.Org Video Driver
[   739.153] ABI class: X.Org Video Driver, version 18.0
[   739.153] (II) LoadModule: "modesetting"
[   739.154] 

Long radeon stalls on recent kernels

2014-12-10 Thread Andy Lutomirski
On Wed, Dec 10, 2014 at 8:24 PM, Michel Dänzer  wrote:
> On 11.12.2014 05:28, Andy Lutomirski wrote:
>> On Wed, Dec 10, 2014 at 1:44 AM, Michel Dänzer  
>> wrote:
>>> On 10.12.2014 06:39, Andy Lutomirski wrote:
>>>> On Tue, Dec 9, 2014 at 8:06 AM, Andy Lutomirski  
>>>> wrote:
>>>>> On Tue, Dec 9, 2014 at 1:18 AM, Michel Dänzer  
>>>>> wrote:
>>>>>> On 09.12.2014 09:24, Andy Lutomirski wrote:
>>>>>>>
>>>>>>> The relevant line from latencytop seems to be:
>>>>>>>
>>>>>>> 154 20441402 489139 radeon_fence_default_wait [radeon]
>>>>>>> fence_wait_timeout ttm_bo_wait [ttm] ttm_bo_move_accel_cleanup [ttm]
>>>>>>> radeon_move_blit.isra.12 [radeon] radeon_bo_move [radeon]
>>>>>>> ttm_bo_handle_move_mem [ttm] ttm_bo_evict [ttm] ttm_mem_evict_first
>>>>>>> [ttm] ttm_bo_mem_space [ttm] ttm_bo_validate [ttm]
>>>>>>> radeon_bo_fault_reserve_notify [radeon]
>>>>>>
>>>>>> Which process is this?
>>>>>
>>>>> Xorg
>>>>>
>>>>>>
>>>>>> Looks like CPU access to a BO in VRAM, but the BO is located outside of
>>>>>> the CPU visible area of VRAM, so it has to be moved into the CPU visible
>>>>>> area first.
>
> [...]
>
>>>> But I'm still waiting for the day that buggy userspace *can't* cause
>>>> kernel graphics stalls.
>>>
>>> Actually, this looks more like buggy userspace stalling itself. :)
>>
>> I thought the stall was the kernel evicting things from vram.  Why
>> does it need to wait for userspace for that?  Is it that userspace is
>> actively using whatever's being evicted?
>
> As I explained above, the stall happens because userspace does CPU
> access to a BO which resides in the CPU-inaccessible part of VRAM. The
> kernel has to move the BO into the CPU accessible part of VRAM before it
> can let userspace proceed.
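The situation Michel describes can be modeled very roughly. The Caicos boot log elsewhere in this archive reports `VRAM RAM=1024M, BAR=256M`, which means the CPU can reach only the first 256M of the 1024M of VRAM through the PCI BAR. The sketch below checks whether a buffer object's byte range lies entirely inside that window. The helper is hypothetical, not TTM or radeon code, and real placement also involves GTT, page granularity, and eviction.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Simplified model: the CPU reaches VRAM only through the PCI BAR, which
 * exposes the first `visible_size` bytes.  A buffer object (BO) is
 * CPU-accessible only if [bo_offset, bo_offset + bo_size) fits in that
 * window.  Written to avoid integer overflow in the addition. */
static bool bo_is_cpu_visible(uint64_t bo_offset, uint64_t bo_size,
                              uint64_t visible_size)
{
    return bo_offset <= visible_size && bo_size <= visible_size - bo_offset;
}
```

For example, with `visible_size` set to 256M, a BO at offset 512M is not CPU-visible. The kernel would have to move it first, possibly evicting something else to make room, as the `ttm_bo_evict` frames in the backtrace suggest.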

Sure, but why does that take nearly 500ms?  Even if the object in
question is the entire framebuffer, that still seems extraordinarily
slow.
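A rough back-of-the-envelope check supports the "extraordinarily slow" intuition. Assume a 1920x1080 framebuffer at 4 bytes per pixel; the thread does not state the actual size. Taking the ~489 ms maximum from latencytop, the implied copy rate compares with the raw bandwidth of a PCIe 2.0 x16 link as follows:

```latex
\[
\frac{1920 \times 1080 \times 4\ \mathrm{B}}{0.489\ \mathrm{s}}
  = \frac{8\,294\,400\ \mathrm{B}}{0.489\ \mathrm{s}}
  \approx 17\ \mathrm{MB/s}
\qquad \text{vs.} \qquad
5\ \mathrm{GT/s} \times 16 \times \tfrac{8}{10} = 64\ \mathrm{Gb/s} = 8\ \mathrm{GB/s}
\]
```

Under these assumptions the move runs several hundred times slower than the bus allows. The boot log says gen 2 link speeds are enabled. A GPU blit within VRAM should not even need to cross the bus, so the wait time probably is not dominated by the copy itself.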

--Andy

>
> Current Mesa (10.4 or newer I think) sets a hint for BOs which will
> likely be accessed by the CPU, so recent kernels can prioritize putting
> those into the CPU accessible part of VRAM in the first place.
>
> Or, if you're using EXA, the problem could be in the xf86-video-ati EXA
> code.
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer





Long radeon stalls on recent kernels

2014-12-10 Thread Andy Lutomirski
On Wed, Dec 10, 2014 at 1:44 AM, Michel Dänzer  wrote:
> On 10.12.2014 06:39, Andy Lutomirski wrote:
>> On Tue, Dec 9, 2014 at 8:06 AM, Andy Lutomirski  
>> wrote:
>>> On Tue, Dec 9, 2014 at 1:18 AM, Michel Dänzer  
>>> wrote:
>>>> On 09.12.2014 09:24, Andy Lutomirski wrote:
>>>>>
>>>>> The relevant line from latencytop seems to be:
>>>>>
>>>>> 154 20441402 489139 radeon_fence_default_wait [radeon]
>>>>> fence_wait_timeout ttm_bo_wait [ttm] ttm_bo_move_accel_cleanup [ttm]
>>>>> radeon_move_blit.isra.12 [radeon] radeon_bo_move [radeon]
>>>>> ttm_bo_handle_move_mem [ttm] ttm_bo_evict [ttm] ttm_mem_evict_first
>>>>> [ttm] ttm_bo_mem_space [ttm] ttm_bo_validate [ttm]
>>>>> radeon_bo_fault_reserve_notify [radeon]
>>>>
>>>> Which process is this?
>>>
>>> Xorg
>>>
>>>>
>>>> Looks like CPU access to a BO in VRAM, but the BO is located outside of
>>>> the CPU visible area of VRAM, so it has to be moved into the CPU visible
>>>> area first.
>>>>
>>>> Which version of Mesa are you using?
>>>>
>>>
>>> mesa-dri-drivers-10.3.3-1.20141110.fc20.x86_64
>>>
>>> I'm planning on upgrading to Fedora 21 fairly soon.
>>
>> Upgrading to mesa-dri-drivers-10.3.3-1.20141110.fc21.x86_64 seems to
>> have helped enough that my usual test (open a couple of Firefox tabs
>> with graphics in them) doesn't hang anymore.
>
> Hmm, since that looks like the exact same upstream version, maybe it was
> actually upgrading something else that made the difference?
>

Maybe mutter?

>
>> This card still isn't *fast*.
>
> I'm afraid it wasn't exactly a high-end card even when it was new. What
> kind of operations are slow?

Things like scrolling in Google Maps.  It's not *that* bad, but older
Intel IGPs still seem considerably smoother.

>
>
>> Is there some way I can check that I'm actually using all 16 PCIe lanes?
>> In my tinkering w/ power management settings, I got some odd logs
>> suggesting that only one lane was in use.
>
> You can try forcing off ASPM with radeon.aspm=0, other than that I'm not
> sure.
>
>
>> But I'm still waiting for the day that buggy userspace *can't* cause
>> kernel graphics stalls.
>
> Actually, this looks more like buggy userspace stalling itself. :)

I thought the stall was the kernel evicting things from vram.  Why
does it need to wait for userspace for that?  Is it that userspace is
actively using whatever's being evicted?

--Andy

>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer





Long radeon stalls on recent kernels

2014-12-09 Thread Andy Lutomirski
On Tue, Dec 9, 2014 at 8:06 AM, Andy Lutomirski  wrote:
> On Tue, Dec 9, 2014 at 1:18 AM, Michel Dänzer  wrote:
>> On 09.12.2014 09:24, Andy Lutomirski wrote:
>>>
>>> The relevant line from latencytop seems to be:
>>>
>>> 154 20441402 489139 radeon_fence_default_wait [radeon]
>>> fence_wait_timeout ttm_bo_wait [ttm] ttm_bo_move_accel_cleanup [ttm]
>>> radeon_move_blit.isra.12 [radeon] radeon_bo_move [radeon]
>>> ttm_bo_handle_move_mem [ttm] ttm_bo_evict [ttm] ttm_mem_evict_first
>>> [ttm] ttm_bo_mem_space [ttm] ttm_bo_validate [ttm]
>>> radeon_bo_fault_reserve_notify [radeon]
>>
>> Which process is this?
>
> Xorg
>
>>
>> Looks like CPU access to a BO in VRAM, but the BO is located outside of
>> the CPU visible area of VRAM, so it has to be moved into the CPU visible
>> area first.
>>
>> Which version of Mesa are you using?
>>
>
> mesa-dri-drivers-10.3.3-1.20141110.fc20.x86_64
>
> I'm planning on upgrading to Fedora 21 fairly soon.

Upgrading to mesa-dri-drivers-10.3.3-1.20141110.fc21.x86_64 seems to
have helped enough that my usual test (open a couple of Firefox tabs
with graphics in them) doesn't hang anymore.

This card still isn't *fast*.  Is there some way I can check that I'm
actually using all 16 PCIe lanes?  In my tinkering w/ power management
settings, I got some odd logs suggesting that only one lane was in
use.
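One way to answer the lane question is to compare the negotiated link width with the maximum. This assumes the kernel exposes the PCIe link attributes in sysfs as `current_link_width` and `max_link_width` under `/sys/bus/pci/devices/<address>/`; kernels of this era may not all have them. `lspci -vv` reports the same information in its `LnkCap`/`LnkSta` lines. Here is a minimal C sketch with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdio.h>

/* Reads a link width (e.g. "16") from a sysfs attribute file such as
 * .../current_link_width.  Returns the width, or -1 on any error. */
static int read_link_width(const char *path)
{
    int width = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fscanf(f, "%d", &width) != 1)
        width = -1;
    fclose(f);
    return width;
}

/* True if both widths are valid and the negotiated link is narrower
 * than the device's maximum (e.g. x1 on an x16-capable card). */
static bool link_is_degraded(int current_width, int max_width)
{
    return current_width > 0 && max_width > 0 && current_width < max_width;
}
```

Read both attributes for the GPU's PCI address and pass the results to `link_is_degraded()`. A current width of 1 against a maximum of 16 would match the one-lane logs mentioned above.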

Other than that, maybe everything works :)  But I'm still waiting for
the day that buggy userspace *can't* cause kernel graphics stalls.

--Andy

>
> --Andy
>
>>
>> --
>> Earthling Michel Dänzer   |   http://www.amd.com
>> Libre software enthusiast | Mesa and X developer
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC





Long radeon stalls on recent kernels

2014-12-09 Thread Andy Lutomirski
On Tue, Dec 9, 2014 at 1:18 AM, Michel Dänzer  wrote:
> On 09.12.2014 09:24, Andy Lutomirski wrote:
>>
>> The relevant line from latencytop seems to be:
>>
>> 154 20441402 489139 radeon_fence_default_wait [radeon]
>> fence_wait_timeout ttm_bo_wait [ttm] ttm_bo_move_accel_cleanup [ttm]
>> radeon_move_blit.isra.12 [radeon] radeon_bo_move [radeon]
>> ttm_bo_handle_move_mem [ttm] ttm_bo_evict [ttm] ttm_mem_evict_first
>> [ttm] ttm_bo_mem_space [ttm] ttm_bo_validate [ttm]
>> radeon_bo_fault_reserve_notify [radeon]
>
> Which process is this?

Xorg

>
> Looks like CPU access to a BO in VRAM, but the BO is located outside of
> the CPU visible area of VRAM, so it has to be moved into the CPU visible
> area first.
>
> Which version of Mesa are you using?
>

mesa-dri-drivers-10.3.3-1.20141110.fc20.x86_64

I'm planning on upgrading to Fedora 21 fairly soon.

--Andy

>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer





Long radeon stalls on recent kernels

2014-12-08 Thread Andy Lutomirski
On Wed, Nov 26, 2014 at 7:38 AM, Andy Lutomirski  wrote:
> On Tue, Nov 25, 2014 at 10:42 PM, Michel Dänzer  
> wrote:
>> On 20.11.2014 09:58, Andy Lutomirski wrote:
>>>
>>> On Wed, Nov 19, 2014 at 4:07 PM, Andy Lutomirski 
>>> wrote:
>>>>
>>>> On Tue, Nov 18, 2014 at 11:19 PM, Michel Dänzer 
>>>> wrote:
>>>>>
>>>>> On 19.11.2014 09:21, Andy Lutomirski wrote:
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer 
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>>>>>>> graphics intensive seems to cause my system to become unusable for
>>>>>>>> tens of seconds.  Pointing Firefox at Google Maps is a big offender --
>>>>>>>> it can take several minutes for me to move my mouse far enough to
>>>>>>>> close the tab and get my computer back.
>>>>>>>>
>>>>>>>> On bootup, I get this warning:
>>>>>>>> [drm:btc_dpm_set_power_state] *ERROR*
>>>>>>>> rv770_restrict_performance_levels_before_switch failed
>>>>>>>>
>>>>>>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>>>>>>> giving me rather slow graphics.
>>>>>>>>
>>>>>>>> Are there known issues here?
>>>>>>>
>>>>>>> Can you bisect the kernel, or at least isolate which kernel version
>>>>>>> first introduced the problem?
>>>>>>
>>>>>> With whatever userspace I'm running, I'm seeing it on 3.13, 3.14,
>>>>>> 3.15, 3.16, and 3.18-rc4+.  I haven't tried other versions.
>>>>>>
>>>>>> With radeon.dpm=0, I can still trigger short stalls (around one
>>>>>> second), but I seem unable to trigger long stalls easily.  (I say
>>>>>> easily because, just as I was typing this email, my system stalled for
>>>>>> about a minute.)
>>>>>
>>>>> I can only think of two things offhand that could cause such extremely
>>>>> long stalls: Swap thrashing or IRQ storms.
>>>>>
>>>>> With a setup where you can easily trigger long stalls, can you try
>>>>> getting a CPU profile for a stall with sysprof or perf?
>>>>>
>>>>>
>>>>
>>>> Got one with perf:
>>>>
>>>>   16.82%  Xorg     libc-2.18.so        [.] __memcpy_sse2_unaligned
>>>>    9.20%  swapper  [kernel.kallsyms]   [k] intel_idle
>>>>    1.00%  Xorg     [kernel.kallsyms]   [k] evergreen_irq_set
>>>>    0.83%  firefox  libxul.so           [.] 0x01d93281
>>>>    0.69%  firefox  libxul.so           [.] 0x01d932ad
>>>>    0.62%  firefox  [kernel.kallsyms]   [k] copy_user_generic_string
>>>>    0.55%  swapper  [kernel.kallsyms]   [k] evergreen_irq_ack
>>>>    0.54%  firefox  libpthread-2.18.so  [.] pthread_mutex_lock
>>>>    0.52%  firefox  libpthread-2.18.so  [.] pthread_mutex_unlock
>>>>    0.45%  Xorg     [kernel.kallsyms]   [k] drm_mm_insert_node_in_range_generic
>>>>    0.41%  Xorg     [kernel.kallsyms]   [k] lock_release
>>>>    0.40%  Xorg     [kernel.kallsyms]   [k] lock_acquire
>>>>    0.35%  firefox  firefox             [.] 0x0001245d
>>>> 0.33% 

Long radeon stalls on recent kernels

2014-11-26 Thread Andy Lutomirski
On Tue, Nov 25, 2014 at 10:42 PM, Michel Dänzer  wrote:
> On 20.11.2014 09:58, Andy Lutomirski wrote:
>>
>> On Wed, Nov 19, 2014 at 4:07 PM, Andy Lutomirski 
>> wrote:
>>>
>>> On Tue, Nov 18, 2014 at 11:19 PM, Michel Dänzer 
>>> wrote:
>>>>
>>>> On 19.11.2014 09:21, Andy Lutomirski wrote:
>>>>>
>>>>>
>>>>> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer 
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>>>>>> graphics intensive seems to cause my system to become unusable for
>>>>>>> tens of seconds.  Pointing Firefox at Google Maps is a big offender --
>>>>>>> it can take several minutes for me to move my mouse far enough to
>>>>>>> close the tab and get my computer back.
>>>>>>>
>>>>>>> On bootup, I get this warning:
>>>>>>> [drm:btc_dpm_set_power_state] *ERROR*
>>>>>>> rv770_restrict_performance_levels_before_switch failed
>>>>>>>
>>>>>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>>>>>> giving me rather slow graphics.
>>>>>>>
>>>>>>> Are there known issues here?
>>>>>>
>>>>>> Can you bisect the kernel, or at least isolate which kernel version
>>>>>> first introduced the problem?
>>>>>
>>>>> With whatever userspace I'm running, I'm seeing it on 3.13, 3.14,
>>>>> 3.15, 3.16, and 3.18-rc4+.  I haven't tried other versions.
>>>>>
>>>>> With radeon.dpm=0, I can still trigger short stalls (around one
>>>>> second), but I seem unable to trigger long stalls easily.  (I say
>>>>> easily because, just as I was typing this email, my system stalled for
>>>>> about a minute.)
>>>>
>>>> I can only think of two things offhand that could cause such extremely
>>>> long stalls: Swap thrashing or IRQ storms.
>>>>
>>>> With a setup where you can easily trigger long stalls, can you try
>>>> getting a CPU profile for a stall with sysprof or perf?
>>>>
>>>>
>>>
>>> Got one with perf:
>>>
>>>   16.82%  Xorg     libc-2.18.so        [.] __memcpy_sse2_unaligned
>>>    9.20%  swapper  [kernel.kallsyms]   [k] intel_idle
>>>    1.00%  Xorg     [kernel.kallsyms]   [k] evergreen_irq_set
>>>    0.83%  firefox  libxul.so           [.] 0x01d93281
>>>    0.69%  firefox  libxul.so           [.] 0x01d932ad
>>>    0.62%  firefox  [kernel.kallsyms]   [k] copy_user_generic_string
>>>    0.55%  swapper  [kernel.kallsyms]   [k] evergreen_irq_ack
>>>    0.54%  firefox  libpthread-2.18.so  [.] pthread_mutex_lock
>>>    0.52%  firefox  libpthread-2.18.so  [.] pthread_mutex_unlock
>>>    0.45%  Xorg     [kernel.kallsyms]   [k] drm_mm_insert_node_in_range_generic
>>>    0.41%  Xorg     [kernel.kallsyms]   [k] lock_release
>>>    0.40%  Xorg     [kernel.kallsyms]   [k] lock_acquire
>>>    0.35%  firefox  firefox             [.] 0x0001245d
>>>    0.33%  Xorg     [kernel.kallsyms]   [k] __module_address
>>>    0.31%  firefox  [kernel.kallsyms]   [k] clear_page_c
>>>    0.29%  Xorg     [kernel.kallsyms]   [k] copy_user_generic_string
>>>    0.28%  firefox  firefox             [.] 0x00013159
>>

Long radeon stalls on recent kernels

2014-11-19 Thread Andy Lutomirski
On Wed, Nov 19, 2014 at 4:07 PM, Andy Lutomirski  wrote:
> On Tue, Nov 18, 2014 at 11:19 PM, Michel Dänzer  
> wrote:
>> On 19.11.2014 09:21, Andy Lutomirski wrote:
>>>
>>> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer  
>>> wrote:
>>>>
>>>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>>>
>>>>>
>>>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>>>> graphics intensive seems to cause my system to become unusable for
>>>>> tens of seconds.  Pointing Firefox at Google Maps is a big offender --
>>>>> it can take several minutes for me to move my mouse far enough to
>>>>> close the tab and get my computer back.
>>>>>
>>>>> On bootup, I get this warning:
>>>>> [drm:btc_dpm_set_power_state] *ERROR*
>>>>> rv770_restrict_performance_levels_before_switch failed
>>>>>
>>>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>>>> giving me rather slow graphics.
>>>>>
>>>>> Are there known issues here?
>>>>
>>>>
>>>>
>>>> Can you bisect the kernel, or at least isolate which kernel version first
>>>> introduced the problem?
>>>
>>>
>>> With whatever userspace I'm running, I'm seeing it on 3.13, 3.14,
>>> 3.15, 3.16, and 3.18-rc4+.  I haven't tried other versions.
>>>
>>> With radeon.dpm=0, I can still trigger short stalls (around one
>>> second), but I seem unable to trigger long stalls easily.  (I say
>>> easily because, just as I was typing this email, my system stalled for
>>> about a minute.)
>>
>>
>> I can only think of two things offhand that could cause such extremely long
>> stalls: Swap thrashing or IRQ storms.
>>
>> With a setup where you can easily trigger long stalls, can you try getting a
>> CPU profile for a stall with sysprof or perf?
>>
>>
>
> Got one with perf:
>
>   16.82%  Xorg     libc-2.18.so        [.] __memcpy_sse2_unaligned
>    9.20%  swapper  [kernel.kallsyms]   [k] intel_idle
>    1.00%  Xorg     [kernel.kallsyms]   [k] evergreen_irq_set
>    0.83%  firefox  libxul.so           [.] 0x01d93281
>    0.69%  firefox  libxul.so           [.] 0x01d932ad
>    0.62%  firefox  [kernel.kallsyms]   [k] copy_user_generic_string
>    0.55%  swapper  [kernel.kallsyms]   [k] evergreen_irq_ack
>    0.54%  firefox  libpthread-2.18.so  [.] pthread_mutex_lock
>    0.52%  firefox  libpthread-2.18.so  [.] pthread_mutex_unlock
>    0.45%  Xorg     [kernel.kallsyms]   [k] drm_mm_insert_node_in_range_generic
>    0.41%  Xorg     [kernel.kallsyms]   [k] lock_release
>    0.40%  Xorg     [kernel.kallsyms]   [k] lock_acquire
>    0.35%  firefox  firefox             [.] 0x0001245d
>    0.33%  Xorg     [kernel.kallsyms]   [k] __module_address
>    0.31%  firefox  [kernel.kallsyms]   [k] clear_page_c
>    0.29%  Xorg     [kernel.kallsyms]   [k] copy_user_generic_string
>    0.28%  firefox  firefox             [.] 0x00013159
>
> and:
>
> Samples: 11K of event 'irq:irq_handler_entry', Event count (approx.): 11802
>   87.43%  swapper  [kernel.kallsyms]  [k] handle_irq_event_percpu
>7.52%  firefox  [kernel.kallsyms]  [k] handle_irq_event_percpu
>1.84%  irq/36-ahci  [kernel.kallsyms]  [k] handle_irq_event_percpu
>1.14% Xorg  [kernel.kallsyms]  [k] handle_irq_event_percpu
>0.75%  kworker/5:0  [kernel.kallsyms]  [k] handle_irq_event_percpu
>0.32%  gnome-shell  [kernel.kallsyms]  [k] handle_irq_event_percpu
>0.25% kworker/5:1H  [kernel.kallsyms]  [k] handle_irq_event_percpu
>0.25%  Media D~ode #10  [kernel.kallsyms]  [k] handle_irq_event_percpu
>    0.19%  ImageDe~er #330  [kernel.kallsyms]  [k] handle_irq_event_percpu
>0.07%   pulseaudio  [kernel.kallsyms]  [k] handle_irq_event_percpu
>
> The cycles were with -e cycles:pp, so I think that iret would have
> shown up if there were enough IRQs to cause the problem.
>
> I'll build a kernel with latencytop.
>

I just caught call_rwsem_down_write_failed for 5379 ms in khugepaged
(holy crap) and radeon_fence_default_wait for 489.2ms in Xorg.
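On a kernel built with `CONFIG_LATENCYTOP`, the latencytop figures can also be read straight from `/proc/latency_stats`. As far as I know, each record starts with three numbers: the event count, the total latency, and the maximum latency. Both latencies are in microseconds, and the backtrace follows them. On that reading, the earlier line `154 20441402 489139 radeon_fence_default_wait ...` means 154 waits with a 489 ms maximum. The parser below is a minimal sketch under that assumption; the struct and function names are hypothetical.

```c
#include <stdio.h>

/* Leading fields of one /proc/latency_stats record (assumed layout:
 * "count total_usecs max_usecs backtrace..."). */
struct latency_record {
    long count;     /* number of events; -1 if the line did not parse */
    long total_us;  /* summed latency, microseconds */
    long max_us;    /* worst single latency, microseconds */
};

static struct latency_record parse_latency_line(const char *line)
{
    struct latency_record rec = { -1, 0, 0 };

    /* Lines that do not start with three integers (such as the
     * "Latency Top version" header) are reported with count == -1. */
    if (sscanf(line, "%ld %ld %ld", &rec.count, &rec.total_us, &rec.max_us) != 3)
        rec.count = -1;
    return rec;
}

/* Mean latency per event in microseconds (0 if there were no events). */
static double average_us(struct latency_record rec)
{
    return rec.count > 0 ? (double)rec.total_us / (double)rec.count : 0.0;
}
```

For the radeon line this gives an average of roughly 133 ms per wait (20441402 / 154), against the 489 ms worst case.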

Turning off THP gets rid of the khugepaged thing.  The 489.2ms in
radeon_fence_default_wait is amazingly reproducible -- I've seen that
exact number three times now.
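For reference, THP is normally controlled through `/sys/kernel/mm/transparent_hugepage/enabled`. You can run `echo never > /sys/kernel/mm/transparent_hugepage/enabled` or boot with `transparent_hugepage=never`; the email does not say which method was used here. That file lists all modes and brackets the active one, e.g. `[always] madvise never`. The small C sketch below extracts the active mode, which can help confirm that the setting took effect. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Given the contents of /sys/kernel/mm/transparent_hugepage/enabled,
 * e.g. "[always] madvise never", copy the bracketed (active) mode into out.
 * Returns 0 on success, -1 if no bracketed mode exists or out is too small. */
static int thp_selected_mode(const char *contents, char *out, size_t outlen)
{
    const char *open = strchr(contents, '[');
    if (!open)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (!close)
        return -1;
    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* True if `mode` (e.g. "never") is the currently active THP mode. */
static bool thp_mode_is(const char *contents, const char *mode)
{
    char selected[32];

    return thp_selected_mode(contents, selected, sizeof(selected)) == 0 &&
           strcmp(selected, mode) == 0;
}
```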

> --Andy





Long radeon stalls on recent kernels

2014-11-19 Thread Andy Lutomirski
On Tue, Nov 18, 2014 at 11:19 PM, Michel Dänzer  wrote:
> On 19.11.2014 09:21, Andy Lutomirski wrote:
>>
>> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer  
>> wrote:
>>>
>>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>>
>>>>
>>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>>> graphics intensive seems to cause my system to become unusable for
>>>> tens of seconds.  Pointing Firefox at Google Maps is a big offender --
>>>> it can take several minutes for me to move my mouse far enough to
>>>> close the tab and get my computer back.
>>>>
>>>> On bootup, I get this warning:
>>>> [drm:btc_dpm_set_power_state] *ERROR*
>>>> rv770_restrict_performance_levels_before_switch failed
>>>>
>>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>>> giving me rather slow graphics.
>>>>
>>>> Are there known issues here?
>>>
>>>
>>>
>>> Can you bisect the kernel, or at least isolate which kernel version first
>>> introduced the problem?
>>
>>
>> With whatever userspace I'm running, I'm seeing it on 3.13, 3.14,
>> 3.15, 3.16, and 3.18-rc4+.  I haven't tried other versions.
>>
>> With radeon.dpm=0, I can still trigger short stalls (around one
>> second), but I seem unable to trigger long stalls easily.  (I say
>> easily because, just as I was typing this email, my system stalled for
>> about a minute.)
>
>
> I can only think of two things offhand that could cause such extremely long
> stalls: Swap thrashing or IRQ storms.
>
> With a setup where you can easily trigger long stalls, can you try getting a
> CPU profile for a stall with sysprof or perf?
>
>

Got one with perf:

  16.82%  Xorg     libc-2.18.so        [.] __memcpy_sse2_unaligned
   9.20%  swapper  [kernel.kallsyms]   [k] intel_idle
   1.00%  Xorg     [kernel.kallsyms]   [k] evergreen_irq_set
   0.83%  firefox  libxul.so           [.] 0x01d93281
   0.69%  firefox  libxul.so           [.] 0x01d932ad
   0.62%  firefox  [kernel.kallsyms]   [k] copy_user_generic_string
   0.55%  swapper  [kernel.kallsyms]   [k] evergreen_irq_ack
   0.54%  firefox  libpthread-2.18.so  [.] pthread_mutex_lock
   0.52%  firefox  libpthread-2.18.so  [.] pthread_mutex_unlock
   0.45%  Xorg     [kernel.kallsyms]   [k] drm_mm_insert_node_in_range_generic
   0.41%  Xorg     [kernel.kallsyms]   [k] lock_release
   0.40%  Xorg     [kernel.kallsyms]   [k] lock_acquire
   0.35%  firefox  firefox             [.] 0x0001245d
   0.33%  Xorg     [kernel.kallsyms]   [k] __module_address
   0.31%  firefox  [kernel.kallsyms]   [k] clear_page_c
   0.29%  Xorg     [kernel.kallsyms]   [k] copy_user_generic_string
   0.28%  firefox  firefox             [.] 0x00013159

and:

Samples: 11K of event 'irq:irq_handler_entry', Event count (approx.): 11802
  87.43%  swapper  [kernel.kallsyms]  [k] handle_irq_event_percpu
   7.52%  firefox  [kernel.kallsyms]  [k] handle_irq_event_percpu
   1.84%  irq/36-ahci  [kernel.kallsyms]  [k] handle_irq_event_percpu
   1.14% Xorg  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.75%  kworker/5:0  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.32%  gnome-shell  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.25% kworker/5:1H  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.25%  Media D~ode #10  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.19%  ImageDe~er #330  [kernel.kallsyms]  [k] handle_irq_event_percpu
   0.07%   pulseaudio  [kernel.kallsyms]  [k] handle_irq_event_percpu

The cycles were with -e cycles:pp, so I think that iret would have
shown up if there were enough IRQs to cause the problem.

I'll build a kernel with latencytop.

--Andy


Long radeon stalls on recent kernels

2014-11-18 Thread Andy Lutomirski
On Tue, Nov 18, 2014 at 4:34 PM, Andy Lutomirski  wrote:
> On Tue, Nov 18, 2014 at 4:21 PM, Andy Lutomirski  
> wrote:
>> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer  
>> wrote:
>>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>>
>>>> I have a Caicos card, like this:
>>>>
>>>> [3.077260] [drm] radeon kernel modesetting enabled.
>>>> [3.077338] checking generic (e000 60) vs hw (e000
>>>> 1000)
>>>> [3.077339] fb: switching to radeondrmfb from EFI VGA
>>>> [3.077377] Console: switching to colour dummy device 80x25
>>>> [3.078881] [drm] initializing kernel modesetting (CAICOS
>>>> 0x1002:0x6779 0x174B:0xE164).
>>>> [3.078903] [drm] register mmio base: 0xF4A2
>>>> [3.078904] [drm] register mmio size: 131072
>>>> [3.078982] ATOM BIOS: C26401
>>>> [3.079572] radeon :09:00.0: VRAM: 1024M 0x -
>>>> 0x3FFF (1024M used)
>>>> [3.079574] radeon :09:00.0: GTT: 1024M 0x4000 -
>>>> 0x7FFF
>>>> [3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
>>>> [3.079577] [drm] RAM width 64bits DDR
>>>> [3.079755] [TTM] Zone  kernel: Available graphics memory: 8186568 kiB
>>>> [3.079757] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
>>>> [3.079757] [TTM] Initializing pool allocator
>>>> [3.079773] [TTM] Initializing DMA pool allocator
>>>> [3.080011] [drm] radeon: 1024M of VRAM memory ready
>>>> [3.080012] [drm] radeon: 1024M of GTT memory ready.
>>>> [3.080049] [drm] Loading CAICOS Microcode
>>>> [3.080330] [drm] Internal thermal controller without fan control
>>>> [3.081425] [drm] radeon: power management initialized
>>>> [3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
>>>> [3.082589] [drm] enabling PCIE gen 2 link speeds, disable with
>>>> radeon.pcie_gen2=0
>>>> [3.085030] [drm] PCIE GART of 1024M enabled (table at
>>>> 0x00274000).
>>>> [3.085221] radeon :09:00.0: WB enabled
>>>> [3.085224] radeon :09:00.0: fence driver on ring 0 use gpu
>>>> addr 0x4c00 and cpu addr 0x88043d914c00
>>>> [3.085225] radeon :09:00.0: fence driver on ring 3 use gpu
>>>> addr 0x4c0c and cpu addr 0x88043d914c0c
>>>> [3.097438] radeon :09:00.0: fence driver on ring 5 use gpu
>>>> addr 0x00072118 and cpu addr 0xc900128b2118
>>>> [3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>>>> [3.097442] [drm] Driver supports precise vblank timestamp query.
>>>> [3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
>>>> [3.097544] radeon :09:00.0: radeon: using MSI.
>>>> [3.097614] [drm] radeon: irq initialized.
>>>>
>>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>>> graphics intensive seems to cause my system to become unusable for
>>>> tens of seconds.  Pointing Firefox at Google Maps is a big offender --
>>>> it can take several minutes for me to move my mouse far enough to
>>>> close the tab and get my computer back.
>>>>
>>>> On bootup, I get this warning:
>>>> [drm:btc_dpm_set_power_state] *ERROR*
>>>> rv770_restrict_performance_levels_before_switch failed
>>>>
>>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>>> giving me rather slow graphics.
>>>>
>>>> Are there known issues here?
>>>
>>>
>>> Can you bisect the kernel, or at least isolate which kernel version first
>>> introduced the problem?
>>
>> With whatever userspace I'm running, I'm seeing it on 3.13, 3.14, 3.15,
>> 3.16, and 3.18-rc4+.  I haven't tried other versions.
>>
>> With radeon.dpm=0, I can still trigger short stalls (around one
>> second), but I seem unable to trigger long stalls easily.  (I say
>> easily because, just as I was typing this email, my system stalled for
>> about a minute.)
>
> I could be wrong here, but I think that radeon.dpm=0,
> power_profile=default is okay, but radeon.dpm=0, power_profile=high is
> bad.

I'm wrong again.  power_profile=default is also bad.

Grr.

--Andy


Long radeon stalls on recent kernels

2014-11-18 Thread Andy Lutomirski
On Tue, Nov 18, 2014 at 4:21 PM, Andy Lutomirski  wrote:
> On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer  wrote:
>> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>>
>>> I have a Caicos card, like this:
>>>
>>> [3.077260] [drm] radeon kernel modesetting enabled.
>>> [3.077338] checking generic (e000 60) vs hw (e000
>>> 1000)
>>> [3.077339] fb: switching to radeondrmfb from EFI VGA
>>> [3.077377] Console: switching to colour dummy device 80x25
>>> [3.078881] [drm] initializing kernel modesetting (CAICOS
>>> 0x1002:0x6779 0x174B:0xE164).
>>> [3.078903] [drm] register mmio base: 0xF4A2
>>> [3.078904] [drm] register mmio size: 131072
>>> [3.078982] ATOM BIOS: C26401
>>> [3.079572] radeon :09:00.0: VRAM: 1024M 0x -
>>> 0x3FFF (1024M used)
>>> [3.079574] radeon :09:00.0: GTT: 1024M 0x4000 -
>>> 0x7FFF
>>> [3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
>>> [3.079577] [drm] RAM width 64bits DDR
>>> [3.079755] [TTM] Zone  kernel: Available graphics memory: 8186568 kiB
>>> [3.079757] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
>>> [3.079757] [TTM] Initializing pool allocator
>>> [3.079773] [TTM] Initializing DMA pool allocator
>>> [3.080011] [drm] radeon: 1024M of VRAM memory ready
>>> [3.080012] [drm] radeon: 1024M of GTT memory ready.
>>> [3.080049] [drm] Loading CAICOS Microcode
>>> [3.080330] [drm] Internal thermal controller without fan control
>>> [3.081425] [drm] radeon: power management initialized
>>> [3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
>>> [3.082589] [drm] enabling PCIE gen 2 link speeds, disable with
>>> radeon.pcie_gen2=0
>>> [3.085030] [drm] PCIE GART of 1024M enabled (table at
>>> 0x00274000).
>>> [3.085221] radeon :09:00.0: WB enabled
>>> [3.085224] radeon :09:00.0: fence driver on ring 0 use gpu
>>> addr 0x4c00 and cpu addr 0x88043d914c00
>>> [3.085225] radeon :09:00.0: fence driver on ring 3 use gpu
>>> addr 0x4c0c and cpu addr 0x88043d914c0c
>>> [3.097438] radeon :09:00.0: fence driver on ring 5 use gpu
>>> addr 0x00072118 and cpu addr 0xc900128b2118
>>> [3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>>> [3.097442] [drm] Driver supports precise vblank timestamp query.
>>> [3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
>>> [3.097544] radeon :09:00.0: radeon: using MSI.
>>> [3.097614] [drm] radeon: irq initialized.
>>>
>>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>>> graphics intensive seems to cause my system to become unusable for
>>> tens of seconds.  Pointing Firefox at Google Maps is a big offender --
>>> it can take several minutes for me to move my mouse far enough to
>>> close the tab and get my computer back.
>>>
>>> On bootup, I get this warning:
>>> [drm:btc_dpm_set_power_state] *ERROR*
>>> rv770_restrict_performance_levels_before_switch failed
>>>
>>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>>> giving me rather slow graphics.
>>>
>>> Are there known issues here?
>>
>>
>> Can you bisect the kernel, or at least isolate which kernel version first
>> introduced the problem?
>
> With whatever userspace I'm running, I'm seeing it on 3.13, 3.14, 3.15,
> 3.16, and 3.18-rc4+.  I haven't tried other versions.
>
> With radeon.dpm=0, I can still trigger short stalls (around one
> second), but I seem unable to trigger long stalls easily.  (I say
> easily because, just as I was typing this email, my system stalled for
> about a minute.)

I could be wrong here, but I think that radeon.dpm=0,
power_profile=default is okay, but radeon.dpm=0, power_profile=high is
bad.

--Andy

>
> --Andy
>
>>
>>
>> --
>> Earthling Michel Dänzer|  http://www.amd.com
>> Libre software enthusiast  |Mesa and X developer
>
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC





Long radeon stalls on recent kernels

2014-11-18 Thread Andy Lutomirski
On Mon, Nov 17, 2014 at 1:51 AM, Michel Dänzer  wrote:
> On 15.11.2014 07:21, Andy Lutomirski wrote:
>>
>> I have a Caicos card, like this:
>>
>> [3.077260] [drm] radeon kernel modesetting enabled.
>> [3.077338] checking generic (e000 60) vs hw (e000
>> 1000)
>> [3.077339] fb: switching to radeondrmfb from EFI VGA
>> [3.077377] Console: switching to colour dummy device 80x25
>> [3.078881] [drm] initializing kernel modesetting (CAICOS
>> 0x1002:0x6779 0x174B:0xE164).
>> [3.078903] [drm] register mmio base: 0xF4A2
>> [3.078904] [drm] register mmio size: 131072
>> [3.078982] ATOM BIOS: C26401
>> [3.079572] radeon :09:00.0: VRAM: 1024M 0x -
>> 0x3FFF (1024M used)
>> [3.079574] radeon :09:00.0: GTT: 1024M 0x4000 -
>> 0x7FFF
>> [3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
>> [3.079577] [drm] RAM width 64bits DDR
>> [3.079755] [TTM] Zone  kernel: Available graphics memory: 8186568 kiB
>> [3.079757] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
>> [3.079757] [TTM] Initializing pool allocator
>> [3.079773] [TTM] Initializing DMA pool allocator
>> [3.080011] [drm] radeon: 1024M of VRAM memory ready
>> [3.080012] [drm] radeon: 1024M of GTT memory ready.
>> [3.080049] [drm] Loading CAICOS Microcode
>> [3.080330] [drm] Internal thermal controller without fan control
>> [3.081425] [drm] radeon: power management initialized
>> [3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
>> [3.082589] [drm] enabling PCIE gen 2 link speeds, disable with
>> radeon.pcie_gen2=0
>> [3.085030] [drm] PCIE GART of 1024M enabled (table at
>> 0x00274000).
>> [3.085221] radeon :09:00.0: WB enabled
>> [3.085224] radeon :09:00.0: fence driver on ring 0 use gpu
>> addr 0x4c00 and cpu addr 0x88043d914c00
>> [3.085225] radeon :09:00.0: fence driver on ring 3 use gpu
>> addr 0x4c0c and cpu addr 0x88043d914c0c
>> [3.097438] radeon :09:00.0: fence driver on ring 5 use gpu
>> addr 0x00072118 and cpu addr 0xc900128b2118
>> [3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>> [3.097442] [drm] Driver supports precise vblank timestamp query.
>> [3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
>> [3.097544] radeon :09:00.0: radeon: using MSI.
>> [3.097614] [drm] radeon: irq initialized.
>>
>> On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
>> graphics intensive seems to cause my system to become unusable for
>> tens of seconds.  Pointing Firefox at Google Maps is a big offender --
>> it can take several minutes for me to move my mouse far enough to
>> close the tab and get my computer back.
>>
>> On bootup, I get this warning:
>> [drm:btc_dpm_set_power_state] *ERROR*
>> rv770_restrict_performance_levels_before_switch failed
>>
>> Setting radeon.dpm=0 seems to work around this problem at the cost of
>> giving me rather slow graphics.
>>
>> Are there known issues here?
>
>
> Can you bisect the kernel, or at least isolate which kernel version first
> introduced the problem?

With whatever userspace I'm running, I'm seeing it on 3.13, 3.14, 3.15,
3.16, and 3.18-rc4+.  I haven't tried other versions.

With radeon.dpm=0, I can still trigger short stalls (around one
second), but I seem unable to trigger long stalls easily.  (I say
easily because, just as I was typing this email, my system stalled for
about a minute.)

--Andy

>
>
> --
> Earthling Michel Dänzer|  http://www.amd.com
> Libre software enthusiast  |Mesa and X developer





Long radeon stalls on recent kernels

2014-11-14 Thread Andy Lutomirski
I have a Caicos card, like this:

[3.077260] [drm] radeon kernel modesetting enabled.
[3.077338] checking generic (e000 60) vs hw (e000 1000)
[3.077339] fb: switching to radeondrmfb from EFI VGA
[3.077377] Console: switching to colour dummy device 80x25
[3.078881] [drm] initializing kernel modesetting (CAICOS
0x1002:0x6779 0x174B:0xE164).
[3.078903] [drm] register mmio base: 0xF4A2
[3.078904] [drm] register mmio size: 131072
[3.078982] ATOM BIOS: C26401
[3.079572] radeon :09:00.0: VRAM: 1024M 0x -
0x3FFF (1024M used)
[3.079574] radeon :09:00.0: GTT: 1024M 0x4000 -
0x7FFF
[3.079576] [drm] Detected VRAM RAM=1024M, BAR=256M
[3.079577] [drm] RAM width 64bits DDR
[3.079755] [TTM] Zone  kernel: Available graphics memory: 8186568 kiB
[3.079757] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[3.079757] [TTM] Initializing pool allocator
[3.079773] [TTM] Initializing DMA pool allocator
[3.080011] [drm] radeon: 1024M of VRAM memory ready
[3.080012] [drm] radeon: 1024M of GTT memory ready.
[3.080049] [drm] Loading CAICOS Microcode
[3.080330] [drm] Internal thermal controller without fan control
[3.081425] [drm] radeon: power management initialized
[3.081551] [drm] GART: num cpu pages 262144, num gpu pages 262144
[3.082589] [drm] enabling PCIE gen 2 link speeds, disable with
radeon.pcie_gen2=0
[3.085030] [drm] PCIE GART of 1024M enabled (table at 0x00274000).
[3.085221] radeon :09:00.0: WB enabled
[3.085224] radeon :09:00.0: fence driver on ring 0 use gpu
addr 0x4c00 and cpu addr 0x88043d914c00
[3.085225] radeon :09:00.0: fence driver on ring 3 use gpu
addr 0x4c0c and cpu addr 0x88043d914c0c
[3.097438] radeon :09:00.0: fence driver on ring 5 use gpu
addr 0x00072118 and cpu addr 0xc900128b2118
[3.097441] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[3.097442] [drm] Driver supports precise vblank timestamp query.
[3.097514] radeon :09:00.0: irq 56 for MSI/MSI-X
[3.097544] radeon :09:00.0: radeon: using MSI.
[3.097614] [drm] radeon: irq initialized.

On recent kernels (3.16 through 3.18-rc4, perhaps), doing anything
graphics intensive seems to cause my system to become unusable for
tens of seconds.  Pointing Firefox at Google Maps is a big offender --
it can take several minutes for me to move my mouse far enough to
close the tab and get my computer back.

On bootup, I get this warning:
[drm:btc_dpm_set_power_state] *ERROR*
rv770_restrict_performance_levels_before_switch failed

Setting radeon.dpm=0 seems to work around this problem at the cost of
giving me rather slow graphics.

Are there known issues here?

Thanks,
Andy


[PATCH 1/6] x86: Add support for the pcommit instruction

2014-11-14 Thread Andy Lutomirski
On Fri, Nov 14, 2014 at 1:07 PM, Ross Zwisler
 wrote:
> On Wed, 2014-11-12 at 19:25 -0800, Andy Lutomirski wrote:
>> On 11/11/2014 10:43 AM, Ross Zwisler wrote:
>> > Add support for the new pcommit instruction.  This instruction was
>> > announced in the document "Intel Architecture Instruction Set Extensions
>> > Programming Reference" with reference number 319433-022.
>> >
>> > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>> >
>> > Signed-off-by: Ross Zwisler 
>> > Cc: H Peter Anvin 
>> > Cc: Ingo Molnar 
>> > Cc: Thomas Gleixner 
>> > Cc: David Airlie 
>> > Cc: dri-devel at lists.freedesktop.org
>> > Cc: x86 at kernel.org
>> > ---
>> >  arch/x86/include/asm/cpufeature.h| 1 +
>> >  arch/x86/include/asm/special_insns.h | 6 ++
>> >  2 files changed, 7 insertions(+)
>> >
>> > diff --git a/arch/x86/include/asm/cpufeature.h 
>> > b/arch/x86/include/asm/cpufeature.h
>> > index 0bb1335..b3e6b89 100644
>> > --- a/arch/x86/include/asm/cpufeature.h
>> > +++ b/arch/x86/include/asm/cpufeature.h
>> > @@ -225,6 +225,7 @@
>> >  #define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */
>> >  #define X86_FEATURE_ADX( 9*32+19) /* The ADCX and ADOX 
>> > instructions */
>> >  #define X86_FEATURE_SMAP   ( 9*32+20) /* Supervisor Mode Access 
>> > Prevention */
>> > +#define X86_FEATURE_PCOMMIT( 9*32+22) /* PCOMMIT instruction */
>> >  #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
>> >  #define X86_FEATURE_AVX512PF   ( 9*32+26) /* AVX-512 Prefetch */
>> >  #define X86_FEATURE_AVX512ER   ( 9*32+27) /* AVX-512 Exponential and 
>> > Reciprocal */
>> > diff --git a/arch/x86/include/asm/special_insns.h 
>> > b/arch/x86/include/asm/special_insns.h
>> > index e820c08..1709a2e 100644
>> > --- a/arch/x86/include/asm/special_insns.h
>> > +++ b/arch/x86/include/asm/special_insns.h
>> > @@ -199,6 +199,12 @@ static inline void clflushopt(volatile void *__p)
>> >"+m" (*(volatile char __force *)__p));
>> >  }
>> >
>> > +static inline void pcommit(void)
>> > +{
>> > +   alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
>> > +   X86_FEATURE_PCOMMIT);
>> > +}
>> > +
>>
>> Should this patch add the feature bit and cpuinfo entry to go with it?
>>
>> --Andy
>
> I think this patch does everything we need?  The text for cpuinfo is
> auto-generated in arch/x86/kernel/cpu/capflags.c from the flags defined
> in arch/x86/include/asm/cpufeature.h, I think.  Here's what I get in
> cpuinfo on my system with a faked-out CPUID saying that clwb and pcommit
> are present:
>
> $ grep 'flags' /proc/cpuinfo
> flags   : fpu  erms pcommit clflushopt clwb xsaveopt
>
> The X86_FEATURE_CLWB and X86_FEATURE_PCOMMIT flags are being set up
> according to what's in CPUID, and the proper alternatives are being
> triggered.  I stuck some debug code in the alternatives code to see what
> was being patched in the presence and absence of each of the flags.
>
> Is there something else I'm missing?

No.  I just missed the magical auto-generation part.
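For readers following along: the feature constant encodes a (word, bit) pair, and the /proc/cpuinfo name falls out of the macro name. This is a minimal sketch, not kernel code; parse_feature_line and feature_index are hypothetical helpers that split a cpufeature.h line the way I understand the capflags.c generator to work in the simple case (lowercase the suffix after X86_FEATURE_). I believe the real generator also honours quoted name overrides in the comment, which this sketch ignores.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not kernel code: split a cpufeature.h line such as
 *   #define X86_FEATURE_PCOMMIT ( 9*32+22) ...
 * into the lowercase flag name (the part after X86_FEATURE_) and the
 * (word, bit) pair.  Returns 0 on success, -1 if the line does not match.
 */
static int parse_feature_line(const char *line, char *name, size_t name_len,
                              int *word, int *bit)
{
    static const char prefix[] = "#define X86_FEATURE_";
    const char *p;
    size_t n = 0;

    if (name_len == 0 || strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return -1;

    /* Copy and lowercase the identifier suffix, e.g. PCOMMIT -> pcommit. */
    for (p = line + sizeof(prefix) - 1;
         (isalnum((unsigned char)*p) || *p == '_') && n + 1 < name_len; p++)
        name[n++] = (char)tolower((unsigned char)*p);
    name[n] = '\0';
    if (n == 0)
        return -1;

    /* The value is written as "( word*32+bit)". */
    if (sscanf(p, " ( %d*32+%d )", word, bit) != 2)
        return -1;
    return 0;
}

/* Global bit index of the feature, i.e. word * 32 + bit. */
static int feature_index(int word, int bit)
{
    return word * 32 + bit;
}
```

For X86_FEATURE_PCOMMIT this yields the name pcommit and index 9*32+22 = 310, consistent with the pcommit flag Ross sees in /proc/cpuinfo.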

--Andy

>
> Thanks,
> - Ross
>





[PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

2014-11-13 Thread Andy Lutomirski
On Nov 13, 2014 3:20 AM, "Borislav Petkov"  wrote:
>
> On Wed, Nov 12, 2014 at 07:14:21PM -0800, Andy Lutomirski wrote:
> > On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> > > If clwb is available on the system, use it in drm_clflush_virt_range.
> > > If clwb is not available, fall back to clflushopt if you can.
> > > If clflushopt is not supported, fall all the way back to clflush.
> >
> > I don't know exactly what drm_clflush_virt_range (and the other
> > functions you're modifying similarly) are for, but it seems plausible to
> > me that they're used before reads to make sure that non-coherent memory
> > sees updated data.  If that's true, then this will break it.
>
> Why would it break it? The updated cachelines will be in memory and
> subsequent reads will be serviced from the cache instead of going to
> memory, as it is not invalidated as it would be by CLFLUSH.
>
> /me is puzzled.

Suppose you map some device memory WB, and then the device
non-coherently updates.  If you want the CPU to see it, you need
clflush or clflushopt.  Some architectures might do this for
dma_sync_single_for_cpu with DMA_FROM_DEVICE.

I'm not sure that such a thing exists on x86.

--Andy


[PATCH 1/6] x86: Add support for the pcommit instruction

2014-11-12 Thread Andy Lutomirski
On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> Add support for the new pcommit instruction.  This instruction was
> announced in the document "Intel Architecture Instruction Set Extensions
> Programming Reference" with reference number 319433-022.
> 
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> 
> Signed-off-by: Ross Zwisler 
> Cc: H Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Thomas Gleixner 
> Cc: David Airlie 
> Cc: dri-devel at lists.freedesktop.org
> Cc: x86 at kernel.org
> ---
>  arch/x86/include/asm/cpufeature.h| 1 +
>  arch/x86/include/asm/special_insns.h | 6 ++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h 
> b/arch/x86/include/asm/cpufeature.h
> index 0bb1335..b3e6b89 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -225,6 +225,7 @@
>  #define X86_FEATURE_RDSEED   ( 9*32+18) /* The RDSEED instruction */
>  #define X86_FEATURE_ADX  ( 9*32+19) /* The ADCX and ADOX 
> instructions */
>  #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention 
> */
> +#define X86_FEATURE_PCOMMIT  ( 9*32+22) /* PCOMMIT instruction */
>  #define X86_FEATURE_CLFLUSHOPT   ( 9*32+23) /* CLFLUSHOPT instruction */
>  #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
>  #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and 
> Reciprocal */
> diff --git a/arch/x86/include/asm/special_insns.h 
> b/arch/x86/include/asm/special_insns.h
> index e820c08..1709a2e 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -199,6 +199,12 @@ static inline void clflushopt(volatile void *__p)
>  "+m" (*(volatile char __force *)__p));
>  }
>  
> +static inline void pcommit(void)
> +{
> + alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
> + X86_FEATURE_PCOMMIT);
> +}
> +

Should this patch add the feature bit and cpuinfo entry to go with it?

--Andy


[PATCH 6/6] x86: Use clwb in drm_clflush_virt_range

2014-11-12 Thread Andy Lutomirski
On 11/11/2014 10:43 AM, Ross Zwisler wrote:
> If clwb is available on the system, use it in drm_clflush_virt_range.
> If clwb is not available, fall back to clflushopt if you can.
> If clflushopt is not supported, fall all the way back to clflush.

I don't know exactly what drm_clflush_virt_range (and the other
functions you're modifying similarly) are for, but it seems plausible to
me that they're used before reads to make sure that non-coherent memory
sees updated data.  If that's true, then this will break it.

But maybe all the users write to coherent memory and just need to
ensure that whatever's backing the memory knows about the write.

FWIW, it may make sense to rename this function to drm_clwb_virt_range
if you make this change.

--Andy

> 
> Signed-off-by: Ross Zwisler 
> Cc: H Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Thomas Gleixner 
> Cc: David Airlie 
> Cc: dri-devel at lists.freedesktop.org
> Cc: x86 at kernel.org
> ---
>  drivers/gpu/drm/drm_cache.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index aad9d82..84e9a04 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -138,8 +138,8 @@ drm_clflush_virt_range(void *addr, unsigned long length)
>   void *end = addr + length;
>   mb();
>   for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
> - clflushopt(addr);
> - clflushopt(end - 1);
> + clwb(addr);
> + clwb(end - 1);
>   mb();
>   return;
>   }
> 
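A userspace model of the loop in the drm_clflush_virt_range hunk, as a sketch only. Rather than executing flush instructions, flushed_lines records which cache line each call would touch, which shows why the trailing flush of end - 1 is needed when addr is not line-aligned. pick_flush_insn encodes the fallback order from the patch description; its must_invalidate parameter is a hypothetical addition reflecting the concern that CLWB may leave the line valid in the cache, so it cannot stand in for CLFLUSH/CLFLUSHOPT when the CPU must discard stale data. The helper names are illustrative, the line size is assumed to be a power of two, and the mb() barriers are omitted.

```c
#include <stddef.h>
#include <stdint.h>

enum flush_insn { FLUSH_CLFLUSH, FLUSH_CLFLUSHOPT, FLUSH_CLWB };

/*
 * Fallback order from the patch description: CLWB, then CLFLUSHOPT, then
 * CLFLUSH.  must_invalidate is a hypothetical extra input: set it when the
 * CPU must discard possibly stale lines (e.g. before reading data a device
 * wrote non-coherently).  CLWB may leave the line valid in the cache, so it
 * is skipped in that case.
 */
static enum flush_insn pick_flush_insn(int has_clwb, int has_clflushopt,
                                       int must_invalidate)
{
    if (has_clwb && !must_invalidate)
        return FLUSH_CLWB;
    if (has_clflushopt)
        return FLUSH_CLFLUSHOPT;
    return FLUSH_CLFLUSH;
}

/*
 * Model of the loop in drm_clflush_virt_range(): step from addr to end by
 * the cache-line size, then flush end - 1 as well.  Instead of flushing,
 * record the base address of the line each call would hit.  line_size must
 * be a power of two.  Returns the number of entries written to lines[]
 * (at most max_lines).
 */
static size_t flushed_lines(uintptr_t addr, size_t length, size_t line_size,
                            uintptr_t *lines, size_t max_lines)
{
    const uintptr_t mask = ~(uintptr_t)(line_size - 1);
    const uintptr_t end = addr + length;
    size_t n = 0;

    for (; addr < end; addr += line_size)
        if (n < max_lines)
            lines[n++] = addr & mask;
    /* An unaligned start can step past the last line; this catches it. */
    if (n < max_lines)
        lines[n++] = (end - 1) & mask;
    return n;
}
```

For example, with 64-byte lines, a 0x20-byte range starting at 0x1030 spans lines 0x1000 and 0x1040, but the loop only visits 0x1030; the final flush of end - 1 (0x104f) is what covers line 0x1040.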



3.14 radeon regression: radeon is broken (pci bug?)

2014-09-16 Thread Andy Lutomirski
On Tue, Sep 16, 2014 at 9:45 AM, Bjorn Helgaas  wrote:
> On Thu, Mar 27, 2014 at 11:30:37AM -0600, Bjorn Helgaas wrote:
>> On Mon, Mar 24, 2014 at 4:04 PM, Bjorn Helgaas  
>> wrote:
>> > On Sat, Mar 22, 2014 at 9:18 AM, Andy Lutomirski  
>> > wrote:
>> >> On Fri, Mar 21, 2014 at 9:37 AM, Bjorn Helgaas  
>> >> wrote:
>> >>> On Fri, Mar 21, 2014 at 9:49 AM, Andy Lutomirski wrote:
>> >>>> On Fri, Mar 21, 2014 at 7:41 AM, Alex Deucher wrote:
>> >>>>> On Thu, Mar 20, 2014 at 10:17 PM, Andy Lutomirski wrote:
>> >>>>>> My system works on a 3.13 Fedora kernel.  It does not work on a
>> >>>>>> more-or-less identically configured 3.14-rc7+ kernel.  The symptom is
>> >>>>>> that the Plymouth password prompt flashes and then the screen goes
>> >>>>>> blank.  Hitting escape brings back the text console, and all is well
>> >>>>>> until X tries to start.  Then I get a blank screen.  killall -9 Xorg
>> >>>>>> from ssh causes these errors to be logged:
>> >>>>>>
>> >>>>>>
>> >>>>>> [  226.239747] [drm:atom_op_jump] *ERROR* atombios stuck in loop for
>> >>>>>> more than 5secs aborting
>> >>>>>> [  226.239751] [drm:atom_execute_table_locked] *ERROR* atombios stuck
>> >>>>>> executing CD34 (len 55, WS 0, PS 0) @ 0xCD57
>> >>>>>> [  231.241492] [drm:atom_op_jump] *ERROR* atombios stuck in loop for
>> >>>>>> more than 5secs aborting
>> >>>>>> [  231.241496] [drm:atom_execute_table_locked] *ERROR* atombios stuck
>> >>>>>> executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>> >>>>>> [  236.243111] [drm:atom_op_jump] *ERROR* atombios stuck in loop for
>> >>>>>> more than 5secs aborting
>> >>>>>> [  236.243115] [drm:atom_execute_table_locked] *ERROR* atombios stuck
>> >>>>>> executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>> >>>>>> [  241.244625] [drm:atom_op_jump] *ERROR* atombios stuck in loop for
>> >>>>>> more than 5secs aborting
>> >>>>>> [  241.244628] [drm:atom_execute_table_locked] *ERROR* atombios stuck
>> >>>>>> executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>> >>>>>>
>> >>>>>>
>> >>>>>> lspci -vvvxxxnn on 3.14-rc7+ says:
>> >>>>>>
>> >>>>>> 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
>> >>>>>> [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
>> >>>>>> (rev ff) (prog-if ff)
>> >>>>>> !!! Unknown header type 7f
>> >>>>>> Kernel driver in use: radeon
>> >>>>>> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> >>>>>> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> >>>>>> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> >>>>>> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> >>>>>>
>> >>>>>> 09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
>> >>>>>> Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98] (rev ff)
>> >>>>>> (prog-if ff)
>> >>>>>> !!! Unknown header type 7f
>> >>>>>> Kernel driver in use: snd_hda_intel
>> >>>>>> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> >>>>>> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> >>>>>> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> >>>>>> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> >>>>>>
>> >>>>>> (oops!)
>> >>>>>>
>> >>>>>> On 3.13, it says:
>> >>>>>>
>> >>>>>> 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
>> >>>>>> [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
>> >>>>>> (prog-if 00 [VGA controller])
>> >>>>>> Subsystem: PC Partner Limited / Sapphire Te

[PATCH 0/6] File Sealing & memfd_create()

2014-06-17 Thread Andy Lutomirski
On Jun 17, 2014 2:48 AM, "Florian Weimer"  wrote:
>
> On 04/10/2014 10:37 PM, Andy Lutomirski wrote:
>
>> It occurs to me that, before going nuts with these kinds of flags, it
>> may pay to just try to fix the /proc/self/fd issue for real -- we
>> could just make open("/proc/self/fd/3", O_RDWR) fail if fd 3 is
>> read-only.  That may be enough for the file sealing thing.
>
>
> Increasing privilege on O_PATH descriptors via access through
> /proc/self/fd is part of the userspace API.  The same thing might be true
> for O_RDONLY descriptors, but it's a bit less likely that there are any
> users out there.  In any case, I'm not sure it makes sense to plug the
> O_RDONLY hole while leaving the O_PATH hole open.

Do you mean O_PATH fds for the directory or O_PATH fds for the file
itself?  In any event, I'm much less concerned about passing O_PATH memfds
around than O_RDONLY memfds.
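A minimal Linux demonstration of the O_RDONLY hole being discussed, assuming glibc 2.27 or later for memfd_create(): drop to a read-only descriptor for a memfd, then reopen it through /proc/self/fd with O_RDWR. Because the new open is checked against the inode's owner and mode rather than the flags of the existing descriptor, I would expect the write to succeed for the memfd's owner on a stock kernel. The helper names are illustrative, not an existing API.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reopen an existing fd through /proc/self/fd with new open flags.
 * Returns the new fd, or -1 on error. */
static int reopen_via_proc(int fd, int flags)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    return open(path, flags);
}

/* Create a memfd, keep only a read-only descriptor to it, then try to
 * regain write access through /proc/self/fd.  Returns 1 if a write through
 * the reopened descriptor succeeded, 0 if it was refused, -1 on setup
 * failure. */
static int readonly_fd_can_be_upgraded(void)
{
    int rw = memfd_create("demo", 0);
    if (rw < 0)
        return -1;
    int ro = reopen_via_proc(rw, O_RDONLY);
    close(rw);                 /* from here on we hold only a read-only fd */
    if (ro < 0)
        return -1;

    int upgraded = reopen_via_proc(ro, O_RDWR);
    close(ro);
    if (upgraded < 0)
        return 0;
    ssize_t n = write(upgraded, "x", 1);
    close(upgraded);
    return n == 1;
}
```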

I have incomplete patches for this stuff.  I need to fix them so they work
and get past Al Viro.


--Andy


[PATCH 2/6] shm: add sealing API

2014-04-11 Thread Andy Lutomirski
On Fri, Apr 11, 2014 at 2:42 PM, David Herrmann  
wrote:
> Hi
>
> On Fri, Apr 11, 2014 at 11:36 PM, Andy Lutomirski  
> wrote:
>> A quick grep of the kernel tree finds exactly zero code paths
>> incrementing i_mmap_writable outside of mmap and fork.
>>
>> Or do you mean a different kind of write ref?  What am I missing here?
>
> Sorry, I meant i_writecount.

I bet this is missing from lots of places.  For example, I can't find
any write_access stuff in the rdma code.

I suspect that the VM_DENYWRITE code is just generally racy.

--Andy


[PATCH 2/6] shm: add sealing API

2014-04-11 Thread Andy Lutomirski
On 04/11/2014 02:31 PM, David Herrmann wrote:
> Hi
> 
> On Fri, Apr 11, 2014 at 3:43 PM, Tony Battersby  
> wrote:
>> Exactly.  For O_DIRECT, that would be the call to get_user_pages_fast()
>> from dio_refill_pages() in fs/direct-io.c, which is ultimately called
>> from blkdev_direct_IO().
> 
> If you drop mmap_sem after pinning a page without taking a write-ref,
> you break i_mmap_writable / VM_DENYWRITE. In memfd I rely on
> i_mmap_writable to work, same thing is done by exec() (and the old,
> now disabled, MAP_DENYWRITE).
> 
> I don't know whether I should care. I mean, everyone pinning pages and
> writing to it without holding the mmap_sem has to take a write-ref for
> each page or it breaks i_mmap_writable. So this seems to be a bug in
> direct-IO, not in anyone relying on it, right?

A quick grep of the kernel tree finds exactly zero code paths
incrementing i_mmap_writable outside of mmap and fork.

Or do you mean a different kind of write ref?  What am I missing here?
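For context on why i_mmap_writable matters here: in the sealing interface as eventually merged in Linux (3.17; it may differ from the version in this series), adding F_SEAL_WRITE fails with EBUSY while a shared writable mapping exists and succeeds once the mapping is gone. A minimal sketch, assuming glibc 2.27 or later; seal_write_with_mapping is a hypothetical helper. Note that it does not exercise the get_user_pages case, which is the path this thread is worried about.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Try to add F_SEAL_WRITE while a MAP_SHARED writable mapping exists.
 * Returns the errno from that attempt (0 if it unexpectedly succeeded),
 * or -1 on setup failure.  *after_unmap receives the fcntl() result of
 * retrying once the mapping has been removed.
 */
static int seal_write_with_mapping(int *after_unmap)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = memfd_create("seal-demo", MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int err = 0;
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0)
        err = errno;           /* expected: EBUSY, the mapping is writable */

    munmap(p, page);
    *after_unmap = fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);  /* expected: 0 */
    close(fd);
    return err;
}
```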

--Andy


[PATCH 2/6] shm: add sealing API

2014-04-10 Thread Andy Lutomirski
On 04/10/2014 05:22 PM, David Herrmann wrote:
> Hi
> 
> On Thu, Apr 10, 2014 at 11:33 PM, Tony Battersby  
> wrote:
>> For O_DIRECT the kernel pins the submitted pages in memory for DMA by
>> incrementing the page reference counts when the I/O is submitted,
>> allowing the pages to be modified by DMA even if they are no longer
>> mapped in the address space of the process.  This is different from a
>> regular read(), which uses the CPU to copy the data and will fail if the
>> pages are not mapped.
> 
> Can you please provide an example code-path? For instance,
> file_read_actor() does not pin any pages but only keeps the user-space
> address and resolves it once it has data to write.

This may be an issue for anything in the kernel that calls
get_user_pages and holds onto the result at any time that mmap_sem isn't
held.

I don't know exactly what does that, but RDMA comes to mind.  So does
(ugh!) vmsplice, although I suspect that vmsplice doesn't write.

--Andy


[PATCH 0/6] File Sealing & memfd_create()

2014-04-10 Thread Andy Lutomirski
On Thu, Apr 10, 2014 at 4:16 PM, David Herrmann  
wrote:
> Hi
>
> On Fri, Apr 11, 2014 at 1:05 AM, Andy Lutomirski  
> wrote:
>> /proc/pid/fd is a really weird corner case in which the mode of an
>> inode that doesn't have a name matters.  I suspect that almost no one
>> will ever want to open one of these things out of /proc/self/fd, and
>> those who do should be made to think about it.
>
> I'm arguing in the context of memfd, and there's no security leak if
> people get access to the underlying inode (at least I'm not aware of
> any).

I'm not sure what you mean.

> As I said, context information is attached to the inode, not
> file context, so I'm fine if people want to open multiple file
> contexts via /proc. If someone wants to forbid open(), I want to hear
> _why_. I assume the memfd object has uid==uid-of-creator and
> mode==(777 & ~umask) (which usually results in X00, so no access for
> non-owners). I cannot see how /proc is a security issue here.

On further reflection, my argument for 000 is crap.  As far as I can
see, the only time that the mode matters at all is when playing with
/proc/pid/fd, and the only way to get a non-O_RDWR memfd is using
/proc/pid/fd, so I'll argue for 0600 instead.

Argument why 0600 is better than 0600 & ~umask: either callers don't
care because the inode mode simply doesn't matter or they're using
/proc/pid/fd to *reduce* permissions, in which case they'd probably
like to avoid having to play with umask or call fchmod.

Argument why 0600 is better than 0777 & ~umask: people using /proc/pid/fd
are the only ones who care, and if they hand out a reduced-permission
fd, they probably don't want other users to be able to increase its
permissions by reopening it.

Anyway, this is all mostly unimportant.  Some text in the man page is
probably sufficient, but I still think that 0600 is trivial to
implement and a little bit more friendly.

--Andy

>
> Thanks
> David





[PATCH 0/6] File Sealing & memfd_create()

2014-04-10 Thread Andy Lutomirski
On Thu, Apr 10, 2014 at 3:57 PM, David Herrmann  
wrote:
> Hi
>
> On Thu, Apr 10, 2014 at 11:16 PM, Andy Lutomirski  
> wrote:
>> Would it make sense for the initial mode on a memfd inode to be 000?
>> Anyone who finds this to be problematic could use fchmod to fix it.
>
> memfd_create() should be subject to umask() just like anything else.
> That should solve any possible race here, right?

Yes, but how many people will actually think about umask when doing
things that don't really look like creating files?

/proc/pid/fd is a really weird corner case in which the mode of an
inode that doesn't have a name matters.  I suspect that almost no one
will ever want to open one of these things out of /proc/self/fd, and
those who do should be made to think about it.

It also avoids odd screwups where things are secure until someone runs
them with umask 000.

--Andy


[PATCH 0/6] File Sealing & memfd_create()

2014-04-10 Thread Andy Lutomirski
On Thu, Apr 10, 2014 at 1:49 PM, David Herrmann  
wrote:
> Hi
>
> On Thu, Apr 10, 2014 at 10:37 PM, Andy Lutomirski  
> wrote:
>> It occurs to me that, before going nuts with these kinds of flags, it
>> may pay to just try to fix the /proc/self/fd issue for real -- we
>> could just make open("/proc/self/fd/3", O_RDWR) fail if fd 3 is
>> read-only.  That may be enough for the file sealing thing.
>
> For the sealing API, none of this is needed. As long as the inode is
> owned by the uid who creates the memfd, you can pass it around and
> no-one besides root and you can open /proc/self/fd/$fd (assuming chmod
> 700). If you share the fd with someone with the same uid as you,
> you're screwed anyway. We don't protect users against themselves (I
> mean, they can ptrace you, or kill()..). Therefore, I'm not really
> convinced that we want this for memfd. At least no-one has provided a
> _proper_ use-case for this so far.

Hmm.  Fair enough.

Would it make sense for the initial mode on a memfd inode to be 000?
Anyone who finds this to be problematic could use fchmod to fix it.
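Under the 000 proposal, a caller that does want /proc/pid/fd reopening would set the mode explicitly. With today's API that might look like the following sketch, which assumes Linux with glibc 2.27 or later for memfd_create(); the wrapper name is hypothetical:

```c
#define _GNU_SOURCE
#include <sys/mman.h>   /* memfd_create, MFD_* */
#include <sys/stat.h>   /* fchmod, fstat, mode_t */
#include <unistd.h>     /* close */

/*
 * Create a memfd and immediately set its inode mode, so that reopening it
 * through /proc/<pid>/fd does not depend on the kernel's default mode.
 * Returns the fd, or -1 on error.
 */
static int memfd_create_mode(const char *name, unsigned int flags, mode_t mode)
{
    int fd = memfd_create(name, flags);
    if (fd < 0)
        return -1;
    if (fchmod(fd, mode) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling fchmod() before the fd is passed anywhere means a recipient never sees the default mode.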

I might even go so far as to suggest that the default uid on the inode
should be 0 (i.e. global root), since there is the odd corner case of
root setting euid != 0, creating a memfd, and setting euid back to 0.
The latter might cause resource accounting issues, though.

--Andy


[PATCH 0/6] File Sealing & memfd_create()

2014-04-10 Thread Andy Lutomirski
On Thu, Apr 10, 2014 at 1:32 PM, Theodore Ts'o  wrote:
> On Thu, Apr 10, 2014 at 12:14:27PM -0700, Andy Lutomirski wrote:
>>
>> This is the second time in a week that someone has asked for a way to
>> have a struct file (or struct inode or whatever) that can't be reopened
>> through /proc/pid/fd.  This should be quite easy to implement as a
>> separate feature.
>
> What I suggested on a different thread was to add the following new
> file descriptor flags, to join FD_CLOEXEC, which would be manipulated
> using the F_GETFD and F_SETFD fcntl commands:
>
> FD_NOPROCFS    disallow being able to open the inode via /proc//fd
>
> FD_NOPASSFD    disallow being able to pass the fd via a unix domain socket
>
> FD_LOCKFLAGS   if this bit is set, disallow any further changes of
>                FD_CLOEXEC, FD_NOPROCFS, FD_NOPASSFD, and FD_LOCKFLAGS flags.
>
> Regardless of what else we might need to meet the use case for the
> proposed File Sealing API, I think this is a useful feature that could
> be used in many other contexts besides just the proposed
> memfd_create() use case.

It occurs to me that, before going nuts with these kinds of flags, it
may pay to just try to fix the /proc/self/fd issue for real -- we
could just make open("/proc/self/fd/3", O_RDWR) fail if fd 3 is
read-only.  That may be enough for the file sealing thing.
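The hole being discussed can be seen with existing APIs: a read-only fd can be turned back into a read-write one by reopening it through procfs, because that open() performs a fresh permission check against the inode rather than looking at the original fd's access mode. A sketch, assuming Linux with /proc mounted and glibc 2.27+ for memfd_create(); the helper names are hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* open, O_* */
#include <stdio.h>      /* snprintf */
#include <sys/mman.h>   /* memfd_create */
#include <unistd.h>     /* close */

/* Reopen the file behind `fd` through /proc/self/fd with new access flags. */
static int reopen_via_proc(int fd, int flags)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    return open(path, flags);
}

/*
 * Returns 1 if a read-only fd for a fresh memfd can be reopened read-write,
 * 0 if that reopen is refused, and -1 on any other error.
 */
static int ro_memfd_reopens_rw(void)
{
    int rw = memfd_create("probe", MFD_CLOEXEC);
    if (rw < 0)
        return -1;
    int ro = reopen_via_proc(rw, O_RDONLY | O_CLOEXEC);
    close(rw);                       /* only the read-only fd remains */
    if (ro < 0)
        return -1;
    int again = reopen_via_proc(ro, O_RDWR | O_CLOEXEC);
    close(ro);
    if (again < 0)
        return 0;
    close(again);
    return 1;
}
```

For example, write() on `reopen_via_proc(memfd, O_RDONLY)` fails with EBADF, yet `reopen_via_proc(that_fd, O_RDWR)` succeeds for the file's owner. Under the proposal above, that second reopen would fail instead.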

--Andy


[PATCH 0/6] File Sealing & memfd_create()

2014-04-10 Thread Andy Lutomirski
On 04/08/2014 06:00 AM, Florian Weimer wrote:
> On 03/19/2014 08:06 PM, David Herrmann wrote:
> 
>> Unlike existing techniques that provide similar protection, sealing allows
>> file-sharing without any trust-relationship. This is enforced by rejecting
>> seal modifications if you don't own an exclusive reference to the given
>> file. So if you own a file-descriptor, you can be sure that no-one besides
>> you can modify the seals on the given file. This allows mapping shared
>> files from untrusted parties without the fear of the file getting
>> truncated or modified by an attacker.
> 
> How do you keep these promises on network and FUSE file systems?  Surely
> there is still some trust involved for such descriptors?
> 
> What happens if you create a loop device on a sealed descriptor?
> 
> Why does memfd_create not create a file backed by a memory region in the
> current process?  Wouldn't this be a far more generic primitive?
> Creating aliases of memory regions would be interesting for many things
> (not just libffi bypassing SELinux-enforced NX restrictions :-).

If you write a patch to prevent selinux from enforcing NX, I will ack
that patch with all my might.  I don't know how far it would get me, but
I think that selinux has no business going anywhere near execmem.

Adding a clone mode to mremap might be a better bet.  But memfd solves
that problem, too, albeit messily.
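The "messy" memfd approach to aliasing is to map the same memfd twice: one writable view and a second view with different protections (for libffi-style trampolines, PROT_READ | PROT_EXEC). A sketch assuming Linux and glibc 2.27+; the struct and function names are hypothetical, and whether an executable mapping of a memfd is permitted still depends on the system's security policy:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>   /* memfd_create, mmap, munmap */
#include <unistd.h>     /* ftruncate, close, off_t */

struct alias {
    void *rw;    /* writable view */
    void *alt;   /* second view of the same pages, with alt_prot */
    size_t len;
    int fd;
};

/* Map one memfd twice so the same pages appear at two addresses. */
static int alias_create(struct alias *a, size_t len, int alt_prot)
{
    a->len = len;
    a->fd = memfd_create("alias", MFD_CLOEXEC);
    if (a->fd < 0)
        return -1;
    if (ftruncate(a->fd, (off_t)len) < 0)
        goto fail_fd;
    a->rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, a->fd, 0);
    if (a->rw == MAP_FAILED)
        goto fail_fd;
    a->alt = mmap(NULL, len, alt_prot, MAP_SHARED, a->fd, 0);
    if (a->alt == MAP_FAILED)
        goto fail_rw;
    return 0;
fail_rw:
    munmap(a->rw, len);
fail_fd:
    close(a->fd);
    return -1;
}

static void alias_destroy(struct alias *a)
{
    munmap(a->alt, a->len);
    munmap(a->rw, a->len);
    close(a->fd);
}
```

Writes through `a.rw` are immediately visible through `a.alt`, since both are MAP_SHARED views of the same pages.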

--Andy


[PATCH 0/6] File Sealing & memfd_create()

2014-04-10 Thread Andy Lutomirski
On 04/10/2014 07:45 AM, Colin Walters wrote:
> On Thu, Mar 20, 2014 at 11:32 AM, tytso at mit.edu wrote:
>>
>> Looking at your patches, and what files you are modifying, you are
>> enforcing this in the low-level file system.
> 
> I would love for this to be implemented in the filesystem level as
> well.  Something like the ext4 immutable bit, but with the ability to
> still make hardlinks would be *very* useful for OSTree.  And anyone else
> that uses hardlinks as a data source.  The vserver people do something
> similar:
> http://linux-vserver.org/util-vserver:Vhashify
> 
> At the moment I have a read-only bind mount over /usr, but what I really
> want is to make the individual objects in the object store in
> /ostree/repo/objects be immutable, so even if a user or app navigates
> out to /sysroot they still can't mutate them (or the link targets in the
> visible /usr).

COW links can do this already, I think.  Of course, you'll have to use a
filesystem that supports them.

--Andy


[PATCH 0/6] File Sealing & memfd_create()

2014-04-10 Thread Andy Lutomirski
On 03/20/2014 09:38 AM, tytso at mit.edu wrote:
> On Thu, Mar 20, 2014 at 04:48:30PM +0100, David Herrmann wrote:
>> On Thu, Mar 20, 2014 at 4:32 PM,   wrote:
>>> Why not make sealing an attribute of the "struct file", and enforce it
>>> at the VFS layer?  That way all file system objects would have access
>>> to sealing interface, and for memfd_shmem, you can't get another
>>> struct file pointing at the object, the security properties would be
>>> identical.
>>
>> Sealing as introduced here is an inode-attribute, not "struct file".
>> This is intentional. For instance, a gfx-client can get a read-only FD
>> via /proc/self/fd/ and pass it to the compositor so it can never
>> overwrite the contents (unless the compositor has write-access to the
>> inode itself, in which case it can just re-open it read-write).
> 
> Hmm, good point.  I had forgotten about the /proc/self/fd hole.
> Hmm... what if we have a SEAL_PROC which forces the permissions of
> /proc/self/fd to be 000?

This is the second time in a week that someone has asked for a way to
have a struct file (or struct inode or whatever) that can't be reopened
through /proc/pid/fd.  This should be quite easy to implement as a
separate feature.

Actually, that feature would solve a major pet peeve of mine, I think: I
want something like memfd that allows me to keep the thing read-write
but that whoever I pass the fd to can't change.  With this feature, I
could do:

fd_rw = memfd_create (or O_TMPFILE or whatever)
fd_ro = open(/proc/self/fd/fd_rw, O_RDONLY);
fcntl(fd_ro, F_RESTRICT, F_RESTRICT_REOPEN);

send fd_ro via SCM_RIGHTS.

To really make this work well, I also want to SEAL_SHRINK the inode so
that the receiver can verify that I'm not going to truncate the file out
from under it.

Bingo, fast and secure one-way IPC.
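Most of this recipe can be written with interfaces that exist today: memfd_create() with MFD_ALLOW_SEALING, F_ADD_SEALS with F_SEAL_SHRINK (plus F_SEAL_GROW), a read-only reopen through /proc/self/fd, and SCM_RIGHTS. The F_RESTRICT/F_RESTRICT_REOPEN step is only proposed in this message and is omitted, so a receiver may still be able to reopen its copy read-write through its own /proc/self/fd, subject to the inode's mode and owner. A sketch assuming Linux and glibc 2.27+; the helper names are hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>       /* open, fcntl, F_ADD_SEALS, F_SEAL_* */
#include <stdio.h>       /* snprintf */
#include <string.h>      /* memcpy, memset */
#include <sys/mman.h>    /* memfd_create, MFD_* */
#include <sys/socket.h>  /* sendmsg, recvmsg, SCM_RIGHTS */
#include <unistd.h>      /* write, close */

/* Control-message buffer with the alignment struct cmsghdr requires. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                  /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns it, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/*
 * Sender side: fill a sealable memfd, forbid resizing, and return a
 * read-only fd for the same inode.  *rw_out receives the writable fd.
 */
static int make_sealed_ro(const char *data, size_t len, int *rw_out)
{
    int rw = memfd_create("oneway", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (rw < 0)
        return -1;
    if (write(rw, data, len) != (ssize_t)len ||
        fcntl(rw, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW) < 0) {
        close(rw);
        return -1;
    }
    char path[64];
    snprintf(path, sizeof(path), "/proc/self/fd/%d", rw);
    int ro = open(path, O_RDONLY | O_CLOEXEC);
    if (ro < 0) {
        close(rw);
        return -1;
    }
    *rw_out = rw;
    return ro;
}
```

The receiver can check `fcntl(fd, F_GET_SEALS)` for F_SEAL_SHRINK before mapping the file; any attempt by the sender to truncate it then fails with EPERM.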

--Andy


[PATCH 3/6] shm: add memfd_create() syscall

2014-04-10 Thread Andy Lutomirski
On 04/02/2014 06:38 AM, Konstantin Khlebnikov wrote:
> On Wed, Mar 19, 2014 at 11:06 PM, David Herrmann  
> wrote:
>> memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
>> that you can pass to mmap(). It explicitly allows sealing and
>> avoids any connection to user-visible mount-points. Thus, it's not
>> subject to quotas on mounted file-systems, but can be used like
>> malloc()'ed memory, but with a file-descriptor to it.
>>
>> memfd_create() does not create a front-FD, but instead returns the raw
>> shmem file, so calls like ftruncate() can be used. Also calls like fstat()
>> will return proper information and mark the file as regular file. Sealing
>> is explicitly supported on memfds.
>>
>> Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
>> subject to quotas and alike.
> 
> Instead of adding new syscall we can extend existing openat() a little
> bit more:
> 
> openat(AT_FDSHM, "name", O_TMPFILE | O_RDWR, 0666)

Please don't.  O_TMPFILE is a messy enough API, and the last thing we
need to do is to extend it.  If we want a fancy API for creating new
inodes with no corresponding dentry, let's create one.

Otherwise, let's just stick with a special-purpose API for these shm files.
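The properties claimed for memfd_create() in the quoted description (no mount point needed, ftruncate() works, fstat() reports a regular file) can be exercised directly. A sketch assuming Linux and glibc 2.27+; the helper name is hypothetical:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>   /* memfd_create, mmap */
#include <sys/stat.h>   /* fstat, S_ISREG */
#include <unistd.h>     /* ftruncate, close, off_t */

/*
 * Create an anonymous, file-backed buffer of `len` bytes and map it.
 * No directory or tmpfs mount point is involved.
 */
static void *anon_file_buffer(size_t len, int *fd_out)
{
    int fd = memfd_create("buffer", MFD_CLOEXEC);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) < 0) {   /* size it like a regular file */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}
```

By contrast, `open(dir, O_TMPFILE | O_RDWR, 0600)` needs a directory on a filesystem that supports O_TMPFILE.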

--Andy


Framebuffer corruption in QEMU or Linux's cirrus driver

2014-04-01 Thread Andy Lutomirski
On Tue, Apr 1, 2014 at 3:09 PM, Andy Lutomirski  wrote:
> Running:
>
> ./virtme-run --installed-kernel
>
> from this virtme commit:
>
> https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git/commit/?id=2b409a086d15b7a878c7d5204b1f44a6564a341f
>
> results in a bunch of missing lines of text once bootup finishes.
> Pressing enter a few times gradually fixes it.
>
> I don't know whether this is a qemu bug or a Linux bug.
>
> I'm seeing this on Fedora's 3.13.7 kernel and on a fairly recent
> 3.14-rc kernel.  For the latter, cirrus is built-in (not a module),
> I'm running:
>
> virtme-run --kimg arch/x86/boot/bzImage
>
> and I see more profound corruption.

I'm guessing this is a cirrus drm bug.  bochs-drm (using virtme-run
--installed-kernel --qemu-opts -vga std) does not appear to have the
same issue.  Neither does qxl.

(qxl is painfully slow, though, and it doesn't seem to be using UC memory.)

--Andy


3.14 radeon regression: radeon is broken (pci bug?)

2014-03-22 Thread Andy Lutomirski
On Fri, Mar 21, 2014 at 9:37 AM, Bjorn Helgaas  wrote:
> On Fri, Mar 21, 2014 at 9:49 AM, Andy Lutomirski  
> wrote:
>> On Fri, Mar 21, 2014 at 7:41 AM, Alex Deucher  
>> wrote:
>>> On Thu, Mar 20, 2014 at 10:17 PM, Andy Lutomirski  
>>> wrote:
>>>> My system works on a 3.13 Fedora kernel.  It does not work on a
>>>> more-or-less identically configured 3.14-rc7+ kernel.  The symptom is
>>>> that the Plymouth password prompt flashes and then the screen goes
>>>> blank.  Hitting escape brings back the text console, and all is well
>>>> until X tries to start.  Then I get a blank screen.  killall -9 Xorg
>>>> from ssh causes these errors to be logged:
>>>>
>>>>
>>>> [  226.239747] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
>>>> [  226.239751] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD34 (len 55, WS 0, PS 0) @ 0xCD57
>>>> [  231.241492] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
>>>> [  231.241496] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>>>> [  236.243111] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
>>>> [  236.243115] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>>>> [  241.244625] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
>>>> [  241.244628] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>>>>
>>>>
>>>> lspci -vvvxxxnn on 3.14-rc7+ says:
>>>>
>>>> 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
>>>> [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
>>>> (rev ff) (prog-if ff)
>>>> !!! Unknown header type 7f
>>>> Kernel driver in use: radeon
>>>> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>
>>>> 09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
>>>> Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98] (rev ff)
>>>> (prog-if ff)
>>>> !!! Unknown header type 7f
>>>> Kernel driver in use: snd_hda_intel
>>>> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>
>>>> (oops!)
>>>>
>>>> On 3.13, it says:
>>>>
>>>> 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
>>>> [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
>>>> (prog-if 00 [VGA controller])
>>>> Subsystem: PC Partner Limited / Sapphire Technology Radeon HD
>>>> 6450 1 GB DDR3 [174b:e164]
>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>> SERR-
>>>> Latency: 0, Cache Line Size: 64 bytes
>>>> Interrupt: pin A routed to IRQ 92
>>>> Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
>>>> Region 2: Memory at f4a20000 (64-bit, non-prefetchable) [size=128K]
>>>> Region 4: I/O ports at c000 [size=256]
>>>> Expansion ROM at f4a00000 [disabled] [size=128K]
>>>> Capabilities: 
>>>> Kernel driver in use: radeon
>>>> 00: 02 10 79 67 07 04 10 00 00 00 00 03 10 00 80 00
>>>> 10: 0c 00 00 e0 00 00 00 00 04 00 a2 f4 00 00 00 00
>>>> 20: 01 c0 00 00 00 00 00 00 00 00 00 00 4b 17 64 e1
>>>> 30: 00 00 a0 f4 50 00 00 00 00 00 00 00 0a 01 00 00
>>>>
>>>> 09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
>>>> Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
>>>> Subsystem: PC Partner Limited / Sapphire Technology Radeon HD
>>>>

3.14 radeon regression: radeon is broken (pci bug?)

2014-03-21 Thread Andy Lutomirski
On Fri, Mar 21, 2014 at 7:41 AM, Alex Deucher  wrote:
> On Thu, Mar 20, 2014 at 10:17 PM, Andy Lutomirski  
> wrote:
>> My system works on a 3.13 Fedora kernel.  It does not work on a
>> more-or-less identically configured 3.14-rc7+ kernel.  The symptom is
>> that the Plymouth password prompt flashes and then the screen goes
>> blank.  Hitting escape brings back the text console, and all is well
>> until X tries to start.  Then I get a blank screen.  killall -9 Xorg
>> from ssh causes these errors to be logged:
>>
>>
>> [  226.239747] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
>> [  226.239751] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD34 (len 55, WS 0, PS 0) @ 0xCD57
>> [  231.241492] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
>> [  231.241496] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>> [  236.243111] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
>> [  236.243115] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>> [  241.244625] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
>> [  241.244628] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
>>
>>
>> lspci -vvvxxxnn on 3.14-rc7+ says:
>>
>> 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
>> [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
>> (rev ff) (prog-if ff)
>> !!! Unknown header type 7f
>> Kernel driver in use: radeon
>> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>
>> 09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
>> Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98] (rev ff)
>> (prog-if ff)
>> !!! Unknown header type 7f
>> Kernel driver in use: snd_hda_intel
>> 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>
>> (oops!)
>>
>> On 3.13, it says:
>>
>> 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
>> [AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
>> (prog-if 00 [VGA controller])
>> Subsystem: PC Partner Limited / Sapphire Technology Radeon HD
>> 6450 1 GB DDR3 [174b:e164]
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> SERR-
>> Latency: 0, Cache Line Size: 64 bytes
>> Interrupt: pin A routed to IRQ 92
>> Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
>> Region 2: Memory at f4a20000 (64-bit, non-prefetchable) [size=128K]
>> Region 4: I/O ports at c000 [size=256]
>> Expansion ROM at f4a00000 [disabled] [size=128K]
>> Capabilities: 
>> Kernel driver in use: radeon
>> 00: 02 10 79 67 07 04 10 00 00 00 00 03 10 00 80 00
>> 10: 0c 00 00 e0 00 00 00 00 04 00 a2 f4 00 00 00 00
>> 20: 01 c0 00 00 00 00 00 00 00 00 00 00 4b 17 64 e1
>> 30: 00 00 a0 f4 50 00 00 00 00 00 00 00 0a 01 00 00
>>
>> 09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
>> Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
>> Subsystem: PC Partner Limited / Sapphire Technology Radeon HD
>> 6450 1GB DDR3 [174b:aa98]
>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> SERR-
>> Latency: 0, Cache Line Size: 64 bytes
>> Interrupt: pin B routed to IRQ 96
>> Region 0: Memory at f4a40000 (64-bit, non-prefetchable) [size=16K]
>> Capabilities:
>> Kernel driver in use: snd_hda_intel
>> 00: 02 10 98 aa 06 04 10 00 00 00 03 04 10 00 80 00
>> 10: 04 00 a4 f4 00 00 00 00 00 00 00 00 00 00 00 00
>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 4b 17 98 aa
>> 30: 00 00 00 00 50 00 00 00 00 00 00 00 05 02 00 00
>>
>> Logs attached.
>>
>> Unfortunately, I'll be away from this computer until Wednesday.
>
> Can you bisect?

Not until Wednesday -- I don't have any way to test this remotely.  I
can do tests that don't involve rebooting, though.

>
> Alex





3.14 radeon regression: radeon is broken (pci bug?)

2014-03-20 Thread Andy Lutomirski
My system works on a 3.13 Fedora kernel.  It does not work on a
more-or-less identically configured 3.14-rc7+ kernel.  The symptom is
that the Plymouth password prompt flashes and then the screen goes
blank.  Hitting escape brings back the text console, and all is well
until X tries to start.  Then I get a blank screen.  killall -9 Xorg
from ssh causes these errors to be logged:


[  226.239747] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  226.239751] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD34 (len 55, WS 0, PS 0) @ 0xCD57
[  231.241492] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  231.241496] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
[  236.243111] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  236.243115] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88
[  241.244625] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
[  241.244628] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CD6C (len 62, WS 0, PS 0) @ 0xCD88


lspci -vvvxxxnn on 3.14-rc7+ says:

09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
[AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
(rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: radeon
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98] (rev ff)
(prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: snd_hda_intel
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

(oops!)

On 3.13, it says:

09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc.
[AMD/ATI] Caicos [Radeon HD 6450/7450/8450 / R5 230 OEM] [1002:6779]
(prog-if 00 [VGA controller])
Subsystem: PC Partner Limited / Sapphire Technology Radeon HD
6450 1 GB DDR3 [174b:e164]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
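SERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 92
Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at f4a20000 (64-bit, non-prefetchable) [size=128K]
Region 4: I/O ports at c000 [size=256]
Expansion ROM at f4a00000 [disabled] [size=128K]
Capabilities: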
Kernel driver in use: radeon
00: 02 10 79 67 07 04 10 00 00 00 00 03 10 00 80 00
10: 0c 00 00 e0 00 00 00 00 04 00 a2 f4 00 00 00 00
20: 01 c0 00 00 00 00 00 00 00 00 00 00 4b 17 64 e1
30: 00 00 a0 f4 50 00 00 00 00 00 00 00 0a 01 00 00

09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI]
Caicos HDMI Audio [Radeon HD 6400 Series] [1002:aa98]
Subsystem: PC Partner Limited / Sapphire Technology Radeon HD
6450 1GB DDR3 [174b:aa98]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
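SERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 96
Region 0: Memory at f4a40000 (64-bit, non-prefetchable) [size=16K]
Capabilities: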
Kernel driver in use: snd_hda_intel
00: 02 10 98 aa 06 04 10 00 00 00 03 04 10 00 80 00
10: 04 00 a4 f4 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 4b 17 98 aa
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 02 00 00
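Both functions of the card return all-0xff configuration space under 3.14-rc7+, which is what a config read yields when nothing answers (for example, if the device has dropped off the bus or is powered down); the report itself does not establish which. As a sketch, assuming the standard PCI header layout (vendor ID at offset 0x00, device ID at 0x02, header type at 0x0e, little-endian), the first dump row can be decoded like this; the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets into the standard PCI configuration-space header. */
#define PCI_VENDOR_ID   0x00   /* 16-bit, little-endian */
#define PCI_DEVICE_ID   0x02   /* 16-bit, little-endian */
#define PCI_HEADER_TYPE 0x0e   /* bit 7: multi-function; bits 6:0: layout */

static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* 0xffff is never a valid vendor ID; it is what an unanswered read returns. */
static bool pci_cfg_device_absent(const uint8_t *cfg)
{
    return cfg_read16(cfg, PCI_VENDOR_ID) == 0xffff;
}

static uint8_t pci_cfg_header_layout(const uint8_t *cfg)
{
    return cfg[PCI_HEADER_TYPE] & 0x7f;
}
```

The 3.13 row `02 10 79 67 ... 80 00` decodes to vendor 0x1002 and device 0x6779 (matching `[1002:6779]`) with header layout 0. The all-0xff row gives vendor 0xffff and header layout 0x7f, which is presumably where lspci's "Unknown header type 7f" comes from.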

Logs attached.

Unfortunately, I'll be away from this computer until Wednesday.
-- next part --
00:00.0 Host bridge [0600]: Intel Corporation Xeon E5/Core i7 DMI2 [8086:3c00] 
(rev 06)
Subsystem: Micro-Star International Co., Ltd. Device [1462:7760]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
00: 86 80 00 3c 00 00 10 00 06 00 00 06 10 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 62 14 60 77
30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00

00:01.0 PCI bridge [0604]: Intel Corporation Xeon E5/Core i7 IIO PCI Express 
Root Port 1a [8086:3c02] (rev 06) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: 
Kernel driver in use: pcieport
00: 86 80 02 3c 07 04 10 00 06 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 f0 00 00 00
20: f0 ff 00 00 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 10 00

00:02.0 PCI bridge [0604]: Intel Corporation Xeon E5/Core i7 IIO PCI Express 
Root Port 2a [8086:3c04] (rev 06) (prog-if 

Re: [PATCH 17/25] drm: rip out drm_core_has_MTRR checks

2013-08-10 Thread Andy Lutomirski
On Fri, Aug 9, 2013 at 11:36 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote:
> On Fri, Aug 9, 2013 at 8:12 PM, Andy Lutomirski l...@amacapital.net wrote:
>> On Thu, Aug 8, 2013 at 6:41 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote:
>>> The new arch_phys_wc_add/del functions do the right thing both with
>>> and without MTRR support in the kernel. So we can drop these
>>> additional checks.
>>
>> If any of the new arch_phys_wc_add calls are reachable and if the
>> driver calls arch_phys_wc_add itself, then the lack of refcounting on
>> non-PAT systems may cause a problem.  (I don't understand the drm
>> stuff well enough to know whether that can actually happen.)
>
> This is only about compile-time options really. Somehow drm had the
> idea to use these check functions instead of #ifdef plus dummy static
> inline noop functions. David Herrmann just did the same patch for the
> agp stuff. So refcounting is of no concern here.

I feel like I'm missing something obvious here.  On nouveau, prior to
this patch, the drm maps code would not touch mtrrs.  Now it will.
Nouveau already calls arch_phys_wc_add, so if that maps code is
reached on the same resource, then there could be refcounting issues.

--Andy

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 17/25] drm: rip out drm_core_has_MTRR checks

2013-08-10 Thread Andy Lutomirski
On Thu, Aug 8, 2013 at 6:41 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote:
> The new arch_phys_wc_add/del functions do the right thing both with
> and without MTRR support in the kernel. So we can drop these
> additional checks.

If any of the new arch_phys_wc_add calls are reachable and if the
driver calls arch_phys_wc_add itself, then the lack of refcounting on
non-PAT systems may cause a problem.  (I don't understand the drm
stuff well enough to know whether that can actually happen.)

--Andy


Re: [PATCH 17/25] drm: rip out drm_core_has_MTRR checks

2013-08-10 Thread Andy Lutomirski
On Fri, Aug 9, 2013 at 11:47 AM, Daniel Vetter daniel.vet...@ffwll.ch wrote:
> On Fri, Aug 9, 2013 at 8:39 PM, Andy Lutomirski l...@amacapital.net wrote:
>> On Fri, Aug 9, 2013 at 11:36 AM, Daniel Vetter daniel.vet...@ffwll.ch
>> wrote:
>>> On Fri, Aug 9, 2013 at 8:12 PM, Andy Lutomirski l...@amacapital.net wrote:
>>>> On Thu, Aug 8, 2013 at 6:41 AM, Daniel Vetter daniel.vet...@ffwll.ch
>>>> wrote:
>>>>> The new arch_phys_wc_add/del functions do the right thing both with
>>>>> and without MTRR support in the kernel. So we can drop these
>>>>> additional checks.
>>>>
>>>> If any of the new arch_phys_wc_add calls are reachable and if the
>>>> driver calls arch_phys_wc_add itself, then the lack of refcounting on
>>>> non-PAT systems may cause a problem.  (I don't understand the drm
>>>> stuff well enough to know whether that can actually happen.)
>>>
>>> This is only about compile-time options really. Somehow drm had the
>>> idea to use these check functions instead of #ifdef plus dummy static
>>> inline noop functions. David Herrmann just did the same patch for the
>>> agp stuff. So refcounting is of no concern here.
>>
>> I feel like I'm missing something obvious here.  On nouveau, prior to
>> this patch, the drm maps code would not touch mtrrs.  Now it will.
>> Nouveau already calls arch_phys_wc_add, so if that maps code is
>> reached on the same resource, then there could be refcounting issues.
>
> Oh that kind of confusion. The maps code here is for old userspace
> drivers, I have some patches in the queue that will disable it
> properly for kms drivers. So it should never happen that both the kms
> driver and the maps code in the drm core set up a mtrr mapping. And if
> it happens someone is doing something really nasty, and that hole will
> soon be plugged.

In that case, I'm convinced.  In case you care:

Acked-by: Andy Lutomirski l...@amacapital.net

--Andy

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch





[PATCH 17/25] drm: rip out drm_core_has_MTRR checks

2013-08-09 Thread Andy Lutomirski
On Fri, Aug 9, 2013 at 11:47 AM, Daniel Vetter  
wrote:
> On Fri, Aug 9, 2013 at 8:39 PM, Andy Lutomirski  
> wrote:
>> On Fri, Aug 9, 2013 at 11:36 AM, Daniel Vetter  
>> wrote:
>>> On Fri, Aug 9, 2013 at 8:12 PM, Andy Lutomirski  
>>> wrote:
>>>> On Thu, Aug 8, 2013 at 6:41 AM, Daniel Vetter  
>>>> wrote:
>>>>> The new arch_phys_wc_add/del functions do the right thing both with
>>>>> and without MTRR support in the kernel. So we can drop these
>>>>> additional checks.
>>>>
>>>> If any of the new arch_phys_wc_add calls are reachable and if the
>>>> driver calls arch_phys_wc_add itself, then the lack of refcounting on
>>>> non-PAT systems may cause a problem.  (I don't understand the drm
>>>> stuff well enough to know whether that can actually happen.)
>>>
>>> This is only about compile-time options really. Somehow drm had the
>>> idea to use these check functions instead of #ifdef plus dummy static
>>> inline noop functions. David Herrmann just did the same patch for the
>>> agp stuff. So refcounting is of no concern here.
>>
>> I feel like I'm missing something obvious here.  On nouveau, prior to
>> this patch, the drm maps code would not touch mtrrs.  Now it will.
>> Nouveau already calls arch_phys_wc_add, so if that maps code is
>> reached on the same resource, then there could be refcounting issues.
>
> Oh that kind of confusion. The maps code here is for old userspace
> drivers, I have some patches in the queue that will disable it
> properly for kms drivers. So it should never happen that both the kms
> driver and the maps code in the drm core set up a mtrr mapping. And if
> it happens someone is doing something really nasty, and that hole will
> soon be plugged.

In that case, I'm convinced.  In case you care:

Acked-by: Andy Lutomirski 

--Andy

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch





[PATCH 17/25] drm: rip out drm_core_has_MTRR checks

2013-08-09 Thread Andy Lutomirski
On Fri, Aug 9, 2013 at 11:36 AM, Daniel Vetter  
wrote:
> On Fri, Aug 9, 2013 at 8:12 PM, Andy Lutomirski  
> wrote:
>> On Thu, Aug 8, 2013 at 6:41 AM, Daniel Vetter  
>> wrote:
>>> The new arch_phys_wc_add/del functions do the right thing both with
>>> and without MTRR support in the kernel. So we can drop these
>>> additional checks.
>>
>> If any of the new arch_phys_wc_add calls are reachable and if the
>> driver calls arch_phys_wc_add itself, then the lack of refcounting on
>> non-PAT systems may cause a problem.  (I don't understand the drm
>> stuff well enough to know whether that can actually happen.)
>
> This is only about compile-time options really. Somehow drm had the
> idea to use these check functions instead of #ifdef plus dummy static
> inline noop functions. David Herrmann just did the same patch for the
> agp stuff. So refcounting is of no concern here.
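Daniel's point refers to a common kernel idiom: rather than testing a feature flag at every call site, the header provides either the real function or a static inline no-op, selected at compile time. A generic sketch of that idiom; HAVE_WC_SUPPORT and wc_region_add() are hypothetical stand-ins, not the actual arch_phys_wc_add() declarations:

```c
/*
 * Compile-time feature stub: callers never test a feature flag at run time.
 * HAVE_WC_SUPPORT and wc_region_add() are hypothetical names.
 */
#ifdef HAVE_WC_SUPPORT
int wc_region_add(unsigned long base, unsigned long size);  /* real version elsewhere */
#else
static inline int wc_region_add(unsigned long base, unsigned long size)
{
    (void)base;
    (void)size;
    return 0;   /* nothing to set up, so report success */
}
#endif

static int map_framebuffer(unsigned long base, unsigned long size)
{
    /* No drm_core_has_MTRR()-style check is needed at the call site. */
    return wc_region_add(base, size);
}
```

Built without HAVE_WC_SUPPORT, `map_framebuffer()` compiles down to a call to the no-op and returns 0.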

I feel like I'm missing something obvious here.  On nouveau, prior to
this patch, the drm maps code would not touch mtrrs.  Now it will.
Nouveau already calls arch_phys_wc_add, so if that maps code is
reached on the same resource, then there could be refcounting issues.
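The concern is two independent users (the driver and the drm maps code) registering the same physical range. A toy model of why the lack of refcounting matters; the names are hypothetical, and this is not how arch_phys_wc_add() or the MTRR code is implemented:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy registry of write-combining ranges, keyed by (base, size). */
#define MAX_RANGES 8

struct wc_range {
    unsigned long base, size;
    int refs;                  /* 0 means the slot is free */
};

static struct wc_range ranges[MAX_RANGES];

static struct wc_range *find_range(unsigned long base, unsigned long size)
{
    for (size_t i = 0; i < MAX_RANGES; i++)
        if (ranges[i].refs > 0 && ranges[i].base == base && ranges[i].size == size)
            return &ranges[i];
    return NULL;
}

/* Register a range; a duplicate add only bumps the count if `refcounted`. */
static int wc_add(unsigned long base, unsigned long size, bool refcounted)
{
    struct wc_range *r = find_range(base, size);
    if (r) {
        if (refcounted)
            r->refs++;
        return 0;              /* otherwise the second add is silently absorbed */
    }
    for (size_t i = 0; i < MAX_RANGES; i++)
        if (ranges[i].refs == 0) {
            ranges[i] = (struct wc_range){ base, size, 1 };
            return 0;
        }
    return -1;                 /* table full */
}

static void wc_del(unsigned long base, unsigned long size)
{
    struct wc_range *r = find_range(base, size);
    if (r)
        r->refs--;
}

static bool wc_active(unsigned long base, unsigned long size)
{
    return find_range(base, size) != NULL;
}
```

With refcounting, two adds followed by one del leave the range active. Without it, the first del tears the range down while the other user still relies on it.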

--Andy

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch





[PATCH 17/25] drm: rip out drm_core_has_MTRR checks

2013-08-09 Thread Andy Lutomirski
On Thu, Aug 8, 2013 at 6:41 AM, Daniel Vetter  wrote:
> The new arch_phys_wc_add/del functions do the right thing both with
> and without MTRR support in the kernel. So we can drop these
> additional checks.

If any of the new arch_phys_wc_add calls are reachable and if the
driver calls arch_phys_wc_add itself, then the lack of refcounting on
non-PAT systems may cause a problem.  (I don't understand the drm
stuff well enough to know whether that can actually happen.)

--Andy
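The hazard described here is about ownership. Suppose the driver and the DRM core each set up write-combining for the same range, and then one side tears its setup down. The other side must not lose WC behind its back. Below is a sketch of a hypothetical registry that avoids this with a per-region use count. It illustrates the property only. It is not the kernel's mtrr code, and wc_add(), wc_del() and wc_is_active() are made-up names:

```c
#include <stdbool.h>

#define MAX_REGIONS 8

/* Hypothetical registry of write-combining regions (not the kernel's). */
struct wc_region {
	unsigned long base, size;
	unsigned int users; /* 0 means the slot is free */
};

static struct wc_region regions[MAX_REGIONS];

/* Returns a handle > 0 on success, 0 if the table is full. */
static int wc_add(unsigned long base, unsigned long size)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_REGIONS; i++) {
		if (regions[i].users && regions[i].base == base &&
		    regions[i].size == size) {
			regions[i].users++; /* same region: just take a reference */
			return i + 1;
		}
		if (!regions[i].users && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return 0;
	regions[free_slot] = (struct wc_region){ base, size, 1 };
	return free_slot + 1;
}

/* Drops one reference; the region is torn down only by its last user. */
static void wc_del(int handle)
{
	if (handle <= 0 || handle > MAX_REGIONS || !regions[handle - 1].users)
		return;
	regions[handle - 1].users--;
}

static bool wc_is_active(unsigned long base, unsigned long size)
{
	for (int i = 0; i < MAX_REGIONS; i++)
		if (regions[i].users && regions[i].base == base &&
		    regions[i].size == size)
			return true;
	return false;
}
```

Two wc_add() calls for the same range return the same handle. The first wc_del() only drops a reference, and the region goes away when the second caller releases it.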


Re: Bug in warning message from MTRR rework in uvesafb

2013-07-11 Thread Andy Lutomirski
On Wed, Jul 10, 2013 at 10:07 AM, Torsten Kaiser
<just.for.l...@googlemail.com> wrote:
> Commit 63e28a7a5ffce59b645ca9cbcc01e1e8be56bd75, "uvesafb: Clean up
> MTRR code" contains the following change:
>
> @@ -1930,6 +1891,9 @@ static int uvesafb_setup(char *options)
>  }
>  }
>
> +if (mtrr != 3 && mtrr != 1)
> +pr_warn("uvesafb: mtrr should be set to 0 or 3; %d is
> unsupported", mtrr);
> +
>  return 0;
>  }
>  #endif /* !MODULE */
>
> Shouldn't this be && mtrr != 0?

Indeed, and Sylvain Hitier (cc'd) sent a patch (off-list) that must
have gotten lost somewhere.

--Andy
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Bug in warning message from MTRR rework in uvesafb

2013-07-10 Thread Andy Lutomirski
On Wed, Jul 10, 2013 at 10:07 AM, Torsten Kaiser
 wrote:
> Commit 63e28a7a5ffce59b645ca9cbcc01e1e8be56bd75, "uvesafb: Clean up
> MTRR code" contains the following change:
>
> @@ -1930,6 +1891,9 @@ static int uvesafb_setup(char *options)
>  }
>  }
>
> +if (mtrr != 3 && mtrr != 1)
> +pr_warn("uvesafb: mtrr should be set to 0 or 3; %d is
> unsupported", mtrr);
> +
>  return 0;
>  }
>  #endif /* !MODULE */
>
> Shouldn't this be && mtrr != 0?

Indeed, and Sylvain Hitier (cc'd) sent a patch (off-list) that must
have gotten lost somewhere.

--Andy
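Torsten's reading matches the warning text. The supported values are 0 and 3, so the test should be `mtrr != 3 && mtrr != 0`. As written, it warns on the valid value 0 and stays silent about 1. Below is a small sketch of the corrected check. The helper names are made up for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Per the warning text, only mtrr=0 and mtrr=3 are supported. */
static bool uvesafb_mtrr_supported(int mtrr)
{
	return mtrr == 0 || mtrr == 3;
}

/* Equivalent to: if (mtrr != 3 && mtrr != 0) pr_warn(...); */
static void check_mtrr_option(int mtrr)
{
	if (!uvesafb_mtrr_supported(mtrr))
		fprintf(stderr,
			"uvesafb: mtrr should be set to 0 or 3; %d is unsupported\n",
			mtrr);
}
```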


[PATCH 33/39] drm: rip out drm_core_has_MTRR checks

2013-07-10 Thread Andy Lutomirski
On Wed, Jul 10, 2013 at 8:59 AM, Daniel Vetter  
wrote:
> On Wed, Jul 10, 2013 at 5:41 PM, David Herrmann  
> wrote:
>> On Wed, Jul 10, 2013 at 5:22 PM, Daniel Vetter  
>> wrote:
>>> On Wed, Jul 10, 2013 at 3:51 PM, David Herrmann  
>>> wrote:
>>>>> -#if __OS_HAS_MTRR
>>>>> -static inline int drm_core_has_MTRR(struct drm_device *dev)
>>>>> -{
>>>>> -   return drm_core_check_feature(dev, DRIVER_USE_MTRR);
>>>>> -}
>>>>> -#else
>>>>> -#define drm_core_has_MTRR(dev) (0)
>>>>> -#endif
>>>>> -
>>>>
>>>> That was the last user of DRIVER_USE_MTRR (apart from drivers setting
>>>> it in .driver_features). Any reason to keep it around?
>>>
>>> Yeah, I guess we could rip things out. Which will also force me to
>>> properly audit drivers for the eventual behaviour change this could
>>> entail (in case there's an x86 driver which did not ask for an mtrr,
>>> but iirc there isn't).
>>
>> david at david-mb ~/dev/kernel/linux $ for i in drivers/gpu/drm/* ; do if
>> test -d "$i" ; then if ! grep -q USE_MTRR -r $i ; then echo $i ; fi ;
>> fi ; done
>> drivers/gpu/drm/exynos
>> drivers/gpu/drm/gma500
>> drivers/gpu/drm/i2c
>> drivers/gpu/drm/nouveau
>> drivers/gpu/drm/omapdrm
>> drivers/gpu/drm/qxl
>> drivers/gpu/drm/rcar-du
>> drivers/gpu/drm/shmobile
>> drivers/gpu/drm/tilcdc
>> drivers/gpu/drm/ttm
>> drivers/gpu/drm/udl
>> drivers/gpu/drm/vmwgfx
>> david at david-mb ~/dev/kernel/linux $
>>
>> So for x86 gma500,nouveau,qxl,udl,vmwgfx don't set DRIVER_USE_MTRR.
>> But I cannot tell whether they break if we call arch_phys_wc_add/del,
>> anyway. At least nouveau seemed to work here, but it doesn't use AGP
>> or drm_bufs, I guess.
>
> Cool, thanks a lot for stitching together the list of drivers to look
> at. So for real KMS drivers it's the drives responsibility to add an
> mtrr if it needs one. nouvea, radeon, mgag200, i915 and vmwgfx do that
> already. Somehow the savage driver also ends up doing that, I have no
> idea why.
>
> Note that gma500 as a pure KMS driver doesn't need MTRR setup since
> the platforms that it supports all support PAT. So no MTRRs needed to
> get wc iomappings.
>
> The mtrr support in the drm core is all for legacy mappings of garts,
> framebuffers and registers. All legacy drivers set the USE_MTRR flag,
> so we're good there.
>

Are all of those codepaths really inaccessible in non-legacy drm
drivers?  I didn't try to fully unravel all the ioctls and such, but
it seems like userspace could add bufs and map them.  Since the mtrr
code isn't very robust (reference counting?  what reference
counting?), I'm a little bit worried that potentially enabling it in
more cases, which your patch does, could be harmful.

The arch_phys_wc stuff puts a prettier interface on the mtrr code and
turns it off when PAT is available, but the underlying code is still
just as bad.

--Andy
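Below is a rough model of the arch_phys_wc_add() behaviour described in this thread. It returns 0 (nothing to do) when PAT is available or MTRRs are absent. Otherwise it tries to add a write-combining MTRR and hands back a nonzero handle. pat_enabled, mtrr_enabled, fake_mtrr_add() and phys_wc_add() are stand-ins for this sketch; the real x86 code and its handle encoding may differ:

```c
#include <stdbool.h>

/* Hypothetical platform state; the kernel detects these at boot. */
static bool pat_enabled = true;
static bool mtrr_enabled = true;
static int mtrrs_in_use;

/* Hypothetical stand-in for adding a variable-range WC MTRR.
 * Assumes 8 variable-range MTRRs; real CPUs report their count. */
static int fake_mtrr_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (mtrrs_in_use >= 8)
		return -1; /* out of variable-range MTRRs */
	return mtrrs_in_use++;
}

/* Returns 0 if nothing was needed, a positive handle, or a negative error. */
static int phys_wc_add(unsigned long base, unsigned long size)
{
	int reg;

	/* With PAT, ioremap_wc() alone gives WC; without MTRRs there is
	 * nothing to add. */
	if (pat_enabled || !mtrr_enabled)
		return 0;
	reg = fake_mtrr_add(base, size);
	if (reg < 0)
		return reg; /* the caller carries on, just without WC */
	return reg + 1; /* keep 0 free to mean "no MTRR was added" */
}
```

Nothing here makes the underlying MTRR allocation any smarter. The wrapper only decides whether to call it at all.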


Re: [PATCH 33/39] drm: rip out drm_core_has_MTRR checks

2013-07-10 Thread Andy Lutomirski
On Wed, Jul 10, 2013 at 8:59 AM, Daniel Vetter <daniel.vet...@ffwll.ch> wrote:
> On Wed, Jul 10, 2013 at 5:41 PM, David Herrmann <dh.herrm...@gmail.com> wrote:
>> On Wed, Jul 10, 2013 at 5:22 PM, Daniel Vetter <daniel.vet...@ffwll.ch>
>> wrote:
>>> On Wed, Jul 10, 2013 at 3:51 PM, David Herrmann <dh.herrm...@gmail.com>
>>> wrote:
>>>>> -#if __OS_HAS_MTRR
>>>>> -static inline int drm_core_has_MTRR(struct drm_device *dev)
>>>>> -{
>>>>> -   return drm_core_check_feature(dev, DRIVER_USE_MTRR);
>>>>> -}
>>>>> -#else
>>>>> -#define drm_core_has_MTRR(dev) (0)
>>>>> -#endif
>>>>> -
>>>>
>>>> That was the last user of DRIVER_USE_MTRR (apart from drivers setting
>>>> it in .driver_features). Any reason to keep it around?
>>>
>>> Yeah, I guess we could rip things out. Which will also force me to
>>> properly audit drivers for the eventual behaviour change this could
>>> entail (in case there's an x86 driver which did not ask for an mtrr,
>>> but iirc there isn't).
>>
>> david@david-mb ~/dev/kernel/linux $ for i in drivers/gpu/drm/* ; do if
>> test -d "$i" ; then if ! grep -q USE_MTRR -r $i ; then echo $i ; fi ;
>> fi ; done
>> drivers/gpu/drm/exynos
>> drivers/gpu/drm/gma500
>> drivers/gpu/drm/i2c
>> drivers/gpu/drm/nouveau
>> drivers/gpu/drm/omapdrm
>> drivers/gpu/drm/qxl
>> drivers/gpu/drm/rcar-du
>> drivers/gpu/drm/shmobile
>> drivers/gpu/drm/tilcdc
>> drivers/gpu/drm/ttm
>> drivers/gpu/drm/udl
>> drivers/gpu/drm/vmwgfx
>> david@david-mb ~/dev/kernel/linux $
>>
>> So for x86 gma500,nouveau,qxl,udl,vmwgfx don't set DRIVER_USE_MTRR.
>> But I cannot tell whether they break if we call arch_phys_wc_add/del,
>> anyway. At least nouveau seemed to work here, but it doesn't use AGP
>> or drm_bufs, I guess.
>
> Cool, thanks a lot for stitching together the list of drivers to look
> at. So for real KMS drivers it's the drives responsibility to add an
> mtrr if it needs one. nouvea, radeon, mgag200, i915 and vmwgfx do that
> already. Somehow the savage driver also ends up doing that, I have no
> idea why.
>
> Note that gma500 as a pure KMS driver doesn't need MTRR setup since
> the platforms that it supports all support PAT. So no MTRRs needed to
> get wc iomappings.
>
> The mtrr support in the drm core is all for legacy mappings of garts,
> framebuffers and registers. All legacy drivers set the USE_MTRR flag,
> so we're good there.


Are all of those codepaths really inaccessible in non-legacy drm
drivers?  I didn't try to fully unravel all the ioctls and such, but
it seems like userspace could add bufs and map them.  Since the mtrr
code isn't very robust (reference counting?  what reference
counting?), I'm a little bit worried that potentially enabling it in
more cases, which your patch does, could be harmful.

The arch_phys_wc stuff puts a prettier interface on the mtrr code and
turns it off when PAT is available, but the underlying code is still
just as bad.

--Andy


Re: MTRR use in drivers

2013-06-25 Thread Andy Lutomirski
On Sun, Jun 23, 2013 at 1:38 PM, H. Peter Anvin <h...@zytor.com> wrote:
> On 06/23/2013 01:30 PM, Dave Airlie wrote:
>>>>> Why do you care about performance when PAT is disabled?
>>
>> breaking old boxes just because, is just going to get reverted when I
>> get the first regression report that you broke old boxes.
>>
>
> Not "just because", but *if* the choice is between breaking old boxes
> and breaking new boxes I'll take the latter.
>
>> Andy Lutomirski just submitted a bunch of patches to clean up the DRM
>> usage of mtrrs, they are in drm-next, afaik we no longer add them on
>> PAT systems.
>
> Fantastic news.  No issue, then, and no need to break anything.
>
> The only problem I see with having ioremap_wc() installing an MTRR on
> non-PAT, rather than pushing that into the drivers which is clearly not
> the right thing, is that we will need a hook to uninstall it when the
> mapping is destroyed.

I have trouble believing that this will ever work well -- MTRRs have
crazy alignment requirements and interactions with other MTRRs, and a
few drivers have to jump through hoops to set up the right MTRRs.
There aren't really enough MTRRs to cover every mapping.

My patches (in drm-next) add functions arch_phys_wc_add and
arch_phys_wc_del that do nothing except on x86 with MTRRs on and PAT
off, in which case they try to add a WC MTRR.  That way the handful of
drivers that need WC for performance on old hardware can try (and
possibly fail, depending on the usual vagaries of MTRRs).  With my
patches applied, DRM and agpgart no longer touch MTRRs at all with PAT
on.

I didn't get around to excising MTRRs from the non-DRM video drivers
or from the few odd cases like myri10ge.

This stuff is painful to test.  The only drivers I can really test are
i915 and radeon.  I have a myri10ge device, but it's on a production
server.  I also have several mgag200 devices, but they're in a
super-secret-locked-down datacenter a few thousand miles away, and
trying to gauge framebuffer performance over Dell and/or HP's crappy
remoting interface is a lost cause.  I'm not sure that my oldest
computer (locked in a basement in another state) is old enough to have
an AGP port.

--Andy
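The alignment requirements can be made concrete. In practice, a variable-range MTRR describes a single block whose size is a power of two (at least 4 KiB) and whose base is aligned to that size. Covering an arbitrary range therefore takes one MTRR per aligned power-of-two piece. The sketch below counts the pieces. It is an illustration only: it ignores overlap rules between MTRRs and the limited number of them, and the function names are made up:

```c
#include <stdint.h>

#define MTRR_MIN_SIZE 0x1000ULL /* 4 KiB granularity */

/* Largest naturally aligned power-of-two block that starts at base and
 * fits in len bytes. base and len must be multiples of 4 KiB. */
static uint64_t largest_block(uint64_t base, uint64_t len)
{
	uint64_t size = MTRR_MIN_SIZE;

	while ((base & (size * 2 - 1)) == 0 && size * 2 <= len)
		size *= 2;
	return size;
}

/* Number of variable-range MTRRs needed to cover [base, base + len). */
static unsigned int mtrrs_needed(uint64_t base, uint64_t len)
{
	unsigned int count = 0;

	while (len) {
		uint64_t size = largest_block(base, len);

		base += size;
		len -= size;
		count++;
	}
	return count;
}
```

An aligned 256 MiB aperture fits in one MTRR. A 24 MiB framebuffer already needs two (16 MiB + 8 MiB), and a misaligned start costs even more. This is why a few drivers have to jump through hoops.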


Re: [RFC 3/6] drm: add SimpleDRM driver

2013-06-25 Thread Andy Lutomirski
On 06/24/2013 03:27 PM, David Herrmann wrote:
> + sdrm->fb_map = ioremap(sdrm->fb_base, sdrm->fb_size);

This should probably be ioremap_wc.  Otherwise it will be *really* slow
if used in legacy mode and it may cause conflicts with the
pgprot_writecombine mode for mmap.

(Watching boot messages go by on fbcon on efifb was like using an old
2400 baud modem before I made the corresponding change to efifb.)

--Andy


[RFC 3/6] drm: add SimpleDRM driver

2013-06-24 Thread Andy Lutomirski
On 06/24/2013 03:27 PM, David Herrmann wrote:
> + sdrm->fb_map = ioremap(sdrm->fb_base, sdrm->fb_size);

This should probably be ioremap_wc.  Otherwise it will be *really* slow
if used in legacy mode and it may cause conflicts with the
pgprot_writecombine mode for mmap.

(Watching boot messages go by on fbcon on efifb was like using an old
2400 baud modem before I made the corresponding change to efifb.)

--Andy


MTRR use in drivers

2013-06-23 Thread Andy Lutomirski
On Sun, Jun 23, 2013 at 1:38 PM, H. Peter Anvin  wrote:
> On 06/23/2013 01:30 PM, Dave Airlie wrote:
>>>>> Why do you care about performance when PAT is disabled?
>>
>> breaking old boxes just because, is just going to get reverted when I
>> get the first regression report that you broke old boxes.
>>
>
> Not "just because", but *if* the choice is between breaking old boxes
> and breaking new boxes I'll take the latter.
>
>> Andy Lutomirski just submitted a bunch of patches to clean up the DRM
>> usage of mtrrs, they are in drm-next, afaik we no longer add them on
>> PAT systems.
>
> Fantastic news.  No issue, then, and no need to break anything.
>
> The only problem I see with having ioremap_wc() installing an MTRR on
> non-PAT, rather than pushing that into the drivers which is clearly not
> the right thing, is that we will need a hook to uninstall it when the
> mapping is destroyed.

I have trouble believing that this will ever work well -- MTRRs have
crazy alignment requirements and interactions with other MTRRs, and a
few drivers have to jump through hoops to set up the right MTRRs.
There aren't really enough MTRRs to cover every mapping.

My patches (in drm-next) add functions arch_phys_wc_add and
arch_phys_wc_del that do nothing except on x86 with MTRRs on and PAT
off, in which case they try to add a WC MTRR.  That way the handful of
drivers that need WC for performance on old hardware can try (and
possibly fail, depending on the usual vagaries of MTRRs).  With my
patches applied, DRM and agpgart no longer touch MTRRs at all with PAT
on.

I didn't get around to excising MTRRs from the non-DRM video drivers
or from the few odd cases like myri10ge.

This stuff is painful to test.  The only drivers I can really test are
i915 and radeon.  I have a myri10ge device, but it's on a production
server.  I also have several mgag200 devices, but they're in a
super-secret-locked-down datacenter a few thousand miles away, and
trying to gauge framebuffer performance over Dell and/or HP's crappy
remoting interface is a lost cause.  I'm not sure that my oldest
computer (locked in a basement in another state) is old enough to have
an AGP port.

--Andy


[PATCH] radeon: Fix a false positive lockup after 10s of inactivity

2013-06-18 Thread Andy Lutomirski
On Thu, Jun 13, 2013 at 2:22 PM, Andy Lutomirski  wrote:
> On Wed, Jun 12, 2013 at 6:56 AM, Jerome Glisse  wrote:
>> Andy can you test (without your patch) and see if it helps with your issue :
>> http://people.freedesktop.org/~glisse/0001-drm-radeon-update-lockup-tracking-when-scheduling-in.patch
>
> Testing now.  I'll report back in a couple of days.
>

3.9.4 plus this patch has been completely stable for several days now.

Tested-by: Andy Lutomirski 

Can you send this to Linus and -stable?

Thanks,
Andy


Re: [PATCH] radeon: Fix a false positive lockup after 10s of inactivity

2013-06-18 Thread Andy Lutomirski
On Thu, Jun 13, 2013 at 2:22 PM, Andy Lutomirski <l...@amacapital.net> wrote:
> On Wed, Jun 12, 2013 at 6:56 AM, Jerome Glisse <j.gli...@gmail.com> wrote:
>> Andy can you test (without your patch) and see if it helps with your issue :
>> http://people.freedesktop.org/~glisse/0001-drm-radeon-update-lockup-tracking-when-scheduling-in.patch
>
> Testing now.  I'll report back in a couple of days.
>

3.9.4 plus this patch has been completely stable for several days now.

Tested-by: Andy Lutomirski <l...@amacapital.net>

Can you send this to Linus and -stable?

Thanks,
Andy


[PATCH 0/3] fbdev no more!

2013-06-17 Thread Andy Lutomirski
On 06/16/2013 07:57 AM, Daniel Vetter wrote:
> Hi all,
> 
> So I've taken a look again at the locking mess in our fbdev support and cried.
> Fixing up the console_lock mess around the fbdev notifier will be real work,
> semanatically the fbdev layer does lots of stupid things (like the radeon 
> resume
> issue I've just debugged) and the panic notifier is pretty much a lost cause.
> 
> So I've decided to instead rip it all out. It seems to work \o/

I wonder how badly this breaks on EFI systems.  Currently, efifb is an
fbdev driver.  When i915 calls register_framebuffer, the fbdev core
removes efifb's framebuffer.  (This is scary already -- what if i915 has
reused that memory for something else beforehand?)  But now, if i915
doesn't call register_framebuffer, the efifb "framebuffer" might stick
around forever.

Presumably, efifb ought to become a framebuffer-only drm driver and
there should be a saner way to hand control from efifb (or vesa?) to a
real driver.

--Andy
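The takeover described above boils down to an overlap test. When a native driver claims its aperture, any firmware framebuffer (efifb, vesafb) whose memory overlaps it has to be torn down first. Below is a hypothetical sketch of that check. The struct and function names are illustrative, not the fbdev or DRM API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a firmware-provided framebuffer. */
struct fw_fb {
	const char *name;
	uint64_t base, size;
	bool registered;
};

/* Do the half-open ranges [a, a + alen) and [b, b + blen) overlap? */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/* Called when a native driver claims [base, base + size). Drops every
 * firmware framebuffer in that range so it stops touching memory the
 * native driver now owns. Returns how many were removed. */
static int claim_aperture(struct fw_fb *fbs, int n, uint64_t base, uint64_t size)
{
	int removed = 0;

	for (int i = 0; i < n; i++) {
		if (fbs[i].registered &&
		    ranges_overlap(fbs[i].base, fbs[i].size, base, size)) {
			fbs[i].registered = false; /* real code would unregister it */
			removed++;
		}
	}
	return removed;
}
```

The sketch triggers the removal when the native driver claims the hardware, not when it registers an fbdev framebuffer. That is one way to avoid the stale-efifb case raised above, offered here as a design option rather than what any patch in this thread does.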


Re: [PATCH 0/3] fbdev no more!

2013-06-17 Thread Andy Lutomirski
On 06/16/2013 07:57 AM, Daniel Vetter wrote:
> Hi all,
> 
> So I've taken a look again at the locking mess in our fbdev support and cried.
> Fixing up the console_lock mess around the fbdev notifier will be real work,
> semanatically the fbdev layer does lots of stupid things (like the radeon resume
> issue I've just debugged) and the panic notifier is pretty much a lost cause.
> 
> So I've decided to instead rip it all out. It seems to work \o/

I wonder how badly this breaks on EFI systems.  Currently, efifb is an
fbdev driver.  When i915 calls register_framebuffer, the fbdev core
removes efifb's framebuffer.  (This is scary already -- what if i915 has
reused that memory for something else beforehand?)  But now, if i915
doesn't call register_framebuffer, the efifb "framebuffer" might stick
around forever.

Presumably, efifb ought to become a framebuffer-only drm driver and
there should be a saner way to hand control from efifb (or vesa?) to a
real driver.

--Andy


Re: [PATCH] radeon: Fix a false positive lockup after 10s of inactivity

2013-06-14 Thread Andy Lutomirski
On Wed, Jun 12, 2013 at 6:56 AM, Jerome Glisse <j.gli...@gmail.com> wrote:
> On Wed, Jun 12, 2013 at 6:26 AM, Michel Dänzer <mic...@daenzer.net> wrote:
>> On Die, 2013-06-11 at 16:23 -0700, Andy Lutomirski wrote:
>>> If the device is idle for over ten seconds, then the next attempt to do
>>> anything can race with the lockup detector and cause a bogus lockup
>>> to be detected.
>>>
>>> Oddly, the situation is well-described in the lockup detector's comments
>>> and a fix is even described.  This patch implements that fix (and corrects
>>> some typos in the description).
>>>
>>> My system has been stable for about a week running this code.  Without this,
>>> my screen would go blank every now and then and, when it came back, everything
>>> would be remarkably slow (the latter is a separate bug).
>>>
>>> Signed-off-by: Andy Lutomirski <l...@amacapital.net>
>>
>> [...]
>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
>>> index 1ef5eaa..fb7b3ea 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_ring.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
>>> @@ -547,12 +547,12 @@ void radeon_ring_lockup_update(struct radeon_ring *ring)
>>>   * have CP rptr to a different value of jiffies wrap around which will force
>>>   * initialization of the lockup tracking informations.
>>>   *
>>> - * A possible false positivie is if we get call after while and last_cp_rptr ==
>>> - * the current CP rptr, even if it's unlikely it might happen. To avoid this
>>> - * if the elapsed time since last call is bigger than 2 second than we return
>>> - * false and update the tracking information. Due to this the caller must call
>>> - * radeon_ring_test_lockup several time in less than 2sec for lockup to be reported
>>> - * the fencing code should be cautious about that.
>>> + * A possible false positive is if we get called after a while and
>>> + * last_cp_rptr == the current CP rptr, even if it's unlikely it might
>>> + * happen. To avoid this if the elapsed time since the last call is bigger
>>> + * than 2 second then we return false and update the tracking
>>> + * information. Due to this the caller must call radeon_ring_test_lockup
>>> + * more frequently than once every 2s when waiting.
>>
>> Is it guaranteed that radeon_ring_test_lockup will be called more often
>> than every 2s when waiting? If not, this change might prevent a real
>> lockup from being detected?
>
> Yes it will if you wait for a fence, because the fence timeout wait is
> way smaller than 2sec so radeon_ring_is_lockup get call several time,
> which call radeon_ring_force_activity and then
> radeon_ring_test_lockup.
>
> This also means it very very very unlikely (see below for the likely
> case) to have a wrap around that give last rptr same as current one.
>
> The likely case is when you have something like a long compute, then
> nothing is lockup but you keep filling ring with
> radeon_ring_force_activity but the cp is still stuck on the ib of the
> compute stuff so rptr does not progress.
>
>> Either way, I wonder if there might not be a simpler solution to the
>> problem, e.g. by updating last_activity when submitting commands to a
>> previously empty ring.
>
> Maybe but i still don't think it should matter.
>
> Andy can you test (without your patch) and see if it helps with your issue :
> http://people.freedesktop.org/~glisse/0001-drm-radeon-update-lockup-tracking-when-scheduling-in.patch

Testing now.  I'll report back in a couple of days.

I don't think that long computes have anything to do with it.  The
bogus lockups happen when I look away from my computer for a while and
then click something.  I think the graphics are usually completely
idle when this happens.

AFAIK I've never run an OpenCL or similar application on this system.

--Andy


[PATCH] radeon: Fix a false positive lockup after 10s of inactivity

2013-06-13 Thread Andy Lutomirski
On Wed, Jun 12, 2013 at 6:56 AM, Jerome Glisse  wrote:
> On Wed, Jun 12, 2013 at 6:26 AM, Michel Dänzer  wrote:
>> On Die, 2013-06-11 at 16:23 -0700, Andy Lutomirski wrote:
>>> If the device is idle for over ten seconds, then the next attempt to do
>>> anything can race with the lockup detector and cause a bogus lockup
>>> to be detected.
>>>
>>> Oddly, the situation is well-described in the lockup detector's comments
>>> and a fix is even described.  This patch implements that fix (and corrects
>>> some typos in the description).
>>>
>>> My system has been stable for about a week running this code.  Without this,
>>> my screen would go blank every now and then and, when it came back, everything
>>> would be remarkably slow (the latter is a separate bug).
>>>
>>> Signed-off-by: Andy Lutomirski 
>>
>> [...]
>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
>>> index 1ef5eaa..fb7b3ea 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_ring.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
>>> @@ -547,12 +547,12 @@ void radeon_ring_lockup_update(struct radeon_ring *ring)
>>>   * have CP rptr to a different value of jiffies wrap around which will force
>>>   * initialization of the lockup tracking informations.
>>>   *
>>> - * A possible false positivie is if we get call after while and last_cp_rptr ==
>>> - * the current CP rptr, even if it's unlikely it might happen. To avoid this
>>> - * if the elapsed time since last call is bigger than 2 second than we return
>>> - * false and update the tracking information. Due to this the caller must call
>>> - * radeon_ring_test_lockup several time in less than 2sec for lockup to be reported
>>> - * the fencing code should be cautious about that.
>>> + * A possible false positive is if we get called after a while and
>>> + * last_cp_rptr == the current CP rptr, even if it's unlikely it might
>>> + * happen. To avoid this if the elapsed time since the last call is bigger
>>> + * than 2 second then we return false and update the tracking
>>> + * information. Due to this the caller must call radeon_ring_test_lockup
>>> + * more frequently than once every 2s when waiting.
>>
>> Is it guaranteed that radeon_ring_test_lockup will be called more often
>> than every 2s when waiting? If not, this change might prevent a real
>> lockup from being detected?
>
> Yes it will if you wait for a fence, because the fence timeout wait is
> way smaller than 2sec so radeon_ring_is_lockup get call several time,
> which call radeon_ring_force_activity and then
> radeon_ring_test_lockup.
>
> This also means it very very very unlikely (see below for the likely
> case) to have a wrap around that give last rptr same as current one.
>
> The likely case is when you have something like a long compute, then
> nothing is lockup but you keep filling ring with
> radeon_ring_force_activity but the cp is still stuck on the ib of the
> compute stuff so rptr does not progress.
>
>> Either way, I wonder if there might not be a simpler solution to the
>> problem, e.g. by updating last_activity when submitting commands to a
>> previously empty ring.
>
> Maybe but i still don't think it should matter.
>
> Andy can you test (without your patch) and see if it helps with your issue :
> http://people.freedesktop.org/~glisse/0001-drm-radeon-update-lockup-tracking-when-scheduling-in.patch

Testing now.  I'll report back in a couple of days.

I don't think that long computes have anything to do with it.  The
bogus lockups happen when I look away from my computer for a while and
then click something.  I think the graphics are usually completely
idle when this happens.

AFAIK I've never run an OpenCL or similar application on this system.

--Andy


[PATCH] radeon: Fix a false positive lockup after 10s of inactivity

2013-06-11 Thread Andy Lutomirski
If the device is idle for over ten seconds, then the next attempt to do
anything can race with the lockup detector and cause a bogus lockup
to be detected.

Oddly, the situation is well-described in the lockup detector's comments
and a fix is even described.  This patch implements that fix (and corrects
some typos in the description).

My system has been stable for about a week running this code.  Without this,
my screen would go blank every now and then and, when it came back, everything
would be remarkably slow (the latter is a separate bug).

Signed-off-by: Andy Lutomirski 
---

This may be -stable material.

 drivers/gpu/drm/radeon/radeon.h  |  1 +
 drivers/gpu/drm/radeon/radeon_ring.c | 23 ++++++++++++++++-------
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8263af3..9de5778 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -652,6 +652,7 @@ struct radeon_ring {
 	unsigned		ring_free_dw;
 	int			count_dw;
 	unsigned long		last_activity;
+	unsigned long		last_test_lockup;
 	unsigned		last_rptr;
 	uint64_t		gpu_addr;
 	uint32_t		align_mask;
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 1ef5eaa..fb7b3ea 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -547,12 +547,12 @@ void radeon_ring_lockup_update(struct radeon_ring *ring)
  * have CP rptr to a different value of jiffies wrap around which will force
  * initialization of the lockup tracking informations.
  *
- * A possible false positivie is if we get call after while and last_cp_rptr ==
- * the current CP rptr, even if it's unlikely it might happen. To avoid this
- * if the elapsed time since last call is bigger than 2 second than we return
- * false and update the tracking information. Due to this the caller must call
- * radeon_ring_test_lockup several time in less than 2sec for lockup to be reported
- * the fencing code should be cautious about that.
+ * A possible false positive is if we get called after a while and
+ * last_cp_rptr == the current CP rptr, even if it's unlikely it might
+ * happen. To avoid this if the elapsed time since the last call is bigger
+ * than 2 second then we return false and update the tracking
+ * information. Due to this the caller must call radeon_ring_test_lockup
+ * more frequently than once every 2s when waiting.
  *
  * Caller should write to the ring to force CP to do something so we don't get
  * false positive when CP is just gived nothing to do.
@@ -560,10 +560,14 @@ void radeon_ring_lockup_update(struct radeon_ring *ring)
  **/
 bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *ring)
 {
-	unsigned long cjiffies, elapsed;
+	unsigned long cjiffies, elapsed, last_test;
 	uint32_t rptr;
 
 	cjiffies = jiffies;
+
+	last_test = ring->last_test_lockup;
+	ring->last_test_lockup = cjiffies;
+
 	if (!time_after(cjiffies, ring->last_activity)) {
 		/* likely a wrap around */
 		radeon_ring_lockup_update(ring);
@@ -576,6 +580,11 @@ bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *rin
 		radeon_ring_lockup_update(ring);
 		return false;
 	}
+	if (cjiffies - last_test > 2 * HZ) {
+		/* Possible race -- see comment above */
+		radeon_ring_lockup_update(ring);
+		return false;
+	}
 	elapsed = jiffies_to_msecs(cjiffies - ring->last_activity);
 	if (radeon_lockup_timeout && elapsed >= radeon_lockup_timeout) {
 		dev_err(rdev->dev, "GPU lockup CP stall for more than %lumsec\n", elapsed);
-- 
1.8.1.4
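The patched check can be modelled in userspace once the driver plumbing is stripped away. This is a sketch, not radeon code. struct ring_model, test_lockup() and the HZ value of 100 are assumptions made for the example, and the CP read pointer is passed in rather than read from hardware:

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ 100 /* assumed tick rate for this model */

/* Wrap-safe "a is later than b", the same trick as the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical, trimmed-down ring state (not struct radeon_ring). */
struct ring_model {
	unsigned long last_activity;    /* jiffy of last observed progress */
	unsigned long last_test_lockup; /* jiffy of the previous check */
	uint32_t last_rptr;             /* CP read pointer at last_activity */
};

static void lockup_update(struct ring_model *ring, unsigned long now,
			  uint32_t rptr)
{
	ring->last_activity = now;
	ring->last_rptr = rptr;
}

/* Returns true if the ring looks locked up at jiffy "now". */
static bool test_lockup(struct ring_model *ring, unsigned long now,
			uint32_t rptr, unsigned long timeout_ms)
{
	unsigned long last_test = ring->last_test_lockup;

	ring->last_test_lockup = now;
	if (!time_after(now, ring->last_activity) || rptr != ring->last_rptr) {
		/* jiffies wrapped, or the CP made progress: restart tracking */
		lockup_update(ring, now, rptr);
		return false;
	}
	if (now - last_test > 2 * HZ) {
		/* first check after a long gap: the idle time is not a stall */
		lockup_update(ring, now, rptr);
		return false;
	}
	return (now - ring->last_activity) * 1000 / HZ >= timeout_ms;
}
```

For example, suppose the ring has been idle since jiffy 1000. A call at 1000 + 10 * HZ returns false because the previous check was more than 2 s earlier. Without that guard, the ten idle seconds alone would already meet a 10 s timeout, which is the bogus lockup from the changelog. A waiter that keeps calling every HZ / 2 never trips the guard. A ring whose rptr stays put is therefore still reported once the timeout elapses, which is the property Michel asked about.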



[PATCH] radeon: Fix a false positive lockup after 10s of inactivity

2013-06-11 Thread Andy Lutomirski
If the device is idle for over ten seconds, then the next attempt to do
anything can race with the lockup detector and cause a bogus lockup
to be detected.

Oddly, the situation is well-described in the lockup detector's comments
and a fix is even described.  This patch implements that fix (and corrects
some typos in the description).

My system has been stable for about a week running this code.  Without this,
my screen would go blank every now and then and, when it came back, everything
would be remarkably slow (the latter is a separate bug).

Signed-off-by: Andy Lutomirski <l...@amacapital.net>
---

This may be -stable material.

 drivers/gpu/drm/radeon/radeon.h  |  1 +
 drivers/gpu/drm/radeon/radeon_ring.c | 23 ++++++++++++++++-------
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8263af3..9de5778 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -652,6 +652,7 @@ struct radeon_ring {
 	unsigned		ring_free_dw;
 	int			count_dw;
 	unsigned long		last_activity;
+	unsigned long		last_test_lockup;
 	unsigned		last_rptr;
 	uint64_t		gpu_addr;
 	uint32_t		align_mask;
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 1ef5eaa..fb7b3ea 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -547,12 +547,12 @@ void radeon_ring_lockup_update(struct radeon_ring *ring)
  * have CP rptr to a different value of jiffies wrap around which will force
  * initialization of the lockup tracking informations.
  *
- * A possible false positivie is if we get call after while and last_cp_rptr ==
- * the current CP rptr, even if it's unlikely it might happen. To avoid this
- * if the elapsed time since last call is bigger than 2 second than we return
- * false and update the tracking information. Due to this the caller must call
- * radeon_ring_test_lockup several time in less than 2sec for lockup to be reported
- * the fencing code should be cautious about that.
+ * A possible false positive is if we get called after a while and
+ * last_cp_rptr == the current CP rptr, even if it's unlikely it might
+ * happen. To avoid this if the elapsed time since the last call is bigger
+ * than 2 second then we return false and update the tracking
+ * information. Due to this the caller must call radeon_ring_test_lockup
+ * more frequently than once every 2s when waiting.
  *
  * Caller should write to the ring to force CP to do something so we don't get
  * false positive when CP is just gived nothing to do.
@@ -560,10 +560,14 @@ void radeon_ring_lockup_update(struct radeon_ring *ring)
  **/
 bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *ring)
 {
-	unsigned long cjiffies, elapsed;
+	unsigned long cjiffies, elapsed, last_test;
 	uint32_t rptr;
 
 	cjiffies = jiffies;
+
+	last_test = ring->last_test_lockup;
+	ring->last_test_lockup = cjiffies;
+
 	if (!time_after(cjiffies, ring->last_activity)) {
 		/* likely a wrap around */
 		radeon_ring_lockup_update(ring);
@@ -576,6 +580,11 @@ bool radeon_ring_test_lockup(struct radeon_device *rdev, struct radeon_ring *rin
 		radeon_ring_lockup_update(ring);
 		return false;
 	}
+	if (cjiffies - last_test > 2 * HZ) {
+		/* Possible race -- see comment above */
+		radeon_ring_lockup_update(ring);
+		return false;
+	}
 	elapsed = jiffies_to_msecs(cjiffies - ring->last_activity);
 	if (radeon_lockup_timeout && elapsed >= radeon_lockup_timeout) {
 		dev_err(rdev->dev, "GPU lockup CP stall for more than %lumsec\n", elapsed);
-- 
1.8.1.4


