Re: [Intel-gfx] [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, 2022-02-28 at 23:59 +0200, Mike Rapoport wrote: > > On February 28, 2022 10:42:53 PM GMT+02:00, James Bottomley < > james.bottom...@hansenpartnership.com> wrote: > > On Mon, 2022-02-28 at 21:07 +0100, Christian König wrote: [...] > > > > I do wish we could actually poison the 'pos' value after the > > > > loop somehow - but clearly the "might be uninitialized" I was > > > > hoping for isn't the way to do it. > > > > > > > > Anybody have any ideas? > > > > > > I think we should look at the use cases why code is touching > > > (pos) after the loop. > > > > > > Just from skimming over the patches to change this and experience > > > with the drivers/subsystems I help to maintain I think the > > > primary pattern looks something like this: > > > > > > list_for_each_entry(entry, head, member) { > > > if (some_condition_checking(entry)) > > > break; > > > } > > > do_something_with(entry); > > > > Actually, we usually have a check to see if the loop found > > anything, but in that case it should something like > > > > if (list_entry_is_head(entry, head, member)) { > >return with error; > > } > > do_somethin_with(entry); > > > > Suffice? The list_entry_is_head() macro is designed to cope with > > the bogus entry on head problem. > > Won't suffice because the end goal of this work is to limit scope of > entry only to loop. Hence the need for additional variable. Well, yes, but my objection is more to the size of churn than the desire to do loop local. I'm not even sure loop local is possible, because it's always annoyed me that for (int i = 0; ... in C++ defines i in the outer scope not the loop scope, which is why I never use it. However, if the desire is really to poison the loop variable then we can do #define list_for_each_entry(pos, head, member) \ for (pos = list_first_entry(head, typeof(*pos), member);\ !list_entry_is_head(pos, head, member) && ((pos = NULL) == NULL; \ pos = list_next_entry(pos, member)) Which would at least set pos to NULL when the loop completes. > Besides, there are no guarantees that people won't > do_something_with(entry) without the check or won't compare entry to > NULL to check if the loop finished with break or not. I get the wider goal, but we have to patch the problem cases now and a simple one-liner is better than a larger patch that may or may not work if we ever achieve the local definition or value poisoning idea. I'm also fairly certain coccinelle can come up with a use without checking for loop completion semantic patch which we can add to 0day. James
Re: [Intel-gfx] [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, 2022-02-28 at 21:56 +0100, Christian König wrote: > > Am 28.02.22 um 21:42 schrieb James Bottomley: > > On Mon, 2022-02-28 at 21:07 +0100, Christian König wrote: > > > Am 28.02.22 um 20:56 schrieb Linus Torvalds: > > > > On Mon, Feb 28, 2022 at 4:19 AM Christian König > > > > wrote: > > > > [SNIP] > > > > Anybody have any ideas? > > > I think we should look at the use cases why code is touching > > > (pos) > > > after the loop. > > > > > > Just from skimming over the patches to change this and experience > > > with the drivers/subsystems I help to maintain I think the > > > primary pattern looks something like this: > > > > > > list_for_each_entry(entry, head, member) { > > > if (some_condition_checking(entry)) > > > break; > > > } > > > do_something_with(entry); > > > > Actually, we usually have a check to see if the loop found > > anything, but in that case it should something like > > > > if (list_entry_is_head(entry, head, member)) { > > return with error; > > } > > do_somethin_with(entry); > > > > Suffice? The list_entry_is_head() macro is designed to cope with > > the bogus entry on head problem. > > That will work and is also what people already do. > > The key problem is that we let people do the same thing over and > over again with slightly different implementations. > > Out in the wild I've seen at least using a separate variable, using > a bool to indicate that something was found and just assuming that > the list has an entry. > > The last case is bogus and basically what can break badly. Yes, I understand that. I'm saying we should replace that bogus checks of entry->something against some_value loop termination condition with the list_entry_is_head() macro. That should be a one line and fairly mechanical change rather than the explosion of code changes we seem to have in the patch series. James
Re: [Intel-gfx] [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr
On Mon, 2022-02-28 at 21:07 +0100, Christian König wrote: > Am 28.02.22 um 20:56 schrieb Linus Torvalds: > > On Mon, Feb 28, 2022 at 4:19 AM Christian König > > wrote: > > > I don't think that using the extra variable makes the code in any > > > way > > > more reliable or easier to read. > > So I think the next step is to do the attached patch (which > > requires > > that "-std=gnu11" that was discussed in the original thread). > > > > That will guarantee that the 'pos' parameter of > > list_for_each_entry() > > is only updated INSIDE the for_each_list_entry() loop, and can > > never > > point to the (wrongly typed) head entry. > > > > And I would actually hope that it should actually cause compiler > > warnings about possibly uninitialized variables if people then use > > the > > 'pos' pointer outside the loop. Except > > > > (a) that code in sgx/encl.c currently initializes 'tmp' to NULL > > for > > inexplicable reasons - possibly because it already expected this > > behavior > > > > (b) when I remove that NULL initializer, I still don't get a > > warning, > > because we've disabled -Wno-maybe-uninitialized since it results in > > so > > many false positives. > > > > Oh well. > > > > Anyway, give this patch a look, and at least if it's expanded to do > > "(pos) = NULL" in the entry statement for the for-loop, it will > > avoid the HEAD type confusion that Jakob is working on. And I think > > in a cleaner way than the horrid games he plays. > > > > (But it won't avoid possible CPU speculation of such type > > confusion. That, in my opinion, is a completely different issue) > > Yes, completely agree. > > > I do wish we could actually poison the 'pos' value after the loop > > somehow - but clearly the "might be uninitialized" I was hoping for > > isn't the way to do it. > > > > Anybody have any ideas? > > I think we should look at the use cases why code is touching (pos) > after the loop. > > Just from skimming over the patches to change this and experience > with the drivers/subsystems I help to maintain I think the primary > pattern looks something like this: > > list_for_each_entry(entry, head, member) { > if (some_condition_checking(entry)) > break; > } > do_something_with(entry); Actually, we usually have a check to see if the loop found anything, but in that case it should something like if (list_entry_is_head(entry, head, member)) { return with error; } do_somethin_with(entry); Suffice? The list_entry_is_head() macro is designed to cope with the bogus entry on head problem. James
Re: [Intel-gfx] [PATCH v3 3/4] tpm_tis: Disable interrupts if interrupt storm detected
On Mon, 2020-12-07 at 15:28 -0400, Jason Gunthorpe wrote: > On Sun, Dec 06, 2020 at 08:26:16PM +0100, Thomas Gleixner wrote: > > Just as a side note. I was looking at tpm_tis_probe_irq_single() > > and that function is leaking the interrupt request if any of the > > checks afterwards fails, except for the final interrupt probe check > > which does a cleanup. That means on fail before that the interrupt > > handler stays requested up to the point where the module is > > removed. If that's a shared interrupt and some other device is > > active on the same line, then each interrupt from that device will > > call into the TPM code. Something like the below is needed. > > > > Also the X86 autoprobe mechanism is interesting: > > > > if (IS_ENABLED(CONFIG_X86)) > > for (i = 3; i <= 15; i++) > > if (!tpm_tis_probe_irq_single(chip, intmask, 0, > > i)) > > return; > > > > The third argument is 'flags' which is handed to request_irq(). So > > that won't ever be able to probe a shared interrupt. But if an > > interrupt number > 0 is handed to tpm_tis_core_init() the interrupt > > is requested with IRQF_SHARED. Same issue when the chip has an > > interrupt number in the register. It's also requested exclusive > > which is pretty likely to fail on ancient x86 machines. > > It is very likely none of this works any more, it has been repeatedly > reworked over the years and just left behind out of fear someone > needs it. I've thought it should be deleted for a while now. > > I suppose the original logic was to try and probe without SHARED > because a probe would need exclusive access to the interrupt to tell > if the TPM was actually the source, not some other device. > > It is all very old and very out of step with current thinking, IMHO. > I skeptical that TPM interrupts were ever valuable enough to deserve > this in the first place. For what it's worth, I agree. Trying to probe all 15 ISA interrupts is last millennium thinking we should completely avoid. If it's not described in ACPI then you don't get an interrupt full stop. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v3 1/4] irq: export kstat_irqs
On Sun, 2020-12-06 at 17:40 +0100, Thomas Gleixner wrote: > On Sat, Dec 05 2020 at 12:39, Jarkko Sakkinen wrote: > > On Fri, Dec 04, 2020 at 06:43:37PM -0700, Jerry Snitselaar wrote: > > > To try and detect potential interrupt storms that > > > have been occurring with tpm_tis devices it was suggested > > > to use kstat_irqs() to get the number of interrupts. > > > Since tpm_tis can be built as a module it needs kstat_irqs > > > exported. > > > > I think you should also have a paragraph explicitly stating that > > i915_pmu.c contains a duplicate of kstat_irqs() because it is not > > exported as of today. It adds a lot more weight to this given that > > there is already existing mainline usage (kind of). > > It's abusage and just the fact that it exists is not an argument by > itself. What we want is a count of the interrupts to see if we're having an interrupt storm from the TPM device (some seem to be wired to fire the interrupt even when there's no event to warrant it). Since kstat_irqs_user() does the correct RCU locking, should we be using that instead? James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang
On Tue, 2020-11-24 at 13:32 -0800, Kees Cook wrote: > On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote: > > Really, no ... something which produces no improvement has no value > > at all ... we really shouldn't be wasting maintainer time with it > > because it has a cost to merge. I'm not sure we understand where > > the balance lies in value vs cost to merge but I am confident in > > the zero value case. > > What? We can't measure how many future bugs aren't introduced because > the kernel requires explicit case flow-control statements for all new > code. No but we can measure how vulnerable our current coding habits are to the mistake this warning would potentially prevent. I don't think it's wrong to extrapolate that if we had no instances at all of prior coding problems we likely wouldn't have any in future either making adopting the changes needed to enable the warning valueless ... that's the zero value case I was referring to above. Now, what we have seems to be about 6 cases (at least what's been shown in this thread) where a missing break would cause potentially user visible issues. That means the value of this isn't zero, but it's not a no-brainer massive win either. That's why I think asking what we've invested vs the return isn't a useless exercise. > We already enable -Wimplicit-fallthrough globally, so that's not the > discussion. The issue is that Clang is (correctly) even more strict > than GCC for this, so these are the remaining ones to fix for full > Clang coverage too. > > People have spent more time debating this already than it would have > taken to apply the patches. :) You mean we've already spent 90% of the effort to come this far so we might as well go the remaining 10% because then at least we get some return? It's certainly a clinching argument in defence procurement ... > This is about robustness and language wrangling. It's a big code- > base, and this is the price of our managing technical debt for > permanent robustness improvements. (The numbers I ran from Gustavo's > earlier patches were that about 10% of the places adjusted were > identified as legitimate bugs being fixed. This final series may be > lower, but there are still bugs being found from it -- we need to > finish this and shut the door on it for good.) I got my six patches by analyzing the lwn.net report of the fixes that was cited which had 21 of which 50% didn't actually change the emitted code, and 25% didn't have a user visible effect. But the broader point I'm making is just because the compiler people come up with a shiny new warning doesn't necessarily mean the problem it's detecting is one that causes us actual problems in the code base. I'd really be happier if we had a theory about what classes of CVE or bug we could eliminate before we embrace the next new warning. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 000/141] Fix fall-through warnings for Clang
On Mon, 2020-11-23 at 19:56 +0100, Miguel Ojeda wrote: > On Mon, Nov 23, 2020 at 4:58 PM James Bottomley > wrote: > > Well, I used git. It says that as of today in Linus' tree we have > > 889 patches related to fall throughs and the first series went in > > in october 2017 ... ignoring a couple of outliers back to February. > > I can see ~10k insertions over ~1k commits and 15 years that mention > a fallthrough in the entire repo. That is including some commits > (like the biggest one, 960 insertions) that have nothing to do with C > fallthrough. A single kernel release has an order of magnitude more > changes than this... > > But if we do the math, for an author, at even 1 minute per line > change and assuming nothing can be automated at all, it would take 1 > month of work. For maintainers, a couple of trivial lines is noise > compared to many other patches. So you think a one line patch should take one minute to produce ... I really don't think that's grounded in reality. I suppose a one line patch only takes a minute to merge with b4 if no-one reviews or tests it, but that's not really desirable. > In fact, this discussion probably took more time than the time it > would take to review the 200 lines. :-) I'm framing the discussion in terms of the whole series of changes we have done for fall through, both what's in the tree currently (889 patches) both in terms of the produce and the consumer. That's what I used for my figures for cost. > > We're also complaining about the inability to recruit maintainers: > > > > https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/ > > > > And burn out: > > > > http://antirez.com/news/129 > > Accepting trivial and useful 1-line patches Part of what I'm trying to measure is the "and useful" bit because that's not a given. > is not what makes a voluntary maintainer quit... so the proverb "straw which broke the camel's back" uniquely doesn't apply to maintainers > Thankless work with demanding deadlines is. That's another potential reason, but it doesn't may other reasons less valid. > > The whole crux of your argument seems to be maintainers' time isn't > > important so we should accept all trivial patches > > I have not said that, at all. In fact, I am a voluntary one and I > welcome patches like this. It takes very little effort on my side to > review and it helps the kernel overall. Well, you know, subsystems are very different in terms of the amount of patches a maintainer has to process per release cycle of the kernel. If a maintainer is close to capacity, additional patches, however trivial, become a problem. If a maintainer has spare cycles, trivial patches may look easy. > Paid maintainers are the ones that can take care of big > features/reviews. > > > What I'm actually trying to articulate is a way of measuring value > > of the patch vs cost ... it has nothing really to do with who foots > > the actual bill. > > I understand your point, but you were the one putting it in terms of > a junior FTE. No, I evaluated the producer side in terms of an FTE. What we're mostly arguing about here is the consumer side: the maintainers and people who have to rework their patch sets. I estimated that at 100h. > In my view, 1 month-work (worst case) is very much worth > removing a class of errors from a critical codebase. > > > One thesis I'm actually starting to formulate is that this > > continual devaluing of maintainers is why we have so much > > difficulty keeping and recruiting them. > > That may very well be true, but I don't feel anybody has devalued > maintainers in this discussion. You seem to be saying that because you find it easy to merge trivial patches, everyone should. I'm reminded of a friend long ago who thought being a Tees River Pilot was a sinecure because he could navigate the Tees blindfold. What he forgot, of course, is that just because it's easy with a trawler doesn't mean it's easy with an oil tanker. In fact it takes longer to qualify as a Tees River Pilot than it does to get a PhD. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang
On Mon, 2020-11-23 at 07:03 -0600, Gustavo A. R. Silva wrote: > On Sun, Nov 22, 2020 at 11:53:55AM -0800, James Bottomley wrote: > > On Sun, 2020-11-22 at 11:22 -0800, Joe Perches wrote: > > > On Sun, 2020-11-22 at 11:12 -0800, James Bottomley wrote: > > > > On Sun, 2020-11-22 at 10:25 -0800, Joe Perches wrote: > > > > > On Sun, 2020-11-22 at 10:21 -0800, James Bottomley wrote: > > > > > > Please tell me our reward for all this effort isn't a > > > > > > single missing error print. > > > > > > > > > > There were quite literally dozens of logical defects found > > > > > by the fallthrough additions. Very few were logging only. > > > > > > > > So can you give us the best examples (or indeed all of them if > > > > someone is keeping score)? hopefully this isn't a US election > > > > situation ... > > > > > > Gustavo? Are you running for congress now? > > > > > > https://lwn.net/Articles/794944/ > > > > That's 21 reported fixes of which about 50% seem to produce no > > change in code behaviour at all, a quarter seem to have no user > > visible effect with the remaining quarter producing unexpected > > errors on obscure configuration parameters, which is why no-one > > really noticed them before. > > The really important point here is the number of bugs this has > prevented and will prevent in the future. See an example of this, > below: > > https://lore.kernel.org/linux-iio/20190813135802.gb27...@kroah.com/ I think this falls into the same category as the other six bugs: it changes the output/input for parameters but no-one has really noticed, usually because the command is obscure or the bias effect is minor. > This work is still relevant, even if the total number of issues/bugs > we find in the process is zero (which is not the case). Really, no ... something which produces no improvement has no value at all ... we really shouldn't be wasting maintainer time with it because it has a cost to merge. I'm not sure we understand where the balance lies in value vs cost to merge but I am confident in the zero value case. > "The sucky thing about doing hard work to deploy hardening is that > the result is totally invisible by definition (things not happening) > [..]" > - Dmitry Vyukov Really, no. Something that can't be measured at all doesn't exist. And actually hardening is one of those things you can measure (which I do have to admit isn't true for everything in the security space) ... it's number of exploitable bugs found before you did it vs number of exploitable bugs found after you did it. Usually hardening eliminates a class of bug, so the way I've measured hardening before is to go through the CVE list for the last couple of years for product X, find all the bugs that are of the class we're looking to eliminate and say if we had hardened X against this class of bug we'd have eliminated Y% of the exploits. It can be quite impressive if Y is a suitably big number. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 000/141] Fix fall-through warnings for Clang
On Mon, 2020-11-23 at 15:19 +0100, Miguel Ojeda wrote: > On Sun, Nov 22, 2020 at 11:36 PM James Bottomley > wrote: > > Well, it seems to be three years of someone's time plus the > > maintainer review time and series disruption of nearly a thousand > > patches. Let's be conservative and assume the producer worked > > about 30% on the series and it takes about 5-10 minutes per patch > > to review, merge and for others to rework existing series. So > > let's say it's cost a person year of a relatively junior engineer > > producing the patches and say 100h of review and application > > time. The latter is likely the big ticket item because it's what > > we have in least supply in the kernel (even though it's 20x vs the > > producer time). > > How are you arriving at such numbers? It is a total of ~200 trivial > lines. Well, I used git. It says that as of today in Linus' tree we have 889 patches related to fall throughs and the first series went in in october 2017 ... ignoring a couple of outliers back to February. > > It's not about the risk of the changes it's about the cost of > > implementing them. Even if you discount the producer time (which > > someone gets to pay for, and if I were the engineering manager, I'd > > be unhappy about), the review/merge/rework time is pretty > > significant in exchange for six minor bug fixes. Fine, when a new > > compiler warning comes along it's certainly reasonable to see if we > > can benefit from it and the fact that the compiler people think > > it's worthwhile is enough evidence to assume this initially. But > > at some point you have to ask whether that assumption is supported > > by the evidence we've accumulated over the time we've been using > > it. And if the evidence doesn't support it perhaps it is time to > > stop the experiment. > > Maintainers routinely review 1-line trivial patches, not to mention > internal API changes, etc. We're also complaining about the inability to recruit maintainers: https://www.theregister.com/2020/06/30/hard_to_find_linux_maintainers_says_torvalds/ And burn out: http://antirez.com/news/129 The whole crux of your argument seems to be maintainers' time isn't important so we should accept all trivial patches ... I'm pushing back on that assumption in two places, firstly the valulessness of the time and secondly that all trivial patches are valuable. > If some company does not want to pay for that, that's fine, but they > don't get to be maintainers and claim `Supported`. What I'm actually trying to articulate is a way of measuring value of the patch vs cost ... it has nothing really to do with who foots the actual bill. One thesis I'm actually starting to formulate is that this continual devaluing of maintainers is why we have so much difficulty keeping and recruiting them. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 000/141] Fix fall-through warnings for Clang
On Mon, 2020-11-23 at 09:54 +1100, Finn Thain wrote: > But is anyone keeping score of the regressions? If unreported bugs > count, what about unreported regressions? Well, I was curious about the former (obviously no tool will tell me about the latter), so I asked git what patches had a fall-through series named in a fixes tag and these three popped out: 9cf51446e686 bpf, powerpc: Fix misuse of fallthrough in bpf_jit_comp() 6a9dc5fd6170 lib: Revert use of fallthrough pseudo-keyword in lib/ 91dbd73a1739 mips/oprofile: Fix fallthrough placement I don't think any of these is fixing a significant problem, but they did cause someone time and trouble to investigate. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 000/141] Fix fall-through warnings for Clang
On Sun, 2020-11-22 at 21:35 +0100, Miguel Ojeda wrote: > On Sun, Nov 22, 2020 at 7:22 PM James Bottomley > wrote: > > Well, it's a problem in an error leg, sure, but it's not a really > > compelling reason for a 141 patch series, is it? All that fixing > > this error will do is get the driver to print "oh dear there's a > > problem" under four more conditions than it previously did. > > > > We've been at this for three years now with nearly a thousand > > patches, firstly marking all the fall throughs with /* fall through > > */ and later changing it to fallthrough. At some point we do have > > to ask if the effort is commensurate with the protection > > afforded. Please tell me our reward for all this effort isn't a > > single missing error print. > > It isn't that much effort, isn't it? Well, it seems to be three years of someone's time plus the maintainer review time and series disruption of nearly a thousand patches. Let's be conservative and assume the producer worked about 30% on the series and it takes about 5-10 minutes per patch to review, merge and for others to rework existing series. So let's say it's cost a person year of a relatively junior engineer producing the patches and say 100h of review and application time. The latter is likely the big ticket item because it's what we have in least supply in the kernel (even though it's 20x vs the producer time). > Plus we need to take into account the future mistakes that it might > prevent, too. So even if there were zero problems found so far, it is > still a positive change. Well, the question I was asking is if it's worth the cost which I've tried to outline above. > I would agree if these changes were high risk, though; but they are > almost trivial. It's not about the risk of the changes it's about the cost of implementing them. Even if you discount the producer time (which someone gets to pay for, and if I were the engineering manager, I'd be unhappy about), the review/merge/rework time is pretty significant in exchange for six minor bug fixes. Fine, when a new compiler warning comes along it's certainly reasonable to see if we can benefit from it and the fact that the compiler people think it's worthwhile is enough evidence to assume this initially. But at some point you have to ask whether that assumption is supported by the evidence we've accumulated over the time we've been using it. And if the evidence doesn't support it perhaps it is time to stop the experiment. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 000/141] Fix fall-through warnings for Clang
On Sun, 2020-11-22 at 11:22 -0800, Joe Perches wrote: > On Sun, 2020-11-22 at 11:12 -0800, James Bottomley wrote: > > On Sun, 2020-11-22 at 10:25 -0800, Joe Perches wrote: > > > On Sun, 2020-11-22 at 10:21 -0800, James Bottomley wrote: > > > > Please tell me our reward for all this effort isn't a single > > > > missing error print. > > > > > > There were quite literally dozens of logical defects found > > > by the fallthrough additions. Very few were logging only. > > > > So can you give us the best examples (or indeed all of them if > > someone is keeping score)? hopefully this isn't a US election > > situation ... > > Gustavo? Are you running for congress now? > > https://lwn.net/Articles/794944/ That's 21 reported fixes of which about 50% seem to produce no change in code behaviour at all, a quarter seem to have no user visible effect with the remaining quarter producing unexpected errors on obscure configuration parameters, which is why no-one really noticed them before. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 000/141] Fix fall-through warnings for Clang
On Sun, 2020-11-22 at 10:25 -0800, Joe Perches wrote: > On Sun, 2020-11-22 at 10:21 -0800, James Bottomley wrote: > > Please tell me our reward for all this effort isn't a single > > missing error print. > > There were quite literally dozens of logical defects found > by the fallthrough additions. Very few were logging only. So can you give us the best examples (or indeed all of them if someone is keeping score)? hopefully this isn't a US election situation ... James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 000/141] Fix fall-through warnings for Clang
On Sun, 2020-11-22 at 08:17 -0800, Kees Cook wrote: > On Fri, Nov 20, 2020 at 11:51:42AM -0800, Jakub Kicinski wrote: > > On Fri, 20 Nov 2020 11:30:40 -0800 Kees Cook wrote: > > > On Fri, Nov 20, 2020 at 10:53:44AM -0800, Jakub Kicinski wrote: > > > > On Fri, 20 Nov 2020 12:21:39 -0600 Gustavo A. R. Silva wrote: > > > > > This series aims to fix almost all remaining fall-through > > > > > warnings in order to enable -Wimplicit-fallthrough for Clang. > > > > > > > > > > In preparation to enable -Wimplicit-fallthrough for Clang, > > > > > explicitly add multiple break/goto/return/fallthrough > > > > > statements instead of just letting the code fall through to > > > > > the next case. > > > > > > > > > > Notice that in order to enable -Wimplicit-fallthrough for > > > > > Clang, this change[1] is meant to be reverted at some point. > > > > > So, this patch helps to move in that direction. > > > > > > > > > > Something important to mention is that there is currently a > > > > > discrepancy between GCC and Clang when dealing with switch > > > > > fall-through to empty case statements or to cases that only > > > > > contain a break/continue/return statement[2][3][4]. > > > > > > > > Are we sure we want to make this change? Was it discussed > > > > before? > > > > > > > > Are there any bugs Clangs puritanical definition of fallthrough > > > > helped find? > > > > > > > > IMVHO compiler warnings are supposed to warn about issues that > > > > could be bugs. Falling through to default: break; can hardly be > > > > a bug?! > > > > > > It's certainly a place where the intent is not always clear. I > > > think this makes all the cases unambiguous, and doesn't impact > > > the machine code, since the compiler will happily optimize away > > > any behavioral redundancy. > > > > If none of the 140 patches here fix a real bug, and there is no > > change to machine code then it sounds to me like a W=2 kind of a > > warning. > > FWIW, this series has found at least one bug so far: > https://lore.kernel.org/lkml/CAFCwf11izHF=g1mGry1fE5kvFFFrxzhPSM6qKAO8gxSp=kr...@mail.gmail.com/ Well, it's a problem in an error leg, sure, but it's not a really compelling reason for a 141 patch series, is it? All that fixing this error will do is get the driver to print "oh dear there's a problem" under four more conditions than it previously did. We've been at this for three years now with nearly a thousand patches, firstly marking all the fall throughs with /* fall through */ and later changing it to fallthrough. At some point we do have to ask if the effort is commensurate with the protection afforded. Please tell me our reward for all this effort isn't a single missing error print. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] MAINTAINERS tag for cleanup robot
On Sun, 2020-11-22 at 08:10 -0800, Tom Rix wrote: > On 11/22/20 6:56 AM, Matthew Wilcox wrote: > > On Sun, Nov 22, 2020 at 06:46:46AM -0800, Tom Rix wrote: > > > On 11/21/20 7:23 PM, Matthew Wilcox wrote: > > > > On Sat, Nov 21, 2020 at 08:50:58AM -0800, t...@redhat.com > > > > wrote: > > > > > The fixer review is > > > > > https://reviews.llvm.org/D91789 > > > > > > > > > > A run over allyesconfig for x86_64 finds 62 issues, 5 are > > > > > false positives. The false positives are caused by macros > > > > > passed to other macros and by some macro expansions that did > > > > > not have an extra semicolon. > > > > > > > > > > This cleans up about 1,000 of the current 10,000 -Wextra- > > > > > semi-stmt warnings in linux-next. > > > > Are any of them not false-positives? It's all very well to > > > > enable stricter warnings, but if they don't fix any bugs, > > > > they're just churn. > > > > > > > While enabling additional warnings may be a side effect of this > > > effort > > > > > > the primary goal is to set up a cleaning robot. After that a > > > refactoring robot. > > Why do we need such a thing? Again, it sounds like more churn. > > It's really annoying when I'm working on something important that > > gets derailed by pointless churn. Churn also makes it harder to > > backport patches to earlier kernels. > > > A refactoring example on moving to treewide, consistent use of a new > api may help. > > Consider > > 2efc459d06f1630001e3984854848a5647086232 > > sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output > > A new api for printing in the sysfs. How do we use it treewide ? > > Done manually, it would be a heroic effort requiring high level > maintainers pushing and likely only get partially done. > > If a refactoring programatic fixit is done and validated on a one > subsystem, it can run on all the subsystems. > > The effort is a couple of weeks to write and validate the fixer, > hours to run over the tree. > > It won't be perfect but will be better than doing it manually. Here's a thought: perhaps we don't. sysfs_emit isn't a "new api" its a minor rewrap of existing best practice. The damage caused by the churn of forcing its use everywhere would far outweigh any actual benefit because pretty much every bug in this area has already been caught and killed by existing tools. We can enforce sysfs_emit going forwards using tools like checkpatch but there's no benefit and a lot of harm to be done by trying to churn the entire tree retrofitting it (both in terms of review time wasted as well as patch series derailed). James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [RFC] MAINTAINERS tag for cleanup robot
On Sat, 2020-11-21 at 08:50 -0800, t...@redhat.com wrote: > A difficult part of automating commits is composing the subsystem > preamble in the commit log. For the ongoing effort of a fixer > producing > one or two fixes a release the use of 'treewide:' does not seem > appropriate. > > It would be better if the normal prefix was used. Unfortunately > normal is > not consistent across the tree. > > > D: Commit subsystem prefix > > ex/ for FPGA DFL DRIVERS > > D: fpga: dfl: > I've got to bet this is going to cause more issues than it solves. SCSI uses scsi: : for drivers but not every driver has a MAINTAINERS entry. We use either scsi: or scsi: core: for mid layer things, but we're not consistent. Block uses blk-: for all of it's stuff but almost no s have a MAINTAINERS entry. So the next thing you're going to cause is an explosion of suggested MAINTAINERs entries. Has anyone actually complained about treewide:? James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH RFC PKS/PMEM 22/58] fs/f2fs: Utilize new kmap_thread()
On Fri, 2020-10-09 at 14:34 -0700, Eric Biggers wrote: > On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.we...@intel.com wrote: > > From: Ira Weiny > > > > The kmap() calls in this FS are localized to a single thread. To > > avoid the over head of global PKRS updates use the new > > kmap_thread() call. > > > > Cc: Jaegeuk Kim > > Cc: Chao Yu > > Signed-off-by: Ira Weiny > > --- > > fs/f2fs/f2fs.h | 8 > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > > index d9e52a7f3702..ff72a45a577e 100644 > > --- a/fs/f2fs/f2fs.h > > +++ b/fs/f2fs/f2fs.h > > @@ -2410,12 +2410,12 @@ static inline struct page > > *f2fs_pagecache_get_page( > > > > static inline void f2fs_copy_page(struct page *src, struct page > > *dst) > > { > > - char *src_kaddr = kmap(src); > > - char *dst_kaddr = kmap(dst); > > + char *src_kaddr = kmap_thread(src); > > + char *dst_kaddr = kmap_thread(dst); > > > > memcpy(dst_kaddr, src_kaddr, PAGE_SIZE); > > - kunmap(dst); > > - kunmap(src); > > + kunmap_thread(dst); > > + kunmap_thread(src); > > } > > Wouldn't it make more sense to switch cases like this to > kmap_atomic()? > The pages are only mapped to do a memcpy(), then they're immediately > unmapped. On a VIPT/VIVT architecture, this is horrendously wasteful. You're taking something that was mapped at colour c_src mapping it to a new address src_kaddr, which is likely a different colour and necessitates flushing the original c_src, then you copy it to dst_kaddr, which is also likely a different colour from c_dst, so dst_kaddr has to be flushed on kunmap and c_dst has to be invalidated on kmap. What we should have is an architectural primitive for doing this, something like kmemcopy_arch(dst, src). PIPT architectures can implement it as the above (possibly losing kmap if they don't need it) but VIPT/VIVT architectures can set up a correctly coloured mapping so they can simply copy from c_src to c_dst without any need to flush and the data arrives cache hot at c_dst. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] block: convert tasklets to use new tasklet_setup() API
On Wed, 2020-08-19 at 21:54 +0530, Allen wrote: > > [...] > > > > Since both threads seem to have petered out, let me suggest in > > > > kernel.h: > > > > > > > > #define cast_out(ptr, container, member) \ > > > > container_of(ptr, typeof(*container), member) > > > > > > > > It does what you want, the argument order is the same as > > > > container_of with the only difference being you name the > > > > containing structure instead of having to specify its type. > > > > > > Not to incessantly bike shed on the naming, but I don't like > > > cast_out, it's not very descriptive. And it has connotations of > > > getting rid of something, which isn't really true. > > > > Um, I thought it was exactly descriptive: you're casting to the > > outer container. I thought about following the C++ dynamic casting > > style, so out_cast(), but that seemed a bit pejorative. What about > > outer_cast()? > > > > > FWIW, I like the from_ part of the original naming, as it has > > > some clues as to what is being done here. Why not just > > > from_container()? That should immediately tell people what it > > > does without having to look up the implementation, even before > > > this becomes a part of the accepted coding norm. > > > > I'm not opposed to container_from() but it seems a little less > > descriptive than outer_cast() but I don't really care. I always > > have to look up container_of() when I'm using it so this would just > > be another macro of that type ... > > > > So far we have a few which have been suggested as replacement > for from_tasklet() > > - out_cast() or outer_cast() > - from_member(). > - container_from() or from_container() > > from_container() sounds fine, would trimming it a bit work? like > from_cont(). I'm fine with container_from(). It's the same form as container_of() and I think we need urgent agreement to not stall everything else so the most innocuous name is likely to get the widest acceptance. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] block: convert tasklets to use new tasklet_setup() API
On Wed, 2020-08-19 at 07:00 -0600, Jens Axboe wrote: > On 8/18/20 1:00 PM, James Bottomley wrote: [...] > > Since both threads seem to have petered out, let me suggest in > > kernel.h: > > > > #define cast_out(ptr, container, member) \ > > container_of(ptr, typeof(*container), member) > > > > It does what you want, the argument order is the same as > > container_of with the only difference being you name the containing > > structure instead of having to specify its type. > > Not to incessantly bike shed on the naming, but I don't like > cast_out, it's not very descriptive. And it has connotations of > getting rid of something, which isn't really true. Um, I thought it was exactly descriptive: you're casting to the outer container. I thought about following the C++ dynamic casting style, so out_cast(), but that seemed a bit pejorative. What about outer_cast()? > FWIW, I like the from_ part of the original naming, as it has some > clues as to what is being done here. Why not just from_container()? > That should immediately tell people what it does without having to > look up the implementation, even before this becomes a part of the > accepted coding norm. I'm not opposed to container_from() but it seems a little less descriptive than outer_cast() but I don't really care. I always have to look up container_of() when I'm using it so this would just be another macro of that type ... James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] block: convert tasklets to use new tasklet_setup() API
On Tue, 2020-08-18 at 13:10 -0700, Kees Cook wrote: > On Tue, Aug 18, 2020 at 01:00:33PM -0700, James Bottomley wrote: > > On Mon, 2020-08-17 at 13:02 -0700, Jens Axboe wrote: > > > On 8/17/20 12:48 PM, Kees Cook wrote: > > > > On Mon, Aug 17, 2020 at 12:44:34PM -0700, Jens Axboe wrote: > > > > > On 8/17/20 12:29 PM, Kees Cook wrote: > > > > > > On Mon, Aug 17, 2020 at 06:56:47AM -0700, Jens Axboe wrote: > > > > > > > On 8/17/20 2:15 AM, Allen Pais wrote: > > > > > > > > From: Allen Pais > > > > > > > > > > > > > > > > In preparation for unconditionally passing the > > > > > > > > struct tasklet_struct pointer to all tasklet > > > > > > > > callbacks, switch to using the new tasklet_setup() > > > > > > > > and from_tasklet() to pass the tasklet pointer > > > > > > > > explicitly. > > > > > > > > > > > > > > Who came up with the idea to add a macro 'from_tasklet' > > > > > > > that > > > > > > > is just container_of? container_of in the code would be > > > > > > > _much_ more readable, and not leave anyone guessing wtf > > > > > > > from_tasklet is doing. > > > > > > > > > > > > > > I'd fix that up now before everything else goes in... > > > > > > > > > > > > As I mentioned in the other thread, I think this makes > > > > > > things > > > > > > much more readable. It's the same thing that the > > > > > > timer_struct > > > > > > conversion did (added a container_of wrapper) to avoid the > > > > > > ever-repeating use of typeof(), long lines, etc. > > > > > > > > > > But then it should use a generic name, instead of each sub- > > > > > system > > > > > using some random name that makes people look up exactly what > > > > > it > > > > > does. I'm not huge fan of the container_of() redundancy, but > > > > > adding private variants of this doesn't seem like the best > > > > > way > > > > > forward. Let's have a generic helper that does this, and use > > > > > it > > > > > everywhere. > > > > > > > > I'm open to suggestions, but as things stand, these kinds of > > > > treewide > > > > > > On naming? Implementation is just as it stands, from_tasklet() is > > > totally generic which is why I objected to it. from_member()? Not > > > great with naming... But I can see this going further and then > > > we'll > > > suddenly have tons of these. It's not good for readability. > > > > Since both threads seem to have petered out, let me suggest in > > kernel.h: > > > > #define cast_out(ptr, container, member) \ > > container_of(ptr, typeof(*container), member) > > > > It does what you want, the argument order is the same as > > container_of with the only difference being you name the containing > > structure instead of having to specify its type. > > I like this! Shall I send this to Linus to see if this can land in > -rc2 for use going forward? Sure ... he's probably been lurking on this thread anyway ... it's about time he got off his arse^Wthe fence and made an executive decision ... James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] block: convert tasklets to use new tasklet_setup() API
On Mon, 2020-08-17 at 13:02 -0700, Jens Axboe wrote: > On 8/17/20 12:48 PM, Kees Cook wrote: > > On Mon, Aug 17, 2020 at 12:44:34PM -0700, Jens Axboe wrote: > > > On 8/17/20 12:29 PM, Kees Cook wrote: > > > > On Mon, Aug 17, 2020 at 06:56:47AM -0700, Jens Axboe wrote: > > > > > On 8/17/20 2:15 AM, Allen Pais wrote: > > > > > > From: Allen Pais > > > > > > > > > > > > In preparation for unconditionally passing the > > > > > > struct tasklet_struct pointer to all tasklet > > > > > > callbacks, switch to using the new tasklet_setup() > > > > > > and from_tasklet() to pass the tasklet pointer explicitly. > > > > > > > > > > Who came up with the idea to add a macro 'from_tasklet' that > > > > > is just container_of? container_of in the code would be > > > > > _much_ more readable, and not leave anyone guessing wtf > > > > > from_tasklet is doing. > > > > > > > > > > I'd fix that up now before everything else goes in... > > > > > > > > As I mentioned in the other thread, I think this makes things > > > > much more readable. It's the same thing that the timer_struct > > > > conversion did (added a container_of wrapper) to avoid the > > > > ever-repeating use of typeof(), long lines, etc. > > > > > > But then it should use a generic name, instead of each sub-system > > > using some random name that makes people look up exactly what it > > > does. I'm not huge fan of the container_of() redundancy, but > > > adding private variants of this doesn't seem like the best way > > > forward. Let's have a generic helper that does this, and use it > > > everywhere. > > > > I'm open to suggestions, but as things stand, these kinds of > > treewide > > On naming? Implementation is just as it stands, from_tasklet() is > totally generic which is why I objected to it. from_member()? Not > great with naming... But I can see this going further and then we'll > suddenly have tons of these. It's not good for readability. Since both threads seem to have petered out, let me suggest in kernel.h: #define cast_out(ptr, container, member) \ container_of(ptr, typeof(*container), member) It does what you want, the argument order is the same as container_of with the only difference being you name the containing structure instead of having to specify its type. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Wed, 2019-07-17 at 23:27 +0200, Paul Bolle wrote: > Hi Jose, > > Souza, Jose schreef op di 16-07-2019 om 16:32 [+]: > > Paul and James could you test this final solution(at least for > > 5.2)? Please revert the hack patch and apply this one. > > I've just reached a day of uptime with your revert. (The proper > uptime is just a fraction of a day, this being a laptop.) Anyhow, no > screen freezes occurred during this day. I'm afraid my status is that I'm in Tokyo doing conference presentations (and kernel demos) so I'm a bit reluctant to mess with my setup until I finish everything on Friday, but I will test it after then, promise. > So feel free to add my Tested-by tag to your revert. But please don't > forget that James earned a Reported-by tag. Thanks, James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Thu, 2019-07-11 at 16:40 -0700, James Bottomley wrote: > On Thu, 2019-07-11 at 23:28 +, Souza, Jose wrote: > > On Fri, 2019-07-12 at 01:03 +0200, Paul Bolle wrote: > > > James Bottomley schreef op do 11-07-2019 om 15:38 [-0700]: > > > > On Thu, 2019-07-11 at 22:26 +, Souza, Jose wrote: > > > > > It eventually comes back from screen freeze? Like moving the > > > > > mouse or typing brings it back? > > > > > > > > No, it seems to be frozen for all time (at least until I got > > > > bored waiting, which was probably 20 minutes). Even if I > > > > reboot the machine, the current screen state stays until the > > > > system powers off. > > > > > > As I mentioned earlier, a suspend/resume cycle unfreezes the > > > screen. > > > > > > And I seem to remember that, if the gnome screen-locking > > > eventually kicks in, unlocking the screen still works, as the > > > screen then isn't frozen anymore. > > > > > > Thanks, > > > > Thanks for all the information Paul. > > > > Could test with the patch attached? > > Applied and running with it now. It has survived 6h without manifesting the regression. Starting again to try a whole day. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Thu, 2019-07-11 at 23:28 +, Souza, Jose wrote: > On Fri, 2019-07-12 at 01:03 +0200, Paul Bolle wrote: > > James Bottomley schreef op do 11-07-2019 om 15:38 [-0700]: > > > On Thu, 2019-07-11 at 22:26 +, Souza, Jose wrote: > > > > It eventually comes back from screen freeze? Like moving the > > > > mouse or typing brings it back? > > > > > > No, it seems to be frozen for all time (at least until I got > > > bored waiting, which was probably 20 minutes). Even if I reboot > > > the machine, the current screen state stays until the system > > > powers off. > > > > As I mentioned earlier, a suspend/resume cycle unfreezes the > > screen. > > > > And I seem to remember that, if the gnome screen-locking eventually > > kicks in, unlocking the screen still works, as the screen then > > isn't frozen anymore. > > > > Thanks, > > Thanks for all the information Paul. > > Could test with the patch attached? Applied and running with it now. > If the issue happens again could send the output of: > > /sys/kernel/debug/dri/0/eDP-1/i915_psr_sink_status > /sys/kernel/debug/dri/0/i915_edp_psr_status > > Thanks so much for all the help Sure, James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Thu, 2019-07-11 at 22:26 +, Souza, Jose wrote: > On Thu, 2019-07-11 at 14:57 -0700, James Bottomley wrote: > > On Thu, 2019-07-11 at 13:28 -0700, James Bottomley wrote: > > > I've also updated to the released 5.2 kernel and am running with > > > the > > > debug parameters you requested ... but so far no reproduction. > > > > OK, it's happened. I've attached the dmesg (it's 4MB > > uncompressed). > > Is there any other output you'd like from the machine? I've got an > > ssh session into it so I can try anything. > > Thanks, could you also share the output of this after the screen > freeze? > > /sys/kernel/debug/dri/0/i915_edp_psr_status > /sys/kernel/debug/dri/0/i915_display_info > /sys/kernel/debug/dri/0/i915_dmc_info > /sys/kernel/debug/pmc_core/package_cstate_show jarvis:~ # for f in `cat ~/tmp.txt`; do echo $f; cat $f; done /sys/kernel/debug/dri/0/i915_edp_psr_status Sink support: yes [0x01] PSR mode: PSR1 enabled Source PSR ctl: disabled [0x01f00726] Source PSR status: IDLE [0x04010216] Busy frontbuffer bits: 0x0001 /sys/kernel/debug/dri/0/i915_display_info CRTC info - CRTC 47: pipe: A, active=yes, (size=3200x1800), dither=no, bpp=24 fb: 118, pos: 0x0, size: 3200x1800 encoder 84: type: DDI A, connectors: connector 85: type: eDP-1, status: connected, mode: "": 0 373250 3200 3248 3280 3360 1800 1803 1808 1852 0x0 0xa cursor visible? yes, position (-3, 261), size 256x256, addr 0x0174 num_scalers=2, scaler_users=0 scaler_id=-1, scalers[0]: use=no, mode=0, scalers[1]: use=no, mode=0 --Plane id 30: type=PRI, crtc_pos= 0x 0, crtc_size=3200x1800, src_pos=0.x0., src_size=3200.x1800., format=XR24 little-endian (0x34325258), rotation=0 (0x0001) --Plane id 37: type=OVL, crtc_pos= 0x 0, crtc_size= 0x 0, src_pos=0.x0., src_size=0.x0., format=N/A, rotation=0 (0x0001) --Plane id 44: type=CUR, crtc_pos= -3x 261, crtc_size= 256x 256, src_pos=0.x0., src_size=256.x256., format=AR24 little-endian (0x34325241), rotation=0 (0x0001) underrun reporting: cpu=yes pch=yes CRTC 65: pipe: B, active=yes, (size=1600x1200), dither=no, bpp=24 fb: 118, pos: 0x0, size: 3200x1800 encoder 90: type: DDI B, connectors: connector 91: type: DP-1, status: connected, mode: "": 0 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x0 0x5 cursor visible? yes, position (-3, 261), size 256x256, addr 0x017c num_scalers=2, scaler_users=0 scaler_id=-1, scalers[0]: use=no, mode=0, scalers[1]: use=no, mode=0 --Plane id 48: type=PRI, crtc_pos= 0x 0, crtc_size=1600x1200, src_pos=0.x0., src_size=1600.x1200., format=XR24 little-endian (0x34325258), rotation=0 (0x0001) --Plane id 55: type=OVL, crtc_pos= 0x 0, crtc_size= 0x 0, src_pos=0.x0., src_size=0.x0., format=N/A, rotation=0 (0x0001) --Plane id 62: type=CUR, crtc_pos= -3x 261, crtc_size= 256x 256, src_pos=0.x0., src_size=256.x256., format=AR24 little-endian (0x34325241), rotation=0 (0x0001) underrun reporting: cpu=yes pch=yes CRTC 83: pipe: C, active=no, (size=0x0), dither=no, bpp=0 underrun reporting: cpu=yes pch=yes Connector info -- connector 85: type eDP-1, status: connected physical dimensions: 290x170mm subpixel order: Unknown CEA rev: 0 DPCD rev: 12 audio support: no fixed mode: "3200x1800": 60 373250 3200 3248 3280 3360 1800 1803 1808 1852 0x48 0xa DP branch device present: no modes: "3200x1800": 60 373250 3200 3248 3280 3360 1800 1803 1808 1852 0x48 0xa "3200x1800": 48 298600 3200 3248 3280 3360 1800 1803 1808 1852 0x40 0xa connector 91: type DP-1, status: connected physical dimensions: 430x320mm subpixel order: Unknown CEA rev: 0 DPCD rev: 12 audio support: no DP branch device present: yes Type: VGA ID: HW: 0.0 SW: 1.0 modes: "1600x1200": 60 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x48 0x5 "1400x1050": 75 156000 1400 1504 1648 1896 1050 1053 1057 1099 0x40 0x6 "1400x1050": 60 121750 1400 1488 1632 1864 1050 1053 1057 1089 0x40 0x6 "1280x1024": 75 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5 "1280x1024": 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x40 0x5 "1280x960": 60 108000 1280 1376 1488 1800 960 961 964 1000 0x40 0x5 "1152x864": 75 10800
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Thu, 2019-07-11 at 20:25 +, Souza, Jose wrote: > On Thu, 2019-07-11 at 13:11 -0700, James Bottomley wrote: > > On Thu, 2019-07-11 at 10:29 +0100, Chris Wilson wrote: > > > Quoting James Bottomley (2019-06-29 19:56:52) > > > > The symptoms are really weird: the screen image is locked in > > > > place. The machine is still functional and if I log in over > > > > the network can do anything I like, including killing the X > > > > server and the display will never alter. It also seems that > > > > the system is accepting keyboard input because when it freezes > > > > I can cat information to a file (if the mouse was over an > > > > xterm) and verify over the network the file contents. Nothing > > > > unusual appears in dmesg when the lockup happens. > > > > > > > > The last kernel I booted successfully on the system was 5.0, so > > > > I'll try compiling 5.1 to narrow down the changes. > > > > > > It's likely this is panel self-refresh going haywire. > > > > > > commit 8f6e87d6d561f10cfa48a687345512419839b6d8 > > > Author: José Roberto de Souza > > > Date: Thu Mar 7 16:00:50 2019 -0800 > > > > > > drm/i915: Enable PSR2 by default > > > > > > The support for PSR2 was polished, IGT tests for PSR2 was > > > added and > > > it was tested performing regular user workloads like > > > browsing, > > > editing documents and compiling Linux, so it is time to > > > enable it by > > > default and enjoy even more power-savings. > > > > > > Temporary workaround would be to set i915.enable_psr=0 > > > > It looks plausible. I have to say I was just about to mark a > > bisect containing this as good, but that probably reflects my > > difficulty > > reproducing the issue. > > Take at look of what PSR version is supported by your panel, it > likely that a notebook shipped with Skylake will have panel that > supports only PSR1 so that patch has no effect on your machine. > > sudo more /sys/kernel/debug/dri/0/i915_edp_psr_status > Sink support: yes [0x01] It says Sink support: yes [0x01] PSR mode: PSR1 enabled Source PSR ctl: enabled [0x81f00726] Source PSR status: IDLE [0x04010212] Busy frontbuffer bits: 0x I've also updated to the released 5.2 kernel and am running with the debug parameters you requested ... but so far no reproduction. James
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Thu, 2019-07-11 at 10:29 +0100, Chris Wilson wrote: > Quoting James Bottomley (2019-06-29 19:56:52) > > The symptoms are really weird: the screen image is locked in > > place. The machine is still functional and if I log in over the > > network can do anything I like, including killing the X server and > > the display will never alter. It also seems that the system is > > accepting keyboard input because when it freezes I can cat > > information to a file (if the mouse was over an xterm) and verify > > over the network the file contents. Nothing unusual appears in > > dmesg when the lockup happens. > > > > The last kernel I booted successfully on the system was 5.0, so > > I'll try compiling 5.1 to narrow down the changes. > > It's likely this is panel self-refresh going haywire. > > commit 8f6e87d6d561f10cfa48a687345512419839b6d8 > Author: José Roberto de Souza > Date: Thu Mar 7 16:00:50 2019 -0800 > > drm/i915: Enable PSR2 by default > > The support for PSR2 was polished, IGT tests for PSR2 was added > and > it was tested performing regular user workloads like browsing, > editing documents and compiling Linux, so it is time to enable it > by > default and enjoy even more power-savings. > > Temporary workaround would be to set i915.enable_psr=0 It looks plausible. I have to say I was just about to mark a bisect containing this as good, but that probably reflects my difficulty reproducing the issue. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Wed, 2019-07-10 at 23:59 +0200, Paul Bolle wrote: > James Bottomley schreef op wo 10-07-2019 om 10:35 [-0700]: > > I can get back to it this afternoon, when I'm done with the meeting > > requirements and doing other dev stuff. > > I've started bisecting using your suggestion of that drm merge: > $ git bisect log > git bisect start > # good: [89c3b37af87ec183b666d83428cb28cc421671a6] Merge > git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide > git bisect good 89c3b37af87ec183b666d83428cb28cc421671a6 > # bad: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm- > next-2019-05-09' of git://anongit.freedesktop.org/drm/drm > git bisect bad a2d635decbfa9c1e4ae15cb05b68b2559f7f827c > # bad: [ad2c467aa92e283e9e8009bb9eb29a5c6a2d1217] drm/i915: > Update DRIVER_DATE to 20190417 > git bisect bad ad2c467aa92e283e9e8009bb9eb29a5c6a2d1217 > > Git told me I have nine steps after this. So at two hours per step I > might > pinpoint the offending commit by Friday the 12th. If I'm lucky. > (There are > other things to do than bisecting this issue.) > > If you find that commit before I do, I'll be all ears. Sure ... I'm doing the holistic thing and looking at the tree in that branch. It seems to consist of 7 i915 updates c09d39166d8a3f3788680b32dbb0a40a70de32e2 DRIVER_DATE to 20190207 47ed55a9bb9e284d46d6f2489e32a53b59152809 DRIVER_DATE to 20190220 f4ecb8ae70de86710e85138ce49af5c689951953 DRIVER_DATE to 20190311 1284ec985572232ace4817476baeb2d82b60be7a DRIVER_DATE to 20190320 a01b2c6f47d86c7d1a9fa822b3b91ec233b61784 DRIVER_DATE to 20190328 28d618e9ab86f26a31af0b235ced55beb3e343c8 DRIVER_DATE to 20190404 ad2c467aa92e283e9e8009bb9eb29a5c6a2d1217 DRIVER_DATE to 20190417 So I figured I'd see if I can locate the problem by bisection of those plus inspection. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Wed, 2019-07-10 at 18:45 +0200, Paul Bolle wrote: > James Bottomley schreef op wo 10-07-2019 om 09:32 [-0700]: > > You seem to be getting it to happen much more often than I can. > > Last > > night, on the below pull request it took me a good hour to see the > > freeze. > > Yes. Sometimes within a minute of resuming. Typing stuff into > evolution seems to help with triggering this. It's all a bit > mysterious, but this message alone frooze my laptop a few times. > Seriously! > > > Sure, my current testing indicates it's somewhere inside this pull > > request: > > > > Merge: 89c3b37af87e eb85d03e01c3 > > Author: Linus Torvalds > > Date: Wed May 8 21:35:19 2019 -0700 > > > > Merge tag 'drm-next-2019-05-09' of > > git://anongit.freedesktop.org/drm/drm > > > > Pull drm updates from Dave Airlie: > > Lazy question: how does one determine the first and last commit > inside a merge request? Can I simply do > good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c^ > bada2d635decbfa9c1e4ae15cb05b68b2559f7f827c > > for git bisect? I think so. I actually did bad a2d635decbfa9c1e4ae15cb05b68b2559f7f827c good89c3b37af87ec183b666d83428cb28cc421671a6 But I think ^ steps down in the main branch. Note, I've only done a couple of hours run on what I think is the good commit ... and actually I'd already marked a2d635decbfa9c1e4ae15cb05b68b2559f7f827c as good until the screen froze just after I'd built the new bisect kernel, so there's still some chance 89c3b37af87ec183b666d83428cb28cc421671a6 is bad. > > So I was about to test out the i915 changes in that but since my > > laptop is what I use for daily work, it's a bit hard (can't freeze > > up on video calls for instance). > > I usually use one of the ThinkPads from my embarrassing pile of > outdated hardware to do nasty bisects, but I'm not about to loose any > income if my much appreciated XPS 13 is out of order for a while. I can get back to it this afternoon, when I'm done with the meeting requirements and doing other dev stuff. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Wed, 2019-07-10 at 18:16 +0200, Paul Bolle wrote: > Hi James, > > James Bottomley schreef op wo 10-07-2019 om 08:01 [-0700]: > > I've confirmed that 5.1 doesn't have the regression and I'm now > > trying to bisect the 5.2 merge window, but since the problem takes > > quite a while to manifest this will take some time. Any hints > > about specific patches that might be the problem would be welcome. > > (Perhaps my message of yesterday never reached you.) No, sorry, if the list is followup to list, I'm not subscribed. I see it now I look in the archives, though. --- > Upgrading to 5.2 (from 5.1.y) on a "Dell XPS 13 9350" (is that a > skylake too?) I believe so. My laptop is a 9350. I believe they're the earliest skylake produced. > showed similar symptoms. There's no pattern to the freezes that I > can see. They're rather frequent too (think every few minutes). Eg, > two freezes while composing this message! You seem to be getting it to happen much more often than I can. Last night, on the below pull request it took me a good hour to see the freeze. --- > It seems I hit this problem quite easily. Bisecting v5.1..v5.2 could > be a real chore, so perhaps we could coordinate efforts (off-list)? Sure, my current testing indicates it's somewhere inside this pull request: Merge: 89c3b37af87e eb85d03e01c3 Author: Linus Torvalds Date: Wed May 8 21:35:19 2019 -0700 Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm Pull drm updates from Dave Airlie: So I was about to test out the i915 changes in that but since my laptop is what I use for daily work, it's a bit hard (can't freeze up on video calls for instance). James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
On Sat, 2019-06-29 at 11:56 -0700, James Bottomley wrote: > The symptoms are really weird: the screen image is locked in place. > The machine is still functional and if I log in over the network I > can do anything I like, including killing the X server and the > display will never alter. It also seems that the system is accepting > keyboard input because when it freezes I can cat information to a > file (if the mouse was over an xterm) and verify over the network the > file contents. Nothing unusual appears in dmesg when the lockup > happens. > > The last kernel I booted successfully on the system was 5.0, so I'll > try compiling 5.1 to narrow down the changes. I've confirmed that 5.1 doesn't have the regression and I'm now trying to bisect the 5.2 merge window, but since the problem takes quite a while to manifest this will take some time. Any hints about specific patches that might be the problem would be welcome. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
screen freeze with 5.2-rc6 Dell XPS-13 skylake i915
The symptoms are really weird: the screen image is locked in place. The machine is still functional and if I log in over the network I can do anything I like, including killing the X server and the display will never alter. It also seems that the system is accepting keyboard input because when it freezes I can cat information to a file (if the mouse was over an xterm) and verify over the network the file contents. Nothing unusual appears in dmesg when the lockup happens. The last kernel I booted successfully on the system was 5.0, so I'll try compiling 5.1 to narrow down the changes. James
Re: [Intel-gfx] Skylake graphics regression: projector failure with 4.8-rc3
On Mon, 2016-09-19 at 08:09 -0700, James Bottomley wrote: > On Sun, 2016-09-18 at 13:35 +0200, Thorsten Leemhuis wrote: > > Hi! James & Paulo: What's the current status of this? > > No, the only interaction has been the suggestion below for a revert, > which didn't fix the problem. > > > Was this issue discussed elsewhere or even fixed in between? Just > > asking, because this issue is on the list of regressions for 4.8. > > > I'm just about to try out -rc7, but it's not fixed so far. OK, with -rc7 and the i915 fixes, there seems to be a marked improvement. I can no longer crash the crtc by using lots of xrandr switches, which was the principal problem. I've so far only got one of these in the logs [14858.635035] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun And the only residual problem seems to be that the monitor goes blank periodically, but this can be fixed by switching resolution a couple of times. I haven't seen any of the link training errors so far and I've run through my usual battery of be nasty to the external monitor tests. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Skylake graphics regression: projector failure with 4.8-rc3
On Sun, 2016-09-18 at 13:35 +0200, Thorsten Leemhuis wrote: > Hi! James & Paulo: What's the current status of this? No, the only interaction has been the suggestion below for a revert, which didn't fix the problem. > Was this issue discussed elsewhere or even fixed in between? Just > asking, because this issue is on the list of regressions for 4.8. I'm just about to try out -rc7, but it's not fixed so far. James > Ciao, Thorsten > > On 01.09.2016 00:25, James Bottomley wrote: > > On Wed, 2016-08-31 at 21:51 +, Zanoni, Paulo R wrote: > > > Em Qua, 2016-08-31 às 14:43 -0700, James Bottomley escreveu: > > > > On Wed, 2016-08-31 at 11:23 -0700, James Bottomley wrote: > > > > > On Fri, 2016-08-26 at 09:10 -0400, James Bottomley wrote: > > > > > > We seem to have an xrandr regression with skylake now. > > > > > > What's > > > > > > happening is that I can get output on to a projector, but > > > > > > the > > > > > > system is losing video when I change the xrandr sessions > > > > > > (like > > > > > > going from a --above b to a --same-as b). The main screen > > > > > > goes > > > > > > blank, which is basically a reboot situation. > > > > > > Unfortunately, I > > > > > > can't seem to get the logs out of systemd to see if there > > > > > > was a > > > > > > dump to dmesg (the system was definitely responding). > > > > > > > > > > > > I fell back to 4.6.2 which worked perfectly, so this is > > > > > > definitely > > > > > > some sort of regression. I'll be able to debug more fully > > > > > > when > > > > > > I > > > > > > get back home from the Linux Security Summit. > > > > > > > > > > I'm home now. Unfortunately, my monitor isn't as problematic > > > > > as > > > > > the > > > > > projector, but by flipping between various modes and > > > > > separating > > > > > and > > > > > overlaying the panels with --above and --same-as (xrandr), I > > > > > can > > > > > eventually get it to the point where the main LCD panel goes > > > > > black > > > > > and can only be restarted by specifying a different mode. > > > > > > > > > > This seems to be associated with these lines in the X > > > > > > > > > > [ 14714.389] (EE) intel(0): failed to set mode: Invalid > > > > > argument > > > > > [22] > > > > > > > > > > But the curious thing is that even if this fails with the > > > > > error > > > > > message once, it may succeed a second time, so it looks to be > > > > > a > > > > > transient error translation problem from the kernel driver. > > > > > > > > > > I've attached the full log below. > > > > > > > > > > This is only with a VGA output. I currently don't have a > > > > > HDMI > > > > > dongle, but I'm in the process of acquiring one. > > > > > > > > After more playing around, I'm getting thousands of these in > > > > the > > > > kernel > > > > log (possibly millions: the log wraps very fast): > > > > > > > > [23504.873606] [drm:intel_dp_start_link_train [i915]] *ERROR* > > > > failed > > > > to train DP, aborting > > > > > > > > And then finally it gives up with > > > > > > > > [25023.770951] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] > > > > *ERROR* CPU pipe B FIFO underrun > > > > [25561.926075] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] > > > > *ERROR* CPU pipe A FIFO underrun > > > > > > > > And the crtc for the VGA output becomes non-responsive to any > > > > configuration command. This requires a reboot and sometimes a > > > > UEFI > > > > variable reset before it comes back. > > > > > > Please see this discussion: > > > https://patchwork.freedesktop.org/patch/103237/ > > > > > > Do you have this patch on your tree? Does the problem go away if > > > you > > > revert it? > > > > Yes, I've got it, it went in in 4.8-rc3 according to git: > > > > commit 58e311b09c319183254d9220c50a533e7157c9ab > > Author: Matt Roper <matthew.d.ro...@intel.com> > > Date: Thu Aug 4 14:08:00 2016 -0700 > > > > drm/i915/gen9: Give one extra block per line for SKL plane WM > > calculations > > > > Reverting it causes the secondary display not to sync pretty much > > at > > all. However, in the flickers I can see, it does work OK and > > doesn't > > now crash switching from --same-as to --above and back > > > > I also still get the logs filling up with the link training errors. > > > > On balance, although the behaviour is different, it's not an > > improvement because if I can't sync with the projector, I can't > > really > > use this as a fix. > > > > James > ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Skylake graphics regression: projector failure with 4.8-rc3
On Wed, 2016-08-31 at 21:51 +, Zanoni, Paulo R wrote: > Hi > > Em Qua, 2016-08-31 às 14:43 -0700, James Bottomley escreveu: > > On Wed, 2016-08-31 at 11:23 -0700, James Bottomley wrote: > > > > > > On Fri, 2016-08-26 at 09:10 -0400, James Bottomley wrote: > > > > > > > > We seem to have an xrandr regression with skylake now. What's > > > > happening is that I can get output on to a projector, but the > > > > system is losing video when I change the xrandr sessions (like > > > > going from a --above b to a --same-as b). The main screen goes > > > > blank, which is basically a reboot situation. Unfortunately, I > > > > can't seem to get the logs out of systemd to see if there was a > > > > dump to dmesg (the system was definitely responding). > > > > > > > > I fell back to 4.6.2 which worked perfectly, so this is > > > > definitely > > > > some sort of regression. I'll be able to debug more fully when > > > > I > > > > get back home from the Linux Security Summit. > > > > > > I'm home now. Unfortunately, my monitor isn't as problematic as > > > the > > > projector, but by flipping between various modes and separating > > > and > > > overlaying the panels with --above and --same-as (xrandr), I can > > > eventually get it to the point where the main LCD panel goes > > > black > > > and can only be restarted by specifying a different mode. > > > > > > This seems to be associated with these lines in the X > > > > > > [ 14714.389] (EE) intel(0): failed to set mode: Invalid argument > > > [22] > > > > > > But the curious thing is that even if this fails with the error > > > message once, it may succeed a second time, so it looks to be a > > > transient error translation problem from the kernel driver. > > > > > > I've attached the full log below. > > > > > > This is only with a VGA output. I currently don't have a HDMI > > > dongle, but I'm in the process of acquiring one. > > > > After more playing around, I'm getting thousands of these in the > > kernel > > log (possibly millions: the log wraps very fast): > > > > [23504.873606] [drm:intel_dp_start_link_train [i915]] *ERROR* > > failed > > to train DP, aborting > > > > And then finally it gives up with > > > > [25023.770951] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] > > *ERROR* CPU pipe B FIFO underrun > > [25561.926075] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] > > *ERROR* CPU pipe A FIFO underrun > > > > And the crtc for the VGA output becomes non-responsive to any > > configuration command. This requires a reboot and sometimes a UEFI > > variable reset before it comes back. > > Please see this discussion: > https://patchwork.freedesktop.org/patch/103237/ > > Do you have this patch on your tree? Does the problem go away if you > revert it? Yes, I've got it, it went in in 4.8-rc3 according to git: commit 58e311b09c319183254d9220c50a533e7157c9ab Author: Matt Roper <matthew.d.ro...@intel.com> Date: Thu Aug 4 14:08:00 2016 -0700 drm/i915/gen9: Give one extra block per line for SKL plane WM calculations Reverting it causes the secondary display not to sync pretty much at all. However, in the flickers I can see, it does work OK and doesn't now crash switching from --same-as to --above and back I also still get the logs filling up with the link training errors. On balance, although the behaviour is different, it's not an improvement because if I can't sync with the projector, I can't really use this as a fix. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Skylake graphics regression: projector failure with 4.8-rc3
On Wed, 2016-08-31 at 11:23 -0700, James Bottomley wrote: > On Fri, 2016-08-26 at 09:10 -0400, James Bottomley wrote: > > We seem to have an xrandr regression with skylake now. What's > > happening is that I can get output on to a projector, but the > > system is losing video when I change the xrandr sessions (like > > going from a --above b to a --same-as b). The main screen goes > > blank, which is basically a reboot situation. Unfortunately, I > > can't seem to get the logs out of systemd to see if there was a > > dump to dmesg (the system was definitely responding). > > > > I fell back to 4.6.2 which worked perfectly, so this is definitely > > some sort of regression. I'll be able to debug more fully when I > > get back home from the Linux Security Summit. > > I'm home now. Unfortunately, my monitor isn't as problematic as the > projector, but by flipping between various modes and separating and > overlaying the panels with --above and --same-as (xrandr), I can > eventually get it to the point where the main LCD panel goes black > and can only be restarted by specifying a different mode. > > This seems to be associated with these lines in the X > > [ 14714.389] (EE) intel(0): failed to set mode: Invalid argument [22] > > But the curious thing is that even if this fails with the error > message once, it may succeed a second time, so it looks to be a > transient error translation problem from the kernel driver. > > I've attached the full log below. > > This is only with a VGA output. I currently don't have a HDMI > dongle, but I'm in the process of acquiring one. After more playing around, I'm getting thousands of these in the kernel log (possibly millions: the log wraps very fast): [23504.873606] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting And then finally it gives up with [25023.770951] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun [25561.926075] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun And the crtc for the VGA output becomes non-responsive to any configuration command. This requires a reboot and sometimes a UEFI variable reset before it comes back. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Skylake graphics regression: projector failure with 4.8-rc3
On Fri, 2016-08-26 at 09:10 -0400, James Bottomley wrote: > We seem to have an xrandr regression with skylake now. What's > happening is that I can get output on to a projector, but the system > is losing video when I change the xrandr sessions (like going from a > --above b to a --same-as b). The main screen goes blank, which is > basically a reboot situation. Unfortunately, I can't seem to get the > logs out of systemd to see if there was a dump to dmesg (the system > was definitely responding). > > I fell back to 4.6.2 which worked perfectly, so this is definitely > some sort of regression. I'll be able to debug more fully when I get > back home from the Linux Security Summit. I'm home now. Unfortunately, my monitor isn't as problematic as the projector, but by flipping between various modes and separating and overlaying the panels with --above and --same-as (xrandr), I can eventually get it to the point where the main LCD panel goes black and can only be restarted by specifying a different mode. This seems to be associated with these lines in the X [ 14714.389] (EE) intel(0): failed to set mode: Invalid argument [22] But the curious thing is that even if this fails with the error message once, it may succeed a second time, so it looks to be a transient error translation problem from the kernel driver. I've attached the full log below. This is only with a VGA output. I currently don't have a HDMI dongle, but I'm in the process of acquiring one. James [30.672] X.Org X Server 1.17.2 Release Date: 2015-06-16 [30.672] X Protocol Version 11, Revision 0 [30.672] Build Operating System: openSUSE SUSE LINUX [30.672] Current Operating System: Linux jarvis 4.8.0-rc3+ #12 SMP PREEMPT Mon Aug 22 08:16:45 EDT 2016 x86_64 [30.672] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-test root=UUID=994363fd-c12c-4c6c-92e9-85e8ac695cf9 ro resume=/dev/nvme0n1p2 splash=silent quiet showopts pcie_aspm=force [30.672] Build Date: 07 March 2016 08:22:28AM [30.672] [30.672] Current version of pixman: 0.32.6 [30.672] Before reporting problems, check http://wiki.x.org to make sure that you have the latest version. [30.672] Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown. [30.673] (==) Log file: "/var/log/Xorg.0.log", Time: Wed Aug 31 07:10:56 2016 [30.673] (==) Using config directory: "/etc/X11/xorg.conf.d" [30.673] (==) Using system config directory "/usr/share/X11/xorg.conf.d" [30.673] (==) No Layout section. Using the first Screen section. [30.673] (==) No screen section available. Using defaults. [30.673] (**) |-->Screen "Default Screen Section" (0) [30.673] (**) | |-->Monitor "" [30.673] (==) No monitor specified for screen "Default Screen Section". Using a default monitor configuration. [30.673] (==) Automatically adding devices [30.673] (==) Automatically enabling devices [30.673] (==) Automatically adding GPU devices [30.674] (WW) The directory "/usr/share/fonts/misc/sgi" does not exist. [30.674] Entry deleted from font path. [30.674] (==) FontPath set to: /usr/share/fonts/misc:unscaled, /usr/share/fonts/Type1/, /usr/share/fonts/100dpi:unscaled, /usr/share/fonts/75dpi:unscaled, /usr/share/fonts/ghostscript/, /usr/share/fonts/cyrillic:unscaled, /usr/share/fonts/truetype/, built-ins [30.674] (==) ModulePath set to "/usr/lib64/xorg/modules" [30.674] (II) The server relies on udev to provide the list of input devices. If no devices become available, reconfigure udev or disable AutoAddDevices. [30.674] (II) Loader magic: 0x80dd00 [30.674] (II) Module ABI versions: [30.674] X.Org ANSI C Emulation: 0.4 [30.674] X.Org Video Driver: 19.0 [30.674] X.Org XInput driver : 21.0 [30.674] X.Org Server Extension : 9.0 [30.675] (++) using VT number 7 [30.675] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration [30.675] (II) xfree86: Adding drm device (/dev/dri/card0) [30.688] (--) PCI:*(0:0:2:0) 8086:1916:1028:0704 rev 7, Mem @ 0xdb00/16777216, 0x9000/268435456, I/O @ 0xf000/64, BIOS @ 0x/131072 [30.688] (II) LoadModule: "glx" [30.689] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so [30.691] (II) Module glx: vendor="X.Org Foundation" [30.691] compiled for 1.17.2, module version = 1.0.0 [30.691] ABI class: X.Org Server Extension, version 9.0 [30.691] (==) AIGLX enabled [30.691] (==) Matched intel as autoconfigured driver 0 [30.691] (==) Matched intel as autoconfigured driver 1 [30.691] (==) Matched modesetting as autoconfigured driver 2 [30.691] (=
Re: [Intel-gfx] Skylake graphics regression: projector failure with 4.8-rc3
On Fri, 2016-08-26 at 09:10 -0400, James Bottomley wrote: > We seem to have an xrandr regression with skylake now. What's > happening is that I can get output on to a projector, but the system > is losing video when I change the xrandr sessions (like going from a > --above b to a --same-as b). The main screen goes blank, which is > basically a reboot situation. Unfortunately, I can't seem to get the > logs out of systemd to see if there was a dump to dmesg (the system > was definitely responding). > > I fell back to 4.6.2 which worked perfectly, so this is definitely > some sort of regression. I'll be able to debug more fully when I get > back home from the Linux Security Summit. Sorry, ignore the logs: they're from a previous bug; as I said I'll try to gather logs when I get back to a more controlled environment over the weekend. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] Skylake graphics regression: projector failure with 4.8-rc3
We seem to have an xrandr regression with skylake now. What's happening is that I can get output on to a projector, but the system is losing video when I change the xrandr sessions (like going from a - -above b to a --same-as b). The main screen goes blank, which is basically a reboot situation. Unfortunately, I can't seem to get the logs out of systemd to see if there was a dump to dmesg (the system was definitely responding). I fell back to 4.6.2 which worked perfectly, so this is definitely some sort of regression. I'll be able to debug more fully when I get back home from the Linux Security Summit. James [13.504] X.Org X Server 1.17.2 Release Date: 2015-06-16 [13.504] X Protocol Version 11, Revision 0 [13.504] Build Operating System: openSUSE SUSE LINUX [13.504] Current Operating System: Linux jarvis 4.7.0-rc1+ #1 SMP PREEMPT Mon May 30 12:29:53 PDT 2016 x86_64 [13.504] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-test root=UUID=994363fd-c12c-4c6c-92e9-85e8ac695cf9 ro resume=/dev/nvme0n1p2 splash=silent quiet showopts pcie_aspm=force [13.504] Build Date: 07 March 2016 08:22:28AM [13.504] [13.504] Current version of pixman: 0.32.6 [13.504] Before reporting problems, check http://wiki.x.org to make sure that you have the latest version. [13.504] Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown. [13.504] (==) Log file: "/var/log/Xorg.0.log", Time: Mon May 30 12:40:53 2016 [13.504] (==) Using config directory: "/etc/X11/xorg.conf.d" [13.504] (==) Using system config directory "/usr/share/X11/xorg.conf.d" [13.504] (==) No Layout section. Using the first Screen section. [13.504] (==) No screen section available. Using defaults. [13.504] (**) |-->Screen "Default Screen Section" (0) [13.504] (**) | |-->Monitor "" [13.505] (==) No monitor specified for screen "Default Screen Section". Using a default monitor configuration. [13.505] (==) Automatically adding devices [13.505] (==) Automatically enabling devices [13.505] (==) Automatically adding GPU devices [13.509] (WW) The directory "/usr/share/fonts/misc/sgi" does not exist. [13.509] Entry deleted from font path. [13.509] (==) FontPath set to: /usr/share/fonts/misc:unscaled, /usr/share/fonts/Type1/, /usr/share/fonts/100dpi:unscaled, /usr/share/fonts/75dpi:unscaled, /usr/share/fonts/ghostscript/, /usr/share/fonts/cyrillic:unscaled, /usr/share/fonts/truetype/, built-ins [13.509] (==) ModulePath set to "/usr/lib64/xorg/modules" [13.509] (II) The server relies on udev to provide the list of input devices. If no devices become available, reconfigure udev or disable AutoAddDevices. [13.509] (II) Loader magic: 0x80dd00 [13.509] (II) Module ABI versions: [13.509] X.Org ANSI C Emulation: 0.4 [13.509] X.Org Video Driver: 19.0 [13.509] X.Org XInput driver : 21.0 [13.509] X.Org Server Extension : 9.0 [13.509] (++) using VT number 7 [13.509] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration [13.510] (II) xfree86: Adding drm device (/dev/dri/card0) [13.925] (--) PCI:*(0:0:2:0) 8086:1916:1028:0704 rev 7, Mem @ 0xdb00/16777216, 0x9000/268435456, I/O @ 0xf000/64, BIOS @ 0x/131072 [13.925] (II) LoadModule: "glx" [13.926] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so [13.930] (II) Module glx: vendor="X.Org Foundation" [13.930] compiled for 1.17.2, module version = 1.0.0 [13.931] ABI class: X.Org Server Extension, version 9.0 [13.931] (==) AIGLX enabled [13.931] (==) Matched intel as autoconfigured driver 0 [13.931] (==) Matched intel as autoconfigured driver 1 [13.931] (==) Matched modesetting as autoconfigured driver 2 [13.931] (==) Matched fbdev as autoconfigured driver 3 [13.931] (==) Matched vesa as autoconfigured driver 4 [13.931] (==) Assigned the driver to the xf86ConfigLayout [13.931] (II) LoadModule: "intel" [13.931] (II) Loading /usr/lib64/xorg/modules/drivers/intel_drv.so [13.932] (II) Module intel: vendor="X.Org Foundation" [13.932] compiled for 1.17.2, module version = 2.99.917 [13.932] Module class: X.Org Video Driver [13.932] ABI class: X.Org Video Driver, version 19.0 [13.932] (II) LoadModule: "modesetting" [13.932] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so [13.932] (II) Module modesetting: vendor="X.Org Foundation" [13.932] compiled for 1.17.2, module version = 1.17.2 [13.932] Module class: X.Org Video Driver [13.932] ABI class: X.Org Video Driver, version 19.0 [13.932] (II) LoadModule: "fbdev" [13.933] (II) Loading /usr/lib64/xorg/modules/drivers/fbdev_drv.so [13.933] (II) Module fbdev: vendor="X.Org Foundation" [
Re: [Intel-gfx] [PATCH] drm/i915: Ignore panel type from OpRegion on SKL
On Tue, 2016-07-12 at 15:00 +0300, ville.syrj...@linux.intel.com wrote: > From: Ville Syrjälä <ville.syrj...@linux.intel.com> > > Dell XPS 13 9350 apparently doesn't like it when we use the panel > type > from OpRegion. The OpRegion panel type (0) tells us to use use low > vswing for eDP, whereas the VBT panel type (2) tells us to use normal > vswing. The problem is that low vswing results in some display > flickers. > Since no one seems to know how this stuff is supposed to be handled, > let's just ignore the OpRegion panel type on SKL for now. > > Reported-by: James Bottomley <james.bottom...@hansenpartnership.com> > Cc: James Bottomley <james.bottom...@hansenpartnership.com> > Cc: drm-intel-fi...@lists.freedesktop.org > References: https://lists.freedesktop.org/archives/intel-gfx/2016-Jun > e/098826.html > Fixes: a05628195a0d ("drm/i915: Get panel_type from OpRegion panel > details") > Signed-off-by: Ville Syrjälä <ville.syrj...@linux.intel.com> This one doesn't actually compile on git head: drivers/gpu/drm/i915/intel_opregion.c: In function ‘intel_opregion_get_panel_type’: drivers/gpu/drm/i915/intel_opregion.c:1047:17: error: ‘dev_priv’ undeclared (first use in this function) if (IS_SKYLAKE(dev_priv)) So it's going to be hard to backport to stable. However, I've applied it on top of the drm branch and there have been no flicker problems after 12h of use, so you can add a tested by from me if you want. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Fri, 2016-07-08 at 13:19 +0300, Ville Syrjälä wrote: > On Thu, Jul 07, 2016 at 12:19:36PM -0700, James Bottomley wrote: > > On Thu, 2016-07-07 at 09:55 -0700, James Bottomley wrote: > > > On Thu, 2016-07-07 at 19:14 +0300, Ville Syrjälä wrote: > > > > On Tue, Jun 21, 2016 at 06:44:34PM +0300, Ville Syrjälä wrote: > > > > > On Tue, Jun 21, 2016 at 09:53:15AM -0400, James Bottomley > > > > > wrote: > > > > > > On Mon, 2016-06-20 at 11:03 +0300, Jani Nikula wrote: > > > > > > > Cc: Ville > > > > > > > > > > > > > > On Mon, 20 Jun 2016, James Bottomley < > > > > > > > james.bottom...@hansenpartnership.com> wrote: > > > > > > > > OK, my candidate bad commit is this one: > > > > > > > > > > > > > > > > commit a05628195a0d9f3173dd9aa76f482aef692e46ee > > > > > > > > Author: Ville Syrjälä <ville.syrj...@linux.intel.com> > > > > > > > > Date: Mon Apr 11 10:23:51 2016 +0300 > > > > > > > > > > > > > > > > drm/i915: Get panel_type from OpRegion panel > > > > > > > > details > > > > > > > > > > > > > > > > After being more careful about waiting to identify > > > > > > > > flicker, > > > > > > > > this one seems to be the one the bisect finds. I'm now > > > > > > > > running v4.7-rc3 with this one reverted and am > > > > > > > > currently > > > > > > > > seeing no flicker problems. It is, however, early > > > > > > > > days > > > > > > > > because the flicker can hide for long periods, so I 'll > > > > > > > > wait > > > > > > > > until Monday evening and a few reboots before declaring > > > > > > > > victory. > > > > > > > > > > > > > > If that turns out to be the bad commit, it doesn't really > > > > > > > surprise me, and that in itself is depressing. > > > > > > > > > > > > As far as I can tell, after running for a day with this > > > > > > reverted, > > > > > > this is the problem. The flicker hasn't appeared with it > > > > > > reverted. It's pretty noticeable with this commit > > > > > > included. > > > > > > > > > > Hmm. The only difference I can see is low vs. normal vswing. > > > > > Panel > > > > > 0 has low, panel 2 has normal. So either the VBT or opregion > > > > > is > > > > > telling utter lies, or there's some other bug in our low > > > > > vswing > > > > > support. > > > > > > > > I did a quick once over of out DDI vswing stuff and didn't find > > > > anything too serious. There were some buglets in the iboost > > > > handling, > > > > but I'm not very hopeful that fixing those would help with your > > > > machine. > > > > > > > > Here's a branch anyway in case you want to give it a go: > > > > git://github.com/vsyrjala/linux.git ddi_iboost_fixes > > > > > > > > Actually, I think the only patch in there that might make a > > > > difference is 15d887855180 ("drm/i915: Fix iboost setting for > > > > DDI > > > > with 4 lanes on SKL") > > > > > > Running with it now (the entire branch). So far it looks OK, but > > > I'll give it a couple of days to see if anything manifests before > > > declaring victory. > > > > Bad news, I'm afraid: after a couple of hours of run time, there is > > now > > noticeable flicker on the display, so although the iboost fixes may > > have lessened it, it's still present. > > Oh well, I suspected as much. Which BIOS version did you have > exactly? As mentioned upthread, now 1.4.3 since skylake microcode was potentially implicated in the problem: http://mid.gmane.org/1466179729.2271.33.ca...@hansenpartnership.com > If I'm reading the Dell website correctly there's a new one (1.4.4) > released on Jun 30, which is after you reported the issue. Might be > I'm reading the wrong thing though. Can you double check this? New bios doesn't change the microcode: [2.390172] microcode: CPU0 sig=0x406e3, pf=0x80, revision=0x8a And the flicker is still present. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Thu, 2016-07-07 at 09:55 -0700, James Bottomley wrote: > On Thu, 2016-07-07 at 19:14 +0300, Ville Syrjälä wrote: > > On Tue, Jun 21, 2016 at 06:44:34PM +0300, Ville Syrjälä wrote: > > > On Tue, Jun 21, 2016 at 09:53:15AM -0400, James Bottomley wrote: > > > > On Mon, 2016-06-20 at 11:03 +0300, Jani Nikula wrote: > > > > > Cc: Ville > > > > > > > > > > On Mon, 20 Jun 2016, James Bottomley < > > > > > james.bottom...@hansenpartnership.com> wrote: > > > > > > OK, my candidate bad commit is this one: > > > > > > > > > > > > commit a05628195a0d9f3173dd9aa76f482aef692e46ee > > > > > > Author: Ville Syrjälä <ville.syrj...@linux.intel.com> > > > > > > Date: Mon Apr 11 10:23:51 2016 +0300 > > > > > > > > > > > > drm/i915: Get panel_type from OpRegion panel details > > > > > > > > > > > > After being more careful about waiting to identify flicker, > > > > > > this one seems to be the one the bisect finds. I'm now > > > > > > running v4.7-rc3 with this one reverted and am currently > > > > > > seeing no flicker problems. It is, however, early days > > > > > > because the flicker can hide for long periods, so I 'll > > > > > > wait > > > > > > until Monday evening and a few reboots before declaring > > > > > > victory. > > > > > > > > > > If that turns out to be the bad commit, it doesn't really > > > > > surprise me, and that in itself is depressing. > > > > > > > > As far as I can tell, after running for a day with this > > > > reverted, > > > > this is the problem. The flicker hasn't appeared with it > > > > reverted. It's pretty noticeable with this commit included. > > > > > > Hmm. The only difference I can see is low vs. normal vswing. > > > Panel > > > 0 has low, panel 2 has normal. So either the VBT or opregion is > > > telling utter lies, or there's some other bug in our low vswing > > > support. > > > > I did a quick once over of out DDI vswing stuff and didn't find > > anything too serious. There were some buglets in the iboost > > handling, > > but I'm not very hopeful that fixing those would help with your > > machine. > > > > Here's a branch anyway in case you want to give it a go: > > git://github.com/vsyrjala/linux.git ddi_iboost_fixes > > > > Actually, I think the only patch in there that might make a > > difference is 15d887855180 ("drm/i915: Fix iboost setting for DDI > > with 4 lanes on SKL") > > Running with it now (the entire branch). So far it looks OK, but > I'll give it a couple of days to see if anything manifests before > declaring victory. Bad news, I'm afraid: after a couple of hours of run time, there is now noticeable flicker on the display, so although the iboost fixes may have lessened it, it's still present. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Thu, 2016-07-07 at 19:14 +0300, Ville Syrjälä wrote: > On Tue, Jun 21, 2016 at 06:44:34PM +0300, Ville Syrjälä wrote: > > On Tue, Jun 21, 2016 at 09:53:15AM -0400, James Bottomley wrote: > > > On Mon, 2016-06-20 at 11:03 +0300, Jani Nikula wrote: > > > > Cc: Ville > > > > > > > > On Mon, 20 Jun 2016, James Bottomley < > > > > james.bottom...@hansenpartnership.com> wrote: > > > > > OK, my candidate bad commit is this one: > > > > > > > > > > commit a05628195a0d9f3173dd9aa76f482aef692e46ee > > > > > Author: Ville Syrjälä <ville.syrj...@linux.intel.com> > > > > > Date: Mon Apr 11 10:23:51 2016 +0300 > > > > > > > > > > drm/i915: Get panel_type from OpRegion panel details > > > > > > > > > > After being more careful about waiting to identify flicker, > > > > > this one seems to be the one the bisect finds. I'm now > > > > > running v4.7-rc3 with this one reverted and am currently > > > > > seeing no flicker problems. It is, however, early days > > > > > because the flicker can hide for long periods, so I 'll wait > > > > > until Monday evening and a few reboots before declaring > > > > > victory. > > > > > > > > If that turns out to be the bad commit, it doesn't really > > > > surprise me, and that in itself is depressing. > > > > > > As far as I can tell, after running for a day with this reverted, > > > this is the problem. The flicker hasn't appeared with it > > > reverted. It's pretty noticeable with this commit included. > > > > Hmm. The only difference I can see is low vs. normal vswing. Panel > > 0 has low, panel 2 has normal. So either the VBT or opregion is > > telling utter lies, or there's some other bug in our low vswing > > support. > > I did a quick once over of out DDI vswing stuff and didn't find > anything too serious. There were some buglets in the iboost handling, > but I'm not very hopeful that fixing those would help with your > machine. > > Here's a branch anyway in case you want to give it a go: > git://github.com/vsyrjala/linux.git ddi_iboost_fixes > > Actually, I think the only patch in there that might make a > difference is 15d887855180 ("drm/i915: Fix iboost setting for DDI > with 4 lanes on SKL") Running with it now (the entire branch). So far it looks OK, but I'll give it a couple of days to see if anything manifests before declaring victory. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Tue, 2016-06-21 at 17:00 -0400, James Bottomley wrote: > On Tue, 2016-06-21 at 18:44 +0300, Ville Syrjälä wrote: > > On Tue, Jun 21, 2016 at 09:53:15AM -0400, James Bottomley wrote: > > > On Mon, 2016-06-20 at 11:03 +0300, Jani Nikula wrote: > > > > Cc: Ville > > > > > > > > On Mon, 20 Jun 2016, James Bottomley < > > > > james.bottom...@hansenpartnership.com> wrote: > > > > > OK, my candidate bad commit is this one: > > > > > > > > > > commit a05628195a0d9f3173dd9aa76f482aef692e46ee > > > > > Author: Ville Syrjälä <ville.syrj...@linux.intel.com> > > > > > Date: Mon Apr 11 10:23:51 2016 +0300 > > > > > > > > > > drm/i915: Get panel_type from OpRegion panel details > > > > > > > > > > After being more careful about waiting to identify flicker, > > > > > this one seems to be the one the bisect finds. I'm now > > > > > running v4.7-rc3 with this one reverted and am currently > > > > > seeing no flicker problems. It is, however, early days > > > > > because the flicker can hide for long periods, so I 'll wait > > > > > until Monday evening and a few reboots before declaring > > > > > victory. > > > > > > > > If that turns out to be the bad commit, it doesn't really > > > > surprise me, and that in itself is depressing. > > > > > > As far as I can tell, after running for a day with this reverted, > > > this is the problem. The flicker hasn't appeared with it > > > reverted. It's pretty noticeable with this commit included. > > > > Hmm. The only difference I can see is low vs. normal vswing. Panel > > 0 has low, panel 2 has normal. So either the VBT or opregion is > > telling utter lies, or there's some other bug in our low vswing > > support. > > > > To confirm it's really a vswing issue, you should be able to run > > with i915.edp_vswing=2 without flickers on the broken kernel. > > Preliminary boot indicates no flicker with the bad commit included > and this option, but I'll have to run for quite a bit longer to > verify, since it can sometimes be elusive. Two days of runtime seems to confirm this is the problem (still no flicker issues). James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Tue, 2016-06-21 at 18:44 +0300, Ville Syrjälä wrote: > On Tue, Jun 21, 2016 at 09:53:15AM -0400, James Bottomley wrote: > > On Mon, 2016-06-20 at 11:03 +0300, Jani Nikula wrote: > > > Cc: Ville > > > > > > On Mon, 20 Jun 2016, James Bottomley < > > > james.bottom...@hansenpartnership.com> wrote: > > > > OK, my candidate bad commit is this one: > > > > > > > > commit a05628195a0d9f3173dd9aa76f482aef692e46ee > > > > Author: Ville Syrjälä <ville.syrj...@linux.intel.com> > > > > Date: Mon Apr 11 10:23:51 2016 +0300 > > > > > > > > drm/i915: Get panel_type from OpRegion panel details > > > > > > > > After being more careful about waiting to identify flicker, > > > > this one seems to be the one the bisect finds. I'm now running > > > > v4.7-rc3 with this one reverted and am currently seeing no > > > > flicker problems. It is, however, early days because the > > > > flicker can hide for long periods, so I 'll wait until Monday > > > > evening and a few reboots before declaring victory. > > > > > > If that turns out to be the bad commit, it doesn't really > > > surprise me, and that in itself is depressing. > > > > As far as I can tell, after running for a day with this reverted, > > this is the problem. The flicker hasn't appeared with it reverted. > > It's pretty noticeable with this commit included. > > Hmm. The only difference I can see is low vs. normal vswing. Panel 0 > has low, panel 2 has normal. So either the VBT or opregion is telling > utter lies, or there's some other bug in our low vswing support. > > To confirm it's really a vswing issue, you should be able to run with > i915.edp_vswing=2 without flickers on the broken kernel. Preliminary boot indicates no flicker with the bad commit included and this option, but I'll have to run for quite a bit longer to verify, since it can sometimes be elusive. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Fri, 2016-06-17 at 16:06 -0700, James Bottomley wrote: > On Fri, 2016-06-17 at 16:34 +0300, Jani Nikula wrote: > > On Fri, 17 Jun 2016, Daniel Vetter <dan...@ffwll.ch> wrote: > > > On Thu, Jun 16, 2016 at 03:42:12PM -0700, James Bottomley wrote: > > > > On Thu, 2016-06-16 at 14:29 -0700, James Bottomley wrote: > > > > > On Thu, 2016-06-16 at 23:24 +0200, Daniel Vetter wrote: > > > > > > I guess we'll need the bisect on this one to make progress. > > > > > > > > > > Sigh, I was afraid that might be the next step. > > > > > > > > OK, I have a curious data point. I assumed the problem would > > > > be > > > > somewhere in the drm update, so I started bisecting that at the > > > > top. > > > > However, the top most commit: > > > > > > > > commit 1d6da87a3241deb13d073c4125d19ed0e5a0c62c > > > > Merge: 1f40c49 a39ed68 > > > > Author: Linus Torvalds <torva...@linux-foundation.org> > > > > Date: Mon May 23 11:48:48 2016 -0700 > > > > > > > > Merge branch 'drm-next' of > > > > git://people.freedesktop.org/~airlied/linux > > > > > > > > Isn't actually bad. There's no flicker here, so whatever > > > > caused > > > > the > > > > problem came from some update after this. > > > > > > There was a fixes pull after this. Might be worth it to restrict > > > to > > > just > > > the i915 changes, which are just > > > 5b4fd5bb1230cd037..157d2c7fad0863222 > > > > > > Looking at those nothing seems to stick out which might explain > > > what's > > > happening for you. > > OK, so just on the firmware, the system seems less flickery with the > new 1.4.3 UEFI, so I'm starting to think it is a Skylake errata > issue. The flicker isn't gone for good, but seems to be reboot > dependent (it's there in some boots, but gone on a reboot). > > > This should be easy enough to try before bisecting: > > http://patchwork.freedesktop.org/patch/msgid/1466162081-12042-1-git > > -s > > end-email-mika.kah...@intel.com > > Applying this didn't seem to make a difference: still there on some > and gone on other reboots. OK, my candidate bad commit is this one: commit a05628195a0d9f3173dd9aa76f482aef692e46ee Author: Ville Syrjälä <ville.syrj...@linux.intel.com> Date: Mon Apr 11 10:23:51 2016 +0300 drm/i915: Get panel_type from OpRegion panel details After being more careful about waiting to identify flicker, this one seems to be the one the bisect finds. I'm now running v4.7-rc3 with this one reverted and am currently seeing no flicker problems. It is, however, early days because the flicker can hide for long periods, so I 'll wait until Monday evening and a few reboots before declaring victory. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Fri, 2016-06-17 at 16:34 +0300, Jani Nikula wrote: > On Fri, 17 Jun 2016, Daniel Vetter <dan...@ffwll.ch> wrote: > > On Thu, Jun 16, 2016 at 03:42:12PM -0700, James Bottomley wrote: > > > On Thu, 2016-06-16 at 14:29 -0700, James Bottomley wrote: > > > > On Thu, 2016-06-16 at 23:24 +0200, Daniel Vetter wrote: > > > > > I guess we'll need the bisect on this one to make progress. > > > > > > > > Sigh, I was afraid that might be the next step. > > > > > > OK, I have a curious data point. I assumed the problem would be > > > somewhere in the drm update, so I started bisecting that at the > > > top. > > > However, the top most commit: > > > > > > commit 1d6da87a3241deb13d073c4125d19ed0e5a0c62c > > > Merge: 1f40c49 a39ed68 > > > Author: Linus Torvalds <torva...@linux-foundation.org> > > > Date: Mon May 23 11:48:48 2016 -0700 > > > > > > Merge branch 'drm-next' of > > > git://people.freedesktop.org/~airlied/linux > > > > > > Isn't actually bad. There's no flicker here, so whatever caused > > > the > > > problem came from some update after this. > > > > There was a fixes pull after this. Might be worth it to restrict to > > just > > the i915 changes, which are just > > 5b4fd5bb1230cd037..157d2c7fad0863222 > > > > Looking at those nothing seems to stick out which might explain > > what's > > happening for you. OK, so just on the firmware, the system seems less flickery with the new 1.4.3 UEFI, so I'm starting to think it is a Skylake errata issue. The flicker isn't gone for good, but seems to be reboot dependent (it's there in some boots, but gone on a reboot). > This should be easy enough to try before bisecting: > http://patchwork.freedesktop.org/patch/msgid/1466162081-12042-1-git-s > end-email-mika.kah...@intel.com Applying this didn't seem to make a difference: still there on some and gone on other reboots. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Fri, 2016-06-17 at 16:34 +0300, Jani Nikula wrote: > On Fri, 17 Jun 2016, Daniel Vetter <dan...@ffwll.ch> wrote: > > On Thu, Jun 16, 2016 at 03:42:12PM -0700, James Bottomley wrote: > > > On Thu, 2016-06-16 at 14:29 -0700, James Bottomley wrote: > > > > On Thu, 2016-06-16 at 23:24 +0200, Daniel Vetter wrote: > > > > > I guess we'll need the bisect on this one to make progress. > > > > > > > > Sigh, I was afraid that might be the next step. > > > > > > OK, I have a curious data point. I assumed the problem would be > > > somewhere in the drm update, so I started bisecting that at the > > > top. > > > However, the top most commit: > > > > > > commit 1d6da87a3241deb13d073c4125d19ed0e5a0c62c > > > Merge: 1f40c49 a39ed68 > > > Author: Linus Torvalds <torva...@linux-foundation.org> > > > Date: Mon May 23 11:48:48 2016 -0700 > > > > > > Merge branch 'drm-next' of > > > git://people.freedesktop.org/~airlied/linux > > > > > > Isn't actually bad. There's no flicker here, so whatever caused > > > the > > > problem came from some update after this. > > > > There was a fixes pull after this. Might be worth it to restrict to > > just > > the i915 changes, which are just > > 5b4fd5bb1230cd037..157d2c7fad0863222 > > > > Looking at those nothing seems to stick out which might explain > > what's > > happening for you. > > This should be easy enough to try before bisecting: > http://patchwork.freedesktop.org/patch/msgid/1466162081-12042-1-git-s > end-email-mika.kah...@intel.com I'll try that. You should know this platform has a firmware bug in the digital signals across the display port ... some problem with the mux in the thunderbolt. Unfortunately the microcode update for this (the thunderbolt firmware) can't be applied from Linux or even the usual FreeDOS, so I'm a bit stuck. It's also been pointed out to me off list that there's a lot of Skylake errata that have display flicker due to non-graphics issues: http://www.intel.com/content/www/us/en/processors/core/desktop-6th-gen-core-family-spec-update.html My system is a Dell XPS 9350 and I have updated it with the just released firmware: http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=VNT5J Which causes my cpu microcode version to go from > [2.314472] microcode: CPU0 sig=0x406e3, pf=0x80, revision=0x82 to > [2.264669] microcode: CPU0 sig=0x406e3, pf=0x80, revision=0x8a But thanks to the completely opaque microcode update process via the OEMs it's not clear whether this is sufficient to fix the errata that are listed as causing flicker (the reported revision on current skylake lenovos seems to be 0x94) James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Thu, 2016-06-16 at 14:29 -0700, James Bottomley wrote: > On Thu, 2016-06-16 at 23:24 +0200, Daniel Vetter wrote: > > I guess we'll need the bisect on this one to make progress. > > Sigh, I was afraid that might be the next step. OK, I have a curious data point. I assumed the problem would be somewhere in the drm update, so I started bisecting that at the top. However, the top most commit: commit 1d6da87a3241deb13d073c4125d19ed0e5a0c62c Merge: 1f40c49 a39ed68 Author: Linus Torvalds <torva...@linux-foundation.org> Date: Mon May 23 11:48:48 2016 -0700 Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Isn't actually bad. There's no flicker here, so whatever caused the problem came from some update after this. James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Thu, 2016-06-16 at 23:24 +0200, Daniel Vetter wrote: > On Thu, Jun 16, 2016 at 11:15 PM, James Bottomley > <james.bottom...@hansenpartnership.com> wrote: > > On Mon, 2016-06-13 at 13:14 +0300, Jani Nikula wrote: > > > On Tue, 31 May 2016, James Bottomley < > > > james.bottom...@hansenpartnership.com> wrote: > > > > On Tue, 2016-05-31 at 10:51 +0300, Jani Nikula wrote: > > > > > On Mon, 30 May 2016, James Bottomley < > > > > > james.bottom...@hansenpartnership.com> wrote: > > > > > > I've tested a pristine 4.6.0 system, so it's definitely > > > > > > something > > > > > > that > > > > > > went in during the merge window. The flicker isn't > > > > > > continuous, > > > > > > it's > > > > > > periodic, with an interval of something like 2-5 seconds. > > > > > > It > > > > > > looks > > > > > > like an old analogue TV going out of sync and then > > > > > > resyncing. > > > > > > I've > > > > > > attached the dmesg and X.org log below just in case they > > > > > > can > > > > > > help. > > > > > > I > > > > > > might be able to bisect this next week, but, unfortunately, > > > > > > this is > > > > > > my > > > > > > current laptop and I'm travelling this week. > > > > > > > > > > Please try i915.enable_psr=0 module parameter. > > > > > > > > Makes no discernable difference. Current parameter settings > > > > are: > > > > > > Sorry for the silence. Would you mind trying out drm-intel > > > -nightly > > > branch of [1]? > > > > > > BR, > > > Jani. > > > > > > [1] http://cgit.freedesktop.org/drm-intel > > > > No, flicker is still there (and in fact seems worse) with the tree > > with > > this commit at the top: > > > > commit 3eb202ecc3668583f9ff4338211dbab47d755d1c > > Author: Daniel Vetter <daniel.vet...@ffwll.ch> > > Date: Thu Jun 16 14:38:54 2016 +0200 > > > > drm-intel-nightly: 2016y-06m-16d-12h-38m-37s UTC integration > > manifest > > Strange indeed, I hoped the improved watermark code in -nightly would > help. I assume nothing in dmesg about underruns or something similar? Not that I can tell. I've attached the full dmesg just in case. > I guess we'll need the bisect on this one to make progress. Sigh, I was afraid that might be the next step. James [0.00] Linux version 4.7.0-rc3+ (jejb@jarvis) (gcc version 4.8.5 (SUSE Linux) ) #4 SMP PREEMPT Thu Jun 16 13:57:07 PDT 2016 [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-test root=UUID=994363fd-c12c-4c6c-92e9-85e8ac695cf9 ro resume=/dev/nvme0n1p2 splash=silent quiet showopts pcie_aspm=force [0.00] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [0.00] x86/fpu: xstate_offset[3]: 960, xstate_sizes[3]: 64 [0.00] x86/fpu: xstate_offset[4]: 1024, xstate_sizes[4]: 64 [0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers' [0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR' [0.00] x86/fpu: Enabled xstate features 0x1f, context size is 1088 bytes, using 'standard' format. [0.00] x86/fpu: Using 'eager' FPU context switches. [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0fff] reserved [0.00] BIOS-e820: [mem 0x1000-0x00057fff] usable [0.00] BIOS-e820: [mem 0x00058000-0x00058fff] reserved [0.00] BIOS-e820: [mem 0x00059000-0x0009cfff] usable [0.00] BIOS-e820: [mem 0x0009d000-0x0009] reserved [0.00] BIOS-e820: [mem 0x0010-0x69537fff] usable [0.00] BIOS-e820: [mem 0x69538000-0x69538fff] ACPI NVS [0.00] BIOS-e820: [mem 0x69539000-0x69562fff] reserved [0.00] BIOS-e820: [mem 0x69563000-0x695bdfff] usable [0.00] BIOS-e820: [mem 0x695be000-0x69daefff] reserved [0.00] BIOS-e820: [mem 0x69daf000-0x7907dfff] usable [0.00] BIOS-e820: [mem 0x7907e000-0x793f8fff] reserved [0.00] BIOS-e820: [mem 0x7
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Mon, 2016-06-13 at 13:14 +0300, Jani Nikula wrote: > On Tue, 31 May 2016, James Bottomley < > james.bottom...@hansenpartnership.com> wrote: > > On Tue, 2016-05-31 at 10:51 +0300, Jani Nikula wrote: > > > On Mon, 30 May 2016, James Bottomley < > > > james.bottom...@hansenpartnership.com> wrote: > > > > I've tested a pristine 4.6.0 system, so it's definitely > > > > something > > > > that > > > > went in during the merge window. The flicker isn't continuous, > > > > it's > > > > periodic, with an interval of something like 2-5 seconds. It > > > > looks > > > > like an old analogue TV going out of sync and then resyncing. > > > > I've > > > > attached the dmesg and X.org log below just in case they can > > > > help. > > > > I > > > > might be able to bisect this next week, but, unfortunately, > > > > this is > > > > my > > > > current laptop and I'm travelling this week. > > > > > > Please try i915.enable_psr=0 module parameter. > > > > Makes no discernable difference. Current parameter settings are: > > Sorry for the silence. Would you mind trying out drm-intel-nightly > branch of [1]? > > BR, > Jani. > > [1] http://cgit.freedesktop.org/drm-intel No, flicker is still there (and in fact seems worse) with the tree with this commit at the top: commit 3eb202ecc3668583f9ff4338211dbab47d755d1c Author: Daniel Vetter <daniel.vet...@ffwll.ch> Date: Thu Jun 16 14:38:54 2016 +0200 drm-intel-nightly: 2016y-06m-16d-12h-38m-37s UTC integration manifest James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
On Tue, 2016-05-31 at 10:51 +0300, Jani Nikula wrote: > On Mon, 30 May 2016, James Bottomley < > james.bottom...@hansenpartnership.com> wrote: > > I've tested a pristine 4.6.0 system, so it's definitely something > > that > > went in during the merge window. The flicker isn't continuous, > > it's > > periodic, with an interval of something like 2-5 seconds. It looks > > like an old analogue TV going out of sync and then resyncing. I've > > attached the dmesg and X.org log below just in case they can help. > > I > > might be able to bisect this next week, but, unfortunately, this is > > my > > current laptop and I'm travelling this week. > > Please try i915.enable_psr=0 module parameter. Makes no discernable difference. Current parameter settings are: disable_display: N disable_power_well: 1 edp_vswing: 0 enable_cmd_parser: 1 enable_dc: -1 enable_dp_mst: Y enable_execlists: 1 enable_fbc: -1 enable_guc_submission: N enable_hangcheck: Y enable_ips: 1 enable_ppgtt: 3 enable_psr: 0 enable_rc6: 1 fastboot: N guc_log_level: -1 inject_load_failure: 0 invert_brightness: 0 load_detect_test: N lvds_channel_mode: 0 lvds_use_ssc: -1 mmio_debug: 0 modeset: -1 nuclear_pageflip: N panel_ignore_lid: 1 prefault_disable: N preliminary_hw_support: 1 reset: Y semaphores: -1 use_mmio_flip: 0 vbt_sdvo_panel_type: -1 verbose_state_checks: Y James ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window
I've tested a pristine 4.6.0 system, so it's definitely something that went in during the merge window. The flicker isn't continuous, it's periodic, with an interval of something like 2-5 seconds. It looks like an old analogue TV going out of sync and then resyncing. I've attached the dmesg and X.org log below just in case they can help. I might be able to bisect this next week, but, unfortunately, this is my current laptop and I'm travelling this week. James [13.504] X.Org X Server 1.17.2 Release Date: 2015-06-16 [13.504] X Protocol Version 11, Revision 0 [13.504] Build Operating System: openSUSE SUSE LINUX [13.504] Current Operating System: Linux jarvis 4.7.0-rc1+ #1 SMP PREEMPT Mon May 30 12:29:53 PDT 2016 x86_64 [13.504] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-test root=UUID=994363fd-c12c-4c6c-92e9-85e8ac695cf9 ro resume=/dev/nvme0n1p2 splash=silent quiet showopts pcie_aspm=force [13.504] Build Date: 07 March 2016 08:22:28AM [13.504] [13.504] Current version of pixman: 0.32.6 [13.504] Before reporting problems, check http://wiki.x.org to make sure that you have the latest version. [13.504] Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown. [13.504] (==) Log file: "/var/log/Xorg.0.log", Time: Mon May 30 12:40:53 2016 [13.504] (==) Using config directory: "/etc/X11/xorg.conf.d" [13.504] (==) Using system config directory "/usr/share/X11/xorg.conf.d" [13.504] (==) No Layout section. Using the first Screen section. [13.504] (==) No screen section available. Using defaults. [13.504] (**) |-->Screen "Default Screen Section" (0) [13.504] (**) | |-->Monitor "" [13.505] (==) No monitor specified for screen "Default Screen Section". Using a default monitor configuration. [13.505] (==) Automatically adding devices [13.505] (==) Automatically enabling devices [13.505] (==) Automatically adding GPU devices [13.509] (WW) The directory "/usr/share/fonts/misc/sgi" does not exist. [13.509] Entry deleted from font path. [13.509] (==) FontPath set to: /usr/share/fonts/misc:unscaled, /usr/share/fonts/Type1/, /usr/share/fonts/100dpi:unscaled, /usr/share/fonts/75dpi:unscaled, /usr/share/fonts/ghostscript/, /usr/share/fonts/cyrillic:unscaled, /usr/share/fonts/truetype/, built-ins [13.509] (==) ModulePath set to "/usr/lib64/xorg/modules" [13.509] (II) The server relies on udev to provide the list of input devices. If no devices become available, reconfigure udev or disable AutoAddDevices. [13.509] (II) Loader magic: 0x80dd00 [13.509] (II) Module ABI versions: [13.509] X.Org ANSI C Emulation: 0.4 [13.509] X.Org Video Driver: 19.0 [13.509] X.Org XInput driver : 21.0 [13.509] X.Org Server Extension : 9.0 [13.509] (++) using VT number 7 [13.509] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration [13.510] (II) xfree86: Adding drm device (/dev/dri/card0) [13.925] (--) PCI:*(0:0:2:0) 8086:1916:1028:0704 rev 7, Mem @ 0xdb00/16777216, 0x9000/268435456, I/O @ 0xf000/64, BIOS @ 0x/131072 [13.925] (II) LoadModule: "glx" [13.926] (II) Loading /usr/lib64/xorg/modules/extensions/libglx.so [13.930] (II) Module glx: vendor="X.Org Foundation" [13.930] compiled for 1.17.2, module version = 1.0.0 [13.931] ABI class: X.Org Server Extension, version 9.0 [13.931] (==) AIGLX enabled [13.931] (==) Matched intel as autoconfigured driver 0 [13.931] (==) Matched intel as autoconfigured driver 1 [13.931] (==) Matched modesetting as autoconfigured driver 2 [13.931] (==) Matched fbdev as autoconfigured driver 3 [13.931] (==) Matched vesa as autoconfigured driver 4 [13.931] (==) Assigned the driver to the xf86ConfigLayout [13.931] (II) LoadModule: "intel" [13.931] (II) Loading /usr/lib64/xorg/modules/drivers/intel_drv.so [13.932] (II) Module intel: vendor="X.Org Foundation" [13.932] compiled for 1.17.2, module version = 2.99.917 [13.932] Module class: X.Org Video Driver [13.932] ABI class: X.Org Video Driver, version 19.0 [13.932] (II) LoadModule: "modesetting" [13.932] (II) Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so [13.932] (II) Module modesetting: vendor="X.Org Foundation" [13.932] compiled for 1.17.2, module version = 1.17.2 [13.932] Module class: X.Org Video Driver [13.932] ABI class: X.Org Video Driver, version 19.0 [13.932] (II) LoadModule: "fbdev" [13.933] (II) Loading /usr/lib64/xorg/modules/drivers/fbdev_drv.so [13.933] (II) Module fbdev: vendor="X.Org Foundation" [13.933] compiled for 1.17.2, module version = 0.4.4 [13.933] Module class: X.Org Video Driver [13.933] ABI class: X.Org Video Driver, version
[Intel-gfx] i915 Sky Lake graphics problem: screen blank after grub boot
With kernel 4.3.x (tested x=2,3) the screen goes blank as soon as the system boots (both for VGA console and graphics). The only way to get it to turn on is to suspend and resume the system. The system is a Dell XPS 13 (latest rev) running OpenSuse Leap 42.1 and this is the lspci output: 00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 08) 00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 07) 00:04.0 Signal processing controller: Intel Corporation Device 1903 (rev 08) 00:14.0 USB controller: Intel Corporation Device 9d2f (rev 21) 00:14.2 Signal processing controller: Intel Corporation Device 9d31 (rev 21) 00:15.0 Signal processing controller: Intel Corporation Device 9d60 (rev 21) 00:15.1 Signal processing controller: Intel Corporation Device 9d61 (rev 21) 00:16.0 Communication controller: Intel Corporation Device 9d3a (rev 21) 00:1c.0 PCI bridge: Intel Corporation Device 9d10 (rev f1) 00:1c.4 PCI bridge: Intel Corporation Device 9d14 (rev f1) 00:1c.5 PCI bridge: Intel Corporation Device 9d15 (rev f1) 00:1d.0 PCI bridge: Intel Corporation Device 9d18 (rev f1) 00:1f.0 ISA bridge: Intel Corporation Device 9d48 (rev 21) 00:1f.2 Memory controller: Intel Corporation Device 9d21 (rev 21) 00:1f.3 Audio device: Intel Corporation Device 9d70 (rev 21) 00:1f.4 SMBus: Intel Corporation Device 9d23 (rev 21) 3a:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59) 3b:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 525a (rev 01) 3c:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a802 (rev 01) It looks like it might be something to do with this warning in the boot dmesgs (full set below): [2.951393] WARNING: CPU: 3 PID: 6 at ../drivers/gpu/drm/i915/intel_dp.c:854 intel_dp_aux_ch+0x105/0x660 [i915]() [2.951394] dp_aux_ch not started status 0xad40001f James --- [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 4.3.2-1.g2aebb11-default (geeko@buildhost) (gcc version 5.2.1 20151008 [gcc-5-branch revision 228597] (SUSE Linux) ) #1 SMP PREEMPT Fri Dec 11 08:22:04 UTC 2015 (2aebb11) [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.3.2-1.g2aebb11-default root=UUID=994363fd-c12c-4c6c-92e9-85e8ac695cf9 ro resume=/dev/nvme0n1p2 splash=silent quiet showopts [0.00] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100 [0.00] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040 [0.00] x86/fpu: xstate_offset[4]: 0400, xstate_sizes[4]: 0040 [0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers' [0.00] x86/fpu: Supporting XSAVE feature 0x08: 'MPX bounds registers' [0.00] x86/fpu: Supporting XSAVE feature 0x10: 'MPX CSR' [0.00] x86/fpu: Enabled xstate features 0x1f, context size is 0x440 bytes, using 'standard' format. [0.00] x86/fpu: Using 'eager' FPU context switches. [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x00057fff] usable [0.00] BIOS-e820: [mem 0x00058000-0x00058fff] reserved [0.00] BIOS-e820: [mem 0x00059000-0x0009cfff] usable [0.00] BIOS-e820: [mem 0x0009d000-0x0009] reserved [0.00] BIOS-e820: [mem 0x0010-0x27265fff] usable [0.00] BIOS-e820: [mem 0x27266000-0x27277fff] reserved [0.00] BIOS-e820: [mem 0x27278000-0x27318fff] usable [0.00] BIOS-e820: [mem 0x27319000-0x27319fff] ACPI NVS [0.00] BIOS-e820: [mem 0x2731a000-0x27343fff] reserved [0.00] BIOS-e820: [mem 0x27344000-0x2739cfff] usable [0.00] BIOS-e820: [mem 0x2739d000-0x27b8dfff] reserved [0.00] BIOS-e820: [mem 0x27b8e000-0x36e5cfff] usable [0.00] BIOS-e820: [mem 0x36e5d000-0x371d6fff] reserved [0.00] BIOS-e820: [mem 0x371d7000-0x37226fff] ACPI data [0.00] BIOS-e820: [mem 0x37227000-0x379defff] ACPI NVS [0.00] BIOS-e820: [mem 0x379df000-0x37ffefff] reserved [0.00] BIOS-e820: [mem 0x37fff000-0x37ff] usable [0.00] BIOS-e820: [mem 0x3800-0x380f] reserved [0.00] BIOS-e820: [mem 0xe000-0xefff] reserved [0.00] BIOS-e820: [mem 0xfe00-0xfe010fff] reserved [0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved [0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved [0.00]