Re: [PATCH v2] epoll: be better about file lifetimes

2024-05-05 Thread Jens Axboe
On 5/5/24 11:55 AM, Linus Torvalds wrote:
> epoll can call out to vfs_poll() with a file pointer that may race with
> the last 'fput()'. That would make f_count go down to zero, and while
> the ep->mtx locking means that the resulting file pointer tear-down will
> be blocked until the poll returns, it means that f_count is already
> dead, and any use of it won't actually get a reference to the file any
> more: it's dead regardless.
> 
> Make sure we have a valid ref on the file pointer before we call down to
> vfs_poll() from the epoll routines.
> 
> Link: https://lore.kernel.org/lkml/2d631f0615918...@google.com/
> Reported-by: syzbot+045b454ab35fd82a3...@syzkaller.appspotmail.com
> Reviewed-by: Jens Axboe 
> Signed-off-by: Linus Torvalds 
> ---
> 
> Changes since v1:
> 
>  - add Link, Reported-by, and Jens' reviewed-by. And sign off on it
>because it looks fine to me and we have some testing now.
> 
>  - move epi_fget() closer to the user
> 
>  - more comments about the background
> 
>  - remove the rcu_read_lock(), with the comment explaining why it's not
>needed
> 
>  - note about returning zero rather than something like EPOLLERR|POLLHUP
>for a file that is going away

I did look at that 0 return as well and agreed this is the right choice,
but adding the comment is a good idea.

Anyway, patch still looks fine to me. I'd word wrap the comment section
above epi_fget() wider, but that's just a stylistic choice...

-- 
Jens Axboe



Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes

2024-05-05 Thread Jens Axboe
On 5/3/24 3:11 PM, Linus Torvalds wrote:
> epoll is a mess, and does various invalid things in the name of
> performance.
> 
> Let's try to rein it in a bit. Something like this, perhaps?
> 
> Not-yet-signed-off-by: Linus Torvalds 
> ---
> 
> This is entirely untested, thus the "Not-yet-signed-off-by".  But I
> think this may be kind of the right path forward. 
> 
> I suspect the ->poll() call is the main case that matters, but there are
> other places where eventpoll just looks up the file pointer without then
> being very careful about it.  The sock_from_file(epi->ffd.file) uses in
> particular should probably also use this to look up the file. 
> 
> Comments?

FWIW, I agree that epoll is the odd one out and there's no reason NOT to
close this gap, regardless of how safe we currently think the existing
usage is.

I've done some basic testing with this - both to verify it fixes the
actual issue at hand (it does, crashes trivially without it), and
networking/pipe based epoll usage and no ill effects observed. Also
passes all ltp test cases as well, but I was less concerned about that
side.

Reviewed-by: Jens Axboe 

-- 
Jens Axboe



Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)

2024-05-03 Thread Jens Axboe
On 5/3/24 1:22 PM, Kees Cook wrote:
> On Fri, May 03, 2024 at 12:49:11PM -0600, Jens Axboe wrote:
>> On 5/3/24 12:26 PM, Kees Cook wrote:
>>> Thanks for doing this analysis! I suspect at least a start of a fix
>>> would be this:
>>>
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index 8fe5aa67b167..15e8f74ee0f2 100644
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, 
>>> poll_table *poll)
>>>  
>>> if (events & EPOLLOUT) {
>>> /* Paired with fput in dma_buf_poll_cb */
>>> -   get_file(dmabuf->file);
>>> -
>>> -   if (!dma_buf_poll_add_cb(resv, true, dcb))
>>> +   if (!atomic_long_inc_not_zero(>file) &&
>>> +   !dma_buf_poll_add_cb(resv, true, dcb))
>>> /* No callback queued, wake up any other 
>>> waiters */
>>
>> Don't think this is sane at all. I'm assuming you meant:
>>
>>  atomic_long_inc_not_zero(>file->f_count);
> 
> Oops, yes, sorry. I was typed from memory instead of copy/paste.

Figured :-)

>> but won't fly as you're not under RCU in the first place. And what
>> protects it from being long gone before you attempt this anyway? This is
>> sane way to attempt to fix it, it's completely opposite of what sane ref
>> handling should look like.
>>
>> Not sure what the best fix is here, seems like dma-buf should hold an
>> actual reference to the file upfront rather than just stash a pointer
>> and then later _hope_ that it can just grab a reference. That seems
>> pretty horrible, and the real source of the issue.
> 
> AFAICT, epoll just doesn't hold any references at all. It depends,
> I think, on eventpoll_release() (really eventpoll_release_file())
> synchronizing with epoll_wait() (but I don't see how this happens, and
> the race seems to be against ep_item_poll() ...?)
>
> I'm really confused about how eventpoll manages the lifetime of polled
> fds.

epoll doesn't hold any references, and it's got some ugly callback to
deal with that. It's not ideal, nor pretty, but that's how it currently
works. See eventpoll_release() and how it's called. This means that
epoll itself is supposedly safe from the file going away, even though it
doesn't hold a reference to it.

Except that in this case, the file is already gone by the time
eventpoll_release() is called. Which presumably is some interaction with
the somewhat suspicious file reference management that dma-buf is doing.
But I didn't look into that much, outside of noting it looks a bit
suspect.

>>> Due to this issue I've proposed fixing get_file() to detect pathological 
>>> states:
>>> https://lore.kernel.org/lkml/2024050252.work.690-k...@kernel.org/
>>
>> I don't think this would catch this case, as the memory could just be
>> garbage at this point.
> 
> It catches it just fine! :) I tested it against the published PoC.

Sure it _may_ catch the issue, but the memory may also just be garbage
at that point. Not saying it's a useless addition, outside of the usual
gripes of making the hot path slower, just that it won't catch all
cases. Which I guess is fine and expected.

> And for cases where further allocations have progressed far enough to
> corrupt the freed struct file and render the check pointless, nothing
> different has happened than what happens today. At least now we have a
> window to catch the situation across the time frame before it is both
> reallocated _and_ the contents at the f_count offset gets changed to
> non-zero.

Right.

-- 
Jens Axboe



Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)

2024-05-03 Thread Jens Axboe
On 5/3/24 12:26 PM, Kees Cook wrote:
> Thanks for doing this analysis! I suspect at least a start of a fix
> would be this:
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 8fe5aa67b167..15e8f74ee0f2 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, 
> poll_table *poll)
>  
>   if (events & EPOLLOUT) {
>   /* Paired with fput in dma_buf_poll_cb */
> - get_file(dmabuf->file);
> -
> - if (!dma_buf_poll_add_cb(resv, true, dcb))
> + if (!atomic_long_inc_not_zero(>file) &&
> + !dma_buf_poll_add_cb(resv, true, dcb))
>   /* No callback queued, wake up any other 
> waiters */

Don't think this is sane at all. I'm assuming you meant:

atomic_long_inc_not_zero(>file->f_count);

but won't fly as you're not under RCU in the first place. And what
protects it from being long gone before you attempt this anyway? This is
sane way to attempt to fix it, it's completely opposite of what sane ref
handling should look like.

Not sure what the best fix is here, seems like dma-buf should hold an
actual reference to the file upfront rather than just stash a pointer
and then later _hope_ that it can just grab a reference. That seems
pretty horrible, and the real source of the issue.

> Due to this issue I've proposed fixing get_file() to detect pathological 
> states:
> https://lore.kernel.org/lkml/2024050252.work.690-k...@kernel.org/

I don't think this would catch this case, as the memory could just be
garbage at this point.

-- 
Jens Axboe



Re: [RFC PATCH net-next v8 07/14] page_pool: devmem support

2024-04-30 Thread Jens Axboe
On 4/30/24 12:29 PM, Mina Almasry wrote:
> On Tue, Apr 30, 2024 at 6:46?AM Jens Axboe  wrote:
>>
>> On 4/26/24 8:11 PM, Mina Almasry wrote:
>>> On Fri, Apr 26, 2024 at 5:18?PM David Wei  wrote:
>>>>
>>>> On 2024-04-02 5:20 pm, Mina Almasry wrote:
>>>>> @@ -69,20 +106,26 @@ net_iov_binding(const struct net_iov *niov)
>>>>>   */
>>>>>  typedef unsigned long __bitwise netmem_ref;
>>>>>
>>>>> +static inline bool netmem_is_net_iov(const netmem_ref netmem)
>>>>> +{
>>>>> +#if defined(CONFIG_PAGE_POOL) && defined(CONFIG_DMA_SHARED_BUFFER)
>>>>
>>>> I am guessing you added this to try and speed up the fast path? It's
>>>> overly restrictive for us since we do not need dmabuf necessarily. I
>>>> spent a bit too much time wondering why things aren't working only to
>>>> find this :(
>>>
>>> My apologies, I'll try to put the changelog somewhere prominent, or
>>> notify you when I do something that I think breaks you.
>>>
>>> Yes, this is a by-product of a discussion with regards to the
>>> page_pool benchmark regressions due to adding devmem. There is some
>>> background on why this was added and the impact on the
>>> bench_page_pool_simple tests in the cover letter.
>>>
>>> For you, I imagine you want to change this to something like:
>>>
>>> #if defined(CONFIG_PAGE_POOL)
>>> #if defined(CONFIG_DMA_SHARED_BUFFER) || defined(CONFIG_IOURING)
>>>
>>> or something like that, right? Not sure if this is something I should
>>> do here or if something more appropriate to be in the patches you
>>> apply on top.
>>
>> In general, attempting to hide overhead behind config options is always
>> a losing proposition. It merely serves to say "look, if these things
>> aren't enabled, the overhead isn't there", while distros blindly enable
>> pretty much everything and then you're back where you started.
>>
> 
> The history there is that this check adds 1 cycle regression to the
> page_pool fast path benchmark. The regression last I measured is 8->9
> cycles, so in % wise it's a quite significant 12.5% (more details in
> the cover letter[1]). I doubt I can do much better than that to be
> honest.

I'm all for cycle counting, and do it myself too, but is that even
measurable in anything that isn't a super targeted microbenchmark? Or
even in that? 

> There was a desire not to pay this overhead in setups that will likely
> not care about devmem, like embedded devices maybe, or setups without
> GPUs. Adding a CONFIG check here seemed like very low hanging fruit,
> but yes it just hides the overhead in some configs, not really removes
> it.
> 
> There was a discussion about adding this entire netmem/devmem work
> under a new CONFIG. There was pushback particularly from Willem that
> at the end of the day what is enabled on most distros is what matters
> and we added code churn and CONFIG churn for little value.
> 
> If there is significant pushback to the CONFIG check I can remove it.
> I don't feel like it's critical, it just mirco-optimizes some setups
> that doesn't really care about this work area.

That is true, but in practice it'll be enabled anyway. Seems like it's
not really worth it in this scenario.

-- 
Jens Axboe



Re: [RFC PATCH net-next v8 07/14] page_pool: devmem support

2024-04-30 Thread Jens Axboe
On 4/26/24 8:11 PM, Mina Almasry wrote:
> On Fri, Apr 26, 2024 at 5:18?PM David Wei  wrote:
>>
>> On 2024-04-02 5:20 pm, Mina Almasry wrote:
>>> @@ -69,20 +106,26 @@ net_iov_binding(const struct net_iov *niov)
>>>   */
>>>  typedef unsigned long __bitwise netmem_ref;
>>>
>>> +static inline bool netmem_is_net_iov(const netmem_ref netmem)
>>> +{
>>> +#if defined(CONFIG_PAGE_POOL) && defined(CONFIG_DMA_SHARED_BUFFER)
>>
>> I am guessing you added this to try and speed up the fast path? It's
>> overly restrictive for us since we do not need dmabuf necessarily. I
>> spent a bit too much time wondering why things aren't working only to
>> find this :(
> 
> My apologies, I'll try to put the changelog somewhere prominent, or
> notify you when I do something that I think breaks you.
> 
> Yes, this is a by-product of a discussion with regards to the
> page_pool benchmark regressions due to adding devmem. There is some
> background on why this was added and the impact on the
> bench_page_pool_simple tests in the cover letter.
> 
> For you, I imagine you want to change this to something like:
> 
> #if defined(CONFIG_PAGE_POOL)
> #if defined(CONFIG_DMA_SHARED_BUFFER) || defined(CONFIG_IOURING)
> 
> or something like that, right? Not sure if this is something I should
> do here or if something more appropriate to be in the patches you
> apply on top.

In general, attempting to hide overhead behind config options is always
a losing proposition. It merely serves to say "look, if these things
aren't enabled, the overhead isn't there", while distros blindly enable
pretty much everything and then you're back where you started.

-- 
Jens Axboe



Re: REGRESSION: no console on current -git

2024-01-19 Thread Jens Axboe
On 1/19/24 5:27 PM, Helge Deller wrote:
> On 1/19/24 22:22, Jens Axboe wrote:
>> On 1/19/24 2:14 PM, Helge Deller wrote:
>>> On 1/19/24 22:01, Jens Axboe wrote:
>>>> On 1/19/24 1:55 PM, Helge Deller wrote:
>>>>> Adding Mirsad Todorovac (who reported a similar issue).
>>>>>
>>>>> On 1/19/24 19:39, Jens Axboe wrote:
>>>>>> My trusty R7525 test box is failing to show a console, or in fact 
>>>>>> anything,
>>>>>> on current -git. There's no output after:
>>>>>>
>>>>>> Loading Linux 6.7.0+ ...
>>>>>> Loading initial ramdisk ...
>>>>>>
>>>>>> and I don't get a console up. I went through the bisection pain and
>>>>>> found this was the culprit:
>>>>>>
>>>>>> commit df67699c9cb0ceb70f6cc60630ca938c06773eda
>>>>>> Author: Thomas Zimmermann 
>>>>>> Date:   Wed Jan 3 11:15:11 2024 +0100
>>>>>>
>>>>>>firmware/sysfb: Clear screen_info state after consuming it
>>>>>>
>>>>>> Reverting this commit, and everything is fine. Looking at dmesg with a
>>>>>> buggy kernel, I get no frame or fb messages. On a good kernel, it looks
>>>>>> ilke this:
>>>>>>
>>>>>> [1.416486] efifb: probing for efifb
>>>>>> [1.416602] efifb: framebuffer at 0xde00, using 3072k, total 3072k
>>>>>> [1.416605] efifb: mode is 1024x768x32, linelength=4096, pages=1
>>>>>> [1.416607] efifb: scrolling: redraw
>>>>>> [1.416608] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
>>>>>> [1.449746] fb0: EFI VGA frame buffer device
>>>>>>
>>>>>> Happy to test a fix, or barring that, can someone just revert this
>>>>>> commit please?
>>>>>
>>>>> I've temporarily added a revert patch into the fbdev for-next tree for 
>>>>> now,
>>>>> so people should not face the issue in the for-next series:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev.git/commit/?h=for-next
>>>>> I'd like to wait for Thomas to return on monday to check the issue
>>>>> as there are some other upcoming patches in this area from him.
>>>>
>>>> Given the issue (and that I'm not the only one reporting it), can we
>>>> please just get that pushed so it'll make -rc1? It can always get
>>>> re-introduced in a fixed fashion. I don't run -next here, I rely on
>>>> mainline working for my testing.
>>>
>>> I agree, it would be good to get it fixed for -rc1.
>>> So, it's ok for me, but I won't be able to test the revert short time right 
>>> now.
>>> If you can assure that the revert fixes it, and builds in git-head,
>>> I can now prepare the pull request for Linus now (or he just reverts
>>> commit df67699c9cb0 manually).
>>
>> I already tested a revert on top of the current tree, and it builds just
>> fine and boots with a working console. So reverting it does work and
>> solves the issue.
> 
> I sent a pull request with the revert.

Thanks! You forgot the Reported-by, but no big deal.

-- 
Jens Axboe



Re: REGRESSION: no console on current -git

2024-01-19 Thread Jens Axboe
On 1/19/24 2:14 PM, Helge Deller wrote:
> On 1/19/24 22:01, Jens Axboe wrote:
>> On 1/19/24 1:55 PM, Helge Deller wrote:
>>> Adding Mirsad Todorovac (who reported a similar issue).
>>>
>>> On 1/19/24 19:39, Jens Axboe wrote:
>>>> My trusty R7525 test box is failing to show a console, or in fact anything,
>>>> on current -git. There's no output after:
>>>>
>>>> Loading Linux 6.7.0+ ...
>>>> Loading initial ramdisk ...
>>>>
>>>> and I don't get a console up. I went through the bisection pain and
>>>> found this was the culprit:
>>>>
>>>> commit df67699c9cb0ceb70f6cc60630ca938c06773eda
>>>> Author: Thomas Zimmermann 
>>>> Date:   Wed Jan 3 11:15:11 2024 +0100
>>>>
>>>>   firmware/sysfb: Clear screen_info state after consuming it
>>>>
>>>> Reverting this commit, and everything is fine. Looking at dmesg with a
>>>> buggy kernel, I get no frame or fb messages. On a good kernel, it looks
>>>> ilke this:
>>>>
>>>> [1.416486] efifb: probing for efifb
>>>> [1.416602] efifb: framebuffer at 0xde00, using 3072k, total 3072k
>>>> [1.416605] efifb: mode is 1024x768x32, linelength=4096, pages=1
>>>> [1.416607] efifb: scrolling: redraw
>>>> [1.416608] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
>>>> [1.449746] fb0: EFI VGA frame buffer device
>>>>
>>>> Happy to test a fix, or barring that, can someone just revert this
>>>> commit please?
>>>
>>> I've temporarily added a revert patch into the fbdev for-next tree for now,
>>> so people should not face the issue in the for-next series:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev.git/commit/?h=for-next
>>> I'd like to wait for Thomas to return on monday to check the issue
>>> as there are some other upcoming patches in this area from him.
>>
>> Given the issue (and that I'm not the only one reporting it), can we
>> please just get that pushed so it'll make -rc1? It can always get
>> re-introduced in a fixed fashion. I don't run -next here, I rely on
>> mainline working for my testing.
> 
> I agree, it would be good to get it fixed for -rc1.
> So, it's ok for me, but I won't be able to test the revert short time right 
> now.
> If you can assure that the revert fixes it, and builds in git-head,
> I can now prepare the pull request for Linus now (or he just reverts
> commit df67699c9cb0 manually).

I already tested a revert on top of the current tree, and it builds just
fine and boots with a working console. So reverting it does work and
solves the issue.

-- 
Jens Axboe



Re: REGRESSION: no console on current -git

2024-01-19 Thread Jens Axboe
On 1/19/24 1:55 PM, Helge Deller wrote:
> Adding Mirsad Todorovac (who reported a similar issue).
> 
> On 1/19/24 19:39, Jens Axboe wrote:
>> My trusty R7525 test box is failing to show a console, or in fact anything,
>> on current -git. There's no output after:
>>
>> Loading Linux 6.7.0+ ...
>> Loading initial ramdisk ...
>>
>> and I don't get a console up. I went through the bisection pain and
>> found this was the culprit:
>>
>> commit df67699c9cb0ceb70f6cc60630ca938c06773eda
>> Author: Thomas Zimmermann 
>> Date:   Wed Jan 3 11:15:11 2024 +0100
>>
>>  firmware/sysfb: Clear screen_info state after consuming it
>>
>> Reverting this commit, and everything is fine. Looking at dmesg with a
>> buggy kernel, I get no frame or fb messages. On a good kernel, it looks
>> ilke this:
>>
>> [1.416486] efifb: probing for efifb
>> [1.416602] efifb: framebuffer at 0xde00, using 3072k, total 3072k
>> [1.416605] efifb: mode is 1024x768x32, linelength=4096, pages=1
>> [1.416607] efifb: scrolling: redraw
>> [1.416608] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
>> [1.449746] fb0: EFI VGA frame buffer device
>>
>> Happy to test a fix, or barring that, can someone just revert this
>> commit please?
> 
> I've temporarily added a revert patch into the fbdev for-next tree for now,
> so people should not face the issue in the for-next series:
> https://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev.git/commit/?h=for-next
> I'd like to wait for Thomas to return on monday to check the issue
> as there are some other upcoming patches in this area from him.

Given the issue (and that I'm not the only one reporting it), can we
please just get that pushed so it'll make -rc1? It can always get
re-introduced in a fixed fashion. I don't run -next here, I rely on
mainline working for my testing.

-- 
Jens Axboe



Re: [PATCH v2 0/4] eventfd: simplify signal helpers

2023-11-23 Thread Jens Axboe
On 11/22/23 5:48 AM, Christian Brauner wrote:
> Hey everyone,
> 
> This simplifies the eventfd_signal() and eventfd_signal_mask() helpers
> significantly. They can be made void and not take any unnecessary
> arguments.
> 
> I've added a few more simplifications based on Sean's suggestion.
> 
> Signed-off-by: Christian Brauner 
> 
> Changes in v2:
> - further simplify helpers
> - Link to v1: 
> https://lore.kernel.org/r/20230713-vfs-eventfd-signal-v1-0-7fda6c5d2...@kernel.org

Only oddity I spotted was the kerneldoc, which someone else already
brought up.

Reviewed-by: Jens Axboe 

-- 
Jens Axboe




Re: [PATCH v2 0/5] Fix probe failed when modprobe modules

2022-11-30 Thread Jens Axboe
On 11/29/22 9:06 AM, Li Zetao wrote:
> This patchset fixes similar issue, the root cause of the
> problem is that the virtqueues are not stopped on error
> handling path.

Not related to just this patchset, but guys, Huawei really *REALLY* need
to get the email situation sorted. I'm digging whole/half patchsets out
of spam every morning.

This has been brought up in the past. And no, the cloud variant of
the email also doesn't work properly.

Talk to your IT department, get this sorted once and for all. You risk
your patches being dumped on the floor because people don't see them,
or only see small parts of a patchset. And it's really annoying to have
to deal with as a recipient.

-- 
Jens Axboe




Re: ____i915_gem_object_get_pages() -> shmem_get_pages() crash in -git

2021-11-12 Thread Jens Axboe
On 11/11/21 10:03 AM, Ajay Garg wrote:
> Hi Jens.
> 
> Same issue at my side.
> 
> Have posted a patch at
> https://lore.kernel.org/linux-mm/camzfgtup6dkt4owzlhl8whqnnxabfvw5c6aqoghzy3bbm_k...@mail.gmail.com/T/#m2189d135b9293de9b4a11362f0179c17b254d5ab
> 
> Will be great to hear back if it fixes things at your side too.

Yep, looks like that'd fix it indeed.

-- 
Jens Axboe



____i915_gem_object_get_pages() -> shmem_get_pages() crash in -git

2021-11-12 Thread Jens Axboe
Hi,

Running -git as of a day or two ago on my laptop, and I've hit this resume
crash a few times:

OOM killer enabled.
Restarting tasks ... done.
thermal thermal_zone7: failed to read out thermal zone (-61)
xfs filesystem being remounted at 
/run/systemd/unit-root/var/cache/private/fwupdmgr supports timestamps until 
2038 (0x7fff)
PM: suspend exit
BUG: unable to handle page fault for address: fff4
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 21360b067 P4D 21360b067 PUD 21360d067 PMD 0 
Oops:  [#1] PREEMPT SMP
CPU: 7 PID: 3687 Comm: Xorg Tainted: G S5.15.0+ #12136
Hardware name: LENOVO 20XWCTO1WW/20XWCTO1WW, BIOS N32ET72W (1.48 ) 10/08/2021
RIP: 0010:shmem_read_mapping_page_gfp+0x53/0x90
Code: af 75 5b 41 89 d0 6a 00 45 31 c9 b9 02 00 00 00 6a 00 48 8d 55 f0 4c 89 
d7 e8 89 f6 ff ff 5a 85 c0 59 74 2b 48 98 48 89 45 f0 <48> 8b 10 f7 c2 00 00 80 
00 48 c7 c2 fb ff ff ff 48 0f 45 c2 48 8b
RSP: 0018:a01940ec7c18 EFLAGS: 00010282
RAX: fff4 RBX: 02f3 RCX: 
RDX:  RSI:  RDI: aff0a3c0
RBP: a01940ec7c28 R08:  R09: 0f00
R10: 0002 R11:  R12: 001120d2
R13: 8c6b0648a200 R14: 8c69a45472c0 R15: 8c69ba412c10
FS:  7ffae02f3a40() GS:8c6b8f7c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fff4 CR3: 00010efb5001 CR4: 00570ee0
PKRU: 5554
Call Trace:
 
 shmem_get_pages+0x242/0x640 [i915]
 ? drm_vma_node_allow+0xb7/0xe0 [drm]
 ? drm_gem_handle_create_tail+0xca/0x1a0 [drm]
 i915_gem_object_get_pages+0x20/0x50 [i915]
 __i915_gem_object_get_pages+0x35/0x40 [i915]
 i915_gem_set_domain_ioctl+0x255/0x2d0 [i915]
 ? i915_gem_object_set_to_cpu_domain+0xb0/0xb0 [i915]
 drm_ioctl_kernel+0xb4/0x140 [drm]
 drm_ioctl+0x22d/0x440 [drm]
 ? i915_gem_object_set_to_cpu_domain+0xb0/0xb0 [i915]
 ? __fget_files+0x74/0xa0
 __x64_sys_ioctl+0x8e/0xc0
 do_syscall_64+0x35/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ffae065350b
Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff 
c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 
c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
RSP: 002b:7ffe8b906c88 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 5611c921d0b0 RCX: 7ffae065350b
RDX: 7ffe8b906cd4 RSI: 400c645f RDI: 0011
RBP: 7ffe8b906d60 R08: 5611ca9f7360 R09: 5611cb62e920
R10: 5611c9167010 R11: 0246 R12: 5611ca9f7360
R13: 5611c921d0c8 R14: 7ffe8b906cd4 R15: 0011
 
Modules linked in: rfcomm ccm cmac bnep iwlmvm mac80211 btusb btrtl binfmt_misc 
btbcm uvcvideo btintel bluetooth x86_pkg_temp_thermal videobuf2_vmalloc 
videobuf2_memops videobuf2_v4l2 videobuf2_common videodev nls_iso8859_1 
intel_powerclamp mc coretemp ecdh_generic ecc libarc4 kvm_intel wmi_bmof 
iwlwifi snd_hda_codec_hdmi kvm snd_ctl_led snd_hda_codec_realtek 
snd_hda_codec_generic cfg80211 irqbypass snd_hda_intel snd_intel_dspcfg 
snd_hda_codec intel_cstate input_leds snd_hwdep thinkpad_acpi snd_hda_core 
serio_raw hid_multitouch nvram ledtrig_audio mei_me platform_profile snd_seq 
snd_pcm mei snd_timer processor_thermal_device_pci_legacy snd_seq_device 
intel_soc_dts_iosf processor_thermal_device processor_thermal_rfim snd 
ucsi_acpi typec_ucsi processor_thermal_mbox typec processor_thermal_rapl 
intel_rapl_common soundcore int3403_thermal int340x_thermal_zone 
int3400_thermal wmi acpi_thermal_rel acpi_pad sch_fq_codel msr ip_tables 
x_tables usbhid i915 hid_generic i2c_algo_bit
 ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm 
crct10dif_pclmul crc32_pclmul nvme ghash_clmulni_intel aesni_intel nvme_core 
intel_lpss_pci crypto_simd intel_lpss cryptd idma64 virt_dma video
CR2: fff4
---[ end trace f26a0d7d10ef13d6 ]---
RIP: 0010:shmem_read_mapping_page_gfp+0x53/0x90
Code: af 75 5b 41 89 d0 6a 00 45 31 c9 b9 02 00 00 00 6a 00 48 8d 55 f0 4c 89 
d7 e8 89 f6 ff ff 5a 85 c0 59 74 2b 48 98 48 89 45 f0 <48> 8b 10 f7 c2 00 00 80 
00 48 c7 c2 fb ff ff ff 48 0f 45 c2 48 8b
RSP: 0018:a01940ec7c18 EFLAGS: 00010282
RAX: fff4 RBX: 02f3 RCX: 
RDX:  RSI:  RDI: aff0a3c0
RBP: a01940ec7c28 R08:  R09: 0f00
R10: 0002 R11:  R12: 001120d2
R13: 8c6b0648a200 R14: 8c69a45472c0 R15: 8c69ba412c10
FS:  7ffae02f3a40() GS:8c6b8f7c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: fff4 CR3: 00010efb5001 CR4: 00570ee0
PKRU: 5554

Ring a bell for anyone? This is an X1 gen9 laptop, intel graphics.

-- 
Jens Axboe



Re: [PATCH] block: convert tasklets to use new tasklet_setup() API

2020-08-20 Thread Jens Axboe
On 8/19/20 9:24 AM, Allen wrote:
>> [...]
>>>> Since both threads seem to have petered out, let me suggest in
>>>> kernel.h:
>>>>
>>>> #define cast_out(ptr, container, member) \
>>>> container_of(ptr, typeof(*container), member)
>>>>
>>>> It does what you want, the argument order is the same as
>>>> container_of with the only difference being you name the containing
>>>> structure instead of having to specify its type.
>>>
>>> Not to incessantly bike shed on the naming, but I don't like
>>> cast_out, it's not very descriptive. And it has connotations of
>>> getting rid of something, which isn't really true.
>>
>> Um, I thought it was exactly descriptive: you're casting to the outer
>> container.  I thought about following the C++ dynamic casting style, so
>> out_cast(), but that seemed a bit pejorative.  What about outer_cast()?
>>
>>> FWIW, I like the from_ part of the original naming, as it has some
>>> clues as to what is being done here. Why not just from_container()?
>>> That should immediately tell people what it does without having to
>>> look up the implementation, even before this becomes a part of the
>>> accepted coding norm.
>>
>> I'm not opposed to container_from() but it seems a little less
>> descriptive than outer_cast() but I don't really care.  I always have
>> to look up container_of() when I'm using it so this would just be
>> another macro of that type ...
>>
> 
>  So far we have a few which have been suggested as replacement
> for from_tasklet()
> 
> - out_cast() or outer_cast()
> - from_member().
> - container_from() or from_container()
> 
> from_container() sounds fine, would trimming it a bit work? like from_cont().

I like container_from() the most, since it's the closest to contain_of()
which is a well known idiom for years. The lines will already be shorter
without the need to specify the struct, so don't like the idea of
squeezing container into cont for any of them. For most people, cont is
usually short for continue, not container.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] block: convert tasklets to use new tasklet_setup() API

2020-08-20 Thread Jens Axboe
On 8/18/20 1:00 PM, James Bottomley wrote:
> On Mon, 2020-08-17 at 13:02 -0700, Jens Axboe wrote:
>> On 8/17/20 12:48 PM, Kees Cook wrote:
>>> On Mon, Aug 17, 2020 at 12:44:34PM -0700, Jens Axboe wrote:
>>>> On 8/17/20 12:29 PM, Kees Cook wrote:
>>>>> On Mon, Aug 17, 2020 at 06:56:47AM -0700, Jens Axboe wrote:
>>>>>> On 8/17/20 2:15 AM, Allen Pais wrote:
>>>>>>> From: Allen Pais 
>>>>>>>
>>>>>>> In preparation for unconditionally passing the
>>>>>>> struct tasklet_struct pointer to all tasklet
>>>>>>> callbacks, switch to using the new tasklet_setup()
>>>>>>> and from_tasklet() to pass the tasklet pointer explicitly.
>>>>>>
>>>>>> Who came up with the idea to add a macro 'from_tasklet' that
>>>>>> is just container_of? container_of in the code would be
>>>>>> _much_ more readable, and not leave anyone guessing wtf
>>>>>> from_tasklet is doing.
>>>>>>
>>>>>> I'd fix that up now before everything else goes in...
>>>>>
>>>>> As I mentioned in the other thread, I think this makes things
>>>>> much more readable. It's the same thing that the timer_struct
>>>>> conversion did (added a container_of wrapper) to avoid the
>>>>> ever-repeating use of typeof(), long lines, etc.
>>>>
>>>> But then it should use a generic name, instead of each sub-system 
>>>> using some random name that makes people look up exactly what it
>>>> does. I'm not huge fan of the container_of() redundancy, but
>>>> adding private variants of this doesn't seem like the best way
>>>> forward. Let's have a generic helper that does this, and use it
>>>> everywhere.
>>>
>>> I'm open to suggestions, but as things stand, these kinds of
>>> treewide
>>
>> On naming? Implementation is just as it stands, from_tasklet() is
>> totally generic which is why I objected to it. from_member()? Not
>> great with naming... But I can see this going further and then we'll
>> suddenly have tons of these. It's not good for readability.
> 
> Since both threads seem to have petered out, let me suggest in
> kernel.h:
> 
> #define cast_out(ptr, container, member) \
>   container_of(ptr, typeof(*container), member)
> 
> It does what you want, the argument order is the same as container_of
> with the only difference being you name the containing structure
> instead of having to specify its type.

Not to incessantly bike shed on the naming, but I don't like cast_out,
it's not very descriptive. And it has connotations of getting rid of
something, which isn't really true.

FWIW, I like the from_ part of the original naming, as it has some clues
as to what is being done here. Why not just from_container()? That
should immediately tell people what it does without having to look up
the implementation, even before this becomes a part of the accepted
coding norm.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] block: convert tasklets to use new tasklet_setup() API

2020-08-20 Thread Jens Axboe
On 8/19/20 6:11 AM, Greg KH wrote:
> On Wed, Aug 19, 2020 at 07:00:53AM -0600, Jens Axboe wrote:
>> On 8/18/20 1:00 PM, James Bottomley wrote:
>>> On Mon, 2020-08-17 at 13:02 -0700, Jens Axboe wrote:
>>>> On 8/17/20 12:48 PM, Kees Cook wrote:
>>>>> On Mon, Aug 17, 2020 at 12:44:34PM -0700, Jens Axboe wrote:
>>>>>> On 8/17/20 12:29 PM, Kees Cook wrote:
>>>>>>> On Mon, Aug 17, 2020 at 06:56:47AM -0700, Jens Axboe wrote:
>>>>>>>> On 8/17/20 2:15 AM, Allen Pais wrote:
>>>>>>>>> From: Allen Pais 
>>>>>>>>>
>>>>>>>>> In preparation for unconditionally passing the
>>>>>>>>> struct tasklet_struct pointer to all tasklet
>>>>>>>>> callbacks, switch to using the new tasklet_setup()
>>>>>>>>> and from_tasklet() to pass the tasklet pointer explicitly.
>>>>>>>>
>>>>>>>> Who came up with the idea to add a macro 'from_tasklet' that
>>>>>>>> is just container_of? container_of in the code would be
>>>>>>>> _much_ more readable, and not leave anyone guessing wtf
>>>>>>>> from_tasklet is doing.
>>>>>>>>
>>>>>>>> I'd fix that up now before everything else goes in...
>>>>>>>
>>>>>>> As I mentioned in the other thread, I think this makes things
>>>>>>> much more readable. It's the same thing that the timer_struct
>>>>>>> conversion did (added a container_of wrapper) to avoid the
>>>>>>> ever-repeating use of typeof(), long lines, etc.
>>>>>>
>>>>>> But then it should use a generic name, instead of each sub-system 
>>>>>> using some random name that makes people look up exactly what it
>>>>>> does. I'm not huge fan of the container_of() redundancy, but
>>>>>> adding private variants of this doesn't seem like the best way
>>>>>> forward. Let's have a generic helper that does this, and use it
>>>>>> everywhere.
>>>>>
>>>>> I'm open to suggestions, but as things stand, these kinds of
>>>>> treewide
>>>>
>>>> On naming? Implementation is just as it stands, from_tasklet() is
>>>> totally generic which is why I objected to it. from_member()? Not
>>>> great with naming... But I can see this going further and then we'll
>>>> suddenly have tons of these. It's not good for readability.
>>>
>>> Since both threads seem to have petered out, let me suggest in
>>> kernel.h:
>>>
>>> #define cast_out(ptr, container, member) \
>>> container_of(ptr, typeof(*container), member)
>>>
>>> It does what you want, the argument order is the same as container_of
>>> with the only difference being you name the containing structure
>>> instead of having to specify its type.
>>
>> Not to incessantly bike shed on the naming, but I don't like cast_out,
>> it's not very descriptive. And it has connotations of getting rid of
>> something, which isn't really true.
> 
> I agree, if we want to bike shed, I don't like this color either.
> 
>> FWIW, I like the from_ part of the original naming, as it has some clues
>> as to what is being done here. Why not just from_container()? That
>> should immediately tell people what it does without having to look up
>> the implementation, even before this becomes a part of the accepted
>> coding norm.
> 
> Why are people hating on the well-known and used container_of()?
> 
> If you really hate to type the type and want a new macro, what about
> 'container_from()'?  (noun/verb is nicer to sort symbols by...)
> 
> But really, why is this even needed?

container_from() or from_container(), either works just fine for me
in terms of naming.

I think people are hating on it because it makes for _really_ long
lines, and it's arguably cleaner/simpler to just pass in the pointer
type instead. Then you end up with lines like this:

struct request_queue *q =   
container_of(work, struct request_queue, requeue_work.work);  

But I'm not the one that started this addition of from_tasklet(), my
objection was adding a private macro for something that should be
generic functionality. Hence I think we either need to provide that, or
tell the from_tasklet() folks that they should just use container_of().

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] block: convert tasklets to use new tasklet_setup() API

2020-08-18 Thread Jens Axboe
On 8/17/20 2:15 AM, Allen Pais wrote:
> From: Allen Pais 
> 
> In preparation for unconditionally passing the
> struct tasklet_struct pointer to all tasklet
> callbacks, switch to using the new tasklet_setup()
> and from_tasklet() to pass the tasklet pointer explicitly.

Who came up with the idea to add a macro 'from_tasklet' that is just
container_of? container_of in the code would be _much_ more readable,
and not leave anyone guessing wtf from_tasklet is doing.

I'd fix that up now before everything else goes in...

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] block: convert tasklets to use new tasklet_setup() API

2020-08-18 Thread Jens Axboe
On 8/17/20 12:29 PM, Kees Cook wrote:
> On Mon, Aug 17, 2020 at 06:56:47AM -0700, Jens Axboe wrote:
>> On 8/17/20 2:15 AM, Allen Pais wrote:
>>> From: Allen Pais 
>>>
>>> In preparation for unconditionally passing the
>>> struct tasklet_struct pointer to all tasklet
>>> callbacks, switch to using the new tasklet_setup()
>>> and from_tasklet() to pass the tasklet pointer explicitly.
>>
>> Who came up with the idea to add a macro 'from_tasklet' that is just
>> container_of? container_of in the code would be _much_ more readable,
>> and not leave anyone guessing wtf from_tasklet is doing.
>>
>> I'd fix that up now before everything else goes in...
> 
> As I mentioned in the other thread, I think this makes things much more
> readable. It's the same thing that the timer_struct conversion did
> (added a container_of wrapper) to avoid the ever-repeating use of
> typeof(), long lines, etc.

But then it should use a generic name, instead of each sub-system using
some random name that makes people look up exactly what it does. I'm not
huge fan of the container_of() redundancy, but adding private variants
of this doesn't seem like the best way forward. Let's have a generic
helper that does this, and use it everywhere.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] block: convert tasklets to use new tasklet_setup() API

2020-08-18 Thread Jens Axboe
On 8/17/20 12:48 PM, Kees Cook wrote:
> On Mon, Aug 17, 2020 at 12:44:34PM -0700, Jens Axboe wrote:
>> On 8/17/20 12:29 PM, Kees Cook wrote:
>>> On Mon, Aug 17, 2020 at 06:56:47AM -0700, Jens Axboe wrote:
>>>> On 8/17/20 2:15 AM, Allen Pais wrote:
>>>>> From: Allen Pais 
>>>>>
>>>>> In preparation for unconditionally passing the
>>>>> struct tasklet_struct pointer to all tasklet
>>>>> callbacks, switch to using the new tasklet_setup()
>>>>> and from_tasklet() to pass the tasklet pointer explicitly.
>>>>
>>>> Who came up with the idea to add a macro 'from_tasklet' that is just
>>>> container_of? container_of in the code would be _much_ more readable,
>>>> and not leave anyone guessing wtf from_tasklet is doing.
>>>>
>>>> I'd fix that up now before everything else goes in...
>>>
>>> As I mentioned in the other thread, I think this makes things much more
>>> readable. It's the same thing that the timer_struct conversion did
>>> (added a container_of wrapper) to avoid the ever-repeating use of
>>> typeof(), long lines, etc.
>>
>> But then it should use a generic name, instead of each sub-system using
>> some random name that makes people look up exactly what it does. I'm not
>> huge fan of the container_of() redundancy, but adding private variants
>> of this doesn't seem like the best way forward. Let's have a generic
>> helper that does this, and use it everywhere.
> 
> I'm open to suggestions, but as things stand, these kinds of treewide

On naming? Implementation is just as it stands, from_tasklet() is
totally generic which is why I objected to it. from_member()? Not great
with naming... But I can see this going further and then we'll suddenly
have tons of these. It's not good for readability.

> changes end up getting whole-release delays because of the need to have
> the API in place for everyone before patches to do the changes can be
> sent to multiple maintainers, etc.

Sure, that's always true of treewide changes like that.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-wired-lan] [PATCH V2] mm: Replace all open encodings for NUMA_NO_NODE

2018-11-27 Thread Jens Axboe
On 11/26/18 10:56 AM, Jeff Kirsher wrote:
> On Mon, 2018-11-26 at 17:56 +0530, Anshuman Khandual wrote:
>> At present there are multiple places where invalid node number is
>> encoded
>> as -1. Even though implicitly understood it is always better to have
>> macros
>> in there. Replace these open encodings for an invalid node number
>> with the
>> global macro NUMA_NO_NODE. This helps remove NUMA related assumptions
>> like
>> 'invalid node' from various places redirecting them to a common
>> definition.
>>
>> Signed-off-by: Anshuman Khandual 
> 
> For the 'ixgbe' driver changes.
> 
> Acked-by: Jeff Kirsher 

Lost the original, but for mtip32xx:

Acked-by: Jens Axboe 

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] [RFC v2] Drop all 00-INDEX files from Documentation/

2018-09-05 Thread Jens Axboe
On 9/3/18 4:15 PM, Henrik Austad wrote:
> This is a respin with a wider audience (all that get_maintainer returned)
> and I know this spams a *lot* of people. Not sure what would be the correct
> way, so my apologies for ruining your inbox.
> 
> The 00-INDEX files are supposed to give a summary of all files present
> in a directory, but these files are horribly out of date and their
> usefulness is brought into question. Often a simple "ls" would reveal
> the same information as the filenames are generally quite descriptive as
> a short introduction to what the file covers (it should not surprise
> anyone what Documentation/sched/sched-design-CFS.txt covers)
> 
> A few years back it was mentioned that these files were no longer really
> needed, and they have since then grown further out of date, so perhaps
> it is time to just throw them out.
> 
> A short status yields the following _outdated_ 00-INDEX files, first
> counter is files listed in 00-INDEX but missing in the directory, last
> is files present but not listed in 00-INDEX.

For the block related bits:

Reviewed-by: Jens Axboe 

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 02/12] blk: use for_each_if

2018-07-13 Thread Jens Axboe
On 7/12/18 12:45 AM, Joe Perches wrote:
> On Wed, 2018-07-11 at 20:50 +0200, Daniel Vetter wrote:
>> On Wed, Jul 11, 2018 at 8:30 PM, Jens Axboe  wrote:
>>> On 7/11/18 10:45 AM, Tejun Heo wrote:
>>>> On Wed, Jul 11, 2018 at 09:40:58AM -0700, Tejun Heo wrote:
>>>>> On Mon, Jul 09, 2018 at 10:36:40AM +0200, Daniel Vetter wrote:
>>>>>> Makes the macros resilient against if {} else {} blocks right
>>>>>> afterwards.
>>>>>>
>>>>>> Signed-off-by: Daniel Vetter 
>>>>>> Cc: Tejun Heo 
>>>>>> Cc: Jens Axboe 
>>>>>> Cc: Shaohua Li 
>>>>>> Cc: Kate Stewart 
>>>>>> Cc: Greg Kroah-Hartman 
>>>>>> Cc: Joseph Qi 
>>>>>> Cc: Daniel Vetter 
>>>>>> Cc: Arnd Bergmann 
>>>>>
>>>>> Acked-by: Tejun Heo 
>>>>>
>>>>> Jens, it'd probably be best to route this through block tree.
>>>>
>>>> Oops, this requires an earlier patch to move the for_each_if def to a
>>>> common header and should be routed together.
>>>
>>> Yeah, this is a problem with the submission.
>>>
>>> Always (ALWAYS) CC folks on at least the cover letter and generic
>>> earlier patches. Getting just one patch sent like this is mostly
>>> useless, and causes more harm than good.
>>
>> Ime sending a patch with more than 20 or so recipients means it gets
>> stuck everywhere in moderation queues. Or outright spam filters. I
>> thought the correct way to do this is to cc: mailing lists (lkml has
>> them all), but apparently that's not how it's done. Despite that all
>> the patch series I get never have the cover letter addressed to me
>> either.
>>
>> So what's the magic way to make this possible?
> 
> Jens' advice is crap.
> 
> There is no generic way to make this possible.

Nobody claimed there was. And the advice is perfectly fine,
sending out patches to folks that have hidden dependencies on other
patches is a no-go.

> BCC's don't work, series that touch multiple subsystems
> get rejected when the recipient list is too large.
> 
> I think you did it correctly.

Clearly that's not the case, regardless of what you think.

Thanks for your invaluable and useful feedback, sharing your vast
experience in patchsets with dependencies.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 02/12] blk: use for_each_if

2018-07-12 Thread Jens Axboe
On 7/11/18 10:45 AM, Tejun Heo wrote:
> On Wed, Jul 11, 2018 at 09:40:58AM -0700, Tejun Heo wrote:
>> On Mon, Jul 09, 2018 at 10:36:40AM +0200, Daniel Vetter wrote:
>>> Makes the macros resilient against if {} else {} blocks right
>>> afterwards.
>>>
>>> Signed-off-by: Daniel Vetter 
>>> Cc: Tejun Heo 
>>> Cc: Jens Axboe 
>>> Cc: Shaohua Li 
>>> Cc: Kate Stewart 
>>> Cc: Greg Kroah-Hartman 
>>> Cc: Joseph Qi 
>>> Cc: Daniel Vetter 
>>> Cc: Arnd Bergmann 
>>
>> Acked-by: Tejun Heo 
>>
>> Jens, it'd probably be best to route this through block tree.
> 
> Oops, this requires an earlier patch to move the for_each_if def to a
> common header and should be routed together.

Yeah, this is a problem with the submission.

Always (ALWAYS) CC folks on at least the cover letter and generic
earlier patches. Getting just one patch sent like this is mostly
useless, and causes more harm than good.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 02/12] blk: use for_each_if

2018-07-12 Thread Jens Axboe
On 7/11/18 12:50 PM, Daniel Vetter wrote:
> On Wed, Jul 11, 2018 at 8:30 PM, Jens Axboe  wrote:
>> On 7/11/18 10:45 AM, Tejun Heo wrote:
>>> On Wed, Jul 11, 2018 at 09:40:58AM -0700, Tejun Heo wrote:
>>>> On Mon, Jul 09, 2018 at 10:36:40AM +0200, Daniel Vetter wrote:
>>>>> Makes the macros resilient against if {} else {} blocks right
>>>>> afterwards.
>>>>>
>>>>> Signed-off-by: Daniel Vetter 
>>>>> Cc: Tejun Heo 
>>>>> Cc: Jens Axboe 
>>>>> Cc: Shaohua Li 
>>>>> Cc: Kate Stewart 
>>>>> Cc: Greg Kroah-Hartman 
>>>>> Cc: Joseph Qi 
>>>>> Cc: Daniel Vetter 
>>>>> Cc: Arnd Bergmann 
>>>>
>>>> Acked-by: Tejun Heo 
>>>>
>>>> Jens, it'd probably be best to route this through block tree.
>>>
>>> Oops, this requires an earlier patch to move the for_each_if def to a
>>> common header and should be routed together.
>>
>> Yeah, this is a problem with the submission.
>>
>> Always (ALWAYS) CC folks on at least the cover letter and generic
>> earlier patches. Getting just one patch sent like this is mostly
>> useless, and causes more harm than good.
> 
> Ime sending a patch with more than 20 or so recipients means it gets
> stuck everywhere in moderation queues. Or outright spam filters. I
> thought the correct way to do this is to cc: mailing lists (lkml has
> them all), but apparently that's not how it's done. Despite that all
> the patch series I get never have the cover letter addressed to me
> either.
> 
> So what's the magic way to make this possible?

I don't think there's a git easy way of sending it out outside of
just ensuring that everybody is CC'ed on everything. I don't mind
that at all. I don't subscribe to lkml, and the patches weren't
sent to linux-block. Hence all I see is this stand-alone patch,
and logic would dictate that it's stand-alone (but it isn't).

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 02/12] blk: use for_each_if

2018-07-12 Thread Jens Axboe
On 7/11/18 3:08 PM, Daniel Vetter wrote:
> On Wed, Jul 11, 2018 at 10:06 PM, Tejun Heo  wrote:
>> On Wed, Jul 11, 2018 at 01:31:51PM -0600, Jens Axboe wrote:
>>> I don't think there's a git easy way of sending it out outside of
>>> just ensuring that everybody is CC'ed on everything. I don't mind
>>> that at all. I don't subscribe to lkml, and the patches weren't
>>> sent to linux-block. Hence all I see is this stand-alone patch,
>>> and logic would dictate that it's stand-alone (but it isn't).
> 
> Hm yeah I forgot to add linux-block. But others where there's no
> dedicated list (or get_maintainers.pl didn't have one) also complained
> about not getting Cc'ed, and I can't Cc everyone for sweeping changes.

I don't personally see a problem with just CC'ing everyone.

>> What I sometimes do is including a short blurb on each patch giving
>> the overview and action hints (e.g. this is part of patchset doing XYZ
>> and should be routed such and such).  It's a bit redundant but has
>> worked pretty well for patchsets with dependenat & sweeping changes.
> 
> Yeah I guess I can just copypaste/summarize patch 1 to all the
> subsequent patches, sounds like the best option.

Another approach might be to submit the first independent patch
separately. Once that's in the kernel, you can send out the rest
as independent patches instead of doing a cross-kernel series that
all depend on one single patch. Seems to me that's where you run
into issues, and it can be avoided quite easily.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/i915: Make vblank evade warnings optional

2017-05-08 Thread Jens Axboe
On 05/07/2017 11:56 AM, Daniel Vetter wrote:
> On Sun, May 7, 2017 at 7:46 PM, Jens Axboe <ax...@kernel.dk> wrote:
>> On 05/07/2017 11:12 AM, ville.syrj...@linux.intel.com wrote:
>>> From: Ville Syrjälä <ville.syrj...@linux.intel.com>
>>>
>>> Add a new Kconfig option to enable/disable the extra warnings
>>> from the vblank evade code. For now we'll keep the warning
>>> about an actually missed vblank always enabled as that can have
>>> an actual user visible impact. But if we miss the deadline
>>> othrwise there's no real need to bother the user with that.
>>> We'll want these warnings enabled during development however
>>> so that we can catch regressions.
>>>
>>> Based on the reports it looks like this is still very easy
>>> to hit on SKL, so we have more work ahead of us to optimize
>>> the crtiical section further.
>>
>> Shouldn't it just be a debug printk or something instead, so that normal
>> people don't see it, but the folks that turn on debugging can get the
>> info they need? Seems silly to add a kconfig option for this.
> 
> I guess we could keep it as debug for users, but we want to make this
> a hard failure on our CI machines. Kconfig knob is the easiest to roll
> out to all machines.

Wouldn't a module parameter be more useful then, as an opt-in
to catch these violations.

Nobody is going to know wtf to set this kconfig option to.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/i915: Make vblank evade warnings optional

2017-05-08 Thread Jens Axboe
On 05/08/2017 01:25 AM, Daniel Vetter wrote:
> On Sun, May 07, 2017 at 07:52:14PM -0600, Jens Axboe wrote:
>> On 05/07/2017 11:56 AM, Daniel Vetter wrote:
>>> On Sun, May 7, 2017 at 7:46 PM, Jens Axboe <ax...@kernel.dk> wrote:
>>>> On 05/07/2017 11:12 AM, ville.syrj...@linux.intel.com wrote:
>>>>> From: Ville Syrjälä <ville.syrj...@linux.intel.com>
>>>>>
>>>>> Add a new Kconfig option to enable/disable the extra warnings
>>>>> from the vblank evade code. For now we'll keep the warning
>>>>> about an actually missed vblank always enabled as that can have
>>>>> an actual user visible impact. But if we miss the deadline
>>>>> othrwise there's no real need to bother the user with that.
>>>>> We'll want these warnings enabled during development however
>>>>> so that we can catch regressions.
>>>>>
>>>>> Based on the reports it looks like this is still very easy
>>>>> to hit on SKL, so we have more work ahead of us to optimize
>>>>> the crtiical section further.
>>>>
>>>> Shouldn't it just be a debug printk or something instead, so that normal
>>>> people don't see it, but the folks that turn on debugging can get the
>>>> info they need? Seems silly to add a kconfig option for this.
>>>
>>> I guess we could keep it as debug for users, but we want to make this
>>> a hard failure on our CI machines. Kconfig knob is the easiest to roll
>>> out to all machines.
>>
>> Wouldn't a module parameter be more useful then, as an opt-in
>> to catch these violations.
>>
>> Nobody is going to know wtf to set this kconfig option to.
> 
> They're all hidden behind an overall i915 debugging option which tells you
> not to enable it. You won't see this.

OK, that does improve things a bit.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] drm/i915: Make vblank evade warnings optional

2017-05-07 Thread Jens Axboe
On 05/07/2017 11:12 AM, ville.syrj...@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrj...@linux.intel.com>
> 
> Add a new Kconfig option to enable/disable the extra warnings
> from the vblank evade code. For now we'll keep the warning
> about an actually missed vblank always enabled as that can have
> an actual user visible impact. But if we miss the deadline
> othrwise there's no real need to bother the user with that.
> We'll want these warnings enabled during development however
> so that we can catch regressions.
> 
> Based on the reports it looks like this is still very easy
> to hit on SKL, so we have more work ahead of us to optimize
> the crtiical section further.

Shouldn't it just be a debug printk or something instead, so that normal
people don't see it, but the folks that turn on debugging can get the
info they need? Seems silly to add a kconfig option for this.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] drm] Atomic update on pipe (A) took 119 us, max time under evasion is 100 us

2017-05-04 Thread Jens Axboe
On 05/04/2017 11:42 AM, Ville Syrjälä wrote:
> On Thu, May 04, 2017 at 09:26:09AM -0600, Jens Axboe wrote:
>> Hi,
>>
>> Running current -git on my laptop (20FB, X1 Carbon gen4, skylake), I get
>> a lot of the below warnings. Things seem to work fine (in fact it seems
>> faster in general use than previously), but it's a lot of warning spew.
>>
>> [  764.877978] [drm] Atomic update on pipe (A) took 156 us, max time under 
>> evasion is 100 us
> 
> I tried to optimize this a bit recently but indeed it's stil known to be too
> slow. Looks like all of that stuff did land in Linus's tree already,
> so presumably you have it all already.

Yes, this is Linus' tree...

> I did have some further ideas that should help but I got sidetracked by
> other things before I managed to finish the work. I guess I'll need to get
> back on that horse and try to finish what I started.
> 
> In the meantime, maybe we should just silence this error spew again
> until we're more confident about meeting the deadlines. Maarten?
> 
> Do you have lockdep enabled BTW? Based on what I've seen lockdep does
> seem be a major contributor to slowness here.

Nope, running a fairly optimized build on my laptop.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 2/2] drm/i915: Limit the busy wait on requests to 2us not 10ms!

2015-11-16 Thread Jens Axboe
On 11/15/2015 06:32 AM, Chris Wilson wrote:
> When waiting for high frequency requests, the finite amount of time
> required to set up the irq and wait upon it limits the response rate. By
> busywaiting on the request completion for a short while we can service
> the high frequency waits as quick as possible. However, if it is a slow
> request, we want to sleep as quickly as possible. The tradeoff between
> waiting and sleeping is roughly the time it takes to sleep on a request,
> on the order of a microsecond. Based on measurements from big core, I
> have set the limit for busywaiting as 2 microseconds.
>
> The code currently uses the jiffie clock, but that is far too coarse (on
> the order of 10 milliseconds) and results in poor interactivity as the
> CPU ends up being hogged by slow requests. To get microsecond resolution
> we need to use a high resolution timer. The cheapest of which is polling
> local_clock(), but that is only valid on the same CPU. If we switch CPUs
> because the task was preempted, we can also use that as an indicator that
>   the system is too busy to waste cycles on spinning and we should sleep
> instead.

I tried this (1+2), and it feels better. However, I added some counters 
just to track how well it's faring:

[  491.077612] i915: invoked=7168, success=50

so out of 6144 invocations, we only avoided going to sleep 49 of those 
times. As a percentage, that's 99.3% of the time we spun 2usec for no 
good reason other than to burn up more of my battery. So the reason 
there's an improvement for me is that we're just not spinning the 10ms 
anymore, however we're still just wasting time for my use case.

I'd recommend putting this behind some option so that people can enable 
it and play with it if they want, but not making it default to on until 
some more clever tracking has been added to dynamically adapt to on when 
to poll and when not to. It should not be a default-on type of thing 
until it's closer to doing the right thing for a normal workload, not 
just some synthetic benchmark.

-- 
Jens Axboe



__i915_spin_request() sucks

2015-11-13 Thread Jens Axboe
On 11/13/2015 03:12 PM, Chris Wilson wrote:
> On Fri, Nov 13, 2015 at 09:22:52AM -0700, Jens Axboe wrote:
>> On 11/13/2015 09:13 AM, Mike Galbraith wrote:
>>> On Fri, 2015-11-13 at 08:36 -0700, Jens Axboe wrote:
>>>> Previous patch was obvious pre-coffee crap, this should work for using
>>>> ktime to spin max 1usec.
>>>
>>> 1us seems a tad low.  I doubt the little wooden gears and pulleys of my
>>> core2 Toshiba Satellite lappy can get one loop ground out in a usec :)
>>
>> Maybe it is, it's based off the original intent of the function,
>> though. See the original commit referenced.
>
> I've been looking at numbers from one laptop and I can set the timeout
> at 2us before we see a steep decline in what is more or less synchronous
> request handling (which affects a variety of rendering workloads).

Alright, at least that's a vast improvement from 10ms. If you send me 
something tested, I can try it here.

> Looking around, other busy loops seem to use local_clock() (i.e. rdstcll
> with a fair wind). Is that worth using here?

Honestly, don't think it matters too much for this case. You'd have to 
disable preempt to use local_clock(), fwiw. It is a faster variant 
though, but the RT people might hate you for 2us preempt disables :-)

-- 
Jens Axboe



__i915_spin_request() sucks

2015-11-13 Thread Jens Axboe
On 11/13/2015 09:13 AM, Mike Galbraith wrote:
> On Fri, 2015-11-13 at 08:36 -0700, Jens Axboe wrote:
>> Previous patch was obvious pre-coffee crap, this should work for using
>> ktime to spin max 1usec.
>
> 1us seems a tad low.  I doubt the little wooden gears and pulleys of my
> core2 Toshiba Satellite lappy can get one loop ground out in a usec :)

Maybe it is, it's based off the original intent of the function, though. 
See the original commit referenced.

-- 
Jens Axboe



__i915_spin_request() sucks

2015-11-13 Thread Jens Axboe
Previous patch was obvious pre-coffee crap, this should work for using
ktime to spin max 1usec.

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5cf4a1998273..21192e55c33c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1148,17 +1148,18 @@ static bool missed_irq(struct drm_i915_private 
*dev_priv,

 static int __i915_spin_request(struct drm_i915_gem_request *req)
 {
-   unsigned long timeout;
+   ktime_t timeout;

if (i915_gem_request_get_ring(req)->irq_refcount)
return -EBUSY;

-   timeout = jiffies + 1;
+   timeout = ktime_get();
+   ktime_add_us(timeout, 1);
while (!need_resched()) {
if (i915_gem_request_completed(req, true))
return 0;

-   if (time_after_eq(jiffies, timeout))
+   if (ktime_after(ktime_get(), timeout))
break;

cpu_relax_lowlatency();

-- 
Jens Axboe



__i915_spin_request() sucks

2015-11-13 Thread Jens Axboe
On 11/13/2015 02:15 AM, Chris Wilson wrote:
> On Thu, Nov 12, 2015 at 03:52:02PM -0700, Jens Axboe wrote:
>> On 11/12/2015 03:19 PM, Chris Wilson wrote:
>>>>> So today, I figured I'd try just killing that spin. If it fails, we'll
>>>>> punt to normal completions, so easy change. And wow, MASSIVE difference.
>>>>> I can now scroll in chrome and not rage! It's like the laptop is 10x
>>>>> faster now.
>>>>>
>>>>> Ran git blame, and found:
>>>>>
>>>>> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
>>>>> Author: Chris Wilson 
>>>>> Date:   Tue Apr 7 16:20:41 2015 +0100
>>>>>
>>>>>  drm/i915: Optimistically spin for the request completion
>>>>>
>>>>> and read the commit message. Doesn't sound that impressive. Especially
>>>>> not for something that screws up interactive performance by a LOT.
>>>>>
>>>>> What's the deal? Revert?
>>>
>>> The tests that it improved the most were the latency sensitive tests and
>>> since my Broadwell xps13 behaves itself, I'd like to understand how it
>>> culminates in an interactivity loss.
>>>
>>> 1. Maybe it is the uninterruptible nature of the polling, making X's
>>> SIGIO jerky:
>>
>> This one still feels bad.
>>
>>> 2. Or maybe it is increased mutex contention:
>>
>> And so does this one... I had to manually apply hunks 2-3, and after
>> doing seat-of-the-pants testing for both variants, I confirmed with
>> perf that we're still seeing a ton of time in __i915_wait_request()
>> for both of them.
>>
>>> Or maybe it is an indirect effect, such as power balancing between the
>>> CPU and GPU, or just thermal throttling, or it may be the task being
>>> penalised for consuming its timeslice (for which any completion polling
>>> seems susceptible).
>>
>> Look, polls in the 1-10ms range are just insane. Either you botched
>> the commit message and really meant "~1ms at most" and in which case
>> I'd suspect you of smoking something good, or you hacked it up wrong
>> and used jiffies when you really wanted to be using some other time
>> check that really did give you 1us.
>
> What other time check? I was under the impression of setting up a
> hrtimer was expensive and jiffies was available.

Looping for 10ms is a lot more expensive :-). jiffies is always there, 
but it's WAY too coarse to be used for something like this.

You could use ktime_get(), there's a lot of helpers for checking 
time_after, adding msecs, etc. Something like the below, not tested here
yet.

>> I'll take an IRQ over 10 msecs of busy looping on my laptop, thanks.
>>
>>>> "Limit the spinning to a single jiffie (~1us) at most"
>>>>
>>>> is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms!
>>>> Even if I had HZ=1000, that'd still be 1ms of spinning. That's
>>>> seriously screwed up, guys.
>>>
>>> That's over and above the termination condition for blk_poll().
>>
>> ?! And this is related, how? Comparing apples and oranges. One is a
>> test opt-in feature for experimentation, the other is
>> unconditionally enabled for everyone. I believe the commit even says
>> so. See the difference? Would I use busy loop spinning waiting for
>> rotating storage completions, which are in the 1-10ms range? No,
>> with the reason being that the potential wins for spins are in the
>> usec range.
>
> Equally I expect the service interval for a batch to be around 2-20us.
> There are many workloads that execute 30-50k requests/s, and you can
> appreciate that they are sensitive to the latency in setting up an irq
> and receiving it - just as equally leaving that irq enabled saturates a
> CPU with simply executing the irq handler. So what mechanism do you use
> to guard against either a very long queue depth or an abnormal request
> causing msec+ spins?

Not disputing that polling can work, but it needs to be a bit more 
clever. Do you know which requests are fast and which ones are not? 
Could you track it? Should we make this a module option?

20usec is too long to poll. If we look at the wins of polling, we're
talking anywhere from 1-2 usec to maybe 5 usec, depending on different
factors. So spinning between 1-3 usec should be a hard limit on most
platforms. And it's somewhat of a policy decision, since it does involve
throwing CPU at the problem. There's a crossover point where below it's
always a win, but that needs a lot more work than just optimistic
spinning for everythin

__i915_spin_request() sucks

2015-11-12 Thread Jens Axboe
On 11/12/2015 03:52 PM, Jens Axboe wrote:
> On 11/12/2015 03:19 PM, Chris Wilson wrote:
>>>> So today, I figured I'd try just killing that spin. If it fails, we'll
>>>> punt to normal completions, so easy change. And wow, MASSIVE
>>>> difference.
>>>> I can now scroll in chrome and not rage! It's like the laptop is 10x
>>>> faster now.
>>>>
>>>> Ran git blame, and found:
>>>>
>>>> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
>>>> Author: Chris Wilson 
>>>> Date:   Tue Apr 7 16:20:41 2015 +0100
>>>>
>>>>  drm/i915: Optimistically spin for the request completion
>>>>
>>>> and read the commit message. Doesn't sound that impressive. Especially
>>>> not for something that screws up interactive performance by a LOT.
>>>>
>>>> What's the deal? Revert?
>>
>> The tests that it improved the most were the latency sensitive tests and
>> since my Broadwell xps13 behaves itself, I'd like to understand how it
>> culminates in an interactivity loss.
>>
>> 1. Maybe it is the uninterruptible nature of the polling, making X's
>> SIGIO jerky:
>
> This one still feels bad.
>
>> 2. Or maybe it is increased mutex contention:
>
> And so does this one... I had to manually apply hunks 2-3, and after
> doing seat-of-the-pants testing for both variants, I confirmed with perf
> that we're still seeing a ton of time in __i915_wait_request() for both
> of them.

I don't see how #2 could make any difference, you're passing in 0x3 hard 
coded for most call sites, so we poll. The ones that don't, pass a bool 
(?!).

I should note that with the basic patch of just never spinning, I don't 
see __i915_wait_request() in the profiles. At all.

-- 
Jens Axboe



__i915_spin_request() sucks

2015-11-12 Thread Jens Axboe
On 11/12/2015 03:19 PM, Chris Wilson wrote:
>>> So today, I figured I'd try just killing that spin. If it fails, we'll
>>> punt to normal completions, so easy change. And wow, MASSIVE difference.
>>> I can now scroll in chrome and not rage! It's like the laptop is 10x
>>> faster now.
>>>
>>> Ran git blame, and found:
>>>
>>> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
>>> Author: Chris Wilson 
>>> Date:   Tue Apr 7 16:20:41 2015 +0100
>>>
>>>  drm/i915: Optimistically spin for the request completion
>>>
>>> and read the commit message. Doesn't sound that impressive. Especially
>>> not for something that screws up interactive performance by a LOT.
>>>
>>> What's the deal? Revert?
>
> The tests that it improved the most were the latency sensitive tests and
> since my Broadwell xps13 behaves itself, I'd like to understand how it
> culminates in an interactivity loss.
>
> 1. Maybe it is the uninterruptible nature of the polling, making X's
> SIGIO jerky:

This one still feels bad.

> 2. Or maybe it is increased mutex contention:

And so does this one... I had to manually apply hunks 2-3, and after 
doing seat-of-the-pants testing for both variants, I confirmed with perf 
that we're still seeing a ton of time in __i915_wait_request() for both 
of them.

> Or maybe it is an indirect effect, such as power balancing between the
> CPU and GPU, or just thermal throttling, or it may be the task being
> penalised for consuming its timeslice (for which any completion polling
> seems susceptible).

Look, polls in the 1-10ms range are just insane. Either you botched the 
commit message and really meant "~1ms at most" and in which case I'd 
suspect you of smoking something good, or you hacked it up wrong and 
used jiffies when you really wanted to be using some other time check 
that really did give you 1us.

I'll take an IRQ over 10 msecs of busy looping on my laptop, thanks.

>> "Limit the spinning to a single jiffie (~1us) at most"
>>
>> is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms!
>> Even if I had HZ=1000, that'd still be 1ms of spinning. That's
>> seriously screwed up, guys.
>
> That's over and above the termination condition for blk_poll().

?! And this is related, how? Comparing apples and oranges. One is a test 
opt-in feature for experimentation, the other is unconditionally enabled 
for everyone. I believe the commit even says so. See the difference? 
Would I use busy loop spinning waiting for rotating storage completions, 
which are in the 1-10ms range? No, with the reason being that the 
potential wins for spins are in the usec range.

-- 
Jens Axboe



__i915_spin_request() sucks

2015-11-12 Thread Jens Axboe
On 11/12/2015 01:36 PM, Jens Axboe wrote:
> Hi,
>
> So a few months ago I got an XPS13 laptop, the one with the high res
> screen. GUI performance was never really that great, I attributed it to
> coming from a more powerful laptop, and the i915 driving a lot of
> pixels. But yesterday I browsed from my wife's macbook, and was blown
> away. Wow, scrolling in chrome SUCKS on the xps13. Not just scrolling,
> basically anything in chrome. Molasses. So I got sick of it, fired up a
> quick perf record, did a bunch of stuff in chrome. No super smoking
> guns, but one thing did stick out - the path leading to
> __i915_spin_request().
>
> So today, I figured I'd try just killing that spin. If it fails, we'll
> punt to normal completions, so easy change. And wow, MASSIVE difference.
> I can now scroll in chrome and not rage! It's like the laptop is 10x
> faster now.
>
> Ran git blame, and found:
>
> commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
> Author: Chris Wilson 
> Date:   Tue Apr 7 16:20:41 2015 +0100
>
>  drm/i915: Optimistically spin for the request completion
>
> and read the commit message. Doesn't sound that impressive. Especially
> not for something that screws up interactive performance by a LOT.
>
> What's the deal? Revert?

BTW, this:

"Limit the spinning to a single jiffie (~1us) at most"

is totally wrong. I have HZ=100 on my laptop. That's 10ms. 10ms! Even if 
I had HZ=1000, that'd still be 1ms of spinning. That's seriously screwed 
up, guys.

-- 
Jens Axboe



__i915_spin_request() sucks

2015-11-12 Thread Jens Axboe
Hi,

So a few months ago I got an XPS13 laptop, the one with the high res 
screen. GUI performance was never really that great, I attributed it to 
coming from a more powerful laptop, and the i915 driving a lot of 
pixels. But yesterday I browsed from my wife's macbook, and was blown 
away. Wow, scrolling in chrome SUCKS on the xps13. Not just scrolling, 
basically anything in chrome. Molasses. So I got sick of it, fired up a 
quick perf record, did a bunch of stuff in chrome. No super smoking 
guns, but one thing did stick out - the path leading to 
__i915_spin_request().

So today, I figured I'd try just killing that spin. If it fails, we'll 
punt to normal completions, so easy change. And wow, MASSIVE difference. 
I can now scroll in chrome and not rage! It's like the laptop is 10x 
faster now.

Ran git blame, and found:

commit 2def4ad99befa25775dd2f714fdd4d92faec6e34
Author: Chris Wilson 
Date:   Tue Apr 7 16:20:41 2015 +0100

 drm/i915: Optimistically spin for the request completion

and read the commit message. Doesn't sound that impressive. Especially 
not for something that screws up interactive performance by a LOT.

What's the deal? Revert?

-- 
Jens Axboe



[PATCH] drm: Fix locking for sysfs dpms file

2015-09-29 Thread Jens Axboe
On 09/29/2015 01:56 AM, Daniel Vetter wrote:
> With atomic drivers we need to make sure that (at least in general)
> property reads hold the right locks. But the legacy dpms property is
> special and can be read locklessly. Since userspace loves to just
> randomly look at that all the time (like with "status") do that.
>
> To make it clear that we play tricks use the READ_ONCE compiler
> barrier (and also for paranoia).
>
> Note that there's not really anything bad going on since even with the
> new atomic paths we eventually end up not chasing any pointers (and
> hence possibly freed memory and other fun stuff). The locking WARNING
> has been added in
>
> commit 88a48e297b3a3bac6022c03babfb038f1a886cea
> Author: Rob Clark 
> Date:   Thu Dec 18 16:01:50 2014 -0500
>
>  drm: add atomic properties
>
> but since drivers are converting not everyone will have seen this from
> the start.
>
> Jens reported this and submitted a patch to just grab the
> mode_config.connection_mutex, but we can do a bit better.
>
> v2: Remove unused variables I failed to git add for real.
>
> Reported-by: Jens Axboe 
> Cc: Jens Axboe 
> Cc: Rob Clark 
> Cc: stable at vger.kernel.org
> Signed-off-by: Daniel Vetter 

Works for me, thanks Daniel.

Tested-by: Jens Axboe 

-- 
Jens Axboe



[PATCH] drm: fix missing modeset lock from sysfs path

2015-09-28 Thread Jens Axboe
Hi,

Running 4.3-rc on a new laptop, and I notice that I get these whenever
powertop --auto-tune is run:

[14954.927920] [ cut here ]
[14954.927926] WARNING: CPU: 1 PID: 14297 at drivers/gpu/drm/drm_atomic.c:889 
drm_atomic_get_property+0x232/0x2b0()
[14954.927927] Modules linked in: msr drbg ctr ccm cmac uvcvideo hid_generic 
videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev usbhid 
btusb btintel arc4 iwlmvm mac80211 joydev snd_hda_codec_realtek 
snd_hda_codec_generic x86_pkg_temp_thermal snd_hda_codec_hdmi intel_powerclamp 
kvm_intel dell_laptop dell_wmi dcdbas snd_hda_intel hid_multitouch kvm 
sparse_keymap iwlwifi snd_hda_codec snd_hwdep snd_hda_core aesni_intel 
aes_x86_64 glue_helper ehci_pci lrw gf128mul xhci_pci ablk_helper cfg80211 
cryptd xhci_hcd ehci_hcd snd_pcm int3403_thermal snd_seq_midi bnep 
snd_seq_midi_event rfcomm snd_rawmidi bluetooth snd_seq snd_timer 
snd_seq_device processor_thermal_device snd intel_soc_dts_iosf iosf_mbi i2c_hid 
hid i2c_designware_platform i2c_designware_core int3402_thermal int3400_thermal 
int340x_thermal_zone acpi_thermal_rel soundcore binfmt_misc nls_iso8859_1 
nls_cp437 fuse vfat fat
[14954.927963] CPU: 1 PID: 14297 Comm: powertop Tainted: G U  W   
4.3.0-rc3+ #177
[14954.927964] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A05 07/14/2015
[14954.927965]  81a7c2db 8800032e3c88 812923a9 

[14954.927967]  8800032e3cc0 81050aa6 880215577028 
88021551df80
[14954.927968]  880215577000 880133e2ea80 88020ea59000 
8800032e3cd0
[14954.927970] Call Trace:
[14954.927974]  [] dump_stack+0x4b/0x72
[14954.927976]  [] warn_slowpath_common+0x86/0xc0
[14954.927978]  [] warn_slowpath_null+0x1a/0x20
[14954.927980]  [] drm_atomic_get_property+0x232/0x2b0
[14954.927982]  [] drm_object_property_get_value+0x6c/0x70
[14954.927984]  [] dpms_show+0x2f/0x70
[14954.927987]  [] dev_attr_show+0x20/0x50
[14954.927989]  [] sysfs_kf_seq_show+0xa3/0x140
[14954.927990]  [] kernfs_seq_show+0x20/0x30
[14954.927993]  [] seq_read+0xcd/0x370
[14954.927994]  [] kernfs_fop_read+0x107/0x160
[14954.927996]  [] __vfs_read+0x28/0xd0
[14954.927999]  [] ? security_file_permission+0xa3/0xc0
[14954.928001]  [] ? rw_verify_area+0x53/0xf0
[14954.928002]  [] vfs_read+0x86/0x130
[14954.928004]  [] SyS_read+0x46/0xa0
[14954.928007]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[14954.928008] ---[ end trace b1d297c2aeff3470 ]---

Looks like we're missing a modeset lock/unlock around the property
retrieval from the sysfs path. Below patch works for me.

Signed-off-by: Jens Axboe 

diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index 0f6cd33b531f..2b62166f6b2a 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -239,9 +239,12 @@ static ssize_t dpms_show(struct device *device,
uint64_t dpms_status;
int ret;

+   drm_modeset_lock(>mode_config.connection_mutex, NULL);
ret = drm_object_property_get_value(>base,
dev->mode_config.dpms_property,
_status);
+   drm_modeset_unlock(>mode_config.connection_mutex);
+
if (ret)
return 0;


-- 
Jens Axboe



3.12-rc6 nouveau oops

2013-10-25 Thread Jens Axboe
On Fri, Oct 25 2013, Dave Airlie wrote:
> On Fri, Oct 25, 2013 at 8:59 AM, Dave Airlie  wrote:
> >> Booting 3.12-rc6 on my macbook quickly freezes after logging in to X. I
> >> was able to capture this oops. It's 100% consistent. 3.11 works
> >> perfectly fine, as did previous kernels.
> >
> > Can you boot with nouveau.runpm=0 and see if it boots and survives
> > suspend/resume.

It does!

> Oh also a complete dmesg so we can see the DCB table.

Below.


[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.12.0-rc6+ (axboe at lenny) (gcc version 4.7.3 
(Ubuntu/Linaro 4.7.3-2ubuntu4) ) #68 SMP PREEMPT Fri Oct 25 13:07:23 BST 2013
[0.00] Command line: BOOT_IMAGE=/vmlinuz-3.12 root=/dev/sda6 
video=efifb i915.modeset=1 i915.i915_enable_rc6=3 i915.lvds_channel_mode=1 
i915.i915_enable_fbc=1 i915.lvds_downclock=1 drm.vblankoffdelay=1 
i915.i915_enable_ppgtt=0 nouveau.runpm=0
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0008dfff] usable
[0.00] BIOS-e820: [mem 0x0008e000-0x0008] reserved
[0.00] BIOS-e820: [mem 0x0009-0x0009] usable
[0.00] BIOS-e820: [mem 0x000a-0x000b] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1fff] usable
[0.00] BIOS-e820: [mem 0x2000-0x201f] reserved
[0.00] BIOS-e820: [mem 0x2020-0x40003fff] usable
[0.00] BIOS-e820: [mem 0x40004000-0x40004fff] reserved
[0.00] BIOS-e820: [mem 0x40005000-0x8ad13fff] usable
[0.00] BIOS-e820: [mem 0x8ad14000-0x8ad52fff] ACPI NVS
[0.00] BIOS-e820: [mem 0x8ad53000-0x8ad68fff] usable
[0.00] BIOS-e820: [mem 0x8ad69000-0x8ad8efff] ACPI data
[0.00] BIOS-e820: [mem 0x8ad8f000-0x8ae3afff] usable
[0.00] BIOS-e820: [mem 0x8ae3b000-0x8ae8efff] reserved
[0.00] BIOS-e820: [mem 0x8ae8f000-0x8aecdfff] usable
[0.00] BIOS-e820: [mem 0x8aece000-0x8aefefff] reserved
[0.00] BIOS-e820: [mem 0x8aeff000-0x8af91fff] usable
[0.00] BIOS-e820: [mem 0x8af92000-0x8affefff] reserved
[0.00] BIOS-e820: [mem 0x8afff000-0x8aff] usable
[0.00] BIOS-e820: [mem 0x8b00-0x8f9f] reserved
[0.00] BIOS-e820: [mem 0xe00f8000-0xe00f8fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1] reserved
[0.00] BIOS-e820: [mem 0xffe7-0xffe9] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00026f5f] usable
[0.00] NX (Execute Disable) protection: active
[0.00] efi: EFI v1.10 by Apple
[0.00] efi:  ACPI=0x8ad8e000  ACPI 2.0=0x8ad8e014  SMBIOS=0x8ad1b000 
[0.00] efi: mem00: type=7, attr=0x80f, 
range=[0x-0x0008e000) (0MB)
[0.00] efi: mem01: type=0, attr=0x80f, 
range=[0x0008e000-0x0009) (0MB)
[0.00] efi: mem02: type=7, attr=0x80f, 
range=[0x0009-0x000a) (0MB)
[0.00] efi: mem03: type=2, attr=0xf, 
range=[0x0010-0x0131) (18MB)
[0.00] efi: mem04: type=7, attr=0xf, 
range=[0x0131-0x2000) (492MB)
[0.00] efi: mem05: type=0, attr=0xf, 
range=[0x2000-0x2020) (2MB)
[0.00] efi: mem06: type=7, attr=0xf, 
range=[0x2020-0x36006000) (350MB)
[0.00] efi: mem07: type=2, attr=0xf, 
range=[0x36006000-0x36ffb000) (15MB)
[0.00] efi: mem08: type=7, attr=0xf, 
range=[0x36ffb000-0x40004000) (144MB)
[0.00] efi: mem09: type=0, attr=0xf, 
range=[0x40004000-0x40005000) (0MB)
[0.00] efi: mem10: type=7, attr=0xf, 
range=[0x40005000-0x655bf000) (597MB)
[0.00] efi: mem11: type=2, attr=0xf, 
range=[0x655bf000-0x8721a000) (540MB)
[0.00] efi: mem12: type=1, attr=0xf, 
range=[0x8721a000-0x872d) (0MB)
[0.00] efi: mem13: type=4, attr=0xf, 
range=[0x872d-0x872d1000) (0MB)
[0.00] efi: mem14: type=7, attr=0xf, 
range=[0x872d1000-0x87383000) (0MB)
[0.00] efi: mem15: type=4, attr=0xf, 
range=[0x87383000-0x87386000) (0MB)
[0.00] efi: mem16: type=2, attr=0xf, 
range=[0x87386000-0x87387000) (0MB)
[0.00] efi: mem17: type=4, attr=0xf, 

3.12-rc6 nouveau oops

2013-10-24 Thread Jens Axboe
Hi,

Booting 3.12-rc6 on my macbook quickly freezes after logging in to X. I
was able to capture this oops. It's 100% consistent. 3.11 works
perfectly fine, as did previous kernels.

-- 
Jens Axboe

-- next part --
A non-text attachment was scrubbed...
Name: nouveau-oops.jpg
Type: image/jpeg
Size: 1083671 bytes
Desc: not available
URL: 
<http://lists.freedesktop.org/archives/dri-devel/attachments/20131024/78073692/attachment-0001.jpg>


[PATCH] Block: use a freezable workqueue for disk-event polling

2012-03-02 Thread Jens Axboe
On 02/17/2012 10:22 PM, Alan Stern wrote:
> This patch (as1519) fixes a bug in the block layer's disk-events
> polling.  The polling is done by a work routine queued on the
> system_nrt_wq workqueue.  Since that workqueue isn't freezable, the
> polling continues even in the middle of a system sleep transition.
> 
> Obviously, polling a suspended drive for media changes and such isn't
> a good thing to do; in the case of USB mass-storage devices it can
> lead to real problems requiring device resets and even re-enumeration.
> 
> The patch fixes things by creating a new system-wide, non-reentrant,
> freezable workqueue and using it for disk-events polling.

Thanks Alan, picked up.

-- 
Jens Axboe



Re: [PATCH] Block: use a freezable workqueue for disk-event polling

2012-03-02 Thread Jens Axboe
On 02/17/2012 10:22 PM, Alan Stern wrote:
 This patch (as1519) fixes a bug in the block layer's disk-events
 polling.  The polling is done by a work routine queued on the
 system_nrt_wq workqueue.  Since that workqueue isn't freezable, the
 polling continues even in the middle of a system sleep transition.
 
 Obviously, polling a suspended drive for media changes and such isn't
 a good thing to do; in the case of USB mass-storage devices it can
 lead to real problems requiring device resets and even re-enumeration.
 
 The patch fixes things by creating a new system-wide, non-reentrant,
 freezable workqueue and using it for disk-events polling.

Thanks Alan, picked up.

-- 
Jens Axboe

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Remaining BKL users, what to do

2010-09-17 Thread Jens Axboe
On 2010-09-16 16:49, Steven Rostedt wrote:
 On Thu, 2010-09-16 at 16:32 +0200, Arnd Bergmann wrote:
 The big kernel lock is gone from almost all code in linux-next, this is
 the status of what I think will happen to the remaining users:
 
 kernel/trace/blktrace.c:
  Should be easy. Ingo? Steven?

 
 Jens,
 
 Git blame shows this to be your code (copied from block/blktrace.c from
 years past).
 
 Is the lock_kernel() needed here? (although Arnd did add it in 62c2a7d9)

It isn't, it can be removed.

-- 
Jens Axboe


Confidentiality Notice: This e-mail message, its contents and any attachments 
to it are confidential to the intended recipient, and may contain information 
that is privileged and/or exempt from disclosure under applicable law. If you 
are not the intended recipient, please immediately notify the sender and 
destroy the original e-mail message and any attachments (and any copies that 
may have been made) from your system or otherwise. Any unauthorized use, 
copying, disclosure or distribution of this information is strictly prohibited.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Remaining BKL users, what to do

2010-09-16 Thread Jens Axboe
On 2010-09-16 16:49, Steven Rostedt wrote:
> On Thu, 2010-09-16 at 16:32 +0200, Arnd Bergmann wrote:
>> The big kernel lock is gone from almost all code in linux-next, this is
>> the status of what I think will happen to the remaining users:
> 
>> kernel/trace/blktrace.c:
>>  Should be easy. Ingo? Steven?
>>
> 
> Jens,
> 
> Git blame shows this to be your code (copied from block/blktrace.c from
> years past).
> 
> Is the lock_kernel() needed here? (although Arnd did add it in 62c2a7d9)

It isn't, it can be removed.

-- 
Jens Axboe


Confidentiality Notice: This e-mail message, its contents and any attachments 
to it are confidential to the intended recipient, and may contain information 
that is privileged and/or exempt from disclosure under applicable law. If you 
are not the intended recipient, please immediately notify the sender and 
destroy the original e-mail message and any attachments (and any copies that 
may have been made) from your system or otherwise. Any unauthorized use, 
copying, disclosure or distribution of this information is strictly prohibited.


[PATCH RESEND block#for-2.6.36] block_dev: always serialize exclusive open attempts

2010-08-05 Thread Jens Axboe
On 2010-08-05 11:17, Markus Trippelsdorf wrote:
> On Thu, Aug 05, 2010 at 11:02:43AM +0200, Jens Axboe wrote:
>> On 2010-08-04 17:59, Tejun Heo wrote:
>>> bd_prepare_to_claim() incorrectly allowed multiple attempts for
>>> exclusive open to progress in parallel if the attempting holders are
>>> identical.  This triggered BUG_ON() as reported in the following bug.
>>>
>>>   https://bugzilla.kernel.org/show_bug.cgi?id=16393
>>>
>>> __bd_abort_claiming() is used to finish claiming blocks and doesn't
>>> work if multiple openers are inside a claiming block.  Allowing
>>> multiple parallel open attempts to continue doesn't gain anything as
>>> those are serialized down in the call chain anyway.  Fix it by always
>>> allowing only single open attempt in a claiming block.
>>>
>>> This problem can easily be reproduced by adding a delay after
>>> bd_prepare_to_claim() and attempting to mount two partitions of a
>>> disk.
>>>
>>> stable: only applicable to v2.6.35
>>>
>>> Signed-off-by: Tejun Heo 
>>> Reported-by: Markus Trippelsdorf 
>>> Cc: stable at kernel.org
>>
>> Thanks Tejun, applied.
> 
> It's already in mainline:
> e75aa85892b2ee78c79edac720868cbef16e62eb

Irk, had not noticed yet, my for-2.6.36 branch isn't fully merged
up yet. Thanks for the heads-up.

-- 
Jens Axboe


Confidentiality Notice: This e-mail message, its contents and any attachments 
to it are confidential to the intended recipient, and may contain information 
that is privileged and/or exempt from disclosure under applicable law. If you 
are not the intended recipient, please immediately notify the sender and 
destroy the original e-mail message and any attachments (and any copies that 
may have been made) from your system or otherwise. Any unauthorized use, 
copying, disclosure or distribution of this information is strictly prohibited.


[PATCH RESEND block#for-2.6.36] block_dev: always serialize exclusive open attempts

2010-08-05 Thread Jens Axboe
On 2010-08-04 17:59, Tejun Heo wrote:
> bd_prepare_to_claim() incorrectly allowed multiple attempts for
> exclusive open to progress in parallel if the attempting holders are
> identical.  This triggered BUG_ON() as reported in the following bug.
> 
>   https://bugzilla.kernel.org/show_bug.cgi?id=16393
> 
> __bd_abort_claiming() is used to finish claiming blocks and doesn't
> work if multiple openers are inside a claiming block.  Allowing
> multiple parallel open attempts to continue doesn't gain anything as
> those are serialized down in the call chain anyway.  Fix it by always
> allowing only single open attempt in a claiming block.
> 
> This problem can easily be reproduced by adding a delay after
> bd_prepare_to_claim() and attempting to mount two partitions of a
> disk.
> 
> stable: only applicable to v2.6.35
> 
> Signed-off-by: Tejun Heo 
> Reported-by: Markus Trippelsdorf 
> Cc: stable at kernel.org

Thanks Tejun, applied.

-- 
Jens Axboe


Confidentiality Notice: This e-mail message, its contents and any attachments 
to it are confidential to the intended recipient, and may contain information 
that is privileged and/or exempt from disclosure under applicable law. If you 
are not the intended recipient, please immediately notify the sender and 
destroy the original e-mail message and any attachments (and any copies that 
may have been made) from your system or otherwise. Any unauthorized use, 
copying, disclosure or distribution of this information is strictly prohibited.