Re: [PATCH 04/32] aio: remove retry-based AIO

2012-12-28 Thread Hillf Danton
On Thu, Dec 27, 2012 at 9:59 AM, Kent Overstreet  wrote:
> @@ -1585,18 +1278,27 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  * don't see ctx->dead set here, io_destroy() waits for our IO to
>  * finish.
>  */
> -   if (ctx->dead) {
> -   spin_unlock_irq(&ctx->ctx_lock);
> +   if (ctx->dead)
> ret = -EINVAL;
> +   spin_unlock_irq(&ctx->ctx_lock);
> +   if (ret)
> goto out_put_req;
> +
> +   if (unlikely(kiocbIsCancelled(req))) {
> +   ret = -EINTR;
> +   } else {
> +   ret = req->ki_retry(req);
> }
> -   aio_run_iocb(req);
> -   if (!list_empty(&ctx->run_list)) {
> -   /* drain the run list */
> -   while (__aio_run_iocbs(ctx))
> -   ;
> +   if (ret != -EIOCBQUEUED) {
> +   /*
> +* There's no easy way to restart the syscall since other AIO's
> +* may be already running. Just fail this IO with EINTR.
> +*/
> +   if (unlikely(ret == -ERESTARTSYS || ret == -ERESTARTNOINTR ||
> +ret == -ERESTARTNOHAND || ret == -ERESTART_RESTARTBLOCK))
> +   ret = -EINTR;
> +   aio_complete(req, ret, 0);
> }
> -   spin_unlock_irq(&ctx->ctx_lock);
>
> aio_put_req(req);   /* drop extra ref to req */
> return 0;

  return ret;  yes?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 04/32] aio: remove retry-based AIO

2012-12-28 Thread Hillf Danton
On Thu, Dec 27, 2012 at 9:59 AM, Kent Overstreet  wrote:
> -   /*
> -* This is so that aio_complete knows it doesn't need to
> -* pull the iocb off the run list (We can't just call
> -* INIT_LIST_HEAD because we don't want a kick_iocb to
> -* queue this on the run list yet)
> -*/
> -   iocb->ki_run_list.next = iocb->ki_run_list.prev = NULL;

The trick, list.next = NULL, is removed here, but it is reused in one of
the subsequent patches. Why?

Hillf


Re: [PATCH] mm: do not sleep in balance_pgdat if there's no i/o congestion

2012-12-28 Thread Hillf Danton
On Thu, Dec 27, 2012 at 11:42 PM, Zlatko Calusic
 wrote:
> On 21.12.2012 12:51, Hillf Danton wrote:
>>
>> On Thu, Dec 20, 2012 at 7:25 AM, Zlatko Calusic 
>> wrote:
>>>
>>>   static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
>>>  int
>>> *classzone_idx)
>>>   {
>>> -   int all_zones_ok;
>>> +   struct zone *unbalanced_zone;
>>
>>
>> nit: less hunks if not erase that mark
>>
>> Hillf
>
>
> This one left unanswered and forgotten because I didn't understand what you
> meant. Could you elaborate?
>
Sure, the patch looks simpler (and nicer) if we don't
erase all_zones_ok.

Hillf


Re: kernel BUG at mm/huge_memory.c:1798!

2012-12-28 Thread Hillf Danton
On Fri, Dec 28, 2012 at 12:08 AM, Alex Xu  wrote:
> On 25/12/12 07:05 AM, Hillf Danton wrote:
>> On Tue, Dec 25, 2012 at 12:38 PM, Zhouping Liu  wrote:
>>> Hello all,
>>>
>>> I found the below kernel bug using latest mainline(637704cbc95),
>>> my hardware has 2 numa nodes, and it's easy to reproduce the issue
>>> using LTP test case: "# ./mmap10 -a -s -c 200":
>>
>> Can you test with 5a505085f0 and 4fc3f1d66b1 reverted?
>>
>> Hillf
>>
>
> (for people from mailing lists, please cc me when replying)
>
> Same thing?

Yes and thank you very much for reporting it.

Hillf
>
> mapcount 0 page_mapcount 1
> [ cut here ]
> kernel BUG at mm/huge_memory.c:1798!


Re: [PATCH 6/9] ARM: dt: tegra114: Add new SoC base, Tegra 114 SoC

2012-12-28 Thread Olof Johansson
On Thu, Dec 20, 2012 at 11:44:04AM +0200, Hiroshi Doyu wrote:
> Initial support for Tegra 114 SoC. This is expected to be included in
> the board DTS files, Tegra 114 SoC based evaluation board family.
> 
> Signed-off-by: Hiroshi Doyu 

Hi,

I think it could be a good idea to start documenting the cpu configuration in
the device tree on some of these larger platforms as well, i.e. include a cpus/
hierarchy. Would you mind adding those, please?

It would give you the flexibility of switching over to using device
tree to probe the number of cpus in case the platform grows yet another
way to figure out the number of cores in the future, without having to
update the device-trees at that time. :)


-Olof


Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread Matthew Garrett
On Fri, 2012-12-28 at 21:19 -0800, H. Peter Anvin wrote:
> Again, we could hack a simulator and try it.

Yeah, shouldn't be too hard to wedge into qemu/ovmf.

Re: [PATCH/RFC] user_ns: fix missing limiting of user_ns counts

2012-12-28 Thread Vasily Kulikov
On Fri, Dec 28, 2012 at 20:05 -0800, Eric W. Biederman wrote:
> Vasily Kulikov  writes:
> 
> > Currently there is no limit at all on the number of user namespaces
> > created by unprivileged users.  One can freely create thousands of
> > user_ns'es and exhaust kernel memory without even bumping into
> > RLIMIT_NPROC or similar.
> 
> First for a proper sense of scale it will take roughly 14,000 to consume
> a megabyte.  So it will take hundreds of millions of user namespaces to
> eat up all of kernel memory.

Yes, but you can freely create *any* number of nested userns by a loop:

for() {
unshare()
write to /proc/self/{u,g}id_map
}

> > The code needs several checks.  First, no one should be able to create
> > user_ns of arbitrary depth.  Besides kernel stack overflow one could
> > create a depth large enough to DoS processes belonging to other users by
> > forcing them to loop a long time in cap_capable called from some
> > ns_capable() (e.g. in case one does something like "ls -R /proc").
> 
> Where do you get a ns_capable call from "ls -R /proc" ?

E.g. if procfs is mounted with hidepid=2 then ls does
ptrace_may_access() check. 

Thanks,

-- 
Vasily Kulikov
http://www.openwall.com - bringing security into open computing environments


Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread H. Peter Anvin
Again, we could hack a simulator and try it.

Matthew Garrett  wrote:

>On Fri, 2012-12-28 at 16:42 -0800, H. Peter Anvin wrote:
>> On 12/28/2012 03:39 PM, Matthew Garrett wrote:
>> > On Fri, 2012-12-28 at 15:32 -0800, H. Peter Anvin wrote:
>> >> No, that is explicitly the ordering that is excludable.
>> > 
>> > Windows uses the UEFI time functions. The fact that they fail for us on
>> > a bunch of systems is just another symptom of our general failure to
>> > accurately mimic Windows when making runtime calls, and we should just
>> > make sure that that gets fixed rather than layering more workarounds on
>> > top. It's an embarrassment that we're still unable to deal with
>> > platforms that work fine with another OS. 
>> > 
>> 
>> Are they using them in preference to ACPI?  This seems a bit odd, since
>> one can presume that the ACPI functions were added at Microsoft's request...
>
>In preference to the ACPI RTC, I haven't yet had an ACPI 5.0 TAD device
>to test in comparison.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH/RFC] user_ns: fix missing limiting of user_ns counts

2012-12-28 Thread Al Viro
On Fri, Dec 28, 2012 at 08:05:32PM -0800, Eric W. Biederman wrote:

> Yes.  Gcc can't turn a tail call into a jump in even the most basic
> cases apparently.

What.  The.  Fuck?

You have introduced unlimited recursion on kernel stack.  OK, it's
unpleasant, but it can happen to anybody.  But then you have the gall
to complain about gcc optimizations not saving your broken code.  Are
you serious?

It's not a question of gcc being not smart enough, and while we are
at it, it's *not* a basic case at all - we have recursion going through
the callback and we have non-trivial return value on top of that; if
the caller of kref_put() had checked said return value, you would've
been unable to turn that into a loop at all.

Are you seriously saying that you relied on compiler being smart enough
to
* notice that return value of kref_put() is ignored
* notice that therefore the return value of kref_sub() is ignored
* notice that you have this call of put_user_ns() calling
kref_put() calling kref_sub() inside the callback we'd passed through the
last two levels of calls and that we are dealing with the tail recursion
here
... so that your code wouldn't have stepped into unlimited recursion?
I sincerely hope that you are not that much of an idiot.  If nothing else,
even if gcc did spot that one, the result would've been extremely brittle -
minor change to kref.h several years down the road and we'd get the problem.


[GIT PULL] ARM: arm-soc OMAP low-prio fixes

2012-12-28 Thread Olof Johansson
Hi Linus,

I'll leave it up to you to decide if you want to pull this or not --
it can wait to 3.9 as well.

This branch came in from Tony at the start of the merge window, and I
wanted to give it a little time in -next (but it got more than expected
due to travel).

It's a mix of lowish priority fixes and a couple of cleanups to split
setup between OMAP1/2+, needed for multiplatform restructuring.

Said multiplatform restructuring is the main reason arm-soc saw so much
churn this release -- i.MX and OMAP platforms were reshuffled quite a
bit to make them ready for multiplatform. i.MX finished, OMAP is nearly
there with just a few more patches needed (for 3.9). It should slow down
again for next release.


Thanks,

-Olof

The following changes since commit 2b8318881ddbcb67c5e8d2178b4228474944:

  Merge tag 'fbdev-for-3.8' of git://gitorious.org/linux-omap-dss2/linux (2012-12-15 13:03:48 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/omap-late-cleanups

for you to fetch changes up to f64d204b5cf21b282445ba2fa357d970028c9f52:

  arch/arm/mach-omap2/dpll3xxx.c: drop if around WARN_ON (2012-12-17 10:50:41 -0800)


ARM: arm-soc: late cleanups for omap

From Tony Lindgren:
Here are a few more patches to finish the omap changes for
multiplatform conversion that are not strictly fixes, but
were too complex to do with the dependencies during the
merge window. Those are the move of serial-omap.h to
platform_data, and the removal of remaining cpu_is_omap
macro usage outside mach-omap2.

Then there are several trivial fixes for typos and a few
minimal omap2plus_defconfig updates.


AnilKumar Chimata (1):
  ARM: OMAP2+: omap2plus_defconfig: Add tps65217 support

Javier Martinez Canillas (2):
  ARM: OMAP2+: enable devtmpfs and devtmpfs automount
  ARM: OMAP2+: omap2plus_defconfig: enable twl4030 SoC audio

Jon Hunter (1):
  ARM: OMAP2+: PMU: Remove unused header

Julia Lawall (1):
  arch/arm/mach-omap2/dpll3xxx.c: drop if around WARN_ON

Peter Ujfalusi (1):
  ARM: OMAP2+: omap_twl: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER

Srinivas Kandagatla (1):
  ARM/omap: use module_platform_driver macro

Tony Lindgren (3):
  MAINTAINERS: Add an entry for omap related .dts files
  ARM: OMAP: Split fb.c to remove last remaining cpu_is_omap usage
  ARM: OMAP2+: Drop plat/cpu.h for omap2plus

Wei Yongjun (1):
  ARM: OMAP4: remove duplicated include from omap_hwmod_44xx_data.c

YOSHIFUJI Hideaki (1):
  OMAP2: Fix a typo - replace regist with register.

 MAINTAINERS|  9 
 arch/arm/configs/omap2plus_defconfig   |  5 ++
 arch/arm/mach-omap1/Makefile   |  2 +-
 arch/arm/mach-omap1/fb.c   | 80 ++
 arch/arm/mach-omap2/Makefile   |  2 +-
 arch/arm/mach-omap2/control.h  |  2 +-
 arch/arm/mach-omap2/dpll3xxx.c |  3 +-
 arch/arm/mach-omap2/drm.c  |  1 -
 arch/arm/mach-omap2/dss-common.c   |  3 +-
 arch/arm/{plat-omap => mach-omap2}/fb.c| 50 +--
 arch/arm/mach-omap2/omap_hwmod_44xx_data.c |  1 -
 arch/arm/mach-omap2/omap_twl.c |  8 +--
 arch/arm/mach-omap2/pmu.c  |  2 -
 arch/arm/mach-omap2/prm2xxx.c  |  3 +-
 arch/arm/mach-omap2/prm3xxx.c  |  3 +-
 arch/arm/plat-omap/Makefile|  2 +-
 arch/arm/plat-omap/dmtimer.c   | 13 +
 arch/arm/plat-omap/include/plat/cpu.h  |  4 --
 drivers/media/platform/omap3isp/isp.c  |  2 -
 drivers/power/avs/smartreflex.c|  2 -
 20 files changed, 109 insertions(+), 88 deletions(-)
 create mode 100644 arch/arm/mach-omap1/fb.c
 rename arch/arm/{plat-omap => mach-omap2}/fb.c (76%)


Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread Matthew Garrett
On Fri, 2012-12-28 at 16:42 -0800, H. Peter Anvin wrote:
> On 12/28/2012 03:39 PM, Matthew Garrett wrote:
> > On Fri, 2012-12-28 at 15:32 -0800, H. Peter Anvin wrote:
> >> No, that is explicitly the ordering that is excludable.
> > 
> > Windows uses the UEFI time functions. The fact that they fail for us on
> > a bunch of systems is just another symptom of our general failure to
> > accurately mimic Windows when making runtime calls, and we should just
> > make sure that that gets fixed rather than layering more workarounds on
> > top. It's an embarrassment that we're still unable to deal with
> > platforms that work fine with another OS. 
> > 
> 
> Are they using them in preference to ACPI?  This seems a bit odd, since
> one can presume that the ACPI functions were added at Microsoft's request...

In preference to the ACPI RTC, I haven't yet had an ACPI 5.0 TAD device
to test in comparison.


[GIT PULL] ARM: arm-soc fixes for 3.8-rc

2012-12-28 Thread Olof Johansson
Hi Linus,

The following changes since commit a49f0d1ea3ec94fc7cf33a7c36a16343b74bd565:

  Linux 3.8-rc1 (2012-12-21 17:19:00 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc.git tags/fixes

for you to fetch changes up to 2e376799b2f6af2e8baeb407b6437a4d7b650ac0:

  Merge tag 'sunxi-fixes-for-3.8-rc2' of git://github.com/mripard/linux into fixes (2012-12-28 08:53:01 +0100)



ARM: arm-soc: fixes for -rc2

It's been quiet over the holidays, but we have had a couple of trivial
fixes coming in for the newly introduced sunxi platform; one to add it
to the multiplatform defconfig for build coverage, and one fixup for
device tree strings.


Maxime Ripard (2):
  ARM: multi_v7_defconfig: Add ARCH_SUNXI
  sunxi: Change the machine compatible string.

Olof Johansson (1):
  Merge tag 'sunxi-fixes-for-3.8-rc2' of git://github.com/mripard/linux into fixes

 arch/arm/configs/multi_v7_defconfig | 1 +
 arch/arm/mach-sunxi/sunxi.c | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)


[PATCH] printk: Fix incorrect length from print_time() when seconds > 99999

2012-12-28 Thread Roland Dreier
From: Roland Dreier 

print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value.  However, this is obviously incorrect when
the whole seconds part of the time gets beyond 5 digits (100000
seconds is a bit more than a day of uptime).

The simple fix is to use snprintf(NULL, 0, ...) to calculate the
actual length of the time prefix.  This could be micro-optimized but
it seems better to have simpler, more readable code here.

The bug leads to the syslog system call miscomputing which messages
fit into the userspace buffer.  If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:

# dmesg
klogctl: Bad address

When this happens, strace shows that the failure is indeed EFAULT due
to the kernel mistakenly accessing past the end of dmesg's buffer,
since dmesg asks the kernel how big a buffer it needs, allocates a bit
more, and then gets an error when it asks the kernel to fill it:

syslog(0xa, 0, 0)   = 1048576
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
syslog(0x3, 0x7fa4d25d2010, 0x18)   = -1 EFAULT (Bad address)

Signed-off-by: Roland Dreier 
---
 kernel/printk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 19c0d7b..357f714 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -870,10 +870,11 @@ static size_t print_time(u64 ts, char *buf)
if (!printk_time)
return 0;
 
+   rem_nsec = do_div(ts, 1000000000);
+
if (!buf)
-   return 15;
+   return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
 
-   rem_nsec = do_div(ts, 1000000000);
return sprintf(buf, "[%5lu.%06lu] ",
   (unsigned long)ts, rem_nsec / 1000);
 }
-- 
1.8.0



Re: [PATCH/RFC] user_ns: fix missing limiting of user_ns counts

2012-12-28 Thread Eric W. Biederman
Vasily Kulikov  writes:

> Currently there is no limit at all on the number of user namespaces
> created by unprivileged users.  One can freely create thousands of
> user_ns'es and exhaust kernel memory without even bumping into
> RLIMIT_NPROC or similar.

First for a proper sense of scale it will take roughly 14,000 to consume
a megabyte.  So it will take hundreds of millions of user namespaces to
eat up all of kernel memory.

That said I have no objections to a patch that implements sysctls for
maximum limits.

> Even more -- it allows user to overflow kernel stack theoretically
> allowing user to overwrite some important kernel data.  The problem is
> that free_user_ns() may also free its parent user_namespace recursively
> calling free_user_ns().  As kernel stack is very limited, it leads to
> kernel stack overflow.

Yes.  Gcc can't turn a tail call into a jump in even the most basic
cases apparently.  So we need to adopt the solution of the pid
namespace.  Patch to follow shortly.

> The code needs several checks.  First, no one should be able to create
> user_ns of arbitrary depth.  Besides kernel stack overflow one could
> create a depth large enough to DoS processes belonging to other users by
> forcing them to loop a long time in cap_capable called from some
> ns_capable() (e.g. in case one does something like "ls -R /proc").

Where do you get a ns_capable call from "ls -R /proc" ?

> Second,
> non-privileged users must not be able to overlimit some count of
> namespaces to not be able to exhaust kernel memory.

> The included patch is a basic fix for both of them.  Both values are
> hardcoded here to 100 max depth and 1000 max in total.  I'm not sure how
> better to make them configurable.  Looks like it needs some sysctl value
> like kernel.max_user_ns_per_user, but also something more configurable
> like new rlimit'ish limit may be created for user_ns needs.  E.g. in
> case root wants one user to contain hundreds of private containers
> (container owner user), but he doesn't want anybody to fill the kernel
> with hundreds of containers multiplied by number of system users (equals
> to thousands).
>
> I'm not sure how it is an approved way for user_ns.  Eric?

A per-user limit for user namespaces is pretty much useless, as it is
expected that many user namespaces will be allocated multiple uids to play
with.  My current target is to modify newuser to allocate 10,000 uids for
each user by default.

Other than a global limit the recommended solution is some kind of
control group.

With that said I am starting to think there may be a good argument for
per userns limits that apply to a user namespace and all of its
children.  But for that to really make sense requires showing that
control groups can't do the job well.   I think there might be a
reasonable argument there.

> A related issue which is NOT FIXED HERE is limits for all resources
> available for containerized pseudo roots.  E.g. I succeeded creating
> thousands of veth network devices without problems by a non-root user,
> there seems no limit in number of network devices.  I suspect it is
> possible to setup routing and net_ns'es the way it will be very
> time-consuming for kernel to handle IP packets inside of ksoftirq, which
> is not counted as this user's scheduler time.   I suppose the issue is not
> veth-specific; almost all code paths newly available to unprivileged users
> are vulnerable to DoS attacks.

veth at least should process packets synchronously so I don't see how
you will get softirq action.  There is also for whatever it is worth
the network memory control group, that should limit networking things.
I haven't had a chance to look how sane it is in practice.

> Signed-off-by: Vasily Kulikov 
> -- 
>  include/linux/sched.h   |3 +++
>  kernel/user_namespace.c |   26 ++
>  2 files changed, 29 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 206bb08..479940e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -706,6 +706,9 @@ struct user_struct {
>  #ifdef CONFIG_EPOLL
>   atomic_long_t epoll_watches; /* The number of file descriptors 
> currently watched */
>  #endif
> +#ifdef CONFIG_USER_NS
> + atomic_t user_namespaces; /* How many user_ns does this user created? */
> +#endif
>  #ifdef CONFIG_POSIX_MQUEUE
>   /* protected by mq_lock */
>   unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 2b042c4..a52c4e8 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -45,6 +45,16 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
>   cred->user_ns = user_ns;
>  }
>  
> +static long get_user_ns_depth(struct user_namespace *ns)
> +{
> + long depth;
> +
> + for (depth = 1; ns != &init_user_ns; ns = ns->parent)
> + depth++;
> +
> + return depth;
> +}
> +
>  /*
>   * Create a new 

[PATCH] perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side

2012-12-28 Thread David Ahern
This patch is brought to you by the letter 'H'.

Commit 20b279 breaks compatibility with older perf binaries when run with
the precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
set. Older binaries default exclude_guest to 0 (i.e., wanting guest-based
samples) unless host-only profiling is requested (:H modifier). The workaround
for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
toggles exclude_guest to 1). This was deemed unacceptable by Linus:

https://lkml.org/lkml/2012/12/12/570

Between family in town and the fresh snow in Breckenridge there is no time left
to be working on the proper fix for this over the holidays. In the New Year I
have more pressing problems to resolve -- like some memory leaks in perf which
are proving to be elusive -- although the aforementioned snow is probably why
they are proving to be elusive. Either way I do not have any spare time to work
on this and from the time I have managed to spend on it the solution is more
difficult than just moving to a new exclude_guest flag (does not work) or
flipping the logic to include_guest (which is not as trivial as one would
think).

So, two options: silently force exclude_guest on as suggested by Gleb which
means no impact to older perf binaries or revert the original patch which
caused the breakage.

This patch does the latter -- reverts the original patch that introduced the
regression. The problem can be revisited in the future as time allows.

Signed-off-By: David Ahern 
Cc: Linus Torvalds 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: Peter Zijlstra 
Cc: Avi Kivity 
Cc: David Ahern 
Cc: Gleb Natapov 
Cc: Robert Richter 
---
 arch/x86/kernel/cpu/perf_event.c |6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4428fd1..6774c17 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -340,9 +340,6 @@ int x86_setup_perfctr(struct perf_event *event)
/* BTS is currently only allowed for user-mode. */
if (!attr->exclude_kernel)
return -EOPNOTSUPP;
-
-   if (!attr->exclude_guest)
-   return -EOPNOTSUPP;
}
 
hwc->config |= config;
@@ -385,9 +382,6 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip) {
int precise = 0;
 
-   if (!event->attr.exclude_guest)
-   return -EOPNOTSUPP;
-
/* Support for constant skid */
if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
precise++;
-- 
1.7.10.1



[PATCH V3] Fix problem with cpufreq_ondemand or cpufreq_conservative

2012-12-28 Thread Larry Finger
Since commit 2aacdff entitled "cpufreq: Move common part from governors
to separate file", whenever the drivers that depend on this new file
(cpufreq_ondemand or cpufreq_conservative) are built as modules, a new
module named cpufreq_governor is created because the Makefile includes
cpufreq_governor.o twice. As drivers/cpufreq/cpufreq_governor.c contains no
MODULE directives, the resulting module has no license specified, which
results in logging of a "module license 'unspecified' taints kernel". In
addition, a number of globals are exported GPL only, and are therefore
not available. This fix establishes a new boolean configuration variable
that forces cpufreq_governor.o to be linked into the kernel whenever
either cpufreq_ondemand or cpufreq_conservative is selected.

Signed-off-by: Larry Finger 
---
V3 changes only Kconfig and Makefile and avoids creating a new module.

Larry

 Kconfig  |5 +
 Makefile |5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

Index: wireless-testing-new/drivers/cpufreq/Kconfig
===
--- wireless-testing-new.orig/drivers/cpufreq/Kconfig
+++ wireless-testing-new/drivers/cpufreq/Kconfig
@@ -20,6 +20,9 @@ if CPU_FREQ
 config CPU_FREQ_TABLE
tristate
 
+config CPU_FREQ_GOVERNOR
+   bool
+
 config CPU_FREQ_STAT
tristate "CPU frequency translation statistics"
select CPU_FREQ_TABLE
@@ -141,6 +144,7 @@ config CPU_FREQ_GOV_USERSPACE
 config CPU_FREQ_GOV_ONDEMAND
tristate "'ondemand' cpufreq policy governor"
select CPU_FREQ_TABLE
+   select CPU_FREQ_GOVERNOR
help
  'ondemand' - This driver adds a dynamic cpufreq policy governor.
  The governor does a periodic polling and 
@@ -159,6 +163,7 @@ config CPU_FREQ_GOV_ONDEMAND
 config CPU_FREQ_GOV_CONSERVATIVE
tristate "'conservative' cpufreq governor"
depends on CPU_FREQ
+   select CPU_FREQ_GOVERNOR
help
  'conservative' - this driver is rather similar to the 'ondemand'
  governor both in its source code and its purpose, the difference is
Index: wireless-testing-new/drivers/cpufreq/Makefile
===
--- wireless-testing-new.orig/drivers/cpufreq/Makefile
+++ wireless-testing-new/drivers/cpufreq/Makefile
@@ -7,8 +7,9 @@ obj-$(CONFIG_CPU_FREQ_STAT)
 obj-$(CONFIG_CPU_FREQ_GOV_PERFORMANCE) += cpufreq_performance.o
 obj-$(CONFIG_CPU_FREQ_GOV_POWERSAVE)   += cpufreq_powersave.o
 obj-$(CONFIG_CPU_FREQ_GOV_USERSPACE)   += cpufreq_userspace.o
-obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND)+= cpufreq_ondemand.o cpufreq_governor.o
-obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE)+= cpufreq_conservative.o cpufreq_governor.o
+obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND)+= cpufreq_ondemand.o
+obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE)+= cpufreq_conservative.o
+obj-$(CONFIG_CPU_FREQ_GOVERNOR)+= cpufreq_governor.o
 
 # CPUfreq cross-arch helpers
 obj-$(CONFIG_CPU_FREQ_TABLE)   += freq_table.o


[PATCH] regulator: s5m8767: Remove max_vol parameter from s5m8767_convert_voltage_to_sel

2012-12-28 Thread Axel Lin
It looks pointless to pass max_vol to s5m8767_convert_voltage_to_sel().
Comparing the selected voltage to desc->max is enough to ensure the selected
voltage is in the supported range.

Signed-off-by: Axel Lin 
---
 drivers/regulator/s5m8767.c |   33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/drivers/regulator/s5m8767.c b/drivers/regulator/s5m8767.c
index eeadf58..aa0ccef 100644
--- a/drivers/regulator/s5m8767.c
+++ b/drivers/regulator/s5m8767.c
@@ -297,16 +297,15 @@ static int s5m8767_get_vsel_reg(int reg_id, struct s5m8767_info *s5m8767)
return reg;
 }
 
-static int s5m8767_convert_voltage_to_sel(
-   const struct sec_voltage_desc *desc,
-   int min_vol, int max_vol)
+static int s5m8767_convert_voltage_to_sel(const struct sec_voltage_desc *desc,
+ int min_vol)
 {
int selector = 0;
 
if (desc == NULL)
return -EINVAL;
 
-   if (max_vol < desc->min || min_vol > desc->max)
+   if (min_vol > desc->max)
return -EINVAL;
 
if (min_vol < desc->min)
@@ -314,7 +313,7 @@ static int s5m8767_convert_voltage_to_sel(
 
selector = DIV_ROUND_UP(min_vol - desc->min, desc->step);
 
-   if (desc->min + desc->step * selector > max_vol)
+   if (desc->min + desc->step * selector > desc->max)
return -EINVAL;
 
return selector;
@@ -544,23 +543,17 @@ static int s5m8767_pmic_probe(struct platform_device *pdev)
s5m8767->opmode = pdata->opmode;
 
buck_init = s5m8767_convert_voltage_to_sel(&buck_voltage_val2,
-   pdata->buck2_init,
-   pdata->buck2_init +
-   buck_voltage_val2.step);
+  pdata->buck2_init);
 
sec_reg_write(s5m8767->iodev, S5M8767_REG_BUCK2DVS2, buck_init);
 
buck_init = s5m8767_convert_voltage_to_sel(&buck_voltage_val2,
-   pdata->buck3_init,
-   pdata->buck3_init +
-   buck_voltage_val2.step);
+  pdata->buck3_init);
 
sec_reg_write(s5m8767->iodev, S5M8767_REG_BUCK3DVS2, buck_init);
 
buck_init = s5m8767_convert_voltage_to_sel(&buck_voltage_val2,
-   pdata->buck4_init,
-   pdata->buck4_init +
-   buck_voltage_val2.step);
+  pdata->buck4_init);
 
sec_reg_write(s5m8767->iodev, S5M8767_REG_BUCK4DVS2, buck_init);
 
@@ -569,27 +562,21 @@ static int s5m8767_pmic_probe(struct platform_device *pdev)
s5m8767->buck2_vol[i] =
s5m8767_convert_voltage_to_sel(
&buck_voltage_val2,
-   pdata->buck2_voltage[i],
-   pdata->buck2_voltage[i] +
-   buck_voltage_val2.step);
+   pdata->buck2_voltage[i]);
}
 
if (s5m8767->buck3_gpiodvs) {
s5m8767->buck3_vol[i] =
s5m8767_convert_voltage_to_sel(
&buck_voltage_val2,
-   pdata->buck3_voltage[i],
-   pdata->buck3_voltage[i] +
-   buck_voltage_val2.step);
+   pdata->buck3_voltage[i]);
}
 
if (s5m8767->buck4_gpiodvs) {
s5m8767->buck4_vol[i] =
s5m8767_convert_voltage_to_sel(
&buck_voltage_val2,
-   pdata->buck4_voltage[i],
-   pdata->buck4_voltage[i] +
-   buck_voltage_val2.step);
+   pdata->buck4_voltage[i]);
}
}
 
-- 
1.7.9.5
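
For reference, a minimal userspace sketch of the conversion as it looks after this change. The struct name voltage_desc, the sample range values, and the clamp of min_vol to desc->min (the hunk above is cut just before that line) are assumptions made for illustration; they are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sec_voltage_desc (values in uV). */
struct voltage_desc {
	int max;
	int min;
	int step;
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Return the lowest selector whose voltage is >= min_vol, or -EINVAL. */
static int convert_voltage_to_sel(const struct voltage_desc *desc, int min_vol)
{
	int selector;

	if (desc == NULL)
		return -EINVAL;
	assert(desc->step > 0);

	/* A request above the top of the range can never be satisfied. */
	if (min_vol > desc->max)
		return -EINVAL;

	/* Assumed: requests below the range are clamped to its bottom. */
	if (min_vol < desc->min)
		min_vol = desc->min;

	selector = DIV_ROUND_UP(min_vol - desc->min, desc->step);

	/* Rounding up may overshoot when max is not on a step boundary. */
	if (desc->min + desc->step * selector > desc->max)
		return -EINVAL;

	return selector;
}
```

With this shape the caller only passes the wanted voltage; the upper bound comes from the descriptor itself, so the probe code no longer has to build an artificial "init + step" window.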





Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread H. Peter Anvin
On 12/28/2012 05:00 PM, joeyli wrote:
> On Fri, 2012-12-28 at 17:43 +, Matthew Garrett wrote:
>> On Sat, 2012-12-29 at 00:26 +0800, Lee, Chun-Yi wrote:
>>> The UEFI time services GetTime(), SetTime(), GetWakeupTime() and
>>> SetWakeupTime() are also supported by non-IA64 architectures with a
>>> UEFI BIOS, e.g. x86.
>>>
>>> This patch changes the RTC_DRV_EFI configuration to depend on EFI rather
>>> than just IA64. It checks the efi_enabled flag, and the efi-rtc driver
>>> should be enabled.
>>
>> In theory, certainly - but do we still have machines that explode if the
>> get_time call is made? We may also want to think about disabling the
>> legacy access to the RTC if the EFI calls are present.
> 
> The legacy get_time access works fine on my test machine; nothing
> explodes. :-)
> We just have a function that wants to expose the timezone information to
> userspace and also store it.
> 

We should indeed save the timezone information if it is available --
either from the ACPI TAD or from the EFI RTC, or even via some
platform-dependent mechanism.  It is important, though, that that is
separate from the order of priority.

-hpa



Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread joeyli
On Fri, 2012-12-28 at 17:43 +, Matthew Garrett wrote:
> On Sat, 2012-12-29 at 00:26 +0800, Lee, Chun-Yi wrote:
> > The UEFI time services GetTime(), SetTime(), GetWakeupTime() and
> > SetWakeupTime() are also supported by non-IA64 architectures with a
> > UEFI BIOS, e.g. x86.
> > 
> > This patch changes the RTC_DRV_EFI configuration to depend on EFI rather
> > than just IA64. It checks the efi_enabled flag, and the efi-rtc driver
> > should be enabled.
> 
> In theory, certainly - but do we still have machines that explode if the
> get_time call is made? We may also want to think about disabling the
> legacy access to the RTC if the EFI calls are present.

The legacy get_time access works fine on my test machine; nothing
explodes. :-)
We just have a function that wants to expose the timezone information to
userspace and also store it.


Thanks a lot!
Joey Lee




Re: [PATCH] Fix problem with cpufreq_pndemand or cpufreq_conservative

2012-12-28 Thread Larry Finger

On 12/28/2012 06:33 PM, Fabio Baltieri wrote:
> On Fri, Dec 28, 2012 at 05:45:54PM -0600, Larry Finger wrote:
>>> I wonder if that's avoidable?  The intention is not to create an additional
>>> module, clearly.
>>
>> It appears not to be possible. I don't know enough about kmake to
>> understand why it is forcing a new module. Perhaps some expert knows
>> what Kconfig or Makefile magic will prevent that.
>
> kbuild is building an additional module just because the makefile is
> adding the new objects in the obj-m list directly, as in:
>
> obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND) += cpufreq_ondemand.o cpufreq_governor.o
> obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE) += cpufreq_conservative.o cpufreq_governor.o
>
> To build just two modules the Makefile would have to be modified [1]
> into something like:
>
> obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND) += cpufreq_ondemand_mod.o
> cpufreq_ondemand_mod-y  := cpufreq_ondemand.o cpufreq_governor.o
> obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE) += cpufreq_conservative_mod.o
> cpufreq_conservative_mod-y  := cpufreq_conservative.o cpufreq_governor.o
>
> so that only two .o are added to obj-m, but that's not correct either as
> you end up with cpufreq_governor symbols exported twice.
>
> I think the only way would be to force cpufreq_governor as builtin with
> an automatic Kconfig option.
>
> Fabio
>
> 1. http://lxr.linux.no/#linux+v3.7.1/Documentation/kbuild/makefiles.txt#L191

Fabio,

Thanks for the explanation. Now I think I know how to do it.

V3 follows.

Larry




Re: [PATCH V7 2/7] ARM: dt: change .dtb build rules to build in dts directory

2012-12-28 Thread Stephen Warren
On 12/26/2012 07:27 AM, Shawn Guo wrote:
> On Tue, Nov 27, 2012 at 04:29:11PM -0700, Stephen Warren wrote:
>> From: Grant Likely 
>>
>> The current rules have the .dtb files build in a different directory
>> from the .dts files. The only reason for this is that it was what
>> PowerPC has done historically. This patch changes ARM to use the generic
>> dtb rule which builds .dtb files in the same directory as the source .dts.
>
> It's a pity that after merging the patch, all the enabled dts files
> will be rebuilt anyway no matter whether they are actually changed
> or not.

Oops. I thought I'd tested that quite extensively, but I somehow missed
something. I've sent patches to fix this.



Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread H. Peter Anvin
On 12/28/2012 03:39 PM, Matthew Garrett wrote:
> On Fri, 2012-12-28 at 15:32 -0800, H. Peter Anvin wrote:
>> No, that is explicitly the ordering that is excludable.
> 
> Windows uses the UEFI time functions. The fact that they fail for us on
> a bunch of systems is just another symptom of our general failure to
> accurately mimic Windows when making runtime calls, and we should just
> make sure that that gets fixed rather than layering more workarounds on
> top. It's an embarrassment that we're still unable to deal with
> platforms that work fine with another OS. 
> 

Are they using them in preference to ACPI?  This seems a bit odd, since
one can presume that the ACPI functions were added at Microsoft's request...

-hpa



[PATCH 2/2] arm64: dts: prevent *.dtb from always being rebuilt

2012-12-28 Thread Stephen Warren
From: Stephen Warren 

if_changed (used by the *.dts->*.dtb rule) rebuilds files if they aren't
contained in $(targets). (make V=2 indicates this). Add $(dtb-y) to
$(targets) to prevent *.dtb from always being rebuilt.

This fixes a regression introduced by the .dtb rule rework in da4cbc6
"arm64: use new common dtc rule", although since arm64 doesn't actually
have any *.dts yet, this isn't a critical issue.

Signed-off-by: Stephen Warren 
---
 arch/arm64/boot/dts/Makefile |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/Makefile b/arch/arm64/boot/dts/Makefile
index 801e2d7..32ac0ae 100644
--- a/arch/arm64/boot/dts/Makefile
+++ b/arch/arm64/boot/dts/Makefile
@@ -1,4 +1,5 @@
 targets += dtbs
+targets += $(dtb-y)
 
 dtbs: $(addprefix $(obj)/, $(dtb-y))
 
-- 
1.7.10.4



[PATCH 1/2] ARM: dts: prevent *.dtb from always being rebuilt

2012-12-28 Thread Stephen Warren
From: Stephen Warren 

if_changed (used by the *.dts->*.dtb rule) rebuilds files if they aren't
contained in $(targets). (make V=2 indicates this). Add $(dtb-y) to
$(targets) to prevent *.dtb from always being rebuilt.

This fixes a regression introduced by the .dtb rule rework in 499cd82
"ARM: dt: change .dtb build rules to build in dts directory".

Reported-by: Shawn Guo 
Signed-off-by: Stephen Warren 
---
 arch/arm/boot/dts/Makefile |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index e44da40..5ebb44f 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -155,6 +155,7 @@ dtb-$(CONFIG_ARCH_VT8500) += vt8500-bv07.dtb \
 dtb-$(CONFIG_ARCH_ZYNQ) += zynq-zc702.dtb
 
 targets += dtbs
+targets += $(dtb-y)
 endif
 
 # *.dtb used to be generated in the directory above. Clean out the
-- 
1.7.10.4



Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-28 Thread Eric Dumazet
On Thu, 2012-12-27 at 14:31 -0500, Rik van Riel wrote:
> to use a bigger/smaller one.
> 
> I guess we want a larger value.
> 
> With your hashed lock approach, we can get away with
> larger values - they will not penalize other locks
> the same way a single value per cpu might have.

Then, we absolutely want to detect hash collisions to clear the (wrong)
estimation or else we might 'pollute' a spinlock with a delay of a very
slow spinlock.

In my tests, the mm zone lock can be held for very long for example...

[  657.439995] cpu 18 lock 88067fffeb40 delay 6906
[  657.444855] [ cut here ]
[  657.444859] WARNING: at arch/x86/kernel/smp.c:170
ticket_spin_lock_wait+0xf9/0x100()
[  657.444860] Hardware name: TBG,ICH10
[  657.444861] Modules linked in: msr cpuid genrtc mlx4_en ib_uverbs
mlx4_ib ib_sa ib_mad ib_core mlx4_core e1000e bnx2x libcrc32c mdio ipv6
[  657.444871] Pid: 24942, comm: hotplug Tainted: GW
3.8.0-smp-DEV #31
[  657.444872] Call Trace:
[  657.444876]  [] warn_slowpath_common+0x7f/0xc0
[  657.444878]  [] warn_slowpath_null+0x1a/0x20
[  657.444881]  [] ticket_spin_lock_wait+0xf9/0x100
[  657.444885]  [] _raw_spin_lock_irqsave+0x2f/0x40
[  657.444887]  [] release_pages+0x160/0x220
[  657.444891]  [] free_pages_and_swap_cache+0x9e/0xc0
[  657.444893]  [] ? flush_tlb_mm_range+0x48/0x220
[  657.444896]  [] tlb_flush_mmu+0x67/0xb0
[  657.444898]  [] tlb_finish_mmu+0x1c/0x50
[  657.444900]  [] exit_mmap+0xf6/0x170
[  657.444903]  [] mmput+0x47/0xf0
[  657.444906]  [] do_exit+0x24d/0xa20
[  657.444908]  [] ? recalc_sigpending+0x1f/0x60
[  657.444910]  [] ? __set_task_blocked+0x37/0x80
[  657.444913]  [] do_group_exit+0x44/0xa0
[  657.444915]  [] sys_exit_group+0x17/0x20
[  657.444918]  [] sysenter_dispatch+0x7/0x1a
[  657.444920] ---[ end trace a460fe18a5578dda ]---

My current function looks like :

 /*
+ * Wait on a congested ticket spinlock.
+ */
+#define MIN_SPINLOCK_DELAY 1
+#define MAX_SPINLOCK_DELAY 1
+#define DELAY_HASH_SHIFT 6
+struct delay_entry {
+   u32 hash;
+   u32 delay;
+};
+static DEFINE_PER_CPU(struct delay_entry [1 << DELAY_HASH_SHIFT], spinlock_delay) = {
+   [0 ... (1 << DELAY_HASH_SHIFT) - 1] = {
+   .hash = 0,
+   .delay = MIN_SPINLOCK_DELAY,
+   },
+};
+static DEFINE_PER_CPU(u16, maxdelay);
+
+void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
+{
+   u32 hash = hash32_ptr(lock);
+   u32 slot = hash_32(hash, DELAY_HASH_SHIFT);
+   struct delay_entry *ent = &__get_cpu_var(spinlock_delay[slot]);
+   u32 delay = (ent->hash == hash) ? ent->delay : MIN_SPINLOCK_DELAY;
+
+   for (;;) {
+   u32 loops = delay * (__ticket_t)(inc.tail - inc.head);
+
+   loops -= delay >> 1;
+   while (loops--)
+   cpu_relax();
+
+   inc.head = ACCESS_ONCE(lock->tickets.head);
+
+   if (inc.head == inc.tail) {
+   /* Decrease the delay, since we may have overslept. */
+   if (delay > MIN_SPINLOCK_DELAY)
+   delay--;
+   break;
+   }
+
+   /*
+* The lock is still busy, the delay was not long enough.
+* Going through here 2.7 times will, on average, cancel
+* out the decrement above. Using a non-integer number
+* gets rid of performance artifacts and reduces oversleeping.
+*/
+   if (delay < MAX_SPINLOCK_DELAY &&
+   ((inc.head & 3) == 0 || (inc.head & 7) == 1))
+   delay++;
+   }
+   if (__this_cpu_read(maxdelay) < delay) {
+   pr_err("cpu %d lock %p delay %d\n", smp_processor_id(), lock, delay);
+   __this_cpu_write(maxdelay, delay);
+   WARN_ON(1);
+   }
+   ent->hash = hash;
+   ent->delay = delay;
+}
+
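
To make the collision handling easier to reason about outside the kernel, here is a small userspace sketch of just the table logic. It is an illustration under assumptions, not the patch: slot_of() is a hypothetical stand-in for hash_32(), MIN_DELAY is an arbitrary lower bound, and the per-cpu storage is replaced by a plain array passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

#define DELAY_HASH_SHIFT 6
#define DELAY_SLOTS (1u << DELAY_HASH_SHIFT)
#define MIN_DELAY 1u	/* illustrative lower bound */

struct delay_entry {
	uint32_t hash;	/* full hash of the lock that owns this slot */
	uint32_t delay;	/* last delay factor learned for that lock */
};

/* Hypothetical stand-in for hash_32(): multiply, keep the top bits. */
static uint32_t slot_of(uint32_t hash)
{
	return (hash * 0x9E3779B1u) >> (32 - DELAY_HASH_SHIFT);
}

/*
 * Reuse the stored delay only if the slot was last written for the same
 * lock. On a collision (or an unused slot) start again from MIN_DELAY so
 * one slow lock does not pollute the estimate for another.
 */
static uint32_t delay_load(const struct delay_entry *table, uint32_t hash)
{
	const struct delay_entry *ent = &table[slot_of(hash)];

	if (ent->hash == hash && ent->delay >= MIN_DELAY)
		return ent->delay;
	return MIN_DELAY;
}

/* Record the delay for this lock, evicting whatever shared the slot. */
static void delay_store(struct delay_entry *table, uint32_t hash,
			uint32_t delay)
{
	struct delay_entry *ent = &table[slot_of(hash)];

	assert(delay >= MIN_DELAY);
	ent->hash = hash;
	ent->delay = delay;
}

/*
 * True for 3 of every 8 consecutive head values: (head & 3) == 0 matches
 * 0 and 4, (head & 7) == 1 matches 1. That is one increment per 8/3
 * (about 2.7) passes on average.
 */
static int should_increase(uint32_t head)
{
	return (head & 3) == 0 || (head & 7) == 1;
}
```

The point of keeping the full hash next to the delay is that a lookup for a different lock landing in the same slot falls back to the minimum instead of inheriting a large delay.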




Re: [PATCH/RFC 4/4] common: dma-mapping: Move dma_common_*() to

2012-12-28 Thread Mauro Carvalho Chehab
Em Fri, 28 Dec 2012 20:23:34 +0100
Geert Uytterhoeven  escreveu:

> dma_common_mmap() and dma_common_get_sgtable() are defined in
> drivers/base/dma-mapping.c, and always compiled if CONFIG_HAS_DMA=y.
> 
> However, their forward declarations and the inline functions defined on top
> of them (dma_mmap_attrs(), dma_mmap_coherent(), dma_mmap_writecombine(),
> dma_get_sgtable_attrs(), dma_get_sgtable()) are in
> <asm-generic/dma-mapping-common.h>, which is not included by all
> architectures supporting CONFIG_HAS_DMA=y.  There exist no alternative
> implementations.
> 
> Hence for e.g. m68k allmodconfig, I get:
> 
> drivers/media/v4l2-core/videobuf2-dma-contig.c: In function ‘vb2_dc_mmap’:
> drivers/media/v4l2-core/videobuf2-dma-contig.c:204: error: implicit declaration of function ‘dma_mmap_coherent’
> drivers/media/v4l2-core/videobuf2-dma-contig.c: In function ‘vb2_dc_get_base_sgt’:
> drivers/media/v4l2-core/videobuf2-dma-contig.c:387: error: implicit declaration of function ‘dma_get_sgtable’
> 
> To fix this
>   - Move the forward declarations and inline definitions to
> <linux/dma-mapping.h>, so all CONFIG_HAS_DMA=y architectures can use
> them,
>   - Replace the hard "BUG_ON(!ops)" checks for dma_map_ops by soft checks,
> so architectures can fall back to the common code by returning NULL
> from their get_dma_ops(). Note that there are no "BUG_ON(!ops)" checks
> in other functions in <asm-generic/dma-mapping-common.h>,
>   - Make "struct dma_map_ops *ops" const while we're at it.
> 
> Signed-off-by: Geert Uytterhoeven 

From my side:

Acked-by: Mauro Carvalho Chehab 
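
For readers skimming the thread, the "soft check" described in the quoted changelog can be pictured with the userspace sketch below. Everything in it is a simplified assumption: the one-member struct dma_map_ops, the helper names common_mmap(), arch_mmap() and mmap_attrs(), and the return values are made up for illustration and do not mirror the real kernel signatures.

```c
#include <assert.h>
#include <stddef.h>

struct device;	/* opaque in this sketch */

/* Hypothetical, trimmed-down stand-in for struct dma_map_ops. */
struct dma_map_ops {
	int (*mmap)(struct device *dev, void *cpu_addr, size_t size);
};

/* Stand-in for dma_common_mmap(): the generic fallback, returns 0 here. */
static int common_mmap(struct device *dev, void *cpu_addr, size_t size)
{
	(void)dev;
	(void)cpu_addr;
	(void)size;
	return 0;
}

/* Example architecture hook, returns 1 so the two paths can be told apart. */
static int arch_mmap(struct device *dev, void *cpu_addr, size_t size)
{
	(void)dev;
	(void)cpu_addr;
	(void)size;
	return 1;
}

/*
 * Soft check instead of BUG_ON(!ops): an architecture whose get_dma_ops()
 * returns NULL, or whose ops lack an mmap hook, ends up in the common code.
 */
static int mmap_attrs(const struct dma_map_ops *ops, struct device *dev,
		      void *cpu_addr, size_t size)
{
	assert(size > 0);
	if (ops && ops->mmap)
		return ops->mmap(dev, cpu_addr, size);
	return common_mmap(dev, cpu_addr, size);
}
```

Passing NULL ops, or ops without the hook, takes the common path; only a populated hook overrides it.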

> ---
>  include/asm-generic/dma-mapping-common.h |   55 --
>  include/linux/dma-mapping.h  |   54 +
>  2 files changed, 54 insertions(+), 55 deletions(-)
> 
> diff --git a/include/asm-generic/dma-mapping-common.h b/include/asm-generic/dma-mapping-common.h
> index de8bf89..2e248d8 100644
> --- a/include/asm-generic/dma-mapping-common.h
> +++ b/include/asm-generic/dma-mapping-common.h
> @@ -176,59 +176,4 @@ dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
>  #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, NULL)
>  #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, NULL)
>  
> -extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma,
> -void *cpu_addr, dma_addr_t dma_addr, size_t size);
> -
> -/**
> - * dma_mmap_attrs - map a coherent DMA allocation into user space
> - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
> - * @vma: vm_area_struct describing requested user mapping
> - * @cpu_addr: kernel CPU-view address returned from dma_alloc_attrs
> - * @handle: device-view address returned from dma_alloc_attrs
> - * @size: size of memory originally requested in dma_alloc_attrs
> - * @attrs: attributes of mapping properties requested in dma_alloc_attrs
> - *
> - * Map a coherent DMA buffer previously allocated by dma_alloc_attrs
> - * into user space.  The coherent DMA buffer must not be freed by the
> - * driver until the user space mapping has been released.
> - */
> -static inline int
> -dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr,
> -dma_addr_t dma_addr, size_t size, struct dma_attrs *attrs)
> -{
> - struct dma_map_ops *ops = get_dma_ops(dev);
> - BUG_ON(!ops);
> - if (ops->mmap)
> - return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
> - return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
> -}
> -
> -#define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, NULL)
> -
> -static inline int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma,
> -   void *cpu_addr, dma_addr_t dma_addr, size_t size)
> -{
> - DEFINE_DMA_ATTRS(attrs);
> - dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
> - return dma_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, &attrs);
> -}
> -
> -int
> -dma_common_get_sgtable(struct device *dev, struct sg_table *sgt,
> -void *cpu_addr, dma_addr_t dma_addr, size_t size);
> -
> -static inline int
> -dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr,
> -   dma_addr_t dma_addr, size_t size, struct dma_attrs *attrs)
> -{
> - struct dma_map_ops *ops = get_dma_ops(dev);
> - BUG_ON(!ops);
> - if (ops->get_sgtable)
> - return ops->get_sgtable(dev, sgt, cpu_addr, dma_addr, size,
> - attrs);
> - return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, size);
> -}
> -
> -#define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, NULL)
> -
>  #endif
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 94af418..4b47150 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -74,6 +74,60 @@ static inline int is_device_dma_capable(struct device *dev)
>  
>  #ifdef 

Re: [PATCH] Fix problem with cpufreq_pndemand or cpufreq_conservative

2012-12-28 Thread Fabio Baltieri
On Fri, Dec 28, 2012 at 05:45:54PM -0600, Larry Finger wrote:
> >I wonder if that's avoidable?  The intention is not to create an additional
> >module, clearly.
> 
> It appears not to be possible. I don't know enough about kmake to
> understand why it is forcing a new module. Perhaps some expert knows
> what Kconfig or Makefile magic will prevent that.

kbuild is building an additional module just because the makefile is
adding the new objects in the obj-m list directly, as in:

obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND) += cpufreq_ondemand.o cpufreq_governor.o
obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE) += cpufreq_conservative.o cpufreq_governor.o

To build just two modules the Makefile would have to be modified [1]
into something like:

obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND) += cpufreq_ondemand_mod.o
cpufreq_ondemand_mod-y  := cpufreq_ondemand.o cpufreq_governor.o
obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE) += cpufreq_conservative_mod.o
cpufreq_conservative_mod-y  := cpufreq_conservative.o cpufreq_governor.o

so that only two .o are added to obj-m, but that's not correct either as
you end up with cpufreq_governor symbols exported twice.

I think the only way would be to force cpufreq_governor as builtin with
an automatic Kconfig option.

Fabio

1. http://lxr.linux.no/#linux+v3.7.1/Documentation/kbuild/makefiles.txt#L191

-- 
Fabio Baltieri


[PATCH V2] Fix problem with cpufreq_ondemand or cpufreq_conservative

2012-12-28 Thread Larry Finger
Since commit 2aacdff entitled "cpufreq: Move common part from governors to
separate file", whenever the drivers that depend on this new file
(cpufreq_ondemand or cpufreq_conservative) are built as modules, a new module
named cpufreq_governor is created. It seems that kmake is smart enough to create
a separate module whenever more than one module includes the same object file.
As drivers/cpufreq/cpufreq_governor.c contains no MODULE directives, the
resulting module has no license specified, which results in logging of a "module
license 'unspecified' taints kernel". In addition, a number of globals are
exported GPL only, and are therefore not available.


Signed-off-by: Larry Finger 
---

V2 is the more complicated version that makes clear what is happening.

Larry
---

 Kconfig|5 +
 Makefile   |5 +++--
 cpufreq_governor.c |5 +
 3 files changed, 13 insertions(+), 2 deletions(-)
---

Index: wireless-testing-new/drivers/cpufreq/Makefile
===
--- wireless-testing-new.orig/drivers/cpufreq/Makefile
+++ wireless-testing-new/drivers/cpufreq/Makefile
@@ -7,8 +7,9 @@ obj-$(CONFIG_CPU_FREQ_STAT)
 obj-$(CONFIG_CPU_FREQ_GOV_PERFORMANCE) += cpufreq_performance.o
 obj-$(CONFIG_CPU_FREQ_GOV_POWERSAVE)   += cpufreq_powersave.o
 obj-$(CONFIG_CPU_FREQ_GOV_USERSPACE)   += cpufreq_userspace.o
-obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND)+= cpufreq_ondemand.o cpufreq_governor.o
-obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE)+= cpufreq_conservative.o cpufreq_governor.o
+obj-$(CONFIG_CPU_FREQ_GOV_ONDEMAND)+= cpufreq_ondemand.o
+obj-$(CONFIG_CPU_FREQ_GOV_CONSERVATIVE)+= cpufreq_conservative.o
+obj-$(CONFIG_CPU_FREQ_GOVERNOR)+= cpufreq_governor.o

 # CPUfreq cross-arch helpers
 obj-$(CONFIG_CPU_FREQ_TABLE)   += freq_table.o
Index: wireless-testing-new/drivers/cpufreq/Kconfig
===
--- wireless-testing-new.orig/drivers/cpufreq/Kconfig
+++ wireless-testing-new/drivers/cpufreq/Kconfig
@@ -20,6 +20,9 @@ if CPU_FREQ
 config CPU_FREQ_TABLE
tristate

+config CPU_FREQ_GOVERNOR
+   tristate
+
 config CPU_FREQ_STAT
tristate "CPU frequency translation statistics"
select CPU_FREQ_TABLE
@@ -141,6 +144,7 @@ config CPU_FREQ_GOV_USERSPACE
 config CPU_FREQ_GOV_ONDEMAND
tristate "'ondemand' cpufreq policy governor"
select CPU_FREQ_TABLE
+   select CPU_FREQ_GOVERNOR
help
  'ondemand' - This driver adds a dynamic cpufreq policy governor.
  The governor does a periodic polling and
@@ -159,6 +163,7 @@ config CPU_FREQ_GOV_ONDEMAND
 config CPU_FREQ_GOV_CONSERVATIVE
tristate "'conservative' cpufreq governor"
depends on CPU_FREQ
+   select CPU_FREQ_GOVERNOR
help
  'conservative' - this driver is rather similar to the 'ondemand'
  governor both in its source code and its purpose, the difference is
Index: wireless-testing-new/drivers/cpufreq/cpufreq_governor.c
===
--- wireless-testing-new.orig/drivers/cpufreq/cpufreq_governor.c
+++ wireless-testing-new/drivers/cpufreq/cpufreq_governor.c
@@ -316,3 +316,8 @@ second_time:
return 0;
 }
 EXPORT_SYMBOL_GPL(cpufreq_governor_dbs);
+
+MODULE_AUTHOR("Alexander Clouter ");
+MODULE_DESCRIPTION("'cpufreq_governor' - A mini-module containing common code "
+  "for cpufreq_conservative and cpufreq_ondemand");
+MODULE_LICENSE("GPL");



Re: [PATCH v1] lib: cpu_rmap: avoid flushing all workqueues

2012-12-28 Thread Josh Triplett
On Fri, Dec 28, 2012 at 10:18:11AM -0800, David Decotigny wrote:
> Thank you, Josh,
> 
> A few comments below, and the revised version shortly.

Responses below.

> On Thu, Dec 27, 2012 at 8:04 PM, Josh Triplett  wrote:
> > On Thu, Dec 27, 2012 at 11:24:34AM -0800, David Decotigny wrote:
> >> In some cases, free_irq_cpu_rmap() is called while holding a lock
> >> (eg. rtnl). This can lead to deadlocks, because it invokes
> >> flush_scheduled_work() which ends up waiting for whole system
> >> workqueue to flush, but some pending works might try to acquire the
> >> lock we are already holding.
> >>
> >> This commit uses reference-counting to replace
> >> irq_run_affinity_notifiers(). It also removes
> >> irq_run_affinity_notifiers() altogether.
> >>
> >> Signed-off-by: David Decotigny 
> >
> > A couple of comments below; with those addressed,
> > Reviewed-by: Josh Triplett 
> >
> >> --- a/lib/cpu_rmap.c
> >> +++ b/lib/cpu_rmap.c
> >> @@ -230,16 +256,23 @@ irq_cpu_rmap_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
> >>   pr_warning("irq_cpu_rmap_notify: update failed: %d\n", rc);
> >>  }
> >>
> >> +/**
> >> + * irq_cpu_rmap_release - reclaiming callback for IRQ subsystem
> >> + * @ref: kref to struct irq_affinity_notify passed by irq/manage.c
> >> + */
> >>  static void irq_cpu_rmap_release(struct kref *ref)
> >>  {
> >>   struct irq_glue *glue =
> >>   container_of(ref, struct irq_glue, notify.kref);
> >> + struct cpu_rmap *rmap = glue->rmap;
> >> +
> >>   kfree(glue);
> >> + kref_put(&rmap->refcount, reclaim_cpu_rmap);
> >
> > Likewise, but also, why not call free_cpu_rmap(glue->rmap) before
> > kfree(glue) so you don't need the local copy?
> 
> I prefer to keep this kref_put here. I believe that calling something
> named "free_cpu_rmap" here might be misleading. It's code sharing vs.
> what we actually need to do, even though both are equivalent... for
> now.

If calling something named free_cpu_rmap feels wrong here, perhaps you
should call it cpu_rmap_put or cpu_rmap_unref or similar instead, since
it doesn't actually free unless the refcount goes to zero.  Then you
could have irq_cpu_rmap_release calling cpu_rmap_put, which feels more
natural.  But in any case, I think you should avoid having multiple
instances of the full call to kref_put on a cpu_rmap.

> For the order, it was deliberate, to have some kind of symmetry with
> kfree/kref_put in the error path we have in next function
> (irq_cpu_rmap_add). I reversed the order in that next function to
> avoid this unneeded local variable here. New ordering makes more sense
> anyways.

Ah, I see; makes sense to me.

> >>  }
> >>
> >>  /**
> >>   * irq_cpu_rmap_add - add an IRQ to a CPU affinity reverse-map
> >> - * @rmap: The reverse-map
> >> + * @rmap: The per-IRQ reverse-map
> >>   * @irq: The IRQ number
> >>   *
> >>   * This adds an IRQ affinity notifier that will update the reverse-map
> >> @@ -259,9 +292,12 @@ int irq_cpu_rmap_add(struct cpu_rmap *rmap, int irq)
> >>   glue->notify.release = irq_cpu_rmap_release;
> >>   glue->rmap = rmap;
> >>   glue->index = cpu_rmap_add(rmap, glue);
> >> + kref_get(&rmap->refcount);
> >>   rc = irq_set_affinity_notifier(irq, &glue->notify);
> >> - if (rc)
> >> + if (rc) {
> >>   kfree(glue);
> >> + kref_put(&rmap->refcount, reclaim_cpu_rmap);
> >
> > Likewise.
> 
> I prefer to leave the explicit kref_put here too.

In this case, for symmetry with kref_get?

Would it help to add a cpu_rmap_get, along with cpu_rmap_put?

static inline struct cpu_rmap *cpu_rmap_get(struct cpu_rmap *rmap)
{
	kref_get(&rmap->refcount);
	return rmap;
}

...
glue->rmap = cpu_rmap_get(rmap);
...
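
To show how the get/put pairing would read, here is a userspace mock. It is only a sketch under assumptions: a plain int stands in for struct kref (so it is not atomic), mock_rmap_alloc() and the rmap_freed counter are hypothetical helpers for the demonstration, and cpu_rmap_put() returning 1 on the final put is a choice made here, not something the patch defines.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace mock: a plain counter stands in for struct kref. */
struct cpu_rmap {
	int refcount;
};

static int rmap_freed;	/* counts final releases, demonstration only */

/* Hypothetical allocator: the creator holds the first reference. */
static struct cpu_rmap *mock_rmap_alloc(void)
{
	struct cpu_rmap *rmap = calloc(1, sizeof(*rmap));

	if (rmap)
		rmap->refcount = 1;
	return rmap;
}

/* Take a reference and hand the pointer back, so it can be assigned. */
static struct cpu_rmap *cpu_rmap_get(struct cpu_rmap *rmap)
{
	assert(rmap->refcount > 0);
	rmap->refcount++;
	return rmap;
}

/* Drop a reference; returns 1 if this was the last one and rmap is gone. */
static int cpu_rmap_put(struct cpu_rmap *rmap)
{
	assert(rmap->refcount > 0);
	if (--rmap->refcount)
		return 0;
	free(rmap);
	rmap_freed++;
	return 1;
}
```

With helpers like these, both the error path in irq_cpu_rmap_add and the release callback would call cpu_rmap_put() rather than repeating the full kref_put() call.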

> Next version soon, after some re-testing.

Thanks.

- Josh Triplett


Re: [PATCH] Fix problem with cpufreq_pndemand or cpufreq_conservative

2012-12-28 Thread Larry Finger

On 12/28/2012 05:01 PM, Rafael J. Wysocki wrote:

On Friday, December 28, 2012 04:17:24 PM Larry Finger wrote:

Since commit 2aacdff entitled "cpufreq: Move common part from governors to
separate file", whenever the drivers that depend on this new file
(cpufreq_ondemand or cpufreq_conservative) are built as modules, a new module
named cpufreq_governor is created. It seems that kmake is smart enough to create
a separate module whenever more than one module includes the same object file.
As drivers/cpufreq/cpufreq_governor.c contains no MODULE directives, the
resulting module has no license specified, which results in logging of a "module
license 'unspecified' taints kernel". In addition, a number of globals are
exported GPL only, and are therefore not available.

Signed-off-by: Larry Finger 
---

This particular patch is the simplest possible; however, it hides the intent. I
have prepared the longer version that makes the reason clearer by adding a new
configuration variable that is dependent on the other two, and rearranges
drivers/cpufreq/Makefile. That version could be submitted if that is what is
desired.


Yes, please.


I'll send it shortly.


The changes to cpufreq_governor.c are the same as in this version.


I wonder if that's avoidable?  The intention is not to create an additional
module, clearly.


It appears not to be possible. I don't know enough about kmake to understand 
why it is forcing a new module. Perhaps some expert knows what Kconfig or 
Makefile magic will prevent that.


Larry





Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread H. Peter Anvin
On 12/28/2012 12:49 PM, Matthew Garrett wrote:
> On Fri, 2012-12-28 at 12:40 -0800, H. Peter Anvin wrote:
> 
>>> I suspect that what we *should* do looks like:
>>>
>>> 1. If ACPI exports a Time and Alarm Device (ACPI000E) then use it;
>>> 2. If ACPI exports a PC/AT device (PNP0B00/1/2) then use it(*);
>>> 3. If we have an EFI RTC use it;
>>> 4. Probe for a PC/AT RTC device.
> 
> In terms of ordering, 3 should probably come before 2 - but that depends
> on us actually fixing the issues that are preventing some of these calls
> from working. As far as wallclock time goes, EFI is going to be
> available to us before we've parsed the DSDT to determine whether
> there's any ACPI devices, so we'll almost certainly end up having to use
> it at at least some point during boot. Otherwise, agreed.
> 

[Resending due to misconfigured tablet]

3 before 2 is exactly what is known to break on existing hardware (ASUS).

If anything, we should move to using the EFI RTC as a very last resort,
i.e. 3 after 4.  Ideally we should match Win8 behavior, but that would
require someone mocking up different ACPI and EFI functions in a
simulator and see how Win8 prioritizes things.
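
As a rough sketch of what "EFI as a very last resort" would mean in code, assuming a simple first-match priority table. The enum names, the HAVE() bitmask and pick_rtc() are hypothetical and exist only for this illustration; the real probing would live in ACPI and platform code.

```c
#include <assert.h>
#include <stddef.h>

enum rtc_source {
	RTC_NONE,
	RTC_ACPI_TAD,		/* ACPI Time and Alarm Device (ACPI000E) */
	RTC_ACPI_PCAT,		/* PC/AT RTC exported by ACPI (PNP0B00/1/2) */
	RTC_PCAT_PROBED,	/* PC/AT RTC found by probing */
	RTC_EFI,		/* EFI runtime time services */
};

/* Bit flag saying that a given source was reported as present. */
#define HAVE(src) (1u << (src))

/* Priority order with EFI moved to the end ("3 after 4"). */
static const enum rtc_source rtc_priority[] = {
	RTC_ACPI_TAD, RTC_ACPI_PCAT, RTC_PCAT_PROBED, RTC_EFI,
};

/* Return the first available source in priority order, or RTC_NONE. */
static enum rtc_source pick_rtc(unsigned int available)
{
	size_t i;

	assert(!(available & HAVE(RTC_NONE)));
	for (i = 0; i < sizeof(rtc_priority) / sizeof(rtc_priority[0]); i++)
		if (available & HAVE(rtc_priority[i]))
			return rtc_priority[i];
	return RTC_NONE;
}
```

Saving the timezone information would stay a separate step from this selection, whichever source ends up being picked.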

-hpa



Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread Matthew Garrett
On Fri, 2012-12-28 at 15:32 -0800, H. Peter Anvin wrote:
> No, that is explicitly the ordering that is excludable.

Windows uses the UEFI time functions. The fact that they fail for us on
a bunch of systems is just another symptom of our general failure to
accurately mimic Windows when making runtime calls, and we should just
make sure that that gets fixed rather than layering more workarounds on
top. It's an embarrassment that we're still unable to deal with
platforms that work fine with another OS. 

Re: [PATCH V2] forcedeth: Fix WARNINGS that result when DMA mapping is not checked

2012-12-28 Thread David Miller
From: Larry Finger 
Date: Thu, 27 Dec 2012 21:25:41 -0600

> With 3.8-rc1, the first call of pci_map_single() that is not checked
> with a corresponding pci_dma_mapping_error() call results in a warning
> with a splat as follows:
> 
> WARNING: at lib/dma-debug.c:933 check_unmap+0x480/0x950()
> Hardware name: HP Pavilion dv2700 Notebook PC
> forcedeth :00:0a.0: DMA-API: device driver failed to check
>  map error[device address=0xb176e002] [size=90 bytes] [mapped as single]
> 
> Reviewed-by: Eric Dumazet 
> Signed-off-by: Larry Finger 

Applied, thanks Larry.


Re: [PATCH] scripts/tags.sh: Fix regex syntax for etags

2012-12-28 Thread Jesper Juhl
On Sat, 29 Dec 2012, Andreas Schwab wrote:

> Signed-off-by: Andreas Schwab 
> ---
>  scripts/tags.sh | 50 +-
>  1 file changed, 25 insertions(+), 25 deletions(-)
> 
> diff --git a/scripts/tags.sh b/scripts/tags.sh
> index 08f06c0..4c53b7d 100755
> --- a/scripts/tags.sh
> +++ b/scripts/tags.sh
> @@ -201,34 +201,34 @@ exuberant()
>  emacs()
>  {
>   all_target_sources | xargs $1 -a\
> - --regex='/^(ENTRY|_GLOBAL)(\([^)]*\)).*/\2/'\
> + --regex='/^\(ENTRY\|_GLOBAL\)(\([^)]*\)).*/\2/' \
>   --regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/'   \
>   --regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/'  \
>   --regex='/^DEFINE_EVENT([^,)]*, *\([^,)]*\).*/trace_\1/' \
> - --regex='/PAGEFLAG\(([^,)]*).*/Page\1/' \
> - --regex='/PAGEFLAG\(([^,)]*).*/SetPage\1/'  \
> - --regex='/PAGEFLAG\(([^,)]*).*/ClearPage\1/'\
> - --regex='/TESTSETFLAG\(([^,)]*).*/TestSetPage\1/'   \
> - --regex='/TESTPAGEFLAG\(([^,)]*).*/Page\1/' \
> - --regex='/SETPAGEFLAG\(([^,)]*).*/SetPage\1/'   \
> - --regex='/__SETPAGEFLAG\(([^,)]*).*/__SetPage\1/'   \
> - --regex='/TESTCLEARFLAG\(([^,)]*).*/TestClearPage\1/'   \
> - --regex='/__TESTCLEARFLAG\(([^,)]*).*/TestClearPage\1/' \
> - --regex='/CLEARPAGEFLAG\(([^,)]*).*/ClearPage\1/'   \
> - --regex='/__CLEARPAGEFLAG\(([^,)]*).*/__ClearPage\1/'   \
> - --regex='/__PAGEFLAG\(([^,)]*).*/__SetPage\1/'  \
> - --regex='/__PAGEFLAG\(([^,)]*).*/__ClearPage\1/'\
> - --regex='/PAGEFLAG_FALSE\(([^,)]*).*/Page\1/'   \
> - --regex='/TESTSCFLAG\(([^,)]*).*/TestSetPage\1/'\
> - --regex='/TESTSCFLAG\(([^,)]*).*/TestClearPage\1/'  \
> - --regex='/SETPAGEFLAG_NOOP\(([^,)]*).*/SetPage\1/'  \
> - --regex='/CLEARPAGEFLAG_NOOP\(([^,)]*).*/ClearPage\1/'  \
> - --regex='/__CLEARPAGEFLAG_NOOP\(([^,)]*).*/__ClearPage\1/' \
> - --regex='/TESTCLEARFLAG_FALSE\(([^,)]*).*/TestClearPage\1/' \
> - --regex='/__TESTCLEARFLAG_FALSE\(([^,)]*).*/__TestClearPage\1/' \
> - --regex='/_PE\(([^,)]*).*/PEVENT_ERRNO__\1/'\
> - --regex='/PCI_OP_READ\(([a-z]*[a-z]).*[1-4]\)/pci_bus_read_config_\1/' \
> - --regex='/PCI_OP_WRITE\(([a-z]*[a-z]).*[1-4]\)/pci_bus_write_config_\1/'
> + --regex='/PAGEFLAG(\([^,)]*\).*/Page\1/'\
> + --regex='/PAGEFLAG(\([^,)]*\).*/SetPage\1/' \
> + --regex='/PAGEFLAG(\([^,)]*\).*/ClearPage\1/'   \
> + --regex='/TESTSETFLAG(\([^,)]*\).*/TestSetPage\1/'  \
> + --regex='/TESTPAGEFLAG(\([^,)]*\).*/Page\1/'\
> + --regex='/SETPAGEFLAG(\([^,)]*\).*/SetPage\1/'  \
> + --regex='/__SETPAGEFLAG(\([^,)]*\).*/__SetPage\1/'  \
> + --regex='/TESTCLEARFLAG(\([^,)]*\).*/TestClearPage\1/'  \
> + --regex='/__TESTCLEARFLAG(\([^,)]*\).*/TestClearPage\1/'\
> + --regex='/CLEARPAGEFLAG(\([^,)]*\).*/ClearPage\1/'  \
> + --regex='/__CLEARPAGEFLAG(\([^,)]*\).*/__ClearPage\1/'  \
> + --regex='/__PAGEFLAG(\([^,)]*\).*/__SetPage\1/' \
> + --regex='/__PAGEFLAG(\([^,)]*\).*/__ClearPage\1/'   \
> + --regex='/PAGEFLAG_FALSE(\([^,)]*\).*/Page\1/'  \
> + --regex='/TESTSCFLAG(\([^,)]*\).*/TestSetPage\1/'   \
> + --regex='/TESTSCFLAG(\([^,)]*\).*/TestClearPage\1/' \
> + --regex='/SETPAGEFLAG_NOOP(\([^,)]*\).*/SetPage\1/' \
> + --regex='/CLEARPAGEFLAG_NOOP(\([^,)]*\).*/ClearPage\1/' \
> + --regex='/__CLEARPAGEFLAG_NOOP(\([^,)]*\).*/__ClearPage\1/' \
> + --regex='/TESTCLEARFLAG_FALSE(\([^,)]*\).*/TestClearPage\1/' \
> + --regex='/__TESTCLEARFLAG_FALSE(\([^,)]*\).*/__TestClearPage\1/' \
> + --regex='/_PE(\([^,)]*\).*/PEVENT_ERRNO__\1/'   \
> + --regex='/PCI_OP_READ(\([a-z]*[a-z]\).*[1-4])/pci_bus_read_config_\1/' \
> + --regex='/PCI_OP_WRITE(\([a-z]*[a-z]\).*[1-4])/pci_bus_write_config_\1/'
>  
>   all_kconfigs | xargs $1 -a  \
>   --regex='/^[ \t]*\(\(menu\)*config\)[ \t]+\([a-zA-Z0-9_]+\)/\3/'
> 

This works for me. Thanks.

Tested-by: Jesper Juhl 


-- 
Jesper Juhl    http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.



[PATCH 18.5/24] blkcg: export __blkg_prfill_rwstat() take#2

2012-12-28 Thread Tejun Heo
Hierarchical stats for cfq-iosched will need __blkg_prfill_rwstat().
Export it.

Signed-off-by: Tejun Heo 
Reported-by: Fengguang Wu 
---
Fengguang's build test discovered that cfq now uses
__blkg_prfill_rwstat() which isn't exported leading to build failures
when cfq is built as a module.  Export it.  This doesn't affect
!module builds.  Git branch updated accordingly.

Thanks.

 block/blk-cgroup.c |1 +
 1 file changed, 1 insertion(+)

--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -552,6 +552,7 @@ u64 __blkg_prfill_rwstat(struct seq_file
seq_printf(sf, "%s Total %llu\n", dname, (unsigned long long)v);
return v;
 }
+EXPORT_SYMBOL_GPL(__blkg_prfill_rwstat);
 
 /**
  * blkg_prfill_stat - prfill callback for blkg_stat


[PATCH] scripts/tags.sh: Fix regex syntax for etags

2012-12-28 Thread Andreas Schwab
Signed-off-by: Andreas Schwab 
---
 scripts/tags.sh | 50 +-
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/scripts/tags.sh b/scripts/tags.sh
index 08f06c0..4c53b7d 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -201,34 +201,34 @@ exuberant()
 emacs()
 {
all_target_sources | xargs $1 -a\
-   --regex='/^(ENTRY|_GLOBAL)(\([^)]*\)).*/\2/'\
+   --regex='/^\(ENTRY\|_GLOBAL\)(\([^)]*\)).*/\2/' \
--regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/'   \
--regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/'  \
--regex='/^DEFINE_EVENT([^,)]*, *\([^,)]*\).*/trace_\1/' \
-   --regex='/PAGEFLAG\(([^,)]*).*/Page\1/' \
-   --regex='/PAGEFLAG\(([^,)]*).*/SetPage\1/'  \
-   --regex='/PAGEFLAG\(([^,)]*).*/ClearPage\1/'\
-   --regex='/TESTSETFLAG\(([^,)]*).*/TestSetPage\1/'   \
-   --regex='/TESTPAGEFLAG\(([^,)]*).*/Page\1/' \
-   --regex='/SETPAGEFLAG\(([^,)]*).*/SetPage\1/'   \
-   --regex='/__SETPAGEFLAG\(([^,)]*).*/__SetPage\1/'   \
-   --regex='/TESTCLEARFLAG\(([^,)]*).*/TestClearPage\1/'   \
-   --regex='/__TESTCLEARFLAG\(([^,)]*).*/TestClearPage\1/' \
-   --regex='/CLEARPAGEFLAG\(([^,)]*).*/ClearPage\1/'   \
-   --regex='/__CLEARPAGEFLAG\(([^,)]*).*/__ClearPage\1/'   \
-   --regex='/__PAGEFLAG\(([^,)]*).*/__SetPage\1/'  \
-   --regex='/__PAGEFLAG\(([^,)]*).*/__ClearPage\1/'\
-   --regex='/PAGEFLAG_FALSE\(([^,)]*).*/Page\1/'   \
-   --regex='/TESTSCFLAG\(([^,)]*).*/TestSetPage\1/'\
-   --regex='/TESTSCFLAG\(([^,)]*).*/TestClearPage\1/'  \
-   --regex='/SETPAGEFLAG_NOOP\(([^,)]*).*/SetPage\1/'  \
-   --regex='/CLEARPAGEFLAG_NOOP\(([^,)]*).*/ClearPage\1/'  \
-   --regex='/__CLEARPAGEFLAG_NOOP\(([^,)]*).*/__ClearPage\1/' \
-   --regex='/TESTCLEARFLAG_FALSE\(([^,)]*).*/TestClearPage\1/' \
-   --regex='/__TESTCLEARFLAG_FALSE\(([^,)]*).*/__TestClearPage\1/' \
-   --regex='/_PE\(([^,)]*).*/PEVENT_ERRNO__\1/'\
-   --regex='/PCI_OP_READ\(([a-z]*[a-z]).*[1-4]\)/pci_bus_read_config_\1/' \
-   --regex='/PCI_OP_WRITE\(([a-z]*[a-z]).*[1-4]\)/pci_bus_write_config_\1/'
+   --regex='/PAGEFLAG(\([^,)]*\).*/Page\1/'\
+   --regex='/PAGEFLAG(\([^,)]*\).*/SetPage\1/' \
+   --regex='/PAGEFLAG(\([^,)]*\).*/ClearPage\1/'   \
+   --regex='/TESTSETFLAG(\([^,)]*\).*/TestSetPage\1/'  \
+   --regex='/TESTPAGEFLAG(\([^,)]*\).*/Page\1/'\
+   --regex='/SETPAGEFLAG(\([^,)]*\).*/SetPage\1/'  \
+   --regex='/__SETPAGEFLAG(\([^,)]*\).*/__SetPage\1/'  \
+   --regex='/TESTCLEARFLAG(\([^,)]*\).*/TestClearPage\1/'  \
+   --regex='/__TESTCLEARFLAG(\([^,)]*\).*/TestClearPage\1/'\
+   --regex='/CLEARPAGEFLAG(\([^,)]*\).*/ClearPage\1/'  \
+   --regex='/__CLEARPAGEFLAG(\([^,)]*\).*/__ClearPage\1/'  \
+   --regex='/__PAGEFLAG(\([^,)]*\).*/__SetPage\1/' \
+   --regex='/__PAGEFLAG(\([^,)]*\).*/__ClearPage\1/'   \
+   --regex='/PAGEFLAG_FALSE(\([^,)]*\).*/Page\1/'  \
+   --regex='/TESTSCFLAG(\([^,)]*\).*/TestSetPage\1/'   \
+   --regex='/TESTSCFLAG(\([^,)]*\).*/TestClearPage\1/' \
+   --regex='/SETPAGEFLAG_NOOP(\([^,)]*\).*/SetPage\1/' \
+   --regex='/CLEARPAGEFLAG_NOOP(\([^,)]*\).*/ClearPage\1/' \
+   --regex='/__CLEARPAGEFLAG_NOOP(\([^,)]*\).*/__ClearPage\1/' \
+   --regex='/TESTCLEARFLAG_FALSE(\([^,)]*\).*/TestClearPage\1/' \
+   --regex='/__TESTCLEARFLAG_FALSE(\([^,)]*\).*/__TestClearPage\1/' \
+   --regex='/_PE(\([^,)]*\).*/PEVENT_ERRNO__\1/'   \
+   --regex='/PCI_OP_READ(\([a-z]*[a-z]\).*[1-4])/pci_bus_read_config_\1/' \
+   --regex='/PCI_OP_WRITE(\([a-z]*[a-z]\).*[1-4])/pci_bus_write_config_\1/'
 
all_kconfigs | xargs $1 -a  \
--regex='/^[ \t]*\(\(menu\)*config\)[ \t]+\([a-zA-Z0-9_]+\)/\3/'
-- 
1.8.0.3
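
For context, a small userspace sketch of why the old patterns fail.
etags --regex takes Emacs-style regular expressions, in which a bare (
is a literal parenthesis and \( ... \) delimits a group.  POSIX basic
regular expressions follow the same convention for parentheses, so
regcomp(3) without REG_EXTENDED is used as a stand-in here.  That
substitution is an assumption made for illustration only (the two
syntaxes differ elsewhere, for example in how ?, + and \| are treated),
and the helper names are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if pattern compiles as a POSIX basic regex, 0 otherwise. */
static int bre_compiles(const char *pattern)
{
	regex_t re;

	if (regcomp(&re, pattern, 0) != 0)
		return 0;
	regfree(&re);
	return 1;
}

/*
 * Match line against pattern and copy the text of group 1 into out.
 * Returns 1 on success, 0 if the pattern does not compile, does not
 * match, or the captured text does not fit into out.
 */
static int bre_capture(const char *pattern, const char *line,
		       char *out, size_t outlen)
{
	regex_t re;
	regmatch_t m[2];
	int ok = 0;

	assert(outlen > 0);
	if (regcomp(&re, pattern, 0) != 0)
		return 0;
	if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so >= 0) {
		size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);

		if (len < outlen) {
			memcpy(out, line + m[1].rm_so, len);
			out[len] = '\0';
			ok = 1;
		}
	}
	regfree(&re);
	return ok;
}
```

The old form, written as the C string "PAGEFLAG\\(([^,)]*).*", opens a
group with \( and never closes it, so bre_compiles() rejects it.  The
new form, "PAGEFLAG(\\([^,)]*\\).*", treats the first ( as a literal and
captures the first macro argument, e.g. "Locked" from
"PAGEFLAG(Locked, locked)".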


-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH] Fix problem with cpufreq_ondemand or cpufreq_conservative

2012-12-28 Thread Rafael J. Wysocki
On Friday, December 28, 2012 04:17:24 PM Larry Finger wrote:
> Since commit 2aacdff entitled "cpufreq: Move common part from governors to
> separate file", whenever the drivers that depend on this new file
> (cpufreq_ondemand or cpufreq_conservative) are built as modules, a new module
> named cpufreq_governor is created. It seems that kmake is smart enough to
> create a separate module whenever more than one module includes the same
> object file. As drivers/cpufreq/cpufreq_governor.c contains no MODULE
> directives, the resulting module has no license specified, which results in
> logging of a "module license 'unspecified' taints kernel". In addition, a
> number of globals are exported GPL only, and are therefore not available.
> 
> Signed-off-by: Larry Finger 
> ---
> 
> This particular patch is the simplest possible; however, it hides the intent.
> I have prepared the longer version that makes the reason clearer by adding a
> new configuration variable that is dependent on the other two, and rearranges
> drivers/cpufreq/Makefile. That version could be submitted if that is what is
> desired.

Yes, please.

> The changes to cpufreq_governor.c are the same as in this version.

I wonder if that's avoidable?  The intention is not to create an additional
module, clearly.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: how to look for source code in kernel

2012-12-28 Thread Jesper Juhl
On Fri, 28 Dec 2012, Guennadi Liakhovetski wrote:

> On Fri, 28 Dec 2012, Borislav Petkov wrote:
> 
> > On Thu, Dec 27, 2012 at 11:36:13PM -0800, Eric W. Biederman wrote:
> > > git-ls-files | xargs fgrep 'struct f2fs_inode'
> > > 
> > > That returns instantly and tells me where to look.  If you can do an
> > > instant brute force search what is the point of an index?
> > 
> > Not if you're using a lame-ass laptop with a rotating disk:
> > 
> > $ time git ls-files | xargs grep -E 'struct mce\W*{'
> > arch/x86/include/uapi/asm/mce.h:struct mce {
> > arch/x86/kernel/cpu/mcheck/mce.c:   if (!final || memcmp(m, final, sizeof(struct mce))) {
> > 
> > real    2m48.415s
> > user    0m2.388s
> > sys     0m15.668s
> > 
> > What I've grown accustomed to is cscope with a prior find run on the
> > kernel source tree to create a custom cscope.files which cscope uses to
> > index and then using vim bindings in cscope so that if, for example, the
> > cursor is on a function call, executing a keyboard shortcut opens the
> > definition of that function in another vim tab. I.e., a thin IDE done
> > right.
> 
> ...same works with TAGS under emacs, the only annoying problem with it 
> since "recently" (about half a year or more) is that "make TAGS" is 
> spewing out tons of
> 
> etags: Unmatched ( or \( while compiling pattern
> 
> oh well, looks like no one is using it / no one cares enough...
> 
I use it. I haven't bothered (yet) to look into the cause of the message 
it spews since the etags ('TAGS') file it produces still seems to work 
well enough.

-- 
Jesper Juhl    http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.



Re: how to look for source code in kernel

2012-12-28 Thread Guennadi Liakhovetski
On Fri, 28 Dec 2012, Borislav Petkov wrote:

> On Thu, Dec 27, 2012 at 11:36:13PM -0800, Eric W. Biederman wrote:
> > git-ls-files | xargs fgrep 'struct f2fs_inode'
> > 
> > That returns instantly and tells me where to look.  If you can do an
> > instant brute force search what is the point of an index?
> 
> Not if you're using a lame-ass laptop with a rotating disk:
> 
> $ time git ls-files | xargs grep -E 'struct mce\W*{'
> arch/x86/include/uapi/asm/mce.h:struct mce {
> arch/x86/kernel/cpu/mcheck/mce.c:   if (!final || memcmp(m, final, sizeof(struct mce))) {
> 
> real    2m48.415s
> user    0m2.388s
> sys     0m15.668s
> 
> What I've grown accustomed to is cscope with a prior find run on the
> kernel source tree to create a custom cscope.files which cscope uses to
> index and then using vim bindings in cscope so that if, for example, the
> cursor is on a function call, executing a keyboard shortcut opens the
> definition of that function in another vim tab. I.e., a thin IDE done
> right.

...same works with TAGS under emacs, the only annoying problem with it 
since "recently" (about half a year or more) is that "make TAGS" is 
spewing out tons of

etags: Unmatched ( or \( while compiling pattern

oh well, looks like no one is using it / no one cares enough...

Thanks
Guennadi

> 
> > My experience with gui editors is that they always hide something I
> > need to see, or my code is just strange enough (say having asm file,
> > or supporting multiple architectures) that the tools get horribly
> > confused.
> 
> That's true, then I tend to use another xterm with tabbed vim showing
> additional files. Btw, git ls-files assumes a source file is tracked by
> git and in the seldom case where you're adding new, yet untracked files,
> that won't work.
> 
> So probably a mixed approach of cscope in one window and grep + editor
> in another would cover all bases. For a newbie who wants to only browse
> the code, cscope should be enough for starters, I'd say.
> 
> Thanks.
> 
> -- 
> Regards/Gruss,
> Boris.
> 

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/


Re: [PATCH] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED

2012-12-28 Thread Christoffer Dall
On Fri, Dec 28, 2012 at 5:01 PM, Linus Torvalds
 wrote:
> On Fri, Dec 28, 2012 at 9:07 AM,   wrote:
>> From: Christoffer Dall 
>>
>> Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and
>> (PageHead) is true, for tail pages.  This breaks cache cleaning on some
>> ARM systems, and may cause other bugs.
>
> So this already got committed earlier as commit ad4b3fb7ff99 ("mm: Fix
> PageHead when !CONFIG_PAGEFLAGS_EXTENDED")
>
> Linus

Sorry about the noise then, thanks.


[PATCH] Fix problem with cpufreq_ondemand or cpufreq_conservative

2012-12-28 Thread Larry Finger
Since commit 2aacdff entitled "cpufreq: Move common part from governors to 
separate file", whenever the drivers that depend on this new file 
(cpufreq_ondemand or cpufreq_conservative) are built as modules, a new module 
named cpufreq_governor is created. It seems that kmake is smart enough to create 
a separate module whenever more than one module includes the same object file. 
As drivers/cpufreq/cpufreq_governor.c contains no MODULE directives, the 
resulting module has no license specified, which results in logging of a "module 
license 'unspecified' taints kernel". In addition, a number of globals are 
exported GPL only, and are therefore not available.


Signed-off-by: Larry Finger 
---

This particular patch is the simplest possible; however, it hides the intent. I 
have prepared the longer version that makes the reason clearer by adding a new 
configuration variable that is dependent on the other two, and rearranges 
drivers/cpufreq/Makefile. That version could be submitted if that is what is 
desired. The changes to cpufreq_governor.c are the same as in this version.


Larry


  cpufreq_governor.c |5 +
  1 file changed, 5 insertions(+)
---
Index: wireless-testing-new/drivers/cpufreq/cpufreq_governor.c
===
--- wireless-testing-new.orig/drivers/cpufreq/cpufreq_governor.c
+++ wireless-testing-new/drivers/cpufreq/cpufreq_governor.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "cpufreq_governor.h"

@@ -316,3 +317,7 @@ second_time:
return 0;
 }
 EXPORT_SYMBOL_GPL(cpufreq_governor_dbs);
+MODULE_AUTHOR("Alexander Clouter ");
+MODULE_DESCRIPTION("'cpufreq_governor' - A mini-module containing "
+   "common code for cpufreq_conservative and cpufreq_ondemand");
+MODULE_LICENSE("GPL");








Re: [PATCH] mn10300: select generic atomic64_t support

2012-12-28 Thread Geert Uytterhoeven
On Tue, Aug 14, 2012 at 5:17 PM, Fengguang Wu  wrote:
> Rationale from Eric:
>
> So I just looked a little deeper and it appears architectures that do
> not support atomic64_t are broken.
>
> The generic atomic64 support came in 2009 to support the perf subsystem
> with the expectation that all architectures would implement atomic64
> support.
>
> Furthermore upon inspection of the kernel atomic64_t is used in a fair
> number of places beyond the performance counters:
>
> block/blk-cgroup.c
> drivers/acpi/apei/
> drivers/block/rbd.c
> drivers/crypto/nx/nx.h
> drivers/gpu/drm/radeon/radeon.h
> drivers/infiniband/hw/ipath/
> drivers/infiniband/hw/qib/
> drivers/staging/octeon/
> fs/xfs/
> include/linux/perf_event.h
> include/net/netfilter/nf_conntrack_acct.h
> kernel/events/
> kernel/trace/
> net/mac80211/key.h
> net/rds/
>
> The block control group, infiniband, xfs, crypto, 802.11, netfilter.
> Nothing quite so fundamental as fs/namespace.c but definitely in
> multiplatform-code that should work, and is already broken on those
> architectures.
>
> Looking at the implementation of atomic64_add_return in lib/atomic64.c
> the code looks as efficient as these kinds of things get.
>
> Which leads me to the conclusion that we need atomic64 support on all
> architectures.
>
> CC: "Eric W. Biederman" 
> Signed-off-by: Fengguang Wu 

Acked-by: Geert Uytterhoeven 

> ---
>  arch/mn10300/Kconfig |1 +
>  1 file changed, 1 insertion(+)
>
> --- linux.orig/arch/mn10300/Kconfig 2012-08-03 15:36:21.391560778 +0800
> +++ linux/arch/mn10300/Kconfig  2012-08-14 22:53:14.751795832 +0800
> @@ -8,6 +8,7 @@ config MN10300
> select HAVE_ARCH_KGDB
> select HAVE_NMI_WATCHDOG if MN10300_WD_TIMER
> select GENERIC_CLOCKEVENTS
> +   select GENERIC_ATOMIC64
>
>  config AM33_2
> def_bool n

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
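
A userspace sketch of the idea behind the generic fallback discussed in
this message, assuming (as lib/atomic64.c does) that each 64-bit counter
is protected by one of a small set of locks chosen by hashing the
counter's address.  The names, the lock count and the shift below are
hypothetical; the kernel uses raw spinlocks with interrupts disabled
rather than the C11 atomic_flag spin loop shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_LOCKS 4	/* hypothetical; must be a power of two */

typedef struct {
	int64_t counter;
} sketch_atomic64_t;

/* One flag per lock; a set flag means the lock is held. */
static atomic_flag sketch_locks[SKETCH_NR_LOCKS] = {
	ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Hash the counter's address to pick the lock that guards it. */
static atomic_flag *sketch_lock_addr(const sketch_atomic64_t *v)
{
	uintptr_t addr = (uintptr_t)v;

	addr >>= 6;	/* ignore the offset within a 64-byte line */
	return &sketch_locks[addr & (SKETCH_NR_LOCKS - 1)];
}

/* Add a to *v under the hashed lock and return the new value. */
static int64_t sketch_atomic64_add_return(int64_t a, sketch_atomic64_t *v)
{
	atomic_flag *lock;
	int64_t val;

	assert(v != NULL);
	lock = sketch_lock_addr(v);

	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the lock is free */
	v->counter += a;
	val = v->counter;
	atomic_flag_clear_explicit(lock, memory_order_release);

	return val;
}
```

The point of the hashing is that unrelated counters rarely contend on
the same lock, while an architecture without native 64-bit atomics still
gets a correct read-modify-write.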


Re: how to look for source code in kernel

2012-12-28 Thread Geert Uytterhoeven
On Fri, Dec 28, 2012 at 2:07 PM, Borislav Petkov  wrote:
> On Thu, Dec 27, 2012 at 11:36:13PM -0800, Eric W. Biederman wrote:
>> git-ls-files | xargs fgrep 'struct f2fs_inode'
>>
>> That returns instantly and tells me where to look.  If you can do an
>> instant brute force search what is the point of an index?
>
> Not if you're using a lame-ass laptop with a rotating disk:
>
> $ time git ls-files | xargs grep -E 'struct mce\W*{'
> arch/x86/include/uapi/asm/mce.h:struct mce {
> arch/x86/kernel/cpu/mcheck/mce.c:   if (!final || memcmp(m, final, sizeof(struct mce))) {
>
> real    2m48.415s
> user    0m2.388s
> sys     0m15.668s

That's the first run. Now everything is in the buffer cache (assuming you have
enough RAM), so try again...

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] mm: Fix PageHead when !CONFIG_PAGEFLAGS_EXTENDED

2012-12-28 Thread Linus Torvalds
On Fri, Dec 28, 2012 at 9:07 AM,   wrote:
> From: Christoffer Dall 
>
> Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and
> (PageHead) is true, for tail pages.  This breaks cache cleaning on some
> ARM systems, and may cause other bugs.

So this already got committed earlier as commit ad4b3fb7ff99 ("mm: Fix
PageHead when !CONFIG_PAGEFLAGS_EXTENDED")

Linus


Re: [PATCH v2] lib: cpu_rmap: avoid flushing all workqueues

2012-12-28 Thread Eric Dumazet
On Fri, 2012-12-28 at 13:44 -0800, David Decotigny wrote:
> Thanks,
> 
> Ok for the cpu_rmap_put helper. Will do this in v3 of this patch.
> 
> Your comments suggest more refactoring, which might be better in the
> form of 1 or 2 additional patches that:
>  - rename alloc_cpu_rmap & co according to new conventions (cpu_rmap_*)
>  - remove the cpu_rmap sub-API altogether, keeping only irq_cpu_rmap
> (controversial?)
> 
> I'd rather the current patch be integrated on its own, as it fixes an actual bug.
> I will send the other patch(es) separately later.

This looks fine to me !




Re: [PATCH 23/25] video/exynos: don't use [delayed_]work_pending()

2012-12-28 Thread 'Tejun Heo'
On Wed, Dec 26, 2012 at 01:04:02PM +0900, Jingoo Han wrote:
> On Saturday, December 22, 2012 12:06 PM, Kukjin Kim wrote 
> > Tejun Heo wrote:
> > >
> > > There's no need to test whether a (delayed) work item is pending
> > > before queueing, flushing or cancelling it.  Most uses are unnecessary
> > > and quite a few of them are buggy.
> > >
> > > Remove unnecessary pending tests from exynos_dp_core.  Only compile
> > > tested.
> > >
> > > Signed-off-by: Tejun Heo 
> > > Cc: Kukjin Kim 
> > 
> > Acked-by: Kukjin Kim 
> 
> Acked-by: Jingoo Han 
> 
> I tested this patch with Exynos5250.
> Also, taking it through the workqueue tree would be better.

Applied to wq/for-3.9-cleanups.  Thanks!

-- 
tejun


Re: [PATCH] mips: Export min_low_pfn if CONFIG_FLATMEM

2012-12-28 Thread Ralf Baechle
On Fri, Dec 28, 2012 at 09:40:56PM +0100, Geert Uytterhoeven wrote:

> Subject: [PATCH] mips: Export min_low_pfn if CONFIG_FLATMEM
> 
> If CONFIG_FLATMEM=y on MIPS, pfn_valid() (and thus virt_addr_valid()) uses
> min_low_pfn. However, min_low_pfn is not exported by the generic bootmem
> code.
> 
> As of commit e52a29326462badd9ceec90a9eb2ac2a8550e02e ("aoe: avoid races
> between device destruction and discovery"), aoeblk_open() uses
> virt_addr_valid(), causing a link error in the modular case:
> 
> ERROR: "min_low_pfn" [drivers/block/aoe/aoe.ko] undefined!
> 
> Add a custom export to fix this, just like is done for ia64 in
> arch/ia64/kernel/ia64_ksyms.c.
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
> http://kisskb.ellerman.id.au/kisskb/buildresult/7864918/
> 
>  arch/mips/kernel/mips_ksyms.c |5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/mips/kernel/mips_ksyms.c b/arch/mips/kernel/mips_ksyms.c
> index df1e3e4..8579bb8 100644
> --- a/arch/mips/kernel/mips_ksyms.c
> +++ b/arch/mips/kernel/mips_ksyms.c
> @@ -60,3 +60,8 @@ EXPORT_SYMBOL(invalid_pte_table);
>  /* _mcount is defined in arch/mips/kernel/mcount.S */
>  EXPORT_SYMBOL(_mcount);
>  #endif
> +
> +#ifdef CONFIG_FLATMEM
> +#include 
> +EXPORT_SYMBOL(min_low_pfn);  /* defined by bootmem.c, but not exported by generic code */
> +#endif

I've already applied a different patch which turns virt_addr_valid() into
a normal function and exports it.

Thanks,

  Ralf


Re: [PATCH v2] lib: cpu_rmap: avoid flushing all workqueues

2012-12-28 Thread David Decotigny
Thanks,

Ok for the cpu_rmap_put helper. Will do this in v3 of this patch.

Your comments suggest more refactoring, which might be better in the
form of 1 or 2 additional patches that:
 - rename alloc_cpu_rmap & co according to new conventions (cpu_rmap_*)
 - remove the cpu_rmap sub-API altogether, keeping only irq_cpu_rmap
(controversial?)

I'd rather the current patch be integrated on its own, as it fixes an actual bug.
I will send the other patch(es) separately later.

v3 for this patch coming soon.
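
For readers following along, a userspace sketch of the reference-counting
pattern the patch below relies on: the object is freed by whoever drops
the last reference, so nobody has to flush a workqueue to be sure that
no other user is left.  This is a simplified stand-in for the kernel's
kref, written with C11 atomics; all sketch_* names are hypothetical, and
which kernel paths take and drop references is not shown in the quoted
excerpt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for struct kref. */
struct sketch_ref {
	atomic_int count;
};

static void sketch_ref_init(struct sketch_ref *r)
{
	atomic_init(&r->count, 1);	/* the creator holds the first reference */
}

static void sketch_ref_get(struct sketch_ref *r)
{
	int old = atomic_fetch_add(&r->count, 1);

	assert(old > 0);	/* taking a reference on a dead object is a bug */
	(void)old;
}

/* Drop one reference; returns 1 if this call released the object. */
static int sketch_ref_put(struct sketch_ref *r,
			  void (*release)(struct sketch_ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Example object embedding the reference count, like struct cpu_rmap. */
struct sketch_rmap {
	struct sketch_ref refcount;
	unsigned int size;
};

static int sketch_rmap_freed;	/* counts releases, for demonstration only */

static void sketch_rmap_release(struct sketch_ref *r)
{
	/* Userspace equivalent of container_of(). */
	struct sketch_rmap *rmap = (struct sketch_rmap *)
		((char *)r - offsetof(struct sketch_rmap, refcount));

	free(rmap);
	sketch_rmap_freed++;
}

static struct sketch_rmap *sketch_rmap_alloc(unsigned int size)
{
	struct sketch_rmap *rmap = calloc(1, sizeof(*rmap));

	if (!rmap)
		return NULL;
	sketch_ref_init(&rmap->refcount);
	rmap->size = size;
	return rmap;
}
```

A second user (for instance a pending notifier) would call
sketch_ref_get() before using the object and sketch_ref_put() when done;
the owner's own put then frees the object only if it is the last one.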

On Fri, Dec 28, 2012 at 1:14 PM, Eric Dumazet  wrote:
> On Fri, 2012-12-28 at 11:03 -0800, David Decotigny wrote:
>> In some cases, free_irq_cpu_rmap() is called while holding a lock
>> (eg. rtnl). This can lead to deadlocks, because it invokes
>> flush_scheduled_work() which ends up waiting for whole system
>> workqueue to flush, but some pending works might try to acquire the
>> lock we are already holding.
>>
>> This commit uses reference-counting to replace
>> irq_run_affinity_notifiers(). It also removes
>> irq_run_affinity_notifiers() altogether.
>>
>> Signed-off-by: David Decotigny 
>> ---
>>  include/linux/cpu_rmap.h  |   13 -
>>  include/linux/interrupt.h |5 -
>>  lib/cpu_rmap.c            |   47 +++--
>>  3 files changed, 45 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
>> index ac3bbb5..3be2813 100644
>> --- a/include/linux/cpu_rmap.h
>> +++ b/include/linux/cpu_rmap.h
>> @@ -13,9 +13,11 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  /**
>>   * struct cpu_rmap - CPU affinity reverse-map
>> + * @refcount: kref for object
>>   * @size: Number of objects to be reverse-mapped
>>   * @used: Number of objects added
>>   * @obj: Pointer to array of object pointers
>> @@ -23,6 +25,7 @@
>>   *  based on affinity masks
>>   */
>>  struct cpu_rmap {
>> + struct kref refcount;
>>   u16 size, used;
>>   void**obj;
>>   struct {
>> @@ -33,15 +36,7 @@ struct cpu_rmap {
>>  #define CPU_RMAP_DIST_INF 0x
>>
>>  extern struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags);
>> -
>> -/**
>> - * free_cpu_rmap - free CPU affinity reverse-map
>> - * @rmap: Reverse-map allocated with alloc_cpu_rmap(), or %NULL
>> - */
>> -static inline void free_cpu_rmap(struct cpu_rmap *rmap)
>> -{
>> - kfree(rmap);
>> -}
>> +extern void free_cpu_rmap(struct cpu_rmap *rmap);
>>
>>  extern int cpu_rmap_add(struct cpu_rmap *rmap, void *obj);
>>  extern int cpu_rmap_update(struct cpu_rmap *rmap, u16 index,
>> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
>> index 5e4e617..5fa5afe 100644
>> --- a/include/linux/interrupt.h
>> +++ b/include/linux/interrupt.h
>> @@ -268,11 +268,6 @@ struct irq_affinity_notify {
>>  extern int
>>  irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify *notify);
>>
>> -static inline void irq_run_affinity_notifiers(void)
>> -{
>> - flush_scheduled_work();
>> -}
>> -
>>  #else /* CONFIG_SMP */
>>
>>  static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
>> diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
>> index 145dec5..bb5d0af 100644
>> --- a/lib/cpu_rmap.c
>> +++ b/lib/cpu_rmap.c
>> @@ -45,6 +45,7 @@ struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags)
>>   if (!rmap)
>>   return NULL;
>>
>> + kref_init(&rmap->refcount);
>>   rmap->obj = (void **)((char *)rmap + obj_offset);
>>
>>   /* Initially assign CPUs to objects on a rota, since we have
>> @@ -63,6 +64,26 @@ struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags)
>>  }
>>  EXPORT_SYMBOL(alloc_cpu_rmap);
>>
>> +/**
>> + * reclaim_cpu_rmap - internal reclaiming helper called from kref_put
>> + * @ref: kref to struct cpu_rmap
>> + */
>> +static void reclaim_cpu_rmap(struct kref *ref)
>
> Could be named cpu_rmap_free()
>
> (and alloc_cpu_rmap() should be renamed as cpu_rmap_alloc())
>
>> +{
>> + struct cpu_rmap *rmap = container_of(ref, struct cpu_rmap, refcount);
>> + kfree(rmap);
>> +}
>> +
>> +/**
>> + * free_cpu_rmap - free CPU affinity reverse-map
>> + * @rmap: Reverse-map allocated with alloc_cpu_rmap(), or %NULL
>> + */
>> +void free_cpu_rmap(struct cpu_rmap *rmap)
>
> This could be named : cpu_rmap_put()
>
>> +{
>> + kref_put(&rmap->refcount, reclaim_cpu_rmap);
>> +}
>> +EXPORT_SYMBOL(free_cpu_rmap);
>
> It seems this function could be static, and not exported.
>
>> +
>>  /* Reevaluate nearest object for given CPU, comparing with the given
>>   * neighbours at the given distance.
>>   */
>> @@ -197,8 +218,7 @@ struct irq_glue {
>>   * free_irq_cpu_rmap - free a CPU affinity reverse-map used for IRQs
>>   * @rmap: Reverse-map allocated with alloc_irq_cpu_map(), or %NULL
>>   *
>> - * Must be called in process context, before freeing the IRQs, and
>> - * without holding any locks required by global workqueue items.
>> + * Must be 

Re: [PATCH 21/25] tty/max3100: don't use [delayed_]work_pending()

2012-12-28 Thread Tejun Heo
On Fri, Dec 21, 2012 at 08:21:25PM -0800, Greg Kroah-Hartman wrote:
> On Fri, Dec 21, 2012 at 05:57:11PM -0800, Tejun Heo wrote:
> > There's no need to test whether a (delayed) work item is pending
> > before queueing, flushing or cancelling it.  Most uses are unnecessary
> > and quite a few of them are buggy.
> > 
> > Remove unnecessary pending tests from max3100.  Only compile tested.
> > 
> > Signed-off-by: Tejun Heo 
> > Cc: Greg Kroah-Hartman 
> > Cc: Jiri Slaby 
> > ---
> > Please let me know how this patch should be routed.  I can take it
> > through the workqueue tree if necessary.
> 
> Please, feel free to take it through your tree:
> 
> Acked-by: Greg Kroah-Hartman 

Applied to wq/for-3.9-cleanups.  Thanks.

-- 
tejun


[PATCH v2 15/25] x86/mce: don't use [delayed_]work_pending()

2012-12-28 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from x86/mce.  Only compile tested.
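
Why the outer test is redundant can be seen from a small userspace
model.  This is only a sketch under stated assumptions: toy_work,
toy_schedule_work() and toy_run_one() are made-up names, not the kernel
API, and the model assumes that queueing does an atomic test-and-set of
a pending flag and that the flag is cleared before the work function
runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct work_struct; not the kernel definition. */
struct toy_work {
	atomic_bool pending;	/* set while queued, cleared before func runs */
	void (*func)(struct toy_work *w);
	int queued;		/* how many times it really got queued */
	int ran;		/* how many times func ran */
};

static void toy_init_work(struct toy_work *w, void (*func)(struct toy_work *))
{
	atomic_init(&w->pending, false);
	w->func = func;
	w->queued = 0;
	w->ran = 0;
}

/*
 * Queue @w unless it is already pending.  Returns true if this call
 * queued it.  Because the test and the set are one atomic step, a
 * caller-side "if (!pending)" check in front of it adds nothing.
 */
static bool toy_schedule_work(struct toy_work *w)
{
	if (atomic_exchange(&w->pending, true))
		return false;	/* already queued: no-op */
	w->queued++;
	return true;
}

/* Worker side: clear pending first, then run the function. */
static void toy_run_one(struct toy_work *w)
{
	atomic_store(&w->pending, false);
	w->func(w);
}

/* Example work function: just counts invocations. */
static void toy_count(struct toy_work *w)
{
	w->ran++;
}
```

In this model a second toy_schedule_work() on a pending item returns
false and changes nothing, and an item can be queued again as soon as
the worker has picked it up.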

v2: Local var work removed from mce_schedule_work() as suggested by
Borislav.

Signed-off-by: Tejun Heo 
Acked-by: Borislav Petkov 
Cc: Tony Luck 
Cc: linux-e...@vger.kernel.org
---
Applied to wq/for-3.9-cleanups.  Thanks.

Thanks.

 arch/x86/kernel/cpu/mcheck/mce.c |   14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -512,11 +512,8 @@ int mce_available(struct cpuinfo_x86 *c)
 
 static void mce_schedule_work(void)
 {
-   if (!mce_ring_empty()) {
-   struct work_struct *work = &__get_cpu_var(mce_work);
-   if (!work_pending(work))
-   schedule_work(work);
-   }
+   if (!mce_ring_empty())
+   schedule_work(&__get_cpu_var(mce_work));
 }
 
 DEFINE_PER_CPU(struct irq_work, mce_irq_work);
@@ -1351,12 +1348,7 @@ int mce_notify_irq(void)
/* wake processes polling /dev/mcelog */
wake_up_interruptible(&mce_chrdev_wait);
 
-   /*
-* There is no risk of missing notifications because
-* work_pending is always cleared before the function is
-* executed.
-*/
-   if (mce_helper[0] && !work_pending(&mce_trigger_work))
+   if (mce_helper[0])
schedule_work(&mce_trigger_work);
 
if (__ratelimit(&ratelimit))


Re: [PATCH 14/25] rfkill: don't use [delayed_]work_pending()

2012-12-28 Thread Tejun Heo
On Sat, Dec 22, 2012 at 09:22:13PM +0100, Johannes Berg wrote:
> On Fri, 2012-12-21 at 17:57 -0800, Tejun Heo wrote:
> > There's no need to test whether a (delayed) work item is pending
> > before queueing, flushing or cancelling it.  Most uses are unnecessary
> > and quite a few of them are buggy.
> > 
> > Remove unnecessary pending tests from rfkill.  Only compile
> > tested.
> 
> Looks fine to me, feel free to route through your tree -- nobody changes
> rfkill much (famous last words...) :-)

Applied to wq/for-3.9-cleanups w/ your Acked-by added.  Thanks.

-- 
tejun


Re: [PATCH 09/25] wl1251: don't use [delayed_]work_pending()

2012-12-28 Thread Tejun Heo
On Sat, Dec 22, 2012 at 04:14:29PM +0200, Luciano Coelho wrote:
> On Fri, 2012-12-21 at 17:56 -0800, Tejun Heo wrote:
> > There's no need to test whether a (delayed) work item is pending
> > before queueing, flushing or cancelling it.  Most uses are unnecessary
> > and quite a few of them are buggy.
> > 
> > Remove unnecessary pending tests from wl1251.  Only compile tested.
> > 
> > Signed-off-by: Tejun Heo 
> > Cc: Luciano Coelho 
> > Cc: linux-wirel...@vger.kernel.org
> > ---
> > Please let me know how this patch should be routed.  I can take it
> > through the workqueue tree if necessary.
> > 
> > Thanks.
> 
> It's probably easier if you take it via your tree.  This driver doesn't
> get patches very often, so I doubt there will be any conflicts.
> 
> Thank you!
> 
> Acked-by: Luciano Coelho 

Applied to wq/for-3.9-cleanups.  Thanks.

-- 
tejun


Re: [PATCH 08/25] thinkpad_acpi: don't use [delayed_]work_pending()

2012-12-28 Thread Tejun Heo
On Sat, Dec 22, 2012 at 09:55:04PM -0200, Henrique de Moraes Holschuh wrote:
> On Fri, 21 Dec 2012, Tejun Heo wrote:
> > There's no need to test whether a (delayed) work item is pending
> > before queueing, flushing or cancelling it.  Most uses are unnecessary
> > and quite a few of them are buggy.
> > 
> > Remove unnecessary pending tests from thinkpad_acpi.  Only compile
> > tested.
> > 
> > Signed-off-by: Tejun Heo 
> > Cc: Henrique de Moraes Holschuh 
> > Cc: ibm-acpi-de...@lists.sourceforge.net
> > Cc: platform-driver-...@vger.kernel.org
> 
> Acked-by: Henrique de Moraes Holschuh 
> 
> > ---
> > Please let me know how this patch should be routed.  I can take it
> > through the workqueue tree if necessary.
> 
> It will not clash with anything I know of, so feel free to route it
> through the workqueue tree.

Applied to wq/for-3.9-cleanups.  Thanks.

-- 
tejun


Re: [PATCH 07/25] mwifiex: don't use [delayed_]work_pending()

2012-12-28 Thread Tejun Heo
On Sat, Dec 22, 2012 at 02:29:57PM -0800, Bing Zhao wrote:
> Hi Tejun,
> 
> Thanks for the patch.
> 
> > Drop work_pending() test from mwifiex_sdio_card_reset().  As
> > work_pending() becomes %false before sdio_card_reset_worker() starts
> > executing, it doesn't really protect anything.  reset_host may change
> > between mmc_remove_host() and mmc_add_host().  Make
> > sdio_card_reset_worker() cache the target mmc_host so that it isn't
> > affected by mwifiex_sdio_card_reset() racing with it.
> > 
> > Only compile tested.
> > 
> > Signed-off-by: Tejun Heo 
> > Cc: Bing Zhao 
> > Cc: linux-wirel...@vger.kernel.org
> 
> Acked-by: Bing Zhao 
> 
> > ---
> > Please let me know how this patch should be routed.  I can take it
> > through the workqueue tree if necessary.
> 
> If you are taking other patches in this series through your tree, please take 
> this one too.

Applied to wq/for-3.9-cleanups.

Thanks.

-- 
tejun


Re: [PATCH 03/25] sja1000: don't use [delayed_]work_pending()

2012-12-28 Thread Tejun Heo
On Sat, Dec 22, 2012 at 12:01:11AM -0800, David Miller wrote:
> From: Tejun Heo 
> Date: Fri, 21 Dec 2012 17:56:53 -0800
> 
> > There's no need to test whether a (delayed) work item is pending
> > before queueing, flushing or cancelling it.  Most uses are unnecessary
> > and quite a few of them are buggy.
> > 
> > Remove unnecessary pending tests from sja1000.  Only compile tested.
> > 
> > Signed-off-by: Tejun Heo 
> > Cc: Wolfgang Grandegger 
> > Cc: "David S. Miller" 
> > Cc: net...@vger.kernel.org
> > ---
> > Please let me know how this patch should be routed.  I can take it
> > through the workqueue tree if necessary.
> 
> I would suggest just taking it via the workqueue tree, thanks Tejun.

Applied to wq/for-3.9-cleanups w/ your Acked-by added.

Thanks.

-- 
tejun


Re: how to look for source code in kernel

2012-12-28 Thread Joe Perches
On Fri, 2012-12-28 at 16:09 +, Al Viro wrote:
> On Thu, Dec 27, 2012 at 11:36:13PM -0800, Eric W. Biederman wrote:
> > But then I am probably peculiar keeping an index of the source code in
> > my head.  When I need to look for something and I don't know where to
> > find it I do.
> > 
> > git-ls-files | xargs fgrep 'struct f2fs_inode'
> 
> What's wrong with git grep?

Positive: it loads multiple cpus with very little file I/O
Negative: it doesn't use -P very well.
  it won't span lines like:

struct name
{

with git grep -P "\bstruct\s+\w+\s*{"

(the last grep version that does so is, I believe, 2.54)




[PATCH] cgroup: remove unused dummy cgroup_fork_callbacks()

2012-12-28 Thread Tejun Heo
From a0a4bddd2779a51b6529afa113c5671ebcc21b14 Mon Sep 17 00:00:00 2001
From: Tejun Heo 
Date: Fri, 28 Dec 2012 13:18:28 -0800

5edee61ede ("cgroup: cgroup_subsys->fork() should be called after the
task is added to css_set") removed cgroup_fork_callbacks() but forgot
to remove its dummy version for !CONFIG_CGROUPS.  Remove it.

Signed-off-by: Tejun Heo 
Reported-by: Herton Ronaldo Krzesinski 
---
Applied to cgroup/for-3.9.  Thanks.

 include/linux/cgroup.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 7d73905..942e687 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -706,7 +706,6 @@ struct cgroup_subsys_state *cgroup_css_from_dir(struct file 
*f, int id);
 static inline int cgroup_init_early(void) { return 0; }
 static inline int cgroup_init(void) { return 0; }
 static inline void cgroup_fork(struct task_struct *p) {}
-static inline void cgroup_fork_callbacks(struct task_struct *p) {}
 static inline void cgroup_post_fork(struct task_struct *p) {}
 static inline void cgroup_exit(struct task_struct *p, int callbacks) {}
 
-- 
1.8.0.2



[PATCH] PCI / ACPI: Rework ACPI device nodes lookup for the PCI bus type

2012-12-28 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

As the kernel Bugzilla report #42696 indicates, it generally is not
sufficient to use _ADR to get an ACPI device node corresponding to
the given PCI device, because there may be multiple objects with
matching _ADR in the ACPI namespace (this probably is against the
spec, but it evidently happens in practice).

One possible way to improve the situation is to use the presence of
another ACPI method to distinguish between the matching namespace
nodes.  For example, if the presence of _INI is checked in addition
to the return value of _ADR, bug #42696 goes away on the affected
machines.  Of course, this is somewhat arbitrary, but it may be
argued that executing _INI for an ACPI device node kind of means that
we are going to use that device node going forward, so we should
generally prefer the nodes where we have executed _INI to "competing"
nodes without _INI.

In that case, though, we shouldn't take the nodes where we haven't
executed _INI into account, but that's quite straightforward to
achieve.  Namely, we only need to check nodes that we created struct
acpi_device objects for.  This also makes sense for a different
reason, which is that the result of acpi_pci_find_device() is used
to get a struct acpi_device object (not just an ACPI handle)
corresponding to the given PCI device.

Accordingly, introduce acpi_get_child_device() that finds a struct
acpi_device corresponding to the given address by walking the
children of the ACPI device node whose handle is its first argument.
The lookup is carried out by evaluating _ADR for every child and
comparing the result with the given address.  If there's a match and
that child has _INI defined, it is returned as a result.  If _INI is
not present, the search continues until (a) there are no more matches
or (b) there is another matching child whose _INI is present (in
which case that child is returned instead of the first matching one).

The walk of the list of children is done in the reverse direction for
two reasons.  The first reason is for compatibility with
acpi_get_child() that returns the handle of the last matching child
of the given parent.  The second one is to get the last device whose
_INI was executed first (that _INI might have overridden whatever _INI
for the other matching device nodes had done).
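
The selection rule described above can be sketched in plain userspace
C.  This is a simplified model, not the patch itself: toy_child and
toy_get_child() are hypothetical names, an array stands in for the
list of children, and the adr and has_ini fields stand in for
evaluating _ADR and looking up _INI on a real namespace node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a child device node; not struct acpi_device. */
struct toy_child {
	const char *name;
	uint64_t adr;		/* value _ADR would return */
	bool has_ini;		/* whether the node defines _INI */
};

/*
 * Walk @children from last to first.  Return the first match in walk
 * order that has _INI.  If no match has _INI, return the first match
 * seen, which is the last matching child in list order.
 */
static const struct toy_child *
toy_get_child(const struct toy_child *children, size_t n, uint64_t address)
{
	const struct toy_child *ret = NULL;
	size_t i;

	for (i = n; i-- > 0; ) {
		const struct toy_child *c = &children[i];

		if (c->adr != address)
			continue;
		if (c->has_ini)
			return c;	/* preferred: match with _INI */
		if (!ret)
			ret = c;	/* fallback: last match in list order */
	}
	return ret;
}
```

With children { A: adr 1, _INI; B: adr 1, no _INI }, a lookup of
address 1 yields A even though B comes later in the list, while two
matching children without _INI resolve to the later one, as
acpi_get_child() would.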

To fix bug #42696, modify acpi_pci_find_device() to use
acpi_get_child_device() instead of acpi_get_child() for ACPI device
node lookup.

References: https://bugzilla.kernel.org/show_bug.cgi?id=42696
Reported-by: Peter Wu 
Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/glue.c |   45 +
 drivers/acpi/internal.h |4 
 drivers/acpi/proc.c |1 +
 drivers/acpi/sleep.h|1 -
 drivers/pci/pci-acpi.c  |7 +--
 include/acpi/acpi_bus.h |1 +
 6 files changed, 56 insertions(+), 3 deletions(-)

Index: linux/drivers/acpi/glue.c
===
--- linux.orig/drivers/acpi/glue.c
+++ linux/drivers/acpi/glue.c
@@ -93,6 +93,51 @@ static int acpi_find_bridge_device(struc
return ret;
 }
 
+/**
+ * acpi_get_child_device - Find specific child of an ACPI device.
+ * @phandle: ACPI handle of the parent device to find a child of.
+ * @address: Address of the child to find (as returned by _ADR).
+ *
+ * Find the child of the ACPI device node represented by @phandle whose _ADR
+ * method's return value is equal to @address.  If there are more children with
+ * matching _ADR return values, take the (last) one having _INI defined.
+ */
+struct acpi_device *acpi_get_child_device(acpi_handle phandle, u64 address)
+{
+	struct acpi_device *parent, *adev, *ret = NULL;
+
+	if (acpi_bus_get_device(phandle, &parent))
+		return NULL;
+
+	mutex_lock(&acpi_device_lock);
+	/* Use reverse direction for compatibility with acpi_get_child(). */
+	list_for_each_entry_reverse(adev, &parent->children, node) {
+		unsigned long long addr;
+		acpi_status status;
+		acpi_handle out;
+
+		status = acpi_evaluate_integer(adev->handle, METHOD_NAME__ADR,
+					       NULL, &addr);
+		if (ACPI_FAILURE(status) || addr != address)
+			continue;
+
+		if (ret)
+			acpi_handle_warn(adev->handle,
+					 "_ADR conflict with device %s\n",
+					 dev_name(&ret->dev));
+
+		status = acpi_get_handle(adev->handle, "_INI", &out);
+		if (ACPI_SUCCESS(status)) {
+			ret = adev;
+			break;
+		} else if (!ret) {
+			ret = adev;
+		}
+	}
+	mutex_unlock(&acpi_device_lock);
+	return ret;
+}
+
 /* Get device's handler per its address under its parent */
 struct acpi_find_child {
acpi_handle handle;
Index: 

Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2012-12-28 Thread Thierry Reding
On Mon, Dec 24, 2012 at 10:25:00PM -0700, Stephen Warren wrote:
> On 12/21/2012 11:50 PM, Terje Bergström wrote:
> > On 21.12.2012 16:36, Thierry Reding wrote:
> >> On Fri, Dec 21, 2012 at 01:39:21PM +0200, Terje Bergstrom wrote:
> >>> +static struct platform_driver tegra_drm_platform_driver = {
> >>> + .driver = {
> >>> + .name = "tegradrm",
> >>
> >> This should be "tegra-drm" to match the module name.
> > 
> > We've actually created two problems.
> > 
> > First is that the device name should match driver name which should
> > match module name. But host1x doesn't know the module name of tegradrm.
> 
> There's no hard requirement for the device/driver name to match the
> module name. It's a good thing to do, but nothing will blow up if it doesn't
> (modules can use MODULE_ALIAS() to declare which drivers they expose).
> 
> But, what's the problem with host1x knowing the driver name; the host1x
> driver and tegradrm driver are both part of the same code-base, so this
> seems trivial to achieve.

Indeed. If we define the name to match the tegra-drm module name, then
just changing the above line is fine. This doesn't need to be automatic.
Making sure that both strings match in both drivers is enough.

> > Second problem is that host1x driver creates tegradrm device even if
> > tegradrm isn't loaded to system.
> 
> That's fine. If there's no driver, the device simply won't be probe()d.
> That's just like a device node existing in device tree, but the driver
> for it not being enabled in the kernel, or the relevant module not being
> inserted.
> 
> > These mean that the device has to be created in tegra-drm module to have
> 
> I definitely disagree here.

Instead of going over this back and forth, I've decided to rewrite this
patch from scratch the way I think it should be done. Maybe that'll make
things clearer. I haven't tested it on real hardware yet because I don't
have access over the holidays, but I'll post the patch once I've
verified that it actually works. The code is based on patches 1-4 of
this series and is meant to replace patch 5.

Thierry




Re: [PATCH v2] lib: cpu_rmap: avoid flushing all workqueues

2012-12-28 Thread Eric Dumazet
On Fri, 2012-12-28 at 11:03 -0800, David Decotigny wrote:
> In some cases, free_irq_cpu_rmap() is called while holding a lock
> (eg. rtnl). This can lead to deadlocks, because it invokes
> flush_scheduled_work() which ends up waiting for whole system
> workqueue to flush, but some pending works might try to acquire the
> lock we are already holding.
> 
> This commit uses reference-counting to replace
> irq_run_affinity_notifiers(). It also removes
> irq_run_affinity_notifiers() altogether.
> 
> Signed-off-by: David Decotigny 
> ---
>  include/linux/cpu_rmap.h  |   13 -
>  include/linux/interrupt.h |5 -
>  lib/cpu_rmap.c|   47 
> +++--
>  3 files changed, 45 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/cpu_rmap.h b/include/linux/cpu_rmap.h
> index ac3bbb5..3be2813 100644
> --- a/include/linux/cpu_rmap.h
> +++ b/include/linux/cpu_rmap.h
> @@ -13,9 +13,11 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /**
>   * struct cpu_rmap - CPU affinity reverse-map
> + * @refcount: kref for object
>   * @size: Number of objects to be reverse-mapped
>   * @used: Number of objects added
>   * @obj: Pointer to array of object pointers
> @@ -23,6 +25,7 @@
>   *  based on affinity masks
>   */
>  struct cpu_rmap {
> + struct kref refcount;
>   u16 size, used;
>   void**obj;
>   struct {
> @@ -33,15 +36,7 @@ struct cpu_rmap {
>  #define CPU_RMAP_DIST_INF 0x
>  
>  extern struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t flags);
> -
> -/**
> - * free_cpu_rmap - free CPU affinity reverse-map
> - * @rmap: Reverse-map allocated with alloc_cpu_rmap(), or %NULL
> - */
> -static inline void free_cpu_rmap(struct cpu_rmap *rmap)
> -{
> - kfree(rmap);
> -}
> +extern void free_cpu_rmap(struct cpu_rmap *rmap);
>  
>  extern int cpu_rmap_add(struct cpu_rmap *rmap, void *obj);
>  extern int cpu_rmap_update(struct cpu_rmap *rmap, u16 index,
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 5e4e617..5fa5afe 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -268,11 +268,6 @@ struct irq_affinity_notify {
>  extern int
>  irq_set_affinity_notifier(unsigned int irq, struct irq_affinity_notify 
> *notify);
>  
> -static inline void irq_run_affinity_notifiers(void)
> -{
> - flush_scheduled_work();
> -}
> -
>  #else /* CONFIG_SMP */
>  
>  static inline int irq_set_affinity(unsigned int irq, const struct cpumask *m)
> diff --git a/lib/cpu_rmap.c b/lib/cpu_rmap.c
> index 145dec5..bb5d0af 100644
> --- a/lib/cpu_rmap.c
> +++ b/lib/cpu_rmap.c
> @@ -45,6 +45,7 @@ struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t 
> flags)
>   if (!rmap)
>   return NULL;
>  
> + kref_init(&rmap->refcount);
>   rmap->obj = (void **)((char *)rmap + obj_offset);
>  
>   /* Initially assign CPUs to objects on a rota, since we have
> @@ -63,6 +64,26 @@ struct cpu_rmap *alloc_cpu_rmap(unsigned int size, gfp_t 
> flags)
>  }
>  EXPORT_SYMBOL(alloc_cpu_rmap);
>  
> +/**
> + * reclaim_cpu_rmap - internal reclaiming helper called from kref_put
> + * @ref: kref to struct cpu_rmap
> + */
> +static void reclaim_cpu_rmap(struct kref *ref)

Could be named cpu_rmap_free()

(and alloc_cpu_rmap() should be renamed as cpu_rmap_alloc()

> +{
> + struct cpu_rmap *rmap = container_of(ref, struct cpu_rmap, refcount);
> + kfree(rmap);
> +}
> +
> +/**
> + * free_cpu_rmap - free CPU affinity reverse-map
> + * @rmap: Reverse-map allocated with alloc_cpu_rmap(), or %NULL
> + */
> +void free_cpu_rmap(struct cpu_rmap *rmap)

This could be named : cpu_rmap_put()

> +{
> + kref_put(&rmap->refcount, reclaim_cpu_rmap);
> +}
> +EXPORT_SYMBOL(free_cpu_rmap);

It seems this function could be static, and not exported.
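
For reference, the get/put naming suggested above follows the usual
kref pattern, which can be modelled in userspace roughly as below.
This is a sketch only: all toy_* names are hypothetical, the counter
is not atomic (so it is single-threaded only), and it assumes that
every outstanding user of the map holds its own reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-ins for struct kref and container_of(). */
struct toy_kref {
	unsigned int count;	/* not atomic: single-threaded sketch only */
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void toy_kref_init(struct toy_kref *k) { k->count = 1; }
static void toy_kref_get(struct toy_kref *k)  { k->count++; }

/* Drop one reference; call @release on the last one.  Returns 1 if released. */
static int toy_kref_put(struct toy_kref *k, void (*release)(struct toy_kref *))
{
	if (--k->count == 0) {
		release(k);
		return 1;
	}
	return 0;
}

struct toy_rmap {
	struct toy_kref refcount;
	int used;
};

static int toy_rmap_freed;	/* counts releases, for demonstration */

/* Release callback: recover the containing object and free it. */
static void toy_rmap_release(struct toy_kref *ref)
{
	struct toy_rmap *rmap = toy_container_of(ref, struct toy_rmap, refcount);

	free(rmap);
	toy_rmap_freed++;
}

/* Allocate a map holding one initial reference for the caller. */
static struct toy_rmap *toy_rmap_alloc(void)
{
	struct toy_rmap *rmap = calloc(1, sizeof(*rmap));

	if (rmap)
		toy_kref_init(&rmap->refcount);
	return rmap;
}

static void toy_rmap_get(struct toy_rmap *rmap)
{
	toy_kref_get(&rmap->refcount);
}

static int toy_rmap_put(struct toy_rmap *rmap)
{
	return toy_kref_put(&rmap->refcount, toy_rmap_release);
}
```

Here the owner's toy_rmap_put() does not free the map while another
user still holds a reference; the memory goes away on the final put,
so nobody has to wait for pending work to finish first.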

> +
>  /* Reevaluate nearest object for given CPU, comparing with the given
>   * neighbours at the given distance.
>   */
> @@ -197,8 +218,7 @@ struct irq_glue {
>   * free_irq_cpu_rmap - free a CPU affinity reverse-map used for IRQs
>   * @rmap: Reverse-map allocated with alloc_irq_cpu_map(), or %NULL
>   *
> - * Must be called in process context, before freeing the IRQs, and
> - * without holding any locks required by global workqueue items.
> + * Must be called in process context, before freeing the IRQs.
>   */
>  void free_irq_cpu_rmap(struct cpu_rmap *rmap)
>  {
> @@ -212,12 +232,18 @@ void free_irq_cpu_rmap(struct cpu_rmap *rmap)
>   glue = rmap->obj[index];
>   irq_set_affinity_notifier(glue->notify.irq, NULL);
>   }
> - irq_run_affinity_notifiers();
>  
> - kfree(rmap);
> + free_cpu_rmap(rmap);
>  }
>  EXPORT_SYMBOL(free_irq_cpu_rmap);
>  
> +/**
> + * irq_cpu_rmap_notify - callback for IRQ subsystem when IRQ affinity updated
> + * @notify: struct irq_affinity_notify passed by irq/manage.c
> + * @mask: cpu mask for new SMP affinity
> + *
> + * This is executed in workqueue context.
> + 

Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread Matthew Garrett
On Fri, 2012-12-28 at 12:40 -0800, H. Peter Anvin wrote:

> > I suspect that what we *should* do looks like:
> >
> > 1. If ACPI exports a Time and Alarm Device (ACPI000E) then use it;
> > 2. If ACPI exports a PC/AT device (PNP0B00/1/2) then use it(*);
> > 3. If we have an EFI RTC use it;
> > 4. Probe for a PC/AT RTC device.

In terms of ordering, 3 should probably come before 2 - but that depends
on us actually fixing the issues that are preventing some of these calls
from working. As far as wallclock time goes, EFI is going to be
available to us before we've parsed the DSDT to determine whether
there's any ACPI devices, so we'll almost certainly end up having to use
it at at least some point during boot. Otherwise, agreed.


Re: how to look for source code in kernel

2012-12-28 Thread Jonathan Neuschäfer
On Fri, Dec 28, 2012 at 11:49:53AM -0800, Eric W. Biederman wrote:
> Al Viro  writes:
> 
> > On Thu, Dec 27, 2012 at 11:36:13PM -0800, Eric W. Biederman wrote:
> >> But then I am probably peculiar keeping an index of the source code in
> >> my head.  When I need to look for something and I don't know where to
> >> find it I do.
> >> 
> >> git-ls-files | xargs fgrep 'struct f2fs_inode'
> >
> > What's wrong with git grep?
> 
> I haven't learned it yet.  git-ls-files is a lot better than find
> speed wise so is very much worth doing.  I haven't a clue if
> there is an advantage to git-grep, over just knowing find, xargs, and
> grep.

Brevity.

Jonathan


[PATCH 17/24] blkcg: implement blkcg_policy->on/offline_pd_fn() and blkcg_gq->online

2012-12-28 Thread Tejun Heo
Add two blkcg_policy methods, ->pd_online_fn() and ->pd_offline_fn(),
which are invoked as the policy_data gets activated and deactivated
while holding both blkcg and q locks.

Also, add blkcg_gq->online bool, which is set and cleared as the
blkcg_gq gets activated and deactivated.  This flag also is toggled
while holding both blkcg and q locks.

These will be used to implement hierarchical stats.
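
The shape of the new hooks can be illustrated with a small userspace
model.  It is a sketch under assumptions, not the kernel code: the
toy_* types and functions are hypothetical, locking is left to the
caller, and the extra NULL check on the policy pointer is there only
to keep the model safe on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_POLS 2

struct toy_blkg;

/* Optional per-policy hooks; either may be NULL. */
struct toy_policy {
	void (*pd_online_fn)(struct toy_blkg *blkg);
	void (*pd_offline_fn)(struct toy_blkg *blkg);
};

struct toy_blkg {
	bool online;
	void *pd[TOY_MAX_POLS];	/* per-policy data, NULL if policy inactive */
	int online_calls;
	int offline_calls;
};

static struct toy_policy *toy_policies[TOY_MAX_POLS];

/* Caller is assumed to hold whatever locks protect @blkg. */
static void toy_blkg_online(struct toy_blkg *blkg)
{
	for (int i = 0; i < TOY_MAX_POLS; i++) {
		struct toy_policy *pol = toy_policies[i];

		if (blkg->pd[i] && pol && pol->pd_online_fn)
			pol->pd_online_fn(blkg);
	}
	blkg->online = true;
}

/* Mirror image of toy_blkg_online(): run offline hooks, clear the flag. */
static void toy_blkg_offline(struct toy_blkg *blkg)
{
	for (int i = 0; i < TOY_MAX_POLS; i++) {
		struct toy_policy *pol = toy_policies[i];

		if (blkg->pd[i] && pol && pol->pd_offline_fn)
			pol->pd_offline_fn(blkg);
	}
	blkg->online = false;
}

/* Example hooks that only count how often they were called. */
static void toy_count_online(struct toy_blkg *blkg)  { blkg->online_calls++; }
static void toy_count_offline(struct toy_blkg *blkg) { blkg->offline_calls++; }
```

A policy that leaves both hooks NULL is skipped, so existing policies
need no changes until they want the notifications.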

Signed-off-by: Tejun Heo 
---
 block/blk-cgroup.c | 21 -
 block/blk-cgroup.h |  7 +++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3a8de32..4d625d2 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -182,7 +182,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
struct blkcg_gq *new_blkg)
 {
struct blkcg_gq *blkg;
-   int ret;
+   int i, ret;
 
WARN_ON_ONCE(!rcu_read_lock_held());
lockdep_assert_held(q->queue_lock);
@@ -218,7 +218,15 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
if (likely(!ret)) {
hlist_add_head_rcu(&blkg->blkcg_node, &blkcg->blkg_list);
list_add(&blkg->q_node, &q->blkg_list);
+
+   for (i = 0; i < BLKCG_MAX_POLS; i++) {
+   struct blkcg_policy *pol = blkcg_policy[i];
+
+   if (blkg->pd[i] && pol->pd_online_fn)
+   pol->pd_online_fn(blkg);
+   }
}
+   blkg->online = true;
spin_unlock(&blkcg->lock);
 
if (!ret)
@@ -291,6 +299,7 @@ EXPORT_SYMBOL_GPL(blkg_lookup_create);
 static void blkg_destroy(struct blkcg_gq *blkg)
 {
struct blkcg *blkcg = blkg->blkcg;
+   int i;
 
lockdep_assert_held(blkg->q->queue_lock);
lockdep_assert_held(&blkcg->lock);
@@ -299,6 +308,14 @@ static void blkg_destroy(struct blkcg_gq *blkg)
WARN_ON_ONCE(list_empty(&blkg->q_node));
WARN_ON_ONCE(hlist_unhashed(&blkg->blkcg_node));
 
+   for (i = 0; i < BLKCG_MAX_POLS; i++) {
+   struct blkcg_policy *pol = blkcg_policy[i];
+
+   if (blkg->pd[i] && pol->pd_offline_fn)
+   pol->pd_offline_fn(blkg);
+   }
+   blkg->online = false;
+
radix_tree_delete(&blkcg->blkg_tree, blkg->q->id);
list_del_init(&blkg->q_node);
hlist_del_init_rcu(&blkg->blkcg_node);
@@ -956,6 +973,8 @@ void blkcg_deactivate_policy(struct request_queue *q,
/* grab blkcg lock too while removing @pd from @blkg */
spin_lock(&blkg->blkcg->lock);
 
+   if (pol->pd_offline_fn)
+   pol->pd_offline_fn(blkg);
if (pol->pd_exit_fn)
pol->pd_exit_fn(blkg);
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 40f5b97..678e89e 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -106,12 +106,17 @@ struct blkcg_gq {
/* reference count */
int refcnt;
 
+   /* is this blkg online? protected by both blkcg and q locks */
+   boolonline;
+
struct blkg_policy_data *pd[BLKCG_MAX_POLS];
 
struct rcu_head rcu_head;
 };
 
 typedef void (blkcg_pol_init_pd_fn)(struct blkcg_gq *blkg);
+typedef void (blkcg_pol_online_pd_fn)(struct blkcg_gq *blkg);
+typedef void (blkcg_pol_offline_pd_fn)(struct blkcg_gq *blkg);
 typedef void (blkcg_pol_exit_pd_fn)(struct blkcg_gq *blkg);
 typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkcg_gq *blkg);
 
@@ -124,6 +129,8 @@ struct blkcg_policy {
 
/* operations */
blkcg_pol_init_pd_fn*pd_init_fn;
+   blkcg_pol_online_pd_fn  *pd_online_fn;
+   blkcg_pol_offline_pd_fn *pd_offline_fn;
blkcg_pol_exit_pd_fn*pd_exit_fn;
blkcg_pol_reset_pd_stats_fn *pd_reset_stats_fn;
 };
-- 
1.8.0.2



[PATCH 16/24] blkcg: add blkg_policy_data->plid

2012-12-28 Thread Tejun Heo
Add pd->plid so that the policy a pd belongs to can be identified
easily.  This will be used to implement hierarchical blkg_[rw]stats.

Signed-off-by: Tejun Heo 
---
 block/blk-cgroup.c | 2 ++
 block/blk-cgroup.h | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 10e1df9..3a8de32 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -113,6 +113,7 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, 
struct request_queue *q,
 
blkg->pd[i] = pd;
pd->blkg = blkg;
+   pd->plid = i;
 
/* invoke per-policy init */
if (pol->pd_init_fn)
@@ -908,6 +909,7 @@ int blkcg_activate_policy(struct request_queue *q,
 
blkg->pd[pol->plid] = pd;
pd->blkg = blkg;
+   pd->plid = pol->plid;
pol->pd_init_fn(blkg);
 
spin_unlock(&blkg->blkcg->lock);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 2446225..40f5b97 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -81,8 +81,9 @@ struct blkg_rwstat {
  * beginning and pd_size can't be smaller than pd.
  */
 struct blkg_policy_data {
-   /* the blkg this per-policy data belongs to */
+   /* the blkg and policy id this per-policy data belongs to */
struct blkcg_gq *blkg;
+   int plid;
 
/* used during policy activation */
struct list_headalloc_node;
-- 
1.8.0.2



[PATCH 21/24] blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock

2012-12-28 Thread Tejun Heo
Instead of holding blkcg->lock while walking ->blkg_list and executing
prfill(), RCU walk ->blkg_list and hold the blkg's queue lock while
executing prfill().  This makes prfill() implementations easier as
stats are mostly protected by queue lock.

This will be used to implement hierarchical stats.

Signed-off-by: Tejun Heo 
---
 block/blk-cgroup.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a1a4b97..22f75d1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -504,8 +504,9 @@ static const char *blkg_dev_name(struct blkcg_gq *blkg)
  *
  * This function invokes @prfill on each blkg of @blkcg if pd for the
  * policy specified by @pol exists.  @prfill is invoked with @sf, the
- * policy data and @data.  If @show_total is %true, the sum of the return
- * values from @prfill is printed with "Total" label at the end.
+ * policy data and @data and the matching queue lock held.  If @show_total
+ * is %true, the sum of the return values from @prfill is printed with
+ * "Total" label at the end.
  *
  * This is to be used to construct print functions for
  * cftype->read_seq_string method.
@@ -520,11 +521,14 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkcg 
*blkcg,
struct hlist_node *n;
u64 total = 0;
 
-   spin_lock_irq(&blkcg->lock);
-   hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node)
+   rcu_read_lock();
+   hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {
+   spin_lock_irq(blkg->q->queue_lock);
if (blkcg_policy_enabled(blkg->q, pol))
total += prfill(sf, blkg->pd[pol->plid], data);
-   spin_unlock_irq(&blkcg->lock);
+   spin_unlock_irq(blkg->q->queue_lock);
+   }
+   rcu_read_unlock();
 
if (show_total)
seq_printf(sf, "Total %llu\n", (unsigned long long)total);
-- 
1.8.0.2



[PATCH 09/24] blkcg: cosmetic updates to blkg_create()

2012-12-28 Thread Tejun Heo
* Rename out_* labels to err_*.

* Do ERR_PTR() conversion once in the error return path.

This patch is cosmetic and to prepare for the hierarchy support.
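
The two points above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the blkcg code: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified stand-ins for the kernel helpers, and struct owner, struct item, item_create() and the fail_insert flag are hypothetical names invented for the illustration. Each failure site only records an errno value and jumps to an err_* label, and the conversion to an error pointer happens once at the end.

```c
#include <errno.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (unsigned long)p >= (unsigned long)-4095;
}

/* Hypothetical stand-ins for blkcg (owner) and blkcg_gq (item). */
struct owner { int refs; int alive; };
struct item { struct owner *owner; };

static int owner_tryget(struct owner *o)
{
	if (!o->alive)
		return 0;
	o->refs++;
	return 1;
}

static void owner_put(struct owner *o) { o->refs--; }

/*
 * Create an item.  @new_item may be NULL, in which case one is
 * allocated; it is always consumed.  @fail_insert simulates a failing
 * insertion step.
 */
struct item *item_create(struct owner *o, struct item *new_item,
			 int fail_insert)
{
	int ret;

	if (!owner_tryget(o)) {
		ret = -EINVAL;
		goto err_free_item;
	}

	if (!new_item) {
		new_item = calloc(1, sizeof(*new_item));
		if (!new_item) {
			ret = -ENOMEM;
			goto err_put_owner;
		}
	}
	new_item->owner = o;

	ret = fail_insert ? -EEXIST : 0;
	if (!ret)
		return new_item;

err_put_owner:
	owner_put(o);
err_free_item:
	free(new_item);		/* free(NULL) is a no-op */
	return ERR_PTR(ret);	/* single conversion point */
}
```

A caller would check the result with IS_ERR() and extract the code with PTR_ERR(), as kernel callers do.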

Signed-off-by: Tejun Heo 
Acked-by: Vivek Goyal 
---
 block/blk-cgroup.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2012754..18ae480 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -187,16 +187,16 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 
/* blkg holds a reference to blkcg */
if (!css_tryget(>css)) {
-   blkg = ERR_PTR(-EINVAL);
-   goto out_free;
+   ret = -EINVAL;
+   goto err_free_blkg;
}
 
/* allocate */
if (!new_blkg) {
new_blkg = blkg_alloc(blkcg, q, GFP_ATOMIC);
if (unlikely(!new_blkg)) {
-   blkg = ERR_PTR(-ENOMEM);
-   goto out_put;
+   ret = -ENOMEM;
+   goto err_put_css;
}
}
blkg = new_blkg;
@@ -213,12 +213,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
if (!ret)
return blkg;
 
-   blkg = ERR_PTR(ret);
-out_put:
+err_put_css:
css_put(>css);
-out_free:
+err_free_blkg:
blkg_free(new_blkg);
-   return blkg;
+   return ERR_PTR(ret);
 }
 
 /**
-- 
1.8.0.2



[PATCH 05/24] cfq-iosched: Get rid of unnecessary local variable

2012-12-28 Thread Tejun Heo
From: Vivek Goyal 

The local variable "n" is unnecessary; remove it. This brings the code
in line with __cfq_group_st_add(), which performs the similar operation
of adding a group to an rb tree.

No functionality change here.
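
For readers unfamiliar with the idiom, here is a small userspace sketch of descending a binary tree with a single pointer-to-pointer cursor, which is why a second temporary is not needed. It is an illustration only: struct node and tree_insert() are hypothetical names, a plain unbalanced tree replaces the kernel rbtree, and a simple "<" comparison replaces time_before().

```c
#include <stddef.h>

struct node {
	unsigned long key;
	struct node *left, *right;
};

/*
 * Insert @new_node below @root.  @p always points at the link that
 * will be followed next, so it can be advanced in place.
 * Returns 1 if the new node ended up leftmost, 0 otherwise.
 */
int tree_insert(struct node **root, struct node *new_node)
{
	struct node **p = root;
	int left = 1;

	while (*p) {
		if (new_node->key < (*p)->key) {
			p = &(*p)->left;
		} else {
			p = &(*p)->right;
			left = 0;
		}
	}

	new_node->left = new_node->right = NULL;
	*p = new_node;		/* hook the node into the empty link */
	return left;
}
```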

Signed-off-by: Vivek Goyal 
Acked-by: Jeff Moyer 
Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e34e142..5ad4cae 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1658,8 +1658,6 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, 
struct cfq_queue *cfqq,
cfqq->service_tree = st;
p = >rb.rb_node;
while (*p) {
-   struct rb_node **n;
-
parent = *p;
__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
 
@@ -1667,13 +1665,11 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, 
struct cfq_queue *cfqq,
 * sort by key, that represents service time.
 */
if (time_before(rb_key, __cfqq->rb_key))
-   n = &(*p)->rb_left;
+   p = >rb_left;
else {
-   n = &(*p)->rb_right;
+   p = >rb_right;
left = 0;
}
-
-   p = n;
}
 
if (left)
-- 
1.8.0.2



Re: [PATCH 1/3] rtc-efi: register rtc-efi device when EFI enabled

2012-12-28 Thread H. Peter Anvin

On 12/28/2012 11:17 AM, Matthew Garrett wrote:

On Fri, 2012-12-28 at 11:07 -0800, H. Peter Anvin wrote:

We do have such machines, which is why this change has been reverted twice 
already.  I believe we should stick to the priority scheme I proposed a few 
weeks ago.


I seem to have missed that discussion, and couldn't find it after a
brief search. Got a pointer?



Looks like it was a non-public distribution ... anyway, what I wrote was:


I suspect that what we *should* do looks like:

1. If ACPI exports a Time and Alarm Device (ACPI000E) then use it;
2. If ACPI exports a PC/AT device (PNP0B00/1/2) then use it(*);
3. If we have an EFI RTC use it;
4. Probe for a PC/AT RTC device.

I'm unsure what the ordering of 1 & 2 should be.  The ACPI device has
the advantage that it contains time zone information, which is important
for Windows interoperability, and at least optionally supports
millisecond resolution; there is no way to even export "this is where
you find time zone information" (since we're dealing with an RTC with
embedded CMOS, there is storage available, it is just a matter of
telling the OS how to find it) for the PNP0B0x devices.

The TAD is also guaranteed to map 1:1 to the EFI RTC.
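
As an illustration of the proposed ordering only, and not of any existing kernel code, the four steps could be sketched as a simple first-match selection. All names here (enum rtc_source, struct platform_caps, pick_rtc()) are hypothetical, and the relative order of the first two checks is the part described above as undecided.

```c
#include <stdbool.h>

enum rtc_source {
	RTC_NONE,
	RTC_ACPI_TAD,		/* ACPI Time and Alarm Device (ACPI000E) */
	RTC_ACPI_PCAT,		/* ACPI PC/AT device (PNP0B00/1/2) */
	RTC_EFI,		/* EFI runtime RTC */
	RTC_PROBED_PCAT,	/* PC/AT RTC found by probing */
};

/* Hypothetical summary of what the platform reported. */
struct platform_caps {
	bool acpi_tad;
	bool acpi_pcat;
	bool efi_rtc;
	bool pcat_probe_ok;
};

/* Return the first available source in the proposed priority order. */
enum rtc_source pick_rtc(const struct platform_caps *caps)
{
	if (caps->acpi_tad)
		return RTC_ACPI_TAD;
	if (caps->acpi_pcat)
		return RTC_ACPI_PCAT;
	if (caps->efi_rtc)
		return RTC_EFI;
	if (caps->pcat_probe_ok)
		return RTC_PROBED_PCAT;
	return RTC_NONE;
}
```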

-hpa



--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



[PATCH] staging: (sb105x) Fix compile error seen if PARPORT_PC is undefined

2012-12-28 Thread Guenter Roeck
Fix:
ERROR: "parport_pc_probe_port" [drivers/staging/sb105x/sb105x.ko] undefined!

by making the code calling it conditional. Also remove the external reference to
parport_pc_probe_port() from the sb105x source file, as it doesn't belong there.
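
To show why guarding the call avoids the link error, here is a self-contained userspace sketch of an IS_ENABLED()-style test. The helper macros below are a simplified re-creation for illustration, not a copy of the kernel header, and CONFIG_DEMO_PARPORT, CONFIG_DEMO_MOD, CONFIG_DEMO_ABSENT, demo_probe_port() and init_board() are hypothetical names. The macro evaluates to 1 when the option or its _MODULE variant is defined as 1, and to 0 otherwise, so it can be used in #if.

```c
/* Simplified re-creation of the "is this option defined to 1" trick. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) \
	(IS_DEFINED_1(option) || IS_DEFINED_1(option##_MODULE))

/* Pretend configuration: one built-in option, one modular option. */
#define CONFIG_DEMO_PARPORT 1
#define CONFIG_DEMO_MOD_MODULE 1

int probe_calls;	/* counts how often the optional probe ran */

#if IS_ENABLED(CONFIG_DEMO_PARPORT)
/* Only exists when the option is enabled, like an optional driver. */
static void demo_probe_port(void)
{
	probe_calls++;
}
#endif

void init_board(void)
{
#if IS_ENABLED(CONFIG_DEMO_PARPORT)
	/* Compiled out entirely when the option is disabled. */
	demo_probe_port();
#endif
}
```

With CONFIG_DEMO_PARPORT left undefined, neither the helper nor the call would be compiled, so no unresolved symbol would remain.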

Signed-off-by: Guenter Roeck 
---
 drivers/staging/sb105x/sb_pci_mp.c |   11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/sb105x/sb_pci_mp.c 
b/drivers/staging/sb105x/sb_pci_mp.c
index edb2a85..b7ded88 100644
--- a/drivers/staging/sb105x/sb_pci_mp.c
+++ b/drivers/staging/sb105x/sb_pci_mp.c
@@ -1,12 +1,7 @@
 #include "sb_pci_mp.h"
 #include 
 #include 
-
-extern struct parport *parport_pc_probe_port(unsigned long base_lo,
-   unsigned long base_hi,
-   int irq, int dma,
-   struct device *dev,
-   int irqflags);
+#include 
 
 static struct mp_device_t mp_devs[MAX_MP_DEV];
 static int mp_nrpcibrds = sizeof(mp_pciboards)/sizeof(mppcibrd_t);
@@ -3066,12 +3061,16 @@ static int init_mp_dev(struct pci_dev *pcidev, 
mppcibrd_t brd)
pci_remap_base(pcidev, PCI_BASE_ADDRESS_1, 
prev_port_addr + 8, 8);
}
 
+#if IS_ENABLED(CONFIG_PARPORT_PC)
/* add PC compatible parallel port */
parport_pc_probe_port(pcidev->resource[2].start, 
pcidev->resource[3].start, PARPORT_IRQ_NONE, PARPORT_DMA_NONE, >dev, 0);
+#endif
break;
case PCI_DEVICE_ID_MP1P :
+#if IS_ENABLED(CONFIG_PARPORT_PC)
/* add PC compatible parallel port */
parport_pc_probe_port(pcidev->resource[2].start, 
pcidev->resource[3].start, PARPORT_IRQ_NONE, PARPORT_DMA_NONE, >dev, 0);
+#endif
break;
}
 
-- 
1.7.9.7



[PATCH] mips: Export min_low_pfn if CONFIG_FLATMEM

2012-12-28 Thread Geert Uytterhoeven
If CONFIG_FLATMEM=y on MIPS, pfn_valid() (and thus virt_addr_valid()) uses
min_low_pfn. However, min_low_pfn is not exported by the generic bootmem
code.

As of commit e52a29326462badd9ceec90a9eb2ac2a8550e02e ("aoe: avoid races
between device destruction and discovery"), aoeblk_open() uses
virt_addr_valid(), causing a link error in the modular case:

ERROR: "min_low_pfn" [drivers/block/aoe/aoe.ko] undefined!

Add a custom export to fix this, just as is done for ia64 in
arch/ia64/kernel/ia64_ksyms.c.

Signed-off-by: Geert Uytterhoeven 
---
http://kisskb.ellerman.id.au/kisskb/buildresult/7864918/

 arch/mips/kernel/mips_ksyms.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/mips/kernel/mips_ksyms.c b/arch/mips/kernel/mips_ksyms.c
index df1e3e4..8579bb8 100644
--- a/arch/mips/kernel/mips_ksyms.c
+++ b/arch/mips/kernel/mips_ksyms.c
@@ -60,3 +60,8 @@ EXPORT_SYMBOL(invalid_pte_table);
 /* _mcount is defined in arch/mips/kernel/mcount.S */
 EXPORT_SYMBOL(_mcount);
 #endif
+
+#ifdef CONFIG_FLATMEM
+#include 
+EXPORT_SYMBOL(min_low_pfn);/* defined by bootmem.c, but not exported by 
generic code */
+#endif
-- 
1.7.0.4



[PATCH 01/24] cfq-iosched: Properly name all references to IO class

2012-12-28 Thread Tejun Heo
From: Vivek Goyal 

Currently CFQ has three IO classes: RT, BE and IDLE. In many places we
refer to workloads belonging to these classes as "prio". This gets
very confusing as one starts to associate it with ioprio.

So this patch just does a bunch of renaming so that reading the code
becomes easier. All references to the RT, BE and IDLE workloads use the
keyword "class" and all references to the subclasses SYNC, SYNC-NOIDLE
and ASYNC use the keyword "type".

This makes me feel much better while I am reading the code. There is no
functionality change due to this patch.
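
A compact userspace sketch of the resulting naming follows. It is a model for illustration, not the cfq code: struct tree and struct group are hypothetical reductions of cfq_rb_root and cfq_group, the numeric values of wl_type_t are assumed, and busy_queues_for_class() is an invented name. The point is that "class" selects the first index (or the separate idle tree) and "type" selects the second.

```c
#include <stddef.h>

/* IO class: first index into service_trees; IDLE has its own tree. */
enum wl_class_t {
	BE_WORKLOAD = 0,
	RT_WORKLOAD = 1,
	IDLE_WORKLOAD = 2,
};

/* Workload type within a class: second index (values assumed). */
enum wl_type_t {
	ASYNC_WORKLOAD = 0,
	SYNC_NOIDLE_WORKLOAD = 1,
	SYNC_WORKLOAD = 2,
};

struct tree { int count; };

struct group {
	struct tree service_trees[2][3];
	struct tree service_tree_idle;
};

struct tree *service_tree_for(struct group *g, enum wl_class_t wl_class,
			      enum wl_type_t wl_type)
{
	if (!g)
		return NULL;
	if (wl_class == IDLE_WORKLOAD)
		return &g->service_tree_idle;
	return &g->service_trees[wl_class][wl_type];
}

/* Number of queued entries for one class, summed over all types. */
int busy_queues_for_class(struct group *g, enum wl_class_t wl_class)
{
	if (wl_class == IDLE_WORKLOAD)
		return g->service_tree_idle.count;

	return g->service_trees[wl_class][ASYNC_WORKLOAD].count
		+ g->service_trees[wl_class][SYNC_NOIDLE_WORKLOAD].count
		+ g->service_trees[wl_class][SYNC_WORKLOAD].count;
}
```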

Signed-off-by: Vivek Goyal 
Acked-by: Jeff Moyer 
Acked-by: Tejun Heo 
Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 67 +++--
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e62e920..7646dfd 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -155,7 +155,7 @@ struct cfq_queue {
  * First index in the service_trees.
  * IDLE is handled separately, so it has negative index
  */
-enum wl_prio_t {
+enum wl_class_t {
BE_WORKLOAD = 0,
RT_WORKLOAD = 1,
IDLE_WORKLOAD = 2,
@@ -250,7 +250,7 @@ struct cfq_group {
 
unsigned long saved_workload_slice;
enum wl_type_t saved_workload;
-   enum wl_prio_t saved_serving_prio;
+   enum wl_class_t saved_serving_class;
 
/* number of requests that are on the dispatch list or inside driver */
int dispatched;
@@ -280,7 +280,7 @@ struct cfq_data {
/*
 * The priority currently being served
 */
-   enum wl_prio_t serving_prio;
+   enum wl_class_t serving_class;
enum wl_type_t serving_type;
unsigned long workload_expires;
struct cfq_group *serving_group;
@@ -354,16 +354,16 @@ struct cfq_data {
 static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
 
 static struct cfq_rb_root *service_tree_for(struct cfq_group *cfqg,
-   enum wl_prio_t prio,
+   enum wl_class_t class,
enum wl_type_t type)
 {
if (!cfqg)
return NULL;
 
-   if (prio == IDLE_WORKLOAD)
+   if (class == IDLE_WORKLOAD)
return >service_tree_idle;
 
-   return >service_trees[prio][type];
+   return >service_trees[class][type];
 }
 
 enum cfqq_state_flags {
@@ -732,7 +732,7 @@ static inline bool iops_mode(struct cfq_data *cfqd)
return false;
 }
 
-static inline enum wl_prio_t cfqq_prio(struct cfq_queue *cfqq)
+static inline enum wl_class_t cfqq_class(struct cfq_queue *cfqq)
 {
if (cfq_class_idle(cfqq))
return IDLE_WORKLOAD;
@@ -751,16 +751,16 @@ static enum wl_type_t cfqq_type(struct cfq_queue *cfqq)
return SYNC_WORKLOAD;
 }
 
-static inline int cfq_group_busy_queues_wl(enum wl_prio_t wl,
+static inline int cfq_group_busy_queues_wl(enum wl_class_t wl_class,
struct cfq_data *cfqd,
struct cfq_group *cfqg)
 {
-   if (wl == IDLE_WORKLOAD)
+   if (wl_class == IDLE_WORKLOAD)
return cfqg->service_tree_idle.count;
 
-   return cfqg->service_trees[wl][ASYNC_WORKLOAD].count
-   + cfqg->service_trees[wl][SYNC_NOIDLE_WORKLOAD].count
-   + cfqg->service_trees[wl][SYNC_WORKLOAD].count;
+   return cfqg->service_trees[wl_class][ASYNC_WORKLOAD].count
+   + cfqg->service_trees[wl_class][SYNC_NOIDLE_WORKLOAD].count
+   + cfqg->service_trees[wl_class][SYNC_WORKLOAD].count;
 }
 
 static inline int cfqg_busy_async_queues(struct cfq_data *cfqd,
@@ -1304,7 +1304,7 @@ static void cfq_group_served(struct cfq_data *cfqd, 
struct cfq_group *cfqg,
cfqg->saved_workload_slice = cfqd->workload_expires
- jiffies;
cfqg->saved_workload = cfqd->serving_type;
-   cfqg->saved_serving_prio = cfqd->serving_prio;
+   cfqg->saved_serving_class = cfqd->serving_class;
} else
cfqg->saved_workload_slice = 0;
 
@@ -1616,7 +1616,7 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, 
struct cfq_queue *cfqq,
int left;
int new_cfqq = 1;
 
-   service_tree = service_tree_for(cfqq->cfqg, cfqq_prio(cfqq),
+   service_tree = service_tree_for(cfqq->cfqg, cfqq_class(cfqq),
cfqq_type(cfqq));
if (cfq_class_idle(cfqq)) {
rb_key = CFQ_IDLE_DELAY;
@@ -2030,8 +2030,8 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
   struct cfq_queue *cfqq)
 {
if (cfqq) {
-   cfq_log_cfqq(cfqd, cfqq, "set_active wl_prio:%d wl_type:%d",
-   cfqd->serving_prio, cfqd->serving_type);
+

[PATCH 04/24] cfq-iosched: Rename few functions related to selecting workload

2012-12-28 Thread Tejun Heo
From: Vivek Goyal 

choose_service_tree() selects/sets both wl_class and wl_type.  Rename it to
choose_wl_class_and_type() to make it very clear.

cfq_choose_wl() only selects and sets wl_type. It is easy to confuse
it with choose_st(). So rename it to cfq_choose_wl_type() to make
it clear what it does.

Just renaming. No functionality change.

Signed-off-by: Vivek Goyal 
Acked-by: Jeff Moyer 
Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index db4a1a5..e34e142 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2490,7 +2490,7 @@ static void cfq_setup_merge(struct cfq_queue *cfqq, 
struct cfq_queue *new_cfqq)
}
 }
 
-static enum wl_type_t cfq_choose_wl(struct cfq_data *cfqd,
+static enum wl_type_t cfq_choose_wl_type(struct cfq_data *cfqd,
struct cfq_group *cfqg, enum wl_class_t wl_class)
 {
struct cfq_queue *queue;
@@ -2513,7 +2513,8 @@ static enum wl_type_t cfq_choose_wl(struct cfq_data *cfqd,
return cur_best;
 }
 
-static void choose_service_tree(struct cfq_data *cfqd, struct cfq_group *cfqg)
+static void
+choose_wl_class_and_type(struct cfq_data *cfqd, struct cfq_group *cfqg)
 {
unsigned slice;
unsigned count;
@@ -2551,7 +2552,7 @@ static void choose_service_tree(struct cfq_data *cfqd, 
struct cfq_group *cfqg)
 
 new_workload:
/* otherwise select new workload type */
-   cfqd->serving_wl_type = cfq_choose_wl(cfqd, cfqg,
+   cfqd->serving_wl_type = cfq_choose_wl_type(cfqd, cfqg,
cfqd->serving_wl_class);
st = st_for(cfqg, cfqd->serving_wl_class, cfqd->serving_wl_type);
count = st->count;
@@ -2621,7 +2622,7 @@ static void cfq_choose_cfqg(struct cfq_data *cfqd)
} else
cfqd->workload_expires = jiffies - 1;
 
-   choose_service_tree(cfqd, cfqg);
+   choose_wl_class_and_type(cfqd, cfqg);
 }
 
 /*
-- 
1.8.0.2



[PATCH 10/24] blkcg: make blkcg_gq's hierarchical

2012-12-28 Thread Tejun Heo
Currently a child blkg (blkcg_gq) can be created even if its parent
doesn't exist, i.e. given a blkg, it's not guaranteed that its
ancestors will exist.  This makes it difficult to implement proper
hierarchy support for blkcg policies.

Always create blkgs recursively and make a child blkg hold a reference
to its parent.  blkg->parent is added so that finding the parent is
easy.  blkcg_parent() is also added in the process.

This change can be visible to userland.  e.g. while issuing IO in a
nested cgroup previously didn't affect the ancestors at all, it now
initializes all ancestor blkgs, so zeroed stats for the request_queue
will always appear on them.  While this is userland visible, it
shouldn't cause any functional difference.
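
The top-down creation order can be modelled in a few lines of userspace C. This is a sketch under simplifying assumptions, not the blkcg implementation: struct cg, create_one() and lookup_create() are hypothetical names, a boolean replaces the per-queue blkg lookup, and reference counting and locking are left out. Each pass walks up from the target to the topmost ancestor that has nothing yet, creates that one, and repeats until the target itself has been created.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cgroup node with at most one group object. */
struct cg {
	struct cg *parent;
	bool has_group;
	int create_order;	/* records when the group was created */
};

int next_order;

/* Creation only succeeds if the parent's group already exists. */
static bool create_one(struct cg *c)
{
	if (c->parent && !c->parent->has_group)
		return false;
	c->has_group = true;
	c->create_order = ++next_order;
	return true;
}

/* Make sure @target and all of its ancestors have a group. */
bool lookup_create(struct cg *target)
{
	if (target->has_group)
		return true;

	while (true) {
		struct cg *pos = target;
		struct cg *parent = target->parent;

		/* find the topmost ancestor that is still missing */
		while (parent && !parent->has_group) {
			pos = parent;
			parent = parent->parent;
		}

		if (!create_one(pos))
			return false;
		if (pos == target)
			return true;
	}
}
```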

Signed-off-by: Tejun Heo 
Acked-by: Vivek Goyal 
---
 block/blk-cgroup.c | 42 +-
 block/blk-cgroup.h | 18 ++
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 18ae480..942f344 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -201,7 +201,16 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
}
blkg = new_blkg;
 
-   /* insert */
+   /* link parent and insert */
+   if (blkcg_parent(blkcg)) {
+   blkg->parent = __blkg_lookup(blkcg_parent(blkcg), q, false);
+   if (WARN_ON_ONCE(!blkg->parent)) {
+   blkg = ERR_PTR(-EINVAL);
+   goto err_put_css;
+   }
+   blkg_get(blkg->parent);
+   }
+
spin_lock(>lock);
ret = radix_tree_insert(>blkg_tree, q->id, blkg);
if (likely(!ret)) {
@@ -213,6 +222,10 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
if (!ret)
return blkg;
 
+   /* @blkg failed fully initialized, use the usual release path */
+   blkg_put(blkg);
+   return ERR_PTR(ret);
+
 err_put_css:
css_put(>css);
 err_free_blkg:
@@ -226,8 +239,9 @@ err_free_blkg:
  * @q: request_queue of interest
  *
  * Lookup blkg for the @blkcg - @q pair.  If it doesn't exist, try to
- * create one.  This function should be called under RCU read lock and
- * @q->queue_lock.
+ * create one.  blkg creation is performed recursively from blkcg_root such
+ * that all non-root blkg's have access to the parent blkg.  This function
+ * should be called under RCU read lock and @q->queue_lock.
  *
  * Returns pointer to the looked up or created blkg on success, ERR_PTR()
  * value on error.  If @q is dead, returns ERR_PTR(-EINVAL).  If @q is not
@@ -252,7 +266,23 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
if (blkg)
return blkg;
 
-   return blkg_create(blkcg, q, NULL);
+   /*
+* Create blkgs walking down from blkcg_root to @blkcg, so that all
+* non-root blkgs have access to their parents.
+*/
+   while (true) {
+   struct blkcg *pos = blkcg;
+   struct blkcg *parent = blkcg_parent(blkcg);
+
+   while (parent && !__blkg_lookup(parent, q, false)) {
+   pos = parent;
+   parent = blkcg_parent(parent);
+   }
+
+   blkg = blkg_create(pos, q, NULL);
+   if (pos == blkcg || IS_ERR(blkg))
+   return blkg;
+   }
 }
 EXPORT_SYMBOL_GPL(blkg_lookup_create);
 
@@ -321,8 +351,10 @@ static void blkg_rcu_free(struct rcu_head *rcu_head)
 
 void __blkg_release(struct blkcg_gq *blkg)
 {
-   /* release the extra blkcg reference this blkg has been holding */
+   /* release the blkcg and parent blkg refs this blkg has been holding */
css_put(>blkcg->css);
+   if (blkg->parent)
+   blkg_put(blkg->parent);
 
/*
 * A group is freed in rcu manner. But having an rcu lock does not
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 2459730..b26ed58 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -94,8 +94,13 @@ struct blkcg_gq {
struct list_headq_node;
struct hlist_node   blkcg_node;
struct blkcg*blkcg;
+
+   /* all non-root blkcg_gq's are guaranteed to have access to parent */
+   struct blkcg_gq *parent;
+
/* request allocation list for this blkcg-q pair */
struct request_list rl;
+
/* reference count */
int refcnt;
 
@@ -181,6 +186,19 @@ static inline struct blkcg *bio_blkcg(struct bio *bio)
 }
 
 /**
+ * blkcg_parent - get the parent of a blkcg
+ * @blkcg: blkcg of interest
+ *
+ * Return the parent blkcg of @blkcg.  Can be called anytime.
+ */
+static inline struct blkcg *blkcg_parent(struct blkcg *blkcg)
+{
+   struct cgroup *pcg = blkcg->css.cgroup->parent;
+
+   return pcg ? cgroup_to_blkcg(pcg) : NULL;
+}
+
+/**
  * blkg_to_pdata - get policy private data
  * 

[PATCH 08/24] blkcg: reorganize blkg_lookup_create() and friends

2012-12-28 Thread Tejun Heo
Reorganize such that

* __blkg_lookup() takes bool param @update_hint to determine whether
  to update hint.

* __blkg_lookup_create() no longer performs lookup before trying to
  create.  Renamed to blkg_create().

* blkg_lookup_create() now performs lookup and then invokes
  blkg_create() if lookup fails.

* root_blkg creation in blkcg_activate_policy() updated accordingly.
  Note that blkcg_activate_policy() no longer updates lookup hint if
  root_blkg already exists.

Except for the last lookup hint bit which is immaterial, this is pure
reorganization and doesn't introduce any visible behavior change.
This is to prepare for proper hierarchy support.
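
The split between lookup and creation can be sketched in userspace as follows. This is an illustrative model only: struct blkg_demo, struct blkcg_demo, demo_lookup() and demo_lookup_create() are hypothetical names, a small array replaces the radix tree, caller-provided storage replaces allocation, and no locking is modelled. It is an assumption of the sketch that the combined helper asks the lookup to refresh the hint.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 4

struct blkg_demo { int q_id; };

struct blkcg_demo {
	struct blkg_demo *hint;			/* last looked-up entry */
	struct blkg_demo *by_queue[MAX_QUEUES];	/* stands in for the tree */
};

/* Look up the entry for @q_id; refresh the hint only if asked to. */
struct blkg_demo *demo_lookup(struct blkcg_demo *cg, int q_id,
			      bool update_hint)
{
	struct blkg_demo *g = cg->hint;

	if (g && g->q_id == q_id)
		return g;
	if (q_id < 0 || q_id >= MAX_QUEUES)
		return NULL;

	g = cg->by_queue[q_id];
	if (g && g->q_id == q_id) {
		if (update_hint)
			cg->hint = g;
		return g;
	}
	return NULL;
}

/* Look up first; create from @storage only when the lookup fails. */
struct blkg_demo *demo_lookup_create(struct blkcg_demo *cg, int q_id,
				     struct blkg_demo *storage)
{
	struct blkg_demo *g = demo_lookup(cg, q_id, true);

	if (g)
		return g;
	if (q_id < 0 || q_id >= MAX_QUEUES)
		return NULL;

	storage->q_id = q_id;
	cg->by_queue[q_id] = storage;
	return storage;
}
```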

Signed-off-by: Tejun Heo 
Acked-by: Vivek Goyal 
---
 block/blk-cgroup.c | 75 +-
 1 file changed, 52 insertions(+), 23 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 7ef747b..2012754 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -126,7 +126,7 @@ err_free:
 }
 
 static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg,
- struct request_queue *q)
+ struct request_queue *q, bool update_hint)
 {
struct blkcg_gq *blkg;
 
@@ -135,14 +135,19 @@ static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg,
return blkg;
 
/*
-* Hint didn't match.  Look up from the radix tree.  Note that we
-* may not be holding queue_lock and thus are not sure whether
-* @blkg from blkg_tree has already been removed or not, so we
-* can't update hint to the lookup result.  Leave it to the caller.
+* Hint didn't match.  Look up from the radix tree.  Note that the
+* hint can only be updated under queue_lock as otherwise @blkg
+* could have already been removed from blkg_tree.  The caller is
+* responsible for grabbing queue_lock if @update_hint.
 */
blkg = radix_tree_lookup(>blkg_tree, q->id);
-   if (blkg && blkg->q == q)
+   if (blkg && blkg->q == q) {
+   if (update_hint) {
+   lockdep_assert_held(q->queue_lock);
+   rcu_assign_pointer(blkcg->blkg_hint, blkg);
+   }
return blkg;
+   }
 
return NULL;
 }
@@ -162,7 +167,7 @@ struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, struct 
request_queue *q)
 
if (unlikely(blk_queue_bypass(q)))
return NULL;
-   return __blkg_lookup(blkcg, q);
+   return __blkg_lookup(blkcg, q, false);
 }
 EXPORT_SYMBOL_GPL(blkg_lookup);
 
@@ -170,9 +175,9 @@ EXPORT_SYMBOL_GPL(blkg_lookup);
  * If @new_blkg is %NULL, this function tries to allocate a new one as
  * necessary using %GFP_ATOMIC.  @new_blkg is always consumed on return.
  */
-static struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
-struct request_queue *q,
-struct blkcg_gq *new_blkg)
+static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
+   struct request_queue *q,
+   struct blkcg_gq *new_blkg)
 {
struct blkcg_gq *blkg;
int ret;
@@ -180,13 +185,6 @@ static struct blkcg_gq *__blkg_lookup_create(struct blkcg 
*blkcg,
WARN_ON_ONCE(!rcu_read_lock_held());
lockdep_assert_held(q->queue_lock);
 
-   /* lookup and update hint on success, see __blkg_lookup() for details */
-   blkg = __blkg_lookup(blkcg, q);
-   if (blkg) {
-   rcu_assign_pointer(blkcg->blkg_hint, blkg);
-   goto out_free;
-   }
-
/* blkg holds a reference to blkcg */
if (!css_tryget(>css)) {
blkg = ERR_PTR(-EINVAL);
@@ -223,16 +221,39 @@ out_free:
return blkg;
 }
 
+/**
+ * blkg_lookup_create - lookup blkg, try to create one if not there
+ * @blkcg: blkcg of interest
+ * @q: request_queue of interest
+ *
+ * Lookup blkg for the @blkcg - @q pair.  If it doesn't exist, try to
+ * create one.  This function should be called under RCU read lock and
+ * @q->queue_lock.
+ *
+ * Returns pointer to the looked up or created blkg on success, ERR_PTR()
+ * value on error.  If @q is dead, returns ERR_PTR(-EINVAL).  If @q is not
+ * dead and bypassing, returns ERR_PTR(-EBUSY).
+ */
 struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
struct request_queue *q)
 {
+   struct blkcg_gq *blkg;
+
+   WARN_ON_ONCE(!rcu_read_lock_held());
+   lockdep_assert_held(q->queue_lock);
+
/*
 * This could be the first entry point of blkcg implementation and
 * we shouldn't allow anything to go through for a bypassing queue.
 */
if (unlikely(blk_queue_bypass(q)))
return ERR_PTR(blk_queue_dying(q) ? -EINVAL : -EBUSY);
-   return __blkg_lookup_create(blkcg, q, NULL);
+
+   blkg = 

[PATCH 02/24] cfq-iosched: More renaming to better represent wl_class and wl_type

2012-12-28 Thread Tejun Heo
From: Vivek Goyal 

Some more renaming. Again making the code uniform w.r.t. the use of
wl_class/class to represent IO class (RT, BE, IDLE) and using
wl_type/type to represent subclass (SYNC, SYNC-NOIDLE, ASYNC).

At places this patch shortens the string "workload" to "wl".
Renamed "saved_workload" to "saved_wl_type". Renamed
"saved_serving_class" to "saved_wl_class".

For uniformity with "saved_wl_*" variables, renamed "serving_class"
to "serving_wl_class" and renamed "serving_type" to "serving_wl_type".

Again, just trying to improve upon code uniformity and improve
readability. No functional change.

v2:
- Restored the usage of keyword "service" based on Jeff Moyer's feedback.

Signed-off-by: Vivek Goyal 
Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 64 +++--
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7646dfd..8f890bf 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -248,9 +248,9 @@ struct cfq_group {
struct cfq_rb_root service_trees[2][3];
struct cfq_rb_root service_tree_idle;
 
-   unsigned long saved_workload_slice;
-   enum wl_type_t saved_workload;
-   enum wl_class_t saved_serving_class;
+   unsigned long saved_wl_slice;
+   enum wl_type_t saved_wl_type;
+   enum wl_class_t saved_wl_class;
 
/* number of requests that are on the dispatch list or inside driver */
int dispatched;
@@ -280,8 +280,8 @@ struct cfq_data {
/*
 * The priority currently being served
 */
-   enum wl_class_t serving_class;
-   enum wl_type_t serving_type;
+   enum wl_class_t serving_wl_class;
+   enum wl_type_t serving_wl_type;
unsigned long workload_expires;
struct cfq_group *serving_group;
 
@@ -1241,7 +1241,7 @@ cfq_group_notify_queue_del(struct cfq_data *cfqd, struct 
cfq_group *cfqg)
 
cfq_log_cfqg(cfqd, cfqg, "del_from_rr group");
cfq_group_service_tree_del(st, cfqg);
-   cfqg->saved_workload_slice = 0;
+   cfqg->saved_wl_slice = 0;
cfqg_stats_update_dequeue(cfqg);
 }
 
@@ -1301,12 +1301,12 @@ static void cfq_group_served(struct cfq_data *cfqd, 
struct cfq_group *cfqg,
 
/* This group is being expired. Save the context */
if (time_after(cfqd->workload_expires, jiffies)) {
-   cfqg->saved_workload_slice = cfqd->workload_expires
+   cfqg->saved_wl_slice = cfqd->workload_expires
- jiffies;
-   cfqg->saved_workload = cfqd->serving_type;
-   cfqg->saved_serving_class = cfqd->serving_class;
+   cfqg->saved_wl_type = cfqd->serving_wl_type;
+   cfqg->saved_wl_class = cfqd->serving_wl_class;
} else
-   cfqg->saved_workload_slice = 0;
+   cfqg->saved_wl_slice = 0;
 
cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
st->min_vdisktime);
@@ -2031,7 +2031,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
 {
if (cfqq) {
cfq_log_cfqq(cfqd, cfqq, "set_active wl_class:%d wl_type:%d",
-   cfqd->serving_class, cfqd->serving_type);
+   cfqd->serving_wl_class, cfqd->serving_wl_type);
cfqg_stats_update_avg_queue_size(cfqq->cfqg);
cfqq->slice_start = 0;
cfqq->dispatch_start = jiffies;
@@ -2118,8 +2118,8 @@ static inline void cfq_slice_expired(struct cfq_data 
*cfqd, bool timed_out)
 static struct cfq_queue *cfq_get_next_queue(struct cfq_data *cfqd)
 {
struct cfq_rb_root *service_tree =
-   service_tree_for(cfqd->serving_group, cfqd->serving_class,
-   cfqd->serving_type);
+   service_tree_for(cfqd->serving_group, cfqd->serving_wl_class,
+   cfqd->serving_wl_type);
 
if (!cfqd->rq_queued)
return NULL;
@@ -2523,20 +2523,20 @@ static void choose_service_tree(struct cfq_data *cfqd, 
struct cfq_group *cfqg)
unsigned count;
struct cfq_rb_root *st;
unsigned group_slice;
-   enum wl_class_t original_class = cfqd->serving_class;
+   enum wl_class_t original_class = cfqd->serving_wl_class;
 
/* Choose next priority. RT > BE > IDLE */
if (cfq_group_busy_queues_wl(RT_WORKLOAD, cfqd, cfqg))
-   cfqd->serving_class = RT_WORKLOAD;
+   cfqd->serving_wl_class = RT_WORKLOAD;
else if (cfq_group_busy_queues_wl(BE_WORKLOAD, cfqd, cfqg))
-   cfqd->serving_class = BE_WORKLOAD;
+   cfqd->serving_wl_class = BE_WORKLOAD;
else {
-   cfqd->serving_class = IDLE_WORKLOAD;
+   cfqd->serving_wl_class = IDLE_WORKLOAD;
cfqd->workload_expires = jiffies 

[PATCH 13/24] cfq-iosched: implement hierarchy-ready cfq_group charge scaling

2012-12-28 Thread Tejun Heo
Currently, cfqg charges are scaled directly according to cfqg->weight.
Regardless of the number of active cfqgs or the amount of active
weights, a given weight value always scales charge the same way.  This
works fine as long as all cfqgs are treated equally regardless of
their positions in the hierarchy, which is what cfq currently
implements.  It can't work in hierarchical settings because the
interpretation of a given weight value depends on where the weight is
located in the hierarchy.

This patch reimplements cfqg charge scaling so that it can be used to
support hierarchy properly.  The scheme is fairly simple and
light-weight.

* When a cfqg is added to the service tree, v(disktime)weight is
  calculated.  It walks up the tree to root calculating the fraction
  it has in the hierarchy.  At each level, the fraction can be
  calculated as

cfqg->weight / parent->level_weight

  By compounding these, the global fraction of vdisktime the cfqg has
  claim to - vfraction - can be determined.

* When the cfqg needs to be charged, the charge is scaled inversely
  proportionally to the vfraction.
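
The arithmetic of the two steps above can be tried out in userspace. This is a sketch under stated assumptions, not the patch itself: SERVICE_SHIFT is assumed to be 12, struct group is a hypothetical reduction of cfq_group to the fields needed here, and vfraction_of() and scale_charge() are invented names. The fraction starts at 1 in fixed point and is multiplied by weight / parent->level_weight at each level; the charge is then divided by that fraction.

```c
#include <stdint.h>
#include <stddef.h>

#define SERVICE_SHIFT 12	/* assumed fixed-point shift */

/* Hypothetical reduction of a group to what the calculation needs. */
struct group {
	unsigned int weight;		/* weight against siblings */
	unsigned int level_weight;	/* total weight among its children */
	struct group *parent;
};

/* Compound weight / parent->level_weight from @g up to the root. */
unsigned int vfraction_of(const struct group *g)
{
	uint64_t vfr = 1u << SERVICE_SHIFT;	/* start with 1 */
	const struct group *pos;

	for (pos = g; pos->parent; pos = pos->parent)
		vfr = vfr * pos->weight / pos->parent->level_weight;

	return vfr ? (unsigned int)vfr : 1;	/* never return 0 */
}

/* scaled = charge / vfraction, result in fixed point. */
uint64_t scale_charge(unsigned long charge, unsigned int vfraction)
{
	uint64_t c = (uint64_t)charge << SERVICE_SHIFT;	/* fixed point */

	c <<= SERVICE_SHIFT;	/* pre-scale so the division keeps the shift */
	return c / vfraction;
}
```

For example, a group holding half of the root's level weight gets a vfraction of one half, so a charge of 10 is accounted as 20 in fixed point, while a group owning the whole tree is charged at the smallest factor of 1.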

The new scaling scheme uses the same CFQ_SERVICE_SHIFT for fixed point
representation as before; however, the smallest scaling factor is now
1 (i.e. 1 << CFQ_SERVICE_SHIFT).  This is different from before where 1
was for CFQ_WEIGHT_DEFAULT and higher weight would result in smaller
scaling factor.

While this shifts the global scale of vdisktime a bit, it doesn't
change the relative relationships among cfqgs and the scheduling
result isn't different.

cfq_group_notify_queue_add uses fixed CFQ_IDLE_DELAY when appending
new cfqg to the service tree.  The specific value of CFQ_IDLE_DELAY
didn't have any relevance to vdisktime before and is unlikely to cause
any visible behavior difference now especially as the scale shift
isn't that large.

As the new scheme now makes proper distinction between cfqg->weight
and ->leaf_weight, reverse the weight aliasing for root cfqgs.  For
root, both weights are now mapped to ->leaf_weight instead of the
other way around.

Because we're still using cfqg_flat_parent(), this patch shouldn't
change the scheduling behavior in any noticeable way.

v2: Beefed up comments on vfraction as requested by Vivek.

Signed-off-by: Tejun Heo 
Cc: Vivek Goyal 
---
 block/cfq-iosched.c | 107 +---
 1 file changed, 77 insertions(+), 30 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7701c3f..b24acf6 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -237,6 +237,18 @@ struct cfq_group {
unsigned int children_weight;
 
/*
+* vfraction is the fraction of vdisktime that the tasks in this
+* cfqg are entitled to.  This is determined by compounding the
+* ratios walking up from this cfqg to the root.
+*
+* It is in fixed point w/ CFQ_SERVICE_SHIFT and the sum of all
+* vfractions on a service tree is approximately 1.  The sum may
+* deviate a bit due to rounding errors and fluctuations caused by
+* cfqgs entering and leaving the service tree.
+*/
+   unsigned int vfraction;
+
+   /*
 * There are two weights - (internal) weight is the weight of this
 * cfqg against the sibling cfqgs.  leaf_weight is the wight of
 * this cfqg against the child cfqgs.  For the root cfqg, both
@@ -891,13 +903,27 @@ cfq_prio_to_slice(struct cfq_data *cfqd, struct cfq_queue 
*cfqq)
return cfq_prio_slice(cfqd, cfq_cfqq_sync(cfqq), cfqq->ioprio);
 }
 
-static inline u64 cfq_scale_slice(unsigned long delta, struct cfq_group *cfqg)
+/**
+ * cfqg_scale_charge - scale disk time charge according to cfqg weight
+ * @charge: disk time being charged
+ * @vfraction: vfraction of the cfqg, fixed point w/ CFQ_SERVICE_SHIFT
+ *
+ * Scale @charge according to @vfraction, which is in range (0, 1].  The
+ * scaling is inversely proportional.
+ *
+ * scaled = charge / vfraction
+ *
+ * The result is also in fixed point w/ CFQ_SERVICE_SHIFT.
+ */
+static inline u64 cfqg_scale_charge(unsigned long charge,
+   unsigned int vfraction)
 {
-   u64 d = delta << CFQ_SERVICE_SHIFT;
+   u64 c = charge << CFQ_SERVICE_SHIFT;/* make it fixed point */
 
-   d = d * CFQ_WEIGHT_DEFAULT;
-   do_div(d, cfqg->weight);
-   return d;
+   /* charge / vfraction */
+   c <<= CFQ_SERVICE_SHIFT;
+   do_div(c, vfraction);
+   return c;
 }
 
 static inline u64 max_vdisktime(u64 min_vdisktime, u64 vdisktime)
@@ -1237,7 +1263,9 @@ cfq_update_group_weight(struct cfq_group *cfqg)
 static void
 cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
 {
+   unsigned int vfr = 1 << CFQ_SERVICE_SHIFT;  /* start with 1 */
struct cfq_group *pos = cfqg;
+   struct cfq_group *parent;
bool propagate;
 
/* add to the service tree */
@@ -1248,22 +1276,34 

[PATCH 18/24] blkcg: s/blkg_rwstat_sum()/blkg_rwstat_total()/

2012-12-28 Thread Tejun Heo
Rename blkg_rwstat_sum() to blkg_rwstat_total().  sum will be used for
summing up stats from multiple blkgs.

Signed-off-by: Tejun Heo 
---
 block/blk-cgroup.h  | 4 ++--
 block/cfq-iosched.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 678e89e..586c0ac 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -461,14 +461,14 @@ static inline struct blkg_rwstat blkg_rwstat_read(struct 
blkg_rwstat *rwstat)
 }
 
 /**
- * blkg_rwstat_sum - read the total count of a blkg_rwstat
+ * blkg_rwstat_total - read the total count of a blkg_rwstat
  * @rwstat: blkg_rwstat to read
  *
  * Return the total count of @rwstat regardless of the IO direction.  This
  * function can be called without synchronization and takes care of u64
  * atomicity.
  */
-static inline uint64_t blkg_rwstat_sum(struct blkg_rwstat *rwstat)
+static inline uint64_t blkg_rwstat_total(struct blkg_rwstat *rwstat)
 {
struct blkg_rwstat tmp = blkg_rwstat_read(rwstat);
 
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e8f3106..d43145cc 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -536,7 +536,7 @@ static void cfqg_stats_set_start_empty_time(struct 
cfq_group *cfqg)
 {
struct cfqg_stats *stats = &cfqg->stats;
 
-   if (blkg_rwstat_sum(&stats->queued))
+   if (blkg_rwstat_total(&stats->queued))
return;
 
/*
@@ -580,7 +580,7 @@ static void cfqg_stats_update_avg_queue_size(struct 
cfq_group *cfqg)
struct cfqg_stats *stats = &cfqg->stats;
 
blkg_stat_add(&stats->avg_queue_size_sum,
- blkg_rwstat_sum(&stats->queued));
+ blkg_rwstat_total(&stats->queued));
blkg_stat_add(&stats->avg_queue_size_samples, 1);
cfqg_stats_update_group_wait_time(stats);
 }
-- 
1.8.0.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 19/24] blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()

2012-12-28 Thread Tejun Heo
Implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge().
The former two collect the [rw]stats designated by the target policy
data and offset from the pd's subtree.  The latter two add one
[rw]stat to another.

Note that the recursive sum functions require the queue lock to be
held on entry to make blkg online test reliable.  This is necessary to
properly handle stats of a dying blkg.

These will be used to implement hierarchical stats.
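
The recursive sum semantics above (the target itself plus every online descendant) can be sketched in user space as follows.  struct node and both functions are hypothetical; the patch walks blkgs in pre-order under RCU with the queue lock held, whereas this sketch uses plain recursion over child/sibling pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a blkg carrying a single stat. */
struct node {
	bool online;
	uint64_t stat;
	struct node *first_child;
	struct node *next_sibling;
};

/* Sum the stats of all online descendants of @n, excluding @n itself. */
static uint64_t descendants_sum(const struct node *n)
{
	const struct node *c;
	uint64_t sum = 0;

	for (c = n->first_child; c; c = c->next_sibling) {
		if (c->online)		/* offline nodes are skipped ... */
			sum += c->stat;
		sum += descendants_sum(c);	/* ... but their subtrees are still walked */
	}
	return sum;
}

/* @root's own stat plus the stats of all its online descendants. */
static uint64_t recursive_sum(const struct node *root)
{
	assert(root != NULL);
	return root->stat + descendants_sum(root);
}
```

Note that an offline node only drops its own contribution; its online descendants are still counted, which matches the per-blkg online test in the patch.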

Signed-off-by: Tejun Heo 
---
 block/blk-cgroup.c | 107 +
 block/blk-cgroup.h |  35 ++
 2 files changed, 142 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 4d625d2..a1a4b97 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -32,6 +32,26 @@ EXPORT_SYMBOL_GPL(blkcg_root);
 
 static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS];
 
+static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg,
+ struct request_queue *q, bool 
update_hint);
+
+/**
+ * blkg_for_each_descendant_pre - pre-order walk of a blkg's descendants
+ * @d_blkg: loop cursor pointing to the current descendant
+ * @pos_cgrp: used for iteration
+ * @p_blkg: target blkg to walk descendants of
+ *
+ * Walk @d_blkg through the descendants of @p_blkg.  Must be used with RCU
+ * read locked.  If called under either blkcg or queue lock, the iteration
+ * is guaranteed to include all and only online blkgs.  The caller may
+ * update @pos_cgrp by calling cgroup_rightmost_descendant() to skip
+ * subtree.
+ */
+#define blkg_for_each_descendant_pre(d_blkg, pos_cgrp, p_blkg) \
+   cgroup_for_each_descendant_pre((pos_cgrp), (p_blkg)->blkcg->css.cgroup) 
\
+   if (((d_blkg) = __blkg_lookup(cgroup_to_blkcg(pos_cgrp), \
+ (p_blkg)->q, false)))
+
 static bool blkcg_policy_enabled(struct request_queue *q,
 const struct blkcg_policy *pol)
 {
@@ -127,6 +147,17 @@ err_free:
return NULL;
 }
 
+/**
+ * __blkg_lookup - internal version of blkg_lookup()
+ * @blkcg: blkcg of interest
+ * @q: request_queue of interest
+ * @update_hint: whether to update lookup hint with the result or not
+ *
+ * This is internal version and shouldn't be used by policy
+ * implementations.  Looks up blkgs for the @blkcg - @q pair regardless of
+ * @q's bypass state.  If @update_hint is %true, the caller should be
+ * holding @q->queue_lock and lookup hint is updated on success.
+ */
 static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg,
  struct request_queue *q, bool update_hint)
 {
@@ -585,6 +616,82 @@ u64 blkg_prfill_rwstat(struct seq_file *sf, struct 
blkg_policy_data *pd,
 EXPORT_SYMBOL_GPL(blkg_prfill_rwstat);
 
 /**
+ * blkg_stat_recursive_sum - collect hierarchical blkg_stat
+ * @pd: policy private data of interest
+ * @off: offset to the blkg_stat in @pd
+ *
+ * Collect the blkg_stat specified by @off from @pd and all its online
+ * descendants and return the sum.  The caller must be holding the queue
+ * lock for online tests.
+ */
+u64 blkg_stat_recursive_sum(struct blkg_policy_data *pd, int off)
+{
+   struct blkcg_policy *pol = blkcg_policy[pd->plid];
+   struct blkcg_gq *pos_blkg;
+   struct cgroup *pos_cgrp;
+   u64 sum;
+
+   lockdep_assert_held(pd->blkg->q->queue_lock);
+
+   sum = blkg_stat_read((void *)pd + off);
+
+   rcu_read_lock();
+   blkg_for_each_descendant_pre(pos_blkg, pos_cgrp, pd_to_blkg(pd)) {
+   struct blkg_policy_data *pos_pd = blkg_to_pd(pos_blkg, pol);
+   struct blkg_stat *stat = (void *)pos_pd + off;
+
+   if (pos_blkg->online)
+   sum += blkg_stat_read(stat);
+   }
+   rcu_read_unlock();
+
+   return sum;
+}
+EXPORT_SYMBOL_GPL(blkg_stat_recursive_sum);
+
+/**
+ * blkg_rwstat_recursive_sum - collect hierarchical blkg_rwstat
+ * @pd: policy private data of interest
+ * @off: offset to the blkg_rwstat in @pd
+ *
+ * Collect the blkg_rwstat specified by @off from @pd and all its online
+ * descendants and return the sum.  The caller must be holding the queue
+ * lock for online tests.
+ */
+struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkg_policy_data *pd,
+int off)
+{
+   struct blkcg_policy *pol = blkcg_policy[pd->plid];
+   struct blkcg_gq *pos_blkg;
+   struct cgroup *pos_cgrp;
+   struct blkg_rwstat sum;
+   int i;
+
+   lockdep_assert_held(pd->blkg->q->queue_lock);
+
+   sum = blkg_rwstat_read((void *)pd + off);
+
+   rcu_read_lock();
+   blkg_for_each_descendant_pre(pos_blkg, pos_cgrp, pd_to_blkg(pd)) {
+   struct blkg_policy_data *pos_pd = blkg_to_pd(pos_blkg, pol);
+   struct blkg_rwstat *rwstat = (void *)pos_pd + off;
+   struct blkg_rwstat tmp;
+
+   if (!pos_blkg->online)
+   

[PATCH 22/24] cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()

2012-12-28 Thread Tejun Heo
Separate out cfqg_stats_reset() which takes struct cfqg_stats * from
cfq_pd_reset_stats() and move the latter to where other pd methods are
defined.  cfqg_stats_reset() will be used to implement hierarchical
stats.

Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index d43145cc..f8b34bb 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -688,11 +688,9 @@ static inline void cfqg_stats_update_completion(struct 
cfq_group *cfqg,
io_start_time - start_time);
 }
 
-static void cfq_pd_reset_stats(struct blkcg_gq *blkg)
+/* @stats = 0 */
+static void cfqg_stats_reset(struct cfqg_stats *stats)
 {
-   struct cfq_group *cfqg = blkg_to_cfqg(blkg);
-   struct cfqg_stats *stats = &cfqg->stats;
-
/* queued stats shouldn't be cleared */
blkg_rwstat_reset(&stats->service_bytes);
blkg_rwstat_reset(&stats->serviced);
@@ -1477,6 +1475,13 @@ static void cfq_pd_init(struct blkcg_gq *blkg)
cfqg->leaf_weight = blkg->blkcg->cfq_leaf_weight;
 }
 
+static void cfq_pd_reset_stats(struct blkcg_gq *blkg)
+{
+   struct cfq_group *cfqg = blkg_to_cfqg(blkg);
+
+   cfqg_stats_reset(&cfqg->stats);
+}
+
 /*
  * Search for the cfq group current task belongs to. request_queue lock must
  * be held.
-- 
1.8.0.2



[PATCH 12/24] cfq-iosched: implement cfq_group->nr_active and ->children_weight

2012-12-28 Thread Tejun Heo
To prepare for blkcg hierarchy support, add cfqg->nr_active and
->children_weight.  cfqg->nr_active counts the number of active cfqgs
at the cfqg's level and ->children_weight is the sum of the weights of
those cfqgs.  The level covers the cfqg itself (cfqg->leaf_weight) and
its immediate children.

The two values are updated when a cfqg enters and leaves the group
service tree.  Unless the hierarchy is very deep, the added overhead
should be negligible.
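
As a rough user-space illustration of the bookkeeping described above, the sketch below activates and deactivates a group and propagates the change upwards.  struct group and both helpers are hypothetical, and the sketch follows a real parent pointer, whereas this patch still resolves every parent to the root via cfqg_flat_parent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for cfq_group. */
struct group {
	struct group *parent;
	int nr_active;
	unsigned int weight;		/* weight against siblings */
	unsigned int leaf_weight;	/* weight against own children */
	unsigned int children_weight;	/* sum of active weights at this level */
};

/* Activate @g and propagate upwards until an already active node or root. */
static void group_activate(struct group *g)
{
	struct group *pos = g;
	bool propagate = !pos->nr_active++;

	pos->children_weight += pos->leaf_weight;

	while (propagate && pos->parent) {
		struct group *parent = pos->parent;

		propagate = !parent->nr_active++;
		parent->children_weight += pos->weight;
		pos = parent;
	}
}

/* Undo group_activate(). */
static void group_deactivate(struct group *g)
{
	struct group *pos = g;
	bool propagate = !--pos->nr_active;

	pos->children_weight -= pos->leaf_weight;

	while (propagate) {
		struct group *parent = pos->parent;

		/* @pos has no active members left, so nothing may remain */
		assert(pos->children_weight == 0);
		if (!parent)
			break;

		propagate = !--parent->nr_active;
		parent->children_weight -= pos->weight;
		pos = parent;
	}
}
```

Propagation stops at the first ancestor that was already active, which is why the cost stays small unless the hierarchy is very deep.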

Currently, the parent is determined using cfqg_flat_parent() which
makes the root cfqg the parent of all other cfqgs.  This is to make
the transition to hierarchy-aware scheduling gradual.  Scheduling
logic will be converted to use cfqg->children_weight without actually
changing the behavior.  When everything is ready,
cfqg_flat_parent() will be replaced with a proper parent function.

This patch doesn't introduce any behavior change.

v2: s/cfqg->level_weight/cfqg->children_weight/ as per Vivek.

Signed-off-by: Tejun Heo 
Cc: Vivek Goyal 
---
 block/cfq-iosched.c | 76 +
 1 file changed, 76 insertions(+)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 175218d6..7701c3f 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -225,6 +225,18 @@ struct cfq_group {
u64 vdisktime;
 
/*
+* The number of active cfqgs and sum of their weights under this
+* cfqg.  This covers this cfqg's leaf_weight and all children's
+* weights, but does not cover weights of further descendants.
+*
+* If a cfqg is on the service tree, it's active.  An active cfqg
+* also activates its parent and contributes to the children_weight
+* of the parent.
+*/
+   int nr_active;
+   unsigned int children_weight;
+
+   /*
 * There are two weights - (internal) weight is the weight of this
 * cfqg against the sibling cfqgs.  leaf_weight is the weight of
 * this cfqg against the child cfqgs.  For the root cfqg, both
@@ -583,6 +595,22 @@ static inline struct cfq_group *blkg_to_cfqg(struct 
blkcg_gq *blkg)
return pd_to_cfqg(blkg_to_pd(blkg, &blkcg_policy_cfq));
 }
 
+/*
+ * Determine the parent cfqg for weight calculation.  Currently, cfqg
+ * scheduling is flat and the root is the parent of everyone else.
+ */
+static inline struct cfq_group *cfqg_flat_parent(struct cfq_group *cfqg)
+{
+   struct blkcg_gq *blkg = cfqg_to_blkg(cfqg);
+   struct cfq_group *root;
+
+   while (blkg->parent)
+   blkg = blkg->parent;
+   root = blkg_to_cfqg(blkg);
+
+   return root != cfqg ? root : NULL;
+}
+
 static inline void cfqg_get(struct cfq_group *cfqg)
 {
return blkg_get(cfqg_to_blkg(cfqg));
@@ -683,6 +711,7 @@ static void cfq_pd_reset_stats(struct blkcg_gq *blkg)
 
 #else  /* CONFIG_CFQ_GROUP_IOSCHED */
 
+static inline struct cfq_group *cfqg_flat_parent(struct cfq_group *cfqg) { 
return NULL; }
 static inline void cfqg_get(struct cfq_group *cfqg) { }
 static inline void cfqg_put(struct cfq_group *cfqg) { }
 
@@ -1208,11 +1237,33 @@ cfq_update_group_weight(struct cfq_group *cfqg)
 static void
 cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
 {
+   struct cfq_group *pos = cfqg;
+   bool propagate;
+
+   /* add to the service tree */
BUG_ON(!RB_EMPTY_NODE(&cfqg->rb_node));
 
cfq_update_group_weight(cfqg);
__cfq_group_service_tree_add(st, cfqg);
st->total_weight += cfqg->weight;
+
+   /*
+* Activate @cfqg and propagate activation upwards until we meet an
+* already activated node or reach root.
+*/
+   propagate = !pos->nr_active++;
+   pos->children_weight += pos->leaf_weight;
+
+   while (propagate) {
+   struct cfq_group *parent = cfqg_flat_parent(pos);
+
+   if (!parent)
+   break;
+
+   propagate = !parent->nr_active++;
+   parent->children_weight += pos->weight;
+   pos = parent;
+   }
 }
 
 static void
@@ -1243,6 +1294,31 @@ cfq_group_notify_queue_add(struct cfq_data *cfqd, struct 
cfq_group *cfqg)
 static void
 cfq_group_service_tree_del(struct cfq_rb_root *st, struct cfq_group *cfqg)
 {
+   struct cfq_group *pos = cfqg;
+   bool propagate;
+
+   /*
+* Undo activation from cfq_group_service_tree_add().  Deactivate
+* @cfqg and propagate deactivation upwards.
+*/
+   propagate = !--pos->nr_active;
+   pos->children_weight -= pos->leaf_weight;
+
+   while (propagate) {
+   struct cfq_group *parent = cfqg_flat_parent(pos);
+
+   /* @pos has 0 nr_active at this point */
+   WARN_ON_ONCE(pos->children_weight);
+
+   if (!parent)
+   break;
+
+   propagate = !--parent->nr_active;
+   parent->children_weight -= pos->weight;
+   pos = parent;
+   }
+
+   /* 

[PATCH 14/24] cfq-iosched: convert cfq_group_slice() to use cfqg->vfraction

2012-12-28 Thread Tejun Heo
cfq_group_slice() calculates slice by taking a fraction of
cfq_target_latency according to the ratio of cfqg->weight against
service_tree->total_weight.  This currently works only because all
cfqgs are treated as being at the same level.

To prepare for proper hierarchy support, convert cfq_group_slice() to
base the calculation on cfqg->vfraction.  As cfqg->vfraction is always
a fraction of 1 and represents the fraction allocated to the cfqg with
hierarchy considered, the slice can be calculated simply by
multiplying cfq_target_latency by cfqg->vfraction (with the fixed
point shift factored in).

As vfraction calculation currently treats all non-root cfqgs as
children of the root cfqg, this patch doesn't introduce a noticeable
behavior difference.
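
The slice calculation described above boils down to one fixed-point multiply.  A minimal sketch, assuming a shift of 12 for CFQ_SERVICE_SHIFT; SERVICE_SHIFT and group_slice() are hypothetical names, and the 64-bit widening is an addition of this sketch that the patch itself does not perform.

```c
#include <assert.h>
#include <stdint.h>

#define SERVICE_SHIFT 12	/* assumed fixed-point shift for this sketch */

/* slice = target_latency * vfraction, with @vfraction fixed point in (0, 1] */
static unsigned int group_slice(unsigned int target_latency,
				unsigned int vfraction)
{
	assert(vfraction <= (1u << SERVICE_SHIFT));

	/* widen before multiplying so large latencies cannot overflow */
	return (unsigned int)(((uint64_t)target_latency * vfraction)
			      >> SERVICE_SHIFT);
}
```

With a target latency of 300 and a vfraction of roughly one third (1365 in this fixed-point format), the group gets a slice of 99; rounding is always downwards.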

Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b24acf6..ee34282 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -85,7 +85,6 @@ struct cfq_rb_root {
struct rb_root rb;
struct rb_node *left;
unsigned count;
-   unsigned total_weight;
u64 min_vdisktime;
struct cfq_ttime ttime;
 };
@@ -979,9 +978,7 @@ static inline unsigned cfq_group_get_avg_queues(struct 
cfq_data *cfqd,
 static inline unsigned
 cfq_group_slice(struct cfq_data *cfqd, struct cfq_group *cfqg)
 {
-   struct cfq_rb_root *st = &cfqd->grp_service_tree;
-
-   return cfqd->cfq_target_latency * cfqg->weight / st->total_weight;
+   return cfqd->cfq_target_latency * cfqg->vfraction >> CFQ_SERVICE_SHIFT;
 }
 
 static inline unsigned
@@ -1273,7 +1270,6 @@ cfq_group_service_tree_add(struct cfq_rb_root *st, struct 
cfq_group *cfqg)
 
cfq_update_group_weight(cfqg);
__cfq_group_service_tree_add(st, cfqg);
-   st->total_weight += cfqg->weight;
 
/*
 * Activate @cfqg and calculate the portion of vfraction @cfqg is
@@ -1360,7 +1356,6 @@ cfq_group_service_tree_del(struct cfq_rb_root *st, struct 
cfq_group *cfqg)
}
 
/* remove from the service tree */
-   st->total_weight -= cfqg->weight;
if (!RB_EMPTY_NODE(&cfqg->rb_node))
cfq_rb_erase(&cfqg->rb_node, st);
 }
-- 
1.8.0.2



[PATCH 11/24] cfq-iosched: add leaf_weight

2012-12-28 Thread Tejun Heo
cfq blkcg is about to grow proper hierarchy handling, where a child
blkg's weight would nest inside the parent's.  This makes tasks in a
blkg compete against both the tasks in the sibling blkgs and the tasks
of child blkgs.

We're gonna use the existing weight as the group weight which decides
the blkg's weight against its siblings.  This patch introduces a new
weight - leaf_weight - which decides the weight of a blkg against the
child blkgs.

It's named leaf_weight because another way to look at it is that each
internal blkg node has a hidden child leaf node which contains all
its tasks; leaf_weight is the weight of that leaf node and is handled
the same as the weight of the child blkgs.

This patch only adds the leaf_weight fields and exposes them to userland.
The new weight isn't actually used anywhere yet.  Note that
cfq-iosched currently officially supports only a single-level hierarchy
and root blkgs compete with the first level blkgs - i.e. root weight is
basically being used as leaf_weight.  For root blkgs, the two weights
are kept in sync for backward compatibility.

v2: cfqd->root_group->leaf_weight initialization was missing from
cfq_init_queue() causing divide by zero when
!CONFIG_CFQ_GROUP_SCHED.  Fix it.  Reported by Fengguang.

Signed-off-by: Tejun Heo 
Cc: Fengguang Wu 
---
 block/blk-cgroup.c  |   4 +-
 block/blk-cgroup.h  |   1 +
 block/cfq-iosched.c | 134 
 3 files changed, 130 insertions(+), 9 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 942f344..10e1df9 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -26,7 +26,8 @@
 
 static DEFINE_MUTEX(blkcg_pol_mutex);
 
-struct blkcg blkcg_root = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT };
+struct blkcg blkcg_root = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT,
+   .cfq_leaf_weight = 2 * CFQ_WEIGHT_DEFAULT, };
 EXPORT_SYMBOL_GPL(blkcg_root);
 
 static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS];
@@ -710,6 +711,7 @@ static struct cgroup_subsys_state *blkcg_css_alloc(struct 
cgroup *cgroup)
return ERR_PTR(-ENOMEM);
 
blkcg->cfq_weight = CFQ_WEIGHT_DEFAULT;
+   blkcg->cfq_leaf_weight = CFQ_WEIGHT_DEFAULT;
blkcg->id = atomic64_inc_return(&id_seq); /* root is 0, start from 1 */
 done:
spin_lock_init(&blkcg->lock);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index b26ed58..2446225 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -54,6 +54,7 @@ struct blkcg {
 
/* TODO: per-policy storage in blkcg */
unsigned intcfq_weight; /* belongs to cfq */
+   unsigned intcfq_leaf_weight;
 };
 
 struct blkg_stat {
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index bc076f4..175218d6 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -223,10 +223,21 @@ struct cfq_group {
 
/* group service_tree key */
u64 vdisktime;
+
+   /*
+* There are two weights - (internal) weight is the weight of this
+* cfqg against the sibling cfqgs.  leaf_weight is the weight of
+* this cfqg against the child cfqgs.  For the root cfqg, both
+* weights are kept in sync for backward compatibility.
+*/
unsigned int weight;
unsigned int new_weight;
unsigned int dev_weight;
 
+   unsigned int leaf_weight;
+   unsigned int new_leaf_weight;
+   unsigned int dev_leaf_weight;
+
/* number of cfqq currently on this group */
int nr_cfqq;
 
@@ -1182,10 +1193,16 @@ static void
 cfq_update_group_weight(struct cfq_group *cfqg)
 {
BUG_ON(!RB_EMPTY_NODE(&cfqg->rb_node));
+
if (cfqg->new_weight) {
cfqg->weight = cfqg->new_weight;
cfqg->new_weight = 0;
}
+
+   if (cfqg->new_leaf_weight) {
+   cfqg->leaf_weight = cfqg->new_leaf_weight;
+   cfqg->new_leaf_weight = 0;
+   }
 }
 
 static void
@@ -1348,6 +1365,7 @@ static void cfq_pd_init(struct blkcg_gq *blkg)
 
cfq_init_cfqg_base(cfqg);
cfqg->weight = blkg->blkcg->cfq_weight;
+   cfqg->leaf_weight = blkg->blkcg->cfq_leaf_weight;
 }
 
 /*
@@ -1404,6 +1422,26 @@ static int cfqg_print_weight_device(struct cgroup *cgrp, 
struct cftype *cft,
return 0;
 }
 
+static u64 cfqg_prfill_leaf_weight_device(struct seq_file *sf,
+ struct blkg_policy_data *pd, int off)
+{
+   struct cfq_group *cfqg = pd_to_cfqg(pd);
+
+   if (!cfqg->dev_leaf_weight)
+   return 0;
+   return __blkg_prfill_u64(sf, pd, cfqg->dev_leaf_weight);
+}
+
+static int cfqg_print_leaf_weight_device(struct cgroup *cgrp,
+struct cftype *cft,
+struct seq_file *sf)
+{
+   blkcg_print_blkgs(sf, cgroup_to_blkcg(cgrp),
+ cfqg_prfill_leaf_weight_device, &blkcg_policy_cfq, 0,
+ 

[PATCH 23/24] cfq-iosched: collect stats from dead cfqgs

2012-12-28 Thread Tejun Heo
To support hierarchical stats, it's necessary to remember stats from
dead children.  Add cfqg->dead_stats and make a dying cfqg transfer
its stats to the parent's dead-stats.

The transfer happens from ->pd_offline_fn() and it is possible that
there are some residual IOs completing afterwards.  Currently, we lose
these stats.  Given that cgroup removal isn't a very high frequency
operation and the amount of residual IOs on offline is likely to be
nil or small, this shouldn't be a big deal and the complexity needed
to handle residual IOs - another callback and rather elaborate
synchronization to reach and lock the matching q - doesn't seem
justified.
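
The transfer of a dying group's stats into its parent's dead_stats can be sketched in user space as below.  struct stats and struct group are hypothetical, cut-down stand-ins with two plain counters each; the patch operates on blkg_stat/blkg_rwstat fields and asserts that the queue lock is held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for cfqg_stats and cfq_group. */
struct stats {
	uint64_t time;
	uint64_t sectors;
};

struct group {
	struct group *parent;
	struct stats stats;		/* stats for this group */
	struct stats dead_stats;	/* stats pushed up from dead children */
};

/* @to += @from */
static void stats_merge(struct stats *to, const struct stats *from)
{
	to->time += from->time;
	to->sectors += from->sectors;
}

/* @s = 0 */
static void stats_reset(struct stats *s)
{
	s->time = 0;
	s->sectors = 0;
}

/* Move everything @g has accumulated into its parent's dead_stats. */
static void stats_xfer_dead(struct group *g)
{
	struct group *parent;

	assert(g != NULL);
	parent = g->parent;
	if (!parent)		/* the root has nowhere to push to */
		return;

	stats_merge(&parent->dead_stats, &g->stats);
	stats_merge(&parent->dead_stats, &g->dead_stats);
	stats_reset(&g->stats);
	stats_reset(&g->dead_stats);
}
```

Because the dying group's own dead_stats are forwarded too, the totals of earlier removed descendants keep moving up the tree one level at a time.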

Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 57 -
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index f8b34bb..4d75b79 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -289,7 +289,8 @@ struct cfq_group {
/* number of requests that are on the dispatch list or inside driver */
int dispatched;
struct cfq_ttime ttime;
-   struct cfqg_stats stats;
+   struct cfqg_stats stats;/* stats for this cfqg */
+   struct cfqg_stats dead_stats;   /* stats pushed from dead children */
 };
 
 struct cfq_io_cq {
@@ -709,6 +710,47 @@ static void cfqg_stats_reset(struct cfqg_stats *stats)
 #endif
 }
 
+/* @to += @from */
+static void cfqg_stats_merge(struct cfqg_stats *to, struct cfqg_stats *from)
+{
+   /* queued stats shouldn't be cleared */
+   blkg_rwstat_merge(&to->service_bytes, &from->service_bytes);
+   blkg_rwstat_merge(&to->serviced, &from->serviced);
+   blkg_rwstat_merge(&to->merged, &from->merged);
+   blkg_rwstat_merge(&to->service_time, &from->service_time);
+   blkg_rwstat_merge(&to->wait_time, &from->wait_time);
+   blkg_stat_merge(&to->time, &from->time);
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+   blkg_stat_merge(&to->unaccounted_time, &from->unaccounted_time);
+   blkg_stat_merge(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
+   blkg_stat_merge(&to->avg_queue_size_samples, &from->avg_queue_size_samples);
+   blkg_stat_merge(&to->dequeue, &from->dequeue);
+   blkg_stat_merge(&to->group_wait_time, &from->group_wait_time);
+   blkg_stat_merge(&to->idle_time, &from->idle_time);
+   blkg_stat_merge(&to->empty_time, &from->empty_time);
+#endif
+}
+
+/*
+ * Transfer @cfqg's stats to its parent's dead_stats so that the ancestors'
+ * recursive stats can still account for the amount used by this cfqg after
+ * it's gone.
+ */
+static void cfqg_stats_xfer_dead(struct cfq_group *cfqg)
+{
+   struct cfq_group *parent = cfqg_parent(cfqg);
+
+   lockdep_assert_held(cfqg_to_blkg(cfqg)->q->queue_lock);
+
+   if (unlikely(!parent))
+   return;
+
+   cfqg_stats_merge(&parent->dead_stats, &cfqg->stats);
+   cfqg_stats_merge(&parent->dead_stats, &cfqg->dead_stats);
+   cfqg_stats_reset(&cfqg->stats);
+   cfqg_stats_reset(&cfqg->dead_stats);
+}
+
 #else  /* CONFIG_CFQ_GROUP_IOSCHED */
 
 static inline struct cfq_group *cfqg_parent(struct cfq_group *cfqg) { return 
NULL; }
@@ -1475,11 +1517,23 @@ static void cfq_pd_init(struct blkcg_gq *blkg)
cfqg->leaf_weight = blkg->blkcg->cfq_leaf_weight;
 }
 
+static void cfq_pd_offline(struct blkcg_gq *blkg)
+{
+   /*
+* @blkg is going offline and will be ignored by
+* blkg_[rw]stat_recursive_sum().  Transfer stats to the parent so
+* that they don't get lost.  If IOs complete after this point, the
+* stats for them will be lost.  Oh well...
+*/
+   cfqg_stats_xfer_dead(blkg_to_cfqg(blkg));
+}
+
 static void cfq_pd_reset_stats(struct blkcg_gq *blkg)
 {
struct cfq_group *cfqg = blkg_to_cfqg(blkg);
 
cfqg_stats_reset(&cfqg->stats);
+   cfqg_stats_reset(&cfqg->dead_stats);
 }
 
 /*
@@ -4408,6 +4462,7 @@ static struct blkcg_policy blkcg_policy_cfq = {
.cftypes= cfq_blkcg_files,
 
.pd_init_fn = cfq_pd_init,
+   .pd_offline_fn  = cfq_pd_offline,
.pd_reset_stats_fn  = cfq_pd_reset_stats,
 };
 #endif
-- 
1.8.0.2



[PATCH 24/24] cfq-iosched: add hierarchical cfq_group statistics

2012-12-28 Thread Tejun Heo
Unfortunately, at this point, there's no way to make the existing
statistics hierarchical without creating nasty surprises for the
existing users.  Just create recursive counterparts of the existing
stats.
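
The recursive counterparts in this patch read each stat twice, once from stats and once from dead_stats, by adding a constant offset delta to the field offset.  The user-space sketch below shows only that offset trick; struct stats, struct group, read_stat() and live_plus_dead() are hypothetical, and the descendant walk is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for cfqg_stats and cfq_group. */
struct stats {
	uint64_t time;
	uint64_t sectors;
};

struct group {
	int id;
	struct stats stats;		/* live stats */
	struct stats dead_stats;	/* stats inherited from dead children */
};

/* offset delta from group->stats to group->dead_stats */
static const size_t dead_stats_off_delta =
	offsetof(struct group, dead_stats) - offsetof(struct group, stats);

/* Read the counter located @off bytes into @g. */
static uint64_t read_stat(const struct group *g, size_t off)
{
	assert(g != NULL);
	assert(off + sizeof(uint64_t) <= sizeof(*g));
	return *(const uint64_t *)((const char *)g + off);
}

/* Sum the live counter at @off and its dead_stats twin. */
static uint64_t live_plus_dead(const struct group *g, size_t off)
{
	return read_stat(g, off) + read_stat(g, off + dead_stats_off_delta);
}
```

The trick relies on stats and dead_stats having the same type, so a field sits at the same relative offset in both.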

Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 105 
 1 file changed, 105 insertions(+)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4d75b79..b66365b 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1528,6 +1528,32 @@ static void cfq_pd_offline(struct blkcg_gq *blkg)
cfqg_stats_xfer_dead(blkg_to_cfqg(blkg));
 }
 
+/* offset delta from cfqg->stats to cfqg->dead_stats */
+static const int dead_stats_off_delta = offsetof(struct cfq_group, dead_stats) 
-
+   offsetof(struct cfq_group, stats);
+
+/* to be used by recursive prfill, sums live and dead stats recursively */
+static u64 cfqg_stat_pd_recursive_sum(struct blkg_policy_data *pd, int off)
+{
+   u64 sum = 0;
+
+   sum += blkg_stat_recursive_sum(pd, off);
+   sum += blkg_stat_recursive_sum(pd, off + dead_stats_off_delta);
+   return sum;
+}
+
+/* to be used by recursive prfill, sums live and dead rwstats recursively */
+static struct blkg_rwstat cfqg_rwstat_pd_recursive_sum(struct blkg_policy_data 
*pd,
+  int off)
+{
+   struct blkg_rwstat a, b;
+
+   a = blkg_rwstat_recursive_sum(pd, off);
+   b = blkg_rwstat_recursive_sum(pd, off + dead_stats_off_delta);
+   blkg_rwstat_merge(&a, &b);
+   return a;
+}
+
 static void cfq_pd_reset_stats(struct blkcg_gq *blkg)
 {
struct cfq_group *cfqg = blkg_to_cfqg(blkg);
@@ -1732,6 +1758,42 @@ static int cfqg_print_rwstat(struct cgroup *cgrp, struct 
cftype *cft,
return 0;
 }
 
+static u64 cfqg_prfill_stat_recursive(struct seq_file *sf,
+ struct blkg_policy_data *pd, int off)
+{
+   u64 sum = cfqg_stat_pd_recursive_sum(pd, off);
+
+   return __blkg_prfill_u64(sf, pd, sum);
+}
+
+static u64 cfqg_prfill_rwstat_recursive(struct seq_file *sf,
+   struct blkg_policy_data *pd, int off)
+{
+   struct blkg_rwstat sum = cfqg_rwstat_pd_recursive_sum(pd, off);
+
+   return __blkg_prfill_rwstat(sf, pd, &sum);
+}
+
+static int cfqg_print_stat_recursive(struct cgroup *cgrp, struct cftype *cft,
+struct seq_file *sf)
+{
+   struct blkcg *blkcg = cgroup_to_blkcg(cgrp);
+
+   blkcg_print_blkgs(sf, blkcg, cfqg_prfill_stat_recursive,
+ &blkcg_policy_cfq, cft->private, false);
+   return 0;
+}
+
+static int cfqg_print_rwstat_recursive(struct cgroup *cgrp, struct cftype *cft,
+  struct seq_file *sf)
+{
+   struct blkcg *blkcg = cgroup_to_blkcg(cgrp);
+
+   blkcg_print_blkgs(sf, blkcg, cfqg_prfill_rwstat_recursive,
+ &blkcg_policy_cfq, cft->private, true);
+   return 0;
+}
+
 #ifdef CONFIG_DEBUG_BLK_CGROUP
 static u64 cfqg_prfill_avg_queue_size(struct seq_file *sf,
  struct blkg_policy_data *pd, int off)
@@ -1803,6 +1865,7 @@ static struct cftype cfq_blkcg_files[] = {
.write_u64 = cfq_set_leaf_weight,
},
 
+   /* statistics, covers only the tasks in the cfqg */
{
.name = "time",
.private = offsetof(struct cfq_group, stats.time),
@@ -1843,6 +1906,48 @@ static struct cftype cfq_blkcg_files[] = {
.private = offsetof(struct cfq_group, stats.queued),
.read_seq_string = cfqg_print_rwstat,
},
+
+   /* the same statistics which cover the cfqg and its descendants */
+   {
+   .name = "time_recursive",
+   .private = offsetof(struct cfq_group, stats.time),
+   .read_seq_string = cfqg_print_stat_recursive,
+   },
+   {
+   .name = "sectors_recursive",
+   .private = offsetof(struct cfq_group, stats.sectors),
+   .read_seq_string = cfqg_print_stat_recursive,
+   },
+   {
+   .name = "io_service_bytes_recursive",
+   .private = offsetof(struct cfq_group, stats.service_bytes),
+   .read_seq_string = cfqg_print_rwstat_recursive,
+   },
+   {
+   .name = "io_serviced_recursive",
+   .private = offsetof(struct cfq_group, stats.serviced),
+   .read_seq_string = cfqg_print_rwstat_recursive,
+   },
+   {
+   .name = "io_service_time_recursive",
+   .private = offsetof(struct cfq_group, stats.service_time),
+   .read_seq_string = cfqg_print_rwstat_recursive,
+   },
+   {
+   .name = "io_wait_time_recursive",
+   .private = offsetof(struct cfq_group, stats.wait_time),
+   .read_seq_string = cfqg_print_rwstat_recursive,
+ 

[PATCH 20/24] block: RCU free request_queue

2012-12-28 Thread Tejun Heo
RCU free request_queue so that blkcg_gq->q can be dereferenced under
RCU lock.  This will be used to implement hierarchical stats.

Signed-off-by: Tejun Heo 
---
 block/blk-sysfs.c  | 9 -
 include/linux/blkdev.h | 2 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 7881477..6206a93 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -497,6 +497,13 @@ queue_attr_store(struct kobject *kobj, struct attribute 
*attr,
return res;
 }
 
+static void blk_free_queue_rcu(struct rcu_head *rcu_head)
+{
+   struct request_queue *q = container_of(rcu_head, struct request_queue,
+  rcu_head);
+   kmem_cache_free(blk_requestq_cachep, q);
+}
+
 /**
  * blk_release_queue: - release a &struct request_queue when it is no longer needed
  * @kobj:the kobj belonging to the request queue to be released
@@ -538,7 +545,7 @@ static void blk_release_queue(struct kobject *kobj)
bdi_destroy(&q->backing_dev_info);
 
ida_simple_remove(&blk_queue_ida, q->id);
-   kmem_cache_free(blk_requestq_cachep, q);
+   call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
 
 static const struct sysfs_ops queue_sysfs_ops = {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f94bc83..406343c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -437,6 +438,7 @@ struct request_queue {
/* Throttle data */
struct throtl_data *td;
 #endif
+   struct rcu_head rcu_head;
 };
 
 #define QUEUE_FLAG_QUEUED  1   /* uses generic tag queueing */
-- 
1.8.0.2



[PATCH 06/24] cfq-iosched: Print sync-noidle information in blktrace messages

2012-12-28 Thread Tejun Heo
From: Vivek Goyal 

Currently we attach a character "S" or "A" to the cfqq to represent
whether the queue is sync or async. Add one more character "N" to represent
whether it is a sync-noidle queue or a sync queue. So now the three different
types of queues will look as follows.

cfq1234S   --> sync queue
cfq1234SN  --> sync noidle queue
cfq1234A   --> async queue

Previously the S/A classification was printed only if group scheduling
was enabled. This patch also makes sure that this classification is
displayed even if group scheduling is disabled.
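
The queue tag format described above can be reproduced with a plain snprintf().  This is only a sketch: format_queue_tag() and its bool parameters are hypothetical stand-ins for the cfq_log_cfqq() macro arguments, where the patch derives them from cfq_cfqq_sync() and cfqq_type().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the "cfq<pid><S|A><N| >" prefix used in the trace messages.
 * Returns what snprintf() returns, i.e. the untruncated length.
 */
static int format_queue_tag(char *buf, size_t len, int pid,
			    bool sync, bool sync_noidle)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "cfq%d%c%c", pid,
			sync ? 'S' : 'A',
			sync_noidle ? 'N' : ' ');
}
```

As in the patch, a queue that is not sync-noidle gets a space in the second position rather than nothing, so the tag always has the same width for a given pid.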

Signed-off-by: Vivek Goyal 
Acked-by: Jeff Moyer 
Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 5ad4cae..bc076f4 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -586,8 +586,9 @@ static inline void cfqg_put(struct cfq_group *cfqg)
char __pbuf[128];   \
\
blkg_path(cfqg_to_blkg((cfqq)->cfqg), __pbuf, sizeof(__pbuf));  \
-   blk_add_trace_msg((cfqd)->queue, "cfq%d%c %s " fmt, (cfqq)->pid, \
- cfq_cfqq_sync((cfqq)) ? 'S' : 'A',\
+   blk_add_trace_msg((cfqd)->queue, "cfq%d%c%c %s " fmt, (cfqq)->pid, \
+   cfq_cfqq_sync((cfqq)) ? 'S' : 'A',  \
+   cfqq_type((cfqq)) == SYNC_NOIDLE_WORKLOAD ? 'N' : ' ',\
  __pbuf, ##args);  \
 } while (0)
 
@@ -675,7 +676,10 @@ static inline void cfqg_get(struct cfq_group *cfqg) { }
 static inline void cfqg_put(struct cfq_group *cfqg) { }
 
 #define cfq_log_cfqq(cfqd, cfqq, fmt, args...) \
-   blk_add_trace_msg((cfqd)->queue, "cfq%d " fmt, (cfqq)->pid, ##args)
+   blk_add_trace_msg((cfqd)->queue, "cfq%d%c%c " fmt, (cfqq)->pid, \
+   cfq_cfqq_sync((cfqq)) ? 'S' : 'A',  \
+   cfqq_type((cfqq)) == SYNC_NOIDLE_WORKLOAD ? 'N' : ' ',\
+   ##args)
 #define cfq_log_cfqg(cfqd, cfqg, fmt, args...) do {} while (0)
 
 static inline void cfqg_stats_update_io_add(struct cfq_group *cfqg,
-- 
1.8.0.2



[PATCH 15/24] cfq-iosched: enable full blkcg hierarchy support

2012-12-28 Thread Tejun Heo
With the previous two patches, all cfqg scheduling decisions are based
on vfraction and ready for hierarchy support.  The only thing which
keeps the behavior flat is cfqg_flat_parent() which makes vfraction
calculation consider all non-root cfqgs children of the root cfqg.

Replace it with cfqg_parent() which returns the real parent.  This
enables full blkcg hierarchy support for cfq-iosched.  For example,
consider the following hierarchy.

          root
         /    \
     A:500    B:250
     /   \
 AA:500  AB:1000

For simplicity, let's say all the leaf nodes have active tasks and are
on service tree.  For each leaf node, vfraction would be

 AA: (500  / 1500) * (500 / 750) =~ 0.2222
 AB: (1000 / 1500) * (500 / 750) =~ 0.4444
  B:                 (250 / 750) =~ 0.3333

and vdisktime will be distributed accordingly.  For more detail,
please refer to Documentation/block/cfq-iosched.txt.
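
The compounding described above can be sketched in plain C. This is only
an illustration under simplifying assumptions: struct grp and vfraction()
are hypothetical names, not the kernel's struct cfq_group or its helpers,
and doubles are used for readability although in-kernel code cannot rely
on floating point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified group node for illustration only. */
struct grp {
    struct grp *parent;            /* NULL for the root */
    unsigned int weight;           /* weight at the parent's level */
    unsigned int children_weight;  /* sum of active children's weights */
};

/*
 * Walk from the group up to the root, multiplying the ratio of the
 * group's weight to the total active weight at each level.
 */
static double vfraction(const struct grp *g)
{
    double vfr = 1.0;

    while (g->parent) {
        assert(g->parent->children_weight > 0);
        vfr *= (double)g->weight / g->parent->children_weight;
        g = g->parent;
    }
    return vfr;
}
```

With root's children_weight set to 750 and A's set to 1500, calling
vfraction() on AA multiplies 500/1500 by 500/750, matching the first
line of the example.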

v2: cfq-iosched.txt updated to describe group scheduling as suggested
by Vivek.

Signed-off-by: Tejun Heo 
Cc: Vivek Goyal 
---
 Documentation/block/cfq-iosched.txt | 58 +
 block/cfq-iosched.c | 21 --
 2 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/Documentation/block/cfq-iosched.txt 
b/Documentation/block/cfq-iosched.txt
index d89b4fe..a5eb7d1 100644
--- a/Documentation/block/cfq-iosched.txt
+++ b/Documentation/block/cfq-iosched.txt
@@ -102,6 +102,64 @@ processing of request. Therefore, increasing the value can 
imporve the
 performace although this can cause the latency of some I/O to increase due
 to more number of requests.
 
+CFQ Group scheduling
+====================
+
+CFQ supports blkio cgroup and has "blkio." prefixed files in each
+blkio cgroup directory. It is weight-based and there are four knobs
+for configuration - weight[_device] and leaf_weight[_device].
+Internal cgroup nodes (the ones with children) can also have tasks in
+them, so the former two configure how much proportion the cgroup as a
+whole is entitled to at its parent's level while the latter two
+configure how much proportion the tasks in the cgroup have compared to
+its direct children.
+
+Another way to think about it is assuming that each internal node has
+an implicit leaf child node which hosts all the tasks whose weight is
+configured by leaf_weight[_device]. Let's assume a blkio hierarchy
+composed of five cgroups - root, A, B, AA and AB - with the following
+weights where the names represent the hierarchy.
+
+        weight leaf_weight
+ root :  125    125
+ A    :  500    750
+ B    :  250    500
+ AA   :  500    500
+ AB   : 1000    500
+
+root never has a parent, making its weight meaningless. For backward
+compatibility, weight is always kept in sync with leaf_weight. B, AA
+and AB have no children and thus their tasks have no child cgroups to
+compete with. They always get 100% of what the cgroup won at the
+parent level. Considering only the weights which matter, the hierarchy
+looks like the following.
+
+          root
+       /    |   \
+      A     B    leaf
+     500   250   125
+   /  |  \
+  AA  AB  leaf
+ 500 1000 750
+
+If all cgroups have active IOs and are competing with each other, disk
+time will be distributed like the following.
+
+Distribution below root. The total active weight at this level is
+A:500 + B:250 + root-leaf:125 = 875.
+
+ root-leaf :   125 /  875  =~ 14%
+ A :   500 /  875  =~ 57%
+ B(-leaf)  :   250 /  875  =~ 28%
+
+A has children and further distributes its 57% among the children and
+the implicit leaf node. The total active weight at this level is
+AA:500 + AB:1000 + A-leaf:750 = 2250.
+
+ A-leaf: ( 750 / 2250) * A =~ 19%
+ AA(-leaf) : ( 500 / 2250) * A =~ 12%
+ AB(-leaf) : (1000 / 2250) * A =~ 25%
+
 CFQ IOPS Mode for group scheduling
 ==================================
 Basic CFQ design is to provide priority based time slices. Higher priority
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ee34282..e8f3106 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -606,20 +606,11 @@ static inline struct cfq_group *blkg_to_cfqg(struct 
blkcg_gq *blkg)
return pd_to_cfqg(blkg_to_pd(blkg, &blkcg_policy_cfq));
 }
 
-/*
- * Determine the parent cfqg for weight calculation.  Currently, cfqg
- * scheduling is flat and the root is the parent of everyone else.
- */
-static inline struct cfq_group *cfqg_flat_parent(struct cfq_group *cfqg)
+static inline struct cfq_group *cfqg_parent(struct cfq_group *cfqg)
 {
-   struct blkcg_gq *blkg = cfqg_to_blkg(cfqg);
-   struct cfq_group *root;
-
-   while (blkg->parent)
-   blkg = blkg->parent;
-   root = blkg_to_cfqg(blkg);
+   struct blkcg_gq *pblkg = cfqg_to_blkg(cfqg)->parent;
 
-   return root != cfqg ? root : NULL;
+   return pblkg ? blkg_to_cfqg(pblkg) : NULL;
 }
 
 static inline void cfqg_get(struct cfq_group *cfqg)
@@ -722,7 +713,7 @@ static void cfq_pd_reset_stats(struct blkcg_gq *blkg)
 
 #else  /* 

[PATCH 03/24] cfq-iosched: Rename "service_tree" to "st" at some places

2012-12-28 Thread Tejun Heo
From: Vivek Goyal 

At quite a few places we use the keyword "service_tree". At some places,
especially local variables, I have abbreviated it to "st".

Also, at a couple of places, moved the binary operator "+" from the
beginning of a line to the end of the previous line, as per Tejun's feedback.

v2:
 Reverted most of the service tree name change based on Jeff Moyer's feedback.

Signed-off-by: Vivek Goyal 
Signed-off-by: Tejun Heo 
---
 block/cfq-iosched.c | 77 +
 1 file changed, 36 insertions(+), 41 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 8f890bf..db4a1a5 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -353,7 +353,7 @@ struct cfq_data {
 
 static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
 
-static struct cfq_rb_root *service_tree_for(struct cfq_group *cfqg,
+static struct cfq_rb_root *st_for(struct cfq_group *cfqg,
enum wl_class_t class,
enum wl_type_t type)
 {
@@ -758,16 +758,16 @@ static inline int cfq_group_busy_queues_wl(enum 
wl_class_t wl_class,
if (wl_class == IDLE_WORKLOAD)
return cfqg->service_tree_idle.count;
 
-   return cfqg->service_trees[wl_class][ASYNC_WORKLOAD].count
-   + cfqg->service_trees[wl_class][SYNC_NOIDLE_WORKLOAD].count
-   + cfqg->service_trees[wl_class][SYNC_WORKLOAD].count;
+   return cfqg->service_trees[wl_class][ASYNC_WORKLOAD].count +
+   cfqg->service_trees[wl_class][SYNC_NOIDLE_WORKLOAD].count +
+   cfqg->service_trees[wl_class][SYNC_WORKLOAD].count;
 }
 
 static inline int cfqg_busy_async_queues(struct cfq_data *cfqd,
struct cfq_group *cfqg)
 {
-   return cfqg->service_trees[RT_WORKLOAD][ASYNC_WORKLOAD].count
-   + cfqg->service_trees[BE_WORKLOAD][ASYNC_WORKLOAD].count;
+   return cfqg->service_trees[RT_WORKLOAD][ASYNC_WORKLOAD].count +
+   cfqg->service_trees[BE_WORKLOAD][ASYNC_WORKLOAD].count;
 }
 
 static void cfq_dispatch_insert(struct request_queue *, struct request *);
@@ -1612,15 +1612,14 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, 
struct cfq_queue *cfqq,
struct rb_node **p, *parent;
struct cfq_queue *__cfqq;
unsigned long rb_key;
-   struct cfq_rb_root *service_tree;
+   struct cfq_rb_root *st;
int left;
int new_cfqq = 1;
 
-   service_tree = service_tree_for(cfqq->cfqg, cfqq_class(cfqq),
-   cfqq_type(cfqq));
+   st = st_for(cfqq->cfqg, cfqq_class(cfqq), cfqq_type(cfqq));
if (cfq_class_idle(cfqq)) {
rb_key = CFQ_IDLE_DELAY;
-   parent = rb_last(&service_tree->rb);
+   parent = rb_last(&st->rb);
if (parent && parent != &cfqq->rb_node) {
__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
rb_key += __cfqq->rb_key;
@@ -1638,7 +1637,7 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, 
struct cfq_queue *cfqq,
cfqq->slice_resid = 0;
} else {
rb_key = -HZ;
-   __cfqq = cfq_rb_first(service_tree);
+   __cfqq = cfq_rb_first(st);
rb_key += __cfqq ? __cfqq->rb_key : jiffies;
}
 
@@ -1647,8 +1646,7 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, 
struct cfq_queue *cfqq,
/*
 * same position, nothing more to do
 */
-   if (rb_key == cfqq->rb_key &&
-   cfqq->service_tree == service_tree)
+   if (rb_key == cfqq->rb_key && cfqq->service_tree == st)
return;
 
cfq_rb_erase(&cfqq->rb_node, cfqq->service_tree);
@@ -1657,8 +1655,8 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, 
struct cfq_queue *cfqq,
 
left = 1;
parent = NULL;
-   cfqq->service_tree = service_tree;
-   p = &service_tree->rb.rb_node;
+   cfqq->service_tree = st;
+   p = &st->rb.rb_node;
while (*p) {
struct rb_node **n;
 
@@ -1679,12 +1677,12 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, 
struct cfq_queue *cfqq,
}
 
if (left)
-   service_tree->left = &cfqq->rb_node;
+   st->left = &cfqq->rb_node;
 
cfqq->rb_key = rb_key;
rb_link_node(&cfqq->rb_node, parent, p);
-   rb_insert_color(&cfqq->rb_node, &service_tree->rb);
-   service_tree->count++;
+   rb_insert_color(&cfqq->rb_node, &st->rb);
+   st->count++;
+   st->count++;
if (add_front || !new_cfqq)
return;
cfq_group_notify_queue_add(cfqd, cfqq->cfqg);
@@ -2117,19 +2115,18 @@ static inline void cfq_slice_expired(struct cfq_data 
*cfqd, bool timed_out)
  */
 static struct cfq_queue *cfq_get_next_queue(struct cfq_data *cfqd)
 {
-   struct cfq_rb_root *service_tree =
-   

[PATCH 07/24] blkcg: fix minor bug in blkg_alloc()

2012-12-28 Thread Tejun Heo
blkg_alloc() was mistakenly checking blkcg_policy_enabled() twice.
The latter test should have been on whether pol->pd_init_fn() exists.
This doesn't cause actual problems because both blkcg policies
implement pol->pd_init_fn().  Fix it.
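
The corrected check is the usual "call the hook only if it is provided"
pattern. A minimal sketch, with struct policy, set_one() and
policy_init() as hypothetical names that are not part of blkcg:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy descriptor with an optional init callback. */
struct policy {
    void (*init_fn)(int *data);   /* may be NULL */
};

/* Example callback used only to make the effect observable. */
static void set_one(int *data)
{
    *data = 1;
}

/* Run per-policy init only when the policy provides a callback. */
static void policy_init(const struct policy *pol, int *data)
{
    assert(pol != NULL);
    if (pol->init_fn)
        pol->init_fn(data);
}
```

Testing whether the policy is enabled a second time, as the old code
did, says nothing about whether the pointer is safe to call.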

Signed-off-by: Tejun Heo 
Acked-by: Vivek Goyal 
---
 block/blk-cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b8858fb..7ef747b 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -114,7 +114,7 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, 
struct request_queue *q,
pd->blkg = blkg;
 
/* invoke per-policy init */
-   if (blkcg_policy_enabled(blkg->q, pol))
+   if (pol->pd_init_fn)
pol->pd_init_fn(blkg);
}
 
-- 
1.8.0.2



[PATCHSET] block: implement blkcg hierarchy support in cfq, take#2

2012-12-28 Thread Tejun Heo
Hello,

This is the second iteration to implement blkcg hierarchy support in
cfq-iosched.  Changes from the first take[L] are

* Vivek's cfq cleanup patches are included in the series for
  convenience.

* Divide by zero bug when !CONFIG_CFQ_GROUP_IOSCHED reported by
  Fengguang fixed.

* Updated to reflect Vivek's reviews - renames & documentation.

* Recursive stats no longer forget stats from dead descendants.  This
  turned out to be more complex than I wished involving implementing
  policy on/offline callbacks.

cfq-iosched is currently utterly broken in how it handles cgroup
hierarchy.  It ignores the hierarchy structure and just treats all
blkcgs equally.  This is simply broken.  This breakage makes blkcg
behave very differently from other properly-hierarchical controllers
and makes it impossible to give any uniform interpretation to the
hierarchy, which in turn makes it impossible to implement unified
hierarchy.

Given the relative simplicity of cfqg scheduling, implementing proper
hierarchy support isn't that difficult.  All that's necessary is
determining what fraction each cfqg on the service tree has claim to,
considering the hierarchy.  The calculation can be done by
maintaining the sum of active weights at each level and compounding
the ratios from the cfqg in question to root.  The overhead isn't
significant.  Tree traversals happen only when cfqgs are added to or
removed from the service tree, and they go from the cfqg being
modified to the root.

There are some design choices which are worth mentioning.

* Internal (non-leaf) cfqgs w/ tasks treat the tasks as a single unit
  competing against the children cfqgs.  New config knobs -
  blkio.leaf_weight[_device] - are added to configure the weight of
  these tasks.  Another way to look at it is that each cfqg has a
  hidden leaf child node attached to it which hosts all tasks and
  leaf_weight controls the weight of that hidden node.

  Treating cfqqs and cfqgs as equals doesn't make much sense to me and
  is hairy - we need to establish ioprio to weight mapping and the
  weights fluctuate as processes fork and exit.  This becomes hairier
  when considering multiple controllers.  Such mappings can't be
  established consistently across different controllers and the
  weights are given out differently - i.e. blkcg gives weights out to
  io_contexts while cpu gives them to tasks, which may share
  io_contexts.  It's difficult to make sense of what's going on.

  The goal is to bring cpu, currently the only other controller which
  implements weight based resource allocation, to similar behavior.

* The existing stats aren't converted to hierarchical but new
  hierarchical ones are added.  There isn't a way to do that w/o
  introducing nasty silent surprises to the existing flat hierarchy
  users, so while being a bit clumsy, I can't see a better way.

* I based it on top of Vivek's cleanup patchset[1] but not the cfqq,
  cfqg scheduling unification patchset.  I don't think it's necessary
  or beneficial to mix the two and would really like to avoid messing
  with !blkcg scheduling logic.
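
To make the hidden-leaf idea from the first design choice concrete,
here is a small sketch that computes the disk-time share of each
cgroup's own tasks. It uses the weights from the cfq-iosched.txt example
in patch 15; struct node and both function names are hypothetical, and
floating point is used only for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node; level_total must be filled in by the caller. */
struct node {
    struct node *parent;
    unsigned int weight;       /* blkio.weight: share at parent's level */
    unsigned int leaf_weight;  /* blkio.leaf_weight: share of own tasks */
    unsigned int level_total;  /* leaf_weight + active children's weights */
};

/* Share of the whole disk time that the group as a whole is entitled to. */
static double group_share(const struct node *n)
{
    double share = 1.0;

    for (; n->parent; n = n->parent) {
        assert(n->parent->level_total > 0);
        share *= (double)n->weight / n->parent->level_total;
    }
    return share;
}

/* Share that goes to the group's own tasks, i.e. its hidden leaf node. */
static double tasks_share(const struct node *n)
{
    assert(n->level_total > 0);
    return group_share(n) * n->leaf_weight / n->level_total;
}
```

For a cgroup without children, level_total equals leaf_weight, so its
tasks receive everything the group won at the parent level.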

The hierarchical scheduling itself is fairly simple.  The cfq part is
only ~260 lines with ~60 lines being comment, and the hierarchical
weight scaling is really straight-forward.

This patchset contains the following 24 patches.

 0001-cfq-iosched-Properly-name-all-references-to-IO-class.patch
 0002-cfq-iosched-More-renaming-to-better-represent-wl_cla.patch
 0003-cfq-iosched-Rename-service_tree-to-st-at-some-places.patch
 0004-cfq-iosched-Rename-few-functions-related-to-selectin.patch
 0005-cfq-iosched-Get-rid-of-unnecessary-local-variable.patch
 0006-cfq-iosched-Print-sync-noidle-information-in-blktrac.patch
 0007-blkcg-fix-minor-bug-in-blkg_alloc.patch
 0008-blkcg-reorganize-blkg_lookup_create-and-friends.patch
 0009-blkcg-cosmetic-updates-to-blkg_create.patch
 0010-blkcg-make-blkcg_gq-s-hierarchical.patch
 0011-cfq-iosched-add-leaf_weight.patch
 0012-cfq-iosched-implement-cfq_group-nr_active-and-childr.patch
 0013-cfq-iosched-implement-hierarchy-ready-cfq_group-char.patch
 0014-cfq-iosched-convert-cfq_group_slice-to-use-cfqg-vfra.patch
 0015-cfq-iosched-enable-full-blkcg-hierarchy-support.patch
 0016-blkcg-add-blkg_policy_data-plid.patch
 0017-blkcg-implement-blkcg_policy-on-offline_pd_fn-and-bl.patch
 0018-blkcg-s-blkg_rwstat_sum-blkg_rwstat_total.patch
 0019-blkcg-implement-blkg_-rw-stat_recursive_sum-and-blkg.patch
 0020-block-RCU-free-request_queue.patch
 0021-blkcg-make-blkcg_print_blkgs-grab-q-locks-instead-of.patch
 0022-cfq-iosched-separate-out-cfqg_stats_reset-from-cfq_p.patch
 0023-cfq-iosched-collect-stats-from-dead-cfqgs.patch
 0024-cfq-iosched-add-hierarchical-cfq_group-statistics.patch

0001-0006 are Vivek's cfq cleanup patches.

0007-0009 are prep patches.

0010 makes blkcg core always allocate non-leaf blkgs so that any given
blkg is guaranteed to have all its ancestor blkgs to the root.

0011-0012 prepare for hierarchical scheduling.

0013-0014 

[PATCH] parisc: Remove obsolete regs parameter from do_execve() in hpux_execve()

2012-12-28 Thread Geert Uytterhoeven
commit da3d4c5fa56236dd924d77ffc4f982356816b93b ("get rid of pt_regs
argument of do_execve()") removed the parameter, but forgot to update
hpux_execve():

arch/parisc/hpux/fs.c: In function 'hpux_execve':
arch/parisc/hpux/fs.c:47:6: error: too many arguments to function 'do_execve'

Signed-off-by: Geert Uytterhoeven 
---
http://kisskb.ellerman.id.au/kisskb/buildresult/7864854/

 arch/parisc/hpux/fs.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/parisc/hpux/fs.c b/arch/parisc/hpux/fs.c
index a0760b8..838b479 100644
--- a/arch/parisc/hpux/fs.c
+++ b/arch/parisc/hpux/fs.c
@@ -43,8 +43,7 @@ int hpux_execve(struct pt_regs *regs)
 
error = do_execve(filename->name,
  (const char __user *const __user *) regs->gr[25],
- (const char __user *const __user *) regs->gr[24],
- regs);
+ (const char __user *const __user *) regs->gr[24]);
 
putname(filename);
 
-- 
1.7.0.4



[ 005/173] ALSA: hda - Add Lynx Point HD Audio Controller DeviceIDs

2012-12-28 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Seth Heasley 

commit 8bc039a1e15a72da8426b84293723fb7181f0b5e upstream.

This patch adds the HD Audio DeviceIDs for the Intel Lynx Point PCH.

Signed-off-by: Seth Heasley 
Signed-off-by: Takashi Iwai 
Signed-off-by: Ben Hutchings 
---
 sound/pci/hda/hda_intel.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index d3bd3e7..e354c16 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -148,6 +148,7 @@ MODULE_SUPPORTED_DEVICE("{{Intel, ICH6},"
 "{Intel, PCH},"
 "{Intel, CPT},"
 "{Intel, PPT},"
+"{Intel, LPT},"
 "{Intel, PBG},"
 "{Intel, SCH},"
 "{ATI, SB450},"
@@ -3001,6 +3002,10 @@ static DEFINE_PCI_DEVICE_TABLE(azx_ids) = {
{ PCI_DEVICE(0x8086, 0x1e20),
  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_SCH_SNOOP |
  AZX_DCAPS_BUFSIZE},
+   /* Lynx Point */
+   { PCI_DEVICE(0x8086, 0x8c20),
+ .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_SCH_SNOOP |
+ AZX_DCAPS_BUFSIZE},
/* SCH */
{ PCI_DEVICE(0x8086, 0x811b),
  .driver_data = AZX_DRIVER_SCH | AZX_DCAPS_SCH_SNOOP |




[ 003/173] KVM: x86: invalid opcode oops on SET_SREGS with OSXSAVE bit set (CVE-2012-4461)

2012-12-28 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Petr Matousek 

commit 6d1068b3a98519247d8ba4ec85cd40ac136dbdf9 upstream.

On hosts without XSAVE support, an unprivileged local user can trigger
an oops similar to the one below by setting the X86_CR4_OSXSAVE bit in
the guest cr4 register using the KVM_SET_SREGS ioctl and later issuing
the KVM_RUN ioctl.

invalid opcode:  [#2] SMP
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
...
Pid: 24935, comm: zoog_kvm_monito Tainted: G  D  3.2.0-3-686-pae
EIP: 0060:[] EFLAGS: 00210246 CPU: 0
EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
EAX: 0001 EBX: 000f387e ECX:  EDX: 
ESI:  EDI:  EBP: ef5a0060 ESP: d7c63e70
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
task.ti=d7c62000)
Stack:
 0001 f70a1200 f8b940a9 ef5a0060  00200202 f8769009 
 ef5a0060 000f387e eda5c020 8722f9c8 00015bae  ed84a0c0 ed84a0c0
 c12bf02d ae80 ef7f8740 fffb f359b740 ef5a0060 f8b85dc1 ae80
Call Trace:
 [] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
...
 [] ? syscall_call+0x7/0xb
Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
EIP: [] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
0068:d7c63e70

QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it, might be
susceptible to this attack from inside the guest as well.

Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
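
The rule can be sketched as a standalone check. This is a simplified
model, not the KVM code: the boolean parameters stand in for the host
feature test and the guest CPUID lookup, and the function names are
hypothetical. CR4.OSXSAVE is bit 18 of CR4.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_OSXSAVE (1ULL << 18)   /* CR4.OSXSAVE is bit 18 */

/* The guest may only see XSAVE if the host itself supports it. */
static bool guest_has_xsave(bool host_xsave, bool guest_cpuid_xsave)
{
    if (!host_xsave)
        return false;
    return guest_cpuid_xsave;
}

/* Reject a cr4 value that sets OSXSAVE when the guest lacks XSAVE. */
static int check_sregs_cr4(bool host_xsave, bool guest_cpuid_xsave,
                           uint64_t cr4)
{
    if (!guest_has_xsave(host_xsave, guest_cpuid_xsave) &&
        (cr4 & CR4_OSXSAVE))
        return -EINVAL;
    return 0;
}
```

The first case, a host without XSAVE paired with a guest CPUID that
claims it, is the combination the commit message describes.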

Signed-off-by: Petr Matousek 
Signed-off-by: Marcelo Tosatti 
[bwh: Backported to 3.2: both functions are in arch/x86/kvm/x86.c]
Signed-off-by: Ben Hutchings 
---
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -578,6 +578,9 @@ static bool guest_cpuid_has_xsave(struct
 {
struct kvm_cpuid_entry2 *best;
 
+   if (!static_cpu_has(X86_FEATURE_XSAVE))
+   return 0;
+
best = kvm_find_cpuid_entry(vcpu, 1, 0);
return best && (best->ecx & bit(X86_FEATURE_XSAVE));
 }
@@ -6149,6 +6152,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct
int pending_vec, max_bits, idx;
struct desc_ptr dt;
 
+   if (!guest_cpuid_has_xsave(vcpu) && (sregs->cr4 & X86_CR4_OSXSAVE))
+   return -EINVAL;
+
dt.size = sregs->idt.limit;
dt.address = sregs->idt.base;
kvm_x86_ops->set_idt(vcpu, &dt);




[ 007/173] ALSA: hda - add id for Atom Cedar Trail HDMI codec

2012-12-28 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Wu Fengguang 

commit 6edc59e602b36cd3c95a426ef6e8cad0344af8c7 upstream.

[the order sorted by tiwai]

Signed-off-by: Wu Fengguang 
Signed-off-by: Takashi Iwai 
Signed-off-by: Ben Hutchings 
---
 sound/pci/hda/patch_hdmi.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
index 1168ebd..540cd13 100644
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -1912,6 +1912,7 @@ static const struct hda_codec_preset 
snd_hda_preset_hdmi[] = {
 { .id = 0x80862804, .name = "IbexPeak HDMI",   .patch = patch_generic_hdmi },
 { .id = 0x80862805, .name = "CougarPoint HDMI",.patch = 
patch_generic_hdmi },
 { .id = 0x80862806, .name = "PantherPoint HDMI", .patch = patch_generic_hdmi },
+{ .id = 0x80862880, .name = "CedarTrail HDMI", .patch = patch_generic_hdmi },
 { .id = 0x808629fb, .name = "Crestline HDMI",  .patch = patch_generic_hdmi },
 {} /* terminator */
 };
@@ -1958,6 +1959,7 @@ MODULE_ALIAS("snd-hda-codec-id:80862803");
 MODULE_ALIAS("snd-hda-codec-id:80862804");
 MODULE_ALIAS("snd-hda-codec-id:80862805");
 MODULE_ALIAS("snd-hda-codec-id:80862806");
+MODULE_ALIAS("snd-hda-codec-id:80862880");
 MODULE_ALIAS("snd-hda-codec-id:808629fb");
 
 MODULE_LICENSE("GPL");




[ 000/173] 3.2.36-stable review

2012-12-28 Thread Ben Hutchings
This is the start of the stable review cycle for the 3.2.36 release.
There are 173 patches in this series, which will be posted as responses
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jan  2 20:00:00 GMT 2013.
Anything received after that time might be too late.

A combined patch relative to 3.2.35 will be posted as an additional
response to this.  A shortlog and diffstat can be found below.

Ben.

-

Aaron Lu (1):
  libata: set dma_mode to 0xff in reset
 [5416912af75de9cba5d1c75b99a7888b0bbbd2fb]

Al Viro (1):
  ARM: missing ->mmap_sem around find_vma() in swp_emulate.c
 [7bf9b7bef881aac820bf1f2e9951a17b09bd7e04]

Alan Cox (3):
  ACPI: missing break
 [879dca019dc43a1622edca3e7dde644b14b5acc5]
  ne2000: add the right platform device
 [da9da01d9199b5bb15289d0859053c9aa3a34ac0]
  pnpacpi: fix incorrect TEST_ALPHA() test
 [cdc87c5a30f407ed1ce43d8a22261116873d5ef1]

Alan Stern (3):
  USB: EHCI: bugfix: urb->hcpriv should not be NULL
 [2656a9abcf1ec8dd5fee6a75d6997a0f2fa0094e]
  USB: OHCI: workaround for hardware bug: retired TDs not added to the Done 
Queue
 [50ce5c0683aa83eb161624ea89daa5a9eee0c2ce]
  USB: fix endpoint-disabling for failed config changes
 [36caff5d795429c572443894e8789c2150dd796b]

Alasdair G Kergon (1):
  dm ioctl: prevent unsafe change to dm_ioctl data_size
 [e910d7ebecd1aac43125944a8641b6cb1a0dfabe]

Alex Deucher (4):
  drm/radeon/dce32+: use fractional fb dividers for high clocks
 [a02dc74b317d78298cb0587b9b1f6f741fd5c139]
  drm/radeon/kms: use frac fb div on APUs
 [37d4174d2d252c37dcb3d88cafae488542087848]
  drm/radeon: fix eDP clk and lane setup for scaled modes
 [93927f9c1db5f55085457e820f0631064c7bfa34]
  drm: fix documentation for drm_crtc_set_mode()
 [4c9287c6009b37754c42e0ba73a4cc79de92d8f8]

Alexander Graf (1):
  KVM: PPC: 44x: fix DCR read/write
 [e43a028752fed049e4bd94ef895542f96d79fa74]

Alexander Shiyan (1):
  irda: sir_dev: Fix copy/paste typo
 [2355a62bcbdcc4b567425bab036bfab6ade87eed]

Alexander Stein (1):
  can: Do not call dev_put if restart timer is running upon close
 [ab48b03ec9ae1840a1e427e2375bd0d9d554b4ed]

Alexey Kaminsky (1):
  HID: apple: Add Apple wireless keyboard 2011 ANSI PID
 [0a97e1e9f9a6765e6243030ac42b04694f3f3647]

Alexis R. Cortes (1):
  usb: host: xhci: Stricter conditional for Z1 system models for Compliance 
Mode Patch
 [b0e4e606ff6ff26da0f60826e75577b56ba4e463]

Andreas Hartmann (1):
  rt2x00: Don't let mac80211 send a BAR when an AMPDU subframe fails
 [be03d4a45c09ee5100d3aaaedd087f19bc20d01f]

Anton Blanchard (1):
  powerpc: Fix CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n build
 [11ee7e99f35ecb15f59b21da6a82d96d2cd3fcc8]

Artem Bityutskiy (1):
  proc: pid/status: show all supplementary groups
 [8d238027b87e654be552eabdf492042a34c5c300]

Ben Hutchings (4):
  HID: Add Apple wireless keyboard 2011 ANSI to special driver list
 [f9af7b9edccb87d4d80b58687ab63e58f3b64c4c]
  Revert "device_cgroup: fix RCU usage"
 [not upstream; change was correct for mainline but not 3.2]
  Revert "mm: vmscan: fix endless loop in kswapd balancing"
 [not upstream; change was correct for mainline but not 3.2]
  mm: add kmap_to_page()
 [part of 5a178119b0fbe37f7dfb602b37df9cc4b1dc9d71]

Bjørn Mork (1):
  USB: option: blacklist network interface on Huawei E173
 [f36446cf9bbebaa03a80d95cfeeafbaf68218249]

Boris Ostrovsky (1):
  x86,AMD: Power driver support for AMD's family 16h processors
 [22e32f4f57778ebc6e17812fa3008361c05d64f9]

Bryan Schumaker (1):
  NFS: Add sequence_priviliged_ops for nfs4_proc_sequence()
 [6bdb5f213c4344324f600dde885f25768fbd14db]

Calvin Walton (1):
  i915: Quirk no_lvds on Gigabyte GA-D525TUD ITX motherboard
 [a51d4ed01e5bb39d2cf36a12f9976ab08872c192]

Charles Keepax (1):
  mfd: Only unregister platform devices allocated by the mfd core
 [b9fbb62eb61452d728c39b2e5020739c575aac53]

Chase Douglas (1):
  HID: hid-magicmouse: Add pointer and buttonpad properties for Magic 
Trackpad
 [503f7d53b368c8e4e13a7756220c0536fcf1a034]

Chris Ball (1):
  Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts"
 [6984f3c31bb57cb7491dbec1be44b74bd00f4648]

Chris Wilson (2):
  drm/i915: Add no-lvds quirk for Supermicro X7SPA-H
 [c31407a3672aaebb4acddf90944a114fa5c8af7b]
  drm/i915: Close race between processing unpin task and queueing the flip
 [e7d841ca03b7ab668620045cd7b428eda9f41601]

Christian Borntraeger (1):
  s390/kvm: dont announce RRBM support
 [87cac8f879a5ecd7109dbe688087e8810b3364eb]

Corentin Chary (1):
  acpi/video_detect: blacklist samsung 

[ 002/173] freezer: PF_FREEZER_NOSIG should be cleared along with PF_NOFREEZE

2012-12-28 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Oleg Nesterov 

This patch is only for pre-v3.3 stable trees which backported
b40a7959 "freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD".
v3.3+ doesn't need this fix.

b40a7959 is a trivial bugfix, but unfortunately I forgot that
until 34b087e4 "freezer: kill unused set_freezable_with_signal()"
there was another only-for-kernel-threads flag, PF_FREEZER_NOSIG,
which should be cleared as well.

See https://bugs.launchpad.net/ubuntu/+source/v86d/+bug/1080530
The freezer fails because it expects that a PF_FREEZER_NOSIG task
doesn't need a signal. Before b40a7959 it wrongly succeeds leaving
the PF_NOFREEZE | PF_FREEZER_NOSIG task unfrozen.
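
The fix amounts to adding one more bit to the mask cleared on exec. A
minimal sketch of that masking follows; the F_* bit values are
hypothetical and chosen only for this example, they are not the
kernel's PF_* values.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define F_RANDOMIZE     (1u << 0)
#define F_KTHREAD       (1u << 1)
#define F_NOFREEZE      (1u << 2)
#define F_FREEZER_NOSIG (1u << 3)
#define F_OTHER         (1u << 4)   /* some unrelated flag */

/* Clear every kernel-thread-only flag in one step; keep the rest. */
static uint32_t flags_after_exec(uint32_t flags)
{
    return flags & ~(F_RANDOMIZE | F_KTHREAD | F_NOFREEZE |
                     F_FREEZER_NOSIG);
}
```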

Reported-and-tested-by: Joseph Salisbury 
Signed-off-by: Oleg Nesterov 
[bwh: Don't touch PF_FORKNOEXEC; it's cleared elsewhere]
Signed-off-by: Ben Hutchings 

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1095,7 +1095,8 @@ int flush_old_exec(struct linux_binprm *
bprm->mm = NULL;/* We're using it now */
 
set_fs(USER_DS);
-   current->flags &= ~(PF_RANDOMIZE | PF_KTHREAD | PF_NOFREEZE);
+   current->flags &=
+   ~(PF_RANDOMIZE | PF_KTHREAD | PF_NOFREEZE | PF_FREEZER_NOSIG);
flush_thread();
current->personality &= ~bprm->per_clear;
 




Re: [PATCH 2/3] mm, bootmem: panic in bootmem alloc functions even if slab is available

2012-12-28 Thread Yinghai Lu
On Fri, Dec 28, 2012 at 6:42 AM, JoonSoo Kim  wrote:
>
> I have a different idea.
> How about removing fallback allocation in bootmem.c completely?
> I don't know why it is there exactly.
> But, warning for 'slab_is_available()' is there for a long time.
> So, most people who misuse fallback allocation change their code adequately.
> I think that removing fallback at this time is valid. Isn't it?

If you guys really want to make things simple, please try to help kill
mm/bootmem.c and use memblock instead.

In the end we could also drop the wrapper, mm/nobootmem.c.

Thanks

Yinghai


[ 010/173] ACPI: missing break

2012-12-28 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Alan Cox 

commit 879dca019dc43a1622edca3e7dde644b14b5acc5 upstream.

We handle NOTIFY_THROTTLING so don't then fall through to unsupported event.
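
The effect of the missing break can be shown with a small model of the
switch. The enum values and struct result are hypothetical and only
serve to make the control flow observable.

```c
#include <stdbool.h>

/* Hypothetical event codes for illustration. */
enum event { EVENT_THROTTLING, EVENT_OTHER };

struct result {
    bool handled;             /* a case label processed the event */
    bool logged_unsupported;  /* the default branch ran */
};

static struct result notify(enum event ev)
{
    struct result r = { false, false };

    switch (ev) {
    case EVENT_THROTTLING:
        r.handled = true;
        break;  /* without this, control falls through into default */
    default:
        r.logged_unsupported = true;
        break;
    }
    return r;
}
```

With the break removed, a throttling event would set both fields, which
corresponds to the spurious "Unsupported event" message.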

Signed-off-by: Alan Cox 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Ben Hutchings 
---
 drivers/acpi/processor_driver.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index e78c2a5..bd4e5dc 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -409,6 +409,7 @@ static void acpi_processor_notify(struct acpi_device 
*device, u32 event)
acpi_bus_generate_proc_event(device, event, 0);
acpi_bus_generate_netlink_event(device->pnp.device_class,
  dev_name(&device->dev), 
event, 0);
+   break;
default:
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
  "Unsupported event [0x%x]\n", event));




[ 018/173] sata_svw: check DMA start bit before reset

2012-12-28 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: David Milburn 

commit b03e66a6be91f8389fcd902ab6c1563db1c9c06b upstream.

If kdump is triggered with pending IO, the controller may not respond,
causing kdump to fail.

http://marc.info/?l=linux-ide&m=133032255424658&w=2

During error recovery, ata_do_dev_read_id never completes due to a hang
in mmio_insw.

ata_do_dev_read_id
 ata_sff_data_xfer
  ioread16_rep
   mmio_insw

If the DMA start bit is cleared before reset, the PIO command is successful
and kdump succeeds.

Signed-off-by: David Milburn 
Signed-off-by: Jeff Garzik 
Signed-off-by: Ben Hutchings 
---
 drivers/ata/sata_svw.c |   35 +++
 1 file changed, 35 insertions(+)

diff --git a/drivers/ata/sata_svw.c b/drivers/ata/sata_svw.c
index 44a4256..08608de 100644
--- a/drivers/ata/sata_svw.c
+++ b/drivers/ata/sata_svw.c
@@ -142,6 +142,39 @@ static int k2_sata_scr_write(struct ata_link *link,
return 0;
 }
 
+static int k2_sata_softreset(struct ata_link *link,
+unsigned int *class, unsigned long deadline)
+{
+   u8 dmactl;
+   void __iomem *mmio = link->ap->ioaddr.bmdma_addr;
+
+   dmactl = readb(mmio + ATA_DMA_CMD);
+
+   /* Clear the start bit */
+   if (dmactl & ATA_DMA_START) {
+   dmactl &= ~ATA_DMA_START;
+   writeb(dmactl, mmio + ATA_DMA_CMD);
+   }
+
+   return ata_sff_softreset(link, class, deadline);
+}
+
+static int k2_sata_hardreset(struct ata_link *link,
+unsigned int *class, unsigned long deadline)
+{
+   u8 dmactl;
+   void __iomem *mmio = link->ap->ioaddr.bmdma_addr;
+
+   dmactl = readb(mmio + ATA_DMA_CMD);
+
+   /* Clear the start bit */
+   if (dmactl & ATA_DMA_START) {
+   dmactl &= ~ATA_DMA_START;
+   writeb(dmactl, mmio + ATA_DMA_CMD);
+   }
+
+   return sata_sff_hardreset(link, class, deadline);
+}
 
 static void k2_sata_tf_load(struct ata_port *ap, const struct ata_taskfile *tf)
 {
@@ -346,6 +379,8 @@ static struct scsi_host_template k2_sata_sht = {
 
 static struct ata_port_operations k2_sata_ops = {
.inherits   = &ata_bmdma_port_ops,
+   .softreset  = k2_sata_softreset,
+   .hardreset  = k2_sata_hardreset,
.sff_tf_load= k2_sata_tf_load,
.sff_tf_read= k2_sata_tf_read,
.sff_check_status   = k2_stat_check_status,

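The read-modify-write sequence that both new reset hooks perform can be tried outside the kernel. This is a rough sketch under stated assumptions: a plain byte array stands in for the MMIO window, DMA_CMD_OFFSET and DMA_START_BIT are illustrative stand-ins for ATA_DMA_CMD and ATA_DMA_START, and clear_dma_start() is a hypothetical helper that does not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the driver gets its constants from the kernel headers. */
#define DMA_CMD_OFFSET 0u
#define DMA_START_BIT  (1u << 0)

/*
 * Clear the start bit in a simulated bus-master DMA command register.
 * Returns 1 if the bit was set and a write was issued, 0 if the
 * register was left untouched.
 */
int clear_dma_start(volatile uint8_t *regs)
{
    uint8_t dmactl;

    assert(regs != NULL);

    dmactl = regs[DMA_CMD_OFFSET];          /* stands in for readb() */
    if (!(dmactl & DMA_START_BIT))
        return 0;

    dmactl &= (uint8_t)~DMA_START_BIT;      /* keep every other bit as is */
    regs[DMA_CMD_OFFSET] = dmactl;          /* stands in for writeb() */
    return 1;
}
```

The write is skipped when the bit is already clear, matching the patch. Since k2_sata_softreset and k2_sata_hardreset repeat the same sequence, a shared helper along these lines might remove the duplication, but that is only a suggestion and not part of the upstream commit.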



Re: [PATCH] get_maintainer.pl: Find maintainers for removed files

2012-12-28 Thread Joe Perches
On Fri, 2012-12-28 at 20:27 +0100, Geert Uytterhoeven wrote:
> For removed files, get_maintainer.pl doesn't find any maintainers (besides
> the default linux-kernel@vger.kernel.org), as it only looks at the "+++"
> lines, which are "/dev/null" for removals.
> Fix this by extending the parsing to the "---" lines.
[]
> diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl
[]
> @@ -433,7 +433,7 @@ foreach my $file (@ARGV) {
>  
>   while (<$patch>) {
>   my $patch_line = $_;
> - if (m/^\+\+\+\s+(\S+)/) {
> + if (m/^\+\+\+\s+(\S+)/ or m/^---\s+(\S+)/) {
>   my $filename = $1;
>   $filename =~ s@^[^/]*/@@;
>   $filename =~ s@\n@@;

OK, but this would now match any keyword in a deleted filename.
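To make the parsing step in the quoted hunk concrete, here is a small C sketch of pulling the file name out of a "+++" or "---" header line. It is not how get_maintainer.pl works internally (that is Perl). diff_header_file() is a hypothetical name, and skipping /dev/null is an assumption made for this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the file name from a "+++ " or "--- " diff header line and
 * drop everything up to the first '/' (the "a/" or "b/" prefix).
 * Returns 1 and fills out on success; returns 0 for any other line
 * and for /dev/null.
 */
int diff_header_file(const char *line, char *out, size_t outlen)
{
    char path[256];
    const char *slash;

    assert(line != NULL && out != NULL && outlen > 0);

    if (strncmp(line, "+++", 3) != 0 && strncmp(line, "---", 3) != 0)
        return 0;
    if (line[3] != ' ' && line[3] != '\t')
        return 0;               /* e.g. the bare "---" separator line */
    if (sscanf(line + 3, " %255s", path) != 1)
        return 0;
    if (strcmp(path, "/dev/null") == 0)
        return 0;               /* assumption: no real file to look up */

    slash = strchr(path, '/');
    snprintf(out, outlen, "%s", slash ? slash + 1 : path);
    return 1;
}
```

Whatever name this returns for a removed file would then be fed to the same lookups as an added file, which is where the keyword-matching concern above comes from.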



