Re: [PATCH] gpio: pxa: normalize the return value for gpio_get

2014-01-14 Thread Linus Walleij
On Fri, Jan 10, 2014 at 7:03 AM, Neil Zhang  wrote:

> It would be convenient to normalize the return value for gpio_get.
>
> I have checked mach-mmp / mach-pxa / plat-pxa / plat-orion / mach-orion5x.
> It's OK for all of them to change this function to return 0 and 1.
>
> Signed-off-by: Neil Zhang 

Bah, I updated the commit message a bit ... you dropped the
"gpio: pxa:" etc. from the previous patch, which is not good, but don't
worry, I fixed it up.

Yours,
Linus Walleij


linux-next: Tree for Jan 15

2014-01-14 Thread Stephen Rothwell
Hi all,

This tree fails (more than usual) the powerpc allyesconfig build.

Changes since 20140114:

Dropped tree: sh (complex merge conflicts against very old commits)

Removed tree: pstore (at maintainer's request)

The powerpc tree still had its build failure.

The net-next tree gained a conflict against the mips tree.

The infiniband tree gained a conflict against the net-next tree.

The device-mapper tree gained a build failure, so I used the version from
next-20140114.

The md tree gained conflicts against the block tree.

The iommu tree gained a conflict against the drm tree.

The audit tree gained a conflict against Linus' tree.

The tip tree lost its build failure and gained conflicts against the mips
tree.

The kvm tree still had its build failure so I used the version from
next-20140109.

Non-merge commits (relative to Linus' tree): 9105
 8462 files changed, 439243 insertions(+), 228766 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final
link) and i386, sparc, sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

I am currently merging 208 trees (counting Linus' and 29 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (a6da83f98267 Merge branch 'merge' of 
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc)
Merging fixes/master (b0031f227e47 Merge tag 's2mps11-build' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator)
Merging kbuild-current/rc-fixes (19514fc665ff arm, kbuild: make "make install" 
not depend on vmlinux)
Merging arc-current/for-curr (1e01c7eb7c43 ARC: Allow conditional multiple 
inclusion of uapi/asm/unistd.h)
Merging arm-current/fixes (b25f3e1c3584 ARM: 7938/1: OMAP4/highbank: Flush L2 
cache before disabling)
Merging m68k-current/for-linus (77a42796786c m68k: Remove deprecated 
IRQF_DISABLED)
Merging metag-fixes/fixes (3b2f64d00c46 Linux 3.11-rc2)
Merging powerpc-merge/merge (10348f597683 powerpc: Check return value of 
instance-to-package OF call)
Merging sparc/master (ef350bb7c5e0 Merge tag 'ext4_for_linus_stable' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4)
Merging net/master (fdc3452cd2c7 net: usbnet: fix SG initialisation)
Merging ipsec/master (965cdea82569 dccp: catch failed request_module call in 
dccp_probe init)
Merging sound-current/for-linus (150116bcfbd9 ALSA: hiface: Fix typo in 352800 
rate definition)
Merging pci-current/for-linus (f0b75693cbb2 MAINTAINERS: Add DesignWare, i.MX6, 
Armada, R-Car PCI host maintainers)
Merging wireless/master (2eff7c791a18 Merge tag 'nfc-fixes-3.13-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixes)
Merging driver-core.current/driver-core-linus (413541dd66d5 Linux 3.13-rc5)
Merging tty.current/tty-linus (413541dd66d5 Linux 3.13-rc5)
Merging usb.current/usb-linus (413541dd66d5 Linux 3.13-rc5)
Merging staging.current/staging-linus (413541dd66d5 Linux 3.13-rc5)
Merging char-misc.current/char-misc-linus (802eee95bde7 Linux 3.13-rc6)
Merging input-current/for-linus (8e2f2325b73f Input: xpad - add new USB IDs for 
Logitech F310 and F710)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" 
stripe)
Merging crypto-current/master (efb753b8e013 crypto: ixp4xx - Fix kernel compile 
error)
Merging id

Re: [PATCH] gpio: pxa: fix bug when get gpio value

2014-01-14 Thread Linus Walleij
On Thu, Jan 9, 2014 at 11:46 AM, Gerhard Sittig  wrote:

> Here is why I'm asking:  Is there a need from GPIO get_value()
> routines to return normalized values,

That totally depends.

All drivers calling gpio[d]_get_value() get the value returned
directly from the driver, without any clamping to [0,1] in
gpiolib. There are 496 occurrences in the kernel; you'd have
to check them all to see whether they expect this or not.

Hm. Maybe we should clamp it in gpiolib...
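A minimal sketch of such clamping, assuming only the standard gpio_chip
->get() callback (the function name is illustrative, not actual gpiolib
code):

#include <linux/gpio.h>

/* Hedged sketch: double-negate whatever the driver's ->get() returns so
 * that callers always see 0 or 1. */
static int example_gpio_get_clamped(struct gpio_chip *chip, unsigned offset)
{
	return !!chip->get(chip, offset);	/* clamp to [0, 1] */
}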

> and if so should not more
> drivers receive an update?

Probably. But on my part I want that more as a code
readability and maintenance hygiene thing; it gives a clear
sign that the driver author thought about the details.

(Possibly it also gives the compiler a chance to optimize things,
I don't know that for sure.)

> If the GPIO subsystem's API wants to guarantee values of 0 and 1
> (which I think it doesn't), then I feel the adjustment should be
> done in the gpio_get_value() routines (in all its public
> variants, or a common routine which all of them pass through),
> and certainly not in individual chip drivers.

One does not exclude the other.

Yours,
Linus Walleij


Re: [RFC 1/3] mutex: In mutex_can_spin_on_owner(), return false if task need_resched()

2014-01-14 Thread Peter Zijlstra
On Wed, Jan 15, 2014 at 08:44:20AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 14, 2014 at 04:33:08PM -0800, Jason Low wrote:
> > The mutex_can_spin_on_owner() function should also return false if the
> > task needs to be rescheduled.
> > 
> 
> While I was staring at mutex_can_spin_on_owner(); don't we need this?
> 
>  kernel/locking/mutex.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 4dd6e4c219de..480d2f437964 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -214,8 +214,10 @@ static inline int mutex_can_spin_on_owner(struct mutex 
> *lock)
>  
>   rcu_read_lock();
>   owner = ACCESS_ONCE(lock->owner);
> - if (owner)
> + if (owner) {

That is, it's an unmatched barrier, as mutex_set_owner() doesn't include
a barrier, and I don't think it needs to; but on Alpha we still need this
read barrier to ensure we do not mess up the related load, AFAIK.

Paul, can you explain an unpaired read_barrier_depends?

> + smp_read_barrier_depends();
>   retval = owner->on_cpu;
> + }
>   rcu_read_unlock();
>   /*
>* if lock->owner is not set, the mutex owner may have just acquired


Re: [PATCH v3.1 11/21] ARM: pxa: support ICP DAS LP-8x4x FPGA irq

2014-01-14 Thread Linus Walleij
This is looking much better!

On Fri, Jan 10, 2014 at 12:07 AM, Sergei Ianovich  wrote:

> +++ b/drivers/irqchip/irq-lp8x4x.c
(...)

You could add some kerneldoc to this following struct (OK nitpick, but
still nice, especially for the last two variables).

> +struct lp8x4x_irq_data {
> +   void*base;
> +   struct irq_domain   *domain;
> +   unsigned long   num_irq;
> +   unsigned char   irq_sys_enabled;
> +   unsigned char   irq_high_enabled;
> +};
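For illustration, kerneldoc for that struct might look roughly like the
following; the field descriptions are guesses based on how the driver
appears to use them:

/**
 * struct lp8x4x_irq_data - state of one LP-8x4x FPGA interrupt controller
 * @base:		mapped FPGA register base
 * @domain:		irq domain covering the FPGA interrupt lines
 * @num_irq:		number of interrupt lines handled by this controller
 * @irq_sys_enabled:	bitmask of currently unmasked "system" interrupts
 * @irq_high_enabled:	bitmask of currently unmasked high-level interrupts
 */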
> +
> +static void lp8x4x_mask_irq(struct irq_data *d)
> +{
> +   unsigned mask;
> +   unsigned long irq = d->hwirq;

Name the local variable hwirq too so we know what it is.

> +   struct lp8x4x_irq_data *host = irq_data_get_irq_chip_data(d);
> +
> +   if (!host) {
> +   pr_err("lp8x4x: missing host data for irq %i\n", d->irq);
> +   return;
> +   }
> +
> +   if (irq >= host->num_irq) {
> +   pr_err("lp8x4x: wrong irq handler for irq %i\n", d->irq);
> +   return;
> +   }

This is on the hotpath. Do you *really* need these two checks?

(...)
> +static void lp8x4x_unmask_irq(struct irq_data *d)
> +{
> +   unsigned mask;
> +   unsigned long irq = d->hwirq;

Name the variable "hwirq".

> +   struct lp8x4x_irq_data *host = irq_data_get_irq_chip_data(d);
> +
> +   if (!host) {
> +   pr_err("lp8x4x: missing host data for irq %i\n", d->irq);
> +   return;
> +   }
> +
> +   if (irq >= host->num_irq) {
> +   pr_err("lp8x4x: wrong irq handler for irq %i\n", d->irq);
> +   return;
> +   }

Again overzealous error checks.

(...)
> +static void lp8x4x_irq_handler(unsigned int irq, struct irq_desc *desc)
> +{
> +   int n;
> +   unsigned long mask;
> +   struct irq_chip *chip = irq_desc_get_chip(desc);
> +   struct lp8x4x_irq_data *host = irq_desc_get_handler_data(desc);
> +
> +   if (!host)
> +   return;

I don't think this happens either?

> +   chained_irq_enter(chip, desc);
> +
> +   for (;;) {
> +   mask = ioread8(host->base + CLRHILVINT) & 0xff;
> +   mask |= (ioread8(host->base + SECOINT) & SECOINT_MASK) << 8;
> +   mask |= (ioread8(host->base + PRIMINT) & PRIMINT_MASK) << 8;
> +   mask &= host->irq_high_enabled | (host->irq_sys_enabled << 8);
> +   if (mask == 0)
> +   break;
> +   for_each_set_bit(n, &mask, BITS_PER_LONG)
> +   generic_handle_irq(irq_find_mapping(host->domain, n));
> +   }

I like the looks of this.

If you fix this:
Reviewed-by: Linus Walleij 

Yours,
Linus Walleij


Re: [RFC 1/3] mutex: In mutex_can_spin_on_owner(), return false if task need_resched()

2014-01-14 Thread Peter Zijlstra
On Tue, Jan 14, 2014 at 04:33:08PM -0800, Jason Low wrote:
> The mutex_can_spin_on_owner() function should also return false if the
> task needs to be rescheduled.
> 

While I was staring at mutex_can_spin_on_owner(); don't we need this?

 kernel/locking/mutex.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4dd6e4c219de..480d2f437964 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -214,8 +214,10 @@ static inline int mutex_can_spin_on_owner(struct mutex 
*lock)
 
rcu_read_lock();
owner = ACCESS_ONCE(lock->owner);
-   if (owner)
+   if (owner) {
+   smp_read_barrier_depends();
retval = owner->on_cpu;
+   }
rcu_read_unlock();
/*
 * if lock->owner is not set, the mutex owner may have just acquired
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Dear Customer

2014-01-14 Thread NAUKRI.COM
This message is from Naukri Job Portal and to all registered Naukri account 
owners. We are currently facing phishers on our Data Base due to Spam. We want 
to exercise an improve secure service quality in our Admin System to reduce the 
spam in every job/users portal. Please Confirm your Naukri Login account. click 
the blow link, fill your detail of Naukri login.

http://naukriresdexportalsecurityadministration.yolasite.com

Confirmation of your Naukri account will help to stop spaming.

Warning!!!
Naukri Secure Team


Re: [PATCH v3 11/21] ARM: pxa: support ICP DAS LP-8x4x FPGA irq

2014-01-14 Thread Linus Walleij
On Wed, Jan 8, 2014 at 8:01 PM, Sergei Ianovich  wrote:
> On Thu, 2014-01-02 at 13:32 +0100, Linus Walleij wrote:
>> On Tue, Dec 17, 2013 at 8:37 PM, Sergei Ianovich  wrote:
>> Usually combined GPIO+IRQ controllers are put into drivers/gpio but
>> this is a bit special as it seems to handle also non-GPIO-related IRQs
>> so let's get some input on this.
>
> This one is a plain IRQ controller. It has simple input lines, not GPIO
> pins. The chip reports its status to upper level interrupt controller.
> The upper level controller is PXA GPIO in this case.

Hm I don't know why I was deluded into thinking this had something to
do with GPIO. I must have been soft in the head. Sorry about all those
comments ...

I'll re-read the irqchip driver v3.1.

Yours,
Linus Walleij


Re: [RFC PATCH] sched: find the latest idle cpu

2014-01-14 Thread Peter Zijlstra
On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
> Currently we just try to find least load cpu. If some cpus idled,
> we just pick the first cpu in cpu mask.
> 
> In fact we can get the interrupted idle cpu or the latest idled cpu,
> then we may get the benefit from both latency and power.
> The selected cpu maybe not the best, since other cpu may be interrupted
> during our selecting. But be captious costs too much.

No, we should not do anything like this without first integrating
cpuidle.

At which point we have a sane view of the idle states and can make a
sane choice between them.


Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries

2014-01-14 Thread Jason Low
On Tue, 2014-01-14 at 17:06 -0800, Davidlohr Bueso wrote:
> On Tue, 2014-01-14 at 16:33 -0800, Jason Low wrote:
> > When running workloads that have high contention in mutexes on an 8 socket
> > machine, spinners would often spin for a long time with no lock owner.
> > 
> > One of the potential reasons for this is because a thread can be preempted
> > after clearing lock->owner but before releasing the lock
> 
> What happens if you invert the order here? So mutex_clear_owner() is
> called after the actual unlocking (__mutex_fastpath_unlock).

Reversing the mutex_fastpath_unlock and mutex_clear_owner resulted in a
20+% performance improvement to Ingo's test-mutex application at 160
threads on an 8 socket box.

I have tried this method before, but what I was initially concerned
about with clearing the owner after unlocking was that the following
scenario may occur.

thread 1 releases the lock
thread 2 acquires the lock (in the fastpath)
thread 2 sets the owner
thread 1 clears owner

In this situation, lock owner is NULL but thread 2 has the lock.
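A rough sketch of the inverted ordering under discussion, assuming the
3.13-era mutex internals (__mutex_fastpath_unlock() and
mutex_clear_owner()); this is not the actual change, just an
illustration of the window described above:

/* Hedged sketch, not the real patch: release the count first, clear the
 * owner afterwards. */
void __sched example_mutex_unlock_reordered(struct mutex *lock)
{
	/* releases lock->count; another task may now take the fastpath */
	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);

	/* Window: thread 2 can acquire the mutex and set lock->owner here,
	 * and this late clear then wipes it out: the NULL-owner-while-held
	 * case described above. */
	mutex_clear_owner(lock);
}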

> >  or preempted after
> > acquiring the mutex but before setting lock->owner. 
> 
> That would be the case _only_ for the fastpath. For the slowpath
> (including optimistic spinning) preemption is already disabled at that
> point.

Right, for just the fastpath_lock.

> > In those cases, the
> > spinner cannot check if owner is not on_cpu because lock->owner is NULL.
> > 
> > A solution that would address the preemption part of this problem would
> > be to disable preemption between acquiring/releasing the mutex and
> > setting/clearing the lock->owner. However, that will require adding overhead
> > to the mutex fastpath.
> 
> It's not uncommon to disable preemption in hotpaths, the overhead should
> be quite smaller, actually.
> 
> > 
> > The solution used in this patch is to limit the # of times thread can spin 
> > on
> > lock->count when !owner.
> > 
> > The threshold used in this patch for each spinner was 128, which appeared to
> > be a generous value, but any suggestions on another method to determine
> > the threshold are welcomed.
> 
> Hmm generous compared to what? Could you elaborate further on how you
> reached this value? These kind of magic numbers have produced
> significant debate in the past.

I've observed that when running workloads which don't exhibit this
behavior (long spins with no owner), threads rarely take more than 100
extra spins, so I went with 128 based on those numbers.
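As a rough illustration of the kind of bound being discussed (an
assumption-laden sketch, not the posted RFC patch):

/* Hedged sketch only: stop optimistic spinning after too many
 * iterations with no visible owner, rather than spinning on lock->count
 * forever. The 128 threshold is the value mentioned above. */
#define EXAMPLE_NO_OWNER_SPIN_LIMIT	128

static bool example_keep_spinning(struct mutex *lock, int *no_owner_spins)
{
	struct task_struct *owner = ACCESS_ONCE(lock->owner);

	if (owner)
		*no_owner_spins = 0;		/* owner visible again */
	else if (++(*no_owner_spins) > EXAMPLE_NO_OWNER_SPIN_LIMIT)
		return false;			/* give up: owner likely preempted */

	return true;
}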



Re: [PATCH 3/3] pinctrl: single: fix infinite loop caused by bad mask

2014-01-14 Thread Linus Walleij
On Thu, Jan 9, 2014 at 1:50 PM, Tomi Valkeinen  wrote:

> commit 4e7e8017a80e1 (pinctrl: pinctrl-single:
> enhance to configure multiple pins of different modules) improved
> support for pinctrl-single,bits option, but also caused a regression
> in parsing badly configured mask data.
>
> If the masks in DT data are not quite right,
> pcs_parse_bits_in_pinctrl_entry() can end up in an infinite loop,
> trashing memory at the same time.
>
> Add a check to verify that each loop actually removes bits from the
> 'mask', so that the loop can eventually end.
>
> Signed-off-by: Tomi Valkeinen 
> Acked-by: Tony Lindgren 

Patch applied.

Yours,
Linus Walleij


Re: [PATCH 2/3] pinctrl: single: fix pcs_disable with bits_per_mux

2014-01-14 Thread Linus Walleij
On Thu, Jan 9, 2014 at 1:50 PM, Tomi Valkeinen  wrote:

> pcs_enable() uses vals->mask instead of pcs->fmask when bits_per_mux is
> enabled. However, pcs_disable() always uses pcs->fmask.
>
> Fix pcs_disable() to use vals->mask with bits_per_mux.
>
> Signed-off-by: Tomi Valkeinen 
> Acked-by: Peter Ujfalusi 
> Acked-by: Tony Lindgren 

Patch applied. However, make sure to CC Haojian on such patches, as
he's using it on a non-OMAP system and often has good feedback.

Yours,
Linus Walleij


Re: [PATCH 1/3] pinctrl: single: fix DT bindings documentation

2014-01-14 Thread Linus Walleij
On Thu, Jan 9, 2014 at 1:50 PM, Tomi Valkeinen  wrote:

> Remove extra comma in pinctrl-single documentation.
>
> Signed-off-by: Tomi Valkeinen 
> Acked-by: Tony Lindgren 

Patch applied.

Yours,
Linus Walleij


[PATCH] ACPI / init: Run acpi_early_init() before timekeeping_init()

2014-01-14 Thread Lee, Chun-Yi
This is a variant of Rafael J. Wysocki's patch
"ACPI / init: Run acpi_early_init() before efi_enter_virtual_mode()".

According to Matt Fleming, if acpi_early_init() is executed before
efi_enter_virtual_mode(), the EFI initialization can benefit from it,
so Rafael's patch makes that happen.

In addition, we want to access the ACPI TAD device to set the system
clock, so move acpi_early_init() before timekeeping_init(). This final
position is also before efi_enter_virtual_mode().

v2:
Move acpi_early_init() before timekeeping_init() to prepare setting
system clock with ACPI TAD.

v1:
Rafael J. Wysocki
ACPI / init: Run acpi_early_init() before efi_enter_virtual_mode()

Cc: Rafael J. Wysocki 
Cc: Matt Fleming 
Cc: H. Peter Anvin 
Cc: Borislav Petkov 
Cc: Matthew Garrett 
Tested-by: Toshi Kani 
Signed-off-by: Lee, Chun-Yi 
---
 init/main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/init/main.c b/init/main.c
index febc511..b6d93c8 100644
--- a/init/main.c
+++ b/init/main.c
@@ -565,6 +565,7 @@ asmlinkage void __init start_kernel(void)
init_timers();
hrtimers_init();
softirq_init();
+   acpi_early_init();
timekeeping_init();
time_init();
sched_clock_postinit();
@@ -641,7 +642,6 @@ asmlinkage void __init start_kernel(void)
 
check_bugs();
 
-   acpi_early_init(); /* before LAPIC and SMP init */
sfi_init_late();
 
if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-- 
1.6.4.2



Re: [PATCH net-next] tun/macvtap: limit the packets queued through rcvbuf

2014-01-14 Thread Michael S. Tsirkin
On Wed, Jan 15, 2014 at 11:36:01AM +0800, Jason Wang wrote:
> On 01/14/2014 05:52 PM, Michael S. Tsirkin wrote:
> > On Tue, Jan 14, 2014 at 04:45:24PM +0800, Jason Wang wrote:
> >> > On 01/14/2014 04:25 PM, Michael S. Tsirkin wrote:
> >>> > > On Tue, Jan 14, 2014 at 02:53:07PM +0800, Jason Wang wrote:
>  > >> We used to limit the number of packets queued through 
>  > >> tx_queue_length. This
>  > >> has several issues:
>  > >>
>  > >> - tx_queue_length is the control of qdisc queue length, simply 
>  > >> reusing it
>  > >>   to control the packets queued by device may cause confusion.
>  > >> - After commit 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 ("macvtap: 
>  > >> Add
>  > >>   support of packet capture on macvtap device."), an unexpected 
>  > >> qdisc
>  > >>   caused by non-zero tx_queue_length will lead qdisc lock 
>  > >> contention for
>  > >>   multiqueue deivce.
>  > >> - What we really want is to limit the total amount of memory 
>  > >> occupied not
>  > >>   the number of packets.
>  > >>
>  > >> So this patch tries to solve the above issues by using socket 
>  > >> rcvbuf to
>  > >> limit the packets could be queued for tun/macvtap. This was done by 
>  > >> using
>  > >> sock_queue_rcv_skb() instead of a direct call to skb_queue_tail(). 
>  > >> Also two
>  > >> new ioctl() were introduced for userspace to change the rcvbuf like 
>  > >> what we
>  > >> have done for sndbuf.
>  > >>
>  > >> With this fix, we can safely change the tx_queue_len of macvtap to
>  > >> zero. This will make multiqueue works without extra lock contention.
>  > >>
>  > >> Cc: Vlad Yasevich 
>  > >> Cc: Michael S. Tsirkin 
>  > >> Cc: John Fastabend 
>  > >> Cc: Stephen Hemminger 
>  > >> Cc: Herbert Xu 
>  > >> Signed-off-by: Jason Wang 
> >>> > > No, I don't think we can change userspace-visible behaviour like that.
> >>> > >
> >>> > > This will break any existing user that tries to control
> >>> > > queue length through sysfs,netlink or device ioctl.
> >> > 
> >> > But it looks like a buggy API, since tx_queue_len should be for qdisc
> >> > queue length instead of device itself.
> > Probably, but it's been like this since 2.6.x time.
> > Also, qdisc queue is unused for tun so it seemed kind of
> > reasonable to override tx_queue_len.
> >
> >> > If we really want to preserve the
> >> > behaviour, how about using a new feature flag and change the behaviour
> >> > only when the device is created (TUNSETIFF) with the new flag?
> > OK this addresses the issue partially, but there's also an issue
> > of permissions: tx_queue_len can only be changed if
> > capable(CAP_NET_ADMIN). OTOH in your patch a regular user
> > can change the amount of memory consumed per queue
> > by calling TUNSETRCVBUF.
> 
> Yes, but we have the same issue for TUNSETSNDBUF.

To an extent, but TUNSETSNDBUF is different. It limits how much the device
can queue *in the networking stack*, but each queue in the stack is also
limited, and when we exceed that we start dropping packets.
So while with an infinite value (which is the default, btw)
you can keep the host pretty busy, you will not be able to run
it out of memory.

The proposed TUNSETRCVBUF would keep the configured amount
of memory around indefinitely, so you could run the host out of memory.

So, assuming all this,
how about an ethtool or netlink command to configure this
instead?
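For reference, the rcvbuf-based limiting that the quoted patch
description relies on works roughly like the following simplified sketch
(not the actual sock_queue_rcv_skb() code):

/* Simplified, hedged sketch of rcvbuf accounting: queued skb memory is
 * capped by sk_rcvbuf, so an over-budget queue drops instead of growing
 * without bound. */
static int example_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
	if (atomic_read(&sk->sk_rmem_alloc) + skb->truesize >=
	    (unsigned int)sk->sk_rcvbuf)
		return -ENOMEM;			/* over budget: caller drops */

	skb_set_owner_r(skb, sk);		/* charge skb to sk_rmem_alloc */
	skb_queue_tail(&sk->sk_receive_queue, skb);
	/* then wake any reader, e.g. via sk->sk_data_ready() */
	return 0;
}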

> >
> >>> > >
> >>> > > Take a look at my patch in msg ID 20140109071721.gd19...@redhat.com
> >>> > > which gives one way to set tx_queue_len to zero without
> >>> > > breaking userspace.
> >> > 
> >> > If I read the patch correctly, it will make no way for the user who
> >> > really want to change the qdisc queue length for tun.
> > Why would this matter?  As far as I can see qdisc queue is currently unused.
> >
> 
> Users may use qdisc to do port mirroring, bandwidth limitation, traffic
> prioritization and more for a VM. So we do have users, and maybe more if
> you consider the case of VPN.

Well it's not used by default at least.
I remember that we discussed this previously actually.

If all we want to do actually is utilize no_qdisc by default,
we can simply use Eric's patch:

http://article.gmane.org/gmane.linux.kernel/1279597

and a similar patch for macvtap.
I tried it at the time and it didn't seem to help performance
at all, but a lot has changed since then; in particular, I didn't
test mq.

If you now have results showing how it's beneficial, please post them.

-- 
MST


Re: [RFT][PATCH] ACPI / init: Run acpi_early_init() before efi_enter_virtual_mode()

2014-01-14 Thread joeyli
On Tue, 2014-01-14 at 13:32 -0700, Toshi Kani wrote:
> > > +   acpi_early_init();
> > > timekeeping_init();
> > > time_init();
> > > sched_clock_postinit();
> > > @@ -641,7 +642,6 @@ asmlinkage void __init start_kernel(void)
> > >  
> > > check_bugs();
> > >  
> > > -   acpi_early_init(); /* before LAPIC and SMP init */
> > > sfi_init_late();
> > >  
> > > if (efi_enabled(EFI_RUNTIME_SERVICES)) {
> > > 
> > 
> > Hi Toshi,
> > 
> > Could you try this variant, too?  If this works as well then we end
> up
> > solving two problems in one patch...
> 
> Hi Peter,
> 
> Yes, this version works fine as well.
> 
> Tested-by: Toshi Kani 
> 
> Thanks,
> -Toshi

Thanks a lot for your testing.

I will re-send a formal patch with changelog to everybody.

Regards
Joey Lee





Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru

2014-01-14 Thread David Rientjes
On Tue, 14 Jan 2014, Dave Hansen wrote:

> > block/blk-mq.c: In function ‘blk_mq_free_rq_map’:
> > block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c:1094:10: warning: initialization from incompatible pointer 
> > type [enabled by default]
> > block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c:1095:22: error: ‘struct page’ has no member named ‘list’
> > block/blk-mq.c: In function ‘blk_mq_init_rq_map’:
> > block/blk-mq.c:1159:22: error: ‘struct page’ has no member named ‘list’
> 
> As I mentioned in the introduction, these are against linux-next.
> There's a patch in there at the moment which fixed this.
> 

Ok, thanks, I like this patch.

Acked-by: David Rientjes 

[PATCH 2/2] powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()

2014-01-14 Thread Michael Ellerman
At a glance these are just the inverse of each other. The one subtlety
is that arch_spin_value_unlocked() takes the lock by value, rather than
as a pointer, which is important for the lockref code.

On the other hand arch_spin_is_locked() doesn't really care, so
implement it in terms of arch_spin_value_unlocked().

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/spinlock.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 5162f8c..a30ef69 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -28,8 +28,6 @@
 #include 
 #include 
 
-#define arch_spin_is_locked(x) ((x)->slock != 0)
-
 #ifdef CONFIG_PPC64
 /* use 0x80yy when locked, where yy == CPU number */
 #ifdef __BIG_ENDIAN__
@@ -59,6 +57,11 @@ static __always_inline int 
arch_spin_value_unlocked(arch_spinlock_t lock)
return lock.slock == 0;
 }
 
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
+{
+   return !arch_spin_value_unlocked(*lock);
+}
+
 /*
  * This returns the old value in the lock, so we succeeded
  * in getting the lock if the return value is 0.
-- 
1.8.3.2



[PATCH 1/2] powerpc: Add support for the optimised lockref implementation

2014-01-14 Thread Michael Ellerman
This commit adds the architecture support required to enable the
optimised implementation of lockrefs.

That's as simple as defining arch_spin_value_unlocked() and selecting
the Kconfig option.

We also define cmpxchg64_relaxed(), because the lockref code does not
need the cmpxchg to have barrier semantics.

Using Linus' test case[1] on one system I see a 4x improvement for the
basic enablement, and a further 1.3x for cmpxchg64_relaxed(), for a
total of 5.3x vs the baseline.

On another system I see more like 2x improvement.

[1]: http://marc.info/?l=linux-fsdevel=137782380714721=4

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Kconfig| 1 +
 arch/powerpc/include/asm/cmpxchg.h  | 1 +
 arch/powerpc/include/asm/spinlock.h | 5 +
 3 files changed, 7 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b44b52c..b34b53d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -139,6 +139,7 @@ config PPC
select OLD_SIGACTION if PPC32
select HAVE_DEBUG_STACKOVERFLOW
select HAVE_IRQ_EXIT_ON_IRQ_STACK
+   select ARCH_USE_CMPXCHG_LOCKREF if PPC64
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/cmpxchg.h 
b/arch/powerpc/include/asm/cmpxchg.h
index e245aab..d463c68 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -300,6 +300,7 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, 
unsigned long new,
BUILD_BUG_ON(sizeof(*(ptr)) != 8);  \
cmpxchg_local((ptr), (o), (n)); \
   })
+#define cmpxchg64_relaxed  cmpxchg64_local
 #else
 #include 
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 5f54a74..5162f8c 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -54,6 +54,11 @@
 #define SYNC_IO
 #endif
 
+static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+   return lock.slock == 0;
+}
+
 /*
  * This returns the old value in the lock, so we succeeded
  * in getting the lock if the return value is 0.
-- 
1.8.3.2



Re: [RFC 3/3] mutex: When there is no owner, stop spinning after too many tries

2014-01-14 Thread Jason Low
On Tue, 2014-01-14 at 17:00 -0800, Andrew Morton wrote:
> On Tue, 14 Jan 2014 16:33:10 -0800 Jason Low  wrote:
> 
> > When running workloads that have high contention in mutexes on an 8 socket
> > machine, spinners would often spin for a long time with no lock owner.
> > 
> > One of the potential reasons for this is because a thread can be preempted
> > after clearing lock->owner but before releasing the lock, or preempted after
> > acquiring the mutex but before setting lock->owner. In those cases, the
> > spinner cannot check if owner is not on_cpu because lock->owner is NULL.
> 
> That sounds like a very small window.  And your theory is that this
> window is being hit sufficiently often to impact aggregate runtime
> measurements, which sounds improbable to me?
> 
> > A solution that would address the preemption part of this problem would
> > be to disable preemption between acquiring/releasing the mutex and
> > setting/clearing the lock->owner. However, that will require adding overhead
> > to the mutex fastpath.
> 
> preempt_disable() is cheap, and sometimes free.
> 
> Have you confirmed that the preempt_disable() approach actually fixes
> the performance issues?  If it does then this would confirm your
> "potential reason" hypothesis.  If it doesn't then we should be hunting
> further for the explanation.

Using Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50)
which can also generate high mutex contention, the preempt_disable()
approach did provide approximately a 4% improvement at 160 threads, but
not nearly the 25+% I was seeing with this patchset. So, it looks like
preemption is not the main cause of the problem then.

Thanks,
Jason



[PATCH] vt: detect and ignore OSC codes.

2014-01-14 Thread Adam Borowski
These can be used to send commands consisting of an arbitrary string to the
terminal, most often used to set a terminal's window title or to redefine
the colour palette.  Our console doesn't use OSC, unlike everything else,
which can lead to junk being displayed if a process sends such a code
unconditionally.

Not following Ecma-48, this commit recognizes 7-bit forms (ESC ] ... 0x07,
ESC ] .. ESC \) but not 8-bit (0x9D ... 0x9C).
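As a concrete example, here is a common xterm-style title-setting OSC of
the form ESC ] 0 ; text BEL, which this change makes the console
silently swallow (illustrative snippet, not part of the patch):

#include <stdio.h>

int main(void)
{
	/* ESC ] 0 ; <text> BEL : set the terminal window title. With this
	 * patch the Linux console ignores it instead of printing junk. */
	printf("\033]0;my window title\007");
	return 0;
}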

Signed-off-by: Adam Borowski 
---
 drivers/tty/vt/vt.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 61b1137..0377c52 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -1590,7 +1590,7 @@ static void restore_cur(struct vc_data *vc)
 
 enum { ESnormal, ESesc, ESsquare, ESgetpars, ESgotpars, ESfunckey,
EShash, ESsetG0, ESsetG1, ESpercent, ESignore, ESnonstd,
-   ESpalette };
+   ESpalette, ESosc };
 
 /* console_lock is held (except via vc_init()) */
 static void reset_terminal(struct vc_data *vc, int do_clear)
@@ -1650,11 +1650,15 @@ static void do_con_trol(struct tty_struct *tty, struct 
vc_data *vc, int c)
 *  Control characters can be used in the _middle_
 *  of an escape sequence.
 */
+   if (vc->vc_state == ESosc && c>=8 && c<=13) /* ... except for OSC */
+   return;
switch (c) {
case 0:
return;
case 7:
-   if (vc->vc_bell_duration)
+   if (vc->vc_state == ESosc)
+   vc->vc_state = ESnormal;
+   else if (vc->vc_bell_duration)
kd_mksound(vc->vc_bell_pitch, vc->vc_bell_duration);
return;
case 8:
@@ -1765,7 +1769,9 @@ static void do_con_trol(struct tty_struct *tty, struct 
vc_data *vc, int c)
} else if (c=='R') {   /* reset palette */
reset_palette(vc);
vc->vc_state = ESnormal;
-   } else
+   } else if (c>='0' && c<='9')
+   vc->vc_state = ESosc;
+   else
vc->vc_state = ESnormal;
return;
case ESpalette:
@@ -2023,6 +2029,8 @@ static void do_con_trol(struct tty_struct *tty, struct 
vc_data *vc, int c)
return;
default:
vc->vc_state = ESnormal;
+   case ESosc:
+   return;
}
 }
 
-- 
1.8.5.2



Re: [PATCH 2/2] x86: intel-mid: sfi_handle_*_dev() should check for pdata error code

2014-01-14 Thread Ingo Molnar

* David Cohen  wrote:

> Hi Ingo,
> 
> On Fri, Dec 20, 2013 at 09:49:53AM +0100, Ingo Molnar wrote:
> > 
> > * David Cohen  wrote:
> > 
> > > Prevent sfi_handle_*_dev() to register device in case
> > > intel_mid_sfi_get_pdata() failed to execute.
> > > 
> > > Since 'NULL' is a valid return value, this patch makes
> > > sfi_handle_*_dev() functions to use IS_ERR() to validate returned pdata.
> > 
> > Is this bug triggering in practice? If not then please say so in the 
> > changelog. If yes then is this patch desired for v3.13 merging and 
> > also please fix the changelog to conform to the standard changelog 
> > style:
> > 
> >  - first describe the symptoms of the bug - how does a user notice?
> > 
> >  - then describe how the code behaves today and how that is causing
> >the bug
> > 
> >  - and then only describe how it's fixed.
> > 
> > The first item is the most important one - while developers 
> > (naturally) tend to concentrate on the least important point, the last 
> > one.
> 
> Thanks for the feedback :)
> This new patch set was done in reply to your comment:
> https://lkml.org/lkml/2013/12/20/517

Hm, in what way does the new changelog address my first request:

> >  - first describe the symptoms of the bug - how does a user notice?

They are all phrased as bug fixes, yet _none_ of the three changelogs 
appears to describe specific symptoms on specific systems - they all 
seem to talk in the abstract, with no specific connection to reality.

That really makes it harder for patches to get into the (way too
narrow) attention span of maintainers, while phrasing it like this:

 'If an Intel-MID system boots in a specific SFI environment then it 
  will hang on bootup without this fix.'

or:

 'Existing Intel-MID hardware will run faster with this patch.'

will certainly wake up maintainers like a good coffee in the morning.

If a patch is a cleanup with no known bug fix effects then say so in 
the title and the changelog.

Thanks,

Ingo


Re: [RFC][PATCH 1/9] mm: slab/slub: use page->list consistently instead of page->lru

2014-01-14 Thread Dave Hansen
On 01/14/2014 06:31 PM, David Rientjes wrote:
> Did you try with a CONFIG_BLOCK config?
> 
> block/blk-mq.c: In function ‘blk_mq_free_rq_map’:
> block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c:1094:10: warning: initialization from incompatible pointer 
> type [enabled by default]
> block/blk-mq.c:1094:10: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c:1095:22: error: ‘struct page’ has no member named ‘list’
> block/blk-mq.c: In function ‘blk_mq_init_rq_map’:
> block/blk-mq.c:1159:22: error: ‘struct page’ has no member named ‘list’

As I mentioned in the introduction, these are against linux-next.
There's a patch in there at the moment which fixed this.


Re: [RFC PATCH] sched: find the latest idle cpu

2014-01-14 Thread Alex Shi
On 01/15/2014 01:33 PM, Michael wang wrote:
> On 01/15/2014 12:07 PM, Alex Shi wrote:
>> > Currently we just try to find least load cpu. If some cpus idled,
>> > we just pick the first cpu in cpu mask.
>> > 
>> > In fact we can get the interrupted idle cpu or the latest idled cpu,
>> > then we may get the benefit from both latency and power.
>> > The selected cpu maybe not the best, since other cpu may be interrupted
>> > during our selecting. But be captious costs too much.
> So the idea here is we want to choose the latest idle cpu if we have
> multiple idle cpu for choosing, correct?

yes.
> 
> And I guess that was in order to avoid choosing tickless cpu while there
> are un-tickless idle one, is that right?

No, the current logic chooses the least loaded cpu no matter whether it is idle.
> 
> What confused me is, what about those cpu who just going to recover from
> tickless as you mentioned, which means latest idle doesn't mean the best
> choice, or even could be the worst (if just two choice, and the longer
> tickless one is just going to recover while the latest is going to
> tickless).

Yes, to handle your scenario we would need to know the next timer event
for each idle cpu, but that is not enough; interrupts are totally
unpredictable. So I'd rather bear with the coarse method for now.
> 
> So what about just check 'ts->tick_stopped' and record one ticking idle
> cpu? the cost could be lower than time comparison, we could reduce the
> risk may be...(well, not so risky since the logical only works when
> system is relaxing with several cpu idle)

First, nohz full also stops the tick. Second, tick_stopped cannot reflect
interrupts: when an idle cpu is interrupted, it is woken and then becomes
a good candidate for running a task.

-- 
Thanks
Alex


[RESEND PATCH v10] x86, apic, kexec, Documentation: Add disable_cpu_apicid kernel parameter

2014-01-14 Thread HATAYAMA Daisuke
Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
specify an initial APIC ID of the corresponding CPU you want to
disable.

This is mostly used for the kdump 2nd kernel to disable BSP to wake up
multiple CPUs without causing system reset or hang due to sending INIT
from AP to BSP.

Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
1st kernel, for example from /proc/cpuinfo and then set up this kernel
parameter for the 2nd kernel using the obtained APIC ID.

However, doing this procedure manually at each boot is awkward; it
should be done automatically by user-land service scripts, for
example kexec-tools on Fedora/RHEL distributions.

This design is more flexible than automatically disabling the BSP at
kernel boot time, because at boot time we have no choice but to refer
to the ACPI/MP tables to obtain the initial APIC ID of the BSP,
meaning that method is not applicable to systems without such BIOS
tables.

One assumption behind this design is that users get initial APIC ID of
the BSP in still healthy state and so BSP is uniquely kept in
CPU0. Thus, through the kernel parameter, only one initial APIC ID can
be specified.

In the comparison with disabled_cpu_apicid, we use read_apic_id(), not
boot_cpu_physical_apicid, because on some platforms the variable is
modified to the apicid reported as the BSP through the MP table, and
this function is executed with that temporarily modified
boot_cpu_physical_apicid. As a result, the disabled_cpu_apicid kernel
parameter doesn't work well for the apicids of APs.

Fixing the wrong handling of boot_cpu_physical_apicid requires some
reviews and tests beyond some platforms and it could take some
time. The fix here is a kind of workaround to focus on the main topic
of this patch.

Signed-off-by: HATAYAMA Daisuke 
---
 Documentation/kernel-parameters.txt |9 ++
 arch/x86/kernel/apic/apic.c |   49 +++
 2 files changed, 58 insertions(+)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 50680a5..4e5528c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -774,6 +774,15 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
disable=[IPV6]
See Documentation/networking/ipv6.txt.
 
+   disable_cpu_apicid= [X86,APIC,SMP]
+   Format: 
+   The number of initial APIC ID for the
+   corresponding CPU to be disabled at boot,
+   mostly used for the kdump 2nd kernel to
+   disable BSP to wake up multiple CPUs without
+   causing system reset or hang due to sending
+   INIT from AP to BSP.
+
disable_ddw [PPC/PSERIES]
Disable Dynamic DMA Window support. Use this if
to workaround buggy firmware.
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index d278736..6c0b7d5 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -75,6 +75,13 @@ unsigned int max_physical_apicid;
 physid_mask_t phys_cpu_present_map;
 
 /*
+ * Processor to be disabled specified by kernel parameter
+ * disable_cpu_apicid=, mostly used for the kdump 2nd kernel to
+ * avoid undefined behaviour caused by sending INIT from AP to BSP.
+ */
+unsigned int disabled_cpu_apicid = BAD_APICID;
+
+/*
  * Map cpu index to physical APIC ID
  */
 DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -2115,6 +2122,39 @@ int generic_processor_info(int apicid, int version)
phys_cpu_present_map);
 
/*
+* boot_cpu_physical_apicid is designed to have the apicid
+* returned by read_apic_id(), i.e, the apicid of the
+* currently booting-up processor. However, on some platforms,
+* it is temporarily modified by the apicid reported as BSP
+* through MP table. Concretely:
+*
+* - arch/x86/kernel/mpparse.c: MP_processor_info()
+* - arch/x86/mm/amdtopology.c: amd_numa_init()
+* - arch/x86/platform/visws/visws_quirks.c: MP_processor_info()
+*
+* This function is executed with the modified
+* boot_cpu_physical_apicid. So, disabled_cpu_apicid kernel
+* parameter doesn't work to disable APs on kdump 2nd kernel.
+*
+* Since fixing handling of boot_cpu_physical_apicid requires
+* another discussion and tests on each platform, we leave it
+* for now and here we use read_apic_id() directly in this
+* function, generic_processor_info().
+*/
+   if (disabled_cpu_apicid != BAD_APICID &&
+   disabled_cpu_apicid != read_apic_id() &&
+   disabled_cpu_apicid == apicid) {
+   int thiscpu = num_processors + disabled_cpus;
+
+   pr_warning("ACPI: 

Re: [PATCH 7/8 v3] crypto:s5p-sss: validate iv before memcpy

2014-01-14 Thread Naveen Krishna Ch
Hello Tomasz,

On 10 January 2014 21:33, Tomasz Figa  wrote:
> Hi Naveen,
>
>
> On 10.01.2014 12:45, Naveen Krishna Chatradhi wrote:
>>
>> This patch adds code to validate "iv" buffer before trying to
>> memcpy the contents
>>
>> Signed-off-by: Naveen Krishna Chatradhi 
>> ---
>> Changes since v2:
>> None
>>
>>   drivers/crypto/s5p-sss.c |5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/crypto/s5p-sss.c b/drivers/crypto/s5p-sss.c
>> index f274f5f..7058bb6 100644
>> --- a/drivers/crypto/s5p-sss.c
>> +++ b/drivers/crypto/s5p-sss.c
>> @@ -381,8 +381,9 @@ static void s5p_set_aes(struct s5p_aes_dev *dev,
>> struct samsung_aes_variant *var = dev->variant;
>> void __iomem *keystart;
>>
>> -   memcpy(dev->ioaddr + SSS_REG_AES_IV_DATA
>> -   (var->aes_offset, 0), iv, 0x10);
>> +   if (iv)
>> +   memcpy(dev->ioaddr + SSS_REG_AES_IV_DATA
>> +   (var->aes_offset, 0), iv, 0x10);
>
>
> In what conditions can the iv end up being NULL?
req->info is the initialization vector in our case, which comes from user space.
It's good to have a check to avoid any crashes.

Also, AES ECB mode does not use an IV.
>
> Best regards,
> Tomasz



-- 
Shine bright,
(: Nav :)


Re: [PATCH] perf tools: Synthesize anon MMAP records on the heap

2014-01-14 Thread Gaurav Jain
Hi Namhyung,


On 1/15/14, 12:46 AM, "Namhyung Kim"  wrote:

>I'd like to take my ack back - it seems I missed some points.

No worries, looks like the patch wasn’t well thought out.

>On Tue, 14 Jan 2014 20:48:23 +, Gaurav Jain wrote:
>> On 1/13/14, 11:54 AM, "Don Zickus"  wrote:
>>
>>>On Sat, Jan 11, 2014 at 08:32:14PM -0800, Gaurav Jain wrote:
>>>> Anon records usually do not have the 'execname' entry. However if they are on
>>>> the heap, the execname shows up as '[heap]'. The fix considers any executable
>>>> entries in the map that do not have a name or are on the heap as anon records
>>>> and sets the name to '//anon'.
>>>>
>>>> This fixes JIT profiling for records on the heap.
>>>
>>>I guess I don't understand the need for this fix.  It seems breaking out
>>>//anon vs. [heap] would be useful.  Your patch is saying otherwise.  Can
>>>give a description of the problem you are trying to solve?
>>
>> Thank you for looking at the patch.
>>
>> We generate a perf map file which includes certain JIT'ed functions that
>> show up as [heap] entries. As a result, I included the executable heap
>> entries as anon pages so that it would be handled in
>> util/map.c:map__new(). The alternative would be to handle heap entries in
>> map__new() directly, however I wasn't sure if this would break something
>> as it seems that heap and stack entries are expected to fail all
>> map__find_* functions. Thus I considered executable heap entries as
>> //anon, but perhaps there is a better way.
>
>Hmm.. so the point is that an executable heap mapping should have
>/tmp/perf-XXX.map as a file name, right?  If so, does something like
>below work well for you?

Just gave it a try and it fixed the issue perfectly! Thanks for the help.
This looks like a much better solution than treating the heap mapping as
an anon record.

Gaurav

>diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
>index 9b9bd719aa19..d52387fe83f1 100644
>--- a/tools/perf/util/map.c
>+++ b/tools/perf/util/map.c
>@@ -69,7 +69,7 @@ struct map *map__new(struct list_head *dsos__list, u64
>start, u64 len,
>   map->ino = ino;
>   map->ino_generation = ino_gen;
> 
>-  if (anon) {
>+  if (anon || (no_dso && type == MAP__FUNCTION)) {
>   snprintf(newfilename, sizeof(newfilename), 
> "/tmp/perf-%d.map", pid);
>   filename = newfilename;
>   }
>@@ -93,7 +93,7 @@ struct map *map__new(struct list_head *dsos__list, u64
>start, u64 len,
>* functions still return NULL, and we avoid the
>* unnecessary map__load warning.
>*/
>-  if (no_dso)
>+  if (no_dso && type != MAP__FUNCTION)
>   dso__set_loaded(dso, map->type);
>   }
>   }



Re: [GIT PULL] clockevents/clocksources: 3.13 fixes

2014-01-14 Thread Ingo Molnar

* Daniel Lezcano  wrote:

> 
> Hi Thomas and Ingo,
> 
> here is a pull request for a single fix for 3.13. It is based on the
> latest timers/urgent update.
> 
>  * Soren Brinkmann fixed the cadence_ttc driver where a call to
> clk_get_rate happens in an interrupt context. More precisely in an
> IPI when the broadcast timer is initialized for each cpu in the
> cpuidle driver
> 
> Thanks
>   -- Daniel
> 
> The following changes since commit b0031f227e47919797dc0e1c1990f3ef151ff0cc:
> 
>   Merge tag 's2mps11-build' of
> git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
> (2013-12-17 12:57:36 -0800)
> 
> are available in the git repository at:
> 
> 
>   git://git.linaro.org/people/daniel.lezcano/linux.git
> clockevents/3.13-fixes
> 
> for you to fetch changes up to c1dcc927dae01dfd4904ee82ce2c00b50eab6dc3:
> 
>   clocksource: cadence_ttc: Fix mutex taken inside interrupt context
> (2013-12-30 11:32:24 +0100)
> 
> 
> Soren Brinkmann (1):
>   clocksource: cadence_ttc: Fix mutex taken inside interrupt context
> 
>  drivers/clocksource/cadence_ttc_timer.c |   21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)

Pulled into tip:timers/urgent, thanks Daniel!

Ingo


Re: [tip:core/urgent] sched_clock: Disable seqlock lockdep usage in sched_clock()

2014-01-14 Thread Ingo Molnar

* John Stultz  wrote:

> On 01/12/2014 10:42 AM, tip-bot for John Stultz wrote:
> > Commit-ID:  7a06c41cbec33c6dbe7eec575c61986122617408
> > Gitweb: 
> > http://git.kernel.org/tip/7a06c41cbec33c6dbe7eec575c61986122617408
> > Author: John Stultz 
> > AuthorDate: Thu, 2 Jan 2014 15:11:14 -0800
> > Committer:  Ingo Molnar 
> > CommitDate: Sun, 12 Jan 2014 10:14:00 +0100
> >
> > sched_clock: Disable seqlock lockdep usage in sched_clock()
> >
> > Unfortunately the seqlock lockdep enablement can't be used
> > in sched_clock(), since the lockdep infrastructure eventually
> > calls into sched_clock(), which causes a deadlock.
> >
> > Thus, this patch changes all generic sched_clock() usage
> > to use the raw_* methods.
> >
> > Acked-by: Linus Torvalds 
> > Reviewed-by: Stephen Boyd 
> > Reported-by: Krzysztof Hałasa 
> > Signed-off-by: John Stultz 
> > Cc: Uwe Kleine-König 
> > Cc: Willy Tarreau 
> > Signed-off-by: Peter Zijlstra 
> > Link: 
> > http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stu...@linaro.org
> > Signed-off-by: Ingo Molnar 
> 
> Hey Ingo,
> Just wanted to follow up here, since I've still not seen this (and
> the raw_ renaming patch) submitted to Linus. These address a lockup that
> triggers on ARM systems if lockdep is enabled and it would be good to
> get it in before 3.13 is out.

It's in tip:core/urgent, i.e. lined up for v3.13, I plan to send it to 
Linus later today.

Thanks,

Ingo


Re: [tip:x86/urgent] x86, cpu, amd: Add workaround for family 16h, erratum 793

2014-01-14 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> On 01/14/2014 04:45 PM, tip-bot for Borislav Petkov wrote:
> > +   rdmsrl(MSR_AMD64_LS_CFG, val);
> > +   if (!(val & BIT(15)))
> > +   wrmsrl(MSR_AMD64_LS_CFG, val | BIT(15));
> 
> Incidentally, I'm wondering if we shouldn't have a
> set_in_msr()/clear_in_msr() set of functions which would incorporate the
> above construct:
> 
> void set_in_msr(u32 msr, u64 mask)
> {
>   u64 old, new;
> 
>   old = rdmsrl(msr);
>   new = old | mask;
>   if (old != new)
>   wrmsrl(msr, new);
> }
> 
> ... and the obvious equivalent for clear_in_msr().
> 
Perhaps the only question is whether it should be "set/clear_bit_in_msr()"
rather than having to haul a full 64-bit mask around in the common case.

I'd suggest the introduction of a standard set of methods operating on 
MSRs:

msr_read()
msr_write()
msr_set_bit()
msr_clear_bit()
msr_set_mask()
msr_clear_mask()

etc.

msr_read() would essentially map to rdmsr_safe(). Each method has a 
return value that can be checked for failure.

Note that the naming of 'msr_set_bit()' and 'msr_clear_bit()' mirrors 
that of bitops, and set_mask/clear_mask is named along a similar 
pattern, so that it's more immediately obvious what's going on.

With such methods in place we could use them in most new code, and 
would use 'raw, unsafe' rdmsr()/wrmsr() only in very specific, 
justified cases.
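
To make this concrete, here is a minimal sketch of what one such helper might
look like. This is only a hypothetical sketch, not code from the kernel tree,
and the exact return-value convention would still need to be settled:

#include <linux/types.h>
#include <asm/msr.h>

/*
 * Sketch only: set a single bit in an MSR, skipping the write if the bit
 * is already set.  Returns 0 on success or the rdmsrl_safe()/wrmsrl_safe()
 * error code.
 */
static int msr_set_bit(u32 msr, u8 bit)
{
	u64 old, new;
	int err;

	err = rdmsrl_safe(msr, &old);
	if (err)
		return err;

	new = old | (1ULL << bit);
	if (new == old)
		return 0;

	return wrmsrl_safe(msr, new);
}

A matching msr_clear_bit() would do the same with old & ~(1ULL << bit).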

Thanks,

Ingo


Re: [PATCH 2/3] perf tools: Spare double comparison of callchain first entry

2014-01-14 Thread Namhyung Kim
On Tue, 14 Jan 2014 16:37:15 +0100, Frederic Weisbecker wrote:
> When a new callchain child branch matches an existing one in the rbtree,
> the comparison of its first entry is performed twice:
>
> 1) From append_chain_children() on branch lookup
>
> 2) If 1) reports a match, append_chain() then compares all entries of
> the new branch against the matching node in the rbtree, and this
> comparison includes the first entry of the new branch again.

Right.

>
> Lets shortcut this by performing the whole comparison only from
> append_chain() which then returns the result of the comparison between
> the first entry of the new branch and the iterating node in the rbtree.
> If the first entry matches, the lookup on the current level of siblings
> stops and propagates to the children of the matching nodes.

Hmm..  it looks like I thought that directly calling append_chain() would have
some overhead - but it doesn't.

>
> This results in less comparisons performed by the CPU.

Do you have any numbers?  I suspect it'd not be a big change, but just
curious.

>
> Signed-off-by: Frederic Weisbecker 

Reviewed-by: Namhyung Kim 

Thanks,
Namhyung


Re: [PATCH 3/3] perf tools: Remove unnecessary callchain cursor state restore on unmatch

2014-01-14 Thread Namhyung Kim
On Tue, 14 Jan 2014 16:37:16 +0100, Frederic Weisbecker wrote:
> If a new callchain branch doesn't match a single entry of the node that
> it is given against comparison in append_chain(), then the cursor is
> expected to be at the same position as it was before the comparison loop.
>
> As such, there is no need to restore the cursor position on exit in case
> of non matching branches.
>
> Signed-off-by: Frederic Weisbecker 

Reviewed-by: Namhyung Kim 

Thanks,
Namhyung


Re: [PATCH V2] ACPI/Battery: Add a _BIX quirk for NEC LZ750/LS

2014-01-14 Thread Robert Hancock

On 01/14/2014 03:37 PM, Rafael J. Wysocki wrote:

On Tuesday, January 14, 2014 04:06:01 PM Matthew Garrett wrote:

On Mon, Jan 06, 2014 at 11:25:53PM +0100, Rafael J. Wysocki wrote:


Queued up as a fix for 3.13 (I fixed up the indentation).


Ah, sorry, I missed this chunk of the thread. If the system provides
valid _BIF data then we should possibly just fall back to that rather
than adding another quirk table.


The problem is to know that _BIX is broken.  If we could figure that out
upfront, we wouldn't need the quirk table in any case.

Tianyu, can we do some effort during the driver initialization to detect
this breakage and handle it without blacklisting systems?


Yes, the usual question in such cases is "how does Windows manage to 
function on such systems, (almost certainly) without a system-specific 
hack, and can we replicate that behavior?"
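
One possible shape for the init-time sanity check being asked about here, as a
purely illustrative sketch: the fields, the notion of "implausible" data and
the battery_uses_bix()/battery_force_bif() helpers are assumptions, not the
actual drivers/acpi/battery.c logic.

/* Illustrative sketch only, run once after the first _BIX evaluation. */
if (battery_uses_bix(battery) &&
    (battery->full_charge_capacity == 0 ||
     battery->full_charge_capacity == ACPI_BATTERY_VALUE_UNKNOWN)) {
	pr_warn(FW_BUG "_BIX data looks implausible, falling back to _BIF\n");
	battery_force_bif(battery);
}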




Re: [PATCH v2] clk: sirf: re-arch to make the codes support both prima2 and atlas6

2014-01-14 Thread Barry Song
2014/1/15 Mike Turquette :
> Quoting Barry Song (2014-01-05 21:38:19)
>> diff --git a/drivers/clk/sirf/clk-atlas6.c b/drivers/clk/sirf/clk-atlas6.c
>> new file mode 100644
>> index 000..21e776a
>> --- /dev/null
>> +++ b/drivers/clk/sirf/clk-atlas6.c
>> @@ -0,0 +1,153 @@
>> +/*
>> + * Clock tree for CSR SiRFatlasVI
>> + *
>> + * Copyright (c) 2011 Cambridge Silicon Radio Limited, a CSR plc group 
>> company.
>> + *
>> + * Licensed under GPLv2 or later.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>
> Please do not use clk-private.h. It is slated for removal (some day...).
> Do you actually need it?

removed in v3.
>
> Regards,
> Mike

-barry


Re: [PATCH] toshiba_acpi: Support RFKILL hotkey scancode

2014-01-14 Thread Unai Uribarri
On Windows it disables the Wi-Fi & Bluetooth. On Linux it does nothing but
show this error in dmesg about the unrecognized scancode:
[230128.042047] toshiba_acpi: Unknown key 158

On Tue, Jan 14, 2014 at 4:54 PM, Matthew Garrett
 wrote:
> On Tue, 2014-01-14 at 11:06 +0100, Unai Uribarri wrote:
>> This scancode is used in new 2013 models like Satellite P75-A7200.
>
> Just to check - this is generated by a key that otherwise does nothing,
> right? Ie, hitting the key generates the scancode but doesn't
> automatically change the wireless state?
>
> --
> Matthew Garrett 
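
For reference, the kind of change being discussed is a one-line addition to the
driver's sparse keymap, along these lines (the exact scancode/key pairing shown
here is illustrative rather than the submitted patch):

	/* hypothetical entry in toshiba_acpi's sparse keymap */
	{ KE_KEY, 0x158, { KEY_WLAN } },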


Re: [PATCH] ARM: dts: imx28-apf28dev: add user button

2014-01-14 Thread Shawn Guo
On Tue, Jan 14, 2014 at 03:21:27PM +0100, Sébastien Szymanski wrote:
> Signed-off-by: Sébastien Szymanski 
> ---
>  arch/arm/boot/dts/imx28-apf28dev.dts | 11 +++
>  1 file changed, 11 insertions(+)

Applied, thanks.

Shawn

> 
> diff --git a/arch/arm/boot/dts/imx28-apf28dev.dts 
> b/arch/arm/boot/dts/imx28-apf28dev.dts
> index 334dea5..221cac4 100644
> --- a/arch/arm/boot/dts/imx28-apf28dev.dts
> +++ b/arch/arm/boot/dts/imx28-apf28dev.dts
> @@ -48,6 +48,7 @@
>   MX28_PAD_LCD_D20__GPIO_1_20
>   MX28_PAD_LCD_D21__GPIO_1_21
>   MX28_PAD_LCD_D22__GPIO_1_22
> + MX28_PAD_GPMI_CE1N__GPIO_0_17
>   >;
>   fsl,drive-strength = ;
>   fsl,voltage = ;
> @@ -193,4 +194,14 @@
>   brightness-levels = <0 4 8 16 32 64 128 255>;
>   default-brightness-level = <6>;
>   };
> +
> + gpio-keys {
> + compatible = "gpio-keys";
> +
> + user-button {
> + label = "User button";
> + gpios = < 17 0>;
> + linux,code = <0x100>;
> + };
> + };
>  };
> -- 
> 1.8.3.2
> 



Re: [patch 9/9] mm: keep page cache radix tree nodes in check

2014-01-14 Thread Bob Liu
Hi Johannes,

On 01/11/2014 02:10 AM, Johannes Weiner wrote:
> Previously, page cache radix tree nodes were freed after reclaim
> emptied out their page pointers.  But now reclaim stores shadow
> entries in their place, which are only reclaimed when the inodes
> themselves are reclaimed.  This is problematic for bigger files that
> are still in use after they have a significant amount of their cache
> reclaimed, without any of those pages actually refaulting.  The shadow
> entries will just sit there and waste memory.  In the worst case, the
> shadow entries will accumulate until the machine runs out of memory.
> 

I have one more question. It seems that other algorithms only remember
history information for a limited number of evicted pages, where the number
is usually the same as the total cache or memory size.
But in your patch, I didn't see a preferred value for how many evicted
pages' history information should be recorded. Does it all depend on the
workingset_shadow_shrinker?

Thanks,
-Bob


Re: [PATCH 1/3] perf tools: Do proper comm override error handling

2014-01-14 Thread Namhyung Kim
Hi Frederic,

On Tue, 14 Jan 2014 16:37:14 +0100, Frederic Weisbecker wrote:
> The comm overriding API ignores memory allocation failures by silently
> keeping the previous and out of date comm.
>
> As a result, the user may get buggy events without ever being notified
> about the problem and its source.
>
> Lets start to fix this by propagating the error from the API. Not all
> callers may be doing proper error handling on comm set yet but this
> is the first step toward it.

Acked-by: Namhyung Kim 

Thanks,
Namhyung


Re: [PATCH] Documentation / CPU hotplug: Fix the typo in example code

2014-01-14 Thread Srivatsa S. Bhat
On 01/15/2014 10:40 AM, Sangjung Woo wrote:
> As the notifier_block name (i.e. foobar_cpu_notifer) is different from
> the parameter (i.e. foobar_cpu_notifier) of the register function, this is
> definitely an error and it also confuses readers.
> 
> Signed-off-by: Sangjung Woo 

Reviewed-by: Srivatsa S. Bhat 

Regards,
Srivatsa S. Bhat

> ---
>  Documentation/cpu-hotplug.txt |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
> index 8cb9938..be675d2 100644
> --- a/Documentation/cpu-hotplug.txt
> +++ b/Documentation/cpu-hotplug.txt
> @@ -285,7 +285,7 @@ A: This is what you would need in your kernel code to 
> receive notifications.
>   return NOTIFY_OK;
>   }
> 
> - static struct notifier_block foobar_cpu_notifer =
> + static struct notifier_block foobar_cpu_notifier =
>   {
>  .notifier_call = foobar_cpu_callback,
>   };
> 



Re: [PATCH] perf tools: Synthesize anon MMAP records on the heap

2014-01-14 Thread Namhyung Kim
Hi Gaurav,

I'd like to take my ack back - it seems I missed some points.


On Tue, 14 Jan 2014 20:48:23 +, Gaurav Jain wrote:
> On 1/13/14, 11:54 AM, "Don Zickus"  wrote:
>
>>On Sat, Jan 11, 2014 at 08:32:14PM -0800, Gaurav Jain wrote:
>>> Anon records usually do not have the 'execname' entry. However if they
>>>are on
>>> the heap, the execname shows up as '[heap]'. The fix considers any
>>>executable
>>> entries in the map that do not have a name or are on the heap as anon
>>>records
>>> and sets the name to '//anon'.
>>> 
>>> This fixes JIT profiling for records on the heap.
>>
>>I guess I don't understand the need for this fix.  It seems breaking out
>>//anon vs. [heap] would be useful.  Your patch is saying otherwise.  Can
>>give a description of the problem you are trying to solve?
>
> Thank you for looking at the patch.
>
> We generate a perf map file which includes certain JIT'ed functions that
> show up as [heap] entries. As a result, I included the executable heap
> entries as anon pages so that it would be handled in
> util/map.c:map__new(). The alternative would be to handle heap entries in
> map__new() directly, however I wasn't sure if this would break something
> as it seems that heap and stack entries are expected to fail all
> map__find_* functions. Thus I considered executable heap entries as
> //anon, but perhaps there is a better way.

Hmm.. so the point is that an executable heap mapping should have
/tmp/perf-XXX.map as a file name, right?  If so, does something like
below work well for you?

Thanks,
Namhyung


diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 9b9bd719aa19..d52387fe83f1 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -69,7 +69,7 @@ struct map *map__new(struct list_head *dsos__list, u64 start, 
u64 len,
map->ino = ino;
map->ino_generation = ino_gen;
 
-   if (anon) {
+   if (anon || (no_dso && type == MAP__FUNCTION)) {
snprintf(newfilename, sizeof(newfilename), 
"/tmp/perf-%d.map", pid);
filename = newfilename;
}
@@ -93,7 +93,7 @@ struct map *map__new(struct list_head *dsos__list, u64 start, 
u64 len,
 * functions still return NULL, and we avoid the
 * unnecessary map__load warning.
 */
-   if (no_dso)
+   if (no_dso && type != MAP__FUNCTION)
dso__set_loaded(dso, map->type);
}
}


Re: [PATCH] mm/zswap: add writethrough option

2014-01-14 Thread Minchan Kim
Hello,

On Tue, Jan 14, 2014 at 10:10:44AM -0500, Dan Streetman wrote:
> On Mon, Jan 13, 2014 at 7:11 PM, Minchan Kim  wrote:
> > Hello Dan,
> >
> > Sorry for the late response and I didn't look at the code yet
> > because I am not convinced. :(
> >
> > On Thu, Dec 19, 2013 at 08:23:27AM -0500, Dan Streetman wrote:
> >> Currently, zswap is writeback cache; stored pages are not sent
> >> to swap disk, and when zswap wants to evict old pages it must
> >> first write them back to swap cache/disk manually.  This avoids
> >> swap out disk I/O up front, but only moves that disk I/O to
> >> the writeback case (for pages that are evicted), and adds the
> >> overhead of having to uncompress the evicted pages and the
> >> need for an additional free page (to store the uncompressed page).
> >>
> >> This optionally changes zswap to writethrough cache by enabling
> >> frontswap_writethrough() before registering, so that any
> >> successful page store will also be written to swap disk.  The
> >> default remains writeback.  To enable writethrough, the param
> >> zswap.writethrough=1 must be used at boot.
> >>
> >> Whether writeback or writethrough will provide better performance
> >> depends on many factors including disk I/O speed/throughput,
> >> CPU speed(s), system load, etc.  In most cases it is likely
> >> that writeback has better performance than writethrough before
> >> zswap is full, but after zswap fills up writethrough has
> >> better performance than writeback.
> >
> > So you claim we should use writeback by default but writethrough
> > after the memory limit is full?
> > But it would break LRU ordering, and I think a better idea is to
> > handle it in a more generic way rather than changing the entire policy.
> 
> This patch only adds the option of using writethrough.  That's all.

The point is: please explain what your problem is now
and show that adding a new option is the best way to solve it.
Just "optionally, having it is better" is not a good approach to merge/maintain.

> 
> > Now, zswap evicts just *a* page rather than a bunch of pages,
> > so it stalls every store if many swap writes happen continuously.
> > It's not efficient, so how about adding kswapd's threshold concept
> > of min/low/high watermarks? It could then evict early, before the zswap
> > memory pool is full, and stop when it reaches the high watermark.
> > I guess that could be better than now.
> 
> Well, I don't think that's related to this patch, but certainly a good idea to
> investigate.

The reason I suggested it is that I feel, from your description, that wb is just
slower than wt since zswap memory is a pool.

> 
> >
> > Other point: as I read device-mapper/cache.txt, the cache operating mode
> > already supports writethrough. It means zRAM can support
> > writeback/writethrough with dm-cache.
> > Have you tried it? Is there any problem?
> 
> zswap isn't a block device though, so that doesn't apply (unless I'm
> missing something).

zram is a block device, so you can freely use it as a swap block device,
and binding it with dm-cache will give you what you want.
The whole point is that we could do what you want without adding anything new,
so I hope you can show what the problem with the existing solution is, so that
we can judge and try to solve the pain point with a more ideal
approach.

> 
> >
> > Actually, I really don't know how much benefit we have with in-memory
> > swap overcoming the real storage, but if you want, zRAM with dm-cache
> > is another option rather than inventing a new wheel because "just having it is better".
> 
> I'm not sure if this patch is related to the zswap vs. zram discussions.  This
> only adds the option of using writethrough to zswap.  It's a first
> step to possibly
> making zswap work more efficiently using writeback and/or writethrough
> depending on
> the system and conditions.

The patch size is small. Okay, I don't want to be a party-pooper,
but at least I should state my thoughts to help Andrew judge.

> 
> >
> > Thanks.
> >
> >>
> >> Signed-off-by: Dan Streetman 
> >>
> >> ---
> >>
> >> Based on specjbb testing on my laptop, the results for both writeback
> >> and writethrough are better than not using zswap at all, but writeback
> >> does seem to be better than writethrough while zswap isn't full.  Once
> >> it fills up, performance for writethrough is essentially close to not
> >> using zswap, while writeback seems to be worse than not using zswap.
> >> However, I think more testing on a wider span of systems and conditions
> >> is needed.  Additionally, I'm not sure that specjbb is measuring true
> >> performance under fully loaded cpu conditions, so additional cpu load
> >> might need to be added or specjbb parameters modified (I took the
> >> values from the 4 "warehouses" test run).
> >>
> >> In any case though, I think having writethrough as an option is still
> >> useful.  More changes could be made, such as changing from writeback
> >> to writethrough based on the zswap % full.  And the patch doesn't
> >> change default behavior - writethrough must be specifically 
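
For reference, a minimal sketch of the registration-time switch the patch
description above talks about. The zswap_writethrough flag name is taken from
the description; frontswap_writethrough() and frontswap_register_ops() are the
existing frontswap hooks this would use, but the snippet is only an
illustration, not the posted patch:

#include <linux/module.h>
#include <linux/frontswap.h>

static bool zswap_writethrough;	/* set by the zswap.writethrough=1 boot param */
module_param_named(writethrough, zswap_writethrough, bool, 0444);

static int __init zswap_init(void)
{
	/* ... pool and compressor setup elided ... */
	if (zswap_writethrough)
		frontswap_writethrough(true);	/* also write stored pages to swap */
	frontswap_register_ops(&zswap_frontswap_ops);	/* zswap's existing ops */
	return 0;
}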

[PATCH REGRESSION FIX] x86 idle: restore mwait_idle()

2014-01-14 Thread Len Brown
From: Len Brown 

In Linux-3.9 we removed the mwait_idle() loop:
'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
(69fb3676df3329a7142803bb3502fa59dc0db2e3)

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT loop,
until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
   MWAIT-C1 is preferred for optimal power and performance.
   But if they support just C1, cpuidle never loads and
   so they use the boot-time default idle loop forever.

2. Some laptops will boot-hang if HALT is used,
   but will boot successfully if MWAIT is used.
   This appears to be a hidden assumption in BIOS SMI,
   that is presumably valid on the proprietary OS
   where the BIOS was validated.

   https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop.  However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:
For 3.9, simply revert 69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly

Cc: Mike Galbraith 
Cc: Ian Malone 
Cc: Josh Boyer 
Cc:  # 3.9, 3.10, 3.11, 3.12, 3.13
Signed-off-by: Len Brown 
---
 arch/x86/kernel/process.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95..db471a8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -398,6 +398,49 @@ static void amd_e400_idle(void)
default_idle();
 }
 
+/*
+ * Intel Core2 and older machines prefer MWAIT over HALT for C1.
+ * We can't rely on cpuidle installing MWAIT, because it will not load
+ * on systems that support only C1 -- so the boot default must be MWAIT.
+ *  
+ * Some AMD machines are the opposite, they depend on using HALT.
+ *
+ * So for default C1, which is used during boot until cpuidle loads,
+ * use MWAIT-C1 on Intel HW that has it, else use HALT.
+ */
+static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
+{
+   if (c->x86_vendor != X86_VENDOR_INTEL)
+   return 0;
+
+   if (!cpu_has(c, X86_FEATURE_MWAIT))
+   return 0;
+
+   return 1;
+}
+
+/*
+ * MONITOR/MWAIT with no hints, used for the default C1 state.
+ * This invokes MWAIT with interrupts enabled and no flags,
+ * which is backwards compatible with the original MWAIT implementation.
+ */
+
+static void mwait_idle(void)
+{
+   if (!need_resched()) {
+   if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
+   clflush((void *)&current_thread_info()->flags);
+
+   __monitor((void *)&current_thread_info()->flags, 0, 0);
+   smp_mb();
+   if (!need_resched())
+   __sti_mwait(0, 0);
+   else
+   local_irq_enable();
+   } else
+   local_irq_enable();
+}
+
 void select_idle_routine(const struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
@@ -411,6 +454,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
/* E400: APIC timer interrupt does not wake up CPU from C1e */
pr_info("using AMD E400 aware idle routine\n");
x86_idle = amd_e400_idle;
+   } else if (prefer_mwait_c1_over_halt(c)) {
+   pr_info("using mwait in idle threads\n");
+   x86_idle = mwait_idle;
} else
x86_idle = default_idle;
 }
-- 
1.8.5.2.309.ga25014b



Re: [PATCH v2] clk: sirf: re-arch to make the codes support both prima2 and atlas6

2014-01-14 Thread Mike Turquette
Quoting Barry Song (2014-01-05 21:38:19)
> diff --git a/drivers/clk/sirf/clk-atlas6.c b/drivers/clk/sirf/clk-atlas6.c
> new file mode 100644
> index 000..21e776a
> --- /dev/null
> +++ b/drivers/clk/sirf/clk-atlas6.c
> @@ -0,0 +1,153 @@
> +/*
> + * Clock tree for CSR SiRFatlasVI
> + *
> + * Copyright (c) 2011 Cambridge Silicon Radio Limited, a CSR plc group 
> company.
> + *
> + * Licensed under GPLv2 or later.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Please do not use clk-private.h. It is slated for removal (some day...).
Do you actually need it?

Regards,
Mike


Re: [RFC PATCH] sched: find the latest idle cpu

2014-01-14 Thread Michael wang
On 01/15/2014 12:07 PM, Alex Shi wrote:
> Currently we just try to find the least loaded cpu. If some cpus are idle,
> we just pick the first cpu in the cpu mask.
> 
> In fact we can get the interrupted idle cpu or the latest idled cpu,
> then we may benefit in both latency and power.
> The selected cpu may not be the best, since another cpu may be interrupted
> during our selection. But being too cautious costs too much.

So the idea here is that we want to choose the latest idle cpu if we have
multiple idle cpus to choose from, correct?

And I guess that was in order to avoid choosing a tickless cpu while there
is an un-tickless idle one, is that right?

What confuses me is: what about those cpus which are just about to recover from
tickless, as you mentioned? That means the latest idle cpu doesn't have to be
the best choice, and could even be the worst (if there are just two choices,
and the longer tickless one is just about to recover while the latest one is
about to go tickless).

So what about just checking 'ts->tick_stopped' and recording one ticking idle
cpu? The cost could be lower than a time comparison, and we could reduce the
risk maybe... (well, not so risky, since the logic only works when the
system is relaxed with several cpus idle)
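
A rough sketch of that alternative, for illustration only: it is untested and
hypothetical, and simply reuses the per-cpu tick_cpu_sched access pattern of
the RFC patch above, inside find_idlest_cpu()'s loop.

#ifdef CONFIG_NO_HZ_COMMON
		/*
		 * Prefer an idle cpu whose tick is still running, instead of
		 * comparing idle entry times.
		 */
		if (!load) {
			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);

			if (!ts->tick_stopped)
				idlest = i;
		}
#endif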

Regards,
Michael Wang

> 
> Signed-off-by: Alex Shi 
> ---
>  kernel/sched/fair.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c7395d9..fb52d26 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4167,6 +4167,26 @@ find_idlest_cpu(struct sched_group *group, struct 
> task_struct *p, int this_cpu)
>   min_load = load;
>   idlest = i;
>   }
> +#ifdef CONFIG_NO_HZ_COMMON
> + /*
> +  * Coarsely to get the latest idle cpu for shorter latency and
> +  * possible power benefit.
> +  */
> + if (!min_load) {
> + struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
> +
> + s64 latest_wake = 0;
> + /* idle cpu doing irq */
> + if (ts->inidle && !ts->idle_active)
> + idlest = i;
> + /* the cpu resched */
> + else if (!ts->inidle)
> + idlest = i;
> + /* find latest idle cpu */
> + else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
> + idlest = i;
> + }
> +#endif
>   }
> 
>   return idlest;
> 



Re: linux-next: manual merge of the md tree with the block tree

2014-01-14 Thread NeilBrown
On Wed, 15 Jan 2014 15:07:43 +1100 Stephen Rothwell 
wrote:

> Hi Neil,
> 
> Today's linux-next merge of the md tree got a conflict in
> drivers/md/raid10.c between commit 4f024f3797c4 ("block: Abstract out
> bvec iterator") from the block tree and commit b50c259e25d9 ("md/raid10:
> fix two bugs in handling of known-bad-blocks") from the md tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).
> 

Thanks Stephen.
Those md fixes are now on their way to Linus (I'm hoping for 3.13 inclusion,
I might be lucky).  Good that they are easy to resolve!

Thanks,
NeilBrown




[PATCH tip/core/timers 3/4] timers: Reduce future __run_timers() latency for newly emptied list

2014-01-14 Thread Paul E. McKenney
From: "Paul E. McKenney" 

The __run_timers() function currently steps through the list one jiffy at
a time in order to update the timer wheel.  However, if the timer wheel
is empty, no adjustment is needed other than updating ->timer_jiffies.
Therefore, if we just emptied the timer wheel, for example, by deleting
the last timer, we should mark the timer wheel as being up to date.
This marking will reduce (and perhaps eliminate) the jiffy-stepping that
a future __run_timers() call will need to do in response to some future
timer posting or migration.  This commit therefore catches up ->timer_jiffies
for this case.

Signed-off-by: Paul E. McKenney 
---
 kernel/timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index 295837e5e011..bdd1c00ec4ec 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -700,6 +700,7 @@ static int detach_if_pending(struct timer_list *timer, 
struct tvec_base *base,
base->next_timer = base->timer_jiffies;
}
base->all_timers--;
+   (void)catchup_timer_jiffies(base);
return 1;
 }
 
-- 
1.8.1.5



[PATCH tip/core/timers 2/4] timers: Reduce __run_timers() latency for empty list

2014-01-14 Thread Paul E. McKenney
From: "Paul E. McKenney" 

The __run_timers() function currently steps through the list one jiffy at
a time in order to update the timer wheel.  However, if the timer wheel
is empty, no adjustment is needed other than updating ->timer_jiffies.
In this case, which is likely to be common for NO_HZ_FULL kernels, the
kernel currently incurs a large latency for no good reason.  This commit
therefore short-circuits this case.

Signed-off-by: Paul E. McKenney 
---
 kernel/timer.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index 2245b7374c3d..295837e5e011 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -338,6 +338,17 @@ void set_timer_slack(struct timer_list *timer, int 
slack_hz)
 }
 EXPORT_SYMBOL_GPL(set_timer_slack);
 
+static bool catchup_timer_jiffies(struct tvec_base *base)
+{
+#ifdef CONFIG_NO_HZ_FULL
+   if (!base->all_timers) {
+   base->timer_jiffies = jiffies;
+   return 1;
+   }
+#endif /* #ifdef CONFIG_NO_HZ_FULL */
+   return 0;
+}
+
 static void
 __internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 {
@@ -1150,6 +1161,10 @@ static inline void __run_timers(struct tvec_base *base)
struct timer_list *timer;
 
spin_lock_irq(&base->lock);
+   if (catchup_timer_jiffies(base)) {
+   spin_unlock_irq(&base->lock);
+   return;
+   }
while (time_after_eq(jiffies, base->timer_jiffies)) {
struct list_head work_list;
struct list_head *head = &work_list;
-- 
1.8.1.5



[PATCH tip/core/timers 1/4] timers: Track total number of timers in list

2014-01-14 Thread Paul E. McKenney
From: "Paul E. McKenney" 

Currently, the tvec_base structure's ->active_timers field tracks only
the non-deferrable timers, which means that even if ->active_timers is
zero, there might well be non-deferrable timers in the list.  This commit
therefore adds an ->all_timers field to track all the timers, whether
deferrable or not.

Signed-off-by: Paul E. McKenney 
---
 kernel/timer.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index 6582b82fa966..2245b7374c3d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -81,6 +81,7 @@ struct tvec_base {
unsigned long timer_jiffies;
unsigned long next_timer;
unsigned long active_timers;
+   unsigned long all_timers;
struct tvec_root tv1;
struct tvec tv2;
struct tvec tv3;
@@ -392,6 +393,7 @@ static void internal_add_timer(struct tvec_base *base, 
struct timer_list *timer)
base->next_timer = timer->expires;
base->active_timers++;
}
+   base->all_timers++;
 }
 
 #ifdef CONFIG_TIMER_STATS
@@ -671,6 +673,7 @@ detach_expired_timer(struct timer_list *timer, struct 
tvec_base *base)
detach_timer(timer, true);
if (!tbase_get_deferrable(timer->base))
base->active_timers--;
+   base->all_timers--;
 }
 
 static int detach_if_pending(struct timer_list *timer, struct tvec_base *base,
@@ -685,6 +688,7 @@ static int detach_if_pending(struct timer_list *timer, 
struct tvec_base *base,
if (timer->expires == base->next_timer)
base->next_timer = base->timer_jiffies;
}
+   base->all_timers--;
return 1;
 }
 
@@ -1560,6 +1564,7 @@ static int init_timers_cpu(int cpu)
base->timer_jiffies = jiffies;
base->next_timer = base->timer_jiffies;
base->active_timers = 0;
+   base->all_timers = 0;
return 0;
 }
 
-- 
1.8.1.5



[PATCH tip/core/timers 4/4] timers: Reduce future __run_timers() latency for first add to empty list

2014-01-14 Thread Paul E. McKenney
From: "Paul E. McKenney" 

The __run_timers() function currently steps through the list one jiffy at
a time in order to update the timer wheel.  However, if the timer wheel
is empty, no adjustment is needed other than updating ->timer_jiffies.
Therefore, just before we add a timer to an empty timer wheel, we should
mark the timer wheel as being up to date.  This marking will reduce (and
perhaps eliminate) the jiffy-stepping that a future __run_timers() call
will need to do in response to some future timer posting or migration.
This commit therefore updates ->timer_jiffies for this case.

Signed-off-by: Paul E. McKenney 
---
 kernel/timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/timer.c b/kernel/timer.c
index bdd1c00ec4ec..b49d2d0e879e 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -749,6 +749,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 
base = lock_timer_base(timer, &flags);
 
+   (void)catchup_timer_jiffies(base);
ret = detach_if_pending(timer, base, false);
if (!ret && pending_only)
goto out_unlock;
-- 
1.8.1.5



[GIT PULL REQUEST] last minute md fixes for 3.13

2014-01-14 Thread NeilBrown

Sorry they are late; Christmas holidays and all that.  Hopefully they can
still squeak into 3.13.

NeilBrown


The following changes since commit 6d183de4077191d1201283a9035ce57a9b05254d:

  md/raid5: fix newly-broken locking in get_active_stripe. (2013-11-28 11:00:15 
+1100)

are available in the git repository at:

  git://neil.brown.name/md tags/md/3.13-fixes

for you to fetch changes up to 8313b8e57f55b15e5b7f7fc5d1630bbf686a9a97:

  md: fix problem when adding device to read-only array with bitmap. 
(2014-01-14 16:44:08 +1100)


md: half a dozen bug fixes for 3.13

All of these fix real bugs that people have hit, and are tagged
for -stable.


NeilBrown (6):
  md/raid5: Fix possible confusion when multiple write errors occur.
  md/raid10: fix two bugs in handling of known-bad-blocks.
  md/raid1: fix request counting bug in new 'barrier' code.
  md/raid5: fix a recently broken BUG_ON().
  md/raid10: fix bug when raid10 recovery fails to recover a block.
  md: fix problem when adding device to read-only array with bitmap.

 drivers/md/md.c | 18 +++---
 drivers/md/md.h |  3 +++
 drivers/md/raid1.c  |  3 +--
 drivers/md/raid10.c | 12 ++--
 drivers/md/raid5.c  |  7 ---
 5 files changed, 29 insertions(+), 14 deletions(-)




[PATCH v2 tip/core/timers] Crude timer-wheel latency hacks

2014-01-14 Thread Paul E. McKenney
Hello!

The following three patches provide some crude timer-wheel latency
patches.  I understand that a more comprehensive solution is in progress,
but in the meantime, these patches work well in cases where a given
CPU has either zero or one timers pending, which is a common case for
NO_HZ_FULL kernels.  So, on the off-chance that this is helpful to
someone, the individual patches are as follows:

1.  Add ->all_timers field to tbase_vec to count all timers, not
just the non-deferrable ones.

2.  Avoid jiffy-at-a-time stepping when the timer wheel is empty.

3.  Avoid jiffy-at-a-time stepping when the timer wheel transitions
to empty.

4.  Avoid jiffy-at-a-time stepping after a timer is added to an
initially empty timer wheel.

Differences from v1:

o   Fix an embarrassing bug located by Oleg Nesterov where the
timer wheel could be judged to be empty even if it contained
deferrable timers.

Thanx, Paul



 b/kernel/timer.c |   22 ++
 1 file changed, 22 insertions(+)



[git pull] drm last fixes

2014-01-14 Thread Dave Airlie

Hi Linus,

One nouveau regression fix on older cards, i915 black screen fixes, and a 
revert for a strange G33 intel problem.

Dave.

The following changes since commit 7e22e91102c6b9df7c4ae2168910e19d2bb14cd6:

  Linux 3.13-rc8 (2014-01-12 17:04:18 +0700)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 703a8c2dfa5aa69b9b0a6684dc78ea28a2c7fe3e:

  Merge branch 'drm-nouveau-next' of 
git://git.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes (2014-01-15 
15:01:11 +1000)



Ben Skeggs (1):
  drm/nouveau: fix null ptr dereferences on some boards

Dave Airlie (3):
  Merge tag 'drm-intel-fixes-2014-01-13' of 
git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
  Revert "drm: copy mode type in drm_mode_connector_list_update()"
  Merge branch 'drm-nouveau-next' of 
git://git.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

Jesse Barnes (1):
  drm/i915/bdw: make sure south port interrupts are enabled properly v2

Paulo Zanoni (1):
  drm/i915: fix DDI PLLs HW state readout code

Ville Syrjälä (1):
  drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()

 drivers/gpu/drm/drm_modes.c   |  2 +-
 drivers/gpu/drm/i915/i915_irq.c   |  2 ++
 drivers/gpu/drm/i915/intel_ddi.c  |  8 +++-
 drivers/gpu/drm/i915/intel_display.c  |  4 ++--
 drivers/gpu/drm/nouveau/core/include/subdev/i2c.h |  2 +-
 drivers/gpu/drm/nouveau/core/include/subdev/instmem.h |  7 +++
 drivers/gpu/drm/nouveau/core/subdev/i2c/base.c|  4 ++--
 drivers/gpu/drm/nouveau/core/subdev/therm/ic.c| 10 +-
 drivers/gpu/drm/nouveau/dispnv04/dfp.c|  2 +-
 drivers/gpu/drm/nouveau/dispnv04/tvnv04.c |  2 +-
 10 files changed, 29 insertions(+), 14 deletions(-)

Re: [PATCH] mm/zswap: Check all pool pages instead of one pool pages

2014-01-14 Thread Minchan Kim
On Tue, Jan 14, 2014 at 02:15:44PM +0800, Weijie Yang wrote:
> On Tue, Jan 14, 2014 at 1:42 PM, Bob Liu  wrote:
> >
> > On 01/14/2014 01:05 PM, Minchan Kim wrote:
> >> On Tue, Jan 14, 2014 at 01:50:22PM +0900, Minchan Kim wrote:
> >>> Hello Bob,
> >>>
> >>> On Tue, Jan 14, 2014 at 09:19:23AM +0800, Bob Liu wrote:
> 
>  On 01/14/2014 07:35 AM, Minchan Kim wrote:
> > Hello,
> >
> > On Sat, Jan 11, 2014 at 03:43:07PM +0800, Cai Liu wrote:
> >> zswap can support multiple swapfiles. So we need to check
> >> all zbud pool pages in zswap.
> >
> > True but this patch is rather costly that we should iterate
> > zswap_tree[MAX_SWAPFILES] to check it. SIGH.
> >
> > How about defining zswap_tress as linked list instead of static
> > array? Then, we could reduce unnecessary iteration too much.
> >
> 
>  But if use linked list, it might not easy to access the tree like this:
>  struct zswap_tree *tree = zswap_trees[type];
> >>>
> >>> struct zswap_tree {
> >>> ..
> >>> ..
> >>> struct list_head list;
> >>> }
> >>>
> >>> zswap_frontswap_init()
> >>> {
> >>> ..
> >>> ..
> >>> zswap_trees[type] = tree;
> >>> list_add(&tree->list, &zswap_list);
> >>> }
> >>>
> >>> get_zswap_pool_pages(void)
> >>> {
> >>> struct zswap_tree *cur;
> >>> list_for_each_entry(cur, &zswap_list, list) {
> >>> pool_pages += zbud_get_pool_size(cur->pool);
> >>> }
> >>> return pool_pages;
> >>> }
> >
> > Okay, I see your point. Yes, it's much better.
> > Cai, please make a new patch.
> 
> This improved patch could greatly reduce unnecessary iteration.
> 
> But I still have a question: why do we need so many zbud pools?
> How about using only one global zbud pool for all zswap_trees?
> I have not tested it, but I think it can improve the store density.

Just a quick glance,

I don't know how popular multiple-swapfile configurations are.
With your approach, what kind of changes would we need in
frontswap_invalidate_area?
Would you encode the *type* into the offset of the entry?
Then we would always have to decode it for every search operation.
We would lose speed but gain density (not sure, since it depends on the
workload)
for a rare configuration (i.e., multiple swapfiles) and a rare event (i.e., swapoff).
It's just a question that popped up, not a strong objection.
Anyway, the point is that you can try it if you want and then report the numbers. :)
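
For what it's worth, a sketch of the kind of key encoding being asked about
above. It is purely illustrative, not from any posted patch, and the shift
width and helper name are arbitrary:

/*
 * With a single global pool, each stored entry's lookup key would have to
 * carry the swap type as well as the page offset, e.g.:
 */
static inline u64 zswap_entry_key(unsigned int type, pgoff_t offset)
{
	return ((u64)type << 58) | (u64)offset;
}

Every lookup (including the invalidate paths) would then have to build the
same composite key, which is the decode cost mentioned above.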

Thanks.

> 
> Just for your reference, Thanks!
> 
-- 
Kind regards,
Minchan Kim


[PATCH v5] Staging: comedi: convert while loop to timeout in ni_mio_common.c

2014-01-14 Thread Chase Southwood
This patch for ni_mio_common.c changes out a while loop for a timeout,
which is preferred.

Signed-off-by: Chase Southwood 
---

All right, I think this guy's ready to go now!  Thanks for all the help!
Chase

2: Changed from simple clean-up to swapping a timeout in for a while loop.

3: Removed extra counter variable, and added error checking.

4: No longer using counter variable, using jiffies instead.

5: udelay for 10u, instead of 1u.

 drivers/staging/comedi/drivers/ni_mio_common.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/ni_mio_common.c 
b/drivers/staging/comedi/drivers/ni_mio_common.c
index 457b884..07e9777 100644
--- a/drivers/staging/comedi/drivers/ni_mio_common.c
+++ b/drivers/staging/comedi/drivers/ni_mio_common.c
@@ -687,12 +687,21 @@ static void ni_clear_ai_fifo(struct comedi_device *dev)
 {
const struct ni_board_struct *board = comedi_board(dev);
struct ni_private *devpriv = dev->private;
+   unsigned long timeout;
 
if (board->reg_type == ni_reg_6143) {
/*  Flush the 6143 data FIFO */
ni_writel(0x10, AIFIFO_Control_6143);   /*  Flush fifo */
ni_writel(0x00, AIFIFO_Control_6143);   /*  Flush fifo */
-   while (ni_readl(AIFIFO_Status_6143) & 0x10) ;   /*  Wait for 
complete */
+   /*  Wait for complete */
+   timeout = jiffies + msecs_to_jiffies(500);
+   while (ni_readl(AIFIFO_Status_6143) & 0x10) {
+   if (time_after(jiffies, timeout)) {
+   comedi_error(dev, "FIFO flush timeout.");
+   break;
+   }
+   udelay(10);
+   }
} else {
devpriv->stc_writew(dev, 1, ADC_FIFO_Clear);
if (board->reg_type == ni_reg_625x) {
-- 
1.8.4.2





Re: [PATCH v7 00/15] PCI: Allocate 64-bit BARs above 4G when possible

2014-01-14 Thread Dave Airlie
> 
> I added Daniel's Reviewed-by to the AGP patches (except the trivial
> PCI_COMMAND change in ati_configure()).
> 
> I added the incremental patch below to fix these warnings found by
> Fengguang's autobuilder in the original b1e0e392f5dd commit:
> 
>   drivers/char/agp/amd-k7-agp.c:115:38: warning: 'addr' may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
>   drivers/pci/bus.c:105:5: warning: large integer implicitly truncated to 
> unsigned type [-Woverflow]
> 
> Finally, I merged the pci/resource branch with these changes into my "next"
> branch, so it should appear in v3.14-rc1.
> 
> Dave, let me know if you have any issue with these AGP changes going
> through my tree.

None, all fine by me.

Acked-by: Dave Airlie 

Dave.
> 
> Bjorn
> 
> 
> diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c
> index e8c2e9167e89..3661a51e93e2 100644
> --- a/drivers/char/agp/amd-k7-agp.c
> +++ b/drivers/char/agp/amd-k7-agp.c
> @@ -148,8 +148,8 @@ static int amd_create_gatt_table(struct agp_bridge_data 
> *bridge)
>* used to program the agp master not the cpu
>*/
>  
> - agp_bridge->gart_bus_addr = pci_bus_address(agp_bridge->dev,
> - AGP_APERTURE_BAR);
> + addr = pci_bus_address(agp_bridge->dev, AGP_APERTURE_BAR);
> + agp_bridge->gart_bus_addr = addr;
>  
>   /* Calculate the agp offset */
>   for (i = 0; i < value->num_entries / 1024; i++, addr += 0x00400000) {
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 107ad9a5b8aa..86fb8ec5e448 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -99,10 +99,12 @@ void pci_bus_remove_resources(struct pci_bus *bus)
>  }
>  
>  static struct pci_bus_region pci_32_bit = {0, 0xffffffffULL};
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>  static struct pci_bus_region pci_64_bit = {0,
> - (resource_size_t) 0xffffffffffffffffULL};
> -static struct pci_bus_region pci_high = {(resource_size_t) 0x100000000ULL,
> - (resource_size_t) 0xffffffffffffffffULL};
> + (dma_addr_t) 0xffffffffffffffffULL};
> +static struct pci_bus_region pci_high = {(dma_addr_t) 0x100000000ULL,
> + (dma_addr_t) 0xffffffffffffffffULL};
> +#endif
>  
>  /*
>   * @res contains CPU addresses.  Clip it so the corresponding bus addresses
> @@ -207,6 +209,7 @@ int pci_bus_alloc_resource(struct pci_bus *bus, struct 
> resource *res,
> resource_size_t),
>   void *alignf_data)
>  {
> +#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
>   int rc;
>  
>   if (res->flags & IORESOURCE_MEM_64) {
> @@ -220,6 +223,7 @@ int pci_bus_alloc_resource(struct pci_bus *bus, struct 
> resource *res,
>type_mask, alignf, alignf_data,
>_64_bit);
>   }
> +#endif
>  
>   return pci_bus_alloc_from_region(bus, res, size, align, min,
>type_mask, alignf, alignf_data,
> 
> 


[PATCH] Documentation / CPU hotplug: Fix the typo in example code

2014-01-14 Thread Sangjung Woo
As the notifier_block name (i.e. foobar_cpu_notifer) is different from
the parameter (i.e. foobar_cpu_notifier) of the register function, this is
definitely an error and it also confuses readers.

Signed-off-by: Sangjung Woo 
---
 Documentation/cpu-hotplug.txt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 8cb9938..be675d2 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -285,7 +285,7 @@ A: This is what you would need in your kernel code to 
receive notifications.
return NOTIFY_OK;
}
 
-   static struct notifier_block foobar_cpu_notifer =
+   static struct notifier_block foobar_cpu_notifier =
{
   .notifier_call = foobar_cpu_callback,
};
-- 
1.7.9.5
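
For context, the name has to match because of the registration call a few
lines further down in that same documentation example, something along these
lines (quoted from memory, so treat it as illustrative):

	register_cpu_notifier(&foobar_cpu_notifier);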



linux-next: unsigned commits in the mips tree

2014-01-14 Thread Stephen Rothwell
Hi Ralf,

I noticed that there is a whole series of commits in today's mips tree
that were committed by John Crispin but have no Signed-off-by from him ...

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH v3 5/5] slab: make more slab management structure off the slab

2014-01-14 Thread David Rientjes
On Mon, 2 Dec 2013, Joonsoo Kim wrote:

> Now, the size of the freelist for the slab management diminish,
> so that the on-slab management structure can waste large space
> if the object of the slab is large.
> 
> Consider a 128 byte sized slab. If on-slab is used, 31 objects can be
> in the slab. The size of the freelist for this case would be 31 bytes
> so that 97 bytes, that is, more than 75% of object size, are wasted.
> 
> In a 64 byte sized slab case, no space is wasted if we use on-slab.
> So set off-slab determining constraint to 128 bytes.
> 
> Acked-by: Christoph Lameter 
> Signed-off-by: Joonsoo Kim 

Acked-by: David Rientjes 


[PATCH/RFC] perf ui/tui: Show column header in hist browser

2014-01-14 Thread Namhyung Kim
Add a line for showing column headers like --stdio.

Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/browser.c|  4 +--
 tools/perf/ui/browsers/hists.c | 55 ++
 2 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index d11541d4d7d7..441470ba8d2c 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -278,8 +278,8 @@ void ui_browser__hide(struct ui_browser *browser)
 static void ui_browser__scrollbar_set(struct ui_browser *browser)
 {
int height = browser->height, h = 0, pct = 0,
-   col = browser->width,
-   row = browser->y - 1;
+   col = browser->width, row = 0;
+
 
if (browser->nr_entries > 1) {
pct = ((browser->index * (browser->height - 1)) /
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index e0ab399d431d..5b94de48f23c 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -25,6 +25,7 @@ struct hist_browser {
struct map_symbol   *selection;
int  print_seq;
bool show_dso;
+   bool show_column_header;
floatmin_pcnt;
u64  nr_pcnt_entries;
 };
@@ -312,6 +313,57 @@ static void ui_browser__warn_lost_events(struct ui_browser 
*browser)
 
 static void hist_browser__update_pcnt_entries(struct hist_browser *hb);
 
+static void hist_browser__column_header(struct hist_browser *browser)
+{
+   struct perf_hpp_fmt *fmt;
+   struct sort_entry *se;
+   struct hists *hists = browser->hists;
+   bool show = browser->show_column_header;
+   unsigned int width;
+
+   pthread_mutex_lock(&ui__lock);
+
+   if (!show)
+   goto out;
+
+   ui_browser__gotorc(&browser->b, 0, 0);
+
+   perf_hpp__for_each_format(fmt) {
+   char buf[128];
+   struct perf_hpp hpp = {
+   .buf= buf,
+   .size   = sizeof(buf),
+   .ptr= hists_to_evsel(hists),
+   };
+
+   width = fmt->width(fmt, &hpp);
+   fmt->header(fmt, &hpp);
+
+   slsmg_printf(" %-*s", width, hpp.buf);
+   }
+
+   list_for_each_entry(se, &hist_entry__sort_list, list) {
+   if (se->elide)
+   continue;
+
+   width = strlen(se->se_header);
+   if (!hists__new_col_len(hists, se->se_width_idx, width))
+   width = hists__col_len(hists, se->se_width_idx);
+
+   slsmg_printf("  %-*s", width, se->se_header);
+   }
+
+   /*
+* We just called ui_browser__refresh_dimensions().
+* Update it to honor a new column header line.
+*/
+   browser->b.y++;
+   browser->b.height--;
+
+out:
+   pthread_mutex_unlock(&ui__lock);
+}
+
 static int hist_browser__run(struct hist_browser *browser, const char *ev_name,
 struct hist_browser_timer *hbt)
 {
@@ -331,6 +383,8 @@ static int hist_browser__run(struct hist_browser *browser, 
const char *ev_name,
 "Press '?' for help on key bindings") < 0)
return -1;
 
+   hist_browser__column_header(browser);
+
while (1) {
key = ui_browser__run(&browser->b, delay_secs);
 
@@ -1198,6 +1252,7 @@ static struct hist_browser *hist_browser__new(struct 
hists *hists)
browser->b.refresh = hist_browser__refresh;
browser->b.seek = ui_browser__hists_seek;
browser->b.use_navkeypressed = true;
+   browser->show_column_header = true;
}
 
return browser;
-- 
1.7.11.7



Re: [PATCH v3 4/5] slab: introduce byte sized index for the freelist of a slab

2014-01-14 Thread David Rientjes
On Mon, 2 Dec 2013, Joonsoo Kim wrote:

> Currently, the freelist of a slab consist of unsigned int sized indexes.
> Since most of slabs have less number of objects than 256, large sized
> indexes is needless. For example, consider the minimum kmalloc slab. It's
> object size is 32 byte and it would consist of one page, so 256 indexes
> through byte sized index are enough to contain all possible indexes.
> 
> There can be some slabs whose object size is 8 byte. We cannot handle
> this case with byte sized index, so we need to restrict minimum
> object size. Since these slabs are not major, wasted memory from these
> slabs would be negligible.
> 
> Some architectures' page size isn't 4096 bytes and rather larger than
> 4096 bytes (One example is 64KB page size on PPC or IA64) so that
> byte sized index doesn't fit to them. In this case, we will use
> two bytes sized index.
> 
> Below is some number for this patch.
> 
> * Before *
> kmalloc-512  52564051281 : tunables   54   270 : 
> slabdata 80 80  0
> kmalloc-256  210210256   151 : tunables  120   600 : 
> slabdata 14 14  0
> kmalloc-192 1016   1040192   201 : tunables  120   600 : 
> slabdata 52 52  0
> kmalloc-96   560620128   311 : tunables  120   600 : 
> slabdata 20 20  0
> kmalloc-64  2148   2280 64   601 : tunables  120   600 : 
> slabdata 38 38  0
> kmalloc-128  647682128   311 : tunables  120   600 : 
> slabdata 22 22  0
> kmalloc-32 11360  11413 32  1131 : tunables  120   600 : 
> slabdata101101  0
> kmem_cache   197200192   201 : tunables  120   600 : 
> slabdata 10 10  0
> 
> * After *
> kmalloc-512  52164851281 : tunables   54   270 : 
> slabdata 81 81  0
> kmalloc-256  208208256   161 : tunables  120   600 : 
> slabdata 13 13  0
> kmalloc-192 1029   1029192   211 : tunables  120   600 : 
> slabdata 49 49  0
> kmalloc-96   529589128   311 : tunables  120   600 : 
> slabdata 19 19  0
> kmalloc-64  2142   2142 64   631 : tunables  120   600 : 
> slabdata 34 34  0
> kmalloc-128  660682128   311 : tunables  120   600 : 
> slabdata 22 22  0
> kmalloc-32 11716  11780 32  1241 : tunables  120   600 : 
> slabdata 95 95  0
> kmem_cache   197210192   211 : tunables  120   600 : 
> slabdata 10 10  0
> 
> kmem_caches consisting of objects less than or equal to 256 byte have
> one or more objects than before. In the case of kmalloc-32, we have 11 more
> objects, so 352 bytes (11 * 32) are saved and this is roughly 9% saving of
> memory. Of couse, this percentage decreases as the number of objects
> in a slab decreases.
> 
> Here are the performance results on my 4 cpus machine.
> 
> * Before *
> 
>  Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 
> runs):
> 
>229,945,138 cache-misses   
>( +-  0.23% )
> 
>   11.627897174 seconds time elapsed   
>( +-  0.14% )
> 
> * After *
> 
>  Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 
> runs):
> 
>218,640,472 cache-misses   
>( +-  0.42% )
> 
>   11.504999837 seconds time elapsed   
>( +-  0.21% )
> 
> cache-misses are reduced by this patchset, roughly 5%.
> And elapsed times are improved by 1%.
> 
> Signed-off-by: Joonsoo Kim 

Acked-by: David Rientjes 


Re: [RFC PATCH] sched: find the latest idle cpu

2014-01-14 Thread Alex Shi
On 01/15/2014 12:53 PM, Alex Shi wrote:
>>> >> I guess we missed some code for latest_wake here?
>> > 
>> > Yes, thanks for reminder!
>> > 
>> > so updated patch:
>> > 
> ops, still incorrect. re-updated:

update to wrong file. re-re-update. :(

===

From b75e43bb77df14e2209532c1e5c48e0e03afa414 Mon Sep 17 00:00:00 2001
From: Alex Shi 
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we just try to find the least loaded cpu. If some cpus are idle,
we just pick the first cpu in the cpu mask.

In fact we can get the interrupted idle cpu or the latest idled cpu,
then we may benefit in both latency and power.
The selected cpu may not be the best, since another cpu may be interrupted
during our selection. But being too cautious costs too much.

Signed-off-by: Alex Shi 
---
 kernel/sched/fair.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..f82ca3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4159,6 +4159,10 @@ find_idlest_cpu(struct sched_group *group, struct 
task_struct *p, int this_cpu)
int idlest = -1;
int i;
 
+#ifdef CONFIG_NO_HZ_COMMON
+   s64 latest_wake = 0;
+#endif
+
/* Traverse only the allowed CPUs */
for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
load = weighted_cpuload(i);
@@ -4167,6 +4171,30 @@ find_idlest_cpu(struct sched_group *group, struct 
task_struct *p, int this_cpu)
min_load = load;
idlest = i;
}
+#ifdef CONFIG_NO_HZ_COMMON
+   /*
+* Coarsely to get the latest idle cpu for shorter latency and
+* possible power benefit.
+*/
+   if (!load) {
+   struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+   /* idle cpu doing irq */
+   if (ts->inidle && !ts->idle_active)
+   idlest = i;
+   /* the cpu resched */
+   else if (!ts->inidle)
+   idlest = i;
+   /* find latest idle cpu */
+   else {
+   s64 temp = ktime_to_us(ts->idle_entrytime);
+   if (temp > latest_wake) {
+   latest_wake = temp;
+   idlest = i;
+   }
+   }
+   }
+#endif
}
 
return idlest;
-- 
1.8.1.2

-- 
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: manual merge of the tip tree with the mips tree

2014-01-14 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in
arch/mips/netlogic/xlp/setup.c between commit 8e4857962d97 ("MIPS:
Netlogic: Core wakeup improvements") from the mips tree and commit
7972e966b032 ("MIPS: Remove panic_timeout settings") from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc arch/mips/netlogic/xlp/setup.c
index c3af2d8772cf,54e75c77184b..
--- a/arch/mips/netlogic/xlp/setup.c
+++ b/arch/mips/netlogic/xlp/setup.c
@@@ -96,15 -92,6 +96,14 @@@ static void __init xlp_init_mem_from_ba
  
  void __init plat_mem_setup(void)
  {
 +#ifdef CONFIG_SMP
 +  nlm_wakeup_secondary_cpus();
 +
 +  /* update TLB size after waking up threads */
 +  current_cpu_data.tlbsize = ((read_c0_config6() >> 16) & 0xffff) + 1;
 +
 +  register_smp_ops(&nlm_smp_ops);
 +#endif
-   panic_timeout   = 5;
_machine_restart = (void (*)(char *))nlm_linux_exit;
_machine_halt   = nlm_linux_exit;
pm_power_off= nlm_linux_exit;


pgp2bufSJR2Ri.pgp
Description: PGP signature


Re: [PATCH v3 3/5] slab: restrict the number of objects in a slab

2014-01-14 Thread David Rientjes
On Mon, 2 Dec 2013, Joonsoo Kim wrote:

> To prepare for implementing a byte-sized index for managing the freelist
> of a slab, we should restrict the number of objects in a slab to at most
> 256, since a byte can only represent 256 different values.
> Setting the object size to a value equal to or greater than the newly
> introduced SLAB_OBJ_MIN_SIZE ensures that the number of objects in a slab
> is at most 256 for a slab with 1 page.
> 
> If the page size is larger than 4096, the above assumption would be wrong.
> In this case, we fall back on a 2-byte index.
> 
> If the minimum kmalloc size is less than 16, we use it as the minimum
> object size and give up this optimization.
> 
> Signed-off-by: Joonsoo Kim 

Acked-by: David Rientjes 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
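
Editorially, the arithmetic behind that restriction can be checked with a
small stand-alone sketch. PAGE_SIZE_SKETCH and the loop below are
illustrative assumptions (a 4 KiB page, power-of-two object sizes), not code
from the series, and slab management overhead is ignored:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE_SKETCH   4096
#define SLAB_OBJ_MAX_NUM   256   /* one byte can index at most 256 objects */
#define SLAB_OBJ_MIN_SIZE  (PAGE_SIZE_SKETCH / SLAB_OBJ_MAX_NUM)  /* 16 */

int main(void)
{
	for (unsigned int size = SLAB_OBJ_MIN_SIZE; size <= 256; size <<= 1) {
		unsigned int objs = PAGE_SIZE_SKETCH / size;
		unsigned int idx_bytes = (objs <= SLAB_OBJ_MAX_NUM) ?
					 sizeof(uint8_t) : sizeof(uint16_t);

		printf("obj size %3u: %3u objs, %u-byte index, %u bytes of freelist\n",
		       size, objs, idx_bytes, objs * idx_bytes);
	}
	return 0;
}

For a 4 KiB page and a 16-byte minimum object size the object count never
exceeds 256, which is exactly the guarantee the commit message relies on.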


Re: [PATCH 01/17] tools lib traceevent: Add state member to struct trace_seq

2014-01-14 Thread Namhyung Kim
On Tue, 14 Jan 2014 21:56:55 -0500, Steven Rostedt wrote:
> On Wed, 15 Jan 2014 11:49:28 +0900
> Namhyung Kim  wrote:
>> Oh, it looks better.  But I'd like to keep TRACE_SEQ_CHECK() as is for some
>> cases.  How about this?
>
> Looks good to me.
>
> Acked-by: Steven Rostedt 
>
> I'll try to look at the rest of the patches tomorrow.

Thanks, and I found that trace_seq_destroy() should use
TRACE_SEQ_CHECK_RET instead.  Here's an updated patch.

Thanks,
Namhyung


From 539a5dd1cb3ba4cb03c283115a9a84f8e345514e Mon Sep 17 00:00:00 2001
From: Namhyung Kim 
Date: Thu, 19 Dec 2013 18:17:44 +0900
Subject: [PATCH] tools lib traceevent: Add state member to struct trace_seq

The trace_seq->state is for tracking errors during the use of
trace_seq APIs and getting rid of die() in it.

Acked-by: Steven Rostedt 
Signed-off-by: Namhyung Kim 
---
 tools/lib/traceevent/Makefile  |  2 +-
 tools/lib/traceevent/event-parse.h |  7 +
 tools/lib/traceevent/trace-seq.c   | 55 +-
 3 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index f778d48ac609..56d52a33a3df 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -136,7 +136,7 @@ export Q VERBOSE
 
 EVENT_PARSE_VERSION = $(EP_VERSION).$(EP_PATCHLEVEL).$(EP_EXTRAVERSION)
 
-INCLUDES = -I. $(CONFIG_INCLUDES)
+INCLUDES = -I. -I $(srctree)/../../include $(CONFIG_INCLUDES)
 
 # Set compile option CFLAGS if not set elsewhere
 CFLAGS ?= -g -Wall
diff --git a/tools/lib/traceevent/event-parse.h 
b/tools/lib/traceevent/event-parse.h
index cf5db9013f2c..3c890cb28db7 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -58,6 +58,12 @@ struct pevent_record {
 #endif
 };
 
+enum trace_seq_fail {
+   TRACE_SEQ__GOOD,
+   TRACE_SEQ__BUFFER_POISONED,
+   TRACE_SEQ__MEM_ALLOC_FAILED,
+};
+
 /*
  * Trace sequences are used to allow a function to call several other functions
  * to create a string of data to use (up to a max of PAGE_SIZE).
@@ -68,6 +74,7 @@ struct trace_seq {
unsigned intbuffer_size;
unsigned intlen;
unsigned intreadpos;
+   enum trace_seq_fail state;
 };
 
 void trace_seq_init(struct trace_seq *s);
diff --git a/tools/lib/traceevent/trace-seq.c b/tools/lib/traceevent/trace-seq.c
index d7f2e68bc5b9..f7112138e6af 100644
--- a/tools/lib/traceevent/trace-seq.c
+++ b/tools/lib/traceevent/trace-seq.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 
+#include 
 #include "event-parse.h"
 #include "event-utils.h"
 
@@ -32,10 +33,21 @@
 #define TRACE_SEQ_POISON   ((void *)0xdeadbeef)
 #define TRACE_SEQ_CHECK(s) \
 do {   \
-   if ((s)->buffer == TRACE_SEQ_POISON)\
-   die("Usage of trace_seq after it was destroyed");   \
+   if (WARN_ONCE((s)->buffer == TRACE_SEQ_POISON,  \
+ "Usage of trace_seq after it was destroyed")) \
+   (s)->state = TRACE_SEQ__BUFFER_POISONED;\
 } while (0)
 
+#define TRACE_SEQ_CHECK_RET_N(s, n)\
+do {   \
+   TRACE_SEQ_CHECK(s); \
+   if ((s)->state != TRACE_SEQ__GOOD)  \
+   return n;   \
+} while (0)
+
+#define TRACE_SEQ_CHECK_RET(s)   TRACE_SEQ_CHECK_RET_N(s, )
+#define TRACE_SEQ_CHECK_RET0(s)  TRACE_SEQ_CHECK_RET_N(s, 0)
+
 /**
  * trace_seq_init - initialize the trace_seq structure
  * @s: a pointer to the trace_seq structure to initialize
@@ -46,6 +58,7 @@ void trace_seq_init(struct trace_seq *s)
s->readpos = 0;
s->buffer_size = TRACE_SEQ_BUF_SIZE;
s->buffer = malloc_or_die(s->buffer_size);
+   s->state = TRACE_SEQ__GOOD;
 }
 
 /**
@@ -71,7 +84,7 @@ void trace_seq_destroy(struct trace_seq *s)
 {
if (!s)
return;
-   TRACE_SEQ_CHECK(s);
+   TRACE_SEQ_CHECK_RET(s);
free(s->buffer);
s->buffer = TRACE_SEQ_POISON;
 }
@@ -80,8 +93,9 @@ static void expand_buffer(struct trace_seq *s)
 {
s->buffer_size += TRACE_SEQ_BUF_SIZE;
s->buffer = realloc(s->buffer, s->buffer_size);
-   if (!s->buffer)
-   die("Can't allocate trace_seq buffer memory");
+   if (WARN_ONCE(!s->buffer,
+ "Can't allocate trace_seq buffer memory"))
+   s->state = TRACE_SEQ__MEM_ALLOC_FAILED;
 }
 
 /**
@@ -105,9 +119,9 @@ trace_seq_printf(struct trace_seq *s, const char *fmt, ...)
int len;
int ret;
 
-   TRACE_SEQ_CHECK(s);
-
  try_again:
+   TRACE_SEQ_CHECK_RET0(s);
+
len = (s->buffer_size - 1) - s->len;
 
va_start(ap, fmt);
@@ -141,9 +155,9 @@ trace_seq_vprintf(struct trace_seq *s, const char *fmt, 
va_list args)
int len;
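
A hedged sketch of how a caller might react to the new state field once
die() is gone; trace_seq_do_printf() is assumed to be this library's
existing output helper, so adjust to the real API if it differs:

#include "event-parse.h"	/* declares struct trace_seq and the trace_seq_* API */

static void print_record(int pid, const char *comm)
{
	struct trace_seq s;

	trace_seq_init(&s);
	trace_seq_printf(&s, "pid=%d comm=%s\n", pid, comm);

	/* With die() removed, formatting errors are reported through s.state
	 * instead of terminating the process. */
	if (s.state == TRACE_SEQ__GOOD)
		trace_seq_do_printf(&s);
	trace_seq_destroy(&s);
}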

[PATCH 16/16] hold bus_mutex in netlink and search

2014-01-14 Thread David Fries
The bus_mutex needs to be taken to serialize access to a specific bus.
netlink wasn't updated when bus_mutex was added and was calling
without that lock held, and not all of the masters were holding the
bus_mutex in a search.  This was causing the ds2490 hardware to stop
responding when both netlink and /sys slaves were executing bus
commands at the same time.

Signed-off-by: David Fries 
---
This fixes existing bugs, tacking it to the end of the previous patch
series.

 drivers/w1/masters/ds1wm.c  |4 +++-
 drivers/w1/masters/ds2490.c |8 ++--
 drivers/w1/w1_netlink.c |   13 +++--
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/w1/masters/ds1wm.c b/drivers/w1/masters/ds1wm.c
index 02df3b1..b077b8b 100644
--- a/drivers/w1/masters/ds1wm.c
+++ b/drivers/w1/masters/ds1wm.c
@@ -326,13 +326,14 @@ static void ds1wm_search(void *data, struct w1_master 
*master_dev,
unsigned slaves_found = 0;
unsigned int pass = 0;
 
+   mutex_lock(&master_dev->bus_mutex);
dev_dbg(&ds1wm_data->pdev->dev, "search begin\n");
while (true) {
++pass;
if (pass > 100) {
dev_dbg(&ds1wm_data->pdev->dev,
"too many attempts (100), search aborted\n");
-   return;
+   break;
}
 
mutex_lock(&master_dev->bus_mutex);
@@ -439,6 +440,7 @@ static void ds1wm_search(void *data, struct w1_master 
*master_dev,
dev_dbg(&ds1wm_data->pdev->dev,
"pass: %d total: %d search done ms d bit pos: %d\n", pass,
slaves_found, ms_discrep_bit);
+   mutex_unlock(&master_dev->bus_mutex);
 }
 
 /* - */
diff --git a/drivers/w1/masters/ds2490.c b/drivers/w1/masters/ds2490.c
index db0bf32..7404ad30 100644
--- a/drivers/w1/masters/ds2490.c
+++ b/drivers/w1/masters/ds2490.c
@@ -727,9 +727,11 @@ static void ds9490r_search(void *data, struct w1_master 
*master,
 */
u64 buf[2*64/8];
 
+   mutex_lock(&master->bus_mutex);
+
/* address to start searching at */
if (ds_send_data(dev, (u8 *)&master->search_id, 8) < 0)
-   return;
+   goto search_out;
master->search_id = 0;
 
value = COMM_SEARCH_ACCESS | COMM_IM | COMM_RST | COMM_SM | COMM_F |
@@ -739,7 +741,7 @@ static void ds9490r_search(void *data, struct w1_master 
*master,
search_limit = 0;
index = search_type | (search_limit << 8);
if (ds_send_control(dev, value, index) < 0)
-   return;
+   goto search_out;
 
do {
schedule_timeout(jtime);
@@ -791,6 +793,8 @@ static void ds9490r_search(void *data, struct w1_master 
*master,
master->max_slave_count);
set_bit(W1_WARN_MAX_COUNT, &master->flags);
}
+search_out:
+   mutex_unlock(&master->bus_mutex);
 }
 
 #if 0
diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index a5dc219..5234964 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -246,11 +246,16 @@ static int w1_process_command_master(struct w1_master 
*dev,
 {
int err = -EINVAL;
 
+   /* drop bus_mutex for search (does its own locking), and add/remove
+* which doesn't use the bus
+*/
switch (req_cmd->cmd) {
case W1_CMD_SEARCH:
case W1_CMD_ALARM_SEARCH:
case W1_CMD_LIST_SLAVES:
+   mutex_unlock(&dev->bus_mutex);
err = w1_get_slaves(dev, req_msg, req_hdr, req_cmd);
+   mutex_lock(&dev->bus_mutex);
break;
case W1_CMD_READ:
case W1_CMD_WRITE:
@@ -262,8 +267,12 @@ static int w1_process_command_master(struct w1_master *dev,
break;
case W1_CMD_SLAVE_ADD:
case W1_CMD_SLAVE_REMOVE:
+   mutex_unlock(&dev->bus_mutex);
+   mutex_lock(&dev->mutex);
err = w1_process_command_addremove(dev, req_msg, req_hdr,
req_cmd);
+   mutex_unlock(&dev->mutex);
+   mutex_lock(&dev->bus_mutex);
break;
default:
err = -EINVAL;
@@ -400,7 +409,7 @@ static void w1_process_cb(struct w1_master *dev, struct 
w1_async_cmd *async_cmd)
struct w1_slave *sl = node->sl;
struct w1_netlink_cmd *cmd = NULL;
 
-   mutex_lock(&dev->mutex);
+   mutex_lock(&dev->bus_mutex);
dev->portid = node->block->portid;
if (sl && w1_reset_select_slave(sl))
err = -ENODEV;
@@ -437,7 +446,7 @@ static void w1_process_cb(struct w1_master *dev, struct 
w1_async_cmd *async_cmd)
else
atomic_dec(&dev->refcnt);
dev->portid = 0;
-   mutex_unlock(&dev->mutex);
+   mutex_unlock(&dev->bus_mutex);
 
mutex_lock(&dev->list_mutex);
list_del(&async_cmd->async_entry);
-- 
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body 

[git pull] Please pull powerpc.git merge branch

2014-01-14 Thread Benjamin Herrenschmidt
Hi Linus !

So you make the call on whether to take that one now or wait for the
merge window. It's a bug fix for a crash in mremap that occurs on
powerpc with THP enabled.

The fix however requires a small change in the generic code. It moves a
condition into a helper we can override from the arch which is harmless,
but it *also* slightly changes the order of the set_pmd and the withdraw
& deposit, which should be fine according to Kirill (who wrote that
code) but I agree -rc8 is a bit late...

It was acked by Kirill and Andrew told me to just merge it via powerpc.

My original intent was to put it in powerpc-next and then shoot it to
stable, but it got a tad annoying (due to churn it needs to be applied
at least on rc4 or later while my next is at rc1 and clean that way), so
I put it in the merge branch.

From there, you tell me if you want to take it now; if not, I'll send
you that branch along with my normal next one after you open the merge
window.

Cheers,
Ben.

The following changes since commit a6da83f98267bc8ee4e34aa899169991eb0ceb93:

  Merge branch 'merge' of 
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc (2014-01-13 10:59:05 
+0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

for you to fetch changes up to b3084f4db3aeb991c507ca774337c7e7893ed04f:

  powerpc/thp: Fix crash on mremap (2014-01-15 15:46:38 +1100)


Aneesh Kumar K.V (1):
  powerpc/thp: Fix crash on mremap

 arch/powerpc/include/asm/pgtable-ppc64.h | 14 ++
 include/asm-generic/pgtable.h| 12 
 mm/huge_memory.c | 14 +-
 3 files changed, 31 insertions(+), 9 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: manual merge of the tip tree with the mips tree

2014-01-14 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the tip tree got a conflict in
arch/mips/Kconfig between commit 597ce1723e0f ("MIPS: Support for 64-bit
FP with O32 binaries") from the mips tree and commit 19952a92037e
("stackprotector: Unify the HAVE_CC_STACKPROTECTOR logic between
architectures") from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc arch/mips/Kconfig
index 87dc0c3fe05f,c93d92beb3d6..
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@@ -2354,36 -2323,6 +2355,23 @@@ config SECCOM
  
  If unsure, say Y. Only embedded should say N here.
  
- config CC_STACKPROTECTOR
-   bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)"
-   help
- This option turns on the -fstack-protector GCC feature. This
- feature puts, at the beginning of functions, a canary value on
- the stack just before the return address, and validates
- the value just before actually returning.  Stack based buffer
- overflows (that need to overwrite this return address) now also
- overwrite the canary, which gets detected and the attack is then
- neutralized via a kernel panic.
- 
- This feature requires gcc version 4.2 or above.
- 
 +config MIPS_O32_FP64_SUPPORT
 +  bool "Support for O32 binaries using 64-bit FP"
 +  depends on 32BIT || MIPS32_O32
 +  default y
 +  help
 +When this is enabled, the kernel will support use of 64-bit floating
 +point registers with binaries using the O32 ABI along with the
 +EF_MIPS_FP64 ELF header flag (typically built with -mfp64). On
 +32-bit MIPS systems this support is at the cost of increasing the
 +size and complexity of the compiled FPU emulator. Thus if you are
 +running a MIPS32 system and know that none of your userland binaries
 +will require 64-bit floating point, you may wish to reduce the size
 +of your kernel & potentially improve FP emulation performance by
 +saying N here.
 +
 +If unsure, say Y.
 +
  config USE_OF
bool
select OF


pgpdHaXJGytMf.pgp
Description: PGP signature


Re: [PATCH v3 2/5] slab: introduce helper functions to get/set free object

2014-01-14 Thread David Rientjes
On Mon, 2 Dec 2013, Joonsoo Kim wrote:

> In the following patches, the way free objects are get/set from the
> freelist is changed so that simple casting no longer works for it.
> Therefore, introduce helper functions.
> 
> Acked-by: Christoph Lameter 
> Signed-off-by: Joonsoo Kim 

Acked-by: David Rientjes 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 1/5] slab: factor out calculate nr objects in cache_estimate

2014-01-14 Thread David Rientjes
On Mon, 2 Dec 2013, Joonsoo Kim wrote:

> This logic is not simple to understand, so factor it out into a separate
> function to help readability. Additionally, we can use this change in the
> following patch, which implements a differently sized freelist index
> according to the number of objects.
> 
> Acked-by: Christoph Lameter 
> Signed-off-by: Joonsoo Kim 

Acked-by: David Rientjes 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 13/14] mm, hugetlb: retry if failed to allocate and there is concurrent user

2014-01-14 Thread Andrew Morton
On Tue, 14 Jan 2014 20:37:49 -0800 Davidlohr Bueso  wrote:

> On Tue, 2014-01-14 at 19:08 -0800, David Rientjes wrote:
> > On Mon, 6 Jan 2014, Davidlohr Bueso wrote:
> > 
> > > > If Andrew agree, It would be great to merge 1-7 patches into mainline
> > > > before your mutex approach. There are some of clean-up patches and, IMO,
> > > > it makes the code more readable and maintainable, so it is worth to 
> > > > merge
> > > > separately.
> > > 
> > > Fine by me.
> > > 
> > 
> > It appears like patches 1-7 are still missing from linux-next, would you 
> > mind posting them in a series with your approach?
> 
> I haven't looked much into patches 4-7, but at least the first three are
> ok. I was waiting for Andrew to take all seven for linux-next and then
> I'd rebase my approach on top. Anyway, unless Andrew has any
> preferences, if by later this week they're not picked up, I'll resend
> everything.

Well, we're mainly looking for bugfixes this late in the cycle.
"[PATCH v3 03/14] mm, hugetlb: protect region tracking via newly
introduced resv_map lock" fixes a bug, but I'd assumed that it depended
on earlier patches.  If we think that one is serious then it would be
better to cook up a minimal fix which is backportable into 3.12 and
earlier?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] sched: find the latest idle cpu

2014-01-14 Thread Alex Shi
On 01/15/2014 12:48 PM, Alex Shi wrote:
> On 01/15/2014 12:31 PM, Michael wang wrote:
>> Hi, Alex
>>
>> On 01/15/2014 12:07 PM, Alex Shi wrote:
>> [snip]   }
>>> +#ifdef CONFIG_NO_HZ_COMMON
>>> +   /*
>>> +* Coarsely to get the latest idle cpu for shorter latency and
>>> +* possible power benefit.
>>> +*/
>>> +   if (!min_load) {
> 
> here should be !load.
>>> +   struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
>>> +
>>> +   s64 latest_wake = 0;
>>
>> I guess we missed some code for latest_wake here?
> 
> Yes, thanks for reminder!
> 
> so updated patch:
> 

Oops, still incorrect. Re-updated:

===

From 5d48303b3eb3b5ca7fde54a6dfcab79cff360403 Mon Sep 17 00:00:00 2001
From: Alex Shi 
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we just try to find the least loaded cpu. If some cpus are idle,
we just pick the first cpu in the cpu mask.

In fact we can pick the idle cpu that is handling an interrupt, or the cpu
that went idle most recently, and thereby benefit both latency and power.
The selected cpu may not be the best, since another cpu may be interrupted
while we are selecting. But being more careful costs too much.

Signed-off-by: Alex Shi 
---
 kernel/sched/fair.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..e2c4cd9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4161,12 +4161,38 @@ find_idlest_cpu(struct sched_group *group, struct 
task_struct *p, int this_cpu)
 
/* Traverse only the allowed CPUs */
for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
+   s64 latest_wake = 0;
+
load = weighted_cpuload(i);
 
if (load < min_load || (load == min_load && i == this_cpu)) {
min_load = load;
idlest = i;
}
+#ifdef CONFIG_NO_HZ_COMMON
+   /*
+* Coarsely to get the latest idle cpu for shorter latency and
+* possible power benefit.
+*/
+   if (!load) {
+   struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+   /* idle cpu doing irq */
+   if (ts->inidle && !ts->idle_active)
+   idlest = i;
+   /* the cpu resched */
+   else if (!ts->inidle)
+   idlest = i;
+   /* find latest idle cpu */
+   else {
+   s64 temp = ktime_to_us(ts->idle_entrytime);
+   if (temp > latest_wake) {
+   latest_wake = temp;
+   idlest = i;
+   }
+   }
+   }
+#endif
}
 
return idlest;
-- 
1.8.1.2

-- 
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] sched: find the latest idle cpu

2014-01-14 Thread Alex Shi
On 01/15/2014 12:31 PM, Michael wang wrote:
> Hi, Alex
> 
> On 01/15/2014 12:07 PM, Alex Shi wrote:
> [snip]}
>> +#ifdef CONFIG_NO_HZ_COMMON
>> +/*
>> + * Coarsely to get the latest idle cpu for shorter latency and
>> + * possible power benefit.
>> + */
>> +if (!min_load) {

here should be !load.
>> +struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
>> +
>> +s64 latest_wake = 0;
> 
> I guess we missed some code for latest_wake here?

Yes, thanks for reminder!

so updated patch:



From c3a88e73fed3da96549b5a922076e996832685f8 Mon Sep 17 00:00:00 2001
From: Alex Shi 
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we just try to find the least loaded cpu. If some cpus are idle,
we just pick the first cpu in the cpu mask.

In fact we can pick the idle cpu that is handling an interrupt, or the cpu
that went idle most recently, and thereby benefit both latency and power.
The selected cpu may not be the best, since another cpu may be interrupted
while we are selecting. But being more careful costs too much.

Signed-off-by: Alex Shi 
---
 kernel/sched/fair.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..73a2a07 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,31 @@ find_idlest_cpu(struct sched_group *group, struct 
task_struct *p, int this_cpu)
min_load = load;
idlest = i;
}
+#ifdef CONFIG_NO_HZ_COMMON
+   /*
+* Coarsely to get the latest idle cpu for shorter latency and
+* possible power benefit.
+*/
+   if (!load) {
+   struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+   s64 latest_wake = 0;
+   /* idle cpu doing irq */
+   if (ts->inidle && !ts->idle_active)
+   idlest = i;
+   /* the cpu resched */
+   else if (!ts->inidle)
+   idlest = i;
+   /* find latest idle cpu */
+   else {
+   s64 temp = ktime_to_us(ts->idle_entrytime);
+   if (temp > latest_wake) {
+   latest_wake = temp;
+   idlest = i;
+   }
+   }
+   }
+#endif
}
 
return idlest;
-- 
1.8.1.2

-- 
Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


linux-next: manual merge of the audit tree with Linus' tree

2014-01-14 Thread Stephen Rothwell
Hi Eric,

Today's linux-next merge of the audit tree got a conflict in
include/net/xfrm.h between commit d511337a1eda ("xfrm.h: Remove extern
from function prototypes") from Linus' tree and commit 4440e8548153
("audit: convert all sessionid declaration to unsigned int") from the
audit tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc include/net/xfrm.h
index cd7c46ff6f1f,f8d32b908423..
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@@ -714,23 -713,23 +714,23 @@@ static inline void xfrm_audit_helper_us
audit_log_task_context(audit_buf);
  }
  
 -extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
 -kuid_t auid, unsigned int ses, u32 secid);
 -extern void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result,
 -kuid_t auid, unsigned int ses, u32 secid);
 -extern void xfrm_audit_state_add(struct xfrm_state *x, int result,
 -   kuid_t auid, unsigned int ses, u32 secid);
 -extern void xfrm_audit_state_delete(struct xfrm_state *x, int result,
 -  kuid_t auid, unsigned int ses, u32 secid);
 -extern void xfrm_audit_state_replay_overflow(struct xfrm_state *x,
 -   struct sk_buff *skb);
 -extern void xfrm_audit_state_replay(struct xfrm_state *x,
 -  struct sk_buff *skb, __be32 net_seq);
 -extern void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 family);
 -extern void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family,
 -__be32 net_spi, __be32 net_seq);
 -extern void xfrm_audit_state_icvfail(struct xfrm_state *x,
 -   struct sk_buff *skb, u8 proto);
 +void xfrm_audit_policy_add(struct xfrm_policy *xp, int result, kuid_t auid,
-  u32 ses, u32 secid);
++ unsigned int ses, u32 secid);
 +void xfrm_audit_policy_delete(struct xfrm_policy *xp, int result, kuid_t auid,
- u32 ses, u32 secid);
++unsigned int ses, u32 secid);
 +void xfrm_audit_state_add(struct xfrm_state *x, int result, kuid_t auid,
- u32 ses, u32 secid);
++unsigned int ses, u32 secid);
 +void xfrm_audit_state_delete(struct xfrm_state *x, int result, kuid_t auid,
-u32 ses, u32 secid);
++   unsigned int ses, u32 secid);
 +void xfrm_audit_state_replay_overflow(struct xfrm_state *x,
 +struct sk_buff *skb);
 +void xfrm_audit_state_replay(struct xfrm_state *x, struct sk_buff *skb,
 +   __be32 net_seq);
 +void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 family);
 +void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family, __be32 
net_spi,
 + __be32 net_seq);
 +void xfrm_audit_state_icvfail(struct xfrm_state *x, struct sk_buff *skb,
 +u8 proto);
  #else
  
  static inline void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,


pgpaVLYDBxn9h.pgp
Description: PGP signature


[PATCH v9 2/5] qrwlock x86: Enable x86 to use queue read/write lock

2014-01-14 Thread Waiman Long
This patch makes the necessary changes at the x86 architecture specific
layer to enable the presence of the CONFIG_QUEUE_RWLOCK kernel option
to replace the read/write lock by the queue read/write lock.

It also enables the CONFIG_QUEUE_RWLOCK option by default for x86, which
will force the use of the queue read/write lock. That will greatly improve
the fairness of the read/write lock and eliminate the live-lock situation
where one task may not get the lock for an indefinite period of time.

Signed-off-by: Waiman Long 
Reviewed-by: Paul E. McKenney 
---
 arch/x86/Kconfig  |1 +
 arch/x86/include/asm/spinlock.h   |2 ++
 arch/x86/include/asm/spinlock_types.h |4 
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e0acf82..3c6094b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
select MODULES_USE_ELF_RELA if X86_64
select CLONE_BACKWARDS if X86_32
select ARCH_USE_BUILTIN_BSWAP
+   select ARCH_USE_QUEUE_RWLOCK
select OLD_SIGSUSPEND3 if X86_32 || IA32_EMULATION
select OLD_SIGACTION if X86_32
select COMPAT_OLD_SIGACTION if IA32_EMULATION
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index bf156de..8fb88c5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -188,6 +188,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t 
*lock)
cpu_relax();
 }
 
+#ifndef CONFIG_QUEUE_RWLOCK
 /*
  * Read-write spinlocks, allowing multiple readers
  * but only one writer.
@@ -270,6 +271,7 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
asm volatile(LOCK_PREFIX WRITE_LOCK_ADD(%1) "%0"
 : "+m" (rw->write) : "i" (RW_LOCK_BIAS) : "memory");
 }
+#endif /* CONFIG_QUEUE_RWLOCK */
 
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 4f1bea1..a585635 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -34,6 +34,10 @@ typedef struct arch_spinlock {
 
 #define __ARCH_SPIN_LOCK_UNLOCKED  { { 0 } }
 
+#ifdef CONFIG_QUEUE_RWLOCK
+#include 
+#else
 #include 
+#endif
 
 #endif /* _ASM_X86_SPINLOCK_TYPES_H */
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v9 1/5] qrwlock: A queue read/write lock implementation

2014-01-14 Thread Waiman Long
This patch introduces a new read/write lock implementation that puts
waiting readers and writers into a queue instead of actively contending
for the lock like the current read/write lock implementation. This will
improve performance in highly contended situations by reducing the
cache line bouncing effect.

The queue read/write lock (qrwlock) is a fair lock even though there
is still a slight chance of lock stealing if a reader or writer comes
at the right moment.  Other than that, lock granting is done in a
FIFO manner.  As a result, it is possible to determine a maximum time
period after which the waiting is over and the lock can be acquired.

Internally, however, there is a second type of reader which tries to
steal the lock aggressively. Such readers simply increment the reader
count and wait until the writer releases the lock. The transition to an
aggressive reader happens in the read lock slowpath when

 1. In an interrupt context.
 2. When a reader comes to the head of the wait queue and sees
the release of a write lock.

The queue read lock is safe to use in an interrupt context (softirq
or hardirq) as it will switch to become an aggressive reader in such
environment allowing recursive read lock.

The only downside of queue rwlock is the size increase in the lock
structure by 4 bytes for 32-bit systems and by 12 bytes for 64-bit
systems.
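
For orientation only, here is a hedged user-space sketch of the fast paths
that the layout just described suggests: a writer byte in the low bits plus
a reader count above it. The constants and field names are assumptions made
for the sketch, not the patch's code, and the whole queuing slow path that
gives the lock its fairness is omitted.

#include <stdatomic.h>
#include <stdbool.h>

#define QW_LOCKED  0xffu          /* assumed writer byte */
#define QR_BIAS    0x100u         /* one reader, counted above the writer byte */

struct qrwlock_sketch { atomic_uint cnts; };

static bool read_trylock_fast(struct qrwlock_sketch *l)
{
	unsigned int c = atomic_fetch_add(&l->cnts, QR_BIAS) + QR_BIAS;

	if (!(c & QW_LOCKED))
		return true;                      /* no writer: read lock taken */
	atomic_fetch_sub(&l->cnts, QR_BIAS);      /* writer present: fall back to slow path */
	return false;
}

static bool write_trylock_fast(struct qrwlock_sketch *l)
{
	unsigned int expect = 0;                  /* needs no readers and no writer */

	return atomic_compare_exchange_strong(&l->cnts, &expect, QW_LOCKED);
}

In the real series the reader count is manipulated with atomic_t helpers and
smp_load_acquire()/smp_store_release() provide the ordering; the sketch only
shows why a single combined word keeps each uncontended path to one atomic
operation.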

In terms of single-thread performance (no contention), a 256K
lock/unlock loop was run on 2.4GHz and 2.93GHz Westmere x86-64
CPUs. The following table shows the average time (in ns) for a single
lock/unlock sequence (including the looping and timing overhead):

Lock Type           2.4GHz    2.93GHz
---------           ------    -------
Ticket spinlock      14.9      12.3
Read lock            17.0      13.5
Write lock           17.0      13.5
Queue read lock      16.0      13.4
Queue write lock      9.2       7.8

The queue read lock is slightly slower than the spinlock, but is
slightly faster than the read lock. The queue write lock, however,
is the fastest of all. It is almost twice as fast as the write lock
and about 1.5X of the spinlock.

With lock contention, the speed of each individual lock/unlock function
is less important than the amount of contention-induced delays.

To investigate the performance characteristics of the queue rwlock
compared with the regular rwlock, Ingo's anon_vmas patch that converts
rwsem to rwlock was applied to a 3.12 kernel. This kernel was then
tested under the following 3 conditions:

 1) Plain 3.12
 2) Ingo's patch
 3) Ingo's patch + qrwlock

Each of the 3 kernels were booted up twice with and without the
"idle=poll" kernel parameter which keeps the CPUs in C0 state while
idling instead of a more energy-saving sleep state.  The jobs per
minutes (JPM) results of the AIM7's high_systime workload at 1500
users on a 8-socket 80-core DL980 (HT off) were:

 Kernel        JPMs             %Change from (1)
 ------        ----             ----------------
   1       145704/227295             -
   2       229750/236066         +58%/+3.8%
   4       240062/248606         +65%/+9.4%

The first JPM number is without the "idle=poll" kernel parameter,
the second number is with that parameter. It can be seen that most
of the performance benefit of converting rwsem to rwlock actually
comes from the latency improvement of not needing to wake up a CPU
from a deep sleep state when work is available.

The use of non-sleeping locks did improve performance by eliminating
the context switching cost. Using the queue rwlock almost tripled the
performance gain. The gain was reduced somewhat by using a fair lock,
which was to be expected.

Looking at the perf profiles (with idle=poll) below, we can clearly see
that other bottlenecks were constraining the performance improvement.

Perf profile of kernel (2):

 18.65%  reaim  [kernel.kallsyms]  [k] __write_lock_failed
  9.00%  reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
  5.21%  swapper  [kernel.kallsyms]  [k] cpu_idle_loop
  3.08%  reaim  [kernel.kallsyms]  [k] mspin_lock
  2.50%  reaim  [kernel.kallsyms]  [k] anon_vma_interval_tree_insert
  2.00%   ls  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
  1.29%  reaim  [kernel.kallsyms]  [k] update_cfs_rq_blocked_load
  1.21%  reaim  [kernel.kallsyms]  [k] __read_lock_failed
  1.12%  reaim  [kernel.kallsyms]  [k] _raw_spin_lock
  1.10%  reaim  [kernel.kallsyms]  [k] perf_event_aux
  1.09% true  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave

Perf profile of kernel (3):

 20.14%  swapper  [kernel.kallsyms]  [k] cpu_idle_loop
  7.94%  reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
  5.41%  reaim  [kernel.kallsyms]  [k] queue_write_lock_slowpath
  5.01%  reaim  [kernel.kallsyms]  [k] mspin_lock
  2.12%  reaim  [kernel.kallsyms]  [k] anon_vma_interval_tree_insert
  2.07%   ls  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
  1.58%  reaim  [kernel.kallsyms]  [k] update_cfs_rq_blocked_load
 

[PATCH v9 0/5] qrwlock: Introducing a queue read/write lock implementation

2014-01-14 Thread Waiman Long
v8->v9:
 - Rebase to the tip branch which has the PeterZ's
   smp_load_acquire()/smp_store_release() patch.
 - Only pass integer type arguments to smp_load_acquire() &
   smp_store_release() functions.
 - Add a new patch to make any data type less than or equal to long as
   atomic or native in x86.
 - Modify write_unlock() to use atomic_sub() if the writer field is
   not atomic.

v7->v8:
 - Use atomic_t functions (which are implemented in all arch's) to
   modify reader counts.
 - Use smp_load_acquire() & smp_store_release() for barriers.
 - Further tuning in slowpath performance.

v6->v7:
 - Remove support for unfair lock, so only fair qrwlock will be provided.
 - Move qrwlock.c to the kernel/locking directory.

v5->v6:
 - Modify queue_read_can_lock() to avoid false positive result.
 - Move the two slowpath functions' performance tuning change from
   patch 4 to patch 1.
 - Add a new optional patch to use the new smp_store_release() function 
   if that is merged.

v4->v5:
 - Fix wrong definitions for QW_MASK_FAIR & QW_MASK_UNFAIR macros.
 - Add an optional patch 4 which should only be applied after the
   mcs_spinlock.h header file is merged.

v3->v4:
 - Optimize the fast path with better cold cache behavior and
   performance.
 - Removing some testing code.
 - Make x86 use queue rwlock with no user configuration.

v2->v3:
 - Make read lock stealing the default and fair rwlock an option with
   a different initializer.
 - In queue_read_lock_slowpath(), check irq_count() and force spinning
   and lock stealing in interrupt context.
 - Unify the fair and classic read-side code path, and make write-side
   to use cmpxchg with 2 different writer states. This slows down the
   write lock fastpath to make the read side more efficient, but is
   still slightly faster than a spinlock.

v1->v2:
 - Improve lock fastpath performance.
 - Optionally provide classic read/write lock behavior for backward
   compatibility.
 - Use xadd instead of cmpxchg for fair reader code path to make it
   immute to reader contention.
 - Run more performance testing.

As mentioned in the LWN article http://lwn.net/Articles/364583/,
the read/write lock suffers from an unfairness problem: it is
possible for a stream of incoming readers to block a waiting writer
from getting the lock for a long time. Also, a waiting reader/writer
contending for a rwlock in local memory will have a higher chance of
acquiring the lock than a reader/writer on a remote node.

This patch set introduces a queue-based read/write lock implementation
that can largely solve this unfairness problem.

The read lock slowpath will check if the reader is in an interrupt
context. If so, it will force lock spinning and stealing without
waiting in a queue. This is to ensure the read lock will be granted
as soon as possible.

The queue write lock can also be used as a replacement for ticket
spinlocks that are highly contended if lock size increase is not
an issue.

The first 2 patches provides the base queue read/write lock support
on x86 architecture. Support for other architectures can be added
later on once architecture specific support infrastructure is added
and proper testing is done.

Patches 3 and 4 are currently applicable on the tip git tree where
the smp_load_acquire() & smp_store_release() macros are defined.

The optional patch 5 has a dependency on the the mcs_spinlock.h
header file which has not been merged yet. So this patch should only
be applied after the mcs_spinlock.h header file is merged.

Waiman Long (5):
  qrwlock: A queue read/write lock implementation
  qrwlock x86: Enable x86 to use queue read/write lock
  qrwlock, x86 - Treat all data type not bigger than long as atomic in
x86
  qrwlock: Use smp_store_release() in write_unlock()
  qrwlock: Use the mcs_spinlock helper functions for MCS queuing

 arch/x86/Kconfig  |1 +
 arch/x86/include/asm/barrier.h|8 ++
 arch/x86/include/asm/spinlock.h   |2 +
 arch/x86/include/asm/spinlock_types.h |4 +
 include/asm-generic/qrwlock.h |  208 +
 kernel/Kconfig.locks  |7 +
 kernel/locking/Makefile   |1 +
 kernel/locking/qrwlock.c  |  191 ++
 8 files changed, 422 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/qrwlock.h
 create mode 100644 kernel/locking/qrwlock.c

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v9 4/5] qrwlock: Use smp_store_release() in write_unlock()

2014-01-14 Thread Waiman Long
This patch modifies the queue_write_unlock() function to use the new
smp_store_release() function (currently in tip). It also removes the
temporary implementation of smp_load_acquire() and smp_store_release()
function in qrwlock.c.

This patch will use atomic subtraction instead if the writer field is
not atomic.

Signed-off-by: Waiman Long 
---
 include/asm-generic/qrwlock.h |   10 ++
 kernel/locking/qrwlock.c  |   34 --
 2 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 5abb6ca..68f488b 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -181,11 +181,13 @@ static inline void queue_read_unlock(struct qrwlock *lock)
 static inline void queue_write_unlock(struct qrwlock *lock)
 {
/*
-* Make sure that none of the critical section will be leaked out.
+* If the writer field is atomic, it can be cleared directly.
+* Otherwise, an atomic subtraction will be used to clear it.
 */
-   smp_mb__before_clear_bit();
-   ACCESS_ONCE(lock->cnts.writer) = 0;
-   smp_mb__after_clear_bit();
+   if (__native_word(lock->cnts.writer))
+   smp_store_release(&lock->cnts.writer, 0);
+   else
+   atomic_sub(_QW_LOCKED, &lock->cnts.rwa);
 }
 
 /*
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 053be4d..2727188 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -47,40 +47,6 @@
 # define arch_mutex_cpu_relax() cpu_relax()
 #endif
 
-#ifndef smp_load_acquire
-# ifdef CONFIG_X86
-#   define smp_load_acquire(p) \
-   ({  \
-   typeof(*p) ___p1 = ACCESS_ONCE(*p); \
-   barrier();  \
-   ___p1;  \
-   })
-# else
-#   define smp_load_acquire(p) \
-   ({  \
-   typeof(*p) ___p1 = ACCESS_ONCE(*p); \
-   smp_mb();   \
-   ___p1;  \
-   })
-# endif
-#endif
-
-#ifndef smp_store_release
-# ifdef CONFIG_X86
-#   define smp_store_release(p, v) \
-   do {\
-   barrier();  \
-   ACCESS_ONCE(*p) = v;\
-   } while (0)
-# else
-#   define smp_store_release(p, v) \
-   do {\
-   smp_mb();   \
-   ACCESS_ONCE(*p) = v;\
-   } while (0)
-# endif
-#endif
-
 /*
  * If an xadd (exchange-add) macro isn't available, simulate one with
  * the atomic_add_return() function.
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v9 3/5] qrwlock, x86 - Treat all data type not bigger than long as atomic in x86

2014-01-14 Thread Waiman Long
The generic __native_word() macro defined in include/linux/compiler.h
only allows "int" and "long" data types to be treated as native and
atomic. The x86 architecture, however, allow the use of char and short
data types as atomic as well.

This patch extends the data type allowed in the __native_word() macro to
allow the use of char and short.

Signed-off-by: Waiman Long 
---
 arch/x86/include/asm/barrier.h |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 04a4890..4d3e30a 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -24,6 +24,14 @@
 #define wmb()  asm volatile("sfence" ::: "memory")
 #endif
 
+/*
+ * All data types <= long are atomic in x86
+ */
+#ifdef __native_word
+#undef __native_word
+#endif
+#define __native_word(t)   (sizeof(t) <= sizeof(long))
+
 /**
  * read_barrier_depends - Flush all pending reads that subsequents reads
  * depend on.
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
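
As an editorial aside, the widened definition can be sanity-checked at
compile time with a hedged stand-alone snippet; the standard C typedefs
below stand in for the kernel's own fixed-width types:

#include <stdint.h>

/* Same shape as the patched macro: anything no bigger than long is "native". */
#define __native_word(t)	(sizeof(t) <= sizeof(long))

_Static_assert(__native_word(uint8_t),  "char-sized accesses treated as atomic on x86");
_Static_assert(__native_word(uint16_t), "short-sized accesses treated as atomic on x86");
_Static_assert(__native_word(uint32_t), "int remains native");
_Static_assert(__native_word(long),     "long remains native");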


[PATCH v9 5/5] qrwlock: Use the mcs_spinlock helper functions for MCS queuing

2014-01-14 Thread Waiman Long
There is a pending MCS lock patch series that adds generic MCS lock
helper functions to do MCS-style locking. This patch will enable
the queue rwlock to use those generic MCS lock/unlock primitives for
internal queuing. This patch should only be merged after the merging
of that generic MCS locking patch.

Signed-off-by: Waiman Long 
Reviewed-by: Paul E. McKenney 
---
 include/asm-generic/qrwlock.h |7 +---
 kernel/locking/qrwlock.c  |   71 -
 2 files changed, 9 insertions(+), 69 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 68f488b..db6b1e2 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -38,10 +38,7 @@
  * the writer field. The least significant 8 bits is the writer field
  * whereas the remaining 24 bits is the reader count.
  */
-struct qrwnode {
-   struct qrwnode *next;
-   int wait;   /* Waiting flag */
-};
+struct mcs_spinlock;
 
 typedef struct qrwlock {
union qrwcnts {
@@ -57,7 +54,7 @@ typedef struct qrwlock {
atomic_trwa;/* Reader/writer atomic */
u32 rwc;/* Reader/writer counts */
} cnts;
-   struct qrwnode *waitq;  /* Tail of waiting queue */
+   struct mcs_spinlock *waitq; /* Tail of waiting queue */
 } arch_rwlock_t;
 
 /*
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 2727188..18b4f2c 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -58,64 +59,6 @@
 #endif
 
 /**
- * wait_in_queue - Add to queue and wait until it is at the head
- * @lock: Pointer to queue rwlock structure
- * @node: Node pointer to be added to the queue
- */
-static inline void wait_in_queue(struct qrwlock *lock, struct qrwnode *node)
-{
-   struct qrwnode *prev;
-
-   node->next = NULL;
-   node->wait = true;
-   prev = xchg(&lock->waitq, node);
-   if (prev) {
-   prev->next = node;
-   /*
-* Wait until the waiting flag is off
-*/
-   while (smp_load_acquire(&node->wait))
-   arch_mutex_cpu_relax();
-   }
-}
-
-/**
- * signal_next - Signal the next one in queue to be at the head
- * @lock: Pointer to queue rwlock structure
- * @node: Node pointer to the current head of queue
- */
-static inline void signal_next(struct qrwlock *lock, struct qrwnode *node)
-{
-   struct qrwnode *next;
-
-   /*
-* Try to notify the next node first without disturbing the cacheline
-* of the lock. If that fails, check to see if it is the last node
-* and so should clear the wait queue.
-*/
-   next = ACCESS_ONCE(node->next);
-   if (likely(next))
-   goto notify_next;
-
-   /*
-* Clear the wait queue if it is the last node
-*/
-   if ((ACCESS_ONCE(lock->waitq) == node) &&
-   (cmpxchg(&lock->waitq, node, NULL) == node))
-   return;
-   /*
-* Wait until the next one in queue set up the next field
-*/
-   while (likely(!(next = ACCESS_ONCE(node->next))))
-   arch_mutex_cpu_relax();
-   /*
-* The next one in queue is now at the head
-*/
-notify_next:
-   smp_store_release(&next->wait, false);
-}
-
-/**
  * rspin_until_writer_unlock - inc reader count & spin until writer is gone
  * @lock  : Pointer to queue rwlock structure
  * @writer: Current queue rwlock writer status byte
@@ -138,7 +81,7 @@ rspin_until_writer_unlock(struct qrwlock *lock, u32 rwc)
  */
 void queue_read_lock_slowpath(struct qrwlock *lock)
 {
-   struct qrwnode node;
+   struct mcs_spinlock node;
union qrwcnts cnts;
 
/*
@@ -158,7 +101,7 @@ void queue_read_lock_slowpath(struct qrwlock *lock)
/*
 * Put the reader into the wait queue
 */
-   wait_in_queue(lock, &node);
+   mcs_spin_lock(&lock->waitq, &node);
 
/*
 * At the head of the wait queue now, wait until the writer state
@@ -175,7 +118,7 @@ void queue_read_lock_slowpath(struct qrwlock *lock)
/*
 * Signal the next one in queue to become queue head
 */
-   signal_next(lock, &node);
+   mcs_spin_unlock(&lock->waitq, &node);
 }
 EXPORT_SYMBOL(queue_read_lock_slowpath);
 
@@ -231,18 +174,18 @@ static inline void queue_write_3step_lock(struct qrwlock 
*lock)
  */
 void queue_write_lock_slowpath(struct qrwlock *lock)
 {
-   struct qrwnode node;
+   struct mcs_spinlock node;
 
/*
 * Put the writer into the wait queue
 */
-   wait_in_queue(lock, &node);
+   mcs_spin_lock(&lock->waitq, &node);
 
/*
 * At the head of the wait queue now, call queue_write_3step_lock()
 * to acquire the lock until it is done.
 */
queue_write_3step_lock(lock);
-   signal_next(lock, &node);
+   

Re: [PATCH v3 13/14] mm, hugetlb: retry if failed to allocate and there is concurrent user

2014-01-14 Thread Davidlohr Bueso
On Tue, 2014-01-14 at 19:08 -0800, David Rientjes wrote:
> On Mon, 6 Jan 2014, Davidlohr Bueso wrote:
> 
> > > If Andrew agree, It would be great to merge 1-7 patches into mainline
> > > before your mutex approach. There are some of clean-up patches and, IMO,
> > > it makes the code more readable and maintainable, so it is worth to merge
> > > separately.
> > 
> > Fine by me.
> > 
> 
> It appears like patches 1-7 are still missing from linux-next, would you 
> mind posting them in a series with your approach?

I haven't looked much into patches 4-7, but at least the first three are
ok. I was waiting for Andrew to take all seven for linux-next and then
I'd rebase my approach on top. Anyway, unless Andrew has any
preferences, if by later this week they're not picked up, I'll resend
everything.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: linux-next: build failure after merge of the device-mapper tree

2014-01-14 Thread Mike Snitzer
On Tue, Jan 14 2014 at 10:52pm -0500,
Stephen Rothwell  wrote:

> Hi all,
> 
> After merging the device-mapper tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
> 
> ERROR: ".dm_bufio_get_device_size" [drivers/md/dm-snapshot.ko] undefined!
> ERROR: ".dm_bufio_release" [drivers/md/dm-snapshot.ko] undefined!
> ERROR: ".dm_bufio_client_destroy" [drivers/md/dm-snapshot.ko] undefined!
> ERROR: ".dm_bufio_prefetch" [drivers/md/dm-snapshot.ko] undefined!
> ERROR: ".dm_bufio_set_minimum_buffers" [drivers/md/dm-snapshot.ko] undefined!
> ERROR: ".dm_bufio_forget" [drivers/md/dm-snapshot.ko] undefined!
> ERROR: ".dm_bufio_client_create" [drivers/md/dm-snapshot.ko] undefined!
> ERROR: ".dm_bufio_read" [drivers/md/dm-snapshot.ko] undefined!
> 
> Presumably caused by commit b41bf7440bcf ("dm snapshot: use dm-bufio").

Hi Stephen,

That commit was missing a Kconfig update to have DM_SNAPSHOT select
DM_BUFIO.  I've rebased the "dm snapshot: use dm-bufio" commit to include
the Kconfig change and pushed to 'for-next'.

Thanks,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] sched: find the latest idle cpu

2014-01-14 Thread Michael wang
Hi, Alex

On 01/15/2014 12:07 PM, Alex Shi wrote:
[snip]  }
> +#ifdef CONFIG_NO_HZ_COMMON
> + /*
> +  * Coarsely to get the latest idle cpu for shorter latency and
> +  * possible power benefit.
> +  */
> + if (!min_load) {
> + struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
> +
> + s64 latest_wake = 0;

I guess we missed some code for latest_wake here?

Regards,
Michael Wang

> + /* idle cpu doing irq */
> + if (ts->inidle && !ts->idle_active)
> + idlest = i;
> + /* the cpu resched */
> + else if (!ts->inidle)
> + idlest = i;
> + /* find latest idle cpu */
> + else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
> + idlest = i;
> + }
> +#endif
>   }
> 
>   return idlest;
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/3] zram: rework reported to end-user zram statistics

2014-01-14 Thread Minchan Kim
Hello Sergey,

On Tue, Jan 14, 2014 at 12:37:40PM +0300, Sergey Senozhatsky wrote:
> 1) Introduce ZRAM_ATTR_RO macro to generate zram atomic64_t stats
> `show' functions and reduce code duplication.
> 
> 2) Account and report back to user numbers of failed READ and WRITE
> operations.
> 
> 3) Remove `good' and `bad' compressed sub-request stats. A RW request may
> cause a number of RW sub-requests. zram used to account `good' compressed
> sub-requests (with compressed size less than 50% of original size) and `bad'
> compressed sub-requests (with compressed size greater than 75% of original
> size), leaving sub-requests with compressed size between 50% and 75% of
> original size not accounted and not reported. Account each sub-request's
> compressed size so we can calculate the real device compression ratio.
> 
> 4) reported zram stats:
>   - num_writes  -- number of writes
>   - num_reads  -- number of reads
>   - pages_stored -- number of pages currently stored
>   - compressed_size -- compressed size of pages stored
>   - pages_zero  -- number of zero filled pages
>   - failed_read -- number of failed reads
>   - failed_writes -- can happen when memory is too low
>   - invalid_io   -- non-page-aligned I/O requests
>   - notify_free  -- number of swap slot free notifications
>   - memory_used -- zs pool zs_get_total_size_bytes()
> 
> Signed-off-by: Sergey Senozhatsky 

So this patch includes some cleanup and behavior changes?
Please separate them, and have each behavior-change patch explain
why it is bad or good (i.e., the motivation).

That would make the reviewer/maintainer happy, and you as well, because
some of the changes could be picked up while others could be dropped.
Sorry for bothering you.

Thanks.

> ---
>  drivers/block/zram/zram_drv.c | 167 
> --
>  drivers/block/zram/zram_drv.h |  17 ++---
>  2 files changed, 54 insertions(+), 130 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 2a7682c..8bddaff 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -42,6 +42,17 @@ static struct zram *zram_devices;
>  /* Module params (documentation at end) */
>  static unsigned int num_devices = 1;
>  
> +#define ZRAM_ATTR_RO(name)   \
> +static ssize_t zram_attr_##name##_show(struct device *d, \
> + struct device_attribute *attr, char *b) \
> +{\
> + struct zram *zram = dev_to_zram(d); \
> + return sprintf(b, "%llu\n", \
> + (u64)atomic64_read(&zram->stats.name)); \
> +}\
> +static struct device_attribute dev_attr_##name = \
> + __ATTR(name, S_IRUGO, zram_attr_##name##_show, NULL);
> +
>  static inline int init_done(struct zram *zram)
>  {
>   return zram->meta != NULL;
> @@ -52,97 +63,36 @@ static inline struct zram *dev_to_zram(struct device *dev)
>   return (struct zram *)dev_to_disk(dev)->private_data;
>  }
>  
> -static ssize_t disksize_show(struct device *dev,
> - struct device_attribute *attr, char *buf)
> -{
> - struct zram *zram = dev_to_zram(dev);
> -
> - return sprintf(buf, "%llu\n", zram->disksize);
> -}
> -
> -static ssize_t initstate_show(struct device *dev,
> - struct device_attribute *attr, char *buf)
> -{
> - struct zram *zram = dev_to_zram(dev);
> -
> - return sprintf(buf, "%u\n", init_done(zram));
> -}
> -
> -static ssize_t num_reads_show(struct device *dev,
> - struct device_attribute *attr, char *buf)
> -{
> - struct zram *zram = dev_to_zram(dev);
> -
> - return sprintf(buf, "%llu\n",
> - (u64)atomic64_read(&zram->stats.num_reads));
> -}
> -
> -static ssize_t num_writes_show(struct device *dev,
> - struct device_attribute *attr, char *buf)
> -{
> - struct zram *zram = dev_to_zram(dev);
> -
> - return sprintf(buf, "%llu\n",
> - (u64)atomic64_read(&zram->stats.num_writes));
> -}
> -
> -static ssize_t invalid_io_show(struct device *dev,
> - struct device_attribute *attr, char *buf)
> -{
> - struct zram *zram = dev_to_zram(dev);
> -
> - return sprintf(buf, "%llu\n",
> - (u64)atomic64_read(&zram->stats.invalid_io));
> -}
> -
> -static ssize_t notify_free_show(struct device *dev,
> - struct device_attribute *attr, char *buf)
> -{
> - struct zram *zram = dev_to_zram(dev);
> -
> - return sprintf(buf, "%llu\n",
> - (u64)atomic64_read(&zram->stats.notify_free));
> -}
> -
> -static ssize_t zero_pages_show(struct device *dev,
> - struct device_attribute *attr, char *buf)
> -{
> - struct zram *zram = dev_to_zram(dev);
> -
> - return sprintf(buf, "%u\n", 

linux-next: manual merge of the iommu tree with the drm tree

2014-01-14 Thread Stephen Rothwell
Hi Joerg,

Today's linux-next merge of the iommu tree got a conflict in
drivers/gpu/drm/msm/Kconfig between commit 3083894f7f29 ("drm/msm:
COMPILE_TEST support") from the drm tree and commit 4c071c7b851b
("drm/msm: Fix link error with !MSM_IOMMU") from the iommu tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc drivers/gpu/drm/msm/Kconfig
index bb103fb4519e,d3de8e1ae915..
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@@ -2,7 -2,9 +2,8 @@@
  config DRM_MSM
tristate "MSM DRM"
depends on DRM
 -  depends on ARCH_MSM
 -  depends on ARCH_MSM8960
 +  depends on (ARCH_MSM && ARCH_MSM8960) || (ARM && COMPILE_TEST)
+   depends on MSM_IOMMU
select DRM_KMS_HELPER
select SHMEM
select TMPFS


pgpbZmQz1_4Qf.pgp
Description: PGP signature


[PATCH] misc: sram: cleanup the code

2014-01-14 Thread Xiubo Li
Since devm_gen_pool_create() is used, the gen_pool_destroy() call
here is redundant.

Signed-off-by: Xiubo Li 
---
 drivers/misc/sram.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/misc/sram.c b/drivers/misc/sram.c
index afe66571..e3e421d 100644
--- a/drivers/misc/sram.c
+++ b/drivers/misc/sram.c
@@ -87,8 +87,6 @@ static int sram_remove(struct platform_device *pdev)
if (gen_pool_avail(sram->pool) < gen_pool_size(sram->pool))
dev_dbg(&pdev->dev, "removed while SRAM allocated\n");
 
-   gen_pool_destroy(sram->pool);
-
if (sram->clk)
clk_disable_unprepare(sram->clk);
 
-- 
1.8.4




Notifying on empty cgroup

2014-01-14 Thread Victor Porton
I want to write software which needs to receive a signal when a cgroup it 
has created becomes empty. (After this the empty cgroup should be deleted, 
simply so as not to clutter memory.)

If the kernel does not support such notifications, it should be improved. 
This functionality is crucial for some kinds of software.

There is /sys/fs/cgroup/systemd/release_agent but I don't understand how to 
use it. I don't understand why we would need it at all.
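
(For reference, the intended v1 usage of release_agent is roughly the
following; a minimal sketch, assuming the hierarchy is mounted at
/sys/fs/cgroup/systemd, that the cgroup "mysandbox" already exists, and
that the helper path is purely illustrative:)

/* Minimal sketch of the cgroup v1 notify_on_release mechanism.  The
 * helper binary name is illustrative; the kernel execs it with the
 * emptied cgroup's path as argv[1]. */
#include <stdio.h>
#include <stdlib.h>

static void write_file(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f || fputs(val, f) == EOF || fclose(f) == EOF) {
		perror(path);
		exit(1);
	}
}

int main(void)
{
	/* set once at the hierarchy root (root only) */
	write_file("/sys/fs/cgroup/systemd/release_agent",
		   "/usr/local/bin/cgroup-empty-helper");

	/* per-cgroup opt-in: run the agent when this cgroup becomes empty */
	write_file("/sys/fs/cgroup/systemd/mysandbox/notify_on_release", "1");

	return 0;
}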

Starting a binary when a cgroup empties, just to notify another binary, 
looks like big overkill. Also, my program should work from userspace without 
needing release_agent, which can be written only by root.

Note that my work is related to sandboxing software (running a program in a 
closed environment, so that it would be unable, for example, to remove the 
user's files).

See also
http://portonsoft.wordpress.com/2014/01/11/toward-robust-linux-sandbox/

-- 
Victor Porton - http://portonvictor.org


[RFC PATCH] sched: find the latest idle cpu

2014-01-14 Thread Alex Shi
Currently we just try to find the least loaded cpu. If some cpus are idle,
we simply pick the first cpu in the cpu mask.

In fact we can pick the idle cpu that is currently handling an interrupt, or
the one that went idle most recently, and so gain on both latency and power.
The selected cpu may not be the best, since another cpu may be interrupted
while we are selecting, but being more careful than this costs too much.

Signed-off-by: Alex Shi 
---
 kernel/sched/fair.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..fb52d26 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,26 @@ find_idlest_cpu(struct sched_group *group, struct 
task_struct *p, int this_cpu)
min_load = load;
idlest = i;
}
+#ifdef CONFIG_NO_HZ_COMMON
+   /*
+* Coarsely to get the latest idle cpu for shorter latency and
+* possible power benefit.
+*/
+   if (!min_load) {
+   struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+   s64 latest_wake = 0;
+   /* idle cpu doing irq */
+   if (ts->inidle && !ts->idle_active)
+   idlest = i;
+   /* the cpu resched */
+   else if (!ts->inidle)
+   idlest = i;
+   /* find latest idle cpu */
+   else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
+   idlest = i;
+   }
+#endif
}
 
return idlest;
-- 
1.8.1.2



linux-next: manual merge of the md tree with the block tree

2014-01-14 Thread Stephen Rothwell
Hi Neil,

Today's linux-next merge of the md tree got a conflict in
drivers/md/raid10.c between commit 4f024f3797c4 ("block: Abstract out
bvec iterator") from the block tree and commit b50c259e25d9 ("md/raid10:
fix two bugs in handling of known-bad-blocks") from the md tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc drivers/md/raid10.c
index 6d43d88657aa,8d39d63281b9..
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@@ -1256,8 -1319,8 +1256,8 @@@ read_again
/* Could not read all from this device, so we will
 * need another r10_bio.
 */
-   sectors_handled = (r10_bio->sectors + max_sectors
+   sectors_handled = (r10_bio->sector + max_sectors
 - - bio->bi_sector);
 + - bio->bi_iter.bi_sector);
r10_bio->sectors = max_sectors;
spin_lock_irq(&conf->device_lock);
if (bio->bi_phys_segments == 0)




Re: [PATCH v4 1/3] Send loginuid and sessionid in SCM_AUDIT

2014-01-14 Thread Richard Guy Briggs
On 14/01/13, Jan Kaluza wrote:
> Server-like processes in many cases need credentials and other
> metadata of the peer, to decide if the calling process is allowed to
> request a specific action, or the server just wants to log away this
> type of information for auditing tasks.
> 
> The current practice to retrieve such process metadata is to look that
> information up in procfs with the $PID received over SCM_CREDENTIALS.
> This is sufficient for long-running tasks, but introduces a race which
> cannot be worked around for short-living processes; the calling
> process and all the information in /proc/$PID/ is gone before the
> receiver of the socket message can look it up.
> 
> This introduces a new SCM type called SCM_AUDIT to allow the direct
> attaching of "loginuid" and "sessionid" to SCM, which is significantly more
> efficient and will reliably avoid the race with the round-trip over
> procfs.
> 
> Signed-off-by: Jan Kaluza 

Acked-by: Richard Guy Briggs 

> ---
>  include/linux/socket.h |  6 ++
>  include/net/af_unix.h  |  2 ++
>  include/net/scm.h  | 28 ++--
>  net/unix/af_unix.c |  7 +++
>  4 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index 5d488a6..eeac565 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -130,6 +130,7 @@ static inline struct cmsghdr * cmsg_nxthdr (struct msghdr 
> *__msg, struct cmsghdr
>  #define  SCM_RIGHTS  0x01/* rw: access rights (array of 
> int) */
>  #define SCM_CREDENTIALS 0x02 /* rw: struct ucred */
>  #define SCM_SECURITY 0x03/* rw: security label   */
> +#define SCM_AUDIT0x04/* rw: struct uaudit*/
>  
>  struct ucred {
>   __u32   pid;
> @@ -137,6 +138,11 @@ struct ucred {
>   __u32   gid;
>  };
>  
> +struct uaudit {
> + __u32   loginuid;
> + __u32   sessionid;
> +};
> +
>  /* Supported address families. */
>  #define AF_UNSPEC0
>  #define AF_UNIX  1   /* Unix domain sockets  */
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index a175ba4..3b9d22a 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -36,6 +36,8 @@ struct unix_skb_parms {
>   u32 secid;  /* Security ID  */
>  #endif
>   u32 consumed;
> + kuid_t  loginuid;
> + unsigned int sessionid;
>  };
>  
>  #define UNIXCB(skb)  (*(struct unix_skb_parms *)&((skb)->cb))
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 262532d..67de64f 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /* Well, we should have at least one descriptor open
>   * to accept passed FDs 8)
> @@ -18,6 +19,11 @@ struct scm_creds {
>   kgid_t  gid;
>  };
>  
> +struct scm_audit {
> + kuid_t loginuid;
> + unsigned int sessionid;
> +};
> +
>  struct scm_fp_list {
>   short   count;
>   short   max;
> @@ -28,6 +34,7 @@ struct scm_cookie {
>   struct pid  *pid;   /* Skb credentials */
>   struct scm_fp_list  *fp;/* Passed files */
>   struct scm_creds creds;  /* Skb credentials  */
> + struct scm_audit audit;  /* Skb audit */
>  #ifdef CONFIG_SECURITY_NETWORK
>   u32 secid;  /* Passed security ID   */
>  #endif
> @@ -58,6 +65,13 @@ static __inline__ void scm_set_cred(struct scm_cookie *scm,
>   scm->creds.gid = gid;
>  }
>  
> +static inline void scm_set_audit(struct scm_cookie *scm,
> + kuid_t loginuid, unsigned int sessionid)
> +{
> + scm->audit.loginuid = loginuid;
> + scm->audit.sessionid = sessionid;
> +}
> +
>  static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
>  {
>   put_pid(scm->pid);
> @@ -77,8 +91,12 @@ static __inline__ int scm_send(struct socket *sock, struct 
> msghdr *msg,
>   memset(scm, 0, sizeof(*scm));
>   scm->creds.uid = INVALID_UID;
>   scm->creds.gid = INVALID_GID;
> - if (forcecreds)
> - scm_set_cred(scm, task_tgid(current), current_uid(), 
> current_gid());
> + if (forcecreds) {
> + scm_set_cred(scm, task_tgid(current), current_uid(),
> +  current_gid());
> + scm_set_audit(scm, audit_get_loginuid(current),
> +   audit_get_sessionid(current));
> + }
>   unix_get_peersec_dgram(sock, scm);
>   if (msg->msg_controllen <= 0)
>   return 0;
> @@ -123,7 +141,13 @@ static __inline__ void scm_recv(struct socket *sock, 
> struct msghdr *msg,
>   .uid = from_kuid_munged(current_ns, scm->creds.uid),
>   .gid = 
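
(The quoted hunk is truncated here by the archive.  For reference, a sketch
of how a userspace server might consume the proposed ancillary data; it
assumes the socket is set up so that the kernel attaches and delivers it
(the patch piggy-backs on the forcecreds path, and the receive side is cut
off above), and it redefines SCM_AUDIT and struct uaudit locally because no
installed header provides them:)

/* Hedged sketch: reading the proposed SCM_AUDIT data on an AF_UNIX socket. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/types.h>

#ifndef SCM_AUDIT
#define SCM_AUDIT 0x04
struct uaudit {
	__u32 loginuid;
	__u32 sessionid;
};
#endif

static void print_peer_audit(int fd)
{
	char data[128];
	char cbuf[CMSG_SPACE(sizeof(struct uaudit))];
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg;
	struct uaudit au;

	if (recvmsg(fd, &msg, 0) < 0)
		return;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level != SOL_SOCKET ||
		    cmsg->cmsg_type != SCM_AUDIT)
			continue;
		memcpy(&au, CMSG_DATA(cmsg), sizeof(au));
		printf("peer loginuid=%u sessionid=%u\n",
		       (unsigned int)au.loginuid, (unsigned int)au.sessionid);
	}
}

The CMSG_* iteration is the same pattern already used for SCM_CREDENTIALS,
which is presumably the point of reusing SCM for this.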

Re: [STABLE] find missing bug fixes in a stable kernel

2014-01-14 Thread Greg Kroah-Hartman
On Tue, Jan 14, 2014 at 09:37:22AM +0800, Li Zefan wrote:
> On 2014/1/13 23:57, Greg Kroah-Hartman wrote:
> > On Mon, Jan 13, 2014 at 03:28:11PM +0800, Li Zefan wrote:
> >> We have several long-term and extended stable kernels, and it's possible
> >> that a bug fix is in some stable versions but is missing in some other
> >> versions, so I've written a script to find out those fixes.
> >>
> >> Take 3.4.xx and 3.2.xx for example. If a bug fix was merged into upstream
> >> kernel after 3.4, and then it was backported to 3.2.xx, then it probably
> >> needs to be backported to 3.4.xx.
> > 
> > I agree.
> > 
> >> The result is, there're ~430 bug fixes in 3.2.xx that probably need to be
> >> backported to 3.4.xx. Given there're about 4500 commits in 3.2.xx, that
> >> is ~10%, which is quite a big number for stable kernels.
> > 
> > That's a really big number, how am I missing so many patches for the 3.4
> > kernel?  Is it because people are doing backports to 3.2 for patches
> > that didn't apply to 3.4?  Or are these patches being applied that do
> > not have -stable markings on them?  Or something else?
> > 
> 
> I guess the biggest reason is, most people tag a patch with stable without
> specifying kernel versions, and if this patch can't be applied to 3.4, it
> will be dropped silently. I guess Ben has been checking this kind of patches
> manually.
> 
> >> We (our team in Huawei) are going to go through the whole list to filter
> >> out fixes that're applicable for 3.4.xx.

Please do this, I don't have the resources at the moment to be able to
go through all of these git ids to see if they really are relevant or
not right now.  Any help that you can provide with this for 3.4 or 3.10
would be most appreciated.

thanks,

greg k-h


Re: [PATCH v4 2/3] Send comm and cmdline in SCM_PROCINFO

2014-01-14 Thread Richard Guy Briggs
On 14/01/13, Jan Kaluza wrote:
> Server-like processes in many cases need credentials and other
> metadata of the peer, to decide if the calling process is allowed to
> request a specific action, or the server just wants to log away this
> type of information for auditing tasks.
> 
> The current practice to retrieve such process metadata is to look that
> information up in procfs with the $PID received over SCM_CREDENTIALS.
> This is sufficient for long-running tasks, but introduces a race which
> cannot be worked around for short-living processes; the calling
> process and all the information in /proc/$PID/ is gone before the
> receiver of the socket message can look it up.
> 
> This introduces a new SCM type called SCM_PROCINFO to allow the direct
> attaching of "comm" and "cmdline" to SCM, which is significantly more
> efficient and will reliably avoid the race with the round-trip over
> procfs.
> 
> To achieve that, a new struct called unix_skb_parms_scm had to be created,
> because otherwise unix_skb_parms would be too big.
> 
> scm_get_current_procinfo is inspired by ./fs/proc/base.c.
> 
> Signed-off-by: Jan Kaluza 

Acked-by: Richard Guy Briggs 

> ---
>  include/linux/socket.h |  2 ++
>  include/net/af_unix.h  | 11 +++--
>  include/net/scm.h  | 24 +++
>  net/core/scm.c | 65 
> ++
>  net/unix/af_unix.c | 57 +--
>  5 files changed, 150 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index eeac565..5a41f35 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -131,6 +131,8 @@ static inline struct cmsghdr * cmsg_nxthdr (struct msghdr 
> *__msg, struct cmsghdr
>  #define SCM_CREDENTIALS 0x02 /* rw: struct ucred */
>  #define SCM_SECURITY 0x03/* rw: security label   */
>  #define SCM_AUDIT0x04/* rw: struct uaudit*/
> +#define SCM_PROCINFO 0x05/* rw: comm + cmdline (NULL terminated
> +array of char *) */
>  
>  struct ucred {
>   __u32   pid;
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 3b9d22a..05c7678 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -27,6 +27,13 @@ struct unix_address {
>   struct sockaddr_un name[0];
>  };
>  
> +struct unix_skb_parms_scm {
> + kuid_t loginuid;
> + unsigned int sessionid;
> + char *procinfo;
> + int procinfo_len;
> +};
> +
>  struct unix_skb_parms {
>   struct pid  *pid;   /* Skb credentials  */
>   kuid_t  uid;
> @@ -36,12 +43,12 @@ struct unix_skb_parms {
>   u32 secid;  /* Security ID  */
>  #endif
>   u32 consumed;
> - kuid_t  loginuid;
> - unsigned int sessionid;
> + struct unix_skb_parms_scm *scm;
>  };
>  
>  #define UNIXCB(skb)  (*(struct unix_skb_parms *)&((skb)->cb))
>  #define UNIXSID(skb) (&UNIXCB((skb)).secid)
> +#define UNIXSCM(skb) (*(UNIXCB((skb)).scm))
>  
>  #define unix_state_lock(s)   spin_lock(&unix_sk(s)->lock)
>  #define unix_state_unlock(s) spin_unlock(&unix_sk(s)->lock)
> diff --git a/include/net/scm.h b/include/net/scm.h
> index 67de64f..f084e19 100644
> --- a/include/net/scm.h
> +++ b/include/net/scm.h
> @@ -30,11 +30,17 @@ struct scm_fp_list {
>   struct file *fp[SCM_MAX_FD];
>  };
>  
> +struct scm_procinfo {
> + char *procinfo;
> + int len;
> +};
> +
>  struct scm_cookie {
>   struct pid  *pid;   /* Skb credentials */
>   struct scm_fp_list  *fp;/* Passed files */
>   struct scm_creds creds;  /* Skb credentials  */
>   struct scm_audit audit;  /* Skb audit */
> + struct scm_procinfo procinfo;   /* Skb procinfo */
>  #ifdef CONFIG_SECURITY_NETWORK
>   u32 secid;  /* Passed security ID   */
>  #endif
> @@ -45,6 +51,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct 
> scm_cookie *scm);
>  int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie 
> *scm);
>  void __scm_destroy(struct scm_cookie *scm);
>  struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl);
> +int scm_get_current_procinfo(char **procinfo);
>  
>  #ifdef CONFIG_SECURITY_NETWORK
>  static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct 
> scm_cookie *scm)
> @@ -72,10 +79,20 @@ static inline void scm_set_audit(struct scm_cookie *scm,
>   scm->audit.sessionid = sessionid;
>  }
>  
> +static inline void scm_set_procinfo(struct scm_cookie *scm,
> + char *procinfo, int len)
> +{
> + scm->procinfo.procinfo = procinfo;
> + scm->procinfo.len = len;
> +}
> +
>  static __inline__ void scm_destroy_cred(struct scm_cookie *scm)
>  {
>  
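
(The quoted hunk is truncated here.  For contrast, the racy procfs
round-trip that this and the previous patch set out to replace looks
roughly like the sketch below; if the short-lived peer identified via
SCM_CREDENTIALS has already exited, the open simply fails:)

/* The racy pattern the series wants to avoid: resolve the peer's comm
 * from the pid obtained via SCM_CREDENTIALS.  By the time we look, a
 * short-lived peer may already be gone from /proc. */
#include <stdio.h>
#include <sys/types.h>

static int peer_comm(pid_t pid, char *buf, size_t len)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/comm", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* peer already exited: the race */
	if (!fgets(buf, (int)len, f))
		buf[0] = '\0';
	fclose(f);
	return 0;
}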

linux-next: manual merge of the md tree with the block tree

2014-01-14 Thread Stephen Rothwell
Hi Neil,

Today's linux-next merge of the md tree got a conflict in
drivers/md/raid1.c between commit 4f024f3797c4 ("block: Abstract out bvec
iterator") from the block tree and commit 41a336e01188 ("md/raid1: fix
request counting bug in new 'barrier' code") from the md tree.

I fixed it up (a line fixed in the latter was removed by the former) and
can carry the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: [PATCH v4] Staging: comedi: convert while loop to timeout in ni_mio_common.c

2014-01-14 Thread Greg KH
On Tue, Jan 14, 2014 at 06:23:05PM -0600, Chase Southwood wrote:
> This patch for ni_mio_common.c changes out a while loop for a timeout,
> which is preferred.
> 
> Signed-off-by: Chase Southwood 
> ---
> 
> OK, here's another go at it.  Hopefully everything looks more correct
> this time.  Greg, I've followed the pattern you gave me, and I really
> appreciate all of the tips!  As always, just let me know if there are
> still things that need adjusting (especially length of timeout, udelay,
> etc.).
> 
> Thanks,
> Chase Southwood
> 
> 2: Changed from simple clean-up to swapping a timeout in for a while loop.
> 
> 3: Removed extra counter variable, and added error checking.
> 
> 4: No longer using counter variable, using jiffies instead.
> 
>  drivers/staging/comedi/drivers/ni_mio_common.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/comedi/drivers/ni_mio_common.c 
> b/drivers/staging/comedi/drivers/ni_mio_common.c
> index 457b884..882b249 100644
> --- a/drivers/staging/comedi/drivers/ni_mio_common.c
> +++ b/drivers/staging/comedi/drivers/ni_mio_common.c
> @@ -687,12 +687,21 @@ static void ni_clear_ai_fifo(struct comedi_device *dev)
>  {
>   const struct ni_board_struct *board = comedi_board(dev);
>   struct ni_private *devpriv = dev->private;
> + unsigned long timeout;
>  
>   if (board->reg_type == ni_reg_6143) {
>   /*  Flush the 6143 data FIFO */
>   ni_writel(0x10, AIFIFO_Control_6143);   /*  Flush fifo */
>   ni_writel(0x00, AIFIFO_Control_6143);   /*  Flush fifo */
> - while (ni_readl(AIFIFO_Status_6143) & 0x10) ;   /*  Wait for 
> complete */
> + /*  Wait for complete */
> + timeout = jiffies + msecs_to_jiffies(100);
> + while (ni_readl(AIFIFO_Status_6143) & 0x10) {
> + if (time_after(jiffies, timeout)) {
> + comedi_error(dev, "FIFO flush timeout.");
> + break;
> + }
> + udelay(1);

Sleep for at least 10, as I think that's the smallest delay you can
actually get anyway (meaning the delay will be at least that long no matter
what number less than 10 you put there, depending on the hardware used, of
course).
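
Roughly, something like this (illustrative only, not a tested change;
usleep_range() would also be an option if this path is allowed to sleep):

	/*  Wait for complete */
	timeout = jiffies + msecs_to_jiffies(100);
	while (ni_readl(AIFIFO_Status_6143) & 0x10) {
		if (time_after(jiffies, timeout)) {
			comedi_error(dev, "FIFO flush timeout.");
			break;
		}
		udelay(10);	/* bumped from 1 per the comment above */
	}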

Other than that, looks much better.

thanks,

greg k-h


Re: [patch 7/9] mm: thrash detection-based file cache sizing

2014-01-14 Thread Zhang Yanfei
Hello

On 01/15/2014 10:57 AM, Bob Liu wrote:
> 
> On 01/15/2014 03:16 AM, Johannes Weiner wrote:
>> On Tue, Jan 14, 2014 at 09:01:09AM +0800, Bob Liu wrote:
>>> Hi Johannes,
>>>
>>> On 01/11/2014 02:10 AM, Johannes Weiner wrote:
 The VM maintains cached filesystem pages on two types of lists.  One
 list holds the pages recently faulted into the cache, the other list
 holds pages that have been referenced repeatedly on that first list.
 The idea is to prefer reclaiming young pages over those that have
 shown to benefit from caching in the past.  We call the recently used
 list "inactive list" and the frequently used list "active list".

 Currently, the VM aims for a 1:1 ratio between the lists, which is the
 "perfect" trade-off between the ability to *protect* frequently used
 pages and the ability to *detect* frequently used pages.  This means
 that working set changes bigger than half of cache memory go
 undetected and thrash indefinitely, whereas working sets bigger than
 half of cache memory are unprotected against used-once streams that
 don't even need caching.

>>>
>>> Good job! This patch looks good to me and with nice descriptions.
>>> But it seems that this patch only fix the issue "working set changes
>>> bigger than half of cache memory go undetected and thrash indefinitely".
>>> My concern is could it be extended easily to address all other issues
>>> based on this patch set?
>>>
>>> The other possible way is something like Peter has implemented the CART
>>> and Clock-Pro which I think may be better because of using advanced
>>> algorithms and consider the problem as a whole from the beginning.(Sorry
>>> I haven't get enough time to read the source code, so I'm not 100% sure.)
>>> http://linux-mm.org/PeterZClockPro2
>>
>> My patches are moving the VM towards something that is comparable to
>> how Peter implemented Clock-Pro.  However, the current VM has evolved
>> over time in small increments based on real life performance
>> observations.  Rewriting everything in one go would be incredibly
>> disruptive and I doubt very much we would merge any such proposal in
>> the first place.  So it's not like I don't see the big picture, it's
>> just divide and conquer:
>>
>> Peter's Clock-Pro implementation was basically a double clock with an
>> intricate system to classify hotness, augmented by eviction
>> information to work with reuse distances independent of memory size.
>>
>> What we have right now is a double clock with a very rudimentary
>> system to classify whether a page is hot: it has been accessed twice
>> while on the inactive clock.  My patches now add eviction information
>> to this, and improve the classification so that it can work with reuse
>> distances up to memory size and is no longer dependent on the inactive
>> clock size.
>>
>> This is the smallest imaginable step that is still useful, and even
>> then we had a lot of discussions about scalability of the data
>> structures and confusion about how the new data point should be
>> interpreted.  It also took a long time until somebody read the series
>> and went, "Ok, this actually makes sense to me."  Now, maybe I suck at
>> documenting, but maybe this is just complicated stuff.  Either way, we
>> have to get there collectively, so that the code is maintainable in
>> the long term.
>>
>> Once we have these new concepts established, we can further improve
>> the hotness detector so that it can classify and order pages with
>> reuse distances beyond memory size.  But this will come with its own
>> set of problems.  For example, some time ago we stopped regularly
>> scanning and rotating active pages because of scalability issues, but
>> we'll most likely need an uptodate estimate of the reuse distances on
>> the active list in order to classify refaults properly.
>>
> 
> Thank you for your kind explanation. It makes sense to me, please feel
> free to add my review.
> 
 + * Approximating inactive page access frequency - Observations:
 + *
 + * 1. When a page is accessed for the first time, it is added to the
 + *head of the inactive list, slides every existing inactive page
 + *towards the tail by one slot, and pushes the current tail page
 + *out of memory.
 + *
 + * 2. When a page is accessed for the second time, it is promoted to
 + *the active list, shrinking the inactive list by one slot.  This
 + *also slides all inactive pages that were faulted into the cache
 + *more recently than the activated page towards the tail of the
 + *inactive list.
 + *
>>>
>>> Nitpick, how about the reference bit?
>>
>> What do you mean?
>>
> 
> Sorry, I mean the PG_referenced flag. I thought that when a page is accessed
> for the second time, only the PG_referenced flag will be set instead of the
> page being promoted to the active list.
> 

No. Let me try to explain a bit. For mapped file pages, if the second access
occurs on a 
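
(The rest of this explanation is cut off in the archive.  For context, here
is a toy userspace model of the two-list promotion rule described in the
quoted observations above; it is deliberately simplified, ignores
PG_referenced and the mapped-page case entirely, and is not kernel code:)

/* Toy model: first access puts a page at the inactive head and pushes the
 * tail out; a second access while still on the inactive list promotes the
 * page to the active set. */
#include <stdio.h>

#define NR_INACTIVE 4

enum where { NOT_CACHED, INACTIVE, ACTIVE };

static enum where state[16];
static int inactive[NR_INACTIVE];	/* most recent fault at index 0 */

static void access_page(int page)
{
	int i;

	if (state[page] == ACTIVE)
		return;

	if (state[page] == INACTIVE) {
		/* second access while still cached: promote */
		state[page] = ACTIVE;
		for (i = 0; i < NR_INACTIVE; i++)
			if (inactive[i] == page)
				inactive[i] = -1;
		printf("page %d activated\n", page);
		return;
	}

	/* first access (or refault): push onto the head, evict the tail */
	if (inactive[NR_INACTIVE - 1] >= 0) {
		state[inactive[NR_INACTIVE - 1]] = NOT_CACHED;
		printf("page %d evicted\n", inactive[NR_INACTIVE - 1]);
	}
	for (i = NR_INACTIVE - 1; i > 0; i--)
		inactive[i] = inactive[i - 1];
	inactive[0] = page;
	state[page] = INACTIVE;
}

int main(void)
{
	int trace[] = { 1, 2, 3, 1, 4, 5, 6, 7, 2 };
	int i;

	for (i = 0; i < NR_INACTIVE; i++)
		inactive[i] = -1;
	for (i = 0; i < (int)(sizeof(trace) / sizeof(trace[0])); i++)
		access_page(trace[i]);
	return 0;
}

In this trace page 2 is touched again only after it has already fallen off
the inactive list, so its reuse goes undetected; that is exactly the case
the refault-distance work discussed above is meant to catch.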

linux-next: build failure after merge of the device-mapper tree

2014-01-14 Thread Stephen Rothwell
Hi all,

After merging the device-mapper tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

ERROR: ".dm_bufio_get_device_size" [drivers/md/dm-snapshot.ko] undefined!
ERROR: ".dm_bufio_release" [drivers/md/dm-snapshot.ko] undefined!
ERROR: ".dm_bufio_client_destroy" [drivers/md/dm-snapshot.ko] undefined!
ERROR: ".dm_bufio_prefetch" [drivers/md/dm-snapshot.ko] undefined!
ERROR: ".dm_bufio_set_minimum_buffers" [drivers/md/dm-snapshot.ko] undefined!
ERROR: ".dm_bufio_forget" [drivers/md/dm-snapshot.ko] undefined!
ERROR: ".dm_bufio_client_create" [drivers/md/dm-snapshot.ko] undefined!
ERROR: ".dm_bufio_read" [drivers/md/dm-snapshot.ko] undefined!

Presumably caused by commit b41bf7440bcf ("dm snapshot: use dm-bufio").

I have used the device-mapper tree from next-20140114 for today.

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: [RFC] sysfs_rename_link() and its usage

2014-01-14 Thread Ding Tianhong
On 2014/1/15 1:17, Veaceslav Falico wrote:
> Hi,
> 
> I'm hitting a strange issue and/or I'm completely lost in sysfs internals.
> 
> Consider having two net_device *a, *b; which are registered normally.
> Now, to create a link from /sys/class/net/a->name/linkname to b, one should
> use:
> 
> sysfs_create_link(&(a->dev.kobj), &(b->dev.kobj), linkname);
> 
> To remove it, even simpler:
> 
> sysfs_remove_link(&(a->dev.kobj), linkname);
> 
> This works like a charm. However, if I want to use (obviously, with the
> symlink present):
> 
> sysfs_rename_link(&(a->dev.kobj), &(b->dev.kobj), oldname, newname);
> 
> this fails with:
> 
> "sysfs: ns invalid in 'a->name' for 'oldname'"
> 
> in
> 
>  608 struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
> ...
>  615 if (!!sysfs_ns_type(parent_sd) != !!ns) {
>  616 WARN(1, KERN_WARNING "sysfs: ns %s in '%s' for '%s'\n",
>  617 sysfs_ns_type(parent_sd) ? "required" : 
> "invalid",
>  618 parent_sd->s_name, name);
>  619 return NULL;
>  620 }
> 
> Code path:
> warn_slowpath_fmt+0x46/0x50
> sysfs_get_dirent_ns+0x30/0x80
> sysfs_find_dirent+0x84/0x110
> sysfs_get_dirent_ns+0x3e/0x80
> sysfs_rename_link_ns+0x54/0xd0
> 
> I have no idea what this code means. Is there any reason for it to
> fail (i.e. am I doing something wrong?) or have I hit a bug?
> 
> I've tested the only user of it (bridge) - and it works fine, however it's
> not using its own net_device's kobject but rather its own dir.
> 

I used sysfs_rename_link() and hit the same problem. I reviewed the bridge
code and found that br->ifobj is created with kobject_create_and_add() to
add a subdir for this purpose; maybe that helps?
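
Roughly, an untested sketch of that approach (names illustrative):

/* Keep the links under our own sub-directory kobject instead of directly
 * in the netdev's kobject, so lookups/renames go through a kobject that
 * is not ns-tagged the way class/net entries are. */
#include <linux/errno.h>
#include <linux/kobject.h>
#include <linux/netdevice.h>
#include <linux/sysfs.h>

static struct kobject *linkdir;

static int my_links_init(struct net_device *a, struct net_device *b)
{
	int err;

	linkdir = kobject_create_and_add("mylinks", &a->dev.kobj);
	if (!linkdir)
		return -ENOMEM;

	err = sysfs_create_link(linkdir, &b->dev.kobj, "oldname");
	if (err) {
		kobject_put(linkdir);
		linkdir = NULL;
	}
	return err;
}

static int my_links_rename(struct net_device *b)
{
	return sysfs_rename_link(linkdir, &b->dev.kobj, "oldname", "newname");
}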

Ding

> Thank you!
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 




Re: [PATCH 1/2 v2] ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63

2014-01-14 Thread Brown, Aaron F
On Fri, 2013-12-27 at 01:02 -0800, Jeff Kirsher wrote:
> On Wed, 2013-12-25 at 00:12 +0800, Ethan Zhao wrote:
> > Because the ixgbe driver limits the max number of VF functions that can
> > be enabled to 63, define one macro IXGBE_MAX_VFS_DRV_LIMIT and clean up
> > the constant 63 in the code.
> > 
> > v2: fix a typo.
> > 
> > Signed-off-by: Ethan Zhao 
> > ---
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  | 4 ++--
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 5 +++--
> >  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h | 5 +
> >  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> Added to my queue, thanks Ethan!

Hi Ethan,

Did Jeff contact you about this failing to compile?  I'm currently
providing vacation coverage for him, and we found this was failing to
compile just before he left.  We captured the failure in our notes, but
there is no record of whether you were contacted or not.

Regardless, when I apply this patch (with or without patch 2-2) we get the
following error on a compilation attempt:

drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: In function
"ixgbe_sw_init":
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:5033: error: stray "\357"
in program
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:5033: error: stray "\274"
in program
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:5033: error: stray "\215"
in program
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:5033: error: expected ")"
before numeric constant
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: In function
"ixgbe_probe":
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7977: error: stray "\357"
in program
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7977: error: stray "\274"
in program
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7977: error: stray "\215"
in program
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7977: error: expected ")"
before numeric constant
make[5]: *** [drivers/net/ethernet/intel/ixgbe/ixgbe_main.o] Error 1
make[5]: *** Waiting for unfinished jobs
make[4]: *** [drivers/net/ethernet/intel/ixgbe] Error 2
make[4]: *** Waiting for unfinished jobs
make[3]: *** [drivers/net/ethernet/intel] Error 2
make[2]: *** [drivers/net/ethernet] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2
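
(For what it's worth, those three stray bytes form a single UTF-8 sequence:
they decode to U+FF0D, a fullwidth hyphen-minus, so the patch file has
apparently picked up a non-ASCII "-" somewhere.  A quick standalone check:)

/* Decode the stray octal bytes from the gcc error above. */
#include <stdio.h>

int main(void)
{
	const unsigned char s[] = { 0357, 0274, 0215 };
	unsigned int cp = ((s[0] & 0x0fu) << 12) |
			  ((s[1] & 0x3fu) << 6) |
			   (s[2] & 0x3fu);

	printf("U+%04X\n", cp);	/* prints U+FF0D */
	return 0;
}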


Thanks,
Aaron



Re: [PATCH] mm: nobootmem: avoid type warning about alignment value

2014-01-14 Thread Nicolas Pitre
On Sun, 12 Jan 2014, Russell King - ARM Linux wrote:

> This patch makes their types match exactly with x86's definitions of
> the same, which is the basic problem: on ARM, they all took "int" values
> and returned "int"s, which leads to min() in nobootmem.c complaining.
> 
>  arch/arm/include/asm/bitops.h | 54 
> +++
>  1 file changed, 44 insertions(+), 10 deletions(-)

For the record:

Acked-by: Nicolas Pitre 

The reason why macros were used at the time this was originally written 
is because gcc used to have issues forwarding the constant nature of a 
variable down multiple levels of inline functions and 
__builtin_constant_p() always returned false.  But that was quite a long 
time ago.
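
For what it's worth, a quick userspace check shows the constant does
survive a couple of levels of inlining nowadays when built with -O2, which
is what the fls() below relies on (volatile keeps the second case honest):

/* Compile with: gcc -O2 -o bcp bcp.c */
#include <stdio.h>

static inline int is_const(int x)
{
	return __builtin_constant_p(x);
}

static inline int one_level_deeper(int x)
{
	return is_const(x);
}

int main(void)
{
	volatile int v = 42;

	printf("literal : %d\n", one_level_deeper(7));	/* 1 with -O2 */
	printf("variable: %d\n", one_level_deeper(v));	/* 0 */
	return 0;
}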


> diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
> index e691ec91e4d3..b2e298a90d76 100644
> --- a/arch/arm/include/asm/bitops.h
> +++ b/arch/arm/include/asm/bitops.h
> @@ -254,25 +254,59 @@ static inline int constant_fls(int x)
>  }
>  
>  /*
> - * On ARMv5 and above those functions can be implemented around
> - * the clz instruction for much better code efficiency.
> + * On ARMv5 and above those functions can be implemented around the
> + * clz instruction for much better code efficiency.  __clz returns
> + * the number of leading zeros, zero input will return 32, and
> + * 0x8000 will return 0.
>   */
> +static inline unsigned int __clz(unsigned int x)
> +{
> + unsigned int ret;
> +
> + asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
>  
> + return ret;
> +}
> +
> +/*
> + * fls() returns zero if the input is zero, otherwise returns the bit
> + * position of the last set bit, where the LSB is 1 and MSB is 32.
> + */
>  static inline int fls(int x)
>  {
> - int ret;
> -
>   if (__builtin_constant_p(x))
>  return constant_fls(x);
>  
> - asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
> - ret = 32 - ret;
> - return ret;
> + return 32 - __clz(x);
> +}
> +
> +/*
> + * __fls() returns the bit position of the last bit set, where the
> + * LSB is 0 and MSB is 31.  Zero input is undefined.
> + */
> +static inline unsigned long __fls(unsigned long x)
> +{
> + return fls(x) - 1;
> +}
> +
> +/*
> + * ffs() returns zero if the input was zero, otherwise returns the bit
> + * position of the first set bit, where the LSB is 1 and MSB is 32.
> + */
> +static inline int ffs(int x)
> +{
> + return fls(x & -x);
> +}
> +
> +/*
> + * __ffs() returns the bit position of the first bit set, where the
> + * LSB is 0 and MSB is 31.  Zero input is undefined.
> + */
> +static inline unsigned long __ffs(unsigned long x)
> +{
> + return ffs(x) - 1;
>  }
>  
> -#define __fls(x) (fls(x) - 1)
> -#define ffs(x) ({ unsigned long __t = (x); fls(__t & -__t); })
> -#define __ffs(x) (ffs(x) - 1)
>  #define ffz(x) __ffs( ~(x) )
>  
>  #endif
> 
> 
> -- 
> FTTC broadband for 0.8mile line: 5.8Mbps down 500kbps up.  Estimation
> in database were 13.1 to 19Mbit for a good line, about 7.5+ for a bad.
> Estimate before purchase was "up to 13.2Mbit".
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

