date:20130228

Re: [PATCH] regulators: max8998.c: use dev_err() instead of printk()

2013-02-28 Thread Mark Brown

On Sat, Feb 23, 2013 at 12:52:35AM -0300, Thiago Farina wrote:
> Fixes the following checkpatch warning:
> 
> WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(...  
> to printk(KERN_ERR ...

Applied, thanks (and discarded the previous version as it's subsumed in
there).


signature.asc
Description: Digital signature

Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread Yinghai Lu

On Thu, Feb 28, 2013 at 10:02 PM, Yasuaki Ishimatsu wrote:
> 2013/03/01 14:00, Yinghai Lu wrote:
>
> Original issue occurs by two patches. And it is fixed by Tang's reverting
> patch. So other patches are obviously unrelated to original problem. Thus
> there is no reason to revert all patches related with movablemem_map.
>
> If there is a reason, movablemem_map patches prevent only your work.
>
> If you keep on developing your work, you should develop it in consideration
> of those patches.

Let me try again:

movablemem_map is a broken idea or a poor design.

It just pushes kernel memory down from the local node to some other place.

It is ridiculous to make the user specify a memory range on the command
line to get memory hotplug working.
Think about different memory layout configurations; that will drive
customers crazy.
Not to mention the performance cost of putting NUMA data low.

The right way, or good practice, is:
Find the kernel memory that cannot be moved, and either put it low
or place it in local node RAM.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] regulators: max8998.c: use dev_err() instead of printk()

2013-02-28 Thread Mark Brown

On Sat, Feb 23, 2013 at 12:51:26AM -0300, Thiago Farina wrote:
> Fixes the following checkpatch warning:
> 
> WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(...  
> to printk(KERN_ERR ...

This doesn't apply against current mainline, and...

> @@ -666,7 +666,7 @@ static int max8998_pmic_probe(struct platform_device 
> *pdev)
>   /* Check if SET1 is not equal to 0 */
>   if (!pdata->buck1_set1) {
>   dev_err(&pdev->dev,
> -"MAX8998 SET1 GPIO defined as 0 !\n");
> + "MAX8998 SET1 GPIO defined as 0 !\n");
>   WARN_ON(!pdata->buck1_set1);
>   ret = -EIO;
>   goto err_out;

...this (which is one of the failing hunks) is an indentation change
which bears no relation to the changelog.

I've applied the final hunk which looks good but please be more careful
in future.



Re: [PATCH] arm: omap: RX-51: ARM errata 430973 workaround

2013-02-28 Thread Ивайло Димитров

   
They look similar, but they are not equivalent :). The first major difference 
is here (code taken from omap-smc.S)

> ENTRY(omap_smc2)
>  stmfd   sp!, {r4-r12, lr}
>  mov r3, r2
>  mov r2, r1
>  mov r1, #0x0    @ Process ID
>  mov r6, #0xff
>  mov r12, #0x00  @ Secure Service ID

Always zero, while RX51 PPA expects a real value. I wonder if it is a bug, but 
anyway I don't see the id parameter (R0) used.

>  mov r7, #0
>  mcr p15, 0, r7, c7, c5, 6

According to the ARM TRM, this is "Invalidate entire branch predictor array"
(IIUC). NFC why it is needed here, but this will not work on RX-51 until the
IBE bit in ACR is set.

>  dsb
>  dmb
>  smc #0

RX-51 needs smc #1 ;)

>  ldmfd   sp!, {r4-r12, pc}


The next major difference is that RX-51 expects parameter count passed in R3[0] 
to be the count of the remaining parameters +1, but omap_secure_dispatcher (in 
omap-secure.c) is passing the exact count of the remaining parameters.

I guess all of the above problems can be fixed or worked around, but I wonder
whether it is worth it. Not to mention that I don't have a BB around to test
whether the code still works if I make changes to omap-secure.c/omap-smc.S :)
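
To make the parameter-count difference concrete, here is a minimal sketch in plain C. It is not taken from the kernel: fill_secure_params() and MAX_SECURE_ARGS are hypothetical names, and the layout (count word followed by up to four arguments) is only what the description above implies.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SECURE_ARGS 4

/*
 * Hypothetical helper, not kernel code: fill a five-word parameter block
 * for a secure call. With count_includes_self set, the first word is
 * nargs + 1 (the convention described above for the RX-51 PPA);
 * otherwise it is nargs (what omap_secure_dispatcher is said to pass).
 */
static int fill_secure_params(uint32_t *param, uint32_t nargs,
                              const uint32_t *args, int count_includes_self)
{
    size_t i;

    if (nargs > MAX_SECURE_ARGS)
        return -1;

    param[0] = count_includes_self ? nargs + 1 : nargs;

    /* Copy the valid arguments and zero the unused slots. */
    for (i = 0; i < MAX_SECURE_ARGS; i++)
        param[i + 1] = (i < nargs) ? args[i] : 0;

    return 0;
}
```

With two arguments, the first convention stores 3 in param[0] and the second stores 2, so a block built for one secure side would be misread by the other.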


 > Original message 
 >From:  Nishanth Menon 
 >Subject: Re: [PATCH] arm: omap: RX-51: ARM errata 430973 workaround
 >To: Pali Rohár 
 >Sent: Thursday, 2013, February 28 16:40:05 EET
 >
 >
 >On 10:42-20130228, Pali Rohár wrote:
 >> Signed-off-by: Ivaylo Dimitrov 
 >> Signed-off-by: Pali Rohár 
 >> ---
 >>  arch/arm/mach-omap2/Makefile|1 +
 >>  arch/arm/mach-omap2/board-rx51-secure.c |   66 
 >> +++
 >>  arch/arm/mach-omap2/board-rx51-secure.h |   36 +
 >>  arch/arm/mach-omap2/board-rx51-smc.S|   34 
 >>  arch/arm/mach-omap2/board-rx51.c|7 
 >
 >Wondering if we can integrate these into 
 >arch/arm/mach-omap2/omap-smc.S
 >and
 >arch/arm/mach-omap2/omap-secure.c
 >on a quick look, it does seem there are commonalities.
 >
 >>  5 files changed, 144 insertions(+)
 >>  create mode 100644 arch/arm/mach-omap2/board-rx51-secure.c
 >>  create mode 100644 arch/arm/mach-omap2/board-rx51-secure.h
 >>  create mode 100644 arch/arm/mach-omap2/board-rx51-smc.S
 >> 
 >> diff --git a/arch/arm/mach-omap2/Makefile b/arch/arm/mach-omap2/Makefile
 >> index 0ebbdd50..8eb4fb4 100644
 >> --- a/arch/arm/mach-omap2/Makefile
 >> +++ b/arch/arm/mach-omap2/Makefile
 >> @@ -241,6 +241,7 @@ obj-$(CONFIG_MACH_NOKIA_RX51)   += board-rx51.o sdram-nokia.o
 >>  obj-$(CONFIG_MACH_NOKIA_RX51)  += board-rx51-peripherals.o
 >>  obj-$(CONFIG_MACH_NOKIA_RX51)  += board-rx51-video.o
 >>  obj-$(CONFIG_MACH_NOKIA_RX51)  += board-rx51-camera.o
 >> +obj-$(CONFIG_MACH_NOKIA_RX51)  += board-rx51-smc.o board-rx51-secure.o
 >>  obj-$(CONFIG_MACH_OMAP_ZOOM2)  += board-zoom.o board-zoom-peripherals.o
 >>  obj-$(CONFIG_MACH_OMAP_ZOOM2)  += board-zoom-display.o
 >>  obj-$(CONFIG_MACH_OMAP_ZOOM2)  += board-zoom-debugboard.o
 >> diff --git a/arch/arm/mach-omap2/board-rx51-secure.c 
 >> b/arch/arm/mach-omap2/board-rx51-secure.c
 >> new file mode 100644
 >> index 000..361dc78
 >> --- /dev/null
 >> +++ b/arch/arm/mach-omap2/board-rx51-secure.c
 >> @@ -0,0 +1,66 @@
 >> +/*
 >> + * RX51 Secure PPA API.
 >> + *
 >> + * Copyright (C) 2012 Ivaylo Dimitrov 
 >> + *
 >> + *
 >> + * This program is free software,you can redistribute it and/or modify
 >> + * it under the terms of the GNU General Public License version 2 as
 >> + * published by the Free Software Foundation.
 >> + */
 >> +#include 
 >> +
 >> +#include "board-rx51-secure.h"
 >> +
 >> +/**
 >> + * rx51_secure_dispatcher: Routine to dispatch secure PPA API calls
 >> + * @idx: The PPA API index
 >> + * @flag: The flag indicating criticality of operation
 >> + * @nargs: Number of valid arguments out of four.
 >> + * @arg1, arg2, arg3 args4: Parameters passed to secure API
 >> + *
 >> + * Return the non-zero error value on failure.
 >> + */
 >> +u32 rx51_secure_dispatcher(u32 idx, u32 flag, u32 nargs, u32 arg1, u32 
 >> arg2,
 >> +  u32 arg3, u32 arg4)
 >> +{
 >> +   u32 ret;
 >> +   u32 param[5];
 >> +
 >> +   param[0] = nargs+1;
 >> +   param[1] = arg1;
 >> +   param[2] = arg2;
 >> +   param[3] = arg3;
 >> +   param[4] = arg4

Re: [ 17/53] s390/kvm: Fix store status for ACRS/FPRS

2013-02-28 Thread Christian Borntraeger

On 28/02/13 23:26, Jiri Slaby wrote:
> On 02/27/2013 12:57 AM, Greg Kroah-Hartman wrote:
>> 3.0-stable review patch.  If anyone has any objections, please let me know.
>>
>> --
>>
>> From: Christian Borntraeger 
>>
>> commit 15bc8d8457875f495c59d933b05770ba88d1eacb upstream.
>>
>> On store status we need to copy the current state of registers
>> into a save area. Currently we might save stale versions:
>> The sie state descriptor doesnt have fields for guest ACRS,FPRS,
>> those registers are simply stored in the host registers. The host
>> program must copy these away if needed. We do that in vcpu_put/load.
>>
>> If we now do a store status in KVM code between vcpu_put/load, the
>> saved values are not up-to-date. Lets collect the ACRS/FPRS before
>> saving them.
>>
>> This also fixes some strange problems with hotplug and virtio-ccw,
>> since the low level machine check handler (on hotplug a machine check
>> will happen) will revalidate all registers with the content of the
>> save area.
>>
>> Signed-off-by: Christian Borntraeger 
>> Signed-off-by: Gleb Natapov 
>> Signed-off-by: Greg Kroah-Hartman 
>>
>> ---
>>  arch/s390/kvm/kvm-s390.c |8 
>>  1 file changed, 8 insertions(+)
>>
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -584,6 +584,14 @@ int kvm_s390_vcpu_store_status(struct kv
>>  } else
>>  prefix = 0;
>>  
>> +/*
>> + * The guest FPRS and ACRS are in the host FPRS/ACRS due to the lazy
>> + * copying in vcpu load/put. Lets update our copies before we save
>> + * it into the save area
>> + */
>> +save_fp_regs(&vcpu->arch.guest_fpregs);
>> +save_access_regs(vcpu->run->s.regs.acrs);
> 
> kvm_run structure does not have kvm_sync_regs in it in 3.0 yet. So this
> fails with:
> arch/s390/kvm/kvm-s390.c: In function 'kvm_s390_vcpu_store_status':
> arch/s390/kvm/kvm-s390.c:593: error: 'struct kvm_run' has no member
> named 's'
> 
> I believe the fix is just to remove save_access_regs, right?

Before the sync reg changes, the ACRS were saved in the vcpu->arch.
So the fix would look like 

save_access_regs(vcpu->arch.guest_acrs);
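
As an illustration of why the refresh matters, here is a small user-space sketch of the lazy-copy problem. Everything in it (struct toy_vcpu, host_acrs, toy_store_status) is a simplified stand-in invented for this example, not the real KVM code.

```c
#include <string.h>

#define NUM_ACRS 16

/* Simplified stand-in; the real kernel structures differ. */
struct toy_vcpu {
    unsigned int guest_acrs[NUM_ACRS];  /* copy kept in the vcpu */
    unsigned int save_area[NUM_ACRS];   /* target of store status */
};

/* Stands in for the live access registers of the CPU. */
static unsigned int host_acrs[NUM_ACRS];

/* Toy counterpart of save_access_regs(): snapshot the live registers. */
static void toy_save_access_regs(unsigned int *dst)
{
    memcpy(dst, host_acrs, sizeof(host_acrs));
}

/* Store status; refresh != 0 models the fix, refresh == 0 the old code. */
static void toy_store_status(struct toy_vcpu *vcpu, int refresh)
{
    if (refresh)
        toy_save_access_regs(vcpu->guest_acrs);
    memcpy(vcpu->save_area, vcpu->guest_acrs, sizeof(vcpu->save_area));
}
```

Without the refresh, the save area receives whatever was last copied into the vcpu, which may be older than the values currently in the registers.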



Re: [PATCH 2/2] kconfig: use config scripts to detect ncurses libs

2013-02-28 Thread justin

On 28/02/13 21:59, Yann E. MORIN wrote:
> I've queued that one in my tree, now, in branch yem-kconfig-for-next:
> https://www.gitorious.org/linux-kconfig/linux-kconfig
> 

Thanks for queuing, but with the points which Sven mentioned we should
first agree on a way to detect things.

The simplest option, which might only be present in very recent distro
versions, is using pkg-config.
Second best is ncurses*5-config. There we have the strange situation
that "ncurses-bin" contains the config scripts although no build-time
dependencies are installed along with it.
Third, we might only find one of ncurses{,w}5-config.
And lastly, we might face the situation where the new ABI is used and
the script is named ncurses{,w}6-config.

Ugly but working would be a simple "||" chain. This would create
problems on Debian/Ubuntu, which need to be solved.

Any other suggestions how we can solve the problem?

Thanks
justin


Re: [PATCH 4/4] regulator: palmas: Change the DT node property names to follow the convention

2013-02-28 Thread Mark Brown

On Mon, Feb 18, 2013 at 10:44:20AM +0530, J Keerthy wrote:
> DT node properties should not have "_". Replacing them by "-".

Applied, thanks.



Re: [PATCH v2 1/2] regulator: palmas fix SMPS no voltages

2013-02-28 Thread Mark Brown

On Sat, Feb 23, 2013 at 04:35:40PM +, Ian Lartey wrote:
> From: Graeme Gregory 
> 
> Number of voltages for SMPS regulators was off by one.

Applied, thanks.



Re: [PATCH 2/2] drivers/regulator/s5m8767.c: adjust duplicate test

2013-02-28 Thread Mark Brown

On Sun, Feb 24, 2013 at 12:55:34PM +0100, Julia Lawall wrote:
> From: Julia Lawall 
> 
> Delete successive tests to the same location.

Applied, thanks.  If you're sending a bunch of patches intended to be
applied separately it's probably not worth numbering them, it avoids
confusion (like waiting for 1/2).


Re: [PATCH] pci: do not try to assign irq 255

2013-02-28 Thread Hannes Reinecke

On 02/27/2013 10:13 PM, Bjorn Helgaas wrote:

[+cc Andy]

On Wed, Feb 20, 2013 at 11:53 PM, Hannes Reinecke  wrote:

On 02/20/2013 05:57 PM, Yinghai Lu wrote:

On Tue, Feb 19, 2013 at 11:58 PM, Hannes Reinecke  wrote:

Apparently this device is meant to use MSI _only_ so the BIOS developer
didn't feel the need to assign an INTx here.

According to PCI-3.0, section 6.8 (Message Signalled Interrupts):

It is recommended that devices implement interrupt pins to
provide compatibility in systems that do not support MSI
(devices default to interrupt pins). However, it is expected
that the need for interrupt pins will diminish over time.
Devices that do not support interrupt pins due to pin
constraints (rely on polling for device service) may implement
messages to increase performance without adding additional pins.
Therefore, system configuration software must not assume that a
message capable device has an interrupt pin.

Which sounds to me as if the implementation is valid...

It seems you are confusing the interrupt pin with the interrupt line.

current code:
  unsigned char irq;

  pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &irq);
  dev->pin = irq;
  if (irq)
          pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &irq);
  dev->irq = irq;

So if the device does not have an interrupt pin implemented, pin should be
zero, and both pin and irq in dev should be 0.

But the device _has_ an interrupt pin implemented.
The whole point here is that the interrupt line is _NOT_ zero.

00:14.0 USB controller [0c03]: Intel Corporation 7 Series/C210 Series
Chipset Family USB xHCI Host Controller [8086:1e31] (rev 04) (prog-if 30
[XHCI])
 Subsystem: Hewlett-Packard Company Device [103c:179b]
 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
SERR- irq is not valid, despite it being
not set to zero.
An alternative fix would be this:

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 68a921d..4a480cb 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -469,6 +469,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
 } else {
 dev_warn(&dev->dev, "PCI INT %c: no GSI\n",
  pin_name(pin));
+   dev->irq = 0;
 }
 return 0;
 }

Which probably is a better solution, as here ->irq is _definitely_
not valid, so we should reset it to '0' to avoid confusion on upper
layers.
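
The effect of that one-line change can be sketched in isolation. This is a toy model with made-up names (struct toy_pci_dev, toy_irq_fallback), not the ACPI code itself, and it leaves out the acpi_isa_irq_to_gsi() lookup that the real branch also performs.

```c
/* Simplified stand-in for struct pci_dev; only the field used here. */
struct toy_pci_dev {
    unsigned int irq;   /* value read from PCI_INTERRUPT_LINE */
};

/*
 * Toy version of the "no GSI" branch. Returns 1 if the legacy ISA IRQ
 * is kept, 0 if the device is left without a usable IRQ.
 */
static int toy_irq_fallback(struct toy_pci_dev *dev)
{
    /* Interrupt Line values 1..0xF are treated as ISA IRQs. */
    if (dev->irq > 0 && dev->irq <= 0xF)
        return 1;

    /* Proposed change: do not leave a stale value such as 255 behind. */
    dev->irq = 0;
    return 0;
}
```

A device reporting 255 ends up with irq == 0, so upper layers that test for a non-zero irq no longer mistake it for a valid interrupt.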

I didn't like the pci_read_irq() change because the PCI spec doesn't
say anything about any PCI_INTERRUPT_LINE values being invalid.

I like this solution better, but I still don't quite understand it.
We have the following code in acpi_pci_irq_enable().  We have
previously tried to look up "gsi," but the _PRT doesn't mention this
device, so we have "gsi == -1" at this point:

 /*
  * No IRQ known to the ACPI subsystem - maybe the BIOS /
  * driver reported one, then use it. Exit in any case.
  */
 if (gsi < 0) {
         u32 dev_gsi;
         /* Interrupt Line values above 0xF are forbidden */
         if (dev->irq > 0 && (dev->irq <= 0xF) &&
             (acpi_isa_irq_to_gsi(dev->irq, &dev_gsi) == 0)) {
                 dev_warn(&dev->dev, "PCI INT %c: no GSI - using ISA IRQ %d\n",
                          pin_name(pin), dev->irq);
                 acpi_register_gsi(&dev->dev, dev_gsi,
                                   ACPI_LEVEL_SENSITIVE,
                                   ACPI_ACTIVE_LOW);
         } else {
                 dev_warn(&dev->dev, "PCI INT %c: no GSI\n",
                          pin_name(pin));
         }

         return 0;
 }

1) I don't know where the restriction of 0x1-0xF came from.
Presumably this value of dev->irq came from PCI_INTERRUPT_LINE, and I
don't know what forbids values > 0xF.  The test was added by Andy
Grover in the attached commit.  This is ancient history; probably Andy
doesn't remember either :)

This is most likely due to ISA compatibility. Cf. ACPI 4.0,
section 5.2.12.4 Platforms with APIC and Dual 8259 Support:

> Systems that support both APIC and dual 8259 interrupt models
> must map global system interrupts 0-15 to the 8259 IRQs 0-15,
> except where Interrupt Source Overrides are provided (see section
> 5.2.10.8, “Interrupt Source Overrides”). This means that I/O APIC
> interrupt inputs 0-15 must be mapped to global system interrupts
> 0-15 and have identical sources as the 8259 IRQs 0-15 unless
> overrides are used. This allows a platform to support OSPM
> implementations that use the APIC model as well as OSPM
> implementations that use the 8259 model (OSPM will only use
> one model; it will not mix models).
> When OSPM supports the 8259 model, it will assume

Re: [PATCH] regulator: tps6586x: Use dev_err rather than dev_warn for error message

2013-02-28 Thread Mark Brown

On Wed, Feb 20, 2013 at 10:23:46AM +0800, Axel Lin wrote:
> tps6586x_regulator_set_slew_rate() returns -EINVAL when having slew rate
> settings for other than SM0/1, thus use dev_err rather than dev_warn.

Applied, thanks.



Re: [PATCH 2/2] kconfig: use config scripts to detect ncurses libs

2013-02-28 Thread justin

On 28/02/13 22:50, Sven Joachim wrote:
> On 2013-02-28 10:59 +0100, j...@gentoo.org wrote:
> 
>> Ncurses provides a config script (ncurses5-config) to assist finding ncurses.
>> This patch makes use of it to detect the necessary libs for linking of the
>> ncurses nconfig dialog.
> 
> That script is not necessarily called ncurses5-config, it might also be
> called ncurses6-config if ncurses is configured for a different ABI
> (--enable-ext-colors, --enable-ext-mouse).  Although I would suspect
> that any distribution who does that provides a compatibility symlink.
>

We don't do that, but I rechecked by building ncurses manually. You
are right. How widespread is the usage of these options? Or are they
rather experimental?
What we could do is simply extend the syntax to additionally check
for the ABI version 6 config scripts. Is this an option to consider?

>>  scripts/kconfig/Makefile | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
>> index 3091794..c372976 100644
>> --- a/scripts/kconfig/Makefile
>> +++ b/scripts/kconfig/Makefile
>> @@ -216,7 +216,9 @@ HOSTCFLAGS_gconf.o   = `pkg-config --cflags gtk+-2.0 
>> gmodule-2.0 libglade-2.0` \
>>  
>>  HOSTLOADLIBES_mconf   = $(shell $(CONFIG_SHELL) $(check-lxdialog) -ldflags 
>> $(HOSTCC))
>>  
>> -HOSTLOADLIBES_nconf = -lmenu -lpanel -lncurses
>> +HOSTLOADLIBES_nconf = -lmenu -lpanel
>> +HOSTLOADLIBES_nconf += $(shell ncursesw5-config --libs 2>/dev/null \
>> +   || ncurses5-config --libs 2>/dev/null  )
> 
> This will link with ncursesw, not ncurses.  Probably not what you want,
> since nconf.h does not #include the right headers for that.
> 

That's true, and again it would change two things at once. I will go
back to simple -lncurses as it was before.

> On Debian/Ubuntu, there's also the problem that ncursesw5-config exists
> even if the libncursesw5-dev package is not installed, so this patch
> makes the build fail in such cases.

Will be solved when reverting as described above. But actually it smells
like a bug in the package management, doesn't it? Why are build time
config scripts shipped in runtime only packages? What is their purpose?

> 
> Can we just call ncurses5-config and not ncursesw5-config, or are there
> any distros who ship the latter and not the former?
> 

I can't speak for the situation across distros, but a manual build gives
only one of the two. So there might be a situation where only one of the
two is present.

It seems we have a little dilemma here. Any suggestions how to solve it?

Thanks,
Justin


Re: [PATCH v13 1/8] save/load cpu runstate

2013-02-28 Thread Hu Tao

On Thu, Feb 28, 2013 at 02:12:37PM -0700, Eric Blake wrote:
> On 02/28/2013 05:13 AM, Hu Tao wrote:
> > This patch enables preservation of cpu runstate during save/load vm.
> > So when a vm is restored from snapshot, the cpu runstate is restored,
> > too.
> 
> What happens if a management app wants to override the runstate when
> restoring the domain?  I can think of several useful scenarios:
> 
> 1. management app pauses the guest, then saves domain state and other
> things (management state, or disk clones), then resumes the guest.
> Later, the management wants to revert to the saved state, but have the
> guest running right away.  I guess here, knowing that the guest was
> saved in a paused state doesn't hurt, since the management app can
> resume it right away.
> 
> 2. management app saves domain state of a live guest, then copies that
> state elsewhere.  In its new location, the management app wants to
> investigate the state for forensic analysis - so even though the guest
> remembers that it was running, management wants to start it paused.
> Here, it is important that there must not be a window of time where the
> guest can run, otherwise, the results are not reproducible.

-S takes precedence in that case. But for incoming migration, the runstate
is loaded from the source.

Re: [PATCH v3 0/4] regulator: core: support shared enable GPIO

2013-02-28 Thread Mark Brown

On Mon, Feb 18, 2013 at 06:50:30AM +, Kim, Milo wrote:
> A Regulator can be enabled by external GPIO pin which is configurable in the
> regulator_config.
> At this moment, the GPIO can be owned by only one regulator device.
> In some devices like LP8788 LDOs, multiple regulators are enabled by shared
> one GPIO pin.
> This patch-set enables shared enable GPIO concept and fix LP8788 LDO driver
> as well.

Applied all, thanks.  Axel had a few comments as did I but these can be
fixed up incrementally.



Re: dmesg macro in Documentation/kdump/gdbmacros.txt outdated

2013-02-28 Thread Daniel Mack

On 01.03.2013 02:31, Andreas Fenkart wrote:
> Is there an updated version matching the changed printk structure?

I hacked something up a while ago, but it's not perfect. The code below
stops dumping too early IIRC, but I never got around to fixing that. Maybe
someone wants to look at it and help debug, so we can put it into
gdbmacros.txt ...

Thanks,
Daniel


define dmesg
	set $idx = 0
	set $seq = 0

	while ($seq++ < log_next_seq)
		set $buf = log_buf + $idx
		set $log = (struct log *) $buf

		if ($log->len == 0)
			loop_break
		end

		printf "%s\n", (char *) ($buf + sizeof(struct log))
		set $idx += $log->len
	end
end

document dmesg
	dmesg
	Print the content of the kernel message buffer
end
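
One possible reason for the early stop, offered as a guess rather than a diagnosis: if a record with len == 0 marks a wrap-around to the start of the buffer, then breaking out of the loop there drops everything stored after the wrap. A second candidate is that the text is not NUL-terminated, so "%s" may print too much or too little. The user-space sketch below shows a walk that handles both. struct toy_log and toy_dump_log() are invented for this example and are much simpler than the kernel's record header; the sketch also does no bounds checking on the input buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified record header; the kernel's struct log has more fields. */
struct toy_log {
    uint16_t len;       /* total record length, 0 marks a wrap-around */
    uint16_t text_len;  /* length of the text, which is not NUL-terminated */
};

/*
 * Walk up to 'count' records starting at byte offset 'first_idx' and copy
 * each text into 'out', one per line. Returns the number of records copied.
 */
static unsigned int toy_dump_log(const char *buf, size_t first_idx,
                                 unsigned int count,
                                 char *out, size_t out_size)
{
    size_t idx = first_idx;
    size_t used = 0;
    unsigned int copied = 0;

    if (out_size == 0)
        return 0;

    while (copied < count) {
        struct toy_log hdr;

        memcpy(&hdr, buf + idx, sizeof(hdr));
        if (hdr.len == 0) {
            if (idx == 0)
                break;      /* nothing at the start either: give up */
            idx = 0;        /* wrap marker: continue at the start */
            continue;
        }
        if (used + hdr.text_len + 2 > out_size)
            break;          /* no room for text, newline and NUL */

        /* Copy exactly text_len bytes instead of relying on a NUL. */
        memcpy(out + used, buf + idx + sizeof(hdr), hdr.text_len);
        used += hdr.text_len;
        out[used++] = '\n';
        idx += hdr.len;
        copied++;
    }
    out[used] = '\0';
    return copied;
}
```

In gdb terms this would correspond to setting $idx back to 0 instead of calling loop_break, and to printing only text_len bytes, assuming the record layout really behaves this way in the kernel being debugged.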


Re: [PATCH v3 1/4] regulator: core: support shared enable GPIO concept

2013-02-28 Thread Mark Brown

On Mon, Feb 18, 2013 at 06:50:39AM +, Kim, Milo wrote:

> + pin->gpio = config->ena_gpio;
> + pin->ena_gpio_invert = config->ena_gpio_invert;
> + list_add(&pin->list, &regulator_ena_gpio_list);

We should really validate that the invert settings are consistent but
it's not so important since this is a user error that they'd probably
notice.
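
A consistency check of that kind could look roughly like the sketch below. It is a stand-alone illustration using a fixed-size table and hypothetical names (toy_enable_pin, toy_request_enable_pin), not the regulator core's list handling.

```c
#include <stddef.h>

#define TOY_MAX_PINS 8

/* Simplified stand-in for the shared enable pin bookkeeping. */
struct toy_enable_pin {
    int gpio;
    int ena_gpio_invert;
};

static struct toy_enable_pin toy_pins[TOY_MAX_PINS];
static size_t toy_pin_count;

/*
 * Register a user of 'gpio'. Returns 0 on success, -1 if the GPIO is
 * already known with a different invert setting or the table is full.
 */
static int toy_request_enable_pin(int gpio, int invert)
{
    size_t i;

    for (i = 0; i < toy_pin_count; i++) {
        if (toy_pins[i].gpio != gpio)
            continue;
        /* Shared pin: all users must agree on the polarity. */
        return (!toy_pins[i].ena_gpio_invert == !invert) ? 0 : -1;
    }

    if (toy_pin_count == TOY_MAX_PINS)
        return -1;

    toy_pins[toy_pin_count].gpio = gpio;
    toy_pins[toy_pin_count].ena_gpio_invert = invert;
    toy_pin_count++;
    return 0;
}
```

The second user of a GPIO is accepted only if its invert flag matches the one recorded by the first user.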


Re: [PATCH] <linux/of_platform.h>: fix compilation warnings with DT disabled

2013-02-28 Thread Simon Horman

On Tue, Feb 19, 2013 at 02:58:25AM +0300, Sergei Shtylyov wrote:
> Fix the following compilation warnings (in Simon Horman's renesas.git repo):
> 
> In file included from arch/arm/mach-shmobile/setup-r8a7779.c:24:0:
> include/linux/of_platform.h:107:13: warning: ‘struct of_device_id’ declared
> inside parameter list [enabled by default]
> include/linux/of_platform.h:107:13: warning: its scope is only this definition
> or declaration, which is probably not what you want [enabled by default]
> include/linux/of_platform.h:107:13: warning: ‘struct device_node’ declared
> inside parameter list [enabled by default]
> 
> <linux/of_platform.h> only #include's headers with definitions of the above
> mentioned structures if CONFIG_OF_DEVICE=y but uses them even if not. One
> solution is to move some #include's out of #ifdef CONFIG_OF_DEVICE and use
> incomplete declarations for the rest of the structures where the #ifdef move
> doesn't help...
> 
> Reported-by: Vladimir Barinov 
> Signed-off-by: Sergei Shtylyov 

Reviewed-by: Simon Horman 

Grant, could you consider taking this patch?

> ---
> Actually, it compiles even without 'struct device_node' declared, I haven't
> found the reason of this, so left it there...
> 
>  include/linux/of_platform.h |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Index: linux/include/linux/of_platform.h
> ===
> --- linux.orig/include/linux/of_platform.h
> +++ linux/include/linux/of_platform.h
> @@ -11,9 +11,10 @@
>   *
>   */
>  
> -#ifdef CONFIG_OF_DEVICE
>  #include 
>  #include 
> +
> +#ifdef CONFIG_OF_DEVICE
>  #include 
>  #include 
>  #include 
> @@ -100,7 +101,7 @@ extern int of_platform_populate(struct d
>  
>  #if !defined(CONFIG_OF_ADDRESS)
>  struct of_dev_auxdata;
> -struct device;
> +struct device_node;
>  static inline int of_platform_populate(struct device_node *root,
>   const struct of_device_id *matches,
>   const struct of_dev_auxdata *lookup,
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sh" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
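
For readers unfamiliar with the warning, here is a small stand-alone sketch of the "incomplete declaration" approach mentioned in the patch description. The toy_ names are made up for this example; only the pattern (declare the struct at file scope before using a pointer to it in a prototype) is the point.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Incomplete declarations: enough for prototypes that only pass pointers.
 * They give the struct names file scope; without them, a struct first
 * named inside a parameter list is visible only within that prototype,
 * which is what the quoted gcc warning complains about.
 */
struct toy_device_node;
struct toy_device_id;

/* Stub in the style of a "feature disabled" inline fallback. */
static inline int toy_populate(struct toy_device_node *root,
                               const struct toy_device_id *matches)
{
    (void)root;
    (void)matches;
    return -ENODEV;
}
```

Callers that never dereference the pointers compile cleanly whether or not the full structure definitions are available.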

[PATCH 1/2] cgroup: no need to check css refs for release notification

2013-02-28 Thread Li Zefan

We no longer fail rmdir() when there're still css refs, so we don't
need to check css refs in check_for_release().

This also avoids a bug. cgroup_has_css_refs() accesses subsys[i]
without cgroup_mutex, so it can race with cgroup_unload_subsys().

cgroup_has_css_refs()
...
  if (ss == NULL || ss->root != cgrp->root)

If ss points to net_cls_subsys, and the cls_cgroup module is unloaded
right after the former check but before the latter, the memory in which
net_cls_subsys resides has become invalid.

Signed-off-by: Li Zefan 
---
 kernel/cgroup.c | 67 +++--
 1 file changed, 8 insertions(+), 59 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 43ff59e..f4554cc 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4343,47 +4343,6 @@ static int cgroup_mkdir(struct inode *dir, struct dentry 
*dentry, umode_t mode)
return cgroup_create(c_parent, dentry, mode | S_IFDIR);
 }
 
-/*
- * Check the reference count on each subsystem. Since we already
- * established that there are no tasks in the cgroup, if the css refcount
- * is also 1, then there should be no outstanding references, so the
- * subsystem is safe to destroy. We scan across all subsystems rather than
- * using the per-hierarchy linked list of mounted subsystems since we can
- * be called via check_for_release() with no synchronization other than
- * RCU, and the subsystem linked list isn't RCU-safe.
- */
-static int cgroup_has_css_refs(struct cgroup *cgrp)
-{
-   int i;
-
-   /*
-* We won't need to lock the subsys array, because the subsystems
-* we're concerned about aren't going anywhere since our cgroup root
-* has a reference on them.
-*/
-   for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
-   struct cgroup_subsys *ss = subsys[i];
-   struct cgroup_subsys_state *css;
-
-   /* Skip subsystems not present or not in this hierarchy */
-   if (ss == NULL || ss->root != cgrp->root)
-   continue;
-
-   css = cgrp->subsys[ss->subsys_id];
-   /*
-* When called from check_for_release() it's possible
-* that by this point the cgroup has been removed
-* and the css deleted. But a false-positive doesn't
-* matter, since it can only happen if the cgroup
-* has been deleted and hence no longer needs the
-* release agent to be called anyway.
-*/
-   if (css && css_refcnt(css) > 1)
-   return 1;
-   }
-   return 0;
-}
-
 static int cgroup_destroy_locked(struct cgroup *cgrp)
__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
 {
@@ -5112,12 +5071,15 @@ static void check_for_release(struct cgroup *cgrp)
 {
/* All of these checks rely on RCU to keep the cgroup
 * structure alive */
-   if (cgroup_is_releasable(cgrp) && !atomic_read(&cgrp->count)
-   && list_empty(&cgrp->children) && !cgroup_has_css_refs(cgrp)) {
-   /* Control Group is currently removeable. If it's not
+   if (cgroup_is_releasable(cgrp) &&
+   !atomic_read(&cgrp->count) && list_empty(&cgrp->children)) {
+   /*
+* Control Group is currently removeable. If it's not
 * already queued for a userspace notification, queue
-* it now */
+* it now
+*/
int need_schedule_work = 0;
+
raw_spin_lock(&release_list_lock);
if (!cgroup_is_removed(cgrp) &&
list_empty(&cgrp->release_list)) {
@@ -5150,24 +5112,11 @@ EXPORT_SYMBOL_GPL(__css_tryget);
 /* Caller must verify that the css is not for root cgroup */
 void __css_put(struct cgroup_subsys_state *css)
 {
-   struct cgroup *cgrp = css->cgroup;
int v;
 
-   rcu_read_lock();
v = css_unbias_refcnt(atomic_dec_return(&css->refcnt));
-
-   switch (v) {
-   case 1:
-   if (notify_on_release(cgrp)) {
-   set_bit(CGRP_RELEASABLE, &cgrp->flags);
-   check_for_release(cgrp);
-   }
-   break;
-   case 0:
+   if (v == 0)
schedule_work(&css->dput_work);
-   break;
-   }
-   rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(__css_put);
 
-- 
1.8.0.2

[PATCH 2/2] cgroup: avoid accessing modular cgroup subsys structure without locking

2013-02-28 Thread Li Zefan

subsys[i] is set to NULL in cgroup_unload_subsys() at module unload,
and that's protected by cgroup_mutex, and then the memory in which
*subsys[i] resides will be freed.

So this is unsafe without any locking:

  if (!ss || ss->module)
  ...

Signed-off-by: Li Zefan 
---
 include/linux/cgroup.h | 11 +--
 kernel/cgroup.c| 32 ++--
 2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c6ec1..3ac6bb0 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -46,12 +46,19 @@ extern const struct file_operations proc_cgroup_operations;
 
 /* Define the enumeration of all builtin cgroup subsystems */
 #define SUBSYS(_x) _x ## _subsys_id,
-#define IS_SUBSYS_ENABLED(option) IS_ENABLED(option)
 enum cgroup_subsys_id {
+#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
 #include 
+#undef IS_SUBSYS_ENABLED
+   CGROUP_BUILTIN_SUBSYS_COUNT,
+
+   __CGROUP_SUBSYS_TEMP_PLACEHOLDER = CGROUP_BUILTIN_SUBSYS_COUNT - 1,
+
+#define IS_SUBSYS_ENABLED(option) IS_MODULE(option)
+#include 
+#undef IS_SUBSYS_ENABLED
CGROUP_SUBSYS_COUNT,
 };
-#undef IS_SUBSYS_ENABLED
 #undef SUBSYS
 
 /* Per-subsystem/per-cgroup state maintained by the system. */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f4554cc..29273db 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4944,17 +4944,17 @@ void cgroup_post_fork(struct task_struct *child)
 * and addition to css_set.
 */
if (need_forkexit_callback) {
-   for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+   /*
+* fork/exit callbacks are supported only for builtin
+* subsystems, and the builtin section of the subsys
+* array is immutable, so we don't need to lock the
+* subsys array here. On the other hand, modular section
+* of the array can be freed at module unload, so we
+* can't touch that.
+*/
+   for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
 
-   /*
-* fork/exit callbacks are supported only for
-* builtin subsystems and we don't need further
-* synchronization as they never go away.
-*/
-   if (!ss || ss->module)
-   continue;
-
if (ss->fork)
ss->fork(child);
}
@@ -5019,13 +5019,17 @@ void cgroup_exit(struct task_struct *tsk, int 
run_callbacks)
tsk->cgroups = &init_css_set;
 
if (run_callbacks && need_forkexit_callback) {
-   for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+   /*
+* fork/exit callbacks are supported only for builtin
+* subsystems, and the builtin section of the subsys
+* array is immutable, so we don't need to lock the
+* subsys array here. On the other hand, modular section
+* of the array can be freed at module unload, so we
+* can't touch that.
+*/
+   for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
 
-   /* modular subsystems can't use callbacks */
-   if (!ss || ss->module)
-   continue;
-
if (ss->exit) {
struct cgroup *old_cgrp =

rcu_dereference_raw(cg->subsys[i])->cgroup;
-- 
1.8.0.2
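
For readers following the enum change, here is a minimal userspace sketch of the layout the patch appears to aim for: built-in IDs first, a count marker, a placeholder that steps the counter back by one, then the modular IDs. All `demo_`/`DEMO_` names are made up for illustration and are not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subsystem IDs; only the layout mirrors the patch. */
enum demo_subsys_id {
	/* built-in section */
	DEMO_CPU_ID,
	DEMO_MEM_ID,
	DEMO_BUILTIN_COUNT,

	/* step back by one so the next ID reuses DEMO_BUILTIN_COUNT's value */
	DEMO_TEMP_PLACEHOLDER = DEMO_BUILTIN_COUNT - 1,

	/* modular section */
	DEMO_NET_ID,
	DEMO_SUBSYS_COUNT,
};

struct demo_subsys {
	int fork_calls;
};

/*
 * Visit only slots [0, DEMO_BUILTIN_COUNT). The modular slots are never
 * read, so this loop needs no NULL or "is it a module" check.
 */
static int demo_post_fork(struct demo_subsys *subsys[DEMO_SUBSYS_COUNT])
{
	int visited = 0;

	for (int i = 0; i < DEMO_BUILTIN_COUNT; i++) {
		assert(subsys[i] != NULL);
		subsys[i]->fork_calls++;
		visited++;
	}
	return visited;
}
```

Without the placeholder, the first modular ID would start one past the count marker and leave a hole in the array.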


Re: commit_creds oops

2013-02-28 Thread Eric W. Biederman

Dave Jones  writes:

> On Thu, Feb 28, 2013 at 04:25:40PM -0800, Eric W. Biederman wrote:
>  
>  > > [   89.639850] RIP: 0010:[]  [] commit_creds+0x250/0x2f0
>  > > [   89.658399] Call Trace:
>  > > [   89.658822]  [] key_change_session_keyring+0xfb/0x140
>  > > [   89.659845]  [] task_work_run+0xa5/0xd0
>  > > [   89.660698]  [] do_notify_resume+0x71/0xb0
>  > > [   89.661581]  [] int_signal+0x12/0x17
>  > >
>  > > Appears to be..
>  > >
>  > > if ((set_ns == subset_ns->parent)  &&
>  > >  850:   48 8b 8a c8 00 00 00mov0xc8(%rdx),%rcx
>  > >
>  > > from the inlined cred_cap_issubset
>  > 
>  > Interesting.  That line is protected with the check subset_ns !=
>  > &init_user_ns so subset_ns->parent must be valid or subset_ns is not
>  > a proper user namespace.
>  > 
>  > Ugh.  I think I see what is going on and it is just silly. 
>  > 
>  > It looks like by historical accident we have been trying to set
>  > new->user_ns from new->user_ns.  Which is totally silly as new->user_ns
>  > is NULL (as is every other field in new except session_keyring at that
>  > point).
>  > 
>  > It looks like it is safe to sleep in key_change_session_keyring so why
>  > we don't just use prepare_creds there like everywhere else is beyond
>  > me.
>  > 
>  > The intent is clearly to copy all of the fields from old to new so what
>  > we should be doing is copying old->user_ns into new->user_ns.
>  > 
>  > Dave can you verify that this patch fixes the oops?
>
> Looks like it.  Haven't hit the same thing since applying your patch.
>
> I noticed though that get_user_ns bumps a refcount.  Is this what we
> want if we're just copying ?

Yes.  commit_creds(new) winds up finding old on the current process
and calling put_cred(old).

put_cred when the count drops to zero winds up calling put_cred_rcu
which calls put_user_ns(old->user_ns);

For the same reason we need an extra count on the user namespace for new,
so that when new is eventually put and put_user_ns(new->user_ns) is
called we don't end up with a negative count.

Which is a long way of saying yes, we are adding another reference and
we need to increase the reference count.
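
The counting argument can be checked with a small userspace model. This is only a sketch: the `demo_` types and helpers are hypothetical stand-ins for the cred and user namespace objects, and nothing here is the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ns {
	int refcount;
};

struct demo_cred {
	struct demo_ns *user_ns;
};

/* Take one reference and hand the pointer back. */
static struct demo_ns *demo_get_ns(struct demo_ns *ns)
{
	ns->refcount++;
	return ns;
}

/* Drop one reference; going below zero would be a bug. */
static void demo_put_ns(struct demo_ns *ns)
{
	assert(ns->refcount > 0);
	ns->refcount--;
}

/* Copy the namespace pointer, taking a reference owned by new_cred. */
static void demo_copy_ns(struct demo_cred *new_cred,
			 const struct demo_cred *old_cred)
{
	new_cred->user_ns = demo_get_ns(old_cred->user_ns);
}

/* Releasing a cred releases the reference it holds on its namespace. */
static void demo_put_cred(struct demo_cred *cred)
{
	demo_put_ns(cred->user_ns);
	cred->user_ns = NULL;
}
```

With one reference held by the old cred, copying raises the count to two; putting old and later new brings it back to zero, with each put matched by an earlier get.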

Eric

[PATCH v5 1/2] cgroup: fix cgroup_path() vs rename() race

2013-02-28 Thread Li Zefan

rename() will change dentry->d_name. The result of this race can
be worse than seeing a partially rewritten name: we might access
a stale pointer, because rename() will re-allocate memory to hold
a longer name.

As accessing dentry->d_name must be protected by dentry->d_lock or
the parent inode's i_mutex, while on the other hand cgroup_path() can
be called with some irq-safe spinlocks held, we can't generate the
cgroup path using dentry->d_name.

Instead, we make a copy of dentry->d_name and save it in
cgrp->name when a cgroup is created, and update cgrp->name at
rename().

v5: use flexible array instead of zero-size array.
v4: - allocate root_cgroup_name and make every root cgroup's ->name point to it.
    - add cgroup_name() wrapper.
v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
v2: make cgrp->name RCU safe.
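
As an aside on the v5 note, the flexible-array allocation pattern can be sketched in plain userspace C. This assumes malloc() in place of kmalloc() and a length field in place of the rcu_head; the `demo_` names are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_name {
	size_t len;	/* stand-in for the rcu_head in the patch */
	char name[];	/* C99 flexible array member, sized at allocation */
};

/* Allocate header and string in one block and copy the name in. */
static struct demo_name *demo_alloc_name(const char *src)
{
	size_t len;
	struct demo_name *n;

	assert(src != NULL);
	len = strlen(src);
	n = malloc(sizeof(*n) + len + 1);	/* +1 for the trailing NUL */
	if (!n)
		return NULL;
	n->len = len;
	memcpy(n->name, src, len + 1);
	return n;
}
```

Because the header and the string share one allocation, a single free (or, in the kernel case, a single deferred free) releases both.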

Signed-off-by: Li Zefan 
---
 block/blk-cgroup.h |   2 -
 include/linux/cgroup.h |  24 +++
 kernel/cgroup.c| 106 +++--
 3 files changed, 100 insertions(+), 32 deletions(-)

diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 2459730..e2e3404 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -216,9 +216,7 @@ static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen)
 {
int ret;
 
-   rcu_read_lock();
ret = cgroup_path(blkg->blkcg->css.cgroup, buf, buflen);
-   rcu_read_unlock();
if (ret)
strncpy(buf, "", buflen);
return ret;
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 900af59..75c6ec1 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -150,6 +150,11 @@ enum {
CGRP_CPUSET_CLONE_CHILDREN,
 };
 
+struct cgroup_name {
+   struct rcu_head rcu_head;
+   char name[];
+};
+
 struct cgroup {
unsigned long flags;/* "unsigned long" so bitops work */
 
@@ -172,6 +177,19 @@ struct cgroup {
struct cgroup *parent;  /* my parent */
struct dentry *dentry;  /* cgroup fs entry, RCU protected */
 
+   /*
+* This is a copy of dentry->d_name, and it's needed because
+* we can't use dentry->d_name in cgroup_path().
+*
+* You must acquire rcu_read_lock() to access cgrp->name, and
+* the only place that can change it is rename(), which is
+* protected by parent dir's i_mutex.
+*
+* Normally you should use cgroup_name() wrapper rather than
+* access it directly.
+*/
+   struct cgroup_name __rcu *name;
+
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
 
@@ -404,6 +422,12 @@ struct cgroup_scanner {
void *data;
 };
 
+/* Caller should hold rcu_read_lock() */
+static inline const char *cgroup_name(const struct cgroup *cgrp)
+{
+   return rcu_dereference(cgrp->name)->name;
+}
+
 int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
 int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b5c6432..43ff59e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -238,6 +238,8 @@ static DEFINE_SPINLOCK(hierarchy_id_lock);
 /* dummytop is a shorthand for the dummy hierarchy's top cgroup */
 #define dummytop (&rootnode.top_cgroup)
 
+static struct cgroup_name root_cgroup_name = { .name = "/" };
+
 /* This flag indicates whether tasks in the fork and exit paths should
  * check for fork/exit handlers to call. This avoids us having to do
  * extra work in the fork/exit path if none of the subsystems need to
@@ -860,6 +862,17 @@ static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)
return inode;
 }
 
+static struct cgroup_name *cgroup_alloc_name(struct dentry *dentry)
+{
+   struct cgroup_name *name;
+
+   name = kmalloc(sizeof(*name) + dentry->d_name.len + 1, GFP_KERNEL);
+   if (!name)
+   return NULL;
+   strcpy(name->name, dentry->d_name.name);
+   return name;
+}
+
 static void cgroup_free_fn(struct work_struct *work)
 {
struct cgroup *cgrp = container_of(work, struct cgroup, free_work);
@@ -890,6 +903,7 @@ static void cgroup_free_fn(struct work_struct *work)
simple_xattrs_free(&cgrp->xattrs);
 
ida_simple_remove(&cgrp->root->cgroup_ida, cgrp->id);
+   kfree(rcu_dereference_raw(cgrp->name));
kfree(cgrp);
 }
 
@@ -1422,6 +1436,7 @@ static void init_cgroup_root(struct cgroupfs_root *root)
INIT_LIST_HEAD(&root->allcg_list);
root->number_of_cgroups = 1;
cgrp->root = root;
+   cgrp->name = &root_cgroup_name;
cgrp->top_cgroup = cgrp;
init_cgroup_housekeeping(cgrp);
list_add_tail(&cgrp->allcg_node, &root->allcg_list);
@@ -1771,49 +1786,45 @@ static struct kobject *cgroup_kobj;
  * @buf: the buffer to write the path into
  * @buflen: the length of the buffer
  *
- * Called with cgroup_mutex held or else with an

[PATCH 2/2] cpuset: use cgroup_name() in cpuset_print_task_mems_allowed()

2013-02-28 Thread Li Zefan

Use cgroup_name() instead of cgrp->dentry->name. This makes the code
a bit simpler.

While at it, remove cpuset_name and make cpuset_nodelist a local variable
to cpuset_print_task_mems_allowed().

Signed-off-by: Li Zefan 
---
 kernel/cpuset.c | 32 +---
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 4f9dfe4..ace5bfc 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -265,17 +265,6 @@ static DEFINE_MUTEX(cpuset_mutex);
 static DEFINE_MUTEX(callback_mutex);
 
 /*
- * cpuset_buffer_lock protects both the cpuset_name and cpuset_nodelist
- * buffers.  They are statically allocated to prevent using excess stack
- * when calling cpuset_print_task_mems_allowed().
- */
-#define CPUSET_NAME_LEN (128)
-#define CPUSET_NODELIST_LEN (256)
-static char cpuset_name[CPUSET_NAME_LEN];
-static char cpuset_nodelist[CPUSET_NODELIST_LEN];
-static DEFINE_SPINLOCK(cpuset_buffer_lock);
-
-/*
  * CPU / memory hotplug is handled asynchronously.
  */
 static struct workqueue_struct *cpuset_propagate_hotplug_wq;
@@ -2592,6 +2581,8 @@ int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
 }
 
+#define CPUSET_NODELIST_LEN (256)
+
 /**
  * cpuset_print_task_mems_allowed - prints task's cpuset and mems_allowed
  * @task: pointer to task_struct of some task.
@@ -2602,24 +2593,19 @@ int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
  */
 void cpuset_print_task_mems_allowed(struct task_struct *tsk)
 {
-   struct dentry *dentry;
+/* Statically allocated to prevent using excess stack. */
+   static char cpuset_nodelist[CPUSET_NODELIST_LEN];
+   static DEFINE_SPINLOCK(cpuset_buffer_lock);
 
-   dentry = task_cs(tsk)->css.cgroup->dentry;
-   spin_lock(&cpuset_buffer_lock);
+   struct cgroup *cgrp = task_cs(tsk)->css.cgroup;
 
-   if (!dentry) {
-   strcpy(cpuset_name, "/");
-   } else {
-   spin_lock(&dentry->d_lock);
-   strlcpy(cpuset_name, (const char *)dentry->d_name.name,
-   CPUSET_NAME_LEN);
-   spin_unlock(&dentry->d_lock);
-   }
+   spin_lock(&cpuset_buffer_lock);
 
nodelist_scnprintf(cpuset_nodelist, CPUSET_NODELIST_LEN,
   tsk->mems_allowed);
printk(KERN_INFO "%s cpuset=%s mems_allowed=%s\n",
-  tsk->comm, cpuset_name, cpuset_nodelist);
+  tsk->comm, cgroup_name(cgrp), cpuset_nodelist);
+
spin_unlock(&cpuset_buffer_lock);
 }
 
-- 
1.8.0.2

Re: [PATCH] regulator: tps6586x: (cosmetic) simplify a conditional

2013-02-28 Thread Mark Brown

On Mon, Feb 25, 2013 at 12:34:09PM +0100, Guennadi Liakhovetski wrote:
> of_node_put() is called on either branch of a conditional, simplify the
> code by only calling it once.

Applied, thanks.



Re: For review: pid_namespaces(7) man page

2013-02-28 Thread Eric W. Biederman

Rob Landley  writes:

> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>> Eric et al,
>> 
>> Eventually, there will be more namespace man pages, but let us start
>> now with one for PID namespaces. The attached page aims to provide a
>> fairly complete overview of PID namespaces.
>
> Onward!
>
>> PID_NAMESPACES(7)  Linux Programmer's Manual PID_NAMESPACES(7)
>> 
>> NAME
>>pid_namespaces - overview of Linux PID namespaces
>> 
>> DESCRIPTION
>>For an overview of namespaces, see namespaces(7).
>> 
>>PID  namespaces  isolate  the  process ID number space, meaning
>>that processes in different PID namespaces can  have  the  same
>>PID.
>
> Um, perhaps "different processes"? Slightly repetitive, but trying to  
> avoid the potential misreading that "a process can have the same PID  
> in different namespaces". (A single process can't be a member of more  
> than one namespace. This is not about selective visibility.)

Well actually a process is visible in, and arguably a member of, all parent
pid namespaces, and a process certainly has a pid value in each pid
namespace up to the root of the pid namespace tree.
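
One way to picture "a pid value in each pid namespace up to the root" is a per-level table. The sketch below is a simplified model that assumes a single chain of nested namespaces; the `demo_` names are hypothetical and real visibility depends on ancestry, not just depth.

```c
#include <assert.h>

#define DEMO_MAX_LEVEL 4

/* One process, one number per namespace level; level 0 is the root. */
struct demo_pid {
	int level;			/* depth of the namespace the process lives in */
	int numbers[DEMO_MAX_LEVEL];	/* numbers[i] is the PID as seen from level i */
};

/* Return the PID seen from ns_level, or 0 if the process is not visible there. */
static int demo_pid_nr_ns(const struct demo_pid *pid, int ns_level)
{
	assert(pid->level >= 0 && pid->level < DEMO_MAX_LEVEL);
	if (ns_level < 0 || ns_level > pid->level)
		return 0;
	return pid->numbers[ns_level];
}
```

In this model a container's init at depth 2 might be PID 1 in its own namespace while carrying ordinary, larger PIDs in the two ancestors, and it has no PID at all in a deeper namespace.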

>> PID namespaces allow containers to migrate to a new host
>>while the processes inside  the  container  maintain  the  same
>>PIDs.
>
> I thought suspend/resume a container was the simple case. Migration to  
> a new host is built on top of that. (On resume in a new container on  
> the same system, if other stuff is going on in the system so the  
> available PIDs have shifted.)

I don't know if there is a difference at the implementation level.

>>Likewise, a process in an ancestor namespace can—subject to the
>>usual permission checks described in  kill(2)—send  signals  to
>>the  "init" process of a child PID namespace only if the "init"
>>process has established a handler for that signal.  (Within the
>>handler,  the  siginfo_t si_pid field described in sigaction(2)
>>will be zero.)  SIGKILL or SIGSTOP are  treated  exceptionally:
>>these signals are forcibly delivered when sent from an ancestor
>>PID namespace.  Neither of these signals can be caught  by  the
>>"init" process, and so will result in the usual actions associ‐
>>ated with those signals (respectively, terminating and stopping
>>the process).
>
> If SIGKILL to init is propagated to all the children of init, is  
> SIGSTOP also propagated to all the children? (I.E. will SIGSTOP to  
> container's init suspend the whole container, and will SIGCONT resume  
> the whole container? If the latter, will it only resume processes that  
> weren't previously stopped? :)

No.  SIGSTOP sent to init stops just init.

It isn't SIGKILL that is propagated; it is the exiting of init that is
propagated by way of SIGKILL.  If your init process calls _exit() or
hits a SIGSEGV and dies, all of the other processes in the pid namespace
will be sent a SIGKILL and be forced down.

This is similar to the system panic if the global init exits.

>>To put things another way: a process's PID namespace membership
>>is determined when the process is created and cannot be changed
>>thereafter.  Among other things, this means that  the  parental
>>relationship between processes mirrors the parental between PID
>
> mirrors the relationship
>
>>namespaces: the parent of a  process  is  either  in  the  same
>>namespace or resides in the immediate parent PID namespace.
>> 
>>Every  thread  in  a process must be in the same PID namespace.
>>For this reason, the two following call sequences will fail:
>> 
>>unshare(CLONE_NEWPID);
>>clone(..., CLONE_VM, ...);/* Fails */
>> 
>>setns(fd, CLONE_NEWPID);
>>clone(..., CLONE_VM, ...);/* Fails */
>
> They fail with -EUNDOCUMENTED

Make that -EINVAL.

>>Because the above unshare(2) and setns(2) calls only change the
>>PID  namespace  for created children, the clone(2) calls neces‐
>>sarily put the new thread in a different PID namespace from the
>>calling thread.
>
> Um, no they don't. They fail. That's the point. They _would_ put the  
> new thread in a different PID namespace, which breaks the definition of  
> threads.
>
> How about:
>
> The above unshare(2) and setns(2) calls change the PID namespace of
> children created by subsequent clone(2) calls, which is incompatible
> with CLONE_VM.
>
>>Miscellaneous
>>After  creating a new PID namespace, it is useful for the child
>>to change its root directory and mount a new procfs instance at
>>/proc  so  that  tools such as ps(1) work correctly.  (If a new
>>mount  namespace  is  simultaneously   created   by   including
>>CLONE_NEWNS  in  the flags argument of clone(2) or unshare(2)),
>>then it isn't necessary to

Re: [PATCH 1/5] regmap: irq: call pm_runtime_put in pm_runtime_get_sync failed case

2013-02-28 Thread Mark Brown

On Thu, Feb 28, 2013 at 03:37:11PM +0800, Li Fei wrote:
> 
> Even in failed case of pm_runtime_get_sync, the usage_count
> is incremented. In order to keep the usage_count with correct
> value and runtime power management to behave correctly, call
> pm_runtime_put(_sync) in such case.

Oh, that is a surprising interface...  anyway, applied thanks.
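
The balancing rule from the changelog can be modelled in userspace. This is a mock, not the runtime PM API: `demo_get_sync()` and `demo_put()` are hypothetical helpers that only imitate the described behaviour, where the count is bumped even when the get fails.

```c
#include <assert.h>
#include <errno.h>

struct demo_dev {
	int usage_count;
	int resume_fails;	/* non-zero makes the mock get fail */
};

/* Mock: increments the count first, then reports whether resume worked. */
static int demo_get_sync(struct demo_dev *d)
{
	d->usage_count++;
	return d->resume_fails ? -EIO : 0;
}

static void demo_put(struct demo_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* Returns 0 with a reference held, or a negative errno with none held. */
static int demo_power_up(struct demo_dev *d)
{
	int ret = demo_get_sync(d);

	if (ret < 0) {
		demo_put(d);	/* undo the increment made by the failed get */
		return ret;
	}
	return 0;
}
```

The point of the wrapper is that callers see a simple contract: on error they hold nothing and have nothing to release.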



Re: [PATCHv5 2/8] zsmalloc: add documentation

2013-02-28 Thread Ric Mason


On 02/25/2013 11:18 PM, Seth Jennings wrote:

On 02/23/2013 06:37 PM, Ric Mason wrote:

On 02/23/2013 05:02 AM, Seth Jennings wrote:

On 02/21/2013 08:56 PM, Ric Mason wrote:

On 02/21/2013 11:50 PM, Seth Jennings wrote:

On 02/21/2013 02:49 AM, Ric Mason wrote:

On 02/19/2013 03:16 AM, Seth Jennings wrote:

On 02/16/2013 12:21 AM, Ric Mason wrote:

On 02/14/2013 02:38 AM, Seth Jennings wrote:

This patch adds a documentation file for zsmalloc at
Documentation/vm/zsmalloc.txt

Signed-off-by: Seth Jennings 
---
  Documentation/vm/zsmalloc.txt |   68
+
  1 file changed, 68 insertions(+)
  create mode 100644 Documentation/vm/zsmalloc.txt

diff --git a/Documentation/vm/zsmalloc.txt b/Documentation/vm/zsmalloc.txt
new file mode 100644
index 000..85aa617
--- /dev/null
+++ b/Documentation/vm/zsmalloc.txt
@@ -0,0 +1,68 @@
+zsmalloc Memory Allocator
+
+Overview
+
+zsmalloc is a new slab-based memory allocator for storing
+compressed pages.  It is designed for low fragmentation and a
+high allocation success rate on large, but <= PAGE_SIZE,
+allocations.
+
+zsmalloc differs from the kernel slab allocator in two primary
+ways to achieve these design goals.
+
+zsmalloc never requires high order page allocations to back
+slabs, or "size classes" in zsmalloc terms. Instead it allows
+multiple single-order pages to be stitched together into a
+"zspage" which backs the slab.  This allows for a higher
+allocation success rate under memory pressure.
+
+Also, zsmalloc allows objects to span page boundaries within the
+zspage.  This allows for lower fragmentation than could be had
+with the kernel slab allocator for objects between PAGE_SIZE/2
+and PAGE_SIZE.  With the kernel slab allocator, if a page
+compresses to 60% of its original size, the memory savings gained
+through compression are lost in fragmentation because another
+object of the same size can't be stored in the leftover space.
+
+This ability to span pages results in zsmalloc allocations not
+being directly addressable by the user.  The user is given a
+non-dereferenceable handle in response to an allocation request.
+That handle must be mapped, using zs_map_object(), which returns
+a pointer to the mapped region that can be used.  The mapping is
+necessary since the object data may reside in two different
+noncontiguous pages.

Do you mean that a zsmalloc object must be mapped after allocation
because the object data may reside in two different noncontiguous
pages?

Yes, that is one reason for the mapping.  The other reason (more of an
added bonus) is below.


+
+For 32-bit systems, zsmalloc has the added benefit of being
+able to back slabs with HIGHMEM pages, something not possible

What's the meaning of "back slabs with HIGHMEM pages"?

By HIGHMEM, I'm referring to the HIGHMEM memory zone on 32-bit systems
with larger than 1GB (actually a little less) of RAM.  The upper 3GB
of the 4GB address space, depending on kernel build options, is not
directly addressable by the kernel, but can be mapped into the kernel
address space with functions like kmap() or kmap_atomic().

These pages can't be used by slab/slub because they are not
continuously mapped into the kernel address space.  However, since
zsmalloc requires a mapping anyway to handle objects that span
non-contiguous page boundaries, we do the kernel mapping as part of
the process.

So zspages, the conceptual slab in zsmalloc backed by single-order
pages can include pages from the HIGHMEM zone as well.

Thanks for the clarification. In http://lwn.net/Articles/537422/, your
article about zswap on LWN, you write:
"Additionally, the kernel slab allocator does not allow objects that are
less than a page in size to span a page boundary. This means that if an
object is PAGE_SIZE/2 + 1 bytes in size, it effectively use an entire
page, resulting in ~50% waste. Hence there are *no kmalloc() cache size*
between PAGE_SIZE/2 and PAGE_SIZE."
Are you sure? It seems that the kmalloc caches support big sizes; you
can check in include/linux/kmalloc_sizes.h

Yes, kmalloc can allocate large objects > PAGE_SIZE, but there are no
cache sizes _between_ PAGE_SIZE/2 and PAGE_SIZE.  For example, on a
system with 4k pages, there are no caches between kmalloc-2048 and
kmalloc-4096.
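
The gap between kmalloc-2048 and kmalloc-4096 can be put in numbers with a rough model. The sketch assumes purely power-of-two size classes starting at 8 bytes, which is a simplification (real kernels also carry a few in-between caches such as kmalloc-96 and kmalloc-192); the `demo_` helpers are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power-of-two size class (simplified model). */
static size_t demo_size_class(size_t size)
{
	size_t cls = 8;

	assert(size > 0 && size <= ((size_t)1 << 30));
	while (cls < size)
		cls <<= 1;
	return cls;
}

/* Bytes lost to rounding for one object of the given size. */
static size_t demo_wasted_bytes(size_t size)
{
	return demo_size_class(size) - size;
}
```

Under this model a 2048-byte object fits its class exactly, while a 2049-byte object lands in the 4096-byte class and leaves 2047 bytes unused, which is the roughly 50% waste mentioned in the article.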

A kmalloc object > PAGE_SIZE/2 or > PAGE_SIZE should also be allocated
from a slab cache, correct? Then how can an object be allocated without
a slab cache that holds objects of that size?

I have to admit, I didn't understand the question.

An object is allocated from a slab cache, correct? There are two kinds
of slab cache: one is for general purpose use, e.g. the kmalloc slab
caches; the other is for special purposes, e.g. mm_struct, task_struct.
A kmalloc object > PAGE_SIZE/2 or > PAGE_SIZE should also be allocated
from a slab cache, correct? Then why did you say that there are no
caches between kmalloc-2048 and kmalloc-4096?

Ok, now I get it.  Yes, I guess I should have qualified here that there are
no _kmalloc_ caches between PAGE_SIZE/2 and

Re: Reproduceable SATA lockup on 3.7.8 with SSD

2013-02-28 Thread Marc MERLIN

On Tue, Feb 26, 2013 at 08:50:04AM -0800, Marc MERLIN wrote:
> On Tue, Feb 26, 2013 at 10:29:59AM -0500, Jeff Garzik wrote:
> > On 02/25/2013 07:27 PM, Marc MERLIN wrote:
> > >Howdy,
> > >
> > >I seem to have the same problem (or similar) as Mathieu Desnoyers in
> > >https://lkml.org/lkml/2013/2/22/437
> > >
> > >I can reliably get my SSD to drop from the SATA bus given the right workload
> > >on linux.
> > >
> > >How can I tell if it's linux's fault or the drive's fault?
> > 
> > Manually force speed to 3.0 Gbps, then 1.5 Gbps, and see what happens.
> > 
> > Try module/kernel parameter libata.force=1.5Gbps or libata.force=3.0Gbps
> 
> Ok, so by reading my log at time of failure, you saw that speed was
> flipping between the two? (I couldn't see that, but I'm not good at reading
> it).
> 
> Also, just to make sure, you're not saying that you want me to change the
> speed at runtime, but 
> 1) boot once with speed forced at 3Gbps and try and reproduce
> 2) boot a 2nd time with speed forced at 1.5Gbps and try and reproduce
> 
> If libata is not a module in my kernel, I can still put libata.force=1.5Gbps
> on the lilo/grub command line, correct?

Jeff, could you clear up what you'd like me to try out?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

Re: [PATCH] x86: mm: Check if PUD is large when validating a kernel address v2

2013-02-28 Thread Simon Jeons


On 02/13/2013 07:02 PM, Mel Gorman wrote:

Andrew or Ingo, please pick up.

Changelog since v1
   o Add reviewed-bys and acked-bys

A user reported a bug whereby a backup process accessing /proc/kcore
caused an oops.

  BUG: unable to handle kernel paging request at bb00ff33b000
  IP: [] kern_addr_valid+0xbe/0x110
  PGD 0
  Oops:  [#1] SMP
  CPU 6
  Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc 
8021q garp stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave 
acpi_cpufreq mperf microcode fuse nls_iso8859_1 nls_cp437 vfat fat loop dm_mod 
ioatdma ipv6 ipv6_lib igb dca i7core_edac edac_core i2c_i801 i2c_core cdc_ether 
usbnet bnx2 mii iTCO_wdt iTCO_vendor_support shpchp rtc_cmos pci_hotplug 
tpm_tis sg tpm pcspkr tpm_bios serio_raw button ext3 jbd mbcache uhci_hcd 
ehci_hcd usbcore sd_mod crc_t10dif usb_common processor thermal_sys hwmon 
scsi_dh_emc scsi_dh_rdac scsi_dh_alua scsi_dh_hp_sw scsi_dh ata_generic 
ata_piix libata megaraid_sas scsi_mod

  Pid: 16196, comm: Hibackp Not tainted 3.0.13-0.27-default #1 IBM System x3550 
M3 -[7944 K3G]-/94Y7614
  RIP: 0010:[]  [] 
kern_addr_valid+0xbe/0x110
  RSP: 0018:88094165fe80  EFLAGS: 00010246
  RAX: 3300ff33b000 RBX: 8801 RCX: 
  RDX: 0001 RSI: 8800 RDI: ff32b300ff33b400
  RBP: 1000 R08: 3000 R09: 
  R10: 22302e31223d6e6f R11: 0246 R12: 1000
  R13: 3000 R14: 00571be0 R15: 88094165ff50
  FS:  7ff152d33700() GS:88097f2c() knlGS:
  CS:  0010 DS:  ES:  CR0: 8005003b
  CR2: bb00ff33b000 CR3: 0009405a3000 CR4: 06e0
  DR0:  DR1:  DR2: 
  DR3:  DR6: 0ff0 DR7: 0400
  Process Hibackp (pid: 16196, threadinfo 88094165e000, task 
8808eb9ba600)
  Stack:
   811b8aaa 4000 880943fea480 8808ef2bae50
   880943d32980 fffb 8808ef2bae40 88094165ff50
   4000 0056ebe0 811ad847 0056ebe0
  Call Trace:
   [] read_kcore+0x17a/0x370
   [] proc_reg_read+0x77/0xc0
   [] vfs_read+0xc7/0x130
   [] sys_read+0x53/0xa0
   [] system_call_fastpath+0x16/0x1b

Investigation determined that the bug triggered when reading system RAM
at the 4G mark. On this system, that was the first address using 1G pages


Do you mean there is one page which is 1G?


for the virt->phys direct mapping so the PUD is pointing to a physical
address, not a PMD page.  The problem is that the page table walker in
kern_addr_valid() is not checking pud_large() and treats the physical
address as if it was a PMD.  If it happens to look like pmd_none then it'll
silently fail, probably returning zeros instead of real data. If the data
happens to look like a present PMD though, it will be walked resulting in
the oops above. This patch adds the necessary pud_large() check.

Cc: sta...@vger.kernel.org
Signed-off-by: Mel Gorman 
Reviewed-by: Rik van Riel 
Reviewed-by: Michal Hocko 
Acked-by: Johannes Weiner 
---
  arch/x86/include/asm/pgtable.h |5 +
  arch/x86/mm/init_64.c  |3 +++
  2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5199db2..1c1a955 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -142,6 +142,11 @@ static inline unsigned long pmd_pfn(pmd_t pmd)
return (pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT;
  }
  
+static inline unsigned long pud_pfn(pud_t pud)
+{
+   return (pud_val(pud) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
  #define pte_page(pte) pfn_to_page(pte_pfn(pte))
  
  static inline int pmd_large(pmd_t pte)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 2ead3c8..75c9a6a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -831,6 +831,9 @@ int kern_addr_valid(unsigned long addr)
if (pud_none(*pud))
return 0;
  
+	if (pud_large(*pud))
+   return pfn_valid(pud_pfn(*pud));
+
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd))
return 0;
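
The walk described in the changelog can be illustrated with a toy two-level table. This is a userspace sketch under stated assumptions: the `demo_` structures are invented, `pfn_ok` stands in for a pfn_valid() result, and only the position of the "large" check mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PRESENT 0x1u
#define DEMO_LARGE   0x80u	/* models the PSE bit of a large mapping */

struct demo_pmd {
	uint32_t flags;
};

struct demo_pud {
	uint32_t flags;
	struct demo_pmd *pmd_table;	/* only meaningful when DEMO_LARGE is clear */
	int pfn_ok;			/* stand-in for pfn_valid() on a large mapping */
};

/* Return 1 if the address behind this PUD entry looks valid, else 0. */
static int demo_addr_valid(const struct demo_pud *pud, size_t pmd_index)
{
	if (!(pud->flags & DEMO_PRESENT))
		return 0;

	/* A large PUD maps memory directly; there is no PMD table to walk. */
	if (pud->flags & DEMO_LARGE)
		return pud->pfn_ok;

	assert(pud->pmd_table != NULL);
	return (pud->pmd_table[pmd_index].flags & DEMO_PRESENT) != 0;
}
```

Dropping the DEMO_LARGE branch would make the function treat the large entry's payload as a table pointer, which is the failure mode the changelog describes.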

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

Re: [PATCH v3 0/7] Add O_DENY* support for VFS and CIFS/NFS

2013-02-28 Thread Pavel Shilovsky

2013/3/1 Andy Lutomirski :
> [possible resend -- sorry]
>
> On 02/28/2013 07:25 AM, Pavel Shilovsky wrote:
>> This patchset adds support for O_DENY* flags to the Linux fs layer. These flags 
>> can be used by any application that needs share reservations to organize 
>> file access. VFS already has some sort of this capability - now it's done 
>> through the flock/LOCK_MAND mechanism, but that approach is non-atomic. This 
>> patchset builds new capabilities on top of the existing one but doesn't bring 
>> any changes into the flock call semantics.
>>
>> These flags can be used by NFS (built-in-kernel) and CIFS (Samba) servers 
>> and Wine applications through VFS (for local filesystems) or CIFS/NFS 
>> modules. This will help when e.g. Samba and NFS server share the same 
>> directory for Windows and Linux users or Wine applications use Samba/NFS 
>> share to access the same data from different clients.
>>
>> According to the previous discussions the most problematic question is how 
>> to prevent situations like DoS attacks where e.g. the /lib/liba.so file can be 
>> opened with DENYREAD, or something like this. That's why one extra flag, O_DENYMAND, 
>> is added. It indicates to the underlying layer that an application wants to use 
>> O_DENY* flag semantics. It allows us to not affect native Linux applications 
>> (that don't use the O_DENYMAND flag) - so, these flags (and the semantics of the open 
>> syscall that they bring) are used only for those applications that really 
>> want it processed that way.
>>
>> So, we have four new flags:
>> O_DENYREAD - to prevent other opens with read access,
>> O_DENYWRITE - to prevent other opens with write access,
>> O_DENYDELETE - to prevent delete operations (this flag is not implemented in 
>> VFS and NFS part and only suitable for CIFS module),
>> O_DENYMAND - to switch on/off three flags above.
>
> O_DENYMAND doesn't deny anything.  Would a name like O_RESPECT_DENY be
> better?
>
> Other than that, this seems like a sensible mechanism.

I don't mind renaming it. Your suggestion looks ok to me, thanks.
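
For illustration, the share-reservation idea behind the quoted flag list can be sketched as a conflict check between two opens. Everything here is hypothetical: the bit values are invented, and the symmetric check (a new open may not deny access that an existing open already holds) is an assumption borrowed from SMB-style share modes rather than something stated in the cover letter.

```c
#include <assert.h>

/* Hypothetical bit values; not taken from any kernel header. */
#define DEMO_READ      0x01u
#define DEMO_WRITE     0x02u
#define DEMO_DENYREAD  0x10u
#define DEMO_DENYWRITE 0x20u

struct demo_open {
	unsigned access;	/* DEMO_READ and/or DEMO_WRITE */
	unsigned deny;		/* DEMO_DENYREAD and/or DEMO_DENYWRITE */
};

/* Return 1 if the incoming open may coexist with the existing one, else 0. */
static int demo_share_ok(struct demo_open existing, struct demo_open incoming)
{
	/* The existing opener's deny bits block matching incoming access. */
	if ((existing.deny & DEMO_DENYREAD) && (incoming.access & DEMO_READ))
		return 0;
	if ((existing.deny & DEMO_DENYWRITE) && (incoming.access & DEMO_WRITE))
		return 0;

	/* Assumed symmetric rule: incoming may not deny access already held. */
	if ((incoming.deny & DEMO_DENYREAD) && (existing.access & DEMO_READ))
		return 0;
	if ((incoming.deny & DEMO_DENYWRITE) && (existing.access & DEMO_WRITE))
		return 0;

	return 1;
}
```

A check of this shape has to run atomically with the open itself, which is the property the cover letter says the flock/LOCK_MAND approach lacks.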

-- 
Best regards,
Pavel Shilovsky.

Re: [tip:core/locking] x86/smp: Move waiting on contended ticket lock out of line

2013-02-28 Thread Rik van Riel


On 02/28/2013 06:09 PM, Linus Torvalds wrote:


So I almost think that *everything* there in the semaphore code could
be done under RCU. The actual spinlock doesn't seem to matter much, at
least for semaphores. The semaphore values themselves seem to be
protected by the atomic operations, but I might be wrong about that, I
didn't even check.


Checking try_atomic_semop and do_smart_update, it looks like neither
is using atomic operations. That part of the semaphore code would
still benefit from spinlocks.

The way the code handles a whole batch of semops all at once,
potentially to multiple semaphores at once, and with the ability
to undo all of the operations, it looks like the spinlock will
still need to be per block of semaphores.

I guess the code may still benefit from Michel's locking code,
after the permission stuff has been moved from under the spinlock.

Two remaining worries are the security_sem_free call and the
non-RCU list_del calls from freeary, called from the SEM_RMID
code. They are probably fine, but worth another pair of eyes...

--
All rights reversed

Re: [PATCH] regulator: palmas: use correct device node for DT parsing

2013-02-28 Thread Mark Brown

On Wed, Feb 27, 2013 at 02:23:42PM +, Graeme Gregory wrote:
> On 27/02/13 14:10, Laxman Dewangan wrote:

> > When the device is registered through DT, the regulators node
> > exists in the parent device node of the regulator driver. Hence pass
> > the parent device node for parsing DT in place of the self-device
> > node, which is typically NULL.

> > -   struct device_node *node = pdev->dev.of_node;
> > +   struct device_node *node = pdev->dev.parent->of_node;

> This is not correct, nor is the reasoning.

> I suspect your previous patch broke DT probing so you're not getting nodes
> filled in.

So, the reason that this pattern has generally been followed is so that
the regulator core can do the equivalent of regulator_get(dev, supply)
to find the supplies.  Using the parent device there is particularly
important in non-DT systems, so that we can map the child regulator
supply in by using the dev_name() of the parent rather than the MFD
internal subdevice name, but for pure DT systems where it's all just
direct links it's less of an issue.


Re: [PATCH RFC] usb: dwc3: Get PHY from platform specific dwc3 dt node.

2013-02-28 Thread Felipe Balbi

Hi,

On Thu, Feb 28, 2013 at 08:09:33PM +0530, Vivek Gautam wrote:
> On Thu, Jan 31, 2013 at 9:08 PM, Felipe Balbi  wrote:
> > On Thu, Jan 31, 2013 at 09:00:37PM +0530, Vivek Gautam wrote:
> >> Hi Felipe,
> >>
> >>
> >> On Thu, Jan 31, 2013 at 8:55 PM, Felipe Balbi  wrote:
> >> > Hi,
> >> >
> >> > On Thu, Jan 31, 2013 at 08:53:27PM +0530, Vivek Gautam wrote:
> >> >> >> Moreover, SoCs having multiple dwc3 controllers will have multiple
> >> >> >> PHYs, which eventually be added using usb_add_phy_dev(), and not
> >> >> >> using usb_add_phy(). So each dwc3 controller won't be able to
> >> >> >> get PHYs by simply calling devm_usb_get_phy() also.
> >> >> >
> >> >> > No. We have added usb_get_phy_dev() for that purpose in the case of 
> >> >> > non-dt.
> >> >> > I think, instead you can have a patch to use devm_usb_get_phy_dev() 
> >> >> > here and
> >> >> > in exynos platform specific code use usb_bind_phy() to bind the phy 
> >> >> > and
> >> >> > controller till you change it to dt.
> >> >> >
> >> >>
> >> >> We have dt support for dwc3-exynos, in such case we should go ahead with
> >> >> of_platform_populate(), right ?
> >> >> But when I use of_platform_populate() I will not be able to set
> >> >> dma_mask to dwc3->dev. :-(
> >> >
> >> > do you have a special need for dma_mask because OF already sets it.
> >> >
> >> If i am not wrong of_platform_device_create_pdata() will set
> >> "dev->dev.coherent_dma_mask = DMA_BIT_MASK(32)"
> >> and not dma_mask.
> >> In fact we had some discussion sometime back when we needed the same
> >> for dwc3-exynos in the thread:
> >> [PATCH v2 1/2] USB: dwc3-exynos: Add support for device tree
> >>
> >> But couldn't get a final nod on it.
> >> So suggestions here please. :-)
> >
> > hmm.. you're right there. Grant, Rob ? Any hints ?
> >
> 
> Any suggestions on this ?

anyone ?

-- 
balbi



Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread H. Peter Anvin

On 02/25/2013 08:51 PM, Martin Bligh wrote:
>> Do you mean we can remove numaq x86 32bit code now?
> 
> Wouldn't bother me at all. The machine is from 1995, end of life c. 2000?
> Was useful in the early days of getting NUMA up and running on Linux,
> but is now too old to be a museum piece, really.
> 

I'd be very happy to get the NUMAQ code ripped out.  I am wondering if
there are any reasons to keep any 32-bit x86 NUMA code at all.

-hpa



Re: [PATCH 2/3] cgroup: add cgroup_name() API

2013-02-28 Thread Li Zefan

On 2013/2/28 22:49, Tejun Heo wrote:
> On Wed, Feb 27, 2013 at 10:53 PM, Li Zefan  wrote:
>>> static const struct cgroup_name root_cgroup_name = { .name = "/" };
>>
>> Can't... That's char name[0] not char *name.
> 
> Flexible array members can be statically initialized. If you wanna be
> really anal about it, you can do it manually with a wrapping struct
> but I don't think that would be necessary.
> 

I didn't know this difference between flexible array and zero-size array.
Thanks.
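
For illustration, here is a minimal sketch of the point made above, assuming GCC or Clang in their default GNU C mode, where static initialization of a flexible array member is accepted as an extension. The struct and variable names with a _demo suffix are hypothetical and are not taken from the cgroup patch; the len member exists only because a flexible array may not be the sole member of a struct.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the struct discussed in the thread. */
struct cgroup_name_demo {
	int len;	/* placeholder member, not from the patch */
	char name[];	/* C99 flexible array member */
};

/*
 * GNU C extension: a flexible array member of an object with static
 * storage duration may be given an initializer, and the compiler
 * reserves enough room in the object to hold it.
 */
static const struct cgroup_name_demo root_name_demo = {
	.len = 1,
	.name = "/",
};

/* Returns 1 if the statically initialized name reads back as "/". */
static int root_name_is_slash(void)
{
	return strcmp(root_name_demo.name, "/") == 0;
}
```

Building with -pedantic is expected to produce a diagnostic, since this is an extension rather than standard C.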


Re: [PATCH] Adds support for Open Firmware in MAX730x GPIO Driver

2013-02-28 Thread leroy christophe



On 01/03/2013 01:43, Linus Walleij wrote:
> On Fri, Feb 22, 2013 at 10:26 AM, Christophe Leroy
>  wrote:
>
>> This patch allows the use of the MAX730x Driver on systems using
>> the Open Firmware platform format
>>
>> Signed-off-by: Patrick Vasseur 
>> Signed-off-by: Christophe Leroy 
>
> (...)
>
>>  /* bits_per_word cannot be configured in platform data */
>> -   spi->bits_per_word = 16;
>> +   if (spi->dev.platform_data)
>> +   spi->bits_per_word = 16;
>
> What about just fixing so you *can* specify that instead?
> The comment looks more like a FIXME to me.

Er, OK, why not. But the purpose of my patch here is to allow using
this driver with of_platform in addition to platform.

This FIXME is not mine; it already existed in the driver.
Since of_platform can configure bits per word, the only thing I did was
add a test so that this FIXME is not applied in the of_platform case.

Do you think my patch is not acceptable as it is?

Regards
Christophe

Regards
Christophe


Re: [GIT PULL 0/3] arm-soc: late changes for 3.9

2013-02-28 Thread Olof Johansson

On Thu, Feb 28, 2013 at 8:18 PM, Linus Torvalds
 wrote:

>> I've pushed a resolved branch for reference (late-branches-resolved)
>> in case you want to compare conflict resolutions.
>
> So Arnd's tag talked about removing the stale gpio.h, but I think it
> was the i2c.h that was now also stale. So I removed both - even though
> technically, the merge should have left i2c.h since it was in both
> parents. You should double-check that, but I don't see how that
>  could *possibly* be valid any more, and people had tried
> (unsuccessfully) to remove it once already, so...

The i2c include is definitely unnecessary since there's no
i2c_board_info stuff left in the file. Thanks for catching that.


-Olof

Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread Tang Chen


On 03/01/2013 01:00 PM, Yinghai Lu wrote:
> On Thursday, February 28, 2013, H. Peter Anvin wrote:
>
>> On 02/28/2013 08:32 PM, Linus Torvalds wrote:
>>>
>>> Yingai, Andrew,
>>>   is this ok with you two?
>>>
>>>  Linus
>>
>> FWIW, it makes sense to me iff it resolves the problems
>
> I prefer to reverting all 8 patches.
>
> Actually I have worked out one patch that could solve all problems, but it
> is too intrusive that I do  not want to split it to small pieces to
> post it.
>
> Leaving the movablemem_map related changes in  the upstream tree,
> will prevent me from continuing to make memblock to be used to allocate
> page table on local node ram for hot add.

Hi Yinghai,

Would you please give me a url to your code ?

I don't think movablemem_map will block your work a lot. According to your
description, you are modifying memblock to reserve some memory for local
node pagetables, right ?

If so, I think it won't be too difficult to make the code OK with your work.

Thanks. :)

> Will send reverting patch and putting page table on local node patch around
> 10pm after I get home.
>
> Thanks



Re: [PATCH] regulator: core: fix documentation error in regulator_allow_bypass

2013-02-28 Thread Mark Brown

On Thu, Feb 28, 2013 at 06:44:54PM -0600, Nishanth Menon wrote:
> commit f59c8f9f (regulator: core: Support bypass mode)
> has a short documentation error around the regulator_allow_bypass
> parameter 'enable' which is documented as 'allow'.

Applied, thanks.



Re: [PATCH] regulator: core: update kernel documentation for regulator_desc

2013-02-28 Thread Mark Brown

On Thu, Feb 28, 2013 at 06:12:47PM -0600, Nishanth Menon wrote:
> commit df367931 (regulator: core: Provide regmap get/set bypass
> operations) introduced regulator_[gs]et_bypass_regmap

Applied, thanks.



Re: commit_creds oops

2013-02-28 Thread Dave Jones

On Thu, Feb 28, 2013 at 04:25:40PM -0800, Eric W. Biederman wrote:
 
 > > [   89.639850] RIP: 0010:[]  [] 
 > > commit_creds+0x250/0x2f0
 > > [   89.658399] Call Trace:
 > > [   89.658822]  [] key_change_session_keyring+0xfb/0x140
 > > [   89.659845]  [] task_work_run+0xa5/0xd0
 > > [   89.660698]  [] do_notify_resume+0x71/0xb0
 > > [   89.661581]  [] int_signal+0x12/0x17
 > >
 > > Appears to be..
 > >
 > > if ((set_ns == subset_ns->parent)  &&
 > >  850:   48 8b 8a c8 00 00 00mov0xc8(%rdx),%rcx
 > >
 > > from the inlined cred_cap_issubset
 > 
 > Interesting.  That line is protected with the check subset_ns !=
 > &init_user_ns so subset_ns->parent must be valid or subset_ns is not
 > a proper user namespace.
 > 
 > Ugh.  I think I see what is going on and it is just silly. 
 > 
 > It looks like by historical accident we have been reading trying to set
 > new->user_ns from new->user_ns.  Which is totally silly as new->user_ns
 > is NULL (as is every other field in new except session_keyring at that
 > point).
 > 
 > It looks like it is safe to sleep in key_change_session_keyring so why
 > we just don't use prepare_creds there like everywhere else is beyond
 > me.
 > 
 > The intent is clearly to copy all of the fields from old to new so what
 > we should be doing is copying old->user_ns into new->user_ns.
 > 
 > Dave can you verify that this patch fixes the oops?

Looks like it.  Haven't hit the same thing since applying your patch.

I noticed though that get_user_ns bumps a refcount.  Is this what we
want if we're just copying ?

Dave
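
As an aside on the refcount question, here is a toy model in plain C. It is not kernel code: every demo_ name is hypothetical, and it assumes that each credential object drops one reference on its namespace when it is released. Under that assumption, copying the pointer into a second object without taking a reference would make the count drop one time too many.

```c
#include <assert.h>

/* Toy refcounted object standing in for a user namespace. */
struct demo_ns {
	int refcount;
};

/* Toy credential that holds one reference on its namespace. */
struct demo_cred {
	struct demo_ns *ns;
};

/* Take a reference and hand the pointer back to the caller. */
static struct demo_ns *demo_get_ns(struct demo_ns *ns)
{
	ns->refcount++;
	return ns;
}

/* Drop one reference. */
static void demo_put_ns(struct demo_ns *ns)
{
	ns->refcount--;
}

/* Copy the namespace pointer, taking a reference for the new holder. */
static void demo_copy_ns(struct demo_cred *fresh, const struct demo_cred *old)
{
	fresh->ns = demo_get_ns(old->ns);
}

/* Releasing a credential drops the reference it holds. */
static void demo_free_cred(struct demo_cred *cred)
{
	demo_put_ns(cred->ns);
	cred->ns = 0;
}
```

In this model the count only returns to zero once both holders are released, which is the usual reason a copy takes its own reference.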


Re: [PATCH] Regulator: db8500-prcmu - remove incorrect __exit markup

2013-02-28 Thread Mark Brown

On Sun, Feb 24, 2013 at 07:26:25PM -0800, Dmitry Torokhov wrote:
> Even if bus is not hot-pluggable, the devices can be unbound from the
> driver via sysfs, so we should not be using __exit annotations on
> remove() methods. The only exception is drivers registered with
> platform_driver_probe() which specifically disables sysfs bind/unbind
> attributes.

Applied, thanks.  It should have been __devexit until that was
removed.



Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread Yasuaki Ishimatsu


2013/03/01 14:00, Yinghai Lu wrote:
> On Thursday, February 28, 2013, H. Peter Anvin wrote:
>
>> On 02/28/2013 08:32 PM, Linus Torvalds wrote:
>>>
>>> Yingai, Andrew,
>>>   is this ok with you two?
>>>
>>>  Linus
>>
>> FWIW, it makes sense to me iff it resolves the problems
>
> I prefer to reverting all 8 patches.
>
> Actually I have worked out one patch that could solve all problems, but it
> is too intrusive that I do  not want to split it to small pieces to
> post it.
>
> Leaving the movablemem_map related changes in  the upstream tree,
> will prevent me from continuing to make memblock to be used to allocate
> page table on local node ram for hot add.

Original issue occurs by two patches. And it is fixed by Tang's reverting
patch. So other patches are obviously unrelated to original problem. Thus
there is no reason to revert all patches related with movablemem_map.

If there is a reason, movablemem_map patches prevent only your work.

If you keep on developing your work, you should develop it in consideration
of those patches.

Thanks,
Yasuaki Ishimatsu

> Will send reverting patch and putting page table on local node patch around
> 10pm after I get home.
>
> Thanks





[V2 PATCH -rt] futex: fix unbalanced spin_lock/spin_unlock() in exit_pi_state_list()

2013-02-28 Thread Yong Zhang

From: Yong Zhang 

Otherwise, the warning below is sometimes shown when running some tests:

WARNING: at kernel/sched/core.c:3423 migrate_disable+0xbf/0xd0()
Hardware name: OptiPlex 755
Modules linked in: floppy parport parport_pc minix
Pid: 1800, comm: tst-robustpi8 Tainted: G W 3.4.28-rt40 #1
Call Trace:
 [] warn_slowpath_common+0x7f/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] migrate_disable+0xbf/0xd0
 [] exit_pi_state_list+0xa5/0x170
 [] mm_release+0x12f/0x170
 [] exit_mm+0x26/0x140
 [] ? acct_collect+0x186/0x1c0
 [] do_exit+0x146/0x930
 [] ? get_parent_ip+0x11/0x50
 [] do_group_exit+0x4d/0xc0
 [] get_signal_to_deliver+0x23f/0x6a0
 [] do_signal+0x65/0x5e0
 [] ? group_send_sig_info+0x76/0x80
 [] do_notify_resume+0x98/0xd0
 [] int_signal+0x12/0x17
---[ end trace 0004 ]---

The reason is that spin_lock() is taken in atomic context, but
spin_unlock() is not.

Signed-off-by: Yong Zhang 
Cc: Thomas Gleixner 
Cc: Steven Rostedt 
---
 kernel/futex.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 9e26e87..daada3d 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -568,7 +568,9 @@ void exit_pi_state_list(struct task_struct *curr)
 * task still owns the PI-state:
 */
if (head->next != next) {
+   raw_spin_unlock_irq(&curr->pi_lock);
spin_unlock(&hb->lock);
+   raw_spin_lock_irq(&curr->pi_lock);
continue;
}
 
-- 
1.7.9.5


Re: [PATCH] futex: fix unbalanced spin_lock/spin_unlock() in exit_pi_state_list()

2013-02-28 Thread Yong Zhang

On Fri, Mar 1, 2013 at 9:36 AM, Yong Zhang  wrote:
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -562,16 +562,17 @@ void exit_pi_state_list(struct task_struct *curr)
>
> spin_lock(&hb->lock);
>
> -   raw_spin_lock_irq(&curr->pi_lock);
> /*
>  * We dropped the pi-lock, so re-check whether this
>  * task still owns the PI-state:
>  */
> if (head->next != next) {

Please ignore this patch; it opens a race window here.
A new patch is coming soon.

Thanks,
Yong

> spin_unlock(&hb->lock);
> +   raw_spin_lock_irq(&curr->pi_lock);
> continue;
> }
>
> +   raw_spin_lock_irq(&curr->pi_lock);
> WARN_ON(pi_state->owner != curr);
> WARN_ON(list_empty(&pi_state->list));
> list_del_init(&pi_state->list);
> --
> 1.7.9.5
>



-- 
Only stand for myself

[PATCH]serial: 8250: Fix detect XScale port wrong

2013-02-28 Thread Wang YanQing

Some UARTs implement enhanced functions using bits that are unused in
the 16550 standard, such as the UART_IER_UUE bit, and this causes them
to be misdetected as XScale ports. Test both UART_IER_UUE and
UART_IER_RTOIE to reduce these false detections, which leave the
affected UARTs not working.

Serial controller: Device 4348:3253 (CH352 PCI based Multi-I/O Controller)
is an example. It uses UART_IER_UUE for its LOWPOWER function;
the datasheets are available from the URLs below:

http://wch-ic.com/download/list.asp?id=116
CH352DS1.PDF

http://wch-ic.com/download/list.asp?id=117
CH352DS2.PDF.

I chose UART_IER_RTOIE as the second test bit because testing it is
harmless for the current code: UART_CAP_RTOIE is set anyway if the
port is an XScale.

Signed-off-by: Wang YanQing 
---
 drivers/tty/serial/8250/8250.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/tty/serial/8250/8250.c b/drivers/tty/serial/8250/8250.c
index 0efc815..2c1f9c9 100644
--- a/drivers/tty/serial/8250/8250.c
+++ b/drivers/tty/serial/8250/8250.c
@@ -841,6 +841,7 @@ static void autoconfig_16550a(struct uart_8250_port *up)
 {
unsigned char status1, status2;
unsigned int iersave;
+   unsigned int iertest;
 
up->port.type = PORT_16550A;
up->capabilities |= UART_CAP_FIFO;
@@ -966,16 +967,25 @@ static void autoconfig_16550a(struct uart_8250_port *up)
 * We're going to explicitly set the UUE bit to 0 before
 * trying to write and read a 1 just to make sure it's not
 * already a 1 and maybe locked there before we even start start.
+*
+* 01/03/2013
+* Some UARTs add enhanced functions with unused bit in
+* 16550 standard, like UART_IER_UUE bit, it cause XScale
+* detect wrong. Now detect UART_IER_UUE and UART_IER_RTOIE
+* to reduce the annoying wrong result which cause UART don't
+* work.
 */
iersave = serial_in(up, UART_IER);
-   serial_out(up, UART_IER, iersave & ~UART_IER_UUE);
-   if (!(serial_in(up, UART_IER) & UART_IER_UUE)) {
+   iertest = UART_IER_UUE | UART_IER_RTOIE;
+
+   serial_out(up, UART_IER, iersave & ~iertest);
+   if (!(serial_in(up, UART_IER) & iertest)) {
/*
 * OK it's in a known zero state, try writing and reading
 * without disturbing the current state of the other bits.
 */
-   serial_out(up, UART_IER, iersave | UART_IER_UUE);
-   if (serial_in(up, UART_IER) & UART_IER_UUE) {
+   serial_out(up, UART_IER, iersave | iertest);
+   if ((serial_in(up, UART_IER) & iertest) == iertest) {
/*
 * It's an Xscale.
 * We'll leave the UART_IER_UUE bit set to 1 (enabled).
-- 
1.7.11.1.116.g8228a23
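
The two-bit probe in this patch can be sketched outside the kernel with a simulated register. This is only a model under stated assumptions: struct fake_ier and its helpers are hypothetical, the bit values are the ones I believe serial_reg.h uses, and which bits a given chip lets software set is an assumption made for the example (in particular, that a CH352-like part accepts the UUE bit but not RTOIE). Restoring the saved value on failure is a simplification of this sketch.

```c
#include <assert.h>

#define UART_IER_RTOIE	0x10	/* receiver time-out interrupt enable */
#define UART_IER_UUE	0x40	/* UART unit enable */

/* Hypothetical register model: only bits set in 'writable' can be stored. */
struct fake_ier {
	unsigned char value;
	unsigned char writable;
};

static void ier_write(struct fake_ier *reg, unsigned char v)
{
	reg->value = v & reg->writable;
}

static unsigned char ier_read(const struct fake_ier *reg)
{
	return reg->value;
}

/*
 * Clear both test bits, then require that both can be set again.
 * Returns 1 when the register behaves like an XScale IER, else 0.
 */
static int probe_xscale(struct fake_ier *reg)
{
	unsigned char iersave = ier_read(reg);
	unsigned char iertest = UART_IER_UUE | UART_IER_RTOIE;

	ier_write(reg, (unsigned char)(iersave & ~iertest));
	if (!(ier_read(reg) & iertest)) {
		ier_write(reg, (unsigned char)(iersave | iertest));
		if ((ier_read(reg) & iertest) == iertest)
			return 1;
	}
	ier_write(reg, iersave);	/* not a match: put the old value back */
	return 0;
}
```

With a single-bit test on UART_IER_UUE alone, the modelled part with a writable mask of 0x4f would be reported as an XScale; requiring both bits rejects it.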

Re: Re: kprobing "hash_64.constprop.26" crashes the system, recursion through get_kprobe?

2013-02-28 Thread Masami Hiramatsu

Hi,

(2013/03/01 14:31), Ananth N Mavinakayanahalli wrote:
> On Wed, Feb 27, 2013 at 11:42:41AM +0200, Timo Juhani Lindfors wrote:
>>
>> There is a long-standing problem in the systemtap community where
>> accidentally kprobing a delicate function causes the system to crash:
>>
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604453
>> http://sourceware.org/bugzilla/show_bug.cgi?id=2725
>> https://bugzilla.redhat.com/show_bug.cgi?id=655904
>> http://sourceware.org/bugzilla/show_bug.cgi?id=13659
>>
>> The current solution is to mark these functions with __kprobes that
>> places them to a separate kprobe-free section (from __kprobes_text_start
>> to __kprobes_text_end). This has the nice side effect that also inlined
>> copies of innocent functions can not be kprobed when they are called
>> from functions marked with __kprobes.
>>
>> Now, hash_64 is marked "inline" but this is only a hint for the
>> compiler. On my Debian unstable system (Linux 3.7.3-1~experimental.1)
>> hash_64 actually exists in six different places thanks to the GCC
>> ipa-cp (interprocedural constant propagation) optimization:
> 
> I am unable to recreate this problem on a fedora system; hash_64 is
> inlined AFAICS.

I also tried and couldn't reproduce the hash_64 problem on my Ubuntu 12.10 system.
Could you tell us your kconfig?

>> crashes the system. I used the "xm dump-core" facility of xen to dump
>> the memory of the domU and obtained the following backtrace using
>> "crash vm.img /usr/lib/debug/boot/vmlinux-3.7-trunk-amd64" and
>> "for bt":
>>
>> PID: 3007   TASK: 88003b9bb840  CPU: 0   COMMAND: "insmod"
>>  #0 [88003db8] __schedule at 813777f8
>>  #1 [88003db999a8] hash_64.constprop.26 at 81099909
>>  #2 [88003db999d0] get_kprobe at 8137c5bb
>>  #3 [88003db999e0] kprobe_exceptions_notify at 8137a3c1
>>  #4 [88003db99a40] notifier_call_chain at 8137b5a3
>>  #5 [88003db99a80] notify_die at 8137b60c
>>  #6 [88003db99ab0] do_int3 at 81378fa0
>>  #7 [88003db99ad0] xen_int3 at 8137887e
>> [exception RIP: hash_64.constprop.26+1]
>> RIP: 81099909  RSP: 88003db99b80  RFLAGS: 0086
>> RAX:   RBX: 81099908  RCX: 
>> RDX: 88003db99c38  RSI: 0002  RDI: 81099908
>> RBP: 0002   R8:    R9: 81629b10
>> R10: 66a8  R11: a016a000  R12: 88003f80dd90
>> R13: 81099908  R14: 81099909  R15: a016a010
>> ORIG_RAX:   CS: 1e030  SS: e02b
>>  #8 [88003db99b80] get_kprobe at 8137c5bb
>>  #9 [88003db99b90] kprobe_exceptions_notify at 8137a3c1
>> #10 [88003db99bf0] notifier_call_chain at 8137b5a3
>> #11 [88003db99c30] notify_die at 8137b60c
>> #12 [88003db99c60] do_int3 at 81378fa0
>> #13 [88003db99c80] xen_int3 at 8137887e
>> [exception RIP: hash_64.constprop.26+1]
>> RIP: 81099909  RSP: 88003db99d30  RFLAGS: 0246
>> RAX:   RBX: 81099908  RCX: a521
>> RDX: 81099908  RSI: 81099908  RDI: 81099908
>> RBP: 88003db99e10   R8: 140b   R9: 81099908
>> R10: 66a8  R11: a016a000  R12: 81099908
>> R13: 81099903  R14:   R15: a016a010
>> ORIG_RAX:   CS: e030  SS: e02b
>> #14 [88003db99d30] get_kprobe at 8137c5bb
>> #15 [88003db99d40] __recover_optprobed_insn at 8102d4d4
>> #16 [88003db99d70] recover_probed_instruction at 8102d479
>> #17 [88003db99d90] can_optimize at 8137a952
>> #18 [88003db99e50] arch_prepare_optimized_kprobe at 8137ab2c
>> #19 [88003db99ea0] alloc_aggr_kprobe.isra.17 at 8137bb9b
>> #20 [88003db99ec0] register_kprobe at 8137cf16
>> #21 [88003db99f00] init_module at a000600d [testcase1]
>> #22 [88003db99f10] do_one_initcall at 810020b6
>> #23 [88003db99f40] sys_init_module at 81083c4f
>> #24 [88003db99f80] system_call_fastpath at 8137d6e9
>> RIP: 7f4aef62414a  RSP: 7fffbd2e9d08  RFLAGS: 0202
>> RAX: 00af  RBX: 8137d6e9  RCX: 7f4aef62048a
>> RDX: 7f4aef8e3f68  RSI: 0002b833  RDI: 7f4aefcca000
>> RBP: 7f4af0a391a0   R8: 0003   R9: 
>> R10: 7f4aef62048a  R11: 0202  R12: 7f4aef8e3f68
>> R13: 7f4af0a39270  R14: 7f4af0a38090  R15: 
>> ORIG_RAX: 00af  CS: e033  SS: e02b
>>
>>
>> It seems that the recursion occurs even before register_kprobe
>> returns. I am not sure how this should be solved. Should we mark hash_64
>> with __kprobes? Or perhaps with __attribute__((always_inline))?
> 
> This

Re: [RFC PATCH v3 2/6] uretprobes/x86: hijack return address

2013-02-28 Thread Ananth N Mavinakayanahalli

On Thu, Feb 28, 2013 at 12:00:11PM +0100, Anton Arapov wrote:
>   hijack the return address and replace it with a "trampoline"
> 
> v2:
>   - remove ->doomed flag, kill task immediately
> 
> Signed-off-by: Anton Arapov 
> ---
>  arch/x86/include/asm/uprobes.h |  1 +
>  arch/x86/kernel/uprobes.c  | 29 +
>  2 files changed, 30 insertions(+)
> 
> diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> index 8ff8be7..c353555 100644
> --- a/arch/x86/include/asm/uprobes.h
> +++ b/arch/x86/include/asm/uprobes.h
> @@ -55,4 +55,5 @@ extern int  arch_uprobe_post_xol(struct arch_uprobe *aup, 
> struct pt_regs *regs);
>  extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
>  extern int  arch_uprobe_exception_notify(struct notifier_block *self, 
> unsigned long val, void *data);
>  extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs 
> *regs);
> +extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long 
> rp_trampoline_vaddr, struct pt_regs *regs);
>  #endif   /* _ASM_UPROBES_H */
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 0ba4cfb..85e2153 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -697,3 +697,32 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, 
> struct pt_regs *regs)
>   send_sig(SIGTRAP, current, 0);
>   return ret;
>  }
> +
> +extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long
> + rp_trampoline_vaddr, struct pt_regs *regs)
> +{
> + int rasize, ncopied;
> + unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */
> +
> + rasize = is_ia32_task() ? 4 : 8;
> + ncopied = copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, 
> rasize);
> + if (unlikely(ncopied))

What if ncopied < rasize? Agreed that the upper order bits can be 0, but should
you not validate ncopied == rasize?

Ananth


Re: [PATCH 1/2] x86: change names of e820 memory map type

2013-02-28 Thread H. Peter Anvin

NAK.  Gratuitous pointless change.

liguang  wrote:

>E820_RAM -> E820_TYPE_RAM
>E820_ACPI-> E820_TYPE_ACPI
>...
>
>Names like E820_RAM are conflict-prone,
>because a user who is not aware that the
>name is already defined by e820.h may
>well define a macro with the same name.
>
>Signed-off-by: liguang 
>---
> arch/x86/boot/compressed/eboot.c   |   10 +++---
> arch/x86/include/asm/gart.h|2 +-
> arch/x86/include/uapi/asm/e820.h   |   12 
> arch/x86/kernel/acpi/boot.c|2 +-
> arch/x86/kernel/aperture_64.c  |4 +-
> arch/x86/kernel/cpu/centaur.c  |2 +-
> arch/x86/kernel/cpu/mtrr/cleanup.c |2 +-
>arch/x86/kernel/e820.c |   52
>
> arch/x86/kernel/setup.c|   22 +++---
> arch/x86/kernel/tboot.c|8 ++--
> arch/x86/mm/init_64.c  |   12 
> arch/x86/pci/mmconfig-shared.c |2 +-
> arch/x86/platform/efi/efi.c|   14 
> arch/x86/platform/visws/visws_quirks.c |6 ++--
> arch/x86/xen/setup.c   |   16 +-
> 15 files changed, 83 insertions(+), 83 deletions(-)
>
>diff --git a/arch/x86/boot/compressed/eboot.c
>b/arch/x86/boot/compressed/eboot.c
>index f8fa411..5bda487 100644
>--- a/arch/x86/boot/compressed/eboot.c
>+++ b/arch/x86/boot/compressed/eboot.c
>@@ -1040,15 +1040,15 @@ again:
>   case EFI_MEMORY_MAPPED_IO:
>   case EFI_MEMORY_MAPPED_IO_PORT_SPACE:
>   case EFI_PAL_CODE:
>-  e820_type = E820_RESERVED;
>+  e820_type = E820_TYPE_RESERVED;
>   break;
> 
>   case EFI_UNUSABLE_MEMORY:
>-  e820_type = E820_UNUSABLE;
>+  e820_type = E820_TYPE_UNUSABLE;
>   break;
> 
>   case EFI_ACPI_RECLAIM_MEMORY:
>-  e820_type = E820_ACPI;
>+  e820_type = E820_TYPE_ACPI;
>   break;
> 
>   case EFI_LOADER_CODE:
>@@ -1056,11 +1056,11 @@ again:
>   case EFI_BOOT_SERVICES_CODE:
>   case EFI_BOOT_SERVICES_DATA:
>   case EFI_CONVENTIONAL_MEMORY:
>-  e820_type = E820_RAM;
>+  e820_type = E820_TYPE_RAM;
>   break;
> 
>   case EFI_ACPI_MEMORY_NVS:
>-  e820_type = E820_NVS;
>+  e820_type = E820_TYPE_NVS;
>   break;
> 
>   default:
>diff --git a/arch/x86/include/asm/gart.h b/arch/x86/include/asm/gart.h
>index 156cd5d..4d22bcc 100644
>--- a/arch/x86/include/asm/gart.h
>+++ b/arch/x86/include/asm/gart.h
>@@ -97,7 +97,7 @@ static inline int aperture_valid(u64 aper_base, u32
>aper_size, u32 min_size)
>   printk(KERN_INFO "Aperture beyond 4GB. Ignoring.\n");
>   return 0;
>   }
>-  if (e820_any_mapped(aper_base, aper_base + aper_size, E820_RAM)) {
>+  if (e820_any_mapped(aper_base, aper_base + aper_size, E820_TYPE_RAM))
>{
>   printk(KERN_INFO "Aperture pointing to e820 RAM. Ignoring.\n");
>   return 0;
>   }
>diff --git a/arch/x86/include/uapi/asm/e820.h
>b/arch/x86/include/uapi/asm/e820.h
>index bbae024..2d400b1 100644
>--- a/arch/x86/include/uapi/asm/e820.h
>+++ b/arch/x86/include/uapi/asm/e820.h
>@@ -32,11 +32,11 @@
> 
> #define E820NR0x1e8   /* # entries in E820MAP */
> 
>-#define E820_RAM  1
>-#define E820_RESERVED 2
>-#define E820_ACPI 3
>-#define E820_NVS  4
>-#define E820_UNUSABLE 5
>+#define E820_TYPE_RAM 1
>+#define E820_TYPE_RESERVED2
>+#define E820_TYPE_ACPI3
>+#define E820_TYPE_NVS 4
>+#define E820_TYPE_UNUSABLE5
> 
> 
> /*
>@@ -45,7 +45,7 @@
>  * included in the S3 integrity calculation and so should not include
>  * any memory that BIOS might alter over the S3 transition
>  */
>-#define E820_RESERVED_KERN128
>+#define E820_TYPE_RESERVED_KERN128
> 
> #ifndef __ASSEMBLY__
> #include 
>diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>index 230c8ea..9595747 100644
>--- a/arch/x86/kernel/acpi/boot.c
>+++ b/arch/x86/kernel/acpi/boot.c
>@@ -1712,6 +1712,6 @@ int __acpi_release_global_lock(unsigned int
>*lock)
> 
>void __init arch_reserve_mem_area(acpi_physical_address addr, size_t
>size)
> {
>-  e820_add_region(addr, size, E820_ACPI);
>+  e820_add_region(addr, size, E820_TYPE_ACPI);
>   update_e820();
> }
>diff --git a/arch/x86/kernel/aperture_64.c
>b/arch/x86/kernel/aperture_64.c
>index d5fd66f..0210300 100644
>--- a/arch/x86/kernel/aperture_64.c
>+++ b/arch/x86/kernel/aperture_64.c
>@@ -322,10 +322,10 @@ void __init early_gart_iommu_check(void)
> 
>   if (gart_fix_e820 && !fix && aper_enabled) {
>   if (e820_any_mapped(aper_base, aper_base + aper_size,
>-  E820_RAM)) {

Re: [PATCH 2/2] x86: add e820 descriptor attribute field

2013-02-28 Thread H. Peter Anvin

NAK in the extreme.  Not only does this break the bootloader protocol, but 
there are systems in the field that break if you give e820 anything other than 
a 20-byte buffer.

liguang  wrote:

>according to ACPI 5.0 Table 15-273
>Address Range Descriptor Structure,
>offset 20 is 32-bit field of Extended
>Attributes for Address Range Descriptor Structure.
>
>Signed-off-by: liguang 
>---
> arch/x86/include/uapi/asm/e820.h |7 ++-
> 1 files changed, 6 insertions(+), 1 deletions(-)
>
>diff --git a/arch/x86/include/uapi/asm/e820.h
>b/arch/x86/include/uapi/asm/e820.h
>index 2d400b1..eb87284 100644
>--- a/arch/x86/include/uapi/asm/e820.h
>+++ b/arch/x86/include/uapi/asm/e820.h
>@@ -38,6 +38,10 @@
> #define E820_TYPE_NVS 4
> #define E820_TYPE_UNUSABLE5
> 
>+#define E820_ATTRIB_NV 0x2
>+#define E820_ATTRIB_SLOW_ACCESS 0x4
>+#define E820_ATTRIB_ERR_LOG 0x8
>+
> 
> /*
>  * reserved RAM used by kernel itself
>@@ -53,7 +57,8 @@ struct e820entry {
>   __u64 addr; /* start of memory segment */
>   __u64 size; /* size of memory segment */
>   __u32 type; /* type of memory segment */
>-} __attribute__((packed));
>+  __u32 attrib;
>+};
> 
> struct e820map {
>   __u32 nr_map;

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
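
To make the 20-byte point concrete, here is a small sketch comparing the two layouts from the quoted patch. The struct names with _packed and _extended suffixes are hypothetical, and the sizes assume a typical x86 ABI with GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Layout before the patch: packed, 8 + 8 + 4 bytes. */
struct e820entry_packed {
	uint64_t addr;	/* start of memory segment */
	uint64_t size;	/* size of memory segment */
	uint32_t type;	/* type of memory segment */
} __attribute__((packed));

/* Layout proposed by the patch: extra field, packed attribute dropped. */
struct e820entry_extended {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
	uint32_t attrib;
};

/* Size of one entry in the existing layout. */
static unsigned long packed_entry_size(void)
{
	return sizeof(struct e820entry_packed);
}

/* Size of one entry in the proposed layout. */
static unsigned long extended_entry_size(void)
{
	return sizeof(struct e820entry_extended);
}
```

Any consumer that steps through an array of entries with a 20-byte stride would misread every entry after the first once the entries grow to 24 bytes.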

Re: kprobing "hash_64.constprop.26" crashes the system, recursion through get_kprobe?

2013-02-28 Thread Ananth N Mavinakayanahalli

On Wed, Feb 27, 2013 at 11:42:41AM +0200, Timo Juhani Lindfors wrote:
> 
> There is a long-standing problem in the systemtap community where
> accidentally kprobing a delicate function causes the system to crash:
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604453
> http://sourceware.org/bugzilla/show_bug.cgi?id=2725
> https://bugzilla.redhat.com/show_bug.cgi?id=655904
> http://sourceware.org/bugzilla/show_bug.cgi?id=13659
> 
> The current solution is to mark these functions with __kprobes that
> places them to a separate kprobe-free section (from __kprobes_text_start
> to __kprobes_text_end). This has the nice side effect that also inlined
> copies of innocent functions can not be kprobed when they are called
> from functions marked with __kprobes.
> 
> Now, hash_64 is marked "inline" but this is only a hint for the
> compiler. On my Debian unstable system (Linux 3.7.3-1~experimental.1)
> hash_64 actually exists in six different places thanks to the GCC
> ipa-cp (interprocedural constant propagation) optimization:

I am unable to recreate this problem on a fedora system; hash_64 is
inlined AFAICS.

> crashes the system. I used the "xm dump-core" facility of xen to dump
> the memory of the domU and obtained the following backtrace using
> "crash vm.img /usr/lib/debug/boot/vmlinux-3.7-trunk-amd64" and
> "for bt":
> 
> PID: 3007   TASK: 88003b9bb840  CPU: 0   COMMAND: "insmod"
>  #0 [88003db8] __schedule at 813777f8
>  #1 [88003db999a8] hash_64.constprop.26 at 81099909
>  #2 [88003db999d0] get_kprobe at 8137c5bb
>  #3 [88003db999e0] kprobe_exceptions_notify at 8137a3c1
>  #4 [88003db99a40] notifier_call_chain at 8137b5a3
>  #5 [88003db99a80] notify_die at 8137b60c
>  #6 [88003db99ab0] do_int3 at 81378fa0
>  #7 [88003db99ad0] xen_int3 at 8137887e
> [exception RIP: hash_64.constprop.26+1]
> RIP: 81099909  RSP: 88003db99b80  RFLAGS: 0086
> RAX:   RBX: 81099908  RCX: 
> RDX: 88003db99c38  RSI: 0002  RDI: 81099908
> RBP: 0002   R8:    R9: 81629b10
> R10: 66a8  R11: a016a000  R12: 88003f80dd90
> R13: 81099908  R14: 81099909  R15: a016a010
> ORIG_RAX:   CS: 1e030  SS: e02b
>  #8 [88003db99b80] get_kprobe at 8137c5bb
>  #9 [88003db99b90] kprobe_exceptions_notify at 8137a3c1
> #10 [88003db99bf0] notifier_call_chain at 8137b5a3
> #11 [88003db99c30] notify_die at 8137b60c
> #12 [88003db99c60] do_int3 at 81378fa0
> #13 [88003db99c80] xen_int3 at 8137887e
> [exception RIP: hash_64.constprop.26+1]
> RIP: 81099909  RSP: 88003db99d30  RFLAGS: 0246
> RAX:   RBX: 81099908  RCX: a521
> RDX: 81099908  RSI: 81099908  RDI: 81099908
> RBP: 88003db99e10   R8: 140b   R9: 81099908
> R10: 66a8  R11: a016a000  R12: 81099908
> R13: 81099903  R14:   R15: a016a010
> ORIG_RAX:   CS: e030  SS: e02b
> #14 [88003db99d30] get_kprobe at 8137c5bb
> #15 [88003db99d40] __recover_optprobed_insn at 8102d4d4
> #16 [88003db99d70] recover_probed_instruction at 8102d479
> #17 [88003db99d90] can_optimize at 8137a952
> #18 [88003db99e50] arch_prepare_optimized_kprobe at 8137ab2c
> #19 [88003db99ea0] alloc_aggr_kprobe.isra.17 at 8137bb9b
> #20 [88003db99ec0] register_kprobe at 8137cf16
> #21 [88003db99f00] init_module at a000600d [testcase1]
> #22 [88003db99f10] do_one_initcall at 810020b6
> #23 [88003db99f40] sys_init_module at 81083c4f
> #24 [88003db99f80] system_call_fastpath at 8137d6e9
> RIP: 7f4aef62414a  RSP: 7fffbd2e9d08  RFLAGS: 0202
> RAX: 00af  RBX: 8137d6e9  RCX: 7f4aef62048a
> RDX: 7f4aef8e3f68  RSI: 0002b833  RDI: 7f4aefcca000
> RBP: 7f4af0a391a0   R8: 0003   R9: 
> R10: 7f4aef62048a  R11: 0202  R12: 7f4aef8e3f68
> R13: 7f4af0a39270  R14: 7f4af0a38090  R15: 
> ORIG_RAX: 00af  CS: e033  SS: e02b
> 
> 
> It seems that the recursion occurs even before register_kprobe
> returns. I am not sure how this should be solved. Should we mark hash_64
> with __kprobes? Or perhaps with __attribute__((always_inline))?

This is a clear case of recursion. Either of the two options should fix
the problem.

Ananth
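
For illustration only, the two options discussed above can be sketched in
user-space C. This is not the kernel's code: hash_64_sketch,
lookup_bucket_sketch, NOPROBE_SECTION_SKETCH and the section name
".noprobe.text" are made-up stand-ins for hash_64, a caller such as
get_kprobe, __kprobes and the kprobes text section, and the multiplier is
an arbitrary odd constant. It assumes GCC or Clang on an ELF target.

```c
#include <stdint.h>

/* Arbitrary odd 64-bit multiplier chosen for this sketch. */
#define HASH_MULT_SKETCH 0x9e3779b97f4a7c15ULL

/*
 * Option 1: force inlining, so the compiler has no reason to emit an
 * out-of-line clone (such as a ".constprop" copy) with its own symbol.
 */
static inline __attribute__((always_inline))
uint64_t hash_64_sketch(uint64_t val, unsigned int bits)
{
	/* Keep the top "bits" bits of the product; bits must be 1..64. */
	return (val * HASH_MULT_SKETCH) >> (64 - bits);
}

/*
 * Option 2: place a function in a dedicated section, which is the idea
 * behind a __kprobes-style annotation: an address range check can then
 * refuse to probe anything inside that section.
 */
#define NOPROBE_SECTION_SKETCH __attribute__((section(".noprobe.text")))

NOPROBE_SECTION_SKETCH
uint64_t lookup_bucket_sketch(uint64_t addr)
{
	/* The inlined helper's code ends up inside the protected section. */
	return hash_64_sketch(addr, 6);
}
```

With the helper forced inline, its body is emitted inside the section of
whichever function calls it, which matches the side effect described in the
quoted message.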


Re: [PATCH 2/7] ksm: treat unstable nid like in stable tree

2013-02-28 Thread Ric Mason



Hi Hugh,
On 02/23/2013 05:03 AM, Hugh Dickins wrote:

On Fri, 22 Feb 2013, Ric Mason wrote:

On 02/21/2013 04:20 PM, Hugh Dickins wrote:

An inconsistency emerged in reviewing the NUMA node changes to KSM:
when meeting a page from the wrong NUMA node in a stable tree, we say
that it's okay for comparisons, but not as a leaf for merging; whereas
when meeting a page from the wrong NUMA node in an unstable tree, we
bail out immediately.

IIUC
- a ksm page from the wrong NUMA node will be added to the current node's stable tree


Please forgive my late response.


That should never happen (and when I was checking with a WARN_ON it did
not happen).  What can happen is that a node already in a stable tree
has its page migrated away to another NUMA node.


- a normal page from the wrong NUMA node will be merged into the current node's
stable tree  <- what am I missing here? I didn't see any special handling in
function stable_tree_search for this case.

nid = get_kpfn_nid(page_to_pfn(page));
root = root_stable_tree + nid;

to choose the right tree for the page, and

if (get_kpfn_nid(stable_node->kpfn) !=
NUMA(stable_node->nid)) {
put_page(tree_page);
goto replace;
}

to make sure that we don't latch on to a node whose page got migrated away.
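
As a rough user-space sketch of the two steps quoted above (pick the tree by
the page's node, then reject a tree node whose page has since moved), one
might write the following. All names here are hypothetical, and the
pfn-to-node mapping is invented for the example; only the shape of the two
checks follows the quoted snippets.

```c
#include <stdbool.h>

#define NR_NODES_SKETCH 2
#define PFNS_PER_NODE_SKETCH 1024UL

struct stable_node_sketch {
	unsigned long kpfn;	/* page frame currently backing the node */
	int nid;		/* NUMA node of the tree it was inserted into */
};

/* Invented mapping: node 0 owns the first 1024 pfns, node 1 the rest. */
static int pfn_to_nid_sketch(unsigned long pfn)
{
	unsigned long nid = pfn / PFNS_PER_NODE_SKETCH;

	return nid < NR_NODES_SKETCH ? (int)nid : NR_NODES_SKETCH - 1;
}

/* Step 1: choose the per-node tree from the page being looked up. */
static int pick_tree_sketch(unsigned long page_pfn)
{
	return pfn_to_nid_sketch(page_pfn);
}

/* Step 2: a tree node is stale if its page now lives on another node. */
static bool node_is_stale_sketch(const struct stable_node_sketch *sn)
{
	return pfn_to_nid_sketch(sn->kpfn) != sn->nid;
}
```

In this sketch a stale node is only detected when the recorded nid and the
node derived from kpfn disagree, which is the condition the reply below
questions.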


I think the ksm implementation of NUMA awareness is buggy.

For page migration, the new page is allocated from the node *which the
page is migrated to*.

- when meeting a page from the wrong NUMA node in an unstable tree
get_kpfn_nid(page_to_pfn(page)) *==* page_to_nid(tree_page)
How can we say it's okay for comparisons, but not as a leaf for merging?
- when meeting a page from the wrong NUMA node in a stable tree
   - meeting a normal page
   - meeting a page which is ksm page before migration
 get_kpfn_nid(stable_node->kpfn) != NUMA(stable_node->nid) can't 
capture them since stable_node is for tree page in current stable tree. 
They are always equal.



- normal page from the wrong NUMA node will compare but not as a leaf for
merging after the patch

I don't understand you there, but hope my remarks above resolve it.

Hugh



[PATCH 2/2] x86: add e820 descriptor attribute field

2013-02-28 Thread liguang

According to ACPI 5.0, Table 15-273
(Address Range Descriptor Structure),
offset 20 holds the 32-bit Extended
Attributes field of that structure.

Signed-off-by: liguang 
---
 arch/x86/include/uapi/asm/e820.h |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
index 2d400b1..eb87284 100644
--- a/arch/x86/include/uapi/asm/e820.h
+++ b/arch/x86/include/uapi/asm/e820.h
@@ -38,6 +38,10 @@
 #define E820_TYPE_NVS  4
 #define E820_TYPE_UNUSABLE 5
 
+#define E820_ATTRIB_NV 0x2
+#define E820_ATTRIB_SLOW_ACCESS 0x4
+#define E820_ATTRIB_ERR_LOG 0x8
+
 
 /*
  * reserved RAM used by kernel itself
@@ -53,7 +57,8 @@ struct e820entry {
__u64 addr; /* start of memory segment */
__u64 size; /* size of memory segment */
__u32 type; /* type of memory segment */
-} __attribute__((packed));
+   __u32 attrib;
+};
 
 struct e820map {
__u32 nr_map;
-- 
1.7.2.5
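
To make the layout change in this patch easier to see, here is a small
user-space sketch. The struct and helper names are hypothetical copies made
for illustration, and the size checks assume a common x86 ABI; the flag
values are the ones added by the diff above.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_ATTRIB_NV          0x2
#define E820_ATTRIB_SLOW_ACCESS 0x4
#define E820_ATTRIB_ERR_LOG     0x8

/* Layout before the patch: packed, 8 + 8 + 4 bytes. */
struct e820entry_old_sketch {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
} __attribute__((packed));

/* Layout after the patch: attrib added, packed attribute dropped. */
struct e820entry_new_sketch {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
	uint32_t attrib;
};

_Static_assert(sizeof(struct e820entry_old_sketch) == 20, "old entry is 20 bytes");
_Static_assert(offsetof(struct e820entry_new_sketch, attrib) == 20, "attrib sits at offset 20");
_Static_assert(sizeof(struct e820entry_new_sketch) == 24, "new entry is 24 bytes");

/* Returns nonzero when the given attribute flag is set on the entry. */
static inline int e820_has_attrib_sketch(const struct e820entry_new_sketch *e,
					 uint32_t flag)
{
	return (e->attrib & flag) != 0;
}
```

The attrib field lands at offset 20, as the table cited in the commit message
describes, but each entry grows from 20 to 24 bytes, so any consumer that
assumes a 20-byte stride would need to be checked.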


[PATCH 1/2] x86: change names of e820 memory map type

2013-02-28 Thread liguang

E820_RAM -> E820_TYPE_RAM
E820_ACPI-> E820_TYPE_ACPI
...

Names like E820_RAM are conflict-prone:
a user who is not aware that e820.h
already defines them may well define
a macro with the same name.

Signed-off-by: liguang 
---
 arch/x86/boot/compressed/eboot.c   |   10 +++---
 arch/x86/include/asm/gart.h|2 +-
 arch/x86/include/uapi/asm/e820.h   |   12 
 arch/x86/kernel/acpi/boot.c|2 +-
 arch/x86/kernel/aperture_64.c  |4 +-
 arch/x86/kernel/cpu/centaur.c  |2 +-
 arch/x86/kernel/cpu/mtrr/cleanup.c |2 +-
 arch/x86/kernel/e820.c |   52 
 arch/x86/kernel/setup.c|   22 +++---
 arch/x86/kernel/tboot.c|8 ++--
 arch/x86/mm/init_64.c  |   12 
 arch/x86/pci/mmconfig-shared.c |2 +-
 arch/x86/platform/efi/efi.c|   14 
 arch/x86/platform/visws/visws_quirks.c |6 ++--
 arch/x86/xen/setup.c   |   16 +-
 15 files changed, 83 insertions(+), 83 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index f8fa411..5bda487 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -1040,15 +1040,15 @@ again:
case EFI_MEMORY_MAPPED_IO:
case EFI_MEMORY_MAPPED_IO_PORT_SPACE:
case EFI_PAL_CODE:
-   e820_type = E820_RESERVED;
+   e820_type = E820_TYPE_RESERVED;
break;
 
case EFI_UNUSABLE_MEMORY:
-   e820_type = E820_UNUSABLE;
+   e820_type = E820_TYPE_UNUSABLE;
break;
 
case EFI_ACPI_RECLAIM_MEMORY:
-   e820_type = E820_ACPI;
+   e820_type = E820_TYPE_ACPI;
break;
 
case EFI_LOADER_CODE:
@@ -1056,11 +1056,11 @@ again:
case EFI_BOOT_SERVICES_CODE:
case EFI_BOOT_SERVICES_DATA:
case EFI_CONVENTIONAL_MEMORY:
-   e820_type = E820_RAM;
+   e820_type = E820_TYPE_RAM;
break;
 
case EFI_ACPI_MEMORY_NVS:
-   e820_type = E820_NVS;
+   e820_type = E820_TYPE_NVS;
break;
 
default:
diff --git a/arch/x86/include/asm/gart.h b/arch/x86/include/asm/gart.h
index 156cd5d..4d22bcc 100644
--- a/arch/x86/include/asm/gart.h
+++ b/arch/x86/include/asm/gart.h
@@ -97,7 +97,7 @@ static inline int aperture_valid(u64 aper_base, u32 
aper_size, u32 min_size)
printk(KERN_INFO "Aperture beyond 4GB. Ignoring.\n");
return 0;
}
-   if (e820_any_mapped(aper_base, aper_base + aper_size, E820_RAM)) {
+   if (e820_any_mapped(aper_base, aper_base + aper_size, E820_TYPE_RAM)) {
printk(KERN_INFO "Aperture pointing to e820 RAM. Ignoring.\n");
return 0;
}
diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
index bbae024..2d400b1 100644
--- a/arch/x86/include/uapi/asm/e820.h
+++ b/arch/x86/include/uapi/asm/e820.h
@@ -32,11 +32,11 @@
 
 #define E820NR 0x1e8   /* # entries in E820MAP */
 
-#define E820_RAM   1
-#define E820_RESERVED  2
-#define E820_ACPI  3
-#define E820_NVS   4
-#define E820_UNUSABLE  5
+#define E820_TYPE_RAM  1
+#define E820_TYPE_RESERVED 2
+#define E820_TYPE_ACPI 3
+#define E820_TYPE_NVS  4
+#define E820_TYPE_UNUSABLE 5
 
 
 /*
@@ -45,7 +45,7 @@
  * included in the S3 integrity calculation and so should not include
  * any memory that BIOS might alter over the S3 transition
  */
-#define E820_RESERVED_KERN 128
+#define E820_TYPE_RESERVED_KERN 128
 
 #ifndef __ASSEMBLY__
 #include 
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 230c8ea..9595747 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1712,6 +1712,6 @@ int __acpi_release_global_lock(unsigned int *lock)
 
 void __init arch_reserve_mem_area(acpi_physical_address addr, size_t size)
 {
-   e820_add_region(addr, size, E820_ACPI);
+   e820_add_region(addr, size, E820_TYPE_ACPI);
update_e820();
 }
diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index d5fd66f..0210300 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -322,10 +322,10 @@ void __init early_gart_iommu_check(void)
 
if (gart_fix_e820 && !fix && aper_enabled) {
if (e820_any_mapped(aper_base, aper_base + aper_size,
-   E820_RAM)) {
+   E820_TYPE_RAM)) {
/* reserve it, so we can reuse it in second kernel */

[tip:x86/cleanups] x86_64: Use __BOOT_DS instead_of __KERNEL_DS for safety

2013-02-28 Thread tip-bot for gmail

Commit-ID:  b317219b322e36e25150d7b64f4532401779959d
Gitweb: http://git.kernel.org/tip/b317219b322e36e25150d7b64f4532401779959d
Author: gmail 
AuthorDate: Fri, 1 Mar 2013 09:20:39 +0800
Committer:  H. Peter Anvin 
CommitDate: Thu, 28 Feb 2013 20:19:50 -0800

x86_64: Use __BOOT_DS instead_of __KERNEL_DS for safety

In startup_32, the running code still uses the initial GDT
located in setup. Thus, __BOOT_DS is preferred. Currently
__KERNEL_DS happens to equal __BOOT_DS, but relying on
that is not safe.

Signed-off-by: Lans Zhang 
Link: http://lkml.kernel.org/r/51300267.6000...@gmail.com
Signed-off-by: H. Peter Anvin 
---
 arch/x86/boot/compressed/head_64.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index c1d383d..16f24e6 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -52,7 +52,7 @@ ENTRY(startup_32)
jnz 1f
 
cli
-   movl $(__KERNEL_DS), %eax
+   movl $(__BOOT_DS), %eax
movl %eax, %ds
movl %eax, %es
movl %eax, %ss

[patch] i2o: check copy_from_user() size parameter

2013-02-28 Thread Dan Carpenter

Limit the size of the copy so we don't corrupt memory.  Hopefully
this can only be called by root, but fixing this makes the static
checkers happier.

Signed-off-by: Dan Carpenter 

diff --git a/drivers/message/i2o/i2o_config.c b/drivers/message/i2o/i2o_config.c
index 5451bef..a60c188 100644
--- a/drivers/message/i2o/i2o_config.c
+++ b/drivers/message/i2o/i2o_config.c
@@ -687,6 +687,11 @@ static int i2o_cfg_passthru32(struct file *file, unsigned 
cmnd,
}
size = size >> 16;
size *= 4;
+   if (size > sizeof(rmsg)) {
+   rcode = -EINVAL;
+   goto sg_list_cleanup;
+   }
+
/* Copy in the user's I2O command */
if (copy_from_user(rmsg, user_msg, size)) {
rcode = -EFAULT;
@@ -922,6 +927,11 @@ static int i2o_cfg_passthru(unsigned long arg)
}
size = size >> 16;
size *= 4;
+   if (size > sizeof(rmsg)) {
+   rcode = -EFAULT;
+   goto sg_list_cleanup;
+   }
+
/* Copy in the user's I2O command */
if (copy_from_user(rmsg, user_msg, size)) {
rcode = -EFAULT;
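
The check added by this patch can be illustrated with a user-space sketch.
This is an assumption-laden analogue, not the driver code: copy_msg_sketch is
a hypothetical helper, memcpy stands in for copy_from_user, and the caller is
assumed to provide a source buffer at least as long as its header claims.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy a message whose first word carries its length (in 32-bit words)
 * in the upper 16 bits. Refuse the copy when the claimed length would
 * overflow the destination buffer.
 */
static int copy_msg_sketch(uint32_t *dst, size_t dst_bytes, const uint32_t *src)
{
	size_t size = src[0] >> 16;	/* length in 32-bit words */

	size *= 4;			/* convert words to bytes */
	if (size > dst_bytes)
		return -EINVAL;		/* would write past the end of dst */

	memcpy(dst, src, size);		/* stand-in for copy_from_user() */
	return 0;
}
```

Without the size comparison, a header claiming 0xffff words would make the
copy run far past a small fixed-size buffer.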

RE: [PATCHv3 0/8] Thermal Framework Enhancements

2013-02-28 Thread R, Durgadoss

Hi Eduardo,

> -Original Message-
> From: Eduardo Valentin [mailto:eduardo.valen...@ti.com]
> Sent: Friday, March 01, 2013 3:04 AM
> To: R, Durgadoss
> Cc: Zhang, Rui; linux...@vger.kernel.org; linux-kernel@vger.kernel.org;
> hongbo.zh...@linaro.org; w...@nvidia.com
> Subject: Re: [PATCHv3 0/8] Thermal Framework Enhancements
> 
> 
> Durga,
> 
> 
> 
> On 05-02-2013 06:46, Durgadoss R wrote:
> > This patch set is a v3 of the previous versions submitted here:
> > [v2]: http://lwn.net/Articles/531720/
> > [v1]: https://lkml.org/lkml/2012/12/18/108
> > [RFC]:https://patchwork.kernel.org/patch/1758921/
> 
> On this version I have some implementation details which applies mostly
> for the series. So, I am replying to patch 0 to summarize:
> 
> - Consider using linked list

This I thought through on my RFC itself.
I know we have arrays, but using list adds too many members
to the structures, and protection becomes really cryptic.

> - You may have contention on your indexes and arrays
> - overflow on your buffer (carefully check your implementation)
> - zone removal condition. can we remove zones with sensors/cdevs/maps
> registered?
> - Minors on strlcpy, snprintf, devm_ helpers
> - documentation in the code for these helper functions and also better
> naming..

I will try to take care of these in my next version, as far as I can see.
But it would really help if you could point to the specific code that needs
improvement.

Thanks,
Durga

RE: [PATCH 1/8] Thermal: Create sensor level APIs

2013-02-28 Thread R, Durgadoss

Hi Eduardo,

> -Original Message-
> From: Eduardo Valentin [mailto:eduardo.valen...@ti.com]
> Sent: Friday, March 01, 2013 12:29 AM
> To: R, Durgadoss
> Cc: Zhang, Rui; linux...@vger.kernel.org; linux-kernel@vger.kernel.org;
> hongbo.zh...@linaro.org; w...@nvidia.com
> Subject: Re: [PATCH 1/8] Thermal: Create sensor level APIs
> 
> Durga,
> 
> On 05-02-2013 06:46, Durgadoss R wrote:
> > This patch creates sensor level APIs, in the
> > generic thermal framework.
> >
> > A Thermal sensor is a piece of hardware that can report
> > temperature of the spot in which it is placed. A thermal
> > sensor driver reads the temperature from this sensor
> > and reports it out. This kind of driver can be in
> > any subsystem. If the sensor needs to participate
> > in platform thermal management, the corresponding
> > driver can use the APIs introduced in this patch, to
> > register(or unregister) with the thermal framework.
> 
> At first glance, the patch seems reasonable. But I have one major concern as
> follows inline, apart from several minor comments.
> 
> >
> > Signed-off-by: Durgadoss R 
> > ---
> >   drivers/thermal/thermal_sys.c |  280
> +
> >   include/linux/thermal.h   |   29 +
> >   2 files changed, 309 insertions(+)
> >
> > diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
> > index 0a1bf6b..cb94497 100644
> > --- a/drivers/thermal/thermal_sys.c
> > +++ b/drivers/thermal/thermal_sys.c
> > @@ -44,13 +44,16 @@ MODULE_LICENSE("GPL");
> >
> >   static DEFINE_IDR(thermal_tz_idr);
> >   static DEFINE_IDR(thermal_cdev_idr);
> > +static DEFINE_IDR(thermal_sensor_idr);
> >   static DEFINE_MUTEX(thermal_idr_lock);
> >
> >   static LIST_HEAD(thermal_tz_list);
> > +static LIST_HEAD(thermal_sensor_list);
> >   static LIST_HEAD(thermal_cdev_list);
> >   static LIST_HEAD(thermal_governor_list);
> >
> >   static DEFINE_MUTEX(thermal_list_lock);
> > +static DEFINE_MUTEX(sensor_list_lock);
> >   static DEFINE_MUTEX(thermal_governor_lock);
> >
> >   static struct thermal_governor *__find_governor(const char *name)
> > @@ -423,6 +426,103 @@ static void thermal_zone_device_check(struct
> work_struct *work)
> >   #define to_thermal_zone(_dev) \
> > container_of(_dev, struct thermal_zone_device, device)
> >
> > +#define to_thermal_sensor(_dev) \
> > +   container_of(_dev, struct thermal_sensor, device)
> > +
> > +static ssize_t
> > +sensor_name_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> > +{
> > +   struct thermal_sensor *ts = to_thermal_sensor(dev);
> > +
> > +   return sprintf(buf, "%s\n", ts->name);
> 
> For security reasons:
> s/sprintf/snprintf
> 
> > +}
> > +
> > +static ssize_t
> > +sensor_temp_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> > +{
> > +   int ret;
> > +   long val;
> > +   struct thermal_sensor *ts = to_thermal_sensor(dev);
> > +
> > +   ret = ts->ops->get_temp(ts, );
> > +
> > +   return ret ? ret : sprintf(buf, "%ld\n", val);
> 
> ditto.
> 
> > +}
> > +
> > +static ssize_t
> > +hyst_show(struct device *dev, struct device_attribute *attr, char *buf)
> > +{
> > +   int indx, ret;
> > +   long val;
> > +   struct thermal_sensor *ts = to_thermal_sensor(dev);
> > +
> > +   if (!sscanf(attr->attr.name, "threshold%d_hyst", ))
> 
> I'd rather check if it returns 1.
> 
> > +   return -EINVAL;
> > +
> > +   ret = ts->ops->get_hyst(ts, indx, );
> 
>  From your probe, you won't check for devices registered with
> ops.get_hyst == NULL. This may lead to a NULL pointer access above.

if ops.get_hyst is NULL, we don't even create these sysfs interfaces.
This check is in enable_sensor_thresholds function.
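
For reference, two of the review comments above (check that sscanf returns
1, and prefer snprintf over sprintf) can be sketched in user-space C. The
helper names and buffer handling are hypothetical and only illustrate the
pattern; they are not taken from the patch.

```c
#include <stdio.h>

/*
 * Parse the index out of an attribute name such as "threshold3_hyst".
 * Comparing against 1 also rejects EOF, which user-space sscanf returns
 * for empty input and which a plain "!sscanf(...)" test would accept.
 */
static int parse_threshold_index_sketch(const char *name, int *indx)
{
	if (sscanf(name, "threshold%d_hyst", indx) != 1)
		return -1;
	return 0;
}

/*
 * Format a value into a caller-supplied buffer without ever writing
 * more than len bytes, including the terminating NUL.
 */
static int show_long_sketch(char *buf, size_t len, long val)
{
	return snprintf(buf, len, "%ld\n", val);
}
```

The bounded variant makes the buffer size explicit at the call site, which is
the point of the s/sprintf/snprintf suggestion.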

> 
> > +
> > +   return ret ? ret : sprintf(buf, "%ld\n", val);
> 
> snprintf.
> 
> > +}
> > +
> > +static ssize_t
> > +hyst_store(struct device *dev, struct device_attribute *attr,
> > +  const char *buf, size_t count)
> > +{
> > +   int indx, ret;
> > +   long val;
> > +   struct thermal_sensor *ts = to_thermal_sensor(dev);
> > +
> > +   if (!ts->ops->set_hyst)
> > +   return -EPERM;
> > +
> > +   if (!sscanf(attr->attr.name, "threshold%d_hyst", ))
> > +   return -EINVAL;
> > +
> > +   if (kstrtol(buf, 10, ))
> > +   return -EINVAL;
> > +
> > +   ret = ts->ops->set_hyst(ts, indx, val);
> 
>  From your probe, you won't check for devices registered with
> ops.set_hyst == NULL. This may lead to a NULL pointer access above.
> 
> > +
> > +   return ret ? ret : count;
> > +}
> > +
> > +static ssize_t
> > +threshold_show(struct device *dev, struct device_attribute *attr, char
> *buf)
> > +{
> > +   int indx, ret;
> > +   long val;
> > +   struct thermal_sensor *ts = to_thermal_sensor(dev);
> > +
> > +   if (!sscanf(attr->attr.name, "threshold%d", ))
> > +   return -EINVAL;
> > +
> > +   ret = ts->ops->get_threshold(ts, indx, );
>  From your probe, you won't check for devices registered with
> ops.get_threshold == NULL. This may

Re: [PATCH 3/3] convert headers_install.pl to headers_install.sh

2013-02-28 Thread Rob Landley


On 02/28/2013 02:19:24 AM, Michal Marek wrote:

On 27.2.2013 06:58, Rob Landley wrote:
> From: Rob Landley 
>
> Remove perl from make headers_install by replacing a perl script (doing
> a simple regex search and replace) with a smaller, faster, simpler,
> POSIX-2008 shell script implementation.  The new shell script is a single
> for loop calling sed and piping its output through unifdef to produce the
> target file.
>
> Same as last time except for minor tweak to deal with code review from here:
> http://lkml.indiana.edu/hypermail/linux/kernel/1302.3/00078.html
>
> (Note that this drops the "arch" argument, which isn't used. Kbuild
> already points to the right input files on the command line.)
>
> Signed-off-by: Rob Landley 

Looks good, I will apply it after v3.9-rc1 is out.


Yay! Thank you!

(Andrew took the other two! Thank you!)

Rob

Re: For review: pid_namespaces(7) man page

2013-02-28 Thread Rob Landley


On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:

Eric et al,

Eventually, there will be more namespace man pages, but let us start
now with one for PID namespaces. The attached page aims to provide a
fairly complete overview of PID namespaces.


Onward!


PID_NAMESPACES(7)  Linux Programmer's Manual PID_NAMESPACES(7)

NAME
   pid_namespaces - overview of Linux PID namespaces

DESCRIPTION
   For an overview of namespaces, see namespaces(7).

   PID  namespaces  isolate  the  process ID number space, meaning
   that processes in different PID namespaces can  have  the  same
   PID.


Um, perhaps "different processes"? Slightly repetitive, but trying to
avoid the potential misreading that "a process can have the same PID
in different namespaces". (A single process can't be a member of more
than one namespace. This is not about selective visibility.)



PID namespaces allow containers to migrate to a new host
   while the processes inside  the  container  maintain  the  same
   PIDs.


I thought suspend/resume a container was the simple case. Migration to  
a new host is built on top of that. (On resume in a new container on  
the same system, if other stuff is going on in the system so the  
available PIDs have shifted.)



   Likewise, a process in an ancestor namespace can—subject to the
   usual permission checks described in  kill(2)—send  signals  to
   the  "init" process of a child PID namespace only if the "init"
   process has established a handler for that signal.  (Within the
   handler,  the  siginfo_t si_pid field described in sigaction(2)
   will be zero.)  SIGKILL or SIGSTOP are  treated  exceptionally:
   these signals are forcibly delivered when sent from an ancestor
   PID namespace.  Neither of these signals can be caught  by  the
   "init" process, and so will result in the usual actions associ‐
   ated with those signals (respectively, terminating and stopping
   the process).


If SIGKILL to init is propagated to all the children of init, is
SIGSTOP also propagated to all the children? (I.e., will SIGSTOP to the
container's init suspend the whole container, and will SIGCONT resume
the whole container? If the latter, will it only resume processes that
weren't previously stopped? :)



   To put things another way: a process's PID namespace membership
   is determined when the process is created and cannot be changed
   thereafter.  Among other things, this means that  the  parental
   relationship between processes mirrors the parental between PID


mirrors the relationship


   namespaces: the parent of a  process  is  either  in  the  same
   namespace or resides in the immediate parent PID namespace.

   Every  thread  in  a process must be in the same PID namespace.
   For this reason, the two following call sequences will fail:

   unshare(CLONE_NEWPID);
   clone(..., CLONE_VM, ...);/* Fails */

   setns(fd, CLONE_NEWPID);
   clone(..., CLONE_VM, ...);/* Fails */


They fail with -EUNDOCUMENTED


   Because the above unshare(2) and setns(2) calls only change the
   PID  namespace  for created children, the clone(2) calls neces‐
   sarily put the new thread in a different PID namespace from the
   calling thread.


Um, no they don't. They fail. That's the point. They _would_ put the  
new thread in a different PID namespace, which breaks the definition of  
threads.


How about:

The above unshare(2) and setns(2) calls change the PID namespace of
children created by subsequent clone(2) calls, which is incompatible
with CLONE_VM.


   Miscellaneous
   After  creating a new PID namespace, it is useful for the child
   to change its root directory and mount a new procfs instance at
   /proc  so  that  tools such as ps(1) work correctly.  (If a new
   mount  namespace  is  simultaneously   created   by   including
   CLONE_NEWNS  in  the flags argument of clone(2) or unshare(2)),
   then it isn't necessary to change the  root  directory:  a  new
   procfs instance can be mounted directly over /proc.)


Why is the (If) clause in parentheses? And unshare(2)) has a Bruce.
(I.E. unbalanced parens.).


   Calling  readlink(2)  on the path /proc/self yields the process
   ID of the caller in the  PID  namespace  of  the  procfs  mount
   (i.e.,  the  PID  namespace  of  the  process  that mounted the
   procfs).
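
As a small illustration of the paragraph above, the following sketch reads
the /proc/self link and returns the PID it names. proc_self_pid_sketch is a
hypothetical helper; the sketch assumes a procfs instance is mounted at
/proc and returns -1 otherwise.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the PID that /proc/self resolves to, i.e. the caller's PID as
 * seen from the PID namespace of the procfs mounted at /proc, or -1 if
 * the link cannot be read.
 */
static pid_t proc_self_pid_sketch(void)
{
	char buf[32];
	ssize_t n = readlink("/proc/self", buf, sizeof(buf) - 1);

	if (n < 0)
		return -1;
	buf[n] = '\0';		/* readlink() does not NUL-terminate */
	return (pid_t)strtol(buf, NULL, 10);
}
```

When the procfs instance was mounted from the caller's own PID namespace the
result should match getpid(); with a procfs inherited from an ancestor
namespace the two values can differ.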


This is per-filesystem rather than using the process's namespace  
because...?
(Where /proc/self points is already process-local data, so the races  
here can't be too horrible...)



   When a process ID is passed over a  UNIX  domain  socket  to  a
   process  in  a  different PID namespace (see the description of
   SCM_CREDENTIALS in unix(7)), it is translated into  the  corre‐
   sponding PID value in the receiving process's PID

[GIT PULL] slave-dmaengine updates 2

2013-02-28 Thread Vinod Koul

Hi Linus,

Here is my second pull request for this merge window.

Arnd's patch moves the dw_dmac to use generic DMA binding. I agreed to merge
this late as it will avoid the conflicts between trees.

The second patch from Matt adding a  dma_request_slave_channel_compat API was
supposed to be picked up, but somehow never got picked up.  Some patches
dependent on this are already in -next :(

Thanks
~Vinod

The following changes since commit 17166a3b6e88b93189e6be5f7e1335a3cc4fa965:
are available in the git repository at:

  git://git.infradead.org/users/vkoul/slave-dma.git next

Arnd Bergmann (1):
  dmaengine: dw_dmac: move to generic DMA binding

Matt Porter (1):
  dmaengine: add dma_request_slave_channel_compat()

 Documentation/devicetree/bindings/dma/snps-dma.txt |   70 +-
 drivers/dma/dw_dmac.c  |  145 ++--
 drivers/dma/dw_dmac_regs.h |7 +-
 include/linux/dmaengine.h  |   16 ++
 include/linux/dw_dmac.h|5 -
 5 files changed, 127 insertions(+), 116 deletions(-)


signature.asc
Description: Digital signature

Re: [PATCH 1/2] ACPI / glue: Add .match() callback to struct acpi_bus_type

2013-02-28 Thread Greg Kroah-Hartman

On Thu, Feb 28, 2013 at 10:53:21PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> USB uses the .find_bridge() callback from struct acpi_bus_type
> incorrectly, because as a result of the way it is used by USB every
> device in the system that doesn't have a bus type or parent is
> passed to usb_acpi_find_device() for inspection.
> 
> What USB actually needs, though, is to call usb_acpi_find_device()
> for USB ports that don't have a bus type defined, but have
> usb_port_device_type as their device type, as well as for USB
> devices.
> 
> To fix that replace the struct bus_type pointer in struct
> acpi_bus_type used for matching devices to specific subsystems
> with a .match() callback to be used for this purpose and update
> the users of struct acpi_bus_type, including USB, accordingly.
> Define the .match() callback routine for USB, usb_acpi_bus_match(),
> in such a way that it will cover both USB devices and USB ports
> and remove the now redundant .find_bridge() callback pointer from
> usb_acpi_bus.
> 
> Signed-off-by: Rafael J. Wysocki 

Acked-by: Greg Kroah-Hartman 
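
A rough user-space sketch of the idea in the quoted changelog, replacing a
fixed bus pointer comparison with a match callback, might look like this.
Every name below is hypothetical and the string-based device description is
invented for the example; it does not reflect the real struct acpi_bus_type
or struct device definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct device_sketch {
	const char *bus;	/* NULL when the device has no bus type */
	const char *type;	/* NULL when the device has no device type */
};

/* A bus type entry that decides by callback instead of by bus pointer. */
struct acpi_bus_type_sketch {
	bool (*match)(const struct device_sketch *dev);
};

/* Claims USB devices, plus bus-less devices whose type marks a USB port. */
static bool usb_match_sketch(const struct device_sketch *dev)
{
	bool is_usb_device = dev->bus && strcmp(dev->bus, "usb") == 0;
	bool is_usb_port = !dev->bus && dev->type &&
			   strcmp(dev->type, "usb_port") == 0;

	return is_usb_device || is_usb_port;
}

static const struct acpi_bus_type_sketch usb_acpi_bus_sketch = {
	.match = usb_match_sketch,
};

/* Generic code asks the entry whether it wants the device. */
static bool bus_type_claims_sketch(const struct acpi_bus_type_sketch *t,
				   const struct device_sketch *dev)
{
	return t->match && t->match(dev);
}
```

With a callback, a device that has neither a bus type nor the port device
type is simply not claimed, instead of being handed to the USB code for
inspection.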

linux-next: Tree for Mar 1

2013-02-28 Thread Stephen Rothwell

Hi all,

Please do not add any work destined for v3.10 to your -next included
branches until after Linus has released v3.9-rc1.

Changes since 20130228:

The nfsd tree gained a conflict against Linus' tree and a build failure
so I used the version from next-20130228.

The bluetooth tree gained a conflict against Linus' tree.

The ftrace tree gained a build failure so I used the version from
next-20130228.

The kvm tree gained a conflict against Linus' tree.

The akpm tree gained a build failure for which I applied a patch and lost
a lot of patches that turned up elsewhere.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 216 trees (counting Linus' and 28 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (de1a226 Merge tag 'writeback-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux)
Merging fixes/master (d287b87 Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs)
Merging kbuild-current/rc-fixes (02f3e53 Merge branch 'yem-kconfig-rc-fixes' of 
git://gitorious.org/linux-kconfig/linux-kconfig into kbuild/rc-fixes)
Merging arm-current/fixes (e36815e ARM: Fix broken commit 0cc41e4a21d43 
corrupting kernel messages)
Merging m68k-current/for-linus (5618395 m68k: Sort out !CONFIG_MMU_SUN3 vs. 
CONFIG_HAS_DMA)
Merging powerpc-merge/merge (eda8eeb powerpc/mm: Fix hash computation function)
Merging sparc/master (f9fd348 sparc32: refactor smp boot)
Merging net/master (32fcafb net/phy: micrel: Disable asymmetric pause for 
KSZ9021)
Merging ipsec/master (85dfb74 af_key: initialize satype in 
key_notify_policy_flush())
Merging sound-current/for-linus (d0ec95f ALSA: emu10k1: Allow to switch 
hardware sampe rate on EMU)
Merging pci-current/for-linus (249bfb8 PCI/PM: Clean up PME state when removing 
a device)
Merging wireless/master (4660269 libertas: fix crash for SD8688)
Merging driver-core.current/driver-core-linus (949db15 Linux 3.8-rc5)
Merging tty.current/tty-linus (8b5628a Merge tag 'virt' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging usb.current/usb-linus (221f8df USB: EHCI: revert "remove ASS/PSS 
polling timeout")
Merging staging.current/staging-linus (8b5628a Merge tag 'virt' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc)
Merging char-misc.current/char-misc-linus (302a3c0 Drivers: hv: vmbus: Use the 
new infrastructure for delivering VMBUS interrupts)
Merging input-current/for-linus (171fb58 Input: ALPS - update documentation for 
recent touchpad driver mods)
Merging md-current/for-linus (f3378b4 md: expedite metadata update when 
switching  read-auto -> active)
Merging audit-current/for-linus (c158a35 audit: no leading space in 
audit_log_d_path prefix)
Merging crypto-current/master (8fd61d3 crypto: user - ensure user supplied 
strings are nul-terminated)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (084a0ec x86: add CONFIG_X86_MOVBE option)
CONFLICT (content): Merge conflict in arch/x86/Kconfig
CONFLICT (content): Merge conflict in arch/powerpc/Kconfig
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to 
inline fun

[QUESTION] How can I understand ARM_DMA_MEM_BUFFERABLE?

2013-02-28 Thread 杨可亲

Hi everyone:

In the newest Linux kernel, version 3.8, arch/arm/mm/Kconfig contains
the following entry:

config ARM_DMA_MEM_BUFFERABLE
	bool "Use non-cacheable memory for DMA" if (CPU_V6 || CPU_V6K) && !CPU_V7
	depends on !(MACH_REALVIEW_PB1176 || REALVIEW_EB_ARM11MP || \
		     MACH_REALVIEW_PB11MP)
	default y if CPU_V6 || CPU_V6K || CPU_V7
	help
	  Historically, the kernel has used strongly ordered mappings to
	  provide DMA coherent memory.  With the advent of ARMv7, mapping
	  memory with differing types results in unpredictable behaviour,
	  so on these CPUs, this option is forced on.

	  Multiple mappings with differing attributes is also unpredictable
	  on ARMv6 CPUs, but since they do not have aggressive speculative
	  prefetch, no harm appears to occur.

	  However, drivers may be missing the necessary barriers for ARMv6,
	  and therefore turning this on may result in unpredictable driver
	  behaviour.  Therefore, we offer this as an option.

	  You are recommended say 'Y' here and debug any affected drivers.



I have three questions:

 1) Does this mean that I can't get an unbuffered DMA buffer on ARM
    Cortex-A8 (CPU_V7)?

 2) How can I allocate an unbuffered DMA buffer on Linux on a Cortex-A8 platform?

 3) The help text above says:
    "With the advent of ARMv7, mapping memory with differing types
    results in unpredictable behaviour"
    What does "mapping memory with differing types" really mean?

Re: [GIT PULL] arch/arc for v3.9-rc1

2013-02-28 Thread Vineet Gupta

On Friday 22 February 2013 12:28 PM, Vineet Gupta wrote:
> Hi Linus,
> 
> I would like to introduce the Linux port to ARC Processors (from Synopsys) for
> 3.9-rc1. The patch-set has been discussed on the public lists since Nov and 
> has
> received a fair bit of review, specially from Arnd, tglx, Al and other 
> subsystem
> maintainers for DeviceTree, kgdb .
> 
> The arch bits are in arch/arc, some asm-generic changes (acked by Arnd), a 
> minor
> change to PARISC (acked by Helge).
> 
> The series is a touch bigger for a new port for 2 main reasons:
> 1. It enables a basic kernel in first sub-series and adds ptrace/kgdb/.. later
> 2. Some of the fallout of review (DeviceTree support, multi-platform-image
> support) were added on top of orig series, primarily to record the revision 
> history.
> 
> Please consider pulling.
> 
> Thanks,
> Vineet

Hi Linus,

Did you get a chance to look at the patch series, and is there anything in
particular which could be gating the merge?

Thanks,
Vineet

Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread H. Peter Anvin

On 02/28/2013 08:32 PM, Linus Torvalds wrote:
> Yinghai, Andrew,
>  is this ok with you two?
> 
> Linus

FWIW, it makes sense to me iff it resolves the problems.

-hpa



Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread Andrew Morton

On Thu, 28 Feb 2013 20:32:15 -0800 Linus Torvalds 
 wrote:

> Yinghai, Andrew,
>  is this ok with you two?

If it works.  I haven't tested it yet!  Ordinarily I'd give it a few
days for -next testing and to let Fengguang's testbot chew on it. 

Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread Linus Torvalds

Yinghai, Andrew,
 is this ok with you two?

Linus

On Thu, Feb 28, 2013 at 7:46 PM, Tang Chen  wrote:
> Hi Linus,
>
> Please refer to the attached patch.
>
> This patch reverts only the following two patches.
>
>
> commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
> acpi, memory-hotplug: support getting hotplug info from SRAT
> commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
>
> acpi, memory-hotplug: parse SRAT before memblock is ready
>
> Without these two patches, users can use "movablemem_map=nn[KMG]@ss[KMG]"
> correctly, and cause no problem.
>
> And of course, the kernel will work as before if users don't use
>
> "movablemem_map=nn[KMG]@ss[KMG]".
>
> I do hope we can keep "movablemem_map=nn[KMG]@ss[KMG]" in 3.9.
>
>
> We are working on fixing the SRAT problems, and we aim to push the SRAT-related
> patches in 3.10. And we will also improve "movablemem_map=nn[KMG]@ss[KMG]"
> functionality consistently in the future.
>
> Thanks. :)
>
>
> On 03/01/2013 11:13 AM, Linus Torvalds wrote:
>>
>> On Wed, Feb 27, 2013 at 1:26 PM, Andrew Morton
>>   wrote:
>>>
>>>
>>> So I'm thinking that the best approach here is to revert everything and
>>> then try again for 3.10-rc1.  This gives people time to test the code
>>> while it's only in linux-next.  (Hint!)
>>
>>
>> I'd prefer to revert too by now - the bug seems to be known, and
>> apparently it's not a trivial fix. We're getting close to the end of
>> the merge window, and it's still being discussed, it clearly wasn't
>> really fully cooked.
>>
>> Can we agree on some minimal set of reverts? Can somebody send me a
>> patch with the revert and the commit explanation for the revert?
>> Yinghai? Or I can do the reverts too if just the exact set of commits
>> is clear, but I'd rather get it from somebody who sees and understand
>> the problem, and can test the state afterwards..
>>
>> Linus
>>
>

linux-next: build failure after merge of the akpm tree

2013-02-28 Thread Stephen Rothwell

Hi Andrew,

After merging the akpm tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

fs/aio.c: In function 'exit_aio':
fs/aio.c:522:60: error: macro "hlist_for_each_entry_safe" passed 5 arguments, 
but takes just 4
fs/aio.c:522:2: error: 'hlist_for_each_entry_safe' undeclared (first use in 
this function)
fs/aio.c:522:2: note: each undeclared identifier is reported only once for each 
function it appears in
fs/aio.c:522:62: error: expected ';' before '{' token

I added this fix patch:

From: Stephen Rothwell 
Date: Fri, 1 Mar 2013 15:30:24 +1100
Subject: [PATCH] aio: fixup for hlist_for_each_entry_safe API change

Signed-off-by: Stephen Rothwell 
---
 fs/aio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index b36c2b6..2512232 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -517,9 +517,9 @@ EXPORT_SYMBOL(wait_on_sync_kiocb);
 void exit_aio(struct mm_struct *mm)
 {
 	struct kioctx *ctx;
-	struct hlist_node *p, *n;
+	struct hlist_node *n;
 
-	hlist_for_each_entry_safe(ctx, p, n, &mm->ioctx_list, list) {
+	hlist_for_each_entry_safe(ctx, n, &mm->ioctx_list, list) {
 		/*
 		 * We don't need to bother with munmap() here -
 		 * exit_mmap(mm) is coming and it'll unmap everything.
-- 
1.8.1

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


pgpEFdCX584KE.pgp
Description: PGP signature
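
For readers following the API change behind this build failure: the fixup
drops the separate hlist_node cursor, so the macro now takes four arguments.
Below is a minimal userspace sketch of how such a four-argument "safe"
iterator can work. The list types are simplified stand-ins (the kernel's
hlist_node also carries a pprev pointer), and demo_ctx, demo_ctx_add() and
demo_ctx_free_all() are hypothetical names used only for illustration, not
the code in fs/aio.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace stand-ins; the kernel's hlist_node also has pprev. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a node pointer to its containing entry, or NULL at the end of the list. */
#define hlist_entry_safe(ptr, type, member) \
	((ptr) ? container_of(ptr, type, member) : NULL)

/*
 * Four-argument form: "pos" is the entry cursor and "n" caches the next
 * node, so the loop body may free "pos". No separate node cursor is needed.
 */
#define hlist_for_each_entry_safe(pos, n, head, member)			\
	for (pos = hlist_entry_safe((head)->first, __typeof__(*pos), member); \
	     pos && ((n = pos->member.next), 1);				\
	     pos = hlist_entry_safe(n, __typeof__(*pos), member))

/* Hypothetical entry type, standing in for struct kioctx. */
struct demo_ctx {
	int id;
	struct hlist_node list;
};

/* Push a new entry at the front of the list. */
static void demo_ctx_add(struct hlist_head *head, int id)
{
	struct demo_ctx *c = malloc(sizeof(*c));

	assert(c != NULL);
	c->id = id;
	c->list.next = head->first;
	head->first = &c->list;
}

/* Free every entry while iterating; returns the number released. */
static int demo_ctx_free_all(struct hlist_head *head)
{
	struct demo_ctx *c;
	struct hlist_node *n;
	int freed = 0;

	hlist_for_each_entry_safe(c, n, head, list) {
		free(c);
		freed++;
	}
	head->first = NULL;
	return freed;
}
```

The point of the sketch is that the entry pointer itself is the loop cursor,
which is why callers such as exit_aio() only need to keep the "n" variable.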

[PATCH 1/1] arm: remove extra semicolon in if statement

2013-02-28 Thread Vinicius Tinti

Remove extra semicolon in perf_event.c if statement.

Signed-off-by: Vinicius Tinti 
---
 arch/arm/kernel/perf_event.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 31e0eb3..a892067 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -400,7 +400,7 @@ __hw_perf_event_init(struct perf_event *event)
 	}
 
 	if (event->group_leader != event) {
-		if (validate_group(event) != 0);
+		if (validate_group(event) != 0)
 			return -EINVAL;
 	}
 
-- 
1.7.9.5
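
To see why the one-character change matters: in "if (cond);" the lone
semicolon is an empty statement that forms the entire if body, so the
following return is no longer guarded. A minimal userspace sketch, in which
validate_group_stub() is a hypothetical stand-in for validate_group() and
the empty body is spelled out with braces to make the parse visible:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for validate_group(): 0 means the group is valid. */
static int validate_group_stub(int group_is_valid)
{
	return group_is_valid ? 0 : -EINVAL;
}

/*
 * How the compiler parses "if (cond);" followed by "return -EINVAL;":
 * the ';' is the whole if body, so the return runs unconditionally.
 */
static int init_with_stray_semicolon(int group_is_valid)
{
	if (validate_group_stub(group_is_valid) != 0) {
		/* empty body: this is all the stray ';' amounts to */
	}
	return -EINVAL;
}

/* With the semicolon removed, only an invalid group is rejected. */
static int init_fixed(int group_is_valid)
{
	if (validate_group_stub(group_is_valid) != 0)
		return -EINVAL;
	return 0;
}

/* Self-check: only the fixed version lets a valid group through. */
static void check_semicolon_demo(void)
{
	assert(init_with_stray_semicolon(1) == -EINVAL);
	assert(init_with_stray_semicolon(0) == -EINVAL);
	assert(init_fixed(1) == 0);
	assert(init_fixed(0) == -EINVAL);
}
```

Read against the hunk above, this suggests that before the fix every event
that was not its own group leader would have been rejected with -EINVAL.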


Re: [GIT PULL 0/3] arm-soc: late changes for 3.9

2013-02-28 Thread Linus Torvalds

On Thu, Feb 28, 2013 at 2:54 PM, Olof Johansson  wrote:
>
> Final two pull requests are for the same code. As Arnd describes in the
> tags, they are for a set of mvebu patches that depend on contents in
> the MMC tree. We had pulled in part of the MMC branch as a dependency,
> but unfortunately Chris Ball rebased it.

Has Chris Ball been told what an incredible pain this kind of crap is,
and that there's a damn good reason why WE DO NOT REBASE PUBLIC TREES
THAT OTHERS MAY BE BASING THEIR DEVELOPMENT ON!

Chris, can you hear me shouting? Don't do that.

> We're giving you the choice of taking the rebased version, or a
> non-rebased-but-merged-and-fixed-up version to avoid dealing with the
> excessive conflicts. The rebased one has the obvious benefit of not
> having duplicate commits in the tree for the same changes, but, well,
> it's rebased. Actual tree contents is identical though.

I'm taking the rebased one, thanks for the explanation. I really don't
like rebasing, but you did it for a valid reason, and it wasn't your
mistake. And duplicating the commits just to be a pain is not worth
it.

> I've pushed a resolved branch for reference (late-branches-resolved)
> in case you want to compare conflict resolutions.

So Arnd's tag talked about removing the stale gpio.h, but I think it
was the i2c.h that was now also stale. So I removed both - even though
technically, the merge should have left i2c.h since it was in both
parents. You should double-check that, but I don't see how that
 could *possibly* be valid any more, and people had tried
(unsuccessfully) to remove it once already, so...

  Linus

Re: [PATCH 00/10] ipc MSG_COPY fixes

2013-02-28 Thread Stanislav Kinsbursky


26.02.2013 16:00, Peter Hurley wrote:
> On Tue, 2013-02-26 at 11:53 +0400, Stanislav Kinsbursky wrote:
>> Looks good to me. Thank you, Peter!
>>
>> Acked-by: Stanislav Kinsbursky 
>>
>> Next time, please add the maintainer to the "To" list instead of the "CC" list
>> (no need to resend - I've added Andrew Morton to the "To" list in this reply).
>
> Ok.
>
>>> Can the alloc_msg() be further simplified to allocate one block with
>>> vmalloc() and link the msg segments in-place?
>
> Any thoughts on this suggestion?

Emm... You can do so, if you want.
But this would be just an optimisation on a slow path. I.e. I have nothing
to object to, but I don't see any reason other than striving for perfection.

> Regards,
> Peter Hurley




--
Best regards,
Stanislav Kinsbursky

Re: lockdep trace from nfsd

2013-02-28 Thread Jeff Layton

On Thu, 28 Feb 2013 19:30:38 -0500
Dave Jones  wrote:

> [   39.878535] =
> [   39.879670] [ BUG: rpc.nfsd/666 still has locks held! ]
> [   39.880871] 3.8.0+ #3 Not tainted
> [   39.881858] -
> [   39.882850] 2 locks on stack by rpc.nfsd/666:
> [   39.883868]  #0: held: (nfsd_mutex){+.+.+.}, instance: 
> a01cf0b8, at: [] write_ports+0x37/0x7a0 [nfsd]
> [   39.884750]  #1: held: (rpcb_create_local_mutex){+.+.+.}, instance: 
> a016d878, at: [] rpcb_create_local+0x46/0x90 
> [sunrpc]
> [   39.885903] 
> stack backtrace:
> [   39.897044] Pid: 666, comm: rpc.nfsd Not tainted 3.8.0+ #3
> [   39.898186] Call Trace:
> [   39.900755]  [] debug_check_no_locks_held+0x9a/0xa0
> [   39.901823]  [] rpc_wait_bit_killable+0x85/0xb0 [sunrpc]
> [   39.902866]  [] __wait_on_bit+0x60/0x90
> [   39.903879]  [] ? __rpc_execute+0x170/0x5a0 [sunrpc]
> [   39.904900]  [] ? 
> __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
> [   39.905969]  [] out_of_line_wait_on_bit+0x7c/0x90
> [   39.907010]  [] ? __rpc_execute+0x170/0x5a0 [sunrpc]
> [   39.908070]  [] ? autoremove_wake_function+0x50/0x50
> [   39.909124]  [] ? call_connect+0xa0/0xa0 [sunrpc]
> [   39.910154]  [] __rpc_execute+0x1a1/0x5a0 [sunrpc]
> [   39.911176]  [] ? wake_up_bit+0x2e/0x40
> [   39.912058]  [] rpc_execute+0x59/0x180 [sunrpc]
> [   39.912745]  [] rpc_run_task+0x70/0x90 [sunrpc]
> [   39.913446]  [] rpc_call_sync+0x43/0xa0 [sunrpc]
> [   39.914280]  [] rpc_ping+0x52/0x70 [sunrpc]
> [   39.914992]  [] rpc_create+0x188/0x230 [sunrpc]
> [   39.915735]  [] ? sched_clock+0x9/0x10
> [   39.916577]  [] ? put_lock_stats.isra.25+0xe/0x40
> [   39.917635]  [] ? 
> lock_release_holdtime.part.26+0xcc/0x140
> [   39.918667]  [] rpcb_create_local_unix+0x5c/0xe0 [sunrpc]
> [   39.919669]  [] rpcb_create_local+0x78/0x90 [sunrpc]
> [   39.920705]  [] svc_rpcb_setup+0x23/0x50 [sunrpc]
> [   39.921725]  [] svc_bind+0x34/0x50 [sunrpc]
> [   39.921733]  [] nfsd_create_serv+0x1cd/0x320 [nfsd]
> [   39.921738]  [] ? nfsd_create_serv+0x5/0x320 [nfsd]
> [   39.921742]  [] write_ports+0x52a/0x7a0 [nfsd]
> [   39.921746]  [] ? write_ports+0x418/0x7a0 [nfsd]
> [   39.921750]  [] ? _raw_spin_unlock+0x35/0x60
> [   39.921754]  [] ? simple_transaction_get+0xca/0xe0
> [   39.921759]  [] ? write_maxblksize+0x2e0/0x2e0 [nfsd]
> [   39.921764]  [] nfsctl_transaction_write+0x57/0x90 [nfsd]
> [   39.921768]  [] vfs_write+0xaf/0x190
> [   39.921771]  [] sys_write+0x55/0xa0
> [   39.921775]  [] system_call_fastpath+0x16/0x1b
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Ok, I see...

rpc_wait_bit_killable() calls freezable_schedule(). That calls
freezer_count(), which calls try_to_freeze(). try_to_freeze() does this
lockdep check now, as of commit 6aa9707099.

The assumption seems to be that freezing a thread while holding any
sort of lock is bad. The rationale in that patch seems a bit sketchy to
me though. We can be fairly certain that we're not going to deadlock by
holding these locks, but I guess there could be something I've missed.

Mandeep, can you elaborate on whether there's really a deadlock
scenario here? If not, then is there some way to annotate these locks
so this lockdep pop goes away?

-- 
Jeff Layton 

Re: [GIT PULL URGENT] ext4 regression fix for 3.9

2013-02-28 Thread Theodore Ts'o

On Thu, Feb 28, 2013 at 10:30:05PM -0500, Dave Jones wrote:
> 
> This has fixed the problem I reported, but I notice now that my
> desktop is really sluggish. perf top shows it's almost constantly
> spinning in ext4_es_reclaim_extents_count
> 
> Any ideas ?

ext4_es_reclaim_extents_count() is getting called out of the slab
shrinker.  It's getting called too often when there is significant
memory pressure.  We can optimize this so we're not calculating it all
the time.

- Ted

Re: [PATCH 10/30] staging: sync: Export sync API symbols

2013-02-28 Thread John Stultz


On 02/28/2013 06:00 PM, Greg KH wrote:
> On Thu, Feb 28, 2013 at 04:43:06PM -0800, John Stultz wrote:
>> From: Erik Gilling 
>>
>> This is needed to allow modules to link against the sync subsystem
>>
>> Cc: Maarten Lankhorst 
>> Cc: Erik Gilling 
>> Cc: Daniel Vetter 
>> Cc: Rob Clark 
>> Cc: Sumit Semwal 
>> Cc: Greg KH 
>> Cc: dri-de...@lists.freedesktop.org
>> Cc: Android Kernel Team 
>> Signed-off-by: Erik Gilling 
>> Signed-off-by: John Stultz 
>> ---
>>   drivers/staging/android/sync.c |   14 ++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
>> index 54f84d9..6739a84 100644
>> --- a/drivers/staging/android/sync.c
>> +++ b/drivers/staging/android/sync.c
>> @@ -15,6 +15,7 @@
>>    */
>>   
>>   #include 
>> +#include 
>>   #include 
>>   #include 
>>   #include 
>> @@ -64,6 +65,7 @@ struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
>>   
>>   	return obj;
>>   }
>> +EXPORT_SYMBOL(sync_timeline_create);
>
> As these are now global, should they be a bit more "specific"?  "sync_"
> seems pretty broad.

Given it's the sync driver, it's the most obvious choice, but I agree it's
likely to collide with filesystem-related or other sync_ named functions
that don't have a subsystem prefix.

Any suggestions?

The only good alternative I can think of is that in some private
conversations with DanielV, he referred to Android using "sync-points".

Erik: Would syncpoint_ be an ok prefix? Or do you have other ideas?

> Also, EXPORT_SYMBOL_GPL() perhaps?
>
> And who is using these exports?

From some quick git grepping...

In the android exynos tree:
https://android.googlesource.com/kernel/exynos.git 
android-exynos-manta-3.4-jb-mr1.1


drivers/gpu/arm/t6xx/kbase/src/linux/mali_kbase_sync.c: tl = 
sync_timeline_creat
drivers/media/video/videobuf2-core.c:   q->timeline = 
sw_sync_timeline_create(q-

drivers/video/s3c-fb.c: sfb->timeline = sw_sync_timeline_create("s3c-fb");

In the android msm tree:
https://android.googlesource.com/kernel/msm.git 
android-msm-mako-3.4-jb-mr1.1


drivers/gpu/msm/kgsl_sync.c:context->timeline = 
sync_timeline_create(_s
drivers/video/msm/msm_fb.c: mfd->timeline = 
sw_sync_timeline_create(


thanks
-john



Re: [PATCH 0/3] Enable multiple MSI feature in pSeries

2013-02-28 Thread Michael Ellerman

On Fri, Mar 01, 2013 at 11:08:45AM +0800, Mike wrote:
> Hi all
> 
> Any comments? or any questions about my patchset?

You were going to get some performance numbers that show a definite
benefit for using more than one MSI.

cheers

Re: [PATCH v2 linux-next] cpufreq: governors: Calculate iowait time only when necessary

2013-02-28 Thread Viresh Kumar

On 28 February 2013 22:27, Stratos Karafotis  wrote:
> Currently we always calculate the CPU iowait time and add it to idle time.
> If we are in ondemand and we use io_is_busy, we re-calculate iowait time
> and we subtract it from idle time.
>
> With this patch iowait time is calculated only when necessary avoiding
> the double call to get_cpu_iowait_time_us. We use a parameter in
> function get_cpu_idle_time to distinguish when the iowait time will be
> added to idle time or not, without the need of keeping the prev_io_wait.
>
> Signed-off-by: Stratos Karafotis 
> ---
>  drivers/cpufreq/cpufreq_conservative.c |  2 +-
>  drivers/cpufreq/cpufreq_governor.c | 46 
> +-
>  drivers/cpufreq/cpufreq_governor.h |  3 +--
>  drivers/cpufreq/cpufreq_ondemand.c | 11 +++-
>  4 files changed, 29 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq_conservative.c 
> b/drivers/cpufreq/cpufreq_conservative.c
> index 4fd0006..dfe652c 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -242,7 +242,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, 
> struct attribute *b,
> struct cs_cpu_dbs_info_s *dbs_info;
> dbs_info = &per_cpu(cs_cpu_dbs_info, j);
> dbs_info->cdbs.prev_cpu_idle = get_cpu_idle_time(j,
> -   &dbs_info->cdbs.prev_cpu_wall);
> +   &dbs_info->cdbs.prev_cpu_wall, 0);
> if (cs_tuners.ignore_nice)
> dbs_info->cdbs.prev_cpu_nice =
> kcpustat_cpu(j).cpustat[CPUTIME_NICE];
> diff --git a/drivers/cpufreq/cpufreq_governor.c 
> b/drivers/cpufreq/cpufreq_governor.c
> index 5a76086..a322bda 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -50,13 +50,13 @@ static inline u64 get_cpu_idle_time_jiffy(unsigned int 
> cpu, u64 *wall)
> return cputime_to_usecs(idle_time);
>  }
>
> -u64 get_cpu_idle_time(unsigned int cpu, u64 *wall)
> +u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy)
>  {
> u64 idle_time = get_cpu_idle_time_us(cpu, NULL);
>
> if (idle_time == -1ULL)
> return get_cpu_idle_time_jiffy(cpu, wall);
> -   else
> +   else if (!io_busy)
> idle_time += get_cpu_iowait_time_us(cpu, wall);
>
> return idle_time;
> @@ -83,13 +83,22 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> /* Get Absolute Load (in terms of freq for ondemand gov) */
> for_each_cpu(j, policy->cpus) {
> struct cpu_dbs_common_info *j_cdbs;
> -   u64 cur_wall_time, cur_idle_time, cur_iowait_time;
> -   unsigned int idle_time, wall_time, iowait_time;
> +   u64 cur_wall_time, cur_idle_time;
> +   unsigned int idle_time, wall_time;
> unsigned int load;
> +   int io_busy = 0;
>
> j_cdbs = dbs_data->get_cpu_cdbs(j);
>
> -   cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
> +   /*
> +* For the purpose of ondemand, waiting for disk IO is
> +* an indication that you're performance critical, and
> +* not that the system is actually idle. So do not add
> +* the iowait time to the cpu idle time.
> +*/
> +   if (dbs_data->governor == GOV_ONDEMAND)
> +   io_busy = od_tuners->io_is_busy;
> +   cur_idle_time = get_cpu_idle_time(j, &cur_wall_time, io_busy);
>
> wall_time = (unsigned int)
> (cur_wall_time - j_cdbs->prev_cpu_wall);
> @@ -117,29 +126,6 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
> idle_time += jiffies_to_usecs(cur_nice_jiffies);
> }
>
> -   if (dbs_data->governor == GOV_ONDEMAND) {
> -   struct od_cpu_dbs_info_s *od_j_dbs_info =
> -   dbs_data->get_cpu_dbs_info_s(cpu);
> -
> -   cur_iowait_time = get_cpu_iowait_time_us(j,
> -   &cur_wall_time);
> -   if (cur_iowait_time == -1ULL)
> -   cur_iowait_time = 0;
> -
> -   iowait_time = (unsigned int) (cur_iowait_time -
> -   od_j_dbs_info->prev_cpu_iowait);
> -   od_j_dbs_info->prev_cpu_iowait = cur_iowait_time;
> -
> -   /*
> -* For the purpose of ondemand, waiting for disk IO is
> -* an indication that you're performance critical, and
> -* not that the system is actually idle. So subtract 
> the
> -* iowait time from the cpu idle time.
> -

Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread Tang Chen


Hi Linus,

Please refer to the attached patch.

This patch reverts only the following two patches.

commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
acpi, memory-hotplug: support getting hotplug info from SRAT
commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
acpi, memory-hotplug: parse SRAT before memblock is ready

Without these two patches, users can use "movablemem_map=nn[KMG]@ss[KMG]"
correctly, and cause no problem.

And of course, the kernel will work as before if users don't use
"movablemem_map=nn[KMG]@ss[KMG]".

I do hope we can keep "movablemem_map=nn[KMG]@ss[KMG]" in 3.9.


We are working on fixing the SRAT problems, and we aim to push the SRAT-related
patches in 3.10. And we will also improve "movablemem_map=nn[KMG]@ss[KMG]"
functionality consistently in the future.

Thanks. :)

On 03/01/2013 11:13 AM, Linus Torvalds wrote:

On Wed, Feb 27, 2013 at 1:26 PM, Andrew Morton
  wrote:


So I'm thinking that the best approach here is to revert everything and
then try again for 3.10-rc1.  This gives people time to test the code
while it's only in linux-next.  (Hint!)


I'd prefer to revert too by now - the bug seems to be known, and
apparently it's not a trivial fix. We're getting close to the end of
the merge window, and it's still being discussed, it clearly wasn't
really fully cooked.

Can we agree on some minimal set of reverts? Can somebody send me a
patch with the revert and the commit explanation for the revert?
Yinghai? Or I can do the reverts too if just the exact set of commits
is clear, but I'd rather get it from somebody who sees and understand
the problem, and can test the state afterwards..

Linus

>From 2e859dc212ce13fb812da6f971409a0518914574 Mon Sep 17 00:00:00 2001
From: Tang Chen 
Date: Thu, 28 Feb 2013 10:43:51 +0900
Subject: [PATCH] x86, ACPI, mm: Revert SRAT support from movablemem_map boot option.

The following two commits support getting info from SRAT to determine
which memory is hot-pluggable, also known as the "movablemem_map=srat" boot option.

	commit 01a178a94e8eaec351b29ee49fbb3d1c124cb7fb
		acpi, memory-hotplug: support getting hotplug info from SRAT
	commit e8d1955258091e4c92d5a975ebd7fd8a98f5d30f
		acpi, memory-hotplug: parse SRAT before memblock is ready

We need to know SRAT info before memblock is ready, so that we can
prevent memblock from allocating movable memory.

To achieve this goal, we moved the SRAT parsing code earlier in these patches. But that
broke the ACPI_INITRD_TABLE_OVERRIDE functionality and the fallback path of numa_init().

So we revert these two commits for now. After that, users can only use
"movablemem_map=nn[KMG]@ss[KMG]".

NOTE: 
1) It is OK to revert only these two patches. The core problems mentioned by
   Lu Yinghai:
   1. numa_init is called several times, NOT just for srat. so those
	nodes_clear(numa_nodes_parsed)
	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
      can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
      and make fall back path working.
   2. simply split acpi_numa_init to early_parse_srat.
      a. that early_parse_srat is NOT called for ia64, so you break ia64.
      b. for (i = 0; i < MAX_LOCAL_APIC; i++)
		set_apicid_to_node(i, NUMA_NO_NODE)
         still left in numa_init. So it will just clear result from early_parse_srat.
         it should be moved before that
      c. it breaks ACPI_TABLE_OVERRIDE...as the acpi table scan is moved
         early before override from INITRD is settled.

   They are caused by moving SRAT parsing earlier. And "movablemem_map=nn[KMG]@ss[KMG]"
   causes no harm to kernel.

2) With these two patches reverted, memblock will start to work before we parse SRAT,
   which means we won't know the end address of each node early enough.

   For example:
   If one node has memory [10G, 20G), and user specifies [15G, 16G), we cannot extend
   it to [15G, 20G). So memblock could still have a chance to allocate memory from
   [16G, 20G) for kernel, which is non-movable.

   As a result, users can only use this option in a very limited way:
   they should specify a memory range that extends to the end of each node.

Reported-by: Tim Gardner 
Reported-by: Don Morris 
Bisected-by: Don Morris 
Reported-by: Yinghai Lu 
Signed-off-by: Tang Chen 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Andrew Morton 
Cc: Tony Luck 
Cc: Thomas Renninger 
Cc: Tejun Heo 
Cc: Tang Chen 
Cc: Yasuaki Ishimatsu 
---
 Documentation/kernel-parameters.txt |   29 ++
 arch/x86/kernel/setup.c |   13 ++
 arch/x86/mm/numa.c  |6 +--
 arch/x86/mm/srat.c  |   71 ++
 drivers/acpi/numa.c |   23 +--
 include/linux/acpi.h|8 
 include/linux/mm.h  |2 -
 mm/page_alloc.c |   22 +--
 8 files changed, 27 insertions(+), 147 deletions(-)

diff --git
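
The [10G, 20G) example in the commit message can be worked through in a few
lines of userspace C. This is a sketch under assumptions, not the kernel's
parser: it reads "nn[KMG]@ss[KMG]" as size@start (the convention used by
memmap=), and parse_size(), parse_movablemem_map() and uncovered_tail() are
hypothetical helper names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Parse a number with an optional K/M/G suffix and advance *s past it. */
static uint64_t parse_size(const char **s)
{
	char *end;
	uint64_t value;
	unsigned int shift = 0;

	assert(s != NULL && *s != NULL);
	value = strtoull(*s, &end, 0);
	switch (*end) {
	case 'K': case 'k': shift = 10; break;
	case 'M': case 'm': shift = 20; break;
	case 'G': case 'g': shift = 30; break;
	default: break;
	}
	if (shift)
		end++;
	*s = end;
	return value << shift;
}

/* Parse "nn[KMG]@ss[KMG]" as size@start. Returns 0 on success, -1 on error. */
static int parse_movablemem_map(const char *arg, struct mem_range *out)
{
	const char *p = arg;
	const char *start_str;
	uint64_t size, start;

	size = parse_size(&p);
	if (p == arg || *p != '@' || size == 0)
		return -1;
	start_str = ++p;
	start = parse_size(&p);
	if (p == start_str || *p != '\0')
		return -1;
	out->start = start;
	out->end = start + size;
	return 0;
}

/* Bytes between the end of the user range and the end of its node. */
static uint64_t uncovered_tail(struct mem_range node, struct mem_range user)
{
	if (user.end <= node.start || user.end >= node.end)
		return 0;
	return node.end - user.end;
}

/* Self-check using the node and ranges from the commit message. */
static void check_movablemem_demo(void)
{
	const uint64_t G = 1ULL << 30;
	struct mem_range node = { 10 * G, 20 * G };
	struct mem_range user;

	/* [15G, 16G) leaves [16G, 20G) available to memblock. */
	assert(parse_movablemem_map("1G@15G", &user) == 0);
	assert(user.start == 15 * G && user.end == 16 * G);
	assert(uncovered_tail(node, user) == 4 * G);

	/* Extending the range to the node end leaves nothing uncovered. */
	assert(parse_movablemem_map("5G@15G", &user) == 0);
	assert(uncovered_tail(node, user) == 0);

	assert(parse_movablemem_map("bogus", &user) == -1);
}
```

The second case is the usage the commit message recommends: a range that
runs to the end of the node, so no non-movable allocation can land above it.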

[PATCH]serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller

2013-02-28 Thread Wang YanQing

01:08.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O 
Controller (rev 01)
Subsystem: Device [1000:0012]
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- 
SERR- 
---
 drivers/tty/serial/8250/8250_pci.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/tty/serial/8250/8250_pci.c 
b/drivers/tty/serial/8250/8250_pci.c
index 791c5a7..ebcc362 100644
--- a/drivers/tty/serial/8250/8250_pci.c
+++ b/drivers/tty/serial/8250/8250_pci.c
@@ -4791,6 +4791,10 @@ static struct pci_device_id serial_pci_tbl[] = {
PCI_VENDOR_ID_IBM, 0x0299,
0, 0, pbn_b0_bt_2_115200 },
 
+   {   PCI_VENDOR_ID_NETMOS, PCI_DEVICE_ID_NETMOS_9835,
+   0x1000, 0x0012,
+   0, 0, pbn_b0_bt_2_115200 },
+
{   PCI_VENDOR_ID_NETMOS, PCI_DEVICE_ID_NETMOS_9901,
0xA000, 0x1000,
0, 0, pbn_b0_1_115200 },
-- 
1.7.11.1.116.g8228a23
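
The new table entry differs from a generic one only in its subsystem IDs.
As a rough illustration of why that matters, here is a userspace sketch of
table matching with a wildcard value. It is not the PCI core's code:
struct id_entry, lookup_board() and the DEMO_* numeric IDs are assumptions
made for this example, and board names are plain strings standing in for
the pbn_* indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ANY_ID 0xffffffffu	/* wildcard, playing the role of PCI_ANY_ID */

struct id_entry {
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
	const char *board;	/* stands in for the pbn_* board index */
};

/* Placeholder numeric IDs, assumed for illustration only. */
#define DEMO_VENDOR_NETMOS	0x9710u
#define DEMO_DEVICE_9835	0x9835u
#define DEMO_DEVICE_9901	0x9901u

static const struct id_entry demo_tbl[] = {
	{ DEMO_VENDOR_NETMOS, DEMO_DEVICE_9835, 0x1000, 0x0012, "pbn_b0_bt_2_115200" },
	{ DEMO_VENDOR_NETMOS, DEMO_DEVICE_9901, 0xA000, 0x1000, "pbn_b0_1_115200" },
};

static int field_matches(uint32_t want, uint32_t have)
{
	return want == ANY_ID || want == have;
}

/* Return the board name of the first matching entry, or NULL if none match. */
static const char *lookup_board(const struct id_entry *tbl, size_t n,
				uint32_t vendor, uint32_t device,
				uint32_t subvendor, uint32_t subdevice)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < n; i++) {
		if (field_matches(tbl[i].vendor, vendor) &&
		    field_matches(tbl[i].device, device) &&
		    field_matches(tbl[i].subvendor, subvendor) &&
		    field_matches(tbl[i].subdevice, subdevice))
			return tbl[i].board;
	}
	return NULL;
}

/* Self-check: the subsystem IDs decide whether the 9835 entry applies. */
static void check_lookup_demo(void)
{
	size_t n = sizeof(demo_tbl) / sizeof(demo_tbl[0]);
	const char *board;

	board = lookup_board(demo_tbl, n, DEMO_VENDOR_NETMOS,
			     DEMO_DEVICE_9835, 0x1000, 0x0012);
	assert(board != NULL && strcmp(board, "pbn_b0_bt_2_115200") == 0);

	/* Same chip with a different subsystem ID: no entry claims it. */
	board = lookup_board(demo_tbl, n, DEMO_VENDOR_NETMOS,
			     DEMO_DEVICE_9835, 0x1000, 0x0099);
	assert(board == NULL);
}
```

In this model a card is claimed only when all four IDs match, which is why
the subsystem pair [1000:0012] shown in the lspci output needs its own entry.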

RE: [PATCH V2] smp: Give WARN()ing when calling smp_call_function_many()/single() in serving irq

2013-02-28 Thread Liu, Chuansheng



> -Original Message-
> From: Lai Jiangshan [mailto:eag0...@gmail.com]
> Sent: Wednesday, February 27, 2013 10:51 PM
> To: Liu, Chuansheng
> Cc: mi...@kernel.org; pet...@infradead.org; jbeul...@suse.com;
> paul...@linux.vnet.ibm.com; a...@linux-foundation.org;
> min...@mina86.org; srivatsa.b...@linux.vnet.ibm.com;
> linux-kernel@vger.kernel.org; Zhang, Jun; Wu, Fengguang
> Subject: Re: [PATCH V2] smp: Give WARN()ing when calling
> smp_call_function_many()/single() in serving irq
> 
> On Sat, Feb 16, 2013 at 10:10 PM, Chuansheng Liu
>  wrote:
> > Currently the functions smp_call_function_many()/single() will
> > give a WARN()ing only in the case of irqs_disabled(), but that
> > check is not enough to guarantee execution of the SMP
> > cross-calls.
> >
> > In many other cases such as softirq handling/interrupt handling,
> > the two APIs still can not be called, just as the
> > smp_call_function_many() comments say:
> >
> >   * You must not call this function with disabled interrupts or from a
> >   * hardware interrupt handler or from a bottom half handler. Preemption
> >   * must be disabled when calling this function.
> >
> > There is a real case for softirq DEADLOCK case:
> >
> > CPUA                            CPUB
> >                                 spin_lock()
> >                                 Any irq coming, call the irq handler
> >                                 irq_exit()
> > spin_lock_irq()
> > <== Blocking here due to
> > CPUB hold it
> >                                   __do_softirq()
> >                                     run_timer_softirq()
> >                                       timer_cb()
> >                                         call smp_call_function_many()
> >                                           send IPI interrupt to CPUA
> >                                             wait_csd()
> >
> > Then both CPUA and CPUB will be deadlocked here.
> >
> > So we should give a warning in the nmi, hardirq or softirq context as well.
> >
> > Moreover, add one new macro in_serving_irq() which indicates
> > we are processing nmi, hardirq or softirq.
> 
> The code smells bad. in_serving_softirq() doesn't take spin_lock_bh() into
> account.
> 
> CPUA                    CPUB                        CPUC
>                         spin_lock()
>                         Any irq coming, call
>                         the irq handler
>                         irq_exit()
> spin_lock_irq()
> *Blocking* here
> due to CPUB hold it
>                                                     spin_lock_bh()
>                         __do_softirq()
>                           run_timer_softirq()
>                             spin_lock_bh()
>                             *Blocking* here due to
>                             CPUC hold it
>                                                     call smp_call_function_many()
>                                                     send IPI interrupt to CPUA
>                                                     wait_csd()
>                                                     *Blocking* here.
> 
> So it is still a deadlock, but your code does not warn about it.
In your case, even if you change spin_lock_bh() to spin_lock(), the deadlock
is still there, so it has nothing to do with _bh() at all. There is no need
to warn about that kind of deadlock in smp_call_xxx(), or for the _bh() case.

> so in_softirq() is better than in_serving_softirq() in in_serving_irq(),
> and as a result in_serving_irq() is the same as in_interrupt().
> 
> so please remove in_serving_irq() and use in_interrupt() instead.
The original patch is using in_interrupt(). https://lkml.org/lkml/2013/2/6/34 
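
For readers following the in_softirq() versus in_serving_softirq() argument, here is a minimal user-space sketch of the distinction. It is not kernel code: the `model_*` names are hypothetical, and the constants are an assumption that mirrors how the softirq bits of preempt_count are laid out in kernels of this era. It only shows that a section with bottom halves disabled counts as "in softirq" without being "serving" one.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of the softirq bits in preempt_count.
 * Assumption for illustration: one offset marks "a softirq handler is
 * running", twice that offset is added when bottom halves are disabled.
 */
#define MODEL_SOFTIRQ_OFFSET          0x0100u
#define MODEL_SOFTIRQ_DISABLE_OFFSET  (2u * MODEL_SOFTIRQ_OFFSET)
#define MODEL_SOFTIRQ_MASK            0xff00u

struct model_cpu {
	unsigned int preempt_count;
};

/* What spin_lock_bh()/local_bh_disable() would do to the counter. */
void model_bh_disable(struct model_cpu *c)
{
	c->preempt_count += MODEL_SOFTIRQ_DISABLE_OFFSET;
}

void model_bh_enable(struct model_cpu *c)
{
	assert((c->preempt_count & MODEL_SOFTIRQ_MASK) >= MODEL_SOFTIRQ_DISABLE_OFFSET);
	c->preempt_count -= MODEL_SOFTIRQ_DISABLE_OFFSET;
}

/* What entering a softirq handler would do to the counter. */
void model_softirq_enter(struct model_cpu *c)
{
	c->preempt_count += MODEL_SOFTIRQ_OFFSET;
}

/* True while serving a softirq OR while bottom halves are disabled. */
bool model_in_softirq(const struct model_cpu *c)
{
	return (c->preempt_count & MODEL_SOFTIRQ_MASK) != 0;
}

/* True only while a softirq handler is actually running. */
bool model_in_serving_softirq(const struct model_cpu *c)
{
	return (c->preempt_count & MODEL_SOFTIRQ_OFFSET) != 0;
}
```

In this model a CPU that has only called model_bh_disable() reports model_in_softirq() but not model_in_serving_softirq(), which is the case the CPUC column in the quoted diagram relies on.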

> And add:
> 
> Reviewed-by: Lai Jiangshan 

Re: [GIT PULL URGENT] ext4 regression fix for 3.9

2013-02-28 Thread Dave Jones

On Wed, Feb 27, 2013 at 03:12:17PM -0500, Linus Torvalds wrote:
 > The following changes since commit 304e220f0879198b1f5309ad6f0be862b4009491:
 > 
 >   ext4: fix free clusters calculation in bigalloc filesystem (2013-02-22 
 > 15:27:52 -0500)
 > 
 > are available in the git repository at:
 > 
 >   git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git 
 > tags/ext4_for_linus
 > 
 > for you to fetch changes up to 8e919d13048cd5acaadb2b15b48acbfb8832d3c2:
 > 
 >   ext4: fix extent status tree regression for file systems > 512GB 
 > (2013-02-27 14:54:37 -0500)
 > 
 > 
 > This fixes a real brown paper bag bug which causes ext4 to choke on
 > file systems larger than 512GB.
 > 
 > 
 > Theodore Ts'o (1):
 >   ext4: fix extent status tree regression for file systems > 512GB
 > 
 >  fs/ext4/extents_status.h | 19 +++
 >  1 file changed, 11 insertions(+), 8 deletions(-)

This has fixed the problem I reported, but I notice now that my desktop is
really sluggish. perf top shows it's almost constantly spinning in
ext4_es_reclaim_extents_count.

Any ideas?

Dave


[PATCH] efivarfs: fix abnormal GUID in variable name by using strcpy to replace null with dash

2013-02-28 Thread Lee, Chun-Yi

From: Michael Schroeder 

On an HP z220 system (firmware version 1.54), some EFI variables are
incorrectly named:

ls -d /sys/firmware/efi/vars/*8be4d* | grep -v -- -8be returns
/sys/firmware/efi/vars/dbxDefault-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
/sys/firmware/efi/vars/KEKDefault-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
/sys/firmware/efi/vars/SecureBoot-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
/sys/firmware/efi/vars/SetupMode-Information8be4df61-93ca-11d2-aa0d-00e098032b8c

That is caused by the following statement in the efivar_create_sysfs_entry()
function:

 *(short_name + strlen(short_name)) = '-';
efi_guid_unparse(vendor_guid, short_name + strlen(short_name));

The trailing \0 is overwritten with '-', but on the HP firmware the next char
does not seem to be a \0 as well. So the second strlen() runs on to the next
'\0', which leaves a garbage string attached before the GUID.
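
The effect can be reproduced in user space. The sketch below is only an illustration under stated assumptions: `fill_with_stale_tail()` is a hypothetical helper that fakes a name buffer with leftover bytes after its terminator, and the two `append_dash_*` functions stand for the statement before and after the patch.

```c
#include <assert.h>
#include <string.h>

/* Old form: replaces the terminator only, so stale bytes join the string. */
void append_dash_overwrite(char *s)
{
	*(s + strlen(s)) = '-';
}

/* Patched form: strcpy() writes '-' followed by a fresh terminator. */
void append_dash_strcpy(char *s)
{
	strcpy(s + strlen(s), "-");
}

/*
 * Hypothetical helper: lay out "name\0stale\0" in buf to mimic a buffer
 * whose bytes after the terminator are not zero.
 */
void fill_with_stale_tail(char *buf, size_t size,
			  const char *name, const char *stale)
{
	size_t n = strlen(name);
	size_t m = strlen(stale);

	assert(n + 1 + m + 1 <= size);
	memset(buf, 0, size);
	memcpy(buf, name, n);           /* buf[n] stays '\0' */
	memcpy(buf + n + 1, stale, m);  /* leftover bytes behind the terminator */
}
```

With name "SecureBoot" and stale bytes "pport", the old form yields "SecureBoot-pport" and the patched form yields "SecureBoot-", so the GUID appended next lands in the right place only in the second case.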

Tested on HP z220.

Cc: Matt Fleming 
Cc: Josh Boyer 
Cc: Jeremy Kerr 
Cc: Michael Schroeder 
Reported-by: Frederic Crozat 
Tested-by: Frederic Crozat 
Signed-off-by: Lee, Chun-Yi 
---
 drivers/firmware/efivars.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 8bcb595..fbf18ff 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -1708,7 +1708,7 @@ efivar_create_sysfs_entry(struct efivars *efivars,
/* This is ugly, but necessary to separate one vendor's
   private variables from another's. */
 
-   *(short_name + strlen(short_name)) = '-';
+   strcpy(short_name + strlen(short_name), "-");
efi_guid_unparse(vendor_guid, short_name + strlen(short_name));
 
new_efivar->kobj.kset = efivars->kset;
-- 
1.6.0.2


Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!

2013-02-28 Thread Linus Torvalds

On Wed, Feb 27, 2013 at 1:26 PM, Andrew Morton
 wrote:
>
> So I'm thinking that the best approach here is to revert everything and
> then try again for 3.10-rc1.  This gives people time to test the code
> while it's only in linux-next.  (Hint!)

I'd prefer to revert too by now - the bug seems to be known, and
apparently it's not a trivial fix. We're getting close to the end of
the merge window, and it's still being discussed, it clearly wasn't
really fully cooked.

Can we agree on some minimal set of reverts? Can somebody send me a
patch with the revert and the commit explanation for the revert?
Yinghai? Or I can do the reverts too if just the exact set of commits
is clear, but I'd rather get it from somebody who sees and understand
the problem, and can test the state afterwards..

   Linus

Re: [PATCH 0/3] Enable multiple MSI feature in pSeries

2013-02-28 Thread Mike

Hi all

Any comments or questions about my patchset?

Thanks
Mike
On Tue, 2013-01-15 at 15:38 +0800, Mike Qiu wrote:
> Currently, the multiple MSI feature hasn't been enabled in pSeries.
> These patches try to enable this feature.
> 
> These patches have been tested by using ipr driver, and the driver patch
> has been made by Wen Xiong :
> 
> [PATCH 0/7] Add support for new IBM SAS controllers
> 
> Test platform: One partition of pSeries with one cpu core(4 SMTs) and 
>RAID bus controller: IBM PCI-E IPR SAS Adapter (ASIC) in POWER7
> OS version: SUSE Linux Enterprise Server 11 SP2  (ppc64) with 3.8-rc3 kernel 
> 
> IRQ 21 and 22 are assigned to the ipr device, which supports 2 multiple MSIs.
> 
> The test results are shown by 'cat /proc/interrupts':
>   CPU0   CPU1   CPU2   CPU3   
> 16: 240458 261601 226310 200425  XICS Level IPI
> 17:  0  0  0  0  XICS Level RAS_EPOW
> 18: 10  0  3  2  XICS Level hvc_console
> 19: 122182  28481  28527  28864  XICS Level ibmvscsi
> 20:5067388226108118  XICS Level eth0
> 21:  6  5  5  5  XICS Level host1-0
> 22:817814816813  XICS Level host1-1
> LOC: 398077 316725 231882 203049   Local timer interrupts
> SPU:   1659919961903   Spurious interrupts
> CNT:  0  0  0  0   Performance monitoring interrupts
> MCE:  0  0  0  0   Machine check exceptions
> 
> Mike Qiu (3):
>   irq: Set multiple MSI descriptor data for multiple IRQs
>   irq: Add hw continuous IRQs map to virtual continuous IRQs support
>   powerpc/pci: Enable pSeries multiple MSI feature
> 
>  arch/powerpc/kernel/msi.c|4 --
>  arch/powerpc/platforms/pseries/msi.c |   62 -
>  include/linux/irq.h  |4 ++
>  include/linux/irqdomain.h|3 ++
>  kernel/irq/chip.c|   40 -
>  kernel/irq/irqdomain.c   |   61 +
>  6 files changed, 158 insertions(+), 16 deletions(-)
> 




Re: linux-next: build failure after merge of the ftrace tree

2013-02-28 Thread Steven Rostedt

On Fri, 2013-03-01 at 13:47 +1100, Stephen Rothwell wrote:
> Hi Steven,
> 
> After merging the ftrace tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> kernel/trace/trace_kdb.c: In function 'ftrace_dump_buf':
> kernel/trace/trace_kdb.c:29:33: error: invalid type argument of '->' (have 
> 'struct trace_array_cpu')
> kernel/trace/trace_kdb.c:86:33: error: invalid type argument of '->' (have 
> 'struct trace_array_cpu')
> 
> Caused by commit eaac1836c10e ("tracing: Replace the static global
> per_cpu arrays with allocated per_cpu").
> 
> I have used the ftrace tree from next-20130228 for today.

Thanks, I'll take a look into it. I also found that my latest push also
broke the ftrace snapshot feature. I'm currently bisecting what caused
that.

Hmm, interesting though, I thought it succeeded in building against an
allyesconfig?? Grumble, I'll have to run it through the tests again to
make sure I didn't screw something up, like test the wrong branch :-p

-- Steve



Re: commit_creds oops

2013-02-28 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> Dave Jones  writes:
> 
> > Just hit this on Linus' current tree.
> >
> > [   89.621770] BUG: unable to handle kernel NULL pointer dereference at 
> > 00c8
> > [   89.623111] IP: [] commit_creds+0x250/0x2f0
> > [   89.624062] PGD 122bfd067 PUD 122bfe067 PMD 0 
> > [   89.624901] Oops:  [#1] PREEMPT SMP 
> > [   89.625678] Modules linked in: caif_socket caif netrom bridge hidp 8021q 
> > garp stp mrp rose llc2 af_rxrpc phonet af_key binfmt_misc bnep l2tp_ppp 
> > can_bcm l2tp_core pppoe pppox can_raw scsi_transport_iscsi ppp_generic slhc 
> > nfnetlink can ipt_ULOG ax25 decnet irda nfc rds x25 crc_ccitt appletalk atm 
> > ipx p8023 psnap p8022 llc lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 
> > nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables btusb 
> > bluetooth snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm 
> > vhost_net snd_page_alloc snd_timer tun macvtap usb_debug snd rfkill 
> > microcode macvlan edac_core pcspkr serio_raw kvm_amd soundcore kvm r8169 mii
> > [   89.637846] CPU 2 
> > [   89.638175] Pid: 782, comm: trinity-main Not tainted 3.8.0+ #63 Gigabyte 
> > Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
> > [   89.639850] RIP: 0010:[]  [] 
> > commit_creds+0x250/0x2f0
> > [   89.641161] RSP: 0018:880115657eb8  EFLAGS: 00010207
> > [   89.641984] RAX: 03e8 RBX: 88012688b000 RCX: 
> > 
> > [   89.643069] RDX:  RSI: 81c32960 RDI: 
> > 880105839600
> > [   89.644167] RBP: 880115657ed8 R08:  R09: 
> > 
> > [   89.645254] R10: 0001 R11: 0246 R12: 
> > 880105839600
> > [   89.646340] R13: 88011beea490 R14: 88011beea490 R15: 
> > 
> > [   89.647431] FS:  7f3ac063b740() GS:88012b20() 
> > knlGS:
> > [   89.648660] CS:  0010 DS:  ES:  CR0: 8005003b
> > [   89.649548] CR2: 00c8 CR3: 000122bfc000 CR4: 
> > 07e0
> > [   89.650635] DR0:  DR1:  DR2: 
> > 
> > [   89.651723] DR3:  DR6: 0ff0 DR7: 
> > 0400
> > [   89.652812] Process trinity-main (pid: 782, threadinfo 880115656000, 
> > task 88011beea490)
> > [   89.654128] Stack:
> > [   89.654433]   8801058396a0 880105839600 
> > 88011beeaa78
> > [   89.655769]  880115657ef8 812c7d9b 82079be0 
> > 
> > [   89.657073]  880115657f28 8106c665 0002 
> > 880115657f58
> > [   89.658399] Call Trace:
> > [   89.658822]  [] key_change_session_keyring+0xfb/0x140
> > [   89.659845]  [] task_work_run+0xa5/0xd0
> > [   89.660698]  [] do_notify_resume+0x71/0xb0
> > [   89.661581]  [] int_signal+0x12/0x17
> > [   89.662385] Code: 24 90 00 00 00 48 8b b3 90 00 00 00 49 8b 4c 24 40 48 
> > 39 f2 75 08 e9 83 00 00 00 48 89 ca 48 81 fa 60 29 c3 81 0f 84 41 fe ff ff 
> > <48> 8b 8a c8 00 00 00 48 39 ce 75 e4 3b 82 d0 00 00 00 0f 84 4b 
> > [   89.667778] RIP  [] commit_creds+0x250/0x2f0
> > [   89.668733]  RSP 
> > [   89.669301] CR2: 00c8
> >
> > My fastest trinity induced oops yet!
> >
> >
> > Appears to be..
> >
> > if ((set_ns == subset_ns->parent)  &&
> >  850:   48 8b 8a c8 00 00 00mov0xc8(%rdx),%rcx
> >
> > from the inlined cred_cap_issubset
> 
> Interesting.  That line is protected with the check subset_ns !=
> &init_user_ns so subset_ns->parent must be valid or subset_ns is not
> a proper user namespace.
> 
> Ugh.  I think I see what is going on and it is just silly. 
> 
> It looks like by historical accident we have been trying to set
> new->user_ns from new->user_ns.  Which is totally silly as new->user_ns
> is NULL (as is every other field in new except session_keyring at that
> point).
> 
> It looks like it is safe to sleep in key_change_session_keyring so why
> we just don't use prepare_creds there like everywhere else is beyond
> me.
> 
> The intent is clearly to copy all of the fields from old to new so what
> we should be doing is copying old->user_ns into new->user_ns.
> 
> Dave can you verify that this patch fixes the oops?
> 
> Signed-off-by: "Eric W. Biederman" 

Jinkeys - that should have stood out like a sore thumb.

Acked-by: Serge Hallyn 
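
The slip is easy to see in isolation. The following is a user-space sketch, not the kernel code: the `model_*` types and `copy_ns_*` functions are hypothetical, and the reference helper is assumed to pass NULL through unchanged, which is why the self-assignment stays silent until something later dereferences the namespace.

```c
#include <assert.h>
#include <stddef.h>

struct model_ns {
	int refcount;
};

struct model_cred {
	struct model_ns *user_ns;
};

/* Take a reference; assumed to tolerate NULL and hand it straight back. */
struct model_ns *model_get_ns(struct model_ns *ns)
{
	if (ns)
		ns->refcount++;
	return ns;
}

/* Buggy form: reads the field it is about to set, so NULL stays NULL. */
void copy_ns_buggy(struct model_cred *new, const struct model_cred *old)
{
	(void)old;  /* the source cred is never consulted, which is the bug */
	new->user_ns = model_get_ns(new->user_ns);
}

/* Fixed form: take the namespace from the old credentials. */
void copy_ns_fixed(struct model_cred *new, const struct model_cred *old)
{
	assert(old->user_ns != NULL);
	new->user_ns = model_get_ns(old->user_ns);
}
```

Starting from a zeroed `new`, the buggy copy leaves `user_ns` NULL and takes no reference, while the fixed copy points at the old namespace and bumps its count by one.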

> ---
> diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
> index 58dfe08..a571fad 100644
> --- a/security/keys/process_keys.c
> +++ b/security/keys/process_keys.c
> @@ -839,7 +839,7 @@ void key_change_session_keyring(struct callback_head 
> *twork)
> new-> sgid  = old-> sgid;
> new->fsgid  = old->fsgid;
> new->user   = get_uid(old->user);
> -   new->user_ns= get_user_ns(new->user_ns);
> +   new->user_ns= get_user_ns(old->user_ns);
> new->group_info =

[PATCH 4/5 V3] usb: call pm_runtime_put_noidle in pm_runtime_get_sync failed case

2013-02-28 Thread Li Fei

Even when pm_runtime_get_sync() fails, the usage_count has already been
incremented. To keep the usage_count correct and runtime power
management behaving correctly, call pm_runtime_put(_sync/noidle)
in that case.
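
A minimal user-space sketch of the balancing rule described above. The `mock_*` names are hypothetical stand-ins, not the runtime PM API; the only behaviour assumed is the one stated in this message, namely that the get operation counts a reference before it can report failure.

```c
#include <assert.h>
#include <errno.h>

struct mock_dev {
	int usage_count;
	int fail_resume;   /* non-zero makes the mock resume fail */
};

/* Counts the reference first, then reports whether the resume worked. */
int mock_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	return dev->fail_resume ? -EIO : 0;
}

/* Drops a reference without triggering any idle handling. */
void mock_put_noidle(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Error path gives the reference back so the count stays balanced. */
int mock_port_resume(struct mock_dev *dev)
{
	int status = mock_get_sync(dev);

	if (status < 0) {
		mock_put_noidle(dev);
		return status;
	}
	return 0;
}
```

After a failed mock_port_resume() the count is back at zero; after a successful one it stays at one until the caller drops it.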

Signed-off-by: Liu Chuansheng 
Signed-off-by: Li Fei 
---
 drivers/usb/core/hub.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 5480352..4a6c055 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3148,12 +3148,13 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
 
 	if (port_dev->did_runtime_put) {
 		status = pm_runtime_get_sync(&port_dev->dev);
-		port_dev->did_runtime_put = false;
 		if (status < 0) {
 			dev_dbg(&udev->dev, "can't resume usb port, status %d\n",
 					status);
+			pm_runtime_put_noidle(&port_dev->dev);
 			return status;
 		}
+		port_dev->did_runtime_put = false;
 	}
 
 	/* Skip the initial Clear-Suspend step for a remote wakeup */
-- 
1.7.4.1





Re: [PATCH] CIFS: Decrease reconnection delay when switching nics

2013-02-28 Thread Steve French

On Thu, Feb 28, 2013 at 6:11 PM, Jeff Layton  wrote:
> On Thu, 28 Feb 2013 23:54:13 +0100
> Björn JACKE  wrote:
>
>> On 2013-02-28 at 07:26 -0800 Jeff Layton sent off:
>> > NTFS doesn't support sparse files, so the OS has to zero-fill up to the
>> > point where you're writing. That can take a long time on slow
>> > storage (minutes even).
>>
>> but you are talking about FAT here, right? NTFS does support sparse files if
>> the sparse bit has been explicitly set on it. But even if the sparse bit
>> is not set, filling a file with zeros by writing after a seek long beyond the
>> end of the file is very fast because NTFS supports the feature that Unix
>> filesystems like xfs call extents.
>>
>> If writing beyond the end of a file is really slow via cifs vfs in the test
>> case against an ntfs volume then I wonder if that operation is really being
>> done optimally over the wire. ntfs really isn't that bad at handling this
>> kind of file.
>>
>
> I'm not sure since I don't know the internals of NTFS. I had always
> assumed that it didn't really handle sparse files well (hence the
> "rabbit-pellet" thing that windows clients do).
>
> All I can say however is that writes long past the EOF can take a
> *really* long time to run. Typically we just issue a SMB_COM_WRITEX at
> the offset to which we want to put the data. Is there some other way we
> ought to be doing this?
>
> In any case, it doesn't really change the fact that there is no
> guaranteed time of response from CIFS servers. They can easily take a
> really long time to respond to certain requests. The best method we
> have to deal with that is to periodically "ping" the server with an
> echo to see if it's still there.

SMB2/SMB3 with better async support may make this easier - but Jeff is right.

-- 
Thanks,

Steve

linux-next: manual merge of the kvm tree with Linus' tree

2013-02-28 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the kvm tree got a conflict in
arch/x86/kernel/kvmclock.c between commit 5dfd486c4750 ("x86, kvm: Fix
kvm's use of __pa() on percpu areas") from Linus' tree and commit
fe1140cc3694 ("x86: kvmclock: Do not setup kvmclock vsyscall in the
absence of that clock") from the kvm tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc arch/x86/kernel/kvmclock.c
index 0732f00,b730efa..000
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@@ -160,10 -160,14 +160,14 @@@ int kvm_register_clock(char *txt
  {
int cpu = smp_processor_id();
int low, high, ret;
-   struct pvclock_vcpu_time_info *src = &hv_clock[cpu].pvti;
+   struct pvclock_vcpu_time_info *src;
+ 
+   if (!hv_clock)
+   return 0;
  
+   src = &hv_clock[cpu].pvti;
 -  low = (int)__pa(src) | 1;
 -  high = ((u64)__pa(src) >> 32);
 +  low = (int)slow_virt_to_phys(src) | 1;
 +  high = ((u64)slow_virt_to_phys(src) >> 32);
ret = native_write_msr_safe(msr_kvm_system_time, low, high);
printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
   cpu, high, low, txt);
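
Both sides of the conflict keep the same low/high split of the physical address; only the way the address is obtained differs. As a rough user-space sketch of that split (the function name `split_clock_msr` is hypothetical, and treating bit 0 as an enable flag is an assumption drawn from the `| 1` in the resolution above):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a 64-bit physical address into the two 32-bit halves written to
 * the MSR, setting bit 0 of the low half as in the merged code.
 */
void split_clock_msr(uint64_t phys, uint32_t *low, uint32_t *high)
{
	assert((phys & 1) == 0);  /* bit 0 must be free for the flag */
	*low = (uint32_t)phys | 1u;
	*high = (uint32_t)(phys >> 32);
}
```

For a physical address of 0x123456000 this gives a low word of 0x23456001 and a high word of 0x1.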



linux-next: build failure after merge of the ftrace tree

2013-02-28 Thread Stephen Rothwell

Hi Steven,

After merging the ftrace tree, today's linux-next build (x86_64
allmodconfig) failed like this:

kernel/trace/trace_kdb.c: In function 'ftrace_dump_buf':
kernel/trace/trace_kdb.c:29:33: error: invalid type argument of '->' (have 
'struct trace_array_cpu')
kernel/trace/trace_kdb.c:86:33: error: invalid type argument of '->' (have 
'struct trace_array_cpu')

Caused by commit eaac1836c10e ("tracing: Replace the static global
per_cpu arrays with allocated per_cpu").
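
The thread does not show the offending lines, so the following is only a guess at the general shape of this kind of error, written as a user-space sketch with hypothetical `model_*` names. When an array of per-CPU pointers becomes a single allocation indexed by CPU, indexing it yields a struct rather than a pointer, and any leftover `->` on the indexed element stops compiling.

```c
#include <assert.h>
#include <stddef.h>

struct model_cpu_data {
	int disabled;
};

struct model_array {
	struct model_cpu_data *data;  /* one allocation, one slot per CPU */
	size_t nr_cpus;
};

/* Accessor that hands back a pointer to the slot for one CPU. */
struct model_cpu_data *model_cpu_ptr(struct model_array *tr, size_t cpu)
{
	assert(cpu < tr->nr_cpus);
	return &tr->data[cpu];
}

void model_disable(struct model_array *tr, size_t cpu)
{
	/*
	 * "tr->data[cpu]->disabled++" would be rejected here with
	 * "invalid type argument of '->'", because data[cpu] is a struct.
	 */
	model_cpu_ptr(tr, cpu)->disabled++;
}
```

Callers that still use the old pointer-array spelling need to go through the accessor (or use `.` on the indexed element) after such a conversion.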

I have used the ftrace tree from next-20130228 for today.
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [RFC PATCH v2 1/2] mm: tuning hardcoded reserved memory

2013-02-28 Thread Ric Mason


On 02/28/2013 11:48 AM, Andrew Shewmaker wrote:

On Thu, Feb 28, 2013 at 02:12:00PM -0800, Andrew Morton wrote:

On Wed, 27 Feb 2013 15:56:30 -0500
Andrew Shewmaker  wrote:


The following patches are against the mmotm git tree as of February 27th.

The first patch only affects OVERCOMMIT_NEVER mode, entirely removing
the 3% reserve for other user processes.

The second patch affects both OVERCOMMIT_GUESS and OVERCOMMIT_NEVER
modes, replacing the hardcoded 3% reserve for the root user with a
tunable knob.


Gee, it's been years since anyone thought about the overcommit code.

Documentation/vm/overcommit-accounting says that OVERCOMMIT_ALWAYS is
"Appropriate for some scientific applications", but doesn't say why.
You're running a scientific cluster but you're using OVERCOMMIT_NEVER,
I think?  Is the documentation wrong?

None of my scientists appeared to use sparse arrays as Alan described.
My users would run jobs that appeared to initialize correctly. However,
they wouldn't write to every page they malloced (and they wouldn't use
calloc), so I saw jobs failing well into a computation once the
simulation tried to access a page and the kernel couldn't give it to them.

I think Roadrunner (http://en.wikipedia.org/wiki/IBM_Roadrunner) was
the first cluster I put into OVERCOMMIT_NEVER mode. Jobs with
infeasible memory requirements fail early and the OOM killer
gets triggered much less often than in guess mode. More often than not
the OOM killer seemed to kill the wrong thing causing a subtle brokenness.
Disabling overcommit worked so well during the stabilization and
early user phases that we did the same with other clusters.


Do you mean OVERCOMMIT_NEVER is more suitable for scientific applications
than OVERCOMMIT_GUESS and OVERCOMMIT_ALWAYS? Or should it depend on the
workload? Since your users run jobs that don't write to every page they
malloc, why isn't OVERCOMMIT_GUESS more suitable for you?





__vm_enough_memory reserves 3% of free pages with the default
overcommit mode and 6% when overcommit is disabled. These hardcoded
values have become less reasonable as memory sizes have grown.

On scientific clusters, systems are generally dedicated to one user.
Also, overcommit is sometimes disabled in order to prevent a long
running job from suddenly failing days or weeks into a calculation.
In this case, a user wishing to allocate as much memory as possible
to one process may be prevented from using, for example, around 7GB
out of 128GB.

The effect is less, but still significant when a user starts a job
with one process per core. I have repeatedly seen a set of processes
requesting the same amount of memory fail because one of them could
not allocate the amount of memory a user would expect to be able to
allocate.

...

--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -182,11 +182,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 		allowed -= allowed / 32;
 	allowed += total_swap_pages;
 
-	/* Don't let a single process grow too big:
-	   leave 3% of the size of this process for other processes */
-	if (mm)
-		allowed -= mm->total_vm / 32;
-
 	if (percpu_counter_read_positive(&vm_committed_as) < allowed)
 		return 0;
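
To make the arithmetic in the hunk concrete, here is a small user-space sketch. `model_allowed_pages()` is a hypothetical function that only replays the three adjustments visible above; the guard on `cap_sys_admin` for the first one is an assumption, since the hunk shows the subtraction but not its condition.

```c
#include <assert.h>

/*
 * Replay the adjustments from the hunk on a page budget.
 *   base                 - pages allowed before any reserve is taken
 *   swap                 - total swap pages
 *   total_vm             - size of the requesting process, in pages
 *   keep_process_reserve - non-zero keeps the total_vm / 32 term that
 *                          the patch removes
 */
long model_allowed_pages(long base, long swap, long total_vm,
			 int cap_sys_admin, int keep_process_reserve)
{
	long allowed = base;

	assert(base >= 0 && swap >= 0 && total_vm >= 0);
	if (!cap_sys_admin)
		allowed -= allowed / 32;    /* reserve kept back for root */
	allowed += swap;
	if (keep_process_reserve)
		allowed -= total_vm / 32;   /* reserve for other processes */
	return allowed;
}
```

With 4 KiB pages, 128 GiB is 33554432 pages, so each `/ 32` term can hold back up to 1048576 pages, or 4 GiB, if the budget or the process approaches the whole machine. That is the order of magnitude behind the figures quoted in this thread.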

So what might be the downside for this change?  root can't log in, I
assume.  Have you actually tested for this scenario and observed the
effects?

If there *are* observable risks and/or to preserve back-compatibility,
I guess we could create a fourth overcommit mode which provides the
headroom which you desire.

Also, should we be looking at removing root's 3% from OVERCOMMIT_GUESS
as well?

The downside of the first patch, which removes the "other" reserve
(sorry about the confusing duplicated subject line), is that a user
may not be able to kill their process, even if they have a shell prompt.
When testing, I did sometimes get into spot where I attempted to execute
kill, but got: "bash: fork: Cannot allocate memory". Of course, a
user can get in the same predicament with the current 3% reserve--they
just have to start processes until 3% becomes negligible.

With just the first patch, root still has a 3% reserve, so they can
still log in.

When I resubmit the second patch, adding a tunable rootuser_reserve_pages
variable, I'll test both guess and never overcommit modes to see what
minimum initial values allow root to login and kill a user's memory
hogging process. This will be safer than the current behavior since
root's reserve will never shrink to something useless in the case where
a user has grabbed all available memory with many processes.


The idea of two patches looks reasonable to me.



As an estimate of a useful rootuser_reserve_pages, the rss+share size of


Sorry for the silly question, but do you mean the share size is not included
in the rss size?


sshd, bash, and top is about 16MB. Overcommit disabled mode would need
closer to 360MB for the same processes. On a 128GB box 3% is 3.8GB, so
the new tunable

Re: [PATCH] VMware Balloon: rename module

2013-02-28 Thread jbian


Yes, the vmware_balloon driver was renamed. With the old name, vmware_balloon,
a rhel6 guest on esxi5.1 shows a "vmmemctl" process after the balloon driver is
loaded with modprobe. After the rename to vmw_balloon, a rhel7 guest on esxi5.1
loads the driver as vmw_balloon and that process no longer appears, although
the balloon still works well. So why was the vmmemctl process removed?

Thanks in advance; I would be grateful for any reply.

B.R
jbian




Re: [RFC PATCH] sched: wakeup buddy

2013-02-28 Thread Michael Wang

On 02/28/2013 11:31 PM, Namhyung Kim wrote:
> 2013-02-28 (Thu), 11:06 +0100, Mike Galbraith:
>> On Thu, 2013-02-28 at 18:25 +0900, Namhyung Kim wrote:
>>
>>> Not sure if it should require bidirectional relationship.  Looks like
>>> just for benchmarks.  Isn't there a one-way relationship that could get
>>> a benefit from this?  I don't know ;-)
>>
>> ??  Meaningful relationships are bare minimum bidirectional, how can you
>> describe one connection and have it remain meaningful?  I love "her" is
>> unlikely to lead to anything meaningful if "she" doesn't know you exist.
> 
> Maybe I misunderstood something.  I was thinking about typical
> cooperation models like manager-worker, producer-consumer or pipeline
> and thought that they are usually one-way relationship in terms of the
> wakeup.

I agree with Mike's point here: relaxing the restriction usually benefits
one model but damages more.

The whole wake_affine() stuff is somewhat blind: we imagine that the
cache will benefit the wakee but cannot estimate by how much, and
the formula contains too many elements. I'd prefer to gamble only when
I'm likely to win; that will win less money, but lose less too ;-)

Regards,
Michael Wang

> 
> Thanks,
> Namhyung
> 
> 

RE: [PATCH 1/4] documentation: add palmas dts definition

2013-02-28 Thread J, KEERTHY

Hi Stephen,

> -Original Message-
> From: Stephen Warren [mailto:swar...@wwwdotorg.org]
> Sent: Friday, March 01, 2013 12:37 AM
> To: J, KEERTHY
> Cc: grant.lik...@secretlab.ca; rob.herr...@calxeda.com;
> r...@landley.net; devicetree-disc...@lists.ozlabs.org; linux-
> d...@vger.kernel.org; linux-kernel@vger.kernel.org; Cousson, Benoit;
> g...@slimlogic.co.uk
> Subject: Re: [PATCH 1/4] documentation: add palmas dts definition
> 
> On 02/28/2013 05:09 AM, J, KEERTHY wrote:
> > Stephen Warren wrote at Thursday, February 28, 2013 12:03 AM:
> >> On 02/17/2013 10:11 PM, J Keerthy wrote:
> >>> Add the DTS definition for the palmas device including the MFD
> children.
> >> ...
> >>> diff --git a/Documentation/devicetree/bindings/mfd/palmas.txt
> 
> >>> +Required properties:
> >>> +- compatible : Must be "ti,palmas";
> >>
> >> Do you need a version number there; will there be Palmas v1 HW, then
> >> later Palmas v2 HW, and so on?
> >
> > AFAIK there is no HW version.
> 
> My point was more: might there be in the future. However, I guess we
> can go with a first compatible value that has no version, and for any
> future device we can add the version to its compatible value then.
> 

I got it. We can add versioning when we have the next versions available.

> >>> +- interrupts : This i2c device has an IRQ line connected to the
> >>> +main SoC
> >>> +- interrupt-controller : Since the palmas support several
> >>> +interrupts internally,
> >>> +  it is considered as an interrupt controller cascaded to the SoC
> one.
> >>> +- #interrupt-cells = <1>;
> >>
> >> Why not 2; can't any IRQ flags be represented in DT? 1 seems
> limiting
> >> here unless the HW truly can't support configuration of IRQ input
> >> polarity of edge-vs-level sensitivity.
> >
> > From the register manual I see that only GPIO has the edge detect
> capability.
> > I agree.
> 
> I'm not sure if you're agreeing that #interrupt-cells should be 2 here
> as I suggested, or with the original code. you say "only GPIO has the
> edge detect capability" which would imply that IRQs don't, which would
> imply no need for a flags cell in DT, so #interrupt-cells=<1> would be
> fine... But then you say "I agree" after I suggested that #interrupt-
> cells=<2> might be better.

Sorry I did not give a detailed explanation earlier. There are 32 sources
of interrupts from Palmas but only one physical line coming out. Out of the
32, 8 are from the GPIO module of Palmas. The GPIO module interrupts
are edge based. Hence I completely agree with the point of using
#interrupt-cells=<2>.
 
> 
> BTW, your mailer completely mangled the line-wrapping of my email,
> making the parts you quoted rather harder to read.
> 

Sorry about that but not quite sure what happened there!

> >>> +Optional node:
> >>> +- Child nodes contain in the palmas. The palmas family is made of
> >>> +several
> >>> +  variants that support a different number of features.
> >>> +  The child nodes will thus depend of the capability of the
> variant.
> >>
> >> Are there DT bindings for those child nodes anywhere?
> >>
> >> Representing each internal component as a separate DT node feels a
> >> little like designing the DT bindings to model the Linux-internal
> MFD
> >> structure. DT bindings should be driven by the HW design and OS-
> >> agnostic.
> >>
> >> From a DT perspective, is there any need at all to create a separate
> >> DT node for each component? This would only be needed or useful if
> >> the child IP blocks (and hence DT bindings for those blocks) could
> be
> >> re- used in other top-level devices that aren't represented by this
> >> top- level ti,palmas DT binding. Are the HW IP blocks here re-used
> >> anywhere, or will they be?
> >
> > I guess for now I will drop this patch and will be taken up once we
> > Finalize on the design.
> 
> The DT binding has to be fully defined before the code, or how do you
> know what binding you're writing the code for? Dropping this patch and
> then moving forward with posting it (which is what your statement
> implies) doesn't seem correct.

Since Graeme is planning to redesign this a bit, I am dropping this patch,
and he plans to send an updated documentation patch. The original intent of
this series was to add documentation, since the code was defined before the
documentation. To be clear, I will not post any further patches until the
Documentation/DT binding is completely defined.

Regards,
Keerthy 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 4/5 V2] usb: call pm_runtime_put_sync in pm_runtime_get_sync failed case

2013-02-28 Thread Liu, Chuansheng



> -Original Message-
> From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> Sent: Friday, March 01, 2013 10:22 AM
> To: Liu, Chuansheng
> Cc: Li, Fei; gre...@linuxfoundation.org; Lan, Tianyu;
> st...@rowland.harvard.edu; sarah.a.sh...@linux.intel.com;
> linux-...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 4/5 V2] usb: call pm_runtime_put_sync in
> pm_runtime_get_sync failed case
> 
> On Friday, March 01, 2013 02:07:54 AM Liu, Chuansheng wrote:
> >
> > > -Original Message-
> > > From: Li, Fei
> > > Sent: Thursday, February 28, 2013 5:06 PM
> > > To: gre...@linuxfoundation.org; Lan, Tianyu; st...@rowland.harvard.edu;
> > > sarah.a.sh...@linux.intel.com
> > > Cc: r...@sisk.pl; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> Liu,
> > > Chuansheng; Li, Fei
> > > Subject: [PATCH 4/5 V2] usb: call pm_runtime_put_sync in
> > > pm_runtime_get_sync failed case
> > >
> > >
> > > Even in failed case of pm_runtime_get_sync, the usage_count
> > > is incremented. In order to keep the usage_count with correct
> > > value and runtime power management to behave correctly, call
> > > pm_runtime_put(_sync) in such case.
> > >
> > > Signed-off-by Liu Chuansheng 
> > > Signed-off-by: Li Fei 
> > > ---
> > >  drivers/usb/core/hub.c |3 ++-
> > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > > index 5480352..f72dede 100644
> > > --- a/drivers/usb/core/hub.c
> > > +++ b/drivers/usb/core/hub.c
> > > @@ -3148,12 +3148,13 @@ int usb_port_resume(struct usb_device
> *udev,
> > > pm_message_t msg)
> > >
> > >   if (port_dev->did_runtime_put) {
> > >   status = pm_runtime_get_sync(_dev->dev);
> > > - port_dev->did_runtime_put = false;
> > >   if (status < 0) {
> > >   dev_dbg(>dev, "can't resume usb port,
> status %d\n",
> > >   status);
> > > + pm_runtime_put_sync(_dev->dev);
> > Rechecked the usb similar codes, in usb_autoresume_device() and
> usb_autopm_get_interface(),
> > when pm_runtime_get_sync() failed, the paired pm_runtime_put_sync() will
> be called.
> > Alan and Rafael, is it reasonable to consider this cleanup patch also? 
> > Thanks.
> 
> You can very well use pm_runtime_put_noidle() here too.  Then, it will
> be kind of clear what it's for.
Thanks. Your advice expresses exactly what we want to do. I will update the patch soon.

> 
> >
> > >   return status;
> > >   }
> > > + port_dev->did_runtime_put = false;
> > >   }
> > >
> > >   /* Skip the initial Clear-Suspend step for a remote wakeup */
> > > --
> > > 1.7.4.1
> 
> Thanks,
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [RFC PATCH] sched: wakeup buddy

2013-02-28 Thread Michael Wang

Hi, Namhyung

Thanks for your reply.

On 02/28/2013 05:25 PM, Namhyung Kim wrote:
[snip]
>> Thus, if B is also the wakeup buddy of A, which means no other task has
>> destroyed their relationship, then A is likely to benefit from the cached
>> data of B, make them running closely is likely to gain benefit.
> 
> Not sure if it should require bidirectional relationship.  Looks like
> just for benchmarks.  Isn't there a one-way relationship that could get
> a benefit from this?  I don't know ;-)

That's one point :)

Actually I tried the one-way case at the very beginning, and the
performance was not good.

I think the cause is that if A loses interest in B and starts working
with C, then keeping A and B close won't gain much benefit, since the
cached data of A is now likely to benefit C, not B.
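
To make the two-way requirement concrete, here is a small user-space sketch in C. It is only a toy model: `struct toy_task`, `toy_record_wakeup()` and `buddy_ref_threshold` are names made up for illustration, and the bookkeeping in `toy_record_wakeup()` is an assumption, since the real wakeup_ref() is snipped from the quoted patch. Only `toy_wakeup_buddy()` follows the checks of wakeup_buddy() as quoted in this thread.

```c
#include <stddef.h>

/* Assumed threshold, mirroring the default of sysctl_sched_wakeup_buddy_ref. */
static unsigned int buddy_ref_threshold = 8;

/* Toy stand-in for task_struct: only the four fields the patch checks. */
struct toy_task {
	struct toy_task *waker;   /* last task that woke this one */
	struct toy_task *wakee;   /* last task this one woke */
	unsigned int waker_ref;   /* repeat wakeups by the same waker */
	unsigned int wakee_ref;   /* repeat wakeups of the same wakee */
};

/*
 * Hypothetical bookkeeping (not taken from the patch): a change of
 * partner resets the counter, a repeat wakeup increments it.
 */
static void toy_record_wakeup(struct toy_task *waker, struct toy_task *wakee)
{
	if (wakee->waker != waker) {
		wakee->waker = waker;
		wakee->waker_ref = 0;
	} else {
		wakee->waker_ref++;
	}

	if (waker->wakee != wakee) {
		waker->wakee = wakee;
		waker->wakee_ref = 0;
	} else {
		waker->wakee_ref++;
	}
}

/* Same three checks as wakeup_buddy() in the quoted patch. */
static int toy_wakeup_buddy(const struct toy_task *p1,
			    const struct toy_task *p2)
{
	if (p1->waker != p2 || p1->wakee != p2)
		return 0;
	if (p1->waker_ref < buddy_ref_threshold)
		return 0;
	if (p1->wakee_ref < buddy_ref_threshold)
		return 0;
	return 1;
}
```

In this model, once A wakes C a single time, toy_wakeup_buddy(A, B) returns 0 even though B still sees A as its buddy, which is the "A lost interest in B" case described above.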

> 
> Few nitpicks below..
> 
>>
>> This patch add the feature wakeup buddy, reorganized the logical of
>> wake_affine() stuff with the new feature, by doing these, pgbench and
>> 'perf bench sched pipe' perform better.
>>
>> Highlight:
>>  Default value of sysctl_sched_wakeup_buddy_ref is 8 temporarily,
>>  please let me know if some number perform better on your system,
>>  I'd like to make it bigger to make the decision more carefully,
>>  so we could provide the solution when it is really needed.
>>
>>  Comments are very welcomed.
>>
>> Test:
>>  Test with a 12 cpu X86 server and tip 3.8.0-rc7.
>>
>>  'perf bench sched pipe' show nearly double improvement.
>>
>>  pgbench result:
>>                       prev        post
>>
>> | db_size | clients |  tps  |   |  tps  |
>> +---------+---------+-------+   +-------+
>> | 22 MB   |       1 | 10794 |   | 10820 |
>> | 22 MB   |       2 | 21567 |   | 21915 |
>> | 22 MB   |       4 | 41621 |   | 42766 |
>> | 22 MB   |       8 | 53883 |   | 60511 |   +12.30%
>> | 22 MB   |      12 | 50818 |   | 57129 |   +12.42%
>> | 22 MB   |      16 | 50463 |   | 59345 |   +17.60%
>> | 22 MB   |      24 | 46698 |   | 63787 |   +36.59%
>> | 22 MB   |      32 | 43404 |   | 62643 |   +44.33%
>>
>> | 7484 MB |       1 |  7974 |   |  8014 |
>> | 7484 MB |       2 | 19341 |   | 19534 |
>> | 7484 MB |       4 | 36808 |   | 38092 |
>> | 7484 MB |       8 | 47821 |   | 51968 |   +8.67%
>> | 7484 MB |      12 | 45913 |   | 52284 |   +13.88%
>> | 7484 MB |      16 | 46478 |   | 54418 |   +17.08%
>> | 7484 MB |      24 | 42793 |   | 56375 |   +31.74%
>> | 7484 MB |      32 | 36329 |   | 55783 |   +53.55%
>>
>> | 15 GB   |       1 |  7636 |   |  7880 |
>> | 15 GB   |       2 | 19195 |   | 19477 |
>> | 15 GB   |       4 | 35975 |   | 37962 |
>> | 15 GB   |       8 | 47919 |   | 51558 |   +7.59%
>> | 15 GB   |      12 | 45397 |   | 51163 |   +12.70%
>> | 15 GB   |      16 | 45926 |   | 53912 |   +17.39%
>> | 15 GB   |      24 | 42184 |   | 55343 |   +31.19%
>> | 15 GB   |      32 | 35983 |   | 55358 |   +53.84%
>>
>> Signed-off-by: Michael Wang 
>> ---
> [SNIP]
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 81fa536..d5acfd8 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3173,6 +3173,75 @@ static int wake_affine(struct sched_domain *sd, 
>> struct task_struct *p, int sync)
>>  }
>>  
>>  /*
>> + * Reduce sysctl_sched_wakeup_buddy_ref will reduce the preparation time
>> + * to active the wakeup buddy feature, and make it agile, however, this
>> + * will increase the risk of misidentify.
>> + *
>> + * Check wakeup_buddy() for the usage.
>> + */
>> +unsigned int sysctl_sched_wakeup_buddy_ref = 8UL;
> 
> It seems that just 8U (or even 8) is enough.

I will correct it.

> 
>> +
>> +/*
>> + * wakeup_buddy() help to check whether p1 is the wakeup buddy of p2.
>> + *
>> + * Return 1 for yes, 0 for no.
>> +*/
>> +static inline int wakeup_buddy(struct task_struct *p1, struct task_struct 
>> *p2)
>> +{
>> +if (p1->waker != p2 || p1->wakee != p2)
>> +return 0;
>> +
>> +if (p1->waker_ref < sysctl_sched_wakeup_buddy_ref)
>> +return 0;
>> +
>> +if (p1->wakee_ref < sysctl_sched_wakeup_buddy_ref)
>> +return 0;
>> +
>> +return 1;
>> +}
> [SNIP]
>> @@ -3399,6 +3490,8 @@ select_task_rq_fair(struct task_struct *p, int 
>> sd_flag, int wake_flags)
>>  unlock:
>>  rcu_read_unlock();
>>  
>> +wakeup_ref(p);
>> +
> 
> Why did you call it here?  Shouldn't it be on somewhere in the ttwu?

I like to keep the changes close together, just another 'bad' habit ;-)

But you notified me that I should add a check

Re: [RFC PATCH] sched: wakeup buddy

2013-02-28 Thread Michael Wang

On 02/28/2013 05:18 PM, Mike Galbraith wrote:
> On Thu, 2013-02-28 at 16:49 +0800, Michael Wang wrote: 
>> On 02/28/2013 04:24 PM, Mike Galbraith wrote:
>>> On Thu, 2013-02-28 at 16:14 +0800, Michael Wang wrote: 
>>>> On 02/28/2013 04:04 PM, Mike Galbraith wrote:
>>>
>>>>> It would be nice if it _were_ a promise, but it is not, it's a hint.
>>>>
>>>> Bad to know :(
>>>>
>>>> Should we fix it or this is by designed? The comments after WF_SYNC
>>>> cheated me...
>>>
>>> You can't fix it, because it's not busted.  You can say "Ok guys, I'm
>>> off for a nap RSN" all you want, but that won't guarantee that nobody
>>> pokes you, and hands you something more useful to do than snoozing.
>>
>> So sync still means current is going to sleep, what you concerned is
>> this promise will be easily broken by other waker, correct?
> 
> That makes it a lie, and it can already have been one with no help.
> Just because you wake one sync does not mean you're not going to find
> another to wake.  Smart tasks are taught to look before they leap.
> 
>> Hmm.. may be you are right, if 'perf bench sched pipe' is not the one we
>> should care, I have no reason to add this logical currently.
> 
> Well, there is reason to identify task relationships methinks, you just
> can't rely on the fact that you're alone on the rq at the moment, and
> doing a sync wakeup to bind tasks.  They _will_ lie to you :)

I see.

> 
>> I will remove this plus branch, unless I found other benchmark could
>> benefit a lot from it.
>>
>> Besides this, how do you think about this idea?
> 
> I like the idea of filtering true buddy pairs, and automagically
> detecting the point when 1:N wants spreading rather a lot (fwtw).  I'll
> look closer at your method, but when it comes to implementation
> opinions, the only one I trust comes out of a box in front of me.

And please let me know how it works on your box ;-)

Regards,
Michael Wang

> 
> I'm somewhat.. "taste challenged", Peter and Ingo have some though :)
> 
> -Mike
> 
> 


Re: IMA: How to manage user space signing policy with others

2013-02-28 Thread Mimi Zohar

On Thu, 2013-02-28 at 16:35 -0500, Vivek Goyal wrote:
> On Thu, Feb 28, 2013 at 02:23:39PM -0500, Mimi Zohar wrote:
> 
> [..]
> > I would suggest that the ima_appraise_tcb, which is more restrictive, be
> > permitted to replace the secureboot policy.
> 
> Also ima_appraise_tcb is not necessarily more restrictive. It takes
> appraises only for root user. Files for rest of users are not appraised.

Ok, good point.  

> In general case of "memory locked execution of signed binary" I was
> hoping to give user a flexibility to do appraisal either for root
> or both root and non-root user.
> 
> For the time being I can hardcode things only for root user but the
> moment somebody will extend functionality for non-root user, again
> we will run into the issue that ima_appraise_tcb is not superset so
> we can't allow that.

So we can agree that the 'ima_appraise_tcb' policy is more restrictive
for root owned files.  So as long as the 'ima_appraise_tcb' policy
precedes the secureboot integrity policy, we should be good.

thanks,

Mimi


Re: [PATCH 4/5 V2] usb: call pm_runtime_put_sync in pm_runtime_get_sync failed case

2013-02-28 Thread Rafael J. Wysocki

On Friday, March 01, 2013 02:07:54 AM Liu, Chuansheng wrote:
> 
> > -Original Message-
> > From: Li, Fei
> > Sent: Thursday, February 28, 2013 5:06 PM
> > To: gre...@linuxfoundation.org; Lan, Tianyu; st...@rowland.harvard.edu;
> > sarah.a.sh...@linux.intel.com
> > Cc: r...@sisk.pl; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> > Liu,
> > Chuansheng; Li, Fei
> > Subject: [PATCH 4/5 V2] usb: call pm_runtime_put_sync in
> > pm_runtime_get_sync failed case
> > 
> > 
> > Even in failed case of pm_runtime_get_sync, the usage_count
> > is incremented. In order to keep the usage_count with correct
> > value and runtime power management to behave correctly, call
> > pm_runtime_put(_sync) in such case.
> > 
> > Signed-off-by Liu Chuansheng 
> > Signed-off-by: Li Fei 
> > ---
> >  drivers/usb/core/hub.c |3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > index 5480352..f72dede 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -3148,12 +3148,13 @@ int usb_port_resume(struct usb_device *udev,
> > pm_message_t msg)
> > 
> > if (port_dev->did_runtime_put) {
> > status = pm_runtime_get_sync(&port_dev->dev);
> > -   port_dev->did_runtime_put = false;
> > if (status < 0) {
> > dev_dbg(&udev->dev, "can't resume usb port, status %d\n",
> > status);
> > +   pm_runtime_put_sync(&port_dev->dev);
> Rechecked the usb similar codes, in usb_autoresume_device() and 
> usb_autopm_get_interface(),
> when pm_runtime_get_sync() failed, the paired pm_runtime_put_sync() will be 
> called.
> Alan and Rafael, is it reasonable to consider this cleanup patch also? Thanks.

You can very well use pm_runtime_put_noidle() here too.  Then, it will
be kind of clear what it's for.
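
For illustration only, the error path with that suggestion applied could look roughly like the user-space sketch below. This is a toy model, not kernel code: `struct toy_dev`, `struct toy_port`, `toy_get_sync()`, `toy_put_noidle()` and `toy_port_resume()` are hypothetical stand-ins, and the `resume_fails` flag is invented to force the failure case. The one property carried over from the real API is that the usage count is incremented even when the resume fails, so the error path has to drop it again.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the runtime PM part of struct device. */
struct toy_dev {
	int usage_count;
	bool resume_fails;      /* invented knob to force a failure */
};

/* Toy stand-in for the USB port with its did_runtime_put flag. */
struct toy_port {
	struct toy_dev dev;
	bool did_runtime_put;
};

/* Models pm_runtime_get_sync(): the count is bumped even on failure. */
static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

/* Models pm_runtime_put_noidle(): drop the count, trigger nothing else. */
static void toy_put_noidle(struct toy_dev *dev)
{
	if (dev->usage_count > 0)
		dev->usage_count--;
}

/* Same flow as the patched hunk, with put_noidle in the error path. */
static int toy_port_resume(struct toy_port *port)
{
	int status;

	if (port->did_runtime_put) {
		status = toy_get_sync(&port->dev);
		if (status < 0) {
			/* Balance the get; the flag stays set for a retry. */
			toy_put_noidle(&port->dev);
			return status;
		}
		port->did_runtime_put = false;
	}
	return 0;
}
```

The put in the error path exists only to undo the reference taken by the failed get, which is why a variant that does not try to idle or suspend the device states the intent more clearly.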

> 
> > return status;
> > }
> > +   port_dev->did_runtime_put = false;
> > }
> > 
> > /* Skip the initial Clear-Suspend step for a remote wakeup */
> > --
> > 1.7.4.1

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [PATCH 4/5 V2] usb: call pm_runtime_put_sync in pm_runtime_get_sync failed case

2013-02-28 Thread Rafael J. Wysocki

On Friday, March 01, 2013 12:59:23 AM Liu, Chuansheng wrote:
> 
> > -Original Message-
> > From: Rafael J. Wysocki [mailto:r...@sisk.pl]
> > Sent: Friday, March 01, 2013 8:51 AM
> > To: Liu, Chuansheng
> > Cc: Alan Stern; Li, Fei; gre...@linuxfoundation.org; Lan, Tianyu;
> > sarah.a.sh...@linux.intel.com; linux-...@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH 4/5 V2] usb: call pm_runtime_put_sync in
> > pm_runtime_get_sync failed case
> > 
> > On Friday, March 01, 2013 12:38:07 AM Liu, Chuansheng wrote:
> > >
> > > > -Original Message-
> > > > From: Alan Stern [mailto:st...@rowland.harvard.edu]
> > > > Sent: Thursday, February 28, 2013 11:17 PM
> > > > To: Li, Fei
> > > > Cc: gre...@linuxfoundation.org; Lan, Tianyu;
> > sarah.a.sh...@linux.intel.com;
> > > > r...@sisk.pl; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> > > > Liu,
> > > > Chuansheng
> > > > Subject: Re: [PATCH 4/5 V2] usb: call pm_runtime_put_sync in
> > > > pm_runtime_get_sync failed case
> > > >
> > > > On Thu, 28 Feb 2013, Li Fei wrote:
> > > >
> > > > >
> > > > > Even in failed case of pm_runtime_get_sync, the usage_count
> > > > > is incremented. In order to keep the usage_count with correct
> > > > > value and runtime power management to behave correctly, call
> > > > > pm_runtime_put(_sync) in such case.
> > > > >
> > > > > Signed-off-by Liu Chuansheng 
> > > > > Signed-off-by: Li Fei 
> > > > > ---
> > > > >  drivers/usb/core/hub.c |3 ++-
> > > > >  1 files changed, 2 insertions(+), 1 deletions(-)
> > > > >
> > > > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > > > > index 5480352..f72dede 100644
> > > > > --- a/drivers/usb/core/hub.c
> > > > > +++ b/drivers/usb/core/hub.c
> > > > > @@ -3148,12 +3148,13 @@ int usb_port_resume(struct usb_device
> > *udev,
> > > > pm_message_t msg)
> > > > >
> > > > >   if (port_dev->did_runtime_put) {
> > > > >   status = pm_runtime_get_sync(_dev->dev);
> > > > > - port_dev->did_runtime_put = false;
> > > > >   if (status < 0) {
> > > > >   dev_dbg(>dev, "can't resume usb port,
> > status %d\n",
> > > > >   status);
> > > > > + pm_runtime_put_sync(_dev->dev);
> > > > >   return status;
> > > > >   }
> > > > > + port_dev->did_runtime_put = false;
> > > > >   }
> > > >
> > > > I don't see much point in this.  After a failed resume, the port's
> > > > runtime PM status is undefined.  Whether or not you do a
> > > > pm_runtime_put_sync won't make any difference.
> > > In case of failed resume, calling pm_runtime_put_sync() is just for 
> > > decrease
> > the dev->power.usage_count,
> > > because pm_runtime_get_sync() always increase the
> > dev->power.usage_count even failed.
> > >
> > > If not pairing runtime_get/put, after that case, the device can not enter
> > runtime suspend any more due to dev->power.usage_count > 0 always.
> > > Is it making sense?
> > 
> > Well, not really.
> > 
> > Before returning an error code, rpm_callback() assigns that code to
> > dev->power.runtime_error and that will effectively disable runtime PM for 
> > dev
> > going forward anyway.
> Thanks your pointing out.
> dev->power.runtime_error!=0 will really block the runtime PM resume/suspend 
> to continue.
> 
> But in case of rpm_resume return error when dev->power.disable_depth > 0, the 
> dev->power.runtime_error
> is not set yet. Is it the case?

Yes, it is.

> And another case is when user called pm_runtime_set_status to clear the 
> runtime_error after dev->power.runtime_error
> is set during pm_runtime_get_sync(), the runtime_resume/suspend() can be 
> tried again? But the dev->power.usage_count is still wrong?

If you clear runtime_error using pm_runtime_set_status(), you can correct the
reference counter as well.

But I agree that with runtime PM disabled it is actually useful to keep the
reference counter balanced appropriately so that you don't need to special
case that.  All depends on how runtime PM is used in the given piece of code.
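
The two cases discussed above can be sketched with a small user-space model in C. Everything here is a simplified assumption for illustration: `struct model_pm` and the `model_*` functions are hypothetical names, the real rpm_resume() has more checks than the two early exits shown, and the error codes are only meant to resemble what the core returns.

```c
#include <errno.h>

/* Toy stand-in for the dev->power fields mentioned in the thread. */
struct model_pm {
	int usage_count;
	int disable_depth;
	int runtime_error;
	int callback_result;    /* invented: what the resume callback returns */
};

/* Simplified early exits of a resume attempt. */
static int model_rpm_resume(struct model_pm *pm)
{
	if (pm->runtime_error)
		return -EINVAL;
	if (pm->disable_depth > 0)
		return -EACCES;     /* runtime_error is left untouched */
	if (pm->callback_result < 0) {
		/* A failing callback is recorded and blocks later attempts. */
		pm->runtime_error = pm->callback_result;
		return pm->callback_result;
	}
	return 0;
}

/* The count is incremented before the resume attempt, pass or fail. */
static int model_get_sync(struct model_pm *pm)
{
	pm->usage_count++;
	return model_rpm_resume(pm);
}

static void model_put_noidle(struct model_pm *pm)
{
	if (pm->usage_count > 0)
		pm->usage_count--;
}

/* Suspend is possible only with no error, not disabled, and no users. */
static int model_can_suspend(const struct model_pm *pm)
{
	return !pm->runtime_error && pm->disable_depth == 0 &&
	       pm->usage_count == 0;
}
```

In this model a get that fails while runtime PM is disabled leaves no runtime_error behind, so after re-enabling, the leftover reference is the only thing that keeps model_can_suspend() at 0. When the callback itself fails, runtime_error blocks suspend whether or not the count was balanced.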

That's why I didn't comment your other patches.  That said, I didn't look at
them in detail either.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
