Re: [PATCHv3 2/4] Input: keypad: Add smsc ece1099 keypad driver

2012-10-02 Thread Poddar, Sourav
Hi Dmitry,

On Tue, Oct 2, 2012 at 11:48 AM, Dmitry Torokhov
 wrote:
> Hi Sourav,
>
> On Mon, Oct 01, 2012 at 04:31:50PM +0530, Sourav Poddar wrote:
>> From: G, Manjunath Kondaiah 
>>
>> SMSC ECE1099 is a keyboard scan or GPIO expansion device. The device
>> supports a keypad scan matrix of 23*8. This driver uses the
>> device as a keypad controller.
>>
>> Tested on omap5430 evm with 3.6-rc6 custom kernel.
>>
>> Cc: Dmitry Torokhov 
>> Cc: Benoit Cousson 
>> Cc: Felipe Balbi 
>> Cc: Santosh Shilimkar 
>> Signed-off-by: G, Manjunath Kondaiah 
>> Signed-off-by: Sourav Poddar 
>> Acked-by: Felipe Balbi 
>> ---
>> Changes since v2:
>>  - Replace magic numbers through driver variable.
>>  - Provide comments for some initialisation done in probe.
>>  drivers/input/keyboard/Kconfig               |   11 +
>>  drivers/input/keyboard/Makefile              |    1 +
>>  drivers/input/keyboard/smsc-ece1099-keypad.c |  304 ++
>>  3 files changed, 316 insertions(+), 0 deletions(-)
>>  create mode 100644 drivers/input/keyboard/smsc-ece1099-keypad.c
>>
>> diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig
>> index c50fa75..b03a39c 100644
>> --- a/drivers/input/keyboard/Kconfig
>> +++ b/drivers/input/keyboard/Kconfig
>> @@ -593,6 +593,17 @@ config KEYBOARD_TWL4030
>> To compile this driver as a module, choose M here: the
>> module will be called twl4030_keypad.
>>
>> +config KEYBOARD_SMSC
>> +   tristate "SMSC ECE1099 keypad support"
>
> Should also select INPUT_MATRIXKMAP.
>
Yes, will include in the next version.
>> +   depends on I2C
>> +   help
>> + Say Y here if your board uses the SMSC keypad controller
>> + for omap5 defconfig. It's safe to say Y here
>> + even on boards that don't use the keypad controller.
>> +
>> + To compile this driver as a module, choose M here: the
>> + module will be called smsc-ece1099-keypad.
>> +
>>  config KEYBOARD_XTKBD
>>   tristate "XT keyboard"
>>   select SERIO
>> diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile
>> index 44e7600..0f2aa26 100644
>> --- a/drivers/input/keyboard/Makefile
>> +++ b/drivers/input/keyboard/Makefile
>> @@ -52,5 +52,6 @@ obj-$(CONFIG_KEYBOARD_TC3589X)  += tc3589x-keypad.o
>>  obj-$(CONFIG_KEYBOARD_TEGRA) += tegra-kbc.o
>>  obj-$(CONFIG_KEYBOARD_TNETV107X) += tnetv107x-keypad.o
>>  obj-$(CONFIG_KEYBOARD_TWL4030)   += twl4030_keypad.o
>> +obj-$(CONFIG_KEYBOARD_SMSC)      += smsc-ece1099-keypad.o
>>  obj-$(CONFIG_KEYBOARD_XTKBD) += xtkbd.o
>>  obj-$(CONFIG_KEYBOARD_W90P910)   += w90p910_keypad.o
>> diff --git a/drivers/input/keyboard/smsc-ece1099-keypad.c b/drivers/input/keyboard/smsc-ece1099-keypad.c
>> new file mode 100644
>> index 000..15dc147
>> --- /dev/null
>> +++ b/drivers/input/keyboard/smsc-ece1099-keypad.c
>> @@ -0,0 +1,304 @@
>> +/*
>> + * SMSC_ECE1099 Keypad driver
>> + *
>> + * Copyright (C) 2012 Texas Instruments Incorporated - http://www.ti.com/
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define KEYPRESS_TIME  200
>> +
>> +struct smsc_keypad {
>> + struct smsc *smsc;
>> + struct matrix_keymap_data *keymap_data;
>> + unsigned int last_key_state[16];
>> + unsigned int last_col;
>> + unsigned int last_key_ms[16];
>> + unsigned short *keymap;
>> + struct i2c_client *client;
>> + struct input_dev *input;
>> + int rows, cols;
>> + int row_shift;
>> + bool no_autorepeat;
>> + unsigned irq;
>> + struct device *dev;
>> +};
>> +
>> +static void smsc_kp_scan(struct smsc_keypad *kp)
>> +{
>> + struct input_dev *input = kp->input;
>> + int i, j;
>> + int row, col;
>> + int temp, code;
>> + unsigned int new_state[16];
>> + unsigned int bits_changed;
>> + int this_ms;
>> +
>> + smsc_write(kp->dev, SMSC_KP_INT_MASK, 0x00);
>> + smsc_write(kp->dev, SMSC_KP_INT_STAT, 0xFF);
>> +
>> + /* Scan for row and column */
>> + for (i = 0; i < kp->cols; i++) {
>> + smsc_write(kp->dev, SMSC_KP_OUT, SMSC_KSO_EVAL + i);
>> + /* Read Row Status */
>> + smsc_read(kp->dev, SMSC_KP_IN, &temp);
>> + if (temp == 0xFF)
>> + continue;
>> +
>> + col = i;
>> + for (j = 0; j < kp->rows; j++) {
>> + if ((temp & 0x01) != 0x00) {
>> + temp = temp >> 1;
>> + continue;
>> + }
>> +
>> + 

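The scan loop in smsc_kp_scan() treats the row status byte read from SMSC_KP_IN as active-low: 0xFF means no key is down in the driven column, and a cleared bit j marks row j as pressed. The standalone C sketch below models only that decoding, plus a changed-bit comparison against the previous scan (suggested by last_key_state[] and bits_changed, but that part of the patch is cut off, so it is an assumption). pressed_rows(), changed_rows() and scan_code() are hypothetical helpers that do not touch any SMSC registers; scan_code() just reproduces the arithmetic of the kernel's MATRIX_SCAN_CODE() macro.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel's MATRIX_SCAN_CODE(row, col, row_shift). */
static unsigned int scan_code(unsigned int row, unsigned int col,
			      unsigned int row_shift)
{
	return (row << row_shift) + col;
}

/*
 * Decode an active-low row status byte: 0xFF means no key is down in the
 * driven column, and a cleared bit j means row j is pressed.  Returns a
 * mask with bit j set for each pressed row.
 */
static unsigned int pressed_rows(uint8_t status, int rows)
{
	unsigned int mask;

	assert(rows > 0 && rows <= 8);	/* one 8-bit row status register */
	mask = (rows == 8) ? 0xFFu : ((1u << rows) - 1);
	return ~(unsigned int)status & mask;
}

/* Rows whose state differs from the previous scan of the same column. */
static unsigned int changed_rows(unsigned int new_state,
				 unsigned int last_state)
{
	return new_state ^ last_state;
}
```

With 23 columns, a row_shift of 5 keeps (row << row_shift) + col collision-free, so row 2, column 5 maps to scan code 69.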
[PATCH] module: Fix kallsyms to show the last symbol properly

2012-10-02 Thread masaki . kimura . kz
This patch fixes a bug where the last symbol in the .symtab section of
kernel modules is not displayed in /proc/kallsyms. This happens
because the first symbol is processed twice, once before the loop and
once inside it, since "src" is not incremented in between.
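The loop shape being fixed can be reproduced with a plain array. In this minimal C model (walk_symbols() is hypothetical, not kernel code), entry 0 plays the role of the null symbol that is copied before the loop; without the extra src++, the loop body sees entry 0 a second time and stops one entry short, so the last symbol is never visited.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the loops in layout_symtab()/add_kallsyms():
 * entry 0 is handled before the loop, then i counts from 1.
 * 'fixed' selects whether src is advanced past entry 0 first (the patch).
 * Reports the first and last entries the loop body actually sees.
 */
static void walk_symbols(const int *src, size_t nsrc, int fixed,
			 int *first, int *last)
{
	size_t i;

	assert(src != NULL && nsrc > 0);
	*first = *last = -1;
	if (fixed)
		src++;		/* entry 0 was already copied before the loop */
	for (i = 1; i < nsrc; ++i, ++src) {
		if (*first == -1)
			*first = *src;
		*last = *src;
	}
}
```

For symbols {100, 101, 102, 103}, the unfixed walk sees 100 through 102, while the fixed walk sees 101 through 103.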

This bug has existed since the following commit was introduced:
   module: reduce symbol table for loaded modules (v2)
   commit: 4a4962263f07d14660849ec134ee42b63e95ea9a

This patch was tested on a 3.6.0-rc6 kernel with a simple test module, using
the steps below, to check that all the core symbols appear in /proc/kallsyms.

[Test steps]
1. Compile the test module, as shown below.
   (My compiler tends to put a function named with 18 characters,
   like zz, at the end of the .symtab section.
   I don't know why, though.)

# cat tp.c
#include 
#include 

void zz(void) {}

static int init_tp(void)
{
	return 0;
}

static void exit_tp(void) {}

module_init(init_tp);
module_exit(exit_tp);
MODULE_LICENSE("GPL");

# cat Makefile
KERNEL_RELEASE=$(shell uname -r)
BUILDDIR := /lib/modules/$(KERNEL_RELEASE)/source

obj-m := tp.o

all:
	$(MAKE) -C $(BUILDDIR) M=$(PWD) V=1 modules
clean:
	$(MAKE) -C $(BUILDDIR) M=$(PWD) V=1 clean

# make

2. Check that the target symbol, zz in this case,
   is the last entry.

# readelf -s tp.ko | tail
18: 002011 FUNCLOCAL  DEFAULT2 exit_tp
19: 12 OBJECT  LOCAL  DEFAULT4 __mod_license15
20:  0 FILELOCAL  DEFAULT  ABS tp.mod.c
21: 000c 9 OBJECT  LOCAL  DEFAULT4 __module_depends
22: 001545 OBJECT  LOCAL  DEFAULT4 __mod_vermagic5
23:    600 OBJECT  GLOBAL DEFAULT8 __this_module
24: 002011 FUNCGLOBAL DEFAULT2 cleanup_module
25: 001013 FUNCGLOBAL DEFAULT2 init_module
26:  0 NOTYPE  GLOBAL DEFAULT  UND mcount
27: 11 FUNCGLOBAL DEFAULT2 zz

3. Load the module.
# insmod tp.ko

4. Check that all the core symbols are shown in /proc/kallsyms properly.

[Before my patch applied]
# grep "\[tp\]" /proc/kallsyms
a0135010 t init_tp  [tp]
a0135020 t exit_tp  [tp]
a0137000 d __this_module    [tp]
a0135020 t cleanup_module   [tp]
a0135010 t init_module  [tp]

(The last entry, or zz, is not shown.)

[After my patch applied]
# grep "\[tp\]" /proc/kallsyms
a0135010 t init_tp  [tp]
a0135020 t exit_tp  [tp]
a0137000 d __this_module    [tp]
a0135020 t cleanup_module   [tp]
a0135010 t init_module  [tp]
a0135000 t zz   [tp]

(The last entry, or zz, is shown properly.)


Signed-off-by: Masaki Kimura 
---
 kernel/module.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index 4edbd9c..d432c21 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2273,7 +2273,9 @@ static void layout_symtab(struct module *mod, struct load_info *info)
src = (void *)info->hdr + symsect->sh_offset;
nsrc = symsect->sh_size / sizeof(*src);
 
-   /* Compute total space required for the core symbols' strtab. */
+   /* Compute total space required for the core symbols' strtab.
+  We start searching core symbols from the second entry. */
+   src++;
for (ndst = i = strtab_size = 1; i < nsrc; ++i, ++src)
if (is_core_symbol(src, info->sechdrs, info->hdr->e_shnum)) {
strtab_size += strlen(&info->strtab[src->st_name]) + 1;
@@ -2314,6 +2316,8 @@ static void add_kallsyms(struct module *mod, const struct load_info *info)
src = mod->symtab;
*dst = *src;
*s++ = 0;
+   /* We start searching core symbols from the second entry. */
+   src++;
for (ndst = i = 1; i < mod->num_symtab; ++i, ++src) {
if (!is_core_symbol(src, info->sechdrs, info->hdr->e_shnum))
continue;
-- 
1.7.10.1



[PATCH v0] Add SHA-3 hash algorithm

2012-10-02 Thread Jeff Garzik

Whee -- SHA-3 is out!   I wanted to explore the new toy a bit, and
so, here is a blatantly untested rough draft of SHA-3 kernel support.

Why rough draft?  Because answers to the questions below will inform a
more polished version.

Code notes and questions:

1) The tcrypt setup is blatantly wrong.  What is the best setup here?  Define a
separate entry for each digest length?  Is there some special string
descriptor format that is desired, like "sha3-256" or "sha3(256)"?

2) Digest and block size are easily variable, as shown below...
do we want to hand-craft individual versions for each -- sha3_256.c,
sha3_512.c, etc.?

3) Is it even feasible for struct shash_alg to have a dynamic (filled in
at context init time, not driver registration time) digestsize and
cra_blocksize?  That would permit a single shash_alg for all sha3.

4) Original implementation from readable_keccak.tgz (link below).  The
official sources have a bazillion different flavors for various
architectures and bit sizes, and the code is not pretty.  I wanted to
start small and readable, and _then_ branch out into x86[-64]-specific
versions, etc. as users and use cases appear.



Commit e52113b7b4ace50ab586b426098c6d69d75c263a
Branch sha3
Repo git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/linux.git

References:
http://keccak.noekeon.org/
http://www.mjos.fi/dist/readable_keccak.tgz
http://www.nist.gov/itl/csd/sha-100212.cfm

Not-signed-off-by: Jeff Garzik 

 crypto/Kconfig        |    6 +
 crypto/Makefile       |    1 +
 crypto/sha3_generic.c |  280 ++
 crypto/tcrypt.c       |   14 ++
 include/crypto/sha3.h |   26 
 5 files changed, 326 insertions(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index a323805..97f5e75 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -457,6 +457,12 @@ config CRYPTO_SHA512
  This code also includes SHA-384, a 384 bit hash with 192 bits
  of security against collision attacks.
 
+config CRYPTO_SHA3
+   tristate "SHA3 digest algorithm"
+   select CRYPTO_HASH
+   help
+ SHA-3 secure hash standard.
+
 config CRYPTO_TGR192
tristate "Tiger digest algorithms"
select CRYPTO_HASH
diff --git a/crypto/Makefile b/crypto/Makefile
index 30f33d6..65150d1 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -45,6 +45,7 @@ obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o
 obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
+obj-$(CONFIG_CRYPTO_SHA3) += sha3_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
 obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
 obj-$(CONFIG_CRYPTO_GF128MUL) += gf128mul.o
diff --git a/crypto/sha3_generic.c b/crypto/sha3_generic.c
new file mode 100644
index 000..9255ea1
--- /dev/null
+++ b/crypto/sha3_generic.c
@@ -0,0 +1,280 @@
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define KECCAK_ROUNDS 24
+
+#define ROTL64(x, y) (((x) << (y)) | ((x) >> (64 - (y))))
+
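+static const u64 keccakf_rndc[24] =
+{
+   0x0000000000000001, 0x0000000000008082, 0x800000000000808a,
+   0x8000000080008000, 0x000000000000808b, 0x0000000080000001,
+   0x8000000080008081, 0x8000000000008009, 0x000000000000008a,
+   0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
+   0x000000008000808b, 0x800000000000008b, 0x8000000000008089,
+   0x8000000000008003, 0x8000000000008002, 0x8000000000000080,
+   0x000000000000800a, 0x800000008000000a, 0x8000000080008081,
+   0x8000000000008080, 0x0000000080000001, 0x8000000080008008
+};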
+
+static const int keccakf_rotc[24] = 
+{
+   1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14, 
+   27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44
+};
+
+static const int keccakf_piln[24] = 
+{
+   10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4, 
+   15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1 
+};
+
+// update the state with given number of rounds
+
+static void keccakf(u64 st[25], int rounds)
+{
+   int i, j, round;
+   u64 t, bc[5];
+
+   for (round = 0; round < rounds; round++) {
+
+   // Theta
+   for (i = 0; i < 5; i++)  
+   bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
+
+   for (i = 0; i < 5; i++) {
+   t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
+   for (j = 0; j < 25; j += 5)
+   st[j + i] ^= t;
+   }
+
+   // Rho Pi
+   t = st[1];
+   for (i = 0; i < 24; i++) {
+   j = keccakf_piln[i];
+   bc[0] = st[j];
+   st[j] = ROTL64(t, keccakf_rotc[i]);
+   t = bc[0];
+   }
+
+   //  Chi
+   for (j = 0; j < 25; j += 5) {
+   for (i = 0; i < 5; i++)
+  

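The quoted keccakf() is cut off in the middle of the Chi step. As a sketch of how the permutation usually completes, here is a self-contained C version in the same readable_keccak style, with the standard Chi and Iota steps filled in and the round constants written out in full, plus a minimal one-shot sponge so the result can be checked against known digests. sha3_oneshot() and xor_byte() are hypothetical helpers, not part of the patch or the kernel crypto API. The pad parameter selects 0x01 (the original Keccak submission) or 0x06 (FIPS 202, standardized in 2015, after this thread). Passing the digest length at call time is trivial in a one-shot helper like this; whether struct shash_alg can do the same is question 3 above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KECCAK_ROUNDS 24
#define ROTL64(x, y) (((x) << (y)) | ((x) >> (64 - (y))))

static const uint64_t keccakf_rndc[24] = {
	0x0000000000000001, 0x0000000000008082, 0x800000000000808a,
	0x8000000080008000, 0x000000000000808b, 0x0000000080000001,
	0x8000000080008081, 0x8000000000008009, 0x000000000000008a,
	0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
	0x000000008000808b, 0x800000000000008b, 0x8000000000008089,
	0x8000000000008003, 0x8000000000008002, 0x8000000000000080,
	0x000000000000800a, 0x800000008000000a, 0x8000000080008081,
	0x8000000000008080, 0x0000000080000001, 0x8000000080008008
};
static const int keccakf_rotc[24] = {
	1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
	27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44
};
static const int keccakf_piln[24] = {
	10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
	15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1
};

/* Keccak-f[1600]: Theta, Rho+Pi, Chi, Iota for each round. */
static void keccakf(uint64_t st[25], int rounds)
{
	int i, j, round;
	uint64_t t, bc[5];

	for (round = 0; round < rounds; round++) {
		/* Theta */
		for (i = 0; i < 5; i++)
			bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
		for (i = 0; i < 5; i++) {
			t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
			for (j = 0; j < 25; j += 5)
				st[j + i] ^= t;
		}
		/* Rho Pi */
		t = st[1];
		for (i = 0; i < 24; i++) {
			j = keccakf_piln[i];
			bc[0] = st[j];
			st[j] = ROTL64(t, keccakf_rotc[i]);
			t = bc[0];
		}
		/* Chi: the step where the quoted listing is cut off */
		for (j = 0; j < 25; j += 5) {
			for (i = 0; i < 5; i++)
				bc[i] = st[j + i];
			for (i = 0; i < 5; i++)
				st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
		}
		/* Iota */
		st[0] ^= keccakf_rndc[round];
	}
}

/* Lanes are little-endian: state byte pos lives in lane pos / 8. */
static void xor_byte(uint64_t st[25], size_t pos, uint8_t b)
{
	st[pos / 8] ^= (uint64_t)b << (8 * (pos % 8));
}

/* One-shot sponge; pad is 0x01 (original Keccak) or 0x06 (FIPS 202). */
static void sha3_oneshot(const uint8_t *in, size_t inlen, uint8_t *md,
			 size_t mdlen, uint8_t pad)
{
	uint64_t st[25] = { 0 };
	size_t rate, i, pos = 0;

	assert(mdlen == 28 || mdlen == 32 || mdlen == 48 || mdlen == 64);
	rate = 200 - 2 * mdlen;		/* bytes absorbed per permutation */
	for (i = 0; i < inlen; i++) {
		xor_byte(st, pos++, in[i]);
		if (pos == rate) {
			keccakf(st, KECCAK_ROUNDS);
			pos = 0;
		}
	}
	xor_byte(st, pos, pad);		/* domain/padding byte */
	xor_byte(st, rate - 1, 0x80);	/* final padding bit */
	keccakf(st, KECCAK_ROUNDS);
	for (i = 0; i < mdlen; i++)	/* mdlen < rate: one squeeze suffices */
		md[i] = (uint8_t)(st[i / 8] >> (8 * (i % 8)));
}
```

With a 32-byte digest of the empty message, pad = 0x01 gives the Keccak-256 value c5d24601...5d85a470, and pad = 0x06 gives the FIPS 202 SHA3-256 value a7ffc6f8...80f8434a.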
Re: [PATCH] Fix devmem_is_allowed for below 1MB accesses for an efi machine

2012-10-02 Thread H. Peter Anvin
On 10/02/2012 10:28 PM, Matthew Garrett wrote:
> On Tue, Oct 02, 2012 at 11:13:17PM -0600, Thavatchai Makphaibulchoke wrote:
> 
>> Sounds like a better solution is to allow accesses to only I/O regions 
>> presented in the EFI memory map for physical addresses below 1 MB.
> 
> That won't work - unfortunately we do still need the low region to be 
> available for X because some platforms expect us to use int10 even on 
> EFI (yes, yes, I know). Do you have a copy of the EFI memory map for a 
> system that's broken with the current code?
> 

I honestly think this calls for a quirk, or more likely, no action at
all ("don't do that, then.")

-hpa


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: use %pK for /proc/vmallocinfo

2012-10-02 Thread David Rientjes
On Tue, 2 Oct 2012, Kees Cook wrote:

> >> In the paranoid case of sysctl kernel.kptr_restrict=2, mask the kernel
> >> virtual addresses in /proc/vmallocinfo too.
> >>
> >> Reported-by: Brad Spengler 
> >> Signed-off-by: Kees Cook 
> >
> > /proc/vmallocinfo is S_IRUSR, not S_IRUGO, so exactly what are you trying
> > to protect?
> 
> Trying to block the root user from seeing virtual memory addresses
> (mode 2 of kptr_restrict).
> 
> Documentation/sysctl/kernel.txt:
> "This toggle indicates whether restrictions are placed on
> exposing kernel addresses via /proc and other interfaces.  When
> kptr_restrict is set to (0), there are no restrictions.  When
> kptr_restrict is set to (1), the default, kernel pointers
> printed using the %pK format specifier will be replaced with 0's
> unless the user has CAP_SYSLOG.  When kptr_restrict is set to
> (2), kernel pointers printed using %pK will be replaced with 0's
> regardless of privileges."
> 
> Even though it's S_IRUSR, it still needs %pK for the paranoid case.
> 

So root does echo 0 > /proc/sys/kernel/kptr_restrict first.  Again: what 
are you trying to protect?


linux-next: fate of the kvmtool tree

2012-10-02 Thread Stephen Rothwell
Hi all,

Well, here we are at another merge window and the kvmtool tree is still
not merged.  So, is it likely that it will be merged in this merge
window?  Or the next?  If not, can I please remove it from linux-next
(and have it removed from the auto-latest branch of the tip tree), as it
just adds weight to the patches and the tree that people have to fetch to
work with.

It has been in linux-next since before v3.2 ...

 224 files changed, 25539 insertions(+), 1 deletion(-)
1266 commits

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




Re: [PATCH] Fix devmem_is_allowed for below 1MB accesses for an efi machine

2012-10-02 Thread Matthew Garrett
On Tue, Oct 02, 2012 at 11:13:17PM -0600, Thavatchai Makphaibulchoke wrote:

> Sounds like a better solution is to allow accesses to only I/O regions 
> presented in the EFI memory map for physical addresses below 1 MB.

That won't work - unfortunately we do still need the low region to be 
available for X because some platforms expect us to use int10 even on 
EFI (yes, yes, I know). Do you have a copy of the EFI memory map for a 
system that's broken with the current code?

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: [PATCH] Fix devmem_is_allowed for below 1MB accesses for an efi machine

2012-10-02 Thread H. Peter Anvin
On 10/02/2012 10:15 PM, Matthew Garrett wrote:
> On Tue, Oct 02, 2012 at 09:44:16PM -0700, H. Peter Anvin wrote:
> 
>> We *always* expose the I/O regions to /dev/mem.  That is what /dev/mem
>> *does*.  The above is an exception (which is really obsolete, too: we
>> should simply disallow access to anything which is treated as system
>> RAM, which doesn't include the BIOS regions in question; the only reason
>> we don't is that some versions of X take a checksum of the RAM in the
>> first megabyte as some kind of idiotic random seed.)
> 
> Oh, right, got you. In that case I think we potentially need a 
> finer-grained check on EFI platforms - the EFI memory map is kind enough 
> to tell us the difference between unusable regions and io regions, and 
> we could avoid access to the unusable ones.
> 

Well, we have the same in BIOS space with "reserved" regions.  The
problem is that they are actually I/O regions as far as programs like X,
dmidecode and so on are concerned.

-hpa




Re: [git patches] libata fixes for 3.7

2012-10-02 Thread Michael Tokarev
On 02.10.2012 23:59, Jeff Garzik wrote:
> On 10/02/2012 03:44 PM, Michael Tokarev wrote:
>> On 02.10.2012 23:40, Jeff Garzik wrote:
>>
>>> Minor libata updates, nothing notable.
>>>
>>> 1) Apply -- and then revert -- the FUA feature.  Caused
>>> disk corruption in linux-next, proving it cannot be turned on by
>>> default.
>>
>> Any details on that?  Disk corruption is rather a nasty
>> side-effect indeed.
> 
> One thread with reports is
> 
> Storage related regression in linux-next 20120824

E.g., https://lkml.org/lkml/2012/8/27/66 (two reports).
Thank you!

/mjt


Re: [PATCH] Fix devmem_is_allowed for below 1MB accesses for an efi machine

2012-10-02 Thread Thavatchai Makphaibulchoke
Thank you both for the comments.

Sounds like a better solution is to allow accesses to only I/O regions 
presented in the EFI memory map for physical addresses below 1 MB.

Do we need to worry about the X checksum in the first MB on an EFI system?

Thanks,
Mak.


On 10/02/2012 11:15 PM, Matthew Garrett wrote:
> On Tue, Oct 02, 2012 at 09:44:16PM -0700, H. Peter Anvin wrote:
> 
>> We *always* expose the I/O regions to /dev/mem.  That is what /dev/mem
>> *does*.  The above is an exception (which is really obsolete, too: we
>> should simply disallow access to anything which is treated as system
>> RAM, which doesn't include the BIOS regions in question; the only reason
>> we don't is that some versions of X take a checksum of the RAM in the
>> first megabyte as some kind of idiotic random seed.)
> 
> Oh, right, got you. In that case I think we potentially need a 
> finer-grained check on EFI platforms - the EFI memory map is kind enough 
> to tell us the difference between unusable regions and io regions, and 
> we could avoid access to the unusable ones.
> 



Re: [PATCH v2] psmouse: mitigate failing-mouse symptoms

2012-10-02 Thread Jim Hill
On 09/30/2012 11:02 AM, Alessandro Rubini wrote:
>> I think this would be less controversial if the run-time default were
>> to disable the feature.
>
> Yes, that's the common sensible path

Fixed, there's no way I can test it well enough for anything more widespread.

> Then, I think it would be good to have a specific sub-structure for
> this stuff.

I don't care much either way, though I think I'm missing the point of subbing
in a memset; the only reason I can think of is efficiency, which doesn't make
sense to me here.  Ask and I'll add one or both.

> I also think it would make it clearer what these are:

I did de-jargonize the names somewhat: "interval_start" instead of "base" makes
it clearer, which, as you say, it could use.

I encountered one other person with this problem; he ran it for a long while
and was happy to have it.  I'm appending the latest version, which is a good
bit improved: it's what I've been running for the last year, amended with the
name and default-filter changes above.

But of course I upgraded a month ago to a box with no PS/2 mouse port, so at
this point all I can do is hope someone finds it helpful.  The reorg'd code 
kinda highlights how incomplete it is, there's lots of mouse models out there.

So if it looks good or almost-good to you and there's anything else I can do,
tell me and I'll be glad to do it.

Thanks,
Jim

From 2681957a610191cb5d7b7f65be11ea2be06df00f Mon Sep 17 00:00:00 2001
From: Jim Hill 
Date: Mon, 28 Mar 2011 13:10:36 -0700
Subject: [PATCH] Input: psmouse - further improve error handling for
 basic protocols

  Keep a failing PS/2 mouse usable until it's convenient to replace it.
  Filter incoming packets: drop invalid ones and attempt to correct for
  dropped bytes.
 
  The new parameter 'filter' makes filtering and logging selectable: leave it
  at 0 to disable all effects, or set it to 3 to filter silently.
--
 drivers/input/mouse/psmouse-base.c | 197 +
 drivers/input/mouse/psmouse.h  |   7 ++
 2 files changed, 204 insertions(+)

diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c
index 22fe254..4a3a95f 100644
--- a/drivers/input/mouse/psmouse-base.c
+++ b/drivers/input/mouse/psmouse-base.c
@@ -72,6 +72,25 @@ static unsigned int psmouse_resync_time;
 module_param_named(resync_time, psmouse_resync_time, uint, 0644);
 MODULE_PARM_DESC(resync_time, "How long can mouse stay idle before forcing resync (in seconds, 0 = never).");

+enum {
+   DROP_BAD_PACKET = 1,
+   ATTEMPT_SYNC = 2,
+   LOG_SUMMARIES = 4,
+   LOG_MALFORMED = 8,
+   LOG_ALL  = 16,
+   REPORT_SYNC_FAILURE = 32,
+
+   DEFAULT_FILTER = 0
+};
+static int psmouse_filter = DEFAULT_FILTER;
+module_param_named(filter, psmouse_filter, int, 0644);
+MODULE_PARM_DESC(filter, "1 = drop invalid or hotio packets"
+   ", +2 = attempt-sync"
+   ", +4 = summary logs"
+   ", +8 = log-malformed"
+   ",+16 = log-all"
+   ",+32 = use hard resets");
+
 PSMOUSE_DEFINE_ATTR(protocol, S_IWUSR | S_IRUGO,
NULL,
psmouse_attr_show_protocol, psmouse_attr_set_protocol);
@@ -123,6 +142,169 @@ struct psmouse_protocol {
 };

 /*
+ * psmouse_filter_*: diagnose bad data, recover what we can, drop the rest, log
+ * selected events. See input_report_*()s in psmouse_process_byte below, and
+ * 
+ */
+
+static int psmouse_filter_header_byte(enum psmouse_type type, int byte)
+{
+   int todo = 0;
+   if ((byte & 0xc0) &&
+   type != PSMOUSE_THINKPS &&
+   type != PSMOUSE_GENPS)
+   todo |= DROP_BAD_PACKET+ATTEMPT_SYNC;
+   if ((byte & 0x08) == 0 &&
+   type != PSMOUSE_THINKPS &&
+   type != PSMOUSE_CORTRON)
+   todo |= DROP_BAD_PACKET+ATTEMPT_SYNC;
+   return todo;
+}
+
+static int psmouse_filter_wheel_byte(enum psmouse_type type, int byte)
+{
+   int todo = 0;
+   if (type == PSMOUSE_IMPS || type == PSMOUSE_GENPS)
+   if (abs(byte) > 3)
+   todo = DROP_BAD_PACKET+ATTEMPT_SYNC;
+   if (type == PSMOUSE_IMEX)
+   if (((byte&0xC0) == 0) || ((byte&0XC0) == 0xC0))
+   if (abs((byte&0x08)-(byte&0x07)) > 3)
+   todo = DROP_BAD_PACKET+ATTEMPT_SYNC;
+   return todo;
+}
+
+static int psmouse_filter_motion(struct psmouse *m)
+{  /*
+* Hunt for implausible accelerations here if it ever seems necessary.
+* Header/wheel sniffing seems to detect everything recoverable so far.
+*/
+   return 0;
+}
+
+static int psmouse_filter_hotio(struct psmouse *m)
+{
+   int ret = 0;
+   if (time_after(m->last, m->hotio_interval_start + HZ/m->rate)) {
+   m->hotio_interval_pkts = 0;
+   m->hotio_interval_start = m->last;
+   }
+   if 

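The header-byte check and the 'filter' bitmask in the psmouse patch above can be exercised in isolation. This C sketch copies the bit values from the patch's enum but simplifies the rest: it ignores the per-protocol exceptions (PSMOUSE_THINKPS, PSMOUSE_GENPS, PSMOUSE_CORTRON), and filter_actions() and filter_logs() are hypothetical helpers, since the part of the patch that applies the mask is cut off here. In a standard 3-byte PS/2 packet, bit 3 of the first byte is always set and bits 6 and 7 are the X and Y overflow flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values copied from the patch's enum. */
enum {
	DROP_BAD_PACKET = 1,
	ATTEMPT_SYNC = 2,
	LOG_SUMMARIES = 4,
	LOG_MALFORMED = 8,
	LOG_ALL = 16,
	REPORT_SYNC_FAILURE = 32,
};

/*
 * Simplified header check for a plain PS/2 packet: overflow bits 6/7
 * must be clear and the always-one bit 3 must be set.
 */
static int header_byte_todo(unsigned char byte)
{
	int todo = 0;

	if (byte & 0xc0)
		todo |= DROP_BAD_PACKET | ATTEMPT_SYNC;
	if ((byte & 0x08) == 0)
		todo |= DROP_BAD_PACKET | ATTEMPT_SYNC;
	return todo;
}

/* Assumed policy: only act on what the user enabled via 'filter'. */
static int filter_actions(int filter, int todo)
{
	assert(filter >= 0);
	return filter & todo;
}

static bool filter_logs(int filter)
{
	return (filter & (LOG_SUMMARIES | LOG_MALFORMED | LOG_ALL)) != 0;
}
```

Under this model, filter=3 drops and resyncs on a bad header without logging anything, matching the "3 to work silently" description in the commit message.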
Re: [PATCH] Fix devmem_is_allowed for below 1MB accesses for an efi machine

2012-10-02 Thread Matthew Garrett
On Tue, Oct 02, 2012 at 09:44:16PM -0700, H. Peter Anvin wrote:

> We *always* expose the I/O regions to /dev/mem.  That is what /dev/mem
> *does*.  The above is an exception (which is really obsolete, too: we
> should simply disallow access to anything which is treated as system
> RAM, which doesn't include the BIOS regions in question; the only reason
> we don't is that some versions of X take a checksum of the RAM in the
> first megabyte as some kind of idiotic random seed.)

Oh, right, got you. In that case I think we potentially need a 
finer-grained check on EFI platforms - the EFI memory map is kind enough 
to tell us the difference between unusable regions and io regions, and 
we could avoid access to the unusable ones.
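Matthew's suggestion can be sketched as a lookup over the firmware memory map: below 1 MB, keep access open (X and int10 users still need it) but refuse ranges that EFI reports as unusable. The C sketch below is a user-space model, not kernel code: struct mem_region and low_access_allowed() are hypothetical, and only the type numbers (7 = EfiConventionalMemory, 8 = EfiUnusableMemory, 11 = EfiMemoryMappedIO) come from the UEFI specification. Treating addresses not covered by any descriptor as accessible is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A few memory types, numbered as in the UEFI specification. */
enum efi_mem_type {
	EFI_CONVENTIONAL_MEMORY = 7,
	EFI_UNUSABLE_MEMORY = 8,
	EFI_MEMORY_MAPPED_IO = 11,
};

/* Hypothetical simplified descriptor covering [start, end). */
struct mem_region {
	uint64_t start, end;
	enum efi_mem_type type;
};

/* Only models the below-1 MB case discussed in this thread. */
static bool low_access_allowed(uint64_t addr, const struct mem_region *map,
			       size_t n)
{
	size_t i;

	assert(addr < 0x100000);
	for (i = 0; i < n; i++)
		if (addr >= map[i].start && addr < map[i].end)
			return map[i].type != EFI_UNUSABLE_MEMORY;
	return true;	/* not described by the map: leave it accessible */
}
```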

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: [PATCH] mm: use %pK for /proc/vmallocinfo

2012-10-02 Thread Kees Cook
On Tue, Oct 2, 2012 at 10:12 PM, David Rientjes  wrote:
> On Tue, 2 Oct 2012, Kees Cook wrote:
>
>> In the paranoid case of sysctl kernel.kptr_restrict=2, mask the kernel
>> virtual addresses in /proc/vmallocinfo too.
>>
>> Reported-by: Brad Spengler 
>> Signed-off-by: Kees Cook 
>
> /proc/vmallocinfo is S_IRUSR, not S_IRUGO, so exactly what are you trying
> to protect?

Trying to block the root user from seeing virtual memory addresses
(mode 2 of kptr_restrict).

Documentation/sysctl/kernel.txt:
"This toggle indicates whether restrictions are placed on
exposing kernel addresses via /proc and other interfaces.  When
kptr_restrict is set to (0), there are no restrictions.  When
kptr_restrict is set to (1), the default, kernel pointers
printed using the %pK format specifier will be replaced with 0's
unless the user has CAP_SYSLOG.  When kptr_restrict is set to
(2), kernel pointers printed using %pK will be replaced with 0's
regardless of privileges."

Even though it's S_IRUSR, it still needs %pK for the paranoid case.
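The documented kptr_restrict levels reduce to a small decision table. pk_should_mask() below is a hypothetical model of that rule, not the kernel's vsprintf code, and it treats any value other than 0 and 1 as level 2. It only captures what %pK prints; it says nothing about whether root can lower the sysctl first.

```c
#include <assert.h>
#include <stdbool.h>

/* True when a pointer printed with %pK should be replaced with zeros. */
static bool pk_should_mask(int kptr_restrict, bool has_cap_syslog)
{
	assert(kptr_restrict >= 0);
	switch (kptr_restrict) {
	case 0:
		return false;			/* no restrictions */
	case 1:
		return !has_cap_syslog;		/* default: hide unless CAP_SYSLOG */
	default:
		return true;			/* 2: hide regardless of privileges */
	}
}
```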

-Kees

-- 
Kees Cook
Chrome OS Security


Re: [PATCH] mm: use %pK for /proc/vmallocinfo

2012-10-02 Thread David Rientjes
On Tue, 2 Oct 2012, Kees Cook wrote:

> In the paranoid case of sysctl kernel.kptr_restrict=2, mask the kernel
> virtual addresses in /proc/vmallocinfo too.
> 
> Reported-by: Brad Spengler 
> Signed-off-by: Kees Cook 

/proc/vmallocinfo is S_IRUSR, not S_IRUGO, so exactly what are you trying 
to protect?


Re: [PATCH v4 2/2] hwmon: (ads7828) add support for ADS7830

2012-10-02 Thread Guenter Roeck
On Tue, Oct 02, 2012 at 11:33:27PM -0400, Vivien Didelot wrote:
> From: Guillaume Roguez 
> 
> The ADS7830 device is almost the same as the ADS7828,
> except that it does 8-bit sampling, instead of 12-bit.
> This patch extends the ads7828 driver to support this chip.
> 
> Signed-off-by: Guillaume Roguez 
> Signed-off-by: Vivien Didelot 

Guillaume,
Vivien,

[ ... ]

> @@ -147,6 +152,7 @@ static int ads7828_detect(struct i2c_client *client,
>  {
>   struct i2c_adapter *adapter = client->adapter;
>   u8 default_cmd_byte = ADS7828_CMD_SD_SE | ADS7828_CMD_PD3;
> + bool is_8bit = false;
>   int ch;
>  
>   /* Check we have a valid client */
> @@ -158,7 +164,9 @@ static int ads7828_detect(struct i2c_client *client,
>* dedicated register so attempt to sanity check using knowledge of
>* the chip
>* - Read from the 8 channel addresses
> -  * - Check the top 4 bits of each result are not set (12 data bits)
> +  * - Check the top 4 bits of each result:
> +  *   - They should not be set in case of 12-bit samples
> +  *   - The two bytes should be equal in case of 8-bit samples
>*/
>   for (ch = 0; ch < ADS7828_NCH; ch++) {
>   u8 cmd = ads7828_cmd_byte(default_cmd_byte, ch);
> @@ -168,13 +176,20 @@ static int ads7828_detect(struct i2c_client *client,
>   return -ENODEV;
>  
>   if (in_data & 0xF000) {
> - pr_debug("%s : Doesn't look like an ads7828 device\n",
> -  __func__);
> - return -ENODEV;
> + if ((in_data >> 8) == (in_data & 0xFF)) {
> + /* Seems to be an ADS7830 (8-bit sample) */
> + is_8bit = true;
> + } else {
> + dev_dbg(>dev, "doesn't look like an 
> ADS7828 compatible device\n");
> + return -ENODEV;
> + }
>   }
>   }

I have been thinking about this. The detection function is already quite weak,
and this makes it even weaker. The reason is that you only check for ADS7830 if
the check for ADS7828 failed, and you repeat the pattern for each channel.
Unfortunately, that means that you don't check for the ADS7830 condition if the
value returned for a channel happens to be a valid ADS7828 value, even if it is
not valid for ADS7830 (and even if you already know that the chip is not an
ADS7828).

Example:
ch=0: 0x1818	--> You know it is not ADS7828
ch=1: 0x0818	--> You know it is not ADS7830, but you don't check for it

I don't know an optimal solution right now, but maybe something like

maybe_7828 = true;
maybe_7830 = true;
for (ch = 0; ch < ADS7828_NCH && (maybe_7828 || maybe_7830); ch++) {
...
if (in_data & 0xF000)
maybe_7828 = false;
if ((in_data >> 8) != (in_data & 0xFF))
maybe_7830 = false;
}
if (!maybe_7828 && !maybe_7830)
return -ENODEV;

if (maybe_7828)
strlcpy(info->type, "ads7828", I2C_NAME_SIZE);
else
strlcpy(info->type, "ads7830", I2C_NAME_SIZE);

Frankly I would prefer to get rid of the _detect function entirely, I just don't
know if that would negatively affect some users. To give you an example for a
bad result: The function will wrongly detect an ADS7830 as ADS7828 if all ADC
channels report a value between 0x00 and 0x0f.
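The maybe_7828/maybe_7830 idea can be written as a small standalone function over the per-channel words read during detection. This C sketch is hypothetical (ads_classify() and enum ads_guess are not driver code, and the I2C reads are replaced by an array), but it makes both cases from the mail concrete: 0x1818 followed by 0x0818 rules out both chips, while readings whose two bytes are equal and below 0x10 still pass both checks and are reported as ADS7828.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ads_guess { ADS_NONE, ADS7828, ADS7830 };

/* Each reading can only rule a chip out, never in. */
static enum ads_guess ads_classify(const uint16_t *in_data, size_t nch)
{
	bool maybe_7828 = true, maybe_7830 = true;
	size_t ch;

	assert(in_data != NULL);
	for (ch = 0; ch < nch && (maybe_7828 || maybe_7830); ch++) {
		if (in_data[ch] & 0xF000)	/* 12-bit: top 4 bits clear */
			maybe_7828 = false;
		if ((in_data[ch] >> 8) != (in_data[ch] & 0xFF))	/* 8-bit: bytes equal */
			maybe_7830 = false;
	}
	if (!maybe_7828 && !maybe_7830)
		return ADS_NONE;	/* the driver would return -ENODEV */
	return maybe_7828 ? ADS7828 : ADS7830;
}
```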

How do you use the chip ? Do you need the detect function in your application ?

Thanks,
Guenter


Re: [PATCH] make GFP_NOTRACK flag unconditional

2012-10-02 Thread David Rientjes
On Fri, 28 Sep 2012, Glauber Costa wrote:

> There was a general sentiment in a recent discussion (See
> https://lkml.org/lkml/2012/9/18/258) that the __GFP flags should be
> defined unconditionally. Currently, the only offender is GFP_NOTRACK,
> which is conditional on KMEMCHECK.
> 
> This simple patch makes it unconditional.
> 
> Signed-off-by: Glauber Costa 
> CC: Christoph Lameter 
> CC: Mel Gorman 
> CC: Andrew Morton 

Acked-by: David Rientjes 

I think it was done this way to show that if CONFIG_KMEMCHECK=n then the
bit could be reused for something else, but I can't think of any reason why
that would be useful: what would need to add a gfp bit that would also
happen to depend on CONFIG_KMEMCHECK=n?  Nothing comes to mind that would
need to save a bit.

There are other cases of this as well, like __GFP_OTHER_NODE which is only 
useful for thp and it's defined unconditionally.  So this seems fine to 
me.
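To make the trade-off concrete, here is a tiny C sketch of the two styles, using hypothetical flag names and values rather than the real definitions in include/linux/gfp.h. With the conditional form the flag collapses to 0 when the option is off, so its bit is technically free for reuse; with the unconditional form the bit stays reserved in every configuration.

```c
#include <stdbool.h>

/* Hypothetical flag values; not the real include/linux/gfp.h definitions. */
#define MY_GFP_FOO		0x01u

/* Conditional style: the flag only owns a bit when the option is enabled. */
#ifdef CONFIG_MY_TRACKER
#define MY_GFP_NOTRACK_COND	0x02u
#else
#define MY_GFP_NOTRACK_COND	0u
#endif

/* Unconditional style: the bit is reserved in every configuration. */
#define MY_GFP_NOTRACK		0x02u

/* True if `flag` is set in `gfp`; always false when `flag` is 0. */
static bool gfp_has(unsigned int gfp, unsigned int flag)
{
	return (gfp & flag) != 0;
}
```

Built without CONFIG_MY_TRACKER, passing MY_GFP_NOTRACK_COND changes nothing, while MY_GFP_NOTRACK keeps its bit either way.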


Re: [PATCH] slub: init_kmem_cache_cpus() and put_cpu_partial() can be static

2012-10-02 Thread David Rientjes
On Fri, 28 Sep 2012, Fengguang Wu wrote:

> Acked-by: Glauber Costa 
> Signed-off-by: Fengguang Wu 

Acked-by: David Rientjes 

I think init_kmem_cache_cpus() would also benefit from just being inlined
into alloc_kmem_cache_cpus().


Re: [PATCH v4 1/2] hwmon: (ads7828) driver cleanup

2012-10-02 Thread Guenter Roeck
On Tue, Oct 02, 2012 at 11:33:26PM -0400, Vivien Didelot wrote:
> * Remove module parameters, add a ads7828_platform_data;
> * Move driver declaration to avoid adding function prototypes;
> * Remove unused macros;
> * Coding Style fixes.
> 
> Signed-off-by: Vivien Didelot 

Hi Vivien,

nice cleanup.

One more comment below. No need to re-send; I'll fix that and apply the patch to
-next.

Guenter

[ ... ]

>  /* Return 0 if detection is successful, -ENODEV otherwise */
>  static int ads7828_detect(struct i2c_client *client,
> struct i2c_board_info *info)
>  {
>   struct i2c_adapter *adapter = client->adapter;
> + u8 default_cmd_byte = ADS7828_CMD_SD_SE | ADS7828_CMD_PD3;
>   int ch;
>  
>   /* Check we have a valid client */
> @@ -195,9 +161,12 @@ static int ads7828_detect(struct i2c_client *client,
>* - Check the top 4 bits of each result are not set (12 data bits)
>*/
>   for (ch = 0; ch < ADS7828_NCH; ch++) {
> - u16 in_data;
> - u8 cmd = channel_cmd_byte(ch);
> - in_data = i2c_smbus_read_word_swapped(client, cmd);
> + u8 cmd = ads7828_cmd_byte(default_cmd_byte, ch);
> + u16 in_data = i2c_smbus_read_word_swapped(client, cmd);

s/u16/int/

Otherwise in_data can never be < 0.
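To see why the type matters: i2c_smbus_read_word_swapped() returns an int that is either the 16-bit word or a negative errno. The sketch below replaces it with a hypothetical `fake_read_word()` that always fails with -EIO, so it runs outside the kernel, and shows that storing the result in a u16 turns the error into a large positive value that can never satisfy `< 0`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for i2c_smbus_read_word_swapped(): always fails. */
static int fake_read_word(void)
{
	return -EIO;
}

/* Buggy: the negative errno is converted to a u16 (e.g. -5 -> 0xFFFB). */
static int read_into_u16(void)
{
	uint16_t in_data = fake_read_word();

	if (in_data < 0)	/* always false: a uint16_t promotes to a non-negative int */
		return -ENODEV;
	return in_data;
}

/* Fixed (s/u16/int/): the error code survives and can be checked. */
static int read_into_int(void)
{
	int in_data = fake_read_word();

	if (in_data < 0)
		return -ENODEV;
	return in_data;
}
```

Compilers typically warn about the first comparison (e.g. GCC's -Wtype-limits), which is a cheap way to catch this class of bug.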

Guenter


Re: [GIT] Security subsystem updates for 3.7

2012-10-02 Thread Linus Torvalds
On Tue, Oct 2, 2012 at 4:35 AM, James Morris  wrote:
> Highlights:
>
> - Integrity: add local fs integrity verification to detect offline attacks
> - Integrity: add digital signature verification

Ok, the integrity changes in particular clashed with the new user
namespace support by Eric Biederman.

The clashes weren't all that big, but there have been semantic changes
in this area, and I'd like Eric to please check that I resolved it
correctly, and the integrity people to double-check my changes to the
"fowner" field.

Mimi, Dmitry, Eric? Please check my current git tree, in particular
the security/integrity/ima/ima_policy.c file, but I think Eric should
look at the kernel/auditsc.c merge too, in case I missed something.

Linus


Re: [PATCH] Fix devmem_is_allowed for below 1MB accesses for an efi machine

2012-10-02 Thread H. Peter Anvin
On 10/02/2012 09:31 PM, Matthew Garrett wrote:
> On Tue, Oct 02, 2012 at 02:50:09PM -0700, H. Peter Anvin wrote:
> 
>> That sounds like exactly the opposite of normal /dev/mem behavior... we
>> allow access to non-memory resources (which really could do anything if
>> misused), but not memory.
> 
> From arch/x86/mm/init.c:
> 
>  * On x86, access has to be given to the first megabyte of ram because that area
>  * contains bios code and data regions used by X and dosemu and similar apps.
> 
> Limiting this to just RAM would be safer than it currently is. I'm not 
> convinced that there's any good reason to allow *any* access down there 
> for EFI systems, though.
> 

Sorry, fail.

We *always* expose the I/O regions to /dev/mem.  That is what /dev/mem
*does*.  The above is an exception (which is really obsolete, too: we
should simply disallow access to anything which is treated as system
RAM, which doesn't include the BIOS regions in question; the only reason
we don't is that some versions of X take a checksum of the RAM in the
first megabyte as some kind of idiotic random seed.)

-hpa
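A userspace sketch of the policy described above: deny pages that are treated as system RAM and allow everything else (I/O and BIOS regions). `page_is_ram_sketch()` and its memory map (RAM at 0-640 KiB and 1-64 MiB, 4 KiB pages) are hypothetical stand-ins for the kernel's own RAM lookup, and the iomem exclusivity check in the real code is not modelled. The `legacy_first_mb` switch models the current first-megabyte exception quoted from arch/x86/mm/init.c.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT	12
/* First megabyte = 256 pages of 4 KiB. */
#define FIRST_MB_PAGES		(1UL << (20 - SKETCH_PAGE_SHIFT))

/* Hypothetical memory map: RAM at 0-640 KiB and from 1 MiB to 64 MiB. */
static bool page_is_ram_sketch(unsigned long pfn)
{
	return pfn < 160 || (pfn >= 256 && pfn < 16384);
}

/*
 * Return true if /dev/mem may map this page.
 * legacy_first_mb == true models today's blanket first-megabyte exception;
 * false models "disallow anything treated as system RAM" and nothing else.
 */
static bool devmem_allowed_sketch(unsigned long pfn, bool legacy_first_mb)
{
	if (legacy_first_mb && pfn < FIRST_MB_PAGES)
		return true;
	return !page_is_ram_sketch(pfn);
}
```

Under the stricter policy a BIOS page such as pfn 0xF0 stays accessible because it is not RAM, while low RAM (pfn 0) is refused.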


Re: Re: [PATCH v3 1/3] devfreq: Core updates to support devices which can idle

2012-10-02 Thread Rajagopal Venkat
On 2 October 2012 11:11, MyungJoo Ham  wrote:
>> On 27 September 2012 13:50, MyungJoo Ham  wrote:
>> >> Prepare devfreq core framework to support devices which
>> >> can idle. When device idleness is detected perhaps through
>> >> runtime-pm, need some mechanism to suspend devfreq load
>> >> monitoring and resume back when device is online. Present
>> >> code continues monitoring unless device is removed from
>> >> devfreq core.
>> >>
>> >> This patch introduces the following design changes,
>> >>
>> >> - use per device work instead of global work to monitor device
>> >>   load. This enables suspend/resume of device devfreq and
>> >>   reduces monitoring code complexity.
>> >> - decouple delayed work based load monitoring logic from core
>> >>   by introducing helpers functions to be used by governors. This
>> >>   provides flexibility for governors either to use delayed work
>> >>   based monitoring functions or to implement their own mechanism.
>> >> - devfreq core interacts with governors via events to perform
>> >>   specific actions. These events include start/stop devfreq.
>> >>   This sets ground for adding suspend/resume events.
>> >>
>> >> The devfreq apis are not modified and are kept intact.
>> >>
>> >> Signed-off-by: Rajagopal Venkat 
>> >
>> > Hello,
>> >
>> >
>> > I'll do a more thorough review tomorrow (sorry, I was occupied by
>> > something other than Linux tasks for a while again); however,
>> > there are two concerns in this patch.
>> >
>> > 1. (minor, but may be bothersome in some rare but non-negligible cases)
>> > Serialization issue between suspend/resume
>> > functions; this may happen with some failure or interrupts while entering STR or
>> > unexpected usage of the API at drivers.
>>
>> Regarding invalid usage of the suspend/resume APIs, we can add
>> additional checks, something like:
>>
>> void devfreq_monitor_suspend(struct devfreq *devfreq)
>> {
>> .
>> if (devfreq->stop_polling)
>> return;
>> ..
>> }
>>
>> void devfreq_monitor_resume(struct devfreq *devfreq)
>> {
>> .
>> if (!devfreq->stop_polling)
>> return;
>> ..
>> }
>>
>> >
>> >   For example, if devfreq_monitor_suspend() and devfreq_monitor_resume() are called
>> > almost simultaneously, we may execute 1) locked part of suspend, 2) locked part of
>> > resume, 3) cancel_delayed_work_sync of suspend.
>> >
>> >   Then, we may have stop_polling = false w/ cancel_delayed_work_sync() in effect.
>> >
>> >   Let's not assume that suspend() and resume() may called almost simultaneously,
>> > especially in subsystem core code.
>
> (sorry, I missed "not be" between "may" and "called" here)
>
>> >
>>
>> These devfreq_monitor_suspend() and devfreq_monitor_resume() functions are
>> executed when device idleness is detected, perhaps:
>> - using runtime-pm: runtime_suspend() and runtime_resume() are mutually
>> exclusive and are guaranteed not to run in parallel.
>> - by the driver's own mechanism: in my opinion, the driver should ensure
>> suspend/resume are sequential, if only so that it knows its own devfreq status.
>>
>> Even if the above sequence occurs, I don't see any problem with having
>> stop_polling = false w/ cancel_delayed_work_sync() in effect. Since the suspend
>> is the last one to complete, monitoring will not continue.
>
> Why don't you simply extend the mutex-locked context?
>
> I.e.,
> +   mutex_lock(&devfreq->lock);
> +   devfreq->stop_polling = true;
> +   mutex_unlock(&devfreq->lock);
> +   cancel_delayed_work_sync(&devfreq->work);
> -->
> +   mutex_lock(&devfreq->lock);
> +   devfreq->stop_polling = true;
> +   cancel_delayed_work_sync(&devfreq->work);
> +   mutex_unlock(&devfreq->lock);
>
Extending the mutex-locked context would cause a deadlock.

The scheduled work callback also takes the mutex, so calling
cancel_delayed_work_sync() with the lock held would wait for a callback
that is itself blocked on that same lock.
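The ordering can be shown with a userspace analogue. In this sketch (an assumption, not devfreq code) a pthread plays the role of the delayed work and pthread_join() plays the role of cancel_delayed_work_sync(); `struct monitor`, `mon_start()` and `mon_suspend()` are hypothetical names. The flag is set under the mutex, but the wait happens only after the mutex is dropped, because the worker needs the same mutex to observe the flag and exit. Build with -pthread on older toolchains.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for the devfreq monitoring state. */
struct monitor {
	pthread_mutex_t lock;
	bool stop_polling;
	unsigned long samples;
	pthread_t worker;	/* plays the role of the delayed work */
};

/* Plays the role of the devfreq_monitor() work callback. */
static void *mon_work(void *arg)
{
	struct monitor *m = arg;

	for (;;) {
		pthread_mutex_lock(&m->lock);	/* the callback takes the same lock */
		if (m->stop_polling) {
			pthread_mutex_unlock(&m->lock);
			return NULL;
		}
		m->samples++;			/* "sample the load" */
		pthread_mutex_unlock(&m->lock);
		sched_yield();
	}
}

static int mon_start(struct monitor *m)
{
	pthread_mutex_init(&m->lock, NULL);
	m->stop_polling = false;
	m->samples = 0;
	return pthread_create(&m->worker, NULL, mon_work, m);
}

static void mon_suspend(struct monitor *m)
{
	pthread_mutex_lock(&m->lock);
	m->stop_polling = true;
	pthread_mutex_unlock(&m->lock);
	/*
	 * Wait for the worker only after dropping the lock, mirroring
	 * cancel_delayed_work_sync() called outside the mutex. Joining while
	 * still holding m->lock could block forever, because the worker may
	 * be waiting for that lock before it can see the flag.
	 */
	pthread_join(m->worker, NULL);
}
```

Moving pthread_join() above the pthread_mutex_unlock() reproduces the deadlock described here.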

> This serializes data-update and the execution based on the data-update,
> resolving any inconsistency issues with the queue-status and devfreq
> variable.
>
> Extending it does not add much overhead, and without it we have the
> possibility of inconsistency due to serialization issues.
>
>>
>> >
>> > 2. What if polling_ms = 0 w/ active governors (such as ondemand)?
>> >
>> >  Users may declare the initial polling_ms = 0 w/ simple-ondemand in order to
>> > pause sampling at boot-time and start sampling at run-time some time later.
>> >
>> > It appears that this patch will start forcibly at boot-time in such a case.
>>
>> Yes. This is a valid case which can be handled by
>>
>>  void devfreq_monitor_start(struct devfreq *devfreq)
>>  {
>> INIT_DELAYED_WORK_DEFERRABLE(&devfreq->work, devfreq_monitor);
>> +   if (devfreq->profile->polling_ms)
>> queue_delayed_work(devfreq_wq, &devfreq->work,
>> msecs_to_jiffies(devfreq->profile->polling_ms));
>>  }
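A single-threaded sketch of the two checks discussed in this thread: the early-return guards on stop_polling, and skipping the queueing when polling_ms is 0, applied at every point that would queue the work. A plain `queued` flag stands in for queue_delayed_work()/cancel_delayed_work_sync(), polling_ms is flattened out of the profile, all names are hypothetical, and locking is deliberately omitted.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct devfreq and its profile. */
struct fake_devfreq {
	unsigned int polling_ms;	/* 0: do not sample until set later */
	bool stop_polling;
	bool queued;			/* models a pending delayed work item */
};

/* Models queue_delayed_work(), guarded by the polling_ms check. */
static void sketch_queue_monitor(struct fake_devfreq *df)
{
	if (df->polling_ms)
		df->queued = true;
}

static void sketch_monitor_start(struct fake_devfreq *df)
{
	df->stop_polling = false;
	sketch_queue_monitor(df);
}

static void sketch_monitor_suspend(struct fake_devfreq *df)
{
	if (df->stop_polling)	/* already suspended: nothing to do */
		return;
	df->stop_polling = true;
	df->queued = false;	/* models cancel_delayed_work_sync() */
}

static void sketch_monitor_resume(struct fake_devfreq *df)
{
	if (!df->stop_polling)	/* not suspended: ignore an unbalanced resume */
		return;
	df->stop_polling = false;
	sketch_queue_monitor(df);
}
```

Starting with polling_ms = 0 leaves nothing queued, a repeated suspend or an unbalanced resume is ignored, and a later resume with a non-zero polling_ms queues the work.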
>
>
> Please add the checking statement to every queue_delayed_work() statement:
> 

RE: [PATCHv4 1/4] modem_shm: Add Modem Access Framework

2012-10-02 Thread Arun MURTHY
> On Wed, Oct 3, 2012 at 9:24 AM, Arun MURTHY
>  wrote:
> >> On Mon, Oct 01, 2012 at 07:30:38AM +0200, Arun MURTHY wrote:
> >> > > On Fri, Sep 28, 2012 at 01:35:01PM +0530, Arun Murthy wrote:
> >> > > > +#include 
> >> > > > +#include 
> >> > > > +#include 
> >> > > > +#include 
> >> > > > +#include 
> >> > > > +
> >> > > > +static struct class *modem_class;
> >> > >
> >> > > What's wrong with a bus_type instead?
> >> >
> >> > Can I know the advantage of using bus_type over class?
> >>
> >> You have devices living on a bus, and it's much more descriptive than
> >> a class (which we are going to eventually get rid of one of these days...).
> >>
> >> Might I ask why you choose a class over a bus_type?
> >
> > Basically my requirement is to create a central entity for accessing
> > and releasing the modem from APE. Since this is done by different clients,
> > the central entity should be able to handle the requests and play
> > safely, since this has more effect on system suspend and deep sleep.
> > Using class helps me achieve this and also creates an entry to
> > user space which can be used in the later parts. Moreover
> You can have that same mechanism work for bus_type as well.
> > this is not something like a bus, so I didn't use a bus and instead went
> > with a simple class approach.
> >
> >>
> >> > > > +int modem_release(struct modem_desc *mdesc) {
> >> > > > +   if (!mdesc->release)
> >> > > > +   return -EFAULT;
> >> > > > +
> >> > > > +   if (modem_is_requested(mdesc)) {
> >> > > > +   atomic_dec(&mdesc->mclients->cnt);
> >> > > > +   if (atomic_read(&mdesc->use_cnt) == 1) {
> >> > > > +   mdesc->release(mdesc);
> >> > > > +   atomic_dec(&mdesc->use_cnt);
> >> > > > +   }
> >> > >
> >> > > Eeek, why aren't you using the built-in reference counting that
> >> > > the struct device provided to you, and instead are rolling your own?
> >> > > This happens in many places, why?
> >> >
> >> > My usage of counters here is: for each modem there are many clients.
> >> > Each of the clients will have a ref to modem_desc. Each of them uses
> >> > this for requesting and releasing the modem. One counter tracks
> >> > the request and release for each client, which is done by the
> >> > variable 'cnt' in struct clients.
> >> > The counter use_cnt is used for tracking the modem request/release
> >> > irrespective of the clients, and the counter cli_cnt is used for
> >> > restricting modem_get to the number of clients defined in no_clients.
> >> >
> >> > So in total there are 3 counters: one for restricting the usage of modem_get by
> >> > clients, a second for restricting modem request/release at the top level,
> >> > and a third for restricting modem release/request on a per-client, per-modem
> >> > basis.
> >> >
> >> > Can you let me know if the same can be achieved by using built-in
> >> > ref counting?
> >>
> >> Yes, because you don't need all of those different levels, just stick
> >> with one and you should be fine. :)
> >>
> >
> > No, checks at all these levels are required; I have explained the need
> > above. This will have an effect on system power management, i.e. suspend
> > and deep sleep.
> > We require that drivers request the modem only once and
> > release it only once, but we cannot rely on the clients, hence a check for
> > this has to be done in the MAF. Also the number of clients should be
> > defined, and hence a check for that is done in MAF. Apart from all
> > this, the requests coming from all the clients are to be accumulated,
> > and modem release or access should be performed based on that.
> I think the best way to deal with this is:
> Define a new bus type and have your clients call the bus-exposed
> functionality whenever they need a service. So in your case it would be
> request and release only, AND when all of your clients have released the bus
> you can do the cleanup, i.e. switch off the modem. An added
> advantage of making it a bus_type would be that you can do the reference
> counting in your bus driver.
>
> Designing is not my forte, but I feel this way you can solve the problem at
> hand.
> Please feel free to correct me. I would really appreciate it.

At first glance, MAF is not a bus in the technical sense, so my question is
why bus_type should be used.

Thanks and Regards,
Arun R Murthy
--


Re: [PATCH] Fix devmem_is_allowed for below 1MB accesses for an efi machine

2012-10-02 Thread Matthew Garrett
On Tue, Oct 02, 2012 at 02:50:09PM -0700, H. Peter Anvin wrote:

> That sounds like exactly the opposite of normal /dev/mem behavior... we
> allow access to non-memory resources (which really could do anything if
> misused), but not memory.

From arch/x86/mm/init.c:

 * On x86, access has to be given to the first megabyte of ram because that area
 * contains bios code and data regions used by X and dosemu and similar apps.

Limiting this to just RAM would be safer than it currently is. I'm not 
convinced that there's any good reason to allow *any* access down there 
for EFI systems, though.

-- 
Matthew Garrett | mj...@srcf.ucam.org


linux-next: manual merge of the battery tree with the tree

2012-10-02 Thread Stephen Rothwell
Hi Anton,

Today's linux-next merge of the battery tree got a conflict in
include/linux/mfd/88pm860x.h between commit 2e57d56747e6 ("mfd: 88pm860x:
Device tree support") from the mfd tree and commit a830d28b48bf
("power_supply: Enable battery-charger for 88pm860x") from the battery
tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

diff --cc include/linux/mfd/88pm860x.h
index ef3e6b7,b7c5a3c..000
--- a/include/linux/mfd/88pm860x.h
+++ b/include/linux/mfd/88pm860x.h
@@@ -359,24 -460,10 +440,25 @@@ struct pm860x_platform_data 
struct pm860x_rtc_pdata *rtc;
struct pm860x_touch_pdata   *touch;
struct pm860x_power_pdata   *power;
+   struct charger_desc *chg_desc;
 -  struct regulator_init_data  *regulator;
 -
 -  unsigned short  companion_addr; /* I2C address of companion chip */
 +  struct regulator_init_data  *buck1;
 +  struct regulator_init_data  *buck2;
 +  struct regulator_init_data  *buck3;
 +  struct regulator_init_data  *ldo1;
 +  struct regulator_init_data  *ldo2;
 +  struct regulator_init_data  *ldo3;
 +  struct regulator_init_data  *ldo4;
 +  struct regulator_init_data  *ldo5;
 +  struct regulator_init_data  *ldo6;
 +  struct regulator_init_data  *ldo7;
 +  struct regulator_init_data  *ldo8;
 +  struct regulator_init_data  *ldo9;
 +  struct regulator_init_data  *ldo10;
 +  struct regulator_init_data  *ldo12;
 +  struct regulator_init_data  *ldo_vibrator;
 +  struct regulator_init_data  *ldo14;
 +
 +  int companion_addr; /* I2C address of companion chip */
int i2c_port;   /* Controlled by GI2C or PI2C */
int irq_mode;   /* Clear interrupt by read/write(0/1) */
int irq_base;   /* IRQ base number of 88pm860x */




Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

2012-10-02 Thread Benjamin Herrenschmidt
On Tue, 2012-10-02 at 14:43 -0700, Nishanth Aravamudan wrote:
> 
> Started looking into this. If your suspicion were accurate, wouldn't the
> bisection have stopped at 0e4bc95d87394364f408627067238453830bdbf3
> ("powerpc/iommu: Reduce spinlock coverage in iommu_alloc and
> iommu_free")?
> 
> Alex, the error is reproducible, right? Does it go away by reverting
> that commit against mainline? Just trying to narrow down my focus.

My suspicion is, I'm afraid, a real bug but not that bug, since it would
only happen on U3 and this is a U4 machine ... so we have two bugs, one
of them still unidentified.

Cleaning up the CC list for now...

Cheers,
Ben.



[PATCH]staging "xgifb" Fix typos.

2012-10-02 Thread Justin P. Mattock
From: "Justin P. Mattock" 

Signed-off-by: Justin P. Mattock 

---

The patch below fixes typos I found while reading through staging "xgifb".

 drivers/staging/xgifb/TODO |2 +-
 drivers/staging/xgifb/vb_setmode.c |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/xgifb/TODO b/drivers/staging/xgifb/TODO
index 13d9bc2..392b29d 100644
--- a/drivers/staging/xgifb/TODO
+++ b/drivers/staging/xgifb/TODO
@@ -1,4 +1,4 @@
-This drivers still need a lot of work. I can list all cleanups to do but it's
+This drivers still needs a lot of work. I can list all cleanups to do but it's
 going to be long. So, I'm writing "cleanups" and not the list.
 
 Arnaud
diff --git a/drivers/staging/xgifb/vb_setmode.c b/drivers/staging/xgifb/vb_setmode.c
index e95a165..c8561a0 100644
--- a/drivers/staging/xgifb/vb_setmode.c
+++ b/drivers/staging/xgifb/vb_setmode.c
@@ -2501,7 +2501,7 @@ static void XGI_GetVBInfo(unsigned short ModeNo, unsigned short ModeIdIndex,
} else {
temp = 0x017C;
}
-   } else { /* 3nd party chip */
+   } else { /* 3rd party chip */
temp = SetCRT2ToLCD;
}
 
@@ -4390,7 +4390,7 @@ static void XGI_SetLCDRegs(unsigned short ModeNo, unsigned short ModeIdIndex,
xgifb_reg_and_or(pVBInfo->Part2Port, 0x17, 0xFB, 0x00);
xgifb_reg_and_or(pVBInfo->Part2Port, 0x18, 0xDF, 0x00);
 
-   /* Customized LCDB Des no add */
+   /* Customized LCDB Does not add */
tempbx = 5;
LCDBDesPtr = XGI_GetLcdPtr(tempbx, ModeNo, ModeIdIndex,
   RefreshRateTableIndex, pVBInfo);
-- 
1.7.9.5



Re: [PATCHv4 1/4] modem_shm: Add Modem Access Framework

2012-10-02 Thread anish singh
On Wed, Oct 3, 2012 at 9:24 AM, Arun MURTHY  wrote:
>> On Mon, Oct 01, 2012 at 07:30:38AM +0200, Arun MURTHY wrote:
>> > > On Fri, Sep 28, 2012 at 01:35:01PM +0530, Arun Murthy wrote:
>> > > > +#include 
>> > > > +#include 
>> > > > +#include 
>> > > > +#include 
>> > > > +#include 
>> > > > +
>> > > > +static struct class *modem_class;
>> > >
>> > > What's wrong with a bus_type instead?
>> >
>> > Can I know the advantage of using bus_type over class?
>>
>> You have devices living on a bus, and it's much more descriptive than a class
>> (which we are going to eventually get rid of one of these days...).
>>
>> Might I ask why you choose a class over a bus_type?
>
> Basically my requirement is to create a central entity for accessing and releasing
> the modem from APE. Since this is done by different clients, the central entity should
> be able to handle the requests and play safely, since this has more effect on
> system suspend and deep sleep. Using class helps me achieve this and
> also creates an entry to user space which can be used in the later parts. Moreover
You can have that same mechanism work for bus_type as well.
> this is not something like a bus, so I didn't use a bus and instead went with a
> simple class approach.
>
>>
>> > > > +int modem_release(struct modem_desc *mdesc) {
>> > > > +   if (!mdesc->release)
>> > > > +   return -EFAULT;
>> > > > +
>> > > > +   if (modem_is_requested(mdesc)) {
>> > > > +   atomic_dec(&mdesc->mclients->cnt);
>> > > > +   if (atomic_read(&mdesc->use_cnt) == 1) {
>> > > > +   mdesc->release(mdesc);
>> > > > +   atomic_dec(&mdesc->use_cnt);
>> > > > +   }
>> > >
>> > > Eeek, why aren't you using the built-in reference counting that the
>> > > struct device provided to you, and instead are rolling your own?
>> > > This happens in many places, why?
>> >
>> > My usage of counters here is: for each modem there are many clients.
>> > Each of the clients will have a ref to modem_desc. Each of them uses
>> > this for requesting and releasing the modem. One counter tracks
>> > the request and release for each client, which is done by the variable 'cnt' in
>> > struct clients.
>> > The counter use_cnt is used for tracking the modem request/release
>> > irrespective of the clients, and the counter cli_cnt is used for
>> > restricting modem_get to the number of clients defined in no_clients.
>> >
>> > So in total there are 3 counters: one for restricting the usage of modem_get by
>> > clients, a second for restricting modem request/release at the top level,
>> > and a third for restricting modem release/request on a per-client, per-modem
>> > basis.
>> >
>> > Can you let me know if the same can be achieved by using built-in ref
>> > counting?
>>
>> Yes, because you don't need all of those different levels, just stick with one
>> and you should be fine. :)
>>
>
> No, checks at all these levels are required; I have explained the need above.
> This will have an effect on system power management, i.e. suspend and deep
> sleep.
> We require that drivers request the modem only once and release it
> only once, but we cannot rely on the clients, hence a check for this has
> to be done in the MAF. Also the number of clients should be defined, and hence a
> check for that is done in MAF. Apart from all this, the requests coming
> from all the clients are to be accumulated, and modem release or access
> should be performed based on that.
I think the best way to deal with this is:
Define a new bus type and have your clients call the bus-exposed functionality
whenever they need a service. So in your case it would be request and release
only, AND when all of your clients have released the bus you can do the
cleanup, i.e. switch off the modem. An added advantage of making it a bus_type
would be that you can do the reference counting in your bus driver.

Designing is not my forte, but I feel this way you can solve the problem at hand.
Please feel free to correct me. I would really appreciate it.
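A sketch of the single-counter idea suggested here, in plain C11 with hypothetical names: one shared atomic count powers the modem on at the first request and off at the last release, while a per-client flag rejects duplicate requests and unmatched releases (the client-misbehaviour concern raised in this thread). In the kernel the shared count could instead come from the driver core's reference counting; `powered` stands in for the real modem operations. Only the count is atomic here: the power transitions and the per-client flag are not serialized, so real code would need a lock around them.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical modem state; `powered` stands in for the real access/release ops. */
struct sketch_modem {
	atomic_int users;	/* one shared count of outstanding requests */
	bool powered;
};

/* Hypothetical per-client handle. */
struct sketch_client {
	struct sketch_modem *modem;
	bool holds_request;	/* a client may hold at most one request */
};

static int sketch_modem_request(struct sketch_client *c)
{
	if (c->holds_request)
		return -EINVAL;	/* duplicate request from this client */
	c->holds_request = true;
	/* First outstanding request: access the modem. */
	if (atomic_fetch_add(&c->modem->users, 1) == 0)
		c->modem->powered = true;
	return 0;
}

static int sketch_modem_release(struct sketch_client *c)
{
	if (!c->holds_request)
		return -EINVAL;	/* release without a matching request */
	c->holds_request = false;
	/* Last outstanding request gone: release the modem. */
	if (atomic_fetch_sub(&c->modem->users, 1) == 1)
		c->modem->powered = false;
	return 0;
}
```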
>
> Thanks and Regards,
> Arun R Murthy


Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")

2012-10-02 Thread Srivatsa S. Bhat
On 10/03/2012 09:37 AM, Paul E. McKenney wrote:
> On Wed, Oct 03, 2012 at 09:29:01AM +0530, Srivatsa S. Bhat wrote:
>> On 10/03/2012 05:01 AM, Paul E. McKenney wrote:
>>> On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
 On Tue, 2 Oct 2012, Jiri Kosina wrote:

 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
 commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
 Author: Paul E. McKenney 
 Date:   Thu Aug 2 17:43:50 2012 -0700

 rcu: Remove _rcu_barrier() dependency on __stop_machine()
 
 Currently, _rcu_barrier() relies on preempt_disable() to prevent
 any CPU from going offline, which in turn depends on CPU hotplug's
 use of __stop_machine().
 
 This patch therefore makes _rcu_barrier() use get_online_cpus() to
 block CPU-hotplug operations.  This has the added benefit of removing
 the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
 operations are excluded, there can be no callbacks to adopt.  This
 commit simplifies the code accordingly.
 
 Signed-off-by: Paul E. McKenney 
 Signed-off-by: Paul E. McKenney 
 Reviewed-by: Josh Triplett 
 ==

 is causing lockdep to complain (see the full trace below). I haven't yet
 had time to analyze what exactly is happening, and probably will not have
 time to do so until tomorrow, so just sending this as a heads-up in case
 anyone sees the culprit immediately.
>>>
>>> Hmmm...  Does the following patch help?  It swaps the order in which
>>> rcu_barrier() acquires the hotplug and rcu_barrier locks.
>>
>> It changed the report slightly (see for example the change in possible 
>> unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
>> now directly about cpu_hotplug.lock). With the patch applied I get
>>
>>
>>
>> ==
>> [ INFO: possible circular locking dependency detected ]
>> 3.6.0-03888-g3f99f3b #145 Not tainted
>
> And it really seems valid. 
>>>
>>> Yep, it sure is.  I wasn't getting the full picture earlier, so please
>>> accept my apologies for the bogus patch.
>>>
> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which
> introduces a slab_mutex -> cpu_hotplug.lock dependency (through
> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
>
> On the other hand, _cpu_up() acquires cpu_hotplug.lock through
> cpu_hotplug_begin(), and with this lock held the cpuup_callback() notifier
> gets called, which acquires slab_mutex. This gives the reverse dependency,
> i.e. the deadlock scenario is a valid one.
>
> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
>
> Simply put, the commit causes get_online_cpus() to be called with 
> slab_mutex held, which is invalid.

 Oh, and it actually seems to trigger for real.

 With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
 your patch, changing the order in which rcu_barrier() acquires hotplug and 
 rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
 very likely actually is the deadlock described above.
>>>
>>> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
>>> CPU hotplug events.
>>
>> Why not? IMHO it should have been perfectly fine! See below...
>>
>>>  I could go back to the old approach, but it is
>>> significantly more complex.  I cannot say that I am all that happy
>>> about anyone calling rcu_barrier() from a CPU hotplug notifier because
>>> it doesn't help CPU hotplug latency, but that is a separate issue.
>>>
>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
>>> is executing.  So the right way to resolve this seems to be to do the
>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>> of a hotplug notifier.  Should be fixable without too much hassle...
>>>
>>
>> The thing is, get_online_cpus() is smart: it *knows* when you are calling
>> it in a hotplug-writer, IOW, when you are in a hotplug notifier.
>>
>> The relevant code is:
>>
>> void get_online_cpus(void)
>> {
>> might_sleep();
>> if (cpu_hotplug.active_writer == current)
>> return;
>>  
>> }
>>
>> So calling rcu_barrier() (and hence get_online_cpus()) from within a hotplug
>> notifier 

linux-next: manual merge of the mfd tree with Linus' tree

2012-10-02 Thread Stephen Rothwell
Hi Samuel,

Today's linux-next merge of the mfd tree got a conflict in
drivers/video/backlight/88pm860x_bl.c between commit e1c9ac420ef1
("Revert "backlight: fix memory leak on obscure error path"") from Linus'
tree and commit a6ccdcd98c39 ("mfd: 88pm860x: Use REG resource for
backlight") from the mfd tree.

I used the mfd tree version and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




[PATCH] perf kvm: move global variables into a perf_kvm struct

2012-10-02 Thread David Ahern
Cleans up the builtin-kvm code in preparation for the live mode.
No functional changes; only code movement.

Signed-off-by: David Ahern 
Cc: Dong Hao 
Cc: Runzhen Wang 
Cc: Xiao Guangrong 
Cc: Ingo Molnar 
---
 tools/perf/builtin-kvm.c |  460 +-
 1 file changed, 253 insertions(+), 207 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index a28c9ca..260abc5 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -32,16 +32,76 @@ struct event_key {
int info;
 };
 
+struct kvm_event_stats {
+   u64 time;
+   struct stats stats;
+};
+
+struct kvm_event {
+   struct list_head hash_entry;
+   struct rb_node rb;
+
+   struct event_key key;
+
+   struct kvm_event_stats total;
+
+   #define DEFAULT_VCPU_NUM 8
+   int max_vcpu;
+   struct kvm_event_stats *vcpu;
+};
+
+typedef int (*key_cmp_fun)(struct kvm_event*, struct kvm_event*, int);
+
+struct kvm_event_key {
+   const char *name;
+   key_cmp_fun key;
+};
+
+
+struct perf_kvm;
+
 struct kvm_events_ops {
bool (*is_begin_event)(struct perf_evsel *evsel,
   struct perf_sample *sample,
   struct event_key *key);
bool (*is_end_event)(struct perf_evsel *evsel,
 struct perf_sample *sample, struct event_key *key);
-   void (*decode_key)(struct event_key *key, char decode[20]);
+   void (*decode_key)(struct perf_kvm *kvm, struct event_key *key,
+  char decode[20]);
const char *name;
 };
 
+struct exit_reasons_table {
+   unsigned long exit_code;
+   const char *reason;
+};
+
+#define EVENTS_BITS 12
+#define EVENTS_CACHE_SIZE  (1UL << EVENTS_BITS)
+
+struct perf_kvm {
+   struct perf_tool    tool;
+   struct perf_session *session;
+
+   const char *file_name;
+   const char *report_event;
+   const char *sort_key;
+   int trace_vcpu;
+
+   struct exit_reasons_table *exit_reasons;
+   int exit_reasons_size;
+   const char *exit_reasons_isa;
+
+   struct kvm_events_ops *events_ops;
+   key_cmp_fun compare;
+   struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];
+   u64 total_time;
+   u64 total_count;
+
+   struct rb_root result;
+};
+
+
 static void exit_event_get_key(struct perf_evsel *evsel,
   struct perf_sample *sample,
   struct event_key *key)
@@ -78,45 +138,35 @@ static bool exit_event_end(struct perf_evsel *evsel,
return kvm_entry_event(evsel);
 }
 
-struct exit_reasons_table {
-   unsigned long exit_code;
-   const char *reason;
-};
-
-struct exit_reasons_table vmx_exit_reasons[] = {
+static struct exit_reasons_table vmx_exit_reasons[] = {
VMX_EXIT_REASONS
 };
 
-struct exit_reasons_table svm_exit_reasons[] = {
+static struct exit_reasons_table svm_exit_reasons[] = {
SVM_EXIT_REASONS
 };
 
-static int cpu_isa;
-
-static const char *get_exit_reason(u64 exit_code)
+static const char *get_exit_reason(struct perf_kvm *kvm, u64 exit_code)
 {
-   int table_size = ARRAY_SIZE(svm_exit_reasons);
-   struct exit_reasons_table *table = svm_exit_reasons;
-
-   if (cpu_isa == 1) {
-   table = vmx_exit_reasons;
-   table_size = ARRAY_SIZE(vmx_exit_reasons);
-   }
+   int i = kvm->exit_reasons_size;
+   struct exit_reasons_table *tbl = kvm->exit_reasons;
 
-   while (table_size--) {
-   if (table->exit_code == exit_code)
-   return table->reason;
-   table++;
+   while (i--) {
+   if (tbl->exit_code == exit_code)
+   return tbl->reason;
+   tbl++;
}
 
pr_err("unknown kvm exit code:%lld on %s\n",
-   (unsigned long long)exit_code, cpu_isa ? "VMX" : "SVM");
+   (unsigned long long)exit_code, kvm->exit_reasons_isa);
return "UNKNOWN";
 }
 
-static void exit_event_decode_key(struct event_key *key, char decode[20])
+static void exit_event_decode_key(struct perf_kvm *kvm,
+ struct event_key *key,
+ char decode[20])
 {
-   const char *exit_reason = get_exit_reason(key->key);
+   const char *exit_reason = get_exit_reason(kvm, key->key);
 
scnprintf(decode, 20, "%s", exit_reason);
 }
@@ -128,11 +178,11 @@ static struct kvm_events_ops exit_events = {
.name = "VM-EXIT"
 };
 
-/*
- * For the mmio events, we treat:
- * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry
- * the time of MMIO read: kvm_exit -> kvm_mmio(KVM_TRACE_MMIO_READ...).
- */
+/*
+ * For the mmio events, we treat:
+ * the time of MMIO write: kvm_mmio(KVM_TRACE_MMIO_WRITE...) -> kvm_entry
+ * the time of MMIO read: kvm_exit -> kvm_mmio(KVM_TRACE_MMIO_READ...).
+ 
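The core of this refactor is that the exit-reason table, its size and the ISA name now live in `struct perf_kvm` (presumably filled in once at setup; that part of the diff is not shown here), instead of `get_exit_reason()` testing the global `cpu_isa` on every lookup. Below is a minimal standalone sketch of that pattern. `struct exit_ctx`, `exit_ctx_init()` and the two-entry tables are hypothetical stand-ins: perf fills its tables from `VMX_EXIT_REASONS` and `SVM_EXIT_REASONS`, and the entries here are illustrative only.

```c
#include <stdio.h>

/* Same layout as struct exit_reasons_table in the patch. */
struct exit_reasons_table {
	unsigned long exit_code;
	const char *reason;
};

/* Illustrative subsets only, not the full VMX/SVM tables. */
static struct exit_reasons_table vmx_tbl[] = {
	{ 12, "HLT" },
	{ 30, "IO_INSTRUCTION" },
};

static struct exit_reasons_table svm_tbl[] = {
	{ 0x078, "hlt" },
	{ 0x07b, "io" },
};

/* Hypothetical stand-in for the exit_reasons* fields of struct perf_kvm. */
struct exit_ctx {
	struct exit_reasons_table *exit_reasons;
	int exit_reasons_size;
	const char *exit_reasons_isa;
};

#define TBL_SIZE(t) ((int)(sizeof(t) / sizeof((t)[0])))

/* Select the table once, so lookups no longer consult a global. */
static void exit_ctx_init(struct exit_ctx *ctx, int is_vmx)
{
	if (is_vmx) {
		ctx->exit_reasons = vmx_tbl;
		ctx->exit_reasons_size = TBL_SIZE(vmx_tbl);
		ctx->exit_reasons_isa = "VMX";
	} else {
		ctx->exit_reasons = svm_tbl;
		ctx->exit_reasons_size = TBL_SIZE(svm_tbl);
		ctx->exit_reasons_isa = "SVM";
	}
}

/* Linear search, mirroring get_exit_reason() in the patch. */
static const char *get_exit_reason(const struct exit_ctx *ctx,
				   unsigned long long exit_code)
{
	int i = ctx->exit_reasons_size;
	const struct exit_reasons_table *tbl = ctx->exit_reasons;

	while (i--) {
		if (tbl->exit_code == exit_code)
			return tbl->reason;
		tbl++;
	}

	fprintf(stderr, "unknown kvm exit code:%llu on %s\n",
		exit_code, ctx->exit_reasons_isa);
	return "UNKNOWN";
}
```

Passing the context explicitly is also what lets `decode_key()` take the extra `struct perf_kvm *kvm` argument seen in the diff.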

Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")

2012-10-02 Thread Paul E. McKenney
On Wed, Oct 03, 2012 at 09:29:01AM +0530, Srivatsa S. Bhat wrote:
> On 10/03/2012 05:01 AM, Paul E. McKenney wrote:
> > On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
> >> On Tue, 2 Oct 2012, Jiri Kosina wrote:
> >>
> >> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> >> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> >> Author: Paul E. McKenney 
> >> Date:   Thu Aug 2 17:43:50 2012 -0700
> >>
> >> rcu: Remove _rcu_barrier() dependency on __stop_machine()
> >> 
> >> Currently, _rcu_barrier() relies on preempt_disable() to prevent
> >> any CPU from going offline, which in turn depends on CPU hotplug's
> >> use of __stop_machine().
> >> 
> >> This patch therefore makes _rcu_barrier() use get_online_cpus() to
> >> block CPU-hotplug operations.  This has the added benefit of removing
> >> the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
> >> operations are excluded, there can be no callbacks to adopt.  This
> >> commit simplifies the code accordingly.
> >> 
> >> Signed-off-by: Paul E. McKenney 
> >> Signed-off-by: Paul E. McKenney 
> >> Reviewed-by: Josh Triplett 
> >> ==
> >>
> >> is causing lockdep to complain (see the full trace below). I haven't yet
> >> had time to analyze what exactly is happening, and probably will not have
> >> time to do so until tomorrow, so just sending this as a heads-up in case
> >> anyone sees the culprit immediately.
> >
> > Hmmm...  Does the following patch help?  It swaps the order in which
> > rcu_barrier() acquires the hotplug and rcu_barrier locks.
> 
>  It changed the report slightly (see for example the change in possible 
>  unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
>  now directly about cpu_hotplug.lock). With the patch applied I get
> 
> 
> 
>  ==
>  [ INFO: possible circular locking dependency detected ]
>  3.6.0-03888-g3f99f3b #145 Not tainted
> >>>
> >>> And it really seems valid. 
> > 
> > Yep, it sure is.  I wasn't getting the full picture earlier, so please
> > accept my apologies for the bogus patch.
> > 
> >>> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
> >>> introduces slab_mutex -> cpu_hotplug.lock dependency (through 
> >>> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
> >>>
> >>> On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
> >>> cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier 
> >>> gets called, which acquires slab_mutex. This gives the reverse 
> >>> dependency, 
> >>> i.e. deadlock scenario is valid one.
> >>>
> >>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
> >>> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
> >>>
> >>> Simply put, the commit causes get_online_cpus() to be called with 
> >>> slab_mutex held, which is invalid.
> >>
> >> Oh, and it seems to be actually triggering in real.
> >>
> >> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
> >> your patch, changing the order in which rcu_barrier() acquires hotplug and 
> >> rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
> >> very likely actually is the deadlock described above.
> > 
> > Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
> > notifier, which doesn't sit so well with rcu_barrier() trying to exclude
> > CPU hotplug events.
> 
> Why not? IMHO it should have been perfectly fine! See below...
> 
> >  I could go back to the old approach, but it is
> > significantly more complex.  I cannot say that I am all that happy
> > about anyone calling rcu_barrier() from a CPU hotplug notifier because
> > it doesn't help CPU hotplug latency, but that is a separate issue.
> > 
> > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > (1) it excludes hotplug operations or (2) if it is called from a hotplug
> > notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> > is executing.  So the right way to resolve this seems to be to do the
> > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > of a hotplug notifier.  Should be fixable without too much hassle...
> > 
> 
> The thing is, get_online_cpus() is smart: it *knows* when you are calling
> it in a hotplug-writer, IOW, when you are in a hotplug notifier.
> 
> The relevant code is:
> 
> void get_online_cpus(void)
> {
> 	might_sleep();
> 	if (cpu_hotplug.active_writer == current)
> 		return;
> 	/* ... */
> }
> 
> So calling rcu_barrier() (and hence get_online_cpus()) from within a hotplug
> notifier should pose no problem at all!

Indeed, that was my confusion. 

Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")

2012-10-02 Thread Srivatsa S. Bhat
On 10/03/2012 09:14 AM, Paul E. McKenney wrote:
> On Wed, Oct 03, 2012 at 09:05:31AM +0530, Srivatsa S. Bhat wrote:
>> On 10/03/2012 03:47 AM, Jiri Kosina wrote:
>>> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
>>>
 I don't see how this circular locking dependency can occur.. If you are using SLUB,
 kmem_cache_destroy() releases slab_mutex before it calls rcu_barrier(). If you are
 using SLAB, kmem_cache_destroy() wraps its whole operation inside get/put_online_cpus(),
 which means, it cannot run concurrently with a hotplug operation such as cpu_up(). So, I'm
 rather puzzled at this lockdep splat..
>>>
>>> I am using SLAB here.
>>>
>>> The scenario I think is very well possible:
>>>
>>>
>>> CPU 0                           CPU 1
>>> kmem_cache_destroy()
>>
>> What about the get_online_cpus() right here at CPU0 before
>> calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
>> on CPU1?? I still don't get it... :(
>>
>> (kmem_cache_destroy() uses get/put_online_cpus() around acquiring
>> and releasing slab_mutex).
> 
> The problem is that there is a CPU-hotplug notifier for slab, which
> establishes hotplug->slab.

Agreed.

>  Then having kmem_cache_destroy() call
> rcu_barrier() under the lock

Ah, that's where I disagree. kmem_cache_destroy() *cannot* proceed at
this point in time, because it has invoked get_online_cpus()! It simply
cannot be running past that point in the presence of a running hotplug
notifier! So, kmem_cache_destroy() should have been sleeping on the
hotplug lock, waiting for the notifier to release it, no?

> establishes slab->hotplug, which results
> in deadlock.  Jiri really did explain this in an earlier email
> message, but both of us managed to miss it.  ;-)
> 

Maybe I'm just being blind, sorry! ;-)

Regards,
Srivatsa S. Bhat

>   Thanx, Paul
> 
>> Regards,
>> Srivatsa S. Bhat
>>
>>> mutex_lock(slab_mutex)
>>>                                 _cpu_up()
>>>                                 cpu_hotplug_begin()
>>>                                 mutex_lock(cpu_hotplug.lock)
>>> rcu_barrier()
>>> _rcu_barrier()
>>> get_online_cpus()
>>> mutex_lock(cpu_hotplug.lock)
>>>  (blocks, CPU 1 has the mutex)
>>>                                 __cpu_notify()
>>>                                 mutex_lock(slab_mutex)
>>>
>>> Deadlock.
>>>
>>> Right?
>>>
>>
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")

2012-10-02 Thread Srivatsa S. Bhat
On 10/03/2012 05:01 AM, Paul E. McKenney wrote:
> On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
>> On Tue, 2 Oct 2012, Jiri Kosina wrote:
>>
>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
>> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
>> Author: Paul E. McKenney 
>> Date:   Thu Aug 2 17:43:50 2012 -0700
>>
>> rcu: Remove _rcu_barrier() dependency on __stop_machine()
>> 
>> Currently, _rcu_barrier() relies on preempt_disable() to prevent
>> any CPU from going offline, which in turn depends on CPU hotplug's
>> use of __stop_machine().
>> 
>> This patch therefore makes _rcu_barrier() use get_online_cpus() to
>> block CPU-hotplug operations.  This has the added benefit of removing
>> the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
>> operations are excluded, there can be no callbacks to adopt.  This
>> commit simplifies the code accordingly.
>> 
>> Signed-off-by: Paul E. McKenney 
>> Signed-off-by: Paul E. McKenney 
>> Reviewed-by: Josh Triplett 
>> ==
>>
>> is causing lockdep to complain (see the full trace below). I haven't yet
>> had time to analyze what exactly is happening, and probably will not have
>> time to do so until tomorrow, so just sending this as a heads-up in case
>> anyone sees the culprit immediately.
>
> Hmmm...  Does the following patch help?  It swaps the order in which
> rcu_barrier() acquires the hotplug and rcu_barrier locks.

 It changed the report slightly (see for example the change in possible 
 unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
 now directly about cpu_hotplug.lock). With the patch applied I get



 ==
 [ INFO: possible circular locking dependency detected ]
 3.6.0-03888-g3f99f3b #145 Not tainted
>>>
>>> And it really seems valid. 
> 
> Yep, it sure is.  I wasn't getting the full picture earlier, so please
> accept my apologies for the bogus patch.
> 
>>> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
>>> introduces slab_mutex -> cpu_hotplug.lock dependency (through 
>>> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
>>>
>>> On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
>>> cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier 
>>> gets called, which acquires slab_mutex. This gives the reverse dependency, 
>>> i.e. deadlock scenario is valid one.
>>>
>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
>>> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
>>>
>>> Simply put, the commit causes get_online_cpus() to be called with 
>>> slab_mutex held, which is invalid.
>>
>> Oh, and it seems to be actually triggering in real.
>>
>> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
>> your patch, changing the order in which rcu_barrier() acquires hotplug and 
>> rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
>> very likely actually is the deadlock described above.
> 
> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
> CPU hotplug events.

Why not? IMHO it should have been perfectly fine! See below...

>  I could go back to the old approach, but it is
> significantly more complex.  I cannot say that I am all that happy
> about anyone calling rcu_barrier() from a CPU hotplug notifier because
> it doesn't help CPU hotplug latency, but that is a separate issue.
> 
> But the thing is that rcu_barrier()'s assumptions work just fine if either
> (1) it excludes hotplug operations or (2) if it is called from a hotplug
> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> is executing.  So the right way to resolve this seems to be to do the
> get_online_cpus() only if rcu_barrier() is -not- executing in the context
> of a hotplug notifier.  Should be fixable without too much hassle...
> 

The thing is, get_online_cpus() is smart: it *knows* when you are calling
it in a hotplug-writer, IOW, when you are in a hotplug notifier.

The relevant code is:

void get_online_cpus(void)
{
	might_sleep();
	if (cpu_hotplug.active_writer == current)
		return;
	/* ... */
}

So calling rcu_barrier() (and hence get_online_cpus()) from within a hotplug
notifier should pose no problem at all!
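To make the point about get_online_cpus() concrete, here is a hypothetical userspace model (pthreads, not kernel code). It tracks "active_writer == current" with a thread-local flag and waits on a condition variable where the kernel loops on schedule(); the part that mirrors the code quoted above is the early return, which lets the hotplug writer, and therefore a notifier it runs, call get_online_cpus() without deadlocking on itself.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the 3.6-era cpu_hotplug lock: the
 * writer keeps 'lock' held for the whole hotplug operation, readers
 * take it only long enough to adjust 'refcount'.
 */
static struct {
	pthread_mutex_t lock;
	pthread_cond_t readers_gone;
	int refcount;
} cpu_hotplug = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.readers_gone = PTHREAD_COND_INITIALIZER,
};

/* Stands in for "cpu_hotplug.active_writer == current". */
static _Thread_local bool i_am_active_writer;

static void get_online_cpus(void)
{
	if (i_am_active_writer)
		return;		/* the writer already excludes hotplug */
	pthread_mutex_lock(&cpu_hotplug.lock);
	cpu_hotplug.refcount++;
	pthread_mutex_unlock(&cpu_hotplug.lock);
}

static void put_online_cpus(void)
{
	if (i_am_active_writer)
		return;
	pthread_mutex_lock(&cpu_hotplug.lock);
	if (--cpu_hotplug.refcount == 0)
		pthread_cond_broadcast(&cpu_hotplug.readers_gone);
	pthread_mutex_unlock(&cpu_hotplug.lock);
}

/* Writer side: wait for readers to drain, then keep the lock held. */
static void cpu_hotplug_begin(void)
{
	pthread_mutex_lock(&cpu_hotplug.lock);
	while (cpu_hotplug.refcount)
		pthread_cond_wait(&cpu_hotplug.readers_gone, &cpu_hotplug.lock);
	i_am_active_writer = true;
}

static void cpu_hotplug_done(void)
{
	i_am_active_writer = false;
	pthread_mutex_unlock(&cpu_hotplug.lock);
}
```

Note that the early return only helps the writer itself. It does nothing for a different task that already holds slab_mutex and then calls get_online_cpus(), which is the scenario Jiri describes in this thread.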
Regards,
Srivatsa S. Bhat



Re: [PATCH 0/2] memory-hotplug : notification of memory block's state

2012-10-02 Thread Yasuaki Ishimatsu

Hi Chen,

2012/10/03 11:12, Ni zhan Chen wrote:

On 10/03/2012 09:21 AM, Yasuaki Ishimatsu wrote:

Hi Andrew,

2012/10/03 6:42, Andrew Morton wrote:

On Tue, 2 Oct 2012 17:25:06 +0900
Yasuaki Ishimatsu  wrote:


remove_memory() offlines memory. It is called in the following two cases:

1. echo offline >/sys/devices/system/memory/memoryXX/state
2. hot remove a memory device

In the 1st case, the memory block's state is changed, and a notification
that the memory block's state changed is sent to userland after
offline_memory() is called. So the user can notice that the memory block
has changed.

But in the 2nd case, the memory block's state is not changed, and no
notification is sent to userspace even though offline_memory() is called.
So the user cannot notice that the memory block has changed.

We should also notify userspace in the 2nd case.


These two little patches look reasonable to me.

There's a lot of recent activity with memory hotplug!  We're in the 3.7
merge window now so it is not a good time to be merging new material.



Also there appear to be two teams working on it and it's unclear to me
how well coordinated this work is?


As you know, there are two teams developing memory hotplug.
  - Wen's patch-set
https://lkml.org/lkml/2012/9/5/201

  - Lai's patch-set
https://lkml.org/lkml/2012/9/10/180

Wen's patch-set is for removing physical memory. I'm now splitting the
patch-set to make it easier to review. If the patch-set is merged into
the Linux kernel, I believe that Linux on x86 will be able to hot remove
a physical memory device.

But that is not enough, since we cannot remove memory that contains
kernel memory. To guarantee that memory can be hot removed, it must
belong to ZONE_MOVABLE.

So Lai's patch-set tries to create a movable node, in which all memory
belongs to ZONE_MOVABLE.

I think there are two opportunities for creating a movable node.
  - boot time
  - after hot add memory

- boot time

To create movable memory, Linux has two kernel parameters
(kernelcore and movablecore). But they are not enough, since even if we
set these kernel parameters, the movable memory is distributed evenly
across the nodes. So we introduce the kernelcore_max_addr boot parameter,
which limits the range of memory used as kernel memory.

For example, suppose the system has the following nodes:

node0 : 0x4000 - 0x8000
node1 : 0x8000 - 0xc000

If we want to hot remove node1, we set "kernelcore_max_addr=0x8000".
Kernel memory is then limited to addresses below 0x8000, and node1's
memory belongs to ZONE_MOVABLE. As a result, we can guarantee that
node1 is a movable node and that we can always hot remove it.
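The arithmetic behind this example can be sketched as follows. The helpers are hypothetical, treat each node range as half-open [start, end), and only model the effect described above; they are not part of Lai's patch-set.

```c
#include <stdbool.h>

/* Hypothetical: a node's physical range, half-open [start, end). */
struct node_range {
	unsigned long start;
	unsigned long end;
};

/*
 * With kernelcore_max_addr=limit, kernel memory stays below 'limit',
 * so a node that starts at or above it holds only ZONE_MOVABLE memory.
 */
static bool node_is_movable(const struct node_range *n, unsigned long limit)
{
	return n->start >= limit;
}

/* Upper bound on the bytes of a node that may hold kernel memory. */
static unsigned long node_kernel_bytes_max(const struct node_range *n,
					   unsigned long limit)
{
	if (n->start >= limit)
		return 0;
	return (n->end < limit ? n->end : limit) - n->start;
}
```

For the two nodes above with a limit of 0x8000, node0 may hold up to 0x4000 bytes of kernel memory, while node1 holds none and is therefore removable.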

- after hot add memory

When memory is hot added, it belongs to ZONE_NORMAL and is offline.
If we online the memory, it may come to hold kernel memory, in which case
we cannot hot remove it. So we introduce the online_movable function.
If we use it as follows, the memory belongs to ZONE_MOVABLE.

echo online_movable > /sys/devices/system/node/nodeX/memoryX/state

So when a new node is hot added and I echo "online_movable" to all of the
hot-added memory, the node's memory belongs to ZONE_MOVABLE. As a result,
we can guarantee that the node is a movable node and that we can always
hot remove it.


Hi Yasuaki,

In this case, can kernel memory be allocated from ZONE_MOVABLE?


No. In this case, the memory cannot be used as kernel memory.

Thanks,
Yasuaki Ishimatsu





# I hope this information helps you understand our work.

Thanks,
Yasuaki Ishimatsu



However these two patches are pretty simple and do fix a problem, so I
added them to the 3.7 MM queue.




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org








RE: [PATCHv4 1/4] modem_shm: Add Modem Access Framework

2012-10-02 Thread Arun MURTHY
> On Mon, Oct 01, 2012 at 07:30:38AM +0200, Arun MURTHY wrote:
> > > On Fri, Sep 28, 2012 at 01:35:01PM +0530, Arun Murthy wrote:
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +
> > > > +static struct class *modem_class;
> > >
> > > What's wrong with a bus_type instead?
> >
> > Can I know the advantage of using bus_type over class?
> 
> You have devices living on a bus, and it's much more descriptive than a class
> (which we are going to eventually get rid of one of these days...).
> 
> Might I ask why you choose a class over a bus_type?

Basically, my requirement is to create a central entity for accessing and
releasing the modem from the APE. Since this is done by different clients,
the central entity should be able to handle the requests and behave
safely, because this affects system suspend and deep sleep. Using a class
helps me achieve this and also creates an entry in user space, which can
be used later. Moreover, this is not something like a bus, so I didn't
use a bus and instead went with a simple class approach.

> 
> > > > +int modem_release(struct modem_desc *mdesc) {
> > > > +	if (!mdesc->release)
> > > > +		return -EFAULT;
> > > > +
> > > > +	if (modem_is_requested(mdesc)) {
> > > > +		atomic_dec(&mdesc->mclients->cnt);
> > > > +		if (atomic_read(&mdesc->use_cnt) == 1) {
> > > > +			mdesc->release(mdesc);
> > > > +			atomic_dec(&mdesc->use_cnt);
> > > > +		}
> > >
> > > Eeek, why aren't you using the built-in reference counting that the
> > > struct device provided to you, and instead are rolling your own?
> > > This happens in many places, why?
> >
> > My usage of counters over here is for each modem there are many clients.
> > Each of the clients will have a ref to modem_desc. Each of them use
> > this for requesting and releasing the modem. One counter for tracking
> > the request and release for each client which is done by variable 'cnt' in
> > struct clients.
> > The counter use_cnt is used for tracking the modem request/release
> > irrespective of the clients and counter cli_cnt is used for
> > restricting the modem_get to the no of clients defined in no_clients.
> >
> > So totally 3 counter one for restricting the usage of modem_get by
> > clients, second for restricting modem request/release at top level,
> > and 3rd for restricting modem release/request for per client per modem
> > basis.
> >
> > Can you let me know if the same can be achieved by using built-in ref
> > counting?
> 
> Yes, because you don't need all of those different levels, just stick with one
> and you should be fine. :)
> 

No, checks at all of these levels are required; I have explained the need
above. This affects system power management, i.e. suspend and deep sleep.
We require drivers to request the modem only once and release it only
once, but we cannot rely on the clients to do so, hence the check has to
be done in the MAF. Also, the number of clients should be defined, and
hence a check for that is also done in the MAF. Apart from all this, the
requests coming from all clients have to be accumulated, and the modem
release or access is performed based on that.
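A minimal sketch of the three checks described here, under stated assumptions: a single mutex protects all counters, `powered` stands in for the driver's request/release callbacks, and every name is hypothetical rather than taken from the MAF patch.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a shared modem; not the MAF code. */
struct modem {
	pthread_mutex_t lock;   /* protects everything below */
	int no_clients;         /* maximum number of clients */
	int cli_cnt;            /* clients registered via modem_get() */
	int use_cnt;            /* clients currently holding the modem */
	bool powered;           /* stands in for the request/release callbacks */
};

struct modem_client {
	struct modem *m;
	bool requested;         /* protected by m->lock */
};

/* Check 1: cap the number of clients. */
static int modem_get(struct modem *m, struct modem_client *c)
{
	int ret = 0;

	pthread_mutex_lock(&m->lock);
	if (m->cli_cnt >= m->no_clients) {
		ret = -EBUSY;
	} else {
		m->cli_cnt++;
		c->m = m;
		c->requested = false;
	}
	pthread_mutex_unlock(&m->lock);
	return ret;
}

/* Check 2: one request per client. Check 3: power up on the first user. */
static int modem_request(struct modem_client *c)
{
	struct modem *m = c->m;
	int ret = 0;

	pthread_mutex_lock(&m->lock);
	if (c->requested) {
		ret = -EALREADY;
	} else {
		c->requested = true;
		if (m->use_cnt++ == 0)
			m->powered = true;
	}
	pthread_mutex_unlock(&m->lock);
	return ret;
}

/* Reject unbalanced releases; let the modem sleep after the last one. */
static int modem_release(struct modem_client *c)
{
	struct modem *m = c->m;
	int ret = 0;

	pthread_mutex_lock(&m->lock);
	if (!c->requested) {
		ret = -EINVAL;
	} else {
		c->requested = false;
		if (--m->use_cnt == 0)
			m->powered = false;
	}
	pthread_mutex_unlock(&m->lock);
	return ret;
}
```

Only the first request and the last release touch the (modelled) hardware; duplicate requests and unbalanced releases are rejected per client.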

Thanks and Regards,
Arun R Murthy
---


Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))

2012-10-02 Thread Srivatsa S. Bhat
On 10/03/2012 06:15 AM, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> 
>> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>
 Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
 notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
 CPU hotplug events.  I could go back to the old approach, but it is 
 significantly more complex.  I cannot say that I am all that happy about 
 anyone calling rcu_barrier() from a CPU hotplug notifier because it 
 doesn't help CPU hotplug latency, but that is a separate issue.

 But the thing is that rcu_barrier()'s assumptions work just fine if either
 (1) it excludes hotplug operations or (2) if it is called from a hotplug
 notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
 is executing.  So the right way to resolve this seems to be to do the
 get_online_cpus() only if rcu_barrier() is -not- executing in the context
 of a hotplug notifier.  Should be fixable without too much hassle...
>>>
>>> Sorry, I don't think I understand what you are proposing just yet.
>>>
>>> If I understand it correctly, you are proposing to introduce some magic 
>>> into _rcu_barrier() such as (pseudocode of course):
>>>
>>> if (!being_called_from_hotplug_notifier_callback)
>>> get_online_cpus()
>>>
>>> How does that protect from the scenario I've outlined before though?
>>>
>>> CPU 0                           CPU 1
>>> kmem_cache_destroy()
>>> mutex_lock(slab_mutex)
>>>                                 _cpu_up()
>>>                                 cpu_hotplug_begin()
>>>                                 mutex_lock(cpu_hotplug.lock)
>>> rcu_barrier()
>>> _rcu_barrier()
>>> get_online_cpus()
>>> mutex_lock(cpu_hotplug.lock)
>>>  (blocks, CPU 1 has the mutex)
>>>                                 __cpu_notify()
>>>                                 mutex_lock(slab_mutex)
>>>
>>> CPU 0 grabs both locks anyway (it's not running from notifier callback). 
>>> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
>>> from notifier callback either.
>>>
>>> What did I miss?
>>
>> You didn't miss anything, I was suffering a failure to read carefully.
>>
>> So my next stupid question is "Why can't kmem_cache_destroy drop
>> slab_mutex early?" like the following:
>>
>> void kmem_cache_destroy(struct kmem_cache *cachep)
>> {
>> 	BUG_ON(!cachep || in_interrupt());
>>
>> 	/* Find the cache in the chain of caches. */
>> 	get_online_cpus();
>> 	mutex_lock(&slab_mutex);
>> 	/*
>> 	 * the chain is never empty, cache_cache is never destroyed
>> 	 */
>> 	list_del(&cachep->list);
>> 	if (__cache_shrink(cachep)) {
>> 		slab_error(cachep, "Can't free all objects");
>> 		list_add(&cachep->list, &slab_caches);
>> 		mutex_unlock(&slab_mutex);
>> 		put_online_cpus();
>> 		return;
>> 	}
>> 	mutex_unlock(&slab_mutex);
>>
>> 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>> 		rcu_barrier();
>>
>> 	__kmem_cache_destroy(cachep);
>> 	put_online_cpus();
>> }
>>
>> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
>> Looks to me like it is just freeing now-disconnected memory.
> 
> Good question. I believe it should be safe to drop slab_mutex earlier, as 
> cachep has already been unlinked. I am adding slab people and linux-mm to 
> CC (the whole thread on LKML can be found at 
> https://lkml.org/lkml/2012/10/2/296 for reference).
> 
> How about the patch below? Pekka, Christoph, please?
> 
> It makes the lockdep happy again, and obviously removes the deadlock (I 
> tested it).
> 
> 
> 
> From: Jiri Kosina 
> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> 
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
> 
> This opens a possibility for deadlock:
> 
> CPU 0                           CPU 1
> kmem_cache_destroy()
> mutex_lock(slab_mutex)
>                                 _cpu_up()
>                                 cpu_hotplug_begin()
>                                 mutex_lock(cpu_hotplug.lock)
> rcu_barrier()
> _rcu_barrier()
> get_online_cpus()
> mutex_lock(cpu_hotplug.lock)
>  (blocks, CPU 1 has the mutex)
>                                 __cpu_notify()
>   
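The fix proposed in this thread (drop slab_mutex once the cache is unlinked, then call rcu_barrier() and free it) is the familiar unlink-under-the-lock, tear-down-outside-it pattern. Here is a hypothetical single-threaded userspace sketch: `cache_list_lock` stands in for slab_mutex, `slow_barrier()` for rcu_barrier(), and the trylock check is only a debugging aid that is meaningful when no other thread holds the lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not kernel code. */
struct cache {
	struct cache *next;
	bool destroy_by_rcu;
};

static pthread_mutex_t cache_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct cache *cache_list;
static int barrier_calls, barrier_under_lock;

/* Stands in for rcu_barrier(), which may take another lock internally. */
static void slow_barrier(void)
{
	/* Single-threaded debug check: trylock fails if we hold the lock. */
	if (pthread_mutex_trylock(&cache_list_lock) == 0)
		pthread_mutex_unlock(&cache_list_lock);
	else
		barrier_under_lock++;
	barrier_calls++;
}

static void cache_add(struct cache *c)
{
	pthread_mutex_lock(&cache_list_lock);
	c->next = cache_list;
	cache_list = c;
	pthread_mutex_unlock(&cache_list_lock);
}

/*
 * Unlink under the lock. Once unlinked, nobody can find the cache
 * through the list, so the barrier and the free run without the lock.
 */
static void cache_destroy(struct cache *c)
{
	struct cache **pp;

	pthread_mutex_lock(&cache_list_lock);
	for (pp = &cache_list; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			break;
		}
	}
	pthread_mutex_unlock(&cache_list_lock);

	if (c->destroy_by_rcu)
		slow_barrier();
	free(c);
}
```

Because the barrier now runs with no list lock held, it no longer creates a "list lock, then barrier's lock" ordering.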

Re: Soft lockup in scsi_remove_target under 3.6 (regression from 3.5)

2012-10-02 Thread Mike Christie
On 10/02/2012 07:43 PM, Jonathan McDowell wrote:
> Upgraded to 3.6 today on my dev box and after seeing an FC attached SAN
> go down and come back up (due to an expected reboot) I started getting
> the following in my logs. It continues even after the array is back and
> functioning - I'm seeing:
> 
>  kernel:[109104.348034] BUG: soft lockup - CPU#6 stuck for 23s! [kworker/6:0:30692]
> 
> repeated on logged in sessions and backtraces like the following (this
> is the first). I don't see the same problem under 3.5.
> 


I think you need this patch
http://marc.info/?l=linux-scsi=134621716223056=2


Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")

2012-10-02 Thread Paul E. McKenney
On Wed, Oct 03, 2012 at 09:05:31AM +0530, Srivatsa S. Bhat wrote:
> On 10/03/2012 03:47 AM, Jiri Kosina wrote:
> > On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> > 
> >> I don't see how this circular locking dependency can occur.. If you are using SLUB,
> >> kmem_cache_destroy() releases slab_mutex before it calls rcu_barrier(). If you are
> >> using SLAB, kmem_cache_destroy() wraps its whole operation inside get/put_online_cpus(),
> >> which means, it cannot run concurrently with a hotplug operation such as cpu_up(). So, I'm
> >> rather puzzled at this lockdep splat..
> > 
> > I am using SLAB here.
> > 
> > The scenario I think is very well possible:
> > 
> > 
> > CPU 0                           CPU 1
> > kmem_cache_destroy()
> 
> What about the get_online_cpus() right here at CPU0 before
> calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
> on CPU1?? I still don't get it... :(
> 
> (kmem_cache_destroy() uses get/put_online_cpus() around acquiring
> and releasing slab_mutex).

The problem is that there is a CPU-hotplug notifier for slab, which
establishes hotplug->slab.  Then having kmem_cache_destroy() call
rcu_barrier() under the lock establishes slab->hotplug, which results
in deadlock.  Jiri really did explain this in an earlier email
message, but both of us managed to miss it.  ;-)

Thanx, Paul

> Regards,
> Srivatsa S. Bhat
> 
> > mutex_lock(slab_mutex)
> >                                 _cpu_up()
> >                                 cpu_hotplug_begin()
> >                                 mutex_lock(cpu_hotplug.lock)
> > rcu_barrier()
> > _rcu_barrier()
> > get_online_cpus()
> > mutex_lock(cpu_hotplug.lock)
> >  (blocks, CPU 1 has the mutex)
> >                                 __cpu_notify()
> >                                 mutex_lock(slab_mutex)
> > 
> > Deadlock.
> > 
> > Right?
> > 
> 
> 



Re: [PATCH] block: makes bio_split support bio without data

2012-10-02 Thread Kent Overstreet
Adding Martin to the cc, so he can chime in on WRITE_SAME if I got it
wrong

On Wed, Oct 03, 2012 at 01:30:45PM +1000, NeilBrown wrote:
> On Tue, 2 Oct 2012 14:09:23 -0700 Kent Overstreet 
> wrote:
> 
> > On Tue, Oct 02, 2012 at 04:22:01PM +1000, NeilBrown wrote:
> > > On Fri, 28 Sep 2012 09:23:43 -0700 Kent Overstreet 
> > > 
> > > wrote:
> > > 
> > > > On Mon, Sep 24, 2012 at 02:56:39PM +1000, NeilBrown wrote:
> > > > > 
> > > > > Hi Jens,
> > > > >  this patch has been sitting in my -next tree for a little while and I was
> > > > >  hoping for it to go in for the next merge window.
> > > > >  It simply allows bio_split() to be used on bios without a payload, such as
> > > > >  'discard'.
> > > > 
> > > > Thing is, at some point in the stack a discard bio is going to have data
> > > > - see blk_add_request_payload(), and it used to be the single page was
> > > > added to discard bios above generic_make_request(), in
> > > > blkdev_issue_discard() or whatever it's called.
> > > > 
> > > > So while I'm sure your code works, it's just a fragile way of doing it.
> > > > 
> > > > There's also other types of bios where bi_size has nothing to do with
> > > > the amount of data in the bi_io_vec - actually I think this is a new
> > > > thing, since Martin Petersen just added REQ_WRITE_SAME and I don't think
> > > > there were any other instances besides REQ_DISCARD before.
> > > > 
> > > > So my preference would be defining a mask (REQ_DISCARD|REQ_WRITE_SAME),
> > > > and if bio->bi_rw & that mask is true, just duplicate the bvec or
> > > > whatever.
> > > 
> > > Hi Kent,
> > >  I'm afraid I don't see the relevance of your comments to the patch.
> > > 
> > > The current bio_split code can successfully split a bio with zero or one
> > > bi_vec entry.  If there are more than that, we cannot split.
> > > 
> > > How does it matter whether the bio is a DISCARD or a WRITE_SAME or a DATA or
> > > whatever?
> > 
> > Hrm, I think I didn't explain very well.
> > 
> > After your change, if bio->bi_vcnt != 0, then it splits the bvec.
> > 
> > The trouble is that discard bios do under certain circumstances have
> > bio->bi_vcnt != 0, in which case splitting the bvec is the wrong thing
> > to do - first_sectors will quite likely be bigger than the bvec.
> > 
> > In practice this isn't currently a problem for discard bios, because
> > since Christoph added blk_add_request_payload(), discard bios won't have
> > that bvec added until they hit the scsi layer which will be after any
> > splitting. But this is a fairly recent and unrelated change, and IMO not
> > the kind of behaviour I'd want to rely on.
> > 
> > WRITE_SAME is a problem for the same reason - bio_sectors(bio) may be
> > large, but the bio will always have a single bvec and splitting the bvec
> > is always the wrong thing to do for WRITE_SAME.
> > 
> > So, I think it makes more sense to make the splitting conditional on
> > !(bio->bi_rw & (REQ_DISCARD|REQ_WRITE_SAME)), in addition to
> > bio->bi_vcnt == 1.
> > 
> > ..Does that make more sense?
> 
> Yes, that does make some more sense, thanks.  However it doesn't convince me
> that we need to change the patch.
> 
> I guess my position is that once we get to this code, we absolutely have to
> split the bio - it maps to two separate devices in a RAID0 or similar so
> not-splitting is not an option.
> 
> Maybe various md devices need to detect and reject REQ_DISCARD requests that
> have a payload and REQ_WRITE_SAME requests?  Or would they need to explicitly
> set a flag to say they accept them?

I think we should be able to split REQ_DISCARD bios that have a payload
or REQ_WRITE_SAME bios just fine though - for both of those cases, the
payload doesn't correspond to a particular sector, so just copy the
original bvec to the two splits and don't do anything else to it.

This gets so much cleaner with immutable bvecs :p

Actually that might be wrong for REQ_DISCARD bios if they had a payload;
I have no idea what that payload is actually for. But that should never
happen anymore, so we could add WARN_ON((bio->bi_rw & REQ_DISCARD) &&
bio->bi_vcnt).
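The decision Kent proposes can be written down as a small table: split an ordinary data bio by cutting its single bvec, but for REQ_DISCARD/REQ_WRITE_SAME bios, whose payload does not correspond to particular sectors, give both halves a copy of the bvec. The following is a user-space sketch only; `struct model_bio`, the `MODEL_REQ_*` bit values and `choose_split()` are hypothetical stand-ins, not the kernel's `struct bio`, `REQ_*` definitions or `bio_split()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for REQ_DISCARD / REQ_WRITE_SAME. */
#define MODEL_REQ_DISCARD    (1u << 0)
#define MODEL_REQ_WRITE_SAME (1u << 1)
/* Bios whose payload (if any) does not map to particular sectors. */
#define MODEL_NO_DATA_MASK   (MODEL_REQ_DISCARD | MODEL_REQ_WRITE_SAME)

/* Only the two struct bio fields the thread argues about. */
struct model_bio {
	uint32_t bi_rw;    /* request flags */
	unsigned bi_vcnt;  /* number of bvec entries */
};

enum split_action {
	SPLIT_NO_PAYLOAD,  /* no bvec: only sector and size change */
	SPLIT_COPY_BVEC,   /* both halves get a copy of the single bvec */
	SPLIT_DIVIDE_BVEC, /* data bio: cut the single bvec at the split point */
	SPLIT_UNSUPPORTED, /* more than one bvec: bio_split() cannot do it */
};

static enum split_action choose_split(const struct model_bio *bio)
{
	assert(bio);
	if (bio->bi_vcnt == 0)
		return SPLIT_NO_PAYLOAD;
	if (bio->bi_vcnt > 1)
		return SPLIT_UNSUPPORTED;
	/* Exactly one bvec: only cut it if it really maps to the sectors. */
	if (bio->bi_rw & MODEL_NO_DATA_MASK)
		return SPLIT_COPY_BVEC;
	return SPLIT_DIVIDE_BVEC;
}
```

With this table a WRITE_SAME bio with one bvec yields SPLIT_COPY_BVEC, while a plain one-bvec data bio yields SPLIT_DIVIDE_BVEC; Neil's patch corresponds to treating every one-bvec bio as the latter.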
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))

2012-10-02 Thread Paul E. McKenney
On Wed, Oct 03, 2012 at 02:45:30AM +0200, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> 
> > On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
> > > On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> > > 
> > > > Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
> > > > notifier, which doesn't sit so well with rcu_barrier() trying to exclude
> > > > CPU hotplug events.  I could go back to the old approach, but it is
> > > > significantly more complex.  I cannot say that I am all that happy about
> > > > anyone calling rcu_barrier() from a CPU hotplug notifier because it
> > > > doesn't help CPU hotplug latency, but that is a separate issue.
> > > > 
> > > > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > > > (1) it excludes hotplug operations or (2) if it is called from a hotplug
> > > > notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> > > > is executing.  So the right way to resolve this seems to be to do the
> > > > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > > > of a hotplug notifier.  Should be fixable without too much hassle...
> > > 
> > > Sorry, I don't think I understand what you are proposing just yet.
> > > 
> > > If I understand it correctly, you are proposing to introduce some magic 
> > > into _rcu_barrier() such as (pseudocode of course):
> > > 
> > >   if (!being_called_from_hotplug_notifier_callback)
> > >   get_online_cpus()
> > > 
> > > How does that protect from the scenario I've outlined before though?
> > > 
> > >   CPU 0                               CPU 1
> > >   kmem_cache_destroy()
> > >   mutex_lock(slab_mutex)
> > >                                       _cpu_up()
> > >                                       cpu_hotplug_begin()
> > >                                       mutex_lock(cpu_hotplug.lock)
> > >   rcu_barrier()
> > >   _rcu_barrier()
> > >   get_online_cpus()
> > >   mutex_lock(cpu_hotplug.lock)
> > >    (blocks, CPU 1 has the mutex)
> > >                                       __cpu_notify()
> > >                                       mutex_lock(slab_mutex)
> > > 
> > > CPU 0 grabs both locks anyway (it's not running from notifier callback).
> > > CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called
> > > from notifier callback either.
> > > 
> > > What did I miss?
> > 
> > You didn't miss anything, I was suffering a failure to read carefully.
> > 
> > So my next stupid question is "Why can't kmem_cache_destroy drop
> > slab_mutex early?" like the following:
> > 
> > void kmem_cache_destroy(struct kmem_cache *cachep)
> > {
> > 	BUG_ON(!cachep || in_interrupt());
> > 
> > 	/* Find the cache in the chain of caches. */
> > 	get_online_cpus();
> > 	mutex_lock(&slab_mutex);
> > 	/*
> > 	 * the chain is never empty, cache_cache is never destroyed
> > 	 */
> > 	list_del(&cachep->list);
> > 	if (__cache_shrink(cachep)) {
> > 		slab_error(cachep, "Can't free all objects");
> > 		list_add(&cachep->list, &slab_caches);
> > 		mutex_unlock(&slab_mutex);
> > 		put_online_cpus();
> > 		return;
> > 	}
> > 	mutex_unlock(&slab_mutex);
> > 
> > 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> > 		rcu_barrier();
> > 
> > 	__kmem_cache_destroy(cachep);
> > 	put_online_cpus();
> > }
> > 
> > Or did I miss some reason why __kmem_cache_destroy() needs that lock?
> > Looks to me like it is just freeing now-disconnected memory.
> 
> Good question. I believe it should be safe to drop slab_mutex earlier, as 
> cachep has already been unlinked. I am adding slab people and linux-mm to 
> CC (the whole thread on LKML can be found at 
> https://lkml.org/lkml/2012/10/2/296 for reference).
> 
> How about the patch below? Pekka, Christoph, please?
> 
> It makes the lockdep happy again, and obviously removes the deadlock (I 
> tested it).

You can certainly add my Reviewed-by, for whatever that is worth.  ;-)

BTW, with this patch, are you able to dispense with my earlier patch,
or is it still needed?

Thanx, Paul

> From: Jiri Kosina 
> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> 
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
> 
> This opens a possibility for deadlock:
> 
>   CPU 0                               CPU 1
>   kmem_cache_destroy()
>   mutex_lock(slab_mutex)
>                                       _cpu_up()
>   

Re: crash in ocfs2_fast_symlink_readpage in kernel 3.5.0

2012-10-02 Thread Bret Towe
On Sun, Jul 22, 2012 at 1:16 PM, Bret Towe  wrote:
> Just booted a fresh 3.5 kernel and got the following on login via gdm
> on the client computer. I didn't see any crashes yet on any other
> computer, but didn't give it long to try after seeing this.
> Let me know if you need more info.
> This client is running Debian wheezy 64-bit.
>
> Jul 22 12:48:38 ghoststar kernel: [  382.563886] general protection
> fault:  [#1] PREEMPT SMP
> Jul 22 12:48:38 ghoststar kernel: [  382.563918] CPU 0
> Jul 22 12:48:38 ghoststar kernel: [  382.563927] Modules linked in:
> ocfs2 quota_tree jbd2 sch_fq_codel xt_LOG xt_tcpudp nf_conntrack_ipv6
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_mangle ip6table_filter
> ip6_tables x_tables cpufreq_userspace cpufreq_powersave
> cpufreq_conservative binfmt_misc iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi tcp_bic ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm
> ocfs2_nodemanager nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc
> af_packet ocfs2_stack_user dlm sctp crc32c libcrc32c ipv6
> ocfs2_stackglue configfs loop fuse aoe msr joydev snd_hda_codec_hdmi
> snd_hda_codec_realtek evdev snd_hda_intel snd_hda_codec snd_hwdep
> snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_seq_dummy
> snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq
> powernow_k8 mperf kvm_amd kvm snd_seq_device snd_timer psmouse pcspkr
> serio_raw asus_atk0110 snd k10temp soundcore i2c_piix4 button
> processor dm_mod uas usb_storage firewire_ohci r8169 firewire_core
> crc_itu_t [last unloaded: scsi_wait_scan]
> Jul 22 12:48:38 ghoststar kernel: [  382.564352]
> Jul 22 12:48:38 ghoststar kernel: [  382.564354] Pid: 3690, comm:
> fbsetbg Not tainted 3.5.0+ #4 System manufacturer System Product
> Name/M3A78-EM
> Jul 22 12:48:38 ghoststar kernel: [  382.564395] RIP:
> 0010:[]  [] strnlen+0xd/0x40
> Jul 22 12:48:38 ghoststar kernel: [  382.564428] RSP:
> :88011ed31b78  EFLAGS: 00010202
> Jul 22 12:48:38 ghoststar kernel: [  382.564449] RAX: 880122caac00
> RBX: 495441c589455601 RCX: 0f3f
> Jul 22 12:48:38 ghoststar kernel: [  382.564476] RDX: 0001
> RSI: 0f40 RDI: 495441c589455601
> Jul 22 12:48:38 ghoststar kernel: [  382.564503] RBP: 88011ed31b78
> R08: 88011ed31b50 R09: a0588fa0
> Jul 22 12:48:38 ghoststar kernel: [  382.564530] R10: 
> R11: 0001 R12: ea00046ee040
> Jul 22 12:48:38 ghoststar kernel: [  382.564557] R13: 88011e0ef138
> R14: ea00046ee040 R15: a05e2a50
> Jul 22 12:48:38 ghoststar kernel: [  382.564585] FS:
> 7f6632782700() GS:88012fc0()
> knlGS:
> Jul 22 12:48:38 ghoststar kernel: [  382.564616] CS:  0010 DS: 
> ES:  CR0: 8005003b
> Jul 22 12:48:38 ghoststar kernel: [  382.564638] CR2: 7f24b14ea340
> CR3: 00011ee66000 CR4: 07f0
> Jul 22 12:48:38 ghoststar kernel: [  382.564665] DR0: 
> DR1:  DR2: 
> Jul 22 12:48:38 ghoststar kernel: [  382.564692] DR3: 
> DR6: 0ff0 DR7: 0400
> Jul 22 12:48:38 ghoststar kernel: [  382.564719] Process fbsetbg (pid:
> 3690, threadinfo 88011ed3, task 880122382c80)
> Jul 22 12:48:38 ghoststar kernel: [  382.564750] Stack:
> Jul 22 12:48:38 ghoststar kernel: [  382.564758]  88011ed31bc8
> a05e2aac 88011e0ef278 000200da
> Jul 22 12:48:38 ghoststar kernel: [  382.564790]  88011ed31bc8
> 810db189 88011e0ef278 
> Jul 22 12:48:38 ghoststar kernel: [  382.564822]  88011e0ef278
> 000200da 88011ed31c28 810db225
> Jul 22 12:48:38 ghoststar kernel: [  382.564854] Call Trace:
> Jul 22 12:48:38 ghoststar kernel: [  382.564892]  []
> ocfs2_fast_symlink_readpage+0x5c/0x1a0 [ocfs2]
> Jul 22 12:48:38 ghoststar kernel: [  382.564922]  []
> ? add_to_page_cache_lru+0x29/0x40
> Jul 22 12:48:38 ghoststar kernel: [  382.564947]  []
> do_read_cache_page+0x85/0x180
> Jul 22 12:48:38 ghoststar kernel: [  382.564971]  []
> read_cache_page_async+0x14/0x20
> Jul 22 12:48:38 ghoststar kernel: [  382.564995]  []
> read_cache_page+0x9/0x20
> Jul 22 12:48:38 ghoststar kernel: [  382.565018]  []
> page_getlink.isra.21+0x25/0x80
> Jul 22 12:48:38 ghoststar kernel: [  382.565042]  []
> page_follow_link_light+0x21/0x40
> Jul 22 12:48:38 ghoststar kernel: [  382.565066]  []
> link_path_walk+0x48b/0x950
> Jul 22 12:48:38 ghoststar kernel: [  382.565089]  []
> path_lookupat+0x52/0x7c0
> Jul 22 12:48:38 ghoststar kernel: [  382.565112]  []
> ? xfs_file_aio_read+0x187/0x370
> Jul 22 12:48:38 ghoststar kernel: [  382.565136]  []
> do_path_lookup+0x2c/0xc0
> Jul 22 12:48:38 ghoststar kernel: [  382.565159]  []
> ? getname_flags+0x4e/0xf0
> Jul 22 12:48:38 ghoststar kernel: [  382.565181]  []
> user_path_at_empty+0x58/0xa0
> Jul 22 12:48:38 ghoststar kernel: [  382.565205]  []
> ? smack_cred_prepare+0x46/0x70
> Jul 22 12:48:38 ghoststar 

Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")

2012-10-02 Thread Srivatsa S. Bhat
On 10/03/2012 03:47 AM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> 
>> I don't see how this circular locking dependency can occur. If you are using
>> SLUB, kmem_cache_destroy() releases slab_mutex before it calls rcu_barrier().
>> If you are using SLAB, kmem_cache_destroy() wraps its whole operation inside
>> get/put_online_cpus(), which means it cannot run concurrently with a hotplug
>> operation such as cpu_up(). So, I'm rather puzzled at this lockdep splat.
> 
> I am using SLAB here.
> 
> The scenario I think is very well possible:
> 
> 
>   CPU 0                               CPU 1
>   kmem_cache_destroy()

What about the get_online_cpus() right here at CPU0 before
calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
on CPU1?? I still don't get it... :(

(kmem_cache_destroy() uses get/put_online_cpus() around acquiring
and releasing slab_mutex).

Regards,
Srivatsa S. Bhat

>   mutex_lock(slab_mutex)
>                                       _cpu_up()
>                                       cpu_hotplug_begin()
>                                       mutex_lock(cpu_hotplug.lock)
>   rcu_barrier()
>   _rcu_barrier()
>   get_online_cpus()
>   mutex_lock(cpu_hotplug.lock)
>    (blocks, CPU 1 has the mutex)
>                                       __cpu_notify()
>                                       mutex_lock(slab_mutex)
> 
> Deadlock.
> 
> Right?
> 
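The complaint in this thread is an ABBA ordering between slab_mutex and cpu_hotplug.lock. A checker in the spirit of lockdep reasons about the order in which lock classes are taken, not about one particular interleaving, so it reports the inversion as soon as both orders have been observed. The toy model below is a hypothetical user-space sketch, not lockdep: it ignores the outer get_online_cpus() refcount Srivatsa points out, so it says nothing about whether this exact interleaving can happen. It only shows why the inversion is reported and why dropping slab_mutex before rcu_barrier(), as in the proposed patch, removes the slab_mutex -> cpu_hotplug.lock edge. All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The two lock classes involved in the report (toy model, not lockdep). */
enum lock_class { SLAB_MUTEX, CPU_HOTPLUG_LOCK, NR_CLASSES };

/* order[a][b] becomes true once some task takes b while holding a. */
static bool order[NR_CLASSES][NR_CLASSES];

struct task {
	enum lock_class held[NR_CLASSES];
	int nheld;
};

static void reset_order(void)
{
	memset(order, 0, sizeof(order));
}

/* Is class b reachable from class a through recorded orderings? */
static bool reaches(int a, int b, bool *seen)
{
	if (a == b)
		return true;
	seen[a] = true;
	for (int n = 0; n < NR_CLASSES; n++)
		if (order[a][n] && !seen[n] && reaches(n, b, seen))
			return true;
	return false;
}

/* Record an acquisition; returns false if it closes an ordering cycle. */
static bool model_lock(struct task *t, enum lock_class c)
{
	bool ok = true;

	for (int i = 0; i < t->nheld; i++) {
		bool seen[NR_CLASSES] = { false };

		/* held[i] -> c is inverted if c already (transitively) precedes held[i]. */
		if (reaches(c, t->held[i], seen))
			ok = false;
		order[t->held[i]][c] = true;
	}
	assert(t->nheld < NR_CLASSES);
	t->held[t->nheld++] = c;
	return ok;
}

/* Release the most recently acquired lock (LIFO only, for simplicity). */
static void model_unlock(struct task *t)
{
	assert(t->nheld > 0);
	t->nheld--;
}

/* CPU 1: _cpu_up() holds cpu_hotplug.lock while the slab notifier takes slab_mutex. */
static bool cpu_up_path(void)
{
	struct task t = { .nheld = 0 };
	bool ok = model_lock(&t, CPU_HOTPLUG_LOCK);      /* cpu_hotplug_begin() */

	ok = model_lock(&t, SLAB_MUTEX) && ok;           /* __cpu_notify() -> slab */
	model_unlock(&t);
	model_unlock(&t);
	return ok;
}

/* CPU 0: kmem_cache_destroy(); early_unlock models the proposed patch. */
static bool destroy_path(bool early_unlock)
{
	struct task t = { .nheld = 0 };
	bool ok = model_lock(&t, SLAB_MUTEX);

	if (early_unlock)
		model_unlock(&t);                        /* mutex_unlock(&slab_mutex) */
	ok = model_lock(&t, CPU_HOTPLUG_LOCK) && ok;     /* rcu_barrier() -> get_online_cpus() */
	model_unlock(&t);
	if (!early_unlock)
		model_unlock(&t);
	return ok;
}
```

After reset_order(), running cpu_up_path() and then destroy_path(false) reports the inversion (the second call returns false), whereas destroy_path(true) does not, regardless of which path runs first.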




[PATCH v4 1/2] hwmon: (ads7828) driver cleanup

2012-10-02 Thread Vivien Didelot
* Remove module parameters, add an ads7828_platform_data;
* Move driver declaration to avoid adding function prototypes;
* Remove unused macros;
* Coding Style fixes.

Signed-off-by: Vivien Didelot 
---
 Documentation/hwmon/ads7828   |  31 +++--
 drivers/hwmon/ads7828.c   | 206 --
 include/linux/platform_data/ads7828.h |  29 +
 3 files changed, 147 insertions(+), 119 deletions(-)
 create mode 100644 include/linux/platform_data/ads7828.h

diff --git a/Documentation/hwmon/ads7828 b/Documentation/hwmon/ads7828
index 2bbebe6..b35668c 100644
--- a/Documentation/hwmon/ads7828
+++ b/Documentation/hwmon/ads7828
@@ -5,21 +5,32 @@ Supported chips:
   * Texas Instruments/Burr-Brown ADS7828
 Prefix: 'ads7828'
 Addresses scanned: I2C 0x48, 0x49, 0x4a, 0x4b
-Datasheet: Publicly available at the Texas Instruments website :
+Datasheet: Publicly available at the Texas Instruments website:
http://focus.ti.com/lit/ds/symlink/ads7828.pdf
 
 Authors:
 Steve Hardy 
 
-Module Parameters
--
-
-* se_input: bool (default Y)
-  Single ended operation - set to N for differential mode
-* int_vref: bool (default Y)
-  Operate with the internal 2.5V reference - set to N for external reference
-* vref_mv: int (default 2500)
-  If using an external reference, set this to the reference voltage in mV
+Platform data
+-
+
+The ads7828 driver accepts an optional ads7828_platform_data structure (defined
+in include/linux/platform_data/ads7828.h). If no structure is provided, the
+configuration defaults to single ended operation and internal vref (2.5V).
+
+The structure fields are:
+
+* diff_input: bool
+  Differential operation - set to true for differential mode,
+  false for default single ended mode.
+* ext_vref: bool
+  External reference - set to true if it operates with an external reference,
+  false for default internal reference.
+* vref_mv: int
+  Voltage reference - if using an external reference, set this to the reference
+  voltage in mV, otherwise, it will default to the internal value (2500mV).
+  This value will be bounded with limits accepted by the chip, described in the
+  datasheet.
 
 Description
 ---
diff --git a/drivers/hwmon/ads7828.c b/drivers/hwmon/ads7828.c
index 1f9e8af..756bd1b 100644
--- a/drivers/hwmon/ads7828.c
+++ b/drivers/hwmon/ads7828.c
@@ -6,7 +6,7 @@
  *
  * Written by Steve Hardy 
  *
- * Datasheet available at: http://focus.ti.com/lit/ds/symlink/ads7828.pdf
+ * For further information, see the Documentation/hwmon/ads7828 file.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -23,63 +23,48 @@
  * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
-#include 
-#include 
-#include 
-#include 
-#include 
+#include 
 #include 
 #include 
-#include 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
+#include 
 
 /* The ADS7828 registers */
-#define ADS7828_NCH 8 /* 8 channels of 12-bit A-D supported */
-#define ADS7828_CMD_SD_SE 0x80 /* Single ended inputs */
-#define ADS7828_CMD_SD_DIFF 0x00 /* Differential inputs */
-#define ADS7828_CMD_PD0 0x0 /* Power Down between A-D conversions */
-#define ADS7828_CMD_PD1 0x04 /* Internal ref OFF && A-D ON */
-#define ADS7828_CMD_PD2 0x08 /* Internal ref ON && A-D OFF */
-#define ADS7828_CMD_PD3 0x0C /* Internal ref ON && A-D ON */
-#define ADS7828_INT_VREF_MV 2500 /* Internal vref is 2.5V, 2500mV */
+#define ADS7828_NCH             8       /* 8 channels supported */
+#define ADS7828_CMD_SD_SE       0x80    /* Single ended inputs */
+#define ADS7828_CMD_PD1         0x04    /* Internal vref OFF && A/D ON */
+#define ADS7828_CMD_PD3         0x0C    /* Internal vref ON && A/D ON */
+#define ADS7828_INT_VREF_MV     2500    /* Internal vref is 2.5V, 2500mV */
+#define ADS7828_EXT_VREF_MV_MIN 50      /* External vref min value 0.05V */
+#define ADS7828_EXT_VREF_MV_MAX 5250    /* External vref max value 5.25V */
 
 /* Addresses to scan */
-static const unsigned short normal_i2c[] = { 0x48, 0x49, 0x4a, 0x4b,
+static const unsigned short ads7828_normal_i2c[] = { 0x48, 0x49, 0x4a, 0x4b,
I2C_CLIENT_END };
 
-/* Module parameters */
-static bool se_input = 1; /* Default is SE, 0 == diff */
-static bool int_vref = 1; /* Default is internal ref ON */
-static int vref_mv = ADS7828_INT_VREF_MV; /* set if vref != 2.5V */
-module_param(se_input, bool, S_IRUGO);
-module_param(int_vref, bool, S_IRUGO);
-module_param(vref_mv, int, S_IRUGO);
-
-/* Global Variables */
-static u8 ads7828_cmd_byte; /* cmd byte without channel bits */
-static unsigned int ads7828_lsb_resol; /* resolution of the ADC sample lsb */
-
-/* Each client has this additional data */
+/* Client specific data */
 struct ads7828_data {
struct device *hwmon_dev;
-   struct mutex update_lock; /* mutex protect updates */
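The defaulting rules in the documentation hunk of this patch (no platform data means single-ended input and the internal 2.5 V reference; an external vref_mv is bounded by the chip limits) can be sketched in plain C. The constants are copied from the patch. `struct model_pdata`, `ads7828_model_config()` and `ads7828_model_cmd()` are hypothetical, and the channel-bit layout in the last helper is our reading of the datasheet's single-ended channel table (C2 selects odd/even, C1:C0 the pair), not code taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the #defines in the patch. */
#define ADS7828_CMD_SD_SE        0x80
#define ADS7828_CMD_PD1          0x04
#define ADS7828_CMD_PD3          0x0C
#define ADS7828_INT_VREF_MV      2500
#define ADS7828_EXT_VREF_MV_MIN  50
#define ADS7828_EXT_VREF_MV_MAX  5250

/* Mirrors the fields documented for struct ads7828_platform_data. */
struct model_pdata {
	bool diff_input;
	bool ext_vref;
	int vref_mv;
};

struct model_config {
	uint8_t cmd_byte;  /* command byte without channel bits */
	int vref_mv;       /* effective reference voltage in mV */
};

/* Hypothetical helper: apply the documented defaults; pdata may be NULL. */
static struct model_config ads7828_model_config(const struct model_pdata *pdata)
{
	bool diff_input = pdata ? pdata->diff_input : false;
	bool ext_vref = pdata ? pdata->ext_vref : false;
	struct model_config cfg = { 0, ADS7828_INT_VREF_MV };

	if (ext_vref && pdata->vref_mv) {
		cfg.vref_mv = pdata->vref_mv;
		/* Bound the external reference to the limits from the datasheet. */
		if (cfg.vref_mv < ADS7828_EXT_VREF_MV_MIN)
			cfg.vref_mv = ADS7828_EXT_VREF_MV_MIN;
		if (cfg.vref_mv > ADS7828_EXT_VREF_MV_MAX)
			cfg.vref_mv = ADS7828_EXT_VREF_MV_MAX;
	}

	cfg.cmd_byte = diff_input ? 0 : ADS7828_CMD_SD_SE;
	/* PD1: internal reference off, converter on; PD3: both on. */
	cfg.cmd_byte |= ext_vref ? ADS7828_CMD_PD1 : ADS7828_CMD_PD3;
	return cfg;
}

/* Channel select bits C2..C0 in bits 6:4 (assumed mapping, see above). */
static uint8_t ads7828_model_cmd(uint8_t cmd_byte, int ch)
{
	uint8_t sel;

	assert(ch >= 0 && ch < 8);
	sel = (uint8_t)((ch >> 1) | ((ch & 1) << 2));
	return cmd_byte | (uint8_t)(sel << 4);
}
```

With no platform data this yields command byte 0x8C and 2500 mV; a board asking for an external 6000 mV reference in differential mode is clamped to 5250 mV with command byte 0x04.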

[PATCH v4 2/2] hwmon: (ads7828) add support for ADS7830

2012-10-02 Thread Vivien Didelot
From: Guillaume Roguez 

The ADS7830 device is almost the same as the ADS7828,
except that it does 8-bit sampling, instead of 12-bit.
This patch extends the ads7828 driver to support this chip.

Signed-off-by: Guillaume Roguez 
Signed-off-by: Vivien Didelot 
---
 Documentation/hwmon/ads7828 | 12 ++--
 drivers/hwmon/Kconfig   |  7 ---
 drivers/hwmon/ads7828.c | 45 ++---
 3 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/Documentation/hwmon/ads7828 b/Documentation/hwmon/ads7828
index b35668c..51eab52 100644
--- a/Documentation/hwmon/ads7828
+++ b/Documentation/hwmon/ads7828
@@ -8,8 +8,15 @@ Supported chips:
 Datasheet: Publicly available at the Texas Instruments website:
http://focus.ti.com/lit/ds/symlink/ads7828.pdf
 
+  * Texas Instruments ADS7830
+Prefix: 'ads7830'
+Addresses scanned: I2C 0x48, 0x49, 0x4a, 0x4b
+Datasheet: Publicly available at the Texas Instruments website:
+   http://focus.ti.com/lit/ds/symlink/ads7830.pdf
+
 Authors:
 Steve Hardy 
+Guillaume Roguez 
 
 Platform data
 -
@@ -35,9 +42,10 @@ The structure fields are:
 Description
 ---
 
-This driver implements support for the Texas Instruments ADS7828.
+This driver implements support for the Texas Instruments ADS7828 and ADS7830.
 
-This device is a 12-bit 8-channel A-D converter.
+The ADS7828 device is a 12-bit 8-channel A/D converter, while the ADS7830 does
+8-bit sampling.
 
 It can operate in single ended mode (8 +ve inputs) or in differential mode,
 where 4 differential pairs can be measured.
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 83e3e9d..960c8c5 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1060,11 +1060,12 @@ config SENSORS_ADS1015
  will be called ads1015.
 
 config SENSORS_ADS7828
-   tristate "Texas Instruments ADS7828"
+   tristate "Texas Instruments ADS7828 and compatibles"
depends on I2C
help
- If you say yes here you get support for Texas Instruments ADS7828
- 12-bit 8-channel ADC device.
+ If you say yes here you get support for Texas Instruments ADS7828 and
+ ADS7830 8-channel A/D converters. ADS7828 resolution is 12-bit, while
+ it is 8-bit on ADS7830.
 
  This driver can also be built as a module.  If so, the module
  will be called ads7828.
diff --git a/drivers/hwmon/ads7828.c b/drivers/hwmon/ads7828.c
index 756bd1b..638c969 100644
--- a/drivers/hwmon/ads7828.c
+++ b/drivers/hwmon/ads7828.c
@@ -1,11 +1,13 @@
 /*
- * ads7828.c - lm_sensors driver for ads7828 12-bit 8-channel ADC
+ * ads7828.c - driver for TI ADS7828 8-channel A/D converter and compatibles
  * (C) 2007 EADS Astrium
  *
  * This driver is based on the lm75 and other lm_sensors/hwmon drivers
  *
  * Written by Steve Hardy 
  *
+ * ADS7830 support, by Guillaume Roguez 
+ *
  * For further information, see the Documentation/hwmon/ads7828 file.
  *
  * This program is free software; you can redistribute it and/or modify
@@ -43,6 +45,9 @@
 #define ADS7828_EXT_VREF_MV_MIN 50      /* External vref min value 0.05V */
 #define ADS7828_EXT_VREF_MV_MAX 5250    /* External vref max value 5.25V */
 
+/* List of supported devices */
+enum ads7828_chips { ads7828, ads7830 };
+
 /* Addresses to scan */
 static const unsigned short ads7828_normal_i2c[] = { 0x48, 0x49, 0x4a, 0x4b,
I2C_CLIENT_END };
@@ -59,6 +64,7 @@ struct ads7828_data {
unsigned int vref_mv;   /* voltage reference value */
u8 cmd_byte;/* Command byte without channel bits */
unsigned int lsb_resol; /* Resolution of the ADC sample LSB */
+   s32 (*read_channel)(const struct i2c_client *client, u8 command);
 };
 
 /* Command byte C2,C1,C0 - see datasheet */
@@ -82,8 +88,7 @@ static struct ads7828_data *ads7828_update_device(struct device *dev)
 
for (ch = 0; ch < ADS7828_NCH; ch++) {
u8 cmd = ads7828_cmd_byte(data->cmd_byte, ch);
-   data->adc_input[ch] =
-   i2c_smbus_read_word_swapped(client, cmd);
+   data->adc_input[ch] = data->read_channel(client, cmd);
}
data->last_updated = jiffies;
data->valid = true;
@@ -147,6 +152,7 @@ static int ads7828_detect(struct i2c_client *client,
 {
struct i2c_adapter *adapter = client->adapter;
u8 default_cmd_byte = ADS7828_CMD_SD_SE | ADS7828_CMD_PD3;
+   bool is_8bit = false;
int ch;
 
/* Check we have a valid client */
@@ -158,7 +164,9 @@ static int ads7828_detect(struct i2c_client *client,
 * dedicated register so attempt to sanity check using knowledge of
 * the chip
 * - Read from the 8 channel addresses
-* - Check the top 4 bits of each result are not set (12 data bits)
+
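This patch replaces the hard-coded i2c_smbus_read_word_swapped() call with a per-chip read_channel hook and relies on the ADS7830 being 8-bit rather than 12-bit. The user-space sketch below shows that dispatch pattern together with the resulting LSB size. The hunk that actually sets read_channel and lsb_resol is not included in this excerpt, so the fake read functions, the `model_*` names and the round-to-nearest computation of lsb_resol (in microvolts) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an SMBus read routine, as in the read_channel hook. */
typedef int32_t (*model_read_fn)(const void *client, uint8_t command);

struct model_chip {
	model_read_fn read_channel;
	unsigned int lsb_resol;  /* microvolts per LSB */
};

enum model_chips { MODEL_ADS7828, MODEL_ADS7830 };

/* Fake bus reads returning a full-scale sample for each resolution. */
static int32_t fake_read_word(const void *client, uint8_t command)
{
	(void)client; (void)command;
	return 0x0FFF;  /* 12-bit full scale */
}

static int32_t fake_read_byte(const void *client, uint8_t command)
{
	(void)client; (void)command;
	return 0xFF;    /* 8-bit full scale */
}

static struct model_chip model_setup(enum model_chips chip, unsigned int vref_mv)
{
	struct model_chip c;
	unsigned int steps = chip == MODEL_ADS7828 ? 4096 : 256;

	/* Round to nearest: (vref in uV) / number of codes. */
	c.lsb_resol = (vref_mv * 1000 + steps / 2) / steps;
	c.read_channel = chip == MODEL_ADS7828 ? fake_read_word : fake_read_byte;
	return c;
}

/* Read one channel through the per-chip hook and convert to millivolts. */
static unsigned int model_read_mv(const struct model_chip *c, uint8_t cmd)
{
	int32_t raw;

	assert(c->read_channel != NULL);
	raw = c->read_channel(NULL, cmd);
	return (unsigned int)((uint64_t)raw * c->lsb_resol / 1000);
}
```

With the internal 2.5 V reference, a full-scale sample reads as 2497 mV on the 12-bit model and 2490 mV on the 8-bit one, reflecting the coarser 8-bit step.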

RE: [PATCH] Input: Add new driver into Input Subsystem for Synaptics DS4 touchscreen I2C devices

2012-10-02 Thread Alexandra Chin
Adding Chris to the mail loop.

Best Regards,
Alexandra Chin


 Synaptics Hong Kong Limited, Taiwan Branch
 5F., No. 501, Sec. 2 Tiding Blvd., Neihu District, 
 Taipei City 114, Taiwan
Office: 886.2.8752.5700  ext:652
Email:alexandra.c...@synaptics.com.tw


-Original Message-
From: Alexandra Chin 
Sent: Tuesday, October 02, 2012 3:58 PM
To: Dmitry Torokhov; 'Henrik Rydberg'
Cc: Linux Kernel; Linux Input; Linus Walleij; Naveen Kumar Gaddipati; Mahesh Srinivasan; Alex Chang; Scott Lin
Subject: [PATCH] Input: Add new driver into Input Subsystem for Synaptics DS4 touchscreen I2C devices

Hi Henrik/Dmitry,

We are working on a product specific driver for Synaptics DS4 I2C touchscreen
devices. It was submitted on Sept. 16, 2012, but has not been reviewed.
(http://lkml.org/lkml/2012/9/16/24). 
We found several warnings after running scripts/checkpatch.pl, so an
updated patch is attached.

As Chris says in https://lkml.org/lkml/2012/9/19/505, this driver will enable
us to support all our customers effectively and provide our customers with 
the best flexibility possible.
Please help review attached patch, and we really appreciate your feedback :-)



The Synaptics DS4 touchscreen driver is a generic driver supporting the I2C
protocol for the Synaptics Design Studio 4 (DS4) family of touchscreen
controllers, which includes the following:

- S32xX series
- S730X series
- S22xx series

The driver supports multifinger pointing functionality and power management.
The driver is based on the original work submitted by
Linus Walleij  and
Naveen Kumar Gaddipati .

This patch is against the v3.1-rc9 tag of Dmitry Torokhov's kernel tree,
commit bd68dfe0071b50bc69416a92ee22b63d1cc33a3b.

Changes in this patch:
 - modified:   drivers/input/touchscreen/Kconfig
 - modified:   drivers/input/touchscreen/Makefile
 - new file:   drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.c
 - new file:   drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.h
 - new file:   include/linux/input/synaptics_dsx.h

This patch is functionally tested on omap beagleboard-xm.

We will continue to maintain and support this driver officially, including
making updates, as well as supporting future Touch Controller revisions
from Synaptics.

Any comments are much appreciated.

Alexandra Chin

Signed-off-by: Alexandra Chin 
---
 drivers/input/touchscreen/Kconfig  |   12 +
 drivers/input/touchscreen/Makefile |1 +
 drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.c | 1083 
 drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.h |   94 ++
 include/linux/input/synaptics_dsx.h|   49 +
 5 files changed, 1239 insertions(+), 0 deletions(-)
 create mode 100644 drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.c
 create mode 100644 drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.h
 create mode 100644 include/linux/input/synaptics_dsx.h

diff --git a/drivers/input/touchscreen/Kconfig b/drivers/input/touchscreen/Kconfig
index 1ba232c..431c72b 100644
--- a/drivers/input/touchscreen/Kconfig
+++ b/drivers/input/touchscreen/Kconfig
@@ -900,4 +900,16 @@ config TOUCHSCREEN_TPS6507X
  To compile this driver as a module, choose M here: the
  module will be called tps6507x_ts.
 
+config TOUCHSCREEN_SYNAPTICS_DS4_RMI4_I2C
+   tristate "Synaptics ds4 i2c touchscreen"
+   depends on I2C
+   help
+ Say Y here if you have a Synaptics DS4 I2C touchscreen
+ connected to your system.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called synaptics_ds4_rmi4_i2c.
+
 endif
diff --git a/drivers/input/touchscreen/Makefile b/drivers/input/touchscreen/Makefile
index 178eb12..61f5f22 100644
--- a/drivers/input/touchscreen/Makefile
+++ b/drivers/input/touchscreen/Makefile
@@ -73,3 +73,4 @@ obj-$(CONFIG_TOUCHSCREEN_WM97XX_MAINSTONE)	+= mainstone-wm97xx.o
 obj-$(CONFIG_TOUCHSCREEN_WM97XX_ZYLONITE)  += zylonite-wm97xx.o
 obj-$(CONFIG_TOUCHSCREEN_W90X900)  += w90p910_ts.o
 obj-$(CONFIG_TOUCHSCREEN_TPS6507X) += tps6507x-ts.o
+obj-$(CONFIG_TOUCHSCREEN_SYNAPTICS_DS4_RMI4_I2C)	+= synaptics_ds4_rmi4_i2c.o
diff --git a/drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.c b/drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.c
new file mode 100644
index 000..6e85d97
--- /dev/null
+++ b/drivers/input/touchscreen/synaptics_ds4_rmi4_i2c.c
@@ -0,0 +1,1083 @@
+/*
+ * Synaptics DS4 touchscreen driver
+ *
+ * Copyright (C) 2012 Synaptics Incorporated
+ *
+ * Copyright (C) 2012 Alexandra Chin 
+ * Copyright (C) 2012 Scott Lin 
+ * Copyright (C) 2010 Js HA 
+ * Copyright (C) 2010 Naveen Kumar G 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it 

Re: [PATCH] block: makes bio_split support bio without data

2012-10-02 Thread NeilBrown
On Tue, 2 Oct 2012 14:09:23 -0700 Kent Overstreet 
wrote:

> On Tue, Oct 02, 2012 at 04:22:01PM +1000, NeilBrown wrote:
> > On Fri, 28 Sep 2012 09:23:43 -0700 Kent Overstreet 
> > wrote:
> > 
> > > On Mon, Sep 24, 2012 at 02:56:39PM +1000, NeilBrown wrote:
> > > > 
> > > > Hi Jens,
> > > >  this patch has been sitting in my -next tree for a little while and I was
> > > >  hoping for it to go in for the next merge window.
> > > >  It simply allows bio_split() to be used on bios without a payload, such as
> > > >  'discard'.
> > > 
> > > Thing is, at some point in the stack a discard bio is going to have data
> > > - see blk_add_request_payload(), and it used to be that the single page was
> > > added to discard bios above generic_make_request(), in
> > > blkdev_issue_discard() or whatever it's called.
> > > 
> > > So while I'm sure your code works, it's just a fragile way of doing it.
> > > 
> > > There's also other types of bios where bi_size has nothing to do with
> > > the amount of data in the bi_io_vec - actually I think this is a new
> > > thing, since Martin Petersen just added REQ_WRITE_SAME and I don't think
> > > there were any other instances besides REQ_DISCARD before.
> > > 
> > > So my preference would be defining a mask (REQ_DISCARD|REQ_WRITE_SAME),
> > > and if bio->bi_rw & that mask is true, just duplicate the bvec or
> > > whatever.
> > 
> > Hi Kent,
> >  I'm afraid I don't see the relevance of your comments to the patch.
> > 
> > The current bio_split code can successfully split a bio with zero or one
> > bi_vec entry.  If there are more than that, we cannot split.
> > 
> > How does it matter whether the bio is a DISCARD or a WRITE_SAME or a DATA or
> > whatever?
> 
> Hrm, I think I didn't explain very well.
> 
> After your change, if bio->bi_vcnt != 0, then it splits the bvec.
> 
> The trouble is that discard bios do under certain circumstances have
> bio->bi_vcnt != 0, in which case splitting the bvec is the wrong thing
> to do - first_sectors will quite likely be bigger than the bvec.
> 
> In practice this isn't currently a problem for discard bios, because
> since Christoph added blk_add_request_payload(), discard bios won't have
> that bvec added until they hit the scsi layer which will be after any
> splitting. But this is a fairly recent and unrelated change, and IMO not
> the kind of behaviour I'd want to rely on.
> 
> WRITE_SAME is a problem for the same reason - bio_sectors(bio) may be
> large, but the bio will always have a single bvec and splitting the bvec
> is always the wrong thing to do for WRITE_SAME.
> 
> So, I think it makes more sense to make the splitting conditional on
> !(bio->bi_rw & (REQ_DISCARD|REQ_WRITE_SAME)), in addition to
> bio->bi_vcnt == 1.
> 
> ..Does that make more sense?

Yes, that does make some more sense, thanks.  However it doesn't convince me
that we need to change the patch.

I guess my position is that once we get to this code, we absolutely have to
split the bio - it maps to two separate devices in a RAID0 or similar so
not-splitting is not an option.

Maybe various md devices need to detect and reject REQ_DISCARD requests that
have a payload and REQ_WRITE_SAME requests?  Or would they need to explicitly
set a flag to say they accept them?

So maybe there is something to fix, but I don't think it is in bio_split,
except maybe to add a WARN_ON??

Thanks,
NeilBrown
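Neil's point is that once a bio crosses a RAID0 chunk boundary, the two halves belong to different member devices, so a split is mandatory. The split point ("first_sectors" in Kent's mail) is the distance to the end of the current chunk. The sketch below computes it for a power-of-two chunk size; it is not md's raid0 code, and the helper names are hypothetical. It also shows where Kent's concern bites: for a WRITE_SAME bio, a first_sectors value such as 28 sectors (14 KiB) is unrelated to, and can be larger than, the single bvec it carries.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Sectors from 'sector' to the end of its chunk (power-of-two chunk size). */
static unsigned int sectors_to_chunk_end(sector_t sector, unsigned int chunk_sects)
{
	assert(chunk_sects && !(chunk_sects & (chunk_sects - 1)));
	return chunk_sects - (unsigned int)(sector & (chunk_sects - 1));
}

/* Size of the first half if the bio must be split, or 0 if it fits. */
static unsigned int raid0_first_sectors(sector_t sector, unsigned int nr_sectors,
					unsigned int chunk_sects)
{
	unsigned int room = sectors_to_chunk_end(sector, chunk_sects);

	return nr_sectors > room ? room : 0;
}
```

For a 128-sector chunk, a 64-sector bio starting at sector 100 has to be split after 28 sectors, while a 10-sector bio starting at sector 130 fits in its chunk.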




Re: [PATCH 0/4] dw_dmac: few cleanups to the driver

2012-10-02 Thread viresh kumar
On Tue, Oct 2, 2012 at 5:11 PM, Andy Shevchenko
 wrote:
> Here are a few cleanups to the driver which have already been acked and reviewed.
> I decided to split the last series into two parts. This is the first part.

Looks good.

Acked-by: Viresh Kumar 


Re: [PATCH v3 1/2] hwmon: (ads7828) driver cleanup

2012-10-02 Thread Guenter Roeck
On Tue, Oct 02, 2012 at 10:28:13PM -0400, Vivien Didelot wrote:
> Hi Guenter,
> 
> Some of those changes weren't in the mainline tree a few days ago.
> I'll cherry-pick them and send an update very soon :-)
> 
True, but they have been in -next for the last two months or so ...

Guenter


Re: [PATCH v1] i2c-hid: introduce HID over i2c specification implementation

2012-10-02 Thread Jian-Jhong Ding
Hi Benjamin,

I have one little question about __i2chid_command(), please see below.

"benjamin.tissoires"  writes:
> From: Benjamin Tissoires 
>
> Microsoft published the protocol specification of HID over i2c:
> http://msdn.microsoft.com/en-us/library/windows/hardware/hh852380.aspx
>
> This patch introduces an implementation of this protocol.
>
> This implementation does not include the ACPI part of the specification.
> This will come when ACPI 5.0 devices become available.
>
> Once the ACPI part is done, OEMs will not have to declare HID over I2C
> devices in their platform-specific drivers.
>
> Signed-off-by: Benjamin Tissoires 
> ---
>
> Hi,
>
> this is finally my first implementation of HID over I2C.
>
> This has been tested on an Elan Microelectronics HID over I2C device, with
> a Samsung Exynos 4412 board.
>
> Any comments are welcome.
>
> Cheers,
> Benjamin
>
>  drivers/i2c/Kconfig         |    8 +
>  drivers/i2c/Makefile        |    1 +
>  drivers/i2c/i2c-hid.c       | 1027 +++
>  include/linux/i2c/i2c-hid.h |   35 ++
>  4 files changed, 1071 insertions(+)
>  create mode 100644 drivers/i2c/i2c-hid.c
>  create mode 100644 include/linux/i2c/i2c-hid.h
>
> diff --git a/drivers/i2c/Kconfig b/drivers/i2c/Kconfig
> index 5a3bb3d..5adf65a 100644
> --- a/drivers/i2c/Kconfig
> +++ b/drivers/i2c/Kconfig
> @@ -47,6 +47,14 @@ config I2C_CHARDEV
> This support is also available as a module.  If so, the module 
> will be called i2c-dev.
>  
> +config I2C_HID
> + tristate "HID over I2C bus"
> + help
> +   Say Y here to use the HID over i2c protocol implementation.
> +
> +   This support is also available as a module.  If so, the module
> +   will be called i2c-hid.
> +
>  config I2C_MUX
>   tristate "I2C bus multiplexing support"
>   help
> diff --git a/drivers/i2c/Makefile b/drivers/i2c/Makefile
> index beee6b2..8f38116 100644
> --- a/drivers/i2c/Makefile
> +++ b/drivers/i2c/Makefile
> @@ -6,6 +6,7 @@ obj-$(CONFIG_I2C_BOARDINFO)   += i2c-boardinfo.o
>  obj-$(CONFIG_I2C)            += i2c-core.o
>  obj-$(CONFIG_I2C_SMBUS)      += i2c-smbus.o
>  obj-$(CONFIG_I2C_CHARDEV)    += i2c-dev.o
> +obj-$(CONFIG_I2C_HID)        += i2c-hid.o
>  obj-$(CONFIG_I2C_MUX)        += i2c-mux.o
>  obj-y                        += algos/ busses/ muxes/
>  
> diff --git a/drivers/i2c/i2c-hid.c b/drivers/i2c/i2c-hid.c
> new file mode 100644
> index 000..eb17d8c
> --- /dev/null
> +++ b/drivers/i2c/i2c-hid.c
> @@ -0,0 +1,1027 @@
> +/*
> + * HID over I2C protocol implementation
> + *
> + * Copyright (c) 2012 Benjamin Tissoires 
> + * Copyright (c) 2012 Ecole Nationale de l'Aviation Civile, France
> + *
> + * This code is partly based on "USB HID support for Linux":
> + *
> + *  Copyright (c) 1999 Andreas Gal
> + *  Copyright (c) 2000-2005 Vojtech Pavlik 
> + *  Copyright (c) 2005 Michael Haboustak  for Concept2, Inc
> + *  Copyright (c) 2007-2008 Oliver Neukum
> + *  Copyright (c) 2006-2010 Jiri Kosina
> + *
> + * This file is subject to the terms and conditions of the GNU General Public
> + * License.  See the file COPYING in the main directory of this archive for
> + * more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#define DRIVER_NAME  "i2chid"
> +#define DRIVER_DESC  "HID over I2C core driver"
> +
> +#define I2C_HID_COMMAND_TRIES  3
> +
> +/* flags */
> +#define I2C_HID_STARTED        (1 << 0)
> +#define I2C_HID_OUT_RUNNING    (1 << 1)
> +#define I2C_HID_IN_RUNNING     (1 << 2)
> +#define I2C_HID_RESET_PENDING  (1 << 3)
> +#define I2C_HID_SUSPENDED      (1 << 4)
> +
> +#define I2C_HID_PWR_ON         0x00
> +#define I2C_HID_PWR_SLEEP      0x01
> +
> +/* debug option */
> +static bool debug = false;
> +module_param(debug, bool, 0444);
> +MODULE_PARM_DESC(debug, "print a lot of debug information");
> +
> +struct i2chid_desc {
> + __le16 wHIDDescLength;
> + __le16 bcdVersion;
> + __le16 wReportDescLength;
> + __le16 wReportDescRegister;
> + __le16 wInputRegister;
> + __le16 wMaxInputLength;
> + __le16 wOutputRegister;
> + __le16 wMaxOutputLength;
> + __le16 wCommandRegister;
> + __le16 wDataRegister;
> + __le16 wVendorID;
> + __le16 wProductID;
> + __le16 wVersionID;
> +} __packed;
> +
> +struct i2chid_cmd {
> + enum {
> + /* fetch HID descriptor */
> + HID_DESCR_CMD,
> +
> + /* fetch report descriptors */
> + HID_REPORT_DESCR_CMD,
> +
> + /* commands */
> + HID_RESET_CMD,
> + HID_GET_REPORT_CMD,
> + HID_SET_REPORT_CMD,
> + HID_GET_IDLE_CMD,
> + HID_SET_IDLE_CMD,
> + HID_GET_PROTOCOL_CMD,
> + HID_SET_PROTOCOL_CMD,
> +   
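struct i2chid_desc is __packed and every field is __le16, so each field has to be converted to host byte order before use. In-kernel code would normally use le16_to_cpu() for this. Below is a userspace sketch of that decoding. It assumes the 26-byte, 13-field layout of the struct in the patch, and the demo_* names are hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of the 13 __le16 fields of struct i2chid_desc. */
struct demo_hid_desc {
	uint16_t wHIDDescLength;
	uint16_t bcdVersion;
	uint16_t wReportDescLength;
	uint16_t wReportDescRegister;
	uint16_t wInputRegister;
	uint16_t wMaxInputLength;
	uint16_t wOutputRegister;
	uint16_t wMaxOutputLength;
	uint16_t wCommandRegister;
	uint16_t wDataRegister;
	uint16_t wVendorID;
	uint16_t wProductID;
	uint16_t wVersionID;
};

#define DEMO_HID_DESC_FIELDS  13
#define DEMO_HID_DESC_RAW_LEN (2 * DEMO_HID_DESC_FIELDS)  /* 26 bytes, packed */

/* Read a little-endian 16-bit value regardless of host byte order. */
static uint16_t demo_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the raw buffer is too short. */
static int demo_hid_desc_parse(const uint8_t *buf, size_t len,
			       struct demo_hid_desc *d)
{
	assert(buf != NULL && d != NULL);

	/* Field order matches the wire layout, one 2-byte slot per field. */
	uint16_t *const fields[DEMO_HID_DESC_FIELDS] = {
		&d->wHIDDescLength, &d->bcdVersion, &d->wReportDescLength,
		&d->wReportDescRegister, &d->wInputRegister,
		&d->wMaxInputLength, &d->wOutputRegister,
		&d->wMaxOutputLength, &d->wCommandRegister,
		&d->wDataRegister, &d->wVendorID, &d->wProductID,
		&d->wVersionID,
	};

	if (len < DEMO_HID_DESC_RAW_LEN)
		return -1;
	for (size_t i = 0; i < DEMO_HID_DESC_FIELDS; i++)
		*fields[i] = demo_get_le16(buf + 2 * i);
	return 0;
}
```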

Re: [RFC, PATCH] Extensible AIO interface

2012-10-02 Thread Kent Overstreet
On Wed, Oct 03, 2012 at 10:41:06AM +0900, Tejun Heo wrote:
> Hello, Kent.
> 
> On Tue, Oct 02, 2012 at 02:41:13PM -0700, Kent Overstreet wrote:
> > Seems to me it'd be no different from security considerations when
> > introducing new ioctls. I.e., messy, ad hoc, easy to get wrong, but
> > sometimes no way around it.
> > 
> > It really has to be ad hoc if it's extensible, unfortunately.
> > 
> > The only way of getting around _that_ would be with some kind of
> > reflective type system, so that generic code could parse (in some
> > fashion) the types of the various attributes, and for pointers copy the
> > user data into the kernel and do whatever access controls in generic
> > code.
> 
> I'm not a userland API expert by any stretch of the imagination, but the
> basic mechanism to pass data around seems sane to me and aio as stinky
> as it is seems to be the only logical stuff for IOs w/ extra
> attributes although alignment is always painful with any form of
> concatenated opaque structures.
> 
> However, I don't think it's a good idea to try to implement something
> which is a neutral transport of opaque data between userland and lower
> layers.  Things like that sound attractive with unlimited
> possibilities but reality seems to have a tendency to make a big
> mess out of setups like that.

I don't see how the "neutral transport of opaque data" itself is a bad
thing. We want something simple and sane to build actual interfaces on
top of - once we've got that, we can either build clean generic well
defined interfaces or we can make a mess like with ioctls :P

It's like any other mechanism. There's good syscalls and bad syscalls...

> So, if we're gonna do this, let's define what attributes we want to
> have and let them be processed at the interface layer and fed to lower
> layers afterwards - e.g. for cgroup context association, associate the
> resulting bios with the target cgroup from the aio layer.

That sounds perfectly reasonable to me (the emails with Zach ended up
heading in that direction).

> I'm quite skeptical of general usefulness of having opaque knobs to
> lower IO stack which don't have proper generic abstraction.  Nobody
> can make proper use of things like that.  Well, not nobody, maybe if
> the lower stack, the interface and the application are implemented by
> a single organization over relatively short span of time, maybe it
> would be useful for them, but that isn't something which generic
> interface design should focus on.

I think it could work fine, but I'm not convinced either way on what the
correct approach is.

Say we implement an attr to control a block layer cache. That attr could
be parsed/validated in high level code (if there's any to do) - that I
don't object to. But the high level code isn't going to /know/ whether
there was any block cache in the stack that handled the attr. If the
attr is passed down to the block cache, that block cache can return that
it was handled.

Or if we implement an attr that says "return whether it was a cache hit
or miss, and maybe other statistics". Similar thing.

We could conceivably confine attrs to the upper layer, and define in
kernel interfaces for passing all this stuff around, but I'm doubtful
it's worth the trouble - at least if attrs themselves are implemented
sanely. And honestly I think it'd make (more of) a mess of the block
layer and the rest of the io stack to have to explicitly pass around
cache hints/cache controlling options/whatever else we think of in the
future - especially when most of this stuff isn't going to be in use
most of the time.

But, like I sort of mentioned in another email with Zach - passing the
attrs from userspace around is _not_ mutually exclusive with standard
code for parsing/validating them that exists in one place. We shouldn't
have driver code reaching in and grabbing the raw attr unless there _is_
no parsing/validation to do.
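Concatenated, variable-length attributes can still have their parsing and validation "in one place" if every consumer goes through a single type/length walker. Padding each attribute to a fixed alignment also deals with the alignment pain mentioned above. This is only a sketch: the header layout, the 8-byte alignment and all names are assumptions, not the format proposed in the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header; each attribute starts on an 8-byte boundary. */
struct io_attr_hdr {
	uint32_t type;  /* which attribute this is */
	uint32_t len;   /* payload length in bytes, excluding this header */
};

#define IO_ATTR_ALIGN 8u
#define IO_ATTR_ALIGN_UP(x) \
	(((x) + IO_ATTR_ALIGN - 1) & ~(size_t)(IO_ATTR_ALIGN - 1))

/*
 * Generic validation done once, before any consumer looks at the payloads:
 * every header and every padded payload must lie inside the buffer.
 * Returns the number of attributes, or -1 if the buffer is malformed.
 */
static int io_attrs_validate(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int count = 0;

	assert(buf != NULL || size == 0);
	while (off < size) {
		struct io_attr_hdr h;
		size_t padded;

		if (size - off < sizeof(h))
			return -1;
		memcpy(&h, buf + off, sizeof(h));  /* avoid unaligned loads */
		off += sizeof(h);

		if (h.len > size - off)
			return -1;
		padded = IO_ATTR_ALIGN_UP((size_t)h.len);
		if (padded > size - off)
			return -1;
		off += padded;
		count++;
	}
	return count;
}
```

Consumers such as a block cache would only see attributes that passed this check, so driver code never has to re-check lengths on raw user data.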


> It's okay to allow some side channel thing for specific hacky uses but
> I really hope the general design is focused around properly
> abstracted attributes which can be understood and handled by the upper
> layer.

Completely agreed. I want to leave that side channel open for
experimentation, and so we have a way of implementing one off hacky
stuff when we need to - but normal mainline stuff should be sane and
well designed.


[PATCH 5/5] Remove the pSeries_reconfig.h file

2012-10-02 Thread Nathan Fontenot
Remove the pSeries_reconfig.h header file. At this point there is only one
definition in the file, pSeries_coalesce_init(), which can be
moved to rtas.h.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/include/asm/pSeries_reconfig.h |   15 ---
 arch/powerpc/include/asm/rtas.h             |    5 +
 arch/powerpc/kernel/rtas.c                  |    1 -
 arch/powerpc/platforms/pseries/smp.c        |    1 -
 4 files changed, 5 insertions(+), 17 deletions(-)

Index: dt-next/arch/powerpc/include/asm/pSeries_reconfig.h
===
--- dt-next.orig/arch/powerpc/include/asm/pSeries_reconfig.h 2012-10-02 09:14:01.0 -0500
+++ /dev/null   1970-01-01 00:00:00.0 +
@@ -1,15 +0,0 @@
-#ifndef _PPC64_PSERIES_RECONFIG_H
-#define _PPC64_PSERIES_RECONFIG_H
-#ifdef __KERNEL__
-
-#ifdef CONFIG_PPC_PSERIES
-/* Not the best place to put this, will be fixed when we move some
- * of the rtas suspend-me stuff to pseries */
-extern void pSeries_coalesce_init(void);
-#else /* !CONFIG_PPC_PSERIES */
-static inline void pSeries_coalesce_init(void) { }
-#endif /* CONFIG_PPC_PSERIES */
-
-
-#endif /* __KERNEL__ */
-#endif /* _PPC64_PSERIES_RECONFIG_H */
Index: dt-next/arch/powerpc/include/asm/rtas.h
===
--- dt-next.orig/arch/powerpc/include/asm/rtas.h 2012-10-02 09:14:01.0 -0500
+++ dt-next/arch/powerpc/include/asm/rtas.h 2012-10-02 09:14:40.0 -0500
@@ -353,8 +353,13 @@
return 1;
return 0;
 }
+
+/* Not the best place to put pSeries_coalesce_init, will be fixed when we
+ * move some of the rtas suspend-me stuff to pseries */
+extern void pSeries_coalesce_init(void);
 #else
 static inline int page_is_rtas_user_buf(unsigned long pfn) { return 0;}
+static inline void pSeries_coalesce_init(void) { }
 #endif
 
 extern int call_rtas(const char *, int, int, unsigned long *, ...);
Index: dt-next/arch/powerpc/kernel/rtas.c
===
--- dt-next.orig/arch/powerpc/kernel/rtas.c 2012-10-02 09:14:01.0 -0500
+++ dt-next/arch/powerpc/kernel/rtas.c  2012-10-02 09:14:40.0 -0500
@@ -42,7 +42,6 @@
 #include 
 #include 
 #include 
-#include 
 
 struct rtas_t rtas = {
.lock = __ARCH_SPIN_LOCK_UNLOCKED
Index: dt-next/arch/powerpc/platforms/pseries/smp.c
===
--- dt-next.orig/arch/powerpc/platforms/pseries/smp.c 2012-10-02 09:14:01.0 -0500
+++ dt-next/arch/powerpc/platforms/pseries/smp.c 2012-10-02 09:14:40.0 -0500
@@ -38,7 +38,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 

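The header change relies on a common pattern. When the platform is configured, the header carries a real declaration; otherwise it provides an empty static inline stub, so callers need no #ifdef of their own. Below is a userspace model of that pattern. CONFIG_DEMO_PSERIES and the demo_* functions are hypothetical stand-ins for CONFIG_PPC_PSERIES and pSeries_coalesce_init().

```c
#include <assert.h>

/* Comment this out to model a build without the platform code. */
#define CONFIG_DEMO_PSERIES 1

#ifdef CONFIG_DEMO_PSERIES
static int demo_coalesce_calls;
/* Stands in for the real out-of-line function in the platform code. */
static void demo_coalesce_init(void)
{
	demo_coalesce_calls++;
}
#else
/* Empty inline stub: callers compile unchanged and the call vanishes. */
static inline void demo_coalesce_init(void) { }
#endif

/* The caller is identical in both configurations. */
static int demo_platform_setup(void)
{
	demo_coalesce_init();
	return 0;
}
```

Because callers only depend on the name and signature, moving the declaration and stub from pSeries_reconfig.h into rtas.h does not require touching the call sites.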


[PATCH 4/5] Rename the drivers/of prom_* functions to of_*

2012-10-02 Thread Nathan Fontenot
Rename the prom_*_property routines of the generic OF code to of_*_property.
This brings them in line with the naming used by the rest of the OF code.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/kernel/machine_kexec.c       |   12 ++--
 arch/powerpc/kernel/machine_kexec_64.c    |    8
 arch/powerpc/kernel/pci_32.c              |    2 +-
 arch/powerpc/platforms/ps3/os-area.c      |    6 +++---
 arch/powerpc/platforms/pseries/iommu.c    |    4 ++--
 arch/powerpc/platforms/pseries/mobility.c |    6 +++---
 arch/powerpc/platforms/pseries/reconfig.c |    8
 drivers/macintosh/smu.c                   |    2 +-
 drivers/of/base.c                         |   15 +++
 include/linux/of.h                        |    9 -
 10 files changed, 35 insertions(+), 37 deletions(-)

Index: dt-next/include/linux/of.h
===
--- dt-next.orig/include/linux/of.h 2012-10-02 08:50:22.0 -0500
+++ dt-next/include/linux/of.h  2012-10-02 09:07:23.0 -0500
@@ -263,11 +263,10 @@
 
 extern int of_machine_is_compatible(const char *compat);
 
-extern int prom_add_property(struct device_node* np, struct property* prop);
-extern int prom_remove_property(struct device_node *np, struct property *prop);
-extern int prom_update_property(struct device_node *np,
-   struct property *newprop,
-   struct property *oldprop);
+extern int of_add_property(struct device_node *np, struct property *prop);
+extern int of_remove_property(struct device_node *np, struct property *prop);
+extern int of_update_property(struct device_node *np, struct property *newprop,
+ struct property *oldprop);
 
 #if defined(CONFIG_OF_DYNAMIC)
 /* For updating the device tree at runtime */
Index: dt-next/arch/powerpc/kernel/pci_32.c
===
--- dt-next.orig/arch/powerpc/kernel/pci_32.c 2012-10-02 08:30:22.0 -0500
+++ dt-next/arch/powerpc/kernel/pci_32.c 2012-10-02 09:01:10.0 -0500
@@ -208,7 +208,7 @@
of_prop->name = "pci-OF-bus-map";
of_prop->length = 256;
of_prop->value = _prop[1];
-   prom_add_property(dn, of_prop);
+   of_add_property(dn, of_prop);
of_node_put(dn);
}
 }
Index: dt-next/arch/powerpc/kernel/machine_kexec.c
===
--- dt-next.orig/arch/powerpc/kernel/machine_kexec.c 2012-10-02 08:30:22.0 -0500
+++ dt-next/arch/powerpc/kernel/machine_kexec.c 2012-10-02 09:01:10.0 -0500
@@ -212,16 +212,16 @@
 * be sure what's in them, so remove them. */
prop = of_find_property(node, "linux,crashkernel-base", NULL);
if (prop)
-   prom_remove_property(node, prop);
+   of_remove_property(node, prop);
 
prop = of_find_property(node, "linux,crashkernel-size", NULL);
if (prop)
-   prom_remove_property(node, prop);
+   of_remove_property(node, prop);
 
if (crashk_res.start != 0) {
-   prom_add_property(node, _base_prop);
+   of_add_property(node, _base_prop);
crashk_size = resource_size(_res);
-   prom_add_property(node, _size_prop);
+   of_add_property(node, _size_prop);
}
 }
 
@@ -237,11 +237,11 @@
/* remove any stale properties so ours can be found */
prop = of_find_property(node, kernel_end_prop.name, NULL);
if (prop)
-   prom_remove_property(node, prop);
+   of_remove_property(node, prop);
 
/* information needed by userspace when using default_machine_kexec */
kernel_end = __pa(_end);
-   prom_add_property(node, _end_prop);
+   of_add_property(node, _end_prop);
 
export_crashk_values(node);
 
Index: dt-next/arch/powerpc/kernel/machine_kexec_64.c
===
--- dt-next.orig/arch/powerpc/kernel/machine_kexec_64.c 2012-10-02 08:30:22.0 -0500
+++ dt-next/arch/powerpc/kernel/machine_kexec_64.c 2012-10-02 09:01:10.0 -0500
@@ -389,14 +389,14 @@
/* remove any stale propertys so ours can be found */
prop = of_find_property(node, htab_base_prop.name, NULL);
if (prop)
-   prom_remove_property(node, prop);
+   of_remove_property(node, prop);
prop = of_find_property(node, htab_size_prop.name, NULL);
if (prop)
-   prom_remove_property(node, prop);
+   of_remove_property(node, prop);
 
htab_base = __pa(htab_address);
-   prom_add_property(node, _base_prop);
-   prom_add_property(node, _size_prop);
+   of_add_property(node, _base_prop);
+   of_add_property(node, _size_prop);
 
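The machine_kexec hunks all follow a "remove any stale copy, then add ours" sequence built on of_find_property(), of_remove_property() and of_add_property(). Below is a userspace model of that sequence on a singly linked property list. The types and helpers are hypothetical simplifications, and this model just unlinks the old entry without doing any memory management.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct property and struct device_node. */
struct demo_prop {
	const char *name;
	struct demo_prop *next;
};

struct demo_node {
	struct demo_prop *properties;
};

static struct demo_prop *demo_find_prop(struct demo_node *np, const char *name)
{
	for (struct demo_prop *pp = np->properties; pp; pp = pp->next)
		if (strcmp(pp->name, name) == 0)
			return pp;
	return NULL;
}

/* Unlink 'victim' from the node's list; nothing is freed here. */
static void demo_remove_prop(struct demo_node *np, struct demo_prop *victim)
{
	for (struct demo_prop **pp = &np->properties; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			return;
		}
	}
}

static void demo_add_prop(struct demo_node *np, struct demo_prop *prop)
{
	prop->next = np->properties;
	np->properties = prop;
}

/* Drop any stale property of the same name so ours is the one found. */
static void demo_replace_prop(struct demo_node *np, struct demo_prop *fresh)
{
	struct demo_prop *old;

	assert(np != NULL && fresh != NULL);
	old = demo_find_prop(np, fresh->name);
	if (old)
		demo_remove_prop(np, old);
	demo_add_prop(np, fresh);
}
```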

[PATCH 3/5] Add of node/property notification chain for adds and removes

2012-10-02 Thread Nathan Fontenot
This patch moves the notification chain for updates to the device tree
from the powerpc/pseries code to the base OF code. This makes this
functionality available to all architectures.

Additionally, the notification chain is updated to allow notifications
for property add/remove/update. To make this work, a pointer to a new
struct (of_prop_reconfig) is passed to the routines in the notification chain.
The of_prop_reconfig struct contains a pointer to the node containing the
property and a pointer to the property itself. In the case of property
updates, the property pointer refers to the new property.

Signed-off-by: Nathan Fontenot 
---
 arch/powerpc/include/asm/pSeries_reconfig.h     |   32 --
 arch/powerpc/kernel/prom.c                      |    6 -
 arch/powerpc/platforms/pseries/dlpar.c          |   14 ++--
 arch/powerpc/platforms/pseries/hotplug-cpu.c    |    8 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c |   60 +--
 arch/powerpc/platforms/pseries/iommu.c          |    6 -
 arch/powerpc/platforms/pseries/reconfig.c       |   65 -
 arch/powerpc/platforms/pseries/setup.c          |    6 -
 drivers/of/base.c                               |   74 ++--
 include/linux/of.h                              |   20 +-
 10 files changed, 154 insertions(+), 137 deletions(-)

Index: dt-next/arch/powerpc/platforms/pseries/reconfig.c
===
--- dt-next.orig/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:40:51.0 -0500
+++ dt-next/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:45:12.0 -0500
@@ -16,11 +16,11 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
-#include 
 #include 
 
 /**
@@ -55,28 +55,6 @@
return parent;
 }
 
-static BLOCKING_NOTIFIER_HEAD(pSeries_reconfig_chain);
-
-int pSeries_reconfig_notifier_register(struct notifier_block *nb)
-{
-   return blocking_notifier_chain_register(_reconfig_chain, nb);
-}
-EXPORT_SYMBOL_GPL(pSeries_reconfig_notifier_register);
-
-void pSeries_reconfig_notifier_unregister(struct notifier_block *nb)
-{
-   blocking_notifier_chain_unregister(_reconfig_chain, nb);
-}
-EXPORT_SYMBOL_GPL(pSeries_reconfig_notifier_unregister);
-
-int pSeries_reconfig_notify(unsigned long action, void *p)
-{
-   int err = blocking_notifier_call_chain(_reconfig_chain,
-   action, p);
-
-   return notifier_to_errno(err);
-}
-
 static int pSeries_reconfig_add_node(const char *path, struct property *proplist)
 {
struct device_node *np;
@@ -100,13 +78,12 @@
goto out_err;
}
 
-   err = pSeries_reconfig_notify(PSERIES_RECONFIG_ADD, np);
+   err = of_attach_node(np);
if (err) {
printk(KERN_ERR "Failed to add device node %s\n", path);
goto out_err;
}
 
-   of_attach_node(np);
of_node_put(np->parent);
 
return 0;
@@ -134,9 +111,7 @@
return -EBUSY;
}
 
-   pSeries_reconfig_notify(PSERIES_RECONFIG_REMOVE, np);
of_detach_node(np);
-
of_node_put(parent);
of_node_put(np); /* Must decrement the refcount */
return 0;
@@ -381,7 +356,6 @@
 static int do_update_property(char *buf, size_t bufsize)
 {
struct device_node *np;
-   struct pSeries_reconfig_prop_update upd_value;
unsigned char *value;
char *name, *end, *next_prop;
int rc, length;
@@ -410,41 +384,8 @@
return -ENODEV;
}
 
-   upd_value.node = np;
-   upd_value.property = newprop;
-   pSeries_reconfig_notify(PSERIES_UPDATE_PROPERTY, _value);
-
rc = prom_update_property(np, newprop, oldprop);
-   if (rc)
-   return rc;
-
-   /* For memory under the ibm,dynamic-reconfiguration-memory node
-* of the device tree, adding and removing memory is just an update
-* to the ibm,dynamic-memory property instead of adding/removing a
-* memory node in the device tree.  For these cases we still need to
-* involve the notifier chain.
-*/
-   if (!strcmp(name, "ibm,dynamic-memory")) {
-   int action;
-
-   next_prop = parse_next_property(next_prop, end, ,
-   , );
-   if (!next_prop)
-   return -EINVAL;
-
-   if (!strcmp(name, "add"))
-   action = PSERIES_DRCONF_MEM_ADD;
-   else
-   action = PSERIES_DRCONF_MEM_REMOVE;
-
-   rc = pSeries_reconfig_notify(action, value);
-   if (rc) {
-   prom_update_property(np, oldprop, newprop);
-   return rc;
-   }
-   }
-
-   return 0;
+   return rc;
 }
 
 /**
Index: dt-next/drivers/of/base.c

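Below is a simplified userspace model of the mechanism this patch describes. Notifiers run in order, and the first error aborts the operation, matching the change to `err = of_attach_node(np)` in reconfig.c. Property updates hand the notifiers one struct carrying both the node and the new property. All demo_* names and action codes are hypothetical. The error convention is also simplified: the kernel's blocking notifier chains return NOTIFY_* codes, which are converted with notifier_to_errno().

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for struct device_node / struct property. */
struct demo_node;
struct demo_property;

/* Same idea as struct of_prop_reconfig: the node plus the (new) property. */
struct demo_prop_reconfig {
	struct demo_node *dn;
	struct demo_property *prop;
};

/* Hypothetical action codes, not the ones the patch defines. */
enum demo_action { DEMO_ATTACH_NODE, DEMO_DETACH_NODE, DEMO_UPDATE_PROPERTY };

typedef int (*demo_notifier_fn)(enum demo_action action, void *data);

#define DEMO_MAX_NOTIFIERS 4
static demo_notifier_fn demo_chain[DEMO_MAX_NOTIFIERS];
static int demo_nr_notifiers;

static int demo_register(demo_notifier_fn fn)
{
	assert(fn != NULL);
	if (demo_nr_notifiers == DEMO_MAX_NOTIFIERS)
		return -1;
	demo_chain[demo_nr_notifiers++] = fn;
	return 0;
}

/* Call every notifier in order; the first negative errno stops the chain. */
static int demo_notify(enum demo_action action, void *data)
{
	for (int i = 0; i < demo_nr_notifiers; i++) {
		int rc = demo_chain[i](action, data);

		if (rc < 0)
			return rc;
	}
	return 0;
}

/* A notifier may veto the attach, so attaching now reports an error. */
static int demo_attach_node(struct demo_node *dn, int *attached)
{
	int rc = demo_notify(DEMO_ATTACH_NODE, dn);

	if (rc)
		return rc;
	*attached = 1;  /* only link the node in once no notifier objected */
	return 0;
}

/* Property updates pass both pointers, wrapped in one struct. */
static int demo_update_property(struct demo_node *dn, struct demo_property *newprop)
{
	struct demo_prop_reconfig pr = { .dn = dn, .prop = newprop };

	return demo_notify(DEMO_UPDATE_PROPERTY, &pr);
}

/* Two example notifiers: one that just counts, one that always refuses. */
static int demo_calls_seen;
static int demo_count_calls(enum demo_action action, void *data)
{
	(void)action;
	(void)data;
	demo_calls_seen++;
	return 0;
}

static int demo_veto_all(enum demo_action action, void *data)
{
	(void)action;
	(void)data;
	return -16;  /* -EBUSY */
}
```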
[PATCH 2/5] Move of_drconf_cell struct definition to asm/prom.h

2012-10-02 Thread Nathan Fontenot
This patch moves the definition of the of_drconf_cell struct to asm/prom.h 
to make it available for all powerpc/pseries code.

Signed-off-by: Nathan Fontenot 

---
 arch/powerpc/include/asm/prom.h |   16
 arch/powerpc/mm/numa.c          |   12
 2 files changed, 16 insertions(+), 12 deletions(-)

Index: dt-next/arch/powerpc/mm/numa.c
===
--- dt-next.orig/arch/powerpc/mm/numa.c 2012-10-02 08:30:23.0 -0500
+++ dt-next/arch/powerpc/mm/numa.c  2012-10-02 08:41:42.0 -0500
@@ -397,18 +397,6 @@
return result;
 }
 
-struct of_drconf_cell {
-   u64 base_addr;
-   u32 drc_index;
-   u32 reserved;
-   u32 aa_index;
-   u32 flags;
-};
-
-#define DRCONF_MEM_ASSIGNED    0x0008
-#define DRCONF_MEM_AI_INVALID  0x0040
-#define DRCONF_MEM_RESERVED    0x0080
-
 /*
  * Read the next memblock list entry from the ibm,dynamic-memory property
  * and return the information in the provided of_drconf_cell structure.
Index: dt-next/arch/powerpc/include/asm/prom.h
===
--- dt-next.orig/arch/powerpc/include/asm/prom.h 2011-11-17 09:12:07.0 -0600
+++ dt-next/arch/powerpc/include/asm/prom.h 2012-10-02 08:41:42.0 -0500
@@ -58,6 +58,22 @@
 
 extern void of_instantiate_rtc(void);
 
+/* The of_drconf_cell struct defines the layout of the LMB array
+ * specified in the device tree property
+ * ibm,dynamic-reconfiguration-memory/ibm,dynamic-memory
+ */
+struct of_drconf_cell {
+   u64 base_addr;
+   u32 drc_index;
+   u32 reserved;
+   u32 aa_index;
+   u32 flags;
+};
+
+#define DRCONF_MEM_ASSIGNED    0x0008
+#define DRCONF_MEM_AI_INVALID  0x0040
+#define DRCONF_MEM_RESERVED    0x0080
+
 /* These includes are put at the bottom because they may contain things
  * that are overridden by this file.  Ideally they shouldn't be included
  * by this file, but there are a bunch of .c files that currently depend



[PATCH 1/5] Add /proc device tree updating to of node add/remove

2012-10-02 Thread Nathan Fontenot
When adding or removing a device tree node we should also update
the device tree in /proc/device-tree. This action is already done in the
generic OF code for adding/removing properties of a node. This patch adds
this functionality for nodes.

Signed-off-by: Nathan Fontenot  
---
 arch/powerpc/platforms/pseries/dlpar.c|   24 -
 arch/powerpc/platforms/pseries/reconfig.c |   47 -
 drivers/of/base.c |   55 +++---
 3 files changed, 51 insertions(+), 75 deletions(-)

Index: dt-next/arch/powerpc/platforms/pseries/dlpar.c
===
--- dt-next.orig/arch/powerpc/platforms/pseries/dlpar.c 2012-10-02 08:30:23.0 -0500
+++ dt-next/arch/powerpc/platforms/pseries/dlpar.c 2012-10-02 08:40:51.0 -0500
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -255,9 +254,6 @@
 
 int dlpar_attach_node(struct device_node *dn)
 {
-#ifdef CONFIG_PROC_DEVICETREE
-   struct proc_dir_entry *ent;
-#endif
int rc;
 
of_node_set_flag(dn, OF_DYNAMIC);
@@ -274,32 +270,12 @@
}
 
of_attach_node(dn);
-
-#ifdef CONFIG_PROC_DEVICETREE
-   ent = proc_mkdir(strrchr(dn->full_name, '/') + 1, dn->parent->pde);
-   if (ent)
-   proc_device_tree_add_node(dn, ent);
-#endif
-
of_node_put(dn->parent);
return 0;
 }
 
 int dlpar_detach_node(struct device_node *dn)
 {
-#ifdef CONFIG_PROC_DEVICETREE
-   struct device_node *parent = dn->parent;
-   struct property *prop = dn->properties;
-
-   while (prop) {
-   remove_proc_entry(prop->name, dn->pde);
-   prop = prop->next;
-   }
-
-   if (dn->pde)
-   remove_proc_entry(dn->pde->name, parent->pde);
-#endif
-
pSeries_reconfig_notify(PSERIES_RECONFIG_REMOVE, dn);
of_detach_node(dn);
of_node_put(dn); /* Must decrement the refcount */
Index: dt-next/arch/powerpc/platforms/pseries/reconfig.c
===
--- dt-next.orig/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:30:23.0 -0500
+++ dt-next/arch/powerpc/platforms/pseries/reconfig.c 2012-10-02 08:40:51.0 -0500
@@ -23,48 +23,6 @@
 #include 
 #include 
 
-
-
-/*
- * Routines for "runtime" addition and removal of device tree nodes.
- */
-#ifdef CONFIG_PROC_DEVICETREE
-/*
- * Add a node to /proc/device-tree.
- */
-static void add_node_proc_entries(struct device_node *np)
-{
-   struct proc_dir_entry *ent;
-
-   ent = proc_mkdir(strrchr(np->full_name, '/') + 1, np->parent->pde);
-   if (ent)
-   proc_device_tree_add_node(np, ent);
-}
-
-static void remove_node_proc_entries(struct device_node *np)
-{
-   struct property *pp = np->properties;
-   struct device_node *parent = np->parent;
-
-   while (pp) {
-   remove_proc_entry(pp->name, np->pde);
-   pp = pp->next;
-   }
-   if (np->pde)
-   remove_proc_entry(np->pde->name, parent->pde);
-}
-#else /* !CONFIG_PROC_DEVICETREE */
-static void add_node_proc_entries(struct device_node *np)
-{
-   return;
-}
-
-static void remove_node_proc_entries(struct device_node *np)
-{
-   return;
-}
-#endif /* CONFIG_PROC_DEVICETREE */
-
 /**
  * derive_parent - basically like dirname(1)
  * @path:  the full_name of a node to be added to the tree
@@ -149,9 +107,6 @@
}
 
of_attach_node(np);
-
-   add_node_proc_entries(np);
-
of_node_put(np->parent);
 
return 0;
@@ -179,8 +134,6 @@
return -EBUSY;
}
 
-   remove_node_proc_entries(np);
-
pSeries_reconfig_notify(PSERIES_RECONFIG_REMOVE, np);
of_detach_node(np);
 
Index: dt-next/drivers/of/base.c
===
--- dt-next.orig/drivers/of/base.c  2012-10-02 08:30:47.0 -0500
+++ dt-next/drivers/of/base.c   2012-10-02 08:40:51.0 -0500
@@ -1103,6 +1103,22 @@
  * device tree nodes.
  */
 
+#ifdef CONFIG_PROC_DEVICETREE
+static void of_add_proc_dt_entry(struct device_node *dn)
+{
+   struct proc_dir_entry *ent;
+
+   ent = proc_mkdir(strrchr(dn->full_name, '/') + 1, dn->parent->pde);
+   if (ent)
+   proc_device_tree_add_node(dn, ent);
+}
+#else
+static void of_add_proc_dt_entry(struct device_node *dn)
+{
+   return;
+}
+#endif
+
 /**
  * of_attach_node - Plug a device node into the tree and global list.
  */
@@ -1116,7 +1132,30 @@
np->parent->child = np;
allnodes = np;
write_unlock_irqrestore(_lock, flags);
+
+   of_add_proc_dt_entry(np);
+}
+
+#ifdef CONFIG_PROC_DEVICETREE
+static void of_remove_proc_dt_entry(struct device_node *dn)
+{
+   struct device_node *parent = dn->parent;
+   struct property *prop = dn->properties;
+
+   while 

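of_add_proc_dt_entry() names the new /proc/device-tree directory after the last component of the node's full_name, via `strrchr(dn->full_name, '/') + 1`. Below is that step in isolation, as a userspace sketch with a hypothetical helper name. The expression assumes full_name contains a '/': strrchr() returns NULL otherwise, and adding 1 to NULL would be undefined.

```c
#include <assert.h>
#include <string.h>

/* Last path component of a device tree full_name, e.g. "/cpus/cpu@0" -> "cpu@0". */
static const char *demo_dt_node_basename(const char *full_name)
{
	const char *slash = strrchr(full_name, '/');

	/* full_name is expected to be an absolute path, so a '/' must exist. */
	assert(slash != NULL);
	return slash + 1;
}
```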
Re: [git pull] vfs, part 1

2012-10-02 Thread Al Viro
On Tue, Oct 02, 2012 at 07:31:58PM -0700, Linus Torvalds wrote:
> On Tue, Oct 2, 2012 at 6:39 PM, Al Viro  wrote:
> > This is *not* all; fs/dcache.c bits will go separately, for one
> > thing - that's just the first pile.  Please, pull from the usual place -
> 
> Al, *please* describe what is going on. Your description is negative
> (what *doesn't* happen in this pull) and does not at all describe what
> is going on.

Umm...  OK, but it won't be particularly pretty:
* big one - consolidation of descriptor-related logic; almost all
of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the
result to fd.c.  As it is, we have a situation when file_table.c is about
handling of struct file and file.c is about handling of descriptor tables;
the reasons are historical - file_table.c used to be about a static array
of struct file we used to have way back).   A lot of stray ends got cleaned
up and converted to saner primitives, disgusting mess in android/binder.c
is still disgusting, but at least doesn't poke so much in descriptor table
guts anymore.  A bunch of relatively minor races got fixed in the process,
plus an ext4 struct file leak.
* related thing - fget_light() partially unuglified; see fdget()
in there (and yes, it generates code as good as what we used to have).
* also related - bits of Cyrill's procfs stuff that got entangled
into that work; _not_ all of it, just the initial move to fs/proc/fd.c
and switch of fdinfo to seq_file.
* Alex's fs/coredump.c split-off - the same story, had been easier to
take that commit than mess with conflicts.  The rest is a separate pile, this
was just a mechanical code movement.
* a few misc patches all over the place.  Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree).


Re: [RFC, PATCH] Extensible AIO interface

2012-10-02 Thread Kent Overstreet
On Wed, Oct 03, 2012 at 11:28:25AM +1000, Dave Chinner wrote:
> On Tue, Oct 02, 2012 at 05:20:29PM -0700, Kent Overstreet wrote:
> > On Tue, Oct 02, 2012 at 01:41:17PM -0400, Jeff Moyer wrote:
> > > Kent Overstreet  writes:
> > > 
> > > > So, I and other people keep running into things where we really need to
> > > > add an interface to pass some auxiliary... stuff along with a pread() or
> > > > pwrite().
> > > >
> > > > A few examples:
> > > >
> > > > * IO scheduler hints. Some userspace program wants to, per IO, specify
> > > > either priorities or a cgroup - by specifying a cgroup you can have a
> > > > fileserver in userspace that makes use of cfq's per cgroup bandwidth
> > > > quotas.
> > > 
> > > You can do this today by splitting I/O between processes and placing
> > > those processes in different cgroups.  For io priority, there is
> > > ioprio_set, which incurs an extra system call, but can be used.  Not
> > > elegant, but possible.
> > 
> > Yes - those are things I'm trying to replace. Doing it that way is a
> > real pain, both as it's a lousy interface for this and it does impact
> > performance (ioprio_set doesn't really work too well with aio, too).
> > 
> > > > * Cache hints. For bcache and other things, userspace may want to specify
> > > > "this data should be cached", "this data should bypass the cache", etc.
> > > 
> > > Please explain how you will differentiate this from posix_fadvise.
> > 
> > Oh sorry, I think about SSD caching so much I forget to say that's what
> > I'm talking about. posix_fadvise is for the page cache, we want
> > something different for an SSD cache (IMO it'd be really ugly to use it
> > for both, and posix_fadvise() can't really specify everything we'd
> > want to for an SSD cache).
> 
> Similar discussions about posix_fadvise() are being had for marking
> ranges of files as volatile (i.e. useful for determining what can be
> evicted from a cache when space reclaim is required).
> 
> https://lkml.org/lkml/2012/10/2/501

Hmm, interesting

Speaking as an implementor though, hints that aren't associated with any
specific IO are harder to make use of - by then the data is already in the cache. What you
really want is to know, for a given IO, whether to cache it or not, and
possibly where in the LRU to stick it.
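A per-IO hint lets the implementation decide at the moment the IO is seen whether to cache it or bypass, and where in the LRU to insert it. Below is a toy sketch of that idea. The hint values, the four-slot LRU and all names are hypothetical and unrelated to bcache's actual bucket-based replacement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-IO hints; not an existing kernel or bcache interface. */
enum demo_cache_hint { HINT_DEFAULT, HINT_BYPASS, HINT_LOW_PRIORITY };

#define DEMO_CACHE_SLOTS 4

/* Toy LRU of block numbers: blk[0] is hottest, blk[n - 1] is evicted next. */
struct demo_lru {
	long blk[DEMO_CACHE_SLOTS];
	size_t n;
};

static void demo_lru_remove_at(struct demo_lru *c, size_t i)
{
	for (; i + 1 < c->n; i++)
		c->blk[i] = c->blk[i + 1];
	c->n--;
}

/*
 * Apply one IO according to its hint:
 *   HINT_BYPASS       - do not cache it, and drop any existing copy
 *   HINT_LOW_PRIORITY - insert at the cold end, so it is evicted first
 *   HINT_DEFAULT      - insert at the hot end
 */
static void demo_lru_access(struct demo_lru *c, long blk, enum demo_cache_hint hint)
{
	assert(c->n <= DEMO_CACHE_SLOTS);

	for (size_t i = 0; i < c->n; i++) {
		if (c->blk[i] == blk) {
			demo_lru_remove_at(c, i);
			break;
		}
	}
	if (hint == HINT_BYPASS)
		return;
	if (c->n == DEMO_CACHE_SLOTS)
		c->n--;  /* evict the coldest entry */
	if (hint == HINT_LOW_PRIORITY) {
		c->blk[c->n++] = blk;
		return;
	}
	for (size_t i = c->n; i > 0; i--)
		c->blk[i] = c->blk[i - 1];
	c->blk[0] = blk;
	c->n++;
}
```

A hint that arrives separately from the IO, as with posix_fadvise(), would instead have to find and reposition entries after the fact.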

Well, it's quite possible that different implementations would have no
trouble making use of those kinds of hints, I'm no doubt biased by
having implemented bcache. With bcache though, cache replacement is done
in terms of physical address space, not logical (i.e. the address space
of the device being cached). 

So to handle posix_fadvise, we'd have to walk the btree and chase
pointers to buckets, and modify the bucket priorities up or down... but
what about the other data in those buckets? It's not clear what should
happen, but there isn't any good way to take that into account.

(The exception is dropping data from the cache entirely, we can just
invalidate that part of the keyspace and garbage collection will reclaim
the buckets they pointed to. Bcache does that for discard requests,
currently).

> If you have requirements for specific cache management, then it
> might be worth seeing if you can steer an existing interface
> proposal for some form of cache management in the direction you
> need.

Certainly - I don't plan on implementing anything bcache specific, or
implementing anything from scratch if there's a good proposal out there.
But a per-io interface does seem useful from an implementation pov and
natural to use for at least some classes of applications.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net: ethernet: davinci_cpdma: decrease the desc count when cleaning up the remaining packets

2012-10-02 Thread David Miller
From: Tao Hou 
Date: Tue,  2 Oct 2012 10:42:43 +0800

> chan->count is used by rx channel. If the desc count is not updated by
> the clean up loop in cpdma_chan_stop, the value written to the rxfree
> register in cpdma_chan_start will be incorrect.
> 
> Signed-off-by: Tao Hou 

Applied, thanks.


Re: [git pull] vfs, part 1

2012-10-02 Thread Linus Torvalds
On Tue, Oct 2, 2012 at 6:39 PM, Al Viro  wrote:
> This is *not* all; fs/dcache.c bits will go separately, for one
> thing - that's just the first pile.  Please, pull from the usual place -

Al, *please* describe what is going on. Your description is negative
(what *doesn't* happen in this pull) and does not at all describe what
is going on.

Linus


Re: [PATCH v3 1/2] hwmon: (ads7828) driver cleanup

2012-10-02 Thread Vivien Didelot
Hi Guenter,

Some of those changes weren't in the mainline tree a few days ago.
I'll cherry-pick them and send an update very soon :-)

Thanks for the tips,
Vivien


linux-next: build failure after merge of the ubi tree

2012-10-02 Thread Stephen Rothwell
Hi Artem,

After merging the ubi tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

ERROR: "ubi_update_fastmap" [drivers/mtd/ubi/ubi.ko] undefined!
ERROR: "ubi_scan_fastmap" [drivers/mtd/ubi/ubi.ko] undefined!
ERROR: "ubi_calc_fm_size" [drivers/mtd/ubi/ubi.ko] undefined!

I have used the ubi tree from next-20121002 for today.
-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au




[PATCH] extcon: max77693: Fix max77693_muic_probe error handling

2012-10-02 Thread Axel Lin
Fix the following issues:
1. If request_threaded_irq() fails, the current code does not free all the
   requested IRQs.
2. Add the missing extcon_dev_unregister() in the error path taken when
   reading the revision number fails.

Signed-off-by: Axel Lin 
---
 drivers/extcon/extcon-max77693.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/extcon/extcon-max77693.c b/drivers/extcon/extcon-max77693.c
index e0ed622..9928c63 100644
--- a/drivers/extcon/extcon-max77693.c
+++ b/drivers/extcon/extcon-max77693.c
@@ -696,12 +696,8 @@ static int __devinit max77693_muic_probe(struct platform_device *pdev)
IRQF_ONESHOT, muic_irq->name, info);
if (ret) {
dev_err(&pdev->dev,
-   "failed: irq request (IRQ: %d,"
-   " error :%d)\n",
+   "failed: irq request (IRQ: %d, error :%d)\n",
muic_irq->irq, ret);
-
-   for (i = i - 1; i >= 0; i--)
-   free_irq(muic_irq->virq, info);
goto err_irq;
}
}
@@ -726,7 +722,7 @@ static int __devinit max77693_muic_probe(struct platform_device *pdev)
MAX77693_MUIC_REG_ID, &id);
if (ret < 0) {
dev_err(&pdev->dev, "failed to read revision number\n");
-   goto err_extcon;
+   goto err_read_reg;
}
dev_info(info->dev, "device ID : 0x%x\n", id);
 
@@ -738,9 +734,13 @@ static int __devinit max77693_muic_probe(struct platform_device *pdev)
 
return ret;
 
+err_read_reg:
+   extcon_dev_unregister(info->edev);
 err_extcon:
kfree(info->edev);
 err_irq:
+   while (--i >= 0)
+   free_irq(muic_irqs[i].virq, info);
 err_regmap:
kfree(info);
 err_kfree:
-- 
1.7.9.5





Re: [PATCH 1/2] Fix build error caused by broken PCH_PTP module dependency.

2012-10-02 Thread David Miller
From: Haicheng Li 
Date: Fri, 28 Sep 2012 14:57:38 +0800

> On 09/28/2012 02:46 PM, David Miller wrote:
>> From: Haicheng Li
>> Date: Fri, 28 Sep 2012 14:41:43 +0800
>>
>>> On 09/28/2012 06:09 AM, David Miller wrote:
>>>> Look at how other people submit patches, do any other patch
>>>> submissions look like your's having all of this metadata in the
>>>> message body:
>>> I'm sorry for it.
>>>
>>>> As for this specific patch:
>>>>
>>>>> - depends on PTP_1588_CLOCK_PCH
>>>>> + depends on PTP_1588_CLOCK_PCH = PCH_GBE
>>>>
>>>> This is not the correct way to ensure that the module'ness of one
>>>> config option meets the module'ness requirements of another.
>>>> The correct way is to say something like "&& (PCH_GBE || PCH_GBE=n)"
>>>
>>> This case is a little trickier than usual: with PCH_PTP selected,
>>> the valid config would be either "PTP_1588_CLOCK_PCH=PCH_GBE=m" or
>>> "PTP_1588_CLOCK_PCH=PCH_GBE=y", and PTP_1588_CLOCK_PCH depends on
>>> PCH_GBE.
>>
>> And a simple "&&  PCH_GBE" should accomplish this, no?
> No sir. It's actually the same as the original Kconfig (via "if
> PCH_GBE"); it just failed with this config:
> 
> CONFIG_PCH_GBE=y
> CONFIG_PCH_PTP=y
> CONFIG_PTP_1588_CLOCK=m

The correct fix is to make the Kconfig entry for PCH_PTP use
a "select PTP_1588_CLOCK" instead of "depends on PTP_1588_CLOCK".

I'll apply this fix.

There is another, extremely convoluted, way to do this, which is
what the SFC driver does:

depends on SFC && PTP_1588_CLOCK && !(SFC=y && PTP_1588_CLOCK=m)

but that looks horrible to me.


Re: [PATCH 0/2] memory-hotplug : notification of memory block's state

2012-10-02 Thread Ni zhan Chen

On 10/03/2012 09:21 AM, Yasuaki Ishimatsu wrote:

Hi Andrew,

2012/10/03 6:42, Andrew Morton wrote:

On Tue, 2 Oct 2012 17:25:06 +0900
Yasuaki Ishimatsu  wrote:

remove_memory() offlines memory. It is called in the following two cases:

1. echo offline >/sys/devices/system/memory/memoryXX/state
2. hot remove a memory device

In the 1st case, the memory block's state is changed and a notification
that the memory block's state changed is sent to userland after calling
offline_memory(). So the user can notice that the memory block has changed.

But in the 2nd case, the memory block's state is not changed and no
notification is sent to userspace, even though offline_memory() is called.
So the user cannot notice that the memory block has changed.

We should notify userspace in the 2nd case as well.


These two little patches look reasonable to me.

There's a lot of recent activity with memory hotplug!  We're in the 3.7
merge window now so it is not a good time to be merging new material.



Also there appear to be two teams working on it and it's unclear to me
how well coordinated this work is?


As you know, there are two teams developing memory hotplug.
  - Wen's patch-set
https://lkml.org/lkml/2012/9/5/201

  - Lai's patch-set
https://lkml.org/lkml/2012/9/10/180

Wen's patch-set is for removing physical memory. I'm now splitting the
patch-set to make it easier to review. If the patch-set is merged into
the Linux kernel, I believe that Linux on x86 will be able to hot remove
a physical memory device.

But that is not enough, since we cannot remove memory that contains
kernel memory. To guarantee that memory can be hot removed, it must
belong to ZONE_MOVABLE.

So Lai's patch-set tries to create a movable node, i.e. a node whose
memory all belongs to ZONE_MOVABLE.

I think there are two opportunities for creating a movable node:
  - boot time
  - after hot add memory

- boot time

To create movable memory, Linux has two kernel parameters
(kernelcore and movablecore). But they are not enough, since even if we
set these parameters, the movable memory is distributed evenly across
the nodes. So we introduce the kernelcore_max_addr boot parameter.
The parameter limits the range of memory used as kernel memory.

For example, suppose the system has the following nodes:

node0 : 0x4000 - 0x8000
node1 : 0x8000 - 0xc000

To hot remove node1, we set "kernelcore_max_addr=0x8000".
In doing so, kernel memory is limited to addresses below 0x8000 and
node1's memory belongs to ZONE_MOVABLE. As a result, we can guarantee
that node1 is a movable node, and we can always hot remove node1.

- after hot add memory

When memory is hot added, it belongs to ZONE_NORMAL and is offline.
If we online the memory, it may come to contain kernel memory. In that
case, we cannot hot remove the memory. So we introduce the online_movable
function. If we use the function as follows, the memory belongs to
ZONE_MOVABLE.

echo online_movable > /sys/devices/system/node/nodeX/memoryX/state

So when a new node is hot added and I echo "online_movable" to all of
its hot added memory, the node's memory belongs to ZONE_MOVABLE. As a
result, we can guarantee that the node is a movable node, and we can
always hot remove it.


Hi Yasuaki,

This time, can kernel memory be allocated from ZONE_MOVABLE?



# I hope this information helps you understand our work.

Thanks,
Yasuaki Ishimatsu



However these two patches are pretty simple and do fix a problem, so I
added them to the 3.7 MM queue.









Re: [RFC, PATCH] Extensible AIO interface

2012-10-02 Thread Tejun Heo
Hello, Kent.

On Tue, Oct 02, 2012 at 02:41:13PM -0700, Kent Overstreet wrote:
> Seems to me it'd be no different from security considerations when
> introducing new ioctls. I.e., messy, ad hoc, easy to get wrong, but
> sometimes no way around it.
> 
> It really has to be ad hoc if it's extensible, unfortunately.
> 
> The only way of getting around _that_ would be with some kind of
> reflective type system, so that generic code could parse (in some
> fashion) the types of the various attributes, and for pointers copy the
> user data into the kernel and do whatever access controls in generic
> code.

I'm not a userland API expert by any stretch of the imagination, but the
basic mechanism to pass data around seems sane to me, and aio, stinky
as it is, seems to be the only logical place for IOs w/ extra
attributes, although alignment is always painful with any form of
concatenated opaque structures.

However, I don't think it's a good idea to try to implement something
which is a neutral transport of opaque data between userland and lower
layers.  Things like that sound attractive with unlimited
possibilities but reality seems to have the tendency to make a big
mess out of setups like that.

So, if we're gonna do this, let's define what attributes we want to
have and let them be processed at the interface layer and fed to lower
layers afterwards - e.g. for cgroup context association, associate the
resulting bios with the target cgroup from the aio layer.

I'm quite skeptical of general usefulness of having opaque knobs to
lower IO stack which don't have proper generic abstraction.  Nobody
can make proper use of things like that.  Well, not nobody, maybe if
the lower stack, the interface and the application are implemented by
a single organization over relatively short span of time, maybe it
would be useful for them, but that isn't something which generic
interface design should focus on.

It's okay to allow some side channel thing for specific hacky uses but
I really hope the general design will be focused around properly
abstracted attributes which can be understood and handled by the upper
layer.

Thanks.

-- 
tejun


[git pull] vfs, part 1

2012-10-02 Thread Al Viro
This is *not* all; fs/dcache.c bits will go separately, for one
thing - that's just the first pile.  Please, pull from the usual place -
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus
Shortlog:
Al Viro (67):
  do_add_mount()/umount -l races
  close the race in nlmsvc_free_block()
  Merge remote branch 'origin' into for-next
  take descriptor handling from sock_alloc_file() to callers
  unexport sock_map_fd(), switch to sock_alloc_file()
  make get_unused_fd_flags() a function
  autofs4: don't open-code fd_install()
  binder: don't allow mmap() by process other than proc->tsk
  pipe(2) - race-free error recovery
  events: don't use get_unused_fd_flags() when get_unused_fd() will do
  fanotify: sanitize failure exits in copy_event_to_user()
  take rlimit check to callers of expand_files()
  new helper: __alloc_fd()
  move files_struct-related bits from kernel/exit.c to fs/file.c
  don't bother with call_rcu() in put_files_struct()
  trim free_fdtable_rcu()
  move put_unused_fd() and fd_install() to fs/file.c
  expose a low-level variant of fd_install() for binder
  take fget() and friends to fs/file.c
  take descriptor-related part of close() to file.c
  um: resurrect the right variant of mconsole_proc()
  take close-on-exec logics to fs/file.c, clean it up a bit
  take purely descriptor-related stuff from fcntl.c to file.c
  new helper: replace_fd()
  switch flush_unauthorized_files() to replace_fd()
  take __{set,clear}_{open_fd,close_on_exec}() into fs/file.c
  make expand_files() and alloc_fd() static
  new helper: iterate_fd()
  do_coredump(): make sure that descriptor table isn't shared
  switch spufs/coredump to iterate_fd()
  new helper: daemonize_descriptors()
  don't leak O_CLOEXEC into ->f_flags
  namei.c: fix BS comment
  switch ftruncate(2) to fget_light
  switch fallocate(2) to fget_light()
  switch fchmod(2) to fget_light()
  switch fadvise(2) to fget_light()
  switch readahead(2) to fget_light()
  switch osf_getdirentries() to fget_light()
  switch itanic perfmonctl(2) to fget_light()
  switch hpux_getdents() to fget_light()
  ext4: close struct file leak on EXT4_IOC_MOVE_EXT
  export fget_light
  switch EXT4_IOC_MOVE_EXT to fget_light()
  switch btrfs_ioctl_snap_create_transid() to fget_light()
  switch epoll_wait(2) to fget_light()
  switch timerfd_[sg]ettime(2) to fget_light()
  switch SNDRV_PCM_IOCTL_LINK to fget_light()
  switch mqueue syscalls to fget_light()
  switch btrfs_ioctl_clone() to fget_light()
  switch vfio_group_set_container() to fget_light()
  switch infinibarf users of fget() to fget_light()
  switch coda get_device_index() to fget_light()
  switch xfs_swapext() to fget_light()
  switch xfs_find_handle() to fget_light()
  switch prctl_set_mm_exe_file() to fget_light()
  vhost_set_vring(): turn pollstart/pollstop into bool
  make get_file() return its argument
  proc_map_files_readdir(): don't bother with grabbing files
  switch o2hb_region_dev_write() to fget_light()
  new helpers: fdget()/fdput()
  switch simple cases of fget_light to fdget
  hypfs: ->d_parent is never NULL or negative
  ceph: don't abuse d_delete() on failure exits
  fcntl: fix misannotations
  usb/gadget: fix misannotations
  btrfs: reada_extent doesn't need kref for refcount

Alan Cox (1):
  vfs: delete surplus inode NULL check

Alex Kelly (1):
  coredump: move core dump functionality into its own file

Catalin Marinas (1):
  compat: fs: Generic compat_sys_sendfile implementation

Chuck Lever (1):
  MAX_LFS_FILESIZE should be a loff_t

Cyrill Gorcunov (2):
  procfs: Move /proc/pid/fd[info] handling code to fd.[ch]
  procfs: Convert /proc/pid/fdinfo/ handling routines to seq-file v2

Denys Vlasenko (1):
  coredump: prevent double-free on an error path in core dumper

Kirill A. Shutemov (1):
  fs: push rcu_barrier() from deactivate_locked_super() to filesystems

Diffstat:
 arch/alpha/kernel/osf_sys.c                  |   13 +-
 arch/ia64/kernel/perfmon.c                   |   18 +-
 arch/parisc/hpux/fs.c                        |   17 +-
 arch/powerpc/include/asm/systbl.h            |    4 +-
 arch/powerpc/include/asm/unistd.h            |    1 +
 arch/powerpc/kernel/sys_ppc32.c              |   45 +--
 arch/powerpc/platforms/cell/spu_syscalls.c   |   21 +-
 arch/powerpc/platforms/cell/spufs/coredump.c |   40 +-
 arch/s390/hypfs/inode.c                      |    2 -
 arch/sparc/include/asm/unistd.h              |    1 +
 arch/sparc/kernel/sys32.S                    |    2 +-
 arch/sparc/kernel/sys_sparc32.c              |   46 --
 arch/um/drivers/mconsole_kern.c              |   99 +---
 drivers/base/dma-buf.c                       |    3 +-
 drivers/infiniband/core/ucma.c 

Re: [RFC, PATCH] Extensible AIO interface

2012-10-02 Thread Dave Chinner
On Tue, Oct 02, 2012 at 05:20:29PM -0700, Kent Overstreet wrote:
> On Tue, Oct 02, 2012 at 01:41:17PM -0400, Jeff Moyer wrote:
> > Kent Overstreet  writes:
> > 
> > > So, I and other people keep running into things where we really need to
> > > add an interface to pass some auxiliary... stuff along with a pread() or
> > > pwrite().
> > >
> > > A few examples:
> > >
> > > * IO scheduler hints. Some userspace program wants to, per IO, specify
> > > either priorities or a cgroup - by specifying a cgroup you can have a
> > > fileserver in userspace that makes use of cfq's per cgroup bandwidth
> > > quotas.
> > 
> > You can do this today by splitting I/O between processes and placing
> > those processes in different cgroups.  For io priority, there is
> > ioprio_set, which incurs an extra system call, but can be used.  Not
> > elegant, but possible.
> 
> Yes - those are things I'm trying to replace. Doing it that way is a
> real pain, both as it's a lousy interface for this and it does impact
> performance (ioprio_set doesn't really work too well with aio, too).
> 
> > > * Cache hints. For bcache and other things, userspace may want to specify
> > > "this data should be cached", "this data should bypass the cache", etc.
> > 
> > Please explain how you will differentiate this from posix_fadvise.
> 
> Oh sorry, I think about SSD caching so much I forget to say that's what
> I'm talking about. posix_fadvise is for the page cache, we want
> something different for an SSD cache (IMO it'd be really ugly to use it
> for both, and posix_fadvise() can't really specify everything we'd
> want to for an SSD cache).

Similar discussions about posix_fadvise() are being had for marking
ranges of files as volatile (i.e. useful for determining what can be
evicted from a cache when space reclaim is required).

https://lkml.org/lkml/2012/10/2/501

If you have requirements for specific cache management, then it
might be worth seeing if you can steer an existing interface
proposal for some form of cache management in the direction you
need.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 0/2] memory-hotplug : notification of memory block's state

2012-10-02 Thread Yasuaki Ishimatsu

Hi Andrew,

2012/10/03 6:42, Andrew Morton wrote:

On Tue, 2 Oct 2012 17:25:06 +0900
Yasuaki Ishimatsu  wrote:


remove_memory() offlines memory. It is called in the following two cases:

1. echo offline >/sys/devices/system/memory/memoryXX/state
2. hot remove a memory device

In the 1st case, the memory block's state is changed and a notification
that the memory block's state changed is sent to userland after calling
offline_memory(). So the user can notice that the memory block has changed.

But in the 2nd case, the memory block's state is not changed and no
notification is sent to userspace, even though offline_memory() is called.
So the user cannot notice that the memory block has changed.

We should notify userspace in the 2nd case as well.


These two little patches look reasonable to me.

There's a lot of recent activity with memory hotplug!  We're in the 3.7
merge window now so it is not a good time to be merging new material.



Also there appear to be two teams working on it and it's unclear to me
how well coordinated this work is?


As you know, there are two teams developing memory hotplug.
  - Wen's patch-set
https://lkml.org/lkml/2012/9/5/201

  - Lai's patch-set
https://lkml.org/lkml/2012/9/10/180

Wen's patch-set is for removing physical memory. I'm now splitting the
patch-set to make it easier to review. If the patch-set is merged into
the Linux kernel, I believe that Linux on x86 will be able to hot remove
a physical memory device.

But that is not enough, since we cannot remove memory that contains
kernel memory. To guarantee that memory can be hot removed, it must
belong to ZONE_MOVABLE.

So Lai's patch-set tries to create a movable node, i.e. a node whose
memory all belongs to ZONE_MOVABLE.

I think there are two opportunities for creating a movable node:
  - boot time
  - after hot add memory

- boot time

To create movable memory, Linux has two kernel parameters
(kernelcore and movablecore). But they are not enough, since even if we
set these parameters, the movable memory is distributed evenly across
the nodes. So we introduce the kernelcore_max_addr boot parameter.
The parameter limits the range of memory used as kernel memory.

For example, suppose the system has the following nodes:

node0 : 0x4000 - 0x8000
node1 : 0x8000 - 0xc000

To hot remove node1, we set "kernelcore_max_addr=0x8000".
In doing so, kernel memory is limited to addresses below 0x8000 and
node1's memory belongs to ZONE_MOVABLE. As a result, we can guarantee
that node1 is a movable node, and we can always hot remove node1.

- after hot add memory

When memory is hot added, it belongs to ZONE_NORMAL and is offline.
If we online the memory, it may come to contain kernel memory. In that
case, we cannot hot remove the memory. So we introduce the online_movable
function. If we use the function as follows, the memory belongs to
ZONE_MOVABLE.

echo online_movable > /sys/devices/system/node/nodeX/memoryX/state

So when a new node is hot added and I echo "online_movable" to all of
its hot added memory, the node's memory belongs to ZONE_MOVABLE. As a
result, we can guarantee that the node is a movable node, and we can
always hot remove it.

# I hope this information helps you understand our work.

Thanks,
Yasuaki Ishimatsu



However these two patches are pretty simple and do fix a problem, so I
added them to the 3.7 MM queue.






Soft lockup in scsi_remove_target under 3.6 (regression from 3.5)

2012-10-02 Thread Jonathan McDowell
Upgraded to 3.6 today on my dev box and after seeing an FC attached SAN
go down and come back up (due to an expected reboot) I started getting
the following in my logs. It continues even after the array is back and
functioning - I'm seeing:

 kernel:[109104.348034] BUG: soft lockup - CPU#6 stuck for 23s! [kworker/6:0:30692]

repeated on logged in sessions and backtraces like the following (this
is the first). I don't see the same problem under 3.5.

[10819.389706] device-mapper: multipath: Failing path 8:240.
[11233.683936] device-mapper: multipath: Failing path 8:240.
[108394.592042]  rport-10:0-4: blocked FC remote port time out: removing target and saving binding
[108394.609594] sd 10:0:1:0: rejecting I/O to offline device
[108394.620457] lpfc :0c:00.0: 2:(0):0203 Devloss timeout on WWPN 21:11:00:02:ac:01:86:06 NPort x030500 Data: x0 x7 x0
[108394.620591] sd 10:0:1:0: alua: Detached
[108394.650159] sd 10:0:1:0: [sdbc] Synchronizing SCSI cache
[108394.661071] sd 10:0:1:0: [sdbc]  
[108394.667877] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[108394.680154] ses 10:0:1:254: alua: Detached
[108420.348032] BUG: soft lockup - CPU#6 stuck for 23s! [kworker/6:0:30692]
[108420.352003] Modules linked in: nfsv4 autofs4 ip6table_filter ip6_tables 
iptable_filter ip_tables ebtable_nat ebtables x_tables rpcsec_gss_krb5 ipv6 
nfsd nfs_acl auth_rpcgss nfs lockd sunrpc dm_round_robin dm_multipath 
ipmi_devintf ipmi_si ipmi_msghandler sg evdev acpi_cpufreq freq_table serio_raw 
mperf processor button thermal_sys coretemp kvm_intel kvm lpc_ich ioatdma 
mfd_core tpm_tis i2c_i801 tpm microcode tpm_bios rng_core i2c_core i5k_amb dca 
ses enclosure ata_generic lpfc ata_piix scsi_transport_fc scsi_tgt
[108420.352003] CPU 6 
[108420.352003] Pid: 30692, comm: kworker/6:0 Not tainted 3.6.0 #5 Intel S5000PAL./S5000PAL0
[108420.352003] RIP: 0010:[]  [] _raw_spin_unlock_irqrestore+0x5/0x6
[108420.352003] RSP: 0018:8802563a7d98  EFLAGS: 0286
[108420.352003] RAX: 88024e975000 RBX: 00bb RCX: 

[108420.352003] RDX:  RSI: 0286 RDI: 
88024e975050
[108420.352003] RBP: 88024e975000 R08:  R09: 
8166f890
[108420.352003] R10: 88024e975000 R11: a00d6bf0 R12: 

[108420.352003] R13: 8166f890 R14: 88024e975000 R15: 
a00d6bf0
[108420.352003] FS:  () GS:88025fd8() 
knlGS:
[108420.352003] CS:  0010 DS:  ES:  CR0: 8005003b
[108420.352003] CR2: 7f5b5dec6070 CR3: 00024f0eb000 CR4: 
07e0
[108420.352003] DR0:  DR1:  DR2: 

[108420.352003] DR3:  DR6: 0ff0 DR7: 
0400
[108420.352003] Process kworker/6:0 (pid: 30692, threadinfo 8802563a6000, task 880252f32a10)
[108420.352003] Stack:
[108420.352003]  8125a498 88025fd8d080 0286 
a0015c28
[108420.352003]  88024d1207c0 88025fd8d080 88025fd98100 
a0015c28
[108420.352003]   88024f22abd8 81045d07 
00012240
[108420.352003] Call Trace:
[108420.352003]  [] ? scsi_remove_target+0x138/0x154
[108420.352003]  [] ? store_fc_host_system_hostname+0x66/0x66 [scsi_transport_fc]
[108420.352003]  [] ? store_fc_host_system_hostname+0x66/0x66 [scsi_transport_fc]
[108420.352003]  [] ? process_one_work+0x1f8/0x30a
[108420.352003]  [] ? worker_thread+0x21b/0x314
[108420.352003]  [] ? process_one_work+0x30a/0x30a
[108420.352003]  [] ? process_one_work+0x30a/0x30a
[108420.352003]  [] ? kthread+0x81/0x89
[108420.352003]  [] ? kernel_thread_helper+0x4/0x10
[108420.352003]  [] ? kthread_freezable_should_stop+0x4e/0x4e
[108420.352003]  [] ? gs_change+0xb/0xb
[108420.352003] Code: 66 39 d0 0f 94 c0 0f b6 c0 c3 fa b8 00 01 00 00 f0 66 0f 
c1 07 88 c2 66 c1 e8 08 38 c2 74 06 f3 90 8a 17 eb f6 c3 80 07 01 56 9d  83 
ca ff f0 0f c1 17 b8 01 00 00 00 ff ca 79 05 f0 ff 07 30 
[108448.348033] BUG: soft lockup - CPU#6 stuck for 22s! [kworker/6:0:30692]
[108448.352003] Modules linked in: nfsv4 autofs4 ip6table_filter ip6_tables 
iptable_filter ip_tables ebtable_nat ebtables x_tables rpcsec_gss_krb5 ipv6 
nfsd nfs_acl auth_rpcgss nfs lockd sunrpc dm_round_robin dm_multipath 
ipmi_devintf ipmi_si ipmi_msghandler sg evdev acpi_cpufreq freq_table serio_raw 
mperf processor button thermal_sys coretemp kvm_intel kvm lpc_ich ioatdma 
mfd_core tpm_tis i2c_i801 tpm microcode tpm_bios rng_core i2c_core i5k_amb dca 
ses enclosure ata_generic lpfc ata_piix scsi_transport_fc scsi_tgt
[108448.352003] CPU 6 
[108448.352003] Pid: 30692, comm: kworker/6:0 Not tainted 3.6.0 #5 Intel S5000PAL./S5000PAL0
[108448.352003] RIP: 0010:[]  [] _raw_spin_unlock_irqrestore+0x5/0x6
[108448.352003] RSP: 0018:8802563a7d98  EFLAGS: 0286
[108448.352003] RAX: 88024e975000 RBX: 00a2 RCX: 
0087
[108448.352003] RDX: 

Re: [PATCH v2 1/3] w1: mxc_w1: Adapt the clock name to the new clock framework

2012-10-02 Thread Fabio Estevam
Sascha,

On Tue, Oct 2, 2012 at 2:32 AM, Evgeniy Polyakov  wrote:
> On Mon, Oct 01, 2012 at 02:51:44PM -0300, Fabio Estevam 
> (fabio.este...@freescale.com) wrote:
>> Evgeny,
>>
>> Any comments, please?
>
> I have no objections per se, but I'm hardly an expert in imx clock
> framework :)
>
> Since it is only one patch in set of 3, I suppose it will be pushed
> through different tree than w1. Feel free to add my ack.
>
> Acked-by: Evgeniy Polyakov 

Could this series go via your tree?

Regards,

Fabio Estevam


Re: [PATCH] regmap: silence GCC warning

2012-10-02 Thread Valdis . Kletnieks
On Mon, 01 Oct 2012 11:03:21 +0100, Mark Brown said:
> On Sun, Sep 30, 2012 at 12:15:55PM +0200, Paul Bolle wrote:
> > Building regmap.o triggers this GCC warning:
> > drivers/base/regmap/regmap.c: In function ‘regmap_raw_read’:
> > drivers/base/regmap/regmap.c:1172:6: warning: ‘ret’ may be used 
> > uninitialized in this function [-Wmaybe-uninitialized]
> >
> > It seems 'ret' should always be set when this function returns. See, the
> > else-branch can leave 'ret' uninitialized only if 'val_count' is zero.
> > But if 'val_count' is zero regmap_volatile_range() will return true.

I've not dug into it that deeply - is there a way that gcc is able to intuit
this fact and use it for flow analysis?  If not, it's not going to be able to
include that information in its analysis.

> > That implies that 'ret' will be set in the if-branch. ('val_count' could
> > be zero if 'val_len' is, for example, zero. That would be useless input,
> > however.)

But gcc doesn't know what "useless input" means, semantically.

> > Anyhow, initializing 'ret' to -EINVAL silences GCC and is harmless.
>
> Have you reported this bug in GCC?  Their flow analysis just seems to
> keep on getting worse and worse.

I'm not convinced that it's at fault in this particular case...




Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu

2012-10-02 Thread Takao Indoh

(2012/10/03 4:37), Andi Kleen wrote:

Takao Indoh  writes:


These patches reset PCIe devices at boot time to address DMA problem on
kdump with iommu. When "reset_devices" is specified, a hot reset is
triggered on each PCIe root port and downstream port to reset its
downstream endpoint.


Great. I've been pondering this for a long time, but you did finally
implement it. I hope this will make kdump a lot more reliable at least
on the systems that support per port reset.


Actually I got the idea of this patch from your comment below;-)
http://permalink.gmane.org/gmane.linux.network/162414



Now the only question is: why make it an option and not the default?


Except for kdump, I don't know of a situation where this reset is useful,
so I introduced it as an option so that it works only during kdump.

Thanks,
Takao Indoh



Re: [PATCH 03/15] cfq-iosched: Rename "service_tree" to "st"

2012-10-02 Thread Tejun Heo
Hello,

On Tue, Oct 02, 2012 at 09:26:03AM -0400, Vivek Goyal wrote:
> Yes this one is little odd. Ok, I will change it back to "service_tree"
> and only use "st" for local variables and in some function names.

Yes, please do that.  In general, it's beneficial to use at least
somewhat descriptive names for globals / struct fields and use
consistent shorthands for local variables dealing with them.

Thanks.

-- 
tejun


Re: [PATCH 01/15] cfq-iosched: Properly name all references to IO class

2012-10-02 Thread Tejun Heo
On Mon, Oct 01, 2012 at 03:32:42PM -0400, Vivek Goyal wrote:
> Currently CFQ has three IO classes: RT, BE and IDLE. In many places we
> refer to workloads belonging to these classes as "prio". This gets
> very confusing, as one starts to associate it with ioprio.
> 
> So this patch just does a bunch of renaming so that the code becomes
> easier to read. All references to RT, BE and IDLE workloads are made
> using the keyword "class", and all references to the subclasses SYNC,
> SYNC-IDLE and ASYNC are made using the keyword "type".
> 
> This makes me feel much better while I am reading the code. There is no
> functionality change due to this patch.
> 
> Signed-off-by: Vivek Goyal 

Acked-by: Tejun Heo 

> @@ -751,16 +751,16 @@ static enum wl_type_t cfqq_type(struct cfq_queue *cfqq)
>   return SYNC_WORKLOAD;
>  }
>  
> -static inline int cfq_group_busy_queues_wl(enum wl_prio_t wl,
> +static inline int cfq_group_busy_queues_wl(enum wl_class_t wl_class,
>   struct cfq_data *cfqd,
>   struct cfq_group *cfqg)
>  {
> - if (wl == IDLE_WORKLOAD)
> + if (wl_class == IDLE_WORKLOAD)
>   return cfqg->service_tree_idle.count;
>  
> - return cfqg->service_trees[wl][ASYNC_WORKLOAD].count
> - + cfqg->service_trees[wl][SYNC_NOIDLE_WORKLOAD].count
> - + cfqg->service_trees[wl][SYNC_WORKLOAD].count;
> + return cfqg->service_trees[wl_class][ASYNC_WORKLOAD].count
> + + cfqg->service_trees[wl_class][SYNC_NOIDLE_WORKLOAD].count
> + + cfqg->service_trees[wl_class][SYNC_WORKLOAD].count;

While at it, maybe move the operator to the end of the preceding line
like everybody else?
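For illustration, here is the quoted function with the operators moved to the ends of the preceding lines, as suggested. The enum values and structs below are minimal stand-ins (assumptions, not the real CFQ definitions), and the unused cfqd parameter is dropped so the snippet compiles on its own.

```c
#include <assert.h>

/* Minimal stand-ins for the CFQ types; illustrative only. */
enum wl_class_t { BE_WORKLOAD = 0, RT_WORKLOAD = 1, IDLE_WORKLOAD = 2 };
enum wl_type_t { ASYNC_WORKLOAD = 0, SYNC_NOIDLE_WORKLOAD = 1, SYNC_WORKLOAD = 2 };

struct cfq_rb_root {
	unsigned int count;	/* number of queues on this service tree */
};

struct cfq_group {
	struct cfq_rb_root service_trees[2][3];	/* [BE/RT][workload type] */
	struct cfq_rb_root service_tree_idle;
};

static inline int cfq_group_busy_queues_wl(enum wl_class_t wl_class,
					   struct cfq_group *cfqg)
{
	if (wl_class == IDLE_WORKLOAD)
		return cfqg->service_tree_idle.count;

	/* Each '+' trails its line, matching common kernel style. */
	return cfqg->service_trees[wl_class][ASYNC_WORKLOAD].count +
	       cfqg->service_trees[wl_class][SYNC_NOIDLE_WORKLOAD].count +
	       cfqg->service_trees[wl_class][SYNC_WORKLOAD].count;
}
```

Behaviour is unchanged; only the line breaks move, which is the point of the review comment.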

Thanks.

-- 
tejun


Re: [PATCH 0/8] THP support for Sparc64

2012-10-02 Thread David Miller
From: Andrew Morton 
Date: Tue, 2 Oct 2012 15:55:44 -0700

> I had a shot at integrating all this onto the pending stuff in linux-next. 
> "mm: Add and use update_mmu_cache_pmd() in transparent huge page code."
> needed minor massaging in huge_memory.c.  But as Andrea mentioned, we
> ran aground on Gerald's
> http://ozlabs.org/~akpm/mmotm/broken-out/thp-remove-assumptions-on-pgtable_t-type.patch,
> part of the thp-for-s390 work.

I'll rebase my work against that stuff, looks fine to me.


Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL

2012-10-02 Thread David Miller
From: Ben Hutchings 
Date: Wed, 03 Oct 2012 01:17:12 +0100

> On Tue, 2012-10-02 at 23:06 +0400, Michael Tokarev wrote:
>> On 02.10.2012 22:49, Ferenc Wagner wrote:
>> > "Michael Chan"  writes:
>> >> These are the likely fixes:
>> >>
>> >> commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db 
>> >> Author: Matt Carlson 
>> >> Date: Mon Nov 28 09:41:03 2011 +
>> >>
>> >> tg3: Fix TSO CAP for 5704 devs w / ASF enabled
 ...
> The fix went into 3.3, so only 3.0 and 3.2 need it.
> 
> David, please can you include the above commit in your next batches for
> these stable series?

Done.


[PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))

2012-10-02 Thread Jiri Kosina
On Tue, 2 Oct 2012, Paul E. McKenney wrote:

> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
> > On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> > 
> > > Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
> > > notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
> > > CPU hotplug events.  I could go back to the old approach, but it is 
> > > significantly more complex.  I cannot say that I am all that happy about 
> > > anyone calling rcu_barrier() from a CPU hotplug notifier because it 
> > > doesn't help CPU hotplug latency, but that is a separate issue.
> > > 
> > > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > > (1) it excludes hotplug operations or (2) if it is called from a hotplug
> > > notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> > > is executing.  So the right way to resolve this seems to be to do the
> > > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > > of a hotplug notifier.  Should be fixable without too much hassle...
> > 
> > Sorry, I don't think I understand what you are proposing just yet.
> > 
> > If I understand it correctly, you are proposing to introduce some magic 
> > into _rcu_barrier() such as (pseudocode of course):
> > 
> > if (!being_called_from_hotplug_notifier_callback)
> > get_online_cpus()
> > 
> > How does that protect from the scenario I've outlined before though?
> > 
> > CPU 0                             CPU 1
> > kmem_cache_destroy()
> > mutex_lock(slab_mutex)
> >                                   _cpu_up()
> >                                   cpu_hotplug_begin()
> >                                   mutex_lock(cpu_hotplug.lock)
> > rcu_barrier()
> > _rcu_barrier()
> > get_online_cpus()
> > mutex_lock(cpu_hotplug.lock)
> >  (blocks, CPU 1 has the mutex)
> >                                   __cpu_notify()
> >                                   mutex_lock(slab_mutex)
> > 
> > CPU 0 grabs both locks anyway (it's not running from notifier callback). 
> > CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
> > from notifier callback either.
> > 
> > What did I miss?
> 
> You didn't miss anything, I was suffering a failure to read carefully.
> 
> So my next stupid question is "Why can't kmem_cache_destroy drop
> slab_mutex early?" like the following:
> 
>   void kmem_cache_destroy(struct kmem_cache *cachep)
>   {
>           BUG_ON(!cachep || in_interrupt());
> 
>           /* Find the cache in the chain of caches. */
>           get_online_cpus();
>           mutex_lock(&slab_mutex);
>           /*
>            * the chain is never empty, cache_cache is never destroyed
>            */
>           list_del(&cachep->list);
>           if (__cache_shrink(cachep)) {
>                   slab_error(cachep, "Can't free all objects");
>                   list_add(&cachep->list, &slab_caches);
>                   mutex_unlock(&slab_mutex);
>                   put_online_cpus();
>                   return;
>           }
>           mutex_unlock(&slab_mutex);
> 
>           if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>                   rcu_barrier();
> 
>           __kmem_cache_destroy(cachep);
>           put_online_cpus();
>   }
> 
> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
> Looks to me like it is just freeing now-disconnected memory.

Good question. I believe it should be safe to drop slab_mutex earlier, as 
cachep has already been unlinked. I am adding slab people and linux-mm to 
CC (the whole thread on LKML can be found at 
https://lkml.org/lkml/2012/10/2/296 for reference).

How about the patch below? Pekka, Christoph, please?

It makes lockdep happy again, and obviously removes the deadlock (I
tested it).
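The slab_mutex / cpu_hotplug.lock inversion discussed in this thread can be illustrated with a toy lock-order recorder. This is a hypothetical sketch, not lockdep: it records an edge A -> B whenever B is acquired while A is held, and flags a problem once two locks have been taken in both orders. get_online_cpus() is modelled as briefly taking cpu_hotplug.lock; the refcount it leaves behind is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of lock-order tracking; NOT lockdep. */
enum lock_id { SLAB_MUTEX, HOTPLUG_LOCK, NR_LOCKS };

struct order_graph {
	bool edge[NR_LOCKS][NR_LOCKS];	/* edge[a][b]: b taken while a held */
	bool held[NR_LOCKS];
};

static void acquire(struct order_graph *g, enum lock_id l)
{
	for (int h = 0; h < NR_LOCKS; h++)
		if (g->held[h])
			g->edge[h][l] = true;
	g->held[l] = true;
}

static void release(struct order_graph *g, enum lock_id l)
{
	g->held[l] = false;
}

/* True if some pair of locks has been acquired in both orders. */
static bool has_inversion(const struct order_graph *g)
{
	for (int a = 0; a < NR_LOCKS; a++)
		for (int b = a + 1; b < NR_LOCKS; b++)
			if (g->edge[a][b] && g->edge[b][a])
				return true;
	return false;
}

/* CPU hotplug: cpu_hotplug.lock held while the slab notifier runs. */
static void hotplug_path(struct order_graph *g)
{
	acquire(g, HOTPLUG_LOCK);
	acquire(g, SLAB_MUTEX);
	release(g, SLAB_MUTEX);
	release(g, HOTPLUG_LOCK);
}

/* Before the patch: rcu_barrier() -> get_online_cpus() under slab_mutex. */
static void destroy_old(struct order_graph *g)
{
	acquire(g, SLAB_MUTEX);
	acquire(g, HOTPLUG_LOCK);
	release(g, HOTPLUG_LOCK);
	release(g, SLAB_MUTEX);
}

/* After the patch: slab_mutex is dropped before rcu_barrier(). */
static void destroy_new(struct order_graph *g)
{
	acquire(g, SLAB_MUTEX);
	release(g, SLAB_MUTEX);
	acquire(g, HOTPLUG_LOCK);
	release(g, HOTPLUG_LOCK);
}
```

Running hotplug_path() together with destroy_old() records both SLAB_MUTEX -> HOTPLUG_LOCK and HOTPLUG_LOCK -> SLAB_MUTEX; with destroy_new() only the hotplug edge remains.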



From: Jiri Kosina 
Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()

Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
dependency through kmem_cache_destroy() -> rcu_barrier() ->
_rcu_barrier() -> get_online_cpus().

This opens a possibility for deadlock:

CPU 0                             CPU 1
kmem_cache_destroy()
mutex_lock(slab_mutex)
                                  _cpu_up()
                                  cpu_hotplug_begin()
                                  mutex_lock(cpu_hotplug.lock)
rcu_barrier()
_rcu_barrier()
get_online_cpus()
mutex_lock(cpu_hotplug.lock)
 (blocks, CPU 1 has the mutex)
                                  __cpu_notify()
                                  mutex_lock(slab_mutex)

It turns out that slab's kmem_cache_destroy() might release 

Re: [PATCH] hardening: add PROT_FINAL prot flag to mmap/mprotect

2012-10-02 Thread Hugh Dickins
On Tue, 2 Oct 2012, Andrew Morton wrote:
> On Tue, 2 Oct 2012 15:10:56 -0700
> Kees Cook  wrote:
> 
> > >> Has there been any more progress on this patch over-all?
> > >
> > > No progress.
> > 
> > Al, Andrew, anyone? Thoughts on this?
> > (First email is https://lkml.org/lkml/2012/8/14/448)
> 
> Wasn't cc'ed, missed it.
> 
> The patch looks straightforward enough.  Have the maintainers of the
> runtime linker (I guess that's glibc) provided any feedback on the
> proposal?

It looks reasonable to me too.  I checked through VM_MAYflag handling
and don't expect surprises (a few places already turn off VM_MAYWRITE
in much the same way that this does, I hadn't realized).

I'm disappointed to find that our mmap() is lax about checking its
PROT and MAP args, so old kernels will accept PROT_FINAL but do
nothing with it.  Luckily mprotect() is stricter, so that can be
used to check for whether it's supported.
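Following the observation that mprotect() rejects unknown PROT bits while mmap() ignores them, a userspace runtime check might look roughly like this. PROT_FINAL is not in any released header; the value 0x40 below is a placeholder assumption and may not match the proposed patch.

```c
#define _GNU_SOURCE		/* for MAP_ANONYMOUS */
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef PROT_FINAL
#define PROT_FINAL 0x40		/* hypothetical placeholder value */
#endif

/*
 * Returns true if the running kernel accepts PROT_FINAL in mprotect().
 * mmap() cannot be used as the probe because it does not reject
 * unknown PROT bits; mprotect() fails with EINVAL instead.
 */
static bool prot_final_supported(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	bool ok;

	if (p == MAP_FAILED)
		return false;
	ok = mprotect(p, len, PROT_READ | PROT_FINAL) == 0;
	munmap(p, len);
	return ok;
}
```

A dynamic linker could call such a probe once at startup and only rely on the flag when it is accepted.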

The patch does need to be slightly extended though: alpha, mips,
parisc and xtensa have their own include/asm/mman.h, which does
not include asm-generic/mman-common.h at all.

Hugh


Re: [PATCH] mm: use %pK for /proc/vmallocinfo

2012-10-02 Thread KOSAKI Motohiro
On Tue, Oct 2, 2012 at 7:49 PM, Kees Cook  wrote:
> In the paranoid case of sysctl kernel.kptr_restrict=2, mask the kernel
> virtual addresses in /proc/vmallocinfo too.
>
> Reported-by: Brad Spengler 
> Signed-off-by: Kees Cook 
> ---
>  mm/vmalloc.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 2bb90b1..9c871db 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2572,7 +2572,7 @@ static int s_show(struct seq_file *m, void *p)
>  {
> struct vm_struct *v = p;
>
> -   seq_printf(m, "0x%p-0x%p %7ld",
> +   seq_printf(m, "0x%pK-0x%pK %7ld",
> v->addr, v->addr + v->size, v->size);

Looks good.
Acked-by: KOSAKI Motohiro 
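A simplified model of how %pK decides whether to reveal a pointer, assuming the documented kernel.kptr_restrict semantics (0: print, 1: print only for readers with CAP_SYSLOG, 2: always print zeros). The real vsprintf() code has further checks that are not modelled here; with the patch, the /proc/vmallocinfo ranges would follow this rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the %pK policy; not the kernel's implementation. */
static bool pk_reveals_pointer(int kptr_restrict, bool has_cap_syslog)
{
	switch (kptr_restrict) {
	case 0:
		return true;		/* no restriction */
	case 1:
		return has_cap_syslog;	/* privileged readers only */
	default:
		return false;		/* "paranoid": always zeros */
	}
}
```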


Re: [PATCH 01/17] ARM: shmobile: fix memory size for kota2_defconfig

2012-10-02 Thread Simon Horman
On Tue, Oct 02, 2012 at 06:36:40PM +0200, Arnd Bergmann wrote:
> The CONFIG_MEMORY_SIZE value is interpreted as a 32 bit integer, which
> makes sense on a system without PAE. I'm assuming 0x10000000 (256 MB)
> is the correct size, because that is used on most other shmobile
> boards.
> 
> Without this patch, building kota2_defconfig results in:

Hi Arnd,

I looked through my files and found a config that I believe
worked with a derivative of 2.6.35.7.

It has CONFIG_MEMORY_SIZE=0x1e800000.

So what I suspect has happened is that an extra zero has crept into
arch/arm/configs/kota2_defconfig and the intended value is:

CONFIG_MEMORY_SIZE=0x1e000000

Unfortunately I do not have access to a board to test this,
nor am I aware of anyone who does.
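The arithmetic behind the -Woverflow warning can be checked quickly. This sketch assumes the defconfig value was 0x1e0000000 (one zero too many) and models storing it in a 32-bit unsigned field; the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Models assigning a Kconfig hex value to a 32-bit unsigned field. */
static uint32_t store_u32(unsigned long long v)
{
	return (uint32_t)v;	/* bits above bit 31 are silently dropped */
}

static unsigned long long to_mib(unsigned long long bytes)
{
	return bytes / MIB;
}
```

Truncation turns 0x1e0000000 into 0xe0000000 (3584 MiB), while 0x1e000000 is 480 MiB, 0x1e800000 is 488 MiB and 0x10000000 is 256 MiB.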

> /home/arnd/linux-arm/arch/arm/kernel/setup.c:790:2: warning: large integer 
> implicitly truncated to unsigned type [-Woverflow]
> 
> Signed-off-by: Arnd Bergmann 
> Cc: Paul Mundt 
> Cc: Magnus Damm 
> Cc: linux...@vger.kernel.org
> Cc: Simon Horman 
> ---
>  arch/arm/configs/kota2_defconfig |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/configs/kota2_defconfig 
> b/arch/arm/configs/kota2_defconfig
> index b7735d6..0ea4c90 100644
> --- a/arch/arm/configs/kota2_defconfig
> +++ b/arch/arm/configs/kota2_defconfig
> @@ -21,7 +21,7 @@ CONFIG_ARCH_SHMOBILE=y
>  CONFIG_KEYBOARD_GPIO_POLLED=y
>  CONFIG_ARCH_SH73A0=y
>  CONFIG_MACH_KOTA2=y
> -CONFIG_MEMORY_SIZE=0x1e0000000
> +CONFIG_MEMORY_SIZE=0x10000000
>  # CONFIG_SH_TIMER_TMU is not set
>  # CONFIG_SWP_EMULATE is not set
>  CONFIG_CPU_BPREDICT_DISABLE=y
> -- 
> 1.7.10
> 


Re: [PATCH 3/7] staging: rts_pstor: reuse kbasename()

2012-10-02 Thread Ryan Mallon
On 03/10/12 10:30, Ryan Mallon wrote:
> On 03/10/12 01:00, Andy Shevchenko wrote:
>> The custom filename function mostly repeats the kernel's kbasename. This 
>> patch
>> simplifies it. The updated filename() will not check for the '\' in the
>> filenames. It seems redundant in Linux.
>>
>> Signed-off-by: Andy Shevchenko 
>> Cc: YAMANE Toshiaki 
>> Cc: Greg Kroah-Hartman 
>> ---
>>  drivers/staging/rts_pstor/trace.h |   16 +++-
>>  1 file changed, 3 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/staging/rts_pstor/trace.h 
>> b/drivers/staging/rts_pstor/trace.h
>> index cf60a1b..59c5686 100644
>> --- a/drivers/staging/rts_pstor/trace.h
>> +++ b/drivers/staging/rts_pstor/trace.h
>> @@ -24,26 +24,16 @@
>>  #ifndef __REALTEK_RTSX_TRACE_H
>>  #define __REALTEK_RTSX_TRACE_H
>>  
>> +#include 
>> +
>>  #define _MSG_TRACE
>>  
>>  #ifdef _MSG_TRACE
>>  static inline char *filename(char *path)
>>  {
>> -char *ptr;
>> -
>>  if (path == NULL)
>>  return NULL;
>> -
>> -ptr = path;
>> -
>> -while (*ptr != '\0') {
>> -if ((*ptr == '\\') || (*ptr == '/'))
>> -path = ptr + 1;
> 
> The original version here returns the string after the last '/' or '\',
> the new kbasename function only looks for '/'. Does that matter here, or
> was the original code over eager?

Never mind, I didn't read the changelog fully :-/.

~Ryan




Re: [RFT/PATCH] serial: omap: prevent resume if device is not suspended.

2012-10-02 Thread Kevin Hilman
"Poddar, Sourav"  writes:

> Hi,
>
> On Tue, Sep 25, 2012 at 2:51 PM, Russell King - ARM Linux
>  wrote:
>> On Tue, Sep 25, 2012 at 12:11:14PM +0300, Felipe Balbi wrote:
>>> On Tue, Sep 25, 2012 at 10:12:28AM +0100, Russell King - ARM Linux wrote:
>>> > On Tue, Sep 25, 2012 at 11:31:20AM +0300, Felipe Balbi wrote:
>>> > > On Tue, Sep 25, 2012 at 09:30:29AM +0100, Russell King - ARM Linux 
>>> > > wrote:
>>> > > > How is this happening?  I think that needs proper investigation - or 
>>> > > > if
>>> > > > it's had more investigation, then the results needs to be included in
>>> > > > the commit description so that everyone can understand the issue here.
>>> > > >
>>> > > > We should not be resuming a device which hasn't been suspended.  Maybe
>>> > > > the runtime PM enable sequence is wrong, and that's what should be 
>>> > > > fixed
>>> > > > instead?
>>> > > >
>>> > > > This sequence in the probe() function:
>>> > > >
>>> > > > pm_runtime_irq_safe(&pdev->dev);
>>> > > > pm_runtime_enable(&pdev->dev);
>>> > > > pm_runtime_get_sync(&pdev->dev);
>>> > > >
>>> > > > would enable runtime PM while the s/w state indicates that it's 
>>> > > > disabled,
>>> > > > and then that pm_runtime_get_sync() will want to resume the device.  
>>> > > > See
>>> > > > the section "5. Runtime PM Initialization, Device Probing and Removal"
>>> > > > in Documentation/power/runtime_pm.txt, specifically the second 
>>> > > > paragraph
>>> > > > of that section.
>>> > >
>>> > > that was tested. It worked in pandaboard but didn't work on beagleboard
>>> > > XM. Sourav tried to start a discussion about that, but it simply died...
>>> > >
>>> > > In any case, pm_runtime_get_sync() in probe will always call
>>> > > runtime_resume callback, right ?
>>> >
>>> > Well, if the runtime PM state says it's suspended, and then you enable
>>> > runtime PM, the first call to pm_runtime_get_sync() will trigger a resume
>>> > attempt.  The patch description is complaining about resume events without
>>> > there being a preceding suspend event.
>>> >
>>> > This could well be why.
>>>
>>> that's most likely, of course. But should we cause a regression to
>>> beagleboard XM because of that ?
>>
>> What would cause a regression on beagleboard XM?  I have not suggested
>> any change other than more investigation of the issue and a fuller patch
>> description - yet you're screaming (idiotically IMHO) that mere
>> investigation would break beagleboard.
>>
>> Well, if it's _that_ fragile, that mere investigation of this issue by
>> someone elsewhere on the planet would break your beagleboard, maybe it
>> deserves to be broken!
>
> The issue was observed at serial init itself on the N800 board, and the
> log does not show much.
> http://www.pwsan.com/omap/testlogs/test_tty_next_e36851d0/20120910020323/boot/2420n800/2420n800_log.txt
>  What we thought the problem might be with n800 is that it tries to
> resume when it didn't suspend before.
>
> There are two ways through which we thought of handling this issue:
>
> a) set device as active before enabling pm (which will prevent
>
> pm_runtime_set_active(dev);
> pm_runtime_enable(dev);
>
> OR
>
> b) adding a "suspended" flag to struct omap_uart_port which gets set on
> suspend and cleared on resume. Then on resume you can check:
>
> if (!up->suspended)
> return 0;
>
> But using the "pm_runtime_set_active" approach breaks things even on
> BeagleBoard-xM, though it works fine on Panda.
> Therefore, we used the "suspended" flag approach.
>
> So I just wanted to get some feedback from the community about how using
> "pm_runtime_set_active" behaves differently on OMAP3 and OMAP4.

As Russell has already pointed out in great detail, the difference is
simply a mismatch between the assumed HW state and the actual HW state,
which differs between boards.  Put simply, the driver assumes the HW is
disabled (runtime suspended) when it loads, and the first runtime resume
is meant to enable the HW.  If that assumption is wrong, it needs to be
fixed.
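The initialization rule referred to above (section 5 of Documentation/power/runtime_pm.txt) can be modelled in a few lines. This is a hypothetical, heavily simplified model, not the kernel API: the core starts every device as "suspended", so the first get_sync() calls the driver's resume callback unless the driver first declared the hardware active.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified model of runtime PM state; not the kernel API. */
struct rpm_dev {
	bool suspended;		/* status as the PM core sees it */
	int usage_count;
	int resume_calls;	/* times the driver's ->runtime_resume() ran */
};

static void rpm_init(struct rpm_dev *d)
{
	d->suspended = true;	/* initial runtime PM status is "suspended" */
	d->usage_count = 0;
	d->resume_calls = 0;
}

/* Models pm_runtime_set_active(): HW is already powered and configured. */
static void rpm_set_active(struct rpm_dev *d)
{
	d->suspended = false;
}

/* Models pm_runtime_get_sync() after runtime PM has been enabled. */
static void rpm_get_sync(struct rpm_dev *d)
{
	d->usage_count++;
	if (d->suspended) {
		d->resume_calls++;	/* ->runtime_resume() would run here */
		d->suspended = false;
	}
}
```

Without rpm_set_active() the first get_sync() produces a resume with no preceding suspend, which matches the symptom described in the patch; with it, no resume callback runs at probe time.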

Have you figured out why the HW is already active on OMAP2?  (probably
bootloader?)  

That being said, already active HW should not cause this problem.  In
fact, because of possible early console use, the hwmod init of the UART
hwmods does not idle/reset them on boot, so they are left in the state
that the bootloader set them in.  

When the hwmod is later enabled for real during probe, the hwmod muxing
is done for that IP.  So, I suspect what is really happening is that the
mux settings are not right for the UARTS on n800, so when the probe
happens, the UART mux settings are changed and you lose the UART.

Can you double check the UART mux settings for that board?  You might
need some different mux settings in the board file.

Kevin


Re: [PATCH 3/7] staging: rts_pstor: reuse kbasename()

2012-10-02 Thread Ryan Mallon
On 03/10/12 01:00, Andy Shevchenko wrote:
> The custom filename function mostly repeats the kernel's kbasename. This patch
> simplifies it. The updated filename() will not check for the '\' in the
> filenames. It seems redundant in Linux.
> 
> Signed-off-by: Andy Shevchenko 
> Cc: YAMANE Toshiaki 
> Cc: Greg Kroah-Hartman 
> ---
>  drivers/staging/rts_pstor/trace.h |   16 +++-
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/staging/rts_pstor/trace.h 
> b/drivers/staging/rts_pstor/trace.h
> index cf60a1b..59c5686 100644
> --- a/drivers/staging/rts_pstor/trace.h
> +++ b/drivers/staging/rts_pstor/trace.h
> @@ -24,26 +24,16 @@
>  #ifndef __REALTEK_RTSX_TRACE_H
>  #define __REALTEK_RTSX_TRACE_H
>  
> +#include 
> +
>  #define _MSG_TRACE
>  
>  #ifdef _MSG_TRACE
>  static inline char *filename(char *path)
>  {
> - char *ptr;
> -
>   if (path == NULL)
>   return NULL;
> -
> - ptr = path;
> -
> - while (*ptr != '\0') {
> - if ((*ptr == '\\') || (*ptr == '/'))
> - path = ptr + 1;

The original version here returns the string after the last '/' or '\',
the new kbasename function only looks for '/'. Does that matter here, or
was the original code over eager?
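The behavioural difference in question can be shown side by side. The first function mirrors the driver's original filename(); the second is a userspace copy of what kbasename() does (only '/' is a separator, and the whole string is returned when there is none). A bare strrchr(path, '/') + 1 would instead misbehave on paths without a '/'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the driver's original filename(): splits on '/' and '\\'. */
static const char *filename_old(const char *path)
{
	const char *ptr;

	if (path == NULL)
		return NULL;
	for (ptr = path; *ptr != '\0'; ptr++)
		if (*ptr == '\\' || *ptr == '/')
			path = ptr + 1;
	return path;
}

/* Userspace copy of kbasename(): only '/' separates path components. */
static const char *kbasename_like(const char *path)
{
	const char *tail = strrchr(path, '/');

	return tail ? tail + 1 : path;
}
```

For __FILE__ strings produced by a Linux build the two agree; they only differ when a '\' appears in the path.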

~Ryan

> - 
> - ptr++;
> - }
> -
> - return path;
> + return kbasename(path);
>  }
>  
>  #define TRACE_RET(chip, ret) 
> \
> 



Re: [PATCH v3 1/2] hwmon: (ads7828) driver cleanup

2012-10-02 Thread Guenter Roeck
On Tue, Oct 02, 2012 at 06:10:02PM -0400, Vivien Didelot wrote:
> * Remove module parameters, add a ads7828_platform_data;
> * Move driver declaration to avoid adding function prototypes;
> * Remove unused macros;
> * Coding Style fixes.
> 
> Signed-off-by: Vivien Didelot 

Hi Vivien,

nice cleanup. Couple of minor comments below.
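As a reading aid for the platform data documented in the patch below, here is a sketch of how the optional structure could be resolved into a command byte and reference voltage. The constants mirror the patch's defines; the resolver function and the cfg struct are assumptions for illustration, and the driver's actual code may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADS7828_CMD_SD_SE	0x80	/* Single ended inputs */
#define ADS7828_CMD_PD1		0x04	/* Internal vref OFF && A/D ON */
#define ADS7828_CMD_PD3		0x0C	/* Internal vref ON && A/D ON */
#define ADS7828_INT_VREF_MV	2500
#define ADS7828_EXT_VREF_MV_MIN	50
#define ADS7828_EXT_VREF_MV_MAX	5250

/* Fields as documented for struct ads7828_platform_data. */
struct ads7828_platform_data {
	bool diff_input;
	bool ext_vref;
	int vref_mv;
};

/* Hypothetical resolved configuration. */
struct ads7828_cfg {
	uint8_t cmd_byte;	/* base command byte, before channel bits */
	int vref_mv;
};

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

static struct ads7828_cfg
ads7828_resolve(const struct ads7828_platform_data *pdata)
{
	/* Defaults: single ended operation, internal 2.5V reference. */
	struct ads7828_cfg cfg = { ADS7828_CMD_SD_SE | ADS7828_CMD_PD3,
				   ADS7828_INT_VREF_MV };

	if (pdata == NULL)
		return cfg;

	cfg.cmd_byte = pdata->diff_input ? 0 : ADS7828_CMD_SD_SE;
	if (pdata->ext_vref) {
		cfg.cmd_byte |= ADS7828_CMD_PD1;
		cfg.vref_mv = clamp_int(pdata->vref_mv,
					ADS7828_EXT_VREF_MV_MIN,
					ADS7828_EXT_VREF_MV_MAX);
	} else {
		cfg.cmd_byte |= ADS7828_CMD_PD3;
	}
	return cfg;
}
```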

> ---
>  Documentation/hwmon/ads7828   |  31 +++--
>  drivers/hwmon/ads7828.c   | 216 
> +-
>  include/linux/platform_data/ads7828.h |  29 +
>  3 files changed, 155 insertions(+), 121 deletions(-)
>  create mode 100644 include/linux/platform_data/ads7828.h
> 
> diff --git a/Documentation/hwmon/ads7828 b/Documentation/hwmon/ads7828
> index 2bbebe6..b35668c 100644
> --- a/Documentation/hwmon/ads7828
> +++ b/Documentation/hwmon/ads7828
> @@ -5,21 +5,32 @@ Supported chips:
>* Texas Instruments/Burr-Brown ADS7828
>  Prefix: 'ads7828'
>  Addresses scanned: I2C 0x48, 0x49, 0x4a, 0x4b
> -Datasheet: Publicly available at the Texas Instruments website :
> +Datasheet: Publicly available at the Texas Instruments website:
> http://focus.ti.com/lit/ds/symlink/ads7828.pdf
>  
>  Authors:
>  Steve Hardy 
>  
> -Module Parameters
> --
> -
> -* se_input: bool (default Y)
> -  Single ended operation - set to N for differential mode
> -* int_vref: bool (default Y)
> -  Operate with the internal 2.5V reference - set to N for external reference
> -* vref_mv: int (default 2500)
> -  If using an external reference, set this to the reference voltage in mV
> +Platform data
> +-
> +
> +The ads7828 driver accepts an optional ads7828_platform_data structure (defined
> +in include/linux/platform_data/ads7828.h). If no structure is provided, the
> +configuration defaults to single ended operation and internal vref (2.5V).
> +
> +The structure fields are:
> +
> +* diff_input: bool
> +  Differential operation - set to true for differential mode,
> +  false for default single ended mode.
> +* ext_vref: bool
> +  External reference - set to true if it operates with an external reference,
> +  false for default internal reference.
> +* vref_mv: int
> +  Voltage reference - if using an external reference, set this to the reference
> +  voltage in mV, otherwise, it will default to the internal value (2500mV).
> +  This value will be bounded with limits accepted by the chip, described in the
> +  datasheet.
>  
>  Description
>  ---
> diff --git a/drivers/hwmon/ads7828.c b/drivers/hwmon/ads7828.c
> index bf3fdf4..0a13bf8 100644
> --- a/drivers/hwmon/ads7828.c
> +++ b/drivers/hwmon/ads7828.c
> @@ -6,7 +6,7 @@
>   *
>   * Written by Steve Hardy 
>   *
> - * Datasheet available at: http://focus.ti.com/lit/ds/symlink/ads7828.pdf
> + * For further information, see the Documentation/hwmon/ads7828 file.
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
> @@ -23,63 +23,48 @@
>   * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
>   */
>  
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> +#include 
>  #include 
>  #include 
> -#include 
> +#include 
> +#include 
> +#include 
> +#include 
>  #include 
> +#include 
> +#include 
>  
>  /* The ADS7828 registers */
> -#define ADS7828_NCH 8 /* 8 channels of 12-bit A-D supported */
> -#define ADS7828_CMD_SD_SE 0x80 /* Single ended inputs */
> -#define ADS7828_CMD_SD_DIFF 0x00 /* Differential inputs */
> -#define ADS7828_CMD_PD0 0x0 /* Power Down between A-D conversions */
> -#define ADS7828_CMD_PD1 0x04 /* Internal ref OFF && A-D ON */
> -#define ADS7828_CMD_PD2 0x08 /* Internal ref ON && A-D OFF */
> -#define ADS7828_CMD_PD3 0x0C /* Internal ref ON && A-D ON */
> -#define ADS7828_INT_VREF_MV 2500 /* Internal vref is 2.5V, 2500mV */
> +#define ADS7828_NCH		8	/* 8 channels supported */
> +#define ADS7828_CMD_SD_SE	0x80	/* Single ended inputs */
> +#define ADS7828_CMD_PD1		0x04	/* Internal vref OFF && A/D ON */
> +#define ADS7828_CMD_PD3		0x0C	/* Internal vref ON && A/D ON */
> +#define ADS7828_INT_VREF_MV	2500	/* Internal vref is 2.5V, 2500mV */
> +#define ADS7828_EXT_VREF_MV_MIN	50	/* External vref min value 0.05V */
> +#define ADS7828_EXT_VREF_MV_MAX	5250	/* External vref max value 5.25V */
>  
>  /* Addresses to scan */
>  static const unsigned short normal_i2c[] = { 0x48, 0x49, 0x4a, 0x4b,
>   I2C_CLIENT_END };
>  
> -/* Module parameters */
> -static bool se_input = 1; /* Default is SE, 0 == diff */
> -static bool int_vref = 1; /* Default is internal ref ON */
> -static int vref_mv = ADS7828_INT_VREF_MV; /* set if vref != 2.5V */
> -module_param(se_input, bool, S_IRUGO);
> -module_param(int_vref, bool, S_IRUGO);
> -module_param(vref_mv, int, S_IRUGO);
> -
> -/* Global Variables */
> -static u8 ads7828_cmd_byte; /* cmd 

Re: [RFC, PATCH] Extensible AIO interface

2012-10-02 Thread Kent Overstreet
On Tue, Oct 02, 2012 at 01:41:17PM -0400, Jeff Moyer wrote:
> Kent Overstreet  writes:
> 
> > So, I and other people keep running into things where we really need to
> > add an interface to pass some auxiliary... stuff along with a pread() or
> > pwrite().
> >
> > A few examples:
> >
> > * IO scheduler hints. Some userspace program wants to, per IO, specify
> > either priorities or a cgroup - by specifying a cgroup you can have a
> > fileserver in userspace that makes use of cfq's per cgroup bandwidth
> > quotas.
> 
> You can do this today by splitting I/O between processes and placing
> those processes in different cgroups.  For io priority, there is
> ioprio_set, which incurs an extra system call, but can be used.  Not
> elegant, but possible.

Yes - those are things I'm trying to replace. Doing it that way is a
real pain, both as it's a lousy interface for this and it does impact
performance (ioprio_set doesn't really work too well with aio, too).

> > * Cache hints. For bcache and other things, userspace may want to specify
> > "this data should be cached", "this data should bypass the cache", etc.
> 
> Please explain how you will differentiate this from posix_fadvise.

Oh sorry, I think about SSD caching so much I forget to say that's what
I'm talking about. posix_fadvise is for the page cache, we want
something different for an SSD cache (IMO it'd be really ugly to use it
for both, and posix_fadvise() can't really specifify everything we'd
want to for an SSD cache).

> > * Passing checksums out to userspace. We've got bio integrity, which is
> > a (somewhat) generic interface for passing data checksums between the
> > filesystem and the hardware. There are various circumstances under which
> > you may want to pass these checksums out to userspace, and if so we
> > ought to have a generic way of doing it.
> 
> Yes, that needs a new interface.
> 
> > Hence, AIO attributes.
> 
> *No.*  Start with the non-AIO case first.

Why? It is orthogonal to AIO (and I should make that clearer), but to do
it for sync IO we'd need new syscalls that take an extra argument so IMO
it's a bit easier to start with AIO.

Might be worth implementing the sync interface sooner rather than later
just to discover any potential issues, I suppose.


> > * FUTURE STUFF:
> >
> > Return values:
> >
> > Some attributes are probably going to want to return something to
> > userspace.
> >
> > If nothing else, we want this so that userspace can tell if anything
> > handled the attributes it specified - as dynamic as the io stack can be,
> > with something extensible like this there really isn't any generic way
> > of knowing ahead of time if something is going to interpret any
> > attribute - we want to return at least an error code.
> 
> Seems odd to me.  Why not expose supported attributes via some other
> call?  fcntl?

It's not possible in general - consider stacking block devices, and
attrs that are supported only by specific block drivers. I.e. if you've
got lvm on top of bcache or bcache on top of md, we can pass the attr
down with the IO but we can't determine ahead of time, in general, where
the IO is going to go.

But that probably isn't true for most attrs so it probably would be a
good idea to have an interface for querying what's supported, and even
for device specific ones you could query what a device supports.
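One way to picture the "read-only attributes, separate return values" idea is the sketch below. Everything in it is hypothetical (attribute IDs, structs, the resolver); it is not an existing kernel ABI. Each layer of a stacked device consumes the attribute IDs it understands, and any attribute no layer handled is reported back with -EOPNOTSUPP in a result slot, leaving the attribute itself untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute IDs; not an existing ABI. */
enum aio_attr_id { AIO_ATTR_IOPRIO = 1, AIO_ATTR_CGROUP = 2, AIO_ATTR_CACHE_HINT = 3 };

struct aio_attr {
	unsigned int id;
	unsigned int len;
	unsigned long long value;
};

/* A layer in the IO stack (e.g. bcache, md, the IO scheduler). */
struct io_layer {
	const unsigned int *known_ids;
	size_t nr_known;
};

static int layer_handles(const struct io_layer *l, unsigned int id)
{
	for (size_t i = 0; i < l->nr_known; i++)
		if (l->known_ids[i] == id)
			return 1;
	return 0;
}

/*
 * Pass read-only attributes down the stack; results[i] is 0 if some
 * layer consumed attrs[i], or -EOPNOTSUPP if nothing in the stack did.
 */
static void resolve_attrs(const struct aio_attr *attrs, int *results, size_t nr,
			  const struct io_layer *stack, size_t depth)
{
	for (size_t i = 0; i < nr; i++) {
		results[i] = -EOPNOTSUPP;
		for (size_t d = 0; d < depth; d++) {
			if (layer_handles(&stack[d], attrs[i].id)) {
				results[i] = 0;
				break;
			}
		}
	}
}
```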

> > One could imagine sticking the return in the attribute itself, but I
> > don't want to do this. For some things (checksums), the attribute will
> > contain a pointer to a buffer - that's fine. But I don't want the
> > attributes themselves to be writeable.
> 
> One could imagine that attributes don't return anything, because, well,
> they're properties of something else, and properties don't return
> anything.

With a strict definition of attribute, yeah. One of the real use cases
we have for this is per IO timings, for aio - right now we've got an
interface for the kernel to tell userspace how long a syscall took
(don't think it's upstream yet - Paul's been behind that stuff), but it
only really makes sense with synchronous syscalls.

These AIO attributes would be useful for that too, but I'd _much_ prefer
if the timing information was explicitly returned instead of using a
pointer to a buffer.


Re: [PATCH 3/3] Fix line over 80 character issue and space before tabs issue in trace.h

2012-10-02 Thread Toshiaki Yamane
On Tue, Oct 2, 2012 at 9:23 PM, Dan Carpenter  wrote:
> On Tue, Oct 02, 2012 at 08:54:48PM +0900, YAMANE Toshiaki wrote:
>> fixed some checkpatch warnings.
>>
>> -WARNING: line over 80 characters
>> -WARNING: please, no space before tabs
>>
>
> These looked nicer in the original, sorry.

I will drop this patch series.
Thanks.


YAMANE Toshiaki


Re: [PATCH 2/3] Add parenthesis to macros with complex values in trace.h

2012-10-02 Thread Toshiaki Yamane
On Tue, Oct 2, 2012 at 9:21 PM, Dan Carpenter  wrote:
> On Tue, Oct 02, 2012 at 08:54:28PM +0900, YAMANE Toshiaki wrote:
>> fixed the checkpatch error below:
>> -ERROR: Macros with complex values should be enclosed in parenthesis
>>
>> Signed-off-by: Toshiaki Yamane 
>> ---
>>  drivers/staging/rts_pstor/trace.h |4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/staging/rts_pstor/trace.h 
>> b/drivers/staging/rts_pstor/trace.h
>> index 740999c..a34493c 100644
>> --- a/drivers/staging/rts_pstor/trace.h
>> +++ b/drivers/staging/rts_pstor/trace.h
>> @@ -78,8 +78,8 @@ do {   
>>  \
>>   goto label;
>>  \
>>  } while (0)
>>  #else
>> -#define TRACE_RET(chip, ret) return ret
>> -#define TRACE_GOTO(chip, label)  goto label
>> +#define TRACE_RET(chip, ret) return(ret)
>> +#define TRACE_GOTO(chip, label)  goto(label)
>
> This will cause a compile error.
>
> There is no need to do this, checkpatch.pl is wrong here.

I will drop this patch series.
Thanks.


YAMANE Toshiaki


Re: [PATCH 1/3] Fix trailing whitespace in trace.h

2012-10-02 Thread Toshiaki Yamane
On Tue, Oct 2, 2012 at 9:12 PM, Andy Shevchenko
 wrote:
> On Tue, 2012-10-02 at 20:53 +0900, YAMANE Toshiaki wrote:
>> Fixed the following checkpatch error:
>>
>> -ERROR: trailing whitespace
>>
>> Signed-off-by: Toshiaki Yamane 
>> ---
>>  drivers/staging/rts_pstor/trace.h |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/staging/rts_pstor/trace.h b/drivers/staging/rts_pstor/trace.h
>> index cf60a1b..740999c 100644
>> --- a/drivers/staging/rts_pstor/trace.h
>> +++ b/drivers/staging/rts_pstor/trace.h
>> @@ -39,7 +39,7 @@ static inline char *filename(char *path)
>>   while (*ptr != '\0') {
>>   if ((*ptr == '\\') || (*ptr == '/'))
>>   path = ptr + 1;
>> -
>> +
>>   ptr++;
>>   }
>
> It seems to me the whole filename() function could be squeezed to just
> strrchr(path, '/') + 1;
> Moreover, there is already basename() in lib/dynamic_debug.c that you
> could rename, export and reuse.

I will drop this patch series.
Thanks.


YAMANE Toshiaki


Re: tg3 driver upgrade (Linux 2.6.32 -> 3.2) breaks IBM Bladecenter SoL

2012-10-02 Thread Ben Hutchings
On Tue, 2012-10-02 at 23:06 +0400, Michael Tokarev wrote:
> On 02.10.2012 22:49, Ferenc Wagner wrote:
> > "Michael Chan"  writes:
> >> These are the likely fixes:
> >>
> >> commit cf9ecf4b631f649a964fa611f1a5e8874f2a76db 
> >> Author: Matt Carlson 
> >> Date: Mon Nov 28 09:41:03 2011 +
> >>
> >> tg3: Fix TSO CAP for 5704 devs w / ASF enabled
> > 
> > You are exactly right: cf9ecf4b fixed the permanent SoL breakage
> > introduced by dabc5c67.  Looks like ASF utilizes similar technology to
> > that of the HS20 BMC.  Thanks for the tip, it greatly reduced our CPU
> > wear. :)  It's a pity ethtool -k did not give a hint.  Do you think it's
> > possible to work around this in 3.2 by, e.g., fiddling with some ethtool setting?
> 
> Maybe it's better to push this commit to -stable instead?

But that will take time, so I imagine a temporary workaround would be
useful to Ferenc.

> (the commit
> that broke things is part of 3.0 kernel so all current 3.x -stable
> kernels are affected)
[...]

The fix went into 3.3, so only 3.0 and 3.2 need it.

David, please can you include the above commit in your next batches for
these stable series?

Ben.

-- 
Ben Hutchings
For every complex problem
there is a solution that is simple, neat, and wrong.




Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")

2012-10-02 Thread Paul E. McKenney
On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> 
> > Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
> > notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
> > CPU hotplug events.  I could go back to the old approach, but it is 
> > significantly more complex.  I cannot say that I am all that happy about 
> > anyone calling rcu_barrier() from a CPU hotplug notifier because it 
> > doesn't help CPU hotplug latency, but that is a separate issue.
> > 
> > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > (1) it excludes hotplug operations or (2) if it is called from a hotplug
> > notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> > is executing.  So the right way to resolve this seems to be to do the
> > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > of a hotplug notifier.  Should be fixable without too much hassle...
> 
> Sorry, I don't think I understand what you are proposing just yet.
> 
> If I understand it correctly, you are proposing to introduce some magic 
> into _rcu_barrier() such as (pseudocode of course):
> 
>       if (!being_called_from_hotplug_notifier_callback)
>               get_online_cpus()
> 
> How does that protect from the scenario I've outlined before though?
> 
>       CPU 0                           CPU 1
>       kmem_cache_destroy()
>       mutex_lock(slab_mutex)
>                                       _cpu_up()
>                                       cpu_hotplug_begin()
>                                       mutex_lock(cpu_hotplug.lock)
>       rcu_barrier()
>       _rcu_barrier()
>       get_online_cpus()
>       mutex_lock(cpu_hotplug.lock)
>        (blocks, CPU 1 has the mutex)
>                                       __cpu_notify()
>                                       mutex_lock(slab_mutex)
> 
> CPU 0 grabs both locks anyway (it's not running from notifier callback). 
> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
> from notifier callback either.
> 
> What did I miss?

You didn't miss anything, I was suffering a failure to read carefully.

So my next stupid question is "Why can't kmem_cache_destroy drop
slab_mutex early?" like the following:

void kmem_cache_destroy(struct kmem_cache *cachep)
{
	BUG_ON(!cachep || in_interrupt());

	/* Find the cache in the chain of caches. */
	get_online_cpus();
	mutex_lock(&slab_mutex);
	/*
	 * the chain is never empty, cache_cache is never destroyed
	 */
	list_del(&cachep->list);
	if (__cache_shrink(cachep)) {
		slab_error(cachep, "Can't free all objects");
		list_add(&cachep->list, &slab_caches);
		mutex_unlock(&slab_mutex);
		put_online_cpus();
		return;
	}
	mutex_unlock(&slab_mutex);

	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
		rcu_barrier();

	__kmem_cache_destroy(cachep);
	put_online_cpus();
}

Or did I miss some reason why __kmem_cache_destroy() needs that lock?
Looks to me like it is just freeing now-disconnected memory.

Thanx, Paul



Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-02 Thread Linus Torvalds
On Tue, Oct 2, 2012 at 5:01 PM, Jiri Kosina  wrote:
> On Tue, 2 Oct 2012, Linus Torvalds wrote:
>
>> And see this email from Kay Sievers that shows that it was all known
>> about and intentional in the udev camp:
>>
>>   http://www.spinics.net/lists/netdev/msg185742.html
>
> This seems confusing indeed.
>
> That e-mail referenced above is talking about loading firmware at ifup
> time. While that might work for network device drivers (I am not sure even
> about that), what is the udev maintainers' advice for other drivers, where
> there is no analogy to ifup?

Yeah, it's a udev bug. It really is that simple.

This is why I'm complaining. There's no way in hell we're fixing this
in kernel space, unless we call the "bypass udev entirely because the
maintainership quality of it has taken a nose dive". Yes, I've seen
some work-around patches, but quite frankly, I think it would be
absolutely insane for the kernel to work around the fact that udev is
buggy.

The fact is, doing request_firmware() from within module_init() is
simply the easiest approach for some devices.

Now, at the same time, I do agree that network devices should
generally try to delay it until ifup time, so I'm not arguing against
that part per se. I do think that when possible, people should aim to
delay firmware loading until as late as reasonable.

But as you point out, it's simply not always reasonable, and the media
people are clearly hitting the cases where it's just painful. Now,
those cases seem to be happily fairly *rare*, so this isn't getting a
ton of attention, but we should fix it.

Because the udev behavior is all pain, no gain. There's no *reason*
for udev to be pissy about this. And it didn't use to be.

Linus


Re: [PATCH v3 00/10] Introduce huge zero page

2012-10-02 Thread Andrew Morton
On Wed, 3 Oct 2012 03:04:02 +0300
"Kirill A. Shutemov"  wrote:

> Is the overview complete enough? Have I answered all your questions here?

Yes, thanks!

The design overview is short enough to be put in as code comments in
suitable places.


Re: [PATCH 1/7] aoe: support more AoE addresses with dynamic block device minor numbers

2012-10-02 Thread Ed Cashin
On Oct 2, 2012, at 5:12 PM, Andrew Morton wrote:
...
>> +static int
>> +minor_get(ulong *minor)
>> {
>> -struct aoedev *d;
>>  ulong flags;
>> +ulong n;
>> +int error = 0;
>> +
>> +spin_lock_irqsave(&used_minors_lock, flags);
>> +n = find_first_zero_bit(used_minors, N_DEVS);
>> +if (n < N_DEVS)
>> +set_bit(n, used_minors);
>> +else
>> +error = -1;
>> +spin_unlock_irqrestore(&used_minors_lock, flags);
>> +
>> +*minor = n * AOE_PARTITIONS;
>> +return error;
>> +}
> 
> - can use the more efficient __set_bit() inside that spinlock.

Thanks for that observation.  Because this operation occurs on target 
discovery, which is expected to be relatively infrequent, my inclination is to 
leave it in its atomic form, though, and leave the __set_bit() for another time 
when optimization is needed.  Like you said, this is a minor point.  I wouldn't 
mind changing it, though, if you think it's worth me resubmitting the patch.  
Just let me know.

> - could avoid setting *minor if we're returning an error.

Yes.  The only caller of aoedev.c:minor_get() handles that correctly.  Again, 
just let me know if you think this is worth a resubmission of the patch.  
Otherwise I'll just make a note to myself to try to avoid setting output 
parameters on error in the future.

-- 
  Ed Cashin
  ecas...@coraid.com




Re: [PATCH v3 00/10] Introduce huge zero page

2012-10-02 Thread Kirill A. Shutemov
On Tue, Oct 02, 2012 at 03:31:48PM -0700, Andrew Morton wrote:
> On Tue,  2 Oct 2012 18:19:22 +0300
> "Kirill A. Shutemov"  wrote:
> 
> > During testing I noticed a big (up to 2.5 times) memory consumption overhead
> > on some workloads (e.g. ft.A from NPB) if THP is enabled.
> > 
> > The main reason for that big difference is the lack of a zero page in the
> > THP case. We have to allocate a real page on a read page fault.
> > 
> > A program to demonstrate the issue:
> > #include <assert.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > 
> > #define MB 1024*1024
> > 
> > int main(int argc, char **argv)
> > {
> > 	char *p;
> > 	int i;
> > 
> > 	posix_memalign((void **)&p, 2 * MB, 200 * MB);
> > 	for (i = 0; i < 200 * MB; i+= 4096)
> > 		assert(p[i] == 0);
> > 	pause();
> > 	return 0;
> > }
> > 
> > With thp-never RSS is about 400k, but with thp-always it's 200M.
> > After the patchset thp-always RSS is 400k too.
> 
> I'd like to see a full description of the design, please.

Okay. Design overview.

Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with
zeros.  The way we allocate it changes over the course of the patchset:

- [01/10] simplest way: hzp allocated at boot time in hugepage_init();
- [09/10] lazy allocation on first use;
- [10/10] lockless refcounting + shrinker-reclaimable hzp;

We set it up in do_huge_pmd_anonymous_page() if the area around the fault
address is suitable for THP and we've got a read page fault.
If we fail to set up hzp (ENOMEM), we fall back to handle_pte_fault() as we
normally do in THP.

On a write-protect (wp) fault to hzp, we allocate real memory for the huge
page and clear it. On ENOMEM we fall back gracefully: we create a new pmd
table and set the pte around the fault address to a newly allocated normal
(4k) page. All other ptes in the pmd are set to the normal zero page.

We cannot split hzp (and it's a bug if we try), but we can split the pmd
which points to it. On splitting the pmd we create a table with all ptes
set to the normal zero page.

The patchset is organized in a bisect-friendly way:
 Patches 01-07: prepare all code paths for hzp
 Patch 08: all code paths are covered: safe to setup hzp
 Patch 09: lazy allocation
 Patch 10: lockless refcounting for hzp

--

At hpa's request I've tried an alternative approach to the hzp
implementation (see the Virtual huge zero page patchset): a pmd table with
all entries set to the zero page. This should be more cache-friendly, but
it increases TLB pressure.

The problem with the virtual huge zero page is that it requires per-arch
enabling: we need a way to mark that a pmd table has all ptes set to the
zero page.

Some numbers comparing the two implementations (on a 4-socket Westmere-EX):

Microbenchmark1
===============

test:
	posix_memalign((void **)&p, 2 * MB, 8 * GB);
	for (i = 0; i < 100; i++) {
		assert(memcmp(p, p + 4*GB, 4*GB) == 0);
		asm volatile ("": : :"memory");
	}

hzp:
 Performance counter stats for './test_memcmp' (5 runs):

      32356.272845 task-clock               #    0.998 CPUs utilized           ( +-  0.13% )
                40 context-switches         #    0.001 K/sec                   ( +-  0.94% )
                 0 CPU-migrations           #    0.000 K/sec
             4,218 page-faults              #    0.130 K/sec                   ( +-  0.00% )
    76,712,481,765 cycles                   #    2.371 GHz                     ( +-  0.13% ) [83.31%]
    36,279,577,636 stalled-cycles-frontend  #   47.29% frontend cycles idle    ( +-  0.28% ) [83.35%]
     1,684,049,110 stalled-cycles-backend   #    2.20% backend  cycles idle    ( +-  2.96% ) [66.67%]
   134,355,715,816 instructions             #    1.75  insns per cycle
                                            #    0.27  stalled cycles per insn ( +-  0.10% ) [83.35%]
    13,526,169,702 branches                 #  418.039 M/sec                   ( +-  0.10% ) [83.31%]
         1,058,230 branch-misses            #    0.01% of all branches         ( +-  0.91% ) [83.36%]

      32.413866442 seconds time elapsed                                        ( +-  0.13% )

vhzp:
 Performance counter stats for './test_memcmp' (5 runs):

      30327.183829 task-clock               #    0.998 CPUs utilized           ( +-  0.13% )
                38 context-switches         #    0.001 K/sec                   ( +-  1.53% )
                 0 CPU-migrations           #    0.000 K/sec
             4,218 page-faults              #    0.139 K/sec                   ( +-  0.01% )
    71,964,773,660 cycles                   #    2.373 GHz                     ( +-  0.13% ) [83.35%]
    31,191,284,231 stalled-cycles-frontend  #   43.34% frontend cycles idle    ( +-  0.40% ) [83.32%]
       773,484,474 stalled-cycles-backend   #    1.07% backend  cycles idle    ( +-  6.61% ) [66.67%]
   134,982,215,437 instructions             #    1.88  insns per cycle

Re: udev breakages - was: Re: Need of an ".async_probe()" type of callback at driver's core - Was: Re: [PATCH] [media] drxk: change it to use request_firmware_nowait()

2012-10-02 Thread Jiri Kosina
On Tue, 2 Oct 2012, Linus Torvalds wrote:

> And see this email from Kay Sievers that shows that it was all known
> about and intentional in the udev camp:
> 
>   http://www.spinics.net/lists/netdev/msg185742.html

This seems confusing indeed.

That e-mail referenced above is talking about loading firmware at ifup 
time. While that might work for network device drivers (I am not sure even 
about that), what is the udev maintainers' advice for other drivers, where 
there is no analogy to ifup?

-- 
Jiri Kosina
SUSE Labs



[PATCH] mm: use %pK for /proc/vmallocinfo

2012-10-02 Thread Kees Cook
In the paranoid case of sysctl kernel.kptr_restrict=2, mask the kernel
virtual addresses in /proc/vmallocinfo too.

Reported-by: Brad Spengler 
Signed-off-by: Kees Cook 
---
 mm/vmalloc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2bb90b1..9c871db 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2572,7 +2572,7 @@ static int s_show(struct seq_file *m, void *p)
 {
struct vm_struct *v = p;
 
-   seq_printf(m, "0x%p-0x%p %7ld",
+   seq_printf(m, "0x%pK-0x%pK %7ld",
v->addr, v->addr + v->size, v->size);
 
if (v->caller)
-- 
1.7.9.5


-- 
Kees Cook
Chrome OS Security


[PATCH 02/31] perf, x86: Basic Haswell PMU support v2

2012-10-02 Thread Andi Kleen
From: Andi Kleen 

Add basic Haswell PMU support.

Similar to SandyBridge, but has a few new events. Further
differences are handled in follow-on patches.

There are some new counter flags that need to be prevented
from being set on fixed counters.

Contains fixes from Stephane Eranian

v2: Folded TSX bits into standard FIXED_EVENT_CONSTRAINTS
Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/perf_event.h  |3 +++
 arch/x86/kernel/cpu/perf_event.h   |5 -
 arch/x86/kernel/cpu/perf_event_intel.c |   29 +
 3 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 4fabcdf..4003bb6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,9 @@
 #define ARCH_PERFMON_EVENTSEL_INV  (1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK		0xFF000000ULL
 
+#define HSW_INTX   (1ULL << 32)
+#define HSW_INTX_CHECKPOINTED  (1ULL << 33)
+
 #define AMD_PERFMON_EVENTSEL_GUESTONLY (1ULL << 40)
 #define AMD_PERFMON_EVENTSEL_HOSTONLY  (1ULL << 41)
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 8b6defe..8e50d8b 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -219,11 +219,14 @@ struct cpu_hw_events {
  *  - inv
  *  - edge
  *  - cnt-mask
+ *  - intx
+ *  - intx_cp
  *  The other filters are supported by fixed counters.
  *  The any-thread option is supported starting with v3.
  */
+#define FIXED_EVENT_FLAGS (X86_RAW_EVENT_MASK|HSW_INTX|HSW_INTX_CHECKPOINTED)
 #define FIXED_EVENT_CONSTRAINT(c, n)   \
-   EVENT_CONSTRAINT(c, (1ULL << (32+n)), X86_RAW_EVENT_MASK)
+   EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)
 
 /*
  * Constraint on the Event code + UMask
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 6bca492..50c43ca 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -133,6 +133,17 @@ static struct extra_reg intel_snb_extra_regs[] __read_mostly = {
EVENT_EXTRA_END
 };
 
+static struct event_constraint intel_hsw_event_constraints[] =
+{
+   FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
+   FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
+   FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
+   INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.PENDING */
+   INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
+   INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
+   EVENT_CONSTRAINT_END
+};
+
 static u64 intel_pmu_event_map(int hw_event)
 {
return intel_perfmon_event_map[hw_event];
@@ -2096,6 +2107,24 @@ __init int intel_pmu_init(void)
break;
 
 
+   case 60: /* Haswell Client */
+   case 70:
+   case 71:
+   memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
+  sizeof(hw_cache_event_ids));
+
+   intel_pmu_lbr_init_nhm();
+
+   x86_pmu.event_constraints = intel_hsw_event_constraints;
+
+   x86_pmu.extra_regs = intel_snb_extra_regs;
+   /* all extra regs are per-cpu when HT is on */
+   x86_pmu.er_flags |= ERF_HAS_RSP_1;
+   x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+
+   pr_cont("Haswell events, ");
+   break;
+
default:
switch (x86_pmu.version) {
case 1:
-- 
1.7.7.6



[PATCH 07/31] perf, x86: Support PERF_SAMPLE_ADDR on Haswell

2012-10-02 Thread Andi Kleen
From: Andi Kleen 

Haswell supplies the address for every PEBS memory event, so always fill it in
when the user requested it.  It will be 0 when not useful (no memory access).

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 5d3d6be..8c893ce 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -637,6 +637,10 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
data.raw = 
}
 
+   if ((event->attr.sample_type & PERF_SAMPLE_ADDR) &&
+   x86_pmu.intel_cap.pebs_format >= 2)
+   data.addr = ((struct pebs_record_v2 *)pebs)->nhm.dla;
+
if (has_branch_stack(event))
data.br_stack = >lbr_stack;
 
-- 
1.7.7.6



[PATCH 08/31] perf, x86: Support Haswell v4 LBR format

2012-10-02 Thread Andi Kleen
From: Andi Kleen 

Haswell has two additional flags in the LBR FROM field for TSX: intx and
abort, implemented as a new v4 version of the LBR format.

Handle those and adjust the sign extension code to still correctly extend.
The flags are exported in the LBR record in the same way as the existing
misprediction flag.

Signed-off-by: Andi Kleen 
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |   18 +++---
 include/linux/perf_event.h |7 ++-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index da02e9c..2af6695b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -12,6 +12,7 @@ enum {
LBR_FORMAT_LIP  = 0x01,
LBR_FORMAT_EIP  = 0x02,
LBR_FORMAT_EIP_FLAGS= 0x03,
+   LBR_FORMAT_EIP_FLAGS2   = 0x04,
 };
 
 /*
@@ -56,6 +57,8 @@ enum {
 LBR_FAR)
 
 #define LBR_FROM_FLAG_MISPRED  (1ULL << 63)
+#define LBR_FROM_FLAG_INTX (1ULL << 62)
+#define LBR_FROM_FLAG_ABORT	(1ULL << 61)
 
 #define for_each_branch_sample_type(x) \
for ((x) = PERF_SAMPLE_BRANCH_USER; \
@@ -270,21 +273,30 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
 
for (i = 0; i < x86_pmu.lbr_nr; i++) {
unsigned long lbr_idx = (tos - i) & mask;
-   u64 from, to, mis = 0, pred = 0;
+   u64 from, to, mis = 0, pred = 0, intx = 0, abort = 0;
 
rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
rdmsrl(x86_pmu.lbr_to   + lbr_idx, to);
 
-   if (lbr_format == LBR_FORMAT_EIP_FLAGS) {
+   if (lbr_format == LBR_FORMAT_EIP_FLAGS ||
+   lbr_format == LBR_FORMAT_EIP_FLAGS2) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
-   from = (u64)((((s64)from) << 1) >> 1);
+   if (lbr_format == LBR_FORMAT_EIP_FLAGS)
+   from = (u64)((((s64)from) << 1) >> 1);
+   else if (lbr_format == LBR_FORMAT_EIP_FLAGS2) {
+   intx = !!(from & LBR_FROM_FLAG_INTX);
+   abort = !!(from & LBR_FROM_FLAG_ABORT);
+   from = (u64)((((s64)from) << 3) >> 3);
+   }
}
 
cpuc->lbr_entries[i].from   = from;
cpuc->lbr_entries[i].to = to;
cpuc->lbr_entries[i].mispred= mis;
cpuc->lbr_entries[i].predicted  = pred;
+   cpuc->lbr_entries[i].intx   = intx;
+   cpuc->lbr_entries[i].abort  = abort;
cpuc->lbr_entries[i].reserved   = 0;
}
cpuc->lbr_stack.nr = i;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 599afc4..bb34750 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -671,13 +671,18 @@ struct perf_raw_record {
  *
  * support for mispred, predicted is optional. In case it
  * is not supported mispred = predicted = 0.
+ *
+ * intx: running in a hardware transaction
+ * abort: aborting a hardware transaction
  */
 struct perf_branch_entry {
__u64   from;
__u64   to;
__u64   mispred:1,  /* target mispredicted */
predicted:1,/* target predicted */
-   reserved:62;
+   intx:1, /* in transaction */
+   abort:1,/* transaction abort */
+   reserved:60;
 };
 
 /*
-- 
1.7.7.6



[PATCH 13/31] perf, x86: Support full width counting on Haswell

2012-10-02 Thread Andi Kleen
From: Andi Kleen 

Haswell has a new alternative MSR range for perfctrs that allows writing the full
counter width. Enable this range if the hardware reports it using a new capability
bit. This lowers the overhead of perf stat slightly because it has to take fewer
interrupts to accumulate the counter value. It also avoids some problems with TSX
aborting when the end of the counter range is reached.

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/msr-index.h   |3 +++
 arch/x86/kernel/cpu/perf_event.h   |1 +
 arch/x86/kernel/cpu/perf_event_intel.c |6 ++
 3 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 957ec87..cbf344f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -121,6 +121,9 @@
 #define MSR_P6_EVNTSEL0			0x0186
 #define MSR_P6_EVNTSEL1			0x0187
 
+/* Alternative perfctr range with full access. */
+#define MSR_IA32_PMC0  0x04c1
+
 /* AMD64 MSRs. Not complete. See the architecture manual for a more
complete list. */
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index c1dfe5d..4b468ae 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -278,6 +278,7 @@ union perf_capabilities {
u64 pebs_arch_reg:1;
u64 pebs_format:4;
u64 smm_freeze:1;
+   u64 fw_write:1;
};
u64 capabilities;
 };
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index bd50116..21542bf 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2227,5 +2227,11 @@ __init int intel_pmu_init(void)
}
}
 
+   /* Support full width counters using alternative MSR range */
+   if (x86_pmu.intel_cap.fw_write) {
+   x86_pmu.max_period = x86_pmu.cntval_mask;
+   x86_pmu.perfctr = MSR_IA32_PMC0;
+   }
+
return 0;
 }
-- 
1.7.7.6


