date:20121213

[PATCH 6/6] iio: dac: ad5791: Don't set error code to [pos|neg]_voltage_uv

2012-12-13 Thread Axel Lin

regulator_get_voltage() may return a negative error code.
Don't assign the error code to pos_voltage_uv and neg_voltage_uv.

Signed-off-by: Axel Lin 
---
 drivers/iio/dac/ad5791.c |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/iio/dac/ad5791.c b/drivers/iio/dac/ad5791.c
index 2bd2e37..6efe83e 100644
--- a/drivers/iio/dac/ad5791.c
+++ b/drivers/iio/dac/ad5791.c
@@ -365,7 +365,11 @@ static int __devinit ad5791_probe(struct spi_device *spi)
if (ret)
goto error_put_reg_pos;
 
-   pos_voltage_uv = regulator_get_voltage(st->reg_vdd);
+   ret = regulator_get_voltage(st->reg_vdd);
+   if (ret < 0)
+   goto error_disable_reg_pos;
+
+   pos_voltage_uv = ret;
}
 
st->reg_vss = regulator_get(&spi->dev, "vss");
@@ -374,7 +378,11 @@ static int __devinit ad5791_probe(struct spi_device *spi)
if (ret)
goto error_put_reg_neg;
 
-   neg_voltage_uv = regulator_get_voltage(st->reg_vss);
+   ret = regulator_get_voltage(st->reg_vss);
+   if (ret < 0)
+   goto error_disable_reg_neg;
+
+   neg_voltage_uv = ret;
}
 
st->pwr_down = true;
@@ -428,6 +436,7 @@ error_put_reg_neg:
if (!IS_ERR(st->reg_vss))
regulator_put(st->reg_vss);
 
+error_disable_reg_pos:
if (!IS_ERR(st->reg_vdd))
regulator_disable(st->reg_vdd);
 error_put_reg_pos:
-- 
1.7.9.5
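
The check-then-assign pattern used in this series can be tried outside the
kernel with a minimal sketch. Everything below is an assumption made for
illustration: fake_regulator_get_voltage() and read_supply_uv() are
hypothetical userspace stand-ins, not kernel functions, and the only property
borrowed from the patch description is that the call returns either a voltage
or a negative error code.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for regulator_get_voltage(): returns a voltage in
 * microvolts, or a negative errno value on failure. The caller chooses the
 * result so both paths can be exercised.
 */
static int fake_regulator_get_voltage(int simulated_result)
{
	return simulated_result;
}

/*
 * Read into a temporary first and store the voltage only after the error
 * check, so a negative error code never ends up in *voltage_uv.
 */
static int read_supply_uv(int simulated_result, int *voltage_uv)
{
	int ret = fake_regulator_get_voltage(simulated_result);

	if (ret < 0)
		return ret;	/* propagate the error, *voltage_uv untouched */

	*voltage_uv = ret;
	return 0;
}
```

A caller would pass the address of its voltage variable and bail out to its
own cleanup path when read_supply_uv() returns a negative value.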



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-13 Thread Alex Shi

On 12/14/2012 03:45 PM, Mike Galbraith wrote:
> On Fri, 2012-12-14 at 14:36 +0800, Alex Shi wrote: 
>> On 12/14/2012 12:45 PM, Mike Galbraith wrote:
>>>>> Do you have further ideas for buddy cpu on such example?
>>>>>>>
>>>>>>> Which kind of sched_domain configuration have you for such system ?
>>>>>>> and how many sched_domain level have you ?
>>>>>
>>>>> it is general X86 domain configuration. with 4 levels,
>>>>> sibling/core/cpu/numa.
>>> CPU is a bug that slipped into domain degeneration.  You should have
>>> SIBLING/MC/NUMA (chasing that down is on todo).
>>
>> Maybe.
>> the CPU/NUMA is different on domain flags, CPU has SD_PREFER_SIBLING.
> 
> What I noticed during (an unrelated) bisection on a 40 core box was
> domains going from so..
> 
> 3.4.0-bisect (virgin)
> [5.056214] CPU0 attaching sched-domain:
> [5.065009]  domain 0: span 0,32 level SIBLING
> [5.075011]   groups: 0 (cpu_power = 589) 32 (cpu_power = 589)
> [5.088381]   domain 1: span 
> 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
> [5.107669]groups: 0,32 (cpu_power = 1178)  4,36 (cpu_power = 1178)  
> 8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
>  16,48 (cpu_power = 1177) 20,52 (cpu_power = 1178) 
> 24,56 (cpu_power = 1177) 28,60 (cpu_power = 1177)
>  64,72 (cpu_power = 1176) 68,76 (cpu_power = 1176)
> [5.162115]domain 2: span 0-79 level NODE
> [5.171927] groups: 
> 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11773)
>
> 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11772)
>
> 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78 (cpu_power = 11773)
>
> 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11770)
> 
> ..to so, which looks a little bent.  CPU and MC have identical spans, so
> CPU should have gone away, as it used to do.
> 

better to remove one, and believe you can make it. :)


[PATCH 5/6] iio: dac: ad5686: Don't set error code to voltage_uv

2012-12-13 Thread Axel Lin

regulator_get_voltage() may return negative error code.
Add error checking to avoid setting error code to voltage_uv.

Signed-off-by: Axel Lin 
---
 drivers/iio/dac/ad5686.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/dac/ad5686.c b/drivers/iio/dac/ad5686.c
index bc92ff9..28f2e59 100644
--- a/drivers/iio/dac/ad5686.c
+++ b/drivers/iio/dac/ad5686.c
@@ -332,7 +332,11 @@ static int __devinit ad5686_probe(struct spi_device *spi)
if (ret)
goto error_put_reg;
 
-   voltage_uv = regulator_get_voltage(st->reg);
+   ret = regulator_get_voltage(st->reg);
+   if (ret < 0)
+   goto error_disable_reg;
+
+   voltage_uv = ret;
}
 
st->chip_info =
-- 
1.7.9.5




[PATCH 4/6] iio: dac: ad5624r_spi: Don't set error code to voltage_uv

2012-12-13 Thread Axel Lin

regulator_get_voltage() may return negative error code.
Add error checking to avoid setting error code to voltage_uv.

Signed-off-by: Axel Lin 
---
 drivers/iio/dac/ad5624r_spi.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/dac/ad5624r_spi.c b/drivers/iio/dac/ad5624r_spi.c
index 6a7d6a4..ae7fb64 100644
--- a/drivers/iio/dac/ad5624r_spi.c
+++ b/drivers/iio/dac/ad5624r_spi.c
@@ -238,7 +238,11 @@ static int __devinit ad5624r_probe(struct spi_device *spi)
if (ret)
goto error_put_reg;
 
-   voltage_uv = regulator_get_voltage(st->reg);
+   ret = regulator_get_voltage(st->reg);
+   if (ret < 0)
+   goto error_disable_reg;
+
+   voltage_uv = ret;
}
 
spi_set_drvdata(spi, indio_dev);
-- 
1.7.9.5




[PATCH 3/6] iio: dac: ad5504: Don't set error code to voltage_uv

2012-12-13 Thread Axel Lin

regulator_get_voltage() may return negative error code.
Add error checking to avoid setting error code to voltage_uv.

Signed-off-by: Axel Lin 
---
 drivers/iio/dac/ad5504.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/dac/ad5504.c b/drivers/iio/dac/ad5504.c
index 242bdc7..1e9a1f4 100644
--- a/drivers/iio/dac/ad5504.c
+++ b/drivers/iio/dac/ad5504.c
@@ -296,7 +296,11 @@ static int __devinit ad5504_probe(struct spi_device *spi)
if (ret)
goto error_put_reg;
 
-   voltage_uv = regulator_get_voltage(reg);
+   ret = regulator_get_voltage(reg);
+   if (ret < 0)
+   goto error_disable_reg;
+
+   voltage_uv = ret;
}
 
spi_set_drvdata(spi, indio_dev);
-- 
1.7.9.5




[PATCH 2/6] iio: dac: ad5380: Don't set error code to st->vref

2012-12-13 Thread Axel Lin

regulator_get_voltage() may return negative error code.
Add error checking to avoid setting error code to st->vref_uv.

Signed-off-by: Axel Lin 
---
 drivers/iio/dac/ad5380.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/dac/ad5380.c b/drivers/iio/dac/ad5380.c
index 14991ac..4aca189 100644
--- a/drivers/iio/dac/ad5380.c
+++ b/drivers/iio/dac/ad5380.c
@@ -406,7 +406,11 @@ static int __devinit ad5380_probe(struct device *dev, 
struct regmap *regmap,
goto error_free_reg;
}
 
-   st->vref = regulator_get_voltage(st->vref_reg);
+   ret = regulator_get_voltage(st->vref_reg);
+   if (ret < 0)
+   goto error_disable_reg;
+
+   st->vref = ret;
} else {
st->vref = st->chip_info->int_vref;
ctrl |= AD5380_CTRL_INT_VREF_EN;
-- 
1.7.9.5




[PATCH 1/6] iio: adc: ad7266: Don't set error code to st->vref_uv

2012-12-13 Thread Axel Lin

regulator_get_voltage() may return negative error code.
Add error checking to avoid setting error code to st->vref_uv.

Signed-off-by: Axel Lin 
---
 drivers/iio/adc/ad7266.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iio/adc/ad7266.c b/drivers/iio/adc/ad7266.c
index a6f4fc5..e36107d 100644
--- a/drivers/iio/adc/ad7266.c
+++ b/drivers/iio/adc/ad7266.c
@@ -411,7 +411,11 @@ static int __devinit ad7266_probe(struct spi_device *spi)
if (ret)
goto error_put_reg;
 
-   st->vref_uv = regulator_get_voltage(st->reg);
+   ret = regulator_get_voltage(st->reg);
+   if (ret < 0)
+   goto error_disable_reg;
+
+   st->vref_uv = ret;
} else {
/* Use internal reference */
st->vref_uv = 250;
-- 
1.7.9.5




Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-13 Thread Mike Galbraith

On Fri, 2012-12-14 at 14:36 +0800, Alex Shi wrote: 
> On 12/14/2012 12:45 PM, Mike Galbraith wrote:
> >> > Do you have further ideas for buddy cpu on such example?
> >>> > > 
> >>> > > Which kind of sched_domain configuration have you for such system ?
> >>> > > and how many sched_domain level have you ?
> >> > 
> >> > it is general X86 domain configuration. with 4 levels,
> >> > sibling/core/cpu/numa.
> > CPU is a bug that slipped into domain degeneration.  You should have
> > SIBLING/MC/NUMA (chasing that down is on todo).
> 
> Maybe.
> the CPU/NUMA is different on domain flags, CPU has SD_PREFER_SIBLING.

What I noticed during (an unrelated) bisection on a 40 core box was
domains going from so..

3.4.0-bisect (virgin)
[5.056214] CPU0 attaching sched-domain:
[5.065009]  domain 0: span 0,32 level SIBLING
[5.075011]   groups: 0 (cpu_power = 589) 32 (cpu_power = 589)
[5.088381]   domain 1: span 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
[5.107669]groups: 0,32 (cpu_power = 1178)  4,36 (cpu_power = 1178)  
8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
 16,48 (cpu_power = 1177) 20,52 (cpu_power = 1178) 
24,56 (cpu_power = 1177) 28,60 (cpu_power = 1177)
 64,72 (cpu_power = 1176) 68,76 (cpu_power = 1176)
[5.162115]domain 2: span 0-79 level NODE
[5.171927] groups: 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11773)
   
1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11772)
   
2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78 (cpu_power = 11773)
   
3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11770)

..to so, which looks a little bent.  CPU and MC have identical spans, so
CPU should have gone away, as it used to do.

3.6.0-bisect (virgin)
[3.978338] CPU0 attaching sched-domain:
[3.987125]  domain 0: span 0,32 level SIBLING
[3.997125]   groups: 0 (cpu_power = 588) 32 (cpu_power = 589)
[4.010477]   domain 1: span 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
[4.029748]groups: 0,32 (cpu_power = 1177)  4,36 (cpu_power = 1177)  
8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
 16,48 (cpu_power = 1178) 20,52 (cpu_power = 1178) 
24,56 (cpu_power = 1178) 28,60 (cpu_power = 1178)
 64,72 (cpu_power = 1178) 68,76 (cpu_power = 1177)
[4.084143]domain 2: span 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level CPU
[4.103796] groups: 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11777)
[4.124373] domain 3: span 0-79 level NUMA
[4.134369]  groups: 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11777)

1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11778)

2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78 (cpu_power = 11778)

3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11780)

-Mike


linux-next: Tree for Dec 14

2012-12-13 Thread Stephen Rothwell

Hi all,

Changes since 20121213:

Lots of conflicts are migrating between trees.

The powerpc tree lost its build failure.

The xtensa tree gained a conflict against Linus' tree.

The akpm tree lost lots of commits that turned up elsewhere and lost its
build failure.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 214 trees (counting Linus' and 28 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (896ea17 Merge tag 'stable/for-linus-3.8-rc0-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen)
Merging fixes/master (8041853 disable the SB105X driver)
Merging kbuild-current/rc-fixes (bad9955 menuconfig: Replace CIRCLEQ by 
list_head-style lists.)
Merging arm-current/fixes (810883f ARM: 7594/1: Add .smp entry for REALVIEW_EB)
Merging m68k-current/for-linus (5fec45a m68k/sun3: Fix instruction faults)
Merging powerpc-merge/merge (e716e01 powerpc/eeh: Do not invalidate PE properly)
Merging sparc/master (df2fc24 Merge branch 'fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux)
Merging net/master (eca2a43 bridge: fix icmpv6 endian bug and other sparse 
warnings)
Merging sound-current/for-linus (d846b17 ALSA: hda - Fix build without 
CONFIG_PM)
Merging pci-current/for-linus (ff8e59b PCI/portdrv: Don't create hotplug slots 
unless port supports hotplug)
Merging wireless/master (6bdd253 mac80211: fix remain-on-channel 
(non-)cancelling)
Merging driver-core.current/driver-core-linus (6be35c7 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next)
Merging tty.current/tty-linus (1ebaf4f Merge branch 'x86-timers-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging usb.current/usb-linus (1ebaf4f Merge branch 'x86-timers-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging staging.current/staging-linus (1ebaf4f Merge branch 
'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging char-misc.current/char-misc-linus (1ebaf4f Merge branch 
'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging input-current/for-linus (e12b3ce Input: wacom - fix touch support for 
Bamboo Fun CTH-461)
Merging md-current/for-linus (749586b md/raid5: use async_tx_quiesce() instead 
of open-coding it.)
Merging audit-current/for-linus (c158a35 audit: no leading space in 
audit_log_d_path prefix)
Merging crypto-current/master (9efade1 crypto: cryptd - disable softirqs in 
cryptd_queue_worker to prevent data corruption)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (03a0b4c solos-pci: fix double-free of TX skb in DMA mode)
CONFLICT (content): Merge conflict in arch/x86/Kconfig.cpu
CONFLICT (content): Merge conflict in arch/x86/Kconfig
CONFLICT (content): Merge conflict in arch/powerpc/Kconfig
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to 
inline functions)
Merging irqdomain-current/irqdomain/merge (a0d271c Linux 3.6)
Merging devicetree-current/devicetree/merge (0e622d3 of/address: sparc: Declare 
of_iomap as an extern function for sparc again)
Merging spi-current/spi/merge (a0d271c Linux 3.6)
Mergi

Re: [PATCH v2] pwm: samsung: add missing s3c->pwm_id

2012-12-13 Thread Joonyoung Shim


On 12/14/2012 04:34 PM, Thierry Reding wrote:
> On Fri, Dec 14, 2012 at 03:58:58PM +0900, Joonyoung Shim wrote:
>> The s3c->pwm_id is used to calculate offset of related register.
>>
>> Signed-off-by: Joonyoung Shim
>
> I've modified the subject a bit to make it clear the assignment of
> s3c->pwm_id was missing, not the s3c->pwm_id field altogether.
>
> Applied, thanks.

OK, thanks.

Re: [PATCH 00/26] AIO performance improvements/cleanups, v2

2012-12-13 Thread Jens Axboe

On 2012-12-14 03:26, Jack Wang wrote:
> 2012/12/14 Jens Axboe :
>> On Mon, Dec 03 2012, Kent Overstreet wrote:
>>> Last posting: http://thread.gmane.org/gmane.linux.kernel.aio.general/3169
>>>
>>> Changes since the last posting should all be noted in the individual
>>> patch descriptions.
>>>
>>>  * Zach pointed out the aio_read_evt() patch was calling functions that
>>>could sleep in TASK_INTERRUPTIBLE state, that patch is rewritten.
>>>  * Ben pointed out some synchronize_rcu() usage was problematic,
>>>converted it to call_rcu()
>>>  * The flush_dcache_page() patch is new
>>>  * Changed the "use cancellation list lazily" patch so as to remove
>>>ki_flags from struct kiocb.
>>
>> Kent, I ran a few tests, and the below patches still don't seem as fast
>> as the approach I took. To keep it fair, I used your aio branch and
>> applied by dio speedups too. As a sanity check, I ran with your branch
>> alone as well. The quick results below - kaio is kent-aio, just your
>> branch. kaio-dio is with the direct IO speedups too. jaio is my branch,
>> which already has the dio changes too.
>>
>> Devices  Branch     IOPS
>> 1        kaio       ~915K
>> 1        kaio-dio   ~930K
>> 1        jaio       ~1220K
>> 6        kaio       ~3050K
>> 6        kaio-dio   ~3080K
>> 6        jaio       3500K
>>
>> The box runs out of CPU driving power, which is why it doesn't scale
>> linearly, otherwise I know that jaio at least does. It's basically
>> completion limited for the 6 device test at the moment.
>>
>> I'll run some profiling tomorrow morning and get you some better
>> results. Just thought I'd share these at least.
>>
>> --
>> Jens Axboe
>>
> 
> A really good performance, woo.
> 
> I think the device tested is really fast PCIe SSD builded by fusionio
> with fusionio in house block driver?

It is pci-e flash storage, but it is not fusion-io.

> any compare number with current mainline?

Sure, I should have included that. Here's the table again, this time
with mainline as well.

Devices  Branch     IOPS
1        mainline   ~870K
1        kaio       ~915K
1        kaio-dio   ~930K
1        jaio       ~1220K
6        kaio       ~3050K
6        kaio-dio   ~3080K
6        jaio       ~3500K
6        mainline   ~2850K
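
For readers who want the relative gains over mainline rather than the raw
figures, the table above can be fed into a small helper. This is only a sketch
for illustration: gain_tenths_of_percent() is a hypothetical name, and the
inputs are the approximate K-IOPS values quoted in the table, so the results
are no more precise than those figures.

```c
/*
 * Gain of `iops` over `base`, in tenths of a percent, rounded to nearest.
 * Both arguments are approximate K-IOPS figures taken from the table.
 * Assumes iops >= base > 0; integer math avoids floating-point rounding.
 */
static long gain_tenths_of_percent(long base, long iops)
{
	return ((iops - base) * 1000 + base / 2) / base;
}
```

For example, gain_tenths_of_percent(870, 1220) compares the single-device
jaio figure against mainline, and gain_tenths_of_percent(2850, 3500) does the
same for the six-device run.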


-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] pwm: samsung: add missing s3c->pwm_id

2012-12-13 Thread Thierry Reding

On Fri, Dec 14, 2012 at 03:58:58PM +0900, Joonyoung Shim wrote:
> The s3c->pwm_id is used to calculate offset of related register.
> 
> Signed-off-by: Joonyoung Shim 

I've modified the subject a bit to make it clear the assignment of
s3c->pwm_id was missing, not the s3c->pwm_id field altogether.

Applied, thanks.

Thierry


Re: [PATCH] mm: Downgrade mmap_sem before locking or populating on mmap

2012-12-13 Thread Al Viro

On Thu, Dec 13, 2012 at 09:49:43PM -0800, Andy Lutomirski wrote:
> This is a serious cause of mmap_sem contention.  MAP_POPULATE
> and MCL_FUTURE, in particular, are disastrous in multithreaded programs.
> 
> Signed-off-by: Andy Lutomirski 
> ---
> 
> Sensible people use anonymous mappings.  I write kernel patches :)
> 
> I'm not entirely thrilled by the aesthetics of this patch.  The MAP_POPULATE 
> case
> could also be improved by doing it without any lock at all.  This is still a 
> big
> improvement, though.

Wait a minute.  get_user_pages() relies on ->mmap_sem being held.  Unless
I'm seriously misreading your patch it removes that protection.  And yes,
I'm aware of execve-related exception; it's in special circumstances -
bprm->mm is guaranteed to be not shared (and we need to rearchitect that
area anyway, but that's a separate story).

[RESEND][PATCH] gpu: remove gma500 stub driver

2012-12-13 Thread Lee, Chun-Yi

In v3.3, the gma500 drm driver moved from staging to the drm group with
Alan Cox's 3abcf41fb patch. The gma500 drm driver should control
brightness well, so the gma500 stub driver is no longer needed.

Reference:
http://lists.freedesktop.org/archives/dri-devel/2012-May/023426.html
http://lists.freedesktop.org/archives/dri-devel/2012-May/023467.html

Cc: randy.dun...@oracle.com
Cc: tr...@suse.de
Cc: valdis.kletni...@vt.edu
Cc: dennis.jan...@web.de
Cc: airl...@redhat.com
Cc: e...@anholt.net
Cc: alexdeuc...@gmail.com
Cc: bske...@redhat.com
Cc: ch...@chris-wilson.co.uk
Cc: Alan Cox 
Acked-by: Matthew Garrett 
Acked-by: Greg Kroah-Hartman 
Signed-off-by: Lee, Chun-Yi 
---
 drivers/gpu/Makefile   |2 +-
 drivers/gpu/stub/Kconfig   |   18 
 drivers/gpu/stub/Makefile  |1 -
 drivers/gpu/stub/poulsbo.c |   64 
 drivers/video/Kconfig  |2 -
 5 files changed, 1 insertions(+), 86 deletions(-)
 delete mode 100644 drivers/gpu/stub/Kconfig
 delete mode 100644 drivers/gpu/stub/Makefile
 delete mode 100644 drivers/gpu/stub/poulsbo.c

diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index cc92778..30879df 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -1 +1 @@
-obj-y  += drm/ vga/ stub/
+obj-y  += drm/ vga/
diff --git a/drivers/gpu/stub/Kconfig b/drivers/gpu/stub/Kconfig
deleted file mode 100644
index 4199179..000
--- a/drivers/gpu/stub/Kconfig
+++ /dev/null
@@ -1,18 +0,0 @@
-config STUB_POULSBO
-   tristate "Intel GMA500 Stub Driver"
-   depends on PCI
-   depends on NET # for THERMAL
-   # Poulsbo stub depends on ACPI_VIDEO when ACPI is enabled
-   # but for select to work, need to select ACPI_VIDEO's dependencies, ick
-   select BACKLIGHT_CLASS_DEVICE if ACPI
-   select VIDEO_OUTPUT_CONTROL if ACPI
-   select INPUT if ACPI
-   select ACPI_VIDEO if ACPI
-   select THERMAL if ACPI
-   help
- Choose this option if you have a system that has Intel GMA500
- (Poulsbo) integrated graphics. If M is selected, the module will
- be called Poulsbo. This driver is a stub driver for Poulsbo that
- will call poulsbo.ko to enable the acpi backlight control sysfs
- entry file because there have no poulsbo native driver can support
- intel opregion.
diff --git a/drivers/gpu/stub/Makefile b/drivers/gpu/stub/Makefile
deleted file mode 100644
index cd940cc..000
--- a/drivers/gpu/stub/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-obj-$(CONFIG_STUB_POULSBO) += poulsbo.o
diff --git a/drivers/gpu/stub/poulsbo.c b/drivers/gpu/stub/poulsbo.c
deleted file mode 100644
index 7edfd27..000
--- a/drivers/gpu/stub/poulsbo.c
+++ /dev/null
@@ -1,64 +0,0 @@
-/*
- * Intel Poulsbo Stub driver
- *
- * Copyright (C) 2010 Novell 
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License version 2 as published by
- * the Free Software Foundation.
- *
- */
-
-#include 
-#include 
-#include 
-#include 
-
-#define DRIVER_NAME "poulsbo"
-
-enum {
-   CHIP_PSB_8108 = 0,
-   CHIP_PSB_8109 = 1,
-};
-
-static DEFINE_PCI_DEVICE_TABLE(pciidlist) = {
-   {0x8086, 0x8108, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_PSB_8108}, \
-   {0x8086, 0x8109, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_PSB_8109}, \
-   {0, 0, 0}
-};
-
-static int poulsbo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
-{
-   return acpi_video_register();
-}
-
-static void poulsbo_remove(struct pci_dev *pdev)
-{
-   acpi_video_unregister();
-}
-
-static struct pci_driver poulsbo_driver = {
-   .name = DRIVER_NAME,
-   .id_table = pciidlist,
-   .probe = poulsbo_probe,
-   .remove = poulsbo_remove,
-};
-
-static int __init poulsbo_init(void)
-{
-   return pci_register_driver(&poulsbo_driver);
-}
-
-static void __exit poulsbo_exit(void)
-{
-   pci_unregister_driver(&poulsbo_driver);
-}
-
-module_init(poulsbo_init);
-module_exit(poulsbo_exit);
-
-MODULE_AUTHOR("Lee, Chun-Yi ");
-MODULE_DESCRIPTION("Poulsbo Stub Driver");
-MODULE_LICENSE("GPL");
-
-MODULE_DEVICE_TABLE(pci, pciidlist);
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index d08d799..cc99f41 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -21,8 +21,6 @@ source "drivers/gpu/vga/Kconfig"
 
 source "drivers/gpu/drm/Kconfig"
 
-source "drivers/gpu/stub/Kconfig"
-
 config VGASTATE
tristate
default n
-- 
1.6.0.2


[PATCH 2/2] usb: host: tegra: Resetting PORT0 based on information received via DT.

2012-12-13 Thread Venu Byravarasu

Tegra USB host driver is using port instance number,
to handle some of the hardware issues on SOC e.g. reset PORT0
twice etc. As instance number based handling looks ugly,
making use of information passed through DT for achieving this.

Signed-off-by: Venu Byravarasu 
---
 drivers/usb/host/ehci-tegra.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/ehci-tegra.c b/drivers/usb/host/ehci-tegra.c
index acf1755..55a9cde 100644
--- a/drivers/usb/host/ehci-tegra.c
+++ b/drivers/usb/host/ehci-tegra.c
@@ -43,6 +43,7 @@ struct tegra_ehci_hcd {
struct usb_phy *transceiver;
int host_resumed;
int port_resuming;
+   bool needs_double_reset;
enum tegra_usb_phy_port_speed port_speed;
 };
 
@@ -184,7 +185,7 @@ static int tegra_ehci_hub_control(
}
 
/* For USB1 port we need to issue Port Reset twice internally */
-   if (tegra->phy->instance == 0 &&
+   if (tegra->needs_double_reset &&
   (typeReq == SetPortFeature && wValue == USB_PORT_FEAT_RESET)) {
spin_unlock_irqrestore(&ehci->lock, flags);
return tegra_ehci_internal_port_reset(ehci, status_reg);
@@ -666,6 +667,9 @@ static int tegra_ehci_probe(struct platform_device *pdev)
clk_prepare_enable(tegra->emc_clk);
clk_set_rate(tegra->emc_clk, 4);
 
+   tegra->needs_double_reset = of_property_read_bool(pdev->dev.of_node,
+   "nvidia,needs-double-reset");
+
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
if (!res) {
dev_err(&pdev->dev, "Failed to get I/O memory\n");
-- 
1.7.0.4


[PATCH 0/2] usb: tegra: modifying port reset based on instance number

2012-12-13 Thread Venu Byravarasu

Tegra USB host driver is using port instance number,
to handle some of the hardware issues on SOC e.g. reset PORT0
twice etc. As instance number based handling looks ugly,
added a new property to USB DT node for this purpose.
Modified host driver to make use of the information passed
through DT to reset the port second time.


Venu Byravarasu (2):
  arm: tegra: Add new DT property to USB node.
  usb: host: tegra: Resetting PORT0 based on information received via
DT.

 .../bindings/usb/nvidia,tegra20-ehci.txt   |2 ++
 arch/arm/boot/dts/tegra20.dtsi |1 +
 drivers/usb/host/ehci-tegra.c  |6 +-
 3 files changed, 8 insertions(+), 1 deletions(-)


[PATCH 1/2] arm: tegra: Add new DT property to USB node.

2012-12-13 Thread Venu Byravarasu

As Tegra USB host driver is using instance number for resetting
PORT0 twice, adding a new DT property for handling this.

Signed-off-by: Venu Byravarasu 
---
 .../bindings/usb/nvidia,tegra20-ehci.txt   |2 ++
 arch/arm/boot/dts/tegra20.dtsi |1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/usb/nvidia,tegra20-ehci.txt 
b/Documentation/devicetree/bindings/usb/nvidia,tegra20-ehci.txt
index e9b005d..443a2b4 100644
--- a/Documentation/devicetree/bindings/usb/nvidia,tegra20-ehci.txt
+++ b/Documentation/devicetree/bindings/usb/nvidia,tegra20-ehci.txt
@@ -27,3 +27,5 @@ Optional properties:
 registers are accessed through the APB_MISC base address instead of
 the USB controller. Since this is a legacy issue it probably does not
 warrant a compatible string of its own.
+  - nvidia,needs-double-reset : boolean is to be set for some of the Tegra2
+USB ports, which need reset twice due to hardware issues.
diff --git a/arch/arm/boot/dts/tegra20.dtsi b/arch/arm/boot/dts/tegra20.dtsi
index b8effa1..3ebbd0c 100644
--- a/arch/arm/boot/dts/tegra20.dtsi
+++ b/arch/arm/boot/dts/tegra20.dtsi
@@ -364,6 +364,7 @@
phy_type = "utmi";
nvidia,has-legacy-mode;
status = "disabled";
+   nvidia,needs-double-reset;
};
 
usb@c5004000 {
-- 
1.7.0.4


Re: [ 02/38] PCI/PM: Fix deadlock when unbinding device if parent in D3cold

2012-12-13 Thread Huang Ying

On Tue, 2012-12-11 at 10:08 -0800, Greg Kroah-Hartman wrote:
> On Tue, Dec 11, 2012 at 04:12:52PM +0800, Huang Ying wrote:
[snip]
> > Which tree should I base the patch on?
> 
> 3.6.10 would be great.

Here is the patch for 3.6.10.

->
Subject: [PATCH] PCI/PM: Fix deadlock when unbind device if its parent in D3cold

If a PCI device and its parents are put into D3cold, unbinding the
device will trigger a deadlock as follows:

- driver_unbind
  - device_release_driver
- device_lock(dev)  <--- previous lock here
- __device_release_driver
  - pm_runtime_get_sync
...
  - rpm_resume(dev)
- rpm_resume(dev->parent)
  ...
- pci_pm_runtime_resume
  ...
  - pci_set_power_state
- __pci_start_power_transition
  - pci_wakeup_bus(dev->parent->subordinate)
- pci_walk_bus
  - device_lock(dev)<--- dead lock here


If we do not take device_lock in pci_walk_bus, we can avoid the deadlock.
The device_lock in pci_walk_bus was introduced in commit
d71374dafbba7ec3f67371d3b7e9f6310a588808; the corresponding email thread
is https://lkml.org/lkml/2006/5/26/38.  The patch author, Zhang Yanmin,
said device_lock was added to pci_walk_bus because:

  Some error handling functions call pci_walk_bus. For example, PCIe
  aer. Here we lock the device, so the driver wouldn't detach from the
  device, as the cb might call driver's callback function.

So I fixed the deadlock as follows:

- remove device_lock from pci_walk_bus
- add device_lock into callback if callback will call driver's callback

I checked the pci_walk_bus users one by one, and found that only PCIe AER
needs the device lock.

Signed-off-by: Huang Ying 
Cc: Zhang Yanmin 
---
 arch/powerpc/platforms/pseries/eeh_driver.c |   35 
 drivers/pci/bus.c   |3 --
 drivers/pci/pcie/aer/aerdrv_core.c  |   20 
 3 files changed, 41 insertions(+), 17 deletions(-)

--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -316,10 +316,7 @@ void pci_walk_bus(struct pci_bus *top, i
} else
next = dev->bus_list.next;
 
-   /* Run device routines with the device locked */
-   device_lock(&dev->dev);
retval = cb(dev, userdata);
-   device_unlock(&dev->dev);
if (retval)
break;
}
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -244,6 +244,7 @@ static int report_error_detected(struct
struct aer_broadcast_data *result_data;
result_data = (struct aer_broadcast_data *) data;
 
+   device_lock(&dev->dev);
dev->error_state = result_data->state;
 
if (!dev->driver ||
@@ -262,12 +263,14 @@ static int report_error_detected(struct
   dev->driver ?
   "no AER-aware driver" : "no driver");
}
-   return 0;
+   goto out;
}
 
err_handler = dev->driver->err_handler;
vote = err_handler->error_detected(dev, result_data->state);
result_data->result = merge_result(result_data->result, vote);
+out:
+   device_unlock(&dev->dev);
return 0;
 }
 
@@ -278,14 +281,17 @@ static int report_mmio_enabled(struct pc
struct aer_broadcast_data *result_data;
result_data = (struct aer_broadcast_data *) data;
 
+   device_lock(&dev->dev);
if (!dev->driver ||
!dev->driver->err_handler ||
!dev->driver->err_handler->mmio_enabled)
-   return 0;
+   goto out;
 
err_handler = dev->driver->err_handler;
vote = err_handler->mmio_enabled(dev);
result_data->result = merge_result(result_data->result, vote);
+out:
+   device_unlock(&dev->dev);
return 0;
 }
 
@@ -296,14 +302,17 @@ static int report_slot_reset(struct pci_
struct aer_broadcast_data *result_data;
result_data = (struct aer_broadcast_data *) data;
 
+   device_lock(&dev->dev);
if (!dev->driver ||
!dev->driver->err_handler ||
!dev->driver->err_handler->slot_reset)
-   return 0;
+   goto out;
 
err_handler = dev->driver->err_handler;
vote = err_handler->slot_reset(dev);
result_data->result = merge_result(result_data->result, vote);
+out:
+   device_unlock(&dev->dev);
return 0;
 }
 
@@ -311,15 +320,18 @@ static int report_resume(struct pci_dev
 {
struct pci_error_handlers *err_handler;
 
+   device_lock(&dev->dev);
dev->error_state = pci_channel_io_normal;
 
if (!dev->driver ||
!dev->driver->err_handler ||
!dev->driver->err_handler->resume)
-   return
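
The locking change in the patch above can be illustrated outside the kernel. The following is only a userspace sketch, not kernel code: `toy_dev`, `toy_trylock`, `walk_locking`, `walk_plain`, `cb_wakeup` and `cb_report` are hypothetical names, and a failed `toy_trylock()` stands in for the point where a real, non-recursive device_lock() would block forever.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a device; its lock is NOT recursive. */
struct toy_dev {
	bool locked;
	bool has_driver;
	int notified;
};

/* Returns false where a real device_lock() would block forever. */
static bool toy_trylock(struct toy_dev *d)
{
	if (d->locked)
		return false;
	d->locked = true;
	return true;
}

static void toy_unlock(struct toy_dev *d)
{
	d->locked = false;
}

typedef int (*toy_cb)(struct toy_dev *d);

/* Old behaviour: the walker takes the lock around every callback. */
static int walk_locking(struct toy_dev *devs, size_t n, toy_cb cb)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (!toy_trylock(&devs[i]))
			return -1;	/* caller already holds it: deadlock */
		ret = cb(&devs[i]);
		toy_unlock(&devs[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* New behaviour: the walker never locks; callbacks lock if they must. */
static int walk_plain(struct toy_dev *devs, size_t n, toy_cb cb)
{
	for (size_t i = 0; i < n; i++) {
		int ret = cb(&devs[i]);

		if (ret)
			return ret;
	}
	return 0;
}

/* Wakeup-style callback: calls no driver callback, needs no lock. */
static int cb_wakeup(struct toy_dev *d)
{
	d->notified++;
	return 0;
}

/* AER-style callback: calls into the driver, so it locks by itself. */
static int cb_report(struct toy_dev *d)
{
	if (!toy_trylock(d))
		return -1;
	if (d->has_driver)
		d->notified++;
	toy_unlock(d);
	return 0;
}
```

With the first device already locked by the "unbind" path, `walk_locking()` fails on it while `walk_plain()` with `cb_wakeup` completes; the AER-style callback still gets its lock when nothing else holds it.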

Re: [PATCH] OMAP GPIO - don't wake from suspend unless requested.

2012-12-13 Thread NeilBrown

On Mon, 10 Sep 2012 10:57:07 -0700 Kevin Hilman 
wrote:


> OK thanks, I'll queue this up for v3.6-rc as this should qualify as a
> regression.

I don't think this did actually get queued.  At least I'm still seeing the
bug in 3.7 and I cannot see a patch in the git history that looks right.
But then I don't remember what we ended up with - it was 3 months ago!!!

This is the last thing I can find in my email history - it still seems to
apply and still seems to work.

NeilBrown


From: NeilBrown 
Date: Mon, 10 Sep 2012 14:09:06 +1000
Subject: [PATCH] OMAP GPIO - don't wake from suspend unless requested.

The current kernel will wake from suspend on an event on any active
GPIO even if enable_irq_wake() wasn't called.

There are two reasons that the hardware wake-enable bit should be set:

1/ while non-suspended the CPU might go into a deep sleep (off_mode)
  in which the wake-enable bit is needed for an interrupt to be
  recognised.
2/ while suspended the GPIO interrupt should wake from suspend if and
   only if irq_wake has been enabled.

The code currently doesn't keep these two reasons separate, so they get
confused and sometimes the wakeup flag is set incorrectly.

This patch reverts:
 commit 9c4ed9e6c01e7a8bd9079da8267e1f03cb4761fc
gpio/omap: remove suspend/resume callbacks
and
 commit 0aa2727399c0b78225021413022c164cb99fbc5e
gpio/omap: remove suspend_wakeup field from struct gpio_bank

and makes some minor changes so that we have separate flags for "GPIO
should wake from deep idle" and "GPIO should wake from suspend".

With this patch, the GPIO from my touch screen doesn't wake my device
any more, which is what I want.

Cc: Kevin Hilman 
Cc: Tony Lindgren 
Cc: Santosh Shilimkar 
Cc: Benoit Cousson 
Cc: Grant Likely 
Cc: Tarun Kanti DebBarma 
Cc: Felipe Balbi 
Cc: Govindraj.R 

Signed-off-by: NeilBrown 

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index 4fbc208..79e1340 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -57,6 +57,7 @@ struct gpio_bank {
u16 irq;
int irq_base;
struct irq_domain *domain;
+   u32 suspend_wakeup;
u32 non_wakeup_gpios;
u32 enabled_non_wakeup_gpios;
struct gpio_regs context;
@@ -522,11 +523,9 @@ static int _set_gpio_wakeup(struct gpio_bank *bank, int 
gpio, int enable)
 
spin_lock_irqsave(&bank->lock, flags);
if (enable)
-   bank->context.wake_en |= gpio_bit;
+   bank->suspend_wakeup |= gpio_bit;
else
-   bank->context.wake_en &= ~gpio_bit;
-
-   __raw_writel(bank->context.wake_en, bank->base + bank->regs->wkup_en);
+   bank->suspend_wakeup &= ~gpio_bit;
spin_unlock_irqrestore(&bank->lock, flags);
 
return 0;
@@ -1157,6 +1156,49 @@ static int __devinit omap_gpio_probe(struct 
platform_device *pdev)
 #ifdef CONFIG_ARCH_OMAP2PLUS
 
 #if defined(CONFIG_PM_RUNTIME)
+
+#if defined(CONFIG_PM_SLEEP)
+static int omap_gpio_suspend(struct device *dev)
+{
+   struct platform_device *pdev = to_platform_device(dev);
+   struct gpio_bank *bank = platform_get_drvdata(pdev);
+   void __iomem *base = bank->base;
+   unsigned long flags;
+
+   if (!bank->mod_usage ||
+   !bank->regs->wkup_en ||
+   !bank->context.wake_en)
+   return 0;
+
+   spin_lock_irqsave(&bank->lock, flags);
+   _gpio_rmw(base, bank->regs->wkup_en, 0xffffffff, 0);
+   _gpio_rmw(base, bank->regs->wkup_en, bank->suspend_wakeup, 1);
+   spin_unlock_irqrestore(&bank->lock, flags);
+
+   return 0;
+}
+
+static int omap_gpio_resume(struct device *dev)
+{
+   struct platform_device *pdev = to_platform_device(dev);
+   struct gpio_bank *bank = platform_get_drvdata(pdev);
+   void __iomem *base = bank->base;
+   unsigned long flags;
+
+   if (!bank->mod_usage ||
+   !bank->regs->wkup_en ||
+   !bank->context.wake_en)
+   return 0;
+
+   spin_lock_irqsave(&bank->lock, flags);
+   _gpio_rmw(base, bank->regs->wkup_en, 0xffffffff, 0);
+   _gpio_rmw(base, bank->regs->wkup_en, bank->context.wake_en, 1);
+   spin_unlock_irqrestore(&bank->lock, flags);
+
+   return 0;
+}
+#endif /* CONFIG_PM_SLEEP */
+
 static void omap_gpio_restore_context(struct gpio_bank *bank);
 
 static int omap_gpio_runtime_suspend(struct device *dev)
@@ -1386,11 +1428,14 @@ static void omap_gpio_restore_context(struct gpio_bank 
*bank)
 }
 #endif /* CONFIG_PM_RUNTIME */
 #else
+#define omap_gpio_suspend NULL
+#define omap_gpio_resume NULL
 #define omap_gpio_runtime_suspend NULL
 #define omap_gpio_runtime_resume NULL
 #endif
 
 static const struct dev_pm_ops gpio_pm_ops = {
+   SET_SYSTEM_SLEEP_PM_OPS(omap_gpio_suspend, omap_gpio_resume)
SET_RUNTIME_PM_OPS(omap_gpio_runtime_suspend, omap_gpio_runtime_resume,
NULL)
 };
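
The idea of keeping two separate masks, one for "wake from deep idle" and one for "wake from suspend", can be sketched in plain C. This is only a userspace model under assumptions: `toy_bank`, `toy_set_irq_wake`, `toy_suspend` and `toy_resume` are hypothetical names, and `wkup_en_reg` is an ordinary variable standing in for the hardware wake-enable register.

```c
#include <stdint.h>

struct toy_bank {
	uint32_t wake_en;	 /* bits needed to wake from deep idle */
	uint32_t suspend_wakeup; /* bits requested through irq_wake */
	uint32_t wkup_en_reg;	 /* stand-in for the hardware register */
};

/* Record the request only; the register is left alone until suspend. */
static void toy_set_irq_wake(struct toy_bank *b, unsigned int gpio, int enable)
{
	uint32_t bit = UINT32_C(1) << gpio;

	if (enable)
		b->suspend_wakeup |= bit;
	else
		b->suspend_wakeup &= ~bit;
}

/* On suspend: clear everything, then set only the requested bits. */
static void toy_suspend(struct toy_bank *b)
{
	b->wkup_en_reg = 0;
	b->wkup_en_reg |= b->suspend_wakeup;
}

/* On resume: restore the bits used while the system is running. */
static void toy_resume(struct toy_bank *b)
{
	b->wkup_en_reg = 0;
	b->wkup_en_reg |= b->wake_en;
}
```

A GPIO that is active but never had irq_wake enabled keeps its bit in `wake_en` for deep idle, yet drops out of the register for the duration of a suspend.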


Re: [PATCH 2/2] dmatest: check for dma mapping error

2012-12-13 Thread Andy Shevchenko

On Fri, Dec 14, 2012 at 1:34 AM, Andrew Morton
 wrote:
> On Mon, 10 Dec 2012 13:37:44 +0200
> Andy Shevchenko  wrote:
>
>> We get a warning if CONFIG_DMA_API_DEBUG=y
>>
>> [   28.150631] WARNING: at lib/dma-debug.c:933 check_unmap+0x5d6/0x6ac()
>> [   28.157058] dw_dmac dw_dmac.0: DMA-API: device driver failed to check map 
>> error[device address=0x35698305] [size=14365 bytes] [mapped as 
>> single]

>> --- a/drivers/dma/dmatest.c
>> +++ b/drivers/dma/dmatest.c
>> @@ -378,15 +378,35 @@ static int dmatest_func(void *data)
>>
>>   dma_srcs[i] = dma_map_single(dev->dev, buf, len,
>>DMA_TO_DEVICE);
>> + ret = dma_mapping_error(dev->dev, dma_srcs[i]);
>> + if (ret) {
>> + unmap_src(dev->dev, dma_srcs, len, i);
>> + pr_warn("%s: #%u: mapping error %d with "
>> + "src_off=0x%x len=0x%x\n",
>> + thread_name, total_tests - 1, ret,
>> + src_off, len);
>> + failed_tests++;
>> + continue;
>> + }
>
> The changelog and the code don't match.  Which one is out of date?

I'm afraid I didn't get what is wrong.

dmatest maps memory via a dma_map_single() call and asks dmaengine
(ultimately the DMA controller driver) to unmap it after use.
That is exactly when the user gets the warning, and the warning is
independent of the DMA controller in use.

-- 
With Best Regards,
Andy Shevchenko

Re: DMAR and DRHD errors[DMAR:[fault reason 06] PTE Read access is not set] Vt-d & intel_iommu

2012-12-13 Thread Jason Gao

On Fri, Dec 14, 2012 at 2:56 PM, Jason Gao  wrote:
> On Fri, Dec 14, 2012 at 12:45 PM, Alex Williamson
>  wrote:
>> Is the MegaRAID firmware and system management firmware the same as
>> well?  Thanks.
>
> I've updated all the firmware using Dell's firmware-tools:
>
> # inventory_firmware
> Wait while we inventory system:
> System inventory:
> BIOS = 6.3.0
> SAS/SATA Backplane 0:0 Backplane Firmware = 1.07
> PERC 6/i Integrated Controller 0 Firmware = 6.3.1-0003
> Dell OS Drivers Pack, v.6.5.3, A00 = 6.5.3
> Dell Lifecycle Controller = 1.5.5.27
> NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth1) = 7.2.20
> NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth0) = 7.2.20
> ST3600057SS Firmware = es66
> iDRAC6 = 1.92
> NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth2) = 7.2.20
> NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth3) = 7.2.20
> Dell 32 Bit Diagnostics, v.5154A0, 5154.1 = 5154a0
> System BIOS for PowerEdge R710 = 6.3.0
>
> Thanks

#lspci - -s 03:00.0|grep fail:
pcilib: sysfs_read_vpd: read failed: Connection timed out

#strace lspci - -s 03:00.0|grep fail:

open("/sys/bus/pci/devices/0000:03:00.0/vpd", O_RDONLY) = 4
pread(4, 0x7fff30670b3f, 1, 0)  = -1 ETIMEDOUT (Connection timed out)
write(2, "pcilib: ", 8pcilib: ) = 8
write(2, "sysfs_read_vpd: read failed: Con"..., 49sysfs_read_vpd: read
failed: Connection timed out) = 49
write(2, "\n", 1


[PATCH v2] pwm: samsung: add missing s3c->pwm_id

2012-12-13 Thread Joonyoung Shim

The s3c->pwm_id is used to calculate offset of related register.

Signed-off-by: Joonyoung Shim 
---
Changelog from v1:
 - move the assignment code to below.

 drivers/pwm/pwm-samsung.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pwm/pwm-samsung.c b/drivers/pwm/pwm-samsung.c
index 023a3be..0704a2a 100644
--- a/drivers/pwm/pwm-samsung.c
+++ b/drivers/pwm/pwm-samsung.c
@@ -222,6 +222,7 @@ static int s3c_pwm_probe(struct platform_device *pdev)
 
/* calculate base of control bits in TCON */
s3c->tcon_base = id == 0 ? 0 : (id * 4) + 4;
+   s3c->pwm_id = id;
s3c->chip.dev = &pdev->dev;
s3c->chip.ops = &s3c_pwm_ops;
s3c->chip.base = -1;
-- 
1.7.9.5


Re: DMAR and DRHD errors[DMAR:[fault reason 06] PTE Read access is not set] Vt-d & intel_iommu

2012-12-13 Thread Jason Gao

On Fri, Dec 14, 2012 at 12:45 PM, Alex Williamson
 wrote:
> Is the MegaRAID firmware and system management firmware the same as
> well?  Thanks.

I've updated all the firmware using Dell's firmware-tools:

# inventory_firmware
Wait while we inventory system:
System inventory:
BIOS = 6.3.0
SAS/SATA Backplane 0:0 Backplane Firmware = 1.07
PERC 6/i Integrated Controller 0 Firmware = 6.3.1-0003
Dell OS Drivers Pack, v.6.5.3, A00 = 6.5.3
Dell Lifecycle Controller = 1.5.5.27
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth1) = 7.2.20
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth0) = 7.2.20
ST3600057SS Firmware = es66
iDRAC6 = 1.92
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth2) = 7.2.20
NetXtreme II BCM5709 Gigabit Ethernet rev 20 (eth3) = 7.2.20
Dell 32 Bit Diagnostics, v.5154A0, 5154.1 = 5154a0
System BIOS for PowerEdge R710 = 6.3.0

Thanks

Re: [PATCH] pwm: samsung: add missing s3c->pwm_id

2012-12-13 Thread Thierry Reding

On Fri, Dec 14, 2012 at 01:20:09PM +0900, Joonyoung Shim wrote:
> The s3c->pwm_id is used to calculate offset of related register.
> 
> Signed-off-by: Joonyoung Shim 
> ---
>  drivers/pwm/pwm-samsung.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/pwm/pwm-samsung.c b/drivers/pwm/pwm-samsung.c
> index 023a3be..b415102 100644
> --- a/drivers/pwm/pwm-samsung.c
> +++ b/drivers/pwm/pwm-samsung.c
> @@ -220,6 +220,8 @@ static int s3c_pwm_probe(struct platform_device *pdev)
>   return -ENOMEM;
>   }
>  
> + s3c->pwm_id = id;
> +
>   /* calculate base of control bits in TCON */
>   s3c->tcon_base = id == 0 ? 0 : (id * 4) + 4;

Can you please move the assignment to below the above line? That way s3c
initialization is more localized.

Thierry
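
For reference, the arithmetic in the tcon_base line quoted above can be checked in isolation. This is only a sketch: `toy_tcon_base` is a hypothetical helper that copies the expression from the patch, and the values are just what that expression evaluates to, not a statement about the hardware manual.

```c
/* Same expression as in s3c_pwm_probe(): id 0 maps to 0, id N to N*4+4. */
static unsigned int toy_tcon_base(unsigned int id)
{
	return id == 0 ? 0 : (id * 4) + 4;
}
```

So ids 0, 1, 2, 3 and 4 give bases 0, 8, 12, 16 and 20. The base depends only on `id`, which is why the pwm_id assignment sits naturally next to it.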



[PATCH 1/1] scripts/config: Fix wrong "shift" for --keep-case

2012-12-13 Thread Hiroshi Doyu

Remove the wrong "shift" for --keep-case. There is always a "shift" at the
beginning of the while loop, so no extra "shift" is needed at --keep-case
just before "continue" moves on to the next argument.

Now the following works as expected:

./scripts/config -e aAa -k -e bBb -e cCc && tail -3 .config
CONFIG_AAA=y
CONFIG_bBb=y
CONFIG_cCc=y

Signed-off-by: Hiroshi Doyu 
---
 scripts/config |1 -
 1 file changed, 1 deletion(-)

diff --git a/scripts/config b/scripts/config
index ee35539..bb4d3de 100755
--- a/scripts/config
+++ b/scripts/config
@@ -101,7 +101,6 @@ while [ "$1" != "" ] ; do
case "$CMD" in
--keep-case|-k)
MUNGE_CASE=no
-   shift
continue
;;
--refresh)
-- 
1.7.9.5


Re: [PATCH v2] f2fs: fix up f2fs_get_parent issue to retrieve correct parent inode number

2012-12-13 Thread Jaegeuk Kim

2012-12-14 (금), 14:41 +0900, Namjae Jeon:
> 2012/12/14, Jaegeuk Kim :
> > Hi,
> >
> >> diff --git a/fs/f2fs/hash.c b/fs/f2fs/hash.c
> >> index a60f042..5e48bac 100644
> >> --- a/fs/f2fs/hash.c
> >> +++ b/fs/f2fs/hash.c
> >> @@ -76,6 +76,10 @@ f2fs_hash_t f2fs_dentry_hash(const char *name, int
> >> len)
> >>const char *p;
> >>__u32 in[8], buf[4];
> >>
> >> +  if ((len <= 2) && (name[0] == '.') &&
> >> +  (name[1] == '.' || name[1] == '\0'))
> >> +  return 0;
> >
> > If len == 1, we should avoid referencing name[1].
> > Likewise VFS does, I rewrote that like below.
> >
> > if (name[0] == '.') {
> > switch (len) {
> > case 1:
> > return 0;
> > case 2:
> > if (name[1] == '.')
> > return 0;
> > }
> > }
> >
> > So, how about this patch?
> 
> I think that there is no issue on current patch. Since, the strings
> are always expected to be NULL terminated.
> 
> "." should include '\0', so we can distinguish by checking only name[0] and
> name[1].
> 
> When we do:
> char *ptr="hello"; -> it will always be NULL terminated -> "hello" in
> memory followed by '\0';
> when we reserve space
> char ptr[5];-> We need to reserve space for '\0' at the end.

Got it.
I found that NULL is added to the dentry->d_name as follows.

In __d_alloc(),
dentry->d_name.len = name->len;
dentry->d_name.hash = name->hash;
memcpy(dname, name->name, name->len);
dname[name->len] = 0; 

I'll merge your patch. :)
Thanks,

-- 
Jaegeuk Kim
Samsung
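
The two variants discussed in this thread can be compared side by side in userspace. This is only a sketch: both function names are hypothetical, and the `len == 0` guard in the second variant is an addition for the standalone example, not part of either mail.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Variant from the patch: reads name[1] even when len == 1, so it
 * relies on name[len] being '\0' (as __d_alloc() guarantees).
 */
static bool is_dot_or_dotdot_nul(const char *name, size_t len)
{
	return len <= 2 && name[0] == '.' &&
	       (name[1] == '.' || name[1] == '\0');
}

/*
 * Length-driven variant: never reads past name[len - 1], so it also
 * works on buffers that are not NUL-terminated.
 */
static bool is_dot_or_dotdot_len(const char *name, size_t len)
{
	if (len == 0 || name[0] != '.')
		return false;

	switch (len) {
	case 1:
		return true;
	case 2:
		return name[1] == '.';
	default:
		return false;
	}
}
```

Both agree on NUL-terminated names such as ".", "..", ".a" and "..."; only the second one is safe to call on a one-byte buffer holding just '.'.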



Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-13 Thread Alex Shi

On 12/14/2012 12:45 PM, Mike Galbraith wrote:
>> > Do you have further ideas for buddy cpu on such example?
>>> > > 
>>> > > Which kind of sched_domain configuration have you for such system ?
>>> > > and how many sched_domain level have you ?
>> > 
>> > it is general X86 domain configuration. with 4 levels,
>> > sibling/core/cpu/numa.
> CPU is a bug that slipped into domain degeneration.  You should have
> SIBLING/MC/NUMA (chasing that down is on todo).

Maybe.
The CPU and NUMA domains differ in their domain flags: CPU has SD_PREFER_SIBLING.

Re: Fwd: Safely remove option shows with Micro SD Card connected to Linux through an Android phone

2012-12-13 Thread prasannatsmkumar

On Fri, Dec 14, 2012 at 12:49 AM, Alan Stern  wrote:
> On Thu, 13 Dec 2012, prasannatsmkumar wrote:
>
>> On Thu, Dec 13, 2012 at 11:40 PM, Alan Stern  
>> wrote:
>> > On Thu, 13 Dec 2012, prasannatsmkumar wrote:
>> >
>> >> On Wed, Dec 12, 2012 at 2:07 AM, Alan Stern  
>> >> wrote:
>> >> > STOP UNIT means spin down the disk or eject the disc.  Since your phone
>> >> > doesn't have a disk drive or an optical disc, no wonder this step
>> >> > failed.
>> >>
>> >> Yes of course it does not have a optical disc or disk drive. But I
>> >> thought if there is no such thing nautilus should not try to spin
>> >> down. Linux kernel has nothing to do with this problem though.
>> >
>> > Right.  Bear in mind that nautilus may not have any way of finding out
>> > whether the device has removable media, other than requesting for the
>> > media to be ejected.  But if it doesn't know then failure of the
>> > request shouldn't be reported as an error.
>>
>> Is kernel not exposing this information?
>
> What I wrote earlier was wrong, sorry.

No problem :).


> No, the kernel does not export it.  But user programs can get the
> information directly from the device in exactly the same way that the
> kernel does, by issuing an INQUIRY command.

I will try to file a bug in nautilus project. As I am not in the
nautilus mailing list my mails are not getting delivered.

>
>>  The other OS shows "Eject"
>> for the android device and for other pen drive I get a "safely remove"
>> option - stated this assuming the options in nautilus and the other OS
>> mean the same.
>
> Alan Stern
>

Thanks,
PrasannaKumar

Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2012-12-13 Thread Terje Bergström

On 13.12.2012 19:58, Stephen Warren wrote:
> On 12/13/2012 01:57 AM, Thierry Reding wrote:
>> After some more discussion with Stephen on IRC we came to the
>> conclusion that the easiest might be to have tegra-drm call into
>> host1x with something like:
>>
>> void host1x_set_drm_device(struct host1x *host1x, struct device
>> *dev);
> 
> If host1x is registering the dummy device that causes tegradrm to be
> instantiated, then presumably there's no need for the API above, since
> host1x will already have the struct device * for tegradrm, since it
> created it?

I didn't add the dummy device in my latest patch set. I first set out to
add it, and moved the drm global data to be drvdata of that device. Then
I noticed that it doesn't actually help at all.

The critical accesses to the global data are from probes of DC, HDMI,
etc. They want to get the global data by getting drvdata of their
parent. The dummy device is not their parent - host1x is. There's no
connection between the dummy and the real client devices. So there's no
logical way for DC and HDMI to find the dummy device, except perhaps
by searching for "tegradrm" device from platform bus. But then again,
that'd break encapsulation so it's as bad as a global variable - only
with a lot more code to do the same thing.

Accesses after probing can happen via drm->dev_private, so post-probe
we're fine.

Another problem arose (I already mentioned it) when I used the dummy
device to call drm_platform_init(). It called back into tegradrm,
which calls the CMA FB helper to allocate memory. Unfortunately the
memory is allocated for the dummy device, and it's not initialized to
do that. I ended up with failed frame buffer allocation. That needs
host1x allocator to fix.

host1x itself shouldn't need any DRM specific calls or callbacks. The
device model already allows traversing through list of host1x children.
The list of drm clients and devices is something that tegradrm needs to
be able to initialize DRM at appropriate time. I also took that into use
for storing the channel and class data. So we should try to keep the
list maintenance as local to tegradrm as we can.

In my latest patch set, I kept the list management inside tegradrm, and
put all DRM global data into struct tegradrm, which is accessed via a
global. I guess global in tegradrm is not as bad as global in host1x,
because one DRM driver can handle multiple devices, so there's no reason
to have multiple tegradrm's trampling on each other's data. But if you
can come up with a better solution, I'm all ears.

Terje

[PATCH 1/2] i2c-core: Add gpio based bus arbitration implementation

2012-12-13 Thread Naveen Krishna Chatradhi

The arbitrator is a general purpose function which uses two GPIOs to
communicate with another device to claim/release a bus.

i2c_transfer()
if adapter->gpio_arbit
i2c_bus_claim();
__i2c_transfer();
i2c_bus_release();

Signed-off-by: Simon Glass 
Cc: Grant Grundler 
Signed-off-by: Naveen Krishna Chatradhi 
---
 .../devicetree/bindings/i2c/arbitrator-i2c.txt |   56 
 drivers/i2c/i2c-core.c |   67 
 drivers/of/of_i2c.c|   27 
 include/linux/i2c.h|   17 +
 include/linux/of_i2c.h |2 +
 5 files changed, 169 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/i2c/arbitrator-i2c.txt

diff --git a/Documentation/devicetree/bindings/i2c/arbitrator-i2c.txt 
b/Documentation/devicetree/bindings/i2c/arbitrator-i2c.txt
new file mode 100644
index 0000000..cb91ea8
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/arbitrator-i2c.txt
@@ -0,0 +1,56 @@
+Device-Tree bindings for i2c gpio based bus arbitrator
+
+bus-arbitration-gpios :
+   Two GPIOs to use with the GPIO-based bus arbitration protocol
+(see below).
+The first should be an output, and is used to claim the I2C bus,
+the second should be an input, and signals that the other side (Client)
+wants to claim the bus. This allows two masters to share the same I2C bus.
+
+Required properties:
+   - bus-needs-gpio-arbitration;
+   - bus-arbitration-gpios: AP_CLAIM and Client_CLAIM gpios
+   - bus-arbitration-slew-delay-us:
+   - bus-arbitration-wait-retry-us:
+   - bus-arbitration-wait-free-us:
+
+Example nodes:
+
+i2c@0 {
+   /* If you want GPIO-based bus arbitration */
+   bus-needs-gpio-arbitration;
+   bus-arbitration-gpios = < 3 1 0 0>, /* AP_CLAIM */
+   < 4 0 3 0>; /* EC_CLAIM */
+
+   bus-arbitration-slew-delay-us = <10>;
+   bus-arbitration-wait-retry-us = <2000>;
+   bus-arbitration-wait-free-us = <5>;
+};
+
+GPIO-based Arbitration
+======================
+This uses GPIO lines between the AP (SoC) and an attached EC (embedded
+controller) which both want to talk on the same I2C bus as master.
+
+The AP and EC each have a 'bus claim' line, which is an output that the
+other can see. These are both active low, with pull-ups enabled.
+
+- AP_CLAIM: output from AP, signalling to the EC that the AP wants the bus
+- EC_CLAIM: output from EC, signalling to the AP that the EC wants the bus
+
+
+Algorithm
+---------
+The basic algorithm is to assert your line when you want the bus, then make
+sure that the other side doesn't want it also. A detailed explanation is best
+done with an example.
+
+Let's say the AP wants to claim the bus. It:
+1. Asserts AP_CLAIM
+2. Waits a little bit for the other side to notice (slew time, say 10
+microseconds)
+3. Checks EC_CLAIM. If this is not asserted, then the AP has the bus, and
+we are done
+4. Otherwise, wait for a few milliseconds and see if EC_CLAIM is released
+5. If not, back off, release the claim and wait for a few more milliseconds
+6. Go back to 1 (until retry time has expired)
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index a7edf98..222a6da 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -39,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "i2c-core.h"
@@ -1256,6 +1258,59 @@ void i2c_release_client(struct i2c_client *client)
 }
 EXPORT_SYMBOL(i2c_release_client);
 
+/**
+ * If we have enabled arbitration on this bus, claim the i2c bus, using
+ * the GPIO-based signalling protocol.
+ */
+static int i2c_bus_claim(struct i2c_gpio_arbit *i2c_arbit)
+{
+   unsigned long stop_retry, stop_time;
+   unsigned int gpio;
+
+   /* Start a round of trying to claim the bus */
+   stop_time = jiffies + usecs_to_jiffies(i2c_arbit->wait_free_us) + 1;
+   do {
+   /* Indicate that we want to claim the bus */
+   gpio_set_value(i2c_arbit->arb_gpios[I2C_ARB_GPIO_AP], 0);
+   udelay(i2c_arbit->slew_delay_us);
+
+   /* Wait for the EC to release it */
+   stop_retry = jiffies +
+   usecs_to_jiffies(i2c_arbit->wait_retry_us) + 1;
+   while (time_before(jiffies, stop_retry)) {
+   gpio = i2c_arbit->arb_gpios[I2C_ARB_GPIO_EC];
+   if (gpio_get_value(gpio)) {
+   /* We got it, so return */
+   return 0;
+   }
+
+   usleep_range(50, 200);
+   }
+
+   /* It didn't release, so give up, wait, and try again */
+   gpio_set_value(i2c_arbit->arb_gpios[I2C_ARB_GPIO_AP], 1);
+
+
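
The claim algorithm described in the binding document above can be modelled in userspace with a simulated clock. This is only a sketch under assumptions: `toy_arbiter`, `toy_read_ec`, `toy_bus_claim` and `TOY_POLL_US` are hypothetical, the EC is simulated as holding its claim until `ec_busy_until_us`, and the back-off delay and the -1 return after the overall timeout are guesses, since the patch text is cut off before that point.

```c
#define TOY_POLL_US 100	/* simulated polling interval (assumption) */

/* Lines are active low: 0 = asserted (claiming), 1 = released. */
struct toy_arbiter {
	int ap_claim;		/* our output line */
	int ec_busy_until_us;	/* simulated EC holds its claim until then */
	int now_us;		/* simulated clock */
	int slew_delay_us;
	int wait_retry_us;
	int wait_free_us;
};

/* Simulated read of EC_CLAIM. */
static int toy_read_ec(const struct toy_arbiter *a)
{
	return a->now_us < a->ec_busy_until_us ? 0 : 1;
}

static int toy_bus_claim(struct toy_arbiter *a)
{
	int stop_time = a->now_us + a->wait_free_us;

	do {
		int stop_retry;

		/* Steps 1 and 2: assert AP_CLAIM, wait for the slew time. */
		a->ap_claim = 0;
		a->now_us += a->slew_delay_us;

		/* Steps 3 and 4: poll EC_CLAIM for one retry window. */
		stop_retry = a->now_us + a->wait_retry_us;
		while (a->now_us < stop_retry) {
			if (toy_read_ec(a))
				return 0;	/* EC is not claiming: bus is ours */
			a->now_us += TOY_POLL_US;
		}

		/* Step 5: back off, release our claim, wait (assumed delay). */
		a->ap_claim = 1;
		a->now_us += a->wait_retry_us;

		/* Step 6: try again until the overall timeout expires. */
	} while (a->now_us < stop_time);

	return -1;	/* gave up; AP_CLAIM is left released */
}
```

On success the function returns with AP_CLAIM still asserted, so the caller owns the bus until it releases the line; on failure the line is already released.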

[PATCH 2/2] i2c-s3c2410: Add GPIO based bus arbitration functionality

2012-12-13 Thread Naveen Krishna Chatradhi

Makes use of the generic functions in of_i2c.c to parse arbitration
timing information and GPIOs for arbitration.

Also uses devm_gpio_request() instead of gpio_request() and
removes the gpio_free() calls

Signed-off-by: Naveen Krishna Chatradhi 
---
 drivers/i2c/busses/i2c-s3c2410.c |   79 +++---
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c
index e93e7d6..d055cf8 100644
--- a/drivers/i2c/busses/i2c-s3c2410.c
+++ b/drivers/i2c/busses/i2c-s3c2410.c
@@ -855,54 +855,61 @@ static inline void s3c24xx_i2c_deregister_cpufreq(struct 
s3c24xx_i2c *i2c)
 #endif
 
 #ifdef CONFIG_OF
-static int s3c24xx_i2c_parse_dt_gpio(struct s3c24xx_i2c *i2c)
+static int of_i2c_parse_gpio(struct device *dev, const char *name,
+   int gpios[], size_t count, bool required)
 {
-   int idx, gpio, ret;
-
-   if (i2c->quirks & QUIRK_NO_GPIO)
-   return 0;
+   struct device_node *dn = dev->of_node;
+   int idx, gpio;
 
-   for (idx = 0; idx < 2; idx++) {
-   gpio = of_get_gpio(i2c->dev->of_node, idx);
+   for (idx = 0; idx < count; idx++) {
+   gpio = of_get_named_gpio(dn, name, idx);
if (!gpio_is_valid(gpio)) {
-   dev_err(i2c->dev, "invalid gpio[%d]: %d\n", idx, gpio);
-   goto free_gpio;
+   dev_dbg(dev, "invalid gpio[%d]: %d\n", idx, gpio);
+   if (idx || required) {
+   dev_err(dev, "invalid gpio[%d]: %d\n",
+   idx, gpio);
+   }
+   return -EINVAL;
}
-   i2c->gpios[idx] = gpio;
+   gpios[idx] = gpio;
 
-   ret = gpio_request(gpio, "i2c-bus");
-   if (ret) {
-   dev_err(i2c->dev, "gpio [%d] request failed\n", gpio);
-   goto free_gpio;
+   if (devm_gpio_request(dev, gpio, "i2c-bus")) {
+   dev_err(dev, "gpio [%d] request failed\n", gpio);
+   return -EINVAL;
}
}
return 0;
-
-free_gpio:
-   while (--idx >= 0)
-   gpio_free(i2c->gpios[idx]);
-   return -EINVAL;
 }
 
-static void s3c24xx_i2c_dt_gpio_free(struct s3c24xx_i2c *i2c)
+static int s3c24xx_i2c_parse_dt_gpio(struct s3c24xx_i2c *i2c)
 {
-   unsigned int idx;
+   int ret = 0;
 
if (i2c->quirks & QUIRK_NO_GPIO)
-   return;
+   goto out;
+
+   if (of_i2c_parse_gpio(i2c->dev, "gpios", i2c->gpios, 2, true)) {
+   ret = -EINVAL;
+   goto out;
+   }
 
-   for (idx = 0; idx < 2; idx++)
-   gpio_free(i2c->gpios[idx]);
+   if (i2c->adap.gpio_arbit) {
+   if (!of_i2c_parse_gpio(i2c->dev, "bus-arbitration-gpios",
+   i2c->adap.gpio_arbit->arb_gpios, I2C_ARB_GPIO_COUNT,
+   false)) {
+   dev_warn(i2c->dev, "GPIO-based arbitration enabled");
+   } else
+   ret = -EINVAL;
+   }
+
+ out:
+   return ret;
 }
 #else
 static int s3c24xx_i2c_parse_dt_gpio(struct s3c24xx_i2c *i2c)
 {
return 0;
 }
-
-static void s3c24xx_i2c_dt_gpio_free(struct s3c24xx_i2c *i2c)
-{
-}
 #endif
 
 /* s3c24xx_i2c_init
@@ -981,6 +988,7 @@ static int s3c24xx_i2c_probe(struct platform_device *pdev)
 {
struct s3c24xx_i2c *i2c;
struct s3c2410_platform_i2c *pdata = NULL;
+   struct i2c_gpio_arbit *arbit = NULL;
struct resource *res;
int ret;
 
@@ -1004,11 +1012,21 @@ static int s3c24xx_i2c_probe(struct platform_device 
*pdev)
goto err_noclk;
}
 
+   arbit = devm_kzalloc(&pdev->dev, sizeof(*arbit), GFP_KERNEL);
+   if (!arbit) {
+   ret = -ENOMEM;
+   goto err_noclk;
+   }
+
i2c->quirks = s3c24xx_get_device_quirks(pdev);
if (pdata)
memcpy(i2c->pdata, pdata, sizeof(*pdata));
-   else
+   else {
s3c24xx_i2c_parse_dt(pdev->dev.of_node, i2c);
+   /* Arbitration parameters */
+   i2c->adap.gpio_arbit = of_get_arbitrator_info(
+   pdev->dev.of_node, arbit);
+   }
 
strlcpy(i2c->adap.name, "s3c2410-i2c", sizeof(i2c->adap.name));
i2c->adap.owner   = THIS_MODULE;
@@ -1158,9 +1176,6 @@ static int s3c24xx_i2c_remove(struct platform_device 
*pdev)
clk_disable_unprepare(i2c->clk);
clk_put(i2c->clk);
 
-   if (pdev->dev.of_node && IS_ERR(i2c->pctrl))
-   s3c24xx_i2c_dt_gpio_free(i2c);
-
return 0;
 }
 
-- 
1.7.9.5


[PATCH 0/2] i2c: Implement generic gpio based bus arbitration

2012-12-13 Thread Naveen Krishna Chatradhi

This patchset adds
1. Support for generic gpio based i2c bus arbitration between 2 i2c Masters
Ex: between AP(exynos), device(EC).
2. Documentation and sample implementation in i2c-s3c2410 driver.

Naveen Krishna Chatradhi (2):
  i2c-core: Add gpio based bus arbitration implementation
  i2c-s3c2410: Add GPIO based bus arbitration functionality

 .../devicetree/bindings/i2c/arbitrator-i2c.txt |   56 ++
 drivers/i2c/busses/i2c-s3c2410.c   |   79 
 drivers/i2c/i2c-core.c |   67 +
 drivers/of/of_i2c.c|   27 +++
 include/linux/i2c.h|   17 +
 include/linux/of_i2c.h |2 +
 6 files changed, 216 insertions(+), 32 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/i2c/arbitrator-i2c.txt

-- 
1.7.9.5


[PATCH 08/08] clocksource: sh_cmt: Add CMT register layout comment

2012-12-13 Thread Magnus Damm

From: Magnus Damm 

Add a comment about different register layouts
supported by the CMT driver.

Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |   15 +++
 1 file changed, 15 insertions(+)

--- 0005/drivers/clocksource/sh_cmt.c
+++ work/drivers/clocksource/sh_cmt.c   2012-12-14 11:43:10.0 +0900
@@ -66,6 +66,21 @@ struct sh_cmt_priv {
unsigned long value);
 };
 
+/* Examples of supported CMT timer register layouts and I/O access widths:
+ *
+ * "16-bit counter and 16-bit control" as found on sh7263:
+ * CMSTR 0xfffec000 16-bit
+ * CMCSR 0xfffec002 16-bit
+ * CMCNT 0xfffec004 16-bit
+ * CMCOR 0xfffec006 16-bit
+ *
+ * "32-bit counter and 16-bit control" as found on sh7372, sh73a0, r8a7740:
+ * CMSTR 0xffca 16-bit
+ * CMCSR 0xffca0060 16-bit
+ * CMCNT 0xffca0064 32-bit
+ * CMCOR 0xffca0068 32-bit
+ */
+
 static unsigned long sh_cmt_read16(void __iomem *base, unsigned long offs)
 {
return ioread16(base + (offs << 1));

[PATCH] mm: Downgrade mmap_sem before locking or populating on mmap

2012-12-13 Thread Andy Lutomirski

This is a serious cause of mmap_sem contention.  MAP_POPULATE
and MCL_FUTURE, in particular, are disastrous in multithreaded programs.

Signed-off-by: Andy Lutomirski 
---

Sensible people use anonymous mappings.  I write kernel patches :)

I'm not entirely thrilled by the aesthetics of this patch.  The MAP_POPULATE 
case
could also be improved by doing it without any lock at all.  This is still a big
improvement, though.
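
For context, the kind of userspace call that reaches the populate path can be
sketched as below. This is an illustration only and not part of the patch: the
helper name map_populated() is made up, and it uses an anonymous mapping purely
to stay self-contained.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_ANONYMOUS and MAP_POPULATE on Linux */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of private anonymous memory and ask
 * the kernel to prefault the pages as part of the mmap() call.
 * Returns NULL on failure instead of MAP_FAILED.
 */
static void *map_populated(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

In a multithreaded program every thread shares one mm, so other threads that
need mmap_sem wait while a call like this one populates its pages.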

 arch/tile/mm/elf.c |   9 ++--
 fs/aio.c   |   8 ++--
 include/linux/mm.h |   8 ++--
 ipc/shm.c  |   6 ++-
 mm/fremap.c|  10 ++--
 mm/mmap.c  | 133 +
 mm/util.c  |   3 +-
 7 files changed, 117 insertions(+), 60 deletions(-)

diff --git a/arch/tile/mm/elf.c b/arch/tile/mm/elf.c
index 3cfa98b..a0441f2 100644
--- a/arch/tile/mm/elf.c
+++ b/arch/tile/mm/elf.c
@@ -129,12 +129,13 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 */
if (!retval) {
unsigned long addr = MEM_USER_INTRPT;
-   addr = mmap_region(NULL, addr, INTRPT_SIZE,
-  MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
-  VM_READ|VM_EXEC|
-  VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
+   addr = mmap_region_unlock(NULL, addr, INTRPT_SIZE,
+ MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
if (addr > (unsigned long) -PAGE_SIZE)
retval = (int) addr;
+   return retval;  /* We already unlocked mmap_sem. */
}
 #endif
 
diff --git a/fs/aio.c b/fs/aio.c
index 71f613c..8e2b162 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -127,11 +127,10 @@ static int aio_setup_ring(struct kioctx *ctx)
info->mmap_size = nr_pages * PAGE_SIZE;
dprintk("attempting mmap of %lu bytes\n", info->mmap_size);
down_write(&ctx->mm->mmap_sem);
-   info->mmap_base = do_mmap_pgoff(NULL, 0, info->mmap_size, 
-   PROT_READ|PROT_WRITE,
-   MAP_ANONYMOUS|MAP_PRIVATE, 0);
+   info->mmap_base = do_mmap_pgoff_unlock(NULL, 0, info->mmap_size,
+  PROT_READ|PROT_WRITE,
+  MAP_ANONYMOUS|MAP_PRIVATE, 0);
if (IS_ERR((void *)info->mmap_base)) {
-   up_write(&ctx->mm->mmap_sem);
info->mmap_size = 0;
aio_free_ring(ctx);
return -EAGAIN;
@@ -141,7 +140,6 @@ static int aio_setup_ring(struct kioctx *ctx)
info->nr_pages = get_user_pages(current, ctx->mm,
info->mmap_base, nr_pages, 
1, 0, info->ring_pages, NULL);
-   up_write(&ctx->mm->mmap_sem);
 
if (unlikely(info->nr_pages != nr_pages)) {
aio_free_ring(ctx);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bcaab4e..bb13d11 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1441,12 +1441,12 @@ extern int install_special_mapping(struct mm_struct *mm,
 
 extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned 
long, unsigned long, unsigned long);
 
-extern unsigned long mmap_region(struct file *file, unsigned long addr,
+extern unsigned long mmap_region_unlock(struct file *file, unsigned long addr,
unsigned long len, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff);
-extern unsigned long do_mmap_pgoff(struct file *, unsigned long,
-unsigned long, unsigned long,
-unsigned long, unsigned long);
+extern unsigned long do_mmap_pgoff_unlock(struct file *, unsigned long addr,
+   unsigned long len, unsigned long prot,
+   unsigned long flags, unsigned long pgoff);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t);
 
 /* These take the mm semaphore themselves */
diff --git a/ipc/shm.c b/ipc/shm.c
index dff40c9..d0001c8 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1068,12 +1068,14 @@ long do_shmat(int shmid, char __user *shmaddr, int 
shmflg, ulong *raddr,
addr > current->mm->start_stack - size - PAGE_SIZE * 5)
goto invalid;
}
-   
-   user_addr = do_mmap_pgoff(file, addr, size, prot, flags, 0);
+
+   user_addr = do_mmap_pgoff_unlock(file, addr, size, prot, flags, 0);
*raddr = user_addr;
err = 0;
if (IS_ERR_VALUE(user_addr))
err = (long)user_addr;
+   goto out_fput;
+
 invalid:
up_write(&current->mm->mmap_sem);
 
diff --git a/mm/fremap.c b/mm/fremap.c
index a0aaf0e..232402c 100644
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -200,8 +200,8 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, 
unsigned long, size,

[PATCH 04/08] clocksource: sh_cmt: Introduce per-register functions

2012-12-13 Thread Magnus Damm

From: Magnus Damm 

Introduce sh_cmt_read_cmstr/cmcsr/cmcnt() and
sh_cmt_write_cmstr/cmcsr/cmcnt/cmcor() so that counter
registers can later be split from control registers, and
code complexity reduced by removing sh_cmt_read() and
sh_cmt_write().

Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |   71 --
 1 file changed, 55 insertions(+), 16 deletions(-)

--- 0001/drivers/clocksource/sh_cmt.c
+++ work/drivers/clocksource/sh_cmt.c   2012-12-14 10:50:15.0 +0900
@@ -86,6 +86,21 @@ static inline unsigned long sh_cmt_read(
return ioread16(base + offs);
 }
 
+static inline unsigned long sh_cmt_read_cmstr(struct sh_cmt_priv *p)
+{
+   return sh_cmt_read(p, CMSTR);
+}
+
+static inline unsigned long sh_cmt_read_cmcsr(struct sh_cmt_priv *p)
+{
+   return sh_cmt_read(p, CMCSR);
+}
+
+static inline unsigned long sh_cmt_read_cmcnt(struct sh_cmt_priv *p)
+{
+   return sh_cmt_read(p, CMCNT);
+}
+
 static inline void sh_cmt_write(struct sh_cmt_priv *p, int reg_nr,
unsigned long value)
 {
@@ -112,21 +127,45 @@ static inline void sh_cmt_write(struct s
iowrite16(value, base + offs);
 }
 
+static inline void sh_cmt_write_cmstr(struct sh_cmt_priv *p,
+ unsigned long value)
+{
+   sh_cmt_write(p, CMSTR, value);
+}
+
+static inline void sh_cmt_write_cmcsr(struct sh_cmt_priv *p,
+ unsigned long value)
+{
+   sh_cmt_write(p, CMCSR, value);
+}
+
+static inline void sh_cmt_write_cmcnt(struct sh_cmt_priv *p,
+ unsigned long value)
+{
+   sh_cmt_write(p, CMCNT, value);
+}
+
+static inline void sh_cmt_write_cmcor(struct sh_cmt_priv *p,
+ unsigned long value)
+{
+   sh_cmt_write(p, CMCOR, value);
+}
+
 static unsigned long sh_cmt_get_counter(struct sh_cmt_priv *p,
int *has_wrapped)
 {
unsigned long v1, v2, v3;
int o1, o2;
 
-   o1 = sh_cmt_read(p, CMCSR) & p->overflow_bit;
+   o1 = sh_cmt_read_cmcsr(p) & p->overflow_bit;
 
/* Make sure the timer value is stable. Stolen from acpi_pm.c */
do {
o2 = o1;
-   v1 = sh_cmt_read(p, CMCNT);
-   v2 = sh_cmt_read(p, CMCNT);
-   v3 = sh_cmt_read(p, CMCNT);
-   o1 = sh_cmt_read(p, CMCSR) & p->overflow_bit;
+   v1 = sh_cmt_read_cmcnt(p);
+   v2 = sh_cmt_read_cmcnt(p);
+   v3 = sh_cmt_read_cmcnt(p);
+   o1 = sh_cmt_read_cmcsr(p) & p->overflow_bit;
} while (unlikely((o1 != o2) || (v1 > v2 && v1 < v3)
  || (v2 > v3 && v2 < v1) || (v3 > v1 && v3 < v2)));
 
@@ -142,14 +181,14 @@ static void sh_cmt_start_stop_ch(struct
 
/* start stop register shared by multiple timer channels */
raw_spin_lock_irqsave(&sh_cmt_lock, flags);
-   value = sh_cmt_read(p, CMSTR);
+   value = sh_cmt_read_cmstr(p);
 
if (start)
value |= 1 << cfg->timer_bit;
else
value &= ~(1 << cfg->timer_bit);
 
-   sh_cmt_write(p, CMSTR, value);
+   sh_cmt_write_cmstr(p, value);
raw_spin_unlock_irqrestore(&sh_cmt_lock, flags);
 }
 
@@ -173,14 +212,14 @@ static int sh_cmt_enable(struct sh_cmt_p
/* configure channel, periodic mode and maximum timeout */
if (p->width == 16) {
*rate = clk_get_rate(p->clk) / 512;
-   sh_cmt_write(p, CMCSR, 0x43);
+   sh_cmt_write_cmcsr(p, 0x43);
} else {
*rate = clk_get_rate(p->clk) / 8;
-   sh_cmt_write(p, CMCSR, 0x01a4);
+   sh_cmt_write_cmcsr(p, 0x01a4);
}
 
-   sh_cmt_write(p, CMCOR, 0x);
-   sh_cmt_write(p, CMCNT, 0);
+   sh_cmt_write_cmcor(p, 0x);
+   sh_cmt_write_cmcnt(p, 0);
 
/*
 * According to the sh73a0 user's manual, as CMCNT can be operated
@@ -194,12 +233,12 @@ static int sh_cmt_enable(struct sh_cmt_p
 * take RCLKx2 at maximum.
 */
for (k = 0; k < 100; k++) {
-   if (!sh_cmt_read(p, CMCNT))
+   if (!sh_cmt_read_cmcnt(p))
break;
udelay(1);
}
 
-   if (sh_cmt_read(p, CMCNT)) {
+   if (sh_cmt_read_cmcnt(p)) {
dev_err(&p->pdev->dev, "cannot clear CMCNT\n");
ret = -ETIMEDOUT;
goto err1;
@@ -222,7 +261,7 @@ static void sh_cmt_disable(struct sh_cmt
sh_cmt_start_stop_ch(p, 0);
 
/* disable interrupts in CMT block */
-   sh_cmt_write(p, CMCSR, 0);
+   sh_cmt_write_cmcsr(p, 0);
 
/* stop clock */
clk_disable(p->clk);
@@ -270,7 +309,7 @@ static void sh_cmt_clock_event_program_v
if (new_match > p->max_match_value)
new_match = p->max_match_value;
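
The "make sure the timer value is stable" loop in sh_cmt_get_counter() can be
tried outside the kernel with a scripted counter. A minimal sketch, assuming
made-up helpers read_count() and read_overflow() in place of the real register
accessors; the sample values are invented to include one glitched read:

```c
#include <stddef.h>

/* Hypothetical scripted counter: each read returns the next sample. */
static const unsigned long samples[] = { 100, 5, 102, 103, 104, 105 };
static size_t next_sample;

static unsigned long read_count(void)
{
	unsigned long v = samples[next_sample];

	/* Stay on the last sample once the script runs out. */
	if (next_sample + 1 < sizeof(samples) / sizeof(samples[0]))
		next_sample++;
	return v;
}

/* Hypothetical overflow flag: never set in this sketch. */
static int read_overflow(void)
{
	return 0;
}

/*
 * Read the counter three times and retry whenever the overflow flag
 * changed or the three values are not consistently ordered.
 */
static unsigned long get_stable_count(void)
{
	unsigned long v1, v2, v3;
	int o1, o2;

	o1 = read_overflow();
	do {
		o2 = o1;
		v1 = read_count();
		v2 = read_count();
		v3 = read_count();
		o1 = read_overflow();
	} while (o1 != o2 || (v1 > v2 && v1 < v3)
		 || (v2 > v3 && v2 < v1) || (v3 > v1 && v3 < v2));

	return v2;
}
```

With the script above the first triple (100, 5, 102) is rejected and the second
triple (103, 104, 105) is accepted, so the middle value 104 is returned.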

[PATCH 05/08] clocksource: sh_cmt: CMSTR and CMCSR register access update

2012-12-13 Thread Magnus Damm

From: Magnus Damm 

Update hardware register access code for CMSTR and CMCSR
from using sh_cmt_read() and sh_cmt_write() to make use
of 16-bit register access functions such as sh_cmt_read16()
and sh_cmt_write16(). Also update sh_cmt_read() and
sh_cmt_write() now that the special cases are gone.

This patch moves us one step closer to the goal of separating
counter register access functions from control register
functions.

Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |   66 +++---
 1 file changed, 30 insertions(+), 36 deletions(-)

--- 0005/drivers/clocksource/sh_cmt.c
+++ work/drivers/clocksource/sh_cmt.c   2012-12-14 12:58:01.0 +0900
@@ -56,44 +56,46 @@ struct sh_cmt_priv {
bool cs_enabled;
 };
 
-static DEFINE_RAW_SPINLOCK(sh_cmt_lock);
+static inline unsigned long sh_cmt_read16(void __iomem *base,
+ unsigned long offs)
+{
+   return ioread16(base + (offs << 1));
+}
+
+static inline void sh_cmt_write16(void __iomem *base, unsigned long offs,
+ unsigned long value)
+{
+   iowrite16(value, base + (offs << 1));
+}
 
-#define CMSTR -1 /* shared register */
 #define CMCSR 0 /* channel register */
 #define CMCNT 1 /* channel register */
 #define CMCOR 2 /* channel register */
 
 static inline unsigned long sh_cmt_read(struct sh_cmt_priv *p, int reg_nr)
 {
-   struct sh_timer_config *cfg = p->pdev->dev.platform_data;
void __iomem *base = p->mapbase;
-   unsigned long offs;
+   unsigned long offs = reg_nr;
 
-   if (reg_nr == CMSTR) {
-   offs = 0;
-   base -= cfg->channel_offset;
-   } else
-   offs = reg_nr;
-
-   if (p->width == 16)
+   if (p->width == 16) {
offs <<= 1;
-   else {
+   return ioread16(base + offs);
+   } else {
offs <<= 2;
-   if ((reg_nr == CMCNT) || (reg_nr == CMCOR))
-   return ioread32(base + offs);
+   return ioread32(base + offs);
}
-
-   return ioread16(base + offs);
 }
 
 static inline unsigned long sh_cmt_read_cmstr(struct sh_cmt_priv *p)
 {
-   return sh_cmt_read(p, CMSTR);
+   struct sh_timer_config *cfg = p->pdev->dev.platform_data;
+
+   return sh_cmt_read16(p->mapbase - cfg->channel_offset, 0);
 }
 
 static inline unsigned long sh_cmt_read_cmcsr(struct sh_cmt_priv *p)
 {
-   return sh_cmt_read(p, CMCSR);
+   return sh_cmt_read16(p->mapbase, CMCSR);
 }
 
 static inline unsigned long sh_cmt_read_cmcnt(struct sh_cmt_priv *p)
@@ -104,39 +106,30 @@ static inline unsigned long sh_cmt_read_
 static inline void sh_cmt_write(struct sh_cmt_priv *p, int reg_nr,
unsigned long value)
 {
-   struct sh_timer_config *cfg = p->pdev->dev.platform_data;
void __iomem *base = p->mapbase;
-   unsigned long offs;
-
-   if (reg_nr == CMSTR) {
-   offs = 0;
-   base -= cfg->channel_offset;
-   } else
-   offs = reg_nr;
+   unsigned long offs = reg_nr;
 
-   if (p->width == 16)
+   if (p->width == 16) {
offs <<= 1;
-   else {
+   iowrite16(value, base + offs);
+   } else {
offs <<= 2;
-   if ((reg_nr == CMCNT) || (reg_nr == CMCOR)) {
-   iowrite32(value, base + offs);
-   return;
-   }
+   iowrite32(value, base + offs);
}
-
-   iowrite16(value, base + offs);
 }
 
 static inline void sh_cmt_write_cmstr(struct sh_cmt_priv *p,
  unsigned long value)
 {
-   sh_cmt_write(p, CMSTR, value);
+   struct sh_timer_config *cfg = p->pdev->dev.platform_data;
+
+   sh_cmt_write16(p->mapbase - cfg->channel_offset, 0, value);
 }
 
 static inline void sh_cmt_write_cmcsr(struct sh_cmt_priv *p,
  unsigned long value)
 {
-   sh_cmt_write(p, CMCSR, value);
+   sh_cmt_write16(p->mapbase, CMCSR, value);
 }
 
 static inline void sh_cmt_write_cmcnt(struct sh_cmt_priv *p,
@@ -173,6 +166,7 @@ static unsigned long sh_cmt_get_counter(
return v2;
 }
 
+static DEFINE_RAW_SPINLOCK(sh_cmt_lock);
 
 static void sh_cmt_start_stop_ch(struct sh_cmt_priv *p, int start)
 {

[PATCH 06/08] clocksource: sh_cmt: CMCNT and CMCOR register access update

2012-12-13 Thread Magnus Damm

From: Magnus Damm 

Break out the CMCNT and CMCOR register access code
into separate 16-bit and 32-bit functions that are
hooked into callbacks at init time. This reduces
the amount of software calculations happening at
runtime.

Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |   62 +-
 1 file changed, 26 insertions(+), 36 deletions(-)

--- 0006/drivers/clocksource/sh_cmt.c
+++ work/drivers/clocksource/sh_cmt.c   2012-12-14 13:00:20.0 +0900
@@ -54,38 +54,39 @@ struct sh_cmt_priv {
struct clocksource cs;
unsigned long total_cycles;
bool cs_enabled;
+
+   /* callbacks for CMCNT and CMCOR access */
+   unsigned long (*read_count)(void __iomem *base, unsigned long offs);
+   void (*write_count)(void __iomem *base, unsigned long offs,
+   unsigned long value);
 };
 
-static inline unsigned long sh_cmt_read16(void __iomem *base,
- unsigned long offs)
+static unsigned long sh_cmt_read16(void __iomem *base, unsigned long offs)
 {
return ioread16(base + (offs << 1));
 }
 
-static inline void sh_cmt_write16(void __iomem *base, unsigned long offs,
- unsigned long value)
+static unsigned long sh_cmt_read32(void __iomem *base, unsigned long offs)
+{
+   return ioread32(base + (offs << 2));
+}
+
+static void sh_cmt_write16(void __iomem *base, unsigned long offs,
+  unsigned long value)
 {
iowrite16(value, base + (offs << 1));
 }
 
+static void sh_cmt_write32(void __iomem *base, unsigned long offs,
+  unsigned long value)
+{
+   iowrite32(value, base + (offs << 2));
+}
+
 #define CMCSR 0 /* channel register */
 #define CMCNT 1 /* channel register */
 #define CMCOR 2 /* channel register */
 
-static inline unsigned long sh_cmt_read(struct sh_cmt_priv *p, int reg_nr)
-{
-   void __iomem *base = p->mapbase;
-   unsigned long offs = reg_nr;
-
-   if (p->width == 16) {
-   offs <<= 1;
-   return ioread16(base + offs);
-   } else {
-   offs <<= 2;
-   return ioread32(base + offs);
-   }
-}
-
 static inline unsigned long sh_cmt_read_cmstr(struct sh_cmt_priv *p)
 {
struct sh_timer_config *cfg = p->pdev->dev.platform_data;
@@ -100,22 +101,7 @@ static inline unsigned long sh_cmt_read_
 
 static inline unsigned long sh_cmt_read_cmcnt(struct sh_cmt_priv *p)
 {
-   return sh_cmt_read(p, CMCNT);
-}
-
-static inline void sh_cmt_write(struct sh_cmt_priv *p, int reg_nr,
-   unsigned long value)
-{
-   void __iomem *base = p->mapbase;
-   unsigned long offs = reg_nr;
-
-   if (p->width == 16) {
-   offs <<= 1;
-   iowrite16(value, base + offs);
-   } else {
-   offs <<= 2;
-   iowrite32(value, base + offs);
-   }
+   return p->read_count(p->mapbase, CMCNT);
 }
 
 static inline void sh_cmt_write_cmstr(struct sh_cmt_priv *p,
@@ -135,13 +121,13 @@ static inline void sh_cmt_write_cmcsr(st
 static inline void sh_cmt_write_cmcnt(struct sh_cmt_priv *p,
  unsigned long value)
 {
-   sh_cmt_write(p, CMCNT, value);
+   p->write_count(p->mapbase, CMCNT, value);
 }
 
 static inline void sh_cmt_write_cmcor(struct sh_cmt_priv *p,
  unsigned long value)
 {
-   sh_cmt_write(p, CMCOR, value);
+   p->write_count(p->mapbase, CMCOR, value);
 }
 
 static unsigned long sh_cmt_get_counter(struct sh_cmt_priv *p,
@@ -718,10 +704,14 @@ static int sh_cmt_setup(struct sh_cmt_pr
 
if (resource_size(res) == 6) {
p->width = 16;
+   p->read_count = sh_cmt_read16;
+   p->write_count = sh_cmt_write16;
p->overflow_bit = 0x80;
p->clear_bits = ~0x80;
} else {
p->width = 32;
+   p->read_count = sh_cmt_read32;
+   p->write_count = sh_cmt_write32;
p->overflow_bit = 0x8000;
p->clear_bits = ~0xc000;
}
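
The idea of picking the access width once at setup time can be sketched in
plain C. This is an assumption-laden illustration, not the driver code:
struct cmt_sketch and the sketch_* helpers are made up, and an ordinary byte
buffer stands in for the ioremapped register window.

```c
#include <stdint.h>
#include <string.h>

#define CMCSR 0 /* channel register */
#define CMCNT 1 /* channel register */
#define CMCOR 2 /* channel register */

/* Hypothetical stand-in for struct sh_cmt_priv; base is plain memory. */
struct cmt_sketch {
	unsigned char *base;
	unsigned long (*read_count)(unsigned char *base, unsigned long offs);
	void (*write_count)(unsigned char *base, unsigned long offs,
			    unsigned long value);
};

static unsigned long sketch_read16(unsigned char *base, unsigned long offs)
{
	uint16_t v;

	memcpy(&v, base + (offs << 1), sizeof(v));	/* index * 2 bytes */
	return v;
}

static unsigned long sketch_read32(unsigned char *base, unsigned long offs)
{
	uint32_t v;

	memcpy(&v, base + (offs << 2), sizeof(v));	/* index * 4 bytes */
	return v;
}

static void sketch_write16(unsigned char *base, unsigned long offs,
			   unsigned long value)
{
	uint16_t v = (uint16_t)value;

	memcpy(base + (offs << 1), &v, sizeof(v));
}

static void sketch_write32(unsigned char *base, unsigned long offs,
			   unsigned long value)
{
	uint32_t v = (uint32_t)value;

	memcpy(base + (offs << 2), &v, sizeof(v));
}

/* Choose the accessors once; callers never test the width again. */
static void cmt_sketch_setup(struct cmt_sketch *p, unsigned char *base,
			     int width)
{
	p->base = base;
	if (width == 16) {
		p->read_count = sketch_read16;
		p->write_count = sketch_write16;
	} else {
		p->read_count = sketch_read32;
		p->write_count = sketch_write32;
	}
}
```

After setup, a call such as p->write_count(p->base, CMCOR, value) resolves to
the 16-bit or 32-bit variant without any per-access width check.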

[PATCH 07/08] clocksource: sh_cmt: Add control register callbacks

2012-12-13 Thread Magnus Damm

From: Magnus Damm 

This patch adds control register callbacks for the CMT
driver. At this point only 16-bit access is supported
but in the future this will be updated to allow 32-bit
access as well.

Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |   16 
 1 file changed, 12 insertions(+), 4 deletions(-)

--- 0004/drivers/clocksource/sh_cmt.c
+++ work/drivers/clocksource/sh_cmt.c   2012-12-14 11:28:57.0 +0900
@@ -55,6 +55,11 @@ struct sh_cmt_priv {
unsigned long total_cycles;
bool cs_enabled;
 
+   /* callbacks for CMSTR and CMCSR access */
+   unsigned long (*read_control)(void __iomem *base, unsigned long offs);
+   void (*write_control)(void __iomem *base, unsigned long offs,
+ unsigned long value);
+
/* callbacks for CMCNT and CMCOR access */
unsigned long (*read_count)(void __iomem *base, unsigned long offs);
void (*write_count)(void __iomem *base, unsigned long offs,
@@ -94,12 +99,12 @@ static inline unsigned long sh_cmt_read_
 {
struct sh_timer_config *cfg = p->pdev->dev.platform_data;
 
-   return sh_cmt_read16(p->mapbase - cfg->channel_offset, 0);
+   return p->read_control(p->mapbase - cfg->channel_offset, 0);
 }
 
 static inline unsigned long sh_cmt_read_cmcsr(struct sh_cmt_priv *p)
 {
-   return sh_cmt_read16(p->mapbase, CMCSR);
+   return p->read_control(p->mapbase, CMCSR);
 }
 
 static inline unsigned long sh_cmt_read_cmcnt(struct sh_cmt_priv *p)
@@ -112,13 +117,13 @@ static inline void sh_cmt_write_cmstr(st
 {
struct sh_timer_config *cfg = p->pdev->dev.platform_data;
 
-   sh_cmt_write16(p->mapbase - cfg->channel_offset, 0, value);
+   p->write_control(p->mapbase - cfg->channel_offset, 0, value);
 }
 
 static inline void sh_cmt_write_cmcsr(struct sh_cmt_priv *p,
  unsigned long value)
 {
-   sh_cmt_write16(p->mapbase, CMCSR, value);
+   p->write_control(p->mapbase, CMCSR, value);
 }
 
 static inline void sh_cmt_write_cmcnt(struct sh_cmt_priv *p,
@@ -714,6 +719,9 @@ static int sh_cmt_setup(struct sh_cmt_pr
goto err1;
}
 
+   p->read_control = sh_cmt_read16;
+   p->write_control = sh_cmt_write16;
+
if (resource_size(res) == 6) {
p->width = 16;
p->read_count = sh_cmt_read16;

[PATCH 03/08] clocksource: sh_cmt: Consolidate platform_set_drvdata() call

2012-12-13 Thread Magnus Damm

From: Magnus Damm 

Clean up the use of platform_set_drvdata() to reduce code size.

Signed-off-by: Shinya Kuribayashi 
Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- 0003/drivers/clocksource/sh_cmt.c
+++ work/drivers/clocksource/sh_cmt.c   2012-12-14 12:54:59.0 +0900
@@ -649,8 +649,6 @@ static int sh_cmt_setup(struct sh_cmt_pr
goto err0;
}
 
-   platform_set_drvdata(pdev, p);
-
res = platform_get_resource(p->pdev, IORESOURCE_MEM, 0);
if (!res) {
dev_err(&p->pdev->dev, "failed to get I/O memory\n");
@@ -718,6 +716,8 @@ static int sh_cmt_setup(struct sh_cmt_pr
goto err2;
}
 
+   platform_set_drvdata(pdev, p);
+
return 0;
 err2:
clk_put(p->clk);
@@ -753,7 +753,6 @@ static int __devinit sh_cmt_probe(struct
ret = sh_cmt_setup(p, pdev);
if (ret) {
kfree(p);
-   platform_set_drvdata(pdev, NULL);
pm_runtime_idle(&pdev->dev);
return ret;
}

[PATCH 02/08] clocksource: sh_cmt: Initialize 'max_match_value' and 'lock' in sh_cmt_setup()

2012-12-13 Thread Magnus Damm

From: Magnus Damm 

Move the setup of spinlock and max_match_value to sh_cmt_setup().
There's no need to defer those steps until sh_cmt_register().

Signed-off-by: Shinya Kuribayashi 
Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |   16 
 1 file changed, 8 insertions(+), 8 deletions(-)

--- 0002/drivers/clocksource/sh_cmt.c
+++ work/drivers/clocksource/sh_cmt.c   2012-12-14 12:52:35.0 +0900
@@ -625,14 +625,6 @@ static int sh_cmt_register(struct sh_cmt
   unsigned long clockevent_rating,
   unsigned long clocksource_rating)
 {
-   if (p->width == (sizeof(p->max_match_value) * 8))
-   p->max_match_value = ~0;
-   else
-   p->max_match_value = (1 << p->width) - 1;
-
-   p->match_value = p->max_match_value;
-   raw_spin_lock_init(&p->lock);
-
if (clockevent_rating)
sh_cmt_register_clockevent(p, name, clockevent_rating);
 
@@ -703,6 +695,14 @@ static int sh_cmt_setup(struct sh_cmt_pr
p->clear_bits = ~0xc000;
}
 
+   if (p->width == (sizeof(p->max_match_value) * 8))
+   p->max_match_value = ~0;
+   else
+   p->max_match_value = (1 << p->width) - 1;
+
+   p->match_value = p->max_match_value;
+   raw_spin_lock_init(&p->lock);
+
ret = sh_cmt_register(p, (char *)dev_name(&p->pdev->dev),
  cfg->clockevent_rating,
  cfg->clocksource_rating);
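
The reason the width check has a special case can be shown in isolation. A
minimal sketch, assuming a made-up helper max_match_for_width(); unlike the
driver it uses 1UL so the shift is done in the same type as the result:

```c
#include <limits.h>

/*
 * Hypothetical helper: largest value a counter of the given width can
 * hold. Shifting by the full width of the type is undefined in C, so
 * that case is handled separately with an all-ones value.
 */
static unsigned long max_match_for_width(unsigned int width)
{
	if (width == sizeof(unsigned long) * CHAR_BIT)
		return ~0UL;

	return (1UL << width) - 1;
}
```

A 16-bit counter yields 0xffff, and a counter as wide as unsigned long yields
ULONG_MAX.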

[PATCH 00/08] clocksource: sh_cmt: CMT driver update

2012-12-13 Thread Magnus Damm

clocksource: sh_cmt: CMT driver update

[PATCH 01/08] clocksource: sh_cmt: Take care of clk_put() when setup_irq() fails
[PATCH 02/08] clocksource: sh_cmt: Initialize 'max_match_value' and 'lock' in 
sh_cmt_setup()
[PATCH 03/08] clocksource: sh_cmt: Consolidate platform_set_drvdata() call
[PATCH 04/08] clocksource: sh_cmt: Introduce per-register functions
[PATCH 05/08] clocksource: sh_cmt: CMSTR and CMCSR register access update
[PATCH 06/08] clocksource: sh_cmt: CMCNT and CMCOR register access update
[PATCH 07/08] clocksource: sh_cmt: Add control register callbacks
[PATCH 08/08] clocksource: sh_cmt: Add CMT register layout comment

This patch series contains a couple of driver updates from Kuribayashi-san
together with some register access changes from me. The register access
changes work towards adding support for 32-bit only CMT hardware, but
the final bits are not yet included in this series due to lack of hardware.

Patches 1-3:
Signed-off-by: Shinya Kuribayashi 
Signed-off-by: Magnus Damm 

Patches 4-8:
Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |  257 --
 1 file changed, 152 insertions(+), 105 deletions(-)



[PATCH 01/08] clocksource: sh_cmt: Take care of clk_put() when setup_irq() fails

2012-12-13 Thread Magnus Damm

From: Magnus Damm 

Make sure clk_put() is called in case of failure in sh_cmt_setup().

Signed-off-by: Shinya Kuribayashi 
Signed-off-by: Magnus Damm 
---

 drivers/clocksource/sh_cmt.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

--- 0001/drivers/clocksource/sh_cmt.c
+++ work/drivers/clocksource/sh_cmt.c   2012-12-14 12:50:02.0 +0900
@@ -708,17 +708,19 @@ static int sh_cmt_setup(struct sh_cmt_pr
  cfg->clocksource_rating);
if (ret) {
dev_err(&p->pdev->dev, "registration failed\n");
-   goto err1;
+   goto err2;
}
p->cs_enabled = false;
 
ret = setup_irq(irq, &p->irqaction);
if (ret) {
dev_err(&p->pdev->dev, "failed to request irq %d\n", irq);
-   goto err1;
+   goto err2;
}
 
return 0;
+err2:
+   clk_put(p->clk);
 
 err1:
iounmap(p->mapbase);
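
The label ordering this patch relies on (a later label releases the later
resource, then falls through to the earlier one) can be sketched with
counters. This is a made-up illustration: sketch_setup() and the two counters
only stand in for ioremap()/iounmap() and clk_get()/clk_put().

```c
/* Hypothetical resource counters: nonzero means "still held". */
static int mapped;
static int clocks;

/*
 * Acquire two resources in order. A failure after both are held jumps
 * to err2, which releases the clock and then falls through to err1 to
 * release the mapping.
 */
static int sketch_setup(int fail_clk, int fail_irq)
{
	int ret;

	mapped++;			/* stands in for ioremap() */

	ret = fail_clk ? -1 : 0;	/* stands in for clk_get() */
	if (ret)
		goto err1;
	clocks++;

	ret = fail_irq ? -1 : 0;	/* stands in for setup_irq() */
	if (ret)
		goto err2;

	return 0;
err2:
	clocks--;			/* stands in for clk_put() */
err1:
	mapped--;			/* stands in for iounmap() */
	return ret;
}
```

Jumping to err1 after the clock has been taken would leave clocks at 1, which
is the kind of leak the change from err1 to err2 avoids.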

percpu allocation failures in kvm

2012-12-13 Thread Andy Lutomirski

On 3.7.0 + irrelevant patches, I get this on boot.  I've seen it on
and off on earlier kernels, I think (although I'm not currently
getting it on 3.5).

[   10.230054] PERCPU: allocation failed, size=304 align=32, alloc
from reserved chunk failed
[   10.230059] Pid: 1026, comm: modprobe Tainted: GW3.7.0-ama+ #5
[   10.230060] Call Trace:
[   10.230070]  [] pcpu_alloc+0x9db/0xa40
[   10.230074]  [] ? find_symbol_in_section+0x4d/0x140
[   10.230077]  [] ? finished_loading+0x50/0x50
[   10.230080]  [] ? each_symbol_section+0x30/0x70
[   10.230083]  [] ? find_symbol+0x31/0x60
[   10.230086]  [] __alloc_reserved_percpu+0x13/0x20
[   10.230089]  [] load_module+0x3ed/0x1b50
[   10.230093]  [] ? __srcu_read_unlock+0x4b/0x70

--Andy

Re: [PATCH v2] f2fs: fix up f2fs_get_parent issue to retrieve correct parent inode number

2012-12-13 Thread Namjae Jeon

2012/12/14, Jaegeuk Kim :
> Hi,
>
>> diff --git a/fs/f2fs/hash.c b/fs/f2fs/hash.c
>> index a60f042..5e48bac 100644
>> --- a/fs/f2fs/hash.c
>> +++ b/fs/f2fs/hash.c
>> @@ -76,6 +76,10 @@ f2fs_hash_t f2fs_dentry_hash(const char *name, int
>> len)
>>  const char *p;
>>  __u32 in[8], buf[4];
>>
>> +if ((len <= 2) && (name[0] == '.') &&
>> +(name[1] == '.' || name[1] == '\0'))
>> +return 0;
>
> If len == 1, we should avoid referencing name[1].
> Likewise VFS does, I rewrote that like below.
>
>   if (name[0] == '.') {
>   switch (len) {
>   case 1:
>   return 0;
>   case 2:
>   if (name[1] == '.')
>   return 0;
>   }
>   }
>
> So, how about this patch?

I think there is no issue with the current patch, since the strings
are always expected to be NULL terminated.

"." should include '\0', so we can distinguish by checking only name[0] and name[1].

When we do:
char *ptr="hello"; -> it will always be NULL terminated -> "hello" in
memory followed by '\0';
when we reserve space:
char ptr[5]; -> we need to reserve space for '\0' at the end.

Also, even if we do not encounter NULL at the end,
there should be no problem referencing the memory; we would just
get a garbage value from that reference.

Maybe I am not able to think of the code path which will cause the problem.
Is it possible that "." is not NULL terminated?

Otherwise, in my opinion, we can retain the suggested changes; there
is no need to introduce a switch case, even
though technically there are no issues in the code you suggested either.

So, final decision is yours.

Thanks.

>
> From 391d584afadfb177584a3a3e4a7ec97e1a674457 Mon Sep 17 00:00:00 2001
> From: Namjae Jeon 
> Date: Thu, 13 Dec 2012 23:44:11 +0900
> Subject: [PATCH] f2fs: fix up f2fs_get_parent issue to retrieve correct
> parent
>  inode number
>
> Test Case:
> [NFS Client]
> ls -lR .
>
> [NFS Server]
> while [ 1 ]
> do
> echo 3 > /proc/sys/vm/drop_caches
> done
>
> Error on NFS Client: "No such file or directory"
>
> When cache is dropped at the server, it results in lookup failure at the
> NFS client due to non-connection with the parent. The default path is it
> initiates a lookup by calculating the hash value for the name, even
> though
> the hash values stored on the disk for "." and ".." is maintained as
> zero,
> which results in failure from find_in_block due to not matching HASH
> values.
> Fix up, by using the correct hashing values for these entries.
>
> Signed-off-by: Namjae Jeon 
> Signed-off-by: Amit Sahrawat 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/dir.c  |  4 ++--
>  fs/f2fs/hash.c | 10 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index b4e24f3..e1f66df 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -540,13 +540,13 @@ int f2fs_make_empty(struct inode *inode, struct
> inode *parent)
>
>   de = _blk->dentry[0];
>   de->name_len = cpu_to_le16(1);
> - de->hash_code = 0;
> + de->hash_code = f2fs_dentry_hash(".", 1);
>   de->ino = cpu_to_le32(inode->i_ino);
>   memcpy(dentry_blk->filename[0], ".", 1);
>   set_de_type(de, inode);
>
>   de = _blk->dentry[1];
> - de->hash_code = 0;
> + de->hash_code = f2fs_dentry_hash("..", 2);
>   de->name_len = cpu_to_le16(2);
>   de->ino = cpu_to_le32(parent->i_ino);
>   memcpy(dentry_blk->filename[1], "..", 2);
> diff --git a/fs/f2fs/hash.c b/fs/f2fs/hash.c
> index a60f042..9d7a7c6 100644
> --- a/fs/f2fs/hash.c
> +++ b/fs/f2fs/hash.c
> @@ -76,6 +76,16 @@ f2fs_hash_t f2fs_dentry_hash(const char *name, int
> len)
>   const char *p;
>   __u32 in[8], buf[4];
>
> + if (name[0] == '.') {
> + switch (len) {
> + case 1:
> + return 0;
> + case 2:
> + if (name[1] == '.')
> + return 0;
> + }
> + }
> +
>   /* Initialize the default seed for the hash checksum functions */
>   buf[0] = 0x67452301;
>   buf[1] = 0xefcdab89;
> --
> 1.8.0.1.250.gb7973fb
>
> --
> Jaegeuk Kim
> Samsung
>

Re: [GIT PULL] user namespace and namespace infrastructure changes for 3.8

2012-12-13 Thread Andy Lutomirski

On Thu, Dec 13, 2012 at 8:11 PM, Eric W. Biederman
 wrote:
> Andy Lutomirski  writes:
>
>> One more issue: the requirement that both upper and lower uids (etc.)
>> in the maps are in order is rather limiting.  I have no objection if
>> you only require upper ids to be monotonic, but currently there's no
>> way to map root outside to uid n (for n > 0) and some nonroot user
>> outside to uid 0.
>
> There is.  You may set up to 5 (extents).  You just have to use a second
> extent for the non-contiguous bits.  My reader is lazy and you have to
> set all of the extents with a single write, so you may have missed the
> ability to set more than one extent.
>

If I'm wrong, I'll happily eat my words.  Both:

0 1 1
1 0 1

and

1 0 1
0 1 1

are rejected, unless I totally messed up.

--Andy
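
What the two extents above are meant to express can be sketched in userspace.
Each uid_map line has the form "inside-id outside-id count". The code below is
a hypothetical lookup with made-up names, not the kernel's implementation, and
it says nothing about which extent orderings the kernel accepts.

```c
#include <stddef.h>

/* One uid_map line: "first lower_first count". */
struct id_extent {
	unsigned int first;		/* first ID inside the namespace */
	unsigned int lower_first;	/* first ID outside the namespace */
	unsigned int count;		/* length of the range */
};

#define NO_ID ((unsigned int)-1)

/*
 * Hypothetical lookup: translate an ID inside the namespace to the ID
 * outside it by scanning the extents in order.
 */
static unsigned int sketch_map_id_down(const struct id_extent *ext, size_t n,
				       unsigned int id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (id >= ext[i].first && id - ext[i].first < ext[i].count)
			return ext[i].lower_first + (id - ext[i].first);
	}
	return NO_ID;
}
```

With the extents { 0, 1, 1 } and { 1, 0, 1 }, inside uid 0 maps to outside
uid 1 and inside uid 1 maps to outside uid 0, which is the swap described
above.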

Re: [PATCH v2] f2fs: fix up f2fs_get_parent issue to retrieve correct parent inode number

2012-12-13 Thread Jaegeuk Kim

Hi,

> diff --git a/fs/f2fs/hash.c b/fs/f2fs/hash.c
> index a60f042..5e48bac 100644
> --- a/fs/f2fs/hash.c
> +++ b/fs/f2fs/hash.c
> @@ -76,6 +76,10 @@ f2fs_hash_t f2fs_dentry_hash(const char *name, int len)
>   const char *p;
>   __u32 in[8], buf[4];
>  
> + if ((len <= 2) && (name[0] == '.') &&
> + (name[1] == '.' || name[1] == '\0'))
> + return 0;

If len == 1, we should avoid referencing name[1].
As the VFS does, I rewrote it as below.

	if (name[0] == '.') {
		switch (len) {
		case 1:
			return 0;
		case 2:
			if (name[1] == '.')
				return 0;
		}
	}
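
For reference, the length-guarded check above can be tried outside the kernel. This is only a user-space sketch: is_dot_or_dotdot() is a hypothetical helper name, not an f2fs function, and it assumes the name buffer may not be NUL-terminated, so name[1] is read only when len says it exists.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of f2fs): returns true only for the
 * names "." and "..". The buffer is described by (name, len) and need
 * not be NUL-terminated, so name[1] is touched only when len >= 2.
 */
static bool is_dot_or_dotdot(const char *name, size_t len)
{
	if (len == 0 || name[0] != '.')
		return false;
	if (len == 1)
		return true;
	return len == 2 && name[1] == '.';
}
```

A hash function could return 0 whenever this helper returns true, which matches the behaviour of the switch statement above.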

So, how about this patch?

From 391d584afadfb177584a3a3e4a7ec97e1a674457 Mon Sep 17 00:00:00 2001
From: Namjae Jeon 
Date: Thu, 13 Dec 2012 23:44:11 +0900
Subject: [PATCH] f2fs: fix up f2fs_get_parent issue to retrieve correct parent
 inode number

Test Case:
[NFS Client]
ls -lR .

[NFS Server]
while [ 1 ]
do
echo 3 > /proc/sys/vm/drop_caches
done

Error on NFS Client: "No such file or directory"

When the cache is dropped at the server, lookups fail at the NFS client
because the dentry cannot be reconnected to its parent. The default path
initiates a lookup by calculating the hash value for the name, even though
the hash values stored on disk for "." and ".." are maintained as zero, so
find_in_block fails because the hash values do not match.
Fix this by using the correct hash values for these entries.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/dir.c  |  4 ++--
 fs/f2fs/hash.c | 10 ++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index b4e24f3..e1f66df 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -540,13 +540,13 @@ int f2fs_make_empty(struct inode *inode, struct inode *parent)
 
 	de = &dentry_blk->dentry[0];
 	de->name_len = cpu_to_le16(1);
-	de->hash_code = 0;
+	de->hash_code = f2fs_dentry_hash(".", 1);
 	de->ino = cpu_to_le32(inode->i_ino);
 	memcpy(dentry_blk->filename[0], ".", 1);
 	set_de_type(de, inode);
 
 	de = &dentry_blk->dentry[1];
-	de->hash_code = 0;
+	de->hash_code = f2fs_dentry_hash("..", 2);
 	de->name_len = cpu_to_le16(2);
 	de->ino = cpu_to_le32(parent->i_ino);
 	memcpy(dentry_blk->filename[1], "..", 2);
diff --git a/fs/f2fs/hash.c b/fs/f2fs/hash.c
index a60f042..9d7a7c6 100644
--- a/fs/f2fs/hash.c
+++ b/fs/f2fs/hash.c
@@ -76,6 +76,16 @@ f2fs_hash_t f2fs_dentry_hash(const char *name, int len)
 	const char *p;
 	__u32 in[8], buf[4];
 
+	if (name[0] == '.') {
+		switch (len) {
+		case 1:
+			return 0;
+		case 2:
+			if (name[1] == '.')
+				return 0;
+		}
+	}
+
 	/* Initialize the default seed for the hash checksum functions */
 	buf[0] = 0x67452301;
 	buf[1] = 0xefcdab89;
-- 
1.8.0.1.250.gb7973fb

-- 
Jaegeuk Kim
Samsung



Re: [patch 2/8] mm: vmscan: disregard swappiness shortly before going OOM

2012-12-13 Thread Johannes Weiner

On Thu, Dec 13, 2012 at 10:25:43PM +, Satoru Moriya wrote:
> 
> On 12/13/2012 11:05 AM, Michal Hocko wrote:
> > On Thu 13-12-12 16:29:59, Michal Hocko wrote:
> >> On Thu 13-12-12 10:34:20, Mel Gorman wrote:
> >>> On Wed, Dec 12, 2012 at 04:43:34PM -0500, Johannes Weiner wrote:
> >>>> When a reclaim scanner is doing its final scan before giving up and
> >>>> there is swap space available, pay no attention to swappiness
> >>>> preference anymore.  Just swap.
> >>>>
> >>>> Note that this change won't make too big of a difference for general
> >>>> reclaim: anonymous pages are already force-scanned when there is
> >>>> only very little file cache left, and there very likely isn't when
> >>>> the reclaimer enters this final cycle.
> >>>>
> >>>> Signed-off-by: Johannes Weiner
> >>>
> >>> Ok, I see the motivation for your patch but is the block inside 
> >>> still wrong for what you want? After your patch the block looks like 
> >>> this
> >>>
> >>> 	if (sc->priority || noswap) {
> >>> 		scan >>= sc->priority;
> >>> 		if (!scan && force_scan)
> >>> 			scan = SWAP_CLUSTER_MAX;
> >>> 		scan = div64_u64(scan * fraction[file], denominator);
> >>> 	}
> >>>
> >>> if sc->priority == 0 and swappiness==0 then you enter this block but 
> >>> fraction[0] for anonymous pages will also be 0 and because of the 
> >>> ordering of statements there, scan will be
> >>>
> >>> scan = scan * 0 / denominator
> >>>
> >>> so you are still not reclaiming anonymous pages in the swappiness=0 
> >>> case. What did I miss?
> >>
> >> Yes, now that you have mentioned that I realized that it really 
> >> doesn't make any sense. fraction[0] is _always_ 0 for swappiness==0. 
> >> So we just made a bigger pressure on file LRUs. So this sounds like a 
> >> misuse of the swappiness. This all has been introduced with fe35004f 
> >> (mm: avoid swapping out with swappiness==0).
> >>
> >> I think that removing swappiness check make sense but I am not sure 
> >> it does what the changelog says. It should have said that checking 
> >> swappiness doesn't make any sense for small LRUs.
> >
> > Bahh, wait a moment. Now I remember why the check made sense 
> > especially for memcg.
> > It made "don't swap _at all_ for swappiness==0" for real - you are 
> > even willing to sacrifice OOM. Maybe this is OK for the global case 
> > because noswap would save you here (assuming that there is no swap if
> > somebody doesn't want to swap at all and swappiness doesn't play such 
> > a big role) but for memcg you really might want to prevent from 
> > swapping - not everybody has memcg swap extension enabled and swappiness is 
> > handy then.
> > So I am not sure this is actually what we want. Need to think about it.
> 
> I introduced swappiness check here with fe35004f because, in some
> cases, we prefer OOM to swap out pages to detect problems as soon
> as possible. Basically, we design the system not to swap out and
> so if it causes swapping, something goes wrong.

I might be missing something terribly obvious, but... why do you add
swap space to the system in the first place?  Or in case of cgroups,
why not set the memsw limit equal to the memory limit?
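
For readers following the quoted block, the ordering problem Mel describes can be reproduced with plain integer arithmetic. The following is a user-space sketch only: scaled_scan() is a hypothetical name, the value 32 for SWAP_CLUSTER_MAX is an assumption, and the real kernel code uses div64_u64() rather than the C division operator.

```c
#include <stdint.h>

#define SWAP_CLUSTER_MAX 32ULL	/* assumed value, for illustration only */

/*
 * Hypothetical stand-in for the quoted block: shift the scan target by
 * the priority, apply the force_scan floor, then scale by the LRU's
 * fraction. With fraction == 0 (the swappiness == 0 case for anonymous
 * pages) the result is 0 no matter what force_scan did before.
 */
static uint64_t scaled_scan(uint64_t scan, unsigned int priority,
			    int force_scan, uint64_t fraction,
			    uint64_t denominator)
{
	scan >>= priority;
	if (!scan && force_scan)
		scan = SWAP_CLUSTER_MAX;
	return scan * fraction / denominator;
}
```

Because the multiplication by the fraction comes last, the SWAP_CLUSTER_MAX floor cannot rescue the anonymous LRU when its fraction is zero, which is the point raised in the quoted discussion.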

Re: [PATCH v2 0/5] KVM: x86: improve reexecute_instruction

2012-12-13 Thread Xiao Guangrong

On 12/14/2012 06:54 AM, Marcelo Tosatti wrote:
> On Thu, Dec 13, 2012 at 04:05:55AM +0800, Xiao Guangrong wrote:
>> On 12/12/2012 07:36 AM, Marcelo Tosatti wrote:
>>> On Mon, Dec 10, 2012 at 05:11:35PM +0800, Xiao Guangrong wrote:
>>>> Changelog:
>>>> There are some changes from Marcelo and Gleb's review, thank you all!
>>>> - access indirect_shadow_pages in the protection of mmu-lock
>>>> - fix the issue when unhandleable instruction access on large page
>>>> - add a new test case for large page
>>>>
>>>> The current reexecute_instruction can not well detect the failed instruction
>>>> emulation. It allows guest to retry all the instructions except it accesses
>>>> on error pfn.
>>>>
>>>> For example, these cases can not be detected:
>>>> - for tdp used
>>>>   currently, it refused to retry all instructions. If nested npt is used, the
>>>>   emulation may be caused by shadow page, it can be fixed by unshadow the
>>>>   shadow page.
>>>>
>>>> - for shadow mmu
>>>>   some cases are nested-write-protect, for example, if the page we want to
>>>>   write is used as PDE but it chains to itself. Under this case, we should
>>>>   stop the emulation and report the case to userspace.
>>>>
>>>> There are two test cases based on kvm-unit-test can trigger a infinite loop on
>>>> current code (ept = 0), after this patchset, it can report the error to Qemu.
>>>>
>>>> Subject: [PATCH] access test: test unhandleable instruction
>>>>
>>>> Test the instruction which can not be handled by kvm
>>>>
>>>> Signed-off-by: Xiao Guangrong
>>>
>>> Please submit the test for inclusion. There should be some way to make
>>> it fail.. 
>>
>> Yes.
>>
>> But it is not easy. If the test cases run normally, kvm will report a error to Qemu
>> then Qemu will exit the vcpu thread after dumping the vcpu state.
>>
>> We need to do something to let guest can be aware that the error report is triggered.
>> I guess we can add a option in Qemu, say '-notify-guest' and allow Qemu to inject #GP
>> to guest with a special ERROR_CODE if error is reported.
>>
>>> program a timer interrupt and #GP? 
>>
>> Could you please explain the detail?
> 
> Before the instruction which writes continuously to the pagetable, arm
> say lapic timer. #GP on the interrupt handler and test with failure.

Sorry, I am confused about this. After Qemu exits due to KVM_EXIT_INTERNAL_ERROR,
the VM is stopped, so an interrupt cannot be injected into the guest. Or did I miss
something?




Re: DMAR and DRHD errors[DMAR:[fault reason 06] PTE Read access is not set] Vt-d & intel_iommu

2012-12-13 Thread Alex Williamson

On Fri, 2012-12-14 at 10:01 +0800, Jason Gao wrote:
> On Fri, Dec 14, 2012 at 12:23 AM, Alex Williamson
>  wrote:
> >
> > Device 03:00.0 is your raid controller:
> >
> > 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
> >
> > For some reason it's trying to read from ffe65000, ffe8a000, ffe89000,
> > ffe86000, ffe87000, ffe84000.  Those are in reserved memory regions, so
> > it's not reading an OS allocated buffer, which probably means it's some
> > kind of side-band communication with a management controller.  I'd guess
> > it's a BIOS bug and there should be an RMRR covering those accesses.
> > Thanks,
> 
> First of all, I want to know whether I can ignore these errors on the
> production server, and whether these errors may affect the system?

You'll have to make that call, the device is being blocked from reading
a memory address, we don't know what it's reading or why.

> By the way, when I removed "intel_iommu=on" from /etc/grub.conf, no
> DMAR related errors occurred

Of course.  One option you have is to use the iommu in passthrough mode
which allows host used devices unrestricted, identity mapped access to
the system while still offering PCI device assignment.  I wouldn't try
assigning device 3:00.0 though.  Add iommu=pt to enable this.

> It's a strange thing: three other Dell R710 servers with the same BIOS
> version v. 6.3.0 and the same kernel 2.6.32-279.14.1 on RHEL6u3 (CentOS 6u3)
> do not show these errors

Is the MegaRAID firmware and system management firmware the same as
well?  Thanks,

Alex


Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-13 Thread Mike Galbraith

On Thu, 2012-12-13 at 22:25 +0800, Alex Shi wrote: 
> On 12/13/2012 06:11 PM, Vincent Guittot wrote:
> > On 13 December 2012 03:17, Alex Shi  wrote:
> >> On 12/12/2012 09:31 PM, Vincent Guittot wrote:
> >>> During the creation of sched_domain, we define a pack buddy CPU for each CPU
> >>> when one is available. We want to pack at all levels where a group of CPU can
> >>> be power gated independently from others.
> >>> On a system that can't power gate a group of CPUs independently, the flag is
> >>> set at all sched_domain level and the buddy is set to -1. This is the default
> >>> behavior.
> >>> On a dual clusters / dual cores system which can power gate each core and
> >>> cluster independently, the buddy configuration will be :
> >>>
> >>>       | Cluster 0   | Cluster 1   |
> >>>       | CPU0 | CPU1 | CPU2 | CPU3 |
> >>> -----------------------------------
> >>> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
> >>>
> >>> Small tasks tend to slip out of the periodic load balance so the best place
> >>> to choose to migrate them is during their wake up. The decision is in O(1) as
> >>> we only check again one buddy CPU
> >>
> >> Just have a little worry about the scalability on a big machine, like on
> >> a 4 sockets NUMA machine * 8 cores * HT machine, the buddy cpu in whole
> >> system need care 64 LCPUs. and in your case cpu0 just care 4 LCPU. That
> >> is different on task distribution decision.
> > 
> > The buddy CPU should probably not be the same for all 64 LCPU it
> > depends on where it's worth packing small tasks
> 
> Do you have further ideas for buddy cpu on such example?
> > 
> > Which kind of sched_domain configuration have you for such system ?
> > and how many sched_domain level have you ?
> 
> it is general X86 domain configuration. with 4 levels,
> sibling/core/cpu/numa.

CPU is a bug that slipped into domain degeneration.  You should have
SIBLING/MC/NUMA (chasing that down is on todo).

-Mike


[PATCH V2] ndisc: Use more standard logging styles

2012-12-13 Thread Joe Perches

The logging style in this file is "baroque".

Convert indirect uses of net_<level>_ratelimited to
simply use net_<level>_ratelimited.

Add a nd_dbg macro and #define ND_DEBUG for the other
generally inactivated logging messages.

Make those inactivated messages emit only at KERN_DEBUG
instead of other levels.  Add "%s: " __func__ to all
these nd_dbg macros and remove the embedded function
names and prefixes.
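
As a rough illustration of the approach described above, here is a user-space sketch of the same macro shape. It assumes fprintf() as a stand-in for net_dbg_ratelimited(), relies on the GNU ##__VA_ARGS__ extension (gcc and clang), and parse_option() is a hypothetical caller, not a function from ndisc.c.

```c
#include <stdio.h>

/* #define ND_DEBUG */

#ifdef ND_DEBUG
/* Debug build: prefix every message with the calling function's name. */
#define nd_dbg(fmt, ...)						\
	fprintf(stderr, "%s: " fmt, __func__, ##__VA_ARGS__)
#else
/*
 * Normal build: the call sits behind if (0), so it emits nothing at
 * run time, but the compiler still checks the format string against
 * the arguments.
 */
#define nd_dbg(fmt, ...)						\
do {									\
	if (0)								\
		fprintf(stderr, "%s: " fmt, __func__, ##__VA_ARGS__);	\
} while (0)
#endif

/* Hypothetical caller, for illustration only. */
static int parse_option(int type)
{
	if (type > 255) {
		nd_dbg("ignored unsupported option; type=%d\n", type);
		return -1;
	}
	return 0;
}
```

The do { } while (0) wrapper keeps the disabled variant usable as a single statement, for example in an unbraced if/else.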

Signed-off-by: Joe Perches 
---
 net/ipv6/ndisc.c |  139 --
 1 files changed, 61 insertions(+), 78 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index f2a007b..de1c1f2 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -72,14 +72,19 @@
 #include 
 #include 
 
-/* Set to 3 to get tracing... */
-#define ND_DEBUG 1
+/* #define ND_DEBUG */
 
-#define ND_PRINTK(val, level, fmt, ...)\
+#ifdef ND_DEBUG
+#define nd_dbg(fmt, ...)   \
+   net_dbg_ratelimited("%s: " fmt, __func__, ##__VA_ARGS__)
+#else
+#define nd_dbg(fmt, ...)   \
 do {   \
-   if (val <= ND_DEBUG)\
-   net_##level##_ratelimited(fmt, ##__VA_ARGS__);  \
+   if (0)  \
+   net_dbg_ratelimited("%s: " fmt, \
+   __func__, ##__VA_ARGS__);   \
 } while (0)
+#endif
 
 static u32 ndisc_hash(const void *pkey,
  const struct net_device *dev,
@@ -220,9 +225,8 @@ struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
case ND_OPT_MTU:
case ND_OPT_REDIRECT_HDR:
if (ndopts->nd_opt_array[nd_opt->nd_opt_type]) {
-   ND_PRINTK(2, warn,
- "%s: duplicated ND6 option found: type=%d\n",
- __func__, nd_opt->nd_opt_type);
+   nd_dbg("duplicated ND6 option found: type=%d\n",
+  nd_opt->nd_opt_type);
} else {
ndopts->nd_opt_array[nd_opt->nd_opt_type] = nd_opt;
}
@@ -250,11 +254,9 @@ struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
 * to accommodate future extension to the
 * protocol.
 */
-   ND_PRINTK(2, notice,
- "%s: ignored unsupported option; 
type=%d, len=%d\n",
- __func__,
- nd_opt->nd_opt_type,
- nd_opt->nd_opt_len);
+   nd_dbg("ignored unsupported option; type=%d, 
len=%d\n",
+  nd_opt->nd_opt_type,
+  nd_opt->nd_opt_len);
}
}
opt_len -= l;
@@ -399,8 +401,8 @@ static struct sk_buff *ndisc_build_skb(struct net_device *dev,
   len + hlen + tlen),
  1, );
if (!skb) {
-   ND_PRINTK(0, err, "ND: %s failed to allocate an skb, err=%d\n",
- __func__, err);
+   net_err_ratelimited("ND: %s failed to allocate an skb, 
err=%d\n",
+   __func__, err);
return NULL;
}
 
@@ -629,9 +631,8 @@ static void ndisc_solicit(struct neighbour *neigh, struct sk_buff *skb)
 
if ((probes -= neigh->parms->ucast_probes) < 0) {
if (!(neigh->nud_state & NUD_VALID)) {
-   ND_PRINTK(1, dbg,
- "%s: trying to ucast probe in NUD_INVALID: 
%pI6\n",
- __func__, target);
+   net_dbg_ratelimited("%s: trying to ucast probe in 
NUD_INVALID: %pI6\n",
+   __func__, target);
}
ndisc_send_ns(dev, neigh, target, target, saddr);
} else if ((probes -= neigh->parms->app_probes) < 0) {
@@ -677,7 +678,7 @@ static void ndisc_recv_ns(struct sk_buff *skb)
int is_router = -1;
 
if (ipv6_addr_is_multicast(&msg->target)) {
-   ND_PRINTK(2, warn, "NS: multicast target address\n");
+   nd_dbg("multicast target address\n");
return;
}
 
@@ -690,20 +691,19 @@ static void ndisc_recv_ns(struct sk_buff *skb)
  daddr->s6_addr32[1] == htonl(0x) &&
  daddr->s6_addr32[2] == htonl(0x0001) &&
  daddr->s6_addr [12] == 0xff )) {
-   ND_PRINTK(2, warn, "NS: bad DAD packet (wrong

[PATCH] pwm: samsung: add missing s3c->pwm_id

2012-12-13 Thread Joonyoung Shim

The s3c->pwm_id is used to calculate offset of related register.

Signed-off-by: Joonyoung Shim 
---
 drivers/pwm/pwm-samsung.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pwm/pwm-samsung.c b/drivers/pwm/pwm-samsung.c
index 023a3be..b415102 100644
--- a/drivers/pwm/pwm-samsung.c
+++ b/drivers/pwm/pwm-samsung.c
@@ -220,6 +220,8 @@ static int s3c_pwm_probe(struct platform_device *pdev)
return -ENOMEM;
}
 
+   s3c->pwm_id = id;
+
/* calculate base of control bits in TCON */
s3c->tcon_base = id == 0 ? 0 : (id * 4) + 4;
s3c->chip.dev = &pdev->dev;
-- 
1.7.9.5


Re: [GIT PULL] user namespace and namespace infrastructure changes for 3.8

2012-12-13 Thread Eric W. Biederman

Andy Lutomirski  writes:

> On Thu, Dec 13, 2012 at 2:01 PM, Eric W. Biederman
>  wrote:
>> Andy Lutomirski  writes:
>>
>>> On 12/11/2012 01:17 PM, Eric W. Biederman wrote:
>
>> But please also note the difference between capable and ns_capable.  Any
>> security check that is capable() still requires privileges in the
>> initial user namespace.
>
> Huh?
>
> I'm talking about:
>
> clone with CLONE_NEWUSER
>  - child does unshare(CLONE_NEWPID)
>  - parent does setfd(child's pid namespace)
>
> Now the parent is running in the init userns with a different pid ns.
> Setuid binaries will work but will see the unexpected pid ns.  With
> mount namespaces, this would be Really Bad (tm).  With pid or ipc or
> net, it's less obviously dangerous, but I'm not convinced it's safe.

That isn't safe.  That is a sneaky bug in my tree that I overlooked. :(

> I sort of think that setns on a *non*-userns should require
> CAP_SYS_ADMIN in the current userns, at least if no_new_privs isn't
> set.

Yes.  CAP_SYS_ADMIN in your current user namespace should make
setns as safe as it currently is before my patches.

That is just a matter of adding a couple nsown_capable(CAP_SYS_ADMIN)
permission checks.

Right now I test for nsown_capable(CAP_SYS_CHROOT) for the mount
namespace, which is probably sufficient to prevent those kinds of
shenanigans but I am going to add a nsown_capable(CAP_SYS_ADMIN) for
good measure.

> A non-threaded process can have mm_users == 2 with CLONE_VM.  I'm not
> sure this is a problem, though.

No it isn't.  I said threads because they are the easy concept not
because that covers all possible corner cases.

>>> In any case, why are threads special here?
>>
>> You know I don't think I stopped to think about it.   The combination
>> of CLONE_NEWUSER and CLONE_THREAD has been denied since the first user
>> namespace support was merged in 2008.
>>
>> I do know that things can get really strange when you mix multiple
>> namespaces in a process.  tkill of your own threads will stop working.
>> Which access permissions should apply to files you mmap, file handles
>> you have open, the core dumper etc.
>>
>> We do allow setresuid per thread so we might be able to cope
>> with a process that mixes with user namespaces in different threads,
>> but I would want a close review of things before we allow that kind of
>> sharing.
>>
>
> Fair enough.
>
> I'd personally be more concerned about shared signal handlers than
> shared tgid, though.  The signal handler set has all kinds of weird
> things like session id.

CLONE_THREAD implies CLONE_VM and CLONE_SIGNAL,
and in practice mm_users > 1 protects against all of those cases.

So I was really thinking all of those cases.


>> (See the end.  A significant bug in cap_capable slipped in about
>>  3.5. cap_capable is only supposed to grant permissions to the owner
>>  of a user namespace if it is a child user namespace).
>
> [snipping lots of stuff]
>
> If the intended semantics of cap_capable are, indeed:
>
> If targ_ns is equals or is a descendent of cred->user_ns and the cap
> is effective, return true.  If targ_ns is owned by cred->euid and
> targ_ns is a descendent of cred->user_ns (but is not equal to
> cred->user_ns), then return true.  Else return false
>
> then I agree with you on almost everything you said.  I assumed that
> the actual semantics were intended.

Good.

>>> unshare has a bug.  This code:
>>
>> Interesting...
>>
>> Looking at it this is a very small misfeature.
>>
>> What is happening is that commit_creds is making the task
>> undumpable because we changed the set of capabilities in struct cred.
>>
>> This in turn results in pid_revalidate setting the owner of
>> /proc/self/uid_map to GLOBAL_ROOT_UID.
>>
>> From the outside the permissions on /proc/self/uid_map look like:
>> -rw-r--r-- 1 root root 0 2012-12-13 12:43 /proc/30530/uid_map
>>
>> Then since /proc/self/uid_map uses the default permission function
>> and the test program below is not run as root the read-write open
>> of uid_map fails.
>
> It's probably either worth fixing this or disabling unshare of userns.
>  This makes it hard to use.  IMO non-dumpable tasks should still be
> able to access the contents of /proc/self -- i.e. I'd call this a more
> general bug.
>
> But I'd also argue that unsharing userns shouldn't set non-dumpable --
> cap_permitted increased, but the new capabilities are still logically
> a subset of the old ones.

Agreed.  Setting dumpable is the bug.

I am going to sleep on it but the code in commit_creds should probably
read:

	/* dumpability changes */
	if (!uid_eq(old->euid, new->euid) ||
	    !gid_eq(old->egid, new->egid) ||
	    !uid_eq(old->fsuid, new->fsuid) ||
	    !gid_eq(old->fsgid, new->fsgid) ||
	    ((old->user_ns == new->user_ns) &&
	     !cap_issubset(new->cap_permitted, old->cap_permitted))) {
		if (task->mm)
			set_dumpable(task->mm, suid_dumpable);

Re: [PATCH 1/3] timekeeping: Add persistent_clock_exist flag

2012-12-13 Thread Jason Gunthorpe

On Fri, Dec 14, 2012 at 11:13:30AM +0800, Feng Tang wrote:

> > This seems like a great use of that hardware resource, and no doubt
> > those mach's also have a class RTC driver available talking to
> > different hardware.
> 
> Interesting to know this, thanks for the info. For the x86 desktop
> and mobile processors I've used, the read_persistent_clock and rtc
> are the same on-board device (always powered on), so I see much time
> related code executed twice, like init/suspend/resume if
> HCTOSYS config is enabled, that's why I came up with the patches.

Ah, I see, there is some duplication here, my earlier comments about
update_persistent_clock are not quite right, some places like PCs
stick a RTC driver and then continue to access the same hardware
directly outside the rtc driver context! That seems ugly :|

I see the PC CMOS rtc driver does not implement the set_mmss
operation, instead running that code through update_persistent_clock..
That seems like a cleanup waiting to happen.

Regarding your problem - IMHO, it would be fantastic if the class RTC
driver could be used instead of read_persistent_clock on PC.

John mentioned that read_persistent_clock had a requirement to work
with IRQs off - that seems like it would be easy to incorporate into
class rtc - for hardware that supports it (and PC is not the only RTC
HW that can do this) Is that the only reason it still exists on pc?

I have to feel the long term direction should be to remove
*_persistent_clock in favor of class RTC?

> > Maybe Feng would be better off adjusting read_persistent_clock to
> > return ENODEV in such cases??
> 
> For mach's without read_persistent_clock capability, there is already
> a weakly defined 

This is used for arch's without the functionality; mach's are arch
specific things. ARM provides a function pointer indirection for its
read_persistent_clock implementation.

Jason

Re: linux-next: some merging notes

2012-12-13 Thread Rusty Russell

Stephen Rothwell  writes:
> The virtio tree
> (git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux.git#virtio-next)
> has a conflict with the net-next tree
> (git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git#master)
> that requires the following extra fix up patch:
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 33d6f6f..8afe32d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -147,7 +147,7 @@ struct padded_vnet_hdr {
>   */
>  static int vq2txq(struct virtqueue *vq)
>  {
> - return (virtqueue_get_queue_index(vq) - 1) / 2;
> + return (vq->index - 1) / 2;
>  }
>  
>  static int txq2vq(int txq)
> @@ -157,7 +157,7 @@ static int txq2vq(int txq)
>  
>  static int vq2rxq(struct virtqueue *vq)
>  {
> - return virtqueue_get_queue_index(vq) / 2;
> + return vq->index / 2;
>  }

I had to fold in a fix to another commit, so I also altered the commit
which removed virtqueue_get_queue_index().  I'll remove it next time.

I'll give it a couple of days in -next to be sure.

Thanks,
Rusty.
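
For anyone reading the fix-up above without the driver at hand, the index arithmetic can be checked in user space. This is a sketch under assumptions: struct vq_sketch is a hypothetical stand-in for the single index field used here, and the bodies of txq2vq() and rxq2vq() are not shown in the quoted diff, so the even/odd layout below (receive queues on even indices, transmit queues on odd ones) is inferred from the two helpers that are quoted.

```c
/* Hypothetical stand-in for the one virtqueue field that matters here. */
struct vq_sketch {
	unsigned int index;
};

/* Transmit queues are assumed to sit on odd indices: 1, 3, 5, ... */
static int vq2txq(const struct vq_sketch *vq)
{
	return (int)((vq->index - 1) / 2);
}

static int txq2vq(int txq)
{
	return txq * 2 + 1;	/* assumed inverse of vq2txq() */
}

/* Receive queues are assumed to sit on even indices: 0, 2, 4, ... */
static int vq2rxq(const struct vq_sketch *vq)
{
	return (int)(vq->index / 2);
}

static int rxq2vq(int rxq)
{
	return rxq * 2;		/* assumed inverse of vq2rxq() */
}
```

Note that vq2txq() is only meaningful for odd indices; with index 0 the unsigned subtraction would wrap.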

[Suggestion] drivers/staging/tidspbridge: strcpy and strncpy, src length checking issue.

2012-12-13 Thread Chen Gang

Hello Omar Ramirez Luna:

  in drivers/staging/tidspbridge/rmgr/proc.c:

if strlen(drv_datap->base_img) == size, the check at line 397 passes.
size is the full length of exec_file (line 382, lines 468..469),
so strcpy causes an issue: it copies strlen(drv_datap->base_img) bytes plus the trailing '\0' (line 400).

strncpy also seems to have an issue: it needs to be bounded by size instead of strlen(iva_img) + 1 (lines 402..403).

  please help to check, thanks.

gchen.


 380 static int get_exec_file(struct cfg_devnode *dev_node_obj,
 381 struct dev_object *hdev_obj,
 382 u32 size, char *exec_file)
 383 {
 384 u8 dev_type;
 385 s32 len;
 386 struct drv_data *drv_datap = dev_get_drvdata(bridge);
 387 
 388 dev_get_dev_type(hdev_obj, (u8 *) _type);
 389 
 390 if (!exec_file)
 391 return -EFAULT;
 392 
 393 if (dev_type == DSP_UNIT) {
 394 if (!drv_datap || !drv_datap->base_img)
 395 return -EFAULT;
 396 
 397 if (strlen(drv_datap->base_img) > size)
 398 return -EINVAL;
 399 
 400 strcpy(exec_file, drv_datap->base_img);
 401 } else if (dev_type == IVA_UNIT && iva_img) {
 402 len = strlen(iva_img);
 403 strncpy(exec_file, iva_img, len + 1);
 404 } else {
 405 return -ENOENT;
 406 }
 407 
 408 return 0;
 409 }
 410 
 ...

 465 /* Get the default executable for this board... */
 466 dev_get_dev_type(hdev_obj, (u8 *) &dev_type);
 467 p_proc_object->processor_id = dev_type;
 468 status = get_exec_file(dev_node_obj, hdev_obj, 
sizeof(sz_exec_file),
 469sz_exec_file);
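
One possible shape for a fix, as a user-space sketch only: copy_exec_name() is a hypothetical helper, not a function in tidspbridge, and it assumes size is the full size of the destination buffer including room for the terminating '\0'.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst only if the string and its
 * terminating '\0' both fit in size bytes. A source of exactly size
 * characters is rejected, which is the case the "> size" check misses.
 */
static int copy_exec_name(char *dst, size_t size, const char *src)
{
	size_t len;

	if (!dst || !src)
		return -EFAULT;

	len = strlen(src);
	if (len >= size)
		return -EINVAL;

	memcpy(dst, src, len + 1);	/* len + 1 includes the '\0' */
	return 0;
}
```

Both the DSP_UNIT and the IVA_UNIT branches could go through the same bounded path, so the destination size is checked in one place.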


Re: [PATCH v2 5/5] KVM: x86: improve reexecute_instruction

2012-12-13 Thread Xiao Guangrong

On 12/14/2012 07:02 AM, Marcelo Tosatti wrote:

>>> Same comment as before: the only case where it should not attempt to 
>>> emulate is when there is a condition which makes it impossible to fix
>>> (the information is available to detect that condition).
>>>
>>> The earlier suggestion
>>>
>>> "How about recording the gfn number for shadow pages that have been
>>> shadowed in the current pagefault run?"
>>>
>>> Was about that.
>>
>> I think we can have a try. Is this change good to you, Marcelo?
>>
>> [eric@localhost kvm]$ git diff
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 01d7c2a..e3d0001 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -4359,24 +4359,34 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
>> return nr_mmu_pages;
>>  }
>>
>> -int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
>> +void kvm_mmu_get_sp_hierarchy(struct kvm_vcpu *vcpu, u64 addr,
>> + struct kvm_mmu_sp_hierarchy *hierarchy)
>>  {
>> struct kvm_shadow_walk_iterator iterator;
>> u64 spte;
>> -   int nr_sptes = 0;
>> +
>> +   hierarchy->max_level = hierarchy->nr_levels = 0;
>>
>> walk_shadow_page_lockless_begin(vcpu);
>> for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
>> -   sptes[iterator.level-1] = spte;
>> -   nr_sptes++;
>> +   struct kvm_mmu_page *sp =  page_header(__pa(iterator.sptep));
>> +
>> +   if (hierarchy->indirect_only && sp->role.direct)
>> +   break;
>> +
>> +   if (!hierarchy->max_level)
>> +   hierarchy->max_level = iterator.level;
>> +
>> +   hierarchy->shadow_gfns[iterator.level-1] = sp->gfn;
>> +   hierarchy->sptes[iterator.level-1] = spte;
>> +   hierarchy->nr_levels++;
>> +
>> if (!is_shadow_present_pte(spte))
>> break;
>> }
>> walk_shadow_page_lockless_end(vcpu);
>> -
>> -   return nr_sptes;
>>  }
> 
> Record gfns while shadowing in the vcpu struct, in a struct, along with cr2.
> Then validate 
> That way its guaranteed its not some other vcpu.

Okay, I will try this way. :)



Re: [PATCH 4/4] block: Optionally snapshot page contents to provide stable pages during write

2012-12-13 Thread Dave Chinner

On Thu, Dec 13, 2012 at 06:10:49PM -0800, Darrick J. Wong wrote:
> On Thu, Dec 13, 2012 at 05:48:06PM -0800, Andy Lutomirski wrote:
> > On 12/13/2012 12:08 AM, Darrick J. Wong wrote:
> > > Several complaints have been received regarding long file write latencies when
> > > memory pages must be held stable during writeback.  Since it might not be
> > > acceptable to stall programs for the entire duration of a page write (which may
> > > take many milliseconds even on good hardware), enable a second strategy wherein
> > > pages are snapshotted as part of submit_bio; the snapshot can be held stable
> > > while writes continue.
> > > 
> > > This provides a band-aid to provide stable page writes on jbd without needing
> > > to backport the fixed locking scheme in jbd2.  A mount option is added to ext4
> > > to allow administrators to enable it there.
> > 
> > I'm a bit confused as to what it has to do with ext3.  Wouldn't this be
> > useful as a mount option everywhere, though?
> 
> ext3 requires snapshots; the rest are ok with either strategy.
> 
> *If* snapshotting is generally liked, then yes I'll go redo it as a vfs mount
> option.

It's copying every single IO, right? If so, then please don't
propagate any further than is necessary to fix the broken
filesystems...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [RFC][PATCH] Fix cap_capable to only allow owners in the parent user namespace to have caps.

2012-12-13 Thread Eric W. Biederman

"Serge E. Hallyn"  writes:

> Quoting Eric W. Biederman (ebied...@xmission.com):
>> 
>> Andy Lutomirski pointed out that the current behavior of allowing the
>> owner of a user namespace to have all caps when that owner is not in a
>> parent user namespace is wrong.
>
> To make sure I understand right, the issue is when a uid is mapped
> into multiple namespaces.

Yes.

> i.e. uid 1000 in ns1 may own ns2, but uid 1000 in ns3 does not?

I am not certain of your example.

The simple case is:

init_user_ns:
 child_user_ns1 (owned by uid == 0 [in all user namespaces])
   child_user_ns2 (owned by uid == 0 [ in all user namespaces])


root (uid == 0) in child_user_ns2 has all rights over anything in
child_user_ns1.
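
The simple case above can be modelled in user space. What follows is only a sketch, not kernel code: user_ns_model, cred_model, cap_capable_old and cap_capable_fixed are names made up for illustration, kuids are plain integers, and a NULL parent stands in for init_user_ns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins, not the kernel's structures. */
struct user_ns_model {
	struct user_ns_model *parent;	/* NULL models init_user_ns */
	unsigned int owner;		/* kuid of the namespace owner */
};

struct cred_model {
	struct user_ns_model *user_ns;	/* namespace the task lives in */
	unsigned int euid;		/* effective kuid */
	bool cap_raised;		/* requested cap set in cap_effective? */
};

/* Old check: a matching owner kuid wins wherever the task lives. */
static int cap_capable_old(const struct cred_model *cred,
			   struct user_ns_model *targ_ns)
{
	for (;;) {
		if (targ_ns->parent != NULL && targ_ns->owner == cred->euid)
			return 0;
		if (targ_ns == cred->user_ns)
			return cred->cap_raised ? 0 : -EPERM;
		if (targ_ns->parent == NULL)
			return -EPERM;
		targ_ns = targ_ns->parent;
	}
}

/* Corrected check: the owner only counts if it lives in the parent. */
static int cap_capable_fixed(const struct cred_model *cred,
			     struct user_ns_model *targ_ns)
{
	for (;;) {
		if (targ_ns == cred->user_ns)
			return cred->cap_raised ? 0 : -EPERM;
		if (targ_ns->parent == NULL)
			return -EPERM;
		if (targ_ns->parent == cred->user_ns &&
		    targ_ns->owner == cred->euid)
			return 0;
		targ_ns = targ_ns->parent;
	}
}
```

With the three namespaces from the example, root living in child_user_ns2 passes the old check against child_user_ns1 and is refused by the corrected one, while root in the initial namespace still passes.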


Thank you for looking.

Eric

Re: [RFC][PATCH] Fix cap_capable to only allow owners in the parent user namespace to have caps.

2012-12-13 Thread Serge E. Hallyn

Quoting Eric W. Biederman (ebied...@xmission.com):
> 
> Andy Lutomirski pointed out that the current behavior of allowing the
> owner of a user namespace to have all caps when that owner is not in a
> parent user namespace is wrong.

To make sure I understand right, the issue is when a uid is mapped
into multiple namespaces, i.e. uid 1000 in ns1 may own ns2, but uid
1000 in ns3 does not?

> This is a bug introduced by the kuid conversion which made it possible
> for the owner of a user namespace to live in a child user namespace.  I
> goofed and totally missed this implication.
> 
> Serge, can you please take a look and see if my corrected cap_capable
> reads correctly to you?
> 
> Andy or anyone else that wants to give me a second eyeball and double
> check me on this I would appreciate it.
> 
> Signed-off-by: "Eric W. Biederman" 

Acked-by: Serge Hallyn 

> ---
> 
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 6dbae46..4639f44 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -70,37 +70,44 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
>   *
>   * NOTE WELL: cap_has_capability() cannot be used like the kernel's capable()
>   * and has_capability() functions.  That is, it has the reverse semantics:
>   * cap_has_capability() returns 0 when a task has a capability, but the
>   * kernel's capable() and has_capability() returns 1 for this case.
>   */
>  int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
>   int cap, int audit)
>  {
>   for (;;) {
> - /* The owner of the user namespace has all caps. */
> - if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner, cred->euid))
> - return 0;
> + struct user_namespace *parent_ns;
>  
>   /* Do we have the necessary capabilities? */
>   if (targ_ns == cred->user_ns)
>   return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
>  
>   /* Have we tried all of the parent namespaces? */
>   if (targ_ns == &init_user_ns)
>   return -EPERM;
>  
> + parent_ns = targ_ns->parent;
> +
> + /* 
> +  * The owner of the user namespace in the parent user
> +  * namespace has all caps.
> +  */
> + if ((parent_ns == cred->user_ns) && uid_eq(targ_ns->owner, cred->euid))
> + return 0;
> +
>   /*
> -  *If you have a capability in a parent user ns, then you have
> +  * If you have a capability in a parent user ns, then you have
>* it over all children user namespaces as well.
>*/
> - targ_ns = targ_ns->parent;
> + targ_ns = parent_ns;
>   }
>  
>   /* We never get here */
>  }
>  
>  /**
>   * cap_settime - Determine whether the current process may set the system clock
>   * @ts: The time to set
>   * @tz: The timezone to set
>   *

[PATCH] Fix cap_capable to only allow owners in the parent user namespace to have caps.

2012-12-13 Thread Eric W. Biederman


Andy Lutomirski pointed out that the current behavior of allowing the
owner of a user namespace to have all caps when that owner is not in a
parent user namespace is wrong.  Add a test to ensure the owner of a user
namespace is in the parent of the user namespace to fix this bug.

Thankfully this bug did not apply to the initial user namespace, keeping
the mischief that can be caused by this bug quite small.

This bug was introduced in v3.5 by commit 783291e6900
"Simplify the user_namespace by making userns->creator a kuid."

The bug made it possible for the owner of a user namespace to be
present in a child user namespace.  Since the owner of a user namespace
is granted all capabilities it became possible for users in a
grandchild user namespace to have all privileges over their parent user
namespace.

Reorder the checks in cap_capable.  This should make the common case
faster and make it clear that nothing magic happens in the initial
user namespace.  The reordering is safe because cred->user_ns
can only be in targ_ns or targ_ns->parent but not both.

Add a comment at the top of the loop to make the logic of
the code clear.

Cc: sta...@vger.kernel.org
Signed-off-by: "Eric W. Biederman" 
---
 security/commoncap.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 6dbae46..ded41a0 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -76,11 +76,12 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
 int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
int cap, int audit)
 {
-   for (;;) {
-   /* The owner of the user namespace has all caps. */
-   if (targ_ns != &init_user_ns && uid_eq(targ_ns->owner, cred->euid))
-   return 0;
 
+   /* See if cred has the capability in the target user namespace
+* by examining the target user namespace and all of the target
+* user namespace's parents.
+*/
+   for (;;) {
/* Do we have the necessary capabilities? */
if (targ_ns == cred->user_ns)
return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
@@ -89,8 +90,16 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
if (targ_ns == &init_user_ns)
return -EPERM;
 
+   /* 
+* The owner of the user namespace in the parent of the
+* user namespace has all caps.
+*/
+   if ((targ_ns->parent == cred->user_ns) &&
+uid_eq(targ_ns->owner, cred->euid))
+   return 0;
+
/*
-*If you have a capability in a parent user ns, then you have
+* If you have a capability in a parent user ns, then you have
 * it over all children user namespaces as well.
 */
targ_ns = targ_ns->parent;
-- 
1.7.5.4


[patch 2/2]block: add plug for blkdev_issue_discard

2012-12-13 Thread Shaohua Li

Last post of this patch appears lost, so I resend this.

Now that discard merging works, add a plug to blkdev_issue_discard. This helps
discard requests merge, especially in the raid0 case. In raid0, a big discard
request is split into small requests, and if the plug is added correctly, such
small requests can be merged in the lower layer.

Signed-off-by: Shaohua Li 
---
 block/blk-lib.c |3 +++
 1 file changed, 3 insertions(+)

Index: linux/block/blk-lib.c
===
--- linux.orig/block/blk-lib.c  2012-12-14 10:03:22.305422686 +0800
+++ linux/block/blk-lib.c   2012-12-14 10:08:50.577295672 +0800
@@ -48,6 +48,7 @@ int blkdev_issue_discard(struct block_de
struct bio_batch bb;
struct bio *bio;
int ret = 0;
+   struct blk_plug plug;
 
if (!q)
return -ENXIO;
@@ -82,6 +83,7 @@ int blkdev_issue_discard(struct block_de
bb.flags = 1 << BIO_UPTODATE;
bb.wait = &wait;
 
+   blk_start_plug(&plug);
while (nr_sects) {
unsigned int req_sects;
sector_t end_sect, tmp;
@@ -120,6 +122,7 @@ int blkdev_issue_discard(struct block_de
atomic_inc(&bb.done);
submit_bio(type, bio);
}
+   blk_finish_plug(&plug);
 
/* Wait for bios in-flight */
if (!atomic_dec_and_test(&bb.done))

[patch 1/2]block: discard granularity might not be power of 2

2012-12-13 Thread Shaohua Li

In the MD raid case, the discard granularity might not be a power of 2; for
example, a 4-disk raid5 has a discard granularity of 3*chunk_size. Correct the
calculation for such cases.
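
To see why the mask arithmetic breaks, here is a small user-space sketch. The helper names are invented for illustration and plain 64-bit division stands in for the kernel's sector_div(); it is not the patched code itself. With a granularity of 384 sectors (3 chunks of 128), `sector & (granularity - 1)` no longer equals the remainder, so the offset has to be computed with a real division.

```c
#include <stdint.h>

/* Offset within a granule, valid only when granularity is a power of 2. */
static uint64_t offset_by_mask(uint64_t sector, uint64_t granularity)
{
	return sector & (granularity - 1);
}

/* Offset within a granule for any non-zero granularity. */
static uint64_t offset_by_mod(uint64_t sector, uint64_t granularity)
{
	return sector % granularity;
}

/*
 * Round sector down to the previous point that is "alignment" sectors
 * past a granule boundary.  Assumes sector >= alignment and
 * alignment < granularity.
 */
static uint64_t round_down_aligned(uint64_t sector, uint64_t granularity,
				   uint64_t alignment)
{
	return (sector - alignment) / granularity * granularity + alignment;
}
```

For a power-of-2 granularity both helpers agree, which is why the old code worked on ordinary devices.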

Reported-by: Neil Brown 
Signed-off-by: Shaohua Li 
---
 block/blk-lib.c|   23 +--
 block/blk-settings.c   |6 +++---
 include/linux/blkdev.h |7 ---
 3 files changed, 20 insertions(+), 16 deletions(-)

Index: linux/block/blk-lib.c
===
--- linux.orig/block/blk-lib.c  2012-10-15 10:01:52.763544641 +0800
+++ linux/block/blk-lib.c   2012-12-14 08:56:24.539932760 +0800
@@ -43,8 +43,8 @@ int blkdev_issue_discard(struct block_de
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
-   unsigned int max_discard_sectors;
-   unsigned int granularity, alignment, mask;
+   sector_t max_discard_sectors;
+   sector_t granularity, alignment;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -57,15 +57,16 @@ int blkdev_issue_discard(struct block_de
 
/* Zero-sector (unknown) and one-sector granularities are the same.  */
granularity = max(q->limits.discard_granularity >> 9, 1U);
-   mask = granularity - 1;
-   alignment = (bdev_discard_alignment(bdev) >> 9) & mask;
+   alignment = bdev_discard_alignment(bdev) >> 9;
+   alignment = sector_div(alignment, granularity);
 
/*
 * Ensure that max_discard_sectors is of the proper
 * granularity, so that requests stay aligned after a split.
 */
max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-   max_discard_sectors = round_down(max_discard_sectors, granularity);
+   sector_div(max_discard_sectors, granularity);
+   max_discard_sectors *= granularity;
if (unlikely(!max_discard_sectors)) {
/* Avoid infinite loop below. Being cautious never hurts. */
return -EOPNOTSUPP;
@@ -83,7 +84,7 @@ int blkdev_issue_discard(struct block_de
 
while (nr_sects) {
unsigned int req_sects;
-   sector_t end_sect;
+   sector_t end_sect, tmp;
 
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
@@ -98,10 +99,12 @@ int blkdev_issue_discard(struct block_de
 * misaligned, stop the discard at the previous aligned sector.
 */
end_sect = sector + req_sects;
-   if (req_sects < nr_sects && (end_sect & mask) != alignment) {
-   end_sect =
-   round_down(end_sect - alignment, granularity)
-   + alignment;
+   tmp = end_sect;
+   if (req_sects < nr_sects &&
+   sector_div(tmp, granularity) != alignment) {
+   end_sect = end_sect - alignment;
+   sector_div(end_sect, granularity);
+   end_sect = end_sect * granularity + alignment;
req_sects = end_sect - sector;
}
 
Index: linux/block/blk-settings.c
===
--- linux.orig/block/blk-settings.c 2012-10-15 10:01:52.763544641 +0800
+++ linux/block/blk-settings.c  2012-12-14 09:53:18.493013557 +0800
@@ -611,7 +611,7 @@ int blk_stack_limits(struct queue_limits
bottom = b->discard_granularity + alignment;
 
/* Verify that top and bottom intervals line up */
-   if (max(top, bottom) & (min(top, bottom) - 1))
+   if ((max(top, bottom) % min(top, bottom)) != 0)
t->discard_misaligned = 1;
}
 
@@ -619,8 +619,8 @@ int blk_stack_limits(struct queue_limits
  b->max_discard_sectors);
t->discard_granularity = max(t->discard_granularity,
 b->discard_granularity);
-   t->discard_alignment = lcm(t->discard_alignment, alignment) &
-   (t->discard_granularity - 1);
+   t->discard_alignment = lcm(t->discard_alignment, alignment) %
+   t->discard_granularity;
}
 
return ret;
Index: linux/include/linux/blkdev.h
===
--- linux.orig/include/linux/blkdev.h   2012-10-15 10:01:52.999541673 +0800
+++ linux/include/linux/blkdev.h2012-12-13 14:26:25.469877308 +0800
@@ -1180,13 +1180,14 @@ static inline int queue_discard_alignmen
 
 static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector)
 {
-   unsigned int alignment = (sector << 9) & (lim->discard_granularity - 1);
+   sector_t alignment = sector << 9;
+

Re: [PATCH 1/3] timekeeping: Add persistent_clock_exist flag

2012-12-13 Thread Feng Tang

On Thu, Dec 13, 2012 at 07:38:26PM -0700, Jason Gunthorpe wrote:
> On Thu, Dec 13, 2012 at 06:00:23PM -0800, John Stultz wrote:
> 
> > So per Jason's related patch, he's made the point that the
> > persistent_clock and RTC class functionality are basically exclusive
> > (well, in his case, he said this with respect to updating the RTC,
> > not reading it - I don't mean to put words in his mouth - Please do
> > correct me here Jason. :).  In other words, we probably should avoid
> > configurations where both the rtc hctosys and persistent_clock
> > interfaces are both active.
> 
> I only studied update_persistent_clock, read_persistent_clock is
> very much different.
> 
> Looking at it, I don't think that update_persistent_clock is in any
> way related to read_persistent_clock..  update_persistent_clock is
> *only* called by NTP, and its *only* purpose is to update the RTC with
> NTP synchronized time. In many configurations it will never even be
> called.
> 
> I think update_persistent_clock is badly named, it should be called
> platform_save_ntp_time_to_rtc(), keep it divorced from
> read_persistent_clock :)
> 
> > make the HCTOSYS option be dependent on !HAS_PERSISTENT_CLOCK. This
> > way we avoid having configs where there are conflicting paths that
> > we chose from.
> 
> On ARM the read_persistent_clock is used to access a true monotonic
> counter that is divorced from the system RTC - look at
> arch/arm/plat-omap/counter_32k.c for instance.
> 
> This seems like a great use of that hardware resource, and no doubt
> those mach's also have a class RTC driver available talking to
> different hardware.

Interesting to know this, thanks for the info. For the x86 desktop
and mobile processors I've used, the read_persistent_clock and rtc
are the same on-board device (always powered on), so I see a lot of
time-related code executed twice, like init/suspend/resume, if the
HCTOSYS config is enabled; that's why I came up with the patches.

> 
> For mach's without that functionality ARM returns a fixed 0 value
> from read_persistent_clock, presumably the kernel detects this and
> falls back to using class rtc functions?
> 
> Maybe Feng would be better off adjusting read_persistent_clock to
> return ENODEV in such cases??

For machs without read_persistent_clock capability, there is already
a weakly defined

void __attribute__((weak)) read_persistent_clock(struct timespec *ts)
{
	ts->tv_sec = 0;
	ts->tv_nsec = 0;
}

so those machs can simply do nothing, and let the time core code judge it.
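
As a rough user-space illustration of that run-time judgement (a sketch only: persistent_clock_exists() is a hypothetical helper, and treating an all-zero timespec as "no persistent clock" is an assumption taken from this discussion, not a quote of the actual patch):

```c
#include <stdbool.h>
#include <time.h>

/* Weak default: a platform with a real clock would override this symbol. */
void __attribute__((weak)) read_persistent_clock(struct timespec *ts)
{
	ts->tv_sec = 0;
	ts->tv_nsec = 0;
}

/* Hypothetical helper: decide at run time whether a clock is present. */
static bool persistent_clock_exists(void)
{
	struct timespec ts;

	read_persistent_clock(&ts);
	/* Assumption: all zeroes means only the weak default answered. */
	return ts.tv_sec != 0 || ts.tv_nsec != 0;
}
```

Because nothing overrides the weak symbol in this sketch, the helper reports that no persistent clock exists; a single multi-platform image could make the same check at boot instead of relying on kconfig.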

> 
> So, I think you have to keep your test as a run time test. To support
> the single image ARM boot you can't make the distinction with kconfig.

Good point. Figuring out the kconfig for all arm platforms is very
challenging.

Thanks,
Feng

Re: [PATCH] mm: clean up soft_offline_page()

2012-12-13 Thread Xishi Qiu

On 2012/12/14 7:01, Naoya Horiguchi wrote:

> Hi,
> 
> I wrote this patch inspired by the discussion about fixing mce_bad_pages bug.
> https://lkml.org/lkml/2012/12/7/66
> As mentioned by Andrew, this bug seemed to be undetected because of the
> messiness of soft_offline_page(), so with this patch we can deal with the
> problem.
> 
> Thanks,
> Naoya
> ---
> From: Naoya Horiguchi 
> Date: Thu, 13 Dec 2012 16:08:54 -0500
> Subject: [PATCH] mm: clean up soft_offline_page()
> 
> Currently soft_offline_page() is hard to maintain because it has many
> return points and goto statements. All of this mess comes from get_any_page().
> This function should only get page refcount as the name implies, but it does
> some page isolating actions like SetPageHWPoison() and dequeuing hugepage.
> This patch corrects it and introduces some internal subroutines to make
> soft offlining code more readable and maintainable.
> 
> Signed-off-by: Naoya Horiguchi 
> ---
>  mm/memory-failure.c | 189 
> 
>  1 file changed, 101 insertions(+), 88 deletions(-)
> 
> diff --git v3.7.orig/mm/memory-failure.c v3.7/mm/memory-failure.c
> index 8b20278..8cef032 100644
> --- v3.7.orig/mm/memory-failure.c
> +++ v3.7/mm/memory-failure.c
> @@ -1368,7 +1368,7 @@ static struct page *new_page(struct page *p, unsigned long private, int **x)
>   * that is not free, and 1 for any other page type.
>   * For 1 the page is returned with increased page count, otherwise not.
>   */
> -static int get_any_page(struct page *p, unsigned long pfn, int flags)
> +static int __get_any_page(struct page *p, unsigned long pfn, int flags)
>  {
>   int ret;
>  
> @@ -1393,11 +1393,9 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
>   if (!get_page_unless_zero(compound_head(p))) {
>   if (PageHuge(p)) {
>   pr_info("%s: %#lx free huge page\n", __func__, pfn);
> - ret = dequeue_hwpoisoned_huge_page(compound_head(p));
> + ret = 0;
>   } else if (is_free_buddy_page(p)) {
>   pr_info("%s: %#lx free buddy page\n", __func__, pfn);
> - /* Set hwpoison bit while page is still isolated */
> - SetPageHWPoison(p);
>   ret = 0;
>   } else {
>   pr_info("%s: %#lx: unknown zero refcount page type %lx\n",
> @@ -1413,23 +1411,45 @@ static int get_any_page(struct page *p, unsigned long pfn, int flags)
>   return ret;
>  }
>  
> +static int get_any_page(struct page *page, unsigned long pfn, int flags)
> +{
> + int ret = __get_any_page(page, pfn, flags);
> +
> + if (ret == 1 && !PageHuge(page) && !PageLRU(page)) {
> + /*
> +  * Try to free it.
> +  */
> + put_page(page);
> + shake_page(page, 1);
> +
> + /*
> +  * Did it turn free?
> +  */
> + ret = __get_any_page(page, pfn, 0);
> + if (!PageLRU(page)) {
> + pr_info("soft_offline: %#lx: unknown non LRU page type %lx\n",
> + pfn, page->flags);
> + return -EIO;
> + }
> + }
> + return ret;
> +}
> +
>  static int soft_offline_huge_page(struct page *page, int flags)
>  {
>   int ret;
>   unsigned long pfn = page_to_pfn(page);
>   struct page *hpage = compound_head(page);
>  
> - ret = get_any_page(page, pfn, flags);
> - if (ret < 0)
> - return ret;
> - if (ret == 0)
> - goto done;
> -
> + /* Synchronized using the page lock with memory_failure() */
> + lock_page(hpage);
>   if (PageHWPoison(hpage)) {
> + unlock_page(hpage);
>   put_page(hpage);
>   pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
>   return -EBUSY;
>   }
> + unlock_page(hpage);
>  
>   /* Keep page count to indicate a given hugepage is isolated. */
>   ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false,
> @@ -1439,85 +1459,19 @@ static int soft_offline_huge_page(struct page *page, int flags)
>   pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
>   pfn, ret, page->flags);
>   return ret;
> + } else {
> + set_page_hwpoison_huge_page(hpage);
> + dequeue_hwpoisoned_huge_page(hpage);
> + atomic_long_add(1<<compound_trans_order(hpage),
> + &mce_bad_pages);
>   }
> -done:
> - if (!PageHWPoison(hpage))
> - atomic_long_add(1 << compound_trans_order(hpage),
> - &mce_bad_pages);
> - set_page_hwpoison_huge_page(hpage);
> - dequeue_hwpoisoned_huge_page(hpage);
>   /* keep elevated page count for bad page */
>   return ret;
>  }
>  
> -/**
> - * soft_offline_page - Soft offline a page.
> - * @page: page to offline
> - * @flags:

Re: linux-next: unusual update of the security tree

2012-12-13 Thread James Morris

On Thu, 13 Dec 2012, Stephen Rothwell wrote:

> Hi James,
> 
> On Fri, 7 Dec 2012 10:21:31 +1100 (EST) James Morris wrote:
> >
> > On Thu, 6 Dec 2012, Linus Torvalds wrote:
> > 
> > > Have people pulled that thing into anything else? Because quite
> > > frankly, I think it's unsalvageable except with a rebase.
> > 
> > AFAIK, only developers such as Casey will have pulled it for development
> > purposes.
> > 
> > And sorry, I should be checking the trees I pull from more carefully.
> 
> Are you going to fix this before asking Linus to pull?

Yes.

-- 
James Morris


Re: [PATCH 01/12] regulator: gpio-regulator: Demote GPIO Regulator driver to start later

2012-12-13 Thread Mark Brown

On Thu, Dec 13, 2012 at 11:55:24AM +, Lee Jones wrote:

> I understand your logic, hence why I wrote such a lengthy commit
> message. However, I'm not sure I see a logical way around it. Asking
> all users of MMCI to provide a not-regulator to declare that a
> secondary regulator isn't available seems a little unreasonable to me.

> Is there anything else we can do?

Have all the people setting up this secondary regulator explicitly
declare it?

[PATCH] PCIe/PM: Do not suspend port if any subordinate device need PME polling

2012-12-13 Thread Huang Ying

In

  http://www.mail-archive.com/linux-usb@vger.kernel.org/msg07976.html

Ulrich reported that his USB3 cardreader does not work reliably when
connected to the USB3 port.  It turns out that the USB3 controller failed
to be woken up when plugging in the USB3 cardreader.  Further
experiments found that the USB3 host controller can only be woken up
via polling, not via PME interrupt.  But if the PCIe port that
the USB3 host controller is connected to is suspended, we cannot poll
the USB3 host controller because its config space is not accessible
while the PCIe port is in a low power state.

To solve the issue, the PCIe port will not be suspended if any
subordinate device needs PME polling.

Reported-by: Ulrich Eckhardt 
Signed-off-by: Huang Ying 
Tested-by: Sarah Sharp 
Cc: sta...@vger.kernel.org  # 3.6+
---
 drivers/pci/pcie/portdrv_pci.c |   18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -134,10 +134,26 @@ static int pcie_port_runtime_resume(stru
return 0;
 }
 
+static int pci_dev_pme_poll(struct pci_dev *pdev, void *data)
+{
+   int *pme_poll = data;
+   *pme_poll = *pme_poll || pdev->pme_poll;
+   return 0;
+}
+
 static int pcie_port_runtime_idle(struct device *dev)
 {
+   struct pci_dev *pdev = to_pci_dev(dev);
+   int pme_poll = false;
+
+   /*
+* If any subordinate device needs pme poll, we should keep
+* the port in D0, because we need port in D0 to poll it.
+*/
+   pci_walk_bus(pdev->subordinate, pci_dev_pme_poll, &pme_poll);
/* Delay for a short while to prevent too frequent suspend/resume */
-   pm_schedule_suspend(dev, 10);
+   if (!pme_poll)
+   pm_schedule_suspend(dev, 10);
return -EBUSY;
 }
 #else

Re: [PATCH] core_pattern: set core helpers root and namespace to crashing process

2012-12-13 Thread Neil Horman

On Thu, Dec 13, 2012 at 02:25:34PM -0800, Andrew Morton wrote:
> On Thu, 13 Dec 2012 13:12:20 -0500
> Neil Horman  wrote:
> 
> > On Thu, Dec 13, 2012 at 06:20:48AM -0600, Serge Hallyn wrote:
> > > Quoting Neil Horman (nhor...@tuxdriver.com):
> > > > There's one problem I currently see with it, and that is that I'm not
> > > > sure we can change the current behavior of how the root fs is set for
> > > > the pipe reader, lest we break some user space expectations. As such,
> > > > I've added a sysctl in this patch to allow administrators to globally
> > > > select if a core reader specified via /proc/sys/kernel/core_pattern
> > > > should use the global rootfs, or the (possibly) chrooted fs of the
> > > > crashing process.
> > > 
> > > Practical question:  How is the admin to make an educated decision on
> > > how to set the sysctl?
> 
> By reading the documentation which Neil didn't include?
> 
Yeah, that was stupid of me, I'll respin this with docs.

> > My thought was that the admin typically wouldn't touch this at all.  I
> > really added it as a backwards compatibility option only.  Setting the
> > user space helper task to the root of the crashing parent has the
> > possibility of breaking existing installs because the core_pattern helper
> > might be expecting global file system access.  Moving forward, my
> > expectation would be that core_pattern helpers would be written with the
> > default setting in mind, and we could eventually deprecate the control
> > entirely.
> > 
> > If you have a better mechanism in mind however (or if you think that
> > removing the control is a reasonable approach), I'm certainly open to that.
> 
> Yeah, this is a tiresome patch but I can't think of a better way.
> 
> Except, perhaps, adding a new token to the core_pattern which says
> "switch namespaces"?
>
I like that idea, perhaps '||' instead of '|' as the leading token can indicate
"use the namespace root" vs. "use the global root".  Thoughts?
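
A minimal sketch of how such a leading token could be told apart. This is purely illustrative of the proposal: the enum, its values and classify_core_pattern() are hypothetical names, not existing kernel code, and the pattern is assumed to be a NUL-terminated string.

```c
/* Hypothetical classification of a core_pattern string. */
enum core_helper_root {
	CORE_TO_FILE,		/* no leading '|': plain file name template */
	CORE_PIPE_GLOBAL_ROOT,	/* "|helper": run helper in the global root */
	CORE_PIPE_NS_ROOT	/* "||helper": run helper in the crashing
				 * process's root and namespaces */
};

/* Classify the pattern and point *helper past the leading token. */
static enum core_helper_root classify_core_pattern(const char *pattern,
						   const char **helper)
{
	if (pattern[0] != '|') {
		*helper = pattern;
		return CORE_TO_FILE;
	}
	if (pattern[1] == '|') {
		*helper = pattern + 2;
		return CORE_PIPE_NS_ROOT;
	}
	*helper = pattern + 1;
	return CORE_PIPE_GLOBAL_ROOT;
}
```

Existing patterns with a single '|' keep their current meaning under this scheme, which is the backwards-compatibility property the sysctl was meant to provide.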

> Is there any prospect that the core_pattern itself will later become a
> per-namespace containerised thing?  I guess that if the per-container
> core_pattern has been configured, we can implicitly do the namespace
> switch as well.
Yes, that makes sense.  Unfortunately, I don't see proc containerization
happening any time soon.  I suppose if we do the above tokenization, that can be
used despite any future containerization that takes place.

I'll respin the patch with documentation, and replace the extra sysctl with the
above tokenization in the AM.

Best
Neil
 
> 
> 

Re: [PATCH v1 resend hot_track 00/16] vfs: hot data tracking

2012-12-13 Thread Darrick J. Wong

On Thu, Dec 13, 2012 at 08:17:26PM +0800, Zhi Yong Wu wrote:
> On Thu, Dec 13, 2012 at 3:50 AM, Darrick J. Wong wrote:
> > On Mon, Dec 10, 2012 at 11:30:03AM +0800, Zhi Yong Wu wrote:
> >> HI, all guys.
> >>
> >> any comments or suggestions?
> >
> > Why did ffsb drop from 924 transactions/sec to 322?
> It is maybe that some noise operations impact on it. I am doing one
> larger scale perf testing in one clearer environment. So i want to
> look at if its ffsb testing result has some difference from this.

That's quite a big noise there...

--D
> 
> >
> > --D
> >>
> >> On Thu, Dec 6, 2012 at 11:28 AM, Zhi Yong Wu wrote:
> >> > HI, guys
> >> >
> >> > The perf testing is done separately with fs_mark, fio, ffsb and
> >> > compilebench in one kvm guest.
> >> >
> >> > Below is the performance testing report for hot tracking, and no obvious
> >> > perf downgrade is found.
> >> >
> >> > Note: original kernel means its source code is not changed;
> >> >   kernel with enabled hot tracking means its source code is with hot
> >> > tracking patchset.
> >> >
> >> > The test env is set up as below:
> >> >
> >> > root@debian-i386:/home/zwu# uname -a
> >> > Linux debian-i386 3.7.0-rc8+ #266 SMP Tue Dec 4 12:17:55 CST 2012 x86_64 GNU/Linux
> >> >
> >> > root@debian-i386:/home/zwu# mkfs.xfs -f -l size=1310b,sunit=8 /home/zwu/bdev.img
> >> > meta-data=/home/zwu/bdev.img     isize=256    agcount=4, agsize=128000 blks
> >> >          =                       sectsz=512   attr=2, projid32bit=0
> >> > data     =                       bsize=4096   blocks=512000, imaxpct=25
> >> >          =                       sunit=0      swidth=0 blks
> >> > naming   =version 2              bsize=4096   ascii-ci=0
> >> > log      =internal log           bsize=4096   blocks=1310, version=2
> >> >          =                       sectsz=512   sunit=1 blks, lazy-count=1
> >> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> >> >
> >> > 1.) original kernel
> >> >
> >> > root@debian-i386:/home/zwu# mount -o loop,logbsize=256k /home/zwu/bdev.img /mnt/scratch
> >> > [ 1197.421616] XFS (loop0): Mounting Filesystem
> >> > [ 1197.567399] XFS (loop0): Ending clean mount
> >> > root@debian-i386:/home/zwu# mount
> >> > /dev/sda1 on / type ext3 (rw,errors=remount-ro)
> >> > tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
> >> > proc on /proc type proc (rw,noexec,nosuid,nodev)
> >> > sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
> >> > udev on /dev type tmpfs (rw,mode=0755)
> >> > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> >> > devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
> >> > none on /selinux type selinuxfs (rw,relatime)
> >> > debugfs on /sys/kernel/debug type debugfs (rw)
> >> > binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
> >> > /dev/loop0 on /mnt/scratch type xfs (rw,logbsize=256k)
> >> > root@debian-i386:/home/zwu# free -m
> >> >              total       used       free     shared    buffers     cached
> >> > Mem:           112        109          2          0          4         53
> >> > -/+ buffers/cache:         51         60
> >> > Swap:          713         29        684
> >> >
> >> > 2.) kernel with enabled hot tracking
> >> >
> >> > root@debian-i386:/home/zwu# mount -o hot_track,loop,logbsize=256k /home/zwu/bdev.img /mnt/scratch
> >> > [  364.648470] XFS (loop0): Mounting Filesystem
> >> > [  364.910035] XFS (loop0): Ending clean mount
> >> > [  364.921063] VFS: Turning on hot data tracking
> >> > root@debian-i386:/home/zwu# mount
> >> > /dev/sda1 on / type ext3 (rw,errors=remount-ro)
> >> > tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
> >> > proc on /proc type proc (rw,noexec,nosuid,nodev)
> >> > sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
> >> > udev on /dev type tmpfs (rw,mode=0755)
> >> > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> >> > devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
> >> > none on /selinux type selinuxfs (rw,relatime)
> >> > binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
> >> > /dev/loop0 on /mnt/scratch type xfs (rw,hot_track,logbsize=256k)
> >> > root@debian-i386:/home/zwu# free -m
> >> >              total       used       free     shared    buffers     cached
> >> > Mem:           112        107          4          0          2         34
> >> > -/+ buffers/cache:         70         41
> >> > Swap:          713          2        711
> >> >
> >> > 1. fs_mark test
> >> >
> >> > 1.) original kernel
> >> >
> >> > #  ./fs_mark  -D  100  -S0  -n  1000  -s  1  -L  30  -d  /mnt/scratch/0
> >> > -d  /mnt/scratch/1  -d  /mnt/scratch/2  -d  /mnt/scratch/3
> >> > -d  /mnt/scratch/4  -d  /mnt/scratch/5  -d  /mnt/scratch/6
> >> > -d  /mnt/scratch/7
> >> > #   Version 3.3, 8 thread(s) starting at Wed Dec  5 03:20:58 2012
> >> > #   Sync method: NO SYNC: Test does not issue sync() or fsync() 
>

Re: [PATCH 02/12] regulator: gpio-regulator: Only read GPIO [dis|en]able pin if not always-on

2012-12-13 Thread Mark Brown

On Thu, Dec 13, 2012 at 11:48:18AM +, Lee Jones wrote:
> On Mon, 10 Dec 2012, Mark Brown wrote:
> > On Mon, Dec 10, 2012 at 08:55:51AM +, Lee Jones wrote:
> > > If a regulator is specified as always-on, then it can't have an
> > > enable/disable pin, as it can't be turned off.

> > Sometimes always on gets set for regulators which do have a physical
> > control wired up - the control might exist for use in suspend mode for
> > example.  Is the ability to specify an enable pin causing a practical
> > problem for systems?  If it is we should fix that.

> My logic is that there is no point in requesting a pin which can
> disable a regulator that can't be disabled. Then we can follow
> on from that logic and say that if a regulator is _not_ always on
> then we _require_ a way to disable it, thus we insist on an enable
> GPIO pin.

> With me?

No.  Making the enable pin optional for always on regulators is fine,
forbidding it is not - that won't work for things like the suspend case
I mentioned.

Re: [PATCH 1/3] timekeeping: Add persistent_clock_exist flag

2012-12-13 Thread Jason Gunthorpe

On Thu, Dec 13, 2012 at 06:00:23PM -0800, John Stultz wrote:

> So per Jason's related patch, he's made the point that the
> persistent_clock and RTC class functionality are basically exclusive
> (well, in his case, he said this with respect to updating the RTC,
> not reading it - I don't mean to put words in his mouth - Please do
> correct me here Jason. :).  In other words, we probably should avoid
> configurations where both the rtc hctosys and persistent_clock
> interfaces are both active.

I only studied update_persistent_clock, read_persistent_clock is
very much different.

Looking at it, I don't think that update_persistent_clock is in any
way related to read_persistent_clock..  update_persistent_clock is
*only* called by NTP, and its *only* purpose is to update the RTC with
NTP synchronized time. In many configurations it will never even be
called.

I think update_persistent_clock is badly named, it should be called
platform_save_ntp_time_to_rtc(), keep it divorced from
read_persistent_clock :)

> make the HCTOSYS option be dependent on !HAS_PERSISTENT_CLOCK. This
> way we avoid having configs where there are conflicting paths that
> we chose from.

On ARM the read_persistent_clock is used to access a true monotonic
counter that is divorced from the system RTC - look at
arch/arm/plat-omap/counter_32k.c for instance.

This seems like a great use of that hardware resource, and no doubt
those mach's also have a class RTC driver available talking to
different hardware.

For mach's without that functionality ARM returns a fixed 0 value
from read_persistent_clock, presumably the kernel detects this and
falls back to using class rtc functions?

Maybe Feng would be better off adjusting read_persistent_clock to
return ENODEV in such cases??

So, I think you have to keep your test as a run time test. To support
the single image ARM boot you can't make the distinction with kconfig.

Regards,
Jason
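
The run-time test suggested above can be sketched in user space as follows. This is only an illustration under stated assumptions: struct clock_reading, read_clock_fn, persistent_clock_exists() and pick_boot_time() are hypothetical names, not kernel interfaces, and a zero reading is assumed to mean that no persistent clock is present, as described for ARM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the timespec filled in by the arch hook. */
struct clock_reading {
	long long sec;
	long nsec;
};

typedef void (*read_clock_fn)(struct clock_reading *r);

/* Models a platform without a persistent clock: always reports zero. */
void no_persistent_clock(struct clock_reading *r)
{
	r->sec = 0;
	r->nsec = 0;
}

/* Models a platform with a free-running counter (arbitrary example value). */
void example_counter_clock(struct clock_reading *r)
{
	r->sec = 1355450423;
	r->nsec = 0;
}

/* Run-time test: a zero reading is taken to mean "no persistent clock". */
bool persistent_clock_exists(read_clock_fn read_clock)
{
	struct clock_reading r;

	read_clock(&r);
	return r.sec != 0 || r.nsec != 0;
}

/* Use the persistent clock when present, else the supplied RTC value. */
long long pick_boot_time(read_clock_fn read_clock, long long rtc_seconds)
{
	struct clock_reading r;

	read_clock(&r);
	if (r.sec == 0 && r.nsec == 0)
		return rtc_seconds;
	return r.sec;
}
```

A hook that always reports zero makes pick_boot_time() fall back to the RTC value, so the decision is taken at run time and needs no Kconfig dependency.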

Re: [RFC][PATCH] Fix cap_capable to only allow owners in the parent user namespace to have caps.

2012-12-13 Thread Andy Lutomirski

On Thu, Dec 13, 2012 at 6:33 PM, Eric W. Biederman wrote:
>
> Andy thank you for your review.
>
> Andy Lutomirski  writes:
>> This is confusing enough that I can't immediately tell whether it's
>> correct.  I think it's close but out of order.
>
> Yeah.  That is the trick. Figuring out how to write that code so it is
> correct and obvious.
>
> I have added a comment at the top and removed the extra variable I was
> adding.
>
> The order except for verifying a user_ns->parent is valid by checking
> for targ_ns == &init_user_ns doesn't make a difference.  Because
> cred->user_ns can only be one of targ_ns or targ_ns->parent.
>
>> Should this be transitive?
>
> Yes.
>>  I.e. suppose uid 1 owns a child of
>> init_user_ns and uid 2 (mapped in the first ns as the identity) owns
>> an inner ns.  Does uid 2 in the root ns have all caps inside?  I'd say
>> no, but I don't have a great argument for that.
>
> I also say no.  It is more code and it doesn't fit my nice small
> definition.  You have to be the owner and you have to be in the parent
> of the target user namespace.  Being able to remember the rules in
> your head is important.
>
>> But uid 1 presumably
>> does have caps because it could enter the parent with setns, then
>> change uid, then enter the child.
>
> Yes. uid 1 does have caps.
>
>> How about (severely whitespace damaged):
>
> You know that makes the termination condition a bit clearer.  But it
> loses the nice place to put a comment when we loop again.  This loop
> is just subtle enough that I want to preserve my comments.
>
> I think I must have put -EPERM towards the end for the same reason to
> make it clear that is the termination condition.
>
> In practice I think it is important to have the cap_raised case first,
> as that is the common case, and if we can be clear and still test that
> case first it means the code will be faster.  With my reordering it is
> obvious that nothing strange happens in the initial user namespace with
> the owner test after the exit when we are the initial user namespace.

Ah.  You are correct about the ordering.  I read it slightly wrong.

I'd still suggest using a variable like "here" instead of "targ_ns".
The latter is confusing because it changes on the second and later
iterations.

--Andy

Re: [RFC][PATCH] Fix cap_capable to only allow owners in the parent user namespace to have caps.

2012-12-13 Thread Eric W. Biederman

Andy thank you for your review.

Andy Lutomirski  writes:
> This is confusing enough that I can't immediately tell whether it's
> correct.  I think it's close but out of order.

Yeah.  That is the trick. Figuring out how to write that code so it is
correct and obvious.

I have added a comment at the top and removed the extra variable I was
adding.

The order except for verifying a user_ns->parent is valid by checking
for targ_ns == &init_user_ns doesn't make a difference.  Because
cred->user_ns can only be one of targ_ns or targ_ns->parent.

> Should this be transitive?

Yes.
>  I.e. suppose uid 1 owns a child of
> init_user_ns and uid 2 (mapped in the first ns as the identity) owns
> an inner ns.  Does uid 2 in the root ns have all caps inside?  I'd say
> no, but I don't have a great argument for that. 

I also say no.  It is more code and it doesn't fit my nice small
definition.  You have to be the owner and you have to be in the parent
of the target user namespace.  Being able to remember the rules in
your head is important.

> But uid 1 presumably
> does have caps because it could enter the parent with setns, then
> change uid, then enter the child.

Yes. uid 1 does have caps.

> How about (severely whitespace damaged):

You know that makes the termination condition a bit clearer.  But it
loses the nice place to put a comment when we loop again.  This loop
is just subtle enough that I want to preserve my comments.

I think I must have put -EPERM towards the end for the same reason to
make it clear that is the termination condition.

In practice I think it is important to have the cap_raised case first,
as that is the common case, and if we can be clear and still test that
case first it means the code will be faster.  With my reordering it is
obvious that nothing strange happens in the initial user namespace with
the owner test after the exit when we are the initial user namespace.

> int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
> int cap, int audit)
> {
> struct user_namespace *here = targ_ns;
>
> /* Walk up the namespace hierarchy until we find our own namespace. */
> for (;;) {
> /* The owner of an ancestor namespace has all caps, if
> that owner is in the parent ns. */
> if (cred->user_ns == here->parent &&
> uid_eq(targ_ns->owner, cred->euid))
> return 0;

This would have needed a check that (here != &init_user_ns).  Because
the init_user_ns does not have a parent.

> /* Do we have the necessary capabilities? */
> if (here == cred->user_ns)
> return cap_raised(cred->cap_effective, cap) ?
> 0 : -EPERM;
>
> /* Have we tried all of the parent namespaces? */
> if (here == &init_user_ns)
> return -EPERM;
> else
> here = targ_ns->parent;
> }
> }
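
The rule agreed on in this thread (test cap_raised in the task's own namespace first, stop at the initial namespace, otherwise grant all caps to the owner living in the parent) can be modelled in user space. A minimal sketch, assuming simplified stand-in types: user_ns_model, cred_model, init_ns_model and cap_capable_model() are hypothetical and only mirror the fields discussed here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct user_ns_model {
	struct user_ns_model *parent;	/* NULL only for the initial namespace */
	unsigned int owner;		/* uid (in the parent) that created this ns */
};

struct cred_model {
	struct user_ns_model *user_ns;	/* namespace the task lives in */
	unsigned int euid;
	unsigned long cap_effective;	/* bitmask, one bit per capability */
};

struct user_ns_model init_ns_model = { NULL, 0 };

int cap_capable_model(const struct cred_model *cred,
		      struct user_ns_model *targ_ns, int cap)
{
	struct user_ns_model *ns = targ_ns;

	for (;;) {
		/* Common case first: do we hold the capability in this ns? */
		if (ns == cred->user_ns)
			return (cred->cap_effective & (1UL << cap)) ? 0 : -EPERM;

		/* The initial namespace has no parent, so stop here. */
		if (ns == &init_ns_model)
			return -EPERM;

		/* The owner of ns, living in its parent, has all caps in ns. */
		if (ns->parent == cred->user_ns && ns->owner == cred->euid)
			return 0;

		/* Otherwise retry one level up the hierarchy. */
		ns = ns->parent;
	}
}
```

With a child of the initial namespace owned by uid 1 and an inner namespace owned by uid 2, uid 1 in the initial namespace gets 0 for the inner namespace, while uid 2 in the initial namespace gets -EPERM unless the capability is raised there.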


Re: [PATCH 00/26] AIO performance improvements/cleanups, v2

2012-12-13 Thread Jack Wang

2012/12/14 Jens Axboe :
> On Mon, Dec 03 2012, Kent Overstreet wrote:
>> Last posting: http://thread.gmane.org/gmane.linux.kernel.aio.general/3169
>>
>> Changes since the last posting should all be noted in the individual
>> patch descriptions.
>>
>>  * Zach pointed out the aio_read_evt() patch was calling functions that
>>could sleep in TASK_INTERRUPTIBLE state, that patch is rewritten.
>>  * Ben pointed out some synchronize_rcu() usage was problematic,
>>converted it to call_rcu()
>>  * The flush_dcache_page() patch is new
>>  * Changed the "use cancellation list lazily" patch so as to remove
>>ki_flags from struct kiocb.
>
> Kent, I ran a few tests, and the below patches still don't seem as fast
> as the approach I took. To keep it fair, I used your aio branch and
> applied my dio speedups too. As a sanity check, I ran with your branch
> alone as well. The quick results below - kaio is kent-aio, just your
> branch. kaio-dio is with the direct IO speedups too. jaio is my branch,
> which already has the dio changes too.
>
> Devices Branch    IOPS
> 1       kaio      ~915K
> 1       kaio-dio  ~930K
> 1       jaio      ~1220K
> 6       kaio      ~3050K
> 6       kaio-dio  ~3080K
> 6       jaio      3500K
>
> The box runs out of CPU driving power, which is why it doesn't scale
> linearly, otherwise I know that jaio at least does. It's basically
> completion limited for the 6 device test at the moment.
>
> I'll run some profiling tomorrow morning and get you some better
> results. Just thought I'd share these at least.
>
> --
> Jens Axboe
>

A really good performance, woo.

I think the device tested is a really fast PCIe SSD built by fusionio
with fusionio's in-house block driver?

Any comparison numbers against current mainline?

Jack

[PATCH 1/1]linux-usb: optimize to match rules in USB storage for Huawei dongles.

2012-12-13 Thread fangxiaozhi 00110321

From: fangxiaozhi 

1. Add a new macro definition for the USB storage match rule.
2. Optimize the match rules for Huawei USB storage devices with the new
   macro, to avoid loading the USB storage driver for the modem interface
   of Huawei devices.
3. Add support for the new switch command used by new Huawei USB dongles.

Signed-off-by: fangxiaozhi 
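
The matching logic in usb_stor_huawei_dongles_pid() and usb_stor_huawei_init() below can be exercised outside the kernel. A minimal sketch that mirrors the product-ID ranges and the 0x1446 threshold as they appear in this patch; huawei_pid_supported(), huawei_pick_switch() and enum huawei_switch are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the product-ID ranges listed in usb_stor_huawei_dongles_pid(). */
bool huawei_pid_supported(uint16_t pid)
{
	return pid == 0x1001 || pid == 0x1003 || pid == 0x1004 ||
	       (pid >= 0x1401 && pid < 0x1501) ||
	       (pid > 0x1504 && pid <= 0x1600) ||
	       (pid >= 0x1c02 && pid <= 0x2202);
}

/* Which mode-switch method would be chosen (hypothetical enum). */
enum huawei_switch { HUAWEI_NONE, HUAWEI_FEATURE, HUAWEI_SCSI };

/* Only interface 0 of a supported device is switched; newer product IDs
 * (0x1446 and above) get the SCSI command, older ones the feature request. */
enum huawei_switch huawei_pick_switch(uint8_t interface_number, uint16_t pid)
{
	if (interface_number != 0 || !huawei_pid_supported(pid))
		return HUAWEI_NONE;
	return pid >= 0x1446 ? HUAWEI_SCSI : HUAWEI_FEATURE;
}
```

Keeping the range test separate from the choice of switch method makes it easy to check boundary values such as 0x1501 and 0x1504, which fall in the gap between two ranges.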


diff -uprN linux-3.7_bak/drivers/usb/storage/initializers.c linux-3.7/drivers/usb/storage/initializers.c
--- linux-3.7_bak/drivers/usb/storage/initializers.c	2012-12-11 09:56:11.0 +0800
+++ linux-3.7/drivers/usb/storage/initializers.c	2012-12-13 20:53:40.0 +0800
@@ -92,8 +92,8 @@ int usb_stor_ucr61s2b_init(struct us_dat
return 0;
 }
 
-/* This places the HUAWEI E220 devices in multi-port mode */
-int usb_stor_huawei_e220_init(struct us_data *us)
+/* This places the HUAWEI usb dongles in multi-port mode */
+static int usb_stor_huawei_feature_init(struct us_data *us)
 {
int result;
 
@@ -104,3 +104,55 @@ int usb_stor_huawei_e220_init(struct us_
US_DEBUGP("Huawei mode set result is %d\n", result);
return 0;
 }
+
+/* Find the supported Huawei USB dongles */
+static int usb_stor_huawei_dongles_pid(struct us_data *us)
+{
+   struct usb_interface_descriptor *idesc;
+   int idProduct;
+   idesc = &us->pusb_intf->cur_altsetting->desc;
+   idProduct = us->pusb_dev->descriptor.idProduct;
+   if (idesc && idesc->bInterfaceNumber == 0) {
+   if ((idProduct == 0x1001)
+   || (idProduct == 0x1003)
+   || (idProduct == 0x1004)
+   || (idProduct >= 0x1401 && idProduct < 0x1501)
+   || (idProduct > 0x1504 && idProduct <= 0x1600)
+   || (idProduct >= 0x1c02 && idProduct <= 0x2202)) {
+   return 1;
+   }
+   }
+   return 0;
+}
+
+static int usb_stor_huawei_scsi_init(struct us_data *us)
+{
+   int result = 0;
+   int act_len = 0;
+   char rewind_cmd[] = {0x11, 0x06, 0x20, 0x00, 0x00, 0x01, 0x01, 0x00,
+   0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+   struct bulk_cb_wrap bcbw = {0};
+   bcbw.Signature = cpu_to_le32(US_BULK_CB_SIGN);
+   bcbw.Tag = 0;
+   bcbw.DataTransferLength = cpu_to_le32(0);
+   bcbw.Flags = bcbw.Lun = 0;
+   bcbw.Length = sizeof(rewind_cmd);
+   memcpy(bcbw.CDB, rewind_cmd, sizeof(rewind_cmd));
+
+   result = usb_stor_bulk_transfer_buf(us, us->send_bulk_pipe, &bcbw,
+   US_BULK_CS_WRAP_LEN, &act_len);
+   US_DEBUGP("transfer actual length=%d, result=%d\n", act_len, result);
+   return result;
+}
+
+int usb_stor_huawei_init(struct us_data *us)
+{
+   int result = 0;
+   if (usb_stor_huawei_dongles_pid(us)) {
+   if (us->pusb_dev->descriptor.idProduct >= 0x1446)
+   result = usb_stor_huawei_scsi_init(us);
+   else
+   result = usb_stor_huawei_feature_init(us);
+   }
+   return result;
+}
diff -uprN linux-3.7_bak/drivers/usb/storage/initializers.h linux-3.7/drivers/usb/storage/initializers.h
--- linux-3.7_bak/drivers/usb/storage/initializers.h	2012-12-11 09:56:11.0 +0800
+++ linux-3.7/drivers/usb/storage/initializers.h	2012-12-12 11:43:58.0 +0800
@@ -46,5 +46,5 @@ int usb_stor_euscsi_init(struct us_data 
  * flash reader */
 int usb_stor_ucr61s2b_init(struct us_data *us);
 
-/* This places the HUAWEI E220 devices in multi-port mode */
-int usb_stor_huawei_e220_init(struct us_data *us);
+/* This places the HUAWEI usb dongles in multi-port mode */
+int usb_stor_huawei_init(struct us_data *us);
diff -uprN linux-3.7_bak/drivers/usb/storage/unusual_devs.h linux-3.7/drivers/usb/storage/unusual_devs.h
--- linux-3.7_bak/drivers/usb/storage/unusual_devs.h	2012-12-11 09:56:11.0 +0800
+++ linux-3.7/drivers/usb/storage/unusual_devs.h	2012-12-13 20:51:23.0 +0800
@@ -1527,335 +1527,10 @@ UNUSUAL_DEV(  0x1210, 0x0003, 0x0100, 0x
 /* Reported by fangxiaozhi 
  * This brings the HUAWEI data card devices into multi-port mode
  */
-UNUSUAL_DEV(  0x12d1, 0x1001, 0x, 0x,
+UNUSUAL_VENDOR_INTF(0x12d1, 0x08, 0x06, 0x50,
"HUAWEI MOBILE",
"Mass Storage",
-   USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_huawei_e220_init,
-   0),
-UNUSUAL_DEV(  0x12d1, 0x1003, 0x, 0x,
-   "HUAWEI MOBILE",
-   "Mass Storage",
-   USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_huawei_e220_init,
-   0),
-UNUSUAL_DEV(  0x12d1, 0x1004, 0x, 0x,
-   "HUAWEI MOBILE",
-   "Mass Storage",
-   USB_SC_DEVICE, USB_PR_DEVICE, usb_stor_huawei_e220_init,
-   0),
-UNUSUAL_DEV(  0x12d1, 0x1401, 0x, 0x,
-   "HUAWEI MOBILE",
-

[PATCH 1/2] autofs4 - dont clear DCACHE_NEED_AUTOMOUNT on rootless mount

2012-12-13 Thread Ian Kent

The DCACHE_NEED_AUTOMOUNT flag is cleared on mount and set on expire
for autofs rootless multi-mount dentrys to prevent unnecessary calls
to ->d_automount().

Since DCACHE_MANAGE_TRANSIT is always set on autofs dentrys, ->d_manage()
is always called so the check can be done in ->d_manage() without the
need to change the flag. This still avoids unnecessary calls to
->d_automount(), adds negligible overhead and eliminates a seriously
ugly check in the expire code.

Signed-off-by: Ian Kent 
---

 fs/autofs4/expire.c |9 
 fs/autofs4/root.c   |   61 ++-
 2 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 842d000..01443ce 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -548,15 +548,6 @@ int autofs4_do_expire_multi(struct super_block *sb, struct 
vfsmount *mnt,
 
spin_lock(&sbi->fs_lock);
ino->flags &= ~AUTOFS_INF_EXPIRING;
-   spin_lock(&dentry->d_lock);
-   if (!ret) {
-   if ((IS_ROOT(dentry) ||
-   (autofs_type_indirect(sbi->type) &&
-IS_ROOT(dentry->d_parent))) &&
-   !(dentry->d_flags & DCACHE_NEED_AUTOMOUNT))
-   __managed_dentry_set_automount(dentry);
-   }
-   spin_unlock(&dentry->d_lock);
complete_all(&ino->expire_complete);
spin_unlock(&sbi->fs_lock);
dput(dentry);
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 91b1165..30a6ab66 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -355,7 +355,6 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
status = autofs4_mount_wait(dentry);
if (status)
return ERR_PTR(status);
-   spin_lock(&sbi->fs_lock);
goto done;
}
 
@@ -364,8 +363,11 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
 * having d_mountpoint() true, so there's no need to call back
 * to the daemon.
 */
-   if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode))
+   if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode)) {
+   spin_unlock(&sbi->fs_lock);
goto done;
+   }
+
if (!d_mountpoint(dentry)) {
/*
 * It's possible that user space hasn't removed directories
@@ -379,8 +381,10 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
 * require user space behave.
 */
if (sbi->version > 4) {
-   if (have_submounts(dentry))
+   if (have_submounts(dentry)) {
+   spin_unlock(&sbi->fs_lock);
goto done;
+   }
} else {
spin_lock(&dentry->d_lock);
if (!list_empty(&dentry->d_subdirs)) {
@@ -399,28 +403,8 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
return ERR_PTR(status);
}
}
-done:
-   if (!(ino->flags & AUTOFS_INF_EXPIRING)) {
-   /*
-* Any needed mounting has been completed and the path
-* updated so clear DCACHE_NEED_AUTOMOUNT so we don't
-* call ->d_automount() on rootless multi-mounts since
-* it can lead to an incorrect ELOOP error return.
-*
-* Only clear DMANAGED_AUTOMOUNT for rootless multi-mounts and
-* symlinks as in all other cases the dentry will be covered by
-* an actual mount so ->d_automount() won't be called during
-* the follow.
-*/
-   spin_lock(&dentry->d_lock);
-   if ((!d_mountpoint(dentry) &&
-   !list_empty(&dentry->d_subdirs)) ||
-   (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode)))
-   __managed_dentry_clear_automount(dentry);
-   spin_unlock(&dentry->d_lock);
-   }
spin_unlock(&sbi->fs_lock);
-
+done:
/* Mount succeeded, check if we ended up with a new dentry */
dentry = autofs4_mountpoint_changed(path);
if (!dentry)
@@ -432,6 +416,8 @@ done:
 int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)
 {
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
+   struct autofs_info *ino = autofs4_dentry_ino(dentry);
+   int status;
 
DPRINTK("dentry=%p %.*s",
dentry, dentry->d_name.len, dentry->d_name.name);
@@ -456,7 +442,32 @@ int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)
 * This dentry may be under construction so wait on mount
 * completion.
 */
-   return autofs4_mount_wait(dentry);
+   status = autofs4_mount_wait(dentry);
+   if (status)
+

[PATCH 2/2] autofs4 - use simple_empty() for empty directory check

2012-12-13 Thread Ian Kent

For direct (and offset) mounts, if an automounted mount is manually
umounted the trigger mount dentry can appear non-empty causing it to
not trigger mounts. This can also happen if there is a file handle
leak in a user space automounting application.

This happens because, when a ioctl control file handle is opened
on the mount, a cursor dentry is created which causes list_empty()
to see the dentry as non-empty. Since there is a case where listing
the directory of these dentrys is needed, the use of dcache_dir_*()
functions for .open() and .release() is needed.

Consequently simple_empty() must be used instead of list_empty()
when checking for an empty directory.

Signed-off-by: Ian Kent 
---
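
To illustrate the difference described above: a bare list_empty() on the child list counts a cursor entry, while a check that only counts children backed by an inode (the idea behind simple_empty()) does not. A minimal user-space sketch, assuming a hypothetical entry_model type; it is not the kernel's struct dentry and the helper names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a directory entry; not the kernel's struct dentry. */
struct entry_model {
	bool has_inode;			/* false for a cursor placeholder */
	struct entry_model *next;	/* next sibling in the parent's list */
};

/* Like a bare list_empty(): any child at all makes the directory non-empty. */
bool naive_empty(const struct entry_model *children)
{
	return children == NULL;
}

/* Like the idea behind simple_empty(): only children with an inode count. */
bool positive_empty(const struct entry_model *children)
{
	const struct entry_model *e;

	for (e = children; e != NULL; e = e->next)
		if (e->has_inode)
			return false;
	return true;
}
```

A list holding only a cursor is reported as non-empty by naive_empty() but as empty by positive_empty(), which is the behaviour the trigger mount check needs.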

 fs/autofs4/root.c |   22 +-
 1 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 30a6ab66..c934476 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -124,13 +124,10 @@ static int autofs4_dir_open(struct inode *inode, struct 
file *file)
 * it.
 */
spin_lock(&sbi->lookup_lock);
-   spin_lock(&dentry->d_lock);
-   if (!d_mountpoint(dentry) && list_empty(&dentry->d_subdirs)) {
-   spin_unlock(&dentry->d_lock);
+   if (!d_mountpoint(dentry) && simple_empty(dentry)) {
spin_unlock(&sbi->lookup_lock);
return -ENOENT;
}
-   spin_unlock(&dentry->d_lock);
spin_unlock(&sbi->lookup_lock);
 
 out:
@@ -386,12 +383,8 @@ static struct vfsmount *autofs4_d_automount(struct path 
*path)
goto done;
}
} else {
-   spin_lock(&dentry->d_lock);
-   if (!list_empty(&dentry->d_subdirs)) {
-   spin_unlock(&dentry->d_lock);
+   if (!simple_empty(dentry))
goto done;
-   }
-   spin_unlock(&dentry->d_lock);
}
ino->flags |= AUTOFS_INF_PENDING;
spin_unlock(&sbi->fs_lock);
@@ -610,9 +603,7 @@ static int autofs4_dir_unlink(struct inode *dir, struct 
dentry *dentry)
 
spin_lock(&sbi->lookup_lock);
__autofs4_add_expiring(dentry);
-   spin_lock(&dentry->d_lock);
-   __d_drop(dentry);
-   spin_unlock(&dentry->d_lock);
+   d_drop(dentry);
spin_unlock(&sbi->lookup_lock);
 
return 0;
@@ -683,15 +674,12 @@ static int autofs4_dir_rmdir(struct inode *dir, struct 
dentry *dentry)
return -EACCES;
 
spin_lock(&sbi->lookup_lock);
-   spin_lock(&dentry->d_lock);
-   if (!list_empty(&dentry->d_subdirs)) {
-   spin_unlock(&dentry->d_lock);
+   if (!simple_empty(dentry)) {
spin_unlock(&sbi->lookup_lock);
return -ENOTEMPTY;
}
__autofs4_add_expiring(dentry);
-   __d_drop(dentry);
-   spin_unlock(&dentry->d_lock);
+   d_drop(dentry);
spin_unlock(&sbi->lookup_lock);
 
if (sbi->version < 5)


Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-13 Thread Andy Lutomirski

On Thu, Dec 13, 2012 at 6:18 PM, H. Peter Anvin  wrote:
> Wouldn't the vdso get mapped already and could be mremap()'d.  If we really 
> need more control I'd almost push for a device/filesystem node that could be 
> mmapped the usual way.

Hmm.  That may work, but it'll still break ABI.  I'm not sure that
criu is stable enough yet that we should care.  Criu people?

(In brief summary: how annoying would it be if the vdso was no longer
just a bunch of constant bytes that lived somewhere?)

--Andy

>
> Andy Lutomirski  wrote:
>
>>On Thu, Dec 13, 2012 at 5:49 PM, H. Peter Anvin  wrote:
>>> On 12/13/2012 05:42 PM, Andy Lutomirski wrote:
>>>>
>>>> The 64-bit/x32 case is currently very simple and fast because it uses
>>>> absolute addressing.  Admittedly, pcrel references are free, so
>>>> changing this wouldn't cost much.  For native, it'll be slower, but
>>>> maybe no one cares.  I seem to care about this more than anyone else,
>>>> and I don't use 32 bit code. :)
>>>>
>>>
>>> pcrel is actually cheaper than absolute addressing in 64-bit mode.
>>>
>>>> The benefit of switching is that the vdso code could be the same in
>>>> all three cases.  (Actually, it's even better than that.  All of the
>>>> VVAR magic could be the same in the vdso and the kernel -- the kernel
>>>> linker script would just have to have an appropriate symbol to see the
>>>> appropriate mapping.)
>>>>
>>>>
>>>> This:
>>>>
>>>> __attribute__((visibility("hidden"))) int foo;
>>>>
>>>> int get_foo(void)
>>>> {
>>>>   return foo;
>>>> }
>>>>
>>>> generates a rip-relative access on 64 bits and GOTOFF on 32 bits.
>>>>
>>>> The only reason I didn't use a real symbol in the first place is
>>>> because I couldn't figure out how to get gcc to emit an absolute
>>>> relocation in pic code.
>>>
>>> Well, then, we wouldn't need to do that... this is starting to sound
>>> like a significant win.
>>
>>How will this avoid breaking checkpoint/restore in userspace?  If the
>>vdso is not just plain old code, criu presumably needs to know about
>>it.  Should there be an arch_prctl(ARCH_MAP_VDSO, addr) to create a
>>vdso mapping somewhere?
>>
>>--Andy
>
> --
> Sent from my mobile phone. Please excuse brevity and lack of formatting.



-- 
Andy Lutomirski
AMA Capital Management, LLC

Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-13 Thread H. Peter Anvin

Wouldn't the vdso get mapped already and could be mremap()'d.  If we really 
need more control I'd almost push for a device/filesystem node that could be 
mmapped the usual way.

Andy Lutomirski  wrote:

>On Thu, Dec 13, 2012 at 5:49 PM, H. Peter Anvin  wrote:
>> On 12/13/2012 05:42 PM, Andy Lutomirski wrote:
>>>
>>> The 64-bit/x32 case is currently very simple and fast because it
>uses
>>> absolute addressing.  Admittedly, pcrel references are free, so
>>> changing this wouldn't cost much.  For native, it'll be slower, but
>>> maybe no one cares.  I seem to care about this more than anyone
>else,
>>> and I don't use 32 bit code. :)
>>>
>>
>> pcrel is actually cheaper than absolute addressing in 64-bit mode.
>>
>>> The benefit of switching is that the vdso code could be the same in
>>> all three cases.  (Actually, it's even better than that.  All of the
>>> VVAR magic could be the same in the vdso and the kernel -- the
>kernel
>>> linker script would just have to have an appropriate symbol to see
>the
>>> appropriate mapping.)
>>>
>>>
>>> This:
>>>
>>> __attribute__((visibility("hidden"))) int foo;
>>>
>>> int get_foo(void)
>>> {
>>>   return foo;
>>> }
>>>
>>> generates a rip-relative access on 64 bits and GOTOFF on 32 bits.
>>>
>>> The only reason I didn't use a real symbol in the first place is
>>> because I couldn't figure out how to get gcc to emit an absolute
>>> relocation in pic code.
>>
>> Well, then, we wouldn't need to do that... this is starting to sound
>> like a significant win.
>
>How will this avoid breaking checkpoint/restore in userspace?  If the
>vdso is not just plain old code, criu presumably needs to know about
>it.  Should there be an arch_prctl(ARCH_MAP_VDSO, addr) to create a
>vdso mapping somewhere?
>
>--Andy

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

[PATCH V4 3/3 RESEND] MCE: use num_poisoned_pages instead of mce_bad_pages

2012-12-13 Thread Xishi Qiu

Since MCE is an x86 concept, and this code is in mm/, it would be
better to use the name num_poisoned_pages instead of mce_bad_pages.

Signed-off-by: Xishi Qiu 
Signed-off-by: Jiang Liu 
Suggested-by: Borislav Petkov 
Reviewed-by: Wanpeng Li 
---
 fs/proc/meminfo.c   |2 +-
 include/linux/mm.h  |2 +-
 mm/memory-failure.c |   16 
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 80e4645..c3dac61 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -158,7 +158,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
vmi.used >> 10,
vmi.largest_chunk >> 10
 #ifdef CONFIG_MEMORY_FAILURE
-   ,atomic_long_read(&mce_bad_pages) << (PAGE_SHIFT - 10)
+   ,atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
 #endif
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5432a3e..8ccc477 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,7 +1653,7 @@ extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
 extern void shake_page(struct page *p, int access);
-extern atomic_long_t mce_bad_pages;
+extern atomic_long_t num_poisoned_pages;
 extern int soft_offline_page(struct page *page, int flags);

 extern void dump_page(struct page *page);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e513a7b..ff5e611 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -61,7 +61,7 @@ int sysctl_memory_failure_early_kill __read_mostly = 0;

 int sysctl_memory_failure_recovery __read_mostly = 1;

-atomic_long_t mce_bad_pages __read_mostly = ATOMIC_LONG_INIT(0);
+atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);

 #if defined(CONFIG_HWPOISON_INJECT) || defined(CONFIG_HWPOISON_INJECT_MODULE)

@@ -1040,7 +1040,7 @@ int memory_failure(unsigned long pfn, int trapno, int 
flags)
}

nr_pages = 1 << compound_trans_order(hpage);
-   atomic_long_add(nr_pages, &mce_bad_pages);
+   atomic_long_add(nr_pages, &num_poisoned_pages);

/*
 * We need/can do nothing about count=0 pages.
@@ -1070,7 +1070,7 @@ int memory_failure(unsigned long pfn, int trapno, int 
flags)
if (!PageHWPoison(hpage)
|| (hwpoison_filter(p) && TestClearPageHWPoison(p))
|| (p != hpage && TestSetPageHWPoison(hpage))) {
-   atomic_long_sub(nr_pages, &mce_bad_pages);
+   atomic_long_sub(nr_pages, &num_poisoned_pages);
return 0;
}
set_page_hwpoison_huge_page(hpage);
@@ -1128,7 +1128,7 @@ int memory_failure(unsigned long pfn, int trapno, int 
flags)
}
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
-   atomic_long_sub(nr_pages, &mce_bad_pages);
+   atomic_long_sub(nr_pages, &num_poisoned_pages);
unlock_page(hpage);
put_page(hpage);
return 0;
@@ -1323,7 +1323,7 @@ int unpoison_memory(unsigned long pfn)
return 0;
}
if (TestClearPageHWPoison(p))
-   atomic_long_sub(nr_pages, &mce_bad_pages);
+   atomic_long_sub(nr_pages, &num_poisoned_pages);
pr_info("MCE: Software-unpoisoned free page %#lx\n", pfn);
return 0;
}
@@ -1337,7 +1337,7 @@ int unpoison_memory(unsigned long pfn)
 */
if (TestClearPageHWPoison(page)) {
pr_info("MCE: Software-unpoisoned page %#lx\n", pfn);
-   atomic_long_sub(nr_pages, &mce_bad_pages);
+   atomic_long_sub(nr_pages, &num_poisoned_pages);
freeit = 1;
if (PageHuge(page))
clear_page_hwpoison_huge_page(page);
@@ -1442,7 +1442,7 @@ static int soft_offline_huge_page(struct page *page, int 
flags)
}
 done:
/* keep elevated page count for bad page */
-   atomic_long_add(1 << compound_trans_order(hpage), &mce_bad_pages);
+   atomic_long_add(1 << compound_trans_order(hpage), &num_poisoned_pages);
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
 out:
@@ -1584,7 +1584,7 @@ int soft_offline_page(struct page *page, int flags)

 done:
/* keep elevated page count for bad page */
-   atomic_long_inc(&mce_bad_pages);
+   atomic_long_inc(&num_poisoned_pages);
SetPageHWPoison(page);
 out:
return ret;
-- 
1.7.1

[PATCH V4 1/3 RESEND] MCE: fix an error of mce_bad_pages statistics

2012-12-13 Thread Xishi Qiu

Move the poisoned page check to the beginning of the function in order to
fix the error.

Signed-off-by: Xishi Qiu 
Signed-off-by: Jiang Liu 
Tested-by: Naoya Horiguchi 
---
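
The ordering this patch establishes can be sketched as a small user-space model: an already poisoned page is rejected before anything is counted, so the counter only moves once per page. page_model, bad_pages_model and soft_offline_model() are hypothetical names, and the isolation and migration steps are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a HWPoison flag. */
struct page_model {
	bool hwpoison;
};

/* Models the bad-page counter (a plain long instead of atomic_long_t). */
long bad_pages_model;

int soft_offline_model(struct page_model *page)
{
	/* Check first: an already poisoned page must not be counted again. */
	if (page->hwpoison)
		return -EBUSY;

	/* Isolation and migration of the page would happen here. */

	bad_pages_model += 1;
	page->hwpoison = true;
	return 0;
}
```

Calling soft_offline_model() twice on the same page returns 0 and then -EBUSY, and bad_pages_model is incremented only by the first call.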
 mm/memory-failure.c |   38 +-
 1 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b20278..3a8b4b2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1419,18 +1419,17 @@ static int soft_offline_huge_page(struct page *page, 
int flags)
unsigned long pfn = page_to_pfn(page);
struct page *hpage = compound_head(page);

+   if (PageHWPoison(hpage)) {
+   pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
+   return -EBUSY;
+   }
+
ret = get_any_page(page, pfn, flags);
if (ret < 0)
return ret;
if (ret == 0)
goto done;

-   if (PageHWPoison(hpage)) {
-   put_page(hpage);
-   pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
-   return -EBUSY;
-   }
-
/* Keep page count to indicate a given hugepage is isolated. */
ret = migrate_huge_page(hpage, new_page, MPOL_MF_MOVE_ALL, false,
MIGRATE_SYNC);
@@ -1441,12 +1440,11 @@ static int soft_offline_huge_page(struct page *page, int flags)
return ret;
}
 done:
-   if (!PageHWPoison(hpage))
-   atomic_long_add(1 << compound_trans_order(hpage),
-   &mce_bad_pages);
+   /* keep elevated page count for bad page */
+   atomic_long_add(1 << compound_trans_order(hpage), &mce_bad_pages);
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
-   /* keep elevated page count for bad page */
+
return ret;
 }

@@ -1488,6 +1486,11 @@ int soft_offline_page(struct page *page, int flags)
}
}

+   if (PageHWPoison(page)) {
+   pr_info("soft offline: %#lx page already poisoned\n", pfn);
+   return -EBUSY;
+   }
+
ret = get_any_page(page, pfn, flags);
if (ret < 0)
return ret;
@@ -1519,19 +1522,11 @@ int soft_offline_page(struct page *page, int flags)
return -EIO;
}

-   lock_page(page);
-   wait_on_page_writeback(page);
-
/*
 * Synchronized using the page lock with memory_failure()
 */
-   if (PageHWPoison(page)) {
-   unlock_page(page);
-   put_page(page);
-   pr_info("soft offline: %#lx page already poisoned\n", pfn);
-   return -EBUSY;
-   }
-
+   lock_page(page);
+   wait_on_page_writeback(page);
/*
 * Try to invalidate first. This should work for
 * non dirty unmapped page cache pages.
@@ -1582,8 +1577,9 @@ int soft_offline_page(struct page *page, int flags)
return ret;

 done:
+   /* keep elevated page count for bad page */
atomic_long_add(1, &mce_bad_pages);
SetPageHWPoison(page);
-   /* keep elevated page count for bad page */
+
return ret;
 }
-- 
1.7.1




Re: [PATCH v6 19/27] x86, boot: update comments about entries for 64bit image

2012-12-13 Thread Yinghai Lu

On Thu, Dec 13, 2012 at 3:27 PM, H. Peter Anvin  wrote:
> if we depend on other
> things we should make that explicit, not just here but in boot.txt.

please check lines for boot.txt

---

 64-bit BOOT PROTOCOL

For machines with 64-bit CPUs and a 64-bit kernel, a 64-bit bootloader can be
used, and for that we need a 64-bit boot protocol.

In the 64-bit boot protocol, the first step in loading a Linux kernel
should be to set up the boot parameters (struct boot_params,
traditionally known as "zero page"). The memory for struct boot_params
may be allocated below or above 4G and should be initialized to all zero.
Then the setup header, starting at offset 0x01f1 of the kernel image, should
be loaded into struct boot_params and examined. The end of the setup header
can be calculated as follows:

0x0202 + byte value at offset 0x0201

In addition to reading/modifying/writing the setup header of struct
boot_params as in the 16-bit boot protocol, the boot loader should
also fill in the additional fields of struct boot_params as
described in zero-page.txt.

After setting up struct boot_params, the boot loader can load the
64-bit kernel in the same way as in the 16-bit boot protocol, but the
kernel may be loaded above 4G.

In the 64-bit boot protocol, the kernel is started by jumping to the
64-bit kernel entry point, which is the start address of the loaded
64-bit kernel plus 0x200.

At entry, the CPU must be in 64-bit mode with paging enabled.
The range of setup_header.init_size bytes from the start address of the
loaded kernel, the zero page, and the command line buffer must be identity
mapped; a GDT must be loaded with the descriptors for selectors
__BOOT_CS(0x10) and __BOOT_DS(0x18); both descriptors must be 4G flat
segments; __BOOT_CS must have execute/read permission, and __BOOT_DS
must have read/write permission; CS must be __BOOT_CS and DS, ES, SS
must be __BOOT_DS; interrupts must be disabled; %rsi must hold the base
address of the struct boot_params.
---
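
As a rough illustration of the two address computations in the draft text
above: this is only a sketch, not bootloader code. The function names and
the idea of holding the kernel image in a plain byte buffer are assumptions
made for the example; only the offsets 0x0201, 0x0202 and 0x200 come from
the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the entry point relative to the loaded 64-bit kernel. */
#define ENTRY64_OFFSET 0x200u

/*
 * End of the setup header: 0x0202 plus the byte value at offset 0x0201
 * of the kernel image (hypothetical helper name).
 */
static size_t setup_header_end(const uint8_t *image, size_t image_len)
{
    assert(image != NULL && image_len > 0x0201);
    return 0x0202u + image[0x0201];
}

/*
 * 64-bit entry point: start address of the loaded kernel plus 0x200.
 * The load address may be above 4G, hence the 64-bit type.
 */
static uint64_t entry64_address(uint64_t load_addr)
{
    return load_addr + ENTRY64_OFFSET;
}
```

For example, with the byte at 0x0201 equal to 0x66 the setup header ends at
0x0268, and a kernel loaded at 0x100000000 is entered at 0x100000200.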

[PATCH V4 0/3 RESEND] MCE: fix an error of mce_bad_pages statistics

2012-12-13 Thread Xishi Qiu

If we use
$ echo paddr > /sys/devices/system/memory/soft_offline_page
to offline a *free* page, the value of mce_bad_pages is incremented and the
page gets the HWPoison flag set, but it is still managed by the page buddy
allocator.

$ cat /proc/meminfo | grep HardwareCorrupted
shows the value.

If we offline the same page again, the value of mce_bad_pages is incremented
*again*, which means the value is now incorrect. This assumes the page is
still free during this short time.
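
The double count can be reproduced in a small user-space model. This is a
sketch only: struct fake_page, bad_pages and both function names are
stand-ins invented for the example, not kernel code, and the migration of
in-use pages is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
    bool hwpoison;  /* models PageHWPoison() */
    bool is_free;   /* models is_free_buddy_page() */
};

static long bad_pages;  /* models the mce_bad_pages counter */

/* Old flow: a free page goes straight to "done" and is counted each time. */
static int soft_offline_old(struct fake_page *p)
{
    assert(p != NULL);
    if (p->is_free)
        goto done;
    /* An in-use page would be migrated here; omitted in this model. */
done:
    bad_pages++;
    p->hwpoison = true;
    return 0;
}

/* Fixed flow: reject an already-poisoned page before touching the counter. */
static int soft_offline_fixed(struct fake_page *p)
{
    assert(p != NULL);
    if (p->hwpoison)
        return -EBUSY;
    bad_pages++;
    p->hwpoison = true;
    return 0;
}
```

Calling soft_offline_old() twice on the same free page leaves the counter
at 2 for a single bad page; with the early check the second call returns
-EBUSY and the counter stays at 1.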

soft_offline_page()
  get_any_page()
    "else if (is_free_buddy_page(p))" branch return 0
  "goto done";
  "atomic_long_add(1, &mce_bad_pages);"

Changelog:
V4:
-use num_poisoned_pages instead of mce_bad_pages
-remove page lock
V3:
-add page lock when set HWPoison flag
-adjust the function structure
V2 and V1:
-fix the error

Xishi Qiu (3):
  move-poisoned-page-check-at-the-beginning-of-the-function
  fix-function-structure
  use-num_poisoned_pages-instead-of-mce_bad_pages

 fs/proc/meminfo.c   |2 +-
 include/linux/mm.h  |2 +-
 mm/memory-failure.c |   76 ++-
 3 files changed, 41 insertions(+), 39 deletions(-)



[PATCH V4 2/3 RESEND] MCE: do code refactor of soft_offline_page

2012-12-13 Thread Xishi Qiu

There are too many return points randomly intermingled with some
"goto done" return points. So adjust the function structure to use two
exit labels: one for the success path, the other for the failure path.
Also use atomic_long_inc instead of atomic_long_add.

Signed-off-by: Xishi Qiu 
Signed-off-by: Jiang Liu 
Suggested-by: Andrew Morton 
---
 mm/memory-failure.c |   34 ++++++++++++++++++++--------------
 1 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3a8b4b2..e513a7b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1421,12 +1421,13 @@ static int soft_offline_huge_page(struct page *page, int flags)

if (PageHWPoison(hpage)) {
pr_info("soft offline: %#lx hugepage already poisoned\n", pfn);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto out;
}

ret = get_any_page(page, pfn, flags);
if (ret < 0)
-   return ret;
+   goto out;
if (ret == 0)
goto done;

@@ -1437,14 +1438,14 @@ static int soft_offline_huge_page(struct page *page, int flags)
if (ret) {
pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
pfn, ret, page->flags);
-   return ret;
+   goto out;
}
 done:
/* keep elevated page count for bad page */
atomic_long_add(1 << compound_trans_order(hpage), &mce_bad_pages);
set_page_hwpoison_huge_page(hpage);
dequeue_hwpoisoned_huge_page(hpage);
-
+out:
return ret;
 }

@@ -1476,24 +1477,28 @@ int soft_offline_page(struct page *page, int flags)
unsigned long pfn = page_to_pfn(page);
struct page *hpage = compound_trans_head(page);

-   if (PageHuge(page))
-   return soft_offline_huge_page(page, flags);
+   if (PageHuge(page)) {
+   ret = soft_offline_huge_page(page, flags);
+   goto out;
+   }
if (PageTransHuge(hpage)) {
if (PageAnon(hpage) && unlikely(split_huge_page(hpage))) {
pr_info("soft offline: %#lx: failed to split THP\n",
pfn);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto out;
}
}

if (PageHWPoison(page)) {
pr_info("soft offline: %#lx page already poisoned\n", pfn);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto out;
}

ret = get_any_page(page, pfn, flags);
if (ret < 0)
-   return ret;
+   goto out;
if (ret == 0)
goto done;

@@ -1512,14 +1517,15 @@ int soft_offline_page(struct page *page, int flags)
 */
ret = get_any_page(page, pfn, 0);
if (ret < 0)
-   return ret;
+   goto out;
if (ret == 0)
goto done;
}
if (!PageLRU(page)) {
pr_info("soft_offline: %#lx: unknown non LRU page type %lx\n",
pfn, page->flags);
-   return -EIO;
+   ret = -EIO;
+   goto out;
}

/*
@@ -1574,12 +1580,12 @@ int soft_offline_page(struct page *page, int flags)
pfn, ret, page_count(page), page->flags);
}
if (ret)
-   return ret;
+   goto out;

 done:
/* keep elevated page count for bad page */
-   atomic_long_add(1, &mce_bad_pages);
+   atomic_long_inc(&mce_bad_pages);
SetPageHWPoison(page);
-
+out:
return ret;
 }
-- 
1.7.1
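
The control flow that the patch above converges on can be sketched outside
the kernel as follows. This is an illustrative model under stated
assumptions: struct fake_page, fake_get_page(), poisoned_count and
soft_offline_sketch() are hypothetical names, and page migration is reduced
to a single assignment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
    bool hwpoison;  /* models PageHWPoison() */
    bool lru;       /* models PageLRU() */
};

static long poisoned_count;  /* models the bad-page counter */

/* Models get_any_page(): <0 on error, 0 for a free page, 1 for an in-use page. */
static int fake_get_page(int state)
{
    return state;
}

static int soft_offline_sketch(struct fake_page *p, int state)
{
    int ret;

    assert(p != NULL);

    if (p->hwpoison) {
        ret = -EBUSY;
        goto out;           /* failure path */
    }

    ret = fake_get_page(state);
    if (ret < 0)
        goto out;           /* failure path */
    if (ret == 0)
        goto done;          /* free page: nothing to migrate */

    if (!p->lru) {
        ret = -EIO;
        goto out;           /* failure path */
    }
    ret = 0;                /* pretend the migration succeeded */

done:
    /* success path: account for the page exactly once */
    poisoned_count++;
    p->hwpoison = true;
out:
    return ret;
}
```

Every failure sets ret and jumps to "out", so the accounting under "done"
is reached only on success and there is a single return statement.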




Re: [PATCH 1/3] timekeeping: Add persistent_clock_exist flag

2012-12-13 Thread Feng Tang

On Thu, Dec 13, 2012 at 06:00:23PM -0800, John Stultz wrote:
> On 12/13/2012 05:37 PM, Feng Tang wrote:
> >On Thu, Dec 13, 2012 at 05:20:36PM -0800, John Stultz wrote:
> >>On 12/12/2012 06:05 PM, Feng Tang wrote:
> >>>In the current kernel, there are several places which need to check
> >>>whether there is a persistent clock for the platform. The current check
> >>>is done by calling read_persistent_clock() and validating the
> >>>return value.
> >>>
> >>>Add such a flag to make the code more readable and call read_persistent_clock()
> >>>only once for all the checks.
> >>Sorry.. What is the actual benefit of this patch set?   (Usually with
> >>changelogs it's better to explain why you're doing something, rather
> >>than just what you're doing.)
> >The main benefit is not bothering to do the rtc_resume and rtc_suspend work
> >if a persistent clock exists. The current RTC suspend/resume code first does
> >a lot of time calculation and compensation work, and then calls
> >timekeeping_inject_sleeptime(), which just returns on platforms with a
> >persistent clock. What I did in this patchset is to put the check at
> >the start; I also save the persistent_clock_exist flag for all possible
> >checks after timekeeping_init().
> 
> CC'ing Jason as his recent patch is conceptually connected here.
> 
> Ok, Feng, so your patch set is a suspend/resume optimization for the
> case where the architecture has a read_persistent_clock()
> implementation, but the kernel config has also the rtc
> HCTOSYS_DEVICE set, right?

Exactly! Sorry that I didn't make it clear.

> 
> So we basically short-cut the rtc's HCTOSYS_DEVICE suspend/resume
> logic, likely to speed up suspend/resume.
> 
> So per Jason's related patch, he's made the point that the
> persistent_clock and RTC class functionality are basically exclusive
> (well, in his case, he said this with respect to updating the RTC,
> not reading it - I don't mean to put words in his mouth - Please do
> correct me here Jason. :).  In other words, we probably should avoid
> configurations where both the rtc hctosys and persistent_clock
> interfaces are both active.

Yes, I agree these 2 should be exclusive.

> 
> So my thought here is that this same behavioral change could be made
> via Kconfig constraints rather than extra run-time conditionals.
> Basically we add a HAS_PERSISTENT_CLOCK, that architectures select
> if they want to use the read/update_persistent_clock calls. Then we
> make the HCTOSYS option be dependent on !HAS_PERSISTENT_CLOCK. This
> way we avoid having configs where there are conflicting paths that
> we choose from.

Sounds good to me, and we may need to add HAS_PERSISTENT_CLOCK
to the default config of platforms that implement read_persistent_clock().

One concern is whether a platform that has the read_persistent_clock
capability will also have write_persistent_clock. The answer may be no.

Thanks,
Feng



Re: [PATCH 4/4] block: Optionally snapshot page contents to provide stable pages during write

2012-12-13 Thread Darrick J. Wong

On Thu, Dec 13, 2012 at 05:48:06PM -0800, Andy Lutomirski wrote:
> On 12/13/2012 12:08 AM, Darrick J. Wong wrote:
> > Several complaints have been received regarding long file write latencies when
> > memory pages must be held stable during writeback.  Since it might not be
> > acceptable to stall programs for the entire duration of a page write (which may
> > take many milliseconds even on good hardware), enable a second strategy wherein
> > pages are snapshotted as part of submit_bio; the snapshot can be held stable
> > while writes continue.
> > 
> > This provides a band-aid to provide stable page writes on jbd without needing
> > to backport the fixed locking scheme in jbd2.  A mount option is added to ext4
> > to allow administrators to enable it there.
> 
> I'm a bit confused as to what it has to do with ext3.  Wouldn't this be
> useful as a mount option everywhere, though?

ext3 requires snapshots; the rest are ok with either strategy.

*If* snapshotting is generally liked, then yes I'll go redo it as a vfs mount
option.

> If this becomes widely used, would it be better to snapshot on
> wait_for_stable_page instead of on io submission?

That really depends on how long you can afford to wait and how much free
memory you have. :)  It's all a big tradeoff between write latency and
consumption of memory pages and bandwidth, and one that I doubt I'm qualified
to make for everyone.
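
To make the tradeoff concrete, here is a user-space sketch of the snapshot
strategy. It is not the code from the patch: struct fake_write, the
function names and the toy checksum are all assumptions for illustration.
The point is only that the copy costs one extra page of memory and a memcpy
per write, and in exchange the writer is never stalled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_PAGE_SIZE 4096

/* Toy checksum standing in for whatever integrity data covers the write. */
static uint32_t checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + buf[i];
    return sum;
}

/* Hypothetical in-flight write: a private copy of the page plus its checksum. */
struct fake_write {
    uint8_t *snapshot;
    uint32_t csum;
};

/* At submission time, copy the page so later modifications cannot reach it. */
static int submit_with_snapshot(struct fake_write *w, const uint8_t *page)
{
    assert(w != NULL && page != NULL);
    w->snapshot = malloc(FAKE_PAGE_SIZE);
    if (!w->snapshot)
        return -ENOMEM;
    memcpy(w->snapshot, page, FAKE_PAGE_SIZE);
    w->csum = checksum(w->snapshot, FAKE_PAGE_SIZE);
    return 0;
}

/* At completion, the data that went out still matches its checksum. */
static int complete_write(struct fake_write *w)
{
    int ok;

    assert(w != NULL && w->snapshot != NULL);
    ok = checksum(w->snapshot, FAKE_PAGE_SIZE) == w->csum;
    free(w->snapshot);
    w->snapshot = NULL;
    return ok;
}
```

A writer may change the original page between submit_with_snapshot() and
complete_write(); the snapshot, and therefore the checksum, is unaffected.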

> FWIW, I'm about to pound pretty hard on this whole patchset on a box
> that doesn't need stable pages.  I'll let you know how it goes.

Yay!

--D

Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-13 Thread Andy Lutomirski

On Thu, Dec 13, 2012 at 5:49 PM, H. Peter Anvin  wrote:
> On 12/13/2012 05:42 PM, Andy Lutomirski wrote:
>>
>> The 64-bit/x32 case is currently very simple and fast because it uses
>> absolute addressing.  Admittedly, pcrel references are free, so
>> changing this wouldn't cost much.  For native, it'll be slower, but
>> maybe no one cares.  I seem to care about this more than anyone else,
>> and I don't use 32 bit code. :)
>>
>
> pcrel is actually cheaper than absolute addressing in 64-bit mode.
>
>> The benefit of switching is that the vdso code could be the same in
>> all three cases.  (Actually, it's even better than that.  All of the
>> VVAR magic could be the same in the vdso and the kernel -- the kernel
>> linker script would just have to have an appropriate symbol to see the
>> appropriate mapping.)
>>
>>
>> This:
>>
>> __attribute__((visibility("hidden"))) int foo;
>>
>> int get_foo(void)
>> {
>>   return foo;
>> }
>>
>> generates a rip-relative access on 64 bits and GOTOFF on 32 bits.
>>
>> The only reason I didn't use a real symbol in the first place is
>> because I couldn't figure out how to get gcc to emit an absolute
>> relocation in pic code.
>
> Well, then, we wouldn't need to do that... this is starting to sound
> like a significant win.

How will this avoid breaking checkpoint/restore in userspace?  If the
vdso is not just plain old code, criu presumably needs to know about
it.  Should there be an arch_prctl(ARCH_MAP_VDSO, addr) to create a
vdso mapping somewhere?

--Andy

Re: [RFC PATCH 02/11] drivers/base: Add hotplug framework code

2012-12-13 Thread Toshi Kani

On Thu, 2012-12-13 at 10:24 -0800, Greg KH wrote:
> On Thu, Dec 13, 2012 at 09:30:51AM -0700, Toshi Kani wrote:
> > On Thu, 2012-12-13 at 04:24 +, Greg KH wrote:
> > > On Wed, Dec 12, 2012 at 09:02:45PM -0700, Toshi Kani wrote:
> > > > On Wed, 2012-12-12 at 15:54 -0800, Greg KH wrote:
> > > > > On Wed, Dec 12, 2012 at 04:17:14PM -0700, Toshi Kani wrote:
> > > > > > Added hotplug.c, which is the hotplug framework code.
> > > > > 
> > > > > Again, better naming please.
> > > > 
> > > > Yes, I will change it to be more specific, something like
> > > > "sys_hotplug.c".
> > > 
> > > Ugh, what's wrong with just a simple "system_bus.c" or something like
> > > that, and then put all of the needed system bus logic in there and tie
> > > the cpus and other sysdev code into that?
> > 
> > The issue is that the framework does not provide the system bus
> > structure.  This is because the system bus structure is not used for CPU
> > and memory initialization at boot (as I explained in my other email).
> 
> I understand, please fix that and then you will not have these issues :)
> 
> > The framework manages the calling sequence of hotplug operations, which
> > is similar to the boot sequence managed by start_kernel(),
> > kernel_init(), do_initcalls(), etc.  In such sense, this file might not
> > be a good fit for drivers/base, but I could not find a better place for
> > it.
> 
> Having "similar but slightly different" isn't a good way to do things,
> and I think you are trying to solve that problem here, so converting
> everything to use the driver model properly will solve these issues for
> you, right?
> 
> I _really_ don't want to see yet-another-way-to-do-things be created at
> all, unless it really really really is special and different for some
> reason.  So far, I have yet to be convinced, especially given that your
> reasoning for doing this seems to be "to do it correctly would be too
> much work so I created another interface".  That isn't going to fly,
> sorry.

Let's continue the discussion on the other thread, since I copied the s390
and ppc folks on that one.

Thanks,
-Toshi


Re: DMAR and DRHD errors[DMAR:[fault reason 06] PTE Read access is not set] Vt-d & intel_iommu

2012-12-13 Thread Jason Gao

On Fri, Dec 14, 2012 at 12:23 AM, Alex Williamson
 wrote:
>
> Device 03:00.0 is your raid controller:
>
> 03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
>
> For some reason it's trying to read from ffe65000, ffe8a000, ffe89000,
> ffe86000, ffe87000, ffe84000.  Those are in reserved memory regions, so
> it's not reading an OS allocated buffer, which probably means it's some
> kind of side-band communication with a management controller.  I'd guess
> it's a BIOS bug and there should be an RMRR covering those accesses.
> Thanks,

First of all, I want to know whether I can ignore these errors on the
production server, and whether these errors may affect the system.

By the way, when I removed "intel_iommu=on" from /etc/grub.conf, no
DMAR-related errors occurred.

Strangely, three other Dell R710 servers with the same BIOS
version (6.3.0) and the same kernel (2.6.32-279.14.1 on RHEL 6u3/CentOS 6u3)
do not show these errors.

Does anyone have any idea about this?

thanks

Re: [PATCH 1/3] timekeeping: Add persistent_clock_exist flag

2012-12-13 Thread John Stultz


On 12/13/2012 05:37 PM, Feng Tang wrote:
> On Thu, Dec 13, 2012 at 05:20:36PM -0800, John Stultz wrote:
>> On 12/12/2012 06:05 PM, Feng Tang wrote:
>>> In the current kernel, there are several places which need to check
>>> whether there is a persistent clock for the platform. The current check
>>> is done by calling read_persistent_clock() and validating the
>>> return value.
>>>
>>> Add such a flag to make the code more readable and call read_persistent_clock()
>>> only once for all the checks.
>> Sorry.. What is the actual benefit of this patch set?   (Usually with
>> changelogs it's better to explain why you're doing something, rather
>> than just what you're doing.)
> The main benefit is not bothering to do the rtc_resume and rtc_suspend work
> if a persistent clock exists. The current RTC suspend/resume code first does
> a lot of time calculation and compensation work, and then calls
> timekeeping_inject_sleeptime(), which just returns on platforms with a
> persistent clock. What I did in this patchset is to put the check at
> the start; I also save the persistent_clock_exist flag for all possible
> checks after timekeeping_init().


CC'ing Jason as his recent patch is conceptually connected here.

Ok, Feng, so your patch set is a suspend/resume optimization for the 
case where the architecture has a read_persistent_clock() 
implementation, but the kernel config has also the rtc HCTOSYS_DEVICE 
set, right?


So we basically short-cut the rtc's HCTOSYS_DEVICE suspend/resume logic, 
likely to speed up suspend/resume.


So per Jason's related patch, he's made the point that the 
persistent_clock and RTC class functionality are basically exclusive 
(well, in his case, he said this with respect to updating the RTC, not 
reading it - I don't mean to put words in his mouth - Please do correct 
me here Jason. :).  In other words, we probably should avoid 
configurations where both the rtc hctosys and persistent_clock 
interfaces are both active.


So my thought here is that this same behavioral change could be made via 
Kconfig constraints rather than extra run-time conditionals. Basically 
we add a HAS_PERSISTENT_CLOCK, that architectures select if they want to 
use the read/update_persistent_clock calls. Then we make the HCTOSYS 
option be dependent on !HAS_PERSISTENT_CLOCK. This way we avoid having 
configs where there are conflicting paths that we choose from.
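
For comparison with the Kconfig approach, the run-time flag described in
the quoted patch changelog can be modelled roughly like this. It is a
user-space sketch: every fake_* name, the platform_has_clock knob and the
read counter are assumptions for illustration, and only the idea (read the
persistent clock once, remember whether it returned a non-zero time, and
bail out early in the RTC suspend path) comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_timespec {
    long tv_sec;
    long tv_nsec;
};

static int persistent_reads;            /* how often the clock was read */
static bool platform_has_clock = true;  /* knob: does the platform have one? */
static bool persistent_clock_exist;     /* the cached flag */

/* Models read_persistent_clock(): returns zero time if there is no clock. */
static void fake_read_persistent_clock(struct fake_timespec *ts)
{
    assert(ts != NULL);
    persistent_reads++;
    ts->tv_sec = platform_has_clock ? 1355443200L : 0;
    ts->tv_nsec = 0;
}

/* Read the clock once at init and remember whether it gave a valid time. */
static void fake_timekeeping_init(void)
{
    struct fake_timespec now;

    fake_read_persistent_clock(&now);
    persistent_clock_exist = (now.tv_sec != 0 || now.tv_nsec != 0);
}

/* Returns 1 if the RTC compensation work ran, 0 if it was skipped. */
static int fake_rtc_suspend(void)
{
    if (persistent_clock_exist)
        return 0;  /* timekeeping handles sleep time; skip the RTC work */
    /* time calculation and compensation would happen here */
    return 1;
}
```

With a Kconfig dependency instead, the early return disappears because the
RTC path would not be built at all on such platforms.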


thanks
-john

Re: [RFC PATCH 00/11] Hot-plug and Online/Offline framework

2012-12-13 Thread Toshi Kani

On Thu, 2012-12-13 at 10:30 -0800, Greg KH wrote:
> On Thu, Dec 13, 2012 at 09:03:54AM -0700, Toshi Kani wrote:
> > On Thu, 2012-12-13 at 04:16 +, Greg KH wrote:
> > > On Wed, Dec 12, 2012 at 08:37:44PM -0700, Toshi Kani wrote:
> > > > On Wed, 2012-12-12 at 16:55 -0800, Greg KH wrote:
> > > > > On Wed, Dec 12, 2012 at 05:39:36PM -0700, Toshi Kani wrote:
> > > > > > On Wed, 2012-12-12 at 15:56 -0800, Greg KH wrote:
> > > > > > > On Wed, Dec 12, 2012 at 04:17:12PM -0700, Toshi Kani wrote:
> > > > > > > > This patchset is an initial prototype of proposed hot-plug framework
> > > > > > > > for design review.  The hot-plug framework is designed to provide
> > > > > > > > the common framework for hot-plugging and online/offline operations
> > > > > > > > of system devices, such as CPU, Memory and Node.  While this patchset
> > > > > > > > only supports ACPI-based hot-plug operations, the framework itself is
> > > > > > > > designed to be platform-neutral and can support other FW architectures
> > > > > > > > as necessary.
> > > > > > > > 
> > > > > > > > The patchset has not been fully tested yet, esp. for memory hot-plug.
> > > > > > > > Any help for testing will be very appreciated since my test setup
> > > > > > > > is limited.
> > > > > > > > 
> > > > > > > > The patchset is based on the linux-next branch of linux-pm.git tree.
> > > > > > > > 
> > > > > > > > Overview of the Framework
> > > > > > > > =========================
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Why all the new framework, doesn't the existing bus infrastructure
> > > > > > > provide everything you need here?  Shouldn't you just be putting your
> > > > > > > cpus and memory sticks on a bus and handle stuff that way?  What makes
> > > > > > > these types of devices so unique from all other devices that Linux has
> > > > > > > been handling in a dynamic manner (i.e. hotplugging them) for many many
> > > > > > > years?
> > > > > > > 
> > > > > > > Why are you reinventing the wheel?
> > > > > > 
> > > > > > Good question.  Yes, USB and PCI hotplug operate based on their bus
> > > > > > structures.  USB and PCI cards only work under USB and PCI bus
> > > > > > controllers.  So, their framework can be composed within the bus
> > > > > > structures as you pointed out.
> > > > > > 
> > > > > > However, system devices such as CPU and memory do not have their standard
> > > > > > bus.  ACPI allows these system devices to be enumerated, but it does not
> > > > > > make ACPI the HW bus hierarchy for CPU and memory, unlike PCI and
> > > > > > USB.  Therefore, CPU and memory modules manage CPU and memory outside of
> > > > > > ACPI.  This makes sense because CPU and memory can be used without ACPI.
> > > > > > 
> > > > > > This leads us to an issue when we try to manage system device hotplug
> > > > > > within ACPI, because ACPI does not control everything.  This patchset
> > > > > > provides a common hotplug framework for system devices, in which both ACPI
> > > > > > and non-ACPI modules (i.e. CPU and memory modules) can participate and
> > > > > > are coordinated for their hotplug operations.  This is analogous to the
> > > > > > boot-up sequence, in which ACPI and non-ACPI modules can participate to
> > > > > > enable CPU and memory.
> > > > > 
> > > > > Then create a "virtual" bus and put the devices you wish to control on
> > > > > that.  That is what the "system bus" devices were supposed to be, it's
> > > > > about time someone took that code and got it all working properly in
> > > > > this way, that is why it was created oh so long ago.
> > > > 
> > > > It may be the ideal, but it will take us great effort to make such
> > > > things to happen based on where we are now.  It is going to be a long
> > > > way.  I believe the first step is to make the boot-up flow and hot-plug
> > > > flow consistent for system devices.  This is what this patchset is
> > > > trying to do.
> > > 
> > > If you use the system "bus" for this, the "flow" will be identical, that
> > > is what the driver core provides for you.  I don't see why you need to
> > > implement something that sits next to it and not just use what we
> > > already have here.
> > 
> > Here is very brief boot-up flow.  
> > 
> > start_kernel()
> >   boot_cpu_init()                      // init cpu0
> >   setup_arch()
> >     x86_init.paging.pagetable_init()   // init mem pagetable
> >   :
> > kernel_init()
> >   kernel_init_freeable()
> >     smp_init()                         // init other CPUs
> >       :
> >     do_basic_setup()
> >       driver_init()
> >         cpu_dev_init()                 // build system/cpu tree
> >         memory_dev_init()              // build system/memory tree
> >       do_initcalls()
> >

Re: [RESEND PATCH v3 1/6] watchdog: omap_wdt: convert to new watchdog core

2012-12-13 Thread Aaro Koskinen

Hi,

On Fri, Dec 14, 2012 at 02:23:36AM +0100, Sebastian Reichel wrote:
> On Mon, Nov 12, 2012 at 02:47:03PM -0800, Tony Lindgren wrote:
> > * Aaro Koskinen  [121112 10:49]:
> > > Convert omap_wdt to new watchdog core. On OMAP boards, there are usually
> > > multiple watchdogs. Since the new watchdog core supports multiple
> > > watchdogs, all watchdog drivers used on OMAP should be converted.
> > > 
> > > The legacy watchdog device node is still created, so this should not
> > > break existing users.
> > > 
> > > Signed-off-by: Aaro Koskinen 
> > > Tested-by: Jarkko Nikula 
> > > Tested-by: Lokesh Vutla 
> > > Cc: Wim Van Sebroeck 
> > 
> > Wim, looks like these will cause merge conflicts with what we
> > have already queued in omap-for-v3.8/cleanup-prcm as patch
> > "watchdog: OMAP: use standard GETBOOTSTATUS interface; use
> > platform_data fn ptr" along with other ARM multiplatform
> > related clean up. If these look ackable to you, I can queue
> > these that's OK to you.
> 
> What's the status of this patchset? If I'm not mistaken it is
> neither included in linux-omap, nor in linux-watchdog-next.

Since it's not in -next, then I guess it won't appear in 3.8. Once
the 3.8-rc1 is out, I will continue rebasing/retesting/resending the
patch set...

A.

Re: [PATCH 0/18] sched: simplified fork, enable load average into LB and power awareness scheduling

2012-12-13 Thread Alex Shi

On 12/13/2012 07:35 PM, Borislav Petkov wrote:
> On Thu, Dec 13, 2012 at 11:07:43AM +0800, Alex Shi wrote:
>>>> now, on the other hand, if you have two threads of a process that
>>>> share a bunch of data structures, and you'd spread these over 2
>>>> sockets, you end up bouncing data between the two sockets a lot,
>>>> running inefficient --> bad for power.
>>>
>>> Yeah, that should be addressed by the NUMA patches people are
>>> working on right now.
>>
>> Yes, as to balance/powersaving policy, we can tight pack tasks
>> firstly, then NUMA balancing will make memory follow us.
>>
>> BTW, NUMA balancing is more related with page in memory. not LLC.
> 
> Sure, let's look at the worst and best cases:
> 
> * worst case: you have memory shared by multiple threads on one node
> *and* working set doesn't fit in LLC.
> 
> Here, if you pack threads tightly only on one node, you still suffer the
> working set kicking out parts of itself out of LLC.
> 
> If you spread threads around, you still cannot avoid the LLC thrashing
> because the LLC of the node containing the shared memory needs to cache
> all those transactions. *In* *addition*, you get the cross-node traffic
> because the shared pages are on the first node.
> 
> Major suckage.
> 
> Does it matter? I don't know. It can be decided on a case-by-case basis.
> If people care about singlethread perf, they would likely want to spread
> around and buy in the cross-node traffic.
> 
> If they care for power, then maybe they don't want to turn on the second
> socket yet.
> 
> * the optimal case is where memory follows threads and gets spread
> around such that LLC doesn't get thrashed and cross-node traffic gets
> avoided.
> 
> Now, you can think of all those other scenarios in between :-/

You are right. Thanks for the explanation! :)

Actually, what I wanted to say is that NUMA balancing targets pages in
different nodes' memory, but of course it may also improve LLC performance.
> 
> Thanks.
> 


Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-13 Thread H. Peter Anvin

On 12/13/2012 05:42 PM, Andy Lutomirski wrote:
> 
> The 64-bit/x32 case is currently very simple and fast because it uses
> absolute addressing.  Admittedly, pcrel references are free, so
> changing this wouldn't cost much.  For native, it'll be slower, but
> maybe no one cares.  I seem to care about this more than anyone else,
> and I don't use 32 bit code. :)
> 

pcrel is actually cheaper than absolute addressing in 64-bit mode.

> The benefit of switching is that the vdso code could be the same in
> all three cases.  (Actually, it's even better than that.  All of the
> VVAR magic could be the same in the vdso and the kernel -- the kernel
> linker script would just have to have an appropriate symbol to see the
> appropriate mapping.)
> 
> 
> This:
> 
> __attribute__((visibility("hidden"))) int foo;
> 
> int get_foo(void)
> {
>   return foo;
> }
> 
> generates a rip-relative access on 64 bits and GOTOFF on 32 bits.
> 
> The only reason I didn't use a real symbol in the first place is
> because I couldn't figure out how to get gcc to emit an absolute
> relocation in pic code.

Well, then, we wouldn't need to do that... this is starting to sound
like a significant win.

-hpa


Re: [PATCH 4/4] block: Optionally snapshot page contents to provide stable pages during write

2012-12-13 Thread Andy Lutomirski

On 12/13/2012 12:08 AM, Darrick J. Wong wrote:
> Several complaints have been received regarding long file write latencies when
> memory pages must be held stable during writeback.  Since it might not be
> acceptable to stall programs for the entire duration of a page write (which may
> take many milliseconds even on good hardware), enable a second strategy wherein
> pages are snapshotted as part of submit_bio; the snapshot can be held stable
> while writes continue.
> 
> This provides a band-aid to provide stable page writes on jbd without needing
> to backport the fixed locking scheme in jbd2.  A mount option is added to ext4
> to allow administrators to enable it there.
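
The snapshot strategy described in the quoted text can be illustrated with a small user-space sketch. This is not the patch's implementation and not kernel code: `struct io_request`, `SNAP_PAGE_SIZE`, `toy_csum()`, `submit_with_snapshot()` and `device_verify()` are all hypothetical names made up for the illustration. It only shows why a private copy taken at submission time stays consistent with its checksum even if the writer keeps modifying the live page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SNAP_PAGE_SIZE 4096	/* assumed page size, for illustration only */

/* Hypothetical stand-in for an in-flight write; not a kernel structure. */
struct io_request {
	uint8_t snapshot[SNAP_PAGE_SIZE];	/* private copy that stays stable */
	uint32_t csum;				/* checksum taken at submission */
};

/* Toy polynomial checksum, used only to make instability observable. */
static uint32_t toy_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return sum;
}

/* Copy the live page at "submission" time and checksum the copy. */
static void submit_with_snapshot(struct io_request *req,
				 const uint8_t *live_page)
{
	assert(req != NULL && live_page != NULL);
	memcpy(req->snapshot, live_page, SNAP_PAGE_SIZE);
	req->csum = toy_csum(req->snapshot, SNAP_PAGE_SIZE);
}

/* The "device" checks the data it was handed against the checksum. */
static int device_verify(const struct io_request *req)
{
	return toy_csum(req->snapshot, SNAP_PAGE_SIZE) == req->csum;
}
```

In this model the writer is never stalled: it can dirty the live page right after `submit_with_snapshot()` returns, and `device_verify()` still succeeds because it only ever sees the copy. The cost is one page copy per submission.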

I'm a bit confused as to what it has to do with ext3.  Wouldn't this be
useful as a mount option everywhere, though?

If this becomes widely used, would it be better to snapshot on
wait_for_stable_page instead of on io submission?

FWIW, I'm about to pound pretty hard on this whole patchset on a box
that doesn't need stable pages.  I'll let you know how it goes.

--Andy

Re: [RESEND PATCH v3 1/6] watchdog: omap_wdt: convert to new watchdog core

2012-12-13 Thread Sebastian Reichel

Hi Tony,

On Thu, Dec 13, 2012 at 05:32:57PM -0800, Tony Lindgren wrote:
> * Sebastian Reichel  [121213 17:26]:
> > What's the status of this patchset? If I'm not mistaken it is
> > neither included in linux-omap, nor in linux-watchdog-next.
> 
> It should be all linux-watchdog-next with what got merged into
> the mainline kernel today. No more arch/arm/*omap*/ dependencies
> AFAIK.

I can see the dependency fixes, but the driver itself is still using
miscdevice instead of the new watchdog core:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=drivers/watchdog/omap_wdt.c;b=master

This patch is not needed for using the watchdog on a multiplatform
kernel, but it is needed for using multiple watchdogs (i.e. twl4030_wdt
and omap_wdt) at the same time. It also simplifies the driver.

-- Sebastian


Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-13 Thread Alex Shi

On 12/13/2012 11:48 PM, Vincent Guittot wrote:
> On 13 December 2012 15:53, Vincent Guittot  wrote:
>> On 13 December 2012 15:25, Alex Shi  wrote:
>>> On 12/13/2012 06:11 PM, Vincent Guittot wrote:
>>>> On 13 December 2012 03:17, Alex Shi  wrote:
>>>>> On 12/12/2012 09:31 PM, Vincent Guittot wrote:
>>>>>> During the creation of sched_domain, we define a pack buddy CPU for each CPU
>>>>>> when one is available. We want to pack at all levels where a group of CPU can
>>>>>> be power gated independently from others.
>>>>>> On a system that can't power gate a group of CPUs independently, the flag is
>>>>>> set at all sched_domain level and the buddy is set to -1. This is the default
>>>>>> behavior.
>>>>>> On a dual clusters / dual cores system which can power gate each core and
>>>>>> cluster independently, the buddy configuration will be :
>>>>>>
>>>>>>       | Cluster 0   | Cluster 1   |
>>>>>>       | CPU0 | CPU1 | CPU2 | CPU3 |
>>>>>> -----------------------------------
>>>>>> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
>>>>>>
>>>>>> Small tasks tend to slip out of the periodic load balance so the best place
>>>>>> to choose to migrate them is during their wake up. The decision is in O(1) as
>>>>>> we only check again one buddy CPU
>>>>>
>>>>> Just have a little worry about the scalability on a big machine, like on
>>>>> a 4 sockets NUMA machine * 8 cores * HT machine, the buddy cpu in whole
>>>>> system need care 64 LCPUs. and in your case cpu0 just care 4 LCPU. That
>>>>> is different on task distribution decision.
>>>>
>>>> The buddy CPU should probably not be the same for all 64 LCPU it
>>>> depends on where it's worth packing small tasks
>>>
>>> Do you have further ideas for buddy cpu on such example?
>>
>> yes, I have several ideas which were not really relevant for small
>> system but could be interesting for larger system
>>
>> We keep the same algorithm in a socket but we could either use another
>> LCPU in the targeted socket (conf0) or chain the socket (conf1)
>> instead of packing directly in one LCPU
>>
>> The scheme below tries to summarize the idea:
>>
>> Socket      | socket 0 | socket 1   | socket 2   | socket 3   |
>> LCPU        | 0 | 1-15 | 16 | 17-31 | 32 | 33-47 | 48 | 49-63 |
>> buddy conf0 | 0 | 0    | 1  | 16    | 2  | 32    | 3  | 48    |
>> buddy conf1 | 0 | 0    | 0  | 16    | 16 | 32    | 32 | 48    |
>> buddy conf2 | 0 | 0    | 16 | 16    | 32 | 32    | 48 | 48    |
>>
>> But, I don't know how this can interact with NUMA load balance and the
>> better might be to use conf3.
> 
> I mean conf2 not conf3

So it has 4 levels (0/16/32/) for socket 3 and 0 levels for socket 0;
it is unbalanced across sockets.

And the ground level has just one buddy for 16 LCPUs (8 cores), which is
not a good design. Consider my previous examples: if there are 4 or 8
tasks in one socket, you have just 2 choices: spread them across all
cores, or pack them into one LCPU. Actually, moving them into just 2 or
4 cores may be a better solution, but the design misses this.

Obviously, more and more cores is the trend for every kind of CPU, and
the buddy system seems hard pressed to keep up with it.
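
To make the imbalance concrete, here is a minimal sketch of the chained layout, assuming the "buddy conf1" row of the table quoted above (4 sockets of 16 LCPUs). The helper names `buddy_conf1()` and `buddy_chain_depth()` and the two constants are hypothetical and do not come from the patch. The patch itself only checks one buddy per wakeup; the depth computed here is just the length of the chain a task would have to follow to reach the CPU that is its own buddy.

```c
#include <assert.h>

#define LCPUS_PER_SOCKET 16	/* assumed: 8 cores * HT per socket */
#define NR_LCPUS 64		/* assumed: 4 sockets */

/*
 * Buddy of @cpu under the quoted "conf1" row: a non-leader LCPU packs
 * onto the first LCPU of its socket, a socket leader chains to the
 * leader of the previous socket, and LCPU 0 is its own buddy.
 */
static int buddy_conf1(int cpu)
{
	int leader;

	assert(cpu >= 0 && cpu < NR_LCPUS);
	leader = cpu - cpu % LCPUS_PER_SOCKET;
	if (cpu != leader)
		return leader;
	return leader == 0 ? 0 : leader - LCPUS_PER_SOCKET;
}

/* Number of buddy hops from @cpu to the CPU that is its own buddy. */
static int buddy_chain_depth(int cpu)
{
	int hops = 0;

	while (buddy_conf1(cpu) != cpu) {
		cpu = buddy_conf1(cpu);
		hops++;
	}
	return hops;
}
```

Under these assumptions LCPU 49 follows 49 -> 48 -> 32 -> 16 -> 0, four hops, while any non-leader LCPU in socket 0 is a single hop from LCPU 0, which is the unbalanced depth described above.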



Re: [RESEND PATCH v3 1/6] watchdog: omap_wdt: convert to new watchdog core

2012-12-13 Thread Sebastian Reichel

On Mon, Nov 12, 2012 at 02:47:03PM -0800, Tony Lindgren wrote:
> * Aaro Koskinen  [121112 10:49]:
> > Convert omap_wdt to new watchdog core. On OMAP boards, there are usually
> > multiple watchdogs. Since the new watchdog core supports multiple
> > watchdogs, all watchdog drivers used on OMAP should be converted.
> > 
> > The legacy watchdog device node is still created, so this should not
> > break existing users.
> > 
> > Signed-off-by: Aaro Koskinen 
> > Tested-by: Jarkko Nikula 
> > Tested-by: Lokesh Vutla 
> > Cc: Wim Van Sebroeck 
> 
> Wim, looks like these will cause merge conflicts with what we
> have already queued in omap-for-v3.8/cleanup-prcm as patch
> "watchdog: OMAP: use standard GETBOOTSTATUS interface; use
> platform_data fn ptr" along with other ARM multiplatform
> related clean up. If these look ackable to you, I can queue
> these if that's OK with you.

What's the status of this patchset? If I'm not mistaken it is
neither included in linux-omap, nor in linux-watchdog-next.

-- Sebastian



Re: [PATCH] Add VDSO time function support for x86 32-bit kernel

2012-12-13 Thread Andy Lutomirski

On Thu, Dec 13, 2012 at 5:32 PM, H. Peter Anvin  wrote:
> On 12/13/2012 04:20 PM, Andy Lutomirski wrote:
>>
>>
>> What you could do is probably arrange (using some linker script magic)
>> for a symbol to exist that points at the page *before* the vdso
>> starts.  Then just map the vvar page there when starting a compat
>> task.  You could then address it using a normal symbol reference by
>> tweaking the vvar macro.  (I think this'll access it via the GOT.)
>> Alternatively, you could just pick an absolute address -- the page is
>> NX, so you don't really need to worry about randomization.
>>
>
> The best would probably be if we could generate GOTOFF references rather
> than GOT, which again probably means making the vvar page part of the vdso
> proper.  Then, when building the list of vdso pages, we need to substitute
> in the vvar page in the proper place.
>
> I have to admit to kind of thinking this might work well even for the
> 64-bit/x32 case, and perhaps even for native 32 bits.

The 64-bit/x32 case is currently very simple and fast because it uses
absolute addressing.  Admittedly, pcrel references are free, so
changing this wouldn't cost much.  For native, it'll be slower, but
maybe no one cares.  I seem to care about this more than anyone else,
and I don't use 32 bit code. :)

The benefit of switching is that the vdso code could be the same in
all three cases.  (Actually, it's even better than that.  All of the
VVAR magic could be the same in the vdso and the kernel -- the kernel
linker script would just have to have an appropriate symbol to see the
appropriate mapping.)

This:

__attribute__((visibility("hidden"))) int foo;

int get_foo(void)
{
  return foo;
}

generates a rip-relative access on 64 bits and GOTOFF on 32 bits.

The only reason I didn't use a real symbol in the first place is
because I couldn't figure out how to get gcc to emit an absolute
relocation in pic code.
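
For comparison, a slightly larger sketch of the same point, with one hidden and one default-visibility variable side by side. The names `vvar_hidden`, `vvar_default`, `get_hidden()` and `get_default()` are made up for illustration. Building it as PIC (for example `gcc -O2 -fPIC -S`, plus `-m32` for the 32-bit case) and reading the assembly should show the difference; the exact output depends on compiler version and flags, so treat the comments as the expected tendency rather than a guarantee.

```c
/*
 * Hidden visibility: the symbol binds within this object, so PIC code
 * can address it relative to the code itself (rip-relative on 64 bits,
 * GOTOFF on 32 bits) without loading its address from the GOT.
 */
__attribute__((visibility("hidden"))) int vvar_hidden = 1;

/*
 * Default visibility: the symbol may be preempted by another module,
 * so PIC code for a shared object typically loads its address from
 * the GOT before dereferencing it.
 */
int vvar_default = 2;

int get_hidden(void)
{
	return vvar_hidden;
}

int get_default(void)
{
	return vvar_default;
}
```

Both accessors return the same values either way; only the relocation type and the number of loads differ, which is what matters for sharing the vvar access code between the vdso and the kernel.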

--Andy
