Re: [PATCH 11/12] drm/nouveau: support GK20A in nouveau_accel_init()

2014-04-15 Thread Alexandre Courbot
On Wed, Mar 26, 2014 at 1:27 PM, Ben Skeggs  wrote:
> On Tue, Mar 25, 2014 at 9:10 AM, Thierry Reding
>  wrote:
>> On Mon, Mar 24, 2014 at 05:42:33PM +0900, Alexandre Courbot wrote:
>>> GK20A does not embed a dedicated COPY engine and thus cannot allocate
>>> the copy channel that nouveau_accel_init() attempts to create. It also
>>> lacks any display hardware, so the creation of a software channel does
>>> not apply either.
>>
>> Perhaps this should be two separate patches?
>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
>>> b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> [...]
>>> + if (device->chipset == 0xea) {
>>> + /* gk20a does not have CE0/CE1 */
>>
>> This would be another good candidate for a feature flag.
> There are ways to query this in a chipset-independent way.  However,
> despite reporting it as an error if no copy engines are available, the
> code should continue on without the channel happily.  Perhaps we can
> just punt the relevant error messages to a debug loglevel for now?

Do you know how to query this in a chipset-independent way? I have
failed to find any information for this.

The code does continue without any issue after reporting the error, so
indeed that check is not strictly necessary. But I was just mimicking
what follows right after:

	if (device->chipset >= 0xa3 &&
	    device->chipset != 0xaa &&
	    device->chipset != 0xac) {
		ret = nouveau_channel_new(drm, &drm->client, NVDRM_DEVICE,
					  NVDRM_CHAN + 1, NvDmaFB, NvDmaTT,
					  &drm->cechan);
		if (ret)
			NV_ERROR(drm, "failed to create ce channel, %d\n", ret);

		arg0 = NvDmaFB;
		arg1 = NvDmaTT;
	} else {
		arg0 = NvDmaFB;
		arg1 = NvDmaTT;
	}

So if we are trying to avoid showing this error for 0xa0 class
devices, why not for NV_E0?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cpufreq, powernv: Fix build failure on UP

2014-04-15 Thread Viresh Kumar
On 16 April 2014 11:18, Srivatsa S. Bhat
 wrote:
> From: Srivatsa S. Bhat 
> Subject: [PATCH] cpufreq, powernv: Fix build failure on UP
>
> Paul Gortmaker reported the following build failure of the powernv cpufreq
> driver on UP configs:
>
> drivers/cpufreq/powernv-cpufreq.c:241:2: error: implicit declaration of
> function 'cpu_sibling_mask' [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
> make[3]: *** [drivers/cpufreq/powernv-cpufreq.o] Error 1
> make[2]: *** [drivers/cpufreq] Error 2
> make[1]: *** [drivers] Error 2
> make: *** [sub-make] Error 2
>
> The trouble here is that cpu_sibling_mask is defined only in <asm/smp.h>,
> and <linux/smp.h> includes <asm/smp.h> only in SMP builds.
>
> So fix this build failure by explicitly including <asm/smp.h> in the driver,
> so that we get the definition of cpu_sibling_mask even in UP configurations.
>
> Reported-by: Paul Gortmaker 
> Signed-off-by: Srivatsa S. Bhat 
> ---
>
>  drivers/cpufreq/powernv-cpufreq.c |1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index 9edccc6..ed1c7e5 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -29,6 +29,7 @@
>
>  #include 
>  #include 

Probably a comment here ?, so that people don't try to remove it in future.

/* Required for cpu_sibling_mask() in UP configurations */

> +#include <asm/smp.h>
>
>  #define POWERNV_MAX_PSTATES	256

Acked-by: Viresh Kumar 


Re: [PATCH 17/19] VFS: set PF_FSTRANS while namespace_sem is held.

2014-04-15 Thread NeilBrown
On Wed, 16 Apr 2014 05:46:18 +0100 Al Viro  wrote:

> On Wed, Apr 16, 2014 at 02:03:37PM +1000, NeilBrown wrote:
> > namespace_sem can be taken while various i_mutex locks are held, so we
> > need to avoid reclaim from blocking on an FS (particularly loop-back
> > NFS).
> 
> I would really prefer to deal with that differently - by explicit change of
> gfp_t arguments of allocators.
> 
> The thing is, namespace_sem is held *only* over allocations, and not a lot
> of them, at that - only mnt_alloc_id(), mnt_alloc_group_id(), alloc_vfsmnt()
> and new_mountpoint().  That is all that is allowed.
> 
> Again, actual work with filesystems (setup, shutdown, remount, pathname
> resolution, etc.) is all done outside of namespace_sem; it's held only
> for manipulations of fs/{namespace,pnode}.c data structures and the only
> reason it isn't a spinlock is that we need to do some allocations.
> 
> So I'd rather slap GFP_NOFS on those few allocations...

So something like this?  I put that into my testing instead.

Thanks,
NeilBrown

diff --git a/fs/namespace.c b/fs/namespace.c
index 83dcd5083dbb..8e103b8c8323 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -103,7 +103,7 @@ static int mnt_alloc_id(struct mount *mnt)
int res;
 
 retry:
-	ida_pre_get(&mnt_id_ida, GFP_KERNEL);
+	ida_pre_get(&mnt_id_ida, GFP_NOFS);
	spin_lock(&mnt_id_lock);
	res = ida_get_new_above(&mnt_id_ida, mnt_id_start, &mnt->mnt_id);
if (!res)
@@ -134,7 +134,7 @@ static int mnt_alloc_group_id(struct mount *mnt)
 {
int res;
 
-	if (!ida_pre_get(&mnt_group_ida, GFP_KERNEL))
+	if (!ida_pre_get(&mnt_group_ida, GFP_NOFS))
return -ENOMEM;
 
	res = ida_get_new_above(&mnt_group_ida,
				mnt_group_start, &mnt->mnt_group_id);
@@ -193,7 +193,7 @@ unsigned int mnt_get_count(struct mount *mnt)
 
 static struct mount *alloc_vfsmnt(const char *name)
 {
-   struct mount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
+   struct mount *mnt = kmem_cache_zalloc(mnt_cache, GFP_NOFS);
if (mnt) {
int err;
 
@@ -202,7 +202,7 @@ static struct mount *alloc_vfsmnt(const char *name)
goto out_free_cache;
 
if (name) {
-   mnt->mnt_devname = kstrdup(name, GFP_KERNEL);
+   mnt->mnt_devname = kstrdup(name, GFP_NOFS);
if (!mnt->mnt_devname)
goto out_free_id;
}
@@ -682,7 +682,7 @@ static struct mountpoint *new_mountpoint(struct dentry 
*dentry)
}
}
 
-   mp = kmalloc(sizeof(struct mountpoint), GFP_KERNEL);
+   mp = kmalloc(sizeof(struct mountpoint), GFP_NOFS);
if (!mp)
return ERR_PTR(-ENOMEM);
 




Re: [PATCH 13/19] MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock.

2014-04-15 Thread Dave Chinner
On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote:
> lockdep reports a locking chain
> 
>   sk_lock-AF_INET --> rtnl_mutex --> pcpu_alloc_mutex
> 
> As sk_lock may be needed to reclaim memory, allowing that
> reclaim while pcpu_alloc_mutex is held can lead to deadlock.
> So set PF_FSTRANS while it is held to avoid the FS reclaim.
> 
> pcpu_alloc_mutex can be taken when rtnl_mutex is held:
> 
> [] pcpu_alloc+0x49/0x960
> [] __alloc_percpu+0xb/0x10
> [] loopback_dev_init+0x17/0x60
> [] register_netdevice+0xec/0x550
> [] register_netdev+0x15/0x30
> 
> Signed-off-by: NeilBrown 

This looks like a workaround to avoid passing a gfp mask around to
describe the context in which the allocation is taking place.
Whether or not that's the right solution, I can't say, but spreading
this "we can turn off all reclaim of filesystem objects" mechanism
all around the kernel doesn't sit well with me...

And, again, PF_FSTRANS looks plainly wrong in this code - it sure
isn't a fs transaction context we are worried about here...


-- 
Dave Chinner
da...@fromorbit.com


[PATCH] cpufreq, powernv: Fix build failure on UP

2014-04-15 Thread Srivatsa S. Bhat
On 04/15/2014 07:41 PM, Paul Gortmaker wrote:
> Hi all,
> 
> This new driver is causing build fails on linux-next for non-SMP.
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/10911507/
> 
> I didn't bisect since there are only two commits in total.  :)
> 
> Looks like some header foo where <asm/smp.h> is not getting included.
> 

Hi Paul,

Thanks a lot for reporting the build failure. Please find the fix
below.
 
Regards,
Srivatsa S. Bhat

---

From: Srivatsa S. Bhat 
Subject: [PATCH] cpufreq, powernv: Fix build failure on UP

Paul Gortmaker reported the following build failure of the powernv cpufreq
driver on UP configs:

drivers/cpufreq/powernv-cpufreq.c:241:2: error: implicit declaration of
function 'cpu_sibling_mask' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[3]: *** [drivers/cpufreq/powernv-cpufreq.o] Error 1
make[2]: *** [drivers/cpufreq] Error 2
make[1]: *** [drivers] Error 2
make: *** [sub-make] Error 2

The trouble here is that cpu_sibling_mask is defined only in <asm/smp.h>,
and <linux/smp.h> includes <asm/smp.h> only in SMP builds.

So fix this build failure by explicitly including <asm/smp.h> in the driver,
so that we get the definition of cpu_sibling_mask even in UP configurations.

Reported-by: Paul Gortmaker 
Signed-off-by: Srivatsa S. Bhat 
---

 drivers/cpufreq/powernv-cpufreq.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 9edccc6..ed1c7e5 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -29,6 +29,7 @@
 
 #include 
 #include 
+#include <asm/smp.h>
 
 #define POWERNV_MAX_PSTATES	256
 



Re: [PATCH 10/19] NET: set PF_FSTRANS while holding sk_lock

2014-04-15 Thread NeilBrown
On Tue, 15 Apr 2014 22:13:46 -0700 Eric Dumazet 
wrote:

> On Wed, 2014-04-16 at 14:03 +1000, NeilBrown wrote:
> > sk_lock can be taken while reclaiming memory (in nfsd for loop-back
> > NFS mounts, and presumably in nfs), and memory can be allocated while
> > holding sk_lock, at least via:
> > 
> >  inet_listen -> inet_csk_listen_start -> reqsk_queue_alloc
> > 
> > So to avoid deadlocks, always set PF_FSTRANS while holding sk_lock.
> > 
> > This deadlock was found by lockdep.
> 
> Wow, this is adding expensive stuff in fast path, only for nfsd :(

Yes, this was probably one part that I was least comfortable about.

> 
> BTW, why should the current->flags should be saved on a socket field,
> and not a current->save_flags. This really looks a thread property, not
> a socket one.
> 
> Why nfsd could not have PF_FSTRANS in its current->flags ?

nfsd does have PF_FSTRANS set in current->flags.  But some other processes
might not.

If any process takes sk_lock, allocates memory, and then blocks in reclaim it
could be waiting for nfsd.  If nfsd waits for that sk_lock, it would cause a
deadlock.

Thinking a bit more carefully, I suspect that any socket that nfsd
created would only ever be locked by nfsd.  If that is the case then the
problem can be resolved entirely within nfsd.  We would need to tell lockdep
that there are two sorts of sk_locks, those which nfsd uses and all the
rest.  That might get a little messy, but wouldn't impact performance.

Is it justified to assume that sockets created by nfsd threads would only
ever be locked by nfsd threads (and interrupts, which won't be allocating
memory so don't matter), or might they be locked by other threads - e.g.
for 'netstat -a' etc.?


> 
> For applications handling millions of sockets, this makes a difference.
> 

Thanks,
NeilBrown





[PATCH] driver/memory:Add Kconfig help description for IFC

2014-04-15 Thread Prabhakar Kushwaha
Freescale's Integrated Flash Controller (IFC) module is used to handle
devices such as NOR flash, NAND flash, FPGAs and ASICs.

Update the help section of the IFC Kconfig entry accordingly.

Signed-off-by: Prabhakar Kushwaha 
---
 drivers/memory/Kconfig |4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
index c59e9c9..9a59581 100644
--- a/drivers/memory/Kconfig
+++ b/drivers/memory/Kconfig
@@ -64,5 +64,9 @@ config TEGRA30_MC
 config FSL_IFC
bool
depends on FSL_SOC
+   help
+	  This driver is for the Integrated Flash Controller (IFC) module
+	  available in Freescale SoCs. The controller can handle devices
+	  such as NOR, NAND, FPGA and ASIC.
 
 endif
-- 
1.7.9.5





Re: Re: [PATCH] unicore32: include: asm: add missing ')' for PAGE_* macros in pgtable.h

2014-04-15 Thread Chen Gang
On 04/16/2014 01:19 PM, 管雪涛 wrote:
> This problem has been fixed, but I didn't submit.
> Anyway, I can apply your version.
> 
> Acked-by: Xuetao Guan 
> 

OK, thanks.

> - Chen Gang  wrote:
>> Missing related ')', the related compiling error:
>>
>> CC [M]  drivers/gpu/drm/udl/udl_fb.o
>>   drivers/gpu/drm/udl/udl_fb.c: In function ‘udl_fb_mmap’:
>>   drivers/gpu/drm/udl/udl_fb.c:273: error: expected ‘)’ before ‘return’
>>   drivers/gpu/drm/udl/udl_fb.c:281: error: expected expression before ‘}’ 
>> token
>>   make[4]: *** [drivers/gpu/drm/udl/udl_fb.o] Error 1
>>   make[3]: *** [drivers/gpu/drm/udl] Error 2
>>   make[2]: *** [drivers/gpu/drm] Error 2
>>   make[1]: *** [drivers/gpu] Error 2
>>   make: *** [drivers] Error 2
>>
>>
>> Signed-off-by: Chen Gang 
>> ---
>>  arch/unicore32/include/asm/pgtable.h |   10 +-
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/unicore32/include/asm/pgtable.h 
>> b/arch/unicore32/include/asm/pgtable.h
>> index 233c258..ed6f7d0 100644
>> --- a/arch/unicore32/include/asm/pgtable.h
>> +++ b/arch/unicore32/include/asm/pgtable.h
>> @@ -87,16 +87,16 @@ extern pgprot_t pgprot_kernel;
>>  
>>  #define PAGE_NONE   pgprot_user
>>  #define PAGE_SHARED __pgprot(pgprot_val(pgprot_user | PTE_READ \
>> -| PTE_WRITE)
>> +| PTE_WRITE))
>>  #define PAGE_SHARED_EXEC__pgprot(pgprot_val(pgprot_user | PTE_READ \
>>  | PTE_WRITE \
>> -| PTE_EXEC)
>> +| PTE_EXEC))
>>  #define PAGE_COPY   __pgprot(pgprot_val(pgprot_user | PTE_READ)
>>  #define PAGE_COPY_EXEC  __pgprot(pgprot_val(pgprot_user | 
>> PTE_READ \
>> -| PTE_EXEC)
>> -#define PAGE_READONLY   __pgprot(pgprot_val(pgprot_user | 
>> PTE_READ)
>> +| PTE_EXEC))
>> +#define PAGE_READONLY   __pgprot(pgprot_val(pgprot_user | 
>> PTE_READ))
>>  #define PAGE_READONLY_EXEC  __pgprot(pgprot_val(pgprot_user | PTE_READ \
>> -| PTE_EXEC)
>> +| PTE_EXEC))
>>  #define PAGE_KERNEL pgprot_kernel
>>  #define PAGE_KERNEL_EXEC__pgprot(pgprot_val(pgprot_kernel | PTE_EXEC))
>>  
>> -- 
>> 1.7.9.5
> 

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed


Re: Re: [PATCH 3/4] x86/insn: Extract more information about instructions

2014-04-15 Thread Masami Hiramatsu
(2014/04/16 0:10), Sasha Levin wrote:
>>>  - Memory access size. We're currently decoding the size (in bytes) of an
>>> address size, and operand size. kmemcheck would like to know in addition
>>> how many bytes were read/written from/to an address by a given instruction,
>>> so we also keep the size of the memory access.
>>
>> And also, at least in this time, since the operation/mem_size are
>> only used in kmemcheck, you should generate another table for that in 
>> kmemcheck
>> from x86-opcode-map.txt.
> 
> I don't want to "teach" kmemcheck to parse x86-opcode-map.txt, that
> should be something that the instruction API does.
> 
> kmemcheck would also be the 3rd in-kernel user of that API, so it's
> not fair to push it as an exception :)

OK, I think we can push the size information bits into current insn_attr_t.
I don't think we should have another byte for that.

For example, here I pulled the operand size detector from my disasm code,


static int get_operand_size(struct insn *insn, const char *opnd)
{
int size = insn->opnd_bytes;

switch (opnd[1]) {
case 'b':
case 'B':
size = 1;
break;
case 'w':
size = 2;
break;
case 'd':
if (opnd[2] == 'q')
size = 16;
else
size = 4;
break;
case 'p':
if (opnd[2] == 's' || opnd[2] == 'd')
size = insn_vex_l_bit(insn) ? 32 : 16;
break;
case 'q':
if (opnd[2] == 'q')
size = 32;
else
size = 8;
break;
case 's':
if (opnd[2] == 's' || opnd[2] == 'd')
size = 16;
break;
case 'x':
size = insn_vex_l_bit(insn) ? 32 : 16;
break;
case 'z':
if (size == 8)
size = 4;
break;
}
return size;
}


Same thing can be done in awk part and insn.c, and we can encode it by

#define INAT_MAKE_MEMSZ(size) (size << INAT_MEMSZ_OFFS)

And decode it by

insn->memsz_bytes = 1 << ((attr & INAT_MEMSZ_MASK) >> INAT_MEMSZ_OFFS)

Thus, we only need 3 bits to represent 1, 2, 4, 8, 16 and 32. :)

> It's also just one more byte in 'struct insn'...

I actually don't like to expand struct insn_attr_t, I'd like to keep it in
an immediate value.

[...]
>>> @@ -141,15 +141,15 @@ void __kprobes synthesize_relcall(void *from, void 
>>> *to)
>>>   */
>>>  static kprobe_opcode_t *__kprobes skip_prefixes(kprobe_opcode_t *insn)
>>>  {
>>> -   insn_attr_t attr;
>>> +   insn_flags_t flags;
>>>  
>>> -   attr = inat_get_opcode_attribute((insn_byte_t)*insn);
>>> -   while (inat_is_legacy_prefix(attr)) {
>>> +   flags = inat_get_opcode((insn_byte_t)*insn)->flags;
>>
>> Do not refer a member from the return value directly. If it returns NULL,
>> the kernel just crashes!
> 
> Right, I'll fix that. Probably by adding a dummy "empty" instruction
> just so we won't have to deal with too many NULL checks.

Note that if we can put them all in one value, we can avoid such ugly NULL 
checks.


Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [PATCH 29/38] tick-sched: remove wrapper around __tick_nohz_task_switch()

2014-04-15 Thread Viresh Kumar
On 15 April 2014 18:14, Frederic Weisbecker  wrote:
> Sure but check out the static_key_false() in the implementation of 
> tick_nohz_full_enabled().
> That's where the magic hides.

> No problem, the jump label/static key code is quite tricky. And its use
> can be easily missed, as in here.
>
> Also its infamous API naming (static_key_true/static_key_false) that is
> anything but intuitive.

That was tricky enough to be missed on first look :)

Okay here is the answer to what you asked earlier:

>> In this case probably we can move !can_stop_full_tick() as well to the 
>> wrapper ?
>
> Do you mean moving all the code of __tick_nohz_task_switch() to 
> tick_nohz_task_switch()?
> I much prefer we don't do that. This is going to make can_stop_full_tick() a 
> publicly
> visible nohz internal. And it may uglify tick.h as well.

I probably asked the wrong question, I meant tick_nohz_tick_stopped()
instead of !can_stop_full_tick().

Or, can we make the code more efficient by avoiding a branch by doing this:

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 1065a51..12632cc 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -220,7 +220,7 @@ static inline void tick_nohz_full_check(void)

 static inline void tick_nohz_task_switch(void)
 {
-   if (tick_nohz_full_enabled())
+   if (tick_nohz_full_enabled() && tick_nohz_tick_stopped())
__tick_nohz_task_switch();
 }

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 45037c4..904e09b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -273,7 +273,7 @@ void __tick_nohz_task_switch(void)

local_irq_save(flags);

-   if (tick_nohz_tick_stopped() && !can_stop_full_tick())
+   if (!can_stop_full_tick())
tick_nohz_full_kick();

local_irq_restore(flags);


Re: Re: [PATCH v3] arch:unicore32:mm: add devmem_is_allowed() to support STRICT_DEVMEM

2014-04-15 Thread Chen Gang
On 04/16/2014 01:15 PM, 管雪涛 wrote:
> Acked-by: Xuetao Guan 
> 

OK, thanks.

> - Chen Gang  wrote:
>> unicore32 supports STRICT_DEVMEM, so it needs devmem_is_allowed(), like
>> some of other architectures have done (e.g. arm, powerpc, x86 ...).
>>
>> The related error with allmodconfig:
>>
>> CC  drivers/char/mem.o
>>   drivers/char/mem.c: In function ‘range_is_allowed’:
>>   drivers/char/mem.c:69: error: implicit declaration of function 
>> ‘devmem_is_allowed’
>>   make[2]: *** [drivers/char/mem.o] Error 1
>>   make[1]: *** [drivers/char] Error 2
>>   make: *** [drivers] Error 2
>>
>>
>> Signed-off-by: Chen Gang 
>> ---
>>  arch/unicore32/include/asm/io.h |   23 +++
>>  1 file changed, 23 insertions(+)
>>
>> diff --git a/arch/unicore32/include/asm/io.h 
>> b/arch/unicore32/include/asm/io.h
>> index 39decb6..ae327e4 100644
>> --- a/arch/unicore32/include/asm/io.h
>> +++ b/arch/unicore32/include/asm/io.h
>> @@ -44,5 +44,28 @@ extern void __uc32_iounmap(volatile void __iomem *addr);
>>  #define PIO_MASK(unsigned int)(IO_SPACE_LIMIT)
>>  #define PIO_RESERVED(PIO_OFFSET + PIO_MASK + 1)
>>  
>> +#ifdef CONFIG_STRICT_DEVMEM
>> +
>> +#include 
>> +#include 
>> +
>> +/*
>> + * devmem_is_allowed() checks to see if /dev/mem access to a certain
>> + * address is valid. The argument is a physical page number.
>> + * We mimic x86 here by disallowing access to system RAM as well as
>> + * device-exclusive MMIO regions. This effectively disables read()/write()
>> + * on /dev/mem.
>> + */
>> +static inline int devmem_is_allowed(unsigned long pfn)
>> +{
>> +	if (iomem_is_exclusive(pfn << PAGE_SHIFT))
>> +		return 0;
>> +	if (!page_is_ram(pfn))
>> +		return 1;
>> +	return 0;
>> +}
>> +
>> +#endif /* CONFIG_STRICT_DEVMEM */
>> +
>>  #endif  /* __KERNEL__ */
>>  #endif  /* __UNICORE_IO_H__ */
>> -- 
>> 1.7.9.5
>>
> 

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed


Re: [PATCH v2] Drivers/PCI: Logging clean-up [1]

2014-04-15 Thread Joe Perches
On Wed, 2014-04-16 at 07:37 +0200, Fabian Frederick wrote:
> -Convert printk(KERN_WARNING|KERN_ERR|KERN_INFO to pr_foo()
> -Define pr_fmt where it doesn't break existing format.
[]
>  (other ones ?)
[]
> diff --git a/drivers/pci/hotplug/cpqphp_core.c 
> b/drivers/pci/hotplug/cpqphp_core.c
[]
> @@ -830,8 +831,8 @@ static int cpqhpc_probe(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>  
>   err = pci_enable_device(pdev);
>   if (err) {
> - printk(KERN_ERR MY_NAME ": cannot enable PCI device %s (%d)\n",
> - pci_name(pdev), err);
> + pr_err("cannot enable PCI device %s (%d)\n", pci_name(pdev),
> +err);

trivia:

My preference is to keep format and argument
on a single line where possible and on
separate lines when not.

pr_err("cannot enable PCI device %s (%d)\n",
   pci_name(pdev), err);




linux-next: build failure after merge of the akpm-current tree

2014-04-15 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm-current tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

ERROR: "kmalloc_order" [drivers/net/wireless/brcm80211/brcmfmac/brcmfmac.ko] 
undefined!

Caused by commit a3ed1ed24850 ("mm: get rid of __GFP_KMEMCG").

I added the following patch:

From: Stephen Rothwell 
Date: Wed, 16 Apr 2014 15:32:54 +1000
Subject: [PATCH] mm: get rid of __GFP_KMEMCG fix

Signed-off-by: Stephen Rothwell 
---
 mm/slab_common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index cab4c49b3e8c..edd10a8af69e 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -584,6 +584,7 @@ void *kmalloc_order(size_t size, gfp_t flags, unsigned int 
order)
kmemleak_alloc(ret, size, 1, flags);
return ret;
 }
+EXPORT_SYMBOL(kmalloc_order);
 
 #ifdef CONFIG_TRACING
 void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order)
-- 
1.9.2

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au






Re: [PATCH 04/19] Make effect of PF_FSTRANS to disable __GFP_FS universal.

2014-04-15 Thread Dave Chinner
On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote:
> Currently both xfs and nfs will handle PF_FSTRANS by disabling
> __GFP_FS.
> 
> Make this effect global by repurposing memalloc_noio_flags (which
> does the same thing for PF_MEMALLOC_NOIO and __GFP_IO) to generally
> impose the task flags on a gfp_t.
> Due to this repurposing we change the name of memalloc_noio_flags
> to gfp_from_current().
> 
> As PF_FSTRANS now uniformly removes __GFP_FS we can remove special
> code for this from xfs and nfs.
> 
> As we can now expect other code to set PF_FSTRANS, its meaning is more
> general, so the WARN_ON in xfs_vm_writepage(), which checks that PF_FSTRANS
> is not set, is no longer appropriate.  PF_FSTRANS may be set for other
> reasons than an XFS transaction.

So PF_FSTRANS no longer means "filesystem in transaction context".
Are you going to rename to match whatever it's meaning is now?
I'm not exactly clear on what it means now...


> As lockdep cares about __GFP_FS, we need to translate PF_FSTRANS to
> __GFP_FS before calling lockdep_alloc_trace() in various places.
> 
> Signed-off-by: NeilBrown 

> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> index 64db0e53edea..882b86270ebe 100644
> --- a/fs/xfs/kmem.h
> +++ b/fs/xfs/kmem.h
> @@ -50,8 +50,6 @@ kmem_flags_convert(xfs_km_flags_t flags)
>   lflags = GFP_ATOMIC | __GFP_NOWARN;
>   } else {
>   lflags = GFP_KERNEL | __GFP_NOWARN;
> - if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
> - lflags &= ~__GFP_FS;
>   }

I think KM_NOFS needs to remain here, as it has use outside of
transaction contexts that set PF_FSTRANS

>   if (flags & KM_ZERO)
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index db2cfb067d0b..207a7f86d5d7 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -952,13 +952,6 @@ xfs_vm_writepage(
>   PF_MEMALLOC))
>   goto redirty;
>  
> - /*
> -  * Given that we do not allow direct reclaim to call us, we should
> -  * never be called while in a filesystem transaction.
> -  */
> - if (WARN_ON(current->flags & PF_FSTRANS))
> - goto redirty;

We still need to ensure this rule isn't broken. If it is, the
filesystem will silently deadlock in delayed allocation rather than
gracefully handle the problem with a warning

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH v2] Drivers/PCI: Logging clean-up [1]

2014-04-15 Thread Fabian Frederick
-Convert printk(KERN_WARNING|KERN_ERR|KERN_INFO to pr_foo()
-Define pr_fmt where it doesn't break existing format.

Cc: Bjorn Helgaas 
Cc: Andrew Morton 
Cc: Joe Perches 
Signed-off-by: Fabian Frederick 
---
v2: (From suggestions by Joe Perches)
-Coalesce format fragments.
-Remove 'PCI: ' in drivers/pci/pci.c pr_err
-Use KBUILD_MODNAME instead of pci-stub module name
 (other ones ?)

 drivers/pci/bus.c   |  4 +++-
 drivers/pci/hotplug-pci.c   |  4 ++--
 drivers/pci/hotplug/acpi_pcihp.c| 17 +++-
 drivers/pci/hotplug/cpqphp_core.c   |  5 +++--
 drivers/pci/hotplug/rpadlpar_core.c | 39 -
 drivers/pci/hotplug/sgi_hotplug.c   | 15 ++
 drivers/pci/pci-stub.c  | 10 --
 drivers/pci/pci-sysfs.c |  5 ++---
 drivers/pci/pci.c   |  5 +++--
 drivers/pci/pcie/aer/aer_inject.c   |  8 +++-
 drivers/pci/pcie/aspm.c |  4 ++--
 drivers/pci/pcie/portdrv_pci.c  |  3 ++-
 drivers/pci/probe.c |  9 -
 drivers/pci/quirks.c|  2 +-
 drivers/pci/slot.c  |  3 ++-
 15 files changed, 61 insertions(+), 72 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index fb8aed3..a565452 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -7,6 +7,8 @@
  * David Miller (da...@redhat.com)
  * Ivan Kokshaysky (i...@jurassic.park.msu.ru)
  */
+#define pr_fmt(fmt) "PCI: " fmt
+
 #include 
 #include 
 #include 
@@ -25,7 +27,7 @@ void pci_add_resource_offset(struct list_head *resources, 
struct resource *res,
 
window = kzalloc(sizeof(struct pci_host_bridge_window), GFP_KERNEL);
if (!window) {
-   printk(KERN_ERR "PCI: can't add host bridge window %pR\n", res);
+   pr_err("can't add host bridge window %pR\n", res);
return;
}
 
diff --git a/drivers/pci/hotplug-pci.c b/drivers/pci/hotplug-pci.c
index 6258dc2..c76dfd4 100644
--- a/drivers/pci/hotplug-pci.c
+++ b/drivers/pci/hotplug-pci.c
@@ -15,8 +15,8 @@ int __ref pci_hp_add_bridge(struct pci_dev *dev)
break;
}
if (busnr-- > end) {
-   printk(KERN_ERR "No bus number available for hot-added bridge 
%s\n",
-   pci_name(dev));
+   pr_err("No bus number available for hot-added bridge %s\n",
+  pci_name(dev));
return -1;
}
for (pass = 0; pass < 2; pass++)
diff --git a/drivers/pci/hotplug/acpi_pcihp.c b/drivers/pci/hotplug/acpi_pcihp.c
index a94d850..90253ab 100644
--- a/drivers/pci/hotplug/acpi_pcihp.c
+++ b/drivers/pci/hotplug/acpi_pcihp.c
@@ -68,8 +68,7 @@ decode_type0_hpx_record(union acpi_object *record, struct 
hotplug_params *hpx)
hpx->t0->enable_perr = fields[5].integer.value;
break;
default:
-   printk(KERN_WARNING
-  "%s: Type 0 Revision %d record not supported\n",
+   pr_warn("%s: Type 0 Revision %d record not supported\n",
   __func__, revision);
return AE_ERROR;
}
@@ -97,9 +96,8 @@ decode_type1_hpx_record(union acpi_object *record, struct 
hotplug_params *hpx)
hpx->t1->tot_max_split = fields[4].integer.value;
break;
default:
-   printk(KERN_WARNING
-  "%s: Type 1 Revision %d record not supported\n",
-  __func__, revision);
+   pr_warn("%s: Type 1 Revision %d record not supported\n",
+   __func__, revision);
return AE_ERROR;
}
return AE_OK;
@@ -139,8 +137,7 @@ decode_type2_hpx_record(union acpi_object *record, struct 
hotplug_params *hpx)
hpx->t2->sec_unc_err_mask_or   = fields[17].integer.value;
break;
default:
-   printk(KERN_WARNING
-  "%s: Type 2 Revision %d record not supported\n",
+   pr_warn("%s: Type 2 Revision %d record not supported\n",
   __func__, revision);
return AE_ERROR;
}
@@ -201,7 +198,7 @@ acpi_run_hpx(acpi_handle handle, struct hotplug_params *hpx)
goto exit;
break;
default:
-   printk(KERN_ERR "%s: Type %d record not supported\n",
+   pr_err("%s: Type %d record not supported\n",
   __func__, type);
status = AE_ERROR;
goto exit;
@@ -270,8 +267,8 @@ static acpi_status acpi_run_oshp(acpi_handle handle)
status = acpi_evaluate_object(handle, METHOD_NAME_OSHP, NULL, NULL);
if (ACPI_FAILURE(status))
if (status != AE_NOT_FOUND)
-   printk(KERN_ERR "%s:%s OSHP fails=0x%x\n",
-  

[PATCH 2/2] workqueue: jumps to use_dfl_pwq if the target cpumask is equal wq's

2014-04-15 Thread Daeseok Youn

Replace the block of code which checks whether pwq is the default with a
jump to use_dfl_pwq. The behavior is the same as before.

Signed-off-by: Daeseok Youn 
---
 kernel/workqueue.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3150b21..0679854 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4087,10 +4087,7 @@ static void wq_update_unbound_numa(struct 
workqueue_struct *wq, int cpu,
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
goto out_unlock;
} else {
-   if (pwq == wq->dfl_pwq)
-   goto out_unlock;
-   else
-   goto use_dfl_pwq;
+   goto use_dfl_pwq;
}
 
	mutex_unlock(&wq->mutex);
-- 
1.7.4.4
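The simplification can be sketched in user-space C. This is a minimal stand-in, assuming the use_dfl_pwq path is idempotent (installing the default pwq when it is already in use gives the same result); apply_pwq and its parameters are invented for illustration, not the kernel's API:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the tail of wq_update_unbound_numa(): "needs_update"
 * mirrors the cpumask check.  When it is false, the old code branched
 * on "already using the default pwq" before jumping; because installing
 * the default again yields the same state, both branches collapse into
 * a single jump to use_dfl. */
static int apply_pwq(bool needs_update, int current_pwq, int dfl_pwq)
{
	int pwq = current_pwq;

	if (needs_update)
		goto out;	/* keep the freshly computed pwq */
	goto use_dfl;		/* replaces: if (pwq == dfl) goto out; else goto use_dfl; */

use_dfl:
	pwq = dfl_pwq;		/* idempotent when pwq was already dfl_pwq */
out:
	return pwq;
}
```

Whether the current pwq already equals the default or not, the single jump produces the same final pwq, which is why the extra comparison could go.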


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/2] workqueue: fix bugs in wq_update_unbound_numa() failure path

2014-04-15 Thread Daeseok Youn

wq_update_unbound_numa() failure path has the following two bugs.
 - alloc_unbound_pwq() is called without holding wq->mutex;
 however, if the allocation fails, it jumps to out_unlock
 which tries to unlock wq->mutex.

 - The function should switch to dfl_pwq on failure
 but didn't do so after alloc_unbound_pwq() failure.

Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on
alloc_unbound_pwq() failure.

Signed-off-by: Daeseok Youn 
---
 kernel/workqueue.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0ee63af..3150b21 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4100,7 +4100,8 @@ static void wq_update_unbound_numa(struct 
workqueue_struct *wq, int cpu,
if (!pwq) {
pr_warning("workqueue: allocation failed while updating NUMA 
affinity of \"%s\"\n",
   wq->name);
-   goto out_unlock;
+   mutex_lock(&wq->mutex);
+   goto use_dfl_pwq;
}
 
/*
-- 
1.7.4.4
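The invariant this fix restores can be sketched in user-space C: every path that reaches the common unlock label must hold the lock exactly once. The toy lock and update() below are illustrative stand-ins, not the kernel's API:

```c
#include <assert.h>

/* Toy lock: unlock() traps if the lock is not held, like the mutex
 * debugging that would catch the original bug. */
static int lock_depth;
static void lock(void)   { lock_depth++; }
static void unlock(void) { assert(lock_depth > 0); lock_depth--; }

/* Mirrors wq_update_unbound_numa(): the allocation runs with the lock
 * dropped, so the failure path must re-take it before jumping to a
 * label that falls through to the unlock. */
static int update(int alloc_ok)
{
	int used_dfl = 0;

	lock();
	/* ... decide that an update is needed ... */
	unlock();		/* the allocation must not run under the lock */

	if (!alloc_ok) {
		lock();		/* the fix: regrab before use_dfl */
		goto use_dfl;	/* the old code jumped onward while unlocked */
	}

	lock();
	goto out_unlock;

use_dfl:
	used_dfl = 1;		/* fall back to the default pwq */
out_unlock:
	unlock();
	return used_dfl;
}
```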




Re: Re: [PATCH 3/4] x86/insn: Extract more information about instructions

2014-04-15 Thread Masami Hiramatsu
(2014/04/16 13:03), Sasha Levin wrote:
> On 04/15/2014 11:54 PM, H. Peter Anvin wrote:
>> On 04/15/2014 08:47 PM, Sasha Levin wrote:

 Yes, if kmemcheck for some reason needs to figure out if an instruction
 is a MOV variant we'll need to list quite a few mnemonics, but that list
 will be much shorter and more readable than a corresponding list of 
 opcodes.

>> You're completely missing my point.  If you are looking at MOV, with
>> 80%+ probability you're doing something very, very wrong, because you
>> will be including instructions that do something completely different
>> from what you thought.
>>
>> This is true for a lot of the x86 instructions.
> 
> Right, but assuming that the AND example I presented earlier makes sense, I
> can't create mnemonic entries only for instructions where doing so would
> "probably" be right.
> 
> If there are use cases where working with mnemonics is correct, we should
> be doing that in kmemcheck. If the way kmemcheck deals with mnemonics is
> incorrect we should go ahead and fix kmemcheck.

In that case, as I said, the mnemonic classifier should be built into
kmemcheck at this point, since we cannot provide any general mnemonic
classifier for that purpose. If it becomes generic and accurate enough,
it would be better to consolidate both, I think.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: Ext4: deadlock occurs when running fsstress and ENOSPC errors are seen.

2014-04-15 Thread Andreas Dilger
On Apr 15, 2014, at 11:07 PM, Theodore Ts'o  wrote:
> On Wed, Apr 16, 2014 at 10:30:10AM +0530, Amit Sahrawat wrote:
>> 4)   Corrupt the block group ‘1’  by writing all ‘1’, we had one file
>> with all 1’s, so using ‘dd’ –
>> dd if=i_file of=/dev/sdb1 bs=4096 seek=17 count=1
>> After this mount the partition – create few random size files and then
>> ran ‘fsstress,
> 
> Um, sigh.  You didn't say that you were deliberately corrupting the
> file system.  That wasn't in the subject line, or anywhere else in the
> original message.
> 
> So the question isn't how the file system got corrupted, but that
> you'd prefer that the system recovers without hanging after this
> corruption.
> 
> I wish you had *said* that.  It would have saved me a lot of time,
> since I was trying to figure out how the system had gotten so
> corrupted (not realizing you had deliberately corrupted the file
> system).

Don't we check the bitmaps upon load to verify they are not bogus?
Looks like this is disabled completely if flex_bg is enabled, though
I guess that it is not impossible to do some checking even if flex_bg
is enabled.

For example, we have the background thread to do the inode table zeroing,
and it could load the block bitmaps and check the group descriptors for
itable and bitmap locations against the various bitmaps.  With flex_bg,
this would probably only load a small subset of block bitmaps.

> So I think if you run "tune2fs -e remount-ro /dev/sdb1" before you
> started the fsstress, the file system would have remounted the
> filesystem read-only at the first EXT4-fs error message.  This would
> avoid the hang that you saw, since the file system would hopefully
> "failed fast", before th euser had the opportunity to put data into
> the page cache that would be lost when the system discovered there was
> no place to put the data.

Even without remount-ro, it would be possible to flag the block bitmaps
for a particular group as corrupt, so that no new allocations are done
from that group until the next e2fsck is run.
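A per-group corrupt flag of the kind suggested here might look like the following sketch; the names and flag layout are invented for illustration, not ext4's actual implementation:

```c
#include <assert.h>

/* Once a group's bitmap fails validation, mark it corrupt and refuse
 * further allocations from it, instead of hanging or remounting the
 * whole filesystem. */
#define NGROUPS    4
#define BG_CORRUPT 0x1

static unsigned int group_flags[NGROUPS];

static void mark_group_corrupt(int group)
{
	group_flags[group] |= BG_CORRUPT;
}

static int alloc_from_group(int group)
{
	if (group_flags[group] & BG_CORRUPT)
		return -1;	/* skip this group until e2fsck clears it */
	return 0;		/* allocation would proceed normally */
}
```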

Cheers, Andreas









Re: How do I increment a per-CPU variable without warning?

2014-04-15 Thread Peter Zijlstra
On Tue, Apr 15, 2014 at 08:54:19PM -0700, Paul E. McKenney wrote:
> But falling back on the old ways of doing this at least looks a bit
> nicer:
> 
>   static inline bool rcu_should_resched(void)
>   {
>   int t;
>   int *tp = &per_cpu(rcu_cond_resched_count, 
> raw_smp_processor_id());
> 
>   t = ACCESS_ONCE(*tp) + 1;
>   if (t < RCU_COND_RESCHED_LIM) {



>   ACCESS_ONCE(*tp) = t;
>   return false;
>   }
>   return true;
>   }
> 
> Other thoughts?

Still broken, if A starts out on CPU1, gets migrated to CPU0 at ,
then B starts the same on CPU1. It is possible for both CPU0 and CPU1 to
write a different value into your rcu_cond_resched_count.

You really want to disable preemption around there. The proper old way
would've been get_cpu_var()/put_cpu_var().
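The interleaving Peter describes can be spelled out step by step. The sketch below simulates two per-CPU slots and a task A whose slot pointer goes stale after a "migration"; everything here is a user-space mock of the scenario, not kernel code:

```c
#include <assert.h>

/* Two per-CPU slots; pcpu[1] stands in for CPU1's rcu_cond_resched_count. */
static int pcpu[2];

static int lost_update(void)
{
	pcpu[0] = 0;
	pcpu[1] = 0;

	int *a_slot = &pcpu[1];	/* A: raw_smp_processor_id() returned 1 */
	int a_val = *a_slot + 1;	/* A reads 0, computes 1 */
					/* <-- A is migrated to CPU0 here */
	int *b_slot = &pcpu[1];	/* B now runs on CPU1 */
	int b_val = *b_slot + 1;	/* B also reads 0, computes 1 */

	*b_slot = b_val;		/* CPU1's slot = 1 */
	*a_slot = a_val;		/* stale write: slot stays 1, one event lost */

	return pcpu[1];		/* 1, although two increments happened */
}
```

Disabling preemption across the read-modify-write keeps the slot pointer and the write on the same CPU, which is exactly what the missing preempt_disable() would guarantee.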



Re: [PATCH v3 10/21] mtd: support BB SRAM on ICP DAS LP-8x4x

2014-04-15 Thread Sergei Ianovich
On Tue, 2014-04-15 at 22:04 -0700, Brian Norris wrote:
> Hi Sergei,
> 
> On Tue, Dec 17, 2013 at 11:37:40PM +0400, Sergei Ianovich wrote:
> > This provides an MTD device driver for 512kB of battery backed up SRAM
> > on ICPDAS LP-8X4X programmable automation controllers.
> > 
> > SRAM chip is connected via FPGA and is not accessible without a driver,
> > unlike flash memory which is wired to CPU MMU.
> > 
> > This SRAM becomes an excellent persisent storage of volatile process
> > data like counter values and sensor statuses. Storing those data in
> > flash or mmc card is not a viable solution.
> > 
> > Signed-off-by: Sergei Ianovich 
> > ---
> >v2..v3
> >* no changes (except number 08/16 -> 10/21)
> > 
> >v0..v2
> >* use device tree
> >* use devm helpers where possible
> 
> What's the status of this series? Is the rest of this platform support
> going in via other trees yet? I have a few trivial comments, but this
> driver mostly looks good as-is.

(...)

> You'll need a (trivial) DT binding doc in
> Documentation/devicetree/bindings/ to go with this. You might also need
> to split the arch/arm/boot/dts/ stuff out to a separate patch, so the
> MTD driver can go in the MTD tree separate from the ARM tree bits.

(...)

> With that, you have my:
> 
>   Reviewed-by: Brian Norris 
> 
> and I can merge this to the MTD tree if/when you'd like.

Thanks for the response, the rest of the series is stuck on a DMA driver
with no development activity. I'll fix the comments, separate the ARM
stuff and refile the patch.



linux-next: build failure after merge of the akpm-current tree

2014-04-15 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm-current tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

In file included from mm/vmscan.c:50:0:
include/linux/swapops.h: In function 'is_swap_pte':
include/linux/swapops.h:57:2: error: implicit declaration of function 
'pte_present_nonuma' [-Werror=implicit-function-declaration]
  return !pte_none(pte) && !pte_present_nonuma(pte) && !pte_file(pte);
  ^

Caused by commit 851fe3337768 ("x86: define _PAGE_NUMA by reusing
software bits on the PMD and PTE levels").  This build does not have
CONFIG_NUMA_BALANCING set.

I have reverted that commit for today.
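The failure is the classic "helper only defined under a config option" pattern: swapops.h calls pte_present_nonuma() even when CONFIG_NUMA_BALANCING is off. The usual shape of the fix is a !CONFIG fallback, sketched here with invented one-bit "pte" encodings (not the real x86 page-table bits):

```c
#include <assert.h>

/* CONFIG_NUMA_BALANCING is deliberately left undefined, matching the
 * failing ppc64_defconfig build. */

static int pte_present(int pte)
{
	return pte & 0x1;		/* pretend bit 0 is "present" */
}

#ifdef CONFIG_NUMA_BALANCING
static int pte_present_nonuma(int pte)
{
	return pte_present(pte) && !(pte & 0x2);	/* pretend bit 1 is _PAGE_NUMA */
}
#else
/* Fallback so callers like is_swap_pte() still compile: without NUMA
 * balancing there is no NUMA bit to exclude. */
static int pte_present_nonuma(int pte)
{
	return pte_present(pte);
}
#endif
```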

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: [PATCH] unicore32: include: asm: add missing ')' for PAGE_* macros in pgtable.h

2014-04-15 Thread 管雪涛
This problem has been fixed, but I didn't submit the fix.
Anyway, I can apply your version.

Acked-by: Xuetao Guan 

- Chen Gang  写道:
> Missing related ')', the related compiling error:
> 
> CC [M]  drivers/gpu/drm/udl/udl_fb.o
>   drivers/gpu/drm/udl/udl_fb.c: In function ‘udl_fb_mmap’:
>   drivers/gpu/drm/udl/udl_fb.c:273: error: expected ‘)’ before ‘return’
>   drivers/gpu/drm/udl/udl_fb.c:281: error: expected expression before ‘}’ 
> token
>   make[4]: *** [drivers/gpu/drm/udl/udl_fb.o] Error 1
>   make[3]: *** [drivers/gpu/drm/udl] Error 2
>   make[2]: *** [drivers/gpu/drm] Error 2
>   make[1]: *** [drivers/gpu] Error 2
>   make: *** [drivers] Error 2
> 
> 
> Signed-off-by: Chen Gang 
> ---
>  arch/unicore32/include/asm/pgtable.h |   10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/unicore32/include/asm/pgtable.h 
> b/arch/unicore32/include/asm/pgtable.h
> index 233c258..ed6f7d0 100644
> --- a/arch/unicore32/include/asm/pgtable.h
> +++ b/arch/unicore32/include/asm/pgtable.h
> @@ -87,16 +87,16 @@ extern pgprot_t pgprot_kernel;
>  
>  #define PAGE_NONEpgprot_user
>  #define PAGE_SHARED  __pgprot(pgprot_val(pgprot_user | PTE_READ \
> - | PTE_WRITE)
> + | PTE_WRITE))
>  #define PAGE_SHARED_EXEC __pgprot(pgprot_val(pgprot_user | PTE_READ \
>   | PTE_WRITE \
> - | PTE_EXEC)
> + | PTE_EXEC))
>  #define PAGE_COPY__pgprot(pgprot_val(pgprot_user | PTE_READ)
>  #define PAGE_COPY_EXEC   __pgprot(pgprot_val(pgprot_user | 
> PTE_READ \
> - | PTE_EXEC)
> -#define PAGE_READONLY__pgprot(pgprot_val(pgprot_user | 
> PTE_READ)
> + | PTE_EXEC))
> +#define PAGE_READONLY__pgprot(pgprot_val(pgprot_user | 
> PTE_READ))
>  #define PAGE_READONLY_EXEC   __pgprot(pgprot_val(pgprot_user | PTE_READ \
> - | PTE_EXEC)
> + | PTE_EXEC))
>  #define PAGE_KERNEL  pgprot_kernel
>  #define PAGE_KERNEL_EXEC __pgprot(pgprot_val(pgprot_kernel | PTE_EXEC))
>  
> -- 
> 1.7.9.5
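Why the missing ')' only exploded at the use site in udl_fb.c: an unbalanced macro body is perfectly legal until it is expanded inside an expression. A balanced version, with simplified stand-in values instead of the real unicore32 pgprot types, behaves as an ordinary expression:

```c
#include <assert.h>

/* Simplified stand-ins for the unicore32 definitions. */
#define PTE_READ    0x1
#define PTE_WRITE   0x2
#define pgprot_user 0x10
#define pgprot_val(x) (x)
#define __pgprot(x)   (x)

/* Fixed form: every __pgprot( and pgprot_val( has a matching ')'. */
#define PAGE_SHARED __pgprot(pgprot_val(pgprot_user | PTE_READ \
					| PTE_WRITE))

static int page_shared_value(void)
{
	return PAGE_SHARED;	/* expands cleanly: 0x10 | 0x1 | 0x2 */
}
```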



Re: How do I increment a per-CPU variable without warning?

2014-04-15 Thread Peter Zijlstra
On Tue, Apr 15, 2014 at 03:17:55PM -0700, Paul E. McKenney wrote:
> Hello, Christoph,
> 
> I have a patch that currently uses __this_cpu_inc_return() to increment a
> per-CPU variable, but without preemption disabled.  Of course, given that
> preemption is enabled, it might well end up picking up one CPU's counter,
> adding one to it, then storing the result into some other CPU's counter.
> But this is OK, the test can be probabilistic.  And when I run this
> against v3.14 and earlier, it works fine.
> 
> But now there is 188a81409ff7 (percpu: add preemption checks to
> __this_cpu ops), which gives me lots of splats.
> 
> My current admittedly crude workaround is as follows:
> 
>   static inline bool rcu_should_resched(void)
>   {
>   int t;
> 
>   #ifdef CONFIG_DEBUG_PREEMPT
>   preempt_disable();
>   #endif /* #ifdef CONFIG_DEBUG_PREEMPT */
>   t = __this_cpu_read(rcu_cond_resched_count) + 1;
>   if (t < RCU_COND_RESCHED_LIM) {
>   __this_cpu_write(rcu_cond_resched_count, t);
>   #ifdef CONFIG_DEBUG_PREEMPT
>   preempt_enable();
>   #endif /* #ifdef CONFIG_DEBUG_PREEMPT */
>   return false;
>   }
>   #ifdef CONFIG_DEBUG_PREEMPT
>   preempt_enable();
>   #endif /* #ifdef CONFIG_DEBUG_PREEMPT */
>   return true;
>   }
> 
> This is arguably better than the original __this_cpu_read() because it
> avoids overflow, but I thought I should check to see if there was some
> better way to do this.

you could use raw_cpu_{read,write}(). But note that without the
unconditional preempt_disable() in there your code can read a different
rcu_cond_resched_count than it writes.

So I think you very much want that preempt_disable().
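The suggested shape — one unconditional preempt_disable()/preempt_enable() pair around the read-modify-write, instead of pairs hidden under CONFIG_DEBUG_PREEMPT — can be sketched as a user-space mock. The "preemption" here is just a depth counter used to check the pair is balanced:

```c
#include <assert.h>
#include <stdbool.h>

#define RCU_COND_RESCHED_LIM 256

static int counter;		/* stands in for the per-CPU variable */
static int preempt_depth;	/* stands in for preemption being disabled */

static void preempt_disable(void) { preempt_depth++; }
static void preempt_enable(void)  { preempt_depth--; }

static bool rcu_should_resched(void)
{
	bool ret = true;

	preempt_disable();	/* read and write now hit the same "CPU" */
	int t = counter + 1;
	if (t < RCU_COND_RESCHED_LIM) {
		counter = t;
		ret = false;
	}
	preempt_enable();
	return ret;
}
```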


Re: [PATCH v3] arch:unicore32:mm: add devmem_is_allowed() to support STRICT_DEVMEM

2014-04-15 Thread 管雪涛
Acked-by: Xuetao Guan 

- Chen Gang  写道:
> unicore32 supports STRICT_DEVMEM, so it needs devmem_is_allowed(), like
> some other architectures do (e.g. arm, powerpc, x86 ...).
> 
> The related error with allmodconfig:
> 
> CC  drivers/char/mem.o
>   drivers/char/mem.c: In function ‘range_is_allowed’:
>   drivers/char/mem.c:69: error: implicit declaration of function 
> ‘devmem_is_allowed’
>   make[2]: *** [drivers/char/mem.o] Error 1
>   make[1]: *** [drivers/char] Error 2
>   make: *** [drivers] Error 2
> 
> 
> Signed-off-by: Chen Gang 
> ---
>  arch/unicore32/include/asm/io.h |   23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/unicore32/include/asm/io.h b/arch/unicore32/include/asm/io.h
> index 39decb6..ae327e4 100644
> --- a/arch/unicore32/include/asm/io.h
> +++ b/arch/unicore32/include/asm/io.h
> @@ -44,5 +44,28 @@ extern void __uc32_iounmap(volatile void __iomem *addr);
>  #define PIO_MASK (unsigned int)(IO_SPACE_LIMIT)
>  #define PIO_RESERVED (PIO_OFFSET + PIO_MASK + 1)
>  
> +#ifdef CONFIG_STRICT_DEVMEM
> +
> +#include 
> +#include 
> +
> +/*
> + * devmem_is_allowed() checks to see if /dev/mem access to a certain
> + * address is valid. The argument is a physical page number.
> + * We mimic x86 here by disallowing access to system RAM as well as
> + * device-exclusive MMIO regions. This effectively disables read()/write()
> + * on /dev/mem.
> + */
> +static inline int devmem_is_allowed(unsigned long pfn)
> +{
> + if (iomem_is_exclusive(pfn << PAGE_SHIFT))
> + return 0;
> + if (!page_is_ram(pfn))
> + return 1;
> + return 0;
> +}
> +
> +#endif /* CONFIG_STRICT_DEVMEM */
> +
>  #endif   /* __KERNEL__ */
>  #endif   /* __UNICORE_IO_H__ */
> -- 
> 1.7.9.5
> 
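The policy in the patch can be exercised in isolation. In the sketch below the two predicates are stubs standing in for iomem_is_exclusive() and page_is_ram(), and the pfn ranges are arbitrary test values:

```c
#include <assert.h>

static int iomem_is_exclusive_stub(unsigned long pfn) { return pfn == 100; }
static int page_is_ram_stub(unsigned long pfn)        { return pfn < 50; }

/* Same shape as the unicore32 devmem_is_allowed(): deny exclusive MMIO,
 * deny system RAM, allow everything else (plain device MMIO). */
static int devmem_is_allowed(unsigned long pfn)
{
	if (iomem_is_exclusive_stub(pfn))
		return 0;
	if (!page_is_ram_stub(pfn))
		return 1;
	return 0;
}
```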



Re: [PATCH 3/3] f2fs: flush dirty directory pages when scattered pages are gathered

2014-04-15 Thread Jaegeuk Kim
Hi all,

Please ignore this, since it turns out that it doesn't solve the
problem.
Instead, please consider the following patch.

From ac9ad0b1de42dc311323b9c3b8f431f4b097b43b Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Wed, 16 Apr 2014 10:47:06 +0900
Subject: [PATCH] f2fs: adjust free mem size to flush dentry blocks

If many dirty dentry blocks are cached but the flush condition is never
reached, we can fall into a livelock in balance_dirty_pages.
So, let's take the memory size into account for the condition.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c |  3 ++-
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/node.c | 44 ++--
 fs/f2fs/node.h |  5 +++--
 4 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b5cd6d1..6b89b25 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -863,7 +863,8 @@ static int f2fs_write_data_pages(struct
address_space *mapping,
return 0;
 
if (S_ISDIR(inode->i_mode) && wbc->sync_mode == WB_SYNC_NONE &&
-   get_dirty_dents(inode) < nr_pages_to_skip(sbi, DATA))
+   get_dirty_dents(inode) < nr_pages_to_skip(sbi, DATA) &&
+   available_free_memory(sbi, DIRTY_DENTS))
goto skip_write;
 
diff = nr_pages_to_write(sbi, DATA, wbc);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2c5a5da..afca412 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1140,6 +1140,7 @@ f2fs_hash_t f2fs_dentry_hash(const char *,
size_t);
 struct dnode_of_data;
 struct node_info;
 
+bool available_free_memory(struct f2fs_sb_info *, int);
 int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
 bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
 void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index f760793..d08671b 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -26,20 +26,26 @@
 static struct kmem_cache *nat_entry_slab;
 static struct kmem_cache *free_nid_slab;
 
-static inline bool available_free_memory(struct f2fs_nm_info *nm_i, int
type)
+bool available_free_memory(struct f2fs_sb_info *sbi, int type)
 {
+   struct f2fs_nm_info *nm_i = NM_I(sbi);
struct sysinfo val;
unsigned long mem_size = 0;
+   bool res = false;
 
	si_meminfo(&val);
-   if (type == FREE_NIDS)
-   mem_size = nm_i->fcnt * sizeof(struct free_nid);
-   else if (type == NAT_ENTRIES)
-   mem_size += nm_i->nat_cnt * sizeof(struct nat_entry);
-   mem_size >>= 12;
-
-   /* give 50:50 memory for free nids and nat caches respectively */
-   return (mem_size < ((val.totalram * nm_i->ram_thresh) >> 11));
+   /* give 25%, 25%, 50% memory for each components respectively */
+   if (type == FREE_NIDS) {
+   mem_size = (nm_i->fcnt * sizeof(struct free_nid)) >> 12;
+   res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 2);
+   } else if (type == NAT_ENTRIES) {
+   mem_size = (nm_i->nat_cnt * sizeof(struct nat_entry)) >> 12;
+   res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 2);
+   } else if (type == DIRTY_DENTS) {
+   mem_size = get_pages(sbi, F2FS_DIRTY_DENTS);
+   res = mem_size < ((val.totalram * nm_i->ram_thresh / 100) >> 1);
+   }
+   return res;
 }
 
 static void clear_node_page_dirty(struct page *page)
@@ -243,7 +249,7 @@ int try_to_free_nats(struct f2fs_sb_info *sbi, int
nr_shrink)
 {
struct f2fs_nm_info *nm_i = NM_I(sbi);
 
-   if (available_free_memory(nm_i, NAT_ENTRIES))
+   if (available_free_memory(sbi, NAT_ENTRIES))
return 0;
 
	write_lock(&nm_i->nat_tree_lock);
@@ -1312,13 +1318,14 @@ static void __del_from_free_nid_list(struct
f2fs_nm_info *nm_i,
radix_tree_delete(_i->free_nid_root, i->nid);
 }
 
-static int add_free_nid(struct f2fs_nm_info *nm_i, nid_t nid, bool
build)
+static int add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool
build)
 {
+   struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i;
struct nat_entry *ne;
bool allocated = false;
 
-   if (!available_free_memory(nm_i, FREE_NIDS))
+   if (!available_free_memory(sbi, FREE_NIDS))
return -1;
 
/* 0 nid should not be used */
@@ -1371,9 +1378,10 @@ static void remove_free_nid(struct f2fs_nm_info
*nm_i, nid_t nid)
kmem_cache_free(free_nid_slab, i);
 }
 
-static void scan_nat_page(struct f2fs_nm_info *nm_i,
+static void scan_nat_page(struct f2fs_sb_info *sbi,
struct page *nat_page, nid_t start_nid)
 {
+   struct f2fs_nm_info *nm_i = NM_I(sbi);
struct f2fs_nat_block *nat_blk = page_address(nat_page);
block_t blk_addr;
int i;
@@ -1388,7 +1396,7 @@ static void scan_nat_page(struct f2fs_nm_info
*nm_i,
blk_addr = le32_to_cpu(nat_blk->entries[i].block_addr);
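The 25%/25%/50% threshold arithmetic introduced in available_free_memory() above can be checked in isolation. This user-space sketch keeps the same percentage splits (in the kernel code, the >> 12 first converts entry bytes to pages; here usage is already in pages):

```c
#include <assert.h>

enum { FREE_NIDS, NAT_ENTRIES, DIRTY_DENTS };

/* mem_pages: current cache usage in pages; totalram: system pages;
 * ram_thresh: percentage of RAM budgeted for these caches. */
static int within_threshold(int type, unsigned long mem_pages,
			    unsigned long totalram, unsigned long ram_thresh)
{
	unsigned long budget = totalram * ram_thresh / 100;

	if (type == FREE_NIDS || type == NAT_ENTRIES)
		return mem_pages < (budget >> 2);	/* 25% each */
	return mem_pages < (budget >> 1);		/* 50% for dirty dentries */
}
```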
  

Re: [PATCH 10/19] NET: set PF_FSTRANS while holding sk_lock

2014-04-15 Thread Eric Dumazet
On Wed, 2014-04-16 at 14:03 +1000, NeilBrown wrote:
> sk_lock can be taken while reclaiming memory (in nfsd for loop-back
> NFS mounts, and presumably in nfs), and memory can be allocated while
> holding sk_lock, at least via:
> 
>  inet_listen -> inet_csk_listen_start ->reqsk_queue_alloc
> 
> So to avoid deadlocks, always set PF_FSTRANS while holding sk_lock.
> 
> This deadlock was found by lockdep.

Wow, this is adding expensive stuff in fast path, only for nfsd :(

BTW, why should the current->flags be saved on a socket field, and not
a current->save_flags? This really looks like a thread property, not
a socket one.

Why nfsd could not have PF_FSTRANS in its current->flags ?

For applications handling millions of sockets, this makes a difference.
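Eric's alternative — saving the flag in a task-local slot rather than growing every socket — might look like the following sketch; struct task and the field names are invented stand-ins, not the kernel's task_struct:

```c
#include <assert.h>

#define PF_FSTRANS 0x1

struct task {
	unsigned int flags;
	unsigned int saved_flags;	/* task-local save slot, not per-socket */
};

/* Nesting-safe save/restore: remember whether PF_FSTRANS was already
 * set, then restore exactly that state on exit. */
static void enter_fs_transaction(struct task *t)
{
	t->saved_flags = t->flags & PF_FSTRANS;
	t->flags |= PF_FSTRANS;
}

static void exit_fs_transaction(struct task *t)
{
	t->flags = (t->flags & ~PF_FSTRANS) | t->saved_flags;
}
```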




Re: [PATCH 2/3 v2] f2fs: fix to decrease the number of dirty dentry page

2014-04-15 Thread Jaegeuk Kim
Change log from v1:
 o change the patch, which includes this bug fix

From 0f3b8427b40b9ace829ba0b16336d5cd67589022 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Tue, 15 Apr 2014 16:04:15 +0900
Subject: [PATCH] f2fs: call redirty_page_for_writepage

This patch replaces some open-coded sequences with
redirty_page_for_writepage, which can be enabled after taking care of
additional bookkeeping, such as counting dirty pages appropriately.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c |  5 +----
 fs/f2fs/data.c   | 10 +++-------
 fs/f2fs/node.c   |  5 +----
 3 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 890e23d..2902f7d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -174,10 +174,7 @@ no_write:
return 0;
 
 redirty_out:
-   dec_page_count(sbi, F2FS_DIRTY_META);
-   wbc->pages_skipped++;
-   account_page_redirty(page);
-   set_page_dirty(page);
+   redirty_page_for_writepage(wbc, page);
return AOP_WRITEPAGE_ACTIVATE;
 }
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 45abd60..b5cd6d1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -798,10 +798,8 @@ static int f2fs_write_data_page(struct page *page,
 * this page does not have to be written to disk.
 */
offset = i_size & (PAGE_CACHE_SIZE - 1);
-   if ((page->index >= end_index + 1) || !offset) {
-   inode_dec_dirty_dents(inode);
+   if ((page->index >= end_index + 1) || !offset)
goto out;
-   }
 
zero_user_segment(page, offset, PAGE_CACHE_SIZE);
 write:
@@ -810,7 +808,6 @@ write:
 
/* Dentry blocks are controlled by checkpoint */
if (S_ISDIR(inode->i_mode)) {
-   inode_dec_dirty_dents(inode);
		err = do_write_data_page(page, &fio);
goto done;
}
@@ -832,15 +829,14 @@ done:
 
clear_cold_data(page);
 out:
+   inode_dec_dirty_dents(inode);
unlock_page(page);
if (need_balance_fs)
f2fs_balance_fs(sbi);
return 0;
 
 redirty_out:
-   wbc->pages_skipped++;
-   account_page_redirty(page);
-   set_page_dirty(page);
+   redirty_page_for_writepage(wbc, page);
return AOP_WRITEPAGE_ACTIVATE;
 }
 
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index a161e95..f760793 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1227,10 +1227,7 @@ static int f2fs_write_node_page(struct page
*page,
return 0;
 
 redirty_out:
-   dec_page_count(sbi, F2FS_DIRTY_NODES);
-   wbc->pages_skipped++;
-   account_page_redirty(page);
-   set_page_dirty(page);
+   redirty_page_for_writepage(wbc, page);
return AOP_WRITEPAGE_ACTIVATE;
 }
 
-- 
1.8.4.474.g128a96c


-- 
Jaegeuk Kim
Samsung
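What the replaced open-coded sequence did, and what redirty_page_for_writepage() centralizes, can be mocked up in user space; the two structs below are drastically reduced stand-ins for the kernel's writeback_control and page:

```c
#include <assert.h>

struct wbc_mock  { long pages_skipped; };
struct page_mock { int dirty; long redirtied; };

/* The three bookkeeping steps each f2fs writepage used to open-code:
 * count the skip, account the redirty, and re-set the dirty bit. */
static void redirty_for_writepage(struct wbc_mock *wbc, struct page_mock *page)
{
	wbc->pages_skipped++;	/* writeback gave up on this page for now */
	page->redirtied++;	/* account_page_redirty() */
	page->dirty = 1;	/* set_page_dirty() */
}
```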



Re: [PATCHv2 1/2] iio: adc: exynos_adc: Control special clock of ADC to support Exynos3250 ADC

2014-04-15 Thread Chanwoo Choi
Hi Sachin,

On 04/16/2014 02:04 PM, Sachin Kamat wrote:
> Hi Chanwoo,
> 
> On 16 April 2014 10:25, Chanwoo Choi  wrote:
>> Hi Sachin,
>>
>> On 04/16/2014 01:44 PM, Chanwoo Choi wrote:
>>> Hi Sachin,
>>>
>>> On 04/16/2014 12:48 PM, Sachin Kamat wrote:
 Hi Chanwoo,

 On 14 April 2014 14:37, Chanwoo Choi  wrote:
> This patch controls the special clock for the ADC in the Exynos series's FSYS block.
> If the special clock of the ADC is registered on the clock list of the
> common clk framework, the Exynos ADC driver has to control this clock.
>
> Exynos3250/Exynos4/Exynos5 has 'adc' clock as following:
> - 'adc' clock: bus clock for ADC
>
> Exynos3250 has additional 'sclk_tsadc' clock as following:
> - 'sclk_tsadc' clock: special clock for ADC which provide clock to 
> internal ADC
>
> Exynos 4210/4212/4412 and Exynos5250/5420 have not included the
> 'sclk_tsadc' clock in FSYS_BLK. But only Exynos3250, based on
> Cortex-A7, has included the 'sclk_tsadc' clock in FSYS_BLK.
>
> Cc: Jonathan Cameron 
> Cc: Kukjin Kim 
> Cc: Naveen Krishna Chatradhi
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Chanwoo Choi 
> Acked-by: Kyungmin Park 
> ---
>  drivers/iio/adc/exynos_adc.c | 54 
> +---
>  1 file changed, 41 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iio/adc/exynos_adc.c b/drivers/iio/adc/exynos_adc.c
> index d25b262..3c99243 100644
> --- a/drivers/iio/adc/exynos_adc.c
> +++ b/drivers/iio/adc/exynos_adc.c
> @@ -40,8 +40,9 @@
>  #include 
>
>  enum adc_version {
> -   ADC_V1,
> -   ADC_V2
> +   ADC_V1 = 0x1,
> +   ADC_V2 = 0x2,
> +   ADC_V3 = (ADC_V1 | ADC_V2),

 Can't this be simply 0x3? Or is this not really a h/w version?
>>>
>>> Even though ADC_V3 isn't a h/w revision, ADC_V3 includes all features of
>>> ADC_V2, with only one clock difference (sclk_tsadc) from ADC_V2.
>>> I want to describe that ADC_V3 includes the ADC_V2 features. So, I add the following:
>>>   >> +   ADC_V3 = (ADC_V1 | ADC_V2),
>>>

>  };
>
>  /* EXYNOS4412/5250 ADC_V1 registers definitions */
> @@ -88,6 +89,7 @@ struct exynos_adc {
> void __iomem*regs;
> void __iomem*enable_reg;
> struct clk  *clk;
> +   struct clk  *sclk;
> unsigned intirq;
> struct regulator*vdd;
>
> @@ -100,6 +102,7 @@ struct exynos_adc {
>  static const struct of_device_id exynos_adc_match[] = {
> { .compatible = "samsung,exynos-adc-v1", .data = (void *)ADC_V1 },
> { .compatible = "samsung,exynos-adc-v2", .data = (void *)ADC_V2 },
> +   { .compatible = "samsung,exynos-adc-v3", .data = (void *)ADC_V3 },
> {},
>  };
>  MODULE_DEVICE_TABLE(of, exynos_adc_match);
> @@ -128,7 +131,7 @@ static int exynos_read_raw(struct iio_dev *indio_dev,
> mutex_lock(_dev->mlock);
>
> /* Select the channel to be used and Trigger conversion */
> -   if (info->version == ADC_V2) {
> +   if (info->version & ADC_V2) {

 So, now this would be applicable for ADC_V3 too, right?
>>
>> ADC_V3 isn't h/w version. So, I think this code is proper instead of using 
>> ADC_V3 directly.
>> I want to use ADC_V3 version on checking clock(sclk_tsadc).
> 
> OK. Just a readability concern. Probably a check something like
> (version >= ADC_V2) would
> have made it more explicit.

OK I'll modify it as your comment.
(version >= ADC_V2)

> 
>>


> con2 = readl(ADC_V2_CON2(info->regs));
> con2 &= ~ADC_V2_CON2_ACH_MASK;
> con2 |= ADC_V2_CON2_ACH_SEL(chan->address);
> @@ -165,7 +168,7 @@ static irqreturn_t exynos_adc_isr(int irq, void 
> *dev_id)
> info->value = readl(ADC_V1_DATX(info->regs)) &
> ADC_DATX_MASK;
> /* clear irq */
> -   if (info->version == ADC_V2)
> +   if (info->version & ADC_V2)
> writel(1, ADC_V2_INT_ST(info->regs));
> else
> writel(1, ADC_V1_INTCLR(info->regs));
> @@ -226,11 +229,25 @@ static int exynos_adc_remove_devices(struct device 
> *dev, void *c)
> return 0;
>  }
>
> +static void exynos_adc_enable_clock(struct exynos_adc *info, bool enable)
> +{
> +   if (enable) {
> +   clk_prepare_enable(info->clk);

 This could fail. Is it OK without any checks?
>>>
>>> OK, I'll check return value.
>>
>> Do you want to check return value always?
> 
> It is a good practice to check the return values for errors. Having
> said that it depends on
> your s/w design and the h/w requirements. If proceeding with the error
> does not 

Re: [PATCH] mtd: nand: remove unused function input parameter

2014-04-15 Thread Brian Norris
On Wed, Dec 25, 2013 at 12:40:50PM +, Caizhiyong wrote:
> From: Cai Zhiyong 
> Date: Wed, 25 Dec 2013 20:11:15 +0800
> Subject: [PATCH] mtd: nand: remove unused function input parameter
> 
> The nand_get_flash_type parameter "busw" input value is not used by any
> branch, and it is updated before use it in the function, so remove it, 
> define the "busw" as an internal variable.
> 
> Signed-off-by: Cai Zhiyong 

I merged this patch and sent it upstream for 3.15-rc1, but I don't think
I ever sent you a notification... here goes:

Applied, thanks!

Brian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Ext4: deadlock occurs when running fsstress and ENOSPC errors are seen.

2014-04-15 Thread Theodore Ts'o
On Wed, Apr 16, 2014 at 10:30:10AM +0530, Amit Sahrawat wrote:
> 4)Corrupt the block group ‘1’  by writing all ‘1’, we had one file
> with all 1’s, so using ‘dd’ –
> dd if=i_file of=/dev/sdb1 bs=4096 seek=17 count=1
> After this mount the partition – create few random size files and then
> ran ‘fsstress,

Um, sigh.  You didn't say that you were deliberately corrupting the
file system.  That wasn't in the subject line, or anywhere else in the
original message.

So the question isn't how the file system got corrupted, but that
you'd prefer that the system recovers without hanging after this
corruption.

I wish you had *said* that.  It would have saved me a lot of time,
since I was trying to figure out how the system had gotten so
corrupted (not realizing you had deliberately corrupted the file
system).

So I think if you had run "tune2fs -e remount-ro /dev/sdb1" before you
started the fsstress, the file system would have been remounted
read-only at the first EXT4-fs error message.  This would avoid the
hang that you saw, since the file system would hopefully have "failed
fast", before the user had the opportunity to put data into the page
cache that would be lost when the system discovered there was no place
to put it.

Regards,

- Ted


Re: [PATCH v3 10/21] mtd: support BB SRAM on ICP DAS LP-8x4x

2014-04-15 Thread Brian Norris
Hi Sergei,

On Tue, Dec 17, 2013 at 11:37:40PM +0400, Sergei Ianovich wrote:
> This provides an MTD device driver for 512kB of battery backed up SRAM
> on ICPDAS LP-8X4X programmable automation controllers.
> 
> SRAM chip is connected via FPGA and is not accessible without a driver,
> unlike flash memory which is wired to CPU MMU.
> 
> This SRAM becomes an excellent persisent storage of volatile process
> data like counter values and sensor statuses. Storing those data in
> flash or mmc card is not a viable solution.
> 
> Signed-off-by: Sergei Ianovich 
> ---
>v2..v3
>* no changes (except number 08/16 -> 10/21)
> 
>v0..v2
>* use device tree
>* use devm helpers where possible

What's the status of this series? Is the rest of this platform support
going in via other trees yet? I have a few trivial comments, but this
driver mostly looks good as-is.

>  arch/arm/boot/dts/pxa27x-lp8x4x.dts |   6 +
>  arch/arm/configs/lp8x4x_defconfig   |   1 +
>  drivers/mtd/devices/Kconfig |  14 +++
>  drivers/mtd/devices/Makefile|   1 +
>  drivers/mtd/devices/sram_lp8x4x.c   | 227 
> 
>  5 files changed, 249 insertions(+)
>  create mode 100644 drivers/mtd/devices/sram_lp8x4x.c
> 
> diff --git a/arch/arm/boot/dts/pxa27x-lp8x4x.dts 
> b/arch/arm/boot/dts/pxa27x-lp8x4x.dts
> index 872c386..07856e0 100644
> --- a/arch/arm/boot/dts/pxa27x-lp8x4x.dts
> +++ b/arch/arm/boot/dts/pxa27x-lp8x4x.dts
> @@ -161,6 +161,12 @@
>   reg = <0x901c 0x1>;
>   status = "okay";
>   };
> +
> + sram@a000 {
> + compatible = "icpdas,sram-lp8x4x";
> + reg = <0xa000 0x1000
> +0x901e 0x1>;
> + };

You'll need a (trivial) DT binding doc in
Documentation/devicetree/bindings/ to go with this. You might also need
to split the arch/arm/boot/dts/ stuff out to a separate patch, so the
MTD driver can go in the MTD tree separate from the ARM tree bits.

>   };
>   };
>  };
> diff --git a/arch/arm/configs/lp8x4x_defconfig 
> b/arch/arm/configs/lp8x4x_defconfig
> index d60e37a..17a4e6f 100644
> --- a/arch/arm/configs/lp8x4x_defconfig
> +++ b/arch/arm/configs/lp8x4x_defconfig
> @@ -57,6 +57,7 @@ CONFIG_MTD_CFI_ADV_OPTIONS=y
>  CONFIG_MTD_CFI_GEOMETRY=y
>  CONFIG_MTD_CFI_INTELEXT=y
>  CONFIG_MTD_PHYSMAP_OF=y
> +CONFIG_MTD_SRAM_LP8X4X=y
>  CONFIG_PROC_DEVICETREE=y
>  CONFIG_BLK_DEV_LOOP=y
>  CONFIG_BLK_DEV_LOOP_MIN_COUNT=2
> diff --git a/drivers/mtd/devices/Kconfig b/drivers/mtd/devices/Kconfig
> index 0128138..95f2075 100644
> --- a/drivers/mtd/devices/Kconfig
> +++ b/drivers/mtd/devices/Kconfig
> @@ -217,4 +217,18 @@ config BCH_CONST_T
>   default 4
>  endif
>  
> +config MTD_SRAM_LP8X4X
> + tristate "SRAM on ICPDAS LP-8X4X"
> + depends on OF && ARCH_PXA
> +   ---help---
> +  This provides an MTD device driver for 512kiB of battery backed up SRAM
> +  on ICPDAS LP-8X4X programmable automation controllers.
> +
> +  SRAM chip is connected via FPGA and is not accessible without a driver,
> +  unlike flash memory which is wired to CPU MMU.
> +
> +  Say N, unless you plan to run this kernel on LP-8X4X.
> +
> +  If you say M, the module will be called sram_lp8x4x.
> +
>  endmenu
> diff --git a/drivers/mtd/devices/Makefile b/drivers/mtd/devices/Makefile
> index d83bd73..56a74c9 100644
> --- a/drivers/mtd/devices/Makefile
> +++ b/drivers/mtd/devices/Makefile
> @@ -16,6 +16,7 @@ obj-$(CONFIG_MTD_NAND_OMAP_BCH) += elm.o
>  obj-$(CONFIG_MTD_SPEAR_SMI)  += spear_smi.o
>  obj-$(CONFIG_MTD_SST25L) += sst25l.o
>  obj-$(CONFIG_MTD_BCM47XXSFLASH)  += bcm47xxsflash.o
> +obj-$(CONFIG_MTD_SRAM_LP8X4X)+= sram_lp8x4x.o
>  
>  
>  CFLAGS_docg3.o   += -I$(src)
> diff --git a/drivers/mtd/devices/sram_lp8x4x.c 
> b/drivers/mtd/devices/sram_lp8x4x.c
> new file mode 100644
> index 000..9dc7149
> --- /dev/null
> +++ b/drivers/mtd/devices/sram_lp8x4x.c
> @@ -0,0 +1,227 @@
> +/*
> + *  linux/drivers/mtd/devices/lp8x4x_sram.c
> + *
> + *  MTD Driver for SRAM on ICPDAS LP-8x4x
> + *  Copyright (C) 2013 Sergei Ianovich 
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License version 2 as
> + *  published by the Free Software Foundation or any later version.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 

Do you need this? It looks unused.

> +
> +struct lp8x4x_sram_info {
> + void __iomem*bank;
> + void __iomem*virt;
> + struct mutexlock;
> + unsignedactive_bank;
> + struct mtd_info mtd;
> +};
> +
> +static int
> +lp8x4x_sram_erase(struct mtd_info *mtd, struct 

Re: [PATCHv2 1/2] iio: adc: exynos_adc: Control special clock of ADC to support Exynos3250 ADC

2014-04-15 Thread Sachin Kamat
Hi Chanwoo,

On 16 April 2014 10:25, Chanwoo Choi  wrote:
> Hi Sachin,
>
> On 04/16/2014 01:44 PM, Chanwoo Choi wrote:
>> Hi Sachin,
>>
>> On 04/16/2014 12:48 PM, Sachin Kamat wrote:
>>> Hi Chanwoo,
>>>
>>> On 14 April 2014 14:37, Chanwoo Choi  wrote:
 This patch control special clock for ADC in Exynos series's FSYS block.
 If special clock of ADC is registerd on clock list of common clk framework,
 Exynos ADC drvier have to control this clock.

 Exynos3250/Exynos4/Exynos5 has 'adc' clock as following:
 - 'adc' clock: bus clock for ADC

 Exynos3250 has additional 'sclk_tsadc' clock as following:
 - 'sclk_tsadc' clock: special clock for ADC which provide clock to 
 internal ADC

 Exynos 4210/4212/4412 and Exynos5250/5420 has not included 'sclk_tsadc' 
 clock
 in FSYS_BLK. But, Exynos3250 based on Cortex-A7 has only included 
 'sclk_tsadc'
 clock in FSYS_BLK.

 Cc: Jonathan Cameron 
 Cc: Kukjin Kim 
 Cc: Naveen Krishna Chatradhi
 Cc: linux-...@vger.kernel.org
 Signed-off-by: Chanwoo Choi 
 Acked-by: Kyungmin Park 
 ---
  drivers/iio/adc/exynos_adc.c | 54 
 +---
  1 file changed, 41 insertions(+), 13 deletions(-)

 diff --git a/drivers/iio/adc/exynos_adc.c b/drivers/iio/adc/exynos_adc.c
 index d25b262..3c99243 100644
 --- a/drivers/iio/adc/exynos_adc.c
 +++ b/drivers/iio/adc/exynos_adc.c
 @@ -40,8 +40,9 @@
  #include 

  enum adc_version {
 -   ADC_V1,
 -   ADC_V2
 +   ADC_V1 = 0x1,
 +   ADC_V2 = 0x2,
 +   ADC_V3 = (ADC_V1 | ADC_V2),
>>>
>>> Can't this be simply 0x3? Or is this not really a h/w version?
>>
>> Even thought ADC_V3 isn't h/w revision, ADC_V3 include all featues of ADC_V2
>> and only one difference of clock(sclk_tsadc) from ADC_V2.
>> I want to describethat ADC_V3 include ADC_V2 feature So, I add as following:
>>   >> +   ADC_V3 = (ADC_V1 | ADC_V2),
>>
>>>
  };

  /* EXYNOS4412/5250 ADC_V1 registers definitions */
 @@ -88,6 +89,7 @@ struct exynos_adc {
 void __iomem*regs;
 void __iomem*enable_reg;
 struct clk  *clk;
 +   struct clk  *sclk;
 unsigned intirq;
 struct regulator*vdd;

 @@ -100,6 +102,7 @@ struct exynos_adc {
  static const struct of_device_id exynos_adc_match[] = {
 { .compatible = "samsung,exynos-adc-v1", .data = (void *)ADC_V1 },
 { .compatible = "samsung,exynos-adc-v2", .data = (void *)ADC_V2 },
 +   { .compatible = "samsung,exynos-adc-v3", .data = (void *)ADC_V3 },
 {},
  };
  MODULE_DEVICE_TABLE(of, exynos_adc_match);
 @@ -128,7 +131,7 @@ static int exynos_read_raw(struct iio_dev *indio_dev,
 mutex_lock(_dev->mlock);

 /* Select the channel to be used and Trigger conversion */
 -   if (info->version == ADC_V2) {
 +   if (info->version & ADC_V2) {
>>>
>>> So, now this would be applicable for ADC_V3 too, right?
>
> ADC_V3 isn't h/w version. So, I think this code is proper instead of using 
> ADC_V3 direclty.
> I want to use ADC_V3 version on checking clock(sclk_tsadc).

OK. Just a readability concern. Probably a check something like
(version >= ADC_V2) would have made it more explicit.

>
>>>
>>>
 con2 = readl(ADC_V2_CON2(info->regs));
 con2 &= ~ADC_V2_CON2_ACH_MASK;
 con2 |= ADC_V2_CON2_ACH_SEL(chan->address);
 @@ -165,7 +168,7 @@ static irqreturn_t exynos_adc_isr(int irq, void 
 *dev_id)
 info->value = readl(ADC_V1_DATX(info->regs)) &
 ADC_DATX_MASK;
 /* clear irq */
 -   if (info->version == ADC_V2)
 +   if (info->version & ADC_V2)
 writel(1, ADC_V2_INT_ST(info->regs));
 else
 writel(1, ADC_V1_INTCLR(info->regs));
 @@ -226,11 +229,25 @@ static int exynos_adc_remove_devices(struct device 
 *dev, void *c)
 return 0;
  }

 +static void exynos_adc_enable_clock(struct exynos_adc *info, bool enable)
 +{
 +   if (enable) {
 +   clk_prepare_enable(info->clk);
>>>
>>> This could fail. Is it OK without any checks?
>>
>> OK, I'll check return value.
>
> Do you want to check return value always?

It is a good practice to check the return values for errors. Having
said that, it depends on your s/w design and the h/w requirements. If
proceeding with the error does not cause any functional issues, then
it is OK to ignore them. However, I would at least prefer to print a
warning/info message about such failures.

-- 
With warm regards,
Sachin

Re: Ext4: deadlock occurs when running fsstress and ENOSPC errors are seen.

2014-04-15 Thread Amit Sahrawat
Thanks Ted, for the detailed reply.

We could not retain the original HDD on which we hit the issue. In
order to replicate the problem, we did the following steps:
1)  Make 250MB partition
2)  Format the partition with blocksize ‘4K’
3)  Using dumpe2fs – get the block group information
Group 0: (Blocks 0-32767)
  Checksum 0x89ae, unused inodes 29109
  Primary superblock at 0, Group descriptors at 1-1
  Reserved GDT blocks at 2-15
  Block bitmap at 16 (+16), Inode bitmap at 32 (+32)
  Inode table at 48-957 (+48)
  26826 free blocks, 29109 free inodes, 2 directories, 29109 unused inodes
  Free blocks: 5942-32767
  Free inodes: 12-29120
Group 1: (Blocks 32768-58226) [INODE_UNINIT]
  Checksum 0xb43a, unused inodes 29120
  Backup superblock at 32768, Group descriptors at 32769-32769
  Reserved GDT blocks at 32770-32783
  Block bitmap at 17 (bg #0 + 17), Inode bitmap at 33 (bg #0 + 33)
  Inode table at 958-1867 (bg #0 + 958)
  25443 free blocks, 29120 free inodes, 0 directories, 29120 unused inodes
  Free blocks: 32784-58226
  Free inodes: 29121-58240
4)  Corrupt block group ‘1’ by writing all 1’s; we had one file
filled with 1’s, so using ‘dd’:
dd if=i_file of=/dev/sdb1 bs=4096 seek=17 count=1
After this, mount the partition, create a few random-sized files, and
then run ‘fsstress’:

fsstress -p 10 -n 100 -l 100 -d /mnt/test_dir

and we got logs like these before the hang:
#fsstress -p 10 -n 100 -l 100 -d /mnt/test_dir
seed = 582332
EXT4-fs error (device sdb1): ext4_mb_generate_buddy:743: group 1,
20480 clusters in bitmap, 25443 in gd; block bitmap corrupt.
JBD2: Spotted dirty metadata buffer (dev = sdb1, blocknr = 0). There's
a risk of filesystem corruption in case of system crash.

EXT4-fs (sdb1): delayed block allocation failed for inode 26 at
logical offset 181 with max blocks 2 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost

EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=51
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=9
EXT4-fs (sdb1): i_reserved_meta_blocks=1

EXT4-fs (sdb1): delayed block allocation failed for inode 101 at
logical offset 198 with max blocks 1 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost

EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=43
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=1
EXT4-fs (sdb1): i_reserved_meta_blocks=1

EXT4-fs (sdb1): delayed block allocation failed for inode 94 at
logical offset 450 with max blocks 1 with error -28
EXT4-fs (sdb1): This should not happen!! Data will be lost

EXT4-fs (sdb1): Total free blocks count 0
EXT4-fs (sdb1): Free/Dirty block details
EXT4-fs (sdb1): free_blocks=25443
EXT4-fs (sdb1): dirty_blocks=36
EXT4-fs (sdb1): Block reservation details
EXT4-fs (sdb1): i_reserved_data_blocks=12
EXT4-fs (sdb1): i_reserved_meta_blocks=2


…
EXT4-fs (sdb1): error count: 3
EXT4-fs (sdb1): initial error at 545: ext4_mb_generate_buddy:743
EXT4-fs (sdb1): last error at 42: ext4_mb_generate_buddy:743
…
Yes, we are running this on an ARM target and we did not set the time
before running these operations. So the times actually correspond to:
date -d @545
Thu Jan  1 05:39:05 IST 1970
date -d @42
Thu Jan  1 05:30:42 IST 1970


Yes, we are running kernel version 3.8 and applied the patches from
‘Darrick’ to fix the first looping around in ext4_da_writepages.

As you suggested, we also made a change to mark the FS read-only in
case of corruption. The changes are:
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c0fbd96..04f3a66 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1641,8 +1641,10 @@ static void mpage_da_map_and_submit(struct
mpage_da_data *mpd)
 mpd->b_size >> mpd->inode->i_blkbits, err);
ext4_msg(sb, KERN_CRIT,
"This should not happen!! Data will be lost\n");
-   if (err == -ENOSPC)
+   if (err == -ENOSPC) {
ext4_print_free_blocks(mpd->inode);
+   mpd->inode->i_sb->s_flags |= MS_RDONLY;
+   }
}
/* invalidate all the pages */
ext4_da_block_invalidatepages(mpd);
@@ -2303,6 +2305,36 @@ out:
return ret;
 }

+static void ext4_invalidate_mapping(struct  address_space *mapping)
+{
+   struct pagevec pvec;
+   unsigned int i;
+   pgoff_t index = 0;
+   struct inode* inode = mapping->host;
+
+   pagevec_init(, 0);
+   while (pagevec_lookup_tag(, mapping, , PAGECACHE_TAG_DIRTY,
+ PAGEVEC_SIZE)) {
+   for (i = 0; i < pagevec_count(); i++) {
+   struct page *page = pvec.pages[i];
+  

Re: [PATCHv2 1/2] iio: adc: exynos_adc: Control special clock of ADC to support Exynos3250 ADC

2014-04-15 Thread Chanwoo Choi
Hi Sachin,

On 04/16/2014 01:44 PM, Chanwoo Choi wrote:
> Hi Sachin,
> 
> On 04/16/2014 12:48 PM, Sachin Kamat wrote:
>> Hi Chanwoo,
>>
>> On 14 April 2014 14:37, Chanwoo Choi  wrote:
>>> This patch control special clock for ADC in Exynos series's FSYS block.
>>> If special clock of ADC is registerd on clock list of common clk framework,
>>> Exynos ADC drvier have to control this clock.
>>>
>>> Exynos3250/Exynos4/Exynos5 has 'adc' clock as following:
>>> - 'adc' clock: bus clock for ADC
>>>
>>> Exynos3250 has additional 'sclk_tsadc' clock as following:
>>> - 'sclk_tsadc' clock: special clock for ADC which provide clock to internal 
>>> ADC
>>>
>>> Exynos 4210/4212/4412 and Exynos5250/5420 has not included 'sclk_tsadc' 
>>> clock
>>> in FSYS_BLK. But, Exynos3250 based on Cortex-A7 has only included 
>>> 'sclk_tsadc'
>>> clock in FSYS_BLK.
>>>
>>> Cc: Jonathan Cameron 
>>> Cc: Kukjin Kim 
>>> Cc: Naveen Krishna Chatradhi
>>> Cc: linux-...@vger.kernel.org
>>> Signed-off-by: Chanwoo Choi 
>>> Acked-by: Kyungmin Park 
>>> ---
>>>  drivers/iio/adc/exynos_adc.c | 54 
>>> +---
>>>  1 file changed, 41 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/iio/adc/exynos_adc.c b/drivers/iio/adc/exynos_adc.c
>>> index d25b262..3c99243 100644
>>> --- a/drivers/iio/adc/exynos_adc.c
>>> +++ b/drivers/iio/adc/exynos_adc.c
>>> @@ -40,8 +40,9 @@
>>>  #include 
>>>
>>>  enum adc_version {
>>> -   ADC_V1,
>>> -   ADC_V2
>>> +   ADC_V1 = 0x1,
>>> +   ADC_V2 = 0x2,
>>> +   ADC_V3 = (ADC_V1 | ADC_V2),
>>
>> Can't this be simply 0x3? Or is this not really a h/w version?
> 
> Even thought ADC_V3 isn't h/w revision, ADC_V3 include all featues of ADC_V2
> and only one difference of clock(sclk_tsadc) from ADC_V2.
> I want to describethat ADC_V3 include ADC_V2 feature So, I add as following:
>   >> +   ADC_V3 = (ADC_V1 | ADC_V2),
> 
>>
>>>  };
>>>
>>>  /* EXYNOS4412/5250 ADC_V1 registers definitions */
>>> @@ -88,6 +89,7 @@ struct exynos_adc {
>>> void __iomem*regs;
>>> void __iomem*enable_reg;
>>> struct clk  *clk;
>>> +   struct clk  *sclk;
>>> unsigned intirq;
>>> struct regulator*vdd;
>>>
>>> @@ -100,6 +102,7 @@ struct exynos_adc {
>>>  static const struct of_device_id exynos_adc_match[] = {
>>> { .compatible = "samsung,exynos-adc-v1", .data = (void *)ADC_V1 },
>>> { .compatible = "samsung,exynos-adc-v2", .data = (void *)ADC_V2 },
>>> +   { .compatible = "samsung,exynos-adc-v3", .data = (void *)ADC_V3 },
>>> {},
>>>  };
>>>  MODULE_DEVICE_TABLE(of, exynos_adc_match);
>>> @@ -128,7 +131,7 @@ static int exynos_read_raw(struct iio_dev *indio_dev,
>>> mutex_lock(_dev->mlock);
>>>
>>> /* Select the channel to be used and Trigger conversion */
>>> -   if (info->version == ADC_V2) {
>>> +   if (info->version & ADC_V2) {
>>
>> So, now this would be applicable for ADC_V3 too, right?

ADC_V3 isn't a h/w version. So I think this code is proper instead of
using ADC_V3 directly. I want to use the ADC_V3 version when checking
the clock (sclk_tsadc).

>>
>>
>>> con2 = readl(ADC_V2_CON2(info->regs));
>>> con2 &= ~ADC_V2_CON2_ACH_MASK;
>>> con2 |= ADC_V2_CON2_ACH_SEL(chan->address);
>>> @@ -165,7 +168,7 @@ static irqreturn_t exynos_adc_isr(int irq, void *dev_id)
>>> info->value = readl(ADC_V1_DATX(info->regs)) &
>>> ADC_DATX_MASK;
>>> /* clear irq */
>>> -   if (info->version == ADC_V2)
>>> +   if (info->version & ADC_V2)
>>> writel(1, ADC_V2_INT_ST(info->regs));
>>> else
>>> writel(1, ADC_V1_INTCLR(info->regs));
>>> @@ -226,11 +229,25 @@ static int exynos_adc_remove_devices(struct device 
>>> *dev, void *c)
>>> return 0;
>>>  }
>>>
>>> +static void exynos_adc_enable_clock(struct exynos_adc *info, bool enable)
>>> +{
>>> +   if (enable) {
>>> +   clk_prepare_enable(info->clk);
>>
>> This could fail. Is it OK without any checks?
> 
> OK, I'll check return value.

Do you want to check return value always?
Thinking about it again, some device drivers in mainline do not check
the return value of the clock functions. If the maintainer confirms
this modification, I'll fix it as you suggest.

Best Regards,
Chanwoo Choi


Re: Re: [PATCH] kernel/panic: Add "late_kdump" option for kdump in unstable condition

2014-04-15 Thread Masami Hiramatsu
(2014/04/16 11:33), Vivek Goyal wrote:
> On Wed, Apr 16, 2014 at 10:28:39AM +0900, Masami Hiramatsu wrote:
>> (2014/04/15 23:08), Vivek Goyal wrote:
>>> On Tue, Apr 15, 2014 at 10:37:40AM +0900, Masami Hiramatsu wrote:
>>>
>>> [..]
> Masami,
>
> So what's the alternative to kdump which is more reliable? IOW, what
> action you are planning to take through kmsg_dump() or through
> panic_notifiers?
>
> I have seen that many a times developers have tried to make the case 
> to save kernel buffers to NVRAM. Does it work well? Has it been proven
> to be more reliable than kdump?

 Yeah, one possible option is the NVRAM, but even with the serial,
 there are other reasons to kick the notifiers, e.g.
  - dump to ipmi which has a very small amount of non-volatile memory
  - ftrace_dump() to dump "flight recorder" log to serial
>>>
>>> So why do we need to run them in crashed kernel? Only argument I seem
>>> to receive that there is no guarantee that kdump kernel will successfully
>>> boot hence we want to run these notifiers.
>>>
>>> But what's the guarantee that these will run successfully without creating
>>> futher issues? Is there data to prove it.
>>
>> I think there is no guarantee, but that's same as kdump is.
>> However, if we can try both, there is higher possibility (more cases)
>> to save some information.
> 
> This is only valid if the entity which is running before kdump has 
> higher probability of saving some useful information. So do kmsg_dump()
> and backend drivers provide more reliable way to save kernel logs as
> compared to kdump? 

IMHO, the reliability discussion is meaningless at this point, because
it depends on the hardware/software configuration. For example, if
someone has set up a server with a serial logger, serial output is
much more reliable. But that does NOT mean kdump is useless.
For precise information we actually need kdump, but it will be less
reliable than a serial logger. On the other hand, if we have no
such external equipment, we'd better try kdump first.
(Actually, the problem with kdump is here: if we try kdump, it never
 returns when it fails in the second kernel boot. Thus we can't fall
 back to the other handlers.)

> [..]
>>> I think big debate here is that we should be able to do most of it
>>> in second kernel. 
>>
>> No, that's another topic what we talk about.
>>
>> What I (and others who had argued) consider that in some rare cases,
>> kdump might fail to boot up the second kernel, and only for who worries
>> in those cases, we can give a chance.
> 
> And *rare failure cases* don't exist in other mechanisms which are
> planning to take control before kdump?

No, definitely not. Thus users must bet on the safer way. :)
As I said, this option doesn't guarantee improving the safety
of getting crash information; it just gives another option which
users want.

> You are assuming that any entity
> which runs before kdump is more reliable than kdump. And I don't think
> anybody has any data to prove that. People are just looking for a hook
> to execute things before kdump hoping that it will provide them better
> results.

Yeah, they hope, hope to have a chance to bet on the handler which
they believe is better. As you said, there is no data to prove that
kdump is safer than the others either. Why don't we give them a hand?

>>> If you provide a knob to run these in first kernel, this functionality
>>> will never migrate to second kernel.
>>
>> No, there are many use-cases which doesn't (and can't) use kdump
>> because of the limitation of resources etc. For those cases, that
>> functionality never migrate (means move) to the second one.
> 
> What are those use cases? What resources you are referring too. Are you
> planning to do a whole lot after kernel has crashed. That will not make
> much sense.

There are not only enterprise server users, but also embedded
device users. Sometimes such small devices are deployed at remote
sites or somewhere in the field, and they have limited resources.
Kdump is too big for them, but other panic handlers will be useful.

>>> And trying to make them safe in
>>> crashed kernel is a losing battle, I think.
>>
>> Why? the best goal what users expect is both panic-notifiers and kdump
>> runs safely. If one of them fails, that's a bug (except for some rare
>> hardware-related corruption.)
> 
> So you think that running panic-notifiers can be made safe? How would
> we do that?

Step 1. Add the late_kdump option.
Step 2. "Brave" users start using the option with panic-notifiers.
Step 3. They hits problems on that and reports bugs.
Step 4. We fix them.

> What's the special action panicn notifiers are taking which can't be
> done in second kernel. 

No, and that is out of focus. What I meant is NOT what action
we can take IN the second kernel, but what action we can take
BEFORE the second kernel.

>>> So providing this knob does not help with making these notifiers better.
>>> These notifiers can become better 

Re: [PATCH 17/19] VFS: set PF_FSTRANS while namespace_sem is held.

2014-04-15 Thread Al Viro
On Wed, Apr 16, 2014 at 02:03:37PM +1000, NeilBrown wrote:
> namespace_sem can be taken while various i_mutex locks are held, so we
> need to avoid reclaim from blocking on an FS (particularly loop-back
> NFS).

I would really prefer to deal with that differently - by explicit change of
gfp_t arguments of allocators.

The thing is, namespace_sem is held *only* over allocations, and not a lot
of them, at that - only mnt_alloc_id(), mnt_alloc_group_id(), alloc_vfsmnt()
and new_mountpoint().  That is all that is allowed.

Again, actual work with filesystems (setup, shutdown, remount, pathname
resolution, etc.) is all done outside of namespace_sem; it's held only
for manipulations of fs/{namespace,pnode}.c data structures and the only
reason it isn't a spinlock is that we need to do some allocations.

So I'd rather slap GFP_NOFS on those few allocations...


Re: [PATCHv2 1/2] iio: adc: exynos_adc: Control special clock of ADC to support Exynos3250 ADC

2014-04-15 Thread Chanwoo Choi
Hi Sachin,

On 04/16/2014 12:48 PM, Sachin Kamat wrote:
> Hi Chanwoo,
> 
> On 14 April 2014 14:37, Chanwoo Choi  wrote:
>> This patch control special clock for ADC in Exynos series's FSYS block.
>> If special clock of ADC is registerd on clock list of common clk framework,
>> Exynos ADC drvier have to control this clock.
>>
>> Exynos3250/Exynos4/Exynos5 has 'adc' clock as following:
>> - 'adc' clock: bus clock for ADC
>>
>> Exynos3250 has additional 'sclk_tsadc' clock as following:
>> - 'sclk_tsadc' clock: special clock for ADC which provide clock to internal 
>> ADC
>>
>> Exynos 4210/4212/4412 and Exynos5250/5420 has not included 'sclk_tsadc' clock
>> in FSYS_BLK. But, Exynos3250 based on Cortex-A7 has only included 
>> 'sclk_tsadc'
>> clock in FSYS_BLK.
>>
>> Cc: Jonathan Cameron 
>> Cc: Kukjin Kim 
>> Cc: Naveen Krishna Chatradhi
>> Cc: linux-...@vger.kernel.org
>> Signed-off-by: Chanwoo Choi 
>> Acked-by: Kyungmin Park 
>> ---
>>  drivers/iio/adc/exynos_adc.c | 54 
>> +---
>>  1 file changed, 41 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/iio/adc/exynos_adc.c b/drivers/iio/adc/exynos_adc.c
>> index d25b262..3c99243 100644
>> --- a/drivers/iio/adc/exynos_adc.c
>> +++ b/drivers/iio/adc/exynos_adc.c
>> @@ -40,8 +40,9 @@
>>  #include 
>>
>>  enum adc_version {
>> -   ADC_V1,
>> -   ADC_V2
>> +   ADC_V1 = 0x1,
>> +   ADC_V2 = 0x2,
>> +   ADC_V3 = (ADC_V1 | ADC_V2),
> 
> Can't this be simply 0x3? Or is this not really a h/w version?

Even though ADC_V3 isn't a h/w revision, ADC_V3 includes all features
of ADC_V2, with only one clock (sclk_tsadc) difference from ADC_V2.
I want to describe that ADC_V3 includes the ADC_V2 features, so I
added it as follows:
>> +   ADC_V3 = (ADC_V1 | ADC_V2),

> 
>>  };
>>
>>  /* EXYNOS4412/5250 ADC_V1 registers definitions */
>> @@ -88,6 +89,7 @@ struct exynos_adc {
>> void __iomem*regs;
>> void __iomem*enable_reg;
>> struct clk  *clk;
>> +   struct clk  *sclk;
>> unsigned intirq;
>> struct regulator*vdd;
>>
>> @@ -100,6 +102,7 @@ struct exynos_adc {
>>  static const struct of_device_id exynos_adc_match[] = {
>> { .compatible = "samsung,exynos-adc-v1", .data = (void *)ADC_V1 },
>> { .compatible = "samsung,exynos-adc-v2", .data = (void *)ADC_V2 },
>> +   { .compatible = "samsung,exynos-adc-v3", .data = (void *)ADC_V3 },
>> {},
>>  };
>>  MODULE_DEVICE_TABLE(of, exynos_adc_match);
>> @@ -128,7 +131,7 @@ static int exynos_read_raw(struct iio_dev *indio_dev,
>> mutex_lock(_dev->mlock);
>>
>> /* Select the channel to be used and Trigger conversion */
>> -   if (info->version == ADC_V2) {
>> +   if (info->version & ADC_V2) {
> 
> So, now this would be applicable for ADC_V3 too, right?
> 
> 
>> con2 = readl(ADC_V2_CON2(info->regs));
>> con2 &= ~ADC_V2_CON2_ACH_MASK;
>> con2 |= ADC_V2_CON2_ACH_SEL(chan->address);
>> @@ -165,7 +168,7 @@ static irqreturn_t exynos_adc_isr(int irq, void *dev_id)
>> info->value = readl(ADC_V1_DATX(info->regs)) &
>> ADC_DATX_MASK;
>> /* clear irq */
>> -   if (info->version == ADC_V2)
>> +   if (info->version & ADC_V2)
>> writel(1, ADC_V2_INT_ST(info->regs));
>> else
>> writel(1, ADC_V1_INTCLR(info->regs));
>> @@ -226,11 +229,25 @@ static int exynos_adc_remove_devices(struct device 
>> *dev, void *c)
>> return 0;
>>  }
>>
>> +static void exynos_adc_enable_clock(struct exynos_adc *info, bool enable)
>> +{
>> +   if (enable) {
>> +   clk_prepare_enable(info->clk);
> 
> This could fail. Is it OK without any checks?

OK, I'll check return value.
> 
>> +   if (info->version == ADC_V3)
>> +   clk_prepare_enable(info->sclk);
> 
> ditto.

ditto.

> 
>> +
>> +   } else {
>> +   if (info->version == ADC_V3)
>> +   clk_disable_unprepare(info->sclk);
>> +   clk_disable_unprepare(info->clk);
>> +   }
>> +}
>> +
>>  static void exynos_adc_hw_init(struct exynos_adc *info)
>>  {
>> u32 con1, con2;
>>
>> -   if (info->version == ADC_V2) {
>> +   if (info->version & ADC_V2) {
>> con1 = ADC_V2_CON1_SOFT_RESET;
>> writel(con1, ADC_V2_CON1(info->regs));
>>
>> @@ -300,6 +317,8 @@ static int exynos_adc_probe(struct platform_device *pdev)
>>
>> writel(1, info->enable_reg);
>>
>> +   info->version = exynos_adc_get_version(pdev);
>> +
>> info->clk = devm_clk_get(>dev, "adc");
>> if (IS_ERR(info->clk)) {
>> dev_err(>dev, "failed getting clock, err = %ld\n",
>> @@ -308,6 +327,17 @@ static int exynos_adc_probe(struct platform_device 
>> *pdev)
>> goto err_irq;
>> }
>>
>> + 

[RFC PATCH 2/2] PCI: exynos: Add PCIe support for Samsung GH7 SoC

2014-04-15 Thread Jingoo Han
Samsung GH7 has four PCIe controllers which can be used as root
complex for PCIe interface.

Signed-off-by: Jingoo Han 
---
 drivers/pci/host/Kconfig  |2 +-
 drivers/pci/host/pci-exynos.c |  135 ++---
 2 files changed, 126 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig
index a6f67ec..3be047c 100644
--- a/drivers/pci/host/Kconfig
+++ b/drivers/pci/host/Kconfig
@@ -11,7 +11,7 @@ config PCIE_DW
 
 config PCI_EXYNOS
bool "Samsung Exynos PCIe controller"
-   depends on SOC_EXYNOS5440
+   depends on SOC_EXYNOS5440 || ARCH_GH7
select PCIEPORTBUS
select PCIE_DW
 
diff --git a/drivers/pci/host/pci-exynos.c b/drivers/pci/host/pci-exynos.c
index 3de6bfb..6e845ca 100644
--- a/drivers/pci/host/pci-exynos.c
+++ b/drivers/pci/host/pci-exynos.c
@@ -57,6 +57,8 @@ struct exynos_pcie {
 #define PCIE_NONSTICKY_RESET   0x024
 #define PCIE_APP_INIT_RESET0x028
 #define PCIE_APP_LTSSM_ENABLE  0x02c
+#define PCIE_SYS_AUX_PWR_DET   0x038
+#define PCIE_SYS_AUX_PWR_ENABLE(0x1 << 0)
 #define PCIE_ELBI_RDLH_LINKUP  0x064
 #define PCIE_ELBI_LTSSM_ENABLE 0x1
 #define PCIE_ELBI_SLV_AWMISC   0x11c
@@ -72,6 +74,23 @@ struct exynos_pcie {
 #define PCIE_PHY_TRSVREG_RESET 0x020
 #define PCIE_PHY_TRSV_RESET0x024
 
+/* PCIe PHY glue registers */
+#define PCIE_PHY_GLUE_REG0 0x000
+#define PCIE_PHY_GLUE_GLOBAL_RESET (0x1 << 0)
+#define PCIE_PHY_GLUE_COMMON_RESET (0x1 << 1)
+#define PCIE_PHY_GLUE_MAC_RESET(0x1 << 11)
+#define PCIE_PHY_GLUE_REG2 0x008
+#define PCIE_PHY_GLUE_CLK100M_DS_MAX   (0x7 << 0)
+#define PCIE_PHY_GLUE_CLK100M_RFCLK(0x1 << 3)
+#define PCIE_PHY_GLUE_CLK100M_ENABLE   (0x1 << 4)
+#define PCIE_PHY_GLUE_PLL_BUF_ENABLE   (0x1 << 8)
+#define PCIE_PHY_GLUE_PLL_DIV_ENABLE   (0x1 << 9)
+#define PCIE_PHY_GLUE_REFCLK_IN(x) (((x) & 0xf) << 10)
+#define PCIE_PHY_GLUE_REG3 0x00c
+#define PCIE_PHY_GLUE_REF_RATE_100MHZ  (0x2 << 0)
+#define PCIE_PHY_GLUE_REG4 0x010
+#define PCIE_PHY_GLUE_MODE_PCIE(0x0 << 0)
+
 /* PCIe PHY registers */
 #define PCIE_PHY_IMPEDANCE 0x004
 #define PCIE_PHY_PLL_DIV_0 0x008
@@ -164,11 +183,45 @@ static void exynos_pcie_sideband_dbi_r_mode(struct pcie_port *pp, bool on)
}
 }
 
+static void exynos_pcie_set_phy_mode(struct pcie_port *pp)
+{
+   u32 val;
+   struct exynos_pcie *exynos_pcie = to_exynos_pcie(pp);
+
+   if (!of_device_is_compatible(pp->dev->of_node, "samsung,gh7-pcie"))
+   return;
+
+   /* configure phy reference clock setting */
+   val = exynos_blk_readl(exynos_pcie, PCIE_PHY_GLUE_REG2);
+   val |= PCIE_PHY_GLUE_CLK100M_ENABLE | PCIE_PHY_GLUE_CLK100M_RFCLK |
+   PCIE_PHY_GLUE_CLK100M_DS_MAX;
+   exynos_blk_writel(exynos_pcie, val, PCIE_PHY_GLUE_REG2);
+
+   val = exynos_blk_readl(exynos_pcie, PCIE_PHY_GLUE_REG2);
+   val |= PCIE_PHY_GLUE_PLL_DIV_ENABLE | PCIE_PHY_GLUE_PLL_BUF_ENABLE;
+   exynos_blk_writel(exynos_pcie, val, PCIE_PHY_GLUE_REG2);
+
+   val = exynos_blk_readl(exynos_pcie, PCIE_PHY_GLUE_REG2);
+   val |= PCIE_PHY_GLUE_REFCLK_IN(6);
+   exynos_blk_writel(exynos_pcie, val, PCIE_PHY_GLUE_REG2);
+
+   /* set phy reference clock  */
+   exynos_blk_writel(exynos_pcie, PCIE_PHY_GLUE_REF_RATE_100MHZ,
+ PCIE_PHY_GLUE_REG3);
+
+   /* set phy mode to pcie mode */
+   exynos_blk_writel(exynos_pcie, PCIE_PHY_GLUE_MODE_PCIE,
+ PCIE_PHY_GLUE_REG4);
+}
+
 static void exynos_pcie_assert_core_reset(struct pcie_port *pp)
 {
u32 val;
struct exynos_pcie *exynos_pcie = to_exynos_pcie(pp);
 
+   if (of_device_is_compatible(pp->dev->of_node, "samsung,gh7-pcie"))
+   return;
+
val = exynos_elb_readl(exynos_pcie, PCIE_CORE_RESET);
val &= ~PCIE_CORE_RESET_ENABLE;
exynos_elb_writel(exynos_pcie, val, PCIE_CORE_RESET);
@@ -190,27 +243,48 @@ static void exynos_pcie_deassert_core_reset(struct pcie_port *pp)
exynos_elb_writel(exynos_pcie, 1, PCIE_NONSTICKY_RESET);
exynos_elb_writel(exynos_pcie, 1, PCIE_APP_INIT_RESET);
exynos_elb_writel(exynos_pcie, 0, PCIE_APP_INIT_RESET);
-   exynos_blk_writel(exynos_pcie, 1, PCIE_PHY_MAC_RESET);
+
+   if (of_device_is_compatible(pp->dev->of_node, "samsung,gh7-pcie")) {
+   val = exynos_blk_readl(exynos_pcie, PCIE_PHY_GLUE_REG0);
+   val |= PCIE_PHY_GLUE_MAC_RESET;
+   exynos_blk_writel(exynos_pcie, val, PCIE_PHY_GLUE_REG0);
+   } else {
+   exynos_blk_writel(exynos_pcie, 1, PCIE_PHY_MAC_RESET);
+   }
 }
 
 static void exynos_pcie_assert_phy_reset(struct pcie_port *pp)
 {
struct exynos_pcie *exynos_pcie = to_exynos_pcie(pp);
 
+   if 

[RFC PATCH 1/2] PCI: designware: Add ARM64 PCI support

2014-04-15 Thread Jingoo Han
Add ARM64 PCI support for the Synopsys DesignWare PCIe controller by
using the arm64 arch PCI support and creating a generic PCI host
bridge from the device tree.

Signed-off-by: Jingoo Han 
---
 drivers/pci/host/pcie-designware.c |   32 
 1 file changed, 32 insertions(+)

diff --git a/drivers/pci/host/pcie-designware.c b/drivers/pci/host/pcie-designware.c
index 6d23d8c..fac0440 100644
--- a/drivers/pci/host/pcie-designware.c
+++ b/drivers/pci/host/pcie-designware.c
@@ -65,14 +65,27 @@
 #define PCIE_ATU_FUNC(x)   (((x) & 0x7) << 16)
 #define PCIE_ATU_UPPER_TARGET  0x91C
 
+#ifdef CONFIG_ARM
 static struct hw_pci dw_pci;
+#endif
 
 static unsigned long global_io_offset;
 
+#ifdef CONFIG_ARM
 static inline struct pcie_port *sys_to_pcie(struct pci_sys_data *sys)
 {
return sys->private_data;
 }
+#endif
+
+#ifdef CONFIG_ARM64
+static inline struct pcie_port *sys_to_pcie(struct pcie_port *pp)
+{
+   return pp;
+}
+
+static struct pci_ops dw_pcie_ops;
+#endif
 
 int dw_pcie_cfg_read(void __iomem *addr, int where, int size, u32 *val)
 {
@@ -381,7 +394,9 @@ static int dw_pcie_msi_map(struct irq_domain *domain, unsigned int irq,
 {
irq_set_chip_and_handler(irq, &dw_pcie_msi_irq_chip, handle_simple_irq);
irq_set_chip_data(irq, domain->host_data);
+#ifdef CONFIG_ARM
set_irq_flags(irq, IRQF_VALID);
+#endif
 
return 0;
 }
@@ -397,6 +412,10 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
struct of_pci_range_parser parser;
u32 val;
int i;
+#ifdef CONFIG_ARM64
+   struct pci_host_bridge *bridge;
+   resource_size_t lastbus;
+#endif
 
if (of_pci_range_parser_init(&parser, np)) {
dev_err(pp->dev, "missing ranges property\n");
@@ -489,6 +508,7 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
val |= PORT_LOGIC_SPEED_CHANGE;
dw_pcie_wr_own_conf(pp, PCIE_LINK_WIDTH_SPEED_CONTROL, 4, val);
 
+#ifdef CONFIG_ARM
dw_pci.nr_controllers = 1;
dw_pci.private_data = (void **)&pp;
 
@@ -497,6 +517,16 @@ int __init dw_pcie_host_init(struct pcie_port *pp)
 #ifdef CONFIG_PCI_DOMAINS
dw_pci.domain++;
 #endif
+#endif
+
+#ifdef CONFIG_ARM64
+   bridge = of_create_pci_host_bridge(pp->dev, &dw_pcie_ops, pp);
+   if (IS_ERR_OR_NULL(bridge))
+   return PTR_ERR(bridge);
+
+   lastbus = pci_rescan_bus(bridge->bus);
+   pci_bus_update_busn_res_end(bridge->bus, lastbus);
+#endif
 
return 0;
 }
@@ -695,6 +725,7 @@ static struct pci_ops dw_pcie_ops = {
.write = dw_pcie_wr_conf,
 };
 
+#ifdef CONFIG_ARM
 static int dw_pcie_setup(int nr, struct pci_sys_data *sys)
 {
struct pcie_port *pp;
@@ -758,6 +789,7 @@ static struct hw_pci dw_pci = {
.map_irq= dw_pcie_map_irq,
.add_bus= dw_pcie_add_bus,
 };
+#endif /* CONFIG_ARM */
 
 void dw_pcie_setup_rc(struct pcie_port *pp)
 {
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] usb: gadget: Add xilinx axi usb2 device support

2014-04-15 Thread Felipe Balbi
Hi,

On Tue, Apr 15, 2014 at 01:55:38PM -0400, Alan Stern wrote:
> On Tue, 15 Apr 2014, Felipe Balbi wrote:
> 
> > > 2. Does device need to know OUT transactions before hand so that OUT
> > > requests are queued for endpoint before packets are received
> > > from host?
> > 
> > well, no. Gadget driver shouldn't depend on that. That's UDC driver's
> > responsibility to manage that. I mean, if host sends OUT token and
> > there's nothing in the out queue, then UDC need to start transfer as
> > soon as gadget driver queues the request. If, on the other hand, gadget
> > driver queues packet before host has sent OUT token then you have two
> > choices:
> > 
> > 1) start the transfer - most HW will wait for OUT token
> > 2) wait for out token
> 
> I'm not familiar with the variations in all the different UDC hardware.  
> Nevertheless, I wouldn't describe the situation in those terms.

OK, I've oversimplified... what I meant was that even if you start a
transfer at the UDC level, nothing will happen on the bus until HW sees
an OUT token. The buffer pointed to by req->buf won't get any writes,
DMA won't do anything.

On (2) I meant that some HW (e.g. dwc3) will assert the IRQ line once
they see a token for which they have no transfer descriptors in the
internal controller's cache.

> If an OUT transaction occurs and the gadget driver hasn't queued a
> request, the UDC hardware could store the incoming data in an internal
> buffer or it could NAK the transaction.  There aren't any other
> choices.  If there isn't enough space available in an internal buffer,
> the only possible action is NAK.

in HS there's also NYET

> Regardless, gadget drivers do not need to queue requests for OUT
> endpoints before the host starts sending data.  When the request does

they're not required, but they can. It's the UDC driver's responsibility to
start consuming the queue at the proper time.

> get queued, the UDC driver will make sure that the transfer takes
> place.

correct.

-- 
balbi




[RFC PATCH 0/2] Add support for Samsung GH7 PCIe controller

2014-04-15 Thread Jingoo Han
This patch adds support for Samsung GH7 PCIe host controller.
The Samsung GH7 PCIe controller driver depends on the arm64 arch PCI
support, so Liviu Dudau's patchsets for creating a generic host_bridge
from the device tree [1] and for supporting PCI in AArch64 [2] are
required.

This series is marked as RFC, so any comments are welcome.
Thank you.

[1] http://www.spinics.net/lists/linux-pci/msg29786.html
[2] http://www.spinics.net/lists/linux-pci/msg29793.html

Jingoo Han (2):
  PCI: designware: Add ARM64 PCI support
  PCI: exynos: Add PCIe support for Samsung GH7 SoC

---
 drivers/pci/host/Kconfig   |2 +-
 drivers/pci/host/pci-exynos.c  |  135 +---
 drivers/pci/host/pcie-designware.c |   32 +
 3 files changed, 158 insertions(+), 11 deletions(-)

Best regards,
Jingoo Han



Re: [PATCH] usb: phy: mv_u3d: Remove usb phy driver for mv_u3d

2014-04-15 Thread Felipe Balbi
On Tue, Apr 15, 2014 at 08:08:32PM +0200, Paul Bolle wrote:
> On Tue, 2014-04-15 at 12:23 -0500, Felipe Balbi wrote:
> > so this means that drivers/usb/gadget/mv_u3d_core.c isn't used either ?
> 
> Why should it? There's no dependency on CPU_MMP3 for USB_MV_U3D anymore,
> is there?

no, but the UDC needs its PHY driver.

> > Instead of deleting this and introducing a new driver, why don't you
> > just help fix what's already in-tree ?
> 
> Were any of the reasons I gave for removing this driver incorrect? Has
> it actually ever been possible to build it?

I don't know, let me check:

$ make drivers/usb/phy/phy-mv-u3d-usb.o
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
make[1]: `include/generated/mach-types.h' is up to date.
  CALLscripts/checksyscalls.sh
  CC  drivers/usb/phy/phy-mv-u3d-usb.o

yup, builds just fine. Even if the ARCH support isn't in place, this
driver is *not* breaking anything, it's not preventing anyone from
getting work done and it might be helping Marvell decrease the amount of
changes they keep out of tree.

I don't see any problem with this driver being in tree as long as there
are people working on it, and I see the latest commit was 10 days ago.
It wouldn't be fair to Marvell to delete their driver if they're still
finding ways to make it useful one way or another.

cheers

-- 
balbi




[PATCH v2 0/7] Support 4 levels of translation tables for ARM64

2014-04-15 Thread Jungseok Lee
Hi All,

This is the 2nd patchset supporting 4 levels of translation tables for ARM64.

Firstly, the patchset decouples the page size from the level of translation
tables, taking into account the comment from Catalin Marinas:
http://www.spinics.net/linux/lists/arm-kernel/msg319552.html

Then, it implements 4 levels of translation tables for native, HYP
and stage2 sides.

All ARMv8 and ARMv7 related changes are validated with FastModels+kvmtool
and A15+QEMU, respectively.

Changes since v1:
- fixed unmatched data types as per Steve's comment
- removed unnecessary #ifdef in arch/arm64/mm/* as per Steve's comment
- revised create_pgd_entry to deal with PUD entry as per Steve's comment
- introduced a macro for initial memblock limit as per Steve's comment
- dropped "Fix line length exceeding 80 characters" patch as per Marc's comment
- removed unnecessary #ifdef in arch/arm/kvm/mmu.c as per Marc's comment
- added a macro for the number of objects as per Marc's comment

Jungseok Lee (7):
  arm64: Use pr_* instead of printk
  arm64: Decouple page size from level of translation tables
  arm64: Introduce a kernel configuration option for VA_BITS
  arm64: Add a description on 48-bit address space with 4KB pages
  arm64: Add 4 levels of page tables definition with 4KB pages
  arm64: mm: Implement 4 levels of translation tables
  arm64: KVM: Implement 4 levels of translation tables for HYP and stage2

Documentation/arm64/memory.txt|   59 +--
 arch/arm/include/asm/kvm_mmu.h|   10 ++
 arch/arm/kvm/mmu.c|   88 ++---
 arch/arm64/Kconfig|   51 +-
 arch/arm64/include/asm/kvm_arm.h  |   20 
 arch/arm64/include/asm/kvm_mmu.h  |   10 ++
 arch/arm64/include/asm/memblock.h |6 ++
 arch/arm64/include/asm/memory.h   |6 +-
 arch/arm64/include/asm/page.h |6 +-
 arch/arm64/include/asm/pgalloc.h  |   24 -
 arch/arm64/include/asm/pgtable-4level-hwdef.h |   50 ++
 arch/arm64/include/asm/pgtable-4level-types.h |   71 +
 arch/arm64/include/asm/pgtable-hwdef.h|8 +-
 arch/arm64/include/asm/pgtable.h  |   52 --
 arch/arm64/include/asm/tlb.h  |   10 +-
 arch/arm64/kernel/head.S  |   40 +---
 arch/arm64/kernel/traps.c |   19 ++--
 arch/arm64/mm/fault.c |1 +
 arch/arm64/mm/mmu.c   |   16 ++-
 19 files changed, 485 insertions(+), 62 deletions(-)



[PATCH v2 2/7] arm64: Decouple page size from level of translation tables

2014-04-15 Thread Jungseok Lee
This patch separates the page size from the level of translation tables
in the configuration. It facilitates the introduction of different
options, such as 4KB + 4 levels, 16KB + 4 levels and 64KB + 3 levels.

Signed-off-by: Jungseok Lee 
Reviewed-by: Sungjinn Chung 
---
 arch/arm64/Kconfig |   36 +++-
 arch/arm64/include/asm/page.h  |2 +-
 arch/arm64/include/asm/pgalloc.h   |4 ++--
 arch/arm64/include/asm/pgtable-hwdef.h |2 +-
 arch/arm64/include/asm/pgtable.h   |8 +++
 arch/arm64/include/asm/tlb.h   |2 +-
 6 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e6e4d37..1a2faf9 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -144,14 +144,48 @@ endmenu
 
 menu "Kernel Features"
 
+choice
+   prompt "Page size"
+   default ARM64_4K_PAGES
+   help
+ Allows page size.
+
+config ARM64_4K_PAGES
+   bool "4KB"
+   help
+ This feature enables 4KB pages support.
+
 config ARM64_64K_PAGES
-   bool "Enable 64KB pages support"
+   bool "64KB"
help
  This feature enables 64KB pages support (4KB by default)
  allowing only two levels of page tables and faster TLB
  look-up. AArch32 emulation is not available when this feature
  is enabled.
 
+endchoice
+
+choice
+   prompt "Level of translation tables"
+   default ARM64_3_LEVELS if ARM64_4K_PAGES
+   default ARM64_2_LEVELS if ARM64_64K_PAGES
+   help
+ Allows level of translation tables.
+
+config ARM64_2_LEVELS
+   bool "2 level"
+   depends on ARM64_64K_PAGES
+   help
+ This feature enables 2 levels of translation tables.
+
+config ARM64_3_LEVELS
+   bool "3 level"
+   depends on ARM64_4K_PAGES
+   help
+ This feature enables 3 levels of translation tables.
+
+endchoice
+
 config CPU_BIG_ENDIAN
bool "Build big-endian kernel"
help
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 46bf666..268e53d 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -33,7 +33,7 @@
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_ARM64_64K_PAGES
+#ifdef CONFIG_ARM64_2_LEVELS
 #include 
 #else
 #include 
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 9bea6e7..4829837 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -26,7 +26,7 @@
 
 #define check_pgt_cache()  do { } while (0)
 
-#ifndef CONFIG_ARM64_64K_PAGES
+#ifndef CONFIG_ARM64_2_LEVELS
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
@@ -44,7 +44,7 @@ static inline void pud_populate(struct mm_struct *mm, pud_t 
*pud, pmd_t *pmd)
set_pud(pud, __pud(__pa(pmd) | PMD_TYPE_TABLE));
 }
 
-#endif /* CONFIG_ARM64_64K_PAGES */
+#endif /* CONFIG_ARM64_2_LEVELS */
 
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
 extern void pgd_free(struct mm_struct *mm, pgd_t *pgd);
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 5fc8a66..9cd86c6 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -16,7 +16,7 @@
 #ifndef __ASM_PGTABLE_HWDEF_H
 #define __ASM_PGTABLE_HWDEF_H
 
-#ifdef CONFIG_ARM64_64K_PAGES
+#ifdef CONFIG_ARM64_2_LEVELS
 #include 
 #else
 #include 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 90c811f..a64ce5e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -47,7 +47,7 @@ extern void __pmd_error(const char *file, int line, unsigned long val);
 extern void __pgd_error(const char *file, int line, unsigned long val);
 
 #define pte_ERROR(pte) __pte_error(__FILE__, __LINE__, pte_val(pte))
-#ifndef CONFIG_ARM64_64K_PAGES
+#ifndef CONFIG_ARM64_2_LEVELS
 #define pmd_ERROR(pmd) __pmd_error(__FILE__, __LINE__, pmd_val(pmd))
 #endif
 #define pgd_ERROR(pgd) __pgd_error(__FILE__, __LINE__, pgd_val(pgd))
@@ -320,7 +320,7 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
  */
 #define mk_pte(page,prot)  pfn_pte(page_to_pfn(page),prot)
 
-#ifndef CONFIG_ARM64_64K_PAGES
+#ifndef CONFIG_ARM64_2_LEVELS
 
 #define pud_none(pud)  (!pud_val(pud))
 #define pud_bad(pud)   (!(pud_val(pud) & 2))
@@ -342,7 +342,7 @@ static inline pmd_t *pud_page_vaddr(pud_t pud)
return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
 }
 
-#endif /* CONFIG_ARM64_64K_PAGES */
+#endif /* CONFIG_ARM64_2_LEVELS */
 
 /* to find an entry in a page-table-directory */
#define pgd_index(addr)	(((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
@@ -353,7 +353,7 @@ static inline pmd_t *pud_page_vaddr(pud_t pud)
#define pgd_offset_k(addr) pgd_offset(&init_mm, addr)
 
 /* Find an entry in the second-level page table.. */
-#ifndef CONFIG_ARM64_64K_PAGES

[PATCH v2 5/7] arm64: Add 4 levels of page tables definition with 4KB pages

2014-04-15 Thread Jungseok Lee
This patch adds hardware definitions and types for 4 levels of
translation tables with 4KB pages.

Signed-off-by: Jungseok Lee 
Reviewed-by: Sungjinn Chung 
---
 arch/arm64/include/asm/pgtable-4level-hwdef.h |   50 +
 arch/arm64/include/asm/pgtable-4level-types.h |   71 +
 2 files changed, 121 insertions(+)
 create mode 100644 arch/arm64/include/asm/pgtable-4level-hwdef.h
 create mode 100644 arch/arm64/include/asm/pgtable-4level-types.h

diff --git a/arch/arm64/include/asm/pgtable-4level-hwdef.h b/arch/arm64/include/asm/pgtable-4level-hwdef.h
new file mode 100644
index 000..0ec84e2
--- /dev/null
+++ b/arch/arm64/include/asm/pgtable-4level-hwdef.h
@@ -0,0 +1,50 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGTABLE_4LEVEL_HWDEF_H
+#define __ASM_PGTABLE_4LEVEL_HWDEF_H
+
+#define PTRS_PER_PTE   512
+#define PTRS_PER_PMD   512
+#define PTRS_PER_PUD   512
+#define PTRS_PER_PGD   512
+
+/*
+ * PGDIR_SHIFT determines the size a top-level page table entry can map.
+ */
+#define PGDIR_SHIFT39
+#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
+#define PGDIR_MASK (~(PGDIR_SIZE-1))
+
+/*
+ * PUD_SHIFT determines the size the second level page table entry can map.
+ */
+#define PUD_SHIFT  30
+#define PUD_SIZE   (_AC(1, UL) << PUD_SHIFT)
+#define PUD_MASK   (~(PUD_SIZE-1))
+
+/*
+ * PMD_SHIFT determines the size the third level page table entry can map.
+ */
+#define PMD_SHIFT  21
+#define PMD_SIZE   (_AC(1, UL) << PMD_SHIFT)
+#define PMD_MASK   (~(PMD_SIZE-1))
+
+/*
+ * section address mask and size definitions.
+ */
+#define SECTION_SHIFT  21
+#define SECTION_SIZE   (_AC(1, UL) << SECTION_SHIFT)
+#define SECTION_MASK   (~(SECTION_SIZE-1))
+
+#endif
diff --git a/arch/arm64/include/asm/pgtable-4level-types.h b/arch/arm64/include/asm/pgtable-4level-types.h
new file mode 100644
index 000..7ad8dd2
--- /dev/null
+++ b/arch/arm64/include/asm/pgtable-4level-types.h
@@ -0,0 +1,71 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_PGTABLE_4LEVEL_TYPES_H
+#define __ASM_PGTABLE_4LEVEL_TYPES_H
+
+#include 
+
+typedef u64 pteval_t;
+typedef u64 pmdval_t;
+typedef u64 pudval_t;
+typedef u64 pgdval_t;
+
+#undef STRICT_MM_TYPECHECKS
+
+#ifdef STRICT_MM_TYPECHECKS
+
+/*
+ * These are used to make use of C type-checking..
+ */
+typedef struct { pteval_t pte; } pte_t;
+typedef struct { pmdval_t pmd; } pmd_t;
+typedef struct { pudval_t pud; } pud_t;
+typedef struct { pgdval_t pgd; } pgd_t;
+typedef struct { pteval_t pgprot; } pgprot_t;
+
+#define pte_val(x) ((x).pte)
+#define pmd_val(x) ((x).pmd)
+#define pud_val(x) ((x).pud)
+#define pgd_val(x) ((x).pgd)
+#define pgprot_val(x)  ((x).pgprot)
+
+#define __pte(x)   ((pte_t) { (x) } )
+#define __pmd(x)   ((pmd_t) { (x) } )
+#define __pud(x)   ((pud_t) { (x) } )
+#define __pgd(x)   ((pgd_t) { (x) } )
+#define __pgprot(x)((pgprot_t) { (x) } )
+
+#else  /* !STRICT_MM_TYPECHECKS */
+
+typedef pteval_t pte_t;
+typedef pmdval_t pmd_t;
+typedef pudval_t pud_t;
+typedef pgdval_t pgd_t;
+typedef pteval_t pgprot_t;
+
+#define pte_val(x) (x)
+#define pmd_val(x) (x)
+#define pud_val(x) (x)
+#define pgd_val(x) (x)
+#define pgprot_val(x)  (x)
+
+#define __pte(x)   (x)
+#define __pmd(x)   (x)
+#define __pud(x)   (x)
+#define __pgd(x)   (x)
+#define __pgprot(x)(x)
+
+#endif /* STRICT_MM_TYPECHECKS */
+
+#endif /* __ASM_PGTABLE_4LEVEL_TYPES_H */
-- 
1.7.10.4




[PATCH v2 4/7] arm64: Add a description on 48-bit address space with 4KB pages

2014-04-15 Thread Jungseok Lee
This patch adds memory layout and translation lookup information for
the 48-bit address space with 4KB pages. The description is based on
4 levels of translation tables.

Signed-off-by: Jungseok Lee 
Reviewed-by: Sungjinn Chung 
---
 Documentation/arm64/memory.txt |   59 ++--
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/Documentation/arm64/memory.txt b/Documentation/arm64/memory.txt
index d50fa61..8142709 100644
--- a/Documentation/arm64/memory.txt
+++ b/Documentation/arm64/memory.txt
@@ -8,10 +8,11 @@ This document describes the virtual memory layout used by the AArch64
 Linux kernel. The architecture allows up to 4 levels of translation
 tables with a 4KB page size and up to 3 levels with a 64KB page size.
 
-AArch64 Linux uses 3 levels of translation tables with the 4KB page
-configuration, allowing 39-bit (512GB) virtual addresses for both user
-and kernel. With 64KB pages, only 2 levels of translation tables are
-used but the memory layout is the same.
+AArch64 Linux uses 3 levels and 4 levels of translation tables with
+the 4KB page configuration, allowing 39-bit (512GB) and 48-bit (256TB)
+virtual addresses, respectively, for both user and kernel. With 64KB
+pages, only 2 levels of translation tables are used but the memory layout
+is the same.
 
 User addresses have bits 63:39 set to 0 while the kernel addresses have
 the same bits set to 1. TTBRx selection is given by bit 63 of the
@@ -21,7 +22,7 @@ The swapper_pg_dir address is written to TTBR1 and never written to
 TTBR0.
 
 
-AArch64 Linux memory layout with 4KB pages:
+AArch64 Linux memory layout with 4KB pages + 3 levels:
 
 Start  End SizeUse
 ---
@@ -48,7 +49,34 @@ ffbffc00 ffbf  64MB  modules
 ffc0    256GB  kernel logical memory map
 
 
-AArch64 Linux memory layout with 64KB pages:
+AArch64 Linux memory layout with 4KB pages + 4 levels:
+
+Start  End SizeUse
+---
+    256TB  user
+
+   7bfe~124TB  vmalloc
+
+7bff   7bff  64KB  [guard page]
+
+7c00   7dff   2TB  vmemmap
+
+7e00   7bbf  ~2TB  [guard, future vmemmap]
+
+7a00   7aff  16MB  PCI I/O space
+
+7b00   7bbf  12MB  [guard]
+
+7bc0   7bdf   2MB  earlyprintk device
+
+7be0   7bff   2MB  [guard]
+
+7c00   7fff  64MB  modules
+
+8000    128TB  kernel logical memory map
+
+
+AArch64 Linux memory layout with 64KB pages + 2 levels:
 
 Start  End SizeUse
 ---
@@ -75,7 +103,7 @@ fdfffc00 fdff  64MB  modules
 fe00      2TB  kernel logical memory map
 
 
-Translation table lookup with 4KB pages:
+Translation table lookup with 4KB pages + 3 levels:
 
 +++++++++
 |6356|5548|4740|3932|3124|2316|15 8|7  0|
@@ -90,7 +118,22 @@ Translation table lookup with 4KB pages:
  +-> [63] TTBR0/1
 
 
-Translation table lookup with 64KB pages:
+Translation table lookup with 4KB pages + 4 levels:
+
++++++++++
+|6356|5548|4740|3932|3124|2316|15 8|7  0|
++++++++++
+ | | | | | |
+ | | | | | v
+ | | | | |   [11:0]  in-page offset
+ | | | | +-> [20:12] L3 index
+ | | | +---> [29:21] L2 index
+ | | +-> [38:30] L1 index
+ | +---> [47:39] L0 index
+ +-> [63] TTBR0/1
+
+
+Translation table lookup with 64KB pages + 2 levels:
 
 +++++++++
 |6356|5548|4740|3932|3124|2316|15 8|7  0|
-- 
1.7.10.4



[PATCH v2 6/7] arm64: mm: Implement 4 levels of translation tables

2014-04-15 Thread Jungseok Lee
This patch implements 4 levels of translation tables since 3 levels
of page tables with 4KB pages cannot support 40-bit physical address
space described in [1] due to the following issue.

It is a restriction that the kernel logical memory map with 4KB + 3 levels
(0xffc0-0x) cannot cover the RAM region from
544GB to 1024GB in [1]. Specifically, the ARM64 kernel fails to create a
mapping for this region in the map_mem function, since __phys_to_virt for
this region overflows the address range.

If the SoC design follows the document [1], RAM beyond the first 32GB
would be placed from 544GB; even a 64GB system is supposed to use the
region from 544GB to 576GB for only 32GB of RAM. Naturally, this leads
to enabling 4 levels of page tables to avoid hacking __virt_to_phys and
__phys_to_virt.

However, it is recommended that 4 levels of page tables be enabled only
if the memory map is too sparse or there is about 512GB of RAM.

References
--
[1]: Principles of ARM Memory Maps, White Paper, Issue C

Signed-off-by: Jungseok Lee 
Reviewed-by: Sungjinn Chung 
---
 arch/arm64/Kconfig |7 +
 arch/arm64/include/asm/memblock.h  |6 +
 arch/arm64/include/asm/page.h  |4 ++-
 arch/arm64/include/asm/pgalloc.h   |   20 +++
 arch/arm64/include/asm/pgtable-hwdef.h |6 +++--
 arch/arm64/include/asm/pgtable.h   |   44 ++--
 arch/arm64/include/asm/tlb.h   |8 ++
 arch/arm64/kernel/head.S   |   40 -
 arch/arm64/kernel/traps.c  |5 
 arch/arm64/mm/fault.c  |1 +
 arch/arm64/mm/mmu.c|   16 +---
 11 files changed, 136 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 431acbc..7f5270b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -184,12 +184,19 @@ config ARM64_3_LEVELS
help
  This feature enables 3 levels of translation tables.
 
+config ARM64_4_LEVELS
+   bool "4 level"
+   depends on ARM64_4K_PAGES
+   help
+ This feature enables 4 levels of translation tables.
+
 endchoice
 
 config ARM64_VA_BITS
int "Virtual address space size"
range 39 39 if ARM64_4K_PAGES && ARM64_3_LEVELS
range 42 42 if ARM64_64K_PAGES && ARM64_2_LEVELS
+   range 48 48 if ARM64_4K_PAGES && ARM64_4_LEVELS
help
  This feature is determined by a combination of page size and
  level of translation tables.
diff --git a/arch/arm64/include/asm/memblock.h b/arch/arm64/include/asm/memblock.h
index 6afeed2..e4ac8bf 100644
--- a/arch/arm64/include/asm/memblock.h
+++ b/arch/arm64/include/asm/memblock.h
@@ -16,6 +16,12 @@
 #ifndef __ASM_MEMBLOCK_H
 #define __ASM_MEMBLOCK_H
 
+#ifndef CONFIG_ARM64_4_LEVELS
+#define MEMBLOCK_INITIAL_LIMIT PGDIR_SIZE
+#else
+#define MEMBLOCK_INITIAL_LIMIT PUD_SIZE
+#endif
+
 extern void arm64_memblock_init(void);
 
 #endif
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 268e53d..83b5289 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -35,8 +35,10 @@
 
 #ifdef CONFIG_ARM64_2_LEVELS
 #include 
-#else
+#elif defined(CONFIG_ARM64_3_LEVELS)
 #include 
+#else
+#include 
 #endif
 
 extern void __cpu_clear_user_page(void *p, unsigned long user);
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 4829837..8d745fa 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -26,6 +26,26 @@
 
 #define check_pgt_cache()  do { } while (0)
 
+#ifdef CONFIG_ARM64_4_LEVELS
+
+static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+   return (pud_t *)get_zeroed_page(GFP_KERNEL | __GFP_REPEAT);
+}
+
+static inline void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+   BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
+   free_page((unsigned long)pud);
+}
+
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
+{
+   set_pgd(pgd, __pgd(__pa(pud) | PUD_TYPE_TABLE));
+}
+
+#endif  /* CONFIG_ARM64_4_LEVELS */
+
 #ifndef CONFIG_ARM64_2_LEVELS
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 9cd86c6..ba30053 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -18,8 +18,10 @@
 
 #ifdef CONFIG_ARM64_2_LEVELS
 #include 
-#else
+#elif defined(CONFIG_ARM64_3_LEVELS)
 #include 
+#else
+#include 
 #endif
 
 /*
@@ -27,7 +29,7 @@
  *
  * Level 1 descriptor (PUD).
  */
-
+#define PUD_TYPE_TABLE (_AT(pudval_t, 3) << 0)
 #define PUD_TABLE_BIT  (_AT(pgdval_t, 1) << 1)
 
 /*
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index a64ce5e..efc40d1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h

[PATCH v2 3/7] arm64: Introduce a kernel configuration option for VA_BITS

2014-04-15 Thread Jungseok Lee
This patch adds a kernel configuration option for VA_BITS.

It helps to prevent the insertion of unnecessary #ifdef statements
for VA_BITS when implementing different page sizes and levels of
translation tables.
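For reference, the legal (page size, levels) combinations and the
resulting VA_BITS follow directly from the translation-table geometry:
with 8-byte descriptors each level resolves PAGE_SHIFT - 3 bits on top
of the page offset. A userspace sketch (va_bits() is a hypothetical
helper for illustration, not kernel code):

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: with 8-byte table
 * descriptors, each translation level resolves PAGE_SHIFT - 3 bits,
 * and the page offset itself contributes PAGE_SHIFT bits. */
static int va_bits(int page_shift, int levels)
{
	return (page_shift - 3) * levels + page_shift;
}
```

which gives 39 for 4KB/3-level, 42 for 64KB/2-level, and 48 for a
4KB/4-level configuration.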

Signed-off-by: Jungseok Lee 
Reviewed-by: Sungjinn Chung 
---
 arch/arm64/Kconfig  |8 
 arch/arm64/include/asm/memory.h |6 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1a2faf9..431acbc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -186,6 +186,14 @@ config ARM64_3_LEVELS
 
 endchoice
 
+config ARM64_VA_BITS
+   int "Virtual address space size"
+   range 39 39 if ARM64_4K_PAGES && ARM64_3_LEVELS
+   range 42 42 if ARM64_64K_PAGES && ARM64_2_LEVELS
+   help
+ This feature is determined by a combination of page size and
+ level of translation tables.
+
 config CPU_BIG_ENDIAN
bool "Build big-endian kernel"
help
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index e94f945..f6e7480 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -41,11 +41,7 @@
  * The module space lives between the addresses given by TASK_SIZE
  * and PAGE_OFFSET - it must be within 128MB of the kernel text.
  */
-#ifdef CONFIG_ARM64_64K_PAGES
-#define VA_BITS(42)
-#else
-#define VA_BITS(39)
-#endif
+#define VA_BITS(CONFIG_ARM64_VA_BITS)
 #define PAGE_OFFSET(UL(0x) << (VA_BITS - 1))
 #define MODULES_END(PAGE_OFFSET)
 #define MODULES_VADDR  (MODULES_END - SZ_64M)
-- 
1.7.10.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 7/7] arm64: KVM: Implement 4 levels of translation tables for HYP and stage2

2014-04-15 Thread Jungseok Lee
This patch adds a 4-level translation table implementation for both
HYP and stage2. With this feature, a combination of a 4KB + 4-level
host and a 4KB + 4-level guest can run on the ARMv8 architecture.
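As a rough model of the 4KB-granule, 4-level layout this enables
(level shifts 39/30/21/12, 512 entries per level), the index
extraction can be sketched in userspace as follows; the helper names
are illustrative, not the kernel's macros:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of 4KB-granule, 4-level index extraction;
 * level 0 is the PGD, level 3 the PTE. Not the kernel's macros. */
static int shift_for_level(int level)
{
	return 12 + 9 * (3 - level);	/* 39, 30, 21, 12 */
}

static unsigned int table_index(uint64_t addr, int level)
{
	return (addr >> shift_for_level(level)) & 0x1ff;	/* 512 entries */
}
```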

Signed-off-by: Jungseok Lee 
Reviewed-by: Sungjinn Chung 
---
 arch/arm/include/asm/kvm_mmu.h   |   10 +
 arch/arm/kvm/mmu.c   |   88 +-
 arch/arm64/include/asm/kvm_arm.h |   20 +
 arch/arm64/include/asm/kvm_mmu.h |   10 +
 4 files changed, 117 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5c7aa3c..6f7906e 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -37,6 +37,11 @@
  */
 #define TRAMPOLINE_VA  UL(CONFIG_VECTORS_BASE)
 
+/*
+ * NUM_OBJS depends on the number of page table translation levels
+ */
+#define NUM_OBJS   2
+
 #ifndef __ASSEMBLY__
 
 #include 
@@ -94,6 +99,11 @@ static inline void kvm_clean_pgd(pgd_t *pgd)
clean_dcache_area(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t));
 }
 
+static inline void kvm_clean_pmd(pmd_t *pmd)
+{
+   clean_dcache_area(pmd, PTRS_PER_PMD * sizeof(pmd_t));
+}
+
 static inline void kvm_clean_pmd_entry(pmd_t *pmd)
 {
clean_pmd_entry(pmd);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 80bb1e6..7fc9e55 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -388,13 +388,44 @@ static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
return 0;
 }
 
+static int create_hyp_pud_mappings(pgd_t *pgd, unsigned long start,
+  unsigned long end, unsigned long pfn,
+  pgprot_t prot)
+{
+   pud_t *pud;
+   pmd_t *pmd;
+   unsigned long addr, next;
+
+   addr = start;
+   do {
+   pud = pud_offset(pgd, addr);
+
+   if (pud_none_or_clear_bad(pud)) {
+   pmd = pmd_alloc_one(NULL, addr);
+   if (!pmd) {
+   kvm_err("Cannot allocate Hyp pmd\n");
+   return -ENOMEM;
+   }
+   pud_populate(NULL, pud, pmd);
+   get_page(virt_to_page(pud));
+   kvm_flush_dcache_to_poc(pud, sizeof(*pud));
+   }
+
+   next = pud_addr_end(addr, end);
+
+   create_hyp_pmd_mappings(pud, addr, next, pfn, prot);
+   pfn += (next - addr) >> PAGE_SHIFT;
+   } while (addr = next, addr != end);
+
+   return 0;
+}
+
 static int __create_hyp_mappings(pgd_t *pgdp,
 unsigned long start, unsigned long end,
 unsigned long pfn, pgprot_t prot)
 {
pgd_t *pgd;
pud_t *pud;
-   pmd_t *pmd;
unsigned long addr, next;
int err = 0;
 
@@ -403,22 +434,23 @@ static int __create_hyp_mappings(pgd_t *pgdp,
end = PAGE_ALIGN(end);
do {
pgd = pgdp + pgd_index(addr);
-   pud = pud_offset(pgd, addr);
 
-   if (pud_none_or_clear_bad(pud)) {
-   pmd = pmd_alloc_one(NULL, addr);
-   if (!pmd) {
-   kvm_err("Cannot allocate Hyp pmd\n");
+   if (pgd_none(*pgd)) {
+   pud = pud_alloc_one(NULL, addr);
+   if (!pud) {
+   kvm_err("Cannot allocate Hyp pud\n");
err = -ENOMEM;
goto out;
}
-   pud_populate(NULL, pud, pmd);
-   get_page(virt_to_page(pud));
-   kvm_flush_dcache_to_poc(pud, sizeof(*pud));
+   pgd_populate(NULL, pgd, pud);
+   get_page(virt_to_page(pgd));
+   kvm_flush_dcache_to_poc(pgd, sizeof(*pgd));
}
 
next = pgd_addr_end(addr, end);
-   err = create_hyp_pmd_mappings(pud, addr, next, pfn, prot);
+
+   err = create_hyp_pud_mappings(pgd, addr, next, pfn, prot);
+
if (err)
goto out;
pfn += (next - addr) >> PAGE_SHIFT;
@@ -563,6 +595,24 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
kvm->arch.pgd = NULL;
 }
 
+static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+phys_addr_t addr)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+
+   pgd = kvm->arch.pgd + pgd_index(addr);
+   if (pgd_none(*pgd)) {
+   if (!cache)
+   return NULL;
+   pud = mmu_memory_cache_alloc(cache);
+   pgd_populate(NULL, pgd, pud);
+   get_page(virt_to_page(pgd));
+   }
+
+   return pud_offset(pgd, addr);
+}
+
 static pmd_t *stage2_get_pmd(struct kvm 

[PATCH v2 1/7] arm64: Use pr_* instead of printk

2014-04-15 Thread Jungseok Lee
This patch fixes the following checkpatch complaint by using pr_*
instead of printk().

WARNING: printk() should include KERN_ facility level

Signed-off-by: Jungseok Lee 
Reviewed-by: Sungjinn Chung 
---
 arch/arm64/kernel/traps.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 7ffaddd..0484e81 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -65,7 +65,7 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
fs = get_fs();
set_fs(KERNEL_DS);
 
-   printk("%s%s(0x%016lx to 0x%016lx)\n", lvl, str, bottom, top);
+   pr_emerg("%s%s(0x%016lx to 0x%016lx)\n", lvl, str, bottom, top);
 
for (first = bottom & ~31; first < top; first += 32) {
unsigned long p;
@@ -83,7 +83,7 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
sprintf(str + i * 9, " ");
}
}
-   printk("%s%04lx:%s\n", lvl, first & 0xffff, str);
+   pr_emerg("%s%04lx:%s\n", lvl, first & 0xffff, str);
}
 
set_fs(fs);
@@ -124,7 +124,7 @@ static void dump_instr(const char *lvl, struct pt_regs *regs)
break;
}
}
-   printk("%sCode: %s\n", lvl, str);
+   pr_emerg("%sCode: %s\n", lvl, str);
 
set_fs(fs);
 }
@@ -156,7 +156,7 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
frame.pc = thread_saved_pc(tsk);
}
 
-   printk("Call trace:\n");
+   pr_emerg("Call trace:\n");
while (1) {
unsigned long where = frame.pc;
int ret;
@@ -328,17 +328,17 @@ asmlinkage void bad_mode(struct pt_regs *regs, int reason, unsigned int esr)
 
 void __pte_error(const char *file, int line, unsigned long val)
 {
-   printk("%s:%d: bad pte %016lx.\n", file, line, val);
+   pr_crit("%s:%d: bad pte %016lx.\n", file, line, val);
 }
 
 void __pmd_error(const char *file, int line, unsigned long val)
 {
-   printk("%s:%d: bad pmd %016lx.\n", file, line, val);
+   pr_crit("%s:%d: bad pmd %016lx.\n", file, line, val);
 }
 
 void __pgd_error(const char *file, int line, unsigned long val)
 {
-   printk("%s:%d: bad pgd %016lx.\n", file, line, val);
+   pr_crit("%s:%d: bad pgd %016lx.\n", file, line, val);
 }
 
 void __init trap_init(void)
-- 
1.7.10.4




Re: [PATCH 3/4] x86/insn: Extract more information about instructions

2014-04-15 Thread H. Peter Anvin
I really wonder if it makes sense...

On April 15, 2014 9:03:48 PM PDT, Sasha Levin  wrote:
>On 04/15/2014 11:54 PM, H. Peter Anvin wrote:
>> On 04/15/2014 08:47 PM, Sasha Levin wrote:
>>> > 
>>> > Yes, if kmemcheck for some reason needs to figure out if an
>instruction
>>> > is a MOV variant we'll need to list quite a few mnemonics, but
>that list
>>> > will be much shorter and more readable than a corresponding list
>of opcodes.
>>> > 
>> You're completely missing my point.  If you are looking at MOV, with
>> 80%+ probability you're doing something very, very wrong, because you
>> will be including instructions that do something completely different
>> from what you thought.
>> 
>> This is true for a lot of the x86 instructions.
>
>Right, but assuming that the AND example I presented earlier makes
>sense, I can't create mnemonic entries only for instructions where
>doing so would "probably" be right.
>
>If there are use cases where working with mnemonics is correct, we
>should be doing that in kmemcheck. If the way kmemcheck deals with
>mnemonics is incorrect we should go ahead and fix kmemcheck.
>
>
>Thanks,
>Sasha

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.


[PATCH 02/19] lockdep: lockdep_set_current_reclaim_state should save old value

2014-04-15 Thread NeilBrown
Currently kswapd sets current->lockdep_reclaim_gfp but the first
memory allocation call will clear it.  So the setting does no good.
Thus the lockdep_set_current_reclaim_state call in kswapd() is
ineffective.

With this patch we always save the old value and then restore it,
so lockdep gets to properly check the locks that kswapd takes.
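The difference between the old set/clear pair and the new save/restore
pair can be sketched in userspace (the global variable stands in for
current->lockdep_reclaim_gfp; this is an illustration, not kernel
code):

```c
#include <assert.h>

static unsigned int reclaim_gfp;	/* stands in for current->lockdep_reclaim_gfp */

/* Save/restore: the caller gets the previous value back, so a nested
 * user no longer wipes out the state set by an outer one. */
static unsigned int set_reclaim_state(unsigned int mask)
{
	unsigned int old = reclaim_gfp;

	reclaim_gfp = mask;
	return old;
}

static void restore_reclaim_state(unsigned int old)
{
	reclaim_gfp = old;
}
```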

Signed-off-by: NeilBrown 
---
 include/linux/lockdep.h  |8 
 kernel/locking/lockdep.c |8 +---
 mm/page_alloc.c  |5 +++--
 mm/vmscan.c  |   10 ++
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 92b1bfc5da60..18eedd692d16 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -351,8 +351,8 @@ static inline void lock_set_subclass(struct lockdep_map *lock,
lock_set_class(lock, lock->name, lock->key, subclass, ip);
 }
 
-extern void lockdep_set_current_reclaim_state(gfp_t gfp_mask);
-extern void lockdep_clear_current_reclaim_state(void);
+extern gfp_t lockdep_set_current_reclaim_state(gfp_t gfp_mask);
+extern void lockdep_restore_current_reclaim_state(gfp_t old_mask);
 extern void lockdep_trace_alloc(gfp_t mask);
 
 # define INIT_LOCKDEP  .lockdep_recursion = 0, .lockdep_reclaim_gfp = 0,
@@ -379,8 +379,8 @@ static inline void lockdep_on(void)
 # define lock_release(l, n, i) do { } while (0)
 # define lock_set_class(l, n, k, s, i) do { } while (0)
 # define lock_set_subclass(l, s, i)do { } while (0)
-# define lockdep_set_current_reclaim_state(g)  do { } while (0)
-# define lockdep_clear_current_reclaim_state() do { } while (0)
+# define lockdep_set_current_reclaim_state(g)  (0)
+# define lockdep_restore_current_reclaim_state(g) do { } while (0)
 # define lockdep_trace_alloc(g)do { } while (0)
 # define lockdep_init()do { } while (0)
 # define lockdep_info()do { } while (0)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index eb8a54783fa0..e05b82e92373 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3645,14 +3645,16 @@ int lock_is_held(struct lockdep_map *lock)
 }
 EXPORT_SYMBOL_GPL(lock_is_held);
 
-void lockdep_set_current_reclaim_state(gfp_t gfp_mask)
+gfp_t lockdep_set_current_reclaim_state(gfp_t gfp_mask)
 {
+   gfp_t old = current->lockdep_reclaim_gfp;
current->lockdep_reclaim_gfp = gfp_mask;
+   return old;
 }
 
-void lockdep_clear_current_reclaim_state(void)
+void lockdep_restore_current_reclaim_state(gfp_t gfp_mask)
 {
-   current->lockdep_reclaim_gfp = 0;
+   current->lockdep_reclaim_gfp = gfp_mask;
 }
 
 #ifdef CONFIG_LOCK_STAT
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a3d1f5da2f21..ff8b91aa0b87 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2327,20 +2327,21 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist,
struct reclaim_state reclaim_state;
int progress;
unsigned int pflags;
+   gfp_t old_mask;
 
cond_resched();
 
/* We now go into synchronous reclaim */
cpuset_memory_pressure_bump();
current_set_flags_nested(&pflags, PF_MEMALLOC);
-   lockdep_set_current_reclaim_state(gfp_mask);
+   old_mask = lockdep_set_current_reclaim_state(gfp_mask);
reclaim_state.reclaimed_slab = 0;
current->reclaim_state = &reclaim_state;
 
progress = try_to_free_pages(zonelist, order, gfp_mask, nodemask);
 
current->reclaim_state = NULL;
-   lockdep_clear_current_reclaim_state();
+   lockdep_restore_current_reclaim_state(old_mask);
current_restore_flags_nested(&pflags, PF_MEMALLOC);
 
cond_resched();
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 94acf53d9abf..67165f839936 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3344,16 +3344,17 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
struct task_struct *p = current;
unsigned long nr_reclaimed;
unsigned int pflags;
+   gfp_t old_mask;
 
current_set_flags_nested(&pflags, PF_MEMALLOC);
-   lockdep_set_current_reclaim_state(sc.gfp_mask);
+   old_mask = lockdep_set_current_reclaim_state(sc.gfp_mask);
reclaim_state.reclaimed_slab = 0;
p->reclaim_state = &reclaim_state;
 
nr_reclaimed = do_try_to_free_pages(zonelist, &sc, &shrink);
 
p->reclaim_state = NULL;
-   lockdep_clear_current_reclaim_state();
+   lockdep_restore_current_reclaim_state(old_mask);
current_restore_flags_nested(&pflags, PF_MEMALLOC);
 
return nr_reclaimed;
@@ -3532,6 +3533,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
};
unsigned long nr_slab_pages0, nr_slab_pages1;
unsigned int pflags;
+   gfp_t old_mask;
 
cond_resched();
/*
@@ -3540,7 +3542,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t 

[PATCH 04/19] Make effect of PF_FSTRANS to disable __GFP_FS universal.

2014-04-15 Thread NeilBrown
Currently both xfs and nfs will handle PF_FSTRANS by disabling
__GFP_FS.

Make this effect global by repurposing memalloc_noio_flags() (which
does the same thing for PF_MEMALLOC_NOIO and __GFP_IO) to generally
impose the task flags on a gfp_t.
Due to this repurposing we change the name of memalloc_noio_flags()
to gfp_from_current().

As PF_FSTRANS now uniformly removes __GFP_FS we can remove special
code for this from xfs and nfs.

As we can now expect other code to set PF_FSTRANS, its meaning is more
general, so the WARN_ON in xfs_vm_writepage() which checks that
PF_FSTRANS is not set is no longer appropriate: PF_FSTRANS may now be
set for reasons other than an XFS transaction.

As lockdep cares about __GFP_FS, we need to translate PF_FSTRANS to
__GFP_FS before calling lockdep_trace_alloc() in various places.
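The combined masking can be modelled in userspace as follows (the flag
values below are made-up stand-ins for illustration, not the kernel's
gfp.h or sched.h constants):

```c
#include <assert.h>

/* Stand-in flag values for illustration only */
#define GFP_IO_BIT	0x01u
#define GFP_FS_BIT	0x02u
#define PF_NOIO_BIT	0x10u	/* models PF_MEMALLOC_NOIO */
#define PF_FSTRANS_BIT	0x20u	/* models PF_FSTRANS */

/* Mirrors the shape of gfp_from_current(): the task's process flags
 * mask bits out of the caller-supplied allocation mask. */
static unsigned int gfp_from_flags(unsigned int task_flags, unsigned int gfp)
{
	if (task_flags & PF_NOIO_BIT)
		gfp &= ~GFP_IO_BIT;
	if (task_flags & PF_FSTRANS_BIT)
		gfp &= ~GFP_FS_BIT;
	return gfp;
}
```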

Signed-off-by: NeilBrown 
---
 fs/nfs/file.c |3 +--
 fs/xfs/kmem.h |2 --
 fs/xfs/xfs_aops.c |7 ---
 include/linux/sched.h |5 -
 mm/page_alloc.c   |3 ++-
 mm/slab.c |2 ++
 mm/slob.c |2 ++
 mm/slub.c |1 +
 mm/vmscan.c   |4 ++--
 9 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 5bb790a69c71..ed863f52bae7 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -472,8 +472,7 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
/* Only do I/O if gfp is a superset of GFP_KERNEL, and we're not
 * doing this memory reclaim for a fs-related allocation.
 */
-   if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL &&
-   !(current->flags & PF_FSTRANS)) {
+   if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL) {
int how = FLUSH_SYNC;
 
/* Don't let kswapd deadlock waiting for OOM RPC calls */
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 64db0e53edea..882b86270ebe 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -50,8 +50,6 @@ kmem_flags_convert(xfs_km_flags_t flags)
lflags = GFP_ATOMIC | __GFP_NOWARN;
} else {
lflags = GFP_KERNEL | __GFP_NOWARN;
-   if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
-   lflags &= ~__GFP_FS;
}
 
if (flags & KM_ZERO)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index db2cfb067d0b..207a7f86d5d7 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -952,13 +952,6 @@ xfs_vm_writepage(
PF_MEMALLOC))
goto redirty;
 
-   /*
-* Given that we do not allow direct reclaim to call us, we should
-* never be called while in a filesystem transaction.
-*/
-   if (WARN_ON(current->flags & PF_FSTRANS))
-   goto redirty;
-
/* Is this page beyond the end of the file? */
offset = i_size_read(inode);
end_index = offset >> PAGE_CACHE_SHIFT;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 56fa52a0654c..f3291ed33c27 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1860,10 +1860,13 @@ extern void thread_group_cputime_adjusted(struct 
task_struct *p, cputime_t *ut,
 #define used_math() tsk_used_math(current)
 
 /* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
-static inline gfp_t memalloc_noio_flags(gfp_t flags)
+/* __GFP_FS isn't allowed if PF_FSTRANS is set in current->flags */
+static inline gfp_t gfp_from_current(gfp_t flags)
 {
if (unlikely(current->flags & PF_MEMALLOC_NOIO))
flags &= ~__GFP_IO;
+   if (unlikely(current->flags & PF_FSTRANS))
+   flags &= ~__GFP_FS;
return flags;
 }
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff8b91aa0b87..5e9225df3447 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2718,6 +2718,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
struct mem_cgroup *memcg = NULL;
 
gfp_mask &= gfp_allowed_mask;
+   gfp_mask = gfp_from_current(gfp_mask);
 
lockdep_trace_alloc(gfp_mask);
 
@@ -2765,7 +2766,7 @@ retry_cpuset:
 * can deadlock because I/O on the device might not
 * complete.
 */
-   gfp_mask = memalloc_noio_flags(gfp_mask);
+   gfp_mask = gfp_from_current(gfp_mask);
page = __alloc_pages_slowpath(gfp_mask, order,
zonelist, high_zoneidx, nodemask,
preferred_zone, migratetype);
diff --git a/mm/slab.c b/mm/slab.c
index b264214c77ea..914d88661f3d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3206,6 +3206,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
int slab_node = numa_mem_id();
 
flags &= gfp_allowed_mask;
+   flags = gfp_from_current(flags);
 
lockdep_trace_alloc(flags);
 
@@ -3293,6 +3294,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
void *objp;

[PATCH 05/19] SUNRPC: track whether a request is coming from a loop-back interface.

2014-04-15 Thread NeilBrown
If an incoming NFS request is coming from the local host, then
nfsd will need to perform some special handling.  So detect that
possibility and make the source visible in rq_local.

Signed-off-by: NeilBrown 
---
 include/linux/sunrpc/svc.h  |1 +
 include/linux/sunrpc/svc_xprt.h |1 +
 net/sunrpc/svcsock.c|   10 ++
 3 files changed, 12 insertions(+)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 04e763221246..a0dbbd1e00e9 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -254,6 +254,7 @@ struct svc_rqst {
u32 rq_prot;/* IP protocol */
unsigned short
rq_secure  : 1; /* secure port */
+   unsigned short  rq_local   : 1; /* local request */
 
void *  rq_argp;/* decoded arguments */
void *  rq_resp;/* xdr'd results */
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index b05963f09ebf..b99bdfb0fcf9 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -63,6 +63,7 @@ struct svc_xprt {
 #defineXPT_DETACHED10  /* detached from tempsocks list 
*/
 #define XPT_LISTENER   11  /* listening endpoint */
 #define XPT_CACHE_AUTH 12  /* cache auth info */
+#define XPT_LOCAL  13  /* connection from loopback interface */
 
struct svc_serv *xpt_server;/* service for transport */
atomic_txpt_reserved;   /* space on outq that is rsvd */
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index b6e59f0a9475..193115fe968c 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -811,6 +811,7 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
struct socket   *newsock;
struct svc_sock *newsvsk;
int err, slen;
+   struct dst_entry *dst;
RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]);
 
dprintk("svc: tcp_accept %p sock %p\n", svsk, sock);
@@ -867,6 +868,14 @@ static struct svc_xprt *svc_tcp_accept(struct svc_xprt *xprt)
}
svc_xprt_set_local(&newsvsk->sk_xprt, sin, slen);
 
+   clear_bit(XPT_LOCAL, &newsvsk->sk_xprt.xpt_flags);
+   rcu_read_lock();
+   dst = rcu_dereference(newsock->sk->sk_dst_cache);
+   if (dst && dst->dev &&
+   (dst->dev->features & NETIF_F_LOOPBACK))
+   set_bit(XPT_LOCAL, &newsvsk->sk_xprt.xpt_flags);
+   rcu_read_unlock();
+
if (serv->sv_stats)
serv->sv_stats->nettcpconn++;
 
@@ -1112,6 +1121,7 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 
rqstp->rq_xprt_ctxt   = NULL;
rqstp->rq_prot= IPPROTO_TCP;
+   rqstp->rq_local   = !!test_bit(XPT_LOCAL, &svsk->sk_xprt.xpt_flags);
 
p = (__be32 *)rqstp->rq_arg.head[0].iov_base;
calldir = p[1];




[PATCH 06/19] nfsd: set PF_FSTRANS for nfsd threads.

2014-04-15 Thread NeilBrown
If a localhost mount is present, then it is easy to deadlock NFS by
nfsd entering direct reclaim and calling nfs_release_page() which
requires nfsd to perform an fsync() (which it cannot do because it is
reclaiming memory).

By setting PF_FSTRANS we stop the memory allocator from ever
attempting any FS operation that could deadlock.

We need this flag set for any thread which is handling a request from
the local host, but we also need to always have it for at least 1 or 2
threads so that we don't end up with all threads blocked in allocation.

When we set PF_FSTRANS we also tell lockdep that we are handling
reclaim so that it can detect deadlocks for us.
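The "at least 1 or 2 threads" guarantee below relies on initialising
the per-pool counter to -2 and using atomic_dec_if_positive() on exit.
A userspace model of that primitive and the invariant it enforces
(illustrative sketch, not the kernel's atomics):

```c
#include <assert.h>

/* Models atomic_dec_if_positive(): decrement only if the result stays
 * non-negative, and return the would-be result either way. With the
 * counter starting at -2, the first two threads to set PF_FSTRANS see
 * a negative result on exit and therefore keep the flag. */
static int dec_if_positive(int *v)
{
	int result = *v - 1;

	if (result >= 0)
		*v = result;
	return result;
}
```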

Signed-off-by: NeilBrown 
---
 fs/nfsd/nfssvc.c   |   18 ++
 include/linux/sunrpc/svc.h |1 +
 net/sunrpc/svc.c   |6 ++
 3 files changed, 25 insertions(+)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 9a4a5f9e7468..6af8bc2daf7d 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -565,6 +565,8 @@ nfsd(void *vrqstp)
struct svc_xprt *perm_sock = list_entry(rqstp->rq_server->sv_permsocks.next, typeof(struct svc_xprt), xpt_list);
struct net *net = perm_sock->xpt_net;
int err;
+   unsigned int pflags = 0;
+   gfp_t reclaim_state = 0;
 
/* Lock module and set up kernel thread */
mutex_lock(&nfsd_mutex);
@@ -611,14 +613,30 @@ nfsd(void *vrqstp)
;
if (err == -EINTR)
break;
+   if (rqstp->rq_local && !current_test_flags(PF_FSTRANS)) {
+   current_set_flags_nested(&pflags, PF_FSTRANS);
+   atomic_inc(&rqstp->rq_pool->sp_nr_fstrans);
+   reclaim_state = lockdep_set_current_reclaim_state(GFP_KERNEL);
+   }
validate_process_creds();
svc_process(rqstp);
validate_process_creds();
+   if (current_test_flags(PF_FSTRANS) &&
+   atomic_dec_if_positive(&rqstp->rq_pool->sp_nr_fstrans) >= 0) {
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
+   lockdep_restore_current_reclaim_state(reclaim_state);
+   }
}
 
/* Clear signals before calling svc_exit_thread() */
flush_signals(current);
 
+   if (current_test_flags(PF_FSTRANS)) {
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
+   lockdep_restore_current_reclaim_state(reclaim_state);
+   atomic_dec(&rqstp->rq_pool->sp_nr_fstrans);
+   }
+
mutex_lock(&nfsd_mutex);
nfsdstats.th_cnt --;
 
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index a0dbbd1e00e9..4b274aba51dd 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -48,6 +48,7 @@ struct svc_pool {
struct list_headsp_threads; /* idle server threads */
struct list_headsp_sockets; /* pending sockets */
unsigned intsp_nrthreads;   /* # of threads in pool */
+   atomic_tsp_nr_fstrans;  /* # threads with PF_FSTRANS */
struct list_headsp_all_threads; /* all server threads */
struct svc_pool_stats   sp_stats;   /* statistics on pool operation */
int sp_task_pending;/* has pending task */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 5de6801cd924..8b13f35b6cbb 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -477,6 +477,12 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
INIT_LIST_HEAD(&pool->sp_threads);
INIT_LIST_HEAD(&pool->sp_sockets);
INIT_LIST_HEAD(&pool->sp_all_threads);
+   /* The number of threads with PF_FSTRANS set
+* should never be reduced below 2, except when
+* threads exit.  So we use atomic_dec_if_positive()
+* on this value.
+*/
+   atomic_set(&pool->sp_nr_fstrans, -2);
spin_lock_init(&pool->sp_lock);
}
 




[PATCH 11/19] FS: set PF_FSTRANS while holding mmap_sem in exec.c

2014-04-15 Thread NeilBrown
Because mmap_sem is sometimes(*) taken while holding a sock lock,
and the sock lock might be needed for reclaim (at least when loop-back
NFS is active), we must not block on FS reclaim while mmap_sem is
held.

exec.c allocates memory while holding mmap_sem, and so needs
PF_FSTRANS protection.

* lockdep reports:
[   57.653355][] lock_acquire+0xa8/0x1f0
[   57.653355][] might_fault+0x84/0xb0
[   57.653355][] do_ip_setsockopt.isra.18+0x93d/0xed0
[   57.653355][] ip_setsockopt+0x27/0x90
[   57.653355][] udp_setsockopt+0x16/0x30
[   57.653355][] sock_common_setsockopt+0xf/0x20
[   57.653355][] SyS_setsockopt+0x5e/0xc0
[   57.653355][] system_call_fastpath+0x16/0x1b

to explain why mmap_sem might be taken while sock lock is held.
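The reported chain is the classic AB-BA pattern: the setsockopt path
holds the sock lock while faulting in user memory (taking mmap_sem),
while reclaim can run with mmap_sem held and end up needing the sock
lock. A toy two-lock version of the ordering check lockdep performs
(illustrative only, not lockdep's real data structures):

```c
#include <assert.h>

enum { MMAP_SEM, SOCK_LOCK, NLOCKS };

/* order[a][b] != 0 records that lock a was once held while lock b
 * was acquired; a cycle of length two is the AB-BA deadlock. */
static int order[NLOCKS][NLOCKS];

static int record_and_check(int held, int acquiring)
{
	order[held][acquiring] = 1;
	return order[acquiring][held] ? -1 : 0;	/* -1: inversion found */
}
```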

Signed-off-by: NeilBrown 
---
 fs/exec.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index 3d78fccdd723..2c70a03ddb2b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -652,6 +652,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
unsigned long stack_size;
unsigned long stack_expand;
unsigned long rlim_stack;
+   unsigned int pflags;
 
 #ifdef CONFIG_STACK_GROWSUP
/* Limit stack size to 1GB */
@@ -688,6 +689,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
 
down_write(&mm->mmap_sem);
vm_flags = VM_STACK_FLAGS;
+   current_set_flags_nested(&pflags, PF_FSTRANS);
 
/*
 * Adjust stack execute permissions; explicitly enable for
@@ -741,6 +743,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
ret = -EFAULT;
 
 out_unlock:
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
up_write(&mm->mmap_sem);
return ret;
 }
@@ -1369,6 +1372,7 @@ int search_binary_handler(struct linux_binprm *bprm)
bool need_retry = IS_ENABLED(CONFIG_MODULES);
struct linux_binfmt *fmt;
int retval;
+   unsigned int pflags;
 
/* This allows 4 levels of binfmt rewrites before failing hard. */
if (bprm->recursion_depth > 5)
@@ -1381,6 +1385,7 @@ int search_binary_handler(struct linux_binprm *bprm)
retval = -ENOENT;
  retry:
read_lock(&binfmt_lock);
+   current_set_flags_nested(&pflags, PF_FSTRANS);
list_for_each_entry(fmt, &formats, lh) {
if (!try_module_get(fmt->module))
continue;
@@ -1396,6 +1401,7 @@ int search_binary_handler(struct linux_binprm *bprm)
read_lock(&binfmt_lock);
put_binfmt(fmt);
}
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
read_unlock(&binfmt_lock);
 
if (need_retry && retval == -ENOEXEC) {




Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy

2014-04-15 Thread Li Zefan
On 2014/4/16 11:50, Eric W. Biederman wrote:
> Kay Sievers  writes:
> 
>> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan  wrote:
>>> On 2014/4/15 5:44, Tejun Heo wrote:
 cgroup users often need a way to determine when a cgroup's
 subhierarchy becomes empty so that it can be cleaned up.  cgroup
 currently provides release_agent for it; unfortunately, this mechanism
 is riddled with issues.

 * It delivers events by forking and execing a userland binary
   specified as the release_agent.  This is a long deprecated method of
   notification delivery.  It's extremely heavy, slow and cumbersome to
   integrate with larger infrastructure.

 * There is single monitoring point at the root.  There's no way to
   delegate management of subtree.

 * The event isn't recursive.  It triggers when a cgroup doesn't have
   any tasks or child cgroups.  Events for internal nodes trigger only
   after all children are removed.  This again makes it impossible to
   delegate management of subtree.

 * Events are filtered from the kernel side.  "notify_on_release" file
   is used to subscribe to or suppress release event.  This is
   unnecessarily complicated and probably done this way because event
   delivery itself was expensive.

 This patch implements interface file "cgroup.subtree_populated" which
 can be used to monitor whether the cgroup's subhierarchy has tasks in
 it or not.  Its value is 0 if there is no task in the cgroup and its
 descendants; otherwise, 1, and a kernfs_notify() notification is
 triggered when the value changes, which can be monitored through poll
 and [di]notify.

>>>
>>> For the old notification mechanism, the path of the cgroup that becomes
>>> empty will be passed to the user specified release agent. Like this:
>>>
>>> # cat /sbin/cpuset_release_agent
>>> #!/bin/sh
>>> rmdir /dev/cpuset/$1
>>>
>>> How do we achieve this using inotify?
>>>
>>> - monitor all the cgroups, or
>>> - monitor all the leaf cgroups, and travel cgrp->parent to delete all
>>>   empty cgroups.
>>> - monitor root cgroup only, and travel the whole hierarchy to find
>>>   empy cgroups when it gets an fs event.
>>>
>>> Seems none of them is scalible.
>>
>> The manager would add all cgroups as watches to one inotify file
>> descriptor, it should not be problem to do that.
> 
> inotify won't work on cgroupfs.
> 

Tejun's working on inotify support for cgroupfs, and I believe this patchset
has been tested, so it works.

So what do you mean by saying it won't work? Could you be more specific?
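For context, the consumer side being discussed above reduces to a
poll() loop on the open file. A self-contained sketch, using a pipe as
a stand-in for an open cgroup.subtree_populated fd (illustrative
only):

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Block for up to timeout_ms until the watched fd signals an event;
 * returns poll()'s result (>0 means the fd became ready). */
static int wait_for_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLPRI };

	return poll(&pfd, 1, timeout_ms);
}
```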



[PATCH 18/19] nfsd: set PF_FSTRANS during nfsd4_do_callback_rpc.

2014-04-15 Thread NeilBrown
nfsd will sometimes call flush_workqueue on the workqueue running
nfsd4_do_callback_rpc, so we must ensure that it doesn't block in
filesystem reclaim.
So set PF_FSTRANS.

Signed-off-by: NeilBrown 
---
 fs/nfsd/nfs4callback.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 7f05cd140de3..7784b5d4edf0 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -997,6 +997,9 @@ static void nfsd4_do_callback_rpc(struct work_struct *w)
struct nfsd4_callback *cb = container_of(w, struct nfsd4_callback, cb_work);
struct nfs4_client *clp = cb->cb_clp;
struct rpc_clnt *clnt;
+   unsigned int pflags;
+
+   current_set_flags_nested(&pflags, PF_FSTRANS);
 
if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
nfsd4_process_cb_update(cb);
@@ -1005,11 +1008,13 @@ static void nfsd4_do_callback_rpc(struct work_struct *w)
if (!clnt) {
/* Callback channel broken, or client killed; give up: */
nfsd4_release_cb(cb);
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
return;
}
cb->cb_msg.rpc_cred = clp->cl_cb_cred;
rpc_call_async(clnt, &cb->cb_msg, RPC_TASK_SOFT | RPC_TASK_SOFTCONN,
cb->cb_ops, cb);
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
 }
 
 void nfsd4_init_callback(struct nfsd4_callback *cb)




Re: [PATCH V1 Resend 4/5] tick-sched: don't call update_wall_time() when delta is lesser than tick_period

2014-04-15 Thread Viresh Kumar
On 16 April 2014 00:14, Thomas Gleixner  wrote:
> On Tue, 15 Apr 2014, Viresh Kumar wrote:
>
>> In tick_do_update_jiffies64() we are processing ticks only if delta is 
>> greater
>> than tick_period. This is what we are supposed to do here and it broke a bit
>> with this patch:
>>
>> commit 47a1b796306356f358e515149d86baf0cc6bf007
>> Author: John Stultz 
>> Date:   Thu Dec 12 13:10:55 2013 -0800
>>
>> tick/timekeeping: Call update_wall_time outside the jiffies lock
>
> Please look how I massaged the change log. There is no point in
> copying the whole gunk.

I see.. Nice.

>> With above patch, we might end up calling update_wall_time() even if delta is
>> found to be smaller than tick_period. Fix this by reversing the check and
>> returning early.
>
> Well.
>
>> Cc:  # v3.14+
>> Cc: John Stultz 
>> Signed-off-by: Viresh Kumar 
>> ---
>>  kernel/time/tick-sched.c | 32 +---
>>  1 file changed, 17 insertions(+), 15 deletions(-)
>
> That's not how we do bug fixes if they can be done with 3 lines of
> change. See the commit.

I tried that initially, but with these additional changes (which must
be done now?), which probably makes it clearer:

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 3cafe7d..0e70b1c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -84,12 +84,12 @@ static void tick_do_update_jiffies64(ktime_t now)

/* Keep the tick_next_period variable up to date */
tick_next_period = ktime_add(last_jiffies_update, tick_period);
+
+   write_sequnlock(&jiffies_lock);
+   update_wall_time();
} else {
write_sequnlock(&jiffies_lock);
-   return;
}
-   write_sequnlock(&jiffies_lock);
-   update_wall_time();
 }

 /*


[PATCH 14/19] driver core: set PF_FSTRANS while holding gdp_mutex

2014-04-15 Thread NeilBrown
lockdep reports a locking chain:

  sk_lock-AF_INET --> rtnl_mutex --> gdp_mutex

As sk_lock can be needed for memory reclaim (when loop-back NFS is in
use at least), any memory allocation under gdp_mutex needs to
be protected by PF_FSTRANS.

The path from rtnl_mutex to gdp_mutex is:

[] get_device_parent+0x4c/0x1f0
[] device_add+0xe6/0x610
[] netdev_register_kobject+0x7a/0x130
[] register_netdevice+0x354/0x550
[] register_netdev+0x15/0x30

Signed-off-by: NeilBrown 
---
 drivers/base/core.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2b567177ef78..1a2735237650 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -750,6 +750,7 @@ static struct kobject *get_device_parent(struct device *dev,
struct kobject *kobj = NULL;
struct kobject *parent_kobj;
struct kobject *k;
+   unsigned int pflags;
 
 #ifdef CONFIG_BLOCK
/* block disks show up in /sys/block */
@@ -788,7 +789,9 @@ static struct kobject *get_device_parent(struct device *dev,
}
 
/* or create a new class-directory at the parent device */
+   current_set_flags_nested(&pflags, PF_FSTRANS);
k = class_dir_create_and_add(dev->class, parent_kobj);
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
/* do not emit an uevent for this simple "glue" directory */
mutex_unlock(&gdp_mutex);
return k;




[PATCH 16/19] VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc.

2014-04-15 Thread NeilBrown
__d_alloc can be called with i_mutex held, so it is safer to
use GFP_NOFS.

lockdep reports this can deadlock when loop-back NFS is in use,
as nfsd may be required to write out for reclaim, and nfsd certainly
takes i_mutex.

Signed-off-by: NeilBrown 
---
 fs/dcache.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ca02c13a84aa..3651ff6185b4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1483,7 +1483,7 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
struct dentry *dentry;
char *dname;
 
-   dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
+   dentry = kmem_cache_alloc(dentry_cache, GFP_NOFS);
if (!dentry)
return NULL;
 
@@ -1495,7 +1495,7 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 */
dentry->d_iname[DNAME_INLINE_LEN-1] = 0;
if (name->len > DNAME_INLINE_LEN-1) {
-   dname = kmalloc(name->len + 1, GFP_KERNEL);
+   dname = kmalloc(name->len + 1, GFP_NOFS);
if (!dname) {
kmem_cache_free(dentry_cache, dentry); 
return NULL;




[PATCH 19/19] XFS: set PF_FSTRANS while ilock is held in xfs_free_eofblocks

2014-04-15 Thread NeilBrown
Memory allocations can happen while the xfs ilock is held in
xfs_free_eofblocks, particularly

  [] kmem_zone_alloc+0x67/0xc0
  [] xfs_trans_add_item+0x25/0x50
  [] xfs_trans_ijoin+0x2c/0x60
  [] xfs_itruncate_extents+0xbe/0x400
  [] xfs_free_eofblocks+0x1c4/0x240

So set PF_FSTRANS to avoid this causing a deadlock.

Care is needed here as xfs_trans_reserve() also sets PF_FSTRANS, while
xfs_trans_cancel and xfs_trans_commit will clear it.
So our extra setting must fully nest these calls.

Signed-off-by: NeilBrown 
---
 fs/xfs/xfs_bmap_util.c |4 
 1 file changed, 4 insertions(+)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index f264616080ca..53761fe4fada 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -889,6 +889,7 @@ xfs_free_eofblocks(
xfs_filblks_t   map_len;
int nimaps;
xfs_bmbt_irec_t imap;
+   unsigned int pflags;
 
/*
 * Figure out if there are any blocks beyond the end
@@ -929,12 +930,14 @@ xfs_free_eofblocks(
}
}
 
+   current_set_flags_nested(&pflags, PF_FSTRANS);
error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
if (error) {
ASSERT(XFS_FORCED_SHUTDOWN(mp));
xfs_trans_cancel(tp, 0);
if (need_iolock)
xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
return error;
}
 
@@ -964,6 +967,7 @@ xfs_free_eofblocks(
xfs_inode_clear_eofblocks_tag(ip);
}
 
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
if (need_iolock)
xfs_iunlock(ip, XFS_IOLOCK_EXCL);




[PATCH 13/19] MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock.

2014-04-15 Thread NeilBrown
lockdep reports a locking chain

  sk_lock-AF_INET --> rtnl_mutex --> pcpu_alloc_mutex

As sk_lock may be needed to reclaim memory, allowing that
reclaim while pcpu_alloc_mutex is held can lead to deadlock.
So set PF_FSTRANS while it is held to avoid FS reclaim.

pcpu_alloc_mutex can be taken when rtnl_mutex is held:

[] pcpu_alloc+0x49/0x960
[] __alloc_percpu+0xb/0x10
[] loopback_dev_init+0x17/0x60
[] register_netdevice+0xec/0x550
[] register_netdev+0x15/0x30

Signed-off-by: NeilBrown 
---
 mm/percpu.c |4 
 1 file changed, 4 insertions(+)

diff --git a/mm/percpu.c b/mm/percpu.c
index 036cfe07050f..77dd24032f41 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -712,6 +712,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved)
int slot, off, new_alloc;
unsigned long flags;
void __percpu *ptr;
+   unsigned int pflags;
 
if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {
WARN(true, "illegal size (%zu) or align (%zu) for "
@@ -720,6 +721,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved)
}
 
mutex_lock(&pcpu_alloc_mutex);
+   current_set_flags_nested(&pflags, PF_FSTRANS);
spin_lock_irqsave(&pcpu_lock, flags);
 
/* serve reserved allocations from the reserved chunk if available */
@@ -801,6 +803,7 @@ area_found:
goto fail_unlock;
}
 
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
mutex_unlock(&pcpu_alloc_mutex);
 
/* return address relative to base address */
@@ -811,6 +814,7 @@ area_found:
 fail_unlock:
spin_unlock_irqrestore(&pcpu_lock, flags);
 fail_unlock_mutex:
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
mutex_unlock(&pcpu_alloc_mutex);
if (warn_limit) {
pr_warning("PERCPU: allocation failed, size=%zu align=%zu, "




[PATCH 17/19] VFS: set PF_FSTRANS while namespace_sem is held.

2014-04-15 Thread NeilBrown
namespace_sem can be taken while various i_mutex locks are held, so we
need to avoid reclaim from blocking on an FS (particularly loop-back
NFS).

A memory allocation happens under namespace_sem at least in:

[] kmem_cache_alloc+0x4f/0x290
[] alloc_vfsmnt+0x1f/0x1d0
[] clone_mnt+0x2a/0x310
[] copy_tree+0x53/0x380
[] copy_mnt_ns+0x7f/0x280
[] create_new_namespaces+0x5c/0x190
[] unshare_nsproxy_namespaces+0x59/0x90

So set PF_FSTRANS in namespace_lock() and restore in
namespace_unlock().

Signed-off-by: NeilBrown 
---
 fs/namespace.c |4 
 1 file changed, 4 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2ffc5a2905d4..83dcd5083dbb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -63,6 +63,7 @@ static struct hlist_head *mount_hashtable __read_mostly;
 static struct hlist_head *mountpoint_hashtable __read_mostly;
 static struct kmem_cache *mnt_cache __read_mostly;
 static DECLARE_RWSEM(namespace_sem);
+static unsigned long namespace_sem_pflags;
 
 /* /sys/fs */
 struct kobject *fs_kobj;
@@ -1196,6 +1197,8 @@ static void namespace_unlock(void)
struct mount *mnt;
struct hlist_head head = unmounted;
 
+   current_restore_flags_nested(&namespace_sem_pflags, PF_FSTRANS);
+
if (likely(hlist_empty(&head))) {
up_write(&namespace_sem);
return;
@@ -1220,6 +1223,7 @@ static void namespace_unlock(void)
 static inline void namespace_lock(void)
 {
down_write(&namespace_sem);
+   current_set_flags_nested(&namespace_sem_pflags, PF_FSTRANS);
 }
 
 /*




[PATCH 15/19] nfsd: set PF_FSTRANS when client_mutex is held.

2014-04-15 Thread NeilBrown
When loop-back NFS with NFSv4 is in use, client_mutex might be needed
to reclaim memory, so any memory allocation while client_mutex is held
must avoid __GFP_FS, so best to set PF_FSTRANS.

Signed-off-by: NeilBrown 
---
 fs/nfsd/nfs4state.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d5d070fbeb35..7b7fbcbe20cb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -75,6 +75,7 @@ static int check_for_locks(struct nfs4_file *filp, struct nfs4_lockowner *lowner)
 
 /* Currently used for almost all code touching nfsv4 state: */
 static DEFINE_MUTEX(client_mutex);
+static unsigned int client_mutex_pflags;
 
 /*
  * Currently used for the del_recall_lru and file hash table.  In an
@@ -93,6 +94,7 @@ void
 nfs4_lock_state(void)
 {
mutex_lock(&client_mutex);
+   current_set_flags_nested(&client_mutex_pflags, PF_FSTRANS);
 }
 
 static void free_session(struct nfsd4_session *);
@@ -127,6 +129,7 @@ static __be32 nfsd4_get_session_locked(struct nfsd4_session *ses)
 void
 nfs4_unlock_state(void)
 {
+   current_restore_flags_nested(&client_mutex_pflags, PF_FSTRANS);
mutex_unlock(&client_mutex);
 }
 




[PATCH 08/19] Set PF_FSTRANS while write_cache_pages calls ->writepage

2014-04-15 Thread NeilBrown
It is normally safe for direct reclaim to enter filesystems
even when a page is locked - as can happen if ->writepage
allocates memory with GFP_KERNEL (which xfs does).

However, if a localhost NFS mount is present, then a flush-*
thread might hold a page locked and then, in direct reclaim,
ask nfs to commit an inode (nfs_release_page).  When nfsd
performs the fsync it might try to lock the same page, which leads to
a deadlock.

A ->writepage should not allocate much memory, or do so very often, so
it is safe to set PF_FSTRANS, and this removes the possible deadlock.

This was not detected by lockdep as it doesn't monitor the page lock.
It was found as a real deadlock in testing.

Signed-off-by: NeilBrown  
---
 mm/page-writeback.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7106cb1aca8e..572e70b9a3f7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1909,6 +1909,7 @@ retry:
 
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
+   unsigned int pflags;
 
/*
 * At this point, the page may be truncated or
@@ -1960,8 +1961,10 @@ continue_unlock:
if (!clear_page_dirty_for_io(page))
goto continue_unlock;
 
+   current_set_flags_nested(&pflags, PF_FSTRANS);
trace_wbc_writepage(wbc, mapping->backing_dev_info);
ret = (*writepage)(page, wbc, data);
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
if (unlikely(ret)) {
if (ret == AOP_WRITEPAGE_ACTIVATE) {
unlock_page(page);




[PATCH 10/19] NET: set PF_FSTRANS while holding sk_lock

2014-04-15 Thread NeilBrown
sk_lock can be taken while reclaiming memory (in nfsd for loop-back
NFS mounts, and presumably in nfs), and memory can be allocated while
holding sk_lock, at least via:

 inet_listen -> inet_csk_listen_start -> reqsk_queue_alloc

So to avoid deadlocks, always set PF_FSTRANS while holding sk_lock.

This deadlock was found by lockdep.

Signed-off-by: NeilBrown 
---
 include/net/sock.h |1 +
 net/core/sock.c|2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index b9586a137cad..27c355637e44 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -324,6 +324,7 @@ struct sock {
 #define sk_v6_rcv_saddr__sk_common.skc_v6_rcv_saddr
 
socket_lock_t   sk_lock;
+   unsigned int sk_pflags; /* process flags before taking lock */
struct sk_buff_head sk_receive_queue;
/*
 * The backlog queue is special, it is always used with
diff --git a/net/core/sock.c b/net/core/sock.c
index cf9bd24e4099..8bc677ef072e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2341,6 +2341,7 @@ void lock_sock_nested(struct sock *sk, int subclass)
/*
 * The sk_lock has mutex_lock() semantics here:
 */
+   current_set_flags_nested(&sk->sk_pflags, PF_FSTRANS);
mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
local_bh_enable();
 }
@@ -2352,6 +2353,7 @@ void release_sock(struct sock *sk)
 * The sk_lock has mutex_unlock() semantics:
 */
mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
+   current_restore_flags_nested(&sk->sk_pflags, PF_FSTRANS);

spin_lock_bh(&sk->sk_lock.slock);
if (sk->sk_backlog.tail)




[PATCH 12/19] NET: set PF_FSTRANS while holding rtnl_lock

2014-04-15 Thread NeilBrown
As rtnl_mutex can be taken while holding sk_lock, and sk_lock can be
taken while performing memory reclaim (at least when loop-back NFS is
active), any memory allocation under rtnl_mutex must avoid __GFP_FS,
which is most easily done by setting PF_FSTRANS.


CPU0CPU1

   lock(rtnl_mutex);
lock(sk_lock-AF_INET);
lock(rtnl_mutex);
   
 lock(sk_lock-AF_INET);

  *** DEADLOCK ***

1/ rtnl_mutex is taken while holding sk_lock:

[] rtnl_lock+0x12/0x20
[] ip_mc_leave_group+0x2a/0x160
[] do_ip_setsockopt.isra.18+0x96b/0xed0
[] ip_setsockopt+0x27/0x90
[] udp_setsockopt+0x16/0x30
[] sock_common_setsockopt+0xf/0x20
[] SyS_setsockopt+0x5e/0xc0

2/ memory is allocated under rtnl_mutex:
[] kobject_set_name_vargs+0x21/0x70
[] dev_set_name+0x42/0x50
[] netdev_register_kobject+0x57/0x130
[] register_netdevice+0x354/0x550
[] register_netdev+0x15/0x30


Signed-off-by: NeilBrown 
---
 net/core/rtnetlink.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 120eecc0f5a4..6870211e93a6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -61,15 +61,18 @@ struct rtnl_link {
 };
 
 static DEFINE_MUTEX(rtnl_mutex);
+static int rtnl_pflags;
 
 void rtnl_lock(void)
 {
mutex_lock(&rtnl_mutex);
+   current_set_flags_nested(&rtnl_pflags, PF_FSTRANS);
 }
 EXPORT_SYMBOL(rtnl_lock);
 
 void __rtnl_unlock(void)
 {
+   current_restore_flags_nested(&rtnl_pflags, PF_FSTRANS);
mutex_unlock(&rtnl_mutex);
 }
 
@@ -82,7 +85,11 @@ EXPORT_SYMBOL(rtnl_unlock);
 
 int rtnl_trylock(void)
 {
-   return mutex_trylock(&rtnl_mutex);
+   if (mutex_trylock(&rtnl_mutex)) {
+   current_set_flags_nested(&rtnl_pflags, PF_FSTRANS);
+   return 1;
+   }
+   return 0;
 }
 EXPORT_SYMBOL(rtnl_trylock);
 




[PATCH 09/19] XFS: ensure xfs_file_*_read cannot deadlock in memory allocation.

2014-04-15 Thread NeilBrown
xfs_file_*_read holds an inode lock while calling a generic 'read'
function.  These functions perform read-ahead and are quite likely to
allocate memory.
So set PF_FSTRANS to ensure they avoid __GFP_FS and so don't recurse
into a filesystem to free memory.

This can be a problem with loop-back NFS mounts, if free_pages ends up
waiting in nfs_release_page(), and nfsd is blocked waiting for the lock
that this code holds.

This was found both by lockdep and as a real deadlock during testing.

Signed-off-by: NeilBrown 
---
 fs/xfs/xfs_file.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 64b48eade91d..88b33ef64668 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -243,6 +243,7 @@ xfs_file_aio_read(
ssize_t ret = 0;
int ioflags = 0;
xfs_fsize_t n;
+   unsigned intpflags;
 
XFS_STATS_INC(xs_read_calls);
 
@@ -290,6 +291,10 @@ xfs_file_aio_read(
 * proceeed concurrently without serialisation.
 */
xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
+   /* As we hold a lock, we must ensure that any allocation
+* in generic_file_aio_read avoid __GFP_FS
+*/
+   current_set_flags_nested(&pflags, PF_FSTRANS);
if ((ioflags & IO_ISDIRECT) && inode->i_mapping->nrpages) {
xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
@@ -313,6 +318,7 @@ xfs_file_aio_read(
if (ret > 0)
XFS_STATS_ADD(xs_read_bytes, ret);
 
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
return ret;
 }
@@ -328,6 +334,7 @@ xfs_file_splice_read(
struct xfs_inode*ip = XFS_I(infilp->f_mapping->host);
int ioflags = 0;
ssize_t ret;
+   unsigned intpflags;
 
XFS_STATS_INC(xs_read_calls);
 
@@ -338,6 +345,10 @@ xfs_file_splice_read(
return -EIO;
 
xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
+   /* As we hold a lock, we must ensure that any allocation
+* in generic_file_splice_read avoid __GFP_FS
+*/
+   current_set_flags_nested(&pflags, PF_FSTRANS);
 
trace_xfs_file_splice_read(ip, count, *ppos, ioflags);
 
@@ -345,6 +356,7 @@ xfs_file_splice_read(
if (ret > 0)
XFS_STATS_ADD(xs_read_bytes, ret);
 
+   current_restore_flags_nested(&pflags, PF_FSTRANS);
xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
return ret;
 }




[PATCH 07/19] nfsd and VM: use PF_LESS_THROTTLE to avoid throttle in shrink_inactive_list.

2014-04-15 Thread NeilBrown
nfsd already uses PF_LESS_THROTTLE (and is the only user) to avoid
throttling while dirtying pages.  Use it also to avoid throttling while
doing direct reclaim as this can stall nfsd in the same way.

Also only set PF_LESS_THROTTLE when handling a 'write' request for a
local connection.  This is the only time when the throttling can cause
a problem.  In other cases we should throttle if the system is busy.

Signed-off-by: NeilBrown 
---
 fs/nfsd/nfssvc.c |6 --
 fs/nfsd/vfs.c|6 ++
 mm/vmscan.c  |7 +--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 6af8bc2daf7d..cd24aa76e58d 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -593,12 +593,6 @@ nfsd(void *vrqstp)
nfsdstats.th_cnt++;
mutex_unlock(&nfsd_mutex);
 
-   /*
-* We want less throttling in balance_dirty_pages() so that nfs to
-* localhost doesn't cause nfsd to lock up due to all the client's
-* dirty pages.
-*/
-   current->flags |= PF_LESS_THROTTLE;
set_freezable();
 
/*
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6d7be3f80356..be2d7af3beee 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -913,6 +913,10 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
int stable = *stablep;
int use_wgather;
loff_t  pos = offset;
+   unsigned intpflags;
+
+   if (rqstp->rq_local)
+   current_set_flags_nested(&pflags, PF_LESS_THROTTLE);
 
dentry = file->f_path.dentry;
inode = dentry->d_inode;
@@ -950,6 +954,8 @@ out_nfserr:
err = 0;
else
err = nfserrno(host_err);
+   if (rqstp->rq_local)
+   current_restore_flags_nested(&pflags, PF_LESS_THROTTLE);
return err;
 }
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 05de3289d031..1b7c4e44f0a1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1552,7 +1552,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 * implies that pages are cycling through the LRU faster than
 * they are written so also forcibly stall.
 */
-   if (nr_unqueued_dirty == nr_taken || nr_immediate)
+   if ((nr_unqueued_dirty == nr_taken || nr_immediate)
+   && !current_test_flags(PF_LESS_THROTTLE))
congestion_wait(BLK_RW_ASYNC, HZ/10);
}
 
@@ -1561,7 +1562,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 * is congested. Allow kswapd to continue until it starts encountering
 * unqueued dirty pages or cycling through the LRU too quickly.
 */
-   if (!sc->hibernation_mode && !current_is_kswapd())
+   if (!sc->hibernation_mode &&
+   !current_is_kswapd() &&
+   !current_test_flags(PF_LESS_THROTTLE))
wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
 
trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,




Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy

2014-04-15 Thread Li Zefan
On 2014/4/16 11:33, Kay Sievers wrote:
> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan  wrote:
>> On 2014/4/15 5:44, Tejun Heo wrote:
>>> cgroup users often need a way to determine when a cgroup's
>>> subhierarchy becomes empty so that it can be cleaned up.  cgroup
>>> currently provides release_agent for it; unfortunately, this mechanism
>>> is riddled with issues.
>>>
>>> * It delivers events by forking and execing a userland binary
>>>   specified as the release_agent.  This is a long deprecated method of
>>>   notification delivery.  It's extremely heavy, slow and cumbersome to
>>>   integrate with larger infrastructure.
>>>
>>> * There is a single monitoring point at the root.  There's no way to
>>>   delegate management of subtree.
>>>
>>> * The event isn't recursive.  It triggers when a cgroup doesn't have
>>>   any tasks or child cgroups.  Events for internal nodes trigger only
>>>   after all children are removed.  This again makes it impossible to
>>>   delegate management of subtree.
>>>
>>> * Events are filtered from the kernel side.  "notify_on_release" file
>>>   is used to subscribe to or suppress release event.  This is
>>>   unnecessarily complicated and probably done this way because event
>>>   delivery itself was expensive.
>>>
>>> This patch implements interface file "cgroup.subtree_populated" which
>>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>>> it or not.  Its value is 0 if there is no task in the cgroup and its
>>> descendants; otherwise, 1, and a kernfs_notify() notification is
>>> triggered when the value changes, which can be monitored through poll
>>> and [di]notify.
>>>
>>
>> For the old notification mechanism, the path of the cgroup that becomes
>> empty will be passed to the user specified release agent. Like this:
>>
>> # cat /sbin/cpuset_release_agent
>> #!/bin/sh
>> rmdir /dev/cpuset/$1
>>
>> How do we achieve this using inotify?
>>
>> - monitor all the cgroups, or
>> - monitor all the leaf cgroups, and travel cgrp->parent to delete all
>>   empty cgroups.
>> - monitor root cgroup only, and travel the whole hierarchy to find
>>   empty cgroups when it gets an fs event.
>>
>> Seems none of them is scalable.
> 
> The manager would add all cgroups as watches to one inotify file
> descriptor, it should not be problem to do that.
> 

I'd never used inotify. Thanks for the explanation; after googling a bit,
I think inotify can scale to thousands of cgroups.



[PATCH 03/19] lockdep: improve scenario messages for RECLAIM_FS errors.

2014-04-15 Thread NeilBrown
lockdep can check for locking problems involving reclaim using
the same infrastructure as used for interrupts.

However a number of the messages still refer to interrupts even
if it was actually a reclaim-related problem.

So determine whether the problem was caused by reclaim or by an irq, and
adjust the messages accordingly.

Signed-off-by: NeilBrown 
---
 kernel/locking/lockdep.c |   43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e05b82e92373..33d2ac7519dc 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1423,7 +1423,8 @@ static void
 print_irq_lock_scenario(struct lock_list *safe_entry,
struct lock_list *unsafe_entry,
struct lock_class *prev_class,
-   struct lock_class *next_class)
+   struct lock_class *next_class,
+   int reclaim)
 {
struct lock_class *safe_class = safe_entry->class;
struct lock_class *unsafe_class = unsafe_entry->class;
@@ -1455,20 +1456,27 @@ print_irq_lock_scenario(struct lock_list *safe_entry,
printk("\n\n");
}
 
-   printk(" Possible interrupt unsafe locking scenario:\n\n");
+   if (reclaim)
+   printk(" Possible reclaim unsafe locking scenario:\n\n");
+   else
+   printk(" Possible interrupt unsafe locking scenario:\n\n");
printk("   CPU0CPU1\n");
printk("   \n");
printk("  lock(");
__print_lock_name(unsafe_class);
printk(");\n");
-   printk("   local_irq_disable();\n");
+   if (!reclaim)
+   printk("   local_irq_disable();\n");
printk("   lock(");
__print_lock_name(safe_class);
printk(");\n");
printk("   lock(");
__print_lock_name(middle_class);
printk(");\n");
-   printk("  <Interrupt>\n");
+   if (reclaim)
+   printk("  <Memory allocation/reclaim>\n");
+   else
+   printk("  <Interrupt>\n");
printk("lock(");
__print_lock_name(safe_class);
printk(");\n");
@@ -1487,6 +1495,8 @@ print_bad_irq_dependency(struct task_struct *curr,
 enum lock_usage_bit bit2,
 const char *irqclass)
 {
+   int reclaim = strncmp(irqclass, "RECLAIM", 7) == 0;
+
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
 
@@ -1528,7 +1538,7 @@ print_bad_irq_dependency(struct task_struct *curr,
 
printk("\nother info that might help us debug this:\n\n");
print_irq_lock_scenario(backwards_entry, forwards_entry,
-   hlock_class(prev), hlock_class(next));
+   hlock_class(prev), hlock_class(next), reclaim);
 
lockdep_print_held_locks(curr);
 
@@ -2200,7 +2210,7 @@ static void check_chain_key(struct task_struct *curr)
 }
 
 static void
-print_usage_bug_scenario(struct held_lock *lock)
+print_usage_bug_scenario(struct held_lock *lock, enum lock_usage_bit new_bit)
 {
struct lock_class *class = hlock_class(lock);
 
@@ -2210,7 +2220,11 @@ print_usage_bug_scenario(struct held_lock *lock)
printk("  lock(");
__print_lock_name(class);
printk(");\n");
-   printk("  <Interrupt>\n");
+   if (new_bit == LOCK_USED_IN_RECLAIM_FS ||
+   new_bit == LOCK_USED_IN_RECLAIM_FS_READ)
+   printk("  <Memory allocation/reclaim>\n");
+   else
+   printk("  <Interrupt>\n");
printk("lock(");
__print_lock_name(class);
printk(");\n");
@@ -2246,7 +2260,7 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
 
print_irqtrace_events(curr);
printk("\nother info that might help us debug this:\n");
-   print_usage_bug_scenario(this);
+   print_usage_bug_scenario(this, new_bit);
 
lockdep_print_held_locks(curr);
 
@@ -2285,13 +2299,17 @@ print_irq_inversion_bug(struct task_struct *curr,
struct lock_list *entry = other;
struct lock_list *middle = NULL;
int depth;
+   int reclaim = strncmp(irqclass, "RECLAIM", 7) == 0;
 
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
 
printk("\n");
printk("=\n");
-   printk("[ INFO: possible irq lock inversion dependency detected ]\n");
+   if (reclaim)
+   printk("[ INFO: possible memory reclaim lock inversion dependency detected ]\n");
+   else
+   printk("[ INFO: possible irq lock inversion dependency detected ]\n");
print_kernel_ident();
printk("-\n");
printk("%s/%d just changed the 

[PATCH/RFC 00/19] Support loop-back NFS mounts

2014-04-15 Thread NeilBrown
Loop-back NFS mounts are when the NFS client and server run on the
same host.

The use-case for this is a high availability cluster with shared
storage.  The shared filesystem is mounted on any one machine and
NFS-mounted on the others.
If the nfs server fails, some other node will take over that service,
and then it will have a loop-back NFS mount which needs to keep
working.

This patch set addresses the "keep working" bit and specifically
addresses deadlocks and livelocks.
Allowing the fail-over itself to be deadlock free is a separate
challenge for another day.

The short description of how this works is:

deadlocks:
  - Elevate PF_FSTRANS to apply globally instead of just in NFS and XFS.
PF_FSTRANS disables __GFP_FS in the same way that PF_MEMALLOC_NOIO
disables __GFP_IO.
  - Set PF_FSTRANS in nfsd when handling requests related to
memory reclaim, or requests which could block requests related
to memory reclaim.
  - Use lockdep to find all consequent deadlocks from some other
thread allocating memory while holding a lock that nfsd might
want.
  - Fix those other deadlocks by setting PF_FSTRANS or using GFP_NOFS
as appropriate.

livelocks:
  - identify throttling during reclaim and bypass it when
PF_LESS_THROTTLE is set
  - only set PF_LESS_THROTTLE for nfsd when handling write requests
from the local host.

The last 12 patches address various deadlocks due to locking chains.
11 were found by lockdep, 2 by testing.  There is a reasonable chance
that there are more; I just need to exercise more code while
testing.

There is one issue that lockdep reports which I haven't fixed (I've
just hacked the code out for my testing).  That issue relates to
freeze_super().
I may not be interpreting the lockdep reports perfectly, but I think
they are basically saying that if I were to freeze a filesystem that
was exported to the local host, then we could end up deadlocking.
This is to be expected.  The NFS filesystem would need to be frozen
first.  I don't know how to tell lockdep that I know that is a problem
and I don't want to be warned about it.  Suggestions welcome.
Until this is addressed I cannot really ask others to test the code
with lockdep enabled.

There are more subsidiary places that I needed to add PF_FSTRANS than
I would have liked.  The thought keeps crossing my mind that maybe we
can get rid of __GFP_FS and require that memory reclaim never ever
block on a filesystem.  Then most of these patches go away.

Now that writeback doesn't happen from reclaim (but from kswapd) much
of the calls from reclaim to FS are gone.
The ->releasepage call is the only one that I *know* causes me
problems so I'd like to just say that that must never block.  I don't
really understand the consequences of that though.
There are a couple of other places where __GFP_FS is used and I'd need
to carefully analyze those.  But if someone just said "no, that is
impossible", I could be happy and stick with the current approach.

I've cc:ed Peter Zijlstra and Ingo Molnar only on the lockdep-related
patches, Ming Lei only on the PF_MEMALLOC_NOIO related patches,
and net-dev only on the network-related patches.
There are probably other people I should CC.  Apologies if I missed you.
I'll ensure better coverage if the nfs/mm/xfs people are reasonably happy.

Comments, criticisms, etc most welcome.

Thanks,
NeilBrown


---

NeilBrown (19):
  Promote current_{set,restore}_flags_nested from xfs to global.
  lockdep: lockdep_set_current_reclaim_state should save old value
  lockdep: improve scenario messages for RECLAIM_FS errors.
  Make effect of PF_FSTRANS to disable __GFP_FS universal.
  SUNRPC: track whether a request is coming from a loop-back interface.
  nfsd: set PF_FSTRANS for nfsd threads.
  nfsd and VM: use PF_LESS_THROTTLE to avoid throttle in 
shrink_inactive_list.
  Set PF_FSTRANS while write_cache_pages calls ->writepage
  XFS: ensure xfs_file_*_read cannot deadlock in memory allocation.
  NET: set PF_FSTRANS while holding sk_lock
  FS: set PF_FSTRANS while holding mmap_sem in exec.c
  NET: set PF_FSTRANS while holding rtnl_lock
  MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock.
  driver core: set PF_FSTRANS while holding gdp_mutex
  nfsd: set PF_FSTRANS when client_mutex is held.
  VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc.
  VFS: set PF_FSTRANS while namespace_sem is held.
  nfsd: set PF_FSTRANS during nfsd4_do_callback_rpc.
  XFS: set PF_FSTRANS while ilock is held in xfs_free_eofblocks


 drivers/base/core.c |3 ++
 drivers/base/power/runtime.c|6 ++---
 drivers/block/nbd.c |6 ++---
 drivers/md/dm-bufio.c   |6 ++---
 drivers/md/dm-ioctl.c   |6 ++---
 drivers/mtd/nand/nandsim.c  |   28 ++---
 drivers/scsi/iscsi_tcp.c|6 ++---
 drivers/usb/core/hub.c  |6 ++---
 

[PATCH 01/19] Promote current_{set, restore}_flags_nested from xfs to global.

2014-04-15 Thread NeilBrown
These are useful macros from xfs for modifying current->flags.
Other places in the kernel perform the same task in various different
ways.
This patch moves the macros from xfs to include/linux/sched.h and
changes all code which temporarily sets a current->flags flag to
use these macros.

This does not change functionality in any important way, but does fix a
few sites which assume that PF_FSTRANS is not already set and so
arbitrarily set and then clear it.  The new code is more careful and
will only clear it if it was previously clear.

Signed-off-by: NeilBrown 
---
 drivers/base/power/runtime.c|6 +++---
 drivers/block/nbd.c |6 +++---
 drivers/md/dm-bufio.c   |6 +++---
 drivers/md/dm-ioctl.c   |6 +++---
 drivers/mtd/nand/nandsim.c  |   28 
 drivers/scsi/iscsi_tcp.c|6 +++---
 drivers/usb/core/hub.c  |6 +++---
 fs/fs-writeback.c   |5 +++--
 fs/xfs/xfs_linux.h  |7 ---
 include/linux/sched.h   |   27 ---
 kernel/softirq.c|6 +++---
 mm/migrate.c|9 -
 mm/page_alloc.c |   10 ++
 mm/vmscan.c |   10 ++
 net/core/dev.c  |6 +++---
 net/core/sock.c |6 +++---
 net/sunrpc/sched.c  |5 +++--
 net/sunrpc/xprtrdma/transport.c |5 +++--
 net/sunrpc/xprtsock.c   |   17 ++---
 19 files changed, 78 insertions(+), 99 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 72e00e66ecc5..02448f11c879 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -348,7 +348,7 @@ static int rpm_callback(int (*cb)(struct device *), struct 
device *dev)
return -ENOSYS;
 
if (dev->power.memalloc_noio) {
-   unsigned int noio_flag;
+   unsigned int pflags;
 
/*
 * Deadlock might be caused if memory allocation with
@@ -359,9 +359,9 @@ static int rpm_callback(int (*cb)(struct device *), struct 
device *dev)
 * device, so network device and its ancestor should
 * be marked as memalloc_noio too.
 */
-   noio_flag = memalloc_noio_save();
+   current_set_flags_nested(&pflags, PF_MEMALLOC_NOIO);
retval = __rpm_callback(cb, dev);
-   memalloc_noio_restore(noio_flag);
+   current_restore_flags_nested(&pflags, PF_MEMALLOC_NOIO);
} else {
retval = __rpm_callback(cb, dev);
}
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 55298db36b2d..d3ddfa8a4da4 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -158,7 +158,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void 
*buf, int size,
struct msghdr msg;
struct kvec iov;
sigset_t blocked, oldset;
-   unsigned long pflags = current->flags;
+   unsigned int pflags;
 
if (unlikely(!sock)) {
dev_err(disk_to_dev(nbd->disk),
@@ -172,7 +172,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void 
*buf, int size,
siginitsetinv(&blocked, sigmask(SIGKILL));
sigprocmask(SIG_SETMASK, &blocked, &oldset);
 
-   current->flags |= PF_MEMALLOC;
+   current_set_flags_nested(&pflags, PF_MEMALLOC);
do {
sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC;
iov.iov_base = buf;
@@ -220,7 +220,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void 
*buf, int size,
} while (size > 0);
 
sigprocmask(SIG_SETMASK, &oldset, NULL);
-   tsk_restore_flags(current, pflags, PF_MEMALLOC);
+   current_restore_flags_nested(, PF_MEMALLOC);
 
return result;
 }
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 66c5d130c8c2..f5fa93ea3a59 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -322,7 +322,7 @@ static void __cache_size_refresh(void)
 static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
   enum data_mode *data_mode)
 {
-   unsigned noio_flag;
+   unsigned int pflags;
void *ptr;
 
if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
@@ -350,12 +350,12 @@ static void *alloc_buffer_data(struct dm_bufio_client *c, 
gfp_t gfp_mask,
 */
 
if (gfp_mask & __GFP_NORETRY)
-   noio_flag = memalloc_noio_save();
+   current_set_flags_nested(&pflags, PF_MEMALLOC_NOIO);
 
ptr = __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM, PAGE_KERNEL);
 
if (gfp_mask & __GFP_NORETRY)
-   memalloc_noio_restore(noio_flag);
+   current_restore_flags_nested(&pflags, PF_MEMALLOC_NOIO);
 
return ptr;
 }
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 51521429fb59..5409533f22b5 100644
--- 

Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy

2014-04-15 Thread Kay Sievers
On Tue, Apr 15, 2014 at 8:50 PM, Eric W. Biederman
 wrote:
> Kay Sievers  writes:
>
>> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan  wrote:
>>> On 2014/4/15 5:44, Tejun Heo wrote:
 cgroup users often need a way to determine when a cgroup's
 subhierarchy becomes empty so that it can be cleaned up.  cgroup
 currently provides release_agent for it; unfortunately, this mechanism
 is riddled with issues.

 * It delivers events by forking and execing a userland binary
   specified as the release_agent.  This is a long deprecated method of
   notification delivery.  It's extremely heavy, slow and cumbersome to
   integrate with larger infrastructure.

 * There is single monitoring point at the root.  There's no way to
   delegate management of subtree.

 * The event isn't recursive.  It triggers when a cgroup doesn't have
   any tasks or child cgroups.  Events for internal nodes trigger only
   after all children are removed.  This again makes it impossible to
   delegate management of subtree.

 * Events are filtered from the kernel side.  "notify_on_release" file
   is used to subscribe to or suppress release event.  This is
   unnecessarily complicated and probably done this way because event
   delivery itself was expensive.

 This patch implements interface file "cgroup.subtree_populated" which
 can be used to monitor whether the cgroup's subhierarchy has tasks in
 it or not.  Its value is 0 if there is no task in the cgroup and its
 descendants; otherwise, 1, and a kernfs_notify() notification is
 triggered when the value changes, which can be monitored through poll
 and [di]notify.

>>>
>>> For the old notification mechanism, the path of the cgroup that becomes
>>> empty will be passed to the user specified release agent. Like this:
>>>
>>> # cat /sbin/cpuset_release_agent
>>> #!/bin/sh
>>> rmdir /dev/cpuset/$1
>>>
>>> How do we achieve this using inotify?
>>>
>>> - monitor all the cgroups, or
>>> - monitor all the leaf cgroups, and traverse cgrp->parent to delete all
>>>   empty cgroups.
>>> - monitor root cgroup only, and traverse the whole hierarchy to find
>>>   empty cgroups when it gets an fs event.
>>>
>>> Seems none of them is scalable.
>>
>> The manager would add all cgroups as watches to one inotify file
>> descriptor, it should not be problem to do that.
>
> inotify won't work on cgroupfs.

Inotify on kernfs will work.

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [perf] more perf_fuzzer memory corruption

2014-04-15 Thread Vince Weaver
On Tue, 15 Apr 2014, Vince Weaver wrote:

> Possibly it looks like a struct perf_event is being used after freed,
> specifically the event->migrate_entry->prev value?  I could
> be completely wrong about that.

and actually I'm mixing up hex and decimal.  It looks like the actual 
value being written to the freed area is at 0x48 which I think maps to
event->hlist_entry->pprev

but really if it's late enough I'm mixing hex and decimal I should 
probably stop staring at trace dumps and get some sleep.

Vince




Re: [PATCH v2 8/8] sh: clk: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-15 Thread Simon Horman
On Wed, Apr 16, 2014 at 09:30:24AM +0530, Viresh Kumar wrote:
> On 16 April 2014 06:54, Simon Horman  wrote:
> > I have dropped this patch.
> >
> > Please let me know if there is a stable branch which
> > includes cpufreq_frequency_table which I can use as a base to apply
> > this patch.
> >
> > Alternatively, I would be happy to Ack this patch and let someone
> > else pick up this patch, but I'm entirely unclear on who that would be.
> 
> Rafael will take this patch, please Ack it :)

Thanks, done.


Re: [PATCH v2 8/8] sh: clk: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-15 Thread Simon Horman
On Wed, Apr 16, 2014 at 01:27:04AM +0300, Stratos Karafotis wrote:
> The cpufreq core now supports the cpufreq_for_each_valid_entry macro
> helper for iteration over the cpufreq_frequency_table, so use it.
> 
> It should have no functional changes.
> 
> Signed-off-by: Stratos Karafotis 

Rafael, please feel free to take this one.

Acked-by: Simon Horman 

> ---
>  drivers/sh/clk/core.c | 20 +---
>  1 file changed, 5 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/sh/clk/core.c b/drivers/sh/clk/core.c
> index 7472785..be56b22 100644
> --- a/drivers/sh/clk/core.c
> +++ b/drivers/sh/clk/core.c
> @@ -196,17 +196,11 @@ int clk_rate_table_find(struct clk *clk,
>   struct cpufreq_frequency_table *freq_table,
>   unsigned long rate)
>  {
> - int i;
> -
> - for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
> - unsigned long freq = freq_table[i].frequency;
> + struct cpufreq_frequency_table *pos;
>  
> - if (freq == CPUFREQ_ENTRY_INVALID)
> - continue;
> -
> - if (freq == rate)
> - return i;
> - }
> + cpufreq_for_each_valid_entry(pos, freq_table)
> + if (pos->frequency == rate)
> + return pos - freq_table;
>  
>   return -ENOENT;
>  }
> @@ -575,11 +569,7 @@ long clk_round_parent(struct clk *clk, unsigned long 
> target,
>   return abs(target - *best_freq);
>   }
>  
> - for (freq = parent->freq_table; freq->frequency != CPUFREQ_TABLE_END;
> -  freq++) {
> - if (freq->frequency == CPUFREQ_ENTRY_INVALID)
> - continue;
> -
> + cpufreq_for_each_valid_entry(freq, parent->freq_table) {
>   if (unlikely(freq->frequency / target <= div_min - 1)) {
>   unsigned long freq_max;
>  
> -- 
> 1.9.0
> 


Re: [PATCH V1 Resend 1/5] tick-common: fix wrong check in tick_check_replacement()

2014-04-15 Thread Viresh Kumar
On 16 April 2014 00:12, Thomas Gleixner  wrote:
> B1;3202;0c

What does this mean ??

> On Tue, 15 Apr 2014, Viresh Kumar wrote:
>
>> tick_check_replacement() returns if a replacement of clock_event_device is
>> possible or not. It does this as the first check:
>>
>>   if (tick_check_percpu(curdev, newdev, smp_processor_id()))
>>   return false;
>>
>> This looks wrong as we are returning false when tick_check_percpu() returned
>> true. Probably Thomas forgot '!' here in his commit: 03e13cf5e ?
>
> Come on. You can do better changelogs.

:(

> "This looks wrong" is definitely not a good description of the
> problem.
>
> Either you know WHY it is wrong, then you say so. If not, then you can
> send an RFC.
>
> I fixed the changelog up this time.

Thanks, will take care of such stuff in future.


Re: [PATCH V1 Resend 2/5] tick-common: don't check tick_oneshot_mode_active() from tick_check_preferred()

2014-04-15 Thread Viresh Kumar
On 16 April 2014 00:00, Thomas Gleixner  wrote:
> On Tue, 15 Apr 2014, Viresh Kumar wrote:
>
>> If 'curdev' passed to tick_check_preferred() is the current 
>> clock_event_device
>> then these two checks look exactly same, because td->mode is set to
>> TICKDEV_MODE_ONESHOT only when the event device has ONESHOT feature.
>>
>>   if (curdev && (curdev->features & CLOCK_EVT_FEAT_ONESHOT))
>>   return false;
>>
>>   if (tick_oneshot_mode_active())
>>   return false;
>>
>> Now left the case where 'curdev' is not the current clock_event_device. This 
>> can
>> happen from the sequence started from clockevents_replace(). Here we are 
>> trying
>> to find the best possible device that we should choose. And so even in this 
>> case
>> we don't need the above check as we aren't really worried about the current
>> device.
>
> Wrong. If curdev is NULL, you might select a device w/o ONESHOT if the
> system is in oneshot mode. Go figure.

Okay, so the logs must have another case where curdev is NULL. But codewise
we are already taking care of that here:

return !curdev ||
newdev->rating > curdev->rating ||
   !cpumask_equal(curdev->cpumask, newdev->cpumask);

And so this patch wouldn't harm. And this is preserved in the next patch (3/5)
as well, which adds checks for other cases as well.


Re: [PATCH 3/4] x86/insn: Extract more information about instructions

2014-04-15 Thread Sasha Levin
On 04/15/2014 11:54 PM, H. Peter Anvin wrote:
> On 04/15/2014 08:47 PM, Sasha Levin wrote:
>> > 
>> > Yes, if kmemcheck for some reason needs to figure out if an instruction
>> > is a MOV variant we'll need to list quite a few mnemonics, but that list
>> > will be much shorter and more readable than a corresponding list of 
>> > opcodes.
>> > 
> You're completely missing my point.  If you are looking at MOV, with
> 80%+ probability you're doing something very, very wrong, because you
> will be including instructions that do something completely different
> from what you thought.
> 
> This is true for a lot of the x86 instructions.

Right, but assuming that the AND example I presented earlier makes sense, I
can't create mnemonic entries only for instructions where doing so would
"probably" be right.

If there are use cases where working with mnemonics is correct, we should
be doing that in kmemcheck. If the way kmemcheck deals with mnemonics is
incorrect we should go ahead and fix kmemcheck.


Thanks,
Sasha


linux-next: manual merge of the audit tree with Linus' tree

2014-04-15 Thread Stephen Rothwell
Hi Eric,

Today's linux-next merge of the audit tree got conflicts in
arch/mips/include/asm/syscall.h, arch/x86/Kconfig and kernel/audit.c
between commits from Linus' tree and commit 596b0569084b ("Merge tag
'v3.14' into mergeing") from the audit tree.

This happened because you merged Linus' tag v3.14 into your tree.  In
this case, that merge had conflicts that you resolved differently to the
way Linus had resolved them when he merged your tree for v3.15-rc1.  I
fixed it up (by using Linus' version) and can carry the fix as necessary
(no action is required).

You could have avoided this by doing a fast forward merge of v3.15-rc1
instead of the v3.14 merge (since everything in your tree before that
merge was also in Linus' tree by v3.15-rc1).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




Re: [PATCH v2 0/8] Introduce new cpufreq helper macros

2014-04-15 Thread Viresh Kumar
On 16 April 2014 03:55, Stratos Karafotis  wrote:
> Hi all,
>
> This patch set introduces two freq_table helper macros which
> can be used for iteration over cpufreq_frequency_table and
> makes the necessary changes to cpufreq core and drivers that
> use such an iteration procedure.
>
> The motivation was a usage of common procedure to iterate over
> cpufreq_frequency_table across all drivers and cpufreq core.
>
> This was tested on a x86_64 platform.
> Most files compiled successfully but unfortunately I was not
> able to compile sh_sir.c pasemi_cpufreq.c and ppc_cbe_cpufreq.c
> due to lack of cross compiler.
>
> Changes v1 -> v2
> - Rearrange patches
> - Remove redundant braces
> - Fix a newly introduced bug in exynos5440
> - Use cpufreq_for_each_valid_entry instead of
> cpufreq_for_each_entry in cpufreq_frequency_table_get_index()
> - Drop redundant double ! operator in longhaul and change

You dropped this !! in thermal stuff and not longhaul :)

> the pos loop cursor variable to freq_pos.
> - Declare pos variable on a separate line
>
> Stratos Karafotis (8):
>   cpufreq: Introduce macros for cpufreq_frequency_table iteration
>   cpufreq: Use cpufreq_for_each_* macros for frequency table iteration
>   davinci: da850: Use cpufreq_for_each_entry macro for iteration
>   mips: lemote 2f: Use cpufreq_for_each_entry macro for iteration
>   mfd: db8500-prcmu: Use cpufreq_for_each_entry macro for iteration
>   thermal: cpu_cooling: Use cpufreq_for_each_valid_entry macro for
> iteration
>   irda: sh_sir: Use cpufreq_for_each_valid_entry macro for iteration
>   sh: clk: Use cpufreq_for_each_valid_entry macro for iteration

Acked-by: Viresh Kumar 


RE: [RFC][PATCH 1/3] ARM: dts: vf610: Add Freescale FlexTimer Module timer node.

2014-04-15 Thread dongsheng.w...@freescale.com


> -Original Message-
> From: Xiubo Li [mailto:li.xi...@freescale.com]
> Sent: Wednesday, April 16, 2014 10:20 AM
> To: daniel.lezc...@linaro.org; t...@linutronix.de; shawn@linaro.org; Lu
> Jingchang-B35083; Jin Zhengxiong-R64188; Wang Dongsheng-B40534
> Cc: devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; Xiubo Li-B47053
> Subject: [RFC][PATCH 1/3] ARM: dts: vf610: Add Freescale FlexTimer Module 
> timer
> node.
> 
> Signed-off-by: Xiubo Li 
> Cc: Shawn Guo 
> Cc: Jingchang Lu 
> ---
>  arch/arm/boot/dts/vf610.dtsi | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/vf610.dtsi b/arch/arm/boot/dts/vf610.dtsi
> index 107e2c0..c3a276f 100644
> --- a/arch/arm/boot/dts/vf610.dtsi
> +++ b/arch/arm/boot/dts/vf610.dtsi
> @@ -153,6 +153,19 @@
>   clock-names = "pit";
>   };
> 
> + ftm0: ftm@40038000 {
> + compatible = "fsl,vf610-ftm-timer";
> + reg = <0x40038000 0x2000>;
> + interrupts = <0 42 IRQ_TYPE_LEVEL_HIGH>;
> + clock-names = "ftm0", "ftm1",
> + "ftm0_counter_en", "ftm1_counter_en";
> + clocks = <&clks VF610_CLK_FTM0>,
> + <&clks VF610_CLK_FTM1>,
> + <&clks VF610_CLK_FTM0_EXT_FIX_EN>,
> + <&clks VF610_CLK_FTM1_EXT_FIX_EN>;
> + status = "disabled";
> + };
> +

They need to be separated. ftm0, ftm1.

>   wdog@4003e000 {
>   compatible = "fsl,vf610-wdt", "fsl,imx21-wdt";
>   reg = <0x4003e000 0x1000>;
> --
> 1.8.4
> 



Re: [PATCH v2 8/8] sh: clk: Use cpufreq_for_each_valid_entry macro for iteration

2014-04-15 Thread Viresh Kumar
On 16 April 2014 06:54, Simon Horman  wrote:
> I have dropped this patch.
>
> Please let me know if there is a stable branch which
> includes cpufreq_frequency_table which I can use as a base to apply
> this patch.
>
> Alternatively, I would be happy to Ack this patch and let someone
> else pick up this patch, but I'm entirely unclear on who that would be.

Rafael will take this patch, please Ack it :)


Re: How do I increment a per-CPU variable without warning?

2014-04-15 Thread Paul E. McKenney
On Tue, Apr 15, 2014 at 03:47:26PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 15, 2014 at 06:29:51PM -0400, Dave Jones wrote:
> > On Tue, Apr 15, 2014 at 03:17:55PM -0700, Paul E. McKenney wrote:
> > 
> >  > My current admittedly crude workaround is as follows:
> >  > 
> >  >  static inline bool rcu_should_resched(void)
> >  >  {
> >  >  int t;
> >  > 
> >  >  #ifdef CONFIG_DEBUG_PREEMPT
> >  >  preempt_disable();
> >  >  #endif /* #ifdef CONFIG_DEBUG_PREEMPT */
> >  >  t = __this_cpu_read(rcu_cond_resched_count) + 1;
> >  >  if (t < RCU_COND_RESCHED_LIM) {
> >  >  __this_cpu_write(rcu_cond_resched_count, t);
> >  >  #ifdef CONFIG_DEBUG_PREEMPT
> >  >  preempt_enable();
> >  >  #endif /* #ifdef CONFIG_DEBUG_PREEMPT */
> >  >  return false;
> >  >  }
> >  >  #ifdef CONFIG_DEBUG_PREEMPT
> >  >  preempt_enable();
> >  >  #endif /* #ifdef CONFIG_DEBUG_PREEMPT */
> >  >  return true;
> >  >  }
> > 
> > Won't using DEBUG_PREEMPT instead of just CONFIG_PREEMPT here make this
> > silently do the wrong thing if preemption is enabled, but debugging isn't ?
> 
> If preemption is enabled, but debugging is not, then yes, the above code
> might force an unnecessary schedule() if the above code was preempted
> between the __this_cpu_read() and the __this_cpu_write().  Which does
> not cause a problem, especially given that it won't happen very often.
> 
> > I'm not seeing why you need the ifdefs at all, unless the implied
> > barrier() is a problem ?
> 
> I don't think that Peter Zijlstra would be too happy about an extra
> unneeded preempt_disable()/preempt_enable() pair in the cond_resched()
> fastpath.  Not that I necessarily expect him to be particularly happy
> with the above, but perhaps someone has a better approach.

But falling back on the old ways of doing this at least looks a bit
nicer:

static inline bool rcu_should_resched(void)
{
int t;
int *tp = &per_cpu(rcu_cond_resched_count, raw_smp_processor_id());

t = ACCESS_ONCE(*tp) + 1;
if (t < RCU_COND_RESCHED_LIM) {
ACCESS_ONCE(*tp) = t;
return false;
}
return true;
}

Other thoughts?

Thanx, Paul



Re: [PATCH 3/4] x86/insn: Extract more information about instructions

2014-04-15 Thread H. Peter Anvin
On 04/15/2014 08:47 PM, Sasha Levin wrote:
> 
> Yes, if kmemcheck for some reason needs to figure out if an instruction
> is a MOV variant we'll need to list quite a few mnemonics, but that list
> will be much shorter and more readable than a corresponding list of opcodes.
> 

You're completely missing my point.  If you are looking at MOV, with
80%+ probability you're doing something very, very wrong, because you
will be including instructions that do something completely different
from what you thought.

This is true for a lot of the x86 instructions.

-hpa




Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy

2014-04-15 Thread Eric W. Biederman
Kay Sievers  writes:

> On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan  wrote:
>> On 2014/4/15 5:44, Tejun Heo wrote:
>>> cgroup users often need a way to determine when a cgroup's
>>> subhierarchy becomes empty so that it can be cleaned up.  cgroup
>>> currently provides release_agent for it; unfortunately, this mechanism
>>> is riddled with issues.
>>>
>>> * It delivers events by forking and execing a userland binary
>>>   specified as the release_agent.  This is a long deprecated method of
>>>   notification delivery.  It's extremely heavy, slow and cumbersome to
>>>   integrate with larger infrastructure.
>>>
>>> * There is single monitoring point at the root.  There's no way to
>>>   delegate management of subtree.
>>>
>>> * The event isn't recursive.  It triggers when a cgroup doesn't have
>>>   any tasks or child cgroups.  Events for internal nodes trigger only
>>>   after all children are removed.  This again makes it impossible to
>>>   delegate management of subtree.
>>>
>>> * Events are filtered from the kernel side.  "notify_on_release" file
>>>   is used to subscribe to or suppress release event.  This is
>>>   unnecessarily complicated and probably done this way because event
>>>   delivery itself was expensive.
>>>
>>> This patch implements interface file "cgroup.subtree_populated" which
>>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>>> it or not.  Its value is 0 if there is no task in the cgroup and its
>>> descendants; otherwise, 1, and a kernfs_notify() notification is
>>> triggered when the value changes, which can be monitored through poll
>>> and [di]notify.
>>>
>>
>> For the old notification mechanism, the path of the cgroup that becomes
>> empty will be passed to the user specified release agent. Like this:
>>
>> # cat /sbin/cpuset_release_agent
>> #!/bin/sh
>> rmdir /dev/cpuset/$1
>>
>> How do we achieve this using inotify?
>>
>> - monitor all the cgroups, or
>> - monitor all the leaf cgroups, and traverse cgrp->parent to delete all
>>   empty cgroups.
>> - monitor root cgroup only, and traverse the whole hierarchy to find
>>   empty cgroups when it gets an fs event.
>>
>> Seems none of them is scalable.
>
> The manager would add all cgroups as watches to one inotify file
> descriptor, it should not be problem to do that.

inotify won't work on cgroupfs.

Eric


Re: [PATCHv2 1/2] iio: adc: exynos_adc: Control special clock of ADC to support Exynos3250 ADC

2014-04-15 Thread Sachin Kamat
Hi Chanwoo,

On 14 April 2014 14:37, Chanwoo Choi  wrote:
> This patch controls the special clock for ADC in the Exynos series's FSYS block.
> If the special clock of ADC is registered on the clock list of the common clk
> framework, the Exynos ADC driver has to control this clock.
>
> Exynos3250/Exynos4/Exynos5 has an 'adc' clock as follows:
> - 'adc' clock: bus clock for ADC
>
> Exynos3250 has an additional 'sclk_tsadc' clock as follows:
> - 'sclk_tsadc' clock: special clock for ADC which provides the clock to the
> internal ADC
>
> Exynos 4210/4212/4412 and Exynos5250/5420 do not include the 'sclk_tsadc'
> clock in FSYS_BLK. But Exynos3250, based on Cortex-A7, does include the
> 'sclk_tsadc' clock in FSYS_BLK.
>
> Cc: Jonathan Cameron 
> Cc: Kukjin Kim 
> Cc: Naveen Krishna Chatradhi
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Chanwoo Choi 
> Acked-by: Kyungmin Park 
> ---
>  drivers/iio/adc/exynos_adc.c | 54 
> +---
>  1 file changed, 41 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iio/adc/exynos_adc.c b/drivers/iio/adc/exynos_adc.c
> index d25b262..3c99243 100644
> --- a/drivers/iio/adc/exynos_adc.c
> +++ b/drivers/iio/adc/exynos_adc.c
> @@ -40,8 +40,9 @@
>  #include 
>
>  enum adc_version {
> -   ADC_V1,
> -   ADC_V2
> +   ADC_V1 = 0x1,
> +   ADC_V2 = 0x2,
> +   ADC_V3 = (ADC_V1 | ADC_V2),

Can't this be simply 0x3? Or is this not really a h/w version?

>  };
>
>  /* EXYNOS4412/5250 ADC_V1 registers definitions */
> @@ -88,6 +89,7 @@ struct exynos_adc {
> void __iomem*regs;
> void __iomem*enable_reg;
> struct clk  *clk;
> +   struct clk  *sclk;
> unsigned intirq;
> struct regulator*vdd;
>
> @@ -100,6 +102,7 @@ struct exynos_adc {
>  static const struct of_device_id exynos_adc_match[] = {
> { .compatible = "samsung,exynos-adc-v1", .data = (void *)ADC_V1 },
> { .compatible = "samsung,exynos-adc-v2", .data = (void *)ADC_V2 },
> +   { .compatible = "samsung,exynos-adc-v3", .data = (void *)ADC_V3 },
> {},
>  };
>  MODULE_DEVICE_TABLE(of, exynos_adc_match);
> @@ -128,7 +131,7 @@ static int exynos_read_raw(struct iio_dev *indio_dev,
> mutex_lock(&indio_dev->mlock);
>
> /* Select the channel to be used and Trigger conversion */
> -   if (info->version == ADC_V2) {
> +   if (info->version & ADC_V2) {

So, now this would be applicable for ADC_V3 too, right?


> con2 = readl(ADC_V2_CON2(info->regs));
> con2 &= ~ADC_V2_CON2_ACH_MASK;
> con2 |= ADC_V2_CON2_ACH_SEL(chan->address);
> @@ -165,7 +168,7 @@ static irqreturn_t exynos_adc_isr(int irq, void *dev_id)
> info->value = readl(ADC_V1_DATX(info->regs)) &
> ADC_DATX_MASK;
> /* clear irq */
> -   if (info->version == ADC_V2)
> +   if (info->version & ADC_V2)
> writel(1, ADC_V2_INT_ST(info->regs));
> else
> writel(1, ADC_V1_INTCLR(info->regs));
> @@ -226,11 +229,25 @@ static int exynos_adc_remove_devices(struct device *dev, void *c)
> return 0;
>  }
>
> +static void exynos_adc_enable_clock(struct exynos_adc *info, bool enable)
> +{
> +   if (enable) {
> +   clk_prepare_enable(info->clk);

This could fail. Is it OK without any checks?

> +   if (info->version == ADC_V3)
> +   clk_prepare_enable(info->sclk);

ditto.

> +
> +   } else {
> +   if (info->version == ADC_V3)
> +   clk_disable_unprepare(info->sclk);
> +   clk_disable_unprepare(info->clk);
> +   }
> +}
> +
>  static void exynos_adc_hw_init(struct exynos_adc *info)
>  {
> u32 con1, con2;
>
> -   if (info->version == ADC_V2) {
> +   if (info->version & ADC_V2) {
> con1 = ADC_V2_CON1_SOFT_RESET;
> writel(con1, ADC_V2_CON1(info->regs));
>
> @@ -300,6 +317,8 @@ static int exynos_adc_probe(struct platform_device *pdev)
>
> writel(1, info->enable_reg);
>
> +   info->version = exynos_adc_get_version(pdev);
> +
> info->clk = devm_clk_get(&pdev->dev, "adc");
> if (IS_ERR(info->clk)) {
> dev_err(&pdev->dev, "failed getting clock, err = %ld\n",
> @@ -308,6 +327,17 @@ static int exynos_adc_probe(struct platform_device *pdev)
> goto err_irq;
> }
>
> +   if (info->version == ADC_V3) {
> +   info->sclk = devm_clk_get(&pdev->dev, "sclk_tsadc");
> +   if (IS_ERR(info->sclk)) {
> +   dev_warn(&pdev->dev,
> +   "failed getting sclk clock, err = %ld\n",
> +   PTR_ERR(info->sclk));
> +   ret = PTR_ERR(info->sclk);

nit: you could move this line above dev_warn and use 'ret' in the print
statement.


-- 
With warm regards,
Sachin

Re: [PATCH 3/4] x86/insn: Extract more information about instructions

2014-04-15 Thread Sasha Levin
On 04/15/2014 11:26 PM, H. Peter Anvin wrote:
> On 04/15/2014 08:10 AM, Sasha Levin wrote:
>>
>> Mnemonics don't have 1:1 relationship with opcodes. So, for example,
>> if kmemcheck needs to check (and it does) whether a given instruction
>> is an "ADD", it would need to compare it to 9 different opcodes.
>>
> 
> Excuse me, but on what planet does it make sense to ask whether a
> particular instruction is a "MOV", for example?  The trend in x86
> opcodes has varied over the years and at some points it seems to have
> been trendy to have very general mnemonics (consider MOV CR, MOV DR) and
> at some points quite the opposite (hence MOVD, MOVQ, MOVDQA, MOVDQU,
> MOVAPS, MOVUPS, MOVAPD, MOVUPD, VMOVxxx).
> 
> So it is not at all clear that this makes any kind of sense whatsoever,
> and is more likely just going to be abused.

Looking at kmemcheck, and "AND" vs "MOV" for example, we need to know if a
given instruction is an AND because AND may operate on only part of the memory
it's accessing. So some accesses to what kmemcheck sees as "uninitialized
memory" are actually valid ones because we don't touch the uninitialized
part.

So for kmemcheck, AND and MOV (for example) are different because ANDing a
value and MOVing a value mean different things with respect to uninitialized memory.

Yes, if kmemcheck for some reason needs to figure out if an instruction
is a MOV variant we'll need to list quite a few mnemonics, but that list
will be much shorter and more readable than a corresponding list of opcodes.


Thanks,
Sasha


Re: [PATCH 2/2] net: Implement SO_PASSCGROUP to enable passing cgroup path

2014-04-15 Thread Andy Lutomirski
On Apr 15, 2014 5:20 PM, "Vivek Goyal"  wrote:
>
> On Tue, Apr 15, 2014 at 02:53:13PM -0700, Andy Lutomirski wrote:
> > On Tue, Apr 15, 2014 at 2:15 PM, Vivek Goyal  wrote:
> > > This patch implements socket option SO_PASSCGROUP along the lines of
> > > SO_PASSCRED.
> > >
> > > If SO_PASSCGROUP is set, then recvmsg() will get a control message
> > > SCM_CGROUP which will contain the cgroup path of sender. This cgroup
> > > belongs to the first mounted hierarchy in the system.
> > >
> > > An SCM_CGROUP control message can only be received; the sender can not send
> > > an SCM_CGROUP message. The kernel automatically generates one if the receiver
> > > chooses to receive one.
> > >
> > > This works both for unix stream and datagram sockets.
> > >
> > > cgroup information is passed only if either the sender or the receiver has
> > > the SO_PASSCGROUP option set. This means existing workloads should
> > > not see any significant performance impact from this change.
> >
> > This is odd.  Shouldn't an SCM_CGROUP cmsg be generated when the
> > receiver has SO_PASSCGROUP set and the sender passes SCM_CGROUP to
> > sendmsg?
>
> How can the receiver trust the cgroup info generated by the sender? It needs to
> be generated by the kernel so that the receiver can trust it.
>
> And if the receiver needs to know the cgroup of the sender, it can just set
> SO_PASSCGROUP on the socket and it should get one SCM_CGROUP message
> with each message received.

I think the kernel should validate the data.

Here's an attack against SO_PEERCGROUP: if you create a container with
a super secret name, then every time you connect to any unix socket,
you leak the name.

Here's an attack against SO_PASSCGROUP, as you implemented it: connect
a socket and get someone else to write(2) to it.  This isn't very
hard.  Now you've impersonated them.

I advocate for the following semantics: if sendmsg is passed a
SCM_CGROUP cmsg, and that cmsg has the right cgroup, and the receiver
has SO_PASSCGROUP set, then the receiver gets SCM_CGROUP.  If you try
to lie using SCM_CGROUP, you get -EPERM.  If you set SO_PASSCGROUP,
but your peer doesn't send SCM_CREDS, you get nothing.

This is immune to both attacks.  It should be cheaper, too, since
there's no overhead for people who don't use it.

--Andy


RE: [RFC][PATCH 3/3] clocksource: Add Freescale FlexTimer Module (FTM) timer support

2014-04-15 Thread li.xi...@freescale.com
[...]
> > +static void ftm_set_mode(enum clock_event_mode mode,
> > +   struct clock_event_device *evt)
> > +{
> > +   switch (mode) {
> > +   case CLOCK_EVT_MODE_PERIODIC:
> > +   ftm_set_next_event(peroidic_cyc, evt);
> > +   break;
> > +   default:
> > +   break;
> 
> Remove this break;
> 

I'll revise this.


> > +   }
> > +}


[...]
> > +static void __init ftm_timer_init(struct device_node *np)
> > +{
> > +   struct clk *ftm_clk;
> > +   void __iomem *timer_base;
> > +   unsigned long freq;
> > +   int irq;
> > +
> > +   timer_base = of_iomap(np, 0);
> > +   BUG_ON(!timer_base);
> > +
> > +   clksrc_base = timer_base + FTM_OFFSET(1);
> > +   clkevt_base = timer_base + FTM_OFFSET(0);
> > +
> > +   irq = irq_of_parse_and_map(np, 0);
> > +   BUG_ON(irq <= 0);
> > +
> > +   ftm_clk = of_clk_get_by_name(np, "ftm0_counter_en");
> > +   BUG_ON(IS_ERR(ftm_clk));
> > +   BUG_ON(clk_prepare_enable(ftm_clk));
> > +
> > +   ftm_clk = of_clk_get_by_name(np, "ftm1_counter_en");
> > +   BUG_ON(IS_ERR(ftm_clk));
> > +   BUG_ON(clk_prepare_enable(ftm_clk));
> > +
> > +   ftm_clk = of_clk_get_by_name(np, "ftm0");
> > +   BUG_ON(IS_ERR(ftm_clk));
> > +   BUG_ON(clk_prepare_enable(ftm_clk));
> > +
> > +   ftm_clk = of_clk_get_by_name(np, "ftm1");
> 
> Why doesn't the dts have an ftm1 node?
> 

Because the 'ftm0: ftm@40038000' node is used for both the ftm0
and ftm1 devices at the same time.

Would using 'ftm: ftm@40038000' be better?

Thanks,

BRs
Xiubo





Re: [PATCH 3/3] cgroup: implement cgroup.subtree_populated for the default hierarchy

2014-04-15 Thread Kay Sievers
On Tue, Apr 15, 2014 at 7:48 PM, Li Zefan  wrote:
> On 2014/4/15 5:44, Tejun Heo wrote:
>> cgroup users often need a way to determine when a cgroup's
>> subhierarchy becomes empty so that it can be cleaned up.  cgroup
>> currently provides release_agent for it; unfortunately, this mechanism
>> is riddled with issues.
>>
>> * It delivers events by forking and execing a userland binary
>>   specified as the release_agent.  This is a long deprecated method of
>>   notification delivery.  It's extremely heavy, slow and cumbersome to
>>   integrate with larger infrastructure.
>>
>> * There is single monitoring point at the root.  There's no way to
>>   delegate management of subtree.
>>
>> * The event isn't recursive.  It triggers when a cgroup doesn't have
>>   any tasks or child cgroups.  Events for internal nodes trigger only
>>   after all children are removed.  This again makes it impossible to
>>   delegate management of subtree.
>>
>> * Events are filtered from the kernel side.  "notify_on_release" file
>>   is used to subscribe to or suppress release event.  This is
>>   unnecessarily complicated and probably done this way because event
>>   delivery itself was expensive.
>>
>> This patch implements interface file "cgroup.subtree_populated" which
>> can be used to monitor whether the cgroup's subhierarchy has tasks in
>> it or not.  Its value is 0 if there is no task in the cgroup and its
>> descendants; otherwise it is 1, and a kernfs_notify() notification is
>> triggered when the value changes, which can be monitored through poll
>> and [di]notify.
>>
>
> For the old notification mechanism, the path of the cgroup that becomes
> empty will be passed to the user specified release agent. Like this:
>
> # cat /sbin/cpuset_release_agent
> #!/bin/sh
> rmdir /dev/cpuset/$1
>
> How do we achieve this using inotify?
>
> - monitor all the cgroups, or
> - monitor all the leaf cgroups, and walk cgrp->parent to delete all
>   empty cgroups, or
> - monitor the root cgroup only, and traverse the whole hierarchy to find
>   empty cgroups when it gets an fs event.
>
> None of them seems scalable.

The manager would add all cgroups as watches to one inotify file
descriptor; it should not be a problem to do that.

Kay


Re: [PATCH 3/4] x86/insn: Extract more information about instructions

2014-04-15 Thread H. Peter Anvin
On 04/15/2014 08:10 AM, Sasha Levin wrote:
> 
> Mnemonics don't have 1:1 relationship with opcodes. So, for example,
> if kmemcheck needs to check (and it does) whether a given instruction
> is an "ADD", it would need to compare it to 9 different opcodes.
> 

Excuse me, but on what planet does it make sense to ask whether a
particular instruction is a "MOV", for example?  The trend in x86
opcodes has varied over the years and at some points it seems to have
been trendy to have very general mnemonics (consider MOV CR, MOV DR) and
at some points quite the opposite (hence MOVD, MOVQ, MOVDQA, MOVDQU,
MOVAPS, MOVUPS, MOVAPD, MOVUPD, VMOVxxx).

So it is not at all clear that this makes any kind of sense whatsoever,
and is more likely just going to be abused.

-hpa




[PATCH 2/2 v2] x86, MCE: Clean get_cpu_xxx with this_cpu_xxx

2014-04-15 Thread Chen, Gong
This is a cleanup patch suggested by Peter. Use the new
this_cpu_xxx operations to improve operation speed.

v2 -> v1: Separate cleanup from bug fix.

Signed-off-by: Chen, Gong 
Suggested-by: H. Peter Anvin 
---
 arch/x86/kernel/cpu/mcheck/mce.c | 36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68317c8..5284189 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -402,7 +402,7 @@ static u64 mce_rdmsrl(u32 msr)
 
if (offset < 0)
return 0;
-   return *(u64 *)((char *)&__get_cpu_var(injectm) + offset);
+   return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset);
}
 
if (rdmsrl_safe(msr, &v)) {
@@ -424,7 +424,7 @@ static void mce_wrmsrl(u32 msr, u64 v)
int offset = msr_to_offset(msr);
 
if (offset >= 0)
-   *(u64 *)((char *)&__get_cpu_var(injectm) + offset) = v;
+   *(u64 *)((char *)this_cpu_ptr(&injectm) + offset) = v;
return;
}
wrmsrl(msr, v);
@@ -480,7 +480,7 @@ static DEFINE_PER_CPU(struct mce_ring, mce_ring);
 /* Runs with CPU affinity in workqueue */
 static int mce_ring_empty(void)
 {
-   struct mce_ring *r = &__get_cpu_var(mce_ring);
+   struct mce_ring *r = this_cpu_ptr(&mce_ring);
 
return r->start == r->end;
 }
@@ -492,7 +492,7 @@ static int mce_ring_get(unsigned long *pfn)
 
*pfn = 0;
get_cpu();
-   r = &__get_cpu_var(mce_ring);
+   r = this_cpu_ptr(&mce_ring);
if (r->start == r->end)
goto out;
*pfn = r->ring[r->start];
@@ -506,7 +506,7 @@ out:
 /* Always runs in MCE context with preempt off */
 static int mce_ring_add(unsigned long pfn)
 {
-   struct mce_ring *r = &__get_cpu_var(mce_ring);
+   struct mce_ring *r = this_cpu_ptr(&mce_ring);
unsigned next;
 
next = (r->end + 1) % MCE_RING_SIZE;
@@ -528,7 +528,7 @@ int mce_available(struct cpuinfo_x86 *c)
 static void mce_schedule_work(void)
 {
if (!mce_ring_empty())
-   schedule_work(&__get_cpu_var(mce_work));
+   schedule_work(this_cpu_ptr(&mce_work));
 }
 
 DEFINE_PER_CPU(struct irq_work, mce_irq_work);
@@ -553,7 +553,7 @@ static void mce_report_event(struct pt_regs *regs)
return;
}
 
-   irq_work_queue(&__get_cpu_var(mce_irq_work));
+   irq_work_queue(this_cpu_ptr(&mce_irq_work));
 }
 
 /*
@@ -1050,7 +1050,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
 
mce_gather_info(, regs);
 
-   final = &__get_cpu_var(mces_seen);
+   final = this_cpu_ptr(&mces_seen);
*final = m;
 
memset(valid_banks, 0, sizeof(valid_banks));
@@ -1282,16 +1282,14 @@ static unsigned long mce_adjust_timer_default(unsigned long interval)
 static unsigned long (*mce_adjust_timer)(unsigned long interval) =
mce_adjust_timer_default;
 
-static int cmc_error_seen(void)
+static inline int cmc_error_seen(void)
 {
-   unsigned long *v = &__get_cpu_var(mce_polled_error);
-
-   return test_and_clear_bit(0, v);
+   return this_cpu_xchg(mce_polled_error, 0);
 }
 
 static void mce_timer_fn(unsigned long data)
 {
-   struct timer_list *t = &__get_cpu_var(mce_timer);
+   struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned long iv;
int notify;
 
@@ -1299,7 +1297,7 @@ static void mce_timer_fn(unsigned long data)
 
if (mce_available(__this_cpu_ptr(&cpu_info))) {
machine_check_poll(MCP_TIMESTAMP,
-   &__get_cpu_var(mce_poll_banks));
+   this_cpu_ptr(&mce_poll_banks));
mce_intel_cmci_poll();
}
 
@@ -1329,7 +1327,7 @@ static void mce_timer_fn(unsigned long data)
  */
 void mce_timer_kick(unsigned long interval)
 {
-   struct timer_list *t = &__get_cpu_var(mce_timer);
+   struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned long when = jiffies + interval;
unsigned long iv = __this_cpu_read(mce_next_interval);
 
@@ -1665,7 +1663,7 @@ static void mce_start_timer(unsigned int cpu, struct timer_list *t)
 
 static void __mcheck_cpu_init_timer(void)
 {
-   struct timer_list *t = &__get_cpu_var(mce_timer);
+   struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned int cpu = smp_processor_id();
 
setup_timer(t, mce_timer_fn, cpu);
@@ -1708,8 +1706,8 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
__mcheck_cpu_init_generic();
__mcheck_cpu_init_vendor(c);
__mcheck_cpu_init_timer();
-   INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
-   init_irq_work(&__get_cpu_var(mce_irq_work), &mce_irq_work_cb);
+   INIT_WORK(this_cpu_ptr(&mce_work), mce_process_work);
+   init_irq_work(this_cpu_ptr(&mce_irq_work), &mce_irq_work_cb);
 }
 
 /*
@@ -1961,7 +1959,7 @@ static struct miscdevice mce_chrdev_device = {

[PATCH 1/2 v2] x86, MCE: Fix a bug in CMCI handler

2014-04-15 Thread Chen, Gong
This bug was introduced by me in commit 27f6c573e0: I forgot
to call put_cpu_var() after get_cpu_var(). Fix it
by using this_cpu_write() instead of get_cpu_var().

v2 -> v1: Separate cleanup from bug fix.

Signed-off-by: Chen, Gong 
Suggested-by: H. Peter Anvin 
---
 arch/x86/kernel/cpu/mcheck/mce.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 23f..68317c8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -598,7 +598,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
struct mce m;
int i;
-   unsigned long *v;
 
this_cpu_inc(mce_poll_count);
 
@@ -618,8 +617,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
if (!(m.status & MCI_STATUS_VAL))
continue;
 
-   v = &__get_cpu_var(mce_polled_error);
-   set_bit(0, v);
+   this_cpu_write(mce_polled_error, 1);
/*
 * Uncorrected or signalled events are handled by the exception
 * handler when it is enabled, so don't process those here.
-- 
1.9.0


