date:20121009

Re: [GIT PULL] Disintegrate UAPI for can

2012-10-09 Thread Marc Kleine-Budde

On 10/09/2012 07:55 PM, Oliver Hartkopp wrote:
> Hello Marc,
> 
> can you please pull these changes, so that they can go via the CAN tree to 
> Dave?

Sure, will do, when I have a proper internet connection again. This will
be this evening. :)

Marc
-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




Re: [PATCH 1/4] module: add syscall to load module from fd

2012-10-09 Thread Michael Kerrisk (man-pages)

[resending because my mobile device decided it
wanted to send HTML, which of course bounced.]

On Oct 10, 2012 12:09 AM, "H. Peter Anvin"  wrote:
>
> On 10/10/2012 06:03 AM, Michael Kerrisk (man-pages) wrote:
> > Good point. A "whole hog" openat()-style interface is worth thinking about too.
>
> *Although* you could argue that you can always simply open the module
> file first, and that finit_module() is really what we should have had in
> the first place.  Then you don't need the flags since those would come
> from openat().

But in that case, I'd still stand by my original point: it may be
desirable to have a flags argument to allow future modifications to
the behavior of finit_module() (as opposed to the behavior of the file
open).
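hpa's suggestion is that userspace opens the module file itself and hands the descriptor to the kernel, and Michael's point is that a flags argument would still govern finit_module() itself rather than the open. A minimal userspace sketch of that flow, assuming a finit_module(int fd, const char *params, int flags) system call reachable through syscall(SYS_finit_module, ...) (the number only exists in headers that carry the syscall; load_module_file() is a hypothetical helper):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Hypothetical helper: open the module image first, so openat()-style
 * decisions (dirfd, O_CLOEXEC, ...) are made here, then pass the
 * descriptor to the kernel. 'flags' is reserved for changing the
 * behaviour of finit_module() itself, independent of the file open.
 * Returns 0 on success, -1 with errno set on failure.
 */
int load_module_file(int dirfd, const char *path, const char *params, int flags)
{
    int fd = openat(dirfd, path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

#ifdef SYS_finit_module
    long ret = syscall(SYS_finit_module, fd, params, flags);
#else
    long ret = -1;
    errno = ENOSYS;   /* headers predate the syscall */
#endif

    int saved = errno;   /* close() must not clobber the syscall's errno */
    close(fd);
    errno = saved;
    return ret == 0 ? 0 : -1;
}
```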
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

RE: [PATCH 02/16] f2fs: add on-disk layout

2012-10-09 Thread Jaegeuk Kim

[snip]

> > +/*
> > + * For superblock
> > + */
> > +struct f2fs_super_block {
> > +   __le32 magic;   /* Magic Number */
> > +   __le16 major_ver;   /* Major Version */
> > +   __le16 minor_ver;   /* Minor Version */
> > +   __le32 log_sectorsize;  /* log2 (Sector size in bytes) */
> > +   __le32 log_sectors_per_block;   /* log2 (Number of sectors per block */
> > +   __le32 log_blocksize;   /* log2 (Block size in bytes) */
> 
> Why store log_blocksize on disk when it can be calculated from
> log_sectorsize and log_sectors_per_block?  It may be handy to keep this
> in the in-memory superblock but keeping it on-disk means you need a
> consistency check and error code path when loading the superblock.
> 

Yes, I added this for the superblock sanity check.
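Stefan's point is that the field is redundant: a block is 2^log_sectors_per_block sectors, so log_blocksize must equal log_sectorsize + log_sectors_per_block. A sketch of the consistency check the stored field implies, assuming the fields were already converted from little-endian to host order (struct sb_sizes and check_sb_sizes() are hypothetical, not f2fs code):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical host-endian copy of the three size fields. */
struct sb_sizes {
    uint32_t log_sectorsize;          /* log2(sector size in bytes) */
    uint32_t log_sectors_per_block;   /* log2(sectors per block) */
    uint32_t log_blocksize;           /* log2(block size in bytes) */
};

/*
 * block = sector * sectors_per_block, so in log2 form:
 * log_blocksize == log_sectorsize + log_sectors_per_block.
 * Returns 0 if consistent, -EINVAL otherwise.
 */
int check_sb_sizes(const struct sb_sizes *sb)
{
    /* Reject shifts that cannot describe a sane size (also avoids overflow). */
    if (sb->log_sectorsize > 31 || sb->log_sectors_per_block > 31)
        return -EINVAL;
    if (sb->log_sectorsize + sb->log_sectors_per_block != sb->log_blocksize)
        return -EINVAL;
    return 0;
}
```

For example, 512-byte sectors (9) with 8 sectors per block (3) must give 4096-byte blocks (12).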

> > +struct f2fs_inode {
> > +   __le16 i_mode;  /* File mode */
> > +   __le16 i_reserved;  /* Reserved */
> > +   __le32 i_uid;   /* User ID */
> > +   __le32 i_gid;   /* Group ID */
> > +   __le32 i_links; /* Links count */
> > +   __le64 i_size;  /* File size in bytes */
> > +   __le64 i_blocks;/* File size in bytes */
> 
> File size in blocks
> 
> > +struct direct_node {
> > +   __le32 addr[ADDRS_PER_BLOCK];   /* aray of data block address */
> 
> s/aray/array/
> 
> > +} __packed;
> > +
> > +struct indirect_node {
> > +   __le32 nid[NIDS_PER_BLOCK]; /* aray of data block address */
> 
> s/aray/array/

I'll change them.
Thank you.
> 
> Stefan


Re: [PATCH] No need to call irq_domain_legacy_revmap() for twice

2012-10-09 Thread Mike

Any comments?
Thanks
On Mon, 2012-09-24 at 17:37 +0800, Mike Qiu wrote:
> Function irq_create_mapping() calls irq_find_mapping(). The latter
> function already checks whether the given IRQ domain maps the hw IRQ
> to a virtual IRQ in legacy mode and, if so, returns the legacy IRQ
> number by calling irq_domain_legacy_revmap(). There is no need to
> call irq_domain_legacy_revmap() again to do the same check in
> irq_create_mapping().
> 
> The patch removes the duplicate call.
> 
> Signed-off-by: Mike Qiu 
> ---
>  kernel/irq/irqdomain.c |7 +--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index 49a7772..286d672 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -547,9 +547,12 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
>   return virq;
>   }
> 
> - /* Get a virtual interrupt number */
> + /*
> +  * For IRQ domain with type of IRQ_DOMAIN_MAP_LEGACY, we needn't
> +  * create the IRQ mapping for non-existing one, so just return 0.
> +  */
>   if (domain->revmap_type == IRQ_DOMAIN_MAP_LEGACY)
> - return irq_domain_legacy_revmap(domain, hwirq);
> + return 0;
> 
>   /* Allocate a virtual interrupt number */
>   hint = hwirq % nr_irqs;
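The argument is that by the time irq_create_mapping() reaches the legacy branch, irq_find_mapping() has already done the legacy reverse lookup and failed, so repeating it can only fail again. A toy user-space model of that control flow (struct toy_domain and the toy_* helpers are simplified stand-ins, not the irqdomain API):

```c
#include <stdbool.h>

/* Simplified stand-in for an IRQ domain with a legacy (linear offset) map. */
struct toy_domain {
    bool legacy;
    unsigned int first_hwirq;   /* first hw IRQ covered by the legacy map */
    unsigned int first_virq;    /* virtual IRQ that first_hwirq maps to */
    unsigned int size;          /* number of IRQs in the legacy range */
};

/* Models irq_find_mapping(): for legacy domains this is the legacy revmap. */
unsigned int toy_find_mapping(const struct toy_domain *d, unsigned int hwirq)
{
    if (d->legacy && hwirq >= d->first_hwirq &&
        hwirq - d->first_hwirq < d->size)
        return d->first_virq + (hwirq - d->first_hwirq);
    return 0;   /* 0 means "no mapping" */
}

/* Models the patched irq_create_mapping(). */
unsigned int toy_create_mapping(const struct toy_domain *d, unsigned int hwirq)
{
    unsigned int virq = toy_find_mapping(d, hwirq);
    if (virq)
        return virq;   /* existing mapping, legacy or not */

    /*
     * A second legacy lookup could only repeat the failed one above,
     * and legacy domains never grow new mappings, so return 0.
     */
    if (d->legacy)
        return 0;

    /* Non-legacy allocation (hint = hwirq % nr_irqs, ...) is not modelled. */
    return 0;
}
```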



[PATCH V2] task_work: avoid unneeded cmpxchg() in task_work_run()

2012-10-09 Thread Lai Jiangshan

On 10/09/2012 07:04 PM, Peter Zijlstra wrote:
> On Mon, 2012-10-08 at 14:38 +0200, Oleg Nesterov wrote:
>> But the code looks more complex, and the only advantage is that
>> non-exiting task does xchg() instead of cmpxchg(). Not sure this is
>> worth the trouble; in this case task_work_run() will likely run
>> the callbacks (the caller checks ->task_works != NULL), so I do not
>> think this can add any noticeable speedup.
> 
> Yeah, I agree, the patch doesn't seem worth the trouble. It makes tricky
> code unreadable at best.
> 

For better readability, we also need to move the work_exited handling
out of task_work_run().

Thanks,
Lai

Subject: task_work: avoid unneeded cmpxchg() in task_work_run()

We only need cmpxchg() when the task is exiting.
xchg() is enough in the other cases, as in the original code in ac3d0da8.

So we use xchg() in task_work_run() and move the exit logic
out of task_work_run() into exit_task_work().

Signed-off-by: Lai Jiangshan 
---

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index ca5a1cf..1e686a5 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -15,10 +15,6 @@ init_task_work(struct callback_head *twork, task_work_func_t func)
 int task_work_add(struct task_struct *task, struct callback_head *twork, bool);
 struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t);
 void task_work_run(void);
-
-static inline void exit_task_work(struct task_struct *task)
-{
-   task_work_run();
-}
+void exit_task_work(struct task_struct *task);
 
 #endif /* _LINUX_TASK_WORK_H */
diff --git a/kernel/task_work.c b/kernel/task_work.c
index 65bd3c9..87ef3b7 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -52,16 +52,7 @@ void task_work_run(void)
struct callback_head *work, *head, *next;
 
for (;;) {
-   /*
-* work->func() can do task_work_add(), do not set
-* work_exited unless the list is empty.
-*/
-   do {
-   work = ACCESS_ONCE(task->task_works);
-   head = !work && (task->flags & PF_EXITING) ?
-   &work_exited : NULL;
-   } while (cmpxchg(&task->task_works, work, head) != work);
-
+   work = xchg(&task->task_works, NULL);
if (!work)
break;
/*
@@ -90,3 +81,17 @@ void task_work_run(void)
} while (work);
}
 }
+
+void exit_task_work(struct task_struct *task)
+{
+   for (;;) {
+   /*
+* work->func() can do task_work_add(), do not set
+* work_exited unless the list is empty.
+*/
+   if (unlikely(task->task_works))
+   task_work_run();
+   if (cmpxchg(&task->task_works, NULL, &work_exited) == NULL)
+   break;
+   }
+}
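The reasoning in the commit message can be modelled with C11 atomics: the runner grabs the whole list with one exchange, and only the exit path needs compare-and-swap, to install the sentinel on an empty list. This is a user-space sketch, not the kernel code; struct task, struct work, work_add(), work_run() and work_exit() are hypothetical stand-ins for task_struct, callback_head, task_work_add(), task_work_run() and exit_task_work().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct work {
    struct work *next;
    void (*func)(struct work *);
};

/* Sentinel meaning "task has exited, refuse new work" (models work_exited). */
static struct work work_exited;

struct task {
    _Atomic(struct work *) works;
};

/* Models task_work_add(): push with CAS unless the sentinel is installed. */
bool work_add(struct task *t, struct work *w)
{
    struct work *head = atomic_load(&t->works);
    do {
        if (head == &work_exited)
            return false;
        w->next = head;
    } while (!atomic_compare_exchange_weak(&t->works, &head, w));
    return true;
}

/* Models the patched task_work_run(): a plain exchange suffices.
 * Like the kernel, it must not be called once the sentinel is set. */
void work_run(struct task *t)
{
    struct work *list;
    while ((list = atomic_exchange(&t->works, NULL)) != NULL) {
        /* The list is LIFO; reverse it so callbacks run in add order. */
        struct work *prev = NULL;
        while (list) {
            struct work *next = list->next;
            list->next = prev;
            prev = list;
            list = next;
        }
        while (prev) {
            struct work *next = prev->next;
            prev->func(prev);   /* may call work_add() again */
            prev = next;
        }
    }
}

/* Models exit_task_work(): set the sentinel only when the list is empty. */
void work_exit(struct task *t)
{
    for (;;) {
        work_run(t);
        struct work *expected = NULL;
        if (atomic_compare_exchange_strong(&t->works, &expected, &work_exited))
            break;
    }
}

/* Example callback: counts invocations. */
static int runs;
static void count_run(struct work *w) { (void)w; runs++; }
```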

[PATCH] Fix use-after-free of q->root_blkg and q->root_rl.blkg

2012-10-09 Thread Jun'ichi Nomura

I got a system stall after the following warning with 3.6:

> WARNING: at /work/build/linux/block/blk-cgroup.h:250 blk_put_rl+0x4d/0x95()
> Modules linked in: bridge stp llc sunrpc acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
> Pid: 0, comm: swapper/0 Not tainted 3.6.0 #1
> Call Trace:
>[] warn_slowpath_common+0x85/0x9d
>  [] warn_slowpath_null+0x1a/0x1c
>  [] blk_put_rl+0x4d/0x95
>  [] __blk_put_request+0xc3/0xcb
>  [] blk_finish_request+0x232/0x23f
>  [] ? blk_end_bidi_request+0x34/0x5d
>  [] blk_end_bidi_request+0x42/0x5d
>  [] blk_end_request+0x10/0x12
>  [] scsi_io_completion+0x207/0x4d5
>  [] scsi_finish_command+0xfa/0x103
>  [] scsi_softirq_done+0xff/0x108
>  [] blk_done_softirq+0x8d/0xa1
>  [] ? generic_smp_call_function_single_interrupt+0x9f/0xd7
>  [] __do_softirq+0x102/0x213
>  [] ? lock_release_holdtime+0xb6/0xbb
>  [] ? raise_softirq_irqoff+0x9/0x3d
>  [] call_softirq+0x1c/0x30
>  [] do_softirq+0x4b/0xa3
>  [] irq_exit+0x53/0xd5
>  [] smp_call_function_single_interrupt+0x34/0x36
>  [] call_function_single_interrupt+0x6f/0x80
>[] ? mwait_idle+0x94/0xcd
>  [] ? mwait_idle+0x8b/0xcd
>  [] cpu_idle+0xbb/0x114
>  [] rest_init+0xc1/0xc8
>  [] ? csum_partial_copy_generic+0x16c/0x16c
>  [] start_kernel+0x3d4/0x3e1
>  [] ? kernel_init+0x1f7/0x1f7
>  [] x86_64_start_reservations+0xb8/0xbd
>  [] x86_64_start_kernel+0x101/0x110

blk_put_rl() does this:
 if (rl->blkg && rl->blkg->blkcg != &blkcg_root)
 blkg_put(rl->blkg);
but if rl is q->root_rl, rl->blkg might be a bogus pointer
because blkcg_deactivate_policy() does not clear q->root_rl.blkg
after blkg_destroy_all().

The attached patch works for me.

Signed-off-by: Jun'ichi Nomura 

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index f3b44a6..5015764 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -285,6 +285,9 @@ static void blkg_destroy_all(struct request_queue *q)
blkg_destroy(blkg);
spin_unlock(&blkcg->lock);
}
+
+   q->root_blkg = NULL;
+   q->root_rl.blkg = NULL;
 }
 
 static void blkg_rcu_free(struct rcu_head *rcu_head)
@@ -333,7 +336,7 @@ struct request_list *__blk_queue_next_rl(struct request_list *rl,
 
/* walk to the next list_head, skip root blkcg */
ent = ent->next;
-   if (ent == &q->root_blkg->q_node)
+   if (q->root_blkg && ent == &q->root_blkg->q_node)
ent = ent->next;
if (ent == &q->blkg_list)
return NULL;

Re: [GIT] Sparc

2012-10-09 Thread Al Viro

On Mon, Oct 08, 2012 at 04:18:06PM -0400, David Miller wrote:
> 
> There is an attempt to fix a bad interaction between syscall tracing
> and force_successful_syscall() from Al Viro, but it needs to be redone
> as it introduced regressions and thus had to be reverted for now.
> 
> Al is working on an updated version.

See below.  Just in case: Linus, please DO NOT APPLY unless it goes through
sparc tree - this is modulo approval by davem.

sparc64: fix ptrace interaction with force_successful_syscall_return()

we want syscall_trace_leave() called on exit from any syscall;
skipping its call in case we'd done force_successful_syscall_return()
is broken...

Signed-off-by: Al Viro 

diff --git a/arch/sparc/kernel/syscalls.S b/arch/sparc/kernel/syscalls.S
index 1d7e274..7f5f65d 100644
--- a/arch/sparc/kernel/syscalls.S
+++ b/arch/sparc/kernel/syscalls.S
@@ -212,24 +212,20 @@ linux_sparc_syscall:
 3: stx %o0, [%sp + PTREGS_OFF + PT_V9_I0]
 ret_sys_call:
ldx [%sp + PTREGS_OFF + PT_V9_TSTATE], %g3
-   ldx [%sp + PTREGS_OFF + PT_V9_TNPC], %l1 ! pc = npc
sra %o0, 0, %o0
mov %ulo(TSTATE_XCARRY | TSTATE_ICARRY), %g2
sllx %g2, 32, %g2
 
-   /* Check if force_successful_syscall_return()
-* was invoked.
-*/
-   ldub [%g6 + TI_SYS_NOERROR], %l2
-   brnz,a,pn %l2, 80f
-    stb %g0, [%g6 + TI_SYS_NOERROR]
-
cmp %o0, -ERESTART_RESTARTBLOCK
bgeu,pn %xcc, 1f
-    andcc %l0, (_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT|_TIF_SYSCALL_TRACEPOINT), %l6
-80:
+    andcc %l0, (_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT|_TIF_SYSCALL_TRACEPOINT), %g0
+   ldx [%sp + PTREGS_OFF + PT_V9_TNPC], %l1 ! pc = npc
+
+2:
+   stb %g0, [%g6 + TI_SYS_NOERROR]
/* System call success, clear Carry condition code. */
andn %g3, %g2, %g3
+3:
stx %g3, [%sp + PTREGS_OFF + PT_V9_TSTATE]  
bne,pn  %icc, linux_syscall_trace2
 add %l1, 0x4, %l2   ! npc = npc+4
@@ -238,20 +234,20 @@ ret_sys_call:
 stx %l2, [%sp + PTREGS_OFF + PT_V9_TNPC]
 
 1:
+   /* Check if force_successful_syscall_return()
+* was invoked.
+*/
+   ldub [%g6 + TI_SYS_NOERROR], %l2
+   brnz,pn %l2, 2b
+    ldx [%sp + PTREGS_OFF + PT_V9_TNPC], %l1 ! pc = npc
/* System call failure, set Carry condition code.
 * Also, get abs(errno) to return to the process.
 */
-   andcc %l0, (_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT|_TIF_SYSCALL_TRACEPOINT), %l6
sub %g0, %o0, %o0
-   or  %g3, %g2, %g3
stx %o0, [%sp + PTREGS_OFF + PT_V9_I0]
-   stx %g3, [%sp + PTREGS_OFF + PT_V9_TSTATE]
-   bne,pn  %icc, linux_syscall_trace2
-    add %l1, 0x4, %l2   ! npc = npc+4
-   stx %l1, [%sp + PTREGS_OFF + PT_V9_TPC]
+   ba,pt   %xcc, 3b
+    or %g3, %g2, %g3
 
-   b,pt %xcc, rtrap
-    stx %l2, [%sp + PTREGS_OFF + PT_V9_TNPC]
 linux_syscall_trace2:
call syscall_trace_leave
 add %sp, PTREGS_OFF, %o0
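Stripped of delay slots, the patched ret_sys_call path reduces to the decision below. This is a C model written for illustration (sparc64_ret_sys_call() and struct sys_ret are hypothetical names): noerror stands for TI_SYS_NOERROR, trace for the _TIF_* bits tested by andcc, and ret is assumed to be already sign-extended as the sra does. Both outcomes now reach the same tracing check at label 3, whereas per the commit message the old forced-success branch could skip syscall_trace_leave().

```c
#include <stdbool.h>
#include <stdint.h>

#define ERESTART_RESTARTBLOCK 516   /* kernel-internal errno value */

struct sys_ret {
    int64_t o0;      /* value handed back to userspace */
    bool    carry;   /* TSTATE carry bits: set means "error" */
    bool    traced;  /* whether syscall_trace_leave() would run */
};

struct sys_ret sparc64_ret_sys_call(int64_t ret, bool *noerror, bool trace)
{
    struct sys_ret r = { .o0 = ret };

    /* cmp/bgeu: an unsigned compare catches -1 .. -ERESTART_RESTARTBLOCK. */
    if ((uint64_t)ret >= (uint64_t)-ERESTART_RESTARTBLOCK && !*noerror) {
        r.o0 = -ret;          /* return abs(errno) ... */
        r.carry = true;       /* ... with carry set */
    } else {
        *noerror = false;     /* label 2: clear the forced-success flag */
        r.carry = false;
    }

    /* Label 3: tracing is decided the same way on every path. */
    r.traced = trace;
    return r;
}
```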

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-10-09 Thread Stanislav Kinsbursky


On 10.10.2012 06:00, Eric W. Biederman wrote:

ebied...@xmission.com (Eric W. Biederman) writes:


"J. Bruce Fields"  writes:


On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote:

"Myklebust, Trond"  writes:


On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote:

Cc'ing Eric since I seem to recall he suggested doing it this way?

Yes.  On second look setting fs->root won't work. We need to change fs.
The problem is that by default all kernel threads share fs so changing
fs->root will have non-local consequences.

Oh, huh.  And we can't "unshare" it somehow?

I don't fully understand how nfs uses kernel threads and work queues.
My general understanding is work queues reuse their kernel threads
between different users.  So it is mostly a don't pollute your
environment thing.  If there was a dedicated kernel thread for each
environment this would be trivial.

What I was suggesting here is changing task->fs instead of
task->fs.root.  That should just require task_lock().


Or, previously you suggested:

- introduce sockaddr_fd that can be applied to AF_UNIX sockets,
  and teach unix_bind and unix_connect how to deal with a second
  type of sockaddr, AT_FD:
  struct sockaddr_fd { short fd_family; short pad; int fd; }

- introduce sockaddr_unix_at that takes a directory file
  descriptor as well as a unix path, and teach unix_bind and
  unix_connect to deal with a second sockaddr type, AF_UNIX_AT:
  struct sockaddr_unix_at { short family; short pad; int dfd; char path[102]; }

Any other options?

I am still half hoping we don't have to change the userspace API/ABI.
There is sanity checking on that path that no one seems interested in to
solve this problem.
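For comparison, the userspace way to get "connect relative to a directory" today is to change the working directory temporarily, which is exactly the kind of shared, process-wide state the thread wants kernel threads to avoid touching. A minimal sketch; unix_connect_at() is a hypothetical helper, not a proposed kernel API:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Connect to an AF_UNIX socket whose path is resolved relative to dirfd,
 * by switching the cwd to dirfd around connect(). Note this mutates
 * process-wide state (the cwd), the non-local effect discussed above.
 * Returns a connected socket, or -1 with errno set.
 */
int unix_connect_at(int dirfd, const char *relpath)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    if (strlen(relpath) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(addr.sun_path, relpath);

    int oldcwd = open(".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (oldcwd < 0)
        return -1;

    int err = 0;
    int sock = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (sock < 0)
        err = errno;
    else if (fchdir(dirfd) < 0 ||
             connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;

    /* Always try to restore the previous working directory. */
    if (fchdir(oldcwd) < 0 && !err)
        err = errno;
    close(oldcwd);

    if (err) {
        if (sock >= 0)
            close(sock);
        errno = err;
        return -1;
    }
    return sock;
}
```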

There is a good option if we are up to userspace ABI extensions.

Implement open(2) on unix domain sockets.  Where open(2) would
essentially equal connect(2) on unix domain sockets.

With an open(2) implementation we could use file_open_path and the
implementation should be pretty straightforward and maintainable.
So implementing open(2) looks like a good alternative implementation
route.


This requires patching the VFS layer as well. I'm not saying the idea is
bad, but it requires much more time to implement and test. This patch
addresses a problem that already exists, and it would be great to fix it
as soon as possible. So implementing open() for unix sockets is probably
the next, more generic step.

Thanks.


Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



linux-next: Tree for Oct 10

2012-10-09 Thread Stephen Rothwell

Hi all,

Do not add stuff destined for v3.8 to your linux-next included branches
until after v3.7-rc1 is released.

Changes since 20121009:

New tree: lzo-update

Conflicts are migrating as trees are merged by Linus.

Linus' tree gained a build failure for which I reverted a commit.

The block tree gained a build failure for which I added a merge fix patch.

The tip tree gained conflicts against Linus' tree.

The kmemleak tree gained a conflict against Linus' tree.

The akpm tree lost lots of commits that turned up in Linus' tree.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 204 trees (counting Linus' and 26 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to
adding more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (547b1e8 Fix staging driver use of VM_RESERVED)
Merging fixes/master (9023a40 Merge tag 'mmc-fixes-for-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc)
[master 92fbc9f] Revert "memory-hotplug: suppress "Trying to free nonexistent resource " warning"
Merging kbuild-current/rc-fixes (b1e0d8b kbuild: Fix gcc -x syntax)
Merging arm-current/fixes (846a136 ARM: vfp: fix saving d16-d31 vfp registers on v6+ kernels)
Merging m68k-current/for-linus (f82735d m68k: Use PTR_RET rather than if(IS_ERR(...)) + PTR_ERR)
Merging powerpc-merge/merge (636802e powerpc: Don't use __put_user() in patch_instruction)
Merging sparc/master (9836d34 Merge tag 'disintegrate-sparc-20121009' of git://git.infradead.org/users/dhowells/linux-headers)
Merging net/master (5175a5e RDS: fix rds-ping spinlock recursion)
Merging sound-current/for-linus (5d037f9 Merge tag 'asoc-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus)
Merging pci-current/for-linus (0ff9514 PCI: Don't print anything while decoding is disabled)
Merging wireless/master (c3e7724 mac80211: use ieee80211_free_txskb to fix possible skb leaks)
Merging driver-core.current/driver-core-linus (5698bd7 Linux 3.6-rc6)
Merging tty.current/tty-linus (b70936d tty: serial: sccnxp: Fix bug with unterminated platform_id list)
Merging usb.current/usb-linus (ecefbd9 Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm)
Merging staging.current/staging-linus (5698bd7 Linux 3.6-rc6)
Merging char-misc.current/char-misc-linus (fea7a08 Linux 3.6-rc3)
Merging input-current/for-linus (dde3ada Merge branch 'next' into for-linus)
Merging md-current/for-linus (80b4812 md/raid10: fix "enough" function for detecting if array is failed.)
Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix)
Merging crypto-current/master (c9f97a2 crypto: x86/glue_helper - fix storing of new IV in CBC encryption)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (244dc4e Merge git://git.infradead.org/users/dwmw2/random-2.6)
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions)
Merging irqdomain-current/irqdomain/merge (15e06bf irqdomain: Fix debugfs formatting)
Merging devicetree-current/devicetree/merge (4e8383b of: release node fix for of_parse_phandle_with_args)
Merging spi-current/spi/merge (d1c185b of/spi: Fix S

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-10-09 Thread Stanislav Kinsbursky


On 10.10.2012 02:47, Eric W. Biederman wrote:

"J. Bruce Fields"  writes:


On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote:

"Myklebust, Trond"  writes:


On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote:

Cc'ing Eric since I seem to recall he suggested doing it this way?

Yes.  On second look setting fs->root won't work. We need to change fs.
The problem is that by default all kernel threads share fs so changing
fs->root will have non-local consequences.

Oh, huh.  And we can't "unshare" it somehow?

I don't fully understand how nfs uses kernel threads and work queues.
My general understanding is work queues reuse their kernel threads
between different users.  So it is mostly a don't pollute your
environment thing.  If there was a dedicated kernel thread for each
environment this would be trivial.


One kernel thread per environment is exactly what we are trying to avoid
if possible, because it really looks like overkill.



What I was suggesting here is changing task->fs instead of
task->fs.root.  That should just require task_lock().


Or, previously you suggested:

- introduce sockaddr_fd that can be applied to AF_UNIX sockets,
  and teach unix_bind and unix_connect how to deal with a second
  type of sockaddr, AT_FD:
  struct sockaddr_fd { short fd_family; short pad; int fd; }

- introduce sockaddr_unix_at that takes a directory file
  descriptor as well as a unix path, and teach unix_bind and
  unix_connect to deal with a second sockaddr type, AF_UNIX_AT:
  struct sockaddr_unix_at { short family; short pad; int dfd; char path[102]; }

Any other options?

I am still half hoping we don't have to change the userspace API/ABI.
There is sanity checking on that path that no one seems interested in to
solve this problem.

This is a weird issue as we are dealing with both the vfs and the
networking stack.  Fundamentally we need to change task->fs.root or
we need to capitalize on the openat functionality in the kernel, so
that we don't create mountains of special cases to support this.

I think swapping task->fs instead of task->fs.root is effectively the
same complexity.


Thanks for the idea, and for mentioning that kernel threads share the fs
struct. I missed that.





I very much believe we want if at all possible to perform a local
modification.

Changing fs isn't all that different from what devtmpfs is doing.

Sorry, I don't know much about devtmpfs, are you suggesting it as a
model?  What exactly should we look at?

Roughly all I meant was that devtmpfsd is a kernel thread that runs
with an unshared fs struct.  Although I admit devtmpfsd is for all
practical purposes a userspace daemon that just happens to run in kernel
space.

Eric
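The devtmpfsd-style approach of running with a private fs struct has a direct userspace analogue: a thread that calls unshare(CLONE_FS) gets its own root, cwd and umask, so a chdir() there does not leak to the other threads. A sketch under that assumption (struct fs_job and the helpers are hypothetical; some container sandboxes reject unshare() with EPERM):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <unistd.h>

struct fs_job {
    const char *dir;   /* directory the worker switches to */
    int status;        /* 0 on success, otherwise an errno value */
};

static void *private_fs_worker(void *arg)
{
    struct fs_job *job = arg;
    char cwd[4096];

    /* Detach this thread's fs_struct before touching the cwd. */
    if (unshare(CLONE_FS) != 0 || chdir(job->dir) != 0) {
        job->status = errno;
        return NULL;
    }
    /* The new cwd is visible in this thread only. EIO is an arbitrary
     * code for "cwd is not what we asked for". */
    if (getcwd(cwd, sizeof(cwd)) == NULL)
        job->status = errno;
    else
        job->status = strcmp(cwd, job->dir) == 0 ? 0 : EIO;
    return NULL;
}

/* Runs one worker to completion and returns its status. */
int run_private_fs_job(const char *dir)
{
    pthread_t th;
    struct fs_job job = { .dir = dir, .status = 0 };
    int err = pthread_create(&th, NULL, private_fs_worker, &job);
    if (err != 0)
        return err;
    pthread_join(th, NULL);
    return job.status;
}
```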




Re: [PATCH] rtc: kconfig: fix RTC_INTF defaults connected to RTC_CLASS

2012-10-09 Thread Venu Byravarasu


On Wednesday 10 October 2012 05:39 AM, Kevin Hilman wrote:

From: Kevin Hilman 

commit 6b8029fab64164b5895d58d23229b75c82e3a6fc (rtc: kconfig: remove
unnecessary dependencies) removed various 'depends on RTC_CLASS'
dependencies but also removed a few 'default RTC_CLASS' statements,
which actually changed default behavior.

This resulted in the various RTC interfaces (sysfs, proc, dev) all
being disabled by default, even when RTC_CLASS is enabled:

# CONFIG_RTC_INTF_SYSFS is not set
# CONFIG_RTC_INTF_PROC is not set
# CONFIG_RTC_INTF_DEV is not set

which is different from previous behavior (all of these were enabled).

To fix, add back the 'default RTC_CLASS' statements to each of the
RTC_INTF_* options.

Thanks for fixing this.

  config RTC_INTF_DEV
boolean "/dev/rtcN (character devices)"
+   default RTC_CLASS
help
  Say yes here if you want to use your RTCs using the /dev
  interfaces, which "udev" sets up as /dev/rtc0 through

Acked-by: Venu Byravarasu 

Re: linux-next: manual merge of the kmemleak tree with Linus' tree

2012-10-09 Thread Sergey Senozhatsky

On (10/10/12 14:06), Stephen Rothwell wrote:
> Hi Catalin,
> 
> Today's linux-next merge of the kmemleak tree got a conflict in
> mm/kmemleak.c between commit 85d3a316c714 ("kmemleak: use rbtree instead
> of prio tree") from Linus' tree and commit 48786770bf3b ("kmemleak: do
> not leak object after tree insertion error") from the kmemleak tree.
> 
> The kmemleak tree commit has been there since April, should it have
> progressed by now?  Its fix is also included in the above commit from
> Linus' tree.
> 
> I just used the version from Linus' tree and can carry the fix as
> necessary (no action is required).
> 

Oh, my bad! I just took a look at the current create_object(). I guess we can
drop my patch.


-ss

Re: linux-next: manual merge of the kmemleak tree with Linus' tree

2012-10-09 Thread Sergey Senozhatsky

On (10/10/12 14:06), Stephen Rothwell wrote:
> Hi Catalin,
> 
> Today's linux-next merge of the kmemleak tree got a conflict in
> mm/kmemleak.c between commit 85d3a316c714 ("kmemleak: use rbtree instead
> of prio tree") from Linus' tree and commit 48786770bf3b ("kmemleak: do
> not leak object after tree insertion error") from the kmemleak tree.
> 
> The kmemleak tree commit has been there since April, should it have
> progressed by now?  Its fix is also included in the above commit from
> Linus' tree.
> 
> I just used the version from Linus' tree and can carry the fix as
> necessary (no action is required).
> 

Hello,
I can rebase my patch (I thought it was already in Linus' tree).

-ss

[PATCH] extcon : callback function to read cable property

2012-10-09 Thread Jenny TC

For some cables a boolean variable is not enough to represent
the state and properties of the cable. For example, a charger cable can
have the states CONNECT, DISCONNECT, SUSPEND (host suspend for SDP cable),
RESUME (host wakeup), and UPDATE (to increase the charge
current after USB enumeration). The properties of the cable may also
vary with the state. For example, in the SUSPENDED state platforms can
support 0/100/500/950 (USB 3.0) mA depending on the HW. To initiate
charging, the consumer should be able to get the charger properties dynamically.

Signed-off-by: Jenny TC 
---
 include/linux/extcon.h |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/extcon.h b/include/linux/extcon.h
index 073fd49..2e61ee0 100644
--- a/include/linux/extcon.h
+++ b/include/linux/extcon.h
@@ -122,6 +122,7 @@ struct extcon_dev {
/* --- Optional callbacks to override class functions --- */
ssize_t (*print_name)(struct extcon_dev *edev, char *buf);
ssize_t (*print_state)(struct extcon_dev *edev, char *buf);
+   int (*get_cable_properties)(const char *cable_name, void *cable_props);
 
/* --- Internal data. Please do not set. --- */
struct device   *dev;
@@ -177,6 +178,19 @@ struct extcon_specific_cable_nb {
unsigned long previous_value;
 };
 
+enum extcon_chrgr_cbl_stat {
+   EXTCON_CHRGR_CABLE_CONNECTED,
+   EXTCON_CHRGR_CABLE_DISCONNECTED,
+   EXTCON_CHRGR_CABLE_SUSPENDED,
+   EXTCON_CHRGR_CABLE_RESUMED,
+   EXTCON_CHRGR_CABLE_UPDATED,
+};
+
+struct extcon_chrgr_cbl_props {
+   enum extcon_chrgr_cbl_stat cable_stat;
+   unsigned long mA;
+};
+
 #if IS_ENABLED(CONFIG_EXTCON)
 
 /*
-- 
1.7.9.5
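A sketch of how a provider might fill in the proposed get_cable_properties() callback, reusing the enum and struct from the patch. The "CHARGER_USB_SDP" name comes from the companion cable-name patch; sdp_get_cable_properties(), the driver state variables and the current chosen for each state are all hypothetical:

```c
#include <errno.h>
#include <string.h>

/* Copied from the patch above. */
enum extcon_chrgr_cbl_stat {
    EXTCON_CHRGR_CABLE_CONNECTED,
    EXTCON_CHRGR_CABLE_DISCONNECTED,
    EXTCON_CHRGR_CABLE_SUSPENDED,
    EXTCON_CHRGR_CABLE_RESUMED,
    EXTCON_CHRGR_CABLE_UPDATED,
};

struct extcon_chrgr_cbl_props {
    enum extcon_chrgr_cbl_stat cable_stat;
    unsigned long mA;
};

/* Hypothetical driver state for one USB SDP charger cable. */
static enum extcon_chrgr_cbl_stat sdp_state = EXTCON_CHRGR_CABLE_DISCONNECTED;
static unsigned long sdp_enumerated_mA = 500;   /* negotiated at enumeration */

/*
 * Hypothetical callback. The void * is assumed to point at a
 * struct extcon_chrgr_cbl_props for charger cables.
 */
int sdp_get_cable_properties(const char *cable_name, void *cable_props)
{
    struct extcon_chrgr_cbl_props *props = cable_props;

    if (strcmp(cable_name, "CHARGER_USB_SDP") != 0)
        return -EINVAL;

    props->cable_stat = sdp_state;
    switch (sdp_state) {
    case EXTCON_CHRGR_CABLE_CONNECTED:
        props->mA = 100;                 /* before enumeration */
        break;
    case EXTCON_CHRGR_CABLE_UPDATED:
    case EXTCON_CHRGR_CABLE_RESUMED:
        props->mA = sdp_enumerated_mA;
        break;
    default:                             /* disconnected or suspended */
        props->mA = 0;                   /* platform-specific choice */
        break;
    }
    return 0;
}
```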


Re: [RFC][CFT][CFReview] execve and kernel_thread unification work

2012-10-09 Thread Al Viro

On Tue, Oct 09, 2012 at 03:48:16PM -0400, Chris Metcalf wrote:
> On 10/1/2012 5:38 PM, Al Viro wrote:
> > There's an interesting ongoing project around kernel_thread() and
> > friends, including execve() variants.  I really need help from architecture
> > maintainers on that one; I'd been able to handle (and test) quite a few
> > architectures on my own [alpha, arm, m68k, powerpc, s390, sparc, x86, um]
> > plus two more untested [frv, mn10300].  c6x patches had been supplied by
> > Mark Salter; everything else remains to be done.  Right now it's at
> > minus 1.2KLoC, quite a bit of that removed from asm glue and other black
> > magic.
> 
> I'll take a look at this for arch/tile this week.

Thanks.  FWIW, changes since the last posting:
* untested conversion for cris added [me]
* conversion for mips added [Ralf has done execve side, I've added
kernel_thread() one]
* conversion for parisc added [aka "Al has generated broken patches,
jejb has tested and fixed that crap"]
* arm64 conversion added [Catalin Marinas]

IOW, right now we have
* alpha, arm, arm64, c6x, m68k, mips, parisc, powerpc, s390, sparc,
um, x86 - apparently over and done with (IOW, all old architectures except for
itanic are converted by now)
* avr32, blackfin, hexagon, h8300, ia64, m32r, microblaze, openrisc,
score, sh, tile, unicore32, xtensa - need to be done
* cris, frv, mn10300 - need to be tested (and very likely will need
fixing)

[PATCH] extcon : add charger supported as per spec

2012-10-09 Thread Jenny TC

Add support for cable names as per USB Battery Charging Spec 1.2.
Also add a cable name for the AC adapter. This standardises the
cable names.

Signed-off-by: Jenny TC 
---
 drivers/extcon/extcon-class.c |5 +
 include/linux/extcon.h|5 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/extcon/extcon-class.c b/drivers/extcon/extcon-class.c
index 3d8e825..7188daf 100644
--- a/drivers/extcon/extcon-class.c
+++ b/drivers/extcon/extcon-class.c
@@ -42,6 +42,11 @@
  * names that are actually used in your extcon device.
  */
 const char *extcon_cable_name[] = {
+   [EXTCON_SDP]= "CHARGER_USB_SDP",
+   [EXTCON_DCP]= "CHARGER_USB_DCP",
+   [EXTCON_CDP]= "CHARGER_USB_CDP",
+   [EXTCON_ACA]= "CHARGER_USB_ACA",
+   [EXTCON_AC] = "CHARGER_AC",
[EXTCON_USB]= "USB",
[EXTCON_USB_HOST]   = "USB-Host",
[EXTCON_TA] = "TA",
diff --git a/include/linux/extcon.h b/include/linux/extcon.h
index 9be8286..073fd49 100644
--- a/include/linux/extcon.h
+++ b/include/linux/extcon.h
@@ -53,6 +53,11 @@ enum extcon_cable_name {
EXTCON_FAST_CHARGER,
EXTCON_SLOW_CHARGER,
EXTCON_CHARGE_DOWNSTREAM, /* Charging an external device */
+   EXTCON_SDP,
+   EXTCON_DCP,
+   EXTCON_CDP,
+   EXTCON_ACA,
+   EXTCON_AC,
EXTCON_HDMI,
EXTCON_MHL,
EXTCON_DVI,
-- 
1.7.9.5


[PATCH] extcon : register for cable interest by cable name

2012-10-09 Thread Jenny TC

There are some scenarios where a driver/framework needs to register
interest for a particular cable without specifying the extcon device
name. One such scenario is charger notifications. The platform will
have a charger cable, which may be bound to any extcon device. It's
not mandatory for the charger driver to know which extcon device
it should use. This patch enables support for registering
interest in a cable just by cable name, without specifying the
extcon device name.

Signed-off-by: Jenny TC 
---
 drivers/extcon/extcon-class.c |   52 +
 include/linux/extcon.h|3 +++
 2 files changed, 55 insertions(+)

diff --git a/drivers/extcon/extcon-class.c b/drivers/extcon/extcon-class.c
index 946a318..3d8e825 100644
--- a/drivers/extcon/extcon-class.c
+++ b/drivers/extcon/extcon-class.c
@@ -485,6 +485,58 @@ int extcon_register_interest(struct extcon_specific_cable_nb *obj,
 }
 
 /**
+ * register_interest_cable_byname() - Register a notifier for a state
+ * change of a specific cable, on whichever extcon device reports that
+ * cable.
+ * @obj:   an empty extcon_specific_cable_nb object to be returned.
+ * @cable_name:the target cable name.
+ * @nb:the notifier block to get notified.
+ *
+ * Provide an empty extcon_specific_cable_nb.
+ * register_interest_cable_byname() sets the struct for you.
+ *
+ * register_interest_cable_byname() is a helper function for those who want to get
+ * notification for a single specific cable's status change without knowing the
+ * extcon device name.
+ *
+ * Note: This will register the interest with the first extcon device which
+ * reports the status for the cable. If multiple extcon devices report the
+ * same cable name, this API will register interest with the first extcon device.
+ */
+
+struct extcon_dev *register_interest_cable_byname
+   (struct extcon_specific_cable_nb *extcon_dev,
+   const char *cable_name, struct notifier_block *nb)
+{
+   struct class_dev_iter iter;
+   struct device *dev;
+   struct extcon_dev *extd = NULL;
+
+   /* Identify the extcon device which supports the cable and register
+   * interest.
+   */
+   if (extcon_class == NULL)
+   return NULL;
+   class_dev_iter_init(&iter, extcon_class, NULL, NULL);
+   while ((dev = class_dev_iter_next(&iter))) {
+   extd = (struct extcon_dev *)dev_get_drvdata(dev);
+   /* check for cable support */
+   if (extcon_find_cable_index(extd, cable_name) < 0) {
+   extd = NULL;
+   continue;
+   }
+
+   if (extcon_register_interest(extcon_dev, extd->name,
+   cable_name, nb) < 0) {
+   extd = NULL;
+   continue;
+   }
+   }
+   class_dev_iter_exit(&iter);
+   return extd;
+}
+
+/**
  * extcon_unregister_interest() - Unregister the notifier registered by
  * extcon_register_interest().
  * @obj:   the extcon_specific_cable_nb object returned by
diff --git a/include/linux/extcon.h b/include/linux/extcon.h
index 7443a56..9be8286 100644
--- a/include/linux/extcon.h
+++ b/include/linux/extcon.h
@@ -222,6 +222,9 @@ extern int extcon_register_interest(struct extcon_specific_cable_nb *obj,
const char *extcon_name,
const char *cable_name,
struct notifier_block *nb);
+extern struct extcon_dev *register_interest_cable_byname
+   (struct extcon_specific_cable_nb *extcon_dev,
+   const char *cable_name, struct notifier_block *nb);
 extern int extcon_unregister_interest(struct extcon_specific_cable_nb *nb);
 
 /*
-- 
1.7.9.5


Re: dtc: import latest upstream dtc

2012-10-09 Thread Warner Losh


On Oct 9, 2012, at 6:04 PM, Scott Wood wrote:

> On 10/09/2012 06:20:53 PM, Mitch Bradley wrote:
>> On 10/9/2012 11:16 AM, Stephen Warren wrote:
>> > On 10/01/2012 12:39 PM, Jon Loeliger wrote:
>> >>>
>> >>> What more do you think needs discussion re: dtc+cpp?
>> >>
>> >> How not to abuse the ever-loving shit out of it? :-)
>> >
>> > Perhaps we can just handle this through the regular patch review
>> > process; I think it may be difficult to define and agree upon exactly
>> > what "abuse" means ahead of time, but it's probably going to be easy
>> > enough to recognize it when one sees it?
>> One of the ways it could get out of hand would be via "include
>> dependency hell".  People will be tempted to reuse existing .h files
>> containing pin definitions, which, if history is a guide, will end up
>> depending on all sorts of other .h files.
>> Another problem I often face with symbolic names is the difficulty of
>> figuring out what the numerical values really are (for debugging),
>> especially when .h files are in different subtrees from the files that
>> use the definitions, and when they use multiple macro levels and fancy
>> features like concatenation.  Sometimes I think it's clearer just to
>> write the number and use a comment to say what it is.
> 
> Both comments apply just as well to ordinary C code, and I don't think anyone 
> would seriously suggest just using comments instead for C code.

.h files include both structs and defines, which are fine for ordinary C code, 
but problematic in this context.

> Is there a way to ask CPP to evaluate a macro in the context of the input 
> file, rather than produce normal output?  If not, I guess you could make a 
> tool that creates a wrapper file that includes the main file and then 
> evaluates the symbol you want.

Not in the standard CPP, but perhaps you could scan the .dts file for all the 
values you need, and have it output the right values to use.

Warner

[PATCH] cpufreq:core: Fix printing of governor and driver name

2012-10-09 Thread Viresh Kumar

Arrays for governor and driver names are of size CPUFREQ_NAME_LEN, i.e. 16:
15 bytes for the name and 1 for the trailing '\0'.

When cpufreq core prints these names (for sysfs), it includes '\n' or ' ' in
the fmt string and still passes the length as CPUFREQ_NAME_LEN. If the driver
or governor name uses all 15 bytes allocated to it, then the trailing '\n'
or ' ' will never be printed. And so commands like:

root@linaro-developer# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver

will print something like:

cpufreq_foodrvroot@linaro-developer#

Fix this by increasing print length by one character.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq.c | 6 +++---
 include/linux/cpufreq.h   | 2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 021973b..db6e337 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -445,7 +445,7 @@ static ssize_t show_scaling_governor(struct cpufreq_policy *policy, char *buf)
else if (policy->policy == CPUFREQ_POLICY_PERFORMANCE)
return sprintf(buf, "performance\n");
else if (policy->governor)
-   return scnprintf(buf, CPUFREQ_NAME_LEN, "%s\n",
+   return scnprintf(buf, CPUFREQ_NAME_PLEN, "%s\n",
policy->governor->name);
return -EINVAL;
 }
@@ -491,7 +491,7 @@ static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
  */
 static ssize_t show_scaling_driver(struct cpufreq_policy *policy, char *buf)
 {
-   return scnprintf(buf, CPUFREQ_NAME_LEN, "%s\n", cpufreq_driver->name);
+   return scnprintf(buf, CPUFREQ_NAME_PLEN, "%s\n", cpufreq_driver->name);
 }
 
 /**
@@ -512,7 +512,7 @@ static ssize_t show_scaling_available_governors(struct cpufreq_policy *policy,
if (i >= (ssize_t) ((PAGE_SIZE / sizeof(char))
- (CPUFREQ_NAME_LEN + 2)))
goto out;
-   i += scnprintf(&buf[i], CPUFREQ_NAME_LEN, "%s ", t->name);
+   i += scnprintf(&buf[i], CPUFREQ_NAME_PLEN, "%s ", t->name);
}
 out:
i += sprintf(&buf[i], "\n");
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index b60f6ba..fc4b785 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -22,6 +22,8 @@
 #include 
 
 #define CPUFREQ_NAME_LEN 16
+/* Print length for names. Extra 1 space for accommodating '\n' in prints */
+#define CPUFREQ_NAME_PLEN (CPUFREQ_NAME_LEN + 1)
 
 
 /*
-- 
1.7.12.rc2.18.g61b472e



[PATCH] usb:musb: Dequeue urbs on device unplug

2012-10-09 Thread Virupax Sadashivpetimath

Flush queued urbs on receiving device disconnect
interrupt. This is required for successful disconnect
and successive enumeration of the device.

In the failure case, khubd hangs waiting for the usb-storage thread
to complete, as seen in the trace below.

[ 1355.764526] SysRq : Show Blocked State
[ 1355.768341]   taskPC stack   pid father
[ 1355.773620] khubd   D c06a1fbc 0   503  2 0x
[ 1355.780151] [] (__schedule+0x3f0/0x8ec) from [] (schedule+0x58/0x70)
[ 1355.788330] [] (schedule+0x58/0x70) from [] (schedule_timeout+0x1d8/0x31c)
[ 1355.796997] [] (schedule_timeout+0x1d8/0x31c) from [] (wait_for_common+0xd8/0x180)
[ 1355.806396] [] (wait_for_common+0xd8/0x180) from [] (wait_for_completion+0x20/0x24)
[ 1355.815887] [] (wait_for_completion+0x20/0x24) from [] (kthread_stop+0x68/0x17c)
[ 1355.825103] [] (kthread_stop+0x68/0x17c) from [] (release_everything+0x30/0x8c)
[ 1355.834228] [] (release_everything+0x30/0x8c) from [] (usb_stor_disconnect+0x2c/0x30)
[ 1355.843902] [] (usb_stor_disconnect+0x2c/0x30) from [] (usb_unbind_interface+0x60/0x1e0)
[ 1355.853820] [] (usb_unbind_interface+0x60/0x1e0) from [] (__device_release_driver+0x80/0xd0)
[ 1355.864074] [] (__device_release_driver+0x80/0xd0) from [] (device_release_driver+0x2c/0x38)
[ 1355.874359] [] (device_release_driver+0x2c/0x38) from [] (bus_remove_device+0xbc/0x10c)
[ 1355.884155] [] (bus_remove_device+0xbc/0x10c) from [] (device_del+0x108/0x17c)
[ 1355.893188] [] (device_del+0x108/0x17c) from [] (usb_disable_device+0xbc/0x200)
[ 1355.902313] [] (usb_disable_device+0xbc/0x200) from [] (usb_disconnect+0xb8/0x194)
[ 1355.911682] [] (usb_disconnect+0xb8/0x194) from [] (hub_thread+0x45c/0x14b0)
[ 1355.920562] [] (hub_thread+0x45c/0x14b0) from [] (kthread+0x98/0xa0)
[ 1355.928710] [] (kthread+0x98/0xa0) from [] (kernel_thread_exit+0x0/0x8)
[ 1356.014373] usb-storage D c06a1fbc 0  2379  2 0x
[ 1356.020843] [] (__schedule+0x3f0/0x8ec) from [] (schedule+0x58/0x70)
[ 1356.029022] [] (schedule+0x58/0x70) from [] (schedule_timeout+0x1d8/0x31c)
[ 1356.037719] [] (schedule_timeout+0x1d8/0x31c) from [] (wait_for_common+0xd8/0x180)
[ 1356.047088] [] (wait_for_common+0xd8/0x180) from [] (wait_for_completion+0x20/0x24)
[ 1356.056549] [] (wait_for_completion+0x20/0x24) from [] (usb_sg_wait+0x108/0x194)
[ 1356.065795] [] (usb_sg_wait+0x108/0x194) from [] (usb_stor_bulk_transfer_sglist+0x9c/0xf4)
[ 1356.075866] [] (usb_stor_bulk_transfer_sglist+0x9c/0xf4) from [] (usb_stor_bulk_srb+0x38/0x50)
[ 1356.086303] [] (usb_stor_bulk_srb+0x38/0x50) from [] (usb_stor_Bulk_transport+0x114/0x2d0)
[ 1356.096374] [] (usb_stor_Bulk_transport+0x114/0x2d0) from [] (usb_stor_invoke_transport+0x34/0x3f4)
[ 1356.107238] [] (usb_stor_invoke_transport+0x34/0x3f4) from [] (usb_stor_transparent_scsi_command+0x18/0x1c)
[ 1356.118804] [] (usb_stor_transparent_scsi_command+0x18/0x1c) from [] (usb_stor_control_thread+0x190/0x28c)
[ 1356.130279] [] (usb_stor_control_thread+0x190/0x28c) from [] (kthread+0x98/0xa0)
[ 1356.139465] [] (kthread+0x98/0xa0) from [] (kernel_thread_exit+0x0/0x8)

Signed-off-by: Virupax Sadashivpetimath 

Acked-by: Linus Walleij 
---
 drivers/usb/musb/musb_core.c |   37 +
 drivers/usb/musb/musb_host.c |3 +++
 2 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index bb56a0e..fc6e990 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -414,6 +414,41 @@ static void musb_otg_timer_func(unsigned long data)
spin_unlock_irqrestore(&musb->lock, flags);
 }
 
+void musb_handle_disconnect(struct musb *musb)
+{
+   int epnum, i;
+   struct urb  *urb;
+   struct musb_hw_ep   *hw_ep;
+   struct musb_qh  *qh;
+   struct usb_hcd *hcd = musb_to_hcd(musb);
+
+   for (epnum = 0; epnum < musb->config->num_eps;
+   epnum++) {
+   hw_ep = musb->endpoints + epnum;
+   for (i = 0; i < 2; i++) {
+   if (hw_ep->in_qh == hw_ep->out_qh)
+   i++;
+   qh = (i == 0) ? hw_ep->in_qh : hw_ep->out_qh;
+
+   if (qh && qh->hep) {
+   qh->is_ready = 0;
+   while ((urb = next_urb(qh))) {
+   usb_hcd_unlink_urb_from_ep(hcd, urb);
+
+   spin_unlock(&musb->lock);
+   usb_hcd_giveback_urb(hcd, urb, 0);
+   spin_lock(&musb->lock);
+   }
+
+   qh->hep->hcpriv = NULL;
+   list_del(&qh->ring);
+   kfree(qh);
+   hw_ep->in_qh = hw_ep->out_qh = NULL;
+   }
+   }
+   }
+}
+
 /*
  *

Re: [GIT PULL] Disintegrate UAPI for powerpc [ver #2]

2012-10-09 Thread Benjamin Herrenschmidt

On Tue, 2012-10-09 at 10:15 +0100, David Howells wrote:
> Can you merge the following branch into the powerpc tree please.
> 
> This is to complete part of the UAPI disintegration for which the preparatory
> patches were pulled recently.

This is now in the powerpc "merge" branch:

git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge

Linus, feel free to pull it.

Note that upstream Linus (and thus this branch) fails to build powerpc
configs with memory hotplug enabled due to Andrew merging a completely
untested and un-next'ed patch :-) (tsk tsk tsk !)

I assume you (Andrew) already have a fix on its way to Linus so I haven't
put it in this branch.

Cheers,
Ben.


> Now that the fixups and the asm-generic chunk have been merged, I've
> regenerated the patches to get rid of those dependencies and to take account
> of any changes made so far in the merge window.  If you have already pulled the
> older version of the branch aimed at you, then please feel free to ignore this
> request.
> 
> The following changes since commit 9e2d8656f5e8aa214e66b462680cf86b210b74a8:
> 
>   Merge branch 'akpm' (Andrew's patch-bomb) (2012-10-09 16:23:15 +0900)
> 
> are available in the git repository at:
> 
> 
>   git://git.infradead.org/users/dhowells/linux-headers.git tags/disintegrate-powerpc-20121009
> 
> for you to fetch changes up to c3617f72036c909e1f6086b5b9e364e0ef90a6da:
> 
>   UAPI: (Scripted) Disintegrate arch/powerpc/include/asm (2012-10-09 09:47:26 +0100)
> 
> 
> UAPI Disintegration 2012-10-09
> 
> 
> David Howells (1):
>   UAPI: (Scripted) Disintegrate arch/powerpc/include/asm
> 
>  arch/powerpc/include/asm/Kbuild   |  35 --
>  arch/powerpc/include/asm/bootx.h  | 123 +--
>  arch/powerpc/include/asm/cputable.h   |  35 +-
>  arch/powerpc/include/asm/elf.h| 311 +-
>  arch/powerpc/include/asm/kvm_para.h   |  70 +---
>  arch/powerpc/include/asm/mman.h   |  27 +-
>  arch/powerpc/include/asm/nvram.h  |  55 +---
>  arch/powerpc/include/asm/ptrace.h | 242 +-
>  arch/powerpc/include/asm/signal.h | 143 +---
>  arch/powerpc/include/asm/spu_info.h   |  29 +-
>  arch/powerpc/include/asm/swab.h   |  15 +-
>  arch/powerpc/include/asm/termios.h|  69 +---
>  arch/powerpc/include/asm/types.h  |  30 +-
>  arch/powerpc/include/asm/unistd.h | 374 +
>  arch/powerpc/include/uapi/asm/Kbuild  |  41 +++
>  arch/powerpc/include/{ => uapi}/asm/auxvec.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/bitsperlong.h |   0
>  arch/powerpc/include/uapi/asm/bootx.h | 132 
>  arch/powerpc/include/{ => uapi}/asm/byteorder.h   |   0
>  arch/powerpc/include/uapi/asm/cputable.h  |  36 ++
>  arch/powerpc/include/uapi/asm/elf.h   | 307 +
>  arch/powerpc/include/{ => uapi}/asm/errno.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/fcntl.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/ioctl.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/ioctls.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/ipcbuf.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/kvm.h |   0
>  arch/powerpc/include/uapi/asm/kvm_para.h  |  90 +
>  arch/powerpc/include/{ => uapi}/asm/linkage.h |   0
>  arch/powerpc/include/uapi/asm/mman.h  |  31 ++
>  arch/powerpc/include/{ => uapi}/asm/msgbuf.h  |   0
>  arch/powerpc/include/uapi/asm/nvram.h |  62 
>  arch/powerpc/include/{ => uapi}/asm/param.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/poll.h|   0
>  arch/powerpc/include/{ => uapi}/asm/posix_types.h |   0
>  arch/powerpc/include/{ => uapi}/asm/ps3fb.h   |   0
>  arch/powerpc/include/uapi/asm/ptrace.h| 259 +++
>  arch/powerpc/include/{ => uapi}/asm/resource.h|   0
>  arch/powerpc/include/{ => uapi}/asm/seccomp.h |   0
>  arch/powerpc/include/{ => uapi}/asm/sembuf.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/setup.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/shmbuf.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/sigcontext.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/siginfo.h |   0
>  arch/powerpc/include/uapi/asm/signal.h| 145 +
>  arch/powerpc/include/{ => uapi}/asm/socket.h  |   0
>  arch/powerpc/inclu

Re: [PATCH RFC] function probe_roms accessing improper addresses

2012-10-09 Thread Randy Wright

> Date: Thu, 4 Oct 2012 20:22:56 +0100
> From: Matthew Garrett 
> To: rwri...@hp.com
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: [PATCH RFC] function probe_roms accessing improper addresses
>  on UEFI systems
> Message-ID: <20121004192256.ga6...@srcf.ucam.org>
> References: <201210032353.q93nrkni018...@filesys1.fc.hp.com>
> 
> On Wed, Oct 03, 2012 at 05:53:46PM -0600, Randy Wright wrote:
> 
> > The following proposed patch takes advantage of the fact that on EFI
> > systems, the memory map provides a better description of the physical
> > space than on pre-EFI legacy systems. If the efi_enabled state variable
> > indicates the kernel is running on an UEFI system, the patch will use
> > information from the UEFI memory map so as not to access addresses that
> > should be avoided according to the UEFI specification.
> 
> This turns out to be awkward. Some (mostly older) EFI platforms still 
> only provide the video ROM through the 0xc window, and that's 
> sometimes needed even if the platform isn't using int10 for anything 
> (for instance, some Intel graphics machines only provide the VBT through 
> the video ROM and don't provide that via the PCI BAR). And, of course, 
> they have an EFI memory map that just shows a hole there.
> 
> So we can't distinguish between the two cases easily. The only thing I 
> can think of would be to push that policy out to the graphics drivers 
> and have them trigger a scan only if they can't get the required 
> information from any other source. I suspect that this patch as is would 
> break graphics on a reasonable number of EFI platforms.
> -- 
> Matthew Garrett | mj...@srcf.ucam.org

Hi Matthew, 

I appreciate your description of the problems with my approach, as well
as the reply from h...@zytor.com (H. Peter Anvin) in response to my mention
of this patch in another thread.  His reply contained a couple of
suggestions that initially appeal to me more than an approach requiring
a change to a set of video drivers, the size of which I don't quite
know how to scope.  In that other thread, hpa said:

| One option would be to quirk it; obviously there is some piece of
| hardware which does cause this #MC and hopefully we could use that to
| detect that specific regions should be excluded; another option would be
| to trap the #MC during ROM probing.

I definitely see the appeal of trapping the #MC and triggering a
solution from that, if it can be made to work. I've spent some time
evaluating that, and I see these issues:

1. I don't believe the kernel's MC handler is initialized early enough
to handle a machine check occurring as early as probe_roms.  probe_roms
is called very early in boot.  I see this as the call stack:
  start_kernel->setup_arch->probe_roms
The machine check initialization for x86, by contrast, appears to come later:
  start_kernel->check_bugs->identify_boot_cpu->identify_cpu->mcheck_cpu_init
At present, I do not want to tackle such a major reordering of
initialization as would be required to change this.

2. For all platforms, is the setup of chipset and cpu address decoding
robust enough to allow the OS to handle the resulting machine check and
recover?  I've worked with some platforms in the past where this was not
always the case, the result being that for some unpopulated address
ranges, the resulting machine check would not be recoverable.

Because of the above difficulties with the MC handler idea, I have
focused my thoughts more on the quirk idea that hpa mentioned. I've been
investigating some existing examples in the kernel, and trying to
understand some of the issues involved with designing a new one.

1. Can the interface be chosen to present the needed interface to all callers? 
I recognize this as a challenge if a single interface is to be used both
in early boot (e.g. probe_roms) and later runtime (e.g.
devmem_is_allowed).  Something like a new member added to the
x86_platform_ops structure?

2. How can it automatically be activated for platforms that need it? I
see quite a few quirks selected by CPU ID, but that's probably not
appropriate here.  Again, activating it by hitting the #MC in probe_roms
would be cool, but I see it as involving a major reordering of
initialization code.  So I'm left thinking about something keying off
the DMI platform strings, which fortunately are initialized via
  start_kernel->setup_arch->dmi_scan_machine
which is convenient, as it's just before probe_roms is called.

3. Can it be activated on demand for testing on other platforms? A
kernel boot command line parameter could be added, for example. How does
the community feel about adding more of those?

What are other design issues I'm overlooking?

Are there existing quirks that strike you as particularly
good models for this case?  

--
Randy Wright    Hewlett-Packard Company
Phone: (970) 898-0998   Mail: rwri...@hp.com

Re: [PATCH v2 0/2] Reset PCIe devices to address DMA problem on kdump with iommu

2012-10-09 Thread Takao Indoh


(2012/10/10 1:05), Don Dutile wrote:
> On 10/09/2012 05:03 AM, Takao Indoh wrote:
>> (2012/10/03 22:23), Don Dutile wrote:
>>> On 10/02/2012 03:49 AM, Takao Indoh wrote:
>>>> These patches reset PCIe devices at boot time to address DMA problem on
>>>> kdump with iommu. When "reset_devices" is specified, a hot reset is
>>>> triggered on each PCIe root port and downstream port to reset its
>>>> downstream endpoint.
>>>>
>>>> Background:
>>>> A kdump problem about DMA has been discussed for a long time. That is,
>>>> when a kernel is switched to the kdump kernel, DMA initiated by the first
>>>> kernel affects the second kernel. Recently this problem has surfaced when
>>>> iommu is used for PCI passthrough on KVM guest. In the case of the machine
>>>> I use, when intel_iommu=on is specified, DMAR error is detected in kdump
>>>> kernel and PCI SERR is also detected. Finally kdump fails because some
>>>> devices do not work correctly.
>>>>
>>>> The root cause is that ongoing DMA from the first kernel causes DMAR faults
>>>> because the DMAR page table is initialized while the kdump kernel is booting
>>>> up. Therefore, to address this problem, DMA needs to be stopped before
>>>> DMAR is initialized at kdump kernel boot time. With these patches, PCIe
>>>> devices are reset by hot reset and their DMA is stopped when reset_devices
>>>> is specified. One problem of this solution is that the monitor blacks
>>>> out when the VGA controller is reset. So this patch does not reset the port
>>>> whose child endpoint is a VGA device.
>>>>
>>>> v2:
>>>> Reset devices in setup_arch() because reset needs to be done before
>>>> interrupt remapping is initialized.
>>>>
>>>> v1:
>>>> https://lkml.org/lkml/2012/8/3/160
>>>>
>>>> Thanks,
>>>> Takao Indoh
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>>>> the body of a message to majord...@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>> Maybe you've tried the following, and I missed a thread on it,
>>> but instead of a somewhat-large reset hammer, did anyone try
>>> just reading each endpoint-only device's CMD register, flipping the
>>> MasterEnable bit off, and writing it back?  That would stop all DMA (it
>>> should stop all MSI writes as well, since they are just another DMA), and
>>> then restart the system?
>>> May also have to do PCI INT Disable as well... and note, that's a PCI 2.3
>>> optional feature, so devices using INT signalling vs MSI
>>> are just borked on IOMMU/intr-remapping systems... which I would expect
>>> are few. Then again, if this is foolishly done, then reset legacy PCI
>>> busses as the fallback.
>>
>> Just clearing the bus master bit and INTx disable bit in setup_arch() did
>> not solve this problem. I still got DMAR errors on devices (for example,
>> igb and megaraid_sas).
>>
>> Clearing bus master in setup_arch() and resetting devices in fixup_final
>> like the v1 patch is better; no DMAR error was detected. But on a certain
>> machine the kdump kernel hung up when resetting devices. It seems to be a
>> problem specific to the platform.
>>
>> And resetting devices in setup_arch() like the v2 patch solves all problems
>> I found so far.
>>
>> Thanks,
>> Takao Indoh
>
> This summary should be in the patch set, so others know/learn
> what was attempted, what failed, and how you reached this working conclusion.

OK, I'll post a new patch with this information.

Thanks,
Takao Indoh


ODEBUG: free active (active state 0) object type: work_struct hint: flush_to_ldisc+0x0/0x1a0

2012-10-09 Thread Dave Jones

Just hit this..

WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0()
ODEBUG: free active (active state 0) object type: work_struct hint: flush_to_ldisc+0x0/0x1a0
Modules linked in: fuse ipt_ULOG nfnetlink tun binfmt_misc nfc caif_socket caif 
phonet can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet 
rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 lockd sunrpc 
bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state 
nf_conntrack ip6table_filter ip6_tables kvm_intel usb_debug kvm crc32c_intel 
ghash_clmulni_intel microcode pcspkr i2c_i801 e1000e uinput i915 video 
i2c_algo_bit drm_kms_helper drm i2c_core
Pid: 23707, comm: kworker/3:0 Not tainted 3.6.0+ #26
Call Trace:
 [] warn_slowpath_common+0x7f/0xc0
 [] warn_slowpath_fmt+0x46/0x50
 [] debug_print_object+0x8c/0xb0
 [] ? tty_insert_flip_string_fixed_flag+0x100/0x100
 [] debug_check_no_obj_freed+0x119/0x210
 [] ? free_tty_struct+0x46/0x50
 [] kfree+0xe6/0x340
 [] free_tty_struct+0x46/0x50
 [] release_one_tty+0xa7/0xc0
 [] process_one_work+0x207/0x770
 [] ? process_one_work+0x197/0x770
 [] ? free_tty_struct+0x50/0x50
 [] worker_thread+0x15e/0x440
 [] ? rescuer_thread+0x240/0x240
 [] kthread+0xed/0x100
 [] ? sub_preempt_count+0x79/0xd0
 [] kernel_thread_helper+0x4/0x10
 [] ? finish_task_switch+0x7c/0x120
 [] ? _raw_spin_unlock_irq+0x4b/0x80
 [] ? retint_restore_args+0x13/0x13
 [] ? kthread_create_on_node+0x160/0x160
 [] ? gs_change+0x13/0x13
---[ end trace fbabf37a8756c1c9 ]---


Re: [PATCH v3 10/10] mm: kill vma flag VM_RESERVED and mm->reserved_vm counter

2012-10-09 Thread Alex Williamson

On Tue, 2012-10-09 at 17:00 -0600, Alex Williamson wrote:
> On Tue, 2012-10-09 at 08:21 -0600, Alex Williamson wrote:
> > On Tue, 2012-10-09 at 21:12 +0900, Linus Torvalds wrote:
> > > On Tue, Oct 9, 2012 at 7:02 PM, Eric Dumazet  wrote:
> > > >
> > > > It seems drivers/vfio/pci/vfio_pci.c uses VM_RESERVED
> > > 
> > > Yeah, I just pushed out what I think is the right (trivial) fix.
> > 
> > Thank you, looks correct to me as well.
> 
> Well, that might still be correct, but it's actually b3b9c293 (mm, x86,
> pat: rework linear pfn-mmap tracking) that breaks vfio.  As soon as I
> add that commit our use of mmap'd device areas stops working, both
> mapping them through the iommu and through kvm.  kvm hits the BUG_ON
> from !kvm_is_mmio_pfn in virt/kvm/kvm_main.c:hva_to_pfn.  Thanks,

It looks like vfio was for some reason relying on the cow special case
in remap_pfn_range() to set vma->vm_pgoff.  Since ours is not an
is_cow_mapping, vm_pgoff is no longer updated in that function after
b3b9c293.  I'll add a vfio patch to fix this.  Thanks,

Alex


Re: [Patch v2 2/7] Regulator: DA9055 Regulator driver

2012-10-09 Thread Mark Brown

On Tue, Oct 09, 2012 at 04:30:16PM +0530, Ashish Jangam wrote:
> On Tue, 2012-10-09 at 15:37 +0530, Mark Brown wrote:
> > On Mon, Oct 08, 2012 at 07:00:39PM +0530, Ashish Jangam wrote:

> > > + /* Set the GPIO I/P pin for controlling the regulator state. */
> > > + ret = devm_gpio_request_one(config->dev, gpio, GPIOF_DIR_IN,
> > > + name);
> > > + if (ret < 0)
> > > + goto err;

> > We never actually appear to use this GPIO anywhere...  why are we
> > requesting it?

> DA9055 regulator changes its state by detecting the rising/falling edge at
> GPI DA9055. Therefore we just need to set the DA9055 GPIO direction to input.

Right, so there are several problems here.  One is that this code is very
obscure - you're really doing pinmux here rather than actually using it
as a GPIO; a better comment would clarify this.  The other is that
you're requiring a defined gpio_base in platform data; it would be
better to allow this to be dynamically assigned as the driver can find
its own GPIOs easily enough.

> >   Also, why is the ability to read the regulator state via
> > a GPIO associated with controlling it via a GPIO, it's unusual for these
> > things to be tied together.

> There is no connection between the states; the labels just differentiate
> between the two strings.
> If required I can change the string.

It's nothing to do with the name; it's that, due to the above, it looks
like the input GPIO is used by the CPU to read the state of the device.

Re: [PATCH v4] PM / devfreq: add PM-QoS support

2012-10-09 Thread mark gross

On Mon, Oct 08, 2012 at 07:59:13PM +0900, MyungJoo Ham wrote:
> Even if the performance of a device is controlled properly with devfreq,
> sometimes, we still need to get PM-QoS inputs in order to meet the
> required performance.
> 
> In our testbed of Exynos4412, which has various on-chip DMA devices, the
> memory interface and system bus are controlled according to their
> utilization by devfreq. However, in some multimedia applications
> including video-playing with MFC (multi-function codec) H/W and
> photo/video-capturing with FIMC H/W, we have observed issues due to
> insufficient DMA throughput or latency.
> 
> In such applications, there are deadlines: less than 16.6ms with 60Hz.
> With shorter polling intervals (5 ~ 15ms), the frequencies fluctuate
> within a frame and we get missing frames and distorted pictures.
> With longer polling intervals (20 ~ 100ms), the response time is not
> sufficient and we get distorted or broken images. In other words,
> regardless of polling interval, we get poor results with hard-deadline
> H/Ws. They, in fact, have a preset requirement on DMA throughput.
> 
> Thus, we need PM-QoS capabilities in devfreq. (Note that for general
> user applications, devfreq for bus/memory works fine. They are generally
> not sensitive to tens of ms of delay in raising performance.)
> 
> In order to express how to handle QoS requests in devfreq devices,
> the devfreq device drivers only need to express the mappings of
> QoS value and frequency pairs with QoS class along with
> devfreq_add_device() call.
> 
> Tested on Exynos4412 machines with memory/bus frequencies and multimedia
> H/W blocks. (camera, video decoding, and video encoding)
> 
> Signed-off-by: MyungJoo Ham 
> Signed-off-by: Kyungmin Park 
> 
> ---
> Changes from V3
> - Corrected and added comments
> - Code Clean
> - Merged per-dev qos patch and global qos patch
> 
> Changed from V2-resend
> - Removed dependencies on global pm-qos class definitions
> - Revised data structure handling pm-qos (being ready for dev-pm-qos)
> 
> Changes from V2
> - Rebased
> 
> Changes from V1
> - Error handling at devfreq_add_device()
> - Handling pm_qos_max information
> - Style update
> ---
>  drivers/devfreq/devfreq.c |  137 
> -
>  include/linux/devfreq.h   |   41 +
>  2 files changed, 177 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 00e326c..d0cb33b 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "governor.h"
>  
>  static struct class *devfreq_class;
> @@ -136,8 +137,13 @@ int update_devfreq(struct devfreq *devfreq)
>* List from the highest proiority
>* max_freq (probably called by thermal when it's too hot)
>* min_freq
> +  * qos_min_freq
>*/
>  
> + if (devfreq->qos_min_freq && freq < devfreq->qos_min_freq) {
> + freq = devfreq->qos_min_freq;
> + flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
> + }
>   if (devfreq->min_freq && freq < devfreq->min_freq) {
>   freq = devfreq->min_freq;
>   flags &= ~DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use GLB */
> @@ -183,6 +189,56 @@ static int devfreq_notifier_call(struct notifier_block *nb, unsigned long type,
>  }
>  
>  /**
> + * devfreq_qos_notifier_call() - Handle QoS requests (updates)
> + *
> + * @nb   the notifier_block (supposed to be devfreq->qos_nb)
> + * @value the QoS value
> + * @devp not used
> + */
> +static int devfreq_qos_notifier_call(struct notifier_block *nb,
> +  unsigned long value, void *devp)
> +{
> + struct devfreq *devfreq = container_of(nb, struct devfreq, qos_nb);
> + int ret;
> + int i;
> + s32 default_value = PM_QOS_DEFAULT_VALUE;
> + struct devfreq_pm_qos_table *qos_list = devfreq->profile->qos_list;
> + bool qos_use_max = devfreq->profile->qos_use_max;
> +
> + if (!qos_list)
> + return NOTIFY_DONE;
> +
> + mutex_lock(&devfreq->lock);
> +
> + if (value == default_value) {
> + devfreq->qos_min_freq = 0;
> + goto update;
> + }
> +
> + for (i = 0; qos_list[i].freq; i++) {
> + /* QoS Met */
> + if (qos_use_max) {
> + if (qos_list[i].qos_value < value)
> + continue;
> + } else {
> + if (qos_list[i].qos_value > value)
> + continue;
> + }
> + devfreq->qos_min_freq = qos_list[i].freq;
> + goto update;
> + }
> +
> + /* Use the highest QoS freq */
> + devfreq->qos_min_freq = qos_list[i - 1].freq;
> +
> +update:
> + ret = update_devfreq(devfreq);
> + mutex_unlock(&devfreq->lock);
> +
> + return ret;
> +}
> +
> +/**
>   * _remove_devfreq()
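The table lookup in devfreq_qos_notifier_call() above can be modelled in user space. struct qos_entry and qos_min_freq() are hypothetical stand-ins for the patch's devfreq_pm_qos_table handling, the table values are made up, and the PM_QOS_DEFAULT_VALUE case (which simply clears qos_min_freq) is left out.

```c
/* Hypothetical stand-in for the patch's struct devfreq_pm_qos_table:
 * the table ends with an entry whose freq is 0. */
struct qos_entry {
	unsigned long freq;
	long qos_value;
};

/* Example table (made-up numbers) with increasing QoS values. */
static const struct qos_entry example_table[] = {
	{ 100000, 100 },
	{ 200000, 200 },
	{ 400000, 400 },
	{ 0, 0 },
};

/* Pick the minimum frequency for a QoS request the way the loop in
 * devfreq_qos_notifier_call() does: the first entry whose qos_value
 * meets the request wins; if none does, fall back to the last entry. */
static unsigned long qos_min_freq(const struct qos_entry *tbl, long value,
				  int use_max)
{
	int i;

	for (i = 0; tbl[i].freq; i++) {
		if (use_max ? tbl[i].qos_value >= value
			    : tbl[i].qos_value <= value)
			return tbl[i].freq;
	}

	/* Unlike the quoted patch, avoid reading tbl[-1] for an empty table. */
	return i ? tbl[i - 1].freq : 0;
}
```

With qos_use_max set, a request of 150 selects the 200000 entry, and a request larger than any table value falls back to the last (highest) frequency.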

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/10/2012 10:06 AM, Wen Congyang Wrote:
> At 10/10/2012 07:27 AM, David Rientjes Wrote:
>> On Tue, 9 Oct 2012, Peter Zijlstra wrote:
>>
>>> Well the code they were patching is in the wakeup path. As I think Tang
>>> said, we leave !runnable tasks on whatever cpu they ran on last, even if
>>> that cpu is offlined, we try and fix up state when we get a wakeup.
>>>
>>> On wakeup, it tries to find a cpu to run on and will try a cpu of the
>>> same node first.
>>>
>>> Now if that node's entirely gone away, it appears the cpu_to_node() map
>>> will not return a valid node number.
>>>
>>> I think that's a change in behaviour; it didn't use to do that afaik.
>>> Certainly this code hasn't changed in a while.
>>>
>>
>> If cpu_to_node() always returns a valid node id even if all cpus on the 
>> node are offline, then the cpumask_of_node() implementation, which the 
>> sched code is using, should either return an empty cpumask (if 
>> node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
>> behavior here occurred because 
>> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
>> return a valid node id and forces it to return -1 so a kzalloc_node(..., 
>> -1) falls back to allocating anywhere.
>>
>> But if you only need cpu_to_node() when waking up to find a runnable cpu 
>> for this NUMA information, then I think you can just change the 
>> kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
>> kzalloc_node(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
>>
>>  [ The changelog here is confusing because it's fixing a problem in 
>>linux-next without saying so. ]
>>
> 
> I don't agree with this approach, because it only fixes the code that causes
> this particular problem, and we can't say there are no other similar problems.
> That is why I clear the cpu-to-node mapping.
> 
> What about the following solution:
> 1. clear the cpu-to-node mapping when the node is offlined

There is no interface to online/offline a node. We online a node only
when its cpu/memory is onlined, and offline it when all cpu/memory in
this node is offlined (TODO).

So if we clear the cpu-to-node mapping when the node is offlined, we may
need to map cpu-to-node again when the cpu is onlined. But we don't know
the cpu's node.

Thanks
Wen Congyang

> 2. tang's patch is still necessary because we leave !runnable tasks on
>whatever cpu they ran on last. If the cpu's node is NUMA_NO_NODE, it means
>the entire node is offlined, and we must migrate the task to another
>node.
> 
> Thanks
> Wen Congyang
> --
> To unsubscribe from this list: send the line "unsubscribe linux-numa" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
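David's suggested node hint for kzalloc_node() can be modelled in user space as follows. The arrays are hypothetical stand-ins for cpu_online_mask and the cpu-to-node map, and alloc_node_for_cpu() is an illustrative name, not a kernel function; only NUMA_NO_NODE's value (-1) matches the kernel.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)	/* same value as the kernel's NUMA_NO_NODE */
#define MODEL_NR_CPUS 4		/* hypothetical machine size */

/* Hypothetical stand-ins for cpu_online_mask and the cpu-to-node map:
 * cpus 2 and 3 sit on node 1, which has been hot-removed. */
static const bool model_cpu_online[MODEL_NR_CPUS] = { true, true, false, false };
static const int model_cpu_to_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

/* Node hint for a per-cpu allocation, in the spirit of
 * cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE: the mapping is only
 * trusted for online cpus, and NUMA_NO_NODE means "allocate anywhere". */
static int alloc_node_for_cpu(int cpu)
{
	return model_cpu_online[cpu] ? model_cpu_to_node[cpu] : NUMA_NO_NODE;
}
```

As Wen notes, this only protects the call sites that are changed this way; any other user of cpu_to_node() still sees whatever the map holds for an offline cpu.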


Re: [PATCH 3/7] PM / QoS: Prepare struct dev_pm_qos_request for more request types

2012-10-09 Thread mark gross

On Mon, Oct 08, 2012 at 10:06:08AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> The subsequent patches will use struct dev_pm_qos_request for
> representing both latency requests and flags requests.  To make that
> easier, put the node member of struct dev_pm_qos_request (under the
> name "pnode") into a union called "data" that will represent the
> request's  value and list node depending on its type.
> 
> Signed-off-by: Rafael J. Wysocki 
> Reviewed-by: Jean Pihet 
> ---
>  drivers/base/power/qos.c   |6 +++---
>  drivers/base/power/sysfs.c |2 +-
>  include/linux/pm_qos.h |4 +++-
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> Index: linux/include/linux/pm_qos.h
> ===
> --- linux.orig/include/linux/pm_qos.h
> +++ linux/include/linux/pm_qos.h
> @@ -39,7 +39,9 @@ struct pm_qos_flags_request {
>  };
>  
>  struct dev_pm_qos_request {
> - struct plist_node node;
> + union {
> + struct plist_node pnode;
> + } data;
>   struct device *dev;
>  };
>  
> Index: linux/drivers/base/power/sysfs.c
> ===
> --- linux.orig/drivers/base/power/sysfs.c
> +++ linux/drivers/base/power/sysfs.c
> @@ -221,7 +221,7 @@ static DEVICE_ATTR(autosuspend_delay_ms,
>  static ssize_t pm_qos_latency_show(struct device *dev,
>  struct device_attribute *attr, char *buf)
>  {
> - return sprintf(buf, "%d\n", dev->power.pq_req->node.prio);
> + return sprintf(buf, "%d\n", dev->power.pq_req->data.pnode.prio);
>  }
>  
>  static ssize_t pm_qos_latency_store(struct device *dev,
> Index: linux/drivers/base/power/qos.c
> ===
> --- linux.orig/drivers/base/power/qos.c
> +++ linux/drivers/base/power/qos.c
> @@ -90,7 +90,7 @@ static int apply_constraint(struct dev_p
>   int ret, curr_value;
>  
>   ret = pm_qos_update_target(&req->dev->power.qos->latency,
> -&req->node, action, value);
> +&req->data.pnode, action, value);
>  
>   if (ret) {
>   /* Call the global callbacks if needed */
> @@ -183,7 +183,7 @@ void dev_pm_qos_constraints_destroy(stru
>  
>   c = &qos->latency;
>   /* Flush the constraints list for the device */
> - plist_for_each_entry_safe(req, tmp, &c->list, node) {
> + plist_for_each_entry_safe(req, tmp, &c->list, data.pnode) {
>   /*
>* Update constraints list and call the notification
>* callbacks if needed
> @@ -293,7 +293,7 @@ int dev_pm_qos_update_request(struct dev
>   mutex_lock(&dev_pm_qos_mtx);
>  
>   if (req->dev->power.qos) {
> - if (new_value != req->node.prio)
> + if (new_value != req->data.pnode.prio)
>   ret = apply_constraint(req, PM_QOS_UPDATE_REQ,
>  new_value);
>   } else {
> 
Reviewed-by: mark gross 
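The point of wrapping "pnode" in a union is that one request structure can later carry either payload. A simplified user-space sketch: the model_* types are hypothetical, the real struct plist_node also carries list linkage, and the "type" tag and "flr" member only arrive in the follow-up patch (4/7).

```c
/* Simplified stand-ins: the real struct plist_node and
 * struct pm_qos_flags_request also carry list linkage. */
struct model_plist_node { int prio; };
struct model_flags_request { int flags; };

/* The type tag mirrors what patch 4/7 adds; 3/7 only adds the union. */
enum model_req_type { MODEL_REQ_LATENCY, MODEL_REQ_FLAGS };

struct model_dev_pm_qos_request {
	enum model_req_type type;
	union {
		struct model_plist_node pnode;	/* latency requests */
		struct model_flags_request flr;	/* flags requests */
	} data;
};

/* Read the request's value from whichever union member is live. */
static int model_request_value(const struct model_dev_pm_qos_request *req)
{
	return req->type == MODEL_REQ_LATENCY ? req->data.pnode.prio
					      : req->data.flr.flags;
}
```

Accesses change from req->node.prio to req->data.pnode.prio, exactly as in the sysfs and qos.c hunks above.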


Re: [PATCH 4/7] PM / QoS: Introduce PM QoS device flags support

2012-10-09 Thread mark gross

On Mon, Oct 08, 2012 at 10:07:10AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Modify the device PM QoS core code to support PM QoS flags requests.
> 
> First, add a new field of type struct pm_qos_flags called "flags"
> to struct dev_pm_qos for representing the list of PM QoS flags
> requests for the given device.  Accordingly, add a new "type" field
> to struct dev_pm_qos_request (along with an enum for representing
> request types) and a new member called "flr" to its data union for
> representing flags requests.
> 
> Second, modify dev_pm_qos_add_request(), dev_pm_qos_update_request(),
> the internal routine apply_constraint() used by them and their
> existing callers to cover flags requests as well as latency
> requests.  In particular, dev_pm_qos_add_request() gets a new
> argument called "type" for specifying the type of a request to be
> added.
> 
> Finally, introduce two routines, __dev_pm_qos_flags() and
> dev_pm_qos_flags(), allowing their callers to check which PM QoS
> flags have been requested for the given device (the caller is
> supposed to pass the mask of flags to check as the routine's
> second argument and examine its return value for the result).
> 
> Signed-off-by: Rafael J. Wysocki 
> Reviewed-by: Jean Pihet 
> ---
>  Documentation/power/pm_qos_interface.txt |2 
>  drivers/base/power/qos.c |  124 
> ---
>  drivers/mtd/nand/sh_flctl.c  |4 -
>  include/linux/pm_qos.h   |   26 ++
>  4 files changed, 127 insertions(+), 29 deletions(-)
> 
> Index: linux/drivers/base/power/qos.c
> ===
> --- linux.orig/drivers/base/power/qos.c
> +++ linux/drivers/base/power/qos.c
> @@ -48,6 +48,50 @@ static DEFINE_MUTEX(dev_pm_qos_mtx);
>  static BLOCKING_NOTIFIER_HEAD(dev_pm_notifiers);
>  
>  /**
> + * __dev_pm_qos_flags - Check PM QoS flags for a given device.
> + * @dev: Device to check the PM QoS flags for.
> + * @mask: Flags to check against.
> + *
> + * This routine must be called with dev->power.lock held.
> + */
> +enum pm_qos_flags_status __dev_pm_qos_flags(struct device *dev, s32 mask)
> +{
> + struct dev_pm_qos *qos = dev->power.qos;
> + struct pm_qos_flags *pqf;
> + s32 val;
> +
> + if (!qos)
> + return PM_QOS_FLAGS_UNDEFINED;
> +
> + pqf = &qos->flags;
> + if (list_empty(&pqf->list))
> + return PM_QOS_FLAGS_UNDEFINED;
> +
> + val = pqf->effective_flags & mask;
> + if (val)
> + return (val == mask) ? PM_QOS_FLAGS_ALL : PM_QOS_FLAGS_SOME;
> +
> + return PM_QOS_FLAGS_NONE;
> +}
> +
> +/**
> + * dev_pm_qos_flags - Check PM QoS flags for a given device (locked).
> + * @dev: Device to check the PM QoS flags for.
> + * @mask: Flags to check against.
> + */
> +enum pm_qos_flags_status dev_pm_qos_flags(struct device *dev, s32 mask)
> +{
> + unsigned long irqflags;
> + enum pm_qos_flags_status ret;
> +
> + spin_lock_irqsave(&dev->power.lock, irqflags);
> + ret = __dev_pm_qos_flags(dev, mask);
> + spin_unlock_irqrestore(&dev->power.lock, irqflags);
> +
> + return ret;
> +}
> +
> +/**
>   * __dev_pm_qos_read_value - Get PM QoS constraint for a given device.
>   * @dev: Device to get the PM QoS constraint value for.
>   *
> @@ -74,30 +118,39 @@ s32 dev_pm_qos_read_value(struct device
>   return ret;
>  }
>  
> -/*
> - * apply_constraint
> - * @req: constraint request to apply
> - * @action: action to perform add/update/remove, of type enum pm_qos_req_action
> - * @value: defines the qos request
> +/**
> + * apply_constraint - Add/modify/remove device PM QoS request.
> + * @req: Constraint request to apply
> + * @action: Action to perform (add/update/remove).
> + * @value: Value to assign to the QoS request.
>   *
>   * Internal function to update the constraints list using the PM QoS core
>   * code and if needed call the per-device and the global notification
>   * callbacks
>   */
>  static int apply_constraint(struct dev_pm_qos_request *req,
> - enum pm_qos_req_action action, int value)
> + enum pm_qos_req_action action, s32 value)
>  {
> - int ret, curr_value;
> -
> - ret = pm_qos_update_target(&req->dev->power.qos->latency,
> -&req->data.pnode, action, value);
> + struct dev_pm_qos *qos = req->dev->power.qos;
> + int ret;
>  
> - if (ret) {
> - /* Call the global callbacks if needed */
> - curr_value = pm_qos_read_value(&req->dev->power.qos->latency);
> - blocking_notifier_call_chain(&dev_pm_notifiers,
> -  (unsigned long)curr_value,
> -  req);
> + switch(req->type) {
> + case DEV_PM_QOS_LATENCY:
> + ret = pm_qos_update_target(&qos->latency, &req->data.pnode,
> +action, value);
> + if (ret) {
> +

Re: [PATCH 7/7] PM / ACPI: Take device PM QoS flags into account

2012-10-09 Thread mark gross

On Mon, Oct 08, 2012 at 10:09:26AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Make ACPI power management routines and PCI power management
> routines depending on ACPI take device PM QoS flags into account
> when deciding what power state to put the device into.
> 
> In particular, after this change acpi_pm_device_sleep_state() will
> not return ACPI_STATE_D3_COLD as the deepest available low-power
> state if PM_QOS_FLAG_NO_POWER_OFF is requested for the device and it
> will not require remote wakeup to work for the device in the returned
> low-power state if there is at least one PM QoS flags request for the
> device, but PM_QOS_FLAG_REMOTE_WAKEUP is not requested for it.
> 
> Accordingly, acpi_pci_set_power_state() will refuse to put the
> device into D3cold if PM_QOS_FLAG_NO_POWER_OFF is requested for it.
> 
> Signed-off-by: Rafael J. Wysocki 
> Reviewed-by: Jean Pihet 
> ---
>  drivers/acpi/sleep.c   |   21 +
>  drivers/pci/pci-acpi.c |8 +++-
>  2 files changed, 24 insertions(+), 5 deletions(-)
> 
> Index: linux/drivers/pci/pci-acpi.c
> ===
> --- linux.orig/drivers/pci/pci-acpi.c
> +++ linux/drivers/pci/pci-acpi.c
> @@ -17,6 +17,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include "pci.h"
>  
>  static DEFINE_MUTEX(pci_acpi_pm_notify_mtx);
> @@ -257,11 +258,16 @@ static int acpi_pci_set_power_state(stru
>   return -ENODEV;
>  
>   switch (state) {
> + case PCI_D3cold:
> + if (dev_pm_qos_flags(&dev->dev, PM_QOS_FLAG_NO_POWER_OFF) ==
> + PM_QOS_FLAGS_ALL) {
> + error = -EBUSY;
> + break;
> + }
>   case PCI_D0:
>   case PCI_D1:
>   case PCI_D2:
>   case PCI_D3hot:
> - case PCI_D3cold:
>   error = acpi_bus_set_power(handle, state_conv[state]);
>   }
>  
> Index: linux/drivers/acpi/sleep.c
> ===
> --- linux.orig/drivers/acpi/sleep.c
> +++ linux/drivers/acpi/sleep.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -711,6 +712,7 @@ int acpi_pm_device_sleep_state(struct de
>   struct acpi_device *adev;
>   char acpi_method[] = "_SxD";
>   unsigned long long d_min, d_max;
> + bool wakeup = false;
>  
>   if (d_max_in < ACPI_STATE_D0 || d_max_in > ACPI_STATE_D3)
>   return -EINVAL;
> @@ -718,6 +720,13 @@ int acpi_pm_device_sleep_state(struct de
>   printk(KERN_DEBUG "ACPI handle has no context!\n");
>   return -ENODEV;
>   }
> + if (d_max_in > ACPI_STATE_D3_HOT) {
> + enum pm_qos_flags_status stat;
> +
> + stat = dev_pm_qos_flags(dev, PM_QOS_FLAG_NO_POWER_OFF);
> + if (stat == PM_QOS_FLAGS_ALL)
> + d_max_in = ACPI_STATE_D3_HOT;
> + }
>  
>   acpi_method[2] = '0' + acpi_target_sleep_state;
>   /*
> @@ -737,8 +746,14 @@ int acpi_pm_device_sleep_state(struct de
>* NOTE: We rely on acpi_evaluate_integer() not clobbering the integer
>* provided -- that's our fault recovery, we ignore retval.
>*/
> - if (acpi_target_sleep_state > ACPI_STATE_S0)
> + if (acpi_target_sleep_state > ACPI_STATE_S0) {
>   acpi_evaluate_integer(handle, acpi_method, NULL, &d_min);
> + wakeup = device_may_wakeup(dev) && adev->wakeup.flags.valid
> + && adev->wakeup.sleep_state >= acpi_target_sleep_state;
> + } else if (dev_pm_qos_flags(dev, PM_QOS_FLAG_REMOTE_WAKEUP) !=
> + PM_QOS_FLAGS_NONE) {
> + wakeup = adev->wakeup.flags.valid;
> + }
>  
>   /*
>* If _PRW says we can wake up the system from the target sleep state,
> @@ -747,9 +762,7 @@ int acpi_pm_device_sleep_state(struct de
>* (ACPI 3.x), it should return the maximum (lowest power) D-state that
>* can wake the system.  _S0W may be valid, too.
>*/
> - if (acpi_target_sleep_state == ACPI_STATE_S0 ||
> - (device_may_wakeup(dev) && adev->wakeup.flags.valid &&
> -  adev->wakeup.sleep_state >= acpi_target_sleep_state)) {
> + if (wakeup) {
>   acpi_status status;
>  
>   acpi_method[3] = 'W';
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Reviewed-by: mark gross 
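The two decisions added to acpi_pm_device_sleep_state() can be modelled in user space. The D-state enum is a hypothetical stand-in ordered like the ACPI_STATE_D* constants, the flags-status enum is assumed to mirror the kernel's, and clamp_d_max()/s0_needs_wakeup() are illustrative names.

```c
#include <stdbool.h>

/* Hypothetical stand-ins ordered like the ACPI D-states (deeper state
 * compares greater), which is what "d_max_in > ACPI_STATE_D3_HOT" uses. */
enum model_dstate { MODEL_D0, MODEL_D1, MODEL_D2, MODEL_D3_HOT, MODEL_D3_COLD };

/* Assumed to mirror enum pm_qos_flags_status. */
enum pm_qos_flags_status {
	PM_QOS_FLAGS_UNDEFINED = -1,
	PM_QOS_FLAGS_NONE,
	PM_QOS_FLAGS_SOME,
	PM_QOS_FLAGS_ALL,
};

/* Deepest allowed state: D3cold is ruled out only when the status for
 * PM_QOS_FLAG_NO_POWER_OFF is ALL. */
static enum model_dstate clamp_d_max(enum model_dstate d_max_in,
				     enum pm_qos_flags_status no_power_off)
{
	if (d_max_in > MODEL_D3_HOT && no_power_off == PM_QOS_FLAGS_ALL)
		return MODEL_D3_HOT;
	return d_max_in;
}

/* Runtime (S0) case: wakeup is considered for any status other than
 * NONE, so a device with no flags requests at all (UNDEFINED) keeps
 * the old behaviour of requiring wakeup. */
static bool s0_needs_wakeup(enum pm_qos_flags_status remote_wakeup,
			    bool wakeup_valid)
{
	return remote_wakeup != PM_QOS_FLAGS_NONE && wakeup_valid;
}
```

The second helper reflects the commit message's wording: wakeup stops being required only when at least one flags request exists and none of them asks for PM_QOS_FLAG_REMOTE_WAKEUP.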


Re: [PATCH 6/7] PM / Domains: Check device PM QoS flags in pm_genpd_poweroff()

2012-10-09 Thread mark gross

On Mon, Oct 08, 2012 at 10:08:39AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Make the generic PM domains pm_genpd_poweroff() function take
> device PM QoS flags into account when deciding whether or not to
> remove power from the domain.
> 
> After this change the routine will return -EBUSY without executing
> the domain's .power_off() callback if there is at least one PM QoS
> flags request for at least one device in the domain and at least one of
> those requests has at least one of the NO_POWER_OFF and REMOTE_WAKEUP
> flags set.
> 
> Signed-off-by: Rafael J. Wysocki 
> Reviewed-by: Jean Pihet 
> ---
>  drivers/base/power/domain.c |   11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> Index: linux/drivers/base/power/domain.c
> ===
> --- linux.orig/drivers/base/power/domain.c
> +++ linux/drivers/base/power/domain.c
> @@ -470,10 +470,19 @@ static int pm_genpd_poweroff(struct gene
>   return -EBUSY;
>  
>   not_suspended = 0;
> - list_for_each_entry(pdd, &genpd->dev_list, list_node)
> + list_for_each_entry(pdd, &genpd->dev_list, list_node) {
> + enum pm_qos_flags_status stat;
> +
> + stat = dev_pm_qos_flags(pdd->dev,
> + PM_QOS_FLAG_NO_POWER_OFF
> + | PM_QOS_FLAG_REMOTE_WAKEUP);
> + if (stat > PM_QOS_FLAGS_NONE)
> + return -EBUSY;
> +
>   if (pdd->dev->driver && (!pm_runtime_suspended(pdd->dev)
>   || pdd->dev->power.irq_safe))
>   not_suspended++;
> + }
>  
>   if (not_suspended > genpd->in_progress)
>   return -EBUSY;
>
looks ok to me.
Reviewed-by: mark gross 

--mark
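The new check in pm_genpd_poweroff() relies on the numeric order of the status values: "stat > PM_QOS_FLAGS_NONE" means SOME or ALL, while UNDEFINED (no requests) does not veto. A user-space sketch, with the enum assumed to mirror the kernel's and genpd_qos_veto() a hypothetical name:

```c
#include <errno.h>
#include <stddef.h>

/* Assumed to mirror enum pm_qos_flags_status: UNDEFINED < NONE < SOME < ALL. */
enum pm_qos_flags_status {
	PM_QOS_FLAGS_UNDEFINED = -1,
	PM_QOS_FLAGS_NONE,
	PM_QOS_FLAGS_SOME,
	PM_QOS_FLAGS_ALL,
};

/* Model of the loop: stat[i] is the result of
 * dev_pm_qos_flags(dev, NO_POWER_OFF | REMOTE_WAKEUP) for device i.
 * Returns -EBUSY if any device vetoes power-off, else 0. */
static int genpd_qos_veto(const enum pm_qos_flags_status *stat, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (stat[i] > PM_QOS_FLAGS_NONE)
			return -EBUSY;

	return 0;
}
```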

Re: [PATCH 5/7] PM / QoS: Make it possible to expose PM QoS device flags to user space

2012-10-09 Thread mark gross

On Mon, Oct 08, 2012 at 10:07:58AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Define two device PM QoS flags, PM_QOS_FLAG_NO_POWER_OFF
> and PM_QOS_FLAG_REMOTE_WAKEUP, and introduce routines
> dev_pm_qos_expose_flags() and dev_pm_qos_hide_flags() allowing the
> caller to expose those two flags to user space or to hide them
> from it, respectively.
> 
> After the flags have been exposed, user space will see two
> additional sysfs attributes, pm_qos_no_power_off and
> pm_qos_remote_wakeup, under the device's /sys/devices/.../power/
> directory.  Then, writing 1 to one of them will update the
> PM QoS flags request owned by user space so that the corresponding
> flag is requested to be set.  In turn, writing 0 to one of them
> will cause the corresponding flag in the user space's request to
> be cleared (however, the owners of the other PM QoS flags requests
> for the same device may still request the flag to be set and it
> may be effectively set even if user space doesn't request that).
> 
> Signed-off-by: Rafael J. Wysocki 
> Reviewed-by: Jean Pihet 
> ---
>  Documentation/ABI/testing/sysfs-devices-power |   31 
>  drivers/base/power/power.h|6 
>  drivers/base/power/qos.c  |  167 
> --
>  drivers/base/power/sysfs.c|   95 +-
>  include/linux/pm.h|1 
>  include/linux/pm_qos.h|   26 
>  6 files changed, 278 insertions(+), 48 deletions(-)
> 
> Index: linux/include/linux/pm_qos.h
> ===
> --- linux.orig/include/linux/pm_qos.h
> +++ linux/include/linux/pm_qos.h
> @@ -34,6 +34,9 @@ enum pm_qos_flags_status {
>  #define PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE  0
>  #define PM_QOS_DEV_LAT_DEFAULT_VALUE 0
>  
> +#define PM_QOS_FLAG_NO_POWER_OFF (1 << 0)
> +#define PM_QOS_FLAG_REMOTE_WAKEUP(1 << 1)
> +
>  struct pm_qos_request {
>   struct plist_node node;
>   int pm_qos_class;
> @@ -86,6 +89,8 @@ struct pm_qos_flags {
>  struct dev_pm_qos {
>   struct pm_qos_constraints latency;
>   struct pm_qos_flags flags;
> + struct dev_pm_qos_request *latency_req;
> + struct dev_pm_qos_request *flags_req;

I think I'm getting it now.  If someday we have per-device throughput
constraints, you would have us add a struct pm_qos_constraints throughput
and a struct dev_pm_qos_request *throughput_req to struct dev_pm_qos.


>  };
>  
>  /* Action requested to pm_qos_update_target */
> @@ -187,10 +192,31 @@ static inline int dev_pm_qos_add_ancesto
>  #ifdef CONFIG_PM_RUNTIME
>  int dev_pm_qos_expose_latency_limit(struct device *dev, s32 value);
>  void dev_pm_qos_hide_latency_limit(struct device *dev);
> +int dev_pm_qos_expose_flags(struct device *dev, s32 value);
> +void dev_pm_qos_hide_flags(struct device *dev);
> +int dev_pm_qos_update_flags(struct device *dev, s32 mask, bool set);
> +
> +static inline s32 dev_pm_qos_requested_latency(struct device *dev)
> +{
> + return dev->power.qos->latency_req->data.pnode.prio;
> +}
> +
> +static inline s32 dev_pm_qos_requested_flags(struct device *dev)
> +{
> + return dev->power.qos->flags_req->data.flr.flags;
> +}
>  #else
>  static inline int dev_pm_qos_expose_latency_limit(struct device *dev, s32 value)
>   { return 0; }
>  static inline void dev_pm_qos_hide_latency_limit(struct device *dev) {}
> +static inline int dev_pm_qos_expose_flags(struct device *dev, s32 value)
> + { return 0; }
> +static inline void dev_pm_qos_hide_flags(struct device *dev) {}
> +static inline int dev_pm_qos_update_flags(struct device *dev, s32 m, bool set)
> + { return 0; }
> +
> +static inline s32 dev_pm_qos_requested_latency(struct device *dev) { return 0; }
> +static inline s32 dev_pm_qos_requested_flags(struct device *dev) { return 0; }
>  #endif
>  
>  #endif
> Index: linux/include/linux/pm.h
> ===
> --- linux.orig/include/linux/pm.h
> +++ linux/include/linux/pm.h
> @@ -548,7 +548,6 @@ struct dev_pm_info {
>   unsigned long   active_jiffies;
>   unsigned long   suspended_jiffies;
>   unsigned long   accounting_timestamp;
> - struct dev_pm_qos_request *pq_req;
>  #endif
>   struct pm_subsys_data   *subsys_data;  /* Owned by the subsystem. */
>   struct dev_pm_qos   *qos;
> Index: linux/drivers/base/power/qos.c
> ===
> --- linux.orig/drivers/base/power/qos.c
> +++ linux/drivers/base/power/qos.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "power.h"
>  
> @@ -322,6 +323,36 @@ int dev_pm_qos_add_request(struct device
>  EXPORT_SYMBOL_GPL(dev_pm_qos_add_request);
>  
>  /**
> + * __dev_pm_qos_update_request - Modify an existing device PM QoS request.
> + * @req : PM QoS

Re: include/linux/cgroup.h:566 suspicious rcu_dereference_check() usage!

2012-10-09 Thread Paul E. McKenney

On Tue, Oct 09, 2012 at 06:08:59PM -0700, Sergey Senozhatsky wrote:
> On (10/08/12 12:49), Paul E. McKenney wrote:
> > 
> > 
> > 
> > device_cgroup: Restore rcu_read_lock() protection to devcgroup_inode_mknod()
> > 
> > Commit ad676077 (device_cgroup: convert device_cgroup internally to
> > policy + exceptions) restructured devcgroup_inode_mknod(), removing
> > rcu_read_lock() in the process.  However, RCU read-side protection
> > is required by the call to task_devcgroup(), so this commit restores
> > the rcu_read_lock() and rcu_read_unlock().
> > 
> > Signed-off-by: Paul E. McKenney 
> > 
> > diff --git a/security/device_cgroup.c b/security/device_cgroup.c
> > index 44dfc41..c686110 100644
> > --- a/security/device_cgroup.c
> > +++ b/security/device_cgroup.c
> > @@ -576,9 +576,12 @@ int __devcgroup_inode_permission(struct inode *inode, int mask)
> >  
> >  int devcgroup_inode_mknod(int mode, dev_t dev)
> >  {
> > -   struct dev_cgroup *dev_cgroup = task_devcgroup(current);
> > +   struct dev_cgroup *dev_cgroup;
> > +   int ret;
> > short type;
> >  
> > +   rcu_read_lock();
> > +   dev_cgroup = task_devcgroup(current);
> > if (!S_ISBLK(mode) && !S_ISCHR(mode))
> > return 0;
> >  
> > @@ -587,7 +590,9 @@ int devcgroup_inode_mknod(int mode, dev_t dev)
> > else
> > type = DEV_CHAR;
> >  
> > -   return __devcgroup_check_permission(dev_cgroup, type, MAJOR(dev),
> > +   ret =  __devcgroup_check_permission(dev_cgroup, type, MAJOR(dev),
> > MINOR(dev), ACC_MKNOD);
> > +   rcu_read_unlock();
> > +   return ret;
> >  
> >  }
> > 
> 
> 
> I believe the same should be done for __devcgroup_inode_permission() as well.
> And we can probably call task_devcgroup() and rcu_read_lock() after the
> "!S_ISBLK(mode) && !S_ISCHR(mode)" check (I guess we also need to unlock
> RCU on `return 0').

Looks sane to me!  Dropping my patch.

Thanx, Paul
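The ordering Sergey suggests (mode checks first, then the RCU read-side section) can be illustrated with a user-space sketch. Real RCU is not available here, so a nesting counter and the model_* stubs are hypothetical stand-ins used only to check that every path leaves the critical section balanced.

```c
#include <stdbool.h>

/* Hypothetical stubs: a nesting counter stands in for the RCU read-side
 * critical section so the sketch can check that every path unlocks. */
static int rcu_depth;
static void model_rcu_read_lock(void) { rcu_depth++; }
static void model_rcu_read_unlock(void) { rcu_depth--; }

/* Stand-in for __devcgroup_check_permission(); allows everything here. */
static int model_check_permission(void) { return 0; }

/* Do the cheap mode checks first so the early "return 0" happens before
 * rcu_read_lock(), then hold the read lock across the task_devcgroup()
 * lookup and the permission check. */
static int model_inode_mknod(bool is_blk, bool is_chr)
{
	int ret;

	if (!is_blk && !is_chr)
		return 0;

	model_rcu_read_lock();
	/* task_devcgroup(current) would be dereferenced here. */
	ret = model_check_permission();
	model_rcu_read_unlock();

	return ret;
}
```

With this ordering the early return never needs its own unlock, which is the imbalance Sergey spotted in the first version of the patch.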

> 
> 
> Commit ad676077
>  | Author: Aristeu Rozanski 
>  | Date:   Thu Oct 4 17:15:17 2012 -0700
>  |   device_cgroup: convert device_cgroup internally to policy + exceptions
> 
> removed RCU read-side protection from devcgroup_inode_mknod(), which is,
> however, required by task_devcgroup(). This patch also adds RCU read-side
> protection to the __devcgroup_inode_permission() function, introduced in
> commit ad676077.
> 
> [0.946303] include/linux/cgroup.h:566 suspicious rcu_dereference_check() usage!
> [0.946511] 
> [0.946606] 2 locks held by kdevtmpfs/28:
> [0.946684]  #0:  (sb_writers){.+.+.+}, at: [] mnt_want_write+0x24/0x4b
> [0.947083]  #1:  (>s_type->i_mutex_key#3/1){+.+.+.}, at: [] kern_path_create+0x83/0x144
> [0.947598] 
> [0.947787] Call Trace:
> [0.947868]  [] lockdep_rcu_suspicious+0x109/0x112
> [0.947958]  [] devcgroup_inode_mknod+0x9e/0xee
> [0.948043]  [] vfs_mknod+0x8a/0xed
> [0.948129]  [] handle_create.isra.2+0x144/0x1b5
> [0.948214]  [] ? devtmpfsd+0x9f/0x138
> [0.948298]  [] ? do_raw_spin_lock+0x67/0xde
> [0.948384]  [] ? do_raw_spin_unlock+0x8f/0x98
> [0.948469]  [] ? handle_create.isra.2+0x1b5/0x1b5
> [0.948554]  [] devtmpfsd+0xe4/0x138
> [0.948638]  [] ? handle_create.isra.2+0x1b5/0x1b5
> [0.948724]  [] kthread+0xd5/0xdd
> [0.948814]  [] kernel_thread_helper+0x4/0x10
> [0.948900]  [] ? retint_restore_args+0x13/0x13
> [0.948985]  [] ? __init_kthread_worker+0x5a/0x5a
> [0.949069]  [] ? gs_change+0x13/0x13
> 
> 
> devcgroup_inode_mknod() part submitted by Paul E. McKenney.
> 
> 
> Signed-off-by: Sergey Senozhatsky 
> 
> ---
> 
>  security/device_cgroup.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/security/device_cgroup.c b/security/device_cgroup.c
> index 44dfc41..043eb00 100644
> --- a/security/device_cgroup.c
> +++ b/security/device_cgroup.c
> @@ -558,7 +558,8 @@ static int __devcgroup_check_permission(struct dev_cgroup *dev_cgroup,
> 
>  int __devcgroup_inode_permission(struct inode *inode, int mask)
>  {
> - struct dev_cgroup *dev_cgroup = task_devcgroup(current);
> + struct dev_cgroup *dev_cgroup;
> + int ret;
>   short type, access = 0;
> 
>   if (S_ISBLK(inode->i_mode))
> @@ -570,13 +571,20 @@ int __devcgroup_inode_permission(struct inode *inode, int mask)
>   if (mask & MAY_READ)
>   access |= ACC_READ;
> 
> - return __devcgroup_check_permission(dev_cgroup, type, imajor(inode),
> + rcu_read_lock();
> +
> + dev_cgroup = task_devcgroup(current);
> + ret = __devcgroup_check_permission(dev_cgroup, type, imajor(inode),
>   iminor(inode), access);
> +
> + rcu_read_unlock();
> + return ret;
>  }
> 
>  int devcgroup_inode_mknod(int mode,

[PATCH v6 2/2] ACPI: Add Intel MID SPI early console support.

2012-10-09 Thread Lv Zheng

DesignWare SPI UART is used as one of the debug ports on Low Power Intel
Architecture (LPIA) platforms.  This patch is introduced to support this
debugging console reported by ACPI DBGP/DBG2.  The original MID SPI
early console stuff is also refined to co-exist with the new ACPI usage
model.

There are two alternatives to use this facility on LPIA platforms:
1. Launch by normal earlycon (on MEDFIELD platforms):
   Enable the following kernel configurations:
 CONFIG_EARLY_PRINTK_INTEL_MID=y
   Pass the following kernel parameter to the kernel:
 earlyprintk=mrst
2. Launch by ACPI DBG2 earlycon (on CLOVERVIEW platforms):
   Enable the following kernel configurations:
 CONFIG_EARLY_PRINTK_ACPI=y
 CONFIG_EARLY_PRINTK_INTEL_MID_SPI=y
   Pass the following kernel parameter to the kernel:
 earlyprintk=acpi
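The two launch methods above differ in the earlyprintk= value passed on the command line. A hypothetical sketch of how such a value might be dispatched with strncmp() prefix matching: parse_earlyprintk() and the enum are illustrative names, not the patch's functions, and the optional controller number follows the "acpi[debugController#]" form documented in kernel-parameters.txt.

```c
#include <stdlib.h>
#include <string.h>

enum early_console { EARLY_CON_NONE, EARLY_CON_MRST, EARLY_CON_ACPI };

/* Hypothetical dispatcher for the earlyprintk= value. For "acpi", an
 * optional trailing number picks the debug controller (default 0). */
static enum early_console parse_earlyprintk(const char *buf, int *index)
{
	*index = 0;

	if (!strncmp(buf, "mrst", 4))
		return EARLY_CON_MRST;

	if (!strncmp(buf, "acpi", 4)) {
		if (buf[4] >= '0' && buf[4] <= '9')
			*index = (int)strtol(buf + 4, NULL, 10);
		return EARLY_CON_ACPI;
	}

	return EARLY_CON_NONE;
}
```

The Kconfig options listed for each method still decide which of these consoles is actually built in.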

Signed-off-by: Lv Zheng 
---
 Documentation/kernel-parameters.txt|1 +
 arch/x86/Kconfig.debug |   23 +++
 arch/x86/include/asm/mrst.h|2 +-
 arch/x86/kernel/early_printk.c |   12 +-
 arch/x86/platform/mrst/early_printk_mrst.c |  186 +--
 drivers/platform/x86/Makefile  |2 +
 drivers/platform/x86/early/Makefile|5 +
 drivers/platform/x86/early/intel_mid_spi.c |  220 
 include/acpi/actbl2.h  |1 +
 include/linux/intel_mid_early.h|   12 ++
 10 files changed, 283 insertions(+), 181 deletions(-)
 create mode 100644 drivers/platform/x86/early/Makefile
 create mode 100644 drivers/platform/x86/early/intel_mid_spi.c
 create mode 100644 include/linux/intel_mid_early.h

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index f656765..003ffd9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -762,6 +762,7 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
earlyprintk=vga
earlyprintk=serial[,ttySn[,baudrate]]
earlyprintk=ttySn[,baudrate]
+   earlyprintk=mrst
earlyprintk=dbgp[debugController#]
earlyprintk=acpi[debugController#]
 
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 5778082..4a26e2d 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -43,9 +43,32 @@ config EARLY_PRINTK
  with klogd/syslogd or the X server. You should normally N here,
  unless you want to debug such a crash.
 
+config EARLY_PRINTK_INTEL_MID_SPI
+   bool "Early printk for Intel MID SPI UART port"
+   depends on EARLY_PRINTK
+   ---help---
+ Write kernel log output directly into the MID SPI UART debug port.
+
+ Intel MID platforms use the DesignWare SPI UART as their debug
+ console.  This option does not add an actual early console to
+ the kernel binary, but is required by a real early console
+ implementation (EARLY_PRINTK_INTEL_MID or EARLY_PRINTK_ACPI).
+ You should normally say N here unless you are doing kernel boot
+ development.
+
 config EARLY_PRINTK_INTEL_MID
bool "Early printk for Intel MID platform support"
depends on EARLY_PRINTK && X86_INTEL_MID
+   select EARLY_PRINTK_INTEL_MID_SPI
+   ---help---
+ Write kernel log output directly into the MID SPI UART debug port.
+
+ Intel MID platforms are always equipped with SPI debug ports and
+ USB OTG debug ports. To enable these debugging facilities, you
+ need to pass the "earlyprintk=mrst" parameter to the kernel through
+ the boot loader.  Please see "Documentation/kernel-parameters.txt" for
+ details.  You should normally say N here unless you are doing kernel
+ boot development.
 
 config EARLY_PRINTK_DBGP
bool "Early printk via EHCI debug port"
diff --git a/arch/x86/include/asm/mrst.h b/arch/x86/include/asm/mrst.h
index fc18bf3..8ab0655 100644
--- a/arch/x86/include/asm/mrst.h
+++ b/arch/x86/include/asm/mrst.h
@@ -12,6 +12,7 @@
 #define _ASM_X86_MRST_H
 
 #include 
+#include 
 
 extern int pci_mrst_init(void);
 extern int __init sfi_parse_mrtc(struct sfi_table_header *table);
@@ -63,7 +64,6 @@ extern enum mrst_timer_options mrst_timer_options;
 #define SFI_MTMR_MAX_NUM 8
 #define SFI_MRTC_MAX   8
 
-extern struct console early_mrst_console;
 extern void mrst_early_console_init(void);
 
 extern struct console early_hsu_console;
diff --git a/arch/x86/kernel/early_printk.c b/arch/x86/kernel/early_printk.c
index bf5b596..9b5ee96 100644
--- a/arch/x86/kernel/early_printk.c
+++ b/arch/x86/kernel/early_printk.c
@@ -205,6 +205,16 @@ static inline void early_console_register(struct console 
*con, int keep_early)
 
 int __init __acpi_early_console_start(struct acpi_debug_port *info)
 {
+#ifdef CONFIG_EARLY_PRINTK_INTEL_MID_SPI
+   if (info->port_type ==

[PATCH v6 1/2] ACPI: Add early console framework for DBGP/DBG2.

2012-10-09 Thread Lv Zheng

Microsoft Debug Port Table (DBGP or DBG2) is used by the Windows SoC
platforms to describe their debugging facilities.
DBGP: http://msdn.microsoft.com/en-us/windows/hardware/hh134821
DBG2: http://msdn.microsoft.com/en-us/library/windows/hardware/hh673515

This patch allows the DBGP/DBG2 debug ports to be used as Linux early
console launchers.  Individual early console drivers are also needed to
get the early kernel messages dumped on the consoles.  For example, to use
the SPI UART early console on Low Power Intel Architecture (LPIA)
platforms, you need to enable the following kernel configuration options:
  CONFIG_EARLY_PRINTK_ACPI=y
  CONFIG_EARLY_PRINTK_INTEL_MID_SPI=y
Then append the following parameter to the kernel command line in your
boot loader configuration file:
  earlyprintk=acpi

There is a dilemma in designing this patch set.  Let me describe it in
detail.
There are three steps to enable an early console for an operating
system:
1. Probe: In this stage, the Linux kernel detects the early consoles
  and determines the base addresses of their register blocks.
  This can be done by parsing the descriptors in the ACPI DBGP/DBG2
  tables.  Note that acpi_table_init() must be called before
  parsing.
2. Setup: In this stage, the Linux kernel applies user-specified
  configuration options (e.g. the baud rate of serial ports) to the
  early consoles.  This is done by parsing the early parameters
  passed to the kernel by the boot loader.  Note that
  parse_early_param() is called very early so that parameters can
  be passed to other kernel subsystems.
3. Start: In this stage, the Linux kernel makes the consoles ready to
  output logs.  Since early consoles are used for debugging the
  kernel boot, this must be done as early as possible to give the
  other kernel subsystems the highest possible testability.  Note
  that this stage happens when register_console() is called.
The preferred sequence for the above steps is:
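   +-----------------+    +-------------------+    +--------------------+
   | ACPI DBGP PROBE | -> | EARLY_PARAM SETUP | -> | EARLY_PRINTK START |
   +-----------------+    +-------------------+    +--------------------+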
Unfortunately, in the current x86 implementation, early parameter parsing
and early printk initialization happen before acpi_table_init(), which
depends on the early memory mapping facility.
There are several possible designs for this patch set:
1. Invoking acpi_table_init() before parse_early_param() to maintain the
   sequence:
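   +-----------------+    +-------------------+    +--------------------+
   | ACPI DBGP PROBE | -> | EARLY_PARAM SETUP | -> | EARLY_PRINTK START |
   +-----------------+    +-------------------+    +--------------------+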
   This requires review by other subsystem maintainers to ensure that no
   regressions are introduced.  At first glance, I found there might be
   problems with the EFI subsystem:
   The EFI boot services and runtime services are mixed up in the
   x86-specific initialization process before the ACPI table
   initialization.  Worse, at kernel build time you cannot even disable
   the runtime services while still allowing the boot services code to be
   executed.  Enabling the early consoles after the ACPI table
   initialization would make runtime BIOS bugs difficult to debug.
   If any progress is made on the kernel boot sequence, please let me
   know; I'm willing to redesign the ACPI DBGP/DBG2 console probing
   facility.  You can reach me at  and
   .
2. Modifying the above sequence so that it looks like:
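   +-------------------+    +-----------------+    +--------------------+
   | EARLY_PARAM SETUP | -> | ACPI DBGP PROBE | -> | EARLY_PRINTK START |
   +-------------------+    +-----------------+    +--------------------+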
   Early consoles started this way lose part of their testability during
   the kernel boot sequence.  If the system does not crash very early,
   developers can still see the buffered kernel output once
   register_console() is called.
   Current early console implementations need to be modified to be
   compatible with this design.  The original code needs to be split
   into two parts:
   1. Detecting hardware.  This part can be called in the PROBE stage.
   2. Applying user parameters.  This part can be called in the SETUP
      stage.
   Individual early console driver maintainers need to be involved to
   avoid regressions that might occur if we do things this way, and the
   maintainers can test these changes better than I can.
3. Introducing a brand new debugging facility that does not relate to the
   current early console implementation to allow the early consoles to be
   automatically detected.
   +---+++
   | EARLY_PARAM SETUP | -> |
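Option 2 relies on kernel messages printed before the console is started being kept in a buffer and shown once register_console() runs. A minimal userspace sketch of that buffer-then-replay behaviour follows; the names (`sketch_printk()`, `sketch_register_console()`) and the fixed 256-byte buffer are hypothetical and do not reflect the kernel's printk implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model: every message goes into a log buffer,
 * and reaches the "console" only after the console is registered, at
 * which point everything buffered so far is replayed. */

#define SKETCH_BUF_SIZE 256

static char log_buf[SKETCH_BUF_SIZE];     /* all messages, in order */
static size_t log_len;
static char console_out[SKETCH_BUF_SIZE]; /* what the fake UART printed */
static size_t console_len;
static int console_registered;

/* Append to the fake UART output, truncating if it is full. */
static void console_write(const char *s, size_t n)
{
	size_t room = sizeof(console_out) - 1 - console_len;

	if (n > room)
		n = room;
	memcpy(console_out + console_len, s, n);
	console_len += n;
	console_out[console_len] = '\0';
}

/* Record a message; print it immediately only if a console exists. */
static void sketch_printk(const char *msg)
{
	size_t n = strlen(msg);
	size_t room = sizeof(log_buf) - log_len;

	if (n > room)
		n = room;               /* drop what does not fit */
	memcpy(log_buf + log_len, msg, n);
	log_len += n;

	if (console_registered)
		console_write(msg, n);
}

/* START stage: register the console and replay the buffered messages. */
static void sketch_register_console(void)
{
	assert(!console_registered);
	console_registered = 1;
	console_write(log_buf, log_len);
}
```

Messages logged during PROBE and SETUP are not lost, only delayed; what is lost is visibility into a crash that happens before the START stage.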

Re: [PATCH 2/7] PM / QoS: Introduce request and constraint data types for PM QoS flags

2012-10-09 Thread mark gross

On Mon, Oct 08, 2012 at 10:05:07AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Introduce struct pm_qos_flags_request and struct pm_qos_flags
> representing PM QoS flags request type and PM QoS flags constraint
> type, respectively.  With these definitions the data structures
> will be arranged so that the list member of a struct pm_qos_flags
> object will contain the head of a list of struct pm_qos_flags_request
> objects representing all of the "flags" requests present for the
> given device.  Then, the effective_flags member of a struct
> pm_qos_flags object will contain the bitwise OR of the flags members
> of all the struct pm_qos_flags_request objects in the list.
> 
> Additionally, introduce helper function pm_qos_update_flags()
> allowing the caller to manage the list of struct pm_qos_flags_request
> pointed to by the list member of struct pm_qos_flags.
> 
> The flags are of type s32 so that the request's "value" field
> is always of the same type regardless of what kind of request it
> is (latency requests already have value fields of type s32).
> 
> Signed-off-by: Rafael J. Wysocki 
> Reviewed-by: Jean Pihet 
> ---
>  include/linux/pm_qos.h |   17 +++--
>  kernel/power/qos.c |   63 
> +
>  2 files changed, 78 insertions(+), 2 deletions(-)
> 
> Index: linux/include/linux/pm_qos.h
> ===
> --- linux.orig/include/linux/pm_qos.h
> +++ linux/include/linux/pm_qos.h
> @@ -33,6 +33,11 @@ struct pm_qos_request {
>   struct delayed_work work; /* for pm_qos_update_request_timeout */
>  };
>  
> +struct pm_qos_flags_request {
> + struct list_head node;
> + s32 flags;  /* Do not change to 64 bit */
> +};
> +
>  struct dev_pm_qos_request {
>   struct plist_node node;
>   struct device *dev;
> @@ -45,8 +50,8 @@ enum pm_qos_type {
>  };
>  
>  /*
> - * Note: The lockless read path depends on the CPU accessing
> - * target_value atomically.  Atomic access is only guaranteed on all CPU
> + * Note: The lockless read path depends on the CPU accessing target_value
> + * or effective_flags atomically.  Atomic access is only guaranteed on all 
> CPU
>   * types linux supports for 32 bit quantites
>   */
>  struct pm_qos_constraints {
> @@ -57,6 +62,11 @@ struct pm_qos_constraints {
>   struct blocking_notifier_head *notifiers;
>  };
>  
> +struct pm_qos_flags {
> + struct list_head list;
> + s32 effective_flags;/* Do not change to 64 bit */
> +};
> +
>  struct dev_pm_qos {
>   struct pm_qos_constraints latency;
>  };
> @@ -75,6 +85,9 @@ static inline int dev_pm_qos_request_act
>  
>  int pm_qos_update_target(struct pm_qos_constraints *c, struct plist_node 
> *node,
>enum pm_qos_req_action action, int value);
> +bool pm_qos_update_flags(struct pm_qos_flags *pqf,
> +  struct pm_qos_flags_request *req,
> +  enum pm_qos_req_action action, s32 val);
>  void pm_qos_add_request(struct pm_qos_request *req, int pm_qos_class,
>   s32 value);
>  void pm_qos_update_request(struct pm_qos_request *req,
> Index: linux/kernel/power/qos.c
> ===
> --- linux.orig/kernel/power/qos.c
> +++ linux/kernel/power/qos.c
> @@ -213,6 +213,69 @@ int pm_qos_update_target(struct pm_qos_c
>  }
>  
>  /**
> + * pm_qos_flags_remove_req - Remove device PM QoS flags request.
> + * @pqf: Device PM QoS flags set to remove the request from.
> + * @req: Request to remove from the set.
> + */
> +static void pm_qos_flags_remove_req(struct pm_qos_flags *pqf,
> + struct pm_qos_flags_request *req)
> +{
> + s32 val = 0;
> +
> + list_del(&req->node);
> + list_for_each_entry(req, &pqf->list, node)
> + val |= req->flags;
> +
> + pqf->effective_flags = val;
> +}
> +
> +/**
> + * pm_qos_update_flags - Update a set of PM QoS flags.
> + * @pqf: Set of flags to update.
> + * @req: Request to add to the set, to modify, or to remove from the set.
> + * @action: Action to take on the set.
> + * @val: Value of the request to add or modify.
> + *
> + * Update the given set of PM QoS flags and call notifiers if the aggregate
> + * value has changed.  Returns 1 if the aggregate constraint value has 
> changed,
> + * 0 otherwise.
> + */
> +bool pm_qos_update_flags(struct pm_qos_flags *pqf,
> +  struct pm_qos_flags_request *req,
> +  enum pm_qos_req_action action, s32 val)
> +{
> + unsigned long irqflags;
> + s32 prev_value, curr_value;
> +
> + spin_lock_irqsave(&pm_qos_lock, irqflags);
> +
> + prev_value = list_empty(&pqf->list) ? 0 : pqf->effective_flags;
> +
> + switch (action) {
> + case PM_QOS_REMOVE_REQ:
> + pm_qos_flags_remove_req(pqf, req);
> + break;
> + case PM_QOS_UPDATE_REQ:
> +
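The aggregation described in the changelog (effective_flags is the bitwise OR of the flags of every request on the list, and the update helper reports whether that aggregate changed) can be modelled in plain C as below. This is a sketch only: the types and function names are hypothetical stand-ins for struct pm_qos_flags, struct pm_qos_flags_request and pm_qos_update_flags(), there is no locking or notifier call, and because the quoted patch is cut off before the PM_QOS_UPDATE_REQ case, treating an update as remove-then-add is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model, not the kernel API. */
enum req_action { REQ_ADD, REQ_UPDATE, REQ_REMOVE };

struct flags_req {
	struct flags_req *next;
	int32_t flags;              /* s32 in the patch */
};

struct flags_set {
	struct flags_req *head;     /* list of active requests */
	int32_t effective_flags;    /* OR of all requests' flags */
};

/* Recompute the aggregate as the OR of every request on the list. */
static void recompute(struct flags_set *s)
{
	int32_t val = 0;

	for (struct flags_req *r = s->head; r; r = r->next)
		val |= r->flags;
	s->effective_flags = val;
}

static void unlink_req(struct flags_set *s, struct flags_req *req)
{
	struct flags_req **pp = &s->head;

	while (*pp && *pp != req)
		pp = &(*pp)->next;
	if (*pp)
		*pp = req->next;
}

/* Returns true if the aggregate value changed. */
static bool flags_update(struct flags_set *s, struct flags_req *req,
			 enum req_action action, int32_t val)
{
	/* An empty list counts as "no flags", as in the patch. */
	int32_t prev = s->head ? s->effective_flags : 0;

	switch (action) {
	case REQ_REMOVE:
		unlink_req(s, req);
		break;
	case REQ_UPDATE:
		unlink_req(s, req);
		/* fall through: re-add with the new value (assumption) */
	case REQ_ADD:
		req->flags = val;
		req->next = s->head;
		s->head = req;
		break;
	}
	recompute(s);
	return prev != s->effective_flags;
}
```

Recomputing the OR from scratch is needed on removal because, unlike a min/max constraint, a bit set by the removed request may or may not still be set by another one.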

Re: udev breakages -

2012-10-09 Thread Felipe Contreras

On Thu, Oct 4, 2012 at 9:17 PM, Alan Cox  wrote:
>> I don't know how to handle the /dev/ptmx issue properly from within
>> devtmpfs, does anyone?  Proposals are always welcome, the last time this
>> came up a week or so ago, I don't recall seeing any proposals, just a
>> general complaint.
>
> Is it really a problem - devtmpfs is optional. It's a problem for the
> userspace folks to handle and if they made it mandatory in their code
> diddums, someone better go fork working versions.

If only there was a viable alternative to udev.

Distributions are being pushed around by the udev+systemd project
precisely because of this reason; udev maintainers have said that udev
on non-systemd systems is a dead end, so everyone that uses udev
(everyone) is being forced to switch to systemd if they want to
receive proper support, and at some point there might not be even a
choice.

I for one would like an alternative to both systemd and udev on my
Linux systems, and as of yet, I don't know of one.

Cheers.

-- 
Felipe Contreras

RE: [PATCH 1/2] mm: Export vm_committed_as

2012-10-09 Thread KY Srinivasan



> -Original Message-
> From: Andrew Morton [mailto:a...@linux-foundation.org]
> Sent: Tuesday, October 09, 2012 9:17 PM
> To: KY Srinivasan
> Cc: Greg KH; o...@aepfle.de; linux-kernel@vger.kernel.org; 
> a...@firstfloor.org;
> a...@canonical.com; de...@linuxdriverproject.org
> Subject: Re: [PATCH 1/2] mm: Export vm_committed_as
> 
> On Wed, 10 Oct 2012 00:11:28 + KY Srinivasan  wrote:
> 
> >
> >
> > > -Original Message-
> > > From: Andrew Morton [mailto:a...@linux-foundation.org]
> > > Sent: Tuesday, October 09, 2012 3:48 PM
> > > To: Greg KH
> > > Cc: KY Srinivasan; o...@aepfle.de; linux-kernel@vger.kernel.org;
> > > a...@firstfloor.org; a...@canonical.com; de...@linuxdriverproject.org
> > > Subject: Re: [PATCH 1/2] mm: Export vm_committed_as
> > >
> > > On Mon, 8 Oct 2012 06:35:39 -0700
> > > Greg KH  wrote:
> > >
> > > > On Mon, Oct 08, 2012 at 03:35:50AM +, KY Srinivasan wrote:
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > > > > > Sent: Sunday, October 07, 2012 8:44 PM
> > > > > > To: KY Srinivasan
> > > > > > Cc: linux-kernel@vger.kernel.org; de...@linuxdriverproject.org;
> > > o...@aepfle.de;
> > > > > > a...@canonical.com; a...@linux-foundation.org; a...@firstfloor.org
> > > > > > Subject: Re: [PATCH 1/2] mm: Export vm_committed_as
> > > > > >
> > > > > > On Sun, Oct 07, 2012 at 04:59:45PM -0700, K. Y. Srinivasan wrote:
> > > > > > > The policy engine on the host expects the guest to report the
> > > > > > > committed_as. Since this variable is not exported,
> > > > > > > export this symbol.
> > > > > >
> > > > > > Why are these symbols not needed by either Xen or KVM or vmware, which
> > > > > > I think all support the same thing, right?
> > > > >
> > > > > The basic balloon driver does not need this symbol since the basic balloon driver
> > > > > is not automatically driven by the host. On the Windows host we have a policy engine that
> > > > > drives the balloon driver based on both guest level memory pressure that the guest
> > > > > reports as well as other system level metrics the host maintains. We need this symbol to
> > > > > drive the policy engine on the host.
> > > >
> > > > Ok, but you're going to have to get the -mm developers to agree that
> > > > this is ok before I can accept it.
> > >
> > > Well I guess it won't kill us.
> >
> >
> > Thanks.
> >
> 
> The other part of my email seems to have vanished.  Which makes me sad,
> because I was rather interested in the answer.

Andrew,

I am truly sorry for not answering your question.  To be honest, I was not
quite sure what the issue was, given that, from what I can tell,
vm_committed_as is maintained globally and represents the overall memory
commitment of the system.  Is this not the case?

Regards,

K. Y 



Re: [PATCH 1/7] PM / QoS: Prepare device structure for adding more constraint types

2012-10-09 Thread mark gross

On Mon, Oct 08, 2012 at 10:04:03AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> Currently struct dev_pm_info contains only one PM QoS constraints
> pointer reserved for latency requirements.  Since one more device
> constraints type (i.e. flags) will be necessary, introduce a new
> structure, struct dev_pm_qos, that eventually will contain all of
> the available device PM QoS constraints and replace the "constraints"
> pointer in struct dev_pm_info with a pointer to the new structure
> called "qos".
> 
> Signed-off-by: Rafael J. Wysocki 
> Reviewed-by: Jean Pihet 
> ---
>  drivers/base/power/qos.c |   42 ++
>  include/linux/pm.h   |2 +-
>  include/linux/pm_qos.h   |4 
>  3 files changed, 27 insertions(+), 21 deletions(-)
> 
> Index: linux/include/linux/pm.h
> ===
> --- linux.orig/include/linux/pm.h
> +++ linux/include/linux/pm.h
> @@ -551,7 +551,7 @@ struct dev_pm_info {
>   struct dev_pm_qos_request *pq_req;
>  #endif
>   struct pm_subsys_data   *subsys_data;  /* Owned by the subsystem. */
> - struct pm_qos_constraints *constraints;
> + struct dev_pm_qos   *qos;
>  };
>  
>  extern void update_pm_runtime_accounting(struct device *dev);
> Index: linux/include/linux/pm_qos.h
> ===
> --- linux.orig/include/linux/pm_qos.h
> +++ linux/include/linux/pm_qos.h
> @@ -57,6 +57,10 @@ struct pm_qos_constraints {
>   struct blocking_notifier_head *notifiers;
>  };
>  
> +struct dev_pm_qos {
> + struct pm_qos_constraints latency;
What about non-latency constraints?  This pretty much makes it explicit
that dev_pm_qos is all about latency.  From the commit comment I thought
you were trying to make it more generic.  Why not call "latency"
"constraint" or something less specific?

--mark

> +};
> +
>  /* Action requested to pm_qos_update_target */
>  enum pm_qos_req_action {
>   PM_QOS_ADD_REQ, /* Add a new request */
> Index: linux/drivers/base/power/qos.c
> ===
> --- linux.orig/drivers/base/power/qos.c
> +++ linux/drivers/base/power/qos.c
> @@ -55,9 +55,7 @@ static BLOCKING_NOTIFIER_HEAD(dev_pm_not
>   */
>  s32 __dev_pm_qos_read_value(struct device *dev)
>  {
> - struct pm_qos_constraints *c = dev->power.constraints;
> -
> - return c ? pm_qos_read_value(c) : 0;
> + return dev->power.qos ? pm_qos_read_value(&dev->power.qos->latency) : 0;
>  }
>  
>  /**
> @@ -91,12 +89,12 @@ static int apply_constraint(struct dev_p
>  {
>   int ret, curr_value;
>  
> - ret = pm_qos_update_target(req->dev->power.constraints,
> + ret = pm_qos_update_target(&req->dev->power.qos->latency,
>  &req->node, action, value);
>  
>   if (ret) {
>   /* Call the global callbacks if needed */
> - curr_value = pm_qos_read_value(req->dev->power.constraints);
> + curr_value = pm_qos_read_value(&req->dev->power.qos->latency);
>   blocking_notifier_call_chain(&dev_pm_notifiers,
>(unsigned long)curr_value,
>req);
> @@ -114,20 +112,22 @@ static int apply_constraint(struct dev_p
>   */
>  static int dev_pm_qos_constraints_allocate(struct device *dev)
>  {
> + struct dev_pm_qos *qos;
>   struct pm_qos_constraints *c;
>   struct blocking_notifier_head *n;
>  
> - c = kzalloc(sizeof(*c), GFP_KERNEL);
> - if (!c)
> + qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> + if (!qos)
>   return -ENOMEM;
>  
>   n = kzalloc(sizeof(*n), GFP_KERNEL);
>   if (!n) {
> - kfree(c);
> + kfree(qos);
>   return -ENOMEM;
>   }
>   BLOCKING_INIT_NOTIFIER_HEAD(n);
>  
> + c = &qos->latency;
>   plist_head_init(&c->list);
>   c->target_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
>   c->default_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
> @@ -135,7 +135,7 @@ static int dev_pm_qos_constraints_alloca
>   c->notifiers = n;
>  
>   spin_lock_irq(&dev->power.lock);
> - dev->power.constraints = c;
> + dev->power.qos = qos;
>   spin_unlock_irq(&dev->power.lock);
>  
>   return 0;
> @@ -151,7 +151,7 @@ static int dev_pm_qos_constraints_alloca
>  void dev_pm_qos_constraints_init(struct device *dev)
>  {
>   mutex_lock(&dev_pm_qos_mtx);
> - dev->power.constraints = NULL;
> + dev->power.qos = NULL;
>   dev->power.power_state = PMSG_ON;
>   mutex_unlock(&dev_pm_qos_mtx);
>  }
> @@ -164,6 +164,7 @@ void dev_pm_qos_constraints_init(struct
>   */
>  void dev_pm_qos_constraints_destroy(struct device *dev)
>  {
> + struct dev_pm_qos *qos;
>   struct dev_pm_qos_request *req, *tmp;
>   struct pm_qos_constraints *c;
>  
> @@ -176,10 +177,11 @@ void dev_pm_qos_constraints_destroy(stru
>
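The core of this refactoring is that the latency constraints become a member embedded in a new wrapper object, so only the wrapper and the notifier head are allocated, and readers treat a missing wrapper as "no constraint". A userspace sketch of that pattern follows, with calloc()/free() standing in for kzalloc()/kfree(); the type names are hypothetical, there is no locking, and a default latency value of 0 is assumed for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model, not the kernel structures. */
struct notifier_head { int dummy; };

struct constraints {
	int32_t target_value;
	int32_t default_value;
	struct notifier_head *notifiers;
};

struct qos {                         /* models struct dev_pm_qos */
	struct constraints latency;  /* embedded, not a pointer */
};

struct device_model {
	struct qos *qos;             /* NULL until first request */
};

#define SKETCH_LAT_DEFAULT 0         /* assumed default value */

static int qos_allocate(struct device_model *dev)
{
	struct qos *qos;
	struct notifier_head *n;
	struct constraints *c;

	qos = calloc(1, sizeof(*qos));
	if (!qos)
		return -ENOMEM;

	n = calloc(1, sizeof(*n));
	if (!n) {
		free(qos);           /* unwind the first allocation */
		return -ENOMEM;
	}

	c = &qos->latency;           /* no separate allocation needed */
	c->target_value = SKETCH_LAT_DEFAULT;
	c->default_value = SKETCH_LAT_DEFAULT;
	c->notifiers = n;

	dev->qos = qos;              /* the kernel does this under a lock */
	return 0;
}

/* A device without a qos object reads as 0, like __dev_pm_qos_read_value(). */
static int32_t qos_read_value(const struct device_model *dev)
{
	return dev->qos ? dev->qos->latency.target_value : 0;
}

static void qos_free(struct device_model *dev)
{
	if (!dev->qos)
		return;
	free(dev->qos->latency.notifiers);
	free(dev->qos);
	dev->qos = NULL;
}
```

Adding another constraint type later then means adding a member to the wrapper rather than another pointer to struct dev_pm_info.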

Re: linux-next: build failure after merge of the origin tree

2012-10-09 Thread Stephen Rothwell

Hi,

On Wed, 10 Oct 2012 08:52:21 +0900 Yasuaki Ishimatsu 
 wrote:
>
> 2012/10/10 8:45, Andrew Morton wrote:
> > On Wed, 10 Oct 2012 10:21:50 +1100 Stephen Rothwell  
> > wrote:
> >
> >> Hi Linus,
> >>
> >> In Linus' tree, today's linux-next build (powerpc ppc64_defconfig) failed
> >> like this:
> >>
> >> arch/powerpc/platforms/pseries/hotplug-memory.c: In function 'pseries_remove_memblock':
> >> arch/powerpc/platforms/pseries/hotplug-memory.c:103:17: error: unused variable 'pfn' [-Werror=unused-variable]
> >>
> >> Caused by commit d760afd4d257 ("memory-hotplug: suppress "Trying to free
> >> nonexistent resource " warning").
> >>
> >> I can't see what the point of the "pfn" variable is
> >
> > This:
> >
> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c~a
> > +++ a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > @@ -101,7 +101,7 @@ static int pseries_remove_memblock(unsig
> > sections_to_remove = (memblock_size >> PAGE_SHIFT) / PAGES_PER_SECTION;
> > for (i = 0; i < sections_to_remove; i++) {
> > unsigned long pfn = start_pfn + i * PAGES_PER_SECTION;
> > -   ret = __remove_pages(zone, start_pfn,  PAGES_PER_SECTION);
> > +   ret = __remove_pages(zone, pfn, PAGES_PER_SECTION);
> > if (ret)
> > return ret;
> > }
> 
> I believe this patch fixes the error.
> Could you try it?

That certainly fixes the build problem.  I can't comment on the semantics
of the patch.

Tested-by: Stephen Rothwell   (Build only)
-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


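The fix Andrew posted makes each loop iteration pass its own section's starting pfn instead of start_pfn every time. A userspace model of the corrected loop is below; fake_remove_pages() is a hypothetical stand-in for __remove_pages() that only records its argument, and the page-shift and pages-per-section values are assumptions chosen for illustration, not the ppc64 configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real ones depend on the configuration. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_PAGES_PER_SECTION 4096UL

/* Hypothetical stand-in for __remove_pages(): records the pfns it gets. */
static unsigned long removed[16];
static size_t nr_removed;

static int fake_remove_pages(unsigned long pfn, unsigned long nr_pages)
{
	(void)nr_pages;
	if (nr_removed >= sizeof(removed) / sizeof(removed[0]))
		return -1;
	removed[nr_removed++] = pfn;
	return 0;
}

static int remove_memblock(unsigned long start_pfn, unsigned long memblock_size)
{
	unsigned long sections_to_remove =
		(memblock_size >> SKETCH_PAGE_SHIFT) / SKETCH_PAGES_PER_SECTION;

	for (unsigned long i = 0; i < sections_to_remove; i++) {
		/* Each iteration targets a different section. */
		unsigned long pfn = start_pfn + i * SKETCH_PAGES_PER_SECTION;
		int ret = fake_remove_pages(pfn, SKETCH_PAGES_PER_SECTION);

		if (ret)
			return ret;
	}
	return 0;
}
```

With start_pfn passed on every iteration instead, the same first section would be requested repeatedly, which is also why the compiler flagged pfn as unused.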

RE: [PATCH v5 2/2] ACPI: Add Intel MID SPI early console support.

2012-10-09 Thread Zheng, Lv

> > +#ifdef CONFIG_EARLY_PRINTK_INTEL_MID_SPI
> > +   if (info->port_type == ACPI_DBG2_SERIAL_PORT
> > +   && info->port_subtype == ACPI_DBG2_INTEL_MID_SPI
> > +   && info->register_count > 0) {
> Is it ever going to be zero?

NAK.
Without a register base definition (buggy BIOS?), an ACPI-launched MID_SPI
earlycon is meaningless.

Thanks and best regards/Lv Zheng
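The check being discussed starts the MID SPI early console only for a serial-type DBG2 entry with the Intel MID SPI subtype and at least one register. A sketch under stated assumptions: the struct is a hypothetical cut-down stand-in for struct acpi_debug_port, 0x8000 is the DBG2 serial port type, and the subtype value is a placeholder because the real constant added to include/acpi/actbl2.h is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DBG2_SERIAL_PORT   0x8000  /* DBG2 "serial" port type */
#define SKETCH_DBG2_INTEL_MID_SPI 0x00ff  /* placeholder subtype value */

/* Hypothetical stand-in for the fields the check looks at. */
struct debug_port_model {
	uint16_t port_type;
	uint16_t port_subtype;
	uint8_t register_count;   /* number of base address registers */
};

/* Nonzero if the MID SPI early console should be started. An entry with
 * no register base (e.g. from a buggy BIOS) is rejected, since there is
 * no address to program. */
static int want_mid_spi_console(const struct debug_port_model *info)
{
	return info->port_type == SKETCH_DBG2_SERIAL_PORT &&
	       info->port_subtype == SKETCH_DBG2_INTEL_MID_SPI &&
	       info->register_count > 0;
}
```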

Re: [PATCH 3/5] mfd: tps65910: Initialize mfd devices after all initialization done

2012-10-09 Thread Mark Brown

On Tue, Oct 09, 2012 at 04:58:34PM +0530, Laxman Dewangan wrote:
> Add the sub devices of tps65910 after all initialization (interrupt,
> clock, etc.) is done. This makes sure that the required data is
> initialized properly before the sub devices' probe functions are called.

Reviewed-by: Mark Brown 

but isn't this needed as a bug fix and so shouldn't it be the first
patch in the series and go in for v3.7?

Re: [PATCH 2/5] mfd: tps65910: move interrupt implementation code to mfd file

2012-10-09 Thread Mark Brown

On Tue, Oct 09, 2012 at 04:58:33PM +0530, Laxman Dewangan wrote:
> Instead of implementing the irq support in a separate file,
> move the implementation to the main mfd file.
> The irq file only contains the table and init steps
> and does not need a separate file just for this
> purpose.

Reviewed-by: Mark Brown 

Re: [PATCH 1/5] mfd: tps65910: use regmap irq framework for interrupt support

2012-10-09 Thread Mark Brown

On Tue, Oct 09, 2012 at 04:58:32PM +0530, Laxman Dewangan wrote:
> Implement irq support for tps65910 with the regmap irq framework
> instead of implementing it locally.
> This reduces the code size significantly and makes it easier to maintain.

Reviewed-by: Mark Brown 

linux-next: manual merge of the kmemleak tree with Linus' tree

2012-10-09 Thread Stephen Rothwell

Hi Catalin,

Today's linux-next merge of the kmemleak tree got a conflict in
mm/kmemleak.c between commit 85d3a316c714 ("kmemleak: use rbtree instead
of prio tree") from Linus' tree and commit 48786770bf3b ("kmemleak: do
not leak object after tree insertion error") from the kmemleak tree.

The kmemleak tree commit has been there since April, should it have
progressed by now?  Its fix is also included in the above commit from
Linus' tree.

I just used the version from Linus' tree and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au


pgp9Oyu22PEGy.pgp
Description: PGP signature

Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-10-09 Thread Andrew Theurer

On Wed, 2012-10-10 at 00:21 +0530, Raghavendra K T wrote:
> * Avi Kivity  [2012-10-04 17:00:28]:
> 
> > On 10/04/2012 03:07 PM, Peter Zijlstra wrote:
> > > On Thu, 2012-10-04 at 14:41 +0200, Avi Kivity wrote:
> > >> 
> > >> Again the numbers are ridiculously high for arch_local_irq_restore.
> > >> Maybe there's a bad perf/kvm interaction when we're injecting an
> > >> interrupt, I can't believe we're spending 84% of the time running the
> > >> popf instruction. 
> > > 
> > > Smells like a software fallback that doesn't do NMI, hrtimer based
> > > sampling typically hits popf where we re-enable interrupts.
> > 
> > Good nose, that's probably it.  Raghavendra, can you ensure that the PMU
> > is properly exposed?  'dmesg' in the guest will tell.  If it isn't, -cpu
> > host will expose it (and a good idea anyway to get best performance).
> > 
> 
> Hi Avi, you are right. SandyBridge machine result was not proper.
> I cleaned up the services, enabled PMU, re-ran all the test again.
> 
> Here is the summary:
> We do get a good benefit from increasing the ple window. Although we don't
> see much benefit for kernbench and sysbench, ebizzy gets a huge
> improvement in the 1x scenario (almost 2/3 of the ple-disabled case).
> 
> Let me know if you think we can increase the default ple_window
> itself to 16k.
> 
> I am experimenting with a V2 of this undercommit improvement patch
> series, but if you wish to increase the default ple_window,
> then we would have to measure the benefit of the patches
> with ple_window = 16k.
> 
> I can respin the whole series including this default ple_window change.
> 
> I also have the perf kvm top result for both ebizzy and kernbench.
> I think they are in expected lines now.
> 
> Improvements
> 
> 
> 16 core PLE machine with 16 vcpu guest
> 
> base = 3.6.0-rc5 + ple handler optimization patches
> base_pleopt_16k = base + ple_window = 16k
> base_pleopt_32k = base + ple_window = 32k
> base_pleopt_nople = base + ple_gap = 0
> kernbench, hackbench, sysbench (time in sec lower is better)
> ebizzy (rec/sec higher is better)
> 
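> % improvements w.r.t base (ple_window = 4k)
> ---------------+-----------------+-----------------+-------------------+
>                | base_pleopt_16k | base_pleopt_32k | base_pleopt_nople |
> ---------------+-----------------+-----------------+-------------------+
> kernbench_1x   |         0.42371 |         1.15164 |           0.09320 |
> kernbench_2x   |        -1.40981 |       -17.48282 |        -570.77053 |
> ---------------+-----------------+-----------------+-------------------+
> sysbench_1x    |        -0.92367 |         0.24241 |          -0.27027 |
> sysbench_2x    |        -2.22706 |        -0.30896 |          -1.27573 |
> sysbench_3x    |        -0.75509 |         0.09444 |          -2.97756 |
> ---------------+-----------------+-----------------+-------------------+
> ebizzy_1x      |        54.99976 |        67.29460 |          74.14076 |
> ebizzy_2x      |        -8.83386 |       -27.38403 |         -96.22066 |
> ---------------+-----------------+-----------------+-------------------+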
> 
> perf kvm top observation for kernbench and ebizzy (nople, 4k, 32k window) 
> 

Is the perf data for 1x overcommit?

> pleopt   ple_gap=0
> 
> ebizzy : 18131 records/s
> 63.78%  [guest.kernel]  [g] _raw_spin_lock_irqsave
> 5.65%  [guest.kernel]  [g] smp_call_function_many
> 3.12%  [guest.kernel]  [g] clear_page
> 3.02%  [guest.kernel]  [g] down_read_trylock
> 1.85%  [guest.kernel]  [g] async_page_fault
> 1.81%  [guest.kernel]  [g] up_read
> 1.76%  [guest.kernel]  [g] native_apic_mem_write
> 1.70%  [guest.kernel]  [g] find_vma

Does 'perf kvm top' not give host samples at the same time?  Would be
nice to see the host overhead as a function of varying ple window.  I
would expect that to be the major difference between 4/16/32k window
sizes.

A big concern I have (if this is 1x overcommit) for ebizzy is that it
has just terrible scalability to begin with.  I do not think we should
try to optimize such a bad workload.

> kernbench :Elapsed Time 29.4933 (27.6007)
>5.72%  [guest.kernel]  [g] async_page_fault
> 3.48%  [guest.kernel]  [g] pvclock_clocksource_read
> 2.68%  [guest.kernel]  [g] copy_user_generic_unrolled
> 2.58%  [guest.kernel]  [g] clear_page
> 2.09%  [guest.kernel]  [g] page_cache_get_speculative
> 2.00%  [guest.kernel]  [g] do_raw_spin_lock
> 1.78%  [guest.kernel]  [g] unmap_single_vma
> 1.74%  [guest.kernel]  [g] kmem_cache_alloc

> 
> pleopt ple_window = 4k
> ---
> ebizzy: 10176 records/s
>69.17%  [guest.kernel]  [g] _raw_spin_lock_irqsave
> 3.34%  [guest.kernel]  [g] clear_page
> 2.16%  [guest.kernel]  [g] down_read_trylock
> 1.94%  [guest.kernel]  [g] async_page_fault
> 1.89%  [guest.kernel]  [g] native_apic_mem_write
> 1.63%  [guest.kernel]  [g]
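The results table reports "% improvements w.r.t base" for both time-based benchmarks (lower is better) and ebizzy throughput (higher is better) without stating the formula. One plausible convention, assumed here and not confirmed by the thread, is to flip the sign for time-based results so that a positive value always means "better than ple_window = 4k":

```c
#include <assert.h>

/* Assumed convention: positive means better than the base run. */

/* Time-based results (kernbench, sysbench, hackbench): lower is better. */
static double improvement_time(double base, double result)
{
	return (base - result) / base * 100.0;
}

/* Throughput results (ebizzy records/s): higher is better. */
static double improvement_rate(double base, double result)
{
	return (result - base) / base * 100.0;
}
```

Under this convention the large negative kernbench_2x and ebizzy_2x entries would both read as regressions relative to the 4k window.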

Re: [PATCH v2] Makefile: Add option to disable gcc automatic inlining

2012-10-09 Thread Ezequiel Garcia

Hey Richard,

On Sun, Oct 7, 2012 at 7:30 PM, richard -rw- weinberger
 wrote:
> On Sun, Oct 7, 2012 at 9:18 PM, Ezequiel Garcia  wrote:
>> The new option is CONFIG_CC_DISABLE_AUTO_INLINE and it's
>> located at:
>>   * Kernel hacking
>> * Disable gcc automatic inlining
>
> Can you guarantee that this will still produce a valid kernel?
> AFAIK we have some functions which have to be inlined in any case.
>

As a matter of fact, you're right... this breaks compilation on my
desktop machine.
I guess I'll have to give it a respin :-(

Thanks for the feedback.

Ezequiel

linux-next: manual merge of the tip tree with Linus' tree

2012-10-09 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in mm/mempolicy.c
between commit 63f74ca21f1f ("mempolicy: fix refcount leak in
mpol_set_shared_policy()") from Linus' tree and commit 4d58c795f691
("mm/mpol: Check for misplaced page") from the tip tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc mm/mempolicy.c
index 0b78fb9,3360a8d..000
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@@ -2170,12 -2168,116 +2203,122 @@@ mpol_shared_policy_lookup(struct shared
return pol;
  }
  
 +static void sp_free(struct sp_node *n)
 +{
 +  mpol_put(n->policy);
 +  kmem_cache_free(sn_cache, n);
 +}
 +
+ /**
+  * mpol_misplaced - check whether current page node is valid in policy
+  *
+  * @page   - page to be checked
+  * @vma- vm area where page mapped
+  * @addr   - virtual address where page mapped
+  * @multi  - use multi-stage node binding
+  *
+  * Lookup current policy node id for vma,addr and "compare to" page's
+  * node id.
+  *
+  * Returns:
+  *-1  - not misplaced, page is in the right node
+  *node- node id where the page should be
+  *
+  * Policy determination "mimics" alloc_page_vma().
+  * Called from fault path where we know the vma and faulting address.
+  */
+ int mpol_misplaced(struct page *page, struct vm_area_struct *vma,
+  unsigned long addr, int multi)
+ {
+   struct mempolicy *pol;
+   struct zone *zone;
+   int curnid = page_to_nid(page);
+   unsigned long pgoff;
+   int polnid = -1;
+   int ret = -1;
+ 
+   BUG_ON(!vma);
+ 
+   pol = get_vma_policy(current, vma, addr);
+   if (!(pol->flags & MPOL_F_MOF))
+   goto out;
+ 
+   switch (pol->mode) {
+   case MPOL_INTERLEAVE:
+   BUG_ON(addr >= vma->vm_end);
+   BUG_ON(addr < vma->vm_start);
+ 
+   pgoff = vma->vm_pgoff;
+   pgoff += (addr - vma->vm_start) >> PAGE_SHIFT;
+   polnid = offset_il_node(pol, vma, pgoff);
+   break;
+ 
+   case MPOL_PREFERRED:
+   if (pol->flags & MPOL_F_LOCAL)
+   polnid = numa_node_id();
+   else
+   polnid = pol->v.preferred_node;
+   break;
+ 
+   case MPOL_BIND:
+   /*
+* allows binding to multiple nodes.
+* use current page if in policy nodemask,
+* else select nearest allowed node, if any.
+* If no allowed nodes, use current [!misplaced].
+*/
+   if (node_isset(curnid, pol->v.nodes))
+   goto out;
+   (void)first_zones_zonelist(
+   node_zonelist(numa_node_id(), GFP_HIGHUSER),
+   gfp_zone(GFP_HIGHUSER),
+   &pol->v.nodes, &zone);
+   polnid = zone->node;
+   break;
+ 
+   default:
+   BUG();
+   }
+ 
+   /*
+* Multi-stage node selection is used in conjunction with a periodic
+* migration fault to build a temporal task<->page relation. By
+* using a two-stage filter we remove short/unlikely relations.
+*
+* Using P(p) ~ n_p / n_t as per frequentist probability, we can
+* equate a task's usage of a particular page (n_p) per total usage
+* of this page (n_t) (in a given time-span) to a probability.
+*
+* Our periodic faults will then sample this probability and getting
+* the same result twice in a row, given these samples are fully
+* independent, is then given by P(p)^2, provided our sample period
+* is sufficiently short compared to the usage pattern.
+*
+* This quadric squishes small probabilities, making it less likely
+* we act on an unlikely task<->page relation.
+*
+* NOTE: effectively we're using task-home-node<->page-node relations
+* since those are the only thing we can affect.
+*
+* NOTE: we're using task-home-node as opposed to the current node
+* the task might be running on, since the task-home-node is the
+* long-term node of this task, further reducing noise. Also see
+* task_tick_numa().
+*/
+   if (multi && (pol->flags & MPOL_F_HOME)) {
+   int last_nid = page_xchg_last_nid(page, polnid);
+   if (last_nid != polnid)
+   goto out;
+   }
+ 
+   if (curnid != polnid)
+   ret = polnid;
+ out:
+   mpol_cond_put(pol);
+ 
+   return ret;
+ }
+ 
  static void sp_delete(struct shared_policy *sp, struct sp_node *n)
  {
pr_debug("deleting %lx-l%lx\n", n->start, n->end);
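
This is a user-space sketch of two pieces of mpol_misplaced() above, with hypothetical names. interleave_nid() picks the n-th node of the policy mask for a given page offset. That is the idea behind offset_il_node(), not its exact code. two_stage_target() models the page_xchg_last_nid() filter, which acts only when two consecutive samples agree. If each sample independently picks a node with probability p, the chance that two samples agree is p^2. A relation seen 30% of the time therefore passes the filter only about 9% of the time. The sketch assumes 4 KiB pages and at most as many nodes as bits in an unsigned long:

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Number of nodes set in a mask (one bit per node). */
static int mask_weight(unsigned long mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Interleave: page offset modulo the number of allowed nodes selects
 * the n-th allowed node. Returns -1 for an empty mask. */
static int interleave_nid(unsigned long nodes, unsigned long vm_pgoff,
			  unsigned long vm_start, unsigned long addr)
{
	unsigned long pgoff = vm_pgoff + ((addr - vm_start) >> SKETCH_PAGE_SHIFT);
	int nnodes = mask_weight(nodes);
	unsigned long target;
	int nid;

	if (nnodes == 0)
		return -1;
	target = pgoff % (unsigned long)nnodes;
	for (nid = 0; nid < (int)(CHAR_BIT * sizeof(nodes)); nid++) {
		if (!(nodes & (1UL << nid)))
			continue;
		if (target == 0)
			return nid;
		target--;
	}
	return -1;	/* unreachable for a non-empty mask */
}

/* Two-stage filter: *last_nid plays the role of the per-page value that
 * page_xchg_last_nid() swaps. Returns the node to migrate to, or -1. */
static int two_stage_target(int *last_nid, int curnid, int polnid)
{
	int prev = *last_nid;

	*last_nid = polnid;	/* remember this sample for next time */
	if (prev != polnid)
		return -1;	/* previous sample disagreed: do not act yet */
	return curnid != polnid ? polnid : -1;
}
```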



linux-next: manual merge of the tip tree with Linus' tree

2012-10-09 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in
mm/huge_memory.c between commits d516904bd239 ("thp: merge page pre-alloc
in khugepaged_loop into khugepaged_do_scan"), e3ebcf643811 ("thp: remove
assumptions on pgtable_t type") and 46dcde735c9d ("thp: introduce
pmdp_invalidate()") from Linus' tree and commit 93c9d633bd9e ("mm/thp:
Preserve pgprot across huge page split") from the tip tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc mm/huge_memory.c
index a863af2,5b9ab25..000
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@@ -1346,59 -1428,55 +1417,54 @@@ static int __split_huge_page_map(struc
spin_lock(&mm->page_table_lock);
pmd = page_check_address_pmd(page, mm, address,
 PAGE_CHECK_ADDRESS_PMD_SPLITTING_FLAG);
-   if (pmd) {
-   pgtable = pgtable_trans_huge_withdraw(mm);
-   pmd_populate(mm, &_pmd, pgtable);
- 
-   haddr = address;
-   for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
-   pte_t *pte, entry;
-   BUG_ON(PageCompound(page+i));
-   entry = mk_pte(page + i, vma->vm_page_prot);
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-   if (!pmd_write(*pmd))
-   entry = pte_wrprotect(entry);
-   else
-   BUG_ON(page_mapcount(page) != 1);
-   if (!pmd_young(*pmd))
-   entry = pte_mkold(entry);
-   pte = pte_offset_map(&_pmd, haddr);
-   BUG_ON(!pte_none(*pte));
-   set_pte_at(mm, haddr, pte, entry);
-   pte_unmap(pte);
-   }
+   if (!pmd)
+   goto unlock;
  
-   smp_wmb(); /* make pte visible before pmd */
-   /*
-* Up to this point the pmd is present and huge and
-* userland has the whole access to the hugepage
-* during the split (which happens in place). If we
-* overwrite the pmd with the not-huge version
-* pointing to the pte here (which of course we could
-* if all CPUs were bug free), userland could trigger
-* a small page size TLB miss on the small sized TLB
-* while the hugepage TLB entry is still established
-* in the huge TLB. Some CPU doesn't like that. See
-* http://support.amd.com/us/Processor_TechDocs/41322.pdf,
-* Erratum 383 on page 93. Intel should be safe but is
-* also warns that it's only safe if the permission
-* and cache attributes of the two entries loaded in
-* the two TLB is identical (which should be the case
-* here). But it is generally safer to never allow
-* small and huge TLB entries for the same virtual
-* address to be loaded simultaneously. So instead of
-* doing "pmd_populate(); flush_tlb_range();" we first
-* mark the current pmd notpresent (atomically because
-* here the pmd_trans_huge and pmd_trans_splitting
-* must remain set at all times on the pmd until the
-* split is complete for this pmd), then we flush the
-* SMP TLB and finally we write the non-huge version
-* of the pmd entry with pmd_populate.
-*/
-   pmdp_invalidate(vma, address, pmd);
-   pmd_populate(mm, pmd, pgtable);
-   ret = 1;
+   prot = pmd_pgprot(*pmd);
 -  pgtable = get_pmd_huge_pte(mm);
++  pgtable = pgtable_trans_huge_withdraw(mm);
+   pmd_populate(mm, &_pmd, pgtable);
+ 
+   for (i = 0, haddr = address; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+   pte_t *pte, entry;
+ 
+   BUG_ON(PageCompound(page+i));
+   entry = mk_pte(page + i, prot);
+   entry = pte_mkdirty(entry);
+   if (!pmd_young(*pmd))
+   entry = pte_mkold(entry);
+   pte = pte_offset_map(&_pmd, haddr);
+   BUG_ON(!pte_none(*pte));
+   set_pte_at(mm, haddr, pte, entry);
+   pte_unmap(pte);
}
+ 
+   smp_wmb(); /* make ptes visible before pmd, see __pte_alloc */
+   /*
+* Up to this point the pmd is present and huge.
+*
+* If we overwrite the pmd with the not-huge version, we could trigger
+* a small page size TLB miss on the small sized TLB while the hugepage
+* TLB entry is still established in the huge TLB.
+*
+* Some CPUs don't like that. See
+*
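
The long comment in this hunk describes an ordering protocol. The toy model below is plain C, not kernel code, and all names in it are hypothetical. It contrasts the naive "pmd_populate(); flush_tlb_range();" order with the invalidate, flush, populate order that pmdp_invalidate() enables. It rests on the simplifying assumption that any access may refill the TLB from the current pmd:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one huge-page-sized range during a split. */
enum toy_pmd { PMD_HUGE, PMD_NOT_PRESENT, PMD_TABLE };

struct toy_mm {
	enum toy_pmd pmd;
	bool huge_tlb;	/* a huge translation is cached */
	bool small_tlb;	/* a small translation is cached */
};

/* Worst case: any access may miss in the matching TLB and refill it
 * from the current pmd. A not-present pmd faults and caches nothing. */
static void toy_access(struct toy_mm *mm)
{
	if (mm->pmd == PMD_HUGE)
		mm->huge_tlb = true;
	else if (mm->pmd == PMD_TABLE)
		mm->small_tlb = true;
}

/* The situation the comment warns about: both sizes cached at once. */
static bool toy_conflict(const struct toy_mm *mm)
{
	return mm->huge_tlb && mm->small_tlb;
}

static void toy_flush_tlb(struct toy_mm *mm)
{
	mm->huge_tlb = false;
	mm->small_tlb = false;
}

/* Naive order. Returns true if a conflict could be observed. */
static bool split_populate_then_flush(struct toy_mm *mm)
{
	bool seen;

	toy_access(mm);			/* huge entry is cached */
	mm->pmd = PMD_TABLE;		/* pmd_populate() */
	toy_access(mm);			/* small-TLB miss loads a pte */
	seen = toy_conflict(mm);
	toy_flush_tlb(mm);		/* flush_tlb_range() */
	toy_access(mm);
	return seen || toy_conflict(mm);
}

/* Order from the patch: pmdp_invalidate() marks the pmd not present and
 * flushes, and only then does pmd_populate() install the page table. */
static bool split_invalidate_first(struct toy_mm *mm)
{
	bool seen;

	toy_access(mm);			/* huge entry is cached */
	mm->pmd = PMD_NOT_PRESENT;	/* pmdp_invalidate(), part 1 */
	toy_access(mm);			/* would fault, caches nothing */
	seen = toy_conflict(mm);
	toy_flush_tlb(mm);		/* pmdp_invalidate(), part 2 */
	mm->pmd = PMD_TABLE;		/* pmd_populate() */
	toy_access(mm);
	return seen || toy_conflict(mm);
}
```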

RE: [PATCH 00/16] f2fs: introduce flash-friendly file system

2012-10-09 Thread Jaegeuk Kim

> -Original Message-
> From: linux-fsdevel-ow...@vger.kernel.org [mailto:linux-fsdevel-ow...@vger.kernel.org] On Behalf Of Dave Chinner
> Sent: Wednesday, October 10, 2012 6:20 AM
> To: Jaegeuk Kim
> Cc: 'Lukáš Czerner'; 'Namjae Jeon'; 'Vyacheslav Dubeyko'; 'Marco Stornelli'; 'Jaegeuk Kim'; 'Al Viro';
> ty...@mit.edu; gre...@linuxfoundation.org; linux-kernel@vger.kernel.org; chur@samsung.com;
> cm224@samsung.com; jooyoung.hw...@samsung.com; linux-fsde...@vger.kernel.org
> Subject: Re: [PATCH 00/16] f2fs: introduce flash-friendly file system
> 
> [ Folks, can you trim your responses down to just quote the part you
> are responding to? Having to repeatedly scroll through 500 lines of
> irrelevant text just to find the 5 lines that is being commented on
> is exceedingly painful.  ]

Ok, I'll keep that in mind.
Thanks.

> 
> On Tue, Oct 09, 2012 at 09:01:18PM +0900, Jaegeuk Kim wrote:
> > > From: Lukáš Czerner [mailto:lczer...@redhat.com]
> > > > > I am sorry but this reply makes me smile. How can you design a fs
> > > > > relying on time attack heuristics to figure out what the proper
> > > > > layout should be ? Or even endorse such heuristics to be used in
> > > > > mkfs ? What we should be focusing on is to push vendors to actually
> > > > > give us such information so we can properly propagate that
> > > > > throughout the kernel - that's something everyone will benefit from.
> > > > > After that the optimization can be done in every file system.
> > > > >
> > > >
> > > > Frankly speaking, I agree that it would be the right direction eventually.
> > > > But, as you know, it's very difficult for all flash vendors to promote and
> > > > standardize that, because each vendor has a different strategy for opening up
> > > > its internal information, and each also tries to protect its secrets, whatever
> > > > they are.
> > > >
> > > > IMO, we don't need to wait for them now.
> > > > Instead, from the start, I propose f2fs, which uses that information in the
> > > > file system design.
> > > > In addition, I suggest using heuristics right now as a best effort.
> 
> And in response, other people are "suggesting" that this is the
> wrong approach.

Ok, it makes sense.
I agree that the Linaro survey has been progressing well, and no more heuristics
are needed.

> 
> > > > Maybe in the future, if vendors provide something, f2fs will become more feasible.
> > > > In the meantime, I strongly hope to validate and stabilize f2fs with the
> > > > community.
> > >
> > > Do not get me wrong, I do not think it is worth waiting for vendors
> > > to come to their senses, but it is worth constantly reminding them that
> > > we *need* this kind of information and those heuristics are not
> > > feasible in the long run anyway.
> > >
> > > I believe that this conversation has happened several times already, but
> > > what about having an independent public database of all the internal
> > > information about hw from different vendors, where users can add
> > > information gathered by the timing attack heuristic so others do not
> > > have to run it again and again? I am not sure if Linaro or someone
> > > else has something like that; maybe someone can post a link to it.
> 
> Linaro already have one, which is another reason why using
> heuristics is the wrong approach:
> 
> https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey?action=show=WorkingGroups%2FKernelConsolidation%2FProjects%2FFlashCardSurvey
> 
> > As I mentioned, I agree that we should keep pushing vendors to open up that information.
> > And I absolutely didn't mean that it is worth waiting for vendors.
> > I meant that, until vendors open up that information, something like
> > proposing f2fs or gathering heuristics is also needed in parallel.
> >
> > Anyway, building a database that gathers products' information is very interesting.
> > May I access the database?
> 
> It's public information.
> 
> If you want to support different types of flash, then either add
> your timing attack derived information on specific hardware to the
> above table, or force vendors to update it themselves if they want
> their flash memory supported by this filesystem.

Sounds good.
If I gather some information too, I'll try adding it.
Thank you.
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com


RE: [PATCH 2/2] Drivers: hv: Add Hyper-V balloon driver

2012-10-09 Thread KY Srinivasan



> -Original Message-
> From: Andrew Morton [mailto:a...@linux-foundation.org]
> Sent: Tuesday, October 09, 2012 9:15 PM
> To: KY Srinivasan
> Cc: gre...@linuxfoundation.org; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> a...@firstfloor.org
> Subject: Re: [PATCH 2/2] Drivers: hv: Add Hyper-V balloon driver
> 
> On Wed, 10 Oct 2012 00:09:12 + KY Srinivasan  wrote:
> 
> > > > +   if (!pg) {
> > > > +   *alloc_error = true;
> > > > +   return i * alloc_unit;
> > > > +   }
> > > > +
> > > > +   totalram_pages -= alloc_unit;
> > >
> > > Well, I'd consider totalram_pages to be an mm-private thing which drivers
> > > shouldn't muck with.  Why is this done?
> >
> > By modifying the totalram_pages, the information presented in /proc/meminfo
> > correctly reflects what is currently assigned to the guest (MemTotal).
> 
> eh?  /proc/meminfo:MemTotal tells you the total memory in the machine.
> The only thing which should change it after boot is memory hotplug.
> 
> Modifying it in this manner puts the statistic into a state know as
> "wrong".  And temporarily modifying it in this fashion will cause the
> tremendous amount of initialisation code which relies upon
> totalram_pages for sizing to also enter the "wrong" state.
> 
> Why on earth do balloon drivers do this?  If the amount of memory which
> is consumed by balloons is interesting then it should be exported via a
> standalone metric, not by mucking with totalram_pages.

I see your point. I will get rid of the code that manipulates
totalram_pages.

Regards,

K. Y
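
Andrew's suggestion is to leave totalram_pages alone and export balloon usage as a standalone metric. The following is a minimal user-space sketch of that split. All names are hypothetical (struct balloon_stats, balloon_alloc(), balloon_free(), balloon_guest_pages()), and malloc() stands in for the page allocator:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical statistics block: the machine total is set once and left
 * alone; balloon usage is tracked and exported separately. */
struct balloon_stats {
	unsigned long total_pages;
	unsigned long ballooned_pages;
};

/* Try to take num_units chunks of alloc_unit pages each. On failure,
 * set *alloc_error and return the number of pages obtained so far,
 * like the "return i * alloc_unit" path in the quoted hunk. */
static unsigned long balloon_alloc(struct balloon_stats *st, void **chunks,
				   unsigned long num_units,
				   unsigned long alloc_unit, size_t page_size,
				   bool *alloc_error)
{
	unsigned long i;

	*alloc_error = false;
	for (i = 0; i < num_units; i++) {
		chunks[i] = malloc(alloc_unit * page_size);
		if (!chunks[i]) {
			*alloc_error = true;
			return i * alloc_unit;
		}
		st->ballooned_pages += alloc_unit;	/* total_pages untouched */
	}
	return num_units * alloc_unit;
}

/* Give back the first num_units chunks. */
static void balloon_free(struct balloon_stats *st, void **chunks,
			 unsigned long num_units, unsigned long alloc_unit)
{
	unsigned long i;

	for (i = 0; i < num_units; i++) {
		free(chunks[i]);
		chunks[i] = NULL;
		st->ballooned_pages -= alloc_unit;
	}
}

/* What a separate "memory currently usable by the guest" metric could report. */
static unsigned long balloon_guest_pages(const struct balloon_stats *st)
{
	return st->total_pages - st->ballooned_pages;
}
```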
 



Re: [ 068/108] HID: hidraw: add proper error handling to raw event reporting

2012-10-09 Thread Ben Hutchings

On Mon, 2012-10-08 at 16:34 -0300, Herton Ronaldo Krzesinski wrote:
> On Sun, Oct 07, 2012 at 11:59:42PM +0100, Ben Hutchings wrote:
> > 3.2-stable review patch.  If anyone has any objections, please let me know.
> > 
> > --
> > 
> > From: Jiri Kosina 
> > 
> > commit b6787242f32700377d3da3b8d788ab3928bab849 upstream.
> > 
> > If kmemdup() in hidraw_report_event() fails, we are not propagating
> > this fact properly.
> > 
> > Let hidraw_report_event() and hid_report_raw_event() return an error
> > value to the caller.
> 
> This needs in addition a small fix on top, commit d6d7c87
> ("HID: fix return value of hidraw_report_event() when !CONFIG_HIDRAW")

I've added that, thanks.

Ben.

-- 
Ben Hutchings
Who are all these weirdos? - David Bowie, about L-Space IRC channel #afp
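
The pattern in this fix, together with the follow-up for !CONFIG_HIDRAW, can be sketched in user space as shown here. All names are hypothetical, and dup_buf() stands in for kmemdup(). The sketch assumes that the compiled-out stub should report success (0) so that callers do not see a spurious error:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for kmemdup(). */
static void *dup_buf(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Feature built in: report allocation failure instead of ignoring it. */
static int report_event_enabled(const unsigned char *data, size_t len,
				void **slot)
{
	*slot = dup_buf(data, len);
	return *slot ? 0 : -ENOMEM;
}

/* Feature compiled out: nothing to queue, so report success. */
static int report_event_disabled(const unsigned char *data, size_t len,
				 void **slot)
{
	(void)data;
	(void)len;
	*slot = NULL;
	return 0;
}

/* SKETCH_CONFIG_HIDRAW is a hypothetical stand-in for the Kconfig symbol. */
#ifdef SKETCH_CONFIG_HIDRAW
#define report_event report_event_enabled
#else
#define report_event report_event_disabled
#endif

/* The caller propagates a negative errno instead of swallowing it. */
static int raw_event(const unsigned char *data, size_t len, void **slot)
{
	int ret = report_event(data, len, slot);

	if (ret < 0)
		return ret;
	/* ... hand the report to other consumers here ... */
	return 0;
}
```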



Re: [PATCH] MM: Support more pagesizes for MAP_HUGETLB/SHM_HUGETLB v3

2012-10-09 Thread Andi Kleen


Thanks for the review.

> > I also exported the new flags to the user headers
> > (they were previously under __KERNEL__). Right now only symbols
> > for x86 and some other architecture for 1GB and 2MB are defined.
> > The interface should already work for all other architectures
> > though.
> 
> So some manpages need updating.  I'm not sure which - mmap(2) surely,
> but which for the IPC change?

mmap and shmget. Was already planned.

> 
> > v2: Port to new tree. Fix unmount.
> > v3: Ported to latest tree.
> > Acked-by: Rik van Riel 
> > Acked-by: KAMEZAWA Hiroyuki 
> > Signed-off-by: Andi Kleen 
> > ---
> >  arch/x86/include/asm/mman.h |3 ++
> >  fs/hugetlbfs/inode.c|   63 ++-
> >  include/asm-generic/mman.h  |   13 +
> >  include/linux/hugetlb.h |   12 +++-
> >  include/linux/shm.h |   19 +
> >  ipc/shm.c   |3 +-
> >  mm/mmap.c   |5 ++-
> 
> Alas, include/asm-generic/mman.h doesn't exist now.
> 
> Does this change touch all the hugetlb-capable architectures?

Right now, symbols are defined only for the 1GB and 2MB page sizes on x86
and some other architectures. The interface should already work for all
other architectures, though.

So they can add new symbols for their page sizes at their leisure.
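
Andi's description and the get_hstate_idx(int page_size_log) hunk suggest that the flags carry log2 of the page size. The sketch below assumes the encoding that mainline later adopted: MAP_HUGE_SHIFT is 26, and the size log occupies a 6-bit field. The v3 patch discussed here may differ from that encoding. huge_mmap_flags() and huge_flags_size_log() are hypothetical helpers:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26	/* assumed encoding, see lead-in */
#endif
#ifndef MAP_HUGE_MASK
#define MAP_HUGE_MASK 0x3f
#endif

/* Hypothetical helper: mmap() flags for an anonymous private mapping
 * backed by huge pages of 2^size_log bytes (0 = default size). */
static int huge_mmap_flags(unsigned int size_log)
{
	unsigned int sizebits = (size_log & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;

	return MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | (int)sizebits;
}

/* Hypothetical helper: recover the page-size log from a flags word,
 * as the kernel side would before choosing a hugetlbfs mount. */
static unsigned int huge_flags_size_log(int flags)
{
	return ((unsigned int)flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK;
}
```

For example, mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE, huge_mmap_flags(21), -1, 0) asks for one 2 MB page. The call returns MAP_FAILED if no pages of that size are reserved, so callers should check for that.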

> > return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
> >  }
> >  
> > +static int get_hstate_idx(int page_size_log)
> 
> nitlet: "page_size_order" would be more kernely.  Or just "page_order".

It's not really an order, just the index. I think I would prefer the current
name; "order" would be misleading.

For x86 it's only 0 and 1

> > +   if (IS_ERR(hugetlbfs_vfsmount[i])) {
> > +   pr_err(
> > +   "hugetlb: Cannot mount internal hugetlbfs for page size %uK",
> > +  ps_kb);
> > +   error = PTR_ERR(hugetlbfs_vfsmount[i]);
> > +   }
> > +   i++;
> > +   }
> > +   /* Non default hstates are optional */
> > +   if (hugetlbfs_vfsmount[default_hstate_idx])
> > +   return 0;
> 
> hm, so if I'm understanding this, the patch mounts hugetlbfs N times,
> once for each page size.  And presumably the shm code somehow selects
> one of these mounts, based on incoming flags.  And presumably if those
> flags are all-zero, the behaviour is unaltered.

Yes.
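
Andrew's summary can be made concrete with a small lookup. The sketch assumes, hypothetically, that index 0 holds the default size and that the flags carry log2 of the page size. A zero value selects the default mount, so old callers are unaffected, and an unsupported size is rejected:

```c
#include <assert.h>

/* Hypothetical table of supported huge page sizes as log2(bytes); index
 * 0 is taken to be the default size. On x86 this would be 2 MB, 1 GB. */
static const int hstate_size_log[] = { 21, 30 };
#define NR_SKETCH_HSTATES \
	((int)(sizeof(hstate_size_log) / sizeof(hstate_size_log[0])))

/* Index of the internal mount to use for a page-size log taken from the
 * flags. 0 means "no size requested", i.e. the default mount. Returns
 * -1 for a size the table does not contain. */
static int select_hstate_idx(int page_size_log)
{
	int i;

	if (page_size_log == 0)
		return 0;
	for (i = 0; i < NR_SKETCH_HSTATES; i++)
		if (hstate_size_log[i] == page_size_log)
			return i;
	return -1;
}
```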

> 
> Please update the changelog to describe all this - the overview of how
> the patch actually operates.

Ok.

> 
> Also, all this affects the /proc/mounts contents, yes?  Let's changelog
> that very-slightly-non-back-compatible user-visible change as well.

AFAIK not. The internal mounts are not visible. At least my laptop
doesn't show them.

> There's some overhead to doing all those additional mounts.  Can we
> quantify it?

On x86 it's one more mount (1GB). AFAIK it's just the sb structure, there's
nothing else preallocated. Maybe a couple hundred bytes per page size.

The number of huge page sizes is normally small, I don't think any architecture
has a large number.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only

Re: [ 028/108] ARM: imx: armadillo5x0: Fix illegal register access

2012-10-09 Thread Ben Hutchings

On Sun, 2012-10-07 at 23:37 +, Estevam Fabio-R49496 wrote:
> No, please drop it from 3.2-stable.
> 
> This one should only go to 3.5/3.6 stable trees.
> 
> Otherwise it will break 3.2-stable.

Thanks, I've dropped this.

Ben.

-- 
Ben Hutchings
Who are all these weirdos? - David Bowie, about L-Space IRC channel #afp



Re: [PATCH] mm: Fix XFS oops due to dirty pages without buffers on s390

2012-10-09 Thread Hugh Dickins

On Tue, 9 Oct 2012, Jan Kara wrote:
> On Mon 08-10-12 21:24:40, Hugh Dickins wrote:
> > On Mon, 1 Oct 2012, Jan Kara wrote:
> > 
> > > On s390 any write to a page (even from kernel itself) sets architecture
> > > specific page dirty bit. Thus when a page is written to via standard
> > > write, HW dirty bit gets set and when we later map and unmap the page,
> > > page_remove_rmap() finds the dirty bit and calls set_page_dirty().
> > > 
> > > Dirtying of a page which shouldn't be dirty can cause all sorts of
> > > problems to filesystems. The bug we observed in practice is that buffers
> > > from the page get freed, so when the page gets later marked as dirty and
> > > writeback writes it, XFS crashes due to an assertion
> > > BUG_ON(!PagePrivate(page)) in page_buffers() called from
> > > xfs_count_page_state().
> > 
> > What changed recently?  Was XFS hardly used on s390 until now?
>   The problem was originally hit on SLE11-SP2 which is 3.0 based after
> migration of our s390 build machines from SLE11-SP1 (2.6.32 based). I think
> XFS just started to be more peevish about what pages it gets between these
> two releases ;) (e.g. ext3 or ext4 just says "oh, well" and fixes things
> up).

Right, in 2.6.32 xfs_vm_writepage() had a !page_has_buffers(page) case,
whereas by 3.0 that had become ASSERT(page_has_buffers(page)), with the
ASSERT usually compiled out, stumbling later in page_buffers() as you say.

> 
> > > Similar problem can also happen when zero_user_segment() call from
> > > xfs_vm_writepage() (or block_write_full_page() for that matter) set the
> > > hardware dirty bit during writeback, later buffers get freed, and then
> > > page unmapped.

Similar problem, or is that the whole of the problem?  Where else does
the page get written to, after clearing page dirty?  (It may not be worth
spending time to answer me, I feel I'm wasting too much time on this.)

I keep trying to put my finger on the precise bug.  I said in earlier
mails to Mel and to Martin that we're mixing a bugfix and an optimization,
but I cannot quite point to the bug.  Could one say that it's precisely at
the "page straddles i_size" zero_user_segment(), in XFS or in other FSes,
and that the storage key ought to be re-cleaned after it?

What if one day I happened to copy that code into shmem_writepage()?
I've no intention to do so!  And it wouldn't cause a BUG.  Ah, and we
never write shmem to swap while it's still mapped, so it wouldn't even
have a chance to redirty the page in page_remove_rmap().

I guess I'm worrying too much; but it's not crystal clear to me why any
!mapping_cap_account_dirty mapping would necessarily not have the problem.

> > But here's where I think the problem is.  You're assuming that all
> > filesystems go the same mapping_cap_account_writeback_dirty() (yeah,
> > there's no such function, just a confusing maze of three) route as XFS.
> > 
> > But filesystems like tmpfs and ramfs (perhaps they're the only two
> > that matter here) don't participate in that, and wait for an mmap'ed
> > page to be seen modified by the user (usually via pte_dirty, but that's
> > a no-op on s390) before page is marked dirty; and page reclaim throws
> > away undirtied pages.
>   I admit I haven't thought of tmpfs and similar. After some discussion Mel
> pointed me to the code in mmap which makes a difference. So if I get it
> right, the difference which causes us problems is that on tmpfs we map the
> page writeably even during read-only fault. OK, then if I make the above
> code in page_remove_rmap():
>   if ((PageSwapCache(page) ||
>(!anon && !mapping_cap_account_dirty(page->mapping))) &&
>   page_test_and_clear_dirty(page_to_pfn(page), 1))
>   set_page_dirty(page);
> 
>   Things should be ok (modulo the ugliness of this condition), right?

(Setting aside my reservations above...) That's almost exactly right, but
I think the issue of a racing truncation (which could reset page->mapping
to NULL at any moment) means we have to be a bit more careful.  Usually
we guard against that with page lock, but here we can rely on mapcount.

page_mapping(page), with its built-in PageSwapCache check, actually ends
up making the condition look less ugly; and so far as I could tell,
the extra code does get optimized out on x86 (unless CONFIG_DEBUG_VM,
when we are left with its VM_BUG_ON(PageSlab(page))).

But please look this over very critically and test (and if you like it,
please adopt it as your own): I'm not entirely convinced yet myself.

(One day, I do want to move that block further down page_remove_rmap(),
outside the mem_cgroup_[begin,end]_update_stat() bracketing: I don't think
there's an actual problem at present in calling set_page_dirty() there,
but I have seen patches which could give it a lock-ordering issue, so
better to untangle them.  No reason to muddle that in with your fix,
but I thought I'd mention it while we're all staring at this.)
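
The following is a toy model of the condition Hugh describes. It uses hypothetical types and is not his patch, which is not quoted here. The mapping is sampled through a page_mapping()-like helper before the mapcount drop. Swap cache pages resolve to a swapper mapping that is assumed not to account dirty pages. The s390 storage-key dirty bit is transferred only for mappings outside dirty accounting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mapping {
	bool cap_account_dirty;		/* fs takes part in dirty accounting */
};

struct toy_page {
	struct toy_mapping *mapping;	/* NULL for anon; truncation may clear it */
	bool swapcache;
	int mapcount;			/* number of ptes mapping the page */
	bool key_dirty;			/* s390 storage-key dirty bit */
	bool dirty;			/* PG_dirty */
};

/* Stand-in for swapper_space, assumed not to account dirty pages. */
static struct toy_mapping toy_swapper_space = { false };

/* Like page_mapping(): swap cache pages resolve to the swapper mapping. */
static struct toy_mapping *toy_page_mapping(const struct toy_page *page)
{
	return page->swapcache ? &toy_swapper_space : page->mapping;
}

static bool toy_test_and_clear_key_dirty(struct toy_page *page)
{
	bool was_dirty = page->key_dirty;

	page->key_dirty = false;
	return was_dirty;
}

static void toy_page_remove_rmap(struct toy_page *page)
{
	/* Sample the mapping while still mapped; afterwards a racing
	 * truncation could reset page->mapping at any moment. */
	struct toy_mapping *mapping = toy_page_mapping(page);

	if (--page->mapcount > 0)
		return;			/* other ptes still map the page */

	/* Only mappings outside dirty accounting (swap cache, tmpfs/ramfs
	 * style) need the hardware dirty bit transferred to the page. */
	if (mapping && !mapping->cap_account_dirty &&
	    toy_test_and_clear_key_dirty(page))
		page->dirty = true;
}
```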

Hugh

---

Re: [GIT PULL] Disintegrate UAPI for powerpc [ver #2]

2012-10-09 Thread Benjamin Herrenschmidt

On Tue, 2012-10-09 at 10:15 +0100, David Howells wrote:
> Can you merge the following branch into the powerpc tree please.

Thanks, looking at this right now. If it passes my tests I'll ask Linus
to pull later today.

Cheers,
Ben.

> This is to complete part of the UAPI disintegration for which the preparatory
> patches were pulled recently.
> 
> Now that the fixups and the asm-generic chunk have been merged, I've
> regenerated the patches to get rid of those dependencies and to take account of
> any changes made so far in the merge window.  If you have already pulled the
> older version of the branch aimed at you, then please feel free to ignore this
> request.
> 
> The following changes since commit 9e2d8656f5e8aa214e66b462680cf86b210b74a8:
> 
>   Merge branch 'akpm' (Andrew's patch-bomb) (2012-10-09 16:23:15 +0900)
> 
> are available in the git repository at:
> 
> 
>   git://git.infradead.org/users/dhowells/linux-headers.git tags/disintegrate-powerpc-20121009
> 
> for you to fetch changes up to c3617f72036c909e1f6086b5b9e364e0ef90a6da:
> 
>   UAPI: (Scripted) Disintegrate arch/powerpc/include/asm (2012-10-09 09:47:26 +0100)
> 
> 
> UAPI Disintegration 2012-10-09
> 
> 
> David Howells (1):
>   UAPI: (Scripted) Disintegrate arch/powerpc/include/asm
> 
>  arch/powerpc/include/asm/Kbuild   |  35 --
>  arch/powerpc/include/asm/bootx.h  | 123 +--
>  arch/powerpc/include/asm/cputable.h   |  35 +-
>  arch/powerpc/include/asm/elf.h| 311 +-
>  arch/powerpc/include/asm/kvm_para.h   |  70 +---
>  arch/powerpc/include/asm/mman.h   |  27 +-
>  arch/powerpc/include/asm/nvram.h  |  55 +---
>  arch/powerpc/include/asm/ptrace.h | 242 +-
>  arch/powerpc/include/asm/signal.h | 143 +---
>  arch/powerpc/include/asm/spu_info.h   |  29 +-
>  arch/powerpc/include/asm/swab.h   |  15 +-
>  arch/powerpc/include/asm/termios.h|  69 +---
>  arch/powerpc/include/asm/types.h  |  30 +-
>  arch/powerpc/include/asm/unistd.h | 374 +
>  arch/powerpc/include/uapi/asm/Kbuild  |  41 +++
>  arch/powerpc/include/{ => uapi}/asm/auxvec.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/bitsperlong.h |   0
>  arch/powerpc/include/uapi/asm/bootx.h | 132 
>  arch/powerpc/include/{ => uapi}/asm/byteorder.h   |   0
>  arch/powerpc/include/uapi/asm/cputable.h  |  36 ++
>  arch/powerpc/include/uapi/asm/elf.h   | 307 +
>  arch/powerpc/include/{ => uapi}/asm/errno.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/fcntl.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/ioctl.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/ioctls.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/ipcbuf.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/kvm.h |   0
>  arch/powerpc/include/uapi/asm/kvm_para.h  |  90 +
>  arch/powerpc/include/{ => uapi}/asm/linkage.h |   0
>  arch/powerpc/include/uapi/asm/mman.h  |  31 ++
>  arch/powerpc/include/{ => uapi}/asm/msgbuf.h  |   0
>  arch/powerpc/include/uapi/asm/nvram.h |  62 
>  arch/powerpc/include/{ => uapi}/asm/param.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/poll.h|   0
>  arch/powerpc/include/{ => uapi}/asm/posix_types.h |   0
>  arch/powerpc/include/{ => uapi}/asm/ps3fb.h   |   0
>  arch/powerpc/include/uapi/asm/ptrace.h| 259 +++
>  arch/powerpc/include/{ => uapi}/asm/resource.h|   0
>  arch/powerpc/include/{ => uapi}/asm/seccomp.h |   0
>  arch/powerpc/include/{ => uapi}/asm/sembuf.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/setup.h   |   0
>  arch/powerpc/include/{ => uapi}/asm/shmbuf.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/sigcontext.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/siginfo.h |   0
>  arch/powerpc/include/uapi/asm/signal.h| 145 +
>  arch/powerpc/include/{ => uapi}/asm/socket.h  |   0
>  arch/powerpc/include/{ => uapi}/asm/sockios.h |   0
>  arch/powerpc/include/uapi/asm/spu_info.h  |  53 +++
>  arch/powerpc/include/{ => uapi}/asm/stat.h|   0
>  arch/powerpc/include/{ => uapi}/asm/statfs.h  |   0
>  arch/powerpc/include/uapi/asm/swab.h  |  23 ++
>  arch/powerpc/include/{ => uapi}/asm/termbits.h

RE: [PATCH] driver/char/tpm: fix regression caused by ppi

2012-10-09 Thread Wei, Gang

Kent Yoder wrote on 2012-10-10:
> On Tue, Oct 09, 2012 at 05:35:22PM +0800, gang@intel.com wrote:
>> @@ -1476,7 +1477,7 @@ struct tpm_chip *tpm_register_hardware(struct device *dev,
>>  goto put_device;
>>  }
>> -if (sys_add_ppi(&dev->kobj)) {
>> +if (tpm_add_ppi(&dev->kobj)) {
>>  misc_deregister(&chip->vendor.miscdev);
>>  goto put_device;
>>  }
>>  
>   Hmm, tpm_add_ppi is just sysfs_create_group, which only ever returns
> 0. Looks like we can remove this error path, but PPI is unusable in the
> failure case.

sysfs_create_group returns 0 on success or an error code on failure, so I don't
think we can remove this error path. The previous call to sysfs_create_group
also has a similar error path.

>> +EXPORT_SYMBOL_GPL(tpm_add_ppi);
>> ...
>> +EXPORT_SYMBOL_GPL(tpm_remove_ppi);
>> 
>  Do we need to export these symbols?  These might have been left around
> from when ppi was a standalone module.

We definitely need to export these symbols, since ppi was in tpm_bios.ko,
and these symbols are called from tpm.ko.
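
The error path under discussion follows the usual goto-unwind idiom. Each step returns 0 or a negative errno, and a failure undoes the steps already completed, in reverse order. This is a user-space sketch with hypothetical names. Unlike the quoted hunk, which only shows misc_deregister(), it also undoes the earlier attribute group:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical registration state; each flag marks a completed step. */
struct toy_chip {
	int registered;		/* misc device registered */
	int attrs;		/* vendor attribute group created */
	int ppi;		/* PPI attribute group created */
};

/* One step; fail_at lets a caller inject a failure for testing. */
static int step(int *flag, int id, int fail_at)
{
	if (id == fail_at)
		return -ENOMEM;
	*flag = 1;
	return 0;
}

static int toy_register(struct toy_chip *chip, int fail_at)
{
	int rc;

	rc = step(&chip->registered, 1, fail_at);	/* misc_register() */
	if (rc)
		return rc;
	rc = step(&chip->attrs, 2, fail_at);		/* sysfs_create_group() */
	if (rc)
		goto out_deregister;
	rc = step(&chip->ppi, 3, fail_at);		/* tpm_add_ppi() */
	if (rc)
		goto out_remove_attrs;
	return 0;

out_remove_attrs:
	chip->attrs = 0;				/* sysfs_remove_group() */
out_deregister:
	chip->registered = 0;				/* misc_deregister() */
	return rc;
}
```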

Jimmy



Re: [PATCH 4/5] regmap: add API to get irq_domain from regmap irq

2012-10-09 Thread Mark Brown

On Tue, Oct 09, 2012 at 04:58:35PM +0530, Laxman Dewangan wrote:
> Add API regmap_irq_get_irq_domain() for getting the
> irq domain from regmap irq.
> The irq domain created on result of regmap_add_irq_chip()
> from driver.

This needs stubbing.  

Please also fix the formatting of your commit log, you're either
wrapping randomly in the middle of paragraphs or not leaving blank lines
between paragraphs.
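
"Stubbing" here presumably means giving the new accessor a static inline fallback for builds without regmap IRQ support, so that callers compile either way. The config symbol in this sketch is hypothetical, and the signature is assumed from the patch description:

```c
#include <assert.h>
#include <stddef.h>

struct regmap_irq_chip_data;	/* opaque, as in the kernel header */
struct irq_domain;

/* SKETCH_CONFIG_REGMAP_IRQ is a hypothetical stand-in for the Kconfig
 * symbol that builds the regmap IRQ support. */
#ifdef SKETCH_CONFIG_REGMAP_IRQ
struct irq_domain *regmap_irq_get_irq_domain(struct regmap_irq_chip_data *data);
#else
/* Stub so callers still build when the feature is disabled. */
static inline struct irq_domain *
regmap_irq_get_irq_domain(struct regmap_irq_chip_data *data)
{
	(void)data;
	return NULL;
}
#endif
```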

Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

2012-10-09 Thread Wen Congyang

At 10/10/2012 07:27 AM, David Rientjes Wrote:
> On Tue, 9 Oct 2012, Peter Zijlstra wrote:
> 
>> Well the code they were patching is in the wakeup path. As I think Tang
>> said, we leave !runnable tasks on whatever cpu they ran on last, even if
>> that cpu is offlined, we try and fix up state when we get a wakeup.
>>
>> On wakeup, it tries to find a cpu to run on and will try a cpu of the
>> same node first.
>>
>> Now if that node's entirely gone away, it appears the cpu_to_node() map
>> will not return a valid node number.
>>
>> I think that's a change in behaviour, it didn't used to do that afaik.
>> Certainly this code hasn't change in a while.
>>
> 
> If cpu_to_node() always returns a valid node id even if all cpus on the 
> node are offline, then the cpumask_of_node() implementation, which the 
> sched code is using, should either return an empty cpumask (if 
> node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in 
> behavior here occurred because 
> cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't 
> return a valid node id and forces it to return -1 so a kzalloc_node(..., 
> -1) falls back to allocating anywhere.
> 
> But if you only need cpu_to_node() when waking up to find a runnable cpu 
> for this NUMA information, then I think you can just change the 
> kzalloc_node() in alloc_{fair,rt}_sched_group() to do 
> kzalloc_node(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
> 
>  [ The changelog here is confusing because it's fixing a problem in
>    linux-next without saying so. ]
> 

I don't agree with this approach, because it only fixes the code that causes
this particular problem, and we can't say there are no similar problems
elsewhere. That is why I clear the cpu-to-node mapping.

What about the following solution:
1. clear the cpu-to-node mapping when the node is offlined
2. tang's patch is still necessary because we leave !runnable tasks on
   whatever cpu they ran on last. If cpu's node is NUMA_NO_NODE, it means
   the entire node is offlined, and we must migrate the task to the other
   node.
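The two steps above, combined with David's kzalloc_node() suggestion, can be modelled in a small user-space sketch. It assumes a hypothetical 4-cpu, 2-node topology; the sketch_* arrays and helper names are illustrative only and are not scheduler code. NUMA_NO_NODE is -1, as in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)          /* same value the kernel uses */
#define SKETCH_NR_CPUS 4

/* Hypothetical topology: cpus 0-1 on node 0, cpus 2-3 on node 1. */
static bool sketch_cpu_online[SKETCH_NR_CPUS] = { true, true, true, true };
static int sketch_cpu_node[SKETCH_NR_CPUS] = { 0, 0, 1, 1 };

/* Allocation hint: only trust the mapping for online cpus. */
static int sketch_alloc_node(int cpu)
{
	return sketch_cpu_online[cpu] ? sketch_cpu_node[cpu] : NUMA_NO_NODE;
}

/* Step 1: offlining a whole node clears its cpu-to-node mapping. */
static void sketch_offline_node(int node)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (sketch_cpu_node[cpu] == node) {
			sketch_cpu_online[cpu] = false;
			sketch_cpu_node[cpu] = NUMA_NO_NODE;
		}
	}
}

/* Step 2: on wakeup prefer an online cpu on the last cpu's node;
 * if that node is gone (NUMA_NO_NODE), migrate to any online cpu. */
static int sketch_select_wakeup_cpu(int prev_cpu)
{
	int node = sketch_cpu_node[prev_cpu];

	if (node != NUMA_NO_NODE)
		for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
			if (sketch_cpu_online[cpu] && sketch_cpu_node[cpu] == node)
				return cpu;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (sketch_cpu_online[cpu])
			return cpu;
	return -1;                 /* no online cpu at all */
}
```

Under these assumptions, clearing the mapping is what lets the wakeup path detect that the whole node has gone away.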

Thanks
Wen Congyang

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-10-09 Thread Eric W. Biederman

ebied...@xmission.com (Eric W. Biederman) writes:

> "J. Bruce Fields"  writes:
>
>> On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote:
>>> "Myklebust, Trond"  writes:
>>> 
>>> > On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote:
>>> >> Cc'ing Eric since I seem to recall he suggested doing it this way?
>>> 
>>> Yes.  On second look setting fs->root won't work. We need to change fs.
>>> The problem is that by default all kernel threads share fs so changing
>>> fs->root will have non-local consequences.
>>
>> Oh, huh.  And we can't "unshare" it somehow?
>
> I don't fully understand how nfs uses kernel threads and work queues.
> My general understanding is work queues reuse their kernel threads
> between different users.  So it is mostly a don't pollute your
> environment thing.  If there was a dedicated kernel thread for each
> environment this would be trivial.
>
> What I was suggesting here is changing task->fs instead of
> task->fs.root.  That should just require task_lock().
>
>> Or, previously you suggested:
>>
>>  - introduce sockaddr_fd that can be applied to AF_UNIX sockets,
>>    and teach unix_bind and unix_connect how to deal with a second
>>    type of sockaddr, AT_FD:
>>    struct sockaddr_fd { short fd_family; short pad; int fd; }
>>
>>  - introduce sockaddr_unix_at that takes a directory file
>>    descriptor as well as a unix path, and teach unix_bind and
>>    unix_connect to deal with a second sockaddr type, AF_UNIX_AT:
>>    struct sockaddr_unix_at { short family; short pad; int dfd; char path[102]; }
>>
>> Any other options?
>
> I am still half hoping we don't have to change the userspace API/ABI.
> There is sanity checking on that path that no one seems interested in to
> solve this problem.
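One plausible reading of path[102] in the proposed sockaddr_unix_at is that its meaningful bytes (2 + 2 + 4 + 102 = 110) match a Linux struct sockaddr_un (a 2-byte family plus a 108-byte sun_path). That intent is an assumption; the check below only illustrates the arithmetic. The struct is the thread's proposal, not an existing ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/un.h>

/* Layout proposed in the thread (not an existing Linux ABI). */
struct sockaddr_unix_at {
	short family;
	short pad;
	int dfd;          /* directory fd the path would be resolved against */
	char path[102];
};

/* Bytes up to the end of path[], ignoring trailing struct padding
 * (the int member gives the struct 4-byte alignment). */
static size_t unix_at_used_bytes(void)
{
	return offsetof(struct sockaddr_unix_at, path) +
	       sizeof(((struct sockaddr_unix_at *)0)->path);
}
```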

There is a good option if we are up to userspace ABI extensions.

Implement open(2) on unix domain sockets, where open(2) would
essentially be equivalent to connect(2).

With an open(2) implementation we could use file_open_path, and the
implementation should be pretty straightforward and maintainable.
So implementing open(2) looks like a good alternative implementation
route.
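For context, current Linux kernels reject open(2) on a bound unix socket's filesystem node with ENXIO, as documented in the open(2) man page. The program below demonstrates that existing behaviour, which this proposal would change. The socket path used in the usage check is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a unix socket at 'path', then try open(2) on it.
 * Returns the errno from open(2), or 0 if open unexpectedly succeeded. */
static int open_unix_socket_errno(const char *path)
{
	struct sockaddr_un addr;
	int sk, fd, err = 0;

	sk = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sk < 0)
		return errno;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
	unlink(path);                    /* remove a stale socket file */
	if (bind(sk, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		err = errno;
		close(sk);
		return err;
	}

	fd = open(path, O_RDWR);         /* today this fails for sockets */
	if (fd < 0)
		err = errno;
	else
		close(fd);

	close(sk);
	unlink(path);
	return err;
}
```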

Eric


Re: ASpeed Technologies KMS VGA Driver error in log

2012-10-09 Thread David Airlie

> > > 
> > 
> > Oh thanks for pointing this out, it's just leftover debug, I'll
> > send a patch
> > to remove it.
> > 
> > Dave.
> 
> Hello,
> I see no changes in the current kernel tree. Maybe you forgot about
> it?
> 
> 

Not sure which kernel tree you are looking at; it should be in Linus's tree
upstream now.

commit 0273de08c455031335dbea2630208f66106b0c14
Author: Dave Airlie 
Date:   Mon Sep 3 07:22:16 2012 +1000

drm/ast: drop debug level on error printk

This was never an error, drop to a debug print.

Reported-by: Keven Lachance
Signed-off-by: Dave Airlie 


Re: [PATCH] firmware: Don't attempt to allocate zero bytes with vmalloc()

2012-10-09 Thread Mark Brown

On Tue, Oct 09, 2012 at 10:55:17PM +0800, Ming Lei wrote:
> On Tue, Oct 9, 2012 at 8:36 PM, Mark Brown
> > On Tue, Oct 09, 2012 at 08:02:18PM +0800, Ming Lei wrote:

> > It doesn't really help as the ABI is such that you can only have one

> Could you let me know where the ABI is?

It's defined by firmware_class?

> > request_firmware() in play at once (unless this changed since I last
> > looked at it).

> I guess you mean that only one firmware device can be added
> as child of the device which is requesting firmware.

> The commit below(already merged into linus tree) should fix
> the problem:

> 99c2aa72306079976369aad7fc62cc71931d692a(firmware loader:
> fix creation failure of fw loader device)

That's not been well advertised (and is extremely recent too)...

> Could you test it to see if more than one request_firmware_nowait()
> can be called concurrently for the same device?

Not for some time.

Re: [PATCH 6/7] x86, mm: setup page table from top-down

2012-10-09 Thread Yinghai Lu

On Tue, Oct 9, 2012 at 4:58 PM, Yinghai Lu  wrote:
> Get pgt_buf early from BRK, and use it to map PMD_SIZE to top at first.
> Then use pages from that range to map the next range below.
>
> alloc_low_page will use pages from BRK at first, then will switch to
> using memblock to find and reserve pages for page table usage.
>
> Finally, we can get rid of the code that calculates and finds the early
> page table buffer.
>
> Suggested-by: "H. Peter Anvin" 
> Signed-off-by: Yinghai Lu 

Sorry, there is one typo in this patch; please use the attached one instead.


fix_max_pfn_xx_13.patch
Description: Binary data
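The changelog describes a two-stage page allocator: hand out pages from a small pool reserved early (BRK), then switch to a general allocator (memblock) once that pool is used up. Below is a user-space sketch of that technique only, assuming a hypothetical 2-page early pool and using calloc() in place of memblock. Names ending in _sketch are illustrative, and page alignment is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_SKETCH 4096
#define BRK_PAGES_SKETCH 2            /* hypothetical early pool size */

/* Early pool, standing in for pages reserved from BRK. */
static unsigned char early_pool[BRK_PAGES_SKETCH][PAGE_SIZE_SKETCH];
static size_t early_used;             /* pages handed out from the pool */
static size_t late_allocs;            /* pages taken from the fallback */

/* Early-pool pages first, then the general allocator
 * (calloc here, standing in for memblock). */
static void *alloc_low_page_sketch(void)
{
	if (early_used < BRK_PAGES_SKETCH)
		return early_pool[early_used++];
	late_allocs++;
	return calloc(1, PAGE_SIZE_SKETCH);
}
```

Because the early pool is only needed until the first mappings exist, it can stay small and does not require the up-front size calculation the changelog wants to remove.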

Re: ASpeed Technologies KMS VGA Driver error in log

2012-10-09 Thread thekingmen

> - Original Message -
> > From: "Keven" 
> > To: linux-kernel@vger.kernel.org, airl...@redhat.com
> > Sent: Monday, 3 September, 2012 12:26:05 AM
> > Subject: ASpeed Technologies KMS VGA Driver error in log
> > 
> > 1. ASpeed Technologies KMS VGA Driver error in log
> > 2. Hi, I have an error message in my kernel log and I don't know if
> > this needs to be taken care of or if this is normal behaviour.
> > 
> > The component is [PATCH] drm: Initial KMS driver for AST (ASpeed
> > Technologies) 2000 series
> > This article is related to the new component.
> > http://lists.freedesktop.org/archives/dri-devel/2012-April/021578.html
> > 
> > The code file should be
> > Linux/drivers/gpu/drm/ast/ast_mode.c
> > 
> > The error I have is
> > [7.518793] [drm:ast_cursor_init] *ERROR* pinned cursor cache at 0
> > 
> 
> Oh thanks for pointing this out, it's just leftover debug, I'll send a patch
> to remove it.
> 
> Dave.

Hello, 
I see no changes in the current kernel tree. Maybe you forgot about
it?


Re: [ANNOUNCE] 3.6.1-rt1

2012-10-09 Thread Steven Rostedt

On Tue, 2012-10-09 at 20:21 -0400, Steven Rostedt wrote:

> > 0007-stomp-machine-deal-clever-with-stopper-lock.patch
> 
> With this one, things have changed quite a bit. I'll take a deeper look
> at what you did and figure out how this applies to v3.0-rt.

It doesn't look like this patch is needed for v3.0-rt as the patch
addresses the new stop_machine_from_inactive_cpu() API used by the mtrr
code added by this commit:

commit 192d8857427dd23707d5f0b86ca990c3af6f2d74
Author: Suresh Siddha 
Date:   Thu Jun 23 11:19:29 2011 -0700

x86, mtrr: use stop_machine APIs for doing MTRR rendezvous


This was added in 3.1 so the patch is still required for v3.2-rt.

-- Steve



Re: [PATCH v2 7/8] video: mark nuc900fb_map_video_memory as __devinit

2012-10-09 Thread Wan ZongShun

2012/10/10 Arnd Bergmann :
> nuc900fb_map_video_memory is called by a devinit function
> that may be called at run-time, but the function itself is
> marked __init and will be discarded after boot.
>
> To avoid calling into a function that may have been overwritten,
> mark nuc900fb_map_video_memory itself as __devinit.
>
> Without this patch, building nuc950_defconfig results in:
>
> WARNING: drivers/video/built-in.o(.devinit.text+0x26c): Section mismatch in 
> reference from the function nuc900fb_probe() to the function 
> .init.text:nuc900fb_map_video_memory()
> The function __devinit nuc900fb_probe() references
> a function __init nuc900fb_map_video_memory().
> If nuc900fb_map_video_memory is only used by nuc900fb_probe then
> annotate nuc900fb_map_video_memory with a matching annotation.
>
> Signed-off-by: Arnd Bergmann 
> Cc: Wan ZongShun 
> Cc: Florian Tobias Schandinat 
> Cc: linux-fb...@vger.kernel.org

Thanks for your patch.
Acked-by: Wan ZongShun 

> ---
>  drivers/video/nuc900fb.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/video/nuc900fb.c b/drivers/video/nuc900fb.c
> index e10f551..b31b12b 100644
> --- a/drivers/video/nuc900fb.c
> +++ b/drivers/video/nuc900fb.c
> @@ -387,7 +387,7 @@ static int nuc900fb_init_registers(struct fb_info *info)
>   *The buffer should be a non-cached, non-buffered, memory region
>   *to allow palette and pixel writes without flushing the cache.
>   */
> -static int __init nuc900fb_map_video_memory(struct fb_info *info)
> +static int __devinit nuc900fb_map_video_memory(struct fb_info *info)
>  {
> struct nuc900fb_info *fbi = info->par;
> dma_addr_t map_dma;
> --
> 1.7.10
>



-- 
Wan ZongShun.
www.mcuos.com

RE: [PATCH v5 1/2] ACPI: Add early console framework for DBGP/DBG2.

2012-10-09 Thread Zheng, Lv

> > +int __init acpi_early_console_keep(struct acpi_debug_port *info)
> 
> Why not make it 'bool' like the other (acpi_early_console_enabled)?

NAK.
"keep" is "int" in "setup_early_printk".

Best regards/Lv Zheng

RE: [PATCH v5 2/2] ACPI: Add Intel MID SPI early console support.

2012-10-09 Thread Zheng, Lv

> >   earlyprintk=acpi
> .. or earlyprintk=mrst
> ?

ACK.
Both launchers work for MID_SPI. I'll add more comments and resend this
patch. Thanks.


RE: [PATCH v5 1/2] ACPI: Add early console framework for DBGP/DBG2.

2012-10-09 Thread Zheng, Lv

> > Signed-off-by: Lv Zheng 
> > Reviewed-by: Len Brown 
> > Reviewed-by: Rui Zhang 
> > Reviewed-by: Ying Huang 
> > Reviewed-by: Konrad Rzeszutek Wilk 
> Please don't include that unless I (or other folks looking at your code) say
> explicitly 'Acked' or 'Reviewed-by'

ACK.
I'll remove these names and resend.  Thanks.

> > +#define DEBUG
> That should not be the default case.

RFC.
If we do not add ignore_loglevel to the command line, the ACPI earlycon will be
silent, just as you want.
Do you really want this to be:
#undef DEBUG
If we do this, all pr_debug() invocations in this file will compile to
nothing. Is there any other way for us to view the output of ACPI earlycon
without recompiling?

> > +DECLARE_BITMAP(acpi_early_flags, MAX_ACPI_DBG_PORTS);
> static?

ACK.
I'll do the modification.

> > +int acpi_early_enabled;
> __read_mostly and you could also make it a bool.

NAK.
I think this variable will be read only once and written up to 16 times so 
__read_mostly is not required.
I'll check and add __init and __initdata for this patch.

> > +   set_bit(port, acpi_early_flags);
> > +   if (keep)
> > +   set_bit(port+MAX_ACPI_DBG_PORTS, acpi_early_flags);
> Huh? The bitmap is up to MAX_ACPI_DB_PORTS, but here you offset it past
> that? Why?

ACK.
It's my mistake.  The size of the bitmap should be "MAX_ACPI_DBG_PORTS<<1 or 
MAX_ACPI_DBG_PORTS*2".
The reason is:
I think MAX_ACPI_DBG_PORTS=4 would be enough in practice.
Since Linux runs on 32/64-bit architectures, using 2 separate bitmaps would be
wasteful.
As a word is at least 32 bits, MAX_ACPI_DBG_PORTS is defined as 16 so that the
enable bits and the keep bits (16 * 2 = 32) make full use of one word.
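The corrected layout puts the "enabled" bits in [0, MAX_ACPI_DBG_PORTS) and the "keep" bits in [MAX_ACPI_DBG_PORTS, 2 * MAX_ACPI_DBG_PORTS) of one bitmap. The user-space sketch below mirrors DECLARE_BITMAP(acpi_early_flags, MAX_ACPI_DBG_PORTS * 2) with hand-rolled helpers. set_flag(), test_flag() and early_console_enable() are hypothetical names, not the patch's functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_ACPI_DBG_PORTS 16
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)
#define NBITS (MAX_ACPI_DBG_PORTS * 2)   /* enable bits, then keep bits */

/* Equivalent of DECLARE_BITMAP(acpi_early_flags, NBITS). */
static unsigned long acpi_early_flags[(NBITS + BITS_PER_LONG_SKETCH - 1) /
				      BITS_PER_LONG_SKETCH];

static void set_flag(unsigned int bit)
{
	acpi_early_flags[bit / BITS_PER_LONG_SKETCH] |=
		1UL << (bit % BITS_PER_LONG_SKETCH);
}

static bool test_flag(unsigned int bit)
{
	return (acpi_early_flags[bit / BITS_PER_LONG_SKETCH] >>
		(bit % BITS_PER_LONG_SKETCH)) & 1UL;
}

/* Mark a port enabled; its "keep" flag lives MAX_ACPI_DBG_PORTS higher. */
static void early_console_enable(unsigned int port, bool keep)
{
	assert(port < MAX_ACPI_DBG_PORTS);
	set_flag(port);
	if (keep)
		set_flag(port + MAX_ACPI_DBG_PORTS);
}
```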

> > +   if (!acpi_early_console_enabled(info->port_index))
> > +   return 0;
> Not -ENODEV?

NAK.
This is to support "MULTIPLE DBG2 debug ports".

** MULTIPLE DBGP table versions and MULTIPLE DBG2 debug ports support **
Throughout the patch, ENODEV means "table does not exist or table version is
not supported" in the PROBE/START stage, so that we can probe Microsoft's
future tables in the order DBGn, ..., DBG2, DBGP.
We should not return here, as users may pass the following command line:
earlyprintk=acpi2,keep
to keep the first 2 debug ports silent but use the 3rd as a Linux earlycon.

> > +   while (((unsigned long)entry) + sizeof(struct acpi_dbg2_device) <
> > +  tbl_end) {
> Just make it one line. Ignore the 80 characters limit here.

ACK.
I'll try to implement this within 80 characters.

> > +   if (!max_entries || count++ < max_entries) {
> How about you just make this 'count'
> > +   pr_debug("early: DBG2 PROBE - console %d(%04x:%04x).\n",
> > +count-1,
> > +entry->port_type, entry->port_subtype);
> > +   devinfo.port_index = (u8)count-1;
> Then you don't this 'count -1'
> and then do
>   count++ here?

ACK.
It's my mistake; the previous version used port_index=1 as the first debug
port, but this version uses 0 as the first port.

> > +   acpi_early_console_start();
> no check of the return value to see whether you should return immediately?

NAK.
See "MULTIPLE DBGP table versions and MULTIPLE DBG2 debug ports support" above.

> > +   acpi_early_console_start();
> how about 'return acpi_early_console_start(..)'

NAK.
See "MULTIPLE DBGP table versions and MULTIPLE DBG2 debug ports support" above.

> > +   if (acpi_table_parse(ACPI_SIG_DBG2, acpi_parse_dbg2) != 0)
> > +   acpi_table_parse(ACPI_SIG_DBGP, acpi_parse_dbgp);

RFC.
This is to support "MULTIPLE DBGP table versions".


systemtap release 2.0

2012-10-09 Thread Josh Stone

The systemtap team announces release 2.0!

  prototype/preview dyninst backend, preprocessor macros, script
  privilege level conditionals, probe alias suffixes, revamped
  backtrace tapsets, tested on kernels 2.6.9 through 3.6.


= Where to get it

  http://sourceware.org/systemtap/ - our project page
  http://sourceware.org/systemtap/ftp/releases/systemtap-2.0.tar.gz
  http://koji.fedoraproject.org/koji/packageinfo?packageID=615
  git tag release-2.0 (commit a63381cc)

  There have been over 350 commits since the last release.
  There have been over 50 bugs fixed / features added since the last release.


= How to build it

  See the README and NEWS files at
  http://sourceware.org/git/?p=systemtap.git;a=tree
  Further information at http://sourceware.org/systemtap/wiki/


= Systemtap frontend (stap) changes

- A new --runtime option has been added to allow the user to choose
  between the existing kernel (--runtime=kernel) and the prototype
  dyninst (--runtime=dyninst) backends. See the sections "Systemtap
  runtime changes" and "Known issues" below for more information on
  the dyninst backend.


= Systemtap script language changes

- The systemtap preprocessor now has a simple macro facility as follows:

@define add(a,b) %( ((@a)+(@b)) %)
@define probegin(x) %(
   probe begin {
     @x
   }
%)

@probegin( foo = @add(40, 2); print(foo) )

  Macros defined in the user script and regular tapset .stp files are
  local to the file. To get around this, the tapset library can define
  globally visible 'library macros' inside .stpm files. (A .stpm file
  must contain a series of @define directives and nothing else.)

  The status of the feature is experimental; semantics of macroexpansion
  may change (unlikely) or expand in the future.

- Systemtap probe aliases may be used with additional suffixes
  attached. The suffixes are passed on to the underlying probe
  point(s) as shown below:

probe foo = bar, baz { }
probe foo.subfoo.option("gronk") { }
// expands to: bar.subfoo.option("gronk"), baz.subfoo.option("gronk")

  In practical terms, this allows us to specify additional options to
  certain tapset probe aliases, by writing e.g.
probe syscall.open.return.maxactive(5) { ... }

- Preprocessor conditional to vary code based on script privilege level:
  unprivileged -- %( systemtap_privilege == "stapusr" %? ... %)
  privileged   -- %( systemtap_privilege != "stapusr" %? ... %)
  or, alternately %( systemtap_privilege == "stapsys"
  || systemtap_privilege == "stapdev" %? ... %)

  The "unprivileged" category corresponds to code that must be able to
  run in stapusr mode, while the "privileged" category corresponds to
  all other code (requiring privilege level stapsys or above).

- To ease migration to the embedded-C locals syntax introduced in 1.8
  (namely, STAP_ARG_* and STAP_RETVALUE), the old syntax can now be
  re-enabled on a per-function basis using the /* unmangled */ pragma:

function add_foo:long(a:long, b:long) %{ /* unmangled */
  THIS->__retvalue = THIS->a + STAP_ARG_b;
%}

  Note that both the old and the new syntax may be used in an
  /* unmangled */ function. Functions not marked /* unmangled */
  can only use the new syntax.

- Adjacent string literals are now glued together irrespective of
  intervening whitespace or comments:
"foo " "bar" --> "foo bar"
"foo " /* comment */ "bar" --> "foo bar"
  Previously, the first pair of literals would be glued correctly,
  while the second would cause a syntax error.
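The new rule matches what C has always done: adjacent string literals are concatenated at compile time, and a comment between them counts only as whitespace. This C analogy is offered for illustration; it is not systemtap code.

```c
#include <assert.h>
#include <string.h>

/* Both initializers become the single literal "foo bar". */
static const char *glued_plain = "foo " "bar";
static const char *glued_comment = "foo " /* comment */ "bar";
```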


= Systemtap runtime changes

- Systemtap includes a new prototype backend, which uses Dyninst to instrument
  a user's own processes at runtime. This backend does not use kernel modules,
  and does not require root privileges, but is restricted with respect to the
  kinds of probes and other constructs that a script may use.

  Users building from source should configure --with-dyninst and install a
  fresh dyninst snapshot such as that in Fedora rawhide.  It may be
  necessary to disable conflicting selinux checks; systemtap will advise.

  Select this new backend with the stap option --runtime=dyninst and a
  -c target process, along with normal options. (-x target processes
  are not supported in this prototype version.) For example:

stap --runtime=dyninst -c 'stap -l begin' \
  -e 'probe process.function("main") { println("hi from dyninst!") }'

- To aid diagnoses in the event of a kernel panic, systemtap now uses
  the panic_notifier_list facility to dump a summary of its trace
  buffers to the serial console.

- Significant bug fixes to dwarfless kprobe behaviour. @entry() is now
  supported, and code such as

stap -e 'probe kprobe.function("foo") !, kprobe.function("sys_read") { }'

  now behaves correctly with non-existent functions. This allows the
  dwarfless syscall tapset nd_syscalls.stp to achieve approximate
  feature parity with the DWARF-enabled syscall tapsets.


= Systemtap tapset changes

[GIT] Networking

2012-10-09 Thread David Miller


1) UAPI changes for networking from David Howells

2) A netlink dump is an operation we can sleep within, and
   therefore we need to make sure the dump provider module
   doesn't disappear on us meanwhile.  Fix from Gao Feng.

3) Now that tunnels support GRO, we have to be more careful
   in skb_gro_reset_offset() otherwise we OOPS, from Eric
   Dumazet.

4) We can end up processing packets for VLANs we aren't
   actually configured to be on, fix from Florian Zumbiehl.

5) Fix routing cache removal regression in redirects and IPVS.  The core
   issue on the IPVS side is that it wants to rewrite who the nexthop
   is and we have to explicitly accommodate that case.  From Julian
   Anastasov.

6) Error code return fixes all over the networking drivers from
   Peter Senna Tschudin.

7) Fix routing cache removal regressions in IPSEC, from Steffen
   Klassert.

8) Fix deadlock in RDS during pings, from Jeff Liu.

9) Neighbour packet queue can trigger skb_under_panic() because we
   do not reset the network header of the SKB in the right spot.
   From Ramesh Nagappa.

Please pull, thanks a lot!

The following changes since commit 547b1e81afe3119f7daf702cc03b158495535a25:

  Fix staging driver use of VM_RESERVED (2012-10-09 21:06:41 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 5175a5e76bbdf20a614fb47ce7a38f0f39e70226:

  RDS: fix rds-ping spinlock recursion (2012-10-09 13:57:23 -0400)


Ajit Khaparde (1):
  be2net: Remove code that stops further access to BE NIC based on UE bits

Alexander Duyck (1):
  ixgbe/ixgbevf: Limit maximum jumbo frame size to 9.5K to avoid Tx hangs

Bruce Allan (1):
  e1000e: add device IDs for i218

Dan Carpenter (2):
  cxgb4: allocate enough data in t4_memory_rw()
  farsync: fix support for over 30 cards

David Howells (10):
  UAPI: (Scripted) Disintegrate include/linux/caif
  UAPI: (Scripted) Disintegrate include/linux/isdn
  UAPI: (Scripted) Disintegrate include/linux/netfilter
  UAPI: (Scripted) Disintegrate include/linux/netfilter/ipset
  UAPI: (Scripted) Disintegrate include/linux/netfilter_arp
  UAPI: (Scripted) Disintegrate include/linux/netfilter_bridge
  UAPI: (Scripted) Disintegrate include/linux/netfilter_ipv4
  UAPI: (Scripted) Disintegrate include/linux/netfilter_ipv6
  UAPI: (Scripted) Disintegrate include/linux/tc_act
  UAPI: (Scripted) Disintegrate include/linux/tc_ematch

David S. Miller (3):
  Merge git://git.kernel.org/.../torvalds/linux
  Merge tag 'disintegrate-net-20121009' of git://git.infradead.org/users/dhowells/linux-headers
  Merge tag 'disintegrate-isdn-20121009' of git://git.infradead.org/users/dhowells/linux-headers

Eric Dumazet (5):
  net: remove skb recycling
  ipv6: GRO should be ECN friendly
  net: gro: fix a potential crash in skb_gro_reset_offset
  net: gro: selective flush of packets
  ipv6: gro: fix PV6_GRO_CB(skb)->proto problem

Florian Zumbiehl (1):
  vlan: don't deliver frames for unknown vlans to protocols

Gao feng (2):
  netlink: add reference of module in netlink_dump_start
  infiniband: pass rdma_cm module to netlink_dump_start

Graham Gower (1):
  skge: Add DMA mask quirk for Marvell 88E8001 on ASUS P5NSLI motherboard

Greg Rose (1):
  ixgbevf: Set the netdev number of Tx queues

Haicheng Li (1):
  pch_gbe: Fix build error by selecting all the possible dependencies.

Julian Anastasov (6):
  ipv4: fix sending of redirects
  ipv4: fix forwarding for strict source routes
  ipv4: make sure nh_pcpu_rth_output is always allocated
  ipv4: introduce rt_uses_gateway
  ipv4: Add FLOWI_FLAG_KNOWN_NH
  ipvs: fix ARP resolving for direct routing mode

Mark Brown (1):
  netdev/phy: Prototype of_mdio_find_bus()

Michael Neuling (1):
  net: fix typo in freescale/ucc_geth.c

Peter Senna Tschudin (18):
  drivers/net/ethernet/dec/tulip/dmfe.c: fix error return code
  drivers/net/ethernet/natsemi/natsemi.c: fix error return code
  drivers/net/ethernet/sis/sis900.c: fix error return code
  drivers/net/irda/irtty-sir.c: fix error return code
  drivers/net/irda/mcs7780.c: fix error return code
  drivers/net/irda/pxaficp_ir.c: fix error return code
  drivers/net/irda/sa1100_ir.c: fix error return code
  drivers/net/irda/sh_irda.c: fix error return code
  drivers/net/irda/sh_sir.c: fix error return code
  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c: fix error return code
  drivers/net/ethernet/amd/amd8111e.c: fix error return code
  drivers/net/ethernet/amd/au1000_eth.c: fix error return code
  drivers/net/ethernet/natsemi/xtsonic.c: fix error return code
  drivers/net/ethernet/renesas/sh_eth.c: fix error return code
  drivers/net/ethernet/sun/niu.c: fix error return c

[GIT] Sparc

2012-10-09 Thread David Miller


This is just the UAPI commits for sparc via David Howells.

Please pull, thanks a lot!

The following changes since commit 547b1e81afe3119f7daf702cc03b158495535a25:

  Fix staging driver use of VM_RESERVED (2012-10-09 21:06:41 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git master

for you to fetch changes up to 9836d3458cde82626f2828ca6bd44c4a02b56e63:

  Merge tag 'disintegrate-sparc-20121009' of git://git.infradead.org/users/dhowells/linux-headers (2012-10-09 09:54:30 -0700)



David Howells (1):
  UAPI: (Scripted) Disintegrate arch/sparc/include/asm

David S. Miller (1):
  Merge tag 'disintegrate-sparc-20121009' of git://git.infradead.org/users/dhowells/linux-headers

 arch/sparc/include/asm/Kbuild                   |  16 ---
 arch/sparc/include/asm/fbio.h                   | 260 +-
 arch/sparc/include/asm/ioctls.h                 | 129 +
 arch/sparc/include/asm/mman.h                   |  25 +---
 arch/sparc/include/asm/psr.h                    |  36 +
 arch/sparc/include/asm/ptrace.h                 | 347 +-
 arch/sparc/include/asm/setup.h                  |  10 +-
 arch/sparc/include/asm/sigcontext.h             |   4 +-
 arch/sparc/include/asm/siginfo.h                |  23 +--
 arch/sparc/include/asm/signal.h                 | 185 +
 arch/sparc/include/asm/termbits.h               | 260 +-
 arch/sparc/include/asm/termios.h                |  41 +-
 arch/sparc/include/asm/traps.h                  | 111 +--
 arch/sparc/include/asm/unistd.h                 | 412 +-
 arch/sparc/include/uapi/asm/Kbuild              |  46 ++
 arch/sparc/include/{ => uapi}/asm/apc.h         |   0
 arch/sparc/include/{ => uapi}/asm/asi.h         |   0
 arch/sparc/include/{ => uapi}/asm/auxvec.h      |   0
 arch/sparc/include/{ => uapi}/asm/bitsperlong.h |   0
 arch/sparc/include/{ => uapi}/asm/byteorder.h   |   0
 arch/sparc/include/{ => uapi}/asm/display7seg.h |   0
 arch/sparc/include/{ => uapi}/asm/envctrl.h     |   0
 arch/sparc/include/{ => uapi}/asm/errno.h       |   0
 arch/sparc/include/uapi/asm/fbio.h              | 259 ++
 arch/sparc/include/{ => uapi}/asm/fcntl.h       |   0
 arch/sparc/include/{ => uapi}/asm/ioctl.h       |   0
 arch/sparc/include/uapi/asm/ioctls.h            | 131 ++
 arch/sparc/include/{ => uapi}/asm/ipcbuf.h      |   0
 arch/sparc/include/{ => uapi}/asm/jsflash.h     |   0
 arch/sparc/include/{ => uapi}/asm/kvm_para.h    |   0
 arch/sparc/include/uapi/asm/mman.h              |  27
 arch/sparc/include/{ => uapi}/asm/msgbuf.h      |   0
 arch/sparc/include/{ => uapi}/asm/openpromio.h  |   0
 arch/sparc/include/{ => uapi}/asm/param.h       |   0
 arch/sparc/include/{ => uapi}/asm/perfctr.h     |   0
 arch/sparc/include/{ => uapi}/asm/poll.h        |   0
 arch/sparc/include/{ => uapi}/asm/posix_types.h |   0
 arch/sparc/include/uapi/asm/psr.h               |  47 +++
 arch/sparc/include/{ => uapi}/asm/psrcompat.h   |   0
 arch/sparc/include/{ => uapi}/asm/pstate.h      |   0
 arch/sparc/include/uapi/asm/ptrace.h            | 352 ++
 arch/sparc/include/{ => uapi}/asm/resource.h    |   0
 arch/sparc/include/{ => uapi}/asm/sembuf.h      |   0
 arch/sparc/include/uapi/asm/setup.h             |  15 ++
 arch/sparc/include/{ => uapi}/asm/shmbuf.h      |   0
 arch/sparc/include/uapi/asm/siginfo.h           |  25
 arch/sparc/include/uapi/asm/signal.h            | 185 +
 arch/sparc/include/{ => uapi}/asm/socket.h      |   0
 arch/sparc/include/{ => uapi}/asm/sockios.h     |   0
 arch/sparc/include/{ => uapi}/asm/stat.h        |   0
 arch/sparc/include/{ => uapi}/asm/statfs.h      |   0
 arch/sparc/include/{ => uapi}/asm/swab.h        |   0
 arch/sparc/include/uapi/asm/termbits.h          | 263 +++
 arch/sparc/include/uapi/asm/termios.h           |  43 ++
 arch/sparc/include/uapi/asm/traps.h             | 120
 arch/sparc/include/{ => uapi}/asm/types.h       |   0
 arch/sparc/include/{ => uapi}/asm/uctx.h        |   0
 arch/sparc/include/uapi/asm/unistd.h            | 422
 arch/sparc/include/{ => uapi}/asm/utrap.h       |   0
 arch/sparc/include/{ => uapi}/asm/watchdog.h    |   0
 60 files changed, 1951 insertions(+), 1843 deletions(-)
 rename arch/sparc/include/{ => uapi}/asm/apc.h (100%)
 rename arch/sparc/include/{ => uapi}/asm/asi.h (100%)
 rename arch/sparc/include/{ => uapi}/asm/auxvec.h (10

[PATCH] perf: fix duplicate header inclusion

2012-10-09 Thread Michel Lespinasse

#include  somehow got duplicated on its way to linus's tree
(probably as a conflict resolution as things got sent through multiple trees)

Signed-off-by: Michel Lespinasse 
---
 tools/perf/util/include/linux/rbtree.h | 1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/include/linux/rbtree.h b/tools/perf/util/include/linux/rbtree.h
index 9bcdc844b330..2a030c5af3aa 100644
--- a/tools/perf/util/include/linux/rbtree.h
+++ b/tools/perf/util/include/linux/rbtree.h
@@ -1,3 +1,2 @@
 #include 
-#include 
 #include "../../../../include/linux/rbtree.h"
-- 
1.7.7.3

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports

2012-10-09 Thread J. Bruce Fields

On Tue, Oct 09, 2012 at 03:47:42PM -0700, Eric W. Biederman wrote:
> "J. Bruce Fields"  writes:
> 
> > On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote:
> >> "Myklebust, Trond"  writes:
> >> 
> >> > On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote:
> >> >> Cc'ing Eric since I seem to recall he suggested doing it this way?
> >> 
> >> Yes.  On second look setting fs->root won't work. We need to change fs.
> >> The problem is that by default all kernel threads share fs so changing
> >> fs->root will have non-local consequences.
> >
> > Oh, huh.  And we can't "unshare" it somehow?
> 
> I don't fully understand how nfs uses kernel threads and work queues.
> My general understanding is work queues reuse their kernel threads
> between different users.  So it is mostly a don't pollute your
> environment thing.  If there was a dedicated kernel thread for each
> environment this would be trivial.
> 
> What I was suggesting here is changing task->fs instead of
> task->fs.root.  That should just require task_lock().

Oh, OK, got it--if that works, great.

> > Sorry, I don't know much about devtmpfs, are you suggesting it as a
> > model?  What exactly should we look at?
> 
> Roughly all I meant was that devtmpfsd is a kernel thread that runs
> with an unshared fs struct.  Although I admit devtmpfsd is for all
> practical purposes a userspace daemon that just happens to run in kernel
> space.

Thanks for the explanation.

--b.

Re: [PATCH 1/2] mm: Export vm_committed_as

2012-10-09 Thread Andrew Morton

On Wed, 10 Oct 2012 00:11:28 + KY Srinivasan  wrote:

> 
> 
> > -Original Message-
> > From: Andrew Morton [mailto:a...@linux-foundation.org]
> > Sent: Tuesday, October 09, 2012 3:48 PM
> > To: Greg KH
> > Cc: KY Srinivasan; o...@aepfle.de; linux-kernel@vger.kernel.org;
> > a...@firstfloor.org; a...@canonical.com; de...@linuxdriverproject.org
> > Subject: Re: [PATCH 1/2] mm: Export vm_committed_as
> > 
> > On Mon, 8 Oct 2012 06:35:39 -0700
> > Greg KH  wrote:
> > 
> > > On Mon, Oct 08, 2012 at 03:35:50AM +, KY Srinivasan wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > > > > Sent: Sunday, October 07, 2012 8:44 PM
> > > > > To: KY Srinivasan
> > > > > Cc: linux-kernel@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> > > > > a...@canonical.com; a...@linux-foundation.org; a...@firstfloor.org
> > > > > Subject: Re: [PATCH 1/2] mm: Export vm_committed_as
> > > > >
> > > > > On Sun, Oct 07, 2012 at 04:59:45PM -0700, K. Y. Srinivasan wrote:
> > > > > > The policy engine on the host expects the guest to report the
> > > > > > committed_as. Since this variable is not exported,
> > > > > > export this symbol.
> > > > >
> > > > > Why are these symbols not needed by either Xen or KVM or vmware, which
> > > > > I think all support the same thing, right?
> > > >
> > > > The basic balloon driver does not need this symbol since the basic balloon driver
> > > > is not automatically driven by the host. On the Windows host we have a policy engine that
> > > > drives the balloon driver based on both guest level memory pressure that the guest
> > > > reports as well as other system level metrics the host maintains. We need this symbol to
> > > > drive the policy engine on the host.
> > >
> > > Ok, but you're going to have to get the -mm developers to agree that
> > > this is ok before I can accept it.
> > 
> > Well I guess it won't kill us.
> 
> 
> Thanks.
> 

The other part of my email seems to have vanished.  Which makes me sad,
because I was rather interested in the answer.


linux-next: build failure after merge of the block tree

2012-10-09 Thread Stephen Rothwell

Hi Jens,

After merging the block tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

fs/block_dev.c: In function 'set_blocksize':
fs/block_dev.c:135:2: error: implicit declaration of function 'prio_tree_empty' [-Werror=implicit-function-declaration]

Caused by commit b87570f5d349 ("Fix a crash when block device is read and
block size is changed at the same time") interacting with commit
147e615f83c2 ("prio_tree: remove") which is now in Linus' tree.

I have added the following merge fix patch for today (this is a mash of
two fix patches that used to be in Andrew's tree):

From: Andrew Morton 
Subject: fs/block_dev.c:set_blocksize(): use mapping_mapped()

... instead of open-coding it.

Suggested-by: Mikulas Patocka 
Cc: Stephen Rothwell 
Signed-off-by: Andrew Morton 
---

diff -puN fs/block_dev.c~fs-block_devc-set_blocksize-use-mapping_mapped fs/block_dev.c
--- a/fs/block_dev.c~fs-block_devc-set_blocksize-use-mapping_mapped
+++ a/fs/block_dev.c
@@ -132,8 +132,7 @@ int set_blocksize(struct block_device *b
/* Check that the block device is not memory mapped */
mapping = bdev->bd_inode->i_mapping;
mutex_lock(&mapping->i_mmap_mutex);
-   if (!prio_tree_empty(&mapping->i_mmap) ||
-   !list_empty(&mapping->i_mmap_nonlinear)) {
+   if (mapping_mapped(mapping)) {
mutex_unlock(&mapping->i_mmap_mutex);
percpu_up_write(&bdev->bd_block_size_semaphore);
return -EBUSY;
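The fix lets callers ask a single helper whether anything is mapped, so they no longer open-code the check against the underlying data structures. A caller written that way survives a change like the prio_tree removal. The following is a userspace sketch of that shape, not the kernel code. `struct mapping_model` is hypothetical, and its two counters stand in for the i_mmap tree and the i_mmap_nonlinear list. The real mapping_mapped() lives in include/linux/fs.h, and its body depends on the kernel version.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an address_space: counters stand in for
 * the i_mmap tree and the i_mmap_nonlinear list. */
struct mapping_model {
	unsigned int nr_linear_vmas;
	unsigned int nr_nonlinear_vmas;
};

/* One helper owns the "is anything mapped?" question, so callers need
 * not know how the VMAs are stored. */
static bool mapping_mapped_model(const struct mapping_model *m)
{
	return m->nr_linear_vmas != 0 || m->nr_nonlinear_vmas != 0;
}

/* Caller modelled on the fixed check in set_blocksize(). */
static int set_blocksize_check(const struct mapping_model *m)
{
	if (mapping_mapped_model(m))
		return -EBUSY;
	return 0;
}
```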

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au



Re: [PATCH] ACPI: dock: Remove redundant ACPI NS walk

2012-10-09 Thread Yasuaki Ishimatsu

Hi Toshi,

Sorry for late reply.

2012/09/13 5:30, Toshi Kani wrote:
> Combined two ACPI namespace walks, which look for dock stations
> and then bays separately, into a single walk.
> 
> Signed-off-by: Toshi Kani 
> ---

I have not tested the patch. But it looks good to me.

Reviewed-by: Yasuaki Ishimatsu 

Thanks,
Yasuaki Ishimatsu

>   drivers/acpi/dock.c |   26 +++---
>   1 files changed, 7 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> index 88eb143..ae4ebf2 100644
> --- a/drivers/acpi/dock.c
> +++ b/drivers/acpi/dock.c
> @@ -1016,44 +1016,32 @@ static int dock_remove(struct dock_station *ds)
>   }
>   
>   /**
> - * find_dock - look for a dock station
> + * find_dock_and_bay - look for dock stations and bays
>* @handle: acpi handle of a device
>* @lvl: unused
> - * @context: counter of dock stations found
> + * @context: unused
>* @rv: unused
>*
> - * This is called by acpi_walk_namespace to look for dock stations.
> + * This is called by acpi_walk_namespace to look for dock stations and bays.
>*/
>   static __init acpi_status
> -find_dock(acpi_handle handle, u32 lvl, void *context, void **rv)
> +find_dock_and_bay(acpi_handle handle, u32 lvl, void *context, void **rv)
>   {
> - if (is_dock(handle))
> + if (is_dock(handle) || is_ejectable_bay(handle))
>   dock_add(handle);
>   
>   return AE_OK;
>   }
>   
> -static __init acpi_status
> -find_bay(acpi_handle handle, u32 lvl, void *context, void **rv)
> -{
> - /* If bay is a dock, it's already handled */
> - if (is_ejectable_bay(handle) && !is_dock(handle))
> - dock_add(handle);
> - return AE_OK;
> -}
> -
>   static int __init dock_init(void)
>   {
>   if (acpi_disabled)
>   return 0;
>   
> - /* look for a dock station */
> + /* look for dock stations and bays */
>   acpi_walk_namespace(ACPI_TYPE_DEVICE, ACPI_ROOT_OBJECT,
> - ACPI_UINT32_MAX, find_dock, NULL, NULL, NULL);
> + ACPI_UINT32_MAX, find_dock_and_bay, NULL, NULL, NULL);
>   
> - /* look for bay */
> - acpi_walk_namespace(ACPI_TYPE_DEVICE, ACPI_ROOT_OBJECT,
> - ACPI_UINT32_MAX, find_bay, NULL, NULL, NULL);
>   if (!dock_station_count) {
>   printk(KERN_INFO PREFIX "No dock devices found.\n");
>   return 0;
> 
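The single-walk change works because one callback tests both predicates. A node that is both a dock and an ejectable bay therefore matches one condition and is added once, which is what the old find_bay()'s `!is_dock()` test achieved in the second pass. Below is a userspace sketch under that reading. `struct node_model`, `walk_all()` and the `_model` helpers are hypothetical stand-ins for the ACPI handles, acpi_walk_namespace() and dock_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical namespace node: real code evaluates ACPI methods instead. */
struct node_model {
	bool is_dock;
	bool is_ejectable_bay;
	int added;              /* how many times dock_add() was called on it */
};

typedef void (*walk_cb)(struct node_model *n, void *ctx);

/* Stand-in for acpi_walk_namespace(): visit every node once. */
static void walk_all(struct node_model *nodes, size_t count, walk_cb cb, void *ctx)
{
	for (size_t i = 0; i < count; i++)
		cb(&nodes[i], ctx);
}

static void dock_add_model(struct node_model *n, int *station_count)
{
	n->added++;
	(*station_count)++;
}

/* Combined callback: one condition, one call, so a node that is both
 * a dock and a bay is added exactly once. */
static void find_dock_and_bay_model(struct node_model *n, void *ctx)
{
	if (n->is_dock || n->is_ejectable_bay)
		dock_add_model(n, ctx);
}
```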



Re: [PATCH 2/2] Drivers: hv: Add Hyper-V balloon driver

2012-10-09 Thread Andrew Morton

On Wed, 10 Oct 2012 00:09:12 + KY Srinivasan  wrote:

> > > + if (!pg) {
> > > + *alloc_error = true;
> > > + return i * alloc_unit;
> > > + }
> > > +
> > > + totalram_pages -= alloc_unit;
> > 
> > Well, I'd consider totalram_pages to be an mm-private thing which drivers
> > shouldn't muck with.  Why is this done?
> 
> By modifying the totalram_pages, the information presented in /proc/meminfo
> correctly reflects what is currently assigned to the guest (MemTotal).

eh?  /proc/meminfo:MemTotal tells you the total memory in the machine. 
The only thing which should change it after boot is memory hotplug. 

Modifying it in this manner puts the statistic into a state known as
"wrong".  And temporarily modifying it in this fashion will cause the
tremendous amount of initialisation code which relies upon
totalram_pages for sizing to also enter the "wrong" state.

Why on earth do balloon drivers do this?  If the amount of memory which
is consumed by balloons is interesting then it should be exported via a
standalone metric, not by mucking with totalram_pages.
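Andrew's alternative can be sketched as follows. The total is left untouched, and ballooned pages are counted in a separate counter that is reported on its own. This is a userspace model only. `struct balloon_stats_model`, its fields and the helpers are hypothetical and are not an existing kernel interface.

```c
#include <assert.h>

/* Hypothetical stats: total_pages plays the role of totalram_pages. */
struct balloon_stats_model {
	unsigned long total_pages;     /* fixed after boot (changes only on hotplug) */
	unsigned long ballooned_pages; /* pages currently held by the balloon */
};

/* Inflate: the balloon takes pages; the total is deliberately left alone. */
static void balloon_inflate(struct balloon_stats_model *s, unsigned long n)
{
	s->ballooned_pages += n;
}

/* Deflate: return up to n pages, never more than the balloon holds. */
static void balloon_deflate(struct balloon_stats_model *s, unsigned long n)
{
	if (n > s->ballooned_pages)
		n = s->ballooned_pages;
	s->ballooned_pages -= n;
}

/* Code that wants "memory usable by the guest right now" derives it from
 * the two separate metrics instead of reading a modified total. */
static unsigned long usable_pages(const struct balloon_stats_model *s)
{
	return s->total_pages - s->ballooned_pages;
}
```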


Re: include/linux/cgroup.h:566 suspicious rcu_dereference_check() usage!

2012-10-09 Thread Sergey Senozhatsky

On (10/08/12 12:49), Paul E. McKenney wrote:
> 
> 
> 
> device_cgroup: Restore rcu_read_lock() protection to devcgroup_inode_mknod()
> 
> Commit ad676077 (device_cgroup: convert device_cgroup internally to
> policy + exceptions) restructured devcgroup_inode_mknod(), removing
> rcu_read_lock() in the process.  However, RCU read-side protection
> is required by the call to task_devcgroup(), so this commit restores
> the rcu_read_lock() and rcu_read_unlock().
> 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/security/device_cgroup.c b/security/device_cgroup.c
> index 44dfc41..c686110 100644
> --- a/security/device_cgroup.c
> +++ b/security/device_cgroup.c
> @@ -576,9 +576,12 @@ int __devcgroup_inode_permission(struct inode *inode, int mask)
>  
>  int devcgroup_inode_mknod(int mode, dev_t dev)
>  {
> - struct dev_cgroup *dev_cgroup = task_devcgroup(current);
> + struct dev_cgroup *dev_cgroup;
> + int ret;
>   short type;
>  
> + rcu_read_lock();
> + dev_cgroup = task_devcgroup(current);
>   if (!S_ISBLK(mode) && !S_ISCHR(mode))
>   return 0;
>  
> @@ -587,7 +590,9 @@ int devcgroup_inode_mknod(int mode, dev_t dev)
>   else
>   type = DEV_CHAR;
>  
> - return __devcgroup_check_permission(dev_cgroup, type, MAJOR(dev),
> + ret =  __devcgroup_check_permission(dev_cgroup, type, MAJOR(dev),
>   MINOR(dev), ACC_MKNOD);
> + rcu_read_unlock();
> + return ret;
>  
>  }
> 


I believe the same should be done for __devcgroup_inode_permission() as well. And we
probably can call task_devcgroup() and rcu_read_lock() after the "!S_ISBLK(mode) && !S_ISCHR(mode)"
check (otherwise we also need to unlock RCU on the `return 0' path).
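Sergey's suggested ordering can be sketched in userspace. The cheap mode test runs before entering the read-side section, so the early `return 0` never has to unlock, and there is only one exit path after the lock. The counter-based `rcu_read_lock_model()`/`rcu_read_unlock_model()` and the other `_model` names are hypothetical stand-ins. They track nesting depth only and provide no RCU semantics.

```c
#define _XOPEN_SOURCE 700  /* exposes S_IFREG/S_IFCHR for the usage checks */
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical stand-ins: a nesting counter models read-side depth. */
static int rcu_depth;
static void rcu_read_lock_model(void)   { rcu_depth++; }
static void rcu_read_unlock_model(void) { rcu_depth--; }

/* Stand-in for the permission check done while the cgroup is dereferenced. */
static int check_permission_model(int is_block) { (void)is_block; return 0; }

static int inode_mknod_model(mode_t mode)
{
	int ret;

	if (!S_ISBLK(mode) && !S_ISCHR(mode))
		return 0;            /* no lock taken yet, nothing to unlock */

	rcu_read_lock_model();
	/* task_devcgroup(current) would be dereferenced here */
	ret = check_permission_model(S_ISBLK(mode));
	rcu_read_unlock_model();
	return ret;
}
```

Both the early-return path and the checked path leave the depth balanced at zero.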





Commit ad676077
 | Author: Aristeu Rozanski 
 | Date:   Thu Oct 4 17:15:17 2012 -0700
 |   device_cgroup: convert device_cgroup internally to policy + exceptions

removed RCU read-side protection from devcgroup_inode_mknod(), which, however, is required
by task_devcgroup(). This patch also adds RCU read-side protection to the
__devcgroup_inode_permission() function, introduced in commit ad676077.

[0.946303] include/linux/cgroup.h:566 suspicious rcu_dereference_check() usage!
[0.946511] 
[0.946606] 2 locks held by kdevtmpfs/28:
[0.946684]  #0:  (sb_writers){.+.+.+}, at: [] mnt_want_write+0x24/0x4b
[0.947083]  #1:  (&sb->s_type->i_mutex_key#3/1){+.+.+.}, at: [] kern_path_create+0x83/0x144
[0.947598] 
[0.947787] Call Trace:
[0.947868]  [] lockdep_rcu_suspicious+0x109/0x112
[0.947958]  [] devcgroup_inode_mknod+0x9e/0xee
[0.948043]  [] vfs_mknod+0x8a/0xed
[0.948129]  [] handle_create.isra.2+0x144/0x1b5
[0.948214]  [] ? devtmpfsd+0x9f/0x138
[0.948298]  [] ? do_raw_spin_lock+0x67/0xde
[0.948384]  [] ? do_raw_spin_unlock+0x8f/0x98
[0.948469]  [] ? handle_create.isra.2+0x1b5/0x1b5
[0.948554]  [] devtmpfsd+0xe4/0x138
[0.948638]  [] ? handle_create.isra.2+0x1b5/0x1b5
[0.948724]  [] kthread+0xd5/0xdd
[0.948814]  [] kernel_thread_helper+0x4/0x10
[0.948900]  [] ? retint_restore_args+0x13/0x13
[0.948985]  [] ? __init_kthread_worker+0x5a/0x5a
[0.949069]  [] ? gs_change+0x13/0x13


devcgroup_inode_mknod() part submitted by Paul E. McKenney.


Signed-off-by: Sergey Senozhatsky 

---

 security/device_cgroup.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/security/device_cgroup.c b/security/device_cgroup.c
index 44dfc41..043eb00 100644
--- a/security/device_cgroup.c
+++ b/security/device_cgroup.c
@@ -558,7 +558,8 @@ static int __devcgroup_check_permission(struct dev_cgroup *dev_cgroup,
 
 int __devcgroup_inode_permission(struct inode *inode, int mask)
 {
-   struct dev_cgroup *dev_cgroup = task_devcgroup(current);
+   struct dev_cgroup *dev_cgroup;
+   int ret;
short type, access = 0;
 
if (S_ISBLK(inode->i_mode))
@@ -570,13 +571,20 @@ int __devcgroup_inode_permission(struct inode *inode, int mask)
if (mask & MAY_READ)
access |= ACC_READ;
 
-   return __devcgroup_check_permission(dev_cgroup, type, imajor(inode),
+   rcu_read_lock();
+
+   dev_cgroup = task_devcgroup(current);
+   ret = __devcgroup_check_permission(dev_cgroup, type, imajor(inode),
iminor(inode), access);
+
+   rcu_read_unlock();
+   return ret;
 }
 
 int devcgroup_inode_mknod(int mode, dev_t dev)
 {
-   struct dev_cgroup *dev_cgroup = task_devcgroup(current);
+   struct dev_cgroup *dev_cgroup;
+   int ret;
short type;
 
if (!S_ISBLK(mode) && !S_ISCHR(mode))
@@ -587,7 +595,12 @@ int devcgroup_inode_mknod(int mode, dev_t dev)
else
type = DEV_CHAR;
 
-   return __devcgroup_check_permission(dev_cgroup,

Re: [PATCH 4/5] aio: vmap ringbuffer

2012-10-09 Thread Kent Overstreet

On Tue, Oct 09, 2012 at 05:36:26PM -0700, Zach Brown wrote:
> > The only situation you have to worry about is when the ringbuffer fills
> > up and stuff goes on the list, and then completions completely stop -
> > this should be a rare enough situation that maybe we could just hack
> > around it with a timer that gets flipped on when the list isn't empty.
> 
> Right.  And this is when we hopefully realize that we're adding overhead
> and complexity (how long's the timer?  always fire it?  MAKE IT STOP)
> and are actually making the system worse, not better.

Can still prototype it, and if it's that ugly... I throw away code all
the time :P

> > Also, for this to be an issue at all, _all_ the reaping would have to be
> > done from userspace - since existing libaio doesn't do that, there may
> > not be any code out there which triggers it.
> 
> And there may be.  We default to not breaking interfaces.  Seriously.

All the more reason to think about it now so we don't screw up the next
interfaces :)
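The overflow scenario Kent and Zach are debating can be modelled in a few lines. Completions go into a fixed-size ring and spill onto an overflow list when the ring is full. The overflow moves back into the ring only when the kernel side reaps, so if userspace consumes straight from the ring, nothing drains the overflow. Everything below (`struct ring_model`, the sizes, the function names) is hypothetical and is not the fs/aio.c implementation.

```c
#include <assert.h>

#define RING_SLOTS   4
#define OVERFLOW_MAX 16

/* Hypothetical model of a completion ring plus an overflow list. */
struct ring_model {
	int slots[RING_SLOTS];
	unsigned head, tail;        /* head: next to consume, tail: next to fill */
	int overflow[OVERFLOW_MAX];
	unsigned nr_overflow;
};

static unsigned ring_used(const struct ring_model *r)
{
	return r->tail - r->head;
}

/* Completion side: use the ring if there is room, else park the event
 * on the overflow list (dropped in this model if the list is also full). */
static void complete_event(struct ring_model *r, int ev)
{
	if (ring_used(r) < RING_SLOTS)
		r->slots[r->tail++ % RING_SLOTS] = ev;
	else if (r->nr_overflow < OVERFLOW_MAX)
		r->overflow[r->nr_overflow++] = ev;
}

/* Move parked events back into the ring, oldest first. */
static void drain_overflow(struct ring_model *r)
{
	unsigned moved = 0;

	while (moved < r->nr_overflow && ring_used(r) < RING_SLOTS)
		r->slots[r->tail++ % RING_SLOTS] = r->overflow[moved++];
	for (unsigned i = moved; i < r->nr_overflow; i++)
		r->overflow[i - moved] = r->overflow[i];
	r->nr_overflow -= moved;
}

/* Userspace-style reap: reads the ring directly, never drains overflow. */
static int reap_user(struct ring_model *r, int *ev)
{
	if (ring_used(r) == 0)
		return 0;
	*ev = r->slots[r->head++ % RING_SLOTS];
	return 1;
}

/* Kernel-style reap: refills from the overflow list before reading. */
static int reap_kernel(struct ring_model *r, int *ev)
{
	drain_overflow(r);
	return reap_user(r, ev);
}
```

Filling the ring past capacity and then reaping only with `reap_user()` leaves the parked events stranded until something calls `reap_kernel()`.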

Re: RNG: is it possible to spoil /dev/random by seeding it from (evil) TRNGs

2012-10-09 Thread Christoph Anton Mitterer

On Sun, 2012-10-07 at 21:24 -0400, Theodore Ts'o wrote:
> I've looked at his message, I didn't see any justification for his
> concern/assertion.  So I can't really comment on it since he didn't
> give any reason for his belief.
I asked him again[0] to be sure, and he replied that he has no reason to
believe it's possible to spoil it.



> We've made a lot of changes in how we gather entropy recently
>...
I see... I guess this was in 3.6 then? Because I made some tests with 3.5,
and there (even on my desktop) the available entropy is always rather
low... but with haveged it quickly falls and rises (which actually
puzzles me) between 4096 and ~1k.



> We're not using SHA as a traditional cryptographic hash
>...
Of course :) Thanks for the good explanation of the operation though!


> So I'm not particularly worried at this point.  The other thing to
> note is that the possible alternatives to SHA-1 (i.e., SHA-2 and
> SHA-3) are actually slower, not faster.  So we would be giving up
> performance if we were to use them.
I rather meant other fast algorithms, e.g. those from the SHA-3
competition, which seem to be faster than SHA-1.
I haven't measured this myself; I just went by:
http://arctic.org/~dean/crypto/sha-sse2-20041218.txt
http://skein-hash.info/sha3-engineering
Well, it's perhaps rather minor...


Thanks anyway for all your information :)


Cheers,
Chris.



[0]
http://lists.gnupg.org/pipermail/gnupg-users/2012-October/045551.html



Re: [PATCH 04/10] x86, mm: Don't clear page table if next range is ram

2012-10-09 Thread Yinghai Lu

On Tue, Oct 9, 2012 at 9:04 AM, Konrad Rzeszutek Wilk  wrote:
> How do we clean it wrongly?
>> And it only happens when we are trying to map range one by one range separately.
>>
>> After we add checking before clearing the related page table, that panic will
>> not happen anymore.
>
> So we do not clean the pages anymore. Is the cleaning of the pages
> addressed somewhere?

That is related to the calling sequence.

The old design assumed that we would call it several times under 1G.

Re: acpi : acpi_bus_trim() stops removing devices when failing to remove the device

2012-10-09 Thread Yasuaki Ishimatsu


Hi Toshi,

2012/10/10 1:36, Toshi Kani wrote:

On Tue, 2012-10-09 at 17:48 +0900, Yasuaki Ishimatsu wrote:

acpi_bus_trim() stops removing devices when acpi_bus_remove() returns an error.
But acpi_bus_remove() cannot return an error correctly: it only returns
-EINVAL when the dev argument is NULL. Thus, even if a device cannot be removed
correctly, acpi_bus_trim() ignores the failure and continues to remove devices.
acpi_bus_hot_remove_device() uses acpi_bus_trim() to remove devices, so
acpi_bus_hot_remove_device() can send "_EJ0" to the firmware even while the
device is still in use by the system. In that case, the system cannot work
properly.

Vasilis hit the bug during memory hotplug and reported it as follows:
https://lkml.org/lkml/2012/9/26/318

So acpi_bus_trim() should correctly check whether the device was removed.
This patch adds error checks to the functions that remove the device.

With this patch applied, acpi_bus_trim() stops removing devices when it fails
to remove a device. I think this has no impact outside the CPU and memory
hotplug paths, because removal of other devices fails only in irregular
cases, such as when the device is NULL.
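The error-propagation idea in the patch can be sketched in userspace. A removal that fails leaves the device bound and rolls back the sysfs step already undone. The trim loop stops at the first failure instead of carrying on towards _EJ0. The names `struct dev_model`, `release_driver_model()` and `trim_model()` are hypothetical. The real code paths are __device_release_driver() and acpi_bus_trim(), and the real trim order is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct dev_model {
	bool bound;                    /* driver still attached */
	bool sysfs_linked;             /* stand-in for the driver sysfs links */
	int (*remove)(struct dev_model *dev);
};

/* Detach the driver; on failure, undo the sysfs step and stay bound. */
static int release_driver_model(struct dev_model *dev)
{
	int ret = 0;

	if (!dev->bound)
		return 0;
	dev->sysfs_linked = false;     /* like driver_sysfs_remove() */
	if (dev->remove)
		ret = dev->remove(dev);
	if (ret) {
		dev->sysfs_linked = true;  /* rollback, like driver_sysfs_add() */
		return ret;
	}
	dev->bound = false;
	return 0;
}

/* Remove devices in order and stop at the first failure, so the caller
 * does not go on to eject hardware that is still in use. */
static int trim_model(struct dev_model *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = release_driver_model(&devs[i]);

		if (ret)
			return ret;
	}
	return 0;
}

static int remove_ok(struct dev_model *dev)   { (void)dev; return 0; }
static int remove_busy(struct dev_model *dev) { (void)dev; return -EBUSY; }
```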

Signed-off-by: Yasuaki Ishimatsu 

---
  drivers/acpi/scan.c|   15 ---
  drivers/base/dd.c  |   22 +-
  include/linux/device.h |2 +-
  3 files changed, 30 insertions(+), 9 deletions(-)

Index: linux-3.6/drivers/acpi/scan.c
===
--- linux-3.6.orig/drivers/acpi/scan.c  2012-10-09 17:25:40.956496325 +0900
+++ linux-3.6/drivers/acpi/scan.c   2012-10-09 17:25:55.405497800 +0900
@@ -445,12 +445,17 @@ static int acpi_device_remove(struct dev
  {
struct acpi_device *acpi_dev = to_acpi_device(dev);
struct acpi_driver *acpi_drv = acpi_dev->driver;
+   int ret;

if (acpi_drv) {
if (acpi_drv->ops.notify)
acpi_device_remove_notify_handler(acpi_dev);
-   if (acpi_drv->ops.remove)
-   acpi_drv->ops.remove(acpi_dev, acpi_dev->removal_type);
+   if (acpi_drv->ops.remove) {
+   ret = acpi_drv->ops.remove(acpi_dev,
+  acpi_dev->removal_type);
+   if (ret)


Hi Yasuaki,

Shouldn't the notify handler be reinstalled here if it was removed by
the acpi_device_remove_notify_handler() above?


I do not reinstall the notify handler.
The function has not been removed in linux-3.6, and the patch was created
against linux-3.6, so the function remains in the patch.

Thanks,
Yasuaki Ishimatsu



Thanks,
-Toshi


+   return ret;
+   }
}
acpi_dev->driver = NULL;
acpi_dev->driver_data = NULL;
@@ -1226,11 +1231,15 @@ static int acpi_device_set_context(struc

  static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
  {
+   int ret;
+
if (!dev)
return -EINVAL;

dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
-   device_release_driver(&dev->dev);
+   ret = device_release_driver(&dev->dev);
+   if (ret)
+   return ret;

if (!rmdevice)
return 0;
Index: linux-3.6/drivers/base/dd.c
===
--- linux-3.6.orig/drivers/base/dd.c2012-10-01 08:47:46.0 +0900
+++ linux-3.6/drivers/base/dd.c 2012-10-09 17:25:55.442497825 +0900
@@ -475,9 +475,10 @@ EXPORT_SYMBOL_GPL(driver_attach);
   * __device_release_driver() must be called with @dev lock held.
   * When called for a USB interface, @dev->parent lock must be held as well.
   */
-static void __device_release_driver(struct device *dev)
+static int __device_release_driver(struct device *dev)
  {
struct device_driver *drv;
+   int ret = 0;

drv = dev->driver;
if (drv) {
@@ -493,9 +494,11 @@ static void __device_release_driver(stru
pm_runtime_put_sync(dev);

if (dev->bus && dev->bus->remove)
-   dev->bus->remove(dev);
+   ret = dev->bus->remove(dev);
else if (drv->remove)
-   drv->remove(dev);
+   ret = drv->remove(dev);
+   if (ret)
+   goto rollback;
devres_release_all(dev);
dev->driver = NULL;
dev_set_drvdata(dev, NULL);
@@ -506,6 +509,12 @@ static void __device_release_driver(stru
 dev);

}
+
+   return ret;
+
+rollback:
+   driver_sysfs_add(dev);
+   return ret;
  }

  /**
@@ -515,16 +524,19 @@ static void __device_release_driver(stru
   * Manually detach device from driver.
   * When called for a USB interface, @dev->parent lock must be held.
   */
-void device_release_driver(struct device *dev)
+int device_release_driver(struct device *dev)
  {

Re: [PATCH 03/10] x86, mm: get early page table from BRK

2012-10-09 Thread Yinghai Lu

On Tue, Oct 9, 2012 at 9:01 AM, Konrad Rzeszutek Wilk  wrote:
>> +RESERVE_BRK(early_pgt_alloc, 16384);
>
> How did you come up with 16KB being the right size? What is this
> based on? Can you provide a comment explaining why 16KB is the
> right value on 32-bit and 64-bit machines?

Good point. I added a one-line comment to the revised patch:

+/* need 3 4k for initial PMD_SIZE, 4k for 0-ISA_END_ADDRESS */
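One reading of the new comment is three 4 KiB pages for the initial PMD_SIZE mapping plus one 4 KiB page for the range 0 to ISA_END_ADDRESS. That interpretation is ours, not stated elsewhere in the thread. Under it, the reserved size works out as:

```latex
\underbrace{3 \times 4096}_{\text{initial PMD\_SIZE}} + \underbrace{1 \times 4096}_{0 \ldots \text{ISA\_END\_ADDRESS}} = 4 \times 4096 = 16384 \ \text{bytes}
```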

Re: [PATCH 2/3] x86, mm: Don't clear page table if next range is ram

2012-10-09 Thread Yinghai Lu

On Tue, Oct 9, 2012 at 8:46 AM, Konrad Rzeszutek Wilk  wrote:
>> + !e820_any_mapped(addr & PAGE_MASK, next, 0))
>
> What is the 0 parameter for?

0 means any type.

If type != 0, it will only check entries with the same type.

int
e820_any_mapped(u64 start, u64 end, unsigned type)
{
	int i;

	for (i = 0; i < e820.nr_map; i++) {
		struct e820entry *ei = &e820.map[i];

		if (type && ei->type != type)
			continue;
		if (ei->addr >= end || ei->addr + ei->size <= start)
			continue;
		return 1;
	}
	return 0;
}
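To show what passing 0 as the type means, here is a userspace copy of the loop above run against a three-entry table. The table contents and the `struct e820entry_model`/`any_mapped()` names are assumptions for the example. The type values follow the traditional e820 numbering (E820_RAM = 1, E820_RESERVED = 2).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

#define E820_RAM      1
#define E820_RESERVED 2

struct e820entry_model {
	u64 addr;
	u64 size;
	unsigned type;
};

/* Hypothetical example map: RAM below 640K, a reserved BIOS area, RAM from 1M. */
static const struct e820entry_model map[] = {
	{ 0x00000000, 0x0009f000, E820_RAM },
	{ 0x000f0000, 0x00010000, E820_RESERVED },
	{ 0x00100000, 0x3ff00000, E820_RAM },
};
static const int nr_map = sizeof(map) / sizeof(map[0]);

/* Same logic as e820_any_mapped(): type == 0 matches every entry type, and
 * [start, end) overlaps an entry unless it lies wholly before or after it. */
static int any_mapped(u64 start, u64 end, unsigned type)
{
	for (int i = 0; i < nr_map; i++) {
		const struct e820entry_model *ei = &map[i];

		if (type && ei->type != type)
			continue;
		if (ei->addr >= end || ei->addr + ei->size <= start)
			continue;
		return 1;
	}
	return 0;
}
```

The hole between 0x9f000 and 0xf0000 is not mapped at all. The BIOS area counts as mapped with type 0 but not when only E820_RAM entries are accepted.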

Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmxchg(), fix bug

2012-10-09 Thread Kent Overstreet

On Tue, Oct 09, 2012 at 05:26:34PM -0700, Zach Brown wrote:
> > The AIO ringbuffer stuff just annoys me more than most
> 
> Not more than everyone, though, I can personally promise you that :).
> 
> > (it wasn't until
> > the other day that I realized it was actually exported to userspace...
> > what led to figuring that out was noticing aio_context_t was a ulong,
> > and got truncated to 32 bits with a 32 bit program running on a 64 bit
> > kernel. I'd been horribly misled by the code comments and the lack of
> > documentation.) 
> 
> Yeah.  It's the userspace address of the mmaped ring.  This has annoyed
> the process migration people who can't recreate the context in a new
> kernel because there's no userspace interface to specify creation of a
> context at a specific address.

Yeah I did finally figure that out - and a file descriptor that
userspace then mmap()ed would solve that problem...
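The truncation Kent describes is ordinary integer narrowing. The context handle is the userspace address of the mapped ring, and a 32-bit `unsigned long` cannot hold a 64-bit value. The address below is made up. Fixed-width types make the illustration behave the same on any host, and `store_in_32bit_ulong()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* What a 64-bit kernel might hand back: the user address of the mmaped
 * ring. The value is made up for illustration. */
static const uint64_t ring_addr_64 = 0x00007f3a12340000ULL;

/* Storing the handle in a 32-bit unsigned long keeps only the low 32 bits. */
static uint32_t store_in_32bit_ulong(uint64_t handle)
{
	return (uint32_t)handle;
}
```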

> 
> > But if we do have an explicit handle, I don't see why it shouldn't be a
> > file descriptor.
> 
> Because they're expensive to create and destroy when compared to a
> single system call.  Imagine that we're waiting for a single
> completion to implement a cheap one-off sync call.  Imagine it's a
> buffered op which happens to hit the cache and is really quick.

True. But that could be solved with a separate interface that either
doesn't use a context to submit a call synchronously, or uses an
implicit per thread context.

> (And they're annoying to manage: libraries and O_CLOEXEC, running into
> fd/file limit tunables, bleh.)

I don't have a _strong_ opinion there, but my intuition is that we
shouldn't be creating new types of handles without a good reason. I
don't think the annoyances are for the most part particular to file
descriptors, I think they tend to be applicable to handles in general and
at least with file descriptors they're known and solved.

Also, with a file descriptor it naturally works with an epoll event
loop. (eventfd for aio is a hack).

> If the 'completion context' is no more than a structure in userspace
> memory then a lot of stuff just works.  Tasks can share it amongst
> themselves as they see fit.  A trivial one-off sync call can just dump
> it on the stack and point to it.  It doesn't have to be specifically
> torn down on task exit.

That would be awesome, though for it to be worthwhile there couldn't be
any kernel notion of a context at all and I'm not sure if that's
practical. But the idea hadn't occurred to me before and I'm sure you've
thought about it more than I have... hrm.

Oh hey, that's what acall does :P

For completions though you really want the ringbuffer pinned... what do
you do about that?

> > > And perhaps obviously, I'd start with the acall stuff :).  It was a lot
> > > lighter.  We could talk about how to make it extensible without going
> > > all the way to the generic packed variable size duplicating or not and
> > > returning or not or.. attributes :).
> > 
> > Link? I haven't heard of acall before.
> 
> I linked to it after that giant silly comment earlier in the thread,
> here it is again:
> 
>   http://lwn.net/Articles/316806/

Oh whoops, hadn't started reading yet - looking at it now :)

> There's a mostly embarrassing video of a jetlagged me giving that talk at
> LCA kicking around.. ah, here:
> 
>  http://mirror.linux.org.au/pub/linux.conf.au/2009/Thursday/131.ogg
> 
> - z

Re: [PATCH] ACPI video: Ignore AE_AML_PACKAGE_LIMIT errors after _DOD evaluation.

2012-10-09 Thread Igor Murzov

On Wed, 10 Oct 2012 00:58:10 +0200
"Rafael J. Wysocki"  wrote:

> On Wednesday 10 of October 2012 02:19:06 Igor Murzov wrote:
> > This should fix brightness controls on some laptops.
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47861
> > 
> > Signed-off-by: Igor Murzov 
> 
> Put more details into the changelog, please.  The BZ entry linked above
> may or may not be accessible when the changelog is read by someone.
> 
> > ---
> >  drivers/acpi/video.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/acpi/video.c b/drivers/acpi/video.c
> > index 1e0a9e1..01bb58d 100644
> > --- a/drivers/acpi/video.c
> > +++ b/drivers/acpi/video.c
> > @@ -1349,8 +1349,12 @@ acpi_video_bus_get_devices(struct acpi_video_bus *video,
> > struct acpi_device *dev;
> >  
> > status = acpi_video_device_enumerate(video);
> > -   if (status)
> > -   return status;
> > +   if (status) {
> > +   if (status == AE_AML_PACKAGE_LIMIT)
> > +   status = 0; /* Ignore this error */
> > +   else
> > +   return status;
> > +   }
> 
> First off, please add a comment explaining _why_ we're ignoring the error.

The problem is that I'm not sure if it's ok to ignore
AE_AML_PACKAGE_LIMIT here. Stefan Wilkens in bugzilla
claims that video module works fine on his Compaq 6720s
in spite of the fact that acpi_video_device_enumerate()
is not able to find any video device on his laptop.
This fix is just the most obvious one, not necessarily
the proper one.


> Second, what about adding just:
> 
> if (status == AE_AML_PACKAGE_LIMIT)
>   status = 0;
> 
> before the "if (status)" check?

Ok

-- Igor

> >  
> > list_for_each_entry(dev, &device->children, node) {
> 
> Thanks,
> Rafael
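Rafael's suggested shape maps the one tolerated status to success before the generic check. The following is a userspace sketch of it. The status constants and `get_devices_model()` are hypothetical stand-ins for ACPICA's AE_* codes and acpi_video_bus_get_devices(). Whether ignoring AE_AML_PACKAGE_LIMIT is actually safe remains the open question Igor raises.

```c
#include <assert.h>

/* Hypothetical status codes; ACPICA's real AE_* values differ. */
typedef int acpi_status_model;
#define AE_OK_MODEL                0
#define AE_AML_PACKAGE_LIMIT_MODEL 1
#define AE_OTHER_ERROR_MODEL       2

static int children_scanned;

static int get_devices_model(acpi_status_model enumerate_status)
{
	acpi_status_model status = enumerate_status;

	/* Tolerate a package-limit error from _DOD evaluation; on at least
	 * one reported laptop the video devices still work despite it. */
	if (status == AE_AML_PACKAGE_LIMIT_MODEL)
		status = AE_OK_MODEL;
	if (status)
		return status;

	children_scanned++;   /* stands in for the list_for_each_entry() loop */
	return 0;
}
```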

Re: [PATCH 4/5] aio: vmap ringbuffer

2012-10-09 Thread Zach Brown

> The only situation you have to worry about is when the ringbuffer fills
> up and stuff goes on the list, and then completions completely stop -
> this should be a rare enough situation that maybe we could just hack
> around it with a timer that gets flipped on when the list isn't empty.

Right.  And this is when we hopefully realize that we're adding overhead
and complexity (how long's the timer?  always fire it?  MAKE IT STOP)
and are actually making the system worse, not better.

> Also, for this to be an issue at all, _all_ the reaping would have to be
> done from userspace - since existing libaio doesn't do that, there may
> not be any code out there which triggers it.

And there may be.  We default to not breaking interfaces.  Seriously.

- z

RE: [PATCH] fix x2apic defect that Linux kernel doesn't mask 8259A interrupt during the time window between changing VT-d table base address and initializing these VT-d entries(smpboot.c and apic.c )

2012-10-09 Thread Zhang, Lin-Bao (Linux Kernel R)

Hi Suresh, 

Thanks very much for your reply!
I tested this patch on 2.6.32 with our machine, and it does resolve the
current issue. That's good.
I also checked 3.x versions of the kernel source, for example 3.3.8 and 3.0.0,
and they do include a similar patch.
I think the key change is this line:
+   /*
+* The number of IO-APIC IRQ registers (== #pins):
+*/
+   nr_ioapic_registers[idx] = entries;

In 3.3.8, it looks like this:
	/*
	 * The number of IO-APIC IRQ registers (== #pins):
	 */
	ioapics[idx].nr_registers = entries;
I also tested with 3.3.8: without modification, it does not reproduce our issue.
If I comment out this line (ioapics[idx].nr_registers = entries;), it reproduces
the problem that occurs in 2.6.32.
So this shows that your patch should work for the 2.6.x kernel.

But I am not sure why it works. Let's discuss it further; I am studying the
whole source again.
It seems that we have two ways to fix the current problem:
a) your patch: all the IO-APIC RTEs are masked from the time we enable
   interrupt-remapping to the time when the IO-APIC RTEs are configured correctly.
b) my patch: mask interrupts from the time the VT-d table base address is changed
   until the VT-d entries are initialized successfully.
I suppose that during this time window we should disable the 8259A interrupt.

> As I mentioned earlier, the current design already ensures that all the IO-APIC
> RTE's are masked between the time we enable interrupt-remapping to the time
> when the IO-APIC RTE's are configured correctly.
> 
So, as I understand your patch, during that window the IO-APIC is effectively
unused, or we can think of it as not existing?
Would you mind sharing your design details? Thanks very much!

> So I looked at why you are seeing the problem with v2.6.32 but not with the
> recent kernels. And I think I found out the reason.
> 
> 2.6.32 kernel is missing this fix,
> 
Yes, 2.6.32 doesn't have this patch; 3.x.x does have it.

> Because of this, in v2.6.32, mask_IO_APIC_setup() is not working as expected
> as nr_ioapic_registers[] are not yet initialized and thus the io-apic RTE's are not
> masked as expected.
> 
> We just need the last hunk of that patch, I think.
> 
> Can you please apply the appended patch to 2.6.32 kernel and see if the issue
> you mentioned gets fixed? If so, we can ask the -stable and OSV's teams to
> pick up this fix.
Yes, it resolves the current issue.
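Suresh's diagnosis comes down to a loop bound. If the per-IO-APIC register count is still zero when the masking routine runs, the loop masks nothing and the RTEs stay live. The following is a userspace sketch of that failure mode. `struct ioapic_model`, `mask_all_rtes()` and the pin count of 24 are hypothetical and do not reproduce mask_IO_APIC_setup().

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PINS 24

struct ioapic_model {
	int nr_registers;          /* like nr_ioapic_registers[idx] */
	bool masked[MAX_PINS];     /* one mask bit per redirection entry */
};

/* Masking walks nr_registers entries: if the count is not set yet,
 * the loop body never runs and nothing is masked. */
static void mask_all_rtes(struct ioapic_model *io)
{
	for (int pin = 0; pin < io->nr_registers; pin++)
		io->masked[pin] = true;
}

static int count_masked(const struct ioapic_model *io)
{
	int n = 0;

	for (int pin = 0; pin < MAX_PINS; pin++)
		n += io->masked[pin];
	return n;
}
```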

Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmxchg(), fix bug

2012-10-09 Thread Zach Brown

> The AIO ringbuffer stuff just annoys me more than most

Not more than everyone, though, I can personally promise you that :).

> (it wasn't until
> the other day that I realized it was actually exported to userspace...
> what led to figuring that out was noticing aio_context_t was a ulong,
> and got truncated to 32 bits with a 32 bit program running on a 64 bit
> kernel. I'd been horribly misled by the code comments and the lack of
> documentation.) 

Yeah.  It's the userspace address of the mmaped ring.  This has annoyed
the process migration people who can't recreate the context in a new
kernel because there's no userspace interface to specify creation of a
context at a specific address.

> But if we do have an explicit handle, I don't see why it shouldn't be a
> file descriptor.

Because they're expensive to create and destroy when compared to a
single system call.  Imagine that we're waiting for a single
completion to implement a cheap one-off sync call.  Imagine it's a
buffered op which happens to hit the cache and is really quick.

(And they're annoying to manage: libraries and O_CLOEXEC, running into
fd/file limit tunables, bleh.)

If the 'completion context' is no more than a structure in userspace
memory then a lot of stuff just works.  Tasks can share it amongst
themselves as they see fit.  A trivial one-off sync call can just dump
it on the stack and point to it.  It doesn't have to be specifically
torn down on task exit.
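A minimal sketch of the "completion context is just userspace memory" idea, assuming a design the kernel does not have: `struct completion_ctx`, `ctx_init()`, `ctx_complete()`, `submit_cached_op()` and `sync_op()` are hypothetical, and the "I/O" completes synchronously to model the cache-hit case described above. It shows a one-off synchronous call keeping its context on the stack, with nothing to create, close or mark O_CLOEXEC; the atomics are there so that several threads could share one context.

```c
#include <stdatomic.h>

/* Hypothetical completion context: plain memory, no kernel object behind it. */
struct completion_ctx {
	atomic_uint pending;   /* submitted but not yet completed */
	atomic_uint completed; /* completions observed so far */
};

static void ctx_init(struct completion_ctx *ctx)
{
	atomic_init(&ctx->pending, 0);
	atomic_init(&ctx->completed, 0);
}

/* Called by whoever finishes an operation (the kernel, in the proposal). */
static void ctx_complete(struct completion_ctx *ctx)
{
	atomic_fetch_sub(&ctx->pending, 1);
	atomic_fetch_add(&ctx->completed, 1);
}

/* Hypothetical submit: the "I/O" hits the cache and completes at once. */
static void submit_cached_op(struct completion_ctx *ctx)
{
	atomic_fetch_add(&ctx->pending, 1);
	ctx_complete(ctx);
}

/* One-off synchronous call: the context lives on the stack and simply
 * goes away on return; there is no fd to open, close or leak. */
static unsigned sync_op(void)
{
	struct completion_ctx ctx;

	ctx_init(&ctx);
	submit_cached_op(&ctx);
	while (atomic_load(&ctx.pending) != 0)
		; /* a real design would block here instead of spinning */
	return atomic_load(&ctx.completed);
}
```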

> > And perhaps obviously, I'd start with the acall stuff :).  It was a lot
> > lighter.  We could talk about how to make it extensible without going
> > all the way to the generic packed variable size duplicating or not and
> > returning or not or.. attributes :).
> 
> Link? I haven't heard of acall before.

I linked to it after that giant silly comment earlier in the thread,
here it is again:

  http://lwn.net/Articles/316806/

There's a mostly embarrassing video of a jetlagged me giving that talk at
LCA kicking around.. ah, here:

 http://mirror.linux.org.au/pub/linux.conf.au/2009/Thursday/131.ogg

- z

Re: [ANNOUNCE] 3.6.1-rt1

2012-10-09 Thread Steven Rostedt

On Tue, 2012-10-09 at 14:19 -0400, Steven Rostedt wrote:

I applied and tested the backported patches to 3.4-rt. Things look good
and I will be posting the -rc1 soon.

Status for 3.0-rt:

> -scsi-qla2xxx-fix-bug-sleeping-function-called-from-invalid-context.patch
> 0001-upstream-net-rt-remove-preemption-disabling-in-netif_rx.patch

The above two have been applied to both 3.0-rt and 3.4-rt previously.

> 0002-random-make-it-work-on-rt.patch

The above has been applied to 3.4-rt but is not applicable to 3.0-rt, as
add_interrupt_randomness() does not exist there.

> 0003-softirq-init-softirq-local-lock-after-per-cpu-section-is-set-up.patch

For 3.0-rt the softirq_early_init() comes after the printk banner, which
is after per cpu data has been set up. But for consistency, I moved it
before the banner, to the same location as in 3.4-rt+.

> 0004-mm-slab-fix-potential-deadlock.patch
> 0005-mm-page-alloc-use-local-lock-on-target-cpu.patch
> 0006-rt-rw-lockdep-annotations.patch

The above applied to 3.0-rt with no issues.

* sched-better-debug-output-for-might-sleep.patch 

Had slight conflicts, but the fix was trivial.

> 0007-stomp-machine-deal-clever-with-stopper-lock.patch

With this one, things have changed quite a bit. I'll take a deeper look
at what you did and figure out how this applies to v3.0-rt.

-- Steve


Re: [PATCH 1/2] ARM: OMAP: Trivial driver changes to remove include plat/cpu.h

2012-10-09 Thread Tony Lindgren

* Péter Ujfalusi  [121009 02:03]:
> On 10/08/2012 07:35 PM, Tony Lindgren wrote:
> 
> > - omap-dma.c and omap-pcm.c can test the arch locally as
> >   omap1 and omap2 cannot be compiled together because of
> >   conflicting compiler flags
> 
> >  sound/soc/omap/omap-pcm.c |9 +++--
> 
> Tony: is this going to be included in 3.7?

Hmm, I guess we could try to get this out of the way
to cut down the dependencies. Let's see if the maintainers
of the other affected drivers are OK with this for the
-rc series.

Regards,

Tony

RE: [PATCH 2/2] Drivers: hv: Add Hyper-V balloon driver

2012-10-09 Thread KY Srinivasan



> -Original Message-
> From: Andrew Morton [mailto:a...@linux-foundation.org]
> Sent: Tuesday, October 09, 2012 3:45 PM
> To: KY Srinivasan
> Cc: gre...@linuxfoundation.org; linux-kernel@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> a...@firstfloor.org
> Subject: Re: [PATCH 2/2] Drivers: hv: Add Hyper-V balloon driver
> 
> On Sun,  7 Oct 2012 16:59:46 -0700
> "K. Y. Srinivasan"  wrote:
> 
> > Add the basic balloon driver.
> 
> hm, how many balloon drivers does one kernel need?
> 
> Although I see that the great majority of this code is hypervisor-specific.
> 
> > Windows hosts dynamically manage the guest
> > memory allocation via a combination of memory hot add and ballooning. Memory
> > hot add is used to grow the guest memory up to the maximum memory that can
> > be allocated to the guest. Ballooning is used to both shrink as well as expand
> > up to the max memory. Supporting hot add needs additional support from the
> > host. We will support hot add when this support is available. For now,
> > by setting the VM startup memory to the VM max memory, we can use
> > ballooning alone to dynamically manage memory allocation amongst
> > competing guests on a given host.
> >
> >
> > ...
> >
> > +static int  alloc_balloon_pages(struct hv_dynmem_device *dm, int num_pages,
> > +struct dm_balloon_response *bl_resp, int alloc_unit,
> > +bool *alloc_error)
> > +{
> > +   int i = 0;
> > +   struct page *pg;
> > +
> > +   if (num_pages < alloc_unit)
> > +   return 0;
> > +
> > +   for (i = 0; (i * alloc_unit) < num_pages; i++) {
> > +   if (bl_resp->hdr.size + sizeof(union dm_mem_page_range) >
> > +   PAGE_SIZE)
> > +   return i * alloc_unit;
> > +
> > +   pg = alloc_pages(GFP_HIGHUSER | __GFP_NORETRY | GFP_ATOMIC |
> > +   __GFP_NOMEMALLOC | __GFP_NOWARN,
> > +   get_order(alloc_unit << PAGE_SHIFT));
> 
> This choice of GFP flags is basically impossible to understand, so I
> suggest that a comment be added explaining it all.
> 
> I'm a bit surprised at the inclusion of GFP_ATOMIC as it will a) dip
> into page reserves, which might be undesirable and b) won't even
> reclaim clean pages, which seems desirable.  I suggest this also be
> covered in the forthcoming code comment.

I will rework these flags and add appropriate comments.

> 
> drivers/misc/vmw_balloon.c seems to me to have used better choices here.
> 
> > +   if (!pg) {
> > +   *alloc_error = true;
> > +   return i * alloc_unit;
> > +   }
> > +
> > +   totalram_pages -= alloc_unit;
> 
> Well, I'd consider totalram_pages to be an mm-private thing which drivers
> shouldn't muck with.  Why is this done?

By modifying totalram_pages, the information presented in /proc/meminfo
(MemTotal) correctly reflects what is currently assigned to the guest.
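To see the effect described above from inside the guest, a tool can read MemTotal from /proc/meminfo, whose lines have the form `Name:   value kB` (the same file also reports Committed_AS). A minimal sketch, assuming Linux; `meminfo_field_kb()` and `read_meminfo()` are hypothetical helpers, and parsing a buffer rather than the file directly keeps the parser testable.

```c
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of field `name` in a /proc/meminfo-style
 * buffer, or -1 if the field is absent or malformed. */
static long meminfo_field_kb(const char *buf, const char *name)
{
	size_t len = strlen(name);
	const char *line = buf;

	while (line && *line) {
		/* Match "Name:" exactly at the start of a line. */
		if (strncmp(line, name, len) == 0 && line[len] == ':') {
			long kb;

			if (sscanf(line + len + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read the live file into buf (NUL-terminated); returns 0 on success. */
static int read_meminfo(char *buf, size_t size)
{
	if (size == 0)
		return -1;
	FILE *f = fopen("/proc/meminfo", "r");
	if (!f)
		return -1;
	size_t n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

For example, after `read_meminfo(buf, sizeof(buf))` succeeds, `meminfo_field_kb(buf, "MemTotal")` gives the guest's current MemTotal in kB.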
 
> 
> drivers/xen/balloon.c and drivers/virtio/virtio_balloon.c also alter
> totalram_pages, also without explaining why.
> drivers/misc/vmw_balloon.c does not.
> 
> > +   dm->num_pages_ballooned += alloc_unit;
> > +
> > +   bl_resp->range_count++;
> > +   bl_resp->range_array[i].finfo.start_page =
> > +   page_to_pfn(pg);
> > +   bl_resp->range_array[i].finfo.page_cnt = alloc_unit;
> > +   bl_resp->hdr.size += sizeof(union dm_mem_page_range);
> > +
> > +   }
> > +
> > +   return num_pages;
> > +}
> >
> > ...
> >
> 
> 
> 

Thanks for the prompt review. I will address your comments and repost the
patches soon.
If it is OK with you, I am going to keep the code that manipulates
totalram_pages (for the reasons I listed above).

Regards,

K. Y


Re: [PATCH] cpufreq, powernow-k8: Fix usage of smp_processor_id() in preemptible code

2012-10-09 Thread Tejun Heo

Hello,

On Tue, Oct 09, 2012 at 09:38:44PM +0200, Andreas Herrmann wrote:
> 
> Commit 6889125b8b4e09c5e53e6ecab3433bed1ce198c9
> (cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another 
> CPU)
> causes powernow-k8 to trigger a preempt warning, e.g.:
> 
>   BUG: using smp_processor_id() in preemptible [] code: cpufreq/3776
>   caller is powernowk8_target+0x20/0x49
>   Pid: 3776, comm: cpufreq Not tainted 3.6.0 #9
>   Call Trace:
>[] debug_smp_processor_id+0xc7/0xe0
>[] powernowk8_target+0x20/0x49
>[] __cpufreq_driver_target+0x82/0x8a
>[] cpufreq_governor_performance+0x4e/0x54
>[] __cpufreq_governor+0x8c/0xc9
>[] __cpufreq_set_policy+0x1a9/0x21e
>[] store_scaling_governor+0x16f/0x19b
>[] ? cpufreq_update_policy+0x124/0x124
>[] ? _raw_spin_unlock_irqrestore+0x2c/0x49
>[] store+0x60/0x88
>[] sysfs_write_file+0xf4/0x130
>[] vfs_write+0xb5/0x151
>[] sys_write+0x4a/0x71
>[] system_call_fastpath+0x16/0x1b
...
> - /*
> -  * Must run on @pol->cpu.  cpufreq core is responsible for ensuring
> -  * that we're bound to the current CPU and pol->cpu stays online.
> -  */

Urgh... so this wasn't true?  Well, the perils of the last minute
changes.

> - if (smp_processor_id() == pol->cpu)
> - return powernowk8_target_fn();
> - else
> - return work_on_cpu(pol->cpu, powernowk8_target_fn, );
> + this_cpu = get_cpu();
> + if (this_cpu == pol->cpu) {
> + ret = powernowk8_target_fn();
> + put_cpu();
> + } else {
> + put_cpu();
> + ret = work_on_cpu(pol->cpu, powernowk8_target_fn, );
> + }
> +
> + return ret;

Looking at the code, yes, I think the above is correct.  Rafael, can
you please confirm?
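User space has no get_cpu()/put_cpu(), but the bug above has a direct user-space counterpart: checking sched_getcpu() and then acting on the answer while the thread can still migrate. A rough analogue of the work_on_cpu() path, assuming Linux and glibc, pins the calling thread to the target CPU with sched_setaffinity(), runs the function, and restores the old mask. `run_on_cpu()`, `first_allowed_cpu()` and `current_cpu()` are hypothetical helpers for illustration, not kernel or libc APIs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Run fn(arg) on `cpu` by temporarily pinning the calling thread there.
 * Returns fn's result, or -1 if the affinity could not be changed. */
static long run_on_cpu(int cpu, long (*fn)(void *), void *arg)
{
	cpu_set_t old_mask, one;

	if (sched_getaffinity(0, sizeof(old_mask), &old_mask) != 0)
		return -1;

	CPU_ZERO(&one);
	CPU_SET(cpu, &one);
	/* Unlike "if (sched_getcpu() == cpu) fn(arg);", a migration between
	 * the check and the call cannot invalidate this. */
	if (sched_setaffinity(0, sizeof(one), &one) != 0)
		return -1;

	long ret = fn(arg);

	/* Give the thread back its original set of allowed CPUs. */
	sched_setaffinity(0, sizeof(old_mask), &old_mask);
	return ret;
}

/* First CPU the calling thread is currently allowed to run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t mask;

	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &mask))
			return cpu;
	return -1;
}

/* Example payload: report which CPU it actually ran on. */
static long current_cpu(void *unused)
{
	(void)unused;
	return sched_getcpu();
}
```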

Thanks.

-- 
tejun

Re: [PATCH 3/5] aio: Rewrite refcounting

2012-10-09 Thread Kent Overstreet

On Tue, Oct 09, 2012 at 03:35:04PM -0700, Zach Brown wrote:
> > Alright... send it out then.
> 
> Workin' on it! :)
> 
> > Also, do you know which branch Jens has his patches in?
> 
> http://git.kernel.dk/?p=linux-block.git;a=commit;h=6b6723fc3e4f24dbd80526df935ca115ead578c6
> 
> https://plus.google.com/111643045511375507360/posts
> 
> As far as I know, he hasn't had a chance to work on it recently.

Thanks - wasn't sure if I was looking at the right branch. Looking at it
now...

Documentation of kconfig language differs from implementation regarding existence of symbols

2012-10-09 Thread Martin Walch

The file linux/Documentation/kbuild/kconfig-language.txt says:

> The following two methods produce the same kconfig symbol dependencies
> but differ greatly in kconfig symbol existence (production) in the
> generated config file.
>
> case 1:
>
> config FOO
> tristate "about foo"
> depends on BAR
>
> vs. case 2:
>
> if BAR
> config FOO
> tristate "about foo"
> endif
>
> In case 1, the symbol FOO will always exist in the config file (given
> no other dependencies).  In case 2, the symbol FOO will only exist in
> the config file if BAR is enabled.

However, I cannot reproduce this. The attached file contains both cases. When 
running make menuconfig, setting BAR0 and BAR1 both to n, and saving the 
configuration, neither FOO0 nor FOO1 appears in the resulting configuration 
file.

According to the documentation, at least FOO0 should exist, even if BAR0 is 
set to n. But this is not the case. AFAICS, both versions behave equivalently, 
so I suggest changing the documentation accordingly.

Regards
Martin Walch
-- 
config FOO0
	tristate "FOO0"
	depends on BAR0

config BAR0
	tristate "BAR0"

config BAR1
	tristate "BAR1"

if BAR1

config FOO1
	tristate "FOO1"

endif



Re: [PATCH 4/5] aio: vmap ringbuffer

2012-10-09 Thread Kent Overstreet

On Tue, Oct 09, 2012 at 03:58:36PM -0700, Zach Brown wrote:
> > Not if we decouple the ringbuffer size from max_requests.
> 
> Hmm, interesting.
> 
> > This would be useful to do anyways because right now, allocating a kiocb
> > has to take a global refcount and check head and tail in the ringbuffer
> > just so it can avoid overflowing the ringbuffer.
> 
> I'm not sure what you mean by a 'global refcount'.. do you mean the
> per-mm ctx_lock?

kioctx->reqs_active. We just have to keep that count somehow if we're
going to avoid overflowing the ringbuffer.

> > If we change aio_complete() so that if the ringbuffer is full then the
> > kiocb just goes on a linked list - we can size the ringbuffer so this
> > doesn't happen normally and avoid the global synchronization in the fast
> > path.
> 
> How would completion events make their way from the list to the ring if
> an app is only checking the ring for completions from userspace?

Either they'd have to make a syscall when the ringbuffer is empty, which
should be fine, because for most apps the only alternative is to
sleep or spin. Alternatively, the kernel could maintain a flag next to the
tail pointer in the ringbuffer, so that userspace can avoid that syscall
when it's not necessary.

Although, since current libaio skips the syscall if the ringbuffer is
empty, this is yet another thing we can't do with the current
ringbuffer.

Crap.

Well, we should be able to hack around it with the existing
ringbuffer... normally, if the ringbuffer is full and stuff goes on the
list, then there's going to be more completions coming later and stuff
would get pulled off the list then.

The only situation you have to worry about is when the ringbuffer fills
up and stuff goes on the list, and then completions completely stop -
this should be a rare enough situation that maybe we could just hack
around it with a timer that gets flipped on when the list isn't empty.

Also, for this to be an issue at all, _all_ the reaping would have to be
done from userspace - since existing libaio doesn't do that, there may
not be any code out there which triggers it.
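A minimal single-threaded sketch of the scheme above, assuming a fixed-size completion ring, a kernel-side overflow list, and an `overflow` flag kept next to the tail so userspace knows when an empty ring does not mean "no completions". All names are hypothetical; real code would need locking or atomics, and the timer fallback for the "completions stop" case is left out.

```c
#include <stdbool.h>
#include <stdlib.h>

#define RING_SIZE 4 /* deliberately tiny so the overflow path is easy to hit */

struct event { long id; };

/* A completion that did not fit in the ring. */
struct ov_node {
	struct event ev;
	struct ov_node *next;
};

struct ring {
	unsigned head, tail;                /* userspace reads at head, kernel writes at tail */
	bool overflow;                      /* hypothetical flag: overflow list is non-empty */
	struct event slots[RING_SIZE];
	struct ov_node *ov_first, *ov_last; /* FIFO of overflowed completions */
};

static bool ring_full(const struct ring *r)
{
	return r->tail - r->head == RING_SIZE;
}

/* Move queued completions into free ring slots, oldest first. */
static void ring_refill(struct ring *r)
{
	while (r->ov_first && !ring_full(r)) {
		struct ov_node *n = r->ov_first;

		r->slots[r->tail % RING_SIZE] = n->ev;
		r->tail++;
		r->ov_first = n->next;
		free(n);
	}
	if (!r->ov_first) {
		r->ov_last = NULL;
		r->overflow = false;
	}
}

/* aio_complete() analogue: never blocks; a full ring spills to the list,
 * so submission would not have to reserve a ring slot up front. */
static void ring_complete(struct ring *r, long id)
{
	ring_refill(r); /* later completions drain the list first */

	if (!r->overflow && !ring_full(r)) {
		r->slots[r->tail % RING_SIZE].id = id;
		r->tail++;
		return;
	}

	struct ov_node *n = malloc(sizeof(*n));
	if (!n)
		abort(); /* sketch only; real code needs a better policy */
	n->ev.id = id;
	n->next = NULL;
	if (r->ov_last)
		r->ov_last->next = n;
	else
		r->ov_first = n;
	r->ov_last = n;
	r->overflow = true;
}

/* Userspace reaping. If the ring is empty but `overflow` is set, a real
 * application would make the syscall; here ring_refill() stands in for it. */
static bool ring_reap(struct ring *r, struct event *out)
{
	if (r->head == r->tail && r->overflow)
		ring_refill(r);
	if (r->head == r->tail)
		return false;
	*out = r->slots[r->head % RING_SIZE];
	r->head++;
	return true;
}
```

Completing six events into this four-slot ring leaves two on the list; userspace reaps four from the ring, sees the flag, and the refill (standing in for the syscall) delivers the last two in order.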

RE: [PATCH 1/2] mm: Export vm_committed_as

2012-10-09 Thread KY Srinivasan



> -Original Message-
> From: Andrew Morton [mailto:a...@linux-foundation.org]
> Sent: Tuesday, October 09, 2012 3:48 PM
> To: Greg KH
> Cc: KY Srinivasan; o...@aepfle.de; linux-kernel@vger.kernel.org;
> a...@firstfloor.org; a...@canonical.com; de...@linuxdriverproject.org
> Subject: Re: [PATCH 1/2] mm: Export vm_committed_as
> 
> On Mon, 8 Oct 2012 06:35:39 -0700
> Greg KH  wrote:
> 
> > On Mon, Oct 08, 2012 at 03:35:50AM +, KY Srinivasan wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > > > Sent: Sunday, October 07, 2012 8:44 PM
> > > > To: KY Srinivasan
> > > > Cc: linux-kernel@vger.kernel.org; de...@linuxdriverproject.org;
> o...@aepfle.de;
> > > > a...@canonical.com; a...@linux-foundation.org; a...@firstfloor.org
> > > > Subject: Re: [PATCH 1/2] mm: Export vm_committed_as
> > > >
> > > > On Sun, Oct 07, 2012 at 04:59:45PM -0700, K. Y. Srinivasan wrote:
> > > > > The policy engine on the host expects the guest to report the
> > > > > committed_as. Since this variable is not exported,
> > > > > export this symbol.
> > > >
> > > > Why are these symbols not needed by either Xen or KVM or vmware, which
> > > > I think all support the same thing, right?
> > >
> > > The basic balloon driver does not need this symbol since the basic balloon
> > > driver is not automatically driven by the host. On the Windows host we have a
> > > policy engine that drives the balloon driver based on both guest level memory
> > > pressure that the guest reports as well as other system level metrics the host
> > > maintains. We need this symbol to drive the policy engine on the host.
> >
> > Ok, but you're going to have to get the -mm developers to agree that
> > this is ok before I can accept it.
> 
> Well I guess it won't kill us.


Thanks.

K. Y
 



Re: [PATCH 0/3] Volatile Ranges (v7) & Lots of words

2012-10-09 Thread Minchan Kim

On Tue, Oct 09, 2012 at 02:30:03PM -0700, John Stultz wrote:
> On 10/09/2012 01:07 AM, Mike Hommey wrote:
> >Note it doesn't have to be a vs. situation. madvise could be an
> >additional way to interface with volatile ranges on a given fd.
> >
> >That is, madvise doesn't have to mean anonymous memory. As a matter of
> >fact, MADV_WILLNEED/MADV_DONTNEED are usually used on mmaped files.
> >Similarly, there could be a way to use madvise to mark volatile ranges,
> >without the application having to track what memory ranges are
> >associated to what part of what file, which the kernel already tracks.
> 
> Good point. We could add madvise() interface, but limit it only to
> mmapped tmpfs files, in parallel with the fallocate() interface.
> 
> However, I would like to think through how MADV_MARK_VOLATILE with
> purely anonymous memory could work, before starting that approach.
> That and Neil's point that having an identical kernel interface
> restricted to tmpfs, only as a convenience to userland in switching
> from virtual address to/from mmapped file offset may be better left
> to a userland library.

How about this?

The madvise semantics I imagine are as follows.

1) Anonymous pages
Assume that there is an allocator library which manages an mmaped reserved pool.
If it has lots of free memory which isn't used by anyone, it can unmap part of
the reserved pool, but unmap isn't cheap because the kernel has to zap all the
PTEs of the pages in the range. But if we avoid unmap, the VM will swap out that
range, which holds only unneeded garbage, when memory pressure happens.
If the allocator marks that range volatile, we can avoid the unnecessary swap-out
and even reclaim the pages without swapping. The only thing the allocator has to
do is unmark the range before handing it out to a user.

2) File pages (NOT tmpfs)
We can reclaim volatile file pages easily, without rotating them on the LRU,
even if they were accessed recently.
The difference from DONTNEED is that DONTNEED always moves pages to the
tail of the inactive LRU so they are reclaimed early, whereas VOLATILE leaves
them where they are and reclaims them without considering recent use once
they reach the tail of the LRU through aging. They are not moved to the tail
because they may be unmarked sooner or later for reuse, and we can't predict
the cost of recreating the object.

So reclaim preference: NORMAL < VOLATILE < DONTNEED
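A toy model of the reclaim preference described above, assuming a single inactive LRU scanned from the tail, with a "referenced" bit that gives recently used NORMAL pages a second chance. The enum, `struct toy_page`, `move_to_tail()` and `pick_victims()` are hypothetical and ignore everything else the real VM does; they only encode the stated rules: DONTNEED pages move to the tail at once, VOLATILE pages stay in place but are reclaimed without a referenced check when they reach the tail, and referenced NORMAL pages are rotated.

```c
#include <stdbool.h>
#include <stddef.h>

enum page_hint { HINT_NORMAL, HINT_VOLATILE, HINT_DONTNEED };

struct toy_page {
	char name;
	enum page_hint hint;
	bool referenced; /* accessed since the last scan */
};

/* lru[0] is the head (newest); lru[n - 1] is the tail (scanned first). */

/* DONTNEED-style hint: move the page at index i straight to the tail. */
static void move_to_tail(struct toy_page *lru, size_t n, size_t i)
{
	struct toy_page p = lru[i];

	for (size_t j = i; j + 1 < n; j++)
		lru[j] = lru[j + 1];
	lru[n - 1] = p;
}

/* Reclaim up to `want` pages from the tail; victims' names go to `out`,
 * which must hold want + 1 chars. Returns the number reclaimed. */
static size_t pick_victims(struct toy_page *lru, size_t n, size_t want, char *out)
{
	size_t got = 0;

	while (got < want && n > 0) {
		struct toy_page *tail = &lru[n - 1];

		if (tail->hint == HINT_NORMAL && tail->referenced) {
			/* Recently used NORMAL page: clear the bit, rotate to head. */
			struct toy_page p = *tail;

			p.referenced = false;
			for (size_t j = n - 1; j > 0; j--)
				lru[j] = lru[j - 1];
			lru[0] = p;
			continue;
		}
		/* VOLATILE (left in place) and DONTNEED (moved to the tail)
		 * pages are reclaimed without looking at the referenced bit. */
		out[got++] = tail->name;
		n--;
	}
	out[got] = '\0';
	return got;
}
```

Starting from A (NORMAL, referenced), D (DONTNEED), V (VOLATILE, referenced), N (NORMAL, referenced) and applying the DONTNEED hint to D, asking for three victims reclaims D first, then the recently used V, and N only after it has lost its referenced bit.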

> 
> thanks
> -john
> 

-- 
Kind regards,
Minchan Kim
