Re: lots of suspicious RCU traces

2012-10-24 Thread Sergey Senozhatsky
On (10/25/12 00:32), Frederic Weisbecker wrote:
> > My understanding is (I may be wrong) that we can schedule() from the ptrace
> > chain to some arbitrary task, which will continue its execution from the
> > point where RCU assumes the CPU is not idle, while the CPU is in fact still
> > in idle state -- no one called rcu_idle_exit() (or similar) prior to the
> > schedule() call.
> 
> Yeah but when we are in syscall_trace_leave(), the CPU shouldn't be in
> RCU idle mode. That's where the bug is. How do you manage to trigger
> this bug?
> 

Just for the record, v3.6 is good:
git bisect good v3.6

[  199.897703] ===
[  199.897706] [ INFO: suspicious RCU usage. ]
[  199.897710] 3.6.0-dbg-06307-ga78562e-dirty #1379 Not tainted
[  199.897713] ---
[  199.897717] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
[  199.897719] 
other info that might help us debug this:

[  199.897724] 
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
[  199.897729] RCU used illegally from extended quiescent state!
[  199.897732] 2 locks held by top/2396:
[  199.897735]  #0:  (>lock){-.-.-.}, at: [] __schedule+0x119/0xb10
[  199.897755]  #1:  (rcu_read_lock){.+.+..}, at: [] cpuacct_charge+0x15/0x250
[  199.897770] 
stack backtrace:
[  199.897775] Pid: 2396, comm: top Not tainted 3.6.0-dbg-06307-ga78562e-dirty #1379
[  199.897779] Call Trace:
[  199.897791]  [] lockdep_rcu_suspicious+0xe2/0x130
[  199.897798]  [] cpuacct_charge+0x1bc/0x250
[  199.897804]  [] ? cpuacct_charge+0x15/0x250
[  199.897810]  [] ? __schedule+0x119/0xb10
[  199.897818]  [] update_curr+0xec/0x230
[  199.897825]  [] put_prev_task_fair+0x9c/0xf0
[  199.897831]  [] __schedule+0x1ac/0xb10
[  199.897841]  [] ? do_raw_spin_unlock+0x5d/0xb0
[  199.897847]  [] ? trace_hardirqs_off+0xd/0x10
[  199.897853]  [] ? _raw_spin_unlock_irqrestore+0x77/0x80
[  199.897860]  [] ? try_to_wake_up+0x1ff/0x350
[  199.897867]  [] ? __lock_acquire+0x3d9/0xb70
[  199.897875]  [] ? kfree+0xa9/0x260
[  199.897882]  [] ? __call_rcu+0x105/0x250
[  199.897887]  [] __cond_resched+0x2a/0x40
[  199.897891]  [] _cond_resched+0x2f/0x40
[  199.897898]  [] dput+0x128/0x1e0
[  199.897902]  [] __fput+0x148/0x260
[  199.897907]  [] ? finish_task_switch+0x3f/0x120
[  199.897911]  [] fput+0xe/0x10
[  199.897917]  [] task_work_run+0xbc/0xf0
[  199.897923]  [] ptrace_notify+0x89/0x90
[  199.897931]  [] syscall_trace_leave+0x8d/0x220
[  199.897939]  [] ? int_very_careful+0x5/0xd
[  199.897944]  [] ? trace_hardirqs_on_caller+0x105/0x190
[  199.897949]  [] int_check_syscall_exit_work+0x34/0x3d


-ss
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: lots of suspicious RCU traces

2012-10-24 Thread Sergey Senozhatsky
On (10/25/12 00:32), Frederic Weisbecker wrote:
> First of all, thanks a lot for your report.
> 
> 2012/10/24 Sergey Senozhatsky :
> > On (10/24/12 20:06), Oleg Nesterov wrote:
> >> On 10/24, Sergey Senozhatsky wrote:
> >> >
> >> > small question,
> >> >
> >> > ptrace_notify() and the calls it makes are able to call schedule() both
> >> > indirectly and directly /* direct call from ptrace_stop() */.
> >> > Should rcu_user_enter(), in this case, be called before
> >> > tracehook_report_syscall_exit(regs, step) and the ptrace chain?
> >>
> >> Well, I don't really understand this magic... but why?
> >>
> >
> > My understanding is (I may be wrong) that we can schedule() from the ptrace
> > chain to some arbitrary task, which will continue its execution from the
> > point where RCU assumes the CPU is not idle, while the CPU is in fact still
> > in idle state -- no one called rcu_idle_exit() (or similar) prior to the
> > schedule() call.
> 
> Yeah but when we are in syscall_trace_leave(), the CPU shouldn't be in
> RCU idle mode. That's where the bug is. How do you manage to trigger
> this bug?
>

strace -f 

 
-ss


[PATCH 2/2] Remove struct CsrEvent

2012-10-24 Thread SeongJae Park
Nobody uses struct CsrEvent. Remove it.

Signed-off-by: SeongJae Park 
---
 drivers/staging/csr/csr_framework_ext_types.h |   14 --
 1 file changed, 14 deletions(-)

diff --git a/drivers/staging/csr/csr_framework_ext_types.h b/drivers/staging/csr/csr_framework_ext_types.h
index d97c2de..9623eec 100644
--- a/drivers/staging/csr/csr_framework_ext_types.h
+++ b/drivers/staging/csr/csr_framework_ext_types.h
@@ -30,25 +30,11 @@ struct CsrThread
 char name[16];
 };
 
-struct CsrEvent
-{
-/* wait_queue for waking the kernel thread */
-wait_queue_head_t wakeup_q;
-unsigned int  wakeup_flag;
-};
-
 typedef struct semaphore CsrMutexHandle;
 typedef struct CsrThread CsrThreadHandle;
 
 #else /* __KERNEL __ */
 
-struct CsrEvent
-{
-pthread_cond_t  event;
-pthread_mutex_t mutex;
-u32   eventBits;
-};
-
 typedef pthread_mutex_t CsrMutexHandle;
 typedef pthread_t CsrThreadHandle;
 
-- 
1.7.9.5



[PATCH 1/2] Remove CsrEventHandle and functions using it

2012-10-24 Thread SeongJae Park
Nobody uses CsrEventHandle, and nobody calls the functions that take it as
a parameter. So remove them.

Signed-off-by: SeongJae Park 
---
 drivers/staging/csr/csr_framework_ext.h   |   61 -
 drivers/staging/csr/csr_framework_ext_types.h |2 -
 2 files changed, 63 deletions(-)

diff --git a/drivers/staging/csr/csr_framework_ext.h b/drivers/staging/csr/csr_framework_ext.h
index 66973e9..7cfbd48 100644
--- a/drivers/staging/csr/csr_framework_ext.h
+++ b/drivers/staging/csr/csr_framework_ext.h
@@ -36,67 +36,6 @@ extern "C" {
 
 /**
  *  NAME
- *  CsrEventCreate
- *
- *  DESCRIPTION
- *  Creates an event and returns a handle to the created event.
- *
- *  RETURNS
- *  Possible values:
- *  CSR_RESULT_SUCCESS  in case of success
- *  CSR_FE_RESULT_NO_MORE_EVENTS   in case of out of event resources
- *  CSR_FE_RESULT_INVALID_POINTER  in case the eventHandle pointer is invalid
- *
- **/
-CsrResult CsrEventCreate(CsrEventHandle *eventHandle);
-
-/**
- *  NAME
- *  CsrEventWait
- *
- *  DESCRIPTION
- *  Wait fore one or more of the event bits to be set.
- *
- *  RETURNS
- *  Possible values:
- *  CSR_RESULT_SUCCESS  in case of success
- *  CSR_FE_RESULT_TIMEOUT  in case of timeout
- *  CSR_FE_RESULT_INVALID_HANDLE   in case the eventHandle is invalid
- *  CSR_FE_RESULT_INVALID_POINTER  in case the eventBits pointer is invalid
- *
- **/
-CsrResult CsrEventWait(CsrEventHandle *eventHandle, u16 timeoutInMs, u32 *eventBits);
-
-/**
- *  NAME
- *  CsrEventSet
- *
- *  DESCRIPTION
- *  Set an event.
- *
- *  RETURNS
- *  Possible values:
- *  CSR_RESULT_SUCCESS  in case of success
- *  CSR_FE_RESULT_INVALID_HANDLE   in case the eventHandle is invalid
- *
- **/
-CsrResult CsrEventSet(CsrEventHandle *eventHandle, u32 eventBits);
-
-/**
- *  NAME
- *  CsrEventDestroy
- *
- *  DESCRIPTION
- *  Destroy the event associated.
- *
- *  RETURNS
- *  void
- *
- **/
-void CsrEventDestroy(CsrEventHandle *eventHandle);
-
-/**
- *  NAME
  *  CsrMutexCreate
  *
  *  DESCRIPTION
diff --git a/drivers/staging/csr/csr_framework_ext_types.h b/drivers/staging/csr/csr_framework_ext_types.h
index 57194ee..d97c2de 100644
--- a/drivers/staging/csr/csr_framework_ext_types.h
+++ b/drivers/staging/csr/csr_framework_ext_types.h
@@ -37,7 +37,6 @@ struct CsrEvent
 unsigned int  wakeup_flag;
 };
 
-typedef struct CsrEvent CsrEventHandle;
 typedef struct semaphore CsrMutexHandle;
 typedef struct CsrThread CsrThreadHandle;
 
@@ -50,7 +49,6 @@ struct CsrEvent
 u32   eventBits;
 };
 
-typedef struct CsrEvent CsrEventHandle;
 typedef pthread_mutex_t CsrMutexHandle;
 typedef pthread_t CsrThreadHandle;
 
-- 
1.7.9.5



Re: [RFC] Kdump with signed images

2012-10-24 Thread Mimi Zohar
On Wed, 2012-10-24 at 13:19 -0400, Vivek Goyal wrote:
> On Tue, Oct 23, 2012 at 09:44:59AM -0700, Eric W. Biederman wrote:
> > Matthew Garrett  writes:
> > 
> > > On Tue, Oct 23, 2012 at 10:59:20AM -0400, Vivek Goyal wrote:
> > >
> > >> But what about creation of a new program which can call kexec_load()
> > >> and execute an unsigned kernel. Doesn't look like that will be
> > >> prevented using IMA.

Like the existing kernel modules, kexec_load() is not file descriptor
based.  There isn't an LSM or IMA-appraisal hook here.

> > > Right. Trusting userspace would require a new system call that passes in 
> > > a signature of the userspace binary, and the kernel would then have to 
> > > verify the ELF object in memory in order to ensure that it 
> > > matches the signature. Verifying that the copy on the filesystem is 
> > > unmodified isn't adequate - an attacker could simply have paused the 
> > > process and injected code. 

I haven't looked at kexec_load() in detail, but like kernel modules, I
think the better solution would be to pass a file descriptor, especially
if you're discussing a new system call.  (cc'ing Kees.)

> > Verifying the copy on the filesystem at exec time is perfectly adequate
> > for gating extra permissions.  Certainly that is the model everywhere
> > else in the signed key chain.
> > 
> > Where IMA falls short is there is no offline signing capability in IMA
> > itself.  I think EVM may fix that.

I'm not sure what you mean by offline signing capability.  IMA-appraisal
verifies a file's 'security.ima' xattr, which may contain a hash or a
digital signature.  EVM protects a file's metadata, including
'security.ima'.  'security.evm' can be either an hmac or a digital
signature.

> [ CCing lkml. I think it is a good idea to open discussion to wider
> audience. Also CCing IMA/EVM folks ]

thanks!

> Based on reading following wiki page, looks like EVM also does not allow
> offline signing capability. And EVM is protecting IMA data to protect
> against offline attack. If we can assume that unsigned kernels can't be
> booted on the platform, then EVM might not be a strict requirement in
> this case.

> So as you said, one of the main problems with using IMA to verify /sbin/kexec
> is that IMA does not provide offline signing capability.

?
 
> > 
> > > Realistically, the only solution here is for 
> > > the kernel to verify that the kernel it's about to boot is signed and 
> > > for it not to take any untrusted executable code from userspace.

From an IMA, as opposed to an IMA-appraisal, perspective, kexec is
problematic.  IMA maintains a measurement list and extends a PCR with
the file hash.  The measurement list and PCR value are used to attest to
the integrity of the running system.  As the original measurement list
is lost after kexec, but the PCR value hasn't been reset, the
measurement list and PCR value won't agree.

> > Hogwash.  The kernel verifying a signature of /sbin/kexec at exec time is
> > perfectly reasonable, and realistic.  In fact finding a way to trust
> > small bits of userspace even if root is compromised seems a far superior
> > model to simply solving the signing problem for /sbin/kexec.

Huh?  I don't understand what you're suggesting.  Once root has been
compromised, that's it.

> > Although I do admit some part of the kexec process will need to verify
> > keys on the images we decide to boot.

Which keys?  Isn't the kernel module key builtin to the kernel and
included in the kernel image signature?

> It should be an option, shouldn't it? Either /sbin/kexec can try to verify the
> integrity of the kernel, or we try to extend the kexec() system call to also
> pass the signature of the kernel and let the kernel verify it (as you mentioned
> previously).
> 
> Thanks
> Vivek
> 

As suggested above, please consider passing a file descriptor, at least
in addition, if not in lieu of a signature.

thanks,

Mimi



Re: [sqlite] light weight write barriers

2012-10-24 Thread Theodore Ts'o
On Wed, Oct 24, 2012 at 03:03:00PM -0700, da...@lang.hm wrote:
> Like what is being described for sqlite, losing the tail end of the
> messages is not a big problem under normal conditions. But there is
> a need to be sure that what is there is complete up to the point
> where it's lost.
> 
> this is similar in concept to write-ahead-logs done for databases
> (without the absolute durability requirement)

If that's what you require, and you are using ext3/4, using data
journalling might meet your requirements.  It's something you can
enable on a per-file basis, via chattr +j; you don't have to force the
entire file system into data journalling via the data=journal mount
option.
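
For reference, the per-file flag that chattr +j sets can also be set from
a program. The following is a minimal sketch, assuming a Linux system with
the <linux/fs.h> UAPI header; the helper name set_data_journal_flag is
hypothetical, while FS_IOC_GETFLAGS, FS_IOC_SETFLAGS and FS_JOURNAL_DATA_FL
are the existing kernel names. Setting this flag typically requires
privilege, and the ioctl fails on file systems that do not support it.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_JOURNAL_DATA_FL */

/*
 * Hypothetical helper: set the "journal data" inode flag on an open file,
 * which is the flag `chattr +j` sets.  Returns 0 on success and -1 on
 * error (errno is left as set by ioctl()).
 */
int set_data_journal_flag(int fd)
{
    int flags = 0;

    /* Read the current inode flags first so the other bits are preserved. */
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
        return -1;

    flags |= FS_JOURNAL_DATA_FL;

    if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
        return -1;

    return 0;
}
```

A caller would open the log file, call the helper once, and check the
return value before relying on data journalling for that file.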

The potential downsides that you may or may not care about for this
particular application:

(a) This will definitely have a performance impact, especially if you
are doing lots of small (less than 4k) writes, since the data blocks
will get run through the journal first, and will only afterwards get
written to their final location on disk.

(b) You don't get atomicity if the write spans a 4k block boundary.
All of the bytes before i_size will be written, so you don't have to
worry about "holes"; but the last message written to the log file
might be truncated.

(c) There will be a performance impact, since the contents of data
blocks will be written at least twice (once to the journal, and once
to the final location on disk).  If you do lots of small, sub-4k
writes, the performance might be even worse, since data blocks might
be written multiple times to the journal.
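
Point (b) can be checked by an application before it issues a write. The
sketch below is only an illustration: it assumes a 4096-byte block size,
and the names LOG_BLOCK_SIZE and write_spans_block_boundary are
hypothetical. It reports whether a write of a given length at a given file
offset touches more than one block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_BLOCK_SIZE 4096u   /* assumed file system block size */

/*
 * Return true if a write of `len` bytes starting at file offset `offset`
 * touches more than one LOG_BLOCK_SIZE-sized block.
 */
bool write_spans_block_boundary(uint64_t offset, size_t len)
{
    assert(len > 0);   /* a zero-length write touches no block at all */

    uint64_t first_block = offset / LOG_BLOCK_SIZE;
    uint64_t last_block = (offset + len - 1) / LOG_BLOCK_SIZE;

    return first_block != last_block;
}
```

A logger that wants each message to land in a single block could use such
a check to decide whether to pad up to the next block before appending.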

- Ted


[GIT PULL] CMA and DMA-mapping fixes for v3.7-rc3

2012-10-24 Thread Marek Szyprowski
Hi Linus,

I would like to ask you to pull some minor fixes for both CMA
(Contiguous Memory Allocator) and the DMA-mapping framework for v3.7-rc3.



The following changes since commit 6f0c0580b70c89094b3422ba81118c7b959c7556:

  Linux 3.7-rc2 (2012-10-20 12:11:32 -0700)

are available in the git repository at:

  git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git fixes_for_linus

for you to fetch changes up to 4e85fb831aa210fd1c5e2cb7909ac203c1f5b67f:

  ARM: mm: Remove unused arm_vmregion priv field (2012-10-24 07:38:15 +0200)



This pull request consists mainly of a set of one-liner fixes and
cleanups for a few minor issues identified in both Contiguous Memory
Allocator code and ARM DMA-mapping subsystem.

Thanks!

Best regards
Marek Szyprowski
Samsung Poland R&D Center



Patch summary:

Bob Liu (1):
  mm: cma: alloc_contig_range: return early for err path

Jingoo Han (1):
  ARM: dma-mapping: fix build warning in __dma_alloc()

Laurent Pinchart (4):
  drivers: dma-contiguous: Don't redefine SZ_1M
  drivers: dma-coherent: Fix typo in dma_mmap_from_coherent documentation
  drivers: cma: Fix wrong CMA selected region size default value
  ARM: mm: Remove unused arm_vmregion priv field

Ming Lei (1):
  ARM: dma-mapping: support debug_dma_mapping_error

 arch/arm/include/asm/dma-mapping.h |    1 +
 arch/arm/mm/dma-mapping.c          |    2 +-
 arch/arm/mm/vmregion.h             |    1 -
 drivers/base/Kconfig               |    2 +-
 drivers/base/dma-coherent.c        |    5 ++---
 drivers/base/dma-contiguous.c      |    5 +
 mm/page_alloc.c                    |    2 +-
 7 files changed, 7 insertions(+), 11 deletions(-)


The idea about scheduler test module(STM)

2012-10-24 Thread Michael Wang
Hi, Folks

Charles has raised the problem that we don't yet have any tool for
testing the scheduler without disturbance from other subsystems.
I have also found it hard to test scheduler optimization patches,
since the improvement can easily be eaten by other subsystems such
as I/O.

So let's check the tools we currently have:
1. perf sched

We can use it to trace the threads we are interested in, and
the info it provides is very good. But it cannot create the
workload we want, and collecting and summarizing the info is
not so easy.

2. linsched

It's a very good tool for creating a test environment,
but its implementation is too idealized, so it cannot
reproduce real-world problems.

Since neither perf nor linsched meets our requirements, we
decided to develop a new tool; for now, let's call it the
scheduler test module (STM).

Its purpose is to:
1. create the workload we want.
2. test the pure scheduler.
3. collect the info we need and summarize it.

This tool should be very easy to use and should not depend on
the implementation of the scheduler.

We can use it to check the pure scheduler performance on
our system.

We can use it to check whether there are regressions in the
scheduler when testing patches.

And there are other uses I have not figured out yet.

In order to explain the idea more directly, I have written a
prototype STM. It's a separate module, and you can use it
just like 'rcutorture'.

I have attached a small script, 'play.sh', to help you run the
test easily. Put 'schedtm.c' and 'play.sh' in the same directory
and run 'play.sh'; you will see output like:

schedtm: summary
schedtm:        cpu count:      cpu     run     preempt
schedtm:                        0       1381    1381
schedtm:                        1       957     955
schedtm:                        2       900     900
schedtm:                        3       1035    1034
schedtm:                        4       991     990
schedtm:                        5       940     939
schedtm:                        6       900     897
schedtm:                        7       942     948
schedtm:                        8       852     850
schedtm:                        9       931     938
schedtm:                        10      936     934
schedtm:                        11      951     950
schedtm:        total time(us): 10138172
schedtm:        run time(us):   5055223(49.86%)
schedtm:        wait time(us):  5082949
schedtm:        latency(us):    10489
schedtm: stmt22 got highest run time 5604941(+10%)
schedtm: stmt3 got lowest run time 4852057(-4%)
schedtm: stmt12 got highest latency 11482(+9%)
schedtm: stmt0 got lowest latency 7561(-27%)
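
The percentages in this summary can be reproduced with plain integer
arithmetic. The sketch below is an assumption about how they are derived,
not the module's code (the posted source is cut off before that part): it
treats the run-time share as run time over total time, and the per-thread
figures as deviations from the averages printed above, truncated toward
zero. The function names are hypothetical.

```c
#include <assert.h>

/* Share of `part` in `total`, in hundredths of a percent (4986 is 49.86%). */
long long share_hundredths(long long part, long long total)
{
    assert(total > 0);
    return part * 10000 / total;
}

/* Deviation of `value` from `average` in whole percent, truncated toward zero. */
long long deviation_pct(long long value, long long average)
{
    assert(average > 0);
    return (value - average) * 100 / average;
}
```

With the posted numbers, share_hundredths(5055223, 10138172) gives 4986,
and deviation_pct() gives 10 and -4 for the run times of stmt22 and stmt3,
and 9 and -27 for the latencies of stmt12 and stmt0, which matches the
output. Run time plus wait time (5055223 + 5082949) also equals the total
time of 10138172 us.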

You can also enable/disable CONFIG_PREEMPT to see a significant
change in latency.

This is nothing but a demo, so please "RUN IT ON A TEST MACHINE"...

It creates 24 kernel threads and runs for 10 seconds; you can change
that via module parameters.

I would appreciate any feedback from scheduler experts like you.
Whether you think it's good or junk, please let me know :)

Regards,
Michael Wang



play.sh:

DURATION=10
NORMAL_THREADS=24
PERIOD=10

make clean
make
insmod ./schedtm.ko normalnr=$NORMAL_THREADS period=$PERIOD
sleep $DURATION
rmmod ./schedtm.ko
dmesg | grep schedtm

schedtm.c:

/*
 * scheduler test module
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2012
 *
 * Authors: Michael Wang 
 *
 */
#include 
#include 
#include 

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Michael Wang ");

#define pr_schedtm(fmt, ...) \
do { \
	pr_alert("schedtm: "); \
	printk(pr_fmt(fmt), ##__VA_ARGS__); \
} while (0)

struct schedtm_data {
	struct task_struct *t;
	unsigned long run_count[NR_CPUS];
	unsigned long preempt_count[NR_CPUS];
	unsigned long start_time;
	unsigned long total_time;
	unsigned long wait_time;
	unsigned long run_time;
	unsigned long run_time_tmp;
	unsigned long latency;
	unsigned long latency_tmp;
	unsigned long latency_count;
	struct list_head list;
	struct preempt_notifier pn;
};

LIST_HEAD(schedtm_data_list);
static struct schedtm_data *highest_run_time;

Re: [PATCH v2 00/10] sta2x11-mfd patches

2012-10-24 Thread Alessandro Rubini
> this is v2 of a patchset already submitted on 2012/09/12

Thanks Davide.

For the whole series:

  Acked-by: Alessandro Rubini 

/alessandro


Re: [PATCH 1/2] arm: mvebu: increase atomic coherent pool size for armada 370/XP

2012-10-24 Thread Marek Szyprowski

Hello,

On 10/24/2012 3:49 PM, Gregory CLEMENT wrote:

For Armada 370/XP we have the same problem as the one addressed by commit
cb01b63, so we apply the same solution: "The default 256 KiB
coherent pool may be too small for some of the Kirkwood devices, so
increase it to make sure that devices will be able to allocate their
buffers with GFP_ATOMIC flag"

Signed-off-by: Gregory CLEMENT 

Cc: Marek Szyprowski 


Acked-by: Marek Szyprowski 


---
  arch/arm/mach-mvebu/armada-370-xp.c |   12 
  1 file changed, 12 insertions(+)

diff --git a/arch/arm/mach-mvebu/armada-370-xp.c b/arch/arm/mach-mvebu/armada-370-xp.c
index 2af6ce5..cbad821 100644
--- a/arch/arm/mach-mvebu/armada-370-xp.c
+++ b/arch/arm/mach-mvebu/armada-370-xp.c
@@ -17,6 +17,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -43,6 +44,16 @@ void __init armada_370_xp_timer_and_clk_init(void)
armada_370_xp_timer_init();
  }

+void __init armada_370_xp_init_early(void)
+{
+   /*
+* Some Armada 370/XP devices allocate their coherent buffers
+* from atomic context. Increase size of atomic coherent pool
+* to make sure such the allocations won't fail.
+*/
+   init_dma_coherent_pool_size(SZ_1M);
+}
+
  struct sys_timer armada_370_xp_timer = {
.init   = armada_370_xp_timer_and_clk_init,
  };
@@ -61,6 +72,7 @@ static const char * const armada_370_xp_dt_board_dt_compat[] = {
  DT_MACHINE_START(ARMADA_XP_DT, "Marvell Aramada 370/XP (Device Tree)")
.init_machine   = armada_370_xp_dt_init,
.map_io = armada_370_xp_map_io,
+   .init_early = armada_370_xp_init_early,
.init_irq   = armada_370_xp_init_irq,
.handle_irq = armada_370_xp_handle_irq,
.timer  = &armada_370_xp_timer,


Best regards
--
Marek Szyprowski
Samsung Poland R&D Center



Re: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

2012-10-24 Thread Justin P. Mattock



On Tue, Oct 23, 2012 at 10:06:52AM -0700, Justin P. Mattock wrote:
 > This is happening both with MAINLINE and NEXT.
 >
 > basically system is running fine, then under load system becomes
 > really sluggish and unresponsive. I was able to get dmesg of the
 > error..:
 >
 > [ 7745.007008] ath9k :05:00.0 wlan0: disabling VHT as WMM/QoS is
 > not supported by the AP
 > [ 7745.007736] wlan0: associate with 68:7f:74:b8:05:82 (try 1/3)
 > [ 7745.011456] wlan0: RX AssocResp from 68:7f:74:b8:05:82
 > (capab=0x411 status=0 aid=5)
 > [ 7745.011529] wlan0: associated
 > [ 8120.812482] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
 > elapsed... GPU hung
 > [ 8120.812642] [drm] capturing error event; look for more
 > information in /debug/dri/0/i915_error_state
 > [ 8122.328682] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer
 > elapsed... GPU hung
 > [ 8122.328845] [drm:i915_reset] *ERROR* GPU hanging too fast,
 > declaring wedged!
 > [ 8122.328850] [drm:i915_reset] *ERROR* Failed to reset chip.
 >
 > full log is here: http://fpaste.org/7xH8/
 >
 > as for good kernels from what I remember 3.6.0-rc1. I can try a
 > bisect on this once I get the time. or if anybody has a patch I can
 > test.

Can you please rehang your machine, and then grab the i915_error_state
from debugfs? That contains the gpu hang dump we need to diagnose things.

And the bisect would obviously be awesome.

Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


It took a bit to trigger, but it finally fired off.

Here is a link to the file (intel_error_decode):
http://www.filefactory.com/file/22bypyjhs4mx

The file was too large to send to the list. Let me know if you need more
info on this.
Also, if anybody has any ideas on how to trigger this, that would be
appreciated, so the bisect can be more precise. Right now I don't even
think a bisect is worth it: without a reliable way to trigger the crash,
the bisect could go astray and point to the wrong commit (which has
happened in the past), but then again you never know.


Justin P. Mattock


Re: [PATCH 01/16] math128: Introduce various 128bit primitives

2012-10-24 Thread Geert Uytterhoeven
On Wed, Oct 24, 2012 at 11:53 PM, Juri Lelli  wrote:
> +#ifdef __SIZEOF_INT128__ /* gcc-4.6+ */
> +   unsigned __int128 val;
> +#endif

So the definition of val depends on (gcc) __SIZEOF_INT128__...

> +/*
> + * Make usage of __int128 dependent on arch code so they can
> + * judge if gcc is doing the right thing for them and can over-ride
> + * any funnies.
> + */
> +
> +#ifndef ARCH_HAS_INT128

... but all generic users depend on (Kconfig) ARCH_HAS_INT128?

How can Kconfig know if gcc supports this?
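
To illustrate the concern, here is a minimal user-space sketch in which the
native type is used only when both conditions hold, and a portable fallback
is used otherwise. It is not the code from the patch: USE_NATIVE_INT128 and
mul_u64_u64_hi are hypothetical names, and ARCH_HAS_INT128 merely stands in
for the architecture opt-in quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Use the native type only if the architecture opted in AND the compiler
 * advertises the type through __SIZEOF_INT128__.
 */
#if defined(ARCH_HAS_INT128) && defined(__SIZEOF_INT128__)
#define USE_NATIVE_INT128 1
#else
#define USE_NATIVE_INT128 0
#endif

/* Return the upper 64 bits of the 128-bit product a * b. */
uint64_t mul_u64_u64_hi(uint64_t a, uint64_t b)
{
#if USE_NATIVE_INT128
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
#else
    /* Portable fallback: schoolbook multiplication on 32-bit halves. */
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of everything that lands in bits 32..63; its top half carries. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32);
#endif
}
```

Because the guard tests the compiler macro as well, a configuration that
sets the architecture switch on a compiler without __int128 still builds
and silently takes the fallback path.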

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [sqlite] light weight write barriers

2012-10-24 Thread Nico Williams
On Wed, Oct 24, 2012 at 8:04 PM,   wrote:
> On Wed, 24 Oct 2012, Nico Williams wrote:
>> COW is "copy on write", which is actually a bit of a misnomer -- all
>> COW means is that blocks aren't over-written, instead new blocks are
>> written.  In particular this means that inodes, indirect blocks, data
>> blocks, and so on, that are changed are actually written to new
>> locations, and the on-disk format needs to handle this indirection.
>
> so how can you do this, and keep the writes in order (especially between two
> files) without being the filesystem?

By trusting fsync().  And if you don't care about immediate Durability
you can run the fsync() in a background thread and mark the associated
transaction as completed in the next transaction to be written after
the fsync() completes.
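
A rough sketch of that scheme with POSIX threads follows. All names in it
(txn_log, sync_job, txn_log_open, sync_in_background, last_stable_txn) are
hypothetical, error handling is minimal, and it covers only the
bookkeeping: a background thread calls fsync(), and once that returns
successfully the id of the last transaction written before it becomes the
"last stable" value that the next transaction can record. Build with
-pthread.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one transaction log file. */
struct txn_log {
    int fd;                 /* file the transactions are appended to */
    pthread_mutex_t lock;   /* protects last_stable */
    uint64_t last_stable;   /* highest transaction id known to be on disk */
};

struct sync_job {
    struct txn_log *log;
    uint64_t txn_id;        /* last transaction written before the fsync() */
    pthread_t thread;
};

/* Open (or create) the log file and reset the bookkeeping. */
int txn_log_open(struct txn_log *log, const char *path)
{
    assert(log != NULL && path != NULL);
    log->last_stable = 0;
    log->fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (log->fd < 0)
        return -1;
    return pthread_mutex_init(&log->lock, NULL);
}

static void *sync_worker(void *arg)
{
    struct sync_job *job = arg;

    /* Only after fsync() returns successfully is txn_id known stable. */
    if (fsync(job->log->fd) == 0) {
        pthread_mutex_lock(&job->log->lock);
        if (job->txn_id > job->log->last_stable)
            job->log->last_stable = job->txn_id;
        pthread_mutex_unlock(&job->log->lock);
    }
    return NULL;
}

/* Start an fsync() in the background; the caller keeps writing meanwhile. */
int sync_in_background(struct sync_job *job, struct txn_log *log, uint64_t txn_id)
{
    assert(job != NULL && log != NULL);
    job->log = log;
    job->txn_id = txn_id;
    return pthread_create(&job->thread, NULL, sync_worker, job);
}

/* Value to embed in the next transaction as its "last stable" pointer. */
uint64_t last_stable_txn(struct txn_log *log)
{
    pthread_mutex_lock(&log->lock);
    uint64_t id = log->last_stable;
    pthread_mutex_unlock(&log->lock);
    return id;
}
```

A writer would call sync_in_background() after appending a batch, keep
appending, and stamp each new transaction with last_stable_txn(); the job's
thread has to be joined (or detached) before the sync_job is reused.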

>> As for fsync() and background threads... fsync() is synchronous, but in
>> this scheme we want it to happen asynchronously and then we want to
>> update each transaction with a pointer to the last transaction that is
>> known stable given an fsync()'s return.
>
> If you could specify ordering between two writes, I could see a process
> along the lines of
>
> [...]

fsync() deals with just one file.  fsync()s of different files are
another story.  That said, as long as the format of the two files is
COW then you can still compose transactions involving two files.  The
key is the file contents itself must be COW-structured.

Incidentally, here's a single-file, bag of b-trees that uses a COW
format: MDB, which can be found in
git://git.openldap.org/openldap.git, in the mdb.master branch.

> Or, as I type this, it occurs to me that you may be saying that every time
> you want to do an ordering guarantee, spawn a new thread to do the fsync and
> then just keep processing. The fsync will happen at some point, and the
> writes will not be re-ordered across the fsync, but you can keep going,
> writing more data while the fsync's are pending.

Yes, but only if the file's format is COWish.

The point is that COW saves the day.  A file-based DB needs to be COW.
 And the filesystem needs to be as well.

Note that write ahead logging approximates COW well enough most of the time.

> Then if you have a filesystem and I/O subsystem that can consolidate the
> fsyncs from all the different threads together into one I/O operation
> without having to flush the entire I/O queue for each one, you can get
> acceptable performance, with ordering. If the system crashes, data that
> hasn't had its fsync() complete will be the only thing that is lost.

With the above caveat, yes.

Nico


Re: [sqlite] light weight write barriers

2012-10-24 Thread Theodore Ts'o
On Tue, Oct 23, 2012 at 03:53:11PM -0400, Vladislav Bolkhovitin wrote:
> Yes, SCSI has full support for ordered/simple commands designed
> exactly for that task: to have steady flow of commands even in case
> when some of them are ordered.

SCSI does, yes --- *if* the device actually implements Tagged Command
Queuing (TCQ).  Not all devices do.

More importantly, SATA drives do *not* have this capability, and when
you compare the price of SATA drives to uber-expensive "enterprise
drives", it's not surprising that most people don't actually use
SCSI/SAS drives that have implemented TCQ.  SATA's Native Command
Queuing (NCQ) is not equivalent; this allows the drive to reorder
requests (in particular read requests) so they can be serviced more
efficiently, but it does *not* allow the OS to specify a partial,
relative ordering of requests.

Yes, you can turn off writeback caching, but that has pretty huge
performance costs; and there is the FUA bit, but that's just an
unconditional high priority bypass of the writeback cache, which is
useful in some cases, but which again, does not give the ability for
the OS to specify a partial order, while letting the drive reorder
other requests for efficiency/performance's sake, since the drive has
a lot more information about the optimal way to reorder requests based
on the current location of the drive head and where certain blocks may
have been remapped due to bad block sparing, etc.

> Hopefully, eventually the storage developers will realize the value
> behind ordered commands and learn corresponding SCSI facilities to
> deal with them.

Eventually, drive manufacturers will realize that trying to price
gouge people who want advanced features such as TCQ and DIF/DIX is the
best way to guarantee that most people won't bother to purchase them,
and hence the features will remain largely unused.

   - Ted


Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]

2012-10-24 Thread Ni zhan Chen

On 10/25/2012 12:36 PM, Hugh Dickins wrote:

On Wed, 24 Oct 2012, Dave Jones wrote:


Machine under significant load (4gb memory used, swap usage fluctuating)
triggered this...

WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
Call Trace:
  [] warn_slowpath_common+0x7f/0xc0
  [] warn_slowpath_null+0x1a/0x20
  [] shmem_getpage_gfp+0xa5c/0xa70
  [] ? shmem_getpage_gfp+0x29e/0xa70
  [] shmem_fault+0x4f/0xa0
  [] __do_fault+0x71/0x5c0
  [] ? __lock_acquire+0x306/0x1ba0
  [] ? local_clock+0x89/0xa0
  [] handle_pte_fault+0x97/0xae0
  [] ? sub_preempt_count+0x79/0xd0
  [] ? delay_tsc+0xae/0x120
  [] ? __const_udelay+0x28/0x30
  [] handle_mm_fault+0x289/0x350
  [] __do_page_fault+0x18e/0x530
  [] ? local_clock+0x89/0xa0
  [] ? get_parent_ip+0x11/0x50
  [] ? get_parent_ip+0x11/0x50
  [] ? sub_preempt_count+0x79/0xd0
  [] ? rcu_user_exit+0xc9/0xf0
  [] do_page_fault+0x2b/0x50
  [] page_fault+0x28/0x30
  [] ? copy_user_enhanced_fast_string+0x9/0x20
  [] ? sys_futimesat+0x41/0xe0
  [] ? syscall_trace_enter+0x25/0x2c0
  [] ? tracesys+0x7e/0xe6
  [] tracesys+0xe1/0xe6



1148         error = shmem_add_to_page_cache(page, mapping, index,
1149                         gfp, swp_to_radix_entry(swap));
1150         /* We already confirmed swap, and make no allocation */
1151         VM_BUG_ON(error);
1152 }

That's very surprising.  Easy enough to handle an error there, but
of course I made it a VM_BUG_ON because it violates my assumptions:
I rather need to understand how this can be, and I've no idea.

Clutching at straws, I expect this is entirely irrelevant, but:
there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
in current linux.git; rather, there's a VM_BUG_ON on line 1149.

So you've inserted a couple of lines for some reason (more useful
trinity behaviour, perhaps)?  And have some config option I'm
unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?


Hi Hugh,

I think it may be caused by your commit [d189922862e03ce: shmem: fix
negative rss in memcg memory.stat]. One question:


If shmem_confirm_swap() confirms that the entry has already been brought
back from swap by a racing thread, then why call shmem_add_to_page_cache()
to add the page from swapcache to pagecache again? Otherwise, will it goto
unlock and then repeat? What am I missing?


Regards,
Chen



Hugh



             total       used       free     shared    buffers     cached
Mem:       3885528    2854064    1031464          0       9624      19208
-/+ buffers/cache:    2825232    1060296
Swap:      6029308      30656    5998652

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .





Re: pi futex oops in __lock_acquire

2012-10-24 Thread Darren Hart


On 10/24/2012 01:24 PM, Dave Jones wrote:
> I've been able to trigger this for the last week or so.
> Unclear whether this is a new bug, or my fuzzer got smarter, but I see the
> pi-futex code hasn't changed since the last time it found something..
> 
>  > BUG: unable to handle kernel NULL pointer dereference at 0018
>  > IP: [] __lock_acquire+0x5e/0x1ba0
>  > PGD 8e72c067 PUD 34f07067 PMD 0 
>  > Oops:  [#1] PREEMPT SMP 
>  > CPU 7 
>  > Pid: 27513, comm: trinity-child0 Not tainted 3.7.0-rc2+ #43
>  > RIP: 0010:[]  [] 
> __lock_acquire+0x5e/0x1ba0
>  > RSP: 0018:8800803f7b28  EFLAGS: 00010046
>  > RAX: 0086 RBX:  RCX: 
>  > RDX:  RSI:  RDI: 0018
>  > RBP: 8800803f7c18 R08: 0002 R09: 
>  > R10:  R11:  R12: 0002
>  > R13: 880051dd8000 R14: 0002 R15: 0018
>  > FS:  7f9fc6ccb740() GS:880148a0() 
> knlGS:
>  > CS:  0010 DS:  ES:  CR0: 80050033
>  > CR2: 0018 CR3: 8e6fb000 CR4: 001407e0
>  > DR0:  DR1:  DR2: 
>  > DR3:  DR6: 0ff0 DR7: 0400
>  > Process trinity-child0 (pid: 27513, threadinfo 8800803f6000, task 
> 880051dd8000)
>  > Stack:
>  >  8800803f7b48 816c5c59 8800803f7b48 88014840ebc0
>  >  8800803f7b68 816c18e3 8800803f7d10 0001
>  >  8800803f7ba8 810a1e62 8800803f7d10 0282
>  > Call Trace:
>  >  [] ? sub_preempt_count+0x79/0xd0
>  >  [] ? _raw_spin_unlock_irqrestore+0x73/0xa0
>  >  [] ? hrtimer_try_to_cancel+0x52/0x210
>  >  [] ? debug_rt_mutex_free_waiter+0x15/0x180
>  >  [] ? rt_mutex_slowlock+0x127/0x1b0
>  >  [] ? local_clock+0x89/0xa0
>  >  [] lock_acquire+0xa2/0x220
>  >  [] ? futex_lock_pi.isra.18+0x1cc/0x390
>  >  [] _raw_spin_lock+0x40/0x80
>  >  [] ? futex_lock_pi.isra.18+0x1cc/0x390
>  >  [] futex_lock_pi.isra.18+0x1cc/0x390
>  >  [] ? update_rmtp+0x70/0x70
>  >  [] do_futex+0x394/0xa50
>  >  [] ? might_fault+0x53/0xb0
>  >  [] sys_futex+0x8d/0x190
>  >  [] tracesys+0xe1/0xe6
>  > Code: d8 45 0f 45 e0 4c 89 75 f0 4c 89 7d f8 85 c0 0f 84 f8 00 00 00 8b 05 
> 22 fe f3 00 49 89 ff 89 f3 41 89 d2 85 c0 0f 84 02 01 00 00 <49> 8b 07 ba 01 
> 00 00 00 48 3d c0 81 06 82 44 0f 44 e2 83 fb 01 
>  > RIP  [] __lock_acquire+0x5e/0x1ba0
>  >  RSP 
>  > CR2: 0018
> 
> It looks like we got all the way to lock_acquire with a NULL 'lock' somehow.
> 
> Darren, any idea how this could happen ?

I'm digging. Can you get trinity to provide the arguments it used that
trigger the crash? That might help hone in on the exact path.

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel


Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set

2012-10-24 Thread Siddhesh Poyarekar
On 25 October 2012 10:06, Darren Hart  wrote:
> Absolutely, that was great. Siddhesh, any objection to this test being
> incorporated into futextest?
>
> http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git;a=summary
>

I have no objection to the test being incorporated into futextest.

Thanks,
Siddhesh

-- 
http://siddhesh.in


Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set

2012-10-24 Thread Darren Hart


On 10/24/2012 11:08 AM, Thomas Gleixner wrote:
> On Wed, 24 Oct 2012, Siddhesh Poyarekar wrote:
> 
>>> Now there is a different solution to that problem. Do not look at the
>>> user space value at all and enforce a lookup of possibly available
>>> pi_state. If pi_state can be found, then the new incoming locker T3
>>> blocks on that pi_state and legitimately races with T2 to acquire the
>>> rt_mutex and the pi_state and therefore the proper ownership of the
>>> user space futex.
>>
>> That works. Thanks for the detailed explanation too.
> 
> Thanks for the reproducer and finding the trouble spot in the first
> place!


Absolutely, that was great. Siddhesh, any objection to this test being
incorporated into futextest?

http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git;a=summary

> I'll queue that if Darren has no objections and mark it for stable as
> well.

I would mostly like to understand the stale waiters case you mentioned.
Otherwise, it seems sound - but changing what appears to be a workaround
for an undocumented cornercase in code this complex does make me a bit
nervous. I'd feel better if we could get Siddhesh's and this stale
waiters covered in futextest.

-- 
Darren Hart
Intel Open Source Technology Center
Yocto Project - Technical Lead - Linux Kernel


Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]

2012-10-24 Thread Hugh Dickins
On Wed, 24 Oct 2012, Dave Jones wrote:

> Machine under significant load (4gb memory used, swap usage fluctuating)
> triggered this...
> 
> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> Call Trace:
>  [] warn_slowpath_common+0x7f/0xc0
>  [] warn_slowpath_null+0x1a/0x20
>  [] shmem_getpage_gfp+0xa5c/0xa70
>  [] ? shmem_getpage_gfp+0x29e/0xa70
>  [] shmem_fault+0x4f/0xa0
>  [] __do_fault+0x71/0x5c0
>  [] ? __lock_acquire+0x306/0x1ba0
>  [] ? local_clock+0x89/0xa0
>  [] handle_pte_fault+0x97/0xae0
>  [] ? sub_preempt_count+0x79/0xd0
>  [] ? delay_tsc+0xae/0x120
>  [] ? __const_udelay+0x28/0x30
>  [] handle_mm_fault+0x289/0x350
>  [] __do_page_fault+0x18e/0x530
>  [] ? local_clock+0x89/0xa0
>  [] ? get_parent_ip+0x11/0x50
>  [] ? get_parent_ip+0x11/0x50
>  [] ? sub_preempt_count+0x79/0xd0
>  [] ? rcu_user_exit+0xc9/0xf0
>  [] do_page_fault+0x2b/0x50
>  [] page_fault+0x28/0x30
>  [] ? copy_user_enhanced_fast_string+0x9/0x20
>  [] ? sys_futimesat+0x41/0xe0
>  [] ? syscall_trace_enter+0x25/0x2c0
>  [] ? tracesys+0x7e/0xe6
>  [] tracesys+0xe1/0xe6
> 
> 
> 
> 1148         error = shmem_add_to_page_cache(page, mapping, index,
> 1149                         gfp, swp_to_radix_entry(swap));
> 1150         /* We already confirmed swap, and make no allocation */
> 1151         VM_BUG_ON(error);
> 1152 }

That's very surprising.  Easy enough to handle an error there, but
of course I made it a VM_BUG_ON because it violates my assumptions:
I rather need to understand how this can be, and I've no idea.

Clutching at straws, I expect this is entirely irrelevant, but:
there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
in current linux.git; rather, there's a VM_BUG_ON on line 1149.

So you've inserted a couple of lines for some reason (more useful
trinity behaviour, perhaps)?  And have some config option I'm
unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?

Hugh

> 
> 
>              total       used       free     shared    buffers     cached
> Mem:       3885528    2854064    1031464          0       9624      19208
> -/+ buffers/cache:    2825232    1060296
> Swap:      6029308      30656    5998652


Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set

2012-10-24 Thread Darren Hart


On 10/23/2012 01:29 PM, Thomas Gleixner wrote:
> Darren, Siddhesh,
> 
> On Tue, 23 Oct 2012, Darren Hart wrote:
> 
>> Hi Siddhesh,
>>
>> Thanks for the patch and your work to isolate it in the glibc bug 14076.
>>
>> On 10/21/2012 08:20 PM, Siddhesh Poyarekar wrote:
>>> In futex_lock_pi_atomic, we consider that if the value in the futex
>>> variable is 0 with additional flags, then it is safe for takeover
>>> since the owner of the futex is dead.  However, when FUTEX_WAITERS is
>>> set in the futex value, handle_futex_death calls futex_wake to wake up
>>> one task. 
>>
>> It shouldn't for PI mutexes. It should just set the FUTEX_OWNER_DIED flag,
>> maintaining the FUTEX_WAITERS flag, and exit.
>>
>> int handle_futex_death(...
>> ...
>>  /*
>>   * Wake robust non-PI futexes here. The wakeup of
>>   * PI futexes happens in exit_pi_state():
>>   */
>>  if (!pi && (uval & FUTEX_WAITERS))
>>  futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
> 
> Yes, the description of the problem is slightly wrong, but it still
> pinpoints the real wreckage.
> 
>>> Hence the assumption in futex_lock_pi_atomic is not correct.
>>> The correct assumption is that a futex may be considered safe for a
>>> takeover if the FUTEX_OWNER_DIED bit is set, the TID bits are 0 and
>>> the FUTEX_WAITERS bit is not set.
> ...
>>> -   if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
>>> +   if (unlikely(ownerdied ||
>>> +   !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {
> 
> This solves the problem at hand, but I'm not too happy with the
> solution. One of the real possible scenarios which expose the problem
> is:
> 
> Futex F is initialized with PTHREAD_PRIO_INHERIT and
> PTHREAD_MUTEX_ROBUST_NP attributes.
> 
> T1 lock_futex_pi(F);
> 
> T2 lock_futex_pi(F);
> 
>--> T2 blocks on the futex and creates pi_state which is associated
>to T1.
> 
> T1 exits
> 
>--> exit_robust_list() runs
> 
>--> Futex F userspace value TID field is set to 0 and
>FUTEX_OWNER_DIED bit is set.
>  
> T3 lock_futex_pi(F);
> 
>--> Succeeds due to the check for F's userspace TID field == 0
> 
>--> Claims ownership of the futex and sets its own TID into the
>userspace TID field of futex F
> 
>--> returns to user space  
> 
> T1 --> exit_pi_state_list()
> 
>--> Transfers pi_state to waiter T2 and wakes T2 via
>  rt_mutex_unlock(&pi_state->mutex)
> 
> T2 --> acquires pi_state->mutex and gains real ownership of the
>pi_state
> 
>--> Claims ownership of the futex and sets its own TID into the
>userspace TID field of futex F
> 
>--> returns to user space  
> 
> T3 --> observes inconsistent state
> 
> This problem is independent of UP/SMP, preemptible/non preemptible
> kernels, or process shared vs. private. The only difference is that
> certain configurations are more likely to expose it.
> 
> So as Siddhesh correctly analyzed the following check in
> futex_lock_pi_atomic() is the culprit:
> 
>   if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
> 
> We check the userspace value for a TID value of 0 and take over the
> futex unconditionally if that's true.
> 
> AFAICT this check is there as it is correct for a different corner
> case of futexes: the WAITERS bit became stale.
> 
> Now the proposed change
> 
> - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
> +   if (unlikely(ownerdied ||
> +   !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {
> 
> solves the problem, but it's not obvious why and it wrecks the
> "stale WAITERS bit" case.


In what scenario does the WAITERS bit become stale for pi futexes? This
corner case seems rather core to your solution, so I would like to
understand it a bit better.


> 
> What happens is that, due to the WAITERS bit being set (T2 is blocked
> on that futex), it forces T3 to go through lookup_pi_state(), which
> in the above case returns an existing pi_state and therefore forces T3
> to legitimately fight with T2 over the ownership of the pi_state (via
> pi_state->mutex). Problem solved!
> 
> Though that does not work for the "WAITERS bit is stale" problem
> because if lookup_pi_state() does not find existing pi_state it
> returns -ESRCH (due to TID == 0) which causes futex_lock_pi() to
> return -ESRCH to user space because the OWNER_DIED bit is not set.
> 
> Now there is a different solution to that problem. Do not look at the
> user space value at all and enforce a lookup of possibly available
> pi_state. If pi_state can be found, then the new incoming locker T3
> blocks on that pi_state and legitimately races with T2 to acquire the
> rt_mutex and the pi_state and therefore the proper ownership of the
> user space futex.


My first concern here is performance impact by forcing the pi_state
lookup, however, if we got this far, we already took the syscall, and
our 

RE: [PATCH v2] Support Elan Touchscreen eKTF product.

2012-10-24 Thread 劉嘉駿
Hi Dmitry,
Thanks for the review.

> -----Original Message-----
> From: Dmitry Torokhov [mailto:dmitry.torok...@gmail.com]
> Sent: Thursday, October 25, 2012 2:13 AM
> To: Scott Liu
> Cc: linux-in...@vger.kernel.org; linux-...@vger.kernel.org;
> linux-kernel@vger.kernel.org; Benjamin Tissoires; Jesse; Vincent Wang; Paul
> Subject: Re: [PATCH v2] Support Elan Touchscreen eKTF product.
> 
> Hi Scott,
> 
> On Wed, Oct 24, 2012 at 09:41:43AM +0800, Scott Liu wrote:
> > This patch is for Elan eKTF Touchscreen product, I2C adapter module.
> >
> > Signed-off-by: Scott Liu 
> > ---
> >
> > Hi,
> > In the v2 revision I have fixed some bugs as you advised:
> > 1. Target the mainline
> > 2. No Android dependency
> > 3. Reuse the duplicated code from Henrik's patchset
> > (input_mt_sync_frame()  / input_mt_get_slot_by_key())
> 
> Just a quick run through the code, so:
> 
> - please remove polling support, it is not useful in production;

OK.

> - why do you need a separate probe work instead of doing what you
>   need in elants_probe()

will fix.

> - it is not a good idea to register input device first and then
>   allocating memory for MT handling.

Oops... will fix.

> - I do not understand why kfifo is needed

The firmware and the host can collide when a read command and a finger
report happen simultaneously, so I'm simply using a kfifo in the IRQ
thread function.

* Read command: the host writes a 4-byte command, the device asserts the
GPIO interrupt and then responds with 4 bytes of data.

Without the kfifo we saw this error: under heavy load from finger
reports and read commands, the driver may take a finger report as the
response data.

Does that make sense?

> - please remove the rest of the custom threads
OK

> - you do not need to call input_mt_destroy_slots() explicitly
OK
> - use request_firmware() instead of special character device to upload
>   firmware.
OK, but I'll remove the firmware update function from this patch for now.

> - please use standard kernel-doc markup.
> - consider what attributes are there only for debugging and move them to
>   debugfs.
OK.

> - I find the use of enums in this driver quite unconventional, just
>   standard #defines would probably be more straightforward.
OK.

Thanks,
Scott

> 
> Thanks.
> 
> --
> Dmitry



Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread Fengguang Wu
Hi Chen,

> But how can bdi related ra_pages reflect different files' readahead
> window? Maybe these different files are sequential read, random read
> and so on.

It's simple: sequential reads will get ra_pages readahead size while
random reads will not get readahead at all.

Talking about the chunk below, it might hurt someone who explicitly
takes advantage of the behavior. However, ra_pages*2 seems more like
a hack than a general solution to me: if the user needs
POSIX_FADV_SEQUENTIAL to double the max readahead window size to
improve IO performance, then why not just increase bdi->ra_pages and
benefit all reads? One may argue that it offers differentiated
behavior to specific applications, but it may also act as a
counter-optimization: if root has already tuned bdi->ra_pages to the
optimal size, the doubled readahead size will only cost more memory
and perhaps IO latency.

--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 		spin_unlock(&file->f_lock);
 		break;
 	case POSIX_FADV_SEQUENTIAL:
-		file->f_ra.ra_pages = bdi->ra_pages * 2;
 		spin_lock(&file->f_lock);
 		file->f_mode &= ~FMODE_RANDOM;
 		spin_unlock(&file->f_lock);

Thanks,
Fengguang


Re: [PATCH V5] PWM: Add SPEAr PWM chip driver support

2012-10-24 Thread Viresh Kumar
On 25 October 2012 09:39, Shiraz Hashim  wrote:
> Add support for PWM chips present on SPEAr platforms. These PWM
> chips support 4 channel output with programmable duty cycle and
> frequency.
>
> More details on these PWM chips can be obtained from relevant
> chapter of reference manual, present at following[1] location.
>
> 1. http://www.st.com/internet/mcu/product/251211.jsp
>
> Cc: Thierry Reding 
> Signed-off-by: Shiraz Hashim 
> Signed-off-by: Viresh Kumar 
> Reviewed-by: Vipin Kumar 
> Acked-by: Viresh Kumar 

Looks fine.
I dare not comment on this one :)

--
viresh


Re: [PATCH 1/2 V2] memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY]

2012-10-24 Thread KOSAKI Motohiro
On Wed, Oct 24, 2012 at 5:43 AM, Lai Jiangshan  wrote:
> Currently memory_hotplug only manages node_states[N_HIGH_MEMORY];
> it forgets to manage node_states[N_NORMAL_MEMORY], which may cause
> node_states[N_NORMAL_MEMORY] to become incorrect.
>
> For example, if a node is empty before online and we online memory
> that is in ZONE_NORMAL, then after online node_states[N_HIGH_MEMORY]
> is correct but node_states[N_NORMAL_MEMORY] is incorrect:
> the online code doesn't set the newly onlined node in
> node_states[N_NORMAL_MEMORY].
>
> The same kind of thing happens on offline (the offline code
> doesn't clear the node from node_states[N_NORMAL_MEMORY] when needed).
> Some memory management code depends on node_states[N_NORMAL_MEMORY],
> so we have to fix up node_states[N_NORMAL_MEMORY].
>
> We add node_states_check_changes_online() and
> node_states_check_changes_offline()
> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
> are changed while hotplugging.
>
> Also add @status_change_nid_normal to struct memory_notify, so that
> the memory hotplug callbacks know whether node_states[N_NORMAL_MEMORY]
> has changed. (We could add a @flags and reuse @status_change_nid instead of
> introducing @status_change_nid_normal, but that would add much more complexity
> to the memory hotplug callback in every subsystem. So introducing
> @status_change_nid_normal is better, and it doesn't change the semantics
> of @status_change_nid.)
>
> Changed from V1:
> add more comments
> change the function name

Your patch didn't address my previous comments and doesn't work correctly.
Please test your own patch before resubmitting. You should consider both
zone-normal-only nodes and zone-high-only nodes.


[PATCH V5] PWM: Add SPEAr PWM chip driver support

2012-10-24 Thread Shiraz Hashim
Add support for PWM chips present on SPEAr platforms. These PWM
chips support 4 channel output with programmable duty cycle and
frequency.

More details on these PWM chips can be obtained from relevant
chapter of reference manual, present at following[1] location.

1. http://www.st.com/internet/mcu/product/251211.jsp

Cc: Thierry Reding 
Signed-off-by: Shiraz Hashim 
Signed-off-by: Viresh Kumar 
Reviewed-by: Vipin Kumar 
Acked-by: Viresh Kumar 
---
Changes:-
V4 --> V5:
   * replace tab by space in structure element declaration
   * restructure probe to register pwm_chip at end when clk is prepared, and
 all initializations done.
   * move clk_enable/disable in probe under if block which checks "1340"
 compatibility 
   * Replace (ret < 0) if condition by (!ret) at places where ret should be 0
 on success

V3 --> V4:
   * simplify remove
   * maintain alphabetical order in Makefile
   * do not check for device node in probe
   * move few assignment lines in probe

V2 --> V3:
   * remove "disabled" line from pwm dt binding documentation
   * remove un-necessary check on pwm chip (for NULL) in remove.

V1 --> V2:
   * make proper reference to pwm and pwm chip
   * take care to capitalize PWM at appropriate places
   * fix compatible string to the SoC where pwm chip was introduced
   * Rename the documentation file to the name of driver
   * Fix cosmetic changes like names, function name alignment, paragraph
 formatting, comments placement and formatting, etc.
   * Group and associate the bit field definitions to their registers
   * Fix kerneldoc for structure definition
   * Use chip to name pwm device and pwm for the channel instance
   * Remove init section qualifiers
   * Remove ifdefs around device tree from code and add dependency on CONFIG_OF
   * prepare/unprepare clock once in probe/remove and just enable/disable
 at rest of the places.
   * Use _relaxed for readl/writel.
   * Fix pwm disable part in remove

 .../devicetree/bindings/pwm/spear-pwm.txt |  18 ++
 drivers/pwm/Kconfig                       |  11 +
 drivers/pwm/Makefile                      |   1 +
 drivers/pwm/pwm-spear.c                   | 276 
 4 files changed, 306 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pwm/spear-pwm.txt
 create mode 100644 drivers/pwm/pwm-spear.c

diff --git a/Documentation/devicetree/bindings/pwm/spear-pwm.txt b/Documentation/devicetree/bindings/pwm/spear-pwm.txt
new file mode 100644
index 0000000..3ac779d
--- /dev/null
+++ b/Documentation/devicetree/bindings/pwm/spear-pwm.txt
@@ -0,0 +1,18 @@
+== ST SPEAr SoC PWM controller ==
+
+Required properties:
+- compatible: should be one of:
+  - "st,spear320-pwm"
+  - "st,spear1340-pwm"
+- reg: physical base address and length of the controller's registers
+- #pwm-cells: number of cells used to specify PWM which is fixed to 2 on
+  SPEAr. The first cell specifies the per-chip index of the PWM to use and
+  the second cell is the period in nanoseconds.
+
+Example:
+
+pwm: pwm@a800 {
+	compatible = "st,spear320-pwm";
+	reg = <0xa800 0x1000>;
+	#pwm-cells = <2>;
+};
diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
index ed81720..6e556c7 100644
--- a/drivers/pwm/Kconfig
+++ b/drivers/pwm/Kconfig
@@ -112,6 +112,17 @@ config PWM_SAMSUNG
  To compile this driver as a module, choose M here: the module
  will be called pwm-samsung.
 
+config PWM_SPEAR
+   tristate "STMicroelectronics SPEAr PWM support"
+   depends on PLAT_SPEAR
+   depends on OF
+   help
+ Generic PWM framework driver for the PWM controller on ST
+ SPEAr SoCs.
+
+ To compile this driver as a module, choose M here: the module
+ will be called pwm-spear.
+
 config PWM_TEGRA
tristate "NVIDIA Tegra PWM support"
depends on ARCH_TEGRA
diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
index acfe482..3b3f4c9a 100644
--- a/drivers/pwm/Makefile
+++ b/drivers/pwm/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_PWM_MXS)   += pwm-mxs.o
 obj-$(CONFIG_PWM_PUV3) += pwm-puv3.o
 obj-$(CONFIG_PWM_PXA)  += pwm-pxa.o
 obj-$(CONFIG_PWM_SAMSUNG)  += pwm-samsung.o
+obj-$(CONFIG_PWM_SPEAR)+= pwm-spear.o
 obj-$(CONFIG_PWM_TEGRA)+= pwm-tegra.o
 obj-$(CONFIG_PWM_TIECAP)   += pwm-tiecap.o
 obj-$(CONFIG_PWM_TIEHRPWM) += pwm-tiehrpwm.o
diff --git a/drivers/pwm/pwm-spear.c b/drivers/pwm/pwm-spear.c
new file mode 100644
index 0000000..6a8fd9b
--- /dev/null
+++ b/drivers/pwm/pwm-spear.c
@@ -0,0 +1,276 @@
+/*
+ * ST Microelectronics SPEAr Pulse Width Modulator driver
+ *
+ * Copyright (C) 2012 ST Microelectronics
+ * Shiraz Hashim 
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#include 

Re: [PATCH V4] PWM: Add SPEAr PWM chip driver support

2012-10-24 Thread Shiraz Hashim
On Wed, Oct 24, 2012 at 07:51:37AM +0200, Thierry Reding wrote:
> On Mon, Oct 22, 2012 at 04:36:41PM +0530, Shiraz Hashim wrote:
> [...]
> > +struct spear_pwm_chip {
> > +   void __iomem *mmio_base;
> > +   struct clk *clk;
> > +   struct pwm_chip chip;
> 
> My editor shows a tab between pwm_chip and chip. This should really be a
> space.
> 
> > +   ret = pwmchip_add(&pc->chip);
> > +   if (ret < 0) {
> > +   dev_err(&pdev->dev, "pwmchip_add() failed: %d\n", ret);
> > +   return ret;
> > +   }
> > +
> > +   ret = clk_prepare_enable(pc->clk);
> > +   if (ret < 0)
> > +   return pwmchip_remove(&pc->chip);
> 
> I think in order to fix the potential race condition that Viresh
> mentioned we should move the clk_prepare_enable() before the
> pwmchip_add(), but don't forget to disable and unprepare the clock if
> pwmchip_add() fails.
> 
> Actually, can't we make it a clk_prepare() only at this point and move
> the clk_enable() and clk_disable() into the if block below? In case the
> compatible value is not "st,spear1340-pwm" we don't need the clock
> enabled.
> 
> > +
> > +   if (of_device_is_compatible(np, "st,spear1340-pwm")) {
> > +   /*
> > +* Following enables PWM chip, channels would still be
> > +* enabled individually through their control register
> > +*/
> > +   val = readl_relaxed(pc->mmio_base + PWMMCR);
> > +   val |= PWMMCR_PWM_ENABLE;
> > +   writel_relaxed(val, pc->mmio_base + PWMMCR);
> > +
> 
> Oh, and a spurious newline here... =)
> 
> > +   }
> > +
> > +   /* only disable the clk and leave it prepared */
> > +   clk_disable(pc->clk);
> 
> This can go into the if block to match the clk_enable().

All suggestions will be included in V5. I hope this will be the
last one :).

--
regards
Shiraz


Re: [PATCH] exec: do not leave bprm->interp on stack

2012-10-24 Thread Al Viro
On Wed, Oct 24, 2012 at 04:20:32PM -0700, Kees Cook wrote:
> If a series of scripts are executed, each triggering module loading via
> unprintable bytes in the script header, kernel stack contents can leak
> into the command line.
> 
> Normally execution of binfmt_script and binfmt_misc happens
> recursively. However, when modules are enabled, and unprintable bytes
> exist in the bprm->buf, execution will restart after attempting to load
> matching binfmt modules. Unfortunately, the logic in binfmt_script and
> binfmt_misc does not expect to get restarted. They leave bprm->interp
> pointing to their local stack. This means on restart bprm->interp is
> left pointing into unused stack memory which can then be copied into
> the userspace argv areas.
> 
> This changes the logic to require allocation for any changes to the
> bprm->interp. To avoid adding a new kmalloc to every exec, the default
> value is left as-is. Only when passing through binfmt_script or
> binfmt_misc does an allocation take place.

I really don't like that.  It papers over the problem, but doesn't really
solve the underlying stupidity.  We have no good reason to retry a binfmt
we'd already attempted on this level of recursion.  And your patch doesn't
deal with that at all.


linux-next: Tree for Oct 25

2012-10-24 Thread Stephen Rothwell
Hi all,

Changes since 20121024:

New tree: akpm-current

The modules tree lost its conflict.

The pm tree lost its build failure.

The usb tree gained a conflict against the usb.current tree.

The staging tree gained a conflict against the staging.current tree.

The akpm tree lost its 2 build failures.  It also lost a couple of
patches that turned up elsewhere.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64. After the
final fixups (if any), it is also built with powerpc allnoconfig (32 and
64 bit), ppc44x_defconfig and allyesconfig (minus
CONFIG_PROFILE_ALL_BRANCHES - this fails its final link) and i386, sparc,
sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

We are up to 206 trees (counting Linus' and 27 trees of patches pending
for Linus' tree), more are welcome (even if they are currently empty).
Thanks to those who have contributed, and to those who haven't, please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (0e9e3e3 Merge tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen)
Merging fixes/master (12250d8 Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux)
Merging kbuild-current/rc-fixes (b1e0d8b kbuild: Fix gcc -x syntax)
Merging arm-current/fixes (b43b1ff Merge tag 'fixes-for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into fixes)
Merging m68k-current/for-linus (8a745ee m68k: Wire up kcmp)
Merging powerpc-merge/merge (83dac59 cpuidle/powerpc: Fix snooze state problem in the cpuidle design on pseries.)
Merging sparc/master (43c422e apparmor: fix apparmor OOPS in audit_log_untrustedstring+0x1c/0x40)
Merging net/master (37561f6 tcp: Reject invalid ack_seq to Fast Open sockets)
Merging sound-current/for-linus (21b3de8 ALSA: als3000: check for the kzalloc return value)
Merging pci-current/for-linus (0ff9514 PCI: Don't print anything while decoding is disabled)
Merging wireless/master (f89ff64 b43: Fix oops on unload when firmware not found)
Merging driver-core.current/driver-core-linus (d28d388 firmware loader: sync firmware cache by async_synchronize_full_domain)
Merging tty.current/tty-linus (a4f7438 Revert "serial: omap: fix software flow control")
Merging usb.current/usb-linus (d7870af usb-storage: add unusual_devs entry for Casio EX-N1 digital camera)
Merging staging.current/staging-linus (e297da6 staging: ipack: add missing include (implicit declaration of function 'kfree'))
Merging char-misc.current/char-misc-linus (1392550 Drivers: hv: Cleanup error handling in vmbus_open())
Merging input-current/for-linus (0cc8d6a Merge branch 'next' into for-linus)
Merging md-current/for-linus (72f36d5 md: refine reporting of resync/reshape delays.)
Merging audit-current/for-linus (c158a35 audit: no leading space in audit_log_d_path prefix)
Merging crypto-current/master (9efade1 crypto: cryptd - disable softirqs in cryptd_queue_worker to prevent data corruption)
Merging ide/master (9974e43 ide: fix generic_ide_suspend/resume Oops)
Merging dwmw2/master (244dc4e Merge git://git.infradead.org/users/dwmw2/random-2.6)
Merging sh-current/sh-fixes-for-linus (4403310 SH: Convert out[bwl] macros to inline functions)
Merging irqdomain-current/irqdomain/merge (15e06bf irqdomain: Fix debugfs formatting)
Merging devicetree-current/devicetree/merge (4e8383b of: release node fix for of_parse_phandle_with_args)
Merging spi-current/spi/merge (d1c185b of/spi: Fix SPI module loading by using proper "spi:" modalias prefixes.)
Merging gpio-current/gpio/merge (96b7064 gpio/tca6424: merge I2C transactions, remove cast)
Merging 

[PATCH v11 1/6] start vm after reseting it

2012-10-24 Thread Hu Tao
From: Wen Congyang 

The guest should run after it is reset, but it does not run if its
old state is RUN_STATE_INTERNAL_ERROR or RUN_STATE_PAUSED.

We don't set the runstate to RUN_STATE_PAUSED when resetting the guest,
so the runstate will be changed from RUN_STATE_INTERNAL_ERROR or
RUN_STATE_PAUSED to RUN_STATE_RUNNING (not RUN_STATE_PAUSED).

Signed-off-by: Wen Congyang 
---
 block.h |    2 ++
 qmp.c   |    2 +-
 vl.c    |    7 ++++---
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/block.h b/block.h
index e2d89d7..df10e30 100644
--- a/block.h
+++ b/block.h
@@ -360,6 +360,8 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs);
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
+void iostatus_bdrv_it(void *opaque, BlockDriverState *bs);
+
 enum BlockAcctType {
 BDRV_ACCT_READ,
 BDRV_ACCT_WRITE,
diff --git a/qmp.c b/qmp.c
index 36c54c5..f4a757b 100644
--- a/qmp.c
+++ b/qmp.c
@@ -125,7 +125,7 @@ SpiceInfo *qmp_query_spice(Error **errp)
 };
 #endif
 
-static void iostatus_bdrv_it(void *opaque, BlockDriverState *bs)
+void iostatus_bdrv_it(void *opaque, BlockDriverState *bs)
 {
 bdrv_iostatus_reset(bs);
 }
diff --git a/vl.c b/vl.c
index ee3c43a..ca2e1e2 100644
--- a/vl.c
+++ b/vl.c
@@ -343,7 +343,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_INMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_INMIGRATE, RUN_STATE_PRELAUNCH },
 
-{ RUN_STATE_INTERNAL_ERROR, RUN_STATE_PAUSED },
+{ RUN_STATE_INTERNAL_ERROR, RUN_STATE_RUNNING },
 { RUN_STATE_INTERNAL_ERROR, RUN_STATE_FINISH_MIGRATE },
 
 { RUN_STATE_IO_ERROR, RUN_STATE_RUNNING },
@@ -376,7 +376,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
-{ RUN_STATE_SHUTDOWN, RUN_STATE_PAUSED },
+{ RUN_STATE_SHUTDOWN, RUN_STATE_RUNNING },
 { RUN_STATE_SHUTDOWN, RUN_STATE_FINISH_MIGRATE },
 
 { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },
@@ -1618,7 +1618,8 @@ static bool main_loop_should_exit(void)
 resume_all_vcpus();
 if (runstate_check(RUN_STATE_INTERNAL_ERROR) ||
 runstate_check(RUN_STATE_SHUTDOWN)) {
-runstate_set(RUN_STATE_PAUSED);
+bdrv_iterate(iostatus_bdrv_it, NULL);
+vm_start();
 }
 }
 if (qemu_wakeup_requested()) {
-- 
1.7.10.2

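For readers following the table change in vl.c, here is a minimal sketch of how a table-driven transition check can work. It is not QEMU's code: the `sketch_` names, the reduced state set, and the RUNNING/PAUSED entries are assumptions made for illustration. Only the two entries leading from the error and shutdown states to RUNNING mirror what the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of states; the real RunState enum is larger. */
enum sketch_state {
    SKETCH_RUNNING,
    SKETCH_PAUSED,
    SKETCH_INTERNAL_ERROR,
    SKETCH_SHUTDOWN,
    SKETCH_STATE_MAX
};

struct sketch_transition {
    enum sketch_state from;
    enum sketch_state to;
};

/*
 * Allowed transitions. The first two entries mirror the patch: a reset
 * from the error or shutdown state leads to RUNNING, not PAUSED.
 * The last two entries are assumptions added so the table is not trivial.
 */
static const struct sketch_transition sketch_allowed[] = {
    { SKETCH_INTERNAL_ERROR, SKETCH_RUNNING },
    { SKETCH_SHUTDOWN,       SKETCH_RUNNING },
    { SKETCH_RUNNING,        SKETCH_PAUSED  },
    { SKETCH_PAUSED,         SKETCH_RUNNING },
};

/* Return true if the table contains the pair (from, to). */
static bool sketch_transition_ok(enum sketch_state from, enum sketch_state to)
{
    size_t i;

    assert(from < SKETCH_STATE_MAX && to < SKETCH_STATE_MAX);
    for (i = 0; i < sizeof(sketch_allowed) / sizeof(sketch_allowed[0]); i++) {
        if (sketch_allowed[i].from == from && sketch_allowed[i].to == to)
            return true;
    }
    return false;
}
```

With this table, a transition from the internal-error state to PAUSED is rejected, which is why the caller in the patch has to start the VM instead of setting RUN_STATE_PAUSED.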


Re: [PATCH V2 6/6] Thermal: Add ST-Ericsson DB8500 thermal properties and platform data.

2012-10-24 Thread Hongbo Zhang
[...]
>>  /*
>> + * Thermal Sensor
>> + */
>> +
>> +static struct resource db8500_thsens_resources[] = {
>> +   {
>> +   .name = "IRQ_HOTMON_LOW",
>> +   .start  = IRQ_PRCMU_HOTMON_LOW,
>> +   .end= IRQ_PRCMU_HOTMON_LOW,
>> +   .flags  = IORESOURCE_IRQ,
>> +   },
>> +   {
>
> I prefer }, {
I just followed the style of all the other definitions in this file;
let's keep the style uniform.

>
>> +   .name = "IRQ_HOTMON_HIGH",
>> +   .start  = IRQ_PRCMU_HOTMON_HIGH,
>> +   .end= IRQ_PRCMU_HOTMON_HIGH,
>> +   .flags  = IORESOURCE_IRQ,
>> +   },
>> +};
[...]


[PATCH V2 10/23] printk: Rename log_wait to printk_log_wait

2012-10-24 Thread Joe Perches
Rename this generically named variable so that it is specific to the
printk subsystem, which allows it to be used without a specific extern.

Also update fs/proc/kmsg.c as it uses log_wait.

Signed-off-by: Joe Perches 
---
 fs/proc/kmsg.c         |    4 ++--
 kernel/printk/printk.c |   12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/proc/kmsg.c b/fs/proc/kmsg.c
index bd4b5a7..16f2c85 100644
--- a/fs/proc/kmsg.c
+++ b/fs/proc/kmsg.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 
-extern wait_queue_head_t log_wait;
+extern wait_queue_head_t printk_log_wait;
 
 static int kmsg_open(struct inode * inode, struct file * file)
 {
@@ -41,7 +41,7 @@ static ssize_t kmsg_read(struct file *file, char __user *buf,
 
 static unsigned int kmsg_poll(struct file *file, poll_table *wait)
 {
-   poll_wait(file, &log_wait, wait);
+   poll_wait(file, &printk_log_wait, wait);
if (do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_FILE))
return POLLIN | POLLRDNORM;
return 0;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 341f2d9..c87472b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -63,7 +63,7 @@ void asmlinkage __attribute__((weak)) early_printk(const char 
*fmt, ...)
 #define MINIMUM_CONSOLE_LOGLEVEL 1 /* Minimum loglevel we let people use */
 #define DEFAULT_CONSOLE_LOGLEVEL 7 /* anything MORE serious than KERN_DEBUG */
 
-DECLARE_WAIT_QUEUE_HEAD(log_wait);
+DECLARE_WAIT_QUEUE_HEAD(printk_log_wait);
 
 int console_printk[4] = {
DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
@@ -447,7 +447,7 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
}
 
raw_spin_unlock_irq(_lock);
-   ret = wait_event_interruptible(log_wait,
+   ret = wait_event_interruptible(printk_log_wait,
   user->seq != 
printk_log_next_seq);
if (ret)
goto out;
@@ -589,7 +589,7 @@ static unsigned int devkmsg_poll(struct file *file, 
poll_table *wait)
if (!user)
return POLLERR|POLLNVAL;
 
-   poll_wait(file, &log_wait, wait);
+   poll_wait(file, &printk_log_wait, wait);
 
raw_spin_lock_irq(_lock);
if (user->seq < printk_log_next_seq) {
@@ -1122,7 +1122,7 @@ int do_syslog(int type, char __user *buf, int len, bool 
from_file)
error = -EFAULT;
goto out;
}
-   error = wait_event_interruptible(log_wait,
+   error = wait_event_interruptible(printk_log_wait,
 syslog_seq != 
printk_log_next_seq);
if (error)
goto out;
@@ -1948,7 +1948,7 @@ void printk_tick(void)
printk(KERN_WARNING "[sched_delayed] %s", buf);
}
if (pending & PRINTK_PENDING_WAKEUP)
-   wake_up_interruptible(&log_wait);
+   wake_up_interruptible(&printk_log_wait);
}
 }
 
@@ -1961,7 +1961,7 @@ int printk_needs_cpu(int cpu)
 
 void wake_up_klogd(void)
 {
-   if (waitqueue_active(&log_wait))
+   if (waitqueue_active(&printk_log_wait))
this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
 }
 
-- 
1.7.8.112.g3fd21



[PATCH v11 4/6] add a new qevent: QEVENT_GUEST_PANICKED

2012-10-24 Thread Hu Tao
This event will be emitted when the guest panics.

Signed-off-by: Wen Congyang 
---
 monitor.c |    1 +
 monitor.h |    1 +
 2 files changed, 2 insertions(+)

diff --git a/monitor.c b/monitor.c
index d17ae2d..d2e4bbf 100644
--- a/monitor.c
+++ b/monitor.c
@@ -457,6 +457,7 @@ static const char *monitor_event_names[] = {
 [QEVENT_WAKEUP] = "WAKEUP",
 [QEVENT_BALLOON_CHANGE] = "BALLOON_CHANGE",
 [QEVENT_SPICE_MIGRATE_COMPLETED] = "SPICE_MIGRATE_COMPLETED",
+[QEVENT_GUEST_PANICKED] = "GUEST_PANICKED",
 };
 QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
 
diff --git a/monitor.h b/monitor.h
index b6e7d95..3962340 100644
--- a/monitor.h
+++ b/monitor.h
@@ -45,6 +45,7 @@ typedef enum MonitorEvent {
 QEVENT_WAKEUP,
 QEVENT_BALLOON_CHANGE,
 QEVENT_SPICE_MIGRATE_COMPLETED,
+QEVENT_GUEST_PANICKED,
 
 /* Add to 'monitor_event_names' array in monitor.c when
  * defining new events here */
-- 
1.7.10.2



Re: [tip:sched/core] sched: Track the runnable average on a per-task entity basis

2012-10-24 Thread li guang
On Wed, 2012-10-24 at 02:43 -0700, tip-bot for Paul Turner wrote:
> Commit-ID:  9d85f21c94f7f7a84d0ba686c58aa6d9da58fdbb
> Gitweb: http://git.kernel.org/tip/9d85f21c94f7f7a84d0ba686c58aa6d9da58fdbb
> Author: Paul Turner 
> AuthorDate: Thu, 4 Oct 2012 13:18:29 +0200
> Committer:  Ingo Molnar 
> CommitDate: Wed, 24 Oct 2012 10:27:18 +0200
> 
> sched: Track the runnable average on a per-task entity basis
> 
> Instead of tracking the average load parented by a cfs_rq, we can track
> entity load directly, with the load for a given cfs_rq then being the sum
> of its children.
> 
> To do this we represent the historical contribution to runnable average
> within each trailing 1024us of execution as the coefficients of a
> geometric series.
> 
> We can express this for a given task t as:
> 
>   runnable_sum(t) = \Sum u_i * y^i, runnable_avg_period(t) = \Sum 1024 * y^i
>   load(t) = weight_t * runnable_sum(t) / runnable_avg_period(t)
> 
> Where: u_i is the usage in the last i`th 1024us period (approximately 1ms)
> and y is chosen such that y^k = 1/2.  We currently choose k to be 32, which
> roughly translates to about a sched period.
> 
> Signed-off-by: Paul Turner 
> Reviewed-by: Ben Segall 
> Signed-off-by: Peter Zijlstra 
> Link: http://lkml.kernel.org/r/20120823141506.372695...@google.com
> Signed-off-by: Ingo Molnar 
> ---
>  include/linux/sched.h |   13 +
>  kernel/sched/core.c   |5 ++
>  kernel/sched/debug.c  |4 ++
>  kernel/sched/fair.c   |  129 
> +
>  4 files changed, 151 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0dd42a0..418fc6d 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1095,6 +1095,16 @@ struct load_weight {
>   unsigned long weight, inv_weight;
>  };
>  
> +struct sched_avg {
> + /*
> +  * These sums represent an infinite geometric series and so are bound
> +  * above by 1024/(1-y).  Thus we only need a u32 to store them for all
> +  * choices of y < 1-2^(-32)*1024.
> +  */
> + u32 runnable_avg_sum, runnable_avg_period;
> + u64 last_runnable_update;
> +};
> +
>  #ifdef CONFIG_SCHEDSTATS
>  struct sched_statistics {
>   u64 wait_start;
> @@ -1155,6 +1165,9 @@ struct sched_entity {
>   /* rq "owned" by this entity/group: */
>   struct cfs_rq   *my_q;
>  #endif
> +#ifdef CONFIG_SMP
> + struct sched_avg        avg;
> +#endif
>  };
>  
>  struct sched_rt_entity {
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d8927f..fd9d085 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1524,6 +1524,11 @@ static void __sched_fork(struct task_struct *p)
>   p->se.vruntime  = 0;
>   INIT_LIST_HEAD(&p->se.group_node);
>  
> +#ifdef CONFIG_SMP
> + p->se.avg.runnable_avg_period = 0;
> + p->se.avg.runnable_avg_sum = 0;
> +#endif
> +
>  #ifdef CONFIG_SCHEDSTATS
>   memset(&p->se.statistics, 0, sizeof(p->se.statistics));
>  #endif
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 6f79596..61f7097 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -85,6 +85,10 @@ static void print_cfs_group_stats(struct seq_file *m, int 
> cpu, struct task_group
>   P(se->statistics.wait_count);
>  #endif
>   P(se->load.weight);
> +#ifdef CONFIG_SMP
> + P(se->avg.runnable_avg_sum);
> + P(se->avg.runnable_avg_period);
> +#endif
>  #undef PN
>  #undef P
>  }
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6b800a1..16d67f9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -971,6 +971,126 @@ static inline void update_entity_shares_tick(struct 
> cfs_rq *cfs_rq)
>  }
>  #endif /* CONFIG_FAIR_GROUP_SCHED */
>  
> +#ifdef CONFIG_SMP
> +/*
> + * Approximate:
> + *   val * y^n,where y^32 ~= 0.5 (~1 scheduling period)
> + */
> +static __always_inline u64 decay_load(u64 val, u64 n)
> +{
> + for (; n && val; n--) {
> + val *= 4008;
> + val >>= 12;
> + }
> +
> + return val;
> +}
> +
> +/*
> + * We can represent the historical contribution to runnable average as the
> + * coefficients of a geometric series.  To do this we sub-divide our runnable
> + * history into segments of approximately 1ms (1024us); label the segment that
> + * occurred N-ms ago p_N, with p_0 corresponding to the current period, e.g.
> + *
> + * [<- 1024us ->|<- 1024us ->|<- 1024us ->| ...
> + *      p0            p1           p2
> + *     (now)       (~1ms ago)  (~2ms ago)
> + *
> + * Let u_i denote the fraction of p_i that the entity was runnable.
> + *
> + * We then designate the fractions u_i as our co-efficients, yielding the
> + * following representation of historical load:
> + *   u_0 + u_1*y + u_2*y^2 + u_3*y^3 + ...
> + *
> + * We choose y based on the width of a reasonable scheduling period, fixing:
> + *   y^32 = 0.5
> + *
> + * This means that the 

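To make the geometric-series argument in the quoted patch concrete, here is a small userspace sketch. The first function uses the same arithmetic as the quoted decay_load() (multiply by 4008, shift right by 12, once per period). The second function, `sketch_full_history`, is a hypothetical helper that does not appear in the patch: it accumulates the contribution of an entity that was runnable for every 1024us period, which illustrates why the sum stays below roughly 1024/(1-y) and therefore fits in a u32.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the quoted decay_load(): val * (4008/4096)^n. */
static uint64_t sketch_decay_load(uint64_t val, uint64_t n)
{
    for (; n && val; n--) {
        val *= 4008;
        val >>= 12;
    }
    return val;
}

/*
 * Hypothetical helper: sum of 1024 * y^i over `periods` fully runnable
 * periods, computed incrementally. Each step decays the history by one
 * period and then adds the newest period's full contribution.
 */
static uint64_t sketch_full_history(unsigned int periods)
{
    uint64_t sum = 0;
    unsigned int i;

    for (i = 0; i < periods; i++)
        sum = sketch_decay_load(sum, 1) + 1024;

    /* The series is bounded, so a 32-bit field is sufficient. */
    assert(sum <= UINT32_MAX);
    return sum;
}
```

With y = 4008/4096 the bound 1024/(1-y) works out to 1024 * 4096 / 88, about 47662, so the accumulated sum saturates just below that value no matter how long the entity stays runnable. Decaying a value by 32 periods yields slightly less than half of it, consistent with y^32 ~= 0.5.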
[PATCH V2 23/23] printk: Move kmsg_dump functions to separate file

2012-10-24 Thread Joe Perches
Generic restructuring.

Create kmsg_dump.c, add to Makefile and remove from printk.c

Signed-off-by: Joe Perches 
---
 kernel/printk/Makefile|1 +
 kernel/printk/kmsg_dump.c |  328 +
 kernel/printk/printk.c|  318 ---
 3 files changed, 329 insertions(+), 318 deletions(-)
 create mode 100644 kernel/printk/kmsg_dump.c

diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 7947661..b0072b0 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -2,4 +2,5 @@ obj-y   = printk.o
 obj-y  += printk_log.o
 obj-y  += devkmsg.o
 obj-y  += printk_syslog.o
+obj-y  += kmsg_dump.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
diff --git a/kernel/printk/kmsg_dump.c b/kernel/printk/kmsg_dump.c
new file mode 100644
index 0000000..7962172
--- /dev/null
+++ b/kernel/printk/kmsg_dump.c
@@ -0,0 +1,328 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "printk_log.h"
+
+static DEFINE_SPINLOCK(dump_list_lock);
+static LIST_HEAD(dump_list);
+
+/**
+ * kmsg_dump_register - register a kernel log dumper.
+ * @dumper: pointer to the kmsg_dumper structure
+ *
+ * Adds a kernel log dumper to the system. The dump callback in the
+ * structure will be called when the kernel oopses or panics and must be
+ * set. Returns zero on success and %-EINVAL or %-EBUSY otherwise.
+ */
+int kmsg_dump_register(struct kmsg_dumper *dumper)
+{
+   unsigned long flags;
+   int err = -EBUSY;
+
+   /* The dump callback needs to be set */
+   if (!dumper->dump)
+   return -EINVAL;
+
+   spin_lock_irqsave(_list_lock, flags);
+   /* Don't allow registering multiple times */
+   if (!dumper->registered) {
+   dumper->registered = 1;
+   list_add_tail_rcu(>list, _list);
+   err = 0;
+   }
+   spin_unlock_irqrestore(_list_lock, flags);
+
+   return err;
+}
+EXPORT_SYMBOL_GPL(kmsg_dump_register);
+
+/**
+ * kmsg_dump_unregister - unregister a kmsg dumper.
+ * @dumper: pointer to the kmsg_dumper structure
+ *
+ * Removes a dump device from the system. Returns zero on success and
+ * %-EINVAL otherwise.
+ */
+int kmsg_dump_unregister(struct kmsg_dumper *dumper)
+{
+   unsigned long flags;
+   int err = -EINVAL;
+
+   spin_lock_irqsave(_list_lock, flags);
+   if (dumper->registered) {
+   dumper->registered = 0;
+   list_del_rcu(>list);
+   err = 0;
+   }
+   spin_unlock_irqrestore(_list_lock, flags);
+   synchronize_rcu();
+
+   return err;
+}
+EXPORT_SYMBOL_GPL(kmsg_dump_unregister);
+
+static bool always_kmsg_dump;
+module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
+
+/**
+ * kmsg_dump - dump kernel log to kernel message dumpers.
+ * @reason: the reason (oops, panic etc) for dumping
+ *
+ * Call each of the registered dumper's dump() callback, which can
+ * retrieve the kmsg records with kmsg_dump_get_line() or
+ * kmsg_dump_get_buffer().
+ */
+void kmsg_dump(enum kmsg_dump_reason reason)
+{
+   struct kmsg_dumper *dumper;
+   unsigned long flags;
+
+   if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump)
+   return;
+
+   rcu_read_lock();
+   list_for_each_entry_rcu(dumper, _list, list) {
+   if (dumper->max_reason && reason > dumper->max_reason)
+   continue;
+
+   /* initialize iterator with data about the stored records */
+   dumper->active = true;
+
+   raw_spin_lock_irqsave(_logbuf_lock, flags);
+   dumper->cur_seq = printk_log_clear_seq;
+   dumper->cur_idx = printk_log_clear_idx;
+   dumper->next_seq = printk_log_next_seq;
+   dumper->next_idx = printk_log_next_idx;
+   raw_spin_unlock_irqrestore(_logbuf_lock, flags);
+
+   /* invoke dumper which will iterate over records */
+   dumper->dump(dumper, reason);
+
+   /* reset iterator */
+   dumper->active = false;
+   }
+   rcu_read_unlock();
+}
+
+/**
+ * kmsg_dump_get_line_nolock - retrieve one kmsg log line (unlocked version)
+ * @dumper: registered kmsg dumper
+ * @syslog: include the "<4>" prefixes
+ * @line: buffer to copy the line to
+ * @size: maximum size of the buffer
+ * @len: length of line placed into buffer
+ *
+ * Start at the beginning of the kmsg buffer, with the oldest kmsg
+ * record, and copy one record into the provided buffer.
+ *
+ * Consecutive calls will return the next available record moving
+ * towards the end of the buffer with the youngest messages.
+ *
+ * A return value of FALSE indicates that there are no more records to
+ * read.
+ *
+ * The function is similar to kmsg_dump_get_line(), but grabs no locks.
+ */
+bool kmsg_dump_get_line_nolock(struct kmsg_dumper *dumper, bool syslog,
+ 

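The registration rules documented above (a dump callback is mandatory, double registration is refused, and unregistering an unknown dumper fails) can be sketched in plain userspace C. This is only an illustration under stated assumptions: the `sketch_` names are hypothetical, the list is a simple singly linked list, and there is no locking, whereas the kernel code takes a spinlock and uses RCU list primitives.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmsg_dumper. */
struct sketch_dumper {
    void (*dump)(struct sketch_dumper *d, int reason);
    int registered;
    struct sketch_dumper *next;
};

static struct sketch_dumper *sketch_dump_list;

/* Returns 0 on success, -EINVAL without a callback, -EBUSY if already registered. */
static int sketch_dump_register(struct sketch_dumper *d)
{
    struct sketch_dumper **pp = &sketch_dump_list;

    assert(d != NULL);
    if (!d->dump)
        return -EINVAL;
    if (d->registered)
        return -EBUSY;

    /* Append at the tail, as list_add_tail_rcu() does in the patch. */
    while (*pp)
        pp = &(*pp)->next;
    d->next = NULL;
    d->registered = 1;
    *pp = d;
    return 0;
}

/* Returns 0 on success, -EINVAL if the dumper was not registered. */
static int sketch_dump_unregister(struct sketch_dumper *d)
{
    struct sketch_dumper **pp;

    assert(d != NULL);
    if (!d->registered)
        return -EINVAL;

    for (pp = &sketch_dump_list; *pp; pp = &(*pp)->next) {
        if (*pp == d) {
            *pp = d->next;
            break;
        }
    }
    d->registered = 0;
    d->next = NULL;
    return 0;
}

/* Trivial callback so that a dumper can pass the -EINVAL check. */
static void sketch_noop_dump(struct sketch_dumper *d, int reason)
{
    (void)d;
    (void)reason;
}
```

Keeping the `registered` flag inside the dumper makes the double-registration check O(1). The cost is that callers must zero-initialize the structure before first use.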
[PATCH V2 22/23] printk: Add printk_syslog.c and .h

2012-10-24 Thread Joe Perches
Move syslog functions to a separate file.
Add compilation unit to Makefile.
Add missing #include 

Reported-by: Yuanhan Liu 
Signed-off-by: Joe Perches 
---
 kernel/printk/Makefile|1 +
 kernel/printk/printk.c|  351 +
 kernel/printk/printk_syslog.c |  355 +
 kernel/printk/printk_syslog.h |   13 ++
 4 files changed, 373 insertions(+), 347 deletions(-)
 create mode 100644 kernel/printk/printk_syslog.c
 create mode 100644 kernel/printk/printk_syslog.h

diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index bda335f..7947661 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,4 +1,5 @@
 obj-y  = printk.o
 obj-y  += printk_log.o
 obj-y  += devkmsg.o
+obj-y  += printk_syslog.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0c6042a..bed0e7d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -30,12 +30,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -49,6 +47,7 @@
 #include "console_cmdline.h"
 #include "braille.h"
 #include "printk_log.h"
+#include "printk_syslog.h"
 
 /*
  * Architectures can override it:
@@ -122,11 +121,6 @@ EXPORT_SYMBOL(console_set_on_cmdline);
 static int console_may_schedule;
 
 #ifdef CONFIG_PRINTK
-/* the next printk record to read by syslog(READ) or /proc/kmsg */
-static u64 syslog_seq;
-static u32 syslog_idx;
-static enum printk_log_flags syslog_prev;
-static size_t syslog_partial;
 
 /* the next printk record to write to the console */
 static u64 console_seq;
@@ -272,340 +266,6 @@ static inline void boot_delay_msec(void)
 }
 #endif
 
-#ifdef CONFIG_SECURITY_DMESG_RESTRICT
-int dmesg_restrict = 1;
-#else
-int dmesg_restrict;
-#endif
-
-static int syslog_action_restricted(int type)
-{
-   if (dmesg_restrict)
-   return 1;
-   /* Unless restricted, we allow "read all" and "get buffer size" for 
everybody */
-   return type != SYSLOG_ACTION_READ_ALL && type != 
SYSLOG_ACTION_SIZE_BUFFER;
-}
-
-static int check_syslog_permissions(int type, bool from_file)
-{
-   /*
-* If this is from /proc/kmsg and we've already opened it, then we've
-* already done the capabilities checks at open time.
-*/
-   if (from_file && type != SYSLOG_ACTION_OPEN)
-   return 0;
-
-   if (syslog_action_restricted(type)) {
-   if (capable(CAP_SYSLOG))
-   return 0;
-   /* For historical reasons, accept CAP_SYS_ADMIN too, with a 
warning */
-   if (capable(CAP_SYS_ADMIN)) {
-   printk_once(KERN_WARNING "%s (%d): "
-"Attempt to access syslog with CAP_SYS_ADMIN "
-"but no CAP_SYSLOG (deprecated).\n",
-current->comm, task_pid_nr(current));
-   return 0;
-   }
-   return -EPERM;
-   }
-   return 0;
-}
-
-static int syslog_print(char __user *buf, int size)
-{
-   char *text;
-   struct printk_log *msg;
-   int len = 0;
-
-   text = kmalloc(PRINTK_LOG_LINE_MAX + PRINTK_PREFIX_MAX, GFP_KERNEL);
-   if (!text)
-   return -ENOMEM;
-
-   while (size > 0) {
-   size_t n;
-   size_t skip;
-
-   raw_spin_lock_irq(_logbuf_lock);
-   if (syslog_seq < printk_log_first_seq) {
-   /* messages are gone, move to first one */
-   syslog_seq = printk_log_first_seq;
-   syslog_idx = printk_log_first_idx;
-   syslog_prev = 0;
-   syslog_partial = 0;
-   }
-   if (syslog_seq == printk_log_next_seq) {
-   raw_spin_unlock_irq(_logbuf_lock);
-   break;
-   }
-
-   skip = syslog_partial;
-   msg = printk_log_from_idx(syslog_idx);
-   n = printk_msg_print_text(msg, syslog_prev, true, text,
- PRINTK_LOG_LINE_MAX + 
PRINTK_PREFIX_MAX);
-   if (n - syslog_partial <= size) {
-   /* message fits into buffer, move forward */
-   syslog_idx = printk_log_next(syslog_idx);
-   syslog_seq++;
-   syslog_prev = msg->flags;
-   n -= syslog_partial;
-   syslog_partial = 0;
-   } else if (!len){
-   /* partial read(), remember position */
-   n = size;
-   syslog_partial += n;
-   } else
-   n = 0;
-   raw_spin_unlock_irq(_logbuf_lock);
-
-   if (!n)
-   break;

[PATCH V2 21/23] printk: Move functions printk_print_time and printk_msg_print_text

2012-10-24 Thread Joe Perches
Move these functions to printk_log.
Move the static function print_prefix too.
Add "#include " to printk_log.c.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |  112 ---
 kernel/printk/printk_log.c |  114 
 2 files changed, 114 insertions(+), 112 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b671523..0c6042a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -311,111 +311,6 @@ static int check_syslog_permissions(int type, bool 
from_file)
return 0;
 }
 
-#if defined(CONFIG_PRINTK_TIME)
-static bool printk_time = 1;
-#else
-static bool printk_time;
-#endif
-module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
-
-size_t printk_print_time(u64 ts, char *buf)
-{
-   unsigned long rem_nsec;
-
-   if (!printk_time)
-   return 0;
-
-   if (!buf)
-   return 15;
-
-   rem_nsec = do_div(ts, 1000000000);
-   return sprintf(buf, "[%5lu.%06lu] ",
-  (unsigned long)ts, rem_nsec / 1000);
-}
-
-static size_t print_prefix(const struct printk_log *msg, bool syslog, char 
*buf)
-{
-   size_t len = 0;
-   unsigned int prefix = (msg->facility << 3) | msg->level;
-
-   if (syslog) {
-   if (buf) {
-   len += sprintf(buf, "<%u>", prefix);
-   } else {
-   len += 3;
-   if (prefix > 999)
-   len += 3;
-   else if (prefix > 99)
-   len += 2;
-   else if (prefix > 9)
-   len++;
-   }
-   }
-
-   len += printk_print_time(msg->ts_nsec, buf ? buf + len : NULL);
-   return len;
-}
-
-size_t printk_msg_print_text(const struct printk_log *msg,
-enum printk_log_flags prev,
-bool syslog, char *buf, size_t size)
-{
-   const char *text = printk_log_text(msg);
-   size_t text_size = msg->text_len;
-   bool prefix = true;
-   bool newline = true;
-   size_t len = 0;
-
-   if ((prev & LOG_CONT) && !(msg->flags & LOG_PREFIX))
-   prefix = false;
-
-   if (msg->flags & LOG_CONT) {
-   if ((prev & LOG_CONT) && !(prev & LOG_NEWLINE))
-   prefix = false;
-
-   if (!(msg->flags & LOG_NEWLINE))
-   newline = false;
-   }
-
-   do {
-   const char *next = memchr(text, '\n', text_size);
-   size_t text_len;
-
-   if (next) {
-   text_len = next - text;
-   next++;
-   text_size -= next - text;
-   } else {
-   text_len = text_size;
-   }
-
-   if (buf) {
-   if (print_prefix(msg, syslog, NULL) +
-   text_len + 1 >= size - len)
-   break;
-
-   if (prefix)
-   len += print_prefix(msg, syslog, buf + len);
-   memcpy(buf + len, text, text_len);
-   len += text_len;
-   if (next || newline)
-   buf[len++] = '\n';
-   } else {
-   /* SYSLOG_ACTION_* buffer size only calculation */
-   if (prefix)
-   len += print_prefix(msg, syslog, NULL);
-   len += text_len;
-   if (next || newline)
-   len++;
-   }
-
-   prefix = true;
-   text = next;
-   } while (text);
-
-   return len;
-}
-
 static int syslog_print(char __user *buf, int size)
 {
char *text;
@@ -1184,13 +1079,6 @@ static struct cont {
 struct printk_log *printk_log_from_idx(u32 idx) { return NULL; }
 u32 printk_log_next(u32 idx) { return 0; }
 static void call_console_drivers(int level, const char *text, size_t len) {}
-size_t printk_print_time(u64 ts, char *buf) { return 0; }
-size_t printk_msg_print_text(const struct printk_log *msg,
-enum printk_log_flags prev,
-bool syslog, char *buf, size_t size)
-{
-   return 0;
-}
 
 static size_t cont_print_text(char *text, size_t size) { return 0; }
 
diff --git a/kernel/printk/printk_log.c b/kernel/printk/printk_log.c
index b5c2b8f..d38129c 100644
--- a/kernel/printk/printk_log.c
+++ b/kernel/printk/printk_log.c
@@ -2,6 +2,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "printk_log.h"
 
@@ -135,6 +137,111 @@ void printk_log_store(int facility, int level,
printk_log_next_seq++;
 }
 
+#if defined(CONFIG_PRINTK_TIME)
+static bool printk_time = 1;
+#else
+static bool printk_time;
+#endif

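As an illustration of the two helpers being moved, here is a userspace sketch of the timestamp formatting and of the length-only prefix calculation. It assumes the timestamp is in nanoseconds and is split at 10^9, which is what the "[%5lu.%06lu] " format implies. The `sketch_` names and the use of snprintf() with an explicit size are choices made for this example, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a nanosecond timestamp as "[seconds.microseconds] ". */
static size_t sketch_print_time(uint64_t ts_nsec, char *buf, size_t size)
{
    unsigned long sec = (unsigned long)(ts_nsec / 1000000000ULL);
    unsigned long rem_nsec = (unsigned long)(ts_nsec % 1000000000ULL);
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "[%5lu.%06lu] ", sec, rem_nsec / 1000);
    return n < 0 ? 0 : (size_t)n;
}

/* Length of the "<N>" syslog prefix without writing it, as in print_prefix(). */
static size_t sketch_syslog_prefix_len(unsigned int prefix)
{
    size_t len = 3; /* '<', one digit, '>' */

    if (prefix > 999)
        len += 3;
    else if (prefix > 99)
        len += 2;
    else if (prefix > 9)
        len += 1;
    return len;
}
```

For seconds values of up to five digits the formatted timestamp is always 15 characters long, which matches the constant the quoted function returns when it is called with a NULL buffer to compute sizes only.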
[PATCH V2 20/23] printk: Prefix print_time and msg_print_text with printk_

2012-10-24 Thread Joe Perches
Make these static functions global and prefix them with printk_.
Create declarations for these functions in printk_log.h

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   43 ---
 kernel/printk/printk_log.h |4 
 2 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5e30343..b671523 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -318,7 +318,7 @@ static bool printk_time;
 #endif
 module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
 
-static size_t print_time(u64 ts, char *buf)
+size_t printk_print_time(u64 ts, char *buf)
 {
unsigned long rem_nsec;
 
@@ -352,11 +352,11 @@ static size_t print_prefix(const struct printk_log *msg, 
bool syslog, char *buf)
}
}
 
-   len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
+   len += printk_print_time(msg->ts_nsec, buf ? buf + len : NULL);
return len;
 }
 
-static size_t msg_print_text(const struct printk_log *msg,
+size_t printk_msg_print_text(const struct printk_log *msg,
 enum printk_log_flags prev,
 bool syslog, char *buf, size_t size)
 {
@@ -445,8 +445,8 @@ static int syslog_print(char __user *buf, int size)
 
skip = syslog_partial;
msg = printk_log_from_idx(syslog_idx);
-   n = msg_print_text(msg, syslog_prev, true, text,
-  PRINTK_LOG_LINE_MAX + PRINTK_PREFIX_MAX);
+   n = printk_msg_print_text(msg, syslog_prev, true, text,
+ PRINTK_LOG_LINE_MAX + 
PRINTK_PREFIX_MAX);
if (n - syslog_partial <= size) {
/* message fits into buffer, move forward */
syslog_idx = printk_log_next(syslog_idx);
@@ -512,7 +512,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
while (seq < printk_log_next_seq) {
struct printk_log *msg = printk_log_from_idx(idx);
 
-   len += msg_print_text(msg, prev, true, NULL, 0);
+   len += printk_msg_print_text(msg, prev, true, NULL, 0);
prev = msg->flags;
idx = printk_log_next(idx);
seq++;
@@ -525,7 +525,7 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
while (len > size && seq < printk_log_next_seq) {
struct printk_log *msg = printk_log_from_idx(idx);
 
-   len -= msg_print_text(msg, prev, true, NULL, 0);
+   len -= printk_msg_print_text(msg, prev, true, NULL, 0);
prev = msg->flags;
idx = printk_log_next(idx);
seq++;
@@ -540,8 +540,8 @@ static int syslog_print_all(char __user *buf, int size, 
bool clear)
struct printk_log *msg = printk_log_from_idx(idx);
int textlen;
 
-   textlen = msg_print_text(msg, prev, true, text,
-PRINTK_LOG_LINE_MAX + 
PRINTK_PREFIX_MAX);
+   textlen = printk_msg_print_text(msg, prev, true, text,
+   PRINTK_LOG_LINE_MAX + 
PRINTK_PREFIX_MAX);
if (textlen < 0) {
len = textlen;
break;
@@ -685,7 +685,7 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
while (seq < printk_log_next_seq) {
struct printk_log *msg = printk_log_from_idx(idx);
 
-   error += msg_print_text(msg, prev, true, NULL, 0);
+   error += printk_msg_print_text(msg, prev, true, NULL, 0);
idx = printk_log_next(idx);
seq++;
prev = msg->flags;
@@ -936,7 +936,7 @@ static size_t cont_print_text(char *text, size_t size)
size_t len;
 
if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
-   textlen += print_time(cont.ts_nsec, text);
+   textlen += printk_print_time(cont.ts_nsec, text);
size -= textlen;
}
 
@@ -1184,9 +1184,14 @@ static struct cont {
 struct printk_log *printk_log_from_idx(u32 idx) { return NULL; }
 u32 printk_log_next(u32 idx) { return 0; }
 static void call_console_drivers(int level, const char *text, size_t len) {}
-static size_t msg_print_text(const struct printk_log *msg,
+size_t printk_print_time(u64 ts, char *buf) { return 0; }
+size_t printk_msg_print_text(const struct printk_log *msg,
 enum printk_log_flags prev,
-bool syslog, 

[PATCH V2 19/23] printk: Move devkmsg bits to separate file

2012-10-24 Thread Joe Perches
Move the devkmsg_ functions and kmsg_fops declaration
to devkmsg.c.

Add devkmsg.o to Makefile.

Signed-off-by: Joe Perches 
---
 kernel/printk/Makefile  |1 +
 kernel/printk/devkmsg.c |  309 +++
 kernel/printk/printk.c  |  296 -
 3 files changed, 310 insertions(+), 296 deletions(-)
 create mode 100644 kernel/printk/devkmsg.c

diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index a692b68..bda335f 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,3 +1,4 @@
 obj-y  = printk.o
 obj-y  += printk_log.o
+obj-y  += devkmsg.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
diff --git a/kernel/printk/devkmsg.c b/kernel/printk/devkmsg.c
new file mode 100644
index 0000000..af83290
--- /dev/null
+++ b/kernel/printk/devkmsg.c
@@ -0,0 +1,309 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "printk_log.h"
+
+/* /dev/kmsg - userspace message inject/listen interface */
+struct devkmsg_user {
+   u64 seq;
+   u32 idx;
+   enum printk_log_flags prev;
+   struct mutex lock;
+   char buf[8192];
+};
+
+static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
+ unsigned long count, loff_t pos)
+{
+   char *buf, *line;
+   int i;
+   int level = default_message_loglevel;
+   int facility = 1;   /* LOG_USER */
+   size_t len = iov_length(iv, count);
+   ssize_t ret = len;
+
+   if (len > PRINTK_LOG_LINE_MAX)
+   return -EINVAL;
+   buf = kmalloc(len+1, GFP_KERNEL);
+   if (buf == NULL)
+   return -ENOMEM;
+
+   line = buf;
+   for (i = 0; i < count; i++) {
+   if (copy_from_user(line, iv[i].iov_base, iv[i].iov_len)) {
+   ret = -EFAULT;
+   goto out;
+   }
+   line += iv[i].iov_len;
+   }
+
+   /*
+* Extract and skip the syslog prefix <[0-9]*>. Coming from userspace
+* the decimal value represents 32bit, the lower 3 bit are the log
+* level, the rest are the log facility.
+*
+* If no prefix or no userspace facility is specified, we
+* enforce LOG_USER, to be able to reliably distinguish
+* kernel-generated messages from userspace-injected ones.
+*/
+   line = buf;
+   if (line[0] == '<') {
+   char *endp = NULL;
+
+   i = simple_strtoul(line+1, &endp, 10);
+   if (endp && endp[0] == '>') {
+   level = i & 7;
+   if (i >> 3)
+   facility = i >> 3;
+   endp++;
+   len -= endp - line;
+   line = endp;
+   }
+   }
+   line[len] = '\0';
+
+   printk_emit(facility, level, NULL, 0, "%s", line);
+out:
+   kfree(buf);
+   return ret;
+}
+
+static ssize_t devkmsg_read(struct file *file, char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   struct devkmsg_user *user = file->private_data;
+   struct printk_log *msg;
+   u64 ts_usec;
+   size_t i;
+   char cont = '-';
+   size_t len;
+   ssize_t ret;
+
+   if (!user)
+   return -EBADF;
+
+   ret = mutex_lock_interruptible(&user->lock);
+   if (ret)
+   return ret;
+   raw_spin_lock_irq(&printk_logbuf_lock);
+   while (user->seq == printk_log_next_seq) {
+   if (file->f_flags & O_NONBLOCK) {
+   ret = -EAGAIN;
+   raw_spin_unlock_irq(&printk_logbuf_lock);
+   goto out;
+   }
+
+   raw_spin_unlock_irq(&printk_logbuf_lock);
+   ret = wait_event_interruptible(printk_log_wait,
+  user->seq != printk_log_next_seq);
+   if (ret)
+   goto out;
+   raw_spin_lock_irq(&printk_logbuf_lock);
+   }
+
+   if (user->seq < printk_log_first_seq) {
+   /* our last seen message is gone, return error and reset */
+   user->idx = printk_log_first_idx;
+   user->seq = printk_log_first_seq;
+   ret = -EPIPE;
+   raw_spin_unlock_irq(&printk_logbuf_lock);
+   goto out;
+   }
+
+   msg = printk_log_from_idx(user->idx);
+   ts_usec = msg->ts_nsec;
+   do_div(ts_usec, 1000);
+
+   /*
+* If we couldn't merge continuation line fragments during the print,
+* export the stored flags to allow an optional external merge of the
+* records. Merging the records isn't always neccessarily correct, like
+* when we hit a race during printing. In most cases though, it produces
+* better readable output. 'c' in the record flags mark the first
+* 
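
[Editorial sketch, not part of the patch] The comment in devkmsg_writev()
above describes how an optional "<N>" syslog prefix is split: the low 3 bits
of N are the level and the remaining bits the facility, with LOG_USER kept
when no facility is given. A minimal userspace illustration of that split
follows. The names struct prio and parse_prefix() are invented for this
sketch, and strtoul() stands in for the kernel's simple_strtoul().

```c
#include <stdlib.h>

/* Hypothetical result type for this sketch; not part of the patch. */
struct prio {
	int level;		/* syslog level, 0..7 */
	int facility;		/* syslog facility */
	const char *text;	/* message text after the optional prefix */
};

/*
 * Split an optional "<N>" prefix: the low 3 bits of N are the level,
 * the remaining bits the facility. Without a prefix, or with a
 * facility of 0, LOG_USER (1) is kept.
 */
static struct prio parse_prefix(const char *line, int default_level)
{
	struct prio p = { default_level, 1 /* LOG_USER */, line };

	if (line[0] == '<') {
		char *endp = NULL;
		unsigned long v = strtoul(line + 1, &endp, 10);

		if (endp && endp[0] == '>') {
			p.level = (int)(v & 7);
			if (v >> 3)
				p.facility = (int)(v >> 3);
			p.text = endp + 1;
		}
	}
	return p;
}
```

For example, "<30>hello" yields level 6 and facility 3, while "<3>x" yields
level 3 and keeps facility 1 because 3 >> 3 is zero.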

[PATCH V2 18/23] printk: Rename and move 2 #defines to printk_log.h

2012-10-24 Thread Joe Perches
Rename the LOG_LINE_MAX and PREFIX_MAX #defines with PRINTK_ prefixes.
Move the defines to printk_log.h
Remove duplicate define too.

Fixed redefined PRINTK_LOG_LINE_MAX and PRINTK_PREFIX_MAX.

Reported-by: Yuanhan Liu 
Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   22 --
 kernel/printk/printk_log.h |8 
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3b5c10e..988d048 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -133,9 +133,6 @@ static u64 console_seq;
 static u32 console_idx;
 static enum printk_log_flags console_prev;
 
-#define PREFIX_MAX 32
-#define LOG_LINE_MAX   1024 - PREFIX_MAX
-
 /* cpu currently holding printk_logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
@@ -158,7 +155,7 @@ static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
size_t len = iov_length(iv, count);
ssize_t ret = len;
 
-   if (len > LOG_LINE_MAX)
+   if (len > PRINTK_LOG_LINE_MAX)
return -EINVAL;
buf = kmalloc(len+1, GFP_KERNEL);
if (buf == NULL)
@@ -721,7 +718,7 @@ static int syslog_print(char __user *buf, int size)
struct printk_log *msg;
int len = 0;
 
-   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+   text = kmalloc(PRINTK_LOG_LINE_MAX + PRINTK_PREFIX_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
@@ -745,7 +742,7 @@ static int syslog_print(char __user *buf, int size)
skip = syslog_partial;
msg = printk_log_from_idx(syslog_idx);
n = msg_print_text(msg, syslog_prev, true, text,
-  LOG_LINE_MAX + PREFIX_MAX);
+  PRINTK_LOG_LINE_MAX + PRINTK_PREFIX_MAX);
if (n - syslog_partial <= size) {
/* message fits into buffer, move forward */
syslog_idx = printk_log_next(syslog_idx);
@@ -784,7 +781,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
char *text;
int len = 0;
 
-   text = kmalloc(LOG_LINE_MAX + PREFIX_MAX, GFP_KERNEL);
+   text = kmalloc(PRINTK_LOG_LINE_MAX + PRINTK_PREFIX_MAX, GFP_KERNEL);
if (!text)
return -ENOMEM;
 
@@ -840,7 +837,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
int textlen;
 
textlen = msg_print_text(msg, prev, true, text,
-LOG_LINE_MAX + PREFIX_MAX);
+PRINTK_LOG_LINE_MAX + PRINTK_PREFIX_MAX);
if (textlen < 0) {
len = textlen;
break;
@@ -1160,7 +1157,7 @@ static inline void printk_delay(void)
  * reached the console in case of a kernel crash.
  */
 static struct cont {
-   char buf[LOG_LINE_MAX];
+   char buf[PRINTK_LOG_LINE_MAX];
size_t len; /* length == 0 means unused buffer */
size_t cons;/* bytes written to console */
struct task_struct *owner;  /* task of first print*/
@@ -1262,7 +1259,7 @@ asmlinkage int vprintk_emit(int facility, int level,
const char *fmt, va_list args)
 {
static int recursion_bug;
-   static char textbuf[LOG_LINE_MAX];
+   static char textbuf[PRINTK_LOG_LINE_MAX];
char *text = textbuf;
size_t text_len;
enum printk_log_flags lflags = 0;
@@ -1465,9 +1462,6 @@ EXPORT_SYMBOL(printk);
 
 #else /* CONFIG_PRINTK */
 
-#define LOG_LINE_MAX   0
-#define PREFIX_MAX 0
-#define LOG_LINE_MAX 0
 static u64 syslog_seq;
 static u32 syslog_idx;
 static u64 console_seq;
@@ -1793,7 +1787,7 @@ out:
  */
 void console_unlock(void)
 {
-   static char text[LOG_LINE_MAX + PREFIX_MAX];
+   static char text[PRINTK_LOG_LINE_MAX + PRINTK_PREFIX_MAX];
static u64 seen_seq;
unsigned long flags;
bool wake_klogd = false;
diff --git a/kernel/printk/printk_log.h b/kernel/printk/printk_log.h
index e846f1d..724b6e0 100644
--- a/kernel/printk/printk_log.h
+++ b/kernel/printk/printk_log.h
@@ -92,6 +92,14 @@ struct printk_log {
 #endif
 #define __PRINTK_LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
 
+#ifdef CONFIG_PRINTK
+#define PRINTK_PREFIX_MAX  32
+#define PRINTK_LOG_LINE_MAX    (1024 - PRINTK_PREFIX_MAX)
+#else
+#define PRINTK_LOG_LINE_MAX    0
+#define PRINTK_PREFIX_MAX  0
+#endif
+
 extern raw_spinlock_t printk_logbuf_lock;
 extern wait_queue_head_t printk_log_wait;
 extern u64 printk_log_first_seq;
-- 
1.7.8.112.g3fd21
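
[Editorial sketch, not part of the patch] Besides renaming, the diff above
changes the macro body from "1024 - PREFIX_MAX" to a parenthesized
"(1024 - PRINTK_PREFIX_MAX)". The uses visible in this patch only add to or
compare against the macro, so both forms evaluate the same there. The
difference would only show up next to a tighter-binding operator, as the
sketch below illustrates. All DEMO_* names and the helper functions are
invented for illustration.

```c
/* Values mirror the diff; the DEMO_* names are invented for this sketch. */
#define DEMO_PREFIX_MAX		32
#define DEMO_LINE_MAX_OLD	1024 - DEMO_PREFIX_MAX		/* old, unparenthesized form */
#define DEMO_LINE_MAX_NEW	(1024 - DEMO_PREFIX_MAX)	/* new, parenthesized form */

/* Both forms agree when the macro is only added to or compared. */
static int sum_old(void) { return DEMO_LINE_MAX_OLD + DEMO_PREFIX_MAX; }
static int sum_new(void) { return DEMO_LINE_MAX_NEW + DEMO_PREFIX_MAX; }

/* They diverge once a tighter-binding operator is applied. */
static int twice_old(void) { return 2 * DEMO_LINE_MAX_OLD; }	/* 2 * 1024 - 32 */
static int twice_new(void) { return 2 * DEMO_LINE_MAX_NEW; }	/* 2 * (1024 - 32) */
```

With these definitions the two sums are both 1024, while the doubled values
are 2016 for the old form and 1984 for the new one.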

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

[PATCH V2 17/23] printk: Make wait_queue_head_t printk_log_wait extern

2012-10-24 Thread Joe Perches
Move the variable to the .h file too.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk_log.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/printk/printk_log.h b/kernel/printk/printk_log.h
index 0327f8d..e846f1d 100644
--- a/kernel/printk/printk_log.h
+++ b/kernel/printk/printk_log.h
@@ -93,6 +93,7 @@ struct printk_log {
 #define __PRINTK_LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
 
 extern raw_spinlock_t printk_logbuf_lock;
+extern wait_queue_head_t printk_log_wait;
 extern u64 printk_log_first_seq;
 extern u32 printk_log_first_idx;
 extern u64 printk_log_next_seq;
-- 
1.7.8.112.g3fd21



[PATCH V2 16/23] printk: Add printk_log.c

2012-10-24 Thread Joe Perches
Move printk_log variables and functions into a separate file.

Signed-off-by: Joe Perches 
---
 kernel/printk/Makefile |1 +
 kernel/printk/printk.c |  128 -
 kernel/printk/printk_log.c |  149 
 3 files changed, 150 insertions(+), 128 deletions(-)
 create mode 100644 kernel/printk/printk_log.c

diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 85405bd..a692b68 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1,2 +1,3 @@
 obj-y  = printk.o
+obj-y  += printk_log.o
 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3b18ade..3b5c10e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -121,12 +121,6 @@ EXPORT_SYMBOL(console_set_on_cmdline);
 /* Flag: console code may call schedule() */
 static int console_may_schedule;
 
-/*
- * The printk_logbuf_lock protects kmsg buffer, indices, counters. It is also
- * used in interesting ways to provide interlocking in console_unlock();
- */
-DEFINE_RAW_SPINLOCK(printk_logbuf_lock);
-
 #ifdef CONFIG_PRINTK
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
@@ -134,139 +128,17 @@ static u32 syslog_idx;
 static enum printk_log_flags syslog_prev;
 static size_t syslog_partial;
 
-/* index and sequence number of the first record stored in the buffer */
-u64 printk_log_first_seq;
-u32 printk_log_first_idx;
-
-/* index and sequence number of the next record to store in the buffer */
-u64 printk_log_next_seq;
-u32 printk_log_next_idx;
-
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u32 console_idx;
 static enum printk_log_flags console_prev;
 
-/* the next printk record to read after the last 'clear' command */
-u64 printk_log_clear_seq;
-u32 printk_log_clear_idx;
-
 #define PREFIX_MAX 32
 #define LOG_LINE_MAX   1024 - PREFIX_MAX
 
-/* record buffer */
-char __printk_log_buf[__PRINTK_LOG_BUF_LEN] __aligned(PRINTK_LOG_ALIGN);
-char *printk_log_buf = __printk_log_buf;
-u32 printk_log_buf_len = __PRINTK_LOG_BUF_LEN;
-
 /* cpu currently holding printk_logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
-/* human readable text of the record */
-char *printk_log_text(const struct printk_log *msg)
-{
-   return (char *)msg + sizeof(struct printk_log);
-}
-
-/* optional key/value pair dictionary attached to the record */
-char *printk_log_dict(const struct printk_log *msg)
-{
-   return (char *)msg + sizeof(struct printk_log) + msg->text_len;
-}
-
-/* get record by index; idx must point to valid msg */
-struct printk_log *printk_log_from_idx(u32 idx)
-{
-   struct printk_log *msg = (struct printk_log *)(printk_log_buf + idx);
-
-   /*
-* A length == 0 record is the end of buffer marker. Wrap around and
-* read the message at the start of the buffer.
-*/
-   if (!msg->len)
-   return (struct printk_log *)printk_log_buf;
-   return msg;
-}
-
-/* get next record; idx must point to valid msg */
-u32 printk_log_next(u32 idx)
-{
-   struct printk_log *msg = (struct printk_log *)(printk_log_buf + idx);
-
-   /* length == 0 indicates the end of the buffer; wrap */
-   /*
-* A length == 0 record is the end of buffer marker. Wrap around and
-* read the message at the start of the buffer as *this* one, and
-* return the one after that.
-*/
-   if (!msg->len) {
-   msg = (struct printk_log *)printk_log_buf;
-   return msg->len;
-   }
-   return idx + msg->len;
-}
-
-/* insert record into the buffer, discard old ones, update heads */
-void printk_log_store(int facility, int level,
- enum printk_log_flags flags, u64 ts_nsec,
- const char *dict, u16 dict_len,
- const char *text, u16 text_len)
-{
-   struct printk_log *msg;
-   u32 size, pad_len;
-
-   /* number of '\0' padding bytes to next message */
-   size = sizeof(struct printk_log) + text_len + dict_len;
-   pad_len = (-size) & (PRINTK_LOG_ALIGN - 1);
-   size += pad_len;
-
-   while (printk_log_first_seq < printk_log_next_seq) {
-   u32 free;
-
-   if (printk_log_next_idx > printk_log_first_idx)
-   free = max(printk_log_buf_len - printk_log_next_idx, printk_log_first_idx);
-   else
-   free = printk_log_first_idx - printk_log_next_idx;
-
-   if (free > size + sizeof(struct printk_log))
-   break;
-
-   /* drop old messages until we have enough contiuous space */
-   printk_log_first_idx = printk_log_next(printk_log_first_idx);
-   printk_log_first_seq++;
-   }
-
-   if (printk_log_next_idx + size + sizeof(struct printk_log) >= 
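
[Editorial sketch, not part of the patch] The moved helpers
printk_log_from_idx() and printk_log_next() treat a record whose length is
0 as an end-of-buffer marker and continue at offset 0. The userspace model
below shows that wrap rule on a tiny buffer. The demo_* names, the 64-byte
buffer and the header reduced to a single length field are assumptions made
for illustration, and the helpers return offsets instead of pointers.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_BUF_LEN 64

/* Zero-initialized stand-in for the record buffer. */
static char demo_buf[DEMO_BUF_LEN];

/* Read the 16-bit total length stored at the start of a record. */
static uint16_t demo_len(uint32_t idx)
{
	uint16_t len;

	memcpy(&len, demo_buf + idx, sizeof(len));
	return len;
}

/* Helper for the sketch: store a record length at offset idx. */
static void demo_put(uint32_t idx, uint16_t len)
{
	memcpy(demo_buf + idx, &len, sizeof(len));
}

/* Offset of the record to read at idx; a zero length redirects to offset 0. */
static uint32_t demo_from_idx(uint32_t idx)
{
	return demo_len(idx) ? idx : 0;
}

/* Offset of the following record, honouring the wrap marker. */
static uint32_t demo_next(uint32_t idx)
{
	/*
	 * On the marker, the record at offset 0 counts as *this* one,
	 * so the one after it is returned.
	 */
	if (!demo_len(idx))
		return demo_len(0);
	return idx + demo_len(idx);
}
```

With records of length 16 at offset 0 and 24 at offset 16, and a marker at
offset 40, walking from 0 visits 16 and 40, and stepping from the marker
lands on 16 again.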

[PATCH V2 15/23] printk: Add and use printk_log.h

2012-10-24 Thread Joe Perches
Create a header file for printk_log functions and variables.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   91 +--
 kernel/printk/printk_log.h |  115 
 2 files changed, 116 insertions(+), 90 deletions(-)
 create mode 100644 kernel/printk/printk_log.h

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bc0b4ed..3b18ade 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -48,6 +48,7 @@
 
 #include "console_cmdline.h"
 #include "braille.h"
+#include "printk_log.h"
 
 /*
  * Architectures can override it:
@@ -121,90 +122,6 @@ EXPORT_SYMBOL(console_set_on_cmdline);
 static int console_may_schedule;
 
 /*
- * The printk log buffer consists of a chain of concatenated variable
- * length records. Every record starts with a record header, containing
- * the overall length of the record.
- *
- * The heads to the first and last entry in the buffer, as well as the
- * sequence numbers of these both entries are maintained when messages
- * are stored..
- *
- * If the heads indicate available messages, the length in the header
- * tells the start next message. A length == 0 for the next message
- * indicates a wrap-around to the beginning of the buffer.
- *
- * Every record carries the monotonic timestamp in microseconds, as well as
- * the standard userspace syslog level and syslog facility. The usual
- * kernel messages use LOG_KERN; userspace-injected messages always carry
- * a matching syslog facility, by default LOG_USER. The origin of every
- * message can be reliably determined that way.
- *
- * The human readable log message directly follows the message header. The
- * length of the message text is stored in the header, the stored message
- * is not terminated.
- *
- * Optionally, a message can carry a dictionary of properties (key/value pairs),
- * to provide userspace with a machine-readable message context.
- *
- * Examples for well-defined, commonly used property names are:
- *   DEVICE=b12:8               device identifier
- *            b12:8             block dev_t
- *            c127:3            char dev_t
- *            n8                netdev ifindex
- *            +sound:card0      subsystem:devname
- *   SUBSYSTEM=pci              driver-core subsystem name
- *
- * Valid characters in property names are [a-zA-Z0-9.-_]. The plain text value
- * follows directly after a '=' character. Every property is terminated by
- * a '\0' character. The last property is not terminated.
- *
- * Example of a message structure:
- *   0000  ff 8f 00 00 00 00 00 00      monotonic time in nsec
- *   0008  34 00                        record is 52 bytes long
- *   000a        0b 00                  text is 11 bytes long
- *   000c              1f 00            dictionary is 23 bytes long
- *   000e                    03 00      LOG_KERN (facility) LOG_ERR (level)
- *   0010  69 74 27 73 20 61 20 6c      "it's a l"
- *         69 6e 65                     "ine"
- *   001b           44 45 56 49 43      "DEVIC"
- *         45 3d 62 38 3a 32 00 44      "E=b8:2\0D"
- *         52 49 56 45 52 3d 62 75      "RIVER=bu"
- *         67                           "g"
- *   0032     00 00 00                  padding to next message header
- *
- * The 'struct printk_log' buffer header must never be directly exported to
- * userspace, it is a kernel-private implementation detail that might
- * need to be changed in the future, when the requirements change.
- *
- * /dev/kmsg exports the structured data in the following line format:
- *   "level,sequnum,timestamp;<message text>\n"
- *
- * The optional key/value pairs are attached as continuation lines starting
- * with a space character and terminated by a newline. All possible
- * non-prinatable characters are escaped in the "\xff" notation.
- *
- * Users of the export format should ignore possible additional values
- * separated by ',', and find the message after the ';' character.
- */
-
-enum printk_log_flags {
-   LOG_NOCONS  = 1,/* already flushed, do not print to console */
-   LOG_NEWLINE = 2,/* text ended with a newline */
-   LOG_PREFIX  = 4,/* text started with a prefix */
-   LOG_CONT= 8,/* text is a fragment of a continuation line */
-};
-
-struct printk_log {
-   u64 ts_nsec;/* timestamp in nanoseconds */
-   u16 len;/* length of entire record */
-   u16 text_len;   /* length of text buffer */
-   u16 dict_len;   /* length of dictionary buffer */
-   u8 facility;/* syslog facility */
-   u8 flags:5; /* internal record flags */
-   u8 level:3; /* syslog level */
-};
-
-/*
  * The printk_logbuf_lock protects kmsg buffer, indices, counters. It is also
  * used in interesting ways to provide interlocking in console_unlock();
  */
@@ -238,12 
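
[Editorial sketch, not part of the patch] The comment moved by this patch
describes the /dev/kmsg line format as "level,sequnum,timestamp;" followed
by the message text, and asks readers to ignore any further ','-separated
values before the ';'. A minimal userspace parser for one such line might
look like the following. The struct and function names are invented for
this sketch, and the error handling is a design choice of the sketch, not
something the comment specifies.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the fields of one exported line. */
struct kmsg_line {
	unsigned int level;		/* first field */
	unsigned long long seq;		/* second field: sequence number */
	unsigned long long timestamp;	/* third field */
	const char *text;		/* points into the input, after ';' */
};

/* Returns 0 on success, -1 if the line does not match the format. */
static int parse_kmsg_line(const char *line, struct kmsg_line *out)
{
	const char *semi = strchr(line, ';');
	const char *p;
	char *end;

	if (!semi)
		return -1;

	out->level = (unsigned int)strtoul(line, &end, 10);
	if (end == line || *end != ',')
		return -1;

	p = end + 1;
	out->seq = strtoull(p, &end, 10);
	if (end == p || *end != ',')
		return -1;

	p = end + 1;
	out->timestamp = strtoull(p, &end, 10);
	if (end == p || (*end != ',' && *end != ';'))
		return -1;

	/* Any further ','-separated values before ';' are ignored. */
	out->text = semi + 1;
	return 0;
}
```

A line such as "6,339,5140900,-;NET: Registered" parses to level 6,
sequence 339 and timestamp 5140900, with the text starting at "NET". The
sample line itself is made up for this illustration.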

Re: [PATCH V2 6/6] Thermal: Add ST-Ericsson DB8500 thermal properties and platform data.

2012-10-24 Thread Hongbo Zhang
On 24 October 2012 22:32, Joe Perches  wrote:
> On Wed, 2012-10-24 at 19:58 +0800, hongbo.zhang wrote:
>> This patch adds device tree properties for ST-Ericsson DB8500 thermal driver,
>> also adds the platform data to support the old fashion.
>
> Just a trivial note:
>
>> diff --git a/arch/arm/mach-ux500/board-mop500.c 
>> b/arch/arm/mach-ux500/board-mop500.c
>
>> @@ -229,6 +230,67 @@ static struct ab8500_platform_data ab8500_platdata = {
>>  };
>>
>>  /*
>> + * Thermal Sensor
>> + */
>> +
>> +static struct resource db8500_thsens_resources[] = {
>
> should there be a const in any of these?
There will be warnings from gcc:
warning: initialization discards 'const' qualifier from pointer target
type [enabled by default]
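
[Editorial sketch, not from the thread] One way such a warning can arise,
assuming it comes from a non-const pointer member being initialized with
the address of a const object (struct platform_device has a non-const
"struct resource *resource" member). The demo_* types below are invented
stand-ins that only mimic that shape.

```c
#include <stddef.h>

/* Hypothetical stand-ins; they only mimic the shape of the kernel structs. */
struct demo_resource {
	unsigned long start;
	unsigned long end;
};

struct demo_device {
	const char *name;
	size_t num_resources;
	struct demo_resource *resource;	/* non-const pointer member */
};

/* Stays non-const: demo_dev.resource points at it through a non-const pointer. */
static struct demo_resource demo_resources[] = {
	{ 0x1000, 0x1fff },
};

/*
 * Declaring demo_resources as "static const" would make the .resource
 * initializer below discard the qualifier, which is the kind of
 * initialization gcc warns about.
 */
static struct demo_device demo_dev = {
	.name = "demo",
	.num_resources = sizeof(demo_resources) / sizeof(demo_resources[0]),
	.resource = demo_resources,
};
```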

>
>> +static struct db8500_thsens_platform_data db8500_thsens_data = {
> []
>> +static struct platform_device u8500_thsens_device = {
> []
>> +static struct platform_device u8500_cpufreq_cooling_device = {
>
>


[PATCH V2 14/23] printk: Rename LOG_ALIGN to PRINTK_LOG_ALIGN

2012-10-24 Thread Joe Perches
Make the #define more specific to the printk subsystem.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0134b2e..bc0b4ed 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -239,12 +239,12 @@ u32 printk_log_clear_idx;
 
 /* record buffer */
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
-#define LOG_ALIGN 4
+#define PRINTK_LOG_ALIGN 4
 #else
-#define LOG_ALIGN __alignof__(struct printk_log)
+#define PRINTK_LOG_ALIGN __alignof__(struct printk_log)
 #endif
 #define __PRINTK_LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
-char __printk_log_buf[__PRINTK_LOG_BUF_LEN] __aligned(LOG_ALIGN);
+char __printk_log_buf[__PRINTK_LOG_BUF_LEN] __aligned(PRINTK_LOG_ALIGN);
 char *printk_log_buf = __printk_log_buf;
 u32 printk_log_buf_len = __PRINTK_LOG_BUF_LEN;
 
@@ -306,7 +306,7 @@ void printk_log_store(int facility, int level,
 
/* number of '\0' padding bytes to next message */
size = sizeof(struct printk_log) + text_len + dict_len;
-   pad_len = (-size) & (LOG_ALIGN - 1);
+   pad_len = (-size) & (PRINTK_LOG_ALIGN - 1);
size += pad_len;
 
while (printk_log_first_seq < printk_log_next_seq) {
-- 
1.7.8.112.g3fd21
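
[Editorial sketch, not part of the patch] The renamed constant feeds the
expression "pad_len = (-size) & (PRINTK_LOG_ALIGN - 1)" in
printk_log_store(), which yields the number of padding bytes needed to
round size up to the next multiple of a power-of-two alignment. A small
standalone version follows; demo_pad_len() is a name invented for this
sketch.

```c
#include <stdint.h>

/*
 * Bytes of padding needed to round size up to the next multiple of
 * align. align must be a power of two. Unsigned negation wraps
 * modulo 2^32, so the low bits of -size hold the distance to the
 * next multiple.
 */
static uint32_t demo_pad_len(uint32_t size, uint32_t align)
{
	return (-size) & (align - 1);
}
```

As a worked case, assuming a 16-byte record header, 11 bytes of text and
23 bytes of dictionary give size 50; with an alignment of 4 the padding is
2 and the stored record is 52 bytes long.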



[PATCH V2 13/23] printk: Remove static from printk_ variables

2012-10-24 Thread Joe Perches
Allow a separation of functions and variables into
multiple files.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   46 +++---
 1 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3785ac4..0134b2e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -208,7 +208,7 @@ struct printk_log {
  * The printk_logbuf_lock protects kmsg buffer, indices, counters. It is also
  * used in interesting ways to provide interlocking in console_unlock();
  */
-static DEFINE_RAW_SPINLOCK(printk_logbuf_lock);
+DEFINE_RAW_SPINLOCK(printk_logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
@@ -218,12 +218,12 @@ static enum printk_log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
-static u64 printk_log_first_seq;
-static u32 printk_log_first_idx;
+u64 printk_log_first_seq;
+u32 printk_log_first_idx;
 
 /* index and sequence number of the next record to store in the buffer */
-static u64 printk_log_next_seq;
-static u32 printk_log_next_idx;
+u64 printk_log_next_seq;
+u32 printk_log_next_idx;
 
 /* the next printk record to write to the console */
 static u64 console_seq;
@@ -231,8 +231,8 @@ static u32 console_idx;
 static enum printk_log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
-static u64 printk_log_clear_seq;
-static u32 printk_log_clear_idx;
+u64 printk_log_clear_seq;
+u32 printk_log_clear_idx;
 
 #define PREFIX_MAX 32
 #define LOG_LINE_MAX   1024 - PREFIX_MAX
@@ -244,27 +244,27 @@ static u32 printk_log_clear_idx;
 #define LOG_ALIGN __alignof__(struct printk_log)
 #endif
 #define __PRINTK_LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
-static char __printk_log_buf[__PRINTK_LOG_BUF_LEN] __aligned(LOG_ALIGN);
-static char *printk_log_buf = __printk_log_buf;
-static u32 printk_log_buf_len = __PRINTK_LOG_BUF_LEN;
+char __printk_log_buf[__PRINTK_LOG_BUF_LEN] __aligned(LOG_ALIGN);
+char *printk_log_buf = __printk_log_buf;
+u32 printk_log_buf_len = __PRINTK_LOG_BUF_LEN;
 
 /* cpu currently holding printk_logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
 /* human readable text of the record */
-static char *printk_log_text(const struct printk_log *msg)
+char *printk_log_text(const struct printk_log *msg)
 {
return (char *)msg + sizeof(struct printk_log);
 }
 
 /* optional key/value pair dictionary attached to the record */
-static char *printk_log_dict(const struct printk_log *msg)
+char *printk_log_dict(const struct printk_log *msg)
 {
return (char *)msg + sizeof(struct printk_log) + msg->text_len;
 }
 
 /* get record by index; idx must point to valid msg */
-static struct printk_log *printk_log_from_idx(u32 idx)
+struct printk_log *printk_log_from_idx(u32 idx)
 {
struct printk_log *msg = (struct printk_log *)(printk_log_buf + idx);
 
@@ -278,7 +278,7 @@ static struct printk_log *printk_log_from_idx(u32 idx)
 }
 
 /* get next record; idx must point to valid msg */
-static u32 printk_log_next(u32 idx)
+u32 printk_log_next(u32 idx)
 {
struct printk_log *msg = (struct printk_log *)(printk_log_buf + idx);
 
@@ -296,10 +296,10 @@ static u32 printk_log_next(u32 idx)
 }
 
 /* insert record into the buffer, discard old ones, update heads */
-static void printk_log_store(int facility, int level,
-enum printk_log_flags flags, u64 ts_nsec,
-const char *dict, u16 dict_len,
-const char *text, u16 text_len)
+void printk_log_store(int facility, int level,
+ enum printk_log_flags flags, u64 ts_nsec,
+ const char *dict, u16 dict_len,
+ const char *text, u16 text_len)
 {
struct printk_log *msg;
u32 size, pad_len;
@@ -1690,9 +1690,9 @@ static u32 syslog_idx;
 static u64 console_seq;
 static u32 console_idx;
 static enum printk_log_flags syslog_prev;
-static u64 printk_log_first_seq;
-static u32 printk_log_first_idx;
-static u64 printk_log_next_seq;
+u64 printk_log_first_seq;
+u32 printk_log_first_idx;
+u64 printk_log_next_seq;
 static enum printk_log_flags console_prev;
 static struct cont {
size_t len;
@@ -1700,8 +1700,8 @@ static struct cont {
u8 level;
bool flushed:1;
 } cont;
-static struct printk_log *printk_log_from_idx(u32 idx) { return NULL; }
-static u32 printk_log_next(u32 idx) { return 0; }
+struct printk_log *printk_log_from_idx(u32 idx) { return NULL; }
+u32 printk_log_next(u32 idx) { return 0; }
 static void call_console_drivers(int level, const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg,
 enum printk_log_flags prev,
-- 
1.7.8.112.g3fd21


[PATCH V2 12/23] printk: Rename clear_seq and clear_idx variables

2012-10-24 Thread Joe Perches
Make these variables more specific to the printk log subsystem by
adding the prefix printk_log_.  This allows them to become non-static.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   34 +-
 1 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 709472f..3785ac4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -231,8 +231,8 @@ static u32 console_idx;
 static enum printk_log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
-static u64 clear_seq;
-static u32 clear_idx;
+static u64 printk_log_clear_seq;
+static u32 printk_log_clear_idx;
 
 #define PREFIX_MAX 32
 #define LOG_LINE_MAX   1024 - PREFIX_MAX
@@ -566,8 +566,8 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
 * like issued by 'dmesg -c'. Reading /dev/kmsg itself
 * changes no global state, and does not clear anything.
 */
-   user->idx = clear_idx;
-   user->seq = clear_seq;
+   user->idx = printk_log_clear_idx;
+   user->seq = printk_log_clear_seq;
break;
case SEEK_END:
/* after the last record */
@@ -1012,18 +1012,18 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
u32 idx;
enum printk_log_flags prev;
 
-   if (clear_seq < printk_log_first_seq) {
+   if (printk_log_clear_seq < printk_log_first_seq) {
/* messages are gone, move to first available one */
-   clear_seq = printk_log_first_seq;
-   clear_idx = printk_log_first_idx;
+   printk_log_clear_seq = printk_log_first_seq;
+   printk_log_clear_idx = printk_log_first_idx;
}
 
/*
 * Find first record that fits, including all following records,
 * into the user-provided buffer for this dump.
 */
-   seq = clear_seq;
-   idx = clear_idx;
+   seq = printk_log_clear_seq;
+   idx = printk_log_clear_idx;
prev = 0;
while (seq < printk_log_next_seq) {
struct printk_log *msg = printk_log_from_idx(idx);
@@ -1035,8 +1035,8 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
}
 
/* move first record forward until length fits into the buffer */
-   seq = clear_seq;
-   idx = clear_idx;
+   seq = printk_log_clear_seq;
+   idx = printk_log_clear_idx;
prev = 0;
while (len > size && seq < printk_log_next_seq) {
struct printk_log *msg = printk_log_from_idx(idx);
@@ -1083,8 +1083,8 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
}
 
if (clear) {
-   clear_seq = printk_log_next_seq;
-   clear_idx = printk_log_next_idx;
+   printk_log_clear_seq = printk_log_next_seq;
+   printk_log_clear_idx = printk_log_next_idx;
}
raw_spin_unlock_irq(&printk_logbuf_lock);
 
@@ -2566,8 +2566,8 @@ void kmsg_dump(enum kmsg_dump_reason reason)
dumper->active = true;
 
raw_spin_lock_irqsave(&printk_logbuf_lock, flags);
-   dumper->cur_seq = clear_seq;
-   dumper->cur_idx = clear_idx;
+   dumper->cur_seq = printk_log_clear_seq;
+   dumper->cur_idx = printk_log_clear_idx;
dumper->next_seq = printk_log_next_seq;
dumper->next_idx = printk_log_next_idx;
raw_spin_unlock_irqrestore(&printk_logbuf_lock, flags);
@@ -2774,8 +2774,8 @@ EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
  */
 void kmsg_dump_rewind_nolock(struct kmsg_dumper *dumper)
 {
-   dumper->cur_seq = clear_seq;
-   dumper->cur_idx = clear_idx;
+   dumper->cur_seq = printk_log_clear_seq;
+   dumper->cur_idx = printk_log_clear_idx;
dumper->next_seq = printk_log_next_seq;
dumper->next_idx = printk_log_next_idx;
 }
-- 
1.7.8.112.g3fd21
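
[Editorial sketch, not part of the patch] The syslog_print_all() hunks
above start from printk_log_clear_seq, total the formatted length of every
record up to printk_log_next_seq, then move the start forward until the
remainder fits into the user buffer. The same two-pass idea on a plain
array of lengths looks like this. demo_first_fitting() and the array
representation are invented for illustration. In the kernel the lengths
come from the text-formatting helper called with a NULL buffer.

```c
#include <stddef.h>

/*
 * Given the formatted length of each record (oldest first), return the
 * index of the first record such that it and all following records fit
 * into size bytes. Returns n if not even the newest record fits.
 */
static size_t demo_first_fitting(const size_t *rec_len, size_t n, size_t size)
{
	size_t len = 0;
	size_t i;

	/* pass 1: total length of everything since the last clear */
	for (i = 0; i < n; i++)
		len += rec_len[i];

	/* pass 2: move the start forward until the remainder fits */
	for (i = 0; len > size && i < n; i++)
		len -= rec_len[i];

	return i;
}
```

With lengths 10, 20, 30 and 40 and a 75-byte buffer, the total of 100 is
reduced to 90 and then 70, so the dump would start at index 2.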



[PATCH V2 11/23] printk: Rename logbuf_lock to printk_logbuf_lock

2012-10-24 Thread Joe Perches
Make this generic name more specific to the printk
subsystem and allow it to become non-static.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |  102 
 1 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c87472b..709472f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -205,10 +205,10 @@ struct printk_log {
 };
 
 /*
- * The logbuf_lock protects kmsg buffer, indices, counters. It is also
+ * The printk_logbuf_lock protects kmsg buffer, indices, counters. It is also
  * used in interesting ways to provide interlocking in console_unlock();
  */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
+static DEFINE_RAW_SPINLOCK(printk_logbuf_lock);
 
 #ifdef CONFIG_PRINTK
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
@@ -248,7 +248,7 @@ static char __printk_log_buf[__PRINTK_LOG_BUF_LEN] 
__aligned(LOG_ALIGN);
 static char *printk_log_buf = __printk_log_buf;
 static u32 printk_log_buf_len = __PRINTK_LOG_BUF_LEN;
 
-/* cpu currently holding logbuf_lock */
+/* cpu currently holding printk_logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
 /* human readable text of the record */
@@ -438,20 +438,20 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
ret = mutex_lock_interruptible(&user->lock);
if (ret)
return ret;
-   raw_spin_lock_irq(&logbuf_lock);
+   raw_spin_lock_irq(&printk_logbuf_lock);
while (user->seq == printk_log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
-   raw_spin_unlock_irq(&logbuf_lock);
+   raw_spin_unlock_irq(&printk_logbuf_lock);
goto out;
}
 
-   raw_spin_unlock_irq(&logbuf_lock);
+   raw_spin_unlock_irq(&printk_logbuf_lock);
ret = wait_event_interruptible(printk_log_wait,
   user->seq != printk_log_next_seq);
if (ret)
goto out;
-   raw_spin_lock_irq(&logbuf_lock);
+   raw_spin_lock_irq(&printk_logbuf_lock);
}
 
if (user->seq < printk_log_first_seq) {
@@ -459,7 +459,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
user->idx = printk_log_first_idx;
user->seq = printk_log_first_seq;
ret = -EPIPE;
-   raw_spin_unlock_irq(&logbuf_lock);
+   raw_spin_unlock_irq(&printk_logbuf_lock);
goto out;
}
 
@@ -526,7 +526,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
user->idx = printk_log_next(user->idx);
user->seq++;
-   raw_spin_unlock_irq(&logbuf_lock);
+   raw_spin_unlock_irq(&printk_logbuf_lock);
 
if (len > count) {
ret = -EINVAL;
@@ -553,7 +553,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
if (offset)
return -ESPIPE;
 
-   raw_spin_lock_irq(&logbuf_lock);
+   raw_spin_lock_irq(&printk_logbuf_lock);
switch (whence) {
case SEEK_SET:
/* the first record */
@@ -577,7 +577,7 @@ static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
default:
ret = -EINVAL;
}
-   raw_spin_unlock_irq(&logbuf_lock);
+   raw_spin_unlock_irq(&printk_logbuf_lock);
return ret;
 }
 
@@ -591,14 +591,14 @@ static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
 
poll_wait(file, &printk_log_wait, wait);
 
-   raw_spin_lock_irq(&logbuf_lock);
+   raw_spin_lock_irq(&printk_logbuf_lock);
if (user->seq < printk_log_next_seq) {
/* return error when data has vanished underneath us */
if (user->seq < printk_log_first_seq)
ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
ret = POLLIN|POLLRDNORM;
}
-   raw_spin_unlock_irq(&logbuf_lock);
+   raw_spin_unlock_irq(&printk_logbuf_lock);
 
return ret;
 }
@@ -622,10 +622,10 @@ static int devkmsg_open(struct inode *inode, struct file *file)
 
mutex_init(&user->lock);
 
-   raw_spin_lock_irq(&logbuf_lock);
+   raw_spin_lock_irq(&printk_logbuf_lock);
user->idx = printk_log_first_idx;
user->seq = printk_log_first_seq;
-   raw_spin_unlock_irq(&logbuf_lock);
+   raw_spin_unlock_irq(&printk_logbuf_lock);
 
file->private_data = user;
return 0;
@@ -722,13 +722,13 @@ void __init setup_log_buf(int early)
return;
}
 
-   raw_spin_lock_irqsave(&logbuf_lock, flags);
+   raw_spin_lock_irqsave(&printk_logbuf_lock, flags);
printk_log_buf_len = new_printk_log_buf_len;
printk_log_buf = new_printk_log_buf;
new_printk_log_buf_len = 0;
free = __PRINTK_LOG_BUF_LEN - printk_log_next_idx;
memcpy(printk_log_buf, __printk_log_buf, __PRINTK_LOG_BUF_LEN);
-   

[PATCH V2 09/23] printk: Rename enum log_flags to printk_log_flags

2012-10-24 Thread Joe Perches
Make this generic enum more specific to the printk subsystem.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   32 +---
 1 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 992c064..341f2d9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -187,7 +187,7 @@ static int console_may_schedule;
  * separated by ',', and find the message after the ';' character.
  */
 
-enum log_flags {
+enum printk_log_flags {
LOG_NOCONS  = 1,/* already flushed, do not print to console */
LOG_NEWLINE = 2,/* text ended with a newline */
LOG_PREFIX  = 4,/* text started with a prefix */
@@ -214,7 +214,7 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
 static u32 syslog_idx;
-static enum log_flags syslog_prev;
+static enum printk_log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
@@ -228,7 +228,7 @@ static u32 printk_log_next_idx;
 /* the next printk record to write to the console */
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags console_prev;
+static enum printk_log_flags console_prev;
 
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
@@ -297,7 +297,7 @@ static u32 printk_log_next(u32 idx)
 
 /* insert record into the buffer, discard old ones, update heads */
 static void printk_log_store(int facility, int level,
-enum log_flags flags, u64 ts_nsec,
+enum printk_log_flags flags, u64 ts_nsec,
 const char *dict, u16 dict_len,
 const char *text, u16 text_len)
 {
@@ -360,7 +360,7 @@ static void printk_log_store(int facility, int level,
 struct devkmsg_user {
u64 seq;
u32 idx;
-   enum log_flags prev;
+   enum printk_log_flags prev;
struct mutex lock;
char buf[8192];
 };
@@ -872,7 +872,8 @@ static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
return len;
 }
 
-static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
+static size_t msg_print_text(const struct printk_log *msg,
+enum printk_log_flags prev,
 bool syslog, char *buf, size_t size)
 {
const char *text = printk_log_text(msg);
@@ -1009,7 +1010,7 @@ static int syslog_print_all(char __user *buf, int size, bool clear)
u64 next_seq;
u64 seq;
u32 idx;
-   enum log_flags prev;
+   enum printk_log_flags prev;
 
if (clear_seq < printk_log_first_seq) {
/* messages are gone, move to first available one */
@@ -1194,7 +1195,7 @@ int do_syslog(int type, char __user *buf, int len, bool from_file)
} else {
u64 seq = syslog_seq;
u32 idx = syslog_idx;
-   enum log_flags prev = syslog_prev;
+   enum printk_log_flags prev = syslog_prev;
 
error = 0;
while (seq < printk_log_next_seq) {
@@ -1383,11 +1384,11 @@ static struct cont {
u64 ts_nsec;/* time of first print */
u8 level;   /* log level of first message */
u8 facility;/* log level of first message */
-   enum log_flags flags;   /* prefix, newline flags */
+   enum printk_log_flags flags;/* prefix, newline flags */
bool flushed:1; /* buffer sealed and committed */
 } cont;
 
-static void cont_flush(enum log_flags flags)
+static void cont_flush(enum printk_log_flags flags)
 {
if (cont.flushed)
return;
@@ -1481,7 +1482,7 @@ asmlinkage int vprintk_emit(int facility, int level,
static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
size_t text_len;
-   enum log_flags lflags = 0;
+   enum printk_log_flags lflags = 0;
unsigned long flags;
int this_cpu;
int printed_len = 0;
@@ -1688,11 +1689,11 @@ static u64 syslog_seq;
 static u32 syslog_idx;
 static u64 console_seq;
 static u32 console_idx;
-static enum log_flags syslog_prev;
+static enum printk_log_flags syslog_prev;
 static u64 printk_log_first_seq;
 static u32 printk_log_first_idx;
 static u64 printk_log_next_seq;
-static enum log_flags console_prev;
+static enum printk_log_flags console_prev;
 static struct cont {
size_t len;
size_t cons;
@@ -1702,7 +1703,8 @@ static struct cont {
 static struct printk_log *printk_log_from_idx(u32 idx) { return NULL; }
 static u32 printk_log_next(u32 idx) { return 0; }
 static void call_console_drivers(int 

[PATCH V2 08/23] printk: Rename log_ variables and functions

2012-10-24 Thread Joe Perches
Make these generic names more specific to the printk
subsystem and allow these variables and functions to
become non-static.

Rename log_text to printk_log_text.
Rename log_dict to printk_log_dict.
Rename log_from_idx to printk_log_from_idx.
Rename log_next to printk_log_next.
Rename log_store to printk_log_store.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |  100 
 1 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 602a1ab..992c064 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -252,19 +252,19 @@ static u32 printk_log_buf_len = __PRINTK_LOG_BUF_LEN;
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
 /* human readable text of the record */
-static char *log_text(const struct printk_log *msg)
+static char *printk_log_text(const struct printk_log *msg)
 {
return (char *)msg + sizeof(struct printk_log);
 }
 
 /* optional key/value pair dictionary attached to the record */
-static char *log_dict(const struct printk_log *msg)
+static char *printk_log_dict(const struct printk_log *msg)
 {
return (char *)msg + sizeof(struct printk_log) + msg->text_len;
 }
 
 /* get record by index; idx must point to valid msg */
-static struct printk_log *log_from_idx(u32 idx)
+static struct printk_log *printk_log_from_idx(u32 idx)
 {
struct printk_log *msg = (struct printk_log *)(printk_log_buf + idx);
 
@@ -278,7 +278,7 @@ static struct printk_log *log_from_idx(u32 idx)
 }
 
 /* get next record; idx must point to valid msg */
-static u32 log_next(u32 idx)
+static u32 printk_log_next(u32 idx)
 {
struct printk_log *msg = (struct printk_log *)(printk_log_buf + idx);
 
@@ -296,10 +296,10 @@ static u32 log_next(u32 idx)
 }
 
 /* insert record into the buffer, discard old ones, update heads */
-static void log_store(int facility, int level,
- enum log_flags flags, u64 ts_nsec,
- const char *dict, u16 dict_len,
- const char *text, u16 text_len)
+static void printk_log_store(int facility, int level,
+enum log_flags flags, u64 ts_nsec,
+const char *dict, u16 dict_len,
+const char *text, u16 text_len)
 {
struct printk_log *msg;
u32 size, pad_len;
@@ -321,7 +321,7 @@ static void log_store(int facility, int level,
break;
 
/* drop old messages until we have enough contiuous space */
-   printk_log_first_idx = log_next(printk_log_first_idx);
+   printk_log_first_idx = printk_log_next(printk_log_first_idx);
printk_log_first_seq++;
}
 
@@ -337,9 +337,9 @@ static void log_store(int facility, int level,
 
/* fill message */
msg = (struct printk_log *)(printk_log_buf + printk_log_next_idx);
-   memcpy(log_text(msg), text, text_len);
+   memcpy(printk_log_text(msg), text, text_len);
msg->text_len = text_len;
-   memcpy(log_dict(msg), dict, dict_len);
+   memcpy(printk_log_dict(msg), dict, dict_len);
msg->dict_len = dict_len;
msg->facility = facility;
msg->level = level & 7;
@@ -348,7 +348,7 @@ static void log_store(int facility, int level,
msg->ts_nsec = ts_nsec;
else
msg->ts_nsec = local_clock();
-   memset(log_dict(msg) + dict_len, 0, pad_len);
+   memset(printk_log_dict(msg) + dict_len, 0, pad_len);
msg->len = sizeof(struct printk_log) + text_len + dict_len + pad_len;
 
/* insert message */
@@ -463,7 +463,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
goto out;
}
 
-   msg = log_from_idx(user->idx);
+   msg = printk_log_from_idx(user->idx);
ts_usec = msg->ts_nsec;
do_div(ts_usec, 1000);
 
@@ -488,7 +488,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
/* escape non-printable characters */
for (i = 0; i < msg->text_len; i++) {
-   unsigned char c = log_text(msg)[i];
+   unsigned char c = printk_log_text(msg)[i];
 
if (c < ' ' || c >= 127 || c == '\\')
len += sprintf(user->buf + len, "\\x%02x", c);
@@ -501,7 +501,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
bool line = true;
 
for (i = 0; i < msg->dict_len; i++) {
-   unsigned char c = log_dict(msg)[i];
+   unsigned char c = printk_log_dict(msg)[i];
 
if (line) {
user->buf[len++] = ' ';
@@ -524,7 +524,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
user->buf[len++] = '\n';
}
 
-   user->idx = log_next(user->idx);
+   user->idx = printk_log_next(user->idx);

[PATCH V2 07/23] printk: Rename log_first and log_next variables

2012-10-24 Thread Joe Perches
Make these generic names more specific to the printk
subsystem and allow these variables to become non-static.

Rename log_first_idx to printk_log_first_idx.
Rename log_first_seq to printk_log_first_seq.
Rename log_next_idx to printk_log_next_idx.
Rename log_next_seq to printk_log_next_seq.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |  150 
 1 files changed, 75 insertions(+), 75 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c45afb1..602a1ab 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -218,12 +218,12 @@ static enum log_flags syslog_prev;
 static size_t syslog_partial;
 
 /* index and sequence number of the first record stored in the buffer */
-static u64 log_first_seq;
-static u32 log_first_idx;
+static u64 printk_log_first_seq;
+static u32 printk_log_first_idx;
 
 /* index and sequence number of the next record to store in the buffer */
-static u64 log_next_seq;
-static u32 log_next_idx;
+static u64 printk_log_next_seq;
+static u32 printk_log_next_idx;
 
 /* the next printk record to write to the console */
 static u64 console_seq;
@@ -309,34 +309,34 @@ static void log_store(int facility, int level,
pad_len = (-size) & (LOG_ALIGN - 1);
size += pad_len;
 
-   while (log_first_seq < log_next_seq) {
+   while (printk_log_first_seq < printk_log_next_seq) {
u32 free;
 
-   if (log_next_idx > log_first_idx)
-   free = max(printk_log_buf_len - log_next_idx, log_first_idx);
+   if (printk_log_next_idx > printk_log_first_idx)
+   free = max(printk_log_buf_len - printk_log_next_idx, printk_log_first_idx);
else
-   free = log_first_idx - log_next_idx;
+   free = printk_log_first_idx - printk_log_next_idx;
 
if (free > size + sizeof(struct printk_log))
break;
 
/* drop old messages until we have enough contiuous space */
-   log_first_idx = log_next(log_first_idx);
-   log_first_seq++;
+   printk_log_first_idx = log_next(printk_log_first_idx);
+   printk_log_first_seq++;
}
 
-   if (log_next_idx + size + sizeof(struct printk_log) >= printk_log_buf_len) {
+   if (printk_log_next_idx + size + sizeof(struct printk_log) >= printk_log_buf_len) {
/*
 * This message + an additional empty header does not fit
 * at the end of the buffer. Add an empty header with len == 0
 * to signify a wrap around.
 */
-   memset(printk_log_buf + log_next_idx, 0, sizeof(struct printk_log));
-   log_next_idx = 0;
+   memset(printk_log_buf + printk_log_next_idx, 0, sizeof(struct printk_log));
+   printk_log_next_idx = 0;
+   printk_log_next_idx = 0;
}
 
/* fill message */
-   msg = (struct printk_log *)(printk_log_buf + log_next_idx);
+   msg = (struct printk_log *)(printk_log_buf + printk_log_next_idx);
memcpy(log_text(msg), text, text_len);
msg->text_len = text_len;
memcpy(log_dict(msg), dict, dict_len);
@@ -352,8 +352,8 @@ static void log_store(int facility, int level,
msg->len = sizeof(struct printk_log) + text_len + dict_len + pad_len;
 
/* insert message */
-   log_next_idx += msg->len;
-   log_next_seq++;
+   printk_log_next_idx += msg->len;
+   printk_log_next_seq++;
 }
 
 /* /dev/kmsg - userspace message inject/listen interface */
@@ -439,7 +439,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
if (ret)
return ret;
raw_spin_lock_irq(&logbuf_lock);
-   while (user->seq == log_next_seq) {
+   while (user->seq == printk_log_next_seq) {
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
raw_spin_unlock_irq(&logbuf_lock);
@@ -448,16 +448,16 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
 
raw_spin_unlock_irq(&logbuf_lock);
ret = wait_event_interruptible(log_wait,
-  user->seq != log_next_seq);
+  user->seq != printk_log_next_seq);
if (ret)
goto out;
raw_spin_lock_irq(&logbuf_lock);
}
 
-   if (user->seq < log_first_seq) {
+   if (user->seq < printk_log_first_seq) {
/* our last seen message is gone, return error and reset */
-   user->idx = log_first_idx;
-   user->seq = log_first_seq;
+   user->idx = printk_log_first_idx;
+   user->seq = printk_log_first_seq;
ret = -EPIPE;
raw_spin_unlock_irq(&logbuf_lock);
goto out;
@@ -557,8 +557,8 @@ static loff_t 

[PATCH V2 06/23] printk: Rename log_buf and __LOG_BUF_LEN

2012-10-24 Thread Joe Perches
Make these generic names more specific to the printk
subsystem and allow these variables to become non-static.

Rename log_buf to printk_log_buf.
Rename __LOG_BUF_LEN define to __PRINTK_LOG_BUF_LEN.

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   76 
 1 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index da2db46..c45afb1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -243,10 +243,10 @@ static u32 clear_idx;
 #else
 #define LOG_ALIGN __alignof__(struct printk_log)
 #endif
-#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
-static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
-static char *log_buf = __log_buf;
-static u32 log_buf_len = __LOG_BUF_LEN;
+#define __PRINTK_LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+static char __printk_log_buf[__PRINTK_LOG_BUF_LEN] __aligned(LOG_ALIGN);
+static char *printk_log_buf = __printk_log_buf;
+static u32 printk_log_buf_len = __PRINTK_LOG_BUF_LEN;
 
 /* cpu currently holding logbuf_lock */
 static volatile unsigned int logbuf_cpu = UINT_MAX;
@@ -266,21 +266,21 @@ static char *log_dict(const struct printk_log *msg)
 /* get record by index; idx must point to valid msg */
 static struct printk_log *log_from_idx(u32 idx)
 {
-   struct printk_log *msg = (struct printk_log *)(log_buf + idx);
+   struct printk_log *msg = (struct printk_log *)(printk_log_buf + idx);
 
/*
 * A length == 0 record is the end of buffer marker. Wrap around and
 * read the message at the start of the buffer.
 */
if (!msg->len)
-   return (struct printk_log *)log_buf;
+   return (struct printk_log *)printk_log_buf;
return msg;
 }
 
 /* get next record; idx must point to valid msg */
 static u32 log_next(u32 idx)
 {
-   struct printk_log *msg = (struct printk_log *)(log_buf + idx);
+   struct printk_log *msg = (struct printk_log *)(printk_log_buf + idx);
 
/* length == 0 indicates the end of the buffer; wrap */
/*
@@ -289,7 +289,7 @@ static u32 log_next(u32 idx)
 * return the one after that.
 */
if (!msg->len) {
-   msg = (struct printk_log *)log_buf;
+   msg = (struct printk_log *)printk_log_buf;
return msg->len;
}
return idx + msg->len;
@@ -313,7 +313,7 @@ static void log_store(int facility, int level,
u32 free;
 
if (log_next_idx > log_first_idx)
-   free = max(log_buf_len - log_next_idx, log_first_idx);
+   free = max(printk_log_buf_len - log_next_idx, log_first_idx);
else
free = log_first_idx - log_next_idx;
 
@@ -325,18 +325,18 @@ static void log_store(int facility, int level,
log_first_seq++;
}
 
-   if (log_next_idx + size + sizeof(struct printk_log) >= log_buf_len) {
+   if (log_next_idx + size + sizeof(struct printk_log) >= printk_log_buf_len) {
/*
 * This message + an additional empty header does not fit
 * at the end of the buffer. Add an empty header with len == 0
 * to signify a wrap around.
 */
-   memset(log_buf + log_next_idx, 0, sizeof(struct printk_log));
+   memset(printk_log_buf + log_next_idx, 0, sizeof(struct printk_log));
log_next_idx = 0;
}
 
/* fill message */
-   msg = (struct printk_log *)(log_buf + log_next_idx);
+   msg = (struct printk_log *)(printk_log_buf + log_next_idx);
memcpy(log_text(msg), text, text_len);
msg->text_len = text_len;
memcpy(log_dict(msg), dict, dict_len);
@@ -663,8 +663,8 @@ const struct file_operations kmsg_fops = {
  */
 void log_buf_kexec_setup(void)
 {
-   VMCOREINFO_SYMBOL(log_buf);
-   VMCOREINFO_SYMBOL(log_buf_len);
+   VMCOREINFO_SYMBOL(printk_log_buf);
+   VMCOREINFO_SYMBOL(printk_log_buf_len);
VMCOREINFO_SYMBOL(log_first_idx);
VMCOREINFO_SYMBOL(log_next_idx);
/*
@@ -679,60 +679,60 @@ void log_buf_kexec_setup(void)
 }
 #endif
 
-/* requested log_buf_len from kernel cmdline */
-static unsigned long __initdata new_log_buf_len;
+/* requested printk_log_buf_len from kernel cmdline */
+static unsigned long __initdata new_printk_log_buf_len;
 
-/* save requested log_buf_len since it's too early to process it */
-static int __init log_buf_len_setup(char *str)
+/* save requested printk_log_buf_len since it's too early to process it */
+static int __init printk_log_buf_len_setup(char *str)
 {
unsigned size = memparse(str, &str);
 
if (size)
size = roundup_pow_of_two(size);
-   if (size > log_buf_len)
-   new_log_buf_len = size;
+   if (size > printk_log_buf_len)
+   new_printk_log_buf_len = size;
 
return 0;
 

[PATCH V2 05/23] printk: rename struct log to struct printk_log

2012-10-24 Thread Joe Perches
Rename the struct to enable moving portions of
printk.c to separate files.

The rename changes output of /proc/vmcoreinfo.
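
To illustrate why a pure rename can be visible outside the kernel, here is a
minimal userspace sketch. It assumes the vmcoreinfo export stringifies the
struct name into each line it emits. STRUCT_SIZE_LINE and the trimmed-down
struct are hypothetical stand-ins, not the kernel's macros or the real record
layout.

```c
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for the renamed record header. */
struct printk_log {
	unsigned long long ts_nsec;
	unsigned short len;
};

/*
 * Hypothetical helper: formats "SIZE(<struct name>)=<bytes>" into buf.
 * The struct name is stringified by the preprocessor, so renaming the
 * struct changes the emitted text even though the layout is unchanged.
 */
#define STRUCT_SIZE_LINE(buf, buflen, name) \
	snprintf((buf), (buflen), "SIZE(%s)=%lu\n", #name, \
		 (unsigned long)sizeof(struct name))
```

Under that assumption, any tool that looks these lines up by name would need
to search for the new name after this patch.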

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   80 
 1 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 099c439..da2db46 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -172,7 +172,7 @@ static int console_may_schedule;
  * 67   "g"
  *   0032 00 00 00  padding to next message header
  *
- * The 'struct log' buffer header must never be directly exported to
+ * The 'struct printk_log' buffer header must never be directly exported to
  * userspace, it is a kernel-private implementation detail that might
  * need to be changed in the future, when the requirements change.
  *
@@ -194,7 +194,7 @@ enum log_flags {
LOG_CONT= 8,/* text is a fragment of a continuation line */
 };
 
-struct log {
+struct printk_log {
u64 ts_nsec;/* timestamp in nanoseconds */
u16 len;/* length of entire record */
u16 text_len;   /* length of text buffer */
@@ -241,7 +241,7 @@ static u32 clear_idx;
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
 #define LOG_ALIGN 4
 #else
-#define LOG_ALIGN __alignof__(struct log)
+#define LOG_ALIGN __alignof__(struct printk_log)
 #endif
 #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
 static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
@@ -252,35 +252,35 @@ static u32 log_buf_len = __LOG_BUF_LEN;
 static volatile unsigned int logbuf_cpu = UINT_MAX;
 
 /* human readable text of the record */
-static char *log_text(const struct log *msg)
+static char *log_text(const struct printk_log *msg)
 {
-   return (char *)msg + sizeof(struct log);
+   return (char *)msg + sizeof(struct printk_log);
 }
 
 /* optional key/value pair dictionary attached to the record */
-static char *log_dict(const struct log *msg)
+static char *log_dict(const struct printk_log *msg)
 {
-   return (char *)msg + sizeof(struct log) + msg->text_len;
+   return (char *)msg + sizeof(struct printk_log) + msg->text_len;
 }
 
 /* get record by index; idx must point to valid msg */
-static struct log *log_from_idx(u32 idx)
+static struct printk_log *log_from_idx(u32 idx)
 {
-   struct log *msg = (struct log *)(log_buf + idx);
+   struct printk_log *msg = (struct printk_log *)(log_buf + idx);
 
/*
 * A length == 0 record is the end of buffer marker. Wrap around and
 * read the message at the start of the buffer.
 */
if (!msg->len)
-   return (struct log *)log_buf;
+   return (struct printk_log *)log_buf;
return msg;
 }
 
 /* get next record; idx must point to valid msg */
 static u32 log_next(u32 idx)
 {
-   struct log *msg = (struct log *)(log_buf + idx);
+   struct printk_log *msg = (struct printk_log *)(log_buf + idx);
 
/* length == 0 indicates the end of the buffer; wrap */
/*
@@ -289,7 +289,7 @@ static u32 log_next(u32 idx)
 * return the one after that.
 */
if (!msg->len) {
-   msg = (struct log *)log_buf;
+   msg = (struct printk_log *)log_buf;
return msg->len;
}
return idx + msg->len;
@@ -301,11 +301,11 @@ static void log_store(int facility, int level,
  const char *dict, u16 dict_len,
  const char *text, u16 text_len)
 {
-   struct log *msg;
+   struct printk_log *msg;
u32 size, pad_len;
 
/* number of '\0' padding bytes to next message */
-   size = sizeof(struct log) + text_len + dict_len;
+   size = sizeof(struct printk_log) + text_len + dict_len;
pad_len = (-size) & (LOG_ALIGN - 1);
size += pad_len;
 
@@ -317,7 +317,7 @@ static void log_store(int facility, int level,
else
free = log_first_idx - log_next_idx;
 
-   if (free > size + sizeof(struct log))
+   if (free > size + sizeof(struct printk_log))
break;
 
/* drop old messages until we have enough contiuous space */
@@ -325,18 +325,18 @@ static void log_store(int facility, int level,
log_first_seq++;
}
 
-   if (log_next_idx + size + sizeof(struct log) >= log_buf_len) {
+   if (log_next_idx + size + sizeof(struct printk_log) >= log_buf_len) {
/*
 * This message + an additional empty header does not fit
 * at the end of the buffer. Add an empty header with len == 0
 * to signify a wrap around.
 */
-   memset(log_buf + log_next_idx, 0, sizeof(struct log));
+   memset(log_buf + log_next_idx, 0, sizeof(struct printk_log));
log_next_idx 

[PATCH V2 04/23] printk: Use pointer for console_cmdline indexing

2012-10-24 Thread Joe Perches
Make the code a bit more compact by always using a pointer
for the active console_cmdline.

Move overly indented code to correct indent level.
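
The loop shape this patch introduces can be sketched in isolation as follows.
This is a simplified illustration, not the kernel code: struct slot, MAX_SLOTS
and find_slot() are hypothetical stand-ins for struct console_cmdline,
MAX_CMDLINECONSOLES and the open-coded loops in printk.c.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SLOTS 8	/* stand-in for MAX_CMDLINECONSOLES */

/* Hypothetical, trimmed-down stand-in for struct console_cmdline. */
struct slot {
	char name[8];
	int index;
};

/*
 * Walk the table with an index and a pointer in lockstep. Returns the
 * position of the entry matching (name, idx), else the position of the
 * first free slot, else MAX_SLOTS when the table is full. *out points at
 * that entry, or is NULL when the table is full.
 */
static int find_slot(struct slot *table, const char *name, int idx,
		     struct slot **out)
{
	struct slot *c;
	int i;

	/* i is tested first, so c is never dereferenced past the end */
	for (i = 0, c = table; i < MAX_SLOTS && c->name[0]; i++, c++) {
		if (strcmp(c->name, name) == 0 && c->index == idx)
			break;
	}
	*out = (i == MAX_SLOTS) ? NULL : c;
	return i;
}
```

Because c already points at the first free slot when the loop ends without a
match, a separate assignment from the index after the loop is not needed.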

Signed-off-by: Joe Perches 
---
 kernel/printk/printk.c |   49 +--
 1 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index df5b80f..099c439 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1718,18 +1718,19 @@ static int __add_preferred_console(char *name, int idx, char *options,
 *  See if this tty is not yet registered, and
 *  if we have a slot free.
 */
-   for (i = 0; i < MAX_CMDLINECONSOLES && console_cmdline[i].name[0]; i++)
-   if (strcmp(console_cmdline[i].name, name) == 0 &&
- console_cmdline[i].index == idx) {
-   if (!brl_options)
-   selected_console = i;
-   return 0;
+   for (i = 0, c = console_cmdline;
+i < MAX_CMDLINECONSOLES && c->name[0];
+i++, c++) {
+   if (strcmp(c->name, name) == 0 && c->index == idx) {
+   if (!brl_options)
+   selected_console = i;
+   return 0;
}
+   }
if (i == MAX_CMDLINECONSOLES)
return -E2BIG;
if (!brl_options)
selected_console = i;
-   c = &console_cmdline[i];
strlcpy(c->name, name, sizeof(c->name));
c->options = options;
braille_set_options(c, brl_options);
@@ -1802,15 +1803,15 @@ int update_console_cmdline(char *name, int idx, char *name_new, int idx_new, cha
struct console_cmdline *c;
int i;
 
-   for (i = 0; i < MAX_CMDLINECONSOLES && console_cmdline[i].name[0]; i++)
-   if (strcmp(console_cmdline[i].name, name) == 0 &&
- console_cmdline[i].index == idx) {
-   c = &console_cmdline[i];
-   strlcpy(c->name, name_new, sizeof(c->name));
-   c->name[sizeof(c->name) - 1] = 0;
-   c->options = options;
-   c->index = idx_new;
-   return i;
+   for (i = 0, c = console_cmdline;
+i < MAX_CMDLINECONSOLES && c->name[0];
+i++, c++)
+   if (strcmp(c->name, name) == 0 && c->index == idx) {
+   strlcpy(c->name, name_new, sizeof(c->name));
+   c->name[sizeof(c->name) - 1] = 0;
+   c->options = options;
+   c->index = idx_new;
+   return i;
}
/* not found */
return -1;
@@ -2218,6 +2219,7 @@ void register_console(struct console *newcon)
int i;
unsigned long flags;
struct console *bcon = NULL;
+   struct console_cmdline *c;
 
/*
 * before we register a new CON_BOOT console, make sure we don't
@@ -2265,24 +2267,25 @@ void register_console(struct console *newcon)
 *  See if this console matches one we selected on
 *  the command line.
 */
-   for (i = 0; i < MAX_CMDLINECONSOLES && console_cmdline[i].name[0];
-   i++) {
-   if (strcmp(console_cmdline[i].name, newcon->name) != 0)
+   for (i = 0, c = console_cmdline;
+i < MAX_CMDLINECONSOLES && c->name[0];
+i++, c++) {
+   if (strcmp(c->name, newcon->name) != 0)
continue;
if (newcon->index >= 0 &&
-   newcon->index != console_cmdline[i].index)
+   newcon->index != c->index)
continue;
if (newcon->index < 0)
-   newcon->index = console_cmdline[i].index;
+   newcon->index = c->index;
 
-   if (_braille_register_console(newcon, &console_cmdline[i]))
+   if (_braille_register_console(newcon, c))
return;
 
if (newcon->setup &&
newcon->setup(newcon, console_cmdline[i].options) != 0)
break;
newcon->flags |= CON_ENABLED;
-   newcon->index = console_cmdline[i].index;
+   newcon->index = c->index;
if (i == selected_console) {
newcon->flags |= CON_CONSDEV;
preferred_console = selected_console;
-- 
1.7.8.112.g3fd21


[PATCH V2 03/23] printk: Move braille console support into separate braille.[ch] files

2012-10-24 Thread Joe Perches
Create files with prototypes and static inlines for braille
support.  Make braille_console functions return 1 on success.

Corrected CONFIG_A11Y_BRAILLE_CONSOLE=n _braille_console_setup
return value to NULL.
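
The "return 1 on success" convention lets the caller treat a nonzero result as
"the braille path took this console, stop here". A minimal sketch of that
calling pattern, with hypothetical stand-ins (the struct, the flag value and
both functions are illustrative and are not taken from the patch):

```c
#include <stddef.h>

#define CON_BRL 0x20	/* illustrative flag value */

/* Hypothetical, trimmed-down stand-in for struct console. */
struct console {
	unsigned int flags;
};

enum reg_path { REG_BRAILLE, REG_NORMAL };

/* Stand-in helper: 1 if it claimed the console, 0 if it is not braille. */
static int braille_register(struct console *con)
{
	if (!(con->flags & CON_BRL))
		return 0;
	return 1;
}

/* Caller: a nonzero return means "handled", so normal setup is skipped. */
static enum reg_path register_console_sketch(struct console *con)
{
	if (braille_register(con))
		return REG_BRAILLE;
	return REG_NORMAL;
}
```

With 0 reserved for "not a braille console", the caller needs only a single
truth test instead of checking the flag and the result separately.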

link: http://lkml.kernel.org/r/1350999678-17441-1-git-send-email-ming@canonical.com
cc: Samuel Thibault 
cc: Ming Lei 

Signed-off-by: Joe Perches 
---
 drivers/accessibility/braille/braille_console.c |9 +++-
 kernel/printk/Makefile  |1 +
 kernel/printk/braille.c |   48 +++
 kernel/printk/braille.h |   48 +++
 kernel/printk/printk.c  |   44 ++---
 5 files changed, 117 insertions(+), 33 deletions(-)
 create mode 100644 kernel/printk/braille.c
 create mode 100644 kernel/printk/braille.h

diff --git a/drivers/accessibility/braille/braille_console.c b/drivers/accessibility/braille/braille_console.c
index d21167b..dc34a5b 100644
--- a/drivers/accessibility/braille/braille_console.c
+++ b/drivers/accessibility/braille/braille_console.c
@@ -359,6 +359,9 @@ int braille_register_console(struct console *console, int index,
char *console_options, char *braille_options)
 {
int ret;
+
+   if (!(console->flags & CON_BRL))
+   return 0;
if (!console_options)
/* Only support VisioBraille for now */
console_options = "57600o8";
@@ -374,15 +377,17 @@ int braille_register_console(struct console *console, int index,
braille_co = console;
register_keyboard_notifier(&keyboard_notifier_block);
register_vt_notifier(&vt_notifier_block);
-   return 0;
+   return 1;
 }
 
 int braille_unregister_console(struct console *console)
 {
if (braille_co != console)
return -EINVAL;
+   if (!(console->flags & CON_BRL))
+   return 0;
unregister_keyboard_notifier(&keyboard_notifier_block);
unregister_vt_notifier(&vt_notifier_block);
braille_co = NULL;
-   return 0;
+   return 1;
 }
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
index 36d306d..85405bd 100644
--- a/kernel/printk/Makefile
+++ b/kernel/printk/Makefile
@@ -1 +1,2 @@
 obj-y  = printk.o
+obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o
diff --git a/kernel/printk/braille.c b/kernel/printk/braille.c
new file mode 100644
index 000..b51087f
--- /dev/null
+++ b/kernel/printk/braille.c
@@ -0,0 +1,48 @@
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+
+#include "console_cmdline.h"
+#include "braille.h"
+
+char *_braille_console_setup(char **str, char **brl_options)
+{
+   if (!memcmp(*str, "brl,", 4)) {
+   *brl_options = "";
+   *str += 4;
+   } else if (!memcmp(*str, "brl=", 4)) {
+   *brl_options = *str + 4;
+   *str = strchr(*brl_options, ',');
+   if (!*str)
+   pr_err("need port name after brl=\n");
+   else
+   *((*str)++) = 0;
+   }
+
+   return *str;
+}
+
+int
+_braille_register_console(struct console *console, struct console_cmdline *c)
+{
+   int rtn = 0;
+
+   if (c->brl_options) {
+   console->flags |= CON_BRL;
+   rtn = braille_register_console(console, c->index, c->options,
+  c->brl_options);
+   }
+
+   return rtn;
+}
+
+int
+_braille_unregister_console(struct console *console)
+{
+   if (console->flags & CON_BRL)
+   return braille_unregister_console(console);
+
+   return 0;
+}
diff --git a/kernel/printk/braille.h b/kernel/printk/braille.h
new file mode 100644
index 000..769d771
--- /dev/null
+++ b/kernel/printk/braille.h
@@ -0,0 +1,48 @@
+#ifndef _PRINTK_BRAILLE_H
+#define _PRINTK_BRAILLE_H
+
+#ifdef CONFIG_A11Y_BRAILLE_CONSOLE
+
+static inline void
+braille_set_options(struct console_cmdline *c, char *brl_options)
+{
+   c->brl_options = brl_options;
+}
+
+char *
+_braille_console_setup(char **str, char **brl_options);
+
+int
+_braille_register_console(struct console *console, struct console_cmdline *c);
+
+int
+_braille_unregister_console(struct console *console);
+
+#else
+
+static inline void
+braille_set_options(struct console_cmdline *c, char *brl_options)
+{
+}
+
+static inline char *
+_braille_console_setup(char **str, char **brl_options)
+{
+   return NULL;
+}
+
+static inline int
+_braille_register_console(struct console *console, struct console_cmdline *c)
+{
+   return 0;
+}
+
+static inline int
+_braille_unregister_console(struct console *console)
+{
+   return 0;
+}
+
+#endif
+
+#endif
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 50ef6af..df5b80f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -47,6 +47,7 @@
 #include 
 
 #include "console_cmdline.h"
+#include "braille.h"
 
 /*
  * 
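
For readers following the "brl" option handling in _braille_console_setup() above, here is a minimal userspace sketch of the same parsing. It is not the kernel code: parse_brl() is a hypothetical name, strncmp() stands in for memcmp() so that short inputs are not over-read, both comparisons are made against *str, and *brl_options is reset to NULL when no "brl" prefix is present (the patch leaves it untouched in that case).

```c
#include <stdio.h>
#include <string.h>

/*
 * Userspace sketch of the "brl" prefix handling (hypothetical helper).
 * On return, *str points at the console name that follows the prefix
 * (or is NULL when "brl=" has no port name after it), and *brl_options
 * points at the braille options: "" for "brl,", the text between '='
 * and the first ',' for "brl=", NULL when there is no brl prefix.
 */
static char *parse_brl(char **str, char **brl_options)
{
	*brl_options = NULL;

	if (!strncmp(*str, "brl,", 4)) {
		*brl_options = "";
		*str += 4;
	} else if (!strncmp(*str, "brl=", 4)) {
		*brl_options = *str + 4;
		*str = strchr(*brl_options, ',');
		if (!*str)
			fprintf(stderr, "need port name after brl=\n");
		else
			*((*str)++) = 0;	/* terminate options, skip ',' */
	}

	return *str;
}
```

Called on a writable buffer holding "brl=opt1,ttyS0,115200", the sketch leaves the options pointer at "opt1" and returns "ttyS0,115200"; on "brl,ttyS0" it returns "ttyS0" with empty options.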

[PATCH V2 02/23] printk: Add console_cmdline.h

2012-10-24 Thread Joe Perches
Add an include file for the console_cmdline struct
so that the braille console driver can be separated.

Signed-off-by: Joe Perches 
---
 kernel/printk/console_cmdline.h |   14 ++
 kernel/printk/printk.c  |   13 -
 2 files changed, 18 insertions(+), 9 deletions(-)
 create mode 100644 kernel/printk/console_cmdline.h

diff --git a/kernel/printk/console_cmdline.h b/kernel/printk/console_cmdline.h
new file mode 100644
index 000..cbd69d8
--- /dev/null
+++ b/kernel/printk/console_cmdline.h
@@ -0,0 +1,14 @@
+#ifndef _CONSOLE_CMDLINE_H
+#define _CONSOLE_CMDLINE_H
+
+struct console_cmdline
+{
+   char    name[8];        /* Name of the driver       */
+   int     index;          /* Minor dev. to use        */
+   char    *options;       /* Options for the driver   */
+#ifdef CONFIG_A11Y_BRAILLE_CONSOLE
+   char    *brl_options;   /* Options for braille driver */
+#endif
+};
+
+#endif
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1950ecf..50ef6af 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,8 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#include "console_cmdline.h"
+
 /*
  * Architectures can override it:
  */
@@ -100,22 +102,15 @@ static int console_locked, console_suspended;
  */
 static struct console *exclusive_console;
 
+
 /*
  * Array of consoles built from command line options (console=)
  */
-struct console_cmdline
-{
-   char    name[8];        /* Name of the driver       */
-   int     index;          /* Minor dev. to use        */
-   char    *options;       /* Options for the driver   */
-#ifdef CONFIG_A11Y_BRAILLE_CONSOLE
-   char    *brl_options;   /* Options for braille driver */
-#endif
-};
 
 #define MAX_CMDLINECONSOLES 8
 
 static struct console_cmdline console_cmdline[MAX_CMDLINECONSOLES];
+
 static int selected_console = -1;
 static int preferred_console = -1;
 int console_set_on_cmdline;
-- 
1.7.8.112.g3fd21

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH V2 01/23] printk: Move to separate directory for easier modification

2012-10-24 Thread Joe Perches
Make it easier to break up printk into bite-sized chunks.

Remove printk path/filename from comment.

Signed-off-by: Joe Perches 
---
 kernel/Makefile  |3 ++-
 kernel/printk/Makefile   |1 +
 kernel/{ => printk}/printk.c |2 --
 3 files changed, 3 insertions(+), 3 deletions(-)
 create mode 100644 kernel/printk/Makefile
 rename kernel/{ => printk}/printk.c (99%)

diff --git a/kernel/Makefile b/kernel/Makefile
index 0dfeca4..d53980d 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -2,7 +2,7 @@
 # Makefile for the linux kernel.
 #
 
-obj-y = fork.o exec_domain.o panic.o printk.o \
+obj-y = fork.o exec_domain.o panic.o \
cpu.o exit.o itimer.o time.o softirq.o resource.o \
sysctl.o sysctl_binary.o capability.o ptrace.o timer.o user.o \
signal.o sys.o kmod.o workqueue.o pid.o task_work.o \
@@ -24,6 +24,7 @@ endif
 
 obj-y += sched/
 obj-y += power/
+obj-y += printk/
 
 ifeq ($(CONFIG_CHECKPOINT_RESTORE),y)
 obj-$(CONFIG_X86) += kcmp.o
diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile
new file mode 100644
index 000..36d306d
--- /dev/null
+++ b/kernel/printk/Makefile
@@ -0,0 +1 @@
+obj-y  = printk.o
diff --git a/kernel/printk.c b/kernel/printk/printk.c
similarity index 99%
rename from kernel/printk.c
rename to kernel/printk/printk.c
index 2d607f4..1950ecf 100644
--- a/kernel/printk.c
+++ b/kernel/printk/printk.c
@@ -1,6 +1,4 @@
 /*
- *  linux/kernel/printk.c
- *
  *  Copyright (C) 1991, 1992  Linus Torvalds
  *
  * Modified to make sys_syslog() more flexible: added commands to
-- 
1.7.8.112.g3fd21



[PATCH V2 00/23] printk: refactoring

2012-10-24 Thread Joe Perches
Make printk a bit more readable.

Created directory:
kernel/printk
Created source files:
printk.c            Generic printk and console routines
printk_log.[ch]     Log buffer routines
printk_syslog.[ch]  syslog(2) routines
braille.c           Braille console support
devkmsg.c           /dev/kmsg support
kmsg_dump.c         kmsg_dump support

Changes from v1:
Fix braille.h: _braille_console_setup return value
Remove duplicated #define
Add missing #include 

Joe Perches (23):
  printk: Move to separate directory for easier modification
  printk: Add console_cmdline.h
  printk: Move braille console support into separate braille.[ch] files
  printk: Use pointer for console_cmdline indexing
  printk: rename struct log to struct printk_log
  printk: Rename log_buf and __LOG_BUF_LEN
  printk: Rename log_first and log_next variables
  printk: Rename log_ variables and functions
  printk: Rename enum log_flags to printk_log_flags
  printk: Rename log_wait to printk_log_wait
  printk: Rename logbuf_lock to printk_logbuf_lock
  printk: Rename clear_seq and clear_idx variables
  printk: Remove static from printk_ variables
  printk: Rename LOG_ALIGN to PRINTK_LOG_ALIGN
  printk: Add and use printk_log.h
  printk: Add printk_log.c
  printk: Make wait_queue_head_t printk_log_wait extern
  printk: Rename and move 2 #defines to printk_log.h
  printk: Move devkmsg bits to separate file
  printk: Prefix print_time and msg_print_text with printk_
  printk: Move functions printk_print_time and printk_msg_print_text
  printk: Add printk_syslog.c and .h
  printk: Move kmsg_dump functions to separate file

 drivers/accessibility/braille/braille_console.c |9 +-
 fs/proc/kmsg.c  |4 +-
 kernel/Makefile |3 +-
 kernel/printk.c | 2820 ---
 kernel/printk/Makefile  |6 +
 kernel/printk/braille.c |   48 +
 kernel/printk/braille.h |   48 +
 kernel/printk/console_cmdline.h |   14 +
 kernel/printk/devkmsg.c |  309 +++
 kernel/printk/kmsg_dump.c   |  328 +++
 kernel/printk/printk.c  | 1513 
 kernel/printk/printk_log.c  |  263 +++
 kernel/printk/printk_log.h  |  128 +
 kernel/printk/printk_syslog.c   |  355 +++
 kernel/printk/printk_syslog.h   |   13 +
 15 files changed, 3036 insertions(+), 2825 deletions(-)
 delete mode 100644 kernel/printk.c
 create mode 100644 kernel/printk/Makefile
 create mode 100644 kernel/printk/braille.c
 create mode 100644 kernel/printk/braille.h
 create mode 100644 kernel/printk/console_cmdline.h
 create mode 100644 kernel/printk/devkmsg.c
 create mode 100644 kernel/printk/kmsg_dump.c
 create mode 100644 kernel/printk/printk.c
 create mode 100644 kernel/printk/printk_log.c
 create mode 100644 kernel/printk/printk_log.h
 create mode 100644 kernel/printk/printk_syslog.c
 create mode 100644 kernel/printk/printk_syslog.h

-- 
1.7.8.112.g3fd21



[PATCH v11 5/6] introduce a new qom device to deal with panicked event

2012-10-24 Thread Hu Tao
If the target is x86/x86_64, the guest's kernel will write 0x01 to the
port KVM_PV_EVENT_PORT when it panics. This patch introduces a new
qom device kvm_pv_ioport to listen on this I/O port and handle the panicked
event according to panicked_action's value. The possible actions are:
1. emit QEVENT_GUEST_PANICKED only
2. emit QEVENT_GUEST_PANICKED and pause the guest
3. emit QEVENT_GUEST_PANICKED and poweroff the guest
4. emit QEVENT_GUEST_PANICKED and reset the guest

I/O ports do not work for some targets (for example, s390). For such a
target, you can implement another qom device and include its code in
pv_event.c.

Note: if we emit QEVENT_GUEST_PANICKED only, and the management
application does not receive this event (the management application may
not be running when the event is emitted), it won't know that the
guest has panicked.

Signed-off-by: Wen Congyang 
Signed-off-by: Hu Tao 
---
 hw/kvm/Makefile.objs |2 +-
 hw/kvm/pv_event.c|  197 ++
 hw/pc_piix.c |5 ++
 kvm-stub.c   |4 +
 kvm.h|2 +
 5 files changed, 209 insertions(+), 1 deletion(-)
 create mode 100644 hw/kvm/pv_event.c

diff --git a/hw/kvm/Makefile.objs b/hw/kvm/Makefile.objs
index f620d7f..cf93199 100644
--- a/hw/kvm/Makefile.objs
+++ b/hw/kvm/Makefile.objs
@@ -1 +1 @@
-obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o pci-assign.o
+obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o pci-assign.o pv_event.o
diff --git a/hw/kvm/pv_event.c b/hw/kvm/pv_event.c
new file mode 100644
index 000..112491e
--- /dev/null
+++ b/hw/kvm/pv_event.c
@@ -0,0 +1,197 @@
+/*
+ * QEMU KVM support, paravirtual event device
+ *
+ * Copyright Fujitsu, Corp. 2012
+ *
+ * Authors:
+ * Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Possible values for action parameter. */
+#define PANICKED_REPORT 1   /* emit QEVENT_GUEST_PANICKED only */
+#define PANICKED_PAUSE  2   /* emit QEVENT_GUEST_PANICKED and pause VM */
+#define PANICKED_POWEROFF   3   /* emit QEVENT_GUEST_PANICKED and quit VM */
+#define PANICKED_RESET  4   /* emit QEVENT_GUEST_PANICKED and reset VM */
+
+#define PV_EVENT_DRIVER "kvm_pv_event"
+
+struct PVEventAction {
+char *panicked_action;
+int panicked_action_value;
+};
+
+#define DEFINE_PV_EVENT_PROPERTIES(_state, _conf)   \
+DEFINE_PROP_STRING("panicked_action", _state, _conf.panicked_action)
+
+static void panicked_mon_event(const char *action)
+{
+QObject *data;
+
+data = qobject_from_jsonf("{ 'action': %s }", action);
+monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
+qobject_decref(data);
+}
+
+static void panicked_perform_action(uint32_t panicked_action)
+{
+switch (panicked_action) {
+case PANICKED_REPORT:
+panicked_mon_event("report");
+break;
+
+case PANICKED_PAUSE:
+panicked_mon_event("pause");
+vm_stop(RUN_STATE_GUEST_PANICKED);
+break;
+
+case PANICKED_POWEROFF:
+panicked_mon_event("poweroff");
+qemu_system_shutdown_request();
+break;
+
+case PANICKED_RESET:
+panicked_mon_event("reset");
+qemu_system_reset_request();
+break;
+}
+}
+
+static uint64_t supported_event(void)
+{
+return 1 << KVM_PV_FEATURE_PANICKED;
+}
+
+static void handle_event(int event, struct PVEventAction *conf)
+{
+if (event == KVM_PV_EVENT_PANICKED) {
+panicked_perform_action(conf->panicked_action_value);
+}
+}
+
+static int pv_event_init(struct PVEventAction *conf)
+{
+if (!conf->panicked_action) {
+conf->panicked_action_value = PANICKED_REPORT;
+} else if (strcasecmp(conf->panicked_action, "none") == 0) {
+conf->panicked_action_value = PANICKED_REPORT;
+} else if (strcasecmp(conf->panicked_action, "pause") == 0) {
+conf->panicked_action_value = PANICKED_PAUSE;
+} else if (strcasecmp(conf->panicked_action, "poweroff") == 0) {
+conf->panicked_action_value = PANICKED_POWEROFF;
+} else if (strcasecmp(conf->panicked_action, "reset") == 0) {
+conf->panicked_action_value = PANICKED_RESET;
+} else {
+return -1;
+}
+
+return 0;
+}
+
+#if defined(KVM_PV_EVENT_PORT)
+
+#include "hw/isa.h"
+
+typedef struct {
+ISADevice dev;
+struct PVEventAction conf;
+MemoryRegion ioport;
+} PVIOPortState;
+
+static uint64_t pv_io_read(void *opaque, hwaddr addr, unsigned size)
+{
+return supported_event();
+}
+
+static void pv_io_write(void *opaque, hwaddr addr, uint64_t val,
+unsigned size)
+{
+PVIOPortState *s = opaque;
+
+handle_event(val, &s->conf);
+}
+
+static const MemoryRegionOps pv_io_ops = {
+.read = pv_io_read,
+.write = pv_io_write,
+.impl = {
+
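
The string-to-action mapping done by pv_event_init() above can also be written as a small lookup table. The following is a standalone sketch under stated assumptions, not the QEMU code: parse_panicked_action() is a hypothetical name, and the constants simply repeat the PANICKED_* values from the patch. As in the patch, a missing string and "none" both map to the report-only action, matching is case-insensitive, and an unknown string yields -1.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/* Same numeric values as the PANICKED_* macros in the patch. */
enum {
	PANICKED_REPORT   = 1,	/* emit the event only */
	PANICKED_PAUSE    = 2,	/* emit the event and pause the VM */
	PANICKED_POWEROFF = 3,	/* emit the event and quit the VM */
	PANICKED_RESET    = 4,	/* emit the event and reset the VM */
};

/* Hypothetical helper: map a panicked_action string to its value. */
static int parse_panicked_action(const char *s)
{
	static const struct {
		const char *name;
		int value;
	} table[] = {
		{ "none",     PANICKED_REPORT },
		{ "pause",    PANICKED_PAUSE },
		{ "poweroff", PANICKED_POWEROFF },
		{ "reset",    PANICKED_RESET },
	};
	size_t i;

	if (!s)
		return PANICKED_REPORT;	/* property not set: report only */

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strcasecmp(s, table[i].name) == 0)
			return table[i].value;
	}

	return -1;	/* unknown action string */
}
```

A table keeps the accepted strings in one place, which may make it easier to keep them in sync with the documentation if more actions are added later.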

[PATCH v11 6/6] allow the user to disable pv event support

2012-10-24 Thread Hu Tao
From: Wen Congyang 

Signed-off-by: Wen Congyang 
---
 hw/pc_piix.c|6 +-
 qemu-config.c   |4 
 qemu-options.hx |3 ++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index fb67dc1..864d356 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -149,6 +149,8 @@ static void pc_init1(MemoryRegion *system_memory,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 void *fw_cfg = NULL;
+QemuOptsList *list = qemu_find_opts("machine");
+bool enable_pv_event;
 
 pc_cpus_init(cpu_model);
 
@@ -287,7 +289,9 @@ static void pc_init1(MemoryRegion *system_memory,
 pc_pci_device_init(pci_bus);
 }
 
-if (kvm_enabled()) {
+enable_pv_event = qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+"enable_pv_event", false);
+if (kvm_enabled() && enable_pv_event) {
 kvm_pv_event_init(isa_bus);
 }
 }
diff --git a/qemu-config.c b/qemu-config.c
index cd1ec21..db8f828 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -619,6 +619,10 @@ static QemuOptsList qemu_machine_opts = {
 .name = "mem-merge",
 .type = QEMU_OPT_BOOL,
 .help = "enable/disable memory merge support",
+}, {
+.name = "enable_pv_event",
+.type = QEMU_OPT_BOOL,
+.help = "handle pv event"
 },
 { /* End of list */ }
 },
diff --git a/qemu-options.hx b/qemu-options.hx
index 46f0539..de667d6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -35,7 +35,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "kernel_irqchip=on|off controls accelerated irqchip support\n"
 "kvm_shadow_mem=size of KVM shadow MMU\n"
 "dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
-"mem-merge=on|off controls memory merge support (default: on)\n",
+"mem-merge=on|off controls memory merge support (default: on)\n"
+"enable_pv_event=on|off controls pv event support\n",
 QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
-- 
1.7.10.2



Re: mmotm 2012-10-24-17-15 uploaded

2012-10-24 Thread Stephen Rothwell
Hi Andrew,

On Wed, 24 Oct 2012 17:16:27 -0700 a...@linux-foundation.org wrote:
>
> The mm-of-the-moment snapshot 2012-10-24-17-15 has been uploaded to
> 
>http://www.ozlabs.org/~akpm/mmotm/

I have split this series so that all the next tagged patches before
linux-next.patch are in the akpm-current tree (based on Linus' tree) and
all the rest are as before (rebased onto today's linux-next including
the akpm-current tree).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




[PATCH v11 3/6] add a new runstate: RUN_STATE_GUEST_PANICKED

2012-10-24 Thread Hu Tao
From: Wen Congyang 

The guest will be in this state when it is panicked.

Signed-off-by: Wen Congyang 
---
 qapi-schema.json |6 +-
 qmp.c|3 ++-
 vl.c |7 ++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index c615ee2..25a21eb 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -174,11 +174,15 @@
 # @suspended: guest is suspended (ACPI S3)
 #
 # @watchdog: the watchdog action is configured to pause and has been triggered
+#
+# @guest-panicked: the panicked action is configured to pause and has been
+# triggered.
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
 'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
-'running', 'save-vm', 'shutdown', 'suspended', 'watchdog' ] }
+'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
+'guest-panicked' ] }
 
 ##
 # @SnapshotInfo
diff --git a/qmp.c b/qmp.c
index f4a757b..52cc623 100644
--- a/qmp.c
+++ b/qmp.c
@@ -148,7 +148,8 @@ void qmp_cont(Error **errp)
 error_set(errp, QERR_MIGRATION_EXPECTED);
 return;
 } else if (runstate_check(RUN_STATE_INTERNAL_ERROR) ||
-   runstate_check(RUN_STATE_SHUTDOWN)) {
+   runstate_check(RUN_STATE_SHUTDOWN) ||
+   runstate_check(RUN_STATE_GUEST_PANICKED)) {
 error_set(errp, QERR_RESET_REQUIRED);
 return;
 } else if (runstate_check(RUN_STATE_SUSPENDED)) {
diff --git a/vl.c b/vl.c
index ca2e1e2..33b4578 100644
--- a/vl.c
+++ b/vl.c
@@ -373,6 +373,7 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_RUNNING, RUN_STATE_SAVE_VM },
 { RUN_STATE_RUNNING, RUN_STATE_SHUTDOWN },
 { RUN_STATE_RUNNING, RUN_STATE_WATCHDOG },
+{ RUN_STATE_RUNNING, RUN_STATE_GUEST_PANICKED },
 
 { RUN_STATE_SAVE_VM, RUN_STATE_RUNNING },
 
@@ -387,6 +388,9 @@ static const RunStateTransition runstate_transitions_def[] = {
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
 
+{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
+{ RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
+
 { RUN_STATE_MAX, RUN_STATE_MAX },
 };
 
@@ -1617,7 +1621,8 @@ static bool main_loop_should_exit(void)
 qemu_system_reset(VMRESET_REPORT);
 resume_all_vcpus();
 if (runstate_check(RUN_STATE_INTERNAL_ERROR) ||
-runstate_check(RUN_STATE_SHUTDOWN)) {
+runstate_check(RUN_STATE_SHUTDOWN) ||
+runstate_check(RUN_STATE_GUEST_PANICKED)) {
 bdrv_iterate(iostatus_bdrv_it, NULL);
 vm_start();
 }
-- 
1.7.10.2
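
The new entries in runstate_transitions_def[] above extend a table of allowed (from, to) state pairs that ends with a sentinel. A simplified, standalone sketch of that table-driven check follows. All names here (run_state, RS_*, transitions_init(), transition_ok()) are hypothetical and only a handful of states are modelled; it illustrates the idea and is not QEMU's implementation.

```c
#include <stdbool.h>

/* Reduced, hypothetical set of run states for illustration. */
enum run_state {
	RS_RUNNING,
	RS_PAUSED,
	RS_GUEST_PANICKED,
	RS_FINISH_MIGRATE,
	RS_MAX,
};

struct transition {
	enum run_state from;
	enum run_state to;
};

/* Allowed transitions, terminated by an { RS_MAX, RS_MAX } sentinel. */
static const struct transition transitions[] = {
	{ RS_RUNNING,        RS_PAUSED },
	{ RS_RUNNING,        RS_GUEST_PANICKED },
	{ RS_GUEST_PANICKED, RS_RUNNING },
	{ RS_GUEST_PANICKED, RS_FINISH_MIGRATE },
	{ RS_MAX,            RS_MAX },
};

/* Lookup matrix, zero-initialised: every transition starts out invalid. */
static bool valid[RS_MAX][RS_MAX];

/* Expand the list into the matrix once at startup. */
static void transitions_init(void)
{
	const struct transition *p;

	for (p = transitions; p->from != RS_MAX; p++)
		valid[p->from][p->to] = true;
}

/* Constant-time check used whenever the state is about to change. */
static bool transition_ok(enum run_state from, enum run_state to)
{
	return valid[from][to];
}
```

With this shape, adding a state such as guest-panicked only requires new rows in the list; the checking code stays unchanged.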



[PATCH v11] kvm: notify host when the guest is panicked

2012-10-24 Thread Hu Tao
When a guest runs on Xen, the host can tell that the guest has panicked.
KVM has no such feature.

Another purpose of this feature: a management app (for example,
libvirt) can take an automatic dump when the guest panics. If the
management app does not do so, the guest's user can take a dump by hand
on seeing that the guest has panicked.

There are three ways to implement this feature:
1. use vmcall
2. use an I/O port
3. use virtio-serial

We have decided to avoid touching the hypervisor. I chose
the I/O port because:
1. it is easier to implement
2. it does not depend on any virtual device
3. it works while the kernel is starting

Signed-off-by: Wen Congyang 
Signed-off-by: Hu Tao 
---

changes from v10:
 
  - add a kernel parameter to disable pv-event
  - detailed documentation to describe pv event interface
  - make kvm_pv_event_init() local

 Documentation/virtual/kvm/pv_event.txt |   38 +
 arch/ia64/include/asm/kvm_para.h   |   14 ++
 arch/powerpc/include/asm/kvm_para.h|   14 ++
 arch/s390/include/asm/kvm_para.h   |   14 ++
 arch/x86/include/asm/kvm_para.h|   21 ++
 arch/x86/kernel/kvm.c  |   48 
 include/linux/kvm_para.h   |   18 
 7 files changed, 167 insertions(+)
 create mode 100644 Documentation/virtual/kvm/pv_event.txt

diff --git a/Documentation/virtual/kvm/pv_event.txt b/Documentation/virtual/kvm/pv_event.txt
new file mode 100644
index 000..247379f
--- /dev/null
+++ b/Documentation/virtual/kvm/pv_event.txt
@@ -0,0 +1,38 @@
+The KVM Paravirtual Event Interface
+===================================
+
+The KVM Paravirtual Event Interface defines a simple interface
+by which the guest OS can inform the hypervisor that something happened.
+
+To inform the hypervisor of events, the guest writes a 32-bit integer to
+the Interface. Each bit of the integer represents an event; if a
+bit is set, the corresponding event has happened.
+
+To query the events supported by the hypervisor, the guest reads from the
+Interface. If a bit is set, the corresponding event is supported.
+
+The Interface supports up to 32 events. Currently there is 1 event
+defined, as follows:
+
+KVM_PV_FEATURE_PANICKED        0
+
+
+Querying whether the event can be ejected
+=========================================
+kvm_pv_has_feature()
+Arguments:
+   feature: The bit value of this paravirtual event to query
+
+Return Value:
+0: The guest kernel can't eject this paravirtual event.
+1: The guest kernel can eject this paravirtual event.
+
+
+Ejecting paravirtual event
+==========================
+kvm_pv_eject_event()
+Arguments:
+   event: The event to be ejected.
+
+Return Value:
+   None
diff --git a/arch/ia64/include/asm/kvm_para.h b/arch/ia64/include/asm/kvm_para.h
index 2019cb9..b5ec658 100644
--- a/arch/ia64/include/asm/kvm_para.h
+++ b/arch/ia64/include/asm/kvm_para.h
@@ -31,6 +31,20 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline int kvm_arch_pv_event_init(void)
+{
+   return 0;
+}
+
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
+static inline void kvm_arch_pv_eject_event(unsigned int event)
+{
+}
+
 #endif
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index c18916b..01b98c7 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -211,6 +211,20 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline int kvm_arch_pv_event_init(void)
+{
+   return 0;
+}
+
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
+static inline void kvm_arch_pv_eject_event(unsigned int event)
+{
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* __POWERPC_KVM_PARA_H__ */
diff --git a/arch/s390/include/asm/kvm_para.h b/arch/s390/include/asm/kvm_para.h
index da44867..00ce058 100644
--- a/arch/s390/include/asm/kvm_para.h
+++ b/arch/s390/include/asm/kvm_para.h
@@ -154,6 +154,20 @@ static inline bool kvm_check_and_clear_guest_paused(void)
return false;
 }
 
+static inline int kvm_arch_pv_event_init(void)
+{
+   return 0;
+}
+
+static inline unsigned int kvm_arch_pv_features(void)
+{
+   return 0;
+}
+
+static inline void kvm_arch_pv_eject_event(unsigned int event)
+{
+}
+
 #endif
 
 #endif /* __S390_KVM_PARA_H */
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index eb3e9d8..4315af6 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -96,8 +96,11 @@ struct kvm_vcpu_pv_apf_data {
 #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
 #define KVM_PV_EOI_DISABLED 0x0
 
+#define KVM_PV_EVENT_PORT  (0x505UL)
+
 #ifdef __KERNEL__
 #include 
+#include 
 
 extern void kvmclock_init(void);
 extern int kvm_register_clock(char *txt);
@@ -228,6 +231,24 @@ static inline void kvm_disable_steal_time(void)
 }
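
The bit-per-event scheme described in pv_event.txt above can be sketched in plain C as follows. The two macros repeat values that appear in this series (feature bit 0, event value 1); the helper names pv_supported_features(), pv_has_feature() and pv_event_mask() are hypothetical and stand in for the real port read and the guest-side helpers, whose bodies are not shown in this excerpt.

```c
#include <stdint.h>

#define KVM_PV_FEATURE_PANICKED	0	/* bit index of the feature */
#define KVM_PV_EVENT_PANICKED	1	/* value written: 1 << bit 0 */

/*
 * Hypothetical stand-in for reading the interface: the hypervisor
 * answers with one bit set per supported event.
 */
static uint32_t pv_supported_features(void)
{
	return (uint32_t)1 << KVM_PV_FEATURE_PANICKED;
}

/* Return 1 if bit 'feature' is set in the 32-bit feature word, else 0. */
static int pv_has_feature(uint32_t features, unsigned int feature)
{
	return feature < 32 && (features & ((uint32_t)1 << feature)) != 0;
}

/* Value the guest would write to report event number 'feature'. */
static uint32_t pv_event_mask(unsigned int feature)
{
	return feature < 32 ? (uint32_t)1 << feature : 0;
}
```

Under these assumptions the feature number is a bit position while the event value is the corresponding mask, which is why KVM_PV_FEATURE_PANICKED is 0 and KVM_PV_EVENT_PANICKED is 1.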
 

[PATCH v11 2/6] update kernel headers

2012-10-24 Thread Hu Tao
update kernel headers to add pv event macros.

Signed-off-by: Wen Congyang 
Signed-off-by: Hu Tao 
---
 linux-headers/asm-x86/kvm_para.h |1 +
 linux-headers/linux/kvm_para.h   |6 ++
 2 files changed, 7 insertions(+)

diff --git a/linux-headers/asm-x86/kvm_para.h b/linux-headers/asm-x86/kvm_para.h
index a1c3d72..781959a 100644
--- a/linux-headers/asm-x86/kvm_para.h
+++ b/linux-headers/asm-x86/kvm_para.h
@@ -96,5 +96,6 @@ struct kvm_vcpu_pv_apf_data {
 #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
 #define KVM_PV_EOI_DISABLED 0x0
 
+#define KVM_PV_EVENT_PORT  (0x505UL)
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
index 7bdcf93..f6be0bb 100644
--- a/linux-headers/linux/kvm_para.h
+++ b/linux-headers/linux/kvm_para.h
@@ -20,6 +20,12 @@
 #define KVM_HC_FEATURES                3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE      4
 
+/* The bit of supported pv event */
+#define KVM_PV_FEATURE_PANICKED        0
+
+/* The pv event value */
+#define KVM_PV_EVENT_PANICKED          1
+
 /*
  * hypercalls use architecture specific
  */
-- 
1.7.10.2



linux-next: build warning after merge of the akpm tree

2012-10-24 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

mm/rmap.c: In function 'try_to_unmap_cluster':
mm/rmap.c:1364:9: warning: unused variable 'pud' [-Wunused-variable]
mm/rmap.c:1363:9: warning: unused variable 'pgd' [-Wunused-variable]

Introduced by commit "mm: introduce mm_find_pmd()".
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug

2012-10-24 Thread Michael Wang
On 10/24/2012 05:38 PM, Peter Zijlstra wrote:
> On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote:
>> We found that poweroff sometimes fails on our computers, so we enabled
>> the lock debug options. Then, when we do poweroff or take a
>> cpu down via cpu-hotplug, the kernel complains as below. To resolve this,
>> we modify sched_ttwu_pending() to disable the local irq when acquiring
>> rq->lock.
>>
>> [   83.066406] =
>> [   83.066406] [ INFO: inconsistent lock state ]
>> [   83.066406] 3.5.0-3.lemote #428 Not tainted
>> [   83.066406] -
>> [   83.066406] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
>> [   83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes:
>> [   83.066406]  (&rq->lock){?.-.-.}, at: [] sched_ttwu_pending+0x64/0x98
>> [   83.066406] {IN-HARDIRQ-W} state was registered at:
>> [   83.066406]   [] __lock_acquire+0x80c/0x1cc0
>> [   83.066406]   [] lock_acquire+0x60/0x9c
>> [   83.066406]   [] _raw_spin_lock+0x3c/0x50
>> [   83.066406]   [] scheduler_tick+0x48/0x178
>> [   83.066406]   [] update_process_times+0x54/0x70
>> [   83.066406]   [] tick_handle_periodic+0x2c/0x9c
>> [   83.066406]   [] c0_compare_interrupt+0x8c/0x94
>> [   83.066406]   [] handle_irq_event_percpu+0x7c/0x248
>> [   83.066406]   [] handle_percpu_irq+0x8c/0xc0
>> [   83.066406]   [] generic_handle_irq+0x48/0x58
>> [   83.066406]   [] do_IRQ+0x18/0x24
>> [   83.066406]   [] mach_irq_dispatch+0xe4/0x124
>> [   83.066406]   [] ret_from_irq+0x0/0x4
>> [   83.066406]   [] console_unlock+0x3e8/0x4c0
>> [   83.066406]   [] con_init+0x370/0x398
>> [   83.066406]   [] console_init+0x34/0x50
>> [   83.066406]   [] start_kernel+0x2f8/0x4e0
>> [   83.066406] irq event stamp: 971
>> [   83.066406] hardirqs last  enabled at (971): [] 
>> local_flush_tlb_all+0x134/0x17c
>> [   83.066406] hardirqs last disabled at (970): [] 
>> local_flush_tlb_all+0x48/0x17c
>> [   83.066406] softirqs last  enabled at (0): [] 
>> copy_process+0x510/0x117c
>> [   83.066406] softirqs last disabled at (0): [<  (null)>] (null)
>> [   83.066406]
>> [   83.066406] other info that might help us debug this:
>> [   83.066406]  Possible unsafe locking scenario:
>> [   83.066406]
>> [   83.066406]        CPU0
>> [   83.066406]        ----
>> [   83.066406]   lock(&rq->lock);
>> [   83.066406]   <Interrupt>
>> [   83.066406]     lock(&rq->lock);
>> [   83.066406]
>> [   83.066406]  *** DEADLOCK ***
>> [   83.066406]
>> [   83.066406] no locks held by migration/1/7.
>> [   83.066406]
>> [   83.066406] stack backtrace:
>> [   83.066406] Call Trace:
>> [   83.066406] [] dump_stack+0x8/0x34
>> [   83.066406] [] print_usage_bug+0x2ec/0x314
>> [   83.066406] [] mark_lock+0x3fc/0x774
>> [   83.066406] [] __lock_acquire+0x8a8/0x1cc0
>> [   83.066406] [] lock_acquire+0x60/0x9c
>> [   83.066406] [] _raw_spin_lock+0x3c/0x50
>> [   83.066406] [] sched_ttwu_pending+0x64/0x98
>> [   83.066406] [] migration_call+0x10c/0x2e0
>> [   83.066406] [] notifier_call_chain+0x44/0x94
>> [   83.066406] [] __cpu_notify+0x30/0x5c
>> [   83.066406] [] take_cpu_down+0x5c/0x70
>> [   83.066406] [] stop_machine_cpu_stop+0x104/0x1e8
>> [   83.066406] [] cpu_stopper_thread+0x110/0x1ac
>> [   83.066406] [] kthread+0x88/0x90
>> [   83.066406] [] kernel_thread_helper+0x10/0x18
> 
> Weird, that's from a CPU_DYING call, I thought those were with IRQs
> disabled. 
> 
> Look at how __stop_machine() calls the function with IRQs disabled for
> !stop_machine_initialized or !SMP. Also stop_machine_cpu_stop() seems to
> disable interrupts, so how do we end up calling take_cpu_down() with
> IRQs enabled?

The patch is no doubt wrong...

The discussion at:

https://lkml.org/lkml/2012/7/19/164

also hit the issue of a timer interrupt coming in after the APIC
was shut down. I'm not sure whether it will help Huacai, but it may
serve as a clue...

Regards,
Michael Wang

> 
> That simply doesn't make any sense.
> 
>> Signed-off-by: Huacai Chen 
>> ---
>>  kernel/sched/core.c |5 +++--
>>  1 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 36e2666..703754a 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void)
>>  {
>>  struct rq *rq = this_rq();
>>  struct llist_node *llist = llist_del_all(&rq->wake_list);
>> +unsigned long flags;
>>  struct task_struct *p;
>>  
>> -raw_spin_lock(&rq->lock);
>> +raw_spin_lock_irqsave(&rq->lock, flags);
>>  
>>  while (llist) {
>>  p = llist_entry(llist, struct task_struct, wake_entry);
>> @@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void)
>>  ttwu_do_activate(rq, p, 0);
>>  }
>>  
>> -raw_spin_unlock(&rq->lock);
>> +raw_spin_unlock_irqrestore(&rq->lock, flags);
>>  }
>>  
>>  void scheduler_ipi(void)
> 
> 
> That's wrong though, you add the cost to the common case instead of the
> hardly ever run hotplug case.

Re: linux-next: build warnings after merge of the akpm tree

2012-10-24 Thread Stephen Rothwell
Hi Andrew,

On Thu, 25 Oct 2012 14:28:54 +1100 Stephen Rothwell wrote:
>
> After merging the akpm tree, today's linux-next build (powerpc
> ppc64_defconfig) produced these warnings:
> 
> drivers/infiniband/hw/cxgb3/cxio_resource.c: In function '__cxio_init_resource_fifo':
> drivers/infiniband/hw/cxgb3/cxio_resource.c:62:3: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c:74:4: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c:81:4: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c:86:4: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c:89:7: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c: In function 'cxio_init_qpid_fifo':
> drivers/infiniband/hw/cxgb3/cxio_resource.c:123:4: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c: In function 'cxio_hal_get_resource':
> drivers/infiniband/hw/cxgb3/cxio_resource.c:184:6: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c: In function 'cxio_hal_put_resource':
> drivers/infiniband/hw/cxgb3/cxio_resource.c:193:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c:193:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> drivers/infiniband/hw/cxgb3/cxio_resource.c:193:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]
> 
> Probably introduced by commit "include/linux/kfifo.h: replace open-coded
> type check code with typecheck()".

Also:

drivers/scsi/libiscsi.c: In function 'iscsi_free_task':
drivers/scsi/libiscsi.c:507:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi.c: In function 'iscsi_pool_init':
drivers/scsi/libiscsi.c:2510:3: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi.c: In function 'iscsi_conn_setup':
drivers/scsi/libiscsi.c:2881:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi.c: In function 'iscsi_conn_teardown':
drivers/scsi/libiscsi.c:2944:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi_tcp.c: In function 'iscsi_tcp_cleanup_task':
drivers/scsi/libiscsi_tcp.c:462:3: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi_tcp.c:469:3: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi_tcp.c: In function 'iscsi_tcp_r2t_rsp':
drivers/scsi/libiscsi_tcp.c:570:3: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi_tcp.c:586:3: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi_tcp.c:596:2: warning: comparison of distinct pointer types lacks a cast [enabled by default]
drivers/scsi/libiscsi_tcp.c: In function 'iscsi_tcp_get_curr_r2t':
drivers/scsi/libiscsi_tcp.c:998:5: warning: comparison of distinct pointer types lacks a cast [enabled by default]

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




linux-next: build warnings after merge of the akpm tree

2012-10-24 Thread Stephen Rothwell
Hi Andrew,

After merging the akpm tree, today's linux-next build (powerpc
ppc64_defconfig) produced these warnings:

drivers/infiniband/hw/cxgb3/cxio_resource.c: In function 
'__cxio_init_resource_fifo':
drivers/infiniband/hw/cxgb3/cxio_resource.c:62:3: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c:74:4: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c:81:4: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c:86:4: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c:89:7: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c: In function 'cxio_init_qpid_fifo':
drivers/infiniband/hw/cxgb3/cxio_resource.c:123:4: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c: In function 
'cxio_hal_get_resource':
drivers/infiniband/hw/cxgb3/cxio_resource.c:184:6: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c: In function 
'cxio_hal_put_resource':
drivers/infiniband/hw/cxgb3/cxio_resource.c:193:2: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c:193:2: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]
drivers/infiniband/hw/cxgb3/cxio_resource.c:193:2: warning: comparison of 
distinct pointer types lacks a cast [enabled by default]

Probably introduced by commit "include/linux/kfifo.h: replace open-coded
type check code with typecheck()".
-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




RE: [PATCH] extcon : callback function to read cable property

2012-10-24 Thread Tc, Jenny


> Subject: Re: [PATCH] extcon : callback function to read cable property
> 
> On 10/19/2012 12:13 PM, Tc, Jenny wrote:
> >
> >
> >> Subject: Re: [PATCH] extcon : callback function to read cable
> >> property
> >>
> >> I think the reason why we have extcon is in first place is to
> >> only notify the clients of cable connection and disconnection and
> >> it is up to the client to decide what else to do with the cable
> >> such as finding which state it is in and other details.
> >> So I feel this should not be handled in the extcon.
> >>
> >> However it is up to the maintainer to decide.
> >
> > > Once the consumer gets the notification, it needs to take some
> > > action. One of the actions is to read the cable properties. This can
> > > be done by proprietary calls which are known both to the consumer and
> > > the provider.
> > > My intention is to avoid these proprietary calls. Since both the
> > > provider and consumer are communicating with the extcon subsystem,
> > > I feel having a callback function of this kind would help to
> > > avoid the use of proprietary calls. Also I agree that extcon
> > > notifier chains are used to notify the cable state
> > > (attach/detach). But if a cable has more than two states (like the
> > > charger cable) how do we support it without having a callback
> > > function like this?
> > > Let the maintainer take the final decision.
> >> Well this use case will keep on growing if we start factoring in this
> >> kind of changes and that is why I am opposed to adding any other
> >> state.
> >> Maintainer?
> >
> >
> 
> >>
> >> Hello,
> >>
> >>
> >> I don't think it's appropriate to declare the charger specific
> >> properties in extcon.h. The status of a charger should be and can be
> >> represented by an instance of regulator, power-supply-class, or
> >> charger-manager.
> >>
> > Agreed. We can move this to power supply subsystem.
> >
> >> Thus, we may (I'm still not sure) need to let extcon relay the
> >> instance (struct device? or char *devname?) with some callback
> >> similar to get_cable_device().  However, allowing (and encouraging)
> >> passing a void pointer of cable_props to extcon users from the extcon
> >> device does not appear adequate. If both parties can use their own
> >> "private" data structure, why can't they simply pass their own data
> >> through the "private" data channel?
> >>
> >>
> >> Recap:
> >> - The later part of patch: NACK
> >> - The first part of patch (callback): need to reconsider the data type.
> >> We may get a device pointer or device name that corresponds to the
> >> cable, which in turn guides us to the corresponding data structure
> >> (charger-manager, regulator, or something). However, I'm still not
> >> sure which would be appropriate for this.
> >>
> >
> > The requirement for this feature came from the implementation of the
> > power supply charging framework
> > (http://www.spinics.net/lists/kernel/msg1420500.html
> > refer charger_cable_event_worker function). The charging framework is
> > not a driver. It can be compiled with the power supply class driver to
> > support charging. Also the private data structure may not provide a
> > generic method for this implementation since the extcon provider
> > drivers will be different in different platforms. So it's not necessary
> > that the framework knows the private data structure of the provider.
> > Basically the requirement is to have a generic method to retrieve the
> > cable properties without knowing the extcon provider driver internal
> > implementation. Can you suggest a generic approach for this problem?
> >
> The role of extcon is only to inform the extcon consumer driver of the
> attached/detached state reported by the extcon provider driver. After the
> extcon consumer driver detects the state of the cable through extcon, the
> consumer driver or framework should get any additional cable information
> from a device driver other than extcon.
> 
> Also, extcon manages various cables (e.g., USB, TA, MHL, JIG-USB-ON,
> JIG-USB-OFF, Dock). What properties are common among these cables except
> the attached or detached state?
> 

For a charger cable, the current each cable can provide will be a common
property, but it may not be relevant for other cables.

I understand your point on the role of extcon. But my concern is: when the
consumer driver gets a notification of a cable state change, how does the
consumer query the cable properties in a generic way? Do you have any
suggestions for this?

A use case is as follows.

When a USB host cable (SDP) is connected to the platform, without USB
enumeration it can draw only up to 100mA (USB 2.0) / 150mA (USB 3.0), as per
the USB charging spec. Once enumeration is done this can be 500mA/900mA. If
the consumer charger driver needs to configure the charger chip, it needs to
know the charger cable capabilities.
For example a platform PLAT1 may have charger driver CHRGR1 and OTG driver OTG1.
But 

Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread YingHang Zhu
On Thu, Oct 25, 2012 at 10:58 AM, Fengguang Wu  wrote:
> Hi Chen,
>
>> But how can bdi related ra_pages reflect different files' readahead
>> window? Maybe these different files are sequential read, random read
>> and so on.
>
> It's simple: sequential reads will get ra_pages readahead size while
> random reads will not get readahead at all.
>
> Talking about the below chunk, it might hurt someone that explicitly
> takes advantage of the behavior, however the ra_pages*2 seems more
> like a hack than general solution to me: if the user will need
> POSIX_FADV_SEQUENTIAL to double the max readahead window size for
> improving IO performance, then why not just increase bdi->ra_pages and
> benefit all reads? One may argue that it offers some differential
> behavior to specific applications, however it may also present as a
> counter-optimization: if the root already tuned bdi->ra_pages to the
> optimal size, the doubled readahead size will only cost more memory
> and perhaps IO latency.
I agree, we should choose the reasonable solution here.

Thanks,
 Ying Zhu
>
> --- a/mm/fadvise.c
> +++ b/mm/fadvise.c
> @@ -87,7 +86,6 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>                 spin_unlock(&file->f_lock);
>                 break;
>         case POSIX_FADV_SEQUENTIAL:
> -               file->f_ra.ra_pages = bdi->ra_pages * 2;
>                 spin_lock(&file->f_lock);
>                 file->f_mode &= ~FMODE_RANDOM;
>                 spin_unlock(&file->f_lock);
>
> Thanks,
> Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread YingHang Zhu
On Thu, Oct 25, 2012 at 10:38 AM, Fengguang Wu  wrote:
> Hi YingHang,
>
>> Actually I've talked about it with Fengguang, he advised we should unify the
>> ra_pages in struct bdi and file_ra_state and leave the issue that
>> spreading data
>> across disks as it is.
>> Fengguang, what's you opinion about this?
>
> Yeah the two ra_pages may run out of sync for already opened files,
> which could be a problem for long opened files. However as Dave put
> it, a device's max readahead size is typically a static value that can
> be set at mount time. So, the question is: do you really hurt from the
> old behavior that deserves this code change?
We could advise the above application to reopen files.
As I mentioned previously, many SCST users also have this problem:
[quote]
Note2: you need to restart SCST after you changed read-ahead settings
on the target. It is a limitation of the Linux read ahead
implementation. It reads RA values for each file only when the file
is open and not updates them when the global RA parameters changed.
Hence, the need for vdisk to reopen all its files/devices.
[/quote]
So IMHO it's a functional bug in kernel that brings inconvenience to the
application developers.
>
> I agree with Dave that the multi-disk case is not a valid concern.  In
> fact, how can the patch help that case? I mean, if it's two fuse files
> lying in two disks, it *was* not a problem at all. If it's one big
> file spreading to two disks, it's a too complex scheme to be
> practically manageable which I doubt if you have such a setup.
Yes, this patch does not solve the issue here. I'm just pushing the discussion
a little further; in reality we may never meet such a setup.

Thanks,
 Ying Hang
>
> Thanks,
> Fengguang


Re: [PATCH 1/5] mm: compaction: Move migration fail/success stats to migrate.c

2012-10-24 Thread David Rientjes
On Mon, 22 Oct 2012, Mel Gorman wrote:

> The compact_pages_moved and compact_pagemigrate_failed events are
> convenient for determining if compaction is active and to what
> degree migration is succeeding but it's at the wrong level. Other
> users of migration may also want to know if migration is working
> properly and this will be particularly true for any automated
> NUMA migration. This patch moves the counters down to migration
> with the new events called pgmigrate_success and pgmigrate_fail.
> The compact_blocks_moved counter is removed because while it was
> useful for debugging initially, it's worthless now as no meaningful
> conclusions can be drawn from its value.
> 

Agreed, "compact_blocks_moved" should have been named 
"compact_blocks_scanned" to accurately describe what it was representing.

> Signed-off-by: Mel Gorman 

Acked-by: David Rientjes 


[RFC] Support volatile range for anon vma

2012-10-24 Thread Minchan Kim
This patch introduces new madvise behaviors, MADV_VOLATILE and
MADV_NOVOLATILE, for anonymous pages. It differs from
John Stultz's version, which considers only tmpfs. This patch
cannot cover John's one, so if the idea below proves reasonable,
maybe we can unify both concepts, and I hope the interface could become
madvise/fadvise.

The rationale is as follows.
Many allocators call munmap(2) when the user calls free(3) if ptr is
in an mmaped area. But munmap isn't cheap because it has to clean up
all pte entries and unlink a vma, so the overhead increases
linearly with the mmaped area's size.

Robert Love's volatile concept could be very useful for reducing
free overhead. Allocators can do madvise(MADV_VOLATILE) instead of
munmap(2). madvise(MADV_VOLATILE|NOVOLATILE) is a very cheap operation
because it just marks a flag in the VMA, and if memory pressure happens,
the VM can discard pages of a volatile VMA instead of swapping them out when
volatile pages are selected as victims by the normal VM aging policy.

The allocator should call madvise(MADV_NOVOLATILE) before handing
that area back to the user. Otherwise, accessing the volatile range will
raise SIGBUS.

The downside is that we have to age the anon lru list even though we don't
have swap, but I think it's a trade-off for getting a good feature.
We even did that until two years ago, before merge [1], and I believe
the free(3) performance gain will beat the loss from anon lru aging overhead
once all allocators start to use madvise.

I hope to see opinions from others before diving into glibc or bionic.
Any comments are welcome.

[1] 74e3f3c3, vmscan: prevent background aging of anon page in no swap system

Cc: John Stultz 
Cc: Andrew Morton 
Cc: Christoph Lameter 
Cc: Android Kernel Team 
Cc: Robert Love 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Rik van Riel 
Cc: Dave Chinner 
Cc: Neil Brown 
Cc: Mike Hommey 
Cc: Taras Glek 
Cc: KOSAKI Motohiro 
Cc: Christoph Lameter 
Cc: KAMEZAWA Hiroyuki 
Signed-off-by: Minchan Kim 
---
 include/asm-generic/mman-common.h |    3 +
 include/linux/mm.h                |    8 ++-
 include/linux/mm_types.h          |    5 ++
 include/linux/rmap.h              |   24 ++-
 mm/ksm.c                          |    4 +-
 mm/madvise.c                      |   32 +-
 mm/memory.c                       |    2 +
 mm/migrate.c                      |    6 +-
 mm/rmap.c                         |  127 +++--
 mm/vmscan.c                       |    3 +
 10 files changed, 202 insertions(+), 12 deletions(-)

diff --git a/include/asm-generic/mman-common.h b/include/asm-generic/mman-common.h
index d030d2c..5f8090d 100644
--- a/include/asm-generic/mman-common.h
+++ b/include/asm-generic/mman-common.h
@@ -34,6 +34,9 @@
 #define MADV_SEQUENTIAL 2   /* expect sequential page references */
 #define MADV_WILLNEED   3   /* will need these pages */
 #define MADV_DONTNEED   4   /* don't need these pages */
+#define MADV_VOLATILE   5   /* pages will disappear suddenly */
+#define MADV_NOVOLATILE 6   /* pages will not disappear */
+
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_REMOVE     9   /* remove these pages & resources */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 311be90..73b8711 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -120,6 +120,12 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_PFN_AT_MMAP 0x4000  /* PFNMAP vma that is fully mapped at mmap time */
 #define VM_MERGEABLE   0x8000  /* KSM may merge identical pages */
 
+/*
+ * Recently, Konstantin removed a few flags but not merged yet
+ * so we will get a room for new flag for supporting 32 bit. Thanks, Konstantin!.
+ */
+#define VM_VOLATILE0x1
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP  (VM_RAND_READ | VM_SEQ_READ)
 
@@ -143,7 +149,7 @@ extern unsigned int kobjsize(const void *objp);
  * Special vmas that are non-mergable, non-mlock()able.
  * Note: mm/huge_memory.c VM_NO_THP depends on this definition.
  */
-#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)
+#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP | VM_VOLATILE)
 
 /*
  * mapping from the currently active vm_flags protection bits (the
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index bf78672..4ad3c8d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -279,6 +279,11 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA
struct mempolicy *vm_policy;/* NUMA policy for the VMA */
 #endif
+   /*
+* True if page in this vma is reclaimed.
+* It's protected by anon_vma->mutex.
+*/
+   bool purged;
 };
 
 struct core_thread {
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 3fce545..65b9f33 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ 

Re[2]:[RESEND PATCH] module: Fix kallsyms to show the last symbol properly

2012-10-24 Thread Masaki Kimura
Hi Rusty,

Thank you for your review of my patch and improvement for it.

>So I prefer the following fix:
I also prefer your fix from a readability point of view.
I tested your patch and confirmed that it works fine.

Best Regards,
Masaki


Re: mmotm 2012-10-24-17-15 uploaded (uml)

2012-10-24 Thread Stephen Rothwell
Hi Randy,

On Wed, 24 Oct 2012 18:17:59 -0700 Randy Dunlap  wrote:
>
> uml on x86_64 defconfig:
> 
> arch/um/drivers/chan_kern.c: In function 'tty_receive_char':
> arch/um/drivers/chan_kern.c:89:42: error: 'struct tty_struct' has no member 
> named 'raw'

Caused by commit 53c5ee2cfb4d ("TTY: move ldisc data from tty_struct:
simple members") from the tty tree in linux-next. cc's added.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au




Re: [PATCH v2] staging/comedi: Use pr_ or dev_ printks in drivers/gsc_hdpi.c

2012-10-24 Thread Toshiaki Yamane
On Thu, Oct 25, 2012 at 11:43 AM, Greg Kroah-Hartman
 wrote:
> On Thu, Oct 25, 2012 at 11:23:13AM +0900, YAMANE Toshiaki wrote:
>> fixed below checkpatch warning.
>> - WARNING: Prefer netdev_warn(netdev, ... then dev_warn(dev, ... then 
>> pr_warn(...  to printk(KERN_WARNING ...
>>
>> some of them have been replaced by dev_dbg or pr_debug,
>> and added pr_fmt.
>>
>> Signed-off-by: YAMANE Toshiaki 
>
> This patch doesn't apply, can you redo it against the next linux-next
> release and resend it?

Yes, thanks.


-- 

Regards,

YAMANE Toshiaki


Re: [PATCH v2] staging/comedi: Use pr_ or dev_ printks in drivers/gsc_hdpi.c

2012-10-24 Thread Greg Kroah-Hartman
On Thu, Oct 25, 2012 at 11:23:13AM +0900, YAMANE Toshiaki wrote:
> fixed below checkpatch warning.
> - WARNING: Prefer netdev_warn(netdev, ... then dev_warn(dev, ... then 
> pr_warn(...  to printk(KERN_WARNING ...
> 
> some of them have been replaced by dev_dbg or pr_debug,
> and added pr_fmt.
> 
> Signed-off-by: YAMANE Toshiaki 

This patch doesn't apply, can you redo it against the next linux-next
release and resend it?

thanks,

greg k-h


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread Fengguang Wu
Hi YingHang,

> Actually I've talked about it with Fengguang, he advised we should unify the
> ra_pages in struct bdi and file_ra_state and leave the issue that
> spreading data
> across disks as it is.
> Fengguang, what's you opinion about this?

Yeah the two ra_pages may run out of sync for already opened files,
which could be a problem for long opened files. However as Dave put
it, a device's max readahead size is typically a static value that can
be set at mount time. So, the question is: do you really hurt from the
old behavior that deserves this code change?

I agree with Dave that the multi-disk case is not a valid concern.  In
fact, how can the patch help that case? I mean, if it's two fuse files
lying in two disks, it *was* not a problem at all. If it's one big
file spreading to two disks, it's a too complex scheme to be
practically manageable which I doubt if you have such a setup. 

Thanks,
Fengguang


shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]

2012-10-24 Thread Dave Jones
Machine under significant load (4gb memory used, swap usage fluctuating)
triggered this...

WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
Call Trace:
 [] warn_slowpath_common+0x7f/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] shmem_getpage_gfp+0xa5c/0xa70
 [] ? shmem_getpage_gfp+0x29e/0xa70
 [] shmem_fault+0x4f/0xa0
 [] __do_fault+0x71/0x5c0
 [] ? __lock_acquire+0x306/0x1ba0
 [] ? local_clock+0x89/0xa0
 [] handle_pte_fault+0x97/0xae0
 [] ? sub_preempt_count+0x79/0xd0
 [] ? delay_tsc+0xae/0x120
 [] ? __const_udelay+0x28/0x30
 [] handle_mm_fault+0x289/0x350
 [] __do_page_fault+0x18e/0x530
 [] ? local_clock+0x89/0xa0
 [] ? get_parent_ip+0x11/0x50
 [] ? get_parent_ip+0x11/0x50
 [] ? sub_preempt_count+0x79/0xd0
 [] ? rcu_user_exit+0xc9/0xf0
 [] do_page_fault+0x2b/0x50
 [] page_fault+0x28/0x30
 [] ? copy_user_enhanced_fast_string+0x9/0x20
 [] ? sys_futimesat+0x41/0xe0
 [] ? syscall_trace_enter+0x25/0x2c0
 [] ? tracesys+0x7e/0xe6
 [] tracesys+0xe1/0xe6



1148                         error = shmem_add_to_page_cache(page, mapping, index,
1149                                                 gfp, swp_to_radix_entry(swap));
1150                         /* We already confirmed swap, and make no allocation */
1151                         VM_BUG_ON(error);
1152                 }


             total       used       free     shared    buffers     cached
Mem:       3885528    2854064    1031464          0       9624      19208
-/+ buffers/cache:    2825232    1060296
Swap:      6029308      30656    5998652




Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread YingHang Zhu
On Thu, Oct 25, 2012 at 10:12 AM, Ni zhan Chen  wrote:
> On 10/25/2012 10:04 AM, YingHang Zhu wrote:
>>
>> On Thu, Oct 25, 2012 at 9:50 AM, Dave Chinner  wrote:
>>>
>>> On Thu, Oct 25, 2012 at 08:17:05AM +0800, YingHang Zhu wrote:

 On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner 
 wrote:
>
> On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
>>
>> Hi Dave,
>> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner 
>> wrote:
>>>
>>> On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:

 Hi,
Recently we ran into the bug that an opened file's ra_pages does
 not
 synchronize with it's backing device's when the latter is changed
 with blockdev --setra, the application needs to reopen the file
 to know the change,
>>>
>>> or simply call fadvise(fd, POSIX_FADV_NORMAL) to reset the readhead
>>> window to the (new) bdi default.
>>>
 which is inappropriate under our circumstances.
>>>
>>> Which are? We don't know your circumstances, so you need to tell us
>>> why you need this and why existing methods of handling such changes
>>> are insufficient...
>>>
>>> Optimal readahead windows tend to be a physical property of the
>>> storage and that does not tend to change dynamically. Hence block
>>> device readahead should only need to be set up once, and generally
>>> that can be done before the filesystem is mounted and files are
>>> opened (e.g. via udev rules). Hence you need to explain why you need
>>> to change the default block device readahead on the fly, and why
>>> fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
>>> windows to the new defaults.
>>
>> Our system is a fuse-based file system, fuse creates a
>> pseudo backing device for the user space file systems, the default
>> readahead
>> size is 128KB and it can't fully utilize the backing storage's read
>> ability,
>> so we should tune it.
>
> Sure, but that doesn't tell me anything about why you can't do this
> at mount time before the application opens any files. i.e.  you've
> simply stated the reason why readahead is tunable, not why you need
> to be fully dynamic.

 We store our file system's data on different disks so we need to change
 ra_pages
 dynamically according to where the data resides, it can't be fixed at
 mount time
 or when we open files.
>>>
>>> That doesn't make a whole lot of sense to me. let me try to get this
>>> straight.
>>>
>>> There is data that resides on two devices (A + B), and a fuse
>>> filesystem to access that data. There is a single file in the fuse
>>> fs has data on both devices. An app has the file open, and when the
>>> data it is accessing is on device A you need to set the readahead to
>>> what is best for device A? And when the app tries to access data for
>>> that file that is on device B, you need to set the readahead to what
>>> is best for device B? And you are changing the fuse BDI readahead
>>> settings according to where the data in the back end lies?
>>>
>>> It seems to me that you should be setting the fuse readahead to the
>>> maximum of the readahead windows the data devices have configured at
>>> mount time and leaving it at that
>>
>> Then it may not fully utilize some device's read IO bandwidth and put too
>> much
>> burden on other devices.

 The abstract bdi of fuse and btrfs provides some dynamically changing
 bdi.ra_pages
 based on the real backing device. IMHO this should not be ignored.
>>>
>>> btrfs simply takes into account the number of disks it has for a
>>> given storage pool when setting up the default bdi ra_pages during
>>> mount.  This is basically doing what I suggested above.  Same with
>>> the generic fuse code - it's simply setting a sensible default value
>>> for the given fuse configuration.
>>>
>>> Neither are dynamic in the sense you are talking about, though.
>>
>> Actually I've talked about it with Fengguang, he advised we should unify
>> the
>
>
> But how can bdi related ra_pages reflect different files' readahead window?
> Maybe these different files are sequential read, random read and so on.
I think you mean the dynamic tuning of the readahead window. That's exactly
the job of the readahead algorithm, and it's reflected by file_ra_state.size
and file_ra_state.async_size.
The ra_pages in struct file_ra_state only means the maximum readahead size.
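
To illustrate the distinction, here is a simplified userspace model of how
the window grows up to that maximum. It is loosely modelled on
get_next_ra_size() in mm/readahead.c as I remember it and is not the exact
kernel code; next_ra_size() is a name made up for this sketch, and all sizes
are in pages.

```c
/*
 * Simplified model of the readahead window ramp-up.
 *   cur: current window size (file_ra_state.size)
 *   max: upper bound (file_ra_state.ra_pages)
 */
static unsigned long next_ra_size(unsigned long cur, unsigned long max)
{
	unsigned long newsize;

	if (cur < max / 16)
		newsize = 4 * cur;   /* small window: grow quickly */
	else
		newsize = 2 * cur;   /* otherwise double */

	return newsize < max ? newsize : max;   /* never exceed ra_pages */
}
```

With max = 32 pages (the 128KB default with 4KB pages), a sequential reader
starting at 4 pages goes 4, 8, 16, 32 and then stays at 32, while a random
reader never gets that far. Only the cap comes from ra_pages.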

Thanks,
Ying Zhu
>
>> ra_pages in struct bdi and file_ra_state and leave the issue that
>> spreading data
>> across disks as it is.
>> Fengguang, what's you opinion about this?
>>
>> Thanks,
>>   Ying Zhu
>>>
>>> Cheers,
>>>
>>> Dave.
>>> --
>>> Dave Chinner
>>> da...@fromorbit.com
>>
>> --
>>
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majord...@kvack.org.  For more info on Linux MM,
>> see: 

Re: [RFC 1/2] vmevent: Implement pressure attribute

2012-10-24 Thread Anton Vorontsov
Hello Pekka,

Thanks for taking a look into this!

On Wed, Oct 24, 2012 at 12:03:10PM +0300, Pekka Enberg wrote:
> On Mon, 22 Oct 2012, Anton Vorontsov wrote:
> > This patch introduces VMEVENT_ATTR_PRESSURE, the attribute reports Linux
> > virtual memory management pressure. There are three discrete levels:
> > 
> > VMEVENT_PRESSURE_LOW: Notifies that the system is reclaiming memory for
> > new allocations. Monitoring reclaiming activity might be useful for
> > maintaining overall system's cache level.
> > 
> > VMEVENT_PRESSURE_MED: The system is experiencing medium memory pressure,
> > there is some mild swapping activity. Upon this event applications may
> > decide to free any resources that can be easily reconstructed or re-read
> > from a disk.
> 
> Nit:
> 
> s/VMEVENT_PRESSURE_MED/VMEVENT_PRESSURE_MEDIUM/

Sure thing, will change.

> Other than that, I'm OK with this. Mel and others, what are your thoughts 
> on this?
> 
> Anton, have you tested this with real world scenarios?

Yup, I was mostly testing it on a desktop. I.e. in a KVM instance I was
running a full fedora17 desktop w/ a lot of apps opened. The pressure
index was pretty good in the sense that it was indeed reflecting the
sluggishness in the system during swap activity. It's not ideal, i.e. the
index might drop slightly for some time, but we are usually interested in
an "above some value" threshold, so it should be fine.

The _LOW level is defined very strictly, and cannot be tuned in any way. So
it's very solid, and that's what we mostly use for Android.

The _OOM level is also defined quite strictly, so from the API point of
view, it's also solid, and should not be a problem.

The problem with _OOM, though, is delivering the event in time (i.e. we
must be quick in predicting it, before OOMK triggers). Today the patch has
a shortcut for the _OOM level: we send the _OOM notification when the
reclaimer's priority is below the empirically found value '3' (we might make
it tunable via sysctl too, but that would expose another mm detail, although
a sysctl does not sound as bad as exposing something in the C API; we have
plenty of mm knobs in /proc/sys/vm/ already).

The real tunable is _MED level, and this should be tuned based on the
desired system's behaviour that I described in more detail in this long
post: http://lkml.org/lkml/2012/10/7/29.

Based on my observations, I wouldn't say that we have plenty of room to
tune the value, though. Usual swapping activity causes the index to rise to,
say, 30%, and when the system can't keep up, it rises to 50..90 (but we
still have plenty of swap space, so the system is far away from OOM,
although it is thrashing). Ideally I'd prefer not to have any sysctl, but I
believe the _MED level is really based on the user's definition of "medium".

> How does it stack up against Android's low memory killer, for example?

The LMK driver is effectively using what we call _LOW pressure
notifications here, so by definition it is enough to build a full
replacement for the in-kernel LMK using just the _LOW level. But in the
future, we might want to use _MED as well, e.g. kill unneeded services
based not on the cache level, but based on the pressure.

Thanks,
Anton.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] staging/comedi: Use pr_ or dev_ printks in drivers/gsc_hpdi.c

2012-10-24 Thread YAMANE Toshiaki
Fixed the checkpatch warning below.
- WARNING: Prefer netdev_warn(netdev, ... then dev_warn(dev, ... then pr_warn(...  to printk(KERN_WARNING ...

Some of the printk calls have been replaced with dev_dbg or pr_debug,
and a pr_fmt definition has been added.
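
As an illustration of what the pr_fmt definition does, the following userspace
sketch in C mimics the prefixing. The pr_fmt macro is the one from the patch.
The KBUILD_MODNAME value and the demo_pr_warn() helper are assumptions for
this example only, since in the kernel the module name comes from kbuild and
the pr_* macros write to the kernel log.

```c
#include <stdio.h>

/* Stand-in for the name kbuild would supply; assumed for this sketch. */
#define KBUILD_MODNAME "gsc_hpdi"

/* Same definition as in the patch: prepend the module name to each format. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical userspace stand-in for pr_warn(): formats into a buffer
 * instead of the kernel log. ##__VA_ARGS__ is a GNU extension, as used
 * by the kernel's own printk macros.
 */
#define demo_pr_warn(buf, len, fmt, ...) \
    snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```

This is why the converted calls in the diff drop the hand-written "gsc_hpdi: "
prefix from their format strings.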

Signed-off-by: YAMANE Toshiaki 
---
 drivers/staging/comedi/drivers/gsc_hpdi.c |   31 +++--
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/comedi/drivers/gsc_hpdi.c b/drivers/staging/comedi/drivers/gsc_hpdi.c
index abff660..31d89d5 100644
--- a/drivers/staging/comedi/drivers/gsc_hpdi.c
+++ b/drivers/staging/comedi/drivers/gsc_hpdi.c
@@ -45,6 +45,8 @@ support could be added to this driver.
 
 */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include "../comedidev.h"
 #include 
@@ -107,8 +109,7 @@ enum hpdi_registers {
 int command_channel_valid(unsigned int channel)
 {
if (channel == 0 || channel > 6) {
-   printk(KERN_WARNING
-  "gsc_hpdi: bug! invalid cable command channel\n");
+   pr_debug("bug! invalid cable command channel\n");
return 0;
}
return 1;
@@ -553,7 +554,7 @@ static int hpdi_attach(struct comedi_device *dev, struct comedi_devconfig *it)
int i;
int retval;
 
-   printk(KERN_WARNING "comedi%d: gsc_hpdi\n", dev->minor);
+   dev_dbg(dev->class_dev, "gsc_hpdi\n");
 
if (alloc_private(dev, sizeof(struct hpdi_private)) < 0)
return -ENOMEM;
@@ -582,17 +583,17 @@ static int hpdi_attach(struct comedi_device *dev, struct comedi_devconfig *it)
} while (pcidev != NULL);
}
if (dev->board_ptr == NULL) {
-   printk(KERN_WARNING "gsc_hpdi: no hpdi card found\n");
+   dev_warn(dev->class_dev, "no hpdi card found\n");
return -EIO;
}
 
-   printk(KERN_WARNING
-  "gsc_hpdi: found %s on bus %i, slot %i\n", board(dev)->name,
-  pcidev->bus->number, PCI_SLOT(pcidev->devfn));
+   dev_dbg(dev->class_dev,
+   "found %s on bus %i, slot %i\n", board(dev)->name,
+   pcidev->bus->number, PCI_SLOT(pcidev->devfn));
 
if (comedi_pci_enable(pcidev, dev->driver->driver_name)) {
-   printk(KERN_WARNING
-  " failed enable PCI device and request regions\n");
+   dev_warn(dev->class_dev,
+" failed enable PCI device and request regions\n");
return -EIO;
}
pci_set_master(pcidev);
@@ -613,7 +614,7 @@ static int hpdi_attach(struct comedi_device *dev, struct comedi_devconfig *it)
ioremap(priv(dev)->hpdi_phys_iobase,
pci_resource_len(pcidev, HPDI_BADDRINDEX));
if (!priv(dev)->plx9080_iobase || !priv(dev)->hpdi_iobase) {
-   printk(KERN_WARNING " failed to remap io memory\n");
+   dev_warn(dev->class_dev, "failed to remap io memory\n");
return -ENOMEM;
}
 
@@ -625,13 +626,13 @@ static int hpdi_attach(struct comedi_device *dev, struct comedi_devconfig *it)
/*  get irq */
if (request_irq(pcidev->irq, handle_interrupt, IRQF_SHARED,
dev->driver->driver_name, dev)) {
-   printk(KERN_WARNING
-  " unable to allocate irq %u\n", pcidev->irq);
+   dev_warn(dev->class_dev,
+"unable to allocate irq %u\n", pcidev->irq);
return -EINVAL;
}
dev->irq = pcidev->irq;
 
-   printk(KERN_WARNING " irq %u\n", dev->irq);
+   dev_dbg(dev->class_dev, "irq %u\n", dev->irq);
 
/*  allocate pci dma buffers */
for (i = 0; i < NUM_DMA_BUFFERS; i++) {
@@ -649,8 +650,8 @@ static int hpdi_attach(struct comedi_device *dev, struct comedi_devconfig *it)
   (dev)->
   dma_desc_phys_addr);
if (priv(dev)->dma_desc_phys_addr & 0xf) {
-   printk(KERN_WARNING
-  " dma descriptors not quad-word aligned (bug)\n");
+   dev_warn(dev->class_dev,
+"dma descriptors not quad-word aligned (bug)\n");
return -EIO;
}
 
-- 
1.7.9.5



linux-next: manual merge of the staging tree with the staging.current tree

2012-10-24 Thread Stephen Rothwell
Hi Greg,

Today's linux-next merge of the staging tree got a conflict in
drivers/staging/comedi/drivers/amplc_dio200.c between commit dfb2540e91e1
("staging: comedi: amplc_dio200: fix possible NULL deref during detach")
from the staging.current tree and commit 71b3e9e8dc21 ("staging: comedi:
amplc_dio200: support memory-mapped I/O") from the staging tree.

I fixed it up (the latter is a superset of the former) and can carry the
fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: linux-next: manual merge of the usb tree with the usb.current tree

2012-10-24 Thread Greg KH
On Thu, Oct 25, 2012 at 01:05:45PM +1100, Stephen Rothwell wrote:
> Hi Greg,
> 
> Today's linux-next merge of the usb tree got a conflict in
> drivers/usb/misc/ezusb.c between commit 197ef5ef37d9 ("USB: Add missing
> license tag to ezusb driver") from the usb.current tree and commit
> c30186e51e53 ("USB: ezusb: unexport some functions that aren't being
> used") from the usb tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Looks good to me, thanks.

greg k-h


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread Ni zhan Chen

On 10/25/2012 10:04 AM, YingHang Zhu wrote:
> On Thu, Oct 25, 2012 at 9:50 AM, Dave Chinner  wrote:
>> On Thu, Oct 25, 2012 at 08:17:05AM +0800, YingHang Zhu wrote:
>>> On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner  wrote:
>>>> On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
>>>>> Hi Dave,
>>>>> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner  wrote:
>>>>>> On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:
>>>>>>> Hi,
>>>>>>>   Recently we ran into the bug that an opened file's ra_pages does not
>>>>>>> synchronize with it's backing device's when the latter is changed
>>>>>>> with blockdev --setra, the application needs to reopen the file
>>>>>>> to know the change,
>>>>>>
>>>>>> or simply call fadvise(fd, POSIX_FADV_NORMAL) to reset the readhead
>>>>>> window to the (new) bdi default.
>>>>>>
>>>>>>> which is inappropriate under our circumstances.
>>>>>>
>>>>>> Which are? We don't know your circumstances, so you need to tell us
>>>>>> why you need this and why existing methods of handling such changes
>>>>>> are insufficient...
>>>>>>
>>>>>> Optimal readahead windows tend to be a physical property of the
>>>>>> storage and that does not tend to change dynamically. Hence block
>>>>>> device readahead should only need to be set up once, and generally
>>>>>> that can be done before the filesystem is mounted and files are
>>>>>> opened (e.g. via udev rules). Hence you need to explain why you need
>>>>>> to change the default block device readahead on the fly, and why
>>>>>> fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
>>>>>> windows to the new defaults.
>>>>> Our system is a fuse-based file system, fuse creates a
>>>>> pseudo backing device for the user space file systems, the default readahead
>>>>> size is 128KB and it can't fully utilize the backing storage's read ability,
>>>>> so we should tune it.
>>>>
>>>> Sure, but that doesn't tell me anything about why you can't do this
>>>> at mount time before the application opens any files. i.e.  you've
>>>> simply stated the reason why readahead is tunable, not why you need
>>>> to be fully dynamic.
>>> We store our file system's data on different disks so we need to change ra_pages
>>> dynamically according to where the data resides, it can't be fixed at mount time
>>> or when we open files.
>>
>> That doesn't make a whole lot of sense to me. let me try to get this
>> straight.
>>
>> There is data that resides on two devices (A + B), and a fuse
>> filesystem to access that data. There is a single file in the fuse
>> fs has data on both devices. An app has the file open, and when the
>> data it is accessing is on device A you need to set the readahead to
>> what is best for device A? And when the app tries to access data for
>> that file that is on device B, you need to set the readahead to what
>> is best for device B? And you are changing the fuse BDI readahead
>> settings according to where the data in the back end lies?
>>
>> It seems to me that you should be setting the fuse readahead to the
>> maximum of the readahead windows the data devices have configured at
>> mount time and leaving it at that
> Then it may not fully utilize some device's read IO bandwidth and put too much
> burden on other devices.
>>
>>> The abstract bdi of fuse and btrfs provides some dynamically changing
>>> bdi.ra_pages based on the real backing device. IMHO this should not be
>>> ignored.
>>
>> btrfs simply takes into account the number of disks it has for a
>> given storage pool when setting up the default bdi ra_pages during
>> mount.  This is basically doing what I suggested above.  Same with
>> the generic fuse code - it's simply setting a sensible default value
>> for the given fuse configuration.
>>
>> Neither are dynamic in the sense you are talking about, though.
> Actually I've talked about it with Fengguang, he advised we should unify the

But how can bdi related ra_pages reflect different files' readahead
window? Maybe these different files are sequential read, random read and
so on.

> ra_pages in struct bdi and file_ra_state and leave the issue that spreading
> data across disks as it is.
> Fengguang, what's you opinion about this?
>
> Thanks,
>   Ying Zhu
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> da...@fromorbit.com
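
Dave's suggestion in the quoted text (set the fuse readahead once, at mount
time, to the maximum of the readahead windows of the backing devices) can be
sketched in plain C as below. This is only an illustration of that selection
rule. The function name max_readahead_kb() and the array-of-KB interface are
hypothetical and are not part of fuse or the kernel.

```c
#include <stddef.h>

/*
 * Return the largest readahead window (in KB) among the backing devices.
 * Falls back to default_kb when no devices are given.
 */
static unsigned int max_readahead_kb(const unsigned int *dev_ra_kb, size_t n,
                                     unsigned int default_kb)
{
    unsigned int best;
    size_t i;

    if (n == 0)
        return default_kb;

    best = dev_ra_kb[0];
    for (i = 1; i < n; i++) {
        if (dev_ra_kb[i] > best)
            best = dev_ra_kb[i];
    }
    return best;
}
```

The objection raised in the thread still applies to this rule: a single
maximum may be larger than what some of the devices handle well.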

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: mailto:"d...@kvack.org;> em...@kvack.org 





Re: [PATCH 12/12] perf tools: Try to build Documentation when installing

2012-10-24 Thread Namhyung Kim
On Thu, Oct 25, 2012 at 6:50 AM, Arnaldo Carvalho de Melo
 wrote:
> From: Borislav Petkov 
>
> There's a portion in the "perf list" output referring to the exact
> specification of raw hardware events.
>
> Since this description is in the perf-list manpage, try to build and
> install the man pages, warning the user when that is not possible
> due to missing packages (xmlto and asciidoc).
>
> Signed-off-by: Borislav Petkov 
> Tested-by: Arnaldo Carvalho de Melo 
> Cc: Namhyung Kim 
> Link: http://lkml.kernel.org/n/tip-ij71ysszkdvz3fy3wr331...@git.kernel.org
> Signed-off-by: Arnaldo Carvalho de Melo 

Acked-by: Namhyung Kim 


Thanks,
Namhyung


linux-next: manual merge of the usb tree with the usb.current tree

2012-10-24 Thread Stephen Rothwell
Hi Greg,

Today's linux-next merge of the usb tree got a conflict in
drivers/usb/misc/ezusb.c between commit 197ef5ef37d9 ("USB: Add missing
license tag to ezusb driver") from the usb.current tree and commit
c30186e51e53 ("USB: ezusb: unexport some functions that aren't being
used") from the usb tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/usb/misc/ezusb.c
index 6589268,0a48de9..000
--- a/drivers/usb/misc/ezusb.c
+++ b/drivers/usb/misc/ezusb.c
@@@ -157,5 -162,4 +162,6 @@@ int ezusb_fx2_ihex_firmware_download(st
return ezusb_ihex_firmware_download(dev, ezusb_fx2, firmware_path);
  }
  EXPORT_SYMBOL_GPL(ezusb_fx2_ihex_firmware_download);
+ #endif
 +
 +MODULE_LICENSE("GPL");



Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread YingHang Zhu
On Thu, Oct 25, 2012 at 9:50 AM, Dave Chinner  wrote:
> On Thu, Oct 25, 2012 at 08:17:05AM +0800, YingHang Zhu wrote:
>> On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner  wrote:
>> > On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
>> >> Hi Dave,
>> >> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner  wrote:
>> >> > On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:
>> >> >> Hi,
>> >> >>   Recently we ran into the bug that an opened file's ra_pages does not
>> >> >> synchronize with it's backing device's when the latter is changed
>> >> >> with blockdev --setra, the application needs to reopen the file
>> >> >> to know the change,
>> >> >
>> >> > or simply call fadvise(fd, POSIX_FADV_NORMAL) to reset the readhead
>> >> > window to the (new) bdi default.
>> >> >
>> >> >> which is inappropriate under our circumstances.
>> >> >
>> >> > Which are? We don't know your circumstances, so you need to tell us
>> >> > why you need this and why existing methods of handling such changes
>> >> > are insufficient...
>> >> >
>> >> > Optimal readahead windows tend to be a physical property of the
>> >> > storage and that does not tend to change dynamically. Hence block
>> >> > device readahead should only need to be set up once, and generally
>> >> > that can be done before the filesystem is mounted and files are
>> >> > opened (e.g. via udev rules). Hence you need to explain why you need
>> >> > to change the default block device readahead on the fly, and why
>> >> > fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
>> >> > windows to the new defaults.
>> >> Our system is a fuse-based file system, fuse creates a
>> >> pseudo backing device for the user space file systems, the default 
>> >> readahead
>> >> size is 128KB and it can't fully utilize the backing storage's read 
>> >> ability,
>> >> so we should tune it.
>> >
>> > Sure, but that doesn't tell me anything about why you can't do this
>> > at mount time before the application opens any files. i.e.  you've
>> > simply stated the reason why readahead is tunable, not why you need
>> > to be fully dynamic.
>> We store our file system's data on different disks so we need to change 
>> ra_pages
>> dynamically according to where the data resides, it can't be fixed at mount 
>> time
>> or when we open files.
>
> That doesn't make a whole lot of sense to me. let me try to get this
> straight.
>
> There is data that resides on two devices (A + B), and a fuse
> filesystem to access that data. There is a single file in the fuse
> fs has data on both devices. An app has the file open, and when the
> data it is accessing is on device A you need to set the readahead to
> what is best for device A? And when the app tries to access data for
> that file that is on device B, you need to set the readahead to what
> is best for device B? And you are changing the fuse BDI readahead
> settings according to where the data in the back end lies?
>
> It seems to me that you should be setting the fuse readahead to the
> maximum of the readahead windows the data devices have configured at
> mount time and leaving it at that
Then it may not fully utilize some device's read IO bandwidth and put too much
burden on other devices.
>
>> The abstract bdi of fuse and btrfs provides some dynamically changing
>> bdi.ra_pages
>> based on the real backing device. IMHO this should not be ignored.
>
> btrfs simply takes into account the number of disks it has for a
> given storage pool when setting up the default bdi ra_pages during
> mount.  This is basically doing what I suggested above.  Same with
> the generic fuse code - it's simply setting a sensible default value
> for the given fuse configuration.
>
> Neither are dynamic in the sense you are talking about, though.
Actually I've talked about it with Fengguang, he advised we should unify the
ra_pages in struct bdi and file_ra_state and leave the issue that
spreading data
across disks as it is.
Fengguang, what's you opinion about this?

Thanks,
 Ying Zhu
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
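
For reference, the fadvise(fd, POSIX_FADV_NORMAL) call mentioned in the quoted
text corresponds to posix_fadvise() in userspace. A minimal sketch of an
application resetting the hint on an already-open file might look like the
following. The wrapper name reset_readahead_window() is hypothetical, and the
effect on the readahead window (falling back to the current bdi default) is as
described in the thread for Linux rather than something POSIX guarantees.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>

/*
 * Drop any per-file access-pattern hint on fd and go back to the default
 * behaviour. Offset 0 with length 0 covers the whole file.
 * Returns 0 on success or an error number on failure (posix_fadvise()
 * returns the error directly instead of setting errno).
 */
static int reset_readahead_window(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_NORMAL);
}
```

An application would call this after the administrator changes the device
setting with blockdev --setra, instead of closing and reopening the file.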


[PATCH v2 2/5] efi_pstore: Add a logic erasing entries to an erase callback

2012-10-24 Thread Seiji Aguchi
Resending this patch with the subject changed from "PATCH 2/5" to "PATCH v2 2/5".

[Issue]

Currently, the efi_pstore driver simply overwrites existing panic messages in NVRAM.
So, in the following scenario, we will lose the 1st panic messages.

 1. kernel panics.
 2. efi_pstore is kicked and writes panic messages to NVRAM.
 3. system reboots.
 4. kernel panics again before a user checks the 1st panic messages in NVRAM.

[Solution]

A reasonable solution to fix the issue is to hold multiple logs without erasing
existing entries.

This patch adds the logic for erasing existing entries, which so far lived only
in the write callback, to the erase callback.
To support holding multiple logs, the write callback doesn't need to erase any
entries, so that logic will be removed from it in a subsequent patch.

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efivars.c |   46 +++-
 1 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 37ac21a..bee14cc 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -784,7 +784,51 @@ static int efi_pstore_write(enum pstore_type_id type,
 static int efi_pstore_erase(enum pstore_type_id type, u64 id,
struct pstore_info *psi)
 {
-   efi_pstore_write(type, 0, &id, (unsigned int)id, 0, psi);
+   char stub_name[DUMP_NAME_LEN];
+   efi_char16_t efi_name[DUMP_NAME_LEN];
+   efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
+   struct efivars *efivars = psi->data;
+   struct efivar_entry *entry, *found = NULL;
+   int i;
+
+   sprintf(stub_name, "dump-type%u-%u-", type, (unsigned int)id);
+
+   spin_lock(&efivars->lock);
+
+   for (i = 0; i < DUMP_NAME_LEN; i++)
+   efi_name[i] = stub_name[i];
+
+   /*
+* Clean up any entries with the same name
+*/
+
+   list_for_each_entry(entry, &efivars->list, list) {
+   get_var_data_locked(efivars, &entry->var);
+
+   if (efi_guidcmp(entry->var.VendorGuid, vendor))
+   continue;
+   if (utf16_strncmp(entry->var.VariableName, efi_name,
+ utf16_strlen(efi_name)))
+   continue;
+   /* Needs to be a prefix */
+   if (entry->var.VariableName[utf16_strlen(efi_name)] == 0)
+   continue;
+
+   /* found */
+   found = entry;
+   efivars->ops->set_variable(entry->var.VariableName,
+  &entry->var.VendorGuid,
+  PSTORE_EFI_ATTRIBUTES,
+  0, NULL);
+   }
+
+   if (found)
+   list_del(&found->list);
+
+   spin_unlock(&efivars->lock);
+
+   if (found)
+   efivar_unregister(found);
 
return 0;
 }
-- 1.7.1


[PATCH v2 5/5] efi_pstore: Add a sequence counter to a variable name

2012-10-24 Thread Seiji Aguchi
[Issue]

Currently, a variable name, which identifies each entry, consists of type, id
and ctime.
But if multiple events happen in a short time, a second or third event may fail
to be logged because efi_pstore can't distinguish the events by the current
variable name.

[Solution]

A reasonable way to identify all events precisely is to introduce a sequence
counter into the variable name.

The sequence counter is already supported in the pstore layer as "oopscount".
So, this patch adds it to the variable name.
It is also passed to the read/erase callbacks of platform drivers, in accordance
with the modification of the variable name.

 Without the sequence counter:

 a variable name of first event: dump-type0-1-12345678
 a variable name of second event: dump-type0-1-12345678

  type:0
  id:1
  ctime:12345678

 If multiple events happen in a short time, efi_pstore can't distinguish them
 because their variable names are the same.

 With the sequence counter:

 The events become distinguishable as follows.

 a variable name of first event: dump-type0-1-1-12345678
 a variable name of second event: dump-type0-1-2-12345678

  type:0
  id:1
  sequence counter: 1(first event), 2(second event)
  ctime:12345678
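
The naming scheme above can be tried out in userspace with the same format
string that the diff below uses ("dump-type%u-%u-%d-%lu"). The sketch is only
an illustration. The helper names make_dump_name() and parse_dump_name() are
hypothetical, the DUMP_NAME_LEN value is an assumption, and the real driver
additionally converts the name to UTF-16 before handing it to the firmware.

```c
#include <stdio.h>

#define DUMP_NAME_LEN 52   /* assumed buffer size for this sketch */

/* Build "dump-type<type>-<part>-<count>-<ctime>" into name. */
static int make_dump_name(char *name, size_t len, unsigned int type,
                          unsigned int part, int count, unsigned long ctime)
{
    return snprintf(name, len, "dump-type%u-%u-%d-%lu",
                    type, part, count, ctime);
}

/* Parse a name back into its four fields; returns 1 on success, 0 otherwise. */
static int parse_dump_name(const char *name, unsigned int *type,
                           unsigned int *part, int *count,
                           unsigned long *ctime)
{
    return sscanf(name, "dump-type%u-%u-%d-%lu",
                  type, part, count, ctime) == 4;
}
```

Two events with the same type, id and ctime now get different names as long as
their sequence counters differ, and a name in the old three-field format no
longer parses as a four-field one.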

Signed-off-by: Seiji Aguchi 
---
 drivers/acpi/apei/erst.c   |   12 ++--
 drivers/firmware/efivars.c |   18 +++---
 fs/pstore/inode.c  |8 +---
 fs/pstore/internal.h   |2 +-
 fs/pstore/platform.c   |   11 ++-
 fs/pstore/ram.c|7 +++
 include/linux/pstore.h |8 +---
 7 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index 0bd6ae4..6d894bf 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -931,13 +931,13 @@ static int erst_check_table(struct acpi_table_erst *erst_tab)
 
 static int erst_open_pstore(struct pstore_info *psi);
 static int erst_close_pstore(struct pstore_info *psi);
-static ssize_t erst_reader(u64 *id, enum pstore_type_id *type,
+static ssize_t erst_reader(u64 *id, enum pstore_type_id *type, int *count,
   struct timespec *time, char **buf,
   struct pstore_info *psi);
 static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason,
-  u64 *id, unsigned int part,
+  u64 *id, unsigned int part, int count,
   size_t size, struct pstore_info *psi);
-static int erst_clearer(enum pstore_type_id type, u64 id,
+static int erst_clearer(enum pstore_type_id type, u64 id, int count,
struct timespec time, struct pstore_info *psi);
 
 static struct pstore_info erst_info = {
@@ -987,7 +987,7 @@ static int erst_close_pstore(struct pstore_info *psi)
return 0;
 }
 
-static ssize_t erst_reader(u64 *id, enum pstore_type_id *type,
+static ssize_t erst_reader(u64 *id, enum pstore_type_id *type, int *count,
   struct timespec *time, char **buf,
   struct pstore_info *psi)
 {
@@ -1055,7 +1055,7 @@ out:
 }
 
 static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason,
-  u64 *id, unsigned int part,
+  u64 *id, unsigned int part, int count,
   size_t size, struct pstore_info *psi)
 {
struct cper_pstore_record *rcd = (struct cper_pstore_record *)
@@ -1101,7 +1101,7 @@ static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason,
return ret;
 }
 
-static int erst_clearer(enum pstore_type_id type, u64 id,
+static int erst_clearer(enum pstore_type_id type, u64 id, int count,
struct timespec time, struct pstore_info *psi)
 {
return erst_clear(id);
diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 6cbeea7..dc69802 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -658,13 +658,14 @@ static int efi_pstore_close(struct pstore_info *psi)
 }
 
 static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
-  struct timespec *timespec,
+  int *count, struct timespec *timespec,
   char **buf, struct pstore_info *psi)
 {
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
struct efivars *efivars = psi->data;
char name[DUMP_NAME_LEN];
int i;
+   int cnt;
unsigned int part, size;
unsigned long time;
 
@@ -674,8 +675,10 @@ static ssize_t efi_pstore_read(u64 *id, enum pstore_type_id *type,
for (i = 0; i < DUMP_NAME_LEN; i++) {
name[i] = efivars->walk_entry->var.VariableName[i];
}
-   if (sscanf(name, "dump-type%u-%u-%lu", type, &part, &time) == 3) {
+   if (sscanf(name, "dump-type%u-%u-%d-%lu",
+  type, &part, &cnt, &time) == 4) {

[PATCH v2 4/5] efi_pstore: Add ctime to argument of erase callback

2012-10-24 Thread Seiji Aguchi
[Issue]

Currently, a variable name, which is used to identify each log entry, consists
of type, id and ctime. But the erase callback does not use ctime.

If efi_pstore supported just one log, type and id would be enough.
However, when supporting multiple logs, this doesn't work because the erase
callback can't distinguish the entries without ctime.

 As you can see below, efi_pstore can't differentiate the first event from the
 second one without ctime.

 a variable name of first event: dump-type0-1-12345678
 a variable name of second event: dump-type0-1-23456789

  type:0
  id:1
  ctime:12345678, 23456789

[Solution]

This patch adds ctime to the arguments of the erase callback.

It works across reboots because the ctime of a pstore record is the date that
the record was originally stored.
To do this, efi_pstore saves the ctime in the variable name at write time and
passes it to pstore at read time.
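
The difference between the old and new lookup can be illustrated with a small
userspace sketch in C. The old erase path built only the stub
"dump-type<type>-<id>-" and treated every longer name with that prefix as a
match, while the new one builds the full name including ctime. The helper
names matches_stub() and matches_full() are hypothetical, and the sketch
compares plain char strings with strcmp()/strncmp() rather than the UTF-16
comparison helpers used by the driver.

```c
#include <stdio.h>
#include <string.h>

/* Old scheme: match any name that starts with "dump-type<type>-<id>-". */
static int matches_stub(const char *var_name, unsigned int type,
                        unsigned int id)
{
    char stub[64];
    size_t n;

    snprintf(stub, sizeof(stub), "dump-type%u-%u-", type, id);
    n = strlen(stub);
    /* Needs to be a prefix: something has to follow the stub. */
    return strncmp(var_name, stub, n) == 0 && var_name[n] != '\0';
}

/* New scheme: match only the entry whose ctime agrees as well. */
static int matches_full(const char *var_name, unsigned int type,
                        unsigned int id, unsigned long ctime)
{
    char name[64];

    snprintf(name, sizeof(name), "dump-type%u-%u-%lu", type, id, ctime);
    return strcmp(var_name, name) == 0;
}
```

With the two example names above, matches_stub() accepts both entries, which
is why erasing one log would also remove the other, whereas matches_full()
with ctime 12345678 accepts only the first.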

Signed-off-by: Seiji Aguchi 
---
 drivers/acpi/apei/erst.c   |4 ++--
 drivers/firmware/efivars.c |   15 +++
 fs/pstore/inode.c  |3 ++-
 fs/pstore/ram.c|2 +-
 include/linux/pstore.h |2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c
index e4d9d24..0bd6ae4 100644
--- a/drivers/acpi/apei/erst.c
+++ b/drivers/acpi/apei/erst.c
@@ -938,7 +938,7 @@ static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason,
   u64 *id, unsigned int part,
   size_t size, struct pstore_info *psi);
 static int erst_clearer(enum pstore_type_id type, u64 id,
-   struct pstore_info *psi);
+   struct timespec time, struct pstore_info *psi);
 
 static struct pstore_info erst_info = {
.owner  = THIS_MODULE,
@@ -1102,7 +1102,7 @@ static int erst_writer(enum pstore_type_id type, enum kmsg_dump_reason reason,
 }
 
 static int erst_clearer(enum pstore_type_id type, u64 id,
-   struct pstore_info *psi)
+   struct timespec time, struct pstore_info *psi)
 {
return erst_clear(id);
 }
diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index fbe9202..6cbeea7 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -747,24 +747,25 @@ static int efi_pstore_write(enum pstore_type_id type,
 };
 
 static int efi_pstore_erase(enum pstore_type_id type, u64 id,
-   struct pstore_info *psi)
+   struct timespec time, struct pstore_info *psi)
 {
-   char stub_name[DUMP_NAME_LEN];
+   char name[DUMP_NAME_LEN];
efi_char16_t efi_name[DUMP_NAME_LEN];
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
struct efivars *efivars = psi->data;
struct efivar_entry *entry, *found = NULL;
int i;
 
-   sprintf(stub_name, "dump-type%u-%u-", type, (unsigned int)id);
+   sprintf(name, "dump-type%u-%u-%lu", type, (unsigned int)id,
+   time.tv_sec);
 
spin_lock(&efivars->lock);
 
for (i = 0; i < DUMP_NAME_LEN; i++)
-   efi_name[i] = stub_name[i];
+   efi_name[i] = name[i];
 
/*
-* Clean up any entries with the same name
+* Clean up an entry with the same name
 */
 
list_for_each_entry(entry, &efivars->list, list) {
@@ -775,9 +776,6 @@ static int efi_pstore_erase(enum pstore_type_id type, u64 id,
if (utf16_strncmp(entry->var.VariableName, efi_name,
  utf16_strlen(efi_name)))
continue;
-   /* Needs to be a prefix */
-   if (entry->var.VariableName[utf16_strlen(efi_name)] == 0)
-   continue;
 
/* found */
found = entry;
@@ -785,6 +783,7 @@ static int efi_pstore_erase(enum pstore_type_id type, u64 id,
   &entry->var.VendorGuid,
   PSTORE_EFI_ATTRIBUTES,
   0, NULL);
+   break;
}
 
if (found)
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 4ab572e..4300af6 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -175,7 +175,8 @@ static int pstore_unlink(struct inode *dir, struct dentry *dentry)
struct pstore_private *p = dentry->d_inode->i_private;
 
if (p->psi->erase)
-   p->psi->erase(p->type, p->id, p->psi);
+   p->psi->erase(p->type, p->id, dentry->d_inode->i_ctime,
+ p->psi);
 
return simple_unlink(dir, dentry);
 }
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 1a4f6da..749693f 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -237,7 +237,7 @@ static int notrace ramoops_pstore_write_buf(enum pstore_type_id type,
 }
 
 static int ramoops_pstore_erase(enum pstore_type_id type, u64 id,
-   struct 

[PATCH v2 3/5] efi_pstore: Remove a logic erasing entries from a write callback to hold multiple logs

2012-10-24 Thread Seiji Aguchi
[Issue]

Currently, the efi_pstore driver simply overwrites existing panic messages in NVRAM.
So, in the following scenario, we will lose the 1st panic messages.

1. kernel panics.
2. efi_pstore is kicked and writes panic messages to NVRAM.
3. system reboots.
4. kernel panics again before a user checks the 1st panic messages in NVRAM.

[Solution]

A reasonable solution to fix the issue is to hold multiple logs without erasing
existing entries.
This patch removes the logic for erasing existing entries from the write callback,
because that logic is not needed there once multiple logs are held.

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efivars.c |   39 ++-
 1 files changed, 2 insertions(+), 37 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index bee14cc..fbe9202 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -701,18 +701,13 @@ static int efi_pstore_write(enum pstore_type_id type,
unsigned int part, size_t size, struct pstore_info *psi)
 {
char name[DUMP_NAME_LEN];
-   char stub_name[DUMP_NAME_LEN];
efi_char16_t efi_name[DUMP_NAME_LEN];
efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
struct efivars *efivars = psi->data;
-   struct efivar_entry *entry, *found = NULL;
int i, ret = 0;
u64 storage_space, remaining_space, max_variable_size;
efi_status_t status = EFI_NOT_FOUND;
 
-   sprintf(stub_name, "dump-type%u-%u-", type, part);
-   sprintf(name, "%s%lu", stub_name, get_seconds());
-
spin_lock(&efivars->lock);
 
/*
@@ -730,35 +725,8 @@ static int efi_pstore_write(enum pstore_type_id type,
return -ENOSPC;
}
 
-   for (i = 0; i < DUMP_NAME_LEN; i++)
-   efi_name[i] = stub_name[i];
-
-   /*
-* Clean up any entries with the same name
-*/
-
-   list_for_each_entry(entry, &efivars->list, list) {
-   get_var_data_locked(efivars, &entry->var);
-
-   if (efi_guidcmp(entry->var.VendorGuid, vendor))
-   continue;
-   if (utf16_strncmp(entry->var.VariableName, efi_name,
- utf16_strlen(efi_name)))
-   continue;
-   /* Needs to be a prefix */
-   if (entry->var.VariableName[utf16_strlen(efi_name)] == 0)
-   continue;
-
-   /* found */
-   found = entry;
-   efivars->ops->set_variable(entry->var.VariableName,
-  &entry->var.VendorGuid,
-  PSTORE_EFI_ATTRIBUTES,
-  0, NULL);
-   }
-
-   if (found)
-   list_del(&found->list);
+   sprintf(name, "dump-type%u-%u-%lu", type, part,
+   get_seconds());
 
for (i = 0; i < DUMP_NAME_LEN; i++)
efi_name[i] = name[i];
@@ -768,9 +736,6 @@ static int efi_pstore_write(enum pstore_type_id type,
 
spin_unlock(&efivars->lock);
 
-   if (found)
-   efivar_unregister(found);
-
if (size)
ret = efivar_create_sysfs_entry(efivars,
  utf16_strsize(efi_name,
-- 1.7.1


[PATCH 2/5] efi_pstore: Add a logic erasing entries to an erase callback

2012-10-24 Thread Seiji Aguchi
[Issue]

Currently, the efi_pstore driver simply overwrites existing panic messages in NVRAM.
So, in the following scenario, we will lose the 1st panic messages.

 1. kernel panics.
 2. efi_pstore is kicked and writes panic messages to NVRAM.
 3. system reboots.
 4. kernel panics again before a user checks the 1st panic messages in NVRAM.

[Solution]

A reasonable solution to fix the issue is to hold multiple logs without erasing
existing entries.

This patch adds the logic for erasing existing entries, which so far lived only
in the write callback, to the erase callback.
To support holding multiple logs, the write callback doesn't need to erase any
entries, so that logic will be removed from it in a subsequent patch.

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efivars.c |   46 +++-
 1 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 37ac21a..bee14cc 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -784,7 +784,51 @@ static int efi_pstore_write(enum pstore_type_id type,
 static int efi_pstore_erase(enum pstore_type_id type, u64 id,
struct pstore_info *psi)
 {
-   efi_pstore_write(type, 0, &id, (unsigned int)id, 0, psi);
+   char stub_name[DUMP_NAME_LEN];
+   efi_char16_t efi_name[DUMP_NAME_LEN];
+   efi_guid_t vendor = LINUX_EFI_CRASH_GUID;
+   struct efivars *efivars = psi->data;
+   struct efivar_entry *entry, *found = NULL;
+   int i;
+
+   sprintf(stub_name, "dump-type%u-%u-", type, (unsigned int)id);
+
+   spin_lock(&efivars->lock);
+
+   for (i = 0; i < DUMP_NAME_LEN; i++)
+   efi_name[i] = stub_name[i];
+
+   /*
+* Clean up any entries with the same name
+*/
+
+   list_for_each_entry(entry, &efivars->list, list) {
+   get_var_data_locked(efivars, &entry->var);
+
+   if (efi_guidcmp(entry->var.VendorGuid, vendor))
+   continue;
+   if (utf16_strncmp(entry->var.VariableName, efi_name,
+ utf16_strlen(efi_name)))
+   continue;
+   /* Needs to be a prefix */
+   if (entry->var.VariableName[utf16_strlen(efi_name)] == 0)
+   continue;
+
+   /* found */
+   found = entry;
+   efivars->ops->set_variable(entry->var.VariableName,
+  &entry->var.VendorGuid,
+  PSTORE_EFI_ATTRIBUTES,
+  0, NULL);
+   }
+
+   if (found)
+   list_del(&found->list);
+
+   spin_unlock(&efivars->lock);
+
+   if (found)
+   efivar_unregister(found);
 
return 0;
 }
-- 1.7.1
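
For readers following the matching rules in efi_pstore_erase() above, here is a
minimal user-space sketch of the same name test: the entry name must start with
the "dump-type%u-%u-" stub and must have at least one character after it. The
helpers u16_strlen(), u16_strncmp(), ascii_to_u16() and is_dump_entry() are
hypothetical stand-ins written for this illustration, not the kernel's utf16_*
functions. The value of DUMP_NAME_LEN is an assumption, and the vendor GUID
comparison is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define DUMP_NAME_LEN 52	/* assumption: not shown in this patch */

/* Length of a NUL-terminated UTF-16 string, in code units. */
static size_t u16_strlen(const uint16_t *s)
{
	size_t n = 0;

	while (s[n])
		n++;
	return n;
}

/* Compare at most len code units; returns 0 when they are equal. */
static int u16_strncmp(const uint16_t *a, const uint16_t *b, size_t len)
{
	while (len--) {
		if (*a != *b)
			return *a < *b ? -1 : 1;
		if (*a == 0)
			return 0;
		a++;
		b++;
	}
	return 0;
}

/* Widen an ASCII string into a NUL-terminated UTF-16 buffer. */
static void ascii_to_u16(uint16_t *dst, const char *src, size_t dst_len)
{
	size_t i;

	for (i = 0; i + 1 < dst_len && src[i]; i++)
		dst[i] = (unsigned char)src[i];
	dst[i] = 0;
}

/*
 * Return 1 when name starts with stub and has something after it
 * (for example a timestamp), 0 otherwise.
 */
static int is_dump_entry(const uint16_t *name, const uint16_t *stub)
{
	size_t stub_len = u16_strlen(stub);

	if (u16_strncmp(name, stub, stub_len))
		return 0;	/* different prefix */
	if (name[stub_len] == 0)
		return 0;	/* name is exactly the stub, nothing follows */
	return 1;
}
```

With a stub of "dump-type0-1-", a name such as "dump-type0-1-1351123200" would
match, while the bare stub or a name for a different part would not.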


[PATCH v2 1/5] efi_pstore: Check remaining space with QueryVariableInfo() before writing data

2012-10-24 Thread Seiji Aguchi
[Issue]

As discussed in the thread below, running out of space in EFI isn't a
well-tested scenario, and we wouldn't expect all firmware to handle it
gracefully.
http://marc.info/?l=linux-kernel&m=134305325801789&w=2

On the other hand, the current efi_pstore doesn't check the remaining space of
the storage at write time.
Therefore, efi_pstore may not work if it tries to write a large amount of data.

[Patch Description]

To avoid hitting the situation above, this patch uses QueryVariableInfo() to
check whether there is enough space to log before writing data.

Signed-off-by: Seiji Aguchi 
---
 drivers/firmware/efivars.c |   18 ++
 include/linux/efi.h|1 +
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index d10c987..37ac21a 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -707,12 +707,29 @@ static int efi_pstore_write(enum pstore_type_id type,
struct efivars *efivars = psi->data;
struct efivar_entry *entry, *found = NULL;
int i, ret = 0;
+   u64 storage_space, remaining_space, max_variable_size;
+   efi_status_t status = EFI_NOT_FOUND;
 
sprintf(stub_name, "dump-type%u-%u-", type, part);
sprintf(name, "%s%lu", stub_name, get_seconds());
 
spin_lock(&efivars->lock);
 
+   /*
+* Check if there is a space enough to log.
+* size: a size of logging data
+* DUMP_NAME_LEN * 2: a maximum size of variable name
+*/
+   status = efivars->ops->query_variable_info(PSTORE_EFI_ATTRIBUTES,
+  &storage_space,
+  &remaining_space,
+  &max_variable_size);
+   if (status || remaining_space < size + DUMP_NAME_LEN * 2) {
+   spin_unlock(&efivars->lock);
+   *id = part;
+   return -ENOSPC;
+   }
+
for (i = 0; i < DUMP_NAME_LEN; i++)
efi_name[i] = stub_name[i];
 
@@ -1237,6 +1254,7 @@ efivars_init(void)
ops.get_variable = efi.get_variable;
ops.set_variable = efi.set_variable;
ops.get_next_variable = efi.get_next_variable;
+   ops.query_variable_info = efi.query_variable_info;
error = register_efivars(&__efivars, &ops, efi_kobj);
if (error)
goto err_put;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8670eb1..c47ec36 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -643,6 +643,7 @@ struct efivar_operations {
efi_get_variable_t *get_variable;
efi_get_next_variable_t *get_next_variable;
efi_set_variable_t *set_variable;
+   efi_query_variable_info_t *query_variable_info;
 };
 
 struct efivars {
-- 1.7.1
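
The space test added by this patch is plain arithmetic, so it can be sketched
outside the kernel. The snippet below is only an illustration of the
"remaining_space < size + DUMP_NAME_LEN * 2" comparison. pstore_has_room() is a
hypothetical name, the value of DUMP_NAME_LEN is an assumption (the patch does
not show it), and the separate check of the QueryVariableInfo() status is not
modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define DUMP_NAME_LEN 52	/* assumption: not shown in this patch */

/*
 * A record needs room for its payload plus the UTF-16 variable name
 * (DUMP_NAME_LEN code units of 2 bytes each).
 * Returns 1 when the write may proceed, 0 when it should be refused
 * (the patch returns -ENOSPC in that case).
 */
static int pstore_has_room(uint64_t remaining_space, size_t size)
{
	uint64_t needed = (uint64_t)size + DUMP_NAME_LEN * 2;

	return remaining_space >= needed;
}
```

Under these assumptions a 1024-byte record needs 1128 bytes of remaining space;
one byte less and the write is refused.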


Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread Dave Chinner
On Thu, Oct 25, 2012 at 08:17:05AM +0800, YingHang Zhu wrote:
> On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner  wrote:
> > On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
> >> Hi Dave,
> >> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner  wrote:
> >> > On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:
> >> >> Hi,
> >> >>   Recently we ran into the bug that an opened file's ra_pages does not
> >> >> synchronize with it's backing device's when the latter is changed
> >> >> with blockdev --setra, the application needs to reopen the file
> >> >> to know the change,
> >> >
> >> > or simply call fadvise(fd, POSIX_FADV_NORMAL) to reset the readhead
> >> > window to the (new) bdi default.
> >> >
> >> >> which is inappropriate under our circumstances.
> >> >
> >> > Which are? We don't know your circumstances, so you need to tell us
> >> > why you need this and why existing methods of handling such changes
> >> > are insufficient...
> >> >
> >> > Optimal readahead windows tend to be a physical property of the
> >> > storage and that does not tend to change dynamically. Hence block
> >> > device readahead should only need to be set up once, and generally
> >> > that can be done before the filesystem is mounted and files are
> >> > opened (e.g. via udev rules). Hence you need to explain why you need
> >> > to change the default block device readahead on the fly, and why
> >> > fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
> >> > windows to the new defaults.
> >> Our system is a fuse-based file system, fuse creates a
> >> pseudo backing device for the user space file systems, the default
> >> readahead size is 128KB and it can't fully utilize the backing
> >> storage's read ability, so we should tune it.
> >
> > Sure, but that doesn't tell me anything about why you can't do this
> > at mount time before the application opens any files. i.e.  you've
> > simply stated the reason why readahead is tunable, not why you need
> > to be fully dynamic.
> We store our file system's data on different disks so we need to change
> ra_pages dynamically according to where the data resides, it can't be
> fixed at mount time or when we open files.

That doesn't make a whole lot of sense to me. Let me try to get this
straight.

There is data that resides on two devices (A + B), and a fuse
filesystem to access that data. A single file in the fuse fs has
data on both devices. An app has the file open, and when the
data it is accessing is on device A you need to set the readahead to
what is best for device A? And when the app tries to access data for
that file that is on device B, you need to set the readahead to what
is best for device B? And you are changing the fuse BDI readahead
settings according to where the data in the back end lies?

It seems to me that you should be setting the fuse readahead to the
maximum of the readahead windows the data devices have configured at
mount time and leaving it at that.

> The abstract bdi of fuse and btrfs provides some dynamically changing
> bdi.ra_pages based on the real backing device. IMHO this should not be
> ignored.

btrfs simply takes into account the number of disks it has for a
given storage pool when setting up the default bdi ra_pages during
mount.  This is basically doing what I suggested above.  Same with
the generic fuse code - it's simply setting a sensible default value
for the given fuse configuration.

Neither are dynamic in the sense you are talking about, though.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH v2 0/5] efi_pstore: multiple event logging support

2012-10-24 Thread Seiji Aguchi
Changelog
v1 -> v2
   - Separate into 5 patches in accordance with Mike's comment
   - Erase an extra line of comment in patch 1/5

[Issue]

Currently, the efi_pstore driver simply overwrites existing panic messages in
NVRAM.
So, in the following scenario, we will lose the 1st panic messages.

1. kernel panics.
2. efi_pstore is kicked and writes panic messages to NVRAM.
3. system reboots.
4. kernel panics again before a user checks the 1st panic messages in NVRAM.

[Solution]

   Solutions to this problem have been discussed among Tony, Matthew, Don, Mike
   and me.

   http://marc.info/?l=linux-kernel&m=134273270704586&w=2

   There are two possible solutions right now.
     - The first is introducing some policy for overwriting existing logs.
     - The second is simply holding multiple logs without overwriting any
       entries.

   We haven't yet decided on an overwriting policy that is reasonable for all
   users.
   But I believe we agree that just holding multiple logs is a reasonable way.

   We may need further discussion to explore the possibility of introducing an
   overwriting policy, especially for getting critical messages in the
   multiple-oops case.
   But I would like to begin with an approach that is simple and reasonable to
   everyone.
   So, this patch set takes the approach of just holding multiple logs.

[Patch Description]

(1/5) efi_pstore: Check remaining space with QueryVariableInfo() before writing
      data

(2/5) efi_pstore: Add a logic erasing entries to an erase callback

(3/5) efi_pstore: Remove a logic erasing entries from a write callback to hold
      multiple logs

(4/5) efi_pstore: Add ctime to argument of erase callback

(5/5) efi_pstore: Add a sequence counter to a variable name

Detailed explanations are written in each patch.

 drivers/acpi/apei/erst.c   |   16 
 drivers/firmware/efivars.c |  100 ---
 fs/pstore/inode.c  |7 ++-
 fs/pstore/internal.h   |2 +-
 fs/pstore/platform.c   |   11 +++--
 fs/pstore/ram.c|9 ++--
 include/linux/efi.h|1 +
 include/linux/pstore.h |6 ++-
 8 files changed, 94 insertions(+), 58 deletions(-)
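
To see why keeping multiple logs is possible once the write callback stops
erasing, note that the variable name carries a timestamp, so a second panic
produces a different name from the first. The user-space sketch below uses the
"dump-type%u-%u-%lu" format that appears in the series. make_dump_name() is a
hypothetical helper, the DUMP_NAME_LEN value is an assumption, and the sequence
counter from patch 5/5 is not modelled.

```c
#include <stddef.h>
#include <stdio.h>

#define DUMP_NAME_LEN 52	/* assumption: not shown in the patches here */

/*
 * Build "dump-type<type>-<part>-<timestamp>" into name.
 * Returns snprintf()'s result: the length the full name needs,
 * which is >= len when the name was truncated.
 */
static int make_dump_name(char *name, size_t len, unsigned int type,
			  unsigned int part, unsigned long timestamp)
{
	return snprintf(name, len, "dump-type%u-%u-%lu", type, part, timestamp);
}
```

Two panics one minute apart for the same type and part would then yield two
distinct names that share only the "dump-type0-1-" prefix.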



Re: [PATCH] mm: readahead: remove redundant ra_pages in file_ra_state

2012-10-24 Thread Ni zhan Chen

On 10/25/2012 08:17 AM, YingHang Zhu wrote:
> On Thu, Oct 25, 2012 at 4:19 AM, Dave Chinner  wrote:
>> On Wed, Oct 24, 2012 at 07:53:59AM +0800, YingHang Zhu wrote:
>>> Hi Dave,
>>> On Wed, Oct 24, 2012 at 6:47 AM, Dave Chinner  wrote:
>>>> On Tue, Oct 23, 2012 at 08:46:51PM +0800, Ying Zhu wrote:
>>>>> Hi,
>>>>>    Recently we ran into the bug that an opened file's ra_pages does not
>>>>> synchronize with it's backing device's when the latter is changed
>>>>> with blockdev --setra, the application needs to reopen the file
>>>>> to know the change,
>>>>
>>>> or simply call fadvise(fd, POSIX_FADV_NORMAL) to reset the readhead
>>>> window to the (new) bdi default.
>>>>
>>>>> which is inappropriate under our circumstances.
>>>>
>>>> Which are? We don't know your circumstances, so you need to tell us
>>>> why you need this and why existing methods of handling such changes
>>>> are insufficient...
>>>>
>>>> Optimal readahead windows tend to be a physical property of the
>>>> storage and that does not tend to change dynamically. Hence block
>>>> device readahead should only need to be set up once, and generally
>>>> that can be done before the filesystem is mounted and files are
>>>> opened (e.g. via udev rules). Hence you need to explain why you need
>>>> to change the default block device readahead on the fly, and why
>>>> fadvise(POSIX_FADV_NORMAL) is "inappropriate" to set readahead
>>>> windows to the new defaults.
>>>
>>> Our system is a fuse-based file system, fuse creates a
>>> pseudo backing device for the user space file systems, the default readahead
>>> size is 128KB and it can't fully utilize the backing storage's read ability,
>>> so we should tune it.
>>
>> Sure, but that doesn't tell me anything about why you can't do this
>> at mount time before the application opens any files. i.e.  you've
>> simply stated the reason why readahead is tunable, not why you need
>> to be fully dynamic.
>
> We store our file system's data on different disks so we need to change ra_pages
> dynamically according to where the data resides, it can't be fixed at mount time
> or when we open files.
> The abstract bdi of fuse and btrfs provides some dynamically changing
> bdi.ra_pages
> based on the real backing device. IMHO this should not be ignored.


And how should ra_pages be tuned if one big file is distributed across
different disks? I think Fengguang Wu can answer these questions.


Hi Fengguang,


>>> The above third-party application using our file system maintains
>>> some long-opened files, we does not have any chances
>>> to force them to call fadvise(POSIX_FADV_NORMAL). :(
>>
>> So raise a bug/feature request with the third party.  Modifying
>> kernel code because you can't directly modify the application isn't
>> the best solution for anyone. This really is an application problem
>> - the kernel already provides the mechanisms to solve this
>> problem...  :/
>
> Thanks for advice, I will consult the above application's developers
> for more information.
> Now from the code itself should we merge the gap between the real
> device's ra_pages and the file's?
> Obviously the ra_pages is duplicated, otherwise each time we run into this
> problem, someone will do the same work as I have done here.
>
> Thanks,
>   Ying Zhu
>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> da...@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org





Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

2012-10-24 Thread Jiang Liu
On 2012-10-25 9:31, Tang Chen wrote:
> Hi Toshi,
> 
> On 10/25/2012 01:14 AM, Toshi Kani wrote:
>> On Wed, 2012-10-24 at 14:05 +0800, Tang Chen wrote:
>>> +static int container_device_remove(struct acpi_device *device)
>>> +{
>>> +int ret;
>>> +struct acpi_eject_event *ej_event;
>>> +
>>> +/* stop container device at first */
>>> +ret = acpi_bus_trim(device, 0);
>>
>> Hi Tang,
>>
>> Why do you need to call acpi_bus_trim(device,0) to stop the container
>> device first?
> 
> This issue was introduced by Lu Yinghai, I think he could give a better
> answer than me. :)
> Please refer to the following url:
> 
> http://www.spinics.net/lists/linux-pci/msg17667.html
> 
> However, this is not applied into the pci tree yet.
We have worked out a patch set to clean up the logic for PCI/ACPI binding
relationship. It updates PCI/ACPI binding relationship by registering bus
notification onto pci_bus_type instead of hooking into the ACPI/glue.c.

To accommodate that patch set, the ACPI device destroy process has been
split into two steps:
1) acpi_bus_trim(device,0) to unbind ACPI drivers
2) acpi_bus_trim(device,1) to destroy ACPI devices

> 
>>
>>> +printk(KERN_WARNING "acpi_bus_trim stop return %x\n", ret);
>>
>> Do you need this message in the normal case?  If so, I'd suggest to use
>> pr_debug().
>>
>>> +if (ret)
>>> +return ret;
>>> +
>>> +/* event originated from ACPI eject notification */
>>> +device->flags.eject_pending = 1;
>>
>> You do not need to set the eject_pending flag when the handler calls
>> acpi_bus_hot_remove_device().  It was set before because the handler did
>> not initiate the hot-remove operation.
> 
> I just set it to keep the logic the same as before.
> And thanks for telling me this. :)
> 
>>
> ...
>>> +printk(KERN_WARNING "Container driver received %s event\n",
>>> +"ACPI_NOTIFY_EJECT_REQUEST");
>>
>> Same as other comment.  Suggest to use pr_debug().
> 
> OK. :)
> 
>>
>>> +
>>> +if (!present || ACPI_FAILURE(status) || !device)
>>> +break;
>>> +
>>> +result = container_device_remove(device);
>>> +if (result) {
>>> +printk(KERN_WARNING "Failed to remove container\n");
>>
>> Please use pr_warn().
>>
>> Thanks,
>> -Toshi
>>
>>> +break;
>>>   }
>>> -break;
>>> +
>>> +return;
>>>
>>>   default:
>>>   /* non-hotplug event; possibly handled by other handler */
>>
>>


Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)

2012-10-24 Thread Nix
On 25 Oct 2012, Theodore Ts'o stated:

> On Thu, Oct 25, 2012 at 12:27:02AM +0100, Nix wrote:
>>
>>  - /sbin/reboot -f of running system
>>-> Journal replay, no problems other than the expected free block
>>   count problems. This is not such a severe problem after all!
>> 
>>  - Normal shutdown, but a 60 second pause after lazy umount, more than
>>long enough for all umounts to proceed to termination
>>-> no corruption, but curiously /home experienced a journal replay
>>   before being fscked, even though a cat of /proc/mounts after
>>   umounting revealed that the only mounted filesystem was /,
>>   read-only, so /home should have been clean
>
> Question: how are you doing the journal replay?  Is it happening as
> part of running e2fsck, or are you mounting the file system and
> letting kernel do the journal replay?

This most recent instance was e2fsck. Normally, it's mount. Both
seem able to yield the same corruption.

> Also, can you reproduce the problem with the nobarrier and
> journal_async_commit options *removed*?  Yes, I know you have battery
> backup, but it would be interesting to see if the problem shows up in
> the default configuration with none of the more specialist options.
> (So it would probably be good to test with journal_checksum removed as
> well.)

I'll try that, hopefully tomorrow sometime. It's 2:30am now and probably
time to sleep.

>> Unfortunately, the massive corruption in the last testcase was seen in
>> 3.6.1 as well as 3.6.3: it appears that the only effect that superblock
>> change had in 3.6.3 was to make this problem easier to hit, and that the
>> bug itself was introduced probably somewhere between 3.5 and 3.6 (though
>> I only rebooted 3.5.x twice, and it's rare enough before 3.6.[23], at
>> ~1/20 boots, that it may have been present for longer and I never
>> noticed).
>
> Hmm ok.  Can you tell whether or not the 2nd patch I posted on
> this thread made any difference to how frequently it happened?  The

Well, I had a couple of reboots without corruption with that patch
applied, and /home was only ever corrupted with it not applied -- but
that could perfectly well be chance, since I only had two or three
instances of /home corruption so far, thank goodness.

> When you say it's rare before 3.6.[23], how rare is it?  How reliably
> can you trigger it under 3.6.1?  One in 3?  One in 5?  One in 20?

I've rebooted out of 3.6.1 about fifteen times so far. I've seen one
instance of corruption. I've never seen it before 3.6, but I only
rebooted 3.5.x or 3.4.x once or twice in total, so that too could be
chance.

> As far as bisecting, one experiment that I'd really appreciate your
> doing is to check and see whether you can reproduce the problem using
> the 3.4 kernel, and if you can, to see if it reproduces under the 3.3
> kernel.

Will try. It might be the weekend before I can find the time though :(

> The reason why I ask this is there were not any major changes between
> 3.5 and 3.6, or between 3.4 and 3.5.  There *were* however, some
> fairly major changes made by Jan Kara that were introduced between 3.3
> and 3.4.  Among other things, this is where we started using FUA
> (Force Unit Attention) writes to update the journal superblock instead
> of just using REQ_FLUSH.  This is in fact the most likely place where
> we might have introduced the regression, since it wouldn't surprise me
> if Jan didn't test the case of using nobarrier with a storage array
> with battery backup (I certainly didn't, since I don't have easy
> access to such fancy toys :-).

Hm. At boot, I see this for both volumes on the Areca controller:

[0.855376] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[0.855465] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

So it looks to me like FUA changes themselves could have little effect.

(btw, the controller cost only about £150... if it was particularly
fancy I certainly couldn't have afforded it.)

>> It also appears impossible for me to reliably shut my system down,
>> though a 60s timeout after lazy umount and before reboot is likely to
>> work in all but the most pathological of cases (where a downed NFS
>> server comes up at just the wrong instant): it is clear that the
>> previous 5s timeout eventually became insufficient simply because of the
>> amount of time it can take to do a umount on today's larger filesystems.
>
> Something that you might want to consider trying is after you kill all
> of the processes, remount all of the local disk file systems
> read-only, then kick off the unmount of the NFS file systems (just to
> be nice to the NFS servers, so they are notified of the unmount), and

Actually I umount NFS first of all, because if I kill the processes
first, this causes trouble with the NFS unmounts, particularly if I'm
doing self-mounting (which I do sometimes, though not at the moment).

I will certainly try 

[BUGFIX] PCI/PM: Fix proc config reg access for D3cold and bridge suspending

2012-10-24 Thread Huang Ying
In

  https://bugzilla.kernel.org/show_bug.cgi?id=48981

Peter reported that /proc/bus/pci/??/??.? does not work for 3.6.
This is because the device configuration space registers will
not be accessible if the corresponding parent bridge is suspended or
the device is put into D3cold state.

This is the same as the /sys/bus/pci/devices/:??:??.?/config access
issue.  So the function used to solve the sysfs issue is used to solve
this issue.

Cc: sta...@vger.kernel.org
Reported-by: Peter 
Signed-off-by: Huang Ying 
---
 drivers/pci/pci-sysfs.c |   34 --
 drivers/pci/pci.c   |   32 
 drivers/pci/pci.h   |2 ++
 drivers/pci/proc.c  |8 
 4 files changed, 42 insertions(+), 34 deletions(-)

--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -458,40 +458,6 @@ boot_vga_show(struct device *dev, struct
 }
 struct device_attribute vga_attr = __ATTR_RO(boot_vga);
 
-static void
-pci_config_pm_runtime_get(struct pci_dev *pdev)
-{
-   struct device *dev = &pdev->dev;
-   struct device *parent = dev->parent;
-
-   if (parent)
-   pm_runtime_get_sync(parent);
-   pm_runtime_get_noresume(dev);
-   /*
-* pdev->current_state is set to PCI_D3cold during suspending,
-* so wait until suspending completes
-*/
-   pm_runtime_barrier(dev);
-   /*
-* Only need to resume devices in D3cold, because config
-* registers are still accessible for devices suspended but
-* not in D3cold.
-*/
-   if (pdev->current_state == PCI_D3cold)
-   pm_runtime_resume(dev);
-}
-
-static void
-pci_config_pm_runtime_put(struct pci_dev *pdev)
-{
-   struct device *dev = &pdev->dev;
-   struct device *parent = dev->parent;
-
-   pm_runtime_put(dev);
-   if (parent)
-   pm_runtime_put_sync(parent);
-}
-
 static ssize_t
 pci_read_config(struct file *filp, struct kobject *kobj,
struct bin_attribute *bin_attr,
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1858,6 +1858,38 @@ bool pci_dev_run_wake(struct pci_dev *de
 }
 EXPORT_SYMBOL_GPL(pci_dev_run_wake);
 
+void pci_config_pm_runtime_get(struct pci_dev *pdev)
+{
+   struct device *dev = &pdev->dev;
+   struct device *parent = dev->parent;
+
+   if (parent)
+   pm_runtime_get_sync(parent);
+   pm_runtime_get_noresume(dev);
+   /*
+* pdev->current_state is set to PCI_D3cold during suspending,
+* so wait until suspending completes
+*/
+   pm_runtime_barrier(dev);
+   /*
+* Only need to resume devices in D3cold, because config
+* registers are still accessible for devices suspended but
+* not in D3cold.
+*/
+   if (pdev->current_state == PCI_D3cold)
+   pm_runtime_resume(dev);
+}
+
+void pci_config_pm_runtime_put(struct pci_dev *pdev)
+{
+   struct device *dev = &pdev->dev;
+   struct device *parent = dev->parent;
+
+   pm_runtime_put(dev);
+   if (parent)
+   pm_runtime_put_sync(parent);
+}
+
 /**
  * pci_pm_init - Initialize PM functions of given PCI device
  * @dev: PCI device to handle.
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -72,6 +72,8 @@ extern void pci_disable_enabled_device(s
 extern int pci_finish_runtime_suspend(struct pci_dev *dev);
 extern int __pci_pme_wakeup(struct pci_dev *dev, void *ign);
 extern void pci_wakeup_bus(struct pci_bus *bus);
+extern void pci_config_pm_runtime_get(struct pci_dev *dev);
+extern void pci_config_pm_runtime_put(struct pci_dev *dev);
 extern void pci_pm_init(struct pci_dev *dev);
 extern void platform_pci_wakeup_init(struct pci_dev *dev);
 extern void pci_allocate_cap_save_buffers(struct pci_dev *dev);
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -76,6 +76,8 @@ proc_bus_pci_read(struct file *file, cha
if (!access_ok(VERIFY_WRITE, buf, cnt))
return -EINVAL;
 
+   pci_config_pm_runtime_get(dev);
+
if ((pos & 1) && cnt) {
unsigned char val;
pci_user_read_config_byte(dev, pos, &val);
@@ -121,6 +123,8 @@ proc_bus_pci_read(struct file *file, cha
cnt--;
}
 
+   pci_config_pm_runtime_put(dev);
+
*ppos = pos;
return nbytes;
 }
@@ -146,6 +150,8 @@ proc_bus_pci_write(struct file *file, co
if (!access_ok(VERIFY_READ, buf, cnt))
return -EINVAL;
 
+   pci_config_pm_runtime_get(dev);
+
if ((pos & 1) && cnt) {
unsigned char val;
__get_user(val, buf);
@@ -191,6 +197,8 @@ proc_bus_pci_write(struct file *file, co
cnt--;
}
 
+   pci_config_pm_runtime_put(dev);
+
*ppos = pos;
i_size_write(ino, dp->size);
return nbytes;
--

Re: [PATCH v2 2/2] Improve container_notify_cb() to support container hot-remove.

2012-10-24 Thread Tang Chen

Hi Toshi,

On 10/25/2012 01:14 AM, Toshi Kani wrote:
> On Wed, 2012-10-24 at 14:05 +0800, Tang Chen wrote:
>> +static int container_device_remove(struct acpi_device *device)
>> +{
>> +   int ret;
>> +   struct acpi_eject_event *ej_event;
>> +
>> +   /* stop container device at first */
>> +   ret = acpi_bus_trim(device, 0);
>
> Hi Tang,
>
> Why do you need to call acpi_bus_trim(device,0) to stop the container
> device first?

This issue was introduced by Lu Yinghai, I think he could give a better
answer than me. :)
Please refer to the following url:

http://www.spinics.net/lists/linux-pci/msg17667.html

However, this is not applied into the pci tree yet.

>
>> +   printk(KERN_WARNING "acpi_bus_trim stop return %x\n", ret);
>
> Do you need this message in the normal case?  If so, I'd suggest to use
> pr_debug().
>
>> +   if (ret)
>> +   return ret;
>> +
>> +   /* event originated from ACPI eject notification */
>> +   device->flags.eject_pending = 1;
>
> You do not need to set the eject_pending flag when the handler calls
> acpi_bus_hot_remove_device().  It was set before because the handler did
> not initiate the hot-remove operation.

I just set it to keep the logic the same as before.
And thanks for telling me this. :)

>
...
>> +   printk(KERN_WARNING "Container driver received %s event\n",
>> +   "ACPI_NOTIFY_EJECT_REQUEST");
>
> Same as other comment.  Suggest to use pr_debug().

OK. :)

>
>> +
>> +   if (!present || ACPI_FAILURE(status) || !device)
>> +   break;
>> +
>> +   result = container_device_remove(device);
>> +   if (result) {
>> +   printk(KERN_WARNING "Failed to remove container\n");
>
> Please use pr_warn().
>
> Thanks,
> -Toshi
>
>> +   break;
>>   }
>> -   break;
>> +
>> +   return;
>>
>>   default:
>>   /* non-hotplug event; possibly handled by other handler */




