RE: [PATCH] x86/PAT: priority the PAT warn to error to highlight the developer

2019-09-30 Thread Zhang, Jun
Please see my comments.

Thanks,
Jun

-Original Message-
From: Borislav Petkov  
Sent: Monday, September 30, 2019 8:03 PM
To: Zhang, Jun 
Cc: dave.han...@linux.intel.com; l...@kernel.org; pet...@infradead.org; 
t...@linutronix.de; mi...@redhat.com; h...@zytor.com; He, Bo ; 
x...@kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/PAT: priority the PAT warn to error to highlight the 
developer

On Sun, Sep 29, 2019 at 03:20:31PM +0800, jun.zh...@intel.com wrote:
> From: zhang jun 
> 
> Documentation/x86/pat.txt says:
> set_memory_uc() or set_memory_wc() must use together with 
> set_memory_wb()

I had to open that file to see what it actually says - btw, the filename is 
pat.rst now - and you're very heavily paraphrasing what is there. So try again 
explaining what the requirement is.
[ZJ] The following text comes from pat.txt in kernel version 4.19:
Drivers wanting to export some pages to userspace do it by using mmap
interface and a combination of
1) pgprot_noncached()
2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn()

With PAT support, a new API pgprot_writecombine is being added. So, drivers can
continue to use the above sequence, with either pgprot_noncached() or
pgprot_writecombine() in step 1, followed by step 2.

In addition, step 2 internally tracks the region as UC or WC in memtype
list in order to ensure no conflicting mapping.

Note that this set of APIs only works with IO (non RAM) regions. If driver
wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
as step 0 above and also track the usage of those pages and use set_memory_wb()
before the page is freed to free pool.

> if break the PAT attribute, there are tons of warning like:
> [   45.846872] x86/PAT: NDK MediaCodec_:3753 map pfn RAM range req

That's some android NDK thing, it seems: "The Android NDK is a toolset that 
lets you implement parts of your app in native code,... " lemme guess, they 
have a kernel module?
[ZJ] No, "NDK MediaCodec_" is an Android codec2.0 process. It wants to use WC 
memory.

> write-combining for [mem 0x1e7a8-0x1e7a87fff], got write-back and 
> in the extremely case, we see kernel panic unexpected like:
> list_del corruption. prev->next should be 88806dbe69c0, but was 
> 888036f048c0

This is not really helpful. You need to explain what exactly you're doing - not 
shortening the error messages.
[ZJ] Android codec2.0 wants to use WC memory, and it uses ion to allocate that 
memory. So we enabled drivers/staging/android/ion, which works well everywhere 
except on x86, where set_memory_wc() has to be called.
As a result there are tons of warnings, followed by list_del corruption. With 
this patch (https://www.lkml.org/lkml/2019/9/29/25) applied, the list crash 
disappears.
The error messages follow.
<4>[49967.389732] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req write-combining for [mem 0x1091f8000-0x1091f8fff], got write-back
<4>[49967.389747] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req write-combining for [mem 0x1091f4000-0x1091f7fff], got write-back
<4>[49967.390622] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req write-combining for [mem 0x10909-0x109090fff], got write-back
<4>[49967.390687] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req write-combining for [mem 0x1091f8000-0x1091fbfff], got write-back
<4>[49967.390855] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req write-combining for [mem 0x1091f4000-0x1091f4fff], got write-back
<4>[49967.391405] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req write-combining for [mem 0x109098000-0x109098fff], got write-back
<4>[49967.391454] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req write-combining for [mem 0x1091f8000-0x1091f8fff], got write-back
<4>[49967.391474] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req write-combining for [mem 0x1091f4000-0x1091f7fff], got write-back
<4>[49967.392641] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req write-combining for [mem 0x14eb68000-0x14eb68fff], got write-back
<4>[49967.392708] x86/PAT: .vorbis.decoder:10606 map pfn RAM range req write-combining for [mem 0x1091f8000-0x1091fbfff], got write-back
<4>[49967.393001] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req write-combining for [mem 0x1091f4000-0x1091f4fff], got write-back
<4>[49967.394066] x86/PAT: NDK MediaCodec_:10602 map pfn RAM range req write-combining for [mem 0x1091f8000-0x1091f8fff], got write-back
<6>[50045.677129] binder: 3390:3390 transaction failed 29189/-22, size 88-0 line 3131
<3>[50046.153621] list_del corruption. prev->next should be 89598004c960, but was 895ad46e4590
<4>[50046.163464] invalid opcode:  [#1] PREEMPT SMP NOPTI
<4>[50046.169297] CPU: 1 PID: 18792 Comm: Binder:3390_1B Tainted: G U O 4.19.68-PKT-190905T163945Z-00031-g9de920e66b4e #1
<4>[50046.182213] RIP: 0010:__list_del_entry_valid+0x78/0x

RE: [PATCH 3.18 132/134] rcu: Do RCU GP kthread self-wakeup from softirq and interrupt

2019-03-27 Thread Zhang, Jun
Hello, Paul

Yes, I only tested with the original V3.18.136. Because the system ran very 
slowly, I gave up.

Device: NUC (made in 2017)
OS: Ubuntu 16.04
Kernel: V3.18.136 (from 
https://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18.136/) 
The system boots, but it runs very slowly and cannot connect to the network.
It is too slow for me to work in a terminal, 
so I did not test V3.18.136+patch.


-Original Message-
From: Paul E. McKenney [mailto:paul...@linux.ibm.com] 
Sent: Wednesday, March 27, 2019 23:10
To: Zhang, Jun 
Cc: He, Bo ; Greg Kroah-Hartman ; 
linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Xiao, Jin 
; Bai, Jie A 
Subject: Re: [PATCH 3.18 132/134] rcu: Do RCU GP kthread self-wakeup from 
softirq and interrupt

Hello, Jun,

Do you see the same hang without the patch?

Thanx, Paul

On Wed, Mar 27, 2019 at 01:50:57AM +, Zhang, Jun wrote:
> Hello, Paul
> 
> I tested on a new NUC (made in 2017) and hit the same hang. The system runs 
> very slowly.
> 
> But on my PC (made before 2015), V3.18.136 + patch has been tested for 12 
> hours and is still fine.
> 
> Maybe V3.18.y doesn't support some newer devices.
> 
> 
> -Original Message-
> From: Paul E. McKenney [mailto:paul...@linux.ibm.com]
> Sent: Tuesday, March 26, 2019 23:56
> To: He, Bo 
> Cc: Greg Kroah-Hartman ; 
> linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Zhang, Jun 
> ; Xiao, Jin ; Bai, Jie A 
> 
> Subject: Re: [PATCH 3.18 132/134] rcu: Do RCU GP kthread self-wakeup 
> from softirq and interrupt
> 
> On Tue, Mar 26, 2019 at 08:43:45AM +, He, Bo wrote:
> > Hi, Paul:
> > I tried this on my PC and did not hit any hang with the RCU torture test 
> > in one hour. The configuration is:
> > OS: ubuntu 16.04
> > kernel: 3.18.136 + 3.18 rcu patch
> > CPU:  Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz
> 
> Sounds good, please proceed!
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: Tuesday, March 26, 2019 12:00 AM
> > To: Greg Kroah-Hartman 
> > Cc: He, Bo ; linux-kernel@vger.kernel.org; 
> > sta...@vger.kernel.org; Zhang, Jun ; Xiao, Jin 
> > ; Bai, Jie A 
> > Subject: Re: [PATCH 3.18 132/134] rcu: Do RCU GP kthread self-wakeup 
> > from softirq and interrupt
> > 
> > On Sat, Mar 23, 2019 at 07:33:15AM +0100, Greg Kroah-Hartman wrote:
> > > On Fri, Mar 22, 2019 at 04:00:17PM +, He, Bo wrote:
> > > > Hi, Greg:
> > > > Can you hold on the 3.18-stable branch, it seems there are some 
> > > > issue, please see the comments from Paul:
> > > > 
> > > > Comments from Paul:
> > > > I subjected all of the others to light rcutorture testing, which 
> > > > they passed.  This v3.18 patch hung, however.  Trying it again 
> > > > with stock
> > > > v3.18 got the same hang, so I believe we can exonerate the patch and 
> > > > give it a good firm "maybe" on 3.18.
> > > > 
> > > > Worth paying special attention to further test results from 3.18.x, 
> > > > though!
> > > 
> > > Ok, I've dropped this from the 3.18.y queue now, thanks.
> > 
> > Bo, if you know of a "y" for 3.18.y that would likely pass rcutorture 
> > testing, please let me know.
> > 
> > Thanx, Paul
> > 
> 



RE: [PATCH 3.18 132/134] rcu: Do RCU GP kthread self-wakeup from softirq and interrupt

2019-03-26 Thread Zhang, Jun
Hello,Paul

I tested on a new NUC (made in 2017) and hit the same hang. The system runs 
very slowly.

But on my PC (made before 2015), V3.18.136 + patch has been tested for 12 
hours and is still fine.

Maybe V3.18.y doesn't support some newer devices.


-Original Message-
From: Paul E. McKenney [mailto:paul...@linux.ibm.com] 
Sent: Tuesday, March 26, 2019 23:56
To: He, Bo 
Cc: Greg Kroah-Hartman ; 
linux-kernel@vger.kernel.org; sta...@vger.kernel.org; Zhang, Jun 
; Xiao, Jin ; Bai, Jie A 

Subject: Re: [PATCH 3.18 132/134] rcu: Do RCU GP kthread self-wakeup from 
softirq and interrupt

On Tue, Mar 26, 2019 at 08:43:45AM +, He, Bo wrote:
> Hi, Paul:
>   I tried this on my PC and did not hit any hang with the RCU torture test 
> in one hour. The configuration is:
> OS: ubuntu 16.04
> kernel: 3.18.136 + 3.18 rcu patch
> CPU:  Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz

Sounds good, please proceed!

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney 
> Sent: Tuesday, March 26, 2019 12:00 AM
> To: Greg Kroah-Hartman 
> Cc: He, Bo ; linux-kernel@vger.kernel.org; 
> sta...@vger.kernel.org; Zhang, Jun ; Xiao, Jin 
> ; Bai, Jie A 
> Subject: Re: [PATCH 3.18 132/134] rcu: Do RCU GP kthread self-wakeup 
> from softirq and interrupt
> 
> On Sat, Mar 23, 2019 at 07:33:15AM +0100, Greg Kroah-Hartman wrote:
> > On Fri, Mar 22, 2019 at 04:00:17PM +, He, Bo wrote:
> > > Hi, Greg:
> > >   Can you hold on the 3.18-stable branch, it seems there are some issue, 
> > > please see the comments from Paul:
> > > 
> > > Comments from Paul:
> > > I subjected all of the others to light rcutorture testing, which 
> > > they passed.  This v3.18 patch hung, however.  Trying it again 
> > > with stock
> > > v3.18 got the same hang, so I believe we can exonerate the patch and give 
> > > it a good firm "maybe" on 3.18.
> > > 
> > > Worth paying special attention to further test results from 3.18.x, 
> > > though!
> > 
> > Ok, I've dropped this from the 3.18.y queue now, thanks.
> 
> Bo, if you know of a "y" for 3.18.y that would likely pass rcutorture 
> testing, please let me know.
> 
>   Thanx, Paul
> 



[tip:core/rcu] rcu: Do RCU GP kthread self-wakeup from softirq and interrupt

2019-02-13 Thread tip-bot for Zhang, Jun
Commit-ID:  1d1f898df6586c5ea9aeaf349f13089c6fa37903
Gitweb: https://git.kernel.org/tip/1d1f898df6586c5ea9aeaf349f13089c6fa37903
Author: Zhang, Jun 
AuthorDate: Tue, 18 Dec 2018 06:55:01 -0800
Committer:  Paul E. McKenney 
CommitDate: Fri, 25 Jan 2019 15:29:59 -0800

rcu: Do RCU GP kthread self-wakeup from softirq and interrupt

The rcu_gp_kthread_wake() function is invoked when it might be necessary
to wake the RCU grace-period kthread.  Because self-wakeups are normally
a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
this kthread, it naturally refuses to do the wakeup.

Unfortunately, natural though it might be, this heuristic fails when
rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
that interrupted the grace-period kthread just after the final check of
the wait-event condition but just before the schedule() call.  In this
case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
is within the RCU grace-period kthread's context.  Failing to provide
this wakeup can result in grace periods failing to start, which in turn
results in out-of-memory conditions.

This race window is quite narrow, but it actually did happen during real
testing.  It would of course need to be fixed even if it was strictly
theoretical in nature.

This patch does not Cc stable because it does not apply cleanly to
earlier kernel versions.

Fixes: 48a7639ce80c ("rcu: Make callers awaken grace-period kthread")
Reported-by: "He, Bo" 
Co-developed-by: "Zhang, Jun" 
Co-developed-by: "He, Bo" 
Co-developed-by: "xiao, jin" 
Co-developed-by: Bai, Jie A 
Signed-off: "Zhang, Jun" 
Signed-off: "He, Bo" 
Signed-off: "xiao, jin" 
Signed-off: Bai, Jie A 
Signed-off-by: "Zhang, Jun" 
[ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
  !in_serving_softirq() to avoid redundant wakeups and to also handle the
  interrupt-handler scenario as well as the softirq-handler scenario that
  actually occurred in testing. ]
Signed-off-by: Paul E. McKenney 
Link: 
https://lkml.kernel.org/r/cd6925e8781efd4d8e11882d20fc406d52a11...@shsmsx104.ccr.corp.intel.com
---
 kernel/rcu/tree.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9ceb93f848cd..21775eebb8f0 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1593,15 +1593,23 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
 }
 
 /*
- * Awaken the grace-period kthread.  Don't do a self-awaken, and don't
- * bother awakening when there is nothing for the grace-period kthread
- * to do (as in several CPUs raced to awaken, and we lost), and finally
- * don't try to awaken a kthread that has not yet been created.  If
- * all those checks are passed, track some debug information and awaken.
+ * Awaken the grace-period kthread.  Don't do a self-awaken (unless in
+ * an interrupt or softirq handler), and don't bother awakening when there
+ * is nothing for the grace-period kthread to do (as in several CPUs raced
+ * to awaken, and we lost), and finally don't try to awaken a kthread that
+ * has not yet been created.  If all those checks are passed, track some
+ * debug information and awaken.
+ *
+ * So why do the self-wakeup when in an interrupt or softirq handler
+ * in the grace-period kthread's context?  Because the kthread might have
+ * been interrupted just as it was going to sleep, and just after the final
+ * pre-sleep check of the awaken condition.  In this case, a wakeup really
+ * is required, and is therefore supplied.
  */
 static void rcu_gp_kthread_wake(void)
 {
-   if (current == rcu_state.gp_kthread ||
+   if ((current == rcu_state.gp_kthread &&
+!in_interrupt() && !in_serving_softirq()) ||
!READ_ONCE(rcu_state.gp_flags) ||
!rcu_state.gp_kthread)
return;
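
The effect of the changed condition can be checked outside the kernel. What follows is a minimal user-space sketch, not the kernel code: struct wake_ctx and both function names are hypothetical, and each field only stands in for the kernel expression named in its comment.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state that rcu_gp_kthread_wake() inspects. */
struct wake_ctx {
	bool current_is_gp_kthread; /* models current == rcu_state.gp_kthread */
	bool in_irq_or_softirq;     /* models in_interrupt() || in_serving_softirq() */
	unsigned long gp_flags;     /* models READ_ONCE(rcu_state.gp_flags) */
	bool gp_kthread_exists;     /* models rcu_state.gp_kthread != NULL */
};

/* Old check: any call made in the kthread's context counts as a self-wakeup. */
bool old_skips_wakeup(const struct wake_ctx *c)
{
	return c->current_is_gp_kthread || !c->gp_flags || !c->gp_kthread_exists;
}

/* New check: a handler that interrupted the kthread may still wake it. */
bool new_skips_wakeup(const struct wake_ctx *c)
{
	return (c->current_is_gp_kthread && !c->in_irq_or_softirq) ||
	       !c->gp_flags || !c->gp_kthread_exists;
}
```

For a handler that interrupted the grace-period kthread while gp_flags is nonzero, old_skips_wakeup() returns true (the wakeup is lost) and new_skips_wakeup() returns false (the wakeup is delivered). For an ordinary self-call from the kthread, both checks still skip the wakeup.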


[tip:core/rcu] rcu: Prevent needless ->gp_seq_needed update in __note_gp_changes()

2019-02-13 Thread tip-bot for Zhang, Jun
Commit-ID:  13dc7d0c7a2ed438f0ec8e9fb365a1256d87cf87
Gitweb: https://git.kernel.org/tip/13dc7d0c7a2ed438f0ec8e9fb365a1256d87cf87
Author: Zhang, Jun 
AuthorDate: Wed, 19 Dec 2018 10:37:34 -0800
Committer:  Paul E. McKenney 
CommitDate: Fri, 25 Jan 2019 15:30:00 -0800

rcu: Prevent needless ->gp_seq_needed update in __note_gp_changes()

Currently, __note_gp_changes() checks to see if the rcu_node structure's
->gp_seq_needed is greater than or equal to that of the rcu_data
structure, and if so, updates the rcu_data structure's ->gp_seq_needed
field.  This results in a useless store in the case where the two fields
are equal.

This commit therefore carries out this store only in the case where the
rcu_node structure's ->gp_seq_needed is strictly greater than that of
the rcu_data structure.

Signed-off-by: "Zhang, Jun" 
Signed-off-by: Paul E. McKenney 
Link: 
https://lkml.kernel.org/r/88dc34334ca3444c85d647dbfa962c2735ad5...@shsmsx104.ccr.corp.intel.com
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 21775eebb8f0..9d0e2ac9356e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1758,7 +1758,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
zero_cpu_stall_ticks(rdp);
}
rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
-   if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
+   if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
rdp->gp_seq_needed = rnp->gp_seq_needed;
WRITE_ONCE(rdp->gpwrap, false);
rcu_gpnum_ovf(rnp, rdp);
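
The difference between the two checks shows up only when the two fields are equal, which a small user-space sketch can demonstrate. The macro bodies below are believed to match include/linux/rcupdate.h, but treat that as an assumption to verify against your tree; the two function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe comparisons on unsigned long sequence numbers. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Old condition: stores whenever rnp's value is at or ahead of rdp's. */
bool old_check_stores(unsigned long rnp_needed, unsigned long rdp_needed,
		      bool gpwrap)
{
	return ULONG_CMP_GE(rnp_needed, rdp_needed) || gpwrap;
}

/* New condition: stores only when rdp's value is strictly behind rnp's. */
bool new_check_stores(unsigned long rnp_needed, unsigned long rdp_needed,
		      bool gpwrap)
{
	return ULONG_CMP_LT(rdp_needed, rnp_needed) || gpwrap;
}
```

With equal values, old_check_stores() returns true and new_check_stores() returns false, which is the useless store the commit removes. When rnp is strictly ahead (including across a counter wrap), both return true.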


RE: rcu_preempt caused oom

2018-12-17 Thread Zhang, Jun
Hello, paul

In softirq context, when current is rcu_preempt-10, rcu_gp_kthread_wake() does 
not wake up rcu_preempt.
Maybe the following patch could fix it. Please help review.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1..98f5b40 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1697,7 +1697,7 @@ static bool rcu_future_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp)
  */
 static void rcu_gp_kthread_wake(struct rcu_state *rsp)
 {
-   if (current == rsp->gp_kthread ||
+   if (((current == rsp->gp_kthread) && !in_softirq()) ||
!READ_ONCE(rsp->gp_flags) ||
!rsp->gp_kthread)
return;

[44932.311439, 0][ rcu_preempt]  rcu_preempt-10[001] .n.. 
44929.401037: rcu_grace_period: rcu_preempt 19063548 reqwait
..
[44932.311517, 0][ rcu_preempt]  rcu_preempt-10[001] d.s2 
44929.402234: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 
Startleaf
[44932.311536, 0][ rcu_preempt]  rcu_preempt-10[001] d.s2 
44929.402237: rcu_future_grace_period: rcu_preempt 19063548 19063552 0 0 3 
Startedroot


-Original Message-
From: He, Bo 
Sent: Tuesday, December 18, 2018 07:16
To: paul...@linux.ibm.com
Cc: Zhang, Jun ; Steven Rostedt ; 
linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin 
; Zhang, Yanmin ; Bai, Jie A 
; Sun, Yi J ; Chang, Junxiao 
; Mei, Paul 
Subject: RE: rcu_preempt caused oom

Thanks for your comments. With the change to if (ret == 1), the issue now 
triggers the panic. The logs are enclosed.

-Original Message-
From: Paul E. McKenney  
Sent: Monday, December 17, 2018 12:26 PM
To: He, Bo 
Cc: Zhang, Jun ; Steven Rostedt ; 
linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Xiao, Jin 
; Zhang, Yanmin ; Bai, Jie A 
; Sun, Yi J ; Chang, Junxiao 
; Mei, Paul 
Subject: Re: rcu_preempt caused oom

On Mon, Dec 17, 2018 at 03:15:42AM +, He, Bo wrote:
> To double-check the earlier result (no reproduction after 90 hours), we applied 
> only the enclosed patch to the build on which the issue reproduces easily. The 
> issue did not reproduce in 63 hours over the whole weekend on 16 boards.
> So the current conclusion is that the debug patch has an extreme effect on the 
> RCU issue.

This is not a surprise.  (Please see the end of this email for a replacement 
patch that won't suppress the bug.)

To see why this is not a surprise, let's take a closer look at your patch, in 
light of the comment header for wait_event_idle_timeout_exclusive():

 * Returns:
 * 0 if the @condition evaluated to %false after the @timeout elapsed,
 * 1 if the @condition evaluated to %true after the @timeout elapsed,
 * or the remaining jiffies (at least 1) if the @condition evaluated
 * to %true before the @timeout elapsed.
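
The three cases in that comment header can be modelled in user space. This is only a sketch of the documented return convention, not the kernel macro: timed_wait_result() is a hypothetical name, and it assumes the wait has already finished, so either cond is true or remaining is 0.

```c
#include <stdbool.h>

/*
 * cond:      final value of the wait condition
 * remaining: jiffies left when the wait finished (0 means the timeout elapsed)
 */
long timed_wait_result(bool cond, long remaining)
{
	if (!cond)
		return 0;                     /* timed out, condition still false */
	return remaining > 0 ? remaining : 1; /* condition true: never report 0 */
}
```

Note that a result of 1 is ambiguous in this model: it is returned both when the condition became true after the timeout and when it became true with exactly one jiffy left.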

The situation we are seeing is that the RCU_GP_FLAG_INIT is set, but the 
rcu_preempt task does not wake up.  This would correspond to the second case 
above, that is, a return value of 1.  Looking now at your patch, with comments 
interspersed below:



From e8b583aa685b3b4f304f72398a80461bba09389c Mon Sep 17 00:00:00 2001
From: "he, bo" 
Date: Sun, 9 Dec 2018 18:11:33 +0800
Subject: [PATCH] rcu: detect the preempt_rcu hang for triage jing's board

Change-Id: I2ffceec2ae4847867753609e45c99afc66956003
Tracked-On:
Signed-off-by: he, bo 
---
 kernel/rcu/tree.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 78c0cf2..d6de363 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2192,8 +2192,13 @@ static int __noreturn rcu_gp_kthread(void *arg)
int ret;
struct rcu_state *rsp = arg;
struct rcu_node *rnp = rcu_get_root(rsp);
+   pid_t rcu_preempt_pid;
 
rcu_bind_gp_kthread();
+   if(!strcmp(rsp->name, "rcu_preempt")) {
+   rcu_preempt_pid = rsp->gp_kthread->pid;
+   }
+
for (;;) {
 
/* Handle grace-period start. */
@@ -2202,8 +2207,19 @@ static int __noreturn rcu_gp_kthread(void *arg)
   READ_ONCE(rsp->gp_seq),
   TPS("reqwait"));
rsp->gp_state = RCU_GP_WAIT_GPS;
-   swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
-RCU_GP_FLAG_INIT);
+   if (current->pid != rcu_preempt_pid) {
+   swait_event_idle_exclusive(rsp->gp_wq, READ_ONCE(rsp->gp_flags) &
+   RCU_GP_FLAG_INIT);
+   } else {
+   ret = swait_event_idle_timeout_exclusive(rsp-&g

RE: rcu_preempt caused oom

2018-12-12 Thread Zhang, Jun
Hello, Paul

I think the following patch is better, because ULONG_CMP_GE could cause a 
double write, which risks writing back an old value.
Please help review.
I have not tested it. If you agree, we will test it.
Thanks!


diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1..c00f34e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1849,7 +1849,7 @@ static bool __note_gp_changes(struct rcu_state *rsp, struct rcu_node *rnp,
zero_cpu_stall_ticks(rdp);
}
rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
-   if (ULONG_CMP_GE(rnp->gp_seq_needed, rdp->gp_seq_needed) || rdp->gpwrap)
+   if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
rdp->gp_seq_needed = rnp->gp_seq_needed;
WRITE_ONCE(rdp->gpwrap, false);
rcu_gpnum_ovf(rnp, rdp);


-Original Message-
From: Paul E. McKenney [mailto:paul...@linux.ibm.com] 
Sent: Thursday, December 13, 2018 08:12
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 
; Bai, Jie A ; Sun, Yi J 

Subject: Re: rcu_preempt caused oom

On Wed, Dec 12, 2018 at 11:13:22PM +, He, Bo wrote:
> I don't see the rcutree.sysrq_rcu parameter in the v4.19 kernel. I also checked 
> the latest kernel and the latest tag, v4.20-rc6, and did not see sysrq_rcu there.
> Please correct me if I have something wrong.

That would be because I sent you the wrong patch, apologies!  :-/

Please instead see the one below, which does add sysrq_rcu.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney 
> Sent: Thursday, December 13, 2018 5:03 AM
> To: He, Bo 
> Cc: Steven Rostedt ; 
> linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> ; Xiao, Jin ; Zhang, Yanmin 
> ; Bai, Jie A 
> Subject: Re: rcu_preempt caused oom
> 
> On Wed, Dec 12, 2018 at 07:42:24AM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 12, 2018 at 01:21:33PM +, He, Bo wrote:
> > > we reproduce on two boards, but I still not see the 
> > > show_rcu_gp_kthreads() dump logs, it seems the patch can't catch the 
> > > scenario.
> > > I double confirmed the CONFIG_PROVE_RCU=y is enabled in the config as 
> > > it's extracted from the /proc/config.gz.
> > 
> > Strange.
> > 
> > Are the systems responsive to sysrq keys once failure occurs?  If 
> > so, I will provide you a sysrq-R or some such to dump out the RCU state.
> 
> Or, as it turns out, sysrq-y if booting with rcutree.sysrq_rcu=1 using the 
> patch below.  Only lightly tested.



commit 04b6245c8458e8725f4169e62912c1fadfdf8141
Author: Paul E. McKenney 
Date:   Wed Dec 12 16:10:09 2018 -0800

rcu: Add sysrq rcu_node-dump capability

Backported from v4.21/v5.0

Life is hard if RCU manages to get stuck without triggering RCU CPU
stall warnings or triggering the rcu_check_gp_start_stall() checks
for failing to start a grace period.  This commit therefore adds a
boot-time-selectable sysrq key (commandeering "y") that allows manually
dumping Tree RCU state.  The new rcutree.sysrq_rcu kernel boot parameter
must be set for this sysrq to be available.

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0b760c1369f7..e9392a9d6291 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -61,6 +61,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "tree.h"
 #include "rcu.h"
@@ -128,6 +129,9 @@ int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
 int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
 /* panic() on RCU Stall sysctl. */
 int sysctl_panic_on_rcu_stall __read_mostly;
+/* Commandeer a sysrq key to dump RCU's tree. */
+static bool sysrq_rcu;
+module_param(sysrq_rcu, bool, 0444);
 
 /*
  * The rcu_scheduler_active variable is initialized to the value
@@ -662,6 +666,27 @@ void show_rcu_gp_kthreads(void)
 }
 EXPORT_SYMBOL_GPL(show_rcu_gp_kthreads);
 
+/* Dump grace-period-request information due to commandeered sysrq. */
+static void sysrq_show_rcu(int key)
+{
+   show_rcu_gp_kthreads();
+}
+
+static struct sysrq_key_op sysrq_rcudump_op = {
+   .handler = sysrq_show_rcu,
+   .help_msg = "show-rcu(y)",
+   .action_msg = "Show RCU tree",
+   .enable_mask = SYSRQ_ENABLE_DUMP,
+};
+
+static int __init rcu_sysrq_init(void)
+{
+   if (sysrq_rcu)
+   return register_sysrq_key('y', &sysrq_rcudump_op);
+   return 0;
+}
+early_initcall(rcu_sysrq_init);
+
 /*
  * Send along grace-period-related data for rcutorture diagnostics.
  */



RE: [PATCH] ALSA: core: fix unsigned int pages overflow when comapred

2018-07-19 Thread Zhang, Jun
Hello, Takashi

I think that with our patch it is NOT possible for the returned size to exceed 
sgbuf->tblsize.

In function snd_malloc_sgbuf_pages(): 

pages is aligned to one page,
sgbuf->tblsize is aligned to 32 pages,
chunk is aligned to 2^n pages.

In our panic case, pages = 123 and tblsize = 128:
1st loop chunk = 32
2nd loop chunk = 32
3rd loop chunk = 32
4th loop chunk = 16
5th loop chunk = 16
So in the 5th loop pages - chunk = -5. Because pages is an unsigned int, it 
wraps around and the loop never terminates. 

With our patch, the while loop breaks in the 5th iteration, so the returned 
size can NOT exceed sgbuf->tblsize.
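
The arithmetic of the panic case can be replayed in user space. This is a minimal sketch under the numbers given above, not the ALSA code; remaining_unsigned() and remaining_signed() are hypothetical helper names.

```c
#include <stddef.h>

/* Final value of an unsigned page counter after subtracting every chunk. */
unsigned int remaining_unsigned(unsigned int pages,
				const unsigned int *chunks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages -= chunks[i]; /* wraps around when chunks[i] > pages */
	return pages;
}

/* The same arithmetic with a signed counter. */
int remaining_signed(int pages, const unsigned int *chunks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pages -= (int)chunks[i];
	return pages;
}
```

With pages = 123 and chunks of 32, 32, 32, 16 and 16, the signed counter ends at -5, so a "pages > 0" test fails and the loop stops. The unsigned counter instead wraps to a very large positive value, so the same test keeps passing.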

-Original Message-
From: Takashi Iwai [mailto:ti...@suse.de] 
Sent: Wednesday, July 18, 2018 20:34
To: He, Bo 
Cc: alsa-de...@alsa-project.org; pe...@perex.cz; linux-kernel@vger.kernel.org; 
Zhang, Jun ; Zhang, Yanmin 
Subject: Re: [PATCH] ALSA: core: fix unsigned int pages overflow when comapred

On Wed, 18 Jul 2018 13:52:45 +0200,
 He, Bo  wrote:
> 
> We see the kernel panic below in snd_malloc_sgbuf_pages() during a 
> suspend/resume stress test. The chunk allocated by 
> snd_dma_alloc_pages_fallback() may be larger than the number of pages 
> left because of page alignment, which makes the pages counter overflow.
> 
> while (pages > 0) {
>   ...
>   pages -= chunk;
> }
> 
> The patch changes pages from unsigned int to int to fix the issue.

Thanks for the patch.

Although the analysis is correct, the fix doesn't look ideal.  It's also 
possible that the returned size may exceed sgbuf->tblsize if we are more unlucky.

A change like below should work instead.  Could you give it a try?


Takashi

-- 8< --
--- a/sound/core/sgbuf.c
+++ b/sound/core/sgbuf.c
@@ -108,7 +108,7 @@ void *snd_malloc_sgbuf_pages(struct device *device,
break;
}
chunk = tmpb.bytes >> PAGE_SHIFT;
-   for (i = 0; i < chunk; i++) {
+   for (i = 0; i < chunk && pages > 0; i++) {
table->buf = tmpb.area;
table->addr = tmpb.addr;
if (!i)
@@ -117,9 +117,9 @@ void *snd_malloc_sgbuf_pages(struct device *device,
*pgtable++ = virt_to_page(tmpb.area);
tmpb.area += PAGE_SIZE;
tmpb.addr += PAGE_SIZE;
+   sgbuf->pages++;
+   pages--;
}
-   sgbuf->pages += chunk;
-   pages -= chunk;
if (chunk < maxpages)
maxpages = chunk;
}


RE: [PATCH] sched/fair: fix select_task_rq_fair return -1

2014-12-04 Thread Zhang, Jun
Hello, Hillf
This issue happened in 3.14.25. 
Do you know which patch fixes it in 3.18-rc7?
We can try it.

-Original Message-
From: Hillf Danton [mailto:hillf...@alibaba-inc.com] 
Sent: Thursday, December 04, 2014 5:05 PM
To: Zhang, Jun
Cc: Ingo Molnar; Peter Zijlstra; linux-kernel; Liu, Chuansheng; Liu, 
Changcheng; Hillf Danton; Vincent Guittot
Subject: Re: [PATCH] sched/fair: fix select_task_rq_fair return -1

> 
> From: zhang jun 
> 
> When cpu == -1 and sd->child == NULL, select_task_rq_fair() returns -1 and the 
> system panics.
> 
> [ 0.738326] BUG: unable to handle kernel paging request at 8800997ea928
> [ 0.746138] IP: [] wake_up_new_task+0x43/0x1b0
> [ 0.752886] PGD 25df067 PUD 0
> [ 0.756321] Oops:  1 PREEMPT SMP
> [ 0.760743] Modules linked in:
> [ 0.764179] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 3.14.19-quilt-b27ac761 #2
> [ 0.772651] Hardware name: Intel Corporation CHERRYVIEW B1 PLATFORM/Cherry Trail CR, BIOS CHTTRVP1.X64.0003.R08.140453 11/11/2014
> [ 0.786084] Workqueue: khelper __call_usermodehelper
> [ 0.791649] task: 88007955a150 ti: 88007955c000 task.ti: 88007955c000
> [ 0.800021] RIP: 0010:[] [] wake_up_new_task+0x43/0x1b0
> [ 0.809478] RSP: :88007955dd58 EFLAGS: 00010092
> [ 0.815422] RAX:  RBX: 0001 RCX: 0020
> [ 0.823404] RDX:  RSI: 0020 RDI: 0020
> [ 0.831386] RBP: 88007955dd80 R08: 880079604b58 R09: 
> [ 0.839368] R10: 0004 R11: eae0 R12: 8800797ea650
> [ 0.847350] R13: 4000 R14: 8800797ead52 R15: 0206
> [ 0.855335] FS: () GS:88007aa0() knlGS:
> [ 0.864387] CS: 0010 DS:  ES:  CR0: 8005003b
> [ 0.870817] CR2: 8800997ea928 CR3: 0220b000 CR4: 001007f0
> [ 0.878796] Stack:
> [ 0.881046] 0001 8800797ea650 4000 
> [ 0.889363] 003c 88007955ddf0 8107ddfd 810b6a95
> [ 0.897680]  8800796beb00 8800 8100
> [ 0.905998] Call Trace:
> [ 0.908752] [] do_fork+0x12d/0x3b0
> [ 0.914416] [] ? set_next_entity+0x95/0xb0
> [ 0.920856] [] kernel_thread+0x26/0x30
> [ 0.926903] [] __call_usermodehelper+0x2e/0x90
> [ 0.933730] [] process_one_work+0x171/0x490
> [ 0.940264] [] worker_thread+0x11b/0x3a0
> [ 0.946508] [] ? manage_workers.isra.27+0x2b0/0x2b0
> [ 0.953821] [] kthread+0xd2/0xf0
> [ 0.959289] [] ? kthread_create_on_node+0x170/0x170
> [ 0.966602] [] ret_from_fork+0x7c/0xb0
> [ 0.972652] [] ? kthread_create_on_node+0x170/0x170
> [ 0.979956] Code: 49 89 fc 4c 89 f7 53 e8 bc 5c a4 00 49 8b 54 24 08 31 c9 49 89 c7 49 8b 44 24 60 4c 89 e7 8b 72 18 ba 08 00 00 00 ff 50 40 89 c2 <49> 0f a3 94 24 e0 02 00 00 19 c9 85 c9 0f 84 34 01 00 00 48 8b
> [ 1.001809] RIP [] wake_up_new_task+0x43/0x1b0
> [ 1.008641] RSP 
> [ 1.012544] CR2: 8800997ea928
> [ 1.016279] --[ end trace 9737aaa337a5ca10 ]--
> 
> Signed-off-by: zhang jun 
> Signed-off-by: Chuansheng Liu 
> Signed-off-by: Changcheng Liu 
> ---
>  kernel/sched/fair.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 34baa60..123153f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4587,6 +4587,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>   if (new_cpu == -1 || new_cpu == cpu) {
>   /* Now try balancing at a lower domain level of cpu */
>   sd = sd->child;
> + if ((!sd) && (new_cpu == -1))
> + new_cpu = smp_processor_id();
>   continue;
>   }
> 
In 3.18-rc7 is -1 still selected?

Hillf
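
The fallback proposed in the patch above can be sketched in user space as follows. This is only a simplified illustration, not the scheduler's code: struct domain, its idlest_cpu field and pick_cpu() are hypothetical stand-ins, and the real select_task_rq_fair() also descends a level when the chosen CPU equals the current one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct sched_domain. */
struct domain {
	struct domain *child;	/* next lower domain level, or NULL */
	int idlest_cpu;		/* CPU this level would pick, or -1 for none */
};

/*
 * Walk down the domain hierarchy looking for a target CPU.
 * this_cpu models smp_processor_id() and must be a valid CPU number.
 * The function never returns -1: if no level offers a candidate,
 * it falls back to this_cpu, which is the idea behind the patch.
 */
static int pick_cpu(const struct domain *sd, int this_cpu)
{
	int new_cpu = this_cpu;

	assert(this_cpu >= 0);

	while (sd) {
		new_cpu = sd->idlest_cpu;
		if (new_cpu != -1)
			break;			/* found a candidate */
		sd = sd->child;			/* try a lower level */
		if (!sd)
			new_cpu = this_cpu;	/* lowest level reached: fall back */
	}
	return new_cpu;
}
```

With a two-level hierarchy in which neither level offers a CPU, pick_cpu() returns the caller's CPU instead of -1, so the caller never indexes per-CPU data with an invalid number.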



RE: [PATCH] sched/fair: fix select_task_rq_fair return -1

2014-12-04 Thread Zhang, Jun
Hello, Hillf
This issue happened in 3.14.25.
Do you know which patch fixes it in 3.18-rc7?
We can try it.

-----Original Message-----
From: Hillf Danton [mailto:hillf...@alibaba-inc.com] 
Sent: Thursday, December 04, 2014 5:05 PM
To: Zhang, Jun
Cc: Ingo Molnar; Peter Zijlstra; linux-kernel; Liu, Chuansheng; Liu, 
Changcheng; Hillf Danton; Vincent Guittot
Subject: Re: [PATCH] sched/fair: fix select_task_rq_fair return -1

 
> From: zhang jun <jun.zh...@intel.com>
>
> when cpu == -1 and sd->child == NULL, select_task_rq_fair return -1, system
> panic.
>
> [ 0.738326] BUG: unable to handle kernel paging request at 8800997ea928
> [ 0.746138] IP: [<810b15d3>] wake_up_new_task+0x43/0x1b0
> [ 0.752886] PGD 25df067 PUD 0
> [ 0.756321] Oops:  1 PREEMPT SMP
> [ 0.760743] Modules linked in:
> [ 0.764179] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 3.14.19-quilt-b27ac761 #2
> [ 0.772651] Hardware name: Intel Corporation CHERRYVIEW B1 PLATFORM/Cherry Trail CR, BIOS CHTTRVP1.X64.0003.R08.140453 11/11/2014
> [ 0.786084] Workqueue: khelper __call_usermodehelper
> [ 0.791649] task: 88007955a150 ti: 88007955c000 task.ti: 88007955c000
> [ 0.800021] RIP: 0010:[<810b15d3>] [<810b15d3>] wake_up_new_task+0x43/0x1b0
> [ 0.809478] RSP: :88007955dd58 EFLAGS: 00010092
> [ 0.815422] RAX:  RBX: 0001 RCX: 0020
> [ 0.823404] RDX:  RSI: 0020 RDI: 0020
> [ 0.831386] RBP: 88007955dd80 R08: 880079604b58 R09: 
> [ 0.839368] R10: 0004 R11: eae0 R12: 8800797ea650
> [ 0.847350] R13: 4000 R14: 8800797ead52 R15: 0206
> [ 0.855335] FS: () GS:88007aa0() knlGS:
> [ 0.864387] CS: 0010 DS:  ES:  CR0: 8005003b
> [ 0.870817] CR2: 8800997ea928 CR3: 0220b000 CR4: 001007f0
> [ 0.878796] Stack:
> [ 0.881046] 0001 8800797ea650 4000 
> [ 0.889363] 003c 88007955ddf0 8107ddfd 810b6a95
> [ 0.897680]  8800796beb00 8800 8100
> [ 0.905998] Call Trace:
> [ 0.908752] [<8107ddfd>] do_fork+0x12d/0x3b0
> [ 0.914416] [<810b6a95>] ? set_next_entity+0x95/0xb0
> [ 0.920856] [<8107e0a6>] kernel_thread+0x26/0x30
> [ 0.926903] [<8109703e>] __call_usermodehelper+0x2e/0x90
> [ 0.933730] [<8109ad31>] process_one_work+0x171/0x490
> [ 0.940264] [<8109ba4b>] worker_thread+0x11b/0x3a0
> [ 0.946508] [<8109b930>] ? manage_workers.isra.27+0x2b0/0x2b0
> [ 0.953821] [<810a1802>] kthread+0xd2/0xf0
> [ 0.959289] [<810a1730>] ? kthread_create_on_node+0x170/0x170
> [ 0.966602] [<81af81ac>] ret_from_fork+0x7c/0xb0
> [ 0.972652] [<810a1730>] ? kthread_create_on_node+0x170/0x170
> [ 0.979956] Code: 49 89 fc 4c 89 f7 53 e8 bc 5c a4 00 49 8b 54 24 08 31 c9 49 89 c7 49 8b 44 24 60 4c 89 e7 8b 72 18 ba 08 00 00 00 ff 50 40 89 c2 <49> 0f a3 94 24 e0 02 00 00 19 c9 85 c9 0f 84 34 01 00 00 48 8b
> [ 1.001809] RIP [<810b15d3>] wake_up_new_task+0x43/0x1b0
> [ 1.008641] RSP <88007955dd58>
> [ 1.012544] CR2: 8800997ea928
> [ 1.016279] ---[ end trace 9737aaa337a5ca10 ]---
>
> Signed-off-by: zhang jun <jun.zh...@intel.com>
> Signed-off-by: Chuansheng Liu <chuansheng@intel.com>
> Signed-off-by: Changcheng Liu <changcheng@intel.com>
> ---
>  kernel/sched/fair.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 34baa60..123153f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4587,6 +4587,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>  		if (new_cpu == -1 || new_cpu == cpu) {
>  			/* Now try balancing at a lower domain level of cpu */
>  			sd = sd->child;
> +			if ((!sd) && (new_cpu == -1))
> +				new_cpu = smp_processor_id();
>  			continue;
>  		}
>
In 3.18-rc7 is -1 still selected?

Hillf



[PATCH] firmware: give a protection when map page failed

2014-01-28 Thread Zhang, Jun
, firmware_loading_show, 
firmware_loading_store);
@@ -916,6 +921,8 @@ err_del_dev:
 	device_del(f_dev);
 err_put_dev:
 	put_device(f_dev);
+	if (!buf->data)
+		return -ENOMEM;
 	return retval;
 }
 
-- 
1.7.9.5


---
Best Regards
Zhang, Jun
Android System Integration Shanghai
Tel: +86 (0)21 6116 4273
Mob:+86 (0)15821507662


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] At present, function sched_clock_idle_wakeup_event input parameter is not necessary, we can remove it.

2013-02-27 Thread Zhang, Jun
Signed-off-by: jzha144 <jun.zh...@intel.com>
---
 arch/x86/kernel/tsc.c |2 +-
 drivers/acpi/processor_idle.c |4 ++--
 include/linux/sched.h |6 +++---
 kernel/sched/clock.c  |4 ++--
 kernel/time/tick-sched.c  |2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 4b9ea10..957d32e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -629,7 +629,7 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
 (1UL << CYC2NS_SCALE_FACTOR));
}
 
-   sched_clock_idle_wakeup_event(0);
+   sched_clock_idle_wakeup_event();
local_irq_restore(flags);
 }
 
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index fc95308..e725c83 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -809,7 +809,7 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
sched_clock_idle_sleep_event();
acpi_idle_do_entry(cx);
 
-   sched_clock_idle_wakeup_event(0);
+   sched_clock_idle_wakeup_event();
 
if (cx->entry_method != ACPI_CSTATE_FFH)
current_thread_info()->status |= TS_POLLING;
@@ -905,7 +905,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
raw_spin_unlock(&c3_lock);
}
 
-   sched_clock_idle_wakeup_event(0);
+   sched_clock_idle_wakeup_event();
 
if (cx->entry_method != ACPI_CSTATE_FFH)
current_thread_info()->status |= TS_POLLING;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6853bf9..232cca9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1975,7 +1975,7 @@ static inline void sched_clock_idle_sleep_event(void)
 {
 }
 
-static inline void sched_clock_idle_wakeup_event(u64 delta_ns)
+static inline void sched_clock_idle_wakeup_event(void)
 {
 }
 #else
@@ -1989,7 +1989,7 @@ extern int sched_clock_stable;
 
 extern void sched_clock_tick(void);
 extern void sched_clock_idle_sleep_event(void);
-extern void sched_clock_idle_wakeup_event(u64 delta_ns);
+extern void sched_clock_idle_wakeup_event(void);
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
@@ -2016,7 +2016,7 @@ extern void sched_exec(void);
 #endif
 
 extern void sched_clock_idle_sleep_event(void);
-extern void sched_clock_idle_wakeup_event(u64 delta_ns);
+extern void sched_clock_idle_wakeup_event(void);
 
 #ifdef CONFIG_HOTPLUG_CPU
 extern void idle_task_exit(void);
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index c685e31..759a5ff 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -266,9 +266,9 @@ void sched_clock_idle_sleep_event(void)
 EXPORT_SYMBOL_GPL(sched_clock_idle_sleep_event);
 
 /*
- * We just idled delta nanoseconds (called with irqs disabled):
+ * We are waking up from idle (called with irqs disabled):
  */
-void sched_clock_idle_wakeup_event(u64 delta_ns)
+void sched_clock_idle_wakeup_event(void)
 {
if (timekeeping_suspended)
return;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 314b9ee..d358a18 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -221,7 +221,7 @@ static void tick_nohz_stop_idle(int cpu, ktime_t now)
update_ts_time_stats(cpu, ts, now, NULL);
ts->idle_active = 0;
 
-   sched_clock_idle_wakeup_event(0);
+   sched_clock_idle_wakeup_event();
 }
 
 static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
-- 
1.7.6

Best Regards!

Jun Zhang
Inet: 8821-4273
Dir.Tel: 86-21-6116-4273
Email: jun.zh...@intel.com



RE: [PATCH] crash dump: don't delete non-E820_RAM during init

2012-11-04 Thread Zhang, Jun
Hello, Anvin
1) With "memmap=exactmap", the kernel removes all ranges, both RAM and
non-RAM. So my first patch preserves the non-RAM ranges.
2) Without "memmap=exactmap", the kernel keeps all ranges. But we only need
the non-RAM ones, so the kernel has to remove the RAM ranges, as my second
patch does.


Best Regards!

Jun Zhang
Inet: 8821-4273
Dir.Tel: 86-21-6116-4273
Email: jun.zh...@intel.com


-----Original Message-----
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Monday, November 05, 2012 10:39 AM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] crash dump: don't delete non-E820_RAM during init

On 11/05/2012 02:37 AM, Zhang, Jun wrote:
> Hello, Gortmaker
> I will modify my subject. Thanks!
> 
> Hello, Anvin
> from our three options, I think third option is better. But in 3) option, 
> there are two choose, 3.1) is like memmap=REMOVERAM, 3.2) is 
> memmap=CRASHKDUMP.
> In 3.1) we maybe need ifdef/endif within the { } of the function (like 
> exactmap).
> In 3.2) we can remove the ifdef/endif. 
> Which one is the better? Maybe you have a better solution, please share it. 
> Thanks!
> 
> Next is our three option.
> 1)  my patch.
> 2)  modify kexec, only pass two parameters -- memmap=544K@64K 
> memmap=64964K@32768K, in kernel setup_memory_map, we can remove RAM 
> range.
> 3)  add extra optional, 3.1) like memmap=REMOVERAM
>3.2) like memmap=CRASHKDUMP
> 

Again, 2 would be better because it is a localized change to kexec.  If that 
works I don't see why there is any reason to change anything else.

-hpa
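
For reference, the two ranges kexec passes in option 2 use the memmap=size@start form. The following user-space sketch shows how such a value can be decoded. It is written in the spirit of the kernel's memparse() helper but is not the kernel's parser; struct mem_range, parse_size() and parse_memmap_range() are names made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for one decoded memmap=size@start range. */
struct mem_range {
	unsigned long long start;
	unsigned long long size;
};

/* Parse a number with an optional K/M/G suffix; *end is set past what was consumed. */
static unsigned long long parse_size(const char *s, const char **end)
{
	char *e;
	unsigned long long v = strtoull(s, &e, 0);

	switch (*e) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		e++;
		break;
	default:
		break;
	}
	*end = e;
	return v;
}

/* Decode "size@start". Returns 0 on success, -1 if the string has another form. */
static int parse_memmap_range(const char *arg, struct mem_range *out)
{
	const char *p, *q;
	unsigned long long size, start;

	size = parse_size(arg, &p);
	if (p == arg || *p != '@')
		return -1;
	start = parse_size(p + 1, &q);
	if (q == p + 1 || *q != '\0')
		return -1;
	out->start = start;
	out->size = size;
	return 0;
}
```

For memmap=544K@64K this gives a 557056-byte range starting at byte 65536. A value such as exactmap is not of the size@start form, so the sketch rejects it; the kernel handles that keyword separately in parse_memmap_opt().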




RE: [PATCH] crash dump: don't delete non-E820_RAM during init

2012-11-04 Thread Zhang, Jun
Hello, Gortmaker
I will modify my subject. Thanks!

Hello, Anvin
Of our three options, I think the third is better. But option 3) has two
variants: 3.1) something like memmap=REMOVERAM, and 3.2) memmap=CRASHKDUMP.
With 3.1) we may still need an ifdef/endif within the { } of the function
(as for exactmap).
With 3.2) we can remove the ifdef/endif.
Which one is better? Maybe you have a better solution; if so, please share it.
Thanks!

Here are our three options:
1)  my patch.
2)  modify kexec to pass only two parameters -- memmap=544K@64K
memmap=64964K@32768K -- and remove the RAM ranges in the kernel's
setup_memory_map.
3)  add an extra option: 3.1) like memmap=REMOVERAM
   3.2) like memmap=CRASHKDUMP

Best Regards!

Jun Zhang
Inet: 8821-4273
Dir.Tel: 86-21-6116-4273
Email: jun.zh...@intel.com

-----Original Message-----
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Saturday, November 03, 2012 12:38 AM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] To crash dump, we need keep other memory type except 
E820_RAM, because other type come from BIOS or firmware is used by other 
code(for example: PCI_MMCONFIG).

On 11/01/2012 01:49 AM, Zhang, Jun wrote:
> Hello, Anvin
> 
> Thank for your advice.
> 
> Hello, All
> 
> the next patch is made by 2), please review it. Thanks!
> 

No, it is not.

You are still modifying the behavior of the kernel depending on 
CONFIG_CRASH_DUMP.

CONFIG_CRASH_DUMP doesn't mean "we are doing a crash dump".  It means "it is 
possible to use this kernel to do a crash dump".

Either you are using standard kernel parameters in a standard way which is what 
option 2 was supposed to be -- it should require no kernel changes! -- or you 
have to put something in a code path specific to a crash dump.

-hpa
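
The distinction drawn above, between a kernel that was built with crash dump support and a kernel that is actually running as a capture kernel, can be sketched in user space like this. It is an assumption-laden illustration: SKETCH_CRASH_DUMP stands in for CONFIG_CRASH_DUMP, booted_for_crash_dump() is a made-up name, and treating an elfcorehdr= argument on the command line as the run-time signal is a simplification of what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_CRASH_DUMP: the capability is compiled in. */
#define SKETCH_CRASH_DUMP 1

/*
 * Run-time test: is this boot really a crash dump boot?
 * Assumption for this sketch: the capture kernel is started with an
 * elfcorehdr= argument, so its presence is used as the signal.
 */
static bool booted_for_crash_dump(const char *cmdline)
{
#if SKETCH_CRASH_DUMP
	return strstr(cmdline, "elfcorehdr=") != NULL;
#else
	(void)cmdline;
	return false;	/* capability not built in, so never a dump boot */
#endif
}
```

Code that must only run during a crash dump would be guarded by the run-time check, not by the #ifdef alone; a normal boot of the same kernel binary then keeps its RAM ranges.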


RE: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-11-01 Thread Zhang, Jun
Hello, Anvin

Thanks for your advice.

Hello, All

The following patch implements option 2); please review it. Thanks!

Subject: [PATCH] When we are doing a crash dump, we still need non-E820_RAM
 memory information in order to do I/O. So only remove all
 RAM ranges which need to be dumped.

Signed-off-by: jzha144 <jun.zh...@intel.com>
---
 arch/x86/kernel/e820.c  |    8 --------
 arch/x86/kernel/setup.c |   22 ++++++++++++++++++++++
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..0bc1687 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -844,14 +844,6 @@ static int __init parse_memmap_opt(char *p)
return -EINVAL;
 
if (!strncmp(p, "exactmap", 8)) {
-#ifdef CONFIG_CRASH_DUMP
-   /*
-* If we are doing a crash dump, we still need to know
-* the real mem size before original memory map is
-* reset.
-*/
-   saved_max_pfn = e820_end_of_ram_pfn();
-#endif
e820.nr_map = 0;
userdef = 1;
return 0;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ca45696..5eb178b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -480,6 +480,25 @@ static void __init e820_reserve_setup_data(void)
e820_print_map("reserve setup_data");
 }
 
+#ifdef CONFIG_CRASH_DUMP
+static void __init e820_crashdump_remove_ram(void)
+{
+   /*
+* We are doing a crash dump, so remove all RAM ranges
+* as they are the ones that need to be dumped.
+* We still need all non-RAM information in order to do I/O.
+*/
+   /* NOTE: if you use old kexec, please remove memmap=exactmap
+* which remove all ranges, not only RAM ranges.
+*/
+   saved_max_pfn = e820_end_of_ram_pfn();
+   e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+   sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
+   printk(KERN_INFO "crash dump non-RAM map:\n");
+   e820_print_map("crash_dump");
+}
+#endif
+
 static void __init memblock_x86_reserve_range_setup_data(void)
 {
struct setup_data *data;
@@ -751,6 +770,9 @@ void __init setup_arch(char **cmdline_p)
parse_setup_data();
/* update the e820_saved too */
e820_reserve_setup_data();
+#ifdef CONFIG_CRASH_DUMP
+   e820_crashdump_remove_ram();
+#endif
 
copy_edd();
 
-- 
1.7.6

Best Regards!

Jun Zhang
Inet: 8821-4273
Dir.Tel: 86-21-6116-4273
Email: jun.zh...@intel.com
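
The effect the patch above is after, dropping every RAM entry from the map while keeping the non-RAM ones, can be sketched in user space as below. This is not e820_remove_range(), which also has to split entries that only partly overlap the requested span; struct range and remove_type() are hypothetical names, and only the numeric type values mirror E820_RAM and E820_RESERVED.

```c
#include <assert.h>
#include <stddef.h>

/* Type values mirror E820_RAM (1) and E820_RESERVED (2). */
enum range_type { TYPE_RAM = 1, TYPE_RESERVED = 2 };

/* Hypothetical, simplified stand-in for one memory map entry. */
struct range {
	unsigned long long addr;
	unsigned long long size;
	enum range_type type;
};

/* Drop every entry of the given type, compacting in place; returns the new count. */
static size_t remove_type(struct range *map, size_t n, enum range_type type)
{
	size_t kept = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (map[i].type != type)
			map[kept++] = map[i];
	}
	return kept;
}
```

Applied to a map with two RAM entries and one reserved entry, remove_type(map, 3, TYPE_RAM) leaves only the reserved entry, which is the information the capture kernel still needs in order to do I/O.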


-----Original Message-----
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Thursday, November 01, 2012 12:20 PM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: RE: [PATCH] To crash dump, we need keep other memory type except 
E820_RAM, because other type come from BIOS or firmware is used by other 
code(for example: PCI_MMCONFIG).

2) would make most sense to me, but I'd be okay with 3) as well.

"Zhang, Jun" <jun.zh...@intel.com> wrote:

>Hello, Anvin
>
>I want to explain why I modify in this place. In kexec, it pass three 
>parameters, memmap=exactmap memmap=544K@64K memmap=64964K@32768K I 
>think my patch modify the least code.
>Actually, there are some choise to fix it. 
>1)  my patch.
>2)  modify kexec, only pass two parameters -- memmap=544K@64K 
>memmap=64964K@32768K, in kernel setup_memory_map, we can remove RAM 
>range.
>3)  add extra optional, like memmap=REMOVERAM
>
>Which one do you like? Maybe you have better solution, please share it.
>Thanks!
>
>Best Regards!
>
>Jun Zhang
>Inet: 8821-4273
>Dir.Tel: 86-21-6116-4273
>Email: jun.zh...@intel.com
>
>-----Original Message-----
>From: H. Peter Anvin [mailto:h...@zytor.com]
>Sent: Wednesday, October 31, 2012 1:39 PM
>To: Zhang, Jun
>Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; 
>Fleming, Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH] To crash dump, we need keep other memory type 
>except E820_RAM, because other type come from BIOS or firmware is used 
>by other code(for example: PCI_MMCONFIG).
>
>On 10/30/2012 10:22 PM, Zhang, Jun wrote:
>> Hello, Anvin
>>You are right. Thanks!
>>
>> Hello, All
>>Please review it again. Thanks!
>>
>>  From bf7506ac7e9ce0df0b915164dbb7a6d858ef2e40 Mon Sep 17 00:00:00 2001
>> From: jzha144 <jun.zh...@intel.com>
>> Date: Wed, 31 Oct 2012 08:51:18 +0800
>> Subject: [PATCH] When we are doing a crash dump, we still need non-E820_RAM
>>   memory type address information in order to do I/O. so only
>>   remove all RAM ranges which need to be dumped.
>>
>> Signed-off-by: jzha144 <jun.zh...@intel.com>
>> ---
>>   arch/x86/kernel/e820.c |9 

RE: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-31 Thread Zhang, Jun
Hello, Anvin

I want to explain why I made the change in this place. kexec passes three
parameters: memmap=exactmap memmap=544K@64K memmap=64964K@32768K
I think my patch changes the least code.
Actually, there are several ways to fix it:
1)  my patch.
2)  modify kexec to pass only two parameters -- memmap=544K@64K
memmap=64964K@32768K -- and remove the RAM ranges in the kernel's
setup_memory_map.
3)  add an extra option, like memmap=REMOVERAM

Which one do you prefer? Maybe you have a better solution; if so, please share it.
Thanks!

Best Regards!

Jun Zhang
Inet: 8821-4273
Dir.Tel: 86-21-6116-4273
Email: jun.zh...@intel.com

-----Original Message-----
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Wednesday, October 31, 2012 1:39 PM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] To crash dump, we need keep other memory type except 
E820_RAM, because other type come from BIOS or firmware is used by other 
code(for example: PCI_MMCONFIG).

On 10/30/2012 10:22 PM, Zhang, Jun wrote:
> Hello, Anvin
>You are right. Thanks!
>
> Hello, All
>Please review it again. Thanks!
>
>  From bf7506ac7e9ce0df0b915164dbb7a6d858ef2e40 Mon Sep 17 00:00:00 2001
> From: jzha144 <jun.zh...@intel.com>
> Date: Wed, 31 Oct 2012 08:51:18 +0800
> Subject: [PATCH] When we are doing a crash dump, we still need non-E820_RAM
>   memory type address information in order to do I/O. so only
>   remove all RAM ranges which need to be dumped.
>
> Signed-off-by: jzha144 <jun.zh...@intel.com>
> ---
>   arch/x86/kernel/e820.c |    9 +++++++++
>   1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index df06ade..77be839 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
>* reset.
>*/
>   saved_max_pfn = e820_end_of_ram_pfn();
> +
> + /*
> +  * We are doing a crash dump, so remove all RAM ranges
> +  * as they are the ones that need to be dumped.
> +  * We still need all non-RAM information in order to do I/O.
> +  */
> + e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
> + userdef = 1;
> + return 0;
>   #endif
>   e820.nr_map = 0;
>   userdef = 1;
>

The code is still wrong...

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center I work for Intel.  I don't 
speak on their behalf.


RE: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread Zhang, Jun
Hello, Anvin
  You are right. Thanks!

Hello, All
  Please review it again. Thanks!

From bf7506ac7e9ce0df0b915164dbb7a6d858ef2e40 Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Wed, 31 Oct 2012 08:51:18 +0800
Subject: [PATCH] When we are doing a crash dump, we still need non-E820_RAM
 memory type address information in order to do I/O. so only
 remove all RAM ranges which need to be dumped.

Signed-off-by: jzha144 
---
 arch/x86/kernel/e820.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..77be839 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
 		 * reset.
 		 */
 		saved_max_pfn = e820_end_of_ram_pfn();
+
+		/*
+		 * We are doing a crash dump, so remove all RAM ranges
+		 * as they are the ones that need to be dumped.
+		 * We still need all non-RAM information in order to do I/O.
+		 */
+		e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+		userdef = 1;
+		return 0;
 #endif
 		e820.nr_map = 0;
 		userdef = 1;
-- 
1.7.6

Best Regards!
Zhang, jun

-Original Message-
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Wednesday, October 31, 2012 12:38 PM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] To crash dump, we need keep other memory type except 
E820_RAM, because other type come from BIOS or firmware is used by other 
code(for example: PCI_MMCONFIG).

On 10/30/2012 08:39 PM, Zhang, Jun wrote:
> Hello, Anvin
> Thanks!
>
> Hello, all
> Next is my the latest version, please review it.
> Thanks!

You're still starting in the wrong end which is confusing for the reader.

What you probably want to say is something more like:

"We are doing a crash dump, so remove all RAM ranges as they are the ones that 
need to be dumped.  We still need all non-RAM information in order to do I/O."

At that point it should be pretty obvious that the patch is wrong.  What if we 
are *not* doing a crash dump?  Just because crash dump is compiled in doesn't 
mean that that is what we are doing right now.
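
As an illustration of the objection above (this sketch is not code from the
thread), the difference between "crash dump support is compiled in" and "we are
booting as a crash dump kernel right now" can be modelled in userspace C. All
names here (struct region, drop_type, handle_exactmap, crash_dump_active) are
hypothetical, and how the kernel would obtain such a run-time flag this early in
boot is not settled in this discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for memory map entry types (hypothetical). */
enum region_type { REGION_RAM = 1, REGION_RESERVED = 2 };

struct region {
	unsigned long long start;
	unsigned long long size;
	enum region_type type;
};

/* Remove every entry of the given type in place; return the new count. */
static size_t drop_type(struct region *map, size_t n, enum region_type type)
{
	size_t out = 0;

	assert(map != NULL);
	for (size_t i = 0; i < n; i++)
		if (map[i].type != type)
			map[out++] = map[i];
	return out;
}

/*
 * Model of handling "exactmap":
 *  - crash dump actually in progress: drop RAM, keep everything else;
 *  - otherwise: reset the whole map, as the unpatched code does.
 * The decision is taken at run time, not by a compile-time #ifdef.
 */
static size_t handle_exactmap(struct region *map, size_t n,
			      bool crash_dump_active)
{
	if (crash_dump_active)
		return drop_type(map, n, REGION_RAM);
	return 0;
}
```

The posted patch corresponds to always taking the first branch whenever crash
dump support is built in, which is the behaviour being questioned here.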

-hpa

>  From 141546c77ff7be523a9e72f5259df4a6827f2c1a Mon Sep 17 00:00:00 
> 2001
> From: jzha144 
> Date: Wed, 31 Oct 2012 08:51:18 +0800
> Subject: [PATCH] If we are doing a crash dump, we still need non-E820_RAM
>   memory type address information, which come from BIOS or
>   firmware. for example: PCI_MMCONFIG check this address.
>
> Signed-off-by: jzha144 
> ---
>   arch/x86/kernel/e820.c |9 +
>   1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index df06ade..f8672d0 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
>  		 * reset.
>  		 */
>  		saved_max_pfn = e820_end_of_ram_pfn();
> +
> +		/*
> +		 * If we are doing a crash dump, we still need non-E820_RAM
> +		 * memory type address information. so we only remove
> +		 * E820_RAM type.
> +		 */
> +		e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
> +		userdef = 1;
> +		return 0;
>  #endif
>  		e820.nr_map = 0;
>  		userdef = 1;
>


--
H. Peter Anvin, Intel Open Source Technology Center I work for Intel.  I don't 
speak on their behalf.


RE: [PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread Zhang, Jun
Hello, Anvin
Thanks!

Hello, all
Below is my latest version; please review it.
Thanks!

From 141546c77ff7be523a9e72f5259df4a6827f2c1a Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Wed, 31 Oct 2012 08:51:18 +0800
Subject: [PATCH] If we are doing a crash dump, we still need non-E820_RAM
 memory type address information, which come from BIOS or
 firmware. for example: PCI_MMCONFIG check this address.

Signed-off-by: jzha144 
---
 arch/x86/kernel/e820.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..f8672d0 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
 		 * reset.
 		 */
 		saved_max_pfn = e820_end_of_ram_pfn();
+
+		/*
+		 * If we are doing a crash dump, we still need non-E820_RAM
+		 * memory type address information. so we only remove
+		 * E820_RAM type.
+		 */
+		e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+		userdef = 1;
+		return 0;
 #endif
 		e820.nr_map = 0;
 		userdef = 1;
-- 
1.7.6


Best Regards!

Jun Zhang
Inet: 8821-4273
Dir.Tel: 86-21-6116-4273
Email: jun.zh...@intel.com


-Original Message-
From: H. Peter Anvin [mailto:h...@zytor.com] 
Sent: Wednesday, October 31, 2012 10:47 AM
To: Zhang, Jun
Cc: Thomas Gleixner; Ingo Molnar; x...@kernel.org; Andrew Morton; Fleming, 
Matt; Paul Gortmaker; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] To crash dump, we need keep other memory type except 
E820_RAM, because other type come from BIOS or firmware is used by other 
code(for example: PCI_MMCONFIG).

On 10/30/2012 06:26 PM, Zhang, Jun wrote:
> From aebc336baa7ec2d4ccb6f21166770c7d2ee26cba Mon Sep 17 00:00:00 2001
> From: jzha144 
> Date: Wed, 31 Oct 2012 08:51:18 +0800
> Subject: [PATCH] To crash dump, we need keep other memory type except  
> E820_RAM, because other type come from BIOS or firmware is  used by 
> other code(for example: PCI_MMCONFIG).

I'm sorry, I can't quite parse the description or the comment... could you 
clarify it a bit?  I think I know what you mean, but there is clearly risk for 
misunderstandings.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] To crash dump, we need keep other memory type except E820_RAM, because other type come from BIOS or firmware is used by other code(for example: PCI_MMCONFIG).

2012-10-30 Thread Zhang, Jun
From aebc336baa7ec2d4ccb6f21166770c7d2ee26cba Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Wed, 31 Oct 2012 08:51:18 +0800
Subject: [PATCH] To crash dump, we need keep other memory type except
 E820_RAM, because other type come from BIOS or firmware is
 used by other code(for example: PCI_MMCONFIG).

Signed-off-by: jzha144 
---
 arch/x86/kernel/e820.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index df06ade..8760427 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -851,6 +851,15 @@ static int __init parse_memmap_opt(char *p)
 		 * reset.
 		 */
 		saved_max_pfn = e820_end_of_ram_pfn();
+
+		/*
+		 * To CRASH DUMP, only remove E820_RAM.
+		 *  some other memory typecome from BIOS or firmware,
+		 * it must be same with system kernel.
+		 */
+		e820_remove_range(0, ULLONG_MAX, E820_RAM, 1);
+		userdef = 1;
+		return 0;
 #endif
 		e820.nr_map = 0;
 		userdef = 1;
-- 
1.7.6



[PATCH] Sometimes, there is OOPS happened when we use oprofile.

2012-10-28 Thread Zhang, Jun
From fff479313342940372444797814edee996b18fc9 Mon Sep 17 00:00:00 2001
From: jzha144 
Date: Mon, 29 Oct 2012 09:07:22 +0800
Subject: [PATCH] Sometimes, there is OOPS happened when we use oprofile. next
 is the call stack. From call stack, we find in
 call_on_stack if there is a nmi interrupt between "xchgl
 %%ebx,%%esp" and "call *%%edi", system will OOPS.

 BUG: unable to handle kernel paging request at ff06383f
 IP: [<c12051cd>] print_context_stack+0x4d/0x100
 *pde =
 Oops:  [#1] PREEMPT SMP
 Modules linked in: wl12xx_sdio wl12xx mac80211 cfg80211
 compat btwilink atomisp lm3554 mt9m114 mt9e013 videobuf2_memops videobuf2_core st_drv matrix(C)

 Pid: 162, comm: adbd Tainted: GWC  3.0.34-140446-g9e77874-dirty #1 Intel Corporation
 EIP: 0060:[<c12051cd>] EFLAGS: 00010083 CPU: 1
 EIP is at print_context_stack+0x4d/0x100
 EAX: ff063ffc EBX: ff06383f ECX: f4a0bd74 EDX: ff06383f
 ESI:  EDI: e000 EBP: f58dbe48 ESP: f58dbe24
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 Process adbd (pid: 162, ti=f58da000 task=f430a730 task.ti=f4a0a000)
 Stack:
  000c ff063ffc f4a0bd74 e000 ff062000 f4a0bd74 ff06383f c1b2b1c0
  ff062000 f58dbe74 c120428f c1b2b1c0 f58dbe98  f58dbe60 
   f4a0bd74 f58dbfc4 0005 f58dbebc c172d52f f4a0bd74 c1b2b1c0
 Call Trace:
  [<c120428f>] dump_trace+0x7f/0xf0
  [<c172d52f>] x86_backtrace+0x13f/0x150
  [<c172b504>] ? op_cpu_buffer_write_commit+0x14/0x20
  [<c172b66e>] ? log_sample+0x8e/0xb0
  [<c172b8ca>] oprofile_add_sample+0x9a/0xc0
  [<c172f09e>] ppro_check_ctrs+0x8e/0x110
  [<c12a31ce>] ? rb_reserve_next_event+0x3e/0x370
  [<c172d8d7>] profile_exceptions_notify+0x67/0x70
  [<c18694c7>] notifier_call_chain+0x47/0x90
  [<c1869548>] __atomic_notifier_call_chain+0x38/0x50
  [<c1250930>] ? remote_softirq_receive+0x110/0x110
  [<c186957f>] atomic_notifier_call_chain+0x1f/0x30
  [<c18695bd>] notify_die+0x2d/0x30
  [<c1867390>] do_nmi+0xb0/0x300
  [<c124fcef>] ? __local_bh_enable+0x4f/0xa0
  [<c1866f95>] nmi_stack_correct+0x28/0x2d
  [<c1250930>] ? remote_softirq_receive+0x110/0x110
  [<c120412f>] ? do_softirq+0x8f/0xe0
  <IRQ>
  [<c1250e26>] irq_exit+0x86/0xd0
  [<c186cb49>] smp_apic_timer_interrupt+0x59/0x88
  [<c1496738>] ? trace_hardirqs_off_thunk+0xc/0x14
  [<c1866ca7>] apic_timer_interrupt+0x2f/0x34
  [<c122007b>] ? handle_vm86_fault+0x78b/0x9b0
  [<c186661f>] ? _raw_spin_unlock_irqrestore+0x3f/0x50
  [<c1230d3c>] __wake_up_sync_key+0x4c/0x60
  [<c17353f0>] sock_def_readable+0x40/0x70
  [<c17d050d>] unix_stream_sendmsg+0x22d/0x390
  [<c173103b>] sock_aio_write+0x11b/0x140
  [<c186375d>] ? __schedule+0x23d/0x8d0
  [<c1866f95>] ? nmi_stack_correct+0x28/0x2d
  [<c12feaf9>] do_sync_write+0xa9/0xe0
  [<c186942d>] ? sub_preempt_count+0x3d/0x50
  [<c12ff321>] vfs_write+0x151/0x160
  [<c1300798>] ? fget_light+0x58/0xd0
  [<c12ff53d>] sys_write+0x3d/0x70
  [<c18669a1>] syscall_call+0x7/0xb
 Code: f6 89 4d f0 89 4d e4 89 45 e0 89 7d e8 74 5e 8d b4 26 00 00 00 00 39
 f3 72 0c 8b 45 f0 83 c4 18 5b 5e 5f 5d c3 90 3b 5d e8 72 ef <8b> 3b 89 f8
 89 7d dc e8 c7 07 06 00 85 c0 74 2b 8b 45 f0 83 c0
 EIP: [<c12051cd>] print_context_stack+0x4d/0x100 SS:ESP 0068:f58dbe24
 CR2: ff06383f

Signed-off-by: jzha144 
---
 arch/x86/oprofile/backtrace.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index d6aa6e8..c1af4f0 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -113,6 +113,10 @@ x86_backtrace(struct pt_regs * const regs, unsigned int depth)
 
 	if (!user_mode_vm(regs)) {
 		unsigned long stack = kernel_stack_pointer(regs);
+
+		if (!((unsigned long)stack & (THREAD_SIZE - 1)))
+			stack = 0;
+
 		if (depth)
 			dump_trace(NULL, regs, (unsigned long *)stack, 0,
 				   &backtrace_ops, &depth);
-- 
1.7.6
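
The check added by the patch above can be tried in isolation. The following is a
minimal userspace sketch, not kernel code: SKETCH_THREAD_SIZE is a hypothetical
stand-in for THREAD_SIZE (8 KiB is an assumption made for the example), and
sanitize_stack() is a made-up name. It shows that a value sitting exactly on a
THREAD_SIZE boundary is replaced by 0, while any other value passes through
unchanged.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's THREAD_SIZE (assumed 8 KiB). */
#define SKETCH_THREAD_SIZE 8192UL

/*
 * Mirror of the patch's test: if the low bits selected by
 * (THREAD_SIZE - 1) are all zero, the pointer lies exactly on a
 * stack-size boundary and is treated as unusable, so return 0.
 */
static unsigned long sanitize_stack(unsigned long stack)
{
	/* The mask trick only works for a power-of-two size. */
	assert((SKETCH_THREAD_SIZE & (SKETCH_THREAD_SIZE - 1)) == 0);

	if (!(stack & (SKETCH_THREAD_SIZE - 1)))
		return 0;
	return stack;
}
```

For example, under the 8 KiB assumption, 0xf58dc000 is boundary-aligned and maps
to 0, whereas the ESP value from the oops, 0xf58dbe24, is returned as is.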

